Have you ever been frustrated by the limitations of Excel's FIND function, and wished that you could use regular expressions (regex) in Excel? Or perhaps you've just been looking for a better way to filter your data table using a text column? In this blog, we'll show how Schematiq can help, by bringing the power of regex to your Excel spreadsheets.
We'll assume some knowledge of table filtering using Schematiq - but if this is new to you, don't fear! We've put together a couple of blog articles to explain the basics:
- tbl.Filter: An Introduction to Filtering With Schematiq
- Filtering Schematiq tables in Excel: allBut settled
In the second of these articles, we filtered the table below by "Class" (a column of text entries).
When we knew the exact value we wanted, filtering was easy - the snippet was:
x => x = "Football - Pitch Hire"
But when we wanted to search by a substring, things quickly became more complicated. For instance, to filter items whose class contains "Swimming", we used this snippet:
x => ISNUMBER(FIND("Swimming", x))
Looks quite messy, doesn't it? Wouldn't it be nicer if we could use a single function instead of ISNUMBER and FIND? And what if we could also be much more specific with our searches - so we could always match exactly the text we want?
You may well have come across wildcards when using Find/Replace tools: you can restrict to certain file types by searching for "*.txt", or "*.xlsx", where the * denotes any series of characters.
Regular expressions are similar to wildcards - but a lot more powerful..
We'll only cover a few basic examples here to show how to use Schematiq's regular expression functions in your spreadsheet, but to really appreciate how useful they can be, we'd recommend experimenting yourself - online tools such as RegExr allow you to visualise your regex matches as you type, and using a cheatsheet makes it easy to learn new search options.
Using regular expressions in Schematiq
Schematiq introduces a new family of functions that allow you to use regex in Excel. For now, we'll look at just one of these functions - regex.IsMatch:
= regex.IsMatch(text, pattern, [caseSensitive], [startAt])
Like FIND, this function takes some input text, and searches it for a regex pattern (a string which specifies the format of the text that we want to find). Additional options allow you to choose whether we want a case sensitive search (FALSE by default), and where to start searching from in the text, if not from the beginning. Unlike FIND, it returns TRUE/FALSE if the pattern is found, instead of the starting position - this makes it useful for filtering!
To show how to use it, let's apply it to an example from our Sports Participation data. To follow along yourself, you can download the spreadsheet used in these examples here. (This will require an active Schematiq subscription - please contact us to request a trial licence.)
We had earlier tried to search for "Swimming" within the "Class" column using a combination of ISNUMBER and FIND - let's remind ourselves how this would work as a cell formula:
= ISNUMBER(FIND("Swimming", "Swimming - Level 1"))
Now, let's use regex.IsMatch to make things simpler. To search for substrings like "Swimming" using regex, we don't need to do anything clever with the pattern - we can use the substring as it is. The function can then be entered into a cell, just like a built-in Excel function:
= regex.IsMatch("Swimming - Level 1", "Swimming")
= regex.IsMatch("Football - Pitch Hire", "Swimming")
Filtering tables using regex.IsMatch
We can combine regex.IsMatch with tbl.Filter to make it easier to filter Schematiq tables by a text column.
Continuing with our Sports Participation example, we can use regex.IsMatch to create a new filtering snippet to match all classes containing the word "Swimming":
x => regex.IsMatch(x, "Swimming")
We can then apply this filter function using tbl.Filter:
= tbl.Filter(C21, "Class", C22)
PRO-tip: Advanced regular expressions
Regular expressions are hugely versatile, and we can do a lot more than simply matching exact text strings or finding substrings. Let's look at two more advanced examples:
1. Using special characters: "." and " *"
First, let's introduce some special characters:
- . (full stop) matches any single character - for instance, a pattern of "gr.y" will match "grey", "gray", or indeed "grxy" or "gr-y".
- * (asterisk) is used after another character, and tells the regex search to match the character 0 or more times - so "gra*y" (where "*" is applied to "a") will match "gry", "gray" or "graaaay" (with as many "a"s as desired).
In combination, these two characters can be used to search for text of the form "word1__[some other text]__word2". The pattern ".*" will match 0 or more characters of any type - so "word1.*word2" will match any string containing word1 followed by word 2.
For a more concrete example, let's return to our sports classes. Suppose we wanted to find all "Swimming" classes of "Level 3". (Of course, we know that only one such class exists - but this might not be the case in general.)
We can construct a regex pattern that will match text containing the word "Swimming" and then (at some later point) the words "Level 3":
We can then apply this regex pattern using regex.IsMatch:
= regex.IsMatch("Swimming - Level 3", "Swimming.*Level 3")
= regex.IsMatch("Swimming - Level 1", "Swimming.*Level 3")
2. Character sets: "Level [1-2]"
What about if we wanted to match all Swimming classes at Level 1 or 2?
Regex has a handy feature called character sets, which allow us to match any character within a specified range. For example:
- "[a-z]" will match any single lowercase alphabetical character.
- "[1-5]" will match any numeric character between 1 and 5.
So to match swimming classes of level 1 or 2, we can replace "Level 3" in the previous example with "Level [1-2]":
= regex.IsMatch("Swimming - Level 2", "Swimming.*Level [1-2]")
= regex.IsMatch("Swimming - Level 3", "Swimming.*Level [1-2]")
Try it yourself
If you'd like to find out more about the ways in which Schematiq can help to simplify filtering tables in Excel, check out our other blog articles: