1 Vincent Square
Use Regex directly in Excel – and easily scale to filter large data sets.
10 September 2021
Ever wished that you could use regular expressions (regex) in Excel? Or perhaps you’ve been looking for a better way to filter your tables? In this blog, we’ll show how Schematiq can help, by bringing the power of regex to your Excel spreadsheets.
Straight to the point (for those in a hurry!)
Excel’s ‘Filter’ functionality can be really useful, but sometimes you need more control – and regex is the perfect way to do it. There are only three steps:
- Create a Schematiq table containing your data
- Filter this table using regex
- Get the filtered values back into Excel
1. Create a data-linked Schematiq table
2. Filter in the Schematiq Viewer using regex.IsMatch()
3. Get the live values back into Excel
A more detailed example (for those of you with a curious streak)
In this example (and the one above), we assume some knowledge of table filtering using Schematiq – but if this is new to you, don’t fear! We’ve put together a couple of blog articles to explain the basics:
- tbl.Filter: An Introduction to Filtering with Schematiq
- Filtering Schematiq Tables in Excel: all But settled
In the second of these articles, we filtered the table below by ‘Class’ (a column of text entries)
When we knew the exact value we wanted, filtering was easy – the snippet was:
x => x = “Football – Pitch Hire”
But when we wanted to search by a substring, things quickly became more complicated. For instance, to filter items whose class contains “Swimming”, we used this snippet:
x => ISNUMBER(FIND(“Swimming”, x))
Looks quite messy, doesn’t it? Wouldn’t it be nicer if we could use a single function instead of ISNUMBER and FIND? And what if we could also be much more specific with our searches – so we could always match exactly the text we want?
You may well have come across wildcards when using Find/Replace tools: you can restrict to certain file types by searching for “*.txt”, or “*.xlsx”, where the * denotes any series of characters.
Regular expressions are similar to wildcards – but a lot more powerful..
We’ll only cover a few basic examples here to show how to use Schematiq’s regular expression functions in your spreadsheet, but to really appreciate how useful they can be, we’d recommend experimenting yourself – online tools such as RegExr allow you to visualise your regex matches as you type, and using a cheatsheet makes it easy to learn new search options.
Using regular expressions in Schematiq
Schematiq introduces a new family of functions that allow you to use regex in Excel. For now, we’ll look at just one of these functions – regex.IsMatch:
= regex.IsMatch(text, pattern, [caseSensitive], [startAt])
Like FIND, this function takes some input text, and searches it for a regex pattern (a string which specifies the format of the text that we want to find). Additional options allow you to choose whether we want a case sensitive search (FALSE by default), and where to start searching from in the text, if not from the beginning. Unlike FIND, it returns TRUE/FALSE if the pattern is found, instead of the starting position – this makes it useful for filtering!
To show how to use it, let’s apply it to an example from our Sports Participation data. To follow along yourself, you can download the spreadsheet used in these examples here. (This will require an active Schematiq subscription – please contact us below to get your free trial!)
We had earlier tried to search for “Swimming” within the “Class” column using a combination of ISNUMBER and FIND – let’s remind ourselves how this would work as a cell formula:
= ISNUMBER(FIND(“Swimming”, “Swimming – Level 1”))
Now, let’s use regex.IsMatch to make things simpler. To search for substrings like “Swimming” using regex, we don’t need to do anything clever with the pattern – we can use the substring as it is. The function can then be entered into a cell, just like a built-in Excel function:
= regex.IsMatch(“Swimming – Level 1”, “Swimming”)
= regex.IsMatch(“Football – Pitch Hire”, “Swimming”)
Filtering tables using regex.IsMatch
We can combine regex.IsMatch with tbl.Filter to make it easier to filter Schematiq tables by a text column.
Continuing with our Sports Participation example, we can use regex.IsMatch to create a new filtering snippet to match all classes containing the word “Swimming”:
x => regex.IsMatch(x, “Swimming”)
We can then apply this filter function using tbl.Filter:
= tbl.Filter(C21, “Class”, C22)
PRO-tip: Advanced regular expressions
Regular expressions are hugely versatile, and we can do a lot more than simply matching exact text strings or finding substrings. Let’s look at two more advanced examples:
1. Using special characters: “.” and ” *”
First, let’s introduce some special characters:
. (full stop) matches any single character – for instance, a pattern of “gr.y” will match “grey”, “gray”, or indeed “grxy” or “gr-y”.
* (asterisk) is used after another character, and tells the regex search to match the character 0 or more times – so “gra*y” (where “*” is applied to “a”) will match “gry”, “gray” or “graaaay” (with as many “a”s as desired).
In combination, these two characters can be used to search for text of the form “word1__[some other text]__word2”. The pattern “.*” will match 0 or more characters of any type – so “word1.*word2” will match any string containing word1 followed by word 2.
For a more concrete example, let’s return to our sports classes. Suppose we wanted to find all “Swimming” classes of “Level 3”. (Of course, we know that only one such class exists – but this might not be the case in general.)
We can construct a regex pattern that will match text containing the word “Swimming” and then (at some later point) the words “Level 3”:
We can then apply this regex pattern using regex.IsMatch:
= regex.IsMatch(“Swimming – Level 3”, “Swimming.*Level 3”)
= regex.IsMatch(“Swimming – Level 1”, “Swimming.*Level 3”)
2. Character sets: “Level [1-2]”
What about if we wanted to match all Swimming classes at Level 1 or 2?
Regex has a handy feature called character sets, which allow us to match any character within a specified range. For example:
“[a-z]” will match any single lowercase alphabetical character.
“[1-5]” will match any numeric character between 1 and 5.
So to match swimming classes of level 1 or 2, we can replace “Level 3” in the previous example with “Level [1-2]“:
= regex.IsMatch(“Swimming – Level 2”, “Swimming.*Level [1-2]”)
= regex.IsMatch(“Swimming – Level 3”, “Swimming.*Level [1-2]”)
If you would like to find out more, schedule a demo or even just get some advice on the best approach to quantifying and managing spreadsheet risk, please do get in touch.Contact Us