Blog

Blog

Advanced Table Filtering in Excel: Using Regular Expressions and Schematiq to Filter Data

December 18, 2017 | Marcus Nielsen

Ever wished that you could use regular expressions (regex) in Excel? Or perhaps you've been looking for a better way to filter your tables? In this blog, we'll show how Schematiq can help, by bringing the power of regex to your Excel spreadsheets.

regexIsMatch - Telephone numbers

Straight to the point (for those in a hurry!)

Excel's 'Filter' functionality can be really useful, but sometimes you need more control - and regex is the perfect way to do it. There are only three steps:

  1. Create a Schematiq table containing your data
  2. Filter this table using regex
  3. Get the filtered values back into Excel

 

1. Create a data-linked Schematiq table

tblCreate - zoomed

 

2. Filter in the Schematiq Viewer using regex.IsMatch()

filter

 

3. Get the live values back into Excel

values

 

What are you waiting for?

Get your free trial of Schematiq and start applying regex filtering for yourself! You can find the example spreadsheet here.

Get Free Schematiq Trial


A more detailed example (for those with a curious streak)

In this example (and the one above), we assume some knowledge of table filtering using Schematiq - but if this is new to you, don't fear! We've put together a couple of blog articles to explain the basics:

  1. tbl.Filter: An Introduction to Filtering With Schematiq
  2. Filtering Schematiq tables in Excel: allBut settled

And if you haven't already, get your free trial of Schematiq to follow along! You can download the spreadsheet used in the example below here.

 


In the second of these articles, we filtered the table below by "Class" (a column of text entries).

SportsClassParticipationData.png

When we knew the exact value we wanted, filtering was easy - the snippet was:

x => x = "Football - Pitch Hire"

But when we wanted to search by a substring, things quickly became more complicated. For instance, to filter items whose class contains "Swimming", we used this snippet:

x => ISNUMBER(FIND("Swimming", x))

Looks quite messy, doesn't it? Wouldn't it be nicer if we could use a single function instead of ISNUMBER and FIND? And what if we could also be much more specific with our searches - so we could always match exactly the text we want?

Regular expressions

You may well have come across wildcards when using Find/Replace tools: you can restrict to certain file types by searching for "*.txt", or "*.xlsx", where the * denotes any series of characters.

Regular expressions are similar to wildcards - but a lot more powerful..

We'll only cover a few basic examples here to show how to use Schematiq's regular expression functions in your spreadsheet, but to really appreciate how useful they can be, we'd recommend experimenting yourself - online tools such as RegExr allow you to visualise your regex matches as you type, and using a cheatsheet makes it easy to learn new search options.

Using regular expressions in Schematiq

Schematiq introduces a new family of functions that allow you to use regex in Excel. For now, we'll look at just one of these functions - regex.IsMatch:

= regex.IsMatch(text, pattern, [caseSensitive], [startAt])

Like FIND, this function takes some input text, and searches it for a regex pattern (a string which specifies the format of the text that we want to find). Additional options allow you to choose whether we want a case sensitive search (FALSE by default), and where to start searching from in the text, if not from the beginning. Unlike FIND, it returns TRUE/FALSE if the pattern is found, instead of the starting position - this makes it useful for filtering!

To show how to use it, let's apply it to an example from our Sports Participation data. To follow along yourself, you can download the spreadsheet used in these examples here. (This will require an active Schematiq subscription - please contact us below to get your free trial!)

We had earlier tried to search for "Swimming" within the "Class" column using a combination of ISNUMBER and FIND - let's remind ourselves how this would work as a cell formula:

C4:
= ISNUMBER(FIND("Swimming", "Swimming - Level 1"))

= TRUE

Now, let's use regex.IsMatch to make things simpler. To search for substrings like "Swimming" using regex, we don't need to do anything clever with the pattern - we can use the substring as it is. The function can then be entered into a cell, just like a built-in Excel function:

regexIsMatch - Swimming

C5:
= regex.IsMatch("Swimming - Level 1", "Swimming")

= TRUE

C6:
= regex.IsMatch("Football - Pitch Hire", "Swimming")

= FALSE

Filtering tables using regex.IsMatch

We can combine regex.IsMatch with tbl.Filter to make it easier to filter Schematiq tables by a text column.

Continuing with our Sports Participation example, we can use regex.IsMatch to create a new filtering snippet to match all classes containing the word "Swimming":

C22:
x => regex.IsMatch(x, "Swimming")

We can then apply this filter function using tbl.Filter:

C23:
= tbl.Filter(C21, "Class", C22)

FilteredTable.png

PRO-tip: Advanced regular expressions

Regular expressions are hugely versatile, and we can do a lot more than simply matching exact text strings or finding substrings. Let's look at two more advanced examples:

1. Using special characters: "." and " *"

First, let's introduce some special characters:

  • . (full stop) matches any single character - for instance, a pattern of "gr.y" will match "grey", "gray", or indeed "grxy" or "gr-y".

  • * (asterisk) is used after another character, and tells the regex search to match the character 0 or more times - so "gra*y" (where "*" is applied to "a") will match "gry", "gray" or "graaaay" (with as many "a"s as desired).

In combination, these two characters can be used to search for text of the form "word1__[some other text]__word2". The pattern ".*" will match 0 or more characters of any type - so "word1.*word2" will match any string containing word1 followed by word 2.

For a more concrete example, let's return to our sports classes. Suppose we wanted to find all "Swimming" classes of "Level 3". (Of course, we know that only one such class exists - but this might not be the case in general.)

We can construct a regex pattern that will match text containing the word "Swimming" and then (at some later point) the words "Level 3":

"Swimming.*Level 3"

We can then apply this regex pattern using regex.IsMatch:

C28:
= regex.IsMatch("Swimming - Level 3", "Swimming.*Level 3")

= TRUE

C29:
= regex.IsMatch("Swimming - Level 1", "Swimming.*Level 3")

= FALSE

2. Character sets: "Level [1-2]"

What about if we wanted to match all Swimming classes at Level 1 or 2?

Regex has a handy feature called character sets, which allow us to match any character within a specified range. For example:

  • "[a-z]" will match any single lowercase alphabetical character.
  • "[1-5]" will match any numeric character between 1 and 5.

So to match swimming classes of level 1 or 2, we can replace "Level 3" in the previous example with "Level [1-2]":

C31:
= regex.IsMatch("Swimming - Level 2", "Swimming.*Level [1-2]")

= TRUE

C32:
= regex.IsMatch("Swimming - Level 3", "Swimming.*Level [1-2]")

= FALSE


Try it yourself

You can download the spreadsheet used in these examples here.

Read more

If you'd like to find out more about the ways in which Schematiq can help to simplify filtering tables in Excel, check out our other blog articles:

Tags: Excel Schematiq Table Functions Filter tbl.Filter Regex regex.IsMatch