Blog

Blog

Filtering Schematiq Tables in Excel: allBut Settled

December 7, 2017 | Marcus Nielsen

For anyone who works with large datasets, filtering is indispensable. When you are starting from a huge number of data rows, in order to draw any meaningful conclusions you need to carefully decide which items are useful (and should be included), and which are irrelevant.

In Excel, we could just delete the offending rows... but then our data would be lost forever (unless we press "Undo")! Much better, we can use the "Filter" option - you may be familiar with the little arrows and drop-down menus that appear next to your table headers when in filtering mode. You can manually select which items to include, or you can filter using more advanced rules, e.g. only include rows where "Item count" is greater than 10.

This works well for Excel tables - but what about Schematiq tables? How can we combine the functionality of Excel filtering with the speed and efficiency of Schematiq?

The answer is to use Schematiq's tbl.Filter function:

= tbl.Filter(table, inputColumnNames, function, [allBut])

tbl.Filter allows us to extract rows from our input table by choosing input columns and applying a filtering function to them (this should become clearer in the example below).

Additionally, as of the Schematiq v4.0 update, tbl.Filter now takes a new optional argument: allBut. By default, allBut is FALSE; however, if set to TRUE, instead of the new table containing only rows matching the filter, it contains all of the original rows but for those matching the filter.

In this blog post, we will explore a scenario where we can use the allBut argument to easily exclude unwanted rows in our data.

An Example: Sports Centre Data

The data we will look at will be from a (fictional) sports centre, which records daily attendances to each of their types of sports classes, and usage of their football pitch.

Excel table.png

Suppose the centre wishes to rebalance the number of places available in each class, so that it can maximise the overall attendance. Looking at the rows of data, we can see that the "Football - Pitch Hire" row won't be relevant for our analysis, because it isn't actually a "class", as such. How can we create a table excluding this row?

As usual, the first thing we do is create a Schematiq table from the Excel data:

D18:
= tbl.Create(B7:H13, B6:H6)
= tbl.Create([data], [columnNames])

tblCreate.png

Looking in the Viewer, we can see that we have all our data, with the correct headers. Now, we can get to work on processing the data - enter our tbl.Filter function. Just like Excel's "Filter" option, we can select certain rows to include or exclude from our dataset, and we create a new table with only the data items that we want.

First, we need to create a snippet which will match the "Football - Pitch Hire" row:

D20:
x => x = "Football - Pitch Hire"

Similar to an Excel function, this snippet will return TRUE if the input (x) is "Football - Pitch Hire", and FALSE otherwise.

Using tbl.Filter, we will apply this with the "Class" column as an input - so that for each row, we look at the "Class" item and calculate the result of our filtering function. Then:

- If allBut is FALSE, the new table will contain only the TRUE rows.
- If allBut is TRUE, the new table will contain only the FALSE rows.

We can now filter our table:

D22:
= tbl.Filter(D18, "Class", D20, TRUE)
= tbl.Filter(table, inputColumnNames, function, [allBut])

tblFilter allBut football.png

We can use the Viewer to check that the new table contains all the original data, apart from the "Football - Pitch Hire" row - it does!

Now we have a table with only the rows that we want, and we can continue our analysis using other Schematiq add-in functions such as tbl.CalculateColumn.

Note that the allBut argument gives us a really easy way to partition a table - for a given snippet, we can create one table with all the matching rows (allBut = FALSE), and another with all the rows which don't match (allBut = TRUE). This way, we don't lose any rows from the original table, and we can choose to process each of our two new tables separately.


PRO-tip: Using advanced criteria to filter tables

Searching for substrings

Suppose that instead of excluding football pitch hire, we wanted to exclude the three swimming classes. Again, we can do this with tbl.Filter, but we will need a more complicated function to filter the rows.

We would like to be able to match all rows containing the word "Swimming" in the "Class" column. How can we create a function which does this for us?

We can make use of Excel's FIND function, which searches for a substring within the provided text, and returns the starting position if found. If not found, it returns #VALUE. Therefore, we can create a filtering function by combining FIND with ISNUMBER:

D25:
x => ISNUMBER(FIND("Swimming", x))

This returns TRUE if "Swimming" is found within the input (x), and FALSE otherwise. We want to search for "Swimming" within the "Class" column, so we use this as our inputColumnName argument to tbl.Filter:

D27:
=tbl.Filter(D18, "Class", D25, TRUE)

tblFilter allBut swimming.png

The Viewer shows that we've successfully managed to exclude the swimming classes from the new table!

 

Filtering using multiple input columns

Why stop at filtering by a single column? Using our snippet functions, we can use multiple columns as inputs - for instance, to include (i.e. allBut = FALSE) only those rows where "Class" is "Yoga" or "Places / day" is less than 10, we create a snippet with two inputs:

D30:
(x, y) => OR(x = "Yoga", y < 10)
D32:
=tbl.Filter(D18, {"Class", "Places / day"}, D30, FALSE)

tblFilter Yoga and Places-962056-edited.png


Try it yourself

If you'd like to follow this example, you can find the sports centre spreadsheet here. This sheet requires an active Schematiq subscription - please contact us for a trial licence.

Read more


Tags: Excel Schematiq Table Functions tbl.Filter Filter