This blog post explains how to import a CSV file into Excel using Schematiq, and how this can help with data cleaning and re-formatting.
Using a rich data source to gain insights can be great, especially when that source contains exactly the data that you're looking for, and in precisely the right format. This type of analysis is made for Excel, and a previous blog post explains how to use Schematiq within Excel to further improve this process.
But the first step is to get hold of this data - and this can be frustrating work.
You may, in some fortunate cases, have started with a well-formatted CSV file. But sometimes your data-source… is maybe not so kind.
How can we improve this experience of loading, cleaning and formatting, so that we can be un-hindered in our analysis?
This post deals with actually getting our data from a local file source, in this case a comma-separated-values (CSV) file. Let's also suppose that the creator of our CSV thinks that the “C” stands for semi-colon, and that “V” is a loose approximation to the values we want…so some data cleaning is required...
We’ll show how Schematiq achieves this with less manual work than Excel alone.
Let’s take a look at some example “JanuarySales.csv” data, if we try to load it directly in to Excel:
The data has been given individual columns in our sheet – although we would still need to fix the upper-case characters in the “Item” column.
If we wanted to do this for a different CSV file for every month, we’re spending quite a bit of our time applying the same formatting; each time contemplating if this task is quite worth a macro!
A better way?
We can use Schematiq to insert the CSV file directly in to Excel. We can keep this either as a Schematiq table (for further editing and processing), or use the "Get Values" wizard to paste these in to the Excel sheet. This works for a single CSV file, but below we will show how to import and combine the data in multiple CSV files using the manual tbl.ImportDelimited function.
Having seen how we can use the wizard to import the same CSV file, let’s now load this file directly in to Excel using Schematiq’s file.Open function:
C7: = file.Open(path)
We see the file contents in the Viewer, and can load this directly in to a Schematiq table, using tbl.ImportDelimited:
C9: = tbl.ImportDelimited(source, [delimiters])
= tbl.ImportDelimited(C7, “;”)
We can fix the casing for every row of our table, by applying Excel’s LOWER function in a single function call to tbl.CalculateColumn:
C11: = tbl.CalculateColumn(table, [inputColumnNames], function)
= tbl.CalculateColumn(C9, “Item”, “LOWER”)
Creating our own function
These steps have shown us that given a file name in cell C5, we are able to open the file, create a Schematiq table of the data, and finally format the result.
This behaviour can be re-used for any other file, and was much quicker to create than recording a macro to change every column. To do this, we specify to Schematiq that this behaviour should be copied for different values of the input call C5.
We can do this using fn.FromTemplate:
C13: = fn.FromTemplate(templateRange, [inputs], [outputs])
= fn.FromTemplate(B5:C11, C5, C11)
This has created a template function that we can apply to format and load in CSV files for every month, without manually formatting each one.
Opening multiple files from a folder
We could have dozens of CSV files, all of which we’d like to open and analyse – and Schematiq provides the file.ListFiles function to find all matching files, by searching within a specified folder – in this case we’re interested in all CSV files:
C16: = file.ListFiles(searchPattern)
Now we have a list of files, we can apply our tbl.CalculateColumn function again, to apply this function to every row. Since the result of loading a file produces a table of the data, this means that every row will contain a single loaded table for that month:
=tbl.CalculateColumn(C16, D18, C13, E18)
Using Schematiq’s tbl.Unpack function will give each row in "Loaded Sales" its own row in the main table, giving 10 rows in total - one for each sale. Then we can select the three columns from the table that we need:
C20: = tbl.Unpack(table, [columnToUnpack])
C22: = tbl.SelectColumns(table, columnNames)
Finally to conclude, we would like a summary table consisting of the specific orders placed by each person. We can do this using Schematiq’s tbl.Group function where we can specify which columns we want to make distinct, and which columns we want to aggregate. The following two reports are possible:
1) Group by the item only:
C26 = tbl.Group(table, [columnsToGroupBy], [columnsToAggregate])
= tbl.Group(C22, D26, E26)
2) Group by the person and the item they ordered:
=tbl.Group(C22, D28:D29, E28)
This example has shown how to load and clean multiple poorly-formatted files using Schematiq. These files can then be used for further analysis, and the related blog post here demonstrates how to join this data with a price list using Schematiq without the use of VLOOKUP.
You can download the workbook used in this example here. This will require an active Schematiq licence – if you would like a trial, please contact us.