Datasets

In this module, we will use the dataset and analysis that you created in Module 1 (Build your first dashboard).

In Dataset layer, you can pull in data from different sources, define join criteria, apply preliminary filters, provide more business friendly names to columns and prepare the data for visualizing in analyis layer. We will touch upon the most important aspects pertaining to datasets here and will work with some of these in the exercises that follow.

Query modes
QuickSight Datasets can work in either Direct query or SPICE mode.
In Direct query mode, queries are executed against backend data source each time a dashboard/analysis (that uses the dataset) is loaded.
In SPICE mode, QuickSight extracts a point in time snap shot of the data - pulling from all tables defined in the dataset per joins specified therein. This extracted data is stored in QuickSight SPICE layer and is used to drive all requests against that dataset. Data in SPICE layer can be refreshed on a schedule (which can be set from UI) or triggered as last step in your data refresh pipeline via create-ingestion API call.
Query mode can be switched back and forth without impacting the meta data defined in the dataset.

Using SPICE mode gets you two benefits.

  • Scale - QuickSight handles all the load without burdening the underlying data source. So, you don’t have to worry if it is 10 users or 10K users hitting a dataset via one or more dashboards.
  • Speed - SPICE layer is built for QuickSight with performance in mind and provides super fast dashboard loads.

Dynamic query generation
When used in direct query mode, QuickSight can generate smart queries that uses only required tables rather than joining in all tables defined in the dataset. While defining joins, you can specify which table has unique keys and this helps QuickSight decide whether it is safe to drop the table from the query if no fields are being pulled from it.

Custom SQL
If you already have a query that say pulls data from 20 tables and don’t want to remap all the relationships in the UI, you can plug in your query as a Custom SQL in the dataset. (This is relevant to live backend sources; Not applicable to file uploads)
The Custom SQL will act as a regular table in the dataset and can be joined with other tables therein.

Field Folders
If you have a large number of fields in your dataset, field folders can be used to organize the fields. You can create one nested level of field folders as well. For eg - You can create a field folder called Dimensions and have child folders - Product Dim, Date Dim etc within it and actual dimension columns within these child folders.
The folder structure created in DataSets will be visible in analysis layer.

Centralized calculations
Standard calculations that you want to make available to all your BI authors can be included in the dataset itself. In SPICE datasets, row level calculations will be evaluated at data refresh time and the result will be persisted in the dataset. This will help reduce the extent of computation that needs to happen at dashboard/analysis load. For aggregate calculations, only the calculation text is stored and it gets evaluated in context of the visual within the dashboard/analysis at load time.

Dataset as source
In the near future, we will allow you to use a dataset within another dataset (just as you would use a table). A sneak peek of this is provided in the exercises section.

Versioning
Current way to version control datasets is to extract it’s definition using describe-data-set API and store it externally in your code repository. We will soon be releasing in product versioning for datasets which will allow you to see older versions and revert to an older version as and when needed.