QuickSight lets you integrate your SageMaker models while pulling data into SPICE datasets. This way, you don’t have to build separate processes to feed data to SageMaker and bring the enriched data back to QuickSight. Instead, you select the model directly in QuickSight’s dataset layer and map the input and output fields. Each time the dataset is refreshed, the SageMaker model runs in batch transform mode, and the enriched data is pulled into SPICE and made available for visualization.
Exercise 1 - Create a SageMaker role
We will first create a role that will be used to quickly create a sample binary classification model in SageMaker.
From the AWS Console, open IAM, go to Roles and click the Create role button. Select SageMaker from the service list and click the Next: Permissions button.
Click Next: Tags button.
Click Next: Review button.
Enter SageMakerRoleForQuickSightWorkshop as the role name.
Click Create role button.
Leave the tab open. We can reuse it to launch SageMaker console later on.
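For reference, the role created through the console steps above can also be scripted with boto3. This is a minimal sketch: the trust policy is the standard SageMaker service trust, while attaching AmazonSageMakerFullAccess is an assumption that mirrors what the console does when you pick SageMaker from the service list.

```python
import json

ROLE_NAME = "SageMakerRoleForQuickSightWorkshop"

def sagemaker_trust_policy():
    """Trust policy letting the SageMaker service assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

def create_workshop_role(iam_client):
    """Create the workshop role and attach SageMaker permissions.

    AmazonSageMakerFullAccess is an assumption -- it mirrors the
    console flow; adjust to your own least-privilege policy.
    """
    iam_client.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=ROLE_NAME,
        PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    )
```

You would call `create_workshop_role(boto3.client("iam"))` from a session with IAM permissions; the console path above achieves the same result.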
Exercise 2 - Create a SageMaker model
SageMaker Studio’s Autopilot feature lets us upload a training dataset and have SageMaker create the model best suited to make predictions based on that data. This is a really powerful capability that lets us unleash the power of machine learning without getting into the nitty-gritty details. However, this process takes several hours, as SageMaker has to run through 250 model variants to pick out the best one. So, for the purposes of this workshop, we have already run the Autopilot process ahead of time. We used a training dataset in which, for the prior year’s orders, a field indicates whether each order was completed within SLA or not. Various factors could drive this - such as the country where the sale is being made, the sales team involved, or the actual product itself. The model generated from this training data can take current order details and predict whether new orders are likely to be completed within SLA. Let’s now create the best-fit model in your SageMaker environment using the following steps. [If you want to try out the Autopilot feature, the training dataset used to train this model is provided further down the page.]
In Cloud9, open SageMaker Integration folder and run Exercise2-1.py script.
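We don’t show the contents of Exercise2-1.py here, but a script like it typically registers a model through the SageMaker create_model API. The sketch below only builds a request payload of that shape; the inference image URI and model artifact location are hypothetical placeholders that the workshop script supplies for real.

```python
def create_model_request(role_arn, image_uri, model_data_url):
    """Payload for sagemaker.create_model().

    image_uri and model_data_url are hypothetical placeholders; the
    workshop script supplies the real inference image and artifact.
    """
    return {
        "ModelName": "QSWS-OrderCompletionSLAPrediction",
        "PrimaryContainer": {
            "Image": image_uri,          # inference container image
            "ModelDataUrl": model_data_url,  # model.tar.gz in S3
        },
        "ExecutionRoleArn": role_arn,    # role from Exercise 1
    }
```

Passing this dict to `boto3.client("sagemaker").create_model(**request)` would register the model you will see in the SageMaker console later.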
Exercise 3 - Enable QuickSight to access SageMaker
In QuickSight (admin user) tab, click the username from top right and choose Manage QuickSight.
In Management console, click Security & permissions from left panel.
Click Add or remove button.
Scroll down and check the box against Amazon SageMaker.
Click Update button.
Click QuickSight icon to exit management console.
Exercise 4 - Create a dataset
Let’s create a copy of the SaaS-Sales dataset to try out the SageMaker model we created earlier.
If you already have SaaS-Sales dataset created from Author Workshop, use the following steps to create and open a copy of it. (See second set of instructions if you don’t have this dataset created already)
In QuickSight, from datasets view, click on SaaS-Sales dataset.
From dataset menu, click Duplicate dataset button.
Add -SM suffix to dataset name.
Click Duplicate button.
Click on SaaS-Sales-SM dataset (refresh the browser if this dataset is not visible yet) and click Edit dataset button.
Move on to Exercise 5.
If you don’t have SaaS-Sales dataset, take the following steps instead.
In QuickSight, from datasets view, click New dataset button.
Select Upload a file option and choose the SaaS-Sales.csv file from your local machine.
Click Edit settings and prepare data button.
Add -SM suffix to dataset name.
Exercise 5 - Augment with SageMaker
We will now configure the SageMaker model from SaaS-Sales-SM dataset.
Download schema file
This file contains details on all the input fields that the model needs, the output field that it predicts and the kind of batch transform compute instance that should be used to run the model.
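For illustration, a schema file of this kind looks roughly like the following (shown here as a Python dict). Treat the downloaded OrderCompletionSLASchema.json as the authority: the exact keys and the full input column list come from it, and the input columns shown below are assumptions based on the workshop narrative.

```python
import json

# Illustrative shape only -- real keys and fields come from
# OrderCompletionSLASchema.json; input columns here are assumptions.
schema = {
    "inputContentType": "CSV",
    "outputContentType": "CSV",
    "input": [
        {"name": "Country", "type": "STRING"},   # assumed input column
        {"name": "Product", "type": "STRING"},   # assumed input column
    ],
    "output": [
        {"name": "Will SLA be met?", "type": "STRING"}  # predicted field
    ],
    # Batch transform compute settings
    "instanceTypes": ["ml.m5.xlarge"],
    "defaultInstanceType": "ml.m5.xlarge",
    "instanceCount": 1,
}
print(json.dumps(schema, indent=2))
```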
From SaaS-Sales-SM dataset’s edit interface, click Augment with SageMaker button.
From Model drop down, select QSWS-OrderCompletionSLAPrediction.
Click in the Schema box and select the OrderCompletionSLASchema.json file from your local machine.
Click Upload button.
Note that the input and output field lists got populated, then click the Next button.
On Review inputs screen, note that fields from the dataset have been correctly mapped. If names are not identical, you can also do the mapping manually. Click Next button.
On Review outputs screen, note that the output field name has been populated. You can change the name if needed. Click Prepare data button.
Click Save & Publish / Save.
Click QuickSight icon to exit dataset edit screen.
Exercise 6 - Optional - Check SageMaker console
From AWS Console (You can reuse the IAM tab from exercise 1), search and launch SageMaker.
Click Inference > Models from left panel and see the QSWS-OrderCompletionSLAPrediction model created in Exercise 2.
Click Batch transform jobs from left panel (Under Inference section itself). You should be able to see the job triggered from QuickSight dataset. It will have a quicksight-auto-generated- prefix.
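If you prefer to check from code, you could list transform jobs with boto3 and filter on that prefix. The helper below shows just the prefix filter on plain name strings; the commented-out boto3 call is a sketch of where those names would come from.

```python
QS_PREFIX = "quicksight-auto-generated-"

def quicksight_jobs(job_names):
    """Return only the batch transform jobs triggered by QuickSight,
    identified by their quicksight-auto-generated- name prefix."""
    return [name for name in job_names if name.startswith(QS_PREFIX)]

# Fetching real job names would look roughly like:
#   sm = boto3.client("sagemaker")
#   summaries = sm.list_transform_jobs()["TransformJobSummaries"]
#   names = [j["TransformJobName"] for j in summaries]
#   print(quicksight_jobs(names))
```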
Exercise 7 - Optional - Watch autopilot train the model
The batch transform job takes 5-6 minutes to run.
Meanwhile, you can catch a glimpse of how we used SageMaker Studio Autopilot to generate the model used in this workshop. You do not need to do these steps, as we already deployed the model to your account. Just watch to get an idea of how you can leverage machine learning in your own use cases. If you want to try this out, here’s the training data:
Download training data
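For a sense of what an Autopilot run looks like behind the scenes, here is a hedged sketch of a create_auto_ml_job request payload. The S3 URIs, the job name, and the target column name are all hypothetical placeholders; the actual workshop model was trained ahead of time as described above.

```python
def automl_job_request(role_arn, train_s3_uri, output_s3_uri):
    """Payload for sagemaker.create_auto_ml_job().

    The job name, S3 URIs, and target column name below are
    hypothetical placeholders for illustration only.
    """
    return {
        "AutoMLJobName": "qsws-order-sla-autopilot",  # hypothetical name
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,  # training CSV location
                    }
                },
                # Hypothetical label column: whether the order met SLA
                "TargetAttributeName": "CompletedWithinSLA",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ProblemType": "BinaryClassification",
        "RoleArn": role_arn,  # e.g. the role from Exercise 1
    }
```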
Exercise 8 - See the predictions
As noted above, the batch transform takes 5-6 minutes to complete. Once the dataset refresh is completed, we can take a look at the predictions.
From QuickSight datasets view, click on SaaS-Sales-SM dataset.
Check if the data import has been completed. If not, wait for it to complete.
Click the Create analysis button.
From Visual types, choose table visual.
From Field list, select Order Date, Order ID and Will SLA be met? fields.
Sort Order Date in descending order to see predictions for the latest orders. (You might see some future-dated orders; this is because we mocked up orders going through the end of 2021 in the input data.)