Upgrade the solution
Important
- Upgrading directly from version 1.0.x to version 1.1.6 or later is not supported. You must upgrade to version 1.1.5 first.
- After upgrading the web console from a 1.1 version earlier than 1.1.6, you can continue to view the project dashboards. However, you cannot explore the existing Clickstream data because the data schemas have changed. To continue using Explorations, you must also upgrade the data pipeline and migrate the existing data to the new data schemas (if you want to explore historical data).
Planning and Preparation
- Data Processing Interval (only applicable when upgrading from a version earlier than v1.1.6 with data processing enabled): The data schema has been updated since version 1.1.6. Ensure no data processing job is running while you upgrade the existing pipeline. The pipeline upgrade takes approximately 20 minutes. You can update the existing pipeline to increase the data processing interval, and check the EMR Serverless application in the console for running jobs.
Upgrade Process
Upgrade web console stack
- Log in to AWS CloudFormation console, select your existing web console stack, and choose Update.
- Select Replace current template.
- Under Specify template:
    - Select Amazon S3 URL.
    - Refer to the table below to find the link for your deployment type.
    - Paste the link in the Amazon S3 URL box.
    - Choose Next.
| Template | Description |
| --- | --- |
| Use Cognito for authentication | Deploy as public service in AWS regions |
| Use Cognito for authentication with custom domain | Deploy as public service with custom domain in AWS regions |
| Use OIDC for authentication | Deploy as public service in AWS regions |
| Use OIDC for authentication with custom domain | Deploy as public service with custom domain in AWS regions |
| Use OIDC for authentication within VPC | Deploy as private service within VPC in AWS regions |
| Use OIDC for authentication with custom domain in AWS China | Deploy as public service with custom domain in AWS China regions |
| Use OIDC for authentication within VPC in AWS China | Deploy as private service within VPC in AWS China regions |

- Under Parameters, review the parameters for the template and modify them as necessary. Refer to Deployment for details about the parameters.
- Choose Next.
- On the Configure stack options page, choose Next.
- On the Review page, review and confirm the settings. Be sure to check the box acknowledging that the template might create AWS Identity and Access Management (IAM) resources.
- Choose View change set and verify the changes.
- Choose Execute change set to deploy the stack.
You can view the status of the stack in the AWS CloudFormation console in the Status column. You should receive an UPDATE_COMPLETE status after a few minutes.
Upgrade the Project Pipeline
Important
If you encounter any issues during the upgrade process, refer to Troubleshooting for more information.
- Log in to the web console of the solution.
- Go to Projects, and choose the project to be upgraded.
- Click the project ID or the View Details button, which directs you to the pipeline detail page.
- On the pipeline detail page, click the Upgrade button.
- You will be prompted to confirm the upgrade action.
- Click Confirm. The pipeline will be in Updating status.
You can view the status of the pipeline in the solution console in the Status column. After a few minutes, you should see an Active status.
Post-Upgrade Actions
This section provides instructions for post-upgrade actions.
Ingestion
As of version 1.1.7, this solution uses launch templates. After upgrading the data ingestion module, complete the following steps to replace the Amazon EC2 instances used by Amazon ECS with the new launch template configuration.
- Increase the desired task number by updating the Amazon ECS service.
- After the newly added Amazon ECS tasks have started successfully, manually stop the old tasks.
- Manually terminate the old Amazon EC2 instances.
Data Modeling
Upgrade the Data Schema and Out-of-the-box Dashboards
The solution automatically and asynchronously upgrades the views and materialized views used by the dashboards after the project pipeline is upgraded. The duration of the update depends on the workload of the Redshift cluster and the existing data volume, and can take minutes to hours. You can track the progress in the Redshift Schemas section in the Processing tab of the Pipeline Detail page. If the post-configuration job fails, you can open the workflow execution through its link and rerun the job via Actions > Redrive, or start a New execution with the input unchanged.
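If you also want to spot-check the materialized views from the Redshift query editor, the standard STV_MV_INFO system view can show what exists in your app's schema and whether it is stale. This query is not part of the solution's documented steps; it is a minimal sketch assuming your schema name equals the app ID, as elsewhere in this guide.

```sql
-- Optional spot check (not part of the documented upgrade steps):
-- list materialized views in the app schema and whether they are stale.
-- Replace <app-id> with your actual app id.
SELECT name, is_stale, state
FROM STV_MV_INFO
WHERE schema = '<app-id>'
ORDER BY name;
```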
Migrate the Existing Data (only applicable when upgrading from a version earlier than v1.1.6)
Important
The data migration process is CPU-intensive and will incur additional cost. Before starting the migration, ensure that the load on your Redshift is low. It is also advisable to temporarily increase the RPUs of Redshift Serverless or the cluster size when migrating large volumes of data.
In our benchmark, we migrated the events from the last 30 days. Here are the details:
- Average number of events per day: 10 million
- Total events for 30 days: 300 million
- Redshift RPU: 32 RPUs
- Total duration: 4 hours 45 minutes
- Total cost: $47.77
Please note that the total duration and cost will increase as the number of events you migrate increases.
- Open Redshift query editor v2. You can refer to the AWS document Working with query editor v2 to log in and query data using Redshift query editor v2.

Note
You must use the `admin` user or a user with ownership permission on the schema (known as the app ID). See this FAQ for more details.
- Select the Serverless workgroup or provisioned cluster, then <project-id> -> <app-id> -> Tables, and ensure that the tables for the app ID are listed there.
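If you prefer a quick query over browsing the navigation tree, the following optional check (not part of the solution's documented steps) uses the standard information_schema catalog, with the same <app-id> placeholder used elsewhere in this guide, to list the tables in your app's schema.

```sql
-- Optional check: list the tables in your app's schema.
-- Replace <app-id> with your actual app id.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = '<app-id>'
ORDER BY table_name;
```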
- Create a new SQL editor tab and select your project's schema.
- Customize the date range as desired, and execute the following SQL in the editor to migrate events from the past 30 days, or any number of days up to the present, to the new tables.
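The exact migration statement shipped with the solution is not reproduced in this section. Based on the stored procedures referenced later in this guide (notably sp_migrate_data_to_v2(nday integer)), the call most likely looks like the sketch below; confirm it against the solution's documentation before running it.

```sql
-- Hedged sketch based on the sp_migrate_data_to_v2(nday integer) procedure
-- referenced later in this guide; verify against the solution's documentation.
-- Replace <app-id> with your actual app id, and 30 with the number of days
-- of historical events you want to migrate.
CALL "<app-id>".sp_migrate_data_to_v2(30);
```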
- Wait for the SQL to complete. The execution time depends on the volume of data in the `events` table. In our test, migrating 300 million events using 32 RPUs took approximately 2 hours and 5 minutes.
- Execute the following SQL to check the stored procedure execution log; ensure there are no errors. If there are any interruptions, timeouts, or other errors, you can re-execute step 4 to continue the data migration.
```sql
-- please replace <app-id> with your actual app id
SELECT * FROM "<app-id>"."clickstream_log_v2" WHERE log_name = 'sp_migrate_event_to_v2' ORDER BY log_date DESC;
SELECT * FROM "<app-id>"."clickstream_log_v2" WHERE log_name = 'sp_migrate_user_to_v2' ORDER BY log_date DESC;
SELECT * FROM "<app-id>"."clickstream_log_v2" WHERE log_name = 'sp_migrate_item_to_v2' ORDER BY log_date DESC;
SELECT * FROM "<app-id>"."clickstream_log_v2" WHERE log_name = 'sp_migrate_session_to_v2' ORDER BY log_date DESC;
SELECT * FROM "<app-id>"."clickstream_log_v2" WHERE log_name = 'sp_migrate_data_to_v2' ORDER BY log_date DESC;
```
- Populate the event data to the `clickstream_event_base_view` table.

```sql
-- please replace <app-id> with your actual app id
-- update the day range (30 days in the example below) based on your needs
CALL "<app-id>".clickstream_event_base_view_sp(NULL, NULL, 24*30);
```
Note
It is recommended to refresh the `clickstream_event_base_view` in batches, especially in the following scenarios:
- When new event load jobs come in before the migration job completes.
- When the volume of migrated data is large (for example, 100 million events in a batch).
Note that refreshing the data in batches needs to be done based on the event timestamp: call the stored procedure multiple times, in order from the oldest to the newest event timestamps.
For example, to refresh data between 2024-05-10 00:00:00 and 2024-05-12 00:00:00, execute the following SQL. Based on our test parameters mentioned above, this process will take about 20 minutes.
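The SQL for this example is not reproduced in this section. Judging from the clickstream_event_base_view_sp call shown above, the procedure appears to accept a start timestamp, an end timestamp, and an hour range; under that assumption, a refresh for this window would look roughly like the sketch below. Verify the parameter order and types against the solution's documentation before running it.

```sql
-- Hedged sketch: assumes clickstream_event_base_view_sp(start_ts, end_ts, n_hours),
-- inferred from the CALL "<app-id>".clickstream_event_base_view_sp(NULL, NULL, 24*30) example above.
-- Replace <app-id> with your actual app id.
CALL "<app-id>".clickstream_event_base_view_sp(
    TIMESTAMP '2024-05-10 00:00:00',
    TIMESTAMP '2024-05-12 00:00:00',
    NULL
);
```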
- Follow this guide to calculate metrics for the new preset dashboard based on the migrated data. Based on our test parameters mentioned above, this process will take about 2 hours and 20 minutes.
- If you don't have other applications using the legacy tables and views, you could run the following SQL to clean up the legacy views and tables to save Redshift storage.

```sql
-- please replace `<app-id>` with your actual app id
DROP TABLE "<app-id>".event CASCADE;
DROP TABLE "<app-id>".item CASCADE;
DROP TABLE "<app-id>".user CASCADE;
DROP TABLE "<app-id>".event_parameter CASCADE;
DROP PROCEDURE "<app-id>".sp_migrate_event_to_v2(nday integer);
DROP PROCEDURE "<app-id>".sp_migrate_item_to_v2(nday integer);
DROP PROCEDURE "<app-id>".sp_clear_expired_events(retention_range_days integer);
DROP PROCEDURE "<app-id>".sp_migrate_data_to_v2(nday integer);
DROP PROCEDURE "<app-id>".sp_migrate_user_to_v2();
DROP PROCEDURE "<app-id>".sp_migrate_session_to_v2();
DROP PROCEDURE "<app-id>".sp_clear_item_and_user();
```