Step 2 - Configure pipeline
After you create a project, you need to configure the data pipeline for it. A data pipeline is a set of integrated modules that collect and process the clickstream data sent from your applications. A data pipeline contains four modules, namely data ingestion, data processing, data modeling, and reporting. For more information, see Pipeline Management.
The following example walks through the steps to create a data pipeline with end-to-end serverless infrastructure.
Steps
- Log in to the Clickstream Analytics on AWS console.
- In the left navigation pane, choose Projects, select the project you created in Step 1, and then choose View Details in the top right corner to navigate to the project homepage.
- Choose Configure pipeline, which opens the wizard for creating a data pipeline for your project.
- On the Basic information page, fill in the form as follows:
  - AWS Region: select the AWS Region you want to deploy the data pipeline into, for example, `us-east-1`.
  - VPC: select a VPC that meets the following requirements:
    - At least two public subnets across two different Availability Zones (AZs)
    - At least two private subnets across two different AZs
    - One NAT Gateway or Instance
  - Data collection SDK: `Clickstream SDK`
  - Data location: select an S3 bucket. (You can create one bucket, then choose the Refresh button and select it.)
Tip
Refer to Security best practices for Amazon S3 to create and configure the Amazon S3 bucket. For example, enable Amazon S3 server access logging and S3 Versioning.
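If you prefer to script this, below is a minimal boto3 sketch of the Tip above. The bucket names are hypothetical placeholders, and the access-log bucket must already exist and permit S3 log delivery.

```python
import boto3

region = "us-east-1"
s3 = boto3.client("s3", region_name=region)

data_bucket = "my-clickstream-data"        # hypothetical name
log_bucket = "my-clickstream-access-logs"  # hypothetical name; must already exist
                                           # and allow S3 log delivery

# Create the data bucket. In us-east-1 no CreateBucketConfiguration is needed;
# in other Regions, pass CreateBucketConfiguration={"LocationConstraint": region}.
s3.create_bucket(Bucket=data_bucket)

# Enable S3 Versioning, as recommended in the security best practices.
s3.put_bucket_versioning(
    Bucket=data_bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Enable server access logging, writing logs to the separate log bucket.
s3.put_bucket_logging(
    Bucket=data_bucket,
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": log_bucket,
            "TargetPrefix": "s3-access-logs/",
        }
    },
)
```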
Tip
If you don't have a VPC that meets the criteria, you can quickly create one by using the VPC creation wizard. For more information, see Create a VPC. We also recommend that you refer to Security best practices for your VPC when configuring the VPC.
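If you already have a VPC and want to sanity-check it against the criteria above, here is a rough boto3 sketch. The VPC ID is a hypothetical placeholder, and `MapPublicIpOnLaunch` is used only as a heuristic for identifying public subnets; a precise check would inspect route tables for an internet gateway route.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
vpc_id = "vpc-0123456789abcdef0"  # hypothetical placeholder

subnets = ec2.describe_subnets(
    Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
)["Subnets"]

# Heuristic: subnets that auto-assign public IPs are treated as "public".
public_azs = {s["AvailabilityZone"] for s in subnets if s["MapPublicIpOnLaunch"]}
private_azs = {s["AvailabilityZone"] for s in subnets if not s["MapPublicIpOnLaunch"]}

nat_gateways = ec2.describe_nat_gateways(
    Filters=[
        {"Name": "vpc-id", "Values": [vpc_id]},
        {"Name": "state", "Values": ["available"]},
    ]
)["NatGateways"]

print("public subnet AZs:", public_azs)    # need at least two AZs
print("private subnet AZs:", private_azs)  # need at least two AZs
print("NAT gateways:", len(nat_gateways))  # need at least one
```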
- Choose Next.
- On the Configure ingestion page, fill in the information as follows:
  - Fill in the Ingestion endpoint settings form:
    - Public Subnets: select two public subnets in two different AZs
    - Private Subnets: select two private subnets in the same AZs as the public subnets
    - Ingestion capacity: keep the default values
    - Enable HTTPS: uncheck it, then acknowledge the security warning
    - Cross-Origin Resource Sharing (CORS): leave it blank
    - Additional settings: keep the default values
  - Fill in the Data sink settings form (a sketch for verifying the stream follows the Important note below):
    - Sink type: `Amazon Kinesis Data Stream (KDS)`
    - Provision mode: `On-demand`
    - In Additional Settings, change Sink Maximum Interval to `60` and Batch Size to `1000`
  - Choose Next to move to step 3.
Important
Using HTTP is NOT a recommended configuration for production workloads. This example configuration is only to help you get started more quickly.
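Once the pipeline is active and an SDK is sending events, you can spot-check the Kinesis sink with a short boto3 sketch like the one below. The stream name is a hypothetical placeholder; the actual name appears on the pipeline details page.

```python
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
stream_name = "clickstream-sink-stream"  # hypothetical placeholder

# Read from the first shard of the stream.
shard_id = kinesis.describe_stream(StreamName=stream_name)[
    "StreamDescription"]["Shards"][0]["ShardId"]

iterator = kinesis.get_shard_iterator(
    StreamName=stream_name,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
)["ShardIterator"]

records = kinesis.get_records(ShardIterator=iterator, Limit=10)["Records"]
print(f"fetched {len(records)} records")
```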
- On the Configure data processing page, fill in the information as follows:
  - In the Enable data processing form, toggle on Enable data processing.
  - In the Execution parameters form:
    - Data processing interval:
      - Select `Fixed Rate`
      - Enter `10`
      - Select `Minutes`
    - Event freshness: `35` `Days`
Important
In this example, we set the Data processing interval to 10 minutes so that you can view the data sooner. You can change the interval to be less frequent later to save cost. Refer to Pipeline Management to make changes to the data pipeline.
  - In the Enrichment plugins form, make sure the IP lookup and UA parser plugins are both selected.
  - In the Analytics engine form, fill in the form as follows:
    - Check the box for Redshift
    - Select Redshift Serverless
    - Keep Base RPU as 8 (a sketch for confirming the workgroup follows this step)
    - VPC: select the default VPC or the same VPC you selected in the previous step
    - Security group: select the `default` security group
    - Subnet: select three subnets across three different AZs
    - Keep the Athena selection as default
- Choose Next.
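If you want to confirm the workgroup configuration after the pipeline is created, a small boto3 sketch like the following can read back the Base RPU. The workgroup name is a hypothetical placeholder; the actual name is shown on the pipeline details page after creation.

```python
import boto3

rs = boto3.client("redshift-serverless", region_name="us-east-1")

# "clickstream-workgroup" is a hypothetical placeholder name.
workgroup = rs.get_workgroup(workgroupName="clickstream-workgroup")["workgroup"]

print("base RPU:", workgroup["baseCapacity"])  # expect 8 from this example
print("status:", workgroup["status"])
```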
- On the Reporting page, fill in the form as follows:
  - If your AWS account has not subscribed to QuickSight, please follow this guide to subscribe (see the subscription-check sketch after this step).
  - Toggle on Enable Analytics Studio.
- Choose Next.
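If you are unsure whether the account is already subscribed, a quick boto3 sketch like this can check. It assumes the QuickSight subscription lives in the same account you are signed in to.

```python
import boto3

qs = boto3.client("quicksight", region_name="us-east-1")
account_id = boto3.client("sts").get_caller_identity()["Account"]

try:
    info = qs.describe_account_subscription(AwsAccountId=account_id)["AccountInfo"]
    print("QuickSight edition:", info.get("Edition"))
except qs.exceptions.ResourceNotFoundException:
    # No subscription yet; follow the subscription guide referenced above.
    print("No QuickSight subscription found.")
```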
- On the Review and launch page, review your pipeline configuration details. If everything is configured properly, choose Create.
We have completed all the steps of configuring a pipeline for your project. The pipeline takes about 20 minutes to create; please wait for the pipeline status to change to Active on the pipeline details page.
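After data processing has run at least once, you can count processed events through Athena with a sketch like the one below. The database, table, and output-location names are hypothetical placeholders; the actual names are shown on the pipeline details page.

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query_id = athena.start_query_execution(
    # Hypothetical database and table names for illustration only.
    QueryString="SELECT COUNT(*) FROM clickstream_db.ods_events",
    ResultConfiguration={"OutputLocation": "s3://my-clickstream-data/athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    # Row 0 is the header; row 1 holds the count.
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print("event count:", rows[1]["Data"][0]["VarCharValue"])
else:
    print("query ended in state:", state)
```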