Step 2 - Configure pipeline

After you create a project, you need to configure the data pipeline for it. A data pipeline is a set of integrated modules that collect and process the clickstream data sent from your applications. A data pipeline consists of four modules: data ingestion, data processing, data modeling, and reporting. For more information, see Pipeline Management.

This section provides an example of creating a data pipeline with end-to-end serverless infrastructure.

Steps

  1. Sign in to the Clickstream Analytics on AWS console.
  2. In the left navigation pane, choose Projects, select the project you created in Step 1, and then choose View Details in the top right corner to navigate to the project homepage.
  3. Choose Configure pipeline to open the wizard for creating a data pipeline for your project.
  4. On the Basic information page, fill in the form as follows:

    • AWS Region: select an AWS region you want to deploy the data pipeline into, for example, us-east-1.
    • VPC: select a VPC that meets the following requirements (a sketch for checking these programmatically follows the tips below):
      • At least two public subnets across two different AZs (Availability Zones)
      • At least two private subnets across two different AZs
      • One NAT gateway or NAT instance
    • Data collection SDK: Clickstream SDK
    • Data location: select an S3 bucket. (You can create one bucket, and select it after clicking the Refresh button.)

    Tip

    Refer to Security best practices for Amazon S3 to create and configure the Amazon S3 bucket, for example, by enabling Amazon S3 server access logging and S3 Versioning.
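    If you prefer to set up the bucket programmatically, the following is a minimal boto3 sketch of the two best practices mentioned above. The bucket names and region are placeholders, and the log target bucket must already exist and allow the S3 log delivery service to write to it.

    ```python
    import boto3

    # Placeholder names; replace with your own.
    REGION = "us-east-1"
    DATA_BUCKET = "my-clickstream-data-bucket"
    LOG_BUCKET = "my-clickstream-access-logs"

    s3 = boto3.client("s3", region_name=REGION)

    # Create the data bucket (us-east-1 needs no CreateBucketConfiguration).
    s3.create_bucket(Bucket=DATA_BUCKET)

    # Enable S3 Versioning, per the S3 security best practices.
    s3.put_bucket_versioning(
        Bucket=DATA_BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Enable server access logging; LOG_BUCKET must already grant the
    # S3 log delivery service permission to write to it.
    s3.put_bucket_logging(
        Bucket=DATA_BUCKET,
        BucketLoggingStatus={
            "LoggingEnabled": {
                "TargetBucket": LOG_BUCKET,
                "TargetPrefix": "data-bucket-logs/",
            }
        },
    )
    ```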

    Tip

    If you don't have a VPC that meets the criteria, you can quickly create one by using the VPC creation wizard. For more information, see Create a VPC. We also recommend that you refer to Security best practices for your VPC when configuring your VPC.
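    As referenced above, here is a minimal boto3 sketch that approximates the VPC checks. The VPC ID is a placeholder, and using MapPublicIpOnLaunch to identify public subnets is a simplification; strictly, a public subnet is one whose route table has a route to an internet gateway.

    ```python
    import boto3
    from collections import defaultdict

    VPC_ID = "vpc-0123456789abcdef0"  # placeholder; use your own VPC ID

    ec2 = boto3.client("ec2")
    vpc_filter = [{"Name": "vpc-id", "Values": [VPC_ID]}]

    # Group the AZs of the VPC's subnets by public/private.
    azs = defaultdict(set)
    for subnet in ec2.describe_subnets(Filters=vpc_filter)["Subnets"]:
        kind = "public" if subnet["MapPublicIpOnLaunch"] else "private"
        azs[kind].add(subnet["AvailabilityZone"])

    nat_gateways = ec2.describe_nat_gateways(Filters=vpc_filter)["NatGateways"]

    print("Public subnet AZs:", sorted(azs["public"]))
    print("Private subnet AZs:", sorted(azs["private"]))
    print("NAT gateways:", len(nat_gateways))

    meets_requirements = (
        len(azs["public"]) >= 2
        and len(azs["private"]) >= 2
        and len(nat_gateways) >= 1
    )
    print("VPC meets the pipeline requirements:", meets_requirements)
    ```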

  5. Choose Next.

  6. On the Configure ingestion page, fill in the information as follows:

    1. Fill in the form of Ingestion endpoint settings.
      • Public Subnets: Select two public subnets in two different AZs
      • Private Subnets: Select two private subnets in the same AZs as public subnets
      • Ingestion capacity: Keep the default values
      • Enable HTTPS: Uncheck and then Acknowledge the security warning
      • Cross-Origin Resource Sharing (CORS): leave it blank
      • Additional settings: Keep the default values
    2. Fill in the form of Data sink settings (see the sketch after the note below).
      • Sink type: Amazon Kinesis Data Streams (KDS)
      • Provision mode: On-demand
      • In Additional Settings, change Sink Maximum Interval to 60 and Batch Size to 1000
    3. Choose Next to move to the next step.

    Important

    Using HTTP is NOT a recommended configuration for production workloads. This example configuration is intended to help you get started more quickly.
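    For reference, here is a minimal boto3 sketch of creating an on-demand Kinesis data stream like the sink configured above. The stream name is a placeholder, and the solution normally provisions and manages this stream for you, so this is illustrative only.

    ```python
    import boto3

    kinesis = boto3.client("kinesis")

    # Placeholder name; the solution provisions the actual sink stream itself.
    kinesis.create_stream(
        StreamName="clickstream-ingestion-sink",
        StreamModeDetails={"StreamMode": "ON_DEMAND"},  # Provision mode: On-demand
    )

    # Wait until the stream is active before sending data to it.
    kinesis.get_waiter("stream_exists").wait(StreamName="clickstream-ingestion-sink")
    ```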

  7. On the Configure data processing page, fill in the information as follows:

    • In the form of Enable data processing, toggle on Enable data processing
    • In the form of Execution parameters,
      • Data processing interval:
        • Select Fixed Rate
        • Enter 10
        • Select Minutes
      • Event freshness: 35 Days

    Important

    In this example, we set the Data processing interval to 10 minutes so that you can view the data sooner. You can change the interval to be less frequent later to save cost. Refer to Pipeline Management to make changes to the data pipeline.

    • In the form of Enrichment plugins, make sure the two plugins, IP lookup and UA parser, are selected.
    • In the form of Analytics engine, fill in the form as follows (a sketch of the equivalent Redshift Serverless setup follows this list):
      • Check the box for Redshift
      • Select Redshift Serverless
      • Keep Base RPU as 8
      • VPC: select the default VPC or the same one you selected in the previous step
      • Security group: select the default security group
      • Subnet: select three subnets across three different AZs
      • Keep the Athena selection as default
    • Choose Next.
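    As a rough illustration of what the Redshift Serverless settings above correspond to, here is a boto3 sketch. All names, subnet IDs, and security group IDs are placeholders; the solution normally creates these resources for you.

    ```python
    import boto3

    rs = boto3.client("redshift-serverless")

    # Placeholder identifiers; the solution provisions these itself.
    rs.create_namespace(namespaceName="clickstream-namespace")

    rs.create_workgroup(
        workgroupName="clickstream-workgroup",
        namespaceName="clickstream-namespace",
        baseCapacity=8,  # Base RPU: 8
        # Three subnets across three different AZs, per the wizard requirement.
        subnetIds=["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
        securityGroupIds=["sg-0123456789abcdef0"],
    )
    ```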
  8. On the Reporting page, fill in the form as follows:

    • If your AWS account has not subscribed to QuickSight, please follow this guide to subscribe (see the sketch below for checking your subscription status programmatically).
    • Toggle on the option of Enable Analytics Studio
    • Choose Next.
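    To verify your QuickSight subscription programmatically first, here is a minimal boto3 sketch; the account ID is a placeholder.

    ```python
    import boto3

    quicksight = boto3.client("quicksight")

    # Placeholder; replace with your own AWS account ID.
    resp = quicksight.describe_account_subscription(AwsAccountId="123456789012")
    print("QuickSight status:", resp["AccountInfo"]["AccountSubscriptionStatus"])
    ```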
  9. On the Review and launch page, review your pipeline configuration details. If everything is configured properly, choose Create.

We have completed all the steps of configuring a pipeline for your project. The pipeline takes about 20 minutes to create; wait for the pipeline status to change to Active on the pipeline details page.
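If you prefer to track creation progress outside the console, the pipeline modules are deployed as CloudFormation stacks, so you can poll a stack's status with boto3 as in the sketch below. The stack name is hypothetical; find the actual names in the CloudFormation console.

```python
import time

import boto3

cfn = boto3.client("cloudformation")
STACK_NAME = "my-clickstream-pipeline-stack"  # hypothetical; look it up in the CloudFormation console

# Poll until the stack leaves an IN_PROGRESS state.
while True:
    stack = cfn.describe_stacks(StackName=STACK_NAME)["Stacks"][0]
    print("Stack status:", stack["StackStatus"])
    if not stack["StackStatus"].endswith("IN_PROGRESS"):
        break
    time.sleep(30)
```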

Next