The Amazon Kinesis Data Generator (KDG) makes it easy to send data to Amazon Kinesis Data Streams or Amazon Data Firehose. Learn how to use the tool and create templates for your records.
NOTE: Setting up the Kinesis Data Generator (KDG) in an AWS account will create a set of Cognito credentials. Users who can authenticate with those credentials will be able to publish to all Kinesis Data Streams and Amazon Data Firehoses in the account. After executing the setup below, you may change the IAM roles that are created to restrict permissions to publish to specific streams or firehoses.
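As an illustration of the restriction mentioned in the note, the following Python sketch builds an IAM policy document that allows publishing only to named resources. The account ID, region, and stream names are hypothetical placeholders, and the action list is an assumption about what publishing needs. The KDG may also require list permissions to populate its stream selectors, which this sketch leaves out.

```python
import json


def restricted_publish_policy(stream_arns, firehose_arns):
    """Build an IAM policy document that allows writes only to the given ARNs."""
    statements = []
    if stream_arns:
        statements.append({
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:PutRecord",
                "kinesis:PutRecords",
            ],
            "Resource": list(stream_arns),
        })
    if firehose_arns:
        statements.append({
            "Effect": "Allow",
            "Action": [
                "firehose:DescribeDeliveryStream",
                "firehose:PutRecord",
                "firehose:PutRecordBatch",
            ],
            "Resource": list(firehose_arns),
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical ARNs, for illustration only.
policy = restricted_publish_policy(
    ["arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"],
    ["arn:aws:firehose:us-east-1:123456789012:deliverystream/example-firehose"],
)
print(json.dumps(policy, indent=2))
```

The printed JSON could be pasted into the policy editor for the role that the setup creates, after replacing the placeholder ARNs with your own.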
Before you can send data to Kinesis, you must first create an Amazon Cognito user in your AWS account with permissions to access Amazon Kinesis. To simplify this process, an AWS Lambda function and an AWS CloudFormation template are provided to create the user and assign just enough permissions to use the KDG.
Because Amazon Cognito is not supported by CloudFormation, much of the setup is done in a Lambda function.
The source code for the function can be downloaded from here.
The CloudFormation template will create the following resources in your AWS account:
The Cognito Lambda function will create the following resources in your AWS account:
Create the CloudFormation stack by clicking the button below. It will take you to the AWS CloudFormation console and start the stack creation wizard. You only need to provide a Username and Password for the user that you will use to log in to the KDG. Accept the defaults for any other options presented by CloudFormation.
Please note that the CloudFormation Template below is currently supported in the following regions:
In addition to the above regions, GovCloud is also supported by manually importing the CloudFormation template in the following regions:
To install the KDG manually with CloudFormation:
Choose Create stack, then Upload a template, and select the downloaded template file.
After the CloudFormation stack has been successfully created, you will need to use a special URL to access the KDG. CloudFormation creates this URL as part of the stack generation, and you can find it in the Outputs section of the CloudFormation stack.
To find the URL, choose the CloudFormation stack, and then choose the Outputs tab as shown below. Simply bookmark this URL in your browser for easy future access to the KDG.
The KDG can generate records using random data based on a template you provide. In the Record Template textarea, provide a template that represents a single record. The KDG will create a unique record based on the template, replacing the placeholders in your template with actual data. The record template can be in any format: JSON, CSV, or unstructured text. Because of this, the KDG does not validate the data before it is sent to Kinesis.
The KDG extends faker.js, an open source random data generator. For full documentation of the items that can be "faked" by faker.js, see the faker.js documentation.
Data elements in a template that need to be replaced for each record use mustache syntax (that is, they are enclosed in double curly braces: {{ replace.this }}). Consider the following data record, representing somebody's first name, last name, age, and IP address:
John,Doe,42,127.0.0.1
The template to generate records of this type:
{{name.firstName}},{{name.lastName}},{{random.number(70)}},{{internet.ip}}
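To make the substitution step concrete, here is a minimal Python sketch of how double-curly-brace placeholders could be replaced with generated values. It is not the KDG's implementation. The GENERATORS table is a hypothetical stand-in for faker.js, and the 0 to 70 inclusive range for random.number(70) is an assumption.

```python
import random
import re

# Hypothetical stand-ins for faker.js generators, keyed by the exact
# expression text used in the template above.
GENERATORS = {
    "name.firstName": lambda rng: rng.choice(["John", "Jane", "Alex"]),
    "name.lastName": lambda rng: rng.choice(["Doe", "Smith", "Lee"]),
    # Assumption: random.number(70) yields an integer from 0 to 70 inclusive.
    "random.number(70)": lambda rng: str(rng.randint(0, 70)),
    "internet.ip": lambda rng: ".".join(str(rng.randint(0, 255)) for _ in range(4)),
}

# Matches {{ expression }} and captures the expression without padding.
PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def render(template, rng):
    """Replace every placeholder with a freshly generated value.

    Raises KeyError for an expression that has no generator.
    """
    def substitute(match):
        return GENERATORS[match.group(1)](rng)

    return PLACEHOLDER.sub(substitute, template)


template = "{{name.firstName}},{{name.lastName}},{{random.number(70)}},{{internet.ip}}"
record = render(template, random.Random(0))
print(record)
```

Each call to render produces a new record, which mirrors the way a single template yields a stream of distinct records.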
Records can be represented in any data structure, such as JSON:
{ "sensorId": 40, "currentTemperature": 76, "status": "OK" }
In this record, assume that the "status" can be only one of three values (OK, WARN, FAIL). Also assume that we want the temperature to be a random value between 10 and 150. The template for this would look like:
{ "sensorId": {{random.number(50)}}, "currentTemperature": {{random.number( { "min":10, "max":150 } )}}, "status": "{{random.arrayElement( ["OK","FAIL","WARN"] )}}" }
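Because the KDG does not validate records before sending them, a consumer may want to check what arrives. The following Python sketch shows one way to verify a rendered sensor record against the ranges used in the template. The function name is hypothetical, and the 0 to 50 inclusive range for sensorId is an assumption about random.number(50).

```python
import json

ALLOWED_STATUSES = {"OK", "WARN", "FAIL"}


def validate_sensor_record(raw):
    """Parse one rendered record and check it against the template's ranges.

    Returns the parsed record, or raises ValueError if a field is out of range.
    """
    record = json.loads(raw)
    # Assumption: random.number(50) yields 0..50 inclusive.
    if not 0 <= record["sensorId"] <= 50:
        raise ValueError("sensorId out of range: %r" % record["sensorId"])
    if not 10 <= record["currentTemperature"] <= 150:
        raise ValueError(
            "currentTemperature out of range: %r" % record["currentTemperature"]
        )
    if record["status"] not in ALLOWED_STATUSES:
        raise ValueError("unexpected status: %r" % record["status"])
    return record


sample = '{ "sensorId": 40, "currentTemperature": 76, "status": "OK" }'
print(validate_sensor_record(sample))
```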
The KDG supports several other templating features, in addition to the native templating provided by faker.js.
You can insert the current date and time into each record by including date.now or date.utc items in your record template. The KDG uses the moment.js library for datetime formatting. Details for creating an appropriate format string for your use case can be found in the moment.js documentation. Several examples are shown here:
{{date.now}} // 2014-09-08T08:02:17-05:00
{{date.now("dddd, MMMM Do YYYY, h:mm:ss a")}} // Sunday, February 14th 2010, 3:25:50 pm
{{date.now("ddd, hA")}} // Sun, 3PM
{{date.now("DD/MMM/YYYY:HH:mm:ss Z")}} // 14/Jul/2009:20:12:22 -07:00
{{date.utc}}
{{date.utc("dddd, MMMM Do YYYY, h:mm:ss a")}}
{{date.utc("ddd, hA")}}
{{date.utc("DD/MMM/YYYY:HH:mm:ss Z")}}
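The difference between a local timestamp and a UTC timestamp for the same instant can be illustrated with a short Python sketch. This uses Python's datetime module rather than moment.js, so it only approximates what date.now and date.utc produce. The fixed instant and the UTC-5 offset are assumptions chosen to match the first example above.

```python
from datetime import datetime, timedelta, timezone


def iso_local(instant, offset_hours):
    """Render an instant as ISO 8601 in a zone with the given UTC offset."""
    local_zone = timezone(timedelta(hours=offset_hours))
    return instant.astimezone(local_zone).isoformat(timespec="seconds")


def iso_utc(instant):
    """Render the same instant as ISO 8601 in UTC."""
    return instant.astimezone(timezone.utc).isoformat(timespec="seconds")


# A fixed instant so the output is reproducible.
instant = datetime(2014, 9, 8, 13, 2, 17, tzinfo=timezone.utc)
print(iso_local(instant, -5))  # 2014-09-08T08:02:17-05:00
print(iso_utc(instant))        # 2014-09-08T13:02:17+00:00
```

Both strings describe the same moment. Only the offset and the wall-clock fields differ.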
Sometimes you don't want randomness to be completely random. You might want to choose elements from an array, but with the randomness weighted so that, over time, each element is chosen a certain proportion of the time relative to the other elements in the array. To accomplish this, use random.weightedArrayElement. It takes a JSON object as input, containing two attributes: weights and data. Each attribute contains a single array. The data array contains the items from which you want the function to choose, and the weights array contains the relative weight of each corresponding element in the data array, that is, the fraction of the time that element should be chosen.
{{random.weightedArrayElement( { "weights": [0.3,0.2,0.5], "data": ["cat","fish","dog"] } )}}
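Weighted selection of this kind is commonly implemented with cumulative weights. The Python sketch below shows that general technique, which is not necessarily how the KDG implements it. The function name mirrors the template helper for readability only.

```python
import bisect
import itertools
import random


def weighted_array_element(weights, data, rng):
    """Pick one item from data, with probability proportional to its weight."""
    if len(weights) != len(data) or not data:
        raise ValueError("weights and data must be non-empty and the same length")
    # Running totals, e.g. [0.3, 0.2, 0.5] -> [0.3, 0.5, 1.0].
    cumulative = list(itertools.accumulate(weights))
    threshold = rng.random() * cumulative[-1]
    # First position whose running total exceeds the threshold.
    index = bisect.bisect_right(cumulative, threshold)
    # Guard against floating-point rounding at the upper edge.
    return data[min(index, len(data) - 1)]


rng = random.Random(42)
counts = {"cat": 0, "fish": 0, "dog": 0}
for _ in range(10000):
    counts[weighted_array_element([0.3, 0.2, 0.5], ["cat", "fish", "dog"], rng)] += 1
print(counts)
```

Over many draws the counts should approach 30% cat, 20% fish, and 50% dog, although any single run will deviate slightly.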
The KDG can use one of two different strategies for calculating the send rate of messages: Constant or Periodic. Select between the two by choosing the "Constant" or "Periodic" tab.
In "Constant" mode, the KDG sends the same number of records each second.
In "Periodic" mode, the KDG sends a variable number of records each second. The rate varies by hour of day and day of week, which allows for simulation of periodic data. The record send rate is pseudo-random: the actual value is drawn from a Gaussian distribution centered on Mu with a standard deviation of Sigma. You can specify a different Mu and Sigma for each hour of each day of the week.
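The per-second rate calculation in "Periodic" mode can be sketched in Python as a draw from a Gaussian distribution. Rounding to a whole number and clamping negative draws to zero are assumptions made here so the result is a valid record count. The text above does not say how the KDG handles those cases.

```python
import random


def records_for_second(mu, sigma, rng):
    """Draw the number of records to send in one second.

    The draw follows a Gaussian centered on mu with standard deviation sigma.
    Assumption: the result is rounded and never negative.
    """
    rate = rng.gauss(mu, sigma)
    return max(0, round(rate))


rng = random.Random(7)
# Hypothetical settings for one hour: Mu = 100 records/second, Sigma = 10.
sample = [records_for_second(100, 10, rng) for _ in range(5)]
print(sample)
```

With Mu = 100 and Sigma = 10, roughly two thirds of the seconds would be expected to fall between 90 and 110 records.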
If "Enable Linear Smoothing" is selected, Mu and Sigma will be adjusted linearly between adjacent hours, with the actual provided values for Mu and Sigma used at the top of the hour and transitioning to the next hour linearly.
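Linear smoothing amounts to interpolating between the values for the current hour and the next hour. The Python sketch below illustrates the idea for a single day. The hourly table is hypothetical, and wrapping from hour 23 back to hour 0 of the same table is a simplification, since the KDG keeps separate values per day of the week.

```python
def smoothed_parameters(hourly, hour, minute, second=0):
    """Interpolate (mu, sigma) linearly between this hour and the next.

    hourly is a list of 24 (mu, sigma) tuples. At the top of the hour the
    configured values are returned unchanged.
    """
    fraction = (minute * 60 + second) / 3600
    mu_now, sigma_now = hourly[hour]
    # Simplification: hour 23 blends toward hour 0 of the same table.
    mu_next, sigma_next = hourly[(hour + 1) % 24]
    mu = mu_now + (mu_next - mu_now) * fraction
    sigma = sigma_now + (sigma_next - sigma_now) * fraction
    return mu, sigma


# Hypothetical schedule: quiet all day except for a busier 10:00 hour.
hourly = [(100, 10)] * 24
hourly[10] = (200, 20)

print(smoothed_parameters(hourly, 9, 0))   # (100.0, 10.0)
print(smoothed_parameters(hourly, 9, 30))  # (150.0, 15.0)
```

Without smoothing, the rate would jump from 100 to 200 at exactly 10:00. With smoothing, it ramps up across the 9:00 hour.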
If "Lock to Real Time" is selected, data will be generated every second, similar to the "Constant" mode. If "Lock to Real Time" is deselected, the KDG will prompt you for start and end datetimes and will generate data for each second between those start and end times in "ticks." The number of seconds of data generated in each "tick" is the value in "Seconds of data per tick," and the data generation thread will yield for "Wait time between ticks" (milliseconds) between ticks to allow other browser processes, including the Kinesis send thread, to execute. In this "Unlocked to real time" mode, {{date.simTime}} will be evaluated in the template as the currently simulated time (as opposed to the wall-clock time returned by {{date.now}}). This is especially useful for generating bulk datasets as quickly as your browser can, rather than generating them in real time.
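The tick mechanism can be sketched in Python as a generator that walks simulated time from the start to the end in batches. Treating the end time as exclusive is an assumption, and the pause between ticks is left to the caller, for example with time.sleep. In the KDG itself the pause yields to other browser work.

```python
from datetime import datetime, timedelta


def simulate_ticks(start, end, seconds_per_tick):
    """Yield one list of simulated timestamps per tick.

    Each tick covers up to seconds_per_tick consecutive seconds. The end
    time is treated as exclusive (an assumption).
    """
    if seconds_per_tick < 1:
        raise ValueError("seconds_per_tick must be at least 1")
    sim_time = start
    while sim_time < end:
        tick = []
        for _ in range(seconds_per_tick):
            if sim_time >= end:
                break
            # sim_time plays the role of {{date.simTime}} for this second.
            tick.append(sim_time)
            sim_time += timedelta(seconds=1)
        yield tick


start = datetime(2024, 1, 1, 0, 0, 0)
end = datetime(2024, 1, 1, 0, 0, 10)
for tick in simulate_ticks(start, end, seconds_per_tick=4):
    print(len(tick), tick[0].isoformat())
```

Ten seconds of simulated time with four seconds per tick produce three ticks, covering four, four, and two seconds. None of this depends on how long the loop takes in wall-clock time.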