AWS Glue DataBrew

What is AWS Glue DataBrew?

AWS Glue DataBrew is a visual data preparation tool that enables users to clean and normalize data without writing any code. Using DataBrew helps reduce the time it takes to prepare data for analytics and machine learning (ML) by up to 80 percent, compared to custom-developed data preparation. You can choose from over 250 ready-made transformations to automate data preparation tasks, such as filtering anomalies, converting data to standard formats, and correcting invalid values.

How does AWS Orbit Workbench integrate with AWS Glue DataBrew?

AWS Glue DataBrew is installed as a plugin to the JupyterLab environment, so you can use it from within JupyterLab. Some DataBrew features, as integrated with AWS Orbit Workbench, are shown below. You can find out more in the AWS Glue DataBrew documentation.

Project

The interactive data preparation workspace in DataBrew is called a project. Using a data project, you manage a collection of related items: data, transformations, and scheduled processes. As part of creating a project, you choose or create a dataset to work on. Next, you create a recipe, which is a set of instructions or steps that you want DataBrew to act on. These actions transform your raw data into a form that is ready to be consumed by your data pipeline. You can access your AWS Glue DataBrew projects directly from AWS Orbit Workbench and use them to work on your data.
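
Projects can also be inspected programmatically, for example from a notebook cell. The following is a minimal sketch, not part of the Orbit Workbench plugin itself: it assumes a boto3 DataBrew client created with `boto3.client("databrew")` and credentials that allow `ListProjects`. The helper name `list_project_names` is hypothetical.

```python
def list_project_names(client):
    """Return the names of all DataBrew projects visible to `client`.

    `client` is assumed to be a boto3 DataBrew client, e.g.
    boto3.client("databrew"); this helper is illustrative only.
    """
    names = []
    kwargs = {}
    while True:
        # list_projects returns one page of results at a time.
        response = client.list_projects(**kwargs)
        names.extend(project["Name"] for project in response.get("Projects", []))
        token = response.get("NextToken")
        if not token:
            return names
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = token
```

In a notebook this might be called as `list_project_names(boto3.client("databrew"))`. Passing the client in as an argument keeps the helper easy to exercise with a stub when no AWS account is available.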

Profile

When you profile your data, DataBrew creates a report called a data profile. This summary tells you about the existing shape of your data, including the context of the content, the structure of the data, and its relationships. You can make a data profile for any dataset by running a data profile job.
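
As a rough illustration of running a data profile job outside the visual interface, the sketch below creates a profile job and starts a run through a boto3 DataBrew client. The helper name `run_profile_job` and all argument values are hypothetical; the dataset, S3 bucket, and IAM role are assumed to exist already.

```python
def run_profile_job(client, job_name, dataset_name, bucket, role_arn):
    """Create a DataBrew profile job for a dataset and start one run.

    `client` is assumed to be a boto3 DataBrew client, e.g.
    boto3.client("databrew"). Returns the ID of the started run.
    """
    # Define the profile job: which dataset to profile, where the
    # profile report is written, and which IAM role DataBrew assumes.
    client.create_profile_job(
        Name=job_name,
        DatasetName=dataset_name,
        OutputLocation={"Bucket": bucket},
        RoleArn=role_arn,
    )
    # Start a run of the job that was just defined.
    response = client.start_job_run(Name=job_name)
    return response["RunId"]
```

The returned run ID could then be passed to the client's `describe_job_run` call to check on progress.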

Data Lineage

DataBrew tracks your data in a visual interface, called a data lineage, to determine its origin. This view shows you how the data flows through different entities from where it originally came. You can see its origin, other entities it was influenced by, what happened to it over time, and where it was stored.