TAR Inflate
The TAR inflate processor extracts the content of TAR archives on the fly and publishes each file they contain to the next middlewares in a pipeline. This lets you process documents stored in TAR and gzipped TAR archives within a Lakechain pipeline.
Inflating Archives
To use this middleware, import it in your CDK stack and connect it to a data source that provides TAR archives, such as the S3 Trigger if your archives are stored in S3.
The example below shows how to create a pipeline that inflates TAR archives uploaded to an S3 bucket.
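A minimal sketch of such a pipeline is given here. It assumes the usual Lakechain builder conventions (`withScope`, `withIdentifier`, `withCacheStorage`, `withSource`) and the package names `@project-lakechain/core`, `@project-lakechain/s3-event-trigger` and `@project-lakechain/tar-inflate-processor`. The stack name and construct identifiers are hypothetical, so check everything against the version of Lakechain you have installed.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';
// Assumed package names; verify them against your Lakechain installation.
import { CacheStorage } from '@project-lakechain/core';
import { S3EventTrigger } from '@project-lakechain/s3-event-trigger';
import { TarInflateProcessor } from '@project-lakechain/tar-inflate-processor';

// Hypothetical stack name, used for illustration only.
export class InflatePipelineStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Bucket in which TAR archives are uploaded.
    const bucket = new s3.Bucket(this, 'Bucket', {});

    // Cache storage shared by the middlewares of the pipeline.
    const cache = new CacheStorage(this, 'Cache');

    // Trigger the pipeline when an archive is uploaded to the bucket.
    const trigger = new S3EventTrigger.Builder()
      .withScope(this)
      .withIdentifier('Trigger')
      .withCacheStorage(cache)
      .withBucket(bucket)
      .build();

    // Inflate each archive provided by the trigger.
    new TarInflateProcessor.Builder()
      .withScope(this)
      .withIdentifier('TarInflateProcessor')
      .withCacheStorage(cache)
      .withSource(trigger)
      .build();
  }
}
```

In this sketch nothing is connected after the processor. In a real pipeline, the middlewares that should receive the extracted files would declare the processor as their source.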
Streaming Processing
The TAR inflate processor reads TAR archives as a stream, meaning that the compute driving the inflation does not need to hold the entire archive in memory. This makes it possible to process large archives without running into memory constraints.
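To illustrate what streaming means here, the sketch below walks a TAR archive chunk by chunk and keeps at most one 512-byte header in memory at a time. It is not the processor's actual implementation: `TarScanner`, `TarEntry` and `readString` are hypothetical names, and the sketch only handles plain TAR headers (no gzip layer, no PAX or GNU long-name extensions).

```typescript
// Illustrative streaming TAR scanner, not the Lakechain implementation.
// It lists the entries of an archive without buffering file contents.

const BLOCK = 512; // TAR headers and file data are aligned on 512-byte blocks.

interface TarEntry {
  name: string;
  size: number;
}

class TarScanner {
  private pending = new Uint8Array(0); // partial header, always under 512 bytes
  private toSkip = 0;                  // file data and padding left to skip
  private done = false;                // true once the end-of-archive block is seen
  readonly entries: TarEntry[] = [];

  /** Feeds the next chunk of the archive to the scanner. */
  push(chunk: Uint8Array): void {
    let offset = 0;
    while (offset < chunk.length && !this.done) {
      if (this.toSkip > 0) {
        // Skip over file content instead of storing it.
        const n = Math.min(this.toSkip, chunk.length - offset);
        this.toSkip -= n;
        offset += n;
        continue;
      }
      // Accumulate bytes until a complete header block is available.
      const take = Math.min(BLOCK - this.pending.length, chunk.length - offset);
      const merged = new Uint8Array(this.pending.length + take);
      merged.set(this.pending);
      merged.set(chunk.subarray(offset, offset + take), this.pending.length);
      this.pending = merged;
      offset += take;
      if (this.pending.length === BLOCK) {
        this.readHeader(this.pending);
        this.pending = new Uint8Array(0);
      }
    }
  }

  private readHeader(header: Uint8Array): void {
    // An all-zero block marks the end of the archive.
    if (header.every((b) => b === 0)) {
      this.done = true;
      return;
    }
    // The name is stored at offset 0 (100 bytes), the size at offset 124
    // (12 bytes, octal ASCII).
    const name = readString(header, 0, 100);
    const size = parseInt(readString(header, 124, 12).trim() || '0', 8);
    this.entries.push({ name, size });
    // File data is padded up to the next 512-byte boundary.
    this.toSkip = Math.ceil(size / BLOCK) * BLOCK;
  }
}

/** Reads a NUL-terminated ASCII field from a header block. */
function readString(block: Uint8Array, start: number, length: number): string {
  const field = block.subarray(start, start + length);
  const end = field.indexOf(0);
  return new TextDecoder().decode(end === -1 ? field : field.subarray(0, end));
}
```

Calling `push()` with successive chunks of any size fills `entries` with the name and size of each file. A real inflater would forward the skipped bytes to a destination, such as an object store, instead of discarding them.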
Architecture
The TAR inflate processor uses AWS Lambda as its compute for inflating tarballs. The Lambda function can run for up to 15 minutes to extract the files contained in a tarball, and passes the extracted files to the next middlewares in the pipeline.
Properties
Supported Inputs
Mime Type | Description |
---|---|
application/x-tar | TAR archives |
application/x-gzip | Gzipped TAR archives |
Supported Outputs
Mime Type | Description |
---|---|
*/* | The TAR inflate processor will publish each file within the tarball to the next middlewares in the pipeline. |
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
Examples
- Inflate Pipeline - An example showcasing how to inflate archives.