Unzip
The Zip inflate processor extracts the content of Zip archives and forwards each file within them to the next middlewares in a pipeline. This allows customers to process documents stored within Zip archives in a Lakechain pipeline.
Inflating Archives
To use this middleware, you import it in your CDK stack and connect it to a data source that provides Zip archives, such as the S3 Trigger if your Zip archives are stored in S3.
The example below shows how to create a pipeline that inflates Zip archives uploaded to an S3 bucket.
Streaming Processing
The Zip inflate processor processes Zip archives in a streaming fashion, meaning that the compute driving archive inflation does not need to hold the entire archive in memory. This makes it possible to process large archives without running into memory constraints.
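To illustrate the idea of streaming inflation, here is a minimal Python sketch that reads each file of an archive in bounded chunks rather than loading whole entries into memory. It is not the middleware's implementation: the function names and the 64 KiB chunk size are assumptions made for this example, and the sample archive is built in memory only to keep the snippet self-contained. Note that Python's `zipfile` needs a seekable file-like object, so this shows bounded memory per entry, not streaming from a non-seekable source.

```python
import io
import zipfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size for this sketch (64 KiB)


def iter_entry_chunks(archive, chunk_size=CHUNK_SIZE):
    """Yield (entry_name, chunk) pairs, reading each file in bounded chunks."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories carry no content
            with zf.open(info) as entry:
                while True:
                    chunk = entry.read(chunk_size)
                    if not chunk:
                        break  # end of this entry
                    yield info.filename, chunk


def entry_sizes(archive, chunk_size=CHUNK_SIZE):
    """Sum the inflated size of each entry without buffering whole files."""
    sizes = {}
    for name, chunk in iter_entry_chunks(archive, chunk_size):
        sizes[name] = sizes.get(name, 0) + len(chunk)
    return sizes


def build_sample_archive():
    """Build a small in-memory archive (demo only, hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("docs/a.txt", "hello " * 1000)
        zf.writestr("docs/b.txt", "world")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(entry_sizes(build_sample_archive(), chunk_size=1024))
```

In a real setting, each chunk would be written to its destination as it is read, so peak memory stays close to the chunk size regardless of how large the archive or its entries are.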
Architecture
The Zip inflate processor uses AWS Lambda as its compute for inflating archives. The function can run for up to 15 minutes to extract the files contained in a compressed archive, and passes the extracted files to the next middlewares in the pipeline.
Properties
Supported Inputs
Mime Type | Description |
---|---|
application/zip | Zip archives |
Supported Outputs
Mime Type | Description |
---|---|
*/* | The Zip inflate processor will publish each file within archives to the next middlewares in the pipeline. |
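Because the output type is `*/*`, downstream middlewares may receive files of any kind. The Python sketch below shows one way to describe each file in an archive with a name, a MIME type, and a size. How the middleware itself determines types is not stated here, so guessing from the file extension with `mimetypes`, the `application/octet-stream` fallback, and the record layout are all assumptions made for illustration.

```python
import io
import mimetypes
import zipfile


def describe_entries(archive):
    """Return one {name, type, size} record per file in the archive."""
    records = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # only files are described
            guessed, _ = mimetypes.guess_type(info.filename)
            records.append({
                "name": info.filename,
                # Assumed fallback when the extension is unknown.
                "type": guessed or "application/octet-stream",
                "size": info.file_size,  # inflated size in bytes
            })
    return records


def build_mixed_archive():
    """Build a small in-memory archive with mixed content (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("report.pdf", b"%PDF-1.4")
        zf.writestr("notes.txt", "hello")
        zf.writestr("blob", b"\x00\x01")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for record in describe_entries(build_mixed_archive()):
        print(record)
```

A downstream consumer could use the `type` field of such records to decide which files it is able to handle and which to skip.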
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
Examples
- Inflate Pipeline - An example showcasing how to inflate archives.