Translate
The Translate text processor makes it possible to translate documents from one language into a set of other languages at scale, using the Amazon Translate service. It supports various document formats, including plain text, HTML, Docx, PowerPoint, Excel, and XLIFF.
When using Amazon Translate, the formatting and structure of the input documents are preserved during translation, and the output documents are stored in the same format as the input documents.
Translating Documents
To use this middleware, import it into your CDK stack and instantiate it as part of a pipeline.
The example below takes supported input documents uploaded into a source S3 bucket and translates them into French and Spanish.
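Conceptually, the pipeline fans each uploaded document out into one translation per target language, and each output keeps the MIME type of its input. The sketch below models only that fan-out in plain TypeScript, with no CDK or Lakechain dependency; `SourceDocument`, `TranslationTask`, and `planTranslations` are hypothetical names and not part of the middleware's API.

```typescript
// Hypothetical model of the fan-out: one source document -> one
// translation task per target language. Not the middleware's real API.

/** A document as it might arrive from the source S3 bucket (hypothetical shape). */
interface SourceDocument {
  key: string;      // e.g. an S3 object key
  mimeType: string; // one of the supported input MIME types
}

/** One planned translation of a source document (hypothetical shape). */
interface TranslationTask {
  sourceKey: string;
  targetLanguage: string;
  outputMimeType: string; // identical to the input: the format is preserved
}

function planTranslations(doc: SourceDocument, targetLanguages: string[]): TranslationTask[] {
  // Produce exactly one task per requested target language, in order.
  return targetLanguages.map((targetLanguage) => ({
    sourceKey: doc.key,
    targetLanguage,
    outputMimeType: doc.mimeType,
  }));
}

const DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

// A Word document translated into French ("fr") and Spanish ("es").
const tasks = planTranslations({ key: "reports/q1.docx", mimeType: DOCX_MIME }, ["fr", "es"]);
```

With two target languages, a single Word document yields two tasks, both producing Word output.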
Profanity Detection
Amazon Translate supports masking profane words and phrases in translation results. To enable profanity masking, use the `.withProfanityRedaction` method.
Tone Formality
You can also adjust the formality of the translation results using the `.withFormality` method, choosing between the `FORMAL` and `INFORMAL` tones.
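On the Amazon Translate side, both options correspond to the `Settings` parameter of its `TranslateText` and `StartTextTranslationJob` APIs, where `Profanity` accepts `MASK` and `Formality` accepts `FORMAL` or `INFORMAL`. The sketch below assumes the two builder methods end up populating that structure; the `TranslationOptions` shape and the `toTranslationSettings` function are hypothetical and only illustrate the mapping.

```typescript
type Formality = "FORMAL" | "INFORMAL";

/** Subset of Amazon Translate's TranslationSettings structure. */
interface TranslationSettings {
  Formality?: Formality;
  Profanity?: "MASK";
}

/** Hypothetical options standing in for .withProfanityRedaction and .withFormality. */
interface TranslationOptions {
  profanityRedaction?: boolean;
  formality?: Formality;
}

function toTranslationSettings(opts: TranslationOptions): TranslationSettings {
  const settings: TranslationSettings = {};
  if (opts.profanityRedaction) {
    // Ask Amazon Translate to mask profane words and phrases in the output.
    settings.Profanity = "MASK";
  }
  if (opts.formality !== undefined) {
    // Request a formal or informal tone for the translated text.
    settings.Formality = opts.formality;
  }
  return settings;
}

const settings = toTranslationSettings({ profanityRedaction: true, formality: "FORMAL" });
```

Leaving both options unset yields an empty settings object, so Amazon Translate applies its defaults.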
Sync vs Async Jobs
This middleware uses both the real-time synchronous API and the asynchronous batch translation API provided by Amazon Translate to translate documents.
Synchronous translations are faster, but are limited to 100 KB per document and only support Text, Docx, and HTML documents. Asynchronous batch jobs, on the other hand, support much larger documents (up to 20 MB per document) and a wider range of document types, but are significantly slower than synchronous translations.
The middleware automatically determines the right job type for each input document, based on its size and format, in order to optimize the translation process.
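A minimal sketch of a size- and format-based choice between the two job types, using only the limits quoted above. The middleware's actual decision logic is not documented here and may differ; the constant names, the `chooseJobType` function, and the reading of "KB" and "MB" as binary units are all assumptions.

```typescript
type JobType = "sync" | "async";

const KB = 1024;
const MB = 1024 * KB;
// Assumption: the quoted limits are interpreted as binary units.
const SYNC_MAX_BYTES = 100 * KB;
const ASYNC_MAX_BYTES = 20 * MB;

// Formats the synchronous path supports, per the text: Text, Docx, and HTML.
const SYNC_MIME_TYPES = new Set<string>([
  "text/plain",
  "text/html",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
]);

function chooseJobType(mimeType: string, sizeBytes: number): JobType {
  // Prefer the faster real-time API whenever both size and format allow it.
  if (sizeBytes <= SYNC_MAX_BYTES && SYNC_MIME_TYPES.has(mimeType)) {
    return "sync";
  }
  // Otherwise fall back to a batch job, which accepts larger documents.
  if (sizeBytes <= ASYNC_MAX_BYTES) {
    return "async";
  }
  throw new Error(`Document of ${sizeBytes} bytes exceeds the ${ASYNC_MAX_BYTES}-byte batch limit`);
}
```

Under this rule, a small PowerPoint file still goes through a batch job, because its format is not supported by the synchronous path, while a small HTML page is translated in real time.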
Limits
Using Amazon Translate as its backbone, the Translate middleware can translate between more than 70 languages. Note, however, that Amazon Translate supports specific source-to-target language pairs (e.g., English to French).
As such, depending on the source language of a document, not every requested target language may be supported. In that case, an exception is raised within the pipeline at runtime, and the execution fails for that specific document.
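This failure mode can be pictured as a per-document check: for each requested target language, verify that it is reachable from the document's source language, and throw otherwise (Amazon Translate itself reports this case as an `UnsupportedLanguagePairException`). The `SUPPORTED_PAIRS` table below is a hypothetical placeholder, not Amazon Translate's real list of supported pairs, and the error class and function names are illustrative.

```typescript
// Hypothetical error type for an unsupported source/target combination.
class UnsupportedLanguagePairError extends Error {
  readonly sourceLanguage: string;
  readonly targetLanguage: string;

  constructor(sourceLanguage: string, targetLanguage: string) {
    super(`Translation from '${sourceLanguage}' to '${targetLanguage}' is not supported`);
    this.name = "UnsupportedLanguagePairError";
    this.sourceLanguage = sourceLanguage;
    this.targetLanguage = targetLanguage;
  }
}

// Hypothetical placeholder: source language -> reachable target languages.
const SUPPORTED_PAIRS: ReadonlyMap<string, ReadonlySet<string>> = new Map([
  ["en", new Set(["fr", "es", "de"])],
  ["fr", new Set(["en", "es"])],
]);

function assertPairsSupported(sourceLanguage: string, targetLanguages: string[]): void {
  const reachable = SUPPORTED_PAIRS.get(sourceLanguage);
  for (const target of targetLanguages) {
    // An unknown source language, or an unreachable target, fails the document.
    if (!reachable?.has(target)) {
      throw new UnsupportedLanguagePairError(sourceLanguage, target);
    }
  }
}
```

Because the error is raised per document, other documents flowing through the same pipeline are unaffected.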
Architecture
The processing flow implemented by this middleware depends on whether synchronous or asynchronous jobs are used to translate documents.
When using synchronous translations, the middleware calls the Amazon Translate real-time API from a Lambda function, which waits for the translations to complete before forwarding them to the next middlewares in the pipeline.
When using asynchronous translations, this middleware uses an event-driven architecture that leverages Amazon Translate batch jobs, relies on DynamoDB to maintain a mapping between jobs, and runs several Lambda functions on the ARM64 architecture to orchestrate the overall translation process.
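The role of the job mapping can be illustrated as follows: when a batch job is started, a record ties the job identifier to its originating document and target language, and when the job-completion event arrives, that record is looked up to route the translated output onward. In this sketch an in-memory `Map` stands in for the DynamoDB table, and `JobRecord`, `JobMappingTable`, and its methods are hypothetical names, not the middleware's actual schema.

```typescript
/** Hypothetical record linking a batch job back to its source document. */
interface JobRecord {
  documentKey: string;
  targetLanguage: string;
}

/** In-memory stand-in for the DynamoDB mapping table. */
class JobMappingTable {
  private readonly records = new Map<string, JobRecord>();

  /** Called after a batch translation job has been started. */
  put(jobId: string, record: JobRecord): void {
    this.records.set(jobId, record);
  }

  /** Called by the completion handler; returns the mapping and removes it. */
  take(jobId: string): JobRecord | undefined {
    const record = this.records.get(jobId);
    this.records.delete(jobId);
    return record;
  }
}

// The event-driven flow, simulated synchronously.
const table = new JobMappingTable();
table.put("job-123", { documentKey: "docs/manual.pptx", targetLanguage: "fr" });
const completed = table.take("job-123");
```

Removing the record on completion keeps the table limited to in-flight jobs; a real DynamoDB-backed table could instead rely on item expiry, which this sketch does not model.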
Properties
Supported Inputs
Mime Type | Description |
---|---|
text/plain | Plain text documents. |
text/html | HTML documents. |
application/vnd.openxmlformats-officedocument.wordprocessingml.document | Word documents. |
application/vnd.openxmlformats-officedocument.presentationml.presentation | PowerPoint documents. |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Excel documents. |
application/x-xliff+xml | XLIFF documents. |
Supported Outputs
This middleware supports the same output types as its input types.
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
Examples
- Text Translation Pipeline - An example showcasing how to translate documents using Amazon Translate.