ElevenLabs
Unstable API
0.8.0
@project-lakechain/elevenlabs-synthesizer
The ElevenLabs synthesizer middleware transforms text documents into speech using the ElevenLabs API. It implements a throttling mechanism to stay within the ElevenLabs API Limits, and allows pipeline builders to customize the model and voices they use for synthesis.
This middleware interacts with a third-party API outside of your AWS account.
Synthesizing Text
To use this middleware, you import it in your CDK stack and instantiate it as part of a pipeline.
You need to provide an ElevenLabs API key to the middleware by passing a reference to an AWS Secrets Manager secret that contains the API key.
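As a rough illustration of the steps above, the sketch below wires the synthesizer into a CDK stack. Only `withModel` and `withOutputFormat` are named on this page; the other builder methods (`withScope`, `withIdentifier`, `withCacheStorage`, `withSource`, `withApiKey`, `withVoice`), the `CacheStorage` construct and the `S3EventTrigger` source are assumptions based on the usual Lakechain builder pattern, and the secret name and voice identifier are placeholders. Check all of them against the package version you install.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as secrets from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { CacheStorage } from '@project-lakechain/core';
import { S3EventTrigger } from '@project-lakechain/s3-event-trigger';
import { ElevenLabsSynthesizer } from '@project-lakechain/elevenlabs-synthesizer';

export class SynthesisStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Bucket receiving the plain text documents to synthesize.
    const bucket = new s3.Bucket(this, 'Source');

    // Storage shared by the middlewares of the pipeline (assumed construct).
    const cache = new CacheStorage(this, 'Cache');

    // Reference to an existing secret holding the ElevenLabs API key.
    // 'elevenlabs/api-key' is a placeholder secret name.
    const apiKey = secrets.Secret.fromSecretNameV2(
      this,
      'ElevenLabsApiKey',
      'elevenlabs/api-key'
    );

    // Assumed data source emitting `text/plain` documents.
    const trigger = new S3EventTrigger.Builder()
      .withScope(this)
      .withIdentifier('Trigger')
      .withCacheStorage(cache)
      .withBucket(bucket)
      .build();

    // Synthesize text to speech using ElevenLabs.
    new ElevenLabsSynthesizer.Builder()
      .withScope(this)
      .withIdentifier('ElevenLabsSynthesizer')
      .withCacheStorage(cache)
      .withSource(trigger)
      .withApiKey(apiKey)                  // assumed method name
      .withVoice('Rachel')                 // placeholder voice identifier
      .withModel('eleven_multilingual_v2') // the documented default, made explicit
      .withOutputFormat('mp3_44100_128')   // the documented default, made explicit
      .build();
  }
}
```

The API key is referenced through Secrets Manager and never appears in the stack itself, which keeps it out of the synthesized CloudFormation template.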
Model Selection
You can specify the ElevenLabs model to use during the synthesis process with the `withModel` method.
By default, this middleware uses the `eleven_multilingual_v2` model.
Voice Selection
You can specify the identifier of the voice you would like to use. The voice must be attached to your ElevenLabs account.
See the ElevenLabs documentation for more information.
You can also customize the voice settings by passing an optional object describing the features of the voice you want to generate.
See the Voice settings documentation for more information.
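To make the shape of such a settings object concrete, here is a minimal sketch. The field names (`stability`, `similarity_boost`, `style`, `use_speaker_boost`) and their 0 to 1 ranges are taken from the ElevenLabs API voice settings and are an assumption as far as this middleware is concerned; `VoiceSettings` and `validateVoiceSettings` are hypothetical helpers, not part of the package.

```typescript
// Assumed shape, mirroring the ElevenLabs API voice settings.
interface VoiceSettings {
  stability: number;           // expected in [0, 1]
  similarity_boost: number;    // expected in [0, 1]
  style?: number;              // expected in [0, 1]
  use_speaker_boost?: boolean;
}

// Hypothetical helper: returns one message per out-of-range field.
function validateVoiceSettings(settings: VoiceSettings): string[] {
  const errors: string[] = [];
  const ranged: Array<[string, number | undefined]> = [
    ['stability', settings.stability],
    ['similarity_boost', settings.similarity_boost],
    ['style', settings.style]
  ];
  for (const [name, value] of ranged) {
    // The negated form also rejects NaN.
    if (value !== undefined && !(value >= 0 && value <= 1)) {
      errors.push(`${name} must be between 0 and 1, got ${value}`);
    }
  }
  return errors;
}

// Example settings object that could be handed to the middleware.
const voiceSettings: VoiceSettings = {
  stability: 0.5,
  similarity_boost: 0.75,
  style: 0,
  use_speaker_boost: true
};

console.log(validateVoiceSettings(voiceSettings));
```

Validating the values at synthesis time in your CDK app surfaces mistakes before deployment rather than as failed API calls at runtime.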
Output Format
You can customize the output format of the synthesized speech using the `withOutputFormat` method.
Note that updating the output format will affect the mime-type of the output documents.
By default, this middleware uses the `mp3_44100_128` format.
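The relation between output format and mime-type can be sketched as a lookup on the codec prefix of the format name. This is an illustration only: `mimeTypeOf` is a hypothetical helper rather than the middleware's own code, the mime types are those listed under Supported Outputs, and format names other than `mp3_44100_128` (such as `pcm_16000` or `ulaw_8000`) are assumed from the ElevenLabs API naming scheme.

```typescript
// Codec prefix of the output format mapped to the resulting mime type.
const MIME_TYPES = new Map<string, string>([
  ['mp3', 'audio/mpeg'],
  ['pcm', 'audio/L16'],
  ['ulaw', 'audio/basic']
]);

// Hypothetical helper: derives the mime type from a format name
// shaped like `<codec>_<sampleRate>[_<bitrate>]`.
function mimeTypeOf(outputFormat: string): string {
  const [codec = ''] = outputFormat.split('_');
  const mimeType = MIME_TYPES.get(codec);
  if (mimeType === undefined) {
    throw new Error(`Unsupported output format: ${outputFormat}`);
  }
  return mimeType;
}

console.log(mimeTypeOf('mp3_44100_128'));
```

Downstream middlewares that filter on mime-type need to accept the type produced by the format you select.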
Limits
This middleware automatically applies a throttling mechanism when consuming messages from its input queue to stay within the minimal ElevenLabs API Limits. However, the ElevenLabs API may still throttle requests when several instances of this middleware run in parallel with the same API key.
This middleware does not chunk the audio for long form text, and relies on the remote model to generate consistent speech for long form documents. While the ElevenLabs API generally handles this well, the quality of the results may vary between models for long form documents.
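The caveat about parallel instances can be illustrated with a small model. The sketch below is not the middleware's implementation: `ConcurrencyGate` is a hypothetical local limiter, and the limit of 2 concurrent requests is an assumed value that depends on your ElevenLabs plan.

```typescript
// Minimal non-blocking concurrency gate (illustrative only).
class ConcurrencyGate {
  private inFlight = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Reserves a slot if one is free; returns false when the limit is reached.
  tryAcquire(): boolean {
    if (this.inFlight >= this.limit) {
      return false;
    }
    this.inFlight += 1;
    return true;
  }

  // Frees a slot once a synthesis request completes.
  release(): void {
    if (this.inFlight > 0) {
      this.inFlight -= 1;
    }
  }
}

// Assumed plan limit: 2 concurrent requests per API key.
const PLAN_LIMIT = 2;

// Counts how many of `attempts` simultaneous requests a gate lets through.
function admitted(gate: ConcurrencyGate, attempts: number): number {
  let count = 0;
  for (let i = 0; i < attempts; i++) {
    if (gate.tryAcquire()) {
      count += 1;
    }
  }
  return count;
}

// One instance: its gate keeps in-flight requests within the plan limit.
const single = admitted(new ConcurrencyGate(PLAN_LIMIT), 5);

// Two instances sharing one API key: each gate only sees its own traffic,
// so the combined number of in-flight requests can exceed the plan limit.
const combined =
  admitted(new ConcurrencyGate(PLAN_LIMIT), 5) +
  admitted(new ConcurrencyGate(PLAN_LIMIT), 5);

console.log({ single, combined, planLimit: PLAN_LIMIT });
```

Because each instance throttles independently, using a separate API key per instance, or a single instance per key, avoids exceeding the shared limit.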
Architecture
This middleware runs on an AWS Lambda function using the ARM64 architecture to perform the text-to-speech synthesis through the ElevenLabs API. It also uses AWS Secrets Manager to retrieve the ElevenLabs API key at runtime.
Properties
Supported Inputs
Mime Type | Description |
---|---|
text/plain | This middleware supports plain text documents. |
Supported Outputs
The output mime type of this middleware depends on the output format specified.
Mime Type | Description |
---|---|
audio/mpeg | This middleware can output audio files in the MPEG format by default. |
audio/L16 | This middleware can output audio files in the L16 format. |
audio/basic | This middleware can output audio files in the ulaw format. |
Supported Compute Types
Type | Description |
---|---|
CPU | This middleware only supports CPU compute. |
Examples
- ElevenLabs Synthesizer - Builds a pipeline for synthesizing text to speech using the ElevenLabs API.