ElevenLabs

Unstable API

0.8.0

@project-lakechain/elevenlabs-synthesizer

The ElevenLabs synthesizer middleware transforms text documents into speech using the ElevenLabs API. It implements a throttling mechanism to stay within the ElevenLabs API Limits, and allows pipeline builders to customize the model and voices they use for synthesis.

ℹ️ This middleware interacts with a third-party API outside of your AWS account.

🗣️ Synthesizing Text

To use this middleware, you import it in your CDK stack and instantiate it as part of a pipeline.

💁 The middleware requires an ElevenLabs API key, which you provide as a reference to an AWS Secrets Manager secret containing the key.

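import * as cdk from 'aws-cdk-lib';
import * as secrets from 'aws-cdk-lib/aws-secretsmanager';
import { Construct } from 'constructs';
import { CacheStorage } from '@project-lakechain/core';
import { ElevenLabsSynthesizer } from '@project-lakechain/elevenlabs-synthesizer';

class Stack extends cdk.Stack {
  constructor(scope: Construct, id: string) {
    // The parent constructor must run before `this` is used.
    super(scope, id);

    // The cache storage shared by the middlewares of the pipeline.
    const cache = new CacheStorage(this, 'Cache');

    // The ElevenLabs API key, referenced by its Secrets Manager secret name.
    const apiKey = secrets.Secret.fromSecretNameV2(
      this,
      'ApiKey',
      process.env.ELEVENLABS_API_KEY_SECRET_NAME as string
    );

    // Convert the text to speech using the ElevenLabs API.
    // `source` is the upstream middleware of your pipeline (not shown here).
    const synthesizer = new ElevenLabsSynthesizer.Builder()
      .withScope(this)
      .withIdentifier('ElevenLabsSynthesizer')
      .withCacheStorage(cache)
      .withSource(source) // 👈 Specify a data source
      .withApiKey(apiKey)
      .withVoice('Rachel')
      .build();
  }
}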
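
The example above reads the secret name from an environment variable and casts it with `as string`, which hides a missing variable until deployment fails. A minimal sketch of a guard that fails early is shown below; `requireEnv` is a hypothetical helper introduced for illustration, not part of Lakechain or the CDK.

```typescript
// Hypothetical helper (not part of Lakechain or the CDK): returns the value
// of a required environment variable, or throws a descriptive error when the
// variable is missing or blank.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];

  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In a stack, `requireEnv(process.env, 'ELEVENLABS_API_KEY_SECRET_NAME')` could replace the cast, so that synthesis stops with a clear message when the variable is not set.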

Model Selection

You can specify the ElevenLabs Model you want to use during the synthesis process using the withModel method.

💁 By default, this middleware uses the eleven_multilingual_v2 model.

const synthesizer = new ElevenLabsSynthesizer.Builder()
  .withScope(this)
  .withIdentifier('ElevenLabsSynthesizer')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(apiKey)
  .withModel('eleven_turbo_v2_5') // 👈 Specify the model
  .build();

Voice Selection

You can specify the identifier of the voice to use for synthesis. The voice must be available in your ElevenLabs account.

💁 See the ElevenLabs documentation for more information.

const synthesizer = new ElevenLabsSynthesizer.Builder()
  .withScope(this)
  .withIdentifier('ElevenLabsSynthesizer')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(apiKey)
  .withVoice('pNInz6obpgDQGcFmaJgB') // 👈 Specify the voice
  .build();

You can also customize the voice settings by passing an optional object describing the features of the voice you want to generate.

💁 See the Voice settings documentation for more information.

import { ElevenLabsSynthesizer, VoiceSettings } from '@project-lakechain/elevenlabs-synthesizer';

const synthesizer = new ElevenLabsSynthesizer.Builder()
  .withScope(this)
  .withIdentifier('ElevenLabsSynthesizer')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(apiKey)
  .withVoice('pNInz6obpgDQGcFmaJgB', new VoiceSettings.Builder()
    .withStability(0)
    .withSimilarityBoost(0.5)
    .withStyle(0.5)
    .withSpeakerBoost(false)
    .build())
  .build();
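
Voice settings are easy to get wrong when they come from configuration files or user input. The sketch below validates a plain settings object before it is handed to the builder. It assumes that stability, similarity boost and style are each expected in the range 0 to 1, which matches the ElevenLabs voice settings documentation as far as can be told but should be verified there. `VoiceSettingsInput` and `validateVoiceSettings` are hypothetical names, not part of the middleware.

```typescript
// Hypothetical plain-object view of the voice settings (illustration only).
interface VoiceSettingsInput {
  stability: number;
  similarityBoost: number;
  style: number;
  speakerBoost: boolean;
}

// Returns a list of human-readable problems; an empty list means the
// settings passed the checks. The 0..1 range is an assumption.
function validateVoiceSettings(settings: VoiceSettingsInput): string[] {
  const errors: string[] = [];
  const numeric: Array<[string, number]> = [
    ['stability', settings.stability],
    ['similarityBoost', settings.similarityBoost],
    ['style', settings.style]
  ];

  for (const [name, value] of numeric) {
    if (!Number.isFinite(value) || value < 0 || value > 1) {
      errors.push(`${name} must be a number between 0 and 1, got ${value}`);
    }
  }
  return errors;
}
```

The values from the example above (`0`, `0.5`, `0.5`) pass these checks, while a stability of `1.5` would be reported.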

Output Format

You can customize the output format of the synthesized speech using the withOutputFormat method. Note that changing the output format also changes the MIME type of the output documents.

💁 By default, this middleware uses the mp3_44100_128 format.

const synthesizer = new ElevenLabsSynthesizer.Builder()
  .withScope(this)
  .withIdentifier('ElevenLabsSynthesizer')
  .withCacheStorage(cache)
  .withSource(source)
  .withApiKey(apiKey)
  .withOutputFormat('mp3_22050_32') // 👈 Specify the output format
  .build();

ℹ️ Limits

This middleware automatically throttles its consumption of messages from its input queue to stay within the minimal ElevenLabs API Limits. However, the ElevenLabs API may still throttle requests when several instances of this middleware run in parallel with the same API key.
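
To illustrate why parallel instances can still be throttled, here is a generic concurrency gate. It is a sketch only and not the middleware's internal mechanism, which this page does not describe. It assumes the remote limit is expressed as a maximum number of concurrent requests; `ConcurrencyGate` is a hypothetical name.

```typescript
// Illustrative only: a generic, in-memory concurrency gate.
// It does not reflect how the middleware throttles internally.
class ConcurrencyGate {
  private readonly limit: number;
  private inFlight = 0;

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) {
      throw new RangeError('limit must be a positive integer');
    }
    this.limit = limit;
  }

  // Reserves a slot and returns true, or returns false when the gate is full.
  tryAcquire(): boolean {
    if (this.inFlight >= this.limit) {
      return false;
    }
    this.inFlight += 1;
    return true;
  }

  // Frees a slot once a request has completed, successfully or not.
  release(): void {
    if (this.inFlight > 0) {
      this.inFlight -= 1;
    }
  }
}
```

Each gate only sees its own requests. Two independent gates with a limit of 2 that share one API key can together issue 4 concurrent requests, which is the situation in which the remote API may start rejecting calls.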

This middleware does not chunk long-form text, and relies on the remote model to generate consistent speech for long documents. The ElevenLabs API generally handles this well, but the quality of the results may vary between models.

🏗️ Architecture

This middleware runs on an AWS Lambda function using the ARM64 architecture to perform text-to-speech synthesis with the ElevenLabs API. It also uses AWS Secrets Manager to retrieve the ElevenLabs API key at runtime.

ElevenLabs Synthesizer Architecture
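
For readers who want a sense of what the Lambda function sends to the third-party API, the sketch below assembles a text-to-speech request without sending it. The endpoint path, the `xi-api-key` header and the `model_id` field are taken from the public ElevenLabs REST API and should be checked against its reference. Whether the middleware calls this endpoint directly or through an SDK is not stated here, and `buildSynthesisRequest` is a hypothetical helper. The defaults mirror those documented on this page.

```typescript
// Shape of an HTTP request, kept independent of any HTTP client.
interface SynthesisRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) a synthesis request.
function buildSynthesisRequest(opts: {
  apiKey: string;
  voiceId: string;
  text: string;
  modelId?: string;
  outputFormat?: string;
}): SynthesisRequest {
  // Defaults documented on this page.
  const modelId = opts.modelId ?? 'eleven_multilingual_v2';
  const outputFormat = opts.outputFormat ?? 'mp3_44100_128';

  // Assumed endpoint layout: voice in the path, format in the query string.
  const url =
    'https://api.elevenlabs.io/v1/text-to-speech/' +
    encodeURIComponent(opts.voiceId) +
    '?output_format=' +
    encodeURIComponent(outputFormat);

  return {
    url,
    method: 'POST',
    headers: {
      'xi-api-key': opts.apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ text: opts.text, model_id: modelId })
  };
}
```

Keeping request construction separate from the network call makes it possible to unit test the URL and payload without an API key.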

🏷️ Properties

Supported Inputs

Mime Type	Description
`text/plain`	This middleware supports plain text documents.

Supported Outputs

The output mime type of this middleware depends on the output format specified.

Mime Type	Description
`audio/mpeg`	This middleware can output audio files in the MPEG format by default.
`audio/L16`	This middleware can output audio files in the L16 format.
`audio/basic`	This middleware can output audio files in the ulaw format.
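
The relationship between the output format and the resulting mime type can be sketched as a small lookup. The mapping from format prefix (`mp3`, `pcm`, `ulaw`) to mime type is an assumption inferred from the table above and from the ElevenLabs format naming; `mimeTypeForOutputFormat` is a hypothetical helper, not part of the middleware.

```typescript
// The three mime types listed in the table above.
type OutputMimeType = 'audio/mpeg' | 'audio/L16' | 'audio/basic';

// Hypothetical helper: derives the output mime type from an output format
// such as `mp3_44100_128`. The prefix-to-mime mapping is an assumption.
function mimeTypeForOutputFormat(format: string): OutputMimeType {
  // The codec is the part before the first underscore.
  const codec = format.split('_')[0];

  switch (codec) {
    case 'mp3':
      return 'audio/mpeg';
    case 'pcm':
      return 'audio/L16';
    case 'ulaw':
      return 'audio/basic';
    default:
      throw new Error(`Unsupported output format: ${format}`);
  }
}
```

Downstream middlewares that filter on mime type would see `audio/mpeg` for the default `mp3_44100_128` format under this mapping.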

Supported Compute Types

Type	Description
`CPU`	This middleware only supports CPU compute.

📖 Examples

ElevenLabs Synthesizer - Builds a pipeline for synthesizing text to speech using the ElevenLabs API.