PANNs
The Large-Scale Pre-trained Audio Neural Networks (PANNs) middleware provides the foundation for creating embeddings for audio documents, leveraging audio-specific features such as Mel-frequency cepstral coefficients (MFCCs) and Chroma features. Used in combination with a vector database, it allows customers to perform semantic similarity search on a set of audio documents.
Embedding Audio
To use this middleware, you import it in your CDK stack and specify a VPC in which the audio processing cluster will be deployed.
Auto-Scaling
The cluster of containers deployed by this middleware auto-scales based on the number of audio documents that need to be processed. By default, the cluster scales up to a maximum of 5 instances, and scales down to zero when there are no documents to process.
ℹ️ You can configure the maximum number of instances that the cluster can auto-scale to by using the `withMaxInstances` method.
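The scaling behavior described above can be illustrated with a small sketch. This is not the middleware's actual scaling policy: the function name `desiredInstances` and the `documentsPerInstance` ratio are assumptions made for illustration. Only the default ceiling of 5 instances and the scale-to-zero behavior come from the description above.

```typescript
// Hypothetical sketch of the scaling arithmetic, not the middleware's real policy.
const DEFAULT_MAX_INSTANCES = 5; // default ceiling stated in this documentation

function desiredInstances(
  pendingDocuments: number,
  maxInstances: number = DEFAULT_MAX_INSTANCES,
  documentsPerInstance: number = 10 // assumed backlog that one instance absorbs
): number {
  if (pendingDocuments <= 0) {
    return 0; // scale down to zero when there is nothing to process
  }
  // Grow with the backlog, but never beyond the configured ceiling.
  const needed = Math.ceil(pendingDocuments / documentsPerInstance);
  return Math.min(needed, maxInstances);
}

console.log(desiredInstances(0));       // 0
console.log(desiredInstances(25));      // 3
console.log(desiredInstances(500));     // 5
console.log(desiredInstances(500, 10)); // 10
```

The last call shows the effect of raising the ceiling, which is what `withMaxInstances` is described as controlling.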
Output
The PANNs embedding processor middleware does not modify or alter source documents in any way. Instead, it enriches the metadata of each audio document with a pointer to the vector embeddings that were created for that document.
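To make the idea of enriching metadata without touching the document concrete, here is a minimal sketch. The interfaces, the `embeddings.vectors` field, the `attachEmbeddingPointer` helper and the example URIs are all hypothetical and chosen for illustration; the real event schema emitted by the middleware may differ.

```typescript
// Hypothetical shapes for illustration only; the real event schema may differ.
interface AudioDocument {
  url: string;  // location of the source audio, never modified
  type: string; // MIME type, e.g. "audio/mpeg"
}

interface EmbeddingPointer {
  vectors: string; // assumed: URI where the embedding vectors are stored
}

interface DocumentEvent {
  document: AudioDocument;
  metadata: { embeddings?: EmbeddingPointer; [key: string]: unknown };
}

// Returns a new event: same document, metadata extended with the pointer.
function attachEmbeddingPointer(
  event: DocumentEvent,
  vectorsUri: string
): DocumentEvent {
  return {
    document: event.document,
    metadata: { ...event.metadata, embeddings: { vectors: vectorsUri } },
  };
}

const input: DocumentEvent = {
  document: { url: 's3://example-bucket/speech.mp3', type: 'audio/mpeg' },
  metadata: { title: 'speech' },
};
const output = attachEmbeddingPointer(
  input,
  's3://example-cache/speech.embeddings.json'
);

console.log(output.document === input.document); // the document is passed through as-is
console.log(output.metadata.embeddings?.vectors);
```

A downstream consumer, such as a vector database connector, would follow the pointer to load the vectors rather than expect them inline in the event.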
Architecture
The PANNs embedding processor requires GPU-enabled instances (g4dn.xlarge) to run the PANNs embedding model. To orchestrate deployments, it deploys an auto-scaled ECS cluster of containers that consume documents from the middleware input queue. The cluster is deployed in the private subnets of the given VPC, and caches the models on EFS storage to optimize cold-starts.
ℹ️ The average cold-start for the PANNs containers is around 3 minutes when no instances are running.
Properties
Supported Inputs
Mime Type | Description |
---|---|
audio/mpeg | MPEG audio documents. |
audio/mp3 | MP3 audio documents. |
audio/mp4 | MP4 audio documents. |
audio/wav | WAV audio documents. |
audio/x-wav | WAV audio documents. |
audio/x-m4a | M4A audio documents. |
audio/ogg | OGG audio documents. |
audio/x-flac | FLAC audio documents. |
audio/flac | FLAC audio documents. |
audio/x-aiff | AIFF audio documents. |
audio/aiff | AIFF audio documents. |
audio/x-ms-wma | WMA audio documents. |
audio/x-matroska | MKV audio documents. |
audio/webm | WebM audio documents. |
audio/aac | AAC audio documents. |
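As a convenience, the table above can be turned into a pre-flight check before sending documents to the middleware. This is a sketch only: the `SUPPORTED_AUDIO_TYPES` constant and `isSupportedInput` helper are hypothetical names and are not part of the middleware's API. The MIME types themselves are copied from the table.

```typescript
// MIME types copied from the "Supported Inputs" table.
const SUPPORTED_AUDIO_TYPES: ReadonlySet<string> = new Set([
  'audio/mpeg', 'audio/mp3', 'audio/mp4', 'audio/wav', 'audio/x-wav',
  'audio/x-m4a', 'audio/ogg', 'audio/x-flac', 'audio/flac', 'audio/x-aiff',
  'audio/aiff', 'audio/x-ms-wma', 'audio/x-matroska', 'audio/webm', 'audio/aac',
]);

// Hypothetical helper: checks a Content-Type style value against the table,
// ignoring parameters such as "; codecs=opus" and letter case.
function isSupportedInput(mimeType: string): boolean {
  const essence = (mimeType.split(';')[0] ?? '').trim().toLowerCase();
  return SUPPORTED_AUDIO_TYPES.has(essence);
}

console.log(isSupportedInput('audio/flac'));              // true
console.log(isSupportedInput('Audio/OGG; codecs=opus'));  // true
console.log(isSupportedInput('video/mp4'));               // false
```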
Supported Outputs
This middleware outputs documents of the same types as its supported inputs.
Supported Compute Types
Type | Description |
---|---|
GPU | This middleware only supports GPU compute. |
Examples
- PANNs OpenSearch Pipeline - An example showcasing an audio embedding pipeline using PANNs and OpenSearch.