Skip to content

Documentation Translation

Documentation Translation

This project uses AWS Bedrock’s Haiku 3.5 model to automatically translate documentation from English to multiple languages. The translation system is designed to be efficient, accurate, and easy to use.

Supported Languages

Currently, the following languages are supported:

  • Japanese (jp)
  • French (fr)
  • Spanish (es)
  • German (de)
  • Chinese (zh)
  • Korean (ko)

How It Works

The translation system works by:

  1. Splitting documents by h2 headers - This allows for more efficient processing and better context for the translation model.
  2. Preserving markdown formatting - All markdown syntax, code blocks, and HTML tags are preserved during translation.
  3. Special handling for frontmatter - YAML frontmatter is translated while preserving its structure.
  4. Incremental translation - Only changed files are translated by default, saving time and resources.

Running Translations Locally

To translate documentation locally, use the scripts/translate.ts script:

Terminal window
# Translate only changed files to Japanese (default)
./scripts/translate.ts
# Translate all files
./scripts/translate.ts --all
# Translate to specific languages
./scripts/translate.ts --languages jp,fr,es
# Show what would be translated without actually translating
./scripts/translate.ts --dry-run
# Show verbose output
./scripts/translate.ts --verbose

GitHub Workflow

A GitHub workflow automatically translates documentation when changes are made to English documentation files in pull requests. The workflow:

  1. Detects changes to English documentation files
  2. Translates the changed files using AWS Bedrock
  3. Commits the translations back to the source branch
  4. Updates the PR with translation status

Manual Workflow Trigger

You can also manually trigger the translation workflow from the GitHub Actions tab. This is useful for:

  • Running a full translation of all documentation
  • Translating to specific languages
  • Updating translations after making changes to the translation script

AWS Configuration

The translation system uses AWS Bedrock’s Haiku 3.5 model for translation. To use this feature, you need:

  1. AWS credentials - For local development, configure your AWS credentials using the AWS CLI or environment variables.
  2. IAM Role - For GitHub Actions, configure an IAM role with OIDC authentication and the necessary permissions for AWS Bedrock.

Required Permissions

The IAM role or user needs the following permissions:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel"
],
"Resource": [
"arn:aws:bedrock:*::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
]
}
]
}

Translation Quality

The translation quality is generally high, but there are a few things to keep in mind:

  • Technical terms - The system is configured to preserve technical terms in English.
  • Code blocks - Code blocks are not translated, as they should remain in their original form.
  • Context awareness - The translation model understands the context of the documentation, which helps with technical translations.

Customizing the Translation

You can customize the translation process by modifying the scripts/translate.ts file. Some possible customizations include:

  • Adding support for more languages
  • Changing the translation model
  • Adjusting the prompts used for translation
  • Modifying how documents are split and processed

Troubleshooting

If you encounter issues with the translation process:

  1. Check AWS credentials - Ensure your AWS credentials are properly configured.
  2. Check AWS region - Make sure you’re using a region where AWS Bedrock is available.
  3. Run with verbose output - Use the --verbose flag to see more detailed logs.
  4. Check for rate limiting - AWS Bedrock has rate limits that may affect large translation jobs.