CAD/Mesh Metadata Extraction Pipeline

The CAD/Mesh Metadata Extraction pipeline automatically extracts comprehensive metadata from CAD and mesh files, writing the results as file-level attributes in VAMS. It uses CadQuery for CAD file processing and Trimesh for mesh file processing, running as a containerized AWS Lambda function. This pipeline enriches your asset catalog with geometric, structural, and format-specific properties without manual data entry.

Supported Formats

CAD Formats

| Format | Extension | Handler | Key Metadata |
| --- | --- | --- | --- |
| STEP | .step, .stp | CadQuery (Open CASCADE) | Geometry dimensions, assembly hierarchy, volumes, surface areas, shape statistics, units |
| DXF | .dxf | CadQuery | Layer information, 2D drawing data |

Mesh Formats

| Format | Extension | Handler | Key Metadata |
| --- | --- | --- | --- |
| STL | .stl | Trimesh | Triangle count, vertex count, bounding box, model size |
| OBJ | .obj | Trimesh | Polygon count, vertex count, texture references, materials |
| PLY | .ply | Trimesh | Vertex count, vertex colors, normals |
| GLTF | .gltf | Trimesh | Shader info, animation data, texture references |
| GLB | .glb | Trimesh | Shader info, animation data, embedded textures |
| 3MF | .3mf | Trimesh | 3D printing metadata, units |
| XAML | .xaml | Trimesh | Transform matrices, model size |
| 3DXML | .3dxml | Trimesh | Dassault-specific metadata |
| DAE | .dae | Trimesh | Animation data, materials, scene hierarchy |
| XYZ | .xyz | Trimesh | Point count, bounding box |

Architecture

Execution Type

This pipeline uses the Lambda execution type with synchronous invocation. Like the 3D Basic Conversion pipeline, it does not use AWS Step Functions task token callbacks.

Processing Flow

  1. The Lambda function receives the input Amazon S3 URI and the output Amazon S3 metadata path.
  2. The file extension is inspected to determine the handler type: cad for STEP/DXF files, mesh for all other supported formats.
  3. The input file is downloaded from Amazon S3 to the Lambda container's temporary storage.
  4. The appropriate extractor processes the file:
    • CAD extractor: Uses CadQuery to parse geometric details, assembly hierarchy, and material properties.
    • Mesh extractor: Uses Trimesh to extract polygon counts, vertex data, bounding box dimensions, and format-specific metadata.
  5. Extracted metadata is transformed into the VAMS attribute format and saved as a JSON file.
  6. The attribute JSON is uploaded to the output path as a file-level attribute file (<filename>.attribute.json).
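The extension-based routing in step 2 can be sketched as follows. The function name and the exact extension sets are illustrative (taken from the Supported Formats tables above), not the pipeline's actual code:

```python
from pathlib import Path

# Extensions handled by each extractor, per the Supported Formats tables.
CAD_EXTENSIONS = {".step", ".stp", ".dxf"}
MESH_EXTENSIONS = {
    ".stl", ".obj", ".ply", ".gltf", ".glb",
    ".3mf", ".xaml", ".3dxml", ".dae", ".xyz",
}

def select_handler(s3_key: str) -> str:
    """Return 'cad' or 'mesh' based on the input file's extension."""
    ext = Path(s3_key).suffix.lower()
    if ext in CAD_EXTENSIONS:
        return "cad"
    if ext in MESH_EXTENSIONS:
        return "mesh"
    raise ValueError(f"Unsupported file extension: {ext}")
```

For example, `select_handler("assets/bracket.step")` returns `"cad"`, while `select_handler("scan.STL")` returns `"mesh"` (matching is case-insensitive).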

Extracted Metadata

CAD Files (STEP, DXF)

| Category | Fields |
| --- | --- |
| Geometric details | Dimensions (length, width, height), volume, surface area |
| Assembly hierarchy | Component tree, relationships between parts |
| Materials | Material names and properties (if embedded in the file) |
| Shape statistics | Face count, edge count, vertex count per component |
| Units | Unit of measurement from the file header |
| Custom metadata | Top-level node properties |
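To make the "component tree" idea concrete, the sketch below flattens a nested assembly into one path entry per part. The dict structure here is hypothetical, chosen only to illustrate the traversal; CadQuery represents assemblies through its own Assembly objects:

```python
def flatten_assembly(node, path=""):
    """Walk a nested assembly tree (name -> children) depth-first and
    emit one slash-separated component path per part.

    The input shape is a hypothetical dict, not a CadQuery type."""
    full_path = f"{path}/{node['name']}" if path else node["name"]
    entries = [full_path]
    for child in node.get("children", []):
        entries.extend(flatten_assembly(child, full_path))
    return entries

# Illustrative assembly: a gearbox with a housing and a shaft+gear subassembly.
assembly = {
    "name": "gearbox",
    "children": [
        {"name": "housing"},
        {"name": "shaft", "children": [{"name": "gear"}]},
    ],
}
```

Here `flatten_assembly(assembly)` yields `['gearbox', 'gearbox/housing', 'gearbox/shaft', 'gearbox/shaft/gear']`, the kind of flat component listing that fits naturally into key/value attributes.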

Mesh Files (STL, OBJ, PLY, GLB, etc.)

| Category | Fields |
| --- | --- |
| Geometry | Triangle count, vertex count, polygon count |
| Bounding box | Dimensions (X, Y, Z extents), model size |
| Textures | Embedded or referenced texture information |
| Shaders | Shader information (GLTF/GLB formats) |
| Animation | Frame count, duration (if present) |
| Transforms | Rotation, scale, translation matrices |
| Format-specific | DRACO compression info, 3D Tiles data, units |
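As a standalone illustration of where fields like triangle count and bounding-box extents come from, the sketch below reads them directly from a binary STL blob. The binary STL layout is an 80-byte header, a uint32 triangle count, then 50 bytes per triangle (a float32 normal, three float32 vertices, and a 2-byte attribute word). This is only a minimal sketch for one format; ASCII STL needs a text parser, and the pipeline itself relies on Trimesh, which handles both:

```python
import struct

def binary_stl_stats(data: bytes):
    """Return (triangle_count, [x, y, z] extents) for a binary STL blob."""
    count = struct.unpack_from("<I", data, 80)[0]
    mins = [float("inf")] * 3
    maxs = [float("-inf")] * 3
    for i in range(count):
        base = 84 + i * 50 + 12  # skip the 12-byte facet normal
        for v in range(3):
            vertex = struct.unpack_from("<3f", data, base + v * 12)
            for axis, val in enumerate(vertex):
                mins[axis] = min(mins[axis], val)
                maxs[axis] = max(maxs[axis], val)
    extents = [maxs[a] - mins[a] for a in range(3)]
    return count, extents
```

The extents correspond to the "Dimensions (X, Y, Z extents)" row above; model size can be derived from them.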

Configuration

Enable this pipeline in infra/config/config.json:

{
  "app": {
    "pipelines": {
      "useConversionCadMeshMetadataExtraction": {
        "enabled": true,
        "autoRegisterWithVAMS": true,
        "autoRegisterAutoTriggerOnFileUpload": true
      }
    }
  }
}

Configuration Options

| Option | Default | Description |
| --- | --- | --- |
| enabled | false | Deploy the metadata extraction pipeline infrastructure. |
| autoRegisterWithVAMS | true | Automatically register the pipeline and workflow during CDK deployment. |
| autoRegisterAutoTriggerOnFileUpload | true | Automatically trigger the pipeline when supported CAD or mesh files are uploaded. |

Automatic Metadata Enrichment

When both autoRegisterWithVAMS and autoRegisterAutoTriggerOnFileUpload are enabled, every CAD or mesh file uploaded to VAMS is automatically analyzed and enriched with metadata. This provides immediate searchable properties for newly uploaded assets without user intervention.

Output Format

The pipeline produces file-level attribute files in the following JSON structure:

{
  "type": "attribute",
  "updateType": "update",
  "metadata": [
    {
      "metadataKey": "vertex_count",
      "metadataValue": "45230",
      "metadataValueType": "string"
    },
    {
      "metadataKey": "bounding_box_dimensions",
      "metadataValue": "{\"x\": 10.5, \"y\": 8.2, \"z\": 3.1}",
      "metadataValueType": "string"
    }
  ]
}

The attribute file is uploaded to the metadata output path with the naming pattern <original_filename>.attribute.json. The VAMS workflow's process-output step reads this file and writes the attributes to the VAMS metadata system, making them searchable and visible in the web interface.
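A helper that serializes extracted values into this structure could look like the sketch below. The function name is hypothetical; it simply mirrors the JSON shown above, stringifying every value (nested values such as bounding boxes are JSON-encoded, matching the example's "string" metadataValueType):

```python
import json

def to_attribute_json(metadata: dict) -> str:
    """Serialize a flat dict of extracted values into the VAMS
    file-level attribute format shown above."""
    rows = []
    for key, value in metadata.items():
        if isinstance(value, (dict, list)):
            value = json.dumps(value)  # e.g. bounding boxes become JSON strings
        rows.append({
            "metadataKey": key,
            "metadataValue": str(value),
            "metadataValueType": "string",
        })
    return json.dumps(
        {"type": "attribute", "updateType": "update", "metadata": rows},
        indent=2,
    )
```

For instance, `to_attribute_json({"vertex_count": 45230})` produces a document whose single metadata row has metadataValue "45230".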

Prerequisites

No VPC Required

This pipeline runs as a containerized Lambda function and does not require a VPC. It operates independently of the global VPC configuration, making it one of the simplest pipelines to enable.

Container Image

The Lambda container image is built during CDK deployment from backendPipelines/conversion/meshCadMetadataExtraction/lambdaContainer/Dockerfile. It includes:

  • Python 3.12 -- Lambda runtime
  • CadQuery -- CAD file processing (STEP, DXF)
  • Trimesh -- Mesh file processing (STL, OBJ, PLY, GLTF, GLB, etc.)
  • NumPy -- Numerical computation for geometric analysis
  • AWS Lambda Powertools -- Structured logging

Infrastructure Components

| Resource | Service | Purpose |
| --- | --- | --- |
| Container Lambda Function | AWS Lambda | Metadata extraction execution |
| Container Image | Amazon ECR | CadQuery + Trimesh container image |
| Step Functions State Machine | AWS Step Functions | Workflow orchestration |
| Lambda Function (vamsExecute) | AWS Lambda | Pipeline coordination |

Limitations

| Constraint | Details |
| --- | --- |
| Maximum file size | Limited by Lambda container /tmp storage (10 GB) |
| Execution timeout | 15 minutes (Lambda maximum) |
| Read-only extraction | The pipeline reads metadata but does not modify the source file |
| Format fidelity | Metadata depth varies by format; some formats embed richer metadata than others |
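A caller can pre-check the file-size constraint before invoking the pipeline, since the input must fit in the container's /tmp. This is a sketch of such a check, not pipeline behavior; the 10 GB limit comes from the table above, while the headroom value is an illustrative choice:

```python
LAMBDA_TMP_LIMIT_BYTES = 10 * 1024**3  # 10 GB /tmp limit from the table above

def fits_in_tmp(object_size_bytes: int, headroom_bytes: int = 256 * 1024**2) -> bool:
    """Return True if a file of this size can be downloaded to /tmp
    while leaving headroom for scratch files (headroom is illustrative)."""
    return object_size_bytes + headroom_bytes <= LAMBDA_TMP_LIMIT_BYTES
```

The object size can be obtained cheaply from S3 (e.g. via a HEAD request) before triggering the workflow.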