
Data Model

This page documents the data model used by VAMS across Amazon DynamoDB, Amazon S3, and Amazon OpenSearch. It covers table schemas with partition keys, sort keys, and global secondary indexes; S3 bucket organization and key structure; OpenSearch index mappings; and data lifecycle patterns such as archiving and versioning.

Amazon DynamoDB Table Schemas

All Amazon DynamoDB tables use on-demand billing (PAY_PER_REQUEST), point-in-time recovery, and optional AWS KMS customer-managed key encryption. Tables with DynamoDB Streams enabled are indicated below.

Asset Storage Table

Stores the primary record for each asset within a database.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId | String | Partition Key |
| assetId | String | Sort Key |

DynamoDB Streams: NEW_IMAGE

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| BucketIdGSI | bucketId | assetId | Keys Only |
| assetIdGSI | assetId | databaseId | Keys Only |

Common Attributes: assetName, assetType, description, isDistributable, tags, assetLocation, previewLocation, bucketId, createdAt, updatedAt
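The sketch below shows how the key design maps to access patterns. It builds request parameters in the shape accepted by the low-level DynamoDB `Query` API (for example boto3's `client.query`). The table name `AssetStorageTable` and the helper names are hypothetical placeholders; deployed table names are generated by the stack.

```python
from typing import Any, Dict

# Hypothetical placeholder: the deployed table name differs.
ASSET_TABLE = "AssetStorageTable"


def list_assets_in_database(database_id: str) -> Dict[str, Any]:
    """Query the base table: all assets of a database share one partition."""
    return {
        "TableName": ASSET_TABLE,
        "KeyConditionExpression": "databaseId = :db",
        "ExpressionAttributeValues": {":db": {"S": database_id}},
    }


def find_asset_by_id(asset_id: str) -> Dict[str, Any]:
    """Resolve the owning database when only assetId is known.

    assetIdGSI is keys-only, so each result carries just databaseId and
    assetId; the remaining attributes need a GetItem on the base table.
    """
    return {
        "TableName": ASSET_TABLE,
        "IndexName": "assetIdGSI",
        "KeyConditionExpression": "assetId = :aid",
        "ExpressionAttributeValues": {":aid": {"S": asset_id}},
    }
```

With boto3 these would be passed as `boto3.client("dynamodb").query(**list_assets_in_database("my-database"))`; large result sets are paginated through `LastEvaluatedKey`.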

Database Storage Table

Stores database (collection) records.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId | String | Partition Key |

DynamoDB Streams: NEW_IMAGE

Asset Versions Storage Table (V2)

Stores version records for each asset, scoped by database.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId:assetId | String | Partition Key |
| assetVersionId | String | Sort Key |

Common Attributes: versionAlias, comment, isArchived, createdAt, createdBy
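The V2 tables join identifiers with `:` to form a single key value. A minimal sketch of building and parsing such keys, assuming that `databaseId`, `assetId`, and `assetVersionId` never contain a colon themselves (only a trailing component such as a file path might). The helper names are hypothetical.

```python
from typing import List


def make_composite_key(*parts: str) -> str:
    """Join key components with ':' (for example databaseId:assetId)."""
    if not parts or any(p == "" for p in parts):
        raise ValueError("composite key components must be non-empty")
    return ":".join(parts)


def split_composite_key(key: str, n_parts: int) -> List[str]:
    """Split a composite key into exactly n_parts components.

    Only the first n_parts - 1 colons act as separators, so the final
    component (for example a file path) may itself contain ':'.
    """
    parts = key.split(":", n_parts - 1)
    if len(parts) != n_parts:
        raise ValueError(f"expected {n_parts} components in {key!r}")
    return parts
```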

Asset File Versions Storage Table (V2)

Stores file records per asset version.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId:assetId:assetVersionId | String | Partition Key |
| fileKey | String | Sort Key |

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| databaseIdAssetIdIndex | databaseId:assetId | -- | ALL |
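The base table answers "which files belong to this version", while `databaseIdAssetIdIndex` answers "which file records exist across all versions of this asset". A sketch of both queries, assuming the key attributes are literally named `databaseId:assetId:assetVersionId` and `databaseId:assetId` as the tables suggest; the table name and helpers are hypothetical.

```python
from typing import Any, Dict

FILE_VERSIONS_TABLE = "AssetFileVersionsStorageTable"  # hypothetical name


def _pk_query(attr_name: str, value: str, index: str = "") -> Dict[str, Any]:
    # Attribute names containing ':' must be aliased through
    # ExpressionAttributeNames rather than written into the expression.
    params: Dict[str, Any] = {
        "TableName": FILE_VERSIONS_TABLE,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": attr_name},
        "ExpressionAttributeValues": {":pk": {"S": value}},
    }
    if index:
        params["IndexName"] = index
    return params


def files_in_version(database_id: str, asset_id: str,
                     version_id: str) -> Dict[str, Any]:
    """All file records captured by one asset version (base table)."""
    return _pk_query("databaseId:assetId:assetVersionId",
                     f"{database_id}:{asset_id}:{version_id}")


def files_across_versions(database_id: str, asset_id: str) -> Dict[str, Any]:
    """File records from every version of one asset (GSI)."""
    return _pk_query("databaseId:assetId", f"{database_id}:{asset_id}",
                     index="databaseIdAssetIdIndex")
```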

Asset File Metadata Versions Storage Table

Stores metadata snapshots per asset version for point-in-time metadata recovery.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId:assetId:assetVersionId | String | Partition Key |
| type:filePath:metadataKey | String | Sort Key |

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| databaseIdAssetIdIndex | databaseId:assetId | -- | ALL |

Asset Uploads Storage Table

Tracks in-progress file uploads.

| Attribute | Type | Key |
| --- | --- | --- |
| uploadId | String | Partition Key |
| assetId | String | Sort Key |

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| AssetIdGSI | assetId | uploadId | Keys Only |
| DatabaseIdGSI | databaseId | uploadId | Keys Only |
| UserIdGSI | UserId | createdAt | Keys Only |

Database Metadata Storage Table (V2)

Stores metadata key-value pairs at the database level.

| Attribute | Type | Key |
| --- | --- | --- |
| metadataKey | String | Partition Key |
| databaseId | String | Sort Key |

DynamoDB Streams: NEW_IMAGE

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| DatabaseIdIndex | databaseId | metadataKey | ALL |
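Because the base table is partitioned by `metadataKey`, reading one known key for a database is a direct `GetItem` on the full primary key, while listing every key of a database goes through `DatabaseIdIndex`. A sketch of both request shapes; the table name and helper names are hypothetical.

```python
from typing import Any, Dict

DB_METADATA_TABLE = "DatabaseMetadataStorageTable"  # hypothetical name


def get_one_key(database_id: str, metadata_key: str) -> Dict[str, Any]:
    """GetItem parameters: the primary key is (metadataKey, databaseId)."""
    return {
        "TableName": DB_METADATA_TABLE,
        "Key": {
            "metadataKey": {"S": metadata_key},
            "databaseId": {"S": database_id},
        },
    }


def all_keys_for_database(database_id: str) -> Dict[str, Any]:
    """Query DatabaseIdIndex to list every metadata entry of one database."""
    return {
        "TableName": DB_METADATA_TABLE,
        "IndexName": "DatabaseIdIndex",
        "KeyConditionExpression": "databaseId = :db",
        "ExpressionAttributeValues": {":db": {"S": database_id}},
    }
```

The key order also means a plain query on the base table with `metadataKey = :k` returns every database that defines that key.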

Asset File Metadata Storage Table (V2)

Stores metadata key-value pairs at the file level within an asset.

| Attribute | Type | Key |
| --- | --- | --- |
| metadataKey | String | Partition Key |
| databaseId:assetId:filePath | String | Sort Key |

DynamoDB Streams: NEW_IMAGE

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| DatabaseIdAssetIdFilePathIndex | databaseId:assetId:filePath | metadataKey | ALL |
| DatabaseIdAssetIdIndex | databaseId:assetId | metadataKey | ALL |
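The two GSIs serve different scopes: one file, or the whole asset. A sketch that picks the narrower index when a file path is given, assuming the GSI key attributes are literally named as in the table; the table name and helper are hypothetical. The File Attribute Storage Table has the same index layout keyed by `attributeKey`, so the same pattern would apply there.

```python
from typing import Any, Dict, Optional

FILE_METADATA_TABLE = "AssetFileMetadataStorageTable"  # hypothetical name


def file_metadata_query(database_id: str, asset_id: str,
                        file_path: Optional[str] = None) -> Dict[str, Any]:
    """Query metadata for one file, or for the whole asset if no path."""
    if file_path is None:
        index = "DatabaseIdAssetIdIndex"
        attr, value = "databaseId:assetId", f"{database_id}:{asset_id}"
    else:
        index = "DatabaseIdAssetIdFilePathIndex"
        attr = "databaseId:assetId:filePath"
        value = f"{database_id}:{asset_id}:{file_path}"
    return {
        "TableName": FILE_METADATA_TABLE,
        "IndexName": index,
        "KeyConditionExpression": "#pk = :pk",
        # ':' in the attribute name requires an expression attribute name.
        "ExpressionAttributeNames": {"#pk": attr},
        "ExpressionAttributeValues": {":pk": {"S": value}},
    }
```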

File Attribute Storage Table (V2)

Stores system-generated file attributes (distinct from user-defined metadata).

| Attribute | Type | Key |
| --- | --- | --- |
| attributeKey | String | Partition Key |
| databaseId:assetId:filePath | String | Sort Key |

DynamoDB Streams: NEW_IMAGE

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| DatabaseIdAssetIdFilePathIndex | databaseId:assetId:filePath | attributeKey | ALL |
| DatabaseIdAssetIdIndex | databaseId:assetId | attributeKey | ALL |

Metadata Schema Storage Table (V2)

Defines metadata schemas that govern which metadata keys are expected for a given entity type.

| Attribute | Type | Key |
| --- | --- | --- |
| metadataSchemaId | String | Partition Key |
| databaseId:metadataEntityType | String | Sort Key |

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| DatabaseIdMetadataEntityTypeIndex | databaseId:metadataEntityType | metadataSchemaId | ALL |
| MetadataEntityTypeIndex | metadataEntityType | metadataSchemaId | ALL |
| DatabaseIdIndex | databaseId | metadataSchemaId | ALL |
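Each GSI corresponds to one filter combination. A small sketch of choosing the index from the filters a caller supplies; the function name and the entity-type values used in the example (`"file"`, `"asset"`) are hypothetical.

```python
from typing import Optional, Tuple


def schema_index(database_id: Optional[str],
                 entity_type: Optional[str]) -> Tuple[str, str, str]:
    """Return (IndexName, partition-key attribute, partition-key value)."""
    if database_id and entity_type:
        return ("DatabaseIdMetadataEntityTypeIndex",
                "databaseId:metadataEntityType",
                f"{database_id}:{entity_type}")
    if entity_type:
        return ("MetadataEntityTypeIndex", "metadataEntityType", entity_type)
    if database_id:
        return ("DatabaseIdIndex", "databaseId", database_id)
    raise ValueError("filter by databaseId, metadataEntityType, or both")
```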

Stores directional relationships between assets (parent, child, related).

| Attribute | Type | Key |
| --- | --- | --- |
| assetLinkId | String | Partition Key |

DynamoDB Streams: NEW_IMAGE

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| fromAssetGSI | fromAssetDatabaseId:fromAssetId | toAssetDatabaseId:toAssetId | Keys Only |
| toAssetGSI | toAssetDatabaseId:toAssetId | fromAssetDatabaseId:fromAssetId | Keys Only |
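The paired GSIs make links traversable in both directions: outgoing links through `fromAssetGSI`, incoming links through `toAssetGSI`. A sketch assuming the GSI key attributes are literally named as in the table; the table name and helper are hypothetical.

```python
from typing import Any, Dict

LINKS_TABLE = "AssetLinksStorageTable"  # hypothetical name


def links_query(database_id: str, asset_id: str,
                outgoing: bool) -> Dict[str, Any]:
    """Outgoing links use fromAssetGSI; incoming links use toAssetGSI."""
    side = "fromAsset" if outgoing else "toAsset"
    return {
        "TableName": LINKS_TABLE,
        "IndexName": f"{side}GSI",
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": f"{side}DatabaseId:{side}Id"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"{database_id}:{asset_id}"}
        },
    }
```

Both GSIs are keys-only, so results contain `assetLinkId` plus the index keys; any other link attributes require a follow-up read on the base table.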

Stores metadata attached to asset relationships.

| Attribute | Type | Key |
| --- | --- | --- |
| assetLinkId | String | Partition Key |
| metadataKey | String | Sort Key |

DynamoDB Streams: NEW_IMAGE

Pipeline Storage Table

Stores pipeline definitions scoped to a database.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId | String | Partition Key |
| pipelineId | String | Sort Key |

Workflow Storage Table

Stores workflow definitions scoped to a database.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId | String | Partition Key |
| workflowId | String | Sort Key |

Workflow Executions Storage Table

Stores individual workflow execution records.

| Attribute | Type | Key |
| --- | --- | --- |
| databaseId:assetId | String | Partition Key |
| executionId | String | Sort Key |

Local Secondary Indexes:

| LSI Name | Sort Key |
| --- | --- |
| WorkflowLSI | workflowDatabaseId:workflowId |

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| WorkflowGSI | workflowDatabaseId:workflowId | executionId | Keys Only |
| ExecutionIdGSI | workflowId | executionId | Keys Only |
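The LSI keeps the asset partition but swaps in the workflow as the sort key, so it answers "runs of this workflow on this asset"; `WorkflowGSI` answers "runs of this workflow on any asset". A sketch of both, assuming the key attributes are literally named as in the tables; the table name and helpers are hypothetical.

```python
from typing import Any, Dict

EXECUTIONS_TABLE = "WorkflowExecutionsStorageTable"  # hypothetical name


def runs_on_asset(database_id: str, asset_id: str,
                  wf_database_id: str, workflow_id: str) -> Dict[str, Any]:
    """WorkflowLSI: same partition (databaseId:assetId), alternate sort key."""
    return {
        "TableName": EXECUTIONS_TABLE,
        "IndexName": "WorkflowLSI",
        "KeyConditionExpression": "#pk = :pk AND #sk = :sk",
        "ExpressionAttributeNames": {
            "#pk": "databaseId:assetId",
            "#sk": "workflowDatabaseId:workflowId",
        },
        "ExpressionAttributeValues": {
            ":pk": {"S": f"{database_id}:{asset_id}"},
            ":sk": {"S": f"{wf_database_id}:{workflow_id}"},
        },
    }


def runs_of_workflow(wf_database_id: str, workflow_id: str) -> Dict[str, Any]:
    """WorkflowGSI: every execution of one workflow, across all assets."""
    return {
        "TableName": EXECUTIONS_TABLE,
        "IndexName": "WorkflowGSI",
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "workflowDatabaseId:workflowId"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"{wf_database_id}:{workflow_id}"}
        },
    }
```

An LSI must share the base table's partition key, which is why the cross-asset view needs a GSI; in exchange, LSI queries can be strongly consistent, which GSI queries cannot.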

Authorization Tables

Constraints Storage Table

| Attribute | Type | Key |
| --- | --- | --- |
| constraintId | String | Partition Key |

Global Secondary Indexes:

| GSI Name | Partition Key | Sort Key | Projection |
| --- | --- | --- | --- |
| GroupPermissionsIndex | groupId | objectType | ALL |
| UserPermissionsIndex | userId | objectType | ALL |
| ObjectTypeIndex | objectType | constraintId | ALL |
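Since `objectType` is the sort key of `GroupPermissionsIndex`, a lookup of a group's constraints can be narrowed to one object type inside the key condition rather than with a filter. A sketch; the table name, helper, and the example value `"asset"` are hypothetical.

```python
from typing import Any, Dict, Optional

CONSTRAINTS_TABLE = "ConstraintsStorageTable"  # hypothetical name


def group_constraints(group_id: str,
                      object_type: Optional[str] = None) -> Dict[str, Any]:
    """Constraints for one group, optionally narrowed to an object type."""
    params: Dict[str, Any] = {
        "TableName": CONSTRAINTS_TABLE,
        "IndexName": "GroupPermissionsIndex",
        "KeyConditionExpression": "groupId = :g",
        "ExpressionAttributeValues": {":g": {"S": group_id}},
    }
    if object_type is not None:
        # objectType is the index sort key, so it joins the key condition.
        params["KeyConditionExpression"] += " AND objectType = :t"
        params["ExpressionAttributeValues"][":t"] = {"S": object_type}
    return params
```

`UserPermissionsIndex` has the same shape keyed by `userId`.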

Auth Entities Storage Table

| Attribute | Type | Key |
| --- | --- | --- |
| entityType | String | Partition Key |
| sk | String | Sort Key |

Other Authorization Tables

| Table | Partition Key | Sort Key | Notes |
| --- | --- | --- | --- |
| RolesStorageTable | roleName | -- | |
| UserRolesStorageTable | userId | roleName | |
| UserStorageTable | userId | -- | |
| ApiKeyStorageTable | apiKeyId | -- | GSIs: apiKeyHashIndex, userIdIndex |

Classification Tables

| Table | Partition Key | Sort Key |
| --- | --- | --- |
| TagStorageTable | tagName | -- |
| TagTypeStorageTable | tagTypeName | -- |
| SubscriptionsStorageTable | eventName | entityName_entityId |
| CommentStorageTable | assetId | assetVersionId:commentId |

Configuration Tables

| Table | Partition Key | Sort Key | Notes |
| --- | --- | --- | --- |
| AppFeatureEnabledStorageTable | featureName | -- | |
| S3AssetBucketsStorageTable | bucketId | bucketName:baseAssetsPrefix | GSI: bucketNameGSI |

Amazon S3 Bucket Organization

Asset Buckets

Asset buckets store all user-uploaded files and pipeline-generated outputs. Each bucket has Amazon S3 versioning enabled and uses the following key structure:

{baseAssetsPrefix}{assetId}/{relative_path}/{filename}

Where:

  • baseAssetsPrefix is the configured key prefix for the bucket (default /, meaning the bucket root)
  • assetId is the unique asset identifier
  • relative_path is zero or more subdirectory levels within the asset
  • filename is the actual file name
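A minimal sketch of composing an object key from these parts. It assumes a base prefix of `/` (the default) stands for the bucket root and therefore contributes nothing to the key, and that a non-root prefix should end with a single `/`; the function name is hypothetical.

```python
def asset_object_key(base_prefix: str, asset_id: str,
                     relative_path: str, filename: str) -> str:
    """Compose {baseAssetsPrefix}{assetId}/{relative_path}/{filename}."""
    # Assumption: "/" (or empty) means the bucket root.
    prefix = "" if base_prefix in ("", "/") else base_prefix.rstrip("/") + "/"
    # relative_path may hold zero or more directory levels.
    subdirs = [p for p in relative_path.split("/") if p]
    return prefix + "/".join([asset_id, *subdirs, filename])
```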

File Output Conventions

Pipeline outputs follow specific naming conventions within the asset key structure:

| Output Type | Key Pattern | Example |
| --- | --- | --- |
| Preview file | {assetId}/{relative_path}/{filename}.previewFile.{ext} | xd130a6d.../test/pump.e57.previewFile.gif |
| Asset preview | {assetId}/preview.{ext} | xd130a6d.../preview.jpg |
| Metadata output | {assetId}/{relative_path}/metadata.json | xd130a6d.../test/metadata.json |

Preserving Relative Paths

When pipelines write output files adjacent to input files, the relative subdirectory path within the asset must be preserved. The process-output step expects outputs at the same relative location as the input file.
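Deriving each output key from the input file's key, rather than from the asset root, is what keeps the relative path intact. A sketch of the naming conventions in the table above, with keys shown relative to `baseAssetsPrefix` as in the table; the helper names are hypothetical.

```python
import posixpath


def preview_file_key(input_key: str, ext: str) -> str:
    """Per-file preview written beside its input: <input>.previewFile.<ext>."""
    return f"{input_key}.previewFile.{ext}"


def metadata_output_key(input_key: str) -> str:
    """metadata.json in the same relative directory as the input file."""
    return posixpath.join(posixpath.dirname(input_key), "metadata.json")


def asset_preview_key(asset_id: str, ext: str) -> str:
    """Asset-level preview image at the asset root."""
    return f"{asset_id}/preview.{ext}"
```

For example, `metadata_output_key("x1/test/pump.e57")` yields `x1/test/metadata.json`, matching the Metadata output row.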

Auxiliary Bucket

The auxiliary bucket stores non-versioned working files and special viewer data:

{assetId}/{viewer_type}/{generated_files}

Common uses:

  • Potree octree data for point cloud visualization
  • Temporary pipeline processing files
  • Pipeline intermediate outputs

Web App Bucket

Stores the built React frontend's static assets and serves as the origin for either Amazon CloudFront or an Application Load Balancer.

Artefacts Bucket

Stores template notebooks and deployment artefacts. Populated at deploy time from infra/lib/artefacts/.

Access Logs Bucket

Stores Amazon S3 server access logs from the other buckets, plus AWS CloudTrail logs when CloudTrail is enabled, with a 90-day lifecycle expiration. Each source writes to its own prefix:

  • asset-bucket-logs/
  • assetAuxiliary-bucket-logs/
  • artefacts-bucket-logs/
  • cloudtrail-logs/ (when AWS CloudTrail is enabled)

Amazon OpenSearch Index Schemas

VAMS uses a dual-index architecture in Amazon OpenSearch, with separate asset and file indexes.

Dynamic Field Naming Convention

All indexed fields follow a type-prefix naming convention:

| Prefix | OpenSearch Type | Example |
| --- | --- | --- |
| str_ | text with keyword sub-field | str_assetname, str_databaseid |
| num_ | long | num_filesize |
| bool_ | boolean | bool_archived |
| date_ | date | date_lastmodified |
| list_ | text with keyword sub-field | list_tags |
| gp_ | geo_point | gp_location (from metadata) |
| gs_ | text (JSON string) | gs_properties (from metadata) |
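To make the convention concrete, here is a hypothetical helper that derives a prefixed field name from a Python value. It assumes names are lowercased (as in `str_assetname`) and covers only the prefixes that follow from plain value types; `gp_` and `gs_` depend on metadata types and are left out.

```python
from datetime import datetime
from typing import Any


def prefixed_field(name: str, value: Any) -> str:
    """Map a value to a type-prefixed, lowercased index field name."""
    n = name.lower()
    if isinstance(value, bool):  # check before int: bool subclasses int
        return f"bool_{n}"
    if isinstance(value, int):
        return f"num_{n}"
    if isinstance(value, datetime):
        return f"date_{n}"
    if isinstance(value, (list, tuple)):
        return f"list_{n}"
    if isinstance(value, str):
        return f"str_{n}"
    raise TypeError(f"no prefix convention for {type(value).__name__}")
```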

Asset Index Schema

The asset index stores one document per asset.

Document ID: {databaseId}:{assetId}

| Field | Type | Description |
| --- | --- | --- |
| str_databaseid | text + keyword | Database identifier |
| str_assetid | text + keyword | Asset identifier |
| str_assetname | text + keyword | Asset display name |
| str_assettype | text + keyword | Asset type classification |
| str_description | text + keyword | Asset description |
| str_bucketid | text + keyword | Associated bucket identifier |
| str_bucketname | text + keyword | Bucket name |
| str_bucketprefix | text + keyword | Bucket prefix |
| str_asset_version_id | text + keyword | Current version identifier |
| str_asset_version_comment | text + keyword | Version comment |
| str_assetlocationkey | text + keyword | S3 key from asset's assetLocation |
| str_previewfilekey | text + keyword | S3 key of asset preview image |
| bool_isdistributable | boolean | Whether asset is distributable |
| list_tags | text + keyword | Asset tags |
| date_asset_version_createdate | date | Version creation timestamp |
| bool_has_asset_children | boolean | Has child assets |
| bool_has_asset_parents | boolean | Has parent assets |
| bool_has_assets_related | boolean | Has related assets |
| bool_archived | boolean | Archive status (#deleted marker) |
| MD_ | flat_object | Dynamic metadata fields |
| _rectype | keyword | Always "asset" |

File Index Schema

The file index stores one document per file within an asset.

Document ID: {databaseId}:{assetId}:{fileKey}

| Field | Type | Description |
| --- | --- | --- |
| str_key | text + keyword | Full S3 file path (relative to bucket) |
| str_databaseid | text + keyword | Database identifier |
| str_assetid | text + keyword | Asset identifier |
| str_assetname | text + keyword | Parent asset name |
| str_bucketid | text + keyword | Bucket identifier |
| str_bucketname | text + keyword | Bucket name |
| str_bucketprefix | text + keyword | Bucket prefix |
| str_fileext | text + keyword | File extension |
| str_etag | text + keyword | Amazon S3 ETag |
| str_s3_version_id | text + keyword | Amazon S3 version identifier |
| str_previewfilekey | text + keyword | S3 key of associated preview file |
| date_lastmodified | date | Last modification timestamp |
| num_filesize | long | File size in bytes |
| bool_archived | boolean | Archive status (delete marker present) |
| list_tags | text + keyword | Tags inherited from parent asset |
| MD_ | flat_object | Dynamic metadata fields |
| AB_ | flat_object | Dynamic attribute fields |
| _rectype | keyword | Always "file" |
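Because document IDs are derived from the DynamoDB keys, re-indexing the same asset or file overwrites the existing document instead of creating a duplicate. A sketch building the IDs and a minimal file document using a subset of the fields above; the helper names are hypothetical and a real document would carry more fields.

```python
from typing import Any, Dict, Tuple


def asset_doc_id(database_id: str, asset_id: str) -> str:
    """Asset index document ID: {databaseId}:{assetId}."""
    return f"{database_id}:{asset_id}"


def file_document(database_id: str, asset_id: str, file_key: str,
                  size: int, archived: bool) -> Tuple[str, Dict[str, Any]]:
    """Return (document ID, body) for a minimal file-index document."""
    doc_id = f"{database_id}:{asset_id}:{file_key}"
    body = {
        "str_key": file_key,
        "str_databaseid": database_id,
        "str_assetid": asset_id,
        "num_filesize": size,
        "bool_archived": archived,
        "_rectype": "file",
    }
    return doc_id, body
```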

Dynamic Templates

Both indexes use OpenSearch dynamic templates to handle fields that follow the type-prefix convention but are not explicitly mapped:

```json
{
  "dynamic_templates": [
    {
      "core_strings": {
        "match": "str_*",
        "mapping": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }
      }
    },
    { "core_numeric": { "match": "num_*", "mapping": { "type": "long" } } },
    { "core_boolean": { "match": "bool_*", "mapping": { "type": "boolean" } } },
    { "core_dates": { "match": "date_*", "mapping": { "type": "date" } } },
    {
      "core_lists": {
        "match": "list_*",
        "mapping": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }
      }
    }
  ]
}
```

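OpenSearch evaluates dynamic templates in order and applies the first one whose pattern matches a newly seen field; explicitly mapped fields such as `MD_` never reach the templates. An approximation in Python, using `fnmatchcase` for the `*` wildcard (it also accepts `?` and `[...]`, which these patterns do not use). The names `TEMPLATES` and `resolve_template` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional, Tuple

_TEXT_KW = {"type": "text", "fields": {"keyword": {"type": "keyword"}}}

# (template name, match pattern, mapping), in the order defined above.
TEMPLATES: List[Tuple[str, str, Dict[str, Any]]] = [
    ("core_strings", "str_*", _TEXT_KW),
    ("core_numeric", "num_*", {"type": "long"}),
    ("core_boolean", "bool_*", {"type": "boolean"}),
    ("core_dates", "date_*", {"type": "date"}),
    ("core_lists", "list_*", _TEXT_KW),
]


def resolve_template(field: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return the first template matching the field name, if any."""
    for name, pattern, mapping in TEMPLATES:
        if fnmatchcase(field, pattern):
            return name, mapping
    return None
```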
Flat Object Fields for Metadata and Attributes

The MD_ and AB_ fields use the OpenSearch flat_object type, which stores all dynamic metadata and attribute key-value pairs within a single field. This prevents the mapping explosion that would occur if each metadata key created a new top-level field in the index mapping.

Excluded Fields

Fields prefixed with VAMS_ or _ (except _rectype) are excluded from indexing. These are internal system fields not intended for search.
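Taken together with the flat_object approach, preparing a document for indexing amounts to dropping internal fields and nesting all metadata under one field. A sketch of that step; the helper names and the example field names `VAMS_internal` and `_tmp` are hypothetical.

```python
from typing import Any, Dict


def is_excluded(field: str) -> bool:
    """VAMS_* and _* fields are internal, except the _rectype field."""
    if field == "_rectype":
        return False
    return field.startswith("VAMS_") or field.startswith("_")


def to_index_body(fields: Dict[str, Any],
                  metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Drop excluded fields and nest all metadata under the MD_ field."""
    body = {k: v for k, v in fields.items() if not is_excluded(k)}
    # One flat_object field, however many distinct metadata keys exist.
    body["MD_"] = dict(metadata)
    return body
```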

Archived Data Pattern

VAMS uses a #deleted suffix on the databaseId partition key to mark archived assets:

```
Active asset:   PK = "my-database",         SK = "asset-123"
Archived asset: PK = "my-database#deleted", SK = "asset-123"
```

This pattern allows efficient queries for either active or archived assets using the partition key, without requiring a secondary index or scan filter.
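A sketch of the suffix handling and of the query for either partition; the helper names and the table name are hypothetical. Because DynamoDB does not allow a primary-key attribute to be updated in place, moving an asset between the active and archived partitions requires writing a new item and deleting the old one; one way to do that atomically is a single `TransactWriteItems` call.

```python
from typing import Any, Dict

ARCHIVE_SUFFIX = "#deleted"


def archived_partition(database_id: str) -> str:
    """Partition key under which a database's archived assets live."""
    return database_id + ARCHIVE_SUFFIX


def active_partition(partition_key: str) -> str:
    """Strip the suffix if present (str.removesuffix needs Python 3.9+)."""
    return partition_key.removesuffix(ARCHIVE_SUFFIX)


def list_assets(database_id: str, archived: bool = False) -> Dict[str, Any]:
    """Query active or archived assets with a plain key condition."""
    pk = archived_partition(database_id) if archived else database_id
    return {
        "TableName": "AssetStorageTable",  # hypothetical name
        "KeyConditionExpression": "databaseId = :db",
        "ExpressionAttributeValues": {":db": {"S": pk}},
    }
```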

In the OpenSearch indexes, archived assets and files are indicated by the bool_archived field set to true.

Versioning Data Model

VAMS implements a versioning system that combines Amazon S3 object versioning with Amazon DynamoDB version records:

Version Lifecycle

  1. Create Version: A new record is inserted into the Asset Versions table with a unique assetVersionId. File records are captured in the Asset File Versions table, each referencing the Amazon S3 object version ID at that point in time.
  2. Update Version: The version's versionAlias and comment fields can be updated.
  3. Archive Version: The version record's isArchived flag is set to true. The asset's databaseId in the main Asset Storage table gains the #deleted suffix.
  4. Unarchive Version: The isArchived flag is reverted and the #deleted suffix is removed from the databaseId.
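Step 1 is the heart of the model: a version is a record plus a list of pointers to specific S3 object versions. A minimal in-memory sketch of the records it produces, assuming the key attributes are literally named as in the table schemas; the attribute `s3VersionId`, the UUID-based ID, and the function name are hypothetical. The input would map each current file key to its latest S3 `VersionId` (for example, the entry flagged `IsLatest` in a `ListObjectVersions` response).

```python
import uuid
from typing import Any, Dict, List, Tuple


def create_version(database_id: str, asset_id: str,
                   s3_versions: Dict[str, str], comment: str = ""
                   ) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Build one version record plus one file record per current file."""
    version_id = str(uuid.uuid4())  # assumption: any unique ID scheme
    version = {
        "databaseId:assetId": f"{database_id}:{asset_id}",
        "assetVersionId": version_id,
        "comment": comment,
        "isArchived": False,
    }
    files = [
        {
            "databaseId:assetId:assetVersionId":
                f"{database_id}:{asset_id}:{version_id}",
            "fileKey": file_key,
            "s3VersionId": s3_version,  # hypothetical attribute name
        }
        for file_key, s3_version in sorted(s3_versions.items())
    ]
    return version, files
```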

Metadata Version Snapshots

The Asset File Metadata Versions table captures a snapshot of all metadata and attribute values at the time a version is created. The composite sort key type:filePath:metadataKey allows querying metadata for a specific file within a specific version, or all metadata across all files in a version.
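Both access patterns reduce to one `Query` on the version's partition, optionally narrowed with `begins_with` on the sort key. A sketch, assuming the key attributes are literally named as in the table schema; the table name, helper, and the `record_type` value `"metadata"` in the example are hypothetical. Since the sort key starts with `type`, narrowing to a file also requires the type.

```python
from typing import Any, Dict, Optional

META_VERSIONS_TABLE = "AssetFileMetadataVersionsStorageTable"  # hypothetical


def version_metadata_query(database_id: str, asset_id: str, version_id: str,
                           record_type: Optional[str] = None,
                           file_path: Optional[str] = None) -> Dict[str, Any]:
    """Query a version's metadata snapshot, optionally by sort-key prefix."""
    if file_path is not None and record_type is None:
        raise ValueError("sort key starts with type; file filter needs it")
    params: Dict[str, Any] = {
        "TableName": META_VERSIONS_TABLE,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": "databaseId:assetId:assetVersionId"},
        "ExpressionAttributeValues": {
            ":pk": {"S": f"{database_id}:{asset_id}:{version_id}"}
        },
    }
    if record_type is not None:
        # Trailing ':' stops "dir/a" from also matching "dir/ab".
        prefix = f"{record_type}:"
        if file_path is not None:
            prefix += f"{file_path}:"
        params["KeyConditionExpression"] += " AND begins_with(#sk, :prefix)"
        params["ExpressionAttributeNames"]["#sk"] = "type:filePath:metadataKey"
        params["ExpressionAttributeValues"][":prefix"] = {"S": prefix}
    return params
```

Calling it with only the three identifiers returns every metadata entry across all files in the version.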

Next Steps