
Security Model

This section describes the security model that the Lakechain framework applies to every document processing pipeline.

👮 Least-Privilege Principle

In Lakechain, middlewares are only granted IAM read access to documents provided by their immediate predecessors in the pipeline.

To illustrate how this principle applies in practice, let's take a simple visual example. Below is an illustration of a pipeline composed of two middlewares (M1 and M2) triggered by document events from S3. The diagram highlights the IAM grants of each middleware relative to the documents it processes.

[Figure: Access Policies]

M1 has read-only access to the initial document in the S3 bucket: it can read the document, but not modify it. Similarly, M2 has read-only access to the document produced by M1. However, M2 does not have access to the original document in the bucket, nor does M1 have access to documents produced by M2.
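The rule above can be sketched as a small model. This is only an illustration of the access rule, not Lakechain's implementation: the `Pipeline` type and the `canRead` function are hypothetical names introduced here, and the real enforcement is done through IAM grants.

```typescript
// Hypothetical model of the least-privilege rule (not a Lakechain API):
// a middleware may only read documents produced by its immediate predecessors.
interface Pipeline {
  // Maps each middleware to the producers directly connected to it.
  predecessors: Record<string, string[]>;
}

function canRead(pipeline: Pipeline, reader: string, producer: string): boolean {
  const direct = pipeline.predecessors[reader] || [];
  return direct.indexOf(producer) !== -1;
}

// The pipeline from the example: S3 -> M1 -> M2.
const pipeline: Pipeline = {
  predecessors: {
    M1: ['S3'],
    M2: ['M1'],
  },
};

console.log(canRead(pipeline, 'M1', 'S3')); // true: M1 reads the initial document
console.log(canRead(pipeline, 'M2', 'M1')); // true: M2 reads what M1 produced
console.log(canRead(pipeline, 'M2', 'S3')); // false: S3 is not M2's immediate predecessor
console.log(canRead(pipeline, 'M1', 'M2')); // false: M1 cannot read M2's output
```

Note that access is not transitive in this model: being downstream of a producer is not enough, the producer has to be a direct predecessor.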



Cache Access Policy

While middlewares can only access documents provided by their immediate predecessors, they can still share metadata with each other using the Lakechain cache storage. This storage is implemented on top of S3, and is used to store metadata that is too large to fit in a Cloud Event.

The cache storage is readable by all middlewares in a pipeline. However, each middleware is only given write access to its own specific prefix within the cache storage, and cannot modify or delete metadata produced by other middlewares.

Below is an example of a middleware (M1) storing metadata in the cache storage within its own prefix, and a middleware (M2) reading it.

[Figure: Cache Read-Write]

👮 If M1 attempts to write to a location other than its own dedicated prefix, the write operation will fail.

[Figure: Cache Write Failure]
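As a rough sketch of the cache access policy described above, the following builds IAM-style statements that allow reads on the whole cache bucket and writes only under one middleware's prefix. The bucket name, the prefix layout, and the `cachePolicyFor` and `canWrite` helpers are assumptions made for illustration; the statements Lakechain actually generates may differ.

```typescript
// Shape of an IAM policy statement, reduced to the fields used here.
interface PolicyStatement {
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string[];
}

// Hypothetical helper: read access to the whole cache,
// write and delete access only under the middleware's own prefix.
function cachePolicyFor(bucket: string, prefix: string): PolicyStatement[] {
  const bucketArn = `arn:aws:s3:::${bucket}`;
  return [
    { Effect: 'Allow', Action: ['s3:GetObject'], Resource: [`${bucketArn}/*`] },
    {
      Effect: 'Allow',
      Action: ['s3:PutObject', 's3:DeleteObject'],
      Resource: [`${bucketArn}/${prefix}/*`],
    },
  ];
}

// Hypothetical check mirroring the write rule. The trailing slash matters:
// without it, prefix "m1" would also match keys under "m10/".
function canWrite(prefix: string, key: string): boolean {
  return key.indexOf(prefix + '/') === 0;
}

const statements = cachePolicyFor('example-cache', 'm1');
console.log(JSON.stringify(statements, null, 2));
console.log(canWrite('m1', 'm1/metadata.json')); // true
console.log(canWrite('m1', 'm2/metadata.json')); // false
```

Because IAM denies by default, no explicit `Deny` statement is needed in this sketch: a write outside the granted prefix simply matches no `Allow` statement.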


🔒 Encryption

Lakechain enforces encryption best practices for data at rest and in transit. Below is a summary of the encryption models implemented per component.

Default Encryption

By default, Lakechain attempts to use AWS managed keys for encryption at rest, and the in-transit encryption capabilities provided by AWS services. The table below describes the encryption defaults associated with each type of service used by middlewares when no Customer Managed Key (CMK) is provided.

| Service | Encryption at Rest | Encryption in Transit |
| --- | --- | --- |
| SQS | ✅ SSE-SQS | ✅ TLS 1.2+ |
| SNS | ❌ (1) | ✅ TLS 1.2+ |
| EFS | ❌ (2) | ✅ TLS 1.2+ |
| DynamoDB | ✅ SSE | ✅ TLS 1.2+ |
| Logs | ✅ SSE | ✅ TLS 1.2+ |
| Cache Storage | ✅ SSE-S3 | ✅ TLS 1.2+ |
| Middleware Storage | ✅ SSE-S3 | ✅ TLS 1.2+ |
| Environment Variables | ❌ (3) | N/A |

(1) SNS does not support default encryption at rest (SSE-SNS) in all AWS regions.

(2) EFS does not support encryption with AWS managed keys in all AWS regions.

(3) Environment variables in AWS Lambda require a CMK to be encrypted.

CMK Encryption

Lakechain also supports the use of Customer Managed Keys (CMK) to encrypt data at rest and create end-to-end encrypted pipelines. When a CMK is provided, Lakechain will use it to encrypt data at rest for all services that support CMK encryption.

| Service | Encryption at Rest | Encryption in Transit |
| --- | --- | --- |
| SQS | ✅ KMS | ✅ TLS 1.2+ |
| SNS | ✅ KMS | ✅ TLS 1.2+ |
| EFS | ✅ KMS | ✅ TLS |
| DynamoDB | ✅ KMS | ✅ TLS 1.2+ |
| Logs | ✅ KMS | ✅ TLS 1.2+ |
| Cache Storage | ✅ KMS | ✅ TLS 1.2+ |
| Middleware Storage | ✅ KMS | ✅ TLS 1.2+ |
| Environment Variables | ✅ KMS | N/A |
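To make the difference between the default and CMK modes easier to audit, the at-rest columns of the two tables above can be encoded as a lookup. This is a minimal sketch that only restates the tables; the `encryptionAtRest` and `unencryptedWithoutCmk` helpers are hypothetical and not part of Lakechain.

```typescript
type Service =
  | 'SQS' | 'SNS' | 'EFS' | 'DynamoDB' | 'Logs'
  | 'Cache Storage' | 'Middleware Storage' | 'Environment Variables';

// At-rest encryption when no CMK is provided (null = not encrypted by default).
const DEFAULT_AT_REST: Record<Service, string | null> = {
  'SQS': 'SSE-SQS',
  'SNS': null,
  'EFS': null,
  'DynamoDB': 'SSE',
  'Logs': 'SSE',
  'Cache Storage': 'SSE-S3',
  'Middleware Storage': 'SSE-S3',
  'Environment Variables': null,
};

// With a CMK, every service listed in the tables uses KMS.
function encryptionAtRest(service: Service, hasCmk: boolean): string | null {
  return hasCmk ? 'KMS' : DEFAULT_AT_REST[service];
}

// Services that stay unencrypted at rest unless a CMK is provided.
function unencryptedWithoutCmk(): Service[] {
  return (Object.keys(DEFAULT_AT_REST) as Service[]).filter(
    (service) => DEFAULT_AT_REST[service] === null,
  );
}

console.log(encryptionAtRest('SQS', false)); // SSE-SQS
console.log(encryptionAtRest('SNS', true)); // KMS
console.log(unencryptedWithoutCmk());
```

The last call lists the services for which providing a CMK changes the outcome from unencrypted to encrypted, which is a useful starting point when deciding whether a pipeline needs one.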


🚧 VPC Endpoints

Customers with specific security and compliance requirements can place any Lakechain middleware in a customer-provided VPC with the appropriate VPC endpoint, subnet, and route table configuration to ensure that traffic between the middleware and AWS services does not leave the AWS network.

💁 Customers are responsible for configuring their VPC. To tell a middleware in which network its compute resources should be placed, pass an instance of an IVpc to the middleware construct.

const middleware = new ExampleMiddleware.Builder()
  .withScope(this)
  .withIdentifier('Example')
  .withCacheStorage(cache)
  .withVpc(vpc) // 👈 Pass the VPC
  .build();

Endpoint Types

The VPC endpoints that you need to enable on your VPC depend on the AWS services used by the middlewares in your pipeline. The table below lists the VPC endpoints for the AWS services that middlewares commonly use.

| Service | Endpoint Type |
| --- | --- |
| AWS SQS | Interface Endpoint |
| AWS SNS | Interface Endpoint |
| AWS CloudWatch Logs | Interface Endpoint |
| AWS S3 | Interface or Gateway Endpoint |
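When planning a VPC, it can help to turn the table above into a checklist for the services a given pipeline uses. The sketch below only restates the table; the `endpointChecklist` helper and the `SomeOtherService` placeholder are hypothetical, and any service not listed here still has to be checked against the relevant middleware documentation.

```typescript
type EndpointType = 'Interface' | 'Gateway';

// Endpoint types for the commonly used services listed in the table above.
const ENDPOINT_TYPES: Record<string, EndpointType[]> = {
  'SQS': ['Interface'],
  'SNS': ['Interface'],
  'CloudWatch Logs': ['Interface'],
  'S3': ['Interface', 'Gateway'],
};

// Hypothetical helper: one checklist line per service used by the pipeline.
function endpointChecklist(services: string[]): string[] {
  return services.map((service) => {
    const types = ENDPOINT_TYPES[service];
    return types
      ? `${service}: ${types.join(' or ')} endpoint`
      : `${service}: not listed here, check the middleware documentation`;
  });
}

// 'SomeOtherService' stands in for any service outside the table.
const checklist = endpointChecklist(['SQS', 'S3', 'SomeOtherService']);
checklist.forEach((line) => console.log(line));
```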

💁 Please read the documentation page of each middleware in your pipeline to understand which AWS services it uses and which VPC endpoints it requires.