
Pre-requisites

This section outlines the main pre-requisites for using Project Lakechain and deploying pipelines.

💻 Environment

Project Lakechain has been successfully tested on different Linux distributions, macOS, and cloud development environments such as AWS Cloud9 and GitHub Codespaces.

We recommend having 50GB of free storage on your development machine to be able to build and deploy all the middlewares and examples.
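One way to check the available space before building is sketched below. The `free_gb` helper and the choice to inspect the current directory are assumptions made for this example; they are not part of Project Lakechain.

```shell
# Print the free space, in whole GiB, of the filesystem that holds a directory.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gb() {
  df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

required_gb=50
available_gb=$(free_gb .)

if [ "$available_gb" -ge "$required_gb" ]; then
  echo "OK: ${available_gb}GB free"
else
  echo "Warning: ${available_gb}GB free, ${required_gb}GB recommended"
fi
```

Pass a different path to `free_gb` if Docker images or the repository live on another volume.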

👇 We have a ready-made Dev Container for GitHub Codespaces that you can use to get started quickly.

GitHub Codespaces

ℹ️ Tip: We’ve also created a Cloud9 script that you can use in your Cloud9 environment to resize the EBS storage associated with the instance.



☁️ AWS Access

You need valid credentials for an AWS account configured on your development machine. You can use the AWS CLI to verify that your credentials are valid.

ℹ️ The AWS documentation describes how to configure the AWS CLI.

Terminal window
$ aws sts get-caller-identity
{
  "UserId": "USEREXAMPLE",
  "Account": "123456789012",
  "Arn": "arn:aws:iam::123456789012:user/JohnDoe"
}
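The same check can be wrapped in a small function for use in a setup script. This is a minimal sketch: the `check_aws_credentials` name and the messages are made up for this example, while `--query` and `--output text` are standard AWS CLI options.

```shell
# Print the AWS account ID for the active credentials, or fail with a message.
# `--query Account --output text` extracts that single field from the response.
check_aws_credentials() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found on PATH" >&2
    return 1
  fi
  if account_id=$(aws sts get-caller-identity --query Account --output text 2>/dev/null); then
    echo "Credentials are valid for account ${account_id}"
  else
    echo "No valid AWS credentials found" >&2
    return 1
  fi
}

check_aws_credentials || echo "Configure the AWS CLI before deploying"
```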


🐳 Docker

Some middlewares are packaged as Docker containers, so Docker must be installed and running on your development machine. You can use the Docker CLI to verify that you have access to the Docker daemon. When the daemon is reachable, the output also includes a Server section after the Client section.

Terminal window
$ docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:39 2021
 OS/Arch:           darwin/amd64
 Context:           default
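For scripted checks, the exit status of `docker info` is a convenient signal, since the command has to contact the daemon. The sketch below assumes that behavior; the `docker_is_running` helper is a name invented for this example.

```shell
# Succeed only when the Docker CLI exists and the daemon answers.
# `docker info` queries the daemon and exits non-zero when it cannot connect.
docker_is_running() {
  command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1
}

if docker_is_running; then
  echo "Docker daemon is reachable"
else
  echo "Docker is not installed or the daemon is not running"
fi
```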


📦 Node.js + NPM

Node.js and NPM must be available to install the Lakechain project dependencies. You can use the Node.js CLI to verify that you have access to the Node.js runtime.

We recommend using Node.js 18+. You can use nvm to easily manage multiple versions of Node.js on your development machine.

Terminal window
$ node --version
v20.3.1
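To compare the installed version against the 18+ recommendation automatically, the major version can be extracted from the `node --version` output. The `node_major` helper below is a hypothetical name used for this sketch.

```shell
# Extract the major version from a string such as "v20.3.1".
node_major() {
  version=${1#v}          # drop the leading "v"
  echo "${version%%.*}"   # keep everything before the first dot
}

if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node --version)")
  if [ "$major" -ge 18 ]; then
    echo "Node.js major version ${major} meets the 18+ recommendation"
  else
    echo "Node.js major version ${major} is older than the recommended 18+"
  fi
else
  echo "Node.js not found on PATH"
fi
```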


🐍 Python and Pip

Python 3.9+ and Pip are used to package some Lakechain middlewares written in Python. You can use the Python binary to verify that you have a Python 3.9+ runtime up and running.

Terminal window
$ python3 --version
Python 3.11.5
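The version requirement can also be checked by the interpreter itself, which avoids parsing the version string. This is a sketch under the assumption that the interpreter is named `python3`; the `python_is_supported` helper is invented for this example.

```shell
# Succeed when the given interpreter (default: python3) is version 3.9 or newer.
# The comparison runs inside Python itself, using sys.version_info.
python_is_supported() {
  "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 9) else 1)' 2>/dev/null
}

if python_is_supported; then
  echo "Python 3.9+ is available"
else
  echo "python3 is missing or older than 3.9"
fi

# Running pip as a module confirms it belongs to the same interpreter.
python3 -m pip --version 2>/dev/null || echo "pip is not available for python3"
```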


Optional Dependencies

The following dependencies are optional, because Project Lakechain installs them and runs them using npx. We still recommend installing them globally on your development machine:

TypeScript 5.0+

Terminal window
$ npm install -g typescript

AWS CDK v2

Terminal window
$ npm install -g aws-cdk