
Downsize existing FSx Volume

Instructions to downsize an existing FSxL Filesystem using DRA import/export

These steps describe how to migrate from an existing FSx for Lustre (FSxL) filesystem to a smaller one. The process uses an FSxL Data Repository Association (DRA) export policy to back up the existing filesystem to S3, then creates a new, smaller FSxL filesystem with a DRA import policy to pull the old data into the new filesystem, and finally mounts the new filesystem on an existing cluster. For more information, see the FSx for Lustre guide on linking your filesystem to an S3 bucket.

warning

You cannot reduce the storage capacity of an existing FSxL filesystem, so migrating to a smaller filesystem as described below is the alternative.

1. Create a new S3 bucket

It is recommended to create a new S3 bucket in the desired region of your destination filesystem for this purpose.

To create a new bucket using the cli:

```shell
aws s3api create-bucket --bucket <YOUR_DRA_BUCKET> --region <AWS_REGION> --create-bucket-configuration LocationConstraint=<AWS_REGION>
```
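To confirm the bucket was created before continuing (bucket name is the same placeholder as above):

```shell
# Exits 0 if the bucket exists and you have access to it
aws s3api head-bucket --bucket <YOUR_DRA_BUCKET>

# Optionally block all public access on the new bucket
aws s3api put-public-access-block --bucket <YOUR_DRA_BUCKET> \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```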

2. Create a Data Repository Association for your existing FSx

In the FSx Lustre console, go to your existing filesystem > Data Repository Associations > Create. Follow the sample configuration below:


warning

For the filesystem path, it is recommended to set the root path of / to ensure the entire filesystem is included in the DRA policy.
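If you prefer the CLI, the same association can be created with `aws fsx create-data-repository-association`. The filesystem ID and bucket name below are placeholders; the auto-export policy shown is one example configuration:

```shell
aws fsx create-data-repository-association \
    --file-system-id <OLD_FS_ID> \
    --file-system-path / \
    --data-repository-path s3://<YOUR_DRA_BUCKET>/ \
    --s3 'AutoExportPolicy={Events=[NEW,CHANGED,DELETED]}' \
    --region <AWS_REGION>
```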

3. Create a Data Repository Export Task

Once your Data Repository Association is created, create a data repository task for that DRA. Follow the sample configuration below:

DRA Task Create

Once created, you can monitor the export task's status and the total number of files it has processed.

DRA Task Status

When the task has reached a status of "Succeeded", verify that the data has been successfully exported to your S3 bucket. You should see your root Filesystem directory stored in S3, similar to this example:

S3 DRA
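The export task can also be created and monitored from the CLI. IDs are placeholders, and the `--report` setting below is one example configuration:

```shell
# Kick off a full export of the filesystem to the linked bucket
aws fsx create-data-repository-task \
    --file-system-id <OLD_FS_ID> \
    --type EXPORT_TO_REPOSITORY \
    --report Enabled=false \
    --region <AWS_REGION>

# Poll until the task's Lifecycle is SUCCEEDED
aws fsx describe-data-repository-tasks \
    --filters Name=file-system-id,Values=<OLD_FS_ID> \
    --query 'DataRepositoryTasks[].{Id:TaskId,Status:Lifecycle}' \
    --region <AWS_REGION>

# Spot-check the exported objects in the bucket
aws s3 ls s3://<YOUR_DRA_BUCKET>/ --recursive | head
```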

4. Create a new FSx filesystem that is right-sized:

  • Create the FSx filesystem (e.g. Persistent SSD) with the same configuration (VPC, subnet, security group) as the old filesystem.
  • Create a DRA import policy targeting the S3 bucket created in step 1.
4a. Create the new FSx filesystem

Create a new FSx Persistent or Scratch filesystem, ensuring the VPC, subnet, and security group match the existing filesystem. The security group can be the same one used by the HyperPod cluster nodes in the SageMaker console.

FSxL Config
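A CLI sketch of the same creation step. The capacity, deployment type, and throughput values below are illustrative; substitute your own right-sized values:

```shell
aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 1200 \
    --subnet-ids <SUBNET_ID> \
    --security-group-ids <SECURITY_GROUP_ID> \
    --lustre-configuration DeploymentType=PERSISTENT_2,PerUnitStorageThroughput=250 \
    --region <AWS_REGION>
```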

4b. Create a Data Repository Association for your new FSxL filesystem

Create a DRA import policy (and, optionally, an export policy) for your new filesystem. Ensure you specify the path to the S3 bucket to which you backed up your existing filesystem.

DRA Create 2
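The import-side association mirrors the export one, with an auto-import policy pointed at the same bucket. The filesystem ID is a placeholder, and the policies shown are one example configuration:

```shell
aws fsx create-data-repository-association \
    --file-system-id <NEW_FS_ID> \
    --file-system-path / \
    --data-repository-path s3://<YOUR_DRA_BUCKET>/ \
    --s3 'AutoImportPolicy={Events=[NEW,CHANGED,DELETED]},AutoExportPolicy={Events=[NEW,CHANGED,DELETED]}' \
    --region <AWS_REGION>
```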

5. Unmount existing FSxL from HyperPod Nodes.

For reference, see the FSx Lustre documentation on unmounting a filesystem. Instructions are also provided below:

```shell
# On the controller / login node of the HyperPod cluster

# Log in as the root user
sudo su

# Change to root's home directory so your shell is not holding /fsx open
cd

# Verify the working directory is /root
pwd

# Unmount the filesystem
umount /fsx
```
warning

If you see the following error, a user session (e.g. the ubuntu user, whose home directory is on the mounted filesystem) is still holding /fsx open:

```shell
root@ip-10-1-79-94:~# umount /fsx
umount: /fsx: target is busy.
```

Exit that session, become root again, and retry:

```shell
exit
sudo su
umount /fsx
```
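If the unmount keeps reporting `target is busy`, these commands (run as root) help find which processes still hold files open under /fsx:

```shell
# List processes with open files under /fsx
lsof +D /fsx 2>/dev/null

# Or show the PIDs using the mount and what they are running
fuser -vm /fsx

# As a last resort, lazy-unmount detaches the mount immediately
# and cleans up once the last reference is released
umount -l /fsx
```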

6. Mount new FSxL on Controller / Login Nodes

Mount the new FSxL filesystem on the cluster controller / login node by following the commands in the FSx console. Run the command from the FSx console as root once the filesystem's status is Available. Note: we recommend waiting for the DRA export task to complete successfully before continuing with this step.

Mount FSx
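The console's attach instructions expand to a mount command of roughly this shape. The DNS name, mount name, and options below are placeholders; copy the exact values from your filesystem's Attach dialog:

```shell
# Create the mount point if it does not exist, then mount the new filesystem
sudo mkdir -p /fsx
sudo mount -t lustre -o relatime,flock \
    <FS_DNS_NAME>@tcp:/<MOUNT_NAME> /fsx
```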

7. Unmount /fsx from the compute nodes using srun

```shell
srun -N <NUM_COMPUTE_NODES> sudo umount /fsx
```

8. Mount new filesystem to compute nodes

```shell
srun -N <NUM_COMPUTE_NODES> sudo mount -t lustre ... (command from FSxL console)
```
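After mounting, verify from the controller that every compute node sees the new filesystem:

```shell
# Each node should report a lustre filesystem mounted at /fsx
srun -N <NUM_COMPUTE_NODES> df -hT /fsx
```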

9. Delete old FSx Lustre Filesystem

Once you have confirmed that the new filesystem is mounted and you can read files from it, proceed with deleting the old FSx filesystem.

10. Update HyperPod Lifecycle Scripts with new FSxL

Update the provisioning_parameters.json file in your SageMaker lifecycle S3 bucket with the new FSxL DNS name and mount name.

Provisioning Parameters Update

Once updated, upload the new provisioning_parameters.json file to the lifecycle script directory in your S3 bucket, overwriting the old provisioning_parameters.json file. You can do so via the console (image below) or the CLI.

Upload to S3
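A CLI version of the same update. The bucket path is a placeholder, and the exact key names holding the FSx DNS and mount name depend on your lifecycle script version, so verify them against your own file:

```shell
# Download, edit, and re-upload the provisioning parameters
aws s3 cp s3://<LIFECYCLE_BUCKET>/<LCC_PREFIX>/provisioning_parameters.json .

# ... edit the FSx DNS name and mount name to the new filesystem's values ...

aws s3 cp provisioning_parameters.json s3://<LIFECYCLE_BUCKET>/<LCC_PREFIX>/provisioning_parameters.json
```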

After completing the above steps, it is recommended to verify that the new FSx Lustre filesystem defined in provisioning_parameters.json is mounted on new cluster nodes during HyperPod replacement actions. You can trigger a manual node replacement from the head/controller node with the following command:

```shell
sudo scontrol update node=<NODE-IP> state=down reason="Action:Replace"
```

When the node comes back online, SSH into the node and confirm that FSx Lustre is properly mounted.