Manually Rebalance your MinIO Modern Datalake

AJ on AI/ML

When a MinIO Modern Datalake deployment is extended by adding a new server pool, it does not rebalance existing objects by default. Instead, MinIO writes new objects to the pool with more free space. A manually triggered rebalance scans the whole deployment and, where needed, moves objects between server pools so that all pools end up at roughly the same free-space level. This is an expensive operation for a MinIO deployment and should be triggered judiciously, although a rebalance can be stopped and restarted at any point.

It's not easy to simulate a rebalance scenario in a standalone deployment, whether it's a kind-based cluster or a single node using directories as drives. To actually trigger a rebalance, the existing pools need to be filled almost to capacity; then, after a new server pool is added, the rebalance is triggered manually. For this reason, it is easier to simulate this scenario using virtual machines.

For a quick and easy developer-mode simulation of a rebalance, LXD (Linux Container Hypervisor) is a good option. This blog lists the required settings and describes the procedure for simulating a rebalance.

Setup Infrastructure

Follow the steps below to simulate a rebalance in a MinIO Modern Datalake deployment. We will spin up a total of 8 LXD VMs running Ubuntu, start the initial MinIO instance on the first 4 VMs, and later extend it with the remaining 4 VMs. To limit the amount of data that can be loaded, we add a 1GiB virtual disk to each VM using loopback devices from the host. So, let's get going!

LXD is available officially as a snap package, so install snapd first.

$ sudo apt install snapd

$ sudo ln -s /var/lib/snapd/snap /snap

$ snap version

$ sudo systemctl restart snapd.service

Now install lxd.

$ sudo snap install lxd

Verify the installation and start LXD.

$ sudo snap enable lxd

$ sudo snap services lxd

$ sudo snap start lxd

Add your username to the lxd group.

$ sudo usermod -a -G lxd <USERNAME>

$ id <USERNAME>

$ newgrp lxd

Initialize LXD.

$ lxd init

This takes you through a set of questions, and most of the defaults can be accepted.

Would you like to use LXD clustering? (yes/no) [default=no]:

Do you want to configure a new storage pool? (yes/no) [default=yes]:

Name of the new storage pool [default=default]:

Name of the storage backend to use (btrfs, ceph, dir, lvm) [default=btrfs]:

Create a new BTRFS pool? (yes/no) [default=yes]:

Would you like to use an existing block device (yes/no) [default=no]:

Size in GB of the new block device (1GB minimum) (default=30GB):

Would you like to connect to a MAAS server (yes/no) [default=no]:

Would you like to create a new local network bridge (yes/no) [default=yes]:

What should new bridge be called (default=lxdbr0):

What IPv4 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]:

What IPv6 address should be used? (CIDR subnet notation, "auto" or "none") [default=auto]: none

Would you like LXD to be available over the network? (yes/no) [default=no]:

Would you like stale cached images to be updated automatically? (yes/no) [default=yes]:

Would you like a YAML "lxd init" preseed to be printed? (yes/no) [default=no]:

Create the VMs using the commands below.

$ lxc init images:ubuntu/jammy vm-01 --profile=default -c boot.autostart=true -c security.privileged=true -c security.syscalls.intercept.mount=true -c security.syscalls.intercept.mount.allowed=ext4 -c limits.memory=1024MB -c limits.cpu.allowance=10%

$ lxc start vm-01

Repeat these commands for all 8 VMs. To give the VMs sequential IPv4 addresses, run the commands below to reset their IPs.

$ lxc stop vm-01

$ lxc network attach lxdbr0 vm-01 eth0 eth0

$ lxc config device set vm-01 eth0 ipv4.address 10.115.111.111

$ lxc start vm-01

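Similarly, the IPs of the other VMs can be set to 10.115.111.112, 10.115.111.113, and so on. The addresses must belong to the subnet of lxdbr0, so adjust them to your setup (the lxc list output later in this post uses 10.49.238.61 through 10.49.238.68).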
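The per-VM steps above can be scripted. The following is a minimal sketch, assuming VMs named vm-01 through vm-08 and the 10.115.111.111 onward address scheme from the example; the helper names vm_name, vm_ip, and print_ip_commands are made up for this illustration. It only prints the commands, so they can be reviewed first and then piped to sh.

```shell
# Hypothetical helpers; the naming and address scheme follow the example above.
vm_name() { printf 'vm-%02d' "$1"; }
vm_ip()   { printf '10.115.111.%d' "$((110 + $1))"; }

# Print (do not run) the four IP-reset commands for VMs 1..N.
print_ip_commands() {
  i=1
  while [ "$i" -le "$1" ]; do
    name=$(vm_name "$i")
    echo "lxc stop $name"
    echo "lxc network attach lxdbr0 $name eth0 eth0"
    echo "lxc config device set $name eth0 ipv4.address $(vm_ip "$i")"
    echo "lxc start $name"
    i=$((i + 1))
  done
}

print_ip_commands 8
```

Once the printed commands look right, running print_ip_commands 8 | sh would execute them.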

Now get inside each VM, create a 1GiB virtual disk image, and format it using mkfs.ext4. Also create a mount path for the loopback device from the host.

$ lxc exec vm-01 bash

$ truncate -s 1GiB /media/disk.img

$ mkfs.ext4 /media/disk.img

$ mkdir /mnt/virtual-disk

Repeat this for all the VMs.
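To avoid opening a shell in every VM, the same three commands can be sent through lxc exec from the host. This is a rough sketch under the assumption that the VMs are named vm-01 through vm-08; print_disk_prep is a hypothetical helper name, and mkdir -p is used instead of mkdir only so that a second run does not fail. As written, it prints one command per VM rather than executing anything.

```shell
# Print one "lxc exec" line per VM that runs the disk preparation steps.
print_disk_prep() {
  i=1
  while [ "$i" -le "$1" ]; do
    name=$(printf 'vm-%02d' "$i")
    echo "lxc exec $name -- sh -c 'truncate -s 1GiB /media/disk.img && mkfs.ext4 /media/disk.img && mkdir -p /mnt/virtual-disk'"
    i=$((i + 1))
  done
}

print_disk_prep 8
```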

Now we attach available loopback devices from the host to the VMs. These are just files listed as /dev/loop*, and not all of the listed ones are necessarily free to use. If mounting within a VM fails with mount: /mnt/virtual-disk: failed to setup loop device for /media/disk.img., create a fresh loopback device on the host and try again with that one. Use the command sudo losetup -f to get a free loopback device. Follow the steps below to attach and mount the loop device within the VMs.

$ lxc config device add vm-01 loop-control unix-char path=/dev/loop-control

$ lxc config device add vm-01 loop4 unix-block path=/dev/loop4

From inside the VM, mount it now.

$ lxc exec vm-01 bash

$ mount -t ext4 -o loop /media/disk.img /mnt/virtual-disk

Repeat this process for all the VMs. To verify that the mount succeeded, run the command below within each VM.

$ mount | grep virtual-disk

/media/disk.img on /mnt/virtual-disk type ext4 (rw,relatime)

The VMs are now ready and we can start the MinIO deployment.

Setup MinIO

We need a number of objects that we will later push to MinIO buckets. Run the commands below on the host to create them.

$ mkdir -p $HOME/test-minio-rebal

$ cd $HOME/test-minio-rebal

$ for index in {1..4500}; do truncate -s 1M file$index; done

This creates 4500 files of 1 MiB each (truncate produces zero-filled files, not random data).
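Because truncate only sets a file's length, the test files are sparse on the host: they report a size of 1 MiB but occupy almost no disk blocks until they are copied elsewhere. The sketch below shows this on three files in a throwaway directory; the workdir, size, and count variables are only for this illustration, and stat -c assumes GNU coreutils as shipped with Ubuntu.

```shell
# Work in a temporary directory so nothing else is touched.
workdir=$(mktemp -d)
for index in 1 2 3; do
  truncate -s 1M "$workdir/file$index"
done

# Apparent size in bytes of one file, and the number of files created.
size=$(stat -c %s "$workdir/file1")
count=$(ls "$workdir" | wc -l)
echo "files=$count bytes_each=$size"

# du reports the blocks actually allocated, which stays tiny for sparse files.
du -sh "$workdir"
rm -rf "$workdir"
```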

On all VMs, install the MinIO binary.

$ wget https://dl.min.io/server/minio/release/linux-amd64/minio

$ chmod +x minio

$ mv minio /usr/local/bin

The MinIO client can be installed on the host itself.

$ wget https://dl.min.io/client/mc/release/linux-amd64/mc

$ chmod +x mc

$ sudo mv mc /usr/local/bin

Start the MinIO instance and load objects. First get a list of all running lxc VMs and note their IPv4 IPs.

$ lxc list

+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| NAME  |  STATE  |        IPV4         |                     IPV6                     |   TYPE    | SNAPSHOTS |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-01 | RUNNING | 10.49.238.61 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef3:f0f (eth0)  | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-02 | RUNNING | 10.49.238.62 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe16:4d04 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-03 | RUNNING | 10.49.238.63 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe34:44cd (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-04 | RUNNING | 10.49.238.64 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef9:4262 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-05 | RUNNING | 10.49.238.65 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe16:2e02 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-06 | RUNNING | 10.49.238.66 (eth0) | fd42:9cd0:6055:a53:216:3eff:fe94:4610 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-07 | RUNNING | 10.49.238.67 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef1:40f3 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+
| vm-08 | RUNNING | 10.49.238.68 (eth0) | fd42:9cd0:6055:a53:216:3eff:fef5:d909 (eth0) | CONTAINER | 0         |
+-------+---------+---------------------+----------------------------------------------+-----------+-----------+

Now start a MinIO instance using the first four VMs as below. This command should be run inside each of the first 4 VMs.

$ minio server http://10.49.238.{61...64}/mnt/virtual-disk/disk{1...4}
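The {a...b} ranges with three dots are expanded by MinIO itself rather than by the shell, which uses two dots for its own brace expansion. As a rough illustration, the loop below lists the endpoints that the command above describes; the endpoints and total variables exist only for this sketch.

```shell
# Enumerate every host/disk combination covered by
# http://10.49.238.{61...64}/mnt/virtual-disk/disk{1...4}
endpoints=""
for host in 61 62 63 64; do
  for disk in 1 2 3 4; do
    endpoints="$endpoints http://10.49.238.$host/mnt/virtual-disk/disk$disk"
  done
done

# Print one endpoint per line, then the total number of drive paths.
printf '%s\n' $endpoints
total=$(printf '%s\n' $endpoints | wc -l)
echo "total drive paths: $total"
```

Four hosts with four disk paths each give 16 drive paths in this pool.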

Once the instance has stabilized, create an mc alias for the cluster as below.

$ mc alias set ALIAS http://10.49.238.61:9000 minioadmin minioadmin

We are now ready to load objects into the cluster. Run the commands below.

$ mc mb ALIAS/test-bucket

$ mc cp $HOME/test-minio-rebal/* ALIAS/test-bucket

You may see errors towards the end saying that no space is left on the disk; that's fine, as the cluster is now loaded to the limit with objects. Wait a few seconds and verify that the objects are loaded into the pool.

$ mc admin info ALIAS --json | jq -r '.info.pools'

{
  "0": {
    "0": {
      "id": 0,
      "rawUsage": 3785478144,
      "rawCapacity": 3800956928,
      "usage": 1155530752,
      "objectsCount": 1102,
      "versionsCount": 0,
      "healDisks": 0
    }
  }
}
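As a quick sanity check on this output, the usage value should equal objectsCount multiplied by the 1 MiB size of each test file. The arithmetic below uses only the numbers shown above; the variable names are chosen for this example.

```shell
objects=1102                        # objectsCount from the output above
object_bytes=$((1024 * 1024))       # each test file is 1 MiB
expected_usage=$((objects * object_bytes))
echo "expected usage: $expected_usage bytes"
```

The result, 1155530752, matches the reported usage. rawUsage is larger, presumably because it also counts erasure-coding parity and filesystem overhead on the drives.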

We are all set to extend the cluster with a new set of nodes. Stop the MinIO processes on the first 4 VMs, then run the following command on all 8 VMs.

$ minio server http://10.49.238.{61...64}/mnt/virtual-disk/disk{1...4} http://10.49.238.{65...68}/mnt/virtual-disk/disk{1...4}

Let the cluster stabilize and check that the new pool has been added.

$ mc admin info ALIAS --json | jq -r '.info.pools'

{
  "0": {
    "0": {
      "id": 0,
      "rawUsage": 3785478144,
      "rawCapacity": 3800956928,
      "usage": 1155530752,
      "objectsCount": 1102,
      "versionsCount": 0,
      "healDisks": 0
    }
  },
  "1": {
    "0": {
      "id": 0,
      "rawUsage": 376832,
      "rawCapacity": 3800956928,
      "usage": 0,
      "objectsCount": 0,
      "versionsCount": 0,
      "healDisks": 0
    }
  }
}

You can now safely run the rebalance on the cluster.

$ mc admin rebalance start ALIAS

You can track the status of a running rebalance as below.

$ mc admin rebalance status ALIAS

Per-pool usage:

┌─────────┬────────┐
│ Pool-0  │ Pool-1 │
│ 0.85% * │ 0.14%  │
└─────────┴────────┘

Summary:

Data: 390 MiB (195 objects, 195 versions)

Time: 11.40798155s (52.794879439s to completion)

Once the rebalance has completed and there are no more objects to move, you can verify the result as below.

$ mc admin info ALIAS --json | jq -r '.info.pools'

{
  "0": {
    "0": {
      "id": 0,
      "rawUsage": 2029391872,
      "rawCapacity": 3800956928,
      "usage": 1394606080,
      "objectsCount": 1330,
      "versionsCount": 0,
      "healDisks": 0
    }
  },
  "1": {
    "0": {
      "id": 0,
      "rawUsage": 1756606464,
      "rawCapacity": 3800956928,
      "usage": 435159040,
      "objectsCount": 415,
      "versionsCount": 0,
      "healDisks": 0
    }
  }
}
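To see how far the pools have evened out, the raw usage of each pool can be expressed as a percentage of its raw capacity. The sketch below does this with plain shell integer arithmetic on the rawUsage and rawCapacity values shown in this post; pct_used is a hypothetical helper, and the result is rounded down.

```shell
# Integer percentage of raw capacity in use (rounded down).
pct_used() { echo $(( $1 * 100 / $2 )); }

# Values taken from the mc admin info output before and after the rebalance.
echo "before: pool0=$(pct_used 3785478144 3800956928)% pool1=$(pct_used 376832 3800956928)%"
echo "after:  pool0=$(pct_used 2029391872 3800956928)% pool1=$(pct_used 1756606464 3800956928)%"
```

With these numbers, pool 0 drops from 99% to 53% raw usage while pool 1 rises from 0% to 46%.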

It's as simple as that.

Final Thoughts

Generally, MinIO does not require any manual rebalancing when new pools are added or removed. MinIO is intelligent enough to place new data where space is available, all while keeping erasure coding requirements in mind. Rebalancing is a very resource-intensive operation, so it is not recommended to run it during peak hours when the cluster is used the most. If the cluster needs to be rebalanced, do it during off hours when the cluster is minimally used.

If you have any questions on rebalancing and how to manage the MinIO Modern Datalake, be sure to reach out to us on Slack!