Storage Infrastructure for Automating Configuration Management with Salt and Puppet
Globally there has been a shift to bring applications closer to home. Enterprises want more control of their data and have had enough of paying egress fees to the public cloud to get access to their own data. Besides cost, there is also the matter of security, or lack thereof, when resources are shared with unknown organizations. Vulnerabilities can trickle down and affect your systems since everything is shared.
With the average enterprise running on 4 clouds (including on-prem private cloud), it’s important to standardize configurations, write scripts and automate as much as possible. This will decrease the data management burden while still giving everyone access to the data they need.
While enterprises bring workloads back on-prem, they are also re-evaluating cloud spending with a focus on data gravity. The cloud isn’t a place, it’s an operating model – software, tooling and Kubernetes make it possible for enterprises to build their own clouds. They build data lake, AI/ML, and analytics solutions using best-of-breed apps and cloud services, then use MinIO to replicate data to wherever it is needed. But how do they maintain consistency across clouds?
Enter configuration management systems. These are tools such as Chef, Puppet, Salt, Ansible and Terraform, and they make the lives of DevOps team members easier by reducing tech debt and simplifying the requirements for ongoing management and scaling. Rather than performing each operation manually, these tools perform the same operation on thousands of servers idempotently – that is, running the operation any number of times converges every server on the same end state.
In this blog post, we will focus on Puppet and Salt because these are the most popular pull-based configuration management systems – each relies on an agent that is always running on the node either as a daemon or a cronjob. This agent then “pulls” the configuration and settings from a remote location such as a Master server. Maintaining a central set of approved configurations means that all agents perform consistent local operations.
Master-Agent vs. Masterless/Headless Architecture
Let’s talk about scaling backend systems in order to support frontend systems – in this case, configuration management systems specifically. These systems ensure the configuration is the same across 10 or 1,000 servers in an idempotent manner, meaning that no matter how many times the configuration is run on a node, the end result is the same. If a node drifts from that state, the next run brings it back to the desired state. In a nutshell, that is how these configuration management systems work. They are generally declarative in nature – instead of writing control logic, you simply state the desired outcome of the task. There are so many moving parts in infrastructure that it is impractical to hand-write all of that logic yourself.
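For instance, a declarative state might look like the following minimal Salt state sketch (call it webserver.sls – the package and service names are purely illustrative). You describe the end state, nginx installed and its service running, and the agent works out on every run whether anything needs to change:

nginx:
  pkg.installed: []
  service.running:
    - enable: True
    - require:
      - pkg: nginx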
When you have a handful of servers, say about 10 agents and 1 master server, the configuration runs pretty quickly on each node and there is no reason to make the system more complex than it needs to be. But what about when you have over 1,000 nodes to configure? In a previous life I managed over 8,000 servers, and a single master simply could not deliver acceptable performance at that scale. The solution we found was to back the configuration management system with 5-6 masters and scale out further as the number of frontend systems increased. That is already a complex configuration management system, and over time it only gets more complex.
To scale as we did – and don’t feel bad if you did this too, because it’s what everyone does – we also needed to address the networking aspect of the system. With multiple masters you need a load balancer or a reverse proxy (like Nginx) to distribute the load among them, because agents can typically only be pointed at a single master endpoint. Okay, well then why not round-robin (RR) DNS? While this might seem acceptable on the surface, RR DNS hands each request to a different server. If the masters are not all synchronized, nodes end up applying a mix of old and new configurations, which is very difficult to debug. Because of this unpredictability, we do not recommend RR DNS for stateful applications.
Let's assume you do get this far scaling your environment – how are you going to maintain high availability? What happens when the load balancer goes down or has to be taken down for maintenance? Now you need a way to distribute load between multiple load balancers, and at this point you’ve hit a hard limit on expansion because configuring and maintaining this rabbit hole is more painful than it's worth.
Wouldn’t it be cool if we could take out this entire Master-LoadBalancer rigamarole and simplify our architecture? Could we replace the entire backend with something simpler, with the requirement that no additional technical debt be incurred?
Yes, we can. The answer is everyone’s favorite software-defined object storage, MinIO. Organizations already use MinIO as a key piece of infrastructure for a variety of purposes: data pipelines, storage of developer code and artifacts, ElasticSearch indices, log files, and even external SQL tables for Snowflake – the usage is truly ubiquitous. So can MinIO also help us over the hurdles of scaling our configuration management systems?
Yes. These pull-based configuration systems generally support another mode called masterless (headless) configuration. There is still an agent running on every instance, but there is no master. Instead, all the files and configurations are stored locally on the node where they are applied. In this headless setup, MinIO acts as the central store for all the artifacts and configurations the nodes need. The beauty of MinIO is that even if you are setting it up for the first time for just this purpose, you immediately benefit from all the performance, durability and scaling we built into it.
MinIO Security, Encryption and Replication
As a DevOps engineer, I managed several thousand servers at any given time. One thing I’ve learned is that as user-facing systems grow, you need to scale the backend systems that support them. But, as we discussed above, scaling configuration management is not as straightforward as other management functions.
Today’s configuration management tools fail to scale because communication between their components is too inefficient. There are too many agents authenticating, re-authenticating and re-connecting all at once, and too few resources on the master, such as CPU and drive space – all of which slow the agents down. More specifically, the file systems holding the config data, secrets, non-sensitive data and other files need to be fast. These files are not necessarily big, but they are accessed concurrently by thousands of nodes pulling small files at a furious pace. For this kind of workload, a single HDD with an ordinary file system won't cut it, and it will keel over as more and more nodes try to pull data. On top of that, you may need to increase storage capacity based on the number of agents being managed and how long you need to retain the data – for example, a high-throughput system in a highly regulated industry that requires 7-8 years of data retention. To tackle these issues, DevOps teams do heroic things like building everything from scratch and gluing it all together in order to scale their backend systems. Wouldn’t it be cool if we could replace the backend of these configuration management systems with something more resilient and scalable – a backend with so little tech debt that we don’t even have to think about it?
MinIO is the perfect fit here, and it achieves this with a two-pronged approach. Firstly, MinIO eliminates the need for a complex Agent-MultiMaster-MultiLoadBalancer system with its built-in site-to-site replication architecture, which is easy to configure and scale not only within the same region but across multiple locations. Secondly, MinIO has been built from the get-go with performance, speed and simplicity in mind – it's part of our ethos. We recommend that our users and customers run commodity hardware with drives in pure JBOD mode to keep the underlying infrastructure as simple and performant as possible. This combination of high performance and scalability puts every data-intensive workload within reach. MinIO is capable of tremendous performance – a recent benchmark achieved 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of off-the-shelf NVMe SSDs.
Let’s see how we can achieve this.
We will set up a geographically distributed infrastructure where we have a multi-node multi-drive MinIO cluster in three different sites across the globe. This will allow us to appreciate replication working at scale and provide insight into the infrastructure needed for geographic load-balancing and high availability.
Since we introduced multi-site active-active replication, our focus has been on improving replication performance without degrading existing cluster operations. Replication lets you copy data across multiple data centers and clouds for daily operational needs and for disaster recovery, so that one site going offline will not decrease global availability. The replication configuration is handled entirely on the server side with a set-it-and-forget-it ethos – the application using MinIO does not need to be modified in any way.
MinIO encrypts objects at the storage layer by using Server-Side Encryption (SSE) to protect objects as part of write operations. MinIO does this with extreme efficiency – benchmarks show that MinIO is capable of encrypting/decrypting at close to wire speed.
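If you want to enforce SSE on a particular bucket, it is a single server-side command with the MinIO Client – here using the minio1 alias and salt bucket we set up later in this walkthrough as an example:

mc encrypt set sse-s3 minio1/salt
mc encrypt info minio1/salt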
How does MinIO achieve this level of efficiency and performance?
The secret sauce MinIO uses is SIMD: Single Instruction, Multiple Data. Ordinarily a CPU instruction operates on a single piece of data at a time, which is highly inefficient when performing thousands – if not millions – of encryption and decryption operations per second. MinIO takes advantage of SIMD instructions to process multiple pieces of data with a single CPU instruction. These hot code paths are written in assembly and called from Go, so we sit as close to the hardware as possible and put typically underutilized CPU power to work performing cryptographic operations at scale.
MinIO also includes:
- Encryption: MinIO supports encryption both at rest and in transit. This ensures that data is encrypted in every facet of the transaction, from the moment the call is made until the object is placed in the bucket.
- Bitrot Protection: There are several reasons data can be corrupted on physical disks. It could be due to voltage spikes, bugs in firmware, or misdirected reads and writes among other things. MinIO ensures that these are captured and fixed on the fly to ensure data integrity.
- Tiering: Data that isn’t accessed as often can be siphoned off to colder storage running MinIO, so your best hardware is reserved for the latest data without rarely used data taking up space on it.
- Erasure Coding: Rather than ensuring data redundancy with a combination of local RAID and node-level replication, which adds performance overhead, MinIO uses erasure coding to provide redundancy and availability, reconstructing objects on the fly without any additional hardware or software.
Set up MinIO
In this scenario, we will assume there are two discrete data centers or regions, and that you are running MinIO in both. In site 1 we will set up the minio1 cluster, and in site 2 the minio2 cluster. Both will be configured with site-to-site replication, so no matter which site data is added to, MinIO ensures it gets replicated to the other sites. You can expand to N sites later as you add more regions or data centers. It is paramount that you create a new site for each new region so that cross-region and other network traffic is kept to a minimum, with the added benefit of resiliency and geographic load-balancing so clients can download data at a faster clip.
If you have not yet set up your MinIO clusters, please see Install and Deploy MinIO — MinIO Object Storage for Linux, and download the MinIO Client (mc).
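If you haven't already, point the MinIO Client at both deployments with aliases – the endpoints, ports and credentials below are placeholders for your own:

mc alias set minio1 http://<site1_public_ip>:9000 <ACCESS_KEY> <SECRET_KEY>
mc alias set minio2 http://<site2_public_ip>:9000 <ACCESS_KEY> <SECRET_KEY>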
Let's set up replication across both clusters
mc admin replicate add minio1 minio2
Requested sites were configured for replication successfully.
Verify that both sites are configured correctly
mc admin replicate info minio1
SiteReplication enabled for:
Deployment ID | Site Name | Endpoint
f96a6675-ddc3-4c6e-907d-edccd9eae7a4 | minio1 | http://<site1_public_ip>
0dfce53f-e85b-48d0-91de-4d7564d5456f | minio2 | http://<site2_public_ip>
Check to make sure replication is working properly
mc admin replicate status minio1
Bucket replication status:
No Buckets present
Policy replication status:
● 5/5 Policies in sync
User replication status:
No Users present
Group replication status:
No Groups present
Test by creating a bucket in minio1
mc mb minio1/testbucket
Add any object into the bucket
mc cp my_object minio1/testbucket
List the objects on the other site, in this case minio2
mc ls minio2/testbucket
[2023-08-14 18:52:09 UTC] 3.0KiB STANDARD my_object
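Because the replication is active-active, the reverse direction works too – as a quick sanity check, copy an object into minio2 and list it from minio1 (the object name here is arbitrary):

mc cp another_object minio2/testbucket
mc ls minio1/testbucket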
Configuring the Configuration Management Systems
Let’s look at a couple of masterless configurations from different configuration management providers.
Salt
Running the salt minion in a masterless configuration lets you keep the configurations locally without having to call out to a master node. The goal of this setup is to run the Salt states entirely from files local to the minion, pulled from the site-to-site replicated MinIO cluster.
Let's bootstrap the salt minion using the script below. You can run it on any operating system, but in this example we are going to set it up on Ubuntu.
curl -L https://bootstrap.saltstack.com -o bootstrap_salt.sh
sudo sh bootstrap_salt.sh
We need to configure the minion to use local files instead of connecting to a remote Salt master. To do this, change the file_client setting in /etc/salt/minion as shown below.
file_client: local
This ensures the salt minion does not communicate with a remote server and instead uses the files and pillars available locally on disk.
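A quick way to confirm the minion is happy running without a master is to call a built-in test function locally:

salt-call --local test.ping
local:
    True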
Create a bucket in the MinIO cluster and mirror the salt states from the local machine to the minio1 bucket
mc mb minio1/salt
Using the mirror command, mirror the salt states and files to the MinIO salt bucket
mc mirror ~/salt_states/ minio1/salt
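The ~/salt_states/ directory is just an ordinary Salt file root. At a minimum it needs a top file mapping states to minions – a sketch along these lines, where webserver refers to a hypothetical webserver.sls like the one shown earlier:

base:
  '*':
    - webserver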
Because we set up site-to-site replication earlier in the process, the salt files will be automagically replicated to the other site without any additional steps.
Now that the MinIO salt bucket has all the necessary files and states, MinIO acts as the “server” for Salt, essentially replacing the Salt Master with a backend that is more robust, lightning-fast and resilient under demanding load.
We will now copy these files and states to the individual nodes, where we can run the salt-call command locally and apply the configuration to the node.
mc cp --recursive minio1/salt/ /srv/salt/
This recursively copies the salt state files to the local filesystem at /srv/salt, which is where salt-call reads from when used with the --local flag.
salt-call --local state.apply -l debug
The debug flag gives verbose output of the run, which helps with any issues that might appear when applying states.
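To make this hands-off, the same pattern we use for Puppet at the end of this post works for Salt as well – a tiny wrapper script run from cron that pulls the latest states and applies them (the path and schedule are assumptions):

#!/bin/bash -e
# Pull the latest salt states from MinIO, then apply them locally
mc cp --recursive minio1/salt/ /srv/salt/
salt-call --local state.apply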
Next, we’ll show you how simple it is to incorporate MinIO as a backend for another configuration management system, this time Puppet.
Puppet
Puppet works in a similar fashion to Salt in the sense that it's a pull-based configuration model. Let’s take a look at how to set up Masterless/Headless configuration in Puppet.
Let's install the puppet prerequisites on our node, starting with the puppet agent-related packages. Install the Puppet APT repo on the Ubuntu node
wget http://apt.puppetlabs.com/puppetlabs-release-trusty.deb
dpkg -i puppetlabs-release-trusty.deb
apt-get update
apt-get install puppet
Open the Puppet configuration file at /etc/puppet/puppet.conf and replace the entire file with the following configuration settings
[main]
logdir=/var/log/puppet
vardir=/var/lib/puppet
ssldir=/var/lib/puppet/ssl
rundir=/var/run/puppet
factpath=$confdir/facter
In the above settings, $confdir is equivalent to the directory path /etc/puppet on the node.
Once we have the masterless/headless Puppet agent set up on the node, let’s set up the bucket on the MinIO side. Create a bucket in the MinIO cluster and mirror the puppet modules and manifests from the local machine to the minio1 bucket
mc mb minio1/puppet
Using the mirror command, mirror the puppet modules and files to the MinIO puppet bucket
mc mirror ~/puppet_modules/ minio1/puppet
Using mc cp, copy these modules and manifests to the individual nodes, where we can run the puppet apply command locally and apply the manifests to the node.
mc cp --recursive minio1/puppet/ /etc/puppet/
This tutorial assumes you already have a file at /etc/puppet/manifests/site.pp, downloaded when we ran mc cp on the node, with the following contents
node default {
# Include modules that run on all the nodes
}
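In practice, that node block is where you include whatever modules were mirrored into /etc/puppet/modules – for example (the module names below are placeholders, not part of this tutorial):

node default {
  # Modules synced from the minio1/puppet bucket into /etc/puppet/modules
  include ::ntp
  include ::motd
}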
Last but not least, run the puppet command to apply the changes
puppet apply /etc/puppet/manifests/site.pp
…
Notice: Finished catalog run in 0.03 seconds
As you can see, we now have an end-to-end solution where MinIO acts as the Puppet server, holding all the modules, manifests and configuration data, while each node pulls from the MinIO cluster closest to the data center it runs in.
Bringing it all together
We’ve shown you how to use MinIO as a backend for your headless/masterless configuration management system. But, as you know, we try to provide as holistic a view of the entire process as possible, so you can understand the reasoning behind why we do the things that we do.
A good steward of DevOps knows that even the smallest change to infrastructure needs to be tracked. In that spirit, we recommend putting your Salt, Puppet, or other configuration management code in git version control. We need the codebase in a git repo so that we can git clone/pull the latest files before mirroring them to a MinIO bucket.
Before we look at how to do this, you might be wondering: why not just run git clone on every node to pull the configurations? While this is possible, it is very inefficient – the more servers run git clone, the more load falls on your internal git infrastructure. This has a cascading effect that can slow down your code changes, and scaling a git cluster is tremendously complex.
You can do this one of two ways:
- You can write a simple script to git clone and git pull to a bastion or some kind of common DevOps node, then mc mirror to the MinIO bucket, like so
git clone https://github.com/org/puppet_modules.git ./puppet_modules
mc mirror ./puppet_modules/ minio1/puppet
But the downside to this method is that you need a separate process, such as cron or a git post-receive hook, to run this script every time the repo gets a new commit.
- The recommended way is to have a GitHub Actions or Jenkins job run mc mirror whenever a new commit is made or, better yet, when a new tag is added (see the workflow sketch after this list).
git tag 0.1.0
mc mirror ./puppet_modules/ minio1/puppet
This method does not require a separate script to run each time the repo is updated; instead, the files are mirrored to the MinIO bucket as soon as a commit or tag is pushed.
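As a sketch, a GitHub Actions workflow for this could look like the following – the secret names, repo layout and bucket are assumptions, and a Jenkins pipeline would follow the same shape:

# .github/workflows/mirror-to-minio.yml (illustrative)
name: mirror-to-minio
on:
  push:
    tags:
      - '*'
jobs:
  mirror:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Download the MinIO Client
        run: |
          sudo curl -sSL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
          sudo chmod +x /usr/local/bin/mc
      - name: Mirror the repo to the puppet bucket
        run: |
          mc alias set minio1 ${{ secrets.MINIO_ENDPOINT }} ${{ secrets.MINIO_ACCESS_KEY }} ${{ secrets.MINIO_SECRET_KEY }}
          mc mirror --overwrite --exclude ".git/*" . minio1/puppet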
Let’s bring this all together with a small wrapper bash script on the local node that will run via cron job. Save this script as /etc/my_puppet_run.sh
#!/bin/bash -e
# Pull the latest Puppet modules, manifests and config from MinIO
mc cp --recursive minio1/puppet/ /etc/puppet/
# Apply the local manifests with the masterless agent
puppet apply /etc/puppet/manifests/site.pp
Chmod the above script so that it is executable.
chmod +x /etc/my_puppet_run.sh
As you can see, the script is very simple – barely two commands – which is all you really need. Next, let's set up a cron job to run the script above every 30 minutes.
*/30 * * * * /etc/my_puppet_run.sh
It’s as simple as that.
Salt-N-Puppet: Push It
To recap, in this blog post we showed you how to run configuration management systems such as Puppet and Salt in a masterless/headless manner by introducing MinIO as an essential, central piece of the infrastructure puzzle. We set up MinIO in a scalable, distributed fashion because Infrastructure as Code (IaC) systems need to be every bit as scalable, flexible, resilient and powerful – if not more so – than the frontend appliances and other infrastructure devices that depend on Puppet or Salt. By storing the files and configurations in MinIO, the configuration management systems become performant, scalable, and integrated out of the box with enterprise features such as KMS and IDP.
If you have any questions on how to build a headless-masterless configuration management system backed by MinIO be sure to reach out to us on Slack!