minio/dsync: a distributed locking and syncing package for Go

Frank Wessels Frank Wessels on Performance 15 September 2016

minio/dsync is a package for doing distributed locks over a network of nnodes. It is designed with simplicity in mind and offers limited scalability (n <= 16). Each node is connected to all other nodes and lock requests from any node will be broadcast to all connected nodes. A node will succeed in getting the lock if n/2 + 1 nodes respond positively. If the lock is acquired it can be held for as long as the client desires and it needs to be released afterwards. This will cause an unlock message to be broadcast to all nodes after which the lock becomes available again.

Motivation

This package was developed for the distributed server version of the Minio Object Storage. For this we needed a simple and reliable distributed locking mechanism for up to 16 servers that each would be running minio server. The locking mechanism itself should be a reader/writer mutual exclusion lock meaning that it can be held by a single writer or by an arbitrary number of readers.

For minio the distributed version is started as follows (eg for a 6-server system):

$ minio server server1/disk server2/disk server3/disk server4/disk server5/disk server6/disk

(note that the same identical command should be run on servers server1 through to server6)

Design goals

Simple design: by keeping the design simple, many tricky edge cases can be avoided.
No master node: there is no concept of a master node which, if this would be used and the master would be down, causes locking to come to a complete stop. (Unless you have a design with a slave node but this adds yet more complexity.)
Resilient: if one or more nodes go down, the other nodes should not be affected and can continue to acquire locks (provided not more than n/2–1nodes are down).
Drop-in replacement for sync.RWMutex and support for sync.Lockerinterface.
Automatically reconnect to (restarted) nodes.

Example usage

Below is a simple example showing how to protect a single resource using dsync:


import (
    "github.com/minio/dsync"
)

func lockSameResource() {

    // Create distributed mutex to protect resource 'test'
    dm := dsync.NewDRWMutex("test")

    dm.Lock()
    log.Println("first lock granted")

    // Release 1st lock after 5 seconds
    go func() {
        time.Sleep(5 * time.Second)
        log.Println("first lock unlocked")
        dm.Unlock()
    }()

    // Acquire lock again, will block until initial lock is released
    log.Println("about to lock same resource again...")
    dm.Lock()
    log.Println("second lock granted")

    time.Sleep(2 * time.Second)
    dm.Unlock()
}

which would give the following output when run:

2016/09/02 14:50:00 first lock granted
2016/09/02 14:50:00 about to lock same resource again...
2016/09/02 14:50:05 first lock unlocked
2016/09/02 14:50:05 second lock granted

(note that it is more fun to run this distributed over multiple machines).

In addition to a write lock, dsync also has support for multiple read locks. See here for an example.

Performance

For a syncing package performance is of course of paramount importance since it is typically a quite frequent operation. As dsync naturally involves network communications the performance will be bound by the number of messages (or so called Remote Procedure Calls or RPCs) that can be exchanged every second.

Depending on the number of nodes participating in the distributed locking process, more messages need to be sent. For instance on an 8 server system, a total of 16 messages are exchanged for every lock and subsequent unlock operation whereas on a 16 server system this is a total of 32 messages.

Also, as the syncing mechanism is a supplementary operation to the actual function of the (distributed) system, it should not consume too much CPU power.

minio/dsync supports up to:

7500 locks/sec for 16 nodes (at 10% CPU usage/server) on moderately powerful server hardware

More performance numbers can be found here.

Stale locks and known deficiencies

In a distributed system, a stale lock is a lock at a node that is in fact no longer active. This can happen due to eg a server crashing or the network becoming temporarily unavailable (partial network outage) so that for instance an unlock message cannot be delivered anymore.

Stale locks are normally not easy to detect and they can cause problems by preventing new locks on a resource. minio/dsync has a stale lock detection mechanism that automatically removes stale locks under certain conditions (see here for more details).

Another potential issue is allowing more than one exclusive (write) lock on a resource (as multiple concurrent writes could lead to corruption of data). By default minio/dsync requires a minimum quorum of n/2+1 underlying locks in order to grant a lock (and typically it is much more or all servers that are up and running under normal conditions).

However even when a lock is just supported by the minimum quorum of n/2+1 nodes, it is required for two of the nodes to go down in order to allow another lock on the same resource to be granted (provided all down nodes are restarted again). Depending on the number of nodes the chances of this happening become smaller and smaller, so while not being impossible it is very unlikely to happen.

This is a more elaborate example that also includes a table that lists the total number of nodes that needs to be down or crashed for such an undesired effect to happen.

More to tell

Of course there is more to tell concerning implementation details, extensions and other potential use cases, comparison to other techniques and solutions, restrictions, etc. Head over to minio/dsync on github to find out more.

If you have any comments we like hear from you and we also welcome any improvements.

Happy Distributed Locking !

Previous Post Next Post

S3 Select Security Modern Data Lakes Apache Presto SQL Performance S3 Brand/Design Golang Programming Cloud Computing Microservices Docker AWS Kubernetes Apache Spark Open Source Benchmarks Integrations SUBNET Edge Computing Sidekick Secure-by-Design Splunk Veeam Intel Apache Nifi Immutability Software Defined Storage VMware Apache Arrow Hybrid Cloud Red Hat OpenShift Multicloud Scalability Cloud Field Day Cloud Native Apache Kafka Architect's Guide Awards Operator's Guide Security Advisory AI/ML AGPLv3 Apache Hadoop SFD Azure GCP Observability Analytics R H20 DirectPV DevOps Apache Iceberg Apache Hudi YouTube Summaries EKS Elastic Load Balancers CI/CD Object Storage Compliance opentelemetry BC/DR Storage Newsletter Predictions Best Practices Dremio New MinIO Features partners Small Files Databases DuckDB PostgreSQL Delta Lake Cloud Repatriation Python Object Lambdas Data Pipelines Cloud Operating Model Webhook ClickHouse Vector Database Events Value Engineering Change Data Capture Enterprise Object Store GitOps Case Study Equinix Certifications Snowflake Repatriation Migration Tabular Databricks

Motivation

Design goals

Example usage

Performance

Stale locks and known deficiencies

More to tell

Get a Quote

Select Plan

Choose Capacity