Renewing KES certificate
MinIO KES (Key Encryption Service) is a service developed by MinIO to bridge the gap between applications that run in Kubernetes and a centralized Key Management Service (KMS). The central KMS server holds all the state, while KES talks to the KMS only when it needs to fetch new keys or update existing ones. Once KES fetches a key, it caches that key for as long as it doesn’t need to be updated or deleted, so subsequent calls are much faster.
So why use KES rather than talking to the KMS directly? Depending on the KMS and the load it must handle, a KMS may not be able to keep up with a large deployment in which a Kubernetes cluster requests hundreds, if not thousands, of keys. In these situations KES is the better choice because it scales horizontally very easily, unlike traditional KMS systems.
All KES operations between the Application <-> KES and between KES <-> KMS use mTLS for authentication and authorization. This is done using a public/private key pair and an X.509 certificate. The thing with certs is that they have a very common problem: they expire, and when they do, services all around fail spectacularly with little rhyme or reason.
Once the cert expires, you will start to see errors such as these in the KES log:
{"message":"2024/01/04 02:23:21 http: TLS handshake error from 10.244.2.9:32816: remote error: tls: bad certificate"}
{"message":"2024/01/04 02:23:28 http: TLS handshake error from 10.244.3.11:53456: remote error: tls: bad certificate"}
{"message":"2024/01/04 02:23:28 http: TLS handshake error from 10.244.1.9:56722: remote error: tls: bad certificate"}
{"message":"2024/01/04 02:23:28 http: TLS handshake error from 10.244.4.11:34152: remote error: tls: bad certificate"}
{"message":"2024/01/04 02:23:28 http: TLS handshake error from 10.244.2.9:55300: remote error: tls: bad certificate"}
{"message":"2024/01/04 02:23:28 http: TLS handshake error from 10.244.4.11:34160: remote error: tls: bad certificate"}
…
MinIO's periodic IAM refresh will also fail, with messages like the following in the MinIO log:
Error: Failure in periodic refresh for IAM (took 0.03s): Post "https://kes-tenant-kes-hl-svc.default.svc.cluster.local:7373/v1/key/decrypt/my-minio-key": x509: certificate has expired or is not yet valid: current time 2024-01-04T02:27:31Z is after 2024-01-04T02:12:40Z (*errors.errorString)
If you are lucky, you will see an obvious message such as certificate has expired. Other times it's not so obvious: you may also hit edge-case issues when trying to create or delete keys, among a host of other problems. The quickest solution is to renew the certs and update KES with them as soon as possible. In this post we’ll show you exactly how to do that.
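If you would rather catch an expiring cert before the logs fill up, a check along these lines may help. This is only a sketch, assuming OpenSSL is installed. The throwaway self-signed certificate and the file names demo.key and demo.crt are placeholders so the commands have something to read; in practice you would point them at your own KES certificate.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Placeholder (not part of the KES setup): a throwaway self-signed
# certificate that is valid for one day.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout "$workdir/demo.key" -out "$workdir/demo.crt" \
  -subj "/CN=kes-demo" -days 1 2>/dev/null

# Print the expiry date of the certificate.
openssl x509 -in "$workdir/demo.crt" -noout -enddate

# -checkend N exits 0 if the certificate is still valid N seconds from now,
# and non-zero if it will have expired by then.
if openssl x509 -in "$workdir/demo.crt" -noout -checkend 3600 >/dev/null; then
  valid_in_1h=yes
else
  valid_in_1h=no
fi

if openssl x509 -in "$workdir/demo.crt" -noout -checkend 604800 >/dev/null; then
  valid_in_7d=yes
else
  valid_in_7d=no
fi

echo "valid in 1 hour: $valid_in_1h, valid in 7 days: $valid_in_7d"
```

Because the placeholder cert lives for only one day, the one-hour check passes and the seven-day check fails, which is the kind of early warning you would want from a scheduled job.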
How to Renew
Let’s first start by creating a new private key:
openssl genrsa -out private.key 2048
Create a file called cert.cnf, which openssl will use to create the Certificate Signing Request (CSR):
[req]
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no

[req_distinguished_name]
O = "system:nodes"
C = US
CN = "system:node:*.kes-tenant-kes-hl-svc.default.svc.cluster.local"

[req_ext]
subjectAltName = @alt_names

[alt_names]
DNS.1 = kes-tenant-kes-0.kes-tenant-kes-hl-svc.default.svc.cluster.local
DNS.2 = kes-tenant-kes-hl-svc.default.svc.cluster.local
Be sure to modify the Common Name (CN) and the Subject Alternative Names (SAN, under [alt_names]) to match the FQDN of your KES nodes. Be sure to use proper FQDNs and not IP addresses.
Create the CSR using the command below:
openssl req -new -config cert.cnf -key private.key -out kes.csr
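Before submitting the CSR, it can be worth confirming that it carries the subject and SANs you intended. The sketch below repeats the steps so far in a scratch directory (an assumption made so the example is self-contained) and then inspects the result with openssl req. The hostnames are the example values from this post, so substitute your own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing existing is overwritten.
workdir="$(mktemp -d)"
cd "$workdir"

# Same cert.cnf as in the post; adjust CN and DNS entries for your tenant.
cat > cert.cnf <<'EOF'
[req]
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no

[req_distinguished_name]
O = "system:nodes"
C = US
CN = "system:node:*.kes-tenant-kes-hl-svc.default.svc.cluster.local"

[req_ext]
subjectAltName = @alt_names

[alt_names]
DNS.1 = kes-tenant-kes-0.kes-tenant-kes-hl-svc.default.svc.cluster.local
DNS.2 = kes-tenant-kes-hl-svc.default.svc.cluster.local
EOF

openssl genrsa -out private.key 2048 2>/dev/null
openssl req -new -config cert.cnf -key private.key -out kes.csr

# Check that the CSR's self-signature is intact.
openssl req -in kes.csr -noout -verify

# Dump the CSR as text and show only the subject and the SAN entries.
csr_text="$(openssl req -in kes.csr -noout -text)"
grep -E 'Subject:|DNS:' <<<"$csr_text"
```

If a SAN is missing here, it will also be missing from the issued cert, so this is the cheapest point at which to catch a typo in cert.cnf.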
Convert the CSR into a base64-encoded string so it can be added to Kubernetes as a CertificateSigningRequest resource.
cat kes.csr | base64 | tr -d "\n"
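The tr -d "\n" matters because some base64 implementations wrap their output across several lines, while the request field expects a single string. As a rough sanity check, you could confirm that the encoded value is one line and decodes back to the original file. The sketch below uses a placeholder file in a scratch directory in place of a real kes.csr, and assumes your base64 supports -d for decoding.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Placeholder content standing in for a real kes.csr (assumption for the demo).
printf -- '-----BEGIN CERTIFICATE REQUEST-----\nplaceholder\n-----END CERTIFICATE REQUEST-----\n' \
  > "$workdir/kes.csr"

# Encode and strip any line wrapping added by base64.
encoded="$(base64 < "$workdir/kes.csr" | tr -d '\n')"

# Count newline characters in the encoded string; 0 means it is a single line.
line_count="$(printf '%s' "$encoded" | wc -l | tr -d ' ')"

# Decode again and compare byte-for-byte with the original file.
printf '%s' "$encoded" | base64 -d > "$workdir/roundtrip.csr"
if cmp -s "$workdir/kes.csr" "$workdir/roundtrip.csr"; then
  roundtrip=ok
else
  roundtrip=mismatch
fi

echo "newlines in encoded string: $line_count, round trip: $roundtrip"
```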
Create a file kes-csr.yaml with the content below and paste the encoded CSR from above into the request field. The encoded CSR has been truncated here so you can see the entire yaml.
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: kes-csr
spec:
  expirationSeconds: 604800
  groups:
    - system:serviceaccounts
    - system:serviceaccounts:minio-operator
    - system:authenticated
    - system:nodes
  request: LS0tLS1CRUdJTiBDRV…FUVVFU1QtLS0tLQo=
  signerName: kubernetes.io/kubelet-serving
  usages:
    - digital signature
    - key encipherment
    - server auth
  username: system:serviceaccount:minio-operator:minio-operator
Be sure to set expirationSeconds high enough that the new cert doesn’t expire again too soon; the value 604800 in the example is only seven days.
Once the encoded CSR has been added and the other settings are in place, apply the yaml.
kubectl apply -f kes-csr.yaml
Be sure to approve the kes-csr CSR created above:
kubectl certificate approve kes-csr
Retrieve the signed public cert from the csr resource:
kubectl get csr kes-csr -o jsonpath='{.status.certificate}' | base64 -d > public.crt
Convert both the private.key (from the beginning of the process) and public.crt (from the previous step) to encoded strings.
cat private.key | base64 | tr -d "\n"
cat public.crt | base64 | tr -d "\n"
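Before putting these two values into the Secret, you may want to confirm that the certificate really was issued for this private key, since a mismatched pair leads to the same kind of handshake failures as an expired cert. One common way is to compare the public key derived from each file. The sketch below is an illustration only: it generates a throwaway pair in a scratch directory as a stand-in for your real private.key and public.crt.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Placeholder pair (assumption for the demo); use your real files instead.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout private.key -out public.crt \
  -subj "/CN=kes-demo" -days 1 2>/dev/null

# Hash the public key derived from the private key...
key_fp="$(openssl pkey -in private.key -pubout | openssl dgst -sha256)"
# ...and the public key embedded in the certificate.
crt_fp="$(openssl x509 -in public.crt -noout -pubkey | openssl dgst -sha256)"

# The two digests are identical only when the key and certificate belong together.
if [ "$key_fp" = "$crt_fp" ]; then
  match=yes
else
  match=no
fi

echo "key matches certificate: $match"
```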
Using the encoded strings from above, we’ll update the existing Secret kes-tenant-kes-tls. To do that, follow the steps below.
Back up the existing Secret that holds the expired cert.
kubectl get secret kes-tenant-kes-tls -o yaml > kes-tls-secret.yaml
Once you have backed up the existing Secret, delete it:
kubectl delete secret kes-tenant-kes-tls
Open kes-tls-secret.yaml, which still contains the expired certs, and replace the following two fields with their respective base64-encoded strings.
data:
  private.key: >-
    LS0tLS1CRUd…ZLS0tLS0
  public.crt: >-
    LS0tLS1CRUdJTi…tLS0K
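If copying long base64 strings by hand feels error-prone, the two fields can be generated instead. The sketch below is an assumption-laden illustration: the helper b64 is a hypothetical name, and the two files are tiny placeholders in a scratch directory, not real key material. It prints each value as a plain single-line scalar, which for a single-line value should be equivalent to the folded >- form shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Placeholder files (assumption for the demo); use the real private.key
# and public.crt from the earlier steps.
printf 'demo-key\n' > private.key
printf 'demo-cert\n' > public.crt

# Hypothetical helper: base64-encode a file onto a single line.
b64() { base64 < "$1" | tr -d '\n'; }

# Build the data: fragment with two-space indentation under "data:".
fragment="$(printf 'data:\n  private.key: %s\n  public.crt: %s\n' \
  "$(b64 private.key)" "$(b64 public.crt)")"

echo "$fragment"
```

You can then paste the printed fragment over the old data: section of kes-tls-secret.yaml.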
Once the new certs have been added, apply the Secret, which will recreate kes-tenant-kes-tls:
kubectl apply -f kes-tls-secret.yaml
Once a valid cert is in place, be sure to restart the KES service. Its log should then show output like this:
'http://vault.default.svc.cluster.local:8200' ...
Endpoint: https://127.0.0.1:7373 https://10.244.4.16:7373
Admin: _ [ disabled ]
Auth: off [ any client can connect but policies still apply ]
Keys: Hashicorp Vault: http://vault.default.svc.cluster.local:8200
CLI: export KES_SERVER=https://127.0.0.1:7373
     export KES_CLIENT_KEY= // e.g. $HOME/root.key
     export KES_CLIENT_CERT= // e.g. $HOME/root.cert
     kes --help
The MinIO log should also be clean and should not show any TLS errors anymore.
Waiting for all MinIO sub-systems to be initialized.. lock acquired
Automatically configured API requests per node based on available memory on the system: 221
All MinIO sub-systems initialized successfully in 15.44125ms
MinIO Object Storage Server
Copyright: 2015-2024 MinIO, Inc.
License: GNU AGPLv3
Version: RELEASE.2024-01-04T09-40-09Z (go1.19.4 linux/arm64)
Status: 4 Online, 0 Offline.
API: https://minio.default.svc.cluster.local
Console: https://10.244.3.12:9443 https://127.0.0.1:9443
Documentation: https://min.io/docs/minio/linux/index.html
Final Thoughts
KES is integral to managing the keys used to encrypt objects. Objects need to be encrypted and decrypted as quickly as possible, because any time spent on these operations delays delivery of the object to the end user. Inefficient and slow KMS systems can ultimately degrade overall cluster performance, so it's paramount that the service performing these actions is fast, lean and scalable. MinIO’s KES lets any KMS act as a high-performance, scalable service without any modification to the existing KMS. By following the steps above, you can get KES back to having valid, unexpired certs in no time!
If you have any questions on KES be sure to reach out to us on Slack!