Describe the issue:
I installed Elasticsearch with Search Guard using expired self-signed certificates. Even with the expired certificates, node-to-node communication works, sgadmin runs successfully, and the cluster is serviceable.
As you can see below, the certificate expired on Jul 09 13:17:22 UTC 2020, yet I am still able to use it with Search Guard after that date.
$ keytool -list -v -storepass changeit -keystore /etc/elasticsearch/certs/elasticsearch.jks
Keystore type: PKCS12
Keystore provider: SUN
Your keystore contains 1 entry
Alias name: elasticsearch
Creation date: Jul 8, 2020
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=elasticsearch, C=IN
Issuer: CN=elasticsearch, C=IN
Serial number: 6bca73af
Valid from: Wed Jul 08 13:17:22 UTC 2020 until: Thu Jul 09 13:17:22 UTC 2020
Certificate fingerprints:
SHA1: 35:A5:04:6A:3E:36:5E:57:38:AA:48:E5:F
$ date
Fri Jul 10 16:05:03 UTC 2020
$ curl https://elasticsearch:9200 -uadmin:admin -k -v
* About to connect() to elasticsearch port 9200 (#0)
* Trying 10.254.236.64...
* Connected to elasticsearch (10.254.236.64) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* subject: CN=elasticsearch,C=IN
* start date: Jul 08 13:17:22 2020 GMT
* expire date: Jul 09 13:17:22 2020 GMT
* common name: elasticsearch
* issuer: CN=elasticsearch,C=IN
* Server auth using Basic with user 'admin'
GET / HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
User-Agent: curl/7.29.0
Host: elasticsearch:9200
Accept: */*
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 554
After one day (once the certificate has expired), start Elasticsearch and run sgadmin.
Expected behavior: as the certificates are expired, the SSL communication should fail.
No errors appear in the Elasticsearch logs.
Is this expected? Shouldn't Search Guard fail when the certificates expire?
$ keytool -printcert -file node.crt.pem
Owner: CN=elasticsearch, C=IN
Issuer: CN=elasticsearch, C=IN
Serial number: a34db07d40311ef7
Valid from: Wed Jul 22 05:54:57 UTC 2020 until: Thu Jul 23 05:54:57 UTC 2020
$ date
Thu Jul 23 07:02:49 UTC 2020
Even after the certificates expired, I am able to install the cluster, run sgadmin (with admin.crt.pem and admin.key.pem), and add more nodes. Cluster health is green, and there are no errors in the Elasticsearch pod logs.
I'm not sure how it failed with expired certificates in your case.
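To compare a PEM certificate's expiry with the current time without reading keytool output and date by hand, openssl can print the notAfter field and test it with -checkend. This is a minimal sketch assuming the openssl CLI is installed; it first creates a throwaway 1-day self-signed certificate with the subject used in this thread, under the hypothetical file names node.key.pem and node.crt.pem, so run it in an empty directory.

```shell
#!/usr/bin/env bash
# Sketch: check a PEM certificate's expiry with openssl.
# Assumption: openssl CLI available; file names are hypothetical.
set -euo pipefail

# Throwaway self-signed certificate valid for 1 day (same subject as in the thread).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout node.key.pem -out node.crt.pem \
  -days 1 -subj "/CN=elasticsearch/C=IN" 2>/dev/null

# Print the expiry date, e.g. "notAfter=Jul 23 05:54:57 2020 GMT".
openssl x509 -noout -enddate -in node.crt.pem

# -checkend N exits 0 if the certificate is still valid N seconds from now,
# and 1 if it will have expired by then. N=0 means "right now".
if openssl x509 -noout -checkend 0 -in node.crt.pem >/dev/null; then
  echo "node.crt.pem: still valid"
else
  echo "node.crt.pem: EXPIRED"
fi
```

Since -checkend takes a number of seconds, -checkend 86400 answers whether the certificate will still be valid one day from now, which is handy for catching short-lived test certificates before they lapse.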
Another follow-up question:
Could you share a timeline for when the next release with this fix will be available?
Hi @shivani.aggarwal2195, we haven't forgotten about this issue; it has high priority, but we need more time to work on it. I'll ping you when it is solved.
In the meantime, I would like to share our new observations with the SG TLS tool that was suggested previously.
We generated PEM certificates valid for 1 day using the tool.
When a fresh cluster is installed with expired certificates, SSL communication between the nodes fails with java.security.cert.CertificateExpiredException and the cluster fails to form (works as expected).
When a cluster is already running and the certificates expire after a while, we see no error logs related to SSL handshake issues. The cluster remains reachable on 9200 without any issues.
i. However, when we run sgadmin on the same cluster to update a few configurations, it fails with the following error:
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:369) ~[?:?]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:275) ~[?:?]
at sun.security.validator.Validator.validate(Validator.java:264) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:276) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141) ~[?:?]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1317) ~[?:?]
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Thu Jul 30 06:17:10 UTC 2020
at sun.security.x509.CertificateValidity.valid(CertificateValidity.java:274) ~[?:?]
at sun.security.x509.X509CertImpl.checkValidity(X509CertImpl.java:675) ~[?:?]
at sun.security.provider.certpath.BasicChecker.verifyValidity(BasicChecker.java:190) ~[?:?]
at sun.security.provider.certpath.BasicChecker.check(BasicChecker.java:144) ~[?:?]
at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:125) ~[?:?]
at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:237) ~[?:?]
at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:145) ~[?:?]
at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:84) ~[?:?]
at java.security.cert.CertPathValidator.validate(CertPathValidator.java:309) ~[?:?]
at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:364) ~[?:?]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:275) ~[?:?]
at sun.security.validator.Validator.validate(Validator.java:264) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:276) ~[?:?]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141) ~[?:?]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1317) ~[?:?]
... 29 more
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{RwXrrucpTpCB722K0nAUMg}
ii. Also, adding new nodes to the cluster (or restarting one of the nodes) fails with the following exceptions.
In the elected master node:
SSL Problem Received fatal alert: certificate_unknown"}
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
In the new node:
"SSL Problem PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed"}
javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:326) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:269) ~[?:?]
at sun.security.ssl.TransportContext.fatal(TransportContext.java:264) ~[?:?]
Is this behavior expected? Are the certificates validated only when nodes first establish communication with each other?
I expected the transport-layer communication on 9300 between nodes to fail after the certificates expired.
This is a configuration issue. In your configuration, you put the node certificates into the trust store. This, however, is not a proper TLS configuration. You should create a root certificate that signs all node certificates. Only the root certificate should be in the trust store.
If you put the node certificates into the trust store, they are trusted unconditionally. Certificates in the default Java trust store won't even be treated as expired. Thus, the cluster doesn't fail.
$ cat node.crt.pem admin.cert.pem > root-ca.pem
In the PEM case, you are essentially doing the same thing: root-ca.pem should contain only the root certificate, not the node or admin certificates.
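The recommended layout (a root CA signs the node certificate, and root-ca.pem holds only the root) can be sketched with openssl. This is a minimal sketch assuming the openssl CLI is installed; the file names and subjects are hypothetical. Real Search Guard node certificates may also need extra entries such as subjectAltName and extended key usage, which are deliberately omitted here.

```shell
#!/usr/bin/env bash
# Sketch: root CA -> node certificate, with only the root in root-ca.pem.
# Assumption: openssl CLI available; names/subjects are illustrative only.
set -euo pipefail

# 1. Root CA: long-lived self-signed certificate plus its private key.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout root-ca.key -out root-ca.pem \
  -days 3650 -subj "/CN=root-ca" 2>/dev/null

# 2. Node private key and certificate signing request.
openssl req -newkey rsa:2048 -nodes \
  -keyout node.key.pem -out node.csr \
  -subj "/CN=elasticsearch/C=IN" 2>/dev/null

# 3. The root CA signs the node certificate (1-day validity, like the
#    short-lived test certificates in this thread).
openssl x509 -req -in node.csr \
  -CA root-ca.pem -CAkey root-ca.key -CAcreateserial \
  -out node.crt.pem -days 1 2>/dev/null

# root-ca.pem must contain exactly one certificate: the root (prints 1).
grep -c 'BEGIN CERTIFICATE' root-ca.pem

# Validate the node certificate against the root only.
# Prints "node.crt.pem: OK" while the node certificate is valid.
openssl verify -CAfile root-ca.pem node.crt.pem
```

Because the node certificate is validated against the root instead of being trusted directly, running the same openssl verify command after the node certificate's notAfter date should report a "certificate has expired" error. Likewise, replacing -k in the curl command with --cacert root-ca.pem makes curl verify the server certificate chain against this root instead of skipping verification.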
When a cluster is already running and the certificates expire after a while, we see no error logs related to SSL handshake issues. The cluster remains reachable on 9200 without any issues.
Finally, regarding the SG TLS tool certificates: Elasticsearch uses long-lived TCP connections, which are not closed immediately when the TLS certificate expires. Certificates are validated during the TLS handshake, so connections established while the certificates were still valid keep working. Thus this behavior is expected too. We expect the connections to fail after a while.
You can reach the cluster on TCP port 9200 because you use the -k option in the curl command. The long form of this option is --insecure; it makes curl skip certificate verification, so the expired certificate is ignored.
Hi @srgbnd, thanks for pointing out the issue with the TLS configuration in the keystores.
So this implies that self-signed certificates will always be trusted (even after they expire) if they are in the trust store.
We have now corrected the setup: a root CA signs the node and admin certificates, and only the root CA certificate is placed in the trust store.
With this change, a new cluster installed with expired certificates fails, as expected.
However, with an existing cluster (brought up with valid certificates that expire after a while), we see no failures or errors. All node-to-node communication keeps working and the cluster is operational. We have monitored it for around 48 hours and have seen no errors yet in any of the nodes.
In general, how long can we expect these long-lived TCP connections to keep running before they fail?