Searchguard not failing for expired certificates

Elasticsearch version: 7.0.1

Describe the issue:
I installed Elasticsearch with Search Guard using expired self-signed certificates. Even with expired certs, node-to-node communication works, sgadmin runs successfully, and the cluster is serviceable.

As you can see below, the certificate expired on Jul 09 13:17:22 UTC 2020, but even after that, I am able to use it with Search Guard.

$ keytool -list -v -storepass changeit -keystore /etc/elasticsearch/certs/elasticsearch.jks
Keystore type: PKCS12
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: elasticsearch
Creation date: Jul 8, 2020
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=elasticsearch, C=IN
Issuer: CN=elasticsearch, C=IN
Serial number: 6bca73af
**Valid from: Wed Jul 08 13:17:22 UTC 2020 until: Thu Jul 09 13:17:22 UTC 2020**
Certificate fingerprints:
	 SHA1: 35:A5:04:6A:3E:36:5E:57:38:AA:48:E5:F

$ date
Fri Jul 10 16:05:03 UTC 2020
$ curl https://elasticsearch:9200 -uadmin:admin -k -v
* About to connect() to elasticsearch port 9200 (#0)
*   Trying 10.254.236.64...
* Connected to elasticsearch (10.254.236.64) port 9200 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* 	subject: CN=elasticsearch,C=IN
* 	start date: Jul 08 13:17:22 2020 GMT
* 	expire date: Jul 09 13:17:22 2020 GMT
* 	common name: elasticsearch
* 	issuer: CN=elasticsearch,C=IN
* Server auth using Basic with user 'admin'
GET / HTTP/1.1
Authorization: Basic YWRtaW46YWRtaW4=
User-Agent: curl/7.29.0
Host: elasticsearch:9200
Accept: */*
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 554

{
  "name" : "shiv-elk-elasticsearch-client-5f6bb789c6-5n7sq",
  "cluster_name" : "shiv-sg-es-dont-delete",
  "cluster_uuid" : "iaT1Fsd5ThCrBpAwAT93uQ",
  "version" : {
    "number" : "7.0.1",
    "build_flavor" : "oss",
    "build_type" : "rpm",
    "build_hash" : "e4efcb5",
    "build_date" : "2019-04-29T12:56:03.145736Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.7.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Steps to reproduce:

  1. Generate node & admin certificates with 1-day validity

     $ keytool -genkey -keyalg RSA -alias elasticsearch -keystore elasticsearch.jks -storepass changeit -validity 1 -keysize 2048 -dname "CN=elasticsearch,C=IN"
     $ keytool -genkey -keyalg RSA -alias admin -keystore admin-keystore.jks -storepass changeit -validity 1 -keysize 2048 -dname "CN=admin,C=IN"
     # Extract certs
     $ keytool -export -alias elasticsearch -keystore elasticsearch.jks -storepass changeit -file elasticsearch-cert.pem
     $ keytool -export -alias admin -keystore admin-keystore.jks -storepass changeit -file admin-cert.pem
     # Import the certs into the truststore
     $ keytool -import -noprompt -alias elasticsearch -file elasticsearch-cert.pem -keystore truststore.jks -storepass changeit
     $ keytool -import -noprompt -alias admin -file admin-cert.pem -keystore truststore.jks -storepass changeit
    
  2. Mount the generated certificates into the Elasticsearch pods and configure the properties in elasticsearch.yml according to the mount paths:

 searchguard:
    ssl.transport:
        enabled: true
        keystore_type: JKS
        keystore_filepath: /etc/elasticsearch/certs/elasticsearch.jks
        keystore_password: changeit
        truststore_type: JKS
        truststore_filepath: /etc/elasticsearch/certs/truststore.jks
        truststore_password: changeit
        enforce_hostname_verification: false
    ssl.http:
        enabled: true
        clientauth_mode: OPTIONAL
        enable_openssl_if_available: true
        keystore_type: JKS
        keystore_filepath: /etc/elasticsearch/certs/elasticsearch.jks
        keystore_password: changeit
        truststore_type: JKS
        truststore_filepath: /etc/elasticsearch/certs/truststore.jks
        truststore_password: changeit
    authcz.admin_dn:
      - "CN=admin,C=IN"
    enterprise_modules_enabled: false
    nodes_dn:
      - "CN=elasticsearch,C=IN"
  3. After 1 day (after the certificates expire), start Elasticsearch and run sgadmin.

Expected behavior: As the certificates are expired, the SSL communication should have failed.
However, no error logs are seen in Elasticsearch.
Is this expected? Shouldn't Search Guard fail when certs expire?

Thanks for reporting this. I have created certificates that expire tomorrow. I'll try to reproduce the issue then.

@srgbnd, thanks. Were you able to reproduce the issue? Any thoughts?

Hi. Yes, I was able to reproduce the issue yesterday. It is a bug indeed. I added it to the current sprint. It will be solved in the next release.

Also, I tested with expired TLS PEM certificates. In this case, the Search Guard cluster fails as expected.

Hi, thanks for the response. I have now tried with expired TLS PEM certificates too, but I observe the same issue :frowning:

Steps to generate (self-signed) PEM certificates valid for 1 day:

  1. Node cert (Generates node.key.pem, node.crt.pem)

$ openssl req -x509 -newkey rsa:4096 -keyout node.key.pem -out node.crt.pem -days 1 -subj "/C=IN/CN=elasticsearch" -nodes

  2. Admin cert (Generates admin.key.pem, admin.crt.pem)

$ openssl req -x509 -newkey rsa:4096 -keyout admin.key.pem -out admin.crt.pem -days 1 -subj "/C=IN/CN=admin" -nodes

  3. Root CA

$ cat node.crt.pem admin.crt.pem > root-ca.pem

SG Configurations with PEM certificates:

searchguard.ssl.transport.pemkey_filepath=/etc/elasticsearch/certs/node.key.pem
searchguard.ssl.transport.pemtrustedcas_filepath=/etc/elasticsearch/certs/root-ca.pem
searchguard.ssl.transport.pemcert_filepath=/etc/elasticsearch/certs/node.crt.pem
searchguard.nodes_dn="CN=elasticsearch,C=IN"
searchguard.authcz.admin_dn="CN=admin,C=IN"

Certificate validity

$ keytool -printcert -file node.crt.pem
Owner: CN=elasticsearch, C=IN
Issuer: CN=elasticsearch, C=IN
Serial number: a34db07d40311ef7
**Valid from: Wed Jul 22 05:54:57 UTC 2020 until: Thu Jul 23 05:54:57 UTC 2020**

$ date
Thu Jul 23 07:02:49 UTC 2020
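As an aside (not part of the original report), openssl can check a PEM certificate's expiry directly: `-checkend` tests whether the certificate expires within the next N seconds. A minimal sketch, using a throwaway cert generated the same way as above (paths under /tmp are illustrative):

```shell
# Generate a throwaway self-signed cert valid for 1 day (same shape as above)
openssl req -x509 -newkey rsa:2048 -keyout /tmp/node.key.pem -out /tmp/node.crt.pem \
  -days 1 -subj "/C=IN/CN=elasticsearch" -nodes 2>/dev/null

# Exit status 0, prints "Certificate will not expire": still valid in 1 hour
openssl x509 -checkend 3600 -in /tmp/node.crt.pem

# Exit status 1, prints "Certificate will expire": no longer valid in 2 days
openssl x509 -checkend 172800 -in /tmp/node.crt.pem || true
```

This kind of check is handy in a cron job or readiness probe to catch expiring node certs before the cluster behaves unexpectedly.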

Even after the certificates expired, I am able to install the cluster, run sgadmin (with admin.crt.pem and admin.key.pem), and add more nodes. Cluster health is green, and no error logs are seen in the Elasticsearch pods.
I am not sure how it failed with expired certificates in your case.

Also, another follow-up question:

Could you help with the timeline when the next release with this fix would be available?

> Could you help with the timeline when the next release with this fix would be available?

The next release will be out in the first week of the next month.

If you need a solution now, the SG TLS tool generates PEM certificates that expire correctly: https://docs.search-guard.com/latest/offline-tls-tool

Hi @shivani.aggarwal2195. We haven't forgotten about this issue; it has high priority. But we need more time to work on it. I'll ping you when it is solved.

Hi @srgbnd, sure, thanks for the update.

In the meantime, I would like to share our new observations with the SG TLS tool suggested previously.

We generated PEM certificates valid for 1 day using the tool.

  1. When a fresh install of the cluster is done with expired certificates, SSL communication between the nodes fails with java.security.cert.CertificateExpiredException and the cluster fails to form (works as expected).

  2. When a cluster is already running and the certificates expire after a while, we see no error logs related to SSL handshake issues. The cluster remains reachable on 9200 without any issues.

i. However, when we run sgadmin on the same cluster to update a few configurations, it fails with this error:

 Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
        at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:369) ~[?:?]
        at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:275) ~[?:?]
        at sun.security.validator.Validator.validate(Validator.java:264) ~[?:?]
        at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313) ~[?:?]
        at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:276) ~[?:?]
        at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141) ~[?:?]
        at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1317) ~[?:?]
    Caused by: java.security.cert.CertificateExpiredException: NotAfter: Thu Jul 30 06:17:10 UTC 2020
            at sun.security.x509.CertificateValidity.valid(CertificateValidity.java:274) ~[?:?]
            at sun.security.x509.X509CertImpl.checkValidity(X509CertImpl.java:675) ~[?:?]
            at sun.security.provider.certpath.BasicChecker.verifyValidity(BasicChecker.java:190) ~[?:?]
            at sun.security.provider.certpath.BasicChecker.check(BasicChecker.java:144) ~[?:?]
            at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:125) ~[?:?]
            at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:237) ~[?:?]
            at sun.security.provider.certpath.PKIXCertPathValidator.validate(PKIXCertPathValidator.java:145) ~[?:?]
            at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:84) ~[?:?]
            at java.security.cert.CertPathValidator.validate(CertPathValidator.java:309) ~[?:?]
            at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:364) ~[?:?]
            at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:275) ~[?:?]
            at sun.security.validator.Validator.validate(Validator.java:264) ~[?:?]
            at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:313) ~[?:?]
            at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:276) ~[?:?]
            at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:141) ~[?:?]
            at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1317) ~[?:?]
            ... 29 more
    ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
    Trace:
    NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{RwXrrucpTpCB722K0nAUMg}

ii. Also, adding new nodes to the cluster (or restarting one of the nodes) fails with these exceptions. On the elected master node:

SSL Problem Received fatal alert: certificate_unknown"}
javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown

and on the new node:

"SSL Problem PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed"}
javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
        at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:326) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:269) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:264) ~[?:?]

Is this behavior expected? Are the certificates validated only the first time the nodes establish communication with each other?

I expected the transport-layer communication on 9300 between nodes to fail after the certificates expired.


Hi @shivani.aggarwal2195
Thank you very much for the feedback! We'll look at this and get back to you.

Hi @shivani.aggarwal2195

This is a configuration issue. In your configuration, you put the node certificates into the trust store. This, however, is not a proper TLS configuration. You should create a root certificate that signs all node certificates. Only the root certificate should be in the trust store.

If you put the node certificates into the trust store, they are ultimately trusted. Certificates placed directly in the Java trust store are, in effect, never treated as expired. Thus, the cluster doesn't fail.

$ cat node.crt.pem admin.cert.pem > root-ca.pem

And in the case of PEM, you essentially do the same thing: root-ca.pem should contain only the root certificate, not the node or admin certificates.
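For illustration, here is a minimal sketch (not from the thread; file names mirror the earlier commands) of a proper chain with plain openssl, where a root CA signs the node certificate and only the root goes into the trust store:

```shell
# Work in a scratch directory
cd "$(mktemp -d)"

# 1. Root CA: key + self-signed root certificate
openssl req -x509 -newkey rsa:4096 -keyout root-ca.key.pem -out root-ca.pem \
  -days 365 -subj "/C=IN/CN=root-ca" -nodes

# 2. Node key + CSR (note: no -x509, so this is NOT self-signed)
openssl req -newkey rsa:4096 -keyout node.key.pem -out node.csr \
  -subj "/C=IN/CN=elasticsearch" -nodes

# 3. Sign the node CSR with the root CA
openssl x509 -req -in node.csr -CA root-ca.pem -CAkey root-ca.key.pem \
  -CAcreateserial -out node.crt.pem -days 1

# 4. Verify the chain; only root-ca.pem belongs in the trust store
openssl verify -CAfile root-ca.pem node.crt.pem
```

Repeat steps 2-3 for the admin certificate. With this layout, expiry of node.crt.pem is actually enforced, because trust flows through the root CA instead of the leaf certificate itself being a trust anchor.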

> When a cluster is already running and the certificates expire after a while, we see no error logs related to SSL handshake issues. The cluster remains reachable on 9200 without any issues.

Finally, regarding the SG TLS tool certificates: Elasticsearch uses long-lived TCP connections that won't be dropped immediately when the TLS certificate expires. So this behavior is expected too. We expect the connections to fail after a while.

You can reach the cluster on TCP 9200 because you use the -k flag in the curl command. The long form of this flag is --insecure; it makes curl skip certificate verification, so the expired certificate is ignored.

Hi @srgbnd, thanks for pointing out the issue in the TLS configuration of the keystores.

So that implies that self-signed certificates will always be trusted (even after they expire) as long as they are in the truststore.

We have now corrected the setup: a root CA signs the node & admin certs, and only the root CA certificate is placed in the truststore.

  • With this change, a fresh install of the cluster with expired certificates fails as expected.
  • However, for an existing cluster that was brought up with valid certificates and whose certificates expire after a while, we see no failures/errors in the cluster. All node-to-node communication keeps working and the cluster stays operational. We have monitored it for around 48 hours with no errors yet on any of the nodes.

In general, how long can we expect these long-lasting TCP connections to keep running and not fail?

You have only the root TLS certificate in the existing cluster's trust store, don't you?

> In general, how long can we expect these long-lasting TCP connections to keep running and not fail?

I haven't found the exact value in the Elastic docs; I need to look at the code. Here is the explanation of the long-lived connections: https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-transport.html#long-lived-connections

Yes, only the root CA certificate is placed in the truststore.

Do you still have this cluster running? Let's wait a couple more days, then tell me the results.