Switching Certificates (TLS Tool 1.6)

Elasticsearch version 6.6
Searchguard: search-guard-6-6.6.1-24.1.zip

JVM: openjdk8 - Java 1.8.0

Searchguard configuration files - default shipped (or noted below)

Other plugins:
Timelion in kibana - nothing else in elasticsearch.

The cluster was initially set up as an Elasticsearch 6.4 cluster with searchguard.
The scripts from searchguard (from some example scripts downloaded at the time, I think) were used to generate the initial certificates:
gen_client_node_cert.sh
gen_root_ca.sh
etc

Then added the config for the keystore/truststores as such:

searchguard.enterprise_modules_enabled: true
searchguard.restapi.roles_enabled: ["sg_all_access"]
searchguard.authcz.admin_dn:
- "<sanitized, but works>"

#searchguard.ssl.http.clientauth_mode: REQUIRE
#searchguard.ssl.http.clientauth_mode: NONE
searchguard.ssl.http.clientauth_mode: OPTIONAL

Transport layer SSL

searchguard.ssl.transport.enabled: true

searchguard.ssl.transport.keystore_type: JKS
searchguard.ssl.transport.keystore_filepath: /etc/elasticsearch/elk-certs/keystores/node1-keystore.jks
searchguard.ssl.transport.keystore_password:
searchguard.ssl.transport.truststore_type: JKS
searchguard.ssl.transport.truststore_filepath: /etc/elasticsearch/elk-certs/keystores/truststore.jks
searchguard.ssl.transport.truststore_password:
searchguard.ssl.transport.enforce_hostname_verification: false

HTTP/REST layer SSL

#searchguard.ssl.http.enabled: false
searchguard.ssl.http.enabled: true

searchguard.ssl.http.keystore_type: JKS
searchguard.ssl.http.keystore_filepath: /etc/elasticsearch/elk-certs/keystores/node1-keystore.jks
searchguard.ssl.http.keystore_password:
searchguard.ssl.http.truststore_type: JKS
searchguard.ssl.http.truststore_filepath: /etc/elasticsearch/elk-certs/keystores/truststore.jks
searchguard.ssl.http.truststore_password:

searchguard.roles_mapping_resolution: BOTH

The cluster was/is running fine.

I stepped in and upgraded to ES 6.6 and the new version of searchguard with the same config. It still works fine.


Important: Now I want to replace our current certificates with a new set, because the password for the root CA key used during the original root CA generation has been lost. I can't sign new certs to add more nodes; I only seem to be able to use the truststore to verify the current chain.
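
For anyone double-checking the same situation, a minimal sketch of the two checks involved (the root-ca key path is an assumption based on the old gen_root_ca.sh layout; the truststore path is from the config above):

# Try to open the root CA private key; this prompts for its passphrase and fails if it is wrong.
openssl rsa -in /etc/elasticsearch/elk-certs/root-ca.key -check -noout
# The truststore only holds CA certificates (no private keys), so it can verify the existing chain
# but cannot be used to sign new node certificates.
keytool -list -v -keystore /etc/elasticsearch/elk-certs/keystores/truststore.jks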


I downloaded tls-tool-1.6 and generated new certs with the following sgtlstool.sh config:

ca:
  root:
    # The distinguished name of this CA. You must specify a distinguished name.
    # example: dn: CN=root.ca.example.com,OU=CA,O=Example Com, Inc.,DC=example,DC=com
    dn: CN=root.ca-,OU=,O=,DC=d1,DC=d2,DC=com

    # The size of the generated key in bits
    keysize: 2048
    # The validity of the generated certificate in days from now
    validityDays: 3650
    # Password for private key
    #   Possible values:
    #   - auto: automatically generated password, returned in config output;
    #   - none: unencrypted private key;
    #   - other values: other values are used directly as password
    pkPassword: <password>
    # The name of the generated files can be changed here
    file: root-ca.pem
    # If you have a certificate revocation list, you can specify its distribution points here
    # crlDistributionPoints: URI:https://raw.githubusercontent.com/floragunncom/unittest-assets/master/revoked.crl

defaults:

  # The validity of the generated certificate in days from now
  validityDays: 3650


  # Password for private key
  #   Possible values: 
  #   - auto: automatically generated password, returned in config output; 
  #   - none: unencrypted private key; 
  #   - other values: other values are used directly as password   
  pkPassword: <password>

  # Specifies to recognize legitimate nodes by the distinguished names
  # of the certificates. This can be a list of DNs, which can contain wildcards.
  # Furthermore, it is possible to specify regular expressions by
  # enclosing the DN in //. 
  # Specification of this is optional. The tool will always include
  # the DNs of the nodes specified in the nodes section.
  # 
  # Examples:      
  # - "CN=*.example.com,OU=Ops,O=Example Com\\, Inc.,DC=example,DC=com"
  # - 'CN=node.other.com,OU=SSL,O=Test,L=Test,C=DE'
  # - 'CN=*.example.com,OU=SSL,O=Test,L=Test,C=DE'
  # - 'CN=elk-devcluster*'
  # - '/CN=.*regex/' 
  nodesDn:
          - '/CN=node[0-9]*/i,OU=<ou>,O=<org>,DC=d1,DC=d2,DC=com'

nodes:
  # The node name is just used as name of the generated files
  - name: node1.d1.d2.com
    # The distinguished name of this node
    dn: CN=node1,OU=,O=,DC=d1,DC=d2,DC=com
    # DNS names of this node. Several names can be specified as list
    dns:
    # The IP addresses of this node. Several addresses can be specified as list
    ip:
      - ip1
      - ip2
    # If you want to override the keysize, pkPassword or validityDays values from
    # the defaults, just specify them here.

  - name: node2.d1.d2.com
    # The distinguished name of this node
    dn: CN=node2,OU=,O=,DC=d1,DC=d2,DC=com
    # DNS names of this node. Several names can be specified as list
    dns:
      - node2.d1.d2.ca
    # The IP addresses of this node. Several addresses can be specified as list
    ip: ip1
    # If you want to override the keysize, pkPassword or validityDays values from
    # the defaults, just specify them here.

This was repeated for a total of 5 nodes; all the others match the node2 pattern.

I ran sgtlstool.sh with the following command:

sgtlstool.sh -ca -crt -c cluster.yml -t ./output/
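
Before deploying anything, the generated files can be sanity-checked against the new root CA, e.g. (a sketch; filenames as produced in ./output/):

cd ./output/
# Each node certificate (transport and HTTP) should verify against the new root CA.
openssl verify -CAfile root-ca.pem node1.d1.d2.com.pem node1.d1.d2.com_http.pem
# The subject DN should match what ends up in searchguard.nodes_dn.
openssl x509 -in node1.d1.d2.com.pem -noout -subject -dates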

The .yml snippets in the output had this:

searchguard.ssl.transport.pemcert_filepath: node5.d1.d2.com.pem
searchguard.ssl.transport.pemkey_filepath: node5.d1.d2.com.key
searchguard.ssl.transport.pemkey_password:
searchguard.ssl.transport.pemtrustedcas_filepath: root-ca.pem
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.transport.resolve_hostname: false
searchguard.ssl.http.enabled: true
searchguard.ssl.http.pemcert_filepath: node5.d1.d2.com_http.pem
searchguard.ssl.http.pemkey_filepath: node5.d1.d2.com_http.key
searchguard.ssl.http.pemkey_password:
searchguard.ssl.http.pemtrustedcas_filepath: root-ca.pem
searchguard.nodes_dn:
- /CN=node[0-9]*/i,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node1,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node2,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node3,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node4,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node5,OU=,O=,DC=d1,DC=d2,DC=com
searchguard.authcz.admin_dn:
- CN=sgAdmin,OU=,O=,DC=d1,DC=d2,DC=com

I replaced it with this:

searchguard.ssl.transport.pemcert_filepath: elk-certs/node5.d1.d2.com.pem
searchguard.ssl.transport.pemkey_filepath: elk-certs/node5.d1.d2.com.key
searchguard.ssl.transport.pemkey_password:
searchguard.ssl.transport.pemtrustedcas_filepath: elk-certs/root-ca.pem
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.transport.resolve_hostname: false
searchguard.ssl.http.enabled: true
searchguard.ssl.http.pemcert_filepath: elk-certs/node5.d1.d2.com_http.pem
searchguard.ssl.http.pemkey_filepath: elk-certs/node5.d1.d2.com_http.key
searchguard.ssl.http.pemkey_password:
searchguard.ssl.http.pemtrustedcas_filepath: elk-certs/root-ca.pem
searchguard.nodes_dn:
- /CN=node[0-9]*/i,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node1,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node2,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node3,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node4,OU=,O=,DC=d1,DC=d2,DC=com
- CN=node5,OU=,O=,DC=d1,DC=d2,DC=com
searchguard.authcz.admin_dn:
- CN=sgAdmin,OU=,O=,DC=d1,DC=d2,DC=com
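
For reference, the relative paths in this snippet are resolved against the Elasticsearch config directory, so this assumes the generated files were staged under /etc/elasticsearch/elk-certs/ and are readable by the elasticsearch user, roughly like this (directory, ownership and modes are assumptions about our setup):

mkdir -p /etc/elasticsearch/elk-certs
cp output/node5.d1.d2.com.pem output/node5.d1.d2.com.key \
   output/node5.d1.d2.com_http.pem output/node5.d1.d2.com_http.key \
   output/root-ca.pem /etc/elasticsearch/elk-certs/
chown elasticsearch:elasticsearch /etc/elasticsearch/elk-certs/*
chmod 600 /etc/elasticsearch/elk-certs/*.key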

sgtlsdiag.sh lists no problems when run against this.

I shut down all the nodes, replaced the certs, and started them back up. They didn't seem to have any problems starting up. However, after a short time, once they all started seeing the other nodes, the logs filled up with certificate_unknown errors.
These, and the waiting-for-master-nodes messages, are all that appear for about 15 minutes.

[2019-03-28T20:28:38,925][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [log1-op] SSL Problem Received fatal alert: certificate_unknown
javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634) ~[?:?]
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1800) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1083) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) ~[?:?]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_181]
at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:405) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:372) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
[2019-03-28T20:28:38,925][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [log1-op] SSL Problem Received fatal alert: certificate_unknown
javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1666) ~[?:?]
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1634) ~[?:?]
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1800) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1083) ~[?:?]
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) ~[?:?]
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) ~[?:?]
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[?:1.8.0_181]
at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:295) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1301) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:405) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:372) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:355) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
at io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1054) ~[netty-handler-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:474) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

interspersed with this every now and then:

[2019-03-28T20:28:41,208][WARN ][o.e.d.z.ZenDiscovery ] [node1] not enough master nodes discovered during pinging (found [], but needed [2]), pinging again

I assume I am doing something wrong, but I don't know what, and I would appreciate any help sorting it out.
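
One check I can run during the next window, to see which certificate each layer is actually presenting and whether it chains to the new root CA (hostname, ports and the root-ca path are placeholders for our setup):

# HTTP layer (default port 9200)
openssl s_client -connect node1.d1.d2.com:9200 -showcerts -CAfile /etc/elasticsearch/elk-certs/root-ca.pem </dev/null
# Transport layer (default port 9300)
openssl s_client -connect node1.d1.d2.com:9300 -showcerts -CAfile /etc/elasticsearch/elk-certs/root-ca.pem </dev/null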

If you need more logs, I can provide them; it just might take some time, as I have to schedule windows in which to attempt the switch and capture the logs.

Thanks.

I cannot reproduce this issue.

Please have a look on the demo script for the TLS Tool: https://gist.github.com/floragunncom/11afce28a77219db92f2d76bb5a0b803#file-tls_tool_demo-sh

I used your basic setup to create this script. It generates certificates and later starts up two elasticsearch nodes with those certificates. Works well for me.

Just download the .sh file, make it executable, and run it in a folder where you have write permissions.

Thanks, I will be able to test it shortly.

However, just based on your response, I would like to clarify.

This solution just seems to test that the certificates work from scratch.

I am having trouble migrating the existing cluster from the old certificates to the new ones, and I was wondering if the problem lies somewhere in that.

Do you mean migrating without downtime? If yes, and in case you need to change the CA, it's a two-step process (meaning you need to do a rolling restart of your cluster twice). First, add your new CA to every node and do a rolling restart of the whole cluster. Then update the certificates for every node (that's the second rolling restart). In case you want to remove the old CA from the chain of trust, you need a third rolling restart (but maybe you can defer that until you need to restart anyhow; if your old CA is somehow compromised, that's of course not an option).
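
For the first step, a minimal sketch of what that could look like with your existing JKS truststore (alias and store password are placeholders):

# Add the new root CA certificate (the certificate only, never the key)
# to the truststore that both the transport and HTTP layers already use.
keytool -importcert -alias new-root-ca -noprompt \
  -file root-ca.pem \
  -keystore /etc/elasticsearch/elk-certs/keystores/truststore.jks \
  -storepass <truststore password>

# Check that both the old and the new root CA entries are now present.
keytool -list -keystore /etc/elasticsearch/elk-certs/keystores/truststore.jks -storepass <truststore password>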

See also changing CA and host certificates

No downtime is a bonus, but not necessary.

In my original post, I mentioned that I brought down all the nodes at the same time, replaced the certificates, and changed the config to point to the new ones.
The config in elasticsearch.yml changed, as the old setup used a keystore/truststore while the new TLS tool just uses plain PEM certs and key files, and I used the config almost exactly as in the snippet provided.
Then I brought them all back up. Elasticsearch's initial startup seemed to have no problem with the new certs, but once the nodes saw each other they started complaining.

However, I will attempt the method you mentioned.

Please help me confirm the summary of steps:

  1. I should add the new root-ca pem and key to the existing truststore, doing rolling restarts as I do so

  2. Bring down the nodes one at a time and replace the certificates/config

     - leaving the truststore config in elasticsearch.yml alone:

searchguard.ssl.http.truststore_type: JKS
searchguard.ssl.http.truststore_filepath: /etc/elasticsearch/elk-certs/keystores/truststore.jks
searchguard.ssl.http.truststore_password:

And not replacing it with this (from the snippet):

searchguard.ssl.http.pemtrustedcas_filepath: root-ca.pem

  3. Bring up the single node and then repeat for the others

  4. Then (you mentioned this is optional until I actually need to do it) do a rolling restart as I change the truststore config to the pemtrustedcas_filepath config

Thanks.

  1. Yes, but not the key, of course, because the key of the root CA should be kept secret. You never have keys (only certs) in the truststore. Do this for every node, one at a time, following the general rules of how to do a rolling restart.

  2. In this step, only change the certificate of the node (the keystore), not the truststore (you already updated it in step 1 so that it contains both the old and the new root CA cert). A keytool sketch follows below.

  3. Yes

  4. Yes; in this step you remove the old root CA from the truststore, node by node

If you can live with downtime, it's easier: just shut down all nodes, replace the key- and truststore files with the new ones, and copy in the complete snippets from the TLS tool.
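
If you prefer to keep the JKS keystore files during step 2 instead of switching to the PEM settings from the snippet, a rough sketch of replacing a node entry could look like this (alias and passwords are placeholders, filenames are from the TLS tool output):

# Bundle the new node certificate and key into a PKCS12 file.
openssl pkcs12 -export \
  -in node1.d1.d2.com.pem -inkey node1.d1.d2.com.key \
  -certfile root-ca.pem -name node1 \
  -out node1.p12 -passout pass:<export password>

# Import it into the existing node keystore
# (if the old entry uses the same alias, remove it first with keytool -delete).
keytool -importkeystore \
  -srckeystore node1.p12 -srcstoretype PKCS12 -srcstorepass <export password> \
  -destkeystore /etc/elasticsearch/elk-certs/keystores/node1-keystore.jks \
  -deststorepass <keystore password> \
  -srcalias node1 -destalias node1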

All right, since you said I should be able to just replace the existing certificates and config with the ones from the tool and the .yml snippet, I'm afraid we're back at square one.

That is what I initially did in the first place and it didn’t work.

  - Ran the tool and copied the certificates to /etc/elasticsearch
  - Shut down all the nodes
  - Copied and replaced the config from the snippet into elasticsearch.yml
  - Brought the nodes back up
    -> Nothing but certificate_unknown errors

I have some time scheduled to do the rolling method later this week and I will try that, but it doesn’t sound like it will change anything.

I will update when I can.

But “square one” is known to work, as explained in Switching Certificates (TLS Tool 1.6) - #3 by hsaly

I’m sorry, but it doesn’t work.

The certificates may work on a blank, from-scratch installation; I tested this on a new 2-node test setup as well, with both your script and my initial run of the tool.

However, for some reason they are not working when doing a straight switch on our current operational systems.
This is what I am trying to find out.

I don't know if some remnant issue from the elasticsearch/search-guard 6.2.4 days is causing problems, or if there is an issue with the existing SG index being linked to the certificates, but something is not working.
I listed the current situation and the exact steps I took in the first post (some of the formatting got odd when I copied it from the Google Groups thread, sorry).

If more information is needed to help troubleshoot, I can certainly provide it, but it is not working.

Can you attach both zipped config folders from the working two-node cluster before you updated the certs, and both zipped config folders from the not-working two-node cluster after the cert update?

I can, but I should also clarify:

The existing operational cluster is 5 nodes; this is the one that currently works, but I am trying to switch certificates and the new certificates don't seem to like it.

The 2-node cluster was just a blank testing setup, where I ran the TLS tool against a fresh installation.
I don't have a testing environment for the “switch”.
I can provide the config for this too, though.

I will get the config from the existing installation and upload it shortly.
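
I will strip the private key material before uploading, roughly like this (paths are our layout, as a sketch):

tar czf node1-config.tgz --exclude='*.key' --exclude='*keystore.jks' /etc/elasticsearch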

Thanks.
