Nodes can't connect when replacing JKS with PEM certs

#1

Hi

Our cluster has been using truststore and keystore JKS files for admin/HTTP and node (transport) certificates, but we’re going to use PEM files (using OpenSSL) for transport certs for node communication, while using existing truststore/keystore for admin/HTTP certificates.

I followed the guides to set up PEM certs, and the requirements are installed and configured. I stopped the whole cluster; the first node started without an issue, but when the 2nd node starts, it can’t connect to the other node (logs below). The cert has the SAN attribute, and hostname verification is disabled (cert and configs below). I tried this with and without explicitly defining “searchguard.nodes_dn” in the configuration file.

Solutions/hints are much appreciated.

Logs

[2019-05-09T15:50:15,548][WARN ][o.e.n.Node               ] [es-warm2] timed out while waiting for initial discovery state - timeout: 30s
[2019-05-09T15:50:15,557][INFO ][o.e.h.n.Netty4HttpServerTransport] [es-warm2] publish_address {10.1.2.189:9200}, bound_addresses {127.0.0.1:9200}, {127.0.1.1:9200}, {10.1.2.189:9200}
[2019-05-09T15:50:15,557][INFO ][o.e.n.Node               ] [es-warm2] started
[2019-05-09T15:50:15,558][INFO ][c.f.s.SearchGuardPlugin  ] [es-warm2] 4 Search Guard modules loaded so far: [Module [type=DLSFLS, implementing class=com.floragunn.searchguard.configuration.SearchGuardFlsDlsIndexSearcherWrapper], Module [type=REST_MANAGEMENT_API, implementing class=com.floragunn.searchguard.dlic.rest.api.SearchGuardRestApiActions], Module [type=MULTITENANCY, implementing class=com.floragunn.searchguard.configuration.PrivilegesInterceptorImpl], Module [type=AUDITLOG, implementing class=com.floragunn.searchguard.auditlog.impl.AuditLogImpl]]
[2019-05-09T15:50:15,615][WARN ][o.e.d.z.ZenDiscovery     ] [es-warm2] not enough master nodes discovered during pinging (found [[Candidate{node={es-warm2}{eDxGfvIsQP-LhzJvQI0O7A}{NXGBOSpWRHGy1B_N6vqr1A}{10.1.2.189}{10.1.2.189:9300}{xpack.installed=true, box_type=warm, cabinet=hypervisor9.c1}, clusterStateVersion=-1}]], but needed [3]), pinging again
[2019-05-09T15:50:23,703][ERROR][c.f.s.t.SearchGuardRequestHandler] [es-warm2] ElasticsearchException[Illegal parameter in http or transport request found.
This means that one node is trying to connect to another with 
a non-node certificate (no OID or searchguard.nodes_dn incorrect configured) or that someone 
is spoofing requests. Check your TLS certificate setup as described here: See http://docs.search-guard.com/latest/troubleshooting-tls]
[2019-05-09T15:50:25,616][WARN ][o.e.d.z.ZenDiscovery     ] [es-warm2] not enough master nodes discovered during pinging (found [[Candidate{node={es-warm2}{eDxGfvIsQP-LhzJvQI0O7A}{NXGBOSpWRHGy1B_N6vqr1A}{10.1.2.189}{10.1.2.189:9300}{xpack.installed=true, box_type=warm, cabinet=hypervisor9.c1}, clusterStateVersion=-1}]], but needed [3]), pinging again
[2019-05-09T15:50:25,672][WARN ][o.e.d.z.UnicastZenPing   ] [es-warm2] [5] failed send ping to {10.1.1.120:9300}{h8Gu7bRQR8GvKK2nyzYu6Q}{es-hot1.c1}{10.1.1.120:9300}
java.lang.IllegalStateException: handshake failed with {10.1.1.120:9300}{h8Gu7bRQR8GvKK2nyzYu6Q}{es-hot1.c1}{10.1.1.120:9300}
	at org.elasticsearch.transport.TransportService.handshake(TransportService.java:418) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.transport.TransportService.handshake(TransportService.java:386) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.discovery.zen.UnicastZenPing$PingingRound.getOrConnect(UnicastZenPing.java:371) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.discovery.zen.UnicastZenPing$3.doRun(UnicastZenPing.java:476) [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759) [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.6.0.jar:6.6.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
Caused by: org.elasticsearch.transport.RemoteTransportException: [es-hot1][10.1.1.120:9300][internal:transport/handshake]
Caused by: org.elasticsearch.ElasticsearchException: Illegal parameter in http or transport request found.
This means that one node is trying to connect to another with 
a non-node certificate (no OID or searchguard.nodes_dn incorrect configured) or that someone 
is spoofing requests. Check your TLS certificate setup as described here: See http://docs.search-guard.com/latest/troubleshooting-tls
	at com.floragunn.searchguard.ssl.util.ExceptionUtils.createBadHeaderException(ExceptionUtils.java:57) ~[?:?]
	at com.floragunn.searchguard.transport.SearchGuardRequestHandler.messageReceivedDecorate(SearchGuardRequestHandler.java:201) ~[?:?]
	at com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceived(SearchGuardSSLRequestHandler.java:141) ~[?:?]
	at com.floragunn.searchguard.SearchGuardPlugin$7$1.messageReceived(SearchGuardPlugin.java:645) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1288) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:140) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1246) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1110) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:913) ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1436) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203) ~[?:?]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) ~[?:?]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
	... 1 more

ES config (elasticsearch.yml, removed unrelated configs)

node.name: es-warm2
node.master: true
node.data: true
node.attr.box_type: warm

xpack.graph.enabled: false
xpack.ml.enabled: false
xpack.monitoring.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false
xpack.logstash.enabled: false

bootstrap.memory_lock: true

searchguard.nodes_dn:
  - 'CN=*.elk.byte.nl'
  - 'CN=elk.byte.nl'
searchguard.ssl.transport.enable_openssl_if_available: true
searchguard.ssl.transport.pemkey_filepath: ssl-transport.key
searchguard.ssl.transport.pemcert_filepath: ssl-transport.cert
searchguard.ssl.transport.pemtrustedcas_filepath: ssl-transport.ca
searchguard.ssl.transport.enforce_hostname_verification: false

searchguard.ssl.http.enabled: true
searchguard.ssl.http.keystore_filepath: node-keystore.jks
searchguard.ssl.http.truststore_filepath: truststore.jks
searchguard.ssl.http.keystore_password: ***************
searchguard.ssl.http.truststore_password: ****************

searchguard.enterprise_modules_enabled: true
searchguard.authcz.admin_dn:
  - ...

cert info (from sgtlsdiag, stripped hashes, validation time, etc.)

========================================================================
/etc/elasticsearch/ssl-transport.cert
------------------------------------------------------------------------
Certificate 1
------------------------------------------------------------------------
Subject DN [RFC2253]: CN=*.elk.byte.nl,OU=PositiveSSL Wildcard,OU=Domain Control Validated
 Issuer DN [RFC2253]: CN=Sectigo RSA Domain Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater Manchester,C=GB
           Key Usage: digitalSignature keyEncipherment
 Signature Algorithm: SHA256WITHRSA
             Version: 3
  Extended Key Usage: id_kp_serverAuth id_kp_clientAuth
  Basic Constraints: -1
                SAN: 
                  dNSName: *.elk.byte.nl
                  dNSName: elk.byte.nl

------------------------------------------------------------------------
Trust anchor:
C=GB,ST=Greater Manchester,L=Salford,O=Sectigo Limited,CN=Sectigo RSA Domain Validation Secure Server CA

========================================================================
/etc/elasticsearch/ssl-transport.ca
------------------------------------------------------------------------
Certificate 1
------------------------------------------------------------------------
Subject DN [RFC2253]: CN=Sectigo RSA Domain Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater Manchester,C=GB
 Issuer DN [RFC2253]: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
           Key Usage: digitalSignature keyCertSign cRLSign
 Signature Algorithm: SHA384WITHRSA
             Version: 3
  Extended Key Usage: id_kp_serverAuth id_kp_clientAuth
  Basic Constraints: 0
                SAN: (none)
------------------------------------------------------------------------
Certificate 2
------------------------------------------------------------------------
Subject DN [RFC2253]: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
 Issuer DN [RFC2253]: CN=USERTrust RSA Certification Authority,O=The USERTRUST Network,L=Jersey City,ST=New Jersey,C=US
           Key Usage: keyCertSign cRLSign
 Signature Algorithm: SHA384WITHRSA
             Version: 3
  Extended Key Usage: null
  Basic Constraints: 2147483647
                SAN: (none)
------------------------------------------------------------------------

System info:

  • Debian GNU/Linux 8.10 (jessie)
  • Java:
    • OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-1~bpo8+1-b11)
    • OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)
  • ES version: 6.6.0
  • SG Version: 6.6.0-24.1
#2

It seems to me that the nodes_dn are not configured correctly:

Caused by: org.elasticsearch.ElasticsearchException: Illegal parameter in http or transport request found. This 
means that one node is trying to connect to another with a non-node certificate (no OID or 
searchguard.nodes_dn incorrect configured) or that someone is spoofing requests. Check your TLS certificate 
setup as described here: See http://docs.search-guard.com/latest/troubleshooting-tls

The Distinguished Name of your “ssl-transport.cert” node certificate is according to tlsdiag:

CN=*.elk.byte.nl,OU=PositiveSSL Wildcard,OU=Domain Control Validated

But in the nodes_dn setting you only specified the CN part of the DN:

searchguard.nodes_dn:
  - 'CN=*.elk.byte.nl'

Please try to give the full DN like:

searchguard.nodes_dn:
  - 'CN=*.elk.byte.nl,OU=PositiveSSL Wildcard,OU=Domain Control Validated'
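
A minimal sketch of why the CN-only entry fails, assuming each nodes_dn entry is compared against the whole subject DN string with `*` acting as a wildcard. Python's `fnmatchcase` is used as a stand-in here; it is not Search Guard's actual matcher, and `matches_nodes_dn` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Subject DN as reported by sgtlsdiag for ssl-transport.cert.
SUBJECT_DN = "CN=*.elk.byte.nl,OU=PositiveSSL Wildcard,OU=Domain Control Validated"


def matches_nodes_dn(subject_dn, patterns):
    """Return True if any pattern matches the entire subject DN.

    Assumption: '*' is a wildcard and the match covers the whole string.
    fnmatchcase only approximates that behaviour.
    """
    return any(fnmatchcase(subject_dn, pattern) for pattern in patterns)


# The entries from the original elasticsearch.yml.
cn_only = ["CN=*.elk.byte.nl", "CN=elk.byte.nl"]
# The entry with every component of the subject DN.
full_dn = ["CN=*.elk.byte.nl,OU=PositiveSSL Wildcard,OU=Domain Control Validated"]

print("CN only:", matches_nodes_dn(SUBJECT_DN, cn_only))
print("Full DN:", matches_nodes_dn(SUBJECT_DN, full_dn))
```

Under that assumption the CN-only list does not match, because the two OU components are left over, while the full DN does.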
assigned jkressin #3
#4

Thanks, using the full DN instead of just the CN in “nodes_dn” fixed it.

I’m curious why we had to specify this option. According to the documentation it should be optional in our case, since the sgtlsdiag output shows that the certificate has a SAN attribute.

#5

Just a guess, but maybe you are confusing the DN (“Distinguished Name”) of the certificate with the dNSName (“Domain Name System” name) entries in the SAN section?

A TLS certificate is identified by its Distinguished Name (which can be found in the Subject DN section of the tlsdiag output) and can have multiple different SAN entries.
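
To make the distinction concrete, here is a small sketch that pulls the two fields apart from the sgtlsdiag excerpt posted earlier in this thread. The helper names are hypothetical, and the parsing assumes the line layout shown in that excerpt.

```python
import re

# Excerpt of the sgtlsdiag output from the first post.
TLSDIAG_EXCERPT = """\
Subject DN [RFC2253]: CN=*.elk.byte.nl,OU=PositiveSSL Wildcard,OU=Domain Control Validated
                SAN:
                  dNSName: *.elk.byte.nl
                  dNSName: elk.byte.nl
"""


def subject_dn(text):
    """Return the value of the first 'Subject DN' line, or None if absent."""
    match = re.search(r"^\s*Subject DN \[RFC2253\]:\s*(.+)$", text, re.MULTILINE)
    return match.group(1).strip() if match else None


def san_dns_names(text):
    """Return all dNSName values listed in the SAN section."""
    return re.findall(r"^\s*dNSName:\s*(\S+)\s*$", text, re.MULTILINE)


# The DN identifies the certificate; this is what nodes_dn refers to.
print("Subject DN:", subject_dn(TLSDIAG_EXCERPT))
# The SAN dNSName entries are hostnames; they are not the DN.
print("SAN dNSName entries:", san_dns_names(TLSDIAG_EXCERPT))
```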

Do you have any input on how to make that more clear in the docs or our tools, so other users don’t run into this problem as well? Thx for any suggestions!

#6

That sounds right. Current docs say:

If your node certificates have an OID identifier in the SAN section, you can omit this configuration completely. See TLS for production environments for more details regarding this option.

And the TLS for production environments docs say:

The OID is defined in the Subject Alternative Name (SAN) section of the certificate, and must have the default value 1.2.3.4.5.5 for node certificates. If this value is found, the certificate is considered to be a valid node certificate.

I thought that a certificate containing a SAN attribute (even though it lists dNSName values rather than the DN) met the requirements for omitting this setting.

To improve the documentation, maybe it could be helpful to include an example of a SAN attribute that contains enough information to render this config optional. If this was there, I would have noticed the difference.
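
As a rough illustration of that difference, the sketch below models SAN entries as (type, value) pairs and checks for the OID quoted above. It assumes the OID is carried as a registeredID entry in the SAN; the type labels, the hostname `node1.example.com` and both helper functions are hypothetical and only serve the example.

```python
NODE_OID = "1.2.3.4.5.5"  # default node OID quoted from the docs

# SAN of the certificate in this thread: dNSName entries only.
san_in_this_thread = [
    ("dNSName", "*.elk.byte.nl"),
    ("dNSName", "elk.byte.nl"),
]

# Hypothetical SAN that also carries the node OID.
# Assumption: the OID appears as a registeredID entry.
san_with_oid = [
    ("dNSName", "node1.example.com"),
    ("registeredID", NODE_OID),
]


def has_node_oid(san_entries, oid=NODE_OID):
    """Return True if the SAN list carries the node OID as a registeredID."""
    return any(kind == "registeredID" and value == oid for kind, value in san_entries)


def nodes_dn_required(san_entries):
    """Without the OID, the node DN has to be listed in searchguard.nodes_dn."""
    return not has_node_oid(san_entries)


print("This thread's cert needs nodes_dn:", nodes_dn_required(san_in_this_thread))
print("Cert with OID needs nodes_dn:", nodes_dn_required(san_with_oid))
```

In this model, having a SAN section is not enough on its own: only the OID entry would make the nodes_dn setting unnecessary.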

#7

Perfect, thanks for the hint. When working with this stuff every day it’s easy to overlook ambiguities like this one, so we’re always thankful if someone points them out!