Hi there
We had an unresponsive Elasticsearch cluster (5 nodes, host-[06-10], with 32GB of heap space assigned to each host). When I checked the cluster through each host, only one of them (host-08) responded, and only after a while: it reported the cluster state as yellow with only 4 nodes. The rest would either time out or return an SSL error.
I also found that most hosts had reached almost 99% heap usage (per _cat/nodes).
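For anyone who wants to repeat the heap check, here is a rough sketch of how the _cat/nodes output could be filtered. It assumes the two-column text returned by _cat/nodes?h=host,heap.percent; the sample values and the 90% threshold are made up for illustration and are not copied from our cluster.

```python
# Minimal sketch: flag nodes whose heap usage is at or above a threshold, given
# the plain-text output of GET _cat/nodes?h=host,heap.percent (two columns).
# The sample text and the 90% threshold are assumptions for illustration only.

def nodes_over_heap_threshold(cat_nodes_text, threshold=90):
    """Return (host, heap_percent) pairs at or above the threshold."""
    flagged = []
    for line in cat_nodes_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank lines or an optional header row
        host, heap = parts[0], int(parts[1])
        if heap >= threshold:
            flagged.append((host, heap))
    return flagged


# Hypothetical sample output, not the real values from the incident.
SAMPLE = """host-06 99
host-07 99
host-08 71
host-09 98
host-10 99
"""

if __name__ == "__main__":
    for host, heap in nodes_over_heap_threshold(SAMPLE):
        print(f"{host}: heap at {heap}%")
```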
Upon checking the logs we found the exceptions listed in the log messages section below.
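One thing that stands out in the [gc][old] warning in the log messages section is that the collection reclaimed nothing. A small sketch like the one below can pull those numbers out of a monitor.jvm line. The regular expression is my own and only covers the fields shown in that line (trimmed here after the memory field), so treat it as an assumption rather than a general parser.

```python
import re

# The [gc][old] warning from host-07, trimmed after the memory field.
GC_LINE = (
    "[2018-06-19 18:56:30,593][WARN ][monitor.jvm ] [host-07] [gc][old][1374850][5979] "
    "duration [2.1m], collections [1]/[2.1m], total [2.1m]/[4.6d], "
    "memory [31.7gb]->[31.7gb]/[31.7gb]"
)

# Hypothetical pattern: collector name, duration, heap before -> after / max.
GC_PATTERN = re.compile(
    r"\[gc\]\[(?P<collector>\w+)\].*?duration \[(?P<duration>[^\]]+)\].*?"
    r"memory \[(?P<before>[\d.]+)(?P<bu>gb|mb)\]->"
    r"\[(?P<after>[\d.]+)(?P<au>gb|mb)\]/\[(?P<max>[\d.]+)(?P<mu>gb|mb)\]"
)


def _to_gb(value, unit):
    """Convert a size given in gb or mb to gigabytes."""
    return float(value) if unit == "gb" else float(value) / 1024.0


def parse_gc_line(line):
    """Return heap figures for a monitor.jvm GC line, or None if it does not match."""
    m = GC_PATTERN.search(line)
    if m is None:
        return None
    before = _to_gb(m.group("before"), m.group("bu"))
    after = _to_gb(m.group("after"), m.group("au"))
    heap_max = _to_gb(m.group("max"), m.group("mu"))
    return {
        "collector": m.group("collector"),
        "duration": m.group("duration"),
        "reclaimed_gb": before - after,
        "heap_used_ratio": after / heap_max,
    }


if __name__ == "__main__":
    print(parse_gc_line(GC_LINE))
```

For the line above this gives a reclaimed amount of 0 and a used ratio of 1, i.e. a 2.1 minute old-generation collection that freed no memory.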
I checked the validity of the certificates in the JKS and they are good until Aug 20th, 2018.
Eventually I had to restart the cluster to resolve this, and it took almost 5-6 hours to recover with ~330GB of index data. The cluster is now green and healthy. However, I still see the error message below in the log:
[2018-06-26 15:49:17,539] [ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [host-10] SSL Problem null cert chain
javax.net.ssl.SSLHandshakeException: null cert chain
Any thoughts on what could have gone wrong? Please let me know your views.
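To get a feel for which nodes log which SSL problem, the log files can be summarized with something like the sketch below. The pattern is my own and only matches the "SSL Problem ..." lines in the format shown in this post; the three sample lines are the ones quoted here, and a real run would read the node log files instead.

```python
import re
from collections import Counter

# Hypothetical pattern for the Search Guard "SSL Problem" error lines in this post.
SSL_PROBLEM = re.compile(
    r"\[ERROR\]\[com\.floragunn\.searchguard\.ssl\.transport\."
    r"SearchGuardSSLNettyTransport\] \[(?P<node>[^\]]+)\] SSL Problem (?P<msg>.+)$"
)


def count_ssl_problems(lines):
    """Count SSL problem messages per (node, message) pair."""
    counts = Counter()
    for line in lines:
        m = SSL_PROBLEM.search(line.rstrip())
        if m:
            counts[(m.group("node"), m.group("msg"))] += 1
    return counts


# The three SSL error lines quoted in this post.
SAMPLE_LINES = [
    "[2018-06-26 15:49:17,539] [ERROR][com.floragunn.searchguard.ssl.transport."
    "SearchGuardSSLNettyTransport] [host-10] SSL Problem null cert chain",
    "[2018-06-20 12:08:05,960][ERROR][com.floragunn.searchguard.ssl.transport."
    "SearchGuardSSLNettyTransport] [host-08] SSL Problem Received close_notify during handshake",
    "[2018-06-20 15:34:35,383][ERROR][com.floragunn.searchguard.ssl.transport."
    "SearchGuardSSLNettyTransport] [host-08] SSL Problem null cert chain",
]

if __name__ == "__main__":
    for (node, msg), n in sorted(count_ssl_problems(SAMPLE_LINES).items()):
        print(f"{node}: {msg} x{n}")
```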
- Search Guard and Elasticsearch version
Elasticsearch version - 2.3.3
Search Guard version -
"component": "search-guard-2",
"version": "2.3.3.1",
"component": "search-guard-ssl",
"version": "2.3.3.13",
- JVM version and operating system version
java version "1.7.0_51"
OS : Oracle Linux 6.9
- Search Guard configuration files
searchguard.ssl.transport.enabled: true
searchguard.ssl.transport.keystore_type: JKS
searchguard.ssl.transport.keystore_filepath: jks1/server.jks
searchguard.ssl.transport.keystore_password: <>
searchguard.ssl.transport.truststore_type: JKS
searchguard.ssl.transport.truststore_filepath: jks1/server.jks
searchguard.ssl.transport.truststore_password: <>
searchguard.ssl.http.enabled: true
searchguard.ssl.http.keystore_type: JKS
searchguard.ssl.http.keystore_filepath: jks1/server.jks
searchguard.ssl.http.keystore_password: <>
searchguard.ssl.http.truststore_type: JKS
searchguard.ssl.http.truststore_filepath: jks1/server.jks
searchguard.ssl.http.truststore_password: <>
searchguard.ssl.http.clientauth_mode: NONE
searchguard.enabled: true
security.manager.enabled: false
searchguard.authcz.admin_dn:
- "C=US,ST=California,O=A* Inc.,OU=management:group.752639,CN=rody.com"
- "C=US,ST=California,O=A* Inc.,OU=management:group.752639,CN=pwk-rody.com"
- "C=US,ST=California,O=A* Inc.,OU=management:group.752639,CN=pwk-f5p-lnn01.com"
searchguard.authcz.impersonation_dn:
"C=US,ST=California,O=A* Inc.,OU=management:group.752639,CN=rody.com":
- '’
searchguard.authcz.impersonation_dn:
"C=US,ST=California,O=A Inc.,OU=management:group.752639,CN=pwk.rody.com":
- '’
searchguard.authcz.impersonation_dn:
"C=US,ST=California,O=A Inc.,OU=management:group.752639,CN=pwk-f5p-lnn01…com":
- Elasticsearch log messages on debug level
[2018-06-19 18:56:30,600][WARN ][netty.handler.ssl.SslHandler] Unexpected leftover data after SSLEngine.unwrap(): status=OK handshakeStatus=NEED_WRAP consumed=0 produced=0 remaining=7 data=15030300020100
//Several instances of GC cycles
[2018-06-19 18:56:30,593][WARN ][monitor.jvm ] [host-07] [gc][old][1374850][5979] duration [2.1m], collections [1]/[2.1m], total [2.1m]/[4.6d], memory [31.7gb]->[31.7gb]/[31.7gb], all_pools {[young] [1.8gb]->[1.8gb]/[1.8gb]}{[survivor] [230mb]->[230.1mb]/[232.9mb]}{[old] [29.7gb]->[29.7gb]/[29.7gb]}
[2018-06-19 18:56:30,589][DEBUG][action.search ] [host-07] [14368933] Failed to execute query phase
RemoteTransportException[[host-10][17.160.69.238:9300][indices:data/read/search[phase/query/id]]]; nested: SearchContextMissingException[No search context found for id [14368933]];
Caused by: SearchContextMissingException[No search context found for id [14368933]]
at org.elasticsearch.search.SearchService.findContext(SearchService.java:613)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:425)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:376)
at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:373)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
at com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService.messageReceivedDecorate(SearchGuardSSLTransportService.java:161)
at com.floragunn.searchguard.transport.SearchGuardTransportService.messageReceivedDecorate(SearchGuardTransportService.java:293)
at com.floragunn.searchguard.ssl.transport.SearchGuardSSLTransportService$Interceptor.messageReceived(SearchGuardSSLTransportService.java:128)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:300)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[2018-06-19 19:59:16,722][WARN ][discovery.zen.ping.unicast] [host-07] failed to send ping to [{#zen_unicast_3#}{17.160.69.236}{host-08/17.99.69.266:9300}]
ReceiveTimeoutTransportException[[host-08m/17.99.69.266:9300][internal:discovery/zen/unicast] request_id [80874756] timed out after [124435ms]]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:679)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[2018-06-19 20:17:07,562][WARN ][com.floragunn.searchguard.transport.SearchGuardTransportService] [host-07] Received response for a request that has timed out, sent [1195273ms] ago, timed out [1070838ms] ago, action [internal:discovery/zen/unicast], node [{#zen_unicast_3#}{17.160.69.236}{ma2-rodyp-lcb08.corp.apple.com/17.160.69.236:9300}], id [80874756]
[2018-06-20 03:07:58,459][WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [host-07] exception caught on transport layer [[id: 0x0f3baf15, /17.96.16.84:48738 => /17.160.69.235:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
at sun.security.ssl.EngineInputRecord.compareMacTags(EngineInputRecord.java:330)
at sun.security.ssl.EngineInputRecord.checkMacTags(EngineInputRecord.java:313)
at sun.security.ssl.EngineInputRecord.decrypt(EngineInputRecord.java:244)
[2018-06-20 12:08:05,960][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [host-08] SSL Problem Received close_notify during handshake
javax.net.ssl.SSLException: Received close_notify during handshake
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1619)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1587)
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1732)
at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1060)
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:884)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:758)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1218)
at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[2018-06-20 15:34:35,383][ERROR][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [host-08] SSL Problem null cert chain
javax.net.ssl.SSLHandshakeException: null cert chain
at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1290)
at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:513)
at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:790)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:758)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1218)
at org.jboss.netty.handler.ssl.SslHandler.decode(SslHandler.java:852)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: javax.net.ssl.SSLHandshakeException: null cert chain
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1619)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:278)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:266)
at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:1631)
at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:176)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:868)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:808)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:806)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1227)
at org.jboss.netty.handler.ssl.SslHandler.runDelegatedTasks(SslHandler.java:1392)
at org.jboss.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1255)
… 18 more
[2018-06-21 21:29:00,075][WARN ][gateway ] [host-08] [searchguard][0]: failed to list shard for shard_store on node [kxb4QDvVR8eNNwOoPT-uVQ]
FailedNodeException[total failure in fetching]; nested: ElasticsearchException[unauthenticated request internal:cluster/nodes/indices/shard/store for user User [name=_sg_internal, roles= ]];
at org.elasticsearch.gateway.AsyncShardFetch$1.onFailure(AsyncShardFetch.java:277)
at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:95)
at com.floragunn.searchguard.filter.SearchGuardFilter.apply(SearchGuardFilter.java:135)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:144)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:85)
at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.list(TransportNodesListShardStoreMetaData.java:88)
at org.elasticsearch.gateway.AsyncShardFetch.asyncFetch(AsyncShardFetch.java:267)
at org.elasticsearch.gateway.AsyncShardFetch.fetchData(AsyncShardFetch.java:117)
at org.elasticsearch.gateway.GatewayAllocator$InternalReplicaShardAllocator.fetchData(GatewayAllocator.java:183)
at org.elasticsearch.gateway.ReplicaShardAllocator.allocateUnassigned(ReplicaShardAllocator.java:137)
at org.elasticsearch.gateway.GatewayAllocator.allocateUnassigned(GatewayAllocator.java:123)
at org.elasticsearch.cluster.routing.allocation.allocator.ShardsAllocators.allocateUnassigned(ShardsAllocators.java:70)
at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:258)
at org.elasticsearch.cluster.routing.allocation.AllocationService.applyStartedShards(AllocationService.java:86)
at org.elasticsearch.cluster.action.shard.ShardStateAction$ShardStartedClusterStateHandler.execute(ShardStateAction.java:218)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:468)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: ElasticsearchException[unauthenticated request internal:cluster/nodes/indices/shard/store for user User [name=_sg_internal, roles= ]]
… 21 more
- Other installed Elasticsearch or Kibana plugins, if any
delete-by-query - 2.3.3
head