Exception during establishing a SSL connection: java.net.SocketException: Connection reset

Hi guys,

From time to time I get the following error on the Logstash side, which sends its logs to the load balancer URL (an Azure load balancer):

[2020-07-21T09:45:48,982][WARN ][logstash.outputs.elasticsearch][some_pipeline] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [https://logstash:xxxxxx@elastic.cluster.com:9200/][Manticore::SocketTimeout] Read timed out {:url=>https://logstash:xxxxxx@elastic.cluster.com:9200/, :error_message=>"Elasticsearch Unreachable: [https://logstash:xxxxxx@elastic.cluster.com:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2020-07-21T09:45:48,991][ERROR][logstash.outputs.elasticsearch][some_pipeline] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [https://logstash:xxxxxx@elastic.cluster.com:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2020-07-21T09:45:50,842][ERROR][logstash.outputs.elasticsearch][some_pipeline] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>2}
[2020-07-21T09:45:50,996][ERROR][logstash.outputs.elasticsearch][some_pipeline] Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>4}
[2020-07-21T09:45:51,363][WARN ][logstash.outputs.elasticsearch][some_pipeline] Restored connection to ES instance {:url=>"https://logstash:xxxxxx@elastic.cluster.com:9200/"}

On the Elasticsearch node side I see the following:

[2020-07-21T07:45:36,900][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [elastic01.node.com] Exception during establishing a SSL connection: java.net.SocketException: Connection reset
java.net.SocketException: Connection reset
	at sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376) ~[?:?]
	at org.elasticsearch.transport.CopyBytesSocketChannel.readFromSocketChannel(CopyBytesSocketChannel.java:141) ~[transport-netty4-client-7.6.2.jar:7.6.2]
	at org.elasticsearch.transport.CopyBytesSocketChannel.doReadBytes(CopyBytesSocketChannel.java:126) ~[transport-netty4-client-7.6.2.jar:7.6.2]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:600) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:554) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) [netty-transport-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050) [netty-common-4.1.43.Final.jar:4.1.43.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.43.Final.jar:4.1.43.Final]
	at java.lang.Thread.run(Thread.java:830) [?:?]

The pipeline configuration is:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/logstash.crt"
    ssl_key => "/logstash.pkcs8.key"
    ssl_key_passphrase => "${LOGSTASH_KEY_PASS}"
  }
}

output {
  elasticsearch {
    hosts => ["https://elastic.cluster.com:9200"]
    index => "some-index-name%{+YYYY.MM.dd}"
    ssl => true
    ssl_certificate_verification => true
    cacert => "/certs/ca.crt"
    user => "logstash"
    password => "${LOGSTASH_PASS}"
  }
}

Do you have any idea why elasticsearch closes the connection from time to time?

Make sure you have all the system configurations set to the recommended values:

Monitor the server RAM. Maybe you reach the limit from time to time. If so, increase the Java heap size: https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html

Thanks for your reply.

I double-checked the limits, and everything seems fine according to the documentation on all nodes. The heap is set to:

-Xms16g
-Xmx16g

There is very little load on the machines so far, so this behaviour is very strange.

"from time to time"

How often does it happen? Twice per day? Ten times per day? Do you lose any data?

Maybe the load is low most of the time but spikes for short periods. Try to monitor the Elasticsearch cluster health, plus the Elasticsearch and Logstash system load, over several days. Then look at the data and logs; you may be able to spot a pattern. It will be easier if you have a tool to visualize the system load data over time. You can use visualisations in Kibana for this.

Have you looked at the load balancer logs? Maybe there is a network outage from time to time.

I have no load balancer logs and cannot configure the load balancer. What I am going to try now is shipping the logs to the nodes directly, without the load balancer, to see whether the error still occurs.

After I removed the Azure load balancer from the Logstash output configuration and connected directly to the Elasticsearch nodes instead, I no longer have any connection issues between Logstash and Elasticsearch. I have no idea yet why the load balancer is causing trouble; we are still investigating it.
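For anyone who wants to try the same workaround, here is a minimal sketch of what the direct-connection output might look like. The node names are assumptions: elastic01.node.com appears in the Elasticsearch log above, while elastic02 and elastic03 are hypothetical placeholders. When hosts is given an array, the elasticsearch output spreads requests across the listed nodes. With ssl_certificate_verification enabled, each node's certificate presumably has to be valid for the host name used here.

```
output {
  elasticsearch {
    # Node names are illustrative; list the real HTTP endpoints of your nodes.
    hosts => ["https://elastic01.node.com:9200", "https://elastic02.node.com:9200", "https://elastic03.node.com:9200"]
    index => "some-index-name%{+YYYY.MM.dd}"
    ssl => true
    ssl_certificate_verification => true
    cacert => "/certs/ca.crt"
    user => "logstash"
    password => "${LOGSTASH_PASS}"
  }
}
```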

This SocketException occurs on the server side when the client closes the socket connection before the response can be returned over the socket, for example by quitting the browser before the response was retrieved. "Connection reset" simply means that a TCP RST was received. A TCP RST packet is the remote side telling you that it does not recognize the connection on which the previous TCP packet was sent: maybe the connection has been closed, maybe the port is not open, or something similar. A reset packet is simply one with no payload and with the RST bit set in the TCP header flags. There are several possible causes:

  • The other end has deliberately reset the connection, in a way which I will not document here. It is rare, and generally incorrect, for application software to do this, but it is not unknown for commercial software.

  • More commonly, it is caused by writing to a connection that the other end has already closed normally. In other words, it is an application protocol error.

  • It can also be caused by closing a socket when there is unread data in the socket receive buffer.
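The last cause in the list above is easy to reproduce locally. The following is a rough sketch in Python rather than Java, assuming a Linux-like TCP stack and a working loopback interface; the function name demo_reset_on_unread_data is made up for this illustration. One side closes its socket while unread bytes are still in its receive buffer, and the other side then sees a reset instead of a normal end-of-stream. In the logs above the roles are reversed (the Elasticsearch node is the one receiving the RST), but the mechanism is the same.

```python
import select
import socket
import threading


def demo_reset_on_unread_data() -> str:
    """Return 'reset' if the client sees a connection reset, else what it saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server_closed = threading.Event()

    def server() -> None:
        conn, _ = listener.accept()
        # Wait until the client's bytes are sitting in our receive buffer...
        select.select([conn], [], [], 5.0)
        # ...then close WITHOUT reading them. The kernel then sends an RST
        # to the peer instead of performing a normal FIN shutdown.
        conn.close()
        server_closed.set()

    worker = threading.Thread(target=server, daemon=True)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5.0)
    try:
        client.sendall(b"request the server never reads")
        server_closed.wait(5.0)
        try:
            data = client.recv(1024)
        except ConnectionResetError:
            # Python's counterpart of Java's "SocketException: Connection reset"
            return "reset"
        return "closed normally" if data == b"" else "got data"
    finally:
        client.close()
        worker.join(5.0)
        listener.close()


if __name__ == "__main__":
    print(demo_reset_on_unread_data())
```

On Linux this is expected to print "reset". If the server read the pending bytes before closing, the client would instead see an empty read, which is the normal close.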
