ELK: 6.4.2
SG: 23.1
(Have a busy cluster)
Search Guard is configured for external_elasticsearch audit logging. The target cluster is at times busy being hammered by Logstash and rejects bulk writes with HTTP 429 (Logstash just keeps retrying). On the Search Guard side this throws the exception below, and the audit event is not stored. Is there any auto-retry option in Search Guard audit logging to help avoid this? I have tried increasing the searchguard.audit.threadpool settings, but there is no noticeable improvement. I want to avoid raising the bulk/write thread pool sizes on the remote cluster nodes if possible (but will if needed).
searchguard.audit.config.enable_ssl: false
searchguard.audit.config.http_endpoints: xx.xx.xx.xx:9200,xx.xx.xx.xx:9200,xx.xx.xx.xx:9200
searchguard.audit.config.index: rollover-custom-searchguard
searchguard.audit.config.verify_hostnames: false
searchguard.audit.type: external_elasticsearch
[2018-11-26T13:46:36,818][ERROR][c.f.s.h.HttpClient ] ElasticsearchStatusException[Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]]]
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1406) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1382) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1269) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1231) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:587) ~[?:?]
at com.floragunn.searchguard.httpclient.HttpClient.index(HttpClient.java:193) ~[?:?]
at com.floragunn.searchguard.auditlog.sink.ExternalESSink.doStore(ExternalESSink.java:175) ~[?:?]
at com.floragunn.searchguard.auditlog.sink.AuditLogSink.store(AuditLogSink.java:56) ~[?:?]
at com.floragunn.searchguard.auditlog.routing.AsyncStoragePool.lambda$submit$0(AsyncStoragePool.java:60) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://xx.xx.xx.xx:9200], URI [/rollover-custom-searchguard/auditlog?refresh=true&timeout=1m], status line [HTTP/1.1 429 Too Many Requests]
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[xxxxxxxxxxxxxxx-es-01][xx.xx.xx.xx:9300][indices:data/write/bulk[s]]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]"},"status":429}
at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:920) ~[?:?]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:227) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1256) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1231) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:587) ~[?:?]
at com.floragunn.searchguard.httpclient.HttpClient.index(HttpClient.java:193) ~[?:?]
at com.floragunn.searchguard.auditlog.sink.ExternalESSink.doStore(ExternalESSink.java:175) ~[?:?]
at com.floragunn.searchguard.auditlog.sink.AuditLogSink.store(AuditLogSink.java:56) ~[?:?]
at com.floragunn.searchguard.auditlog.routing.AsyncStoragePool.lambda$submit$0(AsyncStoragePool.java:60) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: org.elasticsearch.client.ResponseException: method [POST], host [http://xx.xx.xx.xx:9200], URI [/rollover-custom-searchguard/auditlog?refresh=true&timeout=1m], status line [HTTP/1.1 429 Too Many Requests]
{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[xxxxxxxxxxxxxxx-es-01][xx.xx.xx.xx:9300][indices:data/write/bulk[s]]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]"},"status":429}
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:540) ~[?:?]
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:529) ~[?:?]
at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119) ~[?:?]
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) ~[?:?]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) ~[?:?]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) ~[?:?]
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[?:?]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[?:?]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[?:?]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[?:?]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[?:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[?:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[?:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[?:?]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[?:?]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ~[?:?]
… 1 more
[2018-11-26T13:46:36,827][ERROR][c.f.s.a.s.ExternalESSink ] Unable to send audit log {"audit_node_id":"xxxxxxxxxxxxxxxxxxxxx","audit_request_layer":"REST","audit_request_exception_stacktrace":"io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f3f66696c7465725f706174683d76657273696f6e2e6e756d62657220485454502f312e310d0a436f6e74656e742d4c656e6774683a20300d0a486f73743a2031302e33342e33312e3234303a393230300d0a436f6e6e656374696f6e3a204b6565702d416c6976650d0a557365722d4167656e743a204170616368652d487474704173796e63436c69656e742f342e312e3220284a6176612f312e382e305f313931290d0a417574686f72697a6174696f6e3a20426173696320654842685932747462323570644739794f6a68366558683256336c79643356685a44526f613274695a575a7a596c56515657347a57454e465a30347a0d0a0d0a\n\tat io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1106)\n\tat io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162)\n\tat io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)\n\tat io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)\n\tat io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)\n\tat java.lang.Thread.run(Thread.java:748)\n","audit_cluster_name":"xxxxxxxxxxxxxxxxxxxxx","audit_format_version":3,"audit_utc_timestamp":"2018-11-26T19:46:36.791+00:00","audit_node_host_address":"xx.xx.xx.xx","audit_node_name":"xxxxxxxxxxxxxxxxxxxxx-01","audit_category":"SSL_EXCEPTION","audit_request_origin":"REST","audit_node_host_name":"xx.xx.xx.xx"} to one of these servers: [xx.xx.xx.xx:9200, xx.xx.xx.xx:9200, xx.xx.xx.xx:9200]
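To make the question concrete, here is a minimal sketch of the kind of auto-retry I mean: re-sending a rejected audit write with exponential backoff while the remote cluster keeps answering 429. It is plain Java with a hypothetical `IndexCall` interface and `RetryOn429` class. It is not a Search Guard setting or API, and the attempt count and initial delay are arbitrary assumptions.

```java
/**
 * Hypothetical helper (NOT part of Search Guard): retries a single index call
 * while the remote cluster answers HTTP 429 (es_rejected_execution_exception),
 * sleeping with exponential backoff between attempts.
 */
public final class RetryOn429 {

    /** Hypothetical: performs one index request and returns its HTTP status code. */
    @FunctionalInterface
    public interface IndexCall {
        int send() throws Exception;
    }

    private static final int TOO_MANY_REQUESTS = 429;

    /**
     * Calls {@code call} up to {@code maxAttempts} times. Any status other than 429
     * is returned immediately; if every attempt is rejected, 429 is returned.
     */
    public static int sendWithBackoff(IndexCall call, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        int status = TOO_MANY_REQUESTS;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = call.send();
            if (status != TOO_MANY_REQUESTS) {
                return status; // accepted, or an error that retrying will not fix
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMs); // let the remote write queue drain a little
                delayMs *= 2;          // exponential backoff: d, 2d, 4d, ...
            }
        }
        return status; // still rejected after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        // Simulated busy cluster: rejects the first two writes, then accepts one.
        int[] calls = {0};
        int status = sendWithBackoff(() -> ++calls[0] <= 2 ? 429 : 201, 5, 10);
        System.out.println("status=" + status + " attempts=" + calls[0]); // status=201 attempts=3
    }
}
```

Backing off instead of retrying immediately gives the remote write queue (capacity 200 in the log above) time to drain. The trade-off is that the calling thread is blocked while it sleeps, so in an async pool like the AsyncStoragePool in the trace, pending audit events would queue up behind it.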