external_elasticsearch audit logging throwing errors on bulk write rejects from the target cluster

ELK: 6.4.2
SG: 23.1

(Have a busy cluster)

Search Guard is configured for external_elasticsearch audit logging, and the target cluster is at times so busy being hammered by Logstash that it throws bulk write rejects (Logstash just keeps retrying). This causes an exception in Search Guard. Is there any auto-retry option in Search Guard audit logging to help avoid this? I have tried increasing the searchguard.audit.threadpool settings, but there is no noticeable improvement. I want to avoid raising the bulk write thread pool sizes on the remote cluster nodes if possible (but will if needed).

searchguard.audit.config.enable_ssl: false

searchguard.audit.config.http_endpoints: xx.xx.xx.xx:9200,xx.xx.xx.xx:9200,xx.xx.xx.xx:9200

searchguard.audit.config.index: rollover-custom-searchguard

searchguard.audit.config.verify_hostnames: false

searchguard.audit.type: external_elasticsearch
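
For reference, the audit thread pool settings I was referring to above are (if I have the names right) the ones under searchguard.audit.threadpool, e.g. something like:

searchguard.audit.threadpool.size: 20
searchguard.audit.threadpool.max_queue_len: 100000

As far as I can tell these only size the local async storage thread pool and its queue; they do not retry rejected writes, which matches what I am seeing.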

[2018-11-26T13:46:36,818][ERROR][c.f.s.h.HttpClient ] ElasticsearchStatusException[Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]]]

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]]

at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-6.4.2.jar:6.4.2]

at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1406) ~[?:?]

at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1382) ~[?:?]

at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1269) ~[?:?]

at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1231) ~[?:?]

at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:587) ~[?:?]

at com.floragunn.searchguard.httpclient.HttpClient.index(HttpClient.java:193) ~[?:?]

at com.floragunn.searchguard.auditlog.sink.ExternalESSink.doStore(ExternalESSink.java:175) ~[?:?]

at com.floragunn.searchguard.auditlog.sink.AuditLogSink.store(AuditLogSink.java:56) ~[?:?]

at com.floragunn.searchguard.auditlog.routing.AsyncStoragePool.lambda$submit$0(AsyncStoragePool.java:60) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://xx.xx.xx.xx:9200], URI [/rollover-custom-searchguard/auditlog?refresh=true&timeout=1m], status line [HTTP/1.1 429 Too Many Requests]

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[xxxxxxxxxxxxxxx-es-01][xx.xx.xx.xx:9300][indices:data/write/bulk[s]]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]"},"status":429}

at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:920) ~[?:?]

at org.elasticsearch.client.RestClient.performRequest(RestClient.java:227) ~[?:?]

at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1256) ~[?:?]

at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1231) ~[?:?]

at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:587) ~[?:?]

at com.floragunn.searchguard.httpclient.HttpClient.index(HttpClient.java:193) ~[?:?]

at com.floragunn.searchguard.auditlog.sink.ExternalESSink.doStore(ExternalESSink.java:175) ~[?:?]

at com.floragunn.searchguard.auditlog.sink.AuditLogSink.store(AuditLogSink.java:56) ~[?:?]

at com.floragunn.searchguard.auditlog.routing.AsyncStoragePool.lambda$submit$0(AsyncStoragePool.java:60) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_191]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

Caused by: org.elasticsearch.client.ResponseException: method [POST], host [http://xx.xx.xx.xx:9200], URI [/rollover-custom-searchguard/auditlog?refresh=true&timeout=1m], status line [HTTP/1.1 429 Too Many Requests]

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[xxxxxxxxxxxxxxx-es-01][xx.xx.xx.xx:9300][indices:data/write/bulk[s]]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of org.elasticsearch.transport.TransportService$7@7a8b7873 on EsThreadPoolExecutor[name = xxxxxxxxxxxxxxx-es-01/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7c6c4359[Running, pool size = 16, active threads = 16, queued tasks = 201, completed tasks = 775680898]]"},"status":429}

at org.elasticsearch.client.RestClient$1.completed(RestClient.java:540) ~[?:?]

at org.elasticsearch.client.RestClient$1.completed(RestClient.java:529) ~[?:?]

at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119) ~[?:?]

at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) ~[?:?]

at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) ~[?:?]

at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) ~[?:?]

at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[?:?]

at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[?:?]

at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[?:?]

at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[?:?]

at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[?:?]

at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[?:?]

at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[?:?]

at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[?:?]

at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[?:?]

at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ~[?:?]

… 1 more

[2018-11-26T13:46:36,827][ERROR][c.f.s.a.s.ExternalESSink ] Unable to send audit log {“audit_node_id”:“xxxxxxxxxxxxxxxxxxxxx”,“audit_request_layer”:“REST”,“audit_request_exception_stacktrace”:“io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f3f66696c7465725f706174683d76657273696f6e2e6e756d62657220485454502f312e310d0a436f6e74656e742d4c656e6774683a20300d0a486f73743a2031302e33342e33312e3234303a393230300d0a436f6e6e656374696f6e3a204b6565702d416c6976650d0a557365722d4167656e743a204170616368652d487474704173796e63436c69656e742f342e312e3220284a6176612f312e382e305f313931290d0a417574686f72697a6174696f6e3a20426173696320654842685932747462323570644739794f6a68366558683256336c79643356685a44526f613274695a575a7a596c56515657347a57454e465a30347a0d0a0d0a\n\tat io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1106)\n\tat io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1162)\n\tat io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)\n\tat io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)\n\tat io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)\n\tat java.lang.Thread.run(Thread.java:748)\n”,“audit_cluster_name”:“xxxxxxxxxxxxxxxxxxxxx”,“audit_format_version”:3,“audit_utc_timestamp”:“2018-11-26T19:46:36.791+00:00”,“audit_node_host_address”:“xx.xx.xx.xx”,“audit_node_name”:“xxxxxxxxxxxxxxxxxxxxx-01”,“audit_category”:“SSL_EXCEPTION”,“audit_request_origin”:“REST”,“audit_node_host_name”:“xx.xx.xx.xx”} to one of these servers: [xx.xx.xx.xx:9200, xx.xx.xx.xx:9200, xx.xx.xx.xx:9200]

At the moment there is no such option; however, I have added this to our backlog, since it seems like a reasonable feature.

But I wonder whether this would not simply shift the problem from the remote cluster to the one that generates the audit events. If there is a retry, we need to keep the events in memory until the call succeeds or the retry limit is exceeded. If both of your clusters are busy, the events would pile up on the originating cluster. If the thread pool is exhausted, we have no option other than to log the events locally. What do you think?
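
To make the trade-off concrete, here is a rough sketch of what such a retry could look like (purely illustrative, not actual Search Guard code; the class and method names are made up): a bounded in-memory buffer, a capped number of retries with back-off, and a fallback to the local log whenever the buffer is full or the retries are exhausted.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only -- not Search Guard code; all names are hypothetical.
public class RetryingAuditSink {

    private final BlockingQueue<String> pending;   // bounded in-memory buffer
    private final int maxRetries;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public RetryingAuditSink(int queueCapacity, int maxRetries) {
        this.pending = new ArrayBlockingQueue<>(queueCapacity);
        this.maxRetries = maxRetries;
    }

    // Called for every audit event; never blocks the caller.
    public void store(String event) {
        if (!pending.offer(event)) {               // buffer full: do not pile up in memory
            storeLocally(event);                   // fall back to the local log right away
            return;
        }
        scheduler.execute(() -> attempt(event, 0)); // hand off to the background thread
    }

    private void attempt(String event, int attemptsSoFar) {
        try {
            storeRemotely(event);                  // e.g. an HTTP POST to the external cluster
            pending.remove(event);                 // delivered, free the buffer slot
        } catch (RuntimeException rejected) {      // e.g. HTTP 429 / es_rejected_execution
            if (attemptsSoFar + 1 >= maxRetries) {
                pending.remove(event);
                storeLocally(event);               // retry limit exceeded, keep the event locally
            } else {
                // exponential back-off so a busy remote cluster gets a chance to drain its queue
                scheduler.schedule(() -> attempt(event, attemptsSoFar + 1),
                        1L << attemptsSoFar, TimeUnit.SECONDS);
            }
        }
    }

    private void storeRemotely(String event) { /* send to the external cluster */ }

    private void storeLocally(String event) {
        System.err.println("AUDIT FALLBACK: " + event);
    }
}

The key point is that the buffer is bounded: as soon as it is full, or the retry limit is hit, events go straight to the local log instead of accumulating in memory on the originating cluster.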


I think that is perfectly acceptable.

If the problem is that the remote cluster occasionally has bulk write rejects due to momentary spikes of load, then a small local queue to help buffer the audit events would be immensely valuable. The above example shows up as an "SSL_EXCEPTION" in the audit log in Elasticsearch; oddly enough, the original remote audit log write was rejected, but Search Guard's log of that reject did end up being written to the remote cluster.

To your point, if the remote cluster is undersized and has constant bulk write rejects, then it is not your (Search Guard's) problem to fix.
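
For what it's worth, whether the rejects on the remote cluster are constant or just spikes can be checked with the standard _cat thread pool API, e.g.:

GET /_cat/thread_pool/write?v&h=node_name,active,queue,rejected

The rejected counter is cumulative per node since startup, so watching it over time shows whether it climbs steadily or only during bursts.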

OK, makes sense to me. I've added it to the backlog. Since the change required to support this is not massive, we might even release it with the next regular Search Guard version. Thanks for your input!
