Deadlock with SG FLX on ES 9.3.x during configuration update

Elasticsearch version: 9.3

Describe the issue:
Running sgctl update-config against a two-node ES 9.3.3 + SG FLX 4.1.10 cluster hangs
indefinitely. If the MANAGEMENT thread pool has only one thread, the ES management thread on the master node blocks inside
ConfigUpdateAction.nodeOperation() and never completes.
The problem does not happen on ES 9.2 nor on ES 8.19. The root cause is an ES 9.3 behaviour change: auto_put_mapping is now dispatched to the
MANAGEMENT thread pool. Because nodeOperation() holds the only MANAGEMENT thread while it waits for the mapping update, the queued
auto_put_mapping task never gets a thread to run on. In earlier versions auto_put_mapping ran on a different pool, so the deadlock did
not occur.
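A minimal sketch of this kind of thread-pool starvation deadlock, using Python's `concurrent.futures` as a stand-in for the ES MANAGEMENT pool. It assumes, as described above, that the node operation blocks until a task submitted to the same pool finishes; `node_operation` and `put_mapping_task` are hypothetical names, not ES or Search Guard code. A timeout is used so the sketch reports the deadlock instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def put_mapping_task():
    # Hypothetical stand-in for the auto_put_mapping work queued on the pool.
    return "mapping updated"


def node_operation(pool, wait_seconds):
    # Hypothetical stand-in for ConfigUpdateAction.nodeOperation(): it runs on
    # the pool and blocks until a second task on the SAME pool finishes.
    inner = pool.submit(put_mapping_task)
    try:
        return inner.result(timeout=wait_seconds)
    except FutureTimeout:
        # With a single worker, this task occupies it, so the inner task never starts.
        return "deadlock: inner task never started"


def run(pool_size, wait_seconds=1.0):
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        outer = pool.submit(node_operation, pool, wait_seconds)
        # After a timeout the worker is freed, so the queued inner task runs
        # and the executor can shut down instead of hanging forever.
        return outer.result()


if __name__ == "__main__":
    print(run(pool_size=1))  # single-thread pool: the pool starves itself
    print(run(pool_size=2))  # two threads: the inner task gets its own thread
```

With one worker, the outer task holds the only thread, so the inner task stays queued until the timeout fires; with two workers it completes immediately. This is likely why changing node.processors away from 1 made the hang disappear.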

Steps to reproduce:

  1. See attached file & README

Expected behavior:

sgctl update-config completes without hanging.

Provide configuration:
In the attached file.

searchguard-es-9_3-deadlock.zip (6.7 KB)

@issac.garcia Which versions of the SG plugin did you test with 9.2 and 8.19? Also, what were the exact ELK versions you tested?

Hi @pablo, for 9.2 I tested with 4.0.1-es-9.2.6, and for 8.19 with 4.0.1-es-8.19.13.
Both of those worked.
With 4.1.0-es-9.3.3 the deadlock occurs.
We discovered the deadlock in an integration test environment; the attached file shows how to reproduce it.

Thank you @issac.garcia for sharing the info. Could you tell me why you need to set node.processors to 1? Is it for testing, or do you have limited resources?

I don’t need to set it to 1; our testing cluster was using 1 by default when we discovered the deadlock. It took us a while to understand the issue because it didn’t happen on previous ES versions. We changed it and now we don’t have any problems. I just wanted to share what we found to help improve your software.

@issac.garcia Thank you for sharing your findings. We will investigate it further.