Cluster status is red due to unassigned shards .signals_* and .searchguard_*

Hi Team,

I am spinning up an Elasticsearch cluster with SG installed in a Docker Swarm environment, and the cluster is red because of unassigned shards. It stays red even after initializing SG.

I used the admin cert and key to remove the unassigned shards. Is there any option or config setting to avoid this in future when setting up a fresh cluster with SG installed?

.signals_accounts              0 r UNASSIGNED
.signals_accounts              0 p UNASSIGNED
.signals_watches_trigger_state 0 p UNASSIGNED
.signals_watches_trigger_state 0 r UNASSIGNED
.searchguard_authtokens        0 r UNASSIGNED
.searchguard_authtokens        0 p UNASSIGNED
.searchguard_config_history    0 r UNASSIGNED
.searchguard_config_history    0 p UNASSIGNED
.signals_watches               0 r UNASSIGNED
.signals_watches               0 p UNASSIGNED
.signals_settings              0 p UNASSIGNED
.signals_settings              0 r UNASSIGNED
.signals_watches_state         0 r UNASSIGNED
.signals_watches_state         0 p UNASSIGNED
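
For anyone triaging a similar listing, here is a minimal sketch that groups the UNASSIGNED rows by index and marks which indices have lost their primary (those are the ones that turn the cluster red). It assumes the four-column layout shown above (index, shard, p/r, state); the function name and the sample text are only illustrative.

```python
from collections import defaultdict


def unassigned_by_index(cat_shards_text):
    """Group UNASSIGNED shard copies by index from `_cat/shards` text output.

    Assumes the columns are: index, shard number, p/r flag, state.
    """
    result = defaultdict(list)
    for line in cat_shards_text.splitlines():
        parts = line.split()
        # Skip blank lines and rows that are not unassigned shard copies.
        if len(parts) < 4 or parts[3] != "UNASSIGNED":
            continue
        index, shard, kind = parts[0], int(parts[1]), parts[2]
        result[index].append((shard, "primary" if kind == "p" else "replica"))
    return dict(result)


# A few rows copied from the listing in this post.
SAMPLE = """\
.signals_accounts              0 r UNASSIGNED
.signals_accounts              0 p UNASSIGNED
.searchguard_authtokens        0 r UNASSIGNED
.searchguard_authtokens        0 p UNASSIGNED
"""

if __name__ == "__main__":
    for index, copies in sorted(unassigned_by_index(SAMPLE).items()):
        # An unassigned primary makes the index (and the cluster) red.
        has_primary = any(kind == "primary" for _, kind in copies)
        print(index, copies, "RED" if has_primary else "YELLOW")
```

In the listing above every index has an unassigned primary, which is consistent with the red status.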

{"type": "server", "timestamp": "2021-03-26T14:06:43,116Z", "level": "ERROR", "component": "c.f.s.c.ProtectedConfigIndexService", "cluster.name": "<cluster_name>", "node.name": "esmaster01", "message": "Index .searchguard_authtokens is not yet ready:\n{"cluster_name":"<cluster_name>","status":"red","timed_out":true,"number_of_nodes":5,"number_of_data_nodes":2,"active_primary_shards":0,"active_shards":0,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":4,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":0.0}\nRetrying.", "cluster.uuid": "<cluster_uuid>", "node.id": "<node_id>" }

ES Version
"version" : {
"number" : "7.10.1",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
"build_date" : "2020-12-05T01:00:33.671820Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}

Steps to reproduce:

  1. Deploy an ES cluster using Docker Swarm

Provide configuration:
Basic ES configuration with certs

Hi. Did only the SG indices have unassigned shards? Maybe you did a rolling restart and forgot to re-enable shard allocation? It is recommended to disable allocation before the restart and re-enable it when the cluster is yellow, to avoid unnecessary replication. See "Full-cluster restart and rolling restart" in the Elasticsearch Reference [7.12].
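
For reference, a minimal sketch of the request bodies involved in that disable/re-enable step. It only builds the JSON for `PUT /_cluster/settings`; the helper name is hypothetical, and sending the request (with whatever credentials your cluster needs) is left out.

```python
import json

# Cluster-level setting that controls shard allocation.
ALLOCATION_KEY = "cluster.routing.allocation.enable"


def allocation_settings(value):
    """Build the body for PUT /_cluster/settings.

    Passing None resets the setting to its default ("all").
    """
    allowed = {"all", "primaries", "new_primaries", "none", None}
    if value not in allowed:
        raise ValueError("unsupported allocation value: {!r}".format(value))
    return {"persistent": {ALLOCATION_KEY: value}}


if __name__ == "__main__":
    # Before the restart: keep primaries allocatable, stop replica allocation.
    print("PUT /_cluster/settings", json.dumps(allocation_settings("primaries")))
    # After the nodes have rejoined: reset to the default.
    print("PUT /_cluster/settings", json.dumps(allocation_settings(None)))
```

If the second request is never sent, replicas stay unassigned, which is the situation the question above is asking about.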

@srgbnd, this is a fresh install using Docker images with SG installed. No indices had been created earlier. I didn't experience this issue before signals and auth-token were added.

Best,
Yash

Now the cluster is stable, but there are no indices for signals and auth-token. I will check whether I can manually add the indices for signals and auth-tokens.

An update on my previous issue: I upgraded from 7.10.1 to 7.10.2, and the Search Guard security indices were created.

If we set up a cluster using Docker Swarm with SG installed, the issue occurs because we couldn’t override the shard allocation setting on start-up. Once the cluster is up:

  • Initialize SG
  • Remove the unassigned shards
  • Disable shard allocation
  • Upgrade the cluster to the next minor version so that the SG installation re-deploys all the required security indices

It would be great if SG created the security indices after initialization when there are no existing Search Guard indices. Can you guide me if there is a way to do this that I am not aware of?

@yasvanth it sounds like you have this instance under control, but if you can still reproduce the original issue it would be helpful if you could call this ES endpoint:

GET /_cluster/allocation/explain

while you still have all the UNASSIGNED shards, and share the results here. Since the searchguard index won’t be initialized at that point, you’d have to make this call using the admin cert and key (as you did before).
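
In case it helps, here is a rough sketch of making that call with the admin certificate from Python's standard library. The host, the certificate paths and the helper names are placeholders I made up, so adjust them to your setup; the request targets one specific shard copy rather than letting ES pick one.

```python
import json
import ssl
import urllib.request


def build_explain_request(base_url, index, shard=0, primary=True):
    """Build a POST /_cluster/allocation/explain request for one shard copy."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def admin_tls_context(ca_file, cert_file, key_file):
    """TLS context that presents the SG admin certificate (paths are placeholders)."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def explain(base_url, index, ca_file, cert_file, key_file):
    """Send the request and return the parsed JSON response."""
    req = build_explain_request(base_url, index)
    ctx = admin_tls_context(ca_file, cert_file, key_file)
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the request; call explain(...) against a live node to send it.
    req = build_explain_request("https://localhost:9200", ".signals_settings")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```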

@koyaanisqatsi, yes, I can spin up a small single-node cluster today and extract the details that have been requested.

@koyaanisqatsi, I got the response below when I spun up a single-node cluster with SG installed.

{
  "index" : ".signals_settings",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-03-29T10:35:21.977Z",
    "failed_allocation_attempts" : 1,
    "details" : "failed shard on node [xtteOv-oQZm0xM3TGHcwzQ]: master {esmaster01-test}{xtteOv-oQZm0xM3TGHcwzQ}{3IG9fTReR4K5Bra2KZmNPw}{10.0.4.3}{10.0.4.3:9300}{dm}{xpack.installed=true, transform.node=false} has not removed previously failed shard. resending shard failure",
    "last_allocation_status" : "awaiting_info"
  },
  "can_allocate" : "awaiting_info",
  "allocate_explanation" : "cannot allocate because information about existing shard data is still being retrieved from some of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "xtteOv-oQZm0xM3TGHcwzQ",
      "node_name" : "esmaster01-test",
      "transport_address" : "10.0.4.3:9300",
      "node_attributes" : {
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "yes"
    }
  ]
}
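
To make a response like this quicker to read, a small sketch that pulls out the fields that usually matter. The function name is hypothetical and the sample is a trimmed copy of the response above.

```python
def summarize_explain(resp):
    """Condense an allocation-explain response into a few key fields."""
    info = resp.get("unassigned_info", {})
    nodes = resp.get("node_allocation_decisions", [])
    kind = "primary" if resp["primary"] else "replica"
    return {
        "shard": "{}[{}] ({})".format(resp["index"], resp["shard"], kind),
        "reason": info.get("reason"),
        "failed_attempts": info.get("failed_allocation_attempts", 0),
        "can_allocate": resp.get("can_allocate"),
        # Nodes that would accept the shard according to the allocation deciders.
        "nodes_saying_yes": [
            n["node_name"] for n in nodes if n.get("node_decision") == "yes"
        ],
    }


# Trimmed copy of the response posted above.
SAMPLE = {
    "index": ".signals_settings",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "failed_allocation_attempts": 1,
        "last_allocation_status": "awaiting_info",
    },
    "can_allocate": "awaiting_info",
    "node_allocation_decisions": [
        {"node_name": "esmaster01-test", "node_decision": "yes"}
    ],
}

if __name__ == "__main__":
    for key, value in summarize_explain(SAMPLE).items():
        print("{}: {}".format(key, value))
```

Here the only node answers "yes" while `can_allocate` is still `awaiting_info`, so no allocation decider is blocking the shard; the earlier allocation attempt failed and the cluster is still collecting shard information.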

@yasvanth could you please test whether a “retry” would finally allocate the failed shards:

POST /_cluster/reroute?retry_failed=true&explain=true

Also, are you planning to use the Signals functionality? If you’re not, as a workaround, we can explicitly disable Signals in elasticsearch.yml:

signals.enabled: false

We’d have to disable it once the SG plugin has been installed (otherwise ES will reject this value while starting up), but before the first time you start the ES node with SG already installed (so that the signals indices have not been created). Do you think you can introduce that in the middle of your setup sequence?

@koyaanisqatsi I want to use Signals. Even if I disable Signals, the issue persists with the “.searchguard_authtokens” index.

The output of retrying the allocation is provided below. I initialized SG after retrying, but no luck.

{"acknowledged":true,"state":{"cluster_uuid":"J8twOq0qTYOlWMUYUw3gpQ","version":41,"state_uuid":"LgFSN4gpRNO-vmmM7rRtaQ","master_node":"mRfb21dAQbC37ZBfOWSUyg","blocks":{},"nodes":{"mRfb21dAQbC37ZBfOWSUyg":{"name":"esmaster01-test","ephemeral_id":"6zBRxEI5T46iGmoTBNOB6g","transport_address":"10.0.2.3:9300","attributes":{"xpack.installed":"true","transform.node":"false"}}},"routing_table":{"indices":{".searchguard_authtokens":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".searchguard_authtokens","allocation_id":{"id":"zKNzOSqMRG22S9lrQlRBsw"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".searchguard_authtokens","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.185Z","delayed":false,"allocation_status":"no_attempt"}}]}},".signals_watches_state":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_watches_state","allocation_id":{"id":"MiJOJKzzQBa_lNU0kr2rAw"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_watches_state","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.588Z","delayed":false,"allocation_status":"no_attempt"}}]}},".searchguard_config_history":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".searchguard_config_history","allocation_id":{"id":"o5VL_OyuTEGDLJz2YHVfaQ"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".searchguard_config_history","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.791Z","delayed":false,"allocation_status":"no_attempt"}}]}},".signals_watches_trigger_state":{"shards":{"0":[{"state":"STARTED","prim
ary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_watches_trigger_state","allocation_id":{"id":"U7J8dsE3SDm7mtx_oJLyRQ"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_watches_trigger_state","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.326Z","delayed":false,"allocation_status":"no_attempt"}}]}},".signals_settings":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_settings","allocation_id":{"id":"hj9hcmMmTA2lC-6ZstzLCw"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_settings","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.703Z","delayed":false,"allocation_status":"no_attempt"}}]}},".signals_accounts":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_accounts","allocation_id":{"id":"fCjWkZd4QmuTuk-CygJgCQ"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_accounts","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.472Z","delayed":false,"allocation_status":"no_attempt"}}]}},".signals_watches":{"shards":{"0":[{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_watches","allocation_id":{"id":"mFDrx0pjScefWh5fcR2IrA"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_watches","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.863Z","delayed":false,"allocation_status":"no_attempt"}}]}}}},"routing_nodes":{"unassigned":[{"state":"UNASSIGNED","primary":false,"node":null,"reloc
ating_node":null,"shard":0,"index":".searchguard_authtokens","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.185Z","delayed":false,"allocation_status":"no_attempt"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_watches_state","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.588Z","delayed":false,"allocation_status":"no_attempt"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".searchguard_config_history","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.791Z","delayed":false,"allocation_status":"no_attempt"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_watches_trigger_state","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.326Z","delayed":false,"allocation_status":"no_attempt"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_settings","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.703Z","delayed":false,"allocation_status":"no_attempt"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_accounts","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.472Z","delayed":false,"allocation_status":"no_attempt"}},{"state":"UNASSIGNED","primary":false,"node":null,"relocating_node":null,"shard":0,"index":".signals_watches","recovery_source":{"type":"PEER"},"unassigned_info":{"reason":"INDEX_CREATED","at":"2021-03-30T08:32:12.863Z","delayed":false,"allocation_status":"no_attempt"}}],"nodes":{"mRfb21dAQbC37ZBfOWSUyg":[{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node"
:null,"shard":0,"index":".searchguard_authtokens","allocation_id":{"id":"zKNzOSqMRG22S9lrQlRBsw"}},{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_watches_state","allocation_id":{"id":"MiJOJKzzQBa_lNU0kr2rAw"}},{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".searchguard_config_history","allocation_id":{"id":"o5VL_OyuTEGDLJz2YHVfaQ"}},{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_watches_trigger_state","allocation_id":{"id":"U7J8dsE3SDm7mtx_oJLyRQ"}},{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_settings","allocation_id":{"id":"hj9hcmMmTA2lC-6ZstzLCw"}},{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_accounts","allocation_id":{"id":"fCjWkZd4QmuTuk-CygJgCQ"}},{"state":"STARTED","primary":true,"node":"mRfb21dAQbC37ZBfOWSUyg","relocating_node":null,"shard":0,"index":".signals_watches","allocation_id":{"id":"mFDrx0pjScefWh5fcR2IrA"}}]}}}}
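
Since that response is hard to read as a single line, here is a minimal sketch that derives the expected health colour from its routing table. The function name is hypothetical, and the sample keeps just one index from the output above.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def health_from_routing_table(reroute_response):
    """Return 'red', 'yellow' or 'green' from a reroute response's routing table.

    red: at least one primary is not active
    yellow: all primaries are active, at least one replica is not
    green: every shard copy is active
    """
    indices = reroute_response["state"]["routing_table"]["indices"]
    colour = "green"
    for data in indices.values():
        for copies in data["shards"].values():
            for copy in copies:
                if copy["state"] in ACTIVE_STATES:
                    continue
                if copy["primary"]:
                    return "red"
                colour = "yellow"
    return colour


# One index from the output above, with unrelated fields dropped.
SAMPLE = {
    "state": {
        "routing_table": {
            "indices": {
                ".searchguard_authtokens": {
                    "shards": {
                        "0": [
                            {"state": "STARTED", "primary": True},
                            {"state": "UNASSIGNED", "primary": False},
                        ]
                    }
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(health_from_routing_table(SAMPLE))
```

Reading the posted output this way, every primary is STARTED and only the replicas are UNASSIGNED with reason INDEX_CREATED. On a single-node cluster a replica can never be placed on the same node as its primary, so those replicas are expected to stay unassigned there.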