When upgrading from 6.8.6 to 7.10.2, I noticed the cluster state became inconsistent once more than half the nodes were upgraded: the 7.x nodes report RED while the 6.x nodes report YELLOW/GREEN.
Now that all nodes have been upgraded, the cluster is RED, with the following unassigned shards (the corresponding indices are listed further below):
.signals_watches_trigger_state 3 p UNASSIGNED
.signals_watches_trigger_state 3 r UNASSIGNED
.signals_watches_trigger_state 0 p UNASSIGNED
.signals_watches_trigger_state 0 r UNASSIGNED
.signals_watches 2 p UNASSIGNED
.signals_watches 2 r UNASSIGNED
.searchguard_authtokens 4 r UNASSIGNED
.searchguard_authtokens 4 p UNASSIGNED
.searchguard_authtokens 2 r UNASSIGNED
.searchguard_authtokens 2 p UNASSIGNED
.signals_settings 1 r UNASSIGNED
.signals_settings 1 p UNASSIGNED
.signals_settings 4 r UNASSIGNED
.signals_settings 4 p UNASSIGNED
.signals_accounts 1 r UNASSIGNED
.signals_accounts 1 p UNASSIGNED
.signals_watches_state 1 r UNASSIGNED
.signals_watches_state 1 p UNASSIGNED
.signals_watches_state 3 r UNASSIGNED
.signals_watches_state 3 p UNASSIGNED
.searchguard_config_history 2 p UNASSIGNED
.searchguard_config_history 2 r UNASSIGNED
.searchguard_config_history 0 p UNASSIGNED
.searchguard_config_history 0 r UNASSIGNED
It seems the shards are simply missing.
So they must have been lost during the upgrade, for some reason.
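In case it is useful, the allocation explain API can tell you why a given shard stays unassigned. A minimal sketch in Python (assuming the cluster answers on localhost:9200; add TLS and credentials as needed for your setup):

import requests

ES = "http://localhost:9200"  # assumption: adjust host, TLS and credentials for your cluster

# Ask Elasticsearch why one of the shards from the listing above cannot be allocated.
body = {
    "index": ".signals_watches_trigger_state",
    "shard": 3,
    "primary": True,
}
resp = requests.post(f"{ES}/_cluster/allocation/explain", json=body)
print(resp.json())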
.searchguard_authtokens 2 r UNASSIGNED
.searchguard_authtokens 2 p UNASSIGNED
.searchguard_authtokens 3 p STARTED 999.999.108.88 node83.example.com
.searchguard_authtokens 3 r STARTED 999.999.108.89 node84.example.com
.searchguard_authtokens 1 p STARTED 999.999.233.5 node78.example.com
.searchguard_authtokens 1 r STARTED 999.999.233.4 node77.example.com
.searchguard_authtokens 4 r UNASSIGNED
.searchguard_authtokens 4 p UNASSIGNED
.searchguard_authtokens 0 p STARTED 999.999.238.221 node221.example.com
.searchguard_authtokens 0 r STARTED 999.999.233.6 node79.example.com
So the question boils down to: is it safe to DELETE those?
I’m not particularly anxious about .signals_*, but what about .searchguard_config_history and .searchguard_authtokens?
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
red open .searchguard_config_history rsFVFnlMT0OzpdsJYFzA8Q 5 1 0 0 1.4kb 717b
red open .signals_settings TNWmQ7zeSKOBjbwOKBiYlw 5 1 0 0 1.4kb 717b
red open .signals_watches 1oXl_COeT92OrIp9kNiRVQ 5 1 0 0 1.8kb 956b
red open .signals_accounts i0YKW43VQVqe7C58BTM-6Q 5 1 0 0 1.8kb 956b
red open .signals_watches_state _WYARV0HQ5CkXXvQ9jQv7A 5 1 0 0 1.4kb 717b
red open .signals_watches_trigger_state Fz4R0VmcQtm0EuvldZGPAg 5 1 0 0 1.4kb 717b
red open .searchguard_authtokens f3sWqRDRS_24T-VDCShiBw 5 1 0 0 1.4kb 717b
I’m guessing they were added by search-guard-7 during the rolling upgrade.
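If you want to double-check that guess, the index settings contain index.version.created. A small sketch (same assumptions as above about host and credentials); a value starting with 7 would mean the index was created by a 7.x node, i.e. during or after the rolling upgrade:

import requests

ES = "http://localhost:9200"  # assumption: adjust for your cluster

for index in [".searchguard_authtokens", ".searchguard_config_history", ".signals_watches"]:
    settings = requests.get(f"{ES}/{index}/_settings").json()
    # e.g. "7100299" should correspond to an index created on 7.10.2
    print(index, settings[index]["settings"]["index"]["version"]["created"])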
If you don’t use these features, it should be safe to just delete the indices. It is, however, possible that Search Guard will try to create the indices again whenever a node starts up and discovers it has been elected master.
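If you go that route, deleting them is just a DELETE per index. A sketch (assuming localhost:9200 and credentials or an admin certificate that Search Guard accepts for operations on its own indices):

import requests

ES = "http://localhost:9200"  # assumption: adjust host, TLS and credentials

# Delete the red Search Guard / Signals system indices; as described above,
# Search Guard may later recreate them as empty indices.
for index in [".signals_watches", ".signals_watches_state", ".signals_watches_trigger_state",
              ".signals_settings", ".signals_accounts",
              ".searchguard_config_history", ".searchguard_authtokens"]:
    resp = requests.delete(f"{ES}/{index}")
    print(index, resp.status_code)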
To be on the safe side I just used the cluster reroute API with allocate_empty_primary for all the affected indices, and now my cluster is GREEN. Thanks for your help.
And no, I used 7.10.2-51.0.0 because that’s what I used on preproduction.
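For anyone finding this later, the reroute call I mean looks roughly like this (a sketch, assuming localhost:9200; the node name is just one of my data nodes, and accept_data_loss acknowledges that whatever was in the shard is gone):

import requests

ES = "http://localhost:9200"  # assumption: adjust host, TLS and credentials

# Force-allocate one unassigned primary as an empty shard; repeat per unassigned primary.
# The replicas then recover from the new empty primaries on their own.
command = {
    "commands": [{
        "allocate_empty_primary": {
            "index": ".searchguard_authtokens",
            "shard": 2,
            "node": "node78.example.com",   # any data node in the cluster
            "accept_data_loss": True,
        }
    }]
}
resp = requests.post(f"{ES}/_cluster/reroute", json=command)
print(resp.status_code)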
For us it has proven much easier and less error-prone to spin up a new cluster on the new version, restore the cluster state and a snapshot there, and then delete the old cluster.
In my experience, a rolling upgrade of a running production cluster ALWAYS leads to trouble, stress and problems.
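Roughly, what I mean is something like the following sketch (repository name, snapshot name and path are placeholders; the repository must of course also be registered on the old cluster where the snapshot was taken):

import requests

NEW_ES = "http://new-cluster:9200"  # assumption: the freshly installed 7.x cluster

# Register the existing snapshot repository on the new cluster (read-only is enough for restoring).
repo = {"type": "fs", "settings": {"location": "/mnt/es-backups", "readonly": True}}
requests.put(f"{NEW_ES}/_snapshot/backup", json=repo)

# Restore the last snapshot taken on the old 6.8 cluster, including the global cluster state.
restore = {"indices": "*", "include_global_state": True}
requests.post(f"{NEW_ES}/_snapshot/backup/snapshot_1/_restore", json=restore)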