Elasticsearch 5.6.14. Searchguard 5-5.6.14-19.2
I am trying to understand why searchguard index shards are being allocated to nodes in a way which I think Elasticsearch should not allow.
I work with a cluster that has a hot/cold architecture. We have Ansible use sgadmin.sh to disable automatic replication, count how many hot nodes we have, and explicitly set the number of replicas to that number minus one; i.e. we currently have 6 hot nodes, so Ansible tells sgadmin.sh to set 5 replicas.
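For context, the replica count the playbook applies works out like this (a minimal sketch; the host count is from our setup, and the commented curl call is only the index-settings equivalent of whatever sgadmin.sh does internally, not its actual mechanism):

```shell
# Count hot nodes (6 in our case) and set replicas to that minus one,
# so every hot node can hold one copy of each searchguard shard.
HOT_NODES=6
REPLICAS=$((HOT_NODES - 1))
echo "replicas: $REPLICAS"

# Illustrative settings call only -- in practice sgadmin.sh manages
# the searchguard index itself:
# curl -XPUT 'http://localhost:9200/searchguard/_settings' \
#      -d "{\"index\": {\"number_of_replicas\": $REPLICAS}}"
```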
We have 3 hot nodes in a data centre called foo, and 3 in a data centre called bar. We have this in our elasticsearch.yml:

cluster.routing.allocation.awareness.attributes: datacentre
cluster.routing.allocation.awareness.force.zone.values: [u'foo', u'bar']
My understanding of those settings is that a replica cannot be in the same data centre as its primary. Except some replicas for the searchguard index are in the same data centre as their primary. We've only recently started using Search Guard, and I hadn't given this any thought until the other day, when I had to stop then start Elasticsearch on a hot node. After the node re-joined the cluster, the replica shard of the searchguard index that had been on it was not re-assigned back to it. The cluster allocation explain API said:
[NO(there are too many copies of the shard allocated to nodes with attribute [datacentre], there are  total configured shard copies for this shard id and  total attribute values, expected the allocated shard count per attribute  to be less than or equal to the upper bound of the required number of shards per attribute )
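My reading of the bound that message refers to, worked through for our numbers (a sketch of how I understand the awareness decider, not a quote from any documentation; the elided numbers in the message would presumably be 6 copies and 2 attribute values in our case):

```shell
# 1 primary + 5 replicas = 6 total copies of each searchguard shard,
# spread across 2 datacentre attribute values (foo, bar).
TOTAL_COPIES=6
ATTRIBUTE_VALUES=2

# Upper bound on copies per data centre = ceiling(copies / values);
# integer ceiling division in shell:
UPPER_BOUND=$(( (TOTAL_COPIES + ATTRIBUTE_VALUES - 1) / ATTRIBUTE_VALUES ))
echo "upper bound per data centre: $UPPER_BOUND"
```

On that reading, 3 copies per data centre should be allowed, which is part of why the NO decision surprised me.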
(This is the same message I get if I tell Elasticsearch to move a primary shard of an index to a node in the same data centre as its replica via a POST to /_cluster/reroute.) To get cluster health back to green I PUT "number_of_replicas": 1 to /searchguard/_settings. I then did an Ansible run, which used sgadmin.sh to set the replicas back to 5, all of which got assigned.
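Spelled out as a request, the first recovery step looked roughly like this (host and port are assumptions; the second step is driven through sgadmin.sh by the Ansible run rather than a direct settings call):

```shell
# Step 1: drop the searchguard index to a single replica so the
# unassignable copies are removed and cluster health returns to green.
curl -XPUT 'http://localhost:9200/searchguard/_settings' \
     -H 'Content-Type: application/json' \
     -d '{"index": {"number_of_replicas": 1}}'

# Step 2: re-run Ansible, which has sgadmin.sh raise the count back to 5.
```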
So, given the cluster.routing.allocation.awareness settings our cluster has, how does sgadmin.sh create 5 replicas and get them all assigned to hot nodes?