I am upgrading ELK (with Search Guard) from version 6.6.1 to 7.0.1 in a Kubernetes environment using Helm charts.
Since the SG configuration needs to be migrated from version 6 to 7, I have written a post-upgrade job in my Helm chart that runs:

    ./sgadmin.sh -migrate /usr/share/elasticsearch/sg_migrate -ts $TRUSTSTORE_FILEPATH -ks $CLIENT_KEYSTORE_FILEPATH -cn $CLUSTER_NAME -kspass $KS_PWD -tspass $TS_PWD -h $HOSTNAME -nhnv
I see that sometimes, during the post-upgrade job, the SG migration fails with the error below:
Contacting elasticsearch cluster 'logging-elk' and wait for YELLOW clusterstate ...
Clustername: logging-elk
Clusterstate: GREEN
Number of nodes: 2
Number of data nodes: 0
searchguard index does not exists, attempt to create it ... done (0-all replicas)
ERR: Seems cluster is already migrated
It did not detect any data nodes even though the data nodes were up, and it tried to create the searchguard index in the version 7 format anyway. If I run the migrate job again after this, it fails with the error: "searchguard index already exists ... ERR: Seems cluster is already migrated".
But my SG configuration files are still in the SG 6 format. Because of this, I cannot even run sgadmin: the files are in the SG 6 format while the SG index is in the version 7 format. As a result, the cluster becomes unreachable.
How can I guarantee that the migrate job runs only after the data nodes are detected, so that I never end up in this situation?
Hi,
Any suggestions? Can someone help me understand how the cluster state can be GREEN in this case when the number of data nodes is 0?
Since sgadmin only waits for a YELLOW/GREEN health status, and the cluster health was reported as GREEN in my case, it went ahead and ran the -migrate step. But because there were no data nodes, it failed to detect an SG index.
I guess that's more of a Kubernetes/Helm issue, because it simply looks like your data nodes are not up or did not join the cluster (maybe a misconfiguration of the new ES 7 discovery options, see Discovery | Elasticsearch Guide [7.16] | Elastic).
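If discovery is the culprit, the ES 7 settings to check in elasticsearch.yml look roughly like this (a minimal sketch; the node and service names are placeholders for whatever your Helm chart generates):

    # ES 7 replaced the old discovery.zen.ping.unicast.hosts setting.
    # Node/service names below are placeholders, not your actual values.
    discovery.seed_hosts:
      - logging-elk-master-headless
    # Only consulted when bootstrapping a brand-new cluster:
    cluster.initial_master_nodes:
      - logging-elk-master-0
      - logging-elk-master-1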
A cluster can be GREEN even if there is no data node (but that's an ES thing, not an SG thing). I will check whether it makes sense to add an option to sgadmin to wait for at least one data node. Until then, you can check this yourself with a curl command (using the admin certificates) against the Cluster health API (Cluster health API | Elasticsearch Guide [8.4] | Elastic) with wait_for_active_shards=1.
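For example, something like this in your job before calling sgadmin (just a sketch, not a tested recipe: it assumes you have exported the admin certificate and key from your keystore to PEM files, and the certificate paths and port 9200 are placeholders you would adapt):

    # Block until at least one data node has joined the cluster.
    # /certs/*.pem paths and port 9200 are placeholders for your environment.
    while true; do
      nodes=$(curl -s --cacert /certs/root-ca.pem \
                   --cert /certs/admin.pem --key /certs/admin.key \
                   "https://$HOSTNAME:9200/_cluster/health" \
              | sed -n 's/.*"number_of_data_nodes":\([0-9][0-9]*\).*/\1/p')
      [ "${nodes:-0}" -ge 1 ] && break
      echo "waiting for a data node to join ..."
      sleep 5
    done

Only once this loop exits would the job go on to invoke sgadmin with -migrate.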
Thanks @hsaly for the response.
But I cannot check with a curl command using wait_for_active_shards, because unless sgadmin runs successfully I cannot reach the Elasticsearch REST API, and until the SG migration completes, sgadmin will keep failing. Your thoughts?
Also, another observation I made: in my previous post, you can see the output:
Number of data nodes: 0
searchguard index does not exists, attempt to create it ... done (0-all replicas)
If no data node had joined the cluster, where was this searchguard index created?
And when the SG migration is triggered the next time, it fails with the error:
searchguard index already exists, so we do not need to create one.
INFO: searchguard index state is YELLOW, it seems you miss some replicas
ERR: Seems cluster is already migrated