Seems cluster is already migrated when data nodes are not detected failing sg-migration

shivani.aggarwal2195 · February 11, 2020, 1:10pm

Hi,

I am upgrading ELK (with Search Guard) from version 6.6.1 to 7.0.1 in a Kubernetes environment using Helm charts.
Since SG configurations need to be migrated from version 6 to 7, I have written a post-upgrade job in my helm chart.
./sgadmin.sh -migrate /usr/share/elasticsearch/sg_migrate -ts $TRUSTSTORE_FILEPATH -ks $CLIENT_KEYSTORE_FILEPATH -cn $CLUSTER_NAME -kspass $KS_PWD -tspass $TS_PWD -h $HOSTNAME -nhnv

I see that sometimes, at post-upgrade, the sg migration fails with the below error:

Contacting elasticsearch cluster 'logging-elk' and wait for YELLOW clusterstate ...
Clustername: logging-elk
Clusterstate: GREEN
Number of nodes: 2
Number of data nodes: 0
searchguard index does not exists, attempt to create it ... done (0-all replicas)
ERR: Seems cluster is already migrated

It did not detect any data nodes even though the data nodes were up. It tries to create the searchguard-7 index, and if I then run the migrate job again, it fails with the error: “searchguard index already exists, ERR: seems cluster is already migrated”.

But my SG configuration files are still in SG-6 format. Because of this, I cannot even run sgadmin, as the files are in SG-6 format and the SG index is in SG-7 format. With this, the cluster becomes unreachable.
How can I guarantee that the migrate job runs only after data nodes are detected, so that I never end up with this issue?

Thanks,
Shivani

shivani.aggarwal2195 · February 17, 2020, 4:06am

Hi,
Any suggestions? Can someone help me understand how the cluster state can be green in this case when the number of data nodes is 0?
sgadmin waits for a yellow/green health status, and in my case the cluster health was found to be green, so it went ahead with the migration. But, as there were no data nodes, it failed to detect an SG index.

hsaly · February 18, 2020, 9:50am

I guess that's more of a Kubernetes/Helm issue, because it simply looks like your data nodes are not up or they did not join the cluster (maybe a misconfiguration of the new ES 7 discovery options; see "Discovery" in the Elasticsearch Guide [7.16]).

A cluster can be green even if there is no data node (but that's an ES thing and not an SG thing). I will check if it makes sense to add an option to sgadmin to wait for at least one data node. Until then, you can check this with a curl command (using the admin certificates), calling something like the Cluster health API (see the Elasticsearch Guide [8.4]) with wait_for_active_shards=1.
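
As an illustration of the check suggested above, here is a minimal sketch of a shell helper that polls the cluster health endpoint and only succeeds once the response reports at least one data node (the number_of_data_nodes field of the Cluster health API response). The variable names ES_URL, CA_CERT, ADMIN_CERT and ADMIN_KEY, the retry defaults, and the assumption that the REST API accepts the admin client certificate before the migration has run are all hypothetical and would need to be verified against the actual deployment.

```shell
#!/bin/sh
# Sketch only: ES_URL, CA_CERT, ADMIN_CERT and ADMIN_KEY are assumed
# environment variables, not names taken from the original chart.

# Read a _cluster/health JSON response on stdin and print the value of
# number_of_data_nodes (prints nothing if the field is absent).
data_node_count() {
  sed -n 's/.*"number_of_data_nodes"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Poll the health endpoint until at least one data node has joined.
# $1 = max attempts (default 30), $2 = seconds between attempts (default 10).
# Returns 0 once a data node is seen, 1 if the attempts run out.
wait_for_data_nodes() {
  attempts=${1:-30}
  delay=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    count=$(curl -s --cacert "$CA_CERT" --cert "$ADMIN_CERT" --key "$ADMIN_KEY" \
      "$ES_URL/_cluster/health" | data_node_count)
    # An empty count (curl failed or field missing) is treated as 0.
    if [ "${count:-0}" -ge 1 ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

In a post-upgrade job, a call such as wait_for_data_nodes 30 10 could be placed in front of the existing sgadmin.sh -migrate command, so that the job fails without touching the searchguard index if no data node appears in time.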

shivani.aggarwal2195 · February 20, 2020, 4:49am

Thanks @hsaly for the response.
But I cannot check with a curl command using wait_for_active_shards, because unless sgadmin runs successfully, I cannot reach the Elasticsearch REST API. And until the SG migration completes, sgadmin will keep failing. Your thoughts?

Also, another observation I made: in my previous post, you can see this in the error output:

Number of data nodes: 0
searchguard index does not exists, attempt to create it ... done (0-all replicas)

When no data node had joined the cluster, where was this searchguard index created?
The next time the SG migration is triggered, it fails with the error:

searchguard index already exists, so we do not need to create one.
INFO: searchguard index state is YELLOW, it seems you miss some replicas
ERR: Seems cluster is already migrated

shivani.aggarwal2195 · February 24, 2020, 7:40pm

@hsaly, any thoughts?
When no data node had joined the cluster, where was this searchguard index created?

system · March 16, 2020, 7:40pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.
