Sgadmin migrate failure "Unrecognized field"

Elasticsearch 7.1.1 and Searchguard 35.0.0, upgraded from ES 6.5.4.

The sgadmin “migrate” function fails on internalusers with the following error. It appears that entries which include a “password” field, which was accepted prior to 7.x and caused no problems while in the Search Guard legacy index, fail validation when extracted by the migration function. This prevents any migration.

Will retrieve 'sg/internalusers' into .\sg_internal_users_2019-Jun-19_16-25-15.yml (legacy mode)
ERR: Seems internalusers from cluster is not in legacy format: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "password" (class com.floragunn.searchguard.sgconf.impl.v6.InternalUserV6), not marked as ignorable (6 known properties: "readonly", "username", "attributes", "hidden", "roles", "hash"])

To work around this on my test cluster, I’m disabling searchguard and extracting the json for internalusers from the index to validate against my managed sg_internal_users.yml file - I’m expecting that I can pre-emptively modify and load the file into other clusters prior to performing the 7.x upgrade, but once it’s in this state there seems to be no clean way to recover.
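For anyone wanting to do the same pre-emptive check, here is a minimal sketch in Python. It assumes the internalusers document has been extracted as JSON that maps each username to its settings. The list of accepted fields is copied from the error message above; the function name and the sample users are hypothetical.

```python
import json

# The six properties InternalUserV6 accepts, per the sgadmin error message.
KNOWN_FIELDS = {"readonly", "username", "attributes", "hidden", "roles", "hash"}


def find_unrecognized_fields(users):
    """Return {username: sorted list of fields not in KNOWN_FIELDS}."""
    problems = {}
    for name, settings in users.items():
        if not isinstance(settings, dict):
            continue  # skip anything that is not a user entry
        extra = sorted(set(settings) - KNOWN_FIELDS)
        if extra:
            problems[name] = extra
    return problems


# Hypothetical sample data; real input would come from the extracted JSON.
sample = json.loads("""
{
  "admin": {"hash": "$2a$12$examplehash", "roles": ["admin"]},
  "kibana": {"hash": "$2a$12$examplehash", "password": "changeme"}
}
""")

for user, fields in find_unrecognized_fields(sample).items():
    print(f"{user}: unrecognized fields {fields}")
```

Any user reported here would presumably trip the same UnrecognizedPropertyException, so those entries are the ones to clean up in the managed file before upgrading.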

To prevent this we have an extra step documented in the upgrade procedure called “Check your Search Guard configuration”: https://docs.search-guard.com/latest/upgrading-6-7#check-your-search-guard-configuration Please execute this before you start the actual migration.

You can also run an offline migration and revise the files (run sgadmin with --migrate-offline <folder>) before uploading them.

And I am wondering why you have a field called “password” in sg_internal_users.yml. That field never existed (because we never had plaintext passwords in this file). Passwords were always stored as bcrypt hashes in a field called “hash”.

I saw those instructions but couldn’t get them to work. The backup step gives:

[migration@es-master-0001 sgadmin-standalone-7.1.1]$ tools/sgadmin.sh  -icl -key $CFG/kirk/kirk.key -cert $CFG/kirk/kirk.pem -cacert $CFG/root-ca.pem -nhnv -backup sgconfig.6.5.4.retrieved -h $LB_IP -keypass $KEYPASS
...
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{5WQIa3GhRvSe9Ap7h9FgBQ}{10.x.x.x}{10.x.x.x:9300}]]
        at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
        at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
        at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:57)
        at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:386)
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:393)
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:382)
        at com.floragunn.searchguard.tools.SearchGuardAdmin.execute(SearchGuardAdmin.java:510)
        at com.floragunn.searchguard.tools.SearchGuardAdmin.main(SearchGuardAdmin.java:142)

(retrieving using “-r” with the existing 6.5.4 sgadmin here works fine; the ES log file shows no output at this time)

The sgadmin.sh -vc 6 ./sgconfig invocation just spits out the sgadmin help text.

I don’t know where these “password” fields have come from, either; they seem to be in our managed sgconfig files from at least as far back as May 2018. I’d have generated them initially using the web UI and used sgadmin to retrieve the yml versions, but they may have been “curated” since.

Can you please post the full output of those sgadmin commands?

tools/sgadmin.sh -icl -key $CFG/kirk/kirk.key -cert $CFG/kirk/kirk.pem -cacert $CFG/root-ca.pem -nhnv -h $LB_IP -keypass $KEYPASS --diagnose -si

tools/sgadmin.sh -icl -key $CFG/kirk/kirk.key -cert $CFG/kirk/kirk.pem -cacert $CFG/root-ca.pem -nhnv -h $LB_IP -keypass $KEYPASS --diagnose -r

tools/sgadmin.sh -icl -key $CFG/kirk/kirk.key -cert $CFG/kirk/kirk.pem -cacert $CFG/root-ca.pem -nhnv -h $LB_IP -keypass $KEYPASS --diagnose -migrate /tmp

All three variants give the same output as I posted previously. None of them create a diagnostic trace log. (this is running against a pre-migration 6.5.4 cluster - our main QA cluster)

In case it’s relevant, I tried running these commands with the version of sgadmin in the installed plugin directory, and the first gives an NPE:

[root@es7851 sgadmin-standalone-7.1.1]# $SG6/tools/sgadmin.sh -icl -key $CFG/kirk/kirk.key -cert $CFG/kirk/kirk.pem -cacert $CFG/root-ca.pem -nhnv -h $LB_IP -keypass "$KEYPASS" --diagnose -si    
Search Guard Admin v6
Will connect to 10.x.x.x:9300 ... done
Elasticsearch Version: 6.5.4
Search Guard Version: 6.5.4-24.0
Connected as CN=...redacted...
ERR: An unexpected NullPointerException occured: null
Trace:
java.lang.NullPointerException

Please post the full output, including the parts redacted with “…” in the previous post.

I’m generally redacting only to take out sensitive organisational information like hostnames and IPs in public postings - but here are those two lines sans IP. If you need pristine output I can mail or open a support ticket.

# tools/sgadmin.sh -icl -key $CFG/kirk/kirk.key -cert $CFG/kirk/kirk.pem -cacert $CFG/root-ca.pem -nhnv -h $LB_IP -keypass "$KEYPASS" --diagnose -r     
Search Guard Admin v7
Will connect to xx.xx.xx.xx:9300 ... done
ERR: Cannot connect to Elasticsearch. Please refer to elasticsearch logfile for more information
Trace:
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{R6dHaglPQh-exwZmamuVgA}{xx.xx.xx.xx}{xx.xx.xx.xx:9300}]]
        at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:352)
        at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:248)
        at org.elasticsearch.client.transport.TransportProxyClient.execute(TransportProxyClient.java:57)
        at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:386)
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:393)
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:382)
        at com.floragunn.searchguard.tools.SearchGuardAdmin.execute(SearchGuardAdmin.java:510)
        at com.floragunn.searchguard.tools.SearchGuardAdmin.main(SearchGuardAdmin.java:142)

Nothing in the logs of the ES nodes?

And am I right that sgadmin 6 can connect and then produces an NPE, while sgadmin 7 cannot even connect properly to the cluster?

From what I see when you connect with sgadmin 6, your cluster still seems to be on 6.5.4 instead of 7.1.1?

No, nothing in the logs of the node I connect to, or any other in the cluster. Unrelated logging is appearing (e.g. if I deliberately fail a permissions check with curl), so it’s not a log system problem.

Yes, sgadmin 6 can connect:

  • it produces an NPE when the -si option is used
  • -dg -r works OK
  • -migrate is of course not supported in 6.

Yes:

That’s the point of this process, right? As that upgrade guide describes, this configuration check has to be done before making the upgrade to 7.x.

Thanks. Will try to reproduce and report back …

But it seems that you need to upgrade from 6.5 to 6.7 or 6.8 first and then to 7.1.1

Please upgrade your 6.5.4 cluster to 6.7 or 6.8 first and then to 7.1.1.
sgadmin 7 cannot connect because the wire format is only compatible with >= 6.7.
Please also refer to the prerequisites in the docs.
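As a rough illustration of that constraint, a small Python sketch that checks whether a cluster version meets the >= 6.7 threshold mentioned above. The function names are hypothetical, and the check only compares the major and minor numbers; it is not something sgadmin itself provides.

```python
# Minimum cluster version that sgadmin 7 can talk to, per the post above.
MIN_WIRE_COMPATIBLE = (6, 7)


def parse_version(text):
    """Turn a dotted version string such as '6.5.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def sgadmin7_can_connect(cluster_version):
    """True if the cluster's major.minor version is at least 6.7."""
    return parse_version(cluster_version)[:2] >= MIN_WIRE_COMPATIBLE


for version in ("6.5.4", "6.8.0", "7.1.1"):
    status = "ok" if sgadmin7_can_connect(version) else "too old for sgadmin 7"
    print(f"{version}: {status}")
```

Under this check, a 6.5.4 cluster such as the one in this thread falls below the threshold, which would match the NoNodeAvailableException reported earlier.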

Those are the prerequisites for a rolling restart, not for an upgrade in general. Perhaps this is just a matter of docs clarification.

What you can do to perform an upgrade from 6.5.4 to 7.1.1 in full cluster restart mode is:

  1. Extract your current config with -r (with sgadmin 6) and rename the files to remove the timestamp
  2. Download the standalone version of sgadmin 7
  3. Run sgadmin.sh -vc 6 to validate the files (with sgadmin 7)
  4. If the files are not valid, rework them until all are valid
  5. Run sgadmin.sh -cd ... to upload the sanitized files to the cluster (with sgadmin 6)
  6. Take down all your nodes and upgrade ES/SG to 7.1.1
  7. Start all nodes; Search Guard 7 should initialize because it can read valid/sanitized V6 config files
  8. Run sgadmin.sh -migrate to migrate config to V7 (with sgadmin 7)
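
The renaming in step 1 can be scripted. Below is a minimal Python sketch, assuming the retrieved files follow the naming seen earlier in this thread (for example sg_internal_users_2019-Jun-19_16-25-15.yml). The function names are hypothetical, and the pattern should be adjusted if your files are named differently.

```python
import re
from pathlib import Path

# Matches names like sg_internal_users_2019-Jun-19_16-25-15.yml
TIMESTAMPED = re.compile(
    r"^(?P<base>.+)_\d{4}-[A-Za-z]{3}-\d{2}_\d{2}-\d{2}-\d{2}\.yml$"
)


def strip_timestamp(filename):
    """Return the name without the timestamp, or None if it has none."""
    match = TIMESTAMPED.match(filename)
    return f"{match.group('base')}.yml" if match else None


def rename_retrieved_files(folder):
    """Rename every timestamped .yml file in folder; return the new names."""
    renamed = []
    for path in sorted(Path(folder).glob("*.yml")):
        new_name = strip_timestamp(path.name)
        if new_name is None:
            continue  # already has a plain name
        target = path.with_name(new_name)
        if target.exists():
            # Never silently clobber an existing config file.
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
        renamed.append(new_name)
    return renamed
```

Calling rename_retrieved_files on the folder that -r wrote into should leave plain names such as sg_internal_users.yml. It raises rather than overwriting if a plain-named file is already present, so it is safest to run it on a fresh copy of the retrieved files.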