How does Search Guard initialize on a dedicated master node before any data nodes can join?

I have a fairly fundamental question: how does Search Guard work with dedicated master nodes in a full cluster restart scenario? By dedicated master node, I mean “node.data: false” and “node.ingest: false”, i.e. the master node will not store any indices and will not do any ingestion.

I had SG7 + ES 7 + Kibana 7 working fine in single-node mode in Docker containers, so I have had some success and have a basic understanding of how SG should work with ES/Kibana 7.

What I’m seeing with dedicated master nodes is this: right after each master node starts, it first passes the master bootstrap check, and then the masters wait for the Search Guard plugin to finish its own initialization before they start accepting any incoming requests on 9200/9300.

SG can’t finish initialization until it has created the searchguard index successfully. However, at this point there is no data node available, and there never will be, since data nodes can’t connect to the dedicated master nodes before SG is fully initialized.

Does this sound like a deadlock? Is there an option to let SG initialize lazily, so that at least some data nodes can join and the searchguard index can be created then?

Do we have to make the master a data node as well, and so forget about the idea of running dedicated master nodes? Could the SG experts share some insight, please?

Also, when I searched for similar topics, I saw some people say that setting “cluster.routing.allocation.enable: none” during cluster startup might help. Is this true? What is the rationale behind it? I know the SG installation guide says this flag is optional and that it prevents big shards from moving around during a cluster restart.

However, we are talking about a situation with no data node at all, where neither the master nor SG can finish initialization.

Thanks,

[elasticsearch@elasticsearch-master-7cfbb7d4c-4rhfd ~]$ ./searchguard_init.sh
Populating and initializing SearchGuard …
Search Guard Admin v7
Will connect to localhost:9300 … done
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.bouncycastle.jcajce.provider.drbg.DRBG (file:/usr/share/elasticsearch/plugins/search-guard-7/bcprov-jdk15on-1.61.jar) to constructor sun.security.provider.Sun()
WARNING: Please consider reporting this to the maintainers of org.bouncycastle.jcajce.provider.drbg.DRBG
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Connected as CN=admin.elasticsearch.com
Elasticsearch Version: 7.1.0
Search Guard Version: 7.1.0-35.0.0
Contacting elasticsearch cluster ‘elasticsearch’ and wait for YELLOW clusterstate …
Clustername: elasticsearch2
Clusterstate: GREEN
Number of nodes: 1
Number of data nodes: 0
searchguard index does not exists, attempt to create it … done (0-all replicas)
Populate config from /usr/share/elasticsearch/plugins/search-guard-7/sgconfig/
Will update ‘_doc/config’ with /usr/share/elasticsearch/plugins/search-guard-7/sgconfig/sg_config.yml

FAIL: Configuration for ‘config’ failed because of UnavailableShardsException[[searchguard][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[searchguard][0]] containing [index {[searchguard][_doc][config], source[n/a, actual length: [3.7kb], max length: 2kb]}] and a refresh]]
Will update ‘_doc/roles’ with /usr/share/elasticsearch/plugins/search-guard-7/sgconfig/sg_roles.yml

elasticsearch.yml configuration:

    searchguard.disabled: false
    searchguard.enterprise_modules_enabled: true
    searchguard.allow_default_init_sgindex: false

    searchguard.ssl.transport.pemcert_filepath: node.pem
    searchguard.ssl.transport.pemkey_filepath: node.key
    searchguard.ssl.transport.pemkey_password: ${ELASTICSEARCH_NODE_PKPASSWORD:}
    searchguard.ssl.transport.pemtrustedcas_filepath: root-ca.pem
    searchguard.ssl.transport.enforce_hostname_verification: false
    searchguard.ssl.transport.resolve_hostname: false
    searchguard.ssl.http.enabled: false

    searchguard.nodes_dn:
      - CN=node.elasticsearch.com
    searchguard.authcz.admin_dn:
      - CN=admin.elasticsearch.com
    searchguard.audit.type: internal_elasticsearch
    searchguard.check_snapshot_restore_write_privileges: true
    searchguard.enable_snapshot_restore_privilege: true
    searchguard.roles_mapping_resolution: MAPPING_ONLY
    searchguard.restapi.roles_enabled: ["SGS_ALL_ACCESS"]

Cluster formation (cluster bootstrapping) is possible without Search Guard being initialized. That means discovery and master election can happen even when SG is not initialized. Technically this works because the nodes trust each other: every node is configured with an SSL node certificate, so node-to-node communication, which is what cluster bootstrapping relies on, is always unrestricted. Run sgadmin after the cluster is fully bootstrapped, meaning all nodes are up and a master is elected. Do not set cluster.routing.allocation.enable to none. If you have trouble getting the cluster bootstrapped (especially with ES 7), first try without Search Guard installed to make sure cluster.initial_master_nodes and discovery.seed_hosts are set correctly. Please refer to Bootstrapping a cluster for more details.

That said, it absolutely makes sense to have at least three dedicated master nodes in a production cluster.
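As a rough illustration of the settings mentioned above, a dedicated master node's elasticsearch.yml on ES 7.1 might look like the sketch below. The node names and host names are hypothetical placeholders, not values from this thread; only the setting keys come from the discussion and from Elasticsearch itself.

```yaml
# Sketch of elasticsearch.yml for one dedicated master node (ES 7.1-style role flags).
# All node names and host names below are hypothetical placeholders.
node.name: es-master-0
node.master: true     # master-eligible
node.data: false      # holds no shards
node.ingest: false    # runs no ingest pipelines

# Must list the node.name values of the master-eligible nodes
# that take part in the very first election.
cluster.initial_master_nodes:
  - es-master-0
  - es-master-1
  - es-master-2

# Addresses the node contacts to discover the rest of the cluster.
discovery.seed_hosts:
  - es-master-0.example.internal
  - es-master-1.example.internal
  - es-master-2.example.internal
```

With a layout like this, the masters can form the cluster on their own, data nodes can join over the transport port, and sgadmin is run only after that.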

cluster.routing.allocation.enable is a separate topic that is not related to initial cluster bootstrapping. It probably becomes important when you perform a rolling upgrade of Elasticsearch or Search Guard (or when you have to apply a critical OS patch and need to restart the physical machine). The upgrade process is documented here. As far as Search Guard is concerned, if you run sgadmin (which is normally not needed for a rolling upgrade unless you upgrade to a new major version, such as from ES 6 to ES 7), make sure cluster.routing.allocation.enable is not set to none, since that value blocks all shard allocation, including for the searchguard index.

Thanks @cstaley for the comprehensive answer above.

Yeah, we did make sure cluster.initial_master_nodes and discovery.seed_hosts were set correctly, and we WERE able to create a multi-node ES 7 cluster (with dedicated master nodes: no data, no ingestion). We did it with SG 7 installed but disabled by default. This is our baseline and it has been solid.

The ES cluster bootstrapped fine, master/ingest/data nodes all came up nicely.

Right now, I’m focusing on getting dedicated master nodes to work with SG7 enabled.

I think the actual problem I’m still facing is that we haven’t found a proper readinessProbe to indicate that the ES masters have started correctly (even though SG7 is not fully initialized). Once the readinessProbe passes, K8s will allow the ES7+SG7 pod to start accepting connections from other nodes, even before SG7 is fully initialized.

So far, I have tried:
http://:9200, which returns 503 “Search Guard not initialized (SG11)”
http://:9300, which returns 000

Maybe in the readinessProbe I should use some sgadmin.sh option to check that SG7 is up and running, just not fully initialized because it is waiting for at least one data node to join.

Does the SG team have a recommendation here? With that, I think we will be all set.

Thanks! Cheers!

I think the only thing that works is:

        readinessProbe:
          tcpSocket:
            port: 9300
          initialDelaySeconds: 20
          periodSeconds: 10

That’s what we use in our helm chart: https://github.com/floragunncom/search-guard-helm/blob/master/sg-helm/templates/master-statefulset.yaml
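For contrast, here is a sketch of the kind of HTTP probe that does not work in this situation. It assumes the REST port 9200 and the root path; per the report earlier in this thread, that port answers 503 until Search Guard is initialized, and Kubernetes treats only status codes from 200 to 399 as a successful httpGet probe.

```yaml
# Anti-pattern sketch, shown only for comparison with the tcpSocket probe above.
# While Search Guard is uninitialized, port 9200 answers 503 (SG11), so this
# probe fails and the pod stays NotReady, which keeps other nodes from
# reaching it through a readiness-gated Service.
readinessProbe:
  httpGet:
    path: /
    port: 9200
  initialDelaySeconds: 20
  periodSeconds: 10
```

The tcpSocket probe avoids this because it only checks that the transport port accepts a TCP connection, which is already the case before Search Guard finishes initializing.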
