sgadmin won't start

Hello,
I have been doing many deployment tests locally (on my computer) with success. Now I am testing deployment onto our staging environment and have run into a blocker: sgadmin says it connects, but hangs on “waiting for cluster to turn yellow”. Here is my setup:

  • ES 5.5.2
  • SG 5.5.2-16 (no enterprise modules)
  • The ES config is NOT in elasticsearch/config but in a separate folder. Since the cluster existed before, this was already configured.
  • This cluster has 2 nodes, whereas my local test cluster only had one node.
  • I have copied known working keystores to the ES config folder and those seem to work (sgadmin complained when the paths were set incorrectly).

My ES config is as follows:

```yaml
cluster.name: profiles-poc
node.name: poc-profiles-es-01
path.data: /data/elasticsearch
network.host: 0.0.0.0
indices.memory.index_buffer_size: 40%
thread_pool.bulk.queue_size: 1000
discovery.zen.ping.unicast.hosts: ["poc-profiles-es-01", "poc-profiles-es-02"]
discovery.zen.minimum_master_nodes: 2
http.cors.enabled: true
http.cors.allow-origin: '*'
action.auto_create_index: .security,.monitoring*,.watches,.triggered_watches,.watcher-history-*
reindex.remote.whitelist: 10.132.0.*:9200

searchguard.disabled: true
searchguard.ssl.transport.keystore_filepath: node-0-keystore.jks
searchguard.ssl.transport.truststore_filepath: truststore.jks
searchguard.ssl.transport.keystore_password: eznodeks
searchguard.ssl.transport.truststore_password: ezrootts
searchguard.ssl.transport.enforce_hostname_verification: false
searchguard.ssl.http.enabled: false
searchguard.authcz.admin_dn:
  - "CN=admin,OU=Comptes de services,OU=SIEGE Accounts,DC=siege,DC=np6,DC=local"

######## End Search Guard Demo Configuration ########

xpack.graph.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false
xpack.monitoring.enabled: true
xpack.monitoring.exporters:
  id1:
    type: http
    host: ["http://127.0.0.1:9200"]
    auth.username: kibanaserver
    auth.password: kibanaserver
```

Now, about the problem. When I run sgadmin.sh, at the very beginning it says it is connecting to my cluster on port 9300 “…done”. Then it prints an SG license warning, and then something along the lines of “Connecting to cluster ‘profiles-poc’ at localhost:9300 and waiting for YELLOW state”. This is where it hangs.

I’ve read there is a diagnose flag, so I turned that on. Sadly I lost the logs (and had to hand the servers over to other colleagues for now), but the main problem was a “MasterNotDiscoveredException”.

Would you ladies/gentlemen have any ideas on where to start debugging this?

Please and thank you,

Marco.
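For context on the “MasterNotDiscoveredException”: with zen discovery in ES 5.x, the documented recommendation for `discovery.zen.minimum_master_nodes` is (number of master-eligible nodes / 2) + 1, and a master is only elected once a node can see at least that many master-eligible nodes, itself included. The sketch below works through this for the two-node config above. It assumes both nodes are master-eligible (the default, since no `node.master` setting appears in the config); the function names are illustrative only and not part of Elasticsearch.

```python
def recommended_minimum_master_nodes(master_eligible: int) -> int:
    """Quorum recommended for discovery.zen.minimum_master_nodes in ES 5.x: N // 2 + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible: int, minimum_master_nodes: int) -> bool:
    """Roughly: an election only succeeds if a node sees at least
    minimum_master_nodes master-eligible nodes (itself included)."""
    return visible_master_eligible >= minimum_master_nodes


if __name__ == "__main__":
    # Assumption: poc-profiles-es-01 and poc-profiles-es-02 are both master-eligible.
    quorum = recommended_minimum_master_nodes(2)
    print(f"2 master-eligible nodes -> minimum_master_nodes = {quorum}")
    # Node 01 alone, e.g. because node 02 cannot join over the transport layer:
    print("election possible with 1 visible node:", can_elect_master(1, quorum))
    print("election possible with 2 visible nodes:", can_elect_master(2, quorum))
```

So with `discovery.zen.minimum_master_nodes: 2` on a two-node cluster, no master can be elected until both nodes can talk to each other over the transport layer. That would be consistent with a `MasterNotDiscoveredException` if only one node had working Search Guard transport certificates.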

Oh, I want to add something: I only copied the keyfiles to the master node. I don’t know why I stopped there; maybe I thought this sgadmin problem was unrelated and wanted to solve it first.

(In reply to the original post by mcost...@np6.com, Tuesday, September 26, 2017 at 5:26:39 PM UTC+2.)
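Because the certificates have to be present and correct on every node, not only on the master, a quick cross-node check can help catch this kind of gap. The following is a hypothetical helper, not part of Search Guard: given the config directory of each node (for example mounted or copied to one machine), it reads the flat `searchguard.ssl.transport.*` settings from `elasticsearch.yml` and verifies that the referenced keystore and truststore files exist. It assumes, as the bare filenames in the config above suggest, that relative paths are resolved against the node's config directory. It uses a naive line reader rather than a YAML parser, and it does not check whether the Search Guard plugin itself is installed on each node.

```python
import sys
from pathlib import Path

# Settings copied from the config above; values are treated as paths relative to
# the node's config directory (an assumption based on the bare filenames used there).
REQUIRED_KEYS = (
    "searchguard.ssl.transport.keystore_filepath",
    "searchguard.ssl.transport.truststore_filepath",
)


def read_flat_settings(text: str) -> dict[str, str]:
    """Naive reader for top-level 'dotted.key: value' lines; NOT a full YAML parser."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        # Skip blank lines, comments and indented (nested or list) lines.
        if not raw.strip() or raw[0].isspace() or raw.lstrip().startswith("#"):
            continue
        key, sep, value = raw.partition(":")
        if sep:
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def check_node(config_dir: Path) -> list[str]:
    """Return the problems found in one node's config directory (empty list if none)."""
    yml = config_dir / "elasticsearch.yml"
    if not yml.is_file():
        return [f"{yml} not found"]
    settings = read_flat_settings(yml.read_text(encoding="utf-8"))
    problems = []
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        if not value:
            problems.append(f"missing setting {key}")
        elif not (config_dir / value).is_file():
            problems.append(f"{key} points to missing file {config_dir / value}")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name and paths):
    #   python check_sg_nodes.py /mnt/es-01/config /mnt/es-02/config
    for arg in sys.argv[1:]:
        problems = check_node(Path(arg))
        print(f"{arg}: " + ("OK" if not problems else "; ".join(problems)))
```

Running it against both nodes' config folders would have flagged the node that never received the keyfiles.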

I don’t think I fully understand the problem/setup here. What exactly do you mean by “I only copied the keyfiles to the master node”? You first need to make sure that the certificates on all nodes are correct. If they are not, and nodes cannot join the cluster because of wrong certificates, you will end up with a red cluster state. I think this is the case here, since sgadmin prints “waiting for cluster to turn yellow”, which indicates that your cluster state is red.

So to debug:

  • Make sure that the certificates in the key- and truststores (or PEM files, for that matter) are correct on all nodes
  • Start up all your nodes
  • Make sure the nodes can discover each other: monitor the log files and look for any exceptions during startup
  • Once the cluster is up, execute sgadmin

(In reply to mcostantini@np6.com, Tuesday, September 26, 2017 at 5:28:35 PM UTC+2.)
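Jochen's third step (monitoring the log files for exceptions during startup) can be partly automated. The following is a minimal sketch, not an Elasticsearch or Search Guard tool: it counts exception class names in one or more log files, so that for example a `MasterNotDiscoveredException` on one node can be compared with SSL-related exceptions on the other (for instance Netty's `NotSslRecordException`, which typically means plain-text traffic reached a TLS port). The paths are assumptions: in ES 5.x the main log file is usually named after the cluster (here `profiles-poc.log`) inside `path.logs`, but check your installation, especially since this cluster keeps its config in a separate folder.

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable

# Captures the simple class name of any "...Exception" token, with or without a package
# prefix, e.g. "org.elasticsearch.discovery.MasterNotDiscoveredException" -> "MasterNotDiscoveredException".
EXCEPTION_RE = re.compile(r"\b(?:[A-Za-z_]\w*\.)*([A-Za-z_]\w*Exception)\b")


def count_exceptions(lines: Iterable[str]) -> Counter:
    """Count how often each exception class name appears in the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(EXCEPTION_RE.findall(line))
    return counts


def report(log_file: Path) -> Counter:
    """Print exception counts for one log file, most frequent first, and return them."""
    with log_file.open(encoding="utf-8", errors="replace") as fh:
        counts = count_exceptions(fh)
    print(f"== {log_file}")
    for name, n in counts.most_common():
        print(f"{n:6d}  {name}")
    if not counts:
        print("    no exceptions found")
    return counts


if __name__ == "__main__":
    # Pass the log file of every node, e.g. one copied from each server.
    for arg in sys.argv[1:]:
        report(Path(arg))
```

Comparing the output for both nodes side by side makes it easier to spot which node is rejecting the other.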

Your confusion is warranted. I was doing things incorrectly: it had slipped my mind that the configuration and the plugin need to be installed on all nodes. Doing that did the trick.

Many thanks again.

(In reply to Jochen Kressin, Tuesday, September 26, 2017 at 10:24:57 PM UTC+2.)