Receiving signals_unavailable_exception

TL;DR

Receiving signals_unavailable_exception when opening Search Guard Signals in 7.10.1; this did not happen in 7.9.1.

Full Issue

I’ve recently updated our environment to 7.10.1 from 7.9.1. I’m using the official Elastic container images, with the appropriate version of SG installed, deploying to Kubernetes via the Elastic Helm charts.
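
For context, the rollout itself is nothing exotic: the chart values just point at our custom ES-plus-SG image, and the release is rolled out with the official chart, roughly like this (release name, chart version, and values file are placeholders for our setup):

helm repo add elastic https://helm.elastic.co
# values.yaml points image/imageTag at our custom image that bundles the SG plugin
helm upgrade --install elasticsearch elastic/elasticsearch --version 7.10.1 -f values.yaml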

I’m (finally) starting to configure alerts for some of our teams and am seeing the following error when going to Signals:

{
  "error": {
    "root_cause": [
      {
        "type": "signals_unavailable_exception",
        "reason": "Signals is still initializing. Please try again later."
      }
    ],
    "type": "signals_unavailable_exception",
    "reason": "Signals is still initializing. Please try again later."
  },
  "status": 500
}
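
For what it’s worth, the same response should also be retrievable outside Kibana, straight from the Signals REST API, along these lines (the exact endpoint path is my reading of the Signals docs, so double-check it for your SG version; host and credentials are placeholders):

# Hypothetical check against the Signals watch search endpoint for the default (_main) tenant
curl -sk -u admin:admin -X POST "https://elasticsearch:9200/_signals/watch/_main/_search" \
  -H 'Content-Type: application/json' -d '{"query": {"match_all": {}}}'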

I did not receive this error when exploring Signals under previous versions of the SG plugin.

Please have a look at the ES logs. These should contain more details on the problem.

ES logs at INFO level show the following for Signals right after startup. After this, I can’t locate any additional Signals-related output:

{"type": "server", "timestamp": "2021-02-05T02:26:12,381Z", "level": "INFO", "component": "c.f.s.j.c.IndexJobStateStore", "cluster.name": "es", "node.name": "es-master-1", "message": "Scheduler signals/_main is initialized. Jobs: 0 Active Triggers: 0", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }
{"type": "server", "timestamp": "2021-02-05T02:26:12,382Z", "level": "INFO", "component": "o.q.c.QuartzScheduler", "cluster.name": "es", "node.name": "es-master-1", "message": "Scheduler meta-data: Quartz Scheduler (v2.3.2) 'signals/_main' with instanceId 'signals/_main'\n  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.\n  NOT STARTED.\n  Currently in standby mode.\n  Number of jobs executed: 0\n  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 3 threads.\n  Using job-store 'com.floragunn.searchsupport.jobs.core.IndexJobStateStore' - which supports persistence. and is clustered.\n", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }
{"type": "server", "timestamp": "2021-02-05T02:26:12,383Z", "level": "INFO", "component": "o.q.c.QuartzScheduler", "cluster.name": "es", "node.name": "es-master-1", "message": "Scheduler signals/_main_$_signals/_main started.", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }
{"type": "server", "timestamp": "2021-02-05T02:26:13,340Z", "level": "INFO", "component": "c.f.s.j.c.IndexJobStateStore", "cluster.name": "es", "node.name": "es-master-1", "message": "Reinitializing jobs for IndexJobStateStore [schedulerName=signals/admin_tenant, statusIndexName=.signals_watches_trigger_state, jobConfigSource=IndexJobConfigSource [indexName=.signals_watches, jobFactory=com.floragunn.signals.watch.Watch$JobConfigFactory@70c3847d, jobDistributor=JobDistributor signals/admin_tenant], jobFactory=com.floragunn.signals.watch.Watch$JobConfigFactory@70c3847d]", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }
{"type": "server", "timestamp": "2021-02-05T02:26:18,366Z", "level": "INFO", "component": "c.f.s.j.c.IndexJobStateStore", "cluster.name": "es", "node.name": "es-master-1", "message": "Scheduler signals/admin_tenant is initialized. Jobs: 0 Active Triggers: 0", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }
{"type": "server", "timestamp": "2021-02-05T02:26:18,366Z", "level": "INFO", "component": "o.q.c.QuartzScheduler", "cluster.name": "es", "node.name": "es-master-1", "message": "Scheduler meta-data: Quartz Scheduler (v2.3.2) 'signals/admin_tenant' with instanceId 'signals/admin_tenant'\n  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.\n  NOT STARTED.\n  Currently in standby mode.\n  Number of jobs executed: 0\n  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 3 threads.\n  Using job-store 'com.floragunn.searchsupport.jobs.core.IndexJobStateStore' - which supports persistence. and is clustered.\n", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }
{"type": "server", "timestamp": "2021-02-05T02:26:18,366Z", "level": "INFO", "component": "o.q.c.QuartzScheduler", "cluster.name": "es", "node.name": "es-master-1", "message": "Scheduler signals/admin_tenant_$_signals/admin_tenant started.", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }
{"type": "server", "timestamp": "2021-02-05T02:26:18,367Z", "level": "INFO", "component": "c.f.s.j.c.IndexJobStateStore", "cluster.name": "es", "node.name": "es-master-1", "message": "Scheduler signals/admin_tenant is initialized. Jobs: 0 Active Triggers: 0", "cluster.uuid": "816ooLAjQmWfG71KY0DSSA", "node.id": "Sog6C_VJSii5bnPXrc5oYw"  }

Interesting. If Signals is already starting the tenant schedulers, it should be close to finishing the initialization.

The logs mention the tenant selected by default (_main) and the admin_tenant. Do you have any more tenants configured?

Some more questions:

  • What exact version of Search Guard are you trying? If it is 48, would it be possible to try 49?
  • How many nodes do you have in your cluster? What roles (client, data, master) do these nodes have?
  • How are you accessing Signals? Are you using the UI or the REST API?
    • If you are using the UI, do you get the error immediately on the overview page?
    • If you are using the REST API, what is the endpoint you are calling?

No, just one tenant.

We’re currently on SG 48. I’ll need to build the images for SG 49, but will do so, probably on Monday.
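
Our image build is basically the official Elastic image plus a plugin install, so moving to 49 is just a version bump. Roughly, the build runs something like the line below; the zip name is a placeholder, and the real artifact for the matching ES version comes from the Search Guard download page:

# Install the Search Guard Suite plugin into the official ES image during the image build
bin/elasticsearch-plugin install --batch file:///tmp/search-guard-suite-plugin-<es-version>-<sg-version>.zip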

We currently have 34 Kubernetes Pods as nodes in the cluster. Two are coordinator nodes, which only handle requests from Kibana. Three are configured as master nodes. The remainder are data nodes, split across the lifecycle tiers: hot (12), warm (5), and cold (12); we chose to run more nodes rather than give each node more resources.
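
If it’s useful, this is how the role layout the cluster reports can be checked via the _cat API (host and credentials are placeholders):

# List node names with their role letters and which node is the elected master
curl -sk -u admin:admin "https://elasticsearch:9200/_cat/nodes?v&h=name,node.role,master"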

I’m getting this error while using the UI, and yes, I get it as soon as I go to the Signals overview page.

I’ve upgraded my test region to ELK 7.10.2 with SG 49.0.0, and I’m no longer getting this error. I’ve got to wait until tonight to redeploy my production instance and will let you know if it works.

I’ve deployed ELK 7.10.2 with SG 49.0.0 to my production region. Everything appears to be working as expected after this upgrade.

Any idea what the issue was?

SG 48 had a bug where Signals was not correctly initialized on some nodes in certain cluster topologies. The details are a bit complicated to explain in a brief forum post, though 🙂

Has the “Solution” checkbox been removed from the forums? I’d like to accept your solution for the next person who comes along with the same or similar issue.

The Signals category had this feature disabled; this was probably an oversight. I have enabled it now.
