Search Guard Watch doesn’t appear to be executing, receiving “failed to fetch” in Kibana UI.
Application Version
Elastic Stack 7.10.2 with SG 49.0.0, running in EKS on stock Elastic images, with Search Guard installed.
The Issue
I’ve configured a Watch in Signals through the Kibana Plugin UI with the following check. The objective of the Watch is simply to get the number of logged documents from the previous 15 minutes and to send an alert if none have been received, which would indicate that the application is most likely down.
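For illustration, the check is roughly of this shape; the index pattern, timestamp field, and names below are placeholders rather than my exact configuration:
"checks": [
  {
    "type": "search",
    "name": "doc_count",
    "target": "doc_count",
    "request": {
      "indices": ["my-app-logs-*"],
      "body": {
        "size": 0,
        "query": {
          "range": {
            "@timestamp": { "gte": "now-15m" }
          }
        }
      }
    }
  },
  {
    "type": "condition",
    "name": "no_docs_received",
    "source": "data.doc_count.hits.total.value == 0"
  }
]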
I do not currently have an action configured for this watch, as I’m just trying to test the watch, for now.
I’m not seeing anything in either the Elasticsearch or Kibana logs, which might indicate what the issue is.
Update
I don’t know whether this has anything to do with it, but I decided to try to create an Elasticsearch action, so that I could see what was being output by this alert. I ended up getting a 401/Unauthorized error in the UI while SG was trying to retrieve the list of indices, and then I got this in the Elasticsearch log:
Cannot retrieve roles for User [name=me@example.com, backend_roles=[SG_ADMIN, SG_USER], requestedTenant=null] from ldap due to ElasticsearchSecurityException[ElasticsearchSecurityException[No user me@example.com found]]; nested: ElasticsearchSecurityException[No user me@example.com found];
I’m uncertain why, as I should have full admin options, and I appear to be able to save watches.
When exactly do you get the “failed to fetch” error message? When trying to manually execute the watch from the Kibana plugin?
For both problems, we would need more information from the Elasticsearch logs. In the case of manually executing a watch, you can also check the browser console for additional details.
Regarding the LDAP error: Can you also please post your complete sg_config.yml configuration? Remember to redact passwords. While redacting, please double check that LDAP passwords are correct, both in the authc section and the authz section.
Also, how are you assigning roles to users? In the log message, the user has the roles SG_ADMIN and SG_USER. There are two ways: You can use the role mapping feature to directly map user names to roles. Alternatively, you can assign roles coming from LDAP to Search Guard roles.
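For example, a sketch of an sg_roles_mapping.yml entry that maps both an LDAP backend role and a user name directly to a Search Guard role (role and user names are taken from your log message and may need adjusting):
SGS_ALL_ACCESS:
  backend_roles:
  - "SG_ADMIN"
  users:
  - "me@example.com"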
No. I get it after I’ve configured the watch, while I’m still connected during that session. I have the watch set to execute every 10m, and I’m getting it about every 10m, so my assumption is that they’re connected, especially since it goes away when I disable the watch.
How do I manually execute a watch? Will this allow me to see what’s being returned by Elasticsearch, for troubleshooting purposes?
What log settings do you need me to enable to get the correct information?
We use ADFS as our primary authentication. With the sole exception of this error, we’re not receiving any regular errors. LDAP is configured as a secondary authentication, so that users are able to authenticate directly to Elasticsearch to execute queries, etc. with the same permissions as they have when using the Kibana UI. Since we’re using EKS, we follow best practice and mount all secrets as environment variables in the pods. All authentication (ADFS & LDAP) is working as expected, with the exception of this issue, so passwords should be correct.
I’ve created a few additional roles/mappings since then (I probably need to pull a new copy of everything for backup, in fact), but again, all these are working.
What I do not see defined, when I look at defined roles in the UI (including system roles) are the SG_USER or SG_ADMIN roles. Is this something I need to create?
Update
I just checked the 49.0.0 initial config files and don’t see the SG_USER or SG_ADMIN roles defined in them, either.
So…figured out how to manually execute a watch (I can’t believe I missed the button), and with the help of that, I’ve debugged the check. It turns out that it was failing, and the filter was causing the issue. The following watch:
However, based on the trigger section, I assume that my trigger is failing. I get the same results if I set data.constants.threshold: 10000. Any ideas about that? And since this has forked into two separate questions, please let me know if it makes more sense to split them into two separate posts.
The properties here are null because the watch was run via manual execution; thus, it has no schedule in this case.
If you want to learn about the status of the automatically triggered watch executions, go to the overview page of Signals and click on the “Execution History” icon on the left side of the watch:
However, looking at the watch definition, I just notice that the watch is missing a trigger definition. Triggers define a schedule when a watch is automatically executed:
It is okay for watches to have no trigger. But then, you have to trigger them externally via the API. This is for cases where you want to “push” events from external systems into Signals:
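For example, assuming a watch named my_watch in the default (_main) tenant, an external system could trigger it with a request along these lines (watch name and tenant are placeholders):
POST /_signals/watch/_main/my_watch/_execute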
Yes, this is a good idea. I’ll start a new thread for the LDAP-related question:
I’m defining the trigger definition through the Kibana UI. This is the trigger, as defined in the UI:
However, even when enabled, I’m not seeing anything in the execution history. Note that right now I don’t have an action defined. Does the execution history only show up if there is an action defined?
However, as of this morning, this particular watch has been scheduled since Thursday or Friday, and I’m seeing the following, when I look at the execution history:
We probably need to have a look at the raw watch definition. For this, can you please open the following URI directly on the Elasticsearch endpoint (i.e., bypassing Kibana)?
Replace localhost with the hostname of your ES endpoint. This is assuming that you are using the default tenant. If you are using another tenant, replace _main with the name of the particular tenant.
This should give you a JSON version of the whole watch definition.
Additionally, it would be helpful to read out the internal watch state. This can be done by appending the suffix /_state to the URI mentioned above, i.e.:
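Assuming the watch is named my_watch and lives in the default tenant, the two URLs would look roughly like this:
https://localhost:9200/_signals/watch/_main/my_watch
https://localhost:9200/_signals/watch/_main/my_watch/_state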
Note that this gives you access to all log entries, including those produced by other tenants. If you want to restrict this, you would need to use document level security.
This is not well documented; I’ll add an issue to improve the documentation in this regard.
This is very strange indeed. You are right, as a user with the role SGS_ALL_ACCESS you should have access to the index anyway.
To get the whole picture, we should check whether we can access the log in any way directly from ES. For this, can you open directly in ES this URL and post the results?
https://localhost:9200/*signals*/_search?size=100
Also, it would be useful to see some logs at the time a watch has been executed.
For this, it would be good if you could increase the Signals log level. On a running Elasticsearch cluster, this can be done with an HTTP PUT request:
PUT /_cluster/settings
{
  "transient": {
    "logger.com.floragunn.signals": "DEBUG"
  }
}
If you have curl installed, you can send this request by using this command:
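A sketch (certificate and port options may need adjusting for your setup):
curl -k -u admin -X PUT "https://your-es-host:9200/_cluster/settings" \
  -H "Content-Type: application/json" \
  -d '{"transient": {"logger.com.floragunn.signals": "DEBUG"}}'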
Please replace admin with your user name and your-es-host with an endpoint of your ES cluster.
Then, the logs generated around the time the watch was executed would be interesting. You can find out the exact timestamp of the last execution using the _state REST API I mentioned in an earlier post:
Thank you for the update. Unfortunately, the problem remains mysterious for now. Thus, we need to do some more investigations.
Would it be possible to check whether there are more log messages directly after the one you cited? They might be helpful, even if they seem unrelated at first sight.
Please also check your config/elasticsearch.yml. Does it contain a property named signals.index_names.log? By default, it should not.
There is one more thing you can try to rule out problems with the cluster topology. By default, Signals distributes the watch execution over all nodes for load-balancing purposes. However, it is possible to restrict this to certain nodes. With this command, you restrict the watch execution for the _main (default) tenant to master nodes:
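A sketch of what I mean, assuming the tenant._main.node_filter key of the Signals settings endpoint and standard node filter syntax (please double-check both against the Signals administration documentation):
PUT /_signals/settings/tenant._main.node_filter
"master:true"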
The setting takes effect without any further action. Just wait for the next execution of the watch and then check whether the .signals_log_* index is created.
If this does not help either, it might be worth trying to change the index name of the log index. To do so, execute this command:
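For example, using the watch_log.index key of the Signals settings endpoint (the index name here is only an example):
PUT /_signals/settings/watch_log.index
"my_signals_log"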
Note that this setting does not get picked up by the Kibana plugin right now, so Kibana won’t be able to display log entries. This is just for testing.
Here are all log entries on that node, starting from the most recent watch execution through the current end of the logfile. At the point where I grabbed this, there had not been any events written to the log for about 5 min.
Note that there appears to be an error in there from c.f.s.w.r.WatchLogIndexWriter, which (of course) notes that there is no .signals_log_<now/d> index.
It is not. Since we’re running in EKS, I also confirmed that I didn’t set this elsewhere in the Helm values.
Did this. Got the identical error on the master node.
This worked. I had to create it as sg7-signals-log, however, since I have index auto-creation disabled. My exceptions to this are indices matching the following patterns (you can see that .signals_log_* should’ve autocreated without any issues, and other indices matching that pattern are created as expected):
However, when I go to look at indices or the settings for this specific index in the Kibana UI, I now get a “forbidden” message (please see the screenshot, below). I don’t get that with any other index, including the .signals_* indices.
I tried to create an index under ILM with a rollover alias to see whether that would work and populate in a way that lets me view the data. As part of this, I attempted to delete the sg7-signals-log index. However, SG appears to have created an index that even I, as the admin with SGS_ALL_ACCESS, can’t delete. Here’s the error I get:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "no permissions for [indices:admin/delete] and User [name=me@example.com, backend_roles=[SG_ADMIN, SG_USER], requestedTenant=null]"
      }
    ],
    "type" : "security_exception",
    "reason" : "no permissions for [indices:admin/delete] and User [name=me@example.com, backend_roles=[SG_ADMIN, SG_USER], requestedTenant=null]"
  },
  "status" : 403
}
Since I manually created an index with ILM rollover and told it to populate to that rollover alias, I can now see the events populating into that index. Here’s a sample, as executed from the Kibana console (note: sg-signals-log is the log alias, which is also what I set watch_log.index to):
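The query itself was nothing more than a plain search against the alias, roughly like this (hits omitted here):
GET sg-signals-log/_search?size=10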
So my question is this: Is it possible that watch_log.index was previously undefined, and that this was the reason the index wasn’t being created? If so (or even just to test what happens), can you give me the command that will allow me to set watch_log.index to what the default value should be (not just reset it to the default, in case that default was blank for me)?
If this can’t be done, is there any way I can set SG to use this value for watch_log.index for everything, including Kibana, as a workaround?
…and how do I delete the sg7-signals-log index, since I don’t seem to have any permissions to it?
watch_log.index: The name of the watch log index. Defaults to <.signals_log_{now/d}> which starts a new index every day. If you expect a higher throughput which makes it necessary to start an index more often, you can use another date math expression.
However, the next time it ran, I got the following error in the log when it attempted to create the index:
"stacktrace": [
"org.elasticsearch.transport.RemoteTransportException: [elk-es-master-2][10.229.24.58:9300][indices:admin/auto_create]",
"Caused by: org.elasticsearch.indices.InvalidIndexNameException: Invalid index name [.signals_log_{now/d}], must not contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]",
...
So, reading more closely and following the example given on the date math web page, I executed it again as the following:
"stacktrace": [
"org.elasticsearch.index.IndexNotFoundException: no such index [%3C.signals_log_%7Bnow%2Fd%7D%3E] and [action.auto_create_index] ([.*,sg7-*,searchguard,searchguard*,ilm-history-*]) doesn't match",
Based on this, how do I give this a setting using date math, if that’s the issue?
I was able to do this by logging into one of the nodes and executing the following, where $SG_ADMIN_USER is Search Guard’s internal admin user (evidently SGS_ALL_ACCESS doesn’t have any perms on the sg7-* indices; I couldn’t delete the index using my regular admin user, even from the command line, and I get the forbidden message when trying to look at any of the sg7-* indices):
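Roughly (auth and certificate options trimmed for this post):
curl -k -u "$SG_ADMIN_USER" -X DELETE "https://localhost:9200/sg7-signals-log"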
By default, Signals writes logs to indexes specified by the index expression <.signals_log_{now/d}>. This means that every day, a new index is created with the current date appended as a suffix to the index name.
As such indexes are created on demand via auto creation, they are not created when auto creation is disabled.
If you want to use that index name, just add .signals_log_* to the action.auto_create_index setting.
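For example, merging the pattern into the existing list from your earlier error message (keep your own patterns as they are; note the leading dot, since the default index name starts with one):
PUT /_cluster/settings
{
  "persistent": {
    "action.auto_create_index": ".*,sg7-*,searchguard,searchguard*,ilm-history-*,.signals_log_*"
  }
}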
To restore the default of the index name setting, just do this:
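Assuming the same watch_log.index settings key as above, a sketch that sets it explicitly back to the documented default (which also covers setting it to the default value rather than just clearing it):
PUT /_signals/settings/watch_log.index
"<.signals_log_{now/d}>"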
If you don’t want to use date math for the log index, it should be possible to use any index name starting with .signals_log_ instead. As the Kibana plugin reads the index using a wildcard, it should also be able to access such an index.