During some work on my ELK cluster, the .signals_accounts and .signals_watches indices each lost a primary shard and its replica, which put the cluster into red state. I tried deleting the shards as the admin user, but it doesn’t have the required permissions.
I’m using DigiCert certs for HTTPS (port 9200) access and self-signed certs for the transport API (port 9300). I generated an sgadmin cert with sgtlstool (using the same config I use for generating the transport API certs) and set its DN under searchguard.authcz.admin_dn. When I run the command from the ticket I linked earlier, I get an error:
http: error: SSLError: HTTPSConnectionPool(host='es1.domain.tld', port=9200): Max retries exceeded with url: /.signals_settings (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate unknown (_ssl.c:2548)'))) while doing a DELETE request to URL: https://es1.domain.tld:9200/.signals_settings
Command I ran: http --cert=sgadmin.pem --cert-key=sgadmin.key DELETE "https://es1.domain.tld:9200/.signals_settings"
I can see the following error in elasticsearch:
[2022-11-10T22:05:32,431][ERROR][c.f.s.s.h.n.SearchGuardSSLNettyHttpServerTransport] [es1] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
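"PKIX path building failed" generally means the node could not build a chain from the presented client certificate to any CA in its HTTP truststore. A minimal sketch of checking this offline with openssl follows. The helper name chains_to and the file name http_trusted_cas.pem are hypothetical; the latter stands for whatever CA file the node's searchguard.ssl.http.pemtrustedcas_filepath setting points to.

```shell
# Return success only if cert_file chains up to a CA contained in ca_file.
# chains_to is a hypothetical helper name, not part of any Search Guard tool.
chains_to() {
  ca_file=$1
  cert_file=$2
  openssl verify -CAfile "$ca_file" "$cert_file" >/dev/null 2>&1
}
```

Usage, assuming the file names above: chains_to http_trusted_cas.pem sgadmin.pem && echo "trusted" || echo "not trusted by this CA file". If the admin cert has an intermediate CA, that intermediate must also be in the CA file for this check to pass.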
curl -XGET "https://es1.domain.tld:9200/" -v \
--key "sgadmin.key" \
--cert "sgadmin.pem"
Enter PEM pass phrase:
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to es1.arnes.si:9200
I triple-checked admin_dn and it’s a match.
Regarding es1_https.crt, it is not signed by the same root CA as sgadmin.pem. es1_https.crt is signed by a public CA, whereas sgadmin.pem is internally signed, like our transport certs (for port 9300).
Do the admin certs need to be signed by the same root CA as the HTTPS certs?
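For anyone repeating the admin_dn comparison, here is a small sketch that prints a certificate's subject DN in RFC 2253 form, which is the comma-separated style normally used for admin_dn entries. The helper name cert_subject_dn is made up for illustration.

```shell
# Print a certificate's subject DN in RFC 2253 form, without the "subject=" prefix.
# cert_subject_dn is a hypothetical helper name.
cert_subject_dn() {
  openssl x509 -in "$1" -noout -subject -nameopt RFC2253 | sed 's/^subject= *//'
}
```

Usage: cert_subject_dn sgadmin.pem, then compare the printed string character by character with the entry under searchguard.authcz.admin_dn. Note that a matching DN only matters once the TLS handshake itself succeeds.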
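A quick way to see the mismatch described here is to compare the issuer DNs of the two files. This is only a rough sketch: the helper names are hypothetical, openssl x509 reads just the first certificate in each file, and a public server cert is usually issued by an intermediate rather than the root, so differing issuers do not by themselves prove different roots.

```shell
# Print the issuer DN of the first certificate in a PEM file.
cert_issuer() {
  openssl x509 -in "$1" -noout -issuer
}

# Succeed only when both certificates name the same issuer DN.
# same_issuer is a hypothetical helper name.
same_issuer() {
  [ "$(cert_issuer "$1")" = "$(cert_issuer "$2")" ]
}
```

Usage: same_issuer es1_https.crt sgadmin.pem && echo "same issuer" || echo "different issuers".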
When running without --insecure, I get curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to es1.domain.tld:9200, but when using --insecure, the error is curl: (56) LibreSSL SSL_read: error:1404C416:SSL routines:ST_OK:sslv3 alert certificate unknown, errno 0.
Logging in with the internal user admin works if I access the / endpoint, but I can’t delete the index with this user (I get 403 Forbidden).
curl --insecure -XDELETE "https://es1.example.tld:9200/.signals_settings" -v -u "admin:xxxxx"
{"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [] and User [name=admin, backend_roles=[], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [] and User [name=admin, backend_roles=[], requestedTenant=null]"},"status":403}
Success. A new cert signed by the root CA of the HTTPS endpoint did the trick.
Funnily enough, I still had to run with the --insecure flag. Running without it returned curl: (56) LibreSSL SSL_read: error:1404C416:SSL routines:ST_OK:sslv3 alert certificate unknown, errno 0.