Using ES-Hadoop with a Search Guard-enabled cluster

Hello,

I know this is a long shot, but has anyone successfully used the official Elasticsearch-Hadoop connector with Search Guard enabled?
I’ve been trying with Pig and I can’t get it to work. At first I got an LDAP error, so I switched to the always-succeed backend “com.floragunn.searchguard.authentication.backend.simple.AlwaysSucceedAuthenticationBackend”.

The LDAP error was:

ERROR pigstats.PigStats: ERROR 0: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: AuthException[org.apache.directory.api.ldap.model.exception.LdapAuthenticationException: ]; nested: LdapAuthenticationException;

With the LDAP backend disabled, I get a different error, which I can’t seem to solve:

[main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 0: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cluster state volatile; cannot find node backing shards - please check whether your cluster is stable

and in the Elasticsearch logs:

[WARN ][com.floragunn.searchguard.filter.SearchGuardActionFilter] Cannot determine types for indices:admin/exists (class org.elasticsearch.action.admin.indices.exists.indices.IndicesExistsRequest) due to types method not found

My Pig job is:

REGISTER /root/elasticsearch-hadoop-pig-2.1.0.jar

A = LOAD '/files/test.json' USING PigStorage() AS (json:chararray);
STORE A INTO 'testingindex/test' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true', 'es.net.http.auth.user=simon', 'es.net.http.auth.path=simon', 'es.nodes=192.168.1.1', 'es.nodes.discovery=false');

Let me know if you have any ideas. Thank you.

I’ve changed my configuration to:

REGISTER /root/elasticsearch-hadoop-pig-2.1.0.jar

A = LOAD '/files/test.json' USING PigStorage() AS (json:chararray);
STORE A INTO 'testingindex/test' USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true', 'es.net.http.auth.user=simon', 'es.net.http.auth.path=simon', 'es.nodes=192.168.1.1', 'es.nodes.discovery=false', 'es.nodes.client.only=true');

I added es.nodes.client.only=true because without it the job seems to try to connect to every node, and I only want it to connect to the nodes listed in the es.nodes parameter.

Now I get a different error, this time about client-only routing:

15/08/14 16:19:53 ERROR grunt.GruntParser: ERROR 0: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Client-only routing specified but no client nodes with HTTP-enabled available; node discovery is disabled and none of nodes specified fits the criterion [192.168.1.1:9200]
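
One thing I have not ruled out yet: both scripts above pass the password as es.net.http.auth.path, while as far as I can tell the connector's documented key is es.net.http.auth.pass. If that is right, the password was never sent, which might explain the authentication failure. Below is a sketch of the first job with only that key changed. The value <password> is a placeholder, not a real credential, and I left out es.nodes.client.only since it led to the error above. I have not confirmed that this resolves either error.

```pig
REGISTER /root/elasticsearch-hadoop-pig-2.1.0.jar

-- Load each line of the file as one raw JSON document
A = LOAD '/files/test.json' USING PigStorage() AS (json:chararray);

-- Same settings as the first job, except the password key is
-- es.net.http.auth.pass; <password> is a placeholder to replace
STORE A INTO 'testingindex/test' USING org.elasticsearch.hadoop.pig.EsStorage(
    'es.input.json=true',
    'es.net.http.auth.user=simon',
    'es.net.http.auth.pass=<password>',
    'es.nodes=192.168.1.1',
    'es.nodes.discovery=false');
```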

Thank you, have a great day.