Issue when writing to Elasticsearch using es-hadoop - hadoop

I am getting the exception below when trying to write to Elasticsearch from a MapReduce program using es-hadoop. I am trying to write to index=employee and type=basic, which already exist in my Elasticsearch cluster.
My stack trace:
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No resource ['es.resource'] (index/query/location) specified
    at org.elasticsearch.hadoop.util.Assert.hasText(Assert.java:30)
    at org.elasticsearch.hadoop.mr.EsOutputFormat.init(EsOutputFormat.java:257)
    at org.elasticsearch.hadoop.mr.EsOutputFormat.checkOutputSpecs(EsOutputFormat.java:233)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.mstack.mapreduce.DIGDriver.main(DIGDriver.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Driver class:
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "es-hadoop");
job.setJarByClass(DIGDriver.class);
conf.set("es.nodes", "localhost:9200");
conf.set("es.port", "9200");
conf.set("es.resource", "employee/basic");
job.setNumReduceTasks(0);
job.setOutputFormatClass(EsOutputFormat.class);
job.setMapperClass(DIGMapper.class);
job.setMapOutputValueClass(MapWritable.class);
conf.setBoolean("mapreduce.map.speculative", false);
conf.setBoolean("mapreduce.reduce.speculative", false);
boolean status = job.waitForCompletion(true);
if (status) {
    System.exit(0);
} else {
    System.out.println("Job Failed : Some error!");
    System.exit(1);
}

I resolved it myself by changing the configs so that es.nodes holds only the hostname (the port is supplied separately via es.port):
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");

Related

Java action to access Kerberos Hive (SSL enabled)

I have been trying to access a Hive server (Kerberos and SSL enabled) from a Java action.
I need to perform some Hive operations, such as accessing Hive databases and tables and working with Hive partitions.
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    conf.set("hadoop.security.authorization", "true");

    System.setProperty("java.security.auth.login.config", "C:\\krb\\jaas.conf");
    System.setProperty("sun.security.jgss.debug", "true");
    System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

    String principalConfig = System.getProperty("kerberosPrincipal", "hive/cdlk-tars-control@ABC.COM");
    String keytab = System.getProperty("kerberosKeytab", "C:\\krb\\hive.keytab");

    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(principalConfig, keytab);
    final UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    System.out.println("Login as: " + ugi.getUserName());

    HiveConf hiveConf = new HiveConf();
    hiveConf.setVar(HiveConf.ConfVars.METASTOREURIS, args[0]);
    Hive hive = Hive.get(hiveConf);
    // METASTORE_CLIENT_TIMEOUT is a constant defined elsewhere in the class
    hive.getConf().setIntVar(HiveConf.ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT, METASTORE_CLIENT_TIMEOUT);
    List<String> databases = hive.getAllDatabases();
    // databases.forEach(x -> System.out.println(x));
}
I am able to log in, but unable to reach the Hive metastore:
Login as: hive/cdlk-tars-control@ABC.COM
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:242)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:394)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:338)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:318)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:294)
at com.cdlk.util.hive.HiveTest.main(HiveTest.java:36)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4108)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:254)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:237)
... 5 more
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3727)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3715)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllFunctions(HiveMetaStoreClient.java:2628)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
at com.sun.proxy.$Proxy10.getAllFunctions(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2562)
at com.sun.proxy.$Proxy10.getAllFunctions(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:4105)
... 7 more
JAAS configuration:
com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    useTicketCache=false
    principal="hive/cdlk-tars-control@ABC.COM"
    doNotPrompt=true
    keyTab="C:\krb\hive.keytab"
    debug=false;
};
Please suggest how to access Hive from Java.
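No answer was posted here, but a bare TTransportException from a Kerberized metastore is commonly the client opening a plain Thrift socket instead of negotiating SASL. A hedged sketch of the extra HiveConf settings that typically have to accompany the metastore URI (the property names are standard Hive configuration; the principal value is an assumption for this environment):

HiveConf hiveConf = new HiveConf();
hiveConf.setVar(HiveConf.ConfVars.METASTOREURIS, args[0]);
// make the Thrift client authenticate with SASL/Kerberos
// (hive.metastore.sasl.enabled)
hiveConf.setBoolVar(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL, true);
// the metastore's own service principal (hive.metastore.kerberos.principal);
// "hive/_HOST@ABC.COM" is an assumed value here
hiveConf.setVar(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL, "hive/_HOST@ABC.COM");
Hive hive = Hive.get(hiveConf);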

JobInstanceAlreadyCompleteException when running a Batch-Task in Spring cloud Dataflow

I have a Batch job running as a Task in Spring Cloud Data Flow. When I try to execute the same task definition a second time, I get this exception:
java.lang.IllegalStateException: Failed to execute CommandLineRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:793)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:774)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:335)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1246)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1234)
at com.tigerbooks.importer.ImportTask.main(ImportTask.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={-spring.cloud.task.executionid=3, -spring.datasource.username=dataflow_stage, -spring.datasource.url=jdbc:mysql://10.59.254.101:3306/dataflow_staging, -spring.datasource.driverClassName=org.mariadb.jdbc.Driver, -import.import-audio-content=false, -spring.datasource.password=bVs64CMlKvcTdkRLWL2zNPANYD3HMB, -import.syncfolder=/import-integrationtest, -spring.cloud.task.name=Integration-Test, -spring.profiles.active=ftp01,dev}. If you want to run this job again, change the parameters.
at org.springframework.batch.core.repository.support.SimpleJobRepository.createJobExecution(SimpleJobRepository.java:130)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:338)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:197)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java
My Job configuration looks like this:
@EnableBatchProcessing
public class JobConfiguration {

    @Bean
    public Job importJob() {
        return jobBuilderFactory.get("Import-Products")
                .incrementer(new RunIdIncrementer())
                .flow(step1())
                .next(step2())
                .end()
                .build();
    }
}
As far as I found out, a JobParametersIncrementer is supposed to prevent this exception by adding an incrementing run.id to the job parameters. But in my case I can run the job only once on the server; after that I have to clear the MySQL database, and the run.id shows up nowhere in the database.
As @Michael Minela already guessed, my jobs were started with the wrong Docker image version, so the RunIdIncrementer was not present and had no effect.
After registering a new app with the correct version, everything worked. Thanks.
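For reference, a minimal sketch of what an incrementer in the spirit of RunIdIncrementer does (illustrative, not the Spring Batch source): it bumps a numeric run.id parameter on every launch, so each execution presents distinct identifying JobParameters and never collides with an already-completed JobInstance.

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersIncrementer;

public class SimpleRunIdIncrementer implements JobParametersIncrementer {
    @Override
    public JobParameters getNext(JobParameters parameters) {
        // read the previous run.id (0 on the very first launch) and bump it
        long runId = (parameters == null) ? 0L : parameters.getLong("run.id", 0L);
        return new JobParametersBuilder()
                .addLong("run.id", runId + 1)
                .toJobParameters();
    }
}

This only takes effect if the class is actually on the launched app's classpath, which is exactly why the wrong image version above made the error reappear.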

Unable to connect Spark standalone application with kerberized Hadoop

I am using Spark standalone 1.6.x to connect to Kerberos-enabled Hadoop 2.7.x:
JavaDStream<String> status = stream.map(new Function<String, String>() {
    public String call(String arg0) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        conf.set("hadoop.security.authentication", "kerberos");
        conf.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@REALM");
        UserGroupInformation.setConfiguration(conf);
        // note: only the short name is used here, and "~" is not expanded by the JVM
        UserGroupInformation.setLoginUser(
                UserGroupInformation.loginUserFromKeytabAndReturnUGI("abc", "~/abc.ketyab"));
        System.out.println("Logged in successfully.");
        FileSystem fs = FileSystem.get(new URI(activeNamenodeURI), conf);
        for (FileStatus s : fs.listStatus(new Path("/"))) {
            System.out.println(s.getPath().toString());
        }
        return "success";
    }
});
But I am getting the exception below:
User : abc@REALM (auth:KERBEROS)
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "hostname1/0.0.0.0"; destination host is: "hostname2":8020;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy44.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:295)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy45.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1725)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1668)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1593)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
    at com.abc.HDFSFileWriter.createOutputFile(HDFSFileWriter.java:354)
    ... 21 more
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:643)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 43 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:553)
    at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:368)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:722)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:718)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
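No fix was recorded here, but note that Spark standalone (unlike YARN) distributes no Kerberos delegation tokens, so each executor has to authenticate itself, and the HDFS calls have to run under the logged-in identity. A hedged sketch of that pattern (hostnames, principal, and keytab path are placeholders, not the poster's confirmed values):

final Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
conf.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@REALM");
UserGroupInformation.setConfiguration(conf);

// log in with the FULL principal (including the realm) and an absolute
// keytab path -- the JVM does not expand "~" the way a shell does
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "abc@REALM", "/home/abc/abc.keytab");

// run the HDFS calls under the logged-in identity so the RPC layer
// finds Kerberos credentials on the calling thread
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode:8020"), conf);
        for (FileStatus st : fs.listStatus(new Path("/"))) {
            System.out.println(st.getPath().toString());
        }
        return null;
    }
});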

Hadoop: Cascading FlowException

I installed Hadoop 1.0.4 and Hive 0.12. When I run the Cascading Pattern example on this setup, it throws a Cascading flow exception. I run it with the following hadoop command:
hadoop jar bulid/libs/pattern-example*.jar
I am getting the above-mentioned exception; for reference, I include the Cascading code:
Tap inputTap = new Hfs(new TextDelimited(true, "\t"),
        "hdfs://hdmaster:54310/user/hive/warehouse/temp/Dataformated/finalformated");
String classifyPath = outputPath;                    // the output path
String hdfsPath = classifyPath + "/" + pmmlFileName; // <classifyPath>/<PMML file name>
Tap classifyTap = new Hfs(new TextDelimited(true, "\t"), hdfsPath);
String formatLocalHdfsData = hdfsPath;               // unused in this snippet

// pmmlPlanner and flowConnector are set up elsewhere
FlowDef flowDef = FlowDef.flowDef().setName("classify")
        .addSource("input", inputTap) // input is an Lfs or Hfs tap
        .addSink("classify", classifyTap);
flowDef.addAssemblyPlanner(pmmlPlanner);

Flow classifyFlow = flowConnector.connect(flowDef);
classifyFlow.writeDOT("dot/classify.dot");
classifyFlow.complete();
Cascading flow exception:
Exception in thread "main" cascading.flow.FlowException: step failed: (1/1) ...eg_Nocoerce20150513093050, with job id: job_201505130921_0003, please see cluster logs for failure messages
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:221)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:149)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Log file exception:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: cascading.flow.FlowException: internal error during mapper configuration
at cascading.flow.hadoop.FlowMapper.configure(FlowMapper.java:99)
... 14 more
Caused by: java.io.InvalidClassException: cascading.tap.hadoop.Hfs; local class incompatible: stream classdesc serialVersionUID = -2723557385578774808, local class serialVersionUID = -4246440312226820384
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:560)
Please help me to solve this issue.
I resolved the issue myself: the log file showed a serialVersionUID compatibility problem between the cascading.tap.hadoop.Hfs class in my jar and the one on the cluster. I regenerated the serialVersionUID so both sides matched, and it worked.
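For context, a short illustration of what that InvalidClassException means: Java computes a default serialVersionUID from the class bytes, so two differently built versions of cascading.tap.hadoop.Hfs refuse to deserialize each other's objects. Pinning an explicit serialVersionUID (or simply using the exact same Cascading jar on the client and the cluster) keeps serialization compatible. The class below is purely illustrative, not part of Cascading:

import java.io.Serializable;

public class ExamplePayload implements Serializable {
    // pinning the UID stops recompilation from changing the computed
    // default, which is what triggers "local class incompatible"
    private static final long serialVersionUID = 1L;

    public String value;
}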

org.apache.hadoop.net.StandardSocketFactory not found

configuration = new Configuration();
configuration.set("fs.default.name",NAME_NODE_URL);
hdfs = FileSystem.get(configuration);
I am getting the exception below while using the code above:
java.lang.RuntimeException: Socket Factory class not found: java.lang.ClassNotFoundException: Class org.apache.hadoop.net.StandardSocketFactory not found
at org.apache.hadoop.net.NetUtils.getSocketFactoryFromProperty(NetUtils.java:142)
at org.apache.hadoop.net.NetUtils.getDefaultSocketFactory(NetUtils.java:122)
at org.apache.hadoop.net.NetUtils.getSocketFactory(NetUtils.java:100)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:477)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
at com.arista.cvp.commons.db.HdfsClient.copyfromLocaltoHdfs(HdfsClient.java:55)
at com.arista.cvp.services.hadoop.HDFSService.copyFromLocal(HDFSService.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
Could anyone help in resolving the issue?
You definitely need either the hadoop-common 2.x jar or the hadoop-core 1.x jar on the classpath!
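A quick way to verify the fix (a hypothetical helper, assuming it is run with the same classpath as the failing code): load the class reflectively and print which jar it came from.

public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        // throws the same ClassNotFoundException as the failing job when
        // neither hadoop-common (2.x) nor hadoop-core (1.x) is present
        Class<?> c = Class.forName("org.apache.hadoop.net.StandardSocketFactory");
        System.out.println("Loaded " + c.getName() + " from "
                + c.getProtectionDomain().getCodeSource().getLocation());
    }
}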
