My Inputformat on hive cannot find when calling MapReduce - hadoop

I'm using CDH cluster.
After writing an Inputformat, I copied it to all HiveServer2 libs and re-start HiveServer2.
When I create a table using my inputformat and write a common query like 'select *', it can run and everything is right!
BUT!
When I write a query which should using MapReduce like 'where' or 'join' or 'count', it will throw a Class Not Found error:
Error: java.io.IOException: cannot find class
com.my.parquet.MyParquetInputFormat at
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:714)
at
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
at
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:415) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Wish you helps!

Related

/var/mapr/cluster/yarn/rm/staging/username/.staging/job_id_1234_4321/libjars/myudfs.jar (Invalid argument)

We are using Mapr with PAM Authentication to execute select query on hive. Select query uses a myudfs.jar where we have defined our UDFs.
I have tried many links but couldn't figure out why this is happening. From the stacktrace it seems like Hadoop is not able to copy the jar into libjars directory inside /.staging. But UDF is there in the classpath.
Any help would be really appreciated.
java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. 2063.142.526190 /var/mapr/cluster/yarn/rm/staging/username/.staging/job_112233333_0002/libjars/myudfs.jar (Invalid argument)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: 2063.142.526190 /var/mapr/cluster/yarn/rm/staging/username/.staging/job_112233333_0002/libjars/myudfs.jar (Invalid argument)
at com.mapr.fs.Inode.throwIfFailed(Inode.java:390)
at com.mapr.fs.Inode.flushPages(Inode.java:505)
at com.mapr.fs.Inode.releaseDirty(Inode.java:583)
at com.mapr.fs.MapRFsOutStream.dropCurrentPage(MapRFsOutStream.java:73)
at com.mapr.fs.MapRFsOutStream.write(MapRFsOutStream.java:85)
at com.mapr.fs.MapRFsDataOutputStream.write(MapRFsDataOutputStream.java:39)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:376)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:346)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:297)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:203)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:414)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:201)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:79)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:283)
Following is the process how we execute the query and what we do before executing it.
First we create a function as given below
CREATE FUNCTION func_name AS 'com.test.ClassContainingMapper'
This ClassContainingMapper class is packed in a udf jar named myudf.jar. We have this jar added in the classpath by using property hive.aux.jars.path.
Now, our query is as follows:
Select func_name(col1, col2) from dbname.tablename;
When this query is executed Hadoop tries to upload the jars mentioned in the classpath to it's libjars folder under staging directory. That's when it fails.
Interesting thing is that on one similar cluster this query executes successfully but on other cluster it is being failed with the exception mentioned in the stacktrace.
UPDATE:
Actually there is another query executed before the Select query. It's
ADD JAR '/path/to/the/jar/file/myudf.jar';
This makes more sense that when this query is executed then the jar is uploaded to the cluster. And during that action of uploading query fails.

Export Avro file using Sqoop to Oracle

We are using hortonworks distro and on sqoop 1.4.6. I am trying to load an Oracle table with an Avro file using sqoop export but fails with the following stacktrace
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: Can't parse input data: 'Objavro.schema��{"type":"record"'
at HADOOP_CLAIMS.__loadFromFields(HADOOP_CLAIMS.java:208)
at HADOOP_CLAIMS.parse(HADOOP_CLAIMS.java:156)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:470)
at java.math.BigDecimal.<init>(BigDecimal.java:739)
at HADOOP_CLAIMS.__loadFromFields(HADOOP_CLAIMS.java:205)
Looks like its trying to use TextExportMapper instead of AvroExportMapper. I found this issue that should have addressed this problem in version 1.4.5 - https://issues.apache.org/jira/browse/SQOOP-1283
Any idea why this is continuing to happen in sqoop 1.4.6? Is there a different component that needs to be patched as well?

How can I define or solve this error for hadoop streaming?

I got some errors for hadoop mr job, how can I define this problem for hadoop streaming?
Error: java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:63)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Error: java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:63)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
(unfortunately I don't have permission to post any source code)
I'm hoping you resolved this somehow.
But there are probably empty files that constitute a part of the table you are querying or the directory / batch reading from.
Running with a check for safeguarding against the occurrence of files with size 0 would be great.
These error logs are not helping much. Since you dont have the permission to share the code, you can try the following steps.
Check the dependent libraries used in your code is present in all the nodes of your hadoop cluster. This is required because the task may execute in any of the worker nodes.
Get a sample input file and execute the code locally before running it as a mapreduce program. You can execute it locally in the following way.
cat sampleinput | python mappercode.py

Cassandra : Caused by: InvalidRequestException(why:Invalid restrictions)

I am very new to cassandra and using 1.2.10. I have a primary key column which is of timestamp datatype. Now I am trying retrieve the data for the date ranges. Since we know we can't use between in cassandra, I am using greater than(<) and less than(>) to get the date ranges. This perfectly seems to work in cassandra's cqlsh. But with pig_cassandra integration, here is the load function.
cql://<keyspace>/<columnfamily>?where_clause=time1%3E1357054841000590+and+time1%3C1357121822000430"
And here is the error it throws.
2013-12-18 04:32:51,196 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:651)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:352)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.computeNext(CqlPagingRecordReader.java:275)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.getProgress(CqlPagingRecordReader.java:181)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: InvalidRequestException(why:Invalid restrictions found on time1)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:39567)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1625)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1611)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:591)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:621)
... 17 more
Any help is highly appreciated!!!
We seem to have fixed the problem by modifying the cassandra source, compiling and then adding the obtained jar file in the lib folder.

FileNotFoundException for hive/lib/hive-builtins-0.9.0.jar in Hive

I've a configured Hadoop 0.23 on my local box and got it work with a simple map-reduce wordcount program. I Have configured Hive to work with it. All the DDL queries works fine. But when i fire queries that have aggregates (which will trigger Map-educe jobs)
java.io.FileNotFoundException: File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:738)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:252)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:290)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:361)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:609)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:604)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:604)
at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar)'
You should create the same file /Users/varadham/projects/hadoop/hive/lib/hive-builtins-0.9.0.jar in your hadoop file system. Then it should work.
Make sure you have the jar in HDFS.

Resources