'_partition.lst' does not exist error - hadoop

I tried to use 'RandomSampler' to partition the mapper results. And here're the related codes:
JobConf conf = new JobConf(SortExample.class);
conf.setJobName("sortexample");
...
conf.setPartitionerClass(TotalOrderPartitioner.class);
InputSampler.Sampler<IntWritable, Text> sampler =
new InputSampler.RandomSampler<IntWritable, Text>(0.1, 100);
InputSampler.writePartitionFile(conf, sampler);
conf.setNumReduceTasks(2);
In my understanding, the sampler will write into a file called, by default, _patition.lst describing the partition that the job will automatically use to decide which key/value pairs to send to which reducers.
But after running, I got the following error:
ava.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:574)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 10 more
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
at org.apache.hadoop.mapred.lib.TotalOrderPartitioner.configure(TotalOrderPartitioner.java:91)
... 15 more
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
And I didn't know where the file '_partition.lst' is? There must be some mis-understanding here, and I appreciate any helps~

I think you need to call getSample() on one of the InputSample inner classes. That's what reads a file via an InputFormat and actually figures out what splits should go into the _partition.lst file.

Related

MapReduce, FileNotFoundException

Hadoop 2.9.1, standalone installation.
The hdfs directory is organized by time (yyyyMMdd/HH/mm), like, hdfs://server1:9000/foo/20190410/10/00. And there're several files in each minute.
What I need to do is, process data for each hour, for example, process all data under hdfs://server1:9000/foo/20190410/10. So the mapreduce input setting is something like,
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat.class);
Path inputPath = new Path("hdfs://server1:9000/foo/20190410/10");
org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat.addInputPath(job, inputPath);
But I keep getting this,
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://server01:9000/foo/20190410/10/00/data
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1533)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1526)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:67)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:393)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:314)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:331)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:202)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at com.misc.mr.TestJob.main(TestJob.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I have no idea why it try to access path hdfs://server01:9000/foo/20190410/10/00/data
If the input is a file instead of a folder (for example hdfs://server1:9000/foo/20190410/10/00/part1), it works fine.
Can anyone please help to give some explanation? Many thanks.
Figured out.
Set mapreduce.input.fileinputformat.input.dir.recursive to true.
Or
In code, call, FileInputFormat.setInputDirRecursive(job, true)

java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found in Hive

I am trying to process multicharacter delmiter in hive.
I already created a table with the same successfully
create external table showtmp3(doc_name STRING,doc_content STRING) row format SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ('field.delim'='#a#') location '/unmesha/OUT';
Then I need to issue a query as below.
INSERT OVERWRITE DIRECTORY '/unmesha/OUT_tmpShowData' SELECT * showtmp3 limit 10;
But it is giving me the error below
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:153)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:323)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:122)
... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:136)
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
... 24 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Should I download any jar and put in some location?
Please suggest
This is probably what you are looking for: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-contrib/0.14.0/org/apache/hadoop/hive/contrib/serde2/MultiDelimitSerDe.java
You can download the original jar from Maven repo: https://mvnrepository.com/artifact/org.apache.hive/hive-contrib/0.14.0
Download this jar file and place it into you hive /lib folders of all nodes of your cluster. This should resolve the issue
you have to add hive-contrib.jar file to the hive. Run the below command in hive shell.
add JAR /opt/cloudera/parcels/CDH <version> /jars/hive-contrib.jar
(or)
add jar /usr/lib/hive/lib/hive-contrib-<version>.jar;

No valid local directories in property: mapred.local.dir

I am running the VM in pseudo mode.
Due to some resource related issues (Name Node in safe mode, not able to leave) I had to format and restart the namenode of my Cloudera 4.x. I didn't have any other choice.
I used the steps provided here:
Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)
After that I am able to properly use get/put command in hdfs which means I have read/write permission.
Now, when I try to submit the job, I am getting following exception.
Exception in thread "main"org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3491)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:474)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
Caused by: java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:1678)
at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:500)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:409)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3489)
... 13 more
at org.apache.hadoop.ipc.Client.call(Client.java:1160)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
at org.apache.hadoop.mapred.$Proxy10.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:973)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:561)
at clustering.mapreduce.KMeansClusteringJob.main(KMeansClusteringJob.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)**
When I searched for above exception I found multiple links stating that mapred.local.dir should be properly defined and if not set then hadoop.tmp.dir is used.
I explicitly set mapred.local.dir in mapred-site.xml and given full permission to the default folder (/var/lib/hadoop-hdfs/cache).
The problem still persists.
Can someone please help in solving the issue?
Regards
Didn't give proper permission to the local directory -- Marking as Community wiki as answer was provided in the comments

Unable to load Hive-JDBC driver when accessed through MapReduce program on Amazon's Elastic MapReduce

I have written a MapReduce program in which I am storing some part of output data into Hive table.
I have used Hive-JDBC driver to access Hive table via MapReduce code.
This program has compiled successfully on local machine.
After this, I created a JAR file and uploaded it on S3. Then I created an elasticmapreduce cluster and started it.
However, it is resulting into below mentioned errors:
java.lang.Throwable: Child Error at
org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271) Caused
by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201407161054_0001_m_000001_0: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.jdbc.HiveDriver
attempt_201407161054_0001_m_000001_0: at
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
attempt_201407161054_0001_m_000001_0: at
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
attempt_201407161054_0001_m_000001_0: at
java.security.AccessController.doPrivileged(Native Method)
attempt_201407161054_0001_m_000001_0: at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
attempt_201407161054_0001_m_000001_0: at
java.lang.ClassLoader.loadClass(ClassLoader.java:424)
attempt_201407161054_0001_m_000001_0: at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
attempt_201407161054_0001_m_000001_0: at
java.lang.ClassLoader.loadClass(ClassLoader.java:357)
attempt_201407161054_0001_m_000001_0: at
java.lang.Class.forName0(Native Method)
attempt_201407161054_0001_m_000001_0: at
java.lang.Class.forName(Class.java:190)
attempt_201407161054_0001_m_000001_0: at
HubAndAuthority.InputHubMapper.configure(InputHubMapper.java:38)
attempt_201407161054_0001_m_000001_0: at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
attempt_201407161054_0001_m_000001_0: at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
attempt_201407161054_0001_m_000001_0: at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
attempt_201407161054_0001_m_000001_0: at
java.lang.reflect.Method.invoke(Method.java:606)
It appears to be an issue of missing Hive-JDBC driver and it should get resolved by adding Hive-JDBC driver in classpath. However, I am not aware of the exact step to do this on Amazon's EMR.
Could you please let me know what is missing from my end and how to resolve it?
Thanks and Regards,
Prafulla
I'm not sure enough, but you should try this:
"Note
If you want your custom classpath to override the original class path, you should set the environment variable, HADOOP_USER_CLASSPATH_FIRST to true so that the HADOOP_CLASSPATH value specified in hadoop-user-env.sh is first."
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config.html
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
Regards,
revet

HBase completebulkload returns exception

I am trying to bulk-populate an HBase table quickly from a text file (several GB) by using the bulk loading method described in the Hadoop docs.
I have created an HFile which I now want to push to my HBase table.
When I use this command:
hadoop jar /home/hxcaine/hadoop/lib/hbase.jar completebulkload /user/hxcaine/dbpopulate/output/cf1 my_hbase_table
The job starts and then I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/ThreadFactoryBuilder
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:195)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:696)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:701)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.ThreadFactoryBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 17 more
However, I can see that the Guava jar is in my classpath and when I check inside the jar I can see ThreadFactoryBuilder.class.
I am using these versions (and stuck with them):
Hadoop 0.20.2-cdh3u3
HBase 0.90.4-cdh3u3
Guava jar: /usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar
I do have an older Guava jar in my classpath but I don't know where it came from, I don't suppose it should have an effect.
Any ideas?
what happens if you run:
export HADOOP_CLASSPATH=`hbase classpath`
before running the load? From the stack trace, it looks like the jar is needed by one of the actual tasks though I am surprised to see that this actually kicks off an M/R job.

Resources