How to get DistributedFileSystem? - hadoop

I am following an example that tries to get the DistributedFileSystem using the code below, but I am getting what I assume is a deprecation error:
FileSystem fs=FileSystem.get(conf);
DistributedFileSystem hdfs = (DistributedFileSystem) fs;
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.fs.LocalFileSystem cannot be cast to org.apache.hadoop.hdfs.DistributedFileSystem
at Hadoop.File.infoNode(File.java:55)
at Hadoop.Driver.main(Driver.java:8)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
How to fix this?

This is not a deprecation error; it is a ClassCastException: LocalFileSystem cannot be cast to DistributedFileSystem.
Your conf object needs to be initialized with an fs.defaultFS that starts with hdfs:// instead of file://.
Check the core-site.xml file in your HADOOP_CONF_DIR to set the property, or set it on the Configuration object directly, as sketched below.
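A minimal sketch of setting the property programmatically; the NameNode address hdfs://namenode-host:9000 is an assumed placeholder, not from the original question, and must be replaced with your actual fs.defaultFS value:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class HdfsAccess {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumed address; use the fs.defaultFS value from your core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");
        FileSystem fs = FileSystem.get(conf);
        // The cast only succeeds when the filesystem really is HDFS.
        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem hdfs = (DistributedFileSystem) fs;
            System.out.println("Connected to " + hdfs.getUri());
        } else {
            System.err.println("Not an HDFS filesystem: " + fs.getClass().getName());
        }
    }
}
The instanceof check avoids the ClassCastException when the configuration still points at the local filesystem.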

Related

MapReduce, FileNotFoundException

Hadoop 2.9.1, standalone installation.
The HDFS directory is organized by time (yyyyMMdd/HH/mm), for example hdfs://server1:9000/foo/20190410/10/00, and there are several files under each minute directory.
What I need to do is process the data for each hour, for example all data under hdfs://server1:9000/foo/20190410/10. So the MapReduce input setting is something like:
job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat.class);
Path inputPath = new Path("hdfs://server1:9000/foo/20190410/10");
org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat.addInputPath(job, inputPath);
But I keep getting this,
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://server01:9000/foo/20190410/10/00/data
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1533)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1526)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1526)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:67)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:393)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:314)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:331)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:202)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at com.misc.mr.TestJob.main(TestJob.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I have no idea why it tries to access the path hdfs://server01:9000/foo/20190410/10/00/data.
If the input is a file instead of a folder (for example hdfs://server1:9000/foo/20190410/10/00/part1), it works fine.
Can anyone please help to give some explanation? Many thanks.
Figured it out.
Set mapreduce.input.fileinputformat.input.dir.recursive to true, or, in code, call FileInputFormat.setInputDirRecursive(job, true), as in the sketch below.
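A minimal driver sketch with recursive input listing enabled; the job name and the elided mapper/reducer/output setup are assumptions, not from the original post:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat;

public class HourlyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "hourly-processing");

        job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);
        // Recurse into the per-minute subdirectories under the hourly input path;
        // equivalent to setting mapreduce.input.fileinputformat.input.dir.recursive=true.
        FileInputFormat.setInputDirRecursive(job, true);
        FileInputFormat.addInputPath(job, new Path("hdfs://server1:9000/foo/20190410/10"));

        // ... set mapper, reducer, output format and output path here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}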

How to disable logs in Hive while writing to a file

I have a use case where I am executing a Hive query and storing the output to a file.
hive -S -e "SELECT * from test.employee where empid=1" > /mapr/Piyush/test/output.txt
The query executes fine, but I am getting log messages along with the data in the file. I am guessing it's because of the log4j properties. The problem here is that I have no access to the log4j config file, so I can't make any changes to it.
I tried setting a couple of configuration properties, such as:
set hive.root.logger=ERROR, console
and
set hive.root.logger=INFO,console
and
set hive.server2.logging.operation.enabled=false
But nothing helps. I need just the table data in the file. Please let me know if I am missing anything or if there is a solution to this issue.
Sample output file
2017-06-09 11:49:18,265 main ERROR Cannot access RandomAccessFile {}) java.io.FileNotFoundException: /mapr/pankaj-hive.log (Permission denied)
2017-06-09 11:49:18,272 main ERROR Unable to invoke factory method in class class org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender for element RollingRandomAccessFile. java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:813)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:753)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:745)
at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:389)
at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:169)
at org.apache.logging.log4j.core.config.builder.impl.DefaultConfigurationBuilder.build(DefaultConfigurationBuilder.java:158)
at org.apache.logging.log4j.core.config.builder.impl.DefaultConfigurationBuilder.build(DefaultConfigurationBuilder.java:43)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationFactory.getConfiguration(PropertiesConfigurationFactory.java:149)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationFactory.getConfiguration(PropertiesConfigurationFactory.java:46)
at org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:413)
at org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:257)
at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:519)
at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:536)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:214)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:146)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:185)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:103)
at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:43)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:29)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:284)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:655)
at org.apache.hadoop.util.ShutdownHookManager.<clinit>(ShutdownHookManager.java:44)
at org.apache.hadoop.util.RunJar.run(RunJar.java:200)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalStateException: ManagerFactory [org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager$RollingRandomAccessFileManagerFactory#48aca48b] unable to create manager for [/mapr/pankaj-hive.log] with data [org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager$FactoryData#13fd2ccd]
at org.apache.logging.log4j.core.appender.AbstractManager.getManager(AbstractManager.java:73)
at org.apache.logging.log4j.core.appender.OutputStreamManager.getManager(OutputStreamManager.java:61)
at org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager.getRollingRandomAccessFileManager(RollingRandomAccessFileManager.java:84)
at org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender.createAppender(RollingRandomAccessFileAppender.java:206)
... 33 more
2017-06-09 11:49:18,277 main ERROR Null object returned for RollingRandomAccessFile in Appenders.
2017-06-09 11:49:18,281 main ERROR Unable to locate appender "DRFA" for logger config "root"
2017-06-09 11:49:19,640 main ERROR Cannot access RandomAccessFile {}) java.io.FileNotFoundException: /mapr/pankaj-hive.log (Permission denied)
2017-06-09 11:49:19,644 main ERROR Unable to invoke factory method in class class org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender for element RollingRandomAccessFile. java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.logging.log4j.core.config.plugins.util.PluginBuilder.build(PluginBuilder.java:136)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createPluginObject(AbstractConfiguration.java:813)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:753)
at org.apache.logging.log4j.core.config.AbstractConfiguration.createConfiguration(AbstractConfiguration.java:745)
at org.apache.logging.log4j.core.config.AbstractConfiguration.doConfigure(AbstractConfiguration.java:389)
at org.apache.logging.log4j.core.config.AbstractConfiguration.initialize(AbstractConfiguration.java:169)
at org.apache.logging.log4j.core.config.builder.impl.DefaultConfigurationBuilder.build(DefaultConfigurationBuilder.java:158)
at org.apache.logging.log4j.core.config.builder.impl.DefaultConfigurationBuilder.build(DefaultConfigurationBuilder.java:43)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationFactory.getConfiguration(PropertiesConfigurationFactory.java:149)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationFactory.getConfiguration(PropertiesConfigurationFactory.java:46)
at org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:236)
at org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:445)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:228)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:140)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:113)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:98)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:156)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jDefault(LogUtils.java:155)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:91)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:83)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:66)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:661)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:646)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalStateException: ManagerFactory [org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager$RollingRandomAccessFileManagerFactory#48aca48b] unable to create manager for [/mapr/pankaj-hive.log] with data [org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager$FactoryData#bff34c6]
at org.apache.logging.log4j.core.appender.AbstractManager.getManager(AbstractManager.java:73)
at org.apache.logging.log4j.core.appender.OutputStreamManager.getManager(OutputStreamManager.java:61)
at org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager.getRollingRandomAccessFileManager(RollingRandomAccessFileManager.java:84)
at org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender.createAppender(RollingRandomAccessFileAppender.java:206)
... 34 more
2017-06-09 11:49:19,649 main ERROR Null object returned for RollingRandomAccessFile in Appenders.
2017-06-09 11:49:19,652 main ERROR Unable to locate appender "DRFA" for logger config "root"
1 piyush bangalore 20
I want the output to be just:
1 piyush bangalore 20
As per the Hive documentation, you can't change the logging property by setting hive.root.logger via the 'set' command, because Hive reads the logger properties at initialization time, i.e. before the CLI is opened.
You can do it like this instead:
hive --hiveconf hive.root.logger=OFF -S -e "SELECT * from test.employee where empid=1" > /mapr/Piyush/test/output.txt
P.S. I have assumed that you don't want any logging at all; you can change the logging level as per your requirement.
See the official Hive documentation on logging configuration for details.

Null Pointer Exception in mapreduce and GlobStatus

I recently came across an issue when using the Hadoop FileSystem API and globStatus while writing a MapReduce application.
Here's a snippet of the driver program:
FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
Path path = new Path(args[0] + args[1]);
FileStatus[] status = fs.globStatus(path);
Path[] paths = FileUtil.stat2Paths(status);
And here is how I invoke the program:
yarn jar MyMapReduceTest01.jar com.abc.test.MyMapRedTestDriver /user/root/raw_data/ abc*
This results in the following exception
Exception in thread "main" java.lang.NullPointerException
at com.abc.test.MyMapRedTestDriver.run(MyMapRedTestDriver.java:42)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.abc.test.MyMapRedTestDriver.main(MyMapRedTestDriver.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
The files in the /user/root/raw_data/ directory are named as
abc_01.txt
abc_02.txt
...
There seems to be a problem with the way the glob pattern is handled by the Hadoop FileSystem API.
To make the above program work without any code change, I just needed to change the command to:
yarn jar MyMapReduceTest01.jar com.abc.test.MyMapRedTestDriver /user/root/raw_data/ abc_*
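Independently of the command-line fix, it may also help to guard against globStatus finding no matches before building the input paths. A defensive sketch of the same driver snippet; the null/empty check and the thrown exception are my additions, not part of the original code:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Inside the driver's run(String[] args) method:
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
Path pattern = new Path(args[0] + args[1]);

// globStatus may return null or an empty array when nothing matches the pattern,
// which otherwise surfaces later as a NullPointerException.
FileStatus[] status = fs.globStatus(pattern);
if (status == null || status.length == 0) {
    throw new IllegalArgumentException("No input files match pattern: " + pattern);
}
Path[] paths = FileUtil.stat2Paths(status);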

No valid local directories in property: mapred.local.dir

I am running the VM in pseudo-distributed mode.
Due to some resource-related issues (the NameNode was stuck in safe mode and would not leave it), I had to format and restart the NameNode of my Cloudera 4.x installation. I didn't have any other choice.
I used the steps provided here:
Writing to HDFS could only be replicated to 0 nodes instead of minReplication (=1)
After that I am able to properly use the get/put commands in HDFS, which means I have read/write permissions.
Now, when I try to submit the job, I am getting the following exception.
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3491)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:474)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
Caused by: java.io.IOException: No valid local directories in property: mapred.local.dir
at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:1678)
at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:500)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:409)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3489)
... 13 more
at org.apache.hadoop.ipc.Client.call(Client.java:1160)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:225)
at org.apache.hadoop.mapred.$Proxy10.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:973)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:561)
at clustering.mapreduce.KMeansClusteringJob.main(KMeansClusteringJob.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
When I searched for the above exception, I found multiple links stating that mapred.local.dir should be properly defined, and that if it is not set then hadoop.tmp.dir is used.
I explicitly set mapred.local.dir in mapred-site.xml and gave full permissions to the default folder (/var/lib/hadoop-hdfs/cache).
The problem still persists.
Can someone please help in solving the issue?
Regards
The local directory did not have the proper permissions -- marking as Community Wiki since the answer was provided in the comments.

hive-site.xml not found on classpath

While running the Giraph HiveGiraphRunner I get the following error about hive-site.xml not being found on the classpath. I have already set hive-env.sh and bash.bashrc, but the error still occurs. Any help on how to set the classpath and resolve this error? Is there anything else I need to modify?
I have already tried a Hive JDBC example and it executes without any error, but running it on Hadoop with the jars gives this error.
I will be grateful for any help.
13/01/16 11:58:23 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
Exception in thread "main" java.lang.NullPointerException
at org.apache.giraph.io.hcatalog.HiveGiraphRunner.adjustConfigurationForHive(HiveGiraphRunner.java:212)
at org.apache.giraph.io.hcatalog.HiveGiraphRunner.run(HiveGiraphRunner.java:164)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.giraph.io.hcatalog.HiveGiraphRunner.main(HiveGiraphRunner.java:147)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Did you set it on Hadoop's classpath as well? In hadoop-env.sh there is a line export HADOOP_CLASSPATH=. Uncomment it and add Hive's conf and lib folders to it. That does it for me.
Add export HADOOP_CLASSPATH=$HIVE_HOME/conf:$HIVE_HOME/lib in .bashrc or .bash_profile.
