HORTONWORKS - HBase/Phoenix - WALEditCodec - missing - hadoop

I am receiving the following error while trying to run Phoenix on top of HBase:
EXCEPTION #1:
2017-11-07 12:40:12,620 WARN [RS_LOG_REPLAY_OPS-XXX:16020-0]
regionserver.SplitLogWorker: log splitting of
WALs/XXX.XXX.XXX.XXX,16020,1507179047656-
splitting/XXX.XXX.XXX.XXX%2C16020%2C1507179047656.default.1507179049782 failed, returning error
java.io.IOException: Cannot get log reader
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:355)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:839)
at org.apache.hadoop.hbase.wal.WALSplitter.getReader(WALSplitter.java:763)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:297)
at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:235)
at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:36)
at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec.create(WALCellCodec.java:103)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.getCodec(ProtobufLogReader.java:297)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.initAfterCompression(ProtobufLogReader.java:307)
at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:82)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.init(ProtobufLogReader.java:164)
at org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:303)
... 11 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:32)
... 17 more
APPLIED PATCHES #1:
I have applied the following settings through the Ambari web UI for the advanced HBase configs, as specified by the Hortonworks document:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-upgrade/content/configure-phoenix-25.html
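For reference, the central setting from that document is the WAL codec property (set under the advanced/custom hbase-site section in Ambari):
hbase.regionserver.wal.codec = org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec
This class is shipped in the Phoenix server jar, so a ClassNotFoundException for it during WAL splitting usually means that jar is not on the region server's classpath.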
EXCEPTION #2:
FATAL [RS_LOG_REPLAY_OPS-XXX:16020-1] conf.Configuration: error parsing conf core-site.xml
java.io.FileNotFoundException: /etc/hadoop/2.6.1.0-129/0/core-site.xml (Too many open files)
APPLIED PATCHES #2:
I checked the 'core-site.xml' file on each server that hosts an HBase region server and made sure it ends with a closing </configuration> tag, including the file at the path named in the exception, '/etc/hadoop/2.6.1.0-129/0/core-site.xml'.
I haven't been able to find any other information regarding this issue.

I went into HDFS and deleted all of the WAL split logs using the following command:
hdfs dfs -rm -r /apps/hbase/data/WALs/*splitting*
This resolved exception #1. Keep in mind that, from what I've read, this will incur data loss, since any edits still in those WALs are discarded rather than replayed.
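If you want to keep a copy before removing anything (in case a replay attempt is needed later), a less destructive variant is to move the splitting directories aside first, for example:
hdfs dfs -mkdir -p /tmp/corrupt-wals
hdfs dfs -mv /apps/hbase/data/WALs/*splitting* /tmp/corrupt-wals/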
For exception #2 I went back and checked the open file limit on each server (ulimit -n) and raised it where applicable, per the Hortonworks doc:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_security/content/kerb-config-limits.html
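As a rough sketch of that check, assuming the HBase and HDFS service users are hbase and hdfs (use the limits recommended in the doc for your version):
su - hbase -c 'ulimit -n'
Then raise the limit in /etc/security/limits.conf on each affected server and restart the services, e.g.:
hbase - nofile 32768
hdfs - nofile 32768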

Related

Getting a "Failed to create DataStorage" error when trying to load MovieLens data from HDFS

I am trying to load data from HDFS into Pig, but I am getting the error "Failed to create DataStorage".
The command that I executed was:
movies = LOAD 'hdfs://localhost:9000/Movie_Lens/ratings' USING PigStorage(':') AS (user_id, dummy1, movie_id, dummy2, movie_rating, dummy3, timestamp);
I searched for this problem on Stack Overflow, but the links I found are not related to HDFS and Pig; they are related to HDFS and HBase, or to Pig and HBase.
The details from the log file are given below. Somewhere in the log I found this line:
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
Pig Stack Trace
ERROR 1200: Failed to create DataStorage
Failed to parse: Failed to create DataStorage
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:201)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:565)
at org.apache.pig.Main.main(Main.java:177)
Caused by: java.lang.RuntimeException: Failed to create DataStorage
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:53)
at org.apache.pig.builtin.JsonMetadata.findMetaFile(JsonMetadata.java:109)
at org.apache.pig.builtin.JsonMetadata.getSchema(JsonMetadata.java:189)
at org.apache.pig.builtin.PigStorage.getSchema(PigStorage.java:538)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:901)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:191)
... 10 more
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:70)
... 23 more
To solve this problem I tried rebuilding Pig with Ant.
When I run
bash ant -version
from the Ant bin folder, it works. But when I run
bash ant clean jar-all -Dhadoopversion=23
from the bin folder, it does not work. In some of the links I found that newer versions of Pig no longer have the jar-all target, so I tried the following command instead:
bash ant clean jar -Dhadoopversion=23
and this command does not work either.
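For context, the "Server IPC version 9 cannot communicate with client version 4" message generally means the Pig client is using Hadoop 1.x jars while the cluster runs Hadoop 2.x. Besides rebuilding, one thing worth checking (paths below are assumptions) is that Pig is pointed at the cluster's Hadoop 2 installation:
export HADOOP_HOME=/usr/local/hadoop
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
pig -x mapreduce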

Job fails to read from one ORC file and write a subset to another

Working in the Apache Pig interactive shell in HDP 2.3 for Windows, I've got an existing ORC file in /path/to/file. If I load and then save that using:
a = LOAD '/path/to/file' USING OrcStorage('');
STORE a INTO '/path/to/second_file' USING OrcStorage('');
Then everything works. However, if I try:
a = LOAD '/path/to/file' USING OrcStorage('');
b = LIMIT a 10;
STORE b INTO '/path/to/third_file' USING OrcStorage('');
Then I get the following error traceback in the logs for the second job (out of two that it schedules):
2015-08-25 16:03:42,161 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/orc/OrcNewOutputFormat
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:657)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:726)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:251)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:88)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:71)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:289)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:476)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1560)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:377)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1518)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I suspect that the classpath for the two jobs is different, causing a ClassNotFound. Is that likely to be the case? If so, how can I fix it? (Bonus question: Why has this happened?)
Check that the dependent library for OrcStorage is present on all nodes.
The first script spawns only a single job, while the second spawns multiple jobs that may run on different machines which don't have the dependent library on their classpath.
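One way to ship the dependency with the job, rather than installing it on every node, is to REGISTER the jar containing OrcNewOutputFormat in the Pig script. A minimal sketch, assuming the hive-exec jar lives at the usual HDP client path (adjust to your layout):
-- hive-exec provides org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat
REGISTER '/usr/hdp/current/hive-client/lib/hive-exec.jar';
a = LOAD '/path/to/file' USING OrcStorage('');
b = LIMIT a 10;
STORE b INTO '/path/to/third_file' USING OrcStorage('');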

HiveServer Class Not Found Exception

I'm trying to run Hive from the command prompt, and it works absolutely fine. But when I try to start HiveServer using the "hive --service hiveserver" command, I get the following exception.
Starting Hive Thrift Server
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I then tried the command "hive --service hiveserver2", but I still haven't found a solution.
Can anybody please suggest a solution to this problem?
Maybe another process (another HiveServer) is already listening on port 10000.
You can check with:
netstat -ntulp | grep ':10000'
If a process is found, kill it; otherwise, start the server on another port.
By the way, which version are you using?
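If the port is the problem and you prefer to move HiveServer2 rather than kill the other process, a minimal sketch (the port value is arbitrary):
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001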
This error occurred for me when hive-service-*.jar could not be found on the Hadoop classpath. Either copy hive-service-*.jar into your Hadoop lib folder, or export the classpath in hadoop-env.sh; I have shown how to add it below.
Add this line to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hive/lib/hive-*.jar
I have used /usr/local/hive as the Hive path since that is where I have Hive installed; change it to point to your own Hive installation.
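After editing hadoop-env.sh and restarting, one quick way to confirm the Hive jars are actually visible on the Hadoop classpath:
hadoop classpath | tr ':' '\n' | grep hive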

Class not found in a MapReduce job

I have a MapReduce job which takes an Avro file as input. I export it, along with all the required libraries (jar files), into a single jar. I have two different clusters: one is the HDInsight simulator and the other is the HDP sandbox. The job works fine on the HDP sandbox, but on the HDInsight simulator it fails because it cannot find the AvroInputFormat class. I tried running the job with the -libjars option, but it didn't help. Here is the error message:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:686)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
... 9 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.avro.mapred.AvroInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 10 more
This looks weird because it runs fine on one cluster! Does anyone know what the problem might be?
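For reference, a sketch of how -libjars is usually passed (jar paths and the driver class are placeholders; note that -libjars only takes effect if the driver parses generic options, e.g. via ToolRunner):
export HADOOP_CLASSPATH=/path/to/avro.jar:/path/to/avro-mapred.jar
hadoop jar myjob.jar com.example.MyAvroJob -libjars /path/to/avro.jar,/path/to/avro-mapred.jar /input /output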

EMR Hadoop Pig job error "Internal error creating job configuration"

I have a Pig job running on Amazon EMR, and suddenly it has stopped working, giving the following error:
Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:855)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
at org.apache.pig.PigServer.execute(PigServer.java:1239)
at org.apache.pig.PigServer.executeBatch(PigServer.java:333)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:479)
at org.apache.pig.Main.main(Main.java:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:875)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480)
... 17 more
================================================================================
Does anyone know why, or what the problem might be? This is one of the vaguest errors I have ever seen.
The problem actually turned out to be that Pig was unable to locate one of the input files to be processed, yet the error doesn't even remotely suggest a missing-file issue.
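If you hit this, a quick sanity check is to confirm that every input path actually exists before running the script, for example (bucket and paths are placeholders):
hadoop fs -ls s3://my-bucket/path/to/input/
hdfs dfs -ls /path/to/input/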
