I'm running a MapReduce job against a Wikipedia dump (with full history), using XmlInputFormat to parse the XML.
Task "xxx_m_000053_0" always stops at around 70% and is then killed due to a timeout.
In the console:
xxx_m_000053_0 failed to report status for 300 seconds. Killing!
I increased the timeout to 2 hours, but it didn't help.
In the xxx_m_000053_0 log file:
Processing split: hdfs://localhost:8020/user/martin/history/history.xml:3556769792+67108864
I expected something to be wrong in history.xml within the offset range [3556769792, 3623878656]. I cut that byte range out of the file and ran it through Hadoop on its own. It worked... (???)
Later in the same log file:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2326)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
**at com.doduck.wikilink.history.XmlInputFormat$XmlRecordReader.close(XmlInputFormat.java:109)**
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:496)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1776)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:778)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@54e9a7c2
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
So I'm thinking it might be a configuration problem? Why does my filesystem get closed?
Is something wrong with XmlInputFormat?
My empty mapper:
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
    // nothing to do...
}
My Main:
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    conf.set("xmlinput.start", "<page>");
    conf.set("xmlinput.end", "</page>");

    Job job = new Job(conf, "wikipedia link history");
    job.setJarByClass(Main.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(XmlInputFormat.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    boolean result = job.waitForCompletion(true);
    System.exit(result ? 0 : 1);
}
hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx9216m</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>300000</value>
</property>
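For reference, the 300000 above is 5 minutes, which matches the 300-second message in the console; a 2-hour timeout corresponds to 7200000 ms:
<property>
  <name>mapred.task.timeout</name>
  <value>7200000</value>
</property>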
My core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Volumes/WD/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
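One more thought on the "Filesystem closed" part: FileSystem.get() normally hands back a cached instance that is shared by everything in the task JVM, so if any other code closes that shared FileSystem first, the record reader's own stream close fails exactly like in the trace above. A sketch of what I might try in my job setup (fs.hdfs.impl.disable.cache is the standard per-scheme switch for the FileSystem cache; conf is the Configuration from my main() above):

// Give this job private, non-cached HDFS FileSystem instances, so a close()
// elsewhere in the JVM cannot invalidate the one XmlRecordReader is reading from.
conf.setBoolean("fs.hdfs.impl.disable.cache", true);

I would also double-check that nothing in XmlInputFormat or the mapper calls fs.close() itself; closing only the FSDataInputStream in XmlRecordReader.close() should be enough.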
Related
I am trying to access the Hive metastore, and I am using Spark SQL for this. I have set up a SparkSession, but when I run my program and look at the log, I see this exception:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 61 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
... 62 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 68 more
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
I am running a servlet which executes the following code:
public class HiveReadone extends HttpServlet {
    private static final long serialVersionUID = 1L;

    /**
     * @see HttpServlet#HttpServlet()
     */
    public HiveReadone() {
        super();
    }

    /**
     * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        response.getWriter().append("Served at: ").append(request.getContextPath());

        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .enableHiveSupport()
                .config("spark.sql.warehouse.dir", "hdfs://saurab:9000/user/hive/warehouse")
                .config("mapred.input.dir.recursive", true)
                .config("hive.mapred.supports.subdirectories", true)
                .config("hive.vectorized.execution.enabled", true)
                .master("local")
                .getOrCreate();

        response.getWriter().println(spark);
    }
}
Nothing gets printed in the browser except the output of response.getWriter().append("Served at: ").append(request.getContextPath());, which is Served at: /hiveServ.
Please take a look at my conf/hive-site.xml:
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://saurab:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>/home/saurab/hadoopec/hive/lib/hive-serde-2.1.1.jar</value>
</property>
<property>
<name>spark.sql.warehouse.dir</name>
<value>hdfs://saurab:9000/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.uris</name>
<!--Make sure that <value> points to the Hive Metastore URI in your cluster -->
<value>thrift://saurab:9083</value>
<description>URI for client to contact metastore server</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10001</value>
<description>Port number of HiveServer2 Thrift interface.
Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>password for connecting to mysql server</description>
</property>
As far as I have read, if we configure hive.metastore.uris, Spark will connect to the Hive metastore, but in my case it does not and I get the above error.
To configure Spark on Hive, try copying your hive-site.xml into the spark/conf directory.
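Alternatively (or in addition), you can point the SparkSession at the metastore explicitly; a minimal sketch, reusing the thrift URI and warehouse dir from the hive-site.xml above:

// assumes: import org.apache.spark.sql.SparkSession;
SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark SQL basic example")
        .enableHiveSupport()
        // metastore URI and warehouse dir taken from the question's hive-site.xml
        .config("hive.metastore.uris", "thrift://saurab:9083")
        .config("spark.sql.warehouse.dir", "hdfs://saurab:9000/user/hive/warehouse")
        .master("local")
        .getOrCreate();

It is also worth checking that the metastore service is actually listening on saurab:9083 before the servlet starts.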
HOW TO CONFIGURE JDBC WITH HIVE
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class table {
    private static String driverName = "org.apache.hadoop.hive.mysql.jdbc.Driver";

    public static void main(String[] args) throws SQLException {
        // Register driver and create driver instance
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }

        Connection con = DriverManager.getConnection("jdbc:mysql://localhost:1000/default", "", "");
        Statement stmt = con.createStatement();
        stmt.executeQuery("CREATE DATABASE userdb");
        // System.out.println(“Database userdb created successfully”);
        con.close();
    }
}
akshay@akshay:~$ javac table.java
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
akshay@akshay:~$ java table
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
java.lang.ClassNotFoundException: org.apache.hadoop.hive.mysql.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at table.main(table.java:14)
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:1000/default
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at table.main(table.java:20)
My hive-site.xml contains
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server </description>
</property>
I have configured the Hive metastore with MySQL. So what should the ConnectionURL and driver name be in my Java connection code?
I am not seeing where I am going wrong. Please provide a solution to the above problem.
For running Hive queries using the JDBC APIs, you need to start HiveServer2 first. Configure the Thrift server port in your hive-site.xml file as shown below:
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>TCP port number to listen on, default 10000</description>
</property>
Start the hiveserver2 using the command
cd $HIVE_HOME/bin
./hiveserver2
You also need to add the following dependencies to your project:
hive-jdbc-*-standalone.jar
hive-jdbc-*.jar
hive-metastore-*.jar
hive-service-*.jar
After that, try running the program. You can refer to this blog for more information on the step-by-step procedure for running Hive queries from Java programs.
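To answer the ConnectionURL/driver question directly: with HiveServer2 running, the Java code should use the Hive JDBC driver and a jdbc:hive2:// URL, not the MySQL driver (MySQL is only used internally by the metastore). A minimal sketch, assuming HiveServer2 listens on localhost:10000 with no authentication:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcExample {
    // HiveServer2 JDBC driver, shipped in the hive-jdbc jar
    private static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER);
        // jdbc:hive2://<host>:<port>/<database>
        try (Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
             Statement stmt = con.createStatement()) {
            stmt.execute("CREATE DATABASE IF NOT EXISTS userdb");
        }
    }
}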
I have an MR program which runs perfectly on a bunch of SequenceFiles, and the output is as expected.
When I try to achieve the same via an Oozie workflow, for some reason the InputFormat class property is not recognized, and I suspect the input is being treated as the default TextInputFormat.
Here is how the mapper is declared. The SequenceFile key is LongWritable and the value is Text.
public static class FeederCounterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // setup map function for stripping the feeder for a zone from the input
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        final int count = 1;

        // convert input rec to string
        String inRec = value.toString();
        System.out.println("Feeder:" + inRec);

        // strip out the feeder from record
        String feeder = inRec.substring(3, 7);

        // write the key+value as map output
        context.write(new Text(feeder), new IntWritable(count));
    }
}
The workflow layout for my application is as below
/{$namenode}/workflow.xml
/{$namenode}/lib/FeederCounterDriver.jar
Below is my workflow.xml. $namenode, $jobtracker, $outputdir, and $inputdir are defined in the job.properties file.
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/${outputDir}"/>
</prepare>
<configuration>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/flume/events/sincal*</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${outputDir}</value>
</property>
<property>
<name>mapred.input.format.class</name>
<value>org.apache.hadoop.mapred.SequenceFileInputFormat</value>
</property>
<property>
<name>mapred.output.format.class</name>
<value>org.apache.hadoop.mapred.TextOutputFormat</value>
</property>
<property>
<name>mapred.input.key.class</name>
<value>org.apache.hadoop.io.LongWritable</value>
</property>
<property>
<name>mapred.input.value.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<property>
<name>mapreduce.map.class</name>
<value>org.poc.hadoop121.gissincal.FeederCounterDriver$FeederCounterMapper</value>
</property>
<property>
<name>mapreduce.reduce.class</name>
<value>org.poc.hadoop121.gissincal.FeederCounterDriver$FeederCounterReducer</value>
</property>
<property>
<name>mapreduce.map.tasks</name>
<value>1</value>
</property>
</configuration>
</map-reduce>
A snippet of the stdout (first 2 lines) when I run the MR job directly:
Feeder:00107371PA1700TEET67576 LKHS 5666LH 2.....
Feeder:00107231PA1300TXDS 8731TX 1FSHS 8731FH 1.....
A snippet of the output (first 3 lines) when I run it through the Oozie workflow:
Feeder:SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text�������b'b��X�...
Feeder:��00105271PA1000FSHS 2255FH 1TXDS 2255TX 1.....
Feeder:��00103171PA1800LKHS 3192LH 2LKHS 2335LH 1.....
Given the above output from the Oozie workflow, I strongly doubt that the SequenceFileInputFormat specified in workflow.xml is even being considered; it seems to be getting overridden.
Any input on this would help. Thanks.
Find the job.xml created for this MapReduce job on the JobTracker and see what input format class is actually set there. This will confirm whether the problem is with the input format or not.
I had a really similar problem, and I got Oozie to use the proper input format by setting the property like this:
<property>
<name>mapreduce.inputformat.class</name>
<value>org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat</value>
</property>
So there is one dot to remove from the property name (check the exact name for your version), and the class has to change to the new-API one as well.
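For what it's worth, the workflow above also declares the output format with the old-API name and class (mapred.output.format.class); if that one gets ignored too, the same renaming pattern would presumably apply, something like this (again, verify the exact property name for your Hadoop/Oozie version):

<property>
  <name>mapreduce.outputformat.class</name>
  <value>org.apache.hadoop.mapreduce.lib.output.TextOutputFormat</value>
</property>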
I am attempting to run a simple MapReduce process to write HFiles for later import into an HBase table.
When the job is submitted:
hbase com.pcoa.Driver /test /bulk pcoa
I am getting the following exception, indicating that netty-3.6.6.Final.jar does not exist in HDFS (it does, however, exist locally):
-rw-r--r--+ 1 mbeening flprod 1206119 Sep 18 18:25 /dedge1/hadoop/hbase-0.96.1.1-hadoop2/lib/netty-3.6.6.Final.jar
I am afraid that I do not understand how to address this configuration(?) error.
Can anyone provide any advice to me?
Here is the Exception:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost/dedge1/hadoop/hbase-0.96.1.1-hadoop2/lib/netty-3.6.6.Final.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
at com.pcoa.Driver.main(Driver.java:63)
Here is my driver routine:
public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "HBase Bulk Import");
        job.setJarByClass(HBaseKVMapper.class);

        job.setMapperClass(HBaseKVMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        job.setInputFormatClass(TextInputFormat.class);

        HTable hTable = new HTable(conf, args[2]);
        HFileOutputFormat.configureIncrementalLoad(job, hTable);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
I am not sure why (or if) I had to do this (I didn't see anything like it in any of the startup docs anywhere),
but I ran this:
hdfs dfs -put /hadoop/hbase-0.96.1.1-hadoop2/lib/*.jar /hadoop/hbase-0.96.1.1-hadoop2/lib
And... my MR job seems to run now.
If this is the wrong way to go about it, please let me know.
Thanks!
Recently I started using Hadoop. Now I want to access HDFS from a remote host that does not have the hadoop-client package installed, only a dependency on hadoop-client-2.0.4-alpha.jar.
But when I tried to access HDFS, I got the following exception:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "webserver/127.0.0.1"; destination host is: "222.333.111.77":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
at org.apache.hadoop.ipc.Client.call(Client.java:1239)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy25.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy25.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:630)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1559)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:811)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1345)
at com.kongming.kmdata.service.ExportService.copyToLocalFileFromHdfs(ExportService.java:60)
at com.kongming.kmdata.service.KMReportManager.run(KMReportManager.java:105)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status
at com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:81)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.buildParsed(RpcPayloadHeaderProtos.java:1094)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.access$1300(RpcPayloadHeaderProtos.java:1028)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:986)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:946)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
It looks like an RPC exception. How can I fix it? Here is my code:
package com.xxx.xxx.service;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.Logger;

import com.xxx.xxx.fileSystem.IFilePath;
import com.xxx.xxx.inject.GuiceDependency;

public class ExportService {
    private static Logger log = Logger.getLogger(ExportService.class);

    private static Configuration configuration = new Configuration();

    private static String dir = "./";
    private static String hadoopConf = "hadoop-conf/";

    static {
        configuration.addResource(new Path(hadoopConf + "core-site.xml"));
        configuration.addResource(new Path(hadoopConf + "hdfs-site.xml"));
        configuration.addResource(new Path(hadoopConf + "mapred-site.xml"));
        configuration.addResource(new Path(hadoopConf + "yarn-site.xml"));
    }

    public static boolean copyToLocalFileFromHdfs(String reportID) {
        IFilePath filePath = GuiceDependency.getInstance(IFilePath.class);

        String resultPath = filePath.getFinalResult(reportID) + "/part-r-00000";
        Path src = new Path(resultPath);

        String exportPath = dir + reportID + ".csv";
        Path dst = new Path(exportPath);

        System.out.println(configuration.get("fs.defaultFS"));
        System.out.println("zxz copyToLocalFileFromHdfs scr: "
                + src.toString() + " , dst: " + dst.toString());

        try {
            System.out.println("zxz get fileSystem start ");
            FileSystem fs = FileSystem.get(configuration);
            System.out.println("zxz get fileSystem end "
                    + fs.getHomeDirectory().toString());
            System.out.println("zxz ~~~~~~~~~~~~~~~~~~~~~~~~~"
                    + fs.exists(src));

            fs.copyToLocalFile(false, src, dst);
            fs.copyToLocalFile(false, src, dst, true);
        } catch (Exception e) {
            e.printStackTrace();
            log.error("copyFromHDFSFile error : ", e);
            return false;
        }

        System.out.println("zxz end copyToLocalFileFromHdfs for report: "
                + reportID);
        return true;
    }
}
And core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2013-07-19T00:57:49.581Z-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://222.333.111.77:8020</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>false</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
</configuration>
Does anyone know about this issue? Thank you very much for the help.
I believe HDFS uses the Google protobuf library, and your client code seems to be using a wrong (incompatible) version of protobuf.
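In practice that usually means building the client against the same Hadoop version (and therefore the same protobuf wire format) that the cluster runs. A sketch of the Maven side, with the version left as a placeholder rather than a verified value:

<!-- pom.xml: match the cluster's Hadoop version exactly (placeholder below) -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${cluster.hadoop.version}</version> <!-- placeholder: set to the version the cluster actually runs -->
</dependency>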