I recently started using Hadoop. Now I want to access HDFS from a remote host that does not have hadoop-client installed; it only has hadoop-client-2.0.4-alpha.jar as a dependency.
But when I try to access HDFS, I get the following exception:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "webserver/127.0.0.1"; destination host is: "222.333.111.77":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
at org.apache.hadoop.ipc.Client.call(Client.java:1239)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy25.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy25.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:630)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1559)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:811)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1345)
at com.kongming.kmdata.service.ExportService.copyToLocalFileFromHdfs(ExportService.java:60)
at com.kongming.kmdata.service.KMReportManager.run(KMReportManager.java:105)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status
at com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:81)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.buildParsed(RpcPayloadHeaderProtos.java:1094)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.access$1300(RpcPayloadHeaderProtos.java:1028)
at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:986)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:946)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
It looks like an RPC exception. How can I fix it? Here is my code:
package com.xxx.xxx.service;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.Logger;
import com.xxx.xxx.fileSystem.IFilePath;
import com.xxx.xxx.inject.GuiceDependency;
public class ExportService {
private static Logger log = Logger.getLogger(ExportService.class);
private static Configuration configuration = new Configuration();
private static String dir = "./";
private static String hadoopConf = "hadoop-conf/";
static {
configuration.addResource(new Path(hadoopConf + "core-site.xml"));
configuration.addResource(new Path(hadoopConf + "hdfs-site.xml"));
configuration.addResource(new Path(hadoopConf + "mapred-site.xml"));
configuration.addResource(new Path(hadoopConf + "yarn-site.xml"));
}
public static boolean copyToLocalFileFromHdfs(String reportID) {
IFilePath filePath = GuiceDependency.getInstance(IFilePath.class);
String resultPath = filePath.getFinalResult(reportID) + "/part-r-00000";
Path src = new Path(resultPath);
String exportPath = dir + reportID + ".csv";
Path dst = new Path(exportPath);
System.out.println(configuration.get("fs.defaultFS"));
System.out.println("zxz copyToLocalFileFromHdfs scr: "
+ src.toString() + " , dst: " + dst.toString());
try {
System.out.println("zxz get fileSystem start ");
FileSystem fs = FileSystem.get(configuration);
System.out.println("zxz get fileSystem end "
+ fs.getHomeDirectory().toString());
System.out.println("zxz ~~~~~~~~~~~~~~~~~~~~~~~~~"
+ fs.exists(src));
fs.copyToLocalFile(false, src, dst);
fs.copyToLocalFile(false, src, dst, true);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
log.error("copyFromHDFSFile error : ", e);
return false;
}
System.out.println("zxz end copyToLocalFileFromHdfs for report: "
+ reportID);
return true;
}
}
and core-site.xml :
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera CM on 2013-07-19T00:57:49.581Z-->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://222.333.111.77:8020</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>65536</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>false</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
</configuration>
Does anyone know this issue? Thank you very much for your help.
I believe HDFS uses the Google protobuf library, and your client code seems to be using a wrong (incompatible) version of protobuf.
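To check which jars the client actually loads, a small probe like the one below can help. This is only a sketch: the class name ClasspathProbe and the helper locate are made up for illustration, and the two class names in main are simply the protobuf and Hadoop IPC classes that appear on the client side of this setup. It prints where each class would be loaded from, so a stray or mismatched protobuf or Hadoop jar becomes visible.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    // Returns the location a class is loaded from, without initializing it.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false,
                    ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK core classes have no code source.
            return src == null ? "bootstrap/JDK" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "com.google.protobuf.Message",   // protobuf runtime
            "org.apache.hadoop.ipc.Client"   // Hadoop RPC client from the stack trace
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the failing application and compare the reported jar versions with what the cluster runs.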
I am trying to access the Hive metastore using Spark SQL. I have set up a SparkSession, but when I run my program and check the log I see this exception:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 61 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
... 62 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 68 more
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
I am running a servlet that executes the following code:
public class HiveReadone extends HttpServlet {
private static final long serialVersionUID = 1L;
/**
* @see HttpServlet#HttpServlet()
*/
public HiveReadone() {
super();
// TODO Auto-generated constructor stub
}
/**
* @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
*/
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
// TODO Auto-generated method stub
response.getWriter().append("Served at: ").append(request.getContextPath());
SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic example")
.enableHiveSupport()
.config("spark.sql.warehouse.dir", "hdfs://saurab:9000/user/hive/warehouse")
.config("mapred.input.dir.recursive", true)
.config("hive.mapred.supports.subdirectories", true)
.config("hive.vectorized.execution.enabled", true)
.master("local")
.getOrCreate();
response.getWriter().println(spark);
}
}
Nothing is printed in the browser except the output of response.getWriter().append("Served at: ").append(request.getContextPath()); which is Served at: /hiveServ
Please take a look at my conf/hive-site.xml
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://saurab:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>/home/saurab/hadoopec/hive/lib/hive-serde-2.1.1.jar</value>
</property>
<property>
<name>spark.sql.warehouse.dir</name>
<value>hdfs://saurab:9000/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.uris</name>
<!--Make sure that <value> points to the Hive Metastore URI in your cluster -->
<value>thrift://saurab:9083</value>
<description>URI for client to contact metastore server</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10001</value>
<description>Port number of HiveServer2 Thrift interface.
Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>password for connecting to mysql server</description>
</property>
As far as I have read, if we configure hive.metastore.uris, Spark will connect to the Hive metastore, but in my case it does not and gives me the above error.
To configure Spark with Hive, try copying your hive-site.xml to the spark/conf directory.
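The jdbc:derby URL in the stack trace is Hive's built-in default, which suggests that hive-site.xml was not picked up at all. Inside a servlet container the file has to be visible on the web application's classpath, and whether spark/conf ends up there depends on how the app is deployed (an assumption worth checking). A minimal sketch of such a check, with the class name HiveSiteCheck and the helper describe invented for illustration:

```java
import java.net.URL;

public class HiveSiteCheck {

    // Reports whether a resource is visible to the current class loader.
    static String describe(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = HiveSiteCheck.class.getClassLoader();
        }
        URL url = cl.getResource(resourceName);
        return url == null
                ? resourceName + " not found on classpath"
                : resourceName + " loaded from " + url;
    }

    public static void main(String[] args) {
        System.out.println(describe("hive-site.xml"));
    }
}
```

Calling describe("hive-site.xml") from inside doGet, before the SparkSession is built, would show whether the servlet can see the file.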
HOW TO CONFIGURE JDBC WITH HIVE
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class table {
private static String driverName = "org.apache.hadoop.hive.mysql.jdbc.Driver";
public static void main(String[] args) throws SQLException {
// Register driver and create driver instance
try {
Class.forName(driverName);
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Connection con = DriverManager.getConnection("jdbc:mysql://localhost:1000/default", "", "");
Statement stmt = con.createStatement();
stmt.executeQuery("CREATE DATABASE userdb");
// System.out.println("Database userdb created successfully");
con.close();
}
}
akshay@akshay:~$ javac table.java
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
akshay@akshay:~$ java table
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
java.lang.ClassNotFoundException: org.apache.hadoop.hive.mysql.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at table.main(table.java:14)
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:1000/default
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at table.main(table.java:20)
My hive-site.xml contains
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server </description>
</property>
I have configured the Hive metastore with MySQL. So what should the connection URL and driver name be in my Java connection code?
I can't see where I am going wrong. Please suggest a solution to the above problem.
To run Hive queries through the JDBC API, you need to start HiveServer2 first. Configure the Thrift server port in your hive-site.xml file as shown below:
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>TCP port number to listen on, default 10000</description>
</property>
Start HiveServer2 using the command:
cd $HIVE_HOME/bin
./hiveserver2
You also need to add the dependencies below to your project:
hive-jdbc-*-standalone.jar
hive-jdbc-*.jar
hive-metastore-*.jar
hive-service-*.jar
After that try running the program. You can refer to this blog for more information on step by step procedure to run hive queries using java programs.
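For reference, the Java side could then look roughly like the sketch below. It assumes HiveServer2 is listening on localhost:10000 as configured above and accepts an empty user name and password; the class name HiveJdbcExample and the helper buildUrl are made up for illustration. The HiveServer2 driver class is org.apache.hive.jdbc.HiveDriver and the URL scheme is jdbc:hive2, not the MySQL driver and URL, which are only used by the metastore itself.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcExample {

    // HiveServer2 JDBC driver, shipped in the hive-jdbc jar.
    static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    // Builds a HiveServer2 URL such as jdbc:hive2://localhost:10000/default.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(DRIVER);
        } catch (ClassNotFoundException e) {
            System.err.println("hive-jdbc jar is not on the classpath: " + e.getMessage());
            return;
        }
        String url = buildUrl("localhost", 10000, "default");
        // Empty credentials are an assumption; adjust to your setup.
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement()) {
            // DDL returns no result set, so use execute() rather than executeQuery().
            stmt.execute("CREATE DATABASE IF NOT EXISTS userdb");
            System.out.println("Database userdb created successfully");
        }
    }
}
```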
I'm trying to get a hold of DistributedCache. I'm using Apache Hadoop 1.2.1 on two nodes.
I referred to the Cloudera post which is simply extended in the other posts that explain how to use third-party jars using -libjars
Note:
In my jar, I haven't included any library jars, neither Hadoop core nor Commons Lang.
The code :
public class WordCounter extends Configured implements Tool {
@Override
public int run(String[] args) throws Exception {
// TODO Auto-generated method stub
// Job job = new Job(getConf(), args[0]);
Job job = new Job(super.getConf(), args[0]);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setJarByClass(WordCounter.class);
FileInputFormat.setInputPaths(job, new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.setMapperClass(WCMapper.class);
job.setReducerClass(WCReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
int jobState = job.waitForCompletion(true) ? 0 : 1;
return jobState;
}
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
if (args == null || args.length < 3) {
System.out.println("The below three arguments are expected");
System.out
.println("<job name> <hdfs path of the input file> <hdfs path of the output file>");
return;
}
WordCounter wordCounter = new WordCounter();
// System.exit(ToolRunner.run(wordCounter, args));
System.exit(ToolRunner.run(new Configuration(), wordCounter, args));
}
}
The Mapper class is naive; it only attempts to use StringUtils from Apache Commons Lang (and NOT the Hadoop one):
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
/**
* @author 298790
*
*/
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
private static IntWritable one = new IntWritable(1);
@Override
protected void map(
LongWritable key,
Text value,
org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {
// TODO Auto-generated method stub
StringTokenizer strTokenizer = new StringTokenizer(value.toString());
Text token = new Text();
while (strTokenizer.hasMoreTokens()) {
token.set(strTokenizer.nextToken());
context.write(token, one);
}
System.out.println("Converting " + value + " to upper case "
+ StringUtils.upperCase(value.toString()));
}
}
The commands that I use :
bigdata@slave3:~$ export HADOOP_CLASSPATH=dumphere/lib/commons-lang3-3.1.jar
bigdata@slave3:~$
bigdata@slave3:~$ echo $HADOOP_CLASSPATH
dumphere/lib/commons-lang3-3.1.jar
bigdata@slave3:~$
bigdata@slave3:~$ echo $LIBJARS
dumphere/lib/commons-lang3-3.1.jar
bigdata@slave3:~$ hadoop jar dumphere/code/jars/hdp_3rdparty.jar com.hadoop.basics.WordCounter "WordCount" "/input/dumphere/Childhood_days.txt" "/output/dumphere/wc" -libjars ${LIBJARS}
The exception I get :
Warning: $HADOOP_HOME is deprecated.
14/08/13 21:56:05 INFO input.FileInputFormat: Total input paths to process : 1
14/08/13 21:56:05 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/13 21:56:05 WARN snappy.LoadSnappy: Snappy native library not loaded
14/08/13 21:56:05 INFO mapred.JobClient: Running job: job_201408111719_0190
14/08/13 21:56:06 INFO mapred.JobClient: map 0% reduce 0%
14/08/13 21:56:37 INFO mapred.JobClient: Task Id : attempt_201408111719_0190_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at com.hadoop.basics.WCMapper.map(WCMapper.java:40)
at com.hadoop.basics.WCMapper.map(WCMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
14/08/13 21:56:42 INFO mapred.JobClient: Task Id : attempt_201408111719_0190_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.commons.lang3.StringUtils
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at com.hadoop.basics.WCMapper.map(WCMapper.java:40)
at com.hadoop.basics.WCMapper.map(WCMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
The Cloudera post mentions :
The jar will be placed in distributed cache and will be made available to all of the job’s task attempts. More specifically, you will find the JAR in one of the ${mapred.local.dir}/taskTracker/archive/${user.name}/distcache/… subdirectories on local nodes.
But on that path, I'm not able to find the commons-lang3-3.1.jar
What am I missing?
I'm running a MapReduce task against Wikipedia dump with history using XmlInputFormat for parsing the XML.
Task "xxx_m_000053_0" always stops at 70% before it is killed due to a timeout.
In the console:
xxx_m_000053_0 failed to report status for 300 seconds. Killing!
I increased the timeout to 2 hours. It didn't help.
In xxx_m_000053_0 log file:
Processing split: hdfs://localhost:8020/user/martin/history/history.xml:3556769792+67108864
I expected something to be wrong in history.xml in the offset range [3556769792, 3623878656]. I split the file at this offset and ran that part in Hadoop. It worked... (???)
In xxx_m_000053_0 log file:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:323)
at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2326)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
**at com.doduck.wikilink.history.XmlInputFormat$XmlRecordReader.close(XmlInputFormat.java:109)**
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:496)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1776)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:778)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-09-17 13:13:32,248 INFO org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewOutputCollector@54e9a7c2
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
So I'm thinking it might be a configuration problem. Why is my file system being closed?
Is something wrong with XmlInputFormat?
My empty mapper:
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
//nothing to do...
}
My Main:
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
conf.set("xmlinput.start", "<page>");
conf.set("xmlinput.end", "</page>");
Job job = new Job(conf, "wikipedia link history");
job.setJarByClass(Main.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(XmlInputFormat.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
}
hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx9216m</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>300000</value>
</property>
My core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/Volumes/WD/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
I'm trying a simple program from the book "Hadoop in Action" that merges a series of files from the local file system into one file in HDFS. The code snippet is the same as the one provided in the book.
import java.lang.*;
import java.util.*;
import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
public class PutMerge {
public static void main(String[] args) throws IOException{
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(conf);
FileSystem local = FileSystem.getLocal(conf);
Path inputDir = new Path(args[0]); // First argument has the input directory
Path hdfsFile = new Path(args[1]); // Concatenated hdfs file name
try {
FileStatus[] inputFiles = local.listStatus(inputDir); // list of Local Files
FSDataOutputStream out = hdfs.create(hdfsFile); // target file creation
for (int i = 0; i < inputFiles.length; i++) {
FSDataInputStream in = local.open(inputFiles[i].getPath());
int bytesRead = 0;
byte[] buff = new byte[256];
while ((bytesRead = in.read(buff)) > 0) {
out.write(buff,0,bytesRead);
}
in.close();
}
out.close();
}
catch(Exception e) {
e.printStackTrace();
}
}
}
The program compiled successfully, but when I try to run it I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:37)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:34)
at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:217)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:185)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:237)
at org.apache.hadoop.security.KerberosName.(KerberosName.java:79)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:210)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:185)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:237)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:482)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:468)
at org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:1519)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at PutMerge.main(PutMerge.java:16)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 17 more
Based on inputs from some of the posts, I added the commons package. My classpath definition is
/usr/java/jdk1.7.0_21:/data/commons-logging-1.1.2/commons-logging-1.1.2.jar:/data/hadoop-1.1.2/hadoop-core-1.1.2.jar:/data/commons-logging-1.1.2/commons-logging-adapters-1.1.2.jar:/data/commons-logging-1.1.2/commons-logging-api-1.1.2.jar:.
Any clue on why this is not working?
You didn't include Apache Commons Configuration (the commons-configuration jar) in your classpath.
Really, though, you shouldn't need to include much besides Hadoop itself. Make sure you are running your jar with the hadoop command:
> hadoop jar myJar.jar
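If you do launch with plain java, it helps to confirm what is really on the classpath: the one listed in the question has commons-logging and hadoop-core, but no commons-configuration jar. A rough sketch of such a check follows; the class name ClasspathScan and the helper matching are invented for illustration, and it only matches on jar file names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ClasspathScan {

    // Returns the classpath entries whose file name contains the given fragment.
    static List<String> matching(String classpath, String fragment) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (new File(entry).getName().contains(fragment)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        // An empty list means the jar is missing from the effective classpath.
        System.out.println("commons-configuration: " + matching(cp, "commons-configuration"));
        System.out.println("hadoop-core: " + matching(cp, "hadoop-core"));
    }
}
```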