Hadoop 2.7.3 is installed on my Mac at:
/usr/local/Cellar/hadoop/2.7.3
I wrote a demo that reads a file from HDFS using Java:
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSTest {
    public static void main(String[] args) throws IOException, URISyntaxException {
        String file = "hdfs://localhost:9000/hw1/customer.tbl";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(file), conf);
        Path path = new Path(file);
        FSDataInputStream in_stream = fs.open(path);
        BufferedReader in = new BufferedReader(new InputStreamReader(in_stream));
        String s;
        while ((s = in.readLine()) != null) {
            System.out.println(s);
        }
        in.close();
        fs.close();
    }
}
When I compile the Java file, I get the errors shown below:
hero:Documents yaopan$ javac HDFSTest.java
HDFSTest.java:8: error: package org.apache.hadoop.conf does not exist
import org.apache.hadoop.conf.Configuration;
^
HDFSTest.java:10: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FSDataInputStream;
^
HDFSTest.java:12: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FSDataOutputStream;
^
HDFSTest.java:14: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FileSystem;
                            ^
I know the reason is that the Hadoop jars cannot be found. How do I configure that?
Locate the jar file named "hadoop-common-2.7.3.jar" under your installation (e.g. /usr/local/Cellar/hadoop/2.7.3) and either add it to your classpath or pass it directly to javac on the command line:
javac -cp "/PATH/hadoop-common-2.7.3.jar" HDFSTest.java
(replace PATH with the appropriate path)
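Note that hadoop-common covers only the compile-time imports above; at run time you will generally need the rest of the Hadoop client jars too. As a sketch, assuming the hadoop command from your installation is on the PATH, you can let Hadoop build the full classpath for you:
javac -cp "$(hadoop classpath)" HDFSTest.java
java -cp ".:$(hadoop classpath)" HDFSTest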
Just add the Hadoop jars to your classpath.
I installed HBase with Homebrew under /usr/local/Cellar/hbase/1.2.2 and added all jars under /usr/local/Cellar/hbase/1.2.2/libexec/lib to the classpath:
1. Edit .bash_profile:
sudo vim ~/.bash_profile
2. Add the classpath:
#set hbase lib path
export CLASSPATH=$CLASSPATH:/usr/local/Cellar/hbase/1.2.2/libexec/lib/*
3. Save and exit:
:wq
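To pick up the change in the current shell and recompile (a minimal sketch, assuming the HBase lib directory also bundles the hadoop-common and hadoop-hdfs jars the program needs):
source ~/.bash_profile
javac HDFSTest.java
java HDFSTest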
I am learning Hadoop. I tried running my Pig script from Java, but it seems to skip the STORE command in the script and does not produce the output data file at the specified location.
When I run the same Pig script from the command line, it produces the output data file as expected.
At first I thought Java might have a permission issue preventing it from creating the file, but when I tried creating a file at the exact same location from Java it easily created an empty file, so it does not seem to be a permission issue.
Can anybody tell me why the Pig script runs successfully from the command line but fails in embedded mode?
Java Code:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.pig.PigServer;
import java.io.IOException;
public class storePig {
    public static void main(String args[]) throws Exception {
        try {
            PigServer pigServer = new PigServer("local");
            runQuery(pigServer);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runQuery(PigServer pigServer) throws IOException {
        pigServer.registerScript("/home/anusharma/Desktop/stackoverflow/sampleScript.pig");
    }
}
Pig-Script:
Employee = LOAD '/home/anusharma/Desktop/Hadoop/Pig/record.txt' using PigStorage(',') as (id:int, firstName:chararray, lastName:chararray, age:int, contact:chararray, city:chararray);
Employe = ORDER Employee BY age desc;
limitedEmployee = LIMIT Employe 4;
STORE limitedEmployee into '/home/anusharma/Desktop/stackoverflow/output' using PigStorage('|');
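For reference, one common pattern when embedding Pig this way (a sketch, not necessarily the fix for this exact setup) is to turn on batch mode so that STORE statements inside a registered script are actually executed, and then flush the batch explicitly:
import java.io.IOException;
import org.apache.pig.PigServer;

public class StorePigBatch {
    public static void main(String[] args) throws Exception {
        PigServer pigServer = new PigServer("local");
        // Queue the statements from the script instead of running them one by one.
        pigServer.setBatchOn();
        pigServer.registerScript("/home/anusharma/Desktop/stackoverflow/sampleScript.pig");
        // Execute everything that was queued, including the STORE statement.
        pigServer.executeBatch();
    }
}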
I compiled the MapReduce code (driver, mapper and reducer classes) and created the jar file. When I run it on the dataset, it doesn't seem to run; it just returns to the prompt as shown in the image. Any suggestions, folks?
Thanks much,
basam
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
//This driver program will bring all the information needed to submit this Map reduce job.
public class MultiLangDictionary {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MultiLangDictionary <input path> <output path>");
            System.exit(-1);
        }
        Configuration conf = new Configuration();
        Job ajob = new Job(conf, "MultiLangDictionary");
        //Assigning the driver class name
        ajob.setJarByClass(MultiLangDictionary.class);
        FileInputFormat.addInputPath(ajob, new Path(args[0]));
        //first argument is the job itself
        //second argument is the location of the output dataset
        FileOutputFormat.setOutputPath(ajob, new Path(args[1]));
        ajob.setInputFormatClass(TextInputFormat.class);
        ajob.setOutputFormatClass(TextOutputFormat.class);
        //Defining the mapper class name
        ajob.setMapperClass(MultiLangDictionaryMapper.class);
        //Defining the Reducer class name
        ajob.setReducerClass(MultiLangDictionaryReducer.class);
        //setting the second argument as a path in a path variable
        Path outputPath = new Path(args[1]);
        //deleting the output path automatically from hdfs so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath);
    }
}
Try using the fully qualified class name (packagename.classname) in the command:
hadoop jar MultiLangDictionary.jar [yourpackagename].MultiLangDictionary input output
You could try adding the map and reduce output key/value types to your driver, something like this (just an example):
job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(Text.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
In the above, both the mapper and the reducer would be writing (Text, Text) in their context.write() calls.
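For illustration, a mapper whose declared output types match those settings might look like the sketch below (MultiLangDictionaryMapper is the name from the driver above, but the tab-separated input format and the splitting logic are assumptions):
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MultiLangDictionaryMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assume each input line is "word<TAB>translation" and emit (word, translation).
        String[] parts = value.toString().split("\t", 2);
        if (parts.length == 2) {
            context.write(new Text(parts[0]), new Text(parts[1]));
        }
    }
}
The declared generic types (Text, Text) must agree with setMapOutputKeyClass()/setMapOutputValueClass(), otherwise the job fails at runtime with a type mismatch.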
I am getting the error
The method addCacheFile(URI) is undefined for the type Job
with CDH4.0 when trying to call the addCacheFile(URI uri) method, as shown below:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class DistributedCacheDriver {
    public static void main(String[] args) throws Exception {
        String inputPath = args[0];
        String outputPath = args[1];
        String fileName = args[2];
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "TestingDistributedCache");
        job.setJarByClass(DistributedCache.class);
        job.addCacheFile(new URI(fileName)); //Getting error here - The method addCacheFile(URI) is undefined for the type Job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
Any suggestions/hints to get rid of this error?
If you have chosen to install MapReduce version 1, then you should replace the job.addCacheFile() call with DistributedCache.addCacheFile() and change the setup() method accordingly (in the old API it is called configure()).
Find the official documentation and examples here.
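A minimal sketch of that MRv1 (old API) pattern is shown below; the mapper class name, its input/output types, and the lookup logic are assumptions for illustration:
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CacheAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private Path[] localFiles;

    // In the driver, register the file on the Configuration instead of on the Job:
    // DistributedCache.addCacheFile(new URI(fileName), conf);

    @Override
    public void configure(JobConf job) {
        // The old API has no setup(); read the localized cache files here instead.
        try {
            localFiles = DistributedCache.getLocalCacheFiles(job);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // ... use localFiles to load the cached lookup data and emit results ...
    }
}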
I submit a Spark job from Java as a RESTful service. I keep getting the following error:
Application application_1446816503326_0098 failed 2 times due to AM Container for appattempt_1446816503326_0098_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://ip-172-31-34-108.us-west-2.compute.internal:8088/proxy/application_1446816503326_0098/ Then, click on links to logs of each attempt.
Diagnostics: java.io.FileNotFoundException: File file:/opt/apache-tomcat-8.0.28/webapps/RESTfulExample/WEB-INF/lib/spark-yarn_2.10-1.3.0.jar does not exist
Failing this attempt. Failing the application.
The spark-yarn_2.10-1.3.0.jar file is present in that lib folder.
Here is my program:
package SparkSubmitJava;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import java.io.IOException;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;
#Path("/spark")
public class JavaRestService {
#GET
#Path("/{param}/{param2}/{param3}")
public Response getMsg(#PathParam("param") String bedroom,#PathParam("param2") String bathroom,#PathParam("param3")String area) throws IOException {
String[] args = new String[] {
"--name",
"JavaRestService",
"--driver-memory",
"1000M",
"--jar",
"/opt/apache-tomcat-8.0.28/webapps/scalatest-0.0.1-SNAPSHOT.jar",
"--class",
"ScalaTest.ScalaTest.ScalaTest",
"--arg",
bedroom,
"--arg",
bathroom,
"--arg",
area,
"--arg",
"yarn-cluster",
};
Configuration config = new Configuration();
System.setProperty("SPARK_YARN_MODE", "true");
SparkConf sparkConf = new SparkConf();
ClientArguments cArgs = new ClientArguments(args, sparkConf);
Client client = new Client(cArgs, config, sparkConf);
client.run();
return Response.status(200).entity(client).build();
}
}
Any help will be appreciated.
I have a simple Java program called PutMerge that I am trying to execute. I have been at it for about six hours and researched many places on the web, but could not find a solution. Basically, I compile the class with all the library jars on the classpath using the following command:
javac -classpath *:lib/* -d playground/classes playground/src/PutMerge.java
And then I build the jar with the following command.
jar -cvf playground/putmerge.jar -C playground/classes/ .
And then I try to execute it with the following command:
bin/hadoop jar playground/putmerge.jar org.scd.putmerge "..inputPath.." "..outPath"
..
Exception in thread "main" java.lang.ClassNotFoundException: com.scd.putmerge
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
I tried every permutation and combination to run this simple jar, but I always get some kind of exception, as shown above.
My source code:
package org.scd.putmerge;
import java.io.IOException;
import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
/**
 *
 * @author Anup V. Saumithri
 *
 */
public class PutMerge
{
    public static void main(String[] args) throws IOException
    {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        Path inputDir = new Path(args[0]);
        Path hdfsFile = new Path(args[1]);
        try
        {
            FileStatus[] inputFiles = local.listStatus(inputDir);
            FSDataOutputStream out = hdfs.create(hdfsFile);
            for (int i = 0; i < inputFiles.length; i++)
            {
                System.out.println(inputFiles[i].getPath().getName());
                FSDataInputStream in = local.open(inputFiles[i].getPath());
                byte[] buffer = new byte[256];
                int bytesRead = 0;
                while ((bytesRead = in.read(buffer)) > 0)
                {
                    out.write(buffer, 0, bytesRead);
                }
                in.close();
            }
            out.close();
        }
        catch (IOException ex)
        {
            ex.printStackTrace();
        }
    }
}
The way you are putting your PutMerge class inside the jar may be slightly incorrect.
If you run jar tf putmerge.jar, you should see the PutMerge class under the path that matches the package declared in your code (org.scd.putmerge), i.e. org/scd/putmerge/.
If not, do the following to achieve that. Make sure you have copied PutMerge.class into the org/scd/putmerge/ directory, then rebuild:
jar -cvf playground/putmerge.jar org/scd/putmerge/PutMerge.class
Next, verify again with jar tf putmerge.jar to check that you now see org/scd/putmerge/PutMerge.class in the output.
If everything is fine, you can try running hadoop jar again. But looking at the error, I see that you haven't actually included the package when specifying the PutMerge class; you should use org.scd.putmerge.PutMerge. So the correct way should be something like:
bin/hadoop jar playground/putmerge.jar org.scd.putmerge.PutMerge "..inputPath.." "..outPath"
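Putting it together, a rebuild-and-verify sequence along those lines might look like the sketch below (the use of hadoop classpath to supply the compile-time classpath is an assumption; adjust the paths to your layout):
javac -classpath "$(bin/hadoop classpath)" -d playground/classes playground/src/PutMerge.java
jar -cvf playground/putmerge.jar -C playground/classes .
jar tf playground/putmerge.jar   # should list org/scd/putmerge/PutMerge.class
bin/hadoop jar playground/putmerge.jar org.scd.putmerge.PutMerge "..inputPath.." "..outPath"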