ClassNotFoundException while running MissingPokerCards on ec2 Instance - hadoop

I'm getting the following error when I try to run the jar file:
Exception in thread "main" java.lang.ClassNotFoundException: finalPoker.MissingPokerCards
at java.net.URLClassLoader$1.run(URLClassLoader.java:360)
at java.net.URLClassLoader$1.run(URLClassLoader.java:349)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:348)
at java.lang.ClassLoader.loadClass(ClassLoader.java:430)
at java.lang.ClassLoader.loadClass(ClassLoader.java:363)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
The following code is my MissingPokerCards program, which counts the number of missing cards from a deck of 52 cards.
package MissingPokerCards;

import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PokerCardsProgramme {

    // Mapper class (omitted here)

    // Reducer class (omitted here)

    // Main function
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        Job job = new Job(config, "Search for list of missing Cards");
        job.setJarByClass(PokerCardsProgramme.class);
        job.setMapperClass(mapper.class);
        job.setReducerClass(reducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
I compiled the code using:
javac -classpath /home/ec2-user/hadoop_home/hadoop-1.2.1/hadoop-core-1.2.1.jar PokerCardsProgramme.java
The jar is created using the following command:
jar cvf MissingPokerCards.jar PokerCardsProgramme*.class
The jar file is run using:
hadoop jar MissingPokerCards.jar MissingPokerCards.PokerCardsProgramme \input\inputcards.txt output
My Hadoop version is 1.2.1 and my Java version is 1.7.0_241.
I even tried using a different Hadoop version, 2.7.3:
hadoop jar /home/ec2-user/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar MissingPokerCards.PokerCardsProgramme inputcards.txt /output
I am still facing the same issue. I think I am missing a jar file related to the PokerCards classes.
Can anybody please help me with this problem? Am I using the correct commands to compile and run the program, or is there another way to execute the MissingPokerCards program on an EC2 instance?
I am able to run the same code in Eclipse, but when I try to execute it on EC2 I get this error.

The error has nothing to do with Hadoop or EC2; this is just a regular Java error. If you really want to run Hadoop code in AWS, use EMR, not plain EC2 instances.
Your package is MissingPokerCards. The error says it's finalPoker.
Your class is PokerCardsProgramme. Your error says it's MissingPokerCards.
FWIW, not many people write MapReduce by hand nowadays, but you should definitely be using Hadoop 2 or 3 with Java 8, not 1.2.1 with Java 7.
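For reference, here is a minimal sketch of how the package declaration, the class name, the jar layout, and the fully-qualified name passed to hadoop jar all have to line up. The mapper/reducer bodies and the assumed input format are illustrative placeholders, not the original program:

package MissingPokerCards;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PokerCardsProgramme {

    // Placeholder mapper: assumes each input line is "<suit> <rank>", e.g. "spade 7"
    // (hypothetical format -- the real input layout isn't shown in the question).
    public static class CardMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().trim().split("\\s+");
            if (parts.length == 2) {
                context.write(new Text(parts[0]), new IntWritable(Integer.parseInt(parts[1])));
            }
        }
    }

    // Placeholder reducer: reports the ranks 1..13 that never appeared for a suit.
    public static class CardReducer extends Reducer<Text, IntWritable, Text, Text> {
        @Override
        protected void reduce(Text suit, Iterable<IntWritable> ranks, Context context)
                throws IOException, InterruptedException {
            boolean[] seen = new boolean[14];
            for (IntWritable rank : ranks) {
                int r = rank.get();
                if (r >= 1 && r <= 13) seen[r] = true;
            }
            StringBuilder missing = new StringBuilder();
            for (int r = 1; r <= 13; r++) {
                if (!seen[r]) missing.append(r).append(' ');
            }
            context.write(suit, new Text(missing.toString().trim()));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        Job job = new Job(config, "Search for list of missing Cards");
        job.setJarByClass(PokerCardsProgramme.class);
        job.setMapperClass(CardMapper.class);
        job.setReducerClass(CardReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

// Compile with -d so the class files land in a MissingPokerCards/ directory that
// matches the package, jar up that directory, and pass the fully-qualified class
// name to hadoop jar (paths below are illustrative):
//   javac -classpath /home/ec2-user/hadoop_home/hadoop-1.2.1/hadoop-core-1.2.1.jar -d . PokerCardsProgramme.java
//   jar cvf MissingPokerCards.jar MissingPokerCards/*.class
//   hadoop jar MissingPokerCards.jar MissingPokerCards.PokerCardsProgramme /input/inputcards.txt /output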

Related

error: cannot import non-existent remote object, image import failing

I'm trying to import an existing Linux image. I used the following command:
terraform import azurerm_marketplace_agreement.publisher /subscriptions/YOUR-AZURE-SUBSCRIPTION-ID/providers/Microsoft.MarketplaceOrdering/agreements/publisher/offers/offer/plans/plan
But when I run this in a pipeline, I get an error on every alternate run. The error is:
Error: cannot import non-existent remote object
Do I need to do anything special in my script before I run this command?

Input path doesn't exist in pyspark for hadoop path

I am trying to fetch a file from HDFS in PySpark using Visual Studio Code.
I have checked through jps that all the nodes are active.
My file in Hadoop is:
hadoop fs -cat emp/part-m-00000
1,A,ABC
2,B,ABC
3,C,ABC
and core-site.xml contains:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
I am fetching the above-mentioned file through Visual Studio Code in PySpark, but I am getting an error like:
py4j.protocol.Py4JJavaError: An error occurred while calling o31.partitions.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/emp/part-m-00000
Please help me. I have tried giving the Hadoop path:
from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import HiveContext
sc = SparkContext('local', 'example')
hc = HiveContext(sc)
tf1 = sc.textFile("hdfs://localhost:9000/emp/part-m-00000")
print(tf1.first())
I need to get the file from Hadoop.

Spark jobs failure

When I want to launch a Spark job from R I get this error:
Error: java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82) ....
In the Spark logs (/opt/mapr/spark/spark-version/logs) I find a lot of these exceptions:
ERROR FsHistoryProvider: Exception encountered when attempting to load application log maprfs:///apps/spark/.60135a9b-ec7c-4f71-8f92-4d4d2fbb1e2b
java.io.FileNotFoundException: File maprfs:///apps/spark/.60135a9b-ec7c-4f71-8f92-4d4d2fbb1e2b does not exist.
Any idea how I could solve this issue?
You need to create a SparkContext (or get it if it already exists):
import org.apache.spark.{SparkConf, SparkContext}
// 1. Create Spark configuration
val conf = new SparkConf()
.setAppName("SparkMe Application")
.setMaster("local[*]") // local mode
// 2. Create Spark context
val sc = new SparkContext(conf)
or
SparkContext.getOrCreate()

sparkSession/sparkContext can not get hadoop configuration

I am running Spark 2, Hive, and Hadoop on my local machine, and I want to use Spark SQL to read data from a Hive table.
It all works fine when Hadoop is running at the default hdfs://localhost:9000, but if I change to a different port in core-site.xml:
<name>fs.defaultFS</name>
<value>hdfs://localhost:9099</value>
Running a simple SQL query spark.sql("select * from archive.tcsv3 limit 100").show(); in spark-shell gives me the error:
ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
.....
From local/147.214.109.160 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused;
.....
I have gotten the AlreadyExistsException before, and it doesn't seem to influence the result.
I can make it work by creating a new sparkContext:
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
sc.stop()
var sc = new SparkContext()
val session = SparkSession.builder().master("local").appName("test").enableHiveSupport().getOrCreate()
session.sql("show tables").show()
My question is: why did the initial sparkSession/sparkContext not pick up the correct configuration? How can I fix it? Thanks!
If you are using SparkSession and you want to set configuration on the Spark context, then use session.sparkContext:
val session = SparkSession
.builder()
.appName("test")
.enableHiveSupport()
.getOrCreate()
import session.implicits._
session.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
You don't need to import SparkContext or create it before the SparkSession.
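If it helps, the same pattern is available through the Java API as well; here is a rough sketch (the app name and the fs.s3.impl setting simply mirror the example above):

import org.apache.spark.sql.SparkSession;

public class HadoopConfFromSession {
    public static void main(String[] args) {
        // Build the session first; its SparkContext is created for you.
        SparkSession session = SparkSession.builder()
                .appName("test")
                .master("local[*]")
                .enableHiveSupport()
                .getOrCreate();

        // Reach the Hadoop configuration through the session's SparkContext.
        session.sparkContext().hadoopConfiguration()
                .set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");

        session.sql("show tables").show();
        session.stop();
    }
}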

Pig UDF running on AWS EMR with java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc

I am developing an application that tries to read log files stored in S3 buckets and parse them using Elastic MapReduce. Currently the log file has the following format:
-------------------------------
COLOR=Black
Date=1349719200
PID=23898
Program=Java
EOE
-------------------------------
COLOR=White
Date=1349719234
PID=23828
Program=Python
EOE
So I tried to load the file into my Pig script, but the built-in Pig loader doesn't seem to be able to load my data, so I have to create my own UDF. Since I am pretty new to Pig and Hadoop, I want to try a script written by others before I write my own, just to get a taste of how UDFs work. I found one at http://pig.apache.org/docs/r0.10.0/udf.html, the SimpleTextLoader. In order to compile this SimpleTextLoader, I have to add a few imports:
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.PigException;
import org.apache.pig.LoadFunc;
Then I found out I need to compile this file, so I had to download Subversion and build Pig:
sudo apt-get install subversion
svn co http://svn.apache.org/repos/asf/pig/trunk
ant
Now I have a pig.jar file, and I try to compile the loader:
javac -cp ./trunk/pig.jar SimpleTextLoader.java
jar -cf SimpleTextLoader.jar SimpleTextLoader.class
It compiles successfully. I start Pig, entering the grunt shell, and in grunt I try to load the file using:
grunt> register file:/home/hadoop/myudfs.jar
grunt> raw = LOAD 's3://mys3bucket/samplelogs/applog.log' USING myudfs.SimpleTextLoader('=') AS (key:chararray, value:chararray);
2012-12-05 00:08:26,737 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc Details at logfile: /home/hadoop/pig_1354666051892.log
Inside the pig_1354666051892.log, it has
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc
java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc
I also tried another UDF (UPPER.java) from http://wiki.apache.org/pig/UDFManual, and I still get the same error when trying to use the UPPER method. Can you please help me out? What's the problem here? Many thanks!
UPDATE: I did try EMR's built-in pig.jar at /home/hadoop/lib/pig/pig.jar, and I get the same problem.
Put the UDF jar in the /home/hadoop/lib/pig directory or copy the pig-*-amzn.jar file to /home/hadoop/lib and it will work.
You would probably use a bootstrap action to do either of these.
Most of the Hadoop ecosystem tools like Pig and Hive look up $HADOOP_HOME/conf/hadoop-env.sh for environment variables.
I was able to resolve this issue by adding pig-0.13.0-h1.jar (it contains all the classes required by the UDF) to the HADOOP_CLASSPATH:
export HADOOP_CLASSPATH=/home/hadoop/pig-0.13.0/pig-0.13.0-h1.jar:$HADOOP_CLASSPATH
pig-0.13.0-h1.jar is available in the Pig home directory.
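For background, a loader UDF extends org.apache.pig.LoadFunc directly, which is why that class has to be resolvable on the cluster at run time and not just at compile time. A stripped-down loader in the spirit of the docs' SimpleTextLoader (the class name and single-field output here are illustrative, not the original code) looks roughly like this:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

// Minimal LoadFunc sketch: emits each input line as a single-field tuple.
public class MinimalLineLoader extends LoadFunc {
    private RecordReader reader;
    private final TupleFactory tupleFactory = TupleFactory.getInstance();

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        return new TextInputFormat();
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!reader.nextKeyValue()) {
                return null;  // end of input
            }
            Text line = (Text) reader.getCurrentValue();
            return tupleFactory.newTuple(line.toString());
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}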
