Could not find or load main class hdfs problem - hadoop

I am trying to use Apache Rya for some tests (https://rya.apache.org/).
For those who are familiar with Rya and RDF stores, I am trying to do a bulk loading which is explained here: https://github.com/apache/rya/blob/master/extras/rya.manual/src/site/markdown/loaddata.md.
Briefly, I should copy a JAR file 'mapreduce/target/rya.mapreduce-<version>-shaded.jar' into an HDFS volume and then run the following command:
hadoop hdfs://volume/rya.mapreduce-<version>-shaded.jar org.apache.rya.accumulo.mr.tools.RdfFileInputTool -Dac.zk=localhost:2181 -Dac.instance=accumulo -Dac.username=root -Dac.pwd=secret -Drdf.tablePrefix=rya_ -Drdf.format=N-Triples hdfs://volume/dir1,hdfs://volume/dir2,hdfs://volume/file1.nt
I copied the required JAR and the input files into HDFS with bin/hadoop fs -put and verified that they are really there. My problem is that when I run the command from the official example, I get the following error lines, which I can neither understand nor resolve:
/project/hadoop/libexec/hadoop-functions.sh: line 2393: HADOOP_HDFS://LOCALHOST:9000/USER/RYA.MAPREDUCE-4.0.0-INCUBATING-SHADED.JAR_USER: invalid variable name
/project/hadoop/libexec/hadoop-functions.sh: line 2358: HADOOP_HDFS://LOCALHOST:9000/USER/RYA.MAPREDUCE-4.0.0-INCUBATING-SHADED.JAR_USER: invalid variable name
/project/hadoop/libexec/hadoop-functions.sh: line 2453: HADOOP_HDFS://LOCALHOST:9000/USER/RYA.MAPREDUCE-4.0.0-INCUBATING-SHADED.JAR_OPTS: invalid variable name
Error: Could not find or load main class hdfs:..localhost:9000.user.rya.mapreduce-4.0.0-incubating-shaded.jar
For information: all environment variables (HADOOP_HOME and HADOOP_PREFIX) are properly set.
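Reading the error output closely suggests what happened: the hadoop launcher appears to treat its first argument as a subcommand name, so the HDFS URI ends up upper-cased inside shell variable names and is finally handed to the JVM as a class name. The Python sketch below is only a rough model of that behaviour, reconstructed from the messages above rather than copied from hadoop-functions.sh, and the function names are hypothetical.

```python
# Rough model of how the first CLI argument seems to be handled when it is
# not a known subcommand. Function names are illustrative, not Hadoop's own.

def subcommand_var(program: str, subcommand: str, suffix: str) -> str:
    """Build a shell variable name such as HADOOP_<SUBCOMMAND>_USER."""
    return f"{program.upper()}_{subcommand.upper()}_{suffix}"


def as_class_name(argument: str) -> str:
    """Mimic the JVM error message, which shows '/' in a class name as '.'."""
    return argument.replace("/", ".")


first_arg = "hdfs://localhost:9000/user/rya.mapreduce-4.0.0-incubating-shaded.jar"

# Reproduces the "invalid variable name" strings from the error output.
for suffix in ("USER", "OPTS"):
    print(subcommand_var("hadoop", first_arg, suffix))

# Reproduces the class name shown in the final error line.
print("Could not find or load main class", as_class_name(first_arg))
```

If this reading is right, the launcher never saw a subcommand at all. The usual form for running a class from a JAR is hadoop jar <jar> <main class> <args>, and as far as I know that form expects the JAR on the local filesystem, so both points are worth checking against the command above.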

Related

Sqoop through JAVA API

We are trying to import data from MySQL into HDFS with Sqoop. When we run the code, the data is stored in the local file system, but we want it in HDFS. Can anyone suggest what is wrong with the following code?
SqoopOptions options = new SqoopOptions();
options.setConnectString("jdbc:mysql://hostname/db_name");
options.setUsername("user");
options.setPassword("pass");
options.setTableName("table");
options.setDirectMode(true);
options.setNumMappers(4);
options.setDriverClassName("com.mysql.jdbc.Driver");
options.setSqlQuery("select * from table");
options.setWhereClause("value > 15.0");
options.setTargetDir("output");
options.doHiveImport();
System.out.println();
int ret=new ImportTool().run(options);
System.out.println(ret);
I ran the same program in hdfs and got the output :)
The issue here is with options.setTargetDir("output");
You are not specifying a fully qualified HDFS path. If you replace "output" with a valid HDFS path, you should be able to run the code from anywhere and still get the expected result.
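To make "valid HDFS path" concrete, here is a minimal Python sketch (not part of Sqoop) that checks whether a target directory string is a fully qualified hdfs:// URI. The helper name and the namenode:8020 address are assumptions made up for illustration.

```python
from urllib.parse import urlparse


def is_qualified_hdfs_path(path: str) -> bool:
    """Return True if path has an hdfs scheme, a host, and an absolute path."""
    parsed = urlparse(path)
    return (
        parsed.scheme == "hdfs"
        and bool(parsed.netloc)
        and parsed.path.startswith("/")
    )


# "namenode:8020" is a placeholder; substitute your cluster's fs.defaultFS.
candidates = ("output", "/user/me/output", "hdfs://namenode:8020/user/me/output")
for candidate in candidates:
    print(candidate, "->", is_qualified_hdfs_path(candidate))
```

Only the last candidate passes the check, and that is the kind of value to hand to setTargetDir.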

Reading sas file from blob storage in R

I am trying to read a .sas7bdat file from the default container. So far I have tried the following:
sas_file <- RxSasData("wasbs://container@storageaccount.blob.core.windows.net/abc/xyz.sas7bdat")
sas_df <- rxImport(sas_file)
but I get the following error:
The file 'wasbs://container@storageaccount.blob.core.windows.net/abc/xyz.sas7bdat' does not exist.
Could not open data source.
Error in doTryCatch(return(expr), name, parentenv, handler) :
Could not open data source.
The file exists at the location given in the code, yet it still throws an error. Can someone please help me with this?
Judging by your code, I think you want to load a SAS data file from HDFS on Azure HDInsight via RxSasData. However, according to the documentation, RxSasData does not seem to be supported in a Hadoop environment.
Please try copying the file to the local filesystem on the HDInsight cluster and then reading it from there.

Pig register jar, file does not exist error

I'm using the Hortonworks sandbox and trying to run a simple Pig script. There appears to be an annoying error related to "file does not exist".
Below is the script:
REGISTER '/piggybank.jar';
inp = load '/my.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage..
ERROR 2997: Encountered IOException. File does not exist:
hdfs://sandbox.hortonworks.com:8020/tmp/udfs/ '/piggybank.jar'
However, my jar is present at the root (/) and I have given it the proper permissions. I don't know why the path points to /tmp/udfs.
Can anyone suggest a fix?
Do not place the path within quotes. Also provide the full URI of the jar file.
REGISTER hdfs://sandbox.hortonworks.com:8020/piggybank.jar;
Refer to the Pig documentation for REGISTER (a jar/script).

Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found (Spark 1.6 Windows)

I am trying to access S3 files from a local Spark context using PySpark.
I keep getting:
File "C:\Spark\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o20.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
I had set os.environ['AWS_ACCESS_KEY_ID'] and os.environ['AWS_SECRET_ACCESS_KEY'] before I called df = sqc.read.parquet(input_path). I also added these lines:
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsSecretAccessKey", os.environ["AWS_SECRET_ACCESS_KEY"])
hadoopConf.set("fs.s3.awsAccessKeyId", os.environ["AWS_ACCESS_KEY_ID"])
I have also tried changing s3 to s3n and s3a; neither worked.
Any idea how to make it work?
I am on Windows 10, pySpark, Spark 1.6.1 built for Hadoop 2.6.0
I run pyspark with the hadoop-aws libraries added via --packages.
You will need to use s3n in your input path. I'm running this on macOS, so I'm not sure whether it will work on Windows.
$SPARK_HOME/bin/pyspark --packages org.apache.hadoop:hadoop-aws:2.7.1
The same package declaration also works in spark-shell:
spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.1
Then set the credentials in the shell:
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxxxxxxxxxxxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxxxxxxxxxxxxxxxx")
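Since the question uses PySpark and environment variables, here is a minimal plain-Python sketch of the two pieces involved: deriving the property names for a given scheme and rewriting the input path to s3n. It does not touch Spark itself, and the helper names, bucket and credentials are placeholders invented for the example.

```python
import os


def s3_credentials_conf(scheme: str = "s3n", env=os.environ) -> dict:
    """Map AWS credential environment variables to Hadoop property names."""
    # s3a uses different property names, so only s3 and s3n are accepted here.
    if scheme not in ("s3", "s3n"):
        raise ValueError(f"unsupported scheme for this sketch: {scheme!r}")
    return {
        f"fs.{scheme}.awsAccessKeyId": env["AWS_ACCESS_KEY_ID"],
        f"fs.{scheme}.awsSecretAccessKey": env["AWS_SECRET_ACCESS_KEY"],
    }


def rewrite_scheme(path: str, scheme: str = "s3n") -> str:
    """Swap the URI scheme, e.g. s3://bucket/key becomes s3n://bucket/key."""
    _, sep, rest = path.partition("://")
    if not sep:
        raise ValueError(f"not a URI: {path!r}")
    return f"{scheme}://{rest}"


# Placeholder credentials and bucket name, for demonstration only.
demo_env = {"AWS_ACCESS_KEY_ID": "xxxx", "AWS_SECRET_ACCESS_KEY": "yyyy"}
print(s3_credentials_conf("s3n", demo_env))
print(rewrite_scheme("s3://my-bucket/data.parquet"))
```

Each key/value pair from the dictionary could then be passed to hadoopConf.set(key, value) on the Hadoop configuration object from the question, and the rewritten path used as input_path.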

Oozie Workflow and Coordinator

I have two properties files, one for the workflow and one for the coordinator:
./job.properties and ./coordinator/job.properties
The two files are identical except that the coordinator file sets a few additional variables, listed below:
coordstartTime=2013-04-08T18:40Z
coordendTime=2020-04-08T18:40Z
coordTimeZone=GMT
oozie.coord.application.path=${workflowRoot}/coordinator
wfPath=${workflowRoot}/workflow-master.xml
Everything is fine when I run the workflow, but I get an error when I run the coordinator:
Error: E0301 : E0301: Invalid resource [filename]
The file exists, and it is listed when I run hadoop fs -ls [filename].
What am I doing wrong here?
Thanks.
The problem was that both oozie.wf.application.path and oozie.coord.application.path existed in the coordinator properties file.
I removed oozie.wf.application.path and the coordinator worked.
thanks
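For anyone who wants to catch this before submitting, here is a small Python sketch that scans a properties file for both application-path keys. The parser is deliberately simplistic (plain key=value lines only), and the sample file contents, including the workflowRoot value, are made up for the example.

```python
APPLICATION_PATH_KEYS = (
    "oozie.wf.application.path",
    "oozie.coord.application.path",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def conflicting_application_paths(props: dict) -> list:
    """Return the application-path keys present if more than one is set."""
    present = [key for key in APPLICATION_PATH_KEYS if key in props]
    return present if len(present) > 1 else []


# Hypothetical coordinator job.properties; the paths are invented.
coordinator_properties = """
workflowRoot=hdfs://namenode:8020/user/me/app
oozie.wf.application.path=${workflowRoot}
oozie.coord.application.path=${workflowRoot}/coordinator
wfPath=${workflowRoot}/workflow-master.xml
"""

conflicts = conflicting_application_paths(parse_properties(coordinator_properties))
print(conflicts)
```

A non-empty list means both keys are set in the same file, which is the situation that produced the E0301 error here.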

Resources