Error on sqoop2 job submission - hadoop

I'm getting an error on sqoop2 job submission.
sqoop:000> start job --jid 1
Submission details
Job id: 1
Status: FAILURE_ON_SUBMIT
Creation date: 2013-11-06 11:21:30 IST
Last update date: 2013-11-06 11:21:30 IST
Exception: java.io.FileNotFoundException: File does not exist: hdfs://master:9000/usr/local/sqoop/server/webapps/sqoop/WEB-INF/lib/sqoop-common-1.99.3.jar
Do we need to put all the Sqoop jar files on HDFS?
I'm running Sqoop jobs on the same master node, with Hadoop 2.2.0.

I finally copied all the required jar libraries into HDFS and it worked:
hadoop fs -mkdir -p /usr/local/sqoop/server/webapps/sqoop/WEB-INF/lib/
hadoop fs -copyFromLocal /usr/local/sqoop/server/webapps/sqoop/WEB-INF/lib/*.jar /usr/local/sqoop/server/webapps/sqoop/WEB-INF/lib/
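To confirm the jars landed where the submission expects them (same path as above), a quick listing helps:
hadoop fs -ls /usr/local/sqoop/server/webapps/sqoop/WEB-INF/lib/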

Modify the mapred-site.xml file:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
I succeeded after this change.
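As a rough sanity check that jobs now go through YARN (assuming the ResourceManager is up on this node), jps should show a ResourceManager and NodeManager, and listing the cluster nodes should return at least one RUNNING entry:
jps
yarn node -list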

Related

oozie java.io.IOException: No FileSystem for scheme: hdfs

I have set up Oozie 4.3.1 with Hadoop 2.7.3.
Oozie has been set up and is running successfully, and I can see the web console at http://localhost:11000/oozie/
and can also confirm it with the Oozie status command.
Issue 1:
While running the Oozie examples, after changing job.properties with the relevant details, I get the following error.
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
bin/oozie job -oozie http://localhost:11000/oozie -config $OOZIE_HOME/examples/apps/map-reduce/job.properties -run
Error: E0902 : E0902: Exception occured: [No FileSystem for scheme: hdfs]
Issue 2: oozie admin -sharelibupdate
[ShareLib update status]
host = http://f091403isdpbato05:11000/oozie
status = java.io.IOException: No FileSystem for scheme: hdfs
The HDFS path and the other Oozie-related .xml files have also been updated with the proper configuration.
Please let me know of any solution to move ahead.
You can try adding the following to your core-site.xml:
<property>
<name>fs.file.impl</name>
<value>org.apache.hadoop.fs.LocalFileSystem</value>
<description>The FileSystem for file: uris.</description>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
<description>The FileSystem for hdfs: uris.</description>
</property>

Hadoop - Mkdirs failed to create C:\Users\acer\AppData\Local\Temp\hadoop-unjar7787707269774970262\META-INF\license

I installed Hadoop on a Windows machine in pseudo-distributed mode and tried to run a MapReduce job on it. The Namenode and Datanode ran without any problems, however, the MapReduce job kept failing with the error:
Exception in thread "main" java.io.IOException: Mkdirs failed to create C:\Users\acer\AppData\Local\Temp\hadoop-unjar7787707269774970262\META-INF\license
at org.apache.hadoop.util.RunJar.ensureDirectory(RunJar.java:128)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:104)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:81)
at org.apache.hadoop.util.RunJar.run(RunJar.java:209)
I've checked that I already have full permission to that folder, and I also tried using maven-shade-plugin with no success.
Not sure what the issue is, but here are some things to check (a quick jps check follows the configuration below):
Verify the folder permissions, with the proper user, for Temp\hadoop-unjar7787707269774970262\META-INF (you can use chmod -R 777)
Check that the NameNode is running while executing the MR job
Check that the NodeManager service is running
Check the configuration:
For Hadoop 1.x:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9101</value>
</property>
For Hadoop 2.x:
<property>
<name>mapreduce.jobtracker.address</name>
<value>localhost:9101</value>
</property>
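On the second and third points above, a quick way to confirm which daemons are actually up is jps; for a pseudo-distributed Hadoop 2.x install its output should include NameNode, DataNode, ResourceManager and NodeManager (each prefixed by a PID):
jps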

Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server

I am new to Oozie and was following this for my first Oozie Hive job.
As given in the tutorial, I created the following files in a directory:
hive-default.xml
hive_job1.hql
job.properties
workflow.xml
But when I run this command:
oozie job -oozie http://localhost:11000/ -config /home/ec2-user/ankit/oozie_job1/job.properties -submit
I get the following error:
Error: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, Authentication failed, status: 404, message: Not Found
I tried finding a solution for this on the internet, but nothing solved the problem (I might have missed something).
Please let me know where I am going wrong and what additional information is needed from my side to understand the problem.
The error is because of an incorrect value for the -oozie parameter. You forgot to add oozie at the end. It should be -oozie http://localhost:11000/oozie
oozie job -oozie http://localhost:11000/oozie -config /home/ec2-user/ankit/oozie_job1/job.properties -submit
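If you want to double-check the corrected URL before submitting (same endpoint as above), the Oozie CLI can query the server status:
oozie admin -oozie http://localhost:11000/oozie -status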
Please try setting the following properties in core-site.xml:
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
where * means all users.
Restart the Hadoop cluster after making the above changes.
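A minimal restart sketch, assuming the standard Hadoop 2.x start/stop scripts from $HADOOP_HOME/sbin are on your PATH:
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh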

Cannot load a file from Hadoop HDFS from Pig Latin

I am having trouble trying to load a CSV file. I keep getting the following error:
Input(s):
Failed to read data from "hdfs://localhost:9000/user/der/1987.csv"
Output(s):
Failed to produce result in "hdfs://localhost:9000/user/der/totalmiles3"
Looking at the Hadoop HDFS installed on my local machine, I can see the file. In fact, the file is located in multiple locations such as /, /user/, etc.
hdfs dfs -ls /user/der
Found 1 items
-rw-r--r-- 1 der supergroup 127162942 2015-05-28 12:42
/user/der/1987.csv
My Pig script is as follows:
records = LOAD '1987.csv' USING PigStorage(',') AS
(Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime,
CRSArrTime, UniqueCarrier, FlightNum, TailNum,ActualElapsedTime,
CRSElapsedTime,AirTime,ArrDelay, DepDelay, Origin, Dest,
Distance:int, TaxIn, TaxiOut, Cancelled,CancellationCode,
Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay,
lateAircraftDelay);
milage_recs= GROUP records ALL;
tot_miles = FOREACH milage_recs GENERATE SUM(records.Distance);
STORE tot_miles INTO 'totalmiles3';
I ran Pig with the -x local option and was able to read the files from my local hard disk. I got the right answer, and tail -f on the Hadoop namenode log did not scroll, which shows the job ran entirely against the local hard disk:
pig -x local totalmiles.pig
Now I am getting errors. It seems the Hadoop name server is getting the request, because tail -f shows the logs scrolling.
pig totalmiles.pig
This time the LOAD uses the full HDFS path:
records = LOAD '/user/der/1987.csv' USING PigStorage(',') AS
I get the following error:
Failed Jobs:
JobId Alias Feature Message Outputs
job_local602774674_0001 milage_recs,records,tot_miles GROUP_BY,COMBINER Message: ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:502)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:94)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
...blah...
Input(s):
Failed to read data from "/user/der/1987.csv"
Output(s):
Failed to produce result in "hdfs://localhost:9000/user/der/totalmiles3"
I used hdfs to check the permissions with mkdir, and that seems OK:
hdfs dfs -mkdir /user/der/temp2
hdfs dfs -ls /user/der
Found 3 items
-rw-r--r-- 1 der supergroup 127162942 2015-05-28 12:42
/user/der/1987.csv
drwxr-xr-x - der supergroup 0 2015-05-28 16:21
/user/der/temp2
drwxr-xr-x - der supergroup 0 2015-05-28 15:57
/user/der/test
I tried Pig with the mapreduce option and still get the same type of error:
pig -x mapreduce totalmiles.pig
2015-05-28 20:58:44,608 [JobControl] INFO org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:totalmiles.pig while submitting
ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:724)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:502)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:94)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
My core-site.xml has the temp dir as follows:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop</value>
<description>A base for other temporary directories.
</description>
</property>
and my hdfs-site.xml has the namenode and datanode directories as follows:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/dfs/datanode</value>
</property>
I've gotten a bit further in debugging the issue. It seems my namenode is misconfigured as I cannot reformat it:
[hadoop hdfs formatting gets error failed for Block pool ]
We have to give the full Hadoop file path, i.e. /user/der/1987.csv:
records = LOAD '/user/der/1987.csv' USING PigStorage(',') AS
(Year, Month, DayofMonth, DayOfWeek, DepTime, CRSDepTime, ArrTime,
CRSArrTime, UniqueCarrier, FlightNum, TailNum,ActualElapsedTime,
CRSElapsedTime,AirTime,ArrDelay, DepDelay, Origin, Dest,
Distance:int, TaxIn, TaxiOut, Cancelled,CancellationCode,
Diverted, CarrierDelay, WeatherDelay, NASDelay, SecurityDelay,
lateAircraftDelay);
If it's just for testing, you can keep the file 1987.csv in the path from where you are executing the Pig script, i.e. have 1987.csv and the .pig file in the same location.
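If you want to double-check the HDFS path before running the script (same /user/der location as above), you can list it from the Grunt shell, which forwards fs commands to HDFS:
pig
grunt> fs -ls /user/der/1987.csv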

Hadoop file system lists my own root directory

I ran into a very weird situation when trying to install single-node Hadoop YARN 2.2.0 on my Mac. I followed the tutorial at this link: http://raseshmori.wordpress.com/2012/09/23/install-hadoop-2-0-1-yarn-nextgen/.
When I start Hadoop and run jps to check the status, it shows the following (which means normal, I think):
5552 Jps
7162 ResourceManager
7512 Jps
7243 NodeManager
6962 DataNode
7060 SecondaryNameNode
6881 NameNode
However, after entering
hadoop fs -ls /
the files listed are those in my own root directory, not the Hadoop file system root. There must be some error in how I set up Hadoop that mixes my own filesystem with HDFS. Could anyone give me a hint about it?
Use the following command for accessing HDFS
hadoop fs -ls hdfs://localhost:9000/
Or
Populate ${HADOOP_CONF_DIR}/core-site.xml as follows. If you do so, you will be able to access HDFS even without specifying an hdfs:// URI.
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Add the following line at the start of the file $HOME/yarn/hadoop-2.0.1-alpha/libexec/hadoop-config.sh:
export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.0.1-alpha/etc/hadoop
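After that change (no other configuration assumed beyond the core-site.xml above), the earlier command should list the HDFS root, e.g. a /user directory, instead of your local root:
hadoop fs -ls /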
