Running pig script gives the error as: job has failed. Stop running all dependent jobs - hadoop

I need help understanding why I am getting an error while running a Pig script. When I try the same script on a smaller data set, it executes successfully.
There are a few questions with similar issues, but none of them has a solution.
My script looks like this:
A = load 'test.txt' using TextLoader();
B = foreach A generate STRSPLIT($0, '","') as t;
C = FILTER B BY (t.$1 == 2 and t.$2 matches '.*xxx.*');
store C into 'temp';
The error is:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 40% complete
2013-07-15 14:21:41,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201307111759_7495 has failed! Stop running all dependent jobs
2013-07-15 14:21:41,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-07-15 14:21:42,754 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /xxx/temp/_temporary/_attempt_201307111759_7495_m_000527_0/part-m-00527 File does not exist. Holder DFSClient_attempt_201307111759_7495_m_000527_0 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1606)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1597)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1652)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1640)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:689)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.c
2013-07-15 14:21:42,754 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
Any help will be appreciated.
Thanks.

After some research, I found that the problem here is the LeaseExpiredException. This might be because the output of the mapper was removed; one reason for that can be the quota allocated to the user. In my case, I was running this script on a very large data set, and my quota was insufficient to process/store the data.
We can check the quota with the following command:
hadoop fs -count -q /user/username
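For reference, the output columns of this command are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. A sample (with illustrative figures only) looks like:
$ hadoop fs -count -q /user/username
   none   inf   54975581388800   5277747062870   3922   418464   673569740775   /user/username
If REMAINING_SPACE_QUOTA is near zero (or the quota is reported as exceeded), intermediate task output can fail to be written, which surfaces as the LeaseExpiredException above.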
Thank you.

Related

Not able to export Hbase table into CSV file using HUE Pig Script

I have installed Apache Ambari and configured Hue. I want to export HBase table data into a CSV file using a Pig script, but I am getting the following error.
2017-06-03 10:27:45,518 [ATS Logger 0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Exception caught by TimelineClientConnectionRetry, will try 30 more time(s).
Message: java.net.ConnectException: Connection refused
2017-06-03 10:27:45,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-06-03 10:27:45,709 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 101: file '/usr/lib/hbase/lib/hbase-common-1.2.0-cdh5.11.0.jar' does not exist.
2017-06-03 10:27:45,899 [main] INFO org.apache.pig.Main - Pig script completed in 4 seconds and 532 milliseconds (4532 ms)
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code 2
Oozie Launcher failed, finishing Hadoop job gracefully
Please help me find where I am going wrong.
Let me know your thoughts.

Getting an error while loading a file in Pig

I am trying to execute a Pig script in the terminal and I am getting the following error:
INFO [Thread-13] org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
WARN [Thread-13] org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
INFO [Thread-13] org.apache.hadoop.mapred.JobClient - Cleaning up the staging area file:/tmp/hadoop-biadmin/mapred/staging/biadmin-341199244/.staging/job_local_0001
ERROR [Thread-13] org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:biadmin cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/home/biadmin/PIGData/books.csv
ERROR [main] org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/home/biadmin/PIGData/books.csv
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:886)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:738)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/biadmin/PIGData/books.csv
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
ERROR [main] org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
ERROR [main] org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias b
Details at logfile: /opt/ibm/biginsights/pig/bin/pig_1487413261020.log
Can anybody help me resolve this?
The code:
data = LOAD '/home/biadmin/PIGData/books.csv';
b = FOREACH data GENERATE $0;
DUMP b;
Based on the above exception, the input file is not present at the given path, file:/home/biadmin/PIGData/books.csv (which is a local file system path).
Pig has two execution modes:
1. local mode (To process local file system files)
$ pig -x local
2. Mapreduce mode (To process HDFS file system files)
$ pig (or $ pig -x mapreduce)
Make sure that you are running the Pig script in the appropriate mode.
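For example, a minimal sketch assuming the file and paths from the question (the HDFS target directory is an assumption):
# Option 1: keep the local path and run Pig in local mode
$ pig -x local script.pig
# Option 2: stage the file into HDFS and run in mapreduce mode
$ hadoop fs -mkdir /user/biadmin/PIGData
$ hadoop fs -put /home/biadmin/PIGData/books.csv /user/biadmin/PIGData/
$ pig -x mapreduce script.pig
In mapreduce mode, the LOAD path must then point at the HDFS copy, e.g. LOAD '/user/biadmin/PIGData/books.csv'.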

Pig script for frequency of books published each year

I am trying to run the Pig script by following the steps given at this link: http://www.orzota.com/pig-tutorialfor-beginners/
But I am getting this error. It is not able to read the file loaded into HDFS.
Can you please help? The error is as follows:
Failed Jobs:
JobId Alias Feature Message Outputs
N/A BookXRecords,CountByYear,GroupByYear GROUP_BY,COMBINER Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:271)
/user/hduser/output/pig_output_bookx,
Input(s):
Failed to read data from "/user/hduser/input/BX-BooksCorrected1.txt"
Output(s):
Failed to produce result in "/user/hduser/output/pig_output_bookx"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2015-02-19 22:19:45,852 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
As I understand it, you have just downloaded the script and run it. That is why the script is not able to find the exact file you want to run it on. Please ensure:
You have run the sed command to filter the books.csv file on your local system
The filtered file is named as mentioned in the script (not sure what it is in your case, but it should be BX-BooksCorrected or BX-BooksCorrected1; please check)
Then move that file into HDFS and try running the script again; it should work without the error (a sketch of the staging steps follows below)
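For example, a sketch of the staging steps (the sed filter shown is illustrative only, and BX-BooksCorrected1.txt is an assumed file name; use whatever the tutorial specifies):
$ sed -e 's/&amp;/\&/g' BX-Books.csv > BX-BooksCorrected1.txt   # clean the raw dump locally
$ hadoop fs -mkdir /user/hduser/input
$ hadoop fs -put BX-BooksCorrected1.txt /user/hduser/input/
After this, the path /user/hduser/input/BX-BooksCorrected1.txt from the log will actually exist.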
P.S.: You can learn the nature of an error by carefully reading the error log. Happy Hadooping!

How to Import/Load .csv file in PIG?

Let's suppose there is a tab-delimited text file (datetemp.txt). I want to load this text file in Pig for processing, but when I type the line below, it gives me an error:
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
2014-09-06 08:41:23,527 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2739171785773930333.jar
2014-09-06 08:42:39,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2739171785773930333.jar created
2014-09-06 08:42:39,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 [Thread-12] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.0.0-cdh4.1.1 0.10.0-cdh4.1.1 training 2014-09-06 08:41:23 2014-09-06 08:42:40 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
N/A inputfile MAP_ONLY Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785,
Input(s):
Failed to read data from "/training/pig/datetemp.txt"
Output(s):
Failed to produce result in "hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2014-09-06 08:42:40,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-09-06 08:42:40,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias inputfile
Details at logfile: /home/training/pig_1410006833865.log
Please help me here!
PigStorage is case sensitive. Use PigStorage and not pigstorage.
Your question headline said you were trying to load a CSV file. For that, I've had good luck using org.apache.pig.piggybank.storage.CSVExcelStorage() in my LOAD statements, as demonstrated at https://martin.atlassian.net/wiki/x/WYBmAQ.
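A minimal sketch of how that looks (the piggybank jar path and the schema are assumptions for illustration):
grunt> REGISTER /usr/lib/pig/piggybank.jar; -- jar location is an assumption; adjust to your install
grunt> inputfile = LOAD '/training/pig/datetemp.txt' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',') AS (EventID:chararray, eventdate:chararray, count:int);
grunt> dump inputfile;
Unlike plain PigStorage(','), CSVExcelStorage also handles quoted fields that contain embedded commas.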
Why don't you write PigStorage('\t'), since you mentioned you have a tab-delimited file, instead of PigStorage()?
Mentioned code -
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage()
As (EventID: chararray,eventdate: chararray,count:int);
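With the delimiter made explicit, the line would read:
grunt> inputfile = load '/training/pig/datetemp.txt' using PigStorage('\t') As (EventID: chararray,eventdate: chararray,count:int);
(PigStorage's default delimiter is already tab, so this mainly makes the intent explicit.)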
Maybe this will solve your problem.
Let me know if it is something else.
hdfs://192.168.195.130:8020/training/pig/datetemp.txt
This file was not found in your HDFS. Make sure the input file is placed at the above location.
Have you checked whether the input path exists?
Try:
fs -ls /training/pig/
in the Grunt shell. If it displays datetemp.txt in the list, it will work; otherwise, give the proper input path.
The log is telling you the error clearly:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
Can you check whether the file exists in HDFS?
You can also check whether Pig is running in mapreduce mode or local mode.
You can specify ',' in the PigStorage class to read a CSV file.
The query looks like:
grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int);
grunt> dump inputfile;
And make sure that you have the file '/training/pig/datetemp.txt' on HDFS.
To test, run: hadoop fs -ls /training/pig/datetemp.txt

Pig Join in Cloudera VM

I am trying to perform a simple join in Apache Pig. The datasets that I use are from http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html
This is what I do in the pig shell:
profiles = LOAD '/user/hadoop/tests/userid-profile.tsv' AS (id,gender,age,country, dreg);
songs = LOAD '/user/hadoop/tests/userid-timestamp-artid-artname-traid-traname.tsv' AS (userID, timestamp, artistID, artistName, trackID, trackName);
prDACH = filter profiles by country=='Germany' or country=='Austria' or country=='Switzerland';
songsDACH = join songs by userID, prDACH by id;
dump songsDACH;
This is a part of the log:
2013-04-20 01:01:33,885 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-04-20 01:02:39,802 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 2% complete
2013-04-20 01:13:23,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 37% complete
2013-04-20 01:14:48,704 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 39% complete
2013-04-20 01:15:40,166 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 41% complete
2013-04-20 01:15:41,142 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-04-20 01:15:41,143 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1366403809583_0023 has failed! Stop running all dependent jobs
2013-04-20 01:15:41,143 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-04-20 01:15:43,117 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: AttemptID:attempt_1366403809583_0023_m_000019_0 Info:Container killed by the ApplicationMaster.
When I use a small sample of the songs, the join is performed without any problem.
Any ideas?
It looks like it is a problem with the HDFS settings, since I can perform the join using a subset of the songs data (100,000 samples).
P.S. I am using the Cloudera demo VM.
You should have a look at the task attempt's log: point your browser at the JobTracker (http://[your-jobtracker-node]:50030), look for the failed job, find a failed task attempt, and browse through the log; there you'll be able to see the actual exception. I suspect it may have something to do with the task heap size configuration, but you'll have to look at the exception first and then come up with a solution (a configuration change, etc.).
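If the exception does turn out to be memory pressure in the join, one technique worth knowing about (not part of the original answer, just a hedged suggestion) is Pig's fragment-replicate join, which keeps only the small relation in memory:
songsDACH = join songs by userID, prDACH by id USING 'replicated'; -- prDACH, listed last, must fit in each task's memory
This assumes the filtered profiles relation stays small, which it should here since the profiles file holds roughly a thousand users.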
