Transfer data to HBASE using Pig - hadoop

I have pseudo distributed mode.. Versions are as follows...
kandabap#prakashl:~$ hadoop version
Hadoop 1.2.0
kandabap#prakashl:~$ pig version
Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
kandabap#prakashl:~$ hbase version
HBase 0.90.1-cdh3u0
I am trying to transfer data into HBASE using PIG script as follows...
PIG_CLASSPATH=/usr/lib/hbase/hbase-0.94.8/hbase-0.94.8.jar:/usr/lib/hbase/hbase-0.94.8/lib/zookeeper-3.4.5.jar /usr/lib/pig/pig-0.12.0/bin/pig /home/kandabap/Documents/H/HBASE/scripts.pig
But there seems to be some errors as,
Failed Jobs:
JobId Alias Feature Message Outputs
job_201405201031_0005 raw_data MAP_ONLY Message: Job failed! Error - JobCleanup Task Failure, Task: task_201405201031_0005_m_000001 hbase://sample_names,
Input(s):
Failed to read data from "/input.csv"
Output(s):
Failed to produce result in "hbase://sample_names"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201405201031_0005
2014-05-20 11:14:59,604 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-05-20 11:14:59,608 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /usr/lib/hbase/hbase-0.94.8/pig_1400548430579.log

Related

Not able to insert data into hbase table using PIG

if i run ->
data = LOAD 'hdfs:/user/zzz/Pokemon.csv' USING PigStorage(',') AS (serial_no:int,name:chararray,type1:chararray,type2:chararray,total:int,hp:int,attack:int,defence:int,sp_attk:int,sp_def:int,speed:int);
data loaded successfully as i can see by dumping the data.
but after that when i run ->
STORE data INTO 'hbase://pokemons' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name,cf:type1,cf:type2,cf:total,cf:hp,cf:attack,cf:defence,cf:sp_attk,cf:sp_def,cf:speed');
then the problem arises you can check that below ->
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.1 0.17.0 zzz 2019-12-11 12:57:34 2019-12-11 12:57:43 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1576044193401_0008 data MAP_ONLY Message: Job failed! hbase://pokemons,
Input(s):
Failed to read data from "hdfs:/user/zzz/Pokemon.csv"
Output(s):
Failed to produce result in "hbase://pokemons"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1576044193401_0008
2019-12-11 12:57:43,115 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
I'm not exactly sure what happened but I do know that current Pig is not tested against hadoop version 3 or higher. Tracked in https://issues.apache.org/jira/browse/PIG-5253

Pig on mapreduce mode is stuck on dumping hdfs data in Hortonworks HDP

I have some data files in my Hortonworks HDFS location. My requirement is to dump HDFS data in pig shell using pig-mapreduce mode. After loading the file data from HDFS, when trying to dump the data in pig shell using DUMP command, the map reduce job is getting stuck at 0% and not completing the job as well for a long time.
Followed the given below steps:
1) Start pig on mapreduce mode:
pig -x mapreduce
2) Load data into pig from a HDFS directory:
mapdata = load 'hdfs://ip-xxx-xx-xx-xx.us-east-2.compute.internal:8020/user/abc/datadir1' as (a:map[chararray]);
3) Print data:
dump mapdata;
After executing the 3rd step getting given below messages on the shell:
2018-10-09 07:25:51,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-10-09 07:25:51,099 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1539066382468_0147]

Pig script for frequency of books published each year

I am trying to run the pig script by following the steps given on this link- http://www.orzota.com/pig-tutorialfor-beginners/
But I am getting this error.It is not able to read the file loaded into HDFS.
Can you please help? The error is as follows-
Failed Jobs:
JobId Alias Feature Message Outputs
N/A BookXRecords,CountByYear,GroupByYear GROUP_BY,COMBINER Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:271)
/user/hduser/output/pig_output_bookx,
Input(s):
Failed to read data from "/user/hduser/input/BX-BooksCorrected1.txt"
Output(s):
Failed to produce result in "/user/hduser/output/pig_output_bookx"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2015-02-19 22:19:45,852 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
As I understand you have just downloaded the script and run. That is the reason why your the script is not able to find the exact file you want to run the pig script on. Please ensure:
You have run sed command to filter the books.csv file on your local system
Name the filtered file which you have mentioned in the script get after running the sed (Not sure what's in your case but it should be BX-BooksCorrected or BX-BooksCorrected1, please check)
Then move that file into HDFS and then try running the script it will work and wont give the error
P.S.: You get to know the nature of the error by carefully reading the error log. Happy Hadooping!

Loading data from HDFS to HBASE

I'm using Apache hadoop 1.1.1 and Apache hbase 0.94.3.I wanted to load data from HDFS to HBASE.
I wrote pig script to serve the purpose. First i created hbase table in habse and next wrote pig script to load the data from HDFS to HBASE. But it is not loading the data into hbase table. Dont know where it's going worng.
Below is the command used to craete hbase table :
create table 'mydata','mycf'
Below is the pig script to load data from hdfs to hbase:
A = LOAD '/user/hduser/Dataparse/goodrec1.txt' USING PigStorage(',') as (c1:int, c2:chararray,c3:chararray,c4:int,c5:chararray);
STORE A INTO 'hbase://mydata'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'mycf:c1,mycf:c2,mycf:c3,mycf:c4,mycf:c5');
After excecuting the script it says
2014-04-29 16:01:06,367 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-04-29 16:01:06,376 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-04-29 16:01:06,382 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.1.1 0.12.0 hduser 2014-04-29 15:58:07 2014-04-29 16:01:06 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201403142119_0084 A MAP_ONLY Message: Job failed! Error - JobCleanup Task Failure, Task: task_201403142119_0084_m_000001 hbase://mydata,
Input(s):
Failed to read data from "/user/hduser/Dataparse/goodrec1.txt"
Output(s):
Failed to produce result in "hbase://mydata"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201403142119_0084
2014-04-29 16:01:06,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
Please help where i'm going wrong ?
You have specified too many columns in the output to hbase. You have 5 input columns and 5 output columns, but HBaseStorage requires the first column to be the row key so there should only be 4 in the output

How to use hadoop pig streaming a compiled c program?

I have testing the hadoop pig on a small cluster.
I have successfully using pig to stream perl, python, shell script and even jars but not c binaries!
I just build a simple Hello World program in c
And compiled it as test
And run it using ./test under ubuntu11.04 and g++ compiler is up to date.
The program run perfectly in the OS.
However when I tried to stream it in pig, it is always failed!
This is the pig script:
a = load ('test.txt');
define p `./test` ship('/home/clouduser/test');
b = stream a through p;
dump p;
test.txt contains only a space
and I have test the same configuration with perl, python, shell script and java successfully.
grunt> a = load 'test.txt';
grunt> define p `./1.sh` ship('/home/clouduser/1.sh');
grunt> b = stream a through p;
grunt> dump b
2011-09-08 23:53:33,940 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: STREAMING
2011-09-08 23:53:33,940 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-09-08 23:53:34,017 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: b: Store(hdfs://cloudlab-namenode/tmp/temp-502536453/tmp-1972014919:org.apache.pig.impl.io.InterStorage) - scope-2 Operator Key: scope-2)
2011-09-08 23:53:34,026 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-09-08 23:53:34,048 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-09-08 23:53:34,048 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-09-08 23:53:34,111 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-09-08 23:53:34,126 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-09-08 23:53:35,938 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-09-08 23:53:35,994 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-09-08 23:53:36,312 [Thread-9] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-09-08 23:53:36,313 [Thread-9] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-09-08 23:53:36,324 [Thread-9] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-09-08 23:53:36,324 [Thread-9] WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2011-09-08 23:53:36,326 [Thread-9] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-09-08 23:53:36,494 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-09-08 23:53:37,101 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201109051400_0283
2011-09-08 23:53:37,101 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://172.19.1.4:50030/jobdetails.jsp?jobid=job_201109051400_0283
2011-09-08 23:54:01,755 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201109051400_0283 has failed! Stop running all dependent jobs
2011-09-08 23:54:01,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-09-08 23:54:01,774 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: './1.sh ' failed with exit status: 127
2011-09-08 23:54:01,774 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-09-08 23:54:01,776 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-cdh3u1 0.8.1-cdh3u1 clouduser 2011-09-08 23:53:34 2011-09-08 23:54:01 STREAMING
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201109051400_0283 a,b STREAMING,MAP_ONLY Message: Job failed! Error - NA hdfs://cloudlab-namenode/tmp/temp-502536453/tmp-1972014919,
Input(s):
Failed to read data from "hdfs://cloudlab-namenode/user/clouduser/test.txt"
Output(s):
Failed to produce result in "hdfs://cloudlab-namenode/tmp/temp-502536453/tmp-1972014919"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201109051400_0283
2011-09-08 23:54:01,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2011-09-08 23:54:01,793 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: './1.sh ' failed with exit status: 127
Details at logfile: /home/clouduser/pig_1315540364239.log
I even tried to run this file in shell script and ship the shell script and this c binary
but still failed!
Anyone has idea?!
StackOverflow Seems not allowing the original c code but the code runs fine
From the given log:
Failed to read data from "hdfs://cloudlab-namenode/user/clouduser/test.txt"
Make sure you have the file test.txt in the cluster path "hdfs://cloudlab-namenode/user/clouduser/test.txt"
From the log line:
2011-09-08 23:54:01,793 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: './1.sh ' failed with exit status: 127
Check if ./1.sh can be executed?

Resources