Not able to insert data into hbase table using PIG - hadoop

If I run:
data = LOAD 'hdfs:/user/zzz/Pokemon.csv' USING PigStorage(',') AS (serial_no:int,name:chararray,type1:chararray,type2:chararray,total:int,hp:int,attack:int,defence:int,sp_attk:int,sp_def:int,speed:int);
the data loads successfully, as I can see by dumping it. But after that, when I run:
STORE data INTO 'hbase://pokemons' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name,cf:type1,cf:type2,cf:total,cf:hp,cf:attack,cf:defence,cf:sp_attk,cf:sp_def,cf:speed');
then the job fails, as you can see below:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
3.2.1 0.17.0 zzz 2019-12-11 12:57:34 2019-12-11 12:57:43 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_1576044193401_0008 data MAP_ONLY Message: Job failed! hbase://pokemons,
Input(s):
Failed to read data from "hdfs:/user/zzz/Pokemon.csv"
Output(s):
Failed to produce result in "hbase://pokemons"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1576044193401_0008
2019-12-11 12:57:43,115 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!

I'm not exactly sure what happened, but I do know that current Pig is not tested against Hadoop version 3 or higher. This is tracked in https://issues.apache.org/jira/browse/PIG-5253
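A quick way to confirm the version pairing on your cluster (a minimal check, assuming the standard CLIs are on the PATH):

hadoop version   # reports 3.2.1 in the run above
pig -version     # reports 0.17.0, which predates Hadoop 3 support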

Related

hadoop stuck at “running job”

I am trying to run the Hadoop word count program from the documentation. However, the program gets stuck at "Running job":
16/09/02 10:51:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/02 10:51:13 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/09/02 10:51:13 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/09/02 10:51:14 INFO input.FileInputFormat: Total input paths to process : 1
16/09/02 10:51:14 INFO mapreduce.JobSubmitter: number of splits:2
16/09/02 10:51:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1472783047951_0003
16/09/02 10:51:14 INFO impl.YarnClientImpl: Submitted application application_1472783047951_0003
16/09/02 10:51:14 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1472783047951_0003/
16/09/02 10:51:14 INFO mapreduce.Job: Running job: job_1472783047951_0003
The job tracking page is at http://hadoop-master:8088/proxy/application_1472783047951_0003/, and an ApplicationMaster runs on http://hadoop-slave2:8042. Since WordCount gets stuck, Hive gets stuck as well:
hive (default)> select a, b, count(1) as cnt from newtb group by a, b;
Query ID = hadoop_20160902110124_d2b2680b-c493-4986-aa84-f65794bfd8e4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1472783047951_0004, Tracking URL = http://hadoop-master:8088/proxy/application_1472783047951_0004/
Kill Command = /opt/hadoop-2.6.4/bin/hadoop job -kill job_1472783047951_0004
There is nothing wrong with a plain select *:
hive (default)> select * from newtb;
OK
1 2 3
1 3 4
2 3 4
5 6 7
8 9 0
1 8 3
Time taken: 0.101 seconds, Fetched: 6 row(s)
So I think there is something wrong with MapReduce. There is enough disk and memory. How can I solve this?
You are having issues because the ApplicationMaster is unable to start containers and run the job. First try restarting your system; if that doesn't change anything, you have to adjust the memory allocations in yarn-site.xml and mapred-site.xml. Go with basic memory settings; a minimal sketch follows the link below.
Use the following guide:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/#yarn-configuration_1
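As a minimal sketch (the values are illustrative for a small single-node VM; tune them to your machine per the guide above):

yarn-site.xml:

<!-- total memory the NodeManager may hand out to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<!-- smallest and largest container the scheduler will grant -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>

mapred-site.xml:

<!-- memory for the MapReduce ApplicationMaster and each task container -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>256</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>256</value>
</property>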
As I was running Hadoop on Ubuntu as a guest in VMware, I simply increased the RAM allocated to the Ubuntu VM from 2 GB to 4 GB; the job then continued and finished.

problems with hive query in cloudera

I can run all other queries in Hive, but when I do a join it just gets stuck.
hive> select count (*) from tab10 join tab1;
Warning: Map Join MAPJOIN[13][bigTable=tab10] in task 'Stage-2:MAPRED' is a cross product
Query ID = root_20160406145959_b57642e0-7499-41a0-914c-0004774fe4ac
Total jobs = 1
Execution log at: /tmp/root/root_20160406145959_b57642e0-7499-41a0-914c-0004774fe4ac.log
2016-04-06 03:00:03 Starting to launch local task to process map join; maximum memory = 2058354688
2016-04-06 03:00:03 Dump the side-table for tag: 1 with group count: 1 into file: file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile01--.hashtable
2016-04-06 03:00:03 Uploaded 1 File to: file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile01--.hashtable (280 bytes)
2016-04-06 03:00:03 End of local task; Time Taken: 0.562 sec.
It's hung at this point, and it doesn't spawn any of the MapReduce tasks at all. What could be wrong?
I did see this in hive.log.
2016-04-06 15:00:00,124 INFO [main]: ql.Driver (Driver.java:launchTask(1643)) - Starting task [Stage-5:MAPREDLOCAL] in serial mode
2016-04-06 15:00:00,125 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(159)) - Generating plan file file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10006/plan.xml
2016-04-06 15:00:00,233 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(288)) - Executing: /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/lib/hadoop/bin/hadoop jar /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/jars/hive-exec-1.1.0-cdh5.5.2.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10006/plan.xml -jobconffile file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10007/jobconf.xml
There is nothing beyond this. Does anyone know how to fix this?
Open the mapred-site.xml file and add the property:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
You need to increase the heap memory used by the Hadoop JVM.
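On YARN-based Hadoop (which includes CDH 5), mapred.child.java.opts is a deprecated name; the per-task equivalents, shown here with the same illustrative 1 GB heap, are:

<!-- heap for each map task JVM -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<!-- heap for each reduce task JVM -->
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024m</value>
</property>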

Pig script for frequency of books published each year

I am trying to run the Pig script by following the steps given at this link: http://www.orzota.com/pig-tutorialfor-beginners/
But I am getting the error below: Pig is not able to read the file loaded into HDFS.
Can you please help? The error is as follows:
Failed Jobs:
JobId Alias Feature Message Outputs
N/A BookXRecords,CountByYear,GroupByYear GROUP_BY,COMBINER Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
at java.lang.Thread.run(Thread.java:662)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:271)
/user/hduser/output/pig_output_bookx,
Input(s):
Failed to read data from "/user/hduser/input/BX-BooksCorrected1.txt"
Output(s):
Failed to produce result in "/user/hduser/output/pig_output_bookx"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2015-02-19 22:19:45,852 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
As I understand it, you have just downloaded the script and run it. That is why the script cannot find the file it expects. Please ensure:
You have run the sed command to filter the books.csv file on your local system.
The filtered file is named exactly as mentioned in the script (not sure what it is in your case, but it should be BX-BooksCorrected or BX-BooksCorrected1; please check).
You have moved that file into HDFS. Then try running the script again; it will work and won't give the error. A minimal sketch of these steps follows.
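For example (the sed expression here is only a placeholder; use the exact command given in the tutorial):

# clean the raw file on the local filesystem (placeholder expression)
sed -e 's/"//g' BX-Books.csv > BX-BooksCorrected1.txt
# copy the filtered file into HDFS where the script expects it
hadoop fs -mkdir /user/hduser/input
hadoop fs -put BX-BooksCorrected1.txt /user/hduser/input/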
P.S.: You can learn the nature of an error by carefully reading the error log. Happy Hadooping!

Loading data from HDFS to HBASE

I'm using Apache Hadoop 1.1.1 and Apache HBase 0.94.3. I want to load data from HDFS into HBase.
I wrote a Pig script for the purpose: first I created the HBase table, then wrote the Pig script to load the data from HDFS into HBase. But it is not loading the data into the HBase table, and I don't know where it's going wrong.
Below is the command used to create the HBase table (in the HBase shell):
create 'mydata', 'mycf'
Below is the Pig script to load data from HDFS to HBase:
A = LOAD '/user/hduser/Dataparse/goodrec1.txt' USING PigStorage(',') as (c1:int, c2:chararray,c3:chararray,c4:int,c5:chararray);
STORE A INTO 'hbase://mydata'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'mycf:c1,mycf:c2,mycf:c3,mycf:c4,mycf:c5');
After executing the script, it says:
2014-04-29 16:01:06,367 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-04-29 16:01:06,376 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-04-29 16:01:06,382 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.1.1 0.12.0 hduser 2014-04-29 15:58:07 2014-04-29 16:01:06 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201403142119_0084 A MAP_ONLY Message: Job failed! Error - JobCleanup Task Failure, Task: task_201403142119_0084_m_000001 hbase://mydata,
Input(s):
Failed to read data from "/user/hduser/Dataparse/goodrec1.txt"
Output(s):
Failed to produce result in "hbase://mydata"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201403142119_0084
2014-04-29 16:01:06,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
Please help; where am I going wrong?
You have specified too many columns in the output to HBase. You have 5 input columns and 5 output columns, but HBaseStorage requires the first column to be the row key, so there should only be 4 in the output.
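Assuming c1 is meant to be the row key, the STORE maps only the remaining four columns:

STORE A INTO 'hbase://mydata'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
    'mycf:c2,mycf:c3,mycf:c4,mycf:c5');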

Transfer data to HBASE using Pig

I have a pseudo-distributed setup. Versions are as follows:
kandabap@prakashl:~$ hadoop version
Hadoop 1.2.0
kandabap@prakashl:~$ pig -version
Apache Pig version 0.12.0 (r1529718) compiled Oct 07 2013, 12:20:14
kandabap@prakashl:~$ hbase version
HBase 0.90.1-cdh3u0
I am trying to transfer data into HBase using a Pig script as follows:
PIG_CLASSPATH=/usr/lib/hbase/hbase-0.94.8/hbase-0.94.8.jar:/usr/lib/hbase/hbase-0.94.8/lib/zookeeper-3.4.5.jar /usr/lib/pig/pig-0.12.0/bin/pig /home/kandabap/Documents/H/HBASE/scripts.pig
But there seem to be some errors:
Failed Jobs:
JobId Alias Feature Message Outputs
job_201405201031_0005 raw_data MAP_ONLY Message: Job failed! Error - JobCleanup Task Failure, Task: task_201405201031_0005_m_000001 hbase://sample_names,
Input(s):
Failed to read data from "/input.csv"
Output(s):
Failed to produce result in "hbase://sample_names"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201405201031_0005
2014-05-20 11:14:59,604 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-05-20 11:14:59,608 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /usr/lib/hbase/hbase-0.94.8/pig_1400548430579.log
