Pig Simple Dump function - hadoop

My Input file is below . I am trying to dump the loaded data in relation. I am using pig 0.12.
a,t1,1000,100
a,t1,2000,200
b,t2,1000,200
b,t2,5000,100
I entered into HDFS mode by entering pig
myinput = LOAD 'file' AS(a1:chararray,a2:chararray,amt:int,rate:int);
if i do dump myinput then it shows the below error.
describe, illustrate works fine..
so
dump myinput ;
As soon i enter the dump command i get the below error message.
ERROR org.apache.hadoop.ipc.RPC - FailoverProxy: Failing this Call: submitJob for error (RemoteException): org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: User 'myid' cannot perform operation SUBMIT_JOB on queue default.
Please run "hadoop queue -showacls" command to find the queues you have access to .
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:179)
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:136)
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:113)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:4541)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:993)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1326)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1320)
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias myinput
Is this access issues? kind of privilege issue?
Can someone help me

If you didn't mention any load functions like PigStorage('\t') then it reads data with the column separator as tab(\t) by default.
In your data, the column separator is comma(,)
So Try this one,
myinput = LOAD 'file' using PigStorage(',') AS(a1:chararray,a2:chararray,amt:int,rate:int);
Hope it should work..

you could describe your input data(separator), in your case comma :
try this code please :
myinput = LOAD 'file' USING PigStorage(',') AS (a1:chararray,a2:chararray,amt:int,rate:int);

Related

Unable to create(Mismatch between expected number of columns) Dashboard report in Jmeter...!

I have issues while generating dashboard report in Jmeter (through command line)
1)Coped reportgenerator Properties to User Properties file
2)Restarted Jmeter to pick up the data
3)Added below to user properties file:
jmeter.save.saveservice.bytes=true
jmeter.save.saveservice.label=true
jmeter.save.saveservice.latency=true
jmeter.save.saveservice.response_code=true
jmeter.save.saveservice.response_message=true
jmeter.save.saveservice.successful=true
jmeter.save.saveservice.thread_counts=true
jmeter.save.saveservice.thread_name=true
jmeter.save.saveservice.time=true
jmeter.save.saveservice.timestamp_format=ms
jmeter.save.saveservice.timestamp_format=yyyy/MM/dd HH:mm:ss
I feel main problem is with mismatch with the CSV file/JTL file I have and trying to create report. – Give me your suggestions
ERROR | An error occurred:
org.apache.jmeter.report.dashboard.GenerationException: Error while processing samples:Mismatch between expected number of columns:16 and columns in CSV file:6, check your jmeter.save.saveservice.* configuration
at org.apache.jmeter.report.dashboard.ReportGeenter code herenerator.generate(ReportGenerator.java:246)
at org.apache.jmeter.JMeter.start(JMeter.java:517)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.jmeter.NewDriver.main(NewDriver.java:248)
Caused by: org.apache.jmeter.report.core.SampleException: Mismatch between expected number of columns:16 and columns in CSV file:6, check your
jmeter.save.saveservice.* configuration
at org.apache.jmeter.report.core.CsvSampleReader.nextSample(CsvSampleReader.java:183)
at org.apache.jmeter.report.core.CsvSampleReader.readSample(CsvSampleReader.java:201)
at org.apache.jmeter.report.processor.CsvFileSampleSource.produce(CsvFileSampleSource.java:180)
at org.apache.jmeter.report.processor.CsvFileSampleSource.run(CsvFileSampleSource.java:238)
at org.apache.jmeter.report.dashboard.ReportGenerator.generate(ReportGenerator.java:244)
... 6 more
An error occurred: Error while processing samples:Mismatch between expected number of columns:16 and columns in CSV file:6, check your jmeter.save.saveservice.* configuration
errorlevel=1
I made the same mistake. Just forget about those properties and copy in user.properties file only this:
jmeter.reportgenerator.overall_granularity=60000
jmeter.reportgenerator.apdex_statisfied_threshold=1500
jmeter.reportgenerator.apdex_tolerated_threshold=3000
jmeter.reportgenerator.exporter.html.series_filter=((^s0)|(^s1))(-success|-failure)?
jmeter.reportgenerator.exporter.html.filters_only_sample_series=true
Then from the command line run this:
.\jmeter -n -t sample_jmeter_test.jmx -l test.csv -e -o tmp
Where:
.\jmeter - you run the jmeter in \bin directory
sample_jmeter_test.jmx - name of the test that will be run, located in \bin directory
test.csv - located again in the \bin directory, this is the file that all gathered statistics will be written into
tmp is the directory where I create under \bin where the dashboard files will be saved
The file csv or jtl may be in writing still, so jmeter process report try to read file while another field and row are added to the same file. Infact I resolve the error by manual running of report generation command on the same jtl file:
jmeter -g <file csv or jtl> -o <path report>
may be possible configure a delay after running load process and the report process but I don't know if exist this option.

Pig Script issues

I am using pig with Hcatalog to load data from hive external table
I enter grunt using pig -useHCatalog and execute the following:
register 'datafu'
define Enumerate datafu.pig.bags.Enumerate('1');
imported_data = load 'hive external table' using org.apache.hive.hcatalog.pig.HCatLoader() ;
converted_data = foreach imported_data generate name,ip,domain,ToUnixTime(ToDate(dateandtime,'MM/dd/yyyy hh:mm:ss.SSS aa'))as unix_DateTime,date;
grouped = group converted_data by (name,ip,domain);
result = FOREACH grouped {
sorted = ORDER converted_data BY unix_DateTime;
sorted2 = Enumerate(sorted);
GENERATE FLATTEN(sorted2);
};
All commands run and provide desired result.
Problem:
I made a pig script with above commands named as pigFinal.pig and executed the following in local mode coz script in local filesystem.
pig -useHCatalog -x local '/path/to/pigFinal.pig';
Exception
Failed to generate logical plan. Nested exception:
org.apache.pig.backend.executionengine.ExecException: ERROR 1070:
Could not resolve datafu.pig.bags.Enumerate using imports: [,
java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] at
org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1507)
at
org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9372)
at
org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11051)
at
org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10810)
at
org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10159)
at
org.apache.pig.parser.LogicalPlanGenerator.nested_command(LogicalPlanGenerator.java:16315)
at
org.apache.pig.parser.LogicalPlanGenerator.nested_blk(LogicalPlanGenerator.java:16116)
at
org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:16024)
at
org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15849)
at
org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
at
org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at
org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at
org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 17 more
Where do i need register datafu jar for pig scripts?I guess this is the issue.
Please help
You have to ensure the jar file is located in the same folder as your pigscript or ensure that the correct path is provided in the pigscript while registering the jar file. So in your case
Modify this
register 'datafu'
To
-- If,lets say datafu-1.2.0.jar is your jar file and is located in the same folder as your pigscript then in your pigscript at the top have this
REGISTER datafu-1.2.0.jar
-- Else,lets say datafu-1.2.0.jar is your jar file and is located in the folder /usr/hadoop/lib then in your pigscript at the top have this
REGISTER /usr/hadoop/lib/datafu-1.2.0.jar
pig -useHCatalog \
-x local \
-Dpig.additional.jars="/local/path/to/datafu.jar:/local/path//other.jar" \
/path/to/pigFinal.pig;
OR
in your pig script use fully qualified path
register /local/path/to/datafu.jar;

Error getting when passing parameter through pig script

When I'm trying to invoke pig script with property file then I'm getting error:
pig -P /mapr/ANALYTICS/apps/PigTest/pig.properties -f pig_if_condition.pig
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hbase/hbase-0.98.4/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/05/20 15:42:52 ERROR pig.Main: ERROR 2999: Unexpected internal error. Unable to parse properties file
'/mapr/ANALYTICS/apps/PigTest/pig.properties'
15/05/20 15:42:52 WARN pig.Main: There is no log file to write to.
15/05/20 15:42:52 ERROR pig.Main: java.lang.RuntimeException: Unable to parse properties file
'/mapr/ANALYTICS/apps/PigTest/pig.properties'
at org.apache.pig.Main.run(Main.java:343)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Pig script is:
test = load '$path' USING PigStorage(',') AS (ip:chararray,country:chararray);
DUMP test;
-param (-p) is to specify a single parameter. To specify parameter file we have to use -param_file attribute.
Short Cut Commands :
-m same as -param_file
-p same as -param
Usage :
pig -param_file {property_file} -f {pig_file}
example :
pig -param_file a.properties -f a.pig
Pig Script : a.pig
A = LOAD '$INPUT' USING PigStorage(',') AS (country_code:chararray, country_name:chararray);
DUMP A;
Property File : a.properties
INPUT=a.csv
test file : a.csv
IN,India
US,United States
UK,United Kingdom
Output :
(IN,India)
(US,United States)
(UK,United Kingdom)

Using s3distcp with Amazon EMR to copy a single file

I want to copy just a single file to HDFS using s3distcp. I have tried using the srcPattern argument but it didn't help and it keeps on throwing java.lang.Runtime exception.
It is possible that the regex I am using is the culprit, please help.
My code is as follows:
elastic-mapreduce -j $jobflow --jar s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar --args '--src,s3://<mybucket>/<path>' --args '--dest,hdfs:///output' --arg --srcPattern --arg '(filename)'
Exception thrown:
Exception in thread "main" java.lang.RuntimeException: Error running job at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:586) at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/tmp/a088f00d-a67e-4239-bb0d-32b3a6ef0105/files at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197) at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208) at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1036) at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1028) at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:897) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:871) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1308) at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:568) ... 9 more
DistCp is intended to copy many files using many machines. DistCp is not the right tool if you want to only copy one file.
On the hadoop master node, you can copy a single file using
hadoop fs -cp s3://<mybucket>/<path> hdfs:///output
The regex I was using was indeed the culprit.
Say the file names have dates, for example files are like abcd-2013-06-12.gz , then in order to copy ONLY this file, following emr command should do:
elastic-mapreduce -j $jobflow --jar
s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar
--args '--src,s3:///' --args '--dest,hdfs:///output' --arg --srcPattern --arg '.*2013-06-12.gz'
If I remember correctly, my regex initially was *2013-06-12.gz and not .*2013-06-12.gz. So the dot at the beginning was needed.

Pig is not loading csv

I'm trying to load a pipe delimited file ('|') in pig using the following command:
A = load 'test.csv' using PigStorage('|');
But I keep getting this error:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. java.net.URISyntaxException cannot be cast to java.lang.Error
I've looked all over, but I can't find any reason this would happen. The test file I have above is a simple file that just contains 1|2|3 for testing.
If you are running Pig in MAPREDUCE as the ExecType mode, then the following command should work
A = LOAD '/user/pig/input/pipetest.csv' USING PigStorage('|');
DUMP A;
Here is the output on your screen
(1,2,3)
Note that I have included the full path in HDFS for my csv file in the LOAD command

Resources