When I try to invoke a Pig script with a property file, I get this error:
pig -P /mapr/ANALYTICS/apps/PigTest/pig.properties -f pig_if_condition.pig
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mapr/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mapr/hbase/hbase-0.98.4/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/05/20 15:42:52 ERROR pig.Main: ERROR 2999: Unexpected internal error. Unable to parse properties file
'/mapr/ANALYTICS/apps/PigTest/pig.properties'
15/05/20 15:42:52 WARN pig.Main: There is no log file to write to.
15/05/20 15:42:52 ERROR pig.Main: java.lang.RuntimeException: Unable to parse properties file
'/mapr/ANALYTICS/apps/PigTest/pig.properties'
at org.apache.pig.Main.run(Main.java:343)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
The Pig script is:
test = load '$path' USING PigStorage(',') AS (ip:chararray,country:chararray);
DUMP test;
-param (or -p) specifies a single parameter; -P, on the other hand, expects a Pig properties (configuration) file, not a parameter file. To pass a parameter file, you have to use the -param_file option.
Shortcut options:
-m same as -param_file
-p same as -param
Usage:
pig -param_file {property_file} -f {pig_file}
Example:
pig -param_file a.properties -f a.pig
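Equivalently, the same single parameter could be passed inline without a file (a small sketch using the example files shown below):
pig -param INPUT=a.csv -f a.pig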
Pig script: a.pig
A = LOAD '$INPUT' USING PigStorage(',') AS (country_code:chararray, country_name:chararray);
DUMP A;
Parameter file: a.properties
INPUT=a.csv
Test file: a.csv
IN,India
US,United States
UK,United Kingdom
Output:
(IN,India)
(US,United States)
(UK,United Kingdom)
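Applied back to the original question, the same approach would look like this (a sketch; the contents of pig.properties are assumed, since the original file isn't shown):
Parameter file: pig.properties
# hypothetical input path; substitute your real one
path=/mapr/ANALYTICS/apps/PigTest/input.csv
Command:
pig -param_file /mapr/ANALYTICS/apps/PigTest/pig.properties -f pig_if_condition.pig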
I have issues while generating a dashboard report in JMeter (through the command line).
1) Copied the reportgenerator properties to the user properties file
2) Restarted JMeter to pick up the changes
3) Added the following to the user properties file:
jmeter.save.saveservice.bytes=true
jmeter.save.saveservice.label=true
jmeter.save.saveservice.latency=true
jmeter.save.saveservice.response_code=true
jmeter.save.saveservice.response_message=true
jmeter.save.saveservice.successful=true
jmeter.save.saveservice.thread_counts=true
jmeter.save.saveservice.thread_name=true
jmeter.save.saveservice.time=true
jmeter.save.saveservice.timestamp_format=ms
jmeter.save.saveservice.timestamp_format=yyyy/MM/dd HH:mm:ss
I feel the main problem is a mismatch between the CSV/JTL file I have and the report I am trying to create. Please give me your suggestions.
ERROR | An error occurred:
org.apache.jmeter.report.dashboard.GenerationException: Error while processing samples:Mismatch between expected number of columns:16 and columns in CSV file:6, check your jmeter.save.saveservice.* configuration
at org.apache.jmeter.report.dashboard.ReportGenerator.generate(ReportGenerator.java:246)
at org.apache.jmeter.JMeter.start(JMeter.java:517)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.jmeter.NewDriver.main(NewDriver.java:248)
Caused by: org.apache.jmeter.report.core.SampleException: Mismatch between expected number of columns:16 and columns in CSV file:6, check your
jmeter.save.saveservice.* configuration
at org.apache.jmeter.report.core.CsvSampleReader.nextSample(CsvSampleReader.java:183)
at org.apache.jmeter.report.core.CsvSampleReader.readSample(CsvSampleReader.java:201)
at org.apache.jmeter.report.processor.CsvFileSampleSource.produce(CsvFileSampleSource.java:180)
at org.apache.jmeter.report.processor.CsvFileSampleSource.run(CsvFileSampleSource.java:238)
at org.apache.jmeter.report.dashboard.ReportGenerator.generate(ReportGenerator.java:244)
... 6 more
An error occurred: Error while processing samples:Mismatch between expected number of columns:16 and columns in CSV file:6, check your jmeter.save.saveservice.* configuration
errorlevel=1
I made the same mistake. Just forget about those properties and copy only this into the user.properties file:
jmeter.reportgenerator.overall_granularity=60000
jmeter.reportgenerator.apdex_satisfied_threshold=1500
jmeter.reportgenerator.apdex_tolerated_threshold=3000
jmeter.reportgenerator.exporter.html.series_filter=((^s0)|(^s1))(-success|-failure)?
jmeter.reportgenerator.exporter.html.filters_only_sample_series=true
Then from the command line run this:
.\jmeter -n -t sample_jmeter_test.jmx -l test.csv -e -o tmp
Where:
.\jmeter - runs JMeter from the \bin directory
sample_jmeter_test.jmx - the name of the test that will be run, located in the \bin directory
test.csv - also located in the \bin directory; this is the file that all gathered statistics will be written into
tmp - a directory I created under \bin where the dashboard files will be saved
The CSV or JTL file may still be being written, so the JMeter report process tries to read the file while fields and rows are still being added to it. In fact, I resolved the error by manually running the report generation command on the same JTL file:
jmeter -g <file csv or jtl> -o <path report>
It may be possible to configure a delay between the load run and the report generation, but I don't know whether such an option exists.
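Along those lines, a minimal sketch of such a delay (an assumed workaround using a GNU/Linux shell, not a built-in JMeter option): a small loop that waits until the JTL file stops growing before generating the report.
#!/bin/sh
# wait-and-report.sh - hypothetical helper script, not part of JMeter
JTL=results.jtl   # assumed results file name
OUT=report        # assumed output directory for the dashboard
prev=-1
size=$(stat -c%s "$JTL")
while [ "$size" != "$prev" ]; do
  prev=$size
  sleep 10        # re-check the file size every 10 seconds
  size=$(stat -c%s "$JTL")
done
jmeter -g "$JTL" -o "$OUT"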
I'm trying to set up a basic Kafka-Flume-HDFS pipeline.
Kafka is up and running, but when I start the Flume agent via
bin/flume-ng agent -n flume1 -c conf -f conf/flume-conf.properties -D flume.root.logger=INFO,console
it seems like the agent isn't coming up, as the only console output I get is:
Info: Sourcing environment configuration script /opt/hadoop/flume/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /opt/jdk1.8.0_111/bin/java -Xmx20m -D -cp '/opt/hadoop/flume/conf:/opt/hadoop/flume/lib/*:/opt/hadoop/flume/lib/:/lib/*' -Djava.library.path= org.apache.flume.node.Application -n flume1 -f conf/flume-conf.properties flume.root.logger=INFO,console
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop/flume/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/flume/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
The flume config file:
flume1.sources = kafka-source-1
flume1.channels = hdfs-channel-1
flume1.sinks = hdfs-sink-1
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = localhost:2181
flume1.sources.kafka-source-1.topic = twitter_topic
flume1.sources.kafka-source-1.batchSize = 100
flume1.sources.kafka-source-1.channels = hdfs-channel-1
flume1.channels.hdfs-channel-1.type = memory
flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /tmp/kafka/twitter_topic/%y-%m-%d
flume1.sinks.hdfs-sink-1.hdfs.rollCount= 100
flume1.sinks.hdfs-sink-1.hdfs.rollSize= 0
flume1.channels.hdfs-channel-1.capacity = 10000
flume1.channels.hdfs-channel-1.transactionCapacity = 1000
Is this a configuration problem in flume-conf.properties or am I missing something important?
EDIT
After restarting everything, it seems to work better than before; Flume is actually doing something now (it seems the startup order of HDFS, ZooKeeper, Kafka, Flume and my streaming application matters).
I now get an exception from Flume:
java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocalFileSystem.isFileClosed(org.apache.hadoop.fs.path)
...
Edit the hdfs.path value to use the full HDFS URI:
flume1.sinks.hdfs-sink-1.hdfs.path = hdfs://namenode_host:port/tmp/kafka/twitter_topic/%y-%m-%d
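If you're not sure of the NameNode host and port, one way to look up the default filesystem URI (assuming a standard Hadoop client installation) is:
hdfs getconf -confKey fs.defaultFS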
For the logs:
The logs are not being printed to the console because of the whitespace between -D and flume.root.logger=INFO,console; remove it.
Try:
bin/flume-ng agent -n flume1 -c conf -f conf/flume-conf.properties -Dflume.root.logger=INFO,console
Or access the logs in the $FLUME_HOME/logs/ directory.
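For example, to follow the agent log there (assuming the default log4j setup, which typically writes to flume.log):
tail -f $FLUME_HOME/logs/flume.log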
I am using Pig with HCatalog to load data from a Hive external table.
I enter the Grunt shell using pig -useHCatalog and execute the following:
register 'datafu'
define Enumerate datafu.pig.bags.Enumerate('1');
imported_data = load 'hive external table' using org.apache.hive.hcatalog.pig.HCatLoader() ;
converted_data = foreach imported_data generate name,ip,domain,ToUnixTime(ToDate(dateandtime,'MM/dd/yyyy hh:mm:ss.SSS aa'))as unix_DateTime,date;
grouped = group converted_data by (name,ip,domain);
result = FOREACH grouped {
sorted = ORDER converted_data BY unix_DateTime;
sorted2 = Enumerate(sorted);
GENERATE FLATTEN(sorted2);
};
All commands run and produce the desired result.
Problem:
I put the above commands into a Pig script named pigFinal.pig and executed the following in local mode, because the script is on the local filesystem:
pig -useHCatalog -x local '/path/to/pigFinal.pig';
Exception
Failed to generate logical plan. Nested exception:
org.apache.pig.backend.executionengine.ExecException: ERROR 1070:
Could not resolve datafu.pig.bags.Enumerate using imports: [,
java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] at
org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1507)
at
org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:9372)
at
org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:11051)
at
org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:10810)
at
org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:10159)
at
org.apache.pig.parser.LogicalPlanGenerator.nested_command(LogicalPlanGenerator.java:16315)
at
org.apache.pig.parser.LogicalPlanGenerator.nested_blk(LogicalPlanGenerator.java:16116)
at
org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:16024)
at
org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15849)
at
org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
at
org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at
org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at
org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 17 more
Where do I need to register the DataFu jar for Pig scripts? I guess this is the issue.
Please help.
You have to ensure the jar file is located in the same folder as your Pig script, or ensure that the correct path is provided in the Pig script when registering the jar file. So in your case:
Modify this
register 'datafu'
To
-- If, let's say, datafu-1.2.0.jar is your jar file and it is located in the same folder as your Pig script, then have this at the top of your Pig script:
REGISTER datafu-1.2.0.jar
-- Else, let's say datafu-1.2.0.jar is your jar file and it is located in the folder /usr/hadoop/lib; then have this at the top of your Pig script:
REGISTER /usr/hadoop/lib/datafu-1.2.0.jar
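Putting it together, a minimal sketch of the top of pigFinal.pig (assuming the jar is named datafu-1.2.0.jar and sits next to the script; the DEFINE line is taken from the question):
REGISTER datafu-1.2.0.jar;
DEFINE Enumerate datafu.pig.bags.Enumerate('1');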
pig -useHCatalog \
-x local \
-Dpig.additional.jars="/local/path/to/datafu.jar:/local/path//other.jar" \
/path/to/pigFinal.pig;
OR
in your Pig script, use the fully qualified path:
register /local/path/to/datafu.jar;
My input file is below. I am trying to dump the data loaded into a relation. I am using Pig 0.12.
a,t1,1000,100
a,t1,2000,200
b,t2,1000,200
b,t2,5000,100
I entered HDFS mode by typing pig, then ran:
myinput = LOAD 'file' AS(a1:chararray,a2:chararray,amt:int,rate:int);
If I do dump myinput, it shows the error below.
describe and illustrate work fine.
So:
dump myinput;
As soon as I enter the dump command, I get the error message below.
ERROR org.apache.hadoop.ipc.RPC - FailoverProxy: Failing this Call: submitJob for error (RemoteException): org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: User 'myid' cannot perform operation SUBMIT_JOB on queue default.
Please run "hadoop queue -showacls" command to find the queues you have access to .
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:179)
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:136)
at org.apache.hadoop.mapred.ACLsManager.checkAccess(ACLsManager.java:113)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:4541)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:993)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1326)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1320)
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias myinput
Is this an access issue? Some kind of privilege issue?
Can someone help me?
If you don't specify a load function such as PigStorage('\t'), then Pig reads data using tab (\t) as the column separator by default.
In your data, the column separator is a comma (,).
So try this one:
myinput = LOAD 'file' using PigStorage(',') AS(a1:chararray,a2:chararray,amt:int,rate:int);
Hope it works.
You need to specify your input data's separator, in your case a comma.
Try this code:
myinput = LOAD 'file' USING PigStorage(',') AS (a1:chararray,a2:chararray,amt:int,rate:int);
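Separately, since the error itself is about queue ACLs rather than the field separator, it may also help to check which queues your user can submit to (the error message suggests running hadoop queue -showacls) and then point the job at one you do have access to. A sketch, with a hypothetical queue name analytics:
hadoop queue -showacls
Then, in the Grunt shell before the dump:
set mapred.job.queue.name 'analytics';
dump myinput;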
I'm trying to load a pipe-delimited file ('|') in Pig using the following command:
A = load 'test.csv' using PigStorage('|');
But I keep getting this error:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. java.net.URISyntaxException cannot be cast to java.lang.Error
I've looked all over, but I can't find any reason this would happen. The test file I have above is a simple file that just contains 1|2|3 for testing.
If you are running Pig with MAPREDUCE as the ExecType mode, then the following command should work:
A = LOAD '/user/pig/input/pipetest.csv' USING PigStorage('|');
DUMP A;
Here is the output on your screen:
(1,2,3)
Note that I have included the full HDFS path for my CSV file in the LOAD command.
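If you just want to sanity-check the pipe delimiter locally first, a minimal sketch (assuming test.csv sits in your current working directory) would be:
pig -x local
A = LOAD 'test.csv' USING PigStorage('|');
DUMP A;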