ERROR 2103: Problem doing work on Longs - Hadoop

I have the following data:

store  trn_date    dept_id  sale_amt
1      2014-12-15  101      10007655
1      2014-12-15  101      10007654
1      2014-12-15  101      10007544
6      2014-12-15  104      100086544
8      2014-12-14  101      1000000
8      2014-12-15  101      100865761
I'm trying to aggregate the data using the code below (I loaded the data both ways, with HCatLoader() and with PigStorage()):

data = LOAD 'data' USING org.apache.hcatalog.pig.HCatLoader();
group_table = GROUP data BY (store, trn_date, dept_id);
group_gen = FOREACH group_table GENERATE
                FLATTEN(group) AS (store, trn_date, dept_id),
                SUM(data.sale_amt) AS total_sale_amt;
Below is the error stack trace I get while running the job:
================================================================================
Pig Stack Trace
---------------
ERROR 2103: Problem doing work on Longs
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: grouped_all: Local Rearrange[tuple]{tuple}(false) - scope-1317 Operator Key: scope-1317): org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:183)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:84)
at org.apache.pig.builtin.AlgebraicLongMathBase$Intermediate.exec(AlgebraicLongMathBase.java:108)
at org.apache.pig.builtin.AlgebraicLongMathBase$Intermediate.exec(AlgebraicLongMathBase.java:102)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:369)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:333)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:77)
================================================================================
While looking for a solution, many people said this is caused by loading the data with HCatLoader(), so I tried loading the data with PigStorage() instead. I still get the same error.

This may be because of the way you are storing the data in Hive. If any aggregation is going to happen on a column, declare it with a numeric data type (INT, BIGINT, etc.) rather than a string type.

Basically, each aggregation function returns data with its default data type, for example:
AVG returns DOUBLE
SUM returns LONG for int/long input and DOUBLE for float/double input
COUNT returns LONG
I don't think the problem is how the data is stored in Hive, because you already tried PigStorage(); it is a data-type issue in what you pass to the aggregation. The bottom of your trace shows the real cause: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number, i.e. sale_amt is arriving as a chararray. Cast it to a numeric type before passing it to the aggregation.
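For example, a minimal sketch of the fix (assuming tab-delimited input and the column names from the sample data; adjust the delimiter and schema to your table):

-- Give sale_amt a numeric type up front when loading with PigStorage
data = LOAD 'data' USING PigStorage('\t')
           AS (store:int, trn_date:chararray, dept_id:int, sale_amt:long);

-- If the field still arrives as chararray (e.g. the Hive column is STRING),
-- cast it explicitly before grouping:
casted = FOREACH data GENERATE store, trn_date, dept_id, (long)sale_amt AS sale_amt;
group_table = GROUP casted BY (store, trn_date, dept_id);
group_gen = FOREACH group_table GENERATE
                FLATTEN(group) AS (store, trn_date, dept_id),
                SUM(casted.sale_amt) AS total_sale_amt;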

Related

What is 152³305746

I am getting 152³305746 as some bad data in my input file. I am trying to filter it out, but have been unsuccessful in telling Hive how to detect and skip it. I am not even sure what data type it is; I am expecting bigint values instead of what I see here.
I have tried the things below, and various combinations of them, none of which helped skip the few bad records in my input:
1)
CAST(mycol AS string) RLIKE "^[0-9]+$"
2)
mycol < 2147483647
3)
CREATE TEMPORARY MACRO isNumber(s string) CAST(s as BIGINT) IS NOT NULL;
4)
isNumber(mycol) != false
5)
SET mapreduce.map.skip.maxrecords = 100000000;
None of the above approaches worked. Hive fails with the following error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NumberFormatException: For input string: "152³305746"
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:126)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:879)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
... 9 more
Caused by: java.lang.NumberFormatException: For input string: "152³305746"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.openx.data.jsonserde.objectinspector.primitive.ParsePrimitiveUtils.parseLong(ParsePrimitiveUtils.java:49)
at org.openx.data.jsonserde.objectinspector.primitive.JavaStringLongObjectInspector.get(JavaStringLongObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:400)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:279)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:239)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:201)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.makeValueWritable(ReduceSinkOperator.java:565)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:395)
... 17 more
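Note on the trace: the NumberFormatException is raised inside the OpenX JSON SerDe (ParsePrimitiveUtils.parseLong) while the row is being serialized for the reduce stage, because the column is declared BIGINT but the underlying JSON value is a string that cannot be parsed as a long. Since the parsing happens inside the SerDe rather than in the query plan, the SQL-level filters above never get a clean value to test. One possible workaround (a sketch, assuming you can redefine the table; the table name is illustrative) is to declare the column as STRING in the JsonSerDe table, filter on the raw string, and cast afterwards:

-- With mycol declared as STRING, the SerDe no longer calls parseLong,
-- so bad rows can be filtered out in plain SQL before the cast.
SELECT CAST(mycol AS BIGINT) AS mycol
FROM mytable
WHERE mycol RLIKE '^[0-9]+$';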

Spring ldaptemplate update group with large membership issue

I have issues updating groups in Active Directory that have more than 1500 members. The update is only modifying the member attribute.
I have no issues updating groups with fewer members, and I can also add a new group with many members.
However, if the group is too large, the update fails. Even if I try to update the large group down to just one member, it still fails with the same error.
Code fails on the modifyAttributes line:
ModificationItem[] modList =
    nameContext.getDirContextAdapter().getModificationItems();
writeADTemplate.modifyAttributes(nameContext.getName(), modList);
Stack trace below:
org.springframework.ldap.NameAlreadyBoundException: [LDAP: error code 68 - 00000562: UpdErr: DSID-031A122A, problem 6005 (ENTRY_EXISTS), data 0
nested exception is javax.naming.NameAlreadyBoundException: [LDAP: error code 68 - 00000562: UpdErr: DSID-031A122A, problem 6005 (ENTRY_EXISTS), data 0
remaining name 'cn=Atlassian Users,ou=Groups'
at org.springframework.ldap.support.LdapUtils.convertLdapException(LdapUtils.java:169)
at org.springframework.ldap.core.LdapTemplate.executeWithContext(LdapTemplate.java:810)
at org.springframework.ldap.core.LdapTemplate.executeReadWrite(LdapTemplate.java:802)
at org.springframework.ldap.core.LdapTemplate.modifyAttributes(LdapTemplate.java:967)
more ...
Caused by: javax.naming.NameAlreadyBoundException: [LDAP: error code 68 - 00000562: UpdErr: DSID-031A122A, problem 6005 (ENTRY_EXISTS), data 0
remaining name 'cn=Atlassian Users,ou=Groups'
at com.sun.jndi.ldap.LdapCtx.mapErrorCode(Unknown Source)
at com.sun.jndi.ldap.LdapCtx.processReturnCode(Unknown Source)
at com.sun.jndi.ldap.LdapCtx.processReturnCode(Unknown Source)
at com.sun.jndi.ldap.LdapCtx.c_modifyAttributes(Unknown Source)
at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_modifyAttributes(Unknown Source)
at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.modifyAttributes(Unknown Source)
at javax.naming.directory.InitialDirContext.modifyAttributes(Unknown Source)
at org.springframework.ldap.core.LdapTemplate$19.executeWithContext(LdapTemplate.java:969)
at org.springframework.ldap.core.LdapTemplate.executeWithContext(LdapTemplate.java:807)
... 88 more
OK, my real issue is that Active Directory will not return a multi-valued attribute like member when it holds more than 1500 values.
When I fetched the current group members it returned 0 values, so my code was trying to add all the members back to the group.
Looks like I'll have to figure out how to use DefaultIncrementalAttributesMapper to get all the members.
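For reference, a minimal sketch of reading the range-limited member attribute with Spring LDAP's incremental mapper (assuming Spring LDAP 2.x; building the group DN this way is illustrative):

import java.util.List;
import javax.naming.Name;
import org.springframework.ldap.core.support.DefaultIncrementalAttributesMapper;
import org.springframework.ldap.support.LdapNameBuilder;

// AD returns large multi-valued attributes in ranges (member;range=0-1499,
// member;range=1500-2999, ...); this helper keeps issuing lookups until
// the final range marker is reached, then returns all values.
Name groupDn = LdapNameBuilder.newInstance("ou=Groups")
        .add("cn", "Atlassian Users")
        .build();
List<Object> members = DefaultIncrementalAttributesMapper
        .lookupAttributeValues(writeADTemplate, groupDn, "member");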

Hive compression in ORC using Lz4

I am trying to compress RC and ORC files using LZ4. I have installed Hadoop 2.7.1 and Hive 1.2.1. With LZ4 I can compress an RC file without any problem, but when I try to load data into an ORC table using LZ4 it does not work. I have created the ORC table like below:
CREATE TABLE FINANCE_orc(
PERMNO STRING,
DATE STRING,
CUSIP STRING,
NCUSIP STRING,
COMNAM STRING,
TICKET STRING,
PERMCO STRING,
SHRCD STRING,
EXCHCD STRING,
HEXCD STRING,
SICCD STRING,
HSLCCD STRING,
PRC STRING,
VOL STRING,
RET STRING,
SHROUT STRING,
DLRET STRING,
VWRETD STRING,
EWRETD STRING,
SPRTRN STRING)
STORED AS ORC tblproperties ("orc.compress"="Lz4");
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.type = BLOCK;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
INSERT OVERWRITE table finance_orc select * from finance;
While loading the data, it gives the following error:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"permno":"PERMNO","ndate":"DATE","cusip":"CUSIP","ncusip":"NCUSIP","comnam":"COMNAM","ticket":"TICKER","permco":"PERMCO","shrcd":"SHRCD","exchcd":"EXCHCD","hexcd":"HEXCD","siccd":"SICCD","hslccd":"HSICCD","prc":"PRC","vol":"VOL","ret":"RET","shrout":"SHROUT","dlret":"DLRET","vwretd":"VWRETD","ewretd":"EWRETD","sprtrn":"SPRTRN"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:675)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
... 9 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:622)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:566)
... 16 more
Caused by: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4
at java.lang.Enum.valueOf(Enum.java:236)
at org.apache.hadoop.hive.ql.io.orc.CompressionKind.valueOf(CompressionKind.java:25)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getOptions(OrcOutputFormat.java:143)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:203)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getHiveRecordWriter(OrcOutputFormat.java:52)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
... 18 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 4 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I have used Snappy and Zlib with the same command and they work fine. The problem is only with LZ4, and I do not know the reason why.
The compression codecs you can use for ORC columnar compression are NONE, ZLIB, and SNAPPY (some versions also list LZO in the enum). The default codec is ZLIB; codecs other than these are not allowed in Hive 1.2.1.
In general, to understand an error, read the error log completely; it usually points at the issue. Here the log says:
"org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.io.orc.CompressionKind.Lz4"
That is, Lz4 is simply not a constant in this version's CompressionKind enum. If you really need LZ4, you would have to manually substitute newer ORC jars that include it (for example from the official Spark repo); otherwise use one of the supported codecs above.
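A minimal sketch of the working variant (column list abbreviated for illustration):

-- Same layout, but with a codec this Hive version's ORC writer accepts
CREATE TABLE finance_orc_snappy (
  permno STRING,
  prc    STRING,
  vol    STRING)
STORED AS ORC tblproperties ("orc.compress"="SNAPPY");

INSERT OVERWRITE TABLE finance_orc_snappy
SELECT permno, prc, vol FROM finance;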

Pig filter fails due to unexpected data

I am running Cassandra and have about 20k records in it to play with. I am trying to run a filter in Pig on this data, but I get the following message back:
2015-07-23 13:02:23,559 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:260)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:205)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Expected 8 or 0 byte long (1)
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:263)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:44)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.(CqlRecordReader.java:259)
at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:256)
... 7 more
You would think this is an obvious error, and believe me, there are a ton of results on Google for it. It's clear that some piece of my data isn't conforming to the expected type of a given column. What I don't understand is 1) why this is happening, and 2) how to debug it. If I try to insert invalid data into Cassandra from my Node.js app, it throws this kind of error when my data type doesn't match the column's data type, which means this shouldn't be possible, right? I've read that data validation using UTF8 is wonky and that setting a different kind of validation is the answer, but I don't know how to do that. Here are my steps to reproduce:
grunt> define CqlNativeStorage org.apache.cassandra.hadoop.pig.CqlNativeStorage();
grunt> test = load 'cql://blah/blahblah' USING CqlNativeStorage();
grunt> describe test;
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - Found ksDef name: blah
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - partition keys: ["ad_id"]
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster keys: []
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - row key validator: org.apache.cassandra.db.marshal.UTF8Type
13:09:54.544 [main] DEBUG o.a.c.hadoop.pig.CqlNativeStorage - cluster key validator: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type)
blahblah: {ad_id: chararray,address: chararray,city: chararray,date_created: long,date_listed: long,fireplace: bytearray,furnished: bytearray,garage: bytearray,neighbourhood: chararray,num_bathrooms: int,num_bedrooms: int,pet_friendly: bytearray,postal_code: chararray,price: double,province: chararray,square_feet: int,url: chararray,utilities_included: bytearray}
grunt> query1 = FILTER blahblah BY city == 'New York';
grunt> dump query1;
Then it runs for a while, dumps out tons of logs, and the error appears.
Discovered my problem: the Pig partitioner did not match the one Cassandra (CQL3) was using, and therefore the data was being parsed incorrectly. Previously the environment variable was PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner. After I changed it to PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner it started working.
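For example (the script name is illustrative; the partitioner must match the one configured in cassandra.yaml, and Murmur3Partitioner is the default from Cassandra 1.2 on):

export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
pig -x local yourscript.pig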

how to run illustrate command in pig?

I tried to run the ILLUSTRATE command on an alias in Pig, in both local and HDFS mode, but I get the error below.
2014-08-27 19:18:06,703 [main] ERROR org.apache.pig.pen.ExampleGenerator - Error reading data. Internal error creating job configuration.
java.lang.RuntimeException: Internal error creating job configuration.
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:160)
at org.apache.pig.PigServer.getExamples(PigServer.java:1182)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:739)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
2014-08-27 19:18:06,707 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception Details at logfile: /opt/pig_1409147241095.log
I am running the ILLUSTRATE command using this example. Say the input file is 'visits.txt', containing the following data:

Amy   cnn.com        20070218
Fred  harvard.edu    20071204
Amy   bbc.com        20071205
Fred  stanford.edu   20071206

A grunt session might look something like this (note the use of a schema while loading the data; ExampleGenerator needs you to provide field aliases):
grunt> visits = load 'visits.txt' as (user, url, timestamp);
grunt> recent_visits = filter visits by timestamp >= '20071201';
grunt> user_visits = group recent_visits by user;
grunt> num_user_visits = foreach user_visits generate group, COUNT(recent_visits);
grunt> illustrate num_user_visits
The fix is to set pig.enable.plan.serialization=false in /etc/pig/conf/pig.properties.
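A sketch of both ways to set it (the per-session form assumes your Pig version's SET command accepts arbitrary properties):

# in /etc/pig/conf/pig.properties -- applies to every Pig run
pig.enable.plan.serialization=false

# or per grunt session:
grunt> set pig.enable.plan.serialization false;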
