Pig: Relation and Schema name confusion - hadoop

In Pig Latin, this works as expected:
filtered = FILTER records BY age > 27;
But this throws an exception when you DUMP filtered:
filtered = FILTER records BY records.age > 27;
This is the exception:
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (John,Wilk,27,M), 2nd :(Tri,Tim,27,F)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (John,Wilk,27,M), 2nd :(Tri,Tim,27,F)
at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:119)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:345)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextInteger(POUserFunc.java:394)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:322)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr.getNextBoolean(GreaterThanExpr.java:74)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:144)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
What is the difference between the two? Aren't they the same?

No, the two statements are different.
The first statement is perfectly valid: Pig iterates through each row and applies the filter constraint (age > 27). This is the standard way of writing FILTER statements.
In the second case, you used the dereference operator (.) to access the field. The dereference operator is mainly used to access values inside complex data types (tuples, bags and maps). When you dereference a relation like this, Pig treats it as a scalar and expects exactly one row of output; your filter condition (age > 27) matches more than one row, which is why you got "Scalar has more than one row in the output".
If your filter condition (age > 27) matched only one row, the statement would be perfectly valid.
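For completeness, a minimal sketch of where dereferencing a relation in a FILTER is legitimate, namely when the referenced relation is guaranteed to hold exactly one row (field names follow the schema implied by the error output):
-- assuming 'records' is the relation from the question, with an int field 'age'
max_age = FOREACH (GROUP records ALL) GENERATE MAX(records.age) AS age;
oldest = FILTER records BY age == max_age.age; -- max_age has exactly one row, so the scalar lookup succeeds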

Related

Compare Dates with Springboot Thymeleaf

I'm trying to compare two dates with the format yyyy-MM-ddTHH:mm:ss in a Thymeleaf template.
I am able to compare dates when I try:
${#dates.createNow().before(#dates.createNow())} // =false - just as an example
But my case is a bit tricky. At first I parse a string to obtain a date:
<th:block th:with="sdf = ${new java.text.SimpleDateFormat('yyyy-MM-dd''T''HH:mm:ss')}, isodate = ${isodate}, isodatemod = ${isodateModified}" >
Then I compare it:
<p th:switch="${#dates.format(sdf.parse(isodate)).before(#dates.format(sdf.parse(isodatemod)))}" />
This produces the following error:
Caused by: org.thymeleaf.exceptions.TemplateProcessingException: Exception evaluating SpringEL expression: "#dates.format(sdf.parse(isodatemod)).before(#dates.format(sdf.parse(isodatemod)))
[...]
Caused by: org.springframework.expression.spel.SpelEvaluationException: EL1004E: Method call: Method before(java.lang.String) cannot be found on type java.lang.String
I just don't see why, after parsing my date, it is still treated as a string. Is there any way to make these two dates comparable in Thymeleaf?
I use the following import:
implementation group:"org.springframework.boot", name:"spring-boot-starter-thymeleaf"
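Judging from the error, #dates.format(...) returns a formatted String rather than a Date, so before ends up being called on a java.lang.String. A minimal sketch of the comparison that drops the format calls and compares the parsed Date objects directly (reusing the sdf, isodate and isodatemod variables defined above):
<p th:switch="${sdf.parse(isodate).before(sdf.parse(isodatemod))}" />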

How to access / address nested fields on Logstash

I am currently trying to convert a nested subfield containing a hexadecimal string to an integer field using a Logstash ruby filter:
ruby {
  code => 'event.set("[transactions][gasprice_int]", event.get("[transactions][gasPrice]").to_i(16))'
}
but it returns the error:
[ERROR][logstash.filters.ruby ][main][2342d24206691f4db46a60285e910d102a6310e78cf8af43c9a2f1a1d66f58a8] Ruby exception occurred: wrong number of arguments calling `to_i` (given 1, expected 0)
I also tried looping through the JSON objects in the transactions field using:
transactions_num = event.get("[transactions]").size
transactions_num.times do |index|
  event.set("[transactions][#{index}][gasprice_int]", event.get("[transactions][#{index}][gasPrice].to_i(16)"))
end
but this also returned an error of
[ERROR][logstash.filters.ruby ][main][99b05fdb6022cc15b0f97ba10cabb3e7c1a8fabb8e0c47d6660861badffdb28e] Ruby exception occurred: Invalid FieldReference: `[transactions][0][gasPrice].to_i(16)`
This method of converting a hex string to an integer with a ruby filter worked when I wasn't dealing with nested fields, so can anyone please tell me how to correctly address nested fields in this case?
By the way, this is the code that still works:
ruby {
  code => 'event.set("difficulty_int", event.get("difficulty").to_i(16))'
}
I think you have answered this one yourself in your final example: the to_i call should not be inside the double quotes. So
...event.get("[transactions][#{index}][gasPrice].to_i(16)"))
should be
...event.get("[transactions][#{index}][gasPrice]").to_i(16))
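Putting that together, a sketch of the corrected loop (field names taken from the question):
ruby {
  code => '
    # number of objects in the transactions array
    transactions_num = event.get("[transactions]").size
    transactions_num.times do |index|
      # to_i(16) is called on the value returned by event.get,
      # not embedded in the field-reference string
      event.set("[transactions][#{index}][gasprice_int]",
                event.get("[transactions][#{index}][gasPrice]").to_i(16))
    end
  '
}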

Why does Java 8 DateTimeFormatter allow an incorrect month value in ResolverStyle.STRICT mode?

Why does this test pass, while the month value is obviously invalid (13)?
@Test
public void test() {
    String format = "uuuuMM";
    String value = "201713";
    DateTimeFormatter.ofPattern(format).withResolverStyle(ResolverStyle.STRICT)
            .parse(value);
}
When using a temporal query, the expected DateTimeParseException is thrown:
DateTimeFormatter.ofPattern(format).withResolverStyle(ResolverStyle.STRICT)
.parse(value, YearMonth::from);
What happens when no TemporalQuery is specified?
EDIT: the 13 value seems to be a special one, as I learned thanks to the answer of ΦXocę 웃 Пepeúpa ツ (see Undecimber).
But the exception is not thrown even with another value, like 50:
@Test
public void test() {
    String format = "uuuuMM";
    String value = "201750";
    DateTimeFormatter.ofPattern(format).withResolverStyle(ResolverStyle.STRICT)
            .parse(value);
}
I did some debugging and found that part of the parsing process is to check the fields against the formatter's chronology.
When you create a DateTimeFormatter, by default it uses an IsoChronology, which is used to resolve the date fields. During this resolving phase, the method java.time.chrono.AbstractChronology::resolveDate is called.
If you look at the source, you'll see the following logic:
if (fieldValues.containsKey(YEAR)) {
    if (fieldValues.containsKey(MONTH_OF_YEAR)) {
        if (fieldValues.containsKey(DAY_OF_MONTH)) {
            return resolveYMD(fieldValues, resolverStyle);
        }
        ....
return null;
As the input has only the year and month fields, fieldValues.containsKey(DAY_OF_MONTH) returns false, the method returns null and no other check is made as you can see in the Parsed class.
So, when parsing 201750 or 201713 without a TemporalQuery, no additional check is made because of the logic above, and the parse method returns a java.time.format.Parsed object, as you can see by the following code:
DateTimeFormatter fmt = DateTimeFormatter.ofPattern("uuuuMM").withResolverStyle(ResolverStyle.STRICT);
TemporalAccessor parsed = fmt.parse("201750");
System.out.println(parsed.getClass());
System.out.println(parsed);
The output is:
class java.time.format.Parsed
{Year=2017, MonthOfYear=50},ISO
Note that the type of the returned object is java.time.format.Parsed and printing it shows the fields that were parsed (year and month).
When you call parse with a TemporalQuery, though, the Parsed object is passed to the query and its fields are validated (of course it depends on the query, but the API built-in ones always validate).
In the case of YearMonth::from, it checks whether the year and month are valid using the respective ChronoFields (MONTH_OF_YEAR and YEAR), and the month field accepts only values from 1 to 12.
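That range check can be reproduced in isolation with the same ChronoField the query relies on (checkValidValue is the standard java.time validation method):
// MONTH_OF_YEAR accepts only values 1..12, so this line throws
// java.time.DateTimeException: Invalid value for MonthOfYear (valid values 1 - 12): 50
java.time.temporal.ChronoField.MONTH_OF_YEAR.checkValidValue(50);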
That's why just calling parse(value) doesn't throw an exception, but calling with a TemporalQuery does.
Just to check the logic above when all the date fields (year, month and day) are present:
DateTimeFormatter fmt = DateTimeFormatter.ofPattern("uuuuMMdd").withResolverStyle(ResolverStyle.STRICT);
fmt.parse("20175010");
This throws:
Exception in thread "main" java.time.format.DateTimeParseException: Text '20175010' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 50
As all the date fields are present, fieldValues.containsKey(DAY_OF_MONTH) returns true and now it checks if it's a valid date (using the resolveYMD method).
The month 13 is called Undecimber.
The Gregorian calendar that many of us use allows only 12 months, but Java includes support for calendars that permit thirteen months, so it depends on which calendar system you are talking about.
For example, the actual maximum value of the MONTH field is 12 in some years and 13 in other years in the Hebrew calendar system. So the month 13 can be valid.
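In fact, java.util.Calendar reserves a constant for that thirteenth month:
// Calendar months are zero-based (JANUARY == 0), so the thirteenth-month
// constant UNDECIMBER has the value 12
System.out.println(java.util.Calendar.UNDECIMBER); // prints 12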
It is a little odd that an exception is not thrown when parse is called without a TemporalQuery. Some of the documentation for the single-argument parse method:
This parses the entire text producing a temporal object. It is typically more useful to use parse(CharSequence, TemporalQuery). The result of this method is TemporalAccessor which has been resolved, applying basic validation checks to help ensure a valid date-time.
Note that it says it is "typically more useful to use parse(CharSequence, TemporalQuery)". In your examples, parse is returning a java.time.format.Parsed object, which is not really used for anything other than creating a different TemporalAccessor.
Note that if you try to create a YearMonth from the returned value, an exception is thrown:
YearMonth.from(DateTimeFormatter.ofPattern(format)
.withResolverStyle(ResolverStyle.STRICT).parse(value));
throws
Exception in thread "main" java.time.DateTimeException: Unable to obtain YearMonth from TemporalAccessor: {Year=2017, MonthOfYear=50},ISO of type java.time.format.Parsed
at java.time.YearMonth.from(YearMonth.java:263)
at anl.nfolds.Test.main(Test.java:21)
Caused by: java.time.DateTimeException: Invalid value for MonthOfYear (valid values 1 - 12): 50
at java.time.temporal.TemporalAccessor.get(TemporalAccessor.java:224)
at java.time.YearMonth.from(YearMonth.java:260)
... 1 more
Documentation for Parsed:
A store of parsed data.
This class is used during parsing to collect the data. Part of the parsing process involves handling optional blocks and multiple copies of the data get created to support the necessary backtracking.
Once parsing is completed, this class can be used as the resultant TemporalAccessor. In most cases, it is only exposed once the fields have been resolved.
Since: 1.8
@implSpec This class is a mutable context intended for use from a single thread. Usage of the class is thread-safe within standard parsing, as a new instance of this class is automatically created for each parse and parsing is single-threaded.

Pig: Unable to Load BAG

I have a record in this format:
{(Larry Page),23,M}
{(Suman Dey),22,M}
{(Palani Pratap),25,M}
I am trying to LOAD the record using this:
records = LOAD '~/Documents/PigBag.txt' AS (details:BAG{name:tuple(fullname:chararray),age:int,gender:chararray});
But I am getting this error:
2015-02-04 20:09:41,556 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 7, column 101> mismatched input ',' expecting RIGHT_CURLY
Please advise.
It's not a bag since it's not made up of tuples. Try
load ... as (name:tuple(fullname:chararray), age:int, gender:chararray)
For some reason Pig wraps the output of a line in curly braces, which makes it look like a bag, but it's not. If you saved this data using PigStorage, you can save it with the '-schema' parameter, which tells PigStorage to create a schema file (.pigschema or something similar) that you can look at to see what the saved schema is. The same parameter can be used when loading with PigStorage, sparing you the AS clause.
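For example, a minimal sketch of storing and reloading with an embedded schema (the output path is illustrative):
-- '-schema' tells PigStorage to write a schema file alongside the data
STORE records INTO 'output_dir' USING PigStorage(',', '-schema');
-- on load, the stored schema is picked up, so no AS clause is needed
records2 = LOAD 'output_dir' USING PigStorage(',', '-schema');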
Yes, LiMuBei's point is absolutely right: your input is not in the right format. Pig always expects a bag to hold a collection of tuples, but in your case it's a collection of a tuple plus plain fields. Pig will retain the tuple and reject the fields (age and gender) during load.
But this problem can easily be solved with a different (somewhat hacky) approach:
1. Load each input line as a chararray.
2. Remove the curly braces and parentheses from the input.
3. Use the STRSPLIT function to split the input into (fullname, age, sex) fields.
PigScript:
A = LOAD 'input' USING PigStorage AS (line:chararray);
B = FOREACH A GENERATE FLATTEN(REPLACE(line,'[}{)(]+','')) AS (newline:chararray);
C = FOREACH B GENERATE FLATTEN(STRSPLIT(newline,',',3)) AS (fullname:chararray,age:int,sex:chararray);
DUMP C;
Output:
(Larry Page,23,M)
(Suman Dey,22,M)
(Palani Pratap,25,M)
Now you can access all the fields as fullname, age and sex.

How to insert dummy map values in Pig

I am doing a conditional check for null and empty occurrences of a bag. The bag contains multiple maps. Whenever 'info' is null or empty I want to put dummy map values into it, because in the next step I am doing a FLATTEN operation on 'info'.
I need this because FLATTEN on a null or empty bag removes the complete record from the data, which I don't want.
((info is null or IsEmpty(info)) ? {(['Unknown'#'unknown'])} : info) as info;
This gives me the compilation error below:
2014-09-02 06:20:37,978 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " ": "" at line 24, column 70.
Was expecting one of:
"cat" ...
"clear" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
It seems there is a syntax error in the way you create the map. There is an easy way to create a map using the TOMAP function, which you can use as below:
((info is null or IsEmpty(info)) ? {(TOMAP('Unknown','unknown'))} : info) as info;
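In context, a sketch of how this could sit inside a FOREACH (the relation name A is hypothetical; 'info' is the bag from the question):
B = FOREACH A GENERATE
        ((info is null or IsEmpty(info)) ? {(TOMAP('Unknown','unknown'))} : info) AS info;
-- the dummy map keeps otherwise-empty records alive through FLATTEN
C = FOREACH B GENERATE FLATTEN(info);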
