How to generate a tuple in the ?: operator of Pig - Hadoop

My code is as follows
temp = foreach requiredData generate (recordType == 3 ? controllingCalledNum : callingPtyNum)as ServiceNumber, (recordType == 3 ? callingPtyNum : controllingCalledNum)as DestinationNumber;
My code here is redundant.
Can I generate a tuple inside the '?:' operator, which I can then FLATTEN, like this?
temp = foreach requiredData generate (recordType == 3 ? (controllingCalledNum,callingPtyNum) : (callingPtyNum,controllingCalledNum))as (ServiceNumber,DestinationNumber);
I get an error when I try this.
Please help.

Use the built-in TOTUPLE UDF:
temp = foreach requiredData generate FLATTEN((recordType == 3 ? TOTUPLE(controllingCalledNum, callingPtyNum) : TOTUPLE(callingPtyNum, controllingCalledNum))) as (ServiceNumber, DestinationNumber);
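To make the intent of the conditional tuple concrete, here is a minimal sketch of the same selection logic in plain Python (not Pig). The function name and the sample rows are hypothetical and only illustrate how one condition picks the order of both output fields at once.

```python
def service_and_destination(record_type, controlling_called_num, calling_pty_num):
    """Return (ServiceNumber, DestinationNumber) for a single record."""
    # One condition decides the order of both fields, as the TOTUPLE bincond does.
    if record_type == 3:
        return (controlling_called_num, calling_pty_num)
    return (calling_pty_num, controlling_called_num)


# Hypothetical rows: (recordType, controllingCalledNum, callingPtyNum)
rows = [(3, "111", "222"), (1, "111", "222")]
flattened = [service_and_destination(*row) for row in rows]
print(flattened)
```

Each returned tuple plays the role of the flattened (ServiceNumber, DestinationNumber) pair.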

Related

Query regarding Pig: how to put an if-like condition in FOREACH

I have a question about writing a Pig script.
RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED GENERATE flatten(group) , SUM(SOMETYPEDATA.DURATION) as duration, COUNT(SOMETYPEDATA.DURATION) as cnt;
Here I want to replace SUM(SOMETYPEDATA.DURATION) with a number, like this:
if (0 < Sum < 1000) then put 1
if (1001 < Sum < 2000) then put 2
if (2001 < Sum < 3000) then put 3
How can I achieve this in Pig?
Please suggest.
SPLIT will do that, but not inside the FOREACH loop. Pig also has a ternary-style operator, but that will not be helpful for storing the result in a variable. Here is how you can use SPLIT to achieve something close to your requirement.
A = LOAD '/home/vignesh/a.dat' using PigStorage(',') as (a:int,b:int,c:int);
SPLIT A INTO B IF (a > 0 AND a < 1000), C IF (a > 1001 AND a<2000), D IF (a > 2001 AND a < 3000);
We can use either the bincond operator (?:) or the CASE statement (from Pig version 0.12 onwards) to achieve the objective.
RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED GENERATE flatten(group) AS grp_name , SUM(SOMETYPEDATA.DURATION) as duration_sum, COUNT(SOMETYPEDATA.DURATION) as cnt;
-- As written, sums outside the open ranges (e.g. 1000, 1001, 2000 or 2001) fall through to 9999.
result_required = FOREACH RESULT_SOMETYPE GENERATE grp_name,
    ((duration_sum > 0 AND duration_sum < 1000) ? 1 :
    ((duration_sum > 1001 AND duration_sum < 2000) ? 2 :
    ((duration_sum > 2001 AND duration_sum < 3000) ? 3 : 9999)
    )
    ) AS duration, cnt;
Refer : http://pig.apache.org/docs/r0.12.0/basic.html#arithmetic
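As a rough illustration of what the nested bincond computes, the sketch below mirrors the same ranges in plain Python (not Pig). The function name is hypothetical, and the strict inequalities are copied from the answer, so boundary values such as 1000 deliberately land in the default bucket.

```python
def duration_bucket(duration_sum, default=9999):
    """Map a summed duration to a bucket number using the answer's ranges."""
    if 0 < duration_sum < 1000:
        return 1
    if 1001 < duration_sum < 2000:
        return 2
    if 2001 < duration_sum < 3000:
        return 3
    # Anything outside the open ranges, including the boundaries, gets the default.
    return default


for value in (500, 1000, 1500, 2500, 5000):
    print(value, duration_bucket(value))
```

If the boundary values should belong to a bucket, the comparisons would need to be made inclusive on one side.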

Get counts by iterating over a data bag, with a separate count for each value of a field

Below is the data I have. Its schema is:
student_name, question_number, actual_result (either false or Correct)
(b,q1,Correct)
(a,q1,false)
(b,q2,Correct)
(a,q2,false)
(b,q3,false)
(a,q3,Correct)
(b,q4,false)
(a,q4,false)
(b,q5,false)
(a,q5,false)
What I want is, for each student (a and b), the total number of correct and false answers he or she has given.
For the use case shared, the Pig script below suffices.
Pig Script :
student_data = LOAD 'student_data.csv' USING PigStorage(',') AS (student_name:chararray, question_number:chararray, actual_result:chararray);
student_data_grp = GROUP student_data BY student_name;
student_correct_answer_data = FOREACH student_data_grp {
    answers = student_data.actual_result;
    correct_answers = FILTER answers BY actual_result == 'Correct';
    incorrect_answers = FILTER answers BY actual_result == 'false';
    GENERATE group AS student_name, COUNT(correct_answers) AS correct_ans_count, COUNT(incorrect_answers) AS incorrect_ans_count;
};
Input : student_data.csv :
b,q1,Correct
a,q1,false
b,q2,Correct
a,q2,false
b,q3,false
a,q3,Correct
b,q4,false
a,q4,false
b,q5,false
a,q5,false
Output : DUMP student_correct_answer_data:
-- schema : (student_name, correct_ans_count, incorrect_ans_count)
(a,1,4)
(b,2,3)
Ref : For more details on nested FOREACH
http://pig.apache.org/docs/r0.12.0/basic.html#foreach
http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach
Use this:
data = LOAD '/abc.txt' USING PigStorage(',') AS (name:chararray, number:chararray,result:chararray);
B = GROUP data by (name,result);
C = foreach B generate FLATTEN(group) as (name,result), COUNT(data) as count;
and the output will look like:
(a,false,4)
(a,Correct,1)
(b,false,3)
(b,Correct,2)
Hope this is the output you are looking for
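Grouping by the (name, result) pair and counting can be sketched in plain Python (not Pig) with a Counter. The rows below are the sample input from this thread; the variable names are only illustrative.

```python
from collections import Counter

# (student_name, question_number, actual_result), taken from the sample input
rows = [
    ("b", "q1", "Correct"), ("a", "q1", "false"),
    ("b", "q2", "Correct"), ("a", "q2", "false"),
    ("b", "q3", "false"), ("a", "q3", "Correct"),
    ("b", "q4", "false"), ("a", "q4", "false"),
    ("b", "q5", "false"), ("a", "q5", "false"),
]

# Count rows per (name, result) key, like GROUP BY (name, result) plus COUNT.
counts = Counter((name, result) for name, _question, result in rows)

for (name, result), count in sorted(counts.items()):
    print(name, result, count)
```

This yields one row per student and result value, whereas the nested FOREACH version yields one row per student with both counts side by side.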

Optimizing pig script

I am trying to generate aggregated output. The issue is that all the data is going to a single reducer (FILTER and COUNT seem to be causing the problem). How can I optimize the following script?
Expected output:
group, 10,2,12,34...
data = LOAD '/input/useragents' USING PigStorage('\t') AS (Col1:chararray,Col2:chararray,Col3:chararray,col4:chararray,col5:chararray);
grp1 = GROUP data BY UA PARALLEL 50;
fr1 = FOREACH grp1 {
    fltrCol1 = FILTER data BY Col1 == 'Other';
    fltrCol2 = FILTER data BY Col2 == 'Other';
    fltrCol3 = FILTER data BY Col3 == 'Other';
    fltrCol4 = FILTER data BY col4 == 'Other';
    fltrCol5 = FILTER data BY col5 == 'Other';
    cnt_fltrCol1 = COUNT(fltrCol1);
    cnt_fltrCol2 = COUNT(fltrCol2);
    cnt_fltrCol3 = COUNT(fltrCol3);
    cnt_fltrCol4 = COUNT(fltrCol4);
    cnt_fltrCol5 = COUNT(fltrCol5);
    GENERATE group,cnt_fltrCol1,cnt_fltrCol2,cnt_fltrCol3,cnt_fltrCol4,cnt_fltrCol5;
}
You could move the filter logic before the GROUP by adding fltrCol{1,2,3,4,5} columns as integers, then sum them up. Off the top of my head, here is the script:
data = LOAD '/input/useragents' USING PigStorage('\t') AS (Col1:chararray,Col2:chararray,Col3:chararray,col4:chararray,col5:chararray);
-- UA is used as in the question; it is not part of the posted schema.
-- 'filter' is a reserved word in Pig, so the relation is named flagged instead.
flagged = FOREACH data GENERATE UA,
    ((Col1 == 'Other') ? 1 : 0) as fltrCol1,
    ((Col2 == 'Other') ? 1 : 0) as fltrCol2,
    ((Col3 == 'Other') ? 1 : 0) as fltrCol3,
    ((col4 == 'Other') ? 1 : 0) as fltrCol4,
    ((col5 == 'Other') ? 1 : 0) as fltrCol5;
grp1 = GROUP flagged BY UA PARALLEL 50;
fr1 = FOREACH grp1 {
    cnt_fltrCol1 = SUM(flagged.fltrCol1);
    cnt_fltrCol2 = SUM(flagged.fltrCol2);
    cnt_fltrCol3 = SUM(flagged.fltrCol3);
    cnt_fltrCol4 = SUM(flagged.fltrCol4);
    cnt_fltrCol5 = SUM(flagged.fltrCol5);
    GENERATE group,cnt_fltrCol1,cnt_fltrCol2,cnt_fltrCol3,cnt_fltrCol4,cnt_fltrCol5;
}
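The idea of turning each filter into a 0/1 flag and summing the flags per group can be sketched in plain Python (not Pig). The function name and sample rows are hypothetical, and the sketch assumes exactly five columns after the grouping key, as in the question.

```python
from collections import defaultdict


def count_other_per_group(rows, marker="Other"):
    """For each group key, count how often each of the five columns equals marker."""
    totals = defaultdict(lambda: [0] * 5)
    for key, *cols in rows:
        # Turn each column into a 0/1 flag, then add the flags to the running sums.
        flags = [1 if col == marker else 0 for col in cols]
        totals[key] = [total + flag for total, flag in zip(totals[key], flags)]
    return dict(totals)


# Hypothetical rows: (UA, Col1, Col2, Col3, col4, col5)
rows = [
    ("ua1", "Other", "x", "Other", "x", "x"),
    ("ua1", "Other", "Other", "x", "x", "x"),
    ("ua2", "x", "x", "x", "x", "Other"),
]
result = count_other_per_group(rows)
print(result)
```

Because the sums are built incrementally, partial sums from different chunks of a group can be added together, which is presumably why this shape is friendlier to aggregation than filtering whole bags inside the FOREACH.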

conditional statements not working in expression editor

I want to check the current month against several conditions, as below.
So I took a TextField, set its pattern to MM, and entered the code below in the expression editor:
new java.util.Date()>=4 && new java.util.Date() <= 7 ? "Q1" :
new java.util.Date()>=8 && new java.util.Date() <=11 ? "Q2" : "Q3"
but it gives this error:
Error filling print... Error evaluating expression :      Source text : new java.util.Date()>=4 && new java.util.Date() <= 7 ? "Q1" : new java.util.Date()>=8 && new java.util.Date() <=11  ? "Q2" : "Q3"
Setting up the file resolver..
But when I give an expression like
new java.util.Date()== 4 ? "Q1" : "Q2"
it works fine.
Is iReport unable to resolve multiple conditions? Or should I use a separate TextField for each condition?
Are you sure new java.util.Date() gives you just the month?
Try this; it will give you the month:
(new java.text.SimpleDateFormat("M")).format(new java.util.Date())
You can also use yyyy (year) or d (day of month) to check.
Integer.parseInt(...month...) or Date.parse(...month...) may also be needed, since format() returns a String, if the values you compare against are integers or dates.
Try using Calendar instead. Note that Calendar.MONTH is zero-based (January is 0).
(Calendar.getInstance()).get(Calendar.MONTH)>=3 && (Calendar.getInstance()).get(Calendar.MONTH)<=6 ? "Q1" :
(Calendar.getInstance()).get(Calendar.MONTH)>=7 && (Calendar.getInstance()).get(Calendar.MONTH)<=10 ? "Q2" : "Q3"
Can you please try putting the expressions in brackets, like this:
(new java.util.Date()>=4 && new java.util.Date() <= 7 )? "Q1" : (
(new java.util.Date()>=8 && new java.util.Date() <=11 ) ? "Q2" : "Q3")
You can chain multiple conditions like this:
new java.util.Date() >= 4 ?
"Q1" :
new java.util.Date() <= 7 ?
"Q2" :
new java.util.Date() >= 8 ?
"Q3" :
"Q4"
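The underlying point in these answers is that the comparison needs the month as a number, not the whole date object. A minimal sketch of that logic in plain Python (not a JasperReports expression) follows; the function name is hypothetical and the ranges are the ones from the question, with months numbered 1 to 12.

```python
import datetime


def quarter_label(month):
    """Map a 1-based month number to the labels used in the question."""
    if 4 <= month <= 7:
        return "Q1"
    if 8 <= month <= 11:
        return "Q2"
    return "Q3"


# Extract the numeric month first, then compare it against the ranges.
current_label = quarter_label(datetime.date.today().month)
print(current_label)
```

In the report expression, the same shape applies: obtain an integer month first, then chain the range checks on that integer.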

Doctrine remove "()" automatically

When I built a query, Doctrine removed the "()" automatically.
Here is my query:
$query = MstFontTable::getInstance()->createQuery('msf');
$query->where('(full_font_name LIKE ? OR
full_font_name LIKE ? OR
full_font_name LIKE ? OR
full_font_name = ? OR
font_name = ?)', array(trim($fontName) . ',%',
'%,' . trim($fontName),
'%,' . trim($fontName) . ',%',
trim($fontName),
trim($fontName)
)
);
$query->andWhere('((tenant_id = 0 OR tenant_id = ?))', array(intval($tenantId)));
Here is the result when I use $query->getDql():
FROM MstFont msf WHERE (full_font_name LIKE ? OR
full_font_name LIKE ? OR
full_font_name LIKE ? OR
full_font_name = ? OR
font_name = ?) AND ((tenant_id = 0 OR tenant_id = ?))
Here is the result when I use $query->getSqlQuery():
SELECT m.font_id AS m__font_id, m.tenant_id AS m__tenant_id, m.font_name AS m__font_name, m.font_file AS m__font_file, m.font_category AS m__font_category, m.vendor AS m__vendor, m.full_font_name AS m__full_font_name, m.font_name_ap AS m__font_name_ap FROM mst_font m WHERE (m.full_font_name LIKE ? OR
full_font_name LIKE ? OR
full_font_name LIKE ? OR
full_font_name = ? OR
font_name = ? AND (m.tenant_id = 0 OR m.tenant_id = ?))
Can anyone help explain this problem?
Try removing the excess brackets, changing:
$query->andWhere('((tenant_id = 0 OR tenant_id = ?))', array(intval($tenantId)));
to
$query->andWhere('tenant_id = 0 OR tenant_id = ?', array(intval($tenantId)));
Do the same for the first one too, changing:
$query->where('(full_font_name LIKE ? OR ...
to
$query->where('full_font_name LIKE ? OR ...
and see what happens. You don't need them and Doctrine will put brackets for those and/or conditions.
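The lost bracket matters because AND binds more tightly than OR in SQL, so the generated query no longer applies the tenant check to every name condition. The sketch below shows the same effect in plain Python (not PHP or SQL), whose `and`/`or` precedence follows the same rule; the function and parameter names are hypothetical stand-ins for the conditions in the query.

```python
def without_group(name_matches, font_matches, tenant_ok):
    """Mirrors: name_matches OR font_matches AND tenant_ok (no grouping)."""
    # 'and' is evaluated before 'or', so tenant_ok only guards font_matches.
    return name_matches or font_matches and tenant_ok


def with_group(name_matches, font_matches, tenant_ok):
    """Mirrors: (name_matches OR font_matches) AND tenant_ok."""
    return (name_matches or font_matches) and tenant_ok


# A row whose name matches but whose tenant does not:
print(without_group(True, False, False))  # row is returned despite the tenant mismatch
print(with_group(True, False, False))     # row is filtered out
```

Comparing the two results on the same inputs is a quick way to check whether a generated query still enforces the tenant restriction.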
