Let's say I have a column with many rows but only two values, A and B.
I am trying, unsuccessfully, to count only the rows with A in a summary calculation for a dashboard (without creating a new column for this specific calculation).
The expression that gives me a syntax error is:
count([column] = 'A')
Any suggestions?
You'll need to use an if then else construct:
count( if([Column]='A') then ([Column]) else (Null))
You can use IF-THEN-ELSE or CASE-WHEN-THEN-ELSE to create your own count:
sum(
if ([Query Item] = 'A')
then (1)
else (0)
)
or
sum(
case
when [Query Item] = 'A'
then 1
else 0
end
)
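The same idea, sketched in Python for illustration only (the list `values` is made-up sample data, not taken from the original report). Summing a 1/0 flag and counting only the non-null results of a conditional should give the same number:

```python
# Hypothetical sample column with only two distinct values.
values = ["A", "B", "A", "A", "B"]

# Equivalent of sum(if [Column] = 'A' then 1 else 0).
count_via_sum = sum(1 if v == "A" else 0 for v in values)

# Equivalent of count(if [Column] = 'A' then [Column] else null):
# non-matching rows become None and are skipped by the count.
kept = [v if v == "A" else None for v in values]
count_via_count = sum(1 for v in kept if v is not None)

print(count_via_sum, count_via_count)
```

Counting the bare comparison does not help, because a count looks at whether a value is present, not at whether it is true.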
Laravel: I'm trying to count the rows for each subject, broken down by result value.
The model name is Result.
Example data stored in the database:
NAME SUBJECT RESULT
A HX PASS
B HX FAIL
C DX PASS
D DX PASS
E MR FAIL
I want to show the values in a Blade table like this:
SUBJECT PASS FAIL
HX 1 1
DX 2 0
MR 0 1
This works for me:
$result = Result::groupBy('subject')
->selectRaw("subject, SUM(IF(result LIKE 'pass', 1, 0) ) as PASS, SUM(IF(result LIKE 'fail', 1, 0) ) as FAIL")
->get();
Or put the condition directly inside SUM:
->selectRaw("subject, SUM(result LIKE 'pass') as PASS, SUM(result LIKE 'fail') as FAIL")
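Note: COUNT does not work here, because it counts every non-NULL value, including the 0 returned for rows that do not match.
If possible, store result as a Boolean (1 = 'pass', 0 = 'fail').
If you get an error, quote the column names with backticks: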
->selectRaw("`subject`, SUM(`result` LIKE 'pass') as PASS, SUM(`result` LIKE 'fail') as FAIL")
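To see why SUM works where COUNT does not, here is a small sketch that runs the same kind of query against an in-memory SQLite database through Python's `sqlite3` module. It is not Laravel or MySQL: the table name `results` and the column aliases are assumptions made for the example, and only the five rows come from the question.

```python
import sqlite3

# The five rows from the question; the table name "results" is an assumption.
ROWS = [
    ("A", "HX", "PASS"),
    ("B", "HX", "FAIL"),
    ("C", "DX", "PASS"),
    ("D", "DX", "PASS"),
    ("E", "MR", "FAIL"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (name TEXT, subject TEXT, result TEXT)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", ROWS)

# A comparison evaluates to 1 or 0, so SUM adds up the matches.
# COUNT only checks for non-NULL, so it counts the 0s as well.
query = """
    SELECT subject,
           SUM(result LIKE 'pass')   AS pass_count,
           SUM(result LIKE 'fail')   AS fail_count,
           COUNT(result LIKE 'pass') AS count_of_condition
    FROM results
    GROUP BY subject
    ORDER BY subject
"""
table = conn.execute(query).fetchall()
for row in table:
    print(row)
```

In this sketch the last column is simply the number of rows per subject, which is why swapping SUM for COUNT gives the wrong figures.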
I need the following output.
NE 50
SE 80
I am using a Pig query to count the countries in each zone.
c1 = group country by zone;
c2 = foreach c1 generate COUNT(country.zone), (
case country.zone
when 1 then 'NE'
else 'SE'
);
But I am not able to achieve my output. I am getting error like the following:
2016-03-30 13:57:16,569 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1039: (Name: Equal Type: null Uid: null)incompatible types in Equal Operator left hand side:bag :tuple(zone:int) right hand side:int
Details at logfile: /home/cloudera/pig_1459370643493.log
However, the following query does work:
c2 = foreach c1 generate group, COUNT(country.zone);
This will give following output:
(1,50)
(2,80)
How can I show NE instead of 1 and SE instead of 2? I thought using CASE would help, but I am getting an error. Can anyone help?
EDIT
Pig 0.12.0 and later support the CASE expression.
c2 = FOREACH c1 GENERATE (CASE group
WHEN 1 THEN 'NE'
WHEN 2 THEN 'SE'
WHEN 3 THEN 'AE'
ELSE 'VR' END), COUNT(country.zone);
Older Pig Versions
Older versions of Pig do not have a CASE expression. Your best option is to use a UDF. If the group values are limited to only two, you can use the bincond operator to check the value:
c2 = foreach c1 generate (group == 1 ? 'NE' : 'SE'), COUNT(country.zone);
If you have more than two values, nest the bincond operators like this. I used test values to generate the output.
Input
c2 = FOREACH c1 GENERATE (group == 1 ? 'NE' :
(group == 2 ? 'SE' :
(group == 3 ? 'AE' : 'VR'))), COUNT(country.zone);
In Pig 0.12 and later, you can use a CASE expression.
In your case, country.zone is a bag, and you can't compare a bag to an int.
The answer posted above gave me this error:
mismatched input ')' expecting END.
So here is updated, working code:
c2 = FOREACH c1 GENERATE (CASE group
WHEN 1 THEN 'NE'
WHEN 2 THEN 'SE'
WHEN 3 THEN 'AE'
ELSE 'VR' END), COUNT(country.zone);
Output:
(NE, 50)
(SE, 80)
(AE, 30)
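For readers less familiar with Pig, the group-then-relabel step can be sketched in Python. The zone codes below are made-up test data, and `zone_label` is a hypothetical helper that mirrors the nested bincond. The point is that the label is derived from the group key, not from the bag of grouped rows:

```python
from collections import Counter

# Hypothetical zone codes, one per country row (made-up data).
zones = [1, 1, 2, 1, 2, 3]


def zone_label(zone):
    # Same shape as the nested bincond: 1 -> NE, 2 -> SE, 3 -> AE, else VR.
    return "NE" if zone == 1 else ("SE" if zone == 2 else ("AE" if zone == 3 else "VR"))


# Group by zone and count, then relabel the group key.
counts = Counter(zones)
labelled = [(zone_label(zone), n) for zone, n in sorted(counts.items())]
print(labelled)
```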
I have a column in a tuple called avg_rating. I would like to create a new column, NPS, based on the values in avg_rating. Here is what the avg_rating data looks like:
avg_rating
3
4
8
9
10
So if rating is >= 8, then NPS will be Pr
if rating is between 4 and 8, NPS will be P
if rating is < 4, then NPS will be D
Here is what I am trying:
yy = FOREACH avg_rating GENERATE avg_rating,((int)wtr>=8 ?'P':(int)wtr>=4 && (int)wtr<8 ?'PR':'D');
I am using multiple conditions in a ternary operator, but it gives me this error:
Syntax error, unexpected symbol at or near '('
Any idea what's wrong with this?
There are several issues here:
You can't generate avg_rating, because it is the relation alias; generate the field (wtr) instead.
Use and instead of &&.
Add another set of parentheses around the embedded ternary.
This parses:
avg_rating = load '/tmp' using PigStorage('\t') as (wtr:INT);
yy = FOREACH avg_rating GENERATE
wtr,
((int)wtr>=8 ? 'P' : ((int)wtr>=4 and (int)wtr<8 ? 'PR' : 'D')) as v;
describe yy;
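The nesting in that expression is easier to see in a language with a chained conditional. Here is a Python sketch of the same thresholds, applied to the sample ratings from the question. `nps_label` is a hypothetical helper name, and the labels follow the Pig code above rather than the wording of the question:

```python
def nps_label(rating):
    # Outer test first; the inner conditional sits in its own parentheses,
    # just like the extra pair around the embedded ternary in Pig.
    return "P" if rating >= 8 else ("PR" if 4 <= rating < 8 else "D")


ratings = [3, 4, 8, 9, 10]  # sample values from the question
labels = [nps_label(r) for r in ratings]
print(list(zip(ratings, labels)))
```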
I am using the ternary operator to include values in SUM() operation conditionally. Here is how I am doing it.
GROUPED = GROUP ALL_MERGED BY (fld1, fld2, fld3);
REPORT_DATA = FOREACH GROUPED
{ GENERATE group,
SUM(GROUPED.fld4 == 'S' ? GROUPED.fld5 : 0) AS sum1,
SUM(GROUPED.fld4 == 'S' ? GROUPED.fld5 : (GROUPED.fld5 * -1)) AS sum2;
}
Schema for ALL_MERGED is
{ALL_MERGED: {fld1:chararray, fld2:chararray, fld3:chararray, fld4:chararray: fld5:int}}
When I execute this, it gives me following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: SUM in {group: (fld1:chararray, fld2:chararray, fld3:chararray), ALL_MERGED: {fld1:chararray, fld2:chararray, fld3:chararray, fld4:chararray: fld5:int}}
What am I doing wrong here?
SUM is a UDF which takes a bag as input. What you are doing has a number of problems, and I suspect it would help you to review a good reference on Pig. I recommend Programming Pig, available for free online. To begin with, GROUPED has two fields: a tuple called group and a bag called ALL_MERGED, which is what the error message is trying to tell you. (I say "trying" because Pig error messages are often quite cryptic.)
Also, you cannot pass expressions to UDFs like you wish to do. Instead you will have to GENERATE these fields and then pass them afterward. Try this:
ALL_MERGED_2 =
FOREACH ALL_MERGED
GENERATE
fld1 .. fld5,
((fld4 == 'S') ? fld5 : 0) AS sum_me1,
((fld4 == 'S') ? fld5 : fld5*-1) AS sum_me2;
GROUPED = GROUP ALL_MERGED_2 BY (fld1, fld2, fld3);
DATA =
FOREACH GROUPED
GENERATE
group,
SUM(ALL_MERGED_2.sum_me1) AS sum1,
SUM(ALL_MERGED_2.sum_me2) AS sum2;
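The two-step shape (compute the conditional value per row first, then group and sum) can be sketched in Python as follows. The rows are made-up sample data using the field order fld1 to fld5, and `report` is a hypothetical name for the result:

```python
from collections import defaultdict

# Hypothetical rows: (fld1, fld2, fld3, fld4, fld5).
rows = [
    ("a", "b", "c", "S", 10),
    ("a", "b", "c", "T", 4),
    ("x", "y", "z", "S", 7),
]

sums = defaultdict(lambda: [0, 0])
for fld1, fld2, fld3, fld4, fld5 in rows:
    # Step 1: the per-row conditional values (the first FOREACH).
    sum_me1 = fld5 if fld4 == "S" else 0
    sum_me2 = fld5 if fld4 == "S" else -fld5
    # Step 2: accumulate per group key (the GROUP and the SUMs).
    totals = sums[(fld1, fld2, fld3)]
    totals[0] += sum_me1
    totals[1] += sum_me2

report = {key: tuple(totals) for key, totals in sums.items()}
print(report)
```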
I have a query that's running slow (in a loop of about 100 it takes 5-10 seconds) and have no clue why. It's simply querying against a List of objects... your help is much appreciated!
I'm basically querying for Schedules that have been assigned to specific managers. Each one must be from the specified shift's week OR the first 2 days of the next week OR the last 2 days of the previous week.
I tried calculating .AddDays before but that didn't help. When I ran a performance test it highlighted the "from" statement below.
List<Schedule> _schedule = Schedule.GetAll();
List<Shift> _shifts = Shift.GetAll();
// Then later...
List<Schedule> filteredSchedule = (from sch in _schedule
from s in _shifts
where
sch.ShiftID == s.ShiftID
& (sch.ManagerID == 1 | sch.ManagerID == 2 | sch.ManagerID == 3)
& ((s.ScheduleWeek == shift.ScheduleWeek)
| (s.ScheduleWeek == shift.ScheduleWeek.AddDays(7)
& (s.DayOfWeek == 1 | s.Code == 2))
| (sch.ScheduleWeek == shift.ScheduleWeek.AddDays(-7)
& (s.DayOfWeek == 5 | s.Code == 6)))
select sch)
.OrderBy(sch => sch.ScheduleWeek)
.ThenBy(sch => sch.DayOfWeek)
.ToList();
First port of call: use && instead of & and || instead of |. Otherwise all the subexpressions in the where clause will be evaluated, even if the answer is already known.
Second port of call: use a join instead of two "from" clauses with a where:
var filteredSchedule = (from sch in _schedule
join s in _shifts on sch.ShiftID equals s.ShiftID
where ... rest of the condition ...
Basically that's going to create a hash of all the shift IDs, so it can quickly look up possible matches for each schedule.
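To illustrate why the join helps, here is a rough Python sketch that contrasts the two approaches. The records and field names are made up for the example, and the function names are hypothetical. This is not how LINQ is implemented internally, only the general idea of a hash lookup replacing a nested scan:

```python
from collections import defaultdict

# Hypothetical records standing in for Schedule and Shift objects.
schedules = [
    {"schedule_id": 1, "shift_id": 10},
    {"schedule_id": 2, "shift_id": 11},
    {"schedule_id": 3, "shift_id": 10},
]
shifts = [
    {"shift_id": 10, "day_of_week": 1},
    {"shift_id": 11, "day_of_week": 5},
]


def nested_loop_join(schedules, shifts):
    # Two "from" clauses: every schedule is compared with every shift.
    return [(sch, s) for sch in schedules for s in shifts
            if sch["shift_id"] == s["shift_id"]]


def hash_join(schedules, shifts):
    # Build the lookup once, then find matches for each schedule directly.
    by_id = defaultdict(list)
    for s in shifts:
        by_id[s["shift_id"]].append(s)
    return [(sch, s) for sch in schedules
            for s in by_id.get(sch["shift_id"], [])]


pairs = hash_join(schedules, shifts)
print(len(pairs), pairs == nested_loop_join(schedules, shifts))
```

With n schedules and m shifts, the nested version does roughly n times m comparisons, while the lookup version does about n plus m steps.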