I currently have a table with five columns:
A = Campaign
B = Person
C = Opportunity Name
D = Total Cost of Campaign
E = Date
I'm trying to use COUNTIFS to count the rows where column A exactly matches the value in cell H2 and the date in column E is later than the value in cell I2.
I have something like this so far:
=countifs($A$2:$A, $H$2, $E$2:$E, ">"&$I$2)
However, I'm having a tough time trying to dedupe this. Where duplicate names exist, it should count only unique rows based on the data in column C. Please refer to my data table:
Campaign  Person  Opportunity Name  Total Cost of Campaign  Date
A         Bob     Airbnb            5000                    3/2/2017
B         Jim     Sony              10000                   3/2/2017
B         Jane    Coca-Cola         10000                   3/2/2017
C         Jim     Sony              200                     3/2/2017
B         Daniel  Sony              10000                   3/2/2017
B         April   Coca-Cola         10000                   3/5/2017
For example:
=countifs($A$2:$A, $H$2, $E$2:$E, ">"&$I$2)
with B in H2 and 3/1/2017 in I2 will give me a result of 4 but I'm really trying to extract a value of 2, given that there are only two unique names in Column C (Sony and Coca-Cola).
How could I do this?
You need to include column C in your formula and use the COUNTUNIQUE function, as @Jeeped has suggested. Here is the final formula that you can use:
=COUNTUNIQUE(IFERROR(FILTER(C:C,A:A=H2,E:E>I2)))
Alternatively, use COUNTUNIQUE with QUERY:
=countunique(QUERY(A:E,"Select C where A = '"&H2&"' and E > date '" & text(I2,"yyyy-mm-dd") & "'",0))
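The logic shared by both formulas (filter the rows, then count distinct values in column C) can be sketched in Python. This is only an illustration of the idea, not something Sheets runs: the function name `count_unique_opportunities` and the `ROWS` list are made up here, and the dates are assumed to be in month/day/year order.

```python
from datetime import date

# Sample rows from the question: (campaign, person, opportunity, cost, date).
# Dates are assumed to be month/day/year, so 3/2/2017 is 2 March 2017.
ROWS = [
    ("A", "Bob", "Airbnb", 5000, date(2017, 3, 2)),
    ("B", "Jim", "Sony", 10000, date(2017, 3, 2)),
    ("B", "Jane", "Coca-Cola", 10000, date(2017, 3, 2)),
    ("C", "Jim", "Sony", 200, date(2017, 3, 2)),
    ("B", "Daniel", "Sony", 10000, date(2017, 3, 2)),
    ("B", "April", "Coca-Cola", 10000, date(2017, 3, 5)),
]


def count_unique_opportunities(rows, campaign, after):
    """Count distinct opportunity names for one campaign dated after `after`."""
    # The condition plays the role of FILTER; building a set plays the role
    # of COUNTUNIQUE, because a set keeps each name only once.
    names = {
        opportunity
        for camp, _person, opportunity, _cost, day in rows
        if camp == campaign and day > after
    }
    return len(names)


if __name__ == "__main__":
    # H2 = "B", I2 = 3/1/2017, as in the question.
    print(count_unique_opportunities(ROWS, "B", date(2017, 3, 1)))  # prints 2
```

With B in H2 and 3/1/2017 in I2, four rows pass the filter but only Sony and Coca-Cola remain after de-duplication, which is the value of 2 the question asks for.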
I have just started learning Pig and need a little help with the question below. Thanks in advance!
For example, I have input like:
Occupation     Category   Name
Actress        Acting     Marion Cotillard
Actor          Acting     Liam Nelson
Tennis Plyr    Athletics  Roger Federer
Football Plyr  Athletics  Neymar
Actor          Acting     Tom Hanks
Actress        Acting     Elizabeth Banks
US Senator     Politics   Elizabeth Warren
Football Plyr  Athletics  Mesut Ozil
I want to know how many distinct occupation types there are in each category.
For example, Acting has two types: Actress and Actor. Hence, the result will be 2.
Problem I'm facing: I am not able to apply DISTINCT on the 'Occupation' column to the output of 'group by Category'. :(
Try this:
-- Load the tab-separated records.
x = LOAD '<data>' USING PigStorage('\t') AS (occupation:chararray, category:chararray, name:chararray);
x_grouped = GROUP x BY category;
-- Within each category, de-duplicate the occupations and count them.
x_grouped_distinct = FOREACH x_grouped { cat = DISTINCT x.occupation; GENERATE group, cat, COUNT(cat); };
DUMP x_grouped_distinct;
Alternatively, apply DISTINCT first and then group by category. Assuming you have already loaded the data into relation A:
1) Select the two columns after the load.
2) Apply DISTINCT to the relation.
3) Group by category.
4) Count the occupations for each category.
B = FOREACH A GENERATE Occupation as Occupation,Category as Category;
C = DISTINCT B;
D = GROUP C BY $1;
E = FOREACH D GENERATE group,COUNT(C.Occupation);
DUMP E;
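Both Pig answers compute the same thing: the number of distinct occupations per category. As a rough cross-check outside Pig, the same computation can be sketched in Python. The function name `distinct_occupations_per_category` and the `RECORDS` list are illustrative only and are not part of either answer.

```python
from collections import defaultdict

# Sample records from the question: (occupation, category, name).
RECORDS = [
    ("Actress", "Acting", "Marion Cotillard"),
    ("Actor", "Acting", "Liam Nelson"),
    ("Tennis Plyr", "Athletics", "Roger Federer"),
    ("Football Plyr", "Athletics", "Neymar"),
    ("Actor", "Acting", "Tom Hanks"),
    ("Actress", "Acting", "Elizabeth Banks"),
    ("US Senator", "Politics", "Elizabeth Warren"),
    ("Football Plyr", "Athletics", "Mesut Ozil"),
]


def distinct_occupations_per_category(records):
    """Return {category: number of distinct occupations in that category}."""
    seen = defaultdict(set)
    for occupation, category, _name in records:
        # A set ignores repeats, which is the DISTINCT step.
        seen[category].add(occupation)
    # Taking the size of each set is the COUNT step.
    return {category: len(occupations) for category, occupations in seen.items()}


if __name__ == "__main__":
    for category, count in sorted(distinct_occupations_per_category(RECORDS).items()):
        print(category, count)
```

For the sample input this gives 2 for Acting, 2 for Athletics, and 1 for Politics.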
I'm new to programming in Pig Latin and I have a question.
Let's say I have the following two relations (A and B):
Relation A: http://i.stack.imgur.com/Aa5Rd.png
Relation B: http://i.stack.imgur.com/m467q.png
Now the relations should be joined, but only for the keys (id) that exist in A. So the result should look like:
Relation Result: i.stack.imgur.com/3elgh.png (I cannot post more than 2 links)
How can I solve that?
My approach, result = JOIN A BY id, B BY id; does not seem to work, because it creates a result relation with all ids and texts :/
Thank you very much in advance,
Stefanos
Your approach is right. I got the output you described, so I'm not sure why you didn't. Can you cross-check your Pig script against the one below?
input1:
1
4
6
input2:
1,peter
2,jay
3,dan
4,knut
5,Gnu
6,rafael
7,hans
PigScript:
A = LOAD 'input1' AS (id:int);
B = LOAD 'input2' USING PigStorage(',') AS (id:int,text:chararray);
C = JOIN A BY id,B BY id;
D = FOREACH C GENERATE A::id AS id,B::text as text;
DUMP D;
Output:
(1,peter)
(4,knut)
(6,rafael)
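To see why the plain JOIN already drops the ids that are missing from A, here is a small Python sketch of the same inner join on the sample inputs. The function name `inner_join` is hypothetical, and the sketch assumes that ids are unique in both inputs, as they are in the sample data.

```python
# Sample inputs from the answer above.
INPUT1 = [1, 4, 6]
INPUT2 = [
    (1, "peter"),
    (2, "jay"),
    (3, "dan"),
    (4, "knut"),
    (5, "Gnu"),
    (6, "rafael"),
    (7, "hans"),
]


def inner_join(ids, rows):
    """Keep only the (id, text) rows whose id also appears in `ids`.

    Assumes ids are unique on both sides; with duplicates, a real join
    would emit one output row per matching pair.
    """
    wanted = set(ids)
    return [(row_id, text) for row_id, text in rows if row_id in wanted]


if __name__ == "__main__":
    for row in inner_join(INPUT1, INPUT2):
        print(row)
```

Rows 2, 3, 5, and 7 have no partner in the first input, so an inner join leaves them out, which matches the Pig output above.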
I have two tables:
1) Logs
2) Jobs
The structure of both is as follows:
Logs: id, Emailid, LogDate
sample data:
1, a@a.com, Jan 24 1999
2, b@a.com, Jan 25 1999
3, a@a.com, Jan 25 1999
4, c@a.com, Jan 26 1999
5, a@a.com, Jan 27 1999
Jobs: jid, job_name, job_viewed_by
sample data:
j01, painter, a@a.com
j02, teacher, a@a.com
j01, painter, b@a.com
job_viewed_by is a foreign key in the Jobs table and is related to Emailid in the Logs table.
Now I want a LINQ to Entities query that gives me
each Emailid from the Logs table with its most recent login date, along with the number of jobs viewed by that user (count of jobs).
So, per the sample data above, my requirement is:
a@a.com last logged in on 27 Jan 1999 and has viewed 2 jobs so far
b@a.com last logged in on 25 Jan 1999 and has viewed 1 job so far
c@a.com last logged in on 26 Jan 1999 and has viewed no jobs
I know how to write it in SQL, but I need to write it using LINQ to Entities.
I tried the query below, but it gives me the number of recent logins rather than the job counts.
var q= (from p in context.Logs
from x in context.ViewedJobs.Where(v=>p.EmailId ==v.ViewedBy)
group p by p.EmailId into grp
select new{ EmailId = grp.Key,
LastDate = grp.Max(g => g.LogDate),
Count=grp.Count() }).OrderByDescending(m=>m.LogDate);
A simple approach to try:
var q = from p in context.Logs
group p by p.Emailid into g
select new
{
EmailId=g.Key,
LastDate= g.Max(x => x.LogDate),
Count=context.ViewedJobs.Count(v=>v.ViewedBy==g.Key)
};
Updated version:
var q = from p in context.Logs
group p by p.Emailid into g
join j in context.ViewedJobs
on g.Key equals j.ViewedBy into leftGroup
select new
{
EmailId=g.Key,
LastDate= g.Max(x => x.LogDate),
Count=leftGroup.Any()?leftGroup.Count():0
};
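The result both queries aim for (latest login per email, plus a count of viewed jobs that is zero when there are none) can be sketched in plain Python for comparison. This is not LINQ and does not touch a database; `login_summary`, `LOGS`, and `JOBS` are names invented for this sketch, and the rows follow the question's sample data.

```python
from collections import Counter
from datetime import date

# Sample data from the question: (id, email, log date) and (jid, job name, viewer).
LOGS = [
    (1, "a@a.com", date(1999, 1, 24)),
    (2, "b@a.com", date(1999, 1, 25)),
    (3, "a@a.com", date(1999, 1, 25)),
    (4, "c@a.com", date(1999, 1, 26)),
    (5, "a@a.com", date(1999, 1, 27)),
]
JOBS = [
    ("j01", "painter", "a@a.com"),
    ("j02", "teacher", "a@a.com"),
    ("j01", "painter", "b@a.com"),
]


def login_summary(logs, jobs):
    """Return (email, last login date, jobs viewed) rows, newest login first."""
    last_login = {}
    for _log_id, email, day in logs:
        # Keep the latest date per email (the Max(LogDate) step).
        if email not in last_login or day > last_login[email]:
            last_login[email] = day
    # Count views per viewer; a Counter yields 0 for emails with no views,
    # which is the left-join behaviour the question needs.
    views = Counter(viewer for _jid, _job_name, viewer in jobs)
    rows = [(email, day, views[email]) for email, day in last_login.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    for row in login_summary(LOGS, JOBS):
        print(row)
```

The important design point is that the job count is taken from the Jobs side only. Counting the joined rows instead multiplies logins by views, which is why the query in the question returns login counts rather than job counts.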
In Pig Latin, I want to group twice, so as to select rows by two different criteria.
I'm having trouble explaining the problem, so here is an example. Let's say I want to get the details of the people whose age is nearest to mine ($my_age) and who have a lot of money.
Relation A has five columns: (name, address, zipcode, age, money)
B = GROUP A BY (address, zipcode); -- group by the address
-- generate the address, the person's age ...
C = FOREACH B GENERATE group, MIN($my_age - age) AS min_age, FLATTEN(A);
D = FILTER C BY min_age == age;
-- Then group again to select the richest; this GROUP BY fails:
E = GROUP D BY group; or E = GROUP D BY (address, zipcode);
-- The end would work
D = FOREACH E GENERATE group, MAX(money) AS max_money, FLATTEN(A);
F = FILTER C BY max_money == money;
I've tried filtering for the nearest and the richest at the same time, but that doesn't work, because the richest people can be much older than I am.
Another, more realistic example:
You have a demands file with the fields: iddem, idopedem, datedem
You have an operations file with the fields: idope, labelope, dateope, idoftheday, infope
I want to return the operations that match the demands, such that:
idopedem matches idope.
The dateope must be the nearest to datedem.
If datedem - dateope >= 0, then I must select the operation with the max(idoftheday); otherwise I must select the operation with the min(idoftheday).
Relation A has 5 columns (idope, labelope, dateope, idoftheday, infope)
Relation B has 3 columns (iddem, idopedem, datedem)
C = JOIN A BY idope, B BY idopedem;
D = FOREACH C GENERATE iddem, idope, datedem, dateope, ABS(datedem - dateope) AS datedelta, idoftheday, infope;
E = GROUP D BY iddem;
F = FOREACH E GENERATE group, MIN(D.datedelta) AS deltamin, FLATTEN(D);
G = FILTER F BY deltamin == datedelta;
-- Then I must group a second time to select the min or max idoftheday
H = GROUP G BY group; --Does not work when dump
H = GROUP G BY iddem; --Does not work when dump
I = FOREACH H GENERATE group, (datedem - dateope >= 0 ? max(idoftheday) as idofdaysel : min(idoftheday) as idofdaysel), FLATTEN(D);
J = FILTER F BY idofdaysel == idoftheday;
DUMP J;
Data for the second example (note that the dates are already in Unix time format):
The demands file looks like:
1, 'ctr1', 1359460800000
2, 'ctr2', 1354363200000
The operations file looks like:
idope,labelope,dateope,idoftheday,infope
'ctr0','toto',1359460800000,1,'blabla0'
'ctr0','tata',1359460800000,2,'blabla1'
'ctr1','toto',1359460800000,1,'blabla2'
'ctr1','tata',1359460800000,2,'blabla3'
'ctr2','toto',1359460800000,1,'blabla4'
'ctr2','tata',1359460800000,2,'blabla5'
'ctr3','toto',1359460800000,1,'blabla6'
'ctr3','tata',1359460800000,2,'blabla7'
The result must look like:
1, 'ctr1', 'tata',1359460800000,2,'blabla3'
2, 'ctr2', 'toto',1359460800000,1,'blabla4'
Sample input and output would help greatly, but from what you have posted it appears to me that the problem is not so much in writing the Pig script but in specifying what exactly it is you hope to accomplish. It's not clear to me why you're grouping at all. What is the purpose of grouping by address, for example?
Here's how I would solve your problem:
First, design an optimization function that will induce an ordering on your dataset that reflects your own prioritization of money vs. age. For example, to severely penalize large age differences but prefer more money with small ones, you could try:
scored = FOREACH A GENERATE *, money / POW(1.0 + ABS($my_age - age) / 10.0, 2.0) AS score; -- 10.0 avoids integer division when age is an int
ordered = ORDER scored BY score DESC;
top10 = LIMIT ordered 10;
That gives you the 10 best people according to your optimization function.
Then the only work is to design a function that matches your own judgments. For example, with the function I chose, a person with $100,000 who is your age would be preferred to someone with $350,000 who is 10 years older (or younger). But someone with $500,000 who is 20 years older or younger is preferred to someone your age with just $50,000. If either of those doesn't fit your intuition, modify the formula. A simple quadratic factor likely won't be sufficient, but with a little experimentation you can hit upon something that works for you.
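To experiment with the scoring formula before putting it in a Pig script, a small Python sketch can help. It mirrors money / POW(1 + ABS($my_age - age) / 10, 2) using floating-point division. The names `score` and `top_n` and the tuple layout (name, age, money) are assumptions made for this sketch only.

```python
def score(money, age, my_age):
    """Money divided by a quadratic penalty on the age difference.

    Python's / is true division, so an age gap of 5 contributes 0.5 and is
    not truncated to 0.
    """
    return money / (1 + abs(my_age - age) / 10) ** 2


def top_n(people, my_age, n=10):
    """Order (name, age, money) tuples by descending score and keep the first n.

    This corresponds to the ORDER ... DESC followed by LIMIT in the Pig snippet.
    """
    ranked = sorted(people, key=lambda p: score(p[2], p[1], my_age), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    my_age = 30  # hypothetical value for $my_age
    print(score(100_000, 30, my_age))  # same age, no penalty: 100000.0
    print(score(350_000, 40, my_age))  # 10 years apart, divided by 4: 87500.0
    print(score(500_000, 50, my_age) > score(50_000, 30, my_age))  # True
```

The printed values reproduce the two comparisons described above: $100,000 at the same age (100000.0) beats $350,000 at ten years' difference (87500.0), while $500,000 at twenty years' difference still beats $50,000 at the same age.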