PIG: Filter hive table by previous table result - hadoop

I need to query one HIVE table and filter the other table with one column of the previous one.
Example:
A = LOAD 'db.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
filterA = filter A by (id=='123');
B = LOAD 'db.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();
//the problem is here. filterA has many rows. I need to apply filter for each of the row.
filterB = filter B by (id==filterA.id);
Data in A:
tabid id dept location
1 1 IS SJ
2 4 CS SF
3 5 EC MD
Data in B:
tabid id name address
1 4 john 123 S AVE
2 5 jane 456 N BLVD
3 9 nick 789 GREAT LAKE DR
Expected Result:
tabid id name address
1 4 john 123 S AVE
2 5 jane 456 N BLVD

As asked in the comment, it sounds like what you're looking for is a join. Sorry if I misunderstood your question.
A = LOAD 'db.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
B = LOAD 'db.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();
C = JOIN A by id, B by id;

Related

Sort Column header based on row value, and show as columns

We have sheet with column names and values in the cells below.
We like to have a list of the names and the value next to it ordered.
example.
A
B
C
D
E
1
John
Mary
Tom
Grace
2
3
4
5
2
and we would like the same data below which looks like...
A
B
1
Tom
5
2
Mary
4
3
John
3
4
Grace
2
Any ideas? Thanks
use:
=SORT(TRANSPOSE(A1:D2), 2, )
or:
=SORT(TRANSPOSE({A1:D1; A4:D4}), 2, )
SUGGESTION
Perhaps you can try this way:
=QUERY(TRANSPOSE(A1:D2),"SELECT * order by Col2 DESC")
Sample Sheet
Reference
TRANSPOSE
QUERY

Distinct on two columns with same data type

In my game application I have a combats table:
id player_one_id player_two_id
---- --------------- ---------------
1 1 2
2 1 3
3 3 4
4 4 1
Now I need to know hoy many unique users played the game. How can I apply distinct, count on both columns player_one_id and player_two_id?
Many thanks.
By using union you can get unique distinct value.
$playerone = DB::table("combats")
->select("combats.player_one_id");
$playertwo = DB::table("combats")
->select("combats.player_two_id")
->union($playerone)
->count();

Oracle Query Prevent Displayed Duplicate Record

Let's say i have a table structure like this :
ID | Name | SCHOOLNAME | CODESCHOOL
1 DARK Kindergarten 123 1
2 DARK Kindergarten 111 1
3 Knight NY University 3
4 Knight LA Senior HS 2
5 JOHN HARVARD 3
so, how to diplay all of the data above into like this :
ID | Name | SCHOOLNAME | CODESCHOOL
1 DARK Kindergarten 123 1
3 Knight NY University 3
5 JOHN HARVARD 3
my purpose is want to display data with the max of codeschool, but when i tried with my query below :
SELECT NAME, SCHOOLNAME, MAX(CODESCHOOL) FROM TABLE GROUP BY NAME, SCHOOLNAME
but the result is just like this :
ID | Name | SCHOOLNAME | CODESCHOOL
1 DARK Kindergarten 123 1
2 DARK Kindergarten 111 1
3 Knight NY University 3
4 Knight LA Senior HS 2
5 JOHN HARVARD 3
maybe it caused by the GROUP BY SCHOOLNAME, when i tried to not select SCHOOLNAME, the data displayed just like what i expected, but i need the SCHOOLNAME field for search condition in my query
hope you guys can help me out of this problem
any help will be appreciated
thanks
Using some wacky joins you can get a functional get max rows per category query.
What you essentially need to do is to join the table to itself and make sure that the joined values only contain the top values for the CODESCHOOL column.
I've also added a :schoolname parameter because you wanted to search by schoolname
Example:
SELECT
A.*
FROM
TABLE1 A
LEFT OUTER JOIN TABLE1 B ON B.NAME = A.NAME
AND B.CODESCHOOL < A.CODESCHOOL
WHERE
B.CODESCHOOL IS NULL AND
(
(A.SCHOOLNAME = :SCHOOLNAME AND :SCHOOLNAME IS NOT NULL) OR
(:SCHOOLNAME IS NULL)
);
this should create this output, note that dark has 2 outputs because it has 2 rows with the same code school which is the max in the dark "category"/name.
ID|NAME |SCHOOLNAME |CODESCHOOL
--| -----|----------------|----------
4|Knight|LA Senior HS | 2
5|JOHN |HARVARD | 3
2|DARK |Kindergarten 111| 1
1|DARK |Kindergarten 123| 1
It's not the most effective query but it should be more than good enough as a starting point.
Sidenote: I've been blatantly stealing this logic for a while from https://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-in-sql-without-subqueries/
I am using an analytical window function ROW_NUMBER().
This will group (or partition) by NAME then select the top 1 CODESCHOOL in DESC order.
Select NAME,
SCHOOLNAME,
CODESCHOOL
From (
Select NAME,
SCHOOLNAME,
CODESCHOOL,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY CODESCHOOL DESC) as rn
from myTable)
Where rn = 1;

linq group by First()

I have the following list
ID Counter SrvID FirstName
-- ------ ----- ---------
1 34 66M James
5 34 66M Keith
3 55 45Q Jason
2 45 75W Mike
4 33 77U Will
What I like to do is to order by ID by ascending and then get the first value of Counter, SrvID which are identical (if any).
So the output would be something like:
ID Counter SrvID FirstName
-- ------ ----- ---------
1 34 66M James
2 45 75W Mike
3 55 45Q Jason
4 33 77U Will
Note how ID of 5 is removed from the list as Counter and SrvID was identical to what I had for ID 1 but as ID 1 came first I removed 5.
I tried the following but not working:
var query = from record in list1
group record by new {record.Counter, record.SrvID }
into g
let winner = (from groupedItem in g
order by groupedItem.ID
select groupedItem ).First()
select winner;
I get the followng message:
The method 'First' can only be used as a final query operation.
The funny thing is the full error message is:
"NotSupportedException: The method 'First' can only be used as a final query operation. Consider using the method 'FirstOrDefault' in this instance instead."
I have had a problem with using First in Entity Framework, have you tried changing to FirstOrDefault ?

How do I get the next record in table with linq?

I have a table Orders like this:
ID
CustomerID
name
ID CustomerID name
1 4 aa
5 6 bbb
4 9 ccc
8 10 ddd
first ordering the table,then get the next row..... How to do?
if current row id is 4 ,i want to get the row where id==5
I think you want this:
Orders.OrderBy(x=>x.ID).Skip(1).Take(1)
Edit: If I understand your question now:
Orders.OrderBy(x=>x.ID).Where(x=>x.ID>4).FirstOrDefault();

Resources