Combine multiple rows that share the same ID - azure-databricks

I'm trying to combine the comment rows that share the same ID, in SEQUENCE order, separated by '/ '. The data I need is in different databases and tables. I've tried GROUP BY, UNION, and AGGREGATE with no success.
Here is example SQL code:
SELECT
F.ID,
F.FRUIT,
C.SEQUENCE,
C.COMMENT
FROM FRUIT F
LEFT JOIN COMMENT C
ON F.ID = C.ID
Current result:
ID | FRUIT  | SEQUENCE | COMMENT
1  | APPLE  | 1        | COMMENT1
1  | APPLE  | 2        | COMMENT2
1  | APPLE  | 3        | COMMENT3
2  | BANANA | 1        | COMMENT1
2  | BANANA | 2        | COMMENT2
3  | KIWI   | 1        | COMMENT1
Desired result:
ID | FRUIT  | COMMENT
1  | APPLE  | COMMENT1/ COMMENT2/ COMMENT3
2  | BANANA | COMMENT1/ COMMENT2
3  | KIWI   | COMMENT1
Thanks for the help - J

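The rough equivalent in Databricks (Spark SQL) is collect_list plus array_join. Note that collect_list and collect_set do not preserve order, so collect each comment together with its SEQUENCE, sort, and then keep only the comment text:
SELECT
F.ID,
F.FRUIT,
array_join(
transform(array_sort(collect_list(struct(C.SEQUENCE, C.COMMENT))), x -> x.COMMENT),
'/ ') AS COMMENT
FROM FRUIT F
LEFT JOIN COMMENT C
ON F.ID = C.ID
GROUP BY F.ID, F.FRUIT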
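To see what this aggregation does outside Spark, here is a minimal pure-Python sketch, assuming both tables fit in memory as lists of tuples (the sample rows from the question, with the comment rows deliberately shuffled). It left-joins comments to fruits, sorts each group by SEQUENCE, and joins the comment texts with '/ '. The helper name `combine_comments` is hypothetical.

```python
from collections import defaultdict

# FRUIT(ID, FRUIT) and COMMENT(ID, SEQUENCE, COMMENT) from the question.
FRUITS = [(1, "APPLE"), (2, "BANANA"), (3, "KIWI")]
COMMENTS = [
    (1, 2, "COMMENT2"),  # listed out of order on purpose: the sort must fix it
    (1, 1, "COMMENT1"),
    (1, 3, "COMMENT3"),
    (2, 1, "COMMENT1"),
    (2, 2, "COMMENT2"),
    (3, 1, "COMMENT1"),
]


def combine_comments(fruits, comments, sep="/ "):
    """Hypothetical helper: one row per fruit, comments joined in SEQUENCE order."""
    by_id = defaultdict(list)
    for fruit_id, seq, text in comments:
        by_id[fruit_id].append((seq, text))
    result = []
    for fruit_id, name in fruits:  # LEFT JOIN: a fruit with no comments gets ""
        ordered = sorted(by_id.get(fruit_id, []))  # sort by (SEQUENCE, COMMENT)
        result.append((fruit_id, name, sep.join(text for _, text in ordered)))
    return result


for row in combine_comments(FRUITS, COMMENTS):
    print(row)
```

Sorting the (SEQUENCE, COMMENT) pairs before joining is the same idea as calling array_sort before array_join: the aggregate on its own does not promise any order.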
Alternatively, in SQL Server 2017 and later, STRING_AGG() does this directly:
https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16
Test: http://sqlfiddle.com/#!18/9757e/3

Related

Sort Column header based on row value, and show as columns

We have a sheet with names in one row and their values in the cells below.
We'd like a list of the names with each value next to it, sorted by value, highest first.
Example:
  | A    | B    | C   | D     | E
1 | John | Mary | Tom | Grace |
2 | 3    | 4    | 5   | 2     |
We would like the same data rearranged below, like this:
  | A     | B
1 | Tom   | 5
2 | Mary  | 4
3 | John  | 3
4 | Grace | 2
Any ideas? Thanks
use:
=SORT(TRANSPOSE(A1:D2), 2, )
or:
=SORT(TRANSPOSE({A1:D1; A4:D4}), 2, )
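Conceptually, SORT(TRANSPOSE(A1:D2), 2, ) turns the two rows into name/value pairs and sorts them on the second column, highest first (the answer relies on the empty third argument for the descending order). A minimal Python analogue, assuming the names and values from the example have been copied into two lists; the helper name `transpose_and_sort` is hypothetical:

```python
# Rows 1 and 2 of the example sheet: names, then their values.
names = ["John", "Mary", "Tom", "Grace"]
values = [3, 4, 5, 2]


def transpose_and_sort(header_row, value_row):
    """Rough analogue of =SORT(TRANSPOSE(A1:D2), 2, FALSE)."""
    pairs = list(zip(header_row, value_row))  # TRANSPOSE: each column becomes a row
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)  # sort by column 2, descending


for rank, (name, value) in enumerate(transpose_and_sort(names, values), start=1):
    print(rank, name, value)
```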
SUGGESTION
Perhaps you can try this way:
=QUERY(TRANSPOSE(A1:D2),"SELECT * order by Col2 DESC")
Reference: TRANSPOSE, QUERY

PIG: Filter hive table by previous table result

I need to query one Hive table and filter a second table using one column from the first query's result.
Example:
A = LOAD 'db.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
filterA = filter A by (id=='123');
B = LOAD 'db.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- the problem is here: filterA has many rows, and I need to apply the filter for each of them.
filterB = filter B by (id==filterA.id);
Data in A:
tabid | id | dept | location
1     | 1  | IS   | SJ
2     | 4  | CS   | SF
3     | 5  | EC   | MD
Data in B:
tabid | id | name | address
1     | 4  | john | 123 S AVE
2     | 5  | jane | 456 N BLVD
3     | 9  | nick | 789 GREAT LAKE DR
Expected result:
tabid | id | name | address
1     | 4  | john | 123 S AVE
2     | 5  | jane | 456 N BLVD
As asked in the comment, it sounds like what you're looking for is a join. Sorry if I misunderstood your question.
A = LOAD 'db.table1' USING org.apache.hive.hcatalog.pig.HCatLoader();
B = LOAD 'db.table2' USING org.apache.hive.hcatalog.pig.HCatLoader();
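C = JOIN A by id, B by id;
-- keep only B's columns, as in the expected result
result = FOREACH C GENERATE B::tabid, B::id, B::name, B::address;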
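An inner join on id keeps exactly the B rows whose id also appears in A. This minimal Python sketch illustrates those semantics on the sample data from the question (it is not Pig code, and the helper name `filter_by_ids` is hypothetical):

```python
# Sample data from the question.
# A rows: (tabid, id, dept, location); B rows: (tabid, id, name, address).
A = [(1, 1, "IS", "SJ"), (2, 4, "CS", "SF"), (3, 5, "EC", "MD")]
B = [
    (1, 4, "john", "123 S AVE"),
    (2, 5, "jane", "456 N BLVD"),
    (3, 9, "nick", "789 GREAT LAKE DR"),
]


def filter_by_ids(left, right):
    """Keep rows of `right` whose id (index 1) appears in `left` (a semi-join)."""
    wanted_ids = {row[1] for row in left}  # collect the ids from the first table once
    return [row for row in right if row[1] in wanted_ids]


for row in filter_by_ids(A, B):
    print(row)
```

Strictly speaking this is a semi-join: a B row is emitted at most once even if its id appears several times in A, whereas a JOIN would repeat it once per matching A row.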

Oracle Query Prevent Displayed Duplicate Record

Let's say I have a table structure like this:
ID | Name   | SCHOOLNAME       | CODESCHOOL
1  | DARK   | Kindergarten 123 | 1
2  | DARK   | Kindergarten 111 | 1
3  | Knight | NY University    | 3
4  | Knight | LA Senior HS     | 2
5  | JOHN   | HARVARD          | 3
How can I display the data above like this:
ID | Name   | SCHOOLNAME       | CODESCHOOL
1  | DARK   | Kindergarten 123 | 1
3  | Knight | NY University    | 3
5  | JOHN   | HARVARD          | 3
My goal is to display the row with the maximum CODESCHOOL for each name, but when I tried the query below:
SELECT NAME, SCHOOLNAME, MAX(CODESCHOOL) FROM TABLE GROUP BY NAME, SCHOOLNAME
the result looks like this:
ID | Name   | SCHOOLNAME       | CODESCHOOL
1  | DARK   | Kindergarten 123 | 1
2  | DARK   | Kindergarten 111 | 1
3  | Knight | NY University    | 3
4  | Knight | LA Senior HS     | 2
5  | JOHN   | HARVARD          | 3
Maybe it's caused by the GROUP BY on SCHOOLNAME: when I don't select SCHOOLNAME, the data is displayed as expected, but I need the SCHOOLNAME field as a search condition in my query.
Hope you can help me out with this problem; any help will be appreciated.
Thanks
With a self-join you can write a working "max row per group" query.
Essentially, you join the table to itself and keep only the rows for which no other row with the same NAME has a higher CODESCHOOL.
I've also added a :SCHOOLNAME parameter because you wanted to search by SCHOOLNAME.
Example:
SELECT
A.*
FROM
TABLE1 A
LEFT OUTER JOIN TABLE1 B ON B.NAME = A.NAME
AND B.CODESCHOOL > A.CODESCHOOL
WHERE
B.CODESCHOOL IS NULL AND
(
(A.SCHOOLNAME = :SCHOOLNAME AND :SCHOOLNAME IS NOT NULL) OR
(:SCHOOLNAME IS NULL)
);
This should produce the output below. Note that DARK appears twice because it has two rows with the same CODESCHOOL, which is the maximum for the name DARK.
ID|NAME  |SCHOOLNAME      |CODESCHOOL
--|------|----------------|----------
3|Knight|NY University   | 3
5|JOHN  |HARVARD         | 3
2|DARK  |Kindergarten 111| 1
1|DARK  |Kindergarten 123| 1
It's not the most efficient query, but it should be more than good enough as a starting point.
Sidenote: I've been blatantly stealing this logic for a while from https://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-in-sql-without-subqueries/
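To try the self-join "max row per group" idea without an Oracle instance, here is a sketch that runs it against an in-memory SQLite copy of the sample data through Python's standard sqlite3 module. SQLite is only a stand-in for Oracle here; TABLE1 comes from the answer, while the lowercase :schoolname parameter and the helper `max_rows_per_name` are hypothetical. The join condition keeps a row A only when no row B with the same NAME has a larger CODESCHOOL.

```python
import sqlite3

# In-memory copy of the sample table from the question.
ROWS = [
    (1, "DARK", "Kindergarten 123", 1),
    (2, "DARK", "Kindergarten 111", 1),
    (3, "Knight", "NY University", 3),
    (4, "Knight", "LA Senior HS", 2),
    (5, "JOHN", "HARVARD", 3),
]

# A survives only if the LEFT JOIN finds no B with the same NAME and a larger CODESCHOOL.
MAX_PER_NAME = """
SELECT A.ID, A.NAME, A.SCHOOLNAME, A.CODESCHOOL
FROM TABLE1 A
LEFT OUTER JOIN TABLE1 B
  ON B.NAME = A.NAME AND B.CODESCHOOL > A.CODESCHOOL
WHERE B.CODESCHOOL IS NULL
  AND (:schoolname IS NULL OR A.SCHOOLNAME = :schoolname)
ORDER BY A.ID
"""


def max_rows_per_name(schoolname=None):
    """Hypothetical helper: run the self-join, optionally filtered by SCHOOLNAME."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE TABLE1 (ID INTEGER, NAME TEXT, SCHOOLNAME TEXT, CODESCHOOL INTEGER)"
        )
        conn.executemany("INSERT INTO TABLE1 VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(MAX_PER_NAME, {"schoolname": schoolname}).fetchall()
    finally:
        conn.close()


print(max_rows_per_name())
print(max_rows_per_name("HARVARD"))
```

The SCHOOLNAME filter is applied after the per-name maximum is found, so searching for a school that is not some name's maximum (for example "LA Senior HS") returns no rows.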
I am using the analytic window function ROW_NUMBER().
This partitions the rows by NAME, orders each partition by CODESCHOOL descending (ties broken by ID), and keeps only the first row of each partition.
Select ID,
NAME,
SCHOOLNAME,
CODESCHOOL
From (
Select ID,
NAME,
SCHOOLNAME,
CODESCHOOL,
ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY CODESCHOOL DESC, ID) as rn
from myTable)
Where rn = 1;
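The ROW_NUMBER() approach can be mimicked in plain Python: sort by the partition key and the window's ordering, then take the first row of each group. This sketch also breaks ties on ID (an assumption, so that DARK resolves to ID 1 as in the desired output); without a tie-breaker, which DARK row gets rn = 1 is not guaranteed. The helper name `first_per_name` is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (ID, NAME, SCHOOLNAME, CODESCHOOL).
ROWS = [
    (1, "DARK", "Kindergarten 123", 1),
    (2, "DARK", "Kindergarten 111", 1),
    (3, "Knight", "NY University", 3),
    (4, "Knight", "LA Senior HS", 2),
    (5, "JOHN", "HARVARD", 3),
]


def first_per_name(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY CODESCHOOL DESC, ID) ... rn = 1."""
    # Sort by the partition key, then CODESCHOOL descending, then ID ascending.
    ordered = sorted(rows, key=lambda r: (r[1], -r[3], r[0]))
    # groupby needs input sorted by NAME; next() takes the rn = 1 row of each partition.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[1])]


for row in first_per_name(ROWS):
    print(row)
```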

Combine Two IEnumerable Objects with ID

It must be that time of year. Totally having a brain fart.
I have two basic IEnumerable objects, each with two fields. In the first, each element has an ID and a total.
Id Total
1 23
2 16
3 59
...
In the other, each element has an ID field and a fruit name.
ID Fruit
1 Apple
2 Orange
3 Pear
I need to combine these by ID into a new table, so that I get new objects with the fields
ID Total Fruit
1 23 Apple
2 16 Orange
3 59 Pear
What's the best way to go about this using LINQ?
Do a join:
var combined = from o in iEobject
join f in Fruit on o.ID equals f.ID
select new { ID = o.ID, Total = o.Total, Fruit = f.Name };
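LINQ's join clause is an inner equi-join: outer elements without a match are dropped, and the order of the outer sequence is preserved. A minimal Python sketch of the same behaviour, assuming the two collections are lists of (ID, Total) and (ID, Fruit) pairs; the helper name `join_by_id` is hypothetical:

```python
# The two collections from the question.
totals = [(1, 23), (2, 16), (3, 59)]
fruits = [(1, "Apple"), (2, "Orange"), (3, "Pear")]


def join_by_id(outer, inner):
    """Inner equi-join on the first field, like `join f in Fruit on o.ID equals f.ID`."""
    lookup = {}
    for key, name in inner:
        lookup.setdefault(key, []).append(name)  # index inner rows by their key
    return [
        (key, total, name)
        for key, total in outer  # keeps the outer sequence's order
        for name in lookup.get(key, [])  # outer rows with no match produce nothing
    ]


for row in join_by_id(totals, fruits):
    print(row)
```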

displaying the top 3 rows

In the school assignment I'm working on, I need to display the 3 criminals with the most crimes, but I'm having a few problems.
Here's the code I have so far, and its output:
Select Last, First, Count(Crime_ID)
From Criminals Natural Join crimes
Group by Last, First, Criminal_ID
order by Count(Crime_Id) Desc
LAST            FIRST      COUNT(CRIME_ID)
--------------- ---------- ---------------
Panner          Lee        2
Sums            Tammy      1
Statin          Penny      1
Dabber          Pat        1
Mansville       Nancy      1
Cat             Tommy      1
Phelps          Sam        1
Caulk           Dave       1
Simon           Tim        1
Pints           Reed       1
Perry           Cart       1
11 rows selected
I've been toying around with ROWNUM, but when I include it in the SELECT, the query won't run because of my GROUP BY. If I put ROWNUM in the GROUP BY, it just splits everything back out.
I just want to display the top 3 with the most crimes, which is odd because only one person has more than one crime. In theory more criminals would be added to the database, but these are the tables given in the assignment.
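Because ROWNUM is assigned to rows before ORDER BY is applied, do the grouping and sorting in a subquery and filter on ROWNUM in the outer query:
select *
from
( Select Last, First, Count(Crime_ID) as Crime_Count
From Criminals Natural Join crimes
Group by Last, First, Criminal_ID
order by Count(Crime_Id) Desc )
where ROWNUM <= 3;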
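The key point is the order of operations: group and count, sort by the count, and only then limit to three rows. The same sequence in a small Python sketch, using made-up crime records (hypothetical, not the assignment's tables) and grouping on (last, first) only, whereas the SQL also groups on Criminal_ID; the helper name `top_criminals` is hypothetical:

```python
from collections import Counter

# Hypothetical crime records, one (last, first) entry per crime.
crimes = [
    ("Panner", "Lee"),
    ("Sums", "Tammy"),
    ("Panner", "Lee"),
    ("Statin", "Penny"),
    ("Dabber", "Pat"),
]


def top_criminals(records, n=3):
    """Count crimes per person, sort by count descending, then keep the first n."""
    counts = Counter(records)  # like GROUP BY last, first with a COUNT
    return counts.most_common(n)  # like ORDER BY count DESC, then keeping n rows


for (last, first), count in top_criminals(crimes):
    print(last, first, count)
```

Ties at the cutoff are a choice: Counter.most_common keeps equal counts in first-seen order, while the SQL version gives no guaranteed order among tied rows.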
