How to flatten the queried data - oracle

I am currently using the query below to pull the data, which returns 4 rows for the same sample record. I would like to have it flattened into 1 row per sample. The query results are attached for information; any help is much appreciated.
select s.name as CRF,
       a.name as Aliquot_Name,
       a.aliquot_type,
       au.u_step_yield as Step_Yield,
       au.u_step_concentration as Step_Concentration,
       au.u_pooled_plasma_volume as Pooled_Plasma_volume
from aliquot a
join aliquot_user au on a.aliquot_id = au.aliquot_id
join sample s on s.sample_id = a.SAMPLE_ID
where a.aliquot_type in ('DNA Extracted', 'Library', 'Target Enrichment', 'DNA Plasma')
order by s.name desc, a.aliquot_type, a.name, au.u_step_yield, au.u_step_concentration, au.u_pooled_plasma_volume;
CRF        ALIQUOT_NAME  ALIQUOT_TYPE       STEP_YIELD  STEP_CONCENTRATION  POOLED_PLASMA_VOLUME
CRF007650  PE-0046758    DNA Plasma                                         10
CRF007650  LCNL-47275    Library            2,178       36
CRF007650  HCNLS-47467   Target Enrichment  105         2
CRF007649  1146667362    DNA Extracted      451         6
CRF007649  PE-0046774    DNA Plasma                                         10
CRF007649  LCNL-47291    Library            3,543       59
CRF007649  HCNLS-47483   Target Enrichment  132         2
CRF007648  1146668498    DNA Extracted      166         2
CRF007648  PE-0046755    DNA Plasma                                         9
CRF007648  LCNL-47272    Library            3,881       65
CRF007648  HCNLS-47463   Target Enrichment  381         6
CRF007647  1146635220    DNA Extracted      29          0
CRF007647  PE-0046764    DNA Plasma                                         8
CRF007647  LCNL-47281    Library            1,274       21
CRF007647  HCNLS-47473   Target Enrichment  57          1
CRF007646  1146736347    DNA Extracted      67          1

I think you need to be more specific.
There is no information about the tables, such as which columns are primary keys.
All I can say for now is that you would have to join the same table to itself if you want to flatten the rows.
If you want an answer with a working query, add your table definitions to the question so that others can help.

As far as I understand your data, you have 4 entries per sample in table aliquot (alias a), one for each a.aliquot_type ('DNA Extracted', 'Library', 'Target Enrichment', 'DNA Plasma'), and you want 4 columns with the corresponding Aliquot_Name (one for 'DNA Extracted', and so on).
You could use 4 columns, each a scalar subselect that reads the corresponding row from aliquot. To do that you have to drop the join
a.aliquot_id = au.aliquot_id
from the main query. For example:
select s.name as CRF, (select a.name from aliquot a where a.aliquot_type = 'DNA Extracted' and ....) col1, (select a.name from aliquot a where a.aliquot_type = 'Library' and ....) col2, ...
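The subselect idea can be tried out end to end with a small sketch. This one runs the SQL through Python's built-in sqlite3 module as a stand-in for Oracle. The correlation condition a.sample_id = s.sample_id is an assumption taken from the join in the question, and the table layout, the ids, and the placement of the numeric values are guesses based on the posted output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table layout, inferred from the columns used in the question.
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE aliquot (aliquot_id INTEGER PRIMARY KEY, sample_id INTEGER,
                      name TEXT, aliquot_type TEXT);
CREATE TABLE aliquot_user (aliquot_id INTEGER, u_step_yield INTEGER,
                           u_step_concentration INTEGER,
                           u_pooled_plasma_volume INTEGER);
INSERT INTO sample VALUES (1, 'CRF007649'), (2, 'CRF007650');
INSERT INTO aliquot VALUES
  (1, 1, '1146667362',  'DNA Extracted'),
  (2, 1, 'PE-0046774',  'DNA Plasma'),
  (3, 1, 'LCNL-47291',  'Library'),
  (4, 1, 'HCNLS-47483', 'Target Enrichment'),
  (5, 2, 'PE-0046758',  'DNA Plasma'),
  (6, 2, 'LCNL-47275',  'Library'),
  (7, 2, 'HCNLS-47467', 'Target Enrichment');
INSERT INTO aliquot_user VALUES
  (1, 451, 6, NULL), (2, NULL, NULL, 10), (3, 3543, 59, NULL), (4, 132, 2, NULL),
  (5, NULL, NULL, 10), (6, 2178, 36, NULL), (7, 105, 2, NULL);
""")

# One scalar subselect per aliquot type, correlated on the sample.
flatten_sql = """
SELECT s.name AS crf,
       (SELECT a.name FROM aliquot a
         WHERE a.sample_id = s.sample_id
           AND a.aliquot_type = 'DNA Extracted') AS dna_extracted,
       (SELECT a.name FROM aliquot a
         WHERE a.sample_id = s.sample_id
           AND a.aliquot_type = 'DNA Plasma') AS dna_plasma,
       (SELECT a.name FROM aliquot a
         WHERE a.sample_id = s.sample_id
           AND a.aliquot_type = 'Library') AS library,
       (SELECT a.name FROM aliquot a
         WHERE a.sample_id = s.sample_id
           AND a.aliquot_type = 'Target Enrichment') AS target_enrichment,
       (SELECT au.u_step_yield
          FROM aliquot a
          JOIN aliquot_user au ON au.aliquot_id = a.aliquot_id
         WHERE a.sample_id = s.sample_id
           AND a.aliquot_type = 'Library') AS library_yield
FROM sample s
ORDER BY s.name DESC
"""
flat_rows = conn.execute(flatten_sql).fetchall()
for row in flat_rows:
    print(row)  # one tuple per sample
```

A sample that lacks an aliquot type simply gets NULL in that column. In Oracle, a scalar subselect raises an error if a sample has more than one aliquot of the same type, so this only fits data with at most one aliquot per type and sample.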

Related

Select single random sample from group by in Hive

I have a table that looks like so:
Name  Age  Num_Hobbies  Num_Shoes
Jane  31   10           2
Bob   23   3            4
Jane  60   2            200
Jane  31   100          6
Bob   10   8            7
etc.
I would like to group this table by Name and Age, and pick one row at random from each group for the remaining columns.
In pandas, I would do the following:
df.groupby(['Name', 'Age']).apply(lambda x: x.sample(n=1))
In Hive, I know how to create the groups, but not how to choose a single random sample from each group.
I saw this question on Stack Overflow: How to sample for each group in hive?
However, I do not understand how to apply dynamic partitions or Hive bucketing to select a single sample from a group.
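You can use row_number() (or rank()) as a window function ordered by rand(), then keep the first row of each group. Note the over keyword; t.* keeps the remaining columns, and my_table stands for your table name:
select * from
(
select t.*, row_number() over (partition by name, age order by rand()) as rn
from my_table t
) s
where rn = 1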
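To see the pattern run, here is a minimal sketch that executes the same kind of query against the sample table from the question, using Python's built-in sqlite3 module as a stand-in for Hive. It assumes an SQLite build with window functions (3.25 or later); the table name people is made up, and SQLite spells the random function RANDOM() rather than rand().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table name; the rows are the ones shown in the question.
CREATE TABLE people (name TEXT, age INTEGER, num_hobbies INTEGER, num_shoes INTEGER);
INSERT INTO people VALUES
  ('Jane', 31, 10, 2),
  ('Bob', 23, 3, 4),
  ('Jane', 60, 2, 200),
  ('Jane', 31, 100, 6),
  ('Bob', 10, 8, 7);
""")

# Number the rows of each (name, age) group in random order, keep row 1.
sample_sql = """
SELECT name, age, num_hobbies, num_shoes
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY name, age ORDER BY RANDOM()) AS rn
    FROM people p
) ranked
WHERE rn = 1
ORDER BY name, age
"""
sampled = conn.execute(sample_sql).fetchall()
for row in sampled:
    print(row)  # one randomly chosen row per (name, age) group
```

The only group with more than one row here is (Jane, 31), so repeated runs differ only in which of its two rows comes back.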

What is the most efficient way to update values of a table based on a mapping from another table

I have a table with the following details.
empID department location segment
1 23 55 12
2 23 11 12
3 25 11 39
I also have a mapping table like the following:
field       old_value  new_value
Department  23         74
department  25         75
segment     10         24
location    11         22
My task is to replace the old values with the new values. I could use a cursor and update departments first, then segments, and so on, but that is time-consuming and inefficient. I would like to know if there is a more efficient way to do this. It also needs to keep working if we add more columns to the mapping in the future.
cheers.
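Check whether this solves the issue. The subquery is restricted to the matching field, and the WHERE EXISTS clause keeps rows without a mapping from being set to NULL:
update emp set department = (select map.new_value from map where upper(map.field) = 'DEPARTMENT' and emp.department = map.old_value)
where exists (select 1 from map where upper(map.field) = 'DEPARTMENT' and emp.department = map.old_value);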
How about copying the data to a new table?
CREATE TABLE newemp AS
SELECT e.empid,
       NVL(d.new_value, e.department) AS department,
       NVL(l.new_value, e.location) AS location,
       NVL(s.new_value, e.segment) AS segment
FROM emp e
LEFT JOIN map d ON UPPER(d.field) = 'DEPARTMENT' AND e.department = d.old_value
LEFT JOIN map l ON UPPER(l.field) = 'LOCATION' AND e.location = l.old_value
LEFT JOIN map s ON UPPER(s.field) = 'SEGMENT' AND e.segment = s.old_value
ORDER BY e.empid;
With the sample data from the question, newemp then contains:
EMPID  DEPARTMENT  LOCATION  SEGMENT
1      74          55        12
2      74          22        12
3      75          22        39
You'll obviously need three passes through the mapping table, one per mapped column, but only one pass through the emp table.
We use a LEFT JOIN because not all values will be changed. If no new_value is found, the NVL function uses the existing value of the emp table.
You could update the original table from this new table (if the new table has a primary key):
UPDATE (SELECT empid,
e.department as old_department,
n.department as new_department,
e.location as old_location,
n.location as new_location,
e.segment as old_segment,
n.segment as new_segment
FROM emp e
JOIN newemp n USING (empid))
SET old_department = new_department,
old_location = new_location,
old_segment = new_segment
WHERE old_department != new_department
OR old_location != new_location
OR old_segment != new_segment;
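The LEFT JOIN plus NVL idea can be checked quickly with a small sketch. It runs through Python's built-in sqlite3 module rather than Oracle, so COALESCE stands in for NVL. The column names field, old_value and new_value are assumed spellings of the mapping columns in the question, and UPPER() is there only because the sample mapping rows mix upper and lower case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (empid INTEGER PRIMARY KEY, department INTEGER,
                  location INTEGER, segment INTEGER);
INSERT INTO emp VALUES (1, 23, 55, 12), (2, 23, 11, 12), (3, 25, 11, 39);

-- Assumed column names for the mapping table from the question.
CREATE TABLE map (field TEXT, old_value INTEGER, new_value INTEGER);
INSERT INTO map VALUES
  ('Department', 23, 74),
  ('department', 25, 75),
  ('segment',    10, 24),
  ('location',   11, 22);
""")

# One LEFT JOIN per mapped column; each join compares against its own alias.
remap_sql = """
SELECT e.empid,
       COALESCE(d.new_value, e.department) AS department,
       COALESCE(l.new_value, e.location)   AS location,
       COALESCE(s.new_value, e.segment)    AS segment
FROM emp e
LEFT JOIN map d ON UPPER(d.field) = 'DEPARTMENT' AND e.department = d.old_value
LEFT JOIN map l ON UPPER(l.field) = 'LOCATION'   AND e.location   = l.old_value
LEFT JOIN map s ON UPPER(s.field) = 'SEGMENT'    AND e.segment    = s.old_value
ORDER BY e.empid
"""
remapped = conn.execute(remap_sql).fetchall()
for row in remapped:
    print(row)
```

Values without a mapping row (location 55, segments 12 and 39) pass through unchanged, which is the point of the outer join. Supporting a new mapped column means adding one more join and one more COALESCE.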

update rows from multiple tables

I have two tables, affiliation and customer, with data like this:
aff_id From_cus_id
------ -----------
1 10
2 20
3 30
4 40
5 50
cust_id cust_aff_id
------- -------
10
20
30
40
50
I need to fill the cust_aff_id column in customer with the aff_id from the affiliation table, like below:
cust_id cust_aff_id
------- -------
10 1
20 2
30 3
40 4
50 5
Could anyone who knows how to do this please reply?
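Oracle doesn't support the UPDATE ... JOIN syntax that some other databases offer, but you can use a correlated subquery instead:
UPDATE customer
SET customer.cust_aff_id =
  (SELECT aff_id FROM affiliation WHERE From_cus_id = customer.cust_id);
Note that customers without a matching affiliation row are set to NULL by this statement.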
Alternatively, use MERGE:
merge into customer t2
using affiliation t1 on (t1.From_cus_id = t2.cust_id)
when matched then
update set t2.cust_aff_id = t1.aff_id;
Here is an update with join syntax. This, quite reasonably, works only if from_cus_id is primary key in the first table and cust_id is foreign key in the second table, referencing the first table. Without these conditions, the requirement doesn't make much sense in the first place anyway... but Oracle requires that these constraints be stated explicitly in the tables. This is also reasonable on Oracle's part IMO.
update
( select t1.aff_id, t2.cust_aff_id
from affiliation t1 join customer t2 on t2.cust_id = t1.from_cus_id) j
set j.cust_aff_id = j.aff_id;
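For anyone who wants to try the correlated-subquery variant without an Oracle instance, here is a rough sketch using Python's built-in sqlite3 module. It adds a WHERE EXISTS guard so that customers with no affiliation keep their current value. Customer 60 and its value 9 are made up for the illustration and are not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE affiliation (aff_id INTEGER, from_cus_id INTEGER PRIMARY KEY);
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_aff_id INTEGER);
INSERT INTO affiliation VALUES (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);
-- Customer 60 is hypothetical: it has no affiliation row and an existing value.
INSERT INTO customer VALUES
  (10, NULL), (20, NULL), (30, NULL), (40, NULL), (50, NULL), (60, 9);
""")

# Correlated subquery supplies the new value; EXISTS limits the update
# to customers that actually have a matching affiliation row.
conn.execute("""
UPDATE customer
SET cust_aff_id = (SELECT aff_id FROM affiliation
                   WHERE from_cus_id = customer.cust_id)
WHERE EXISTS (SELECT 1 FROM affiliation
              WHERE from_cus_id = customer.cust_id)
""")

updated = conn.execute(
    "SELECT cust_id, cust_aff_id FROM customer ORDER BY cust_id"
).fetchall()
for row in updated:
    print(row)
```

Without the EXISTS clause, customer 60 would end up with NULL, because the subquery returns no row for it.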

convert string of a column to multiple rows

For data like below
Col1
----
1
23
34
124
Output should be like below
Out
1
2
3
4
I tried the hierarchical query below, but it returns repeated data:
select substr(col1, level, 1)
from table1
connect by level <= length(col1);
I can't use DISTINCT, as this is only a sample; the main table where I have to use this query holds quite a lot of data.
Thanks

Hive: Joining two tables with different keys

I have two tables like the ones below. I want to join them and expect the result shown at the end.
The first 3 rows of table 2 do not have an activity ID; that field is empty.
All fields are tab-separated. Category "33" has three descriptions in table 2,
so we need to use "Activity ID" to get the right result for category "33".
Could anyone tell me how to achieve this output?
TABLE 1:
Empid  Category  ActivityID
44126  33        TRAIN
44127  10        UFL
44128  12        TOI
44129  33        UNASSIGNED
44130  15        MICROSOFT
44131  33        BENEFITS
44132  43        BENEFITS
TABLE 2:
Category  ActivityID  Categdesc
10                    billable
12                    billable
15                    Non-billable
33        TRAIN       Training
33        UNASSIGNED  Bench
33        BENEFITS    Benefits
43                    Benefits
Expected Output:
44126 33 Training
44127 10 Billable
44128 12 Billable
44129 33 Bench
44130 15 Non-billable
44131 33 Benefits
44132 43 Benefits
It's a little difficult to do this in Hive as there are many limitations. This is how I solved it, but there could be a better way.
I named your tables as below.
Table1 = EmpActivity
Table2 = ActivityMas
The challenge comes from the empty ActivityID fields in Table2. I created a view for those rows and used UNION ALL to combine the results of two distinct queries.
Create view ActView AS Select * from ActivityMas Where ActivityId = '';
SELECT * From (
Select EmpActivity.EmpId, EmpActivity.Category, ActivityMas.categdesc
from EmpActivity JOIN ActivityMas
ON EmpActivity.Category = ActivityMas.Category
AND EmpActivity.ActivityId = ActivityMas.ActivityId
UNION ALL
Select EmpActivity.EmpId, EmpActivity.Category, ActView.categdesc from EmpActivity
JOIN ActView ON EmpActivity.Category = ActView.Category
) u;
You have to wrap the UNION ALL in a top-level SELECT because UNION ALL is not directly supported as a top-level statement. This runs 3 MR jobs in total. Below is the result I got.
44127 10 billable
44128 12 billable
44130 15 Non-billable
44132 43 Benefits
44131 33 Benefits
44126 33 Training
44129 33 Bench
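The view plus UNION ALL approach can be reproduced on the sample data with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive. It assumes the missing ActivityID values are stored as empty strings and that the row for category 43 is one of them, which is how the result above reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmpActivity (EmpId INTEGER, Category INTEGER, ActivityId TEXT);
INSERT INTO EmpActivity VALUES
  (44126, 33, 'TRAIN'), (44127, 10, 'UFL'), (44128, 12, 'TOI'),
  (44129, 33, 'UNASSIGNED'), (44130, 15, 'MICROSOFT'),
  (44131, 33, 'BENEFITS'), (44132, 43, 'BENEFITS');

-- Assumption: a missing ActivityId is an empty string, including category 43.
CREATE TABLE ActivityMas (Category INTEGER, ActivityId TEXT, Categdesc TEXT);
INSERT INTO ActivityMas VALUES
  (10, '', 'billable'), (12, '', 'billable'), (15, '', 'Non-billable'),
  (33, 'TRAIN', 'Training'), (33, 'UNASSIGNED', 'Bench'),
  (33, 'BENEFITS', 'Benefits'), (43, '', 'Benefits');

-- Categories that are described without an activity id.
CREATE VIEW ActView AS SELECT * FROM ActivityMas WHERE ActivityId = '';
""")

# Branch 1 matches on category and activity id; branch 2 on category alone.
join_sql = """
SELECT * FROM (
    SELECT e.EmpId, e.Category, m.Categdesc
    FROM EmpActivity e
    JOIN ActivityMas m
      ON e.Category = m.Category AND e.ActivityId = m.ActivityId
    UNION ALL
    SELECT e.EmpId, e.Category, v.Categdesc
    FROM EmpActivity e
    JOIN ActView v ON e.Category = v.Category
) u
ORDER BY EmpId
"""
joined = conn.execute(join_sql).fetchall()
for row in joined:
    print(row)
```

Each employee is matched by exactly one of the two branches here, so the UNION ALL returns no duplicates. That would stop being true if a category had both an empty-ActivityID row and per-activity rows.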
I'm not sure if I understand your question or your data, but would this work?
select table1.empid, table1.category, table2.categdesc
from table1 join table2
on table1.activityID = table2.activityID;
