I have a table of data which looks like:
J2150 IMPAC-BRIGA (RH)
J2283 BAYWA-FERGU (NK)
J2284 BAYWA-DIAPR (NK)
J2320 BOSCH-OWNER (ML)
J2475 GIPPS-GIPWF (NK)
J2568 GWFLD-CASTL-002 (PW)
J2663 AUSTRA-BARHA-001 (NK)
J2690 PHOTO-NEWAT (KT)
J2692 TETRI-MANGA (NK)
I'm using a Google Sheets query, but I want to order the table by project manager, i.e. the initials at the end, e.g. (CM).
I've been trying to use 'ends with' and 'order by', but this doesn't do what I want, e.g.:
select Col2 where Col2 ends with '(CM)' group by Col2 order by Col2
I could separate out the initials into a new column in the original data, then select and sort on that, but is there an elegant way of sorting by the end of the row rather than the start?
Try the formula below:
Assuming your data range is A2:B (if not, change the data range accordingly):
=Query({A2:A,B2:B,Arrayformula(SPLIT(B2:B," "))},"Select Col1,Col2 where Col4 <>'' order by Col4")
=ArrayFormula(QUERY(REGEXEXTRACT(A:A,"^(.+?) (.+)$"),"select * where Col2 ends with '(NK)' order by Col2"))
I realised my original question wasn't precise enough: each row is one record, not separate entries:
J2150 IMPAC-BRIGA (RH)
J2283 BAYWA-FERGU (NK)
J2284 BAYWA-DIAPR (NK)
J2320 BOSCH-OWNER (ML)
J2475 GIPPS-GIPWF (NK)
J2568 GWFLD-CASTL-002 (PW)
J2663 AUSTRA-BARHA-001 (NK)
J2690 PHOTO-NEWAT (KT)
J2692 TETRI-MANGA (NK)
But I did want to sort on the two-letter suffix in the () at the end.
This is what I came up with:
query({'Project Overview'!$B$1:$B$373,
arrayformula(REGEXEXTRACT('Project Overview'!$B$1:$B$373,"\(([A-Z]{2})\)")),
'Project Overview'!$c$1:$d$373},
"select Col1, Col2, Col3, Col4 where Col3 is not null order by Col2
label Col1 'Project', Col2 'Project Manager'", 1)
This takes column B as the first output column, then uses REGEXEXTRACT on the same data to pull out the 2-letter code as the second column, which the query sorts on.
Project                   Project Manager   4-Oct-21   11-Oct-21
Kinley (CM)               CM                1
Kinley (CM)               CM                1
J1911 KINEL-KENT (CM)     CM                4          8
J2741 SIGNL-DARLP (DD)    DD                2
J2745 MANGO-FEASB (DD)    DD                1
J2754 CPENG-WANG (DD)     DD                16         8
J2754 CPENG-WANG (DD)     DD                20         16
J2754 CPENG-WANG (DD)     DD                4
DARETON O&M (JM)          JM                0.5        0.5
DARETON O&M (JM)          JM                2          4
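For anyone who wants to check the logic outside Sheets, here is a minimal Python sketch of the same idea: pull the two-letter code out of the trailing parentheses and sort on it. The function name and the sample subset are mine, and unlike the formula above the pattern is anchored to the end of the string, which is an assumption about where the initials always sit.

```python
import re

# A subset of the sample rows from the question.
rows = [
    "J2150 IMPAC-BRIGA (RH)",
    "J2283 BAYWA-FERGU (NK)",
    "J2320 BOSCH-OWNER (ML)",
    "J2568 GWFLD-CASTL-002 (PW)",
    "J2690 PHOTO-NEWAT (KT)",
]

# Two capital letters in parentheses at the very end of the text
# (assumption: the initials are always the last thing in the cell).
SUFFIX = re.compile(r"\(([A-Z]{2})\)\s*$")


def manager_initials(row):
    """Return the two-letter code, or '' when the row has no suffix."""
    match = SUFFIX.search(row)
    return match.group(1) if match else ""


# Sort by initials first, then by the full text to keep ties stable.
by_manager = sorted(rows, key=lambda r: (manager_initials(r), r))
```

Rows without a suffix sort first because their key is the empty string; filter them out beforehand if that is not wanted.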
Scenario/input:
COL1 COL2
100 ABC
101 PQR
100 ABC
100 OPQ
101 HDR
101 PQR
Expected OUTPUT:
COL1 COL2
100 ABC,OPQ
101 PQR,HDR
This is a classic string aggregation problem. Please follow the steps below.
Use a sorter (SRT) to order the data and remove duplicates: enable the distinct option and make sure col1 is the first sort key.
Use an expression transformation to concatenate COL2. Create 5 ports (the in_out_ prefix means input and output, in_ means input only, v_ means variable port, and out_ means output only):
in_out_col1
in_col2
v_col2 = iif( in_out_col1=v_prev_col1, v_col2||',' ||in_col2,in_col2)
v_prev_col1 = in_out_col1
out_col2=v_col2
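Create an aggregator (AGG) grouped by col1. Create a new output port, out_max_col2, and assign it the maximum of the concatenated value. Because each running value for a key is a prefix of the next one, the maximum is the complete list.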
in_out_col1
in_col2
out_max_col2= MAX(in_col2)
Connect in_out_col1 to col1 and out_max_col2 to col2 for your desired data.
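The three steps above can be traced in a short Python sketch. This is only an illustration of the logic, not of the tool itself: the function name is mine, and I am assuming the sorter keeps the first-seen order of col2 within each col1 (a sorter that also sorts on col2 would give HDR,PQR instead).

```python
def aggregate_strings(rows):
    """Sort + distinct, running concatenation, then MAX per key."""
    # Step 1: distinct rows, ordered by col1 (sorted() is stable, so the
    # first-seen order of col2 is kept inside each key).
    distinct = list(dict.fromkeys(rows))
    ordered = sorted(distinct, key=lambda r: r[0])

    # Step 2: running concatenation, restarting whenever col1 changes.
    running = []
    prev_key, acc = None, ""
    for key, value in ordered:
        acc = acc + "," + value if key == prev_key else value
        prev_key = key
        running.append((key, acc))

    # Step 3: MAX per key. Each running value is a prefix of the next,
    # so the string maximum is the complete list.
    result = {}
    for key, acc in running:
        result[key] = max(result.get(key, ""), acc)
    return result


sample = [(100, "ABC"), (101, "PQR"), (100, "ABC"),
          (100, "OPQ"), (101, "HDR"), (101, "PQR")]
aggregated = aggregate_strings(sample)
```

With the sample input from the question, aggregated maps 100 to "ABC,OPQ" and 101 to "PQR,HDR".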
I have this formula which works great: =ArrayFormula(QUERY(TRANSPOSE(TRIM(SPLIT(JOIN(";",'TAB'!I2:I),";"))&{"";""}),"select Col1, count(Col2) group by Col1 label count(Col2) ''",0))
It takes the values in a column, which are separated with ;, counts each unique entry, and outputs everything as a table.
Question: I would like to add a filter/condition so that it only counts the rows that have specific entries in another column, such as A or B or C, but not all values.
I have tried: =ArrayFormula(QUERY(TRANSPOSE(TRIM(SPLIT(JOIN(";",'TAB'!I2:I),";"))&{"";""})&'RAW-TODOS'!F2:F,"select Col1 * where 'TAB'!F2:F ='A'or 'B' or 'C', count(Col2) group by Col1 label count(Col2) ''",0))
but, probably for obvious reasons, it did not work. Please help me with this one.
Thank you in advance.
try:
=INDEX(QUERY(TRIM(FLATTEN(SPLIT(FILTER('RAW-TODOS'!B:B,
REGEXMATCH('RAW-TODOS'!A:A, "NEW|IN PROGRESS")), ";"))),
"select Col1,count(Col1)
where Col1 is not null
group by Col1
label count(Col1)''"))
Enter the criteria in J1, which the formula below references:
leave it blank to match all rows
for a single value, simply enter it, e.g. NEW
for multiple values, join them with |, e.g. NEW|IN PROGRESS
=ArrayFormula(QUERY(SPLIT(FLATTEN('RAW-TODOS'!A2:A&"♦"&TRIM(SPLIT('RAW-TODOS'!B2:B,";"))),"♦"),CONCATENATE("select Col2,count(Col2) where Col2 is not null",IF(J1="",," and Col1 matches '"&J1&"'")," group by Col2 label count(Col2) ''"),0))
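If it helps to see the logic spelled out, here is a rough Python sketch of what both formulas do: keep only the rows whose status matches the criteria, split the ;-separated cell, trim each piece, and count. The function name and the sample rows are made up for illustration, and I am assuming whole-string matching of the criteria, as with the 'matches' operator.

```python
import re
from collections import Counter


def count_entries(rows, criteria=""):
    """Count ;-separated entries, keeping only rows whose status matches.

    rows is a list of (status, entries) pairs; an empty criteria string
    matches every row, like leaving J1 blank.
    """
    counts = Counter()
    for status, entries in rows:
        if criteria and not re.fullmatch(criteria, status):
            continue
        for entry in entries.split(";"):
            entry = entry.strip()
            if entry:  # skip blanks left by stray separators
                counts[entry] += 1
    return dict(sorted(counts.items()))


# Hypothetical data: (status column, ;-separated column)
sample = [
    ("NEW", "a; b"),
    ("DONE", "a"),
    ("IN PROGRESS", "b;c"),
    ("NEW", "a"),
]
filtered = count_entries(sample, "NEW|IN PROGRESS")
everything = count_entries(sample)
```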
Here's the formula I currently use:
=query(IMPORTRANGE("XXXX","XXXXX!A:H"),
"select Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8
where Col1> date '"&TEXT(F1,"yyyy-mm-dd")&"' and Col3 = '"&B1&"' and Col4 = '"&D1&"'
order by Col1 desc",1)
The formula is working.
Col1 includes input dates. I retrieve only values that are after a date listed in F1.
Col3 and Col4 include some properties, which are selected in cells B1 and D1 respectively.
Col5 includes strings (client names). A client name can repeat on several rows.
I'd like to retrieve just the most recent row per client. Any ideas on how to do it?
And, to add more fun to the question, would the same idea work to retrieve the oldest row per client?
Here's a link to a demo sheet; details are in the "unique query" tab.
Another challenge would be to retrieve X rows per client, and not just the most recent one.
try:
=SORTN(QUERY(IMPORTRANGE("1LoHg53hzQvYtOLTcDwxLY8OrKVN4F7usX8YI41BtdWg", "Activity list!A:E"),
"where Col1 > date '"&TEXT(I2, "yyyy-mm-dd")&"'
and Col2 = '"&I3&"'
order by Col1 desc", 1), 99^99, 2, 4, 1)
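SORTN explained:
99^99: return all rows (no limit)
2: ties mode 2 ("merge mode"), which removes duplicate rows before returning the first n
4: judge duplicates on the 4th column, collapsing it into unique values
1: return the 4th column ascending (0 for descending)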
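As a cross-check of the idea (sort by date, then keep the first row seen for each client), here is a small Python sketch. It also covers the oldest-row and X-rows-per-client variants from the question. The function name, the column layout (date, client, note), and the sample rows are assumptions made for the example.

```python
from datetime import date


def rows_per_client(rows, n=1, newest_first=True):
    """Keep at most n rows per client, in date order.

    Each row is (date, client, note); the layout is hypothetical.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=newest_first)
    kept, seen = [], {}
    for row in ordered:
        client = row[1]
        if seen.get(client, 0) < n:
            kept.append(row)
            seen[client] = seen.get(client, 0) + 1
    return kept


sample = [
    (date(2021, 1, 5), "Acme", "call"),
    (date(2021, 2, 1), "Acme", "email"),
    (date(2021, 1, 20), "Bolt", "visit"),
    (date(2021, 3, 3), "Bolt", "call"),
    (date(2021, 1, 10), "Acme", "demo"),
]
latest = rows_per_client(sample)                      # most recent per client
oldest = rows_per_client(sample, newest_first=False)  # oldest per client
latest_two = rows_per_client(sample, n=2)             # X = 2 rows per client
```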
Solution
I based this on a SQL expression that would achieve the result, but unfortunately the Google Sheets QUERY language is not as expressive. That's why the resulting formula looks a bit confusing.
=query(
IMPORTRANGE("https://docs.google.com/spreadsheets/d/1LoHg53hzQvYtOLTcDwxLY8OrKVN4F7usX8YI41BtdWg/edit","Activity list!A:E"),
"select Col1, Col2, Col3, Col4, Col5
where Col1= date'"&
JOIN("' or Col1 = date '",
ARRAYFORMULA(TEXT(ARRAY_CONSTRAIN(
query(query(
"THE IMPORTED RANGE",
"select Col1,Col2,Col3,Col4,Col5
where Col1> date '"&TEXT(I2,"yyyy-mm-dd")&"' and Col2 = '"&I3&"'
order by Col1 desc",1),
"select MAX(Col1), Col4
group by Col4
order by MAX(Col1) desc
label MAX(Col1) ''", 0),
1000, 1),
"yyyy-MM-DD")
))&"'",1)
The queries, starting from the innermost one:
Filter the data with your criteria.
Get the most recent submission date per client by grouping on Col4.
Use ARRAY_CONSTRAIN to keep only the column with the dates.
Match those dates against the whole dataset to fetch the other column values.
The same approach works for the oldest submission: replace the MAX aggregate function with MIN.
Note: because rows are matched on the date alone, this is not suited to data with multiple submissions on the same day.
I think the easiest way to do this is a VLOOKUP into a QUERY(). Unfortunately, it involves using IMPORTRANGE() twice, but I still think it's more efficient than some other possible methods. You'll find it in A2 of the MK.Help tab on your sample sheet.
=ARRAYFORMULA(IFERROR(VLOOKUP(UNIQUE(query(IMPORTRANGE("1LoHg53hzQvYtOLTcDwxLY8OrKVN4F7usX8YI41BtdWg","Activity list!A:E"), "select Col4 where Col1> date '"&TEXT(I2,"yyyy-mm-dd")&"' and Col2 = '"&I3&"'
order by Col1 desc",1)),query(IMPORTRANGE("1LoHg53hzQvYtOLTcDwxLY8OrKVN4F7usX8YI41BtdWg","Activity list!A:E"), "select Col4,Col1,Col2,Col3,Col5 where Col1> date '"&TEXT(I2,"yyyy-mm-dd")&"' and Col2 = '"&I3&"'
order by Col1 desc",1),{2,3,4,1,5},0)))
I have a table with lots of cost columns for each key:
TableA
SK1 SK2 Col1 Col2 Col3..... Col50 Flg(Y/N)
1 2 10 20 30 ...... 500 Y
1 2 10 20 30 ...... 500 N
2 2 10 20 30 ...... 500 N
I need to aggregate (sum) all the values per key and, if any row in the group has the flag Y, add the result to a new table, TableB.
Here the (1,2) combination of (SK1,SK2) from TableA should be returned.
The query I have written selects all the columns as sums and groups by the keys.
We have lots of data, so this query is taking too long to run. Is there a way to rework it so that it runs faster?
select
Sk1,
Sk2,
nvl(sum(col3),0),
nvl(sum(col4),0),
.....
nvl(sum(col50),0)
from TableA
group by Sk1,
Sk2
I am using this as part of a larger query, in which many other calculations are performed on top of this.
Working out whether any of a grouped set of records contains a 'Y' would be as simple as ...
select ...
from ...
group by ...
having max(flg) = 'Y'
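To see why this works ('Y' sorts after 'N', so the maximum of the flag is 'Y' whenever at least one row in the group has it), here is a small self-contained check using Python's built-in sqlite3 module. It is SQLite rather than Oracle, so COALESCE stands in for NVL, and the table only has two of the cost columns; both are simplifications for the example.

```python
import sqlite3

# In-memory stand-in for TableA with just two cost columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_a "
    "(sk1 INTEGER, sk2 INTEGER, col1 INTEGER, col2 INTEGER, flg TEXT)"
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 10, 20, "Y"),
        (1, 2, 10, 20, "N"),
        (2, 2, 10, 20, "N"),
    ],
)

# Sum per key, keeping only the groups that contain at least one 'Y'.
flagged = conn.execute(
    """
    SELECT sk1, sk2, COALESCE(SUM(col1), 0), COALESCE(SUM(col2), 0)
    FROM table_a
    GROUP BY sk1, sk2
    HAVING MAX(flg) = 'Y'
    """
).fetchall()
conn.close()
```

Only the (1, 2) group comes back, with both of its rows summed; the (2, 2) group has no 'Y' and is dropped.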
For now I have created a temporary table and loaded all the data into it.
If you are using this as part of a larger query, have you tried the WITH clause?
It could look like this:
WITH SUM_DATA AS (select sk1, sk2, nvl(sum(col3),0) as col3, nvl(sum(col4),0) as col4, ..... nvl(sum(col50),0) as col50 from TableA group by sk1, sk2)
SELECT xyz
FROM abc, sum_data
WHERE abc.join_col = sum_data.join_col
I am receiving information in a CSV file from one department to compare with the same information in a different department and check for discrepancies (about three quarters of a million rows of data, with 44 columns in each row). Once the data is in a table, I have a program that takes the data and sends reports based on an HQ. I feel like the way I am going about this is not the most efficient. I am using Oracle for this comparison.
Here is what I have:
I have a vb.net program that parses the data and inserts it into an extract table
I run a procedure to do a full outer join on the two tables into a new table, with the fields from one department suffixed with '_c'
I run another procedure to compare the old/new data and update 2 different tables with detail and summary information. Here is code from inside the procedure:
DECLARE
CURSOR Cur_Comp IS SELECT * FROM T.AEC_CIS_COMP;
BEGIN
FOR compRow in Cur_Comp LOOP
--If service pipe exists in CIS but not in FM and the service pipe has status of retired in CIS, ignore the variance
If(compRow.pipe_num = '' AND cis_status_c = 'R')
continue
END IF
--If there is not a summary record for this HQ in the table for this run, create one
INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
SELECT compRow.HQ, to_date(sysdate, 'DD/MM/YYYY') from dual WHERE NOT EXISTS
(SELECT null FROM t.AEC_CIS_SUM WHERE HQ = compRow.HQ AND RUN_DATE = to_date(sysdate, 'DD/MM/YYYY'))
-- Check fields and update the tables accordingly
If (compRow.cis_loop <> compRow.cis_loop_c) Then
--Insert information into the details table
INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl,
DateTime, Changed_Field, CIS_Value, FM_Value)
VALUES(compRow.Fac_ID, compRow.Pipe_Num, compRow.Hq, compRow.Street_Num || ' ' || compRow.Street_Name,
'Y', sysdate, 'Cis_Loop', compRow.cis_loop, compRow.cis_loop_c);
-- Update information into the summary table
UPDATE AEC_CIS_SUM
SET cis_loop = cis_loop + 1
WHERE Hq = compRow.Hq
AND Run_Date = to_date(sysdate, 'DD/MM/YYYY')
End If;
END LOOP;
END;
Any suggestions of an easier way of doing this rather than an if statement for all 44 columns of the table? (This is run once a week if it matters)
Update: Just to clarify, there are 88 columns of data (44 pairs to compare, with one of each pair suffixed with _c). One table lists each differing field as its own row, so one source row can mean 30+ records written to that table. The other table keeps a tally of the number of discrepancies for each week.
First of all, I believe that your task can (and actually should) be implemented with straight SQL. No fancy cursors, no loops, just selects, inserts and updates. I would start by unpivoting your source data (it is not clear whether you have a primary key to join the two sets, but I guess you do):
Col0_PK Col1 Col2 Col3 Col4
----------------------------------------
Row1_val A B C D
Row2_val E F G H
Above is your source data. Using UNPIVOT clause we convert it to:
Col0_PK Col_Name Col_Value
------------------------------
Row1_val Col1 A
Row1_val Col2 B
Row1_val Col3 C
Row1_val Col4 D
Row2_val Col1 E
Row2_val Col2 F
Row2_val Col3 G
Row2_val Col4 H
I think you get the idea. Say we have table1 with one set of data and the identically structured table2 with the second set of data. It is a good idea to use index-organized tables.
Next step is comparing rows to each other and storing difference details. Something like:
insert into diff_details(some_service_info_columns_here)
select some_service_info_columns_here_along_with_data_difference
from table1 t1 inner join table2 t2
on t1.Col0_PK = t2.Col0_PK
and t1.Col_name = t2.Col_name
and nvl(t1.Col_value, 'Dummy1') <> nvl(t2.Col_value, 'Dummy2');
And on the last step we update difference summary table:
insert into diff_summary(summary_columns_here)
select diff_row_id, count(*) as diff_count
from diff_details
group by diff_row_id;
This is just a rough draft to show my approach; I'm sure there are many more details to take into account. To summarize, I suggest two things:
UNPIVOT data
Use SQL statements instead of cursors
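To make the unpivot-then-compare idea concrete without a database, here is a rough Python sketch of the same three steps: reshape each table to (key, column name) → value, compare the two long forms on the matching keys, and tally the differences per row. The function names and the second data set are invented for the example, and two missing values are treated as equal here, which is my assumption rather than something taken from the SQL above.

```python
from collections import Counter


def unpivot(table):
    """Turn {pk: {col: value}} into {(pk, col): value}."""
    return {(pk, col): val
            for pk, cols in table.items()
            for col, val in cols.items()}


def diff_details(table1, table2):
    """One (pk, column, value1, value2) record per differing field."""
    long1, long2 = unpivot(table1), unpivot(table2)
    shared = sorted(long1.keys() & long2.keys())  # inner join on (pk, col)
    return [(pk, col, long1[pk, col], long2[pk, col])
            for pk, col in shared
            if long1[pk, col] != long2[pk, col]]


def diff_summary(details):
    """Number of differing fields per primary key."""
    return dict(Counter(pk for pk, _col, _v1, _v2 in details))


table1 = {
    "Row1_val": {"Col1": "A", "Col2": "B", "Col3": "C", "Col4": "D"},
    "Row2_val": {"Col1": "E", "Col2": "F", "Col3": "G", "Col4": "H"},
}
# Hypothetical second data set with two changed fields in Row1_val.
table2 = {
    "Row1_val": {"Col1": "A", "Col2": "X", "Col3": "C", "Col4": "Y"},
    "Row2_val": {"Col1": "E", "Col2": "F", "Col3": "G", "Col4": "H"},
}
details = diff_details(table1, table2)
summary = diff_summary(details)
```

The point of the long format is that one generic comparison replaces the 44 per-column IF statements.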
You have several issues in your code:
If(compRow.pipe_num = '' AND cis_status_c = 'R')
continue
END IF
"cis_status_c" is not declared. Is it a variable or a column in AEC_CIS_COMP?
If it is a column, just put the condition into the cursor query, i.e. SELECT * FROM T.AEC_CIS_COMP WHERE NOT (pipe_num = '' AND cis_status_c = 'R'). Note also that Oracle treats '' as NULL, so pipe_num = '' is never true; test pipe_num IS NULL instead.
to_date(sysdate, 'DD/MM/YYYY')
That's nonsense: you are converting a date into a date. Simply use TRUNC(SYSDATE).
Anyway, I think you can use three single statements instead of a cursor:
INSERT INTO t.AEC_CIS_SUM (HQ, RUN_DATE)
SELECT DISTINCT comp.HQ, trunc(sysdate)
from T.AEC_CIS_COMP comp
WHERE NOT EXISTS
(SELECT null FROM t.AEC_CIS_SUM WHERE HQ = comp.HQ AND RUN_DATE = trunc(sysdate));
INSERT INTO T.AEC_CIS_DET( Fac_id, Pipe_Num, Hq, Address, AutoUpdatedFl, DateTime, Changed_Field, CIS_Value, FM_Value)
select comp.Fac_ID, comp.Pipe_Num, comp.Hq, comp.Street_Num || ' ' || comp.Street_Name, 'Y', sysdate, 'Cis_Loop', comp.cis_loop, comp.cis_loop_c
from T.AEC_CIS_COMP comp
where comp.cis_loop <> comp.cis_loop_c;
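-- Add the number of mismatching rows per HQ, as the loop did one row at a time
UPDATE T.AEC_CIS_SUM s
SET cis_loop = cis_loop + (SELECT count(*) FROM T.AEC_CIS_COMP comp
WHERE comp.Hq = s.Hq AND comp.cis_loop <> comp.cis_loop_c)
WHERE s.Hq IN (Select Hq from T.AEC_CIS_COMP)
AND trunc(s.Run_Date) = trunc(sysdate);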
They are not tested but they should give you a hint how to do it.