SELECT DISTINCT for only one Column in Google Sheets Query - sorting

Let's say I have the following query.
=query (A:C; "SELECT A, B, C")
How can I modify it so that it returns no duplicate A?
In other words, when several rows contain the same A, I want the results to include only one of those rows (the last one). Duplicates in other columns should be allowed.
I found an answer for SQL Server (DISTINCT for only one Column), but I can't figure out how to do this in Google Sheets Query.
What I have:
A, B, C
lucas#abc.com, approved, 05/04/2019
lucas#abc.com, not set, 05/05/2019
lucas#abc.com, refunded, 05/06/2019
john#xyz.com, approved, 05/06/2019
john#xyz.com, approved, 05/07/2019
john#xyz.com, approved, 05/07/2019
What I want:
A, B, C
lucas#abc.com, refunded, 05/06/2019
john#xyz.com, approved, 05/07/2019

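You can use SORTN with its 3rd parameter (display_ties_mode) set to 2, which returns at most the first row in each group of rows that share the same value in the sort column (here column 1):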
=SORTN(SORT(A1:C, 3, 1), ROWS(A:A), 2, 1, 0)
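The idea behind display_ties_mode 2 is "order the rows, then keep the first row of each group". A minimal Python sketch of that logic (an illustration, not how Sheets implements SORTN), assuming column C holds MM/DD/YYYY date strings and that "the last one" means the row with the latest date. The sketch sorts newest first so the latest row of each group is the one kept.

```python
from datetime import datetime

# Sample rows from the question: (A, B, C), with C as an MM/DD/YYYY string.
ROWS = [
    ("lucas#abc.com", "approved", "05/04/2019"),
    ("lucas#abc.com", "not set", "05/05/2019"),
    ("lucas#abc.com", "refunded", "05/06/2019"),
    ("john#xyz.com", "approved", "05/06/2019"),
    ("john#xyz.com", "approved", "05/07/2019"),
    ("john#xyz.com", "approved", "05/07/2019"),
]


def parse_date(text):
    """Convert an MM/DD/YYYY string into a datetime so dates compare correctly."""
    return datetime.strptime(text, "%m/%d/%Y")


def first_per_key(rows):
    """Keep one row per value in column A: the row with the latest date in column C."""
    # Newest first; Python's sort stays stable with reverse=True,
    # so rows with equal dates keep their original relative order.
    newest_first = sorted(rows, key=lambda row: parse_date(row[2]), reverse=True)
    seen = set()
    kept = []
    for row in newest_first:
        if row[0] not in seen:  # first row of each group wins, like display_ties_mode 2
            seen.add(row[0])
            kept.append(row)
    return kept


for row in first_per_key(ROWS):
    print(row)
```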

Try the following formula:
=unique(filter(A:C,match(A:A&C:C,query(query(A:C,"select A,max(C) where A<>'' group by A label max(C) ''"),"select Col1")&query(query(A:C,"select A,max(C) where A<>'' group by A label max(C) ''"),"select Col2"),0)))
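The UNIQUE/FILTER/MATCH formula works in three steps: the inner QUERY finds the latest date for each value of A, MATCH keeps only the rows whose A&C pair equals one of those (A, max C) pairs, and UNIQUE drops exact duplicate rows. A rough Python equivalent of those steps, again assuming MM/DD/YYYY date strings (an illustration of the logic, not Sheets itself):

```python
from datetime import datetime

# Sample rows from the question: (A, B, C).
ROWS = [
    ("lucas#abc.com", "approved", "05/04/2019"),
    ("lucas#abc.com", "not set", "05/05/2019"),
    ("lucas#abc.com", "refunded", "05/06/2019"),
    ("john#xyz.com", "approved", "05/06/2019"),
    ("john#xyz.com", "approved", "05/07/2019"),
    ("john#xyz.com", "approved", "05/07/2019"),
]


def parse_date(text):
    return datetime.strptime(text, "%m/%d/%Y")


def latest_rows(rows):
    # Step 1, like QUERY "select A, max(C) where A<>'' group by A": latest date per A.
    latest = {}
    for a, _, c in rows:
        if a and (a not in latest or parse_date(c) > latest[a]):
            latest[a] = parse_date(c)
    # Step 2, like FILTER + MATCH on A&C: keep rows whose (A, C) pair is a group maximum.
    matched = [row for row in rows if row[0] in latest and parse_date(row[2]) == latest[row[0]]]
    # Step 3, like UNIQUE: drop exact duplicate rows, preserving first-seen order.
    return list(dict.fromkeys(matched))


print(latest_rows(ROWS))
```

If two different rows share the same A and the same latest date, both survive, because UNIQUE only removes rows that are identical in every column.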

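As of June 2020, some readers found that the formula above doesn't work as written. The variant below is the same formula with semicolons as argument separators, which Google Sheets requires in locales that use a comma as the decimal separator.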
For those who don't know what this is about: the formula selects the unique rows of the table, one per value in column A, keeping the row with the latest date in column C.
=UNIQUE(FILTER(A:C;MATCH(A:A&C:C;QUERY(QUERY(A:C;"select A, max(C) where A <> '' GROUP BY A label max(C) ''");"select Col1")&QUERY(QUERY(A:C;"select A,max(C) where A<>'' group by A label max(C) ''");"select Col2");0)))

Related

Small detail when using this function in Google Sheets

Small detail when using =INDEX($A$8:$A$11;MATCH(MAX(SUMIF($B$8:$B$11;$B$8:$B$11));SUMIF($B$8:$B$11;$B$8:$B$11);0)): if the values in column B are all different, it returns the correct date, but if the same value appears in column B on two different dates, it returns the date of the first occurrence instead of the correct one.
Any idea?
P.S. This question could be added to this post.
An even easier way:
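In E2, try this: =TRANSPOSE(INDEX(QUERY(A1:B," select A, sum(B) group by A Order By sum(B) Desc "),2))
INDEX(..., 2) picks the first data row below the query's header row, i.e. the date with the highest total, and TRANSPOSE turns it into a column.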
and format the date and currency accordingly.
Alternatively, you can build the result up in steps:
1 - Make a helper table of unique dates (in column D) with the total expenditure for each date. You can do this in two ways:
a) - Use SUMIF to get the sum of the expenditure for each unique date, like so: =IF(D2="",,SUMIF($A$2:$A,D2,$B$2:$B)) and drag it down.
b) - Use QUERY: =QUERY(A1:B11," select A, sum(B) group by A Order By sum(B) Desc ")
2 - To get the highest daily total: =MAX(E2:E)
3 - To get the date with the highest total (H3 holds the result of step 2): =INDEX($D$2:$D,MATCH($H$3,$E$2:$E,0),1)
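The helper-table approach (total per date, then MAX, then INDEX/MATCH) can be sketched in Python with hypothetical dates and amounts. Grouping by date before taking the maximum is what avoids the problem in the question, where MATCH stops at the first of several raw rows with the same value; note that MATCH, and this sketch, still return the first date if two dates tie on the total.

```python
from collections import defaultdict

# Hypothetical sample data: (date, expenditure) rows, with a repeated date.
EXPENSES = [
    ("2020-01-01", 10),
    ("2020-01-02", 25),
    ("2020-01-01", 20),
    ("2020-01-03", 5),
]


def date_of_highest_total(rows):
    """Return (date, total) for the date whose summed expenditure is largest; rows must be non-empty."""
    # Step 1, like SUMIF or QUERY "select A, sum(B) group by A": total per date.
    totals = defaultdict(int)
    for date, amount in rows:
        totals[date] += amount
    # Step 2, like =MAX(E2:E): the highest total.
    best = max(totals.values())
    # Step 3, like INDEX/MATCH(..., 0): the first date (in first-seen order) with that total.
    for date, total in totals.items():
        if total == best:
            return date, total


print(date_of_highest_total(EXPENSES))
```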
Make a copy of this sheet to make it yours.
Hope that answers your question.

Google Sheet Date and Time Calculation Question

I have a column that has 34 records of weekday, month/day, and time. I am looking for two formulas I can use in a table that will give me the count per weekday and the time duration per day. Eventually, I would like to just copy and paste new dates into column A and have the table calculate automatically. Here is my Google Sheet example. Is there a way to do this without creating helper columns? If not, no big deal. Anything that helps automate the process will be helpful.
https://docs.google.com/spreadsheets/d/1C6N94QJyEgm-2yg2SEDOweIU2fk2h2DLydKb-nH-ObE/edit?usp=sharing
Take a look at the Punches tab in the sample sheet below. It shows what I was mentioning about breaking the columns up. Then, using the QUERY() function I was able to populate the table.
https://docs.google.com/spreadsheets/d/1qbLOjTdzISICTKyUp_jK6gZbQCt-OwtDYYy3HNJygeE/edit#gid=1181136581
G6 Formula
=if(isna(query($A$1:$C, "SELECT COUNT(B) WHERE B ='"&F2&"' LABEL COUNT(B) ''",1)),0,query($A$1:$C, "SELECT COUNT(B) WHERE B ='"&F2&"' LABEL COUNT(B) ''",1))
H6 Formula
=if(G2<>0,query($A$1:$C, "SELECT C WHERE B ='"&F2&"' ORDER BY C DESC LIMIT 1 LABEL C ''",1)-query($A$1:$C, "SELECT C WHERE B ='"&F2&"' ORDER BY C LIMIT 1 LABEL C ''",1),0)
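The two formulas boil down to a count of punches per weekday and the span between the earliest and latest punch on that day. A Python sketch of the same calculation, using hypothetical punch data already split into a weekday column and an HH:MM time column as in the Punches tab:

```python
from datetime import datetime, timedelta

# Hypothetical punches: (weekday, time) pairs, like columns B and C after splitting.
PUNCHES = [
    ("Mon", "08:00"),
    ("Mon", "17:30"),
    ("Tue", "09:15"),
    ("Tue", "12:45"),
    ("Tue", "16:00"),
]


def day_summary(punches, day):
    """Return (count, duration) for one weekday; duration is latest minus earliest punch."""
    times = sorted(datetime.strptime(t, "%H:%M") for d, t in punches if d == day)
    if not times:
        # Mirrors the formulas returning 0 when the weekday has no rows.
        return 0, timedelta(0)
    return len(times), times[-1] - times[0]


for day in ("Mon", "Tue", "Wed"):
    print(day, day_summary(PUNCHES, day))
```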

Clickhouse - Latest Record

We have almost 1B records in a ReplicatedMergeTree table.
The primary key is (a, b, c).
Our app writes into this table on every user action (we accumulate almost a million records per hour).
We append (store) the latest timestamp (updated_at) for each unique combination of (a, b).
The key requirement is to provide a roll-up against the latest timestamp for a given combination of a, b, c.
Currently, we are processing the queries as follows:
select a,b,c, sum(x), sum(y)...etc
from table_1
where (a,b,updated_at) in (select a,b,max(updated_at) from table_1 group by a,b)
and c in (...)
group by a,b,c
Clarification on the sub-query:
(select a,b,max(updated_at) from table_1 group by a,b)
This part is for illustration only. Our app writes the latest updated_at for every (a, b), so the clause above is effectively:
(select a,b,updated_at from tab_1_summary)
where tab_1_summary holds the latest record for each (a, b).
Note: We have to keep the grouping criteria as-is.
The table is defined with PARTITION BY c ORDER BY (a, b, updated_at).
The question is: is there a way to write a better query that returns results faster? We need to shave a few seconds off the overall processing.
FYI: we experimented with a materialized view using ReplicatedReplacingMergeTree, but given the size of this table and the constant inserts, querying it with the FINAL clause doesn't perform as well as the query above.
Thanks in advance!
Just as a test, try using a JOIN instead of the tuple IN (tuples) filter:
select t.a, t.b, t.c, sum(x), sum(y)...etc
from table_1 AS t inner join tab_1_summary using (a, b, updated_at)
where c in (...)
group by t.a, t.b, t.c
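Before timing the JOIN, it is worth confirming that it returns the same totals as the IN version. That holds only if tab_1_summary has exactly one row per (a, b, updated_at): an inner join emits one output row per matching summary row, whereas IN only tests membership. A small Python model of both filters (hypothetical rows, x and y as integers, the c filter as a set; a model of the semantics, not of ClickHouse's execution) makes the difference visible:

```python
from collections import Counter, defaultdict

# Hypothetical table_1 rows: (a, b, c, updated_at, x, y).
TABLE_1 = [
    ("a1", "b1", "c1", 1, 10, 1),
    ("a1", "b1", "c1", 2, 20, 2),
    ("a1", "b1", "c2", 2, 30, 3),
    ("a2", "b1", "c1", 5, 40, 4),
]


def rollup_in(rows, summary, cs):
    """Model of: WHERE (a, b, updated_at) IN (summary) AND c IN cs GROUP BY a, b, c."""
    keys = set(summary)  # IN only checks membership, so duplicate summary rows don't matter
    out = defaultdict(lambda: [0, 0])
    for a, b, c, ts, x, y in rows:
        if (a, b, ts) in keys and c in cs:
            out[(a, b, c)][0] += x
            out[(a, b, c)][1] += y
    return {k: tuple(v) for k, v in out.items()}


def rollup_join(rows, summary, cs):
    """Model of: INNER JOIN summary USING (a, b, updated_at) WHERE c IN cs GROUP BY a, b, c."""
    matches = Counter(summary)  # a row is joined once per matching summary row
    out = defaultdict(lambda: [0, 0])
    for a, b, c, ts, x, y in rows:
        n = matches[(a, b, ts)]
        if n and c in cs:
            out[(a, b, c)][0] += n * x
            out[(a, b, c)][1] += n * y
    return {k: tuple(v) for k, v in out.items()}


summary = [("a1", "b1", 2), ("a2", "b1", 5)]
print(rollup_in(TABLE_1, summary, {"c1"}))
print(rollup_join(TABLE_1, summary, {"c1"}))
```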
Consider using AggregatingMergeTree to pre-calculate result metrics:
CREATE MATERIALIZED VIEW table_1_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(updated_at)
ORDER BY (updated_at, a, b, c)
AS SELECT
updated_at,
a,b,c,
sum(x) AS x, /* see [SimpleAggregateFunction data type](https://clickhouse.tech/docs/en/sql-reference/data-types/simpleaggregatefunction/) */
sum(y) AS y,
/* For non-simple functions should be used [AggregateFunction data type](https://clickhouse.tech/docs/en/sql-reference/data-types/aggregatefunction/). */
/* etc. */
FROM table_1
GROUP BY updated_at, a, b, c;
Then query the view like this to get the result:
select a,b,c, sum(x), sum(y)...etc
from table_1_mv
where (updated_at,a,b) in (select updated_at,a,b from tab_1_summary)
and c in (...)
group by a,b,c
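The materialized-view approach relies on sums being re-aggregatable: summing x per (updated_at, a, b, c) once when rows are inserted and again at query time gives the same totals as summing the raw rows. A Python sketch of that equivalence, with hypothetical rows and a set standing in for tab_1_summary. It would not hold for something like avg(x), which is why non-simple functions need the AggregateFunction state types mentioned in the view's comments.

```python
from collections import defaultdict

# Hypothetical table_1 rows: (updated_at, a, b, c, x, y).
TABLE_1 = [
    (2, "a1", "b1", "c1", 20, 2),
    (2, "a1", "b1", "c1", 5, 1),
    (1, "a1", "b1", "c1", 10, 1),
    (5, "a2", "b1", "c1", 40, 4),
]
# Hypothetical tab_1_summary: the latest (updated_at, a, b) for each (a, b).
SUMMARY = {(2, "a1", "b1"), (5, "a2", "b1")}


def sum_by(rows, key_len):
    """Sum the trailing (x, y) columns, grouped by the first key_len columns."""
    out = defaultdict(lambda: [0, 0])
    for row in rows:
        key, (x, y) = row[:key_len], row[key_len:]
        out[key][0] += x
        out[key][1] += y
    return {k: tuple(v) for k, v in out.items()}


def direct(rows, summary, cs):
    """Query table_1 directly, like the original query."""
    kept = [(a, b, c, x, y) for ts, a, b, c, x, y in rows if (ts, a, b) in summary and c in cs]
    return sum_by(kept, 3)


def via_view(rows, summary, cs):
    """Pre-aggregate like the view (GROUP BY updated_at, a, b, c), then filter and re-sum."""
    view = sum_by(rows, 4)  # keys: (updated_at, a, b, c)
    kept = [(a, b, c, x, y) for (ts, a, b, c), (x, y) in view.items() if (ts, a, b) in summary and c in cs]
    return sum_by(kept, 3)


print(direct(TABLE_1, SUMMARY, {"c1"}))
print(via_view(TABLE_1, SUMMARY, {"c1"}))
```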

Google Sheets: when I select a Desc in a dropdown list, I want the value to become its unique number

I have a Google spreadsheet:
Column A = Unique number,
Column B = Desc
In column C, I use data validation based on column B. When I select a value, I want the result in column C to be the matching value from column A. I tried to include an image to make this clearer. Do you have any ideas?
Thank you.
Use:
=IFNA(VLOOKUP(D3, {B:B, A:A}, 2, 0))
or:
=IFNA(FILTER(A:A, B:B=D3))
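Both formulas are an exact-match lookup from the selected Desc back to its unique number. A Python sketch with hypothetical rows: the VLOOKUP style returns the first match, the FILTER style returns every match, and "not found" becomes None or an empty list, playing the role of IFNA's blank. The comparisons are case-insensitive because Sheets' exact-match VLOOKUP and = comparisons on text are.

```python
# Hypothetical sheet rows: (unique number in column A, description in column B).
ROWS = [
    (101, "Apple"),
    (102, "Banana"),
    (103, "Cherry"),
]


def vlookup_style(selected, rows):
    """First exact (case-insensitive) match on the description, else None."""
    for number, desc in rows:
        if desc.casefold() == selected.casefold():
            return number
    return None


def filter_style(selected, rows):
    """Every number whose description matches (case-insensitive), else an empty list."""
    return [number for number, desc in rows if desc.casefold() == selected.casefold()]


print(vlookup_style("Banana", ROWS), filter_style("Cherry", ROWS))
```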

SUBSTR function to remove one column's value from another column in Oracle

How can I use the SUBSTR function in Oracle to remove one column's value from another column in the same table?
For example, suppose table abc has column values like a = 01-CEDAPR and b = AB_52MM_01-CEDAPR.
Now I want to populate column c with the value AB_52MM. Can anyone suggest the right way to achieve this?
This should be relatively straightforward. All you want to do is replace the value of a, if found in b, with nothing. Right?
WITH abc AS (
SELECT '01-CEDAPR' AS a, 'AB_52MM_01-CEDAPR' AS b
FROM dual
)
SELECT a, b, REPLACE(b, a)
FROM abc
If you also need to remove the _ preceding the value of a, then you might want to use REGEXP_REPLACE() (in case the _ may or may not exist):
WITH abc AS (
SELECT '01-CEDAPR' AS a, 'AB_52MM_01-CEDAPR' AS b
FROM dual
)
SELECT a, b, REGEXP_REPLACE(b, '_?' || a || '$')
FROM abc
The $ sign ensures that the value of a is anchored to the end; the ? makes _ optional.
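To see what each expression returns for the sample values, here is a Python sketch that mirrors them, with str.replace and re.sub standing in for Oracle's REPLACE and REGEXP_REPLACE (a Python illustration, not Oracle itself). One deliberate difference: it passes a through re.escape so characters such as . or + in a are matched literally, whereas the Oracle expression above inserts a into the pattern as-is.

```python
import re


def strip_with_replace(a, b):
    """Like Oracle REPLACE(b, a): remove every occurrence of a from b."""
    return b.replace(a, "")


def strip_with_regex(a, b):
    """Like REGEXP_REPLACE(b, '_?' || a || '$'): remove a (plus an optional preceding _) at the end of b."""
    return re.sub("_?" + re.escape(a) + "$", "", b)


a, b = "01-CEDAPR", "AB_52MM_01-CEDAPR"
print(strip_with_replace(a, b))
print(strip_with_regex(a, b))
```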
Here are a couple of solutions that may or may not help you, based on the sketchy information you have provided. If they don't, then you will need to edit your question to provide a more detailed explanation of what you're after!
with sd as (select '01-CEDAPR' a, 'AB_52MM_01-CEDAPR' b from dual) -- assumed O1 in column b was a typo
select a,
b,
regexp_replace(b, '_'||a),
substr(b, 1, instr(b, '_', -1) -1)
from sd;
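The SUBSTR/INSTR expression keeps everything before the last underscore: INSTR(b, '_', -1) searches backwards and returns the 1-based position of the last _, and SUBSTR takes that position minus one characters. A Python sketch of the same logic; since Oracle's SUBSTR returns NULL when its length argument is less than 1, the sketch returns None when b has no underscore or starts with one.

```python
def before_last_underscore(b):
    """Like SUBSTR(b, 1, INSTR(b, '_', -1) - 1) in Oracle."""
    idx = b.rfind("_")  # 0-based index of the last "_", or -1 if absent
    # The Oracle length argument equals idx here; SUBSTR returns NULL when it is < 1.
    if idx < 1:
        return None
    return b[:idx]


print(before_last_underscore("AB_52MM_01-CEDAPR"))
```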
