Oracle counting distinct rows using two columns merged as one - oracle

I have one table where each row has three columns. The first two columns are a prefix and a value. For each combination of columns one and two, I'm trying to get a distinct count of the third column.
Basically I'm trying to get to this.
Account            Totals
prefix & value1    101
prefix & value2    102
prefix & value3    103
I've tried a lot of different versions but I'm basically noodling around this.
select prefix||value as Account, count(distinct thirdcolumn) as Totals from Transactions

It sounds like you want
SELECT
prefix||value Account,
count(distinct thirdcolumn) Totals
FROM Transactions
GROUP BY prefix, value
The count(distinct thirdcolumn) says you want a count of the distinct values in the third column. The GROUP BY prefix, value says you want a row returned for each unique prefix/value combination and that the count applies to that combination.
Note that "thirdcolumn" is a placeholder for the name of your third column, not a magic keyword, since I didn't see the actual name in the post.
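To see the grouped distinct count in action, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in for Oracle here, and the sample rows are made up for illustration; "thirdcolumn" is still just the placeholder name.

```python
import sqlite3

# Hypothetical sample data; column names follow the placeholders used above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (prefix TEXT, value TEXT, thirdcolumn TEXT)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        ("A", "1", "x"),
        ("A", "1", "x"),  # repeated third-column value, counted only once
        ("A", "1", "y"),
        ("B", "2", "x"),
    ],
)

# One output row per prefix/value pair, with the distinct count for that pair.
rows = conn.execute(
    """
    SELECT prefix || value AS Account,
           COUNT(DISTINCT thirdcolumn) AS Totals
    FROM Transactions
    GROUP BY prefix, value
    ORDER BY Account
    """
).fetchall()

for account, totals in rows:
    print(account, totals)
```

The A/1 pair has three rows but only two distinct third-column values, which is the difference between COUNT(DISTINCT ...) and a plain row count.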

If you want the number of rows for each prefix/value pair then you can use:
SELECT prefix || value AS account,
COUNT(*) AS totals
FROM Transactions
GROUP BY prefix, value
You do not want to count the DISTINCT prefix/value pairs here: because you GROUP BY those values, each distinct pair ends up in its own group, so the COUNT of DISTINCT prefix/value pairs would always be one.
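The point about the distinct pair count always being one can be checked with a small sketch. It uses an in-memory SQLite database from Python as a stand-in for Oracle, with made-up rows; the distinct_pairs column is added only for the demonstration.

```python
import sqlite3

# Hypothetical sample data, not from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transactions (prefix TEXT, value TEXT, thirdcolumn TEXT)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [("A", "1", "x"), ("A", "1", "x"), ("A", "1", "y"), ("B", "2", "x")],
)

# COUNT(*) counts rows per group; counting the distinct pair inside its own
# group can only ever give 1.
rows = conn.execute(
    """
    SELECT prefix || value AS account,
           COUNT(*) AS totals,
           COUNT(DISTINCT prefix || value) AS distinct_pairs
    FROM Transactions
    GROUP BY prefix, value
    ORDER BY account
    """
).fetchall()

for account, totals, distinct_pairs in rows:
    print(account, totals, distinct_pairs)
```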

Related

Google Sheets Formula - Get Total from filtered dates per row (undefined number of columns)

I have this data in Google Sheets where I need to get the total of the filtered date columns per row. The date columns are not fixed (they may increase over time; I already know how to handle this undefined number of columns). My current challenge is how to efficiently get a summary of totals per user based on filtered date columns.
My data is like this:
My expected result is like this:
My current idea is this:
Here is a sample spreadsheet for reference:
https://docs.google.com/spreadsheets/d/1_dByPabStGQvh94TabKxwFeUyVaRFnkBCRf4ioTY5jM/edit?usp=sharing
This is a method to unpivot the data so you can work with it
=ARRAYFORMULA(
QUERY(
IFERROR(
SPLIT(
FLATTEN(
IF(ISBLANK(A2:A),,A2:A&"|"&B1:G1&"|"&B2:G)),
"|")),
"select Col1, Sum(Col3)
where
Col2 >= "&DATE(2022,1,1)&" and
Col2 <= "&DATE(2022,1,15)&"
group by Col1
label
Col1 'Person',
Sum(Col3) 'Total'"))
Basically, it's creating an output like User1|44557|8 for every cell; it then FLATTENs it all and SPLITs by the pipe, which gives you three clean columns.
Run that through a QUERY to SUM by the person between the dates and you get what you're after. If you wanted to use cell references for dates, simply replace the dates with the cell references.
To expand the table, change B1:G1 and B2:G to match the width of the range.
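For readers who find the formula dense, the same unpivot-then-aggregate idea can be sketched in Python. This is only an illustration of the technique, not a replacement for the formula; the users, dates, and numbers are invented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical wide table: one row per user, one column per date.
header = [date(2021, 12, 27), date(2022, 1, 3), date(2022, 1, 10), date(2022, 1, 17)]
wide = {
    "User1": [8, 4, 2, 1],
    "User2": [5, 0, 3, 7],
}

# Unpivot: one (user, date, value) row per cell, like FLATTEN + SPLIT.
long_rows = [
    (user, day, amount)
    for user, values in wide.items()
    for day, amount in zip(header, values)
]

# Filter by date range and sum per user, like the QUERY step.
start, end = date(2022, 1, 1), date(2022, 1, 15)
totals = defaultdict(int)
for user, day, amount in long_rows:
    if start <= day <= end:
        totals[user] += amount

print(dict(totals))
```

Once the data is in the long three-column shape, adding more date columns only adds rows, which is why the formula copes with an undefined number of columns.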

ArrayFormula with sum of previous rows

I have an ArrayFormula to calculate a value for each row, and for each 6th row I want it to calculate the sum of the previous 5 instead.
Example sheet: https://docs.google.com/spreadsheets/d/18g2bOOBqsUgmy3ZXINOl6hcaMXf-uYJv7PGft247FjU/edit?usp=sharing
I have tried several routes, including google script, but keep banging my head against limitations of ArrayFormula.
You need to group the rows.
My example:
Cell A2 (Name groups):
=ArrayFormula(IF(B2:B<>"",FLOOR((ROW(A2:A)-2)/5)+1,""))
Column B (Your Data)
Cell E2 (Result):
=QUERY({QUERY({A2:B},"select Col1,sum(Col2) where Col1>0 group by Col1");
QUERY({A2:B},"select Col1,Col2 where Col1>0")},
"select Col2 where Col1>0 order by Col1,Col2 label Col2 ''")
Function References
Query
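The grouping idea (number each block of 5 rows, then append each block's sum after it) can be sketched in Python as follows. The function name and the sample values are made up for illustration.

```python
def with_group_sums(values, size=5):
    """Return the values with the sum of each block of `size` inserted after it."""
    out = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]  # same role as the group number in column A
        out.extend(chunk)
        out.append(sum(chunk))              # the "6th row" holding the block total
    return out


print(with_group_sums([1, 2, 3, 4, 5, 6, 7]))
```

One difference worth noting: this sketch keeps the values in their original order, whereas the final QUERY above sorts by Col1, Col2, so within each group the rows come out ordered by value and the sum lands last only if it is at least as large as every value in the group.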

Sum of only Distinct values in a Column in DAX

I have a table [Table 1] with three columns,
OrganizationName, FieldName and Acres, with data as follows:
OrganizationName | FieldName | Acres
ABC              | F1        | 0.96
ABC              | F1        | 0.96
ABC              | F1        | 0.64
I want to calculate the sum of Distinct values of Acres
(eg: 0.96+0.64) in DAX.
One of the problems with doing what you want is that many measures rely on filters and not actual table expressions. So, getting a distinct list of values and then filtering the table by those values, just gives you the whole table back.
The iterator functions are handy and operate on table expressions, so try SUMX
TotalDistinctAcreage = SUMX(DISTINCT(Table1[Acres]),[Acres])
This will generate a table that is one column containing only the distinct values for Acres, and then add them up. Note that this is only looking at the Acres column, so if different fields and organizations had the same acreage -- then that acreage would still only be counted once in this sum.
If instead you want to add up the acreage simply on distinct rows, then just make a small change:
TotalAcreageOnDistinctRows = SUMX(DISTINCT(Table1),[Acres])
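The difference between the two measures is easy to miss, so here is a rough Python analogy (not DAX). The first three rows are from the question; the fourth row is a hypothetical addition, included only so the two results differ.

```python
import math

# Rows as (OrganizationName, FieldName, Acres). The last row is hypothetical,
# not from the question.
table1 = [
    ("ABC", "F1", 0.96),
    ("ABC", "F1", 0.96),
    ("ABC", "F1", 0.64),
    ("XYZ", "F2", 0.96),
]

# Analogous to SUMX over DISTINCT(Table1[Acres]): each distinct Acres value once.
sum_distinct_values = math.fsum({acres for _, _, acres in table1})

# Analogous to SUMX over DISTINCT(Table1): each distinct row once.
sum_distinct_rows = math.fsum(acres for _, _, acres in set(table1))

print(round(sum_distinct_values, 2), round(sum_distinct_rows, 2))
```

With the extra row, the distinct-values sum counts 0.96 once (about 1.6 in total), while the distinct-rows sum counts it once per distinct row (about 2.56).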
Hope it helps.
Ok, you added these requirements:
Thank You. :) However, I want to add Distinct values of Acres for a
Particular Fieldname. Is this possible? – Pooja 3 hours ago
The easiest way really is just to go ahead and slice or filter the original measure that I gave you. But if you have to apply the filter context in DAX, you can do it like this:
Measure =
SUMX(
FILTER(
SUMMARIZE( Table1, Table1[FieldName], Table1[Acres] )
, Table1[FieldName] = "<put the name of your specific field here>"
)
, Table1[Acres]
)

How to sort rows in "SELECT ... FOR ALL ENTRIES ...", ORDER BY is not accepted

I am selecting from a table that has multiple records with the same REQUEST_ID but different VERSION_NO. So I want to sort it descending so I can take the highest number (latest record).
This is what I have...
IF it_temp2[] IS NOT INITIAL.
SELECT request_id
version_no
status
item_list_id
mod_timestamp
FROM ptreq_header INTO TABLE it_abs3
FOR ALL ENTRIES IN it_temp2
WHERE item_list_id EQ it_temp2-itemid.
ENDIF.
So version_no is one of the SELECT fields, but I want to sort by that field (descending) and only take the first row.
I was doing some research and read that SORT * BY * won't work with FOR ALL ENTRIES. But that's just my understanding from reading up.
Please let me know how I can make this work. Thanks
You can simply sort the itab after the SELECT and delete all adjacent duplicates afterwards, if wanted:
SORT it_abs3 BY request_id ASCENDING version_no DESCENDING.
DELETE ADJACENT DUPLICATES FROM it_abs3 COMPARING request_id.
Depending on the amount of expected garbage (lines to be deleted) in the itab, an SQL approach may be better. See Used_By_Already's answer.
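If the sort-then-delete-adjacent-duplicates idea is unfamiliar, this Python sketch shows the same logic outside ABAP. The rows are invented, and only three of the selected fields are modelled.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (request_id, version_no, status)
it_abs3 = [
    ("R1", 1, "SENT"),
    ("R2", 1, "SENT"),
    ("R1", 3, "APPROVED"),
    ("R1", 2, "CHANGED"),
]

# Sort by request_id ascending, then version_no descending.
it_abs3.sort(key=lambda row: (row[0], -row[1]))

# Keep only the first row of each run of equal request_id values,
# which after the sort is the row with the highest version_no.
latest = [next(group) for _, group in groupby(it_abs3, key=itemgetter(0))]

print(latest)
```

The sort order does the real work: deleting adjacent duplicates keeps the first line of each run, so whatever should survive has to be sorted to the front of its group.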
If you are using the term "latest" to indicate "the most recent entry", then the field mod_timestamp appears to be relevant and you could use it this way to choose only the most recent records for each request_id.
SELECT
h.request_id
, h.version_no
, h.status
, h.item_list_id
, h.mod_timestamp
FROM ptreq_header h
INNER JOIN (
SELECT
request_id
, MAX(mod_timestamp) AS latest
FROM ptreq_header
GROUP BY
request_id
) l
ON h.request_id = l.request_id
AND h.mod_timestamp = l.latest
If you want the largest version_no, then instead of MAX(mod_timestamp) use MAX(version_no)
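As a quick check of the join-on-MAX pattern, here is a minimal sketch using the MAX(version_no) variant against an in-memory SQLite database from Python. SQLite is a stand-in, not the SAP database, and the rows and the reduced column list are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ptreq_header ("
    "request_id TEXT, version_no INTEGER, status TEXT, mod_timestamp TEXT)"
)
# Hypothetical rows, not real data.
conn.executemany(
    "INSERT INTO ptreq_header VALUES (?, ?, ?, ?)",
    [
        ("R1", 1, "SENT", "2020-01-01 10:00:00"),
        ("R1", 2, "APPROVED", "2020-01-02 09:00:00"),
        ("R2", 1, "SENT", "2020-01-01 12:00:00"),
    ],
)

# The derived table finds the highest version per request; the join keeps
# only the header rows that carry that version.
rows = conn.execute(
    """
    SELECT h.request_id, h.version_no, h.status
    FROM ptreq_header h
    INNER JOIN (
        SELECT request_id, MAX(version_no) AS latest
        FROM ptreq_header
        GROUP BY request_id
    ) l
      ON h.request_id = l.request_id
     AND h.version_no = l.latest
    ORDER BY h.request_id
    """
).fetchall()

print(rows)
```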
Just declare it_abs3 as a sorted table whose key consists of the columns you want to sort by.
You can also sort the table after the query.
SORT it_abs3 BY ...

Substring inside string

Suppose this is my table:
ID STRING
1 'ABC'
2 'DAE'
3 'BYYYYYY'
4 'H'
I want to select all rows that have at least one of the characters in the STRING column somewhere in another row's STRING variable.
For example, 1 and 2 have an A in common and 1 and 3 have a B in common, but 4 does not have any characters in common with any of the other rows. So my query should return only the first three lines.
I don't need to know with which line it matched.
Thanks!
@A.B.Cade: Good solution, but it could be done without DISTINCT or a join.
SELECT * FROM test t1
WHERE EXISTS
(
SELECT * FROM test t2
WHERE t1.id<>t2.id AND
regexp_like(t1.string, '['|| replace(t2.string, '.[]', '\.\[\]')||']')
)
The query won't compare the string with extra rows since it'll stop the comparison as soon as 1 match is found for the current row...
See fiddle.
@GolezTrol's answer is a good one, but here is another approach:
select distinct t1."ID", t1."STRING"
from table1 t1, table1 t2
where t1."ID" <> t2."ID"
and regexp_like(t1."STRING", '['|| t2."STRING"||']')
First, take a Cartesian product of the table.
Then make sure you're not comparing the same string to itself.
Then create a regexp from one string for comparing to the other: [<string1>] means that the string must contain one of the letters in the [ ], which are all from string1.
Here is a fiddle
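The character-class trick used in both regexp answers can be tried out in Python. This is a sketch of the idea rather than a translation of either query; it assumes non-empty strings and uses re.escape so that characters such as ] or ^ in the data do not break the generated pattern.

```python
import re

# Sample data from the question: id -> string.
rows = {1: "ABC", 2: "DAE", 3: "BYYYYYY", 4: "H"}


def shares_char(a, b):
    """True if string a contains at least one character of string b."""
    # Build a character class from b; re.escape guards regex metacharacters.
    return re.search("[" + re.escape(b) + "]", a) is not None


# Like the EXISTS version: any() stops at the first other row that matches.
matching = [
    row_id
    for row_id, text in rows.items()
    if any(shares_char(text, other) for other_id, other in rows.items() if other_id != row_id)
]

print(matching)
```

Row 4 drops out because "H" shares no character with any other row.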
Like this:
select distinct
id, name
from
(select distinct
x.id,
x.NAME,
length(x.NAME) as leng,
substr(x.name, level, 1) as namechar
from
YourTable x
start with
level = 0
connect by
level <= length(x.name)) y
where
exists
(select
'x'
from
YourTable z
where
instr(z.name, y.namechar) > 0 and
z.id <> y.id)
order by
id
What it does:
First (inner select), use the table with a number generator that returns a number for each letter in the name. Now each record in YourTable is returned Length(Name) times, each with a different number. That generated number is used to isolate one letter (substr).
Then (subselect in the top-level where clause) check if records exist that contain that isolated letter. Distinct is needed because records are returned more than once if more than one letter matches. You could add namechar to the outer select field list to see the letters that match.
