What is the simplest query to display unique values in each column with their count? - oracle

Suppose I have a table like this:
id name addr_line_1 addr_line_2 rec_ins_dt rec_updt_dt
and I want to show output as follows:
rec_ins_dt rec_ins_dt_count rec_updt_dt rec_updt_dt_count
How can I achieve this result using a single query? I understand this can be done by creating temp tables and then joining the two temp tables together, but I want to use a single query.
The following additional constraints apply when executing this query:
Input data: 1 billion rows
Memory: 4 GB
Please consider the platform to be Oracle or Netezza. Thank you for your input.

SELECT
    rec_ins_dt,  COUNT(*) OVER (PARTITION BY rec_ins_dt)  AS rec_ins_dt_count,
    rec_updt_dt, COUNT(*) OVER (PARTITION BY rec_updt_dt) AS rec_updt_dt_count
FROM <your-table>;
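Note that the window-function query returns one row per input row, with the counts repeated on every row. If one row per distinct value is wanted instead, a possible alternative is one GROUP BY branch per column combined with UNION ALL, which yields a long (column name, value, count) layout rather than the side-by-side layout in the question. The sketch below is only an illustration under assumptions: the table name t, the sample rows, and the output column names col_name, val, and cnt are made up, and Python's built-in sqlite3 stands in for Oracle or Netezza.

```python
import sqlite3

# Hypothetical table and data; SQLite stands in for Oracle/Netezza here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, rec_ins_dt TEXT, rec_updt_dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-02-01"),
        (2, "2020-01-01", "2020-02-02"),
        (3, "2020-01-02", "2020-02-02"),
        (4, "2020-01-02", "2020-02-02"),
    ],
)

# One aggregate branch per column; each branch emits one row per distinct value.
query = """
SELECT 'rec_ins_dt' AS col_name, rec_ins_dt AS val, COUNT(*) AS cnt
FROM t GROUP BY rec_ins_dt
UNION ALL
SELECT 'rec_updt_dt', rec_updt_dt, COUNT(*)
FROM t GROUP BY rec_updt_dt
ORDER BY col_name, val
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each branch aggregates independently, so the result size is bounded by the number of distinct values per column rather than by the number of input rows.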

Related

Type wise summation and subtracting in oracle

I have two tables for my store and am working on Oracle. The first table describes the transactions in my store. There are two types of transaction (MR and SR): MR means adding products to the store and SR means removing products from storage. I want to get the final closing stock of my storage, that is, the final quantity of every product after all transactions. I have tried many solutions but could not finish any of them, so I have nothing to show yet. Please help me solve this problem. Thanks
You can use a CASE expression, as below, to increase or decrease the quantity based on the transaction type, then group by Name and sum the quantity derived from the CASE expression to get your desired result.
select row_number() over (order by a.Name) as Sl, a.Name, sum(a.qntity) as qntity
from
    (select t2.Name, case when t1.type = 'MR' then t2.qntity else -(t2.qntity) end as qntity
     from table1 t1, table2 t2 where t1.oid = t2.table01_oid) a
group by a.Name;
This query will provide result as below:
SL NAME QNTITY
1 Balls 0
2 Books 6
3 Pencil 13
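The signed-sum idea can be tried out in isolation. The following is a minimal sketch, assuming hypothetical rows (the original sample data was only shown in an image that is not available) and using Python's built-in sqlite3 in place of Oracle. It writes the join with explicit JOIN syntax and folds the CASE expression directly into SUM, which should be equivalent to the subquery form above.

```python
import sqlite3

# Hypothetical data: transaction headers (MR = stock in, SR = stock out)
# and their line items. Table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (oid INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE table2 (table01_oid INTEGER, Name TEXT, qntity INTEGER);
INSERT INTO table1 VALUES (1, 'MR'), (2, 'SR'), (3, 'MR');
INSERT INTO table2 VALUES
    (1, 'Books', 10), (1, 'Pencil', 5),
    (2, 'Books', 4),
    (3, 'Pencil', 8);
""")

# MR quantities count as positive, everything else as negative.
query = """
SELECT t2.Name,
       SUM(CASE WHEN t1.type = 'MR' THEN t2.qntity ELSE -t2.qntity END) AS qntity
FROM table1 t1
JOIN table2 t2 ON t1.oid = t2.table01_oid
GROUP BY t2.Name
ORDER BY t2.Name
"""
closing = conn.execute(query).fetchall()
for name, qty in closing:
    print(name, qty)
```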

HIVE equivalent of FIRST and LAST

I have a table with the following columns:
table1: ID, CODE, RESULT, RESULT2, RESULT3
I have this SAS code:
data table1;
set table1;
BY ID CODE;
IF FIRST.CODE and RESULT='A' THEN OUTPUT;
ELSE IF LAST.CODE and RESULT NE 'A' THEN OUTPUT;
RUN;
So we are grouping the data by ID and CODE, and then writing to the dataset if certain conditions are met. I want to write a Hive query to replicate this. This is what I have:
proc sql;
create table temp as
select *, row_number() over (partition by ID, CODE) as rowNum
from table1;
create table temp2 as
select a.ID, a.CODE, a.RESULT, a.RESULT2, a.RESULT3
from temp a
inner join (select ID, CODE, max(rowNum) as maxRowNum
from temp
group by ID, CODE) b
on a.ID=b.ID and a.CODE=b.CODE
where (a.rowNum=1 and a.RESULT='A') or (a.rowNum=b.maxRowNum and a.RESULT NE 'A');
quit;
There are two issues I see with this.
1) Which row is first or last in each BY group depends entirely on the order of rows in table1 in SAS; we aren't ordering by anything. I don't think row order is preserved when translating to a Hive query.
2) The SAS code takes either the first row in each BY group or the last, not both. I think that my Hive query takes both, resulting in more rows than I want.
Any suggestions or insight on how to improve my query are appreciated. Is it even possible to replicate this SAS code in Hive?
The SAS code has a BY statement (BY ID CODE;), which tells SAS that the input dataset is sorted by those variables. So the selection for FIRST. and LAST. is not random.
That said, we can replicate this in Hive by using the first_value and last_value window functions.
FIRST.CODE should translate to
first_value(code) over (partition by id order by code) fcode
Similarly, LAST.CODE would be
last_value(code) over (partition by id order by code rows between unbounded preceding and unbounded following) lcode
The explicit frame matters: with an ORDER BY and no frame clause, the default frame ends at the current row, so last_value would simply return the current row's code.
Once you have the fcode and lcode columns, use a CASE WHEN expression for the result column criteria, for example:
case when (code=fcode and result='A') or (code=lcode and result<>'A')
then 1 else 0 end as op_flag
Then select from the result with where op_flag = 1.
SAMPLE
select id, code, result from (
select *,
first_value(code) over (partition by id order by code) fcode,
last_value(code) over (partition by id order by code
rows between unbounded preceding and unbounded following) lcode
from footab) f
where (code=fcode and result='A') or (code=lcode and result<>'A')
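One detail worth checking with last_value is the window frame. When a window has an ORDER BY but no explicit frame, the default frame runs only up to the current row (and its peers), so last_value returns the current row's value rather than the last value in the partition. The sketch below illustrates this under assumptions: it uses Python's built-in sqlite3 (SQLite 3.25 or later, which applies the same default frame) instead of Hive, and the three sample rows in footab are hypothetical.

```python
import sqlite3

# Hypothetical rows; SQLite (3.25+) stands in for Hive in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE footab (id INTEGER, code TEXT, result TEXT);
INSERT INTO footab VALUES
    (1, 'a', 'A'), (1, 'b', 'B'), (1, 'c', 'C');
""")

# lcode_default uses the implicit frame (up to the current row);
# lcode_full widens the frame to the whole partition.
query = """
SELECT code,
       last_value(code) OVER (PARTITION BY id ORDER BY code) AS lcode_default,
       last_value(code) OVER (
           PARTITION BY id ORDER BY code
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS lcode_full
FROM footab
ORDER BY code
"""
frames = conn.execute(query).fetchall()
for row in frames:
    print(row)
```

With the default frame, lcode_default equals code on every row; only lcode_full carries the last code of the partition.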
Regarding point 1): BY-group processing requires the input data to be sorted or indexed on the BY variables, so although the code contains no ordering, the source data is processed in order. If the input data is not sorted or indexed, SAS throws an error.
Possible differences therefore arise on rows with the same values of the BY variables, especially if RESULT differs.
In SAS, I would pre-sort the data by ID, CODE, RESULT and then use BY ID CODE, so that the outcome is not influenced by the order of rows.
Regarding 2): FIRST and LAST can both be true in SAS. Since your conditions on RESULT for first and last are different, I guess this is not a source of differences.
I guess you could add another field as
row_number() over (partition by ID, CODE desc) as rowNumDesc
to detect last row with rowNumDesc = 1 (so that you skip the join).
EDIT:
I think both programs above select rows arbitrarily within groups that share the same values of ID and CODE, especially when RESULT is also the same. But you should get the same number of rows from both. If not, just debug it.
However, the arbitrary aspect in SAS depends on the physical order of rows in storage, while the arbitrariness of ROW_NUMBER within a group depends on how the engine implements the function.

Using index to speed up child <> parent query

I have query similar to this:
select *
from table1
where status = 'ACTV'
and child_id <> parent_id
The problem is that this table is quite large and Oracle is doing a full table scan.
I tried to create an index (on the status, child_id, parent_id columns) to speed up this query, but Oracle does not use the index even with a hint.
Is there a way to speed up this query?
You can use a function-based index:
CREATE INDEX child_parent ON table1(DECODE(child_id,parent_id,1, 0));
And then use it in your select:
select *
from table1
where status = 'ACTV'
and DECODE(child_id,parent_id,1, 0) = 0
The only drawback of this solution is that it slows down insert and update operations a bit more than a regular index does.
Also, if the number of rows that could be returned is large, Oracle may still do a full table scan.
In a parent/child table, "child_id <> parent_id" is true for almost every row, so the query will always fetch about 99% of the data, and a full table scan is the better approach. An index is slower when you select a large percentage of the data.
If your application always requires "child_id <> parent_id", you can create a check constraint for it. Then you may not need the where condition "child_id <> parent_id" at all.
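The index-on-an-expression idea can be sketched outside Oracle as well. The example below is an illustration under assumptions, not the Oracle behaviour itself: it uses Python's built-in sqlite3 (SQLite 3.9 or later supports indexes on expressions), a CASE expression stands in for DECODE, and the sample rows are hypothetical. The two are not identical on NULLs, since DECODE treats two NULLs as equal while the CASE comparison does not. Whether the optimizer actually picks the index depends on the engine and on the data, so the plan is printed rather than assumed.

```python
import sqlite3

# Hypothetical rows; CASE stands in for Oracle's DECODE in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (child_id INTEGER, parent_id INTEGER, status TEXT);
INSERT INTO table1 VALUES
    (1, 1, 'ACTV'), (2, 1, 'ACTV'), (3, 3, 'ACTV'), (4, 2, 'INAC');
CREATE INDEX child_parent
    ON table1 (CASE WHEN child_id = parent_id THEN 1 ELSE 0 END);
""")

# The WHERE clause repeats the indexed expression verbatim so the
# optimizer has the chance to match it against the index.
query = """
SELECT child_id, parent_id
FROM table1
WHERE status = 'ACTV'
  AND CASE WHEN child_id = parent_id THEN 1 ELSE 0 END = 0
ORDER BY child_id
"""
mismatched = conn.execute(query).fetchall()
print(mismatched)

# Inspect the plan instead of assuming the index is used.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```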

Transforming hive IN subselect query combined with WHERE replacement

I know that one needs to replace an IN query with a left semi join (e.g. Hive doesn't support in, exists. How do I write the following query?), but I don't know how to combine it with a WHERE clause:
SELECT *
from foo
WHERE userId IN
(SELECT distinct(userId) FROM foo WHERE x=true ORDER BY RAND() LIMIT 100);
thanks.
EDIT: Changed query. Intention is to create a random sample of entries (statistics wise).
(Posting alternative approach for completeness.)
To sample a set of records from a table, you can use Hive's TABLESAMPLE syntax. For example, to select a random sample of 100 distinct userIds you would use:
SELECT userId
FROM (SELECT DISTINCT(userId) as userId FROM foo) f
TABLESAMPLE(100 ROWS);
The syntax allows you to specify your sample size in different ways. The following is also valid:
SELECT userId
FROM (SELECT DISTINCT(userId) as userId FROM foo) f
TABLESAMPLE(1 PERCENT);
For more details, check out the manual page for this topic.
Once you have your sample of userId's, you can use Manuel Aldana's earlier answer to select the corresponding records from your original table.
select id from foo
left semi join
(SELECT id_2 FROM bar WHERE x=true RAND() LIMIT 100) x
ON foo.id=x.id_2
Should be like this.
I just don't understand this part : x=true RAND()
Also, this doesn't handle nulls just like your query.

SSRS - T-SQL - Concatenate multiple rows

I have a T-SQL query that joins multiple tables. I am using it in SSRS as the Dataset query. I am only selecting two columns, ID and Names. I have three records with the same "ID" value but three different "Names" values. In SSRS, I am getting only the first "Names" value, and I need to concatenate all three values with the same ID and show them in one cell of a table.
How would I go about doing that?
I am using lookup to combine cube + sql
Pulling ID straight from a table but using Case statement for Names to define alias.
You can accomplish this in T-SQL either by using PIVOT to get them as separate columns, which you can then combine in the report cell, or by using one of these concatenation methods to get all the names in one column.
For example, you can do this:
SELECT SomeTableA.Id,
STUFF(
(SELECT ',' + SomeTableB.Names AS [text()]
FROM SomeTable SomeTableB
WHERE SomeTableB.Id = SomeTableA.Id
FOR XML PATH('')), 1, 1, '' )
AS ConcatenatedNames
FROM SomeTable SomeTableA
INNER JOIN AnotherTable
ON SomeTableA.Id = AnotherTable.SomeId
...
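To see the shape of the result that the concatenation produces (one row per Id with all names in a single cell), here is a small sketch under assumptions. It does not run T-SQL: Python's built-in sqlite3 is used, and SQLite's group_concat aggregate stands in for the STUFF ... FOR XML PATH('') idiom. The sample names are hypothetical, and the order of names inside each cell is not guaranteed by SQLite.

```python
import sqlite3

# Hypothetical rows; group_concat stands in for STUFF + FOR XML PATH.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SomeTable (Id INTEGER, Names TEXT);
INSERT INTO SomeTable VALUES (1, 'Ann'), (1, 'Bob'), (1, 'Cy'), (2, 'Dee');
""")

# Collapse all Names that share an Id into one comma-separated string.
query = """
SELECT Id, group_concat(Names, ',') AS ConcatenatedNames
FROM SomeTable
GROUP BY Id
ORDER BY Id
"""
concatenated = conn.execute(query).fetchall()
for row in concatenated:
    print(row)
```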
