How to fetch all possible patterns in hive - hadoop

I have the table below:
+----------+----+
|customerID|name|
+----------+----+
|         1| Ram|
+----------+----+
I want the output to contain every possible arrangement of the characters in the name column value:
+----------+----+
|customerID|name|
+----------+----+
|         1| Ram|
|         2| Arm|
|         3| Mar|
|         .| ...|
|         .| ...|
+----------+----+

Split string, explode array and use cross join with itself to find all possible combinations:
with s as (select col
from (select explode( split(lower('Ram'),'')) as col)s
where col <>''
)
select concat(upper(s1.col), s2.col, s3.col) as name,
row_number() over() as customerId
from s s1
cross join s s2
cross join s s3
where s1.col<>s2.col and s2.col<>s3.col;
Result:
OK
name customerid
Mam 1
Mar 2
Mrm 3
Mra 4
Ama 5
Amr 6
Arm 7
Ara 8
Rma 9
Rmr 10
Ram 11
Rar 12
Time taken: 185.638 seconds, Fetched: 12 row(s)
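Without the final WHERE s1.col<>s2.col and s2.col<>s3.col you will get all combinations, including Aaa, Arr, Rrr, etc. Note that this filter never compares s1.col with s3.col, which is why values such as Mam and Ara appear in the result; add "and s1.col<>s3.col" to keep only arrangements that use each letter once.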
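For comparison outside Hive, the same enumeration can be sketched in Python. This is only an illustration of the logic, not part of the Hive answer: the function name arrangements and its distinct flag are made up for the example. With distinct=True it behaves like a cross join that rejects any repeated letter; with distinct=False it behaves like the cross join with no WHERE filter at all.

```python
from itertools import permutations, product


def arrangements(word, distinct=True):
    """Return capitalized arrangements of the letters in `word`.

    distinct=True  -> every letter position is used at most once (permutations).
    distinct=False -> letters may repeat, like a cross join with no filter.
    """
    letters = list(word.lower())
    if distinct:
        combos = permutations(letters)
    else:
        combos = product(letters, repeat=len(letters))
    return ["".join(c).capitalize() for c in combos]


perms = arrangements("Ram")
print(perms)                                    # 6 arrangements of r, a, m
print(len(arrangements("Ram", distinct=False)))  # 27 = 3 * 3 * 3
```

Note that a word with repeated letters would yield duplicate strings here, just as the self cross join would.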

Related

Oracle 11g insert into select from a table with duplicate rows

I have one table that needs to be split into several other tables.
The main table is really just a transitional table.
I dump data from an Excel file into it (from 5k to 200k rows) and, using INSERT INTO ... SELECT, split it into the correct tables (five different tables).
However, the latest dataset that my client sent has records with duplicate values.
The primary key for my table is usually ENI. But even this value is duplicated, because the same company can be both a customer and a service provider, so it has two different records that use the same ENI.
What I have so far:
I found a script that uses MERGE and modified it to find rows with the same ENI and give them all the same main_id.
|Main_id| ENI | company_name| Type|
| 1 | 1864 | JOHN | C |
| 2 | 351485 | JOEL | C |
| 3 | 16546 | MICHEL | C |
| 2 | 351485 | JOEL J. | S |
| 1 | 1864 | JOHN E. E. | C |
Main_id: primary key that the main DB uses
ENI: unique company number
Type: 'C' - CUSTOMER, 'S' - SERVICE PROVIDER
In some cases the duplicates can have the same type, as with id 1.
There are several other columns...
What I need:
Insert any one of the rows per main_id that my other script already sorted out, and set a flag on the others to show that they were not inserted. I can't delete any data, because I'll need to send this information to the customer for validation.
Or maybe I simply can't do it this way and should go back to good old Excel.
Edit: in response to a question below, here is an example:
|Main_id| ENI | company_name| Type| RANK|
| 1 | 1864 | JOHN | C | 1 |
| 2 | 351485 | JOEL | C | 1 |
| 3 | 16546 | MICHEL | C | 1 |
| 2 | 351485 | JOEL J. | S | 2 |
| 1 | 1864 | JOHN E. E. | C | 2 |
RANK: for example, 1864 appears 2 times, so the first one found gets 1, the second gets 2, and so on. I tried using
RANK() OVER (PARTITION BY MAIN_ID ORDER BY ENI)
RANK() OVER (PARTITION BY company_name ORDER BY ENI)
Thanks to TEJASH I was able to come up with this solution:
MERGE INTO TABLEA S
USING (Select ROWID AS ID,
row_number() Over(partition by eni order by eni, type) as RANK_DUPLICATED
From TABLEA
) T
ON (S.ROWID = T.ID)
WHEN MATCHED THEN UPDATE SET S.RANK_DUPLICATED= T.RANK_DUPLICATED;
As far as I understand your problem, you just need to identify the duplicates based on 2 columns. You can do that with an analytic function as follows:
Select t.*,
row_number() Over(partition by main_id, eni order by company_name) as rnk
From your_table t
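To make the numbering concrete, here is a rough Python sketch of what row_number() over (partition by main_id, eni order by company_name) produces for the sample rows in the question. The helper add_row_number and the to_insert / flagged lists are illustrative names only; the idea is that rows with rnk = 1 would be inserted and the rest flagged for the customer to validate.

```python
from collections import defaultdict

rows = [
    {"main_id": 1, "eni": 1864, "company_name": "JOHN", "type": "C"},
    {"main_id": 2, "eni": 351485, "company_name": "JOEL", "type": "C"},
    {"main_id": 3, "eni": 16546, "company_name": "MICHEL", "type": "C"},
    {"main_id": 2, "eni": 351485, "company_name": "JOEL J.", "type": "S"},
    {"main_id": 1, "eni": 1864, "company_name": "JOHN E. E.", "type": "C"},
]


def add_row_number(rows, partition_keys, order_key):
    """Number rows 1, 2, 3... inside each partition, in order_key order."""
    counters = defaultdict(int)
    ranked = []
    for row in sorted(rows, key=lambda r: r[order_key]):
        part = tuple(row[k] for k in partition_keys)
        counters[part] += 1
        ranked.append({**row, "rnk": counters[part]})
    return ranked


ranked = add_row_number(rows, ("main_id", "eni"), "company_name")
to_insert = [r for r in ranked if r["rnk"] == 1]  # first row of each group
flagged = [r for r in ranked if r["rnk"] > 1]     # duplicates to review

for r in ranked:
    print(r["main_id"], r["eni"], r["company_name"], r["type"], r["rnk"])
```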

Lag and Lead to next month

TABLE: HIST
CUSTOMER MONTH PLAN
1 1 A
1 2 B
1 2 C
1 3 D
If I query:
select h.*, lead(plan) over (partition by customer order by month) np from HIST h
I get:
CUSTOMER MONTH PLAN np
1 1 A B
1 2 B C
1 2 C D
1 3 D (null)
But I wanted
CUSTOMER MONTH PLAN np
1 1 A B
1 2 B D
1 2 C D
1 3 D (null)
The reason is that the month after 2 is 3, which has plan D. I'm guessing partition by customer order by month doesn't work the way I thought.
Is there a way to achieve this in Oracle 12c?
One way to do it is to use a RANGE window with the MIN analytic function, like this:
select h.*,
min(plan) over
(partition by customer
order by month
range between 1 following and 1 following) np
from HIST h;
+----------+-------+------+----+
| CUSTOMER | MONTH | PLAN | NP |
+----------+-------+------+----+
| 1 | 1 | A | B |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | |
+----------+-------+------+----+
When you use a RANGE window, you are telling Oracle to build the window from the values of the column you are ordering by rather than from row positions.
So, e.g.,
ROWS BETWEEN 1 following and 1 following
... will make a window containing the next row.
RANGE BETWEEN 1 following and 1 following
... will make a window containing all the rows having the next value for month.
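The difference between the two window types can be sketched in plain Python using the HIST sample rows from the question. This only imitates the semantics; the function names next_row_plan and next_month_plan are invented for the illustration, and the tuples are (customer, month, plan).

```python
hist = [(1, 1, "A"), (1, 2, "B"), (1, 2, "C"), (1, 3, "D")]


def next_row_plan(rows):
    """Imitates ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING: the physically next row."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    out = []
    for i, (cust, month, plan) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        np_ = nxt[2] if nxt is not None and nxt[0] == cust else None
        out.append((cust, month, plan, np_))
    return out


def next_month_plan(rows):
    """Imitates MIN(plan) over RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING:
    the window holds every row of the same customer whose month is month + 1."""
    out = []
    for cust, month, plan in sorted(rows, key=lambda r: (r[0], r[1])):
        window = [p for c, m, p in rows if c == cust and m == month + 1]
        out.append((cust, month, plan, min(window) if window else None))
    return out


print(next_row_plan(hist))    # B's next value is C (the next row)
print(next_month_plan(hist))  # B's next value is D (the next month)
```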
UPDATE
If it is possible that some values for MONTH might be skipped for a given customer, you can use this variant (the results below come from sample data in which month 2 is skipped):
select h.*,
first_value(plan) over
(partition by customer
order by month
range between 1 following and unbounded following) np
from h
+----------+-------+------+----+
| CUSTOMER | MONTH | PLAN | NP |
+----------+-------+------+----+
| 1 | 1 | A | B |
| 1 | 3 | B | D |
| 1 | 3 | C | D |
| 1 | 4 | D | |
+----------+-------+------+----+
You can use LAG/LEAD twice. The first time to check for duplicate months and to set the value to NULL in those months and the second time use IGNORE NULLS to get the next monthly value.
It has the additional benefit that if months are skipped then it will still find the next value.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE HIST ( CUSTOMER, MONTH, PLAN ) AS
SELECT 1, 1, 'A' FROM DUAL UNION ALL
SELECT 1, 2, 'B' FROM DUAL UNION ALL
SELECT 1, 2, 'C' FROM DUAL UNION ALL
SELECT 1, 3, 'D' FROM DUAL UNION ALL
SELECT 2, 1, 'E' FROM DUAL UNION ALL
SELECT 2, 1, 'F' FROM DUAL UNION ALL
SELECT 2, 3, 'G' FROM DUAL UNION ALL
SELECT 2, 5, 'H' FROM DUAL;
Query 1:
SELECT CUSTOMER,
MONTH,
PLAN,
LEAD( np ) IGNORE NULLS OVER ( PARTITION BY CUSTOMER ORDER BY MONTH, PLAN, ROWNUM ) AS np
FROM (
SELECT h.*,
CASE MONTH
WHEN LAG( MONTH ) OVER ( PARTITION BY CUSTOMER ORDER BY MONTH, PLAN, ROWNUM )
THEN NULL
ELSE PLAN
END AS np
FROM hist h
)
Results:
| CUSTOMER | MONTH | PLAN | NP |
|----------|-------|------|--------|
| 1 | 1 | A | B |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
| 1 | 3 | D | (null) |
| 2 | 1 | E | G |
| 2 | 1 | F | G |
| 2 | 3 | G | H |
| 2 | 5 | H | (null) |
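The two passes can be mimicked in Python to see why this works. This is a sketch of the idea only, under the assumption that rows are ordered by customer, month, plan; the function name next_month_plans is made up for the example.

```python
from itertools import groupby

hist = [
    (1, 1, "A"), (1, 2, "B"), (1, 2, "C"), (1, 3, "D"),
    (2, 1, "E"), (2, 1, "F"), (2, 3, "G"), (2, 5, "H"),
]


def next_month_plans(rows):
    """Return (customer, month, plan, np) with np taken from the next month."""
    result = []
    ordered = sorted(rows)  # customer, then month, then plan
    for _cust, grp in groupby(ordered, key=lambda r: r[0]):
        group = list(grp)
        # Pass 1 (the LAG step): keep the plan only on the first row of
        # each month; rows repeating the previous month become None.
        marked = [
            plan if i == 0 or group[i - 1][1] != month else None
            for i, (_c, month, plan) in enumerate(group)
        ]
        # Pass 2 (LEAD ... IGNORE NULLS): first non-None marker after the row.
        for i, row in enumerate(group):
            np_ = next((m for m in marked[i + 1:] if m is not None), None)
            result.append(row + (np_,))
    return result


for row in next_month_plans(hist):
    print(row)
```

Because the second pass simply looks for the next non-null marker, gaps such as customer 2 jumping from month 3 to month 5 need no special handling.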
Just so that it is listed here as an option for Oracle 12c (onward), you can use an apply operator for this style of problem
select
h.customer, h.month, h.plan, oa.np
from hist h
outer apply (
select
h2.plan as np
from hist h2
where h2.customer = h.customer
and h2.month > h.month
order by month
fetch first 1 rows only
) oa
order by
h.customer, h.month, h.plan
I don't know of any public Oracle 12c fiddles, so here is an example in SQL Server: http://sqlfiddle.com/#!18/cd95e/1
| customer | month | plan | np |
|----------|-------|------|--------|
| 1 | 1 | A | C |
| 1 | 2 | B | D |
| 1 | 2 | C | D |
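| 1 | 3 | D | (null) |
Note that np for month 1 is C here rather than B: two rows tie on month 2, and ORDER BY month alone does not say which of them comes first.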

Oracle SQL - distributing into buckets

I'm searching for a smart Oracle SQL solution to distribute data into a number of buckets. The order of x is important. I know there are a lot of algorithms, but I'm pretty sure there must be a smart SQL (analytic function) solution, e.g. NTILE(3), but I can't work it out.
x|quantity
1|7
2|4
3|9
4|2
5|10
6|3
8|7
9|7
10|4
11|9
12|2
13|10
16|3
17|7
The result should look something like this:
x_from|x_to|sum(quantity)
1|4|22
...and so on
Thanks in advance
Tim
This example divides the table into 4 buckets (ntile( 4 )):
SELECT min( "x" ) as "From",
max( "x" ) as "To",
sum("quantity")
FROM (
SELECT t.*,
ntile( 4 ) over (order by "x" ) as group_no
FROM table1 t
)
GROUP BY group_no
ORDER BY 1;
| From | To | SUM("QUANTITY") |
|------|----|-----------------|
| 1 | 4 | 22 |
| 5 | 9 | 27 |
| 10 | 12 | 15 |
| 13 | 17 | 20 |
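To see how NTILE arrives at these buckets, here is a small Python sketch of the same distribution. It is an imitation of the behavior, not Oracle code: ntile and bucketize are hypothetical helper names. When the row count does not divide evenly, the earlier buckets each receive one extra row, which is why 14 rows split as 4, 4, 3, 3.

```python
def ntile(n_rows, buckets):
    """Return a 1-based bucket number per row; earlier buckets get the extras."""
    base, extra = divmod(n_rows, buckets)
    out = []
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        out.extend([b] * size)
    return out


data = [(1, 7), (2, 4), (3, 9), (4, 2), (5, 10), (6, 3), (8, 7),
        (9, 7), (10, 4), (11, 9), (12, 2), (13, 10), (16, 3), (17, 7)]


def bucketize(data, buckets):
    """Return (x_from, x_to, sum_quantity) per bucket, ordered by x."""
    ordered = sorted(data)
    groups = {}
    for (x, qty), b in zip(ordered, ntile(len(ordered), buckets)):
        lo, hi, total = groups.get(b, (x, x, 0))
        groups[b] = (min(lo, x), max(hi, x), total + qty)
    return [groups[b] for b in sorted(groups)]


for x_from, x_to, total in bucketize(data, 4):
    print(x_from, x_to, total)
```

NTILE balances the number of rows per bucket, not the sum of quantity, so the totals per bucket can differ noticeably.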

Remove duplicate values from a listagg in oracle

I have used LISTAGG to concatenate and list books along with the names of their supplementary books.
SELECT DISTINCT SUBSTR(LISTAGG(',-'||B1.BOOK_NO||','||B1.BOOK_NAME||','||A.AUTHOR_NAME||',-'||B2.BOOK_NO||','||B2.BOOK_NAME) WITHIN GROUP (ORDER BY B2.BOOK_NO),2)
FROM BOOK_LIST B1
INNER JOIN AUTHORS A ON A.AUTHOR_NO=B1.AUTHOR_NO
INNER JOIN SUPPLEMENTARY B2 ON B2.BOOK_NO = B1.BOOK_SUP_NO
WHERE B1.SEQ = 123;
But since there are several supplementary books, the main book name is repeated.
Is there a way to remove the duplicate main book name and number?
My output looks like this:
-99,Anders Carlson ,-109,John Stuart,-99,Anders Carlson ,-47,James Anderson
Here the value 99 is repeated; I want only one 99.
Desired Output:
-99,Anders Carlson ,-109,John Stuart,-47,James Anderson
DB data:
Book_list:
NO | MAIN_BOOK_NO | MAIN_BOOK_NAME | BOOK_SUP_NO | AUTHOR_NO
1 | 12 | xyz | 5 | 2
2 | 22 | abc | 7 | 4
Authors:
NO | AUTHOR_NO | AUTHOR_NAME
1 | 2 | Alex
2 | 3 | Leonard
3 | 4 | Benjamin
Supplementary:
NO | BOOK_NO | BOOK_NAME
1 | 5 | ABC
2 | 5 | XYZ
3 | 7 | LMN
4 | 7 | DEF
5 | 7 | NEW
The output should be like
NAME
12,xyz,Alex,-5,ABC,-5,XYZ
22,abc,Benjamin,-7,LMN,-7,DEF,-7,NEW
Similarly for the entire data in the table
If I understand you correctly, you need to append the list of supplementary books to the main book, so you're actually after something like:
SELECT B1.MAIN_BOOK_NO||','||B1.MAIN_BOOK_NAME||','||A.AUTHOR_NAME||',-'||
LISTAGG(B2.BOOK_NO||','||B2.BOOK_NAME, ',-') WITHIN GROUP (ORDER BY B2.BOOK_NO) AS NAME
FROM BOOK_LIST B1
INNER JOIN AUTHORS A ON A.AUTHOR_NO=B1.AUTHOR_NO
INNER JOIN SUPPLEMENTARY B2 ON B2.BOOK_NO = B1.BOOK_SUP_NO
WHERE B1.SEQ = 123
GROUP BY B1.MAIN_BOOK_NO, B1.MAIN_BOOK_NAME, A.AUTHOR_NAME;
See if this works:
select T1.MAIN_BOOK_NO, T1.MAIN_BOOK_NAME, LISTAGG(',-'||T1.BOOK_NO||','||T1.BOOK_NAME) WITHIN GROUP (order by T1.BOOK_NO)
from
(
SELECT B1.MAIN_BOOK_NO, B1.MAIN_BOOK_NAME, B2.BOOK_NO, B2.BOOK_NAME
FROM BOOK_LIST B1
INNER JOIN AUTHORS A ON A.AUTHOR_NO=B1.AUTHOR_NO
INNER JOIN SUPPLEMENTARY B2 ON B2.BOOK_NO = B1.BOOK_SUP_NO
WHERE B1.SEQ = 123
group by B1.MAIN_BOOK_NO, B1.MAIN_BOOK_NAME, B2.BOOK_NO, B2.BOOK_NAME
order by B2.BOOK_NO
) T1
group by T1.MAIN_BOOK_NO, T1.MAIN_BOOK_NAME;
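As a cross-check of the target format, the desired output from the question can be reproduced with a short Python sketch over the sample data. The structure is the same as in the SQL: the main book (with its author) appears once as a prefix, and only the supplementary books are aggregated. The variable and function names here are illustrative, and the SEQ filter is left out because the sample data does not show that column.

```python
# (main_book_no, main_book_name, book_sup_no, author_no)
book_list = [
    (12, "xyz", 5, 2),
    (22, "abc", 7, 4),
]
authors = {2: "Alex", 3: "Leonard", 4: "Benjamin"}  # author_no -> name
supplementary = [(5, "ABC"), (5, "XYZ"), (7, "LMN"), (7, "DEF"), (7, "NEW")]


def book_lines(book_list, authors, supplementary):
    """Build one line per main book: the prefix once, then each supplement."""
    lines = []
    for main_no, main_name, sup_no, author_no in book_list:
        head = f"{main_no},{main_name},{authors[author_no]}"
        sups = [f"{no},{name}" for no, name in supplementary if no == sup_no]
        # ',-' plays the role of the LISTAGG delimiter.
        lines.append(",-".join([head] + sups))
    return lines


for line in book_lines(book_list, authors, supplementary):
    print(line)
```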

Joining tables with same column names - ORACLE

I am using Oracle.
I am currently working on 2 tables that both have the same column names. Is there any way I can combine the 2 tables as they are?
Simple example to show what I mean:
TABLE 1:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
TABLE 2:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| c | 3 | y |
| d | 4 | z |
RESULT THAT I WANT:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
| c | 3 | y |
| d | 4 | z |
Any help would be greatly appreciated. Thank you in advance!
You can use the union set operator to get the result of two queries as a single result set:
select column1, column2, column3
from table1
union all
select column1, column2, column3
from table2
union on its own implicitly removes duplicates; union all preserves them.
The column names don't need to be the same; you just need the same number of columns, with the same data types, in the same order.
(This is not what is usually meant by a join, so the title of your question is a bit misleading; I'm basing this on the example data and output you showed.)
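The difference between the two set operators can be illustrated with a small Python sketch on the sample rows. The functions union_all and union are stand-ins written for this example; one caveat is that SQL makes no promise about row order without an ORDER BY, whereas these helpers simply keep the input order.

```python
table1 = [("a", 1, "w"), ("b", 2, "x")]
table2 = [("c", 3, "y"), ("d", 4, "z")]


def union_all(*tables):
    """Like UNION ALL: append every row, duplicates included."""
    return [row for table in tables for row in table]


def union(*tables):
    """Like UNION: append rows but drop exact duplicates."""
    seen = set()
    out = []
    for row in union_all(*tables):
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


print(union_all(table1, table2))       # the four rows from the question
print(union_all(table1, table1))       # duplicates are kept
print(union(table1, table1, table2))   # duplicates are removed
```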
