I need to retrieve all records from a table where FIRSTNAME and LASTNAME are the same across a group of records but the BIRTHDATE values differ by 15 years or more.
Suppose my table looks like this:
_______________________________________________________________________________
| PRIMARY_ID | UNIQUE_ID | FIRSTNAME | LASTNAME | SUFFIX | BIRTHDATE |
_______________________________________________________________________________
| 12345 | abcd | john | collin | Mr | 1975-10-01 00:00:00|
| 12345 | cdef | john | collin | Mr | 1960-10-01 00:00:00|
| 12345 | efgh | john | collin | Mr | 1975-10-01 00:00:00|
| 12345 | ghij | john | collin | Mr | 1960-10-01 00:00:00|
| 12345 | aaaa | john | collin | Mr | 1975-10-01 00:00:00|
| 12345 | bdfs | john | collin | Mr | 1975-10-01 00:00:00|
| 12345 | asdf | john | collin | Mr | null |
| 12345 | dfgh | john | collin | Mr | null |
| 23456 | ghij | jeremy | lynch | Mr | 1982-10-15 00:00:00|
| 23456 | aaaa | jacob | lynch | Mr | 1945-10-12 00:00:00|
| 23456 | bdfs | jeremy | lynch | Mr | 1945-10-12 00:00:00|
| 23456 | asdf | jacob | lynch | Mr | null |
| 23456 | dfgh | jeremy | lynch | Mr | null |
_______________________________________________________________________________
In this table, for PRIMARY_ID 12345, the FIRSTNAME and LASTNAME are the same on every record, but the BIRTHDATE difference between some of the UNIQUE_IDs is 15 years, so this PRIMARY_ID needs to be pulled out. For PRIMARY_ID 23456, on the other hand, the FIRSTNAME is not the same for all UNIQUE_ID records, so it must not be pulled out.
The table might contain NULL values for BIRTHDATE, which should be ignored.
This is what I have tried so far:
SELECT
/*+ PARALLEL(16) */
PRIMARY_ID,
UNIQUE_ID,
FIRSTNAME,
LASTNAME,
SUFFIX,
BIRTHDATE,
RANK() OVER ( ORDER BY FIRSTNAME, LASTNAME, SUFFIX, BIRTHDATE) "GROUP"
FROM TABLE;
This query forms separate groups distinguished by FIRSTNAME, LASTNAME and BIRTHDATE. I do not know how to proceed from here.
Can someone please help?
NOTE: The BIRTHDATE field is stored as a VARCHAR and I use Oracle 12c.
As I understand it, the goal is to return the distinct set of primary_id values for which alphabetically adjacent unique_id records that share the same firstname and lastname are separated by 15+ years. I also assume that a NULL birthdate should interrupt the comparison and be considered a non-match (otherwise, primary_id 23456 would also match here for the pseudo-adjacent bdfs + ghij).
There are other ways to do this, but one way available in 12c is to use pattern matching. An example is below. The example simply uses a difference of 5478 days to represent 15 years, but one could refine that if greater precision is needed for leap days etc.
SELECT DISTINCT PRIMARY_ID
FROM THE_TABLE
MATCH_RECOGNIZE (
PARTITION BY PRIMARY_ID
ORDER BY UNIQUE_ID
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN(FIFTEEN_DIFF)
DEFINE FIFTEEN_DIFF AS
(FIFTEEN_DIFF.FIRSTNAME = PREV(FIFTEEN_DIFF.FIRSTNAME)
AND FIFTEEN_DIFF.LASTNAME = PREV(FIFTEEN_DIFF.LASTNAME)
AND (ABS(EXTRACT( DAY FROM (TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE,'YYYY-MM-DD HH24:MI:SS') - PREV(TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE,'YYYY-MM-DD HH24:MI:SS'))))) >= 5478)));
Result:
PRIMARY_ID
12345
1 row selected.
The above query does the following:
It PARTITIONs the data to look at each PRIMARY_ID group individually,
then ORDERs by UNIQUE_ID, so only alphabetically adjacent records are compared.
Each record is then compared to the previous one; if they share FIRSTNAME and LASTNAME and their BIRTHDATEs differ by 15+ years, they count as a MATCH, and one record is returned to indicate this.
After a match is found, it skips to the next row and resumes comparing.
Since only the distinct matches are desired, a DISTINCT is included in the select statement.
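The 5478-day threshold can be checked with plain date arithmetic. The following is a minimal Python sketch, not part of the Oracle solution; the first pair of dates comes from the sample table, the second pair is made up for illustration. It shows that 15 calendar years can span either 5478 or 5479 days depending on how many leap days fall in the range, which is why a fixed day count is only an approximation.

```python
from datetime import date

# Two BIRTHDATE values from the sample table for PRIMARY_ID 12345.
older = date(1960, 10, 1)
newer = date(1975, 10, 1)

# 15 * 365 = 5475 days, plus the leap days of 1964, 1968 and 1972.
gap_days = (newer - older).days
print(gap_days)  # 5478

# Illustrative pair (not from the table): this 15-year span also contains
# 29 February 1960, so it is one day longer.
other_gap = (date(1975, 1, 1) - date(1960, 1, 1)).days
print(other_gap)  # 5479
```

A consequence is that two dates one day short of 15 years apart (for example 1960-01-01 and 1974-12-31) are also 5478 days apart and would pass the >= 5478 test.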
EDIT:
In response to follow-up questions, adding two additional examples.
Alternative 1: Pre-Filter NULL
This will bring different UNIQUE_ID into proximity, giving different matches.
SELECT DISTINCT PRIMARY_ID
FROM (SELECT PRIMARY_ID, UNIQUE_ID, FIRSTNAME, LASTNAME, SUFFIX, BIRTHDATE
FROM THE_TABLE
WHERE BIRTHDATE
IS NOT NULL)
MATCH_RECOGNIZE (
PARTITION BY PRIMARY_ID
ORDER BY UNIQUE_ID
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (FIFTEEN_DIFF)
DEFINE FIFTEEN_DIFF AS
(FIFTEEN_DIFF.FIRSTNAME = PREV(FIFTEEN_DIFF.FIRSTNAME)
AND FIFTEEN_DIFF.LASTNAME = PREV(FIFTEEN_DIFF.LASTNAME)
AND (ABS(EXTRACT(DAY FROM (TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE , 'YYYY-MM-DD HH24:MI:SS') -
PREV(TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE , 'YYYY-MM-DD HH24:MI:SS'))))) >= 5478)));
Result (this now includes PRIMARY_ID 23456, as removing NULL brings two UNIQUE_IDs that are 15+ years apart next to each other):
PRIMARY_ID
12345
23456
2 rows selected.
Alternative 2: Count NULL as a match
SELECT DISTINCT PRIMARY_ID
FROM THE_TABLE
MATCH_RECOGNIZE (
PARTITION BY PRIMARY_ID
ORDER BY UNIQUE_ID
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (FIFTEEN_DIFF)
DEFINE FIFTEEN_DIFF AS
(FIFTEEN_DIFF.FIRSTNAME = PREV(FIFTEEN_DIFF.FIRSTNAME)
AND FIFTEEN_DIFF.LASTNAME = PREV(FIFTEEN_DIFF.LASTNAME)
AND ((ABS(EXTRACT(DAY FROM (TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE , 'YYYY-MM-DD HH24:MI:SS') -
PREV(TO_TIMESTAMP(FIFTEEN_DIFF.BIRTHDATE , 'YYYY-MM-DD HH24:MI:SS'))))) >= 5478)
OR (LEAST(FIFTEEN_DIFF.BIRTHDATE,PREV(FIFTEEN_DIFF.BIRTHDATE)) IS NULL
AND COALESCE(FIFTEEN_DIFF.BIRTHDATE,PREV(FIFTEEN_DIFF.BIRTHDATE)) IS NOT NULL))));
Result (this also returns both PRIMARY_IDs, as NULL is now counted as a match):
PRIMARY_ID
12345
23456
2 rows selected.
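To make the difference between the three NULL-handling variants easier to see, here is a rough Python sketch of the same adjacent-row comparison. It is only an illustration of the logic, not a replacement for MATCH_RECOGNIZE; the function name matching_ids and the mode names are made up for this example, and the rows are the sample data from the question (SUFFIX omitted).

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
THRESHOLD_DAYS = 5478  # same approximation of 15 years as in the queries

# (PRIMARY_ID, UNIQUE_ID, FIRSTNAME, LASTNAME, BIRTHDATE) from the question.
ROWS = [
    (12345, "abcd", "john", "collin", "1975-10-01 00:00:00"),
    (12345, "cdef", "john", "collin", "1960-10-01 00:00:00"),
    (12345, "efgh", "john", "collin", "1975-10-01 00:00:00"),
    (12345, "ghij", "john", "collin", "1960-10-01 00:00:00"),
    (12345, "aaaa", "john", "collin", "1975-10-01 00:00:00"),
    (12345, "bdfs", "john", "collin", "1975-10-01 00:00:00"),
    (12345, "asdf", "john", "collin", None),
    (12345, "dfgh", "john", "collin", None),
    (23456, "ghij", "jeremy", "lynch", "1982-10-15 00:00:00"),
    (23456, "aaaa", "jacob", "lynch", "1945-10-12 00:00:00"),
    (23456, "bdfs", "jeremy", "lynch", "1945-10-12 00:00:00"),
    (23456, "asdf", "jacob", "lynch", None),
    (23456, "dfgh", "jeremy", "lynch", None),
]


def matching_ids(rows, mode="strict"):
    """Return the sorted PRIMARY_IDs that have a qualifying adjacent pair.

    mode="strict":       a NULL birthdate never matches (first query)
    mode="prefilter":    NULL rows are dropped before comparing (Alternative 1)
    mode="null_matches": exactly one NULL in a pair is a match (Alternative 2)
    """
    if mode == "prefilter":
        rows = [r for r in rows if r[4] is not None]
    # Equivalent of PARTITION BY PRIMARY_ID ORDER BY UNIQUE_ID.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    found = set()
    for prev, cur in zip(ordered, ordered[1:]):
        # Only compare rows of the same partition with the same names.
        if prev[0] != cur[0] or prev[2:4] != cur[2:4]:
            continue
        if prev[4] is None or cur[4] is None:
            # Alternative 2: exactly one of the two birthdates is missing.
            if mode == "null_matches" and (prev[4] is None) != (cur[4] is None):
                found.add(cur[0])
            continue
        gap = datetime.strptime(cur[4], FMT) - datetime.strptime(prev[4], FMT)
        if abs(gap).days >= THRESHOLD_DAYS:
            found.add(cur[0])
    return sorted(found)


for mode in ("strict", "prefilter", "null_matches"):
    print(mode, matching_ids(ROWS, mode))
```

With the sample data, the strict mode yields only 12345, while the other two modes yield both 12345 and 23456, in line with the results shown for the three queries.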
Related
There is a table_verif table where I need to insert data, here is an example of fields and records for this table:
call_id | datetime | process_name | param_1 | param_2 | param_3 | param_4 |
------------------------------------------------------------------------------------------
1234567 | 29.12.2022| verification process | greeting| presentation | result | waiting |
---------------------------------------------------------------------------------------
And there is my table my_table with entries:
call_id | datetime | process_name | param_1 |
--------------------------------------------------------------
1234567 | 29.12.2022 | complaints | Establishing contact |
--------------------------------------------------------------
1234567 | 29.12.2022 | complaints | Identification |
--------------------------------------------------------------
1234567 | 29.12.2022 | complaints | Num specification |
--------------------------------------------------------------
1234567 | 29.12.2022 | complaints | Data transfer |
I need to translate these four records into a table_verif format:
call_id| datetime |process_name| param_1 | param_2 | param_3 | param_4
--------------------------------------------------------------------------------------------
1234567|29.12.2022|complaints |Establishing contact|Identification|Num specification|Data transfer
I cut the example short at param_4, but in practice the columns run from param_1 to param_9.
You can pivot a result set, but you need to have a value to pivot on and to provide the ordering for the result. For example, you could start with a query like:
select call_id, datetime, process_name, param_1,
-- establish some ranking - use your own criteria
row_number() over (partition by call_id, process_name order by datetime, param_1) as rn
from my_table
which with the sample data plus reverse-engineered data from your example table_verif row might produce:
CALL_ID | DATETIME  | PROCESS_NAME         | PARAM_1              | RN
------------------------------------------------------------------------
1234567 | 29-DEC-22 | complaints           | Data transfer        | 1
1234567 | 29-DEC-22 | complaints           | Establishing contact | 2
1234567 | 29-DEC-22 | complaints           | Identification       | 3
1234567 | 29-DEC-22 | complaints           | Num specification    | 4
1234567 | 29-DEC-22 | verification process | greeting             | 1
1234567 | 29-DEC-22 | verification process | presentation         | 2
1234567 | 29-DEC-22 | verification process | result               | 3
1234567 | 29-DEC-22 | verification process | waiting              | 4
You may have different criteria to decide the parameter order, perhaps from a look-up table you haven't shown. However you get it, once you have a generated value like rn you can use that for the pivot:
select *
from (
select call_id, datetime, process_name, param_1,
-- establish some ranking - use your own criteria
row_number() over (partition by call_id, process_name order by datetime, param_1) as rn
from my_table
)
pivot (
max(param_1)
for (rn) in (1 as param_1, 2 as param_2, 3 as param_3, 4 as param_4, 5 as param_5)
)
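CALL_ID | DATETIME  | PROCESS_NAME         | PARAM_1       | PARAM_2              | PARAM_3        | PARAM_4           | PARAM_5
------------------------------------------------------------------------------------------------------------------------------
1234567 | 29-DEC-22 | complaints           | Data transfer | Establishing contact | Identification | Num specification | null
1234567 | 29-DEC-22 | verification process | greeting      | presentation         | result         | waiting           | null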
You can then use that as an insert ... select ... statement to populate table_verif, or if this is permanent data then you can turn the query into a view instead of duplicating it.
You could also pivot by listing all the possible param_1 values in the in() clause instead of generating a ranking value, but that would give you a much wider result, and doesn't seem to be what you want here.
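For readers who want to see the rank-then-pivot idea outside the database, here is a small Python sketch of the same two steps. It is an illustration only: the pivot function and MAX_PARAMS are names made up for this example, the rows are the my_table sample from the question, and it assumes one datetime per call and process, as in the sample.

```python
from itertools import groupby
from operator import itemgetter

# (call_id, datetime, process_name, param_1) rows from my_table in the question.
rows = [
    (1234567, "29.12.2022", "complaints", "Establishing contact"),
    (1234567, "29.12.2022", "complaints", "Identification"),
    (1234567, "29.12.2022", "complaints", "Num specification"),
    (1234567, "29.12.2022", "complaints", "Data transfer"),
]

MAX_PARAMS = 5  # mirrors the five columns listed in the in() clause


def pivot(rows, max_params=MAX_PARAMS):
    """Turn one row per parameter into one row per (call, datetime, process)."""
    group_key = itemgetter(0, 1, 2)
    result = []
    # Sorting whole tuples orders by the grouping columns first and by
    # param_1 last, so the position inside each group plays the role of rn.
    for key, group in groupby(sorted(rows), key=group_key):
        params = [row[3] for row in group][:max_params]
        # Pad with None (NULL) so every output row has the same width.
        params += [None] * (max_params - len(params))
        result.append(key + tuple(params))
    return result


pivoted = pivot(rows)
print(pivoted)
```

As in the SQL version, the alphabetical ordering of param_1 is only a stand-in; any other ordering criterion would replace the sort key.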
I currently have a table:
id | info | value | date
1 | desc | description | 19-01-1990 10:01:23
2 | lname | Doe | 19-11-1990 10:01:23
1 | fname | John | 19-08-1990 10:01:23
1 | dob | dob | 19-05-1990 10:01:23
3 | fname | Jo | 19-01-1990 10:01:23
I would like to query and grab data and do joins with multiple tables later on, so I need it to be:
id | desc | lname | fname | dob | desc | date | ... |
1 | description | Doe | John | dob | description | 19-01-1990 10:01:23 | ... |
2 | ......... | ..... | Jo | | | ... | ... |
I have tried crosstab but it does not seem to work. Any help is appreciated
Your current table is a typical denormalized key value store. You may generate the normalized output you want by aggregating by id and then using max CASE expressions:
SELECT
id,
MAX(CASE WHEN info = 'desc' THEN value END) AS desc,
MAX(CASE WHEN info = 'lname' THEN value END) AS lname,
MAX(CASE WHEN info = 'fname' THEN value END) AS fname,
MAX(CASE WHEN info = 'dob' THEN value END) AS dob
FROM yourTable
GROUP BY
id
ORDER BY
id;
Note that I don't have any column for the date, as you did not give logic for which date value should be retained for each id.
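As a quick check of the conditional-aggregation idea, the sketch below runs the same query against the sample rows using an in-memory SQLite database from Python. SQLite is only an assumption made so that the example is self-contained; the question does not say which database is in use. The date column is left out, as in the query above, and the desc alias is quoted because DESC is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, info TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "desc", "description"),
        (2, "lname", "Doe"),
        (1, "fname", "John"),
        (1, "dob", "dob"),
        (3, "fname", "Jo"),
    ],
)

# One MAX(CASE ...) per attribute; rows without that attribute yield NULL.
query = """
SELECT
    id,
    MAX(CASE WHEN info = 'desc'  THEN value END) AS "desc",
    MAX(CASE WHEN info = 'lname' THEN value END) AS lname,
    MAX(CASE WHEN info = 'fname' THEN value END) AS fname,
    MAX(CASE WHEN info = 'dob'   THEN value END) AS dob
FROM yourTable
GROUP BY id
ORDER BY id
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Each id ends up on a single row, with None (NULL) in the columns for which that id has no key/value entry.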
As for the Spring part of your question, you would probably have to use a native query to execute the above.
I have one table that needs to be split into several other tables.
The main table acts only as a transitional (staging) table.
I dump data from an Excel file into it (from 5k to 200k rows) and, using INSERT INTO ... SELECT, split it into the correct tables (five different tables).
However, the latest dataset that my client sent has records with duplicate values.
The primary key for my table is usually ENI. But even this value is duplicated, because the same company can be both a customer and a service provider, so it has two different records that use the same ENI.
What I have so far:
I found a script that uses MERGE and modified it to find rows with the same ENI and set the same main_id on all of them.
|Main_id| ENI | company_name| Type
| 1 | 1864 | JOHN | C
| 2 | 351485 | JOEL | C
| 3 | 16546 | MICHEL | C
| 2 | 351485 | JOEL J. | S
| 1 | 1864 | JOHN E. E. | C
Main_id: primary key that the main DB uses
ENI: unique company number
Type: 'C' - CUSTOMER, 'S' - SERVICE PROVIDER
In some cases the duplicates can have the same type, as with id 1.
There are several other columns...
What I need:
Insert any one of the rows per main_id that my other script has already sorted, and set a flag on the others to show that they were not inserted. I can't delete any data, because I'll need to send this information to the customer for validation.
Or maybe I simply can't do it this way and should go back to good old Excel.
Edit: in response to a question below, this is an example:
|Main_id| ENI | company_name| Type| RANK|
| 1 | 1864 | JOHN | C | 1 |
| 2 | 351485 | JOEL | C | 1 |
| 3 | 16546 | MICHEL | C | 1 |
| 2 | 351485 | JOEL J. | S | 2 |
| 1 | 1864 | JOHN E. E. | C | 2 |
RANK - for example, 1864 appears 2 times;
the first one found gets 1, the second gets 2, and so on. I tried using
RANK() OVER (PARTITION BY MAIN_ID ORDER BY ENI)
RANK() OVER (PARTITION BY company_name ORDER BY ENI)
Thanks to TEJASH I was able to come up with this solution:
MERGE INTO TABLEA S
USING (Select ROWID AS ID,
row_number() Over(partition by eni order by eni, type) as RANK_DUPLICATED
From TABLEA
) T
ON (S.ROWID = T.ID)
WHEN MATCHED THEN UPDATE SET S.RANK_DUPLICATED= T.RANK_DUPLICATED;
As far as I understand your problem, you just need to identify the duplicates based on 2 columns. You can achieve this using an analytic function as follows:
Select t.*,
row_number() Over(partition by main_id, eni order by company_name) as rnk
From your_table t
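To illustrate what the row number gives you, here is a minimal Python sketch that numbers the sample rows per ENI and collects the rows that would be flagged rather than inserted. It follows the partitioning used in the MERGE statement (by ENI, ordered by type); rank_duplicates is a made-up name, and breaking ties by input order is an assumption, since ROW_NUMBER() does not guarantee an order among ties.

```python
from collections import defaultdict

# (main_id, eni, company_name, type) rows from the question.
rows = [
    (1, 1864, "JOHN", "C"),
    (2, 351485, "JOEL", "C"),
    (3, 16546, "MICHEL", "C"),
    (2, 351485, "JOEL J.", "S"),
    (1, 1864, "JOHN E. E.", "C"),
]


def rank_duplicates(rows):
    """Number the rows that share an ENI: 1 for the first, 2 for the next, ..."""
    by_eni = defaultdict(list)
    for index, row in enumerate(rows):
        by_eni[row[1]].append(index)
    ranks = [0] * len(rows)
    for indexes in by_eni.values():
        # Order by type; list.sort() is stable, so ties keep their input order.
        indexes.sort(key=lambda i: rows[i][3])
        for rank, i in enumerate(indexes, start=1):
            ranks[i] = rank
    return ranks


ranks = rank_duplicates(rows)
# Rows ranked 1 would be inserted; higher ranks are flagged for review.
flagged = [row for row, rank in zip(rows, ranks) if rank > 1]
print(ranks)
print(flagged)
```

The resulting ranks correspond to the RANK column in the example table from the question.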
I have the below tables:
person table
personid | firstname | lastname
------------------------------
P1 | Jim | John
P2 | Kori | Test
P3 | Adam | Blair
P4 | Kim | sand
P5 | julia | Dan
order table
orderno |ordername | price | personid
---------------------------------
1 |shoes | 100 | P1
2 |books | 50 |P2
3 | pen | 10 |P3
4 |laptop | 80 |P4
5 |notebook | 40 |P5
Email address table
clientid emailid
---------------------
P1 | jom.John#test.com
P3 | adam.blair#test.com
P4 | kim.sand#test.com
I have to get the top 3 persons ordered by 'price', then remove the persons who don't have an email address, and the final list should be ordered by firstname (descending).
My results should look like:
Firstname lastname price
------------------------------
Kim | sand | 80
Jim | john | 100
I tried MINUS but could not figure out how to order first by price and then order the final results by firstname. Oracle does not like the combination of multiple ORDER BY clauses and ROWNUM together.
Any pointers would be helpful.
Try this:-
SELECT P.FIRSTNAME,P.LASTNAME,O.PRICE FROM PERSON_TABLE P
INNER JOIN ORDER_TABLE O
ON P.PERSONID = O.PERSONID
AND P.PERSONID NOT IN
(SELECT E.CLIENTID FROM EMAIL_ADDRESS_TABLE E)
ORDER BY P.FIRSTNAME desc;
output:
Firstname lastname price
------------------------
kim | sand | 80
julia | dan | 40
My result does not look like yours, but it may help you. I wrote the script above according to the requirement you described.
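For comparison, one possible way to get the result shown in the question is to take the top 3 orders by price in an inline view first, and only then filter on the email table and sort by firstname. The sketch below tries that idea on the sample data in an in-memory SQLite database from Python. This is an assumption-laden illustration, not a tested Oracle query: SQLite's LIMIT 3 would be written as FETCH FIRST 3 ROWS ONLY in Oracle 12c, and the table names are the ones used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_table (personid TEXT, firstname TEXT, lastname TEXT);
CREATE TABLE order_table (orderno INTEGER, ordername TEXT, price INTEGER, personid TEXT);
CREATE TABLE email_address_table (clientid TEXT, emailid TEXT);
INSERT INTO person_table VALUES
  ('P1', 'Jim', 'John'), ('P2', 'Kori', 'Test'), ('P3', 'Adam', 'Blair'),
  ('P4', 'Kim', 'sand'), ('P5', 'julia', 'Dan');
INSERT INTO order_table VALUES
  (1, 'shoes', 100, 'P1'), (2, 'books', 50, 'P2'), (3, 'pen', 10, 'P3'),
  (4, 'laptop', 80, 'P4'), (5, 'notebook', 40, 'P5');
INSERT INTO email_address_table VALUES
  ('P1', 'jom.John#test.com'), ('P3', 'adam.blair#test.com'),
  ('P4', 'kim.sand#test.com');
""")

# Step 1 (inline view t): top 3 orders by price.
# Step 2: keep only persons that have an email address.
# Step 3: order the remaining rows by firstname, descending.
query = """
SELECT p.firstname, p.lastname, t.price
FROM (SELECT personid, price FROM order_table ORDER BY price DESC LIMIT 3) t
JOIN person_table p ON p.personid = t.personid
WHERE EXISTS (SELECT 1 FROM email_address_table e WHERE e.clientid = p.personid)
ORDER BY p.firstname DESC
"""
top_with_email = conn.execute(query).fetchall()
print(top_with_email)  # [('Kim', 'sand', 80), ('Jim', 'John', 100)]
```

The top 3 by price are P1, P4 and P2; P2 has no email address and drops out, which leaves Kim and Jim in descending firstname order.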
Dynamically Identify Columns in External Tables
We have a process in which we upload employee data from multiple legislations (e.g. US, Philippines, Latin America) via SQL*Loader.
This happens at least once a week. In the current process, a control file is created every time employee information is loaded,
and the data is then loaded into staging tables using SQL*Loader.
I was hoping to simplify the process by creating an external table and running a concurrent request to put the data into our staging tables.
There are two stumbling blocks I'm encountering:
There are some columns which are not being used by some legislations.
Example: US uses the column "Veteran_Information", while the Philippines and Latin America don't.
Philippines uses "SSS_Number" while US and Latin America Don't.
Latin America uses a "Medical_Insurance" Column while US and Philippines don't.
Something like below:
US: LEGISLATION, EMPLOYEE_NUMBER, DATE_OF_BIRTH, VETERAN_INFORMATION
PHL: LEGISLATION, EMPLOYEE_NUMBER, DATE_OF_BIRTH, SSS_NUMBER
LAT: LEGISLATION, EMPLOYEE_NUMBER, DATE_OF_BIRTH, MEDICAL_INSURANCE
Business Users don't use a Standard CSV Template/Format.
Since the File is being sent by Non-IT Business Users, they don't usually follow a prescribed format. (Training/User issue, probably).
they often don't follow the correct order of columns
they often don't follow the correct number of columns
they often don't follow the correct names of columns
Something like below:
US: LEGISLATION, EMPLOYEE_ID, VETERAN_INFORMATION, DATE_OF_BIRTH, EMAIL_ADD
PHL: EMP_NUM, LEGISLATION, DOB, SSS_NUMBER, EMAIL_ADDRESS
LAT: LEGISLATION, PS_ID, BIRTH_DATE, EMAIL, MEDICAL_INSURANCE
Is there a way for External Tables to identify the correct order and naming of columns even if they're not in the correct order/naming convention in the File?
Taking the Column Data from Problem 2:
US: LEGISLATION | EMPLOYEE_ID | VETERAN_INFORMATION | DATE_OF_BIRTH | EMAIL_ADD
US | 111 | No | 1967 | vet#gmail.com
PHL: EMP_NUM | LEGISLATION | DOB | SSS_NUMBER | EMAIL_ADDRESS
222 | PHL | 1898 | 456789 | pinoy#gmail.com
LAT: LEGISLATION | PS_ID | BIRTH_DATE | EMAIL | MEDICAL_INSURANCE
HON | 333 | 1956 | hon#gmail.com | Yes
I would like it to be like this when it appears in the External Table:
LEGISLATION | EMPLOYEE_NUMBER | DATE_OF_BIRTH | VETERAN_INFORMATION | SSS_NUMBER | MEDICAL_INSURANCE | EMAIL_ADDRESS
US | 111 | 1967 | Y | (NULL) | (NULL) | vet#gmail.com
PHL | 222 | 1898 | (NULL) | 456789 | (NULL) | pinoy#gmail.com
HON | 333 | 1956 | (NULL) | (NULL) | Yes | hon#gmail.com
Is there a way for External Tables to do something like above?
Thanks in advance!
The simplest would be:
Use three distinct load scripts, one for each type of input (US, PHL, HON). Each script discards the other two record types, places the columns in the right position (possibly doing some transformation, like 'No' -> 'N'), and inserts NULL for columns that are not present for that record type.
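The per-record mapping that each script would have to perform can be sketched as follows. This is plain Python, not SQL*Loader or external-table syntax, and it is meant only to show the logic: the ALIASES dictionary and the normalize function are hypothetical, built from the header variants listed in the question.

```python
# Hypothetical alias table: header names seen in the sample files, mapped to
# the canonical staging column names.
ALIASES = {
    "LEGISLATION": "LEGISLATION",
    "EMPLOYEE_ID": "EMPLOYEE_NUMBER",
    "EMP_NUM": "EMPLOYEE_NUMBER",
    "PS_ID": "EMPLOYEE_NUMBER",
    "DATE_OF_BIRTH": "DATE_OF_BIRTH",
    "DOB": "DATE_OF_BIRTH",
    "BIRTH_DATE": "DATE_OF_BIRTH",
    "VETERAN_INFORMATION": "VETERAN_INFORMATION",
    "SSS_NUMBER": "SSS_NUMBER",
    "MEDICAL_INSURANCE": "MEDICAL_INSURANCE",
    "EMAIL_ADD": "EMAIL_ADDRESS",
    "EMAIL_ADDRESS": "EMAIL_ADDRESS",
    "EMAIL": "EMAIL_ADDRESS",
}

# Column order of the desired output.
TARGET = [
    "LEGISLATION", "EMPLOYEE_NUMBER", "DATE_OF_BIRTH", "VETERAN_INFORMATION",
    "SSS_NUMBER", "MEDICAL_INSURANCE", "EMAIL_ADDRESS",
]


def normalize(header, values):
    """Reorder one input record into the TARGET layout."""
    record = dict.fromkeys(TARGET)  # columns absent from the file stay None
    for name, value in zip(header, values):
        record[ALIASES[name.strip().upper()]] = value.strip()
    # Example transformation mentioned in the answer: 'No' -> 'N'.
    if record["VETERAN_INFORMATION"] == "No":
        record["VETERAN_INFORMATION"] = "N"
    return [record[column] for column in TARGET]


phl = normalize(
    ["EMP_NUM", "LEGISLATION", "DOB", "SSS_NUMBER", "EMAIL_ADDRESS"],
    ["222", "PHL", "1898", "456789", "pinoy#gmail.com"],
)
print(phl)
```

An unknown header name raises a KeyError here, which is deliberate in this sketch: a file with an unexpected column should be rejected rather than loaded into the wrong place.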