I'm merging several tables in Oracle 10g, into a consolidated table, like this:
table_A (will have all the records)
table_b -part of the data to be merged
table_c -part of the data to be merged
table_d -part of the data to be merged
now, i run it with error logging like this
MERGE INTO TABLE_A A USING (SELECT * FROM TABLE_B) B
ON
(
A.NOMBRE=B.NOMBRE AND
A.PRIMER_APELLIDO=B.PRIMER_APELLIDO AND
A.SEGUNDO_APELLIDO=B.SEGUNDO_APELLIDO AND
TO_CHAR(A.FECHA_NACIMIENTO,'DD/MM/YYYY')=TO_CHAR(B.FECHA_NACIMIENTO,'DD/MM/YYYY') AND
A.SEXO=B.SEXO
)
WHEN MATCHED THEN
UPDATE SET DGP2011='1'
WHEN NOT MATCHED THEN
INSERT
(
A.FOLIO_RELACIONADO,
A.CVE_PROGRAMA,
A.FECHA_ALTA,
A.PRIMER_APELLIDO,
A.SEGUNDO_APELLIDO,
A.NOMBRE,
A.FECHA_NACIMIENTO,
A.SEXO,
A.CVE_NACIONALIDAD,
A.CVE_ENTIDAD_NACIMIENTO,
A.CVE_GRADO_ESCOLAR,
A.CVE_GRADO_ESTUDIOS,
A.CURP,
A.CALLE,
A.NUM_EXT,
A.NUM_INT,
A.CODIGO_POSTAL,
A.ENTRE_CALLE,
A.Y_CALLE,
A.OTRA_REFERENCIA,
A.TELEFONO,
A.COLONIA,
A.LOCALIDAD,
A.CVE_MUNICIPIO,
A.CVE_ENTIDAD_FEDERATIVA,
A.CVE_CCT,
A.PRIMER_APELLIDO_C,
A.SEGUNDO_APELLIDO_C,
A.NOMBRE_C,
A.FECHA_NACIMIENTO_C,
A.SEXO_C,
A.CVE_ESTADO_CIVIL_C,
A.CVE_GRADO_ESTUDIOS_C,
A.CVE_PARENTESCO_C,
A.CURP_C,
A.CVE_TIPO_ID_OFCL_C,
A.ID_DOCTO_OFL_C,
A.CVE_NACIONALIDAD_C,
A.CVE_ENTIDAD_NACIMIENTO_C,
A.CALLE_C,
A.NUM_EXT_C,
A.NUM_INT_C,
A.CODIGO_POSTAL_C,
A.ENTRE_CALLE_C,
A.Y_CALLE_C,
A.OTRA_REFERENCIA_C,
A.TELEFONO_C,
A.COLONIA_C,
A.LOCALIDAD_C,
A.CVE_MUNICIPIO_C,
A.CVE_ENTIDAD_FEDERATIVA_C,
A.E_MAIL_C,
A.DGP2011
)
VALUES
(
B.FOLIO_RELACIONADO,
B.CVE_PROGRAMA,
B.FECHA_ALTA,
B.PRIMER_APELLIDO,
B.SEGUNDO_APELLIDO,
B.NOMBRE,
TO_CHAR(B.FECHA_NACIMIENTO,'DD/MM/YYYY'),
B.SEXO,
B.CVE_NACIONALIDAD,
B.CVE_ENTIDAD_NACIMIENTO,
B.CVE_GRADO_ESCOLAR,
B.CVE_GRADO_ESTUDIOS,
B.CURP,
B.CALLE,
B.NUM_EXT,
B.NUM_INT,
B.CODIGO_POSTAL,
B.ENTRE_CALLE,
B.Y_CALLE,
B.OTRA_REFERENCIA,
B.TELEFONO,
B.COLONIA,
B.LOCALIDAD,
B.CVE_MUNICIPIO,
B.CVE_ENTIDAD_FEDERATIVA,
B.CVE_CCT,
B.PRIMER_APELLIDO_C,
B.SEGUNDO_APELLIDO_C,
B.NOMBRE_C,
TO_CHAR(B.FECHA_NACIMIENTO_C,'DD/MM/YYYY'),
B.SEXO_C,
B.CVE_ESTADO_CIVIL_C,
B.CVE_GRADO_ESTUDIOS_C,
B.CVE_PARENTESCO_C,
B.CURP_C,
B.CVE_TIPO_ID_OFCL_C,
B.ID_DOCTO_OFL_C,
B.CVE_NACIONALIDAD_C,
B.CVE_ENTIDAD_NACIMIENTO_C,
B.CALLE_C,
B.NUM_EXT_C,
B.NUM_INT_C,
B.CODIGO_POSTAL_C,
B.ENTRE_CALLE_C,
B.Y_CALLE_C,
B.OTRA_REFERENCIA_C,
B.TELEFONO_C,
B.COLONIA_C,
B.LOCALIDAD_C,
B.CVE_MUNICIPIO_C,
B.CVE_ENTIDAD_FEDERATIVA_C,
B.E_MAIL_C,
'1'
)LOG ERRORS INTO ELOG_SEGURO_ESCOLAR REJECT LIMIT UNLIMITED;
and it just raises the error "ORA-01722: invalid number" and toad highlights the 'A.' part of the query.
Now about the tables
table A has all the fields in varchar2 (4000)
table b to d have formatting according to the data they hold (date, number, etc)
the thing is, even with the error logging clause it raises the error and doesn't merge anything!
Plus i have no idea what i should be looking for to find the 'invalid number' field
Any advice would be deeply appreciated
Found it!
It was the TO_CHAR(A.FECHA_NACIMIENTO,'DD/MM/YYYY') line. Just left it like this
A.FECHA_NACIMIENTO=B.FECHA_NACIMIENTO and it worked. Thanks anyway!
Related
I'm trying to update a table based on another one's information:
Source_Table (Table 1) columns:
TABLE_ROW_ID (Based on trigger-sequence when insert)
REP_ID
SOFT_ASSIGNMENT
Description (Table 2) columns:
REP_ID
NEW_SOFT_ASSIGNMENT
This is my loop statement:
SELECT count(table_row_id) INTO V_ROWS_APPROVED FROM Source_Table;
FOR i IN 1..V_ROWS_APPROVED LOOP
SELECT REQUESTED_SOFT_MAPPING INTO V_SOFT FROM Source_Table WHERE ROW_ID = i;
SELECT REP_ID INTO V_REP_ID FROM Source_Table WHERE ROW_ID = i;
UPDATE Description_Table D
SET D.NEW_SOFT_ASSIGNMENT = V_SOFT
WHERE D.REP_ID = V_REP_ID;
END LOOP;
END;
The ending result of this loop is a beautiful ''504 Gateway Time-out''.
I know the issue is on the Update query but there's no other way (I can think about) of doing it.
Can someone give me a hand please?
Thanks
Unless your row_id values are contiguous - i.e. count(row_id) == max(row_id) - then this will get a no-data-found. Sequences aren't gapless, so this seems fairly likely. We have no way of telling if that is happening and somehow that is leaving your connection hanging until it times out, or if it's just taking a long time because you're doing a lot of individual queries and updates over a large data set. (And you may be squashing any errors that do occur, though you haven't shown that.)
You don't need to query and update in a loop though, or even use PL/SQL; you can apply all the values in the source table to the description table with a single update or merge:
merge into description_table d
using source_table s
on (s.rep_id = d.rep_id)
when matched then
update set d.new_soft_assignment = s.requested_soft_mapping;
db<>fiddle with some dummy data, including a non-contiguous row_id to show that erroring.
I have the following query that I am using in Oracle 11g
IF EXISTS (SELECT * FROM EMPLOYEE_MASTER WHERE EMPID='ABCD32643')
THEN
update EMPLOYEE_MASTER set EMPID='A62352',EMPNAME='JOHN DOE',EMPTYPE='1' where EMPID='ABCD32643' ;
ELSE
insert into EMPLOYEE_MASTER(EMPID,EMPNAME,EMPTYPE) values('A62352','JOHN DOE','1') ;
END IF;
On running the statement I get the following output:
Error starting at line : 4 in command -
ELSE
Error report -
Unknown Command
1 row inserted.
Error starting at line : 6 in command -
END IF
Error report -
Unknown Command
The values get inserted with error when I run it directly. But when I try to execute this query through my application I get an oracle exception because of the error generated :
ORA-00900: invalid SQL statement
And hence the values are not inserted.
I am relatively new to Oracle. Please advise on what's wrong with the above query so that I could run this query error free.
If MERGE doesn't work for you, try the following:
begin
update EMPLOYEE_MASTER set EMPID='A62352',EMPNAME='JOHN DOE',EMPTYPE='1'
where EMPID='ABCD32643' ;
if SQL%ROWCOUNT=0 then
insert into EMPLOYEE_MASTER(EMPID,EMPNAME,EMPTYPE)
values('A62352','JOHN DOE','1') ;
end if;
end;
Here you you the update on spec, then check whether or not you found a matching row, and insert in case you didn't.
"what's wrong with the above query "
What's wrong with the query is that it is not a query (SQL). It should be a program snippet (PL/SQL) but it isn't written as PL/SQL block, framed by BEGIN and END; keywords.
But turning it into an anonymous PL/SQL block won't help. Oracle PL/SQL does not support IF EXISTS (select ... syntax.
Fortunately Oracle SQL does support MERGE statement which does the same thing as your code, with less typing.
merge into EMPLOYEE_MASTER em
using ( select 'A62352' as empid,
'JOHN DOE' as empname,
'1' as emptype
from dual ) q
on (q.empid = em.empid)
when not matched then
insert (EMPID,EMPNAME,EMPTYPE)
values (q.empid, q.empname, q.emptype)
when matched then
update
set em.empname = q.empname, em.emptype = q.emptype
/
Except that you're trying to update empid as well. That's not supported in MERGE. Why would you want to change the primary key?
"Does this query need me to add values to all columns in the table? "
The INSERT can have all the columns in the table. The UPDATE cannot change the columns used in the ON clause (usually the primary key) because that's a limitation of the way MERGE works. I think it's the same key preservation mechanism we see when updating views. Find out more.
How can I retrieve a list off nth occurence of data in a clob?
Example of a clob:
<bank>
<bankDetails>
<bankDetailsList>
<pk>1</pk>
<accountName>
<asCurrent>EDGARS LESOTHO</asCurrent>
</accountName>
<bankAccountType>
<asCurrent>CURR</asCurrent>
</bankAccountType>
</bankDetailsList>
<bankDetailsList>
<pk>2</pk>
<accountName>
<asCurrent>EDGARS LESOTHO 2</asCurrent>
</accountName>
<bankAccountType>
<asCurrent>CURR</asCurrent>
</bankAccountType>
</bankDetailsList>
</bankDetails>
</bank>
So I would like to retrieve all values of account names in sql assuming there might be up to nth list of this account names occurring in a clob.
I am using oracle 11g and SqlDeveloper 4.1.3
Your response is highly appreciated.
SELECT EXTRACTVALUE( v.COLUMN_VALUE, '/asCurrent' )
FROM table_name t,
TABLE(
XMLSequence(
EXTRACT(
XMLType( t.clob_column ),
'/bank/bankDetails/bankDetailsList/accountName/asCurrent'
)
)
) v
SELECT level as rnk, regexp_substr(t.clob_column,
'<accountName>[^<]*?<asCurrent>([^<]*?)<', 1, level, null, 1) as acct_name
FROM t
CONNECT BY level <= (select regexp_count(clob_column, '<accountName>') FROM t);
t is the table name and clob_column is the column with clob values (in my test case, the table has one row and one column, the value being the one in the original post).
If you have a column of clob values and need to do this simultaneously for more than one value, this needs to be modified a bit; please clarify the requirement and we can take it from there.
ADDED: To make it work with several rows, you need to modify the CONNECT BY LEVEL clause. You want each row to only reference itself; and to avoid issues with cycles, you need to add one more condition. Like this:
...
CONNECT BY level <= (select regexp_count(clob_column, '<accountName>') FROM t)
and clob_column= prior clob_column
and prior sys_guid() is not null;
I have a table with >1M rows of data and 20+ columns.
Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).
If possible I would like to retain the original table name and remove the duplicate records from my problematic column otherwise I could create a new table (tableXfinal) with the same schema but without the duplicates.
I am not proficient in SQL or any other programming language so please excuse my ignorance.
delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index
in(select Fixed_Accident_Index from Accidents.CleanedFilledCombined
group by Fixed_Accident_Index
having count(Fixed_Accident_Index) >1);
You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).
A query that should work is here:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
UPDATE 2019: To de-duplicate rows on a single partition with a MERGE, see:
https://stackoverflow.com/a/57900778/132438
An alternative to Jordan's answer - this one scales better when having too many duplicates:
#standardSQL
SELECT event.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.created_at DESC LIMIT 1
)[OFFSET(0)] event
FROM `githubarchive.month.201706` t
# GROUP BY the id you are de-duplicating by
GROUP BY actor.id
)
Or a shorter version (takes any row, instead of the newest one):
SELECT k.*
FROM (
SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
FROM `fh-bigquery.reddit_comments.2017_01` x
GROUP BY id
)
To de-duplicate rows on an existing table:
CREATE OR REPLACE TABLE `deleting.deduplicating_table`
AS
# SELECT id FROM UNNEST([1,1,1,2,2]) id
SELECT k.*
FROM (
SELECT ARRAY_AGG(row LIMIT 1)[OFFSET(0)] k
FROM `deleting.deduplicating_table` row
GROUP BY id
)
Not sure why nobody mentioned DISTINCT query.
Here is the way to clean duplicate rows:
CREATE OR REPLACE TABLE project.dataset.table
AS
SELECT DISTINCT * FROM project.dataset.table
If your schema doesn’t have any records - below variation of Jordan’s answer will work well enough with writing over same table or new one, etc.
SELECT <list of original fields>
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS pos,
FROM Accidents.CleanedFilledCombined
)
WHERE pos = 1
In more generic case - with complex schema with records/netsed fields, etc. - above approach can be a challenge.
I would propose to try using Tabledata: insertAll API with rows[].insertId set to respective Fixed_Accident_Index for each row.
In this case duplicate rows will be eliminated by BigQuery
Of course, this will involve some client side coding - so might be not relevant for this particular question.
I havent tried this approach by myself either but feel it might be interesting to try :o)
If you have a large-size partitioned table, and only have duplicates in a certain partition range. You don't want to overscan nor process the whole table. use the MERGE SQL below with predicates on partition range:
-- WARNING: back up the table before this operation
-- FOR large size timestamp partitioned table
-- -------------------------------------------
-- -- To de-duplicate rows of a given range of a partition table, using surrage_key as unique id
-- -------------------------------------------
DECLARE dt_start DEFAULT TIMESTAMP("2019-09-17T00:00:00", "America/Los_Angeles") ;
DECLARE dt_end DEFAULT TIMESTAMP("2019-09-22T00:00:00", "America/Los_Angeles");
MERGE INTO `gcp_project`.`data_set`.`the_table` AS INTERNAL_DEST
USING (
SELECT k.*
FROM (
SELECT ARRAY_AGG(original_data LIMIT 1)[OFFSET(0)] k
FROM `gcp_project`.`data_set`.`the_table` AS original_data
WHERE stamp BETWEEN dt_start AND dt_end
GROUP BY surrogate_key
)
) AS INTERNAL_SOURCE
ON FALSE
WHEN NOT MATCHED BY SOURCE
AND INTERNAL_DEST.stamp BETWEEN dt_start AND dt_end -- remove all data in partiion range
THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW
credit: https://gist.github.com/hui-zheng/f7e972bcbe9cde0c6cb6318f7270b67a
Easier answer, without a subselect
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
WHERE TRUE
QUALIFY row_number = 1
The Where True is neccesary because qualify needs a where, group by or having clause
Felipe's answer is the best approach for most cases. Here is a more elegant way to accomplish the same:
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined
AS
SELECT
Fixed_Accident_Index,
ARRAY_AGG(x LIMIT 1)[SAFE_OFFSET(0)].* EXCEPT(Fixed_Accident_Index)
FROM Accidents.CleanedFilledCombined AS x
GROUP BY Fixed_Accident_Index;
To be safe, make sure you backup the original table before you run this ^^
I don't recommend to use ROW NUMBER() OVER() approach if possible since you may run into BigQuery memory limits and get unexpected errors.
Update BigQuery schema with new table column as bq_uuid making it NULLABLE and type STRING
Create duplicate rows by running same command 5 times for example
insert into beginner-290513.917834811114.messages (id, type, flow, updated_at) Values(19999,"hello", "inbound", '2021-06-08T12:09:03.693646')
Check if duplicate entries exist
select * from beginner-290513.917834811114.messages where id = 19999
Use generate uuid function to generate uuid corresponding to each message
UPDATE beginner-290513.917834811114.messages
SET bq_uuid = GENERATE_UUID()
where id>0
Clean duplicate entries
DELETE FROM beginner-290513.917834811114.messages
WHERE bq_uuid IN
(SELECT bq_uuid
FROM
(SELECT bq_uuid,
ROW_NUMBER() OVER( PARTITION BY updated_at
ORDER BY bq_uuid ) AS row_num
FROM beginner-290513.917834811114.messages ) t
WHERE t.row_num > 1 );
Here's my original question:
merging two data sets
Unfortunately I omitted some intircacies, that I'd like to elaborate here.
So I have two tables events_source_1 and events_source_2 tables. I have to produce the data set from those tables into resultant dataset (that I'd be able to insert into third table, but that's irrelevant).
events_source_1 contain historic event data and I have to do get the most recent event (for such I'm doing the following:
select event_type,b,c,max(event_date),null next_event_date
from events_source_1
group by event_type,b,c,event_date,null
events_source_2 contain the future event data and I have to do the following:
select event_type,b,c,null event_date, next_event_date
from events_source_2
where b>sysdate;
How to put outer join statement to fill the void (i.e. when same event_type,b,c found from event_source_2 then next_event_date will be filled with the first date found
GREATLY APPRECIATE FOR YOUR HELP IN ADVANCE.
Hope I got your question right. This should return the latest event_date of events_source_1 per event_type, b, c and add the lowest event_date of event_source_2.
Select es1.event_type, es1.b, es1.c,
Max(es1.event_date),
Min(es2.event_date) As next_event_date
From events_source_1 es1
Left Join events_source_2 es2 On ( es2.event_type = es1.event_type
And es2.b = es1.b
And es2.c = es1.c
)
Group By c1.event_type, c1.b, c1.c
You could just make the table where you need to select a max using a group by into a virtual table, and then do the full outer join as I provided in the answer to the prior question.
Add something like this to the top of the query:
with past_source as (
select event_type, b, c, max(event_date)
from event_source_1
group by event_type, b, c, event_date
)
Then you can use past_source as if it were an actual table, and continue your select right after the closing parens on the with clause shown.
I end up doing two step process: 1st step populates the data from event table 1, 2nd step MERGES the data between target (the dataset from 1st step) and another source. Please forgive me, but I had to obfuscate table name and omit some columns in the code below for legal reasons. Here's the SQL:
INSERT INTO EVENTS_TARGET (VEHICLE_ID,EVENT_TYPE_ID,CLIENT_ID,EVENT_DATE,CREATED_DATE)
select VEHICLE_ID, EVENT_TYPE_ID, DEALER_ID,
max(EVENT_INITIATED_DATE) EVENT_DATE, sysdate CREATED_DATE
FROM events_source_1
GROUP BY VEHICLE_ID, EVENT_TYPE_ID, DEALER_ID, sysdate;
Here's the second step:
MERGE INTO EVENTS_TARGET tgt
USING (
SELECT ee.VEHICLE_ID VEHICLE_ID, ee.POTENTIAL_EVENT_TYPE_ID POTENTIAL_EVENT_TYPE_ID, ee.CLIENT_ID CLIENT_ID,ee.POTENTIAL_EVENT_DATE POTENTIAL_EVENT_DATE FROM EVENTS_SOURCE_2 ee WHERE ee.POTENTIAL_EVENT_DATE>SYSDATE) src
ON (tgt.vehicle_id = src.VEHICLE_ID AND tgt.client_id=src.client_id AND tgt.EVENT_TYPE_ID=src.POTENTIAL_EVENT_TYPE_ID)
WHEN MATCHED THEN
UPDATE SET tgt.NEXT_EVENT_DATE=src.POTENTIAL_EVENT_DATE
WHEN NOT MATCHED THEN
insert (tgt.VEHICLE_ID,tgt.EVENT_TYPE_ID,tgt.CLIENT_ID,tgt.NEXT_EVENT_DATE,tgt.CREATED_DATE) VALUES (src.VEHICLE_ID, src.POTENTIAL_EVENT_TYPE_ID, src.CLIENT_ID, src.POTENTIAL_EVENT_DATE, SYSDATE)
;