Trying to insert values if a particular column value does not exist in the table
I have tried using a subquery in the WHERE clause:
INSERT INTO ANIMALDATA VALUES
(
  (SELECT MAX(first)+1 FROM ANIMALDATA),
  'Animals',
  'Lion',
  10,
  '',
  '13-06-2019',
  'STOP'
)
where not exists
  (select NAMES from ANIMALDATA where NAMES='Lion');
If 'Lion' does not exist, then the insert statement should run.
Give me an idea of what I am missing, as I am a beginner with Oracle queries, and help me to proceed further. Thanks in advance.
Since you have a condition, I think you need to do an INSERT INTO...SELECT:
(UPDATE: the CREATE TABLE statement is there to provide simple test data. It is not part of the solution).
create table animaldata(first, kingdom, names, num, nl, dte, s) as
select 1, 'Animals', 'Tiger', 11, 'a', '13-06-2019', 'STOP' from dual;
INSERT INTO ANIMALDATA
select
  (SELECT MAX(first)+1 FROM ANIMALDATA),
  'Animals',
  'Lion',
  10,
  '',
  '13-06-2019',
  'STOP'
from dual
where not exists
  (select NAMES from ANIMALDATA where NAMES='Lion');
Best regards,
Stew Ashton
Please try the query below. Thanks,
INSERT INTO ANIMALDATA
select
  (SELECT MAX(first)+1 FROM ANIMALDATA),
  'Animals',
  'Lion',
  10,
  '',
  '13-06-2019',
  'STOP'
from dual
where not exists
  (select 1 from ANIMALDATA b where b.NAMES = 'Lion');
First off, don't use max(<value>) + 1 to come up with new values for a column - that does not play well with concurrent sessions.
Instead, you should create a sequence and use that in your inserts.
Next, if you are trying to do an upsert (update the row if it exists or insert if it doesn't), you could use a MERGE statement. In this case, you're trying to insert a row if it doesn't already exist, so you don't need the update part.
Therefore you should be doing something like:
CREATE SEQUENCE animaldata_seq
START WITH <find MAX VALUE OF animaldata.first>
INCREMENT BY 1
MAXVALUE 9999999999999999
CACHE 20
NOCYCLE;
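The starting value can be found by querying the current maximum, for example:
SELECT MAX(first) + 1 FROM animaldata;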
MERGE INTO animaldata tgt
USING (SELECT 'Animals' category,
              'Lion' animal,
              10 num_animals,
              NULL unknown_col,
              TRUNC(SYSDATE) date_added,
              'STOP' action
       FROM dual) src
ON (tgt.names = src.animal)
WHEN NOT MATCHED THEN
  INSERT (<list of animaldata columns>)
  VALUES (animaldata_seq.nextval,
          src.category,
          src.animal,
          src.num_animals,
          src.unknown_col,
          src.date_added,
          src.action);
Note that I have tried to specify the columns being inserted into - that's good practice! Insert statements that don't list the columns being inserted into are prone to errors should someone add a column to the table.
I have also assumed that the column you're adding the date into is of the DATE datatype; I have used SYSDATE (truncated to remove the time part) as the value to insert, but you may wish to use a specific date, in which case you should use to_date(<string date>, '<string date format>').
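For example, the date from the question could be supplied as:
TO_DATE('13-06-2019', 'DD-MM-YYYY')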
Related
I've tried to find an answer on several forums with no luck, so perhaps you can help me out.
I've got an INSERT ALL statement that inserts thousands of rows at once.
INSERT ALL
INTO my_table (field_x, field_y, field_z) VALUES ('value_x1', 'value_y1', 'value_z1')
INTO my_table (field_x, field_y, field_z) VALUES ('value_x2', 'value_y2', 'value_z2')
...
INTO my_table (field_x, field_y, field_z) VALUES ('value_xn', 'value_yn', 'value_zn')
SELECT * FROM DUAL;
Now I'd like to amend it to update rows when some criteria are met. For each row, I could have something like:
MERGE INTO my_table m
USING (SELECT 'value_xi' x, 'value_yi' y, 'value_zi' z FROM DUAL) s
ON (m.field_x = s.x and m.field_y = s.y)
WHEN MATCHED THEN UPDATE SET
field_z = s.z
WHEN NOT MATCHED THEN INSERT (field_x, field_y, field_z)
VALUES (s.x, s.y, s.z);
Is there a way for me to do a kind of "MERGE ALL" that would allow to have all those merge requests in one?
Or maybe I'm missing the point and there's a better way to do this?
Thanks,
Edit: One possible solution is to use "UNION ALL" for a set of selects from dual, as follows:
MERGE INTO my_table m
USING (
select '' as x, '' as y, '' as z from dual
union all select 'value_x1', 'value_y1', 'value_z1' from dual
union all select 'value_x2', 'value_y2', 'value_z2' from dual
[...]
union all select 'value_xn', 'value_yn', 'value_zn' from dual
) s
ON (m.field_x = s.x and m.field_y = s.y)
WHEN MATCHED THEN UPDATE SET
field_z = s.z
WHEN NOT MATCHED THEN INSERT (field_x, field_y, field_z)
VALUES (s.x, s.y, s.z);
NB: I've used a first empty row to be able to generate all rows in the same format when writing the request. I also specify the column names there.
Another solution would be to create a temporary table, INSERT ALL data into it, then merge with the target table and delete the temporary table.
If you're passing in tens of thousands of rows from your Python script, I would:
Create a global temporary table (GTT - a permanent table definition whose data is held at session level)
Get your Python script to insert the rows into the GTT
Use the GTT in the MERGE statement, e.g.:
merge into your_main_table tgt
using your_gtt src
on (<join conditions>)
when matched then
update ...
when not matched then
insert ...;
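A minimal sketch of the whole approach, reusing the my_table columns from the question (the staging-table name my_table_stage and the column sizes are just examples):
CREATE GLOBAL TEMPORARY TABLE my_table_stage
(
  field_x VARCHAR2(100),
  field_y VARCHAR2(100),
  field_z VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;
-- the script bulk-inserts into my_table_stage, then:
MERGE INTO my_table m
USING my_table_stage s
ON (m.field_x = s.field_x AND m.field_y = s.field_y)
WHEN MATCHED THEN
  UPDATE SET m.field_z = s.field_z
WHEN NOT MATCHED THEN
  INSERT (field_x, field_y, field_z)
  VALUES (s.field_x, s.field_y, s.field_z);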
I am currently using the following code to pivot a table and it works perfectly. Now I want to replace any null values with 'No Data' after it is summed but I am getting errors, so I think I am placing the case statement in the wrong place.
This works:
SELECT *
FROM (SELECT PROV_NO, DATA_YEAR, DATA_MONTH, MEASURE_ID, CASES
FROM pivot_test_2)
PIVOT (SUM(CASES) FOR (MEASURE_ID) IN ('MORT_30_AMI', 'MORT_30_HF', 'MORT_30_PN'))
order by PROV_NO, DATA_YEAR, DATA_MONTH;
but this does not
SELECT *
FROM (SELECT PROV_NO, DATA_YEAR, DATA_MONTH, MEASURE_ID, CASES
FROM pivot_test_2)
PIVOT (SUM(CASES) FOR (MEASURE_ID) IN ('MORT_30_AMI', 'MORT_30_HF', 'MORT_30_PN'))
case when MORT_30_HF is null then 'No Data' else MORT_30_HF end
order by PROV_NO, DATA_YEAR, DATA_MONTH;
I get "ORA-00933: SQL command not properly ended" as the error. I'm trying to place ";" around but the error is still the same. I am currently in Oracle 11g and using Golden as my scripting/retrieval software.
You can move the CASE expression to the outer SELECT and handle the NULL values there. Better yet, use COALESCE. Unfortunately you have to do this for each item in the SELECT list:
SELECT
--Must manually reference each column.
COALESCE(TO_CHAR(MORT_30_AMI), 'No Data') MORT_30_AMI,
COALESCE(TO_CHAR(MORT_30_HF), 'No Data') MORT_30_HF,
COALESCE(TO_CHAR(MORT_30_PN), 'No Data') MORT_30_PN,
PROV_NO, DATA_YEAR, DATA_MONTH
FROM (SELECT PROV_NO, DATA_YEAR, DATA_MONTH, MEASURE_ID, CASES
FROM pivot_test_2)
PIVOT
(
SUM(CASES)
FOR (MEASURE_ID) IN
--Use aliases to make the columns easier to use.
('MORT_30_AMI' MORT_30_AMI, 'MORT_30_HF' MORT_30_HF, 'MORT_30_PN' MORT_30_PN))
ORDER BY PROV_NO, DATA_YEAR, DATA_MONTH;
A Simpler Version That Doesn't Work
Ideally you would be able to replace this part of the code:
SUM(CASES)
With this:
COALESCE(TO_CHAR(SUM(CASES)), 'No data')
Then you wouldn't need to handle each column separately. But there doesn't appear to be a way to automatically apply a non-aggregate function to the results of a PIVOT. Using the above code generates this error message:
ORA-56902: expect aggregate function inside pivot operation
Sample Schema
create table pivot_test_2
(
PROV_NO CHAR(6),
DATA_YEAR NUMBER(4),
DATA_MONTH Number(2),
MEASURE_ID VARCHAR2(250),
CASES NUMBER
);
insert into pivot_test_2
select 'A', 2000, 1, 'MORT_30_AMI', 1 from dual union all
select 'A', 2000, 1, 'MORT_30_AMI', 1 from dual union all
select 'A', 2000, 1, 'MORT_30_HF', 2 from dual union all
select 'A', 2000, 1, 'MORT_30_HF', 2 from dual;
Thanks everyone; with help from you all, I was able to cobble this together, and it works:
SELECT PROV_NO, DATA_YEAR, DATA_MONTH,
case when MORT_30_AMI is null then 'No Data' else to_char(MORT_30_AMI) end as MORT_30_AMI,
case when MORT_30_HF is null then 'No Data' else to_char(MORT_30_HF) end as MORT_30_HF,
case when MORT_30_PN is null then 'No Data' else to_char(MORT_30_PN) end as MORT_30_PN
FROM pivot_test_2
PIVOT (SUM(CASES) FOR (MEASURE_ID) IN ('MORT_30_AMI' as MORT_30_AMI,'MORT_30_HF' as MORT_30_HF, 'MORT_30_PN' as MORT_30_PN))
order by PROV_NO, DATA_YEAR, DATA_MONTH;
Use the DECODE function on the query-defined fields (from the list of pivot-defined fields) to replace NULL with whatever value you need, i.e., instead of NULL return 'No Data'. I believe that should solve the issue.
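For example, a sketch along the lines of the accepted query above:
SELECT PROV_NO, DATA_YEAR, DATA_MONTH,
  DECODE(MORT_30_AMI, NULL, 'No Data', TO_CHAR(MORT_30_AMI)) AS MORT_30_AMI,
  DECODE(MORT_30_HF, NULL, 'No Data', TO_CHAR(MORT_30_HF)) AS MORT_30_HF,
  DECODE(MORT_30_PN, NULL, 'No Data', TO_CHAR(MORT_30_PN)) AS MORT_30_PN
FROM pivot_test_2
PIVOT (SUM(CASES) FOR (MEASURE_ID) IN ('MORT_30_AMI' AS MORT_30_AMI, 'MORT_30_HF' AS MORT_30_HF, 'MORT_30_PN' AS MORT_30_PN))
ORDER BY PROV_NO, DATA_YEAR, DATA_MONTH;
(DECODE treats two NULLs as equal, which is what makes this work where an ordinary equality comparison would not.)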
How can I retrieve a list of the nth occurrence of data in a CLOB?
Example of a clob:
<bank>
  <bankDetails>
    <bankDetailsList>
      <pk>1</pk>
      <accountName>
        <asCurrent>EDGARS LESOTHO</asCurrent>
      </accountName>
      <bankAccountType>
        <asCurrent>CURR</asCurrent>
      </bankAccountType>
    </bankDetailsList>
    <bankDetailsList>
      <pk>2</pk>
      <accountName>
        <asCurrent>EDGARS LESOTHO 2</asCurrent>
      </accountName>
      <bankAccountType>
        <asCurrent>CURR</asCurrent>
      </bankAccountType>
    </bankDetailsList>
  </bankDetails>
</bank>
So I would like to retrieve all of the account name values in SQL, given that there may be any number of bankDetailsList occurrences in the CLOB.
I am using Oracle 11g and SQL Developer 4.1.3.
Your response is highly appreciated.
SELECT EXTRACTVALUE( v.COLUMN_VALUE, '/asCurrent' )
FROM table_name t,
TABLE(
XMLSequence(
EXTRACT(
XMLType( t.clob_column ),
'/bank/bankDetails/bankDetailsList/accountName/asCurrent'
)
)
) v
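On 11g you could also consider XMLTABLE, which tends to read more naturally for repeating elements; a sketch assuming the same table_name and clob_column:
SELECT x.account_name
FROM table_name t,
     XMLTABLE(
       '/bank/bankDetails/bankDetailsList'
       PASSING XMLType( t.clob_column )
       COLUMNS account_name VARCHAR2(200) PATH 'accountName/asCurrent'
     ) x;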
SELECT level as rnk, regexp_substr(t.clob_column,
'<accountName>[^<]*?<asCurrent>([^<]*?)<', 1, level, null, 1) as acct_name
FROM t
CONNECT BY level <= (select regexp_count(clob_column, '<accountName>') FROM t);
t is the table name and clob_column is the column with clob values (in my test case, the table has one row and one column, the value being the one in the original post).
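Against that test case, the query returns two rows: EDGARS LESOTHO and EDGARS LESOTHO 2.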
If you have a column of clob values and need to do this simultaneously for more than one value, this needs to be modified a bit; please clarify the requirement and we can take it from there.
ADDED: To make it work with several rows, you need to modify the CONNECT BY LEVEL clause. You want each row to only reference itself; and to avoid issues with cycles, you need to add one more condition. Like this:
...
CONNECT BY level <= (select regexp_count(clob_column, '<accountName>') FROM t)
and clob_column= prior clob_column
and prior sys_guid() is not null;
I have a table with >1M rows of data and 20+ columns.
Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).
If possible I would like to retain the original table name and remove the duplicate records from my problematic column otherwise I could create a new table (tableXfinal) with the same schema but without the duplicates.
I am not proficient in SQL or any other programming language so please excuse my ignorance.
delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index in
  (select Fixed_Accident_Index
   from Accidents.CleanedFilledCombined
   group by Fixed_Accident_Index
   having count(Fixed_Accident_Index) > 1);
You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).
A query that should work is here:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
UPDATE 2019: To de-duplicate rows on a single partition with a MERGE, see:
https://stackoverflow.com/a/57900778/132438
An alternative to Jordan's answer - this one scales better when there are too many duplicates:
#standardSQL
SELECT event.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.created_at DESC LIMIT 1
)[OFFSET(0)] event
FROM `githubarchive.month.201706` t
# GROUP BY the id you are de-duplicating by
GROUP BY actor.id
)
Or a shorter version (takes any row, instead of the newest one):
SELECT k.*
FROM (
SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
FROM `fh-bigquery.reddit_comments.2017_01` x
GROUP BY id
)
To de-duplicate rows on an existing table:
CREATE OR REPLACE TABLE `deleting.deduplicating_table`
AS
# SELECT id FROM UNNEST([1,1,1,2,2]) id
SELECT k.*
FROM (
SELECT ARRAY_AGG(row LIMIT 1)[OFFSET(0)] k
FROM `deleting.deduplicating_table` row
GROUP BY id
)
Not sure why nobody mentioned a DISTINCT query.
Here is the way to clean duplicate rows:
CREATE OR REPLACE TABLE project.dataset.table
AS
SELECT DISTINCT * FROM project.dataset.table
If your schema doesn't have any record (nested) fields, the variation of Jordan's answer below will work well enough, whether writing over the same table, into a new one, etc.
SELECT <list of original fields>
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS pos
  FROM Accidents.CleanedFilledCombined
)
WHERE pos = 1
In the more generic case - with a complex schema containing records/nested fields, etc. - the above approach can be a challenge.
I would propose trying the Tabledata: insertAll API with rows[].insertId set to the respective Fixed_Accident_Index for each row.
In this case the duplicate rows will be eliminated by BigQuery.
Of course, this involves some client-side coding - so it might not be relevant for this particular question.
I haven't tried this approach myself either, but I feel it might be interesting to try :o)
If you have a large partitioned table and only have duplicates in a certain partition range, you don't want to scan or process the whole table. Use the MERGE SQL below with predicates on the partition range:
-- WARNING: back up the table before this operation
-- FOR large size timestamp partitioned table
-- -------------------------------------------
-- -- To de-duplicate rows in a given range of a partitioned table, using surrogate_key as the unique id
-- -------------------------------------------
DECLARE dt_start DEFAULT TIMESTAMP("2019-09-17T00:00:00", "America/Los_Angeles") ;
DECLARE dt_end DEFAULT TIMESTAMP("2019-09-22T00:00:00", "America/Los_Angeles");
MERGE INTO `gcp_project`.`data_set`.`the_table` AS INTERNAL_DEST
USING (
SELECT k.*
FROM (
SELECT ARRAY_AGG(original_data LIMIT 1)[OFFSET(0)] k
FROM `gcp_project`.`data_set`.`the_table` AS original_data
WHERE stamp BETWEEN dt_start AND dt_end
GROUP BY surrogate_key
)
) AS INTERNAL_SOURCE
ON FALSE
WHEN NOT MATCHED BY SOURCE
AND INTERNAL_DEST.stamp BETWEEN dt_start AND dt_end -- remove all data in partition range
THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW
credit: https://gist.github.com/hui-zheng/f7e972bcbe9cde0c6cb6318f7270b67a
An easier answer, without a subselect:
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
WHERE TRUE
QUALIFY row_number = 1
The WHERE TRUE is necessary because QUALIFY needs a WHERE, GROUP BY or HAVING clause.
Felipe's answer is the best approach for most cases. Here is a more elegant way to accomplish the same:
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined
AS
SELECT
Fixed_Accident_Index,
ARRAY_AGG(x LIMIT 1)[SAFE_OFFSET(0)].* EXCEPT(Fixed_Accident_Index)
FROM Accidents.CleanedFilledCombined AS x
GROUP BY Fixed_Accident_Index;
To be safe, make sure you back up the original table before you run this ^^
I don't recommend using the ROW_NUMBER() OVER() approach if possible, since you may run into BigQuery memory limits and get unexpected errors.
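Backing up can be as simple as copying into a new table first (the _backup suffix here is just an example):
CREATE TABLE Accidents.CleanedFilledCombined_backup AS
SELECT * FROM Accidents.CleanedFilledCombined;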
Update the BigQuery schema with a new column bq_uuid, NULLABLE and of type STRING.
Create duplicate rows by running the same command five times, for example:
insert into `beginner-290513.917834811114.messages` (id, type, flow, updated_at) values (19999, "hello", "inbound", '2021-06-08T12:09:03.693646')
Check if duplicate entries exist
select * from `beginner-290513.917834811114.messages` where id = 19999
Use the GENERATE_UUID function to generate a uuid corresponding to each message:
UPDATE `beginner-290513.917834811114.messages`
SET bq_uuid = GENERATE_UUID()
WHERE id > 0
Clean duplicate entries
DELETE FROM `beginner-290513.917834811114.messages`
WHERE bq_uuid IN
  (SELECT bq_uuid
   FROM
     (SELECT bq_uuid,
             ROW_NUMBER() OVER (PARTITION BY updated_at
                                ORDER BY bq_uuid) AS row_num
      FROM `beginner-290513.917834811114.messages`) t
   WHERE t.row_num > 1);
Table has columns for issue_date, part_num and date_received.
If an issue_date is null, I want to select issue_date of part_num + 1 (the next part number), and insert it in the issue_date column of the part with no issue date.
part_num is sequential.
What SQL statement would select and then insert the appropriate issue date?
Thank you in advance for any help.
Figured it out with a little self-join statement. Thank you - delete if you wish!
Try this:
update t
set t.issue_date = (select issue_date
from t t1
where t1.part_num = t.part_num+1)
where t.issue_date is null
But if the next part number also doesn't have an issue_date, this will leave NULLs in issue_date. To solve that, you can change the query to this one (if it's suitable for your application):
update t
set t.issue_date = (select min(issue_date)
from t t1
where t1.part_num > t.part_num)
where t.issue_date is null
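An analytic-function variant avoids re-scanning t for every NULL row; a sketch, assuming the same table t with unique, sequential part_num values:
MERGE INTO t tgt
USING (
  SELECT part_num,
         -- smallest issue_date at or after this part_num (MIN ignores NULLs)
         MIN(issue_date) OVER (ORDER BY part_num DESC) AS next_issue_date
  FROM t
) src
ON (tgt.part_num = src.part_num)
WHEN MATCHED THEN UPDATE
  SET tgt.issue_date = src.next_issue_date
  WHERE tgt.issue_date IS NULL;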