Update query issue for multiple rows - Oracle

I have the update query below to set some values and control the data flow, but I am getting the error "Too many values" from the condition (subquery) when I execute it.
UPDATE MTB ----- TABLE NAME
SET MTB_EXTR_FLAG='N',
MTB_ALOC_PROCESS='DC1'
WHERE MTB_I IN --- PRIMARY KEY
(
SELECT * FROM
(
SELECT MTB_I ,ROW_NUMBER() OVER (ORDER BY ROWID) AS RN
FROM MTB
)
WHERE RN BETWEEN 100 AND 500
)
My intention here is to select a different set of data for each run of one job.
I want to set MTB_EXTR_FLAG='N', MTB_ALOC_PROCESS='DC1' each time before running the job, against a different set of rows.
Can someone please help me resolve the error or propose a different query?
Thank you.

I think this is just a matter of the number of columns not matching (two, MTB_I and RN, instead of one, MTB_I):
UPDATE MTB
SET MTB_EXTR_FLAG='N',
MTB_ALOC_PROCESS='DC1'
WHERE MTB_I IN --- PRIMARY KEY
(
SELECT MTB_I FROM -- Else RN will be taken !!
(
SELECT MTB_I ,ROW_NUMBER() OVER (ORDER BY ROWID) AS RN
FROM MTB
)
WHERE RN BETWEEN 100 AND 500
)
You can't do WHERE x IN (...) with a subquery returning more columns than expected.
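If the goal is to flag a fresh batch of rows on each run, one option is to drive the batch off rows that have not yet been assigned rather than off a fixed ROW_NUMBER window. The following is only a sketch and assumes that MTB_ALOC_PROCESS is NULL for rows no job has picked up yet:
UPDATE MTB
SET MTB_EXTR_FLAG='N',
    MTB_ALOC_PROCESS='DC1'
WHERE MTB_I IN
(
  SELECT MTB_I FROM
  (
    SELECT MTB_I
    FROM MTB
    WHERE MTB_ALOC_PROCESS IS NULL -- assumption: rows not yet assigned to a job are NULL here
    ORDER BY ROWID
  )
  WHERE ROWNUM <= 400 -- batch size, roughly the 100-500 window above
)
Each run then flags the next unassigned batch without having to track row-number ranges between jobs.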


Issue with jqGrid pagination on an Oracle server

We have code to sort data, paginate it, and render it to a jqGrid. The code works fine when connected to a SQL Server database: paginating each page returns distinct data as expected. But when connected to an Oracle server, after some point in time duplicate data is rendered. Both Oracle and SQL Server hold the same data. The paging parameters and number of pages work as expected on the server side, i.e. the start point and chunk size are transferred correctly. The duplicate values appear after sorting columns that are of type varchar in the database but also hold numeric values. The status column holds values such as 3 and A, and after sorting by the status column the duplicate data appears while paginating. By duplicate data I mean that data on page 2 is the same as data on page 3. Any help will be appreciated. Thanks in advance.
Query One:-
select * from ( select row_.*, rownum rownum_ from ( Select x,y,z,status FROM tablename c WHERE status IN('in condition seperated with status') ORDER BY status asc ) row_ where rownum <= 30 ) where rownum_ > 20;
Query Two:-
select * from ( select row_.*, rownum rownum_ from ( Select x,y,z,status FROM tablename c WHERE status IN('in condition seperated with status') ORDER BY status asc ) row_ where rownum <= 20 ) where rownum_ > 10;
Here, query one and query two always return the same results.
Where two or more values in the column of your ORDER BY clause are the same, you must always provide a secondary column to break the tie. Otherwise, each fetch only has a probability of returning the rows you expect, and pages may repeat or skip rows. The secondary column must be unique for accurate results. While you might be tempted to assume that rows will sort themselves based on the order they were entered, that is not guaranteed.
select * from( select row_.*, rownum rownum_ from( Select x,y,z,status FROM tablename c WHERE status IN('in condition seperated with status') ORDER BY status,x asc ) row_ where rownum <= 30 ) where rownum_ > 20;
This assumes x is a unique value. If the query is Oracle-specific, DBMS_RANDOM.VALUE can also be used instead of adding an extra column to the ORDER BY clause.
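If no naturally unique column is available, another option (a sketch, not from the answer above) is to use ROWID as the deterministic tiebreaker, since it is unique per row in Oracle:
select * from ( select row_.*, rownum rownum_ from ( Select x,y,z,status FROM tablename c WHERE status IN('in condition seperated with status') ORDER BY status asc, ROWID asc ) row_ where rownum <= 30 ) where rownum_ > 20;
Because ties on status are always broken the same way, the page boundaries stay stable between calls.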

How can I improve performance of SQLite with a large table?

I have a SQLite DB with a single table and 60,000,000 records; the time to run a simple query is more than 100 seconds.
I've tried switching to PostgreSQL, but its performance was even worse.
I haven't tested it on MySQL or MS SQL.
Should I split the table (say, a different table for each pointID - there are some hundreds of them - or a different table for each month, so I'd have a maximum of 10,000,000 records per table)?
sql scheme:
CREATE TABLE `collectedData` (
`id` INTEGER,
`timeStamp` double,
`timeDateStr` nvarchar,
`pointID` nvarchar,
`pointIDindex` double,
`trendNumber` integer,
`status` nvarchar,
`value` double,
PRIMARY KEY(`id`)
);
CREATE INDEX `idx_pointID` ON `collectedData` (
`pointID`
);
CREATE INDEX `idx_pointIDindex` ON `collectedData` (
`pointIDindex`
);
CREATE INDEX `idx_timeStamp` ON `collectedData` (
`timeStamp`
);
CREATE INDEX `idx_trendNumber` ON `collectedData` (
`trendNumber`
);
The following query took 107 seconds:
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointid in ('point1','point2','pont3','point4','point5','point6','point7','point8','point9','pointa')
and pointIDindex % 1 = 0
order by timestamp desc, id desc limit 5000
The next query took 150 seconds (fewer conditions):
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointIDindex % 1 = 0
order by timestamp desc, id desc limit 5000
Editing:
An answer from another place - add the following index:
CREATE INDEX idx_All ON collectedData (trendNumber, pointid, pointIDindex, status, timestamp desc, id desc, timeDateStr, value)
This improved performance by a factor of 3.
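(Not from the original post:) if SQLite keeps preferring the older single-column indexes over the new composite one, refreshing the planner statistics is worth a try:
ANALYZE collectedData; -- or simply ANALYZE; to gather statistics for the whole database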
Editing #2: following #Raymond Nijland's suggestion, the execution plan is:
"0" "0" "0" "SEARCH TABLE collectedData USING COVERING INDEX idx_All (trendNumber=? AND pointID=?)"
"0" "0" "0" "EXECUTE LIST SUBQUERY 1"
"0" "0" "0" "USE TEMP B-TREE FOR ORDER BY"
Thanks to him - using this information, I've changed the order of the operations in the query to the following:
select * from (
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointid in ('point1','point2','pont3','point4','point5','point6','point7','point8','point9','pointa')
and pointIDindex % 1 = 0
order by id desc limit 5000
) order by timestamp desc
This made a big improvement (for me it's solved).
After #RaymondNijland suggested checking the execution plan, I changed the query to:
select * from (
select * from collectedData
where
trendNumber =1
and status <> '' and
timestamp <=1556793244
and pointid in ('point1','point2','pont3','point4','point5','point6','point7','point8','point9','pointa')
and pointIDindex % 1 = 0
order by id desc limit 5000
) order by timestamp desc
This query gives the same results as the other, but it's 120 times faster (it decreases the number of records before sorting).
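To double-check what the rewritten query does, SQLite's EXPLAIN QUERY PLAN can be prefixed to it; this is a sketch on the inner query only, with the pointID list shortened for brevity:
EXPLAIN QUERY PLAN
select * from collectedData
where trendNumber = 1
  and status <> ''
  and timeStamp <= 1556793244
  and pointID in ('point1','point2') -- shortened list, just for the example
  and pointIDindex % 1 = 0
order by id desc limit 5000;
The output shows which index is chosen and whether a temporary B-tree is still needed for the ORDER BY.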

Delete duplicate rows from a BigQuery table

I have a table with >1M rows of data and 20+ columns.
Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).
If possible I would like to retain the original table name and remove the duplicate records from my problematic column; otherwise I could create a new table (tableXfinal) with the same schema but without the duplicates.
I am not proficient in SQL or any other programming language so please excuse my ignorance.
delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index
in(select Fixed_Accident_Index from Accidents.CleanedFilledCombined
group by Fixed_Accident_Index
having count(Fixed_Accident_Index) >1);
You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).
A query that should work is here:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
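If it matters which of the duplicate rows survives, an ORDER BY can be added inside the OVER clause. This is only a sketch and assumes a timestamp-like column to rank by (created_at here is hypothetical; it is not in the original question):
SELECT *
FROM (
  SELECT
      *,
      ROW_NUMBER()
          OVER (PARTITION BY Fixed_Accident_Index
                ORDER BY created_at DESC) -- hypothetical column; keeps the newest row per key
          row_number
  FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1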
UPDATE 2019: To de-duplicate rows on a single partition with a MERGE, see:
https://stackoverflow.com/a/57900778/132438
An alternative to Jordan's answer - this one scales better when there are too many duplicates:
#standardSQL
SELECT event.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.created_at DESC LIMIT 1
)[OFFSET(0)] event
FROM `githubarchive.month.201706` t
# GROUP BY the id you are de-duplicating by
GROUP BY actor.id
)
Or a shorter version (takes any row, instead of the newest one):
SELECT k.*
FROM (
SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
FROM `fh-bigquery.reddit_comments.2017_01` x
GROUP BY id
)
To de-duplicate rows on an existing table:
CREATE OR REPLACE TABLE `deleting.deduplicating_table`
AS
# SELECT id FROM UNNEST([1,1,1,2,2]) id
SELECT k.*
FROM (
SELECT ARRAY_AGG(row LIMIT 1)[OFFSET(0)] k
FROM `deleting.deduplicating_table` row
GROUP BY id
)
Not sure why nobody mentioned DISTINCT query.
Here is the way to clean duplicate rows:
CREATE OR REPLACE TABLE project.dataset.table
AS
SELECT DISTINCT * FROM project.dataset.table
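One caveat worth hedging: CREATE OR REPLACE TABLE ... AS SELECT does not carry over partitioning or clustering by itself, so they have to be re-declared in the DDL. A sketch, assuming the table were partitioned on a hypothetical created_at timestamp column:
CREATE OR REPLACE TABLE project.dataset.table
PARTITION BY DATE(created_at) -- re-declare the original partitioning (hypothetical column)
AS
SELECT DISTINCT * FROM project.dataset.table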
If your schema doesn't have any RECORD (nested) fields, the variation of Jordan's answer below will work well enough, writing over the same table, to a new one, etc.
SELECT <list of original fields>
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS pos
FROM Accidents.CleanedFilledCombined
)
WHERE pos = 1
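In standard SQL the original field list does not have to be spelled out; SELECT * EXCEPT can drop the helper column instead (a sketch of the same query):
SELECT * EXCEPT(pos)
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS pos
  FROM Accidents.CleanedFilledCombined
)
WHERE pos = 1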
In a more generic case - with a complex schema containing RECORD/nested fields, etc. - the above approach can be a challenge.
I would propose trying the tabledata.insertAll API with rows[].insertId set to the respective Fixed_Accident_Index for each row.
In this case duplicate rows will be eliminated by BigQuery.
Of course, this involves some client-side coding - so it might not be relevant for this particular question.
I haven't tried this approach myself either, but I feel it might be interesting to try :o)
If you have a large partitioned table and only have duplicates in a certain partition range, you don't want to scan or process the whole table. Use the MERGE SQL below with predicates on the partition range:
-- WARNING: back up the table before this operation
-- FOR large size timestamp partitioned table
-- -------------------------------------------
-- -- To de-duplicate rows of a given range of a partition table, using surrogate_key as unique id
-- -------------------------------------------
DECLARE dt_start DEFAULT TIMESTAMP("2019-09-17T00:00:00", "America/Los_Angeles") ;
DECLARE dt_end DEFAULT TIMESTAMP("2019-09-22T00:00:00", "America/Los_Angeles");
MERGE INTO `gcp_project`.`data_set`.`the_table` AS INTERNAL_DEST
USING (
SELECT k.*
FROM (
SELECT ARRAY_AGG(original_data LIMIT 1)[OFFSET(0)] k
FROM `gcp_project`.`data_set`.`the_table` AS original_data
WHERE stamp BETWEEN dt_start AND dt_end
GROUP BY surrogate_key
)
) AS INTERNAL_SOURCE
ON FALSE
WHEN NOT MATCHED BY SOURCE
AND INTERNAL_DEST.stamp BETWEEN dt_start AND dt_end -- remove all data in partition range
THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW
credit: https://gist.github.com/hui-zheng/f7e972bcbe9cde0c6cb6318f7270b67a
An easier answer, without a subselect:
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
WHERE TRUE
QUALIFY row_number = 1
The WHERE TRUE is necessary because QUALIFY requires a WHERE, GROUP BY, or HAVING clause.
Felipe's answer is the best approach for most cases. Here is a more elegant way to accomplish the same:
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined
AS
SELECT
Fixed_Accident_Index,
ARRAY_AGG(x LIMIT 1)[SAFE_OFFSET(0)].* EXCEPT(Fixed_Accident_Index)
FROM Accidents.CleanedFilledCombined AS x
GROUP BY Fixed_Accident_Index;
To be safe, make sure you back up the original table before you run this ^^
I don't recommend using the ROW_NUMBER() OVER() approach if possible, since you may run into BigQuery memory limits and get unexpected errors.
Update the BigQuery schema with a new column bq_uuid, making it NULLABLE and of type STRING.

Create duplicate rows by running the same command 5 times, for example:
insert into beginner-290513.917834811114.messages (id, type, flow, updated_at) Values(19999,"hello", "inbound", '2021-06-08T12:09:03.693646')
Check whether duplicate entries exist:
select * from beginner-290513.917834811114.messages where id = 19999
Use the GENERATE_UUID() function to assign a UUID to each message:

UPDATE beginner-290513.917834811114.messages
SET bq_uuid = GENERATE_UUID()
where id>0
Clean up the duplicate entries:
DELETE FROM beginner-290513.917834811114.messages
WHERE bq_uuid IN
(SELECT bq_uuid
FROM
(SELECT bq_uuid,
ROW_NUMBER() OVER( PARTITION BY updated_at
ORDER BY bq_uuid ) AS row_num
FROM beginner-290513.917834811114.messages ) t
WHERE t.row_num > 1 );
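To verify the cleanup, a quick check for any remaining duplicates can be run afterwards (a sketch against the same table, grouping on updated_at as the steps above do):
SELECT updated_at, COUNT(*) AS dup_count
FROM `beginner-290513.917834811114.messages`
GROUP BY updated_at
HAVING COUNT(*) > 1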

Recursive CTE working very slowly

I want to group the rows based on certain columns, i.e. if the data is the same in these columns in consecutive rows, assign them the same group number, and if it changes, assign a new one. This becomes complex because the same data in those columns can appear later in other rows, which must be given another group number since they are not consecutive with the previous group.
I used a recursive CTE for this purpose, and it gives the correct output, but it is so slow that iterating over 75k+ rows takes about 15 minutes. The code I used is:
WITH
cte AS (SELECT ROW_NUMBER () OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) AS RowNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen, SpecialismeCode, Specialismen
FROM t_opnames)
SELECT * INTO #ttt FROM cte;
WITH cte2 AS (SELECT TOP 1 RowNumber,
1 AS GroupNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen, SpecialismeCode, Specialismen
FROM #ttt
ORDER BY RowNumber
UNION ALL
SELECT c1.RowNumber,
CASE
WHEN c2.Afdelingscode <> c1.Afdelingscode
OR c2.Patient_ID <> c1.Patient_ID
OR c2.Opnametype <> c1.Opnametype
THEN c2.GroupNumber + 1
ELSE c2.GroupNumber
END AS GroupNumber,
c1.Opnamenummer,c1.Patient_ID,c1.AfdelingsCode,c1.Opnamedatum,c1.Opnamedatumtijd,c1.Ontslagdatum,c1.Ontslagdatumtijd,c1.IsSpoedopname,c1.OpnameType,c1.IsNuOpgenomen, SpecialismeCode, Specialismen
FROM cte2 c2
JOIN #ttt c1 ON c1.RowNumber = c2.RowNumber + 1
)
SELECT *
FROM cte2
OPTION (MAXRECURSION 0) ;
DROP TABLE #ttt
I tried to improve performance by putting the output of the first CTE in a temp table. That increased performance, but it is still too slow. So, how can I improve this code to run in under 10 seconds for 75k+ records? The output before cancelling the query is shown in the screenshot. As visible from the image, the data is the same in columns Afdelingscode, Patient_ID and Opnametype in RowNumbers 3, 5 and 6, but they have different GroupNumbers because the rows are not consecutive.
Without data it's not that easy to test, but I would first try not using a temporary table and just using both CTEs from start to end, i.e.:
;WITH
cte AS (...),
cte2 AS (...)
select * from cte2
OPTION (MAXRECURSION 0);
Without knowing the indexes etc... for instance, you do a lot of ordering in the first CTE. Is this supported by indexes (or one multi-column index) or not?
Without the data I don't have the option to play with it, but looking at this:
CASE
WHEN c2.Afdelingscode <> c1.Afdelingscode
OR c2.Patient_ID <> c1.Patient_ID
OR c2.Opnametype <> c1.Opnametype
THEN c2.GroupNumber + 1
ELSE c2.GroupNumber
I would take a look at the PARTITION BY clause of ROW_NUMBER.
So try running this:
WITH
cte AS (
SELECT ROW_NUMBER () OVER (PARTITION BY Afdelingscode , Patient_ID ,Opnametype ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd ) AS RowNumber,
Opnamenummer, Patient_ID, AfdelingsCode, Opnamedatum, Opnamedatumtijd, Ontslagdatum, Ontslagdatumtijd, IsSpoedopname, OpnameType, IsNuOpgenomen
FROM t_opnames)
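For completeness, a non-recursive sketch of the same grouping using the gaps-and-islands pattern (this is not from the answer above; it assumes the same ordering columns as the original CTE): LAG flags every row where one of the three columns changes, and a running SUM over those flags becomes the group number in a single pass.
WITH ordered AS (
    SELECT *,
           CASE WHEN LAG(AfdelingsCode) OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) = AfdelingsCode
                 AND LAG(Patient_ID)    OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) = Patient_ID
                 AND LAG(Opnametype)    OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd) = Opnametype
                THEN 0 ELSE 1 END AS IsNewGroup -- 1 marks the start of a new group
    FROM t_opnames
)
SELECT *,
       SUM(IsNewGroup) OVER (ORDER BY Patient_ID, Opnamenummer, SPECIALISMEN, Opnametype, OntslagDatumTijd
                             ROWS UNBOUNDED PRECEDING) AS GroupNumber
FROM ordered;
Because there is no row-by-row recursion, this usually scales far better than the recursive CTE on 75k+ rows.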

Slow Oracle query being used in a Tcl script

I have written this query to be used with orasql and orafetch in a Tcl script. The query is nested in a proc that is run as many times as the given date range requires (usually around 3 times per instance). The tables with $YYYYMM are full-month tables consisting of about 13 million rows. The $advocate variable will pull about 39 different account_ids, which reduces the number of rows returned substantially, though not as much as claim_status_code='4' (~460,000 rows). I have no idea what the current run time of the query is because I have not allowed it to run longer than about 30 minutes. I really need this to return in a couple of seconds, but I am just way too new to all of this to know how to improve the speed. I have looked at the Oracle documentation on optimizing queries and tried to incorporate those suggestions into what I have written. Any help at all would be much appreciated.
select
case
when clg_payor_id
is null
then payor_id1
else clg_payor_id
end
, count(unique era_id||invoice) count
from marge.e835_clp_$YYYYMM#e835v2
where claim_status_code='4' and payor_id1!='client'
group by case when clg_payor_id is null then payor_id1 else clg_payor_id end, era_id
having era_id
in (
select era_id
from marge.e835_checks_$YYYYMM#e835v2
where input_date between '$date1' and '$date2'
group by account_id, era_id
having account_id
in (
select ea.account_id
from cowboy.enrolled_submitters es
, cowboy.enrolled_accounts ea
where ea.submitter_id=es.submitter_id
and es.advocate='$advocate'
)
)
order by count desc
UPDATE: As suggested, I have moved the era_id condition out of the HAVING clause and into the WHERE. When I run this query it returns a lot of unwanted results because of all of the account_ids, but I can't figure out why. The query did actually run, though, and it only took ~9 secs, so much improved.
select
case
when clg_payor_id
is null
then payor_id1
else clg_payor_id
end
, account_id, count(unique era_id||invoice) count
from marge.e835_clp_$YYYYMM#e835v2
where claim_status_code='4'
and payor_id1!='client'
and era_id
in (
select era_id
from marge.e835_checks_$YYYYMM#e835v2
where input_date between '$date1' and '$date2'
group by account_id
, era_id
having account_id
in (
select ea.account_id
from cowboy.enrolled_submitters es
, cowboy.enrolled_accounts ea
where ea.submitter_id=es.submitter_id
and es.advocate='$advocate'
)
)
group by case when clg_payor_id is null then payor_id1 else clg_payor_id end, account_id
order by count desc
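A further sketch (untested, not from the original post): the inner GROUP BY / HAVING can usually be replaced by plain WHERE predicates, and dropping account_id from the outer SELECT and GROUP BY should remove the per-account rows that were not wanted:
select
  case when clg_payor_id is null then payor_id1 else clg_payor_id end
  , count(unique era_id||invoice) count
from marge.e835_clp_$YYYYMM#e835v2
where claim_status_code='4'
  and payor_id1!='client'
  and era_id in (
        select era_id
        from marge.e835_checks_$YYYYMM#e835v2
        where input_date between '$date1' and '$date2'
          and account_id in (
                select ea.account_id
                from cowboy.enrolled_submitters es
                   , cowboy.enrolled_accounts ea
                where ea.submitter_id=es.submitter_id
                  and es.advocate='$advocate'
          )
  )
group by case when clg_payor_id is null then payor_id1 else clg_payor_id end
order by count desc
COALESCE(clg_payor_id, payor_id1) would be an equivalent, shorter way to write the CASE expression.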
