I want to check whether the array of values (4690, 4693) both is exists in the contextid column without using functions as the table contains more that a million records
Table structure:
ID
CONTEXTID
4
4690
5
4690
6
4693
7
4693
8
4690
What about this query?
select
case when count(distinct CONTEXTID) = 2 then 'Y' else 'N' end as contains_4690_4693
from tab
where CONTEXTID in (4690, 4693)
it retuns Y if both keys are in the table at least once, N otherwise.
If you just want to find out if they exist then
SELECT DISTINCT(CONTEXTID)
FROM SOME_TABLE
wHERE CONTEXTID IN (4690, 4693)
will do it. If CONTEXTID isn't indexed, though, the database will have to do a full table scan which will probably be slow.
Takeaway: add an index on CONTEXTID or live with the fact that it's going to be slow.
If the "test" values are known when you write the query (as they very rarely are - even though all the solutions presented so far make the implicit assumption that they are), you could do something like this - which is probably the most efficient way, regardless of whether there is an index on the relevant column or not:
select case when exists
( select *
from sys.odcinumberlist(4690, 4693)
where column_value not in ( select contextid
from the_table
where context_id is not null
)
) then 'Not all found' else 'All found' end as result
from dual
;
Note how I gave an array of input values to the query - I used the sys.odcinumberlist constructor. You will have to clarify how you plan to "input" an array of "test" values.
Related
Does apex_string.split always guarantee that the order of the rows returned is the order of the characters of the string ?
Can I rely on the rownum to always correspond to 1 for the first character of the split string ?
or do I need to add a order by rownum ?
What is the method to get the rows in the same order of the characters of the string ?
My requirement is to insert the rows returned by apex_string.split in the same order as the characters of the string.
I am currently executing the below, will this maintain the character order ?
select t.column_value value, rownum seq
from table(apex_string.split('test','')) t
bulk collect into ins_arr;
for i in ins_arr.first..ins_arr.last
loop
/* execute insert statement */
insert into table (seq, value )
values (ins_arr.seq,ins_arr.value);
end loop
The insert should result in
seq
value
1
t
2
e
3
s
4
t
Thank you in advance,
I don't think it's guaranteed, becuase if it was, it would be in the documentation. But I think you can accomplish what you want by changing your routine. (Note, I have not verified this.)
insert into table (seq, value)
select t.column_value value,
row_number() over (order by t.column_value)
from table(apex_string.split('test','')) t
I think you can do the same with rownum, but I'm never 100% sure what order the rownum and the order by happen in.
Am trying to list top 3 records from atable based on some amount stored in a column FTE_TMUSD which is of varchar datatype
below is the query i tried
SELECT *FROM
(
SELECT * FROM FSE_TM_ENTRY
ORDER BY FTE_TMUSD desc
)
WHERE rownum <= 3
ORDER BY FTE_TMUSD DESC ;
o/p i got
972,9680,963 -->FTE_TMUSD values which are not displayed in desc
I am expecting an o/p which will display the top 3 records of values
That should work; inline view is ordered by FTE_TMUSD in descending order, and you're selecting values from it.
What looks suspicious are values you specified as the result. It appears that FTE_TMUSD's datatype is VARCHAR2 (ah, yes - it is, you said so). It means that values are sorted as strings, not numbers - and it seems that you expect numbers. So, apply TO_NUMBER to that column. Note that it'll fail if column contains anything but numbers (for example, if there's a value 972C).
Also, an alternative to your query might be use of analytic functions, such as row_number:
with temp as
(select f.*,
row_number() over (order by to_number(f.fte_tmusd) desc) rn
from fse_tm_entry f
)
select *
from temp
where rn <= 3;
I have a table with >1M rows of data and 20+ columns.
Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).
If possible I would like to retain the original table name and remove the duplicate records from my problematic column otherwise I could create a new table (tableXfinal) with the same schema but without the duplicates.
I am not proficient in SQL or any other programming language so please excuse my ignorance.
delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index
in(select Fixed_Accident_Index from Accidents.CleanedFilledCombined
group by Fixed_Accident_Index
having count(Fixed_Accident_Index) >1);
You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).
A query that should work is here:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
UPDATE 2019: To de-duplicate rows on a single partition with a MERGE, see:
https://stackoverflow.com/a/57900778/132438
An alternative to Jordan's answer - this one scales better when having too many duplicates:
#standardSQL
SELECT event.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.created_at DESC LIMIT 1
)[OFFSET(0)] event
FROM `githubarchive.month.201706` t
# GROUP BY the id you are de-duplicating by
GROUP BY actor.id
)
Or a shorter version (takes any row, instead of the newest one):
SELECT k.*
FROM (
SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
FROM `fh-bigquery.reddit_comments.2017_01` x
GROUP BY id
)
To de-duplicate rows on an existing table:
CREATE OR REPLACE TABLE `deleting.deduplicating_table`
AS
# SELECT id FROM UNNEST([1,1,1,2,2]) id
SELECT k.*
FROM (
SELECT ARRAY_AGG(row LIMIT 1)[OFFSET(0)] k
FROM `deleting.deduplicating_table` row
GROUP BY id
)
Not sure why nobody mentioned DISTINCT query.
Here is the way to clean duplicate rows:
CREATE OR REPLACE TABLE project.dataset.table
AS
SELECT DISTINCT * FROM project.dataset.table
If your schema doesn’t have any records - below variation of Jordan’s answer will work well enough with writing over same table or new one, etc.
SELECT <list of original fields>
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS pos,
FROM Accidents.CleanedFilledCombined
)
WHERE pos = 1
In more generic case - with complex schema with records/netsed fields, etc. - above approach can be a challenge.
I would propose to try using Tabledata: insertAll API with rows[].insertId set to respective Fixed_Accident_Index for each row.
In this case duplicate rows will be eliminated by BigQuery
Of course, this will involve some client side coding - so might be not relevant for this particular question.
I havent tried this approach by myself either but feel it might be interesting to try :o)
If you have a large-size partitioned table, and only have duplicates in a certain partition range. You don't want to overscan nor process the whole table. use the MERGE SQL below with predicates on partition range:
-- WARNING: back up the table before this operation
-- FOR large size timestamp partitioned table
-- -------------------------------------------
-- -- To de-duplicate rows of a given range of a partition table, using surrage_key as unique id
-- -------------------------------------------
DECLARE dt_start DEFAULT TIMESTAMP("2019-09-17T00:00:00", "America/Los_Angeles") ;
DECLARE dt_end DEFAULT TIMESTAMP("2019-09-22T00:00:00", "America/Los_Angeles");
MERGE INTO `gcp_project`.`data_set`.`the_table` AS INTERNAL_DEST
USING (
SELECT k.*
FROM (
SELECT ARRAY_AGG(original_data LIMIT 1)[OFFSET(0)] k
FROM `gcp_project`.`data_set`.`the_table` AS original_data
WHERE stamp BETWEEN dt_start AND dt_end
GROUP BY surrogate_key
)
) AS INTERNAL_SOURCE
ON FALSE
WHEN NOT MATCHED BY SOURCE
AND INTERNAL_DEST.stamp BETWEEN dt_start AND dt_end -- remove all data in partiion range
THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW
credit: https://gist.github.com/hui-zheng/f7e972bcbe9cde0c6cb6318f7270b67a
Easier answer, without a subselect
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
WHERE TRUE
QUALIFY row_number = 1
The Where True is neccesary because qualify needs a where, group by or having clause
Felipe's answer is the best approach for most cases. Here is a more elegant way to accomplish the same:
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined
AS
SELECT
Fixed_Accident_Index,
ARRAY_AGG(x LIMIT 1)[SAFE_OFFSET(0)].* EXCEPT(Fixed_Accident_Index)
FROM Accidents.CleanedFilledCombined AS x
GROUP BY Fixed_Accident_Index;
To be safe, make sure you backup the original table before you run this ^^
I don't recommend to use ROW NUMBER() OVER() approach if possible since you may run into BigQuery memory limits and get unexpected errors.
Update BigQuery schema with new table column as bq_uuid making it NULLABLE and type STRING
Create duplicate rows by running same command 5 times for example
insert into beginner-290513.917834811114.messages (id, type, flow, updated_at) Values(19999,"hello", "inbound", '2021-06-08T12:09:03.693646')
Check if duplicate entries exist
select * from beginner-290513.917834811114.messages where id = 19999
Use generate uuid function to generate uuid corresponding to each message
UPDATE beginner-290513.917834811114.messages
SET bq_uuid = GENERATE_UUID()
where id>0
Clean duplicate entries
DELETE FROM beginner-290513.917834811114.messages
WHERE bq_uuid IN
(SELECT bq_uuid
FROM
(SELECT bq_uuid,
ROW_NUMBER() OVER( PARTITION BY updated_at
ORDER BY bq_uuid ) AS row_num
FROM beginner-290513.917834811114.messages ) t
WHERE t.row_num > 1 );
I need a query to return boolean when there's table has data in the given range.
Assume table
Customer
[User ID, Name, Date, Products_Purchased]
I'm trying to do:
select case when exists(
select Date, count(*)
from Customer
where date between '2015-08-03' and '2015-08-05'
)
then cast(1 as BIT)
else case(0 as BIT)end;
This is throwing an error near "select Date".
However, weird part is the inner query is running perfectly fine.
Im wondering if im missing out something here !
What about something more straightforward e.g.
select case when count(*) >0 then 1 else 0 end as HIT
from ... where ...
That way you don't have to bother about Hive assuming that EXISTS implies a correlated sub-query, automagically translated into a MapJoin, i.e. a Java HashMap shuffled to the 2nd line of Mappers jobs, etc. Not exactly your use case.
Then it's not useful to compute the exact count, so the query could be refined as
select case when count(*) >0 then 1 else 0 end as HIT
from
(select ... from ... where ... limit 1) X
[Edit] There is no "bit" datatype in Hive. But the default "int" should be OK if you just want a return flag (zero / non-zero)
I have a row that is a varchar(50) that has a unique constraint and i would like to get the next unique number for an new insert but with a given prefix.
My rows could look like this:
ID (varchar)
00010001
00010002
00010003
00080001
So if I would like to get the next unqiue number from the prefix "0001" it would be "00010004" but if I would want it for the prefix "0008" it would be "00080002".
There will be more then 1 millon entries in this table. Is there a way with Oracle 11 to perform this kind of operation that is fairly fast?
I know that this setup is totaly insane but this is what I have to work with. I cant create any new tables etc.
You can search for the max value of the specified prefix and increment it:
SQL> WITH DATA AS (
2 SELECT '00010001' id FROM DUAL UNION ALL
3 SELECT '00010002' id FROM DUAL UNION ALL
4 SELECT '00010003' id FROM DUAL UNION ALL
5 SELECT '00080001' id FROM DUAL
6 )
7 SELECT :prefix || to_char(MAX(to_number(substr(id, 5)))+1, 'fm0000') nextval
8 FROM DATA
9 WHERE ID LIKE :prefix || '%';
NEXTVAL
---------
00010004
I'm sure you're aware that this is an inefficient method to generate a primary key. Furthermore it won't play nicely in a multi-user environment and thus won't scale. Concurrent inserts will wait then fail since there is a UNIQUE constraint on the column.
If the prefix is always the same length, you can reduce the workload somewhat: you could create a specialized index that would find the max value in a minimum number of steps:
CREATE INDEX ix_fetch_max ON your_table (substr(id, 1, 4),
substr(id, 5) DESC);
Then the following query could use the index and will stop at the first row retrieved:
SELECT id
FROM (SELECT substr(id, 1, 4) || substr(id, 5) id
FROM your_table
WHERE substr(id, 1, 4) = :prefix
ORDER BY substr(id, 5) DESC)
WHERE rownum = 1
If you need to do simultaneous inserts with the same prefix, I suggest you use DBMS_LOCK to request a lock on the specified newID. If the call fails because someone is already inserting this value, try with newID+1. Although this involves more work than traditional sequence, at least your inserts won't wait on each others (potentially leading to deadlocks).
This is a very unsatisfactory situation for you. As other posters have pointed out - if you don't use sequences then you will almost certainly have concurrency issues. I mentioned in a comment the possibility that you live with big gaps. This is the simplest solution but you will run out of numbers after 9999 inserts.
Perhaps an alternative would be to create a separate sequence for each prefix. This would only really be practical if the number of prefixes is fairly low but it could be done.
ps - your requirement that > 1000000 records should be possible may, in fact, mean you have no choice but to redesign the database.
SELECT to_char(to_number(max(id)) + 1, '00000000')
FROM mytable
WHERE id LIKE '0001%'
SQLFiddle demo here http://sqlfiddle.com/#!4/4f543/5/0