Comparison retrieving same value as differences - oracle

I am trying to compare the date fields between two of my tables, T1 and T2.
I have written a query similar to the one below to do the comparison:
Query:
SELECT distinct t1.id t1_id, t2.id t2_id, t1.source t1_source, t2.source t2_source, to_date(t1.t1_date,'MM-DD-YYYY') t1_date, to_char(t2.t2_date,'DD-MON-YY') t2_date
FROM t1
JOIN t2 ON t1.id=t2.id
AND t1.source = t2.source
and to_date(t1.t1_date,'MM-DD-YYYY') <> to_char(t2.t2_date,'DD-MON-YY');
Even though T1_DATE and T2_DATE are identical, the results show them as not equal:
------------------------------------------------------------------------
T1_ID | T2_ID | T1_SOURCE | T2_SOURCE | T1_DATE | T2_DATE
------------------------------------------------------------------------
123 | 123 | SOU | SOU | 17-FEB-47 | 17-FEB-47
234 | 234 | SOU | SOU | 01-JAN-49 | 01-JAN-49
------------------------------------------------------------------------
I copied and pasted T1_DATE and T2_DATE into Notepad and confirmed that they don't have any leading or trailing spaces.
I also verified in the table itself that they don't have any leading or trailing spaces.
The date condition I added for the comparison doesn't contain any spaces either.
I then updated my query to add TRIM to the condition:
Updated query:
SELECT distinct t1.id t1_id, t2.id t2_id, t1.source t1_source, t2.source t2_source, to_date(t1.t1_date,'MM-DD-YYYY') t1_date, to_char(t2.t2_date,'DD-MON-YY') t2_date
FROM t1
JOIN t2 ON t1.id=t2.id
AND t1.source = t2.source
and trim(to_date(t1.t1_date,'MM-DD-YYYY')) <> trim(to_char(t2.t2_date,'DD-MON-YY'));
Now it works fine. What is happening behind the scenes?
Please advise.

When you compare with:
to_date(t1.t1_date,'MM-DD-YYYY') <> to_char(t2.t2_date,'DD-MON-YY');
you are comparing a date on the left-hand side with a string on the right-hand side. That is forcing Oracle to do an implicit conversion so that they are both the same data type, to allow the values to be compared.
The data comparison rules say:
When comparing a character value with a DATE value, Oracle converts the character data to DATE.
So the string you just generated is converted back to a date, so you're really doing:
to_date(t1.t1_date,'MM-DD-YYYY') <> to_date(to_char(t2.t2_date,'DD-MON-YY'))
... which uses your NLS settings, and your NLS_DATE_FORMAT appears to be DD-MON-YY too. The string is '17-FEB-47', and when that is converted back to a date, the two-digit YY format element assumes the current century, so it comes out as 2047, not 1947. So you end up effectively doing:
date '1947-02-17' <> date '2047-02-17'
... which is true, as the dates are in different centuries, hence your rows are displayed.
But when you do:
trim(to_date(t1.t1_date,'MM-DD-YYYY')) <> trim(to_char(t2.t2_date,'DD-MON-YY'))
you are now comparing two strings. The left-hand side is converted to a date, and TRIM then implicitly converts that date back to a string, again using your NLS settings; the right-hand side is already a string and isn't modified. So you end up effectively doing:
'17-FEB-47' <> '17-FEB-47'
... which is false, so the rows are not shown.
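The two code paths above can be simulated in a few lines of Python. This is only a sketch of the conversion rules as described here, not Oracle's implementation. It assumes NLS_DATE_FORMAT is DD-MON-YY, English month abbreviations, and a current year in the 2000s. The helper names and the `current_year` parameter are hypothetical.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def to_char_dd_mon_yy(d: date) -> str:
    """Format a date like the 'DD-MON-YY' mask (English month names assumed)."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year % 100:02d}"


def to_date_dd_mon_yy(s: str, current_year: int) -> date:
    """Parse 'DD-MON-YY' the way the YY element does: the century is the current one."""
    day, mon, yy = s.split("-")
    century = current_year // 100 * 100
    return date(century + int(yy), MONTHS.index(mon) + 1, int(day))


t1_date = date(1947, 2, 17)             # to_date(t1.t1_date, 'MM-DD-YYYY')
t2_date = date(1947, 2, 17)             # t2.t2_date
t2_string = to_char_dd_mon_yy(t2_date)  # to_char(t2.t2_date, 'DD-MON-YY') -> '17-FEB-47'

# Original query: DATE <> string, so the string is implicitly turned back into a DATE.
implicit = to_date_dd_mon_yy(t2_string, current_year=2024)
print(implicit)             # 2047-02-17
print(t1_date != implicit)  # True -> row is reported as a difference

# TRIM version: both sides are strings, so '17-FEB-47' <> '17-FEB-47' is False.
print(to_char_dd_mon_yy(t1_date) != t2_string)  # False -> row is not reported
```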
So, either leave the right-hand side as its original date value:
to_date(t1.t1_date,'MM-DD-YYYY') <> t2.t2_date
or, if it has a time component you want to ignore, truncate it but leave it as a date:
to_date(t1.t1_date,'MM-DD-YYYY') <> trunc(t2.t2_date)
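As a rough Python analogy, Python `datetime` values can stand in for Oracle DATEs, which always carry a time part. The analogy shows why truncating before comparing makes two values on the same day compare as equal while both sides stay dates. The `trunc` helper and the sample values are hypothetical.

```python
from datetime import datetime

# to_date(t1.t1_date, 'MM-DD-YYYY'): a date whose time part is midnight.
t1_value = datetime.strptime("02-17-1947", "%m-%d-%Y")
# Hypothetical t2.t2_date that also carries a time of day.
t2_value = datetime(1947, 2, 17, 10, 30)


def trunc(d: datetime) -> datetime:
    """Mimic Oracle TRUNC(date): keep the day, reset the time to midnight."""
    return d.replace(hour=0, minute=0, second=0, microsecond=0)


print(t1_value != t2_value)         # True: the time part makes them differ
print(t1_value != trunc(t2_value))  # False: same calendar day, still compared as dates
```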


Impala Query to get next date

I have 2 Impala tables.
The first table, T1, has additional columns, but I am only interested in date and in day_type being Weekday:
| date       | day_type |
|------------|----------|
| 04/01/2020 | Weekday  |
| 04/02/2020 | Weekday  |
| 04/03/2020 | Weekday  |
| 04/04/2020 | Weekend  |
| 04/05/2020 | Weekend  |
| 04/06/2020 | Weekday  |
The second table, T2:
| process | date       | status    |
|---------|------------|-----------|
| A       | 04/01/2020 | finished  |
| A       | 04/02/2020 | finished  |
| A       | 04/03/2020 | finished  |
| A       | 04/03/2020 | run_again |
Using Impala queries, I have to get the maximum date from the second table, T2, and its status. In the tables above, 04/03 is the maximum date.
If the status on 04/03 is finished, my query should return the next available weekday date from T1, which is 04/06/2020.
But if the status is run_again, the query should return the same date.
In the table above, 04/03 has run_again, so when my query runs the output should be 04/03/2020, not 04/06/2020.
Please note that more than one status is possible for a date. For example, 04/03/2020 can have one row with status finished and another with status run_again. In this case run_again should take priority, and the query should return 04/03/2020 as the output date.
What I tried so far:
I ran a subquery on the second table and got the maximum date and its status. I then tried to use a CASE in my main query, with T1 as a subselect inside the CASE statement, but it's not working.
Is it possible to achieve this through Impala query?
One way to do this is to create a CTE from table T1 instead of a correlated subquery. Something like:
WITH T3 as (
select t.date date, min(x.date) next_workday
from T1 t join T1 x
on t.date < x.date
where x.day_type = 'Weekday'
group by t.date
)
select T2.process, T2.date run_date, T2.status,
case when T2.status = 'finished' then T3.next_workday
else T3.date
end next_run_date
from T2 join T3
on T2.date = T3.date
order by T2.process, T2.date;
+---------+------------+-----------+---------------+
| process | run_date | status | next_run_date |
+---------+------------+-----------+---------------+
| A | 2020-04-01 | finished | 2020-04-02 |
| A | 2020-04-02 | finished | 2020-04-03 |
| A | 2020-04-03 | run again | 2020-04-03 |
+---------+------------+-----------+---------------+
You can then select max from the result instead of ordering.
There might be multiple solutions and even some better ones considering performance but this is my approach. Hope it helps.
select case when status = 'run_again' then t2_date else t1_date end as needed_date
from t2
cross join (
  select t1_date
  from t1
  where t1.day_type = 'Weekday'
    and t1_date > (select max(t2_date) from t2)
  order by t1.t1_date
  limit 1
) a
where t2_date = (select max(t2_date) from t2);
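The rule the queries above implement can be checked against the question's sample data with a small in-memory sketch. This sketch is not Impala code. The lists stand in for T1 and T2, and `next_run_date` is a hypothetical helper. It also applies the question's requirement that run_again takes priority when the maximum date has several statuses.

```python
from datetime import date

# Hypothetical in-memory copies of the question's tables.
t1 = [
    (date(2020, 4, 1), "Weekday"),
    (date(2020, 4, 2), "Weekday"),
    (date(2020, 4, 3), "Weekday"),
    (date(2020, 4, 4), "Weekend"),
    (date(2020, 4, 5), "Weekend"),
    (date(2020, 4, 6), "Weekday"),
]
t2 = [
    ("A", date(2020, 4, 1), "finished"),
    ("A", date(2020, 4, 2), "finished"),
    ("A", date(2020, 4, 3), "finished"),
    ("A", date(2020, 4, 3), "run_again"),
]


def next_run_date(t1_rows, t2_rows):
    """Return the max T2 date if any row on it is run_again, else the next T1 weekday."""
    max_date = max(d for _, d, _ in t2_rows)
    statuses = {s for _, d, s in t2_rows if d == max_date}
    if "run_again" in statuses:  # run_again takes priority over finished
        return max_date
    later_weekdays = [d for d, kind in t1_rows if kind == "Weekday" and d > max_date]
    return min(later_weekdays) if later_weekdays else None


print(next_run_date(t1, t2))      # 2020-04-03
print(next_run_date(t1, t2[:3]))  # 2020-04-06 (only 'finished' on 04/03)
```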

SQL Error: ORA-01861: literal does not match format string 01861 being thrown on SQL select of a NUMBER field

We reference a view in an Oracle database that has some columns containing decimal days, produced by calculations like the one below:
CASE
WHEN
(
date_field_a is null
OR date_field_b IS NULL
)
THEN
-9999
ELSE
to_number(to_char(to_date(date_field_a, 'DD-MON-YYYY') - to_date(date_field_b, 'DD-MON-YYYY')),'9999.9999')
END
This results in a NUMBER column containing NULL values, -9999, or decimal values. The calculation seems to be performed correctly and throws no errors when the view is created.
However, when we pull this view into Crystal Reports or another program, this field and several others like it cause this error:
SQL Error: ORA-01861: literal does not match format string 01861
This doesn't make sense to me, since these fields are registered as NUMBER fields.
What is causing this error?
Edit:
date_field_a and date_field_b are datetime data types.
Like Caius said, your underlying table probably has some data which are not formatted the way you expect. If that's the case, you can query the table to try to find the problem rows, e.g.
select *
from YOUR_TABLE
where not regexp_like(date_field_a, '^[0-9][0-9]?-[a-zA-Z]{3}-[0-9]{4}$')
or not regexp_like(date_field_b, '^[0-9][0-9]?-[a-zA-Z]{3}-[0-9]{4}$');
Note: this simple regexp won't catch a misspelled month, but that seems unlikely.
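Before running the query, you can try the same shape check outside the database. Python's `re` accepts this pattern syntax unchanged. The sketch below anchors the pattern with `^` and `$` so that the whole value must look like DD-MON-YYYY. The sample strings and the `is_suspect` helper are hypothetical. As the note says, a bad month name such as XYZ still passes.

```python
import re

# Anchored version of the pattern: the whole value must have the DD-MON-YYYY shape.
DATE_TEXT = re.compile(r"^[0-9][0-9]?-[a-zA-Z]{3}-[0-9]{4}$")


def is_suspect(value: str) -> bool:
    """Return True if the value does not have the DD-MON-YYYY shape."""
    return DATE_TEXT.match(value) is None


samples = ["17-FEB-1947", "1-Jan-2020", "2020-01-01", "17/02/1947", "17-XYZ-1947"]
print([s for s in samples if is_suspect(s)])  # ['2020-01-01', '17/02/1947']
```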
Another alternative is that, like Wernfried suggested, if your date_field_a is actually a DATE data type, then your problem is likely due to implicit conversion - you're relying on the NLS_DATE_FORMAT environment variable being identical on all client machines, which is impossible to guarantee.
If your columns are DATE, don't wrap them in a TO_DATE() call - it will do an implicit conversion and actually call to_date(to_char(date_field_a, NLS_DATE_FORMAT), 'DD-MON-YYYY') - and you will get unpredictable results.
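The hidden round trip can be imitated in Python to show why identical code works for one client and fails for another. This is a sketch, not Oracle's behavior verbatim. The two NLS settings are hypothetical and are written as `strftime` patterns, `to_date_on_a_date` is an illustrative name, and `%b` assumes an English (C) locale. `ValueError` plays the role of ORA-01861.

```python
from datetime import datetime

# Hypothetical session NLS_DATE_FORMAT values, mapped to Python strftime patterns.
NLS_FORMATS = {"DD-MON-YYYY": "%d-%b-%Y", "YYYY-MM-DD": "%Y-%m-%d"}


def to_date_on_a_date(value: datetime, session_nls: str) -> datetime:
    """What TO_DATE(date_col, 'DD-MON-YYYY') effectively does when date_col is a DATE."""
    as_text = value.strftime(NLS_FORMATS[session_nls])  # implicit TO_CHAR with the session format
    return datetime.strptime(as_text, "%d-%b-%Y")       # the explicit TO_DATE mask


d = datetime(2020, 4, 1)
print(to_date_on_a_date(d, "DD-MON-YYYY"))  # 2020-04-01 00:00:00
try:
    to_date_on_a_date(d, "YYYY-MM-DD")
except ValueError as exc:  # analogue of ORA-01861 on a client with a different setting
    print("fails:", exc)
```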
Description
When you encounter an ORA-01861 error, the following error message will appear:
ORA-01861: literal does not match format string
Cause
You tried to enter a literal with a format string, but the format string was not the same length as the literal.
Resolution
Re-enter the literal so that it matches the format string.
For example, if you tried to execute the following statement:
SELECT TO_DATE('20041308','yyyy/mm/dd') FROM dual;
You would receive the following error message:
You could correct the SQL statement as follows:
SELECT TO_DATE('2004/08/13','yyyy/mm/dd') FROM dual;
As a general rule, if you are using the TO_DATE function, TO_TIMESTAMP function, TO_CHAR function, and similar functions, make sure that the literal that you provide matches the format string that you've specified.
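Python's `datetime.strptime` follows the same principle: the literal must match its format string. The sketch below is only an analogy. Python is stricter than Oracle's TO_DATE about separators, and its error is a `ValueError` rather than an ORA code.

```python
from datetime import datetime

# The literal has no '/' separators, so it does not match the mask.
try:
    datetime.strptime("20041308", "%Y/%m/%d")
except ValueError as exc:
    print("mismatch:", exc)

# Literal and mask agree, so parsing succeeds.
print(datetime.strptime("2004/08/13", "%Y/%m/%d").date())  # 2004-08-13
```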
Per the edit to your question:
Edit: date_field_a and date_field_b are datetime data types.
Then you should ditch all this conversion code and go with:
CREATE VIEW x AS
SELECT
...
COALESCE(date_field_a - date_field_b, -9999) as yournumber,
...
FROM
yourtable
If the columns really are DATEs, subtracting them gives you the number of days, including fractions of a day, between the two. If two dates are exactly 36 hours apart, you get 1.5 (days).
If either of the dates is null, you get null. In that case you want -9999, and that's what the COALESCE is for.
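The arithmetic behind `COALESCE(date_field_a - date_field_b, -9999)` can be mirrored in Python. In this sketch, `None` stands in for NULL and `day_diff` is a hypothetical helper. The difference is converted to fractional days, so 36 hours becomes 1.5.

```python
from datetime import datetime
from typing import Optional


def day_diff(a: Optional[datetime], b: Optional[datetime]) -> float:
    """Mimic COALESCE(a - b, -9999) for DATEs: fractional days, or -9999 if either is NULL."""
    if a is None or b is None:
        return -9999
    return (a - b).total_seconds() / 86400  # 86400 seconds per day


print(day_diff(datetime(2020, 1, 2, 12), datetime(2020, 1, 1)))  # 1.5
print(day_diff(None, datetime(2020, 1, 1)))                      # -9999
```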
select sysdate - TO_DATE('2001-01-01', 'yyyy-mm-dd') from dual
| SYSDATE-TO_DATE('2001-01-01','YYYY-MM-DD') |
| -----------------------------------------: |
| 6848.718715277777777777777777777777777778 |
select sysdate - CAST(null as date) from dual
| SYSDATE-CAST(NULLASDATE) |
| -----------------------: |
| null |
select COALESCE(sysdate - CAST(null as date), -9999) from dual
| COALESCE(SYSDATE-CAST(NULLASDATE),-9999) |
| ---------------------------------------: |
| -9999 |

Extracting strings between distinct characters using hive SQL

I have a field called geo_data_display which contains country, region and DMA. The three values are contained between = and & characters: country between the first "=" and the first "&", region between the second "=" and the second "&", and DMA between the third "=" and the third "&". country is always a character value, but region and DMA can be either numeric or character, and DMA doesn't exist for all countries.
A few sample values are:
country=us&region=tx&dma=625&domain=abc.net&zipcodes=76549
country=us&region=ca&dma=803&domain=abc.com&zipcodes=90404
country=tw&region=hsz&domain=hinet.net&zipcodes=300
country=jp&region=1&dma=a&domain=hinet.net&zipcodes=300
I have some sample SQL, but the geo_dma line isn't working at all and the geo_region line only works for character values:
SELECT
UPPER(REGEXP_REPLACE(split(geo_data_display, '\\&')[0], 'country=', '')) AS geo_country
,UPPER(split(split(geo_data_display, '\\&')[1],'\\=')[1]) AS geo_region
,split(split(cast(geo_data_display as int), '\\&')[2],'\\=')[2] AS geo_dma
FROM mytable
You can use str_to_map like so:
select geo_map['country'] as geo_country
,geo_map['region'] as geo_region
,geo_map['dma'] as geo_dma
from (select str_to_map(geo_data_display,'&','=') as geo_map
from mytable
) t
;
+--------------+-------------+----------+
| geo_country | geo_region | geo_dma |
+--------------+-------------+----------+
| us | tx | 625 |
| us | ca | 803 |
| tw | hsz | NULL |
| jp | 1 | a |
+--------------+-------------+----------+
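The idea behind `str_to_map` is to split the string into pairs on `&` and then split each pair into key and value on `=`. The Python sketch below is a rough equivalent for illustration, not Hive's implementation. The `str_to_map` function here is a hypothetical stand-in, and `.get()` returning `None` plays the role of Hive returning NULL for a missing key such as `dma`.

```python
def str_to_map(text: str, pair_delim: str = "&", kv_delim: str = "=") -> dict:
    """Rough Python equivalent of Hive's str_to_map(text, '&', '=')."""
    result = {}
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        result[key] = value
    return result


rows = [
    "country=us&region=tx&dma=625&domain=abc.net&zipcodes=76549",
    "country=tw&region=hsz&domain=hinet.net&zipcodes=300",
]
for row in rows:
    geo = str_to_map(row)
    # A missing key gives None, much like geo_map['dma'] gives NULL in Hive.
    print(geo["country"], geo["region"], geo.get("dma"))
```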
From the Hive documentation:
regexp_extract(string subject, string pattern, int index)
Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 1) returns 'the'
select
regexp_extract(geo_data_display, 'country=([^&]*)', 1) as geo_country,
regexp_extract(geo_data_display, 'region=([^&]*)', 1) as geo_region,
regexp_extract(geo_data_display, 'dma=([^&]*)', 1) as geo_dma
from mytable;
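A pattern that stops at the next `&` does not depend on which key comes next. A pattern anchored on a specific following key, such as `&dma`, fails for rows like the `tw` sample that have no DMA. The sketch below shows the same idea with Python's `re`. The `extract` helper is hypothetical, and it returns `None` where Hive's `regexp_extract` would return an empty string.

```python
import re
from typing import Optional


def extract(text: str, key: str) -> Optional[str]:
    """Return the value after 'key=' up to the next '&', or None if the key is absent."""
    m = re.search(r"(?:^|&)" + re.escape(key) + r"=([^&]*)", text)
    return m.group(1) if m else None


row = "country=tw&region=hsz&domain=hinet.net&zipcodes=300"
print(extract(row, "region"), extract(row, "dma"))  # hsz None
```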
Please try the following:
create table ch8(details map<string,string>)
row format delimited
collection items terminated by '&'
map keys terminated by '=';
Load the data into the table.
Then create another table using CTAS:
create table ch9 as select details["country"] as country, details["region"] as region, details["dma"] as dma, details["domain"] as domain, details["zipcodes"] as zipcode from ch8;
Select * from ch9;

Oracle Trim RegEx

I am trying to execute a join between two tables in Oracle where the join column is a string in one table and a number in the other.
I need to apply some sort of trim function to the string version, because it is an 8-character field padded with leading 0s whenever the number has fewer than 8 digits.
For example, 123 = '00000123'. How can I get the string '00000123' to equal '123', regardless of the number of leading 0s?
Thanks!!
Use to_number conversion function:
SELECT to_number('00000123')
FROM dual;
| TO_NUMBER('00000123') |
|-----------------------|
| 123 |
Demo: http://sqlfiddle.com/#!4/1792d/18
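The same idea can be shown in Python with hypothetical sample rows. Converting the zero-padded text to a number, as TO_NUMBER does, makes it comparable to the numeric column. Padding the number out to 8 characters, as `LPAD(TO_CHAR(n), 8, '0')` would in Oracle, is the reverse option. In SQL, that option may be preferable when you want to keep an index on the string column usable.

```python
# Hypothetical rows: the string side is zero-padded to 8 characters, the other side is numeric.
padded_ids = ["00000123", "00004567", "12345678"]
numeric_ids = {123, 4567, 99}

# Convert the padded text to a number (like TO_NUMBER) before comparing.
matches = [s for s in padded_ids if int(s) in numeric_ids]
print(matches)  # ['00000123', '00004567']

# Or go the other way: pad the number to 8 characters with leading zeros.
print(f"{123:08d}")  # 00000123
```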

Use Oracle unnested VARRAYs instead of IN operator

Let's say users have 1 to n accounts in a system. When they query the database, they may choose to select from m accounts, with m between 1 and n. Typically the SQL generated to fetch their data is something like:
SELECT ... FROM ... WHERE account_id IN (?, ?, ..., ?)
So, depending on the number of accounts a user selects, the statement text changes, which causes a new hard parse in Oracle, a new execution plan, and so on. There are many queries like this, hence many hard parses, and the cursor/plan cache may fill up quite early, resulting in even more hard parses.
Instead, I could write something like this:
-- use any of these
CREATE TYPE numbers AS VARRAY(1000) of NUMBER(38);
CREATE TYPE numbers AS TABLE OF NUMBER(38);
SELECT ... FROM ... WHERE account_id IN (
SELECT column_value FROM TABLE(?)
)
-- or
SELECT ... FROM ... JOIN (
SELECT column_value FROM TABLE(?)
) ON column_value = account_id
And use JDBC to bind a java.sql.Array (i.e. an oracle.sql.ARRAY) to the single bind variable. Clearly, this will result in fewer hard parses and fewer cursors in the cache for functionally equivalent queries. But is there any general performance drawback, or any other issue that I might run into?
For example, does bind variable peeking work in a similar way for VARRAYs or nested tables? This matters because the amount of data associated with each account may differ greatly.
I'm using Oracle 11g in this case, but I think the question is interesting for any Oracle version.
I suggest you try a plain old join, like:
SELECT Col1, Col2
FROM ACCOUNTS ACCT,
TABLE TAB
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;
An alternative could be a table subquery:
SELECT Col1, Col2
FROM (
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
) ACCT,
TABLE TAB
WHERE TAB.account_id = ACCT.account_id;
or a WHERE-clause subquery:
SELECT Col1, Col2
FROM TABLE TAB
WHERE TAB.account_id IN
(
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
);
The first one should be better for performance, but you had better check them all with EXPLAIN PLAN.
Looking at V$SQL_BIND_CAPTURE in a 10g database, I have a few rows where the datatype is VARRAY or NESTED_TABLE; the actual bind values were not captured. In an 11g database, there is just one such row, but it also shows that the bind value is not captured. So I suspect that bind value peeking essentially does not happen for user-defined types.
In my experience, the main problem you run into using nested tables or varrays in this way is that the optimizer does not have a good estimate of the cardinality, which could lead it to generate bad plans. But, there is an (undocumented?) CARDINALITY hint that might be helpful. The problem with that is, if you calculate the actual cardinality of the nested table and include that in the query, you're back to having multiple distinct query texts. Perhaps if you expect that most or all users will have at most 10 accounts, using the hint to indicate that as the cardinality would be helpful. Of course, I'd try it without the hint first, you may not have an issue here at all.
(I also think that perhaps Miguel's answer is the right way to go.)
For medium-sized lists (several thousand items), I would use this approach:
First, generate a prepared statement that joins an XMLTABLE with your main table.
For instance:
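String sql = "SELECT ... "
+ "FROM ACCOUNTS A, "
+ "XMLTABLE('tab/row' PASSING XMLTYPE(?) COLUMNS id NUMBER PATH 'id') t "
+ "WHERE A.account_id = t.id";
Then loop through your data and build the XML string (ids here is a hypothetical collection holding your account IDs):
StringBuilder idList = new StringBuilder("<tab>");
for (long id : ids) {
idList.append("<row><id>").append(id).append("</id></row>");
}
idList.append("</tab>"); // e.g. <tab><row><id>101</id></row><row><id>907</id></row> ...</tab>
Finally, prepare and submit your statement, then fetch the results (connection is your open java.sql.Connection):
PreparedStatement ps = connection.prepareStatement(sql);
ps.setString(1, idList.toString());
ResultSet rs = ps.executeQuery();
while (rs.next()) {...}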
With this approach it is also possible to pass a multi-valued list, as in this SELECT statement:
SELECT * FROM TABLE t WHERE (t.COL1, t.COL2) in (SELECT X.COL1, X.COL2 FROM X);
In my experience, performance is pretty good, and the approach is flexible enough to be used in very complex query scenarios.
The only limit is the size of the string passed to the DB, but I suppose it is possible to bind a CLOB instead of a String for an arbitrarily long XML input list.
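A library that escapes values properly can generate the XML document bound to `XMLTYPE(?)`. This avoids hand-concatenating the tags. The sketch below uses Python's standard `xml.etree.ElementTree` to produce the same `<tab><row><id>` shape that the XMLTABLE path `'tab/row'` expects. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_id_list_xml(ids) -> str:
    """Build the <tab><row><id>..</id></row>...</tab> document for the XMLTYPE(?) bind."""
    root = ET.Element("tab")
    for account_id in ids:
        row = ET.SubElement(root, "row")
        ET.SubElement(row, "id").text = str(account_id)
    return ET.tostring(root, encoding="unicode")


print(build_id_list_xml([101, 907]))
# <tab><row><id>101</id></row><row><id>907</id></row></tab>
```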
This problem of binding a variable number of items into an IN list comes up a lot in various forms. One option is to concatenate the IDs into a comma-separated string, bind that, and then use a bit of a trick to split it into a table you can join against, e.g.:
with bound_inlist
as
(
select
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.token
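The SUBSTR/INSTR arithmetic is easier to follow when traced step by step. The Python sketch below reimplements 1-based `instr` and `substr` helpers, which are simplified imitations of the Oracle functions and cover only the forms used here. It then runs the same expression for each LEVEL from 1 to the number of tokens. Wrapping the bound string in commas means that token k always lies between the k-th and (k+1)-th commas.

```python
def instr(text: str, sub: str, start: int, occurrence: int) -> int:
    """1-based position of the n-th occurrence of sub (0 if absent), like Oracle INSTR."""
    pos = start - 1
    for _ in range(occurrence):
        pos = text.find(sub, pos)
        if pos == -1:
            return 0
        pos += 1  # now the 1-based position, and the 0-based index to resume searching
    return pos


def substr(text: str, start: int, length: int) -> str:
    """Oracle-style 1-based SUBSTR(text, start, length)."""
    return text[start - 1:start - 1 + length]


def split_bound_list(bound: str) -> list:
    txt = "," + bound + ","                            # ','||:txt||','
    n = len(bound) - len(bound.replace(",", "")) + 1   # CONNECT BY LEVEL <= n
    return [
        substr(txt,
               instr(txt, ",", 1, level) + 1,
               instr(txt, ",", 1, level + 1) - instr(txt, ",", 1, level) - 1)
        for level in range(1, n + 1)
    ]


print(split_bound_list("101,907,33"))  # ['101', '907', '33']
```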
Bind variable peeking is going to be a problem, though.
Does the query plan actually change for a larger number of accounts, i.e. would it be more efficient to switch from an index scan to a full table scan in some cases, or is it borderline? As someone else suggested, you could use the CARDINALITY hint to indicate how many IDs are being bound; the following test case shows that this actually works:
create table actual_table (id integer, padding varchar2(100));
create unique index actual_table_idx on actual_table(id);
insert into actual_table
select level, 'this is just some padding for '||level
from dual connect by level <= 1000;
explain plan for
with bound_inlist
as
(
select /*+ CARDINALITY(10) */
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.id;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 840 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 840 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 10 | 190 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | ACTUAL_TABLE_IDX | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | ACTUAL_TABLE | 1 | 65 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Another option is to always use n bind variables in every query, binding NULL for positions m+1 to n.
Oracle ignores repeated items in the expression_list. Your queries will perform the same way and there will be fewer hard parses, but there will be extra overhead to bind all the variables and transfer the data. Unfortunately, I have no idea what the overall effect on performance would be; you'd have to test it.
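Padding to a fixed n keeps the statement text identical for every user. Unused slots are bound as NULL, and a NULL in an IN list never matches a row. The Python sketch below builds the SQL text and the bind values under that assumption. `MAX_IDS`, the `accounts` table, and `padded_in_list` are hypothetical names.

```python
from typing import List, Optional, Tuple

MAX_IDS = 10  # hypothetical n: the largest number of accounts a user can select


def padded_in_list(ids: List[int], n: int = MAX_IDS) -> Tuple[str, List[Optional[int]]]:
    """Always emit n placeholders so the SQL text never changes; pad unused slots with None (NULL)."""
    if len(ids) > n:
        raise ValueError(f"at most {n} ids are supported")
    placeholders = ", ".join(["?"] * n)
    sql = f"SELECT * FROM accounts WHERE account_id IN ({placeholders})"
    return sql, list(ids) + [None] * (n - len(ids))


sql_a, binds_a = padded_in_list([101, 907])
sql_b, binds_b = padded_in_list([101, 907, 33, 12])
print(sql_a == sql_b)  # True: identical statement text for both users
print(binds_a)         # [101, 907, None, None, None, None, None, None, None, None]
```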
