MINUS not working for the same value in Snowflake - oracle

I loaded data from oracle to snowflake in table1 using informatica.
and the same data we have in snowflake table already table2.
i want to perform minus query for testing but it doesn't work as expected.
eg col1 field value is 1.21 in table1 and the datatype is same as snowflake table2.
col1 fields value is 1.21 in table2
when i run
select col1 form table1
minus
select col1 from table2
it gives two rows but when we check those records value is same.
What could be this issue ?
Any leads are appreciated.
Thanks .

given
select 1.21
minus
select 1.21;
gives no rows, it suspect it's a type thing.
but
select 1.2100000000000001::double as a
minus
select 1.2099999999999999::double as a;
gives 1.21, but also if you output those values you see
select 1.2100000000000001::double as a
union
select 1.2099999999999999::double as a;
values:
1.21
1.21
which is the default format of 6 decimal places of these values
adding one more level of decimal places there is magically no difference..
select 1.21000000000000001::double as a
minus
select 1.20999999999999999::double as a;
basically floating point number are not how they appear. ether cast them to some smaller type like
select 1.2100000000000001::number(30,3) as a
minus
select 1.2099999999999999::number(30,3) as a;
gives no "difference"

Related

How to only select existing values from oracle?

I have a table with a massive number of columns. So many, that when I do SELECT * I can't even see any values because all the columns fill up the screen. I'd like to do something like this:
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND <THIS COLUMN> IS NOT NULL
Is this possible? Note: VALUE is not a column.
There are so many questions on SO that ask this same question, but they have some bizarre twist, and the actual question is not answered.
I've tried:
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND VALUE NOT NULL
*
Invalid relational operator
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND VALUE <> ''
*
'VALUE': invalid identifier
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND COLUMN NOT NULL
*
Missing Expression
Bonus Questions:
Is there any way to force Oracle to only show one output screen at a time?
Is there a variable to use in the WHERE clause that relates to the current column? Such as: WHERE this.column = '1', where it would check each column to match that expression?
Is there any way to get back your last command in Oracle? (I have to remote into a Linux box running Oracle - it's all command line - can't even copy/paste, so I have to type every command by hand, with a wonky connection, so it's taking an extremely long time to debug this stuff)
If you are trying to find all the non null column values for a particular record you could try an unpivot provided all the columns you are unpivoting have the same data type:
SELECT *
FROM (select * from my_table where name like '%unique value%')
UNPIVOT [include nulls] (col_value FOR col_name IN (col1, col2, ..., coln))
with the above code null values will be excluded unless you include the optional include nulls statement, also you will need to explicitly list each column you want unpivoted.
If they don't all have the same data type, you can use a variation that doesn't necessarily prune away all the null values:
select *
from (select * from my_table where name like '%unique value%')
unpivot ((str_val, num_val, date_val)
for col_name in ((cola, col1, date1)
,(colb, col2, date2)
,(colc, col3, date1)));
You can have a fairly large set of column groups, though here I'm showing just three, one for each major data type, with the IN list you need to have a column listed for each column in your column group, though you can reuse columns as shown by the date_val column where I've used date1 twice. As an alternative to reusing an existing column, you could use a dummy column with a null value:
select *
from (select t1.*, null dummy from my_table t1 where name like '%unique value%')
unpivot ((str_val, num_val, date_val)
for col_name in ((dummy, col1, date1)
,(colb, dummy, date2)
,(colc, col3, dummy)));
Have tried this?
SELECT * FROM my_table WHERE NAME LIKE '%unique name%' AND value IS NOT NULL;
Oracle / PLSQL: IS NOT NULL Condition
For row number:
SELECT field1, field2, ROW_NUMBER() OVER (PARTITION BY unique_field) R WHERE R=1;
Usually in Linux consoles you can use arrow up&down to repeat the last sentence.

Range Partitioning in Hive

Does hive support range partitioning?
I mean does hive supports something like below:
insert overwrite table table2 PARTITION (employeeId BETWEEN 2001 and 3000)
select employeeName FROM emp10 where employeeId BETWEEN 2001 and 3000;
Where table2 & emp10 has two columns:
employeeName &
employeeId
When I run the above query i am facing an error:
FAILED: ParseException line 1:56 mismatched input 'BETWEEN' expecting ) near 'employeeId' in destination specification
Is not possible. Here is a quote from Hive documentation :
A table can have one or more partition columns and a separate data directory is created for each distinct value combination in the partition columns
No its not possible. Even I use separate calculated column like ,
insert overwrite table table2 PARTITION (employeeId_range)
select employeeName , employeeId/1000 FROM emp10 where employeeId BETWEEN 2000 and 2999;
which will make sure all values fall in same partition.
while querying the table since we already know the range calculator, we can
select employeeName , employeeId FROM table2 where employeeId_range=2;
Thus we can also parallelise the queries of given ranges.
Hope it helps.

Hive Timestamp aggregation

I have two hive tables, in which one table is updating an hourly basic by Java API team (they are calling and storing it into hive table1). And now I have to aggregate the latest data and store it into another table called table2 (data which are loaded newly,because old data have been aggregated and stored). For that I have used the query below:
set maxtime = select max(lastactivitytimestamp) from table2;
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp('${hivevar:maxtime}');
I am not getting any result. But when I give the timestamp value manually I am getting data, like below:
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp('2014-08-18 15:23:26.754');
Is it possible to pass dynamic values in unix_timestamp?
Try removing the upper commas from the unix_timestamp() function, like this:
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp(${hivevar:maxtime});

Oracle identity column and insert into select

Oracle 12 introduced nice feature (which should have been there long ago btw!) - identity columns. So here's a script:
CREATE TABLE test (
a INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
b VARCHAR2(10)
);
-- Ok
INSERT INTO test (b) VALUES ('x');
-- Ok
INSERT INTO test (b)
SELECT 'y' FROM dual;
-- Fails
INSERT INTO test (b)
SELECT 'z' FROM dual UNION ALL SELECT 'zz' FROM DUAL;
First two inserts run without issues providing values for 'a' of 1 and 2. But the third one fails with ORA-01400: cannot insert NULL into ("DEV"."TEST"."A"). Why did this happen? A bug? Nothing like this is mentioned in the documentation part about identity column restrictions. Or am I just doing something wrong?
I believe the below query works, i havent tested!
INSERT INTO Test (b)
SELECT * FROM
(
SELECT 'z' FROM dual
UNION ALL
SELECT 'zz' FROM dual
);
Not sure, if it helps you any way.
For, GENERATED ALWAYS AS IDENTITY Oracle internally uses a Sequence only. And the options on general Sequence applies on this as well.
NEXTVAL is used to fetch the next available sequence, and obviously it is a pseudocolumn.
The below is from Oracle
You cannot use CURRVAL and NEXTVAL in the following constructs:
A subquery in a DELETE, SELECT, or UPDATE statement
A query of a view or of a materialized view
A SELECT statement with the DISTINCT operator
A SELECT statement with a GROUP BY clause or ORDER BY clause
A SELECT statement that is combined with another SELECT statement with the UNION, INTERSECT, or MINUS set operator
The WHERE clause of a SELECT statement
DEFAULT value of a column in a CREATE TABLE or ALTER TABLE statement
The condition of a CHECK constraint
The subquery and SET operations rule above should answer your Question.
And for the reason for NULL, when pseudocolumn(eg. NEXTVAL) is used with a SET operation or any other rules mentioned above, the output is NULL, as Oracle couldnt extract them in effect with combining multiple selects.
Let us see the below query,
select rownum from dual
union all
select rownum from dual
the result is
ROWNUM
1
1

Distinct Column in Hive

I am trying to get a query result in HiveQL with one column as distinct. However the results are not matching . There are almost 20 columns in the table.
create table uniq_us row format delimited fields terminated by ',' lines terminated by '\n' as select distinct(a),b,c,d,e,f,g,h,i,j from ctry_us_join;
The resulting number of Rows :513238
select count(distinct a) from ctry_us_join;
The resulting number of rows : 151616
How is this possible and is something wrong in my first or second query
U need to use subselect with group by statement.
select count(a) from (
select a, count(*) from ctry_us_join group by a) b
This is just one solution for this.
Distinct is a keyword, not a function. It applies to all columns you list in your select clause. It is quite reasonable that your table has only 151,616 distinct values in the column a, but multiple rows with the same value in the column a have different values in other columns. That might give you 513,238 distinct rows.

Resources