Invalid result for ORDER BY in PostgreSQL

I was trying to order a table "main_table" by date (which is a float). My initial table looks like this (select date from main_table):
date
20160424105948
20160424045955
20160424050000
20160418170003
20160419233154
20160419233155
By creating another table with rows ordered via "order by date", I get some rows which are not correctly ordered:
create temp table tmp (like main_table including defaults);
insert into tmp (select * from compact_table order by date asc);
and then I get (select date from tmp):
date
20160418170908
20160418170909
20160418170910
20160420110031 <<
20160418170911
'date' is the primary key, and I have up to 600,000 rows in my table. Am I doing something wrong?
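A likely explanation, not from the original post: rows in a SQL table have no guaranteed physical order, so ordering the rows at insert time does not make a later plain SELECT return them sorted. Assuming the tmp table from the question, the fix is simply to order at read time:

```sql
-- PostgreSQL never guarantees row order without an ORDER BY on the query
-- itself, regardless of the order in which the rows were inserted.
select date
from tmp
order by date asc;
```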

Related

How to use derived columns in same hive table?

Could you please help me with the query below?
Suppose there is a table employee with columns A, B, and a Date column.
I have to load data from table employee into another table emp with the transformations below applied.
Transformations on the employee table:
1. Absolute value of column A (column name in emp will be ABS_A)
2. Absolute value of column B (column name in emp will be ABS_B)
3. Find sum(ABS_A) for a given Date
4. Find sum(ABS_B) for a given Date
5. Find sum(ABS_A)/sum(ABS_B); the column name will be Average.
So the final table emp will have the columns below:
1.A
2.B
3.ABS_A
4.ABS_B
5.Average
How do I handle such derived columns in Hive?
I tried the query below, but it is not working. Could anyone guide me?
insert overwrite into emp
select
A,
B,
ABS(A) as ABS_A,
ABS(B) as ABS_B,
sum(ABS_A) OVER PARTION BY DATE AS sum_OF_A,
sum(ABS_B) OVER PARTTION BY DATE AS sum_of_b,
avg(sum_of_A,sum_of_b) over partition by date as average
from employee
Hive does not support referencing derived columns (column aliases) at the same subquery level. Use a subquery, or repeat the expression in place of the alias.
insert overwrite table emp
select A, B, ABS_A, ABS_B, sum_OF_A, sum_of_b, `date`, sum_OF_A/sum_of_b as average
from
(
select A, B, ABS(A) as ABS_A, ABS(B) as ABS_B, `date`,
sum(ABS(A)) OVER (PARTITION BY `date`) AS sum_OF_A,
sum(ABS(B)) OVER (PARTITION BY `date`) AS sum_of_b
from employee
)s;

check if row exists before inserting in athena table

[Screenshots omitted: the expected result, and the duplicated row that actually appears when the same row is added.]
I want to check whether the same row already exists before inserting into Athena, to avoid duplicate rows in table abc.
Is it possible to use a "check if exists" with an Athena table? The code below is being used to add data to table abc.
insert into table abc
with tmp as(
select
date,
price1,
price2
from
table2
)
select
*
from
tmp
where
price1 > 100
;
You can definitely check whether rows already exist - you could work with WHERE NOT EXISTS:
NOT EXISTS is satisfied if the subquery returns no rows.
with tmp as(
select
date,
price1,
price2
from
table2
)
select
*
from
tmp
where price1 > 100
and not exists (select 1 from abc where abc.date = tmp.date)
However, keep in mind that Athena cannot update data once it has been written.
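Putting the NOT EXISTS filter back into the original INSERT might look like the sketch below (table and column names as in the question; this assumes Athena's INSERT INTO syntax, which takes the table name without a TABLE keyword and accepts a WITH clause in the query):

```sql
-- Sketch: insert only those rows from table2 whose date is not already in abc.
-- Assumes date alone is a sufficient uniqueness check, as in the answer above.
insert into abc
with tmp as (
  select date, price1, price2
  from table2
)
select *
from tmp
where price1 > 100
  and not exists (select 1 from abc where abc.date = tmp.date);
```

If a single column does not uniquely identify a row, extend the NOT EXISTS subquery to compare every relevant column.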

Showing NULL values after adding column in hive

I am using Hive version 1.2.1 and I am new to Hive.
I have added a column to TABLE_2, but it shows NULL values. I want to put the DATE part of a timestamp column into the newly created column. I tried the queries below:
ALTER TABLE table_2 ADD COLUMNS(DATE_COL string);
INSERT INTO table_2 (DATE_COL) AS SELECT SUBSTRING(TIMESTAMP_COL,-19,10) FROM table_1 ;
This runs, but it still shows NULL values in the newly created DATE_COL.
I want just the date in DATE_COL.
table_1 has 13 columns, table_2 has 14 columns (13 + DATE_COL).
TIMESTAMP_COL :- STRING.
DATE_COL - STRING.
Please tell me how to solve this problem.
Use the UPDATE command:
Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
As of Hive 0.14.0, INSERT...VALUES, UPDATE, and DELETE are available with full ACID support.
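A minimal sketch of such an UPDATE, assuming table_2 is a transactional (ACID) table and TIMESTAMP_COL is a string beginning with a yyyy-MM-dd date:

```sql
-- UPDATE in Hive requires an ACID table: stored as ORC,
-- TBLPROPERTIES ('transactional'='true'), and a transaction-enabled session.
UPDATE table_2
SET DATE_COL = substr(TIMESTAMP_COL, 1, 10);  -- keep only the yyyy-MM-dd part
```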

hive : select row with column having maximum value without join

I am writing a Hive query over a table to pick the row with the maximum value in a column.
There is a table with the following data, for example:
key value updated_at
1 "a" 1
1 "b" 2
1 "c" 3
The row which was updated last needs to be selected.
Currently I am using the following logic:
select tab1.* from table_name tab1
join (select key, max(updated_at) as max_updated
      from table_name group by key) tab2
on tab1.key = tab2.key and tab1.updated_at = tab2.max_updated;
Is there any other better way to perform this?
If it is true that updated_at is unique for that table, then the following is perhaps a simpler way of getting you what you are looking for:
-- I'm using Hive 0.13.0
SELECT * FROM table_name ORDER BY updated_at DESC LIMIT 1;
If it is possible for updated_at to be non-unique for some reason, you may need to adjust the ORDER BY logic to break any ties in the fashion you wish.
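If the table holds many keys and the latest row is needed per key, a window function (available since Hive 0.11) also avoids the self-join; a sketch using the sample table above:

```sql
-- Keep, for each key, only the row with the greatest updated_at.
select key, value, updated_at
from (
  select key, value, updated_at,
         row_number() over (partition by key order by updated_at desc) as rn
  from table_name
) t
where rn = 1;
```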

Update query resulting wrongly

I have a table called company_emp. That table has 6 columns related to employees:
empid
ename
dob
doj, ...
I have another table called bday. It has only 2 columns: empid and dob.
I have this query:
select empid, dob
from company_emp
where dob like '01/05/2011'
It shows a list of employees.
In the same way, I have queried table bday and it listed some employees.
Now I want to update the company_emp table for employees who have the date '01/05/2011'.
I have tried a query like this:
update company_name a
set dob = (select dob from bday b
where b.empid=a.empid
and to_char(a.dob,'dd/mm/yyyy') = '01/05/2011')
Then the dob value in every row becomes null. How can I fix this query?
You're updating every row in the company_name/emp table.
You can fix that with a correlated subquery to make sure the row exists, or more efficiently by placing a primary or unique key on bday.empid and querying:
update (
select c.dob to_dob,
d.dob from_dob
from company_emp c join bday d on (c.empid = d.empid)
where d.dob = date '2011-05-01')
set to_dob = from_dob
Syntax not tested.
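The correlated-subquery variant mentioned above could be sketched like this (also untested, following the question's to_char date format; the EXISTS clause keeps non-matching rows from being set to null):

```sql
-- Only touch rows with the target date that actually have a match in bday.
update company_emp a
set a.dob = (select b.dob from bday b where b.empid = a.empid)
where to_char(a.dob, 'dd/mm/yyyy') = '01/05/2011'
  and exists (select 1 from bday b where b.empid = a.empid);
```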
