How to extract last 7 days of rows from the max date in shell - bash

I'm passing max(pay_date) to a variable Max_date in Shell from Hive table.
The datatype of pay_date field is Date.
I want to extract 7 days of pay_date from Max_date of pay_date from the table.
I used below script to get...
#!/bin/bash
Max_date=$(hive -e "select max(pay_date) from dbname.tablename;")
hive -e "select pay_date from dbname.tablename where pay_date >= date_sub(\"$Max_date\",7);"
It's not giving me any output.
I'm stuck with passing a variable which has date value and use that in date_sub function for last 7 days of rows.
Please let me know if I'm missing some absolute basics.

You can run query like this:
hive -e "select o1.order_date from orders o1 join (select max(order_date) order_date from orders) o2 on 1=1 where o1.order_date >= date_sub(o2.order_date, 7);
Also you can use this as part of shell script:
max_date=$(hive -e "select max(order_date) from orders" 2>/dev/null)
hive -e "select order_date from orders where order_date >= date_sub('$max_date', 7);"

Related

MIN function behavior changed on Oracle databases after SAS Upgrade to 9.4M7

I have a program that has been working for years. Today, we upgraded from SAS 9.4M3 to 9.4M7.
proc setinit
Current version: 9.04.01M7P080520
Since then, I am not able to get the same results as before the upgrade.
Please note that I am querying on Oracle databases directly.
Trying to replicate the issue with a minimal, reproducible SAS table example, I found that the issue disappear when querying on a SAS table instead of on Oracle databases.
Let's say I have the following dataset:
data have;
infile datalines delimiter="|";
input name :$8. id $1. value :$8. t1 :$10.;
datalines;
Joe|A|TLO
Joe|B|IKSK
Joe|C|Yes
;
Using the temporary table:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have
group by name;
quit;
Results:
name A
Joe TLO
However, when running the very same query on the oracle database directly I get a missing value instead:
proc sql;
create table want as
select name,
min(case when id = "A" then value else "" end) as A length 8
from have_oracle
group by name;
quit;
name A
Joe
As per documentation, the min() function is behaving properly when used on the SAS table
The MIN function returns a missing value (.) only if all arguments are missing.
I believe this happens when Oracle don't understand the function that SAS is passing it - the min functions in SAS and Oracle are very different and the equivalent in SAS would be LEAST().
So my guess is that the upgrade messed up how is translates the SAS min function to Oracle, but it remains a guess. Does anyone ran into this type of behavior?
EDIT: #Richard's comment
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
proc sql;
create table want as
select t1.name,
min(case when id = 'A' then value else "" end) as A length 8
from oracle_db.names t1 inner join oracle_db.ids t2 on (t1.tid = t2.tid)
group by t1.name;
ORACLE_26: Prepared: on connection 0
SELECT * FROM NAMES
ORACLE_27: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='NAMES' AND
UIC.TABLE_NAME='NAMES' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_28: Executed: on connection 1
SELECT statement ORACLE_27
ORACLE_29: Prepared: on connection 0
SELECT * FROM IDS
ORACLE_30: Prepared: on connection 1
SELECT UI.INDEX_NAME, UIC.COLUMN_NAME FROM USER_INDEXES UI,USER_IND_COLUMNS UIC WHERE UI.TABLE_NAME='IDS' AND
UIC.TABLE_NAME='IDS' AND UI.INDEX_NAME=UIC.INDEX_NAME
ORACLE_31: Executed: on connection 1
SELECT statement ORACLE_30
ORACLE_32: Prepared: on connection 0
select t1."NAME", MIN(case when t2."ID" = 'A' then t1."VALUE" else ' ' end) as A from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID" group by t1."NAME"
ORACLE_33: Executed: on connection 0
SELECT statement ORACLE_32
ACCESS ENGINE: SQL statement was passed to the DBMS for fetching data.
NOTE: Table WORK.SELECTED_ATTR created, with 1 row and 2 columns.
! quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.34 seconds
cpu time 0.09 seconds
Use the SASTRACE= system option to log SQL statements sent to the DBMS.
options SASTRACE=',,,d';
will provide the most detailed logging.
From the prepared statement you can see why you are getting a blank from the Oracle query.
select
t1."NAME"
, MIN ( case
when t2."ID" = 'A' then t1."VALUE"
else ' '
end
) as A
from
NAMES t1 inner join IDS t2 on t1."TID" = t2."TID"
group by
t1."NAME"
The SQL MIN () aggregate function will exclude null values from consideration.
In SAS SQL, a blank value is also interpreted as null.
In SAS your SQL query returns the min non-null value TLO
In Oracle transformed query, the SAS blank '' is transformed to ' ' a single blank character, which is not-null, and thus ' ' < 'TLO' and you get the blank result.
The actual MIN you want to force in Oracle is min(case when id = "A" then value else null end) which #Tom has shown is possible by omitting the else clause.
The only way to see the actual difference is to run the query with trace in the prior SAS version, or if lucky, see the explanation in the (ignored by many) "What's New" documents.
Why are you using ' ' or '' as the ELSE value? Perhaps Oracle is treating a string with blanks in it differently than a null string.
Why not use null in the ELSE clause?
or just leave off the ELSE clause and let it default to null?
libname mylib oracle .... ;
proc sql;
create table want as
select name
, min(case when id = "A" then value else null end) as A length 8
from mylib.have_oracle
group by name
;
quit;
Also try running the Oracle code yourself, instead of using implicit pass thru.
proc sql;
connect to oracle ..... ;
create table want as
select * from connection to oracle
(
select name,
min(case when id = "A" then value else null end) as A length 8
from have_oracle
group by name
)
;
quit;
When I try to reproduce this in Oracle I get the result you are looking for so I suspect it has something to do with SAS (which I'm not familiar with).
with t as (
select 'Joe' name, 'A' id, 'TLO' value from dual union all
select 'Joe' name, 'B' id, 'IKSK' value from dual union all
select 'Joe' name, 'C' id, 'Yes' value from dual
)
select name
, min(case when id = 'A' then value else '' end) as a
from t
group by name;
NAME A
---- ----
Joe TLO
Unrelated, if you are only interested in id = 'A' then a better query would be:
select name
, min(value) as a
from t
where id = 'A'
group by name;

Hive query performance is slow when using Hive date functions instead of hardcoded date strings?

I have a transaction table table_A that gets updated every day. Every day I insert new data into table_A from external table_B using the file_date field to filter the necessary data from external table_B to insert into table_A. However, there's a huge performance difference if I use a hardcoded date vs. using the Hive date functions:
-- Fast version (~20 minutes)
SET date_ingest = '2016-12-07';
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.dynamic.partition = TRUE;
INSERT
INTO
TABLE
table_A PARTITION (FILE_DATE) SELECT
id, eventtime
,CONCAT_WS( '-' ,substr ( eventtime ,0 ,4 ) ,SUBSTRING( eventtime ,5 ,2 ) ,SUBSTRING( eventtime ,7 ,2 ) )
FROM
table_B
WHERE
file_date = ${hiveconf:date_ingest}
;
compared to:
-- Slow version (~9 hours)
SET date_ingest = date_add(to_date(from_unixtime( unix_timestamp( ) )),-1);
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.dynamic.partition = TRUE;
INSERT
INTO
TABLE
table_A PARTITION (FILE_DATE) SELECT
id, eventtime
,CONCAT_WS( '-' ,substr ( eventtime ,0 ,4 ) ,SUBSTRING( eventtime ,5 ,2 ) ,SUBSTRING( eventtime ,7 ,2 ) )
FROM
table_B
WHERE
file_date = ${hiveconf:date_ingest}
;
Has anyone experienced similar issues? You should assume that I don't have access to the Unix hive command (i.e. can't use --hiveconf options) since we're using a third party UI.
Sometimes partition pruning does not work when using functions in filter clause. If you calculate the variable in the wrapper shell script and pass it as -hiveconf variable to the Hive, it will work fine.
Example:
#inside shell script
date_ingest=$(date -d '-1 day' +%Y-%m-%d)
hive -f your_script.hql -hiveconf date_ingest="$date_ingest"
Then use it inside Hive script as WHERE file_date ='${hiveconf:date_ingest}'

Passing Numeric Variable into Simple oracle query

I have table
TableA
ID PRICE DATE
123 200 01-SEP-2015
456 500 01-AUG-2015
In my query i want to show date when (date + X months)< sysdate else show null.
I have issue using variable in that query.
Forexample in record 1
(01-SEP-2015 + 3 months) = 01-DEC-2015 is not less than sysdate so, I don't want to show that date.
but on the other hand for second record
(01-AUG-2015 + 3 months) = 01-NOV-2015 is less than sysdate so, I want to show that date.
declare
x number(5);
set x:=3;
select
a.ID,
a.PRICE,
case when( ADD_MONTHS(a.DATE, x) < sysdate() )
then a.DATE
else
NULL
end as NEW_DATE
from TableA a;
You need to put an ampersand (&) before the variable name to tell SQL*Plus to substitute in the value of the variable:
declare
x number(5);
set x:=3;
select a.ID,
a.PRICE,
case
when ADD_MONTHS(a.DATE, &x) < sysdate then a.DATE
else NULL
end as NEW_DATE
from TableA a;
Also, DATE is a data type in Oracle - don't use it as a column name. You can do that but it's going to cause problems down the line somewhere.
Best of luck.

how to pass string(s) stored in a variable to sybase SQL IN clause from bash script

I have an array of say empId's which are of typr String , now i want to pass it in IN clause of SQL from bash.
I had tried below
#make an array of empIds
empIdsarray=(e123 e456 e675 e897)
for j in "${empIdsarray[#]}"
do
inclause=\"$j\",$inclause
done
#remove the trailing comma below
inclause=`echo $inclause|sed 's/,$//'
`isql -U$user -P$pwd -D$db -S$server <<< QRY > tempRS.txt
select * from emp where empId IN ($inclause)
go
quit
go
QRY`
i had tried IN('$inclause') as well but nothing is working , the output is blank although when i run in DB directly it gives result .
Any help is appreciated .
#it should execute like
select * from emp where empId IN ("e123", "e456", "e675", "e897")
thanks in advance .
In BASH you can do:
#!/bin/bash
#make an array of empIds
empIdsarray=(e123 e456 e675 e897)
printf -v inclause '"%s",' "${empIdsarray[#]}"
isql -U$user -P$pwd -D$db -S$server <<QRY > tempRS.txt
select * from emp where empId IN (${inclause%,})
go
quit
go
QRY

Get the sysdate -1 in Hive

Is there any way to get the current date -1 in Hive means yesterdays date always?
And in this format- 20120805?
I can run my query like this to get the data for yesterday's date as today is Aug 6th-
select * from table1 where dt = '20120805';
But when I tried doing this way with date_sub function to get the yesterday's date as the below table is partitioned on date(dt) column.
select * from table1 where dt = date_sub(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(),
'yyyyMMdd')) , 1) limit 10;
It is looking for the data in all the partitions? Why? Something wrong I am doing in my query?
How I can make the evaluation happen in a subquery to avoid the whole table scanned?
Try something like:
select * from table1
where dt >= from_unixtime(unix_timestamp()-1*60*60*24, 'yyyyMMdd');
This works if you don't mind that hive scans the entire table. from_unixtime is not deterministic, so the query planner in Hive won't optimize for you. For many cases (for example log files), not specifying a deterministic partition key can cause a very large hadoop job to start since it will scan the whole table, not just the rows with the given partition key.
If this matters to you, you can launch hive with an additional option
$ hive -hiveconf date_yesterday=20150331
And in the script or hive terminal use
select * from table1
where dt >= ${hiveconf:date_yesterday};
The name of the variable doesn't matter, nor does the value, you can set them in this case to get the prior date using unix commands. In the specific case of the OP
$ hive -hiveconf date_yesterday=$(date --date yesterday "+%Y%m%d")
In mysql:
select DATE_FORMAT(curdate()-1,'%Y%m%d');
In sqlserver :
SELECT convert(varchar,getDate()-1,112)
Use this query:
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP()-1*24*60*60,'%Y%m%d');
It looks like DATE_SUB assumes date in format yyyy-MM-dd. So you might have to do some more format manipulation to get to your format. Try this:
select * from table1
where dt = FROM_UNIXTIME(
UNIX_TIMESTAMP(
DATE_SUB(
FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')
, 1)
)
, 'yyyyMMdd') limit 10;
Use this:
select * from table1 where dt = date_format(concat(year(date_sub(current_timestamp,1)),'-', month(date_sub(current_timestamp,1)), '-', day(date_sub(current_timestamp,1))), 'yyyyMMdd') limit 10;
This will give a deterministic result (a string) of your partition.
I know it's super verbose.

Resources