What is the Hive equivalent of the Oracle SQL query below?

What is the equivalent query in Hive for:
select to_char(trunc(sysdate,'iw')-1)

You could go about this in at least two ways:
Either implement your own function in Hive using the UDF capability
OR
Use a CASE expression on the modulus of the date difference. Since 2012-01-02 was a Monday, the value computed below is 1 for Monday through 7 for Sunday.
In rough code this would be something like:
select case pmod(datediff(date_column, '2012-01-02'), 7) + 1
when 1 then date_column
when 2 then date_add(date_column, -1)
etc.
This yields the Monday of the week, which is what trunc(sysdate,'iw') returns; subtract one more day for the trailing -1.
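The modulus arithmetic can be checked outside Hive. The following is a minimal Python sketch, not Hive code. The function name sunday_before_iso_week is made up for illustration, and it assumes the goal is the value Oracle's trunc(sysdate,'iw')-1 produces, i.e. the day before the Monday that starts the current ISO week.

```python
from datetime import date, timedelta

# A known Monday, the same anchor date used in the rough Hive code above.
ANCHOR_MONDAY = date(2012, 1, 2)


def sunday_before_iso_week(d: date) -> date:
    """Return the day before the Monday that starts d's ISO week."""
    # Python's % is non-negative for a positive modulus, like Hive's pmod.
    days_since_monday = (d - ANCHOR_MONDAY).days % 7
    monday = d - timedelta(days=days_since_monday)
    return monday - timedelta(days=1)


print(sunday_before_iso_week(date(2012, 8, 8)))
```

In Hive the same idea would collapse the CASE into a single subtraction of the modulus plus one, though that form is an untested assumption here.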

Related

Return Boolean value when table has data in the specified range

I need a query that returns a boolean indicating whether a table has data in a given date range.
Assume table
Customer
[User ID, Name, Date, Products_Purchased]
I'm trying to do:
select case when exists(
select Date, count(*)
from Customer
where date between '2015-08-03' and '2015-08-05'
)
then cast(1 as BIT)
else case(0 as BIT)end;
This throws an error near "select Date".
The weird part, however, is that the inner query runs perfectly fine on its own.
I'm wondering if I'm missing something here.
What about something more straightforward e.g.
select case when count(*) >0 then 1 else 0 end as HIT
from ... where ...
That way you don't have to worry about Hive assuming that EXISTS implies a correlated sub-query, which it automagically translates into a MapJoin (a Java HashMap shipped to the second wave of Mapper jobs, and so on). That is not exactly your use case.
Since the exact count is not needed, the query could be refined as
select case when count(*) >0 then 1 else 0 end as HIT
from
(select ... from ... where ... limit 1) X
[Edit] There is no "bit" datatype in Hive. But the default "int" should be OK if you just want a return flag (zero / non-zero)
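To see the count-with-limit pattern return a 0/1 flag end to end, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive, so it only demonstrates the query shape, not Hive's execution plan. The table layout, the column name purchase_date and the helper has_rows_between are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for Hive; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (user_id INTEGER, name TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "2015-08-04"), (2, "Bob", "2015-09-01")],
)


def has_rows_between(start: str, end: str) -> int:
    """Return 1 if at least one row falls in [start, end], else 0."""
    query = """
        SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END AS hit
        FROM (SELECT 1 FROM customer
              WHERE purchase_date BETWEEN ? AND ?
              LIMIT 1) x
    """
    return conn.execute(query, (start, end)).fetchone()[0]


print(has_rows_between("2015-08-03", "2015-08-05"))
print(has_rows_between("2015-01-01", "2015-01-31"))
```

The inner LIMIT 1 means the outer COUNT(*) sees at most one row, which is all a yes/no answer needs.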

Compare date to month-year in Postgres/Ruby

I have a date column in my table and I would like to 'filter'/select out items after a certain year-month. So if I have data from 2010 on, I have a user input that specifies '2011-10' as the 'earliest date' they want to see data from.
My current SQL looks like this:
select round(sum(amount), 2) as amount,
date_part('month', date) as month
from receipts join items
on receipts.item = items.item
where items.expense = ?
and date_part('year', date)>=2014
and funding = 'General'
group by items.expense, month, items.order
order by items.order desc;
In the second part of the 'where', instead of doing year >= 2014, I want to do something like to_char(date, 'YY-MMMM') >= ? as another parameter and then pass in '2011-10'. However, when I do this:
costsSql = "select round(sum(amount), 2) as amount,
to_char(date, 'YY-MMMM') as year_month
from receipts join items
on receipts.item = items.item
where items.expense = ?
and year_month >= ?
and funding = 'General'
group by items.expense, year_month, items.order
order by items.order desc"
and call that with my two params, I get a postgres error: PG::UndefinedColumn: ERROR: column "year_month" does not exist.
Edit: I converted my YYYY-MM string into a date and passed that in as my param instead and it's working. But I still don't understand why I get the 'column does not exist' error after I created that column in the select clause - can someone explain? Can columns created like that not be used in where clauses?
This error, column "year_month" does not exist, happens because year_month is an alias defined in the SELECT list, and such aliases can't be referred to in the WHERE clause.
This follows from the fact that the SELECT list is evaluated after the WHERE clause; see, for example, "Column alias in where clause?" for an explanation from PG developers.
Some databases allow it nonetheless, others don't, and PostgreSQL doesn't. It's one of the many portability hazards between SQL engines.
In the case of the query shown in the question, you don't even need the to_char in the WHERE clause anyway, because as mentioned in the first comment, a direct comparison with a date is simpler and more efficient too.
When a query has a complex expression in the SELECT-list and repeating it in the WHERE clause looks wrong, sometimes it might be refactored to move the expression into a sub-select or a WITH clause at the beginning of the query.
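The workaround the asker ended up with, converting the 'YYYY-MM' input into a real date before binding it as a parameter, might look like the sketch below. It is written in Python rather than Ruby purely for illustration, and parse_year_month is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_year_month(text: str) -> date:
    """Turn a 'YYYY-MM' string into the first day of that month."""
    return datetime.strptime(text, "%Y-%m").date()


earliest = parse_year_month("2011-10")
# The resulting date can be bound to a condition such as: date >= ?
print(earliest.isoformat())
```

Comparing the date column directly against this value avoids both the alias problem and the string formatting in the WHERE clause.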

Dynamic order by date data type in Oracle using CASE

My code in the stored procedure:
SELECT * FROM
my_table ir
WHERE
--where clause goes here
ORDER BY
CASE WHEN p_order_by_field='Id' AND p_sort_order='ASC' THEN IR.ID end,
CASE WHEN p_order_by_field='Id' AND p_sort_order='DESC' THEN IR.ID end DESC,
CASE WHEN p_order_by_field='Date' AND p_sort_order='ASC' THEN TO_CHAR(IR.IDATE, 'MM/dd/yyyy') end,
CASE WHEN p_order_by_field='Date' AND p_sort_order='DESC' THEN TO_CHAR(IR.IDATE, 'MM/dd/yyyy') end DESC;
Problem is that sorting is done based on the char, which comes out wrong for the date case. CASE statement, however, won't allow any other datatype other than char. So what is the solution in this case? I need to be able to pass the p_order_by_field into the stored procedure.
Thanks
Should be simple - just use ISO date format in your case:
TO_CHAR(IR.IDATE, 'yyyy-mm-dd')
and you should be fine.
Another problem can occur when you want to sort on a date difference (say, the number of days between two dates).
For example, such a sort on the character value would return 13 (days) before 9 (days).
The solution is to concatenate the length of the date difference with the difference itself:
length(trunc(date2) - trunc(date1)) || to_char(trunc(date2) - trunc(date1))
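Both effects, text-sorted dates coming out wrong and the length-prefix trick for numbers, can be reproduced in a few lines. This is a Python sketch of the string comparisons only, not Oracle code. The sample dates and the helper length_prefixed are made up, and the trick as written assumes non-negative differences with fewer than ten digits.

```python
from datetime import date

dates = [date(2014, 12, 25), date(2015, 1, 3), date(2013, 6, 30)]

# Sorting by 'MM/dd/yyyy' text compares the month first, so years get mixed up.
by_us_text = sorted(dates, key=lambda d: d.strftime("%m/%d/%Y"))
# Sorting by ISO 'yyyy-mm-dd' text matches chronological order.
by_iso_text = sorted(dates, key=lambda d: d.strftime("%Y-%m-%d"))


def length_prefixed(n: int) -> str:
    """Prefix a non-negative integer with its digit count."""
    s = str(n)
    return f"{len(s)}{s}"


day_diffs = [13, 9, 120]
# Plain text order puts "120" and "13" before "9".
plain_text_order = sorted(day_diffs, key=str)
# With the length prefix, shorter (smaller) numbers sort first.
prefixed_order = sorted(day_diffs, key=length_prefixed)

print(by_us_text, by_iso_text, plain_text_order, prefixed_order)
```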

Get the sysdate -1 in Hive

Is there any way to always get the current date minus 1, i.e. yesterday's date, in Hive?
And in this format: 20120805?
I can run my query like this to get the data for yesterday's date, as today is Aug 6th:
select * from table1 where dt = '20120805';
But the table below is partitioned on the date (dt) column, and when I tried to get yesterday's date with the date_sub function like this:
select * from table1 where dt = date_sub(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(),
'yyyyMMdd')) , 1) limit 10;
It looks for the data in all the partitions. Why? Am I doing something wrong in my query?
How can I make the evaluation happen in a subquery to avoid scanning the whole table?
Try something like:
select * from table1
where dt >= from_unixtime(unix_timestamp()-1*60*60*24, 'yyyyMMdd');
This works if you don't mind that Hive scans the entire table. The expression depends on unix_timestamp(), which is not deterministic, so the Hive query planner cannot reduce it to a constant and prune partitions for you. In many cases (log files, for example), filtering without a constant partition key can start a very large Hadoop job, since it scans the whole table rather than just the rows in the given partition.
If this matters to you, you can launch hive with an additional option
$ hive -hiveconf date_yesterday=20150331
And in the script or hive terminal use
select * from table1
where dt >= ${hiveconf:date_yesterday};
Neither the name of the variable nor the example value matters; you can compute the value with Unix commands to get the prior date. In the specific case of the OP:
$ hive -hiveconf date_yesterday=$(date --date yesterday "+%Y%m%d")
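If the job is launched from a script rather than a shell, the same value can be computed there. Below is a minimal Python sketch; yesterday_yyyymmdd is a hypothetical helper, and the command list only mirrors the invocation shown above without actually running Hive.

```python
from datetime import date, timedelta


def yesterday_yyyymmdd(today: date) -> str:
    """Format the day before `today` as yyyyMMdd, e.g. 20120805."""
    return (today - timedelta(days=1)).strftime("%Y%m%d")


value = yesterday_yyyymmdd(date.today())
# Hypothetical invocation; the variable name matches the example above.
command = ["hive", "-hiveconf", f"date_yesterday={value}"]
print(" ".join(command))
```

Because the date is substituted before Hive parses the query, the planner sees a constant and can restrict the scan to the matching partition.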
In mysql:
select DATE_FORMAT(curdate() - INTERVAL 1 DAY,'%Y%m%d');
In sqlserver :
SELECT convert(varchar,getDate()-1,112)
Use this query:
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP()-1*24*60*60,'%Y%m%d');
It looks like DATE_SUB assumes date in format yyyy-MM-dd. So you might have to do some more format manipulation to get to your format. Try this:
select * from table1
where dt = FROM_UNIXTIME(
UNIX_TIMESTAMP(
DATE_SUB(
FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')
, 1)
)
, 'yyyyMMdd') limit 10;
Use this:
select * from table1 where dt = date_format(concat(year(date_sub(current_timestamp,1)),'-', month(date_sub(current_timestamp,1)), '-', day(date_sub(current_timestamp,1))), 'yyyyMMdd') limit 10;
This will give a deterministic result (a string) of your partition.
I know it's super verbose.

DATEDIFF (in months) in linq

select col1,col2,col3 from table1
where DATEDIFF(mm, tblAccount.[State Change Date], GETDATE()) <= 4
I want to convert this SQL query to LINQ, but I don't know of any DateDiff alternative in LINQ. Can you please suggest one?
You're looking for SqlMethods.DateDiffMonth.
EDIT: In EF4, use SqlFunctions.DateDiff.
Putting aside your original question for a moment, in your query you use:
where DATEDIFF(mm, tblAccount.[State Change Date], GETDATE()) <= 4
This query would always cause a full table scan, because the function is applied to the column itself, so every row has to be evaluated before the comparison can be made. It would be much better to calculate the cutoff date first and then compare the column value against it, which allows SQL to use an index to find the results instead of evaluating every record in your table.
It looks like you're trying to retrieve anything within the past 4 months, so in your application code, try calculating the date that you can compare against first, and pass that value into your Linq2Entities expression:
DateTime earliestDate = new DateTime(DateTime.Now.Year, DateTime.Now.Month, 1).AddMonths(-4);
var results = from t in context.table1
where t.col3 >= earliestDate
select t;
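The cutoff calculation itself, first day of the current month moved back four months, is easy to get wrong around year boundaries. Here is a small Python sketch of the same arithmetic as the C# line above; first_of_month_months_ago is a hypothetical name, and the code is an illustration rather than part of the LINQ answer.

```python
from datetime import date


def first_of_month_months_ago(today: date, months: int) -> date:
    """First day of the month that lies `months` months before today's month."""
    # Count months from year 0 so that subtraction handles year rollover.
    month_index = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(month_index, 12)
    return date(year, month0 + 1, 1)


print(first_of_month_months_ago(date(2015, 3, 15), 4))
```

Computing the value once in application code keeps the database comparison a plain column-versus-constant test.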
In EF6, the class to use is DbFunctions. See, for example, DbFunctions.DiffMonths.
I solved this problem in another manner: I calculated the date from code:
theDate = Date.Now.AddMonths(-4)
And in EF change the condition:
tblAccount.[State Change Date] >= theDate
