Parsing date format to join in hive - hadoop

I have a date field which is of type String and in the format:
03/11/2001
And I want to join it with another column, which is in a different String format:
1855-05-25 12:00:00.0
How can I join both columns efficiently in hive, ignoring the time part of the second column?
My query looks like below:
LEFT JOIN tabel1 t1 ON table2.Date=t1.Date

Since the two date values are in different formats, convert both to a common format in the join condition. Formatting both sides as yyyy-MM-dd also drops the time part of the second column. It would be something like this:
LEFT JOIN tabel1 t1 ON from_unixtime(unix_timestamp(table2.Date, 'yyyy-MM-dd HH:mm:ss.S'), 'yyyy-MM-dd') = from_unixtime(unix_timestamp(t1.Date, 'MM/dd/yyyy'), 'yyyy-MM-dd')
See the Hive documentation for the built-in date functions.

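Convert both dates to the same format. to_date expects a yyyy-MM-dd string, so the MM/dd/yyyy column has to be reparsed first:
to_date(table2.date) = to_date(from_unixtime(unix_timestamp(t1.date, 'MM/dd/yyyy')))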
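The normalization both answers rely on can be checked outside Hive. The following is a minimal Python sketch, not Hive code: the function names are hypothetical, and it assumes the slash-separated column is month/day/year, as the 'MM/dd/yyyy' pattern above does.

```python
from datetime import date, datetime


def key_from_slash(value: str) -> date:
    # Assumption: month/day/year order, matching the 'MM/dd/yyyy' pattern.
    return datetime.strptime(value, "%m/%d/%Y").date()


def key_from_timestamp(value: str) -> date:
    # Parse the full timestamp, then drop the time part with .date().
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f").date()


if __name__ == "__main__":
    # Both strings reduce to the same calendar date, so they would join.
    print(key_from_slash("03/11/2001") == key_from_timestamp("2001-03-11 12:00:00.0"))
```

If the first column is actually day/month/year, the pattern in key_from_slash (and in the Hive query) has to be swapped accordingly.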

Related

How to select named tuple/json/map with multiple types from subquery in clickhouse?

I'm struggling with a query, where I must select a group of named tuples from a subquery result. Sample code looks like this:
SELECT 123456 as productId,
ordersHistory # <- this must be a named tuple like [{'date': '2022-11-10', 'orders': 4}], but now it's only [('2022-11-10', 4)]
FROM products
GLOBAL LEFT JOIN (
SELECT groupArray(tuple(date, orders)) ordersHistory,
123456 as productId,
FROM (
SELECT date, orders
FROM orders
WHERE productId = 123456 AND date BETWEEN '2022-10-10' AND '2022-11-10'
)
GROUP BY productId
) orders ON products.productId = orders.productId
I've tried maps, but they expect to have a fixed subtype, which is not applicable to my result subset. Cast output format to JSON in subquery is not working either. And I have no clue, how to make named tuples in other ways.
I expect ordersHistory subset of result to be an array of key->value pairs.
I found a workaround for this problem.
First, cast the default tuple to a named tuple:
cast(tuple(date, orders), Tuple(date Date, orders UInt64)) AS namedTuple
Named tuples can be cast to JSON:
namedTuple::JSON as jsonedTuple
But groupArray does not work with JSON, so the JSON has to be converted to a type that groupArray accepts. A String works:
toString(jsonedTuple) as stringifiedJson
This stringified JSON can then be grouped:
groupArray(stringifiedJson)
Put together, the inner SELECT becomes:
SELECT groupArray(toString(cast(tuple(date, orders), Tuple(date Date, orders UInt64))::JSON))
Yes, this looks awful, but it does its job, and it is the only workaround I have found for this case.

Oracle: filter query on datetime

I need to restrict a query with a
SELECT ... FROM ...
WHERE my_date=(RESULT FROM A SELECT)
... ;
To achieve that, I am using a timestamp as the result of the inner select (if I use a datetime instead, I get nothing back, probably because the format I am using trims the datetime at the second).
Sadly this is not working, because queries of this kind:
select DISTINCT TO_DATE(TO_TIMESTAMP(TO_DATE('25-10-2017 00:00', 'dd-MM-yyyy HH24:MI'))) from DUAL;
return an
ORA-01830: date format picture ends before converting entire input string
How should I deal with the timestamp to date conversion?
If you only want to compare the dates, use trunc on both the LHS and the RHS.
SELECT ... FROM ...
WHERE trunc(my_date)=(select trunc(RESULT) FROM A)
... ;
This compares only the dates, because trunc removes the time part of each value.
You can use the combination of "TRUNC" and "IN" keywords in your query to achieve what you are expecting. Please check the below query sample as a reference.
SELECT * FROM customer WHERE TRUNC(last_update_dt) IN (select DISTINCT (TRUNC(last_update_dt)) from ... )
Cheers !!

Date format in Oracle- fetching Date of certain range

I have a date column in a table in my Oracle db. When I run a query I get the date format as '01-05-2015', but when I run a similar query in BIRT, I get the date format as '01-MAY-2015 12:00 AM'. How can I get the date in dd/mm/yyyy format while keeping the data type of the field as date?
Here is a sample of my database.
EQ_DT
05-07-2015
06-06-2015
15-02-2015
19-09-2015
28-12-2015
My query is:
select to_date(to_char(to_date(enquiry_dt,'DD/MM/YYYY'),'DD/MM/YY'),'DD/MM/YY') as q from xxcus.XXACL_SALES_ENQ_DATAMART where to_date(to_char(to_date(enquiry_dt,'DD/MM/YY'),'DD/MM/YY'),'DD/MM/YY')>'21-06-2012' order by q
I am also getting a NOT A VALID MONTH error.
If enquiry_dt is already a date column, why are you trying to convert it to date (and then to char and to date again)?
SELECT to_char(enquiry_dt, 'DD/MM/YYYY') AS q
FROM xxcus.xxacl_sales_enq_datamart
WHERE enquiry_dt > to_date('21-06-2012', 'dd-mm-yyyy')
ORDER BY enquiry_dt
In BIRT, where you place the field on the report, set the field type to date. Then, in the properties for that field, go to format date time and specify the date formatting you want for that field.
I prefer to always pass date parameters to BIRT as strings, using a known date format. This applies to report parameters as well as to DataSet parameters.
Then, inside the query, I convert to date like this:
with params as
( select to_date(pi_start_date_str, 'DD.MM.YYYY') as start_date_incl,
to_date(pi_end_date_str, 'DD.MM.YYYY') + 1 as end_date_excl
from dual
)
select whatever
from my_table, params
where ( my_table.event_date >= params.start_date_incl
and
my_table.end_date < params.end_date_excl
)
This works independent of the time of day.
This way, to select all events for January 2016, for example, I could pass the query parameters '01.01.2016' and '31.01.2016' (I'm using the German date format here).
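The reason this works independent of the time of day is the half-open interval: the start is inclusive, and the end is the day after the requested end date, exclusive. A minimal Python sketch of the same comparison follows; the function name in_range is hypothetical and the parameters use the DD.MM.YYYY format from the query above.

```python
from datetime import datetime, timedelta


def in_range(event: datetime, start_str: str, end_str: str) -> bool:
    # Inclusive lower bound: midnight at the start date.
    start_incl = datetime.strptime(start_str, "%d.%m.%Y")
    # Exclusive upper bound: midnight of the day AFTER the end date,
    # mirroring to_date(...) + 1 in the query.
    end_excl = datetime.strptime(end_str, "%d.%m.%Y") + timedelta(days=1)
    return start_incl <= event < end_excl


if __name__ == "__main__":
    # An event late on the last day is still inside the range.
    print(in_range(datetime(2016, 1, 31, 23, 59, 59), "01.01.2016", "31.01.2016"))
    # Midnight of the following day is outside.
    print(in_range(datetime(2016, 2, 1), "01.01.2016", "31.01.2016"))
```

A closed comparison against the end date itself would silently drop every event after midnight on the last day, which is what the exclusive bound avoids.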

Compare date to month-year in Postgres/Ruby

I have a date column in my table and I would like to 'filter'/select out items after a certain year-month. So if I have data from 2010 on, I have a user input that specifies '2011-10' as the 'earliest date' they want to see data from.
My current SQL looks like this:
select round(sum(amount), 2) as amount,
date_part('month', date) as month
from receipts join items
on receipts.item = items.item
where items.expense = ?
and date_part('year', date)>=2014
and funding = 'General'
group by items.expense, month, items.order
order by items.order desc;
In the second part of the 'where', instead of doing year >= 2014, I want to do something like to_char(date, 'YY-MMMM') >= ? as another parameter and then pass in '2011-10'. However, when I do this:
costsSql = "select round(sum(amount), 2) as amount,
to_char(date, 'YY-MMMM') as year_month
from receipts join items
on receipts.item = items.item
where items.expense = ?
and year_month >= ?
and funding = 'General'
group by items.expense, year_month, items.order
order by items.order desc"
and call that with my two params, I get a postgres error: PG::UndefinedColumn: ERROR: column "year_month" does not exist.
Edit: I converted my YYYY-MM string into a date and passed that in as my param instead and it's working. But I still don't understand why I get the 'column does not exist' error after I created that column in the select clause - can someone explain? Can columns created like that not be used in where clauses?
The error column "year_month" does not exist happens because year_month is an alias defined in the SELECT list, and such aliases can't be referred to in the WHERE clause.
This is based on the fact that the SELECT-list is evaluated after the WHERE clause, see for example: Column alias in where clause? for an explanation from PG developers.
Some databases allow it nonetheless, others don't, and PostgreSQL doesn't. It's one of the many portability hazards between SQL engines.
In the case of the query shown in the question, you don't even need the to_char in the WHERE clause anyway, because as mentioned in the first comment, a direct comparison with a date is simpler and more efficient too.
When a query has a complex expression in the SELECT-list and repeating it in the WHERE clause looks wrong, sometimes it might be refactored to move the expression into a sub-select or a WITH clause at the beginning of the query.
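The fix the asker settled on, converting the 'YYYY-MM' input into a date before binding it as the query parameter, can be sketched as follows. The question uses Ruby; this is a Python sketch of the same idea, and the function name month_start is hypothetical.

```python
from datetime import date, datetime


def month_start(year_month: str) -> date:
    # strptime fills the missing day with 1, so '2011-10' becomes 2011-10-01.
    return datetime.strptime(year_month, "%Y-%m").date()


if __name__ == "__main__":
    # The resulting date can be bound to a condition such as: date >= ?
    print(month_start("2011-10").isoformat())
```

Comparing the date column directly against this value avoids both the alias problem and any string comparison of formatted dates.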

Get the sysdate -1 in Hive

Is there any way to get the current date minus one day in Hive, that is, always yesterday's date?
And in this format: 20120805?
I can run my query like this to get the data for yesterday's date, as today is Aug 6th:
select * from table1 where dt = '20120805';
The table is partitioned on the date (dt) column, so I tried the date_sub function to get yesterday's date:
select * from table1 where dt = date_sub(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(),
'yyyyMMdd')) , 1) limit 10;
This query looks for the data in all the partitions. Why? Am I doing something wrong in my query?
How can I make the evaluation happen in a subquery, to avoid scanning the whole table?
Try something like:
select * from table1
where dt >= from_unixtime(unix_timestamp()-1*60*60*24, 'yyyyMMdd');
This works if you don't mind that Hive scans the entire table. unix_timestamp() with no arguments is not deterministic, so the Hive query planner can't prune partitions for you. In many cases (for example log files), not specifying a deterministic partition key can start a very large Hadoop job, since it scans the whole table rather than just the rows with the given partition key.
If this matters to you, you can launch hive with an additional option
$ hive -hiveconf date_yesterday=20150331
And in the script or hive terminal use
select * from table1
where dt >= ${hiveconf:date_yesterday};
Neither the variable name nor the value is fixed: you can pick any name, and you can compute the value with Unix commands, in this case to get the prior date. For the OP's specific case:
$ hive -hiveconf date_yesterday=$(date --date yesterday "+%Y%m%d")
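If the job is launched from a script rather than an interactive shell, the same value can be computed before Hive starts. This is a minimal Python sketch under that assumption; the helper names are hypothetical, and only the -hiveconf option and the date_yesterday variable come from the answer above.

```python
from datetime import date, timedelta
from typing import List


def yesterday_partition(today: date) -> str:
    # Calendar arithmetic, so month and year boundaries are handled.
    return (today - timedelta(days=1)).strftime("%Y%m%d")


def hive_args(today: date) -> List[str]:
    # Argument list for launching hive with the partition value as a literal.
    return ["hive", "-hiveconf", "date_yesterday=" + yesterday_partition(today)]


if __name__ == "__main__":
    print(" ".join(hive_args(date.today())))
```

Because the value reaches Hive as a plain literal, the planner sees a constant partition key instead of a function call.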
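In MySQL (use INTERVAL arithmetic; curdate()-1 subtracts from the numeric form of the date and breaks on the first day of a month):
select DATE_FORMAT(curdate() - INTERVAL 1 DAY,'%Y%m%d');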
In sqlserver :
SELECT convert(varchar,getDate()-1,112)
Use this query (the '%Y%m%d' specifiers are MySQL style; in Hive the pattern would be 'yyyyMMdd'):
SELECT FROM_UNIXTIME(UNIX_TIMESTAMP()-1*24*60*60,'%Y%m%d');
It looks like DATE_SUB assumes date in format yyyy-MM-dd. So you might have to do some more format manipulation to get to your format. Try this:
select * from table1
where dt = FROM_UNIXTIME(
UNIX_TIMESTAMP(
DATE_SUB(
FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd')
, 1)
)
, 'yyyyMMdd') limit 10;
Use this:
select * from table1 where dt = date_format(concat(year(date_sub(current_timestamp,1)),'-', month(date_sub(current_timestamp,1)), '-', day(date_sub(current_timestamp,1))), 'yyyyMMdd') limit 10;
This will give a deterministic result (a string) of your partition.
I know it's super verbose.
