Does Hive have something equivalent to DUAL? - hadoop

I'd like to run statements like
SELECT date_add('2008-12-31', 1) FROM DUAL
Does Hive (running on Amazon EMR) have something similar?

The best solution is to omit the table name entirely (FROM is optional as of Hive 0.13.0).
select 1+1;
This returns 2, but poor Hive still needs to spawn a MapReduce job to compute it!

Not yet: https://issues.apache.org/jira/browse/HIVE-1558

To create a DUAL-like table in Hive with one column and one row, you can do the following:
create table dual (x int);
insert into table dual select count(*)+1 as x from dual;
Test an expression:
select split('3,2,1','\\,') as my_new_array from dual;
Output:
["3","2","1"]

There is a nice working solution (well, workaround) available in the link, but it is slow, as you might imagine.
The idea is to create a table with a dummy field, create a text file whose only content is 'X', and load that file into the table. Voilà.
CREATE TABLE dual (dummy STRING);
load data local inpath '/path/to/textfile/dual.txt' overwrite into table dual;
SELECT date_add('2008-12-31', 1) from dual;

Hive now supports this function, along with many other date functions.
You can run a query like the one below in Hive; it adds the number of days given in the second argument to the date given in the first argument.
SELECT DATE_ADD('2019-03-01', 5);
Hive Date Functions

Quick solution:
We can use any existing table (it must contain at least one row) to get DUAL-like behavior with the following query.
SELECT date_add('2008-12-31', 1) FROM <Any Existing Table> LIMIT 1
For example:
SELECT CONCAT('kbdjj','56454') AS a, null AS b FROM tbl_name LIMIT 1
The result is a single row (kbdjj56454, null). The LIMIT 1 in the query prevents those values from being repeated once for every row of the table.

Related

hive create view problem: internally casting numbers to something else

I'm facing an issue in hive while creating view from a partitioned table. If I use the command below:
create view test_view as select * from table where year=2000 and month=01 and day=02;
The view gets created but the below selection results in 0 records:
select count(*) from test_view where day='02';
Whereas the selection below works just as intended:
select count(*) from test_view where day='2';
The following command also gives the count(*) result properly:
select count(*) from test_view where day=2;
The important thing here is that day=02 is a physical partition in the underlying table, which makes sense. Somehow the view creation seems to be interpreting the partition values as integers.
Anyone got any ideas on this?

Hive Generating ID

I'm trying to generate unique id's for a table that was originally done in DB2 using the following:
insert into database.table
select next value for database.sequence,
current_timestamp,
from source
Where the sequence has a defined start value (e.g 25430).
The code I'm currently using is:
insert into database.table
select
row_number() over() + select max(id) from table,
from_unixtime(unix_timestamp())
from source;
This works apart from the nested select statement; at the moment I have to run
select max(id) from table
and put it into the query manually.
Can anyone suggest a way to do this in the one query?
You have to force a crossjoin, something like this:
select
...
from source,
(select max(id)as maxid from table) as m_id
;
This way you get one value for your max id back, and you can use that to generate your new one.
Generating surrogate keys with hive is kind of painful, sadly enough.
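To make the cross join above concrete, here is a minimal sketch of the whole insert. The names target_table, source and maxid are hypothetical stand-ins for the tables in the question, and the select list is assumed to match the target table's two columns (id, timestamp); adjust it to your real schema.

```sql
-- Sketch only: target_table and source are hypothetical names.
-- The subquery returns exactly one row, so the cross join attaches
-- the current maximum id to every row of source.
INSERT INTO TABLE target_table
SELECT
  m.maxid + row_number() OVER () AS id,        -- new ids continue after the current max
  from_unixtime(unix_timestamp()) AS created_at
FROM source s
CROSS JOIN (
  -- COALESCE covers the case where target_table is still empty
  SELECT COALESCE(MAX(id), 0) AS maxid FROM target_table
) m;
```

Note that this is only safe if nothing else inserts into the target table at the same time; two concurrent runs would read the same maximum and generate overlapping ids.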

Insert timestamp into Hive

Hi, I'm new to Hive and I want to insert the current timestamp into my table along with a row of data.
Here is an example of my team table :
team_id int
fname string
lname string
time timestamp
I have looked at some other examples ("How to insert timestamp into a Hive table?", "How can I add a timestamp column in hive") but can't seem to get it to work.
This is what I am trying:
insert into team values('101','jim','joe',from_unixtime(unix_timestamp()));
The error I get is:
FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
If anyone could help, that would be great, many thanks frostie
This can be achieved with current_timestamp(), but only via a select clause. The select statement doesn't even require a from clause.
insert into team select '101','jim','joe',current_timestamp();
Or, if your Hive version doesn't support omitting from in a select statement:
insert into team select '101','jim','joe',current_timestamp() from team limit 1;
If you don't already have a table with at least one row, you can accomplish the desired result as follows:
insert into team select '101','jim','joe',current_timestamp() from (select '123') x;

Bulk insert in oracle

I need to insert a huge number of records that arrive as interface files (text files).
Right now I am using this format to insert the records:
INSERT ALL
INTO POSTAL_CODE( postal_code,desc)
VALUES('100','Coimbatore')
INTO POSTAL_CODE (postal_code,desc)
VALUES('101','Mumbai') SELECT * FROM DUAL;
But this performs badly. I am new to databases, so please help me insert the records faster. DB2, by contrast, supports this format:
INSERT INTO POSTAL_CODE( postal_code,desc)
VALUES('100','Coimbatore'), (postal_code,desc),('101','Mumbai');
Why doesn't Oracle support this type of insert? I am stuck on this and need another, faster solution.
You can change the statement below
INSERT INTO POSTAL_CODE( postal_code,desc) VALUES('100','Coimbatore'),
(postal_code,desc),('101','Mumbai');
to the following, using UNION ALL, which should work in Oracle as well:
INSERT INTO POSTAL_CODE( postal_code,"desc")
select '100','Coimbatore' from dual
union all
select '99','Goa' from dual
union all
select '101','Mumbai' from dual;
You would do better to look at the utilities Oracle provides for this purpose, such as SQL*Loader.
Also see this other SO post: Loading data from a text file to a table in oracle

What is the dual table in Oracle?

I've heard people referring to this table and was not sure what it was about.
It's a sort of dummy table with a single record used for selecting when you're not actually interested in the data, but instead want the results of some system function in a select statement:
e.g. select sysdate from dual;
See http://www.adp-gmbh.ch/ora/misc/dual.html
As of 23c, Oracle supports select sysdate /* or other value */, without from dual, as has been supported in MySQL for some time already.
It is a dummy table with one element in it. It is useful because Oracle doesn't allow statements like
SELECT 3+4
You can work around this restriction by writing
SELECT 3+4 FROM DUAL
instead.
From Wikipedia
History
The DUAL table was created by Chuck Weiss of Oracle corporation to provide a table for joining in internal views:
I created the DUAL table as an underlying object in the Oracle Data Dictionary. It was never meant to be seen itself, but instead used inside a view that was expected to be queried. The idea was that you could do a JOIN to the DUAL table and create two rows in the result for every one row in your table. Then, by using GROUP BY, the resulting join could be summarized to show the amount of storage for the DATA extent and for the INDEX extent(s). The name, DUAL, seemed apt for the process of creating a pair of rows from just one.
It may not be obvious from the above, but the original DUAL table had two rows in it (hence its name). Nowadays it only has one row.
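The row-doubling trick described in the quote can be sketched roughly as follows. This is only an illustration in present-day Oracle SQL, not the original data dictionary view: two_rows stands in for the old two-row DUAL, and seg_sizes with its columns is made-up sample data.

```sql
-- Illustration only: two_rows emulates the original two-row DUAL,
-- seg_sizes is hypothetical sample data.
WITH two_rows AS (
  SELECT 'DATA' AS kind FROM dual
  UNION ALL
  SELECT 'INDEX' AS kind FROM dual
),
seg_sizes AS (
  SELECT 'T1' AS table_name, 100 AS data_blocks, 20 AS index_blocks FROM dual
  UNION ALL
  SELECT 'T2', 50, 5 FROM dual
)
SELECT t.kind,
       -- each source row appears twice, once per kind, and contributes
       -- its data size to one copy and its index size to the other
       SUM(CASE t.kind WHEN 'DATA' THEN s.data_blocks
                       ELSE s.index_blocks END) AS blocks
FROM seg_sizes s
CROSS JOIN two_rows t
GROUP BY t.kind
ORDER BY t.kind;
-- With the sample rows above: DATA -> 150, INDEX -> 25
```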
Optimization
DUAL was originally a table and the database engine would perform disk IO on the table when selecting from DUAL. This disk IO was usually logical IO (not involving physical disk access) as the disk blocks were usually already cached in memory. This resulted in a large amount of logical IO against the DUAL table.
Later versions of the Oracle database have been optimized and the database no longer performs physical or logical IO on the DUAL table even though the DUAL table still actually exists.
I think this wikipedia article may help clarify.
http://en.wikipedia.org/wiki/DUAL_table
The DUAL table is a special one-row table present by default in all Oracle database installations. It is suitable for use in selecting a pseudocolumn such as SYSDATE or USER. The table has a single VARCHAR2(1) column called DUMMY that has a value of "X".
It's the special table in Oracle. I often use it for calculations or checking system variables. For example:
Select 2*4 from dual prints out the result of the calculation
Select sysdate from dual prints the server current date.
A utility table in Oracle with only 1 row and 1 column. It is used to perform a number of arithmetic operations and can be used generally where one needs to generate a known output.
SELECT * FROM dual;
will give a single row, with a single column named "DUMMY" and a value of "X" as shown here:
DUMMY
-----
X
Kind of a pseudo table you can run commands against and get back results, such as sysdate. Also helps you to check if Oracle is up and check sql syntax, etc.
The DUAL table is a special one-row table present by default in all Oracle database installations. It is suitable for use in selecting a pseudocolumn such as SYSDATE or USER.
The table has a single VARCHAR2(1) column called DUMMY that has a value of "X".
You can read all about it in http://en.wikipedia.org/wiki/DUAL_table
DUAL is necessary in PL/SQL development for using functions that are only available in SQL
e.g.
DECLARE
x XMLTYPE;
BEGIN
SELECT xmlelement("hhh", 'stuff')
INTO x
FROM dual;
END;
More Facts about the DUAL....
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1562813956388
Thrilling experiments done here, and more thrilling explanations by Tom
DUAL is mainly used for getting the next number from a sequence.
Syntax: SELECT <sequence_name>.NEXTVAL FROM DUAL
This returns one row with one column (named NEXTVAL).
Another situation that requires select ... from dual is when we want to retrieve the code (data definition) for different database objects (like TABLE, FUNCTION, TRIGGER, PACKAGE), using the built-in DBMS_METADATA.GET_DDL function:
select DBMS_METADATA.GET_DDL('TABLE','<table_name>') from DUAL;
select DBMS_METADATA.GET_DDL('FUNCTION','<function_name>') from DUAL;
It is true that nowadays IDEs offer the capability to view the DDL of a table, but in simpler environments like SQL*Plus this can be really handy.
EDIT
A more general situation: basically, whenever we need to use a PL/SQL function inside a standard SQL statement, or want to call a function from the command line:
select my_function(<input_params>) from dual;
Both recipes are taken from the book 'Oracle PL/SQL Recipes' by Josh Juneau and Matt Arena.
DUAL is a special one-row, one-column table present by default in all Oracle databases. The owner of DUAL is SYS.
DUAL is a table automatically created by Oracle Database along with the data dictionary. It is commonly used to evaluate system functions and expressions (such as the date, the time, or an arithmetic expression).
SELECT SYSDATE from dual;
It's an object to put in the FROM clause that returns exactly one row. For example:
select 1 from dual;
returns 1
select 21+44 from dual;
returns 65
select [sequence].nextval from dual;
returns the next value from the sequence.

Resources