Hive Generating ID - hadoop

I'm trying to generate unique id's for a table that was originally done in DB2 using the following:
insert into database.table
select next value for database.sequence,
current_timestamp
from source
where the sequence has a defined start value (e.g. 25430).
The code I'm currently using is:
insert into database.table
select
row_number() over() + select max(id) from table,
from_unixtime(unix_timestamp())
from source;
This is fine apart from the nested select statement not working; at the moment I have to run
select max(id) from table
and put it into the query manually.
Can anyone suggest a way to do this in one query?

You have to force a cross join, something like this:
select
...
from source,
(select max(id) as maxid from table) as m_id
;
This way you get a single value for the max id back, joined to every row of source, and you can add it to row_number() to generate the new ids.
Generating surrogate keys with Hive is kind of painful, sadly enough.
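To make the cross-join idea concrete, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here, and the table names (target, source), the name column, and the sample rows are assumptions made up for the illustration; the original question does not show the table layouts.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER, created TEXT, name TEXT);
    CREATE TABLE source (name TEXT);
    -- One existing row, so max(id) starts at the example value 25430.
    INSERT INTO target VALUES (25430, '2020-01-01 00:00:00', 'existing');
    INSERT INTO source VALUES ('a'), ('b'), ('c');
""")

# The one-row subquery is cross joined to every source row, so maxid
# is available next to row_number() without a nested scalar select.
conn.execute("""
    INSERT INTO target
    SELECT row_number() OVER () + m.maxid, CURRENT_TIMESTAMP, s.name
    FROM source AS s
    CROSS JOIN (SELECT max(id) AS maxid FROM target) AS m
""")

new_ids = [row[0] for row in conn.execute(
    "SELECT id FROM target WHERE name <> 'existing' ORDER BY id")]
print(new_ids)  # [25431, 25432, 25433]
```

Note that max(id) is NULL when the target table is empty, so the generated ids would be NULL as well; wrapping it in coalesce with the desired start value is one way to cover the first load.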

Related

How to get the select statement that was used to create a table in Oracle

I created a table in Oracle like
CREATE TABLE suppliers AS (SELECT * FROM companies WHERE id > 1000);
I would like to know the complete select statement which was used to create this table.
I have already tried get_ddl but it is not giving the select statement. Can you please let me know how to get the select statement?
If you're lucky one of these statements will show the DDL used to generate the table:
select *
from gv$sql
where lower(sql_fulltext) like '%create table suppliers%';
select *
from dba_hist_sqltext
where lower(sql_text) like '%create table%';
I used the word lucky because GV$SQL will usually only have results for a few hours or days, until the data is purged from the shared pool. DBA_HIST_SQLTEXT will only help if all of the following hold: you have AWR enabled; the statement was run within the last X days that AWR is configured to hold data (the default is 8); a snapshot has been collected since the statement ran (by default this happens every hour); and the statement ran long enough for AWR to think it's worth saving.
Even then, Oracle does not always store the full SQL. For security reasons, DDL statements are often truncated in the data dictionary. Don't be surprised if the text suddenly cuts off after the first N characters.
Depending on how the SQL was issued, the case and whitespace may also differ. Use lower() and plenty of wildcards to increase the chance of finding the statement.
Try this:
select distinct table_name
from
all_tab_columns where column_name in
(
select column_name from
all_tab_columns
where table_name ='SUPPLIERS'
)
This lists every table that shares at least one column name with SUPPLIERS, which can help you find the table it was created from.

Cannot improve bulk delete

I am using Java with mybatis.
I have a query like this and I need to execute it for 2000 values of key_b. That means running the SQL 2000 times, which is quite slow.
DELETE FROM my_table
WHERE key_a = xxx
AND key_b = yyy
I then came up with another solution: sending 1000 values in an IN clause for key_b, which means executing only two queries. I expected this to be faster, but it seems to be even slower than the first approach. Here is the SQL:
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN (y1, y2, ... y1000)
For more information, key_b is the primary key, and key_a is a foreign key and has an index.
I have also tried reusing one session and committing only after all the statements have been executed, but it didn't improve much.
You can use a temp table for this, that is, a table with a single id column.
First insert your values into that table like this:
insert into temp_table
select 1 from dual -- your ids
union all
select 2 from dual
union all
select 3 from dual
union all
......
After you fill temp_table, you can run just this:
DELETE FROM my_table
WHERE key_a = xxxx
AND key_b IN
(
select id from temp_table
);
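As a rough end-to-end illustration of the temp-table approach above, the following sketch uses Python's sqlite3 module with an in-memory database rather than the asker's Java/MyBatis setup. The sample data, the key_a value 7, and the index name idx_key_a are assumptions invented for the example; only my_table, key_a, key_b and temp_table come from the posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (key_b INTEGER PRIMARY KEY, key_a INTEGER);
    CREATE INDEX idx_key_a ON my_table (key_a);
    CREATE TEMP TABLE temp_table (id INTEGER PRIMARY KEY);
""")

# Hypothetical sample data: 3000 rows, even key_b values get key_a = 7.
conn.executemany(
    "INSERT INTO my_table (key_b, key_a) VALUES (?, ?)",
    [(b, 7 if b % 2 == 0 else 8) for b in range(1, 3001)],
)

# Load the ids to delete with one prepared insert executed many times,
# instead of building a long UNION ALL or IN (...) list.
ids_to_delete = list(range(1, 2001))
conn.executemany(
    "INSERT INTO temp_table (id) VALUES (?)",
    [(i,) for i in ids_to_delete],
)

# A single DELETE that reads its id list from the temp table.
cur = conn.execute(
    """
    DELETE FROM my_table
    WHERE key_a = ?
      AND key_b IN (SELECT id FROM temp_table)
    """,
    (7,),
)
deleted = cur.rowcount
conn.commit()

remaining = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]
print(deleted, remaining)  # 1000 2000
```

Whether this beats a batched single-row delete on a real database depends on the engine and the constraints involved, so it is worth measuring both.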
I recommend sticking with the first approach: a prepared DELETE statement called in a Java loop over the id collection. Of course, use ExecutorType REUSE or BATCH so that the statement is prepared once and then run for every record.
Furthermore, I discourage trying to bind thousands of parameters.
In any case, I fear this is the best you can do, since the delete operation will check integrity constraints, and probably update the index, for every record. That part is not "bulked".

Fill in the NOT NULL columns of a table while using the INSERT INTO statement

I'm working through a problem where I need to insert 135 rows into a recently created table with a select statement. I have a handful of NOT NULL constraints on that table and I don't understand how to alter my SELECT to insert the correct information.
Here's what I'm trying to do:
CREATE SEQUENCE target_table_s1 START WITH 1001;
INSERT INTO target_table(colA,ColB,ColC,ColD,ColE)
target_table_s1.NEXTVAL,
(SELECT (colB,colC,ColD)
FROM source_table),
colE;
Where colA is a sequence number (to provide a primary key for the target_table) and colE basically just needs to be something simple like SYSDATE.
Any suggestions on how I can make this work? I know that what I've written above isn't going to work but it's the best way I can illustrate what I'm trying to accomplish. Do I need to find a way to put my sequence inside the select statement so it follows the proper "INSERT INTO SELECT" format?
I think you should just be using the INSERT INTO ... SELECT construct here:
INSERT INTO target_table (colA,ColB,ColC,ColD,ColE)
SELECT target_table_s1.NEXTVAL, ColB, ColC, ColD, SYSDATE
FROM source_table
I assume above that you want to insert the SYSDATE into column E.

Insert timestamp into Hive

Hi, I'm new to Hive and I want to insert the current timestamp into my table along with a row of data.
Here is an example of my team table :
team_id int
fname string
lname string
time timestamp
I have looked at some other examples ("How to insert timestamp into a Hive table?" and "How can I add a timestamp column in hive") and can't seem to get it to work.
This is what I am trying:
insert into team values('101','jim','joe',from_unixtime(unix_timestamp()));
The error I get is:
FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
If anyone could help, that would be great. Many thanks, frostie
This can be achieved with current_timestamp(), but only in a select clause. The select statement does not even require a from clause.
insert into team select '101','jim','joe',current_timestamp();
Or, if your Hive version doesn't support omitting the from clause in a select statement:
insert into team select '101','jim','joe',current_timestamp() from team limit 1;
If you don't already have a table with at least one row, you can get the desired result like this:
insert into team select '101','jim','joe',current_timestamp() from (select '123') x;

Does Hive have something equivalent to DUAL?

I'd like to run statements like
SELECT date_add('2008-12-31', 1) FROM DUAL
Does Hive (running on Amazon EMR) have something similar?
The best solution is not to mention a table name at all.
select 1+1;
This gives the result 2, but poor Hive needs to spawn a MapReduce job to compute it!
Not yet: https://issues.apache.org/jira/browse/HIVE-1558
To create a DUAL-like table in Hive with one column and one row, you can do the following:
create table dual (x int);
insert into table dual select count(*)+1 as x from dual;
Test an expression:
select split('3,2,1','\\,') as my_new_array from dual;
Output:
["3","2","1"]
There is a nice working solution (well, workaround) available in the link, but it is slow as you might imagine.
The idea is to create a table with a dummy field, create a text file whose content is just 'X', and load that file into the table. Voilà.
CREATE TABLE dual (dummy STRING);
load data local inpath '/path/to/textfile/dual.txt' overwrite into table dual;
SELECT date_add('2008-12-31', 1) from dual;
Hive now supports this function without a table, along with many other date functions.
You can run a query like the one below in Hive; it adds the number of days given in the second argument to the date given in the first argument.
SELECT DATE_ADD('2019-03-01', 5);
Reference: Hive Date Functions
Quick Solution:
We can use any existing table to get DUAL-like behaviour with the following query.
SELECT date_add('2008-12-31', 1) FROM <Any Existing Table> LIMIT 1
For example:
SELECT CONCAT('kbdjj','56454') AS a, null AS b FROM tbl_name LIMIT 1
The result is a single row (kbdjj56454, null). "LIMIT 1" is used so that these values are not repeated once for every row of the table.
