Create table in Hue after many with statements - hadoop

I am having trouble creating a table in Hue after a series of WITH (temporary table) clauses. A very high-level example is below.
I want to create a table from the result of the last SELECT statement, but I run into errors on the CREATE TABLE line, and I am also not sure what name to use for the result of that last SELECT.
With TABLEA as (Select * from TEST1.FILEA),
TableB as (Select * from tableA)
Select * from tableB
where TableB.Curr = 'TYPEE'
CREATE TABLE TEST
row format delimited
fields terminated by '|'
STORED AS RCFile
as Select * from TableB

Put the CREATE TABLE ... AS in front of the whole query, including the WITH clause. Please follow the syntax and example below:
create table <table_name> as <your_with_clause_select_query>
Example:
create table test as
with tableA as ( select * from test1.fileA)
select * from tableA;
You can also use nested select statements with CTAS. The row format and storage clauses go before AS:
CREATE TABLE TEST
row format delimited fields terminated by '|'
STORED AS RCFile
AS
select * from (
select
*
from
test1.fileA
) b;
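Applied to the chained WITH clauses from the question, the same pattern might look like the sketch below. It assumes a Hive version that accepts a WITH clause inside CTAS (0.13 or later) and reuses the table names, the Curr filter and the storage clauses from the question. It has not been run against the asker's data.

```sql
-- Sketch only: table names, filter and storage clauses are copied from the question.
CREATE TABLE test
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS RCFile
AS
WITH tableA AS (SELECT * FROM test1.fileA),
     tableB AS (SELECT * FROM tableA)
SELECT *
FROM tableB
WHERE tableB.curr = 'TYPEE';
```

The names tableA and tableB exist only inside this one statement, so the final SELECT has no name of its own; the new table test is what holds its result.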

Related

In SQL, how can I see the columns of a table without inserting any values?

SQL> create table justlike(customerid varchar(19),first_name varchar(40),last_name varchar(100),Address varchar(50),city varchar(30),pincode varchar(10),state varchar(20));
Just use:
desc <table_name>;
This prints a description of your table's columns.
In your case:
desc justlike;
If you are using Oracle, you can also check the table definition by querying the data dictionary:
SELECT * FROM USER_TAB_COLS
WHERE TABLE_NAME = 'JUSTLIKE';
Or you can select from the table itself; even when it has no rows, the result shows the column headings:
SELECT * FROM JUSTLIKE;
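If only the column names and types are of interest, a narrower dictionary query may be easier to read. This is a sketch for Oracle, assuming the table belongs to the current user and was created with an unquoted name (the dictionary then stores it in upper case).

```sql
-- List each column of JUSTLIKE with its type, length and nullability,
-- in the order the columns were declared.
SELECT column_name, data_type, data_length, nullable
FROM user_tab_columns
WHERE table_name = 'JUSTLIKE'
ORDER BY column_id;
```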

How to insert data into two tables with one SQL statement?

I have two tables, connected by one field: B_ID in table A refers to id in table B.
I want to use SQL to insert data into both tables.
How should I write the insert statement?
1. id in table B is auto-increment.
2. As a crude workaround, I can insert into table B first, select the new id from table B, and then insert that id into table A as message_id.
A single plain INSERT statement targets only one table. Insert into table B first and then into table A. You can use the RETURNING INTO clause to capture the generated ID and get rid of the extra SELECT between the inserts.
See: https://oracle-base.com/articles/misc/dml-returning-into-clause
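A minimal PL/SQL sketch of that approach follows. It assumes Oracle, that TableB.id is generated automatically (for example by an identity column or a sequence trigger), and a hypothetical content column in TableB; only id and B_ID come from the question.

```sql
DECLARE
  v_id TableB.id%TYPE;  -- receives the generated id
BEGIN
  -- "content" is a hypothetical column used only for illustration.
  INSERT INTO TableB (content)
  VALUES ('hello')
  RETURNING id INTO v_id;

  -- Reuse the captured id as the foreign key in TableA.
  INSERT INTO TableA (B_ID)
  VALUES (v_id);

  COMMIT;
END;
/
```

Both inserts run in the same transaction, so either both rows are committed or neither is.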
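Have you considered an AFTER INSERT trigger? It may be what you are looking for.
Something like this might do what you want:
CREATE OR REPLACE TRIGGER TableB_after_insert
AFTER INSERT
ON TableB
FOR EACH ROW
BEGIN
/*
* :NEW.id is the id of the row just inserted into TableB.
* Selecting from TableB inside a row-level trigger can raise a
* mutating-table error (ORA-04091), so use :NEW instead.
* Add TableA's other columns to the insert as needed.
*/
INSERT INTO TableA (B_ID) VALUES (:NEW.id);
END;
/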

Hive Timestamp aggregation

I have two Hive tables. The Java API team updates table1 on an hourly basis (they call the API and store the results in table1). I now have to aggregate the latest data and store it in another table, table2 (only the newly loaded data, because the older data has already been aggregated and stored). For that I used the query below:
set maxtime = select max(lastactivitytimestamp) from table2;
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp('${hivevar:maxtime}');
I am not getting any result. But when I give the timestamp value manually I am getting data, like below:
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp('2014-08-18 15:23:26.754');
Is it possible to pass dynamic values in unix_timestamp?
Try removing the quotes inside the unix_timestamp() call, like this:
insert into table2 select * from table1 where lastactivitytimestamp > unix_timestamp(${hivevar:maxtime});
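Note that Hive's set command stores the text after the equals sign as a literal string; it does not run it as a query, so maxtime never holds an actual timestamp. A possible alternative, sketched below, computes the maximum inside the insert itself. It assumes lastactivitytimestamp has the same type in both tables (so no unix_timestamp() conversion is needed) and a Hive version that supports CROSS JOIN.

```sql
-- Sketch only: load the rows of table1 that are newer than anything in table2.
-- If table2 is empty, maxtime is NULL and no rows are selected.
INSERT INTO TABLE table2
SELECT t1.*
FROM table1 t1
CROSS JOIN (
  SELECT max(lastactivitytimestamp) AS maxtime
  FROM table2
) m
WHERE t1.lastactivitytimestamp > m.maxtime;
```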

Hive multiple insert goes wrong with the DISTINCT select statement

I read this code from "Hadoop the Definitive Guide":
FROM (SELECT a.ad_id, a.campaign_id, a.account_id, b.user_id
FROM dim_ads a JOIN impression_logs b ON (b.ad_id = a.ad_id)
WHERE b.dateid = '2008-12-01') x
INSERT OVERWRITE DIRECTORY 'results_gby_adid'
SELECT x.ad_id, count(1), count(DISTINCT x.user_id) GROUP BY x.ad_id
INSERT OVERWRITE DIRECTORY 'results_gby_campaignid'
SELECT x.campaign_id, count(1), count(DISTINCT x.user_id) GROUP BY x.campaign_id
INSERT OVERWRITE DIRECTORY 'results_gby_accountid'
SELECT x.account_id, count(1), count(DISTINCT x.user_id) GROUP BY x.account_id;
But in my tests, using several DISTINCT clauses in one multi-insert does not give the right results.
My HiveQL is below:
CREATE TABLE IF NOT EXISTS a (logindate int, id int);
Then I load a local file into this table and create the target table:
CREATE TABLE IF NOT EXISTS user (id INT) PARTITIONED BY (logindate INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
Then, if I insert into each partition separately:
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT DISTINCT(id) FROM a WHERE logindate=20130120;
INSERT OVERWRITE TABLE user PARTITION(logindate=20130121) SELECT DISTINCT(id) FROM a WHERE logindate=20130121;
the results are correct.
But if I use the following multiple-insert HQL instead:
FROM a
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT DISTINCT(id) WHERE logindate=20130120
INSERT OVERWRITE TABLE user PARTITION(logindate=20130121) SELECT DISTINCT(id) WHERE logindate=20130121;
the results are not correct: both partitions have the same number of records, as if the query were SELECT DISTINCT(id) WHERE logindate=20130120 OR logindate=20130121.
So is it a bug, or did I write the syntax wrong?
DISTINCT has a bit of an odd history in the code as an alias to group by.
If there is a bug, the version of Hive you are using would be important to know, since bugs are fixed in each release.
This might work:
FROM a
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT id WHERE logindate=20130120 GROUP BY id
INSERT OVERWRITE TABLE user PARTITION(logindate=20130121) SELECT id WHERE logindate=20130121 GROUP BY id;
If that doesn't work, this should work, even though it isn't the approach you were attempting to use:
FROM (select distinct id, logindate from a where logindate in (20130120, 20130121)) subq_a
INSERT OVERWRITE TABLE user PARTITION(logindate=20130120) SELECT id WHERE logindate=20130120
INSERT OVERWRITE TABLE user PARTITION(logindate=20130121) SELECT id WHERE logindate=20130121;
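Another option that avoids the multi-insert form entirely is a dynamic-partition insert, sketched below. It assumes dynamic partitioning is available and allowed on the cluster, and it reuses the tables a and user from the question. The backticks around user are a precaution, since newer Hive versions may treat it as a reserved word.

```sql
-- Let Hive route each row to its partition based on the logindate value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE `user` PARTITION (logindate)
SELECT DISTINCT id, logindate
FROM a
WHERE logindate IN (20130120, 20130121);
```

Here DISTINCT applies to the (id, logindate) pair, so each partition receives its own set of distinct ids.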

How to duplicate all data in a table except for a single column that should be changed

I have a question regarding a unified insert query against tables with different data structures (Oracle). Let me elaborate with an example:
tb_customers (
id NUMBER(3), name VARCHAR2(40), archive_id NUMBER(3)
)
tb_suppliers (
id NUMBER(3), name VARCHAR2(40), contact VARCHAR2(40), xxx, xxx,
archive_id NUMBER(3)
)
The only column that is present in all tables is [archive_id]. The plan is to create a new archive of the dataset by copying (duplicating) all records to a different database partition and incrementing the archive_id for those records accordingly. [archive_id] is always part of the primary key.
My problem is with select statements to do the actual duplication of the data. Because the columns are variable, I am struggling to come up with a unified select statement that will copy the data and update the archive_id.
One solution (that works), is to iterate over all the tables in a stored procedure and do a:
CREATE TABLE temp as (SELECT * from ORIGINAL_TABLE);
UPDATE temp SET archive_id=something;
INSERT INTO ORIGINAL_TABLE (select * from temp);
DROP TABLE temp;
I do not like this solution very much as the DDL commands muck up all restore points.
Does anyone else have any solution?
How about creating a global temporary table for each base table?
create global temporary table tb_customers$ as select * from tb_customers;
create global temporary table tb_suppliers$ as select * from tb_suppliers;
You don't need to create and drop these each time, just leave them as-is.
Your archive process is then a single transaction:
insert into tb_customers$ select * from tb_customers;
update tb_customers$ set archive_id = :v_new_archive_id;
insert into tb_customers select * from tb_customers$;
insert into tb_suppliers$ select * from tb_suppliers;
update tb_suppliers$ set archive_id = :v_new_archive_id;
insert into tb_suppliers select * from tb_suppliers$;
commit; -- this will clear the global temporary tables
Hope this helps.
I would suggest not having a single SQL statement for all tables and just using an insert per table, with the new archive id passed as a bind variable:
insert into tb_customers_2
select id, name, :new_archive_id from tb_customers;
insert into tb_suppliers_2
select id, name, contact, xxx, xxx, :new_archive_id from tb_suppliers;
Or, if you really need a single SQL statement for all of them, at least pre-create all the temp tables (as temporary tables) and leave them in place for next time. Then use dynamic SQL to refer to each temp table.
insert into ORIGINAL_TABLE_TEMP (SELECT * from ORIGINAL_TABLE);
UPDATE ORIGINAL_TABLE_TEMP SET archive_id=something;
INSERT INTO NEW_TABLE (select * from ORIGINAL_TABLE_TEMP);
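The dynamic SQL idea can also be taken one step further so that no temp table is needed: build each table's column list from the data dictionary and substitute a bind variable for archive_id. The block below is only a sketch of that variation, not the method either answer describes. It assumes Oracle 11gR2 or later (for LISTAGG), column names that need no quoting, and hypothetical archive ids 1 (existing) and 2 (new).

```sql
DECLARE
  v_old  NUMBER := 1;      -- hypothetical: archive to copy from
  v_new  NUMBER := 2;      -- hypothetical: archive id for the copies
  v_cols VARCHAR2(4000);   -- target column list
  v_sel  VARCHAR2(4000);   -- select list with archive_id replaced by a bind
BEGIN
  -- Every table of the current user that has an ARCHIVE_ID column.
  FOR t IN (SELECT ut.table_name
              FROM user_tables ut
              JOIN user_tab_columns c ON c.table_name = ut.table_name
             WHERE c.column_name = 'ARCHIVE_ID') LOOP
    SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id),
           LISTAGG(CASE WHEN column_name = 'ARCHIVE_ID'
                        THEN ':new_id' ELSE column_name END, ', ')
             WITHIN GROUP (ORDER BY column_id)
      INTO v_cols, v_sel
      FROM user_tab_columns
     WHERE table_name = t.table_name;

    -- Copy the rows of the old archive, writing the new archive id.
    EXECUTE IMMEDIATE
      'INSERT INTO ' || t.table_name || ' (' || v_cols || ') ' ||
      'SELECT ' || v_sel || ' FROM ' || t.table_name ||
      ' WHERE archive_id = :old_id'
      USING v_new, v_old;
  END LOOP;
END;
/
```

Because this uses only DML, nothing is committed implicitly, so the whole archive run can still be committed or rolled back as one transaction.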
