I am new to Hadoop and big data concepts. I am using the Hortonworks Sandbox and trying to manipulate the values of a CSV file. I imported the file using the file browser and created a table in Hive to run some queries. What I want is an "insert into ... values" style query that selects some rows, changes the values of some columns (for example, converts a string to a binary 0 or 1) and inserts the result into a new table. A SQL-like query could be something like this:
Insert into table1 (id, name, '01')
select id, name, graduated
from table2
where university = 'aaa'
Unfortunately Hive cannot insert (constant) values without importing them from a file, and I don't know how to solve this problem using Hive, Pig or even MapReduce scripts.
Please help me find a solution; I really need it.
Thanks in advance.
In Hive,
CREATE TABLE table1 as SELECT id, name, graduated FROM table2
WHERE university = 'aaa'
should create a new table with the results of the query.
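To also turn the string column into a 0/1 flag, as the question asks, a CASE expression in the SELECT might be enough. This is only a sketch: it assumes graduated holds the string 'yes' for graduates, and graduated_flag is a made-up column name.

```sql
-- Sketch: CTAS that converts a string column to 0/1 while copying rows.
-- Assumption: graduated contains 'yes' for graduates; adjust to your data.
CREATE TABLE table1 AS
SELECT id,
       name,
       CASE WHEN graduated = 'yes' THEN 1 ELSE 0 END AS graduated_flag
FROM table2
WHERE university = 'aaa';
```

The constant values live in the query itself, so nothing has to be loaded from a file.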
I am new to Hive and have some problems. I tried to find an answer here and on other sites, but with no luck. I also tried many different queries that came to mind, also without success.
I have my source table and I want to create a new table like this.
Where:
id would be an auto-increment number for each distinct county, used as the primary key
county would be the distinct county names (from the source table)
You could follow this approach: a CTAS (Create Table As Select).
With your example, this CTAS could work:
CREATE TABLE t_county
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE AS
WITH t AS(
-- deduplicate first, so that each county is numbered exactly once
SELECT DISTINCT county
FROM counties)
SELECT ROW_NUMBER() OVER(ORDER BY county) AS id, county
FROM t;
You cannot have primary keys or foreign keys in Hive the way you have them in RDBMSs like Oracle or MySQL. Hive is schema-on-read rather than schema-on-write, so it does not enforce constraints on the data.
I cannot give you the exact answer, because you are supposed to try it yourself first and then come back here if you have a problem or a doubt. What I can tell you is that you can use the INSERT statement to fill a new table with data from another table, e.g.:
create table CARS (name string);
insert into table CARS select x from TABLE_2;
You can also use INSERT OVERWRITE instead of INSERT INTO if you want to delete all the existing data in that table (CARS).
So, the operation will be
CREATE TABLE ==> INSERT OPERATION (OVERWRITE?) + QUERY OPERATION
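The difference between appending and overwriting can be sketched as follows. It reuses CARS and TABLE_2 from above; x is a placeholder column name, and the append form assumes Hive 0.8 or later, where INSERT INTO is available.

```sql
-- Sketch only: x is a placeholder for a string column of TABLE_2.
CREATE TABLE CARS (name STRING);

-- Append: rows already in CARS are kept.
INSERT INTO TABLE CARS
SELECT x FROM TABLE_2;

-- Overwrite: existing rows in CARS are replaced by the query result.
INSERT OVERWRITE TABLE CARS
SELECT x FROM TABLE_2;
```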
Hive is not an RDBMS, so there is no concept of primary key or foreign key.
There is no true auto-increment column either, but you can generate a unique (non-sequential) ID for each row. Please try:
Create table new_table as
select reflect("java.util.UUID", "randomUUID") id, countries from my_source_table;
Using hive, I'm trying to concatenate columns from one table and insert them in another table using the query
insert into table temp_error
select * from (Select 'temp_test','abcd','abcd','abcd',
from_unixtime(unix_timestamp()),concat_ws('|',sno,name,age)
from temp_test_string)c;
I get the required output as long as I only run the Select *. But as soon as I insert it into the table, the last column no longer holds the concatenated output; it contains only the value of sno.
Thanks guys.
I found why it was behaving that way. When creating the table I had declared the fields as terminated by '|'. So the string I was trying to insert was split by Hive into different columns at each '|'.
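One way to avoid the clash is to give the target table a field delimiter that never occurs in the data. The sketch below assumes tab is safe; the column names are made up, since the original table definition is not shown.

```sql
-- Sketch: hypothetical column names, tab as the field delimiter.
CREATE TABLE temp_error (
  source_table STRING,
  col2         STRING,
  col3         STRING,
  col4         STRING,
  logged_at    STRING,
  record       STRING   -- receives the '|'-joined value intact
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```

Alternatively, the table delimiter can stay as '|' if concat_ws uses a different separator.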
I want to write a query such that it returns the table name (of the table I am querying) and some other values. Something like:
select table_name, col1, col2 from table_name;
I need to do this in Hive. Any idea how I can get the table name of the table I am querying?
Basically, I am creating a lookup table that stores the table name and some other information on a daily basis in Hive. Since Hive does not (at least the version we are using) support full-fledged INSERTs, I am trying to use the workaround where we can INSERT into a table with a SELECT query that queries another table. Part of this involves actually storing the table name as well. How can this be achieved?
For the purposes of my use case, this will suffice:
select 'table_name', col1, col2 from table_name;
It returns the table name with the other columns that I will require.
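Feeding that into the lookup table might look like the sketch below. The name lookup_table is hypothetical, and the INSERT INTO form assumes Hive 0.8 or later; on older versions only INSERT OVERWRITE is available.

```sql
-- Sketch: lookup_table is a hypothetical target with three matching columns.
INSERT INTO TABLE lookup_table
SELECT 'table_name', col1, col2
FROM table_name;
```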
I am creating pages for loading data from a CSV file through the Data Load Wizard. The CSV file does not contain data for all fields, but these fields are required.
How can I fill in the fields that are not in the CSV, for example by using queries against other tables?
My APEX version is 4.1.1.00.23
You can do that using Oracle SQL.
If there is a unique ID for the table that you are loading into (let's say most of the employee data is in a CSV), you can use that ID to merge or update data from other tables.
For a table like this...
empid,
ename,
zip_code,
state_code
For example, you can load this part from the CSV:
EMP:
empid,
ename,
zip_code
and then update the state using this query.
update emp tgt
set state_code = (select state
                  from zip_code_lookup src
                  where src.zip_code = tgt.zip_code
                 );
If there is more than one column that needs to be updated, I prefer using the merge command.
http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_9016.htm#SQLRF01606
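A MERGE for this case could look like the sketch below. It assumes zip_code is unique in zip_code_lookup (otherwise Oracle rejects the merge), and the city column is hypothetical, added only to show a second column being filled.

```sql
-- Sketch: fill several columns of emp from the lookup table in one pass.
-- Assumptions: zip_code is unique in zip_code_lookup; city is a made-up column.
MERGE INTO emp tgt
USING zip_code_lookup src
ON (tgt.zip_code = src.zip_code)
WHEN MATCHED THEN UPDATE SET
  tgt.state_code = src.state,
  tgt.city       = src.city;
```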
You can refer to this link for more information about loading data into multiple tables using the Data Load Wizard: http://vincentdeelen.blogspot.in/2014/03/data-load-for-multiple-tables.html
I am trying to insert data from table a to table b (both are external tables), basically relying upon the append feature of the environment. I have tried the same with managed tables as well, but the behaviour was same.
The append somehow is not working for me. On the other hand, the overwrite works just fine.
e.g. the following fails
hive> insert table page_view select viewtime, userid, page_url, country from page_view1;
FAILED: Parse Error: line 1:0 cannot recognize input near 'insert' 'table' 'page_view' in insert clause
but, the following works just fine...
hive> insert overwrite table page_view select viewtime, userid, page_url, country from page_view1;
I am on hadoop 1.0.2 and hive 0.8.1
help needed...
insert table page_view select viewtime, userid, page_url, country from page_view1;
I believe according to what I saw in the comments here (https://issues.apache.org/jira/browse/HIVE-306) you are missing an INTO keyword. I think something like this might work:
insert INTO table page_view select viewtime, userid, page_url, country from page_view1;