append not working with hive - hadoop

I am trying to insert data from table a to table b (both are external tables), basically relying on the append feature of the environment. I have tried the same with managed tables as well, but the behaviour was the same.
The append somehow is not working out for me. On the other hand, the overwrite works just fine.
e.g. the following fails
hive> insert table page_view select viewtime, userid, page_url, country from page_view1;
FAILED: Parse Error: line 1:0 cannot recognize input near 'insert' 'table' 'page_view' in insert clause
but, the following works just fine...
hive> insert overwrite table page_view select viewtime, userid, page_url, country from page_view1;
I am on hadoop 1.0.2 and hive 0.8.1
help needed...

I believe, according to what I saw in the comments on HIVE-306 (https://issues.apache.org/jira/browse/HIVE-306), you are missing an INTO keyword. INSERT INTO was only added in Hive 0.8.0 via that ticket, so your Hive 0.8.1 should support it. I think something like this might work:
insert INTO table page_view select viewtime, userid, page_url, country from page_view1;

Related

How to create table in Hive with specific column values from another table

I am new to Hive and have some problems. I tried to find an answer here and on other sites, but with no luck... I also tried many different queries that came to my mind, also without success.
I have my source table and I want to create a new table like this.
Where:
id would be an auto-increment number for each distinct county, and the primary key
counties would be the distinct names of the counties (from the source table)
You could follow this approach: a CTAS (Create Table As Select).
With your example, this CTAS could work:
CREATE TABLE t_county
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE AS
-- de-duplicate first, then number the rows; putting ROW_NUMBER() in the
-- same SELECT as the DISTINCT would make every row unique and keep duplicates
WITH t AS (
SELECT DISTINCT county
FROM counties)
SELECT ROW_NUMBER() OVER () AS id, county
FROM t;
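A quick sanity check (a hypothetical verification query, using the names above): each distinct county should now appear exactly once with its own id.
select id, county from t_county order by id;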
You cannot have primary keys or foreign keys in Hive the way you have them on RDBMSs like Oracle or MySQL, because Hive is schema-on-read instead of schema-on-write like Oracle, so you cannot enforce constraints of any kind in Hive.
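(One hedged caveat: Hive 2.1 and later do accept informational constraints, recorded as metadata but never enforced; a sketch:)
-- Hive 2.1+ only; DISABLE NOVALIDATE marks the constraint as not enforced
CREATE TABLE t_county_pk (
id INT,
county STRING,
PRIMARY KEY (id) DISABLE NOVALIDATE
);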
I can't give you the exact answer, because you're supposed to try it yourself first and then, if you have a problem or a doubt, come here and tell us. But what I can tell you is that you can use the INSERT statement to fill a new table with data from another table, i.e.:
create table CARS (name string);
-- INTO appends; the SELECT list must match the target table's single column
insert into table CARS select x from TABLE_2;
You can also use the overwrite statement if you desire to delete all the existing data that you have inside that table (CARS).
So, the operation will be
CREATE TABLE ==> INSERT OPERATION (OVERWRITE?) + QUERY OPERATION
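For completeness, a minimal sketch of the overwrite variant, reusing the hypothetical CARS and TABLE_2 from above:
-- OVERWRITE replaces whatever is already in CARS with the query result
insert overwrite table CARS select x from TABLE_2;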
Hive is not an RDBMS, so there is no concept of a primary key or foreign key.
But you can add a generated unique-id column in Hive. Please try:
Create table new_table as
select reflect("java.util.UUID", "randomUUID") id, countries from my_source_table;

order by added column not working in CLOUDERA

I've created a table in CLOUDERA and then added a column to it with:
ALTER TABLE table1 ADD COLUMNS (`new_col` VARCHAR(40));
then I'm trying to select with order by:
select col1,col2,new_col from table1 order by 1,2,3
However, this fails.
It works without the ORDER BY clause.
It also works when new_col is left out of the SELECT list.
Any ideas what's causing the failure?
Edit:
I think it happens because the new column contains nulls. How can I overcome this issue?
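If the nulls really are the culprit, one possible workaround (a sketch; it assumes new_col is a string, and it names the columns explicitly, which also avoids depending on positional ORDER BY support):
select col1, col2, new_col
from table1
-- coalesce maps NULL to an empty string so every row has a sortable value
order by col1, col2, coalesce(new_col, '');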

Concat_ws not working in insert statement in hive

Using hive, I'm trying to concatenate columns from one table and insert them in another table using the query
insert into table temp_error
select * from (Select 'temp_test','abcd','abcd','abcd',
from_unixtime(unix_timestamp()),concat_ws('|',sno,name,age)
from temp_test_string)c;
I get the required output as long as I only run the SELECT. But as soon as I try to insert it into the table, the concatenated output is lost: the column ends up holding only the value of sno instead of the whole concatenated string.
Thanks guys.
I found out why it was behaving that way. It's because while creating the table I specified that fields are separated by '|'. So the string I was trying to insert, which itself contains '|', was being interpreted by Hive as several different columns.
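A sketch of the fix implied here (the six-column DDL is a guess at the target table's shape): choose a table delimiter that cannot appear inside the concatenated data.
create table temp_error (c1 string, c2 string, c3 string, c4 string, ts string, details string)
-- '\001', Hive's default delimiter, will not clash with the '|' inside details
row format delimited fields terminated by '\001';
insert into table temp_error
select 'temp_test','abcd','abcd','abcd',
from_unixtime(unix_timestamp()), concat_ws('|', sno, name, age)
from temp_test_string;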

How to "insert into values" using Hive,Pig or MapReduce?

I am new to hadoop and big data concepts. I am using the Hortonworks sandbox and trying to manipulate the values of a csv file. So I imported the file using the file browser and created a table in Hive to run some queries. Actually I want an "insert into values" style query that selects some rows, changes the value of a column (for example from a string to binary 0 or 1), and inserts the result into a new table. An SQL-like query could be something like this:
Insert into table1 (id, name, '01')
select id, name, graduated
from table2
where university = 'aaa'
Unfortunately Hive could not insert (constant) values (without importing them from a file), and I don't know how to solve this problem using Hive, Pig, or even MapReduce scripts.
Please help me find the solution, I really need it.
Thanks in advance.
In Hive,
CREATE TABLE table1 as SELECT id, name, graduated FROM table2
WHERE university = 'aaa'
should create a new table with the results of the query.
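If you also want the string-to-0/1 conversion mentioned in the question, a CASE expression fits inside the same CTAS (a sketch; the 'yes' literal is a guess at what the graduated column actually holds):
CREATE TABLE table1 AS
SELECT id, name,
-- hypothetical mapping: adjust 'yes' to whatever the source data stores
CASE WHEN graduated = 'yes' THEN 1 ELSE 0 END AS graduated
FROM table2
WHERE university = 'aaa';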

Hive: Create New Table from Existing Partitioned Table

I'm using Amazon's Elastic MapReduce and I have a hive table created based on a series of log files stored in Amazon S3 and split in folders by day like so:
data/day=2011-09-01/log_file.tsv
data/day=2011-09-02/log_file.tsv
I am currently trying to create an additional table which filters out some unwanted activity in these log files but I can't figure out how to do this and keep getting errors such as:
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.
If my initial table create statement looks something like this:
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
... fields ...
)
PARTITIONED BY ( DAY STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bucketname/data/';
That initial table works fine and I've been able to query it with no problems.
How then should I create a new table that shares the structure of the previous one but simply filters out data? This doesn't seem to work.
CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;
FROM table1
INSERT OVERWRITE TABLE table2
SELECT * WHERE
col1 = '%somecriteria%' AND
more criteria...
;
As I've stated above, this returns:
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.
Thanks!
This always works for me:
CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;
INSERT OVERWRITE TABLE table2 PARTITION (day) SELECT col1, col2, ..., day FROM table1;
ALTER TABLE table2 RECOVER PARTITIONS;
Notice that I've added 'day' as a column in the SELECT statement. Also notice that there is an ALTER TABLE line, which is necessary for Hive to become aware of the partitions that were newly created in table2. (RECOVER PARTITIONS is Amazon EMR syntax; on stock Apache Hive the equivalent is MSCK REPAIR TABLE table2;.)
I have never used the LIKE option, so thanks for showing me that. Will that actually create all of the partitions that the first table has as well? If not, that could be the issue. You could try using dynamic partitions:
create external table if not exists table2 like table1;
-- the dynamic partition column (day) must be selected last
insert overwrite table table2 partition(day) select col1, col2, day from table1;
Might not be the best solution, as I think you have to specify your columns in the select clause (as well as the partition column in the partition clause).
And, you must turn on dynamic partitioning.
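The usual switches for that are set per session (standard Hive configuration properties):
set hive.exec.dynamic.partition=true;
-- nonstrict lets every partition be determined dynamically from the data
set hive.exec.dynamic.partition.mode=nonstrict;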
I hope this helps.
