Using Hive, I'm trying to concatenate columns from one table and insert them into another table using this query:
insert into table temp_error
select * from (Select 'temp_test','abcd','abcd','abcd',
from_unixtime(unix_timestamp()),concat_ws('|',sno,name,age)
from temp_test_string)c;
The SELECT on its own returns the output I expect. But as soon as I insert the result into the table, the last column holds only the value of sno instead of the whole concatenated string.
Thanks guys.
I found why it was behaving that way. When creating the table I had specified FIELDS TERMINATED BY '|'. So the string I was inserting contained the table's own field delimiter, and Hive interpreted it as several separate columns.
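One way to avoid the clash is to give the target table a field delimiter that cannot occur in the concatenated string. The sketch below is only an illustration: the original post does not show the table definition, so the column names are hypothetical, and it assumes the data contains no tab characters and that sno, name and age are strings.

```sql
-- Hypothetical column names; the original post does not show the DDL.
-- The tab delimiter is an assumption: any character absent from the data works.
CREATE TABLE temp_error (
  source_table STRING,
  col_a        STRING,
  col_b        STRING,
  col_c        STRING,
  load_time    STRING,
  payload      STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Same insert as in the question; the '|' inside payload is now ordinary text.
INSERT INTO TABLE temp_error
SELECT 'temp_test', 'abcd', 'abcd', 'abcd',
       from_unixtime(unix_timestamp()),
       concat_ws('|', sno, name, age)
FROM temp_test_string;
```

Alternatively, keeping '|' as the table delimiter and joining the values with a different separator in concat_ws should have the same effect.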
I have a file called akolp9app1a_170905_0000.txt.
I need to split the name into these values:
hostname = akolp9app1a
date = 170905 (converted into a proper date format)
Then I want to create a table in Hive with two columns, hostname and date, and insert these values into it.
Any suggestions?
Thanks.
You can use virtual columns, for example INPUT__FILE__NAME. It gives the full path of the input file for each row.
Then you can apply the split, substring or regexp_extract string functions to INPUT__FILE__NAME to build the hostname and date values.
Example:
The query below returns 170905 as the date value (assuming no other underscore appears earlier in the file's path). Build the hostname extraction the same way with string functions.
hive> select split(INPUT__FILE__NAME,'[\_]')[1] `date` from tablename;
Then store the values in a separate table with an INSERT statement.
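Putting the pieces together, a minimal sketch might look like the following. The target table file_info and its column names are hypothetical, tablename is the placeholder from the answer above, and the 'yyMMdd' pattern is an assumption about what 170905 means (5 September 2017). The regex anchors on the last path component so that underscores in directory names do not interfere.

```sql
-- file_info, hostname and file_date are hypothetical names.
-- Group 1 of the regex is the hostname, group 2 the six-digit date.
CREATE TABLE file_info AS
SELECT DISTINCT
  regexp_extract(INPUT__FILE__NAME, '([^/_]+)_([0-9]{6})_[^/]*$', 1) AS hostname,
  from_unixtime(
    unix_timestamp(
      regexp_extract(INPUT__FILE__NAME, '([^/_]+)_([0-9]{6})_[^/]*$', 2),
      'yyMMdd'),
    'yyyy-MM-dd') AS file_date
FROM tablename;
```

DISTINCT is there because the virtual column is evaluated per row, so without it the result would contain one copy of the pair for every row of the file.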
When I need to change a column into a partition (convert normal column as partition column in hive), I want to create a new table to copy all columns except one. I currently have >50 columns in the original table. Is there any clean way of doing that?
Something like:
CREATE student_copy LIKE student EXCEPT age and hair_color;
Thanks!
You can use a regex:
CTAS using a regex column specification:
set hive.support.quoted.identifiers=none;
CREATE TABLE student_copy AS SELECT `(age|hair_color)?+.+` FROM student;
set hive.support.quoted.identifiers=column;
BUT (as mentioned by Kishore Kumar Suthar),
this will not create a partitioned table, as that is not supported with CTAS (Create Table As Select).
The only way I see to get your partitioned table is to take the complete CREATE statement of the table (as mentioned by Abraham):
SHOW CREATE TABLE student;
Alter that statement to create a partition on the column you want. After that, you can use the select with the regex when inserting into the new table.
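As a rough sketch of what the edited statement might look like: the column names and types below are hypothetical stand-ins for whatever SHOW CREATE TABLE student returns, with age and hair_color dropped and partcol1 moved out of the column list into the PARTITIONED BY clause.

```sql
-- Hypothetical columns: replace with the real list from SHOW CREATE TABLE student,
-- leaving out age, hair_color and the partition column itself.
CREATE TABLE student_copy (
  name  STRING,
  class STRING
)
PARTITIONED BY (partcol1 STRING);
```

A partition column must not also appear in the regular column list, which is why it is removed there.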
If your partition column is already part of this select, make sure it is the last column you insert. If it is not, you can exclude that column in the regex and add it as the last column. Also, if you expect several partitions to be created by your insert statement, you need to enable dynamic partitioning:
set hive.support.quoted.identifiers=none;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE student_copy PARTITION(partcol1) SELECT `(age|hair_color|partcol1)?+.+`, partcol1 FROM student;
set hive.support.quoted.identifiers=column;
The setting 'hive.support.quoted.identifiers=none' is required to use the backticks '`' in the regex part of the query. I set this parameter back to its original value, 'hive.support.quoted.identifiers=column', after my statement.
CREATE TABLE student_copy LIKE student;
This just copies the source table definition, without any data.
CREATE TABLE student_copy AS select name, age, class from student;
This copies the structure as well as the data, with two restrictions:
The target cannot be a partitioned table.
The target cannot be an external table.
I use the command below to get the CREATE statement of an existing table.
SHOW CREATE TABLE student;
Copy the result, modify it to match your requirements for the new table, and run the modified command to create the new table.
I would like to add a new column to a table, but only if that column does not already exist.
This works if the column does not exist:
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
But when I execute it a second time, I get an error.
Column 'mycolumn' exists
When I try to use the "IF NOT EXISTS" syntax that is supported for CREATE TABLE and ADD PARTITION, I get a syntax error:
ALTER TABLE MyTable ADD IF NOT EXISTS COLUMNS (mycolumn string);
FAILED: ParseException line 3:42 required (...)+ loop did not match anything at input 'COLUMNS' in add partition statement
What I need is something that can execute idempotently, so I can run my query whether this column exists or not.
You can partially work around this by setting the hive.cli.errors.ignore flag. With it, the Hive CLI continues executing the following queries even when an earlier query fails.
In this example:
SET hive.cli.errors.ignore=true;
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
ALTER TABLE MyTable ADD COLUMNS (mycolumn2 string);
Hive will execute all three queries, even though the second one fails with an error.
Well, there is no direct way to do that, I mean through a single query.
There are two other ways:
1.) Using JDBC:
1.1) Run DESCRIBE on the table name.
1.2) You will get a list of columns in the result set.
1.3) Check whether your column exists by iterating through the result set.
2.) Using the Hive Metastore client:
2.1) Create a HiveMetaStoreClient object.
2.2) HiveMetaStoreClient.getFields(<db_name>, <table_name>).get(index).getName() will give you the column name.
2.3) Check whether your column exists by comparing it against the list.
Hope it helps!
I am new to Hadoop and big data concepts. I am using the Hortonworks sandbox and trying to manipulate the values of a CSV file. I imported the file using the file browser and created a table in Hive to run some queries. I want an "insert into values" style query that selects some rows, changes the value of some columns (for example, changes a string to binary 0 or 1) and inserts the result into a new table. In SQL-like terms, the query could look something like this:
Insert into table1 (id, name, '01')
select id, name, graduated
from table2
where university = 'aaa'
Unfortunately Hive could not insert (constant) values (without importing from a file), and I don't know how to solve this problem using Hive, Pig or even MapReduce scripts.
Please help me find a solution; I really need it.
Thanks in advance.
In Hive,
CREATE TABLE table1 as SELECT id, name, graduated FROM table2
WHERE university = 'aaa'
should create a new table with the results of the query.
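The question also asks to turn a string value into 0 or 1, which can be done in the same statement with a CASE expression. This is only a sketch: the comparison value 'yes' and the column alias graduated_flag are assumptions, since the post does not show what the graduated column actually contains.

```sql
-- 'yes' and graduated_flag are hypothetical; adjust to the real data.
CREATE TABLE table1 AS
SELECT id,
       name,
       CASE WHEN graduated = 'yes' THEN 1 ELSE 0 END AS graduated_flag
FROM table2
WHERE university = 'aaa';
```

A constant such as '01' can be selected the same way, as a literal in the SELECT list with an alias.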
I am trying to run a Hive query to filter invalid records. Here is what I am doing:
1. Load the CSV file into a single-column table.
2. Define a UDF, my_validation, to validate each record.
3. Execute the query:
from pgstg INSERT OVERWRITE LOCAL DIRECTORY '/tmp/validrecords.out'
select * where my_validation(record) IS NOT NULL
INSERT OVERWRITE TABLE PGERR
select record where my_validation(record) IS NULL;
Here are my questions:
a. Is there a better way to filter invalid records?
b. Does the my_validation UDF run twice on the whole table?
c. What is the best way to split a single column into multiple columns?
Thanks very much for your help.
To answer your questions:
1) If you have custom validation criteria, a UDF is probably the way to go. If I were doing it, I would create an is_valid UDF that returns a boolean (instead of returning NULL vs. not NULL).
2) Yes, the UDF does get run twice.
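3) Glad you asked. Look at the split function available in Hive, which turns a delimited string into an array that you can index to get one column per field. The related explode function turns such an array into one row per element.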
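A possible way to act on points 2) and 3), offered as a sketch rather than a tested recipe: compute the UDF result once in a subquery and let both inserts filter on that value, then use split with array indexing to break the record into columns. Whether the UDF is really evaluated only once depends on how the optimizer plans the query, so it is worth checking with EXPLAIN. The aliases v, validated, col1 and col2 are hypothetical, and the comma delimiter assumes a simple CSV without quoted commas.

```sql
-- Evaluate my_validation in one place and reuse the result in both inserts.
FROM (
  SELECT record, my_validation(record) AS validated
  FROM pgstg
) v
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/validrecords.out'
  SELECT v.record WHERE v.validated IS NOT NULL
INSERT OVERWRITE TABLE PGERR
  SELECT v.record WHERE v.validated IS NULL;

-- Split the single record column into separate columns by position.
SELECT split(record, ',')[0] AS col1,
       split(record, ',')[1] AS col2
FROM pgstg;
```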