Hive change column name without knowing column data type - hadoop

I want to change the column name of a Hive table without changing its datatype.
I tried the query below, but it requires the datatype, which I don't know.
ALTER TABLE test CHANGE a a1 INT;
I would like to prefix SALES_ before all my columns irrespective of their column types.
Input Table
emp_id(int) emp_name(string) salary(double)
Output Table
sales_emp_id(int) sales_emp_name(string) sales_salary(double)
Thanks in advance.

Well, altering a column name in Hive with the ALTER TABLE command requires its datatype.
Instead, you can run the commands below:
1) Create a new table with your new column names
create table newTable (sales_emp_id int, sales_emp_name string, sales_salary double);
2) Insert into the new table from the old table
insert into newTable select * from oldtable;
3) Now you can drop your old table.
drop table oldtable;
The above approach works if creating a new table is acceptable for you.
If you have many tables, you can use a shell script, something like the one below (it rebuilds the DDL from the desc output):
while read SOURCE_TABLENAME TARGET_TABLENAME LOC; do
    PREFIX="emp_"
    # build comma-separated "prefixed_name type" pairs from the describe output
    VAL=$(hive -e "desc $SOURCE_TABLENAME" | awk -v p="$PREFIX" '{print p $1 " " $2}' | paste -sd, -)
    hive -e "CREATE TABLE $TARGET_TABLENAME ($VAL) LOCATION '$LOC'"
    hive -e "INSERT INTO TABLE $TARGET_TABLENAME SELECT * FROM $SOURCE_TABLENAME"
    hive -e "DROP TABLE $SOURCE_TABLENAME"
done < INPUT_FILE.txt
INPUT_FILE.txt
source_table target_table location (all inputs separated by space)
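The column-list construction in the script can be sketched and checked without a live Hive connection; here the output of desc is simulated with a fixed string (the column names come from the question, and the prefix and table name are only illustrative):

```shell
#!/bin/sh
PREFIX="sales_"

# Simulated output of: hive -e "desc oldtable"  (one "name type" pair per line)
DESC_OUTPUT="emp_id int
emp_name string
salary double"

# Prefix every column name and join the "name type" pairs with commas
VAL=$(printf '%s\n' "$DESC_OUTPUT" | awk -v p="$PREFIX" '{print p $1 " " $2}' | paste -sd, -)

echo "CREATE TABLE newTable ($VAL)"
# prints: CREATE TABLE newTable (sales_emp_id int,sales_emp_name string,sales_salary double)
```

Against a real cluster you would replace the here-string with the actual hive -e call; note that desc may also emit warnings, which would need filtering first.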

Without creating a new table, you can use the REPLACE COLUMNS clause in Hive to change all the column names at once. The command looks like this:
ALTER TABLE table_name REPLACE COLUMNS (sales_emp_id INT,sales_emp_name STRING,sales_salary DOUBLE);
Now you can use the describe command to check the column names
describe table_name;

Related

Export hql output to csv in beeline

I am trying to export my hql output to csv in beeline using the command below:
beeline -u "jdbc:hive2://****/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"?tez.queue.name=devices-jobs --outputformat=csv2 -e "use schema_name; select * from table_name where open_time_new>= '2020-07-13' and open_time_new < '2020-07-22'" > filename.csv
The problem is that some column values in the table contain commas, which pushes the rest of that column's data into the next column.
For example:
| abcd | as per data,outage fault,xxxx.
| xyz | as per the source,ghfg,hjhjg.
The above data will get saved as 4 columns instead of 2.
Need help!
Try the approach with local directory:
insert overwrite local directory '/tmp/local_csv_report'
row format delimited fields terminated by "," escaped by '\\'
select *
from table_name
where open_time_new >= '2020-07-13'
and open_time_new < '2020-07-22'
This will create several csv files under your local /tmp/local_csv_report directory, so a simple cat afterwards will merge the results into a single file.
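The merge step can be sketched like this; the part-file names and the header line are illustrative (Hive names the part files itself), and the header has to be added by hand since the local-directory export does not write one:

```shell
#!/bin/sh
# Simulate the part files Hive writes under the local directory
mkdir -p /tmp/local_csv_report
printf 'a,1\nb,2\n' > /tmp/local_csv_report/000000_0
printf 'c,3\n'      > /tmp/local_csv_report/000001_0

# Prepend a header line, then concatenate every part file into one csv
{ echo "col1,col2"; cat /tmp/local_csv_report/*; } > /tmp/report.csv
cat /tmp/report.csv
```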

How to export a Hive table into a CSV file including header?

I used this Hive script to export a table into a CSV file:
hive -f mysql.sql
where mysql.sql writes the rows out comma-separated, along the lines of:
insert overwrite local directory '/LocalPath/'
row format delimited fields terminated by ','
select * from Mydatabase.Mytable limit 100;
cat /LocalPath/* > /LocalPath/table.csv
However, it does not include table column names.
How do I also export the column names to the csv?
show tablename ?
You should add set hive.cli.print.header=true; before your select query to get column names as the first row of your output. The output would look as Mytable.col1, Mytable.col2 ....
If you don't want the table name with the column names, use set hive.resultset.use.unique.column.names=false;. The first row of your output would then look like col1, col2 ...
Invoking hive command-line with the parameters suggested in the other answer here works for a plain select. So, you can extract the column names and create the csv to start with, as follows:
hive -S --hiveconf hive.cli.print.header=true --hiveconf hive.resultset.use.unique.column.names=false --database Mydatabase -e 'select * from Mytable limit 0;' > /LocalPath/table.csv
After that, run the actual data extraction part, this time remembering to append to the csv:
cat /LocalPath/* >> /LocalPath/table.csv ## From your question with >> for append

bash / sed / awk Remove or gsub timestamp pattern from text file

I have a text file like this:
1/7/2017 12:53 DROP TABLE table1
1/7/2017 12:53 SELECT
1/7/2017 12:55 --UPDATE #dat_recency SET
Select * from table 2
into table 3;
I'd like to remove all of the timestamp patterns (M/D/YYYY HH:MM, M/DD/YYYY HH:MM, MM/D/YYYY HH:MM, MM/DD/YYYY HH:MM). I can find the patterns using grep but can't figure out how to use gsub. Any suggestions?
DESIRED OUTPUT:
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;
You can use this sed command to remove the date/time stamps from the start of each line:
sed -i.bak -E 's~([0-9]{1,2}/){2}[0-9]{4} [0-9]{2}:[0-9]{2} *~~' file
cat file
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;
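You can check the substitution without touching the original file by feeding sed from a pipe (the sample lines are taken from the question):

```shell
#!/bin/sh
printf '%s\n' \
  '1/7/2017 12:53 DROP TABLE table1' \
  '12/25/2017 09:05 SELECT' \
  'Select * from table 2' |
sed -E 's~([0-9]{1,2}/){2}[0-9]{4} [0-9]{2}:[0-9]{2} *~~'
# prints the three lines with the leading timestamps removed
```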
Use the default space separator, blank out the first and second columns, and then print the whole line.
awk '/^[0-9]/{$1=$2="";gsub(/^[ \t]+|[ \t]+$/, "")}1' sample.csv
The command checks whether each line starts with a digit; if so, it replaces the first two columns with empty strings and strips the leading spaces. The trailing 1 then prints every line, modified or not.
output:
DROP TABLE table1
SELECT
--UPDATE #dat_recency SET
Select * from table 2
into table 3;
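The awk approach can be checked from a pipe the same way; this variant prints every line, blanking the first two fields only on lines that start with a digit (sample lines taken from the question):

```shell
#!/bin/sh
printf '%s\n' \
  '1/7/2017 12:53 DROP TABLE table1' \
  'Select * from table 2' |
awk '/^[0-9]/{$1=$2="";gsub(/^[ \t]+|[ \t]+$/, "")}1'
# prints: DROP TABLE table1
#         Select * from table 2
```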

hive load data: how to specify file column separator and dynamic partition columns?

Well, I have a question about loading MySQL data into Hive2: I don't know how to specify the separator. I tried several times but got nothing.
Below is the Hive table; id is the partition column:
0: jdbc:hive2://localhost/> desc test;
+-----------+------------+----------+
| col_name | data_type | comment |
+-----------+------------+----------+
| a | string | |
| id | int | |
+-----------+------------+----------+
When I execute
load data local inpath 'file:///root/test' into table test partition (id=1);
it says:
Invalid path ''file:///root/test'': No files matching path file
but it does exist.
I want to partition dynamically based on the file's contents, so I added the partition column to the file like this:
root#<namenode|~>:#cat /root/test
a,1
b,2
but it also failed. The docs say nothing about this; I guess it isn't supported right now.
Does anyone have any idea? Any help will be appreciated!
If you want to specify the column separator, use:
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
Replace the ',' with your separator.
Also, if you want to partition a Hive table, you specify the partition column when creating the table:
CREATE TABLE Foo (bar int )
PARTITIONED BY (testpartition string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
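To actually get the id values from the file into partitions, one common workaround (a sketch; the staging-table name and file path are illustrative) is to load the delimited file into a non-partitioned staging table, then insert into the partitioned table with dynamic partitioning enabled:

```shell
#!/bin/sh
# Write the HQL to a file; run it afterwards with: hive -f /tmp/load_dynamic.hql
cat > /tmp/load_dynamic.hql <<'EOF'
-- staging table matches the raw file layout (comma-separated, id as a normal column)
CREATE TABLE test_stage (a string, id int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH 'file:///root/test' INTO TABLE test_stage;

-- allow dynamic partitions
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- the dynamic partition column must be selected last
INSERT OVERWRITE TABLE test PARTITION (id)
SELECT a, id FROM test_stage;
EOF
```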

how to select arguments from text file in bash and loop over them

I have a text file in the format below, and I want to write a bash script that stores the column names (adastatus, type, bodycomponent, ...) in a variable, say x1.
# col_name data_type comment
adastatus string None
type string None
bodycomponent string None
bodytextlanguage string None
copyishchar string None
Then, for each of the column names in x1, I want to run a loop:
alter table tabelname change x1(i) x1(i) DOUBLE;
How about:
#!/bin/sh
# take the first whitespace-separated field, skipping the comment line and blanks
for i in $(awk '!/^#/ && NF {print $1}' yourfile.txt)
do
    SQL="alter table tablename change $i $i DOUBLE"
    sql_command "$SQL"
done
awk '$1 !~ /^#/ {if ($1) print $1}' in.txt | \
xargs -I % echo "alter table tabelname change % % DOUBLE"
Replace echo with the command needed to run the alter command (from #Severun's answer it sounds like sql_command).
using awk, matches only input lines that do no start with # (except for leading whitespace) and are non-empty, then returns the first whitespace-separated token, i.e., the 1st column value for each line.
xargs invokes the target command once for each column name, substituting the column name for % - note that % as the placeholder was chosen arbitrarily via the -I option.
Try:
#!/bin/bash
while read col1 _ _
do
[[ "$col1" =~ \#.* ]] && continue # skip comments
[[ -z "$col1" ]] && continue # skip empty lines
echo alter table tabelname change ${col1}\(i\) ${col1}\(i\)
done < input.txt
Output:
$ ./c.sh
alter table tabelname change adastatus(i) adastatus(i)
alter table tabelname change type(i) type(i)
alter table tabelname change bodycomponent(i) bodycomponent(i)
alter table tabelname change bodytextlanguage(i) bodytextlanguage(i)
alter table tabelname change copyishchar(i) copyishchar(i)
Change echo to a more appropriate command.
