unknown column 'cities.city' in 'field list' - sqoop

The following Sqoop import fails with the error: unknown column 'cities.city' in 'field list'
sqoop import \
--connect jdbc:mysql://localhost/sivam_db \
--username root \
--password cloudera \
--query 'select cities.city as ccity,normcities.city as ncity from cities full join normcities using(id) where $CONDITIONS' \
--split-by id \
--target-dir /user/duplicatecolumn \
--m 1 \
--boundary-query "select min(id),max(id) from cities" \
--mapreduce-job-name fjoin \
--direct
I have checked the posts related to this error and tried their suggestions, but the problem is still not resolved.
Schema of Cities:
create table cities(id int not null auto_increment,country varchar(30) not null,city varchar(30) not null, primary key (id));
Schema of normcities:
create table normcities(id int not null auto_increment,country_id int not null,city varchar(30) not null, primary key(id));

The query in the sqoop command selects only two columns, the city name from each table. MySQL does not support FULL JOIN, so the word 'full' is most likely parsed as an alias for the cities table, after which cities.city can no longer be resolved.
So remove the word 'full' and run the command again.
The import then produces output.
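A minimal sketch of the command with the word 'full' removed. The wrapper name import_cities is hypothetical, and the connection details are copied from the question. Because only one mapper is used, --split-by and --boundary-query are left out here. Note that a plain join returns only ids present in both tables, which may differ from what a full outer join was meant to return.

```shell
# Hypothetical wrapper; connection details are copied from the question.
import_cities() {
  # Single quotes keep $CONDITIONS literal so that Sqoop can substitute it.
  sqoop import \
    --connect jdbc:mysql://localhost/sivam_db \
    --username root \
    --password cloudera \
    --query 'select cities.city as ccity, normcities.city as ncity from cities join normcities using(id) where $CONDITIONS' \
    --target-dir /user/duplicatecolumn \
    --mapreduce-job-name fjoin \
    -m 1
}
```

Calling import_cities runs the import.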

Related

Importing subset of columns from RDBMS to Hive table with Sqoop

Suppose we have a MySQL database called lastdb with a table person. This table contains 4 columns: id, firstname, lastname, age.
Rows inside person table:
1, Firstname, Lastname, 20
I want to import data from this MySQL person table into a Hive table with the same structure, but only the first and the last column of person. After the import, the rows in the Hive table should look like this:
1, NULL, NULL, 20
I tried this sqoop command:
sqoop import --connect jdbc:mysql://localhost:3306/lastdb --table person --username root --password pass --hive-import --hive-database lastdb --hive-table person --columns id,age
But it imports the rows into the Hive table in this format:
1, 20, NULL, NULL
Could anyone tell me how to fix that?
Assuming the row in your MySQL table (id, firstname, lastname, age) holds the values 1, NULL, NULL, 20, running the sqoop import below gives the desired result in the Hive person table.
sqoop import \
--connect \
jdbc:mysql://localhost:3306/lastdb \
--username=root \
--password=pass \
--query 'select * from person WHERE $CONDITIONS' \
--hive-import \
--hive-table person \
--hive-database lastdb \
--target-dir /user/hive/warehouse/person -m 1
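With --columns id,age Sqoop writes only two fields per row, and Hive assigns fields to columns by position, which is why 20 lands in the second column. If the MySQL row still holds real names, one possible approach is to select NULL placeholders for the two middle columns so that all four positions are present. This is a sketch only: the wrapper name import_person_subset and the char(30) cast are assumptions, and --null-string / --null-non-string are set to '\\N' so that Hive reads the placeholders as NULL.

```shell
# Hypothetical wrapper; database, table and credentials are from the question.
import_person_subset() {
  # The casts give the NULL placeholders a string type (char(30) is an assumption).
  sqoop import \
    --connect jdbc:mysql://localhost:3306/lastdb \
    --username root \
    --password pass \
    --query 'select id, cast(null as char(30)) as firstname, cast(null as char(30)) as lastname, age from person where $CONDITIONS' \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --hive-import \
    --hive-database lastdb \
    --hive-table person \
    --target-dir /user/hive/warehouse/person \
    -m 1
}
```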

Hadoop - sqoop Export/Import Partitioned table

Can anyone explain how to export a partitioned table from Hive to a MySQL database?
And how to import from MySQL into a partitioned Hive table?
I have read the documentation I found through Google, but I am not sure which of the latest techniques can be used.
Thanks
sqoop to hive partition import
1. create a table in mysql with 4 fields (id, name, age, sex)
CREATE TABLE `mon2`
(`id` int, `name` varchar(43), `age` int, `sex` varchar(334))
2. load data into the mysql table from a csv file, abc.csv
1,mahesh,23,m
2,ramesh,32,m
3,prerna,43,f
4,jitu,23,m
5,sandip,32,m
6,gps,43,f
mysql> LOAD DATA LOCAL INFILE 'location_of_your_csv/abc.csv' INTO TABLE mon2 FIELDS TERMINATED BY ',';
3. now start your hadoop services, go to $SQOOP_HOME and run a sqoop import with the hive partition options.
sqoop import \
--connect jdbc:mysql://localhost:3306/apr \
--username root \
--password root \
-e "select id, name, age from mon2 where sex='m' and \$CONDITIONS" \
--target-dir /user/hive/warehouse/hive_part \
--split-by id \
--hive-overwrite \
--hive-import \
--create-hive-table \
--hive-partition-key sex \
--hive-partition-value 'm' \
--fields-terminated-by ',' \
--hive-table mar.hive_part \
--direct
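The command above loads a single static partition (sex='m'). To load every value, the same import can be repeated once per value. The following is a sketch under assumptions: the function names and the /tmp staging path are hypothetical, --create-hive-table, --hive-overwrite and --direct are dropped so that repeated runs add partitions to the same table, and the list of values (m, f) is taken from the sample data.

```shell
# Hypothetical helper: import one partition value from mon2 into mar.hive_part.
import_partition() {
  value="$1"
  sqoop import \
    --connect jdbc:mysql://localhost:3306/apr \
    --username root \
    --password root \
    -e "select id, name, age from mon2 where sex='${value}' and \$CONDITIONS" \
    --target-dir "/tmp/hive_part_${value}" \
    --delete-target-dir \
    --split-by id \
    --hive-import \
    --hive-partition-key sex \
    --hive-partition-value "${value}" \
    --fields-terminated-by ',' \
    --hive-table mar.hive_part
}

# Hypothetical driver: one import per partition value in the sample data.
import_all_partitions() {
  for v in m f; do
    import_partition "$v"
  done
}
```

Calling import_all_partitions runs one import for each value.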
hive to mysql export with sqoop from a partitioned table
1. create hive_temp table for load data
create table hive_temp
(id int, name string, age int, gender string)
row format delimited fields terminated by ',';
2. load data
load data local inpath '/home/zicone/Documents/pig_to_hbase/stack.csv' into table hive_temp;
3. create a partition table with a specific column that you want to partition.
create table hive_part1
(id int, name string, age int)
partitioned by (gender string)
row format delimited fields terminated by ',';
4. add a partition to the hive_part1 table
alter table hive_part1 add partition(gender='m');
5. copy data from hive_temp to the hive_part1 table
insert overwrite table hive_part1 partition(gender='m')
select id, name, age from hive_temp where gender='m';
6. sqoop export command
create a table in mysql
mysql> create table mon3 like mon2;
sqoop export \
--connect jdbc:mysql://localhost:3306/apr \
--table mon3 \
--export-dir /user/hive/warehouse/mar.db/hive_part1/gender=m \
-m 1 \
--username root \
--password root
now go to the mysql terminal and run
select * from mon3;
hope it works for you
:)
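One detail worth checking in the export: the files under gender=m hold only id, name and age, because the partition value lives in the directory name, while mon3 (created like mon2) has four columns. A possible variant, sketched here, names the exported columns explicitly with --columns. The wrapper name export_partition is hypothetical, and the sex column of mon3 would stay NULL for these rows.

```shell
# Hypothetical wrapper around the export command from the answer.
export_partition() {
  # --columns lists the three fields actually stored in the partition files.
  sqoop export \
    --connect jdbc:mysql://localhost:3306/apr \
    --username root \
    --password root \
    --table mon3 \
    --columns "id,name,age" \
    --export-dir /user/hive/warehouse/mar.db/hive_part1/gender=m \
    --input-fields-terminated-by ',' \
    -m 1
}
```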

Sqoop import issue

I am importing a table into Hive. I have created an external table on Hadoop and imported the data from Oracle using Sqoop. The problem is that when I query the data, all columns end up in a single column in Hive.
Table:
CREATE EXTERNAL TABLE `default.dba_cdr_head`(
`BI_FILE_NAME` varchar(50),
`BI_FILE_ID` int,
`UPDDATE` TIMESTAMP)
LOCATION
'hdfs:/tmp/dba_cdr_head';
Sqoop:
sqoop import \
--connect jdbc:oracle:thin:@172.16.XX.XX:15xx:CALLS \
--username username \
--password password \
--table CALLS.DBM_CDR_HEAD \
--columns "BI_FILE_NAME, BI_FILE_ID, UPDDATE" \
--target-dir /tmp/dba_cdr_head \
--hive-table default.dba_cdr_head
The data looks like this:
hive> select * from dba_cdr_head limit 5;
OK
CFT_SEP0801_20120724042610_20120724043808M,231893, NULL NULL
CFT_SEP1002_20120724051341_20120724052057M,232467, NULL NULL
CFT_SEP1002_20120724052057_20120724052817M,232613, NULL NULL
CFT_SEP0701_20120724054201_20120724055154M,232904, NULL NULL
CFT_SEP0601_20120724054812_20120724055853M,233042, NULL NULL
Time taken: 3.693 seconds, Fetched: 5 row(s)
I added ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' to the table definition, and that solved it.
CREATE EXTERNAL TABLE `default.dba_cdr_head`(
`BI_FILE_NAME` varchar(50),
`BI_FILE_ID` int,
`UPDDATE` TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION
'hdfs:/tmp/dba_cdr_head';
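The reason this works: Sqoop writes comma-separated text by default, while a Hive table created without a ROW FORMAT clause expects Ctrl-A (octal 001) between fields. The small sketch below illustrates the mismatch on one row. The row is hypothetical (the timestamp value is invented for illustration), and the variable names are assumptions.

```shell
# Hypothetical row in the comma-delimited form Sqoop writes by default.
line='CFT_SEP0801_20120724042610_20120724043808M,231893,2012-07-24 04:38:08'

# Hive's default field delimiter is Ctrl-A (octal 001).
ctrl_a=$(printf '\001')

# Splitting on Ctrl-A finds no separator, so the whole line is one field.
fields_default=$(printf '%s\n' "$line" | awk -F "$ctrl_a" '{ print NF }')

# Splitting on ',' recovers the three columns.
fields_comma=$(printf '%s\n' "$line" | awk -F ',' '{ print NF }')

echo "default delimiter: $fields_default field(s); comma: $fields_comma field(s)"
```

An alternative to changing the table would be to keep Hive's default and pass --fields-terminated-by '\001' to sqoop import.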

Sqoop Teradata table with the same column titles

I get an error when importing data from Teradata into a Hadoop cluster using Sqoop.
My Teradata table has two columns with the same column title (not the same column name). Is there an automatic way to use the column name instead of the column title in my Sqoop job?
I tried using "Select * from table" as the query, but it does not work.
And I can't change the column titles in Teradata.
Here is my job code:
sqoop job -Dmapred.job.queue.name=shortduration \
--create inc_My_Table \
-- import \
--connect jdbc:teradata://RCT/DATABASE=DWHBIG \
--driver com.teradata.jdbc.TeraDriver \
--username MBIGDATA -P \
--query "select a.* from My_Table a where \$CONDITIONS" \
--target-dir /data/source/fb/$i \
--check-column DAT_MAJ_DWH \
--incremental append \
--last-value 2001-01-01 \
--split-by ID
Any idea ? Thanks
Since Teradata JDBC Driver 16.00.00.28, you can use the connection URL parameter COLUMN_NAME to control whether getColumnName and getColumnLabel return the column name, the column title, or the AS-clause name. That should resolve your problem.
COLUMN_NAME=OFF (the default) specifies that the
ResultSetMetaData.getColumnName method should return the AS-clause
name if available, or the column name if available, or the column
title, and specifies that the ResultSetMetaData.getColumnLabel method
should return the column title.
COLUMN_NAME=ON specifies that, when StatementInfo parcel support is
available, the ResultSetMetaData.getColumnName method should return
the column name if available, and specifies that the
ResultSetMetaData.getColumnLabel method should return the AS-clause
name if available, or the column name if available, or the column
title. This option has no effect when StatementInfo parcel support is
unavailable.
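As a sketch, the parameter would be appended to the connection URL from the question. Teradata JDBC URL parameters are comma-separated after the slash. The variable name TD_URL is hypothetical, and a driver version of at least 16.00.00.28 is assumed.

```shell
# Connection URL from the question with COLUMN_NAME=ON appended
# (assumes Teradata JDBC Driver 16.00.00.28 or later).
TD_URL='jdbc:teradata://RCT/DATABASE=DWHBIG,COLUMN_NAME=ON'

# The sqoop job would then use this value for --connect.
echo "--connect $TD_URL"
```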
I finally found a solution. I aliased the column that was causing the issue using the AS SQL command!

Import data from oracle into hive using sqoop - cannot use --hive-partition-key

I have a simple table:
create table osoba(id number, imie varchar2(100), nazwisko varchar2(100), wiek integer);
insert into osoba values(1, 'pawel','kowalski',36);
insert into osoba values(2, 'john','smith',55);
insert into osoba values(3, 'paul','psmithski',44);
insert into osoba values(4, 'jakub','kowalski',70);
insert into osoba values(5, 'scott','tiger',70);
commit;
that I want to import into Hive using Sqoop. I want the table in Hive to be partitioned. This is my sqoop command:
sqoop import -Dmapred.job.queue.name=pr --connect "jdbc:oracle:thin:@localhost:1521/oracle_database" \
--username "user" --password "password" --table osoba --hive-import \
--hive-table pk.pk_osoba --delete-target-dir --hive-overwrite \
--hive-partition-key nazwisko
And I get an error:
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
Can anyone suggest how --hive-partition-key parameter should be used?
Sqoop command without --hive-partition-key parameter works fine and creates table pk.pk_osoba in Hive.
Regards
Pawel
In the --columns option, list all the columns you want to import except the partition column. The partition column is not specified in --columns because it gets added automatically.
Below is an example:
I am importing the DEMO2 table, which has the columns ID, NAME, COUNTRY. I have not specified COUNTRY in --columns because it is my partition column.
$SQOOP_HOME/bin/sqoop import \
--connect jdbc:oracle:thin:@192.168.41.67:1521/orcl \
--username orcluser1 \
--password impetus \
--hive-import \
--table DEMO2 \
--hive-table "DEMO2" \
--columns ID,NAME \
--hive-partition-key 'COUNTRY' \
--hive-partition-value 'INDIA' \
--m 1 \
--verbose \
--delete-target-dir \
--target-dir /tmp/13/DEMO2
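Applied to the osoba table from the question, the same idea might look like the sketch below. It rests on several assumptions: the wrapper name import_osoba_partition is hypothetical, the column names are written in upper case as Oracle stores them, -m 1 is used because the table has no primary key, and a --where filter is added because --hive-partition-value only labels the partition and does not filter rows by itself.

```shell
# Hypothetical wrapper: import the rows for one surname into its own partition.
import_osoba_partition() {
  nazwisko="$1"
  # NAZWISKO is left out of --columns because it is the partition column.
  sqoop import -Dmapred.job.queue.name=pr \
    --connect "jdbc:oracle:thin:@localhost:1521/oracle_database" \
    --username "user" --password "password" \
    --table osoba \
    --columns ID,IMIE,WIEK \
    --where "NAZWISKO = '${nazwisko}'" \
    --hive-import \
    --hive-table pk.pk_osoba \
    --hive-partition-key nazwisko \
    --hive-partition-value "${nazwisko}" \
    --delete-target-dir \
    -m 1
}
```

For example, import_osoba_partition kowalski would load the rows with that surname into the partition nazwisko='kowalski'.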

Resources