sqoop import completes but hive show tables can't see the table - hadoop

After installing Hadoop and Hive (CDH version), I execute
./sqoop import -connect jdbc:mysql://10.164.11.204/server -username root -password password -table user -hive-import --hive-home /opt/hive/
Everything goes fine, but when I enter the Hive command line and execute show tables, there is nothing.
When I run ./hadoop fs -ls, I can see that /user/(username)/user exists.
Any help is appreciated.
---EDIT-----------
./sqoop import -connect jdbc:mysql://10.164.11.204/server -username root -password password -table user -hive-import --target-dir /user/hive/warehouse
The import fails due to:
11/07/02 00:40:00 INFO hive.HiveImport: FAILED: Error in semantic analysis: line 2:17 Invalid Path 'hdfs://hadoop1:9000/user/ubuntu/user': No files matching path hdfs://hadoop1:9000/user/ubuntu/user
11/07/02 00:40:00 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 10
at com.cloudera.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:326)
at com.cloudera.sqoop.hive.HiveImport.executeScript(HiveImport.java:276)
at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:218)
at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:218)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:228)

Check your hive-site.xml for the value of the property
javax.jdo.option.ConnectionURL. If you do not define this explicitly,
the default value will use a relative path for creation of the Hive
metastore (jdbc:derby:;databaseName=metastore_db;create=true), which
will be different depending upon where you launch the process from.
This would explain why you cannot see the table via show tables.
Define this property value in your
hive-site.xml using an absolute path.
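For example, a minimal sketch of the property with an absolute path (the /var/lib/hive/metastore location below is only an illustration; point it wherever every Hive and Sqoop invocation can reach it):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- absolute databaseName so every client sees the same metastore; illustrative path -->
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
</property>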

There is no need to create the table in Hive; refer to the query below:
sqoop import --connect jdbc:mysql://xxxx.com/databasename --username root --password admin --table mysqltablename --direct -m 1 --hive-import --create-hive-table --hive-table hivetablename --target-dir '/user/hive/warehouse/hivetablename' --fields-terminated-by '\t'
(mysqltablename is the MySQL table; hivetablename is the table you want to create in Hive)

In my case Hive stores data in /user/hive/warehouse directory in HDFS. This is where Sqoop should put it.
So I guess you have to add:
--target-dir /user/hive/warehouse
Which is the default location for Hive tables (it might be different in your case).
You might also want to create this table in Hive:
sqoop create-hive-table --connect jdbc:mysql://host/database --table tableName --username user --password password

In my case it creates the table in the Hive default database; you can give it a try.
sqoop import --connect jdbc:mysql://xxxx.com/databasename --username root --password admin --table NAME --hive-import --warehouse-dir DIR --create-hive-table --hive-table NAME -m 1

Hive tables will be created by the Sqoop import process. Please make sure /user/hive/warehouse is created in your HDFS. You can browse HDFS at http://localhost:50070/dfshealth.jsp (Browse the File System option).
Also include the full HDFS location in --target-dir, i.e. hdfs://<namenode-host>:9000/user/hive/warehouse, in the sqoop import command.
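For example, a sketch of the original import with a fully qualified target directory (hadoop1:9000 is taken from the error message above; substitute your own NameNode host and port):
sqoop import --connect jdbc:mysql://10.164.11.204/server --username root --password password --table user --hive-import --target-dir hdfs://hadoop1:9000/user/hive/warehouse/user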

First of all, create the table definition in Hive with exactly the same field names and types as in MySQL (see the sketch after the notes below).
Then, perform the import operation
For Hive Import
sqoop import --verbose --connect jdbc:mysql://localhost/test --table tablename --fields-terminated-by ',' --split-by id --hive-import --warehouse-dir /user/hive/warehouse --hive-table tablename
'id' can be the primary key of the existing table
'localhost' can be your local IP
'test' is the database
the 'warehouse' directory is in HDFS
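A minimal sketch of such a Hive table definition, assuming the MySQL table has just two columns, id (INT) and name (VARCHAR); adjust the columns to your actual schema:
CREATE TABLE tablename (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;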

I think all you need is to specify the hive table where data should go.
add "--hive-table database.tablename" to the sqoop command and remove the --hive-home /opt/hive/. I think that should resolve the problem.

Related

error while performing sqoop - merge

I was trying to sqoop-merge two data sets imported from the Netezza server.
Below are the data sets, with numbers as the id and letters as the name.
Both of the tables below were imported from Netezza using commands like:
sqoop import --connect netezza_url --username uname --password pwd --table sqoop_merge_1 --hive-import --warehouse-dir hdfs_pth --create-hive-table sqoop_merge_1 -m 1
sqoop_merge_1:
1,a
2,b
3,c
4,d
5,e
sqoop_merge_2:
4,z
5,y
and the merge command is:
sqoop merge --new-data hdfs_path/sqoop_merge_2 --onto hdfs_path/sqoop_merge_1 --target-dir hdfs_path/sqoop_merge_output --jar-file jar_file_path/sqoop_merge_class_name.jar --class-name sqoop_merge_class_name --merge-key id
I created the jar file by using the codegen command:
sqoop codegen --connect netezza_url --username uname --password pwd --table sqoop_merge_1
But I am getting the following error:
java.io.IOException: Cannot join values on null key. Did you specify a key column that exists?
I tried all the ways I knew, but I am still getting the error.
Please help.
Since you are sure the id column exists, it could be a case-sensitivity issue.
Check whether you specified the column as ID in Netezza.
If yes, try --merge-key ID.
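For example, the same merge command from the question with only the key casing changed:
sqoop merge --new-data hdfs_path/sqoop_merge_2 --onto hdfs_path/sqoop_merge_1 --target-dir hdfs_path/sqoop_merge_output --jar-file jar_file_path/sqoop_merge_class_name.jar --class-name sqoop_merge_class_name --merge-key ID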

sqoop import as parquet file to target dir, but can't find the file

I have been using Sqoop to import data from MySQL to Hive; the command I used is below:
sqoop import --connect jdbc:mysql://localhost:3306/datasync \
--username root --password 654321 \
--query 'SELECT id,name FROM test WHERE $CONDITIONS' --split-by id \
--hive-import --hive-database default --hive-table a \
--target-dir /tmp/yfr --as-parquetfile
The Hive table is created and the data is inserted; however, I cannot find the parquet file.
Does anyone know?
Best regards,
Feiran
Sqoop import to hive works in 2 steps:
Fetching the data from the RDBMS into HDFS
Creating the Hive table if it does not exist and loading the data into it
In your case,
firstly, the data is stored at --target-dir, i.e. /tmp/yfr
Then, it is loaded into the Hive table a using a
LOAD DATA INPATH ... INTO TABLE ...
command.
As mentioned in the comments, the data is moved to the Hive warehouse directory; that's why there is no data left in --target-dir.
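Roughly, the second step amounts to a statement like the following (a sketch built from the table and directory in the question; the exact statement Sqoop generates may differ):
-- LOAD DATA moves (rather than copies) the files, which is why /tmp/yfr ends up empty
LOAD DATA INPATH '/tmp/yfr' INTO TABLE default.a;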

how to overwrite the data in hive using sqoop

I am trying to load data into an already existing table in Hive via Sqoop from a MySQL database. I am referring to the guide below for reference:
http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive
--hive-import has been tried and tested successfully.
I created a Hive table as below:
create table sqoophive (id int, name string, location string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
stored as textfile;
Loaded the data as required.
I want to use the --hive-overwrite option to overwrite the content of the above table. As per the guide mentioned above: "--hive-overwrite Overwrite existing data in the Hive table."
"If the Hive table already exists, you can specify the --hive-overwrite option to indicate that the existing table in Hive must be replaced."
So I tried the queries below separately to get the result:
sqoop import --connect jdbc:mysql://localhost/test --username root --password 'hr' --table sample --hive-import --hive-overwrite --hive-table sqoophive -m 1 --fields-terminated-by '\t' --lines-terminated-by '\n'
sqoop import --connect jdbc:mysql://localhost/test --username root --password 'hr' --table sample --hive-overwrite --hive-table sqoophive -m 1 --fields-terminated-by '\t' --lines-terminated-by '\n'
But rather than replacing the content of the existing table, it just created files under /user/<username>/<mysqltablename>.
Can somebody please explain where I am going wrong?
The first query should work fine. I didn't give --fields-terminated-by and --lines-terminated-by since the schema already exists.
The keywords --hive-import and --hive-overwrite should both be there.
If only --hive-overwrite is there, it doesn't load data into the table; it just copies it to HDFS.
It's putting the _SUCCESS file in
/user/<username>/<mysqltablename>
You can change where that goes with --warehouse-dir, e.g. --warehouse-dir /tmp.
One would think that --hive-overwrite would handle this, meaning remove that directory first. But for good reason Hive doesn't want to start removing directories in HDFS; what if something else was put in there?
--hive-overwrite says, "I'm going to overwrite the rows in Hive, not just add to the table." Thus you will not have duplicates.
You have to remove that directory and the _SUCCESS file first, or better yet, right after the import is successful:
hadoop fs -rm -R /user/<username>/<mysqltablename>
sqoop import without --target-dir or --warehouse-dir (for --hive-import) will import into /user/<username>/<mysqltablename>:
By default, Sqoop will import a table named foo to a directory named
foo inside your home directory in HDFS. For example, if your username
is someuser, then the import tool will write to
/user/someuser/foo/(files). You can adjust the parent directory of the
import with the --warehouse-dir argument.
You can also explicitly choose the target directory with the --target-dir param.
But as @hrobertv said, --hive-overwrite does not delete the existing directory; it overwrites the HDFS data location of the Hive table. If you want to save the new data at the same location as before, you have to delete the existing table directory first and then run sqoop import, specifying --target-dir or --warehouse-dir, so that --hive-overwrite stores the data at the specific location you need.
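A rough sketch of that sequence, reusing the command from this question (the first path is the default import directory mentioned above; --warehouse-dir /tmp follows the earlier suggestion and is only an example):
hadoop fs -rm -R /user/<username>/sample
sqoop import --connect jdbc:mysql://localhost/test --username root --password 'hr' --table sample --hive-import --hive-overwrite --hive-table sqoophive --warehouse-dir /tmp -m 1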

how to import table from rdbms to hive using sqoop in hadoop cluster?

I am trying to import a table from an RDBMS to Hive using Sqoop in a Hadoop cluster. I am getting the following error; can you please provide a solution?
bin/sqoop-import --connect jdbc:mysql://localhost:3306/hadoop -username root -password root --table salaries --hive-table salaries --create-hive-table --hive-import --hive-home /home/techgene/hive-0.11.0 -m 1 --target-dir /user/hive/warehouse
Exception:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
14/06/02 14:30:19 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 1
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:364)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:314)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:226)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Whenever you use Sqoop with the Hive import option, Sqoop connects directly to the corresponding database and gets the corresponding table's metadata (the table's schema), so there is no need to create the table structure in Hive. This schema is then provided to Hive when the --hive-import option is used.
Example:
sudo sqoop import-all-tables --connect jdbc:mysql://10.0.0.57/movielens --username root --password root --hive-import
This imports all the tables from the movielens database in MySQL.

sqoop import \
--connect jdbc:mysql://10.0.0.57/movielens \
--username root \
--password hadoop \
--table cities \
--hive-import
This imports just one table, called cities.
So the output of all the Sqoop data on HDFS will by default be stored in the default directory, i.e. /user/sqoop/tablename/part-m files.
With the Hive import option, the tables will be loaded directly into the default warehouse directory, i.e.
/user/hive/warehouse/tablename
Command: sudo -u hdfs hadoop fs -ls -R /user/
This recursively lists all the files within /user/.
Now go to Hive and type show databases. If there is only the default database, then type show tables.
Remember that OK is common default system output and is not part of the command output.
hive> show databases;
OK
default
Time taken: 0.172 seconds
hive> show tables;
OK
genre
log_apache
movie
moviegenre
movierating
occupation
user
Time taken: 0.111 seconds
Check the syntax and eliminate extra spaces:
$ sqoop-import --connect "jdbc:mysql://localhost:3306/hadoop;database=<db_name>"
-username root
-password root
--table salaries
--hive-import
--target-dir /user/hive/warehouse
No need to mention --hive-table <table_name> if you are using the same name as in MySQL.

Appending Data to hive Table using Sqoop

I am trying to append data to an already existing table in Hive. Using the following command, I first import the table from MS SQL Server to Hive.
Sqoop Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id > 100" --username myuser --password mypassword --hive-import
Now I want to append the data where "Batch_Id < 100" to the same existing table in Hive.
I am using the following command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id < 100" --username myuser --password mypassword --append --hive-table my_table
This command runs successfully and updates the HDFS data, but when you connect to the Hive shell and query the table, the appended records are not visible.
Sqoop updated the data on HDFS under "/user/hduser/my_table", but the data under "/user/hive/warehouse/batch_dim" is not updated.
How can I resolve this issue?
Regards,
Bhagwant Bhobe
Try using
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase"
--table "my_table" --where "Batch_Id < 100"
--username myuser --password mypassword
--hive-import --hive-table my_table
When you are using --hive-import, DO NOT use the --append parameter.
The Sqoop command you're using (a plain import) only ingests records into HDFS. You need to use the --hive-import flag to import records into Hive.
See http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive for more details and for additional import configuration options (you may want to change the document reference to your version of Sqoop, of course).
