I am trying to append data to an already existing table in Hive. First, I import the table from MS SQL Server into Hive using the following command.
Sqoop Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id > 100" --username myuser --password mypassword --hive-import
Now I want to append data to the same existing table in Hive, this time where "Batch_Id < 100".
I am using the following Command:
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase" --table "my_table" --where "Batch_Id < 100" --username myuser --password mypassword --append --hive-table my_table
This command runs successfully and updates the HDFS data, but when I connect to the Hive shell and query the table, the appended records are not visible.
Sqoop updated the data on HDFS under "/user/hduser/my_table", but the data under "/user/hive/warehouse/batch_dim" was not updated.
How can I resolve this issue?
Regards,
Bhagwant Bhobe
Try using
sqoop import --connect "jdbc:sqlserver://XXX.XX.XX.XX;databaseName=mydatabase"
--table "my_table" --where "Batch_Id < 100"
--username myuser --password mypassword
--hive-import --hive-table my_table
When you are using --hive-import, do NOT use the --append parameter.
The Sqoop command you're using (--import) is only for ingesting records into HDFS. You need to use the --hive-import flag to import records into Hive.
See http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_importing_data_into_hive for more details and for additional import configuration options (you may want to change the document reference to your version of Sqoop, of course).
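As a quick check after re-running the import with --hive-import (a minimal sketch; the table name and predicate are taken from the question):

hive -e "SELECT COUNT(*) FROM my_table WHERE Batch_Id < 100;"

If the count is non-zero, the appended rows are now visible to Hive.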
My Hive table has the columns: id, name
and the MySQL table has the columns: number, id, name.
I want to map id (from Hive) to number (from MySQL), and name (from Hive) to id (from MySQL).
I use the command:
sqoop export --hcatalog-database <my_db> --hcatalog-table <my_table> --columns "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
However, it didn't work.
A similar scenario to this one works fine [1]. There, the requirement is fulfilled by locating the Hive table's directory on HDFS and using the following command:
sqoop export --export-dir /[hdfs_path] --columns "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
Is there any solution that can fulfill my scenario via HCatalog?
Reference:
[1]. Sqoop export from hive to oracle with different col names, number of columns and order of columns
I haven't used the HCatalog part of Sqoop, but as written in the manual, the following script should do the work:
sqoop export --hcatalog-database <my_db> --hcatalog-table <my_table> --map-column-hive "number,id" \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>
When the --map-column-hive option is used along with the --hcatalog options, it does the work for HCatalog instead of Hive.
Hope that this works for you.
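If --map-column-hive does not behave the way you need, one possible workaround (a sketch only, not tested against your setup; my_db.my_table_stg is a hypothetical staging table) is to create a Hive table whose column names already match the MySQL columns and export that table via HCatalog:

hive -e 'CREATE TABLE my_db.my_table_stg AS SELECT id AS `number`, name AS id FROM my_db.my_table;'

sqoop export --hcatalog-database my_db --hcatalog-table my_table_stg \
--connect jdbc:mysql://db...:3306/test \
--username <my_user> --password <my_passwd> --table <my_mysql_table>

Since the staging table's column names line up with the MySQL columns, no column mapping is needed in the export.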
Is it possible to use the Sqoop import command to import a table from an Oracle database into a Hadoop cluster and add an extra column with the current timestamp (for troubleshooting purposes)? So far, I have the following command:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect jdbc:oracle:thin:#//MY_ORACLE_SERVER --username USERNAME --password PASSWORD --target-dir /MyDIR --fields-terminated-by '\b' --table SOURCE_TABLE --hive-table DESTINATION_TABLE --hive-import --hive-overwrite --hive-delims-replacement '<newline>'
I would like to add a timestamp column to the table so that I know when that data was loaded. Is it possible?
Thanks in advance
You can use a free-form query import instead of a table import and call the timestamp function. Note that a --query import needs a $CONDITIONS token in the WHERE clause and either --split-by or a single mapper (-m 1):
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect jdbc:oracle:thin:#//MY_ORACLE_SERVER --username USERNAME --password PASSWORD --target-dir /MyDIR --fields-terminated-by '\b' --query 'SELECT a.*, systimestamp FROM SOURCE_TABLE a WHERE $CONDITIONS' -m 1 --hive-table DESTINATION_TABLE --hive-import --hive-overwrite --hive-delims-replacement '<newline>'
Maybe you could use sysdate instead of systimestamp (smaller datatype but less precision).
You can create a temporary Hive table by using Sqoop, and after that create a new Hive table from the old one with the extra required columns, as sketched below.
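A minimal sketch of that two-step approach (staging_table, final_table and load_ts are hypothetical names; the connect string uses the standard jdbc:oracle:thin:@// form with the placeholder server from the question; current_timestamp needs Hive 1.2+):

sqoop import --connect jdbc:oracle:thin:@//MY_ORACLE_SERVER --username USERNAME --password PASSWORD --table SOURCE_TABLE --hive-import --hive-table staging_table -m 1

hive -e "CREATE TABLE final_table AS SELECT t.*, current_timestamp() AS load_ts FROM staging_table t;"

The second statement copies every column of the staging table and adds a load_ts column holding the time the copy ran.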
I am trying to load data from Vertica into Hive using Sqoop.
I can see that it creates a file and a table in Hive, but when I try to select the data from Hive or from the file, I cannot see the data; the select fails with an error (there is no delimiter between the columns in the file).
This is my code:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts limit 10' --target-dir "folder/contacts" --hive-import --create-hive-table --hive-table db.contacts
Use these arguments and choose delimiters for your data:
--fields-terminated-by
--lines-terminated-by
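For example, applied to the command from the question (a sketch only; comma and newline are just one delimiter choice, and note the WHERE $CONDITIONS token that Sqoop requires for --query imports):

sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://serverName:5443/DBName" --username "user" --password "pass" --query 'select id, name from contacts where $CONDITIONS limit 10' --target-dir "folder/contacts" --fields-terminated-by ',' --lines-terminated-by '\n' --hive-import --create-hive-table --hive-table db.contacts

Pick delimiters that cannot occur inside your column values, otherwise the Hive table will again appear to have misaligned or missing data.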
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import
The above command imports table tb into the 'default' Hive database.
Can I use other database instead?
Off the top of my head, I recall you can specify --hive-table foo.tb,
where foo is your Hive database and tb is your Hive table.
so in your case it would be:
sqoop import --connect jdbc:mysql://remote-ip/db --username xxx --password xxx --table tb --hive-import --hive-table foo.tb
As a footnote, here is the original jira issue https://issues.apache.org/jira/browse/SQOOP-322
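To double-check where the table ended up, you can list the tables in that database from the shell (a trivial check, assuming the database is called foo as above):

hive -e "SHOW TABLES IN foo;"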
Importing into a specific Hive database using Sqoop import:
sqoop import --connect jdbc:mysql://localhost/arun --table account --username root --password root -m 1 --hive-import --hive-database company --create-hive-table --hive-table account --target-dir /tmp/customer/ac
You can specify the database name as a part of the --hive-table parameter, e.g. "--hive-table foo.tb".
There is a new request to add a special parameter for the database that is being tracked: SQOOP-912.
After installing Hadoop and Hive (CDH version), I execute
./sqoop import -connect jdbc:mysql://10.164.11.204/server -username root -password password -table user -hive-import --hive-home /opt/hive/
All goes fine, but when I enter the Hive command line and execute show tables, there is nothing.
When I use ./hadoop fs -ls, I can see /user/(username)/user exists.
Any help is appreciated.
---EDIT-----------
/sqoop import -connect jdbc:mysql://10.164.11.204/server -username root -password password -table user -hive-import --target-dir /user/hive/warehouse
The import fails due to:
11/07/02 00:40:00 INFO hive.HiveImport: FAILED: Error in semantic analysis: line 2:17 Invalid Path 'hdfs://hadoop1:9000/user/ubuntu/user': No files matching path hdfs://hadoop1:9000/user/ubuntu/user
11/07/02 00:40:00 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive exited with status 10
at com.cloudera.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:326)
at com.cloudera.sqoop.hive.HiveImport.executeScript(HiveImport.java:276)
at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:218)
at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:218)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:228)
Check your hive-site.xml for the value of the property javax.jdo.option.ConnectionURL. If you do not define this explicitly, the default value uses a relative path for the creation of the Hive metastore (jdbc:derby:;databaseName=metastore_db;create=true), which will be different depending upon where you launch the process from. This would explain why you cannot see the table via show tables.

Define this property in your hive-site.xml using an absolute path.
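For example, a minimal hive-site.xml snippet (the /home/hduser path is only a placeholder, point it at whatever absolute location suits your setup):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hduser/metastore_db;create=true</value>
</property>

With an absolute databaseName, the Hive process launched by Sqoop and your interactive Hive shell use the same embedded Derby metastore, so the imported table shows up in show tables.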
There is no need to create the table in Hive first; refer to the query below:
sqoop import --connect jdbc:mysql://xxxx.com/<database_name> --username root --password admin --table <mysql_table_name> --direct -m 1 --hive-import --create-hive-table --hive-table <hive_table_name> --target-dir '/user/hive/warehouse/<hive_table_name>' --fields-terminated-by '\t'
In my case, Hive stores data in the /user/hive/warehouse directory in HDFS. This is where Sqoop should put it.
So I guess you have to add:
--target-dir /user/hive/warehouse
which is the default location for Hive tables (it might be different in your case).
You might also want to create this table in Hive:
sqoop create-hive-table --connect jdbc:mysql://host/database --table tableName --username user --password password
In my case it creates the table in the Hive default database; you can give it a try.
sqoop import --connect jdbc:mysql://xxxx.com/Database name --username root --password admin --table NAME --hive-import --warehouse-dir DIR --create-hive-table --hive-table NAME -m 1
Hive tables will be created by the Sqoop import process. Please make sure /user/hive/warehouse exists in your HDFS. You can browse HDFS via the namenode web UI (http://localhost:50070/dfshealth.jsp, "Browse the File System" option).
Also include the HDFS location in --target-dir, i.e. hdfs://:9000/user/hive/warehouse, in the sqoop import command.
First of all, create the table definition in Hive with exactly the same field names and types as in MySQL (a sketch of this step follows after the notes below).
Then, perform the import operation.
For Hive Import
sqoop import --verbose --fields-terminated-by ',' --connect jdbc:mysql://localhost/test --table tablename --hive-import --warehouse-dir /user/hive/warehouse --split-by id --hive-table tablename
'id' can be the primary key of your existing table.
'localhost' can be your local IP.
'test' is the database name.
The 'warehouse' directory is in HDFS.
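A sketch of the table-definition step mentioned above (id and name are hypothetical columns, match them to your actual MySQL schema):

hive -e "CREATE TABLE tablename (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"

The FIELDS TERMINATED BY ',' clause matches the --fields-terminated-by ',' used in the import command above.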
I think all you need is to specify the hive table where data should go.
add "--hive-table database.tablename" to the sqoop command and remove the --hive-home /opt/hive/. I think that should resolve the problem.