I have a tab-separated text file in HDFS and want to export it into a MySQL table.
Since the rows in the text file do not have numerical IDs, how do I export into a table whose ID is set automatically during the SQL INSERT (auto-increment)?
If I try to export (id being the last defined attribute in the table), I get
java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at entity.__loadFromFields(entity.java:996)
If I take the autogenerated class and modify it to exclude the id attribute, I get
java.io.IOException: java.sql.SQLException: No value specified for parameter 27
where parameter 27 is 'id'.
Version is Sqoop 1.3.0-cdh3u3
In Sqoop 1.4.1, writing "null" in the text-file field position corresponding to the auto-increment field worked for me. After the export to MySQL you will see an incremented, automatically assigned ID.
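For illustration, a minimal sketch of what that could look like, assuming a three-column MySQL table with id as the last, auto-increment column (database, table, user, and path names are made up; Sqoop's defaults already treat "null" as NULL for non-string columns, the flag below just makes it explicit):

# one tab-separated input line, with the literal string null in the id position
x_value	y_value	null

sqoop export \
  --connect jdbc:mysql://dbhost/dbname \
  --username dbuser -P \
  --table target_table \
  --export-dir /user/hadoop/export_data \
  --input-fields-terminated-by '\t' \
  --input-null-non-string 'null'

MySQL assigns the next auto-increment value whenever an explicit NULL is inserted into an AUTO_INCREMENT column.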
As somebody on the Sqoop mailing list suggested (a sketch follows the steps):
Create a temporary table without the ID
Sqoop-export into this table
Copy the rows of this table into the final table (that has the autoincrement ID)
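A hedged sketch of that workaround on the MySQL side (final_table, x, and y are placeholder names; the Sqoop export in step 2 targets the staging table):

CREATE TABLE final_table_staging LIKE final_table;
ALTER TABLE final_table_staging DROP COLUMN id;
-- ... run sqoop export --table final_table_staging here ...
INSERT INTO final_table (x, y) SELECT x, y FROM final_table_staging;
DROP TABLE final_table_staging;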
My source table is in Hive. What works for me is to add a column called id (int) and populate it with NULL. After the Sqoop export, MySQL receives INSERT (id, X, Y) VALUES (NULL, 'x_value', 'y_value'), so MySQL knows to populate id via auto-increment.
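One way to sketch this on the Hive side, using a staging table rather than altering the source table in place (table and column names are placeholders):

CREATE TABLE export_staging AS
SELECT CAST(NULL AS INT) AS id, x, y
FROM source_table;

Exporting export_staging with sqoop export then sends NULL for id, and MySQL fills in the auto-increment value.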
Related
I am working on a Hive table where I need to rename a column as shown below. The rename itself works as expected, but the underlying values of that column come back as NULL afterwards.
ALTER TABLE db.tbl CHANGE hdfs_loaddate hdfs_load_date String;
Here the new column name is hdfs_load_date, and its values come back as NULL after the rename.
Does anyone have an idea how to fix this? Thanks in advance!
@Ajay_SK, referencing this article: Hive Alter table change Column Name
There is a comment:
Note that the column change will not change any underlying data if it is a parquet table. That is, if you have data in the table already, renaming a column will not make the data in that column accessible under the new name:
select a from test_change;                -- returns 1
alter table test_change change a a1 int;
select a1 from test_change;               -- returns NULL
He is specific to Parquet, but the scenario you describe is similar: you have successfully changed the name, but Hive still thinks the original data belongs under the original column name.
A better approach for your issue would be to create a new table with the schema you want, including the renamed column, and then perform an INSERT INTO the new table with a SELECT from the old table; a sketch follows.
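A minimal sketch, assuming the original column name is still in place (i.e. the ALTER has been rolled back or not yet run) and that other_col stands in for the remaining columns:

CREATE TABLE db.tbl_renamed (hdfs_load_date string, other_col string);
INSERT INTO TABLE db.tbl_renamed
SELECT hdfs_loaddate, other_col FROM db.tbl;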
I am importing data from Oracle to Hive. My table doesn't have any integer column that can be used as a primary key, so I am not able to use one as my --split-by column.
As a workaround I created a row_num column for all rows present in the table, so that this row_num column can be used as the --split-by column. Afterwards I want to drop this column from my Hive table.
The column list is huge; I don't want to select all columns using --columns, nor do I want to create any temporary table for this purpose.
Please let me know whether we can handle this in sqoop arguments.
Could a little tweak of the --query parameter help you?
Something like the below.
sqoop import --query 'query string'
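For example, something along these lines, generating the split column on the fly with ROWNUM so it never has to live in the Oracle table (connection details, schema, table, and directory names are placeholders; note that row_num will still appear in the imported data unless the outer SELECT lists columns explicitly):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/SERVICE \
  --username dbuser -P \
  --query 'SELECT t.* FROM (SELECT s.*, ROWNUM AS row_num FROM myschema.source_table s) t WHERE $CONDITIONS' \
  --split-by row_num \
  --boundary-query 'SELECT 1, COUNT(*) FROM myschema.source_table' \
  --target-dir /user/hive/staging/source_table \
  --hive-import --hive-table default.source_table \
  -m 4

The extra row_num column can simply be ignored on the Hive side or removed afterwards.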
hello "I m new to Hbase.. My question is How to create a table in hbase with column family & column names inside the columnfamily without passing values and row key?Is it possible to create that table in hbase shell?
In Sql we create a table first and later we add data ..same thing how can we do it in hbase?
HBase is a NoSQL key-value database. A table is created just by specifying the table name and the column family, for example create 'sampletable', 'm', where sampletable is the table name and m is the column family. Column qualifiers inside the family are not declared up front; they come into existence when you put data. If you want to use SQL queries on HBase, try Apache Phoenix.
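For example, in the HBase shell (row1 and the qualifiers name and age are made-up illustrations):

create 'sampletable', 'm'
put 'sampletable', 'row1', 'm:name', 'John'
put 'sampletable', 'row1', 'm:age', '30'
scan 'sampletable'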
I have two columns Id and Name in Hive table, and I want to delete the Name column. I have used following command:
ALTER TABLE TableName REPLACE COLUMNS(id string);
The result was that the Name column values were assigned to the Id column.
How can I drop a specific column of the table and is there any other command in Hive to achieve my goal?
In addition to the existing answers to the question: Alter hive table add or drop column
As per Hive documentation,
REPLACE COLUMNS removes all existing columns and adds the new set of columns.
REPLACE COLUMNS can also be used to drop columns. For example, ALTER TABLE test_change REPLACE COLUMNS (a int, b int); will remove column c from test_change's schema.
The query you are using is right. But this will only modify the schema, i.e. the metastore. It will not modify anything on the data side.
So, before you drop the column you should make sure that you have the correct data file.
In your case the data file should not contain name values.
If you don't want to modify the file, then create another table with only the specific columns that you need.
CREATE TABLE tablename AS SELECT id FROM already_existing_table;
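If you also want the result to keep the original table name, a hedged sketch of the full sequence (assuming a managed, non-external table):

CREATE TABLE tablename_new AS SELECT id FROM tablename;
DROP TABLE tablename;
ALTER TABLE tablename_new RENAME TO tablename;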
Let me know if this helps.
I am converting my Pig script prototype to Hive. I need to add a status column to the table, which is imported from an Oracle database.
My pig scripts looks like this:
user_data = LOAD 'USER_DATA' USING PigStorage(',') AS (USER_ID:int,MANAGER_ID:int,USER_NAME:int);
user_data_status = FOREACH user_data GENERATE
USER_ID,
MANAGER_ID,
USER_NAME,
'active' AS STATUS;
Here I am adding the STATUS column with 'active' value to the user_data table.
How can I add such a column with a default value to an existing table while importing the table via HiveQL?
As far as I know, you will have to reload the data, as you did in Pig.
For example, if you already have the table user_data with columns USER_ID:int, MANAGER_ID:int, USER_NAME:int and you are looking for USER_ID:int, MANAGER_ID:int, USER_NAME:int, STATUS:active,
you can reload the table user_data_status using something like this:
INSERT OVERWRITE TABLE user_data_status SELECT *, 'active' AS STATUS FROM user_data;
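For that INSERT OVERWRITE to work, user_data_status has to exist already with the extra column; a minimal sketch of the DDL, mirroring the Pig schema above (STRING for STATUS is an assumption):

CREATE TABLE user_data_status (
  USER_ID    INT,
  MANAGER_ID INT,
  USER_NAME  INT,    -- int here only to mirror the Pig schema shown above
  STATUS     STRING
);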
Though there are options to add columns to the existing table, that would only update the metadata in the metastore, and the values would default to NULL.
If I were you, I would rather reload the complete data than try to update the whole table with an UPDATE command after altering the column structure. Hope this helps!