I am importing data from MemSQL to HDFS using Sqoop. My source table in MemSQL doesn't have any integer column, so I created a new table that adds a column 'test' to the existing columns.
Following is the query:
sqoop import --connect jdbc:mysql://XXXXXXXXX:3306/db_name --username XXXX --password XXXXX --query "select closed,extract_date,open,close,cast(floor(rand()*1000000 as int) as test from tble_name where \$CONDITIONS" --target-dir /user/XXXX--split-by test;
This query gave me the following error:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'as int) as test from table_name where (1 = 0)' at line 1
I tried it another way as well:
sqoop import --connect jdbc:mysql://XXXXX:3306/XXXX --username XXXX --password XXXX --query "select closed,extract_date,open,close,ceiling(rand()*1000000) as test from table_name where \$CONDITIONS" --target-dir /user/dfsdlf --split-by test;
With this query the job runs, but no data is transferred. It says the split-by column is of float type and must be changed strictly to an integer type.
Please help me change the split-by column from float type to integer type.
The problem most likely comes from using an alias as the --split-by parameter.
If you need that computed column in the query, run
'select closed,extract_date,open,close,ceiling(rand()*1000000) from table_name' in the console, note the column name the console shows for the expression, and pass that full name to --split-by (here it should be --split-by 'ceiling(rand()*1000000)').
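To see why the alias can be a problem, it helps to look at what happens to the $CONDITIONS placeholder. The error message above shows it replaced by (1 = 0); for the actual import, each mapper receives a range predicate on the --split-by column instead. A minimal sketch, in which the helper name build_query and the range bounds are invented for illustration and the exact predicate Sqoop generates may differ:

```shell
# Hypothetical helper: print the query with a given predicate in place of $CONDITIONS.
build_query() {
  printf 'select closed, extract_date, open, close, ceiling(rand()*1000000) as test from table_name where %s\n' "$1"
}

# Metadata probe, as seen in the error message above.
build_query '(1 = 0)'

# Illustrative per-mapper range on the --split-by column (bounds are invented).
build_query 'test >= 0 AND test < 250000'
```

The second statement references the alias test inside WHERE. MySQL-compatible databases generally do not accept select-list aliases there, which is consistent with the advice to pass the full expression to --split-by.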
I am having an issue sqooping from a Teradata database when using the Teradata method "--fast-export". An example Sqoop command is below:
-Dhadoop.security.credential.provider.path=jceks:/PATH/TO/password/password.jcecks
-Dteradata.db.job.data.dictionary.usexviews=false
--connect
jdbc:teradata://DATABASE
--password-alias
password.alias
--username
USER
--connection-manager
org.apache.sqoop.teradata.TeradataConnManager
--fields-terminated-by
'\t'
--lines-terminated-by
'\n'
--null-non-string
''
--null-string
''
--num-mappers
8
--split-by
column3
--target-dir
/THE/TARGET/DIR
--query
SELECT column1,column2,column3 WHERE column3 > '2020-01-01 00:00:00' and column3 <= '2020-01-12 10:41:20' AND $CONDITIONS
--
--method
internal.fastexport
The error I am getting is
Caused by: com.teradata.connector.common.exception.ConnectorException: java.sql.SQLException: [Teradata Database] [TeraJDBC ] [Error 3524] [SQLState 42000] The user does not have CREATE VIEW access to database DATABASE.
I suspect fast export creates a temporary staging table or view, and that the job ingests from that temporary object under the hood. Is this a Sqoop mechanism, and is it possible to turn it off?
Many thanks
Dan
Fast export does not create any view to extract data. The view is created by Sqoop based on the --query value. Hence, the user running the job must have the CV right granted on the DATABASE.
You can check the user's rights on the database by running the query below, replacing USER_NAME and DATABASE_NAME with the values from your environment.
AccessRight = 'CV' means CREATE VIEW, so leave it as it is.
SELECT *
FROM dbc.allRoleRights WHERE roleName IN
(SELECT roleName FROM dbc.roleMembers WHERE grantee = 'USER_NAME')
AND databaseName = 'DATABASE_NAME'
AND accessRight = 'CV'
ORDER BY 1,2,3,5;
You may also need the CT (CREATE TABLE) right so that the log table for fast export can be created. Its location is given by the Sqoop parameters --error-table and --error-database.
I need to import data from a few different SQL Server instances that have the same tables, table structure and even primary key values. To uniquely identify a record ingested from a given server, say "S1", I want an extra column, say "serverName", in my Hive tables. How should I add this in my Sqoop free-form query?
All I want to do is pass a hardcoded value along with the list of columns so that the hardcoded value gets stored in Hive. Once that works, I can take care of changing the value dynamically depending on the server.
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
S1 is the hardcoded value here. I am thinking of it the SQL way: when you select a hardcoded value, it is returned in the query result. Any pointers on how to get this done?
Thanks in Advance.
SOLVED: It just needed an alias for the hardcoded value. The Sqoop command that worked is:
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' as serverName from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
I get an error importing data from Teradata to a Hadoop cluster using Sqoop.
My Teradata table has two columns with identical titles (not column names). Is there an automatic way to use the column name instead of the column title in my Sqoop job?
I've tried using "Select * from table" as the query, but it does not work.
And I can't change the column titles in Teradata.
Here is my job code:
sqoop job -Dmapred.job.queue.name=shortduration \
--create inc_My_Table \
-- import \
--connect jdbc:teradata://RCT/DATABASE=DWHBIG \
--driver com.teradata.jdbc.TeraDriver \
--username MBIGDATA -P \
--query "select a.* from My_Table a where \$CONDITIONS" \
--target-dir /data/source/fb/$i \
--check-column DAT_MAJ_DWH \
--incremental append \
--last-value 2001-01-01 \
--split-by ID
Any idea ? Thanks
Since Teradata JDBC Driver 16.00.00.28, you can use the connection URL parameter COLUMN_NAME to control whether getColumnName and getColumnLabel return the column name, the column title, or the AS-clause name. That should resolve your problem.
COLUMN_NAME=OFF (the default) specifies that the
ResultSetMetaData.getColumnName method should return the AS-clause
name if available, or the column name if available, or the column
title, and specifies that the ResultSetMetaData.getColumnLabel method
should return the column title.
COLUMN_NAME=ON specifies that, when StatementInfo parcel support is
available, the ResultSetMetaData.getColumnName method should return
the column name if available, and specifies that the
ResultSetMetaData.getColumnLabel method should return the AS-clause
name if available, or the column name if available, or the column
title. This option has no effect when StatementInfo parcel support is
unavailable.
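As a sketch of how this could look for the job in the question, the parameter can be appended to the connection URL after a comma, which is how the Teradata JDBC driver separates URL parameters. The variable names below are illustrative only, and the setting assumes a driver at version 16.00.00.28 or later:

```shell
# Base URL taken from the job in the question.
BASE_URL='jdbc:teradata://RCT/DATABASE=DWHBIG'

# Append COLUMN_NAME=ON so that getColumnName returns the column name
# instead of falling back to the column title.
CONNECT_URL="${BASE_URL},COLUMN_NAME=ON"

# Print the value to pass to --connect in the sqoop job definition.
echo "$CONNECT_URL"
```

The resulting string replaces the existing --connect value; the rest of the job definition stays unchanged.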
I finally found a solution. I aliased the column that was causing the issue using the AS SQL command!
I am getting the error "Unrecognized argument --hive-partition-key" when I run the following statement:
sqoop import
--connect 'jdbc:sqlserver://192.168.56.1;database=xyz_dms_cust_100;username-hadoop;password=hadoop'
--table e_purchase_category
--hive_import
--delete-target-dir
--hive-table purchase_category_p
--hive-partition-key "creation_date"
--hive-partition-value "2015-02-02"
The partitioned table exists.
The Hive partition key (creation_date in your example) should not be a column of your database table when you are using hive-import. When you create a partitioned table in Hive, you do not include the partition column in the table schema. The same applies to Sqoop hive-import.
Based on your Sqoop command, I am guessing that the creation_date column is present in your SQL Server table. If so, you might be getting this error:
ERROR tool.ImportTool: Imported Failed:
Partition key creation_date cannot be a column to import.
To resolve this issue, I have two solutions:
Make sure that the partition column is not present in the SQL Server table. Then, when Sqoop creates the Hive table, it adds the partition column and stores its value as a directory in the Hive warehouse.
Change the Sqoop command to use a free-form query that selects all the columns except the partition column, and do the hive-import. Below is an example of this solution.
Example:
sqoop import \
--connect jdbc:mysql://localhost:3306/hadoopexamples \
--query 'select City.ID, City.Name, City.District, City.Population from City where $CONDITIONS' \
--target-dir /user/XXXX/City \
--delete-target-dir \
--hive-import \
--hive-table City \
--hive-partition-key "CountryCode" \
--hive-partition-value "USA" \
--fields-terminated-by ',' \
-m 1
Another method:
You can also do the task in separate steps:
Create a partitioned table in Hive (example: city_partition).
Load data from the RDBMS with Sqoop, using hive-import, into a plain Hive table (example: city).
Using INSERT OVERWRITE, load the data into the partitioned table (city_partition) from the plain Hive table (city), like:
INSERT OVERWRITE TABLE city_partition
PARTITION (CountryCode='USA')
SELECT id, name, district, population FROM city;
This could also be applied:
sqoop import --connect jdbc:mysql://localhost/akash \
--username root \
-P \
--table mytest \
--where "dob='2019-12-28'" \
--columns "id,name,salary" \
--target-dir /user/cloudera/ \
-m 1 --hive-table mytest \
--hive-import \
--hive-overwrite \
--hive-partition-key dob \
--hive-partition-value '2019-12-28'
I'm trying to import 50 records from a single table using the following query:
sqoop import --connect jdbc:mysql://xxxxxxx/db_name --username yyyyy --query 'select * from table where (id <50) AND $CONDITIONS' --target-dir /user/tmp/ -P
I get an error with this query.
Any ideas?
I removed the parentheses in the WHERE clause and it worked. When using two or more logical operators, use parentheses; otherwise it doesn't work.
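A short sketch of the two cases, using a hypothetical second condition (id > 1000) that is not part of the original question. Because AND binds more tightly than OR in SQL, an ungrouped OR would leave $CONDITIONS attached to only one branch, so the filter is grouped before AND $CONDITIONS is appended. The variable names are made up for illustration:

```shell
# Single comparison, written without parentheses as in the answer.
SIMPLE_QUERY='select * from table where id < 50 AND $CONDITIONS'

# Hypothetical filter with OR: grouped so that $CONDITIONS applies to both branches.
GROUPED_QUERY='select * from table where (id < 50 OR id > 1000) AND $CONDITIONS'

# Show the strings that would be passed to --query.
printf '%s\n' "$SIMPLE_QUERY" "$GROUPED_QUERY"
```

Both strings are single-quoted, as in the command from the question, so the shell leaves $CONDITIONS untouched for Sqoop to fill in.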