Explain and show an example of why we use $CONDITIONS in Sqoop - sqoop

sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --query 'select * from table_name where $CONDITIONS'

If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.
$ sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
--split-by a.id \
--target-dir /user/foo/joinresults
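To make the substitution concrete, here is a rough sketch of what each of, say, four parallel map tasks might end up executing after Sqoop replaces $CONDITIONS; the boundary values on a.id are hypothetical and would really be inferred by Sqoop from the min/max of the --split-by column:
-- hypothetical per-mapper queries; the a.id ranges are invented for illustration
SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE ( a.id >= 1 ) AND ( a.id < 251 )
SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE ( a.id >= 251 ) AND ( a.id < 501 )
SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE ( a.id >= 501 ) AND ( a.id < 751 )
SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE ( a.id >= 751 ) AND ( a.id <= 1000 )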

Related

How to pass a string value in sqoop free form query

I need to import data from a few different SQL Server instances which have the same tables, table structure, and even primary key values. So, to uniquely identify a record ingested from a SQL Server, say "S1", I want to have an extra column - say "serverName" - in my Hive tables. How should I add this in my Sqoop free-form query?
All I want to do is pass a hardcoded value along with the list of columns, so that the hardcoded column value gets stored in Hive. Once that is done, I can take care of dynamically changing this value depending on which server the data comes from.
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
'S1' is the hardcoded value here. I am thinking in the SQL way: when you pass a hardcoded value, the same value is returned in the query result. Any pointers on how to get this done?
Thanks in advance.
SOLVED: It just needed an alias for the hardcoded value. The Sqoop command executed is:
sqoop import --connect "connDetails" --username "user" --password "pass" --query "select col1, col2, col3, 'S1' as serverName from table where \$CONDITIONS" --hive-import --hive-overwrite --hive-table stg.T1 --split-by col1 --as-textfile --target-dir T1 --hive-drop-import-delims
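A quick sanity check after the import, assuming the Hive table stg.T1 from the command above exists: every row should carry the hardcoded server name in the serverName column.
-- verification query only; stg.T1 and serverName come from the Sqoop command above
select serverName, count(*) from stg.T1 group by serverName;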

Is it possible to write a Sqoop incremental import with filters on the new file before importing?

My doubt is: say I have 2,000 records in a SQL Server table (loaded from a file A1.csv), and I import this data into HDFS. Later that day, 3,000 more records are added to the same table in SQL Server.
Now I want to run an incremental import so that the second chunk of data is added to HDFS, but I do not want all 3,000 records to be imported. I only need part of that data, say 1,000 records that satisfy a certain condition, to be imported as part of the incremental import.
Is there a way to do that using sqoop incremental import command?
Please Help, Thank you.
You need a unique key or a timestamp field to identify the deltas, which are the new 1,000 records in your case. Using that field, you have two options to bring the data into Hadoop.
Option 1
Use Sqoop incremental append; below is an example of it:
sqoop import \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm2 \
--username wzhou \
--password wzhou \
--table STUDENT \
--incremental append \
--check-column student_id \
-m 4 \
--split-by major
Arguments:
--check-column (col) #Specifies the column to be examined when determining which rows to import.
--incremental (mode) #Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value) #Specifies the maximum value of the check column from the previous import.
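For the incremental case in the question, a follow-up run could pass the last key that was already imported so that only the delta is fetched; the value 2000 below is hypothetical (a saved Sqoop job would normally track it for you):
# hypothetical follow-up run; --last-value 2000 is an invented example
sqoop import \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm2 \
--username wzhou \
--password wzhou \
--table STUDENT \
--incremental append \
--check-column student_id \
--last-value 2000 \
-m 4 \
--split-by major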
Option 2
Use the --query argument in Sqoop, where you can write native SQL for MySQL or whichever database you connect to.
Example :
sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
--split-by a.id --target-dir /user/foo/joinresults
sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
-m 1 --target-dir /user/foo/joinresults
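Tying this back to the question: a free-form query can carry both the delta boundary and any extra business filter, so only the required subset of the new rows is imported. The column names, the boundary value 2000, and the major filter below are purely illustrative:
# illustrative only: STUDENT, student_id > 2000 and major = 'CS' are invented for this sketch
sqoop import \
--connect jdbc:oracle:thin:@enkx3-scan:1521:dbm2 \
--username wzhou \
--password wzhou \
--query "SELECT * FROM STUDENT WHERE student_id > 2000 AND major = 'CS' AND \$CONDITIONS" \
--split-by student_id \
-m 4 \
--target-dir /user/foo/student_delta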

sqoop free form query to import n records from a table

I'm trying to import 50 records from a single table using the following query:
sqoop import --connect jdbc:mysql://xxxxxxx/db_name --username yyyyy --query 'select * from table where (id <50) AND $CONDITIONS' --target-dir /user/tmp/ -P
I'm getting an error with this query.
Any ideas?
I removed the parentheses in the WHERE clause and it worked. When using two or more logical operators, put them in parentheses; otherwise it doesn't work.
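For reference, the same command with the parentheses dropped from the WHERE clause, which is what reportedly worked:
# same command as above, only the parentheses around id < 50 removed
sqoop import --connect jdbc:mysql://xxxxxxx/db_name --username yyyyy --query 'select * from table where id < 50 AND $CONDITIONS' --target-dir /user/tmp/ -P
Note that when using --query with more than one mapper, Sqoop also requires --split-by; add a --split-by column or -m 1 if the import still complains.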

Sqoop - can I bulk import multiple mysql tables to one HBase/Hive table

If I have multiple similar tables, e.g:
table A: "users", columns: user_name, user_id, user_address, etc etc
table B: "customers" columns: customer_name, customer_id, customer_address, etc etc
table C: "employee" columns: employee_name, employee_id, employee_address, etc etc
Is it possible to use Sqoop to import the three tables into one HBase or Hive table, so that after the import I have one HBase table containing all the records from tables A, B, and C?
It's definitely possible if the tables are somehow related. A free-form query can be used in Sqoop to do exactly that. In this case, the free-form query would be a join. For example, when importing into Hive:
sqoop import --connect jdbc:mysql:///mydb --username hue --password hue --query "SELECT * FROM users JOIN customers ON users.id=customers.user_id JOIN employee ON users.id = employee.user_id WHERE \$CONDITIONS" --split-by users.id --target-dir "/tmp/hue" --hive-import --hive-table hive-table
Similarly, for Hbase:
sqoop import --connect jdbc:mysql:///mydb --username hue --password hue --query "SELECT * FROM users JOIN customers ON users.id=customers.user_id JOIN employee ON users.id = employee.user_id WHERE \$CONDITIONS" --split-by users.id --hbase-table hue --column-family c1
The key ingredient in all of this is the SQL statement being provided:
SELECT * FROM users JOIN customers ON users.id=customers.user_id JOIN employee ON users.id = employee.user_id WHERE $CONDITIONS
For more information on free-form queries, check out http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_free_form_query_imports.

Sqoop import fetching more records from source

I recently encountered an issue with SQOOP import.
When I mention the following :
--num-mappers 10 SQOOP import fetches 10x records
--num-mappers 15 SQOOP import fetches 15x records
--num-mappers 1 SQOOP import fetches exact records
for the same select query.
The select query contains a LEFT OUTER JOIN which, when run on the DB, returns x records, which is what I am trying to retrieve.
The query is :
SELECT table1.*,table2.* from table1 left outer join table2 on
(table1.tab1_id = table2.tab2_id);
As tab2_id is a PK for table2, I am using it for the --split-by clause.
But I am unable to understand why Sqoop returns a different number of records when a different number of mappers is specified.
The issue was that the SQL query submitted internally to the Sqoop job was not yielding the correct result set.
Reason:
The Sqoop command looked like:
sqoop import --connect <JDBC connection string> -m 10 --hive-drop-import-delims \
--fields-terminated-by '\001' --fetch-size=10000 --split-by <PK column> \
--query "SELECT table1.*,table2.* from table1 left outer join table2 on (table1.tab1_id = table2.tab2_id) AND \$CONDITIONS"
This $CONDITIONS token is internally replaced by the lower/upper boundary values for each split.
And since the query has no WHERE clause (the token is attached with AND instead of WHERE), the $CONDITIONS block has no effect on how the data is split across the mappers, so the entire result set is delegated to every mapper that is spawned.
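A sketch of the corrected command, keeping the placeholders from the answer above: attaching $CONDITIONS with a proper WHERE clause lets each mapper fetch only its own slice, so the import returns exactly x records regardless of --num-mappers.
# corrected sketch; <JDBC connection string> and <PK column> are placeholders as above
sqoop import --connect <JDBC connection string> -m 10 --hive-drop-import-delims \
--fields-terminated-by '\001' --fetch-size=10000 --split-by <PK column> \
--query "SELECT table1.*,table2.* from table1 left outer join table2 on (table1.tab1_id = table2.tab2_id) WHERE \$CONDITIONS"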
