I do not understand what the jar-file and class-name are in this example code - hadoop

I'm trying to merge incremental data on HDFS using Sqoop. This is the sample code I found on Google: https://developer.ibm.com/hadoop/2017/02/28/typical-scenario-sqoop-incremental-import-merge/
I do not understand what the jar-file and class-name are there. Which jar file should I provide the path to, and what class name?
Can someone help me understand? Thank you.
sqoop merge --new-data /apps/hive/warehouse/student/part-m-00000 \
--onto /apps/hive/warehouse/student/part-m-00000_copy_1 \
--target-dir /tmp/sqoop_merge \
--jar-file /tmp/sqoop-ambari-qa/compile/9062c87c959e4090dcec5995a439b514/TIME.jar \
--class-name TIME \
--merge-key TIME

I used codegen to create the jar file, and afterwards I could see the class name as well. This is the command I used to create the jar file:
sqoop codegen \
--connect jdbc:sqlserver://localhost/<db> \
--username <username> --password <password> \
--table <tablename from database>
At the end of the execution you will get output like:
18/01/16 11:44:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-user1/compile/6430d9e2fe24cec8b2cb13f684806ca6/student.jar
After which, to check the class name, I did:
$ cd /tmp/sqoop-user1/compile/6430d9e2fe24cec8b2cb13f684806ca6/
:/tmp/sqoop-user1/compile/6430d9e2fe24cec8b2cb13f684806ca6$ ls
That will show you the class file, the jar, the source, etc.:
student.class student.jar student.java
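Putting the two together: --jar-file takes the path to the jar that codegen wrote, and --class-name is the generated class (named after the table). A sketch, assuming the codegen output above and a hypothetical merge-key column id:
sqoop merge \
--new-data /apps/hive/warehouse/student/part-m-00000 \
--onto /apps/hive/warehouse/student/part-m-00000_copy_1 \
--target-dir /tmp/sqoop_merge \
--jar-file /tmp/sqoop-user1/compile/6430d9e2fe24cec8b2cb13f684806ca6/student.jar \
--class-name student \
--merge-key id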
Thank you.

Related

How to include a jar file for Oozie

I am trying to run a Sqoop action in Oozie, but mysql-connector-java.jar is not present in /user/oozie/share/lib/sqoop, and because I have no permission I am not able to add the jar as of now.
Is there any way or workaround to include mysql-connector-java.jar in workflow.xml?
I have placed the jar in the sqoop app's lib directory, but it's not working.
In general, the Hadoop administrator should keep all the common libraries in the Hadoop distribution to make usage more efficient. If that isn't the case, give the following -libjars option a try:
sqoop import \
-libjars /file/location/path/mysql-connector-java.jar \
--connect jdbc:mysql://localhost:3306/retail_db \
--username root \
--password xyzpwd \
--table order_items \
--target-dir /user/cloudera/landing_zone/sqoop_import/order_items
As per the Sqoop documentation:
-libjars specifies comma-separated jar files to include in the classpath. The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop's internal argument-parsing system.
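If you cannot touch the shared lib at all, another workaround is to ship the driver with the workflow itself: Oozie adds any jars found in a lib/ directory next to workflow.xml to the action classpath, and job.properties can also point oozie.libpath at a directory you control. A sketch, with hypothetical HDFS paths:
# stage the driver next to the workflow
hdfs dfs -mkdir -p /user/myuser/myworkflow/lib
hdfs dfs -put mysql-connector-java.jar /user/myuser/myworkflow/lib/
# or, in job.properties, reference a custom lib directory:
# oozie.libpath=${nameNode}/user/myuser/oozie/lib
# oozie.use.system.libpath=true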

Does sqoop import/export create java classes? If it does so, what is the location of these classes?

Does sqoop import/export create Java classes? If so, where can I see these generated classes? What is the location of these class files?
Does sqoop import/export create java classes?
Yes
If so, where can I see these generated classes? What is the location of these class files?
It automatically generates a Java file named after the table in the current working directory on the local file system.
You can use --outdir to provide your own path.
Updated as per comment
You can use codegen command for this:
sqoop codegen \
--connect jdbc:mysql://localhost/databasename \
--username username \
--password password \
--table tablename
After the command executes successfully, there will be a path at the end of the output where you can see the generated Java files.
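If you want the generated artifacts somewhere predictable, here is a sketch using the standard codegen options --outdir and --bindir; the /tmp paths are just examples:
# --outdir: where the generated .java source goes
# --bindir: where the compiled .class and .jar go
sqoop codegen \
--connect jdbc:mysql://localhost/databasename \
--username username \
--password password \
--table tablename \
--outdir /tmp/sqoop/src \
--bindir /tmp/sqoop/bin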
This is the complete flow of a Sqoop command:
User ---> Sqoop CLI cmd ---> Sqoop code gen ---> Sqoop JAR writer ---> JAR submission ---> ResourceManager ---> MR operation (5 phases) ---> HDFS ---> Ack to Sqoop by MR program
Sqoop internally uses MapReduce v1 or v2 for its execution (getting data from the DB and storing it in HDFS as comma-delimited values). It first creates a .java source file for the map-reduce program, packages it in a jar, and then submits it.
The .java file is created in the current local directory with the name of the table.
sqoop import --connect jdbc:mysql://localhost/hadoop --table employee -m 1
In this case, an "employee.java" file is created.

Sqoop job fails with KiteSDK validation error for Oracle import

I am attempting to run a Sqoop job to load from an Oracle DB into Parquet format on a Hadoop cluster. The job is incremental.
Sqoop version is 1.4.6. Oracle version is 12c. Hadoop version is 2.6.0 (distro is Cloudera 5.5.1).
The Sqoop commands are (the first creates the job, the second executes it):
$ sqoop job -fs hdfs://<HADOOPNAMENODE>:8020 \
--create myJob \
-- import \
--connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
--username <USERNAME> \
-P \
--as-parquetfile \
--table <USERNAME>.<TABLENAME> \
--target-dir <HDFSPATH> \
--incremental append \
--check-column <TABLEPRIMARYKEY>
$ sqoop job --exec myJob
Error on execute:
16/02/05 11:25:30 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.ValidationException: Dataset name
05112528000000918_2088_<USERNAME>.<TABLENAME>
is not alphanumeric (plus '_')
at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:103)
at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:66)
at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
at org.kitesdk.data.Datasets.create(Datasets.java:239)
at org.kitesdk.data.Datasets.create(Datasets.java:307)
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:80)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:106)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:668)
at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Troubleshooting Steps:
0) HDFS is stable, other Sqoop jobs are functional, the Oracle source DB is up, and the connection has been tested.
1) I tried creating a synonym in Oracle so that I could simply pass the --table option as:
--table TABLENAME (without the username)
This gave me an error that the table name was not correct. It needs the full USERNAME.TABLENAME for the --table option.
Error:
16/02/05 12:04:46 ERROR tool.ImportTool: Imported Failed: There is no column found in the target table <TABLENAME>. Please ensure that your table name is correct.
2) I made sure that this is a Parquet issue: I removed the --as-parquetfile option and the job was successful.
3) I wondered if this is somehow caused by the incremental options. I removed the --incremental append & --check-column options and the job was successful. This confuses me.
4) I tried the job with MySQL and it was successful.
Has anyone run into something similar? Is there a way (or is it even advisable) to disable the Kite validation? It seems that the dataset is being created with dots ("."), which Kite SDK then complains about - but this is an assumption on my part, as I am not too familiar with Kite SDK.
Thanks in advance,
Jose
Resolved. There seems to be a known issue with JDBC connectivity to Oracle 12c. Using the OJDBC6 driver (instead of OJDBC7) did the trick. FYI: the OJDBC jar is installed in /usr/share/java/ and a symbolic link is created in /installpath.../lib/sqoop/lib/
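For reference, a sketch of that driver swap; the Sqoop lib path is a placeholder, since the actual install path was truncated above:
# hypothetical paths: adjust for your distribution's layout
sudo cp ojdbc6.jar /usr/share/java/
sudo ln -s /usr/share/java/ojdbc6.jar <installpath>/lib/sqoop/lib/ojdbc6.jar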
As reported by user @Remya Senan,
breaking the parameter
--hive-table my_hive_db_name.my_hive_table_name
into separate params
--hive-database my_hive_db_name
--hive-table my_hive_table_name
did the trick for me
My environment was
Sqoop v1.4.7
Hive 2.3.3
Tip: I was on emr-5.19.0
I also got this error when I was Sqoop-importing all tables as Parquet files on CDH 5.8. Looking at the error message, I felt this implementation does not support directories with "-" in their name. Based on this understanding, I removed "-" from the directory name, re-ran the sqoop import command, and all worked fine. Hope this helps!

hadoop sqoop error while importing data using options file

I am new to Hadoop, and while practicing Sqoop I got this error message. The command I used:
I created an import.txt file, and in it I put
import --connect jdbc:mysql://localhost/hadoopdb --username hadoop -P and placed this file on HDFS.
While importing, I gave this file to the Sqoop tool using the --options-file argument, so the final command I entered at the command prompt is as follows:
sqoop --options-file /user/cloudera/import.txt --table employee
After hitting the Enter key, I got the following error message:
sqoop --options-file /user/cloudera/import.txt --table employee
13/10/16 13:43:12 ERROR sqoop.Sqoop: Error while expanding arguments
java.lang.Exception: Unable to read options file: /user/cloudera/import.txt
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:102)
at com.cloudera.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:33)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:201)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
Caused by: java.io.FileNotFoundException: /user/cloudera/import.txt (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at java.io.FileReader.<init>(FileReader.java:55)
at org.apache.sqoop.util.OptionsFileUtil.expandArguments(OptionsFileUtil.java:70)
... 4 more
Unable to read options file: /user/cloudera/import.txt
Can anyone tell me why this error occurs?
Thanks in advance.
The --options-file path should be a local directory. Don't use an HDFS directory.
sqoop --options-file /home/cloudera/import.txt --table employee
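If your file currently lives on HDFS, a quick sketch of pulling it down first:
# copy the options file from HDFS to the local file system, then run the command above
hdfs dfs -get /user/cloudera/import.txt /home/cloudera/import.txt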
I got the same issue, and I solved it using the following approach.
In the options file you have to specify tools, commands, and their arguments line by line.
In your case, your options file "import.txt" should be created like this:
$cat > import.txt
import
--connect
jdbc:mysql://localhost/hadoopdb
--username
hadoop
-P
After you have created the options file, you can use it to import the table:
sqoop --options-file /user/cloudera/import.txt --table employee
Hope this works. The key is that you have to specify tools and arguments line by line.
For more on this, refer to the Sqoop User Guide by Apache.org.
Correct me if I am wrong.
If you are calling Sqoop from Oozie and you are facing the same issue (unable to read the options file):
You need to place the options file inside the workflow location and specify it in the sqoop action's files, and you also need to change the permission of that file with chmod 674 (when the workflow is running in Oozie, it runs as the sqoop user, so it is mandatory to change the permission).
This will resolve the error.
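For example, staging the options file with the workflow and relaxing its permissions might look like this (a sketch; the workflow path is hypothetical):
hdfs dfs -put import.txt /user/oozie/workflows/my_wf/import.txt
hdfs dfs -chmod 674 /user/oozie/workflows/my_wf/import.txt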
I put the options file in a local directory, and it worked.
Also, each argument and its value should be on different lines, like
--where
'sal > 5000'
and not like
--where 'sal > 5000'
[cloudera@quickstart sqoop]$ sqoop --options-file /home/cloudera/Desktop/SqoopOptions.txt --table departments --username root --password cloudera -m 1 --target-dir jan1301
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
No such sqoop tool: import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera. See 'sqoop help'.
I received the above error when I defined the SqoopOptions.txt file contents on a single line.
The issue was resolved when I defined each parameter and value on a different line, as below.
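For instance, reworking the single-line file from the error message above into one entry per line (a sketch; the values are the ones shown in that message):
import
--connect
jdbc:mysql://localhost/retail_db
--username
root
--password
cloudera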
If you are trying this on a single-node cluster, the options file can be placed on the local file system.
Your options file should look like this:
import
--connect
"jdbc:mysql://localhost:3306/sakila"
--username
root
-P
Each parameter should be on its own line.
Once you have saved the options file, use the command below:
sqoop --options-file "your optionfile location" --table abc
Hope this works, as this option is working perfectly for me.
Thanks,
Suresh.

Sqoop : import data from Oracle

I am trying to use Sqoop to import data from an Oracle DB.
I have placed the Oracle JDBC Driver (ojdbc6.jar) into SQOOP_HOME/lib.
My JDK is version 1.6.
Here is my command:
sqoop import --hive-import --connect jdbc:oracle:thin:@<ip_server>:1521/db --table ENTITE --username username --password password
But when I launch the command, I get this error:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.oracleDriver
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.oracleDriver
I don't understand why Sqoop can't connect to my DB server.
Thanks for your help.
If you're using Sqoop 1.4.2 (assuming so, based on ojdbc6.jar above), then see the comments about --driver usage from Kathleen here, as it shouldn't be required:
https://issues.apache.org/jira/browse/SQOOP-457
With Sqoop 1.4.2 and ojdbc6.jar dropped into my sqoop/lib, this string works with HDP 1.3 and MapR 2.0:
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=myhost)(port=1521))(connect_data=(service_name=myservice)))" \
--username USER --table SCHEMA.TABLE_NAME --hive-import --hive-table SCHEMA.TABLE_NAME \
--num-mappers 1 --verbose -P
If you have access to MySQL and/or SQL Server, etc., test those too and make sure your lib directory is getting picked up. SQL Server is (or was) supposed to be supported in Sqoop 1.4, but the docs and attempting to use it proved otherwise:
http://www.microsoft.com/en-us/download/confirmation.aspx?id=11774 - here is what you want for SQL Server testing.
cheers.
You need to add the Oracle JDBC driver to Sqoop's lib directory.
You have to download the Oracle connector jar file and copy it to the lib folder of Sqoop.
The jar file can be downloaded from http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
Copy this jar file to your Sqoop lib folder (/usr/lib/sqoop/lib) and run the sqoop command.
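A sketch of those steps, assuming the /usr/lib/sqoop/lib path above and the command from the question:
# copy the downloaded driver into Sqoop's lib folder
cp ojdbc6.jar /usr/lib/sqoop/lib/
# then re-run the import
sqoop import --hive-import --connect jdbc:oracle:thin:@<ip_server>:1521/db --table ENTITE --username username --password password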
Check your Sqoop classpath (for example, by adding an echo to the sqoop script) and make sure your driver is on the classpath. I faced the same problem and resolved it this way.
Look at the error message: Could not load db driver class: oracle.jdbc.oracleDriver
You need to type oracle.jdbc.OracleDriver with a capital "O", since Java is case-sensitive.
The error says that Sqoop can't load the Oracle driver class because there is no OJDBC driver jar file in its path. First, you have to add the OJDBC driver jar to the lib folder of your Sqoop home. You can download it here:
http://www.java2s.com/Code/Jar/o/Downloadojdbc6jar.htm
Oracle's ojdbc6.jar needs to be copied to the sqoop/lib directory to make it work.
You can state the Oracle driver you use like so (note the capitalization of the driver class):
sqoop import --hive-import --driver oracle.jdbc.driver.OracleDriver --connect jdbc:oracle:thin:@<ip_server>:1521/db --table ENTITE --username username --password password
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=hostip)(port=1521))(connect_data=(service_name=servicename)))" --username user --password pwd --table schema.tablename --hive-import --num-mappers 1 --verbose -P
