Use Sqoop as a different user? - hadoop

I have two users, "ashish" and "ashuser". Hadoop is installed under "ashuser",
but I mistakenly installed Sqoop under "ashish".
So whenever I try to check the Sqoop version as "ashuser", I get a "Command Not Found" error.
I tried giving "ashuser" ownership of the Sqoop folder.
Is there any way I can use Sqoop as "ashuser"?
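One way this is usually handled (a minimal sketch; the install path below is an assumption, since the actual location is not given in the question) is to point ashuser's environment at the existing installation, provided the directory permissions let ashuser read and execute it:
# assumed install location under the ashish account; adjust to the real path
export SQOOP_HOME=/home/ashish/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
sqoop version
Adding the two export lines to ashuser's ~/.bashrc makes the change permanent.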

Related

Sqoop command not found when running through Oozie

When I run the Sqoop script from the CLI, it runs fine without any issue. But when I run it through Oozie, it fails with "Sqoop command not found". It seems Sqoop is not installed on the other data nodes. So, to run a Sqoop script through Oozie, does Sqoop need to be installed on all data nodes, or is there an alternative? Currently we have one master and 2 data nodes.
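One commonly used alternative (a sketch, not verified against this particular cluster) is to let Oozie distribute the Sqoop jars itself through its ShareLib instead of installing Sqoop on every data node: check that the sqoop sharelib is registered, and have the workflow opt into the system libpath.
# host and port below are placeholders for your Oozie server
oozie admin -oozie http://<oozie_host>:11000/oozie -shareliblist sqoop
# and in the workflow's job.properties:
# oozie.use.system.libpath=true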

Can SQOOP work with a custom libpath?

I am trying to get some table data imported from PostgreSQL to HDFS using Sqoop. Now due to licensing constraints, Sqoop does not come packaged with JDBC drivers for all JDBC compliant databases. PostgreSQL is one of them. In order to interact with this database, Sqoop needs the relevant JDBC driver to be installed into a preset classpath (typically $SQOOP_HOME/lib).
In my case, the Hadoop administrator does not provide me write access to this predefined classpath. Is there any alternate way to instruct Sqoop client to look into some path (say, my home directory) instead of or in addition to the preset location?
I looked into the official Apache documentation and searched the internet, but could not find any answer. Could anyone please help?
Thanks!
I got this working yesterday. Below are the steps to follow.
Download the appropriate JDBC driver from here
Put the jar file in a directory of your choice. I chose
the Hadoop cluster user's home directory, i.e. /home/myuser
export HADOOP_CLASSPATH="/home/myuser/postgresql-9.4.1209.jar"
(replace /home/myuser/postgresql-9.4.1209.jar with your path and jar file name)
To perform a Sqoop import, you may use the command below.
sqoop import \
    --connect 'jdbc:postgresql://<postgres_server_url>:<postgres_port>/<db_name>' \
    --username <db_user_name> \
    --password <db_user_password> \
    --table <db_table_name> \
    --warehouse-dir <existing_empty_hdfs_directory>
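Putting the classpath export and the import together, with purely hypothetical connection values (the host, database, user, and table names below are placeholders, not from the original post):
export HADOOP_CLASSPATH=/home/myuser/postgresql-9.4.1209.jar
sqoop import \
    --connect 'jdbc:postgresql://pg.example.com:5432/salesdb' \
    --username report_user \
    -P \
    --table customers \
    --warehouse-dir /user/myuser/pg_import
Here -P prompts for the password interactively instead of putting it on the command line.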
To perform a Sqoop export, you may use the command below.
sqoop export \
    --connect 'jdbc:postgresql://<postgres_server_url>:<postgres_port>/<db_name>' \
    --username <db_user_name> \
    --password <db_user_password> \
    --table <db_table_name> \
    --export-dir <existing_hdfs_path_containing_export_data>
As per Sqoop docs,
-libjars <comma separated list of jars>: specify comma-separated jar files to include in the classpath.
Make sure you use -libjars as the first argument in the command.
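In other words (a sketch reusing the placeholder jar path from above), the generic -libjars argument goes right after the tool name, before the tool-specific options:
sqoop import -libjars /home/myuser/postgresql-9.4.1209.jar \
    --connect 'jdbc:postgresql://<postgres_server_url>:<postgres_port>/<db_name>' \
    --username <db_user_name> \
    --password <db_user_password> \
    --table <db_table_name> \
    --warehouse-dir <existing_empty_hdfs_directory>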
EDIT:
According to docs,
The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop’s internal argument-parsing system.
So, JDBC client jars need to be put at $SQOOP_HOME/lib.
I recently ran into an issue with this -libjars option; it doesn't work reliably. The issue probably propagates from the Hadoop jar command-line option handling. A workable alternative is to specify your extra jars using the HADOOP_CLASSPATH environment variable.
You have to export the path to your driver jar file.
export HADOOP_CLASSPATH=<path_to_driver_jar>.jar
After this, Sqoop correctly picks up the jar file you specified; the -libjars option does not pick it up correctly. I noticed this with Sqoop version 1.4.6.

Import data from Oracle(Windows) to HDFS (CDH3) machine using sqoop

Hi, I am taking a training in Hadoop. I have a task in which I have to import a table's data from Oracle (Windows, 11g XE) to HDFS using Sqoop. I am reading the following article. My question is: how exactly do I import data from Windows to HDFS? Normally I use WinSCP to transfer files from Windows to the HDFS machine. I have imported data from MySQL, which was installed on the HDFS (CDH3) machine, but I don't know how to import data from Oracle on Windows to HDFS. Please help.
Link that I am following
Following is the step-wise process:
1. Connect to the Oracle SQL command line and log in with your credentials:
e.g. username: system, password: system
(make sure that this user has all administrative privileges, or connect as sysdba in Oracle and make a new user with all privileges)
Create a user with all privileges in Oracle (see the sketch after these steps).
Create tables under that user, insert some values, and commit.
2. Now we need a connector for transferring our data from Oracle to HDFS.
So, we need to download the Oracle-Sqoop connector jar file (ojdbc6.jar) and place it in the following path on CDH3 (use sudo in your commands while copying to the following path, as it requires admin access in Linux):
/usr/lib/sqoop/bin
Download link (ojdbc6.jar): http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
Use WinSCP to transfer the downloaded jar from Windows to CDH3, then move it to the above-mentioned path on CDH3.
3. Command:
sudo bin/sqoop import --connect jdbc:oracle:thin:system/system@192.168.XX.XX:1521:xe --username system -P --table system.emp --columns "ID" --target-dir /sqoopoutput1 -m 1
/sqoopoutput1 is the output directory in HDFS where you will get your data; you can change this as per your needs.
-m 1 : this sets the number of mappers for this Sqoop job, here it is 1.
192.168.XX.XX:1521 is the IP address and listener port of your Windows machine running Oracle.
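As referenced in step 1, here is a minimal sketch of creating such a user in SQL*Plus (shown fed from a shell here-document; the user name, password, and granted roles are illustrative assumptions, not values from the original post, and on Windows you would simply type the same statements interactively):
sqlplus / as sysdba <<'SQL'
-- hypothetical demo user; pick your own name and password
CREATE USER sqoopdemo IDENTIFIED BY sqoopdemo;
GRANT CONNECT, RESOURCE, UNLIMITED TABLESPACE TO sqoopdemo;
SQL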
You don't need to export data from Oracle to your local machine, then copy it to the HDFS machine, then import it into HDFS.
Sqoop is there to import your RDBMS tables into an HDFS directory directly.
Use command:
sqoop import --connect 'jdbc:oracle:thin:@192.xx.xx.xx:1521:ORCL' --username testuser --password testpassword --table testtable --target-dir /tmp/testdata
Go to the machine on which Sqoop is running and open a terminal (I believe it's Linux). Just fire the above-mentioned command and check --target-dir (I used /tmp/testdata in the example command) in HDFS. You will find files corresponding to your Oracle table there.
Check sqoop docs for more details.
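To verify the result (using the same /tmp/testdata target directory from the example command above), you can list and peek at the imported files in HDFS:
hadoop fs -ls /tmp/testdata
hadoop fs -cat /tmp/testdata/part-m-00000 | head
The part-m-* files hold the imported rows as text; exact file names depend on the number of mappers.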

Datastax Enterprise Sqoop demo, got exceptions

I am trying to run the Sqoop demo from DataStax Enterprise 4.8. I set up an Analytics cluster of 4 nodes, then set up MySQL on another node and populated the data as in the demo example. I followed all the steps of the demo, and everything seemed to work fine up to the point where I actually run the Sqoop data migration command. All DBs are created correctly, and the cluster is running fine (I can see it with nodetool status and with OpsCenter), but when I run the sqoop command, I get an exception:
host# /bin/dse sqoop --options-file /usr/share/dse/demos/sqoop/import.options
/usr/share/dse/bin/dse.in.sh: line 4: /bin/dse-client-tool: No such file or directory
Unable to start sqoop: jobtracker not found
The import.options file:
cql-import
--table
npa_nxx
--cassandra-keyspace
npa_nxx
--cassandra-table
npa_nxx_data
--cassandra-column-mapping
npa:npa,nxx:nxx,latitude:lat,longitude:lon,state:state,city:city
--connect
jdbc:mysql://10.xxx.xxx.xxx/npa_nxx_demo
--username
root
--password
xxxxx
--cassandra-host
10.xxx.xxx.xxx,10.xxx.xxx.xxx
Does anyone have an idea why this error occurs? I reinstalled DSE and still get the same... Thanks.
I found the reason: you need to create a symlink to dse-client-tool in the /bin directory:
# ln -s /usr/share/dse/bin/dse-client-tool /bin/dse-client-tool
Then it works; I am not sure why the link was not created during the installation...
Start DSE as an analytics node.
Edit /etc/default/dse and set HADOOP_ENABLED=1, then start the DSE service; or, for a standalone installation, start it with:
bin/dse cassandra -t
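To confirm the node actually came up in Analytics mode (a quick check, assuming the DSE tools are on the PATH), dsetool shows each node's workload:
dsetool ring
# The Workload column should read Analytics rather than Cassandra once
# Hadoop/Analytics mode is enabled; if it still shows Cassandra, that would
# explain the "jobtracker not found" error above.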

Error in sqoop import query

Scenario:
I am trying to import data from MS SQL Server to HDFS, but I am getting the following errors:
Errors:
hadoop@ubuntu:~/sqoop-1.1.0$ bin/sqoop import --connect 'jdbc:sqlserver://localhost;username=abcd;password=12345;database=HadoopTest' --table PersonInfo
11/12/09 18:08:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
at com.cloudera.sqoop.shims.ShimLoader.loadShim(ShimLoader.java:190)
at com.cloudera.sqoop.shims.ShimLoader.getHadoopShim(ShimLoader.java:109)
at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:173)
at com.cloudera.sqoop.tool.ImportTool.init(ImportTool.java:81)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:411)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)
Question:
I have configured Sqoop successfully, so what could be the problem? I also tried connecting to the database by entering the IP address, but I get the same problem.
How can I resolve this error? Please suggest a solution.
Thanks.
Sqoop is now an incubator project in Apache. There is no reason Sqoop should only run with CDH and not Apache Hadoop.
The Sqoop documentation says Sqoop is compatible with Apache Hadoop 0.21 and Cloudera's Distribution of Hadoop version 3. So, I think using the correct version of Apache Hadoop will also solve the problem.
SQOOP-82 is more than a year old and there have been changes since then.
FYI, Sqoop was made part of the Hadoop 0.21 branch and was removed from Hadoop after moving to the Apache Incubator.
Please check this issue:
Sqoop does not run with Apache Hadoop 0.20.2. The only supported platform is CDH 3 beta 2. It requires features of MapReduce not available in the Apache 0.20.2 release of Hadoop. You should upgrade to CDH 3 beta 2 if you want to run Sqoop 1.0.0.
In your sqoop import command you are missing the driver value; specify it with --driver.
Maybe this will help.
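A sketch of what that could look like here (the driver class name and default port 1433 are standard for Microsoft's JDBC driver, which must be present in Sqoop's lib directory; the credentials and table are taken from the question):
sqoop import \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --connect 'jdbc:sqlserver://localhost:1433;database=HadoopTest' \
    --username abcd \
    --password 12345 \
    --table PersonInfo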
I think you should try this; it may solve your problem:
Add the port number of the database server. For the port number, check your my.cnf file (/etc/mysql/my.cnf) in the case of MySQL.
Try this command with the port number and schema:
sqoop import --connect jdbc:mysql://localhost:3306/mydb --username root --password password --table emp -m 1
