Stratio Sqoop complains of missing hadoop library

I have pulled the Stratio Sqoop docker containers as documented here:
https://stratio.atlassian.net/wiki/display/SQOOP0X2/Example+mysql+to+kafka
but when I start the process of creating a link between MySQL and Kafka, the step
"create link -c generic-jdbc-connector"
complains of a missing hadoop library.
Is there some other prerequisite for this?
Thanks

I had missed the step of connecting the sqoop client to the sqoop server and got confused by the error message.
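For anyone who hits the same message: the connector commands only work once the shell is pointed at the server. A minimal sketch, assuming the standard Apache Sqoop2 shell with its default port 12000 and webapp name:
set server --host <sqoop-server-host> --port 12000 --webapp sqoop
show version --all
If show version --all prints both the client and server versions, the client is connected and the subsequent create link -c generic-jdbc-connector step runs against the server.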

Related

Error when trying to execute kylin.sh start in HDP Sandbox 2.6

I installed Apache Kylin in the HDP 2.6 sandbox, following the official installation guide: http://kylin.apache.org/docs/install/index.html
When I run the script $KYLIN_HOME/bin/kylin.sh start, I get the error below:
What can I do to fix this error?
Thanks in advance
Check whether the Hive service is up in Ambari; when the Hive service is down, Kylin cannot find it and gives this error. Check your .bash_profile as well. Once those two issues are addressed, Kylin should be able to find the Hive dependencies.
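A quick way to confirm Hive itself is reachable before retrying Kylin; the JDBC URL and user below assume a default HDP sandbox HiveServer2 setup:
beeline -u jdbc:hive2://localhost:10000 -n hive -e "show databases;"
If this fails, fix the Hive service in Ambari first; Kylin will keep failing until it succeeds.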
Kylin uses the find-hive-dependency.sh script to set up the CLASSPATH. This script uses a Hive CLI command (I tested it with beeline) to query the Hive environment variables and extract the CLASSPATH from them.
beeline connects to Hive using the properties in kylin_hive_conf.xml, but for some reason (probably due to the Hive version included in HDP 2.6) some of the loaded Hive properties cannot be set when the connection is established.
The Hive properties that cause the issue can be dropped from the connection used to query the CLASSPATH, so, to fix this issue:
Edit $KYLIN_HOME/conf/kylin.properties and set kylin.source.hive.client=beeline
Open the find-hive-dependency.sh script, go to (approximately) line 34 and modify the line
hive_env=`${beeline_shell} ${hive_conf_properties} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'`
by removing ${hive_conf_properties} (see the sketch below).
Check that the Hive dependencies have been configured by running find-hive-dependency.sh.
Now $KYLIN_HOME/bin/kylin.sh start should work.
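For reference, a sketch of the two edits described above, assuming a standard $KYLIN_HOME layout:
# $KYLIN_HOME/conf/kylin.properties
kylin.source.hive.client=beeline
# $KYLIN_HOME/bin/find-hive-dependency.sh, around line 34, with ${hive_conf_properties} removed:
hive_env=`${beeline_shell} ${beeline_params} --outputformat=dsv -e "set;" 2>&1 | grep 'env:CLASSPATH'`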

Datastax Enterprise Sqoop demo, got exceptions

I tried to run the Sqoop demo from DataStax Enterprise 4.8. I set up an Analytics cluster of 4 nodes, then set up MySQL on another node and populated the data as in the demo example. I followed all the steps of the demo, and everything seemed to work fine until the point where I actually ran the Sqoop data-migration command. All databases were created correctly and the cluster is running fine (I can see it with nodetool status and with OpsCenter), but when I run the sqoop command I get an exception:
host# /bin/dse sqoop --options-file /usr/share/dse/demos/sqoop/import.options
/usr/share/dse/bin/dse.in.sh: line 4: /bin/dse-client-tool: No such file or directory
Unable to start sqoop: jobtracker not found
The import.options file:
cql-import
--table
npa_nxx
--cassandra-keyspace
npa_nxx
--cassandra-table
npa_nxx_data
--cassandra-column-mapping
npa:npa,nxx:nxx,latitude:lat,longitude:lon,state:state,city:city
--connect
jdbc:mysql://10.xxx.xxx.xxx/npa_nxx_demo
--username
root
--password
xxxxx
--cassandra-host
10.xxx.xxx.xxx,10.xxx.xxx.xxx
Does anyone have an idea why this error occurs? I reinstalled DSE and still got the same error... Thanks.
I found the reason: I needed to create a softlink to dse-client-tool in the /bin dir:
# ln -s /usr/share/dse/bin/dse-client-tool /bin/dse-client-tool
then it works. I'm not sure why the link was not created during the installation...
Start DSE as an Analytics node.
For a package installation, edit /etc/default/dse and set HADOOP_ENABLED=1, then restart the DSE service. For a tarball installation, start the node with:
bin/dse cassandra -t
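For the package-install case, the change would look roughly like this (the sed edit and the service name assume a default package layout):
sudo sed -i 's/^HADOOP_ENABLED=.*/HADOOP_ENABLED=1/' /etc/default/dse
sudo service dse restart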

Does HCatalog require installation before being used?

Can anyone please tell me whether HCatalog requires installation before it can be used, or whether it can be used just as a jar file?
I have Cloudera running on a VM, and I can use HCatalog for my MR jobs, Pig, and Hive with no problem. I thought the same MR code would work on another Hadoop platform, but obviously that's not the case: an exception is thrown on HCatInputFormat.setInput(). When I use pig -useHCatalog, I am told the usage is wrong, meaning it doesn't recognize -useHCatalog as a parameter.
I hadn't thought about this before, as I had been using HCatalog on Cloudera...
Yes, you need to install and start the HCatalog server. HCatalog comes with the latest Hive tar package.
Check the Apache Hive documentation for details.
Basically you need to:
Set up a MySQL database for HCatalog
Run the server install script
share/hcatalog/scripts/hcat_server_install.sh -r root -d dbroot -h hadoop_home -p portnum
Start the HCatalog server
export HIVE_HOME=hive_home
$HIVE_HOME/sbin/hcat_server.sh start
As pointed out, you do not need to install HCatalog separately if you are working with Hive 0.12 or later.
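With Hive 0.12+ the HCatalog jars and CLI ship under $HIVE_HOME/hcatalog, so a quick sanity check looks roughly like this (paths are illustrative for a typical install):
export HIVE_HOME=/usr/lib/hive
ls $HIVE_HOME/hcatalog/share/hcatalog/          # hive-hcatalog-core jar used by MR jobs
$HIVE_HOME/hcatalog/bin/hcat -e "show tables;"  # the hcat CLI should reach the metastore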

Hue Hive -- Beeswax Server Can't Find JDBC Driver for MySQL

We're using Cloudera 3.7.5 and having a tough time configuring the Beeswax server so that Hue can access the Hive databases. I followed all the instructions from the Cloudera documentation for setting up MySQL as Hive's metastore, but when I restart the Hue services and check the Beeswax server's stderr logs, I still see the painful "javax.jdo.JDOFatalInternalException: Error creating transactional connection factory", which is caused by
org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
This is bizarre to me, because the logs also indicate that the environment variable HIVE_HOME is set to "/usr/lib/hive", and sure enough I have copied "mysql-connector-java-5.1.15-bin.jar" into the /usr/lib/hive/lib directory, as the documents dictate.
I have also tried the instructions in the blog post http://hadoopchallenges.blogspot.com/2011/03/hue-120-upgrade-and-beeswax.html, which involved copying the mysql-connector jar into "/usr/share/hue/apps/beeswax/hive/lib/". Unfortunately I did not have a hive/lib subdirectory in the beeswax folder, so I attempted to create one. This also did not work.
Any advice on how I can get the MySQL JDBC library onto Beeswax's classpath?
We finally decided to just bite the bullet and upgrade to CDH4. Placing the JDBC jar in /usr/share/hive/lib allowed the Beeswax server to function without issue.
If anyone else is experiencing this issue, I recommend upgrading from CDH3 to CDH4; the UI is much cleaner and smoother, and we had far fewer installation and maintenance bugs with CDH4.
You have to copy your MySQL connector into HUE_HOME/apps/beeswax/hive/lib.
If this path doesn't exist, create hive/lib and then copy the MySQL connector into it. I hope this solves your problem.
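A sketch of that fix, assuming Hue lives under /usr/share/hue, the connector jar is the one already placed in /usr/lib/hive/lib, and Hue was installed from packages:
mkdir -p /usr/share/hue/apps/beeswax/hive/lib
cp /usr/lib/hive/lib/mysql-connector-java-5.1.15-bin.jar /usr/share/hue/apps/beeswax/hive/lib/
sudo service hue restart   # restart so Beeswax picks up the new classpath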
When you move to Cloudera 4.5, everything is moved into parcels, so this exact problem on my Hive metastore server was fixed by the command below. Essentially you're just re-adding the module. I'm sure you can modify the extra classpath in the Hive config file to make this independent of parcel updates.
cp /usr/lib/hive/lib/mysql-connector-java-5.1.17-bin.jar /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hive/lib/.
So a real fix might be something like this:
cp `locate mysql-connector | grep jar | head -n 1` /opt/cloudera/parcels/*/lib/hive/lib/.
which would copy the jar into every parcel.
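If more than one parcel directory matches that glob, cp only treats the last match as the destination, so a small loop is safer (a sketch; the source jar path is an assumption):
for d in /opt/cloudera/parcels/*/lib/hive/lib; do
  cp /usr/lib/hive/lib/mysql-connector-java-*.jar "$d/"
done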

Error in sqoop import query

Scenario:
I am trying to import data from MS SQL Server to HDFS, but I am getting the following errors:
Errors:
hadoop#ubuntu:~/sqoop-1.1.0$ bin/sqoop import --connect 'jdbc:sqlserver://localhost;username=abcd;password=12345;database=HadoopTest' --table PersonInfo
11/12/09 18:08:15 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.1
at com.cloudera.sqoop.shims.ShimLoader.loadShim(ShimLoader.java:190)
at com.cloudera.sqoop.shims.ShimLoader.getHadoopShim(ShimLoader.java:109)
at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:173)
at com.cloudera.sqoop.tool.ImportTool.init(ImportTool.java:81)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:411)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)
Question:
I have configured Sqoop successfully, so what could be the problem? I have also tried connecting to the database using its IP address, but I get the same error.
How can I fix these errors? Please suggest a solution.
Thanks.
Sqoop is now an incubator project in Apache. There is no reason Sqoop should only run with CDH and not Apache Hadoop.
The Sqoop documentation says Sqoop is compatible with Apache Hadoop 0.21 and Cloudera's Distribution of Hadoop version 3. So, I think using the correct version of Apache Hadoop will also solve the problem.
SQOOP-82 is more than a year old, and there have been changes since then.
FYI, Sqoop was made part of the Hadoop 0.21 branch and was removed from Hadoop after it moved to the Apache Incubator.
Please check this issue:
Sqoop does not run with Apache Hadoop 0.20.2. The only supported platform is CDH 3 beta 2. It requires features of MapReduce not available in the Apache 0.20.2 release of Hadoop. You should upgrade to CDH 3 beta 2 if you want to run Sqoop 1.0.0.
In your sqoop import command you are missing the driver value (passed with --driver).
Maybe this will help.
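For illustration, a sketch of the same import with an explicit driver; the SQL Server JDBC driver class and default port 1433 are assumptions here, and the corresponding driver jar must be in Sqoop's lib directory:
bin/sqoop import \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --connect 'jdbc:sqlserver://localhost:1433;username=abcd;password=12345;database=HadoopTest' \
  --table PersonInfo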
I think you should try this; it may solve your problem:
Add the port number of the database server to the connection string. For the port number, check your my.cnf file (/etc/mysql/my.cnf).
Try this command with the port number and schema:
sqoop import --connect jdbc:mysql://localhost:3306/mydb --username root --password password --table emp -m 1

Resources