Run Azure Databricks Notebook from Apache NiFi - etl

I am new to Apache NiFi. Is there any way to run an Azure Databricks notebook from NiFi, or does this need to be done with a different tool?

You can run an Azure Databricks notebook from Apache NiFi.
To connect to Databricks data in Apache NiFi:
1) Download the CData JDBC Driver for Databricks installer, unzip the package, and run the JAR file to install the driver.
On Windows, the default location for the CData JDBC Driver is C:\Program Files\CData\CData JDBC Driver for Databricks.
2) Copy the CData JDBC Driver JAR file (and the license file if it exists), cdata.jdbc.databricks.jar (and cdata.jdbc.databricks.lic), to the Apache NiFi lib subfolder, for example C:\nifi-1.3.0-bin\nifi-1.3.0\lib.
3) Start Apache NiFi. For example:
cd C:\nifi-1.3.0-bin\nifi-1.3.0\bin
run-nifi.bat
4) Lastly, navigate to the Apache NiFi UI in your web browser, typically http://localhost:8080/nifi.
You can refer to this article (https://www.cdata.com/kb/tech/databricks-jdbc-apache-nifi.rst) for more information.
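Once the driver JAR is in place, the usual next step in NiFi is to create a DBCPConnectionPool controller service that processors such as ExecuteSQL can use. A minimal sketch of those settings follows; the server, HTTP path, and token values are placeholders, and the exact CData connection-property names should be checked against the driver's documentation:
Database Connection URL: jdbc:databricks:Server=adb-1234567890123456.7.azuredatabricks.net;HTTPPath=sql/protocolv1/o/1234567890123456/0123-456789-abcdef;Database=default;Token=dapiXXXXXXXX
Database Driver Class Name: cdata.jdbc.databricks.DatabricksDriver
Database Driver Location(s): C:\nifi-1.3.0-bin\nifi-1.3.0\lib\cdata.jdbc.databricks.jar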

Related

How to access a remote Hive and Hadoop file system using the sqoop import command with Kerberos authentication?

I am using Sqoop version 1.4.5 and Hadoop version 3.3.4. My requirement is to connect to a remote Hive and a remote Hadoop file system, with Kerberos, without changing the configuration files.
Is it possible to do this without amending the configuration files for Hadoop and Sqoop? If yes, then which parameters need to be changed in the configuration files?
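For reference, a Kerberized sqoop import generally needs a valid ticket first, and cluster properties can be passed on the command line. A rough sketch follows; the principal, hosts, database, and table are placeholders, and whether these -D overrides are enough to avoid editing the configuration files depends on your environment:
kinit etl_user@EXAMPLE.COM
sqoop import \
  -D fs.defaultFS=hdfs://remote-namenode.example.com:8020 \
  -D hadoop.security.authentication=kerberos \
  --connect jdbc:mysql://dbhost.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /user/etl_user/orders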

HttpFs for Apache Hadoop download

I am using Apache Hadoop 2.7.1 on the CentOS 7 operating system.
To set up HttpFs, this link suggests installing HttpFs, but I cannot find any binary available for it.
Is there an alternative method to configure HttpFs for Hadoop?
HttpFs is included in the binary tarball of Apache Hadoop itself. You need not download it separately.
The configuration files httpfs-env.sh and httpfs-site.xml are available under $HADOOP_HOME/etc/hadoop/ directory.
The startup script httpfs.sh is under $HADOOP_HOME/sbin/.
To configure the embedded Tomcat of HttpFs, look for the configuration files under $HADOOP_HOME/share/hadoop/httpfs/tomcat/conf/.
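For example, after adjusting those files you can start HttpFs with the bundled script and check its WebHDFS-compatible REST endpoint, which listens on port 14000 by default (the user name below is a placeholder):
$HADOOP_HOME/sbin/httpfs.sh start
curl "http://localhost:14000/webhdfs/v1/?op=LISTSTATUS&user.name=hdfs"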

How can I connect to remote Linux nodes on which Hadoop is installed using an Apache NiFi instance installed on my local Windows box?

I have installed Apache NiFi 1.1.1 on my local Windows system. How can I connect to the remote Linux nodes on which Hadoop is installed using this local NiFi instance?
Also, how can I perform data migration activities on those remote Hadoop nodes using this local instance of NiFi?
I have Kerberos enabled on the remote Hadoop cluster.
The "Unsupported major.minor version" is because Apache NiFi 1.x requires Java 8, and you tried to start it with a Java 7 JVM. You could install a Java 8 JDK just for NiFi to use, and leave all the Hadoop stuff using Java 7, and you can set NiFi's JAVA_HOME in bin/nifi-env.sh:
export JAVA_HOME=/path/to/jdk1.8.0/
If you are trying to connect NiFi on your local Windows system to remote Hadoop nodes, you will need the core-site.xml and hdfs-site.xml from your Hadoop cluster, and since you have Kerberos enabled you will also need the krb5.conf file from one of your Hadoop servers (/etc/krb5.conf).
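A minimal sketch of the NiFi-side configuration, assuming a processor such as PutHDFS and with all paths, the principal, and the keytab as placeholders: point nifi.properties at the copied krb5.conf, then reference the copied Hadoop config files in the processor.
In conf/nifi.properties:
nifi.kerberos.krb5.file=C:/nifi/conf/krb5.conf
In the PutHDFS processor:
Hadoop Configuration Resources: C:/nifi/conf/core-site.xml,C:/nifi/conf/hdfs-site.xml
Kerberos Principal: nifi@EXAMPLE.COM
Kerberos Keytab: C:/nifi/conf/nifi.keytab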

Connect to Apache Spark Hive remotely through a JDBC client like Squirrel SQL

I have a running Spark cluster with Hive installed. I am able to run SQL queries through org.apache.spark.sql.hive.HiveContext locally via Beeline. The Hive Thrift server is running.
But I want to know how to connect to this Hive metastore from a remote computer through JDBC without installing Hive all over again on that remote system.
Please suggest which exact driver is needed and a JDBC client application such as Squirrel SQL Client.
The following JARs will help (for the CDH distribution); a minimal connection sketch follows the list:
commons-configuration-1.6.jar
commons-logging-1.1.1.jar
commons-logging-1.1.3.jar
commons-logging-1.1.jar
commons-logging-1.2.jar
hadoop-common-2.6.0-cdh5.4.4-tests.jar
hadoop-common-2.6.0-cdh5.4.4.jar
hadoop-core-2.6.0-mr1-cdh5.4.4.jar
hive-exec-1.1.0-cdh5.4.4.jar
hive-jdbc-1.1.0-cdh5.4.4-standalone.jar
hive-jdbc-1.1.0-cdh5.4.4.jar
hive-service-1.1.0-cdh5.4.4.jar
libfb303-0.9.0.jar
libthrift-0.9.0.jar
log4j-1.2.16.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
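With those JARs on the remote client's classpath, the driver class to register in the JDBC client is org.apache.hive.jdbc.HiveDriver, and the connection URL points at the Spark Thrift server, which listens on port 10000 by default (host and database below are placeholders):
Driver class: org.apache.hive.jdbc.HiveDriver
JDBC URL: jdbc:hive2://spark-thrift-host:10000/default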

Sqoop + Cloudera Manager JDBC driver not found

I am trying to set up the JDBC driver for Sqoop in Cloudera Manager.
Here is some background on my setup:
1) I have a 5-machine Hadoop cluster running CDH 4.5 on Ubuntu
2) I installed Sqoop through Cloudera Manager
I have already downloaded the latest MySQL Connector/J JDBC JAR and copied it to the following locations:
sudo cp /home/clouderasudo/jbdcDriver/mysql-connector-java-5.1.29-bin.jar /usr/lib/sqoop/lib/
sudo cp /home/clouderasudo/jbdcDriver/mysql-connector-java-5.1.29-bin.jar /usr/lib/oozie/lib/
But I still get the error below when I try to set up a new job in Sqoop with com.mysql.jdbc.Driver as the JDBC driver class:
Can't load specified driver
Any help appreciated.
You may need to copy the database driver file into the directory that contains the Sqoop libraries; in my case it is /opt/cloudera/parcels/CDH/lib/sqoop/lib.
Savio
For me it was the Oozie sharelib path in HDFS: /user/oozie/share/lib/lib_20140909154837/sqoop
Then restart Oozie in Cloudera Manager.
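Putting the two answers together, a sketch of the commands for a parcel-based CDH install; the sharelib timestamp directory and file locations will differ on your cluster, and the HDFS copy may need to run as a user with write access to the Oozie sharelib:
sudo cp mysql-connector-java-5.1.29-bin.jar /opt/cloudera/parcels/CDH/lib/sqoop/lib/
hdfs dfs -put mysql-connector-java-5.1.29-bin.jar /user/oozie/share/lib/lib_20140909154837/sqoop/
Then restart Oozie from Cloudera Manager.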
