How to connect to Impala with Kedro?

How do I connect to Impala with Kedro? Is there any integration for it? I am using a Windows machine.
I tried Impyla and ibis, and neither works.

Your best bet is to use the Pandas or Spark JDBC DataSet types.
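For illustration, a minimal sketch of the Spark JDBC route, assuming a recent Kedro with kedro.extras.datasets installed and the Cloudera Impala JDBC jar on Spark's classpath; the host, port, table name, and driver class below are placeholders, not tested values:

from kedro.extras.datasets.spark import SparkJDBCDataSet

# Reads an Impala table into a Spark DataFrame via JDBC.
# URL, table, and driver class are illustrative placeholders.
impala_table = SparkJDBCDataSet(
    url="jdbc:impala://impala-host:21050/default",
    table="my_table",
    load_args={"driver": "com.cloudera.impala.jdbc41.Driver"},
)
df = impala_table.load()  # returns a Spark DataFrame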

Related

HBase - SAS Integration and Read

I have a Cloudera cluster (Kerberos enabled) with HBase running on it. I need to read/write a few HBase tables, with filter conditions, from an external SAS server.
I am trying to achieve this through Thrift and Python: I have installed Python on my SAS server and access HBase through Thrift. I have some limitations with filter conditions; SingleColumnValueFilter and other options are not supported by the Thrift Python client. We also tried HappyBase, but it has some limitations and is not stable enough to use.
Are there any other possibilities to do this?
Can anyone please help me out with this?
Thanks.
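For what it's worth, the Thrift gateway does accept raw HBase filter strings, and HappyBase passes them through on scan(); a minimal sketch, assuming a reachable (non-Kerberized or SASL-wrapped) Thrift server, with host, table, and column names as placeholders:

import happybase

# Connect to the HBase Thrift1 gateway; host is a placeholder.
connection = happybase.Connection("thrift-host")
table = connection.table("my_table")

# SingleColumnValueFilter expressed as a Thrift filter string, which is how
# the Thrift API supports server-side filters the Python client lacks objects for.
rows = table.scan(
    filter="SingleColumnValueFilter('cf', 'qual', =, 'binary:some_value')"
)
for key, data in rows:
    print(key, data)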

Connect tableau with Elastic search via Spark SQL

I found a post that discusses connecting Tableau to Elasticsearch via Hive SQL. I was wondering if there is a way to connect to Elasticsearch via Spark SQL instead, as I am not very familiar with Hive.
Thanks.
@busybug91,
The right driver is here; please try with this one. It could solve your issue.
@NicholasY I got it resolved after a couple of trials. Two steps that I took:
I wasn't using the right driver for the connection. I was using the DataStax Enterprise driver. However, they also have a driver for Spark SQL, and I used the Windows 64-bit version of that one. The MapR Hadoop Hive and Hortonworks Hadoop Hive drivers didn't work because I have Apache Hive.
Once I used the right driver (from DataStax), I realized that my Hive metastore and Spark Thrift server were running on the same port. I changed the Spark Thrift server's port to 10001 and a successful connection was established.
A new problem: I've created an external table in Hive and I am able to query the data. I start the Hive metastore as a service. However, as mentioned in this link, I am not able to see my Hive tables in Spark SQL. My connection from Tableau to Spark SQL is of no use unless I can see the tables from the Hive metastore! When I run show tables; in Spark SQL (via the spark-sql shell, with the Hive metastore running as a service at the same time), it runs a job that reports a completion time but no table names. Monitoring it via localhost:4040, I see that the input and output sizes are 0.0. I believe Spark SQL is not picking up the tables from the Hive metastore, which is why I don't see any tables after the connection is established from Tableau to Spark SQL.
EDIT
I changed the metastore from Derby to MySQL for both Hive and Spark SQL.
I'm trying to do the same, so maybe I can warn you about a few things.
First, compile a Spark SQL version with Hive and the Thrift server (Hive 0.13):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
You need a hive-site.xml properly configured to work with Hive, and you have to copy it to the spark/conf folder.
Then, you have to add the elasticsearch-hadoop jar path to $CLASSPATH.
Be careful! Spark SQL 1.2.0 does not work with elasticsearch-hadoop-2.0.x. You have to use elasticsearch-hadoop-2.1.0.Beta4 or a BUILD-SNAPSHOT, available here.
Finally, you have to run the Thrift server with something like this:
./start-thriftserver.sh --master spark://master:7077 \
  --driver-class-path $CLASSPATH \
  --jars /root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar \
  --hiveconf hive.server2.thrift.bind.host 0.0.0.0 \
  --hiveconf hive.server2.thrift.port 10000
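To sanity-check the Thrift server before pointing Tableau at it, a minimal sketch with the pyhive package (my choice, not part of the original setup); host and port match the command above:

from pyhive import hive

# Connect to the Spark Thrift server as a HiveServer2 endpoint.
conn = hive.connect(host="master", port=10000)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
for row in cursor.fetchall():
    print(row)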
It works for me, but only on a small docType (5000 rows); the data colocation doesn't seem to work. I'm looking for a solution to put the elasticsearch-hadoop jar on each Spark worker, as ryrobes did for Hadoop.
If you find a way to get co-located access to Elasticsearch, let me know ;)
HTH,

How do I import from a MySQL database to Datastax DSE Hive using sqoop?

I've spent the afternoon trying to wrap my head around how I can leverage dse sqoop to import a table from MySQL into Hive/Shark. In my case, I am not really interested in importing the table into Cassandra per se; Hive/Shark will do.
AFAIK, this should be possible, given that the dse sqoop import help gives me options to create a Hive table. I've been trying to execute something very similar to http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/ana/anaSqpImport.html except I can't seem to get the Cassandra username/password credentials to work.
Should this be possible? How? Do I have to go through a CQL table?
I am running DSE 4.5.
Sounds like you're trying to do something similar to slide 47 in this deck:
http://www.slideshare.net/planetcassandra/escape-from-hadoop
The strategy Russell uses there is the Spark MySQL driver, so there is no need to deal with Sqoop. You do have to add the dependency to your Spark classpath for it to work. There is no need to go through a CQL table.
Then you can join with c* data, write the data to c*, etc.
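A minimal sketch of that strategy in the pyspark shell (where sqlContext is predefined), assuming a Spark version with the generic JDBC reader and the mysql-connector-java jar on the classpath; the URL, table, and credentials are placeholders (the deck itself uses Scala):

# Read a MySQL table straight into a Spark DataFrame, skipping Sqoop.
df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://mysql-host:3306/mydb",
    dbtable="my_table",
    user="user",
    password="password",
    driver="com.mysql.jdbc.Driver",
).load()

# Register it so it can be joined with C* data in Spark SQL.
df.registerTempTable("my_table")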

Connect to Spark SQL via ODBC

According to this page: https://spark.apache.org/sql/ you can connect existing BI tools to Spark SQL via ODBC or JDBC:
I don't mean Shark as this is basically EOL:
It is for this reason that we are ending development in Shark as a separate project and moving all our development resources to Spark SQL, a new component in Spark.
How would a BI tool (like Tableau) connect to Spark SQL via ODBC?
With the release of Spark SQL 1.1 you also have the Thrift JDBC server; see https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
Simba provides the ODBC driver that Databricks uses; however, that is only for the Databricks distribution. We are launching the public version for use with Apache Spark tomorrow (Wed, Dec 3rd) at www.simba.com. You'll be able to download and trial the driver for use with Tableau then.
As Carlos said, Stratio Meta is a module that acts as a parser, validator, planner, and coordinator layer over different persistence layers (currently only Cassandra and MongoDB, but HDFS in the short term as well). This module offers a shell with a SQL-like language, a Java/Scala API, a REST API, and ODBC (JDBC shortly). It also uses another Stratio module, Stratio Deep, which lets us use Apache Spark to execute queries in an efficient and fast way.
Disclaimer: I am currently employed by Stratio Big Data
Please take a look at: http://www.openstratio.org/blog/connecting-to-the-stratio-big-data-platform-using-odbc-2/
Stratio is a platform that includes a certified Spark distribution that allows you to connect Spark to any type of data repository (like Cassandra, MongoDB, ...). It has an ODBC driver so you can write SQL queries that will be translated into Spark jobs, or, even faster, direct queries to Cassandra (or whichever database you want to connect to it) where possible. This way, it is pretty simple to connect Tableau to Spark and your data repository. If you need any help, we will be more than glad to assist you.
Disclaimer: I'm one of Stratio's ODBC developers
Simba will offer one: http://databricks.com/blog/2014/04/30/Databricks-selects-Simba-ODBC-driver-for-shark.html. No known official release date.
[update]
Use HIVE's ODBC driver to connect to Spark SQL as described here and here.
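Once a Hive/Spark ODBC driver is installed and a DSN is configured, the connection is easy to verify from Python before trying Tableau; a minimal sketch with the pyodbc package, where "SparkSQL" is a placeholder DSN name:

import pyodbc

# autocommit=True because Hive/Spark SQL do not support transactions.
conn = pyodbc.connect("DSN=SparkSQL", autocommit=True)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
for row in cursor.fetchall():
    print(row)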
For Spark on Azure HDInsight, you can connect Tableau (or Power BI) as described here: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-use-bi-tools/. The ODBC driver is here: http://www.microsoft.com/en-us/download/details.aspx?id=47713

Cloudera beeswax server and hive server

I have a fundamental question regarding the two servers mentioned in the context of the Cloudera CDH4 distribution.
Are those two interchangeable/replaceable, as in, could you run Beeswax in place of Hive Server?
I'm trying to use a Thrift client to connect, and in my setup only Beeswax is running, not Hive Server. In such a case, can I connect to the Beeswax server?
Hive Server is the default process and Beeswax is a newer process designed to better support concurrency and provide authentication using Kerberos. You should run one or the other.
And yes, you should definitely be able to connect to beeswax using Thrift. You can find clients for Beeswax and Hive server here.
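For reference, the classic Thrift client for Hive Server looks like the sketch below; it assumes the Python bindings shipped under $HIVE_HOME/lib/py are on the PYTHONPATH, and host/port are placeholders (Beeswax has its own Thrift IDL, so its generated client differs):

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from hive_service import ThriftHive

# Open a buffered Thrift connection to Hive Server (default port 10000).
transport = TTransport.TBufferedTransport(TSocket.TSocket("localhost", 10000))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)
transport.open()

client.execute("SHOW TABLES")
print(client.fetchAll())  # list of result rows as strings
transport.close()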
What is the difference between HiveServer2 and Beeswax, then? They are both designed to better support concurrency and security.
