I developed a custom connector for Elasticsearch with Apache Calcite.
I could connect to the datasource with Dbeaver SQL Client.
Now I want to use it as data source in Tableau Software. How should I configure it in Tableau,
This is how it's configured in Dbeaver: the idea is to do the same thing in Tableau
Thanks
A few months ago, I logged a Calcite case ([CALCITE-4859]) for a Tableau connector and started work on it. It isn't yet complete, but there is a development branch and you're welcome to contribute to the effort.
Related
I would like to expose the data table from my oracle database and expose into apache kafka. is it technicaly possible?
As well i need to stream data change from my oracle table and notify it to Kafka.
do you know good documentation of this use case?
thanks
You need Kafka Connect JDBC source connector to load data from your Oracle database. There is an open source bundled connector from Confluent. It has been packaged and tested with the rest of the Confluent Platform, including the schema registry. Using this connector is as easy as writing a simple connector configuration and starting a standalone Kafka Connect process or making a REST request to a Kafka Connect cluster. Documentation for this connector can be found here
To move change data in real-time from Oracle transactional databases to Kafka you need to first use a Change Data Capture (CDC) proprietary tool which requires purchasing a commercial license such as Oracle’s Golden Gate, Attunity Replicate, Dbvisit Replicate or Striim. Then, you can leverage the Kafka Connect connectors that they all provide. They are all listed here
Debezium, an open source CDC tool from Redhat, is planning to work on a connector that is not relying on Oracle Golden Gate license. The related JIRA is here.
You can use Kafka Connect for data import/export to Kafka. Using Kafka Connect is quite simple, because there is no need to write code. You just need to configure your connector.
You would only need to write code, if no connector is available and you want to provide your own connector. There are already 50+ connectors available.
There is a connector ("Golden Gate") for Oracle from Confluent Inc: https://www.confluent.io/product/connectors/
At the surface this is technically feasible. However, understand that the question has implications on downstream applications.
So to comprehensively address the original question regarding technical feasibility, bear in mind the following:
Are ordering/commit semantics important? Particularly across tables.
Are continued table changes across instance crashes (Kafka/CDC components) important?
When the table definition changes - do you expect the application to continue working, or will resort to planned change control?
Will you want to move partial subsets of data?
What datatypes need to be supported? e.g. Nested table support etc.
Will you need to handle compressed logical changes - e.g. on update/delete operations? How will you address this on the consumer side?
You can consider also using OpenLogReplicator. This is a new open source tool which reads Oracle database redo logs and sends messages to Kafka. Since it is written in C++ it has a very low latency like around 10ms and yet a relatively high throughput ratio.
It is in an early stage of development but there is already a working version. You can try to make a POC and check yourself how it works.
i have a big doubt. In the same server i have running some loader of qlikview and a data lake in hadoop. My qlikview loader consult some "tables" of the data lake using an impala connector (the connector that you can find in the qlikmarkt), but we will have kerberos security in our data lake.
Anybody knows if i will need the special connector that cloudera provides for kerberos? I think that probably kerberos will not affect in local but i dont know, any idea?
Thanks for all
Ok, i have ask to some expert and they say that you need kerberos in all the parts of the server, then you must use the new cloudera connector.
We are trying do a proof of concept on Informatica Big Data edition (not the cloud version) and I have seen that we might be able to use HDFS, Hive as source and target. But my question is does Informatica connect to Cloudera Impala? If so, do we need to have any additional connector for that? I have done comprehensive research to check if this is supported but could not find anything. Did anyone already try this? If so, can you specify the steps and link to any documentation?
Informatica version: 9.6.1 (Hotfix 2)
You can use the odbc driver provided by cloudera.
http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html
For Irene, the you can use the same driver the above one is based the simba driver.
http://www.simba.com/drivers/hbase-odbc-jdbc/
I am using CDH 5.4.4 and installed Phoenix parcel to be able to run SQL on hbase tables. Has anyone tried to browse that data using Hue?
I know since we can connect using JDBC connection to Phoenix, there must be a way for Hue to connect to it too.
The current status is that we would need to add HUE-2745 and then it would show up in DBQuery / Notebook
The latest https://phoenix.apache.org/server.html is brand new and JDBC only.
If there was an HiveServer2 Thrift API or ODBC for Phoenix it would work almost out of the box in the SQL or DB Query apps. Hue could work with JDBC but there will be a JDBC connector that is GPL (so to install separately).
The Hue jira for integrating Phoenix is https://issues.cloudera.org/browse/HUE-2121.
According to this page: https://spark.apache.org/sql/ you can connect existing BI tools to Spark SQL via ODBC or JDBC:
I don't mean Shark as this is basically EOL:
It is for this reason that we are ending development in Shark as a separate project and moving all our development resources to Spark SQL, a new component in Spark.
How would a BI tool (like Tableau) connect to shark sql via ODBC?
With the release of Spark SQL 1.1 you also have thrift JDBC driver see https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
Simba provides the ODBC driver that Databricks uses, however that is only for the Databricks distribution. We are launching the public version for use with Apache tomorrow (Wed, Dec 3rd) at www.simba.com. You'll be able to download and trial the driver for use with Tableau then.
As Carlos said, Stratio Meta is a module that acts as a parser, validator, planner and coordinator layer over the different persistence layers (currently, only Cassandra and Mongo, but also HDFS in the short term). This modules offers a Shell with a SQL-like language, a Java/Scala API, a REST API and ODBC (JDBC shortly). It also uses another Stratio module, Stratio Deep, which allows us to use Apache Spark in order to execute query in an efficent and fast way.
Disclaimer: I am currently employed by Stratio Big Data
Please take a look at: http://www.openstratio.org/blog/connecting-to-the-stratio-big-data-platform-using-odbc-2/
Stratio is a platform that includes a certified Spark distribution that allows you to connect Spark to any type of data repository (like Cassandra, MongoDB,...). It has an ODBC Driver so you can write SQL queries that will be translated to Spark jobs, or even faster, direct queries to Cassandra -or whichever database you want to connect to it - if possible. This way, it is pretty simple to connect Tableau into Spark and your data repository. If you need any help, we will be more than glad to assist you.
Disclaimer: I'm one of Stratio's ODBC developers
Simba will offer one: http://databricks.com/blog/2014/04/30/Databricks-selects-Simba-ODBC-driver-for-shark.html. No known official release date.
[update]
Use HIVE's ODBC driver to connect to Spark SQL as described here and here.
For Spark on Azure HDInsight, you can connect Tableau (or PowerBI) as described here https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-use-bi-tools/. The ODBC driver is here: http://www.microsoft.com/en-us/download/details.aspx?id=47713