Can we use multiple Hive servers in a JDBC URL for failover? - hadoop

Is it possible to use multiple hive servers in the jdbc URL?
jdbc:hive2://ip1:10000,ip2:10000/;transportMode=http;
Basically, I want an active-passive setup: if the first server is not available, I want to use the second one. I don't want to go through a ZooKeeper setup, as load balancing is not required.
I am using Hive over a SOCKS proxy.

ZooKeeper is for failover too, not just load balancing, and that's how you get a highly available HiveServer connection:
To provide high availability or load balancing for HiveServer2, Hive provides a function called dynamic service discovery where multiple HiveServer2 instances can register themselves with Zookeeper
https://www.ibm.com/support/knowledgecenter/en/SSCRJT_5.0.1/com.ibm.swg.im.bigsql.admin.doc/doc/admin_HA_HiveS2.html
You should already be using ZooKeeper for a highly available NameNode.
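For illustration, a minimal sketch of what a discovery-based connection URL could look like on the client side. The ZooKeeper host names are hypothetical, and the namespace is assumed to match whatever `hive.server2.zookeeper.namespace` is set to on the servers (commonly `hiveserver2`). The helper class itself is just an illustration, not part of Hive.

```java
import java.util.List;

// Minimal sketch: builds a HiveServer2 JDBC URL that points at a ZooKeeper
// quorum (dynamic service discovery) instead of a fixed HiveServer2 host.
final class HiveDiscoveryUrl {

    // zkQuorum: "host:port" entries of the ZooKeeper ensemble (hypothetical hosts).
    // namespace: assumed to match hive.server2.zookeeper.namespace on the servers.
    static String build(List<String> zkQuorum, String namespace) {
        if (zkQuorum.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        return "jdbc:hive2://" + String.join(",", zkQuorum)
                + "/;serviceDiscoveryMode=zooKeeper"
                + ";zooKeeperNamespace=" + namespace;
    }
}
```

With this kind of URL the driver asks ZooKeeper for a registered HiveServer2 instance, so an instance that is down is simply not offered to the client.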

Related

NiFi - connect to another instance (S2S)

I'm trying to use the SiteToSiteProvenance Reporting Task.
The objective is to send provenance data between two dockerized instances of NiFi, one at port 8080 and another at port 9090.
I've created an input port creatively called "IN" on the destination NiFi and the service configuration on the source NiFi is:
However I'm getting the following error:
Unable to refresh Remote Group's peers due to Unable to communicate with remote NiFi cluster in order to determine which nodes exist in the remote cluster
I've also exposed port 10000 on the destination Docker container.
As mentioned in the comments, it appears there was a networking issue between the containers.
It was finally resolved by the asker by not using containers.

SSH access for the headnode of FIWARE-Cosmos

I am following this guide on Hadoop/FIWARE-Cosmos and I have a question about the Hive part.
I can access the old cluster’s (cosmos.lab.fiware.org) headnode through SSH, but I cannot do it for the new cluster. I tried both storage.cosmos.lab.fiware.org and computing.cosmos.lab.fiware.org and failed to connect.
My intention in trying to connect via SSH was to test Hive queries on our data through the Hive CLI. After failing to do so, I checked and was able to connect to the 10000 port of computing.cosmos.lab.fiware.org with telnet. I guess Hive is served through that port. Is this the only way we can use Hive in the new cluster?
The new pair of clusters does not have SSH access enabled. This is because users tended to install a lot of stuff (even things unrelated to Big Data) on the “old” cluster, which had SSH access enabled, as you mention. So the new pair of clusters is intended to be used only through the exposed APIs: WebHDFS for data I/O and Tidoop for MapReduce.
That said, a Hive server is running as well, and it should be exposing a remote service on port 10000, as you also mention. I say “it should be” because it runs an experimental authenticator module based on OAuth2, as WebHDFS and Tidoop do. Theoretically, connecting to that port from a Hive client is as easy as using your Cosmos username and a valid token (the same one you use for WebHDFS and/or Tidoop).
And what about a Hive remote client? Well, this is something your application should implement. Anyway, I have uploaded some implementation examples to the Cosmos repo. For instance:
https://github.com/telefonicaid/fiware-cosmos/tree/develop/resources/java/hiveserver2-client
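As a rough idea of what such a client boils down to, here is a minimal sketch using plain JDBC. It assumes the Hive JDBC driver is on the classpath, that the token is passed where JDBC expects the password, and that the database is called `default`; none of that is confirmed here, so check the linked examples for the real details. The class name is made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Minimal sketch of a remote HiveServer2 client connection over JDBC.
final class CosmosHiveClient {

    // Builds the JDBC URL for a HiveServer2 endpoint.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Opens a connection. Assumption: the OAuth2 token goes in the
    // password slot of the standard JDBC call.
    static Connection connect(String host, int port, String database,
                              String cosmosUser, String token) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, port, database), cosmosUser, token);
    }
}
```

A call would then look like `CosmosHiveClient.connect("computing.cosmos.lab.fiware.org", 10000, "default", user, token)`.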

What is the right MariaDB Galera jdbc URL properties for loadbalance

I have set up a 2-node MariaDB 10.0 Galera cluster running on the private IPs 192.168.2.51 and 192.168.2.52. I'm about to try connecting to the cluster using MariaDB's JDBC client (org.mariadb.jdbc.Driver) provided on MariaDB's website.
It worked with the regular url like: "jdbc:mariadb://192.168.2.51:3306,192.168.2.52:3306/dbname".
But what I am trying to achieve is what is possible with the MySQL JDBC driver, using a URL like: "jdbc:mysql://192.168.2.51,192.168.2.52/dbname?autoReconnect=true&autoReconnectForPools=true&failoverReadonly=false&roundRobinLoadBalance=true"
I have compared the properties stated in MariaDB (https://mariadb.com/kb/en/about-the-mariadb-java-client/) and MySQL (http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-configuration-properties.html). For the MariaDB JDBC Client, it doesn't seem to have properties that deal with loadbalance or autoReconnect.
So my question is:
Is there a right recommended way to connect (with loadbalance and failover capability) to MariaDB Galera through the MariaDB JDBC Driver or should I fall back to MySQL's ConnectorJ and how compatible is ConnectorJ with regards to MariaDB Galera cluster?
Thank you.
There is no load-balancing or failover capability in the MariaDB JDBC driver. Even the multiple-endpoint feature you used is undocumented and experimental. Connector/J load balancing should work fine, because to it, MariaDB Galera is just a set of regular MySQL instances.
You just use failover.
From what I observe, jdbc:mariadb:failover in the MariaDB connector is equivalent to jdbc:mysql:loadbalance in the MySQL connector,
and jdbc:mariadb:sequential in the MariaDB connector is equivalent to jdbc:mysql:failover in the MySQL connector.
This is a bit confusing.
In MariaDB, even though the mode is called failover, the read/write load is actually spread across all nodes. I prefer sequential, so that connections always go to one node, which is more reliable in some cases, such as with a Galera multi-master cluster.
https://mariadb.com/kb/en/library/about-mariadb-connector-j/
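To make the difference concrete, a small sketch that builds both URL forms for the two nodes from the question. The helper class is hypothetical; the mode names are the ones mentioned above, and whether a given mode is available depends on the connector version, so check the linked documentation.

```java
import java.util.List;

// Minimal sketch: builds MariaDB Connector/J URLs with a failover mode prefix.
final class MariaDbUrls {

    // mode: e.g. "sequential" or "failover"; an empty string gives the plain URL form.
    // hosts: "host:port" entries, tried or used according to the chosen mode.
    static String build(String mode, List<String> hosts, String database) {
        String prefix = mode.isEmpty() ? "jdbc:mariadb://" : "jdbc:mariadb:" + mode + "://";
        return prefix + String.join(",", hosts) + "/" + database;
    }
}
```

For the cluster in the question, `build("sequential", ...)` yields `jdbc:mariadb:sequential://192.168.2.51:3306,192.168.2.52:3306/dbname`.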

Load Balancing with Tomcat

This may sound a basic question...
But I am new to the concept of load balancing and have a few questions.
Scenario - I have 3 Tomcat 7 servers which I want to load balance.
I read a few articles and saw that one can do this using Apache HTTP Server.
There is a workers.properties file in which you define the servers you would like to load balance. The problem is that this has to be done before I start Apache HTTP Server.
Problem - What if I want to add a few more Tomcat servers dynamically, without restarting Apache HTTP Server? Is this possible?
Regards
Ajax
I spotted an interesting article about Tomcat and PaaS: http://www.devx.com/Java/Article/48086
You will probably find what you need in the article. It describes the mechanism to register / unregister a new node in the cluster.
HIH
Apache HTTPD 2.4 supports dynamic reconfiguration of the load balancer. But use the load-balancer proxy module for this, not mod_jk.

How to have a single IP for an Oracle Real Application Clusters (RAC) cluster on Windows Server 2008?

I have a multi-tier application and want to use RAC to improve the availability of the server.
Currently, the client side sends transaction data to the server side through a web service. On the client, we need to specify the URL (IP address) to send the data to.
As of now, there are 2 Oracle instances installed as a RAC, on these servers:
1. 133.38.52.101
2. 133.38.52.102
Both servers are connected to the same Oracle database (SAN storage).
Say the client side is pointing to .101 and the .101 machine suddenly goes down. How can I use .102 without changing the URL on the client side? Is there any configuration that can be done in RAC or Windows Server 2008 for this type of problem?
Use a load balancer between client machine and application server machines.
Use Oracle's transparent application failover functionality in OCI to achieve redundancy and load balancing between application server machines and RAC instances. DML transactions will be rolled back but selects will be transparently failed over.
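For the second point, a minimal sketch of a connect descriptor with TAF enabled, built for the two instances from the question. The port (1521) and the service name are assumptions, and the helper class is only illustrative. `TYPE=SELECT` corresponds to the behaviour described above, where selects are failed over and in-flight DML is rolled back.

```java
// Minimal sketch: builds an Oracle Net connect descriptor that lists both
// RAC nodes and enables Transparent Application Failover (TAF).
final class TafDescriptor {

    // hosts: RAC node addresses; port and serviceName are assumptions here.
    static String build(String[] hosts, int port, String serviceName) {
        StringBuilder sb = new StringBuilder(
                "(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON)");
        for (String host : hosts) {
            sb.append("(ADDRESS=(PROTOCOL=TCP)(HOST=").append(host)
              .append(")(PORT=").append(port).append("))");
        }
        // Close ADDRESS_LIST, then add the service and the TAF settings.
        sb.append(")(CONNECT_DATA=(SERVICE_NAME=").append(serviceName)
          .append(")(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))))");
        return sb.toString();
    }
}
```

Since TAF is an OCI feature, an application server using JDBC would typically pass this as `"jdbc:oracle:oci:@" + TafDescriptor.build(...)`, or the same descriptor could live in tnsnames.ora under an alias.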
