Cassandra 2.0.7 node repair hangs - cassandra-2.0

We have a 5-node Cassandra cluster. The Cassandra version is 2.0.7. The OS is Oracle Enterprise Linux 6.5.
The Java environment is:
-bash-4.1$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Node repair hangs randomly. The output log shows:
-------------- Repairing... ------------------------------------------------
[2014-05-05 20:00:02,305] Starting repair command #7, repairing 728 ranges for keyspace ???
It just hangs without making any progress.
Any idea how to find the root cause of the problem?
Thanks!

I am experiencing the same problem with Cassandra 2.0.7. Usually it hangs after it sends Merkle tree requests to the replica nodes, and then fails to create its own snapshot to send that tree back to itself. The log message looks like this:
INFO [RepairJobTask:1] 2014-06-10 18:56:42,176 RepairJob.java (line 134) [repair #3c663fb1-f0ce-11e3-ac99-f9b8874f4c5e] requesting merkle trees for <CF_Name> (to [/10.0.4.101, /10.0.2.91, /10.0.3.91, /10.0.3.111, /10.0.4.111, /10.0.4.92, /10.0.2.101, /10.0.3.101])
The only way to push the repair forward is to restart Cassandra on one of the nodes from that list (not the repairing node itself). This throws a few errors, but at least the rest of the repair continues.
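One way to narrow down which replica never answered is to diff the peers listed in the "requesting merkle trees" line against the "Received merkle tree" responses in the repairing node's system.log. A minimal sketch, assuming Cassandra 2.0-style log messages (the sample log below is fabricated; point `LOG` at your real system.log and adjust the patterns to your log format):

```shell
# Sketch: find replicas that were asked for a Merkle tree but never answered.
LOG=sample-system.log
cat > "$LOG" <<'EOF'
INFO [RepairJobTask:1] ... requesting merkle trees for cf1 (to [/10.0.4.101, /10.0.2.91])
INFO [AntiEntropyStage:1] ... Received merkle tree for cf1 from /10.0.4.101
EOF
# Extract the peer IPs from the request line and from the response lines.
requested=$(grep 'requesting merkle trees' "$LOG" | grep -o '/[0-9.]*' | sort -u)
received=$(grep 'Received merkle tree' "$LOG" | grep -o '/[0-9.]*' | sort -u)
# Peers that were asked but never responded are what the repair is waiting on.
comm -23 <(echo "$requested") <(echo "$received")
```

On the sample log above, this prints `/10.0.2.91`: the replica that was sent a request but never returned a tree, i.e. the node you would restart first.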

Related

Designing ElasticSearch Migration from 6.8 to 7.16 along with App Deployment

I have a Spring Boot application that uses Elasticsearch 6.8, and I would like to migrate it to Elasticsearch 7.16 with the least downtime. I can do a rolling update, but the problem is that when I migrate my ES cluster from version 6 to 7, some features in my application fail because of breaking changes (for example, the total hits response change).
I also upgraded my Elasticsearch client to version 7 in a separate branch and I can deploy it as well, but that client doesn't work with ES version 6, so I cannot first release the application and then do the ES migration. I thought about doing the application deployment and the ES migration at the same time with a few hours of downtime, but in case something goes wrong, a rollback may take too much time (we have >10 TB of data in PROD).
I still couldn't find a good solution to this problem. I'm thinking of migrating only the ES data nodes to 7.16 and keeping the master nodes on 6.8, then doing the application deployment and migrating the Elasticsearch master nodes together with a small downtime. Has anyone tried doing this? Would running the data and master nodes of my Elasticsearch cluster on different versions (6.8 and 7.16) cause problems?
Any help / suggestion is much appreciated
The breaking change you mention can be alleviated by using the query string parameter rest_total_hits_as_int=true in your client code in order to keep getting total hit count as in version 6 (mentioned in the same link you shared).
Running master and data nodes on different versions is not supported, and I would not venture into it. If you have a staging environment where you can test this upgrade procedure, even better.
Since 6.8 clients are compatible with 7.16 clusters, you can add that small change to your 6.8 client code, and then you should be able to upgrade your cluster to 7.16.
When your ES server is upgraded, you can upgrade your application code to use the 7.16 client and you'll be good.
As usual with upgrades, since you cannot revert them once started, you should test this on a test environment first.
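For illustration, this is what the compatibility flag changes on the wire (host and index names are placeholders; the response shapes are the documented 7.x and 6.x forms of `hits.total`):

```shell
# ES 7 default response shape:
#   "hits": { "total": { "value": 1234, "relation": "eq" }, ... }
curl -s 'http://es-host:9200/my-index/_search'

# With the compat flag, ES 7 returns the 6.x shape your old client expects:
#   "hits": { "total": 1234, ... }
curl -s 'http://es-host:9200/my-index/_search?rest_total_hits_as_int=true'
```

Adding `rest_total_hits_as_int=true` to the search URLs in the 6.8 client code is the small change that lets the same application code run against both cluster versions during the migration window.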

DSX local - Models not working

I have installed a DSX 3 node cluster on RHEL 7.4, all notebooks and r-studio code work fine. However, model creation gives this error:
Load Data
Error: The provided kernel id was not found. Verify the input spark service credentials
All kubernetes pods seem to be up and running. Any ideas on how to fix this?
If you are on the Sept release, I suggest stopping the kernels and restarting. There was a limit of 10 kernels in that release. You will see the active green button across notebooks/models with the option to stop them.

Hortonworks Data Platform: High load causes node restart

I have set up a Hadoop cluster with Hortonworks Data Platform 2.5, using 1 master and 5 slave (worker) nodes.
Every few days one (or more) of my worker nodes gets a high load and seems to restart the whole CentOS operating system automatically. After the restart the Hadoop components don't run anymore and have to be restarted manually via the Ambari management UI.
Here is a screenshot of the "crashed" node (reboot after the high load value ~4 hours ago):
Here is a screenshot of one of the other "healthy" worker nodes (all other workers have similar values):
The node crashes alternate between the 5 worker nodes; the master node seems to run without problems.
What could cause this problem? Where are these high load values coming from?
This seems to be a kernel problem, as the log file (e.g. /var/spool/abrt/vmcore-127.0.0.1-2017-06-26-12:27:34/backtrace) says something like:
Version: 3.10.0-327.el7.x86_64
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a0
After running a sudo yum update I had the kernel version
[root@myhost ~]# uname -r
3.10.0-514.26.2.el7.x86_64
Since the operating system update, the problem hasn't occurred anymore. I will keep observing the issue and give feedback if necessary.
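To check whether one of these unexpected reboots matches the known NULL-pointer dereference, you can pull the kernel version and the BUG line out of the abrt backtrace and compare the version against the running kernel. A small sketch; the fabricated sample file stands in for `/var/spool/abrt/vmcore-*/backtrace`:

```shell
# Fabricated backtrace excerpt; on a real node use the abrt vmcore directory.
cat > backtrace.sample <<'EOF'
Version: 3.10.0-327.el7.x86_64
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a0
EOF
# Extract the crashing kernel version and the BUG line.
grep -E '^(Version|BUG):' backtrace.sample
# On a real node, compare against the currently running kernel:
#   uname -r
# If the backtrace version is older than the running one, the crash predates
# the kernel update that fixed the issue.
```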

Set up IBM Open Platform with an external Oracle Database

I'm a little confused when I try to install a single node IBM Open Platform cluster using an Oracle database as RDBMS.
Firstly, I understand that the Hadoop part of IBM BigInsights is not a modified version of the corresponding Apache version (as Hortonworks does), so when Ambari (from the IBM repo) offers to use an external Oracle database, I suppose it should work. I may be wrong, and I can't find any Oracle reference in the crappy IBM installation guide on how to set it up correctly (only that it should work with Oracle 11g R2).
So, as I do with an equivalent Hortonworks distribution (but using the binaries from IBM), I set up my ambari-server with all the Oracle parameters (--jdbc-db=oracle --jdbc-driver=path/to/ojdbc6.jar; I'm using Oracle 11g XE on CentOS 6.5, supposedly supported by IOP) and I specified everything needed to use Ambari with Oracle (service name, host, port, ...).
I created the ambari user, loaded the corresponding Oracle DDL (packaged with Ambari) and created my Hive & Oozie users, as specified in the... Hortonworks installation guide.
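The setup steps described above look roughly like this as commands (paths, SID and credentials are placeholders; the `--jdbc-db`/`--jdbc-driver` flags are the ones quoted in the question, and the DDL script is the one packaged with Ambari, typically under /var/lib/ambari-server/resources/):

```shell
# Point ambari-server at the external Oracle database and register the driver.
ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/ojdbc6.jar

# Load the Ambari schema as the ambari user created beforehand
# (password and connect string are placeholders for your environment).
sqlplus ambari/ambari_password@XE @/var/lib/ambari-server/resources/Ambari-DDL-Oracle-CREATE.sql
```

The Hive and Oozie users are created separately in Oracle; their schemas are populated by the services themselves on first start, which is why a failed deployment can leave half-initialized objects behind and force the drop-and-reload described below.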
Well, Ambari seems to work well with Oracle, I can set up my cluster until the last step :
If I configure Hive and/or Oozie to work with Oracle (validating that the Oracle connection is OK from the service configuration tab), the "review" step (step 8) doesn't show anything (or sometimes only the IOP repos; it seems to be arbitrary). Trying to deploy starts the task preparation and leaves the installation in a blocked state: I can't do anything other than drop the database and reload the entire DDL to try again (otherwise I get lots of unexpected NullPointerExceptions).
If I configure Hive AND Oozie to work with the embedded MySQL (the default choice), keeping Ambari against Oracle, everything works fine.
Am I doing something wrong? Or is there some limitation in configuring (IBM Open Platform) Hive and Oozie to use Oracle 11? (It works with the Hortonworks - same Apache version - and Cloudera distributions.)
Of course, log files don't tell me anything...
UPDATE:
I tried to install IOP 4.1, firstly using MySQL as my Ambari, Hive and Oozie database, everything was fine.
Next I tried to install IOP 4.1 with Oracle 11 XE as the external database: I configured Oracle, created the ambari, hive and oozie Oracle users, loaded the Ambari Oracle schema shipped with IOP 4.1, and configured the same cluster as the first time, specifying the Oracle particularities for Hive, Oozie (and Sqoop (Oracle driver)). Before deploying the services to all the nodes, Ambari is supposed to summarize what it is going to install, but it doesn't: sometimes it shows nothing, sometimes only the IOP repo URLs. Then, trying to deploy, it starts the preparation tasks but never finishes; no message, no log, nothing, it just gets stuck.
As the desired components of IOP 4.1 are the same versions as in HDP 2.3 (Ambari 2.1, Hive 1.2.1, Oozie 4.2.0, Hadoop 2.7.1, Pig 0.15.0, Sqoop 1.4.6 and ZooKeeper 3.4.6), I tried to configure exactly the same cluster with HDP 2.3, Oracle 11 XE, ... and everything worked. I noticed that HDP 2.3 forces me to use SSL, while IOP does not. HDP works with an Oracle JDK 1.8 by default, while IOP actually offers to use an OpenJDK 1.8 instead. I don't know if it matters; I'll try to be sure... I'll take pictures of the Ambari screen when it blocks and copy the log traces, even if there's no error message...
If anyone got an idea, please share it!
Thanks!
Running the same installation with the Oracle JDK 1.8, everything works fine.
I don't know if there is any restriction on using the Oracle JDBC driver with OpenJDK 1.8, but Oracle 11 XE with IOP 4.1 + Oracle JDK 1.8 works.

Where to install Java on multi-node hadoop cluster?

In a multi-node Hadoop cluster with multiple slave nodes, one master node, and one client node, where does Java need to be installed?
Also, does Hadoop need to be installed only on the client node? I got confused after going through sites that mention we first need to install Java but don't say on which nodes to install it.
Java is a prerequisite to run Hadoop. You need to install Java on all the machines, including the client.
As for the client configuration: there is no need to install Hadoop on the client machine; it is only used to communicate with the Hadoop cluster.
Check the links below for more details:
Hadoop Client Node Configuration
https://pravinchavan.wordpress.com/2013/06/18/submitting-hadoop-job-from-client-machine/
Java is a prerequisite to run Hadoop. It should be installed on all master and slave nodes.
You can refer the document for Hadoop MultiNode cluster setup for more details.
JDK should be installed on all the nodes as it is the primary requirement for Hadoop to work.
Make sure you install the same version of Java in all the nodes.
Oracle Java is preferred over OpenJDK.
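A quick way to act on the advice above is to loop over every node and print its JDK version, so mismatches stand out at a glance. A sketch with placeholder hostnames (assumes passwordless SSH, which a Hadoop cluster setup normally has in place already):

```shell
# Hostnames are placeholders for your master, slave, and client machines.
for host in master slave1 slave2 slave3 slave4 slave5 client; do
  echo "== $host =="
  # java -version prints to stderr, so redirect it; keep only the first line.
  ssh "$host" 'java -version' 2>&1 | head -1
done
```

Any node that prints a different version line (or "command not found") is the one to fix before starting the Hadoop daemons.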
