Percona Xtradb Cluster nodes won't start - cluster-computing

I setup percona_xtradb_cluster-56 with three nodes in the cluster. To start the first cluster, i use the following command and it starts just fine:
#/etc/init.d/mysql bootstrap-pxc
The other two nodes however fail to start when i start them normally using the command:
#/etc/init.d/mysql start
The error i am getting is "The server quit without updating the PID file". The error log contains this message:
Error in my_thread_global_end(): 1 threads didn't exit 150605 22:10:29
mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended.
The cluster nodes are running all Ubuntu 14.04. When i use percona-xtradb-cluster5.5, the cluster ann all the nodes run just fine as expected. But i need to use version 5.6 because i am also using GTID which is only available in version 5.6 and not supported in earlier versions.
I was following these two percona documentation to setup the cluster:
https://www.percona.com/doc/percona-xtradb-cluster/5.6/installation.html#installation
https://www.percona.com/doc/percona-xtradb-cluster/5.6/howtos/ubuntu_howto.html
Any insight or suggestions on how to resolve this issue would be highly appreciated.

The problem is related to memory, as "The Georgia" writes. There should be at least 500MB for default setup and bootstrapping. See here http://sysadm.pp.ua/linux/px-cluster.html

Related

Windows/Drillbit Error: Could not find or load main class org.apache.drill.exec.server.Drillbit

I have set up a Hadoop single node cluster with pseudo distributed operations, and YARN running. I am able to use Spark JAVA API to run queries as a YARN-client. I wanted to go one step further and try Apache Drill on this "cluster". I installed Zookeeper that is running smoothly but I am not able to start drill and I get this log:
nohup: ignoring input
Error: Could not find or load main class
org.apache.drill.exec.server.Drillbit
Any idea?
I am on Windows 10 with JDK 1.8.
DRILL CLASSPATH is not initialized in the process of running drillbit on your machine.
For the purpose to start Drill on Windows machine it is necessary to run sqlline.bat script, for example:
C:\bin\sqlline sqlline.bat –u "jdbc:drill:zk=local;schema=dfs"
See more info: https://drill.apache.org/docs/starting-drill-on-windows/

H2O: unable to connect to h2o cluster through python

I have a 5 node hadoop cluster running HDP 2.3.0. I setup a H2O cluster on Yarn as described here.
On running following command
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following ouput
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage.On trying to connect to the web UI using the ip:54321, the browser tries to endlessly load the H2O admin page but nothing ever displays.
On forcefully terminating the init process I get the following error
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However if I try and use H2O with python without setting up a H2O cluster, everything runs fine.
I executed all commands as the root user. Root user has permissions to read and write from the /user/hdfs hdfs directory.
I'm not sure if this is a permissions error or that the port is not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading your H2O to the latest (H2O 3). There is a build specifically for HDP2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O3 is a little cleaner too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512mb per node is tiny - what is your use case? I would give the nodes some more memory.

redis on windows cluster setup

I have downloaded MSOpenTech Redis version 3.x which includes the long awaited clustering feature. My redis database is all working and I can start my cluster on the min 3 nodes required (in cluster mode). Does anyone know how to configure the cluster (it seems no one knows)?
Installing Linux and running the native Linux version is not an option for me sadly.
Any help would be greatly appreciated.
You can follow the Redis Cluster Tutorial and to create the cluster you can use the redis-trib.rb ruby script, for which you need to install Ruby for Windows.
For example:
> C:\Ruby22\Bin\ruby.exe redis-trib.rb create --replicas 1 192.168.1.1:7000 192.168.1.1:7001 192.168.1.1:7002 192.168.1.1:7003 192.168.1.1:7004 192.168.1.1:7005
Did not have the option to install Ruby on Windows but found the manual steps worked for me. The Ruby script seems to do a lot of checking stuff is setup correctly and is the preferred setup route. So Beware, here be dragons.
Set each node to run in Cluster mode. Edit the redis.windows-service.conf file and uncomment
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 15000
restart the service.
Run a powershell window and change to the Redis installed folder and start the redis-cli. e.g.
cd "C:\Program Files\Redis"
.\redis-cli.exe
Now you can join other nodes. Run CLUSTER MEET IPADDRESS PORT for each of the other nodes, than the instance you happen to be on. e.g.
CLUSTER MEET 10.10.0.2 6379
After a few seconds running
CLUSTER NODES
Should list all the nodes connected, but all will be set as MASTER.
On each of the other nodes, run CLUSTER REPLICATE MASTERNODEID. Where MASTERNODEID is the hash-looking value next the node declared "myself" on your master when running CLUSTER NODES. e.g.
CLUSTER REPLICATE b7c767ab3ab7c4a926ac2fed937cf140b96764a7
Now allocate slots to each Master. My setup has three instances, only one master.
for ($slot=0;$slot -le 16383;$slot++) {
.\redis-cli.exe -h REDMST CLUSTER ADDSLOTS $slot
}
Reconnect with redis-cli and try and save data. e.g.
SET foo bar
OK
GET foo
"bar"
Phew! Got most this from reading https://www.javacodegeeks.com/2015/09/redis-clustering.html#InstallingRedis which is not Windows specific.
for windows version:
open the command window then type below command
C:\ProgramFiles\redis>FOR /L %i IN (0,1,16383) DO ( redis-cli.exe -p **6380** CLUSTER ADDSLOTS %i )
6380 is port of master node.

Can't access Ganglia on EC2 Spark cluster

Launching using spark-ec2 script results in:
Setting up ganglia RSYNC'ing /etc/ganglia to slaves... <...>
Shutting down GANGLIA gmond: [FAILED]
Starting GANGLIA gmond: [ OK ]
Shutting down GANGLIA gmond: [FAILED]
Starting GANGLIA gmond: [ OK ]
Connection to <...> closed. <...> Stopping httpd:
[FAILED] Starting httpd: httpd: Syntax error on line 199 of
/etc/httpd/conf/httpd.conf: Cannot load modules/libphp-5.5.so into
server: /etc/httpd/modules/libphp-5.5.so: cannot open shared object
file: No such file or directory
[FAILED] [timing]
ganglia setup: 00h 00m 03s Connection to <...> closed.
Spark standalone cluster started at <...>:8080 Ganglia started at
<...>:5080/ganglia
Done!
However, when I netstat, there is no 5080 port listened on.
Is this related to the above error with httpd or it's something else?
EDIT:
So the issue is found (see the answer below), and the fix can be applied locally on the instance, after which Ganglia works fine. However the question is how to fix this issue in the root, so that spark-ec2 script can start Ganglia normally without intervention.
The fact that ganglia is not available is related to these errors - ganglia is php application and it won't run without php module for apache.
Which version of spark you are using to start cluster?
It is wierd error - these file should be present in AMI image.
Just traced the error: /etc/httpd/conf/httpd.conf is trying to load libphp-5.5 library while modules/ contains libphp-5.6 version...
Changing httpd.conf fixes the issue, however I'd be good to know a permanent fix within spark-ec2 script
This is because httpd fails to launch. As you have noted httpd.conf is trying to load modules and failing. You can reproduce the problem via apachectl start and examine exactly what modules are failing to load.
In my case there was one involving "auth" and "core". The last four (maybe five) listed will also fail to load. I did not encounter anything related to PHP so maybe our cases our different. Anyway the hacky solution is to comment out the problems. I did so and am running Ganglia without issue.

Spark Standalone Mode: Worker not starting properly in cloudera

I am new to the spark, After installing the spark using parcels available in the cloudera manager.
I have configured the files as shown in the below link from cloudera enterprise:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
After this setup, I have started all the nodes in the spark by running /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-all.sh. But I couldn't run the worker nodes as I got the specified error below.
[root#localhost sbin]# sh start-all.sh
org.apache.spark.deploy.master.Master running as process 32405. Stop it first.
root#localhost.localdomain's password:
localhost.localdomain: starting org.apache.spark.deploy.worker.Worker, logging to /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain: failed to launch org.apache.spark.deploy.worker.Worker:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
localhost.localdomain: at gnu.java.lang.MainThread.run(libgcj.so.10)
localhost.localdomain: full log in /var/log/spark/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost.localdomain:starting org.apac
When I run jps command, I got:
23367 Jps
28053 QuorumPeerMain
28218 SecondaryNameNode
32405 Master
28148 DataNode
7852 Main
28159 NameNode
I couldn't run the worker node properly. Actually I thought to install a standalone spark where the master and worker work on a single machine. In slaves file of spark directory, I given the address as "localhost.localdomin" which is my host name. I am not aware of this settings file. Please any one cloud help me out with this installation process. Actually I couldn't run the worker nodes. But I can start the master node.
Thanks & Regards,
bips
Please notice error info below:
localhost.localdomain: at java.lang.ClassLoader.loadClass(libgcj.so.10)
I met the same error when I installed and started Spark master/workers on CentOS 6.2 x86_64 after making sure that libgcj.x86_64 and libgcj.i686 had been installed on my server, finally I solved it. Below is my solution, wish it can help you.
It seem as if your JAVA_HOME environment parameter didn't set correctly.
Maybe, your JAVA_HOME links to system embedded java, e.g. java version "1.5.0".
Spark needs java version >= 1.6.0. If you are using java 1.5.0 to start Spark, you will see this error info.
Try to export JAVA_HOME="your java home path", then start Spark again.

Resources