Infinispan Server Topology - performance

I am currently analyzing a cluster environment with a Distributed cache.
I have 5 nodes, each one with an application that must cache a value. 2 nodes are in one datacenter and the other 3 in another. I was thinking of installing an ISPN instance on each node (where the application is hosted) to build the cluster.
Do you have any suggestions for me for further analysis?
Many thanks!

Related

Can we run two 'slave' nodes on the same machines?

We are running a 3-node Mesos cluster, and a Mesos master is running on each node. Also, 2 slaves are running on each node. Is this good practice? Won't 2 slaves on each node send too many offers and end up overloaded? What is the recommended configuration for a 3-node cluster?
Thread from Mesos User Mailing List
It depends on your isolation setting (mainly cgroup, or any node level
resources). In general, we don't recommend folks use multiple agents on a
node.
It's possible to make it work by setting cgroup_root separately for
MesosContainerizer. For DockerContainerizer, currently, we hard code
DOCKER_NAME_PREFIX, making it not possible to use two agents on a node
properly.
Running Docker containers won't work properly because restarting one agent
will cause Docker containers managed by the other agent to be deleted.

Elasticsearch in production with kubernetes

I am working on a product in which we use Elasticsearch for search. Our production setup is on K8S (1.7.7) and we are able to scale it pretty well. The only thing I am not sure about is whether we should host Elasticsearch in K8S (it can go on a dedicated host as well, using node label selectors) or whether it is advisable to host Elasticsearch on a VM rather than in Docker.
Our data set size is 2-3 GB and will grow, but this is the benchmark we can consider.
The Elasticsearch cluster I am planning to have is: 3 master (with 2 as eligible master), one client node, and one data node. We can scale the data and client nodes as the data grows.
Has anyone done this before? Thanks in advance.
IMO the best resource for Elasticsearch on Kubernetes is https://github.com/pires/kubernetes-elasticsearch-cluster
Note that while there are official Docker containers, no official solution for orchestration is being provided at the moment. This is currently covered by the community only.
3 master (with 2 as eligible master)
This doesn't make much sense. You'll want 3 master eligible nodes with the setting discovery.zen.minimum_master_nodes: 2 and one of the 3 nodes will be the actual master.
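The value 2 comes from the usual majority rule for master-eligible nodes, (master_eligible_nodes / 2) + 1 with integer division. A minimal sketch of that arithmetic follows; the function name is made up for illustration and is not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the majority quorum for a given number of master-eligible nodes.

    Hypothetical helper: it only reproduces the (n / 2) + 1 rule used to
    choose a value for discovery.zen.minimum_master_nodes.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master-eligible node(s) -> quorum of {minimum_master_nodes(n)}")
```

With 2 master-eligible nodes the quorum is also 2, so losing either node leaves the cluster without a master; that is why 3 master-eligible nodes is the smallest layout that survives a single failure.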

Calculating yarn.nodemanager.resource.cpu-vcores for a yarn cluster with multiple spark clients

If I have 3 spark applications all using the same yarn cluster, how should I set
yarn.nodemanager.resource.cpu-vcores
in each of the 3 yarn-site.xml?
(each Spark application is required to have its own yarn-site.xml on the classpath)
Does this value even matter in the clients' yarn-site.xml files?
If it does:
Let's say the cluster has 16 cores.
Should the value in each yarn-site.xml be 5 (for a total of 15, leaving 1 core for system processes)? Or should I set each one to 15?
(Note: Cloudera indicates one core should be left for system processes here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ however, they do not go into details of using multiple clients against the same cluster)
Assume Spark is running with yarn as the master, and running in cluster mode.
Are you talking about the server-side configuration for each YARN Node Manager? If so, it would typically be configured to be a little less than the number of CPU cores (or virtual cores if you have hyperthreading) on each node in the cluster. So if you have 4 nodes with 4 cores each, you could dedicate for example 3 per node to the YARN node manager and your cluster would have a total of 12 virtual CPUs.
Then you request the desired resources when submitting the Spark job (see http://spark.apache.org/docs/latest/submitting-applications.html for example) to the cluster and YARN will attempt to fulfill that request. If it can't be fulfilled, your Spark job (or application) will be queued up or there will eventually be a timeout.
You can configure different resource pools in YARN to guarantee a specific amount of memory/CPU resources to such a pool, but that's a little bit more advanced.
If you submit your Spark application in cluster mode, you have to consider that the Spark driver will run on a cluster node and not on your local machine (the one that submitted it). Therefore it will require at least 1 more virtual CPU.
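The arithmetic in this answer can be sketched as follows. This is only an illustration: the function names are hypothetical, one reserved core per node is taken from the example above, and one vcore per driver is an assumption (the answer only says "at least 1").

```python
def cluster_vcores(nodes: int, cores_per_node: int, reserved_per_node: int = 1) -> int:
    """Total vcores offered to YARN when every NodeManager advertises
    (cores_per_node - reserved_per_node) in yarn.nodemanager.resource.cpu-vcores."""
    per_node = cores_per_node - reserved_per_node
    if per_node < 1:
        raise ValueError("each node must offer at least one vcore")
    return nodes * per_node


def vcores_left_for_executors(total_vcores: int, applications: int, driver_vcores: int = 1) -> int:
    """Vcores remaining for executors once each cluster-mode application
    has taken driver_vcores for its driver (assumed to be 1 here)."""
    return total_vcores - applications * driver_vcores


if __name__ == "__main__":
    total = cluster_vcores(nodes=4, cores_per_node=4)
    print("vcores offered to YARN:", total)
    print("left for executors with 3 applications:", vcores_left_for_executors(total, 3))
```

The point of the sketch is that the capacity is a property of the NodeManagers on the server side; the applications then compete for that shared total through the resources they request at submit time.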
Hope that clarifies things a little for you.

how to install kafka in hadoop cluster

I want to install the latest release of Kafka on my Ubuntu Hadoop cluster, which contains 1 master node and 4 data nodes.
Here are my questions:
Should Kafka be installed on all the machines or only on the NameNode machine?
What about Zookeeper? Should it be installed on all the machines or only on the NameNode machine?
Please share the documentation needed to install Kafka and Zookeeper on a 5-node Hadoop cluster.
The architecture is strictly based on your requirements and on what you have: how powerful your machines are, how much data do they need to process, how many consumers do the Kafka instances need to feed, and so on. In theory you can have 1 kafka instance and 1 zookeeper, but it won't be fault-tolerant - if it fails, you lose data and so on.
You can find more information about zookeeper multi-cluster here.
What I would do first is to try to analyze:
how much data they need to process,
how much data they need to "ingest",
how powerful your machines are,
how many consumers you are going to need,
how reliable your machines are.
These are just a few factors to consider before starting to build up an infrastructure. If you want a rough estimate based on "just" 5 machines, assuming they are all equally powerful and have a good amount of memory (e.g., 32GB per machine), you need at least a couple of Kafka nodes and at least 3 machines for Zookeeper (2N + 1) so that if one fails, Zookeeper can handle the failure.
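The 2N + 1 rule says that an ensemble of 2N + 1 Zookeeper servers keeps a majority, and so keeps working, with up to N servers down. A small sketch of that relationship, using illustrative function names that are not part of Zookeeper itself:

```python
def ensemble_size(failures_to_tolerate: int) -> int:
    """Servers needed so that a majority survives N failures: 2N + 1."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return 2 * failures_to_tolerate + 1


def tolerated_failures(servers: int) -> int:
    """Failures a majority-quorum ensemble of this size can survive."""
    if servers < 1:
        raise ValueError("need at least one server")
    return (servers - 1) // 2


if __name__ == "__main__":
    for size in (1, 3, 4, 5):
        print(f"{size} server(s) tolerate {tolerated_failures(size)} failure(s)")
```

A 4-server ensemble tolerates no more failures than a 3-server one, which is why odd sizes are the usual choice; on 5 machines that leaves 3 for Zookeeper, as suggested above.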

tomcat 6 - Cluster / BackupManager

I have a question regarding clustering (session replication/failover) in Tomcat 6 using BackupManager. The reason I chose BackupManager is that it replicates the session to only one other server.
I am going to run through the example below to try and explain my question.
I have 6 nodes set up in a Tomcat 6 cluster with BackupManager. The front end is one Apache server using mod_jk with sticky sessions enabled.
Each node has 1 session each.
node1 has a session from client1
node2 has a session from client2
..
..
Now let's say node1 goes down; assuming node2 is the backup, node2 now has two sessions (for client2 and client1).
The next time client1 makes a request, what exactly happens?
Does Apache "know" that node1 is down and send the request directly to node2?
=OR=
Does it try each of the 6 instances and find out the hard way who the backup is?
Not too sure about the workings with BackupManager, my reading of this good URL suggests the replication is intelligent enough in identifying the backup.
In-memory session replication, is
session data replicated across all
Tomcat instances within the cluster,
Tomcat offers two solutions,
replication across all instances
within the cluster or replication to
only its backup server, this solution
offers a guaranteed session data
replication ...
SimpleTcpCluster uses Apache Tribes to maintain communicate with the communications group. Group membership is established and maintained by Apache Tribes, it handles server crashes and recovery. Apache Tribes also offer several levels of guaranteed message delivery between group members. This is achieved updating in-session memory to reflect any session data changes, the replication is done immediately between members ...
You can reduce the amount of data by
using the BackupManager (send only to
one node, the backup node)
You'll be able to see this from the logs if notifyListenersOnReplication="true" is set.
On the other hand, you could still use DeltaManager and split your cluster into 3 domains of 2 servers each.
Say these will be node 1 <-> node 2, 3 <-> 4 and 5 <-> 6.
In that case, configuring the domain worker attribute will ensure that session replication only happens within the domain.
And mod_jk then definitely knows which server to look on when node1 fails.
http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html states
Currently you can use the domain
worker attribute (mod_jk > 1.2.8) to
build cluster partitions with the
potential of having a more scaleable
cluster solution with the
DeltaManager(you'll need to configure
the domain interceptor for this).
And a better example on this link:
http://people.apache.org/~mturk/docs/article/ftwai.html
See the "Domain Clustering model" section.
