Elasticsearch replication cluster with different OS - elasticsearch

If I am running one ES on Windows server, another ES on Linux, and the third ES on Unix, can I cluster them and make them replicate each other? Is it possible?
Server A Windows 192.168.0.100
Server B Linux 192.168.0.101
Server C Unix 192.168.0.102

Well, it is possible as long as they can "see" each other on the network.
More important, though, is that the nodes have similar configurations in terms of memory and CPU; but again, it depends on the performance you are looking for.
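As a sketch, the cross-OS part needs nothing special: every node just has to share the same cluster.name and be able to reach the others. A minimal elasticsearch.yml, assuming the IPs from the question (the cluster name is illustrative, and the discovery setting shown is the pre-7.x form, which varies by ES version):

```yaml
# Same settings on all three nodes -- Windows, Linux, or Unix alike
cluster.name: es-cross-os        # illustrative name; must match on every node
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["192.168.0.100", "192.168.0.101", "192.168.0.102"]
```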

Related

NiFi Flow Files Stuck at Outbound Port - for remote connections

I maintain several high-capacity NiFi clusters that require regular maintenance. They are used a lot.
Recently I have hit an odd problem I cannot fix, and I need some help.
For this example I am using Cluster A sending files to Cluster B in separate network domains.
Cluster B pulls files from Cluster A outbound port C using a remote processor group.
Normally Flow files will arrive at C and hang there for only a few seconds at the most.
Recently however those same flow files are hanging at C for several hours.
What is causing the files to hang for such a long time? I recently upgraded the cluster VMs to something more powerful, with more RAM and CPU cores. Do I need to change something in nifi.properties? Any help is appreciated.
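Not an answer, but one thing worth ruling out after a VM migration: if the new VMs came up with different hostnames or interfaces, the site-to-site settings in nifi.properties on Cluster A may now advertise stale peer addresses to Cluster B's Remote Process Group, which can show up as flow files sitting at the outbound port. These are the standard site-to-site properties to check (the values shown are illustrative):

```properties
# nifi.properties on each Cluster A node
nifi.remote.input.host=nifi-a1.example.com   # must match the node's real, reachable hostname
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
```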

Will running H2O on a local desktop speed up calculation?

I have just started learning H2O, and I am confused about what happens if I run it at home just for learning purposes. When I simply run "h2o.init()" and then start data cleaning or modeling with H2O, will it speed up the calculation for big data? Does it automatically connect to some H2O cluster online? Where is the H2O cluster located?
When you run h2o.init() (i.e. with no arguments) it will start a "cluster" on that same machine. By default it will be given about a quarter of your machine's memory, and it can use either all threads or two threads (the latter if you are using R and installed it from CRAN). You will find Flow listening on http://127.0.0.1:54321/
If you already have an H2O cluster running on another machine (whether on your LAN or a distant cloud server), give the address to h2o.init() to have it connect to that instead of starting anything locally.
Run help(h2o.init) (on Python) or ?h2o.init (on R) to see all the available options.
NOTE: H2O is a client/server architecture, but the server (also called the "cluster", even if you only have one machine) is where all the action takes place, and where the data and models are kept; the client is relatively thin.

Responding to one of the comments: if you are comparing H2O running on localhost to a library like scikit-learn, there is not much difference in available compute power. The advantage of H2O is that you can easily and transparently add more machines over a LAN to increase available memory and (to some extent) compute power, and that it has clients in languages other than R. The disadvantages are mainly around having to remember that the server is where your data is kept; e.g. with large data sets, use the functions that load data directly into your server, because keeping a copy in the client just wastes memory.
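A minimal Python sketch of the two modes described above (the remote address and the file path are illustrative, not real endpoints):

```python
import h2o  # pip install h2o; the server side also needs Java available

# With no arguments: starts a single-node "cluster" on this machine
# (about a quarter of its memory by default) and connects to it.
h2o.init()

# With an address: connects to an already-running cluster instead of
# starting one locally (host and port here are illustrative).
# h2o.init(ip="192.168.1.50", port=54321)

# Load large data directly into the server rather than via the client:
# frame = h2o.import_file("hdfs://namenode/data/big.csv")  # illustrative path
```

Either way, the subsequent h2o calls go to whichever cluster h2o.init() connected to.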

Elasticsearch DNS Host Setup in a cluster

I have been asked to take care of an ES cluster with 14 machines.
For example:
The hostnames of my machines run from es1.abc.com through es14.abc.com.
These all are part of a cluster called "es-test" configured in elasticsearch.yml
Now suppose the DNS A record of this cluster is es.abc.com.
The current DNS resolves es.abc.com to 4 machines es1.abc.com to es4.abc.com in a round robin set up. So all the HTTP requests hit these 4 machines first.
The only difference I see is that these 4 machines have SSDs and the other 10 machines have hard disks.
I have no idea why the original engineer set it up like this.
In the ES documentation, I could not find any particular mention of DNS resolution guidelines.
I do not see a reason why I cannot set up DNS resolution to all 14 machines.
I am not an expert in ES, so if I have missed something obvious, please point me to the right documentation.
Your help would be truly appreciated.
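For intuition, round-robin DNS just cycles through whatever A records you publish; pointing es.abc.com at all 14 hosts would spread HTTP traffic the same way. A small sketch of that rotation (the hostnames are the ones from the question):

```python
from itertools import cycle

# All 14 nodes from the question; whether the 10 spinning-disk machines
# should also serve as HTTP entry points is the real design question here.
nodes = [f"es{i}.abc.com" for i in range(1, 15)]

# Round-robin selection -- which is all the DNS A-record setup is doing:
rr = cycle(nodes)
first_three = [next(rr) for _ in range(3)]
print(first_three)  # ['es1.abc.com', 'es2.abc.com', 'es3.abc.com']
```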

Running Hadoop in virtual environment

I would like to know whether I should expect problems when having Hadoop cluster on virtual instead of physical machines?
I'm mostly worried about using the same hard drive; I read that I should count on 1-2 containers per drive, but in my case only one drive will exist. Could that be a problem?
I think it depends on how much space you are allocating to containers. Of course, there will be a limit on the number of containers if your memory is restricted.
I can highlight a few points to consider when running a Hadoop cluster in a virtual environment:
Network configuration, in the case of a multi-node cluster
The obvious impact on application performance
The effect on scalability, since resources are limited if you plan to run the cluster on a host with low-spec hardware
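The memory limit mentioned above is usually enforced through the NodeManager's resource settings. A hedged yarn-site.xml sketch (the property names are standard YARN ones; the values are illustrative for a small single-disk VM):

```xml
<!-- yarn-site.xml (illustrative values for a small single-disk VM) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>6144</value>   <!-- total MB YARN may hand out on this node -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>4</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>   <!-- caps any single container, so ~3 fit -->
</property>
```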

The best memory configuration for ElasticSearch

I have one Linux server with 128 GB of memory and 32 CPU cores. I will run an ElasticSearch instance on this server, and the server is dedicated exclusively to running ES. How much memory should I configure for ES? How can I get the best performance out of ES? Is the server overkill for ES? Thanks!
I suggest you run two ES instances on the server. Since your Linux server is pretty powerful, if you set the ES heap to 60g or 80g it may run into GC problems. Try running two or three ES instances on the one server and monitor the CPU and memory usage; by the way, change the HTTP port of each ES instance so that multiple nodes can run on one server.
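One widely documented caveat behind the GC point above: keep each instance's heap at or below roughly 31 GB, so the JVM can still use compressed object pointers. A sketch of the per-instance JVM settings (the file name and mechanism vary by ES version; older versions use the ES_HEAP_SIZE environment variable instead of a jvm.options file):

```
# jvm.options for each ES instance -- min and max heap equal,
# and at or below ~31 GB to keep compressed oops:
-Xms30g
-Xmx30g
```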
