JGroups configuration for internet - cluster-computing

I have successfully got JGroups working on the local network across different machines using TCP (I can't use multicast). I now need two nodes to communicate over the internet. Changing the addresses to the public ones doesn't work, so it seems additional configuration is required.
I've looked at http://www.jgroups.org/manual-3.x/html/protlist.html
and set external_addr but maybe there is more to set.
How do you set it up to communicate via public addresses?
Configuration:
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.0.xsd">
<TCP bind_port="7800"
recv_buf_size="${tcp.recv_buf_size:130k}"
send_buf_size="${tcp.send_buf_size:130k}"
max_bundle_size="64K"
sock_conn_timeout="300"
client_bind_addr="GLOBAL"
thread_pool.min_threads="0"
thread_pool.max_threads="20"
thread_pool.keep_alive_time="30000"/>
<TCPPING async_discovery="true"
initial_hosts="${jgroups.tcpping.initial_hosts:52.211.80.63[7801]}"
port_range="2"/>
<MERGE3 min_interval="10000"
max_interval="30000"/>
<FD_SOCK/>
<FD timeout="3000" max_tries="3" />
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK2 use_mcast_xmit="false"
discard_delivered_msgs="true"/>
<UNICAST3 />
<pbcast.STABLE desired_avg_gossip="50000"
max_bytes="4M"/>
<pbcast.GMS print_local_addr="true" join_timeout="2000"
view_bundling="true"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
<!--RSVP resend_interval="2000" timeout="10000"/-->
<pbcast.STATE_TRANSFER/>
</config>

No, you won't need external_addr unless you are behind a NAT. What you need to do is:
Set TCP.bind_addr (I suggest removing TCP.client_bind_addr), e.g. to one of the public IP addresses (50.x.x.x).
Make sure TCPPING.initial_hosts lists all (or a majority) of the members' addresses.
Your current configuration doesn't work because (1) bind_addr is undefined and (2) initial_hosts lists a member at port 7801, but TCP.bind_port is 7800.
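As a rough sketch of the two changes described above, the transport and discovery elements might look like the fragment below. The address 50.x.x.x is a placeholder for the node's own public address (it is not a real value), and the second entry in initial_hosts simply reuses the address from the question with the port changed to 7800. This is an untested illustration, not a verified configuration.

```xml
<!-- Fragment only: the remaining protocols stay as in the original stack. -->
<!-- 50.x.x.x is a placeholder for this node's own public address. -->
<!-- client_bind_addr has been dropped, as suggested in the answer. -->
<TCP bind_addr="50.x.x.x"
     bind_port="7800"
     recv_buf_size="${tcp.recv_buf_size:130k}"
     send_buf_size="${tcp.send_buf_size:130k}"
     max_bundle_size="64K"
     sock_conn_timeout="300"
     thread_pool.min_threads="0"
     thread_pool.max_threads="20"
     thread_pool.keep_alive_time="30000"/>

<!-- Every entry uses port 7800 so that it matches TCP.bind_port. -->
<!-- The host list is illustrative; list all (or a majority of) members. -->
<TCPPING async_discovery="true"
         initial_hosts="${jgroups.tcpping.initial_hosts:50.x.x.x[7800],52.211.80.63[7800]}"
         port_range="2"/>
```

If a node does sit behind a NAT, external_addr on the TCP element would additionally carry the public address while bind_addr stays on the private one, as the answer notes.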

Related

Azure Cloud Service (classic) web role with load balancer probe doesn't seem to work for HTTPS

I am trying to change an old classic cloud service to use an HTTP health probe instead of the default process-based check. We've had some issues where instances stopped working, but the load balancer still considered them usable because the process was running, even though they couldn't respond to requests.
Our site is HTTPS only; we didn't even have an HTTP endpoint defined. The load balancer probe doesn't support HTTPS, so I had to add an HTTP endpoint and configure the probe to use that instead. That's gross, but it seems to work. However, the HTTPS site doesn't seem to be covered by the health checks for the HTTP endpoint.
If I query my health probe path via HTTP, I can see that it's returning a 503, and if all instances return a 503 I can see that endpoint fail to load until I make one return 200 again. Once I make one return a 200 again, it works. My requests get routed to the appropriate node, if possible.
However, the HTTPS requests always seem to go to the same instance, regardless of the probes. Flipping that instance from 200 to 503 doesn't cause those requests to go to the other instance, like it does with HTTP. It isn't balanced at all.
It's really hard to find useful documentation or examples on how this should be set up or if it can work at all. Below is my csdef file. Is it possible to get this working (either an HTTPS check or the HTTP check affecting the HTTPS endpoint)?
<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="..." xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" schemaVersion="2015-04.2.6">
<LoadBalancerProbes>
<LoadBalancerProbe name="health" protocol="http" port="80" path="health" intervalInSeconds="5" timeoutInSeconds="11"></LoadBalancerProbe>
</LoadBalancerProbes>
<WebRole name="..." vmsize="Standard_D2_v3">
<Sites>
<Site name="Web">
<Bindings>
<Binding name="Endpoint1" endpointName="Endpoint1" />
<Binding name="LoadBalancerEndpointBinding" endpointName="LoadBalancerEndpoint" />
</Bindings>
</Site>
</Sites>
<Endpoints>
<InputEndpoint name="Endpoint1" protocol="https" port="443" certificate="..." />
<InputEndpoint name="LoadBalancerEndpoint" protocol="http" port="80" loadBalancerProbe="health" />
</Endpoints>
<ConfigurationSettings>
<!-- ... -->
</ConfigurationSettings>
<Certificates>
<!-- ... -->
</Certificates>
</WebRole>
</ServiceDefinition>
The answer in my case was to add the loadBalancerProbe attribute to both endpoint elements. I also had to make sure that whatever I used to test the service opened a new connection for every request; otherwise the requests would always go to the same instance (e.g. HttpClient pools connections internally per instance). This is essentially the new working configuration file:
<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="..." xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition" schemaVersion="2015-04.2.6">
<LoadBalancerProbes>
<LoadBalancerProbe name="health" protocol="http" port="80" path="health" intervalInSeconds="5" timeoutInSeconds="11"></LoadBalancerProbe>
</LoadBalancerProbes>
<WebRole name="..." vmsize="Standard_D2_v3">
<Sites>
<Site name="Web">
<Bindings>
<Binding name="Endpoint1" endpointName="Endpoint1" />
<Binding name="LoadBalancerEndpointBinding" endpointName="LoadBalancerEndpoint" />
</Bindings>
</Site>
</Sites>
<Endpoints>
<InputEndpoint name="Endpoint1" protocol="https" port="443" certificate="..." loadBalancerProbe="health" />
<InputEndpoint name="LoadBalancerEndpoint" protocol="http" port="80" loadBalancerProbe="health" />
</Endpoints>
<ConfigurationSettings>
<!-- ... -->
</ConfigurationSettings>
<Certificates>
<!-- ... -->
</Certificates>
</WebRole>
</ServiceDefinition>

How do you join a Liberty collective on Linux?

I am trying to set up a Liberty collective using Docker hosts running Linux. The videos currently available about setting up Liberty collectives use Windows, with everything on the same machine.
Joining the collective so that the member appears in adminCenter isn't too hard; it is just a matter of collective join --host=...
The problem is the administration part: changing the configuration file or stopping and starting the servers does not work.
I tried various ways of passing in hostInfo in server.xml, using --sshPrivateKey, and hard-coding root passwords, and none of them work.
According to the instructions, all you need is an openssh-server, which I have already enabled and is running. I have also exposed the ports and verified that I can connect to them using a certificate from the controller container.
In addition, based on the REST API, it uses a stringified SSH private key rather than a file, and that should be sent through collective registerHost. However, it does not appear to work, and nothing in the command-line logs shows what the hostAuthInfo is, even with .level=ALL and ...consolelogger...=ALL.
One of the commands I ran for collective join is:
collective join defaultServer \
--host=controller \
--port=9443 \
--user=adminUser \
--password=adminPassword \
--autoAcceptCertificates \
--rpcUser=root \
--sshPrivateKey=$HOME/.ssh/id_rsa \
--keystorePassword=$PASSWORD \
--createConfigFile=/config/collective-join-include.xml
I say "one of" because I tried various combinations where I removed or changed --rpcUser, --sshPrivateKey and other authInfo-related items.
The server.xml of the member at this point is:
<?xml version="1.0" encoding="UTF-8"?>
<server description="Application Server">
<featureManager>
<feature>javaee-7.0</feature>
<feature>clusterMember-1.0</feature>
<!--<feature>scalingMember-1.0</feature>-->
</featureManager>
<remoteFileAccess>
<writeDir>${server.config.dir}</writeDir>
</remoteFileAccess>
<httpEndpoint id="defaultHttpEndpoint" httpPort="9080" httpsPort="9443" host="*"/>
<!--<hostSingleton name="ScalingMemberSingletonService" port="5164" />-->
<applicationManager autoExpand="true"/>
<!--<hostAuthInfo rpcUser="root" sshPublicKeyPath="/root/.ssh/id_rsa.pub" sshPrivateKeyPath="/root/.ssh/id_rsa"/>-->
<include location="${server.config.dir}/collective-join-include.xml"/>
<dataSource id="myds" jndiName="jdbc/sample" type="javax.sql.XADataSource">
<jdbcDriver javax.sql.ConnectionPoolDataSource="org.mariadb.jdbc.MariaDbDataSource" javax.sql.DataSource="org.mariadb.jdbc.MariaDbDataSource" javax.sql.XADataSource="org.mariadb.jdbc.MariaDbDataSource">
<library>
<file name="${server.config.dir}/mariadb-java-client-1.5.9.jar"/>
</library>
</jdbcDriver>
<properties databaseName="jeesample" password="password" serverName="database" user="jeeuser"/>
</dataSource>
<basicRegistry id="basic" realm="BasicRealm">
<user name="websphere" password="{xor}KDo9LC83Oi06"/>
</basicRegistry>
<ejbContainer>
<timerService>
<persistentExecutor taskStoreRef="mystore"/>
</timerService>
</ejbContainer>
<databaseStore dataSourceRef="myds" id="mystore"/>
</server>
Controller side
<?xml version="1.0" encoding="UTF-8"?>
<server description="Collective Controller">
<variable name="defaultHostName" value="controller"/>
<httpEndpoint id="defaultHttpEndpoint" host="*" httpPort="9080" httpsPort="9443"/>
<featureManager>
<!--<feature>scalingController-1.0</feature>-->
<feature>adminCenter-1.0</feature>
<feature>dynamicRouting-1.0</feature>
</featureManager>
<remoteFileAccess>
<writeDir>${server.config.dir}</writeDir>
</remoteFileAccess>
<!--<scalingDefinitions>
<defaultScalingPolicy enabled="true" min="2" max="2"/>
</scalingDefinitions>-->
<include location="${server.config.dir}/resources/collective/collective-create-include.xml"/>
<collectiveController user="adminUser" password="adminPassword"/>
</server>
By default, when ssh is properly configured and running on the Linux machines (the controller's and the member's host machines), you only need to run the 'collective join' command from the member's wlp/bin dir. You should not need to specify hostInfo or --sshPrivateKey, either via server.xml or via the collective updateHost/registerHost commands. This flow will use the ssh keys generated by the collective.
The useHostCredentials flag is generally meant to be used with rpcUser and rpcUserPassword (provided via registerHost, updateHost, or server.xml) instead of ssh, which is especially useful for systems that do not have ssh configured (like Windows by default). However, it can also be used to specify custom ssh keys.
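For the server.xml route mentioned above, a minimal sketch of supplying host credentials on the member might look like the following. The user name and password are placeholders, not values from the question. Whether this alone is enough in a Docker setup is an assumption that has not been verified here.

```xml
<!-- Sketch only: placed in the member's server.xml. -->
<!-- rpcUser and rpcUserPassword are placeholders for an OS account -->
<!-- that the controller can use to run remote operations on this host. -->
<hostAuthInfo rpcUser="admin_user_id"
              rpcUserPassword="admin_user_password"/>
```

With the default ssh-based flow described in the first paragraph, this element would be left out entirely and the keys generated by collective join are used instead.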
If you're still having trouble, provide the collective join command that was run from the member's wlp/bin, as well as the server.xml of the controller and the member.

Tomcat session replication through mod_jk, getting expired when one is down

I have set up session replication between two Tomcat instances running on different machines using mod_jk. When one Tomcat is down, my session expires, and when I refresh, the request goes to the other Tomcat. Ideally, the session shouldn't expire. Any help will be highly appreciated. Here are my config files.
Server.xml
<Engine name="Catalina" defaultHost="localhost" jvmRoute="worker1">
<!--For clustering, please take a look at documentation at:
/docs/cluster-howto.html (simple how to)
/docs/config/cluster.html (reference documentation) -->
<!--
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
-->
<!-- Clustering configuration start -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster" channelSendOptions="8">
<!--<Manager className="org.apache.catalina.ha.session.DeltaManager"
expireSessionsOnShutdown="false"
notifyListenersOnReplication="true"/>-->
<Channel className="org.apache.catalina.tribes.group.GroupChannel">
<Membership className="org.apache.catalina.tribes.membership.McastService"
address="228.0.0.4"
port="45564" frequency="500"
dropTime="3000"/>
<Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
<Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
</Sender>
<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
address="auto" port="4000" autoBind="100"
selectorTimeout="5000" maxThreads="6"/>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
<Interceptor className="org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor"/>
</Channel>
<Valve className="org.apache.catalina.ha.tcp.ReplicationValve" />
<ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener" />
workers.properties
# First we define virtual worker's list
worker.list=jkstatus,LoadBalancer
# Enable virtual workers earlier
worker.jkstatus.type=status
worker.LoadBalancer.type=lb
# Add Tomcat instances as workers, two workers in our case
worker.worker1.type=ajp13
worker.worker1.host=localhost
worker.worker1.port=8009
worker.worker2.type=ajp13
worker.worker2.host=10.57.79.232
worker.worker2.port=8019
# Provide workers list to the load balancer
worker.LoadBalancer.balance_workers=worker1,worker2

Using OpenNMS to monitor SNMP, can't see MIB data

I have a MIB with OIDs and events. The device the MIB relates to is online. OpenNMS sees and gathers information about the interfaces on the appliance, as well as the Linux variant it is running, but it doesn't see the other OIDs (or I can't find/chart them).
The GUI reports Polling Status (Managed) and Package (uti_p). The uti_p package is:
<package name="uti_p">
<filter>IPADDR != '0.0.0.0'</filter>
<include-range begin="10.19.0.200" end="10.19.0.210" />
<rrd step="300">
<rra>RRA:AVERAGE:0.5:1:2016</rra>
<rra>RRA:AVERAGE:0.5:12:1488</rra>
<rra>RRA:AVERAGE:0.5:288:366</rra>
<rra>RRA:MAX:0.5:288:366</rra>
<rra>RRA:MIN:0.5:288:366</rra>
</rrd>
<service name="ICMP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2" />
<parameter key="timeout" value="3000" />
<parameter key="rrd-repository" value="/var/lib/opennms/rrd/response" />
<parameter key="rrd-base-name" value="icmp" />
<parameter key="ds-name" value="icmp" />
</service>
<service name="SNMP" interval="300000" user-defined="false" status="on">
<parameter key="retry" value="2"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="161"/>
<parameter key="oid" value=".1.3.6.1.4.1.nnnn"/>
</service>
<downtime interval="30000" begin="0" end="300000" />
<downtime interval="300000" begin="300000" end="43200000" />
<downtime interval="600000" begin="43200000" end="432000000" />
<downtime begin="432000000" delete="true" />
</package>
I have a collectd-configuration
<package name="uti_p">
<filter>IPADDR != '0.0.0.0'</filter>
<include-range begin="10.19.0.200" end="10.19.0.210"/>
<service name="SNMP" interval="30000" user-defined="false" status="on">
<parameter key="collection" value="HsmLan"/>
<parameter key="port" value="161"/>
<parameter key="retry" value="3"/>
<parameter key="timeout" value="3000"/>
<parameter key="thresholding-enabled" value="true"/>
</service>
</package>
The .../rrd/snmp/{node} directory does not show any collection of data for the various oids that I am looking for.
In response to comments:
$ snmpwalk -v 2c -c FIPS14023 10.19.0.204 iso.3.6.1.4.1.nnnn
...
iso.3.6.1.4.1.nnnn.1.1.10.0 = INTEGER: 29
iso.3.6.1.4.1.nnnn.1.1.11.0 = STRING: "29.0"
...
i.e., it returns the expected data from the MIB.
The Community set in OpenNMS is FIPS14023, and the automatic discovery process finds the node.
Node Hsm.204
Interface 10.19.0.204
Polling Status Managed
Polling Package uti_p
Monitor Class org.opennms.netmgt.poller.monitors.SnmpMonitor
Service Parameters
oid .1.3.6.1.4.1.nnnn
The Requisition name for the class of appliance is correct.
I just can't find where the above string "29.0" is supposed to appear. As I understand it, the data should be collected into RRD db files, but the node database IDs never show up in the /etc/opennms/rrdsnmp directory.
riw@riw-ubuntu-vbox:/etc/opennms/rrdsnmp$ ls
riw@riw-ubuntu-vbox:/etc/opennms/rrdsnmp$
Thank you!
I would debug the SNMP access as follows, assuming you have SNMP v2 set up:
1. Test access to the SNMP agent from your OpenNMS server with snmpwalk -v 2c -c
2. OpenNMS associates SNMP community strings with IP addresses. You can verify this in the Web UI under "Admin -> Configure OpenNMS -> Configure OpenNMS Community Names by IP Address" using the "Lookup" field. It will show you which SNMP community is configured for your device, so you can fix it if necessary.
3. Go to the node page, rescan the server, and see whether the "SNMP Attributes" fields with IP and physical interfaces get filled.
If Step 1 does not give you the full SNMP tree, you have to fix your surrounding configuration: iptables or the SNMP agent configuration for views and community.
Is there a reason you are defining a separate collection set, "HsmLan", instead of using the default SNMP collection in collectd?
If there is, you need to define this "snmp-collection" in datacollection-config.xml and include collections via the tag which references collections named in the XML files (which contain the OIDs, etc.) in etc/datacollection/.
Be sure to include the collector service for HsmLan at the bottom of collectd-configuration.xml, for example:
<collector service="HsmLan" class-name="org.opennms.netmgt.collectd.SnmpCollector"/>
The poller-configuration.xml you listed at the top has no bearing on data collection; it is for service polling.
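To make the datacollection-config.xml step more concrete, here is a hedged sketch of what an "HsmLan" collection and a matching group file could look like. The group name, the alias hsmValue10, the file name, and the broad sysoidMask are all hypothetical choices made for illustration; only the collection name and the OID prefix come from the question, and "nnnn" remains the elided enterprise number.

```xml
<!-- Part 1: inside datacollection-config.xml (sketch only). -->
<snmp-collection name="HsmLan" snmpStorageFlag="select">
    <rrd step="300">
        <rra>RRA:AVERAGE:0.5:1:2016</rra>
        <rra>RRA:AVERAGE:0.5:12:1488</rra>
    </rrd>
    <!-- Refers to the datacollection-group defined in Part 2. -->
    <include-collection dataCollectionGroup="HsmLan"/>
</snmp-collection>

<!-- Part 2: a separate file in etc/datacollection/, e.g. hsmlan.xml -->
<!-- (the file name is an assumption). -->
<datacollection-group name="HsmLan">
    <group name="hsm-scalars" ifType="ignore">
        <!-- Scalar from the snmpwalk output; the alias is hypothetical. -->
        <mibObj oid=".1.3.6.1.4.1.nnnn.1.1.10" instance="0"
                alias="hsmValue10" type="integer"/>
    </group>
    <systemDef name="HSM appliance">
        <!-- Deliberately broad mask; narrow it to the device's sysObjectID. -->
        <sysoidMask>.1.3.6.1.4.1.</sysoidMask>
        <collect>
            <includeGroup>hsm-scalars</includeGroup>
        </collect>
    </systemDef>
</datacollection-group>
```

Collection only happens for nodes whose system object ID matches a systemDef, so the mask is usually the first thing to check when no RRD files appear.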

Infinispan and JGroups discovery on EC2

I'm trying to use my application on AWS EC2 on some Linux boxes with Tomcat servers. Previously I used my application with Infinispan on LAN and I used UDP multicasting for JGroups member discovery. EC2 does not support UDP multicasting and this is the default node discovery approach used by Infinispan to detect nodes running in a cluster. I looked into using the S3_PING protocol, but I have not figured out why it doesn't work.
Does anyone have any ideas what the problem might be here?
Here are my configuration files:
1. applicationContext-cache.xml
<!-- Infinispan cache -->
<cache:annotation-driven/>
<import resource="classpath:/applicationContext-dao.xml"/>
<bean id="cacheManager" class="org.infinispan.spring.provider.SpringEmbeddedCacheManagerFactoryBean">
<property name="configurationFileLocation" value="classpath:/infinispan-replication.xml"/>
</bean>
<context:component-scan base-package="com.alex.cache"/>
2. infinispan-replication.xml
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
xmlns="urn:infinispan:config:5.1">
<global>
<transport transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
<properties>
<property name="configurationFile" value="/home/akasiyanik/dev/projects/myapp/myapp-configs/jgroups.xml"/>
</properties>
</transport>
</global>
<default>
<!-- Configure a synchronous replication cache -->
<clustering mode="replication">
<sync/>
<hash numOwners="2"/>
</clustering>
</default>
</infinispan>
3. jgroups.xml
<config>
<TCP bind_port="${jgroups.tcp.port:7800}"
loopback="true"
port_range="30"
recv_buf_size="20000000"
send_buf_size="640000"
discard_incompatible_packets="true"
max_bundle_size="64000"
max_bundle_timeout="30"
enable_bundling="true"
use_send_queues="true"
sock_conn_timeout="300"
enable_diagnostics="false"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="30"
thread_pool.keep_alive_time="60000"
thread_pool.queue_enabled="false"
thread_pool.queue_max_size="100"
thread_pool.rejection_policy="Discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="2"
oob_thread_pool.max_threads="30"
oob_thread_pool.keep_alive_time="60000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="Discard"
/>
<S3_PING location="r********s" access_key="AK***************SIA"
secret_access_key="y*************************************BJ" timeout="2000" num_initial_members="2"/>
<MERGE2 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD timeout="3000" max_tries="3"/>
<VERIFY_SUSPECT timeout="1500"/>
<BARRIER />
<pbcast.NAKACK use_mcast_xmit="false"
exponential_backoff="500"
discard_delivered_msgs="true"/>
<UNICAST />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="4M"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
view_bundling="true"/>
<UFC max_credits="2M"
min_threshold="0.4"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
<pbcast.STATE_TRANSFER/>
</config>
Use this: https://github.com/meltmedia/jgroups-aws
It is an implementation of a JGroups discovery protocol for AWS that uses the AWS API (a replacement for multicast).
