OpenNMS Email Notification not working - opennms

I have installed OpenNMS Horizon and configured notifications as follows:
users admin and rtc have an email address;
both are part of the Email-Admin group (Admin / Configure Notifications Destination Paths);
notifications have been turned on (Admin / Event Management);
for testing purposes, I have configured a custom nodeDown event, which has the Email-Admin group on its destination path (My Node DOWN Alert; OpenNMS-defined node event: nodeDown; uei.opennms.org/nodes/nodeDown)
Current Rule:
(IPADDR != '0.0.0.0')
I have set up a gmail account in xxx as follows:
org.opennms.core.utils.useJMTA=false
org.opennms.core.utils.transport=smtps
org.opennms.core.utils.mailHost=smtp.gmail.com
org.opennms.core.utils.smtpport=587
org.opennms.core.utils.smtpssl.enable=true
org.opennms.core.utils.authenticate=true
org.opennms.core.utils.authenticateUser=XXX@gmail.com
org.opennms.core.utils.authenticatePassword=XXX
org.opennms.core.utils.starttls.enable=true
org.opennms.core.utils.messageContentType=text/html
org.opennms.core.utils.charset=us-ascii
org.opennms.core.utils.fromAddress=OpenNMS Administrator
Gmail is configured with the "allow less secure applications" setting enabled.
My Question:
When I power off my test machine, I can see a nodeDown event in the Horizon Dashboard. However, the system does not send an email notification.
According to notifd.log (/opt/opennms/logs/notifd.log), the system does not even try to send an email.
Changing the port to org.opennms.core.utils.smtpport=465 is not working either.
What am I missing? Please advise!
EDIT
Email is working properly with this configuration (/opt/opennms/etc/javamail-configuration.properties):
org.opennms.core.utils.useJMTA=false
org.opennms.core.utils.transport=smtps
org.opennms.core.utils.mailHost=smtp.gmail.com
org.opennms.core.utils.smtpport=465
org.opennms.core.utils.smtpssl.enable=true
org.opennms.core.utils.authenticate=true
org.opennms.core.utils.authenticateUser=xxx@gmail.com
org.opennms.core.utils.authenticatePassword=xxx
org.opennms.core.utils.starttls.enable=true
org.opennms.core.utils.messageContentType=text/html
org.opennms.core.utils.charset=us-ascii
org.opennms.core.utils.fromAddress=OpenNMS Administrator <xxx@gmail.com>
A scheduled outage prevented the system from sending emails. The scheduled outage did not vanish upon deletion. I had to add a second outage and then delete the first entry.

There are a number of reasons why e-mails can't be sent. In Step 4 you state that you have configured a custom nodeDown event (which I assume is different than the default nodeDown event). Verify that your custom notice is also enabled.
Your next step will be to edit /opt/opennms/etc/log4j2.xml and scroll to the bottom. Set the log level for "notifd" to DEBUG. Then repeat your test; my guess is that you will see an error in the log about connecting to Gmail. Correct that and you should be good to go.
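The root cause in the question turned out to be a scheduled outage, but the two posted configurations also differ in the port paired with the smtps transport and in the fromAddress value. Implicit TLS (smtps) is conventionally served on port 465, while port 587 expects a plain connection that is upgraded with STARTTLS. The following is a minimal sketch of a sanity check for those properties. The function names and warning texts are hypothetical; the code only inspects the text of the properties file and does not talk to OpenNMS or Gmail.

```python
PREFIX = "org.opennms.core.utils."


def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_mail_config(text):
    """Return a list of warnings for common transport/port mismatches."""
    props = parse_properties(text)
    transport = props.get(PREFIX + "transport", "")
    port = props.get(PREFIX + "smtpport", "")
    from_address = props.get(PREFIX + "fromAddress", "")

    warnings = []
    if transport == "smtps" and port == "587":
        warnings.append("smtps (implicit TLS) is normally used with port 465, not 587")
    if transport == "smtp" and port == "465":
        warnings.append("port 465 normally expects implicit TLS (transport=smtps)")
    if "@" not in from_address:
        warnings.append("fromAddress does not contain an email address")
    return warnings
```

Run against the first configuration in the question, this sketch would flag both the 587/smtps pairing and the fromAddress without an address; the configuration from the EDIT passes both checks.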

Related

Using Windows Task Scheduler for an automatic network firewall authentication

My college requires students to periodically authenticate for using WiFi and LAN. I am writing a Python script that will automatically do that so that I don't have to manually enter my credentials. The authentication is also separate for WiFi and LAN, and that makes me enter my credentials when I switch between them. So, for the python script, I want to detect when my authentication has expired and my connection is disconnected.
I also don't want the python script to be running constantly in the background and pinging a website as that really isn't optimal and I'll have to run the script every time my PC restarts. I was thinking of using the Windows Task Scheduler to fire the script when it detects that my connection is lost. The trigger event cannot be fixed intervals as the connection can be lost in between the intervals and also when switching between LAN and WiFi.
So, is there any network event that will capture the functionality I want? As Windows gives a notification of "opening the browser to connect" I feel there has to be a background event running.
I tried the NetworkProfile/Operational event in the Task Scheduler with event IDs 10001 and 8003, but those fire only when I switch off the WiFi on my PC.
Thank you
Got it!
NetworkProfile/Operational Event with ID 4002 waits for network authentication.
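As a rough sketch, a task with such an event trigger could be registered from Python by calling the Windows `schtasks` tool with an `ONEVENT` schedule. The full channel name `Microsoft-Windows-NetworkProfile/Operational`, the task name, and the script path are assumptions for illustration; only the event ID 4002 comes from the answer above.

```python
import subprocess

# Assumption: the full event log channel name behind "NetworkProfile/Operational".
CHANNEL = "Microsoft-Windows-NetworkProfile/Operational"


def build_schtasks_command(task_name, script_path, event_id=4002, python_exe="python"):
    """Build a schtasks command line that runs a script when the event is logged."""
    # XPath filter selecting events with the given ID on the channel.
    query = f"*[System[(EventID={event_id})]]"
    return [
        "schtasks", "/Create",
        "/TN", task_name,                          # task name (hypothetical)
        "/TR", f'{python_exe} "{script_path}"',    # command the task runs
        "/SC", "ONEVENT",                          # trigger on an event
        "/EC", CHANNEL,                            # event channel to watch
        "/MO", query,                              # event filter
    ]


def register_task(task_name, script_path):
    """Create the task. Windows only; may require an elevated prompt."""
    subprocess.run(build_schtasks_command(task_name, script_path), check=True)
```

For example, `register_task("CampusAuth", r"C:\scripts\auth.py")` would create a task named CampusAuth (both names are placeholders). The same trigger can also be set up by hand in the Task Scheduler UI.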

The fix client can receive incoming messages but cannot send outgoing heartbeat message

We have built a FIX client. It can receive incoming messages, but at some point it can no longer send outgoing heartbeat messages or reply to the TestRequest message. After the last heartbeat was sent, something stops the client side from sending any more heartbeats.
FIX version: FIX 5.0
The same incident happened before, and we have a tcpdump for one session from that time.
We deploy every FIX session to a separate k8s pod.
We suspected a CPU resource issue because the load average was high around the time of the issue, but it was not solved after we added more CPU cores. We now think the load average was high because of the FIX reconnection.
We suspected an I/O issue because we use AWS EFS, which is shared by the 3 sessions for logging and the message store, but it was still not solved after we used pod affinity to assign the 3 sessions to different nodes.
It's not a network issue either, since we can receive FIX messages and the other sessions worked well at that time. We have disabled SNAT in the k8s cluster too.
We are using quickfixj 2.2.0 to create the FIX client. We have 3 sessions, which are deployed to k8s pods on separate nodes:
rate session to get fx price from server
order session to get transaction(execution report) messages from server, we only send logon/heartbeat/logout messages to server.
backoffice session to get marketstatus
We use the Apache Camel quickfixj component to make our programming easier. It works well most of the time, but reconnections to the FIX servers keep happening across the 3 sessions. The frequency is about once a month, and mostly only 2 sessions have issues.
heartbeatInt = 30s
The fix event messages at client side
20201004-21:10:53.203 Already disconnected: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=1)
20201004-21:10:53.271 MINA session created: local=/172.28.65.164:44974, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/10.60.45.132:11050
20201004-21:10:53.537 Initiated logon request
20201004-21:10:53.643 Setting DefaultApplVerID (1137=9) from Logon
20201004-21:10:53.643 Logon contains ResetSeqNumFlag=Y, resetting sequence numbers to 1
20201004-21:10:53.643 Received logon
The fix incoming messages at client side
8=FIXT.1.1☺9=65☺35=0☺34=2513☺49=Quote1☺52=20201004-21:09:02.887☺56=TA_Quote1☺10=186☺
8=FIXT.1.1☺9=65☺35=0☺34=2514☺49=Quote1☺52=20201004-21:09:33.089☺56=TA_Quote1☺10=185☺
8=FIXT.1.1☺9=74☺35=1☺34=2515☺49=Quote1☺52=20201004-21:09:48.090☺56=TA_Quote1☺112=TEST☺10=203☺
----- 21:10:53.203 Already disconnected ----
8=FIXT.1.1☺9=87☺35=A☺34=1☺49=Quote1☺52=20201004-21:10:53.639☺56=TA_Quote1☺98=0☺108=30☺141=Y☺1137=9☺10=183☺
8=FIXT.1.1☺9=62☺35=0☺34=2☺49=Quote1☺52=20201004-21:11:23.887☺56=TA_Quote1☺10=026☺
The fix outgoing messages at client side
8=FIXT.1.1☺9=65☺35=0☺34=2513☺49=TA_Quote1☺52=20201004-21:09:02.884☺56=Quote1☺10=183☺
---- no heartbeat message around 21:09:32 ----
---- 21:10:53.203 Already disconnected ---
8=FIXT.1.1☺9=134☺35=A☺34=1☺49=TA_Quote1☺52=20201004-21:10:53.433☺56=Quote1☺98=0☺108=30☺141=Y☺553=xxxx☺554=xxxxx☺1137=9☺10=098☺
8=FIXT.1.1☺9=62☺35=0☺34=2☺49=TA_Quote1☺52=20201004-21:11:23.884☺56=Quote1☺10=023☺
8=FIXT.1.1☺9=62☺35=0☺34=3☺49=TA_Quote1☺52=20201004-21:11:53.884☺56=Quote1☺10=027☺
A thread dump was taken when the TEST message from the server was received. BTW, the gist is from our development environment, which has the same deployment.
https://gist.github.com/hitxiang/345c8f699b4ad1271749e00b7517bef6
We had enabled the debug log in quickfixj, but it did not give much information, only logs for the messages received.
The sequence of events in time order:
20201101-23:56:02.742 The outgoing heartbeat should have been sent at this time. It looks like it was being sent but hung on I/O writing, with the thread in the Running state
20201101-23:56:18.651 test message from server side to trigger thread dump
20201101-22:57:45.654 server side began to close the connection
20201101-22:57:46.727 thread dump - right
20201101-23:57:48.363 logon message
20201101-22:58:56.515 thread dump - left
Right (2020-11-01T22:57:46.727Z): when it hangs. Left (2020-11-01T22:58:56.515Z): after reconnection.
It looks like the storage we were using, AWS EFS, caused the issue.
But the feedback from AWS support is that nothing is wrong on the AWS EFS side.
Maybe it's a network issue between the k8s EC2 instances and AWS EFS.
First, we made the logging asynchronous for all sessions, which made the disconnections happen less often.
Second, for the market session, we wrote the sequence files to local disk, and the disconnections went away for that session.
Third, we finally replaced AWS EFS with AWS EBS (a persistent volume in k8s) for all sessions. It works great now.
BTW, AWS EBS is not highly available across zones, but that's better than FIX disconnections.
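A missing heartbeat like the one around 21:09:32 in the outgoing log can be spotted mechanically by comparing the SendingTime (tag 52) of consecutive outgoing messages with the heartbeat interval. The following is a minimal sketch for that, assuming the log lines look like the ones in the question and that the SOH delimiter is rendered as ☺. The function names and the 5-second tolerance are hypothetical choices, not part of quickfixj.

```python
from datetime import datetime

SOH_SHOWN_AS = "☺"  # the logs above render the SOH field delimiter this way


def parse_fix(line, delimiter=SOH_SHOWN_AS):
    """Split a raw FIX message into a {tag: value} dict."""
    fields = {}
    for part in line.strip().split(delimiter):
        if "=" in part:
            tag, value = part.split("=", 1)
            fields[tag] = value
    return fields


def sending_time(fields):
    """Parse tag 52 (SendingTime) in the YYYYMMDD-HH:MM:SS.sss form."""
    return datetime.strptime(fields["52"], "%Y%m%d-%H:%M:%S.%f")


def find_gaps(lines, heartbeat_seconds=30, tolerance_seconds=5,
              delimiter=SOH_SHOWN_AS):
    """Return (previous_time, next_time, gap_seconds) for overly long gaps."""
    times = []
    for line in lines:
        fields = parse_fix(line, delimiter)
        if "52" in fields:  # skip comment lines and anything without a timestamp
            times.append(sending_time(fields))

    gaps = []
    for prev, nxt in zip(times, times[1:]):
        gap = (nxt - prev).total_seconds()
        if gap > heartbeat_seconds + tolerance_seconds:
            gaps.append((prev, nxt, gap))
    return gaps
```

Fed the outgoing messages from the question, this would report the gap between the heartbeat at 21:09:02.884 and the logon at 21:10:53.433, which is the window in which the client stopped sending.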

Difficulty connecting to Websphere MQ manager using domain authentication

This is a follow-up to: Can create Websphere Queue Manager but not connect
I'm trying to set up MQ on a development machine, but if I try to connect to it using my domain account it's unable to authenticate (AMQ4999). Digging a little further I find this in the error logs:
AMQ8079: Access was denied when attempting to retrieve group membership
information for user 'xxx@domain'.
Now I'm well aware of the known issue with MQ where it fails to authenticate domain accounts since it's unable to access their member information, and have confirmed from the logs that this is definitely what's happening here, so I tried overriding this using the following script gleaned from the previous post:
DEFINE CHL('DOTNET.SVRCONN') CHLTYPE(SVRCONN) MCAUSER('MUSR_MQADMIN@hostname')
SET CHLAUTH('DOTNET.SVRCONN') TYPE(BLOCKUSER) USERLIST('nobody')
SET CHLAUTH('DOTNET.SVRCONN') TYPE(ADDRESSMAP) ADDRESS('*') USERSRC(CHANNEL) ACTION(ADD)
However, even with this channel in place I still cannot connect to the queue manager while logged into my domain account. I'm still plagued with the exact same error I was getting previously. One thing I did notice was that MQ Explorer reports the channel as inactive even though I started it (although judging by my reading from IBM's website this is normal).
I'm still very new to MQ so I think I'm either missing something or did something wrong, but ideally I would like to be able to set up a dev environment where I can hit the service without having to rely on the 'runas' command. I should also emphasize that this is strictly for dev/learning so obviously I'm not concerned about security.
Update:
I found out what I was doing wrong -- sure enough I was missing a step. A little more background. Upon creating the QM I was trying to connect to it using a simple C# client. Originally I wrote code that looked like this:
var queueManager = new MQQueueManager("MyQueueManager", MQC.MQCNO_STANDARD_BINDING);
Also, when trying to connect via MQExplorer, both appear to use my domain credentials to authenticate. However, when I explicitly created a properties object and specified the channel like so:
var props = new Hashtable() {
    [MQC.HOST_NAME_PROPERTY] = "localhost",
    [MQC.PORT_PROPERTY] = 1414,
    [MQC.CHANNEL_PROPERTY] = "DOTNET.SVRCONN",
    [MQC.USER_ID_PROPERTY] = "DevMQUser",
    [MQC.PASSWORD_PROPERTY] = "p#$$w0rd"
};
var queueManager = new MQQueueManager("MyQueueManager", props);
Then everything worked correctly. I still need to run MQExplorer.exe as a local user (even explicitly setting credentials in Connection Details > Properties doesn't seem to work), but this isn't a big deal.
Thanks for the suggestions.
Try changing...
SET CHLAUTH('DOTNET.SVRCONN') TYPE(ADDRESSMAP) ADDRESS('*') USERSRC(CHANNEL)
To...
SET CHLAUTH('DOTNET.SVRCONN') TYPE(ADDRESSMAP) ADDRESS('*') USERSRC(MAP) MCAUSER(MUSR_MQADMIN)
The USERSRC(CHANNEL) says to take the ID that is presented to the channel, in this case the local process ID of your logged-in account, to override MCAUSER.
MQ Security diagnostics
For connectivity issues over channels, grab SupportPac MS0P and install it into MQ Explorer. Then turn on Authorization Events and Channel Events and recreate the problem. If the connection is blocked by a CHLAUTH record, this shows up in the Channel Event queue. If it is blocked by OAM, it shows up in the QMgr Event queue. From Explorer with MS0P installed, right-clicking on the queue name in the Queues panel opens a context dialog that includes "Format event messages" as an option. Select it and MS0P will parse the PCF message into human-readable values that show all the parameters that were presented to MQ and why it blocked the connection.
IBM MQ v8
If this is v8 of MQ, you also have ID and password checking to configure. If the QMgr points to an AUTHINFO record that specifies ID and password checking (IDPWOS) the password can't be blank if the ID is set. Even if the password authentication is set to OPTIONAL the check will be made if an ID is present on the channel, which the client code will ensure is true unless specifically overridden.

Hermes JMS cannot connect to Websphere MQ 7.1 (2035 error)

I am trying to connect to WebSphere MQ 7.1 with Hermes JMS, but I am not able to. I have followed their guide, loaded all the jars without problems, set the plugin, set all the variables (hostname, port, transportType, queuemanager), checked the box at the bottom that says user, and typed the username and password. After confirming, I tried to discover, but I get the following message back:
com.ibm.mq.MQException: MQJE001: Completion Code '2', Reason '2035'.
    at com.ibm.mq.MQManagedConnectionJ11.<init>(MQManagedConnectionJ11.java:233)
    at com.ibm.mq.MQClientManagedConnectionFactoryJ11._createManagedConnection(MQClientManagedConnectionFactoryJ11.java:553)
    at com.ibm.mq.MQClientManagedConnectionFactoryJ11.createManagedConnection(MQClientManagedConnectionFactoryJ11.java:593)
    at com.ibm.mq.StoredManagedConnection.<init>(StoredManagedConnection.java:95)
    at com.ibm.mq.MQSimpleConnectionManager.allocateConnection(MQSimpleConnectionManager.java:198)
    at com.ibm.mq.MQQueueManagerFactory.obtainBaseMQQueueManager(MQQueueManagerFactory.java:882)
    at com.ibm.mq.MQQueueManagerFactory.procure(MQQueueManagerFactory.java:770)
    at com.ibm.mq.MQQueueManagerFactory.constructQueueManager(MQQueueManagerFactory.java:719)
    at com.ibm.mq.MQQueueManagerFactory.createQueueManager(MQQueueManagerFactory.java:175)
    at com.ibm.mq.MQQueueManager.<init>(MQQueueManager.java:647)
    at hermes.ext.mq.MQSeriesAdmin.getQueueManager(MQSeriesAdmin.java:107)
    at hermes.ext.mq.MQSeriesAdmin.discoverDestinationConfigs(MQSeriesAdmin.java:280)
    at hermes.impl.HermesAdminAdapter.discoverDestinationConfigs(HermesAdminAdapter.java:82)
    at hermes.impl.DefaultHermesImpl.discoverDestinationConfigs(DefaultHermesImpl.java:1126)
    at hermes.browser.tasks.DiscoverDestinationsTask.invoke(DiscoverDestinationsTask.java:77)
    at hermes.browser.tasks.TaskSupport.run(TaskSupport.java:175)
    at hermes.browser.tasks.ThreadPool.run(ThreadPool.java:170)
    at java.lang.Thread.run(Thread.java:662)
After a few hours of trial and error and research on the net, it seems the issue is that it cannot connect due to bad authorization. However, I am able to connect using Java code (using the same lib, MQQueueConnectionFactory), and I am also able to connect using QueueZee with the exact same libraries, get a list of all queues, and browse them, so I know user authorization should not be the problem.
I am running Hermes JMS 1.14 and I tried using both Java 1.6.0_33 and 1.7.0_5. WebSphere MQ is running version 7.1.0.0, and the libraries were taken from this installation on a remote server.
I tried setting the channel variable to SYSTEM.DEF.SVRCONN, which is what I used in QueueZee to get it to work, but I still get the same issue.
Has anybody seen this issue before who can hopefully shed some light on the situation?
At V7.1 the new CHLAUTH rules shut off access to all SYSTEM.* channels except SYSTEM.ADMIN.SVRCONN by default and do not allow any administrative access on any SVRCONN channel by default. In order to diagnose this it would be necessary to know what channel was used, the CHLAUTH rules that are set, the channel definition (in particular, the MCAUSER value) and whether the ID used is in the mqm group.
You didn't mention whether the QueueZee setup was also to a V7.1 QMgr or this one in particular. Taking a wild guess, I'd say that CHLAUTH rules are enabled and that the SYSTEM.DEF.SVRCONN channel is disabled at this point. Recommended steps are to define a new channel whose name doesn't start with SYSTEM. and make sure the ID used is not in the mqm group but is authorized as a non-admin ID.
Alternatively, an ID in the mqm group can be used, but you'd have to define a CHLAUTH rule to allow it to work. For example, the default CHLAUTH rule uses CHANNEL(*) BLOCKUSER(*MQADMIN) and you could change that to CHANNEL(THE.NEW.CHL.NAME) BLOCKUSER('nobody'). The new rule would be more specific than the old rule and thus take precedence on your channel. It tells the QMgr to block the user ID 'nobody' but omits any mention of *MQADMIN. Since 'nobody' doesn't have access anyway, and since *MQADMIN is not mentioned (and thus not blocked by this rule), the effect of the rule is to allow admins on this channel.
As a quick, dirty and temporary measure, you can also ALTER QMGR CHLAUTH(DISABLED) to get the same behavior as in v7.0 and earlier QMgrs. Be aware though that this allows anonymous remote admin and remote code execution using the mqm user ID. That's why the default settings were changed. Now you must explicitly provision remote admin access if you need it.
For more on this topic, I recommend the Securing Your QMgr presentation from the IMPACT conference.
Note that the password the app sends in is not checked by the QMgr. The field exists so that channel exits can validate the password against AD, LDAP, etc. Without such an exit, the password is ignored. The user ID passed in by the client is either accepted at face value or modified by the channel's MCAUSER or by CHLAUTH rules.
Finally, when having authorization problems the easiest way to diagnose is to ALTER QMGR AUTHOREV(ENABLED) and then use SupportPac MS0P to decode the PCF messages in WMQ Explorer. The auths errors end up in the QMgr Event queue. Each message tells you the object that failed auths, the API call made against that object, the options of the call and the ID that made the call. Often we find the ID making the call isn't the one we wanted or that the program is using options it isn't authorized for so this can be extremely helpful.
Not really an answer, just a little research on the problem.
I faced the same problem about an hour ago. I am passing the username like domain\sortoflongusername, and what I see in the system log on the WSMQ server is that my username is being truncated to 12 characters.
I'm not really familiar with hermesJMS and soapui at all (I just wanted to offer it to our testers to check out as a testing platform), so maybe someone here knows the root of this problem.

How to use notificationconf?

I have read THIS tutorial about creating Push nodes and posting/subscribing to notifications.
The only problem I have met is that it seems notificationconf is unable to create that node...
My first question: are nodename (parameter of notificationconf tool) and notificationName (NSString which I use from app) the same things?
Second:
notificationconf createnode push.example.com BFMyTestPushhNotification beefon
Enter password: // password from Open Directory for user beefon - it is Admin of the 10.6 server
2010-01-24 13:24:58.916 notificationconf[15221:903] created XMPP session
2010-01-24 13:24:58.931 notificationconf[15221:903] Connecting to push.example.com:5222 with user com.apple.notificationuser@push.example.com/TestPubsub, security = 2 ...
2010-01-24 13:24:59.130 notificationconf[15221:903] sessionCallback (event 1)
2010-01-24 13:24:59.130 notificationconf[15221:903] Session stopped (event 1)
What am I doing wrong?
And posting a notification from the app does nothing...
Thanks for any help!
I've been trying to use Snow Leopard Server's Push Notification service with a custom application based on XMPP Publish–Subscribe. I struggled to create a node but finally figured it out.
Track down the password for the service account com.apple.notificationuser. You can find it, for example, in /private/etc/dovecot/notify/notify.plist.
Connect to your push notification server with JID com.apple.notificationuser@your-chat-server-hostname.com and that password.
Create nodes the normal way. In XMPPFramework it's like this:
XMPPJID *serviceJID =
    [XMPPJID jidWithString:@"pubsub.your-chat-server-hostname.com"];
XMPPPubSub *xmppPubSub = [[XMPPPubSub alloc] initWithServiceJID:serviceJID];
[xmppPubSub createNode:@"pubsub.your-chat-server-hostname.com"
           withOptions:nil];
The server creates the node. It responds with an iq, but not the one the spec requires. It does send a compliant error if the node already exists.
<iq xmlns="jabber:client"
to="com.apple.notificationuser@your-chat-server-hostname.com/..."
from="pubsub.your-chat-server-hostname.com"
id="...:create_node" type="result"/>
Connect using that same user to publish your updates.
I was never able to get notificationconf to work.
Notifications are easy to use on the same node, but harder across a network. Especially, I don't think too many people are actually using it, as Google search results are scarce :) Now, regarding your questions:
For 1: yes, you need to have matching nodename and notificationName. The man page says so (although not crystal-clear):
createnode hostname nodename username
Creates a node on the server to send notifications using. Before
a client can subscribe to notifications with a given name, the
server must be configured with a node with a matching name.
So, first you have to create the node, then you can listen to notifications of a given name. Otherwise, you don't get the notifications.
For 2: I get this error when there is no XMPP daemon running (i.e. port 5222 is closed). Is that port open for you? (Check the output of nmap -p 5222 push.example.com.)
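If nmap is not at hand, the same reachability check can be sketched in a few lines of Python with the standard socket module. The helper name `is_port_open` and the 3-second timeout are arbitrary choices for illustration; a successful TCP connection only shows that something is listening, not that it is a working XMPP server.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For the case in the question, the call would be `is_port_open("push.example.com", 5222)`.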

Resources