OpenNMS SNMP Traps Stopped Working - How to Further Troubleshoot - opennms

About 5 days ago, OpenNMS Horizon 22.02 on Ubuntu 18.04.1 LTS stopped accepting traps from network elements. No changes were made to configuration or underlying operating system to my knowledge.
There are about 125 network elements, all Cisco, sending traps.
So far I have checked the following:
tcpdump shows the traps coming into the interface on port 162
Turned on Debug for trapd.log and incoming traps from network elements do not create any log entries
Traps sent with send-trap.pl from the localhost create traps that flow all the way to events
Traps sent with snmptrap either on localhost or another host create log entries that flow all the way to events. The other host is using the same interface that the network elements are using.
ss -lnpu sport = :162 shows an open UPD "UNCONN"
sudo lsof -i :162 shows a single listener java process
Startup of trapd does not seem to show any warnings in the log
I have verified that the ufw and iptables are off
I have updated OpenNMS to 22.04 and updated Ubunutu with no relief
Restarted OpenNMS many many times...
I moved Trapd startup after Asterisk in service-configuration.xml based on this
All of this seems similar to this. I think the last commenter on that thread asked about comparing the successful and unsuccessful traps in Wireshark which I have not done but all of the traps that are being sent have worked hundreds if not thousands of times until November 6th.
Is there anywhere else to look for errors as to why Trapd is not accepting traps? I think I have ruled out network issues.
I created a new Ubuntu 18.04 VM, updated it and then installed Horizon 23.01 fresh. I pointed my stream of traps at it and it behaves the exactly the same way, none of the traps create any log entries on the trapd.log with the level set to debug. Tcpdump shows the traps coming to the interface.

Issue Resolved.
The underlying operating system lost its static route for the subnet that the traps were coming from. OpenNMS had a route back to the subnet but not via the path that the traps were coming in from. Once the static route was restored, traps started working again and were flowing all the way to events.

Related

Windows 10 SNMP service not responding

I'm trying to get my head around SNMP for a project I'm working on. After I failed miserably getting it to work in my company's network, I set up a simple 3-device network to test things on, consisting of two Windows 10 PCs and a manageable switch between them.
I installed the optional feature "SNMP" on both PCs, made sure the service is running correctly and configured both services to accept SNMP queries from each other. I made sure to open up UDP port 161 in both PCs firewalls. Then I got the Net-SNMP binaries in order to use SNMPGET and SNMPWALK. As an alternative, I set up the SNMP extension for PHP through xampp (since I want to use PHP in my project once I get SNMP to work). Finally, I installed wireshark to monitor what exactly is going on and this is what I found:
When I try SNMPGET or SNMPWALK either through cmd or as a PHP command, I always get a timeout message. Wireshark is showing the get-next-request leaving one PC and arriving correctly on the other, so the network connection itself is working fine. But the receiving PC never sends a response. As I said, I'm pretty new to SNMP and I'm at a loss as to why this is happening. As I understand it, the optional feature for Windows 10 comes with its own SNMP agent, correct? If so, what could cause it to simply ignore an incoming request from a valid source IP?
The funny thing is that this even happens when I try to send an SNMP query to 127.0.0.1. I have no idea what I'm doing wrong...
Thanks to the comment of Lex Li, I was able to finally figure out which step I made a mistake with:
When setting up the SNMP service, under the security tab, I had to add 'public' as an accepted community name (with READ-ONLY rights). I figured since 'public' is sort of the standard read-only community, it would be accepted by default, which apparently it is not.
Alternatively, I guess I could have added my own communtiy name, but I didn't try that since I only want to read some values through SNMP anyways and read-only access is all I need for that.
Thank you very much Lex Li, I'm off to continue my project now!

How to simulate external TCP traffic in Windows?

I don't have any code yet, so please feel free to move this to a sister site, if you think it belongs there :)
I have a Program A ( I don't have it's source code, so I can't modify it's behavior ) running on my machine which keeps listening to a particular port on the system for TCP data. It's a Peer to Peer application.
System 1 running A ====================== System 2 sunning A
The program A is supposed to run on systems where I may not be allowed to modify Firewall settings to allow incoming connections on the port the program listens to. I have an EC2 linux server running Ubuntu 16.
So I thought I can use an existing tool or create a program that would connect to the server on port X, and fetch the data from the server, and locally throw that data to the port A is listening to.
System 1 running A ========= SERVER =========== System 2 sunning A
What kind of configuration should I have on the server ? And is there any program I can use for this, or an idea of how to make one ?
I did something similar to bypass firewalls and hotspots.
Check this out https://github.com/yarrick/iodine, with a proper configuration your would be able to send\receive packets as DNS queries which is I know is always allowed, I used my server to get usual internet access with any hotspot I found.
You would lose some time, higher latency but you will have access.
Hope I helped.

Wildfly clustering with VirtualBox

I am using VirtualBox on a WINDOWS7 as host of two DEBIAN7.7 guests, deb1 and deb2. Each guest can comunicate with the other one. Using one guest browser I can see the Wildfly istance welcome page that's running on the other guest. I run each istance in standalone-ha mode, network interfaces have mutlicast enabled, I can see on Wildfly node named srv1 that the two istances build a cluster:
...
...ISPN000094: Received new cluster view: [srv2/web|3] (2) [srv2/web, srv1/web]
where srv1 and srv2 are the node names of the istances. A tcpdump show UDP packets come across the multicast address 230.0.0.4, just where JGroups is listening. Despite all this goodness, http-session is not shared, this is my problem.
The application I use is very simple and <distributable/>, I have already used it succesfully in a multiple nodes on a single host scenario.
UPDATE: I made some tests using jgroups's test application McastReceiverTest and McastSenderTest with the following addresses: 230.0.0.4:45688, 230.0.0.4:45700 and 224.0.1.105:23364. Every test worked, on the receiver guest I can read what I sent by the sender guest. I tried to change my application too, I use this one https://github.com/liweinan/cluster-demo but http session is not shared.
Wildfly work well, I was looking at the problem as if I was still running multiple istances on my host. As JBoss forum suggests, I tried with curl retreiving my JSESSIONID and I see the cluster responding as expected. Happy ending.

Automatically send magic package on access

I configured my Windows 8 machine that it listens to magic packages send from other PCs to start it. It works very good. BUT I don't want to explicitly send a magic package, I would rather prefer it if I could send a magic package automatically when I try to access the PC over network.
I tryed using an smbclient event (30803). I configured this event to trigger a command line WOL. But This command will be triggered each time I get this event, no matter which PC I try to reach. I don't want to wake up PC-X when I actually try to access PC-Y.
Is there another way?
This sounds interesting... a possible solution would be, create a windows service and install it on the server or a computer that uses to be up. This service basically would be a network sniffer that captures all tcp traffic in network. It would have a table with ips and MAC addresses (to get MAC from an IP) that should be filled previously with manually or better... from ARP table (I did a program that gets IP / MAC from ARP table but has its issues... so each machine plugged on the LAN will get its MAC / IP), also this service would have last date ping done to each IP.
Then... how it would work... the service would capture all TCP packets and make a list of distinct IP, then each second or two get all distinct IPs (this will guarantee that the service is not consuming a lot of system resources), and on each distinct ip check last ping: if last ping was done successfully in last 5 or 10 minutes nothing is done (machine is guessed up), if no ping done or success in 5-10 minutes a ping is made. Based on ping response... if the machine is not responging magic packet is sent to MAC (provided from ARP when machine is up or manually as commented before). If ping responds nothing is done. Ping result and date is stored to avoid pings to all machines every time. Instead of ping also its possible to do it reading ARP table.
I this approach, system resources are preserved, and pings are made with sense, also magic packets are not sent if machine is up or guessed up.
Note that firewall should allow ICMP.

Full Clustering in Apache Traffic Server

I followed the steps mentioned in the official documentation for full clustering of multiple ATS instances. I installed 2 instances of ATS on 2 different Ubuntu machines (having the same specs, OS versions and hardware), and both of these act as a reverse proxy for web service hosted on a Tomcat server in a different machine. I wasnt able to set up the cluster. Here are some of the queries that I have.
They are on the same switch or same VLAN : The two Ubuntu machines on which I installed the ATS are connected to the same switch. They have the same interface mentioned in the /etc/network/interfaces. Are these enough or there is something else that has to be done to get the clustering?.
Running the comment traffic_line -r proxy.process.cluster.nodes : This returned 1 after I ran the traffic_line -x and traffic_line -L commands. But, in the cluster.config file, there isnt any additions or changes.
Moreover, when I make a query to one of these ATS instances (I have mapped the URLs in the remap.config file), both of them cache the responses locally and is not shared across.
From this information, can anyone tell me if I am doing something wrong. Let me know if anymore info is required.
Are these on virtual machines? I almost wasted 2 days trying to figure out what is wrong, when I initially set it up on openvz containers. Out of a wild guess, I decided to migrate to 2 physical nodes, and it went well. See Apache Traffic Server Clustering not working
proxy.process.cluster.nodes returns 1
means that it is just the standalone single node, and the second node on the cluster is not discovered.
Try a tcp dump for multicast and broadcast messages. If the other server's IP is not showing in the discovery packet, it has something to do at the network level, where the netops might have disabled multicast packet forwarding across switches.

Resources