SNMP push mechanism - snmp

Recently I got stuck on an SNMP problem. My client's requirement is that I push CPU, hard disk and memory data to the network management system (NMS) periodically. I have already configured my snmpd.conf file so that the data can be pulled with the snmpwalk command, but I don't know how to push it periodically to the NMS. I also need to know how to test whether the data is pushed properly. Any help would be great.

If the NMS provides any other interfaces than SNMP, for example, 3GPP XML files transferred over FTP, I would strongly recommend using that interface instead.
You have an SNMP Agent running already, and it will respond to get-requests, walk, etc. Normally, an enterprise-grade NMS would have no problem polling an SNMP Agent regularly to collect data such as what you describe. This is a common approach in what the telecom sector defines as Performance Management (PM) according to FCAPS. For a modest number of counters, fetched at reasonably large intervals, this approach generally works well. Problems with polling time can sometimes occur when a large amount of data is polled too frequently.
From the SNMP Agent, you also have the option to send Trap messages, which are spontaneous asynchronous messages. Normally, traps are only used to notify an NMS about important events on the supervised equipment, such as equipment faults (Fault Management). However, there is technically nothing stopping you from designing a MIB which defines traps sent regularly, containing performance data. Some form of adaptation would probably be needed on the NMS, to receive PM data from SNMP traps, since this is not usually done. If the NMS is not able to do regular polling of counters, it seems unlikely that it would be flexible enough to do this.
If there is a large number of counters, traps are unsuitable since the size of each message should ideally not exceed the MTU of the network (1500 bytes for Ethernet).
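If you do go the trap route, a periodic push could be sketched roughly as below. This is only a sketch under assumptions: it builds an argument list for Net-SNMP's snmptrap command (SNMPv2c) rather than sending anything, the OIDs under 1.3.6.1.4.1.99999 are hypothetical placeholders for objects you would define in your own MIB, and nms.example.com and the "public" community are made up. Only disk usage is collected here; CPU and memory values would be added as further varbinds.

```python
import shutil

# Hypothetical enterprise OIDs: replace with objects from your own MIB.
TRAP_OID = "1.3.6.1.4.1.99999.1.0.1"
DISK_USED_PCT_OID = "1.3.6.1.4.1.99999.1.1.1"


def disk_used_percent(path="/"):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return int(100 * usage.used / usage.total)


def build_trap_command(nms_host, community, varbinds):
    """Build an argument list for Net-SNMP's `snmptrap` (SNMPv2c).

    `varbinds` is a list of (oid, type_char, value) tuples; "i" means INTEGER.
    The empty-string argument lets snmptrap fill in the uptime itself.
    """
    cmd = ["snmptrap", "-v", "2c", "-c", community, nms_host, "", TRAP_OID]
    for oid, type_char, value in varbinds:
        cmd += [oid, type_char, str(value)]
    return cmd


if __name__ == "__main__":
    varbinds = [(DISK_USED_PCT_OID, "i", disk_used_percent())]
    cmd = build_trap_command("nms.example.com", "public", varbinds)
    print(cmd)
    # To actually send: subprocess.run(cmd, check=True), scheduled e.g. from cron.
```

To check that the traps arrive, one option is to run Net-SNMP's snmptrapd in the foreground on the receiving host (snmptrapd -f -Lo prints incoming traps to standard output), assuming its configuration accepts the community string you send.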

Related

Using NdisFIndicateReceiveNetBufferLists for every packet vs chaining them all together to receive?

I have an NDIS driver that sends received packets to a user-mode service, which marks the packets that are OK (not malicious). I then iterate over the good packets, convert each of them back to a proper NetBufferList with one NetBuffer, and indicate them one by one using NdisFIndicateReceiveNetBufferLists.
This caused a problem: large file transfers over SMB (copying files from shares) became significantly slower.
As a workaround, I now chain all of the NBLs that are OK together (instead of sending them one by one), and then indicate all of them at once via NdisFIndicateReceiveNetBufferLists.
My question is, will this change cause any issue? Any difference between sending X number of NBLs one by one vs chaining them together and sending all of them at once? (since most of them might be related to different flows/apps)
Also, the benefit of chaining packets together is much greater in multi packet receive compared to multi packet send via FilterSendNetBufferLists, why is that?
A NET_BUFFER represents one single network frame. (With some appropriate hand-waving for LSO/RSC.)
A NET_BUFFER_LIST is a collection of related NET_BUFFERs. Each NET_BUFFER on the same NET_BUFFER_LIST belongs to the same "traffic flow" (more on that later); they all have the same metadata and will have the same offloads performed on them. So we use the NET_BUFFER_LIST to group related packets and to have them share metadata.
The datapath generally operates on batches of multiple NET_BUFFER_LISTs. The entire batch is only grouped together for performance reasons; there's not a lot of implied relation between multiple NBLs within a batch. Exception: most datapath routines take a Flags parameter that can hold flags that make some claims about all the NBLs in a batch, for example, NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE.
So to summarize, you can indeed safely group multiple NET_BUFFER_LISTs into a single indication, and this is particularly important for perf. You can group unrelated NBLs together, if you like. However, if you are combining batches of NBLs, make sure you clear out any NDIS_XXX_FLAGS_SINGLE_XXX style flags. (Unless, of course, you know that the flags' promise still holds. For example, if you're combining 2 batches of NBLs that both had the NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE flag, and if you verify that the first NBL in each batch has the same EtherType, then it is actually safe to preserve the NDIS_RECEIVE_FLAGS_SINGLE_ETHER_TYPE flag.)
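The flag rule for combined batches can be modelled in a few lines. This is a toy Python model, not driver code: Nbl and Batch are hypothetical stand-ins for a NET_BUFFER_LIST and a batch of them, and the flag value is a placeholder (the real constant comes from the NDIS headers).

```python
from dataclasses import dataclass, field
from typing import List

# Placeholder bit for illustration; the real value comes from the NDIS headers.
SINGLE_ETHER_TYPE = 0x1


@dataclass
class Nbl:
    """Toy stand-in for a NET_BUFFER_LIST: just an EtherType and a payload."""
    ether_type: int
    payload: bytes = b""


@dataclass
class Batch:
    """Toy stand-in for a chain of NBLs plus the flags that describe the chain."""
    nbls: List[Nbl] = field(default_factory=list)
    flags: int = 0


def merge_batches(a: Batch, b: Batch) -> Batch:
    """Chain two batches into one indication, keeping only flags that still hold."""
    merged = Batch(nbls=a.nbls + b.nbls, flags=0)
    both_single = (a.flags & SINGLE_ETHER_TYPE) and (b.flags & SINGLE_ETHER_TYPE)
    # The SINGLE_ETHER_TYPE promise survives only if both batches made it
    # and the first NBL of each batch carries the same EtherType.
    if both_single and a.nbls and b.nbls and a.nbls[0].ether_type == b.nbls[0].ether_type:
        merged.flags |= SINGLE_ETHER_TYPE
    return merged
```

Merging two IPv4-only batches keeps the flag; merging an IPv4 batch with an ARP batch clears it, which is the conservative default described above.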
However note that you generally cannot combine multiple NET_BUFFERs into the same NET_BUFFER_LIST, unless you control the application that generated the payload and you know that the NET_BUFFERs' payloads belong to the same traffic flow. The exact semantics of a traffic flow are a little fuzzy down in the NDIS layer, but you can imagine it means that any NDIS-level hardware offload can safely treat each packet as the same. For example, an IP checksum offload needs to know that each packet has the same pseudo-header. If all the packets belong to the same TCP or UDP socket, then they can be treated as the same flow.
Also, the benefit of chaining packets together is much greater in multi packet receive compared to multi packet send via FilterSendNetBufferLists, why is that?
Receive is the expensive path, for two reasons. First, the OS has to spend CPU to demux the raw stream of packets coming in from the network. The network could send us packets from any random socket, or packets that don't match any socket at all, and the OS has to be prepared for any possibility. Secondly, the receive path handles untrusted data, so it has to be cautious about parsing.
In comparison, the send path is super cheap: the packets just fall down to the miniport driver, who sets up a DMA and they're blasted to hardware. Nobody in the send path really cares what's actually in the packet (the firewall already ran before NDIS saw the packets, so you don't see that cost; and if the miniport is doing any offload, that's paid on the hardware's built-in processor, so it doesn't show up on any CPU you can see in Task Manager.)
So if you take a batch of 100 packets and break it into 100 calls of 1 packet on the receive path, the OS has to grind through 100 calls of some expensive parsing functions. Meanwhile, 100 calls through the send path isn't great, but it'll be only a fraction of the CPU costs of the receive path.
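A back-of-the-envelope model makes the shape of this argument visible. The cost figures below are hypothetical, in arbitrary units, and chosen only for illustration; they are not measurements of any real network stack.

```python
def datapath_cost(num_packets, num_calls, per_call_cost, per_packet_cost):
    """Total cost = fixed overhead per datapath call + work per packet."""
    return num_calls * per_call_cost + num_packets * per_packet_cost


# Hypothetical costs in arbitrary units (assumptions, not measurements).
RECEIVE_PER_CALL = 50   # demux and cautious parsing make each receive call expensive
SEND_PER_CALL = 5       # the send path mostly hands packets down to the miniport
PER_PACKET = 1

if __name__ == "__main__":
    for name, per_call in (("receive", RECEIVE_PER_CALL), ("send", SEND_PER_CALL)):
        batched = datapath_cost(100, 1, per_call, PER_PACKET)
        one_by_one = datapath_cost(100, 100, per_call, PER_PACKET)
        print(name, "batched:", batched, "one by one:", one_by_one)
```

With these made-up numbers, splitting 100 packets into 100 calls costs far more on the receive path than on the send path, simply because the per-call overhead is assumed to be larger there.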

What are down sides of using ZeroMQ for sending large messages (up to gigabytes)?

I found that people don't recommend sending large messages with ZeroMQ. But it is a real headache for me to split the data (it is somewhat twisted). Why is this not recommended? Is there some specific reason? Can it be overcome?
Why is this not recommended?
Resources ...
Even the best Zero-Copy implementation has to have spare resources to store the payloads in several principally independent, separate locations:
|<fatMessageNo1>|
|...............|__________________________________________________________ RAM
|...............|<fatMessageNo1>|
|...............|...............|__________________Context().Queue[peerNo1] RAM
|...............|...............|<fatMessageNo1>|
|...............|...............|...............|________O/S.Buffers[L3/L2] RAM
Can it be overcome?
Sure, do not send Mastodon-sized-GB+ messages. You may use any kind of off-RAM representation of the data and send just a lightweight reference that allows a remote peer to access such an immense beast.
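One way to read "lightweight reference" is a small message that names where the payload lives and how to verify it. The sketch below assumes both peers can reach the same file system (for example shared storage); the field names path, size and sha256 are made up for illustration, and the ZeroMQ send and receive calls are deliberately left out, since the reference is just a short byte string that can travel as an ordinary message.

```python
import hashlib
import json
import os
import tempfile


def make_reference(path):
    """Describe a payload stored on shared storage with a small JSON message."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    ref = {"path": path, "size": os.path.getsize(path), "sha256": digest.hexdigest()}
    return json.dumps(ref).encode("utf-8")


def load_from_reference(message):
    """Fetch the payload named by a reference and verify it before use."""
    ref = json.loads(message.decode("utf-8"))
    with open(ref["path"], "rb") as f:
        data = f.read()
    if len(data) != ref["size"] or hashlib.sha256(data).hexdigest() != ref["sha256"]:
        raise ValueError("payload does not match its reference")
    return data


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"atoms and bonds" * 1000)
    message = make_reference(tmp.name)          # this is what would be sent
    print(len(message), "bytes of reference for", os.path.getsize(tmp.name), "bytes of data")
    assert load_from_reference(message) == b"atoms and bonds" * 1000
    os.remove(tmp.name)
```

The receiver still needs the whole data set before it can start work, but the messaging layer only ever carries a message of a few hundred bytes.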
Many new questions added via comment:
I was concerned more about something like transmission failure: what will zeromq do (will it try to retransmit automatically, will it be transparent for me etc). RAM is not so crucial - servers can have more than enough of it and the service that we write is not intended to have a huge number of clients at the same time. The data that I talk about is very interrelated (we have molecules/atoms info and bonds between them) so it is impossible to send a chunk of it and use it - we need it all)) – Paul
You may be already aware that ZeroMQ is working under a Zen-of-Zero, where also a zero-warranty got its place.
So, a ZeroMQ dispatched message will either be delivered "through" error-free, or not delivered at all. This is a great pain-saver, as your code will receive only a fully-protected content atomically, so no tortured trash will ever reach your target post-processing. Higher level soft-protocol handshaking allows one to remain in control, enabling mitigations of non-delivered cases from higher levels of abstraction, so if your design appetite and deployment conditions permit, one can harness brute force and send whatever-[TB]-BLOBs, at one's own risk of blocking both local and infrastructure resources, if others permit and don't mind ( ... but never on my advice :o) )
Error-recovery self-healing - from lost-connection(s) and similar real-life issues - is handled if configuration, resources and timeouts permit, so a lot of troubles with keeping L1/L2/L3-ISO-OSI layers issues are efficiently hidden from user-apps programmers.

Is ZeroMQ slower than boost asio?

I am trying to write a network transfer application.
The data is binary data and each packet size is mostly 800KB.
The client produces 1000 messages per second. I want to transfer the data as quickly as possible.
When I use ZeroMQ, the speed hits 350 data per second, but the boost asio hits 400(or more) per second.
As you can see the performance of both methods is not good.
The pattern used for ZeroMQ is a PUSH/PULL pattern, the boost asio is simple sync I/O.
Q1: I want to ask, is ZeroMQ only suitable for small messages?
Q2: Is there a way to improve the ZeroMQ speed?
Q3: If ZeroMQ can't, please advise a good method or library to improve this kind of data transfer.
Data Rate
You're attempting to move 800 MByte/second, which is about 6.4 Gbit/s. What sort of connection is this? For a tcp:// transport-class it'd have to be something pretty rapid, e.g. at least 10 Gbit/s Ethernet, which is fairly exotic.
So I'm presuming that it's an ipc:// transport-class connection. In which case you can get an improvement, using ZeroMQ zerocopy functions, which saves copying the data repeatedly.
With a normal transfer, you have to copy data into a zmq message, that has to be copied into an ipc pipe, copied out again, and copied back into a new zmq message at the receiving end. All that copying requires 4 x 800 MByte/sec = 3.2 GByte/sec of memory bandwidth which, by the time cache conflicts have come into play, is an appreciable percentage of the total memory bandwidth of a typical PC system. Using zerocopy should cut that in half.
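The arithmetic behind these figures is easy to check. The sketch below takes "800KB" as 800,000 bytes (an assumption; with 1 KB = 1024 bytes the results are about 2% higher) and counts four copies per message for the non-zero-copy ipc:// case described above.

```python
MESSAGE_BYTES = 800 * 1000      # "mostly 800KB" per message, taken as 800,000 bytes
MESSAGES_PER_SECOND = 1000      # production rate stated in the question
COPIES = 4                      # copies counted for a non-zero-copy ipc:// transfer


def payload_rate_bytes(message_bytes, messages_per_second):
    """Payload bytes that must be moved per second."""
    return message_bytes * messages_per_second


def link_rate_gbit(rate_bytes):
    """Same rate expressed in Gbit/s (payload only, no protocol overhead)."""
    return rate_bytes * 8 / 1e9


def copy_bandwidth_bytes(rate_bytes, copies):
    """Memory traffic generated by copying every payload `copies` times."""
    return rate_bytes * copies


if __name__ == "__main__":
    rate = payload_rate_bytes(MESSAGE_BYTES, MESSAGES_PER_SECOND)
    print("payload rate:", rate / 1e6, "MByte/s")
    print("link rate:", link_rate_gbit(rate), "Gbit/s")
    print("copy traffic:", copy_bandwidth_bytes(rate, COPIES) / 1e9, "GByte/s")
```

So the target is 800 MByte/s of payload, roughly 6.4 Gbit/s on the wire before protocol overhead, and about 3.2 GByte/s of memory traffic if every message is copied four times.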
Alternative to Zero Copy - Zero Transfer
If you are using ipc://, then consider not sending data through the sockets, but sending references to the data through the sockets.
I have previously blended use of zmq and a semaphore locked C++ std::queue, using zmq simply for its pattern ( PUSH/PULL in my case ), the std::queue to carry shared pointers to data, and leave the data still. The sender locks the queue, puts a shared pointer into it, and then sends a simple message ( e.g. "1" ) through a zmq socket. The recipient reads the "1" and uses that as a cue to lock the queue and pull a shared pointer off it. Thus a shared pointer to data has been transferred from one thread to another in a ZMQ pattern via a std::queue, but the data itself has stayed still. All I've done is pass ownership of the data between threads. It works so long as the shared pointer that the sender has goes out of scope immediately after sending and is not used by the sender to modify or access the data.
PUSH/PULL is not too bad to deal with - each message goes to only one recipient. It would take more effort to make such a blend with PUB/SUB, and received messages would have to be treated as read-only because each recipient would have a shared pointer to the same block of data as everyone else.
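The hand-off described above can be illustrated with a small Python analogue. This is not ZeroMQ code: socket.socketpair() stands in for the PUSH/PULL socket pair, a lock-protected deque stands in for the C++ queue, and ordinary Python object references play the role of shared pointers. All function names are made up for the sketch.

```python
import socket
import threading
from collections import deque

shared_queue = deque()          # carries references to the data, not copies
queue_lock = threading.Lock()


def sender(notify_sock, payload):
    """Queue a reference to the payload, then send a one-byte cue."""
    with queue_lock:
        shared_queue.append(payload)
    notify_sock.sendall(b"1")


def receiver(notify_sock, results):
    """Wait for the cue, then take ownership of the queued reference."""
    token = notify_sock.recv(1)
    if token == b"1":
        with queue_lock:
            results.append(shared_queue.popleft())


def demo():
    """Pass an 800 KB buffer between two threads without copying it."""
    tx, rx = socket.socketpair()
    payload = bytearray(800 * 1000)
    results = []
    worker = threading.Thread(target=receiver, args=(rx, results))
    worker.start()
    sender(tx, payload)
    worker.join()
    tx.close()
    rx.close()
    return payload, results


if __name__ == "__main__":
    payload, results = demo()
    print("same object on both sides:", results[0] is payload)
```

Only one byte crosses the socket per message; the buffer itself never moves, which is why the sender must stop touching it once the cue has been sent.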
Message Size
I've no idea how big a chunk ZMTP transfers at a time, but I'd guess that it's relatively efficient in terms of protocol:data ratio.

Performance benefit of multiple pending reads or multiple pending writes per individual TCP socket?

IOCP is great for many connections, but what I'm wondering is, is there a significant benefit to allowing multiple pending receives or multiple pending writes per individual TCP socket, or am I not really going to lose performance if I just allow one pending receive and one pending send per each socket (which really simplifies things, as I don't have to deal with out-of-order completion notifications)?
My general use case is 2 worker threads servicing the IOCP port, handling several connections (more than 2 but less than 10), where the transmitted data is either of two forms: one is frequent very small messages (which I combine if possible manually, but generally need to send often enough that the per-send data is still pretty small), and the other is transferring large files.
Multiple pending recvs tend to be of limited use unless you plan to turn off the network stack's recv buffering in which case they're essential. Bear in mind that if you DO decide to issue multiple pending recvs then you must do some work to make sure you process them in the correct sequence. Whilst the recvs will complete from the IOCP in the order that they were issued thread scheduling issues may mean that they are processed by different I/O threads in a different order unless you actively work to ensure that this is not the case, see here for details.
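One common way to do that sequencing work is to tag each recv with a sequence number when it is issued and hold back any completion that arrives ahead of its turn. The sketch below shows the idea in Python rather than with real IOCP calls; InOrderProcessor and its method names are hypothetical.

```python
import threading


class InOrderProcessor:
    """Process recv completions in issue order, whichever thread reports them."""

    def __init__(self, handler):
        self._handler = handler
        self._lock = threading.Lock()
        self._next_issue = 0      # sequence number for the next recv to be issued
        self._next_process = 0    # sequence number that must be processed next
        self._pending = {}        # completions that arrived ahead of their turn

    def issue(self):
        """Call when posting a recv; returns the sequence number to attach to it."""
        with self._lock:
            seq = self._next_issue
            self._next_issue += 1
            return seq

    def complete(self, seq, data):
        """Call from any I/O thread when the recv tagged `seq` has completed."""
        with self._lock:
            self._pending[seq] = data
            # Drain every completion that is now contiguous with what was processed.
            while self._next_process in self._pending:
                self._handler(self._pending.pop(self._next_process))
                self._next_process += 1


if __name__ == "__main__":
    received = []
    proc = InOrderProcessor(received.append)
    first, second, third = proc.issue(), proc.issue(), proc.issue()
    proc.complete(second, b"world")   # arrives early, is held back
    proc.complete(first, b"hello ")   # releases both buffers in order
    proc.complete(third, b"!")
    print(b"".join(received))
```

The handler runs under the lock here to keep the sketch short; a real server would more likely hand the ordered buffers to a per-connection queue.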
Multiple pending sends are more useful to fully utilise the TCP connection's available TCP window (and send at the maximum rate possible) but only if you have lots of data to send, only if you want to send it as efficiently as you can and only if you take care to ensure that you don't have too many pending writes. See here for details of issues that you can come up against if you don't actively manage the number of pending writes.
For less than 10 connections and TCP, you probably won't feel any difference even at high rates. You may see better performance by simply growing your buffer sizes.
Queuing up I/Os is going to help if your application is bursty and expensive to process. Basically it lets you perform the costly work up front so that when the burst comes in, you're using a little of the CPU on I/O and as much of it on processing as possible.

Gauging a web browser's bandwidth

Is it possible to gauge a web browsers upload and/or download speed by monitoring normal http requests? Ideally a web application would be able to tell the speed of a client without any modifications and without client-side scripting like JavaScript/Java/Flash. So even if a client was accessing the service with a library like Curl it would still work. If this is possible, how? If its not possible, why? How accurate can this method be?
(If it helps assume PHP/Apache, but really this is a platform independent question. Also being able to gauge the upload speed is more important to me.)
Overview
You're asking for what is commonly called "passive" available bandwidth (ABW) measurement along a path (versus measuring a single link's ABW). There are a number of different techniques [1] that estimate bandwidth using passive observation, or low-bandwidth "Active" ABW probing techniques. However, the most common algorithms used in production services are active ABW techniques; they observe packet streams from two different end-points.
I'm most familiar with yaz, which sends packets from one side and measures variation in delay on the other side. The one-sided passive path ABW measurement techniques are considered more experimental; there aren't solid implementations of the algorithms AFAIK.
Discussion
The problem with the task you've asked for is that all non-intrusive [2] ABW measurement techniques rely on timing. Sadly, timing is a very tricky thing when working with http...
You have to deal with the reality of object caching (for instance, akamai) and http proxies (which terminate your TCP session prematurely and often spoof the web-server's IP address to the client).
You have to deal with web-hosts which may get intermittently slammed.
Finally, active ABW techniques rely on a structured packet stream (wrt packet sizes and timing), unlike what you see in a standard http transfer.
Summary
In summary, unless you set up dedicated client / server / protocol just for ABW measurement, I think you'll be rather frustrated with the results. You can keep your ABW socket connections on TCP/80, but the tools I have seen won't use http [3].
Editorial note: My original answer suggested that ABW with http was possible. On further reflection, I changed my mind.
END-NOTES:
[1] See Sally Floyd's archive of end-to-end TCP/IP bandwidth estimation tools
[2] The most common intrusive techniques (such as speedtest.net) use a flash or java applet in the browser to send & receive 3-5 parallel TCP streams to each endpoint for 20-30 seconds. Add the streams' average throughput (not including lost packets requiring retransmission) over time, and you get that path's tx and rx ABW. This is obviously pretty disruptive to VoIP calls, or any downloads in progress. Disruptive measurements are called bulk transfer capacity (BTC). See RFC 3148: A Framework for Defining Empirical Bulk Transfer Capacity Metrics. BTC measurements often use HTTP, but BTC doesn't seem to be what you're after.
[3] That is good, since it removes the risk of in-line caching by denying http caches an object to cache; although some tools (like yaz) are udp-only.
Due to the way TCP connections adapt to available bandwidth, no, this is not possible. Requests are small and typically fit within one or two packets. You need at least a dozen full-size packets to get even a coarse bandwidth estimate, since TCP first has to scale up to available bandwidth ("TCP slow start"), and you need to average out jitter effects. If you want any accuracy, you're probably talking hundreds of packets required. That's why upload rate measurement scripts typically transfer several megabytes of data.
OTOH, you might be able to estimate round-trip delay from the three-way handshake and the timing of acks. But download speed has at least as much impact as upload speed.
There's no support in javascript or any browser component to measure upload performance.
The only way I can think of is if you are uploading to a page/http handler, and the page is receiving the incoming bytes, it can measure how many bytes it is receiving per second. Then store that in some application wide dictionary with a session ID.
Then from the browser you can periodically poll the server to get the value in the dictionary using the session ID and show it to the user. This way you can tell what the upload speed is.
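The server-side bookkeeping for this idea might look something like the following. It is a framework-independent sketch: UploadMeter and its methods are hypothetical names, the upload handler is assumed to call record() for every chunk it reads from the request body, and the clock is injectable only so the class can be exercised without real waiting.

```python
import time


class UploadMeter:
    """Track bytes received per upload session and report an average rate."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sessions = {}  # session_id -> [start_time, last_time, total_bytes]

    def start(self, session_id):
        """Call when the upload request begins, before any body bytes are read."""
        now = self._clock()
        self._sessions[session_id] = [now, now, 0]

    def record(self, session_id, num_bytes):
        """Call for every chunk of the request body the handler receives."""
        entry = self._sessions[session_id]
        entry[1] = self._clock()
        entry[2] += num_bytes

    def bytes_per_second(self, session_id):
        """Average receive rate so far; 0.0 until some time has elapsed."""
        start, last, total = self._sessions[session_id]
        elapsed = last - start
        return total / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    meter = UploadMeter()
    meter.start("session-abc")
    for _ in range(3):
        time.sleep(0.05)                 # pretend to wait for the next chunk
        meter.record("session-abc", 64 * 1024)
    print(round(meter.bytes_per_second("session-abc")), "bytes/s")
```

Note that this measures the rate at which the server receives the body, so any proxy or buffering layer between the client and the handler will distort the figure.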
You can use AJAXOMeter, a JavaScript library which measures your up- and download speed. You can see a live demo here.
That is not feasible in general, as in-bound and out-bound bandwidth is frequently not symmetric. Different ISPs have significantly different ratios here, which can vary even by time of day.

Resources