OpenNMS - Monitor VPN Tunnel Traffic

Is it possible to generate a SINGLE graph that shows traffic for all active VPN tunnels/sessions? (One line per tunnel; for example, 10 tunnels would be represented by 10 individual lines.)
Note:
On the agent device, the active tunnels and their traffic counters are stored in a table, which I'm retrieving through 'snmpwalk'.
I've created a 'resourceType' and associated it with a group/mibObj.
OpenNMS creates multiple subdirectories and the relevant RRD files.
Each subdirectory (one per session) has an 'alias' RRD file.
To generate such a graph I need to access the 'alias' files (all of which have the same name) located under different paths. Does OpenNMS have a way to do so?
Thank you.

There are a couple of ways to do this.
1) The traditional way
First it helps to understand how OpenNMS stores data in the RRD files. Usually they are stored in a directory structure that starts at /opt/opennms/share/rrd/snmp. The next directory is usually the nodeid for the device. Anything that is a "node" level value (i.e. one that occurs only once per device) is stored there. Interface level data is stored in a subdirectory made up of the interface description and its MAC address. Generic resource types (like the one you created) are stored in a subdirectory starting with the resource type name.
OpenNMS defines graphs in the files found in snmp-graph.properties.d. The two values of interest to this discussion are "type" and "columns". The "type" tells OpenNMS where to look for the RRD file: nodeSnmp is the nodeid directory, interfaceSnmp would be the interface directory, and "resourceType" would be the name of the generic resource. The "columns" value tells OpenNMS to look for a file with that name.
For example, if I have:
report.name.type=nodeSnmp
report.name.columns=columnA,columnB
Then OpenNMS is looking for two files in the device's node directory called columnA.rrd and columnB.rrd. If they exist, it will try to run the report.
So, a kludge is that you can create symlinks and then use those names to create an RRD report.
I usually only do this for known or important values. For example, let's say that I have three peer points: New York, Chicago and San Francisco. I could then go to a particular node directory and symlink the ifHCInOctets.rrd file for the NYC router to NYC-in.rrd and the ifHCOutOctets.rrd for that router to NYC-out.rrd. Rinse and repeat for ORD and SFO. Then you just create a report with a type of "nodeSnmp" and the columns of NYC-in,NYC-out,ORD-in,ORD-out,SFO-in and SFO-out.
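Purely as an illustration of the symlink kludge, here is a small Python sketch; the nodeid directory, interface directory names and peer labels are all made up, and you could just as easily run the equivalent ln -s commands by hand:

import os

node_dir = "/opt/opennms/share/rrd/snmp/42"      # hypothetical nodeid directory
peers = {
    "NYC": "ge_0_0_1-0012f2a3b4c5",              # hypothetical interface directories
    "ORD": "ge_0_0_2-0012f2a3b4c6",
    "SFO": "ge_0_0_3-0012f2a3b4c7",
}
for label, intf_dir in peers.items():
    for direction, column in (("in", "ifHCInOctets"), ("out", "ifHCOutOctets")):
        target = os.path.join(node_dir, intf_dir, column + ".rrd")
        link = os.path.join(node_dir, f"{label}-{direction}.rrd")
        if not os.path.islink(link):
            os.symlink(target, link)             # e.g. NYC-in.rrd -> .../ifHCInOctets.rrd

With links like these in place, a graph definition with report.name.type=nodeSnmp and report.name.columns=NYC-in,NYC-out,ORD-in,ORD-out,SFO-in,SFO-out will find the data by column name.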
As I mentioned, it is a kludge, which is why you can use:
2) The Grafana method
OpenNMS was the first third party plugin for the Grafana data visualization tool. If you set up Grafana and tie it to your OpenNMS instance you can create a template to do what you want. There is a good post about how to do that here: http://www.jessewhite.ca/opennms/grafana/2016/04/15/opennms-grafana-template-queries.html

Related

SphinxSearch - Different Nodes using shared data

We are in the process of building a SphinxSearch cluster using Amazon EC2 instances. We ran a sample test with several instances using the same shared file system (Elastic File System). Our idea is that a cluster might have more than 10 nodes, but we could use a single instance to index documents, keep the index on Elastic File System, and share it among multiple nodes for reading.
Our test worked fine, but is there technically any problem with this approach (locking issues, etc.)?
Can someone please advise on this?
Thanks in Advance
If you're ok with having N copies of the index you can do as follows:
build the index in one place, in a temp folder
rename the files so they include .new.
distribute the index to all the other places using rsync or whatever you like. Some even do broadcasting with UFTP
rotate the indexes at once in all the places by sending HUP to the searchd instances, or better by doing RELOAD INDEX (http://docs.manticoresearch.com/latest/html/sphinxql_reference/reload_index_syntax.html); it normally takes only a few ms, so you can say that your new index replaces the previous one simultaneously on all the nodes
Previously (and perhaps still in Sphinx) there was an issue with rotating the index (either by --rotate or RELOAD) while it was processing a long query (the rotate just had to wait). It was fixed in Manticore Search recently.
This is a tried-and-true solution that people have used in production for years. But if you really want to share the same files among multiple searchd instances, you can softlink all the files except .spl. To rotate the index in the searchd instances that use the links (not the actual files) you'll then need to restart those instances, which doesn't look good in general, but in some special cases it may still be a good solution.
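For what it's worth, here is a rough Python sketch of that build/distribute/reload loop. The index name, paths, node addresses and config file are hypothetical; it assumes searchd's SphinxQL (MySQL-protocol) listener is on port 9306 and uses pymysql as the client, and it skips the .new renaming that the HUP-based rotation relies on:

import subprocess
import pymysql

INDEX = "docs"                        # hypothetical index name
SOURCE_DIR = "/tmp/index_build"       # temp folder the build config writes into
TARGET_DIR = "/var/lib/manticore"     # index directory on every search node
NODES = ["10.0.0.11", "10.0.0.12"]    # hypothetical search nodes

# 1) Build the index locally (the build config is assumed to point its
#    index path at SOURCE_DIR).
subprocess.run(["indexer", "--config", "/etc/manticore/build.conf", "--all"],
               check=True)

# 2) Copy the freshly built files to every node.
for node in NODES:
    subprocess.run(["rsync", "-a", f"{SOURCE_DIR}/", f"{node}:{TARGET_DIR}/"],
                   check=True)

# 3) Ask each searchd to pick up the new files; RELOAD INDEX normally takes
#    only a few milliseconds, so all nodes switch almost simultaneously.
for node in NODES:
    conn = pymysql.connect(host=node, port=9306, user="")
    try:
        with conn.cursor() as cur:
            cur.execute(f"RELOAD INDEX {INDEX} FROM '{TARGET_DIR}'")
    finally:
        conn.close()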

I'm not seeing datasets in the z/OS UNIX shell environment

Just to give some context: I had the same doubt as our colleague #JOB in the thread "Unable to Access PDS"; that one was solved.
I have one related question: why, when we are in that Linux-like environment (reached from TSO with the OMVS command), are we not able to see our datasets and PDSs?
Or is that possible?
You have to understand that z/OS datasets and OMVS files live in two different worlds:
z/OS datasets do have a name that consists of a series of qualifiers but are not really organized in a hierarchical manner; they are distributed over a cluster of (virtual) disks and you have a couple of catalogs to find them again. There are no such things as directories: while you might have a dataset named MY.COOL.DSN, there might not be an object that is called MY.COOL, or it might be an ordinary dataset as well.
OMVS files on the other hand live in a filesystem that has a hierarchical structure. Each file might reside in a directory, that might be in another directory etc. In total you end up with a tree of directories with a single root-directory and files as leaves.
To realize this in OMVS you create z/OS datasets that contain hierarchical filesystems - either HFS or z/FS - each with its own directory-tree. Upon installation one of those is mounted as the root-filesystem and addressed via / and you might mount other HFS or z/FS filesystems on any point further down the directory-tree, so that it is added as a subtree.
If you are familiar with Linux, you can think of HFS and z/FS datasets as disk partitions that can be mounted into your system.
Long story short: when navigating via cd and ls you are moving through the directory-tree that consists of all the mounted z/FS and HFS datasets, but there is no defined place that contains the ordinary z/OS datasets - and there can't be since they are not organized in a tree-structure.

Does hadoop use folders and subfolders

I have started learning Hadoop and just completed setting up a single node as demonstrated in the Hadoop 1.2.1 documentation.
Now I was wondering:
When files are stored in this type of FS, should I use a hierarchical mode of storage - like folders and sub-folders as I do in Windows - or are files just written in as long as they have a unique name?
Is it possible to add new nodes to the single node setup if, say, somebody were to use it in a production environment? Or, simply, can a single node be converted to a cluster without loss of data by just adding more nodes and editing the configuration?
This one I can google but what the hell! I am asking anyway, sue me. What is the maximum number of files I can store in HDFS?
When files are stored in this type of FS, should I use a hierarchical mode of storage - like folders and sub-folders as I do in Windows - or are files just written in as long as they have a unique name?
Yes, use the directories to your advantage. Generally, when you run jobs in Hadoop, if you pass along a path to a directory, it will process all files in that directory. So.. you really have to use them anyway.
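As a small, hedged illustration of that (the example jar, job name and paths are hypothetical and assume a Hadoop 1.x install with the hadoop CLI on the PATH):

import subprocess

def sh(*args):
    # Thin wrapper so each CLI call fails loudly if something goes wrong.
    subprocess.run(list(args), check=True)

# Lay the data out hierarchically, e.g. one directory per day.
sh("hadoop", "fs", "-mkdir", "/logs/2014/01/15")
sh("hadoop", "fs", "-put", "access.log", "/logs/2014/01/15/")

# Pointing the job at the directory (not at individual files) makes it
# process everything inside that directory.
sh("hadoop", "jar", "hadoop-examples-1.2.1.jar", "wordcount",
   "/logs/2014/01/15", "/output/wordcount-2014-01-15")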
Is it possible to add new nodes to the single node setup if, say, somebody were to use it in a production environment? Or, simply, can a single node be converted to a cluster without loss of data by just adding more nodes and editing the configuration?
You can add/remove nodes as you please (unless by single-node, you mean pseudo-distributed... that's different)
This one I can google but what the hell! I am asking anyway, sue me. What is the maximum number of files I can store in HDFS?
Lots
To expand on climbage's answer:
The maximum number of files is a function of the amount of memory available to your Name Node server. There is some loose guidance that each metadata entry in the Name Node requires somewhere between 150-200 bytes of memory (it varies by version).
From this you can extrapolate: given the number of files and the number of blocks per file (which varies with file size and block size), you can estimate, for a given memory allocation (2G / 4G / 20G etc.), how many metadata entries - and therefore files - you can store.
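As a back-of-envelope illustration of that estimate (the per-entry size and blocks-per-file figures below are assumptions, not measurements):

BYTES_PER_ENTRY = 175                   # midpoint of the 150-200 byte guidance above
BLOCKS_PER_FILE = 1.5                   # assumed average blocks per file
ENTRIES_PER_FILE = 1 + BLOCKS_PER_FILE  # one file entry plus its block entries

for heap_gb in (2, 4, 20):
    heap_bytes = heap_gb * 1024 ** 3
    max_files = heap_bytes / (ENTRIES_PER_FILE * BYTES_PER_ENTRY)
    print(f"{heap_gb:>3} GB heap -> roughly {max_files:,.0f} files")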

Get file offset on disk/cluster number

I need to get some information about where a file is physically located on an NTFS disk - an absolute offset, a cluster ID, anything.
I need to scan the disk twice: once to get the allocated files, and a second time opening the partition directly in RAW mode to try to find the rest of the data (from deleted files). I need a way to tell that the data I find in the raw scan is the same data I've already handled as a file. Since I'm scanning the disk in raw mode, the offset of the data I find can somehow be converted to the offset of the file (given information about the disk geometry). Is there any way to do this? Other solutions are accepted as well.
Now I'm playing with FSCTL_GET_NTFS_FILE_RECORD, but can't make it work at the moment and I'm not really sure it will help.
UPDATE
I found the following function
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=vs.85).aspx
It returns a structure that contains the nFileIndexHigh and nFileIndexLow members.
The documentation says:
The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID. Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
I don't really understand what this is. I can't connect it to the physical location of the file. Is it possible to extract this file ID from the MFT later?
UPDATE
Found this:
This identifier and the volume serial number uniquely identify a file. This number can change when the system is restarted or when the file is opened.
This doesn't satisfy my requirements, because I'm going to open the file, and the fact that the ID might change doesn't make me happy.
Any ideas?
Use the Defragmentation IOCTLs. For example, FSCTL_GET_RETRIEVAL_POINTERS will tell you the extents which contain file data.
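For illustration, here is a rough Python/ctypes sketch of calling FSCTL_GET_RETRIEVAL_POINTERS. It is Windows-only, does minimal error handling, and uses a fixed-size extent buffer (a heavily fragmented file would need a bigger buffer or a loop restarting at the last returned VCN); treat it as a sketch rather than production code.

import ctypes
from ctypes import wintypes

FSCTL_GET_RETRIEVAL_POINTERS = 0x00090073
OPEN_EXISTING = 3
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
MAX_EXTENTS = 256                      # arbitrary buffer size for this sketch

class STARTING_VCN_INPUT_BUFFER(ctypes.Structure):
    _fields_ = [("StartingVcn", ctypes.c_longlong)]

class EXTENT(ctypes.Structure):
    _fields_ = [("NextVcn", ctypes.c_longlong),
                ("Lcn", ctypes.c_longlong)]

class RETRIEVAL_POINTERS_BUFFER(ctypes.Structure):
    _fields_ = [("ExtentCount", wintypes.DWORD),
                ("StartingVcn", ctypes.c_longlong),
                ("Extents", EXTENT * MAX_EXTENTS)]

def get_extents(path):
    """Return a list of (starting VCN, cluster count, starting LCN) tuples."""
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.CreateFileW(path, 0,          # no data access needed
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  None, OPEN_EXISTING, 0, None)
    if handle == -1:                                # INVALID_HANDLE_VALUE
        raise ctypes.WinError()
    try:
        inp = STARTING_VCN_INPUT_BUFFER(0)          # start at the first VCN
        out = RETRIEVAL_POINTERS_BUFFER()
        returned = wintypes.DWORD(0)
        ok = kernel32.DeviceIoControl(handle, FSCTL_GET_RETRIEVAL_POINTERS,
                                      ctypes.byref(inp), ctypes.sizeof(inp),
                                      ctypes.byref(out), ctypes.sizeof(out),
                                      ctypes.byref(returned), None)
        if not ok:
            raise ctypes.WinError()
    finally:
        kernel32.CloseHandle(handle)
    # Each extent covers clusters from the previous VCN up to NextVcn and
    # starts at logical cluster Lcn on the volume; Lcn == -1 marks a sparse hole.
    extents, vcn = [], out.StartingVcn
    for i in range(out.ExtentCount):
        e = out.Extents[i]
        extents.append((vcn, e.NextVcn - vcn, e.Lcn))
        vcn = e.NextVcn
    return extents

Multiplying an LCN by the volume's cluster size (plus the partition offset) gives the absolute byte position on disk, which is the kind of mapping the raw scan needs.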

Architecture - How to efficiently crawl the web with 10,000 machines?

Let's pretend I have a network of 10,000 machines. I want to use all those machines to crawl the web as fast as possible. All pages should be downloaded only once. In addition there must be no single point of failure, and we must minimize the amount of communication required between machines. How would you accomplish this?
Is there anything more efficient than using consistent hashing to distribute the load across all machines and minimize communication between them?
Use a distributed map-reduce system like Hadoop to divide the workspace.
If you want to be clever, or are doing this in an academic context, then try nonlinear dimension reduction.
The simplest implementation would probably be to use a hash function on the namespace key, e.g. the domain name or URL. Use Chord to assign each machine a subset of the hash values to process.
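As a minimal sketch of the hash-partitioning idea (this is a simple consistent-hash ring, not a real Chord implementation; the machine names, virtual-node count and example URL are arbitrary):

import bisect
import hashlib
from urllib.parse import urlparse

def _h(key):
    # Stable 128-bit hash of a string key.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, machines, vnodes=64):
        # Each machine gets several virtual points on the ring so the key
        # space stays reasonably balanced when machines join or leave.
        self._points = sorted((_h(f"{m}#{i}"), m)
                              for m in machines for i in range(vnodes))
        self._keys = [h for h, _ in self._points]

    def owner(self, url):
        # Hash on the domain so a single machine owns a whole site, which
        # also makes per-site politeness easy to enforce locally.
        domain = urlparse(url).netloc
        i = bisect.bisect(self._keys, _h(domain)) % len(self._keys)
        return self._points[i][1]

# 10,000 crawler machines, a handful of virtual nodes each for the demo.
ring = HashRing([f"crawler-{n}" for n in range(10000)], vnodes=8)
print(ring.owner("http://example.com/some/page"))

Because each URL's owner is computed locally from the hash, machines only need to forward newly discovered URLs to their owners, which keeps inter-machine communication low, and losing a machine only remaps the slice of the ring it owned.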
One idea would be to use work queues (directories or a DB), assuming you will work out storage so that it meets your criteria for redundancy.
\retrieve
\retrieve\server1
\retrieve\server...
\retrieve\server10000
\in-process
\complete
1.) All pages to be used as seeds are hashed and placed in the retrieve queue, using the hash as the file name root.
2.) Before putting a page in the queue, check the complete and in-process queues to make sure you don't re-queue it.
3.) Each server retrieves a random batch (1-N) of files from the retrieve queue and attempts to move them into its private server queue.
4.) Files that fail the rename process are assumed to have been "claimed" by another process.
5.) Files that could be moved are to be processed; put a marker in the in-process directory to prevent re-queuing.
6.) Download the file and place it into the \complete queue.
7.) Clean the file out of the in-process and server directories.
8.) Every 1,000 runs, check the oldest 10 in-process files by trying to move them from their server queues back into the general retrieve queue. This will help if a server hangs and should also load-balance slow servers.
For the retrieve, in-process and complete queues, note that most file systems hate millions of files in one directory. Divide storage into segments based on the leading characters of the hash, so that \abc\def\123\ would be the directory for file abcdef123FFFFFF… if you were scaling to billions of downloads.
If you used something like MongoDB instead of a regular file store, many of these problems would be avoided and you could benefit from sharding, etc.
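If you do stick with a plain file store, here is a rough Python sketch of the claim-by-rename queue described in the steps above. Directory names, the hash choice and batch sizes are illustrative; it assumes every server sees the same shared filesystem, where renaming an already-claimed file fails atomically, and it leaves out the step-8 re-queue of stale in-process entries:

import hashlib
import os
import random

RETRIEVE, IN_PROCESS, COMPLETE = "retrieve", "in-process", "complete"
for d in (RETRIEVE, IN_PROCESS, COMPLETE):
    os.makedirs(d, exist_ok=True)

def enqueue(url):
    """Steps 1-2: hash the URL and queue it unless it is already known."""
    name = hashlib.sha1(url.encode()).hexdigest()
    for d in (COMPLETE, IN_PROCESS, RETRIEVE):
        if os.path.exists(os.path.join(d, name)):
            return
    with open(os.path.join(RETRIEVE, name), "w") as f:
        f.write(url)

def claim_batch(server, n=10):
    """Steps 3-5: move up to n files into this server's private queue."""
    private = os.path.join(RETRIEVE, server)
    os.makedirs(private, exist_ok=True)
    candidates = [f for f in os.listdir(RETRIEVE)
                  if os.path.isfile(os.path.join(RETRIEVE, f))]
    claimed = []
    for name in random.sample(candidates, min(n, len(candidates))):
        try:
            # The rename acts as the lock: only one server wins, the
            # losers get an OSError and simply skip the file (step 4).
            os.rename(os.path.join(RETRIEVE, name),
                      os.path.join(private, name))
            # Marker prevents re-queuing while the download is running.
            open(os.path.join(IN_PROCESS, name), "w").close()
            claimed.append(name)
        except OSError:
            pass
    return claimed

def finish(server, name, payload=b""):
    """Steps 6-7: record completion, then clean up the other queues."""
    with open(os.path.join(COMPLETE, name), "wb") as f:
        f.write(payload)
    os.remove(os.path.join(IN_PROCESS, name))
    os.remove(os.path.join(RETRIEVE, server, name))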
