I am looking for a spec or an example of how to format the Nagios performance data.
This documentation does not provide any good examples.
I am looking for an explanation of how to create a table like this in Thruk / Nagios output.
Raw data:
pending=3;5500;7000 complete=18940;; error=454;;7000
I found this page which describes in some detail how to format the performance data.
This is the expected format:
'label'=value[UOM];[warn];[crit];[min];[max]
Notes:
space-separated list of label/value pairs
label can contain any characters except the equals sign or single quote (')
the single quotes for the label are optional. Required if the label contains spaces
label length is arbitrary, but ideally the first 19 characters are unique (due to a limitation in RRD). Be aware of a limitation in the amount of data that NRPE returns to Nagios
to specify a quote character, use two single quotes
warn, crit, min or max may be null (for example, if the threshold is not defined or min and max do not apply). Trailing unfilled semicolons can be dropped
min and max are not required if UOM=%
value, min and max are in the character class [-0-9.] and must all use the same UOM. value may be a literal "U" instead; this indicates that the actual value couldn't be determined
warn and crit are in the range format (see the Section called Threshold and ranges). Must be the same UOM
UOM (unit of measurement) is one of:
no unit specified - assume a number (int or float) of things (eg, users, processes, load averages)
s - seconds (also us, ms)
% - percentage
B - bytes (also KB, MB, TB)
c - a continuous counter (such as bytes transmitted on an interface)
It is up to third party programs to convert the Nagios Plugins performance data into graphs.
In this case the raw data provides a perfect example of the required output.
pending=3;5500;7000 complete=18940;; error=454;;7000
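For a concrete picture, here is a minimal sketch in C (the measurements are hard-coded and the status text is made up) of a plugin that emits exactly this perfdata after the "|" character, which separates the human-readable status text from the performance data in plugin output:

#include <stdio.h>

#define STATE_OK 0   /* Nagios exit code for OK */

int main(void) {
    /* hypothetical queue measurements; a real plugin would collect these */
    int pending = 3, complete = 18940, error = 454;

    /* status text, then '|', then 'label'=value[UOM];[warn];[crit];[min];[max] pairs */
    printf("QUEUE OK - %d pending | pending=%d;5500;7000 complete=%d;; error=%d;;7000\n",
           pending, pending, complete, error);
    return STATE_OK;
}

Thruk (or whichever third-party graphing add-on you use) then parses everything after the "|" according to the format above.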
Let's say I have a lot of IPv4s stored as integers (specifically, in a relational database) and I want to do a substring search on them given a string representation of an IP.
For example, a user types in 12.3 and expects that they get back results such as 12.30.45.67, 192.168.12.3, 1.12.34.5, 9.212.34.5.
If the IP were a string, I could just do a plain substring search. It might not be efficient but it is at least simple to implement and understand. But because I can't readily change it into a string at the moment, I don't see any efficient (in terms of CPU cycles, memory, and also my development/implementation time) way of doing this, but maybe I am just missing something.
You aren't missing anything.
For example, try to turn 12.3 into a series of ranges. In whichever octet the 12 is in, there will be 3 options (12, 112, 212). In whichever octet the 3 is in, there will be 2 options (3 and 30-39). That's 6 ranges per combination of preceding octets.
But the preceding octets? There are 1 + 256 + 256*256 combinations, depending on whether 0, 1 or 2 octets precede your match.
That's a grand total of 3 * 2 * (1 + 256 + 256*256) = 394758 ranges of numbers you have to search in. It is unlikely that doing that many index searches will be faster than scanning everything.
Incidentally, the worst case would be 1.2. In that case you'd have 17 * 3 * (1 + 256 + 256*256) = 3355443 range lookups to do!
If they want this badly enough, you'll need to do a full-text search on strings.
Anything other than some pre-processing, indexing or caching sounds too inefficient (and very hard to implement) to me in that case.
Here are a few ideas:
Look into creating a custom index, if possible, that enables you to do string searches.
Add an automatically maintained column to the table that represents the IP as a string and enables string search (see the conversion sketch after this answer). Add a corresponding index, of course.
If you can't or don't want to change the schema of that table, create another table with string representations of the rows in your IP table and corresponding foreign keys that map to the primary keys of the IP table.
If you can't or don't want to edit that database at all, create an external key/value store/database where the keys are string representations of the IPs and the values hold (or point to) the corresponding records of the IP table.
In any case, I don't think searching that table in its current (integer) form is feasible (both performance-wise and implementation-complexity-wise) considering your requirements.
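Whichever of these you pick, the conversion itself is cheap. A minimal sketch in C (the function name is mine, and it assumes the most significant byte of the stored integer is the first octet of the address):

#include <stdio.h>
#include <stdint.h>

/* Render a 32-bit integer IP as a dotted-quad string so it can be stored
   in a searchable text column. Assumes the most significant byte is the
   first octet. */
void ip_to_string(uint32_t ip, char *buf, size_t buflen) {
    snprintf(buf, buflen, "%u.%u.%u.%u",
             (ip >> 24) & 0xFF, (ip >> 16) & 0xFF,
             (ip >> 8) & 0xFF, ip & 0xFF);
}

int main(void) {
    char buf[16];  /* "255.255.255.255" plus the terminating NUL */
    ip_to_string(3232238595u, buf, sizeof buf);
    printf("%s\n", buf);  /* prints 192.168.12.3; substring-search this */
    return 0;
}

Run this once per row to populate the string column (or the external store), and the substring search becomes a plain text search.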
I have just started learning algorithms and data structures and I came across an interesting problem.
I need some help in solving the problem.
There is a data set given to me. Within the data set are characters, each with a number associated with it. I have to evaluate the sum of the largest numbers associated with each of the characters present. The list is not sorted by character; however, all entries for a given character appear as one contiguous group, with no further instances of that character elsewhere in the data set.
Moreover, the largest number associated with each character always appears at the last position of that character's group. We know the length of the entire data set, and we can retrieve an entry by specifying its line number.
For example:
C-7
C-9
C-12
D-1
D-8
A-3
M-67
M-78
M-90
M-91
M-92
K-4
K-7
K-10
L-13
length=15
get(3) = D-1 (returns an object with character D and value 1)
The answer for the above should be 13+10+92+3+8+12, as these are the highest numbers associated with L, K, M, A, D and C respectively.
The simplest solution is, of course, to go through all of the elements, but what is the most efficient algorithm (one that reads fewer entries than the length of the data set)?
You'll have to go through them one by one, since you can't be certain what the key is.
Just for the sake of easy manipulation, I would loop over the data set and check whether the key at index i is equal to the key at index i+1; if it's not, you have a local max.
Then store that value in a hash or dictionary if there isn't already a key:value pair for that key; if there is, check whether the existing value is less than the current value and overwrite it if so.
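A minimal sketch of that scan in C, with the example data set held in parallel arrays. Because each character forms exactly one contiguous group whose maximum sits at the group's last position, the hash can even collapse into a running sum:

#include <stdio.h>

int main(void) {
    /* the example data set from the question, as parallel arrays */
    char keys[] = {'C','C','C','D','D','A','M','M','M','M','M','K','K','K','L'};
    int  vals[] = { 7 , 9 , 12, 1 , 8 , 3 , 67, 78, 90, 91, 92, 4 , 7 , 10, 13};
    int  n = 15;
    long sum = 0;

    /* whenever the key changes (or the data ends), the current entry is
       the local max of its group, so add it to the sum */
    for (int i = 0; i < n; i++) {
        if (i == n - 1 || keys[i] != keys[i + 1])
            sum += vals[i];
    }
    printf("%ld\n", sum);  /* prints 138 = 12+8+3+92+10+13 */
    return 0;
}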
You could try to use statistics to optimistically skip some entries: say you read A 1, you skip 5 entries and read A 10 - good. You skip 5 more and read B 3, so you need to go back and also read what is in between.
But in reality it won't work. Not on text.
Because IO happens in blocks. Data is stored in chunks of usually around 8k, so that is the minimum read size (even if your programming language provides other-sized reads, they will eventually be translated into reading whole blocks and buffering them).
How do you find the next line? Well, you read until you find a \n...
So you don't save anything on this kind of data. It would be different if you had much larger records (several KB, like files) and an index. But building that index would require reading everything at least once.
So as presented, the fastest approach would likely be to linearly scan the entire data once.
I have a column of 4-byte integers that contains a lot of NULLs. I would like to know how null values are represented in storage. Do they use up 4 bytes as well, or are they handled in a way that doesn't waste space?
It depends on the data type. For example, in the case of numerical types (int, float, etc.) nulls are represented as the minimal value of the type. As a result, no extra space is wasted, but the smallest possible value of the type cannot be used for real data.
For other types, such as boolean columns, some extra space is used, since a single bit is not sufficient to represent true, false and null. (It's not a qubit ;) )
You can find more information here: https://www.monetdb.org/Documentation/Manuals/SQLreference/BuiltinTypes
The link below provides more developer targeted info (with more details): https://www.monetdb.org/wiki/MonetDB_type_system
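The sentinel idea for numerical types is easy to picture in C. This is only a sketch of the concept, with INT_MIN standing in for whatever reserved value the engine actually uses:

#include <stdio.h>
#include <limits.h>

/* The smallest int is reserved as the NULL marker: a NULL occupies the
   same 4 bytes as any other value, but INT_MIN itself becomes unusable. */
#define INT_NIL INT_MIN

static int is_null(int v) { return v == INT_NIL; }

int main(void) {
    int column[] = { 42, INT_NIL, 7 };  /* a 4-byte int column with a NULL */
    for (int i = 0; i < 3; i++)
        printf("%s\n", is_null(column[i]) ? "NULL" : "value");
    return 0;
}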
According to the question "What is the maximum length of a valid email address?", the maximum length of the address is 254. But I'd like to know the maximum length of the display name:
Display Name <my@examplemailaddress.net>
According to this link https://www.ietf.org/mail-archive/web/ietf-822/current/msg00086.html the size is unlimited, but practically, according to this link https://www.ietf.org/mail-archive/web/ietf-822/current/msg00088.html, the size would be 72 characters. But I believe this answer is a bit outdated? What would be a reasonable limit today?
If you are asking about the maximum length allowed by the specs (the normative source is RFC5322 at the time of writing) then, indeed, there is no limit, since folding allows any field to have unlimited length (while still respecting the advised 78-character [or the hard 998-character] line limit).
The practical limit is a very subjective question, since "practical" would be the length that is fully displayed by "most" clients and environments, and that is pretty hard to calculate.
I would say the upper limit of practicality would be a total length of 78 characters from the "From:" header up to the final ">" of the email address, since longer ones will probably wrap when displayed in almost all environments; that gives you around 40 characters to use, even allowing for longer email addresses.
Most clients, however, probably expect to display around 20-25 characters under normal circumstances.
These are all displayed characters, not the actual length in bytes of the encoded address (which matters especially for long UTF-8 sequences).
I'm doing calculations and the resulting text file right now has 288012413 lines, with 4 columns. Sample line:
288012413; 4855 18668 5.5677643628300215
The file is nearly 12 GB.
That's just unreasonable. It's plain text. Is there a more efficient way? I only need about 3 decimal places, but would a limiter save much room?
Go ahead and use a MySQL database.
MSSQL Express has a limit of 4 GB.
MS Access has a limit of 4 GB.
So those options are out. I think using a simple database like MySQL or SQLite without indexing will be your best bet. It will probably be faster to access the data through a database anyway, and on top of that the file size may be smaller.
Well, the first column looks suspiciously like a line number - if that's the case then you can probably just get rid of it, saving around 11 characters per line.
If you only need about 3 decimal places then you can round / truncate the last column, potentially saving another 12 characters per line.
I.e. you can get rid of 23 characters per line. That line is 40 characters long, so you can approximately halve your file size.
If you do round the last column then you should be aware of the effect that rounding errors may have on your calculations - if the end result needs to be accurate to 3 dp then you might want to keep a couple of extra digits of precision depending on the type of calculation.
You might also want to look into compressing the file if it is just used for storing the results.
Reducing the 4th field to 3 decimal places should reduce the file to around 8GB.
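Both savings amount to a simple filter. A sketch in C (reading stdin and writing stdout; the column layout is assumed from the sample line):

#include <stdio.h>

int main(void) {
    long lineno;
    int a, b;
    double v;

    /* drop the line-number column and round the last column to 3 decimal
       places; e.g. "288012413; 4855 18668 5.5677643628300215" becomes
       "4855 18668 5.568" */
    while (scanf("%ld; %d %d %lf", &lineno, &a, &b, &v) == 4)
        printf("%d %d %.3f\n", a, b, v);
    return 0;
}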
If it's just array data, I would look into something like HDF5:
http://www.hdfgroup.org/HDF5/
The format is supported by most languages, has built-in compression and is well supported and widely used.
If you are going to use the result as a lookup table, why use ASCII for numeric data? Why not define a struct like so:
struct x {
    long   lineno;   /* first column: the line number */
    short  thing1;   /* second column */
    short  thing2;   /* third column */
    double value;    /* fourth column */
};
and write the struct to a binary file? Since all the records are of a known size, advancing through them later is easy.
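A sketch of the writing side (the file name and values here are made up; note that struct padding can make the on-disk record larger than the sum of its fields, so the exact same struct definition must be used when reading back):

#include <stdio.h>

struct x {
    long   lineno;
    short  thing1;
    short  thing2;
    double value;
};

int main(void) {
    /* one record, taken from the sample line */
    struct x rec = { 288012413L, 4855, 18668, 5.5677643628300215 };

    FILE *f = fopen("results.bin", "wb");  /* hypothetical output file */
    if (!f) return 1;
    fwrite(&rec, sizeof rec, 1, f);        /* one fixed-size record */
    fclose(f);
    return 0;
}

To read record i later, seek to i * sizeof(struct x) and read sizeof(struct x) bytes.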
Well, if the files are that big and you are doing calculations that require any sort of precision with the numbers, you are not going to want a limiter. That might possibly do more harm than good, and with a 12-15 GB file, problems like that will be really hard to debug. I would use some compression utility, such as GZIP, ZIP, BlakHole, 7ZIP or something like that to compress it.
Also, what encoding are you using? If you are just storing numbers, all you need is ASCII. If you are using a Unicode encoding such as UTF-16 or UTF-32, that will double or quadruple the size of the file vs. ASCII.
Like AShelly, but smaller.
Assuming line numbers are consecutive...
struct x {
    short thing1;   // second column
    short thing2;   // third column
    short value;    // you said only 3 dp, so store as fixed point n*1000;
                    // that leaves 2 digits left of the decimal point
};
Save in a binary file.
lseek(), read() and write() are your friends.
The file will be large(ish), at around 1.7 GB.
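A sketch of the random access this layout buys you, assuming the struct above and 0-based record numbers (the file name is made up):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

struct x {
    short thing1;
    short thing2;
    short value;  /* fixed point, n*1000 */
};

int main(void) {
    struct x rec;
    int fd = open("data.bin", O_RDONLY);  /* hypothetical data file */
    if (fd < 0) return 1;

    long i = 1000000;  /* 0-based record number = original line number - 1 */
    lseek(fd, (off_t)i * sizeof rec, SEEK_SET);  /* jump straight to it */
    if (read(fd, &rec, sizeof rec) == (ssize_t)sizeof rec)
        printf("%d %d %.3f\n", rec.thing1, rec.thing2, rec.value / 1000.0);

    close(fd);
    return 0;
}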
The most obvious answer is just "split the data". Put it into different files, e.g. 1 million lines per file. NTFS is quite good at handling hundreds of thousands of files per folder.
Then you've already got a number of answers here regarding reducing the data size.
Next, why keep the data as text if you have a fixed-size structure? Store the numbers in binary - this will reduce the space even more (text format is very redundant).
Finally, a DBMS can be your best friend. A NoSQL DBMS should work well, though I am not an expert in this area and I don't know which one will hold a trillion records.
If I were you, I would go with the fixed-size binary format, where each record occupies a fixed number of bytes (16-20?). Then, even if I keep the data in one file, I can easily determine the position at which I need to start reading. If you need to do lookups (say by column 1) and the data is not re-generated all the time, then it could be possible to do a one-time sort by lookup key after generation - this would be slow, but as a one-time procedure it would be acceptable.