In developing my own SNMP poller, I've come across the problem of being able to poll devices with 32-bit interface indexes. I can't find anything out there explaining how to covert the hex (5 bytes) into the 32 bit integer or from the integer into hex as it doesn't use simple hex conversion. Example, the interface index is 436219904. While doing a pcap with a snmpget, I see the hex for this is 81 d0 80 e0 00 which makes no sense. I cannot for the life of me figure out how that converts to an integer value. I've tried to find a RFC dealing with this and have had no luck. The 16 bit interface values convert as they should. 0001 = 1 and so on. Only the 32-bit ones seem to be giving me this problem. Any help is appreciated.
SNMP uses ASN.1 syntax to encode data. Thus, you need to learn the BER rules,
http://en.wikipedia.org/wiki/X.690
For your case, I can say you watched the wrong data, as if 436219904 is going to be encoded as Integer32 in SNMP, the bytes should be 1A 00 30 00.
I guess you have missed some details in the analysis, so you might want to do it once again and add more descriptions (screenshot and so on) to enrich your question.
I suspect the key piece of info missing from your question is that the ifIndex value in question as used in your polling is an index for the table polled (not mentioned which, but we could assume ifTable), which means it will be encoded in as a subidentifier of the OID being polled (give me [some property] for [this ifIndex]) versus a requested value (give me [the ifIndex] for [some other row of some other table]).
Per X.209 (the version of the ASN.1 Basic Encoding Rules used by SNMP) subidentifiers in OIDs (other than the first two) are encoded in one or more octets (8 bits) with the highest order bit used as a continuation bit (i.e. "next octet is part of this subidentifier too"), and then remaining 7 bits for the actual value.
In other words, in your value "81 d0 80 e0 00", the highest bit is set in each of the first 4 octets and cleared in the last octet: this is how you know there are 5 octets in the subidentifier. The remaining 7 bits of each of those octets are concatenated to arrive at the integer value.
The converse of course is that to encode an integer value into a subidentifier of an OID, you have to build it 7 bits at a time.
Related
The official document for protocol buffers https://developers.google.com/protocol-buffers/docs/proto3 says the maximum field number for fields in protobuf message is 2^29-1. But why is this limit?
Please anyone can explain in some detail? I am newbie to this.
I read answers to the this question at why 2^29-1 is the biggest key in protocol buffers.
But I am not clarified
Each field in an encoded protocol buffer has a header (called key or tag) prefixed to the actual encoded value. The encoding spec defines this key:
Each key in the streamed message is a varint with the value (field_number << 3) | wire_type – in other words, the last three bits of the number store the wire type.
Here the spec says the tag is a varint where the first 3 bits are used to encode the wire type. A varint could encode a 64 bit value, thus just by going on this definition the limit would be 2^61-1.
In addition to this, the Language Guide narrows this down to a 32 bit value at max.
The smallest field number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911.
The reasons for this are not given. I can only speculate for the reasons behind this:
Artificial limit as no one is expecting a message to have that many fields. Just think about fitting a message with that many fields into memory.
As the key is a varint, it isn't simply the next 4 bytes in the raw buffer, rather a variable length of bytes (Java code reading a varint32). Each byte has 7 bit of actual data and 1 bit indicating if the end is reached. It cloud be that for performance reasons it was deemed to be better to limit the range.
Since proto3 is the 3rd version of protocol buffers, it could be that either proto1 or proto2 defined the tag to be a varint32. To keep backwards compatibility this limit is still true in proto3 today.
Because of this line:
#define GOOGLE_PROTOBUF_WIRE_FORMAT_MAKE_TAG(FIELD_NUMBER, TYPE) \
static_cast<uint32>((static_cast<uint32>(FIELD_NUMBER) << 3) | (TYPE))
this line create a "tag", which left only 29 (32 - 3) bits to save field indice.
Don't know why google use uint32 instead of uint64 though, since field number is a varint, may be they think 2^29-1 fields is large enough for a single message declaration.
I suspect this is simply so that a field-header (wire-type and tag-number) can be decoded and handled as a 32-bit value. The wire-type is always the 3 least significant bits, leaving 29 bits for the tag number. Technically "varint" should support 64 bits, but it makes sense to limit it to reasonable numbers, not least because "varint" encoding means that larger numbers take more bytes to encode.
Edit: I realise now that this is similar to the linked post, but... it remain true! Each field in protobuf is prefixed by a "varint" that expresses what field (tag-number) follows, and what data type it is (wire-type). The latter is important especially so that unexpected fields (version differences) can be stored or skipped correctly. It is convenient for that field-header to be trivially processed by most frameworks, and most frameworks are fine with 32-bit integers.
this is another question rather a comment, in the document it says,
Field numbers in the range 16 through 2047 take two bytes. So you
should reserve the numbers 1 through 15 for very frequently occurring
message elements. Remember to leave some room for frequently occurring
elements that might be added in the future.
Because for the first byte, top 5 bits are used for field number, and bottom 3 bits for field type, isn't it that field number from 31 (because zero is not used) to 2047 take two bytes? (and I also guess the second bytes' lower 3 bits are used also for field type.. I'm in the middle of reading it, so I'll fix it when I know it)
I am trying to read a 2-byte value that is stored lower order byte first. So if the 2 bytes are 10 and 85 in that order (decimal) what is the number they represent? 8510? 851? Or something different?
These values represent the length of an encoded sequence and I need to know the length to properly handle the information it contains. Most only use the first byte and as a decimal number it accurately represents the total number of characters (or bytes) in the sequence... but some use both bytes and I don't understand how to interpret them.
IF anyone can help be with this I would appreciate it.
Thanks
What you're referring to is the "endianness" of the two-byte value (also called a WORD). It's generally not considered beneficial to discuss them in terms of decimal values because no, if the bytes are 10 and 85 in decimal, the values would not be any combination of 10/85. Rather, showing them in the hexadecimal values is far more prudent.
+----+----+
|0x0a|0x55|
+----+----+
To interpret this as Little Endian, the value would be 0x550a (21770 in decimal). And in Big Endian (highest order byte first), the value is 0x0a55 (2645 decimal).
It's very important, for this reason, to know the endianness of your system and be able to properly handle the two. It is also noteworthy that "Network Byte Order" is Big Endian.
I have a cobol "tape format" dump which has a mixture of text and number fields. I'm reading the file in C# as a binary array (array of byte). I have the copy book and the formats are lining up fine on the text fields. There are a number of COMP-3 fields as well. The data in those fields doesnt seem to match any BCD format. I know what the data should be and I have the raw bytes of the COMP-3. I tried converting to EBCDIC first which yielded no better results. Any thoughts on how a COMP-3 number can be otherwise internally stored? Below are three examples of the PIC, the raw data and the expected number. I know I have the field positions correct because there is alpha data on either side of the numbers and that all lines up correctly.
First Example:
The PIC of the field is 9(9) COMP-3
There are 5 bytes to the data, the hex values are 02 01 20 91 22
The resulting data should be a date (00CCYYMMDD). This particular date should be 3-17-14.
Second Example:
The PIC of the field is S9(3) COMP-3
There are 2 bytes to the data, the hex values are 0A 14
The resulting value should be between 900 and 999
My understanding is that the "S" means that the last nibble should be 0xC or 0xD to indicate + or -
Third Example:
The PIC of the field is S9(15)V99 COMP-3
There are 9 bytes to the data, the hex values are 00 00 00 00 00 00 01 80 0C
The resulting value should be 12.00
Ok so thanks to the people who responded as they pointed me in the right direction. This is indeed an ASCII/EBCDIC representation issue. The BCD is stored in EBCDIC. Using an ASCII to EBCDIC conversion table yields properly formatted BCD digits:
I used this link to map the data: http://shop.alterlinks.com/ascii-table/ascii-ebcdic-us.php
My data: 0A 14 Converted: 25 3C (turns out that 253 is a valid value, spec was wrong) C = +, all good
My data: 01 80 0C (excluding leading zeros) Converted: 01 20 0C 12.00 C = +, implied 2 digits in format, all good
My data: 02 01 20 91 22 Converted: 02 01 40 31 7F 2014/03/17 (F is unused nibble), all good
There is no such thing as a COBOL "tape format" although the phrase may mean something to the person who gave you the data.
The clue to your problem is that you can read the text. Connect that to the EBCDIC tag and your reference to C#.
So, you are reading data which is originally source from a Mainframe, most likely an IBM Mainframe, which uses EBCDIC instead of ASCII.
COBOL does not have native support for BCD.
What some kind soul has done for you is "convert" the data from EBCDIC to ASCII. Otherwise you wouldn't even recognise the "text".
Unfortunately, what that means for any binary or packed-decimal or floating-point fields (you won't see much of the last, but they are COMP-1/COMP-2) is that "convert" means "potentially scrambled", because the coversion is assuming individual bytes, with simple byte values, whereas all of those fields have conventional coding, either through multiple bytes or non-EBCDIC values or both.
So: COMP-3 PIC 9(9). As you say, five bytes. It is unsigned, so the rightmost nybble will be F (all bits on). You are slightly out with your positions due to the sign position being occupied, even for an unsigned field.
On the Mainframe, it contains a value X'020140317F'. Only that field in its entirety can make any sense as to its value. However, the EBCDIC to ASCII conversion has made it X'0201209122'.
How?
Look up the EBCDIC value of X'02' and X'01'. They don't change. Look up the value of X'40', whoops, that's a space, change it to ASCII X'20'. Look up the value of X'31'. Actually nothing special there, and it has converted to something higher than X'7F', but if you look at the translation table used, I guess you'll see why it happens. The X'7F' is a double-quote, so gets changed to X'22'.
The other values you show suffer the same problem.
You should only ever take data from a Mainframe in character-only format. There are many answers here on this, you should look at the related to the right.
Have a look at this recent question: Convert COMP and COMP-3 Packed Decimal into readable value with C
OK, let's have a look at your first example. Given the format and value the original BCD-content should have been something like
02 01 40 31 7F
When transforming that from EBCDIC to ASCII we run into trouble with the first, second and fourth byte because they are control-characters - so here we would need some more details on how the ASCII->EBCDIC-converter worked. Looking at the two remaining bytes those would be changed
EBCDIC ASCII CHARACTER
40 -> 20 (blank)
7F -> 22 "
So assuming the first two bytes remain unchanged and the third gets converted like 31->91 we end up with
02 01 20 91 22
which is what you got. So it looks like some kind of EBCDIC->ASCII-conversion took place. If that is the case it might be that you can't repair the data since the transformation may not be one-one and thus not reversible.
Looking at the second example and using
EBCDIC ASCII CHARACTER
25 -> 0A (LF)
3C -> 14 (DC4)
you would have started with 25 3C which would fit the format but not the range you gave.
In the third example the original 01 20 0C could be converted to 01 80 0C since 20 also is an EBCDIC control-character with no direct ASCII-equivalent.
But given all other examples I would assume there is some codepage-conversion issue.
If you used some kind of filetransfer to move the data fromm the (supposed) mainframe make sure it is set to binary-mode and don't do any character-conversion before you split the file into fields and know what's meant to be a character and what not.
EDIT: You can find a list of several EBCDIC and ASCII-based codepages here or look here for the same as one pdf.
I'm coming to this a bit late, but have a couple of suggestions that might make your life easier...
First, see if you can get your mainframe conterparts to convert all non-character (i.e. binary numeric and packed decimal) data
to display format (e.g. PIC X) before you
download it. Then you only need to deal with the "printable" range of numeric characters representing 0 through 9. Printable character
only code-page conversions are fairly standaard and tend not to screw up as much. Reformatting data given a copybook is not
a difficult prospect for anybody
proficient in a mainframe environment. Unfortunately, sometimes you get the "runaround" and a claim is made that it is
extremely costly or, takes special software, or any one of a hundred other bogus excuses.
If you get the "runaround" then the next best thing is to is download the file in binary format and do your own codepage conversion
for the character data (fairly
straight forward). Next deal with the binary data based on your copybook definitions. With a few Googles you should be able to find
enough information to get through converting the PACKED-DECIMAL (COMP-3) data to whatever you need.
Here are a couple of links to get you started:
Numeric Data Formats
Packed Decimal
I do not recommend trying to reverse engineer the code page conversions applied by your file transfer package in order to
decode the packed decimal and other binary data.
Ok so thanks to both people who responded as they pointed me in the right direction. This is indeed an ASCII/EBCDIC representation issue. The BCD is stored in EBCDIC. Using an ASCII to EBCDIC conversion table yields properly formatted BCD digits:
I used this link to map the data: http://shop.alterlinks.com/ascii-table/ascii-ebcdic-us.php
My data: 0A 14
Converted: 25 3C (turns out that 253 is a valid value, spec was wrong) C = +, all good
My data: 01 80 0C (excluding leading zeros)
Converted: 01 20 0C 12.00 C = +, implied 2 digits in format, all good
My data: 02 01 20 91 22
Converted: 02 01 40 31 7F 2014/03/17 (F is unused nibble), all good
Thanks again for the two above answers which led me in the right direction.
You can avoid the above issues by having the data converted into a modern method for transferring data: XML.
I've got a file consisting of a bunch of 'rows' of data (not actually on separate lines). As far as I can tell from the description (PDF), each 'row' has forty bytes precisely, as follows:
4 bytes I can ignore.
2 bytes that are the byte representations of the (decimal) integer 240 or 241 — that is, 00 F0 or 00 F1 — depending on 'row'.
4 bytes I can ignore.
11 bytes: the first few bytes are an ASCII string, which I need, and the rest is padding by 00 bytes.
1 byte I can ignore.
1 byte I need. This seems from the documentation to be ASCII 'B', 'S', or '0'.
1 byte that's the byte representation of a small integer — that is, 00, 01, or the like.
4 bytes that are the byte representation of an integer.
4 bytes that are the byte representation of an integer.
4 bytes that are the byte representation of an integer.
4 bytes that are the byte representation of an integer.
And there's nothing else in the file (no file header, for example). (I may well be wrong in my understanding of the linked-to documentation, and would appreciate any correction; more info may be available elsewhere (PDF).)
I wish to:
convert each byte representation of a number to a human-readable representation of the number and each 00 byte to (say) a human-readable 0, and
convert the file to comma-separated or the like.
Now, step (2) should be doable using sed; I mention it only because I want to make sure step (1) is done is such a manner as to allow step (2) (for example, keep track of how many bytes are each field when doing step (1)). But step (1) I have no idea how to do. Can anyone help?
As a caveat, please note that I'm comfortable with sed and bash and can handle perl, but have no experience with real programming; and that, alas, I'm doing this on a Windows machine I don't have installation-of-programs rights on, so (although I have a sed port) I don't have bash. So, basically, I need to do this in sed or a Windows (DOS) command-line script. (I should be able to download the files to another machine, work with them, and upload them, though, if that turns out to be necessary.)
Perl has an unpack function you could use.
I'm writing a code for handling SMS PDUs based on all those ETSI GSM documentations. There is one thing I need to ask about. PDU contains a User Data Length field followed by User Data. According to GSM 03.40, the UDL field is the number of septets of user data when the uncompressed GSM default alphabet is used. However, it also says, that when the data is compressed, then the UDL is the number of octets of user data.
See the quotes:
If the TP User Data is coded using the
GSM 7 bit default alphabet, the TP
User Data Length field gives an
integer representation of the number
of septets within the TP User Data
field to follow.
[...]
If the TP User Data is coded using
compressed GSM 7 bit default alphabet
or compressed 8 bit data or compressed
UCS2 [24] data, the TP User Data
Length field gives an integer
representation of the number of octets
after compression within the TP User
Data field to follow.
The problem is that when the 7-bit data is compressed and the number of octets of the compressed user data is a multiple of 7, you don't know whether the last 7 bits in the last octet are fill bits or a real character. I.e. 7 octets may contain either 7 or 8 7-bit characters when compression is on. And when the UDL field is the number of octets, how can you know whether those 7 octets contain 7 or 8 characters?? If UDL contained the number of septets, everything would be clear, right? So have I misunderstood the documentation or does it really work this way?
Could anyone please explain me how it really works? Thanks in advance!
As you are already aware, creating an MMS message requires you to add a UDH before your text message. The UDH becomes part of your payload, thus reducing the number of characters you can send per segment.
As it has become part of your payload, it needs to confirm with your payloads data requirement - which is 7 bit. The UDH however, is 8 bit, which clearly complicates things.
Consider the UDH of the following as an example (It's a UDH for a concatenated message):
050003000302
05 is the length of the UDH (the 5 octets which follow)
00 is the IEI
03 is the IEDL (3 more octets)
00 is a reference (this number must be the same in each of your concatenated message UDH's)
03 is the maximum number of messages
02 is the current message number.
This is 6 octets in total - equating to 48 bits. This is all and well, but since the UDH is actually part of your SMS message, what you have to do is add more bits so that the actual message starts on a septet boundary. A septet boundary is every 7 bits, so in this case, we will have to add 1 more bit of data to make the UDH 49 bits, and then we can add our standard GSM-7 encoded characters.
You can read up more about this from Here
So, the thing is that I misunderstood the meaning of the compression bit in the Data Coding Scheme byte. I thought it referred to the 7-bit alphabet packing method (where at least one character is stored within one byte) but it refers to Huffman compression.
Therefore, the question above was kind of stupid. Sorry for that :-).