After careful reading of FFmpeg Bitstream Filters Documentation, I still do not understand what they are really for.
The document states that the filter:
performs bitstream level modifications without performing decoding
Could anyone further explain that to me? A use case would greatly clarify things. Also, there are clearly different filters. How do they differ?
Let me explain by example. FFmpeg video decoders typically work by converting one video frame per call to avcodec_decode_video2. So the input is expected to be "one image" worth of bitstream data. Let's consider this issue of going from a file (an array of bytes of disk) to images for a second.
For "raw" (annexb) H264 (.h264/.bin/.264 files), the individual nal unit data (sps/pps header bitstreams or cabac-encoded frame data) is concatenated in a sequence of nal units, with a start code (00 00 01 XX) in between, where XX is the nal unit type. (In order to prevent the nal data itself to have 00 00 01 data, it is RBSP escaped.) So a h264 frame parser can simply cut the file at start code markers. They search for successive packets that start with and including 00 00 01, until and excluding the next occurence of 00 00 01. Then they parse the nal unit type and slice header to find which frame each packet belongs to, and return a set of nal units making up one frame as input to the h264 decoder.
H264 data in .mp4 files is different, though. You can imagine that the 00 00 01 start code can be considered redundant if the muxing format already has length markers in it, as is the case for mp4. So, to save 3 bytes per frame, they remove the 00 00 01 prefix. They also put the PPS/SPS in the file header instead of prepending it before the first frame, and these also miss their 00 00 01 prefixes. So, if I were to input this into the h264 decoder, which expects the prefixes for all nal units, it wouldn't work. The h264_mp4toannexb bitstream filter fixes this, by identifying the pps/sps in the extracted parts of the file header (ffmpeg calls this "extradata"), prepending this and each nal from individual frame packets with the start code, and concatenating them back together before inputting them in the h264 decoder.
You might now feel that there's a very fine line distinction between a "parser" and a "bitstream filter". This is true. I think the official definition is that a parser takes a sequence of input data and splits it in frames without discarding any data or adding any data. The only thing a parser does is change packet boundaries. A bitstream filter, on the other hand, is allowed to actually modify the data. I'm not sure this definition is entirely true (see e.g. vp9 below), but it's the conceptual reason mp4toannexb is a BSF, not a parser (because it adds 00 00 01 prefixes).
Other cases where such "bitstream tweaks" help keep decoders simple and uniform, but allow us to support all files variants that happen to exist in the wild:
mpeg4 (divx) b frame unpacking (to get B-frames sequences like IBP, which are coded as IPB, in AVI and get timestamps correct, people came up with this concept of B-frame packing where I-B-P / I-P-B is packed in frames as I-(PB)-(), i.e. the third packet is empty and the second has two frames. This means the timestamp associated with the P and B frame at the decoding phase is correct. It also means you have two frames worth of input data for one packet, which violates ffmpeg's one-frame-in-one-frame-out concept, so we wrote a bsf to split the packet back in two - along with deleting the marker that says that the packet contains two frames, hence a BSF and not a parser - before inputting it into the decoder. In practice, this solves otherwise hard problems with frame multithreading. VP9 does the same thing (called superframes), but splits frames in the parser, so the parser/BSF split isn't always theoretically perfect; maybe VP9's should be called a BSF)
hevc mp4 to annexb conversion (same story as above, but for hevc)
aac adts to asc conversion (this is basically the same as h264/hevc annexb vs. mp4, but for aac audio)
Related
I'm trying to decode a file using Huffman. Suppose I got the characters AAAAABBBC and Suppose the codes for distinct characters are :
A: 1
B: 01
C: 00
and the encoded file looks like this: 11111010 10100000
notice that I don't need the last 3 bits which are 000. How do I know that I don't need these bits while Decoding ?
You don't know. You need a way to terminate the bit stream, since it is stored in bytes. Either you need to provide the number of bits in the bit stream or the number of symbols to decode from the bit stream before the bit stream, or you need to add an end-of-stream symbol, which when encountered ends the decoding.
Let's consider the following example with symbol- code length - canonical code data.
A - 2 - 00
B - 2 - 01
D - 2 - 10
C - 3 - 110
E - 3 - 111
I was wondering what would be the contents of encoded bit stream? Is it 00 01 10 110 111 (basically all codes) or 2,2,2,3,3 in binary equivalent as corresponding code lengths? I wanted to add here that some resources say just transmit code as encoded bit stream and few other resources talk about throwing code away from encoded bit stream and transmit only code length data.
Encoded bitstream
The code is:
00 01 10 110 111
Note that if we sent the code of 2,2,2,3,3, then it would be impossible to decide if the input was AAACC or BBBEE (or many other equivalent choices).
Because Huffman codes are a prefix code it means that we can unambiguously decode the bitstream despite not knowing where the spaces are.
In other words, when given the output 000110110111, we can uniquely decode it as ABDCE.
Transmitting code table
I think the confusion may be because you need to possess two things to decode the bitstream:
The coded bitstream
The lookup table
These two things are often coded in very different ways.
In many cases the lookup table is fixed in advance so does not need to be transmitted.
However, if the probabilities can change, then we need to tell the recipient what code table to use. In this case we can just transmit the lengths of each code word and this gives enough information for the receiver to construct the canonical Huffman code. Alternatives are also possible, for example we can send the number of each code word length followed by the values. This alternative is used by JPEG and explained more below.
Example
The JPEG image codec uses Huffman tables. Normally some default tables are used, but it is possible to optimize the size of images by transmitting a custom Huffman code. A tutorial about this is here.
Another description of the way of transmitting the Huffman table is here. The code lengths are sent (as bytes) followed by the code values (again as bytes).
Code to read it (taken from the link) is:
// Next sixteen bytes are the counts for each code length
u8 counts[16];
for (i = 0; i < 16; i++) {
counts[i] = fgetc(fp);
ctr++;
}
// Remaining bytes are the data values to be mapped
// Build the Huffman map of (length, code) -> value
for (i = 0; i < 16; i++) {
for (j = 0; j < counts[i]; j++) {
huffData[table][huffKey(i + 1, code)] = fgetc(fp);
code++;
ctr++;
}
code <<= 1;
}
What you are asking is how to send a description of the code to the receiver, so that the receiver knows how to decode the following code values.
There are many ways of varying levels of sophistication, depending on how much effort you want to put into compressing the description of the code. Peter de Rivaz describes a simple approach used by JPEG, which is to send 16 counts of the number of codes of each length, followed by the byte values of each of those symbols. So for your code that would be (in hex):
00 03 02 00 00 00 00 00 00 00 00 00 00 00 00 00 41 42 43 44 45
That's not terribly compact, and it can't represent one of the possible codes, which is 256 8-bit codes, since you are limited to a count of 255 for each length.
The first thing you can do is cut off the code lengths when you have a complete code. It is easy to calculate how many code patterns are left, in which case you can simply end it when there are none left. Follow that with the symbols. You then have:
00 03 02 41 42 43 44 45
We don't need eight bits for each count, since they are limited by the constraints on those counts. For example, you can't have more than two one-bit codes. So we could code these in fewer bits, e.g. n+1 bits for n codes. So two bits, three, bits, and so on until the code is complete. For your code, now in binary:
00 011 0010
followed by the bytes 41 42 43 44 45, offset in the bit stream appropriately. Now the list of counts takes nine bits instead of 24. Since we know that there can only be 256 symbols, we can cap off the number of bits for each count at nine, allowing for the count 256, solving the previous problem of not being able to represent the flat code. Then if the code is limited to 16 bits in length (as it is for JPEG), the largest number of bytes needed for the counts is 14.5, less than the original 16. Often the counts will end before 14.5 bytes.
You can get even more sophisticated, noting that at each code length, you have a limit on the possible count of codes of that length due to the shorter code lengths using up patterns. Then the number of bits for each count can be variable, based on how many possible values there are. Then the counts description would be:
00 011 10, then the eight-bit values 41 42 43 44 45
Since we have no preceding patterns used up for lengths one and two, those still need to be two and three bits respectively. However we now have only three possibilities left for length three: the counts 0, 1, or 2. A count of 3 would oversubscribe the code. So we can use two bits for that last one. It is now seven bits instead of nine, and this greatly reduces the number of bits in the counts for codes that use longer code lengths.
An entirely different scheme is the one used by the deflate format (used in zip, gzip, zlib, png, etc.). There the number of code lengths to follow is sent first, followed by the code length of each symbol in order up to the last one. The symbols themselves are implied by the code length location. That results in lots of zeros, to represent symbols that are not present. So for your code there would be a 70 to go up to symbol 69 ("E"), followed by 65 zeros, then 2 2 2 3 3. That seems awfully long, and it is. deflate then run-length and Huffman codes that list of lengths, to compress it. The long strings of zeros get compressed to a few bits, and the short lengths are also just a few bits each. So then you have to first send a description of the code lengths code lengths code (!) so that you can decode that.
You can read the deflate specification for more information on that scheme. brotli uses a similar scheme, with more sophistication still.
I got one paragraph from the book Hadoop: The Definitive Guide, it is as follows:
"DEFLATE
stores data as a series of compressed blocks. The problem is that the start of each block
is not distinguished in any way that would allow a reader positioned at an arbitrary
point in the stream to advance to the beginning of the next block, thereby synchronizing
itself with the stream. For this reason, gzip does not support splitting."
My question is I cannot understand the reason the author has explained about why gzip does not support splitting. Can someone give me a more detail explanation about this?
As my understanding, if the big file is split to 16 blocks. When one mapper begin to read one block, and in this point, 2 situations may happen:
The mapper cannot the block
or it can read it and then process it but does not know where to put the result to the whole stream
Does one of the above situations will happen or none will happen and there is other logic?
In order to split a file into pieces for processing, you need two things:
The pieces need to be able to be processed independently.
You need to be able to find where to split the pieces.
The deflate format in its normal usage supports neither. For 1: the deflate format is inherently serial, with every match referring to previously uncompressed data, itself potentially coming from a similar back reference, perhaps all the way to the beginning of the file.
The description you quote doesn't mention that important point.
Though it is a moot point since you don't have 1, for 2: deflate has no apparent markers in the stream to identify block boundaries. To find block boundaries, you would have to decode all of the bits up to the boundary, which would defeat the purpose of splitting the file for independent processing.
That is the point mentioned in your quoted description.
Though this is all true for a normal deflate stream, not prepared for splitting, you can if you like prepare such a deflate stream. The history can be erased at select breakpoints using Z_FULL_FLUSH, which allows independent decompression from that point. It also inserts a visible marker 00 00 ff ff. That's not a very long marker, and could appear by accident in the compressed data. It could be followed by a second flush to insert a second marker giving nine bytes: 00 00 ff ff 00 00 00 ff ff. That is something that Hadoop could use the split the deflate stream.
I have a cobol "tape format" dump which has a mixture of text and number fields. I'm reading the file in C# as a binary array (array of byte). I have the copy book and the formats are lining up fine on the text fields. There are a number of COMP-3 fields as well. The data in those fields doesnt seem to match any BCD format. I know what the data should be and I have the raw bytes of the COMP-3. I tried converting to EBCDIC first which yielded no better results. Any thoughts on how a COMP-3 number can be otherwise internally stored? Below are three examples of the PIC, the raw data and the expected number. I know I have the field positions correct because there is alpha data on either side of the numbers and that all lines up correctly.
First Example:
The PIC of the field is 9(9) COMP-3
There are 5 bytes to the data, the hex values are 02 01 20 91 22
The resulting data should be a date (00CCYYMMDD). This particular date should be 3-17-14.
Second Example:
The PIC of the field is S9(3) COMP-3
There are 2 bytes to the data, the hex values are 0A 14
The resulting value should be between 900 and 999
My understanding is that the "S" means that the last nibble should be 0xC or 0xD to indicate + or -
Third Example:
The PIC of the field is S9(15)V99 COMP-3
There are 9 bytes to the data, the hex values are 00 00 00 00 00 00 01 80 0C
The resulting value should be 12.00
Ok so thanks to the people who responded as they pointed me in the right direction. This is indeed an ASCII/EBCDIC representation issue. The BCD is stored in EBCDIC. Using an ASCII to EBCDIC conversion table yields properly formatted BCD digits:
I used this link to map the data: http://shop.alterlinks.com/ascii-table/ascii-ebcdic-us.php
My data: 0A 14 Converted: 25 3C (turns out that 253 is a valid value, spec was wrong) C = +, all good
My data: 01 80 0C (excluding leading zeros) Converted: 01 20 0C 12.00 C = +, implied 2 digits in format, all good
My data: 02 01 20 91 22 Converted: 02 01 40 31 7F 2014/03/17 (F is unused nibble), all good
There is no such thing as a COBOL "tape format" although the phrase may mean something to the person who gave you the data.
The clue to your problem is that you can read the text. Connect that to the EBCDIC tag and your reference to C#.
So, you are reading data which is originally source from a Mainframe, most likely an IBM Mainframe, which uses EBCDIC instead of ASCII.
COBOL does not have native support for BCD.
What some kind soul has done for you is "convert" the data from EBCDIC to ASCII. Otherwise you wouldn't even recognise the "text".
Unfortunately, what that means for any binary or packed-decimal or floating-point fields (you won't see much of the last, but they are COMP-1/COMP-2) is that "convert" means "potentially scrambled", because the coversion is assuming individual bytes, with simple byte values, whereas all of those fields have conventional coding, either through multiple bytes or non-EBCDIC values or both.
So: COMP-3 PIC 9(9). As you say, five bytes. It is unsigned, so the rightmost nybble will be F (all bits on). You are slightly out with your positions due to the sign position being occupied, even for an unsigned field.
On the Mainframe, it contains a value X'020140317F'. Only that field in its entirety can make any sense as to its value. However, the EBCDIC to ASCII conversion has made it X'0201209122'.
How?
Look up the EBCDIC value of X'02' and X'01'. They don't change. Look up the value of X'40', whoops, that's a space, change it to ASCII X'20'. Look up the value of X'31'. Actually nothing special there, and it has converted to something higher than X'7F', but if you look at the translation table used, I guess you'll see why it happens. The X'7F' is a double-quote, so gets changed to X'22'.
The other values you show suffer the same problem.
You should only ever take data from a Mainframe in character-only format. There are many answers here on this, you should look at the related to the right.
Have a look at this recent question: Convert COMP and COMP-3 Packed Decimal into readable value with C
OK, let's have a look at your first example. Given the format and value the original BCD-content should have been something like
02 01 40 31 7F
When transforming that from EBCDIC to ASCII we run into trouble with the first, second and fourth byte because they are control-characters - so here we would need some more details on how the ASCII->EBCDIC-converter worked. Looking at the two remaining bytes those would be changed
EBCDIC ASCII CHARACTER
40 -> 20 (blank)
7F -> 22 "
So assuming the first two bytes remain unchanged and the third gets converted like 31->91 we end up with
02 01 20 91 22
which is what you got. So it looks like some kind of EBCDIC->ASCII-conversion took place. If that is the case it might be that you can't repair the data since the transformation may not be one-one and thus not reversible.
Looking at the second example and using
EBCDIC ASCII CHARACTER
25 -> 0A (LF)
3C -> 14 (DC4)
you would have started with 25 3C which would fit the format but not the range you gave.
In the third example the original 01 20 0C could be converted to 01 80 0C since 20 also is an EBCDIC control-character with no direct ASCII-equivalent.
But given all other examples I would assume there is some codepage-conversion issue.
If you used some kind of filetransfer to move the data fromm the (supposed) mainframe make sure it is set to binary-mode and don't do any character-conversion before you split the file into fields and know what's meant to be a character and what not.
EDIT: You can find a list of several EBCDIC and ASCII-based codepages here or look here for the same as one pdf.
I'm coming to this a bit late, but have a couple of suggestions that might make your life easier...
First, see if you can get your mainframe conterparts to convert all non-character (i.e. binary numeric and packed decimal) data
to display format (e.g. PIC X) before you
download it. Then you only need to deal with the "printable" range of numeric characters representing 0 through 9. Printable character
only code-page conversions are fairly standaard and tend not to screw up as much. Reformatting data given a copybook is not
a difficult prospect for anybody
proficient in a mainframe environment. Unfortunately, sometimes you get the "runaround" and a claim is made that it is
extremely costly or, takes special software, or any one of a hundred other bogus excuses.
If you get the "runaround" then the next best thing is to is download the file in binary format and do your own codepage conversion
for the character data (fairly
straight forward). Next deal with the binary data based on your copybook definitions. With a few Googles you should be able to find
enough information to get through converting the PACKED-DECIMAL (COMP-3) data to whatever you need.
Here are a couple of links to get you started:
Numeric Data Formats
Packed Decimal
I do not recommend trying to reverse engineer the code page conversions applied by your file transfer package in order to
decode the packed decimal and other binary data.
Ok so thanks to both people who responded as they pointed me in the right direction. This is indeed an ASCII/EBCDIC representation issue. The BCD is stored in EBCDIC. Using an ASCII to EBCDIC conversion table yields properly formatted BCD digits:
I used this link to map the data: http://shop.alterlinks.com/ascii-table/ascii-ebcdic-us.php
My data: 0A 14
Converted: 25 3C (turns out that 253 is a valid value, spec was wrong) C = +, all good
My data: 01 80 0C (excluding leading zeros)
Converted: 01 20 0C 12.00 C = +, implied 2 digits in format, all good
My data: 02 01 20 91 22
Converted: 02 01 40 31 7F 2014/03/17 (F is unused nibble), all good
Thanks again for the two above answers which led me in the right direction.
You can avoid the above issues by having the data converted into a modern method for transferring data: XML.
How to count/detect frames (pictures) in raw H.264 bitstream? I know there are 5 VCL NALU types but I don't know how to rec(k)ognize sequence of them as access unit. I suppose detect a frame means detect an access unit as access unit is
A set of NAL units that are consecutive in decoding order and contain
exactly one primary coded picture. In addition to the primary coded
picture, an access unit may also contain one or more redundant coded
pictures, one auxiliary coded picture, or other NAL units not
containing slices or slice data partitions of a coded picture. The
decoding of an access unit always results in a decoded picture.
I want it to know what is the FPS of live stream out to server.
You are right on the interpretation, and if you want to parse the stream by yourself, take a look here
But to quickly extract stream info in a format easy to read and parse (with any text parser) you can use ffprobe
ffprobe -show_streams -count_frames -pretty filename
You will find in the output:
nb_read_frames=....
And for the fps, as I heard that ffprobe may report some error for the fps, try a simple ffmpeg -i command.
ffmpeg -i filename 2>&1 | sed -n "s/.*, \(.*\) fps.*/\1/p"
From ITU-T H.264 (03/2009):
7.4.1.2.3 Order of NAL units and coded pictures and association to access units
This subclause specifies the order of NAL units and coded pictures and association to access unit for coded video sequences that conform to one or more of the profiles specified in Annex A that are decoded using the decoding process specified in clauses 2-9.
An access unit consists of one primary coded picture, zero or more corresponding redundant coded pictures, and zero or more non-VCL NAL units. The association of VCL NAL units to primary or redundant coded pictures is described in subclause 7.4.1.2.5.
The first access unit in the bitstream starts with the first NAL unit of the bitstream.
The first of any of the following NAL units after the last VCL NAL unit of a primary coded picture specifies the start of a new access unit:
access unit delimiter NAL unit (when present),
sequence parameter set NAL unit (when present),
picture parameter set NAL unit (when present),
SEI NAL unit (when present),
NAL units with nal_unit_type in the range of 14 to 18, inclusive (when present),
first VCL NAL unit of a primary coded picture (always present).
The constraints for the detection of the first VCL NAL unit of a primary coded picture are specified in subclause 7.4.1.2.4.
7.4.1.2.4 Detection of the first VCL NAL unit of a primary coded picture
This subclause specifies constraints on VCL NAL unit syntax that are sufficient to enable the detection of the first VCL NAL unit of each primary coded picture for coded video sequences that conform to one or more of the profiles specified in Annex A that are decoded using the decoding process specified in clauses 2-9.
Any coded slice NAL unit or coded slice data partition A NAL unit of the primary coded picture of the current access unit shall be different from any coded slice NAL unit or coded slice data partition A NAL unit of the primary coded picture of the previous access unit in one or more of the following ways:
frame_num differs in value. The value of frame_num used to test this condition is the value of frame_num that appears in the syntax of the slice header, regardless of whether that value is inferred to have been equal to 0 for subsequent use in the decoding process due to the presence of memory_management_control_operation equal to 5. (NOTE 1 – A consequence of the above statement is that a primary coded picture having frame_num equal to 1 cannot contain a memory_management_control_operation equal to 5 unless some other condition listed below is fulfilled for the next primary coded picture that follows after it (if any).)
pic_parameter_set_id differs in value.
field_pic_flag differs in value.
bottom_field_flag is present in both and differs in value.
nal_ref_idc differs in value with one of the nal_ref_idc values being equal to 0.
pic_order_cnt_type is equal to0 for both and either pic_order_cnt_lsb differs in value, or delta_pic_order_cnt_bottom differs in value.
pic_order_cnt_type is equal to 1 for both and either delta_pic_order_cnt[ 0 ] differs in value, or delta_pic_order_cnt[ 1 ] differs in value.
IdrPicFlag differs in value.
IdrPicFlag is equal to 1 for both and idr_pic_id differs in value.
(NOTE 2 – Some of the VCL NAL units in redundant coded pictures or some non-VCL NAL units (e.g., an access unit delimiter NAL unit) may also be used for the detection of the boundary between access units, and may therefore aid in the detection of the start of a new primary coded picture.)
NAL units do not have a 1-1 relationship to frames necessarily. Frames can be split into multiple NAL units. If you want to parse the stream manually, you'll need to handle each type which is pretty well defined in the blog article below. If the stream has an SPS NAL packet it should contain frame rate, but thats not necessarily the actual frame rate, just what the container believes it has.
As you are asking as well about how to find the actual start of an AU, if its an "Annex B" bitstream each NALU will have a start code 0x000001 or 0x00000001. AVCC uses a small header to define the length of the NALU.
Check out the following great blog post for more details: szatmary.org
Hope that helps!