I don't know which tag I should ask this question under.
I'm currently studying about the ATSC standards for Digital TV broadcasting.
I have some doubts after going through the material. In digital TV broadcasting, a single bandwidth allocation carries multiple channels (services).
The data that is coded and multiplexed by the broadcaster is called a transport stream.
The transport stream consists of a header and a payload. The header contains the PIDs of the audio, video, or data elementary streams. This transport stream is received by the set-top box, whose middleware parses the transport stream and puts the data into the PAT, PMT, EIT, ETT, SDT, NIT, and CAT tables.
Is it possible to use PSI tables and not ATSC tables?
This would be like explaining the entire standard in a few sentences, but here it is:
MPEG-2 Systems, numbered ISO/IEC 13818-1, defines the packet structure and timing with which compressed video and its associated information are transported. Such a stream is called a transport stream, and it is packetized into 188-byte packets.
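To make the 188-byte packet structure concrete, here is a minimal Python sketch that decodes the fixed 4-byte header of one transport stream packet. The function name and the dictionary keys are my own choices for illustration; the bit layout follows ISO/IEC 13818-1, and the adaptation field is not parsed.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Decode the fixed 4-byte header of a single 188-byte TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport stream packet must be 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        # The PID is 13 bits: low 5 bits of byte 1 plus all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }
```

A demultiplexer would call this on every packet and route the payload according to the returned PID.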
Many audio and video streams can be multiplexed simultaneously. Each one is identified by the PID of its stream. The set of PIDs and the organization of the stream are expressed in the PAT and PMT tables. The PAT, PMT, and CAT tables are mandated by MPEG-2 Systems; without them you cannot easily decode the stream. Beyond these, more information is needed, which is encoded in other tables such as the NIT, SDT, and EIT.
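As an illustration of how the PAT organizes the stream, the following Python sketch reads one complete PAT section and returns the mapping from program number to PMT PID. It assumes the section has already been reassembled from the packets on PID 0 (pointer field removed) and it does not verify the trailing CRC32; the function name is hypothetical.

```python
def parse_pat(section):
    """Map program_number -> PID from one PAT section.

    Program number 0, if present, points at the network PID (NIT)
    rather than at a PMT. The CRC32 at the end is skipped, not checked.
    """
    if section[0] != 0x00:
        raise ValueError("table_id 0x00 expected for a PAT")
    # section_length counts the bytes that follow it, CRC32 included.
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4  # stop before the CRC32
    programs = {}
    # The program loop starts after the 8-byte section header.
    for i in range(8, end, 4):
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        programs[program_number] = pid
    return programs
```

Each PMT PID found this way is then read in the same manner to obtain the PIDs of that program's audio and video elementary streams.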
Before answering your question, I would like to clear up a misunderstanding that is causing the confusion.
This transport stream is received by the set-top box containing the
middleware which parses the transport stream and puts the data onto the
PAT,PMT,EIT,ETT,SDT,NIT,CAT tables..
The correct description is:
This transport stream carries the audio and video data along with the
tables (PAT, PMT, EIT, ETT, SDT, NIT, CAT). It is received by the
set-top box, whose middleware parses the tables in the transport
stream and decodes the appropriate audio and video streams.
Yes, it is possible to use nothing but the PAT and PMT; the transport stream is still fully decodable by the STB. These are not ATSC tables. The other tables only make it easier to carry relevant information.
Finally, the EPG is not built by mapping the above tables: it is simply carried in one of the tables, called the EIT. It refers to the programs (channels) listed in the PAT. One of the references below explains that as well.
Here are some reference documents:
1. PSIP: Program specific information
2. PSIP tutorial
3. The ATSC transport layer, including program and system information protocol (PSIP)
4. Using SI Tables to Create Electronic Program Guides
5. ISO/IEC 13818-1 MPEG 2 systems. Read section "2.4.4 Program specific information"
6. SYSTEM INFORMATION FOR DIGITAL TELEVISION ATSC STANDARD.
7. ETSI EN 300 468 - Digital Video Broadcasting (DVB); Specification for Service Information (SI) in DVB systems - Section 5 explains SI information.
I believe there is some overlap in what you are calling ATSC tables and PSI tables. PSI tables are the PAT, PMT, NIT, and CAT. The other tables are the ATSC tables. With that in mind, there isn't a way to produce the ATSC data in the PSI tables.
Media Source Extensions (MSE) needs fragmented MP4 for playback in the browser.
A fragmented MP4 contains a series of segments which can be requested individually if your server supports byte-range requests.
Boxes aka Atoms
All MP4 files use an object-oriented format made up of boxes, also known as atoms.
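As a rough sketch of that box structure, each box starts with a 4-byte big-endian size followed by a 4-byte type code, so the top-level boxes can be walked in a few lines of Python. The function name is my own; the handling of size values 0 and 1 follows the ISO base media file format.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, offset, size) for each box found in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = end - pos
        if size < header:
            raise ValueError("corrupt box size at offset %d" % pos)
        yield box_type.decode("ascii", errors="replace"), pos, size
        pos += size
```

Running this over a non-fragmented file typically lists ftyp, moov and one large mdat, while a fragmented file shows a repeating moof and mdat pattern.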
You can view a representation of the boxes in your MP4 using an online tool such as MP4 Parser or if you're using Windows, MP4 Explorer. Let's compare a normal MP4 with one that is fragmented:
Non-Fragmented MP4
This screenshot (from MP4 Parser) shows an MP4 that hasn't been fragmented and quite simply has one massive mdat (Movie Data) box.
If we were building a video player that supports adaptive bitrate, we might need to know the byte position of the 10 sec mark in a 0.5Mbps and a 1Mbps file in order to switch the video source between the two files at that moment. Determining this exact byte position within one massive mdat in each respective file is not trivial.
Fragmented MP4
This screenshot shows a fragmented MP4 which has been segmented using MP4Box with the onDemand profile.
You'll notice the sidx and the series of moof+mdat boxes. The sidx is the Segment Index and stores metadata about the precise byte-range locations of the moof+mdat segments.
Essentially, you can independently load the sidx (its byte range will be defined in the accompanying .mpd Media Presentation Description file) and then choose which segments you'd like to subsequently load and add to the MSE SourceBuffer.
Importantly, each segment is created at a regular interval of your choosing (e.g. every 5 seconds), so the segments can have temporal alignment across files of different bitrates, making it easy to adapt the bitrate during playback.
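To show how the sidx leads to byte ranges, here is a minimal Python sketch that parses the payload of a sidx box (the bytes after the 8-byte size and type header) and computes the inclusive byte range of each referenced segment. The function name and the sidx_end parameter (the file offset of the first byte after the sidx box) are assumptions for illustration; the field layout follows the ISO base media file format.

```python
import struct


def parse_sidx(payload, sidx_end):
    """Return byte ranges and durations for the segments a sidx references."""
    version = payload[0]
    _reference_id, timescale = struct.unpack_from(">II", payload, 4)
    if version == 0:
        _earliest, first_offset = struct.unpack_from(">II", payload, 12)
        pos = 20
    else:
        _earliest, first_offset = struct.unpack_from(">QQ", payload, 12)
        pos = 28
    _reserved, count = struct.unpack_from(">HH", payload, pos)
    pos += 4
    # Offsets are relative to the first byte after the sidx box.
    offset = sidx_end + first_offset
    segments = []
    for _ in range(count):
        ref, duration, _sap = struct.unpack_from(">III", payload, pos)
        pos += 12
        size = ref & 0x7FFFFFFF  # the top bit is reference_type
        segments.append({
            "start": offset,
            "end": offset + size - 1,  # inclusive, as in an HTTP Range header
            "duration_s": duration / timescale,
        })
        offset += size
    return segments
```

Each entry can then be requested with a header of the form `Range: bytes=start-end` and the response appended to the SourceBuffer.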
Media File Formats
Media data streams are wrapped in a container format. The container includes the physical data of the media but also metadata that are necessary for playback. For example, it signals to the video player the codec used, subtitle tracks, etc. In video streaming there are two main formats that are used for storage and presentation of multimedia content: MPEG-2 Transport Streams (MPEG-2 TS) [25] and ISO Base Media File Formats (ISOBMFF) [24] (MP4 and fragmented MP4).
MPEG-2 Transport Streams are specified by [25] and are designed for broadcasting video through satellite networks. However, Apple adopted it for its adaptive streaming protocol, making it an important format. In MPEG-2 TS, audio, video and subtitle streams are multiplexed together.
MP4 and fragmented MP4 (fMP4) are both part of the MPEG-4 Part 12 standard that covers the ISOBMFF. MP4 is the best-known multimedia container format and it is widely supported in different operating systems and devices. The structure of an MP4 video file is shown in figure 2.2a. As shown, MP4 consists of different boxes, each with a different functionality. These boxes are the basic building block of every container in MP4. For example, the file type box ('ftyp') specifies the compatible brands (specifications) of the file. MP4 files have a Movie Box ('moov') that contains metadata of the media file and sample tables that are important for timing and indexing the media samples ('stbl'). There is also a Media Data Box ('mdat') that contains the corresponding samples. In the fragmented container, shown in figure 2.2b, media samples are interleaved by using Movie Fragment boxes ('moof'), which contain the sample table for the specific fragment (mdat box).
Ref : https://repository.tudelft.nl/islandora/object/uuid%3Ae06cde4c-1514-4a8d-90be-7e10eee5aac1
I'm searching for a way to analyse the content of internet radio streams. I want to write a Ruby client that can get the current track, next track, band, BPM and other meta information from a stream (e.g. a radio station on SHOUTcast).
Does anybody know how to do this? And how do I record that stream into an MP3 or AAC file?
Maybe there is a library that can already do this, but I haven't found one so far.
regards
I'll answer both of your questions.
Metadata
What you are seeking isn't entirely possible. Information on the next track is not available (keep in mind not all stations are just playing songs from a playlist... many offer live content). Advanced metadata such as BPM is not available. All you get is something like this:
Some Band - Some Song
The format of {artist} - {song title} isn't always followed either.
With those caveats, you can get that metadata from a stream by connecting to the stream URL and requesting the metadata with the following request header:
Icy-MetaData: 1
That tells the server to send the metadata, which is interleaved into the stream. Every 8KB or so (the exact interval is specified by the server in a response header), you'll find a chunk of metadata to parse. I have written up a detailed answer on how to parse that here: Pulling Track Info From an Audio Stream Using PHP. That question was language-specific, but you will find that my answer can be easily implemented in any language.
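The interleaving can be sketched in Python as follows. This is only an illustration under a few assumptions: stream is a file-like object whose read(n) returns n bytes unless the stream ends, metaint is the interval taken from the icy-metaint response header, and the function name is my own. After every metaint bytes of audio comes one length byte (multiply by 16 to get the metadata size), followed by that many bytes of NUL-padded metadata.

```python
import re


def split_icy_stream(stream, metaint):
    """Yield (audio_bytes, title_or_None) pairs from an ICY stream body."""
    while True:
        audio = stream.read(metaint)
        if len(audio) < metaint:
            if audio:
                yield audio, None  # trailing partial audio block
            return
        length_byte = stream.read(1)
        if not length_byte:
            yield audio, None
            return
        # The length byte counts 16-byte units; 0 means "no change".
        raw = stream.read(length_byte[0] * 16)
        text = raw.rstrip(b"\0").decode("utf-8", errors="replace")
        match = re.search(r"StreamTitle='(.*?)';", text)
        yield audio, match.group(1) if match else None
```

The audio part of each pair can be written straight to a file, and a non-None title signals that the track information has changed.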
Saving Streams to Disk
Audio playing software is generally very resilient to errors. SHOUTcast servers are built on this principle, and are not knowledgeable about the data going through them. They just receive data from an encoder, and when the client requests the stream, they start sending that data at an arbitrary point.
You can use this to your advantage when saving stream data. It is possible to simply write the stream data as it comes in to a file. Most audio players will play them without problem. I have tested this with MP3 and AAC.
If you want a more conformant file, you will have to use a library or parse the stream yourself to split on the appropriate frames, and then handle bit reservoir issues in your code. This is a lot of work, and generally isn't worth doing unless you find your files have real compatibility problems.
I am downloading various sound files (e.g. MP3, AIFF) with my own C++ HTTP client. Now I want to parse them using Core Audio's AudioToolbox to get linear PCM data for playback with, for example, OpenAL. According to this document: https://developer.apple.com/library/mac/#documentation/MusicAudio/Conceptual/CoreAudioOverview/ARoadmaptoCommonTasks/ARoadmaptoCommonTasks.html , it should also be possible to create an audio file from memory. Unfortunately I didn't find any way of doing this when browsing the API, so what is the common way to do this? Please don't say that I should save the file to my hard drive first.
Thank you!
I have done this using an input memory buffer, avoiding any files. In my case I started with AAC audio and used Apple's API AudioConverterFillComplexBuffer to do the hardware decompression into LPCM. The trick is that you have to define a callback function to supply each packet of input data; that API call does the format conversion on a per-packet basis. In my case I had to write code to parse the compressed AAC data to identify packet starts (0xfff), then use the callback to spoon-feed each packet into the API call. I am also using OpenAL for audio rendering, which has its own challenges when avoiding input files.
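The packet-splitting step can be illustrated in Python, assuming the AAC data is in ADTS framing (which is where the 0xfff syncword comes from). This is not the Core Audio code itself, only a sketch of the parsing logic with a hypothetical function name: each frame starts with the 12-bit syncword and carries its own 13-bit length, header included.

```python
def split_adts_frames(data):
    """Split a buffer of ADTS-framed AAC into individual frames."""
    frames = []
    pos = 0
    while pos + 7 <= len(data):
        # Look for the 12-bit syncword 0xFFF at the current position.
        if data[pos] != 0xFF or (data[pos + 1] & 0xF0) != 0xF0:
            pos += 1
            continue
        # frame_length spans bytes 3 to 5 and includes the header.
        frame_length = (
            ((data[pos + 3] & 0x03) << 11)
            | (data[pos + 4] << 3)
            | (data[pos + 5] >> 5)
        )
        if frame_length < 7:
            pos += 1  # false sync, keep scanning
            continue
        if pos + frame_length > len(data):
            break  # incomplete frame, wait for more data
        frames.append(data[pos:pos + frame_length])
        pos += frame_length
    return frames
```

Each returned frame corresponds to one packet that the input callback would hand to the converter.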
I'm writing a DirectShow source filter which is registered as a CLSID_VideoInputDeviceCategory, so it can be seen as a Video Capture Device (from Skype, for example, it is viewed as another WebCam).
My source filter is based on the VCam example from here, and, for now, the filter produces the exact output as this example (random colored pixels with one Video output pin, no audio yet), all implemented in the FillBuffer() method of the one and only output pin.
Now the real scenario will be a bit more tricky: the filter uses a file handle to a hardware device, opened using the CreateFile() API call (opening the device is out of my control, and is done by a third-party library). It should then read chunks of data from this handle (usually in chunks of 256-512 bytes).
The device is a WinUSB device and the third-party framework just "gives" me an open file handle to read chunks from.
The data read by the filter is a *.mp4 file, which is streamed from the device to the "handle".
This scenario is equivalent to a source filter reading from a *.mp4 file on the disk (in "chunks") and pushing its data to the DirectShow graph, but without the ability to read the file entirely from start to end, so the file size is unknown (Correct?).
I'm pretty new to DirectShow and I feel as though I'm missing some basic concepts. I'll be happy if anyone can direct me to solutions\resources\explanations for the following questions:
1) From various sources on the web and Microsoft SDK (v7.1) samples, I understood that for an application (such as Skype) to build a correct & valid DirectShow graph (so it will render the Video & Audio successfully), the source filter pin (inherits from CSourceStream) should implement the method "GetMediaType". Depending on the returned value from this implemented function, an application will be able to build the correct graph to render the data, thus, build the correct order of filters. If this is correct - How would I implement it in my case so that the graph will be built to render *.mp4 input in chunks (we can assume constant chunk sizes)?
2) I've noticed that the FillBuffer() method is supposed to call SetTime() on the IMediaSample object it gets (and fills). I'm reading raw *.mp4 data from the device. Will I have to parse the data and extract the frames & time values from the stream? If yes, an example would be great.
3) Will I have to split the data received from the file handle (the "chunks") into video & audio, or can the data be pushed to the graph without the need to manipulate it in the source filter? If a split is needed, how can it be done (the data is not continuous, and is split into chunks), and will this affect the desired implementation of "GetMediaType"?
Please feel free to correct me if I'm using incorrect terminology.
Thanks :-)
This is a good question. On the one hand this is doable, but there are some specifics involved.
First of all, a filter registered under the CLSID_VideoInputDeviceCategory category is expected to behave as a live video source. Registering it there makes it discoverable by applications (such as Skype, as you mentioned), and those applications will attempt to configure the video resolution and will expect video to arrive at a real-time rate. Some applications (such as Skype) do not expect compressed video such as H.264 there and would simply reject such a device. You also cannot attach audio directly to this filter, because applications would not even look for audio there (I'm not sure whether your filter has audio, but you mentioned an .MP4 file, so audio might be present).
On your questions:
1 - You will get a better picture of the application's requirements by checking which interface methods applications call on your filter. Most of those methods are implemented by the BaseClasses, which convert the calls into internal methods such as GetMediaType. Yes, you need to implement it, and by doing so you will, among other things, enable your filter to connect to downstream filter pins by trying the specific media types you support.
Again, those cannot be MP4 chunks, even if such an approach can work in other DirectShow graphs. When implementing a video capture device you should deliver exactly video frames, preferably decompressed (they could be compressed too, but you will immediately run into compatibility issues with applications).
A solution you might consider is to embed a full-featured graph internally, into which you inject your MP4 chunks. The pipeline then parses and decodes them and delivers frames to your custom renderer, from which you take the frames and re-expose them through your virtual device. This might be a good design, though it assumes a certain understanding of how filters work internally.
2 - Your device is typically treated as, and expected to be, a live source, which means that you deliver video in real time and frames are not necessarily time stamped. So you can put times there, and yes, you definitely need to extract time stamps from your original media (or have that done by the internal graph mentioned in item 1 above). However, be prepared for applications to strip time stamps, especially for preview purposes, since the source is "live".
3 - Getting back to audio, you cannot implement audio on the same virtual device. Well, you can, and the filter might even work in a custom-built graph, but it is not going to work with applications. They will look for a separate audio device, and if you implement one, they will instantiate it separately. So you are expected to implement both a virtual video source and a virtual audio source, and to implement the synchronization between them behind the scenes. This is where timestamps become important: by providing them correctly you keep lip sync in the live session as it was on the media file you are streaming from.
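On the timestamps from item 2: DirectShow sample times are expressed in 100-nanosecond units, so the start and stop values for a given frame follow from the frame index and the frame rate. The arithmetic is shown here in Python purely as an illustration (the function name and the constant-frame-rate assumption are mine); in the filter itself the equivalent integers would be passed to SetTime().

```python
UNITS_PER_SECOND = 10_000_000  # one unit is 100 ns


def frame_times(frame_index, fps_num, fps_den=1):
    """Start and stop time, in 100 ns units, of a frame at a constant rate.

    The rate is given as a fraction (for example 30000/1001) so that the
    integer arithmetic does not accumulate rounding drift.
    """
    start = frame_index * UNITS_PER_SECOND * fps_den // fps_num
    stop = (frame_index + 1) * UNITS_PER_SECOND * fps_den // fps_num
    return start, stop
```

If the original media carries its own time stamps, those should take precedence over values computed from a nominal frame rate.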
I have to write an RTMP parser that handles packets captured from an RTMP stream in Wireshark; I will extract the data from the pcap.
I have gone through the specs, and I understand the handshake process and can locate the media in the TCP packets. However, I am confused about the case of multiple audio/video sessions interleaved within a single pcap: how can we handle that so that our parser can parse multiple streams simultaneously? Any unique identifier would be very helpful for parsing the different RTMP streams simultaneously.
EDIT (after #Martin Redmond's answer): Yes, that I am able to figure out, but it seems that some FLV data is being streamed over RTMP with the FLV header missing, and there seems to be a different handshake, with FLV data streaming for the same IP on different ports. So I cannot tell whether it is a real FLV file or only a header: if I extract only the header and the other data, I am not able to make an FLV file from it.
Is there any way to validate or extract the media from that RTMP stream?
The header information for each chunk of data lets you figure out which stream the chunk belongs to. It's not straightforward, though. The header information gets compressed, and the relevant info may have been sent only at the beginning of the stream, so you need to keep a context for each chunk.
The important part is the streamid. Video and audio from the same source will have the same streamid but will have different channel numbers and datatypes.
In the spec, the streamid is referred to as the message stream id (section 6.1.2.1) and is only sent with a type 0 header.
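A minimal Python sketch of that per-chunk context could look like this. It decodes only the chunk header (not the payload), the function name and dictionary keys are my own, and the contexts dictionary is what carries the message stream id forward from the last type 0 header seen on each chunk stream.

```python
def parse_chunk_header(data, pos, contexts):
    """Parse one RTMP chunk header starting at data[pos].

    contexts maps chunk stream id -> the most recent header fields, so
    compressed headers (types 1 to 3) inherit what they omit.
    Returns (chunk_stream_id, fields, position_after_header).
    """
    fmt = data[pos] >> 6
    csid = data[pos] & 0x3F
    pos += 1
    if csid == 0:  # 2-byte basic header
        csid = data[pos] + 64
        pos += 1
    elif csid == 1:  # 3-byte basic header
        csid = data[pos] + data[pos + 1] * 256 + 64
        pos += 2
    ctx = contexts.setdefault(csid, {})
    if fmt <= 2:  # timestamp (type 0) or timestamp delta (types 1, 2)
        ctx["timestamp_field"] = int.from_bytes(data[pos:pos + 3], "big")
        pos += 3
    if fmt <= 1:  # message length and message type id
        ctx["length"] = int.from_bytes(data[pos:pos + 3], "big")
        ctx["type_id"] = data[pos + 3]  # 8 = audio, 9 = video
        pos += 4
    if fmt == 0:  # message stream id, stored little-endian
        ctx["stream_id"] = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
    if ctx.get("timestamp_field") == 0xFFFFFF:
        pos += 4  # skip the extended timestamp
    return csid, dict(ctx), pos
```

Since chunk stream state is per TCP connection, a parser working from a pcap would keep one contexts dictionary per connection (address and port pair) and then group messages by the stream_id found there.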