digital audio output - what format is it in? - macos

My MacBook has an optical digital audio output 3.5 mm plug (see here). I'm asking here on SO because I think this is a standard digital audio output plug; the description says I should use a Toslink cable with a Toslink mini-plug adapter or a fiber-optic cable.
I was wondering: What is the format of the audio data transferred over this cable? Is it a fixed format, e.g. 44.1 kHz, 16-bit integer, two-channel (standard PCM as on an audio CD)? Or which formats does it allow? For example, I would like to send 96 kHz (or 48 kHz), 32-bit float (or 24-bit integer), two-channel (or 6-channel) audio data over it. How is the data encoded? How does the receiver (the D/A converter) know about the format? Is there some communication back from the receiver, i.e. does the receiver tell my computer what format it would prefer? Or how do I find out the maximum sample rate and the maximum bit depth per sample?
How do I do that on the software side? Is it enough to tell CoreAudio to use whatever format I like, and it puts that unmodified onto the cable? At least that is my goal. So basically my main questions are: What formats are supported, and how do I make sure that the raw audio data from my application reaches the cable in exactly that format?

Digital audio interconnects like TOSLINK use the S/PDIF protocol. The channel layout and compression status are encoded in the stream, and the sample rate is implied by the speed at which the signal is sent (!). For uncompressed streams, S/PDIF transmits 24-bit (integer) PCM data. (Lower bit depths can be transmitted as well; S/PDIF just pads them out to 24 bits anyway.) Note that, due to bandwidth constraints, compression must be used if more than two channels are being transmitted.
From the software side, on OS X, most of the properties of a digital audio output are controlled by the settings of your audio output device.
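As a rough sketch of what that looks like in code (assuming the optical output is the default output device; the constants and functions below are the real Core Audio property API, with error handling omitted):

    // Query and, if supported, change the nominal sample rate of the default output device.
    #include <CoreAudio/CoreAudio.h>
    #include <stdio.h>

    int main(void) {
        AudioDeviceID device = kAudioObjectUnknown;
        UInt32 size = sizeof(device);
        AudioObjectPropertyAddress addr = {
            kAudioHardwarePropertyDefaultOutputDevice,
            kAudioObjectPropertyScopeGlobal,
            kAudioObjectPropertyElementMaster
        };
        AudioObjectGetPropertyData(kAudioObjectSystemObject, &addr, 0, NULL, &size, &device);

        addr.mSelector = kAudioDevicePropertyNominalSampleRate;
        Float64 rate = 0;
        size = sizeof(rate);
        AudioObjectGetPropertyData(device, &addr, 0, NULL, &size, &rate);
        printf("current nominal sample rate: %.0f Hz\n", rate);

        Float64 wanted = 96000.0;   // e.g. 96 kHz; only takes effect if the device supports it
        AudioObjectSetPropertyData(device, &addr, 0, NULL, sizeof(wanted), &wanted);
        return 0;
    }

The list of rates the device actually accepts (i.e. the answer to the "maximum sample rate" part of the question) can be read from the kAudioDevicePropertyAvailableNominalSampleRates property of the same device.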

Related

Convert LPCM buffer to AAC for HTTP Live Streaming

I have an application that records audio from devices into a Float32 (LPCM) buffer.
However, LPCM needs to be encoded in an audio format (MP3, AAC) to be used as a media segment for streaming, according to the HTTP Live Streaming specification. I have found some useful resources on how to convert an LPCM file to an AAC / MP3 file, but this is not exactly what I am looking for, since I am not looking to convert a file but a buffer.
What are the main differences between converting an audio file and a raw audio buffer (LPCM, Float32)? Is the latter simpler?
My initial thought was to create a thread that would regularly fetch data from a ring buffer (where the raw audio is stored) and convert it to a valid audio format (either AAC or MP3).
Would it be more sensible to do so immediately when the AudioBuffer is captured through an AURenderCallback, thereby pruning the ring buffer?
Thanks for your help,
The Core Audio recording buffer length and the desired audio file length are rarely exactly the same. So it might be better to poll your circular/ring buffer (you know the sample rate, which tells you approximately how often) to decouple the two rates, and convert the buffer (if filled sufficiently) to a file at a later time. You can memory-map a raw audio file to the buffer, but there may or may not be any performance difference between that and asynchronously writing a temp file.
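If you do convert the buffer in-process rather than via files, the general shape looks roughly like the sketch below. The descriptors and AudioConverterNew are real Audio Toolbox APIs; the sample rate and channel count are assumptions to match the question, and the actual encoding loop (AudioConverterFillComplexBuffer with an input callback pulling from the ring buffer) is omitted.

    // Set up an AudioConverter from interleaved Float32 LPCM to AAC packets.
    #include <AudioToolbox/AudioToolbox.h>

    AudioConverterRef MakeAACConverter(void) {
        AudioStreamBasicDescription in = {0};
        in.mSampleRate       = 44100;                 // assumed; use your capture rate
        in.mFormatID         = kAudioFormatLinearPCM;
        in.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
        in.mChannelsPerFrame = 2;
        in.mBitsPerChannel   = 32;
        in.mBytesPerFrame    = in.mChannelsPerFrame * sizeof(Float32);
        in.mFramesPerPacket  = 1;
        in.mBytesPerPacket   = in.mBytesPerFrame;

        AudioStreamBasicDescription out = {0};
        out.mSampleRate       = 44100;
        out.mFormatID         = kAudioFormatMPEG4AAC; // AAC packets for the HLS segmenter
        out.mChannelsPerFrame = 2;                    // the encoder fills in the remaining fields

        AudioConverterRef converter = NULL;
        if (AudioConverterNew(&in, &out, &converter) != noErr)
            return NULL;
        // Feed raw Float32 frames from the ring buffer to the converter via
        // AudioConverterFillComplexBuffer (input callback not shown in this sketch).
        return converter;
    }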

Output from an ADC needs to be stored in memory

We want to take the output of a 16-bit analog-to-digital converter, which arrives at a rate of 10 million samples per second, and save the sequence of 16 output bits in computer memory. How do we save this 16-bit binary voltage signal (0 V, 5 V) in computer memory?
If an FPGA is to be used, please elaborate on the method.
1. Sample the data and feed it into a FIFO.
2. Take the data from the FIFO, prepare UDP frames, and send them over Ethernet.
3. Receive the UDP packets on the PC side and put them in memory (see the sketch below).
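For step 3, a minimal sketch of the PC-side receiver (plain BSD sockets; the port and buffer sizes are arbitrary assumptions). Note the bandwidth: 10 Msps x 16 bit is about 20 MB/s (~160 Mbit/s), so gigabit Ethernet is required.

    // Receive UDP datagrams and append the raw 16-bit samples to a buffer in memory.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(5000);            /* assumed port */
        bind(sock, (struct sockaddr *)&addr, sizeof(addr));

        size_t capacity = 64u * 1024 * 1024;           /* 64 MB of sample storage */
        uint8_t *samples = malloc(capacity);
        size_t used = 0;
        uint8_t packet[1472];                          /* max UDP payload for a 1500-byte MTU */

        while (used + sizeof(packet) <= capacity) {
            ssize_t n = recvfrom(sock, packet, sizeof(packet), 0, NULL, NULL);
            if (n <= 0)
                break;
            memcpy(samples + used, packet, (size_t)n); /* append raw 16-bit samples */
            used += (size_t)n;
        }
        printf("captured %zu bytes (%zu samples)\n", used, used / 2);
        close(sock);
        free(samples);
        return 0;
    }

In practice you would also put a sequence number in each UDP frame on the FPGA side so the PC can detect dropped packets.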

rtp decoding issue on p frames

I am streaming an RTSP stream from an IP camera. I have a parser which packages the data into frames based on the RTP payload type. The parser is able to process I-frames, since these contain the start-of-frame and end-of-frame packets as well as packets in between (this is the FU-A payload type).
These are combined to create a complete frame. The problem comes in when I try to construct P-frames: from the Wireshark dump, some of these appear to be fragmented (FU-A payload type) and contain the start-of-frame and end-of-frame packets, but no packets in between. Also, in some instances the camera sends strangely marked packets with a payload type of 1, which according to my understanding should be a complete frame.
Upon processing these two versions of P-frames, I then use ffmpeg to attempt to decode the frames, and I receive error messages like "top block unavailable for intra mode 4x4".
At first I thought this could be due to an old ffmpeg version but I searched the web and recompiled ffmpeg with the same problem.
The I-frames appear fragmented and contain lots of packets. Some P-frames have a start of frame (0x81) and EOF (0x41) but no packets in between, and some just look corrupt, starting with 0x41 (it seems like this should be the second byte), which gives a payload type of 1. I am a novice when it comes to these issues, but I have looked at the RTP documentation and I cannot find an issue with how I handle the data.
Also, when I stream with VLC it seems fine but appears to halve the frame rate; I am not sure how it is able to reconstruct the frames.
Please could someone help.
It is common for I-frames to be fragmented since they are usually a lot bigger than P-frames. P-frames can, however, also be fragmented. There is nothing wrong with a P-frame that has been fragmented into 2 RTP packets, i.e. one with the FU header start bit set and the following one with the end bit set. There do not need to be packets in between. For example, if the MTU is 1500 and the NAL unit is 1600 bytes, it will be fragmented into 2 RTP packets.
As for the packets "looking corrupt" starting with 0x41 without a prior packet with a 0x81, you should examine the sequence number in the RTP header as this will tell you straight away if packets are missing. If you are seeing packet loss, the first thing to try is to increase your socket receiver buffer size.
Since VLC is able to play the stream, there is most likely an issue in the way you are reassembling the NAL units.
Also, in your question it is not always clear which byte you are referring to: I'm assuming that the 0x41 and 0x81 appear in the 2nd byte of the RTP payload, i.e. the FU header in the case where the NAL unit type of the first byte is FU-A.
Finally, note that "payload type" is the RTP payload type (RFC3550), not the NAL unit type defined in the H.264 standard.
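To illustrate, here is a small sketch of how those two bytes decode under RFC 6184 (the byte values are the ones mentioned in the question; 0x7C is an example FU-A indicator byte):

    // Interpret the first two bytes of an H.264 RTP payload (RFC 6184).
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t payload[] = { 0x7C, 0x81 };   /* NAL header 0x7C: type 28 = FU-A; FU header 0x81 */

        uint8_t nal_type = payload[0] & 0x1F; /* lower 5 bits of the first payload byte */
        if (nal_type == 28) {                 /* FU-A fragmentation unit */
            uint8_t fu = payload[1];
            int start = (fu & 0x80) != 0;     /* S bit: 0x81 has it set -> first fragment */
            int end   = (fu & 0x40) != 0;     /* E bit: 0x41 has it set -> last fragment  */
            int type  = fu & 0x1F;            /* type of the fragmented NAL unit, e.g. 1 = non-IDR slice */
            printf("FU-A fragment: start=%d end=%d fragmented NAL type=%d\n", start, end, type);
        } else {
            printf("single NAL unit packet, type=%d\n", nal_type);
        }
        return 0;
    }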

MME Audio Output Buffer Size

I am currently playing around with outputting FP32 samples via the old MME API (waveOutXxx functions). The problem I've bumped into is that if I provide a buffer length that does not evenly divide the sample rate, audible clicks appear in the audio stream; when recorded, it looks like some of the samples are lost (I'm generating a sine wave for the test). Currently I am using the "magic" value of 2205 samples per buffer for a 44100 Hz sample rate.
The question is, does anybody know the reason for these dropouts and if there is some magic formula that provides a way to compute the "proper" buffer size?
The safe alignment of data buffers is the value of nBlockAlign in the WAVEFORMATEX structure.
Software must process a multiple of nBlockAlign bytes of data at a time. Data written to and read from a device must always start at the beginning of a block. For example, it is illegal to start playback of PCM data in the middle of a sample (that is, on a non-block-aligned boundary).
For PCM formats this is the number of bytes for a single sample across all channels. Non-PCM formats have their own alignments, often equal to the length of a format-specific block, e.g. 20 ms.
Back when waveOutXxx was the primary API for audio, carrying over unaligned bytes was an unreasonable burden for the API and unneeded performance overhead. Nowadays this API is a compatibility layer on top of other audio APIs, and I suppose that unaligned bytes are simply stripped so the rest of the content still plays, where otherwise the whole buffer would be rejected because of this small, non-fatal inaccuracy on the caller's part.
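As a sketch of what block alignment means for the Float32 case in the question (the 50 ms figure is just an example; the fields are the real WAVEFORMATEX members):

    // 32-bit float stereo: nBlockAlign is 8 bytes, so buffers sized in whole frames
    // (e.g. 2205 frames = 50 ms at 44100 Hz) are always block-aligned.
    #include <windows.h>
    #include <mmreg.h>    /* WAVE_FORMAT_IEEE_FLOAT */
    #include <stdio.h>

    int main(void) {
        WAVEFORMATEX wfx = {0};
        wfx.wFormatTag      = WAVE_FORMAT_IEEE_FLOAT;
        wfx.nChannels       = 2;
        wfx.nSamplesPerSec  = 44100;
        wfx.wBitsPerSample  = 32;
        wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;  /* 8 bytes per frame */
        wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

        DWORD frames      = wfx.nSamplesPerSec / 20;        /* 2205 frames = 50 ms */
        DWORD bufferBytes = frames * wfx.nBlockAlign;       /* multiple of nBlockAlign by construction */
        printf("buffer: %lu frames, %lu bytes\n", (unsigned long)frames, (unsigned long)bufferBytes);
        return 0;
    }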
If you fill the audio buffer with sine samples and play it looped, it will very easily click unless the buffer length is a whole multiple of the sine's period, as you said... The audible click is in fact a discontinuity in the waveform. A more advanced technique is to fill the buffer dynamically, that is, set a callback notification as the buffer pointer advances and fill the buffer with the appropriate data at the appropriate offset. I would use a larger buffer, as 2205 samples is too short to get an async notification, calculate the data, and write the buffer, all while playing; but that depends on CPU power.
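For example, a sketch of choosing a buffer length that holds an exact number of cycles (441 Hz is picked because it divides 44100 evenly; any frequency works as long as the buffer contains whole periods):

    // A sine buffer holding a whole number of cycles loops without a click.
    #include <math.h>
    #include <stdio.h>
    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    int main(void) {
        const double sampleRate = 44100.0, freq = 441.0;      /* 100 frames per cycle */
        const int cycles = 10;
        const int frames = (int)(cycles * sampleRate / freq); /* 1000 frames, no fractional cycle */
        float buf[1000];
        for (int i = 0; i < frames; i++)
            buf[i] = (float)sin(2.0 * M_PI * freq * i / sampleRate);
        /* The sample that would follow buf[frames - 1] has the same phase as buf[0],
           so looping the buffer produces a continuous waveform. */
        printf("first %.6f, last %.6f\n", buf[0], buf[frames - 1]);
        return 0;
    }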

Internet Video | FFMPEG | 2-PASS encoding vs. 1-PASS CRF

What is the most preferable way to encode internet video?
2-pass encoding probably takes longer to process, but results in a smaller file size and a more predictable average bitrate (?). Correct?
CRF (constant rate factor) results in a constant rate, but a higher file size?
What is the default way sites like YouTube or Vimeo encode their videos? And should I do it any other way than I do now with 2-pass encoding?
Fredrick is right about VBR vs. CBR, but dropson mentions CRF (constant rate factor), which is actually kind of a third method. CBR and VBR both lock in on a bit rate, while CRF locks in on a perceived visual quality. It also takes into account motion in the video, and can typically achieve better compression than 2-pass VBR. More info.
It's the default setting if you're using x264 or Zencoder. I'd go with CRF any time you're doing H.264.
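For reference, typical ffmpeg invocations for both approaches (filenames, CRF value and target bitrate are placeholders, not recommendations):

    # CRF: single pass, targets constant perceived quality (lower CRF = higher quality)
    ffmpeg -i input.mov -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k output.mp4

    # Two-pass: targets an average bitrate; the first pass only gathers statistics
    ffmpeg -y -i input.mov -c:v libx264 -b:v 2000k -pass 1 -an -f null /dev/null
    ffmpeg -i input.mov -c:v libx264 -b:v 2000k -pass 2 -c:a aac -b:a 128k output.mp4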
There are two encoding modes for video
CBR or Constant Bit Rate
Its main usage is when you have a fixed carrier for your data; the best example here is the video telephony use case, where audio/video/control information needs to co-exist on a fixed 64 kbit/s carrier. Since this is a real-time use case, one-pass encoding is used and the rate controller (RC) does its best to assign a fixed number of bits to each frame so that the bitrate is deterministic.
VBR or Variable Bit Rate
This encoding scheme is used practically everywhere else. Variable here means that, for example, if the video goes black or there is no motion, almost no bits are sent, i.e. the bitrate drops towards 0 for that particular moment; then, when things start to move again, the bitrate skyrockets.
This encoding scheme normally has no real-time requirements, e.g. when encoding/transcoding a video. Normally you would use a multi-pass encoder here to get the highest quality and to even out the bitrate peaks.
YouTube uses VBR. Use e.g. clive to download videos from YouTube and analyse them using ffmpeg, and you'll see the variable bitrate in action.
As always, Wikipedia is your friend; read its entries on VBR and CBR.
There is no reason for you to use anything other than VBR (unless you plan to set up a streaming server).
