Synchronisation for audio decoders - algorithm

There's a following setup (it's basically a pair of TWS earbuds and a smartphone):
2 audio sink devices (or buds), both are connected to the same source device. One of these devices is primary (and is responsible for handling connection), other is secondary (and simply sniffs data).
Source device transmits a stream of encoded data and sink device need to decode and play it in sync with each other. There problem is that there's a considerable delay between each receiver (~5 ms # 300 kbps, ~10 ms # 600 kbps and # 900 kbps).
It seems that synchronisation mechanism which is already implemented simply doesn't want to work, so it seems that my only option is to implement another one.
It's possible to send messages between buds (but because this uses the same radio interface as sink-to-source communication, only small amount of bytes at relatively big interval could be transferred, i.e. 48 bytes per 300 ms, maybe few times more, but probably not by much) and to control the decoder library.
I tried the following simple algorithm: secondary will send every 50 milliseconds message to primary containing number of decoded packets. Primary would receive it and update state of decoder accordingly. The decoder on primary only decodes if the difference between number of already decoded frame and received one from peer is from 0 to 100 (every frame is 2.(6) ms) and the cycle continues.
This actually only makes things worse: now latency is about 200 ms or even higher.
Is there something that could be done to my synchronization method or I'd be better using something other? If so, what would be the best in such case? Probably fixing already existing implementation would be the best way, but it seems that it's closed-source, so I cannot modify it.

Related

STM32F411 I need to send a lot of data by USB with high speed

I'm using STM32F411 with USB CDC library, and max speed for this library is ~1Mb/s.
I'm creating a project where I have 8 microphones connected into ADC line (this part works fine), I need a 16-bit signal, so I'm increasing accuracy by adding first 16 signals from one line (ADC gives only 12-bits signal). In my project, I need 96k 16-bit samples for one line, so it's 0,768M signals for all 8 lines. This signal needs 12000Kb space, but STM32 have only 128Kb SRAM, so I decided to send about 120 with 100Kb data in one second.
The conclusion is I need ~11,72Mb/s to send this.
The problem is that I'm unable to do that because CDC USB limited me to ~1Mb/s.
Question is how to increase USB speed to 12Mb/s for STM32F4. I need some prompt or library.
Or maybe should I set up "audio device" in CubeMX?
If small b means byte in your question, the answer is: it is not possible as your micro has FS USB which max speeds is 12M bits per second.
If it means bits your 1Mb (bit) speed assumption is wrong. But you will not reach the 12M bit payload transfer.
You may try to write (only if b means bit) your own class but I afraid you will not find a ready made library. You will need also to write the device driver on the host computer

Is Realterm dropping characters or am I?

I'm using a SAML21 board to accept some data over a serial connection, and at the moment, just mirror it to a serial port on a computer. However this data is 6 bytes at ~250Hz(It was closer to 3KHz before). As far as I can tell I'm tracking the start and end bytes correctly, however my columnar alignment gets out of whack on occasion in realterm.
I have it set up for 6 bytes in single mode. SO all columns should be presenting the same bytes up and down. However, over time as I increase the rate at which I mirror(I am still receiving the data at a fixed rate) the first column's byte tends to float.
I have not used realterm at speeds this high before, so I am not aware of it's limitations.

NFC APDU READ command performance tuning

I am reading several hundred bytes from a DESFire card using APDU commands.
The data application is authenticated, and the response MAC'ed.
I submit a series of READ_DATA commands (0xBD), each retrieving 54 bytes+MAC while increasing read offset for each command.
Will this operation go much quicker if I use a long READ with ADDITIONAL_FRAME (AF) instead of many sequential reads?
I understand that a simple AF is 1 byte vs 8 bytes for a full READ DATA command, thus reducing the number of bytes transferred by ~10%.
But will use of AF give additional performance benefits, for example because of less processing needed by the card?
I am asking this since I am getting only ~220kbit/s effective transfer rate when the theoretical limit is 424kbit/s. See my question on this here
I modified my reads to use ADDITIONAL FRAME.
This reduced the total bytes sent+received from 1628 to 1549 bytes, a 5% reduction.
The time used by tranciecve() was reduced from 602ms to 593ms, a 1.5% reduction.
The conclusion is that use of AF will not give additional performance other that the reduced time for bytes transfered.
The finding also indicates that since the time was reduced by a much lower factor than the data reduction, there must be operations that introduce significant latency that is not dependent on the data trancieved, but ReadFile is not the one.
Authenticate, SelectApplication or ReadRecord may be significantly more expensive than the time used for data transfer.
(Wanted to write a comment, but it got quite long...)
I would use the ADDITIONAL FRAME (AF) way, some reasoning follows:
in addition to the mentioned 7 bytes saved in the command, you save 4 MAC bytes in all of the responses (but the last one, of course)
every time you read 54 bytes, you wastefully MAC 2 zero bytes of padding, which might have been MACed as data (given by DES block size of 8). In the "AF way" there is only a single MAC run covering all the data, so this does not happen here
you are not enforcing the actual frame size. It is up to the reader and the card to select the right frame size (here I am not 100% sure how DESFire handles the ISO 14443-4 chaining and FSD)
some readers can handle the AF situation on their own and (magically?) give you the complete answer (doing the AF somehow in their firmware -- I have seen at least one reader doing this)
If my thoughts are (at least partially) correct, these points make only 9ms difference in your scenario. But under another scenario they might make much more.
Additional notes:
I would exclude the SELECT APPLICATION and AUTHENTICATE from the benchmark and measure them separately. It is up to you, but I would say that they only interfere with the desired "raw" data transfer measurement
I would recommend to benchmark the pure "plain data transfer" mode which is (presumably) the fastest "raw" data transfer possible
Thank you for sharing your results, not many people do so...Good luck!

MME Audio Output Buffer Size

I am currently playing around with outputting FP32 samples via the old MME API (waveOutXxx functions). The problem I've bumped into is that if I provide a buffer length that does not evenly divide the sample rate, certain audible clicks appear in the audio stream; when recorded, it looks like some of the samples are lost (I'm generating a sine wave for the test). Currently I am using the "magic" value of 2205 samples per buffer for 44100 sample rate.
The question is, does anybody know the reason for these dropouts and if there is some magic formula that provides a way to compute the "proper" buffer size?
Safe alignment of data buffers is the value of nBlockAlign of WAVEFORMATEX structure.
Software must process a multiple of nBlockAlign bytes of data at a
time. Data written to and read from a device must always start at the
beginning of a block. For example, it is illegal to start playback of
PCM data in the middle of a sample (that is, on a non-block-aligned
boundary).
For PCM formats this is the amount of bytes for single sample across all channels. Non-PCM formats have their own alignments, often equal to length of format-specific block, e.g. 20 ms.
Back in time when waveOutXxx was the primary API for audio, carrying over unaligned bytes was an unreasonable burden for the API and unneeded performance overhead. Right now this API is a compatibility layer on top of other audio APIs, and I suppose that unaligned bytes are just stripped to still play the rest of the content, which would otherwise be rejected in full due to this small glitch, which might be just a smaller and non-fatal caller's inaccuracy.
if you fill the audio buffer with sine sample and play it looped , very easily it will click , unless the buffer length is not a multiple of the frequence, as you said ... the audible click in fact is a discontinuity in the wave ...an advanced techinques is to fill the buffer dinamically , that is, you should set a callback notification while the buffer pointer advance and fill the buffer with appropriate data at appropriate offset. i would use a more large buffer as 2205 is too short to get an async notification , calculate data , and write the buffer ,all that while playing , but it would depend of cpu power

Why wait SIFS time before sending ACK?

Question about the MAC-protocol of 802.11 Wifi.
We have learned that when a station has received the data it waits for SIFS time. Then it sends the packet. When searching online the reason that is always mentioned is to give ACK packets a higher priority. This is understandable since a station first has to wait DIFS time when it wants to send normal data (and DIFS is larger than SIFS).
But why wait at all? Why not immediately send the ACK? The station knows the data has arrived and the CRC is correct, so why wait?
It is theoretically possible to know that the CRC is correct at the exact end of the received data from the wire, but in practice, you need to accumulate all the samples in the last block in order to run the IFFT, deconvolution, FEC, and then, finally, at the very end, after finally getting the input data out of the waveform, do you know that the CRC is correct. Also, you sometimes need to turn on transmit circuitry to send the ACK, which can hamper receive performance. If each step in the processing chain were instantaneous, and if the transmit circuitry definitely didn't interfere with the receive circuitry, and if there were no lead-time necessary for building the waveform for the ACK, it'd be possible to send the ACK immediately after getting the last bit of the wave-form. But, while each element in this chain takes some deterministic time, it is not instantaneous. SIFS gives the receiver time to get the data from the PHY, verify it, and send the response.
Disclaimer: I'm more familiar with Homeplug than 802.11.
It is like that because Distributed Coordination Function (DCF) and Point Coordination Function (PCF) mode can coexist within one cell. That is a base station may use polling while at the same time the cell can use disitributed coordination using CSMA/CA.
So during SIFS, control frames or next fragment may be sent. During PIFS, PCF frames may be sent and during DIFS DCF frames may be sent. During SIFS and PIFS, PCF can work its magic.
Even though not all base stations support PCF all stations must wait since some may support it.
Update:
The way I understand this now is that during SIFS the station may send RTS,CTS or ACK and have enough time to switch back to receiving mode before the sender starts to transmit. If that's correct, it will send ACK during SIFS. Then it will change to receive mode and wait until SIFS elapses. When SIFS has elapsed the transmitter will start sending.
Also, PCF is controlled by PIFS which comes after SIFS and before DIFS and is therefor not relevant for this discussion (my mistake). That is, SIFS < PIFS < DIFS < EIFS.
Sources: This PDF (page 8) and Computer Networks by Andrew S. Tanenbaum
SIFS = RTT (based on PHY Transmission rate) + FRAME PROCESSING DELAY AT RECEIVER (PHY PROCESSING DELAY + MAC PROCESSING DELAY) + FRAME PROCESSING DELAY (FOR COMPOSING RESPONSE CTS/ACK)+ RF TUNER DELAY (CHANGE FROM RX to TX)
A the Transmitter side, after last PHY Symbol (of RTS, e.g), the time required to change to RX mode (at RF). So, I would see SIFS as a RX side calculation than a TX side.
I can't say for sure but it sounds like an optimization strategy similar to IP. If you don't require an ACK for every data packet, it makes sense to hold off for a bit so that, if more data packets arrive, you can acknowledge them all with a single ACK.
Example: client sends 400 packets really fast to the server. Rather than the server sending back 400 ACKs, it can simply wait until the client takes a breather before sending a single ACK back. Combined with the likelihood that the client will take a breather even under heavy load (it has to as its unacknowledged-packets buffer fills up), this would be workable.
This is possible in systems where the ACK(n) means "I've received everything up to and including packet # n.
You'll get better performance and less traffic by using such a strategy. As long as the wait-before-sending-ack time on the receiver is less than the retransmit-if-no-ack-before time on the sender (taking transmission delays into account), there should be no problem.
Quick crash-course on 802.11:
802.11 is a essentially a giant system of timers. The most common implementations of 802.11 utilize the distributed coordination function, DCF. The DCF allows for nodes to come in and out of the range of a radio channel being used for 802.11 and coordinate in a distributed fashion who should be sending and receiving data (ignoring hidden and exposed node problems for this discussion). Before any node can begin sending data on the channel they all must wait a period of DIFS, in which the channel is determined to be idle, if it is idle during a DIFS period the first node to grab the channel begins transmitting. In standard 802.11, i.e. non-802.11e implementations and non 802.11n, every single data packet that gets transmitted needs to be acknowledged by a physical layer, PHY, acknowledgment packet, irregardless of the upper layer protocol being used. After a data packet gets sent a SIFS time period needs to expire, after SIFS expires control frames destined for the node that has "taken" control of the channel may be sent, in this instance and acknowledgment frame is transmitted. SIFS allows the node that sent the data packet to switch from transmitting to receiving mode. If a packet does get lost and no ACK is received after SIFS/ACK timeout occurs, then exponential back-off is invoked. Exponential back-off, a.k.a contention window (CW), begins at a value CWmin, in some linux implementation this is 15 slot times, where a slot time varies depending on the 802.11 protocol that is being used. The CW value is then chosen from 1 to whatever the upper limit that has been calculated for CW. If the current packet was lost, then the CW is incremented from 15 to 30, and then a random value is chosen between 1 and 30. Every-time there is a consecutive lose the CW doubles up to 1023, at which point it hits a limit. Once a packet is received successfully the CW is reset back to CWmin.
In regards to 802.11n / 802.11e:
Every data packet still needs to be acknowledged, but when using 802.11e (implemented into 802.11n) multiple data packets can be aggregated together in two different ways A-MSDU or A-MPDU. A-MSDU is a jumbo-frame that has one checksum for the entire aggregated packet being sent, within it are many sub-frames that contain each of the data frames that needed to be sent. If there is any error in the A-MSDU frame and it needs to be retransmitted, then every sub-frame is required to be resent. However, when using A-MPDU, each sub-frame has a small header and checksum that allow for any sub-frame that has an error in it to be retransmitted by itself/within another aggregated frame the next time the sending nodes gains the channel. With these aggregated packet sending schemes there is the notion of the block-ack. The block-ack contains a bitmap of the frames from a starting sequence number that were just sent in the aggregated packet and received correctly or incorrectly. Usage of aggregated frame sending greatly improves throughput performance as more data can be sent per channel acquisition by a sending node, also allowing out-of-order packet sending. However, out-order packet sending greatly complicates the 802.11 MAC layer.
SIFS=D+M+Rx/Tx
Where,
D=(Receiver delay (RF delay) and decoding of physical layer convergence procedure (PLCP) preamble/header)
M=(MAC processing delay)
Rx/Tx=(transceiver turnaround time)
Above all the delays can not be avoided so It has to wait SIFS time before sending acknowledgement

Resources