How can I avoid distortion and stuttering in DirectSound? - windows

I have a DirectSound application I'm writing in C, running on Windows 7. The application just captures some sound frames, and plays them back. For sanity-checking the capture results, I'm writing out the PCM data to a file, which I can play in Linux using aplay.
Unfortunately, the sound is choppy, sometimes contains stuttering (and plays at the wrong speed in Linux). Oddly, the amount of distortion observed when playing the capture file is less if the PCM data is not played in the playback buffer at the time of capture.
Here's the initialization of my WAVEFORMATEX:
memset(&wfx, 0, sizeof(WAVEFORMATEX));
wfx.cbSize = 0;
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nChannels = 1;
wfx.nSamplesPerSec = sampleRate;
wfx.wBitsPerSample = sampleBitWidth;
wfx.nBlockAlign = (wfx.nChannels * wfx.wBitsPerSample) / 8;
wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;
The sampleRate is 8000, and sampleBitWidth is 16.
I create a capture and play buffer using this same structure, and the capture buffer has 3 notification positions. I start capturing with:
lpDsCaptureBuffer->Start(DSCBSTART_LOOPING);
I then spark off a playback thread that calls WaitForMultipleObjects on the events associated with the notification points. Upon notification, I reset all the events, and copy the 1 or 2 pieces of the capture buffer to a local buffer, and pass those on to a play routine:
void playFromBuff(LPVOID captureBuff,DWORD captureLen) {
LPVOID playBuff;
DWORD playLen;
HRESULT hr;
hr = lpDsPlaybackBuffer->Lock(0L,captureLen,&playBuff,&playLen,NULL,0L,0L);
memcpy(playBuff,captureBuff,playLen);
hr = lpDsPlaybackBuffer->Unlock(playBuff,playLen,NULL,0L);
hr = lpDsPlaybackBuffer->SetCurrentPosition(0L);
hr = lpDsPlaybackBuffer->Play(0L,0L,0L);
}
(some error-checking omitted).
Note that the playback buffer has no notification positions. Each time I get a chunk from the capture buffer, I lock the playback buffer starting at position 0.
The capture code, guarded by the WaitForMultipleObjects, looks like:
lpDsCaptureBuffer->GetCurrentPosition(NULL,&readPos);
hr = lpDsCaptureBuffer->Lock(...,...,&captureBuff1,&captureLen1,&captureBuff2,&captureLen2,0L);
where the ellipses contain calculations involving the current and previously-seen read positions. I'm omitting those likely-wrong calculations -- I suspect that's where the problem lies.
My notification positions are multiples of 1024. Yet the read positions reported are 1500, 2500, and 3500. So if I see a read position of 1500, does that mean I can read bytes 0 to 1500? And when I next see 2500, does that mean I should read from 1501 to 2500? Why don't those read positions correspond exactly to my notification positions? What's the right algorithm here?
I've tried the simpler alternative of stopping the capture when the capture buffer is full, without other notification positions. But that means, I think, allowing some sound to escape capture.

The DirectSound API is nowadays a compatibility layer on top of other, "real" audio capture APIs. Internally, the lower-level capture API fills its own buffers (apparently in chunks ending at those multiples of 500) and then hands the filled chunks to DirectSound capture, which in turn reports them to you. That is why the reported read positions land on multiples of 500: DirectSound itself only has data available at those boundaries.
Since you are interested in the captured data, your assumption is correct: the read position is what matters. You get the notification, and the read position tells you the offset up to which it is safe to read. Because the capture path is layered, there is some latency involved: each layer has to pass chunks of data to the next before they become available to you.

Related

How can I read the received packets with a NDIS filter driver?

I am currently experimenting with the NDIS driver samples.
I am trying to print the packets contents (including the MAC-addresses, EtherType and the data).
My first guess was to implement this in the function FilterReceiveNetBufferLists. Unfortunately I am not sure how to extract the packets contents out of the NetBufferLists.
That's the right place to start. Consider this code:
void FilterReceiveNetBufferLists(..., NET_BUFFER_LIST *nblChain, ...)
{
UCHAR buffer[14];
UCHAR *header;
for (NET_BUFFER_LIST *nbl = nblChain; nbl; nbl = nbl->Next) {
header = NdisGetDataBuffer(nbl->FirstNetBuffer, sizeof(buffer), buffer, 1, 1);
if (!header)
continue;
DbgPrint("MAC address: %02x-%02x-%02x-%02x-%02x-%02x\n",
header[0], header[1], header[2],
header[3], header[4], header[5]);
}
NdisFIndicateReceiveNetBufferLists(..., nblChain, ...);
}
There are a few points to consider about this code.
The NDIS datapath uses the NET_BUFFER_LIST (nbl) as its primary data structure. An nbl represents a set of packets that all have the same metadata. For the receive path, nobody really knows much about the metadata, so that set always has exactly 1 packet in it. In other words, the nbl is a list... of length 1. For the receive path, you can count on it.
The nbl is a list of one or more NET_BUFFER (nb) structures. An nb represents a single network frame (subject to LSO or RSC). So the nb corresponds most closely to what you think of as a packet. Its metadata is stored on the nbl that contains it.
Within an nb, the actual packet payload is stored as one or more buffers, each represented as an MDL. Mentally, you should pretend the MDLs are just concatenated together. For example, the network headers might be in one MDL, while the rest of the payload might be in another MDL.
Finally, for performance, NDIS gives as many NBLs to your LWF as possible. This means there's a list of one or more NBLs.
Put it all together, and you have:
Your function receives a list of NBLs.
Each NBL contains exactly 1 NB (on the receive path).
Each NB contains a list of MDLs.
Each MDL points to a buffer of payload.
So in our example code above, the for-loop iterates along that first bullet point: the chain of NBLs. Within the loop, we only need to look at nbl->FirstNetBuffer, since we can safely assume there is no other nb besides the first.
It's inconvenient to have to fiddle with all those MDLs directly, so we use the helper routine NdisGetDataBuffer. You tell this guy how many bytes of payload you want to see, and he'll give you a pointer to a contiguous range of payload.
In the good case, your buffer is contained in a single MDL, so NdisGetDataBuffer just gives you a pointer back into that MDL's buffer.
In the slow case, your buffer straddles more than one MDL, so NdisGetDataBuffer carefully copies the relevant bit of payload into a scratch buffer that you provided.
The latter case can be fiddly, if you're trying to inspect more than a few bytes. If you're reading all 1500 bytes of the packet, you can't just allocate 1500 bytes on the stack (kernel stack space is scarce, unlike usermode), so you have to allocate it from the pool. Once you figure that out, note it will slow things down to copy all 1500 bytes of data into a scratch buffer for every packet. Is the slowdown too much? It depends on your needs. If you're only inspecting occasional packets, or if you're deploying the LWF on a low-throughput NIC, it won't matter. If you're trying to get beyond 1Gbps, you shouldn't be memcpying so much data around.
Also note that if you ultimately want to modify the packet, you'll need to be wary of NdisGetDataBuffer. It can give you a copy of the data (stored in your local scratch buffer), so if you modify the payload, those changes won't actually stick to the packet.
What if you do need to scale to high throughputs, or modify the payload? Then you need to work out how to manipulate the MDL chain. That's a bit confusing at first, but spend a little time with the documentation and draw yourself some whiteboard diagrams.
I suggest first starting out by understanding an MDL. From networking's point of view, an MDL is just a fancy way of holding a { char * buffer, size_t length }, along with a link to the next MDL.
Next, consider the NB's DataOffset and DataLength. These conceptually move the buffer boundaries in from the beginning and the end of the buffer. They don't really care about MDL boundaries -- for example, you can reduce the length of the packet payload by decrementing DataLength, and if that means that one or more MDLs are no longer contributing any buffer space to the packet payload, it's no big deal, they're just ignored.
Finally, add on top CurrentMdl and CurrentMdlOffset. These are redundant with everything above, but they exist for (microbenchmark) performance. You aren't required to even think about them if you're reading the NB, but if you are editing the size of the NB, you do need to update them.

Real Time Workaround using windows for fixed sampling time

I am trying to collect data off an accelerometer sensor. I have an Arduino doing the analog to digital conversion of the signal and sending it through a serial port to MATLAB on Windows.
I send a reading every 5ms from the Arduino through the serial port. I am saving that data using MATLAB's serial read in a vector as well as the time at which it was read using the clock method.
If I plot the column of the vector where I saved the time of each read, I get a non-linear curve, and when I look at the difference between one read and the next, I see that it varies slightly.
Is there any way to get the data saved in real time with fixed sampling time?
Note: I am using 250000 baud rate.
Matlab Code:
%%%%% Initialisation %%%%%
clear all
clc
format shortg
cnt = 1;%File name changer
sw = 1;%switch: 0 we add to current vector and 1 to start new vector
%%%%% Initialisation %%%%%
%%%%% Communication %%%%%
arduino=serial('COM7','BaudRate',250000);
fopen(arduino);
%%%%% Communication %%%%%
%%%%% Reading from Serial and Writing to .mat file%%%%%
while true,
if sw == 0,
if (length(Vib(:,1))==1000),% XXXX Samples in XX minutes
filename = sprintf('C:/Directory/%d_VibrationReading.mat',cnt);
save (filename,'Vib');
clear Vib
cnt= cnt+1;
sw = 1;
end
end
scan = fscanf(arduino,'%f');
if isfloat(scan) && length(scan(:,1))==6,% Change length for validation
vib = scan';
if sw == 1,
Vib = [vib clock];
sw = 0;
else
Vib = [Vib;vib clock];
end
end
end
%%%%% Reading from Serial and Writing to .mat file%%%%%
% Close Arduino Serial Port
fclose(arduino);
Image 1 shows the data received through serial (each Row corresponding to 1 serial read)
Image 2 shows that data saved with the clock
I know that my answer does not contain a quick and easy solution; instead it primarily gives advice on how to redesign your system. I worked with real-time systems for several years and saw this done wrong too many times. It might be possible to tweak the performance of your current communication pattern, but I am convinced you will never get reliable timing information from it.
I will answer this from a general system design perspective, instead of trying to fix your code. Where I see the problems:
In general, it is a bad idea to append time information on the receiving PC. Whenever the sensor is capable and has a clock, append the time information on the sensor system itself. This allows for an accurate relative timing between the measurements. Some clock adjustment might be necessary when the clock on the sensor is not set properly, but that is just a constant offset.
Switch from ASCII-encoded data to binary data. With your sample rate and baud rate set, you only have 50 bytes for each message.
Write a robust receiver. Just dropping messages you "don't understand" is not a good idea. Whenever the buffer fills up, you may receive multiple messages in one read unless you use a proper terminator.
Use preallocation. You know how large the batches you want to write are.
A simple solution for a message:
2 bytes - clock milliseconds
4 bytes - unix timestamp of measurement
For each sensor
2 bytes - int16 sensor data
2 bytes - Terminator, constant value. Use a value which is outside the range for all previous integers, e.g. intmax
This message format should theoretically allow you to use 21 sensors. Now to the receiving part:
To get a first version running with a good performance, call fread (serial) with large batches of data (size parameter) and dump all readings into a large cell array. Something like:
C = cell(1000,1);
% seek until you hit a terminator
while fread(arduino, 1, 'int16') ~= terminator
end
% read large batches of int16 samples
for ix = 1:numel(C)
    C{ix} = fread(arduino, 1000, 'int16');
end
fclose(arduino);
Once you have read the data, append it to a single vector (C=[C{:}];) and parse it in post-processing. If performance allows, you may later return to on-the-fly processing, but I recommend starting this way to get the system established.

OSX CoreAudio: Getting inNumberFrames in advance - on initialization?

I'm experimenting with writing a simplistic single-AU play-through based, (almost)-no-latency tracking phase vocoder prototype in C. It's a standalone program. I want to find how much processing load can a single render callback safely bear, so I prefer keeping off async DSP.
My concept is to have only one pre-determined value, the window step, also called hop size or decimation factor (different names for the same term in different literature sources). This number would equal inNumberFrames, which somehow depends on the device sampling rate (and what else?). All other parameters, such as window size and FFT size, would be set in relation to the window step. This seems the simplest method for keeping everything inside one callback.
Is there a safe method to machine-independently and safely guess or query which could be the inNumberFrames before the actual rendering starts, i.e. before calling AudioOutputUnitStart()?
The phase vocoder algorithm is mostly standard and very simple, using vDSP functions for FFT, plus custom phase integration and I have no problems with it.
Additional debugging info
This code is monitoring timings within the input callback:
static Float64 prev_stime; //prev. sample time
static UInt64 prev_htime; //prev. host time
printf("inBus: %d\tframes: %d\tHtime: %lld\tStime: %7.2lf\n",
(unsigned int)inBusNumber,
(unsigned int)inNumberFrames,
inTimeStamp->mHostTime - prev_htime,
inTimeStamp->mSampleTime - prev_stime);
prev_htime = inTimeStamp->mHostTime;
prev_stime = inTimeStamp->mSampleTime;
Curiously enough, the argument inTimeStamp->mSampleTime actually shows the number of rendered frames (the name of the argument seems somewhat misleading). This number is always 512, even if another sampling rate has been set through AudioMIDISetup.app at runtime, as if the value had been programmatically hard-coded. On one hand, the
inTimeStamp->mHostTime - prev_htime
interval gets dynamically changed depending on the sampling rate set, in a mathematically clear way. As long as the sampling rate is a multiple of 44100 Hz, actual rendering goes on. On the other hand, 48 kHz multiples produce the rendering error -10863 (= kAudioUnitErr_CannotDoInCurrentContext). I must have missed a very important point.
The number of frames is usually the sample rate times the buffer duration. There is an Audio Unit API to request a sample rate and a preferred buffer duration (such as 44100 and 5.8 ms resulting in 256 frames), but not all hardware on all OS versions honors all requested buffer durations or sample rates.
Assuming audioUnit is an input audio unit:
UInt32 inNumberFrames = 0;
UInt32 propSize = sizeof(UInt32);
AudioUnitGetProperty(audioUnit,
kAudioDevicePropertyBufferFrameSize,
kAudioUnitScope_Global,
0,
&inNumberFrames,
&propSize);
This number would equal inNumberFrames, which somehow depends on the device sampling rate (and what else?)
It depends on what you attempt to set it to. You can set it.
// attempt to set duration
NSTimeInterval _preferredDuration = ...
NSError* err;
[[AVAudioSession sharedInstance]setPreferredIOBufferDuration:_preferredDuration error:&err];
// now get the actual duration it uses
NSTimeInterval _actualBufferDuration;
_actualBufferDuration = [[AVAudioSession sharedInstance] IOBufferDuration];
It would use a value roughly around the preferred value you set. The actual value used is a time interval based on a power of 2 and the current sample rate.
If you are looking for consistency across devices, choose a value around 10 ms. The worst-performing reasonable modern device is the iOS iPod touch 16GB without the rear-facing camera, and even it can do around 10 ms callbacks with no problem. On some devices you "can" set the duration quite low and get very fast callbacks, but often it will crackle because the processing does not finish before the next callback arrives.

MME Audio Output Buffer Size

I am currently playing around with outputting FP32 samples via the old MME API (waveOutXxx functions). The problem I've bumped into is that if I provide a buffer length that does not evenly divide the sample rate, certain audible clicks appear in the audio stream; when recorded, it looks like some of the samples are lost (I'm generating a sine wave for the test). Currently I am using the "magic" value of 2205 samples per buffer for 44100 sample rate.
The question is, does anybody know the reason for these dropouts and if there is some magic formula that provides a way to compute the "proper" buffer size?
Safe alignment of data buffers is the value of nBlockAlign of WAVEFORMATEX structure.
Software must process a multiple of nBlockAlign bytes of data at a
time. Data written to and read from a device must always start at the
beginning of a block. For example, it is illegal to start playback of
PCM data in the middle of a sample (that is, on a non-block-aligned
boundary).
For PCM formats this is the amount of bytes for single sample across all channels. Non-PCM formats have their own alignments, often equal to length of format-specific block, e.g. 20 ms.
Back when waveOutXxx was the primary API for audio, carrying over unaligned bytes was an unreasonable burden for the API and unneeded performance overhead. Right now this API is a compatibility layer on top of other audio APIs, and I suppose that unaligned trailing bytes are simply stripped so the rest of the content still plays, rather than the whole buffer being rejected over a small, non-fatal inaccuracy on the caller's part.
If you fill the audio buffer with a sine wave and play it looped, it will very easily click unless the buffer length is a whole multiple of the sine period; the audible click is in fact a discontinuity in the waveform. A more advanced technique is to fill the buffer dynamically: set a callback notification as the play cursor advances, and write appropriate data at the appropriate offset. I would use a larger buffer, as 2205 samples is too short to receive an async notification, calculate the data, and write the buffer, all while playing, but that depends on CPU power.

PID Control Loops with large and unpredictable anomalies

Short Question
Is there a common way to handle very large anomalies (order of magnitude) within an otherwise uniform control region?
Background
I am working on a control algorithm that drives a motor across a generally uniform control region. With no / minimal loading the PID control works great (fast response, little to no overshoot). The issue I'm running into is there will usually be at least one high load location. The position is determined by the user during installation, so there is no reasonable way for me to know when / where to expect it.
When I tune the PID to handle the high-load location, it causes large overshoots in the non-loaded areas (which I fully expected). While it is OK to overshoot mid-travel, there are no mechanical hard stops on the enclosure. The lack of hard stops means that any significant overshoot can / does cause the control arm to be disconnected from the motor (yielding a dead unit).
Things I'm Prototyping
Nested PIDs (very aggressive when far away from the target, conservative when close by)
Fixed gain when far away, PID when close
Conservative PID (works with no load) + an external control that looks for the PID to stall and applies additional energy until either the target is achieved or a rapid rate of change is detected (i.e. leaving the high-load area)
Hardware Limitations
Full travel defined
Hardstops cannot be added (at this point in time)
Update
My answer does not indicate that this is the best solution; it's just my current solution, which I thought I would share.
Initial Solution
stalled_pwm_output = PWM / | ΔE |
PWM = Max PWM value
ΔE = last_error - new_error
The initial relationship successfully ramps up the PWM output based on the lack of change in the motor. See the graph below for the sample output.
This approach makes sense for the situation where the non-aggressive PID has stalled. However, it has the unfortunate (and obvious) issue that when the non-aggressive PID is capable of achieving the setpoint and attempts to slow, the stalled_pwm_output ramps up. This ramp-up causes a large overshoot when traveling to a non-loaded position.
Current Solution
Theory
stalled_pwm_output = (kE * PID_PWM) / | ΔE |
kE = Scaling Constant
PID_PWM = Current PWM request from the non-aggressive PID
ΔE = last_error - new_error
My current relationship still uses the 1/ΔE concept, but uses the non-aggressive PID PWM output to determine the stall_pwm_output. This allows the PID to throttle back the stall_pwm_output when it starts getting close to the target setpoint, yet allows 100% PWM output when stalled. The scaling constant kE is needed to ensure the PWM gets into the saturation point (above 10,000 in graphs below).
Pseudo Code
Note that the result from the cal_stall_pwm is added to the PID PWM output in my current control logic.
int calc_stall_pwm(int pid_pwm, int new_error)
{
int ret = 0;
int dE = 0;
static int last_error = 0;
const int kE = 1;
// Allow the stall_control until the setpoint is achieved
if( FALSE == motor_has_reached_target())
{
// Determine the error delta
dE = abs(last_error - new_error);
last_error = new_error;
// Protect from divide by zeros
dE = (dE == 0) ? 1 : dE;
// Determine the stall_pwm_output
ret = (kE * pid_pwm) / dE;
}
return ret;
}
Output Data
Stalled PWM Output
Note that in the stalled PWM output graph the sudden PWM drop at ~3400 is a built in safety feature activated because the motor was unable to reach position within a given time.
Non-Loaded PWM Output