I want to play a sound sample (about 2 minutes long) and record keystrokes while the sample is playing. I need to know the exact time a key was pressed relative to the playback start time, with a resolution of 1 millisecond or less.
First I tried NAudio and a WinForms application. I found it quite impossible to synchronize the playback start time and the keystroke time. I know when I start transferring the sample bytes to NAudio, but not how much time passes between handing over the sample and the actual playback. I also know when my application receives the KeyDown event, but not how long the event actually takes to travel all the way from the keyboard hardware to my C# event handler.
This is more or less accurate: I get a 270 ms (± 5 ms) offset between the delay reported by the application and the actual delay. (I know the actual delay from recording the session and looking at the sample file. The recording is made on a different device, not the computer running the application, of course.)
This isn't good enough because of the ± 5 ms jitter. It happens even when I disable generation 2 GC during playback. I need to be more accurate than this.
I don't mind switching to C++ and unmanaged code, but I'd like to know which sound playback API to use, and whether there are more accurate ways to get the keyboard input than waiting for the WM_KEYDOWN message.
With NAudio, every output device that implements the IWavePosition interface can accurately report where the playback position is. It can take a bit of trial and error to learn exactly how to convert the position into time, but this is your best approach to solving this problem.
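As a rough illustration of the position-to-time conversion mentioned above, here is a minimal sketch in Python. It assumes the reported position is a byte count in the output format (uncompressed PCM); the function name and parameters are hypothetical and not part of NAudio.

```python
def position_to_ms(position_bytes, sample_rate, channels, bits_per_sample):
    """Convert a playback position in bytes to elapsed milliseconds.

    Assumption: the position counts bytes of uncompressed PCM in the
    output format, so one frame holds one sample per channel.
    """
    bytes_per_frame = channels * (bits_per_sample // 8)
    frames_played = position_bytes / bytes_per_frame
    return frames_played * 1000.0 / sample_rate


if __name__ == "__main__":
    # 176400 bytes of 44.1 kHz, 16-bit stereo audio is one second.
    print(position_to_ms(176400, 44100, 2, 16))
```

Reading the position at the moment a key event arrives, and converting it this way, gives the keystroke time relative to what has actually been played rather than to when the bytes were handed over.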
Related
What is the preferred way of synchronizing with monitor refreshes, when vsync is not an option? We enable vsync, however, some users disable it in driver settings, and those override app preferences. We need reliable predictable frame lengths to simulate the world correctly, do some visual effects, and synchronize audio (more precisely, we need to estimate how long a frame is going to be on screen, and when it will be on screen).
Is there any way to force drivers to enable vsync despite what the user set in the driver? Or to ask Windows when a monitor refresh is going to happen? We have issues with manual sleeping when our frame boundaries line up closely with vblank. It causes occasional missed frames, and up to 1 extra frame of input latency.
We mainly use OpenGL, but Direct3D advice is also appreciated.
You should not build your application's timing on the basis of vsync and exact timings of frame presentation. Games don't do that these days and have not done so for quite some time. This is what allows them to keep a consistent speed even if they start dropping frames: their timing, physics computations, AI, etc. are based not on when a frame gets displayed but on actual elapsed time.
Game frame timings are typically sufficiently small (less than 50ms) that human beings cannot detect any audio/video synchronization issues. So if you want to display an image that should have a sound played alongside it, as long as the sound starts within about 30ms or so of the image, you're fine.
Oh, and don't bother switching to Vulkan/D3D12 to resolve this problem. They won't. Vulkan in particular decouples presentation from other tasks, making it basically impossible to know the exact time when an image starts appearing on the screen. You give Vulkan an image, and it presents it... at whatever is the next most opportune moment. You get some control over how that moment gets chosen, but even those choices can be restricted based on factors outside of your control.
Design your program to avoid the need for rigid vsync. Use internal timings instead.
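One common way to drive a simulation from internal timing is a fixed-timestep loop with an accumulator. The sketch below is only an illustration of that idea in Python, not something taken from a particular engine; `run_fixed_steps` is a hypothetical name, and in a real loop the frame durations would be measured with a monotonic clock such as `time.perf_counter()`.

```python
def run_fixed_steps(step_seconds, frame_durations):
    """Advance a simulation in fixed steps, whatever the frame lengths are.

    frame_durations: measured wall-clock length of each rendered frame.
    Returns the number of simulation steps taken and the leftover time
    that has not yet been simulated.
    """
    accumulator = 0.0
    steps = 0
    for frame_dt in frame_durations:
        accumulator += frame_dt
        # Run as many fixed steps as the elapsed time allows. A long
        # (dropped) frame simply produces more steps, so game speed
        # does not depend on when frames reach the screen.
        while accumulator >= step_seconds:
            accumulator -= step_seconds
            steps += 1
    return steps, accumulator


if __name__ == "__main__":
    # Three frames of uneven length, simulated at 64 steps per second.
    print(run_fixed_steps(1 / 64, [0.03125, 0.0078125, 0.0234375]))
```

The leftover time can be used to interpolate between the last two simulation states when drawing, so uneven frame lengths stay invisible.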
Let's say for instance that you are watching a stream with sound that's two seconds too early. No matter how many times you refresh, you still hear sound 2 secs before the image appears.
Would it be possible to then programmatically delay the whole system's sound to match the image? After searching for a while I did not find anything on the topic, for free at least.
I am writing a client/server app in which the server captures live audio samples from an external device (a microphone, for example) and sends them to the client, which then plays them. The app runs on a local network, so bandwidth is not a problem (my sound is 8 kHz, 8-bit stereo, while my network card is 1000 Mb). On the client I buffer the data for a short time and then start playback, and as data arrives from the server I send it to the sound card. This seems to work fine, but there is a problem:
When the buffer on the client side runs out, I hear gaps in the played sound.
I think this is because the sampling clocks of the server and the client differ; that is, 8 kHz on the server is not exactly the same as 8 kHz on the client.
I can solve this by pausing the client's playback and buffering again, but my boss doesn't accept that, since I have plenty of bandwidth and should be able to play the sound with no gaps or pauses.
So I decided to dynamically change the playback speed on the client, but I don't know how.
I am programming natively on Windows and currently use waveOutXXX to play the sound. I can use any other native library (DirectX/DirectSound, Jack, or ...), but it should provide smooth playback on the client.
I have programmed with waveOutXXX many times without any problem and I know it well, but I can't solve my problem of dynamic resampling.
I would suggest that your problem isn't likely due to mis-matched sample rates, but something to do with your buffering. You should be continuously dumping data to the sound card, and continuously filling your buffer. Use a reasonable buffer size... 300ms should be enough for most applications.
Now, over long periods of time, it is possible for the clock on the recording side and the clock on the playback side to drift apart enough that the 300ms buffer is no longer sufficient. I would suggest that rather than resampling at such a small difference, which could introduce artifacts, simply add samples at the encoding end. You still record at 8kHz, but you might add a sample or two every second, to make that 8.001kHz or so. Simply doubling one of the existing samples for this (or even a simple average between one sample and the next) will not be audible. Adjust this as necessary for your application.
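The sample-doubling idea can be sketched as follows. This is a minimal Python illustration under assumptions of my own: `stretch_by_duplication` is a hypothetical helper, and each list item stands for one frame (for stereo, one left/right pair) so that the channels stay aligned.

```python
def stretch_by_duplication(frames, every):
    """Return a copy of `frames` with every `every`-th frame repeated once.

    With 8000 frames per second and every=8000, one frame is added per
    second, which turns 8 kHz into roughly 8.001 kHz.
    """
    out = []
    for index, frame in enumerate(frames, start=1):
        out.append(frame)
        if index % every == 0:
            out.append(frame)  # double this frame
    return out


if __name__ == "__main__":
    one_second = [(128, 128)] * 8000      # 8-bit stereo silence
    print(len(stretch_by_duplication(one_second, 8000)))
```

The `every` value would have to be tuned, or adjusted at run time from the observed buffer level, to match the actual drift between the two clocks.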
I had a similar problem in an application I worked on. It did not involve a network, but it did involve source data being captured in real time at a certain fixed sampling rate, a large amount of signal processing, and finally output to the sound card at a fixed rate. Like you, I had gaps in the playback at buffer boundaries.
It seemed to me like the problem was that the processing being done caused audio data to make it to the sound card in a very jerky manner. That is, it would get a large chunk, then it would be a long time before it got another chunk. The overall throughput was correct, but this latency caused the sound card to often be starved for data. I suppose you may have the same situation with the network piece in your system.
The way I solved it was to first make the audio buffer longer. Then, every time a new chunk of audio was received, I checked how full the buffer was. If it was less than 20% full, I would write some silence to make it around 60% full.
You may think that this goes against reducing the gaps in playback since it is actually adding a gap, but it actually helps. The problem that I was having was that even though I had a significantly large audio buffer, I was always right at the verge of it being empty. With the other latencies in the system, this resulted in playback gaps on almost every buffer.
Writing the silence when the buffer started to get empty, but before it actually did, ensured that the buffer always had some data to spare if the processing fell behind a little. Also, just a single small gap in playback is very hard to notice compared to many periodic gaps.
I don't know if this will work for you, but it should be easy to implement and try out.
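The silence-padding scheme described above could look roughly like this. It is a Python sketch only: the class name, the use of a deque, and the default silence value are assumptions for illustration, and overflow handling is left out.

```python
from collections import deque


class PaddedBuffer:
    """Audio queue that tops itself up with silence when it runs low."""

    def __init__(self, capacity, low_percent=20, target_percent=60, silence=0):
        self.capacity = capacity
        self.low_mark = capacity * low_percent // 100
        self.target_mark = capacity * target_percent // 100
        self.silence = silence  # use 128 for unsigned 8-bit PCM
        self.queue = deque()

    def write(self, chunk):
        # Check the fill level each time a new chunk arrives. If the
        # buffer is nearly empty, pad with silence up to the target
        # level first, so later chunks have some slack.
        if len(self.queue) < self.low_mark:
            padding = self.target_mark - len(self.queue)
            self.queue.extend([self.silence] * padding)
        self.queue.extend(chunk)  # note: overflow is not handled here

    def read(self, count):
        # Hand up to `count` samples to the sound card.
        count = min(count, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


if __name__ == "__main__":
    buf = PaddedBuffer(capacity=100)
    buf.write([1] * 10)
    print(len(buf.queue))
```

The thresholds are the 20% and 60% figures from the description; they would need tuning for a different buffer length or chunk size.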
I need to measure how long it takes from the moment my code executes a transfer call until the actual packets are sent over the air.
Is this possible using the Xcode developer tool "Instruments", or is it best to look at timestamps in my code somewhere?
All help is really appreciated.
I have used Packet Analyzer to debug traces over the air (http://www.fte.com/). But it's a very expensive tool.
Otherwise, you won't get a precise measurement. You have no idea what delays the hardware could introduce.
Although it would be fun to have a look: set the connection interval of your tag, then log timestamps and check whether the delta you measure is similar to the interval you set.
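The timestamp check could be done along these lines. This is a small Python sketch with hypothetical names; it assumes the timestamps were logged in milliseconds, one per transfer, and that the connection interval is known in the same unit.

```python
def interval_deviation(timestamps_ms, expected_interval_ms):
    """Return how far each logged delta is from the configured interval.

    timestamps_ms: timestamps logged by the application, one per transfer.
    expected_interval_ms: the connection interval that was configured.
    """
    deltas = [later - earlier
              for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    return [delta - expected_interval_ms for delta in deltas]


if __name__ == "__main__":
    logged = [0, 30, 61, 90]           # made-up example values
    print(interval_deviation(logged, 30))
```

Deviations that stay near zero suggest the logged timing tracks the connection interval; this still says nothing about the delay added by the radio hardware itself.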
The setup
The game in question uses CoreAudio and a single AudioGraph to play sounds.
The graph looks like this:
input callbacks -> 3DMixer -> DefaultOutputDevice
3DMixer's BusCount is set to 50 sounds max.
All samples are converted to default output device's stream format before being fed to input callbacks. Unused callbacks aren't set (NULL). Most sounds are 3D, so azimuth, pan, distance and gain are usually set for each mixer input, not left alone. They're checked to make sure only valid values are set. Mixer input's playback rate is also sometimes modified slightly to simulate pitch, but for most sounds it's kept at default setting.
The problem
Let's say I run the game and start a level populated with many sounds and lots of action.
I'm running HALLab -> IO Cycle Telemetry window to see how much time it takes to process each sound cycle. It never takes more than 4 ms out of the more than 10 ms available in each cycle, and I can't spot a single peak that would make it go over the allotted time.
At some point when playing the game, when many sounds are playing at the same time (less than 50, but not less than 20), I hear a poof, and from then on only silence. No Mac sounds can be generated from any application on Mac. The IO Telemetry window shows my audio ticks still running, still taking time, still providing samples to output device.
This state persists even when there are fewer, and then no, sounds playing in my game.
Even if I quit the game entirely, Mac sounds generated by other applications don't come back.
Putting the Mac to sleep and waking it up doesn't help either.
Only a full reboot brings the sound back. After that, the first few sounds have crackling in them.
What can I do to avoid the problem? The game is big, complicated, and I can't modify what's playing - but it doesn't seem to overload the IO thread, so I'm not sure if I should. The problem can't be caused by any specific sound data, because all sound samples are played many times before the problem occurs. I'd think any sound going to physical speakers would be screened to avoid overloading them physically, and the sound doesn't have to be loud at all to cause the bug.
I'm out of ideas.