How do you find the audio latency ? (Windows/OSX) - windows

I'm writing a music application, but I have no clue how applications such as Ableton/Cubase/etc find the audio latency of a system (do they ?) so they can compensate for time difference when recording/playing.
Meaning, the audio input latency (from mic to useable buffer) and the audio output latency (from a buffer to sound in speakers).
It seems more complex than just a matter of buffer size, since an internal chain of events occurs in-between the analogic audio and the digital data the software has access to.
Any idea how to (gu)estimate that ?

CoreAudio:
property.mSelector = kAudioDevicePropertyLatency;
if ( AudioObjectHasProperty( id, &property ) == true ) {
result = AudioObjectGetPropertyData( id, &property, 0, NULL, &dataSize, &latency );
WASAPI:
IAudioClient::GetStreamLatency
IAudioClient*& captureAudioClient->GetStreamLatency( ( long long* ) &stream_.latency[mode] );
IAudioClient*& renderAudioClient->GetStreamLatency( ( long long* ) &stream_.latency[mode] );
ASIO:
long inputLatency, outputLatency;
ASIOGetLatencies( &inputLatency, &outputLatency );
ALSA:
snd_pcm_sframes_t frames;
snd_pcm_delay( handle, &frames );
OpenSL:
AudioManager am = (AudioManager)getSystemService(Context.AUDIO_SERVICE);
Method m = am.getClass().getMethod("getOutputLatency", int.class);
latency = (Integer)m.invoke(am, AudioManager.STREAM_MUSIC);
see also:
uint32_t afLatency;
android::AudioSystem::getOutputLatency(&afLatency, ANDROID_DEFAULT_OUTPUT_STREAM_TYPE);
https://android.googlesource.com/platform/system/media/+/4f924ff768d761f53db6fa2dbfb794ba7a65e776/opensles/libopensles/android_AudioPlayer.cpp
https://android.googlesource.com/platform/frameworks/av/+/006ceacb82f62a22945c7702c4c0d78f31eb2290/media/libmedia/AudioSystem.cpp

I'm not sure, but the output is the same as input, with a delay.
If you find the FFT of the input signal and the output one, at the same time, both must to be the very similar of the input.
Make a correlation of both signals, and you must find a pulse.
More tall and thin pulse, more similar input and output.
The time from 0 to the pulse, is the delay time between signals.
The correlation of the FFT of the same signal is a pulse at 0.
The correlation of the FFT of the same signal delayed 1 seg is a pulse at 1s.
So on
Check this link
https://www.dsprelated.com/showarticle/26.php

Related

Why do I need to run this function twice to get the expected output?

In system-verilog i have a random number generator like this:
task set_rand_value(output bit [7:0] rand_frms, output bit [13:0] rand_bcnt);
begin
bit [7:0] rand_num;
// Set the random value between 2 and 254
rand_num=$urandom_range(254,2);
rand_bcnt=$urandom_range(2047,1);
// Check to ensure number is even
if ((rand_num/2)*2 != rand_num) begin
rand_num=rand_num+1;
$display("Generated %d frames. 8'h%h", rand_num, rand_num);
$display("Packets generated with 0x%4h bytes.", rand_bcnt);
end
// Set the random frame number
else begin
rand_frms = rand_num;
$display("Generated %d frames. 8'h%h", rand_num, rand_num);
$display("Packets generated with 0x%4h bytes.", rand_bcnt);
end
end
endtask // set_rand_value
For some reason the first time I run this (and the third and fifth... etc) it returns a value of zero even though a number is properly generated.
I am calling it like this:
bit [ 7:0] num_frms;
bit [13:0] size_val;
// Generate random values
set_rand_value(num_frms, size_val); // Run twice to correct initial write error
$display("Error correction for packets sent: 0x%h", num_frms);
set_rand_value(num_frms, size_val);
$display("The number of packets being sent: 0x%h", num_frms);
Which gives me this output:
Generated 214 frames. 8'hd6
Packets generated with 0x05b2 bytes.
Error correction for packets sent: 0x00
Generated 252 frames. 8'hfc
Packets generated with 0x011f bytes.
The number of packets being sent: 0xfc
0x80fc011f // Expected number
I have tried quite a few things to correct it but for some reason the only way i can get it to behave the way i expect it to is to just run it twice every time i need to use it.
You only set num_frms if the random number is even. Why not just do
function void set_rand_value(output bit [7:0] rand_frms, output bit [13:0] rand_bcnt);
begin
// Set the random value between 2 and 254
rand_frms=$urandom*2;
rand_bcnt=$urandom_range(2047,1);
endfunction
P.S Only use tasks for routines that may block.
The reason why the code above did not work was you were not assigning rand_frms = rand_num; in the first half of the if condition . So only for 50% of the times read_frm would have a value in it. [ 50 % just been approximate ] .

Problems making a parallel Flash memory programmer with arduino

I have a very large project and this a small piece of it, but none the less essential. I have a parallel flash memory chip made by SST and Microchip (a bit confusing) and am having trouble bypassing the write protection. I am using an arduino mega to program it because I do not have time to wait for a programmer to ship from china. here is the datasheet for the flash memory: http://ww1.microchip.com/downloads/en/DeviceDoc/25022B.pdf
void setup() {
Serial.begin(19200);
pinMode(A8,OUTPUT);//OE#
pinMode(A9,OUTPUT);//WE#
for(byte i=0;i<15;i++) //15 bit address bus
pinMode(i+20, OUTPUT);
for(byte i=0;i<8;i++) //8 bit bidirectional data bus
pinMode(i+40, OUTPUT);
wrt(0xAA,0x5555);// erase sector 0 to 0xFF
wrt(0x55,0x2AAA);
wrt(0x80,0x5555);
wrt(0xAA,0x5555);
wrt(0x55,0x2AAA);
wrt(0x30,0);
delay(250);
wrt(0xAA,0x5555);// write byte 0 to 3
wrt(0x55,0x2AAA);
wrt(0xA0,0x5555);
wrt(0x03,0);
delay(1000);
Serial.println("reading....");
for(int i=0;i<16;i++)// read data
rd(i);
}
void loop();
void wrt(byte var, int loc){
datbusout();// set data bus to output mode
digitalWrite(20,HIGH&&(loc&1));
digitalWrite(21,HIGH&&(loc&2));
digitalWrite(22,HIGH&&(loc&4));
digitalWrite(23,HIGH&&(loc&8));
digitalWrite(24,HIGH&&(loc&16));
digitalWrite(25,HIGH&&(loc&32));
digitalWrite(26,HIGH&&(loc&64));
digitalWrite(27,HIGH&&(loc&128));
digitalWrite(28,HIGH&&(loc&256));
digitalWrite(29,HIGH&&(loc&1024));
digitalWrite(30,HIGH&&(loc&2048));
digitalWrite(31,HIGH&&(loc&4096));
digitalWrite(32,HIGH&&(loc&8192));
digitalWrite(33,HIGH&&(loc&16384));
digitalWrite(34,HIGH&&(loc&32768));
for(int i=40;i<48;i++)
digitalWrite(i,HIGH&&(var&(1<<i)));
PORTK=1;// write mode
Serial.println(var,HEX);
delayMicroseconds(20);
PORTK=3;// set OE# and WE# high
}
void rd(int loc){
datbusinp();
byte out=0;
digitalWrite(20,HIGH&&(loc&1));
digitalWrite(21,HIGH&&(loc&2));
digitalWrite(22,HIGH&&(loc&4));
digitalWrite(23,HIGH&&(loc&8));
digitalWrite(24,HIGH&&(loc&16));
digitalWrite(25,HIGH&&(loc&32));
digitalWrite(26,HIGH&&(loc&64));
digitalWrite(27,HIGH&&(loc&128));
digitalWrite(28,HIGH&&(loc&256));
digitalWrite(29,HIGH&&(loc&1024));
digitalWrite(30,HIGH&&(loc&2048));
digitalWrite(31,HIGH&&(loc&4096));
digitalWrite(32,HIGH&&(loc&8192));
digitalWrite(33,HIGH&&(loc&16384));
digitalWrite(34,HIGH&&(loc&32768));
PORTK=2;
delayMicroseconds(1); // wait for read to finish
for(int i=0;i<8;i++)
out|=digitalRead(40+i)<<i;
PORTK=3;
Serial.println(out,HEX);
}
void datbusinp(){
DDRG&=252;// did the same thing like this, just faster
DDRL&=3;
}
void datbusout(){
DDRG|=3;
DDRL|=252;// see last comment
}
for(int i=40;i<48;i++)
digitalWrite(i,HIGH&&(var&(1<<i)));
That is wrong, surely?
You are shifting 1 left 40 times at least, which means you are always writing a zero.
I fixed it!!!! so I accidentally had the wrong values for anding when writing to the address bus. A TON of thanks to Nick Gammon, test would have failed today without him. more about the answer: I needed to change my for loop in the wrt function, and not skip 512 when writing to the address bus. :D code:
digitalWrite(20,HIGH&&(loc&1));
digitalWrite(21,HIGH&&(loc&2));
digitalWrite(22,HIGH&&(loc&4));
digitalWrite(23,HIGH&&(loc&8));
digitalWrite(24,HIGH&&(loc&16));
digitalWrite(25,HIGH&&(loc&32));
digitalWrite(26,HIGH&&(loc&64));
digitalWrite(27,HIGH&&(loc&128));
digitalWrite(28,HIGH&&(loc&256));
digitalWrite(29,HIGH&&(loc&1024));
digitalWrite(30,HIGH&&(loc&2048));
digitalWrite(31,HIGH&&(loc&4096));
digitalWrite(32,HIGH&&(loc&8192));
digitalWrite(33,HIGH&&(loc&16384));
digitalWrite(34,HIGH&&(loc&32768));
needed to become
digitalWrite(20,HIGH&&(loc&1));
digitalWrite(21,HIGH&&(loc&2));
digitalWrite(22,HIGH&&(loc&4));
digitalWrite(23,HIGH&&(loc&8));
digitalWrite(24,HIGH&&(loc&16));
digitalWrite(25,HIGH&&(loc&32));
digitalWrite(26,HIGH&&(loc&64));
digitalWrite(27,HIGH&&(loc&128));
digitalWrite(28,HIGH&&(loc&256));
digitalWrite(29,HIGH&&(loc&512));
digitalWrite(30,HIGH&&(loc&1024));
digitalWrite(31,HIGH&&(loc&2048));
digitalWrite(32,HIGH&&(loc&4096));
digitalWrite(33,HIGH&&(loc&8192));
digitalWrite(34,HIGH&&(loc&16384));

Pyaudio : how to check volume

I'm currently developping a VOIP tool in python working as a client-server. My problem is that i'm currently sending the Pyaudio input stream as follows even when there is no sound (well, when nobody talks or there is no noise, data is sent as well) :
CHUNK = 1024
p = pyaudio.PyAudio()
stream = p.open(format = pyaudio.paInt16,
channels = 1,
rate = 44100,
input = True,
frames_per_buffer = CHUNK)
while 1:
self.conn.sendVoice(stream.read(CHUNK))
I would like to check volume to get something like this :
data = stream.read(CHUNK)
if data.volume > 20%:
self.conn.sendVoice(data)
This way I could avoid sending useless data and spare connection/ increase performance. (Also, I'm looking for some kind of compression but I think I will have to ask it in another topic).
Its can be done using root mean square (RMS).
One way to build your own rms function using python is:
def rms( data ):
count = len(data)/2
format = "%dh"%(count)
shorts = struct.unpack( format, data )
sum_squares = 0.0
for sample in shorts:
n = sample * (1.0/32768)
sum_squares += n*n
return math.sqrt( sum_squares / count )
Another choice is use audioop to find rms:
data = stream.read(CHUNK)
rms = audioop.rms(data,2)
Now if do you want you can convert rms to decibel scale decibel = 20 * log10(rms)

Setting sample rate on AUHAL

I'm using Audio Unit Framework to develop a VOIP app on mac os x.
In my program, I set up an input AUHAL and use the default stream format (44.1kHz,32bit/channel) to capture the audio from mic. In this case, my program works fine.
Here is the Code:
//The default setting in my program
CheckError(AudioUnitGetProperty(m_audCapUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output, //the value is 0
inputBus, //the value is 1
&m_audCapUnitOutputStreamFormat,
&propertySize),
"Couldn't get OutputSample ASBD from input unit") ;
//the inOutputSampleRate is 44100.0
m_audCapUnitOutputStreamFormat.mSampleRate = inOutputSampleRate ;
CheckError(AudioUnitSetProperty(m_audCapUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
inputBus,
&m_audCapUnitOutputStreamFormat,
propertySize),
"Couldn't set OutputSample ASBD on input unit");
//
Since I'm developing a VOIP app, the default format (44.1kHz, 32bits/Channel) isn't appropriate for my program, so I want to change the sample rate to 8kHz.
And I had written this code to change the format in my program:
//......
inOutputFormat.mSampleRate = 8000. ;
inOutputFormat.mFormatID = kAudioFormatLinearPCM ;
inOutputFormat.mChannelsPerFrame = 2 ;
inOutputFormat.mBitsPerChannel = 16 ;
inOutputFormat.mBytesPerFrame = 2 ;
inOutputFormat.mBytesPerPacket = 2 ;
inOutputFormat.mFramesPerPacket = 1 ;
inOutputFormat.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical ;
inOutputFormat.mReserved = 0 ;
CheckError(AudioUnitSetProperty(m_audCapUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
inputBus,
&inOutputFormat,
ui32PropSize),
"Couldn't set AUHAL Unit Output Format") ;
//.......
In this case, the program works fine until my program calls the AudioUnitRender in the callback function; it fails to call the AudioUnitRender with an error code -10876 that means
kAudioUnitErr_NoConnection in the documentation. The error code puzzled me so much, so I googled it but I couldn't find any useful information. Can someone tell me what the error actually means?
This is not the end, I changed the format again by this code:
//.....
inOutputFormat.mSampleRate = 8000. ;
inOutputFormat.mFormatID = kAudioFormatLinearPCM ;
inOutputFormat.mChannelsPerFrame = 2 ;
inOutputFormat.mBitsPerChannel = 32 ;
inOutputFormat.mBytesPerFrame = 4 ;
inOutputFormat.mBytesPerPacket = 4 ;
inOutputFormat.mFramesPerPacket = 1 ;
inOutputFormat.mFormatFlags = kAudioFormatFlagsAudioUnitCanonical ;
inOutputFormat.mReserved = 0 ;
CheckError(AudioUnitSetProperty(m_audCapUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
inputBus,
&inOutputFormat,
ui32PropSize),
"Couldn't set AUHAL Unit Output Format") ;
//........
In this case, the program failed to call the AudioUnitRender again and returned another error code -10863(kAudioUnitErr_CannotDoInCurrentContext). I googled it, but I found
something useful. After reading the information there, I guess the sample rate or format that I set on the AUHAL may not be supported by the hardware.
So I wrote some code to check the available sample rates on the default input device:
//..........
UInt32 propertySize = sizeof(AudioDeviceID) ;
Boolean isWritable = false ;
CheckError(AudioDeviceGetPropertyInfo(inDeviceID, //the inDeviceID is the default input device
0,
true,
kAudioDevicePropertyAvailableNominalSampleRates,
&propertySize,
&isWritable),
"Get the Available Sample Rate Count Failed") ;
m_valueCount = propertySize / sizeof(AudioValueRange) ;
printf("Available %d Sample Rate\n",m_valueCount) ;
CheckError(AudioDeviceGetProperty(inDeviceID,
0,
false,
kAudioDevicePropertyAvailableNominalSampleRates,
&propertySize,
m_valueTabe),
"Get the Available Sample Rate Count Failed") ;
for(UInt32 i = 0 ; i < m_valueCount ; ++i)
{
printf("Available Sample Rate value : %ld\n",(long)m_valueTabe[i].mMinimum) ;
}
//..............
And then I found the available sample rates are 8000, 16000, 32000, 44100, 48000, 88200, and 96000.
The 8000 sample rate is what I set just before, but I get an error code by calling AudioUnitRender, I just want to say, why ?
I had google so much and also read many documentations, but I can't get the answer, can someone solve this problem what I encounter?
In other words; how do I change the sample rate or format on an input-only AUHAL?
Finally I fixed this problem yesterday by myself.
Here is my solution:
Firstly , I use AudioDeviceGetProperty to get the available format list on my defaut input device, then I found the available format list contain : 8khz, 16khz, 32khz, 44.1khz, 48khz, 88.2khz,96khz(I just list the sample rate field here ,but there are other
field in the list).
And then I select one of the available format which obtained in the first step. Take my program as an example , I select the format(8khz,32bits/Channel) and use AudioDeviceSetProperty to set it on the default device but not the AUHAL , this is the key that my program work fine after setinng the format on the AUHAL (OutputScope , inputBus).
The last step , I use the AudioUnitSetProperty to set the format I wanted , the program work fine.
Through this problem and solution , I guess if I want to set the format on the input-only AUHAL ,the format must be match or
can be shift to the available format which the input device is using. So what I need to do is setting the format on the input device firstly and set the format on the input-only AUHAL next.
In my experience, using settings other than 44.1kHz and 16 bit audio results in all sorts of weird errors. Some generic suggestions which might set you on the right path:
Try Novocaine (https://github.com/alexbw/novocaine). It really takes the pain out of working with AudioUnits.
Try getting your app working with 44.1kHz and then downsampling the audio yourself.
The bitrate you are setting may not be compatible with the desired sample rate.
Your answer was very helpful for me. However, the use of AudioDeviceGetProperty is depreciated.
The following listing may be helpful to get things up to date. As an example the sample rate is set to 32 kHz.
// Get the available sample rates of the default input device.
defaultDeviceProperty.mSelector = kAudioDevicePropertyAvailableNominalSampleRates;
CheckError (AudioObjectGetPropertyDataSize(defaultDevice,
&defaultDeviceProperty,
0,
NULL,
&propertySize),
"Couldn't get sample rate count");
int m_valueCount = propertySize / sizeof(AudioValueRange) ;
printf("Available %d Sample Rates\n",m_valueCount) ;
AudioValueRange m_valueTabe[m_valueCount];
CheckError (AudioObjectGetPropertyData(defaultDevice,
&defaultDeviceProperty,
0,
NULL,
&propertySize,
m_valueTabe),
"Couldn't get available sample rates");
for(UInt32 i = 0 ; i < m_valueCount ; ++i)
{
printf("Available Sample Rate value : %f\n", m_valueTabe[i].mMinimum) ;
}
// Set the sample rate to one of the available values.
AudioValueRange inputSampleRate;
inputSampleRate.mMinimum = 32000;
inputSampleRate.mMaximum = 32000;
defaultDeviceProperty.mSelector = kAudioDevicePropertyNominalSampleRate;
CheckError (AudioObjectSetPropertyData(defaultDevice,
&defaultDeviceProperty,
0,
NULL,
sizeof(inputSampleRate),
&inputSampleRate),
"Couldn't get available sample rates");

Robust and fast checksum algorithm?

Which checksum algorithm can you recommend in the following use case?
I want to generate checksums of small JPEG files (~8 kB each) to check if the content changed. Using the filesystem's date modified is unfortunately not an option.
The checksum need not be cryptographically strong but it should robustly indicate changes of any size.
The second criterion is speed since it should be possible to process at least hundreds of images per second (on a modern CPU).
The calculation will be done on a server with several clients. The clients send the images over Gigabit TCP to the server. So there's no disk I/O as bottleneck.
If you have many small files, your bottleneck is going to be file I/O and probably not a checksum algorithm.
A list of hash functions (which can be thought of as a checksum) can be found here.
Is there any reason you can't use the filesystem's date modified to determine if a file has changed? That would probably be faster.
There are lots of fast CRC algorithms that should do the trick:
http://www.google.com/search?hl=en&q=fast+crc&aq=f&oq=
Edit: Why the hate? CRC is totally appropriate, as evidenced by the other answers. A Google search was also appropriate, since no language was specified. This is an old, old problem which has been solved so many times that there isn't likely to be a definitive answer.
CRC-32 comes into mind mainly because it's cheap to calculate
Any kind of I/O comes into mind mainly because this will be the limiting factor for such an undertaking ;)
The problem is not calculating the checksums, the problem is to get the images into memory to calculate the checksum.
I would suggest "stagged" monitoring:
stage 1: check for changes of file timestamps and if you detect a change there hand over to...(not needed in your case as described in the edited version)
stage 2: get the image into memory and calculate the checksum
For sure important as well: multi-threading: setting up a pipeline which enables processing of several images in parallel if several CPU cores are available.
If you are receiving the files over network you can calculate the checksum as you receive the file. This will ensure that you will calculate the checksum while the data is in memory. Hence you won't have to load them into memory from disk.
I believe if you apply this method, you'll see almost-zero overhead on your system.
This is the routines I'm using on an embedded system which does checksum control on firmware and other stuff.
static const uint32_t crctab[] = {
0x0,
0x04c11db7, 0x09823b6e, 0x0d4326d9, 0x130476dc, 0x17c56b6b,
0x1a864db2, 0x1e475005, 0x2608edb8, 0x22c9f00f, 0x2f8ad6d6,
0x2b4bcb61, 0x350c9b64, 0x31cd86d3, 0x3c8ea00a, 0x384fbdbd,
0x4c11db70, 0x48d0c6c7, 0x4593e01e, 0x4152fda9, 0x5f15adac,
0x5bd4b01b, 0x569796c2, 0x52568b75, 0x6a1936c8, 0x6ed82b7f,
0x639b0da6, 0x675a1011, 0x791d4014, 0x7ddc5da3, 0x709f7b7a,
0x745e66cd, 0x9823b6e0, 0x9ce2ab57, 0x91a18d8e, 0x95609039,
0x8b27c03c, 0x8fe6dd8b, 0x82a5fb52, 0x8664e6e5, 0xbe2b5b58,
0xbaea46ef, 0xb7a96036, 0xb3687d81, 0xad2f2d84, 0xa9ee3033,
0xa4ad16ea, 0xa06c0b5d, 0xd4326d90, 0xd0f37027, 0xddb056fe,
0xd9714b49, 0xc7361b4c, 0xc3f706fb, 0xceb42022, 0xca753d95,
0xf23a8028, 0xf6fb9d9f, 0xfbb8bb46, 0xff79a6f1, 0xe13ef6f4,
0xe5ffeb43, 0xe8bccd9a, 0xec7dd02d, 0x34867077, 0x30476dc0,
0x3d044b19, 0x39c556ae, 0x278206ab, 0x23431b1c, 0x2e003dc5,
0x2ac12072, 0x128e9dcf, 0x164f8078, 0x1b0ca6a1, 0x1fcdbb16,
0x018aeb13, 0x054bf6a4, 0x0808d07d, 0x0cc9cdca, 0x7897ab07,
0x7c56b6b0, 0x71159069, 0x75d48dde, 0x6b93dddb, 0x6f52c06c,
0x6211e6b5, 0x66d0fb02, 0x5e9f46bf, 0x5a5e5b08, 0x571d7dd1,
0x53dc6066, 0x4d9b3063, 0x495a2dd4, 0x44190b0d, 0x40d816ba,
0xaca5c697, 0xa864db20, 0xa527fdf9, 0xa1e6e04e, 0xbfa1b04b,
0xbb60adfc, 0xb6238b25, 0xb2e29692, 0x8aad2b2f, 0x8e6c3698,
0x832f1041, 0x87ee0df6, 0x99a95df3, 0x9d684044, 0x902b669d,
0x94ea7b2a, 0xe0b41de7, 0xe4750050, 0xe9362689, 0xedf73b3e,
0xf3b06b3b, 0xf771768c, 0xfa325055, 0xfef34de2, 0xc6bcf05f,
0xc27dede8, 0xcf3ecb31, 0xcbffd686, 0xd5b88683, 0xd1799b34,
0xdc3abded, 0xd8fba05a, 0x690ce0ee, 0x6dcdfd59, 0x608edb80,
0x644fc637, 0x7a089632, 0x7ec98b85, 0x738aad5c, 0x774bb0eb,
0x4f040d56, 0x4bc510e1, 0x46863638, 0x42472b8f, 0x5c007b8a,
0x58c1663d, 0x558240e4, 0x51435d53, 0x251d3b9e, 0x21dc2629,
0x2c9f00f0, 0x285e1d47, 0x36194d42, 0x32d850f5, 0x3f9b762c,
0x3b5a6b9b, 0x0315d626, 0x07d4cb91, 0x0a97ed48, 0x0e56f0ff,
0x1011a0fa, 0x14d0bd4d, 0x19939b94, 0x1d528623, 0xf12f560e,
0xf5ee4bb9, 0xf8ad6d60, 0xfc6c70d7, 0xe22b20d2, 0xe6ea3d65,
0xeba91bbc, 0xef68060b, 0xd727bbb6, 0xd3e6a601, 0xdea580d8,
0xda649d6f, 0xc423cd6a, 0xc0e2d0dd, 0xcda1f604, 0xc960ebb3,
0xbd3e8d7e, 0xb9ff90c9, 0xb4bcb610, 0xb07daba7, 0xae3afba2,
0xaafbe615, 0xa7b8c0cc, 0xa379dd7b, 0x9b3660c6, 0x9ff77d71,
0x92b45ba8, 0x9675461f, 0x8832161a, 0x8cf30bad, 0x81b02d74,
0x857130c3, 0x5d8a9099, 0x594b8d2e, 0x5408abf7, 0x50c9b640,
0x4e8ee645, 0x4a4ffbf2, 0x470cdd2b, 0x43cdc09c, 0x7b827d21,
0x7f436096, 0x7200464f, 0x76c15bf8, 0x68860bfd, 0x6c47164a,
0x61043093, 0x65c52d24, 0x119b4be9, 0x155a565e, 0x18197087,
0x1cd86d30, 0x029f3d35, 0x065e2082, 0x0b1d065b, 0x0fdc1bec,
0x3793a651, 0x3352bbe6, 0x3e119d3f, 0x3ad08088, 0x2497d08d,
0x2056cd3a, 0x2d15ebe3, 0x29d4f654, 0xc5a92679, 0xc1683bce,
0xcc2b1d17, 0xc8ea00a0, 0xd6ad50a5, 0xd26c4d12, 0xdf2f6bcb,
0xdbee767c, 0xe3a1cbc1, 0xe760d676, 0xea23f0af, 0xeee2ed18,
0xf0a5bd1d, 0xf464a0aa, 0xf9278673, 0xfde69bc4, 0x89b8fd09,
0x8d79e0be, 0x803ac667, 0x84fbdbd0, 0x9abc8bd5, 0x9e7d9662,
0x933eb0bb, 0x97ffad0c, 0xafb010b1, 0xab710d06, 0xa6322bdf,
0xa2f33668, 0xbcb4666d, 0xb8757bda, 0xb5365d03, 0xb1f740b4
};
typedef struct crc32ctx
{
uint32_t crc;
uint32_t length;
} CRC32Ctx;
#define COMPUTE(var, ch) (var) = (var) << 8 ^ crctab[(var) >> 24 ^ (ch)]
void crc32_stream_init( CRC32Ctx* ctx )
{
ctx->crc = 0;
ctx->length = 0;
}
void crc32_stream_compute_uint32( CRC32Ctx* ctx, uint32_t data )
{
COMPUTE( ctx->crc, data & 0xFF );
COMPUTE( ctx->crc, ( data >> 8 ) & 0xFF );
COMPUTE( ctx->crc, ( data >> 16 ) & 0xFF );
COMPUTE( ctx->crc, ( data >> 24 ) & 0xFF );
ctx->length += 4;
}
void crc32_stream_compute_uint8( CRC32Ctx* ctx, uint8_t data )
{
COMPUTE( ctx->crc, data );
ctx->length++;
}
void crc32_stream_finilize( CRC32Ctx* ctx )
{
uint32_t len = ctx->length;
for( ; len != 0; len >>= 8 )
{
COMPUTE( ctx->crc, len & 0xFF );
}
ctx->crc = ~ctx->crc;
}
/*** pseudo code ***/
CRC32Ctx crc;
crc32_stream_init(&crc);
while((just_received_buffer_len = received_anything()))
{
for(int i = 0; i < just_received_buffer_len; i++)
{
crc32_stream_compute_uint8(&crc, buf[i]); // assuming buf is uint8_t*
}
}
crc32_stream_finilize(&crc);
printf("%x", crc.crc); // ta daaa
CRC
adler32, available in the zlib headers, is advertised as being significantly faster than crc32, while being only slightly less accurate.
CRC32 is probably good enough, although there's a small chance you might get a collision, such that a file that has been modified might look like it hasn't been because the two versions generate the same checksum. To avoid this possibility I'd therefore suggest using MD5, which will easily be fast enough, and the chances of a collision occurring is reduced to the point where it's almost infinitessimal.
As others have said, with lots of small files your real performance bottleneck is going to be I/O so the issue is dealing with that. If you post up a few more details somebody will probably suggest a way of sorting that out as well.
Your most important requirement is "to check if the content changed".
If it most important that ANY change in the file be detected, MD-5, SHA-1 or even SHA-256 should be your choice.
Given that you indicated that the checksum NOT be cryptographically good, I would recommend CRC-32 for three reasons. CRC-32 gives good hamming distances over an 8K file. CRC-32 will be at least an order of magnitude faster than MD-5 to calculate (your second requirement). Sometimes as important, CRC-32 only requires 32 bits to store the value to be compared. MD-5 requires 4 times the storage and SHA-1 requires 5 times the storage.
BTW, any technique will be strengthened by prepending the length of the file when calculating the hash.
According to the Wiki page pointed to by Luke, MD5 is actually faster than CRC32!
I have tried this myself by using Python 2.6 on Windows Vista, and got the same result.
Here are some results:
crc32: 162.481544276 MBps
md5: 224.489791549 MBps
crc32: 168.332996575 MBps
md5: 226.089336532 MBps
crc32: 155.851515828 MBps
md5: 194.943289532 MBps
I am thinking about the same question as well, and I'm tempted to use the Rsync's variation of Adler-32 for detecting file differences.
Just a postscript to the above; jpegs use lossy compression and the extent of the compression may depend upon the program used to create the jpeg, the colour pallette and/or bit-depth on the system, display gamma, graphics card and user-set compression levels/colour settings. Therefore, comparing jpegs built on different computers/platforms or using different software will be very difficult at the byte level.
This is 5 times faster than CCITT and makes exactly the same job:
Python:
def crc16_fast(data: bytearray, length):
crc = 0xCACA
for i in range(length):
crc ^= data[i]
return crc
C:
uint16_t crc16_fast(const uint16_t* data, size_t length)
{
uint16_t crc = 0xCACA;
for (size_t i = 0; i < length; i++)
crc ^= data[i];
return crc;
}

Resources