I have the following code to send some DTMF digits as audio RTP packets:
int count = 0;
for (int j = 0; j < samples; j++)
{
    waves = 0;
    // DTMF tone for digit 1: 697 Hz + 1209 Hz
    waves += sin( ((PI * 2.0f / 8000) * 697.0f) * count );
    waves += sin( ((PI * 2.0f / 8000) * 1209.0f) * count );
    waves *= 8191.0f; // amplitude
    ++count;
    values[j] = (SInt16)waves;
}
I'm generating the digits programmatically. This code adds up two sine waves and applies scaling, producing 16-bit PCM data which can then be encoded. The sample rate is 8 kHz for transmission as RTP packets.
Have I done this correctly?
There are more efficient ways to program it, and it really shouldn't be hard-coded (volume, frequencies, length, etc), but that's roughly correct. I'd use M_PI instead of PI.
Note: count == j, waves = 0 is useless (change first += to =), etc.
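For reference, here's one way to avoid the hard-coding (a sketch only; the function name and parameter set are mine, not from the question):

#include <cmath>
#include <cstdint>
#include <vector>

// Generates a dual-tone signal as 16-bit PCM samples.
std::vector<int16_t> make_dtmf(double f1_hz, double f2_hz, double amplitude,
                               double sample_rate_hz, std::size_t num_samples)
{
    std::vector<int16_t> pcm(num_samples);
    const double w1 = 2.0 * M_PI * f1_hz / sample_rate_hz;
    const double w2 = 2.0 * M_PI * f2_hz / sample_rate_hz;
    for (std::size_t n = 0; n < num_samples; ++n)
        pcm[n] = static_cast<int16_t>(amplitude * (std::sin(w1 * n) + std::sin(w2 * n)));
    return pcm;
}

// Digit '1' (697 Hz + 1209 Hz) at 8 kHz, matching the question's constants:
// auto samples = make_dtmf(697.0, 1209.0, 8191.0, 8000.0, 8000);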
Slightly off-topic, but if you are sending the data out as RTP, have you checked whether the remote peer supports RFC 2833? If so, you can pass the DTMF digits using RFC 2833-compliant RTP packets and avoid a lot of work at both ends of the link.
Mike
I am reading data from a RIFF WAV file. I have an array DataBuffer holding the data chunk of the WAV file. How can I convert the number of bytes read into the number of seconds of audio read from the file?
int size_buffer = (Subchunk2Size / (NumOfChan * bitsPerSample / 8));
FILE* WavResult = fopen(FileNom, "rb");
u8* DataBuffer = new u8[size_buffer];
size_t nRead = fread(DataBuffer, sizeof DataBuffer[0], size_buffer, WavResult);
I think you mixed things up a bit. Assuming the field names in the WAV header are as described in http://soundfile.sapp.org/doc/WaveFormat :
ChunkID - "RIFF"
ChunkSize
Format - "WAVE"
Subchunk1ID - "fmt "
Subchunk1Size
AudioFormat
NumChannels
SampleRate
ByteRate
BlockAlign
BitsPerSample
Subchunk2ID - "data"
Subchunk2Size
data
This line of yours:
int size_buffer = (Subchunk2Size / (NumOfChan * bitsPerSample / 8));
calculates the number of samples in a single channel, or the number of blocks, where a block is a structure that contains one sample for each channel. If you use that to allocate memory for the bytes of the data chunk, it will only be enough in the case of 8-bit mono audio.
If allocating memory for bytes is really what you want, then simply use Subchunk2Size as the size.
If you want to allocate memory for samples, then it will differ depending on whether the audio is 8-bit or 16-bit (I'm ignoring other possibilities). For 8-bit:
const uint32_t num_of_samples = Subchunk2Size / (BitsPerSample / 8);
uint8_t *samples = new uint8_t[num_of_samples];
and for 16-bit:
const uint32_t num_of_samples = Subchunk2Size / (BitsPerSample / 8);
int16_t *samples = new int16_t[num_of_samples];
Personally, I'd rather use std::vector instead of C-style arrays:
const uint32_t num_of_samples = Subchunk2Size / (BitsPerSample / 8);
std::vector<int16_t> samples;
samples.resize(num_of_samples); // could be done in the constructor, but I am afraid of vector constructors ;-)
I also assume here that the audio is in the most popular encoding (I think), i.e., unsigned for 8-bit and signed for 16-bit. I'm also ignoring the issue of endianness.
But back to the number of seconds. We can calculate that using the total number of blocks and SampleRate. SampleRate tells us how many samples (in a single channel) there are per second, or in other words, how many blocks there are per second. So the number of seconds is:
const double num_of_seconds = 1.0 * num_of_blocks / SampleRate;
You can calculate the number of blocks using the formula from your first line:
const uint32_t num_of_blocks = Subchunk2Size / (NumChannels * BitsPerSample / 8);
or, as we already have num_of_samples, which is the total number of samples from all channels, we can just divide that by NumChannels:
const uint32_t num_of_blocks = num_of_samples / NumChannels;
And lastly, in case all you wanted was really just to get the number of seconds from the number of bytes, then there are 2 options. You can calculate the block size:
const int block_size = NumChannels * BitsPerSample / 8;
which should be essentially the same as BlockAlign, and then divide Subchunk2Size by it, to get the number of blocks, and again by SampleRate to get the number of seconds:
const double num_of_seconds = 1.0 * Subchunk2Size / block_size / SampleRate;
// or
const double num_of_seconds = 1.0 * Subchunk2Size / BlockAlign / SampleRate;
Or you can use ByteRate, which is the number of bytes per second:
const double num_of_seconds = 1.0 * Subchunk2Size / ByteRate;
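Putting the pieces together, a minimal sketch (the header values here are hypothetical examples; in practice you would parse them from the file):

#include <cstdint>
#include <cstdio>

int main() {
    // Example header values (hypothetical; read these from the WAV header in practice).
    uint32_t Subchunk2Size = 176400; // size of the data chunk, in bytes
    uint16_t NumChannels   = 2;
    uint16_t BitsPerSample = 16;
    uint32_t SampleRate    = 44100;

    const uint32_t block_size     = NumChannels * BitsPerSample / 8; // == BlockAlign
    const uint32_t num_of_blocks  = Subchunk2Size / block_size;
    const double   num_of_seconds = 1.0 * num_of_blocks / SampleRate;

    std::printf("%.3f seconds\n", num_of_seconds); // prints 1.000 for these values
    return 0;
}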
I've been trying to create a generalized Gradient Noise generator (which doesn't use the hash method to get gradients). The code is below:
#include <array>
#include <cmath>
#include <cstdint>
#include <random>
#include <glm/glm.hpp>

class GradientNoise {
    std::uint64_t m_seed;
    // std::uniform_int_distribution is not defined for 8-bit types, so use int.
    std::uniform_int_distribution<int> distribution;
    const std::array<glm::vec2, 4> vector_choice = {glm::vec2(1.0, 1.0), glm::vec2(-1.0, 1.0), glm::vec2(1.0, -1.0),
                                                    glm::vec2(-1.0, -1.0)};
public:
    GradientNoise(uint64_t seed) {
        m_seed = seed;
        distribution = std::uniform_int_distribution<int>(0, 3);
    }

    // 0 -> 1
    // just passes the value through; originally was the Perlin noise activation
    double nonLinearActivationFunction(double value) {
        //return value * value * value * (value * (value * 6.0 - 15.0) + 10.0);
        return value;
    }

    // 0 -> 1
    // cosine interpolation
    double interpolate(double a, double b, double t) {
        double mu2 = (1 - cos(t * M_PI)) / 2;
        return (a * (1 - mu2) + b * mu2);
    }

    double noise(double x, double y) {
        std::mt19937_64 rng;
        // first get the bottom left corner associated with these coordinates
        int corner_x = std::floor(x);
        int corner_y = std::floor(y);
        // then get the respective distance from that corner
        double dist_x = x - corner_x;
        double dist_y = y - corner_y;
        double corner_0_contrib; // bottom left
        double corner_1_contrib; // top left
        double corner_2_contrib; // top right
        double corner_3_contrib; // bottom right
        std::uint64_t s1 = ((std::uint64_t(corner_x) << 32) + std::uint64_t(corner_y) + m_seed);
        std::uint64_t s2 = ((std::uint64_t(corner_x) << 32) + std::uint64_t(corner_y + 1) + m_seed);
        std::uint64_t s3 = ((std::uint64_t(corner_x + 1) << 32) + std::uint64_t(corner_y + 1) + m_seed);
        std::uint64_t s4 = ((std::uint64_t(corner_x + 1) << 32) + std::uint64_t(corner_y) + m_seed);
        // each xy pair turns into a distance vector from the respective corner;
        // corner zero is our starting corner (bottom left)
        rng.seed(s1);
        corner_0_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x, dist_y});
        rng.seed(s2);
        corner_1_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x, dist_y - 1});
        rng.seed(s3);
        corner_2_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x - 1, dist_y - 1});
        rng.seed(s4);
        corner_3_contrib = glm::dot(vector_choice[distribution(rng)], {dist_x - 1, dist_y});
        double u = nonLinearActivationFunction(dist_x);
        double v = nonLinearActivationFunction(dist_y);
        double x_bottom = interpolate(corner_0_contrib, corner_3_contrib, u);
        double x_top = interpolate(corner_1_contrib, corner_2_contrib, u);
        double total_xy = interpolate(x_bottom, x_top, v);
        return total_xy;
    }
};
I then generate an OpenGL texture to display it like this:
int width = 1024;
int height = 1024;
// temp_1 is a GradientNoise instance constructed elsewhere, e.g. GradientNoise temp_1(seed);
unsigned char *temp_texture = new unsigned char[width * height * 4];
double octaves[5] = {2, 4, 8, 16, 32};
for (int i = 0; i < height; i++) {
    for (int j = 0; j < width; j++) {
        double d_noise = 0;
        d_noise += temp_1.noise(j / octaves[0], i / octaves[0]);
        d_noise += temp_1.noise(j / octaves[1], i / octaves[1]);
        d_noise += temp_1.noise(j / octaves[2], i / octaves[2]);
        d_noise += temp_1.noise(j / octaves[3], i / octaves[3]);
        d_noise += temp_1.noise(j / octaves[4], i / octaves[4]);
        d_noise /= 5;
        // map roughly [-1, 1] to [0, 255]
        uint8_t noise = static_cast<uint8_t>((d_noise * 128.0) + 128.0);
        temp_texture[j * 4 + (i * width * 4) + 0] = noise;
        temp_texture[j * 4 + (i * width * 4) + 1] = noise;
        temp_texture[j * 4 + (i * width * 4) + 2] = noise;
        temp_texture[j * 4 + (i * width * 4) + 3] = 255;
    }
}
Which gives good results:
But gprof is telling me that the Mersenne Twister is taking up 62.4% of my time, and growing with larger textures. Nothing else individually takes anywhere near as much time. While the Mersenne Twister is fast after initialization, the fact that I initialize it every time I use it seems to make it pretty slow.
This initialization is 100% required to make sure that the same x and y generate the same gradient at each integer point (so you need either a hash function or to seed the RNG each time).
I attempted to change the PRNG to both the linear congruential generator and Xorshiftplus, and while both ran orders of magnitude faster, they gave odd results:
LCG (one time, then running 5 times before using)
Xorshiftplus
After one iteration
After 10,000 iterations.
I've tried:
Running the generator several times before utilizing the output; this results in slow execution or simply different artifacts.
Using the output of two consecutive runs after the initial seed to seed the PRNG again and using the value afterwards. No difference in result.
What is happening? What can I do to get faster results that are of the same quality as the Mersenne Twister?
OK BIG UPDATE:
I don't know why this works; I know it has something to do with the prime number utilized, but after messing around a bit, it appears that the following works:
Step 1: incorporate the x and y values as seeds separately (and incorporate some other offset value or additional seed value with them; this number should be a prime/non-trivial factor).
Step 2: use those two seed results to seed the generator again, feeding them back into the function (so, like geza said, the seeds made were bad).
Step 3: when getting the result, instead of taking it modulo the number of items (4) you are trying to get, or & 3, take the result modulo a prime number first and then apply & 3. I'm not sure if the prime being a Mersenne prime matters or not. My reading of these steps is sketched below.
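As a sketch only, here is my reading of those three steps in code; the helper names are mine, the single-state xorshift is a stand-in for the generator, and PRIME = 257 as in the result below. I can't claim these are the exact constants or ordering the update used:

#include <cstdint>

// Stand-in single-state xorshift (not necessarily the exact generator used above).
static uint64_t xorshift64(uint64_t s) {
    s ^= s << 13;
    s ^= s >> 7;
    s ^= s << 17;
    return s;
}

uint32_t gradient_index(int x, int y, uint64_t seed) {
    const uint64_t PRIME = 257;
    // Step 1: x and y incorporated as seeds separately, with an extra offset.
    uint64_t sx = xorshift64(uint64_t(x) + seed + PRIME);
    uint64_t sy = xorshift64(uint64_t(y) + seed + PRIME);
    // Step 2: feed those two results back in as a combined seed.
    uint64_t mixed = xorshift64(sx ^ (sy << 1));
    // Step 3: modulo a prime first, then & 3 to pick one of the 4 gradients.
    return uint32_t(mixed % PRIME) & 3u;
}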
Here is the result with prime = 257 and Xorshiftplus being used! (Note: I used 2048 by 2048 for this one; the others were 256 by 256.)
LCG is known to be inadequate for your purpose.
Xorshift128+'s results are bad, because it needs good seeding. And providing good seeding defeats the whole purpose of using it. I don't recommend this.
However, I recommend using an integer hash. For example, one from Bob's page.
Here's a result of the first hash of that page, it looks OK to me, and it is fast (I think it is much faster than Mersenne Twister):
Here's the code I've written to generate this:
#include <cmath>
#include <stdio.h>

unsigned int hash(unsigned int a) {
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}

unsigned int ivalue(int x, int y) {
    return hash(y << 16 | x) & 0xff;
}

float smooth(float x) {
    return 6*x*x*x*x*x - 15*x*x*x*x + 10*x*x*x;
}

float value(float x, float y) {
    int ix = floor(x);
    int iy = floor(y);
    float fx = smooth(x - ix);
    float fy = smooth(y - iy);
    int v00 = ivalue(iy + 0, ix + 0);
    int v01 = ivalue(iy + 0, ix + 1);
    int v10 = ivalue(iy + 1, ix + 0);
    int v11 = ivalue(iy + 1, ix + 1);
    float v0 = v00 * (1 - fx) + v01 * fx;
    float v1 = v10 * (1 - fx) + v11 * fx;
    return v0 * (1 - fy) + v1 * fy;
}

unsigned char pic[1024 * 1024];

int main() {
    for (int y = 0; y < 1024; y++) {
        for (int x = 0; x < 1024; x++) {
            float v = 0;
            for (int o = 0; o <= 9; o++) {
                v += value(x / 64.0f * (1 << o), y / 64.0f * (1 << o)) / (1 << o);
            }
            int r = rint(v * 0.5f);
            pic[y * 1024 + x] = r;
        }
    }
    FILE *f = fopen("x.pnm", "wb");
    fprintf(f, "P5\n1024 1024\n255\n");
    fwrite(pic, 1, 1024 * 1024, f);
    fclose(f);
}
If you want to understand how a hash function works (or better yet, which properties a good hash has), check out Bob's page, for example this one.
You (unknowingly?) implemented a visualization of PRNG non-random patterns. That looks very cool!
Except for the Mersenne Twister, none of your tested PRNGs seem fit for your purpose. As I have not done further tests myself, I can only suggest trying out and measuring further PRNGs.
The randomness of LCGs is known to be sensitive to the choice of their parameters. In particular, the period of an LCG is related to the m parameter: at most it will be m (your prime factor), and for many values it can be less.
Similarly, careful parameter selection is required to get a long period from Xorshift PRNGs.
You've noted that some PRNGs give good procedural generation results while others do not. In order to isolate the cause, I would factor out the procedural generation code and examine the PRNG output directly. An easy way to visualize the data is to build a grayscale image where each pixel value is a (possibly scaled) random value. For image-based work, I find this an easy way to spot things that may lead to visual artifacts. Any artifacts you see with this are likely to cause issues with your procedural generation output.
Another option is to try something like the Diehard tests. If the aforementioned image test failed to reveal any problems, I might use this just to be sure my PRNG techniques were trustworthy.
Note that your code seeds the PRNG, then generates one pseudorandom number from the PRNG. The reason for the nonrandomness in xorshift128+ that you discovered is that xorshift128+ simply adds the two halves of the seed (and uses the result mod 2^64 as the generated number) before changing its state (review its source code). This makes that PRNG considerably different from a hash function.
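For reference, here is next() from Vigna's published xorshift128+ reference implementation. Note that result is formed from the current state, which on the very first call after seeding is the raw seed itself, before the state is advanced:

#include <stdint.h>

uint64_t s[2]; /* 128-bit state; holds the raw seed before the first call */

uint64_t next(void) {
    uint64_t s1 = s[0];
    const uint64_t s0 = s[1];
    const uint64_t result = s0 + s1; /* first output = sum of the two seed halves */
    s[0] = s0;
    s1 ^= s1 << 23;
    s[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5); /* the state only changes here */
    return result;
}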
What you see is a practical demonstration of PRNG quality. The Mersenne Twister is one of the best PRNGs with good performance; it passes the DIEHARD tests. One should know that generating random numbers is not an easy computational task, so looking for better performance will inevitably cost quality. The LCG is known to be one of the simplest and worst PRNGs ever designed, and it clearly shows two-dimensional correlation, as in your picture. The quality of Xorshift generators largely depends on bitness and parameters. They are definitely worse than the Mersenne Twister, but some (xorshift128+) may work well enough to pass the BigCrush battery of TestU01 tests.
In other words, if you are running an important physical-modelling numerical experiment, you had better continue to use the Mersenne Twister, as it is known to be a good trade-off between speed and quality and it comes in many standard libraries. In less critical cases you may try the xorshift128+ generator. For the ultimate results you need a cryptographic-quality PRNG (none of those mentioned here may be used for cryptographic purposes).
I am working on entropy. I am getting consecutive frames from an .mp4 file and I want to compute the entropy of the current frame against the previous frame. If the entropy between them is not zero, it should check the frame; otherwise it should ignore the frame, keep the previous frame, and take the next current frame after 2 sec. If the entropy is zero it should ignore the frame and again wait for 2 sec. Here is my code:
#include <opencv2/opencv.hpp>
using namespace cv;
using namespace std;

VideoCapture capture;
Mat current_frame, previous_frame, pre_img;

capture.open("recog.mp4");
if (!capture.isOpened()) {
    cerr << "can not open camera or video file" << endl;
}
while (1)
{
    capture >> current_frame;
    if (current_frame.empty())
        break;
    if (!previous_frame.empty()) {
        subtract(current_frame, previous_frame, pre_img);
        Mat hist;
        int channels[] = {0};
        int histSize[] = {32};
        float range[] = { 0, 256 };
        const float* ranges[] = { range };
        calcHist( &pre_img, 1, channels, Mat(), // do not use mask
                  hist, 1, histSize, ranges,
                  true, // the histogram is uniform
                  false );
        Mat histNorm = hist / (pre_img.rows * pre_img.cols);
        double entropy = 0.0;
        for (int i = 0; i < histNorm.rows; i++)
        {
            float binEntry = histNorm.at<float>(i, 0);
            if (binEntry != 0.0)
            {
                entropy -= binEntry * log(binEntry);
            }
            else
            {
                // ignore the frame and go for the next, but how do I code that?
                // is there any function for ignoring?
            }
        }
    }
    waitKey(10);
    current_frame.copyTo(previous_frame);
}
This counts the entropy of only one image, the current one, which then becomes the previous image when the next image comes into the process, as far as I can tell. It gives me an error in log2 when I use it like this: entropy -= binEntry * log2(binEntry);. Can you please help me with how to ignore the frame when the entropy is zero, so that the .mp4 continues running? And should I use cvWaitKey(2) to check the .mp4 after 2 sec, meaning the .mp4 is running but I am ignoring the frames?
By "ignore" I mean: when it subtracts the current frame from the previous one and the entropy is 0, the previous frame remains the previous frame; the current frame does not become the previous one. It then waits 2 sec for the next current image and performs the task on that.
To ignore a certain number of frames, simply read them from the stream.
for (int i = 0; i < 60; i++)
    capture >> current_frame;
If your video has 30 fps, this would skip 2 seconds of video.
To act when your entropy is greater than a certain threshold, you need to add something like this:
if (entropy > 1.0)
{
    // do something
}
I used a threshold, because due to noise the entropy probably will never be zero between different frames.
If your compiler does not offer you the log2 function you can simply emulate it as described here.
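(For reference, the usual emulation relies on the change-of-base identity log2(x) = log(x) / log(2); a minimal sketch:)

#include <cmath>

// Fallback for toolchains that lack log2 (pre-C99 / pre-C++11).
double log2_emulated(double x) {
    return std::log(x) / std::log(2.0);
}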
Is there an efficient (fast) algorithm that will perform bit expansion/duplication?
For example, expand each bit in an 8-bit value by 3 (creating a 24-bit value):
1101 0101 => 11111100 01110001 11000111
The brute force method that has been proposed is to create a lookup table. In the future, the expansion value may need to be variable. That is, in the above example we are expanding by 3 but may need to expand by some other value(s). This would require multiple lookup tables that I'd like to avoid if possible.
There is a chance to make it quicker than a lookup table if arithmetic calculations are for some reason faster than memory accesses. This may be possible if the calculations are vectorized (PPC AltiVec or Intel SSE) and/or if other parts of the program need every bit of cache memory.
If expansion factor = 3, only 7 instructions are needed:
out = (((in * 0x101 & 0x0F00F) * 0x11 & 0x0C30C3) * 5 & 0x249249) * 7;
Or another alternative, with 10 instructions:
out = (in | in << 8) & 0x0F00F;
out = (out | out << 4) & 0x0C30C3;
out = (out | out << 2) & 0x249249;
out *= 7;
For other expansion factors >= 3:
unsigned mask = 0x0FF;
unsigned out = in;
unsigned scale, shift;
for (scale = 4; scale != 0; scale /= 2)
{
    shift = scale * (N - 1); // N is the expansion factor
    mask &= ~(mask << scale);
    mask |= mask << (scale * N);
    out = out * ((1 << shift) + 1) & mask;
}
out *= (1 << N) - 1;
Or another alternative, for expansion factors >= 2:
unsigned mask = 0x0FF;
unsigned out = in;
unsigned scale, shift;
for (scale = 4; scale != 0; scale /= 2)
{
    shift = scale * (N - 1);
    mask &= ~(mask << scale);
    mask |= mask << (scale * N);
    out = (out | out << shift) & mask;
}
out *= (1 << N) - 1;
The shift and mask values are best calculated prior to processing the bit stream.
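For illustration, here is that loop wrapped into a self-contained function (my wrapper, not the answer's exact code); with a 32-bit out it handles expansion factors 2 through 4 on an 8-bit input:

#include <cstdint>

// Expands each bit of an 8-bit value N times (N = 2..4), e.g. N = 3 turns
// 1101 0101 into 11111100 01110001 11000111.
uint32_t expand_bits(uint32_t in, unsigned N)
{
    uint32_t mask = 0x0FF;
    uint32_t out = in;
    for (unsigned scale = 4; scale != 0; scale /= 2)
    {
        unsigned shift = scale * (N - 1);
        mask &= ~(mask << scale);
        mask |= mask << (scale * N);
        out = (out | out << shift) & mask;
    }
    return out * ((1u << N) - 1);
}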
You can do it one input bit at a time. Of course, it will be slower than a lookup table, but if you're writing for a tiny 8-bit microcontroller without enough room for a table, it should have the smallest possible ROM footprint.
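A sketch of that bit-at-a-time approach (my code; it assumes an 8-bit input and that the expanded result fits into 32 bits):

#include <cstdint>

// Emits each input bit 'factor' times, MSB first: small and table-free,
// which suits ROM-constrained targets.
uint32_t expand_bitwise(uint8_t in, unsigned factor)
{
    uint32_t out = 0;
    for (int i = 7; i >= 0; --i) {
        uint32_t bit = (in >> i) & 1u;
        for (unsigned k = 0; k < factor; ++k)
            out = (out << 1) | bit; // replicate the current bit
    }
    return out;
}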
I am trying to limit my application's send rate to 900 kbps, but the problem is that the protocol I use is message-oriented and the messages have very different sizes. I can have messages from 40 bytes all the way up to 125000 bytes, and all messages are sent as atomic units.
I tried implementing a token bucket buffer, but if I set a low bucket size the big packets never get sent, and a larger bucket results in large bursts with no rate limiting at all.
This is my small implementation in C:
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

typedef struct token_buffer {
    size_t capacity;
    size_t tokens;
    double rate;
    uint64_t timestamp;
} token_buffer;

static uint64_t time_now()
{
    struct timeval ts;
    gettimeofday(&ts, NULL);
    // cast before multiplying to avoid 32-bit overflow
    return (uint64_t)ts.tv_sec * 1000 + ts.tv_usec / 1000;
}

static void token_buffer_init(token_buffer *tbf, size_t max_burst, double rate)
{
    tbf->capacity = max_burst;
    tbf->tokens = max_burst;
    tbf->rate = rate;
    tbf->timestamp = time_now();
}

static int token_buffer_consume(token_buffer *tbf, size_t bytes)
{
    // Update the tokens
    uint64_t now = time_now();
    size_t delta = (size_t)(tbf->rate * (now - tbf->timestamp));
    tbf->tokens = (tbf->capacity < tbf->tokens + delta) ? tbf->capacity : tbf->tokens + delta;
    tbf->timestamp = now;
    fprintf(stdout, "TOKENS %zu bytes: %zu\n", tbf->tokens, bytes);
    if (bytes <= tbf->tokens) {
        tbf->tokens -= bytes;
    } else {
        return -1;
    }
    return 0;
}
Then somewhere in main():
while (1) {
    len = read_msg(&msg, file);
    // Loop until we have enough tokens.
    // If len is larger than the bucket capacity the loop never ends.
    // If the capacity is too large then no rate limiting occurs.
    while (token_buffer_consume(&tbf, len) != 0) {}
    send_to_net(&msg, len);
}
You are limiting your maximum message size by max_burst (which gets assigned to tbf->capacity at the beginning): since tbf->tokens never increments beyond that value, bigger messages will never get sent, due to this check:
if (bytes <= tbf->tokens) {
    tbf->tokens -= bytes;
} else {
    return -1;
}
So the code indeed sets a hard limit on the burst at max_burst; you should fragment your messages if you want to keep this burst size.
Assuming this is the only place in the code where you can insert the limiter, you might get a better result if you replace the above piece with:
if (tbf->tokens > 0) {
    tbf->tokens -= bytes;
} else {
    return -1;
}
The semantics will be slightly different, but on average over a long period of time it should get you approximately the rate you are looking for. Of course, if you send 125K in one message over a 1 Gbps link, one can hardly talk about a 900 kbps rate: it will be a full 1 Gbps burst of packets, and they will need to be queued somewhere in case there are lower-speed links, so be prepared to lose some of the packets in that case.
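One caveat I would add (my observation, not part of the suggestion above): tokens is declared as size_t, so tbf->tokens -= bytes wraps around when bytes exceeds the remaining tokens. Making the token count signed lets the bucket go into debt for a large message and pay it back afterwards; a sketch, reusing time_now() from the question:

#include <stdint.h>

typedef struct token_buffer {
    int64_t  capacity;
    int64_t  tokens;    /* signed: may go negative after a big message */
    double   rate;
    uint64_t timestamp;
} token_buffer;

static int token_buffer_consume(token_buffer *tbf, size_t bytes)
{
    uint64_t now = time_now();
    int64_t delta = (int64_t)(tbf->rate * (now - tbf->timestamp));
    int64_t filled = tbf->tokens + delta;
    tbf->tokens = (filled < tbf->capacity) ? filled : tbf->capacity;
    tbf->timestamp = now;
    if (tbf->tokens <= 0)
        return -1;                  /* still paying off an earlier message */
    tbf->tokens -= (int64_t)bytes;  /* may go negative; that is the point */
    return 0;
}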
But depending on your application and the transport protocol you are using (TCP/UDP/SCTP/...?), you might want to move the shaping code down the stack, because packets on the network are typically at most 1500 bytes anyway (including the various network/transport protocol headers).
One thing which might be interesting for testing is http://www.linuxfoundation.org/en/Net:Netem - if your objective is to tackle smaller-capacity links. Or grab a couple of older routers with 1 Mbps serial ports connected back to back.