Algorithm for base-10 numeric display - minimum changes per refresh - algorithm

Quick Summary:
I'm looking for an algorithm to display a four-digit speed signal in such a way that the minimum number of (decimal) digits are changed each time the display is updated.
For example:
Filtered Signal   Display
-------------------------
0000              0000
2345              2000
2345              2300
2345              2340
0190              0340
0190              0190
0190              0190
Details:
I'm working on a project in which I need to display a speed signal (between 0 and 3000 RPM) on a four-digit LCD display. The ideal display solution would be an analog gauge, but I'm stuck with the digital display. The display will be read by a machine operator, and I would like it to be as pleasant to read as possible.
The operator doesn't really care about the exact value of the signal. He will want to know what the value is (to the nearest 10 RPM), and he will want to see it go up and down in response to changes in the operation of the machine. He will not want to see it jumping all over the place.
Here is what I have done so far:
Rounded the number to the nearest 10 RPM so that the last digit always reads 0.
Filtered the signal so that electrical noise and normal sensor fluctuations don't cause the reading to jump around by more than 10 RPM at a time.
Added a +/-10 RPM hysteresis to the signal to avoid the cases where it would wobble between two adjacent values (for example: 990 - 1000).
This has cleaned things up nicely when the signal is steady (about 75% of the time), but I still see a lot of unnecessary variation in the signal when it is moving from one steady state to another. As the signal changes from 100 RPM to 1000 RPM (for example), it passes through a lot of numbers along the way. Since it takes a moment to actually read and comprehend the number, there seems to be little point in hitting all of those intermediate states. I tried simply reducing the update rate of the display, but that did not produce satisfactory results. It made the display "feel" sluggish and jumpy, all at the same time. There would be a noticeable delay before the numbers would change, and then they would move in big leaps (100, 340, 620, 980, 1000).
Proposal:
I would like the display to behave as shown in the example:
The display is updated twice per second
A transition from one steady state to another should not take longer than 2 seconds.
If the input signal is higher than the currently displayed value, the displayed signal should increase, but it should never go higher than the input signal value.
If the input signal is lower than the currently displayed value, the displayed signal should decrease, but it should never go lower than the input signal value.
The minimum number of digits should be changed per update (preferably only one digit)
Higher-order digits should be changed first, so that the difference between the display signal and the input signal is reduced as quickly as possible
Can you come up with, or do you know of an algorithm which will output the "proper" 4-digit decimal number according to the above rules?
The function prototype, in pseudo-code, would look something like this:
int GetDisplayValue(int currentDisplayValue, int inputSignal)
{
//..
}
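One possible reading of the rules above, sketched in Python rather than in the pseudo-code of the prototype. The name get_display_value, the assumption that the displayed value is always a multiple of 10, and the choice to round the input toward the displayed value (down when rising, up when falling) are all assumptions made for this sketch, not part of the original question. It changes the highest-order digit that differs, and only touches further digits when a single change would overshoot the input.

```python
def get_display_value(current_display, input_signal):
    """Move the display one step toward the input, highest-order digit first.

    Assumes current_display is a multiple of 10 in the range 0..9990.
    """
    if input_signal > current_display:
        rising = True
        target = (input_signal // 10) * 10          # round down: never exceed the input
    elif input_signal < current_display:
        rising = False
        target = -((-input_signal) // 10) * 10      # round up: never drop below the input
    else:
        return current_display
    if target == current_display:
        return current_display

    result = current_display
    for place in (1000, 100, 10):
        cur_digit = (result // place) % 10
        tgt_digit = (target // place) % 10
        if cur_digit == tgt_digit:
            continue
        result += (tgt_digit - cur_digit) * place
        overshoot = result > target if rising else result < target
        if not overshoot:
            return result                            # one digit changed, still within bounds
        # Otherwise keep going: lower-order digits must change in the same update.
    return result                                    # equals target at this point


# Replay the example table from the question: filtered signal -> displayed value.
display = 0
trace = []
for signal in (0, 2345, 2345, 2345, 190, 190, 190):
    display = get_display_value(display, signal)
    trace.append(display)
```

With three significant digits and two updates per second, a transition under this sketch takes at most three updates, i.e. 1.5 seconds. The step from 0340 to 0190 changes two digits at once because changing only the hundreds digit would give 0140, which is below the input.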
Sorry for the wall of text. I wanted to document my progress to date so that anyone answering the question would avoid covering ground that I've already been through.

If you do not need the data expressed by the 4th digit, and are strictly bound to a 4 digit display, have you considered using the 4th digit as an increase/decrease indicator? Flash some portion of the top or bottom of the zero at 2Hz* to indicate that the next change of the gauge will be an increase or decrease.
I think you could also do well to make a good model of the response of your device, whatever it is, to adjustments, and use that model to extrapolate the target number based on the first half second of the two second stabilization process.
*This assumes that you have the two updates per second you posited. Most 4-digit displays are multiplexed, so you could probably flash it at a much higher frequency with a little driver tweaking.

I think your proposal to change one digit at a time is strange, because it actually provides the user with misinformation. What I would consider instead is to add many MORE state changes, implemented so that whenever the signal changes, the gauge moves towards the new value in increments of one. This would provide an analog-gauge-like experience and "animation" of the change; the operator would very soon recognize subconsciously that digits rotating in the sequence 0, 1, 2, ... denote increasing speed and 9, 8, 7, ... decreasing speed.
E.g.:
Filtered signal   Display
0000              0000
2345              0001
                  0002
                  ...
                  2345
Hysteresis, which you have implemented, is of course very good for the stable state.
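The counting behaviour described in this answer could be sketched as follows. The function name step_toward and the step parameter are assumptions for illustration; the answer itself only talks about increments of one, which implies a display refresh much faster than twice per second.

```python
def step_toward(display, target, step=1):
    """Return the next displayed value, at most `step` closer to `target`."""
    if display < target:
        return min(display + step, target)   # count up, never past the target
    if display > target:
        return max(display - step, target)   # count down, never past the target
    return display


# Example: animate from 0 to 25 in steps of 10.
frames = []
value = 0
while value != 25:
    value = step_toward(value, 25, step=10)
    frames.append(value)
```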

This is a delicate question, and my answer does not cover the algorithmic aspect.
I believe that the behaviour you show in the table at the beginning of your post is a very bad idea. Lines 2 and 5 display data points that are not, and never were, in the data, i.e. wrong data, for the sake of user experience. This could be a poor choice in the domain of machine operation.
A lower update rate may "feel sluggish" but is well defined (only "real" data, at most n milliseconds old). A faster update rate will display many intermediate values, but the most significant digits shouldn't change too quickly. Both are easier to test than any pretty false-value generation.

This will incorporate more or less slowly the sensor value into the displayed value:
display = ( K * sensor + (256 - K) * display ) >> 8
Choose K between 0 (display never updated) and 256 (display always equal to sensor).
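A minimal Python sketch of the formula above, using the same integer arithmetic. The function name smooth and the sample values are assumptions for illustration. One property worth knowing: because the shift truncates, the display can settle a few counts short of a steady sensor value.

```python
def smooth(display, sensor, k):
    """One update of the fixed-point filter: k/256 of the sensor, the rest of the old display."""
    if not 0 <= k <= 256:
        raise ValueError("k must be between 0 and 256")
    return (k * sensor + (256 - k) * display) >> 8


# Feed a constant sensor value of 1000 with K = 64 (one quarter weight per update).
display = 0
for _ in range(200):
    display = smooth(display, 1000, 64)
settled = display   # stops at 997: the remaining error times 64 is below 256
```

If the last few counts matter, rounding (adding 128 before the shift) or keeping extra fractional bits in the stored value are the usual workarounds.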

Related

Get all possible valid positions of ships in battleship game

I'm creating a probability assistant for the Battleship game. In essence, for a given game state (field state and available ships), it would produce a field in which every free cell has a probability of a hit.
My current approach is a Monte Carlo-like computation: pick a random free cell, a random ship and a random ship rotation, then check whether this placement is valid; if so, continue with the next ship from the available set. If the available set is empty, add the resulting ship layout to the output stack. Repeat this many times and use the outputs to compute the probability of each cell.
Is there a sane algorithm to process all possible ship placements for a given field state?
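For reference, the Monte Carlo procedure described in the question might look roughly like this in Python. This is a sketch under several assumptions that are not in the question: the function names, the 10x10 board, the ship list, that "blocked" cells are known misses, and that ships may touch. It also ignores known hits, and placing ships one after another does not sample all valid layouts with exactly equal probability.

```python
import random


def sample_layout(size, ships, blocked, rng, max_tries=1000):
    """Place every ship at random; return the set of occupied cells, or None on failure."""
    occupied = set()
    for length in ships:
        for _ in range(max_tries):
            if rng.random() < 0.5:   # horizontal placement
                r, c = rng.randrange(size), rng.randrange(size - length + 1)
                cells = {(r, c + i) for i in range(length)}
            else:                    # vertical placement
                r, c = rng.randrange(size - length + 1), rng.randrange(size)
                cells = {(r + i, c) for i in range(length)}
            if not (cells & blocked) and not (cells & occupied):
                occupied |= cells
                break
        else:
            return None              # could not fit this ship
    return occupied


def hit_probabilities(size, ships, blocked, samples=300, seed=0):
    """Estimate, per cell, the fraction of sampled layouts that cover it."""
    rng = random.Random(seed)
    counts = [[0] * size for _ in range(size)]
    accepted = 0
    for _ in range(samples):
        layout = sample_layout(size, ships, blocked, rng)
        if layout is None:
            continue
        accepted += 1
        for r, c in layout:
            counts[r][c] += 1
    if accepted == 0:
        raise ValueError("no valid layout found")
    return [[n / accepted for n in row] for row in counts]


probs = hit_probabilities(10, [5, 4, 3, 3, 2], blocked={(0, 0), (5, 5)}, seed=1)
```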
An exact solution is possible, but it does not qualify as sane in my book.
Still, here is the idea.
There are many variants of the game, but let's say that we start with a worst case scenario of 1 ship of size 5, 2 of size 4, 3 of size 3 and 4 of size 2.
The "discovered state" of the board is all spots where shots have been taken, or ships have been discovered, plus the number of remaining ships. The discovered state naively requires 100 bits for the board (10x10, any can be shot) plus 1 bit for the count of remaining ships of size 5, 2 bits for the remaining ships of size 4, 2 bits for remaining ships of size 3 and 3 bits for remaining ships of size 2. This makes 108 bits, which fits in 14 bytes.
Now conceptually the idea is to figure out the map by shooting each square in turn in the first row, the second row, and so on, and recording the game state along with transitions. We can record the forward transitions and counts to find how many ways there are to get to any state.
Then find the end state of everything finished and all ships used and walk the transitions backwards to find how many ways there are to get from any state to the end state.
Now walk the data structure forward, knowing the probability of arriving at any state while on the way to the end, but this time we can figure out the probability of each way of finding a ship on each square as we go forward. Sum those and we have our probability heatmap.
Is this doable? In memory, no. In a distributed system it might be though.
Remember that I said that recording a state took 14 bytes? Adding a count to that takes another 8 bytes which takes us to 22 bytes. Adding the reverse count takes us to 30 bytes. My back of the envelope estimate is that at any point in our path there are on the order of a half-billion states we might be in with various ships left, killed ships sticking out and so on. That's 15 GB of data. Potentially for each of 100 squares. Which is 1.5 terabytes of data. Which we have to process in 3 passes.
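The back-of-the-envelope arithmetic in this answer can be checked with a few lines of Python. The variable names are chosen for this sketch, and the half-billion states figure is the answer's own estimate, taken as given.

```python
board_bits = 10 * 10                        # one bit per square: shot or not
ship_counts = {5: 1, 4: 2, 3: 3, 2: 4}      # ship size -> number of ships
# Bits needed to count 0..n remaining ships of each size.
counter_bits = sum(n.bit_length() for n in ship_counts.values())

state_bits = board_bits + counter_bits      # 108
state_bytes = (state_bits + 7) // 8         # round up to whole bytes: 14
record_bytes = state_bytes + 8 + 8          # plus forward count and reverse count: 30

states_per_step = 500_000_000               # the answer's estimate, not derived here
bytes_per_step = states_per_step * record_bytes   # 15 GB
total_bytes = bytes_per_step * 100                # 1.5 TB over 100 squares
```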

FMOD frequency analysis/normalisation

I am using the FMOD library to apply FFT to an audio stream, providing me with a constantly updating fixed number of frequency bins. Each bin represents an equal frequency range, with a value between 0 and 1 to represent the intensity of this range from the processed audio. FMOD documentation states that these values can be represented in decibels, where val is the value between 0 and 1:
Decibels = 10.0f * (float)log10(val) * 2.0f
I am attempting to make an automated strobe-like beat detecting visualisation. So far, I test at a constant interval to see whether a particular frequency bin's intensity value surpasses a specified boundary - if this is the case, the strobe flashes. Although a pretty crude way of doing this, it works fairly effectively for my requirements.
However, this specified boundary only works effectively when the system/music player's volumes are maximum. When I reduce either volume, the strobe sensitivity is reduced and becomes either very inaccurate or stops flashing completely. I assume that I need to normalise the data in some way so analysis is performed independent of volume, though by scaling the data by 1/value of largest bin the largest value is always maxed out. This surpasses the specified boundary permanently, causing the strobe to flash indefinitely. I can't think how else this can be achieved and have been on a mental block for days - any help or a point in the right direction would be greatly appreciated!
Normalise over a longer time scale. You'll need something like an envelope follower with a long release time.
If you search for 'compressor' source code, or automatic gain control something will definitely turn up.
But broadly in pseudo C++, and working on your incoming audio (the time-domain signal before the FFT):
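// Run once per incoming sample; peak_level persists between samples (start it at 0.0f).
auto instant_level = std::abs(signal);              // floating-point std::abs needs <cmath>
peak_level *= 0.99f;                                // slow release
peak_level = std::max(peak_level, instant_level);   // instant attack; std::max needs <algorithm>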
Now peak_level decays slowly over time. And you can use this to calculate a gain factor to normalize your incoming audio. Adjust the 0.99f as required for a sensible decay time and for the correct sample rate.
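A small Python sketch of the same envelope follower, extended with the gain step that the answer only mentions. The function name normalise and the floor guard against division by zero are assumptions; the 0.99 decay is the answer's placeholder value and would need tuning to the sample rate.

```python
def normalise(samples, decay=0.99, floor=1e-6):
    """Divide each sample by a slowly decaying peak level (instant attack, slow release)."""
    peak = 0.0
    out = []
    for s in samples:
        peak = max(peak * decay, abs(s))     # follow the loudest recent sample
        out.append(s / max(peak, floor))     # floor avoids dividing by zero on silence
    return out


loud = [0.0, 0.5, -0.8, 0.2, 0.1]
quiet = [0.1 * s for s in loud]              # same audio at a tenth of the volume
loud_out = normalise(loud)
quiet_out = normalise(quiet)
```

Both outputs come out essentially identical, so a fixed threshold applied after this step no longer depends on the playback volume. The catch is that long quiet passages get amplified as the peak decays, which is why the release time matters.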
There's also a Signal Processing stack exchange site where you'll get quicker answers to these kinds of questions (although occasionally with an almost incomprehensible piece of algebra attached :) )

In matlab, speed up cross correlation

I have a long time series with some repeating and similar looking signals in it (not entirely periodical). The length of the time series is about 60000 samples. To identify the signals, I take out one of them, having a length of around 1000 samples and move it along my timeseries data sample by sample, and compute cross-correlation coefficient (in Matlab: corrcoef). If this value is above some threshold, then there is a match.
But this is excruciatingly slow (using 'for loop' to move the window).
Is there a way to speed this up, or maybe there is already some mechanism in Matlab for this ?
Many thanks
Edited: added information, regarding using 'xcorr' instead:
If I use 'xcorr', or at least the way I have used it, I get the wrong picture. Looking at the data (first plot), there are two types of repeating signals. One marked by red rectangles, whereas the other and having much larger amplitudes (this is coherent noise) is marked by a black rectangle. I am interested in the first type. Second plot shows the signal I am looking for, blown up.
If I use 'xcorr', I get the third plot. As you see, 'xcorr' gives me the wrong signal (there is in fact high cross correlation between my signal and coherent noise).
But using 'corrcoef' and moving the window, I get the last plot, which is the correct one.
There may be a normalization problem when using 'xcorr', but I don't know.
I can think of two ways to speed things up.
1) Make your template 1024 elements long. Correlation can then be done using the FFT, which is significantly faster than a direct DFT or element-by-element multiplication at every position.
2) Ask yourself what it is about your template shape that you really care about. Do you really need the very high frequencies, or are you really after lower frequencies? If you could re-sample your template and signal so it no longer contains any frequencies you don't care about, it will make the processing very significantly faster. Steps to take would include
determine the highest frequency you care about
filter your data so higher frequencies are blocked
resample the resulting data at a lower sampling frequency
Now combine that with a template whose size is a power of 2
You might find this link interesting reading.
Let us know if any of the above helps!
Your problem seems like a textbook example of cross-correlation. Therefore, there's no good reason to use any solution other than xcorr. A few technical comments:
xcorr assumes that the mean was removed from the two cross-correlated signals. Furthermore, by default it does not scale by the signals' standard deviations. Both of these issues can be addressed by z-scoring your two signals: c=xcorr(zscore(longSig,1),zscore(shortSig,1)); c=c/n; where n is the length of the shorter signal. This should produce results comparable to your sliding-window method (identical only where a window's standard deviation matches that of the whole long signal, since corrcoef normalises each window separately).
xcorr's output is ordered according to lags, which can be obtained as a second output argument ([c,lags]=xcorr(...)). Always plot xcorr results with plot(lags,c). I recommend trying a synthetic signal to verify that you understand how to interpret this chart.
xcorr's implementation already uses the Discrete Fourier Transform, so unless you have unusual conditions it will be a waste of time to code a frequency-domain cross-correlation again.
Finally, a comment about terminology: correlating corresponding time points between two signals is plain correlation. That's what corrcoef does (its name stands for correlation coefficient, no 'cross-correlation' there). Cross-correlation is the result of shifting one of the signals and calculating the correlation coefficient for each lag.
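To make the difference between the two normalisations concrete, here is a pure-Python sketch of the sliding correlation coefficient, written with running sums so that each window's mean and variance cost O(1). The function name sliding_corrcoef and the toy data are assumptions for illustration; the inner dot product is written directly and is the part an FFT-based implementation would replace.

```python
import math


def sliding_corrcoef(signal, template):
    """Pearson correlation between the template and every same-length window of the signal."""
    m = len(template)
    t_mean = sum(template) / m
    t = [x - t_mean for x in template]           # zero-mean template
    t_norm = math.sqrt(sum(x * x for x in t))

    # Prefix sums give each window's sum and sum of squares in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    out = []
    for i in range(len(signal) - m + 1):
        win_sum = s1[i + m] - s1[i]
        win_var = (s2[i + m] - s2[i]) - win_sum * win_sum / m   # sum of squared deviations
        if win_var <= 1e-12 or t_norm == 0.0:
            out.append(0.0)                      # flat window: correlation undefined, report 0
            continue
        # The template has zero mean, so the window mean drops out of the numerator.
        num = sum(t[j] * signal[i + j] for j in range(m))
        out.append(num / (t_norm * math.sqrt(win_var)))
    return out


template = [0.0, 1.0, 0.0, -1.0, 0.0]
# The template, scaled by 3 and offset by 2, hidden at position 5.
signal = [0.3] * 5 + [2.0 + 3.0 * x for x in template] + [0.3] * 5
r = sliding_corrcoef(signal, template)
best = max(range(len(r)), key=lambda i: r[i])
```

Because each window is normalised on its own, the scaled and offset copy still scores 1 at position 5, regardless of how large other parts of the signal are.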

How can I detect these audio abnormalities?

iOS has an issue recording through some USB audio devices. It cannot be reliably reproduced (happens every 1 in ~2000-3000 records in batches and silently disappears), and we currently manually check our audio for any recording issues. It results in small numbers of samples (1-20) being shifted by a small number that sounds like a sort of 'crackle'.
They look like this:
closer:
closer:
another, single sample error elsewhere in the same audio file:
The question is, how can these be detected algorithmically (assuming direct access to samples) whilst not triggering false positives on high frequency audio with waveforms like this:
Bonus points: after determining as many errors as possible, how can the audio be 'fixed'?
Dirty audio file - pictured
Another dirty audio file
Clean audio with valid high frequency - pictured
More bonus points: what could be causing this issue in the iOS USB audio drivers/hardware (assuming it is there).
I do not think there is an out of the box solution to find the disturbances, but here is one (non standard) way of tackling the problem. Using this, I could find most intervals and I only got a small number of false positives, but the algorithm could certainly use some fine tuning.
My idea is to find the start and end point of the deviating samples. The first step should be to make these points stand out more clearly. This can be done by taking the logarithm of the data and taking the differences between consecutive values.
In MATLAB I load the data (in this example I use dirty-sample-other.wav)
y1 = wavread('dirty-sample-pictured.wav');
y2 = wavread('dirty-sample-other.wav');
y3 = wavread('clean-highfreq.wav');
data = y2;
and use the following code:
logdata = log(1+data);
difflogdata = diff(logdata);
So instead of this plot of the original data:
we get:
where the intervals we are looking for stand out as a positive and negative spike. For example zooming in on the largest positive value in the plot of logarithm differences we get the following two figures. One for the original data:
and one for the difference of logarithms:
This plot could help with finding the areas manually but ideally we want to find them using an algorithm. The way I did this was to take a moving window of size 6, computing the mean value of the window (of all points except the minimum value), and compare this to the maximum value. If the maximum point is the only point that is above the mean value and at least twice as large as the mean it is counted as a positive extreme value.
I then used a threshold of counts, at least half of the windows moving over the value should detect it as an extreme value in order for it to be accepted.
Multiplying all points with (-1) this algorithm is then run again to detect the minimum values.
Marking the positive extremes with "o" and negative extremes with "*" we get the following two plots. One for the differences of logarithms:
and one for the original data:
Zooming in on the left part of the figure showing the logarithmic differences we can see that most extreme values are found:
It seems like most intervals are found and there are only a small number of false positives. For example running the algorithm on 'clean-highfreq.wav' I only find one positive and one negative extreme value.
Single values that are falsely classified as extreme values could perhaps be weeded out by matching start and end-points. And if you want to replace the lost data you could use some kind of interpolation using the surrounding data-points, perhaps even a linear interpolation will be good enough.
Here is the MATLAB-code I used:
function test20()
    clc
    clear all
    y1 = wavread('dirty-sample-pictured.wav');
    y2 = wavread('dirty-sample-other.wav');
    y3 = wavread('clean-highfreq.wav');
    data = y2;
    logdata = log(1+data);
    difflogdata = diff(logdata);
    figure,plot(data),hold on,plot(data,'.')
    figure,plot(difflogdata),hold on,plot(difflogdata,'.')
    figure,plot(data),hold on,plot(data,'.'),xlim([68000,68200])
    figure,plot(difflogdata),hold on,plot(difflogdata,'.'),xlim([68000,68200])
    k = 6;
    myData = difflogdata;
    myPoints = findPoints(myData,k);
    myData2 = -difflogdata;
    myPoints2 = findPoints(myData2,k);
    figure
    plotterFunction(difflogdata,myPoints>=k,'or')
    hold on
    plotterFunction(difflogdata,myPoints2>=k,'*r')
    figure
    plotterFunction(data,myPoints>=k,'or')
    hold on
    plotterFunction(data,myPoints2>=k,'*r')
end

function myPoints = findPoints(myData,k)
    iterationVector = k+1:length(myData);
    myPoints = zeros(size(myData));
    for i = iterationVector
        subVector = myData(i-k:i);
        meanSubVector = mean(subVector(subVector>min(subVector)));
        [maxSubVector, maxIndex] = max(subVector);
        if (sum(subVector>meanSubVector) == 1 && maxSubVector>2*meanSubVector)
            myPoints(i-k-1+maxIndex) = myPoints(i-k-1+maxIndex) +1;
        end
    end
end

function plotterFunction(allPoints,extremeIndices,markerType)
    extremePoints = NaN(size(allPoints));
    extremePoints(extremeIndices) = allPoints(extremeIndices);
    plot(extremePoints,markerType,'MarkerSize',15),
    hold on
    plot(allPoints,'.')
    plot(allPoints)
end
Edit - comments on recovering the original data
Here is a slightly zoomed out view of figure three above: (the disturbance is between 6.8 and 6.82)
When I examine the values, your theory about the data being mirrored to negative values does not seem to fit the pattern exactly. But in any case, my thought about just removing the differences is certainly not correct. Since the surrounding points do not seem to be altered by the disturbance, I would probably go back to the original idea of not trusting the points within the affected region and instead using some sort of interpolation using the surrounding data. It seems like a simple linear interpolation would be a quite good approximation in most cases.
To answer the question of why it happens -
A USB audio device and host are not clock synchronous - that is to say that the host cannot accurately recover the relationship between the host's local clock and the word-clock of the ADC/DAC on the audio interface. Various techniques do exist for clock-recovery with various degrees of effectiveness. To add to the problem, the bus clock is likely to be unrelated to either of the two audio clocks.
Whilst you might imagine this not to be too much of a concern for audio receive (audio capture callbacks could simply happen whenever there is data), audio interfaces are usually bi-directional and the host will be rendering audio at a regular interval, which the other end is potentially consuming at a slightly different rate.
In-between are several sets of buffers, which can over- or under-run, which is what looks to be happening here; the interval between it happening certainly seems about right.
You might find that changing USB audio device to one built around a different chip-set (or, simply a different local oscillator) helps.
As an aside, both IEEE1394 audio and MPEG transport streams have the same clock recovery requirement. Both of them solve the problem by embedding a local clock reference packet into the serial bitstream in a very predictable way, which allows accurate clock recovery on the other end.
I think the following algorithm can be applied to samples in order to determine a potential false positive:
First, scan for a high amount of high-frequency content, either by FFT'ing the sound block by block (256 values maybe), or by counting the consecutive samples above and below zero. The latter should keep track of the maximum run of consecutive samples above zero, the maximum run below zero, the number of small transitions around zero and the current volume of the block (0..1, as Audacity displays it). Then treat the block as high-frequency if the maximum run is below 5 (sampling at 44100 Hz, with zeroes being consecutive while outstanding samples are single, 5 corresponds to a frequency of 4410 Hz, which is pretty high), or if the sum of the small transitions' lengths is above a certain value depending on the maximum run. (I believe a first approximation would be 3*5*block size/distance between two maximums, which roughly equates to the period of the loudest FFT frequency.) It should also be measured both above and below the threshold, as we can end up with an erroneous peak, which will likely be detected by a difference between the main tempo measured on below-zero and above-zero maximums, and also by the std-dev of the peaks. If high frequency is dominant, this block is eligible only for zero-value testing, and special means to repair the data will be needed. If high frequency is significant but a dominant low frequency is also detected, we can search for peaks bigger than 3.0*high frequency volume, as well as abnormal zeroes in this block.
Also, your errors seem to be either large excursions or plain zeroes, with the large excursions being single-sample errors and the zero errors ranging from 1 to 20 samples. So, if there is a run of values under 0.02 absolute value which is directly surrounded by values of 0.15 (a variable to be fine-tuned) or higher absolute value AND of the same sign, count this run as an error. Single values that stand out can be detected by calculating 2.0*(current sample)-(previous sample)-(next sample); if it's above a certain threshold (0.1+high frequency volume, or 3.0*high frequency volume, whichever is bigger), count this as an error and average.
What to do with the zero gaps found: we can copy values from 1 period backwards and 1 period forwards (averaging), where "period" is that of the most significant frequency of the FFT of the block. If the "period" is smaller than the gap (say we've detected a gap of zeroes in a high-pitched part of the sound), use two or more periods, so the source data will all be valid (in this case, no averaging can be done, as it's possible that the signal 2 periods forward from the gap and 2 periods back will be in counterphase). If there is more than one frequency of about equal amplitude, we can simply synthesise these with the correct phases, cutting the rest of the less significant frequencies altogether.
The outstanding sample should IMO just be replaced by the average of 2-4 surrounding samples, as there only ever seems to be a single such sample at a time in your sound files.
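The two simpler tests from this answer (zero gaps and single outstanding samples) could be sketched in Python as below. The function names are hypothetical, the thresholds 0.02, 0.15 and 0.1 and the 20-sample limit are taken from the answer, and the high-frequency volume term is left out. Taking the absolute value of the second difference and keeping only local maxima of it are additions of this sketch, so that the neighbours of a spike are not flagged as well.

```python
def find_zero_gaps(samples, quiet=0.02, loud=0.15, max_len=20):
    """Return (first, last) index pairs of near-zero runs flanked by loud same-sign samples."""
    gaps = []
    n = len(samples)
    i = 1
    while i < n - 1:
        if abs(samples[i]) >= quiet:
            i += 1
            continue
        j = i
        while j < n and abs(samples[j]) < quiet:
            j += 1                                  # the run covers i .. j-1
        if j < n and j - i <= max_len:
            before, after = samples[i - 1], samples[j]
            if abs(before) >= loud and abs(after) >= loud and before * after > 0:
                gaps.append((i, j - 1))
        i = j
    return gaps


def find_spikes(samples, threshold=0.1):
    """Flag single samples whose second difference is large and locally the largest."""
    n = len(samples)
    curv = [0.0] * n
    for i in range(1, n - 1):
        curv[i] = abs(2.0 * samples[i] - samples[i - 1] - samples[i + 1])
    return [i for i in range(1, n - 1)
            if curv[i] > threshold and curv[i] >= curv[i - 1] and curv[i] >= curv[i + 1]]


def average_neighbours(samples, indices):
    """Replace each flagged sample with the mean of its two neighbours."""
    fixed = list(samples)
    for i in indices:
        fixed[i] = (samples[i - 1] + samples[i + 1]) / 2.0
    return fixed


tone = [0.5] * 30
tone[10:15] = [0.0] * 5          # a five-sample dropout
gaps = find_zero_gaps(tone)

flat = [0.2] * 30
flat[15] = 0.9                   # a single outstanding sample
spikes = find_spikes(flat)
repaired = average_neighbours(flat, spikes)
```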
The discrete wavelet transform (DWT) may be the solution to your problem.
An FFT calculation is not very useful in your case since it's an average representation of relative frequency content over the entire duration of the signal, which makes it impossible to detect momentary changes. The discrete short-time Fourier transform (STFT) tries to tackle this by computing the DFT for short consecutive time-blocks of the signal, the length of which is determined by the length (and shape) of a window, but since the resolution of the DFT is dependent on the data/block-length, there is a trade-off between resolution in frequency OR in time, and finding this magical fixed window-size can be tricky!
What you want is a time-frequency analysis method with good time resolution for high-frequency events, and good frequency resolution for low-frequency events... Enter the discrete wavelet transform!
There are numerous wavelet transforms for different applications and, as you might expect, it's computationally heavy. The DWT may not be a practical solution to your problem, but it's worth considering. Good luck with your problem. Some Friday-evening reading:
http://klapetek.cz/wdwt.html
http://etd.lib.fsu.edu/theses/available/etd-11242003-185039/unrestricted/09_ds_chapter2.pdf
http://en.wikipedia.org/wiki/Wavelet_transform
http://en.wikipedia.org/wiki/Discrete_wavelet_transform
You can try the following super-simple approach (maybe it's enough):
Take each point in your wave-form and subtract its predecessor (look at the changes from one point to the next).
Look at the distribution of these changes and find their standard deviation.
If any given difference is beyond X times this standard deviation (either above or below), flag it as a problem.
Determine the best value for X by playing with it and seeing how well it performs.
Most "problems" should come as a pair of two differences beyond your cutoff, one going up, and one going back down.
To stick with the super-simple approach, you can then fix the data by just interpolating linearly between the last good point before your problem-section and the first good point after. (Make sure you don't just delete the points as this will influence (raise) the pitch of your audio.)
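The steps above can be sketched in a few lines of Python. The function names, the synthetic test signal and the value X = 4 are assumptions for illustration; as the answer says, X has to be tuned on real recordings.

```python
import math
import statistics


def find_glitches(samples, x=4.0):
    """Return indices i where the step from sample i to i+1 is beyond x standard deviations."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    sigma = statistics.pstdev(diffs)
    if sigma == 0.0:
        return []
    mean = statistics.fmean(diffs)
    return [i for i, d in enumerate(diffs) if abs(d - mean) > x * sigma]


def repair(samples, first_bad, last_bad):
    """Linearly interpolate first_bad..last_bad between the good samples on either side.

    Assumes 0 < first_bad <= last_bad < len(samples) - 1.
    """
    left, right = first_bad - 1, last_bad + 1
    fixed = list(samples)
    for i in range(first_bad, last_bad + 1):
        frac = (i - left) / (right - left)
        fixed[i] = samples[left] + frac * (samples[right] - samples[left])
    return fixed


# Synthetic example: a slow sine wave with one corrupted sample at index 40.
clean = [math.sin(2.0 * math.pi * i / 50.0) for i in range(100)]
dirty = list(clean)
dirty[40] += 0.8
flagged = find_glitches(dirty)            # the jump up into sample 40 and back down
fixed = repair(dirty, 40, 40)
```

The flagged pair (one step up, one step down) brackets the bad sample, matching the answer's remark that problems show up as two differences beyond the cutoff. The sample count is unchanged by the repair, so the pitch is not affected.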

About random number sequence generation

I am new to randomized algorithms and am learning them on my own by reading books. I am reading Data Structures and Algorithm Analysis by Mark Allen Weiss.
Suppose we only need to flip a coin; thus, we must generate a 0 or 1
randomly. One way to do this is to examine the system clock. The clock
might record time as an integer that counts the number of seconds
since January 1, 1970 (atleast on Unix System). We could then use the
lowest bit. The problem is that this does not work well if a sequence
of random numbers is needed. One second is a long time, and the clock
might not change at all while the program is running. Even if the time
were recorded in units of microseconds, if the program were running by
itself the sequence of numbers that would be generated would be far
from random, since the time between calls to the generator would be
essentially identical on every program invocation. We see, then, that
what is really needed is a sequence of random numbers. These numbers
should appear independent. If a coin is flipped and heads appears,
the next coin flip should still be equally likely to come up heads or
tails.
My questions on the above text snippet are as follows.
The author says that using the lowest bit of a clock that counts seconds does not work because "one second is a long time, and the clock might not change at all". Why is one second a long time, given that the clock changes every second, and in what context does the author mean that the clock does not change? Please help me understand with a simple example.
Why does the author say that even with microseconds we don't get a sequence of random numbers?
Thanks!
Programs using random (or in this case pseudo-random) numbers usually need plenty of them in a short time. That's one reason why simply using the clock doesn't really work: the system clock doesn't update as fast as your code is requesting new numbers, so you're quite likely to get the same results over and over again until the clock changes. It's probably more noticeable on Unix systems where the usual method of getting the time only gives you one-second resolution. And not even microseconds really help, as computers are way faster than that by now.
The second problem you want to avoid is linear dependency of pseudo-random values. Imagine you want to place a number of dots in a square, randomly. You'll pick an x and a y coordinate. If your pseudo-random values are a simple linear sequence (like what you'd obtain naïvely from a clock) you'd get a diagonal line with many points clumped together in the same place. That doesn't really work.
One of the simplest types of pseudo-random number generators, the Linear Congruential Generator, has a similar problem, even though it's not so readily apparent at first sight. Due to the very simple formula
X(n+1) = (a * X(n) + c) mod m
you'll still get quite predictable results, albeit only if you pick points in 3D space, as all numbers lie on a number of distinct planes (a problem all pseudo-random generators exhibit at a certain dimension).
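A short Python sketch of a linear congruential generator that makes the plane structure checkable. The parameters used are those of RANDU (a = 65539, c = 0, m = 2^31), a well-known historical generator chosen here for illustration; the answer does not name a specific generator. For RANDU, every three consecutive outputs satisfy x2 = 6*x1 - 9*x0 (mod m), which is why triples plotted in 3D fall onto a small number of planes.

```python
from itertools import islice


def lcg(seed, a, c, m):
    """Yield the sequence X(n+1) = (a * X(n) + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


M = 2 ** 31
values = list(islice(lcg(1, 65539, 0, M), 1000))

# Every consecutive triple obeys the same linear relation modulo M.
on_planes = all(
    (x2 - 6 * x1 + 9 * x0) % M == 0
    for x0, x1, x2 in zip(values, values[1:], values[2:])
)
```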
Computers are fast. I'm over simplifying, but if your clock speed is measured in GHz, it can do billions of operations in 1 second. Relatively speaking, 1 second is an eternity, so it is possible it does not change.
If your program is doing regular operation, it is not guaranteed to sample the clock at a random time. Therefore, you don't get a random number.
Don't forget that for a computer, a single second can be 'an eternity'. Programs / algorithms are often executed in a matter of milliseconds. (1000ths of a second. )
The following pseudocode:
for(int i = 0; i < 1000; i++)
n = rand(0, 1000)
fills n a thousand times with a random number between 0 and 1000. On a typical machine, this script executes almost immediately.
Typically, though, you initialize the seed only once, at the beginning. The following pseudocode:
srand(time());
for(int i = 0; i < 1000; i++)
n = rand(0, 1000)
initializes the seed once and then executes the code, generating a seemingly random set of numbers. The problem arises when you execute the code multiple times. Let's say the code executes in 3 milliseconds, and then executes again in 3 milliseconds, with both runs falling in the same second. The result is then the same set of numbers.
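The effect is easy to reproduce in Python. In this sketch the timestamps are made-up values standing in for time(), and the function name run is hypothetical; truncating to whole seconds mimics a clock with one-second resolution.

```python
import random


def run(start_time):
    """Simulate one program run that seeds its generator from a whole-second clock."""
    rng = random.Random(int(start_time))      # int() drops the fractional second
    return [rng.randint(0, 1000) for _ in range(5)]


first = run(1_700_000_000.100)    # two runs started within the same second...
second = run(1_700_000_000.900)
later = run(1_700_000_001.100)    # ...and one started in the next second
```

The first two runs produce the same five numbers because they share a seed; only the run that starts after the clock has ticked gets a different seed.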
For the second point: the author probably assumes a FAST computer. The above problem still holds...
What he means is that you cannot control how fast your computer, or any other computer, runs your code. So assuming a full second between calls is far from realistic. If you run the code yourself, you will see that it executes in milliseconds, so even a finer clock is not enough to ensure you get random numbers!
