What is the idea behind scaling an image using Lanczos? - algorithm

I'm interested in image scaling algorithms and have implemented the bilinear and bicubic methods. However, I have heard of the Lanczos and other more sophisticated methods for even higher quality image scaling, and I am very curious how they work.
Could someone here explain the basic idea behind scaling an image using Lanczos (both upscaling and downscaling) and why it results in higher quality?
I do have a background in Fourier analysis and have done some signal processing stuff in the past, but not with relation to image processing, so don't be afraid to use terms like "frequency response" and such in your answer :)
EDIT: I guess what I really want to know is the concept and theory behind using a convolution filter for interpolation.
(Note: I have already read the Wikipedia article on Lanczos resampling but it didn't have nearly enough detail for me)

The selection of a particular filter for image processing is something of a black art, because the main criterion for judging the result is subjective: in computer graphics, the ultimate question is almost always: "does it look good?". There are a lot of good filters out there, and the choice between the best frequently comes down to a judgement call.
That said, I will go ahead with some theory...
Since you are familiar with Fourier analysis for signal processing, you don't really need to know much more to apply it to image processing -- all the filters of immediate interest are "separable", which basically means you can apply them independently in the x and y directions. This reduces the problem of resampling a (2-D) image to the problem of resampling a (1-D) signal. Instead of a function of time (t), your signal is a function of one of the coordinate axes (say, x); everything else is exactly the same.
Ultimately, the reason you need to use a filter at all is to avoid aliasing: if you are reducing the resolution, you need to filter out high-frequency original data that the new, lower resolution doesn't support, or it will be added to unrelated frequencies instead.
So. While you're filtering out unwanted frequencies from the original, you want to preserve as much of the original signal as you can. Also, you don't want to distort the signal you do preserve. Finally, you want to extinguish the unwanted frequencies as completely as possible. This means -- in theory -- that a good filter should be a "box" function in frequency space: with zero response for frequencies above the cutoff, unity response for frequencies below the cutoff, and a step function in between. And, in theory, this response is achievable: as you may know, a straight sinc filter will give you exactly that.
There are two problems with this. First, a straight sinc filter is unbounded, and doesn't drop off very fast; this means that doing a straightforward convolution will be very slow. Rather than direct convolution, it is faster to use an FFT and do the filtering in frequency space...
However, if you actually do use a straight sinc filter, the problem is that it doesn't actually look very good! As the related question says, perceptually there are ringing artifacts, and practically there is no completely satisfactory way to deal with the negative values that result from "undershoot".
Finally, then: one way to deal with the problem is to start out with a sinc filter (for its good theoretical properties), and tweak it until you have something that also solves your other problems. Specifically, this will get you something like the Lanczos filter:
Lanczos filter: L(x) = sinc(pi x) sinc(pi x/a) box(|x|<a)
frequency response: F[L(x)](f) = box(|f|<1/2) * box(|f|<1/2a) * sinc(2 pi f a)
[note that "*" here is convolution, not multiplication]
[also, I am ignoring normalization completely...]
the sinc(pi x) determines the overall shape of the frequency response (for larger a, the frequency response looks more and more like a box function)
the box(|x|<a) gives it finite support, so you can use direct convolution
the sinc(pi x/a) smooths out the edges of the box and (consequently? equivalently?) greatly improves the rejection of undesirable high frequencies
the last two factors ("the window") also tone down the ringing; they make a vast improvement in both the perceptual artifact and the practical incidence of "undershoot" -- though without completely eliminating them
Please note that there is no magic about any of this. There are a wide variety of windows available, which work just about as well. Also, for a=1 and 2, the frequency response does not look much like a step function. However, I hope this answers your question "why sinc", and gives you some idea about frequency responses and so forth.

Related

detecting a plateu in sensor measurements

I have a stream of data measurements with an initial increasing phase that is followed by a plateau. The measurements are noisy without clear bound. I would like to stop ingesting the stream when the plateau is detected:
while (not_const)
{
add_measurement( stream.get() );
not_const = !is_const();
}
Is there a well-known algorithm for dealing with such problem? I know about Kalman-Filters, but not so sure if they are specifically made for this.
The Kalman filter will cover your noise, so long as the variance is calculable. Yes, it can help in this situation. Depending on your application, you may find that the first derivative of a moving average will do as well for you. Kalman merely optimizes some linear parameters to give a "best" prediction of actual (vs observed-through-noise) values.
You still need to handle your interpretation of that prediction series. You need to define what constitutes a "plateau". How close to 0 do you need the computable slope? Does that figure depend on the preceding input? How abrupt is the transition between the increase and the plateau? The latter considerations would suggest looking at the second derivative as well: a quick-change detector of some ilk.

Comparing 2 one dimensional signals

I have the following problem: I have 2 signals over time. They are from the same source so they should be the same. I want to check if they really are.
Complications:
they may be measured with different sample rates
the start / end time do not correlate. The measurement does not start at the same time and end at the same time.
there may be an time offset between the two signals.
My thoughts go along Fourier transformation, convolution and statistical methods for comparison. Can someone post me some links where I can find more information on how to handle this?
You can easily correct for the phase by just shifting them so their centers of mass line up. (Or alternatively, in the Fourier domain just multiplying by the inverse of the phase of the first coefficient.)
Similarly, if you want to line up the images given only partial data, you can just cross correlate and take the maximal value (which is again easy to do in the Fourier domain).
That leaves the only tricky part of this process as dealing with the sampling rates. Now if you know a-priori what the sample rates are, (and if they are related by a rational number), you can just use sinc interpolation/downsampling to rescale them to a common sampling rate:
https://ccrma.stanford.edu/~jos/st/Bandlimited_Interpolation_Time_Limited_Signals.html
If you don't know the sampling rate, you may be a bit screwed. Technically, you can try just brute forcing over all the different rescalings of your signal, but doing this tends to be either slow or else give mediocre results.
As a last suggestion, if you just want to match sounds exactly you can try using the cepstrum and verifying that the peaks of the signal are close enough to within some tolerance. This type of analysis is used a lot in sound and speech recognition, with some refinements to make it operate a bit more locally. It tends to work best with frequency modulated data like speech and music:
http://en.wikipedia.org/wiki/Cepstrum
Fourier transformation does sound like the right way.
There is too much mathematical information for me to just start explaining here so if you really wanna know what's going on with that (cause I don't think you can just use FT without understanding it) you should use this reference from MIT OpenCourseWare: http://ocw.mit.edu/courses/mathematics/18-103-fourier-analysis-theory-and-applications-spring-2004/lecture-notes/
Hope it helped.
If you are working with a linux box and the waveforms that need to be processed have already been recorded, you can try to use the file command to display details about the recording. It gives you the sampling rate when it is invoked on a wav file, though I am not sure what format you are recording in.
If the signals are time-shifted with respect to each other, you may try to convolve one with a delta function with increasing delays and then comparing. On MATLAB, conv and all should be good enough.
These are just 'crude' attempts (almost like hacking at the problem). There may be algorithms that are shift-invariant that may do a better job.
Hope that helps.

FFT Algorithm: What goes IN/OUT? (re: real-time pitch detection)

I am attempting to extract pitch data from an audio stream. From what I can see, it looks as though FFT is the best algorithm to use.
Rather than digging straight into the math, could someone help me understand what this FFT algorithm does?
Please don't say something obvious like 'FFT extracts frequency data from a raw signal.' I need the next level of detail.
What do I pass in, and what do I get out?
Once I understand the interface clearly, this will help me to understand the implementation.
I take it I need to pass in an audio buffer, I need to tell it how many bytes to use for each computation (say the most recent 1024 bytes from this buffer). and maybe I need to specify the range of pitches I want it to detect. Now it is going to pass back what? An array of frequency bins? What are these?
(Edit:) I have found a C++ algorithm to use (if I can only understand it)
Performous extracts pitch from the microphone. Also the code is open source. Here is a description of what the algorithm does, from the guy that coded it.
PCM input (with buffering)
FFT (1024 samples at a time, remove 200 samples from front of the buffer afterwards)
Reassignment method (against the previous FFT that was 200 samples earlier)
Filtering of peaks (this part could be done much better or even left out)
Combining peaks into sets of harmonics (we call the combination a tone)
Temporal filtering of tones (update the set of tones detected earlier instead of simply using the newly detected ones)
Pick the best vocal tone (frequency limits, weighting, could use the harmonic array also but I don't think we do)
But could someone help me understand how this works? What is it that is getting sent from the FFT to the Reassignment method?
The FFT is just one building block in the process, and it may not be the best approach for pitch detection. Read up on pitch detection and decide which algo you want to use first (this will depend on what exactly you are trying to measure the pitch of - speech, single musical instrument, other types of sound, etc. Get this right before getting into low level details such as the FFT (some, but not all pitch detection algorithms use the FFT internally).
There are numerous similar questions on SO already, e.g. Real-time pitch detection using FFT and Pitch detection using FFT for trumpet, and there is good overview material on Wikipedia etc - read these and then decide whether you still want to roll your own FFT-based solution or perhaps use an existing library which is suitable for your particular application.
There is an element of choice here. The most straightforward to implement is to do (2^n samples in) complex numbers in, and 2^n complex numbers out, so maybe you should start with that.
In the special case of a DCT(discrete cosine transform), typically what goes in is 2^n samples (often floats), and out go 2^n values, often floats too. DCT is an FFT but that takes only the real values, and analyses the function in terms of cosines.
It is smart (but commonly skipped) to define a struct to handle the complex values. Traditionally FFT's are done in-place, but it works fine if you don't.
It can be useful to instantiate a class that contains a work buffer for the FFT (if you don't want to do the FFT in-place), and reuse that for several FFTs.
In goes N samples of PCM (purely real complex numbers). Out comes N bins of frequency domain (each bin corresponding to a 1/N slice of sample rate). Each bin is a complex number. Rather than real and imaginary parts, these values should generally be handled in polar format (absolute value and argument). The absolute value tells the amount of sound near the bin center frequency while the argument tells the phase (at which position the sine wave is travelling).
Most often coders only use the magnitude (absolute value) and throw away the phase angle (argument).

Algorithms needed on filtering the noise caused by the vibration

For example you measure the data coming from some device, it can be a mass of the object moving on the bridge. Because it is moving the mass will give data which will vibrate in some amplitude depending on the mass of the object. Bigger the mass - bigger the vibrations.
Are there any methods for filtering such kind of noise from that data?
May be using some formulas of vibrations? Have no idea what kind of formulas or algorithms (filters) can be used here. Please suggest anything.
EDIT 2:
Better picture, I just draw it for better understanding:
Not very good picture. From that graph you can see that the frequency is the same every
time, but the amplitude chanbges periodically. Something like that I have when there are no objects on the moving road. (conveyer belt). vibrating near zero value.
When the object moves, I there are the same waves with changing amplitude.
The graph can tell that there may be some force applying to the system and which produces forced occilations. So I am interested in removing such kind of noise. I do not know what force causes such occilations. Soon I hope I will get some data on the non moving road with and without object on it for comparison with moving road case.
What you have in your last plot is basically an amplitude modulated oscillation coming from a function like:
f[x] := 10 * (4 + Sin[x]) * Sin[80 * x]
The constants have been chosen to match your plot (using just a rule of thumb)
The Plot of this function is
That isn't "noise" (although may be some noise is there too), but can be filtered easily.
Let's see your data for the static and moving payloads ....
Edit
Based on your response to several comments, and based in my previous experience with weighting devices:
You are interfacing the physical world, not just getting input from a mouse and keyboard. It is very important for you understand the device, how it works and how it is designed.
You need a calibration procedure. You have to use several master weights to be sure that the device is working properly and linearly in the whole scale, and that the static case is measured much better than your dynamic needs.
You'll not be able to predict if you can measure with several loads in the conveyor until you do some experiments and look very carefully at the resulting plots
You need to be sure that a load placed anywhere in the conveyor shows the same reading. Or at least you should be able to correlate reading and position.
As I said before, you need a lot of info, and it seems that is not available. I always worked as a team with the engineers designing the device.
Don't hesitate to add more info ...
Have you tried filters with lowpass characteristics? There are different approaches for smoothing data (i.e. Savitzky-Golay, Gauss, moving average) but often, a simple N-point median filter is already sufficient.
It really depends on what you're after.
Take a look at this book:
The Scientist and Engineer's Guide to Digital Signal Processing
You can download it for free. In particular, check chapters 14 and 15.
If the frequency changes with mass and you're trying to measure mass, why not measure the frequency of the oscillations and use that as your primary measure?
Otherwise you need a notch filter which is tunable - figure out the frequency of the "noise" and tune the notch filter to that.
Another book to try is Lyons Understanding Digital Signal Processing
In order to smooth the signal, I'd average the previous 2 * n samples where n is the maximum expected wavelength of the vibrations.
This should cause most of the noise to be eliminated.
If you have some idea of the range of frequencies, you could do a simple average as long as the measurement period were sufficiently long to give you the level of accuracy you want to achieve. The more wavelengths worth of data you average against, the smaller the ratio of contributed error from a partial wavelength.
I'd suggest first simulating/modeling this in software like Matlab.
Data you'll need to consider:
The expected range of vibration frequencies
The measurement accuracy you want to achieve
The expected range of mass you'll want to measure
The function of mass to vibration amplitude
You should be able to apply the same principles as noise-cancelling microphones: put two sensors out, then subtract the secondary sensor's (farther away from the good signal source) signal from the primary sensor's (closer to the good signal source) signal.
Obviously, this works best if the "noise" will reach both sensors fairly equally while the "signal" reaches the primary sensor much more strongly.
For things like sound, this is pretty easy to do in the sensor itself, which makes your software a lot easier and more performant. Depending on what you're measuring, this might be easier to do with multiple sets of hardware and doing the cancellation in software.
If you can characterize the frequency spectra of the unwanted vibration noise, you might be able to synthesize a set of (near) minimum phase notch or band reject filter(s) to allow you to acquire your desired signal at your desired S/N ratio with minimized latency or data set size.
Filtering noisy digital signals is straight forward, as previous posters have noted. There are lots of references. You have not however stated what your objectives are clearly, so we cannot point you into a good direction. Are you looking for a single measurement of a single object on a bridge? [Then see other answers].
Are you monitoring traffic on this bridge and weighing each entity as it passes by? Then you need to determine when entities are on the sensor and when they are not. Typically, as long as the sensor's noise floor is significantly lower than the signal you're measuring this can be accomplished by simple thresholding.
Are you trying to measure the vibrations of the bridge caused by other vehicles? In which case you need either a more expensive sensor if you're having problems doing this, or a clearer measuring objective.

Perceptual similarity between two audio sequences

I would like to get some sort of distance measure between two pieces of audio. For example, I want to compare the sound of an animal to the sound of a human mimicking that animal, and then return a score of how similar the sounds were.
It seems like a difficult problem. What would be the best way to approach it? I was thinking to extract a couple of features from the audio signals and then do a Euclidian distance or cosine similarity (or something like that) on those features. What kind of features would be easy to extract and useful to determine the perceptual difference between sounds?
(I saw somewhere that Shazam uses hashing, but that's a different problem because there the two pieces of audio being compared are fundamentally the same, but one has more noise. Here, the two pieces of audio are not the same, they are just perceptually similar.)
The process for comparing a set of sounds for similarities is called Content Based Audio Indexing, Retrieval, and Fingerprinting in computer science research.
One method of doing this is to:
Run several bits of signal processing on each audio file to extract features, such as pitch over time, frequency spectrum, autocorrelation, dynamic range, transients, etc.
Put all the features for each audio file into a multi-dimensional array and dump each multi-dimensional array into a database
Use optimization techniques (such as gradient descent) to find the best match for a given audio file in your database of multi-dimensional data.
The trick to making this work well is which features to pick. Doing this automatically and getting good results can be tricky. The guys at Pandora do this really well, and in my opinion they have the best similarity matching around. They encode their vectors by hand though, by having people listen to music and rate them in many different ways. See their Music Genome Project and List of Music Genome Project attributes for more info.
For automatic distance measurements, there are several projects that do stuff like this, including marsysas, MusicBrainz, and EchoNest.
Echonest has one of the simplest APIs I've seen in this space. Very easy to get started.
I'd suggest looking into spectrum analysis. Whilst this isn't as straightforward as you're most likely wanting, I'd expect that decomposing the audio into it's underlying frequencies would provide some very useful data to analyse. Check out this link
Your first step will definitely be taking a Fourier Transform(FT) of the sound waves. If you perform an FT on the data with respect to Frequency over Time1, you'll be able to compare how often certain key frequencies are hit over the course of the noise.
Perhaps you could also subtract one wave from the other, to get a sort of stepwise difference function. Assuming the mock-noise follows the same frequency and pitch trends2 as the original noise, you could calculate the line of best fit to the points of the difference function. Comparing the best fit line against a line of best fit taken of the original sound wave, you could average out a trend line to use as the basis of comparison. Granted, this would be a very loose comparison method.
- 1. hz/ms, perhaps? I'm not familiar with the unit magnitude being worked with here, I generally work in the femto- to nano- range.
- 2. So long as ∀ΔT, ΔPitch/ΔT & ΔFrequency/ΔT are within some tolerance x.
- Edited for formatting, and because I actually forgot to finish writing the full answer.

Resources