This question in this community discusses why, in speech recognition, the front end generally performs signal processing to allow feature extraction from the audio stream. It also nicely explains why the DCT is preferred over the DFT in the second step. Since this process is hardware-based, it implies that there are standard circuits for the DFT/DCT transform.
When I googled for FFT algorithms I found some nice material here, but in my current project I need the DCT. Can someone please point me to a standard DCT algorithm or chip that can be used for feature extraction from speech signals?
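The question asks about hardware, but if you first want to sanity-check the feature pipeline in software, here is a minimal sketch of the DCT step as it appears in MFCC-style feature extraction, using SciPy; the filterbank values are made up for illustration:

    import numpy as np
    from scipy.fftpack import dct

    # Hypothetical log mel-filterbank energies for one audio frame.
    log_energies = np.log([12.0, 9.5, 7.1, 5.2, 4.0, 3.1, 2.5, 2.0])

    # DCT-II with orthonormal scaling decorrelates the filterbank outputs;
    # keeping the first few coefficients gives the cepstral features.
    mfcc = dct(log_energies, type=2, norm='ortho')[:4]
    print(mfcc)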
I'm trying to find out which fast motion estimation algorithm is used in VP9. Although it's open source, no documentation seems to be available, and I couldn't find anything relevant. Kindly help me out.
As with typical video standards, there is no motion estimation algorithm in VP9; the standardized parts are the bitstream and how to decode it. Of course, encoders implement some motion estimation algorithm(s) (usually configurable, so the user can choose their speed/quality trade-off), but since the standard doesn't cover encoders, that is not part of VP9. For the decoder it does not matter how the motion vectors were chosen; it only matters what the result was.
You can get the latest version of the standard from this website.
In libvpx, vp9_mcomp.c shows which algorithms that specific encoder uses: several diamond searches (with varying accuracy/time trade-offs, including an N-step diamond search), two hexagon-based searches, a square search, and even an exhaustive search. There is also integral-projection motion estimation, but it seems to be used only in a special case.
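For intuition about what a diamond search does, here is a toy block-matching sketch in Python. It is not libvpx's code (vp9_mcomp.c adds sub-pixel refinement, early-exit thresholds, and larger patterns); the function names and parameters are made up for illustration.

    import numpy as np

    def sad(ref, cur, bx, by, dx, dy, bs=8):
        # Sum of absolute differences between the current block at
        # (bx, by) and the reference block displaced by (dx, dy).
        block = cur[by:by+bs, bx:bx+bs].astype(int)
        cand = ref[by+dy:by+dy+bs, bx+dx:bx+dx+bs].astype(int)
        return np.abs(block - cand).sum()

    def diamond_search(ref, cur, bx, by, bs=8):
        # Repeatedly test the 4 small-diamond neighbours of the current
        # best motion vector; stop when the centre wins.
        mv, best = (0, 0), sad(ref, cur, bx, by, 0, 0, bs)
        while True:
            moved = False
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cx, cy = mv[0] + dx, mv[1] + dy
                # Skip candidates that fall outside the reference frame.
                if not (0 <= by+cy and by+cy+bs <= ref.shape[0]
                        and 0 <= bx+cx and bx+cx+bs <= ref.shape[1]):
                    continue
                cost = sad(ref, cur, bx, by, cx, cy, bs)
                if cost < best:
                    best, mv, moved = cost, (cx, cy), True
            if not moved:
                return mv, best

    ref = np.zeros((32, 32), dtype=np.uint8)
    cur = np.zeros((32, 32), dtype=np.uint8)
    ref[10:18, 12:20] = 255          # object in the reference frame
    cur[10:18, 10:18] = 255          # same object, shifted left by 2
    print(diamond_search(ref, cur, 10, 10))   # finds mv (2, 0), SAD 0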
I have a general question about contrast adjustment; forgive me if it is too naive or general, and please let me know if any correction is necessary.
Here is my question: when do we usually do contrast adjustment or contrast stretching in image processing or computer vision? In particular, when is it necessary to do contrast adjustment for object detection or segmentation? What are the downsides of contrast stretching when it is not applied in the right situation? Can you give me a few examples as well?
Your answers are greatly appreciated!
In general, you can classify algorithms related to image and video processing into two categories:
Image and video processing:
These algorithms are used to enhance the quality of the image.
Computer vision:
These algorithms are used to detect, recognize, and classify objects.
Contrast adjustment techniques are used to enhance the quality and visibility of the image.
Most of the time, better input quality for computer vision algorithms leads to better results. This is why most CV algorithms use pre-processing steps to remove noise and improve image quality.
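As an illustration of such a pre-processing step, here is a minimal contrast-stretching sketch; the percentile clipping is just one common choice, not the only option:

    import numpy as np

    def stretch_contrast(img, low_pct=2, high_pct=98):
        # Linearly map the image's intensity range onto [0, 255].
        # Percentile clipping guards against a few outlier pixels.
        lo, hi = np.percentile(img, (low_pct, high_pct))
        if hi <= lo:                  # flat image: nothing to stretch
            return img.copy()
        out = (img.astype(np.float32) - lo) * (255.0 / (hi - lo))
        return np.clip(out, 0, 255).astype(np.uint8)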
I'm going to match a sketch of a face (a drawn portrait) to a color photo. For my research I want to find out what the challenges are in matching sketch drawings to color face photos. So far I have found:
resolution (pixel) difference
texture difference
distance difference
and color (not much effect)
I want to know (in technical terms) what other challenges there are, and what OpenCV and JavaCV methods and algorithms are available to overcome them.
Here are some examples of the sketches and the photos that are known to match them:
This problem is called multi-modal face recognition. There has been a lot of interest in comparing a high-quality mugshot (modality 1) to low-quality surveillance images (modality 2); other examples are frontal images versus profiles, or photos versus sketches, as the OP is interested in. Partial Least Squares (PLS) and Tied Factor Analysis (TFA) have been used for this purpose.
A key difficulty is computing two linear projections, one per modality, into a common space where two points being close means that the individual is the same. This is the key technical step. Here are some papers on this approach (a toy sketch of the PLS idea follows the references):
Abhishek Sharma, David W. Jacobs: Bypassing Synthesis: PLS for Face Recognition with Pose, Low-Resolution and Sketch. CVPR 2011.
S. J. D. Prince, J. H. Elder, J. Warrell, F. M. Felisberti: Tied Factor Analysis for Face Recognition across Large Pose Differences. IEEE Trans. Pattern Anal. Mach. Intell., 30(6), 970-984, 2008. Elder is a specialist in this area and has a variety of papers on the topic.
B. Klare, Z. Li and A. K. Jain: Matching Forensic Sketches to Mugshot Photos. IEEE Trans. Pattern Anal. Mach. Intell., 29 Sept. 2010.
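To make the two-projection idea concrete, here is a toy sketch using scikit-learn's PLSCanonical as a stand-in for the PLS step in the Sharma and Jacobs paper; the feature vectors and dimensions are made up for illustration:

    import numpy as np
    from sklearn.cross_decomposition import PLSCanonical

    rng = np.random.default_rng(0)
    X_sketch = rng.normal(size=(100, 64))      # hypothetical sketch features
    X_photo = X_sketch + rng.normal(scale=0.3, size=(100, 64))  # paired photos

    # Learn one linear projection per modality into a shared space.
    pls = PLSCanonical(n_components=8)
    pls.fit(X_sketch, X_photo)

    # Project a probe sketch and the photo gallery, then match by distance.
    probe, gallery = pls.transform(X_sketch[:1], X_photo)
    match = int(np.argmin(np.linalg.norm(gallery - probe, axis=1)))
    print("best match:", match)                # 0 for this toy data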
As you can understand, this is an active research area/problem. In terms of using OpenCV to overcome the difficulties, let me give you an analogy: you need to build a house (match sketches to photos) and you're asking how having a Stanley hammer (OpenCV) will help. Sure, it will probably help. But you'll also need a lot of other resources: wood, time/money, pipes, cable, etc.
I think that James Elder's old work on the completeness of the edge map (using reconstruction by solving the Laplace equation) is quite relevant here. See the results at the end of this paper: http://elderlab.yorku.ca/~elder/publications/journals/ElderIJCV99.pdf
You could give Eigenfaces a try. Though I never tested them with sketches, I think they could at least be a good starting point for your research.
See the Wikipedia article: http://en.wikipedia.org/wiki/Eigenface and the OpenCV tutorial: http://docs.opencv.org/modules/contrib/doc/facerec/facerec_tutorial.html (which covers more than just Eigenfaces!)
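For a quick start, a minimal Eigenfaces sketch with OpenCV's contrib face module could look like this (it assumes opencv-contrib-python is installed; the file names, labels, and image size are placeholders, not part of the tutorial):

    import cv2
    import numpy as np

    def load_gray(path, size=(100, 100)):
        # All faces must be grayscale and the same size for Eigenfaces.
        return cv2.resize(cv2.imread(path, cv2.IMREAD_GRAYSCALE), size)

    # Hypothetical training set: photos with integer identity labels.
    images = [load_gray("photo_0.png"), load_gray("photo_1.png")]
    labels = np.array([0, 1])

    model = cv2.face.EigenFaceRecognizer_create()
    model.train(images, labels)

    # Query with a sketch and see which identity it lands on.
    label, confidence = model.predict(load_gray("sketch_0.png"))
    print(label, confidence)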
OpenCV can be used for the feature extraction and machine learning required for this task. I guess you can start with the papers in the answers above, pick some basic features, and prototype a classifier with OpenCV.
I guess you might also want to detect and match feature points on the faces. If you use this approach, you will have to build the feature point detectors yourself (training the Viola-Jones detector in OpenCV on your own data is an option).
I want to develop an app for gesture recognition using Kinect and hidden Markov models. I watched a tutorial here: HMM lecture
But I don't know how to start. What is the state set, and how do I normalize the data so that HMM learning is possible? I know (more or less) how it should be done for signals and for simple "left-to-right" cases, but 3D space makes me a little confused. Could anyone describe how to begin?
Could anyone describe the steps for doing this? In particular, I need to know how to build the model and what the steps of the HMM algorithm should be.
One set of methods for applying HMMs to gesture recognition is to use an architecture similar to the one commonly used for speech recognition.
The HMM would not be over space but over time, and each video frame (or set of extracted features from the frame) would be an emission from an HMM state.
Unfortunately, HMM-based speech recognition is a rather large area. Many books and theses have been written describing different architectures. I recommend starting with Jelinek's "Statistical Methods for Speech Recognition" (http://books.google.ca/books?id=1C9dzcJTWowC&pg=PR5#v=onepage&q&f=false) then following the references from there. Another resource is the CMU sphinx webpage (http://cmusphinx.sourceforge.net).
Another thing to keep in mind is that HMM-based systems are probably less accurate than discriminative approaches like conditional random fields or max-margin recognizers (e.g. SVM-struct).
For an HMM-based recognizer the overall training process is usually something like the following:
1) Perform some sort of signal processing on the raw data
For speech this would involve converting raw audio into mel-cepstrum format, while for gestures, this might involve extracting image features (SIFT, GIST, etc.)
2) Apply vector quantization (VQ) (other dimensionality reduction techniques can also be used) to the processed data
Each cluster centroid is usually associated with a basic unit of the task. In speech recognition, for instance, each centroid could be associated with a phoneme. For a gesture recognition task, each VQ centroid could be associated with a pose or hand configuration (see the sketch after these steps).
3) Manually construct HMMs whose state transitions capture the sequence of different poses within a gesture.
Emission distributions of these HMM states will be centered on the VQ vector from step 2.
In speech recognition these HMMs are built from phoneme dictionaries that give the sequence of phonemes for each word.
4) Construct a single HMM that contains transitions between each individual gesture HMM (or, in the case of speech recognition, each phoneme HMM). Then train the composite HMM with videos of gestures.
It is also possible at this point to train each gesture HMM individually before the joint training step. This additional training step may result in better recognizers.
For the recognition process, apply the signal processing step, find the nearest VQ entry for each frame, then find a high scoring path through the HMM (either the Viterbi path, or one of a set of paths from an A* search) given the quantized vectors. This path gives the predicted gestures in the video.
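To make step 2 and the decoding concrete, here is a toy sketch using k-means for the VQ codebook and a plain Viterbi decoder. The frame features, codebook size, and HMM parameters are all made up for illustration; a real system would learn them in step 4:

    import numpy as np
    from scipy.cluster.vq import kmeans2

    # Step 2: build a VQ codebook from per-frame feature vectors.
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(200, 16))        # hypothetical frame features
    codebook, frame_codes = kmeans2(frames, 4, minit='points')

    def viterbi(obs, start, trans, emit):
        # Most likely state path for a discrete-emission HMM.
        # Log-probabilities avoid underflow on long sequences.
        delta = np.log(start) + np.log(emit[:, obs[0]])
        back = []
        for o in obs[1:]:
            scores = delta[:, None] + np.log(trans)   # scores[i, j]: i -> j
            back.append(scores.argmax(axis=0))
            delta = scores.max(axis=0) + np.log(emit[:, o])
        path = [int(delta.argmax())]
        for ptr in reversed(back):
            path.append(int(ptr[path[-1]]))
        return path[::-1]

    # Toy 3-state "pose" HMM that mostly stays in the same pose.
    start = np.full(3, 1/3)
    trans = np.array([[0.8, 0.1, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.1, 0.1, 0.8]])
    emit = np.full((3, 4), 0.25)               # uniform toy emissions
    print(viterbi(frame_codes[:10], start, trans, emit))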
I implemented the 2D version of this for the Coursera PGM class, which has Kinect gestures as the final unit.
https://www.coursera.org/course/pgm
Basically, the idea is that you can't use an HMM alone to decide poses very well. In our unit, I used a variation of k-means to segment the poses into probabilistic categories. The HMM was then used to decide which sequences of poses were actually viable as gestures. But any clustering algorithm run on a set of poses is a good candidate, even if you don't know what kind of pose each one is.
From there you can create a model that trains on the aggregate probabilities of each possible pose for each point of Kinect data.
I know this is a bit of a sparse overview. That class gives an excellent survey of the state of the art, but the problem in general is a bit too difficult to be condensed into an easy answer. (I'd recommend taking it in April if you're interested in this field.)
For example: what algorithm is used by the Fresco filter in Adobe Photoshop to generate its output image?
Do you know some place where I can read about the algorithms implemented in these filters?
Lode's Computer Graphics Tutorial
The source code for GIMP would be a good place to start. If the code for some filter doesn't make sense, at least you'll find jargon in the code and comments that can be googled.
The Photoshop algorithms can get very complex, and beyond simple blurring and sharpening, each one is a topic unto itself.
For the fresco filter, you might want to start with an SO question on how to cartoon-ify an image.
I'd love to read a collection of the more interesting algorithms, but I don't know of such a compilation.
Digital image processing is the use of computer algorithms to perform image processing on digital images. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and signal distortion during processing. Since images are defined over two dimensions (perhaps more) digital image processing may be modeled in the form of multidimensional systems.
Digital image processing allows the use of much more complex algorithms, and hence, can offer both more sophisticated performance at simple tasks, and the implementation of methods which would be impossible by analog means.
In particular, digital image processing is the only practical technology for:
Classification
Feature extraction
Pattern recognition
Projection
Multi-scale signal analysis
Some techniques which are used in digital image processing include:
Pixelation
Linear filtering (see the sketch after this list)
Principal components analysis
Independent component analysis
Hidden Markov models
Anisotropic diffusion
Partial differential equations
Self-organizing maps
Neural networks
Wavelets
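As a concrete illustration of the linear filtering entry above, here is a minimal convolution sketch; the box kernel is just one example, and any linear filter swaps in a different kernel:

    import numpy as np
    from scipy.ndimage import convolve

    image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
    box = np.full((3, 3), 1.0 / 9.0)                   # averaging kernel
    blurred = convolve(image, box, mode='reflect')     # linear filtering
    print(blurred)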