Looking for an audio analysis library for information extraction - feature-extraction

Hey guys, I'm a beginner in audio analysis and trying to find a library that gives me insights like amplitude, classification of sound, and detection of background noise. I have tried pAura/pyAudioAnalysis (pAura: Python AUdio Recording and Analysis), which analyzes some of this information for live recordings. Is there any good audio analysis library on GitHub?

There are many. Search for the DTLN model for audio noise removal on GitHub; DTLN is a pretrained, lightweight noise-removal model.
If you're not planning to use any models, then try to solve this with classical audio signal processing: use features like the zero-crossing rate for noise/speech activity detection.
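For illustration, here is a minimal Python sketch of that signal-processing route (not from the answer above): it computes a frame-wise zero-crossing rate and flags frames above a hypothetical threshold. NumPy and soundfile are assumed to be installed, and the file name and threshold are placeholders; in practice ZCR is usually combined with short-time energy for speech/noise decisions.

    # Minimal sketch, not from the answer above: frame-wise zero-crossing rate (ZCR)
    # used as a crude noise/speech activity indicator. NumPy and soundfile are
    # assumed to be installed; "input.wav" and the threshold are placeholders.
    import numpy as np
    import soundfile as sf

    def zero_crossing_rate(frame):
        # Fraction of adjacent sample pairs whose sign changes within the frame.
        signs = np.sign(frame)
        signs[signs == 0] = 1
        return np.mean(signs[:-1] != signs[1:])

    def frames(y, frame_len=1024, hop=512):
        n = 1 + max(0, (len(y) - frame_len) // hop)
        return [y[i * hop:i * hop + frame_len] for i in range(n)]

    audio, sr = sf.read("input.wav")
    if audio.ndim > 1:
        audio = audio.mean(axis=1)  # mix down to mono

    zcr = np.array([zero_crossing_rate(f) for f in frames(audio)])
    threshold = 0.1  # hypothetical value; tune per recording, ideally combined with short-time energy
    print("frames flagged as noisy/unvoiced:", int(np.sum(zcr > threshold)))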

Related

Pass modified video stream into Vuforia Engine?

Is it possible to modify the Vuforia video stream for better tracking performance?
Step 1: Get the raw pixel data via VuforiaBehaviour.Instance.CameraDevice.GetCameraImage().
Step 2: Modify the pixels with post-processing via custom shaders in Unity, for example applying a threshold or edge detection.
Step 3: Vuforia Engine uses the modified video input to track images.
That's the idea, but I'm not sure whether Vuforia will then pass the modified video into the Vuforia Engine, or still use the unmodified video input for tracking.
If anybody has experience with this, I would be thankful for your help! :)
Vuforia Engine assumes that the input images look like "natural" images. Passing an image belonging to a different domain (e.g., the result of an edge detector) is unlikely to improve tracking performance.
That said, tracking performance is affected by image quality. For example, if images are blurry, tracking robustness will suffer. If this is the case, you might want to try adjusting the system camera parameters via the platform API (iOS, Android, etc.). However, please note that this might or might not be possible depending on the platform. Also, on some platforms, when a device tracker like ARKit or ARCore is used, the platform tracker itself adjusts the camera parameters for good tracking performance; for example, it might keep the exposure time low to reduce blur.

Motion detection using only video clips

I am trying to detect whether a person is walking or running. However, I have only 40 short video clips of a single person walking or running. How can I do this kind of motion detection using video data? Can anyone please point me to any papers or implementations?
OpenCV has many trackers (https://docs.opencv.org/3.4.1/d9/df8/group__tracking.html). For instance: cv2.TrackerKCF_create().
You may find some comprehensive tutorials on the subject, like this one: https://www.pyimagesearch.com/2015/09/21/opencv-track-object-movement/
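For a concrete starting point, here is a minimal Python sketch of the tracker suggestion above (the file name, window names, and manually selected ROI are placeholders). The per-frame box positions it produces could then be used to estimate speed and separate walking from running.

    # Minimal sketch of running OpenCV's KCF tracker on one clip. Assumes
    # opencv-contrib-python; in newer builds the factory may live under
    # cv2.legacy.TrackerKCF_create instead. "clip.mp4" is a placeholder.
    import cv2

    cap = cv2.VideoCapture("clip.mp4")
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError("could not read the first frame")

    bbox = cv2.selectROI("select person", frame, showCrosshair=False)  # (x, y, w, h)
    tracker = cv2.TrackerKCF_create()
    tracker.init(frame, bbox)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)
        if found:
            x, y, w, h = map(int, bbox)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("tracking", frame)
        if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()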

How to remove the background sound of sport commentary sound clip in ffmpeg

I'm working on a project that needs to convert sport commentary to text. For that I have already used the Microsoft System.Speech library. It works fine without background noise. Can anyone tell me a way of removing this background noise from a given audio file using an ffmpeg-like tool or in some other programmatic way?
For better accuracy in a case like this it is better to use a more specialized solution like CMUSphinx.
It helps you in several ways: you can configure the decoder vocabulary so it correctly recognizes sport terms and expressions,
and you can rely on its noise-robust speech recognition to deal with background noise. External noise cleanup is actually pretty harmful for speech recognition accuracy and is not recommended; even a simple processing algorithm like vuvuzela denoising in Matlab is better applied inside the decoder, not as a pre-processing step.
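If you still want to try the ffmpeg route from the question despite that warning, a minimal sketch invoked from Python might look like the following; the band-pass cutoffs and file names are illustrative only, and ffmpeg's afftdn FFT denoiser is used with its defaults.

    # Minimal sketch: band-limit the commentary and run ffmpeg's FFT denoiser.
    # Note the answer above warns that this kind of pre-cleanup can hurt
    # recognition accuracy; filter settings and file names are placeholders.
    import subprocess

    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", "commentary.wav",
            "-af", "highpass=f=200,lowpass=f=3400,afftdn",
            "clean_commentary.wav",
        ],
        check=True,
    )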

Real-time video(image) stitching

I'm thinking of stitching images from 2 or more (currently maybe 3 or 4) cameras in real time using OpenCV 2.3.1 on Visual Studio 2008.
However, I'm curious about how it is done.
Recently I've studied some techniques of feature-based image stitching method.
Most of them require at least the following steps:
1. Feature detection
2. Feature matching
3. Finding a homography
4. Transformation of target images to reference images
...etc (a minimal OpenCV sketch of these steps is given below)
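As a concrete illustration of those steps (not from the question itself), here is a minimal OpenCV/Python sketch that warps one target image onto one reference image; it uses ORB instead of SURF/SIFT/ASIFT so it runs on a stock OpenCV build, and the file names and canvas size are placeholders.

    # Minimal sketch of the feature-based stitching pipeline listed above.
    import cv2
    import numpy as np

    ref = cv2.imread("reference.jpg")
    tgt = cv2.imread("target.jpg")
    ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    tgt_gray = cv2.cvtColor(tgt, cv2.COLOR_BGR2GRAY)

    # 1. Feature detection (and description)
    orb = cv2.ORB_create(nfeatures=2000)
    kp_ref, des_ref = orb.detectAndCompute(ref_gray, None)
    kp_tgt, des_tgt = orb.detectAndCompute(tgt_gray, None)

    # 2. Feature matching
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_tgt, des_ref), key=lambda m: m.distance)

    # 3. Finding a homography from the best matches (RANSAC)
    src = np.float32([kp_tgt[m.queryIdx].pt for m in matches[:200]]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches[:200]]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # 4. Transformation of the target image into the reference frame
    h, w = ref.shape[:2]
    warped = cv2.warpPerspective(tgt, H, (w * 2, h))  # crude canvas size for the demo
    warped[0:h, 0:w] = ref                            # naive overwrite, no blending
    cv2.imwrite("stitched.jpg", warped)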
Now most of the techniques I've read only deal with images "ONCE", while I would like it to deal with a series of images captured from a few cameras and I want it to be "REAL-TIME".
It may still sound confusing so far, so here is the detail:
Put 3 cameras at different angles and positions, where each camera has an overlapping area with its adjacent one, so as to build REAL-TIME video stitching.
What I would like to do is similar to the content in the following link, where ASIFT is used.
http://www.youtube.com/watch?v=a5OK6bwke3I
I tried to contact the owner of that video but got no reply from him. :(
Can I use image-stitching methods to deal with video stitching?
Video itself is composed of a series of images so I wonder if this is possible.
However, detecting feature points seems to be very time-consuming whatever feature detector (SURF, SIFT, ASIFT, etc.) you use. This makes me doubt the possibility of doing real-time video stitching.
I have worked on a real-time video stitching system and it is a difficult problem. I can't disclose the full solution we used due to an NDA, but I implemented something similar to the one described in this paper. The biggest problem is coping with objects at different depths (simple homographies are not sufficient); depth disparities must be determined and the video frames appropriately warped so that common features are aligned. This essentially is a stereo vision problem. The images must first be rectified so that common features appear on the same scan line.
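This is not the NDA'd solution from that answer, but a generic Python illustration of the rectification step it mentions: given matched feature points between a left and right view (e.g., from the ORB matching sketched earlier), OpenCV can estimate the fundamental matrix and rectify both images so that correspondences fall on the same scan line.

    # Minimal sketch of uncalibrated stereo rectification. pts_left/pts_right are
    # assumed to be Nx2 float32 arrays of matched pixel coordinates; this is an
    # illustration, not the solution described in the answer above.
    import cv2

    def rectify_pair(img_left, img_right, pts_left, pts_right):
        F, inliers = cv2.findFundamentalMat(pts_left, pts_right, cv2.FM_RANSAC, 3.0)
        h, w = img_left.shape[:2]
        ok, H_left, H_right = cv2.stereoRectifyUncalibrated(pts_left, pts_right, F, (w, h))
        if not ok:
            raise RuntimeError("rectification failed")
        rect_left = cv2.warpPerspective(img_left, H_left, (w, h))
        rect_right = cv2.warpPerspective(img_right, H_right, (w, h))
        return rect_left, rect_right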
You might also be interested in my project from a few years back. It's a program which lets you experiment with different stitching parameters and watch the results in real-time.
Project page - https://github.com/lukeyeager/StitcHD
Demo video - https://youtu.be/mMcrOpVx9aY?t=3m38s

Waveform Visualization in Ruby

I'm about to start a project that will record and edit audio files, and I'm looking for a good library (preferably Ruby, but will consider anything other than Java or .NET) for on-the-fly visualization of waveforms.
Does anybody know where I should start my search?
That's a lot of data to be streaming into a browser. Flash or Flex charts are probably the only solution that will be memory-efficient; JavaScript charting tends to break down for large data sets.
When displaying an audio waveform, you will want to do some sort of data reduction on the original data, because there is usually more data available in an audio file than pixels on the screen. Most audio editors build a separate file (called a peak file or overview file) which stores a subset of the audio data (usually the peaks and valleys of a waveform) for use at different zoom levels. Then as you zoom in past a certain point you start referencing the raw audio data itself.
Here are some good articles on this:
Waveform Display
Build an Audio Waveform Display
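To make the data-reduction idea concrete, here is a minimal sketch in Python (the question asks for Ruby, but the idea is the same in any language): it reduces the raw samples to one min/max pair per pixel column, which is essentially what a peak/overview file stores. NumPy and soundfile are assumptions, as are the file name and pixel width.

    # Minimal sketch of the "peak file" idea: one (min, max) pair per pixel column.
    import numpy as np
    import soundfile as sf

    def overview_peaks(path, width_px=800):
        samples, sr = sf.read(path)
        if samples.ndim > 1:
            samples = samples.mean(axis=1)  # mix down to mono
        block = max(1, len(samples) // width_px)
        trimmed = samples[: (len(samples) // block) * block]
        blocks = trimmed.reshape(-1, block)
        return blocks.min(axis=1), blocks.max(axis=1)

    mins, maxs = overview_peaks("take1.wav")  # placeholder file name
    print(len(mins), "columns; draw a vertical line from min to max for each")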
As far as source code goes, I would recommend looking through the Audacity source code. Audacity's waveform display is pretty good and most likely does a similar sort of data reduction when rendering the waveforms.
I wrote one:
http://github.com/pangdudu/rude/tree/master/lib/waveform_narray_testing.rb
-- nick
The other option is generating the waveforms on the server side with GD or RMagick. But good luck getting RubyGD to compile.
Processing is often used for visualization, and it has a Ruby port:
https://github.com/jashkenas/ruby-processing/wiki
