What can I use to perform sentiment analysis on a video stream? - sentiment-analysis

I want to make an app, maybe a web app, that performs sentiment analysis on a video stream during a web interview. Do you have any recommendations?
I want to mention that I'm good with Java and C#.

The solution might be different for different use cases. You mentioned that this happens during a web interview. Are you looking for real-time processing? How do you capture the video, and in which format? Do you need analysis of the spoken words or of facial emotions? And what is your budget?

Related

Is there any sentiment forum dataset for unsupervised training available?

I recently finished a machine learning course and would like to make a forum sentiment analysis tool, to apply it in stock-related forums.
The idea is to:
Capture (text mining) users with their comments, and evaluate each comment's sentiment (positive, negative, neutral).
Capture what happens (stock market) after those comments, and assign a weight to the user accordingly (bigger weight if the user's sentiment is spot-on and the market follows the same direction)
Use the comments as a tool to predict market direction.
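A minimal sketch of the weighting idea in the list above, in Python. The function names, the +1/-1/0 encoding of directions, and the smoothed hit rate are all assumptions made for illustration, not an established method:

```python
from collections import defaultdict

def user_weights(history):
    """history: list of (user, predicted_direction, actual_direction).

    Directions are +1 (up), -1 (down) or 0 (neutral). The weight is a
    Laplace-smoothed hit rate, so a user with few comments stays near 0.5.
    """
    hits = defaultdict(int)
    calls = defaultdict(int)
    for user, predicted, actual in history:
        if predicted == 0:
            continue  # a neutral comment makes no directional call
        calls[user] += 1
        if predicted == actual:
            hits[user] += 1
    return {user: (hits[user] + 1) / (calls[user] + 2) for user in calls}

def weighted_signal(comments, weights, default=0.5):
    """comments: list of (user, sentiment). Returns a value in [-1, 1]."""
    total = sum(weights.get(user, default) for user, _ in comments)
    if total == 0:
        return 0.0
    return sum(weights.get(user, default) * s for user, s in comments) / total

history = [("alice", 1, 1), ("alice", -1, -1), ("bob", 1, -1)]
weights = user_weights(history)
signal = weighted_signal([("alice", 1), ("bob", -1)], weights)
```

Here alice, who was right twice, outweighs bob, so the combined signal is positive even though the two comments disagree.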
Actually, I do this myself (pay attention to forums) plus my own technical analysis and the obligatory due diligence, and it has been working very well for me. I just wanted to try to automate it a little bit and maybe even allow a program to play with some of my accounts (paper trading first, and if it performs decently, assign some money in a real account).
This would be my first machine learning project (just as a proof-of-concept) so any comments would be very kindly appreciated.
The biggest problem I've found is that I would like to use unsupervised training, and I need a sample dataset for it.
Question: Is there any known forum-sentiment dataset available to be used for unsupervised training?
I've found several sentiment datasets (Twitter, IMDb, Amazon reviews), but they are very specific to their niche (short messages, movies, products...) and I'm looking for something more general.
Since you are looking for an unsupervised approach, you can use any set of data that matches your "real case scenario". Text mining and sentiment analysis are often tailored to the problem at hand, so it is easy to start directly with the real data. The best approach is to build a scraper that directly grabs the forum posts you want to analyze. You can build the scraper easily enough with Python (BeautifulSoup/Selenium). There are plenty of good tutorials online, e.g. https://www.dataquest.io/blog/web-scraping-tutorial-python/
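To give an idea of what the extraction step looks like, here is a rough sketch. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and it works on an inline sample page rather than a live site. The <div class="post"> markup, SAMPLE_PAGE and the function names are assumptions; a real forum will use different markup.

```python
from html.parser import HTMLParser

class PostExtractor(HTMLParser):
    """Collects the text of every <div class="post"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0    # > 0 while we are inside a post div
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested div inside a post
        elif "post" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the text pieces and normalise whitespace.
                self.posts.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

SAMPLE_PAGE = """
<html><body>
  <div class="post"><b>alice</b>: Earnings look strong, buying more.</div>
  <div class="sidebar">Advertisement</div>
  <div class="post"><b>bob</b>: Guidance was weak, I am out.</div>
</body></html>
"""

def extract_posts(html):
    parser = PostExtractor()
    parser.feed(html)
    return parser.posts

posts = extract_posts(SAMPLE_PAGE)
```

For a real site you would download each page first and adapt the tag and class test to the forum's actual HTML.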

Watson speech-to-text: Narrowband producing better results than Broadband?

I'm using IBM Watson to transcribe a video library that we have. I'm currently doing initial research into its efficacy and accuracy.
The videos in question have OK to very good sound quality and based on Watson documentation I should be using the Broadband model to transcribe them.
However, I've tested both Narrowband and Broadband, and I'm finding that Narrowband is always either slightly better or, in some cases, a lot better (up to 10%).
Has anyone else done any similar testing? It's contrary to the documentation so I'm a little reluctant to just go ahead and use Narrowband for everything, but I may have to based on the results.
I'm using ffmpeg to convert the videos to audio files to send to Watson, and the audio files show 48 kHz sampling rates, which again means I should be using Broadband and getting better results with it.
Hoping someone out there has done similar research and can help.
Thanks in advance.
Do you know what the original sampling rate of the audio is? Maybe it was recorded at 8 kHz originally and then upsampled. If that were the case, the original higher frequencies (everything above 4 kHz) would be lost and the right model to use would be the Narrowband model. You can see this in a spectrogram, using for example Audacity (https://github.com/audacity/audacity).
Another explanation would be that the n-grams in your video are better predicted by the language model that the Narrowband system uses. I suggest sharing your audio file with Watson support team to get further insight (you can go to the Bluemix portal and then click on "support").
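The upsampling check can also be done numerically instead of by eye. The sketch below measures what fraction of the spectral energy lies above 4 kHz; a value close to zero in a 48 kHz file would suggest audio that was originally narrowband. It only uses synthetic tones and a naive DFT so that it runs with the standard library alone; the function names and the 4 kHz cutoff are assumptions, and for a real file you would decode the samples first and use a proper FFT library.

```python
import cmath
import math

def high_band_energy_fraction(samples, sample_rate, cutoff_hz=4000.0):
    """Fraction of spectral energy above cutoff_hz (naive DFT, short clips only)."""
    n = len(samples)
    total = 0.0
    high = 0.0
    for k in range(1, n // 2):  # skip DC, stay below the Nyquist bin
        coeff = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n > cutoff_hz:
            high += energy
    return high / total if total else 0.0

def tone(freq_hz, sample_rate, n):
    """n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

RATE = 48000
N = 480  # 10 ms of audio; bins are 100 Hz wide, so both tones sit on exact bins

# Stand-in for 8 kHz audio stored in a 48 kHz file: nothing above 4 kHz.
upsampled_like = tone(1000, RATE, N)
# Stand-in for genuinely wideband audio: extra content at 10 kHz.
wideband = [a + b for a, b in zip(tone(1000, RATE, N), tone(10000, RATE, N))]

low_fraction = high_band_energy_fraction(upsampled_like, RATE)
wide_fraction = high_band_energy_fraction(wideband, RATE)
```

With these two test signals the first fraction is essentially zero and the second is about one half, since the two tones carry equal energy.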

Is OpenCV image similarity comparison reliable for objects? Is there any cost/benefit quality alternative to open-source APIs?

I'm trying to choose an API to match object images taken with a cell phone against a list of images in a file system. The point is, I'm afraid that I won't get reliable results and that it won't be worth losing time on this feature.
I would really appreciate some advice regarding this topic.

Smart video thumbnail generator algorithm

Hello, I'm a Java developer and part of a video-on-demand website team.
I'm currently doing research on how to implement a back-end component that we are planning to build; the component is expected to automatically generate a meaningful thumbnail representing the content of the videos like the algorithm used in YouTube to generate default thumbnails.
However, I can't seem to find any good open-source or paid implementation that can do so, and building the algorithm from scratch is very complicated and needs a lot of time that I don't think the company is willing to invest at the current stage (maybe in the future, though).
I would appreciate it if someone could point me to any implementation that can help, or even to vendors that sell an implementation or a product that serves my component's objective.
Thanks!
As explained on the Google Research blog:
https://research.googleblog.com/2015/10/improving-youtube-video-thumbnails-with.html
The key component is using a convolutional neural network to predict the score for each sampled frame.
There are many open-source CNN implementations, such as Caffe or TensorFlow. The main effort is preparing the training data.
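The surrounding selection logic is simple once you have a scoring function: sample frames, score each one, keep the best. In the sketch below, contrast_score is only a placeholder standing in for the CNN score described on the blog, and both function names, as well as the representation of frames as nested lists of grayscale values, are assumptions:

```python
from statistics import pvariance

def contrast_score(frame):
    """Placeholder quality score: variance of the grayscale pixel values.

    A trained CNN would replace this function in a real system.
    """
    return pvariance([pixel for row in frame for pixel in row])

def pick_thumbnail(frames, score=contrast_score, step=1):
    """Score every step-th frame and return (index, frame) of the best one."""
    sampled = list(enumerate(frames))[::step]
    return max(sampled, key=lambda item: score(item[1]))

# Three tiny 2x2 "frames": blank, high contrast, low contrast.
frames = [
    [[0, 0], [0, 0]],
    [[0, 255], [255, 0]],
    [[100, 110], [120, 130]],
]
best_index, best_frame = pick_thumbnail(frames)
```

Keeping the score as a parameter means the placeholder can later be swapped for a model's prediction without touching the sampling code.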

Unsupervised automatic tagging algorithms?

I want to build a web application that lets users upload documents, videos, images, music, and then give them an ability to search them. Think of it as Dropbox + Semantic Search.
When a user uploads a new file, e.g. Document1.docx, how could I automatically generate tags based on the content of the file? In other words, no user input should be needed to determine what the file is about. Suppose that Document1.docx is a research paper on data mining; then when the user searches for data mining, or research paper, or document1, that file should be returned in the search results, since data mining and research paper will most likely be auto-generated tags for that document.
1. Which algorithms would you recommend for this problem?
2. Is there a natural language library that could do this for me?
3. Which machine learning techniques should I look into to improve tagging precision?
4. How could I extend this to video and image automatic tagging?
Thanks in advance!
The most common unsupervised machine learning model for this type of task is Latent Dirichlet Allocation (LDA). This model automatically infers a collection of topics over a corpus of documents based on the words in those documents. Running LDA on your set of documents assigns each word a probability under each topic; when a user searches for a word, you could then retrieve the documents with the highest probability of being relevant to it.
There have been some extensions to images and music as well, see http://cseweb.ucsd.edu/~dhu/docs/research_exam09.pdf.
LDA has several efficient implementations in several languages:
many implementations from the original researchers
http://mallet.cs.umass.edu/, written in Java and recommended by others on SO
PLDA: a fast, parallelized C++ implementation
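To show the mechanics, here is a toy collapsed Gibbs sampler for LDA in pure Python. It is a sketch only: the function names, the hyperparameter defaults and the example documents are assumptions, and for a real corpus one of the implementations listed above is the better choice.

```python
import random

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. docs is a list of token lists.

    Returns (theta, topic_word): per-document topic proportions and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    def bump(d, k, w, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            bump(d, k, w, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, assign[d][i], w, -1)  # remove the token's current topic
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + len(vocab) * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k_new
                bump(d, k_new, w, +1)

    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word

def top_words(topic_word, k, n=3):
    """The n most frequent words of topic k, usable as candidate tags."""
    counts = topic_word[k]
    return sorted(counts, key=lambda w: (-counts[w], w))[:n]

docs = [
    ["data", "mining", "data", "cluster"],
    ["music", "guitar", "music", "chord"],
    ["mining", "data", "cluster"],
]
theta, topic_word = lda_gibbs(docs, num_topics=2, iters=50)
```

Each row of theta gives a document's mix of topics, and top_words on a document's dominant topic yields candidate tags. The result depends on the random seed and on the number of topics, which you have to choose yourself.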
These guys propose an alternative to LDA.
Automatic Tag Recommendation Algorithms for Social Recommender Systems
http://research.microsoft.com/pubs/79896/tagging.pdf
Haven't read thru the whole paper but they have two algorithms:
Supervised learning version. This isn't that bad. You can use Wikipedia to train the algorithm
"Prototype" version. Haven't had a chance to go thru this but this is what they recommend
UPDATE: I've researched this some more and I've found another approach. Basically, it's a two-stage approach that's very simple to understand and implement. While too slow for 100,000s of documents, it (probably) has good performance for 1000s of docs (so it's perfect for tagging a single user's documents). I'm going to try this approach and will report back on performance/usability.
In the meantime, here's the approach:
1. Use TextRank as per http://qr.ae/36RAP to generate a tag list for a single document, independently of other documents.
2. Use the algorithm from "Using Machine Learning to Support Continuous Ontology Development" (https://www.researchgate.net/publication/221630712_Using_Machine_Learning_to_Support_Continuous_Ontology_Development) to integrate the tag list (from step 1) into the existing tag list.
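The TextRank step can be sketched in a few lines of Python: build a graph of words that occur close to each other, then run a PageRank-style iteration over it. This is a simplified version; it filters words with a small stopword list instead of the part-of-speech filter used in the TextRank paper, and the function name, STOPWORDS set, window size and iteration count are assumptions.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was", "from"}

def textrank_keywords(text, top_n=5, window=2, damping=0.85, iters=50):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    words = [
        w for w in re.findall(r"[a-z]+", text.lower())
        if len(w) > 2 and w not in STOPWORDS
    ]
    # Link every word to the next `window` words that follow it.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    # Fixed number of update rounds on the undirected graph.
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        rank = {
            w: (1 - damping)
            + damping * sum(rank[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(rank, key=lambda w: (-rank[w], w))[:top_n]

text = (
    "Data mining finds patterns in data. Mining algorithms cluster data. "
    "Data mining research uses data."
)
keywords = textrank_keywords(text, top_n=2)
```

On this small text the two best-connected words are "data" and "mining", so they come out as the tags.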
Text documents can be tagged using this keyphrase extraction algorithm/package.
http://www.nzdl.org/Kea/
Currently it supports a limited range of document types (agricultural and medical, I guess), but you can train it according to your requirements.
I'm not sure how the image/video part would work out, unless you're doing very accurate object detection (which has its own shortcomings). How are you planning to do it?
You want Doc-Tags (https://www.Doc-Tags.com), a commercial product that automatically generates contextually accurate document tags without supervision. The built-in reporting functionality makes the product a lightweight document management system.
For developers wanting to customize their own approach, the source code is available (very cheap) and the back-end service xAIgent (https://xAIgent.com) is very inexpensive to use.
I posted a blog article today to answer your question.
http://scottge.net/2015/06/30/automatic-image-and-video-tagging/
There are basically two approaches to automatically extract keywords from images and videos.
Multiple Instance Learning (MIL)
Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), and the variants
In the above blog article, I list the latest research papers to illustrate the solutions. Some of them even include demo site and source code.
Thanks, Scott
