TensorFlow - Text Classification using Neural Networks

Is there an example of how TensorFlow can be used for text classification with neural networks?

I've started putting together a set of examples for text classification on the DBpedia dataset (predicting the class of an object from its description) as part of the examples for Scikit Flow:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/text_classification.py
I'm going to expand this example and write a blog post once enough different models are showcased. Feel free to suggest other datasets and models you would be interested to see.

Denny Britz has some great deep learning tutorials on his blog, and an example on GitHub.
Some of his examples don't use TensorFlow, but the GitHub example uses a CNN for text classification in TensorFlow.

Related

How to train on very small data set?

We are trying to understand the underlying model of Rasa. The forums there still haven't given us an answer on two main questions:
1. We understand that the Rasa model is a transformer-based architecture. Was it pre-trained on any dataset (e.g. Wikipedia)?
2. If we understand correctly, intent classification is a fine-tuning task on top of that transformer. How come it works with such small training sets?
appreciate any insights!
thanks
Lior
The transformer model is not pre-trained on any dataset. We use quite a shallow transformer stack, which is not as data-hungry as the deeper stacks used in large pre-trained language models.
Having said that, there isn't an exact number of data points that will be sufficient for training your assistant as it varies by the domain and your problem. Usually a good estimate is 30-40 examples per intent.

Sentiment Analysis using tensorflow

I am exploring TensorFlow and would like to do sentiment analysis using the options available. I had a look at the following tutorial: http://www.tensorflow.org/tutorials/recurrent/index.html#language_modeling
I have worked with a Naive Bayes classifier, the Maximum Entropy algorithm and scikit-learn classifiers, and would like to know whether TensorFlow offers any better algorithms. Is this the right place to start, or are there other options?
Any help pointing in the right direction would be greatly appreciated.
Thanks in advance.
A commonly used approach is to use a Convolutional Neural Network (CNN) for sentiment analysis. A great explanation and tutorial is in the WildML blog post on this approach, which has accompanying TensorFlow code.
Another approach is to use an LSTM (or a related network). Example implementations are available online, and the linked blog post is a good starting point.
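To make the CNN idea a bit more concrete, here is a minimal pure-Python sketch of the feature-extraction step such a model performs: each filter slides over windows of word vectors, and only its strongest response is kept (max-over-time pooling). This is not the code from the tutorial mentioned above. The vocabulary, the random embeddings and filters, and the helper names conv1d_max_pool and sentence_features are all assumptions made for illustration. A real model would learn these weights and feed the pooled features into a softmax layer.

```python
import random

def conv1d_max_pool(embedded, filt):
    """Slide one filter over a sentence and keep its strongest response.

    embedded: list of word vectors (one list of floats per token)
    filt:     list of weight vectors, one per position in the window
    The sentence must contain at least len(filt) tokens.
    """
    window = len(filt)
    responses = []
    for start in range(len(embedded) - window + 1):
        total = 0.0
        for offset in range(window):
            word_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(filt[offset], word_vec))
        responses.append(max(0.0, total))  # ReLU non-linearity
    return max(responses)  # max-over-time pooling

def sentence_features(tokens, embeddings, filters):
    """Return one pooled feature per filter for a tokenised sentence."""
    embedded = [embeddings[t] for t in tokens]
    return [conv1d_max_pool(embedded, f) for f in filters]

# Hypothetical toy setup: random (untrained) embeddings and filters.
random.seed(0)
vocab = ["this", "movie", "was", "not", "good", "great"]
dim = 4
embeddings = {w: [random.uniform(-1, 1) for _ in range(dim)] for w in vocab}
# Two filters spanning 2 words and one spanning 3 words.
filters = [
    [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
    for width in (2, 2, 3)
]

features = sentence_features(["this", "movie", "was", "not", "good"], embeddings, filters)
print(features)  # one non-negative number per filter
```

The output vector has a fixed length (one entry per filter) regardless of sentence length, which is what lets a standard classifier layer sit on top of it.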
I would suggest you try a character-level LSTM. It has been shown to achieve state-of-the-art results in many text classification tasks, sentiment analysis being one of them.
I wrote a pretty lengthy article where I go through its implementation in TensorFlow line by line. The result is a model that is less than 100 MB in size and achieves an accuracy of over 80% on a test set of 80,000 tweets.
Another approach that has proven to be very effective is to use a recursive neural network. See the paper from the Stanford NLP Group.
For me, the easiest tutorial to follow was: https://pythonprogramming.net/data-size-example-tensorflow-deep-learning-tutorial/?completed=/train-test-tensorflow-deep-learning-tutorial/
It walks you through tf.train.AdamOptimizer().minimize(cost) and uses the Sentiment140 dataset (from Stanford, ~1 million examples of positive and negative sentiment).
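For readers wondering what an Adam optimizer step does to a cost, here is a small pure-Python sketch of the Adam update rule applied to a one-dimensional toy cost, (x - 3)^2. It is not TensorFlow code and not taken from the tutorial. The function name adam_minimize, the toy cost and the learning rate of 0.1 are assumptions chosen for the demo, while beta1, beta2 and eps use the values commonly quoted as Adam defaults.

```python
import math

def adam_minimize(grad_fn, x0, steps=1000, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a 1-D function with Adam, given a function for its gradient."""
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy cost (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

In a real network the same update is applied to every weight, with the gradients supplied by backpropagation rather than a hand-written formula.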

Obtaining a HOG feature vector for implementation in SVM in Python

I am new to scikit-learn. I have viewed the online tutorials but they all seem to leverage existing data (e.g., digits, iris, etc.). I need information on how to process images so that they can be used by scikit-learn.
Details of my study: I have a webcam set up outside my office. It captures all of the traffic on my street that passes through the field of view. I have cropped several hundred images of sedans, trucks and SUVs. The goal is to predict which of these categories a vehicle belongs to. I have applied Histogram of Oriented Gradients (HOG) to these images, which I have attached for your review so you can see the differences between the categories. This blog will not allow me to post any images, but you can see them here: https://stats.stackexchange.com/questions/149421/obtaining-a-hog-feature-vector-for-implementation-in-svm-in-python. I posted the same question at that site but got no response. The closest answer I have found is the post "Resize HOG feature for Scikit-Learn classifier".
I wish to train an SVM classifier based on these images. I understand that there are algorithms in scikit-image that prepare the HOG images for use in scikit-learn. Can someone help me understand this process? I would also be grateful for any thoughts, based on your experience, on the probability of success of this classification study. I also understand that I need to train the model using negative images (ones with no vehicles). How is this done?
I know I am asking a lot but I am surprised no one that I am aware of has done a tutorial on these early steps. It seems like a fairly elementary study.
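As a rough illustration of the step being asked about, the sketch below turns a grayscale image into a fixed-length HOG-style vector in pure Python. It is a simplified stand-in, not scikit-image's implementation: it omits block normalisation and bin interpolation, and the function name hog_features, the cell size and the toy images are assumptions. The key point is that every image, vehicle or not, becomes one row of numbers of the same length, so negative images are simply additional rows with their own label.

```python
import math

def hog_features(image, cell_size=4, n_bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient orientation.

    image: 2-D list of grayscale values. Pixels beyond the last full cell
    are ignored, so sizes that are multiples of cell_size work best.
    """
    height, width = len(image), len(image[0])
    cells_y, cells_x = height // cell_size, width // cell_size
    hist = [[[0.0] * n_bins for _ in range(cells_x)] for _ in range(cells_y)]
    bin_width = 180.0 / n_bins
    for y in range(cells_y * cell_size):
        for x in range(cells_x * cell_size):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), n_bins - 1)
            hist[y // cell_size][x // cell_size][b] += magnitude
    vector = [v for row in hist for cell in row for v in cell]
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]  # L2-normalised feature vector

# Hypothetical 8x8 images: a vertical edge and a blank "negative" patch.
edge_image = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
blank_image = [[0] * 8 for _ in range(8)]

edge_vec = hog_features(edge_image)
blank_vec = hog_features(blank_image)

# Rows and labels in the shape a classifier's fit(X, y) typically expects.
X = [edge_vec, blank_vec]
y = ["vehicle", "background"]
print(len(edge_vec), len(blank_vec))
```

For the vectors to be comparable, all images need to be resized to the same dimensions before feature extraction, so that every row of X has the same length.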

How to compare two images and check whether both contain the same object in OpenCV Python or JavaCV

I am working on a feature matching project and I am using OpenCV Python to develop the application.
According to the project requirements, my database has images of some objects (glass, ball, etc.) with their descriptions. A user can send an image to the back end of the application, and the back end is responsible for matching the sent image against the images that exist in the database and sending the image description back to the user.
I have done some research on the above scenario. Unfortunately, I still could not find an algorithm for comparing two images and deciding whether they match.
If anybody has that kind of algorithm, please send it to me. (I have to use OpenCV Python or JavaCV.)
Thank you
This is a very common problem in computer vision nowadays. The basic solution is quite simple, but there are many, many variants for more sophisticated solutions.
Simple Solution
Feature Detector and Descriptor based.
The idea here is that you get a bunch of keypoints and their descriptors (search for SIFT/SURF/ORB). You can then find matches easily with tools provided in OpenCV. You would match the keypoints in your query image against all keypoints in the training dataset. Because outliers are typical, you should add a robust matching technique, like RANSAC. All of this is part of OpenCV.
Bag-of-Words model
If you just want the image that is most similar to your query image, you can use nearest-neighbour search. Be aware that OpenCV comes with the much faster approximate nearest-neighbour (ANN) algorithm. Or you can use the BruteForceMatcher.
Advanced Solution
If you have many images (on the order of 1 million), you can look at Locality-Sensitive Hashing (see Dean et al., 100,000 Object Categories).
If you do use Bag-of-Visual-Words, then you should probably build an Inverted Index.
Have a look at Fisher Vectors for improved accuracy as compared to BOW.
Suggestion
Start by using Bag-of-Visual-Words. There are tutorials on how to train the dictionary for this model.
Training:
1. Extract local features (just pick SIFT, you can easily change this as OpenCV is very modular) from a subset of your training images. First detect features and then extract them. There are many tutorials on the web about this.
2. Train the dictionary. There is helpful documentation with a reference to a sample implementation in Python (opencv_source_code/samples/python2/find_obj.py)!
3. Compute a histogram for each training image. (This is also covered in the BOW documentation from the previous step.)
4. Put your image descriptors from the step above into a FLANN-based matcher.
Querying:
1. Compute features on your query image.
2. Use the dictionary from training to build a BOW histogram for your query image.
3. Use that feature to find the nearest neighbor(s).
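The training and querying steps above can be sketched end to end in pure Python. This is a toy illustration, not OpenCV code: the descriptors are fake 2-D points standing in for SIFT output, the dictionary is trained with a plain k-means loop, and a linear scan replaces the FLANN-based matcher. All function names and the "glass"/"ball" data are assumptions made for the example.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(vector, candidates):
    """Index of the candidate closest to vector."""
    return min(range(len(candidates)),
               key=lambda i: squared_distance(vector, candidates[i]))

def train_dictionary(descriptors, n_words, iterations=10, seed=0):
    """Cluster descriptors into n_words visual words with plain k-means."""
    rng = random.Random(seed)
    words = [list(d) for d in rng.sample(descriptors, n_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(n_words)]
        for d in descriptors:
            clusters[nearest_index(d, words)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old word if its cluster is empty
                words[i] = [sum(col) / len(members) for col in zip(*members)]
    return words

def bow_histogram(descriptors, words):
    """Normalised count of how often each visual word occurs in an image."""
    hist = [0.0] * len(words)
    for d in descriptors:
        hist[nearest_index(d, words)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def fake_descriptors(n_a, n_b, rng):
    """Stand-in for SIFT output: 2-D points near (0, 0) or (5, 5)."""
    centres = [(0.0, 0.0)] * n_a + [(5.0, 5.0)] * n_b
    return [[cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)]
            for cx, cy in centres]

rng = random.Random(42)
database = {
    "glass": fake_descriptors(8, 2, rng),
    "ball": fake_descriptors(2, 8, rng),
}

# Training: build the dictionary, then one histogram per database image.
all_descriptors = [d for descs in database.values() for d in descs]
words = train_dictionary(all_descriptors, n_words=2)
index = {name: bow_histogram(descs, words) for name, descs in database.items()}

# Querying: histogram for the query image, then nearest neighbour.
query_hist = bow_histogram(fake_descriptors(3, 7, rng), words)
best_match = min(index, key=lambda name: squared_distance(query_hist, index[name]))
print(best_match)
```

With real images the descriptors would be 128-dimensional SIFT vectors and the dictionary would have hundreds or thousands of words, at which point an approximate nearest-neighbour index becomes worthwhile.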
I think you are talking about Content-Based Image Retrieval.
There are many research papers available on the Internet. Pick one and implement whichever approach best fits your needs. Select criteria according to your application, such as texture-based, color-based or shape-based image retrieval (this is best for speed when you are working with image retrieval on the Internet).
Since you need a Python implementation, I would suggest going through Chapters 7 and 8 of the Computer Vision book. It contains working examples, with code, of what you are looking for.
One question you may find useful: Are there any API's that'll let me search by image?

Unsupervised automatic tagging algorithms?

I want to build a web application that lets users upload documents, videos, images, music, and then give them an ability to search them. Think of it as Dropbox + Semantic Search.
When a user uploads a new file, e.g. Document1.docx, how could I automatically generate tags based on the content of the file? In other words, no user input is needed to determine what the file is about. Suppose that Document1.docx is a research paper on data mining. Then when the user searches for data mining, or research paper, or document1, that file should be returned in the search results, since data mining and research paper will most likely be auto-generated tags for that document.
1. Which algorithms would you recommend for this problem?
2. Is there a natural language library that could do this for me?
3. Which machine learning techniques should I look into to improve tagging precision?
4. How could I extend this to video and image automatic tagging?
Thanks in advance!
The most common unsupervised machine learning model for this type of task is Latent Dirichlet Allocation (LDA). This model automatically infers a collection of topics over a corpus of documents based on the words in those documents. Running LDA on your set of documents gives each word a probability under each topic and each document a probability distribution over topics. When a user searches for a word, you can then retrieve the documents with the highest probability of being relevant to that word.
There have been some extensions to images and music as well, see http://cseweb.ucsd.edu/~dhu/docs/research_exam09.pdf.
LDA has efficient implementations in several languages:
many implementations from the original researchers
http://mallet.cs.umass.edu/, written in Java and recommended by others on SO
PLDA: a fast, parallelized C++ implementation
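To show roughly how LDA could produce tags, here is a compact pure-Python sketch using collapsed Gibbs sampling, one standard way to fit the model. It is not one of the implementations listed above and is far too slow for real corpora. The function names lda_gibbs and suggest_tags, the hyperparameter values and the four toy documents are assumptions for illustration. Tags are taken as the top words of a document's dominant topic, so they may include words that do not occur in that document.

```python
import random

def lda_gibbs(docs, n_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling. docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # counts per document
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # counts per topic
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove this occurrence, then resample its topic.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab

def suggest_tags(d, theta, topic_word, vocab, n_tags=3):
    """Top words of the dominant topic of document d."""
    best_topic = max(range(len(theta[d])), key=lambda k: theta[d][k])
    ranked = sorted(range(len(vocab)),
                    key=lambda w: topic_word[best_topic][w], reverse=True)
    return [vocab[w] for w in ranked[:n_tags]]

# Hypothetical toy corpus.
docs = [
    "data mining finds patterns in data".split(),
    "mining large data sets with clustering".split(),
    "the band played rock music on guitar".split(),
    "guitar music and rock concert".split(),
]
theta, topic_word, vocab = lda_gibbs(docs, n_topics=2)
tags = suggest_tags(0, theta, topic_word, vocab)
print(tags)
```

In practice you would remove stop words first and use one of the optimised libraries, since results on a corpus this small vary with the random seed.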
These authors propose an alternative to LDA:
Automatic Tag Recommendation Algorithms for Social Recommender Systems
http://research.microsoft.com/pubs/79896/tagging.pdf
I haven't read through the whole paper, but they have two algorithms:
1. Supervised learning version. This isn't that bad. You can use Wikipedia to train the algorithm.
2. "Prototype" version. I haven't had a chance to go through this, but this is what they recommend.
UPDATE: I've researched this some more and I've found another approach. Basically, it's a two-stage approach that's very simple to understand and implement. While too slow for hundreds of thousands of documents, it (probably) performs well for thousands of documents (so it's perfect for tagging a single user's documents). I'm going to try this approach and will report back on performance/usability.
In the meantime, here's the approach:
1. Use TextRank as per http://qr.ae/36RAP to generate a tag list for a single document. This tag list is independent of other documents.
2. Use the algorithm from "Using Machine Learning to Support Continuous Ontology Development" (https://www.researchgate.net/publication/221630712_Using_Machine_Learning_to_Support_Continuous_Ontology_Development) to integrate the tag list (from step 1) into the existing tag list.
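The first stage of the approach above can be sketched as follows: a simplified TextRank for single-document keyword extraction in pure Python. It builds a co-occurrence graph over the words of one document and ranks them with PageRank-style iterations. This is an illustration under assumptions, not the exact method from the linked description: it filters words with a small hand-made stop word list instead of part-of-speech tags, and the function name, window size and sample text are made up for the example. The second stage (merging into an existing tag list) is not covered.

```python
import re

# Small illustrative stop word list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "from", "on", "for", "with"}

def textrank_keywords(text, window=3, damping=0.85, iterations=50, top_n=5):
    """Rank the words of one document by TextRank-style graph centrality."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Undirected graph: words are linked if they occur within the window.
    neighbours = {w: set() for w in words}
    for i, w in enumerate(words):
        for other in words[i + 1:i + window]:
            if other != w:
                neighbours[w].add(other)
                neighbours[other].add(w)
    # PageRank-style score propagation over the graph.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbours[n]) for n in neighbours[w]
            )
            for w in neighbours
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

text = ("Data mining extracts patterns from large data sets. "
        "Data mining uses machine learning, and machine learning needs data.")
tags = textrank_keywords(text)
print(tags)
```

Because the graph is built from a single document, each document can be tagged independently, which is what makes this stage easy to run as files are uploaded.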
Text documents can be tagged using this keyphrase extraction algorithm/package:
http://www.nzdl.org/Kea/
Currently it supports a limited range of document types (agricultural and medical, I guess), but you can train it according to your requirements.
I'm not sure how the image/video part would work out, unless you're doing very accurate object detection (which has its own shortcomings). How are you planning to do it?
You want Doc-Tags (https://www.Doc-Tags.com), a commercial product that automatically generates contextually accurate document tags without supervision. The built-in reporting functionality makes the product a lightweight document management system.
For developers wanting to customize their own approach, the source code is available (very cheap) and the back-end service xAIgent (https://xAIgent.com) is very inexpensive to use.
I posted a blog article today to answer your question.
http://scottge.net/2015/06/30/automatic-image-and-video-tagging/
There are basically two approaches to automatically extracting keywords from images and videos:
1. Multiple Instance Learning (MIL)
2. Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), and their variants
In the above blog article, I list the latest research papers to illustrate the solutions. Some of them even include demo sites and source code.
Thanks, Scott

Resources