Training Model for Sentiment Analysis with Google Prediction API - sentiment-analysis

I am planning to use the Google Prediction API for sentiment analysis. How can I generate the training model for this? Or is there a standard training model available for commercial use? I have already tried the Sentiment Predictor provided in the Prediction Gallery of the Google Prediction API, but it does not seem to work properly.

From my understanding, the "model" for the Google Prediction API is actually not a model, but a suite of models for regression as well as classification. That being said, it's not clear how the Prediction API decides what kind of regression or classification model is used when you present it with training data. You may want to look at how to train a model on the Google Prediction API if you haven't already done so.
If you're not happy with the results of the Prediction API, it might be an issue with your training data. You may want to think about adding more examples to the training file to see if the model comes up with better results. I don't know how many examples you used, but generally, the more you can add, the better.
However, if you want to look at creating one yourself, NLTK is a Python library that you can use to train your own model. Another Python library you can use is scikit-learn.
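As a rough illustration of the scikit-learn route, here is a minimal sketch of a bag-of-words sentiment classifier. The four training sentences and their labels are made up for the example, and the choice of CountVectorizer with MultinomialNB is only one reasonable starting point, not a recommendation from the Prediction API documentation. A real model would need far more examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; replace with your own labelled examples.
train_texts = [
    "I love this product",
    "great service and friendly staff",
    "terrible experience",
    "I hate the delays",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Turn each text into word counts, then fit a Naive Bayes classifier on them.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)


def predict_sentiment(text):
    """Return the predicted label for a single piece of text."""
    return model.predict([text])[0]
```

Calling predict_sentiment on new text returns one of the labels seen during training. Adding more training rows is the first thing to try if the predictions look poor.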
Hope this helps.

The Google Prediction API is great, but to train a model you will need a lot of data.
Alternatively, you can use the sentiment model that is already trained.

Related

How to train on very small data set?

We are trying to understand the underlying model of Rasa. The forums there still haven't given us an answer on two main questions:
We understand that the Rasa model is a transformer-based architecture. Was it pre-trained on any dataset (e.g. Wikipedia)?
If we understand correctly, intent classification is a fine-tuning task on top of that transformer. How come it works with such small training sets?
Appreciate any insights!
Thanks,
Lior
The transformer model is not pre-trained on any dataset. We use quite a shallow stack of transformers, which is not as data hungry as the deeper stacks of transformers used in large pre-trained language models.
Having said that, there isn't an exact number of data points that will be sufficient for training your assistant as it varies by the domain and your problem. Usually a good estimate is 30-40 examples per intent.

How are the visually similar images in Google Vision API retrieved?

I have retrieved "Visually Similar Images" using Google Vision API. I would like to know how given a photo (that could pertain to a blog or article), Google Vision API finds a list of visually similar images? I cannot seem to find a white paper describing this.
Additionally, I would like to know if it makes sense to consider these visually similar images if the labels predicted by Google Vision API have a score lower than 70% confidence?
According to the documentation, Google Cloud’s Vision API offers powerful pre-trained machine learning models through REST and RPC APIs. One of these, Web Detection, processes and analyzes the images it receives in order to identify other images with characteristics similar to the original. However, since it is a pre-trained Google model, there is no public documentation of its development.
Regarding your question about considering a confidence score lower than 70%: it depends entirely on your use case. You have to evaluate the acceptance limits needed to satisfy your requirements.
Please note that the object returned in the "visuallySimilarImages" field of the JSON response is a WebImage object, and its score field is deprecated. You may be referring to the score within the WebEntity object, which is an overall relevancy score for the entity. That score is not normalized and not comparable across different image queries.
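To make the distinction between the two objects concrete, here is a small sketch that reads the web detection part of a response. The SAMPLE_RESPONSE dictionary is hand-written for illustration (the entity IDs, descriptions, scores, and URLs are invented); only the field names "webDetection", "webEntities", and "visuallySimilarImages" are taken from the JSON response described above. The helper name summarize_web_detection is hypothetical.

```python
# Hand-written stand-in for a Vision API JSON response; all values are invented.
SAMPLE_RESPONSE = {
    "responses": [{
        "webDetection": {
            "webEntities": [
                {"entityId": "/m/hypothetical2", "score": 0.41, "description": "Puppy"},
                {"entityId": "/m/hypothetical1", "score": 1.24, "description": "Golden retriever"},
            ],
            "visuallySimilarImages": [
                {"url": "https://example.com/similar-1.jpg"},
                {"url": "https://example.com/similar-2.jpg"},
            ],
        }
    }]
}


def summarize_web_detection(response):
    """Return (entities ranked by score, similar image URLs) for one image."""
    web = response["responses"][0].get("webDetection", {})
    # WebEntity scores are only meaningful relative to each other within
    # this single response, so they are used for ranking, not thresholding.
    entities = sorted(
        ((e.get("description", ""), e.get("score", 0.0))
         for e in web.get("webEntities", [])),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # WebImage objects are read for their URL only; their score is deprecated.
    urls = [img["url"] for img in web.get("visuallySimilarImages", [])]
    return entities, urls
```

Because the entity scores are not normalized, the sketch ranks them within one response and avoids comparing them against a fixed cutoff such as 0.7.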

How to know the trained model by AutoML?

I used AutoML Vision to train a model to predict cancer based on images. It works quite well. I want to know what the model is: whether it is a CNN, and how many layers it has.
Thank you!
We don't normally release the exact details of the model because we want to be able to change it under the hood, and we continue to do so as newer, better models come out.
It is a relatively deep CNN.

Is there any sentiment forum dataset for unsupervised training available?

I recently finished a machine learning course and would like to make a forum sentiment analysis tool, to apply it in stock-related forums.
The idea is to:
Capture (text mining) users with their comments, and evaluate their comment's sentiment (positive, negative, neutral).
Capture what happens (stock market) after those comments, and assign a weight to the user accordingly (bigger weight if the user's sentiment is spot-on and the market follows the same direction).
Use the comments as a tool to predict market direction.
Actually, I already do this myself (paying attention to forums) alongside my own technical analysis and the obligatory due diligence, and it has been working very well for me. I just want to automate it a little and maybe even let a program trade with some of my accounts (paper trading first, and if it performs decently, assigning some money in a real account).
This would be my first machine learning project (just as a proof-of-concept) so any comments would be very kindly appreciated.
The biggest problem I have found is that I would like to do unsupervised training, and I need a sample dataset for it.
Question: Is there any known forum-sentiment dataset available to be used for unsupervised training?
I've found several sentiment datasets (Twitter, IMDb, Amazon reviews), but they are very specific to their niche (short messages, movies, products...), and I'm looking for something more general.
Since you are looking for an unsupervised approach, you can use any set of data that matches your "real case scenario". Text mining and sentiment analysis are often tailored to the problem at hand, so it is easy to start directly with the real data. The best approach is to build a scraper that directly grabs the forum posts you want to analyze. You can build the scraper easily enough with Python (beautifulsoup/selenium). There are plenty of good tutorials online, e.g.: https://www.dataquest.io/blog/web-scraping-tutorial-python/
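As a rough sketch of the extraction step, the example below pulls author and post text out of a page. To stay dependency-free it uses Python's built-in html.parser instead of beautifulsoup or selenium, and it parses an inline HTML string instead of fetching a live page. The markup (a span with class "author" and a p with class "post-body") is hypothetical; every real forum has its own structure, so the tag and class checks would need adapting.

```python
from html.parser import HTMLParser

# Hypothetical forum markup, used here in place of a downloaded page.
SAMPLE_HTML = """
<div class="post"><span class="author">alice</span>
  <p class="post-body">Earnings look strong, I am buying more.</p></div>
<div class="post"><span class="author">bob</span>
  <p class="post-body">Guidance was weak, selling tomorrow.</p></div>
"""


class PostExtractor(HTMLParser):
    """Collect (author, text) pairs from the hypothetical layout above."""

    def __init__(self):
        super().__init__()
        self.posts = []      # finished (author, text) pairs
        self._field = None   # "author" or "text" while inside such a tag
        self._current = {}   # fields gathered for the post being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "span" and css_class == "author":
            self._field = "author"
        elif tag == "p" and css_class == "post-body":
            self._field = "text"

    def handle_data(self, data):
        # Only keep text that appears inside an author or post-body tag.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if self._field == "author" and tag == "span":
            self._field = None
        elif self._field == "text" and tag == "p":
            # The post body closes the record, so store it and start afresh.
            self._field = None
            self.posts.append((self._current.get("author", "").strip(),
                               self._current["text"].strip()))
            self._current = {}


def extract_posts(html):
    """Return a list of (author, text) tuples found in the HTML string."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts
```

The resulting (author, text) pairs are the raw material for the per-user sentiment and weighting steps described in the question.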

Attribute selection in h2o

I am a complete beginner with H2O, and I want to know whether the H2O framework has any attribute selection capabilities that can be applied to H2OFrames?
No, there are currently no feature selection functions in H2O. My advice would be to use Lasso regression (in H2O this means using GLM with alpha = 1.0) to do the feature selection, or simply to let whatever machine learning algorithm you are planning to use (e.g. GBM) use all the features. Such algorithms tend to ignore the bad features, but having them in the training data could still degrade the algorithm's performance.
If you'd like, you can make a feature request by filling out a ticket on the H2O-3 JIRA. This seems like a nice feature to have.
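The Lasso suggestion could be sketched roughly as follows. The function names lasso_select and nonzero_features are hypothetical helpers, not part of H2O. The family default and the use of lambda_search are assumptions that would need adjusting to the actual response column. The sketch also assumes h2o.init() has already been called and that frame is an existing H2OFrame.

```python
def nonzero_features(coefficients, tolerance=1e-8):
    """Return the names whose Lasso coefficient was not shrunk to zero.

    `coefficients` is a dict of name -> value, the shape returned by a
    trained H2O GLM's coef() method. Categorical predictors appear there
    once per level, so names may look like "column.level".
    """
    return sorted(
        name for name, value in coefficients.items()
        if name != "Intercept" and abs(value) > tolerance
    )


def lasso_select(frame, predictors, response, family="binomial"):
    """Train a GLM with alpha = 1.0 (Lasso) and list the surviving predictors."""
    # Imported lazily so nonzero_features can be used without H2O installed.
    from h2o.estimators import H2OGeneralizedLinearEstimator

    glm = H2OGeneralizedLinearEstimator(
        family=family,        # assumption: binary response; change as needed
        alpha=1.0,            # pure L1 penalty, i.e. Lasso
        lambda_search=True,   # assumption: let H2O search the penalty strength
    )
    glm.train(x=predictors, y=response, training_frame=frame)
    return nonzero_features(glm.coef())
```

Predictors whose coefficients are driven to zero by the L1 penalty drop out of the returned list, and the remaining names can be passed to whichever algorithm is trained next.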
In my opinion, yes.
My way is to use AutoML to train on your data.
After training, you get a lot of models.
Use the h2o.get_model method or the H2O server page to inspect a model you like.
There you can get its VARIABLE IMPORTANCES frame.
Then pick your features.
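A minimal sketch of the last two steps, assuming h2o.init() has already been called and that you know the ID of a trained model (for example from the AutoML leaderboard). The helper names top_features and features_from_model are hypothetical. The sketch also assumes each variable importance row starts with the variable name followed by its relative importance.

```python
def top_features(varimp_rows, k=5):
    """Pick the k most important variable names.

    Each row is assumed to be a tuple whose first item is the variable
    name and whose second item is its relative importance.
    """
    ranked = sorted(varimp_rows, key=lambda row: row[1], reverse=True)
    return [row[0] for row in ranked[:k]]


def features_from_model(model_id, k=5):
    """Fetch a trained model by ID and return its k most important features."""
    import h2o  # imported lazily; assumes a running H2O cluster

    rows = h2o.get_model(model_id).varimp(use_pandas=False)
    if rows is None:
        # Some models (e.g. stacked ensembles) expose no variable importances.
        return []
    return top_features(rows, k)
```

How many features to keep (k) is a judgment call. Comparing model quality with and without the dropped features is a sensible check.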

Resources