(Rasa NLU) Do I need to train all intents again when I add new intents?

Whenever I add new intents, do I need to retrain on all intents and data again?
Or is there any partial, continuous training instead of training from scratch?
Training on the whole data set takes too much time, and it gets longer and longer as the data grows.
I found an article that said retraining every time is the recommended approach, but it is two years old:
Retraining and updating an existing Rasa NLU model

Yes, currently you need to retrain on the full data set every time.
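For context, adding an intent means appending labelled examples to your NLU training data and then running a full training pass again. A rough sketch in the older Markdown training-data format (the intent name and sentences here are invented for illustration):

## intent:check_balance
- what is my balance
- how much money do I have
- show my account balance

After adding a block like this, you retrain the whole NLU model (for example with the rasa train nlu command in recent Rasa versions); there is no incremental update of an existing model.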

Related

How to train on a very small data set?

We are trying to understand the underlying model of Rasa - the forums there still didn't get us an answer - on two main questions:
1. We understand that the Rasa model is a transformer-based architecture. Was it pre-trained on any data set (e.g. Wikipedia)?
2. If we understand correctly, the intent classification is a fine-tuning task on top of that transformer. How come it works with such small training sets?
Appreciate any insights!
Thanks,
Lior
The transformer model is not pre-trained on any dataset. We use quite a shallow stack of transformer layers, which is not as data-hungry as the deeper stacks used in large pre-trained language models.
Having said that, there isn't an exact number of data points that will be sufficient for training your assistant, as it varies by domain and problem. Usually a good estimate is 30-40 examples per intent.
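To make that estimate concrete, the training examples for one intent in the older Markdown format look something like the sketch below (intent name and phrasings are invented for illustration); you would aim for a few dozen varied phrasings like these per intent:

## intent:ask_opening_hours
- when are you open
- what are your opening hours
- are you open on sundays
- how late are you open today
- can you tell me your opening times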

How can we improve the accuracy of a Form Recognizer model?

I am using the Microsoft Form Recognizer service. My forms are a bit complex, and I tried training a model for them. The performance I achieved is not really good. Is there any way I can improve this accuracy? Is there any way to tune this model? I have trained the model using 5 different populated forms of the same type.
Not sure if you're still interested in this. I've been using the labeling tool in v2: you can manually label areas to create tags (features). This can be done without too much effort, and I've been getting very accurate results with it.

Does Rasa Core train on actual dialog data behind the scenes?

Since Rasa Core trains on domain.yml and stories.yml, without depending on the users' words (nlu.yml), I understand that Rasa Core training has nothing to do with the NLU part. It solely trains on 'intent-action' pairs, not the actual dialog data:
* greet
- utter_greet
Is this correct? In that case, I think the training data for dialog policy training is always going to be small, because it trains on abstract intent-action pairs, not the actual data. In other words, dialog policy training is totally independent from NLU.
Is this understanding correct? I just want to confirm it.
In other words, dialog policy training is totally independent from NLU.
This is right for training. However, in production Rasa Core uses the entities extracted by Rasa NLU and, of course, the classified intents.
abstract intent-action pairs
It should only be "pairs" if you are building an FAQ chatbot. If you actually want to handle more complex conversations, then you have to write more stories. As you can see in this Rasa demo, the required training data can get quite large for more complex chatbots.
How to create those intent-action pairs?
You have to design your training stories manually. There is currently no way to do so automatically. See this blog post, which gives some recommendations on how to write better training stories for Rasa Core.
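For example, a hand-written story for a slightly longer conversation might look like the sketch below, in the same Markdown story format as the snippet above (the intent and action names are invented for illustration); each story describes one plausible path through a dialogue:

## greet and ask opening hours
* greet
- utter_greet
* ask_opening_hours
- utter_opening_hours
* thank
- utter_youre_welcome

The more distinct conversation paths you want the policy to handle, the more stories like this you need to write.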

Discovery model offline during incremental training

A question now that we are using Discovery. We were thinking we would do incremental training of Discovery while it is in production, as we gather bits of training data from the faculty (SMEs) in CogUniversity. However, it seems that while Discovery is training, it does not return a confidence score. Is there a way around that? To me the big benefit of incremental training is that we can improve the machine learning model while it's being used in production. It seems like incremental training doesn't help if the system has to be taken out of production while training. Please advise.
Training a new model doesn't take the old one offline, but deleting all of the training data for a collection will. If your incremental training process involves deleting all of the training data and uploading different data, then that could be why you're not seeing confidence scores while the new model trains.

Training a Model for Sentiment Analysis with the Google Prediction API

I am planning to use the Google Prediction API for sentiment analysis. How can I generate the training model for this? Or where can I find a standard training model available for commercial use? I have already tried the Sentiment Predictor provided in the Prediction Gallery of the Google Prediction API, but it does not seem to work properly.
From my understanding, the "model" for the Google Prediction API is actually not a model, but a suite of models for regression as well as classification. That being said, it's not clear how the Prediction API decides what kind of regression or classification model is used when you present it with training data. You may want to look at how to train a model on the Google Prediction API if you haven't already done so.
If you're not happy with the results of the Prediction API, it might be an issue with your training data. You may want to think about adding more examples to the training file to see if the model comes up with better results. I don't know how many examples you used, but generally, the more you can add, the better.
However, if you want to look at creating one yourself, NLTK is a Python library that you can use to train your own model. Another Python library you can use is scikit-learn.
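If you go that route, a small scikit-learn pipeline is usually enough for a first sentiment baseline. The snippet below is a minimal sketch, assuming you already have labelled example sentences (the tiny training set here is made up purely for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; replace with your own labelled data.
texts = [
    "I love this product",
    "absolutely fantastic experience",
    "this is terrible",
    "I am very disappointed",
]
labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words features plus a linear classifier make a reasonable baseline.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Should lean positive, since the words overlap with the positive examples.
print(model.predict(["I love this experience"]))

The same idea scales to a real data set: the more labelled sentences per class you can collect, the better the classifier will generalize.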
Hope this helps.
The Google Prediction API is great, but to train a model you will need a lot of data.
You can use the sentiment model that is already trained.

Resources