Why is Rasa Core training unstable? - rasa-nlu

When I train Rasa Core, the training accuracy fluctuates a lot. On one run it can go up to 95%; then I start over, re-train once, and it drops to 75%. Is this normal? What's the reason for that?

Depending on the pipeline, the number of stories, the augmentation factor, the number of epochs and other hyperparameters, it can happen. It usually comes down to these details, and it is hard to say more without knowing them.

Related

Would communication and interconnection have any impact on a computation-bound application running on multiple nodes?

I have a computation-bound application. I have executed it on multiple nodes (4 nodes, 8 nodes) and I'm wondering whether communication between the nodes could have any effect on the run time. If so, how is that possible? As far as I have found, a computation-bound application depends only on the computing capability of the system.
Also, can I treat the number of CPUs in my system as its computing capability?
Any help would be appreciated.
Updated:
To see whether the application is memory-bound or compute-bound, I ran it on a single node using different numbers of cores. For that application (NPB-LU), the run time decreased linearly as the number of cores increased, so I concluded the application could be compute-bound (I didn't have another way to figure it out).
Then I predicted the run time of the application with a model that accounts for latency (in my case, message time) at different connection levels such as inter-socket and inter-node. The predicted times differ across those latency levels, even though the application seemed to be computation-bound.
Model parameters: n = grid size, p = number of cores, m = total Mop/s, f = Mop/s per core.
Imagine you have a horse that is drinking water, let's say 1 liter per minute.
In order to give the water to the horse you have a water well where you can take the water from. Imagine you can pump up to 1.5 liters per minute.
Having this situation your water consumption is horse-bounded.
Then it turns out that you have two horses drinking the same amount of water: 1 liter each per minute. Then your water consumption is no longer horse-bounded but well-bounded.
Your application's behavior can change depending on the environment. To determine what is happening to your application, I recommend profiling it. You have a lot of alternatives, such as gprof, perf, PAPI and many others, to better observe your application's behaviour.
You can then experimentally determine very interesting metrics such as instructions per clock cycle (IPC), which can give you a better understanding of the behaviour of your app.
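As a tiny illustration of that last metric, here is a sketch of deriving IPC from raw counter totals; the numbers, and the idea that they came from something like perf stat, are just assumptions for the example:

```python
# Sketch: instructions-per-cycle (IPC) from hardware counter totals, e.g. the
# "instructions" and "cycles" counts printed by a profiler such as perf stat.
# The numbers below are made up for illustration.
instructions = 1_250_000_000   # retired instructions during the run
cycles = 1_000_000_000         # CPU cycles during the same run

ipc = instructions / cycles
print(f"IPC = {ipc:.2f}")
# As a rough rule of thumb, a low IPC (well below 1) often points to stalls
# (e.g. waiting on memory or communication), while a higher IPC suggests the
# cores are actually kept busy with computation.
```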

Variation of the job scheduling problem

I'm doing some administration work for an aviation transport company. They build aircraft containers and such here. One of the things they want me to code is an order optimization script that the guys on the floor can use to get the most out of the given material. To give a simple overview: say we order a certain number of beams that are 10 meters per unit. We need beam chunks of 5x 6 m, 10x 3.5 m and 4x 3 m, which are obtained by cutting the 10 m beams into smaller parts. What would be the minimum number of 10 m beams we need to order?
There are some parallels with the multiprocessor job scheduling problem (one beam is a processor, each chunk a job), although that focuses on minimizing the time required to perform all jobs rather than minimizing the number of processors needed to perform all jobs within a pre-set time. The multiprocessor job scheduling problem is NP-complete, but I wonder if my variation of the problem is too. Does anybody know similar problems and methods for solving them?
This problem is exactly the cutting stock problem: http://en.wikipedia.org/wiki/Cutting_stock_problem (more generally, http://en.wikipedia.org/wiki/Bin_packing_problem). You can use any old ILP solver. I like http://lpsolve.sourceforge.net/5.5/, it's quite friendly to use.
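For illustration, here is a minimal sketch of an assignment-style ILP for the beam example above. It uses the PuLP modelling library with its bundled CBC solver, which is my choice here; the lpsolve library linked in the answer, or any other ILP solver, would express the same model.

```python
# Minimal cutting-stock sketch (assignment formulation), assuming PuLP with its
# bundled CBC solver: pip install pulp
import pulp

BEAM_LENGTH = 10.0
# Required chunks from the question: 5x 6 m, 10x 3.5 m, 4x 3 m
chunks = [6.0] * 5 + [3.5] * 10 + [3.0] * 4
max_beams = len(chunks)  # trivial upper bound: one beam per chunk

prob = pulp.LpProblem("beam_ordering", pulp.LpMinimize)

# y[j] = 1 if beam j is ordered; x[i, j] = 1 if chunk i is cut from beam j
y = [pulp.LpVariable(f"use_beam_{j}", cat="Binary") for j in range(max_beams)]
x = {(i, j): pulp.LpVariable(f"cut_{i}_from_{j}", cat="Binary")
     for i in range(len(chunks)) for j in range(max_beams)}

prob += pulp.lpSum(y)  # objective: minimise the number of beams ordered

for i in range(len(chunks)):
    # every required chunk is cut from exactly one beam
    prob += pulp.lpSum(x[i, j] for j in range(max_beams)) == 1
for j in range(max_beams):
    # the chunks assigned to a beam must fit within its length
    prob += pulp.lpSum(chunks[i] * x[i, j] for i in range(len(chunks))) <= BEAM_LENGTH * y[j]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("10 m beams needed:", int(pulp.value(prob.objective)))
```

For a handful of chunk sizes like this, a direct model is fine; for much larger instances, cutting stock is usually attacked with a pattern-based (column generation) formulation instead.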

forecasting resource utilization

I want to create a system to forecast certain resource utilization, for example CPU utilization. I have data on CPU utilization for each day. How can I predict its usage for some future period, say the next 2 days? I know that time series analysis can help, but I fail to understand how to accommodate other factors associated with CPU utilization, since time series analysis only has time on the x-axis and utilization on the y-axis.
Check this out; I think it can help you a lot, or at least help you get started. The author deals with a similar problem (forecasting hard disk space requirements):
http://lpenz.github.com/articles/df0pred-1/index.html
http://lpenz.github.com/articles/df0pred-2/index.html
http://lpenz.github.com/articles/df0pred-3/index.html
I deduce that you have multiple time series and that you want to put this extra information to work (as opposed to a univariate model based solely on CPU utilization).
For a univariate model, you can start with arima() and find a suitable order for the model using auto.arima() in the forecast package. Predictions can be made using predict() on the fitted arima object.
For a multivariate model, you can consider a vector autoregressive (VAR) model. Check the VAR() function in the vars package.
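The answer above names the R functions; if you happen to be working in Python instead, a roughly equivalent sketch with statsmodels (my assumption, not something the answer prescribes) looks like this, using synthetic daily data:

```python
# Rough Python analogue of the R workflow above (statsmodels is my choice here,
# not something the answer prescribes); the data is synthetic daily CPU usage.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=60, freq="D")
cpu = pd.Series(60 + np.cumsum(rng.normal(0, 1.5, 60)), index=idx, name="cpu")

# Univariate: fit an ARIMA model and forecast 2 days ahead. The (1, 1, 1) order
# is picked by hand; an auto.arima-style selection would try several orders.
arima_fit = ARIMA(cpu, order=(1, 1, 1)).fit()
print(arima_fit.forecast(steps=2))

# Multivariate: a VAR model lets a second series (e.g. request rate) inform the
# CPU forecast, which is one way to bring in "other factors".
requests = pd.Series(150 + 2.0 * (cpu - 60) + rng.normal(0, 5, 60),
                     index=idx, name="requests")
data = pd.concat([cpu, requests], axis=1)
var_fit = VAR(data).fit(maxlags=3, ic="aic")
print(var_fit.forecast(data.values[-var_fit.k_ar:], steps=2))
```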

how to tune mapred.reduce.parallel.copies?

After reading http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html we want to experiment with mapred.reduce.parallel.copies.
The blog mentions "looking very carefully at the logs". How would we know we've reached the sweet spot? What should we look for? How can we detect that we're over-parallelizing?
To do that, you should basically look at four things: CPU, RAM, disk and network. If your setup is crossing the limits on these metrics, you can deduce that you are pushing things too far. For example, if you set "mapred.reduce.parallel.copies" to a value much higher than the number of cores available, you'll end up with too many threads in the waiting state, since this property controls how many threads are created to fetch the map output; on top of that, the network might get overwhelmed. Or, if there is too much intermediate output to be shuffled, your job will become slow, as you will need a disk-based shuffle in that case, which is slower than a RAM-based shuffle. Choose a sensible value for "mapred.job.shuffle.input.buffer.percent" based on your RAM (it defaults to 70% of the reducer heap, which is normally good). These are the kinds of things that will tell you whether you are over-parallelizing or not. There are a lot of other things to consider as well; I would recommend going through Chapter 6 of "Hadoop: The Definitive Guide".
Some of the measures you could take to make your jobs more efficient are using a combiner to limit data transfer, enabling intermediate compression, and so on.
HTH
P.S.: This answer is not specific to just "mapred.reduce.parallel.copies"; it is about tuning your job in general. Setting only this property is not going to help you much on its own; you should consider the other important properties as well.
Reaching the "sweet spot" is really just finding the parameters that give you the best result for whichever metric you consider most important, usually overall job time. To figure out which parameters are working, I would suggest using the benchmark tools that Hadoop comes with: MRBench, TestDFSIO, and NNBench. These are found in hadoop-mapreduce-client-jobclient-*.jar.
By running this command you will see a long list of benchmark programs that you can use besides the ones I mentioned above.
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*.jar
I would suggest starting with the default parameters, running tests to get baseline benchmarks, then changing one parameter and rerunning. It is a bit time-consuming but worth it, especially if you use a script to change the parameters and run the benchmarks, as sketched below.
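A minimal sketch of such a sweep script in Python follows. The benchmark command is deliberately a placeholder: substitute your own MRBench/TestDFSIO invocation, and note that whether a given benchmark honours generic -D options depends on the tool and Hadoop version.

```python
# Sketch of a parameter-sweep harness: run the same benchmark with different
# values of one property and record the wall-clock time. <benchmark> and
# <benchmark-args> are placeholders for your own invocation.
import subprocess
import time

COMMAND_TEMPLATE = (
    "hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*.jar "
    "<benchmark> -D mapred.reduce.parallel.copies={value} <benchmark-args>"
)

results = {}
for value in (5, 10, 20, 40):                      # candidate values to try
    cmd = COMMAND_TEMPLATE.format(value=value)
    start = time.time()
    subprocess.run(cmd, shell=True, check=True)    # fails fast if the job fails
    results[value] = time.time() - start

for value, seconds in sorted(results.items()):
    print(f"mapred.reduce.parallel.copies={value}: {seconds:.1f}s")
```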

How to decide on what hardware to deploy a web application

Suppose you have a web application, no specific stack (Java/.NET/LAMP/Django/Rails, all good).
How would you decide on which hardware to deploy it? What rules of thumb exist when determining how many machines you need?
How would you translate parameters such as concurrent users, simultaneous connections, daily hits and DB read/write ratio into a decision on how much, and which, hardware you need?
Any resources on this issue would be very helpful...
Specifically - any hard numbers from real world experience and case studies would be great.
Capacity Planning is quite a detailed and extensive area. You'll need to accept an iterative model with a "Theoretical Baseline > Load Testing > Tuning & Optimizing" approach.
Theory
The first step is to decide on the business requirements: how many users are expected at peak usage? Remember, these numbers are usually inaccurate by some margin.
As an example, let's assume that all the peak traffic (worst case) will fall within 4 hours of the day. So if the website expects 100K hits per day, we don't divide that over 24 hours, but over 4 hours instead. So my site now needs to support a peak traffic of 25K hits per hour.
This breaks down to about 417 hits per minute, or roughly 7 hits per second. This is on the front end alone.
Add to this the number of internal transactions such as database operations, any file I/O per user, any batch jobs that might run within the system, reports, etc.
Tally all of these up to get the number of transactions per second, per minute, etc. that your system needs to support.
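The arithmetic above is worth keeping in a small script so it can be re-run whenever the business numbers change. A minimal sketch using the example's figures; the transactions-per-hit factor is an assumption added purely for illustration:

```python
# Back-of-the-envelope peak-throughput sketch using the example's numbers.
daily_hits = 100_000        # expected hits per day (business estimate)
peak_window_hours = 4       # assume all peak traffic falls in a 4-hour window
txns_per_hit = 3            # assumed internal transactions (DB ops, I/O) per hit

hits_per_hour = daily_hits / peak_window_hours          # 25,000
hits_per_second = hits_per_hour / 3600                  # ~6.9
txns_per_second = hits_per_second * txns_per_hit        # ~20.8

print(f"front-end peak: {hits_per_second:.1f} hits/s")
print(f"back-end peak:  {txns_per_second:.1f} transactions/s")
```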
This gets further complicated when you have requirements such as "average response time must be 3 seconds", which means you have to figure in network latency, firewalls, proxies and so on.
Finally - when it comes to choosing hardware, check out the published datasheets from each manufacturer such as Sun, HP, IBM, Windows etc. These detail the maximum transactions per second under test conditions. We usually accept 50% of those peaks under real conditions :)
But ultimately the choice of the hardware is usually a commercial decision.
Also, you need to keep a minimum of two servers at each tier (web, app, even DB) for failover clustering.
Load testing
It's recommended to have a separate reference testing environment throughout the project lifecycle and post-launch so you can come back to run dedicated performance tests on the app. Scale this to be a smaller version of production, so if Prod has 4 servers and Ref has 1, then you test for 25% of the peak transactions etc.
Tuning & Optimizing
Too often, people throw some expensive hardware together and expect it all to work beautifully. You'll need to tune the hardware and OS for various parameters such as TCP timeouts; these are published by the software vendors, and the tuning has to be done once the software is finalized. Set these tuning params on the Ref environment, test, and then decide which ones you need to carry over to Production.
Determine your expected load.
Set up a machine and run some tests against it with a load-testing tool.
How close are you? If you only reached 10% of the peak load, even with some margin for error, you know you are going to need load balancing. Design and implement a solution and test again. Make sure your solution is flexible enough to scale.
Trial and error is pretty much the way to go. It really depends on the individual app and usage patterns.
Test your app with a sample load and measure performance and load metrics. DB queries, disk hits, latency, whatever.
Then get an estimate of the expected load when deployed (go ask the domain expert; you have to consider average load AND spikes).
Multiply the two and add some just to be sure. That's a really rough idea of what you need.
Then implement it, keeping in mind you usually won't scale linearly and you probably won't get the expected load ;)
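To make "multiply the two and add some" concrete, here is a minimal sketch; every number in it (measured per-server throughput, expected peak, safety margin) is a made-up assumption:

```python
# Rough server-count estimate: measured single-server throughput vs expected peak.
# Every number here is a made-up assumption for illustration.
import math

measured_rps_per_server = 120      # requests/s one server handled in the load test
expected_peak_rps = 700            # domain expert's estimate, including spikes
safety_margin = 1.5                # "add some just to be sure", plus non-linear scaling

servers_needed = math.ceil(expected_peak_rps * safety_margin / measured_rps_per_server)
print(f"rough estimate: {servers_needed} servers")   # -> 9
```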
