Correlation algorithms to find relationships between datasets - algorithm

My problem statement is "How to find correlation between fields".
Let me explain it by an example:
Suppose I have a dataset that contains room temperature and CPU speed sampled at regular intervals of time, i.e. two fields: one is room temperature and the other is CPU speed. As we know, CPU speed increases with a rise in external temperature, so there is a relationship between room temperature and CPU speed, and as a result the computer's performance decreases.
I want an algorithm that can tell me the relationship between two fields, i.e. whether they are directly or inversely proportional to each other, and what happens to a third parameter (the computer's performance) when the other two parameters (room temperature and CPU speed) change. Please tell me if you know of an algorithm that solves this problem.

I'm not sure I fully understand your question but a simple linear regression would work.
See the Wikipedia article on linear regression.
For example, in R you would use the lm function:
lm(formula = cpuSpeed ~ roomTemp)

Linear regression is a common approach, but you should also take the time to plot the two variables against each other. This visualization will help you discover nonlinear relationships as well.
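To make that concrete, here is a minimal sketch of the same idea in Python (NumPy/SciPy), assuming the readings are already in two arrays; the variable names and sample values are made up for illustration. The sign of the Pearson correlation and of the regression slope answers the directly-vs-inversely-proportional question; the effect on a third variable such as performance extends naturally to a multiple regression with two predictors.

    import numpy as np
    from scipy import stats

    # Hypothetical readings taken at regular intervals (made-up numbers).
    room_temp = np.array([20.1, 21.3, 22.8, 24.0, 25.5, 27.2])
    cpu_speed = np.array([2.9, 3.0, 3.2, 3.3, 3.5, 3.6])

    # Pearson correlation: a positive r suggests a direct relationship,
    # a negative r an inverse one; |r| near 1 means close to linear.
    r, p_value = stats.pearsonr(room_temp, cpu_speed)
    print(f"correlation r = {r:.3f}, p-value = {p_value:.3f}")

    # Simple linear regression, the equivalent of lm(cpuSpeed ~ roomTemp) in R.
    fit = stats.linregress(room_temp, cpu_speed)
    print(f"cpu_speed = {fit.slope:.3f} * room_temp + {fit.intercept:.3f}")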

Related

What type of algorithm should I use for forecasting with only very little historic data?

The problem is as follows:
I want to use a forecasting algorithm to predict the heat demand of an otherwise unspecified household during the next 24 hours, with a time resolution of only a few minutes for the next three or four hours and a lower resolution for the hours after that.
The algorithm should be adaptive and learn over time. I do not have much historical data, since the algorithm should be usable in different settings from the beginning. To start with, I only have very basic input such as the assumed yearly heat demand, the current outside temperature, and the time. So it will be quite general and imprecise at the beginning, but it should learn from its errors over time.
The algorithm should be implemented in MATLAB if possible.
Does anyone know an approach or an algorithm designed to predict sensible values after a short time by learning and adapting to the current incoming data?
Well, this question is quite broad, as essentially any algorithm for forecasting or data assimilation could do this task in principle.
The classic approach I would look into first is Kalman filtering, which is quite general, at least once its generalizations to ensemble filters etc. are taken into account (it is also easy to implement in MATLAB).
https://en.wikipedia.org/wiki/Kalman_filter
However, more important than the actual inference algorithm is typically the design of the model you fit to your data. For your scenario you could start with a simple prediction from past values and add daily rhythms, the influence of outside temperature, etc. The more (correct) information you put into your model a priori, the better it should be at prediction.
For the full mathematical analysis of this type of problem I can recommend this book: https://doi.org/10.1017/CBO9781107706804
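To show the structure of the approach (the question asks for MATLAB, but the same few lines translate directly), here is a hedged Python sketch of a scalar Kalman filter with a deliberately trivial random-walk state model; all noise values and data are made up. A real model for this problem would enrich the predict step with daily rhythms, outside temperature, and so on.

    import numpy as np

    def kalman_1d(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
        """Track a slowly varying level; q = process noise, r = measurement noise."""
        x, p = x0, p0
        estimates = []
        for z in measurements:
            # Predict: random-walk model, so the estimate is unchanged
            # but its uncertainty grows.
            p = p + q
            # Update: blend prediction and measurement using the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1 - k) * p
            estimates.append(x)
        return np.array(estimates)

    # Example: noisy samples around a slowly rising demand.
    true_demand = np.linspace(1.0, 2.0, 50)
    noisy = true_demand + np.random.normal(0.0, 0.3, size=50)
    print(kalman_1d(noisy)[-5:])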
In order to turn this into a calibration problem, we need:
a model that predicts the heat demand depending on inputs and parameters,
observations of the heat demand.
Calibrating this model means tuning the parameters so that the model best predicts the heat demand.
If you go for Python, I suggest using OpenTURNS, which provides several data assimilation methods, e.g. Kalman filtering (also called BLUE):
https://openturns.github.io/openturns/latest/user_manual/calibration.html
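For a feel of what the calibration step does, here is a hedged sketch using SciPy's curve_fit rather than OpenTURNS; the model form, parameter names, and data are invented purely for illustration.

    import numpy as np
    from scipy.optimize import curve_fit

    def demand_model(outside_temp, base, temp_coeff):
        # Toy model: base load plus a sensitivity to the heating-degree term.
        return base + temp_coeff * np.clip(18.0 - outside_temp, 0.0, None)

    # Made-up observations: (outside temperature, measured heat demand).
    temps = np.array([-5.0, 0.0, 5.0, 10.0, 15.0, 20.0])
    observed = np.array([9.8, 8.1, 6.0, 3.9, 1.7, 0.4])

    # Calibration: tune (base, temp_coeff) so the model best matches the observations.
    params, _ = curve_fit(demand_model, temps, observed, p0=[0.5, 0.4])
    print("base load:", params[0], "temperature coefficient:", params[1])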

Why should we compute the image mean when we train CNNs?

When I use caffe for image classification, it often computes the image mean. Why is that the case?
Someone said that it can improve the accuracy, but I don't understand why this should be the case.
Refer to the image whitening technique in deep learning. It has been shown to improve accuracy, although it is not widely used.
To understand why it helps, refer to the idea of normalizing data before applying a machine learning method, which keeps the data in the same range. There is also another method now commonly used in CNNs: batch normalization.
Neural networks (including CNNs) are models with thousands of parameters which we try to optimize with gradient descent. Those models are able to fit a lot of different functions by having a non-linearity φ at their nodes. Without a non-linear activation function, the network collapses to a linear function in total. This means we need the non-linearity for most interesting problems.
Common choices for φ are the logistic function, tanh or ReLU. All of them have their most interesting region around 0. This is where the gradient is either big enough to learn quickly or, in the case of ReLU, where there is any non-linearity at all. Weight initialization schemes like Glorot initialization try to make the network start at a good point for the optimization. Other techniques like Batch Normalization also keep the mean of each node's input around 0.
So you compute (and subtract) the mean of the image so that the first computing nodes get data which "behaves well". It has a mean of 0 and thus the intuition is that this helps the optimization process.
In theory, a network should be able to "subtract" the mean by itself, so if you train long enough, this should not matter too much. However, depending on the activation function, how long "long enough" is can matter.
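A minimal sketch of the mean-subtraction step itself, assuming a batch of images stored as a NumPy array of shape (N, H, W, C); the shapes and values are illustrative, not Caffe's actual pipeline.

    import numpy as np

    # Fake batch of 8-bit RGB images.
    images = np.random.randint(0, 256, size=(32, 224, 224, 3)).astype(np.float32)

    # One mean value per colour channel, computed over all images and pixels.
    channel_mean = images.mean(axis=(0, 1, 2))

    # Subtracting it centres the inputs around 0, the region where the logistic,
    # tanh and ReLU non-linearities behave most usefully; dividing by the
    # standard deviation additionally puts them on a comparable scale.
    centered = images - channel_mean
    normalized = centered / (images.std(axis=(0, 1, 2)) + 1e-8)
    print(channel_mean, normalized.mean(axis=(0, 1, 2)))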

Algorithm complexity: How to see "power consumed" as a parameter?

Space and time are considered the barometers for analyzing the complexity of an algorithm. But these days, with the presence of a GPU on mobile devices, there are numerous possible applications which can use that high performance to run complex algorithms on a mobile device. For example, iOS's Metal framework can be used for GPGPU operations. But needless to say, it consumes a lot of power. So my question is: if I am developing or implementing, say, a graph search algorithm on a mobile device, shouldn't I also consider the "power" complexity of my algorithm along with space and time? Now, I know the argument could be that power is not something the algorithm consumes directly itself, and I completely agree with that. So maybe my grammar here is incorrect in saying that power is another dimension of measuring an algorithm's efficiency. But shouldn't power be seen as a performance measure of an algorithm?
No.
Complexity explains how the algorithm scales in time / memory. Power will be a function of time and memory.
Say you have algorithm A, which is O(N^2), and algorithm B, which is O(N^3), and they both solve the same problem. For n = 1,000, B uses 1 unit of power while A uses 20. Scale up to n = 10,000 and B needs 1,000 units while A needs 2,000, so B still looks cheaper. But at n = 100,000, B needs 1,000,000 units while A needs only 200,000, and the gap keeps widening from there.
This assumes that the energy consumption per operation stays constant during the execution of the algorithm.
By the way, the same thing happens with time: for short arrays, nothing beats linear search.
For a specific case (rendering UI on fixed resolution) it makes sense to measure power usage and optimize it. But what works for the resolution today will not necessarily be the right thing tomorrow.
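The arithmetic above is easy to reproduce. Here is a toy Python sketch with power modelled as c * n^k, where the constants are chosen to match the quoted figures (20 and 1 units at n = 1,000); they are illustrative, not measured.

    c_A = 20 / 1000**2   # algorithm A: power ~ c_A * n^2
    c_B = 1 / 1000**3    # algorithm B: power ~ c_B * n^3

    for n in (1_000, 10_000, 100_000):
        print(n, "A:", c_A * n**2, "B:", c_B * n**3)

    # B becomes more expensive than A where c_A * n**2 == c_B * n**3,
    # i.e. at n = c_A / c_B = 20,000, regardless of the unit of "power".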
For this to be possible, you need a model of energy consumption that you can relate to the atomic operations in your algorithms.
Like "a multiply consumes one unit of energy" and "a memory slot uses two units of energy per unit of time". Maybe the relation Energy = Time x Space could make sense.
Anyway, such a "naïve" model may suffer from the same problem as the standard model of time complexity: it bears little resemblance to the behavior of modern architectures and can be off by orders of magnitude.
Using more accurate models, on the other hand, would be analytically intractable.
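As a toy illustration of such a naïve model (the unit costs below are invented, exactly the kind of guess the previous paragraph warns about):

    def naive_energy_dot_product(n):
        # Toy costs: one energy unit per multiply, two units per memory slot
        # per unit of time, as in the hypothetical model above.
        multiplies = n                # one multiply per element pair
        time_steps = n                # assume one multiply per time step
        memory_slots = 2 * n + 1      # two input vectors plus an accumulator
        return multiplies * 1 + memory_slots * 2 * time_steps

    for n in (10, 100, 1000):
        print(n, naive_energy_dot_product(n))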

Machine learning: optimal parameter values in reasonable time

Sorry if this is a duplicate.
I have a two-class prediction model; it has n configurable (numeric) parameters. The model can work pretty well if you tune those parameters properly, but good values for those parameters are hard to find. I used grid search for that (providing, say, m values for each parameter). This yields m^n training runs, which is very time-consuming even when run in parallel on a machine with 24 cores.
I also tried fixing all parameters but one and varying only that one parameter (which yields m × n runs), but it's not obvious to me what to do with the results I got. This is a sample plot of precision (triangles) and recall (dots) for negative (red) and positive (blue) samples:
Simply taking the "winner" values for each parameter obtained this way and combining them doesn't lead to the best (or even good) prediction results. I thought about building a regression on the parameter sets with precision/recall as the dependent variable, but I don't think a regression with more than 5 independent variables would be much faster than the grid-search scenario.
What would you propose to find good parameter values, but with reasonable estimation time? Sorry if this has some obvious (or well-documented) answer.
I would use a randomized grid search (pick random values for each of your parameters in a given range that you deem reasonable and evaluate each such randomly chosen configuration), which you can run for as long as you can afford to. This paper runs some experiments that show this is at least as good as a grid search:
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time.
For what it's worth, I have used scikit-learn's random grid search for a problem that required optimizing about 10 hyper-parameters for a text classification task, with very good results in only around 1000 iterations.
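A minimal sketch with scikit-learn's RandomizedSearchCV; the estimator, parameter ranges and iteration budget below are placeholders to be swapped for your own two-class model and its n parameters.

    from scipy.stats import randint
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    # Stand-in data and estimator; replace with your own model and dataset.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    param_distributions = {
        "n_estimators": randint(50, 500),
        "max_depth": randint(2, 20),
        "min_samples_leaf": randint(1, 10),
    }

    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions,
        n_iter=50,              # budget: run for as long as you can afford
        scoring="f1",           # or whatever precision/recall trade-off you care about
        n_jobs=-1,              # use all 24 cores
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)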
I'd suggest the Simplex Algorithm (Nelder-Mead) with Simulated Annealing (a minimal sketch follows the list below):
Very simple to use. Simply give it n + 1 points, and let it run up to some configurable value (either number of iterations, or convergence).
Implemented in every possible language.
Doesn't require derivatives.
More resilient to local optima than the method you're currently using.
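Here is a hedged sketch of the downhill-simplex (Nelder-Mead) part using SciPy; evaluate_model is a stand-in for training your two-class model with a given parameter vector and returning a score to minimise (e.g. 1 - F1). SciPy's dual_annealing is a ready-made simulated-annealing alternative if you want the extra resilience to local optima.

    import numpy as np
    from scipy.optimize import minimize

    def evaluate_model(params):
        # Placeholder objective: replace with cross-validated (1 - F1) of your model.
        return np.sum((params - np.array([0.3, 2.0, 10.0])) ** 2)

    x0 = np.array([1.0, 1.0, 1.0])   # one starting point; Nelder-Mead builds
                                     # the n + 1 point simplex around it internally
    result = minimize(evaluate_model, x0, method="Nelder-Mead",
                      options={"maxiter": 200, "xatol": 1e-3, "fatol": 1e-3})
    print(result.x, result.fun)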

Simple trend analysis algorithm

OK, so you have some historic data in the form of [say] an array of integers. This, for example, could represent free-space on a server HDD over a two-year period, with each array element representing a daily sample.
The data (free space in this example) has a downward trend, but also has periodic positive spikes where files have been removed or compressed, etc.
How would you go about identifying the overall trend for the two-year period, i.e.: iron out the peaks and troughs in the data?
Now, I did A-level statistics and then a stats module in my degree, but I've slept over 7,000 times since then, and well, it's leaked out of my brain.
I'm not after a bit of code as such, more of a description of how you'd approach this problem...
Thanks in advance!
You'll get many different answers, and the one you choose really depends on more specific requirements you may have. Examples:
Low-pass filter, or any other spectral analysis technique, and use the low frequencies to determine trend.
Linear regression (time/value) to find "r" (the correlation between time and the value).
Moving average of the last "n" samples. If "n" is large enough, this is my favorite, as it is often sufficient and very easy to code (see the sketch after this list). It's a sort of approximation to #1 above.
I'm sure there'll be others.
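Here is the promised sketch of the moving-average option, with the linear-regression slope from option 2 thrown in for comparison; it assumes the history is a NumPy array of daily samples, and the data below is made up.

    import numpy as np

    def moving_average(samples, window=30):
        # Trailing mean over `window` days; mode="valid" drops the warm-up edge.
        kernel = np.ones(window) / window
        return np.convolve(samples, kernel, mode="valid")

    # Made-up two years of daily free-space readings: downward trend plus
    # occasional positive spikes and some noise.
    days = np.arange(730)
    free_space = 500 - 0.4 * days + 30 * (np.random.rand(730) < 0.02) + np.random.randn(730)

    trend = moving_average(free_space, window=30)       # smoothed series
    slope, intercept = np.polyfit(days, free_space, 1)  # overall linear trend
    print(trend[-3:], "overall trend: %.2f units/day" % slope)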
If I were doing this to produce a line through points for me to look at, I would probably use some variant of LOESS, described at http://en.wikipedia.org/wiki/Local_regression and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html. Basically, you find the smoothed value at any particular point by doing a weighted regression on the data points near that point, with the nearest points given the most weight.
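If you'd rather not use R, a hedged Python equivalent of the same idea is statsmodels' LOWESS implementation; the data here is the same made-up free-space series as in the sketch above.

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    days = np.arange(730)
    free_space = 500 - 0.4 * days + np.random.randn(730)

    # frac = fraction of the data used for each local weighted regression;
    # a larger value gives a smoother trend line.
    smoothed = lowess(free_space, days, frac=0.1)
    print(smoothed[:3])   # columns: day, locally smoothed free space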
