Massive memory usage when defining regression model - pymc

I'm having problems with memory when running a Poisson regression model. With the data loaded in and ready for the model, Python is using about 650 MB of memory. As soon as I create the model,
import theano.tensor as t

with pm.Model() as poisson_model:
    # priors for coefficients
    coeffs = pm.Uniform('coeffs', -10, 10, shape=(1, predictors.shape[1]))
    r = t.exp(pm.sum(coeffs * predictors.values, 1))
    obs = pm.Poisson('obs', r, observed=targets)
the memory usage shoots up to 3 GB. There are only 350 data points of 8-bit integers, so I have no idea what is using this amount of memory.
After playing around a bit, I've found that adding anything to the model puts it up to 3 GB in memory, even something as simple as
with pm.Model() as poisson_model:
    test = pm.Uniform('test', -1, 1)
Any suggestions as to what is happening or how I can look deeper? I'm using a new iMac, Python 2.7, and the latest version of PyMC3. Thanks.

I've tried replicating this on my system (Macbook Air, Py 2.7) but get ~80MB of memory usage. I would try a couple of things:
Clear the theano cache:
theano-cache clear
Try updating Theano
Re-install PyMC from the master branch
These are all guesses, as I cannot replicate the issue, so I'm hoping one of these will do the trick.
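If you want to look deeper yourself, one option is to watch the process's resident memory around model construction. A minimal sketch, assuming psutil (a third-party package) is installed and that pm is pymc3:

import os
import psutil  # third-party package, assumed installed
import pymc3 as pm

proc = psutil.Process(os.getpid())
print("RSS before model: %.1f MB" % (proc.memory_info().rss / 1e6))

with pm.Model() as test_model:
    test = pm.Uniform('test', -1, 1)

print("RSS after model:  %.1f MB" % (proc.memory_info().rss / 1e6))

If the jump to ~3 GB happens on that single Uniform, the problem is in model creation (Theano compilation/caching) rather than in your data.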

Related

Julia drawing from standard normal distribution

I need to draw 53000000 observations from a standard normal distribution. My current code takes a long time to run in Julia (in fact, it's been running for the past twenty minutes) and I'm wondering if there's anything I can do to speed it up. Here's what I tried:
using Distributions
d = Normal()
shock = rand(d, 1, 53000000)
The code works instantaneously when I execute it in the REPL (I am working in Juno/Atom), but lags at this point (drawing from the standard normal) when I step through using the debugger. So I think the debugger may be the real culprit here.
It may be that the roughly half a gigabyte of memory allocated for the variable shock is causing swapping once the debugger is also loaded.
Try running this in the debugger to see:
using Distributions, Base.Sys
println("Free memory is $(Int(Sys.free_memory()))")
d = Normal()
shock = rand(d, 1, 53000000)
println("shock uses $(sizeof(shock)) bytes.")
println("Free memory is $(Int(Sys.free_memory()))")
Are you close to running out of memory?

Unable to allocate GPU memory, when there is enough of cached memory

I am training a VGG16 model from scratch on an AWS EC2 Deep Learning AMI machine (Ubuntu 18.04.3 LTS, GNU/Linux 4.15.0-1054-aws x86_64) with Python 3 (CUDA 10.1 and Intel MKL, PyTorch 1.3.1) and I am getting the error below while updating model parameters.
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 11.17 GiB total capacity; 10.76 GiB already allocated; 4.81 MiB free; 119.92 MiB cached)
Code for updating parameters:
def _update_fisher_params(self, current_ds, batch_size, num_batch):
    dl = DataLoader(current_ds, batch_size, shuffle=True)
    log_liklihoods = []
    for i, (input, target) in enumerate(dl):
        if i > num_batch:
            break
        output = F.log_softmax(self.model(input.cuda().float()), dim=1)
        log_liklihoods.append(output[:, target])
    log_likelihood = torch.cat(log_liklihoods).mean()
    grad_log_liklihood = autograd.grad(log_likelihood, self.model.parameters())
    _buff_param_names = [param[0].replace('.', '__') for param in self.model.named_parameters()]
    for _buff_param_name, param in zip(_buff_param_names, grad_log_liklihood):
        self.model.register_buffer(_buff_param_name + '_estimated_fisher', param.data.clone() ** 2)
After debugging: the log_liklihoods.append(output[:, target]) line throws the error after 157 iterations.
I have the required memory, but it does not get allocated. I don't understand why updating the gradients is causing the memory problem, as gradients should be de-referenced and released automatically on each iteration. Any idea?
I have tried the following solutions, but no luck.
Lowering batch size
Freeing cache with torch.cuda.empty_cache()
Reducing the number of filters to reduce the memory footprint
Machine Specs:
Finally I solved the memory problem! I realized that in each iteration I put the input data into a new tensor, and PyTorch generates a new computation graph.
That causes the used RAM to grow forever. Then I used the .detach() function, and the RAM always stays at a low level.
self.model(input.cuda().float()).detach().requires_grad_(True)
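For reference, here is the pattern in isolation (a minimal, self-contained sketch with a hypothetical model and random data, not the code from the question): appending a tensor that is still attached to the autograd graph keeps every iteration's graph alive, while detaching it first lets each graph be freed.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                      # hypothetical tiny model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(1000):
    x = torch.randn(32, 10)                   # random stand-in data
    y = torch.randint(0, 2, (32,))
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # losses.append(loss) would keep each iteration's computation graph alive;
    # detaching (or using loss.item()) lets it be released right away.
    losses.append(loss.detach())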

A fast solution to obtain the best ARIMA model in R (function `auto.arima`)

I have a data series composed of 2775 elements:
mean(series)
[1] 21.24862
length(series)
[1] 2775
max(series)
[1] 81.22
min(series)
[1] 9.192
I would like to obtain the best ARIMA model by using function auto.arima of package forecast:
library(forecast)
fit=auto.arima(Netherlands,stepwise=F,approximation = F)
But I am having a big problem: RStudio has been running for an hour and a half without results. (I run the R code for these calculations on a Windows machine with a 2.80 GHz Intel(R) Core(TM) i7 CPU and 16.0 GB of RAM.) I suspect that this is due to the length of the time series. Could parallelization be a solution? (I don't know how to apply it.)
Anyway, any suggestions to speed up this code? Thanks!
The forecast package has many of its functions built with parallel processing in mind. One of the arguments of the auto.arima() function is 'parallel'.
According to the package documentation, "If [parallel = ] TRUE and stepwise = FALSE, then the specification search is done in parallel. This can give a significant speedup on multicore machines."
If parallel = TRUE, it will automatically select how many 'cores' to use. For a laptop or desktop this is often the number of physical cores * 2; for example, I have 4 cores and each core runs 2 threads, giving 8 logical 'cores'. If you want to set the number of cores manually, also use the argument num.cores.
I'd recommend checking out the e-book Hyndman wrote about the package; it is like a time-series forecasting bible.

How do you save a large amount of data using SAGE?

I'm trying to save a 'big' rational matrix in SAGE, but I'm running into problems. After computing my matrix A, which has size 5 x 10,000, where each entry is a rational number in fraction form whose numerator and denominator together run to more than 10 pages of digits, I run the following command:
save(A, DATA + 'A').
This gives me the following error message:
Traceback (most recent call last):
...
RuntimeError: Segmentation fault.
I tried the same save command with a 'smaller' matrix and that worked fine. I should also note that I'm using a laptop with a 64-bit operating system, x64-based processor, Windows 8, i7 CPU @ 2.40 GHz and 8 GB RAM. I'm running SAGE on a virtual machine to which I allocated 5237 MB. Let me know if you need further information. My questions are:
Why can't I save my matrix? Why do I get the above error message? What does it mean?
How can I save my matrix A using this command? Is there any other way I can save it?
I have asked these same questions in another forum which specifically deals with SAGE, but I'm not getting an answer there. I have also spent a lot of time searching online about this question, but haven't seen anyone with this same problem.
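One alternative worth trying, as a hypothetical workaround sketch (not something confirmed to fix the segfault): write the entries of A out as plain strings and rebuild the matrix later, which avoids pickling the whole matrix object in a single save() call.

# Hypothetical workaround: dump each entry of the rational matrix A as a string.
with open('A_entries.txt', 'w') as f:
    f.write('%d %d\n' % (A.nrows(), A.ncols()))
    for i in range(A.nrows()):
        for j in range(A.ncols()):
            f.write(str(A[i, j]) + '\n')

# To reload later:
# with open('A_entries.txt') as f:
#     nrows, ncols = [int(x) for x in f.readline().split()]
#     entries = [QQ(f.readline().strip()) for _ in range(nrows * ncols)]
#     A = matrix(QQ, nrows, ncols, entries)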

Performance penalty of persistent variables in MATLAB

Recently I profiled some MATLAB code and I was shocked to see the following in a heavily used function:
5.76 198694 58 persistent CONSTANTS;
3.44 198694 59 if isempty(CONSTANTS) % initialize CONSTANTS
In other words, MATLAB spent about 9 seconds, over 198694 function calls, declaring the persistent CONSTANTS and checking if it has been initialized. That represents 13% of the total time spent in that function.
Do persistent variables really carry that much of a performance penalty in MATLAB? Or are we doing something terribly wrong here?
UPDATE
@Andrew I tried your sample script and I am very, very perplexed by the output:
time calls line
6 function has_persistent
6.48 200000 7 persistent CONSTANTS
1.91 200000 8 if isempty(CONSTANTS)
9 CONSTANTS = 42;
10 end
I tried the bench() command and it showed my machine in the middle range of the sample machines. I'm running 64-bit Ubuntu on an Intel(R) Core(TM) i7 CPU with 4 GB RAM.
That's the standard way of using persistent variables in Matlab. You're doing what you're supposed to. There will be noticeable overhead for it, but your timings do seem kind of surprisingly high.
Here's a similar test I ran in 32-bit Matlab R2009b on a 3.0 GHz Intel Core 2 QX9650 machine under Windows XP x64. Similar results on other machines and versions. About 5x faster than your timings.
Test:
function call_has_persistent
for i = 1:200000
    has_persistent();
end

function has_persistent
persistent CONSTANTS
if isempty(CONSTANTS)
    CONSTANTS = 42;
end
Results:
0.89 200000 7 persistent CONSTANTS
0.25 200000 8 if isempty(CONSTANTS)
What Matlab version, OS, and CPU are you running on? What does CONSTANTS get initialized with? Does Matlab's bench() output seem reasonable for your machine?
Your timings do seem high. There may be a bug or config issue there to fix. But if you really want to get Matlab code fast, the standard advice is to "vectorize" it: restructure the code so that it makes fewer function calls on larger input arrays, and makes use of Matlab's built in vectorized functions instead of loops or control structures, to avoid having 200,000 calls to the function in the first place. If possible. Matlab has relatively high overhead per function or method call (see Is MATLAB OOP slow or am I doing something wrong? for some numbers), so you can often get more mileage by refactoring to eliminate function calls instead of making the individual function calls faster.
It may be worth benchmarking some other basic Matlab operations on your machine, to see if it's just "persistent" that seems slow. Also try profiling just this little call_has_persistent test script in isolation to see if the context of your function makes a difference.
