I'm trying to slim down a part of my application, so I'm using GC::Profiler to tell whether a given change actually slims anything down, but I'm getting a result I can't interpret:
Index Invoke Time(sec) Use Size(byte) Total Size(byte) Total Object GC Time(ms)
1 2.409 13943640 15840440 396011 36.48399999999973886133
.
.
102 7.074 30720080 30525000 763125 30.90900000000029734792
103 7.359 31267800 30525000 763125 34.99699999999972277465
.
.
1664 461.066 739610760 30525000 763125 32.36799999996264887159
According to the docs, this would seem to say that more than 2000% of the allocated heap is currently in use...? It would also indicate that my memory usage is growing, except that the total heap size apparently isn't...
My task is: I'm reading a large incoming HTTP stream directly to a file, then reading that file one line at a time via lazy evaluation, performing a transform on each line, and writing the result out to another file.
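For reference, this is roughly the shape of the processing stage (a simplified sketch; the file names and the upcase transform are just stand-ins for the real ones):

GC::Profiler.enable

File.open('transformed.txt', 'w') do |out|
  File.open('download.txt', 'r') do |input|
    input.each_line.lazy
         .map { |line| line.upcase }        # stand-in for the real transform
         .each { |line| out.write(line) }
  end
end

GC::Profiler.report   # dumps the profile table shown above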
Any ideas as to what's going on?
Edit: Ruby 2.0 on a Mac
Edit 2: ps aux reports 2936436 VSS and 455836 RSS
I'm running a 64-bit version of GhostScript (9.50) on a 64-bit processor with 16 GB of RAM under Windows 7.
GhostScript returns a random-ish error message (it tells me that I have a type error in the array command) when I try to allocate one too many arrays, totaling more than 2 GB of RAM.
To be clear, I am watching the memory usage grow in the Windows Task Manager, not from within GhostScript.
I'd like to know why this is so.
More importantly, I'd like to know if I can override this behavior.
Edit: This code produces the error --
% build TL (25000) sub-arrays of TL elements each and store them in G
/TL 25000 def
/TL- TL 1 sub def
/G TL array def
0 1 TL- { dup == flush G exch TL array put }for
The error looks like this - here's the last bit of the messages I get:
5335
5336
5337
5338
5339
5340
5341
5342
5343
5344
5345
Unrecoverable error: typecheck in array
Operand stack: --nostringval--
--- Begin offending input ---
/TL 25000 def /TL- TL 1 sub def /G TL array def 0 1 TL- { dup == flush G exch TL array put }for
--- End offending input ---
file offset = 0
gsapi_run_string_continue returns -20
The amount of RAM is almost certainly not the limiting factor, but it would help if you were to post the actual error message. It may be 'random-ish' to you, but it's meaningful to people who program in PostScript.
More than likely you've tripped over some other internal limit, for example the operand stack size, but without seeing the PostScript program or the error message I cannot say any more than that. I can say that (64-bit) Ghostscript will happily address more than 2GB of RAM; I was running a file last week which had Ghostscript using 8.1GB.
Note that PostScript itself is basically a 32-bit language; while Ghostscript has extended many of the architectural limitations documented in the PostScript Language Reference Manual (such as the 64K maximum number of elements in arrays and strings), moving beyond the 32-bit limits is essentially unspecified.
As to whether you can change the behaviour, that depends on exactly what the problem is, and I can't tell from what's here.
Edit
Here's a screenshot of Ghostscript running the test file to completion, along with the Task Manager display showing the amount of memory the process is using. Not shown is the vmstatus output, which I ran from the PostScript environment afterwards. This showed that Ghostscript thinks it's using 10,010,729,850 bytes from a maximum of 10,012,037,312. My calculator says that 9,562.8MB comes out at 10,027,322,572.4 bytes, so a pretty close match.
To answer the points in the comments this is (as you can probably tell) on a 64-bit Windows 10 installation with quite a lot of memory.
The difference is, almost certainly, something which has been fixed since the release of 9.52. The 9.52 64-bit binary does exit with a VMerror after (for me) 5360 iterations. Obviously trying to use vast amounts of PostScript memory (as opposed to, say, canvas memory) is not a common occurrence, not least because many PostScript interpreters simply won't allow it, so this doesn't get exercised much.
The Ghostscript Git repository is here if you want to go through the commits and try to figure out which one caused the change. You only have to go back to March this year; anything before about the 19th of March would have been in 9.52.
Beyond simple curiosity, is there a reason to try and use up loads of memory in PostScript?
I have a for loop with large objects. According to my trial and error, I can only load a large object once. If I load the object again, I get the error "Error: cannot allocate vector of size *** Mb". I tried to overcome this issue by removing the object at the end of the for loop. However, I am still getting the error "Error: cannot allocate vector of size 699.2 Mb" at the beginning of the second iteration of the for loop.
My for loop has the following structure:
for (i in 1:22) {
VeryLargeObject <- ...i...
...
.
.
.
...
rm(VeryLargeObject)
}
The VeryLargeObjects range from 2-3 GB. My PC has 16 GB of RAM, 8 cores, and 64-bit Windows 10.
Any solution on how I can manage to complete the for loop?
The error "cannot allocate..." likely comes from the fact that rm() does not immediately free memory. So the first object still occupies RAM when you load the second one. Objects that are not assigned to any name (variable) anymore get garbage collected by R at time points that R decides for itself.
Most remedies come from not loading the entire object into RAM:
If you are working with a matrix, create a filebacked.big.matrix() with the bigmemory package. Write your data into this object using var[...,...] syntax like a normal matrix. Then, in a new R session (and a new R script to preserve reproducibility), you can load this matrix from disk and modify it.
The mmap package uses a similar approach, relying on your operating system's ability to map RAM pages to disk. The data appears to a program as if it were in RAM, but is read from disk. To improve speed, the operating system takes care of keeping the relevant parts in RAM.
If you work with data frames, you can use packages like fst and feather that enable you to load only parts of your data frame into a variable.
Transfer your data frame into a database like SQLite and then access the database with R. The package dbplyr enables you to treat a database as a tidyverse-style data set. Here is the RStudio help page. You can also use raw SQL commands with the package DBI.
Another approach is to not write interactively, but to write an R script that processes only one of your objects:
Write an R script, named, say processBigObject.R that gets the file name of your big object from the command line using commandArgs():
#!/usr/bin/env Rscript
#
# Process a big object
#
# Usage: Rscript processBigObject.R <FILENAME>
input_filename <- commandArgs(trailing = TRUE)[1]
output_filename <- commandArgs(trailing = TRUE)[2]
# I'm making up function names here, do what you must for your object
o <- readBigObject(input_filename)
s <- calculateSmallerSummaryOf(o)
writeOutput(s, output_filename)
Then, write a shell script or use system2() to call the script multiple times, with different file names. Because R is terminated after each object, the memory is freed:
system2("Rscript", c("processBigObject.R", "bigObject1.dat", "bigObject1_result.dat"))
system2("Rscript", c("processBigObject.R", "bigObject2.dat", "bigObject2_result.dat"))
system2("Rscript", c("processBigObject.R", "bigObject3.dat", "bigObject3_result.dat"))
...
I have a Redis instance hosted by Heroku ( https://elements.heroku.com/addons/heroku-redis ), using the "Premium 1" plan.
This Redis instance is used only to host a small queue system called Bull ( https://www.npmjs.com/package/bull ).
The memory usage is now almost at 100% (of the 100 MB allowed) even though there are barely any jobs stored in Redis.
I ran an INFO command on this instance and here are the important parts (I can post more if needed):
# Server
redis_version:3.2.4
# Memory
used_memory:98123632
used_memory_human:93.58M
used_memory_rss:470360064
used_memory_rss_human:448.57M
used_memory_peak:105616528
used_memory_peak_human:100.72M
total_system_memory:16040415232
total_system_memory_human:14.94G
used_memory_lua:280863744
used_memory_lua_human:267.85M
maxmemory:104857600
maxmemory_human:100.00M
maxmemory_policy:noeviction
mem_fragmentation_ratio:4.79
mem_allocator:jemalloc-4.0.3
# Keyspace
db0:keys=45,expires=0,avg_ttl=0
# Replication
role:master
connected_slaves:1
master_repl_offset:25687582196
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:25686533621
repl_backlog_histlen:1048576
I have a really hard time figuring out how I can be using 95 MB with barely 50 objects stored. These objects are really small, usually a JSON with 2-3 fields containing small strings and IDs.
I've tried https://github.com/gamenet/redis-memory-analyzer but it crashes on me when I try to run it
I can't get a dump because Heroku does not allow it.
I'm a bit lost here, there might be something obvious I've missed but I'm reaching the limit of my understanding of Redis.
Thanks in advance for any tips / pointers.
EDIT
We had to upgrade our Redis instance to keep everything running, but it seems the issue is still there. Currently sitting at 34 keys / 34 MB.
I've tried redis-cli --bigkeys :
Sampled 34 keys in the keyspace!
Total key length in bytes is 743 (avg len 21.85)
9 strings with 43 bytes (26.47% of keys, avg size 4.78)
0 lists with 0 items (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
24 hashs with 227 fields (70.59% of keys, avg size 9.46)
1 zsets with 23 members (02.94% of keys, avg size 23.00)
I'm pretty sure there is some overhead building up somewhere but I can't find what.
EDIT 2
I'm actually blind: it was used_memory_lua_human:267.85M in the INFO output when I first created this post, and it is now used_memory_lua_human:89.25M on the new instance.
This seems super high, and might explain the memory usage.
You have just 45 keys in the database, so what you can do is:
List all keys with the KEYS * command.
Run the DEBUG OBJECT <key> command for each key (or several of them); it returns the serialized length, so you will get a better idea of which keys consume a lot of space.
An alternative option is to run redis-cli --bigkeys, which will show the biggest keys. You can then see the content of a key with the command specific to its data type - for strings it's the GET command, for hashes it's HGETALL, and so on.
After a lot of digging, the issue is not coming from Redis or Heroku in any way.
The queue system we use has a somewhat recent bug where Redis ends up caching a Lua script repeatedly, eating up memory as time goes on.
More info here : https://github.com/OptimalBits/bull/issues/426
Thanks for those who took the time to reply.
I'm trying to find the current flag count in KMines by using gdb. I know that I should look at the memory mappings first to avoid non-existent memory locations, so I ran the info proc mappings command to see the memory segments. I picked a random memory region (0xd27000-0x168b000) from the result and executed the find command like this: find 0x00d27000, 0x0168b000, 10
But I got the error warning: Unable to access 1458 bytes of target memory at 0x168aa4f, halting search. Although the address 0x168aa4f is between 0xd27000 and 0x168b000, gdb says that it can't access it. Why does this happen? What can I do to avoid this situation? Or is there a way to ignore unmapped/inaccessible memory locations?
Edit: I tried to set the value at the address 0x168aa4f to 1 and it works, so gdb can actually access that address but gives an error when it is used with the find command. But why?
I guess I have solved my own problem; I can't believe how simple the solution was. The only thing I did was to decrease the 2nd parameter's value by one, so the command should be find 0x00d27000, 0x0168afff, 10, because Linux maps memory in [x,y) format. So if the line in /proc/<pid>/maps says something like this:
01a03000-0222a000 rw-p
The memory allocated includes 0x01a03000 but not 0x0222a000. Hope this silly mistake of mine helps someone :D
Edit: The root of the problem is the algorithm implemented in target.c (in gdb's source code): it reads and searches the memory in chunks of 16000 bytes. So even if only the last byte of a chunk is invalid, gdb will throw the entire chunk into the trash and won't even give any proper information about the invalid byte; it only reports the beginning of the current chunk.
I have been using Ruby for a while now and I find, for bigger projects, it can take up a fair amount of memory. What are some best practices for reducing memory usage in Ruby?
Please, let each answer have one "best practice" and let the community vote it up.
When working with huge arrays of ActiveRecord objects, be very careful... If, while processing those objects in a loop, you load their related objects on each iteration using ActiveRecord's has_many, belongs_to, etc., memory usage grows a lot, because each object that belongs to the array grows...
The following technique helped us a lot (simplified example):
students.each do |student|
cloned_student = student.clone
...
cloned_student.books.detect {...}
ca_teachers = cloned_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
ca_teachers.blah_blah
...
# Not sure if the following is necessary, but we have it just in case...
cloned_student = nil
end
In the code above, "cloned_student" is the object that grows, but since it is "nullified" at the end of each iteration, this is not a problem even for a huge array of students. If we didn't "clone", the loop variable "student" would have grown instead, but since it belongs to an array, the memory it uses is never released as long as the array object exists.
A different approach works too:
students.each do |student|
loop_student = Student.find(student.id) # just re-find the record into local variable.
...
loop_student.books.detect {...}
ca_teachers = loop_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
ca_teachers.blah_blah
...
end
In our production environment we had a background process that once failed to finish because 8 GB of RAM wasn't enough for it. After this small change it uses less than 1 GB to process the same amount of data...
Don't abuse symbols.
Each time you create a symbol, Ruby puts an entry in its symbol table. The symbol table is a global hash which never gets emptied.
This is not technically a memory leak, but it behaves like one. Symbols don't take up much memory so you don't need to be too paranoid, but it pays to be aware of this.
A general guideline: if you've actually typed the symbol in code, it's fine (you only have a finite amount of code, after all), but don't call to_sym on dynamically generated or user-input strings, as this opens the door to a potentially ever-increasing number of symbols.
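A minimal sketch of the distinction (the params hash and method names here are made up for illustration):

# Fine: literal symbols are bounded by the amount of code you've written
VALID_SORT_KEYS = [:name, :created_at, :updated_at]

# Risky: interning arbitrary user input creates an unbounded set of symbols,
# which (on the Ruby versions this answer is about) are never reclaimed
def sort_key_unsafe(params)
  params["sort_by"].to_sym
end

# Safer: map the string onto a fixed whitelist of known symbols
def sort_key_safe(params)
  VALID_SORT_KEYS.find { |k| k.to_s == params["sort_by"] } || :name
end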
Don't do this:
def method(x)
  x.split(",")  # the actual arguments don't matter
end
or this:
def method(x)
  x.gsub(/a/, "b")  # the actual arguments don't matter
end
Both will permanently leak memory in ruby 1.8.5 and 1.8.6. (not sure about 1.8.7 as I haven't tried it, but I really hope it's fixed.) The workaround is stupid and involves creating a local variable. You don't have to use the local, just create one...
Things like this are why I have lots of love for the ruby language, but no respect for MRI
Beware of C extensions which allocate large chunks of memory themselves.
As an example, when you load an image using RMagick, the entire bitmap gets loaded into memory inside the ruby process. This may be 30 meg or so depending on the size of the image.
However, most of this memory has been allocated by RMagick itself. All ruby knows about is a wrapper object, which is tiny(1).
Ruby only thinks it's holding onto a tiny amount of memory, so it won't bother running the GC. In actual fact it's holding onto 30 meg.
If you loop over, say, 10 images, you can run yourself out of memory really fast.
The preferred solution is to manually tell the C library to clean up the memory itself - RMagick has a destroy! method which does this. If your library doesn't, however, you may need to forcibly run the GC yourself, even though this is generally discouraged.
(1): Ruby C extensions have callbacks which will get run when the ruby runtime decides to free them, so the memory will eventually be successfully freed at some point, just perhaps not soon enough.
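A hedged sketch of that destroy! pattern (the file names are made up, and this assumes a recent RMagick where require 'rmagick' and Magick::Image.read / destroy! are available):

require 'rmagick'

%w[photo1.jpg photo2.jpg photo3.jpg].each do |path|
  img = Magick::Image.read(path).first   # the full bitmap lives in C-allocated memory
  # ... process the image ...
  img.destroy!                           # hand the C-side memory back right away
end
# If a library has no such method, GC.start is the blunt fallback mentioned above.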
Measure and detect which parts of your code are creating objects that cause memory usage to go up. Improve and modify your code, then measure again. Sometimes you're using gems or libraries that use up a lot of memory and create a lot of objects as well.
There are many tools out there such as busy-administrator that allow you to check the memory size of objects (including those inside hashes and arrays).
$ gem install busy-administrator
Example # 1: MemorySize.of
require 'busy-administrator'
data = BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
puts BusyAdministrator::MemorySize.of(data)
# => 10 MiB
Example # 2: MemoryUtils.profile
Code
require 'busy-administrator'
results = BusyAdministrator::MemoryUtils.profile(gc_enabled: false) do |analyzer|
BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
end
BusyAdministrator::Display.debug(results)
Output:
{
memory_usage:
{
before: 12 MiB
after: 22 MiB
diff: 10 MiB
}
total_time: 0.406452
gc:
{
count: 0
enabled: false
}
specific:
{
}
object_count: 151
general:
{
String: 10 MiB
Hash: 8 KiB
BusyAdministrator::MemorySize: 0 Bytes
Process::Status: 0 Bytes
IO: 432 Bytes
Array: 326 KiB
Proc: 72 Bytes
RubyVM::Env: 96 Bytes
Time: 176 Bytes
Enumerator: 80 Bytes
}
}
You can also try ruby-prof and memory_profiler. It is better if you test and experiment with different versions of your code so you can measure the memory usage and performance of each version. This will allow you to check whether your optimization really worked or not. You usually use these tools in development / testing mode and turn them off in production.
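For example, a minimal memory_profiler sketch (the block contents are just a placeholder for the code you want to measure):

require 'memory_profiler'

report = MemoryProfiler.report do
  # the code you want to measure goes here
  100_000.times.map { |i| "row-#{i}" }
end

report.pretty_print   # allocated/retained memory and objects, grouped by gem, file and class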