Slowdown when processing a large number of files in Ruby

I'm trying to create a large array containing about 64000 objects. The objects are truncated SHA256 digests of files.
The files are in 256 subdirectories (named 00-ff), each containing about 256 files (the count varies slightly per directory). Each file is roughly 1.5 KB to 2 KB in size.
The code looks like this:
require 'digest'
require 'cfpropertylist'
A = Array.new
Dir.glob('files/**') do |dir|
  puts "Processing dir #{dir}"
  Dir.glob("#{dir}/*.bin") do |file|
    sha256 = Digest::SHA256.file file
    A.push(CFPropertyList::Blob.new(sha256.digest[0..7]))
  end
end
plist = A.to_plist({:plist_format => CFPropertyList::List::FORMAT_XML, :formatted => true})
File.write('hashes.plist', plist)
If I process 16 directories (replacing 'files/**' with 'files/0*' in the above), the time it takes on my machine is 0m0.340s.
But if I try to process all of them, the processing speed drops drastically after about 34 directories have been processed.
This is on the latest OS X, using the stock ruby.
The machine is a mid-2011 iMac with 12GB memory and 3.4 GHz Intel Core i7.
The limiting factor does not seem to be the array size: if I remove the SHA256 processing and just store the filenames instead, there is no slowdown.
Is there anything I can do better, or any way to track down the issue? I don't have another OS or machine available at the moment to test whether this is an OS X or machine-specific thing.

This turned out to be a disk/filesystem caching issue. After running the script to completion and rerunning it, the slowdown mostly disappeared. Another computer with an SSD also showed no slowdown.
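To narrow down where a slowdown like this begins, a minimal sketch (assuming the same files/<dir>/*.bin layout as above) that times each directory with Benchmark.realtime, so a sudden jump in per-directory time is easy to spot:
require 'digest'
require 'benchmark'

Dir.glob('files/**') do |dir|
  elapsed = Benchmark.realtime do
    Dir.glob("#{dir}/*.bin") do |file|
      Digest::SHA256.file(file).digest
    end
  end
  puts format('%-10s %.3fs', dir, elapsed)
end
If the per-directory times climb on a cold cache but stay flat on a second run, that points at the filesystem cache rather than the Ruby code.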

Related

Can I make GhostScript use more than 2 GB of RAM?

I'm running a 64-bit version of GhostScript (9.50) on a 64-bit processor with 16 GB of RAM under Windows 7.
GhostScript returns a random-ish error message (it tells me I have a type error in the array command) when I try to allocate one array too many, with the arrays totaling more than 2 GB of RAM.
To be clear, I am watching the memory usage grow in the Windows Task Manager, not from within GhostScript.
I'd like to know why this is so.
More importantly, I'd like to know if I can override this behavior.
Edit: This code produces the error --
/TL 25000 def
/TL- TL 1 sub def
/G TL array def
0 1 TL- { dup == flush G exch TL array put }for
Here's the last bit of the output I get, ending with the error:
5335
5336
5337
5338
5339
5340
5341
5342
5343
5344
5345
Unrecoverable error: typecheck in array
Operand stack: --nostringval--
--- Begin offending input ---
/TL 25000 def /TL- TL 1 sub def /G TL array def 0 1 TL- { dup == flush G exch TL array put }for
--- End offending input ---
file offset = 0
gsapi_run_string_continue returns -20
The amount of RAM is almost certainly not the limiting factor, but it would help if you were to post the actual error message. It may be 'random-ish' to you, but it's meaningful to people who program in PostScript.
More than likely you've tripped over some other internal limit, for example the operand stack size but without seeing the PostScript program or the error message I cannot say any more than that. I can say that (64-bit) Ghostscript will happily address more than 2GB of RAM, I was running a file last week which had Ghostscript using 8.1GB.
Note that PostScript itself is basically a 32-bit language; while Ghostscript has extended many of the architectural limitations documented in the PostScript Language Reference Manual (such as 64K elements in arrays and strings) moving beyond 32-bit limits is essentially unspecified.
As to whether you can change the behaviour, that depends on exactly what the problem is, and I can't tell from what's here.
Edit
Here's a screenshot of Ghostscript running the test file to completion, along with the Task Manager display showing the amount of memory the process is using. Not shown is the vmstatus I ran from the PostScript environment afterwards. This showed that Ghostscript thinks it's using 10,010,729,850 bytes from a maximum of 10,012,037,312. My calculator says that 9,562.8 MB comes out at 10,027,322,572.4 bytes, so a pretty close match.
To answer the points in the comments: this is (as you can probably tell) on a 64-bit Windows 10 installation with quite a lot of memory.
The difference is, almost certainly, something which has been fixed since the release of 9.52. The 9.52 64-bit binary does exit with a VMerror after (for me) 5360 iterations. Obviously trying to use vast amounts of PostScript memory (as opposed to, say, canvas memory) is not a common occurrence, not least because many PostScript interpreters simply won't allow it, so this doesn't get exercised much.
The Ghostscript Git repository is here if you want to go through the commits and try to figure out which one caused the change. You only have to go back to March this year; anything before about the 19th of March would have been in 9.52.
Beyond simple curiosity, is there a reason to try and use up loads of memory in PostScript ?

Memory not being released back to OS

I've created an image resizing server that creates a few different thumbnails of an image that you upload to it. I'm using the package https://github.com/h2non/bimg for resizing, which uses libvips through C bindings.
Before going to production I started stress testing my app with jmeter, uploading 100 images to it concurrently several times in a row, and noticed that the memory is not being released back to the OS.
To illustrate the problem I've written a few lines of code that reads 100 images and resizes them (without saving them anywhere) and then waits for 10 minutes. It repeats this cycle 5 times.
My code and memory/CPU graph can be found here:
https://github.com/hamochi/bimg-memory-issue
It's clear that the memory is being reused for every cycle, otherwise it should have doubled (I think). But it's never released back to the OS.
Is this general behaviour for cgo? Or is bimg doing something weird? Or is it just my code that is faulty?
Thank you very much for any help you can give!
There's a libvips thing to track and debug reference counts -- you could try enabling that and see if you have any leaks.
https://libvips.github.io/libvips/API/current/libvips-vips.html#vips-leak-set
Though from your comment above about bimg memory stats, it sounds like it's probably all OK.
It's easy to test libvips memory from Python. I made this small program:
#!/usr/bin/python3
import pyvips
import sys

# disable libvips operation caching ... without this, it'll cache all the
# thumbnail operations and we'll just be testing the jpg write
pyvips.cache_set_max(0)

for i in range(0, 10000):
    print("loop {} ...".format(i))
    for filename in sys.argv[1:]:
        # thumbnail to fit 128x128 box
        image = pyvips.Image.thumbnail(filename, 128)
        thumb = image.write_to_buffer(".jpg")
i.e. repeatedly thumbnail a set of source images. I ran it like this:
$ for i in {1..100}; do cp ~/pics/k2.jpg $i.jpg; done
$ ../fing.py *
And watched RES in top. I saw:
loop | RES (kb)
-- | --
100 | 39220
250 | 39324
300 | 39276
400 | 39316
500 | 39396
600 | 39464
700 | 39404
1000 | 39420
As long as you have no refcount leaks, I think what you are seeing is expected behaviour. Linux processes can only release pages at the end of the heap back to the OS (have a look at the brk and sbrk sys calls):
https://en.wikipedia.org/wiki/Sbrk
Now imagine if 1) libvips allocates 6GB, 2) the Go runtime allocates 100kb, 3) libvips releases 6GB. Your libc (the thing in your process that will call sbrk and brk on your behalf) can't hand the 6GB back to the OS because of the 100kb alloc at the end of the heap. Some malloc implementations have better memory fragmentation behaviour than others, but the default linux one is pretty good.
In practice, it doesn't matter. malloc will reuse holes in your memory space, and even if it doesn't, they will get paged out anyway under memory pressure and won't end up eating RAM. Try running your process for a few hours, and watch RES. You should see it creep up, but then stabilize.
(I'm not at all a kernel person, the above is just my understanding, corrections very welcome of course)
The problem is in the resize code:
_, err = bimg.NewImage(buffer).Resize(width, height)
The image is a GObject and needs an explicit unref to release the memory; try:
image, err = bimg.NewImage(buffer).Resize(width, height)
defer C.g_object_unref(C.gpointer(image))

codesign an old Director projector on OS X 10.13 - perhaps manipulate __LINKEDIT segment value

(See Updates at end of this post)
For $reasons, I need to codesign an old Director projector that we can no longer re-publish (no access to original source code or to Director).
I'm doing this because when run without being signed, the app now opens a Finder window with a prompt saying "Where is..." asking for a file that's one of the embedded projector resources.
But... If I cd into the Projector.app contents (it's not really called that, but you get the idea) and find the projector binary inside Contents/MacOS/ and run this binary from terminal, the app launches and runs fine, once it's decompressed the (presumably) attached archive at the end of the binary...
/BuildRoot/Library/Caches/com.apple.xbs/Sources/AppleFSCompression/AppleFSCompression-96.30.2/Common/ChunkCompression.cpp:50: Error: unsupported compressor 8
/BuildRoot/Library/Caches/com.apple.xbs/Sources/AppleFSCompression/AppleFSCompression-96.30.2/Libraries/CompressData/CompressData.c:353: Error: Unknown compression scheme encountered for file '/System/Library/CoreServices/CoreTypes.bundle/Contents/Resources/Exceptions.plist'
/BuildRoot/Library/Caches/com.apple.xbs/Sources/AppleFSCompression/AppleFSCompression-96.30.2/Common/ChunkCompression.cpp:50: Error: unsupported compressor 8
/BuildRoot/Library/Caches/com.apple.xbs/Sources/AppleFSCompression/AppleFSCompression-96.30.2/Libraries/CompressData/CompressData.c:353: Error: Unknown compression scheme encountered for file '/System/Library/CoreServices/CoreTypes.bundle/Contents/Library/AppExceptions.bundle/Exceptions.plist'
271 blocks
1120 blocks
274 blocks
136 blocks
255 blocks
120 blocks
1487 blocks
575 blocks
1128 blocks
570 blocks
104 blocks
2042 blocks
4889 blocks
677 blocks
388 blocks
363 blocks
700 blocks
23010 blocks
...app opens and runs correctly at this point
I can't ask our users to do this (they're very non-technical) so I'm guessing that the "Where is..." prompt is some aspect of the OS X Gatekeeper, and hence I'm hoping that signing the binary will make it click-runnable again.
When I try and codesign the binary App.app/Contents/MacOS/projector I get:
main executable failed strict validation
Setting the --no-strict codesign option gives a bit more detail:
the __LINKEDIT segment does not cover the end of the file (can't be processed)
Which I think is because the Director projector is a binary with a bundled archive containing the rest of the application's resources, appended to the end of the executable. Some googling shows that other projects have similar problems with their embedded resources.
I've tried using macho_edit to see if I could modify the binary, but with no joy. I've also tried signing using jtool, but again, this didn't work.
So now, opening the binary in MachOView:
I'm hoping that I can hexedit the binary and change the value of the __LINKEDIT segment so it covers the end of the file, and hence so the codesigning will work, but I have no idea what the modified value should be, or what else if anything I need to change. Any tips appreciated.
Update 1 - in response to Kamil.S's answer
I've tried adjusting the File Size value in the __LINKEDIT segment so that File Offset + File Size equals the actual binary size (I tried a few times; you actually need to change the VM Size to the same value as the File Size or you get Killed: 9 by the OS. The same happens if you set File Size to the total size of the binary), but with no luck. With the new File Size and VM Size values I can still run the binary, but I can't codesign it; I do, however, get a slightly different error message:
file not in an order that can be processed (link edit information does not fill the __LINKEDIT segment)
Update 2 - https://github.com/pyinstaller/pyinstaller/wiki/Recipe-OSX-Code-Signing#pyinstaller-fix-implementation has a bit more detail on the same problem:
PyInstaller breaks OSX code signing because it appends python code at the end of the binary. Appending data at the end of the executable breaks the Mach-O format structure. The codesign utility complains with the following messages.
the __LINKEDIT segment does not cover the end of the file (can't be processed)
file not in an order that can be processed (link edit information does not fill the __LINKEDIT segment)
Fix __LINKEDIT - File Size (offset + File Size == exe size), VM Size - same as 'File Size'
Fix LC_SYMTAB - String Table Size - last data in the Mach-O file (offset + size == exe size on the filesystem). The data appended to the executable will be part of the 'String Table' (the last data section in the Mach-O file).
I'll take a look at fixing LC_SYMTAB to see if that helps.
__LINKEDIT's File Offset + File Size should be equal to physical executable size. You can tinker with File Size in MachOView by double clicking the value, editing it and saving - the executable should be fine. Just don't touch File Offset because this will definitely break it.
If the app can't find its external resources when being run normally from the Finder, that sounds like a result of Gatekeeper Path Randomization. The OS moves the app to a hidden read-only location when it's run, and it can't find the resources.
I don't think that signing the app binary itself will fix this problem and prevent the OS from doing path randomization. The user either needs to move the application to a different directory after extracting it, or you need to distribute the application inside a disk image that has been signed with your Developer ID certificate. DropDMG (linked from the post above) can create signed disk images, that might be worth a quick try.
Director uses a scheme widely used on Windows, usually called an "overlay". It attaches some data at the end of the physical file, beyond the size of the executable image.
This approach is not supported with Mach-O files. The physical image has to end with the last byte covered by the __LINKEDIT segment, and even the order of items inside the __LINKEDIT segment is well defined. The reason for this used to be prebinding; now it is code signing.
The appended data is the initial DIR/DXR that Director wants to load first. I guess this was later fixed by adding the DIR/DXR to the application bundle, but I am no longer into Director, so I am not sure about this.

Performance issues with MongoDB on Windows

We have built a system where videos are stored in MongoDB. The videos are each a couple of hundred megabytes in size. The system is built in Python 3 using mongoengine. The C extensions of pymongo and bson are installed.
The definition of the mongoengine documents is:
class VideoStore(Document, GeneralMixin):
    video = EmbeddedDocumentListField(SingleFrame)
    mutdat = DateTimeField()
    _collection = 'VideoStore'

    def gen_video(self):
        for one_frame in self.video:
            yield self._get_single_frame(one_frame)

    def _get_single_frame(self, one_frame):
        if one_frame.frame.tell() != 0:
            one_frame.frame.seek(0)
        return pickle.loads(one_frame.frame.read())

class SingleFrame(EmbeddedDocument):
    frame = FileField()
Reading a video in Linux takes about 3 to 4 seconds. However running the same code in Windows takes 13 to 17 seconds.
Does anyone out there have any experience with this problem and any kind of solution?
I have thought of and tested (to no avail):
increasing the chunksize
reading the video as a single blob without using yield
Storing the file as a single blob (so no storing of separate frames)
Use Linux; Windows is poorly supported. The use of "infinite" virtual memory, among other things, causes issues with Windows variants. This thread elaborates further:
Why Mongodb performance better on Linux than on Windows?

Ruby Memory Management

I have been using Ruby for a while now and I find, for bigger projects, it can take up a fair amount of memory. What are some best practices for reducing memory usage in Ruby?
Please, let each answer have one "best practice" and let the community vote it up.
When working with huge arrays of ActiveRecord objects, be very careful. If, on each iteration of a loop over those objects, you load their related objects using ActiveRecord's has_many, belongs_to, etc., memory usage grows a lot, because each object that belongs to the array keeps growing.
The following technique helped us a lot (simplified example):
students.each do |student|
  cloned_student = student.clone
  ...
  cloned_student.books.detect {...}
  ca_teachers = cloned_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
  ca_teachers.blah_blah
  ...
  # Not sure if the following is necessary, but we have it just in case...
  cloned_student = nil
end
In the code above, "cloned_student" is the object that grows, but since it is "nullified" at the end of each iteration this is not a problem for a huge array of students. If we didn't do "clone", the loop variable "student" would have grown, but since it belongs to an array, the memory used by it is never released as long as the array object exists.
A different approach works too:
students.each do |student|
  loop_student = Student.find(student.id) # just re-find the record into local variable.
  ...
  loop_student.books.detect {...}
  ca_teachers = loop_student.teachers.detect {|teacher| teacher.address.state == 'CA'}
  ca_teachers.blah_blah
  ...
end
In our production environment we had a background process that once failed to finish because 8 GB of RAM wasn't enough for it. After this small change it uses less than 1 GB to process the same amount of data...
Don't abuse symbols.
Each time you create a symbol, Ruby puts an entry in its symbol table. The symbol table is a global hash which never gets emptied.
This is not technically a memory leak, but it behaves like one. Symbols don't take up much memory so you don't need to be too paranoid, but it pays to be aware of this.
A general guideline: if you've actually typed the symbol in code, it's fine (you only have a finite amount of code, after all), but don't call to_sym on dynamically generated or user-input strings, as this opens the door to a potentially ever-increasing number of symbols.
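As a rough illustration (the exact count will vary, and recent MRI versions can garbage-collect dynamically created symbols), a small sketch that watches the symbol table grow via Symbol.all_symbols:
before = Symbol.all_symbols.size

# simulate calling to_sym on dynamically generated / user-input strings
10_000.times { |i| "user_input_#{i}".to_sym }

after = Symbol.all_symbols.size
puts "symbols added: #{after - before}"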
Don't do this:
def method(x)
  x.split(",")  # it doesn't matter what the args are
end
or this:
def method(x)
  x.gsub(/a/, "b")  # it doesn't matter what the args are
end
Both will permanently leak memory in Ruby 1.8.5 and 1.8.6. (I'm not sure about 1.8.7 as I haven't tried it, but I really hope it's fixed.) The workaround is stupid and involves creating a local variable. You don't have to use the local, just create one...
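A minimal sketch of that workaround, based on the description above (the local variable is never used and its name is arbitrary):
def method(x)
  workaround = nil  # unused local; per the answer above, merely creating it avoids the 1.8.x leak
  x.split(",")
end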
Things like this are why I have lots of love for the ruby language, but no respect for MRI
Beware of C extensions which allocate large chunks of memory themselves.
As an example, when you load an image using RMagick, the entire bitmap gets loaded into memory inside the ruby process. This may be 30 meg or so depending on the size of the image.
However, most of this memory has been allocated by RMagick itself. All ruby knows about is a wrapper object, which is tiny(1).
Ruby only thinks it's holding onto a tiny amount of memory, so it won't bother running the GC. In actual fact it's holding onto 30 meg.
If you loop over, say, 10 images, you can run yourself out of memory really fast.
The preferred solution is to manually tell the C library to clean up the memory itself - RMagick has a destroy! method which does this. If your library doesn't however, you may need to forcibly run the GC yourself, even though this is generally discouraged.
(1): Ruby C extensions have callbacks which will get run when the ruby runtime decides to free them, so the memory will eventually be successfully freed at some point, just perhaps not soon enough.
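A minimal sketch of that pattern with RMagick (the file name is just an example); the point is to call destroy! in an ensure block so the bitmap allocated by the C library is released promptly, whether or not processing raises:
require 'rmagick'

def image_dimensions(path)
  image = Magick::Image.read(path).first
  begin
    [image.columns, image.rows]
  ensure
    image.destroy!  # release the C-side bitmap immediately instead of waiting for GC
  end
end

puts image_dimensions('photo.jpg').inspect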
Measure and detect which parts of your code are creating objects that cause memory usage to go up. Improve and modify your code then measure again. Sometimes, you're using gems or libraries that use up a lot of memory and creating a lot of objects as well.
There are many tools out there such as busy-administrator that allow you to check the memory size of objects (including those inside hashes and arrays).
$ gem install busy-administrator
Example # 1: MemorySize.of
require 'busy-administrator'
data = BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
puts BusyAdministrator::MemorySize.of(data)
# => 10 MiB
Example # 2: MemoryUtils.profile
Code
require 'busy-administrator'
results = BusyAdministrator::MemoryUtils.profile(gc_enabled: false) do |analyzer|
  BusyAdministrator::ExampleGenerator.generate_string_with_specified_memory_size(10.mebibytes)
end
BusyAdministrator::Display.debug(results)
Output:
{
memory_usage:
{
before: 12 MiB
after: 22 MiB
diff: 10 MiB
}
total_time: 0.406452
gc:
{
count: 0
enabled: false
}
specific:
{
}
object_count: 151
general:
{
String: 10 MiB
Hash: 8 KiB
BusyAdministrator::MemorySize: 0 Bytes
Process::Status: 0 Bytes
IO: 432 Bytes
Array: 326 KiB
Proc: 72 Bytes
RubyVM::Env: 96 Bytes
Time: 176 Bytes
Enumerator: 80 Bytes
}
}
You can also try ruby-prof and memory_profiler. It is better if you test and experiment with different versions of your code so you can measure the memory usage and performance of each version. This will allow you to check whether your optimization really worked. You usually use these tools in development/testing mode and turn them off in production.
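For instance, a minimal sketch with the memory_profiler gem (the block body here is just a placeholder for whatever code you want to measure):
require 'memory_profiler'

report = MemoryProfiler.report do
  # code under measurement goes here
  100_000.times.map { |i| "string #{i}" }
end

report.pretty_print  # allocated and retained memory broken down by gem, file and class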
