Minimize disc activity with rmagick or imagemagick - ruby

I'm generating animated GIF files from multiple source images using Ruby. I need to maximize throughput / minimize the time spent creating each GIF. I'd prefer to keep the source images in memory (probably Memcached) rather than read them from disc every time I need them. I've been using convert in backticks to execute ImageMagick commands directly from Ruby, e.g.
`convert -delay #{delay} -page #{w}x#{h}+0+0 src01.gif... etc`
I slightly prefer this over RMagick as I've found more examples and can reference the ImageMagick docs directly. It seems that images passed to the convert command need to be paths to images on disc. Additionally, it seems like the output of the convert command is a file path, so the generated image would be written to disc by ImageMagick and I'd need to read it back off disc using Ruby to access the resulting image data. In other words, I'm making ImageMagick read the source images from disc each time and write the generated GIF to disc each time. I think this is likely to be a bottleneck, and it's unnecessary: I don't need to persist the generated images, I just need to access their image data in Ruby momentarily.
I noticed that RMagick methods can take Magick::Image objects as parameters instead of filepaths, so I could keep the source images in memory in this case. Additionally, RMagick returns the generated image as data to Ruby, which is what I need; I don't need it written to disc.
I'm thinking of using RMagick instead of `convert...` to reduce disc activity.
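For example, something like this rough sketch is what I have in mind (frame_blobs and delay are placeholders; the blobs would come from Memcached):
require 'rmagick'

frames = Magick::ImageList.new
frame_blobs.each do |blob|
  frame = Magick::Image.from_blob(blob).first
  frame.delay = delay                 # in ticks (1/100ths of a second)
  frames << frame
end
gif_data = frames.to_blob { self.format = 'GIF' }   # animated GIF, all in memory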
So question 1: Does this make sense though? Since RMagick presumably wraps ImageMagick, is RMagick actually reading and writing to disc under the hood or does it have some way of utilizing ImageMagick without disc activity?
And question 2: Is there any way to get image data in and out of ImageMagick's convert command without disc activity?
Hope this makes sense. Just trying to wrap my head around this, and I apologize if I'm unclear.

Does this make sense though?
Not really. We can argue about open fds and the cost of spawning a shell versus calling the API directly, but there wouldn't be any disk I/O benefit to RMagick over the convert utility.
Is there any way to get image data in and out of ImageMagick's convert command without disc activity?
ImageMagick ships with the stream utility. There's not much usage documentation, but it can be leveraged to extract image data to a blob that can then be distributed via Memcached.
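Note too that convert itself reads from stdin and writes to stdout when given - as a filename, so simple one-in/one-out operations can stay entirely in pipes. A minimal Ruby sketch (src_blob standing in for data pulled from Memcached):
require 'open3'

# "-" reads the source from stdin; "gif:-" writes the result to stdout,
# so neither side of the conversion touches the disk.
gif_data, status = Open3.capture2('convert - -resize 128x128 gif:-',
                                  stdin_data: src_blob, binmode: true)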
There's also the mpr: protocol for label-based in-memory access, but that might not be the distributed solution you're looking for. Plus, the data is removed when the process completes.
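For illustration, within a single convert invocation it looks something like this (the frame is parked in the in-memory register, reused, and gone when the process exits):
convert src.gif -write mpr:frame +delete \
        mpr:frame mpr:frame mpr:frame -loop 0 out.gif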
Personally, Mark's comment about a RAM disk would be something I would recommend. A simple memory/tmpfs mount is easy to set up on a system, and then it's just a matter of updating the policy.xml configuration to use that mount as a temporary directory.
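Something along these lines, where the mount point and size are just examples:
sudo mount -t tmpfs -o size=512m tmpfs /mnt/magick-tmp
and then in policy.xml:
<policy domain="resource" name="temporary-path" value="/mnt/magick-tmp"/>
(Exporting MAGICK_TMPDIR=/mnt/magick-tmp achieves the same effect per-process.)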

Related

Efficient way to loop through large amount of files, convert them to webp and save the timestamp

I have a folder with about 750'000 images. Some images will change over time and new images will also be added every now and then. The folder structure is about 4-5 levels deep with a maximum of 70'000 images in any single folder.
I now want to write a script that can do the following:
Loop through all the files
Check if the file is new (has not yet been converted) or changed since the last conversion
Convert the file from jpg or png to webp if the above rules apply
My current solution is a Python script that writes the conversion times into a SQLite database. It works, but it is really slow. I also thought about doing it in PowerShell for (I assume) better performance, but had no efficient way of storing the conversion times.
What language would you recommend? Is there another way to convert jpg to webp without having to externally call the cwebp command from within my script?
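The check I have in mind boils down to comparing each source file's mtime against its .webp counterpart, which would replace the database entirely; a rough sketch (Ruby used here just for the sketch; src_root and the cwebp call are placeholders):
require 'find'

Find.find(src_root) do |path|
  next unless path =~ /\.(jpe?g|png)\z/i
  out = path.sub(/\.\w+\z/, '.webp')
  # re-convert only when the output is missing or older than the source
  next if File.exist?(out) && File.mtime(out) >= File.mtime(path)
  system('cwebp', '-quiet', path, '-o', out)
end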

Can golang read the mpr written by imageMagick?

I need to display thumbnails of local images on a web page, with Go as the server calling ImageMagick to generate the thumbnails. Can I do this directly in memory?
I only found the mpc parameter, but I don't know how to use it.
mpr: This format permits you to write images to and read images from memory. The filename is the registry key. The image persists until you explicitly delete it or the program exits.
https://www.imagemagick.org/Usage/files/#mpr
Thanks for the reply, I now know this can't be done.
@Steffen Ullrich, thanks for the recommendation.
I tried imagick, but it wasn't ideal: it took almost 30s to read a 20M image, while convert.exe takes less than 2s.
So I decided to generate the thumbnail first and then read it.

overlay one pdf with another from the command line: pdftk alternative?

I use a bash script to auto-generate a pdf calendar each month. I use the wonderful remind program as the basis for this routine. As great as the calendars I get from that program are, I need a more detailed header for the calendar (more than just the name of the month and the year). I couldn't puzzle out a way to get the remind program to enhance the header, but I was able to get the enhanced results I wanted by creating a second pdf containing the header enhancements I need, then overlaying that pdf onto the calendar I produce with remind, via the pdftk utility (pdftk calendar.pdf stamp calendar_overlay.pdf output MONTH-YEAR-cal.pdf). Unfortunately, I recently lost the ability to use pdftk, since keeping it on my system would require me to stop doing other system updates. In short, I had to remove it in order to continue updating my system.
So now I'm looking for some alternative that I can incorporate into my bash script. I am not finding any utility that will allow me to overlay one pdf with another the way pdftk does. It seems I may be able to do something like what I'm after using ImageMagick's convert, though I would likely need to overlay the pdf with an image file like a .jpg rather than with a pdf. Another possible solution may be to use TeX/LaTeX to insert text into the pdf, as described at https://rsmith.home.xs4all.nl/howto/adding-text-or-graphics-to-a-pdf-file.html.
I wanted to ask here, before investing a lot of time and effort into pursuing one or other of the two potential options I've identified, whether there is some other way, using command line options that can be incorporated into a bash script, of overlaying one pdf with another in the manner described? Input will be appreciated.
LATER EDIT: another link with indications of how to do such things using LaTeX: https://askubuntu.com/questions/712691/batch-add-header-footer-to-pdf-files
Assuming for simplicity that both of your files are of size 500pt x 200pt,
you can use pdfjam with the --nup and --delta options to trick it into overlaying your source pdf files.
pdfjam bottom.pdf top.pdf --outfile merged.pdf \
--nup "1x2" \
--noautoscale true \
--delta "0 -200pt" \
--papersize "{500pt, 200pt}"
Unfortunately, I've found in my tests that I needed to increase the y delta by one point to get perfect alignment.
pdftk-java is a Java-based port of pdftk that appears to be under active development. Given that its only real requirement appears to be Java 7+, it should work even in environments such as yours that no longer support the requirements of pdftk, so long as a Java runtime is installed.
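Since it aims to be a drop-in replacement, your original invocation should carry over unchanged:
pdftk calendar.pdf stamp calendar_overlay.pdf output MONTH-YEAR-cal.pdf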

Duplicate photo searching with compare only pure imagedata and image similarity?

I have approximately 600 GB of photos collected over 13 years, now stored on a FreeBSD/ZFS server.
The photos come from family computers, from several partial backups to different external USB HDDs, from images reconstructed after disk disasters, and from different photo manipulation programs (iPhoto, Picasa, HP and many others :( ), in several deep subdirectories - in short: a TERRIBLE MESS with many duplicates.
So as a first step I:
searched the tree for files of the same size (fast) and made MD5 checksums of those
collected duplicated images (same size + same MD5 = duplicate)
This helped a lot, but there are still MANY MANY duplicates:
photos that differ only in the EXIF/IPTC data added by some photo management software, but the image is the same (or at least "looks the same" and has the same dimensions)
or they are only resized versions of the original image
or they are "enhanced" versions of the originals, etc.
Now the questions:
how can I find duplicates by checksumming only the "pure image bytes" in a JPG, without the EXIF/IPTC and similar meta information? I want to filter out photo duplicates that differ only in their EXIF tags, but where the image is the same (so file checksumming doesn't work, but image checksumming might...). This is (I hope) not very complicated - but I need some direction.
Which Perl module can extract the "pure" image data from a JPG file in a form usable for comparison/checksumming?
More complex:
how can I find "similar" images that are only
resized versions of the originals
"enhanced" versions of the originals (from some photo manipulation programs)
Is there already any algorithm available as a Unix command or Perl module (XS?) that I can use to detect these special "duplicates"?
I can write complex scripts in BASH and "+-" :) know Perl. I can use FreeBSD/Linux utilities directly on the server, and over the network I can use OS X (but working with 600 GB over the LAN is not the fastest way)...
My rough idea:
delete images only at the end of the workflow
use an Image::ExifTool script to collect duplicate image data based on image creation date and camera model (maybe other EXIF data too)
make a checksum of the pure image data (or extract the histogram - identical images should have identical histograms) - not sure about this
use some similarity detection to find duplicates based on resizing and photo enhancement - no idea how to do this...
Any idea, help, any (software/algorithm) hint how to make order in the chaos?
PS:
Here is a nearly identical question: Finding Duplicate image files - but I'm already done with what its answer suggests (MD5), and I'm looking for more precise checksumming and image comparison algorithms.
Assuming you can work with a locally mounted FS:
rmlint : fastest tool I've ever used to find exact duplicates
findimagedupes : automates the whole ImageMagick approach (seemingly like Randal Schwartz's script, which I haven't tested)
Detecting Similar and Identical Images Using Perceptual Hashes goes all the way (a great reference post)
dupeguru-pe (gui) : dedicated tool that is fast and does an excellent job
geeqie (gui) : I find it fast/excellent for finishing the job, using its granular deduplication options. You can also generate an ordered collection of images such that similar images are next to each other, allowing you to "flip" between two of them to see the changes.
Have you looked at this article by Randal Schwartz? He uses a Perl script with ImageMagick to create resized (4x4 RGB grid) versions of the pictures, which he then compares in order to flag "similar" pictures.
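The same idea can be approximated straight from the shell: shrink each image to a forced 4x4 grid, dump the pixel values as text, and hash that as a coarse fingerprint (a sketch of the technique, not Schwartz's actual script):
convert photo.jpg -resize 4x4! txt:- | md5sum
Exact duplicates will hash identically; for resized or retouched variants you'd compare the 4x4 pixel values with a tolerance rather than by hash.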
You can remove EXIF data with mogrify -strip from the ImageMagick toolset. So you could, for each image, copy it, strip the metadata, take the md5sum, and then compare the md5sums.
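ImageMagick can also compute that checksum over just the decoded pixel values for you; the %# format escape is the pixel-data signature, so two files that differ only in their metadata should report the same value:
identify -format "%#\n" photo.jpg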
When it comes to visually similar images, you can, for example, use compare (also from the ImageMagick toolset) to produce a black/white diff map, as described here, then make a histogram of the difference and check whether there is "enough" white to mean the images are different.
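As a shortcut, compare can also report the difference as a single number, letting you skip the histogram step; for example, count the differing pixels with a small fuzz tolerance (the 5% threshold is just a starting point to tune; the count is printed to stderr, hence the redirect):
compare -metric AE -fuzz 5% a.jpg b.jpg null: 2>&1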
I had a similar dilemma - several hundred gigs of photos and videos spread and duplicated across about a dozen drives. I know this may not be the exact approach you are looking for, but the FSlint Janitor application (on Ubuntu 16.x, then 18.x) was a lifesaver for me. I took the project in chunks, eventually cleaned it all up, and ended up with three complete sets (I wanted two off-site backups).
FSLint Janitor:
sudo apt install fslint

Read image from middle of file in ImageMagick / GraphicsMagick

I have a binary file that starts off with some data. After this data, the file has a JPEG image embedded in it. After the image, the file continues with some other data. I wish to use the image as an OpenGL texture.
At the moment, the only method I know of for creating an OpenGL texture with Magick is to read the image file with Magick, write it into a blob, and upload the blob.data() to OpenGL (from this link: http://www.imagemagick.org/pipermail/magick-developers/2005-July/002276.html).
I am trying to use Magick++, but it only allows me to specify a filename, not a C-style file handle or file stream... Are my only options the following?
Save the JPEG image portion in the binary file as a separate temporary file and get Magick++ to read that image. I don't wish to do this as writing to disk will slow my program down.
Read the image portion into an array, create a Blob with the array as its data, and then read the Blob to obtain an image. I don't wish to do this either because after I get the image, I will need to again write the image data to another blob, and the entire code becomes unnecessarily long.
Switch to another library like DevIL, which offers support for what I want. Unfortunately, DevIL is not as feature rich as Magick.
I also looked into the core C API for Magick, where I can specify a file handle, but the documentation says that the file handle is closed by Magick after the image is read, which is definitely not good for my program (it is going to be pretty ugly to get the rest of my program to reopen the binary file to continue its processing).
If there is a way to provide Magick with custom I/O routines, or better still, a cleaner way of using Magick with OpenGL, please enlighten me!
The next release of GraphicsMagick does not close the input file handle after the image is read. You can try the latest development snapshot.
You could consider using mmap() (a memory-mapped file) to access the data and treat it as an in-memory BLOB using Magick++. The main issue with this is that you might not know how long the embedded JPEG data is, in case you need to access the data that follows it.
It is trivial to add FILE* support to Magick++. The only reason I did not do so was for philosophical reasons (C++ vs C).
