jRuby Zip out of Memory - ruby

We have a small utility, written in JRuby, that finds unused items on our server, zips them up, and then moves them. When we run it on the servers that actually need cleaning up, it runs out of memory before it can complete the operation. The Java heap is already as high as we can get it and still run stably on 32-bit (around 1800m max heap size), and we can't move to 64-bit at this time. Our main application is also running on these servers, and we would like to avoid shutting it down. The zips the system is creating are 800 MB plus. Is there any way to do this without having the entire zip file open in memory?

Can you execute zip via the command line?
You may also want to look at pbzip2; you will still need tar to do the archiving of multiple files, though.
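For example, a rough sketch of what that could look like (the paths and archive names below are only placeholders); the compression runs in a separate process, so the JVM never holds the archive in its heap:
# Plain zip, writing straight to disk:
zip -r /backups/unused-items.zip /srv/app/unused
# Or tar piped through pbzip2 to use several cores:
tar -cf - /srv/app/unused | pbzip2 -c > /backups/unused-items.tar.bz2
From JRuby, either command can be launched with system("zip", "-r", "/backups/unused-items.zip", "/srv/app/unused") or backticks, so only the child process's memory is used for the compression.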

Related

How can I clear out data which was removed by rm -rf on a BusyBox based system without having to restart the system?

I am trying to create a shell script to simplify some long commands on a BusyBox-based Unix system.
The system doesn't have commands like lsof, but it has the other basic commands. The issue I am facing is that when a new feature is added, I need to delete certain folders and push new folders onto the system using adb push.
When I do that, I get an error saying that there is no space left on the device, so I need to do a power cycle.
I was wondering if there is any way to reclaim the space after performing rm -rf without having to perform a power cycle, i.e., a shutdown and restart.
On a Linux-based system, the space occupied by a file is reclaimed only when:
The file is not referenced by any directory
The file is not open in any process
'rm' satisfies the first condition; however, you will have to kill (or restart) every process that still has the files open before the space is freed. Otherwise this only happens at reboot (power cycle), when the fsck utility identifies "orphan" inodes that are not referenced by any directory.
There is no information about which processes are holding the open handles. Here are a few generic alternatives (a small sketch follows the list):
If there is one process (or a few) keeping the large files open, try to stop/restart those services.
Instead of 'rm', you can truncate the large files before removing them: 'echo > large-file' releases all the file's space, even if a process still has it open. Of course, this might have a negative impact on the running processes, but from the question it looks as if this is not an issue.
In theory, you can use 'telinit' to switch BusyBox from the standard mode to a different runlevel. If the services are configured for a proper shutdown, you can get the system back to a post-reboot state without a power cycle.
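As a rough sketch of the first two alternatives, using only applets a typical BusyBox build provides (the exact applet set varies by build, and the log path below is just a placeholder):
# Which processes still hold deleted files open (no lsof needed)?
for pid in /proc/[0-9]*; do
  ls -l "$pid/fd" 2>/dev/null | grep -q '(deleted)' && echo "deleted file(s) held open by PID ${pid#/proc/}"
done
# Release a large file's space even while it is held open, instead of rm:
: > /data/logs/large-file.log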

Move/copy millions of images from macOS to an external drive to an Ubuntu server

I have created a dataset of millions (>15M, so far) of images for a machine-learning project, taking up over 500GB of storage. I created them on my MacBook Pro but want to get them to our DGX1 (GPU cluster) somehow. I thought it would be faster to copy to a fast external SSD (2x NVMe in RAID0) and then plug that drive directly into a local terminal and copy it to the network scratch disk. I'm not so sure anymore, as I've been cp-ing to the external drive for over 24 hours now.
I tried using the Finder GUI to copy at first (bad idea!). For a smaller dataset (2M images), I used 7zip to create a few archives. I'm now using the terminal in macOS to copy the files using cp.
I tried "cp /path/to/dataset /path/to/external-ssd"
Finder was definitely not the best approach, as it took forever at the "preparing to copy" stage.
Using 7zip to archive the dataset increased the "file" transfer speed, but it took over 4 days(!) to extract the files, and that was for a dataset an order of magnitude smaller.
Using the command-line cp started off quickly but seems to have slowed down. Activity Monitor says I'm getting 6-8K IOs on the disk. It's been 24 hours and it isn't quite halfway done.
Is there a better way to do this?
rsync is the preferred tool for this kind of workload. It is used for both local and network copies.
Main benefits are (excerpt from manpage):
delta-transfer algorithm, which reduces the amount of data sent
if it is interrupted for any reason, then you can restart it easily with very little cost. It can even restart part way through a large file
options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied.
Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
Regarding command usage and syntax, for local transfers it is almost the same as cp:
rsync -az /path/to/dataset /path/to/external-ssd
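A couple of practical notes for a copy this size: on a local disk-to-disk transfer, -z compression mostly just costs CPU; a trailing slash on the source means "the contents of dataset" rather than the folder itself; and --partial lets an interrupted run resume cheaply. A sketch (the paths are the asker's placeholders, and --progress is supported even by the older rsync that Apple ships):
rsync -a --partial --progress /path/to/dataset/ /path/to/external-ssd/dataset/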

Redis's huge files won't delete?

I'm using Redis-server for Windows (2.8.4 - MSOpenTech) on Windows 8 64-bit.
It is working great, but even after I run:
I see this (and here are my questions):
When Redis-server.exe is up, I see 3 large files:
When Redis-server.exe is down, I see 2 large files:
Question:
Didn't I just tell it to erase the whole DB? So why are those 2/3 huge files still there?
How can I completely erase those files? (without regenerating them)
NB
It seems that it deletes keys without freeing the occupied space. If so, how can I free this unused space?
From https://github.com/MSOpenTech/redis/issues/83
"Redis uses the fork() UNIX system API to create a point-in-time snapshot of the data store for storage to disk. This impacts several features on Redis: AOF/RDB backup, master-slave synchronization, and clustering. Windows does not have a fork-like API available, so we have had to simulate this behavior by placing the Redis heap in a memory mapped file that can be shared with a child(quasi-forked) process. By default we set the size of this file to be equal to the size of physical memory. In order to control the size of this file we have added a maxheap flag. See the Redis.Windows.conf file in msvs\setups\documentation (also included with the NuGet and Chocolatey distributions) for details on the usage of this flag. "
I know this is an old thread, but I am facing the same issues with the file sizes.
In case you have problems with your C: SSD drive (like me), you can create a directory junction:
1) Stop the Redis service
2) Move the C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Redis folder to another drive / location.
3) Open a command prompt in C:\Windows\ServiceProfiles\NetworkService\AppData\Local and then execute:
mklink /J "C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Redis" "[newpath]"
PS: [newpath] must be absolute, like "D:\directory junctions\Redis"
4) Start the Redis service. Now the files are on another drive.
Check http://ss64.com/nt/mklink.html if you have doubts about this command.
I faced this same issue on my development machine. I resolved it by stopping the Redis service and then using WinDirStat (which is what I used to detect the issue originally) to permanently delete these files in appdata/local/redis.
I then started Redis back up and things were working fine.
Before following this same procedure, others may want to ensure that this data isn't needed. In my case it wasn't critical, since this is my development workstation.
When you flush the DB, you only flush the keys from memory. I'm not sure why you've got files with different names; it may be an artifact of the way the Windows port of Redis manages files, but Redis itself doesn't delete files when you remove keys. You will need to manage outdated files outside of Redis.

Locking sharable memory

Is there a way to page another process's entire image into memory? In a couple of weeks, our IT staff will be replacing all of the "core" network switches. This will bring down the network. This will be done after normal business hours. During this time, several users will still be using a program that I have written. It will be a nightmare to install local copies of my program on each user's machine. The program normally runs from a network share. The only time the program accesses the network is when it executes its executable (image) code. How can I get the Windows Memory Manager to load the entire image into memory and hold ("lock") it there until the network is back online?
You can relink your program with the /swaprun:net option:
http://msdn.microsoft.com/en-us/library/w0628bwh.aspx
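If relinking is inconvenient, the same flag can (if I recall the toolchain correctly) also be stamped onto an already-built binary with EDITBIN; the file names below are just placeholders:
rem Set at link time (add to your existing link command):
link /SWAPRUN:NET /OUT:yourprogram.exe yourprogram.obj
rem Or mark an already-built executable:
editbin /SWAPRUN:NET yourprogram.exe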
You could write it so that it copies itself to a local temp directory, runs that copy as a separate process, and then kills itself (the first copy). I've done this little juggling act before, but whether it works depends on how your program behaves when run from the temp directory.
This isn't going to work.
Windows doesn't necessarily load a 'static' copy of the executable into memory; it's free to shuffle chunks around and page parts in and out. Often it loads resources (images, strings, etc.) from the executable after the program has started running. It often loads external libraries dynamically as well.
Edited to add:
There is no such thing as "a process's entire image". Every thread, for example, gets its own allocation.
Maybe you should explain why running from a different location (i.e., a local copy of the binary) won't work for you.

Unmovable Files on Windows XP

When I defragment my XP machine I notice that there is a block of "Unmovable Files". Is there a file attribute I can use to make my own files unmovable?
Just to clarify, I want a way to programmatically tell Windows that a file that I create should be unmovable. Is this possible, and if so, how can I do it?
Thanks,
Terry
A lot of system files cannot be moved after the system boots, such as the page file and registry database files.
This utility runs before Windows boots to defragment those files. I have it set to run at every boot, and it works well for me on several machines.
Note that the very first time you boot up with this utility set to run, it may take several minutes to defrag. After that first run though, it finishes in just 3 or 4 seconds.
Edit0: To respond to your clarification: that link says Windows has marked the page file and registry files as opened for exclusive access. So you should be able to do the same thing with the LockFile API call. However, that's not an attribute of the file itself; you'd have to actually run some background program that holds the file open for exclusive access.
There are no file attributes that you can place on your files to mark them as immovable. The only way (I think) to keep a file from being moved during defragmentation is to have some other process hold the file open (for read or write; I'm not even sure whether the file needs to be open in exclusive mode or not).
Quite frankly, I cannot think of a reason you'd want your files not to move, unless you have specific requirements about where on the disk platter your files reside. Defragmentation should generally lead to faster disk access, and that seems to be desirable in all cases :-)
This usually means that the file is in use by some process. If you're defragmenting, you'll likely see this with a lot of system files. If the file should legitimately be movable and is stuck (it's being held by a process that runs at startup but shouldn't be, for example), the most useful way of resolving the problem is to remove all permissions on the file, reboot, restore the permissions, and then get rid of the file/run the program that's trying to use it.
I suppose the ugly way is to have an application start at boot, check every few seconds whether defrag is running, and if so open the file in exclusive mode.
This is really ugly and I don't recommend it unless there is no cleaner solution.
Terry, the answers all mention ways to prevent files from becoming unmovable during defragmentation. From your question it appears that you are in fact wanting to make your personal files unmovable. Can you please clarify what is appealing about making your files unmovable?
I assume you're using the defragger that comes with Windows. Some commercial ones like DiskKeeper can move some of these files (usually system files). You can try their trial versions.
Contig might serve your purpose http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx
I'm relatively certain I ran across some methods/attributes you could access programmatically to do exactly what you want. This was back in the NT4 days, though, and my memory isn't that good.
For a slightly more complete solution, try Raxco's PerfectDisk. While it is a commercial product, it does a very good job and supports boot-time defrag of system files. The first defrag takes longer than, say, DiskKeeper, but it's a single-pass defragger and supports defragging with very little free space left on the drive. Overall it's a much smarter defrag program than any other I've seen, and it supports systems of any size.
http://www.raxco.com/
First, try to move (or delete) the files from Safe Mode. If you cannot, try to move (or delete) the files from Linux.
But be careful: if those are Windows system files, Windows will fail to boot afterwards.
Some reasons why files are unmovable are: the file is too big, the file is open/in use, insufficient security privileges, the file is being accessed by another computer, and many other things.
