Local disk (C:) memory consumption after running Python scripts - memory-management

I am running a few Python scripts that act as a model: they take in a lot of data inputs (from CSV files) as time-series data and produce output, and I use the Gurobi solver to find optimal results. After completing all of these simulations, a lot of space on my local disk (C:) has been consumed. Is there any way I can trace this consumption and manage/clear it?
Thanks!
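
In case it helps as a starting point, the usual first step is to find out which directories grew. Below is a minimal Python sketch that ranks directories by the size of the files directly inside them; the root path C:\models\output is a placeholder for wherever the scripts write, and permission errors are simply skipped.

import os

def dir_sizes(root):
    # Map each directory to the total size of the files directly inside it.
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        total = 0
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # skip files we cannot stat (permissions, broken links)
        sizes[dirpath] = total
    return sizes

# Print the ten largest directories under the (hypothetical) output folder.
largest = sorted(dir_sizes(r"C:\models\output").items(), key=lambda kv: kv[1], reverse=True)
for path, size in largest[:10]:
    print(f"{size / 2**20:10.1f} MB  {path}")

Typical candidates in a setup like this are the CSV inputs, any intermediate result files the scripts write, and solver log files; once identified, they can be archived or deleted.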

Related

Performance Counter for Memory Mapped Files

When using memory-mapped files, I'm getting into situations where Windows stalls, because new memory is allocated and processed faster than it can be written to disk through the memory-mapped files.
The only solution I see is to throttle my processing while the MiMappedPageWriter and the KeBalanceSetManager do their jobs. I would be completely fine with the application running slower instead of a complete OS freeze.
It already helped to use SetWorkingSetSizeEx with a hard limit, because the MiMappedPageWriter then starts paging out to disk earlier, but on some drives the data is still allocated faster than it can be written. For example, an SSD at 250 MB/s cannot keep up, while at 500 MB/s it gets better. However, I have to support a wide range of hardware and cannot rely on fast drives.
I found that there once was a performance counter, for example Memory\Mapped File Bytes Written/sec, that I could monitor periodically (see: https://docs.microsoft.com/en-us/windows-server/management/windows-performance-monitor/memory-performance-counter-mapped-file-bytes-written-sec), but it seems that all the links are gone.
I have searched in many places but couldn't find the performance counters for this.
Is there still a source for this?
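
I don't know whether that exact counter still exists, but one way to see what a given machine actually exposes is to enumerate the counters under the Memory object with typeperf and look for anything mapped-file related. A small Python sketch of that check (it assumes typeperf.exe is available, as it is on stock Windows):

import subprocess

# List all installed counters under the "Memory" performance object.
result = subprocess.run(["typeperf", "-q", "Memory"],
                        capture_output=True, text=True, check=True)

# Print only the counters that mention mapped files, if any are still registered.
for line in result.stdout.splitlines():
    if "mapped" in line.lower():
        print(line.strip())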

Pre-warm disk cache

After some theoretical discussion today I decided to do some research, but I did not find anything conclusive.
Here's the problem:
We have written a tool that reads around 10 GB of image files from a data set of several terabytes. We want to speed up the execution time by minimizing I/O overhead. The idea would be to "pre-warm" the disk cache, since we know beforehand which directory we will be reading from as the tool executes. Is there any API or method to give this hint to Windows so that it can start pre-warming the disk cache, speeding up future disk access because the files are already in RAM (of which there is plenty on the machines we run the tool on)?
I know Windows does readahead on a single file, but what if I have a directory with thousands of files?
I haven't found any Win32 APIs or command-line tools to do this directly.
What if I start a low priority background thread, opening all the files for reading and closing them?
I could of course memory map all the files and pin them in RAM, but that would probably run the risk of starving the main worker thread of I/O.
The general idea here is that the tool "bursts" I/O requests, as each thread will do I/O and CPU processing in sequence, hence we could use the "idle" I/O time to preload the remaining files into RAM.
(I could of course benchmark, and I will, but I would like to understand a bit more about how this works in order to be more scientific and less cargo-culty.)
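
For what it's worth, the low-priority background reader mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea, not a Windows-specific prefetch API, and the dataset path is made up:

import os
import threading

def warm_cache(root, chunk=1024 * 1024):
    # Sequentially read every file under root so the OS file cache gets populated.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    while f.read(chunk):
                        pass  # discard the data; we only want it in the cache
            except OSError:
                pass  # skip files that cannot be opened

# Start pre-warming in the background while the main thread does the real work.
warmer = threading.Thread(target=warm_cache, args=(r"D:\dataset\todays_batch",), daemon=True)
warmer.start()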

Find out reason for slow indexing in elasticsearch

I have written a script to bulk index a dataset with Elasticsearch. It works as intended; however, if I run the same script on the same dataset on different servers, the execution time varies. On the server equipped with an SSD, the 2 million documents finish indexing within 10 minutes, whereas on the one with a normal hard disk it takes up to an hour. Is there a diagnostic tool I can use to figure out what causes the slowdown?
Some additional information:
The script is written for Python 3 and uses the elasticsearch-py module for the bulk indexing.
Both servers run the same operating system (Ubuntu 14.04 LTS); the one with the slower hard drive has 64 GB of RAM, while the one with the SSD has half as much.
You will run into index merges when that large a number of records is ingested. That is a process heavily dependent on the speed of the underlying storage. RAM is not really that significant here; it matters more for query performance and the things you do there. Disk latencies add up and cause a slowdown compared to the SSD platform.
Therefore, I am not surprised by the SSD speedup. SSD storage is faster than HDD by a factor of 3-8, depending on the manufacturer. If you take into account that HDDs also need to perform positioning operations to access different parts of the storage, it is clear that simply using an SSD instead of an HDD can accelerate disk-bound applications by a factor of 10 or more.
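
To confirm that merging is where the time goes, the merge statistics can be polled from the indices stats API while the bulk load runs. A rough elasticsearch-py sketch follows; the host URL and index name "my_index" are placeholders, and the exact connection arguments depend on your client version:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host

# Merge statistics for the index being bulk-loaded; a large and growing
# total_time_in_millis relative to wall-clock time points at storage-bound merging.
stats = es.indices.stats(index="my_index", metric="merges")
merges = stats["indices"]["my_index"]["total"]["merges"]
print("merges completed:       ", merges["total"])
print("time spent merging (ms):", merges["total_time_in_millis"])
print("merges running now:     ", merges["current"])

Watching iostat -x 1 on the HDD server at the same time should show whether the disk is saturated while those numbers climb.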

Imread & Imwrite do not achieve expected gains on a Ramdisk

I have written a particular image processing algorithm that makes heavy use of imwrite and imread. The following example runs simultaneously in eight Matlab sessions on a hyper-threading-enabled 6-core i7 machine. (Filenames are different for each session.)
% example setup (in the real code each Matlab session uses its own file name)
temp = 'temp_session1.jpg';                % session-specific temporary image file
imgarray = uint8(255*rand(512, 512, 3));   % placeholder image data (~1 MB uncompressed)
tic;
for i = 1:1000
    % a processing operation will be put here
    imwrite(imgarray, temp, 'Quality', 100);  % write the image out as a JPEG
    imgarray = imread(temp);                  % immediately read the same file back
end
toc;
I'm considering changing temp to temp = [ramdrive_loc temp]; in the example code, for two purposes:
Reducing time consumption
Lowering hard drive wear
The image files created are about 1 MB in size. The hard drives are configured as RAID0 with 2 x 7,200 rpm Caviar Blacks. The machine is a Windows machine, with partitions formatted as NTFS.
The outputs of toc from the code above (without processing the images) are:
Without Ramdisk: 104.330466 seconds.
With Ramdisk: 106.100880 seconds.
Is there anything that causes me not to gain any speed? Would changing the file system of the ramdisk to FAT32 help?
Note: there are other questions regarding ramdisk vs. hard disk comparisons; however, this question is mostly about imread, imwrite, and Matlab I/O.
Addition: The RAM disk is set up through free software from SoftPerfect. It has 3 GB of space, which is more than adequate for the task (a maximum of 10 MB is generated and written over and over during the Matlab sessions).
File caching. Probably, Windows' file cache is already speeding up your I/O activity here, so the RAM disk isn't giving you an additional speedup. When you write out the file, it's written to the file cache and then asynchronously flushed to the disk, so your Matlab code doesn't have to wait for the physical disk writes to complete. And when you immediately read the same file back in to memory, there's a high chance it's still present in the file cache, so it's served from memory instead of incurring a physical disk read.
If that's your actual code, you're re-writing the same file over and over again, which means all the activity may be happening inside the disk cache, so you're not hitting a bottleneck with the underlying storage mechanism.
Rewrite your test code so it looks more like your actual workload: writing to different files on each pass if that's what you'll do in practice, including the image processing code, and actually running multiple processes in parallel. Run it under the Matlab profiler, or add finer-grained tic/toc calls, to see how much time you actually spend in I/O (e.g. in imread and imwrite, and the parts of them that do file I/O). If you do nontrivial processing outside the I/O, you might not see a significant speedup from the RAM disk, if any, because the file cache would have time to do the physical I/O during your other processing.
And since you say a maximum of 10 MB gets written over and over again, that is small enough to fit easily in the file cache in the first place, so your actual physical I/O throughput is small: if you write a file and then overwrite its contents with new data before the file cache flushes it to disk, the OS never has to flush that first set of data to disk at all. Your I/O may already be happening mostly in memory thanks to the cache, so switching to a RAM disk won't help, because physical I/O isn't the bottleneck.
Modern operating systems do a lot of caching because they know scenarios like this happen. A RAM disk isn't necessarily going to be a big speedup. There's nothing specific to Matlab or imread/imwrite about this behavior; the other RAM disk questions like RAMdisk slower than disk? are still relevant.

Performance testing. How to increase HDD operation stability

I am trying to simulate application load to measure application performance. Dozens of clients send requests to the server, and a significant part of request processing is random data loaded from the HDD (a random file at a random offset).
I use 15 GB of data in 400 files.
The HDD does its best to cache read operations, so overall performance is very unstable from run to run (+/- 5-10%).
To minimize the effect of HDD-internal optimizations, I am thinking of putting the data on a dedicated physical HDD, creating new random files before every test run, using the same random file access sequence (the same sequence of files and offsets), then running a test and formatting the HDD at the end. I suppose that will clear all internal HDD caches and file access predictions.
What should I do to minimize the dispersion of the performance results? Is there a simpler (or maybe more appropriate) way to get stable performance results?
Thank you in advance!
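
As a side note on keeping the random file access sequence identical between runs, generating it once with a fixed seed is usually enough. A tiny Python sketch, where the file count, file size, and seed are made-up values matching the description above:

import random

FILE_COUNT = 400                  # 400 test files, as in the setup above
FILE_SIZE = 15 * 2**30 // 400     # roughly 15 GB spread over 400 files
READS = 10_000                    # made-up number of read operations per run

random.seed(42)                   # fixed seed => identical sequence every run
sequence = [(random.randrange(FILE_COUNT), random.randrange(FILE_SIZE))
            for _ in range(READS)]
# sequence[i] is (file index, byte offset); replay the same list in every test run.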
Essentially all modern hard drives do include caching. It seems to me that results without a cache might be more uniform, but would be uniformly meaningless.
In any case, there are commands to disable caching on most drives (but, if memory serves, they're probably extensions, not part of the standard, so you'd have to implement them specifically for a particular target drive).
OTOH, given that you want to simulate something that isn't how a real hard drive (normally) works, I'd consider writing it as a complete software simulation -- e.g., have some sort of hard-drive class that kept a "current track", with commands to read and write data, seek to another track, etc. The class would keep track of things like the amount of (virtual) time consumed for each operation.
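
A minimal Python sketch of that kind of software simulation, with invented per-track seek and per-byte transfer costs, might look like this:

class SimulatedDisk:
    # Toy disk model that charges virtual time for seeks and transfers.
    SEEK_PER_TRACK = 1e-4   # invented: seconds of virtual time per track moved
    READ_PER_BYTE = 1e-8    # invented: seconds of virtual time per byte transferred

    def __init__(self, tracks=10_000):
        self.tracks = tracks
        self.current_track = 0
        self.elapsed = 0.0  # total virtual time consumed so far

    def seek(self, track):
        self.elapsed += abs(track - self.current_track) * self.SEEK_PER_TRACK
        self.current_track = track

    def read(self, track, nbytes):
        self.seek(track)
        self.elapsed += nbytes * self.READ_PER_BYTE
        return bytes(nbytes)  # dummy data; a fuller simulation would return real contents

# Replay a (deterministic) access pattern and report the virtual time it cost.
disk = SimulatedDisk()
for track, nbytes in [(120, 4096), (8300, 65536), (45, 4096)]:
    disk.read(track, nbytes)
print(f"virtual time consumed: {disk.elapsed:.6f} s")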
