Storing the data using Parallel Programming - performance

The requirement is as follows:
We have a third party client from where we need to take the data and store in the database(ultimately).
The way client is sharing data to us is through the dll's having functions(the dll is built out of C++ code) and we need to call those functions with appropriate parameters and we get the result.
Declare Function wcmo_deal Lib "E:\IGB\System\Intex\vcmowrap\vcmowr64.dll" (
ByRef WCMOarg_Handle As String,
ByRef WCMOarg_User As String,
ByRef WCMOarg_Options As String,
ByRef WCMOarg_Deal As String,
ByRef WCMOarg_DataOut As String,
ByRef WCMOarg_ErrOut As String) _
As Long
wcmo_deal(wcmo_deal_WCMOarg_Handle, WCMOarg_User, WCMOarg_Options, WCMOarg_Deal, WCMOarg_DataOut, WCMOarg_ErrOut)
Here WCMOarg_DataOut is the data we get and that needs to be stored.
Similar to the above method we have 10 more methods(so,total 11 methods) which pull the data and that data(string of around 500 KB to 1 MB each) is stored in the files using the below method :
File.WriteAllText(logPath & sDealName & ".txt", sDealName & " - " & WCMOarg_ErrOut & vbCrLf)
Now these method calls run for each deal. So for a single deal we get output in 11 different folders with the text file stored with the data received from the client.
There are 5000 deals totally for which we need to call these methods and the data gets stored in the files.
The way this functionality has been implemented is by using Parallel Programming with Master-Child relationship as follows:
Dim opts As New ParallelOptions
opts.MaxDegreeOfParallelism = System.Environment.ProcessorCount
Parallel.ForEach(dealList, opts, Sub(deal)
If Len(deal) > 0 Then
Dim dealPass As String = ""
Try
If dealPassDict.ContainsKey(deal.ToUpper) Then
dealPass = dealPassDict(deal.ToUpper)
End If
Dim p As New Process()
p.StartInfo.FileName = "E:\IGB_New\CMBS Intex Data Deal v2.0.exe"
p.StartInfo.Arguments = deal & "|" & keycode & "|" & dealPass & "|" & clArgs(1) & "|"
p.StartInfo.UseShellExecute = False
p.StartInfo.CreateNoWindow = True
p.Start()
p.WaitForExit()
Catch ex As Exception
exceptions.Enqueue(ex)
End Try
End If
End Sub)
where CMBS Intex Data Deal v2.0.exe is the child code which will execute 5000 times as deallist contains 5000 deals.
The CMBS Intex Data Deal v2.0.exe code contains the code of calling the dlls and storing the data in the files mentioned above.
Issues faced :
The code was run keeping the Master and Child code in one single place but we get out of memory exception after 3000 deals.[for 32 GB RAM,Processor Count =16]
The above code(Master-Child) is also taking up a lot of memory, it runs fine upto 4800 deals in one hour(the memory usage gradually reaches 100% at around 4800 deals) and then for the remaining 200 deals it takes close to 1 hour(so , totally 2 hours).[for 32 GB RAM,Processor Count =16]
The reason Master child was tried was on the assumption that GC will take care of the Memory disposal of all the objects in the Child.
After the data is stored in text files, a perl script runs and loads the data into the database.
Approach tried:
Instead of keeping the data in text files and then storing into the database, I tried storing the data into the DB directly without putting them in the files (assuming I/O operations consume a lot of memory),but this too didnt work as the DB crashed/Hangs everytime.
Note:
All the handles related to the DLL is properly being closed.
The call to the DLL's method consume a lot of memory,but nothing can be done to reduce it as it cant be controlled by us.
The reason to use Parallel approach is if we go with sequential approach, it would take many hours to fetch and load the data and we need to run this twice a day as the data keeps changing, so need to be up-to-date with the latest data from the client.
There was a CPU maxing out issue as well but that has been resolved by keeping the MaxDegreeOfParallelism = System.Environment.ProcessorCount.
Question :
Is there a way to reduce the time taken by the process to complete.
Currently it takes 2 hours to complete but that could be due to no memory remaining as it reaches 4800 deals, and without any memory it cannot process any further.
Is there a way to reduce memory consumption here by trying out a different way to execute this or there is something if changed in the same code could make it work?

Parallelism is most likely totally useless. You are bound by the IO, not by the CPU. The IO is the bottleneck and parallelization might even make it worse. You might experiment by using a RAM drive and then copy all the output to the actual storage.

Related

How to handle for loop with large objects in Rstudio?

I have a for loop with large objects. According to my trial-and-error, I can only load the large object once. If I load the object again, I would be returned the error "Error: cannot allocate vector of size *** Mb". I tried to overcome this issue by removing the object at the end of the for loop. However, I am still returned the error "Error: cannot allocate vector of size 699.2 Mb" at the beginning of the second run of the for loop.
My for loop has the following structure:
for (i in 1:22) {
VeryLargeObject <- ...i...
...
.
.
.
...
rm(VeryLargeOjbect)
}
The VeryLargeObjects ranges from 2-3GB. My PC has RAM of 16Gb, 8 cores, 64-bit Win10.
Any solution on how I can manage to complete the for loop?
The error "cannot allocate..." likely comes from the fact that rm() does not immediately free memory. So the first object still occupies RAM when you load the second one. Objects that are not assigned to any name (variable) anymore get garbage collected by R at time points that R decides for itself.
Most remedies come from not loading the entire object into RAM:
If you are working with a matrix, create a filebacked.big.matrix() with the bigmemory package. Write your data into this object using var[...,...] syntax like a normal matrix. Then, in a new R session (and a new R script to preserve reproducibility), you can load this matrix from disk and modify it.
The mmap package uses a similar approach, using your operating system's ability to map RAM pages to disk. So they appear to a program like they are in ram, but are read from disk. To improve speed, the operating system takes care of keeping the relevant parts in RAM.
If you work with data frames, you can use packages like fst and feather that enable you to load only parts of your data frame into a variable.
Transfer your data frame into a data base like sqlite and then access the data base with R. The package dbplyr enables you to treat a data base as a tidyverse-style data set. Here is the RStudio help page. You can also use raw SQL commands with the package DBI
Another approach is to not write interactively, but to write an R script that processes only one of your objects:
Write an R script, named, say processBigObject.R that gets the file name of your big object from the command line using commandArgs():
#!/usr/bin/env Rscript
#
# Process a big object
#
# Usage: Rscript processBigObject.R <FILENAME>
input_filename <- commandArgs(trailing = TRUE)[1]
output_filename <- commandArgs(trailing = TRUE)[2]
# I'm making up function names here, do what you must for your object
o <- readBigObject(input_filename)
s <- calculateSmallerSummaryOf(o)
writeOutput(s, output_filename)
Then, write a shell script or use system2() to call the script multiple times, with different file names. Because R is terminated after each object, the memory is freed:
system2("Rscript", c("processBigObject.R", "bigObject1.dat", "bigObject1_result.dat"))
system2("Rscript", c("processBigObject.R", "bigObject2.dat", "bigObject2_result.dat"))
system2("Rscript", c("processBigObject.R", "bigObject3.dat", "bigObject3_result.dat"))
...

Windows (ReFS,NTFS) file preallocation hint

Assume I have multiple processes writing large files (20gb+). Each process is writing its own file and assume that the process writes x mb at a time, then does some processing and writes x mb again, etc..
What happens is that this write pattern causes the files to be heavily fragmented, since the files blocks get allocated consecutively on the disk.
Of course it is easy to workaround this issue by using SetEndOfFile to "preallocate" the file when it is opened and then set the correct size before it is closed. But now an application accessing these files remotely, which is able to parse these in-progress files, obviously sees zeroes at the end of the file and takes much longer to parse the file.
I do not have control over the this reading application so I can't optimize it to take zeros at the end into account.
Another dirty fix would be to run defragmentation more often, run Systernal's contig utility or even implement a custom "defragmenter" which would process my files and consolidate their blocks together.
Another more drastic solution would be to implement a minifilter driver which would report a "fake" filesize.
But obviously both solutions listed above are far from optimal. So I would like to know if there is a way to provide a file size hint to the filesystem so it "reserves" the consecutive space on the drive, but still report the right filesize to applications?
Otherwise obviously also writing larger chunks at a time obviously helps with fragmentation, but still does not solve the issue.
EDIT:
Since the usefulness of SetEndOfFile in my case seems to be disputed I made a small test:
LARGE_INTEGER size;
LARGE_INTEGER a;
char buf='A';
DWORD written=0;
DWORD tstart;
std::cout << "creating file\n";
tstart = GetTickCount();
HANDLE f = CreateFileA("e:\\test.dat", GENERIC_ALL, FILE_SHARE_READ, NULL, CREATE_ALWAYS, 0, NULL);
size.QuadPart = 100000000LL;
SetFilePointerEx(f, size, &a, FILE_BEGIN);
SetEndOfFile(f);
printf("file extended, elapsed: %d\n",GetTickCount()-tstart);
getchar();
printf("writing 'A' at the end\n");
tstart = GetTickCount();
SetFilePointer(f, -1, NULL, FILE_END);
WriteFile(f, &buf,1,&written,NULL);
printf("written: %d bytes, elapsed: %d\n",written,GetTickCount()-tstart);
When the application is executed and it waits for a keypress after SetEndOfFile I examined the on disc NTFS structures:
The image shows that NTFS has indeed allocated clusters for my file. However the unnamed DATA attribute has StreamDataSize specified as 0.
Systernals DiskView also confirms that clusters were allocated
When pressing enter to allow the test to continue (and waiting for quite some time since the file was created on slow USB stick), the StreamDataSize field was updated
Since I wrote 1 byte at the end, NTFS now really had to zero everything, so SetEndOfFile does indeed help with the issue that I am "fretting" about.
I would appreciate it very much that answers/comments also provide an official reference to back up the claims being made.
Oh and the test application outputs this in my case:
creating file
file extended, elapsed: 0
writing 'A' at the end
written: 1 bytes, elapsed: 21735
Also for sake of completeness here is an example how the DATA attribute looks like when setting the FileAllocationInfo (note that the I created a new file for this picture)
Windows file systems maintain two public sizes for file data, which are reported in the FileStandardInformation:
AllocationSize - a file's allocation size in bytes, which is typically a multiple of the sector or cluster size.
EndOfFile - a file's absolute end of file position as a byte offset from the start of the file, which must be less than or equal to the allocation size.
Setting an end of file that exceeds the current allocation size implicitly extends the allocation. Setting an allocation size that's less than the current end of file implicitly truncates the end of file.
Starting with Windows Vista, we can manually extend the allocation size without modifying the end of file via SetFileInformationByHandle: FileAllocationInfo. You can use Sysinternals DiskView to verify that this allocates clusters for the file. When the file is closed, the allocation gets truncated to the current end of file.
If you don't mind using the NT API directly, you can also call NtSetInformationFile: FileAllocationInformation. Or even set the allocation size at creation via NtCreateFile.
FYI, there's also an internal ValidDataLength size, which must be less than or equal to the end of file. As a file grows, the clusters on disk are lazily initialized. Reading beyond the valid region returns zeros. Writing beyond the valid region extends it by initializing all clusters up to the write offset with zeros. This is typically where we might observe a performance cost when extending a file with random writes. We can set the FileValidDataLengthInformation to get around this (e.g. SetFileValidData), but it exposes uninitialized disk data and thus requires SeManageVolumePrivilege. An application that utilizes this feature should take care to open the file exclusively and ensure the file is secure in case the application or system crashes.

Setting Size of String Buffer When Accessing Windows Registry

I'm working on a VBScript web application that has a newly-introduced requirement to talk to the registry to pull connection string information instead of using hard-coded strings. I'm doing performance profiling because of the overhead this will introduce, and noticed in Process Monitor that reading the value returns two BUFFER OVERFLOW results before finally returning a success.
Looking online, Mark Russinovich posted about this topic a few years back, indicating that since the size of the registry entry isn't known, a default buffer of 144 bytes is used. Since there are two buffer overflow responses, the amount of time taken by the entire call is approximately doubled (and yes, I realize the difference is 40 microseconds, but with 1,000 or more page hits per second, I'm willing to invest some time in optimization).
My question is this: is there a way to tell WMI what the size of the registry value is before it tries to get it? Here's a sample of the code I'm using to access the registry:
svComputer = "." ' Local machine is simply "."
ivHKey = &H80000002 ' HKEY_LOCAL_MACHINE = &H80000002 (from WinReg.h)
svRegPath = "SOFTWARE\Path\To\My\Values"
Set oRegistry = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & svComputer & "\root\default:StdRegProv")
oRegistry.GetStringValue ivHKey, svRegPath, "Value", svValue
In VBScript strings are strings. They are however long they need to be. You don't pre-define their length. Also, if performance is that much of an issue for you, you should consider using a compiled instead of an interpreted language (or a cache for values you read before).

Out of memory error in VB6 application

Before anyone says it, I know this isn't the way it should be done, but it's the way it was done and I'm trying to support it without rewriting it all.
I can assure you this isn't the worst bit by far.
The problem occurs when the application reads an entire file into a string variable.
Normally this work OK because the files are small, but one user created a file of 107MB and that falls over.
intFreeFile = FreeFile
Open strFilename For Binary Access Read As intFreeFile
ReadFile = String(LOF(intFreeFile), " ")
Get intFreeFile, , ReadFile
Close intFreeFile
Now, it doesn't fall over at the line
ReadFile = String(LOF(intFreeFile), " ")
but on the
Get intFreeFile, , ReadFile
So what's going on here, surely the String has done the memory allocation so why would it complain about running out of memory on the Get?
Usually reading a file involves some buffering, which takes space. I'm guessing here, but I'd look at the space needed for byte to character conversion. VB6 strings are 16 bit, but (binary) files are 8 bit. You'll need 107MB for the file content, plus 214 MB for the converted results. The string allocation only reserves the 214 MB.
You do not need that "GET" call, just remove it, you are already putting the file into a string, so there is no need to use the GET call.
ReadFile = Input(LOF(intFreeFile), intFreeFile)
I got the same error . And we just checked the taskmanager showing 100% resource usage . we found out one of the update application was taking too much ram memory and we just killed it.
this solved the issue for me. One more thing was we gone in to config settings.
START->RUN->MSCONFIG
and go to startup tab and uncheck the application that looks like a updater application or some odd application that you dont use.

Why the difference in speed?

Consider this code:
function Foo(ds as OtherDLL.BaseObj)
dim lngRowIndex as long
dim lngColIndex as long
for lngRowIndex = 1 to ubound(ds.Data, 2)
for lngColIndex = 1 to ds.Columns.Count
Debug.Print ds.Data(lngRowIndex, lngColIndex)
next
next
end function
OK, a little context. Parameter ds is of type OtherDLL.BaseObj which is defined in a referenced ActiveX DLL. ds.Data is a variant 2-dimensional array (one dimension carries the data, the other one carries the column index. ds.Columns is a Collection of columns in 'ds.Data`.
Assuming there are at least 400 rows of data and 25 columns, this code takes about 15 seconds to run on my machine. Kind of unbelievable.
However if I copy the variant array to a local variable, so:
function Foo(ds as OtherDLL.BaseObj)
dim lngRowIndex as long
dim lngColIndex as long
dim v as variant
v = ds.Data
for lngRowIndex = 1 to ubound(v, 2)
for lngColIndex = 1 to ds.Columns.Count
Debug.Print v(lngRowIndex, lngColIndex)
next
next
end function
the entire thing processes in barely any noticeable time (basically close to 0).
Why?
Chances are that ds.Data is creating/copying the entire array each time it is accessed (e.g. by fetching it from a database or something), and this is taking a significant amount of time.
By accessing it inside the loop you are then doing this copy operation hundreds of times. In the second case, you copy it once into a local variable outside the loop, and then quickly re-use that data hundreds of times without having to actually copy it again.
This is a common and very important optimisation technique: move any unnecessary work out of the loop so it's executed as few times as possible. It's just that in your example, it's not "obvious" that it's doing lots of extra work when you access Data.

Resources