Why the difference in speed? - performance

Consider this code:
Function Foo(ds As OtherDLL.BaseObj)
    Dim lngRowIndex As Long
    Dim lngColIndex As Long
    For lngRowIndex = 1 To UBound(ds.Data, 2)
        For lngColIndex = 1 To ds.Columns.Count
            Debug.Print ds.Data(lngRowIndex, lngColIndex)
        Next
    Next
End Function
OK, a little context. Parameter ds is of type OtherDLL.BaseObj, which is defined in a referenced ActiveX DLL. ds.Data is a Variant two-dimensional array (one dimension carries the data, the other carries the column index). ds.Columns is a Collection of the columns in ds.Data.
Assuming there are at least 400 rows of data and 25 columns, this code takes about 15 seconds to run on my machine. Kind of unbelievable.
However if I copy the variant array to a local variable, so:
Function Foo(ds As OtherDLL.BaseObj)
    Dim lngRowIndex As Long
    Dim lngColIndex As Long
    Dim v As Variant
    v = ds.Data
    For lngRowIndex = 1 To UBound(v, 2)
        For lngColIndex = 1 To ds.Columns.Count
            Debug.Print v(lngRowIndex, lngColIndex)
        Next
    Next
End Function
the entire thing processes in barely any noticeable time (basically close to 0).
Why?

Chances are that ds.Data is creating/copying the entire array each time it is accessed (e.g. by fetching it from a database or something), and this is taking a significant amount of time.
By accessing it inside the loop you are then doing this copy operation hundreds of times. In the second case, you copy it once into a local variable outside the loop, and then quickly re-use that data hundreds of times without having to actually copy it again.
This is a common and very important optimisation technique: move any work that doesn't depend on the loop variable out of the loop so it's executed as few times as possible. It's just that in your example it's not "obvious" that each access to Data does a lot of extra work.
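The effect can be sketched in Python (not VB6) with a hypothetical property that copies its backing array on every access, which is likely what ds.Data does behind the scenes:

```python
# Illustrative sketch: a property that materializes a fresh copy of its
# backing array on every access, mimicking the suspected cost of ds.Data.
class DataSource:
    def __init__(self, rows, cols):
        self._data = [[r * cols + c for c in range(cols)] for r in range(rows)]
        self.copies = 0  # counts how many times the array is copied

    @property
    def data(self):
        self.copies += 1
        return [row[:] for row in self._data]  # fresh copy each access

ds = DataSource(400, 25)

# Slow pattern: the property is hit on every single iteration.
total = 0
for r in range(400):
    for c in range(25):
        total += ds.data[r][c]
slow_copies = ds.copies  # 400 * 25 = 10000 copies

# Fast pattern: hoist the copy out of the loop, as in the second snippet.
ds.copies = 0
v = ds.data  # one copy
total2 = sum(v[r][c] for r in range(400) for c in range(25))
fast_copies = ds.copies  # 1 copy

print(slow_copies, fast_copies)  # 10000 1
```

Same result either way, but the hoisted version pays the copying cost once instead of ten thousand times.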


Storing the data using Parallel Programming

The requirement is as follows:
We have a third-party client from which we need to take data and ultimately store it in a database.
The client shares the data with us through DLLs (built from C++ code) that expose functions; we call those functions with the appropriate parameters and get the result.
Declare Function wcmo_deal Lib "E:\IGB\System\Intex\vcmowrap\vcmowr64.dll" (
ByRef WCMOarg_Handle As String,
ByRef WCMOarg_User As String,
ByRef WCMOarg_Options As String,
ByRef WCMOarg_Deal As String,
ByRef WCMOarg_DataOut As String,
ByRef WCMOarg_ErrOut As String) _
As Long
wcmo_deal(wcmo_deal_WCMOarg_Handle, WCMOarg_User, WCMOarg_Options, WCMOarg_Deal, WCMOarg_DataOut, WCMOarg_ErrOut)
Here WCMOarg_DataOut is the data we get, and it needs to be stored.
Similar to the above, we have 10 more methods (so 11 in total) which pull data; each string (around 500 KB to 1 MB) is stored in a file using the method below:
File.WriteAllText(logPath & sDealName & ".txt", sDealName & " - " & WCMOarg_ErrOut & vbCrLf)
Now these method calls run for each deal, so for a single deal we get output in 11 different folders, each with a text file containing the data received from the client.
There are 5000 deals in total for which we need to call these methods and store the data in files.
The way this functionality has been implemented is by using Parallel Programming with Master-Child relationship as follows:
Dim opts As New ParallelOptions
opts.MaxDegreeOfParallelism = System.Environment.ProcessorCount
Parallel.ForEach(dealList, opts,
    Sub(deal)
        If Len(deal) > 0 Then
            Dim dealPass As String = ""
            Try
                If dealPassDict.ContainsKey(deal.ToUpper) Then
                    dealPass = dealPassDict(deal.ToUpper)
                End If
                Dim p As New Process()
                p.StartInfo.FileName = "E:\IGB_New\CMBS Intex Data Deal v2.0.exe"
                p.StartInfo.Arguments = deal & "|" & keycode & "|" & dealPass & "|" & clArgs(1) & "|"
                p.StartInfo.UseShellExecute = False
                p.StartInfo.CreateNoWindow = True
                p.Start()
                p.WaitForExit()
            Catch ex As Exception
                exceptions.Enqueue(ex)
            End Try
        End If
    End Sub)
where CMBS Intex Data Deal v2.0.exe is the child process, which executes 5000 times since dealList contains 5000 deals.
The CMBS Intex Data Deal v2.0.exe code contains the code that calls the DLLs and stores the data in the files mentioned above.
Issues faced :
With the master and child code in a single process, we get an out-of-memory exception after about 3000 deals. [32 GB RAM, processor count = 16]
The master-child version above also consumes a lot of memory: it runs fine up to about 4800 deals in one hour (memory usage gradually reaches 100% around deal 4800), and the remaining 200 deals then take close to another hour (so two hours in total). [32 GB RAM, processor count = 16]
The master-child approach was tried on the assumption that the GC would take care of disposing all the objects in the child process when it exits.
After the data is stored in text files, a perl script runs and loads the data into the database.
Approach tried:
Instead of writing the data to text files and then loading it into the database, I tried storing it in the DB directly without the intermediate files (assuming the I/O operations consume a lot of memory), but this didn't work either: the DB crashed or hung every time.
Note:
All the handles related to the DLL are properly closed.
The calls to the DLL's methods consume a lot of memory, but nothing can be done to reduce that since the DLL is not under our control.
The reason for the parallel approach is that a sequential approach would take many hours to fetch and load the data, and we need to run this twice a day because the data keeps changing, so we must stay up to date with the latest data from the client.
There was also a CPU maxing-out issue, but that has been resolved by setting MaxDegreeOfParallelism = System.Environment.ProcessorCount.
Question :
Is there a way to reduce the time taken by the process to complete?
Currently it takes 2 hours, but that could be because no memory remains once it reaches 4800 deals, and without memory it cannot process any further.
Is there a way to reduce memory consumption here, either by executing this differently or by changing something in the existing code?
Parallelism is most likely useless here: you are bound by I/O, not by the CPU. The I/O is the bottleneck, and parallelization might even make it worse. You might experiment with a RAM drive, writing all the output there and then copying it to the actual storage.
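A hypothetical sketch of that staging idea in Python (paths and deal names are made up for illustration; the real DLL call is replaced by a placeholder): write everything to a fast scratch directory first, then move it to the slow destination in one sequential pass.

```python
# Sketch: stage output on fast storage, then move it to the final
# location in one pass, so the slow device streams instead of seeking
# between many parallel writers.
import shutil, tempfile
from pathlib import Path

def process_deals(deals, final_dir):
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    # On Linux the scratch dir could live on a tmpfs (RAM) mount.
    with tempfile.TemporaryDirectory() as scratch:
        scratch = Path(scratch)
        for deal in deals:
            # Placeholder for the real DLL call that produces the data.
            data = f"{deal} - output"
            (scratch / f"{deal}.txt").write_text(data)
        # One sequential copy to the real destination.
        for f in scratch.iterdir():
            shutil.move(str(f), final_dir / f.name)

process_deals(["DEAL1", "DEAL2"], "out")
```

Whether this helps depends on where the memory is actually going; it only addresses the I/O side of the problem.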

lua math.random first randomized number doesn't reroll

so I'm new to Lua and am writing a simple guess-the-number script, but I've found a weird quirk with math.random and I would like to understand what's happening here.
So I create a random seed with math.randomseed(os.time()), but when I go to get a random number, like this:
correctNum = math.random(10)
print(correctNum),
it always gets the same random number every time I run it, unless I call math.random twice (irrespective of the arguments given):
random1 = math.random(10)
print(random1)
random2 = math.random(10)
print(random2),
in which case the first random number never changes between executions, but the second one does.
Just confused about how randomization works in Lua and would appreciate some help.
Thanks,
-Electroshockist
Here is the full working code:
math.randomseed(os.time())
random1 = math.random(10)
print(random1)
random2 = math.random(10)
print(random2)
repeat
    io.write "\nEnter your guess between 1 and 10: "
    guess = io.read()
    if tonumber(guess) ~= random2 then
        print("Try again!")
    end
    print()
until tonumber(guess) == random2
print("Correct!")
I guess you are running the script twice within the same second. The resolution of os.time() is one second, i.e. if you run the script twice within the same second, you start with the same seed and therefore get the same sequence.
os.time ([table])
Returns the current time when called without arguments, or a time representing the date and time specified by the given table. This table must have fields year, month, and day, and may have fields hour, min, sec, and isdst (for a description of these fields, see the os.date function).
The returned value is a number, whose meaning depends on your system. In POSIX, Windows, and some other systems, this number counts the number of seconds since some given start time (the "epoch"). In other systems, the meaning is not specified, and the number returned by time can be used only as an argument to date and difftime.
Furthermore, you are rolling a number between 1 and 10, so there is a 0.1 chance of hitting 4 again (which is not that small).
For better methods to seed random numbers, take a look here: https://stackoverflow.com/a/31083615
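The same-seed behaviour can be demonstrated in Python (rather than Lua); the seed value below is an arbitrary stand-in for an os.time() result:

```python
# Sketch: a PRNG seeded with the same value always produces the same
# sequence, which is why seeding with a one-second-resolution clock
# gives identical "random" numbers for runs within the same second.
import random

def first_roll(seed):
    rng = random.Random(seed)   # independent generator, like math.randomseed
    return rng.randint(1, 10)   # like math.random(10)

same_second = 1700000000  # two runs in the same second share this seed
print(first_roll(same_second) == first_roll(same_second))  # True
```

Mixing in a higher-resolution source avoids the collision; in Lua, something along the lines of math.randomseed(os.time() + os.clock() * 1000000) is a common workaround (see the linked answer for better options).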

Setting Size of String Buffer When Accessing Windows Registry

I'm working on a VBScript web application that has a newly-introduced requirement to talk to the registry to pull connection string information instead of using hard-coded strings. I'm doing performance profiling because of the overhead this will introduce, and noticed in Process Monitor that reading the value returns two BUFFER OVERFLOW results before finally returning a success.
Looking online, Mark Russinovich posted about this topic a few years back, indicating that since the size of the registry entry isn't known, a default buffer of 144 bytes is used. Since there are two buffer overflow responses, the amount of time taken by the entire call is approximately doubled (and yes, I realize the difference is 40 microseconds, but with 1,000 or more page hits per second, I'm willing to invest some time in optimization).
My question is this: is there a way to tell WMI what the size of the registry value is before it tries to get it? Here's a sample of the code I'm using to access the registry:
svComputer = "." ' Local machine is simply "."
ivHKey = &H80000002 ' HKEY_LOCAL_MACHINE = &H80000002 (from WinReg.h)
svRegPath = "SOFTWARE\Path\To\My\Values"
Set oRegistry = GetObject("winmgmts:{impersonationLevel=impersonate}!\\" & svComputer & "\root\default:StdRegProv")
oRegistry.GetStringValue ivHKey, svRegPath, "Value", svValue
In VBScript, strings are just strings: they are however long they need to be, and you don't pre-define their length, so there is no way to tell the StdRegProv call what buffer size to expect. Also, if performance is that much of an issue for you, you should consider using a compiled instead of an interpreted language, or a cache for values you have read before.
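The caching suggestion can be sketched in Python (the registry reader below is a hypothetical stand-in for the real StdRegProv round-trip): read each value once and serve repeats from memory, so the buffer-overflow retries only ever happen once per value.

```python
# Sketch: memoize an expensive lookup so repeated page hits don't pay
# the registry round-trip cost again.
from functools import lru_cache

calls = {"count": 0}

def read_registry_value(path, name):
    calls["count"] += 1          # pretend this is the slow WMI call
    return f"value-of-{name}"

@lru_cache(maxsize=None)
def cached_value(path, name):
    return read_registry_value(path, name)

for _ in range(1000):            # e.g. 1,000 page hits
    cached_value(r"SOFTWARE\Path\To\My\Values", "Value")

print(calls["count"])  # 1 -- the registry was only hit once
```

Since connection strings rarely change, a cache with no expiry (or a very long one) is usually safe here.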

local VS global in lua

every source agrees on this point:
access to local variables is faster than to global ones
In practical use, the main difference is how the variable is handled, since it is limited to its scope and not accessible from any point in the code.
In theory, a local variable is safe from illegal alteration because it is not accessible from the wrong place and, even better, looking up the variable is much more performant.
Now I wonder about the details of that concept:
How does it technically work that some parts of the code can access a variable and others cannot?
How much does performance improve?
But the main question is:
Let's say I have a variable bazinga = "So cool." and want to change it from somewhere.
Since the string is public, I can do this easily.
But now, if it's declared local and I'm out of scope, what performance effort is made to gain access if I hand the variable down through X functions like this:
function func_3(bazinga)
    func_N(bazinga)
end
function func_2(bazinga)
    func_3(bazinga)
end
function func_1()
    local bazinga = "So cool."
    func_2(bazinga)
end
Up to which point does the local variable remain more performant, and why?
I'm asking because maintaining code in which objects are handed through many functions gets messy, and I want to know if it's really worth it.
In theory, a local variable is safe from illegal alteration because it is not accessible from the wrong place and, even better, looking up the variable is much more performant.
A local variable is not safe from anything in a practical sense. This concept is part of lexical scoping, the method of name resolution that has some advantages (as well as disadvantages, if you like) over dynamic and/or purely global scoping.
The root of the performance difference is that in Lua locals are just stack slots, indexed by an integer offset computed once at compile time (i.e. at load()). But globals are really keys into the globals table, which is a pretty regular table, so any access is a non-precomputed lookup. All this depends on implementation details and may vary across languages or even implementations (as someone already noted, LuaJIT is capable of optimizing many things, so YMMV).
Now I wonder about the details of that concept: How does it technically work that some parts of the code can access a variable and others cannot? How much does performance improve?
Technically, in 5.1 globals live in a special table with special opcodes to access it, while 5.2 removed the global opcodes and introduced an _ENV upvalue per function. (What we call globals are actually environment variables, because lookups go into the function's environment, which may be set to something other than the "globals table", but let's not change the terminology on the fly.) So, speaking in 5.2 terms, any global is just a key-value pair in the globals table, which is accessible in every function through a lexically scoped variable.
Now on locals and lexical scoping. As you already know, local variables are stack slots. But what if our function uses a variable from an outer scope? In that case a special block is created that holds the variable, and it becomes an upvalue. An upvalue is a sort of seamless pointer to the original variable that prevents it from being destroyed when its scope ends (local variables generally cease to exist when you leave the scope, right?).
But the main question is: Let's say I have a variable bazinga = "So cool." and want to change it from somewhere. Since the string is public, I can do this easily. But now, if it's declared local and I'm out of scope, what performance effort is made to gain access if I hand the variable down through X functions like this: .....
Up to which point does the local variable remain more performant, and why?
In your snippet, it is not a variable that gets passed down the call stack, but the value "So cool." (which is a pointer into the heap, like all other garbage-collectible values). The local variable bazinga is never passed to any function, because Lua has no concept of var-parameters (Pascal) or pointers/references (C/C++). Each time you call a function, all arguments become its local variables, and in our case bazinga is not a single variable but a bunch of stack slots in different stack frames that all hold the same value: the same pointer into the heap, with the "So cool." string at that address. So there is no penalty at each level of the call stack.
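The no-copy point can be illustrated in Python (whose argument passing works similarly to Lua's in this respect): handing a string down N calls copies only a reference, never the string itself.

```python
# Sketch: the same object identity is observed at every call depth,
# so passing a value down the stack does not copy the underlying data.
def func_3(bazinga):
    return id(bazinga)          # identity of the object seen at depth 3

def func_2(bazinga):
    return func_3(bazinga)

def func_1():
    bazinga = "So cool."
    return id(bazinga), func_2(bazinga)

outer_id, inner_id = func_1()
print(outer_id == inner_id)  # True: same object at every call depth
```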
Before going into any comparison I'd want to mention that your worries are probably premature: write your code first, then profile it, and then optimize based on that. It may be difficult to optimize things after the fact in some cases, but this is not likely to be one of those cases.
Access to local variables is faster because access to global variables includes table lookup (whether in _G or in _ENV). LuaJIT may optimize some of that table access, so the difference may be less noticeable there.
You don't have to trade away ease of access in this case, as you can always use functions that access upvalues to keep local variables available:
local myvar
function getvar() return myvar end
function setvar(val) myvar = val end
-- elsewhere
setvar('foo')
print(getvar()) -- prints 'foo'
Using getvar is not going to be faster than accessing myvar as a global variable, but this gives you an option to use myvar as local and still have access to it from other files (which is probably why you'd want it to be a global variable).
You can test the performance of locals vs globals yourself with os.clock(). The following code was tested on a 2.8 GHz quad core running inside a virtual machine.
-- Preallocate variables:
local start = os.clock()
local timeend = os.clock()
local diff = timeend - start
local difflocal = {}
local diffglobal = {}
local x, y = 1, 1 -- Locals
a, b = 1, 1 -- Globals
-- 11 timed runs:
for i = 0, 10, 1 do
    -- Start
    start = os.clock()
    for ii = 0, 100000, 1 do
        y = y + ii
        x = x + ii
    end
    timeend = os.clock()
    -- Stop
    diff = (timeend - start) * 1000
    table.insert(difflocal, diff)
    -- Start
    start = os.clock()
    for ii = 0, 100000, 1 do
        b = b + ii
        a = a + ii
    end
    timeend = os.clock()
    -- Stop
    diff = (timeend - start) * 1000
    table.insert(diffglobal, diff)
end
print(a)
print(b)
print(table.concat(difflocal, " ms, "))
print(table.concat(diffglobal, " ms, "))
Prints:
55000550001
55000550001
2.033 ms, 1.979 ms, 1.97 ms, 1.952 ms, 1.914 ms, 2.522 ms, 1.944 ms, 2.121 ms, 2.099 ms, 1.923 ms, 2.12
9.649 ms, 9.402 ms, 9.572 ms, 9.286 ms, 8.767 ms, 10.254 ms, 9.351 ms, 9.316 ms, 9.936 ms, 9.587 ms, 9.58

MATLAB : access loaded MAT file very slow

I'm currently working on a project involving saving/loading quite big MAT files (around 150 MB), and I realized that it is much slower to access a loaded cell array than an equivalent one created inside a script or a function.
I created this example to simulate my code and show the difference :
clear; clc;
disp('Test for computing with loading');
if exist('data.mat', 'file')
    delete('data.mat');
end
n_tests = 10000;
data = {};
for i = 1:n_tests
    data{end+1} = rand(1, 4096);
end
% disp('Saving data');
% save('data.mat', 'data');
% clear('data');
%
% disp('Loading data');
% load('data.mat', '-mat');
for i = 1:n_tests
    tic;
    for j = 1:n_tests
        d = sum((data{i} - data{j}) .^ 2);
    end
    time = toc;
    disp(['#' num2str(i) ' computed in ' num2str(time) ' s']);
end
In this code, no MAT file is saved or loaded. The average time for one iteration over i is 0.75 s. When I uncomment the lines to save/load the file, one iteration over i takes about 6.2 s (saving/loading time not taken into consideration). That's 8x slower!
I'm using MATLAB 7.12.0 (R2011a) 64 bits with Windows 7 64 bits, and the MAT files are saved with the version v7.3.
Could it be related to the compression of the MAT file? Or to variable caching?
Is there any way to prevent or avoid this?
I know this problem too. I think it's related to MATLAB's inefficient memory management, and as I remember it doesn't handle swapping well.
A 150 MB file can easily hold a lot of data, maybe more than can be quickly allocated.
I just made a quick calculation for your example using the information from MathWorks.
In your case, total_size = n_tests*121 + n_tests*(1*4096*8) bytes, which is about 313 MB.
First I would suggest saving in format v7 (instead of v7.3); I have noticed very poor read performance with the newer format. That alone could be the reason for your slowdown.
Personally I solved this in two ways:
Split the data in smaller sets and then use functions that load the data when needed or create it on the fly (can be elegantly done with classes)
Move the data into a database. SQLite and MySQL are great; both work efficiently with MUCH larger datasets (TBs instead of GBs), and SQL is quite efficient for quickly getting subsets to manipulate.
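The database route can be sketched in Python with SQLite (table and column names here are made up for illustration): store each vector as a row and pull back only what you need on demand, instead of loading one huge file into memory.

```python
# Sketch: store fixed-length double vectors as BLOBs in SQLite and
# fetch individual rows on demand.
import sqlite3, array

conn = sqlite3.connect(":memory:")   # use a file path for persistence
conn.execute("CREATE TABLE vectors (id INTEGER PRIMARY KEY, data BLOB)")

# Insert a few vectors (stand-ins for the 1x4096 rand() rows).
for i in range(100):
    vec = array.array("d", [float(i)] * 4096)
    conn.execute("INSERT INTO vectors (id, data) VALUES (?, ?)",
                 (i, vec.tobytes()))
conn.commit()

# Fetch just one vector on demand; the rest stays out of working memory.
blob = conn.execute("SELECT data FROM vectors WHERE id = ?", (42,)).fetchone()[0]
vec42 = array.array("d")
vec42.frombytes(blob)
print(len(vec42), vec42[0])  # 4096 42.0
```

In MATLAB the same pattern is available through the Database Toolbox or a MEX binding; the point is that only the queried subset is materialized.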
I tested this code on Windows 64-bit with MATLAB 2014b 64-bit.
Without saving and loading, the computation takes around 0.22 s.
Saving the data file with '-v7' and then loading, the computation takes around 0.2 s.
Saving the data file with '-v7.3' and then loading, the computation takes around 4.1 s.
So it is indeed related to the compression of the MAT file.
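An analogy for why compression hurts repeated access, sketched in Python with zlib (v7.3 MAT files are compressed HDF5; this is an illustration of the cost model, not of MATLAB's internals): paying the decompression price on every access is far more work than decompressing once into a plain in-memory variable.

```python
# Sketch: repeated access to compressed storage vs. a one-time
# decompression into a cached variable.
import zlib

raw = b"x" * 4096 * 100
stored = zlib.compress(raw)          # like compressed data on disk

def access_compressed():
    return zlib.decompress(stored)   # pays decompression cost every call

cached = zlib.decompress(stored)     # pay the cost once, keep the result

assert access_compressed() == cached
print(len(cached))  # 409600
```

This mirrors the first answer's advice: save with -v7 where possible, or copy loaded data into a plain local variable before the hot loop.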
