Does using global variables impact performance in MATLAB? - performance

As I understand, MATLAB cannot use pass by reference when sending arguments to other functions. I am doing audio processing, and I frequently have to pass waveforms as arguments into functions, and because MATLAB uses pass by value for these arguments, it really eats up a lot of RAM when I do this.
I was considering using global variables as a method to pass my waveforms into functions, but everywhere I read there seems to be a general opinion that this is a bad idea, for organization of code, and potentially performance issues... but I haven't really read any detailed answers on how this might impact performance...
My question: What are the negative impacts of using global variables (with sizes > 100MB) to pass arguments to other functions in MATLAB, both in terms of 1) performance and 2) general code organization and good practice.
EDIT: From #Justin's answer below, it turns out MATLAB does on occasion use pass by reference when you do not modify the argument within the function! From this, I have a second related question about global variable performance:
Will using global variables be any slower than using pass by reference arguments to functions?

MATLAB does use pass by reference, but also uses copy-on-write. That is to say, your variable will be passed by reference into the function (and so won't double up on RAM), but if you change the variable within the the function, then MATLAB will create a copy and change the copy (leaving the original unaffected).
This fact doesn't seem to be too well known, but there's a good post on Loren's blog discussing it.
Bottom line: it sounds like you don't need to use global variables at all (which are a bad idea as #Adriaan says).

While relying on copy on write as Justin suggested is typically the best choice, you can easily implement pass by reference. With Matlab oop being nearly as fast as traditional functions in Matlab 2015b or newer, using handle is a reasonable option.

I encountered an interesting use case of a global variable yesterday. I tried to parallellise a piece of code (1200 lines, multiple functions inside the main function, not written by me), using parfor.
Some weird errors came out and it turned out that this piece of code wrote to a log file, but used multiple functions to write to the log file. Rather than opening and closing the relevant log file every time a function wanted to write to it, which is very slow, the file ID was made global, so that all write-functions could access it.
For the serial case this made perfect sense, but when trying to parallellise this, using global apparently breaks the scope of a worker instance as well. So suddenly we had 4 workers all trying to write into the same log file, which resulted in some weird errors.
So all in all, I maintain my position that using global variables is generally a bad idea, although I can see its use in specific cases, provided you know what you're doing.

Using global variables in Matlab may increase performance alot. This is because you can avoid copying of data in some cases.
Before attempting to gain such performance tweaks, think carefully of the cost to your project, in terms of the many drawbacks that global variables come with. There are also pitfalls to using globals with bad consequences to performance, and those may be difficult to avoid(although possible). Any code that is littered with globals tend to be difficult to comprehend.
If you want to see globals in use for performance, you can look at this real-time toolbox for optical flow that I made. This is the only project in native Matlab that is capable of real-time optical flow that I know of. Using globals was one of the reasons this was doable. It is also a reason to why the code is quite difficult to grasp: Globals are evil.
That globals can be used this way is not a way to argue for their use, rather it should be a hint that something should be updated with Matlabs unflexible notions of workspace and inefficient alternatives to globals such as guidata/getappdata/setappdata.

Related

Is it bad to have many global functions?

I'm relatively new to software development, and I'm on my way to completing my first app for the iPhone.
While learning Swift, I learned that I could add functions outside the class definition, and have it accessible across all views. After a while, I found myself making many global functions for setting app preferences (registering defaults, UIAppearance, etc).
Is this bad practice? The only alternate way I could think of was creating a custom class to encapsulate them, but then the class itself wouldn't serve any purpose and I'd have to think of ways to passing it around views.
Global functions: good (IMHO anyway, though some disagree)
Global state: bad (fairly universally agreed upon)
By which I mean, it’s probably a good practice to break up your code to create lots of small utility functions, to make them general, and to re-use them. So long as they are “pure functions”
For example, suppose you find yourself checking if all the entries in an array have a certain property. You might write a for loop over the array checking them. You might even re-use the standard reduce to do it. Or you could write a re-useable function, all, that takes a closure that checks an element, and runs it against every element in the array. It’s nice and clear when you’re reading code that goes let allAboveGround = all(sprites) { $0.position.y > 0 } rather than a for…in loop that does the same thing. You can also write a separate unit test specifically for your all function, and be confident it works correctly, rather than a much more involved test for a function that includes embedded in it a version of all amongst other business logic.
Breaking up your code into smaller functions can also help avoid needing to use var so much. For example, in the above example you would probably need a var to track the result of your looping but the result of the all function can be assigned using let. Favoring immutable variables declared with let can help make your program easier to reason about and debug.
What you shouldn’t do, as #drewag points out in his answer, is write functions that change global variables (or access singletons which amount to the same thing). Any global function you write should operate only on their inputs and produce the exact same results every time regardless of when they are called. Global functions that mutate global state (i.e. make changes to global variables (or change values of variables passed to them as arguments by reference) can be incredibly confusing to debug due to unexpected side-effects they might cause.
There is one downside to writing pure global functions,* which is that you end up “polluting the namespace” – that is, you have all these functions lying around that might have specific relevance to a particular part of your program, but accessible everywhere. To be honest, for a medium-sized application, with well-written generic functions named sensibly, this is probably not an issue. If a function is purely of use to a specific struct or class, maybe make it a static method. If your project really is getting too big, you could perhaps factor out your most general functions into a separate framework, though this is quite a big overhead/learning exercise (and Swift frameworks aren’t entirely fully-baked yet), so if you are just starting out so I’d suggest leaving this for now until you get more confident.
* edit: ok two downsides – member functions are more discoverable (via autocomplete when you hit .)
Updated after discussion with #AirspeedVelocity
Global functions can be ok and they really aren't much different than having type methods or even instance methods on a custom type that is not actually intended to contain state.
The entire thing comes down mostly to personal preference. Here are some pros and cons.
Cons:
They sometimes can cause unintended side effects. That is they can change some global state that you or the caller forgets about causing hard to track down bugs. As long as you are careful about not using global variables and ensure that your function always returns the same result with the same input regardless of the state of the rest of the system, you can mostly ignore this con.
They make code that uses them difficult to test which is important once you start unit testing (which is a definite good policy in most circumstances). It is hard to test because you can't mock out the implementation of a global function easily. For example, to change the value of a global setting. Instead your test will start to depend on your other class that sets this global setting. Being able to inject a setting into your class instead of having to fake out a global function is generally preferable.
They sometimes hint at poor code organization. All of your code should be separable into small, single purpose, logical units. This ensures your code will remain understandable as your code base grows in size and age. The exception to this is truly universal functions that have very high level and reusable concepts. For example, a function that lets you test all of the elements in a sequence. You can also still separate global functions into logical units by separating them into well named files.
Pros:
High level global functions can be very easy to test. However, you cannot ignore the need to still test their logic where they are used because your unit test should not be written with knowledge of how your code is actually implemented.
Easily accessible. It can often be a pain to inject many types into another class (pass objects into an initializer and probably store it as a property). Global functions can often remove this boiler plate code (even if it has the trade off of being less flexible and less testable).
In the end, every code architecture decision is a balance of trade offs each time you go to use it.
I have a Framework.swift that contains a set of common global functions like local(str:String) to get rid of the 2nd parameter from NSLocalize. Also there are a number of alert functions internally using local and with varying number of parameters which makes use of NSAlert as modal dialogs more easy.
So for that purpose global functions are good. They are bad habit when it comes to information hiding where you would expose internal class knowledge to some global functionality.

Debugging classic ASP - simple way to dump locals to a file in multiple places?

I'm dealing with a bunch of spaghetti in a classic ASP site. I'm trying to figure out what I need for a particular object construct, but am having a lot of trouble because of the number of variables that are used which have (for all practical purposes) global scope. Walking through with the VS debugger isn't getting me very far because there are a lot of state changes and database lookups happening all over the place.
What I'd like to do is create some kind of debug utility to dump all variables in local scope into a file, and be able to call that from several different places in the code so that I can simply compare values to understand the necessary state changes.
It's a simple enough problem to write data out to a file, but there are so many local vars in play, and I'm not sure what are just UI, what are just business data and what are actually controlling flow that I don't yet know what I want to grab.
So that's the problem--the question is:
Is there a built-in way (or a tool of some sort) that will allow me to make a simple call to dump all local variables to some sort of output as name/value pairs or something similar?

Extending functionality of existing program I don't have source for

I'm working on a third-party program that aggregates data from a bunch of different, existing Windows programs. Each program has a mechanism for exporting the data via the GUI. The most brain-dead approach would have me generate extracts by using AutoIt or some other GUI manipulation program to generate the extractions via the GUI. The problem with this is that people might be interacting with the computer when, suddenly, some automated program takes over. That's no good. What I really want to do is somehow have a program run once a day and silently (i.e. without popping up any GUIs) export the data from each program.
My research is telling me that I need to hook each application (assume these applications are always running) and inject a custom DLL to trigger each export. Am I remotely close to being on the right track? I'm a fairly experienced software dev, but I don't know a whole lot about reverse engineering or hooking. Any advice or direction would be greatly appreciated.
Edit: I'm trying to manage the availability of a certain type of professional. Their schedules are stored in proprietary systems. With their permission, I want to install an app on their system that extracts their schedule from whichever system they are using and uploads the information to a central server so that I can present that information to potential clients.
I am aware of four ways of extracting the information you want, both with their advantages and disadvantages. Before you do anything, you need to be aware that any solution you create is not guaranteed and in fact very unlikely to continue working should the target application ever update. The reason is that in each case, you are relying on an implementation detail instead of a pre-defined interface through which to export your data.
Hooking the GUI
The first way is to hook the GUI as you have suggested. What you are doing in this case is simply reading off from what an actual user would see. This is in general easier, since you are hooking the WinAPI which is clearly defined. One danger is that what the program displays is inconsistent or incomplete in comparison to the internal data it is supposed to be representing.
Typically, there are two common ways to perform WinAPI hooking:
DLL Injection. You create a DLL which you load into the other program's virtual address space. This means that you have read/write access (writable access can be gained with VirtualProtect) to the target's entire memory. From here you can trampoline the functions which are called to set UI information. For example, to check if a window has changed its text, you might trampoline the SetWindowText function. Note every control has different interfaces used to set what they are displaying. In this case, you are hooking the functions called by the code to set the display.
SetWindowsHookEx. Under the covers, this works similarly to DLL injection and in this case is really just another method for you to extend/subvert the control flow of messages received by controls. What you want to do in this case is hook the window procedures of each child control. For example, when an item is added to a ComboBox, it would receive a CB_ADDSTRING message. In this case, you are hooking the messages that are received when the display changes.
One caveat with this approach is that it will only work if the target is using or extending WinAPI controls.
Reading from the GUI
Instead of hooking the GUI, you can alternatively use WinAPI to read directly from the target windows. However, in some cases this may not be allowed. There is not much to do in this case but to try and see if it works. This may in fact be the easiest approach. Typically, you will send messages such as WM_GETTEXT to query the target window for what it is currently displaying. To do this, you will need to obtain the exact window hierarchy containing the control you are interested in. For example, say you want to read an edit control, you will need to see what parent window/s are above it in the window hierarchy in order to obtain its window handle.
Reading from memory (Advanced)
This approach is by far the most complicated but if you are able to fully reverse engineer the target program, it is the most likely to get you consistent data. This approach works by you reading the memory from the target process. This technique is very commonly used in game hacking to add 'functionality' and to observe the internal state of the game.
Consider that as well as storing information in the GUI, programs often hold their own internal model of all the data. This is especially true when the controls used are virtual and simply query subsets of the data to be displayed. This is an example of a situation where the first two approaches would not be of much use. This data is often held in some sort of abstract data type such as a list or perhaps even an array. The trick is to find this list in memory and read the values off directly. This can be done externally with ReadProcessMemory or internally through DLL injection again. The difficulty lies mainly in two prerequisites:
Firstly, you must be able to reliably locate these data structures. The problem with this is that code is not guaranteed to be in the same place, especially with features such as ASLR. Colloquially, this is sometimes referred to as code-shifting. ASLR can be defeated by using the offset from a module base and dynamically getting the module base address with functions such as GetModuleHandle. As well as ASLR, a reason that this occurs is due to dynamic memory allocation (e.g. through malloc). In such cases, you will need to find a heap address storing the pointer (which would for example be the return of malloc), dereference that and find your list. That pointer would be prone to ASLR and instead of a pointer, it might be a double-pointer, triple-pointer, etc.
The second problem you face is that it would be rare for each list item to be a primitive type. For example, instead of a list of character arrays (strings), it is likely that you will be faced with a list of objects. You would need to further reverse engineer each object type and understand internal layouts (at least be able to determine offsets of primitive values you are interested in in terms of its offset from the object base). More advanced methods revolve around actually reverse engineering the vtable of objects and calling their 'API'.
You might notice that I am not able to give information here which is specific. The reason is that by its nature, using this method requires an intimate understanding of the target's internals and as such, the specifics are defined only by how the target has been programmed. Unless you have knowledge and experience of reverse engineering, it is unlikely you would want to go down this route.
Hooking the target's internal API (Advanced)
As with the above solution, instead of digging for data structures, you dig for the internal API. I briefly covered this with when discussing vtables earlier. Instead of doing this, you would be attempting to find internal APIs that are called when the GUI is modified. Typically, when a view/UI is modified, instead of directly calling the WinAPI to update it, a program will have its own wrapper function which it calls which in turn calls the WinAPI. You simply need to find this function and hook it. Again this is possible, but requires reverse engineering skills. You may find that you discover functions which you want to call yourself. In this case, as well as being able to locate the location of the function, you have to reverse engineer the parameters it takes, its calling convention and you will need to ensure calling the function has no side effects.
I would consider this approach to be advanced. It can certainly be done and is another common technique used in game hacking to observe internal states and to manipulate a target's behaviour, but is difficult!
The first two methods are well suited for reading data from WinAPI programs and are by far easier. The two latter methods allow greater flexibility. With enough work, you are able to read anything and everything encapsulated by the target but requires a lot of skill.
Another point of concern which may or may not relate to your case is how easy it will be to update your solution to work should the target every be updated. With the first two methods, it is more likely no changes or small changes have to be made. With the second two methods, even a small change in source code can cause a relocation of the offsets you are relying upon. One method of dealing with this is to use byte signatures to dynamically generate the offsets. I wrote another answer some time ago which addresses how this is done.
What I have written is only a brief summary of the various techniques that can be used for what you want to achieve. I may have missed approaches, but these are the most common ones I know of and have experience with. Since these are large topics in themselves, I would advise you ask a new question if you want to obtain more detail about any particular one. Note that in all of the approaches I have discussed, none of them suffer from any interaction which is visible to the outside world so you would have no problem with anything popping up. It would be, as you describe, 'silent'.
This is relevant information about detouring/trampolining which I have lifted from a previous answer I wrote:
If you are looking for ways that programs detour execution of other
processes, it is usually through one of two means:
Dynamic (Runtime) Detouring - This is the more common method and is what is used by libraries such as Microsoft Detours. Here is a
relevant paper where the first few bytes of a function are overwritten
to unconditionally branch to the instrumentation.
(Static) Binary Rewriting - This is a much less common method for rootkits, but is used by research projects. It allows detouring to be
performed by statically analysing and overwriting a binary. An old
(not publicly available) package for Windows that performs this is
Etch. This paper gives a high-level view of how it works
conceptually.
Although Detours demonstrates one method of dynamic detouring, there
are countless methods used in the industry, especially in the reverse
engineering and hacking arenas. These include the IAT and breakpoint
methods I mentioned above. To 'point you in the right direction' for
these, you should look at 'research' performed in the fields of
research projects and reverse engineering.

Why does loading cached objects increase the memory consumption drastically when computing them will not?

Relevant background info
I've built a little software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db = an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates etc.)
I've built routines that can handle those nested environments (e.g. look up value "db/user" or "regex/date", change it etc.). The thing is that the initial parsing of the config files takes a long time and results in quite a big of an object (actually three to four, between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but "loading" cached objects makes my Rterm process go through the roof with respect to RAM consumption (over 1 GB!!) and I still don't really understand why (this doesn't happen when I "compute" the object all anew, but that's exactly what I'm trying to avoid since it takes too long).
I already thought about maybe serializing it, but I haven't tested it as I would need to refactor my code a bit. Plus I'm not sure if it would affect the "loading back into R" part in just the same way as loading .Rdata files.
Question
Can anyone tell me why loading a previously computed object has such effects on memory consumption of my Rterm process (compared to computing it in every new process I start) and how best to avoid this?
If desired, I will also try to come up with an example, but it's a bit tricky to reproduce my exact scenario. Yet I'll try.
Its likely because the environments you are creating are carrying around their ancestors. If you don't need the ancestor information then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them).
Also note that formulas (and, of course, functions) have environments so watch out for those too.
If it's not reproducible by others, it will be hard to answer. However, I do something quite similar to what you're doing, yet I use JSON files to store all of my values. Rather than parse the text, I use RJSONIO to convert everything to a list, and getting stuff from a list is very easy. (You could, if you want, convert to a hash, but it's nice to have layers of nested parameters.)
See this answer for an example of how I've done this kind of thing. If that works out for you, then you can forego the expensive translation step and the memory ballooning.
(Taking a stab at the original question...) I wonder if your issue is that you are using an environment rather than a list. Saving environments might be tricky in some contexts. Saving lists is no problem. Try using a list or try converting to/from an environment. You can use the as.list() and as.environment() functions for this.

`global` assertions?

Are there any languages with possibility of declaring global assertions - that is assertion that should hold during the whole program execution. So that it would be possible to write something like:
global assert (-10 < speed < 10);
and this assertion will be checked every time speed changes state?
eiffel supports all different contracts: precondition, postcondition, invariant... you may want to use that.
on the other hand, why do you have a global variable? why don't you create a class which modifies the speed. doing so, you can easily check your condition every time the value changes.
I'm not aware of any languages that truly do such a thing, and I would doubt that there exist any since it is something that is rather hard to implement and at the same time not something that a lot of people need.
It is often better to simply assert that the inputs are valid and modifications are only done when allowed and in a defined, sane way. This concludes the need of "global asserts".
You can get this effect "through the backdoor" in several ways, though none is truly elegant, and two are rather system-dependent:
If your language allows operator overloading (such as e.g. C++), you can make a class that overloads any operator which modifies the value. It is considerable work, but on the other hand trivial, to do the assertions in there.
On pretty much every system, you can change the protection of memory pages that belong to your process. You could put the variable (and any other variables that you want to assert) separately and set the page to readonly. This will cause a segmentation fault when the value is written to, which you can catch (and verify that the assertion is true). Windows even makes this explicitly available via "guard pages" (which are really only "readonly pages in disguise").
Most modern processors support hardware breakpoints. Unless your program is to run on some very exotic platform, you can exploit these to have more fine-grained control in a similar way as by tampering with protections. See for example this article on another site, which describes how to do it under Windows on x86. This solution will require you to write a kind of "mini-debugger" and implies that you may possibly run into trouble when running your program under a real debugger.

Resources