Increase performance by removing CLEAR ALL

In MATLAB R2014b, when I use CLEAR ALL at the beginning of a script I get the following warning:
For improved performance, consider not using CLEAR ALL within a script
which was not given in previous releases (as I recall).
The only reason I have found is that when you call a script from outside, or from other scripts, you do not want to clear the variables in the workspace and then have to regenerate them every single time.
Is there any other reason that I am missing?
How does removing CLEAR ALL improve performance when using a single script?

In R2015b, the semantics of clear were changed, perhaps in response to the concerns raised in this question. The changes stated in the release notes are:
The clear function no longer clears debugging breakpoints. To clear breakpoints, use dbclear all.
The clear function only clears functions that are not currently running. For example, when you call clear myFun while myFun is running, myFun is not cleared.
The rest of this answer pertains to MATLAB versions before R2015b.
Here's a table of what gets cleared with each input argument.
The table for R2015b is identical except that there is no longer a "Debugging breakpoints" column since they are no longer cleared with clear.
Scripts and functions are cleared when you probably just want to clear variables (the red boxes). It does not make a lot of sense to clear from memory the very function that is presently being executed. (According to the R2015b release notes, this no longer happens.)
Also, keeping in mind that scripts execute in the base workspace, you are clearing out all functions that may be used by other scripts. Try looking at the output of inmem after an extended MATLAB tinkering session. You will find all kinds of MATLAB functions that are loaded into memory for fast access (including 'matlabrc', 'pathdef', and other core scripts that set up your workspace). So perhaps it's not that it hurts the performance of just the script where you call clear all, but of all other scripts and of the interactive command line, which share the base workspace. That would be my guess.
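For illustration, a minimal sketch you can run from the command line (the counts will vary with your session):

M = inmem;            % names of functions currently compiled into memory
disp(numel(M))
clear variables       % removes workspace variables; the function cache survives
disp(numel(inmem))    % roughly the same count as before
clear all             % flushes variables AND the compiled functions/scripts
disp(numel(inmem))    % much smaller; everything must be re-parsed from disk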
Not performance related, but another reason why a clear all in a script might be a bad idea is that it will clear breakpoints (this can be annoying!) as well as global and persistent variables. However, clearing global and persistent variables might be the goal. For globals there is clear global, but there is nothing comparable for persistent variables: since they are bound to functions, you would use clear functions or clear whateverFunctionName for them.
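If that is indeed the goal, the more targeted alternatives look like this (myFun is a placeholder for your own function name):

clear global      % clears global variables only
clear functions   % clears all functions, resetting every persistent variable
clear myFun       % clears just myFun, resetting only its persistent variables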

Related

Does using global variables impact performance in MATLAB?

As I understand it, MATLAB cannot pass arguments to other functions by reference. I am doing audio processing, and I frequently have to pass waveforms into functions as arguments; because MATLAB passes these arguments by value, doing so really eats up a lot of RAM.
I was considering using global variables as a way to pass my waveforms into functions, but everywhere I read there seems to be a general opinion that this is a bad idea, for code organization and for potential performance issues... yet I haven't really found any detailed answers on how this might impact performance...
My question: what are the negative impacts of using global variables (with sizes > 100 MB) to pass arguments to other functions in MATLAB, in terms of 1) performance and 2) general code organization and good practice?
EDIT: From @Justin's answer below, it turns out MATLAB does on occasion use pass by reference when you do not modify the argument within the function! This prompts a second, related question about global variable performance:
Will using global variables be any slower than using pass by reference arguments to functions?
MATLAB does use pass by reference, but it also uses copy-on-write. That is to say, your variable will be passed by reference into the function (and so won't double up on RAM), but if you change the variable within the function, MATLAB will create a copy and change the copy (leaving the original unaffected).
This fact doesn't seem to be too well known, but there's a good post on Loren's blog discussing it.
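As a minimal sketch of the behaviour (the function names here are made up), only the function that writes to its argument pays for a copy:

% readOnly.m: x is only read, so no copy of the caller's array is made
function s = readOnly(x)
s = sum(x);
end

% writesFirst.m: the first assignment into x triggers a full copy
function s = writesFirst(x)
x(1) = 0;
s = sum(x);
end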
Bottom line: it sounds like you don't need to use global variables at all (which are a bad idea as #Adriaan says).
While relying on copy-on-write as Justin suggested is typically the best choice, you can easily implement pass by reference yourself. With MATLAB OOP being nearly as fast as traditional functions in R2015b or newer, subclassing handle is a reasonable option.
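A minimal sketch of that option (the class and function names are invented): a subclass of handle wraps the waveform, so a function can modify it in place and the caller sees the change without any copying back.

% In WaveRef.m: a tiny handle wrapper around a waveform
classdef WaveRef < handle
    properties
        data   % raw samples
    end
    methods
        function obj = WaveRef(d)
            obj.data = d;
        end
    end
end

% In normalizeInPlace.m: modifies the caller's object, nothing copied back
function normalizeInPlace(w)
w.data = w.data ./ max(abs(w.data));
end

% Usage:
%   w = WaveRef(randn(1, 1e7));
%   normalizeInPlace(w);   % w.data is changed for the caller too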
I encountered an interesting use case for a global variable yesterday. I tried to parallelise a piece of code (1200 lines, multiple functions inside the main function, not written by me) using parfor.
Some weird errors came out, and it turned out that this piece of code wrote to a log file, using multiple functions to do so. Rather than opening and closing the relevant log file every time a function wanted to write to it, which is very slow, the file ID was made global so that all write-functions could access it.
For the serial case this made perfect sense, but when trying to parallelise it, using global apparently breaks out of the scope of a worker instance as well. So suddenly we had four workers all trying to write into the same log file, which resulted in some weird errors.
So all in all, I maintain my position that using global variables is generally a bad idea, although I can see its use in specific cases, provided you know what you're doing.
Using global variables in MATLAB can increase performance a lot, because you can avoid copying data in some cases.
Before attempting such performance tweaks, think carefully about the cost to your project in terms of the many drawbacks that global variables come with. There are also pitfalls to using globals that have bad consequences for performance, and those may be difficult (although possible) to avoid. Any code that is littered with globals tends to be difficult to comprehend.
If you want to see globals used for performance, you can look at this real-time toolbox for optical flow that I made. It is the only project in native MATLAB capable of real-time optical flow that I know of, and using globals was one of the reasons this was doable. It is also a reason why the code is quite difficult to grasp: globals are evil.
That globals can be used this way is not an argument for their use; rather, it should be a hint that something ought to be improved about MATLAB's inflexible notion of workspaces and its inefficient alternatives to globals, such as guidata/getappdata/setappdata.

NetLogo Debugging

NetLogo being interactive makes debugging easy, but I have yet to find any tools for setting breakpoints and stepping through code.
Please point me to such a tool if one exists, or tell me whether I can achieve the same with the currently available setup.
I am not aware of such a tool, if one exists. For debugging I use meaningful print statements. First I create a switch as a global parameter to turn debug mode on and off, then I add a statement to each procedure that prints which procedure updated which variable and in which order they were called (if debug mode is on).
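A minimal sketch of that pattern (the procedure and variable names are invented; debug? is assumed to be a switch on the Interface):

turtles-own [ wealth ]

to debug-print [ msg ]
  if debug? [ print (word "tick " ticks ": " msg) ]
end

to update-wealth
  ask turtles [ set wealth wealth + random 10 ]
  debug-print "update-wealth: wealth changed"
end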
I also use the profiler extension, which shows how many times each procedure was called and which ones are the most or least time-consuming.
No such tool currently exists. Still, you can use one of the alternatives above, or you might take a look at user-message (https://ccl.northwestern.edu/netlogo/docs/dictionary.html#user-message), which pops up a dialog. It also blocks execution at that step, though without providing a jump-to-next-line mechanism; for me this solution proved to be the best.
Another possibility is to do the debugging in any modern browser, if/when NetLogo Web produces source maps. That way one could set breakpoints in the NetLogo code and use the developer tools of Chrome, Firefox, or IE11 on it.
I have used user-message and inspect together as a debugging tool for NetLogo. Here is a video demo on how to use them to identify the cause of an error.
The reason for using inspect is to examine all properties of an agent when we are not sure where exactly things went wrong.
user-message is used to print instructions, output intermediate results, and halt the program.
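For example, a small sketch of combining the two (the wealth variable is invented):

ask turtles with [ wealth < 0 ] [
  inspect self                                          ; open the agent monitor
  user-message (word "Negative wealth on turtle " who)  ; halt with a dialog
]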
I found Netlogo somewhat difficult to debug until I discovered print statements.
I basically zone in on the module that is causing problems and I add print statements within critical blocks to inspect the state of the variables. I have find this to be an effective way to debug.
I do wish the documentation was more comprehensive, with more code examples. Perhaps some good Samaritan will take it up as a project.
NB: this approach is not just a convoluted way to achieve the exact same benefit that using random-seed gives. Using random-seed is an ex-ante way to reproduce a run. For rare errors, however, it is impractical to manually change random-seed (maybe a few hundred times) until you hit, by chance, a run in which the error appears. This approach instead lets you reproduce the error after it occurred, potentially saving tons of time by letting you reproduce that rare run ex-post.
Feel free to download this blueprint from the Modelling Commons if you find it useful and want to save the time needed to set it up.
Not a NetLogo feature, but an expedient I've devised recently.
Some errors might occur only rarely (for example, once every few hundred runs). If that is the case, even just reproducing the error can become time-consuming.
In order to avoid this problem, the following arrangement can be used (note that the code blocks contain only a few lines of actual code; the vast majority is comments).
globals [
  current-seed
]

to setup
  clear-all
  reset-ticks
  manage-randomness
end

to manage-randomness
  ; This procedure checks whether the user wants to use a new seed given by the 'new-seed' primitive,
  ; or a custom-defined seed. In the latter case, the custom-defined seed is retrieved from the
  ; 'custom-seed' global variable, which is set in the input box in the Interface. That variable can
  ; be defined either manually by the user, or through the 'save-current-seed-as-custom' button (see
  ; the 'save-current-seed-as-custom' procedure below).
  ; In either case, the selection is mediated by the 'current-seed' global variable. This is because
  ; having a global variable storing the value that will be passed to the 'random-seed' primitive is
  ; the only way to [1] have that value displayed in the Interface, and [2] re-use that value in case
  ; the user wants to perform 'save-current-seed-as-custom'.
  ifelse (use-custom-seed?)
    [ set current-seed custom-seed ]
    [ set current-seed new-seed ]
  random-seed current-seed
end

to save-current-seed-as-custom
  ; This procedure lets the user store the seed that was used in the current run (kept in the
  ; 'current-seed' global variable; see the comment in 'manage-randomness') as 'custom-seed', which
  ; allows that value to be re-used after turning on the 'use-custom-seed?' switch in the Interface.
  set custom-seed current-seed
end
This makes it possible to reproduce the same run in which a rare error occurred, just by saving that run's seed as the custom seed and turning on the switch.
To make things even more useful, the same logic can be applied to ticks: to jump exactly to the point where the rare error occurred (maybe thousands of ticks after the start of the run), it is possible to combine the previous arrangement for seeds with the following arrangement for ticks:
to go
  ; The first if-statement allows the user to bring the run to a custom-defined ticks value.
  ; The custom-defined ticks value is retrieved from the 'custom-ticks' global variable,
  ; which is set in the input box in the Interface. That variable can be defined either
  ; manually by the user, or through the 'save-current-ticks-as-custom' button (see the
  ; 'save-current-ticks-as-custom' procedure below).
  if (use-custom-ticks?) and (ticks = custom-ticks) [ stop ]

  ; Insert here the normal 'go' procedure.

  tick
end
to save-current-ticks-as-custom
  ; This procedure lets the user store the current 'ticks' value as 'custom-ticks'. This allows
  ; the user, after switching on the 'use-custom-ticks?' switch, to bring the simulation to the
  ; exact same ticks as when the 'save-current-ticks-as-custom' button was used. Used in combination
  ; with the 'save-current-seed-as-custom' button and the 'use-custom-seed?' switch, this lets the
  ; user surely and quickly jump to the exact same situation as when a previous simulation was
  ; interrupted.
  set custom-ticks ticks
end
This makes it possible not only to quickly jump to where an otherwise rare error occurs, but also, if needed, to manually set the custom-ticks value a few ticks earlier, in order to observe how things build up before the error occurs; something that, with rare errors, can otherwise be quite time-consuming.
NetLogo is all about keeping the code in one spot. When I run a simulation in 2D or 3D, I usually have an idea of what my whole system should produce at time point X. So when I'm testing, I usually color-code my agents, the "turtles", based on a variable I'm tracking (like the number of protein signals, etc.).
It can be as simple as making them RED when the variable you're wondering about is over a threshold, or BLUE when it is under. Or you can throw in another color, maybe GREEN, to track when the turtles of interest fall within the "optimal" range.
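As a sketch (signal-count and the thresholds are invented names):

; signal-count is turtles-own; high-threshold / low-threshold could be sliders
ask turtles [
  ifelse signal-count > high-threshold
    [ set color red ]                     ; over the threshold
    [ ifelse signal-count < low-threshold
        [ set color blue ]                ; under it
        [ set color green ] ]             ; within the "optimal" range
]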

Can I mark some code as optional while debugging in Visual Studio 2012?

I'm not sure how to really put my question into words so let me try to explain it with an example:
Let's say my program runs into some weird behavior on a specific action. I have already found the code which causes this weird behavior; when I disable this sequence I don't run into the behavior. Unfortunately, I need this code, because something else doesn't work without it.
So what I am going to do next is figure out why things go differently when that code excerpt is active.
In order to better understand what's going on, I sometimes want to run the whole action including the 'bad code' and sometimes without it. Then I can compare the outcomes, for example what happens in the UI or what my function returns.
The first approach which comes to mind is to run my program with the code enabled, do whatever I want, then stop the program, comment out the code, recompile and run again. Um... that sounds dumb, especially if I then need to turn the code back on to see the other behavior another time, and then off again, and on, and off, and so on.
It's not an option for me to use breakpoints to influence the statement order, or to modify values so that I do or do not run into if-statements, for-loops, etc. Two examples:
I am debugging timing-critical behavior, and when I halt the program the timing changes significantly. Thus, the first breakpoint I can set must be at the end of the action.¹
I expect a tooltip or other window to appear which is 'suppressed' when focus is given to VS. Thus, I cannot use any breakpoints at all, neither at the beginning nor at the end of the action.¹
Is there any technique in Visual Studio 2012 which allows me to mark this code to be optional and I can decide whether or not I want to run this code sequence before I execute the action? I think of something like if(true|false) on a higher level.
I'm not looking for a solution where I need to re-run my program several times. In that case I could still doing the simple approach of simply commenting out the code with #if false.
¹ Note that I, of course, may set a breakpoint when I need to look into a specific variable at a certain position (if I haven't written the value to output), but I will turn breakpoints off again to run the whole action in one go.
In the Visual Studio debugger you can set a breakpoint right in front of your "code in question". When the code stops at that point, you can elect to let it continue or you can right-click on any other line and select Set Next Statement.
It's kind of a weird option, but I've come to appreciate it.
The only option I can think of is to add something to your UI that only appears when debugging, giving you the option to include/exclude the operations in question.
While you're at it, you might want to enable resetting the application to a "known state" from the UI as well.
I think of something like if(true|false) on a higher level.
Why "on a higher level"? Why not use exactly this?
You want a piece of code sometimes executed, sometimes not, and the switch should be changed at run time, not at compile time - this obviously leads to
if (condition)
{
    // code at stake
}
The catch here is what kind of condition you will use - maybe a variable you set to true in the release version of your code and to false sometimes in your debug version. Maybe the value is taken from a configuration file, maybe from an environment variable, maybe calculated by some kind of logic in your program - whatever and whenever you like.
EDIT: you could also introduce a boolean variable for condition in your code, initialize it to true by default, and change its value using the debugger whenever you like.
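A minimal sketch of that idea (all names here are made up): keep a field you can flip from the Watch or Immediate window while the program is paused.

public class ActionRunner
{
    // Made-up flag: flip it in the debugger before triggering the action.
    private bool runDodgyCode = true;

    public void RunAction()
    {
        if (runDodgyCode)
        {
            DodgySequence();   // the code under suspicion
        }
        RestOfAction();
    }

    private void DodgySequence() { /* ... */ }
    private void RestOfAction() { /* ... */ }
}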
Preprocessor directives might be what you're after. They're instructions for the compiler rather than code that executes, identifiable by starting with a # character (and stylistically, by default they don't follow the indent pattern of your code, instead residing firmly at the left-hand edge of the editor):
// Note: in C#, #define must appear at the top of the file, before any code.
#define INCLUDE_DODGY_CODE

public void MyMethodWithDodgyBits() {
#if INCLUDE_DODGY_CODE
    myDodgyMethod();
#endif
    myOkMethod();
}
In this case, if #define INCLUDE_DODGY_CODE is present, the myDodgyMethod() call will be compiled into your program. Otherwise, the call will be skipped by the compiler and will simply not exist in your binary.
There are a couple of options for debugging as you ask.
Visual Studio has a number of options to directly navigate through code. You can use the Set Next Statement feature to move directly to a particular statement. You can also directly edit values through the Immediate Window, QuickWatch, and the tooltip that hovers over variables while debugging.
Visual Studio also has the ability to playback the execution history. Take a look at IntelliTrace to get started. It can be helpful when you have multiple areas of concern that are interacting and generating the error condition.
You can also wrap your sections of code within conditional blocks, and set the conditional variables as appropriate. That could be while you're debugging, or you could pass parameters in through a configuration file. Using conditional checks may be easier than manually stepping through code if there are a number of statements you wish to exclude.
It sometimes depends on the version of VS and the language, but you can happily edit the code (to comment it out, or wrap it in a big #if 0), then press Alt+F10, and the compiler will recompile, relink and continue execution as if you'd never fiddled with it.
While that works beautifully in VC++ (since VS v6, IIRC), C# can have issues - I find (with VS2010) that I cannot edit and continue in this way with functions containing any lambda (mainly LINQ) statements, and 64-bit code never used to support this either. Still, it's worth experimenting with, as it's really useful sometimes.
I have worked on applications that have optional code used for debugging alone that should not appear in the production environment. This segment of optional code was easiest for us to control using a config file, since it didn't require a recompile to change.
Such a fix might not be the be-all and end-all for your end result, but it might help until a proper fix is found. If you have multiple optional sections that need to be tested in combination, this style of fix could require multiple keys in the config file, which could be a downside and a pain to keep track of.
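A sketch of that approach using System.Configuration (the key name is invented); the switch can then be changed by editing App.config, with no recompile:

using System;
using System.Configuration;

public static class DebugSwitches
{
    // Expects <add key="RunOptionalCode" value="true" /> under <appSettings>.
    public static bool RunOptionalCode
    {
        get
        {
            return string.Equals(
                ConfigurationManager.AppSettings["RunOptionalCode"],
                "true", StringComparison.OrdinalIgnoreCase);
        }
    }
}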
Your question isn't exactly clear, which is possibly why there are so many answers which you think are invalid. You may want to consider rewording it if no one seems able to answer the question.
With the risk of giving another non-valid answer I'll add some input on how I've dealt with the issue in the past.
The easiest way is to place any optional code within
#if DEBUG
//Optional code here
#endif
That way, when you run in debug mode the code is implemented and when you run in release mode it's not. Switching between the two requires clicking one button.
I've also solved the same problem in a similar way with a simple flag:
bool runOptionalCode = false;
then
if (runOptionalCode)
{
//Place optional code here
}
Again, switching between modes requires changing one word, so it is a simple task. You mention this in your question but discount it for reasons that are unclear; as I said, it requires very little effort to switch between the two.
If you need to switch between the code paths while the program is running, the best way is to use a UI item or a keystroke which modifies the flag mentioned in the example above. Depending on your application, though, this could be more effort than it's worth. In the past I've found that when I already have a key listener implemented as part of the project, having a couple of keystrokes decide whether to run my debug (optional) code works best. In an application without key listeners I'd rather stick with one of the previous methods.

Is there a way to clear everything before starting in Mathematica?

In MATLAB there is the function clear to delete all current variables. This is very useful if you start something totally new and don't want conflicts with earlier calculations. I'm now searching for something similar in Mathematica, but I couldn't find anything except Clear[VAR], which removes only the variable VAR.
You can use ClearAll to clear the variables and their attributes in your Global context (default) like so:
ClearAll["Global`*"]
If you're working inside a different context (e.g., notebook specific context or cell group specific context), you can do
ClearAll[Evaluate[Context[] <> "*"]]
If you want to remove all symbols from the kernel so that Mathematica no longer recognizes them, you can use Remove[] in a manner similar to the above two examples.
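For example:

ClearAll["Global`*"]  (* clear values, definitions and attributes *)
Remove["Global`*"]    (* remove the symbols from the kernel entirely *)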
Barring these, you can always quit the kernel with Quit[], which removes all symbols. A fresh kernel will be started the next time you evaluate something.
I recommend one of two methods:
1. Keyboard shortcut to Quit[] the kernel
There is a system file KeyEventTranslations.tr that you can edit to customize keyboard shortcuts. I, like others, have added Ctrl+Q to Quit[] the kernel, allowing for rapid clearing of all session variables. For more information on setting this up, see:
Customizing Mathematica shortcuts
2. Give the new Notebook a unique context
In Mathematica, the current $Context defines which context unqualified symbol names belong to. By giving a new Notebook a unique context, which is easily done through the Evaluation menu, the symbols used in that Notebook will not collide with unqualified symbols in other Notebooks. See the following question for more detailed information:
Mathematica - Separating Notebooks
I just realized that you might not know that unlike MATLAB, Mathematica is designed to run as two separate processes: the Front End is the user interface, and lets you work with notebooks. The Kernel does the computations. You can quit the kernel without affecting the front end, or even start more than one kernel for different notebooks, or start a kernel on a remote computer and use it with a local front end.
I believe that the only reliable way to clean everything is to Quit the kernel and re-start it (which is automatic). There are just too many things that can get modified apart from user variables/functions (including In/Out, loaded packages, system caches, etc.). So if you need a truly fresh start, I recommend Quit.
For a "soft" reset, #yoda already mentioned ClearAll["Global`*"]. There's the << Utilities`CleanSlate` package, which automates a little bit more than this. You can read the package docs inside the AddOns\ExtraPackages\Utilities\CleanSlate.m file.
In short, CleanSlate[] will attempt to take you back to the kernel state when the package was loaded. ClearInOut[] will clear In and Out to save memory.
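Typical usage looks like this (a sketch based on the package's docs):

<< Utilities`CleanSlate`
CleanSlate[]   (* purge symbols created since the package was loaded *)
ClearInOut[]   (* clear the In/Out history to reclaim memory *)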
I haven't used this package in years (except for the ClearInOut[] functionality), as the Mathematica kernel starts up quickly on modern computers, so I just use Quit. I therefore can't tell you how well it works.

How to avoid debugger-only variables?

I commonly place values that are used only once after assignment into variables. I do this to make later debugging more convenient, as I can hover over the value on the one line where it is later used.
For example, this code doesn't let you hover the value of GetFoo():
return GetFoo();
But this code does:
var foo = GetFoo();
return foo; // your hover-foo is great
This smells very YAGNI-esque, as the foo assignment won't ever be used until someone needs to debug its value, which may never happen. If it weren't for the merely foreseen debugging session, the first code snippet above would keep the code simpler.
How would you write the code to best compromise between simplicity and ease of debugger use?
I don't know about other debuggers, but the integrated Visual Studio debugger will report what was returned from a function in the "Autos" window; once you step over the return statement, the return value shows up as "[function name] returned" with a value of whatever value was returned.
gdb supports the same functionality as well; the "finish" command executes the rest of the current function and prints the return value.
This being a very useful feature, I'd be surprised if most other debuggers didn't support this capability.
As for the more general "problem" of "debugger-only variables," are they really debugger-only? I tend to think that the use of well-named temporary variables can significantly improve code readability as well.
Another possibility is to learn enough assembly programming that you can read the code your compiler generates. With that skill, you can figure out where the value is being held (in a register, in memory) and see the value without having to store it in a variable.
This skill is very useful if you ever need to debug an optimized executable. The optimizer can generate code that is significantly different from how you wrote it, such that symbolic debugging is not helpful.
Another reason why you don't need intermediate variables in the Visual Studio debugger is that you can evaluate the function in the Watch window and the Immediate window. For the Watch window, simply highlight the statement you want evaluated and drag it into the window.
I'd argue that it's not worth worrying about. Given that there's no runtime overhead in the typical case, go nuts. I think that breaking down complex statements into multiple simple statements usually increases readability.
I would leave out the assignment until it is needed. If you never happen to be in that bit of code, wanting a look at that variable, you haven't cluttered up your code unnecessarily. When you run across the need, put it in (it should be a trivial Extract Variable refactoring). And when you're done with that debugging session, get rid of it (Inline Variable). If you find yourself debugging so much - and so much at that particular point - that you're weary of refactoring back and forth, then think about ways to avoid the need; maybe more unit tests would help.
