I'm relatively new to software development, and I'm on my way to completing my first app for the iPhone.
While learning Swift, I discovered that I could add functions outside the class definition and have them accessible across all views. After a while, I found myself writing many global functions for setting app preferences (registering defaults, UIAppearance, etc.).
Is this bad practice? The only alternative I could think of was creating a custom class to encapsulate them, but then the class itself wouldn't serve any purpose and I'd have to think of ways to pass it around between views.
Global functions: good (IMHO anyway, though some disagree)
Global state: bad (fairly universally agreed upon)
By which I mean: it's probably good practice to break your code up into lots of small utility functions, to make them general, and to re-use them, so long as they are "pure functions".
For example, suppose you find yourself checking whether all the entries in an array have a certain property. You might write a for loop over the array checking them. You might even re-use the standard library's reduce to do it. Or you could write a reusable function, all, that takes a closure that checks an element and runs it against every element in the array. It's nice and clear when you're reading code that says let allAboveGround = all(sprites) { $0.position.y > 0 } rather than a for…in loop that does the same thing. You can also write a separate unit test specifically for your all function and be confident it works correctly, rather than a much more involved test for a function that has a version of all embedded in it amongst other business logic.
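Here is a minimal sketch of such an all function, assuming a recent Swift version (the standard library now ships an equivalent called allSatisfy); the sprites array and its position property in the usage line come from the example above and are illustrative only:

func all<S: Sequence>(_ sequence: S, _ predicate: (S.Element) -> Bool) -> Bool {
    // A pure helper: it reads only its inputs and touches no global state.
    for element in sequence where !predicate(element) {
        return false
    }
    return true
}

// Usage (assuming sprites is an array whose elements have a position):
// let allAboveGround = all(sprites) { $0.position.y > 0 }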
Breaking up your code into smaller functions can also help you avoid needing var so much. In the loop version of the example above you would probably need a var to track the result, but the result of the all function can be assigned with let. Favoring immutable variables declared with let can help make your program easier to reason about and debug.
What you shouldn't do, as @drewag points out in his answer, is write functions that change global variables (or access singletons, which amount to the same thing). Any global function you write should operate only on its inputs and produce exactly the same result every time it is called. Global functions that mutate global state (i.e. change global variables, or change the values of variables passed to them by reference) can be incredibly confusing to debug due to the unexpected side effects they might cause.
There is one downside to writing pure global functions,* which is that you end up "polluting the namespace" – that is, you have all these functions lying around that might have specific relevance to a particular part of your program, but are accessible everywhere. To be honest, for a medium-sized application with well-written generic functions named sensibly, this is probably not an issue. If a function is purely of use to a specific struct or class, maybe make it a static method. If your project really is getting too big, you could perhaps factor your most general functions out into a separate framework, though this is quite a big overhead/learning exercise (and Swift frameworks aren't entirely fully baked yet), so if you are just starting out I'd suggest leaving this until you get more confident.
* edit: ok two downsides – member functions are more discoverable (via autocomplete when you hit .)
Updated after discussion with @AirspeedVelocity
Global functions can be OK, and they really aren't much different from having type methods or even instance methods on a custom type that is not actually intended to contain state.
The entire thing comes down mostly to personal preference. Here are some pros and cons.
Cons:
They can sometimes cause unintended side effects. That is, they can change some global state that you or the caller forgets about, causing hard-to-track-down bugs. As long as you are careful not to use global variables and ensure that your function always returns the same result for the same input regardless of the state of the rest of the system, you can mostly ignore this con.
They make code that uses them difficult to test, which is important once you start unit testing (a definite good policy in most circumstances). It is hard to test because you can't easily mock out the implementation of a global function, for example to change the value of a global setting. Instead, your test starts to depend on the other class that sets this global setting. Being able to inject a setting into your class, instead of having to fake out a global function, is generally preferable.
They sometimes hint at poor code organization. All of your code should be separable into small, single-purpose, logical units. This ensures your code will remain understandable as your code base grows in size and age. The exception is truly universal functions built around very high-level, reusable concepts, for example a function that tests all of the elements in a sequence. You can also still separate global functions into logical units by splitting them into well-named files.
Pros:
High-level global functions can be very easy to test. However, you still cannot ignore the need to test their logic where they are used, because your unit tests should not be written with knowledge of how your code is actually implemented.
Easily accessible. It can often be a pain to inject many types into another class (passing objects into an initializer and probably storing them as properties). Global functions can often remove this boilerplate code (even if that has the trade-off of being less flexible and less testable).
In the end, every code architecture decision is a balance of trade-offs each time you go to use it.
I have a Framework.swift that contains a set of common global functions such as local(str: String), which gets rid of the second parameter of NSLocalizedString. There are also a number of alert functions that use local internally and take varying numbers of parameters, which makes using NSAlert for modal dialogs easier.
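A minimal sketch of what such a wrapper might look like (the function name and shape follow this answer's local(str:) convention, not any library API):

import Foundation

func local(str: String) -> String {
    // Drops the comment argument that NSLocalizedString otherwise requires.
    return NSLocalizedString(str, comment: "")
}

// let title = local(str: "settings.window.title")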
So for that purpose global functions are good. They become a bad habit when it comes to information hiding, where you would expose internal class knowledge to some global functionality.
Related
As I understand it, MATLAB cannot pass arguments to other functions by reference. I am doing audio processing, and I frequently have to pass waveforms as arguments into functions; because MATLAB passes these arguments by value, this really eats up a lot of RAM.
I was considering using global variables as a way to pass my waveforms into functions, but everywhere I read there seems to be a general opinion that this is a bad idea, both for code organization and potentially for performance... but I haven't really read any detailed answers on how this might impact performance...
My question: What are the negative impacts of using global variables (with sizes > 100MB) to pass arguments to other functions in MATLAB, both in terms of 1) performance and 2) general code organization and good practice.
EDIT: From @Justin's answer below, it turns out MATLAB does on occasion use pass by reference when you do not modify the argument within the function! From this, I have a second related question about global variable performance:
Will using global variables be any slower than using pass by reference arguments to functions?
MATLAB does use pass by reference, but it also uses copy-on-write. That is to say, your variable will be passed by reference into the function (and so won't double up on RAM), but if you change the variable within the function, MATLAB will create a copy and change the copy (leaving the original unaffected).
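A small sketch of the behaviour (the file, function and variable names are illustrative, not from the original post):

function copyOnWriteDemo()
    waveform = zeros(1, 25e6);            % roughly 200 MB of doubles
    peak = findPeak(waveform);            % no copy: the argument is only read
    clipped = clipFirstSample(waveform);  % a copy is made only at the first write
end

function p = findPeak(w)
    p = max(abs(w));                      % reading w does not trigger a copy
end

function w = clipFirstSample(w)
    w(1) = 0;                             % writing to w forces MATLAB to copy it
end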
This fact doesn't seem to be too well known, but there's a good post on Loren's blog discussing it.
Bottom line: it sounds like you don't need to use global variables at all (which are a bad idea, as @Adriaan says).
While relying on copy-on-write as Justin suggests is typically the best choice, you can easily implement pass by reference. With MATLAB OOP being nearly as fast as traditional functions in R2015b or newer, using a handle class is a reasonable option.
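A sketch of that approach (the class name WaveformRef is illustrative): any function that receives the handle object works on the same underlying data as the caller, so the large waveform is not copied when passed around.

classdef WaveformRef < handle
    properties
        data    % the large waveform lives here
    end
    methods
        function obj = WaveformRef(data)
            obj.data = data;
        end
        function normalize(obj)
            obj.data = obj.data ./ max(abs(obj.data));  % change is visible to the caller
        end
    end
end

% Usage: ref = WaveformRef(waveform); then pass ref to your processing functions.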
I encountered an interesting use case for a global variable yesterday. I tried to parallelise a piece of code (1200 lines, multiple functions inside the main function, not written by me) using parfor.
Some weird errors came out, and it turned out that this piece of code wrote to a log file, but used multiple functions to do so. Rather than opening and closing the relevant log file every time a function wanted to write to it, which is very slow, the file ID was made global so that all write functions could access it.
For the serial case this made perfect sense, but when trying to parallelise it, using global apparently breaks the scope of a worker instance as well. So suddenly we had four workers all trying to write into the same log file, which resulted in some weird errors.
So all in all, I maintain my position that using global variables is generally a bad idea, although I can see their use in specific cases, provided you know what you're doing.
Using global variables in MATLAB can increase performance a lot, because in some cases they allow you to avoid copying data.
Before attempting to gain such performance tweaks, think carefully about the cost to your project in terms of the many drawbacks that global variables come with. There are also pitfalls to using globals with bad consequences for performance, and those may be difficult (although possible) to avoid. Any code that is littered with globals tends to be difficult to comprehend.
If you want to see globals used for performance, you can look at this real-time toolbox for optical flow that I made. It is the only project in native MATLAB capable of real-time optical flow that I know of, and using globals was one of the reasons this was doable. It is also a reason why the code is quite difficult to grasp: globals are evil.
That globals can be used this way is not an argument for their use; rather, it should be a hint that something should be improved about MATLAB's inflexible notion of workspaces and its inefficient alternatives to globals such as guidata/getappdata/setappdata.
I'm working on a third-party program that aggregates data from a bunch of different, existing Windows programs. Each program has a mechanism for exporting the data via the GUI. The most brain-dead approach would be to use AutoIt or some other GUI-manipulation program to drive the extractions through the GUI. The problem with this is that people might be interacting with the computer when, suddenly, some automated program takes over. That's no good. What I really want is a program that runs once a day and silently (i.e. without popping up any GUIs) exports the data from each program.
My research is telling me that I need to hook each application (assume these applications are always running) and inject a custom DLL to trigger each export. Am I remotely close to being on the right track? I'm a fairly experienced software dev, but I don't know a whole lot about reverse engineering or hooking. Any advice or direction would be greatly appreciated.
Edit: I'm trying to manage the availability of a certain type of professional. Their schedules are stored in proprietary systems. With their permission, I want to install an app on their system that extracts their schedule from whichever system they are using and uploads the information to a central server so that I can present that information to potential clients.
I am aware of four ways of extracting the information you want, each with its own advantages and disadvantages. Before you do anything, you need to be aware that any solution you create is not guaranteed, and in fact is very unlikely, to continue working should the target application ever update. The reason is that in each case you are relying on an implementation detail rather than a pre-defined interface through which to export your data.
Hooking the GUI
The first way is to hook the GUI, as you have suggested. What you are doing in this case is simply reading off what an actual user would see. This is in general easier, since you are hooking the WinAPI, which is clearly defined. One danger is that what the program displays may be inconsistent or incomplete in comparison to the internal data it is supposed to represent.
Typically, there are two common ways to perform WinAPI hooking:
DLL Injection. You create a DLL which you load into the other program's virtual address space. This means that you have read/write access (writable access can be gained with VirtualProtect) to the target's entire memory. From here you can trampoline the functions which are called to set UI information. For example, to check if a window has changed its text, you might trampoline the SetWindowText function. Note that every control has a different interface used to set what it is displaying. In this case, you are hooking the functions called by the code to set the display.
SetWindowsHookEx. Under the covers, this works similarly to DLL injection and in this case is really just another method for you to extend/subvert the control flow of messages received by controls. What you want to do in this case is hook the window procedures of each child control. For example, when an item is added to a ComboBox, it would receive a CB_ADDSTRING message. In this case, you are hooking the messages that are received when the display changes (a sketch of this approach follows after the caveat below).
One caveat with this approach is that it will only work if the target is using or extending WinAPI controls.
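To make the second option concrete, here is a rough sketch of a hook DLL using SetWindowsHookEx with WH_CALLWNDPROC; how you pick the target thread and what you do with the captured strings (here just OutputDebugString) are placeholders you would replace:

#include <windows.h>

static HHOOK g_hook = nullptr;
static HINSTANCE g_module = nullptr;

BOOL APIENTRY DllMain(HINSTANCE instance, DWORD reason, LPVOID) {
    if (reason == DLL_PROCESS_ATTACH) g_module = instance;
    return TRUE;
}

// Called for every message sent to a window of the hooked thread.
LRESULT CALLBACK CallWndProc(int code, WPARAM wParam, LPARAM lParam) {
    if (code == HC_ACTION) {
        const CWPSTRUCT* msg = reinterpret_cast<const CWPSTRUCT*>(lParam);
        if (msg->message == CB_ADDSTRING) {
            // A string is being added to a combo box; lParam points to it
            // (assuming a Unicode target).
            OutputDebugStringW(reinterpret_cast<const wchar_t*>(msg->lParam));
        }
    }
    return CallNextHookEx(g_hook, code, wParam, lParam);
}

// Exported installer: hook the target's GUI thread by its thread id.
extern "C" __declspec(dllexport) BOOL InstallHook(DWORD threadId) {
    g_hook = SetWindowsHookExW(WH_CALLWNDPROC, CallWndProc, g_module, threadId);
    return g_hook != nullptr;
}

extern "C" __declspec(dllexport) void RemoveHook() {
    if (g_hook) UnhookWindowsHookEx(g_hook);
}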
Reading from the GUI
Instead of hooking the GUI, you can alternatively use the WinAPI to read directly from the target windows. However, in some cases this may not be allowed. There is not much to do here but to try it and see if it works; this may in fact be the easiest approach. Typically, you will send messages such as WM_GETTEXT to query the target window for what it is currently displaying. To do this, you will need to obtain the exact window hierarchy containing the control you are interested in. For example, say you want to read an edit control: you will need to see which parent window(s) are above it in the window hierarchy in order to obtain its window handle.
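A minimal sketch of that technique; the window title and the assumption that the first Edit child is the one you want are placeholders for the real target's hierarchy:

#include <windows.h>
#include <iostream>

int main() {
    // Walk the (assumed) hierarchy: top-level window -> child edit control.
    HWND top = FindWindowW(nullptr, L"Target Application Title");  // placeholder title
    if (!top) { std::wcerr << L"window not found\n"; return 1; }

    HWND edit = FindWindowExW(top, nullptr, L"Edit", nullptr);     // first Edit child
    if (!edit) { std::wcerr << L"control not found\n"; return 1; }

    wchar_t buffer[512] = {};
    // WM_GETTEXT: wParam = buffer size in characters, lParam = the buffer.
    SendMessageW(edit, WM_GETTEXT, 512, reinterpret_cast<LPARAM>(buffer));
    std::wcout << buffer << L"\n";
    return 0;
}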
Reading from memory (Advanced)
This approach is by far the most complicated but if you are able to fully reverse engineer the target program, it is the most likely to get you consistent data. This approach works by you reading the memory from the target process. This technique is very commonly used in game hacking to add 'functionality' and to observe the internal state of the game.
Consider that as well as storing information in the GUI, programs often hold their own internal model of all the data. This is especially true when the controls used are virtual and simply query subsets of the data to be displayed. This is an example of a situation where the first two approaches would not be of much use. This data is often held in some sort of abstract data type such as a list or perhaps even an array. The trick is to find this list in memory and read the values off directly. This can be done externally with ReadProcessMemory or internally through DLL injection again. The difficulty lies mainly in two prerequisites:
Firstly, you must be able to reliably locate these data structures. The problem is that addresses are not guaranteed to be the same from run to run, especially with features such as ASLR; colloquially, this is sometimes referred to as code-shifting. ASLR can be defeated by using an offset from a module base and getting the module base address dynamically with functions such as GetModuleHandle. Another reason addresses move around is dynamic memory allocation (e.g. through malloc). In such cases, you will need to find a heap address storing the pointer (which would, for example, be the return value of malloc), dereference it and find your list. That pointer is itself prone to moving, and instead of a single pointer it might be a double pointer, triple pointer, etc.
The second problem you face is that it would be rare for each list item to be a primitive type. For example, instead of a list of character arrays (strings), it is likely that you will be faced with a list of objects. You would need to further reverse engineer each object type and understand its internal layout (at least be able to determine the offsets, from the object base, of the primitive values you are interested in). More advanced methods revolve around actually reverse engineering the vtables of objects and calling their 'API'.
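The mechanics, as opposed to the target-specific details, look roughly like this generic skeleton; the process id and every offset below are placeholders that only your own reverse engineering of the target could fill in:

#include <windows.h>
#include <iostream>

int main() {
    DWORD pid = 1234;   // placeholder: the target's process id
    HANDLE process = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (!process) { std::cerr << "OpenProcess failed\n"; return 1; }

    // Placeholder module base; in practice you would resolve it at runtime,
    // e.g. by enumerating the target's modules with CreateToolhelp32Snapshot.
    BYTE* moduleBase = reinterpret_cast<BYTE*>(0x00400000);

    // Suppose module base + 0x1A2B3C holds a pointer to the structure of interest.
    UINT_PTR structureAddress = 0;
    ReadProcessMemory(process, moduleBase + 0x1A2B3C,
                      &structureAddress, sizeof(structureAddress), nullptr);

    // Suppose the value we want is a double stored at offset 0x10 inside it.
    double value = 0.0;
    ReadProcessMemory(process, reinterpret_cast<const void*>(structureAddress + 0x10),
                      &value, sizeof(value), nullptr);

    std::cout << "value read from target: " << value << "\n";
    CloseHandle(process);
    return 0;
}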
You might notice that I am not able to give information here which is specific. The reason is that by its nature, using this method requires an intimate understanding of the target's internals and as such, the specifics are defined only by how the target has been programmed. Unless you have knowledge and experience of reverse engineering, it is unlikely you would want to go down this route.
Hooking the target's internal API (Advanced)
As with the above solution, instead of digging for data structures, you dig for the internal API. I briefly covered this when discussing vtables earlier. Here you are attempting to find the internal functions that are called when the GUI is modified. Typically, when a view/UI is modified, instead of calling the WinAPI directly to update it, a program will have its own wrapper function which it calls, which in turn calls the WinAPI. You simply need to find this function and hook it. Again this is possible, but it requires reverse engineering skills. You may also find that you discover functions which you want to call yourself. In that case, as well as being able to locate the function, you have to reverse engineer the parameters it takes and its calling convention, and you will need to ensure that calling the function has no side effects.
I would consider this approach to be advanced. It can certainly be done and is another common technique used in game hacking to observe internal states and to manipulate a target's behaviour, but is difficult!
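For a flavour of what such a hook might look like once a function has been located, here is a hypothetical sketch using Microsoft Detours from inside an injected DLL; the function name (UpdateScheduleEntry), its offset, its signature and its calling convention are all assumptions standing in for whatever your reverse engineering uncovers:

#include <windows.h>
#include <detours.h>

using UpdateScheduleEntryFn = void (*)(const wchar_t* entry);
static UpdateScheduleEntryFn g_original = nullptr;

// Replacement: capture the data, then call through so the target behaves normally.
static void HookedUpdateScheduleEntry(const wchar_t* entry) {
    OutputDebugStringW(entry);   // placeholder: hand the value to your exporter
    g_original(entry);
}

// Typically called once after the DLL has been injected into the target.
void InstallInternalHook() {
    BYTE* base = reinterpret_cast<BYTE*>(GetModuleHandleW(nullptr));
    g_original = reinterpret_cast<UpdateScheduleEntryFn>(base + 0x5A1C0);  // placeholder offset

    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach(reinterpret_cast<PVOID*>(&g_original),
                 reinterpret_cast<PVOID>(HookedUpdateScheduleEntry));
    DetourTransactionCommit();
}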
The first two methods are well suited to reading data from WinAPI programs and are by far the easier ones. The latter two methods allow greater flexibility: with enough work, you are able to read anything and everything encapsulated by the target, but they require a lot of skill.
Another point of concern, which may or may not relate to your case, is how easy it will be to update your solution should the target ever be updated. With the first two methods, it is likely that no changes or only small changes will be needed. With the latter two, even a small change in the source code can relocate the offsets you are relying upon. One way of dealing with this is to use byte signatures to generate the offsets dynamically. I wrote another answer some time ago which addresses how this is done.
What I have written is only a brief summary of the various techniques that can be used for what you want to achieve. I may have missed approaches, but these are the most common ones I know of and have experience with. Since these are large topics in themselves, I would advise you to ask a new question if you want more detail about any particular one. Note that none of the approaches I have discussed involve any interaction visible to the outside world, so you would have no problem with anything popping up. It would be, as you describe, 'silent'.
This is relevant information about detouring/trampolining which I have lifted from a previous answer I wrote:
If you are looking for ways that programs detour execution of other processes, it is usually through one of two means:
Dynamic (Runtime) Detouring - This is the more common method and is what is used by libraries such as Microsoft Detours. Here is a relevant paper where the first few bytes of a function are overwritten to unconditionally branch to the instrumentation.
(Static) Binary Rewriting - This is a much less common method for rootkits, but is used by research projects. It allows detouring to be performed by statically analysing and overwriting a binary. An old (not publicly available) package for Windows that performs this is Etch. This paper gives a high-level view of how it works conceptually.
Although Detours demonstrates one method of dynamic detouring, there are countless methods used in the industry, especially in the reverse engineering and hacking arenas. These include the IAT and breakpoint methods I mentioned above. To 'point you in the right direction' for these, you should look at 'research' performed in the fields of research projects and reverse engineering.
I got a code review comment saying that my module has behavior and state at the same time. What does that mean, anyway?
Isn't that the whole point of object-oriented programming? Instead of operating on data directly with logical circuitry using functions, we choose to operate on these closed black boxes (encapsulation) using a set of neatly designed keys, switches and gears.
Wouldn't such a scheme naturally contain data (state) and logic (behavior) at the same time?
By module I mean: a real Ruby module.
I designed something like this: How to design an application keeping SOLID principles and Design Patterns in mind, and implemented the commands in a module which I used as a mixin.
Whatever you are referring to, be it an object defined by a class (or type), a module, or anything else with code in it, state is data that is persisted over multiple calls to the thing. If it "remembers" anything between one execution and the next, then it has state.
Behavior, on the other hand, is code that manipulates or processes that state data, or non-state data that is used only during a single execution of the code (like parameter values passed to a function). Methods, subroutines or functions, anything that changes or does something, is behavior.
Most classes, types, or whatever, have both data (state) and behavior, but....
Some classes or types are designed simply to carry data around. They are referred to as Data Transfer Objects (DTOs) or Plain Old Container Objects (POCOs). They only have state and generally have little or no behavior.
Other times, a class or type is constructed to hold general utility functions (like a math library). It will not maintain or keep any state between the many times it is called to perform one of its utilities. The only data used is data passed in as parameters for each call to the library function, and that data is discarded when the routine is finished. It has behavior, but no state.
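A hypothetical sketch in Ruby (the question's own context) contrasting a module that has only behavior with one that also carries state; both module names are made up for illustration:

# Behavior only: the result depends solely on the arguments; nothing is remembered.
module Geometry
  def self.circle_area(radius)
    Math::PI * radius**2
  end
end

# Behavior plus state: the module remembers data between calls, which is the
# kind of thing a review comment like the one above usually points at.
module CommandLog
  @entries = []            # module-level state, shared by every caller

  def self.record(command)
    @entries << command    # behavior that mutates the shared state
  end

  def self.entries
    @entries.dup
  end
end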
You're right in thinking that OOP encapsulates the ideas of both behaviour and state and mixes the two together, but from the wording of your question I wonder if you have written a Ruby module (mixin, whatever you want to call it) that is stateful, such that there is the potential for state to leak across multiple uses of the same module.
Without seeing the code in question I can't really give you a full answer.
In object-oriented terminology, an object is said to have state when it encapsulates data (attributes, properties), and it is said to have behavior when it offers operations (methods, procedures, functions) that operate on that data (create, delete, modify, perform calculations).
The same concepts can be extrapolated to a Ruby module: it has "state" if it defines data accessible within the module, and it has "behavior" in the form of operations it provides that operate on that data.
Are there any languages with the possibility of declaring global assertions, that is, assertions that should hold during the whole program execution? So that it would be possible to write something like:
global assert (-10 < speed < 10);
and this assertion will be checked every time speed changes state?
Eiffel supports all the different contracts: preconditions, postconditions, invariants... you may want to use that.
On the other hand, why do you have a global variable? Why don't you create a class which modifies the speed? Doing so, you can easily check your condition every time the value changes.
I'm not aware of any languages that truly do such a thing, and I doubt any exist, since it is rather hard to implement and at the same time not something that many people need.
It is often better to simply assert that the inputs are valid and that modifications are only done when allowed and in a defined, sane way. This removes the need for "global asserts".
You can get this effect "through the backdoor" in several ways, though none is truly elegant, and two are rather system-dependent:
If your language allows operator overloading (such as e.g. C++), you can make a class that overloads every operator which modifies the value. It is a fair amount of work, but mechanically straightforward, to do the assertions in there (a sketch follows after this list).
On pretty much every system, you can change the protection of the memory pages that belong to your process. You could put the variable (and any other variables that you want to assert on) on a separate page and set that page to read-only. This causes a segmentation fault when the value is written, which you can catch (and then verify that the assertion holds). Windows even makes this explicitly available via "guard pages" (which are really only "readonly pages in disguise").
Most modern processors support hardware breakpoints. Unless your program is to run on some very exotic platform, you can exploit these to get more fine-grained control in a similar way as by tampering with protections. See for example this article on another site, which describes how to do it under Windows on x86. This solution requires you to write a kind of "mini-debugger" and implies that you may run into trouble when running your program under a real debugger.
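A rough sketch of the first option in C++ (the class name and the bounds are illustrative; a full version would overload the remaining mutating operators too):

#include <cassert>

template <typename T, T Min, T Max>
class Bounded {
    T value_;
    void check() const { assert(Min < value_ && value_ < Max); }
public:
    explicit Bounded(T v) : value_(v) { check(); }
    Bounded& operator=(T v)  { value_ = v;  check(); return *this; }
    Bounded& operator+=(T v) { value_ += v; check(); return *this; }
    Bounded& operator-=(T v) { value_ -= v; check(); return *this; }
    operator T() const { return value_; }   // reads need no check
};

int main() {
    Bounded<int, -10, 10> speed{0};
    speed += 5;    // fine
    speed += 20;   // fires the assertion: -10 < speed < 10 no longer holds
    return 0;
}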
I'm working on a project in C#. The previous programmer didn't know object oriented programming, so most of the code is in huge files (we're talking around 4-5000 lines) spread over tens and sometimes hundreds of methods, but only one class. Refactoring such a project is a huge undertaking, and so I've semi-learned to live with it for now.
Whenever a method is used in one of the code files, the class is instantiated and then the method is called on the object instance.
I'm wondering whether there are any noticeable performance penalties in doing it this way? Should I make all the methods static "for now" and, most importantly, will the application benefit from it in any way?
From here, a static call is 4 to 5 times faster than constructing an instance every time you call an instance method. However, we're still only talking about tens of nanoseconds per call, so you're unlikely to notice any benefit unless you have really tight loops calling a method millions of times, and you could get the same benefit by constructing a single instance outside that loop and reusing it.
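For example, something along these lines recovers most of that difference without making anything static; LegacyCalculator and DoWork are stand-ins for whatever class and method the project actually has:

class LegacyCalculator
{
    public int DoWork(int x) => x * 2;
}

class Program
{
    static void Main()
    {
        // Construct once and reuse, rather than writing
        // new LegacyCalculator().DoWork(i) inside the loop.
        var calc = new LegacyCalculator();
        long total = 0;
        for (int i = 0; i < 1000000; i++)
        {
            total += calc.DoWork(i);
        }
        System.Console.WriteLine(total);
    }
}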
Since you'd have to change every call site to use the newly static method, you're probably better off spending your time on gradual refactoring.
I have dealt with a similar problem where I work. The programmer before me created one controller class where all the BLL functions were dumped.
We are redesigning the system now and have created many controller classes depending on what they are supposed to control, e.g.
UserController, GeographyController, ShoppingController...
Inside each controller class there are static methods which make calls to the cache or the DAL using the singleton pattern.
This has given us two main advantages. One is that it is slightly faster (around 2-3 times faster, but we're talking nanoseconds here ;P). The other is that the code is much cleaner,
i.e.
ShoppingController.ListPaymentMethods()
instead of
new ShoppingController().ListPaymentMethods()
I think it makes sense to use static methods or classes if the class doesn't maintain any state.
It depends on what else that object contains. If the "object" is just a bunch of functions, then it's probably not the end of the world. But if the object contains a bunch of other objects, then instantiating it is going to call all their constructors (and destructors, when it's deleted) and you may get memory fragmentation and so on.
That said, it doesn't sound like performance is your biggest problem right now.
You have to determine the goals of the rewrite. If you want nice testable, extendable and maintainable OO code, then you could try to use objects and their instance methods. After all, this is object-oriented programming we're talking about here, not class-oriented programming.
It is very straightforward to fake and/or mock objects when you define classes that implement interfaces and you execute instance methods. This makes thorough unit testing quick and effective.
Also, if you are to follow good OO principles (see SOLID at http://en.wikipedia.org/wiki/SOLID_%28object-oriented_design%29) and/or use design patterns, you will certainly be doing a lot of instance-based, interface-based development, and not using many static methods.
As for this suggestion:
It seems silly to me to create an object JUST so you can call a method which seemingly has no side effects on the object (from your description I assume this).
I see this a lot in .NET shops and to me it violates encapsulation, a key OO concept. I should not be able to tell whether a method has side effects by whether or not the method is static. As well as breaking encapsulation, this means that you will need to change methods from static to instance if/when you modify them to have side effects. I suggest you read up on the open/closed principle for this one and see how the suggested approach, quoted above, works with that in mind.
Remember that old chestnut, 'premature optimization is the root of all evil'. I think in this case it means: don't jump through hoops using inappropriate techniques (i.e. class-oriented programming) until you know you have a performance issue. Even then, debug the issue and look for the most appropriate fix.
Static methods are a lot faster and use a lot less memory. There is a misconception that they are just a little faster. They are only a little faster as long as you don't put them in loops. BTW, some loops look small but really aren't, because the method call containing the loop is itself called inside another loop. You can tell the difference in code that performs rendering functions. That they use a lot less memory is, unfortunately, true in many cases. An instance allows easy sharing of information with sister methods; a static method will ask for the information whenever it needs it.
But as with driving cars, speed brings responsibility. Static methods usually take more parameters than their instance counterparts. Because an instance takes care of caching shared variables, your instance methods will look prettier.
ShapeUtils.DrawCircle(stroke, pen, origin, radius);
ShapeUtils.DrawSquare(stroke, pen, x, y, width, length);
VS
ShapeUtils utils = new ShapeUtils(stroke, pen);
utils.DrawCircle(origin, radius);
utils.DrawSquare(x, y, width, length);
In this case, when the instance variables are used by most of the methods most of the time, instance methods are well worth it. Instances are NOT ABOUT STATE, they are about SHARING; although COMMON STATE is a natural form of SHARING, they are NOT THE SAME. The general rule of thumb is this: if a method is tightly coupled with other methods (they love each other so much that when one is called, the other needs to be called too, and they probably share the same cup of water), it should be made an instance method. Translating static methods into instance methods is not that hard: you only need to take the shared parameters and make them instance variables. The other way around is harder.
Or you can make a proxy class that bridges the static methods. While it may seem more inefficient in theory, practice tells a different story. Whenever you need to call DrawSquare once (or in a loop), you go straight to the static method; but whenever you are going to use it over and over along with DrawCircle, you use the instance proxy. An example is the System.IO pair FileInfo (instance) vs File (static).
Static methods are testable. In fact, they are even more testable than instance ones. A method GetSum(x, y) is very easy not only to unit test but also to load test, integration test and usage test. Instance methods are good for unit tests but horrible for every other kind of test (which matter more than unit tests, BTW), which is why we get so many bugs these days. The thing that makes ANY method untestable is parameters that don't make sense, like (Sender s, EventArgs e), or global state like DateTime.Now. In fact, static methods are so good at testability that you see fewer bugs in the C code of a new Linux distro than in your average OO programmer's code (he's full of s*** I know).
I think you've partially answered this question in the way you asked it: are there any noticeable performance penalties in the code that you have?
If the penalties aren't noticeable, you needn't necessarily do anything at all. (Though it goes without saying the codebase would benefit dramatically from a gradual refactor into a respectable OO model.)
I guess what I'm saying is, a performance problem is only a problem when you notice that it's a problem.
It seems silly to me to create an object JUST so you can call a method which seemingly has no side effects on the object (from your description I assume this). It seems to me that a better compromise would be to have several global objects and just use those. That way you can put the variables that would normally be global into the appropriate classes so that they have slightly smaller scope.
From there you can slowly move the scope of these objects to be smaller and smaller until you have a decent OOP design.
Then again, the approach that I would probably use is different ;).
Personally, I would likely focus on structures and the functions which operate on them and try to convert these into classes with members little by little.
As for the performance aspect of the question, static methods should be slightly faster (but not much), since they don't involve constructing, passing and destructing an object.
That's not valid in PHP, where an object method is faster:
http://www.vanylla.it/tests/static-method-vs-object.php