Explain/Give example of "Hide pointer operations" in Code Complete 2 - coding-style

I am reading Code Complete 2, Chapter 7.1 and I don't understand the point author said below.
7.1 Valid Reasons to Create a Routine
Hide pointer operations
Pointer operations tend to be hard to read and error prone. By isolating them in routines (or a class, if appropriate), you can concentrate on the intent of the operation rather than the mechanics of pointer manipulation. Also, if the operations are done in only one place, you can be more certain that the code is correct. If you find a better data type than pointers, you can change the program without traumatizing the routines that would have used the pointers.
Please explain or give example of this purpose.

Essentially, the advice is a specific example of the data-hiding. It boils down to this -
Stick to Object-oriented design and hide your data within objects.
In case of pointers, the norm is to NEVER expose pointers of "internal" data-structures as public members. Rather make them private and expose ONLY certain meaningful manipulations that are allowed to be performed on the pointers as public member functions.
Portable / Easy to maintain
The added advantage (as explained in the section quoted) is that any change in the internal data structures never forces the external API to be changed. Only the internal implementation of the publicly exposed member functions needs to be modified to handle any changes.
Code re-use / Easy to debug
Also pointer manipulations are now NOT copy/pasted and littered all around the code with no idea what exactly they do. They are now limited to the member functions which are written keeping in mind how exactly the internal data structures are being manipulated.
For example if we have a table of data which the user is allowed to add rows into,
Do NOT expose
pointers to the head/tail of table.
pointers to the individual elements.
Instead create a table object that exposes the functions
addNewRowTop(newData)
addNewRowBottom(newData)
addNewRow(position, newData)
To take this further, we implement addNewRowTop() and addNewRowBottom() by simply calling addNewRow() with the proper position - another internal variable of the table object.

Related

Is it bad to have many global functions?

I'm relatively new to software development, and I'm on my way to completing my first app for the iPhone.
While learning Swift, I learned that I could add functions outside the class definition, and have it accessible across all views. After a while, I found myself making many global functions for setting app preferences (registering defaults, UIAppearance, etc).
Is this bad practice? The only alternate way I could think of was creating a custom class to encapsulate them, but then the class itself wouldn't serve any purpose and I'd have to think of ways to passing it around views.
Global functions: good (IMHO anyway, though some disagree)
Global state: bad (fairly universally agreed upon)
By which I mean, it’s probably a good practice to break up your code to create lots of small utility functions, to make them general, and to re-use them. So long as they are “pure functions”
For example, suppose you find yourself checking if all the entries in an array have a certain property. You might write a for loop over the array checking them. You might even re-use the standard reduce to do it. Or you could write a re-useable function, all, that takes a closure that checks an element, and runs it against every element in the array. It’s nice and clear when you’re reading code that goes let allAboveGround = all(sprites) { $0.position.y > 0 } rather than a for…in loop that does the same thing. You can also write a separate unit test specifically for your all function, and be confident it works correctly, rather than a much more involved test for a function that includes embedded in it a version of all amongst other business logic.
Breaking up your code into smaller functions can also help avoid needing to use var so much. For example, in the above example you would probably need a var to track the result of your looping but the result of the all function can be assigned using let. Favoring immutable variables declared with let can help make your program easier to reason about and debug.
What you shouldn’t do, as #drewag points out in his answer, is write functions that change global variables (or access singletons which amount to the same thing). Any global function you write should operate only on their inputs and produce the exact same results every time regardless of when they are called. Global functions that mutate global state (i.e. make changes to global variables (or change values of variables passed to them as arguments by reference) can be incredibly confusing to debug due to unexpected side-effects they might cause.
There is one downside to writing pure global functions,* which is that you end up “polluting the namespace” – that is, you have all these functions lying around that might have specific relevance to a particular part of your program, but accessible everywhere. To be honest, for a medium-sized application, with well-written generic functions named sensibly, this is probably not an issue. If a function is purely of use to a specific struct or class, maybe make it a static method. If your project really is getting too big, you could perhaps factor out your most general functions into a separate framework, though this is quite a big overhead/learning exercise (and Swift frameworks aren’t entirely fully-baked yet), so if you are just starting out so I’d suggest leaving this for now until you get more confident.
* edit: ok two downsides – member functions are more discoverable (via autocomplete when you hit .)
Updated after discussion with #AirspeedVelocity
Global functions can be ok and they really aren't much different than having type methods or even instance methods on a custom type that is not actually intended to contain state.
The entire thing comes down mostly to personal preference. Here are some pros and cons.
Cons:
They sometimes can cause unintended side effects. That is they can change some global state that you or the caller forgets about causing hard to track down bugs. As long as you are careful about not using global variables and ensure that your function always returns the same result with the same input regardless of the state of the rest of the system, you can mostly ignore this con.
They make code that uses them difficult to test which is important once you start unit testing (which is a definite good policy in most circumstances). It is hard to test because you can't mock out the implementation of a global function easily. For example, to change the value of a global setting. Instead your test will start to depend on your other class that sets this global setting. Being able to inject a setting into your class instead of having to fake out a global function is generally preferable.
They sometimes hint at poor code organization. All of your code should be separable into small, single purpose, logical units. This ensures your code will remain understandable as your code base grows in size and age. The exception to this is truly universal functions that have very high level and reusable concepts. For example, a function that lets you test all of the elements in a sequence. You can also still separate global functions into logical units by separating them into well named files.
Pros:
High level global functions can be very easy to test. However, you cannot ignore the need to still test their logic where they are used because your unit test should not be written with knowledge of how your code is actually implemented.
Easily accessible. It can often be a pain to inject many types into another class (pass objects into an initializer and probably store it as a property). Global functions can often remove this boiler plate code (even if it has the trade off of being less flexible and less testable).
In the end, every code architecture decision is a balance of trade offs each time you go to use it.
I have a Framework.swift that contains a set of common global functions like local(str:String) to get rid of the 2nd parameter from NSLocalize. Also there are a number of alert functions internally using local and with varying number of parameters which makes use of NSAlert as modal dialogs more easy.
So for that purpose global functions are good. They are bad habit when it comes to information hiding where you would expose internal class knowledge to some global functionality.

Is 'handle' synonymous to pointer in WinAPI?

I've been reading some books on windows programming in C++ lately, and I have had some confusing understanding of some of the recurring concepts in WinAPI. For example, there are tons of data types that start with the handle keyword'H', are these supposed to be used like pointers? But then there are other data types that start with the pointer keyword 'P'. So I guess not. Then what is it exactly? And why were pointers to some data types given separate data types in the first place? For example, PCHAR could have easily designed to be CHAR*?
Handles used to be pointers in early versions of Windows but are not anymore. Think of them as a "cookie", a unique value that allows Windows to find back a resource that was allocated earlier. Like CreateFile() returns a new handle, you later use it in SetFilePointer() and ReadFile() to read data from that same file. And CloseHandle() to clean up the internal data structure, closing the file as well. Which is the general pattern, one api function to create the resource, one or more to use it and one to destroy it.
Yes, the types that start with P are pointer types. And yes, they are superfluous, it works just as well if you use the * yourself. Not actually sure why C programmers like to declare them, I personally think it reduces code readability and I always avoid them. But do note the compound types, like LPCWSTR, a "long pointer to a constant wide string". The L doesn't mean anything anymore, that dates back to the 16-bit version of Windows. But pointer, const and wide are important. I do use that typedef, not doing so will risk future portability problems. Which is the core reason these typedefs exist.
A handle is the same as a pointer only so far as both ID a particular item. Obviously a pointer is the address of the item so if you know it's structure you can start getting fields in the item. A handle may or may not be a pointer - basically if it is a pointer you don't know what it is pointing to so you can't get into the fields.
Best way to think of a handle is that it is a unique ID for something in the system. When you pass it to something in the system the system will know what to cast it to (if it is a pointer) or how to treat it (if it is just some id or index).

Extending functionality of existing program I don't have source for

I'm working on a third-party program that aggregates data from a bunch of different, existing Windows programs. Each program has a mechanism for exporting the data via the GUI. The most brain-dead approach would have me generate extracts by using AutoIt or some other GUI manipulation program to generate the extractions via the GUI. The problem with this is that people might be interacting with the computer when, suddenly, some automated program takes over. That's no good. What I really want to do is somehow have a program run once a day and silently (i.e. without popping up any GUIs) export the data from each program.
My research is telling me that I need to hook each application (assume these applications are always running) and inject a custom DLL to trigger each export. Am I remotely close to being on the right track? I'm a fairly experienced software dev, but I don't know a whole lot about reverse engineering or hooking. Any advice or direction would be greatly appreciated.
Edit: I'm trying to manage the availability of a certain type of professional. Their schedules are stored in proprietary systems. With their permission, I want to install an app on their system that extracts their schedule from whichever system they are using and uploads the information to a central server so that I can present that information to potential clients.
I am aware of four ways of extracting the information you want, both with their advantages and disadvantages. Before you do anything, you need to be aware that any solution you create is not guaranteed and in fact very unlikely to continue working should the target application ever update. The reason is that in each case, you are relying on an implementation detail instead of a pre-defined interface through which to export your data.
Hooking the GUI
The first way is to hook the GUI as you have suggested. What you are doing in this case is simply reading off from what an actual user would see. This is in general easier, since you are hooking the WinAPI which is clearly defined. One danger is that what the program displays is inconsistent or incomplete in comparison to the internal data it is supposed to be representing.
Typically, there are two common ways to perform WinAPI hooking:
DLL Injection. You create a DLL which you load into the other program's virtual address space. This means that you have read/write access (writable access can be gained with VirtualProtect) to the target's entire memory. From here you can trampoline the functions which are called to set UI information. For example, to check if a window has changed its text, you might trampoline the SetWindowText function. Note every control has different interfaces used to set what they are displaying. In this case, you are hooking the functions called by the code to set the display.
SetWindowsHookEx. Under the covers, this works similarly to DLL injection and in this case is really just another method for you to extend/subvert the control flow of messages received by controls. What you want to do in this case is hook the window procedures of each child control. For example, when an item is added to a ComboBox, it would receive a CB_ADDSTRING message. In this case, you are hooking the messages that are received when the display changes.
One caveat with this approach is that it will only work if the target is using or extending WinAPI controls.
Reading from the GUI
Instead of hooking the GUI, you can alternatively use WinAPI to read directly from the target windows. However, in some cases this may not be allowed. There is not much to do in this case but to try and see if it works. This may in fact be the easiest approach. Typically, you will send messages such as WM_GETTEXT to query the target window for what it is currently displaying. To do this, you will need to obtain the exact window hierarchy containing the control you are interested in. For example, say you want to read an edit control, you will need to see what parent window/s are above it in the window hierarchy in order to obtain its window handle.
Reading from memory (Advanced)
This approach is by far the most complicated but if you are able to fully reverse engineer the target program, it is the most likely to get you consistent data. This approach works by you reading the memory from the target process. This technique is very commonly used in game hacking to add 'functionality' and to observe the internal state of the game.
Consider that as well as storing information in the GUI, programs often hold their own internal model of all the data. This is especially true when the controls used are virtual and simply query subsets of the data to be displayed. This is an example of a situation where the first two approaches would not be of much use. This data is often held in some sort of abstract data type such as a list or perhaps even an array. The trick is to find this list in memory and read the values off directly. This can be done externally with ReadProcessMemory or internally through DLL injection again. The difficulty lies mainly in two prerequisites:
Firstly, you must be able to reliably locate these data structures. The problem with this is that code is not guaranteed to be in the same place, especially with features such as ASLR. Colloquially, this is sometimes referred to as code-shifting. ASLR can be defeated by using the offset from a module base and dynamically getting the module base address with functions such as GetModuleHandle. As well as ASLR, a reason that this occurs is due to dynamic memory allocation (e.g. through malloc). In such cases, you will need to find a heap address storing the pointer (which would for example be the return of malloc), dereference that and find your list. That pointer would be prone to ASLR and instead of a pointer, it might be a double-pointer, triple-pointer, etc.
The second problem you face is that it would be rare for each list item to be a primitive type. For example, instead of a list of character arrays (strings), it is likely that you will be faced with a list of objects. You would need to further reverse engineer each object type and understand internal layouts (at least be able to determine offsets of primitive values you are interested in in terms of its offset from the object base). More advanced methods revolve around actually reverse engineering the vtable of objects and calling their 'API'.
You might notice that I am not able to give information here which is specific. The reason is that by its nature, using this method requires an intimate understanding of the target's internals and as such, the specifics are defined only by how the target has been programmed. Unless you have knowledge and experience of reverse engineering, it is unlikely you would want to go down this route.
Hooking the target's internal API (Advanced)
As with the above solution, instead of digging for data structures, you dig for the internal API. I briefly covered this with when discussing vtables earlier. Instead of doing this, you would be attempting to find internal APIs that are called when the GUI is modified. Typically, when a view/UI is modified, instead of directly calling the WinAPI to update it, a program will have its own wrapper function which it calls which in turn calls the WinAPI. You simply need to find this function and hook it. Again this is possible, but requires reverse engineering skills. You may find that you discover functions which you want to call yourself. In this case, as well as being able to locate the location of the function, you have to reverse engineer the parameters it takes, its calling convention and you will need to ensure calling the function has no side effects.
I would consider this approach to be advanced. It can certainly be done and is another common technique used in game hacking to observe internal states and to manipulate a target's behaviour, but is difficult!
The first two methods are well suited for reading data from WinAPI programs and are by far easier. The two latter methods allow greater flexibility. With enough work, you are able to read anything and everything encapsulated by the target but requires a lot of skill.
Another point of concern which may or may not relate to your case is how easy it will be to update your solution to work should the target every be updated. With the first two methods, it is more likely no changes or small changes have to be made. With the second two methods, even a small change in source code can cause a relocation of the offsets you are relying upon. One method of dealing with this is to use byte signatures to dynamically generate the offsets. I wrote another answer some time ago which addresses how this is done.
What I have written is only a brief summary of the various techniques that can be used for what you want to achieve. I may have missed approaches, but these are the most common ones I know of and have experience with. Since these are large topics in themselves, I would advise you ask a new question if you want to obtain more detail about any particular one. Note that in all of the approaches I have discussed, none of them suffer from any interaction which is visible to the outside world so you would have no problem with anything popping up. It would be, as you describe, 'silent'.
This is relevant information about detouring/trampolining which I have lifted from a previous answer I wrote:
If you are looking for ways that programs detour execution of other
processes, it is usually through one of two means:
Dynamic (Runtime) Detouring - This is the more common method and is what is used by libraries such as Microsoft Detours. Here is a
relevant paper where the first few bytes of a function are overwritten
to unconditionally branch to the instrumentation.
(Static) Binary Rewriting - This is a much less common method for rootkits, but is used by research projects. It allows detouring to be
performed by statically analysing and overwriting a binary. An old
(not publicly available) package for Windows that performs this is
Etch. This paper gives a high-level view of how it works
conceptually.
Although Detours demonstrates one method of dynamic detouring, there
are countless methods used in the industry, especially in the reverse
engineering and hacking arenas. These include the IAT and breakpoint
methods I mentioned above. To 'point you in the right direction' for
these, you should look at 'research' performed in the fields of
research projects and reverse engineering.

What is the design rationale behind HandleScope?

V8 requires a HandleScope to be declared in order to clean up any Local handles that were created within scope. I understand that HandleScope will dereference these handles for garbage collection, but I'm interested in why each Local class doesn't do the dereferencing themselves like most internal ref_ptr type helpers.
My thought is that HandleScope can do it more efficiently by dumping a large number of handles all at once rather than one by one as they would in a ref_ptr type scoped class.
Here is how I understand the documentation and the handles-inl.h source code. I, too, might be completely wrong since I'm not a V8 developer and documentation is scarce.
The garbage collector will, at times, move stuff from one memory location to another and, during one such sweep, also check which objects are still reachable and which are not. In contrast to reference-counting types like std::shared_ptr, this is able to detect and collect cyclic data structures. For all of this to work, V8 has to have a good idea about what objects are reachable.
On the other hand, objects are created and deleted quite a lot during the internals of some computation. You don't want too much overhead for each such operation. The way to achieve this is by creating a stack of handles. Each object listed in that stack is available from some handle in some C++ computation. In addition to this, there are persistent handles, which presumably take more work to set up and which can survive beyond C++ computations.
Having a stack of references requires that you use this in a stack-like way. There is no “invalid” mark in that stack. All the objects from bottom to top of the stack are valid object references. The way to ensure this is the LocalScope. It keeps things hierarchical. With reference counted pointers you can do something like this:
shared_ptr<Object>* f() {
shared_ptr<Object> a(new Object(1));
shared_ptr<Object>* b = new shared_ptr<Object>(new Object(2));
return b;
}
void g() {
shared_ptr<Object> c = *f();
}
Here the object 1 is created first, then the object 2 is created, then the function returns and object 1 is destroyed, then object 2 is destroyed. The key point here is that there is a point in time when object 1 is invalid but object 2 is still valid. That's what LocalScope aims to avoid.
Some other GC implementations examine the C stack and look for pointers they find there. This has a good chance of false positives, since stuff which is in fact data could be misinterpreted as a pointer. For reachability this might seem rather harmless, but when rewriting pointers since you're moving objects, this can be fatal. It has a number of other drawbacks, and relies a lot on how the low level implementation of the language actually works. V8 avoids that by keeping the handle stack separate from the function call stack, while at the same time ensuring that they are sufficiently aligned to guarantee the mentioned hierarchy requirements.
To offer yet another comparison: an object references by just one shared_ptr becomes collectible (and actually will be collected) once its C++ block scope ends. An object referenced by a v8::Handle will become collectible when leaving the nearest enclosing scope which did contain a HandleScope object. So programmers have more control over the granularity of stack operations. In a tight loop where performance is important, it might be useful to maintain just a single HandleScope for the whole computation, so that you won't have to access the handle stack data structure so often. On the other hand, doing so will keep all the objects around for the whole duration of the computation, which would be very bad indeed if this were a loop iterating over many values, since all of them would be kept around till the end. But the programmer has full control, and can arrange things in the most appropriate way.
Personally, I'd make sure to construct a HandleScope
At the beginning of every function which might be called from outside your code. This ensures that your code will clean up after itself.
In the body of every loop which might see more than three or so iterations, so that you only keep variables from the current iteration.
Around every block of code which is followed by some callback invocation, since this ensures that your stuff can get cleaned if the callback requires more memory.
Whenever I feel that something might produce considerable amounts of intermediate data which should get cleaned (or at least become collectible) as soon as possible.
In general I'd not create a HandleScope for every internal function if I can be sure that every other function calling this will already have set up a HandleScope. But that's probably a matter of taste.
Disclaimer: This may not be an official answer, more of a conjuncture on my part; but the v8 documentation is hardly
useful on this topic. So I may be proven wrong.
From my understanding, in developing various v8 based backed application. Its a means of handling the difference between the C++ and javaScript environment.
Imagine the following sequence, which a self dereferencing pointer can break the system.
JavaScript calls up a C++ wrapped v8 function : lets say helloWorld()
C++ function creates a v8::handle of value "hello world =x"
C++ returns the value to the v8 virtual machine
C++ function does its usual cleaning up of resources, including dereferencing of handles
Another C++ function / process, overwrites the freed memory space
V8 reads the handle : and the data is no longer the same "hell!#(#..."
And that's just the surface of the complicated inconsistency between the two; Hence to tackle the various issues of connecting the JavaScript VM (Virtual Machine) to the C++ interfacing code, i believe the development team, decided to simplify the issue via the following...
All variable handles, are to be stored in "buckets" aka HandleScopes, to be built / compiled / run / destroyed by their
respective C++ code, when needed.
Additionally all function handles, are to only refer to C++ static functions (i know this is irritating), which ensures the "existence"
of the function call regardless of constructors / destructor.
Think of it from a development point of view, in which it marks a very strong distinction between the JavaScript VM development team, and the C++ integration team (Chrome dev team?). Allowing both sides to work without interfering one another.
Lastly it could also be the sake of simplicity, to emulate multiple VM : as v8 was originally meant for google chrome. Hence a simple HandleScope creation and destruction whenever we open / close a tab, makes for much easier GC managment, especially in cases where you have many VM running (each tab in chrome).

What does it mean when someone says that the module has both behavior and state?

As I understand I got a code review that my module has behavior and state at the same time, what does it mean anyway ?
Isn't that the whole point of object oriented programming, that instead of operating on data directly with logical circuitry using functions. We choose to operate on these closed black-boxes (encapsulation) using a set of neatly designed keys, switches and gears.
Wouldn't such a scheme naturally contain data(state) and logic(behavior) at the same time ?
By module I mean : a real Ruby module.
I designed something like this : How to design an application keeping SOLID principles and Design Patterns in mind
and implemented the commands in a module which I used to mixin.
Whatever you are referring to, be it an object defined by a class (or type), a module, or anything else with code in it, state is data that is persisted over multiple calls to the thing. If it "remembers" anything between one execution and the next, then it has state.
Behavior, otoh, is code that manipulates or processes that state-data, or non-state data that is used only during a single execution of the code, (like parameter values passed to a function). Methods, subroutines or functions, anything that changes or does something is behavior.
Most classes, types, or whatever, have both data (state) and behavior, but....
Some classes or types are designed simply to carry data around. They are referred to as Data Transfer objects or DTOs, or Plain Old Container Objects (POCOs). They only have state, and, generally, have little or no behavior.
Other times, a class or type is constructed to hold general utility functions, (like a Math Library). It will not maintain or keep any state between the many times it is called to perform one of its utilities. The only data used in it is data passed in as parameters for each call to the library function, and that data is discarded when the routine is finished. It has behavior. but no state.
You're right in your thinking that OOP encapsulates the ideas of both behaviour and state and mixes the two things together, but from the wording of your question, I'm wondering if you have written a ruby module (mixin, whatever you want to call it) that is stateful, such that there is the potential for state leakage across multiple uses of the same module.
Without seeing the code in question I can't really give you a full answer.
In Object-Oriented terminology, an object is said to have state when it encapsulates data (attributes, properties) and is said to have behavior when it offers operations (methods, procedures, functions) that operate (create, delete, modify, make calculations) on the data.
The same concepts can be extrapolated to a ruby module, it has "state" if it defined data accessible within the module, and it has "behavior" in the form of operations provided which operate on the data.

Resources