How do Erlang actors differ from OOP objects? - ruby

Suppose I have an Erlang actor defined like this:
counter(Num) ->
receive
{From, increment} ->
From ! {self(), new_value, Num + 1}
counter(Num + 1);
end.
And similarly, I have a Ruby class defined like this:
class Counter
def initialize(num)
#num = num
end
def increment
#num += 1
end
end
The Erlang code is written in a functional style, using tail recursion to maintain state. However, what is the meaningful impact of this difference? To my naive eyes, the interfaces to these two things seem much the same: You send a message, the state gets updated, and you get back a representation of the new state.
Functional programming is so often described as being a totally different paradigm than OOP. But the Erlang actor seems to do exactly what objects are supposed to do: Maintain state, encapsulate, and provide a message-based interface.
In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
I suspect there are bigger consequences to the functional/OOP dichotomy than I'm seeing. Can anyone point them out?
Let's put aside the fact that the Erlang actor will be scheduled by the VM and thus may run concurrently with other code. I realize that this is a major difference between the Erlang and Ruby versions, but that's not what I'm getting at. Concurrency is possible in other languages, including Ruby. And while Erlang's concurrency may perform very differently (sometimes better), I'm not really asking about the performance differences.
Rather, I'm more interested in the functional-vs-OOP side of the question.

In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
The difference is that in traditional languages like Ruby there is no message passing but method call that is executed in the same thread and this may lead to synchronization problems if you have multithreaded application. All threads have access to each other thread memory.
In Erlang all actors are independent and the only way to change state of another actor is to send message. No process have access to internal state of any other process.

IMHO this is not the best example for FP vs OOP. Differences usually manifest in accessing/iterating and chaining methods/functions on objects. Also, probably, understanding what is "current state" works better in FP.
Here, you put two very different technologies against each other. One happen to be F, the other one OO.
The first difference I can spot right away is memory isolation. Messages are serialized in Erlang, so it is easier to avoid race conditions.
The second are memory management details. In Erlang message handling is divided underneath between Sender and Receiver. There are two sets of locks of process structure held by Erlang VM. Therefore, while Sender sends the message he acquires lock which is not blocking main process operations (accessed by MAIN lock). To sum up, it gives Erlang more soft real-time nature vs totally random behaviour on Ruby side.

Looking from the outside, actors resemble objects. They encapsulate state and communicate with the rest of the world via messages to manipulate that state.
To see how FP works, you must look inside an actor and see how it mutates state. Your example where the state is an integer is too simple. I don't have the time to provide full example, but I'll sketch the code. Normally, an actor loop looks like following:
loop(State) ->
Message = receive
...
end,
NewState = f(State, Message),
loop(NewState).
The most important difference from OOP is that there are no variable mutations i.e. NewState is obtained from the State and may share most of the data with it, but the State variable always remains the same.
This is a nice property, since we never corrupt current state. Function f will usually perform a series of transformation to turn State into NewState. And only if/when it completely succeeds we replace the old state with the new one by calling loop(NewState).
So the important benefit is consistency of our state.
The second benefit I found is cleaner code, but it takes some time getting used to it. Generally, since you cannot modify variable, you will have to divide your code in many very small functions. This is actually nice, because your code will be well factored.
Finally, since you cannot modify a variable, it is easier to reason about the code. With mutable objects you can never be sure whether some part of your object will be modified, and it gets progressively worse if using global variables. You should not encounter such problems when doing FP.
To try it out, you should try to manipulate some more complex data in a functional way by using pure erlang structures (not actors, ets, mnesia or proc dict). Alternatively, you might try it in ruby with this

Erlang includes the message passing approach of Alan Kay's OOP (Smalltalk) and the functional programming from Lisp.
What you describe in your example is the message approach for OOP. The Erlang processes sending messages are a concept similar to Alan Kay's objects sending messages. By the way, you can retrieve this concept implemtented also in Scratch where parallel running objects send messages between them.
The functional programming is how you code the processes. For instance, variables in Erlang cannot be modified. Once they have been set, you can only read them. You have also a list data structure which works pretty much like Lisp lists and you have fun which are insprired by Lisp's lambda.
The message passing on one side, and the functional on the other side are quite two separate things in Erlang. When coding real life erlang applications, you spend 98% of your time doing functional programming and 2% thinking about messages passing, which is mainly used for scalability and concurrency. To say it another way, when you come to tackly complex programming problem, you will probably use the FP side of Erlang to implement the details of the algo, and use the message passing for scalability, reliability, etc...

What do you think of this:
thing(0) ->
exit(this_is_the_end);
thing(Val) when is_integer(Val) ->
NewVal = receive
{From,F,Arg} -> NV = F(Val,Arg),
From ! {self(), new_value, NV},
NV;
_ -> Val div 2
after 10000
max(Val-1,0)
end,
thing(NewVal).
When you spawn the process, it will live by its own, decreasing its value until it reach the value 0 and send the message {'EXIT',this_is_the_end} to any process linked to it, unless you take care of executing something like:
ThingPid ! {self(),fun(X,_) -> X+1 end,[]}.
% which will increment the counter
or
ThingPid ! {self(),fun(X,X) -> 0; (X,_) -> X end,10}.
% which will do nothing, unless the internal value = 10 and in this case will go directly to 0 and exit
In this case you can see that the "object" lives its own live by itself in parallel with the rest of the application, that it can interact with the outside almost without any code, and that the outside can ask him to do things you didn't know when you wrote and compile the code.
This is a stupid code, but there are some principle that are used to implement application like mnesia transaction, the behaviors... IMHO the concept is really different, but you have to try to think different if you want to use it correctly. I am pretty sure that it is possible to write "OOPlike" code in Erlang, but it will be extremely difficult to avoid concurrency :o), and at the end no advantage. Have a look at OTP principle which gives some tracks about the application architecture in Erlang (supervision trees, pool of "1 single client servers", linked processes, monitored processes, and of course pattern matching single assignment, messages, node clusters ...).

Related

how to know what is NOT thread-safe in ruby?

starting from Rails 4, everything would have to run in threaded environment by default. What this means is all of the code we write AND ALL the gems we use are required to be threadsafe
so, I have few questions on this:
what is NOT thread-safe in ruby/rails? Vs What is thread-safe in ruby/rails?
Is there a list of gems that is known to be threadsafe or vice-versa?
is there List of common patterns of code which are NOT threadsafe example #result ||= some_method?
Are the data structures in ruby lang core such as Hash etc threadsafe?
On MRI, where there a GVL/GIL which means only 1 ruby thread can run at a time except for IO, does the threadsafe change effect us?
None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).
MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like #n = 0; 3.times { Thread.start { 100.times { #n += 1 } } } e.g. mutating a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single core system, it does not change the fundamental issues of writing correct concurrent programs.
Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?
In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. #n += 1 is not thread safe, because it is multiple operations. #n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). #n ||= 1, is not and no other shorthand operation + assignment is either. One mistake I've made many times is writing return unless #started; #started = true, which is not thread safe at all.
I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context means things that change state. def foo(x); #x = x; end is not side-effect free.
One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:
class Thing
attr_reader :stuff
def initialize(initial_stuff)
#stuff = initial_stuff
#state_lock = Mutex.new
end
def add(item)
#state_lock.synchronize do
#stuff << item
end
end
end
A instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.
Another classic Ruby example is this:
STANDARD_OPTIONS = {:color => 'red', :count => 10}
def find_stuff
#some_service.load_things('stuff', STANDARD_OPTIONS)
end
find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.
If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, data structures like hashes and arrays accessed by multiple threads) thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application, a new controller object is created for every request, so it is only going to get used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User, you may think of it as only a class, and it is a class, but it is also a namespace for global variables), some of these are safe because they are read only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.
It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes other gems assume that they are not thread safe unless they say that they are, and if they say that they are assume that they are not, and look through their code (but just because you see that they go things like #n ||= 1 does not mean that they are not thread safe, that's a perfectly legitimate thing to do in the right context -- you should instead look for things like mutable state in global variables, how it handles mutable objects passed to its methods, and especially how it handles options hashes).
Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.
In addition to Theo's answer, I'd add a couple problem areas to lookout for in Rails specifically, if you're switching to config.threadsafe!
Class variables:
##i_exist_across_threads
ENV:
ENV['DONT_CHANGE_ME']
Threads:
Thread.start
starting from Rails 4, everything would have to run in threaded environment by default
This is not 100% correct. Thread-safe Rails is just on by default. If you deploy on a multi-process app server like Passenger (community) or Unicorn there will be no difference at all. This change only concerns you, if you deploy on a multi-threaded environment like Puma or Passenger Enterprise > 4.0
In the past if you wanted to deploy on a multi-threaded app server you had to turn on config.threadsafe, which is default now, because all it did had either no effects or also applied to a Rails app running in a single process (Prooflink).
But if you do want all the Rails 4 streaming benefits and other real time stuff of the multi-threaded deployment
then maybe you will find this article interesting. As #Theo sad, for a Rails app, you actually just have to omit mutating static state during a request. While this a simple practice to follow, unfortunately you cannot be sure about this for every gem you find. As far as i remember Charles Oliver Nutter from the JRuby project had some tips about it in this podcast.
And if you want to write a pure concurrent Ruby programming, where you would need some data structures which are accessed by more than one thread you maybe will find the thread_safe gem useful.

What is the design rationale behind HandleScope?

V8 requires a HandleScope to be declared in order to clean up any Local handles that were created within scope. I understand that HandleScope will dereference these handles for garbage collection, but I'm interested in why each Local class doesn't do the dereferencing themselves like most internal ref_ptr type helpers.
My thought is that HandleScope can do it more efficiently by dumping a large number of handles all at once rather than one by one as they would in a ref_ptr type scoped class.
Here is how I understand the documentation and the handles-inl.h source code. I, too, might be completely wrong since I'm not a V8 developer and documentation is scarce.
The garbage collector will, at times, move stuff from one memory location to another and, during one such sweep, also check which objects are still reachable and which are not. In contrast to reference-counting types like std::shared_ptr, this is able to detect and collect cyclic data structures. For all of this to work, V8 has to have a good idea about what objects are reachable.
On the other hand, objects are created and deleted quite a lot during the internals of some computation. You don't want too much overhead for each such operation. The way to achieve this is by creating a stack of handles. Each object listed in that stack is available from some handle in some C++ computation. In addition to this, there are persistent handles, which presumably take more work to set up and which can survive beyond C++ computations.
Having a stack of references requires that you use this in a stack-like way. There is no “invalid” mark in that stack. All the objects from bottom to top of the stack are valid object references. The way to ensure this is the LocalScope. It keeps things hierarchical. With reference counted pointers you can do something like this:
shared_ptr<Object>* f() {
shared_ptr<Object> a(new Object(1));
shared_ptr<Object>* b = new shared_ptr<Object>(new Object(2));
return b;
}
void g() {
shared_ptr<Object> c = *f();
}
Here the object 1 is created first, then the object 2 is created, then the function returns and object 1 is destroyed, then object 2 is destroyed. The key point here is that there is a point in time when object 1 is invalid but object 2 is still valid. That's what LocalScope aims to avoid.
Some other GC implementations examine the C stack and look for pointers they find there. This has a good chance of false positives, since stuff which is in fact data could be misinterpreted as a pointer. For reachability this might seem rather harmless, but when rewriting pointers since you're moving objects, this can be fatal. It has a number of other drawbacks, and relies a lot on how the low level implementation of the language actually works. V8 avoids that by keeping the handle stack separate from the function call stack, while at the same time ensuring that they are sufficiently aligned to guarantee the mentioned hierarchy requirements.
To offer yet another comparison: an object references by just one shared_ptr becomes collectible (and actually will be collected) once its C++ block scope ends. An object referenced by a v8::Handle will become collectible when leaving the nearest enclosing scope which did contain a HandleScope object. So programmers have more control over the granularity of stack operations. In a tight loop where performance is important, it might be useful to maintain just a single HandleScope for the whole computation, so that you won't have to access the handle stack data structure so often. On the other hand, doing so will keep all the objects around for the whole duration of the computation, which would be very bad indeed if this were a loop iterating over many values, since all of them would be kept around till the end. But the programmer has full control, and can arrange things in the most appropriate way.
Personally, I'd make sure to construct a HandleScope
At the beginning of every function which might be called from outside your code. This ensures that your code will clean up after itself.
In the body of every loop which might see more than three or so iterations, so that you only keep variables from the current iteration.
Around every block of code which is followed by some callback invocation, since this ensures that your stuff can get cleaned if the callback requires more memory.
Whenever I feel that something might produce considerable amounts of intermediate data which should get cleaned (or at least become collectible) as soon as possible.
In general I'd not create a HandleScope for every internal function if I can be sure that every other function calling this will already have set up a HandleScope. But that's probably a matter of taste.
Disclaimer: This may not be an official answer, more of a conjuncture on my part; but the v8 documentation is hardly
useful on this topic. So I may be proven wrong.
From my understanding, in developing various v8 based backed application. Its a means of handling the difference between the C++ and javaScript environment.
Imagine the following sequence, which a self dereferencing pointer can break the system.
JavaScript calls up a C++ wrapped v8 function : lets say helloWorld()
C++ function creates a v8::handle of value "hello world =x"
C++ returns the value to the v8 virtual machine
C++ function does its usual cleaning up of resources, including dereferencing of handles
Another C++ function / process, overwrites the freed memory space
V8 reads the handle : and the data is no longer the same "hell!#(#..."
And that's just the surface of the complicated inconsistency between the two; Hence to tackle the various issues of connecting the JavaScript VM (Virtual Machine) to the C++ interfacing code, i believe the development team, decided to simplify the issue via the following...
All variable handles, are to be stored in "buckets" aka HandleScopes, to be built / compiled / run / destroyed by their
respective C++ code, when needed.
Additionally all function handles, are to only refer to C++ static functions (i know this is irritating), which ensures the "existence"
of the function call regardless of constructors / destructor.
Think of it from a development point of view, in which it marks a very strong distinction between the JavaScript VM development team, and the C++ integration team (Chrome dev team?). Allowing both sides to work without interfering one another.
Lastly it could also be the sake of simplicity, to emulate multiple VM : as v8 was originally meant for google chrome. Hence a simple HandleScope creation and destruction whenever we open / close a tab, makes for much easier GC managment, especially in cases where you have many VM running (each tab in chrome).

What does it mean when someone says that the module has both behavior and state?

As I understand I got a code review that my module has behavior and state at the same time, what does it mean anyway ?
Isn't that the whole point of object oriented programming, that instead of operating on data directly with logical circuitry using functions. We choose to operate on these closed black-boxes (encapsulation) using a set of neatly designed keys, switches and gears.
Wouldn't such a scheme naturally contain data(state) and logic(behavior) at the same time ?
By module I mean : a real Ruby module.
I designed something like this : How to design an application keeping SOLID principles and Design Patterns in mind
and implemented the commands in a module which I used to mixin.
Whatever you are referring to, be it an object defined by a class (or type), a module, or anything else with code in it, state is data that is persisted over multiple calls to the thing. If it "remembers" anything between one execution and the next, then it has state.
Behavior, otoh, is code that manipulates or processes that state-data, or non-state data that is used only during a single execution of the code, (like parameter values passed to a function). Methods, subroutines or functions, anything that changes or does something is behavior.
Most classes, types, or whatever, have both data (state) and behavior, but....
Some classes or types are designed simply to carry data around. They are referred to as Data Transfer objects or DTOs, or Plain Old Container Objects (POCOs). They only have state, and, generally, have little or no behavior.
Other times, a class or type is constructed to hold general utility functions, (like a Math Library). It will not maintain or keep any state between the many times it is called to perform one of its utilities. The only data used in it is data passed in as parameters for each call to the library function, and that data is discarded when the routine is finished. It has behavior. but no state.
You're right in your thinking that OOP encapsulates the ideas of both behaviour and state and mixes the two things together, but from the wording of your question, I'm wondering if you have written a ruby module (mixin, whatever you want to call it) that is stateful, such that there is the potential for state leakage across multiple uses of the same module.
Without seeing the code in question I can't really give you a full answer.
In Object-Oriented terminology, an object is said to have state when it encapsulates data (attributes, properties) and is said to have behavior when it offers operations (methods, procedures, functions) that operate (create, delete, modify, make calculations) on the data.
The same concepts can be extrapolated to a ruby module, it has "state" if it defined data accessible within the module, and it has "behavior" in the form of operations provided which operate on the data.

TDD for a Device Communicator

I've been reading about TDD, and would like to use it for my next project, but I'm not sure how to structure my classes with this new paradigm. The language I'd like to use is Java, although the problem is not really language-specific.
The Project
I have a few pieces of hardware that come with a ASCII-over-RS232 interface. I can issue simple commands, and get simple responses, and control them as if from their front panels. Each one has a slightly different syntax and very different sets of commands. My goal is to create an abstraction/interface so I can control them all through a GUI and/or remote procedure calls.
The Problem
I believe the first step is to create an abstract class (I'm bad at names, how about 'Communicator'?) to implement all the stuff like Serial I/O, and then create a subclass for each device. I'm sure it will be a little more complicated than that, but that's the core of the application in my mind.
Now, for unit tests, I don't think I really need the actual hardware or a serial connection. What I'd like to do is hand my Communicators an InputStream and OutputStream (or Reader and Writer) that could be from a serial port, file, stdin/stdout, piped from a test function, whatever. So, would I just have the Communicator constructor take those as inputs? If so, it would be easy to put the responsibility of setting it all up on the testing framework, but for the real thing, who makes the actual connection? A separate constructor? The function calling the constructor again? A separate class who's job it is to 'connect' the Communicator to the correct I/O streams?
Edit
I was about to rewrite the problem section in order to get answers to the question I thought I was asking, but I think I figured it out. I had (correctly?) identified two different functional areas.
1) Dealing with the serial port
2) Communicating with the device (understanding its output & generating commands)
A few months ago, I would have combined it all into one class. My first idea towards breaking away from this was to pass just the IO streams to the class that understands the device, and I couldn't figure out who would then be responsible for creating the streams.
Having done more research on inversion of control, I think I have an answer. Have a separate interface and class that solve problem #1 and pass it to the constructor of the class(es?) that solve problem #2. That way, it's easy to test both separately. #1 by connecting to the actual hardware and allowing the test framework to do different things. #2 can be tested by being given a mock of #1.
Does this sound reasonable? Do I need to share more information?
With TDD, you should let your design emerge, start small, with baby steps and grow your classes test by test, little by little.
CLARIFIED: Start with a concrete class, to send one command, unit test it with a mock or a stub. When it will work enough (perhaps not with all options), test it against your real device, once, to validate your mock/stub/simulator.
Once the class for the first command is operational, start implementing a second command, the same way: first again your mock/stub, then once against the device for validation. Now if you're seeing similarities between your two classes, you can refactor to your abstract class based design - or to something different.
Sorry for being a little Linux centric ..
My favorite way of simulating gadgets is to write character device drivers that simulate their behavior. This also gives you fun abilities, like providing an ioctl() interface that makes the simulated device behave abnormally.
At that point .. from testing to real world, it only matters which device(s) you actually open, read and write.
It should not be too hard to mimic the behavior of your gadgets .. it sounds like they take very basic instructions and return very basic responses. Again, a simple ioctl() could tell the simulated device that its time to misbehave, so you can ensure that your code is handling such events adequately. For instance, fail intentionally on every n'th instruction, where n is randomly selected upon the call to ioctl().
After seeing your edits I think you are heading in exactly the right direction. TDD tends to drive you towards a design composed of small classes with a well-defined responsibility. I would also echo tinkertim's advice - a device simulator which you can control and "provoke" into behaving in different ways is invaluable for testing.

TDD ...how?

I'm about to start out my first TDD (test-driven development) program, and I (naturally) have a TDD mental block..so I was wondering if someone could help guide me on where I should start a bit.
I'm creating a function that will read binary data from socket and parses its data into a class object.
As far as I see, there are 3 parts:
1) Logic to parse data
2) socket class
3) class object
What are the steps that I should take so that I could incrementally TDD? I definitely plan to first write the test before even implementing the function.
The issue in TDD is "design for testability"
First, you must have an interface against which to write tests.
To get there, you must have a rough idea of what your testable units are.
Some class, which is built by a function.
Some function, which reads from a socket and emits a class.
Second, given this rough interface, you formalize it into actual non-working class and function definitions.
Third, you start to write your tests -- knowing they'll compile but fail.
Part-way through this, you may start head-scratching about your function. How do you set up a socket for your function? That's a pain in the neck.
However, the interface you roughed out above isn't the law, it's just a good idea. What if your function took an array of bytes and created a class object? This is much, much easier to test.
So, revisit the steps, change the interface, write the non-working class and function, now write the tests.
Now you can fill in the class and the function until all your tests pass.
When you're done with this bit of testing, all you have to do is hook in a real socket. Do you trust the socket libraries? (Hint: you should) Not much to test here. If you don't trust the socket libraries, now you've got to provide a source for the data that you can run in a controlled fashion. That's a large pain.
Your split sounds reasonable. I would consider the two dependencies to be the input and output. Can you make them less dependent on concrete production code? For instance, can you make it read from a general stream of data instead of a socket? That would make it easier to pass in test data.
The creation of the return value could be harder to mock out, and may not be a problem anyway - is the logic used for the actual population of the resulting object reasonably straightforward (after the parsing)? For instance, is it basically just setting trivial properties? If so, I wouldn't bother trying to introduce a factory etc there - just feed in some test data and check the results.
First, start thinking "the testS", plural, rather than "the test", singular. You should expect to write more than one.
Second, if you have mental block, consider starting with a simpler challenge. Lower the difficulty until it's real easy to do, then move on to more substantial work.
For instance, assume you already have a byte array with the binary data, so you don't even need to think about sockets. All you need to write is something that takes in a byte[] and return an instance of your object. Can you write a test for that ?
If you still have mental block, lower it yet another notch. Assume your byte array is only going to contain default values anyway. So you don't even have to worry about the parsing, just about being able to return an instance of your object that has all values set to the defaults. Can you write a test for that ?
I imagine something like:
public void testFooReaderCanParseDefaultFoo() {
FooReader fr = new FooReader();
Foo myFoo = fr.buildFoo();
assertEquals(0, myFoo.bar());
}
That's rock bottom, right ? You're only testing the Foo constructor. But you can then move up to the next level:
public void testFooReaderGivenBytesBuildsFoo() {
FooReader fr = new FooReader();
byte[] fooData = {1};
fr.consumeBytes(fooData);
Foo myFoo = fr.buildFoo();
assertEquals(1, myFoo.bar());
}
And so on...
'The best testing framework is the application itself'
I believe that a common misconception amongst developers is, they mistakenly make a strong association between testing frameworks and TDD principles. I would advise re-reading the official docs on TDD; bearing in mind that, there is no real relationship between testing frameworks and TDD. After all, TDD is a paradigm not a framework.
Upon reading the wiki on TDD (https://en.wikipedia.org/wiki/Test-driven_development), I've come to realise that to an extent things are a little bit open to interpretation.
There are various personal styles of TDD mainly due to the fact that TDD principles are open to interpretation.
I'm not here to say anyone is wrong, but I would like to share my techniques with you and explain how they have served me well. Bear in mind that I have been programming for 36 years; making my programming habits very well evolved.
Code reuse is over rated. Reuse code too much and you'll end up with bad abstraction and it will become very difficult to fix or change something without it affecting something else. The obvious advantage being less code to manage.
Repeating too much code leads to code management problems and oversized code bases. However it does have the advantage of good separation of concerns (the ability to tweak, change and fix things without affecting other parts of the app).
Don't repeat/refactor too much, don't reuse too much. Code needs to be maintainable. It’s important to understand and respect the balance between code reuse and abstraction/separation of concerns.
When deciding whether to reuse code I base the decision on: .... Will the nature of this code change in context throughout the app codebase? If the answer is no, then I reuse it. If the answer is yes or I'm not sure, I repeat/refactor it. I will however revise my codebases from time to time and see if any of my repeated code can be merged without compromising separation of concerns/abstraction.
As far as my basic programming habits are concerned, I like to write the conditions (if then else switch case etc) first; test them, then fill the conditions with the code and test again. Keep in mind there's no rule that you have to do this in a unit test. I refer to this as the low level stuff.
Once my low level stuff is done, I'll either reuse the code or refactor it into another part of the app, but not after testing it very thoroughly. Problem with repeating/refactoring badly tested code is that, if it’s broken, you have to fix it in multiple places.
BDD To me is a natural follow on from TDD. Once my code base is well tested I can easily tweak behaviours by moving entire blocks of code around. Cool thing is about my programming habits is that sometimes I move code around and discover useful behaviours that I didn’t even intend. It can sometimes even be useful for rebranding stuff to seem like a completely different code base.
To this end my code bases tend to start out a bit slow and pick up momentum because as I advance toward the end of development I have more and more code to refactor from or reuse.
The advantages for me in the way that I code is that, I am able to take on very high levels of complexity as this is promoted by good separation of concerns. It’s also awesome for writing highly optimised code. However the well optimised code tends to be a bit bloated, but to my knowledge there is no way to write optimized code without a bit of bloating. If the app doesn't need high processor efficiency, there's nothing stopping me from de-bloating my code. I'm of the opinion that server side code should be optimised and most client side code normally doesn't require it.
Going back to the topic of testing frameworks, I use them to just save a bit of compiler time.
As far as following story boards is concerned, that comes naturally to me without actually considering it. I've noticed most devs develop in the natural order of story boards even when they are not available.
As a general separation of concerns strategy, in most apps I separate concerns based on UI forms. For example I’ll reuse code within a form and repeat/refactor across forms. This is only a generalistic rule. There are times when I have to think outside the box. Sometimes repeating code can serve well for making code processor efficient.
As a little addendum to my TDD habits; I do optimizations and fault tolerance last. I will try to avoid using try catch blocks as much as possible and write my code in such a way as to not need them. For example rather than catch a null, I will check for null, or rather than catch an index out of bounds, I will scrutinise my code so that it never happens. I find that error trapping too early in app development, leads to semantic errors (behavioural errors that don't crash the app). Semantic errors can be very hard to trace or even notice.
Well that’s my 10 cents. Hope it helps.
Test Driven Development ?
So, this means you should start with writing a test first.
Write a test which contains the code like 'how you want to use your class'. This class or method that you are going to test with this test, is not even there yet.
For instance, you could write a test first like this:
[Test]
public void CanReadDataFromSocket()
{
SocketReader r = new SocketReader( ... ); // create a socketreader instance which takes a socket or a mock-socket in its constructor
byte[] data = r.Read();
Assert.IsTrue (data.Length > 0);
}
For instance; I'm just making up an example here.
Next, once you're able to read data from a socket, you can start thinking on how you'll parse it, and write a test in where you use the 'Parser' class which takes the data that you've read, and outputs an instance of your data class.
etc...
Knowing where to start writing tests and when to stop writing tests while using TDD, is a common problem when starting out.
I have found that it can sometimes help to write an integration test first. Doing so will help create some of the common objects you will be using. It will also allow you to focus your thoughts and tests, since you will need to start writing tests to make the integration test pass.
When I was starting with TDD, I read these 3 rules by Uncle Bob that really helped me out:
You are not allowed to write any production code unless it is to
make a failing unit test pass.
You are not allowed to write any more of a unit test than is sufficient to fail; and compilation failures are failures.
You are not allowed to write any more production code than is sufficient to pass the one failing unit test.
In a shorter version it would be:
Write only enough of a unit test to fail.
Write only enough production code to make the failing unit test pass.
as you can see, this is very simple.

Resources