TDD for IMDB html scraper

I'm currently developing an IMDb HTML scraper in C++ using TDD. It will extract certain fields from the IMDb web page, e.g. title, synopsis, cast, etc.
I'm wondering whether I have done the TDD right. I have two classes: the Parser class and the MatchPattern class.
The Parser class has a loadfile function that loads the file into a string, then calls the various MatchPattern functions, such as MatchPattern::extractTitle(string filecontents), and stores the results in Parser's private variables.
MatchPattern is essentially a utility class with static functions. I have no problem testing the MatchPattern class. But what about the Parser class? How should I have designed it for TDD? Am I doing it right, or is there something wrong?

You don't design it for TDD, you design it using TDD... By writing the test first your design will automatically be testable. Think "How do I want to use this and how can I test it in a simple way". That's where to start.
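As a rough illustration of starting from "how do I want to use this", here is a minimal sketch in Ruby (chosen to match the other examples on this page; the question itself is about C++). The Parser class, its title method, and the regular expression are all hypothetical. The point is only that writing the usage first pushes the parser to accept the page contents directly, so the check needs no file on disk.

```ruby
# Hypothetical parser: it takes the HTML text directly instead of a file path,
# which is the design that the usage written below asks for.
class Parser
  def initialize(html)
    @html = html
  end

  # Returns the text of the <title> element, or nil if there is none.
  def title
    match = @html.match(%r{<title>(.*?)</title>}m)
    match && match[1].strip
  end
end

# Written first: how we would like to use the parser.
page = "<html><head><title> The Matrix (1999) </title></head></html>"
puts Parser.new(page).title # prints The Matrix (1999)
```

Loading the file then becomes a separate, very small step that hands a string to the parser.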

Related

What is a good practice for dependency injection in Ruby?

I've been reading Sandi Metz's Practical Object-Oriented Design in Ruby and many sites online discussing design in Ruby. Something I've had a hard time fully understanding is the proper way to implement dependency injection.
The internet is flooded with blog posts that explain how dependency injection works in what I think is a very partial way.
I understand that this is supposed to be bad:
class ThisClass
  def initialize
    @another_class = AnotherClass.new
  end
end
While this is a solution:
class ThisClass
  def initialize(another_class)
    @another_class = another_class
  end
end
And that I could send the AnotherClass.new like this:
this_class = ThisClass.new(AnotherClass.new)
That is the approach that Sandi Metz recommends at least. What I don't understand is where should a line like that go? It has to go somewhere and generally in examples of this what's shown is a line like that being placed totally outside of any class, method, or module as if I'm simply entering it all by hand in IRB for testing purposes.
This post (among others) suggests this different approach:
class ThisClass
  def another_class
    @another_class ||= AnotherClass.new
  end
end
Jamis Buck would take a similar approach like this:
class AnotherClass
end
class ThisClass
  def another_class_factory(class_name = AnotherClass)
    class_name.new
  end
end
However, these two examples both preserve AnotherClass's name inside ThisClass, which Sandi Metz says is one of the main things we're trying to avoid.
So what is the best practice for doing this? Should I make a 'dependency' module filled with methods that are factories for objects of each class in my application?

"Something I've had a hard time fully understanding is the proper way to implement dependency injection."
I think the best definition of a "proper" implementation is one that adheres to the SOLID principles of object oriented design. In this case mostly the Dependency Inversion Principle.
In this regard, this is the only presented solution that does not violate the DIP(1):
class ThisClass
  def initialize(another_class)
    @another_class = another_class
  end
end
In all other cases, ThisClass has a hard dependency on AnotherClass, and can not function without it. Furthermore, if we wish to replace AnotherClass with a third, we need to modify ThisClass, which is a violation of the Open Closed Principle.
Of course, in the example above, naming the parameter and instance variable another_class is not ideal, since we do not know (and do not need to know) what object is passed to us, as long as it responds to the expected interface. This is the beauty of polymorphism.
Consider the below example, taken from this ThoughtBot video on DIP:
class Copier
  def initialize(reader, writer)
    @reader = reader
    @writer = writer
  end
  def copy
    @writer.write(@reader.read_until_eof)
  end
end
Here you can pass any reader and writer objects that respond to read_until_eof and write respectively. This gives you full freedom to compose your business logic using different pairs of read and write implementations, even at runtime:
Copier.new(KeyboardReader.new, Printer.new)
Copier.new(KeyboardReader.new, NetworkPrinter.new)
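To make that concrete, here is a minimal runnable sketch. Copier is as above; StringReader and MemoryWriter are hypothetical stand-ins that I made up for illustration (they are not from the video). They show that anything responding to read_until_eof and write will do.

```ruby
# Copier only knows the messages it sends, not the classes behind them.
class Copier
  def initialize(reader, writer)
    @reader = reader
    @writer = writer
  end

  def copy
    @writer.write(@reader.read_until_eof)
  end
end

# Hypothetical reader: serves a fixed string.
class StringReader
  def initialize(text)
    @text = text
  end

  def read_until_eof
    @text
  end
end

# Hypothetical writer: records everything it is asked to write.
class MemoryWriter
  attr_reader :written

  def initialize
    @written = []
  end

  def write(data)
    @written << data
  end
end

writer = MemoryWriter.new
Copier.new(StringReader.new("hello"), writer).copy
p writer.written # prints ["hello"]
```

The same pair of in-memory objects also works as test doubles, which is one practical payoff of injecting the dependencies.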
Which brings us to your next question.
"It has to go somewhere and generally in examples of this what's shown is a line like that being placed totally outside of any class, method, or module [...]"
You are correct. While object thinking involves modelling the domain with well isolated, decoupled, and composable objects, you will still need to define how these objects interact, in order to implement any business logic. After all, having composable objects is no good unless we compose them.
The analogy that is often made here is to think of your objects as actors. You are the director, and you still need to create a script(2) for the actors to know how to interact with each other.
That is, you need an entry point into your application. A place where the script starts. This might itself be an object--normally an abstract one. In a command line application, it can be your classic Main class, and in a Rails application it can be your controller.
This might seem strange at first, because the focus of object thinking is on modelling concrete domain objects, and a great deal of all writings on the subject is dedicated to this effort, but just remember the actor-script metaphor, and you'll be on your way.
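Here is one possible sketch of such an entry point. Greeter and Main are hypothetical names. The assumption is a small command line program in which Main is the only place that names concrete classes, so it is where a line like ThisClass.new(AnotherClass.new) would live.

```ruby
require "stringio"

# A domain object that depends only on something that responds to puts.
class Greeter
  def initialize(output)
    @output = output
  end

  def greet(name)
    @output.puts("Hello, #{name}!")
  end
end

# Hypothetical entry point: the "script" that wires the actors together.
class Main
  def self.run(argv, output: $stdout)
    greeter = Greeter.new(output)
    greeter.greet(argv.first || "world")
  end
end

# Wiring for a scripted run; a real program might call Main.run(ARGV) instead.
out = StringIO.new
Main.run(["Ada"], output: out)
print out.string # prints Hello, Ada!
```

In a Rails application the controller action would play the role of Main.run here.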
I strongly recommend you pick up the book Object Thinking. It does a great job explaining the mindset behind object oriented design, without which knowing the language specific implementation details becomes rather futile.
(1): It is worth noting that some proponents consider storing an instance of another class in an instance variable an anti-pattern, but in Ruby, this is fairly idiomatic.
(2): I am not sure if this is the origin of the term script in programming in general, but maybe some historian can shed some light on this.

Organizing classes in Ruby script

I'm writing a small game. I have:
class MyGame
  ...
  class_methods
  a bit of game logic
end
# after my_game: unwrapped code
puts calls
gets.chomp calls
methods, loops (to be DRY)
interaction with user, returning values
Is there a correct approach to wrap code together? Is it correct to not wrap code after my_game class in any class or module, or should I always put my code in classes/modules?

I would wrap functionality in modules and classes, as there are some benefits to it:
a) you can easily write tests for code in classes (and for modules by including them in classes as mixins and testing the classes)
b) you have control over the visibility of functions/methods and you can actually create an interface in case you have a consumer of the game that needs to access something more than just the game class
c) it's easier to extend the functionality of parts of the game by creating new implementors of parts of the game's functionality
That said, there is no dogma on writing only in an Object Oriented way. In some cases (perhaps for scripts that will be used as command line scripts), just having some functions and code executing those functions might be enough (especially if the script is simple and short in general).
My advice regarding the structure of the little game you're building is to look for the interactions, the "verbs" that need to take place (i.e. the messages sent between objects). From those you can work out the classes that will send and receive those messages (methods), and I believe designing and structuring the game will then get much easier.
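As a sketch of what that could look like, assume a simple number guessing game (the game itself and the names GuessingGame and ConsoleUI are hypothetical, not taken from the question). The game logic has no puts or gets in it, and the user interaction lives in a thin class whose input and output are passed in.

```ruby
require "stringio"

# Game logic only: easy to test because it never touches the console.
class GuessingGame
  def initialize(secret)
    @secret = secret
  end

  def check(guess)
    return :correct if guess == @secret

    guess < @secret ? :too_low : :too_high
  end
end

# User interaction only: input and output are injected.
class ConsoleUI
  def initialize(game, input: $stdin, output: $stdout)
    @game = game
    @input = input
    @output = output
  end

  def play
    loop do
      @output.puts "Your guess?"
      line = @input.gets
      break if line.nil? # input exhausted

      result = @game.check(line.chomp.to_i)
      @output.puts result.to_s.tr("_", " ")
      break if result == :correct
    end
  end
end

# Scripted run, with StringIO standing in for the keyboard and the screen.
out = StringIO.new
ConsoleUI.new(GuessingGame.new(7), input: StringIO.new("3\n9\n7\n"), output: out).play
puts out.string
```

For a real session you would construct ConsoleUI with the defaults, which read from $stdin and write to $stdout.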
By the way, a good book that could help in the direction of designing software is the following:
http://www.amazon.com/Practical-Object-Oriented-Design-Ruby-Addison-Wesley/dp/0321721330
Hope the above helps.

Adopting "Growing Object-Oriented Software" techniques to Ruby on Rails

I read Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce and was very impressed. I want to adopt the ideas of this book in my Rails projects using RSpec, even though its examples are written in Java.
A basic precept of this book is that we should mock interfaces instead of concrete classes. They say we can improve the application design by extracting interfaces and naming them.
But, Ruby doesn't have any syntax equivalent to Java's interface. How can I utilize their techniques for Rails projects?
UPDATE
For example, on page 126 the authors introduce an Auction interface in order to implement the bid method. First, they mock Auction.class to make the test pass, then they implement Auction as an anonymous inner class in the Main class. Finally, they extract a new concrete class, XMPPAuction, from Main (pages 131-132).
This incremental approach is the crux of this book in my opinion.
How can I adopt or imitate such a series of code transformations in Ruby development?

Check out this previous Stack Overflow answer for a good explanation of interfaces in ruby.
Also, Practical Object-Oriented Design in Ruby is a book in a similar vein to Growing Object-Oriented Software, but with Ruby examples. It is worth checking out.

Since everything in Ruby is duck-typed and an interface is simply the set of publicly exposed fields and methods, you can do one or more of the following:
Test
Design your tests to test interfaces, and name and comment your tests appropriately. Then you can pass all of the "concrete implementations" of your "interface" through the same test suite. As long as your test suite covers your application's edge cases, anything in your application that takes an instance of one of these concrete classes will be able to handle an instance of any of the others.
Use base classes
Define a base class that all of your concrete classes inherit from, in which every method raises NotImplementedError. This gives you a hierarchy that you can visualize. However, it does require extra classes, and it may encourage numerous is-a checks in your production code rather than having that code rely on has-a relationships.
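A minimal sketch of the base class option follows. The Auction name is borrowed from the question, while InMemoryAuction and BrokenAuction are hypothetical classes made up for illustration.

```ruby
# Base class acting as a stand-in for an interface.
class Auction
  def bid(amount)
    raise NotImplementedError, "#{self.class} must implement bid"
  end
end

# A concrete implementation that overrides the method.
class InMemoryAuction < Auction
  attr_reader :bids

  def initialize
    @bids = []
  end

  def bid(amount)
    @bids << amount
  end
end

# A subclass that forgot to implement bid.
class BrokenAuction < Auction
end

auction = InMemoryAuction.new
auction.bid(100)
p auction.bids # prints [100]

begin
  BrokenAuction.new.bid(100)
rescue NotImplementedError => e
  puts e.message # prints BrokenAuction must implement bid
end
```

Note that NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue will not catch it; it has to be named explicitly as above.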

You're right that Ruby does not have formal interfaces. Instead, interfaces are implicit in the messages that an object handles (see duck typing).
If you are still looking for a more formal way to "enforce" an interface in Ruby, consider writing a suite of automated unit tests that are green if an object conforms to the interface properly, and are red otherwise.
For examples, see ActiveModel::Lint::Tests and rspec's shared examples.
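A very small plain Ruby sketch of the same idea is shown below. It does not use ActiveModel or RSpec. The AUCTION_INTERFACE list, the helper method, and both classes are assumptions made up for illustration, and a real lint suite would also exercise behaviour rather than only checking that the methods exist.

```ruby
# Hypothetical contract: the messages an "auction" is expected to handle.
AUCTION_INTERFACE = %i[bid join].freeze

# Returns the names of any missing methods (empty when the object conforms).
def missing_auction_methods(object)
  AUCTION_INTERFACE.reject { |name| object.respond_to?(name) }
end

class XmppAuction
  def bid(amount); end

  def join; end
end

class HalfAuction
  def bid(amount); end
end

p missing_auction_methods(XmppAuction.new) # prints []
p missing_auction_methods(HalfAuction.new) # prints [:join]
```

Running every implementation through the same check is the "green if it conforms, red otherwise" idea in miniature.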

Should I only be testing public interfaces in BDD? (in general, and specifically in Ruby)

I'm reading through the (still beta) RSpec book by the Prag Progs as I'm interested in behavioral testing of objects. From what I've gleaned so far (caveat: after only reading for 30 min), the basic idea is that I want to ensure my object behaves as expected 'externally', i.e. in its output and in relation to other objects.
Is it true then that I should just be black box testing my object to ensure the proper output/interaction with other objects?
This may be completely wrong, but given all of the focus on how my object behaves in the system, this seems to be the ideology one would adopt. If that's so, how do we focus on the implementation of an object? How do I test that my private method is doing what I want it to do for all different types of input?
I suppose this question is maybe valid for all types of testing?? I'm still fairly new to TDD and BDD.

If you want to understand BDD better, try thinking about it without using the word "test".
Instead of writing a test, you're going to write an example of how you can use your class (and you can't use it except through public methods). You're going to show why your class is valuable to other classes. You're defining the scope of your class's responsibilities, while showing (through mocks) what responsibilities are delegated elsewhere.
At the same time, you can question whether the responsibilities are appropriate, and tune the methods on your class to be as intuitively usable as possible. You're looking for code which is easy to understand and use, rather than code which is easy to write.
If you can think in terms of examples and providing value through behaviour, you'll create code that's easy to use, with examples and descriptions that other people can follow. You'll make your code safe and easy to change. If you think about testing, you'll pin it down so that nobody can break it. You'll make it hard to change.
If it's complex enough that there are internal methods you really want to test separately, break them out into another class then show why that class is valuable and what it does for the class that uses it.
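For instance, here is a sketch with hypothetical names (Invoice and TaxCalculator are made up). Tax calculation that might have started life as a private method on Invoice becomes its own class, so it can be described through its public interface, and Invoice just uses it.

```ruby
# Formerly a private helper; now a class with its own public behaviour.
class TaxCalculator
  def initialize(rate)
    @rate = rate
  end

  # Tax in whole cents for the given amount.
  def tax_for(amount_cents)
    (amount_cents * @rate).round
  end
end

# The original class delegates the responsibility instead of hiding it.
class Invoice
  def initialize(subtotal_cents, tax_calculator)
    @subtotal_cents = subtotal_cents
    @tax_calculator = tax_calculator
  end

  def total_cents
    @subtotal_cents + @tax_calculator.tax_for(@subtotal_cents)
  end
end

calculator = TaxCalculator.new(0.2)
puts calculator.tax_for(1000)                  # prints 200
puts Invoice.new(1000, calculator).total_cents # prints 1200
```

The examples for Invoice can now pass in a stand-in calculator, which shows what Invoice delegates and what it does itself.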
Hope this helps!

I think there are two issues here.
One is that from the BDD perspective, you are typically testing at a higher level than from the TDD perspective. So your BDD tests will assert a bigger piece of functionality than your TDD tests and should always be "black box" tests.
The second is that if you feel the need to test private methods, even at the unit test level, that could be a code smell: your code may be violating the Single Responsibility Principle and should be refactored so that the methods you care about can be tested as public methods of a different class. Michael Feathers gave an interesting talk about this recently called "The Deep Synergy Between Testability and Good Design."

Yes, focus on the exposed functionality of the class. Private methods are just part of a public function you will test. This point is a bit controversial, but in my opinion it should be enough to test the public functionality of a class (testing anything else also breaks encapsulation, a core OOP principle).

What separates a Ruby DSL from an ordinary API

What are some defining characteristics of a Ruby DSL that separate it from just a regular API?

When you use an API you instantiate objects and call methods in an imperative manner. A good DSL, on the other hand, should be declarative, representing rules and relationships in your problem domain, not instructions to be executed. Moreover, a DSL should ideally be readable and modifiable by somebody who is not a programmer (which is not the case with APIs).
Also please keep in mind the distinction between internal and external DSLs.
An internal domain-specific language is embedded in a programming language (e.g. Ruby). It's easy to implement, but the structure of the DSL is dependent on the parent language it is embedded in.
An external domain-specific language is a separate language designed with the particular domain in mind. It gives you greater flexibility when it comes to syntax, but you have to implement the code to interpret it. It's also more secure, as the person editing domain rules doesn't have access to all the power of the parent language.

DSL (domain specific language) is an over-hyped term. If you are simply using a sub-set of a language (say Ruby), how is it a different language than the original? The answer is, it isn't.
However, if you do some preprocessing of the source text to introduce new syntax or new semantics not found in the core language then you indeed have a new language, which may be domain-specific.

The combination of Ruby's poetry mode and operator overloading does present the possibility of having something that is at the same time legal Ruby syntax and a reasonable DSL.
And the continued aggravation that is XML does show that perhaps the simple DSL built into all those config files wasn't completely misguided.

Creating a DSL:
Adding new methods to the Object class so that you can just call them as if they were built-in language constructs. (see rake)
Creating methods on a custom object or set of objects, and then having script files run the statements in the context of a top-level object. (see capistrano)
API design:
Creating methods on a custom object or set of objects, so the user creates an object to use the methods.
Creating methods as class methods, so that the user prefixes the classname in front of all the methods.
Creating methods as a mixin that users include or extend to use the methods in their custom objects.
So yes, the line is thin between them. It's trivial to turn a custom set of objects into a DSL by adding one method that runs a script file in the right context.
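Here is a minimal sketch of that "one method" step. TaskList, task, and load_script are hypothetical names, loosely in the spirit of the second DSL approach above. It is not how rake or capistrano are actually implemented.

```ruby
# An ordinary object with an ordinary API.
class TaskList
  attr_reader :tasks

  def initialize
    @tasks = {}
  end

  # Registers a named block; inside a script this reads like a keyword.
  def task(name, &block)
    @tasks[name] = block
  end

  # The one extra method: evaluates script text with self set to this object.
  def load_script(source)
    instance_eval(source)
    self
  end
end

# Script text that would normally live in its own file.
script = <<~RUBY
  task :build do
    "compiling"
  end

  task :test do
    "running tests"
  end
RUBY

list = TaskList.new.load_script(script)
p list.tasks.keys            # prints [:build, :test]
puts list.tasks[:build].call # prints compiling
```

Without load_script, the user would write list.task(:build) { ... } and it would look like a plain API again.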

The difference between a DSL and an API to me is that a DSL could be at least understood (and verified) if not written as a sub-language of Ruby by someone in that domain.
For example, you could have financial analysts writing rules for a stock trading application in a Ruby DSL and they would never have to know they were using Ruby.

They are, in fact, the same thing. DSLs are generally implemented via the normal language mechanisms in Ruby, so technically they're all APIs.
However, for people to recognize something as a DSL, it usually ends up adding what look like declarative statements to existing classes. Something like the validators and relationship declarations in ActiveRecord.
class Foo < ActiveRecord::Base
  validates_uniqueness_of :name
  validates_numericality_of :number, :only_integer => true
end
looks like a DSL, while the following doesn't:
class Foo < ActiveRecord::Base
  def validate
    unless unique? name
      errors.add(:name, "must be unique")
    end
    unless number.to_s.match?(/\A-?\d+\z/)
      errors.add(:number, "must be an integer")
    end
  end
end
They're both going to be implemented by normal Ruby code. It's just that one looks like you've got cool new language constructs, while the other seems rather pedestrian (and overly verbose, etc. etc.)
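To show that the declarative form is still just normal Ruby, here is a self-contained sketch of a hypothetical mini-framework. Model and validates_integer are made up for illustration and this is not how ActiveRecord is implemented. The "declaration" is a class method call that stores a check to run later.

```ruby
# Hypothetical base class providing a declarative-looking validator.
class Model
  # Each subclass keeps its own list of checks in a class-level instance variable.
  def self.validations
    @validations ||= []
  end

  # Looks like a language construct, but it is a plain class method.
  def self.validates_integer(attribute)
    check = lambda do |record|
      value = record.public_send(attribute)
      "#{attribute} must be an integer" unless value.to_s.match?(/\A-?\d+\z/)
    end
    validations << check
  end

  # Runs every stored check and collects the error messages.
  def errors
    self.class.validations.map { |check| check.call(self) }.compact
  end
end

class Foo < Model
  attr_accessor :number

  validates_integer :number
end

foo = Foo.new
foo.number = "12a"
p foo.errors # prints ["number must be an integer"]
foo.number = 42
p foo.errors # prints []
```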
