how to organize classes in ruby if they are literal subclasses - ruby

I know that title didn't make sense, Im sorry! Its hard to word what I am trying to ask. I had trouble googling it for the same reason.
So this isn't even Ruby specific, but I am working in ruby and I am new to it, so bear with me.
So you have a class that is a document. Inside each document, you have sentences, and each sentence has words. Words will have properties, like "noun" or a count of how many times they are used in the document, etc.
I would like each of the elements, document, sentence, word be an object.
Now, if you think literally - sentences are in documents, and words are in sentences.
Should this be organized literally like this as well? Like inside the document class you will define and instantiate the sentence objects, and inside the sentence class you will define and instantiate the words?
Or, should everything be separate and reference each other? Like the word class would sit outside the sentence class but the sentence class would be able to instantiate and work with words?
This is a basic OOP question I guess, and I suppose you could argue to do it either way. What do you guys think?
Each sentence in the document could be stored in a hash of sentence objects inside the document object, and each word in the sentence could be stored in a hash of word objects inside the sentence.
I dont want to code myself into a corner here, thats why I am asking, plus I have wondered this before in other situations.
Thank you!

Looks like you are mixing 2 concepts here: nesting objects and nesting classes. These are different things. Your objects, likely, need to be nested. However, your classes don't have to be nested. They might, but it's a matter of style (and some languages don't allow nested classes at all). Also, it depends on how you plan to use these classes. For example, in your case, do words have meaning as a separate concept? not just as a part of a sentence? If so, I would define Word as a separate class.

For languages that allow a class to be defined within the definition of another class, you could run into visibility problems where the internal class cannot be used by an outside class. In your case, if you think you might want to use one of these internal classes outside of the whole document-sentence-word system, then internal classes is not a good idea. In fact, I try to avoid using internal classes as a general rule, and only do it when the internal class is one that does not make sense in any other context except that of the containing class.
...so to answer your question: if it were me, I'd have them all as separate classes that reference each other.

You can create an anonymous class. You might even be able to control who has access to those anonymous classes by using access control on a method that creates / returns the anonymous class - I don't know. But I don't think it's something you really need.
Rather than totally prohibiting other parts of the code from using those classes, it'd probably be best to merely document that they're for internal use only. One way of indicating they're for internal use is to use namespaces.
If the class is named DocumentParser, then it sounds like a class everyone can use. However, if you have something named DocumentParser::Noun or DocumentParser::Sentence then they might realize that they may be using implementation details that shouldn't be relied upon.

Related

Are methods such as XContainer.Elements, XContainer.Nodes etc also considered Linq to XML query operators?

Book uses different terms for Linq-to-XML methods/properties defined in classes XObject, XNode, XElement etc ( such as XContainer.Elements, XContainer.Nodes, XObject.Document ... ) and for extension methods defined in Extensions class. For former it uses the term methods, while with extension methods it uses the term query operators.
Is there a particular reason why author uses two different terms or are XContainer.Elements, XContainer.Nodes etc also considered Linq-to-XML query operators?
Thank you
Ultimately I doubt that these terms are specified anywhere in a particularly definitive way - and I wouldn't worry too much about it.
I wouldn't be surprised to see the author using the terms inconsistently themselves. I'd be even less surprised if that were the case and the author turned out to be me ;)
I'm not sure which book you are refering to, but the Elements/Nodes/etc methods are considered Axis Methods (http://msdn.microsoft.com/en-us/library/bb387099.aspx). I would think the query operators would be things like Select/Where/OrderBy regardless of whether they exist directly on the type in question or if they were static extension methods.

How can I determine the size of methods and classes in Ruby?

I'm working on a code visualization tool and I'd like to be able to display the size(in lines) of each Class, Method, and Module in a project. It seems like existing parsers(such as Ripper) could make this info easy to get. Is there a preferred way to do this? Is there a method of assessing size for classes that have been re-opened in separate locations? How about for dynamically (Class.new {}, Module.new {}) defined structures?
I think what you're asking for is not possible in general without actually running the whole Ruby program the classes are part of (and then you run into the halting problem). Ruby is extremely dynamic, so lines could be added to a class' definition anywhere, at any time, without necessarily referring to the particular class by name (e.g. using class_eval on a class passed into a method as an argument). Not that the source code of a class' definition is saved anyway... I think the closest you could get to that is the source_locations of the methods of the class.
You could take the difference of the maximum and minimum line numbers of those source_locations for each file. Then you'd have to assume that the class is opened only once per file, and that the size of the last method in a file is negligible (as well as any non-method parts of the class definition that happen before the first method definition or after the last one).
If you want something more accurate maybe you could run the program, get method source_locations, and try to correlate those with a separate parse of the source file(s), looking for enclosing class blocks etc.
But anything you do will most likely involve assumptions about how classes are generally defined, and thus not always be correct.
EDIT: Just saw that you were asking about methods and modules too, not just classes, but I think similar arguments apply for those.
I've created a gem that handles this problem in the fashion suggested by wdebaum. class_source. It certainly doesn't cover all cases but is a nice 80% solution for folks that need this type of thing. Patches welcome!

How to name factory like methods?

I guess that most factory-like methods start with create. But why are they called "create"? Why not "make", "produce", "build", "generate" or something else? Is it only a matter of taste? A convention? Or is there a special meaning in "create"?
createURI(...)
makeURI(...)
produceURI(...)
buildURI(...)
generateURI(...)
Which one would you choose in general and why?
Some random thoughts:
'Create' fits the feature better than most other words. The next best word I can think of off the top of my head is 'Construct'. In the past, 'Alloc' (allocate) might have been used in similar situations, reflecting the greater emphasis on blocks of data than objects in languages like C.
'Create' is a short, simple word that has a clear intuitive meaning. In most cases people probably just pick it as the first, most obvious word that comes to mind when they wish to create something. It's a common naming convention, and "object creation" is a common way of describing the process of... creating objects.
'Construct' is close, but it is usually used to describe a specific stage in the process of creating an object (allocate/new, construct, initialise...)
'Build' and 'Make' are common terms for processes relating to compiling code, so have different connotations to programmers, implying a process that comprises many steps and possibly a lot of disk activity. However, the idea of a Factory "building" something is a sensible idea - especially in cases where a complex data-structure is built, or many separate pieces of information are combined in some way.
'Generate' to me implies a calculation which is used to produce a value from an input, such as generating a hash code or a random number.
'Produce', 'Generate', 'Construct' are longer to type/read than 'Create'. Historically programmers have favoured short names to reduce typing/reading.
Joshua Bloch in "Effective Java" suggests the following naming conventions
valueOf — Returns an instance that has, loosely speaking, the same value
as its parameters. Such static factories are effectively
type-conversion methods.
of — A concise alternative to valueOf, popularized by EnumSet (Item 32).
getInstance — Returns an instance that is described by the parameters
but cannot be said to have the same value. In the case of a singleton,
getInstance takes no parameters and returns the sole instance.
newInstance — Like getInstance, except that newInstance guarantees that
each instance returned is distinct from all others.
getType — Like getInstance, but used when the factory method is in a
different class. Type indicates the type of object returned by the
factory method.
newType — Like newInstance, but used when the factory method is in a
different class. Type indicates the type of object returned by the
factory method.
Wanted to add a couple of points I don't see in other answers.
Although traditionally 'Factory' means 'creates objects', I like to think of it more broadly as 'returns me an object that behaves as I expect'. I shouldn't always have to know whether it's a brand new object, in fact I might not care. So in suitable cases you might avoid a 'Create...' name, even if that's how you're implementing it right now.
Guava is a good repository of factory naming ideas. It is popularising a nice DSL style. examples:
Lists.newArrayListWithCapacity(100);
ImmutableList.of("Hello", "World");
"Create" and "make" are short, reasonably evocative, and not tied to other patterns in naming that I can think of. I've also seen both quite frequently and suspect they may be "de facto standards". I'd choose one and use it consistently at least within a project. (Looking at my own current project, I seem to use "make". I hope I'm consistent...)
Avoid "build" because it fits better with the Builder pattern and avoid "produce" because it evokes Producer/Consumer.
To really continue the metaphor of the "Factory" name for the pattern, I'd be tempted by "manufacture", but that's too long a word.
I think it stems from “to create an object”. However, in English, the word “create” is associated with the notion “to cause to come into being, as something unique that would not naturally evolve or that is not made by ordinary processes,” and “to evolve from one's own thought or imagination, as a work of art or an invention.” So it seems as “create” is not the proper word to use. “Make,” on the other hand, means “to bring into existence by shaping or changing material, combining parts, etc.” For example, you don’t create a dress, you make a dress (object). So, in my opinion, “make” by meaning “to produce; cause to exist or happen; bring about” is a far better word for factory methods.
Partly convention, partly semantics.
Factory methods (signalled by the traditional create) should invoke appropriate constructors. If I saw buildURI, I would assume that it involved some computation, or assembly from parts (and I would not think there was a factory involved). The first thing that I thought when I saw generateURI is making something random, like a new personalized download link. They are not all the same, different words evoke different meanings; but most of them are not conventionalised.
I'd call it UriFactory.Create()
Where,
UriFactory is the name of the class type which is provides method(s) that create Uri instances.
and Create() method is overloaded for as many as variations you have in your specs.
public static class UriFactory
{
//Default Creator
public static UriType Create()
{
}
//An overload for Create()
public static UriType Create(someArgs)
{
}
}
I like new. To me
var foo = newFoo();
reads better than
var foo = createFoo();
Translated to english we have foo is a new foo or foo is create foo. While I'm not a grammer expert I'm pretty sure the latter is grammatically incorrect.
I'd point out that I've seen all of the verbs but produce in use in some library or other, so I wouldn't call create being an universal convention.
Now, create does sound better to me, evokes the precise meaning of the action.
So yes, it is a matter of (literary) taste.
Personally I like instantiate and instantiateWith, but that's just because of my Unity and Objective C experiences. Naming conventions inside the Unity engine seem to revolve around the word instantiate to create an instance via a factory method, and Objective C seems to like with to indicate what the parameter/s are. This only really works well if the method is in the class that is going to be instantiated though (and in languages that allow constructor overloading, this isn't so much of a 'thing').
Just plain old Objective C's initWith is also a good'un!

Using Hash Tables as function inputs

I am working in a group that is writing some APIs for tools that we are using in Ruby. When writing API methods, many of my team mates use hash tables as the method's only parameter while I write my methods with each value specified.
For example, a class Apple defined as:
class Apple
#commonName
#volume
#color
end
I would instantiate the class with method:
Apple.new( commonName, volume, color )
My team mates would write it so the method looked like:
Apple.new( {"commonName"=>commonName, "volume"=>volume, "color"=>color )
I don't like using a hash table as the input. To me is seems unnecessarily bulky and doesn't add any clarity to the code. While it doesn't appear to be a big deal in this example, some of our methods have greater than 10 parameters and there will often be hash tables nested in inside other hash tables. I also noticed that using hash tables in this way is extremely uncommon in public APIs(net/telnet is the only exception that I can think of right now).
Question: What arguments could I make to my team members to not use hash tables as input parameters. The bulkiness of the code isn't a sufficient justification(they are not afraid of writing 200-400 character lines) and excessive memory/processing overhead won't work because it won't become an issue with the way our tools will be used.
Actually if your method takes more than 10 arguments, you should either redesign your class or eat dirt and use hashes. For any method that takes more than 4 arguments, using typical arguments can be counter-intuitive while calling the method, because you got to remember the order correctly.
I think best solution would be to simply redesign such methods and use something like builder or fluent patterns.
First of all, you should chide them for using strings instead of symbols for hash keys.
One issue with using a hash is that you then have to check that all the appropriate keys are in it. This makes it useful for optional parameters, but for mandatory one, why not use the built-in functionality of the language? For example, with their method, what happens if I do this:
Apple.new( {"commonName"=>commonName, "volume"=>volume} )
Whereas, with Apple.new(commonName, volume), you know you'll get an ArgumentError.
Named parameters make for more self-documenting code which is nice. But other than that there's not a lot of difference. The Hash allows for more flexibility, especially if you start doing any method aliasing. Also, the various Hash methods in ActiveSupport make setting defaults and verifying inputs pretty painless. I guess this probably wasn't the answer you were looking for.

Question about object oriented design with Ruby

I'm thinking of writing a CLI Monopoly game in Ruby. This would be the first large project I've done in Ruby. Most of my experience in programming has been with functional programming languages like Clojure and Haskell and such. I understand Object Orientation pretty well, but I have no experience with designing object oriented programs.
Now, here's the deal.
In Monopoly, there are lots of spaces spread around the board. Most spaces are properties, and others do other things.
Would it be smart to have a class for each of the spaces? I was thinking of having a Space class that all other spaces inherit from, and having a Property class that inherits from Space, and then having a class for every property that inherits from Property. This would mean a lot of classes, which leads me to believe that this is an awful way to do what I'm trying to do.
What I also intended to do, was use the 'inherited' hook method to keep track of all the properties, so that I could search them and remove them from the unbought list when needed.
This sort of problem seems to arise in a lot of programs, so my question is: Is there a better way to do this, and am I missing something very crucial to Object Oriented design?
I'm sorry if this is a stupid question, I'm just clueless when it comes to OOPD.
Thanks.
You're on the right track, but you've gone too far, which is a common beginner's mistake with OOP. Every property should not be a separate class; they should all be instances of the Property class. I'd do classes with attributes along these lines:
Space
Name
Which pieces are on it
Which space is next (and maybe previous?)
Any special actions which take place when landing on it
Property (extends Space)
Who owns it
How many houses/hotels are on it
Property value
Property monopoly group
Rent rates
Whether it is mortgaged
So, for example, Boardwalk would be an object of type Property, with specific values that apply to it, such as belonging to the dark blue monopoly group.
Classes for spaces is probably not a bad idea; but let's see why that's the case.
If you were writing this in a procedural language, where would the majority of the 'if' and 'switch' statements be? They'd be in determining what happened to each player when they landed on a space. In OO, we want to prevent as many if/switch statements as possible, and replace them with polymorphism. So, clearly, if we create many derivatives of Space, we will prevent a large number of if/switch statements.
But before we make that leap, let's look at the metaphor of the game. The game is all about a city. Addresses, streets, utilities, public works, jails, etc. Each square on the board represents some kind of location in the city. Indeed the players travel through the city paying rent (or some other action) anywhere they happen to land.
So let's call the spaces, CityBlocks. Each CityBlock has an address, like "Boardwalk" or "Electric Company", or "Community Chest". Are there different types of CityBlock objects? Or should we simply consider CityBlocks to be locations that have a certain kind of ZoningOrdinance?
I kind of like that. |CityBlock|---->|ZoningOrdinance|
Now we could have several different derivatives of ZoningOrdinance. This is the "Strategy" Pattern. I rather like the flexibility this give us, and the separation of 'where' and 'what' that it provides.
You'll also need a Game class that understand the rules of dice, tokens, FreeParking, Passing Go, etc. That class will manipulate the players, and invoke the methods on CityBlock. CityBlock will invoke methods on ZoningOrdinance.
Anyway, that's one way...

Resources