Do you use nouns for classnames that represent callable objects? - coding-style

There is a more generic question asked here. Consider this as an extension to that.
I understand that classes represent types of objects and that we should use nouns as their names. But many languages support function objects, which act more like functions than objects. Should I name those classes as nouns too, or are verbs OK in that case? doSomething(), semantically, makes more sense than Something().
Update / Conclusion
The two top-voted answers I got on this share mixed opinions:
Attila
In the case of functors, however, they represent "action", so naming them with a verb (or some sort of noun-verb combination) is more appropriate -- just like you would name a function based on what it is doing.
Rook
The instance of a functor on the other hand is effectively a function, and would perhaps be better named accordingly. Following Jim's suggestion above, SomethingDoer doSomething; doSomething();
Both views, judging by a look through some code, reflect common practice. In the GNU implementation of the STL I found classes like negate, plus, minus (bits/stl_function.h), as well as variate_generator and mersenne_twister (tr1/random.h). Similarly, in Boost I found classes like base_visitor and predecessor_recorder (graph/visitors.hpp), as well as inverse and inplace_erase (icl/functors.hpp).

Objects (and thus classes) usually represent "things", therefore you want to name them with nouns. In the case of functors, however, they represent "action", so naming them with a verb (or some sort of noun-verb combination) is more appropriate -- just like you would name a function based on what it is doing.
In general, you would want to name things (objects, functions, etc.) after their purpose, that is, what they represent in the program/the world.
You can think of functors as functions (thus "action") with state. Since the clean way of achieving this (having a state associated with your "action") is to create an object that stores the state, you get an object, which is really an "action" (a fancy function).
Note that the above applies to languages where you can make a pure functor, that is, an object whose invocation looks the same as a regular function call (e.g. C++). In languages where this is not supported (that is, you have to call a method other than operator(), like command.do()), you would want to name the command-like class/object with a noun and the method called with a verb.
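To make the "action with state" idea concrete, here is a minimal C++ sketch (the GreaterThan functor is my own illustration, not from the answer):

#include <algorithm>
#include <iostream>
#include <vector>

// A functor: an "action" (compare against a threshold) that carries state (the threshold).
class GreaterThan {
public:
    explicit GreaterThan(int threshold) : threshold_(threshold) {}

    // Invocation looks exactly like a regular function call.
    bool operator()(int value) const { return value > threshold_; }

private:
    int threshold_;
};

int main() {
    std::vector<int> v{1, 5, 8, 3, 9};
    // The instance reads like a predicate when handed to an algorithm.
    auto count = std::count_if(v.begin(), v.end(), GreaterThan(4));
    std::cout << count << '\n';  // prints 3
}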

The type of the functor is a class, which you may as well name in your usual style (for instance, a noun) because it doesn't really do anything on its own. The instance of a functor on the other hand is effectively a function, and would perhaps be better named accordingly. Following Jim's suggestion above, SomethingDoer doSomething; doSomething();
A quick peek at my own code shows I've managed to avoid this issue entirely by using words which are both nouns and verbs... Filter and Show for example. Might be tricky to apply that naming scheme in all circumstances, however!

I suggest you give them a suffix like Function, Callback, Action, Command, etc. so that you will have classes called SortFunction, SearchAction, ExecuteCommand instead of Sort, Search, Execute which sound more like names of actual functions.

I think there are two ways to see this:
1) The class, which could be named sensibly so that it can be invoked as a functor directly upon construction:
Doer()(); // constructed and invoked in one shot, compiler could optimize
2) The instance, for cases where we want the functor to be invoked multiple times on a stateless functor object, or perhaps for stylistic reasons when #1's syntax is not preferable.
Doer _do;
_do(); // operator () invocation after construction
I like @Rook's suggestion above of naming the class with words that have the same verb and noun forms, such as Search or Filter.
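Putting the two usages together, a minimal runnable sketch (Doer is the placeholder name from the fragments above):

#include <iostream>

// A stateless functor named for the action it performs.
struct Doer {
    void operator()() const { std::cout << "doing something\n"; }
};

int main() {
    Doer()();   // 1) constructed and invoked in one shot

    Doer doer;  // 2) a named instance, invoked as if it were a function
    doer();
    doer();
}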

Related

How to split up really long match on enum with many variants?

What's the usual best practice to split up a really long match on an enum with dozens of variants to handle, each with dozens or hundreds of lines of code?
I've started to create helper functions for each case and just call those functions passing in the enum's fields (or whatever they're called). But it seems a bit redundant to have MyEnum::MyCase{a,b,c} => handle_mycase(a,b,c) many times.
And if that is the best practice, is it possible to destructure MyEnum::MyCase directly in that helper function's parameters, despite the fact that technically it's refutable, since realistically I already know I'm calling it with the right case?
Maybe the crate enum_dispatch helps you.
IIRC, at a high level it assumes that all your enum variants implement a trait with a function handle_mycase. Then handle_mycase can be called on the enum directly and will be dispatched to the concrete struct.
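A rough sketch of the underlying pattern in plain Rust (the type and trait names are invented here; the crate essentially generates the forwarding impl for you):

// Each variant's payload is its own struct implementing a shared trait.
struct CaseA { a: i32 }
struct CaseB { text: String }

trait Handle {
    fn handle_mycase(&self);
}

impl Handle for CaseA {
    fn handle_mycase(&self) {
        println!("handling A: {}", self.a);
    }
}

impl Handle for CaseB {
    fn handle_mycase(&self) {
        println!("handling B: {}", self.text);
    }
}

enum MyEnum {
    A(CaseA),
    B(CaseB),
}

// The boilerplate that enum_dispatch would write for you.
impl Handle for MyEnum {
    fn handle_mycase(&self) {
        match self {
            MyEnum::A(inner) => inner.handle_mycase(),
            MyEnum::B(inner) => inner.handle_mycase(),
        }
    }
}

fn main() {
    let e = MyEnum::B(CaseB { text: "hello".to_string() });
    e.handle_mycase();
}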

Is there a difference between fun(n::Integer) and fun(n::T) where T<:Integer in performance/code generation?

In Julia, I most often see code written like fun(n::T) where T<:Integer, when the function works for all subtypes of Integer. But sometimes, I also see fun(n::Integer), which some guides claim is equivalent to the above, whereas others say it's less efficient because Julia doesn't specialize on the specific subtype unless the subtype T is explicitly referred to.
The latter form is obviously more convenient, and I'd like to be able to use it if possible, but are the two forms equivalent? If not, what are the practical differences between them?
Yes, Bogumił Kamiński is correct in his comment: f(n::T) where T<:Integer and f(n::Integer) will behave exactly the same, with the exception that the former method will have the name T already defined in its body. Of course, in the latter case you can just explicitly assign T = typeof(n) and it'll be computed at compile time.
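A tiny sketch of that point (the function names are hypothetical):

via_signature(n::T) where {T<:Integer} = T    # T is bound by the signature
via_typeof(n::Integer) = typeof(n)            # the same type, recovered explicitly

via_signature(1)    # Int64 (on a 64-bit platform)
via_typeof(1)       # Int64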
There are a few other cases where using a TypeVar like this is crucially important, though, and it's probably worth calling them out:
f(::Array{T}) where T<:Integer is indeed very different from f(::Array{Integer}). This is the common parametric invariance gotcha (docs and another SO question about it; see the sketch after this list).
f(::Type) will generate just one specialization for all DataTypes. Because types are so important to Julia, the Type type itself is special and allows parameterization like Type{Integer} to allow you to specify just the Integer type. You can use f(::Type{T}) where T<:Integer to require Julia to specialize on the exact type of Type it gets as an argument, allowing Integer or any subtypes thereof.
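A quick sketch of the invariance gotcha from the first bullet (the function names are hypothetical):

any_int_array(::Array{T}) where {T<:Integer} = "an array of some concrete Integer subtype"
abstract_int_array(::Array{Integer}) = "only Array{Integer} itself"

any_int_array([1, 2, 3])           # works: Vector{Int} matches Array{T} where T<:Integer
abstract_int_array(Integer[1, 2, 3])    # works: the element type is exactly Integer
abstract_int_array([1, 2, 3])           # MethodError: Vector{Int} is not an Array{Integer}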
Both definitions are equivalent. Normally you will use the fun(n::Integer) form and apply fun(n::T) where T<:Integer only if you need to use the specific type T directly in your code. For example, consider the following definitions from Base (all of the definitions below are also from Base), where it has a natural use:
zero(::Type{T}) where {T<:Number} = convert(T,0)
or
(+)(x::T, y::T) where {T<:BitInteger} = add_int(x, y)
And even if you need type information, in many cases it is enough to use the typeof function. Again, an example definition is:
oftype(x, y) = convert(typeof(x), y)
Even if you are using a parametric type, you can often avoid the where clause (which is a bit verbose), as in:
median(r::AbstractRange{<:Real}) = mean(r)
because you do not care about the actual value of the parameter in the body of the function.
Now, if you are a Julia user like me, the question is how to convince yourself that this works as expected. There are the following methods:
you can check that one definition overwrites the other in the method table (i.e. after evaluating both definitions only one method is present for this function);
you can check the code generated by both functions using @code_typed, @code_warntype, @code_llvm or @code_native etc. and find out that it is the same;
finally, you can benchmark the code for performance using BenchmarkTools (a short sketch of the first two checks follows below).
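For instance, a quick REPL sketch (with a hypothetical f):

f(n::Integer) = n + 1
f(n::T) where {T<:Integer} = n + 1   # replaces the previous definition

length(methods(f))    # 1 — only one method exists, so the signatures are the same

@code_llvm f(1)       # inspect the generated code; it is specialized for Int either way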
A nice plot explaining what Julia does with your code is here: http://slides.com/valentinchuravy/julia-parallelism#/1/1 (I also recommend the whole presentation to any Julia user - it is excellent). You can see on it that Julia, after lowering the AST, applies a type-inference step to specialize the function call before the LLVM codegen step.
You can hint the Julia compiler to avoid specialization. This is done using the @nospecialize macro on Julia 0.7 (it is only a hint, though).

Naming convention for syntactic sugar methods

I'm building a library for generic Excel reporting (using Spreadsheet), and most of the time I'll be writing things out on the last created worksheet (or the active one, as I mostly refer to it).
So I'm wondering if there's a naming convention for methods that are mostly sugar for the normal/unsugared method.
For instance, a while ago I saw a blog post (scroll down to Composite) where the author used #method for the sugared version and #method! for the unsugared/manual one.
Could this be said to be a normal way of doing things, or just an odd implementation?
What I'm thinking of doing now is:
add_row(data)
add_row!(sheet, data)
This feels like a good fit to me, but is there a consensus on how these kinds of methods should be named?
Edit
I'm aware that the ! is used for "dangerous" methods and ? for query/boolean responses, which is why I got curious whether the usage in Prawn (the blog post) could be said to be normal.
I think it's fair to say that your definitions:
add_row(data)
add_row!(sheet, data)
are going to confuse Ruby users. There are a good number of naming conventions in the Ruby community that are considered a de facto standard. For example, bang methods are meant to modify the receiver; see map and map!. Another convention is adding ? as a suffix to methods that return a boolean. See all? or any? for reference.
I used to see bang methods as the more dangerous version of a regularly named method:
Array#reverse! modifies the array itself instead of returning a new array with the elements in reversed order.
ActiveRecord::Base#save! (from Rails) validates the model and saves it if it's valid. But unlike the regular version, which returns true or false depending on whether the model was saved, it raises an exception if the model is invalid.
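For instance, the non-bang/bang pair behaves like this:

a = [1, 2, 3]
a.reverse    # => [3, 2, 1]; a is untouched
a            # => [1, 2, 3]
a.reverse!   # => [3, 2, 1]; a is modified in place
a            # => [3, 2, 1]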
I don't remember seeing bang methods as sugared alternatives for regular methods. Maybe I'd give such methods their own distinct name rather than just adding a bang to the regular version's name.
Why have two separate methods? You could, for example, make the sheet an optional parameter:
def add_row(sheet = active_sheet, data)
...
end
Default values don't have to be static values; in this case it's a call to the active_sheet method. If my memory is correct, prior to Ruby 1.9 you'd have to swap the parameters, as optional parameters couldn't be followed by non-optional ones.
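A self-contained sketch of that approach (Reporter and its array-backed sheets are made up here; the real library would hand out Spreadsheet worksheets instead):

class Reporter
  def initialize
    @sheets = [[]]               # each "sheet" is just an array of rows here
  end

  def active_sheet
    @sheets.last
  end

  # The sheet defaults to the active one; callers may still pass one explicitly.
  def add_row(sheet = active_sheet, data)
    sheet << data
  end
end

report = Reporter.new
report.add_row(["a", 1])                        # goes to the active sheet
report.add_row(report.active_sheet, ["b", 2])   # explicit sheet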
I'd agree with other answers that ! has rather different connotations to me.

How to name factory like methods?

I guess that most factory-like methods start with create. But why are they called "create"? Why not "make", "produce", "build", "generate" or something else? Is it only a matter of taste? A convention? Or is there a special meaning in "create"?
createURI(...)
makeURI(...)
produceURI(...)
buildURI(...)
generateURI(...)
Which one would you choose in general and why?
Some random thoughts:
'Create' fits the feature better than most other words. The next best word I can think of off the top of my head is 'Construct'. In the past, 'Alloc' (allocate) might have been used in similar situations, reflecting the greater emphasis on blocks of data than objects in languages like C.
'Create' is a short, simple word that has a clear intuitive meaning. In most cases people probably just pick it as the first, most obvious word that comes to mind when they wish to create something. It's a common naming convention, and "object creation" is a common way of describing the process of... creating objects.
'Construct' is close, but it is usually used to describe a specific stage in the process of creating an object (allocate/new, construct, initialise...)
'Build' and 'Make' are common terms for processes relating to compiling code, so have different connotations to programmers, implying a process that comprises many steps and possibly a lot of disk activity. However, the idea of a Factory "building" something is a sensible idea - especially in cases where a complex data-structure is built, or many separate pieces of information are combined in some way.
'Generate' to me implies a calculation which is used to produce a value from an input, such as generating a hash code or a random number.
'Produce', 'Generate', 'Construct' are longer to type/read than 'Create'. Historically programmers have favoured short names to reduce typing/reading.
Joshua Bloch in "Effective Java" suggests the following naming conventions
valueOf — Returns an instance that has, loosely speaking, the same value
as its parameters. Such static factories are effectively
type-conversion methods.
of — A concise alternative to valueOf, popularized by EnumSet (Item 32).
getInstance — Returns an instance that is described by the parameters
but cannot be said to have the same value. In the case of a singleton,
getInstance takes no parameters and returns the sole instance.
newInstance — Like getInstance, except that newInstance guarantees that
each instance returned is distinct from all others.
getType — Like getInstance, but used when the factory method is in a
different class. Type indicates the type of object returned by the
factory method.
newType — Like newInstance, but used when the factory method is in a
different class. Type indicates the type of object returned by the
factory method.
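For concreteness, a few JDK methods that happen to follow these conventions (my own illustration, not part of the quoted list):

import java.lang.reflect.Array;
import java.time.DayOfWeek;
import java.util.Calendar;
import java.util.EnumSet;

public class FactoryNames {
    public static void main(String[] args) {
        // valueOf: effectively a type-conversion method
        Integer i = Integer.valueOf("42");

        // of: the concise alternative, popularized by EnumSet
        EnumSet<DayOfWeek> weekend = EnumSet.of(DayOfWeek.SATURDAY, DayOfWeek.SUNDAY);

        // getInstance: an instance described by the parameters (or the sole instance)
        Calendar cal = Calendar.getInstance();

        // newInstance: a distinct instance on every call
        String[] fresh = (String[]) Array.newInstance(String.class, 4);

        System.out.println(i + " " + weekend + " " + cal.getTime() + " " + fresh.length);
    }
}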
Wanted to add a couple of points I don't see in other answers.
Although traditionally 'Factory' means 'creates objects', I like to think of it more broadly as 'returns me an object that behaves as I expect'. I shouldn't always have to know whether it's a brand new object, in fact I might not care. So in suitable cases you might avoid a 'Create...' name, even if that's how you're implementing it right now.
Guava is a good repository of factory naming ideas. It is popularising a nice DSL style. Examples:
Lists.newArrayListWithCapacity(100);
ImmutableList.of("Hello", "World");
"Create" and "make" are short, reasonably evocative, and not tied to other patterns in naming that I can think of. I've also seen both quite frequently and suspect they may be "de facto standards". I'd choose one and use it consistently at least within a project. (Looking at my own current project, I seem to use "make". I hope I'm consistent...)
Avoid "build" because it fits better with the Builder pattern and avoid "produce" because it evokes Producer/Consumer.
To really continue the metaphor of the "Factory" name for the pattern, I'd be tempted by "manufacture", but that's too long a word.
I think it stems from “to create an object”. However, in English, the word “create” is associated with the notion “to cause to come into being, as something unique that would not naturally evolve or that is not made by ordinary processes,” and “to evolve from one's own thought or imagination, as a work of art or an invention.” So it seems as “create” is not the proper word to use. “Make,” on the other hand, means “to bring into existence by shaping or changing material, combining parts, etc.” For example, you don’t create a dress, you make a dress (object). So, in my opinion, “make” by meaning “to produce; cause to exist or happen; bring about” is a far better word for factory methods.
Partly convention, partly semantics.
Factory methods (signalled by the traditional create) should invoke appropriate constructors. If I saw buildURI, I would assume that it involved some computation, or assembly from parts (and I would not think there was a factory involved). The first thing that I thought when I saw generateURI is making something random, like a new personalized download link. They are not all the same, different words evoke different meanings; but most of them are not conventionalised.
I'd call it UriFactory.Create()
Where,
UriFactory is the name of the class type, which provides method(s) that create Uri instances,
and the Create() method is overloaded for as many variations as you have in your specs.
public static class UriFactory
{
    // Default creator
    public static UriType Create()
    {
    }

    // An overload for Create()
    public static UriType Create(someArgs)
    {
    }
}
I like new. To me
var foo = newFoo();
reads better than
var foo = createFoo();
Translated to English we have "foo is a new foo" or "foo is create foo". While I'm not a grammar expert, I'm pretty sure the latter is grammatically incorrect.
I'd point out that I've seen all of these verbs but produce in use in some library or other, so I wouldn't call create a universal convention.
Now, create does sound better to me; it evokes the precise meaning of the action.
So yes, it is a matter of (literary) taste.
Personally I like instantiate and instantiateWith, but that's just because of my Unity and Objective C experiences. Naming conventions inside the Unity engine seem to revolve around the word instantiate to create an instance via a factory method, and Objective C seems to like with to indicate what the parameter/s are. This only really works well if the method is in the class that is going to be instantiated though (and in languages that allow constructor overloading, this isn't so much of a 'thing').
Just plain old Objective C's initWith is also a good'un!

Why isn't DRY considered a good thing for type declarations?

It seems like people who would never dare cut and paste code have no problem specifying the type of something over and over and over. Why isn't it emphasized as a good practice that type information should be declared once and only once so as to cause as little ripple effect as possible throughout the source code if the type of something is modified? For example, using pseudocode that borrows from C# and D:
MyClass<MyGenericArg> foo = new MyClass<MyGenericArg>(ctorArg);
void fun(MyClass<MyGenericArg> arg) {
    gun(arg);
}
void gun(MyClass<MyGenericArg> arg) {
    // do stuff.
}
Vs.
var foo = new MyClass<MyGenericArg>(ctorArg);
void fun(T)(T arg) {
    gun(arg);
}
void gun(T)(T arg) {
    // do stuff.
}
It seems like the second one is a lot less brittle if you change the name of MyClass, or change the type of MyGenericArg, or otherwise decide to change the type of foo.
I don't think you're going to find a lot of disagreement with your argument that the latter example is "better" for the programmer. A lot of language design features are there because they're better for the compiler implementer!
See Scala for one reification of your idea.
Other languages (such as the ML family) take type inference much further, and create a whole style of programming where the type is enormously important, much more so than in the C-like languages. (See The Little MLer for a gentle introduction.)
It isn't considered a bad thing at all. In fact, C# maintainers are already moving a bit towards reducing the tiring boilerplate with the var keyword, where
MyContainer<MyType> cont = new MyContainer<MyType>();
is exactly equivalent to
var cont = new MyContainer<MyType>();
Although you will see many people argue against var usage, which kind of shows that many people are not familiar with strongly typed languages that have type inference; type inference gets mistaken for dynamic/soft typing.
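A small sketch of why var is still static typing (my own illustration, not from the answer):

using System.Collections.Generic;

class VarDemo
{
    static void Main()
    {
        var cont = new List<string>();  // cont's compile-time type is List<string>
        cont.Add("hello");              // fine

        // cont.Add(42);                // would not compile: int is not a string
        // cont = 42;                   // would not compile: cannot assign int to List<string>
    }
}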
Repetition may lead to more readable code, and sometimes may be required in the general case. I've always seen the focus of DRY as being more about duplicating logic than repeating literal text. Technically, you can eliminate 'var' and 'void' from your bottom code as well. Not to mention you already indicate scope with indentation, so why repeat yourself with braces?
Repetition can also have practical benefits: parsing by a program is easier by keeping the 'void', for example.
(However, I still strongly agree with you on preferring "var name = new Type()" over "Type name = new Type()".)
It's a bad thing. This very topic was mentioned in Google's Go language Techtalk.
Albert Einstein said, "Everything should be made as simple as possible, but not one bit simpler."
Your complaint makes no sense in the case of a dynamically typed language, so you must intend this to refer to statically typed languages. In that case, your replacement example implicitly uses Generics (aka Template Classes), which means that any time fun or gun is used, a new definition is generated based upon the type of the argument. That could result in dozens of extra methods, regardless of the intent of the programmer. In particular, you're throwing away the benefit of compiler-checked type safety for a runtime error.
If your goal was to simply pass through the argument without checking its type, then the correct type would be Object not T.
Type declarations are intended to make the programmer's life simpler, by catching errors at compile-time, instead of failing at runtime. If you have an overly complex type definition, then you probably don't understand your data. In your example, I would have suggested adding fun and gun to MyClass, instead of defining them separately. If fun and gun don't apply to all possible template types, then they should be defined in an explicit subclass, not as separate functions that take a templated class argument.
Generics exist as a way to wrap behavior around more specific objects. List, Queue, Stack, these are fine reasons for Generics, but at the end of the day, the only thing you should be doing with a bare Generic is creating an instance of it, and calling methods on it. If you really feel the need to do more than that with a Generic, then you probably need to embed your Generic class as an instance object in a wrapper class, one that defines the behaviors you need. You do this for the same reason that you embed primitives into a class: because by themselves, numbers and strings do not convey semantic information about their contents.
Example:
What semantic information does a List of integer triples convey? Just that you're working with multiple triples of integers. On the other hand, a List of Colors, where a Color has 3 integers (red, blue, green) with bounded values (0-255), conveys the intent that you're working with multiple Colors, but provides no hint as to whether the List is ordered, allows duplicates, or any other information about the Colors. Finally, a Palette can add those semantics for you: a Palette has a name, contains multiple Colors but no duplicates, and its order isn't important.
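A rough sketch of that idea (Color and Palette are invented here for illustration):

using System.Collections.Generic;

// Bounded byte channels already express the 0-255 constraint.
public record Color(byte Red, byte Green, byte Blue);

// The wrapper supplies the semantics a bare List<Color> cannot:
// a name, no duplicates, and no significant ordering.
public class Palette
{
    private readonly HashSet<Color> colors = new HashSet<Color>();

    public Palette(string name) => Name = name;

    public string Name { get; }

    public bool Add(Color color) => colors.Add(color);

    public IReadOnlyCollection<Color> Colors => colors;
}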
This has gotten a bit far afield from the original question, but what it means to me is that DRY (Don't Repeat Yourself) means specifying information once, but that specification should be as precise as is necessary.

Resources