Should data be descriptive of itself? In what cases should it preferably be or not be? - data-structures

I am not sure how to put this easily into a simple question.
I am just going to use an example.
Say I am sending some parameter to a web browser from a server. The Javascript will know what to do with it. Say it was a setting for some page element that could have 4 different values. I could make it be 0-3, or I could make it be "bright", "dark", "transparent", "none". Do you see what I mean? In one case the data is descriptive.
Now step outside of the realm of web development. In fact, step away from any facet of programming that would NOT require one method or the other, and think of some that would prefer one over the other. Meaning it would be beneficial to the over all goals if it was done in a descriptive manner, or beneficial if it was done in a cryptic manner.
Can you think of some examples where you would want one over the other?
PS: I may need help with the tags on this one guys.

Benefit of the number variant is smaller data size. That can be useful if you are communicating a lot of data or communicating over a restricted bandwidth channel. Also comparing numbers is much faster than comparing strings.
The alternative with meaningful names is beneficial when you need easy extensibility and maintainability. You can see what the value means without using any other translation table. Also you can enable others to add new values by defining some naming rules.

The benefits of using the one strategy over the other is quite simmilar to the benefits of strong vs. weak typing. Values like "bright", "dark" etc. is strongly typed while 0, 1, 2 is weakly typed.
The most important benefits of using strongly typed data is 1) that it is easy for other people to know what the value means and how to use it and 2) that you will get a meaningful, syntactic error early if you use an illegal value.
The benefits of weakly typing is that you may introduce new values without having to change intermediate modules. I.e. you could introduce "4" without changing intermediate modules that don't really have to understand what the value means.
I would definitely go for "bright", "dark" etc.
NB! Some would probably argue that "bright" is a string and so is weakly typed in the same way as "1", but this depends on the perspective.

Related

When to use references versus types versus boxes and slices versus vectors as arguments and return types?

I've been working with Rust the past few days to build a new library (related to abstract algebra) and I'm struggling with some of the best practices of the language. For example, I implemented a longest common subsequence function taking &[&T] for the sequences. I figured this was Rust convention, as it avoided copying the data (T, which may not be easily copy-able, or may be big). When changing my algorithm to work with simpler &[T]'s, which I needed elsewhere in my code, I was forced to put the Copy type constraint in, since it needed to copy the T's and not just copy a reference.
So my higher-level question is: what are the best-practices for passing data between threads and structures in long-running processes, such as a server that responds to queries requiring big data crunching? Any specificity at all would be extremely helpful as I've found very little. Do you generally want to pass parameters by reference? Do you generally want to avoid returning references as I read in the Rust book? Is it better to work with &[&T] or &[T] or Vec<T> or Vec<&T>, and why? Is it better to return a Box<T> or a T? I realize the word "better" here is considerably ill-defined, but hope you'll understand my meaning -- what pitfalls should I consider when defining functions and structures to avoid realizing my stupidity later and having to refactor everything?
Perhaps another way to put it is, what "algorithm" should my brain follow to determine where I should use references vs. boxes vs. plain types, as well as slices vs. arrays vs. vectors? I hesitate to start using references and Box<T> returns everywhere, as I think that'd get me a sort of "Java in Rust" effect, and that's not what I'm going for!

Is this backwards naming convention a bad idea (ie. contrary to industry standards)?

I've always reversed names so that they naturally group in intellisense. I am wondering if this is a bad idea.
For example, I run a pet store and I have invoicing pages add, edit, delete, and store pages display, preview, edit. To get the URL for these, I would call the methods (in a suitable class like GlobalUrls.cs
InvoicingAddUrl()
InvoicingEditUrl()
InvoicingDeleteUrl()
StoreDisplayUrl()
StorePreviewUrl()
StoreEditUrl()
This groups them nicely in intellisense. More logical naming would be:
AddInvoiceUrl()
EditInvoiceUrl()
DeleteInvoiceUrl()
DisplayStoreUrl()
PreviewStoreUrl()
EditStoreUrl()
Is it better (better being, more of an industry standard way) to group them for intellisense, or logically?
Grouping in Intellisense is just one factor in creating a naming scheme, but logically grouping by category rather than function is a common practice as well.
Most naming "conventions" dictate usage of characters, casing, underscores, etc. I think it is a matter of personal preference (company, team or otherwise) as to whether you use NounVerb or VerbNoun formatting for your method names.
Here are some resources:
Microsoft - General Naming Conventions
Wikibooks C# Programming/Naming
Akadia .NET Naming Conventions
Related questions:
Naming Conventions - Guidelines for Verbs, Nouns and English Grammar Usage
Do vs. Run vs. Execute vs. Perform verbs
Events - naming convention and style
Check out how the military names things. For example, MREs are Meals, Ready to Eat. They do this because of sort order, efficiency and not making mistakes. They are ready to ignore the standard naming conventions of the language (i.e., English) used outside of their organization because they are not impressed with the quality of operations outside of their organization. In the military, the quality of operations is literally a matter of life and death. Also, by doing things their own way they have a way of identifying who is inside and who is outside of the organization. Anyone unable or unwilling to learn the military way, which is different but not impossibly difficult, is not their first choice for recruitment or promotion.
So, if you are impressed with the standard quality of software out there, then by all means keep doing what everyone else is doing. But, if you wish to do better than you have in the past, or better than your competitor, then I suggest looking at other fields for lessons learned the hard way, such as the military. Then make some choices for your organization, that are not impossible but are for you and your competitiveness. You can choose big-endian names (most significant information comes last) or the military-style little-endian names (most significant information comes first), or you can use the dominant style your competitors probably use, which is doing whatever you feel like whenever you feel like it.
Personally, I prefer little-endian Hungarian (Apps) naming, which was widely seen as superior when it first came out, but then lost favor because Hungarian (Sys) naming destroyed the advantage due to a mistranslation of the basic idea, and because of rampant abbreviations. The original intent was to start a name with what kind of a thing it is, then become increasingly specific until you end with a unique qualification. This is also the order that most array dimensions and object qualifiers are in, so in most languages little-endian naming flows into the larger scheme of the language.
You are on to something. Forward, march.
It's not intrinsically bad. It has the upside of being easier to identify the type while scanning, and groups the options together in Intellisense like you said. As long as you and everyone else on your team picks a way of doing things and stays consistent about it there shouldn't be any big problems.
Based on the methods listed, you might be able to refactor Invoicing and Store out into their own classes, which would be closer to the mythical "industry standard" way.
That said, whatever your programming team can agree on for naming convention should be fine. The important thing is to be consistent within the project.
I don't think it's a good idea to develop a coding standard around a tool (as least not as the first consideration). Even though most IDEs will have Intellisense these days, and most people will be using said IDEs, I think that first and foremost a coding standard should be about making the code legible and navigable on its own merits.
I would opt for most logical naming, personally. When I write code and I have some object I'm about to call a member function on, I'm usually thinking about what member function to call based on the action I'm about to do, because I already know the object I'm manipulating. So my first impulse would be to start typing "Add" if I wanted to add something, and see what Intellisense showed me. This is, of course, subjective.
I have never actually seen anybody using your alphabetical, Intellisense grouping anywhere -- at least not in code that is not worth using as a basis for comparison because it was so horrid in other ways.
That said, if it's your standard, do what you want -- consistency is the important part.

Why is tightly coupled bad but strongly typed good?

I am struggling to see the real-world benefits of loosely coupled code. Why spend so much effort making something flexible to work with a variety of other objects? If you know what you need to achieve, why not code specifically for that purpose?
To me, this is similar to creating untyped variables: it makes it very flexible, but opens itself to problems because perhaps an unexpected value is passed in. It also makes it harder to read, because you do not explicitly know what is being passed in.
Yet I feel like strongly typed is encouraged, but loosely coupling is bad.
EDIT: I feel either my interpretation of loose coupling is off or others are reading it the wrong way.
Strong coupling to me is when a class references a concrete instance of another class. Loose coupling is when a class references an interface that another class can implement.
My question then is why not specifically call a concrete instance/definition of a class? I analogize that to specifically defining the variable type you need.
I've been doing some reading on Dependency Injection, and they seem to make it out as fact that loose coupling better design.
First of all, you're comparing apples to oranges, so let me try to explain this from two perspectives. Typing refers to how operations on values/variables are performed and if they are allowed. Coupling, as opposed to cohesion, refers to the architecture of a piece (or several pieces) of software. The two aren't directly related at all.
Strong vs Weak Typing
A strongly typed language is (usually) a good thing because behavior is well defined. Take these two examples, from Wikipedia:
Weak typing:
a = 2
b = '2'
concatenate(a, b) # Returns '22'
add(a, b) # Returns 4
The above can be slightly confusing and not-so-well-defined because some languages may use the ASCII (maybe hex, maybe octal, etc) numerical values for addition or concatenation, so there's a lot of room open for mistakes. Also, it's hard to see if a is originally an integer or a string (this may be important, but the language doesn't really care).
Strongly typed:
a = 2
b = '2'
#concatenate(a, b) # Type Error
#add(a, b) # Type Error
concatenate(str(a), b) # Returns '22'
add(a, int(b)) # Returns 4
As you can see here, everything is more explicit, you know what variables are and also when you're changing the types of any variables.
Wikipedia says:
The advantage claimed of weak typing
is that it requires less effort on the
part of the programmer than, because
the compiler or interpreter implicitly
performs certain kinds of conversions.
However, one claimed disadvantage is
that weakly typed programming systems
catch fewer errors at compile time and
some of these might still remain after
testing has been completed. Two
commonly used languages that support
many kinds of implicit conversion are
C and C++, and it is sometimes claimed
that these are weakly typed languages.
However, others argue that these
languages place enough restrictions on
how operands of different types can be
mixed, that the two should be regarded
as strongly typed languages.
Strong vs weak typing both have their advantages and disadvantages and neither is good or bad. It's important to understand the differences and similarities.
Loose vs Tight Coupling
Straight from Wikipedia:
In computer science, coupling or
dependency is the degree to which each
program module relies on each one of
the other modules.
Coupling is usually contrasted with
cohesion. Low coupling often
correlates with high cohesion, and
vice versa. The software quality
metrics of coupling and cohesion were
invented by Larry Constantine, an
original developer of Structured
Design who was also an early proponent
of these concepts (see also SSADM).
Low coupling is often a sign of a
well-structured computer system and a
good design, and when combined with
high cohesion, supports the general
goals of high readability and
maintainability.
In short, low coupling is a sign of very tight, readable and maintainable code. High coupling is preferred when dealing with massive APIs or large projects where different parts interact to form a whole. Neither is good or bad. Some projects should be tightly coupled, i.e. an embedded operating system. Others should be loosely coupled, i.e. a website CMS.
Hopefully I've shed some light here :)
The question is right to point out that weak/dynamic typing is indeed a logical extension of the concept of loose coupling, and it is inconsistent for programmers to favor one but not the other.
Loose coupling has become something of a buzzword, with many programmers unnecessarily implementing interfaces and dependency injection patterns -- or, more often than not, their own garbled versions of these patterns -- based on the possibility of some amorphous future change in requirements. There is no hiding the fact that this introduces extra complexity and makes code less maintainable for future developers. The only benefit is if this anticipatory loose coupling happens to make a future change in requirements easier to implement, or promote code reuse. Often, however, requirements changes involve enough layers of the system, from UI down to storage, that the loose coupling doesn't improve the robustness of the design at all, and makes certain types of trivial changes more tedious.
You're right that loose coupling is almost universally considered "good" in programming. To understand why, let's look at one definition of tight coupling:
You say that A is tightly coupled to B if A must change just because B changed.
This is a scale that goes from "completely decoupled" (even if B disappeared, A would stay the same) to "loosely coupled" (certain changes to B might affect A, but most evolutionary changes wouldn't), to "very tightly coupled" (most changes to B would deeply affect A).
In OOP we use a lot of techniques to get less coupling - for example, encapsulation helps decouple client code from the internal details of a class. Also, if you depend on an interface then you don't generally have to worry as much about changes to concrete classes that implement the interface.
On a side note, you're right that typing and coupling are related. In particular, stronger and more static typing tend to increase coupling. For example, in dynamic languages you can sometimes substitute a string for an array, based on the notion that a string can be seen as an array of characters. In Java you can't, because arrays and strings are unrelated. This means that if B used to return an array and now returns a string, it's guaranteed to break its clients (just one simple contrived example, but you can come up with many more that are both more complex and more compelling). So, stronger typing and more static typing are both trade-offs. While stronger typing is generally considered good, favouring static versus dynamic typing is largely a matter of context and personal tastes: try setting up a debate between Python programmers and Java programmers if you want a good fight.
So finally we can go back to your original question: why is loose coupling generally considered good? Because of unforeseen changes. When you write the system, you cannot possibly know which directions it will eventually evolve in two months, or maybe two hours. This happens both because requirements change over time, and because you don't generally understand the system completely until after you've written it. If your entire system is very tightly coupled (a situation that's sometimes referred to as "the Big Ball of Mud"), then any change in every part of the system will eventually ripple through every other part of the system (the definition of "very tight coupling"). This makes for very inflexible systems that eventually crystallize into a rigid, unmaintanable blob. If you had 100% foresight the moment you start working on a system, then you wouldn't need to decouple.
On the other hand, as you observe, decoupling has a cost because it adds complexity. Simpler systems are easier to change, so the challenge for a programmer is striking a balance between simple and flexible. Tight coupling often (not always) makes a system simpler at the cost of making it more rigid. Most developers underestimate future needs for changes, so the common heuristic is to make the system less coupled than you're tempted to, as long as this doesn't make it overly complex.
Strongly typed is good because it prevents hard to find bugs by throwing compile-time errors rather than run-time errors.
Tightly coupled code is bad because when you think you "know what you need to achieve", you are often wrong, or you don't know everything you need to know yet.
i.e. you might later find out that something you've already done could be used in another part of your code. Then maybe you decide to tightly couple 2 different versions of the same code. Then later you have to make a slight change in a business rule and you have to alter 2 different sets of tightly coupled code, and maybe you will get them both correct, which at best will take you twice as long... or at worst you will introduce a bug in one, but not in the other, and it goes undetected for a while, and then you find yourself in a real pickle.
Or maybe your business is growing much faster than you expected, and you need to offload some database components to a load-balancing system, so now you have to re-engineer everything that is tightly coupled to the existing database system to use the new system.
In a nutshell, loose coupling makes for software that is much easier to scale, maintain, and adapt to ever-changing conditions and requirements.
EDIT: I feel either my interpretation
of loose coupling is off or others are
reading it the wrong way. Strong
coupling to me is when a class
references a concrete instance of
another class. Loose coupling is when
a class references an interface that
another class can implement.
My question then is why not
specifically call a concrete
instance/definition of a class? I
analogize that to specifically
defining the variable type you need.
I've been doing some reading on
Dependency Injection, and they seem to
make it out as fact that loose
coupling better design.
I'm not really sure what your confusion is here. Let's say for instance that you have an application that makes heavy use of a database. You have 100 different parts of your application that need to make database queries. Now, you could use MySQL++ in 100 different locations, or you can create a separate interface that calls MySQL++, and reference that interface in 100 different places.
Now your customer says that he wants to use SQL Server instead of MySQL.
Which scenario do you think is going to be easier to adapt? Rewriting the code in 100 different places, or rewriting the code in 1 place?
Okay... now you say that maybe rewriting it in 100 different places isn't THAT bad.
So... now your customer says that he needs to use MySQL in some locations, and SQL Server in other locations, and Oracle in yet other locations.
Now what do you do?
In a loosely coupled world, you can have 3 separate database components that all share the same interface with different implementations. In a tightly coupled world, you'd have 100 sets of switch statements strewn with 3 different levels of complexity.
If you know what you need to achieve, why not code specifically for that purpose.
Short answer: You almost never know exactly what you need to achieve. Requirements change, and if your code is loosely coupled in the first place, it will be less of a nightmare to adapt.
Yet I feel like strongly typed is encouraged, but loosely coupling is bad.
I don't think it is fair to say that strong typing is good or encouraged. Certainly lots of people prefer strongly typed languages because it comes with compile-time checking. But plenty of people would say that weak typing is good. It sounds like since you've heard "strong" is good, how can "loose" be good too. The merits of a language's typing system isn't even in the realm of a similar concept as class design.
Side note: don't confuse strong and static typing
strong typing will help reduce errors while typically aiding performance. the more information the code-generation tools can gather about acceptable value ranges for variables, the more these tools can do to generate fast code.
when combined with type inference and feature's like traits (perl6 and others) or type classes (haskell), strongly typed code can continue to be compact and elegant.
I think that tight/loose coupling (to me: Interface declaration and assignment of an object instance) is related to the Liskov Principle. Using loose coupling enables some of the advantages of the Liskov Principle.
However, as soon as instanceof, cast or copying operations are executed, the usage of loose coupling starts being questionable. Furthermore, for local variables withing a method or block, it is non-sense.
If any modification done in our function, which is in a derived class, will change the code in the base abstract class, then this shows the full dependency and it means this is tight coupled.
If we don't write or recompile the code again then it showes the less dependency, hence it is loose coupled.

Computing Efficiency Question

I'm wondering about computational efficiency. I'm going to use Java in this example, but it is a general computing question. Lets say I have a string and I want to get the value of the first letter of the string, as a string. So I can do
String firstletter = String.valueOf(somestring.toCharArray()[0]);
Or I could do:
char[] stringaschar = somestring.toCharArray();
char firstchar = stringaschar[0];
String firstletter = String.valueOf(firstchar);
My question is, are the two ways essentially the same, computationally? I mean, the second way I explicitly had to create 2 intermediate variables, to be stored in memory (the stack?) temporarily.
But the first way, too, the computer will have to still create the same variables, implicitly, right? And the number of operations doesn't change. My thinking is, the two ways are the same. But I'd like to know for sure.
In most cases the two ways should produce the same, or nearly the same, object code. Optimizing compilers usually detect that the intermediate variables in the second option are not necessary to get the correct result, and will collapse the call graph accordingly.
This all depends on how your Java interpreter decides to translate your code into an intermediary language for runtime execution. It may actually have optimizations which translate the two approaches to be the same exact byte code.
The two should be essentially the same. In both cases you make the same calls converting the string to an array, finding the first character, and getting the value of the character. There may be minor differences in how the compiler handles these, but they should be insignficant.
The earlier answers are coincident and right, AFAIK.
However, I think there are a few additional and general considerations you should be aware of each time you wonder about the efficiency of any computational asset (code, for example).
First, if everything is under your strict control you could in principle count clock cycles one by one from assembly code. Or from some more abstract reasoning find the computational cost of an operation/algorithm.
So far so good. But don't forget to measure afterwards. You may find that measuring execution times is not so easy and straightforward, and sometimes is elusive (How to account for interrupts, for I/O wait, for network bottlenecks ...). But it pays. You ask here for counsel, but YOUR Compiler/Interpreter/P-code generator/Whatever could be set with just THAT switch in the third layer of your config scripts.
The other consideration, more to your current point is the existence of Black Boxes. You are not alone in the world and a Black Box is any piece used to run your code, which is essentially out of your control. Compilers, Operating Systems, Networks, Storage Systems, and the World in general fall into this category.
What we do with Black Boxes (they are black, either because their code is not public or because we just happen to use our free time fishing instead of digging library source code) is establishing mental models to help us understand how they work. (BTW, This is an extraordinary book about how we humans forge our mental models). But you should always beware that they are models, not the real thing. Models help us to explain things ... to a certain extent. Classical Mechanics reigned until Relativity and Quantum Mechanics fluorished. None of them is wrong They have limits, and so have all our models.
Even if you happen to be friend with your router OS, or your Linux kernel, when confronting an efficiency problem, design a good experiment and measure.
HTH!
NB: By design a good experiment I mean beware of the tar pits. Examples: measuring your measurement code instead the target of the experiment, being influenced by external factors, forget external factors that will influence the production code, test with data whose cardinality, orthogonality, or whatever-ality is dissimilar with the "real world", mapping wrongly the production and testing Client/server workhorses, et c, et c, et c.
So go, and meassure your code. Your results will be the most interesting thing in this page.

Using Sets in Ruby

I'm building a simple Ruby on Rails plugin and I'm considering using the Set class. I don't see the Set class used all too often in other people's code.
Is there a reason why people choose to use (subclasses of) Array rather than a set? Would using a set introduce dependecy problems for some people?
Set is part of the standard library, so it shouldn't pose any dependency problems. If it's the cleanest way to solve the problem, go for it.
Regarding use (or lack thereof) I think there are probably two main reasons:
programmers not being aware of the library
programmers not realising when sets are the best way to a solution
programmers not knowing/remembering anythign about sets at all.
Make that three main reasons.
Ruby arrays are very flexible anyway, there are plenty of methods that allow to threat It like a set, a stack or a queue for example.
The way I've understood it from what I've read here is that Set's are likely mathematical sets, ie. order doesn't matter and isn't preserved.
In most programming applications, unless you're doing maths, it have limited use, because normally you want to preserve order.

Resources