Performance Implications of Using Spaces Instead of Tabs for Indentation - ruby

I currently use soft tabs (i.e., spaces) for indenting my Ruby code. If I were to use hard tabs, would it increase performance when the code is interpreted? I assume it's faster to read one tab character than to parse four space characters (however negligible).

Do you have an idea of all the phases involved in interpreting from source? Only the very first one, lexical analysis, has to deal with whitespace, and in the case of whitespace, "deal with" means "ignore it". This phase takes only a tiny fraction of the total time; it's generally done using regular expressions and has roughly linear complexity. Contrast that with parsing, which can take ages in comparison. And interpreting is only viable at all because those two phases (plus a third, bytecode generation, in implementations that use bytecode) take much less time than the actual execution for nontrivial programs.
Don't worry about this. There is no difference anyone would ever notice. Honestly, I'd be surprised if you could measure a difference using time and a small program that does close to no actual work.

Pretty sure that whatever negligible impact the parser may have between reading one byte for tabbed indentation vs. four bytes for spaces will be offset by the next person that has to read your code and fix your tabbed / spaced mess.
Please use spaces. Signed, the next guy to read your code.

Performance impact is ε, that is, a very small number greater than zero. The spaces only get read and parsed once; the Ruby code is then transformed into an intermediate form.
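To see how negligible this is, here is a minimal sketch of the kind of measurement suggested above, using Python's compiler as a stand-in for Ruby's (the principle, that indentation whitespace is handled once at compile time, is the same; any difference should vanish into run-to-run noise):

import timeit
# The same tiny program, indented with four spaces vs. one tab.
body = "x = 0\nfor i in range(10):\n{indent}x += i\n"
spaces_src = body.format(indent="    ")
tabs_src = body.format(indent="\t")
for name, src in [("spaces", spaces_src), ("tabs", tabs_src)]:
    t = timeit.timeit(lambda: compile(src, "<src>", "exec"), number=10_000)
    print(f"{name}: {t:.4f}s for 10,000 compiles")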

Related

Performance loss through obfuscation?

I've read that good obfuscation techniques do not merely do things like replacing method names with something obscure, but also, for instance, replace strings in the source code with byte arrays and add methods to convert those back to the original strings.
This might be one of those questions leading to opinion-based answers, but I'm going to ask it anyway: Is there any general notion of how much performance loss an application would suffer from such an obfuscation method? I have in mind software that leans heavily on a database, i.e., queries exist in the code, for instance, as C# strings or StringBuilder entities.
Yes, string obfuscation has a significant performance impact, at the micro-level. With obfuscation, instead of a direct memory lookup you have code that has to execute (every time), and it is usually somewhat complicated, so it is necessarily much worse at the micro-performance level.
However, that cost usually doesn't matter; the time required for the database call (or showing the UI dialog, or sending the error to a log, or network traffic, or ...) is going to be orders of magnitude higher than the cost of converting the string. In most cases, the cost of the conversion is essentially invisible.
As with everything, careful testing is wise, but usually the costs are only "visible" if you are accessing obfuscated strings in a tight loop that is already CPU-performance-sensitive.
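For concreteness, here is a hedged sketch of the byte-array style of obfuscation the question describes; the XOR scheme and all names are invented for illustration, not taken from any particular obfuscator:

# Build time: the literal is replaced by an obfuscated byte array.
KEY = 0x5A
def obfuscate(s):
    return bytes(b ^ KEY for b in s.encode("utf-8"))
def deobfuscate(blob):
    # This loop and allocation run every time the string is needed,
    # versus a direct reference to a literal: the micro-level cost.
    return bytes(b ^ KEY for b in blob).decode("utf-8")
QUERY = obfuscate("SELECT id, name FROM users WHERE active = 1")
print(deobfuscate(QUERY))

The decode takes on the order of a microsecond, while the database round trip it feeds takes milliseconds, which is the orders-of-magnitude gap described above.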

What exactly is the token count in functions/methods used for?

I've been using some tools to measure code quality and CCN (Cyclomatic Complexity Number), and some of those tools provide a count of tokens per function. What does that count say about my function or method? What is it used for?
Cyclomatic Complexity Number is a metric that indicates the complexity of a function, procedure, or program. The best (thorough and intuitive) explanation I have found is provided here.
I think that "tokens" refers to the conditional-statement tokens that are actually taken into account when computing the cyclomatic complexity.
[later edit]
A high CCN means complex code that:
is (much) harder to read and understand
is harder to maintain
makes unit tests harder to maintain, since decent code coverage is reached with more difficulty
might lead to more bugs
CCN can be reduced using various techniques. Some examples can be seen here or here.
In the context of CCN tools, a token is any distinct operator or operand.
How this is implemented depends on the tool. Since the page on Lizard doesn't go into details, you will have to examine the source code (it's not many lines):
https://github.com/terryyin/lizard/tree/master/lizard_languages
If you search the source for 'token', you will see how the tool is parsing the code. In most cases it is looking for code blocks, expressions, annotations and accessing of methods/objects.
For example, according to java.py, Java is only parsed for '{', '#', and '.'
Not sure why it isn't looking for expressions, though...
The OP has not declared which tool they're using, but for lizard this has been asked as an issue, so it might help someone.
A token is a word, an operator, and so on.
For example, if (abc % 3 != 0) has 8 tokens: ['if', '(', 'abc', '%', '3', '!=', '0', ')'].
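You can reproduce this kind of count with Python's standard-library tokenizer. A minimal sketch (dropping structural tokens such as NEWLINE and ENDMARKER is my assumption about how counting tools treat them):

import io
import tokenize
src = "if (abc % 3 != 0): pass"
# Keep only the token kinds a reader would count as "words and operators".
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(src).readline)
    if tok.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING)
]
print(tokens)       # ['if', '(', 'abc', '%', '3', '!=', '0', ')', ':', 'pass']
print(len(tokens))  # 10: the 8 from the example, plus ':' and 'pass'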
Another source has a similar description:
One program can have a maximum of 8192 tokens. Each token is a word (e.g. a variable name) or operator. Pairs of brackets, and strings, count as 1 token. Commas, periods, LOCALs, semi-colons, ENDs, and comments are not counted.
Now the next question is: does the number of tokens matter, the way CCN does? With the disclaimer that I am not an expert in code quality: it depends on the language. For example, in compiled languages you might want to break a complex line into multiple lines, which increases the number of tokens but significantly enhances the readability of the code. You should go for it; modern compilers are smart enough to optimize them.
However, this might be less true in interpreted languages. Again, you should look into the specific language you are using to see whether there is any optimization behind the scenes. That being said, some languages such as Python provide syntax specifically to reduce the number of tokens, which is fine to use as long as it is part of the language's design; see the sketch below.
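As a quick illustration of that point (a sketch; exact token counts vary by tool), both snippets below build the same list, but the comprehension does it with far fewer tokens:

# Loop version: more lines, more tokens.
squares = []
for i in range(10):
    if i % 2 == 0:
        squares.append(i * i)
# Comprehension: the syntax Python provides to say the same thing tersely.
squares_short = [i * i for i in range(10) if i % 2 == 0]
assert squares == squares_short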
TL;DR: This factor doesn't matter as much as code readability. Double-check your code if it is high but don't mess up the code to lower it.

Why does this program in ><> (Fish) run so slow after a while?

The program 1+:o in the esoteric programming language ><> (Fish) slows down over time, and I don't know why. It slows down most on the :, which duplicates the item at the top of the stack, and slows down somewhat on the o, which prints out the corresponding character for the top item in the stack. You can try it out here; just make sure to initialize the stack with a 0. It slows down faster on mobile devices (source: my phone), in case you want to check in less time.
I've never heard of this language before, but here are some possibilities:
Your output is increasing, so each iteration you have one more character to show. If your browser does not do some kind of smart refreshing, this will slow things down a lot for higher n.
I have no clue what you are doing, as the language is foreign to me, but it looks like you are also growing some array/stack/heap/whatever by one item each iteration, which takes memory, and on phones there is not much of it available, not to mention the reallocations...
The output looks like it is in Unicode, and once you hit the special characters, speed depends on the fonts installed; some characters are pretty slow to render. The usual policy is to search the installed fonts for the code page in use, which can take a while, and the rendering itself is also not very good for some characters.
If you have a full Unicode font (not just the chunks), it should speed things up considerably (especially with raster fonts). But there are not many of them out there, and I have never seen one that is complete (it would be huge), though it has been a while since I searched for those... One example of a (nearly) complete Unicode font is GNU_Unifont.
I've programmed in ><> before, so I'll give my view on it.
Personally, I didn't experience much slowdown of the program when running it on my computer. I ran it with the animation, so I could see what was going on with the stack and execution speed.
The stack operations seem to occur at the same rate the entire time. I ran it past 3300. The output appears to slow down, but that is because the character sets for foreign languages use characters that combine or interact with each other in some way. This is the primary cause of the visual slowdown, as the output has to be rewritten as certain characters are printed adjacent to each other. So really, it's the fact that the memory usage increases the longer you run it, and the fact that the output is atypical.
Also, the RTL (right-to-left) character was printed, so all output following this character is displayed from right to left, which the browser isn't as used to and may not be optimized for.
Another way I can tell that the output is slowing down the browser is that when I pause execution of the program, the page is still slow. I tried to zoom in/out, and it took multiple seconds to render the output in some cases.
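For reference, here is a minimal Python sketch of what 1+:o does, based on my reading of the ><> semantics (push 1, add, duplicate, pop-and-print). The stack stays one item deep; what grows is the stream of ever-higher code points the browser must render, which matches the rendering explanations above:

import sys
n = 0              # the stack's single value, seeded with the initial 0
while n < 0x2500:  # bounded here; the ><> original loops forever
    n += 1                    # '1' then '+': push 1, add
    sys.stdout.write(chr(n))  # ':' then 'o': duplicate, pop and print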

How many lines should a function have at most?

Is there a good coding technique that specifies how many lines a function should have?
No. Lines of code is a pretty bad metric for just about anything. The exception is perhaps functions that have thousands and thousands of lines - you can be pretty sure those aren't well written.
There are however, good coding techniques that usually result in fewer lines of code per function. Things like DRY (Don't Repeat Yourself) and the Unix-philosophy ("Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." from Wikipedia). In this case replace "programs" with "functions".
I don't think it matters; who is to say that once a function's length passes a certain number of lines it breaks a rule?
In general, just write clean functions that are easy to use and reuse.
A function should have a well-defined purpose. That is, try to create functions which do a single thing, either by doing the thing itself or by delegating work to a number of other functions.
Most functional compilers are excellent at inlining. Thus there is no inherent price to pay for breaking up your code: the compiler usually does a good job of deciding whether a function call should really be one or whether it can just inline the code right away.
The size of the function is less relevant though most functions in FP tend to be small, precise and to the point.
There is McCabe's Cyclomatic Complexity metric, which you can read about in this Wikipedia article.
The metric measures how many tests and loops are present in a routine. A rule of thumb might be that under 10 is a manageable amount of complexity while over 11 becomes more fault prone.
I have seen horrendous code that had a Complexity metric above 50. (It was error-prone and difficult to understand or change.) Re-writing it and breaking it down into subroutines reduced the complexity to 8.
Note that the Complexity metric is usually roughly proportional to the lines of code, but it gives you a measure of complexity rather than of raw length.
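As a hedged illustration of that kind of rewrite (the shipping example and its field names are invented, and the counts are approximate): each if/elif adds one to the McCabe number, so splitting a routine spreads the complexity across pieces that can be read and tested alone.

def shipping_cost(order):          # complexity ~5: four branch points plus entry
    if order["weight_kg"] > 20:
        cost = 50
    elif order["express"]:
        cost = 25
    elif order["country"] != "US":
        cost = 40
    else:
        cost = 10
    if order["member"]:
        cost *= 0.9
    return cost
# The same behavior broken down into subroutines:
def base_rate(order):              # complexity ~4
    if order["weight_kg"] > 20:
        return 50
    if order["express"]:
        return 25
    if order["country"] != "US":
        return 40
    return 10
def member_discount(order, cost):  # complexity ~2
    return cost * 0.9 if order["member"] else cost
def shipping_cost_v2(order):       # complexity 1
    return member_discount(order, base_rate(order))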
When working in Forth (or playing in Factor) I tend to continually refactor until each function is a single line! In fact, if you browse through the Factor libraries you'll see that the majority of words are one-liners and almost nothing is more than a few lines. In a language with inner functions and virtually zero cost for calls (that is, threaded code implicitly having no stack frames [only a return-pointer stack], or with aggressive inlining) there is no good reason not to refactor until each function is tiny.
From my experience a function with a lot of lines of code (more than a few pages) is a nightmare to maintain and test. But having said that I don't think there is a hard and fast rule for this.
I came across some VB.NET code at my previous company that had one function of 13 pages, but my record is some VB6 code I have just picked up that is approximately 40 pages! Imagine trying to work out which If statement an Else belongs to when they are pages apart on the screen.
The main argument against having functions that are "too long" is that subdividing the function into smaller functions that only do small parts of the entire job improves readability (by giving those small parts actual names, and helping the reader wrap his mind around smaller pieces of behavior, especially when line 1532 can change the value of a variable on line 45).
In a functional programming language, this point is moot:
You can subdivide a function into smaller functions that are defined within the larger function's body, which does not reduce the length of the original function.
Functions are expected to be pure, so there's no actual risk of line X changing the value read on line Y: the value of the line Y variable can be traced back up the definition list quite easily, even in loops, conditionals or recursive functions.
So, I suspect the answer would be "no one really cares".
I think a long function is a red flag and deserves more scrutiny. If I came across a function that was more than a page or two long during a code review I would look for ways to break it down into smaller functions.
There are exceptions though. A long function that consists of mostly simple assignment statements, say for initialization, is probably best left intact.
My (admittedly crude) guideline is a screenful of code. I have seen code with functions going on for pages. This is emetic, to be charitable. Functions should have a single, focused purpose. If you are trying to do something complex, have a "captain" function call helpers.
Good modularization makes friends and influences people.
IMHO, the goal should be to minimize the amount of code that a programmer would have to analyze simultaneously to make sense of a program. In general, excessively-long methods will make code harder to digest because programmers will have to look at much of their code at once.
On the other hand, subdividing methods into smaller pieces will only be helpful if those smaller pieces can be analyzed separately from the code which calls them. Splitting a method into sub-methods which would only be meaningful in the context where they are called is apt to impair rather than improve legibility. Even if before splitting the method would have been over 250 lines, breaking it into ten pieces which don't make sense in isolation would simply increase the simultaneous-analysis requirement from 250 lines to 300+ (depending upon how many lines are added for method headers, the code that calls them, etc.) When deciding whether a method should be subdivided, it's far more important to consider whether the pieces make sense in isolation, than to consider whether the method is "too long". Some 20-lines routine might benefit from being split into two ten-line routines and a two-line routine that calls them, but some 250-line routines might benefit from being left exactly as they are.
Another point which needs to be considered, btw, is that in some cases the required behavior of a program may not be a good fit with the control structures available in the language it's written in. Most applications have large "don't-care" aspects of their behavior, and it's generally possible to assign behavior that will fit nicely with a language's available control structures, but sometimes behavioral requirements may be impossible to meet without awkward code. In some such cases, confining the awkwardness to a single method which is bloated, but which is structured around the behavioral requirements, may be better than scattering it among many smaller methods which have no clear relationship to the overall behavior.

Is writing shorter code/algorithms more efficient (performance)?

After coming across the code golf trivia around the site, it is obvious people try to find ways to write code and algorithms as short as they possibly can in terms of characters, lines and total size, even if that means writing something like:
//Code by: job
//Topic: Code Golf - Collatz Conjecture
n=input()
while n>1:n=(n/2,n*3+1)[n%2];print n
So as a beginner I start to wonder whether size actually matters :D
It is obviously a very subjective question, highly dependent on the actual code being used, but what is the rule of thumb in the real world?
In the case that size doesn't matter, how come we don't focus more on performance rather than size?
I hope this does not become a flame war. Good code has many attributes, including:
Solving the use-case properly.
Readability.
Maintainability.
Performance.
Testability.
Low memory signature.
Good user interface.
Reusability.
The brevity of code is not that important in 21st century programming. It used to be more important when memory was really scarce. Please see this question, including my answer, for books referencing the attributes above.
A lot of good answers already about what's important versus what's not. In real life, (almost) nobody writes code like code golf, with shortened identifiers, minimal whitespace, and the fewest possible statements.
That said, "more code" does correlate with more bugs and complexity, and "less code" tends to correlate with better readability and performance. So, all other things being equal, it's useful to strive for shorter code, but only in the sense of "these simple 30 lines of code do the same as those 100 complex lines of code".
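For contrast, here is the Collatz loop from the question rewritten for readability rather than character count (a Python 3 sketch; the golfed original is Python 2):

def collatz_steps(n):
    # Return the Collatz sequence starting at n, ending at 1.
    sequence = [n]
    while n > 1:
        if n % 2 == 0:
            n = n // 2     # even: halve
        else:
            n = 3 * n + 1  # odd: triple and add one
        sequence.append(n)
    return sequence
print(collatz_steps(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]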
Writing "code golf" solutions are often to do with showing how "clever" you are in getting the job done in the most succinct way even at the expense of readability. Quite often, however, more verbose code including, for example, memoization of function results, can be faster. Code size can matter for performance, smaller blocks of code can fit in the L1 CPU cache but this is an extreme case of optimization and a faster algorithm will most always be better. "Code Golf" code is not like production code - always write for clarity & readability of the solution rather than terseness if anyone, including yourself, ever intend to read that code again.
Whitespace has no effect on performance, so code like that is just silly (or perhaps the golf score was based on the character count?). The number of lines also has no effect, although the number of statements can. (Exception: Python, where whitespace is significant.)
The effect is complex, however. It's not at all uncommon to discover that you have to add statements to a function in order to improve its performance.
Still, without knowing anything else, bet that more statements means a larger object file and a slower program. But more lines doesn't do anything other than make code more readable, up to a point (after which adding more lines makes it less readable ;)
I don't believe that code golf has any practical significance. In practice, readable code is what counts, which in itself is a conflicting requirement: readable code should be concise, yet still easy to understand.
However, I would like to answer your question yet differently. Usually, there are fast and simple algorithms. However, if the speed is top priority, things can get complex real fast (and the resulting code will be longer). I don't believe that simplicity equals speed.
There are many aspects to performance. Performance can, for example, be measured by memory footprint, speed of execution, bandwidth consumption, framerate, maintainability, supportability, and so on. Performance usually means spending as little as possible of the most scarce resource.
When applied to networking, brevity IS performance. If your webserver serves a little javascript snippet on every page, it doesn't exactly hurt to keep the variable names short. Pull up www.google.com and view source!
Sometimes DRY does not help performance. An example: Microsoft has found that they don't want to loop through an array unless it is bigger than 3 elements, so String.Format has signatures for one, two, and three arguments, and then one for an array.
There are many ways of trading one aspect for another. This is usually called caching.
You can, for example, trade memory footprint for speed of execution by doing a lookup instead of a computation; in most popular languages it is just a matter of replacing () with []. If you plan your game so that the spaceship can only go in a fixed number of directions, you can save on trigonometric function calls, as in the sketch below.
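Here is a minimal sketch of that trade (the 16-direction ship and all names are invented for illustration): the trigonometry is computed once up front, and every later use is an index into a list, literally replacing () with []:

import math
DIRECTIONS = 16
SIN = [math.sin(2 * math.pi * i / DIRECTIONS) for i in range(DIRECTIONS)]
COS = [math.cos(2 * math.pi * i / DIRECTIONS) for i in range(DIRECTIONS)]
def velocity(direction_index, speed):
    # A list index ([]) where a trig call (()) would otherwise be.
    return (speed * COS[direction_index], speed * SIN[direction_index])
print(velocity(4, 10.0))  # facing "up": (~0.0, 10.0)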
Or you can use a proxy server with a cache for looking up things over a network. DNS servers do this all the time.
Finally, if development team availability is the most scarce resource, clarity of code is the best bet for maintainability performance, even if it doesn't run quite as fast or is quite as interesting or "elegant" in code.
Absolutely not. Code size and performance (however you measure it) are only very loosely connected. To make matters worse, what's a neat trick on one chip/compiler/OS may very well be the worst thing you can do on another architecture.
It's counter-intuitive, but a clear, well-written, as-simple-as-possible implementation is often far more efficient than a devious bag of tricks. Today's optimizing compilers like clear, uncomplicated code just as much as humans do, and complex trickery can cause them to abandon their best optimizing strategies.
Writing fewer lines of code tends to be better for a bunch of reasons. For example, the less code you have, the less chance for bugs. See, for example, Paul Graham's essay "Succinctness is Power".
Notwithstanding that, the level reached by Code Golf is usually far beyond what makes sense. In Code Golf, people are trying to write code that is as short as possible, even if they know that it's less readable.
Efficiency is a much harder thing to decide. I'm guessing that less code is usually more efficient, but there are many cases where this isn't true.
So to answer the real question, why do we even have Code Golf competitions which aim at a low character count, if that's not a very important thing?
Two reasons:
Making code as short as possible means you have to be both clever, and know a language pretty well to find all kinds of tricks. This makes it a fun riddle.
Also, it's the easiest measure to use for a code competition. Efficiency, for example, is very hard to measure, especially across many different languages, and some solutions are more efficient in some cases but less in others (big input vs. small). Readability is a very personal thing, which often leads to heated debates.
In short, I don't think there is any way of doing Code Golf style competitions without using "shortness of code" as the criterion.
This is from "10 Commandments for Java Developers"
Keep in Mind - "Less is more" is not always better. - Code efficiency is a great thing, but > in many situations writing less lines of code does not improve the efficiency of that code.
This is (probably) true for all programming languages (though in assembly it could be different).
It makes a difference if you're talking about little academic-style algorithms or real software, which can be thousands of lines of code. I'm talking about the latter.
Here's an example where a reasonably well-written program was sped up by a factor of 43x, and its code size was reduced by 4x.
"Code golf" is just squeezing code, like cramming undergraduates into a phone booth. I'm talking about reducing code by rewriting it in a form that is declarative, like a domain-specific language (DSL). Since it is declarative, it maps more directly onto its requirements, so it is not puffed up with code that exists only for implementation's sake. That link shows an example of doing that.
This link shows a way of reducing size of UI code in a similar way.
Good performance is achieved by avoiding doing things that don't really have to be done. Of course, when you write code, you're not intentionally making it do unnecessary work, but if you do aggressive performance tuning as in that example, you'd be amazed at what you can remove.
The point of code golf is to optimise for one thing (source length), at the potential expense of everything else (performance, comprehensibility, robustness). If you accidentally improve performance that's a fluke - if you could shave a character off by doubling the runtime, then you would.
You ask "how come then we don't focus more on performance rather than size", but the question is based on a false premise that programmers focus more on code size than on performance. They don't, "code golf" is a minority interest. It's challenging and fun, but it's not important. Look at the number of questions tagged "code-golf" against the number tagged "performance".
As other people point out, making code shorter often means making it simpler to understand, by removing duplication and opportunities for obscure errors. That's usually more important than running speed. But code golf is a completely different thing, where you remove whitespace, comments, descriptive names, etc. The purpose isn't to make the code more comprehensible.
