Hidden Whitespace Best Practice - coding-style

I think most people agree that trailing whitespace is not good practice. A lot of editors will display it for you or automatically strip it out.
Consider this Python function as a simple example:
The extra whitespace on lines 11 and 13 is wrong. What I'm wondering about is line 10. Should a blank line inside a control block that doesn't change indentation have leading whitespace?
Most editors I've used will keep the cursor at the indentation level from the preceding line, so making a blank line without leading whitespace takes some extra formatting. What's the best practice? Should line 10 have leading whitespace or not?

When it comes to code execution, it makes absolutely zero difference. The practice I have seen most often in Python is the one with whitespace, but I don't think anyone can reasonably say one is objectively better than the other.

I'll try to answer your question sticking to your Python example, quoting their style guide.
From PEP-8:
Method definitions inside a class are separated by a single blank line.
From Wikipedia (blank line):
A blank line usually refers to a line containing zero characters (not counting any end-of-line characters); though it may also refer to any line that does not contain any visible characters (consisting only of whitespace).
If you believe the Wikipedia definition, you might ask why zero characters is preferred.
For one, it's simpler: even if autoindent is turned on in your editor, you're just filling your file with extra bytes for no good reason.
Second, a regex for a zero-character blank line is simpler as well: usually '^$' versus something like '^\s*$'.
As other answers have pointed out, it makes no difference execution-wise to put in whitespace. With no good reason to do so, I would say the best practice is to leave it out and keep it simple. Can you imagine a situation where a zero-character line would be treated differently from a line with some whitespace? I would hate to program in that language. Putting in whitespace seems baroque to me.
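To make the regex point concrete, here is a minimal Python sketch comparing the two patterns. The sample lines are made up for illustration and are not the function from the question.

```python
import re

# The two patterns from the answer above.
STRICT_BLANK = re.compile(r"^$")     # matches only a line with zero characters
LOOSE_BLANK = re.compile(r"^\s*$")   # also matches whitespace-only lines

# Hypothetical sample: index 2 is truly empty, index 3 holds four spaces.
lines = ["def f():", "    x = 1", "", "    ", "    return x"]

strict = [i for i, line in enumerate(lines) if STRICT_BLANK.match(line)]
loose = [i for i, line in enumerate(lines) if LOOSE_BLANK.match(line)]

print(strict)  # indices found by the strict pattern
print(loose)   # indices found by the loose pattern
```

If every blank line really has zero characters, the strict pattern finds them all; once whitespace-only lines creep in, only the loose pattern does.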

As @PinkElephantsOnParade writes, it makes no difference for the execution. Thus, it's solely a matter of personal aesthetic preference.
I set my editor to display trailing whitespace, since I think trailing whitespace is a bad idea. Thus, your line 10 would be highlighted and stare me in the face all the time.
Nobody wants that, so I'd argue line 10 should not contain whitespace. (Which, coincidentally, is how my editor, Emacs, handles this automatically.)

To each his own, but one way to look at it is: code style is about human readability. Therefore, trailing whitespace is only an issue if it extends the length of the line past some preexisting (self-imposed) limit (e.g. an 80-character limit).
On the other hand, if you consistently display whitespace in your editor, and this matters to you, I personally would keep it there, as it would be (for some, at least) more efficient to have the whitespace present: if you decide to add code at that line at some point, you won't have to add additional whitespace.


Escape sequences \033[01;36m\] vs. \033[1;36m\] in PS1 in .bashrc: why the zero?

I've just compared the $PS1 prompts in .bashrc on two of my Debian machines:
PS1='${debian_chroot:+($debian_chroot)}\[\033[01;36m\]\u\[\033[0;90m\]#\[\033[0;32m\]\h\[\033[0;90m\]:\[\033[01;34m\]\w\[\033[0;90m\]\$\[\033[0m\] '
PS1='${debian_chroot:+($debian_chroot)}\[\033[1;36m\]\u\[\033[0;37m\]#\[\033[0;32m\]\h\[\033[0;37m\]:\[\033[01;34m\]\w\[\033[0;37m\]\$\[\033[0m\] '
As you see, the first sequence says \033[01;, whereas the second has \033[1; in the same position. Do both mean the same (I guess, bold) or do they mean something different? Any idea why the zero has appeared or disappeared? I have no recollection of having introduced or removed this zero myself. A Web search returns numerous occurrences both with and without the zero.
"ANSI" numeric parameters are all decimal integers (see ECMA-48, section 5.4.1 Parameter Representation). In section 5.4.2, it explains
A parameter string consists of one or more parameter sub-strings, each of which represents a number in decimal notation.
A leading zero makes no difference. Someone noticed the unnecessary character and trimmed it.
The ESC[#;#m escape sets the console font color. I've seen many subtle variations on escape implementations, so I'm not surprised. Regardless, I think both should be interpreted the same way.
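As a rough illustration of why the leading zero is harmless, here is a small Python sketch that reads a parameter string as decimal integers. The function name is hypothetical, and treating an empty sub-string as 0 is an assumption of this sketch, not something stated in the answers above.

```python
def parse_sgr_params(param_string):
    """Split a parameter string such as '01;36' into decimal integers."""
    # Assumption of this sketch: an empty sub-string is read as 0.
    return [int(part) if part else 0 for part in param_string.split(";")]


with_zero = parse_sgr_params("01;36")
without_zero = parse_sgr_params("1;36")
print(with_zero == without_zero)
```

Both strings yield the same list of numbers, so a terminal that reads the parameters as decimal values has no way to tell 01 from 1.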

In a trie for inserting and searching normal paths, are ascii 1-31 worth considering?

I am working on a trie data structure which inserts and searches for normal paths.
A path can contain any Unicode character, so to represent it completely in UTF-8, the array in each trie node needs a next-node slot for all 256 byte values.
But I am also concerned about the space and insertion time taken by the trie.
Under the conditions my trie is set up for, a non-ASCII character (I mean bytes 128-255) would rarely be inserted. So I just added an if condition to reject paths that contain anything above ASCII 127. I don't think ASCII 1-31 are relevant either, although I am unsure about this. As characters 1-31 are things like carriage return, ESC etc., can I simply continue the loop without inserting them? That is, is it possible in a real scenario to encounter paths that differ only because of ASCII 1-31?
Answering this old question: on macOS, ASCII 13 (carriage return) is used to represent custom icons, which may appear in many paths. Thanks to @EricPostpischil, who pointed that out in the comments.
All other characters in the range 1-31 rarely appear in paths.
Also, macOS users mostly have a case-insensitive file system, so distinguishing lowercase from uppercase is generally unnecessary as well.
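A minimal Python sketch of the approach discussed here: each node holds a 128-slot array, paths containing bytes above 127 are rejected, and control characters such as ASCII 13 are kept, not skipped. The class and method names are hypothetical, and the example paths are invented.

```python
class TrieNode:
    __slots__ = ("children", "is_end")

    def __init__(self):
        self.children = [None] * 128  # one slot per 7-bit ASCII code
        self.is_end = False


class PathTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, path):
        """Insert a path; return False if it contains a byte above 127."""
        data = path.encode("utf-8")
        if any(b > 127 for b in data):
            return False
        node = self.root
        for b in data:
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.is_end = True
        return True

    def contains(self, path):
        node = self.root
        for b in path.encode("utf-8"):
            if b > 127 or node.children[b] is None:
                return False
            node = node.children[b]
        return node.is_end


trie = PathTrie()
trie.insert("/Users/me/Icon\r")  # name ending in ASCII 13 (carriage return)
print(trie.contains("/Users/me/Icon\r"), trie.contains("/Users/me/Icon"))
```

Because the carriage return is stored like any other byte, the two paths stay distinct. Skipping characters 1-31 during insertion would merge them.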
PS:
Although this question seems opinion-based, it actually isn't, because it can be answered quite concisely. It asks about the frequency with which characters appear in paths on macOS. (Sorry for the confusing title; I was a beginner at the time, and changing it now would make all the comments on it look absurd.)

Count Number of Sentence Ruby

I searched everywhere and did not manage to find a way to count the number of sentences in a String using Ruby. Does anyone know how to do it?
Example
string = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
This string should return number 4.
You can split the text into sentences and count them. Here:
string.scan(/[^\.!?]+[\.!?]/).map(&:strip).count # scan has regex to split string and strip will remove trailing spaces.
# => 4
Explaining the regex:
[^\.!?]
A caret inside a character class [^ ] is the negation operator, which means we are looking for characters that are not in the list: ., ! and ?.
+
is a greedy quantifier that matches between one and unlimited times (capturing our sentences here and ignoring repetitions like ...).
[\.!?]
matches the characters ., ! or ?.
In a nutshell, we capture all characters that are not ., ! or ? until we reach a character that is ., ! or ?. Each such match can basically be treated as a sentence (in a broad sense).
I think it makes sense to consider a word character followed by ?, ! or . as the delimiter of a sentence:
string.strip.split(/\w[?!.]/).length
#=> 4
So I'm not considering the ... a delimiter when it hangs on its own like that:
"I waited a while ... and then I went home"
But then again, maybe I should...
It also occurs to me that maybe a better delimiter is a punctuation mark followed by some space and a capital letter:
string.split(/[?!.]\s+[A-Z]/).length
#=> 4
Sentences end with full stops, question marks, and exclamation marks. They can also be separated with dashes and other punctuation, but we won’t worry about these rare cases here.
The split is simple. Instead of asking Ruby to split the text on one type of character, you simply ask it to split on any of three types of characters, like so:
txt = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "
sentence_count = txt.split(/\.|\?|!/).length
puts sentence_count
#=> 7
string.squeeze('.!?').count('.!?')
#=> 4

Atom escaping rules in Prolog

I need to export to a file a Prolog program expressed using an arbitrary term representation in Java. The idea is that a Prolog interpreter should be able to consult the generated file afterwards.
My question is about the correct way to write in the file Java Strings representing atom terms.
For example, if the string has a space in the middle, it should be surrounded by single quotes in the file:
hello world becomes 'hello world'
And the exporter should take into consideration characters that should be escaped:
' becomes '\''
Could someone point me to the place where these rules are specified? And can I assume that these rules are respected by the major Prolog implementations? (I mean, would a Prolog program generated following these rules be parsed correctly by most Prolog interpreters?)
The precise place for this is the standard, ISO/IEC 13211-1:1995, quoted_token (* 6.4.2 *). See this answer for how to get it for USD 30.
The precise syntax is quite complex due to a lot of extras like continuation lines and the like. If you are only writing atoms that should be read by Prolog, things are a bit easier. Also in that situation, you could always quote, which makes writing again a bit simpler.
Some things to be aware of:
Only simple spaces may occur as layout in a quoted atom. All other layout characters need to be escaped, like \t or \n (the escape letters are a, b, r, f, t, n, v). Many systems also accept other layout, but they differ from each other in very tiny details.
Backslash and quote must be escaped.
Characters outside the printable ASCII range depend on the processor character set (PCS) supported by a system. In a conforming system, the accompanying documentation should define how the additional characters (extended characters) are classified. Documentation quality varies widely.
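The points above can be sketched as a small writer that always quotes. This is an illustration in Python, not the exporter from the question; the function name is hypothetical. Characters outside printable ASCII, and control characters other than the seven listed, are passed through unchanged here, which is an assumption you would need to check against your target system.

```python
# Escape sequences for the layout characters named above (a, b, r, f, t, n, v).
_CONTROL_ESCAPES = {
    "\a": "\\a", "\b": "\\b", "\r": "\\r", "\f": "\\f",
    "\t": "\\t", "\n": "\\n", "\v": "\\v",
}


def quote_atom(text):
    """Return text as an always-quoted Prolog atom."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")   # backslash must be escaped
        elif ch == "'":
            out.append("\\'")    # quote must be escaped
        elif ch in _CONTROL_ESCAPES:
            out.append(_CONTROL_ESCAPES[ch])
        else:
            out.append(ch)       # assumption: everything else is written as-is
    return "'" + "".join(out) + "'"


print(quote_atom("hello world"))
print(quote_atom("it's"))
```

Always quoting avoids having to decide whether a given atom needs quotes at all, at the cost of slightly noisier output.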
In any case, test your interface also with GNU-Prolog from 1.4.1 upwards. To date, no differences are known between GNU 1.4.1+ and the standard as far as syntax is concerned.
Here are some 240+ syntax related test cases. Please report any oversight!
A practical hint: if you issue a writeq with your Prolog, with data you need to know about, you'll get quotes around when required.

Why are text editors slow when editing very long lines?

Most text editors are slow when lines are very long. The suggested data structure for text storage in an editor seems to be the rope, which should be immune to the cost of modifying long lines. Yet editors are slow even when simply navigating within long lines.
Example:
A single character like 0 repeated 100000 times in PSPad, or 1000000 times in Vim, on a single line slows cursor movement when you are at the end of the line. If the file contains as many bytes but spread over multiple lines, the cursor is not slowed down at all, so I suppose it's not a memory issue.
What's the origin of this issue, which is so common?
I'm mostly using Windows, so maybe this is something related to Windows font handling?
You're probably using a variable-length encoding like UTF-8. The editor wants to keep track of what column you're in with every cursor movement, and with a variable-length encoding there is no shortcut to scanning every byte to see how many characters there are; with a long line that's a lot of scanning.
I suspect that you will not see such a slowdown with long lines using a single-byte encoding like ISO 8859-1 (Latin-1). If you use a single-byte encoding, then character length = byte length and the column can be calculated quickly with simple pointer arithmetic. A fixed-length multibyte encoding like UCS-2 should be able to use the same shortcut (just dividing by the constant character size), but the editors might not be smart enough to take advantage of that.
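A minimal Python sketch of the difference described above, assuming the editor knows the cursor's byte offset and wants the character column. The function names are hypothetical, and real editors may cache or index this differently.

```python
def column_utf8(line, byte_offset):
    """Count the characters before byte_offset by scanning every byte."""
    # UTF-8 continuation bytes look like 0b10xxxxxx; any other byte starts a character.
    return sum(1 for b in line[:byte_offset] if (b & 0xC0) != 0x80)


def column_fixed_width(byte_offset, bytes_per_char=1):
    """With a fixed-width encoding the column is a single division."""
    return byte_offset // bytes_per_char


line = "h\u00e9llo".encode("utf-8")   # the second character takes two bytes
print(column_utf8(line, 3))           # linear scan over the first 3 bytes
print(column_fixed_width(3))          # constant time for a single-byte encoding
```

The first function does work proportional to the cursor's distance from the start of the line, which is why the cost shows up at the end of a very long line.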
As evil otto suggested, the encoding can force the line to be re-parsed, and for long lines this causes all sorts of performance issues.
But it is not only encoding that causes the line to be re-parsed.
Tab characters also require a full line scan, since you need to parse the whole line in order to calculate the true cursor location.
Certain syntax highlighting definitions (e.g. block comments, quoted strings etc.) also require a full line parse.
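The tab case can be sketched in Python as follows. The function name is hypothetical and the default tab width of 8 is an assumption; the point is only that the on-screen column depends on every character before the cursor.

```python
def visual_column(line, index, tab_width=8):
    """Return the on-screen column of the character at position index."""
    col = 0
    for ch in line[:index]:
        if ch == "\t":
            # A tab advances to the next multiple of tab_width.
            col += tab_width - (col % tab_width)
        else:
            col += 1
    return col


print(visual_column("a\tb", 2))
print(visual_column("abc", 3))
```

Since the width of each tab depends on the column it starts at, the loop cannot jump ahead without having processed everything to its left.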
You mentioned vim, so I'll assume that's the editor you use. Vim does not use a rope, as described here and here. It uses an array of lines, so your assumption that ropes should be immune to such long lines does not matter because ropes are not used.
