Escape sequences \033[01;36m\] vs. \033[1;36m\] in PS1 in .bashrc: why the zero? - bash

I've just compared the $PS1 prompts in .bashrc on two of my Debian machines:
PS1='${debian_chroot:+($debian_chroot)}\[\033[01;36m\]\u\[\033[0;90m\]#\[\033[0;32m\]\h\[\033[0;90m\]:\[\033[01;34m\]\w\[\033[0;90m\]\$\[\033[0m\] '
PS1='${debian_chroot:+($debian_chroot)}\[\033[1;36m\]\u\[\033[0;37m\]#\[\033[0;32m\]\h\[\033[0;37m\]:\[\033[01;34m\]\w\[\033[0;37m\]\$\[\033[0m\] '
As you can see, the first sequence has \033[01; where the second has \033[1; in the same position. Do both mean the same thing (bold, I guess), or something different? Any idea why the zero appeared or disappeared? I have no recollection of introducing or removing it myself. A Web search returns numerous occurrences both with and without the zero.

"ANSI" numeric parameters are all decimal integers (see ECMA-48, section 5.4.1 Parameter Representation). In section 5.4.2, it explains
A parameter string consists of one or more parameter sub-strings, each of which represents a number in decimal notation.
A leading zero makes no difference. Someone noticed the unnecessary character and trimmed it.
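To make the point concrete, here is a minimal sketch of how a terminal might read the numeric parameters of such a sequence. The function name parse_sgr is hypothetical, and treating an empty sub-string as 0 is an assumption made for this illustration; the only point is that decimal parsing ignores a leading zero.

```python
def parse_sgr(sequence):
    """Return the decimal parameters of a sequence like ESC [ 01 ; 36 m."""
    prefix, final = "\033[", "m"
    if not (sequence.startswith(prefix) and sequence.endswith(final)):
        raise ValueError("not an ESC[...m sequence")
    body = sequence[len(prefix):-len(final)]
    # Each sub-string is a decimal number, so "01" and "1" are the same value.
    # An empty sub-string is treated as 0 here (an assumption of this sketch).
    return [int(part) if part else 0 for part in body.split(";")]


with_zero = parse_sgr("\033[01;36m")
without_zero = parse_sgr("\033[1;36m")
print(with_zero, without_zero, with_zero == without_zero)
```

Both calls yield the same parameter list, which is why the two prompts render identically.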

The ESC[#;#m escape sets console text attributes such as color. I've seen many subtle variations in how escapes are implemented, so I'm not surprised. Regardless, I think both should be interpreted the same way.

Related

Is there some kind of sorting convention?

Is there an established convention for sorting lines (characters)? One that would play a role similar to the one PCRE plays for regular expressions.
For example, if you sort 0A1b-a2_B (each character on its own line) with Sublime Text (Ctrl-F9) and Vim (:%sort), the result is the same (see below). However, I'm not sure it will be the same with other editors and IDEs.
-
0
1
2
A
B
_
a
b
Generally, characters are sorted based on their numeric value. While this used to apply only to ASCII characters, Unicode encodings have adopted it as well. http://www.asciitable.com/
If no preference is given to the contrary, this is the de facto standard for sorting characters. Save for the actual alphabetical characters, the ordering is somewhat arbitrary.
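As a quick check of the numeric-value ordering, the sketch below sorts the characters from the question by code point, which is what Python's built-in sorted does for strings. The variable names are only for illustration.

```python
chars = list("0A1b-a2_B")

# sorted() compares strings by code point, i.e. by numeric character value.
by_code_point = sorted(chars)

for ch in by_code_point:
    print(ch, ord(ch))
```

The printed order matches the listing in the question: hyphen, digits, uppercase letters, underscore, lowercase letters.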
There are two main ways of sorting character strings:
Lexicographic: numeric value of either the codepoint values or the code unit values or the serialized code unit values (bytes). For some character encodings, they would all be the same. The algorithm is very simple but this method is not human-friendly.
Culture/Locale-specific: an ordinal database for each supported culture is used. For the Unicode character set, it's called the CLDR. Also, in applying sorting for Unicode, sorting can respect grapheme clusters. A grapheme cluster is a base codepoint followed by a sequence of zero or more non-spacing (applied as extensions of the previous glyph) marks.
For some older character sets with one encoding, designed for only one or two scripts, the two methods might amount to the same thing.
Sometimes, people read a format into strings, such as a sequence of letters followed by a sequence of digits, or one of several date formats. These are very specialized sorts that should be applied wherever users expect them. Note: ISO 8601 dates (which use the Gregorian calendar) sort correctly regardless of method (for all? character encodings).
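The note about ISO 8601 can be illustrated with a small sketch: sorting the date strings as plain text gives the same order as sorting them as dates, while a day-first format does not. The sample dates are made up for the example.

```python
from datetime import date, datetime

iso = ["2021-03-09", "1999-12-31", "2021-11-02", "2004-02-29"]

# Plain string sort versus chronological sort of the same ISO 8601 strings.
iso_as_text = sorted(iso)
iso_as_dates = sorted(iso, key=date.fromisoformat)
print(iso_as_text == iso_as_dates)

# A day-first format (dd/mm/yyyy) does not have this property.
dmy = ["09/03/2021", "31/12/1999", "02/11/2021"]
dmy_as_text = sorted(dmy)
dmy_as_dates = sorted(dmy, key=lambda s: datetime.strptime(s, "%d/%m/%Y"))
print(dmy_as_text == dmy_as_dates)
```

The first comparison holds because the ISO format puts the most significant field first and pads every field to a fixed width.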

In a trie for inserting and searching normal paths, are ascii 1-31 worth considering?

I am working on a trie data structure which inserts and searches for normal paths.
A path can contain any Unicode character, so to represent it completely in UTF-8, each trie node's array needs a next-node slot for all 256 byte values.
But I am also concerned about the space and insertion time taken by the trie.
In my setup, the trie would rarely receive a non-ASCII character (I mean bytes 128-255), so I just added an if condition that rejects paths containing anything above ASCII 127. I don't think ASCII 1-31 are relevant either, although I am unsure about this. Since characters 1-31 are things like carriage return and ESC, can I simply continue the loop without inserting them? In other words, is it possible in a real scenario to encounter paths that differ only because of ASCII 1-31?
Answering this old question: on macOS, ASCII 13 (carriage return) is part of the name of the Icon file that stores a custom folder icon, so it may appear in many paths. Thanks to @EricPostpischil, who pointed that out in the comments.
All other characters in the range 1-31 appear rarely in paths.
Also, macOS users mostly have a case-insensitive file system, so distinguishing lowercase from uppercase is generally unnecessary as well.
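A minimal sketch of why skipping control characters is risky: with byte 13 dropped, a path ending in a carriage return becomes indistinguishable from the same path without it. The class names, the skip_control flag, and the example path are hypothetical, and the sketch uses a dict per node instead of the fixed 256-slot array from the question, purely to keep it short.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # byte value -> TrieNode
        self.terminal = False   # True if a stored path ends at this node


class PathTrie:
    def __init__(self, skip_control=False):
        self.root = TrieNode()
        self.skip_control = skip_control

    def _key(self, path):
        data = path.encode("utf-8")
        if any(b > 127 for b in data):
            raise ValueError("non-ASCII path rejected")
        if self.skip_control:
            # Drop bytes 0-31, as the question proposes for 1-31.
            data = bytes(b for b in data if b > 31)
        return data

    def insert(self, path):
        node = self.root
        for b in self._key(path):
            node = node.children.setdefault(b, TrieNode())
        node.terminal = True

    def contains(self, path):
        node = self.root
        for b in self._key(path):
            node = node.children.get(b)
            if node is None:
                return False
        return node.terminal


icon_path = "/Users/me/Docs/Icon\r"   # hypothetical path ending in byte 13

strict = PathTrie()
strict.insert(icon_path)
print(strict.contains("/Users/me/Docs/Icon"))   # the two paths stay distinct

lossy = PathTrie(skip_control=True)
lossy.insert(icon_path)
print(lossy.contains("/Users/me/Docs/Icon"))    # the two paths collide
```

Whether such a collision matters depends on what the trie is used for, but it does show that bytes 1-31 can be the only difference between two real paths.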
PS:
Although this question seems opinion-based, it actually isn't, because it can be answered quite concisely. It asks about the frequency with which characters appear in paths on macOS. (Sorry for the confusing title; I was a noob at the time, and changing it now would make all the comments on it absurd.)

Simple to enter Unicode character that would sort after Z in most cases?

As you probably know most symbols are sorted before the alphabetical letters.
I am looking for one character that is easy to enter from the keyboard that would be sorted after "z" by most sort implementations.
If it is also an ASCII character, all the better :)
Any ideas?
On a Mac, these are the only characters I can type using a US keyboard (with and without shift and option modifiers) that sort below Z and z:
Ω (option+z)
π (option+p)
µ (option+m)
 (shift+option+k)
It seems like omega and then pi are the best options for cross-platform compatibility.
On Windows, none of these options work because they all sort before A.
A solution I ended up using is an Arabic character:
ٴ This folder comes after z in windows
Source
A Tilde '~' is ASCII code 126.
This comes after all the standard English usage characters and would therefore out-sort a 'Z' of any case.
It would not out-sort other special characters, however. ASCII or Unicode sequencing is not sufficient to cover international sorts in any context.
Example: internationalisation in JavaScript
Xi "Ξ" works nicely!
On Mac: Ctrl+Cmd+Space, then type "xi".
The answers provided here that worked for me:
Ξ Greek capital letter XI (per tonystar's answer);
π Greek small letter PI (per DaveC's answer);
Ω Greek capital letter OMEGA (per DaveC's answer);
µ international symbol for micrometre, previously and AKA micron (per DaveC's answer);
ٴ Arabic letter ٴ (unidentified) (per degenerate's answer);
ﻩ Arabic letter HEH isolated form (per Bytee's answer);
Notes:
using macOS 10.14.2.
a tilde ~ always displays before numbers in an ascending sort.
In macOS Numbers (spreadsheet) app, sort (ascending) displays as follows:
0
9
a
z
µ
Ξ
π
Ω
ٴ
ﻩ
Perhaps worth mentioning that the last two Arabic letters ٴ (unknown) and ﻩ (HEH) are difficult to edit (not as expected) in Numbers.
In macOS Finder, sort (ascending) displays as follows:
ٴ (appears as a narrow 'blank' space at the beginning of the file name)
0
9
a
z
µ
Ξ
π
Ω
ﻩ (appears at the end of the file name in display, at the beginning during edit)
Late to the party, but I was tearing my hair out to find a character that sorted last that wouldn't tweak my OCD either. Finally found this Arabic character "ﻩ" sorts after z. Putting one on either side of the folder name like so...
ﻩ Odds & Ends ﻩ
...looks rather pretty to me, so maybe it'll work for you all too!
If you want it to be invisible, you can use a no-break space (character code 160, which is outside ASCII):
Windows: ALT+0160 (only works with numpad)
I'm trying to do this with my Amazon Wishlists. None of the suggestions here have worked (I have tried Ω, Ξ, and ~).
I ended up using zzz_
It looks like a sort with LC_ALL=C sorts by ascii value, so {|}~ and DEL will come after z.
% echo $'a\nA\n1\n#\n~' | LC_ALL=C sort
#
1
A
a
~
It appears this is the default when LC_ALL is not set on macOS/BSD sort, but it must be set explicitly for GNU sort.
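The claim about which ASCII characters follow z can be checked with a short sketch. Python's sorted compares strings by code point, which for ASCII-only input corresponds to the byte order that LC_ALL=C uses; the sample names are made up.

```python
# Printable ASCII characters (codes 32-126) that come after "z" (code 122).
after_z = [chr(code) for code in range(32, 127) if chr(code) > "z"]
print(after_z)

# DEL (code 127) also follows "z", but it is a control character.
print(ord("z"), ord("~"), 127 > ord("z"))

# A byte-order sort therefore places a "~" prefix last.
names = ["zebra", "~last", "Zulu", "alpha"]
print(sorted(names))
```

This applies only to byte-order or code-point sorts; locale-aware sorts may order punctuation differently.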

Atom escaping rules in Prolog

I need to export to a file a Prolog program expressed using an arbitrary term representation in Java. The idea is that a Prolog interpreter should be able to consult the generated file afterwards.
My question is about the correct way to write in the file Java Strings representing atom terms.
For example, if the string has a space in the middle, it should be surrounded by single quotes in the file:
hello world becomes 'hello world'
And the exporter should take into consideration characters that should be escaped:
' becomes '\''
Could someone point me to the place where these rules are specified? And can I assume that these rules are respected by major Prolog implementors? (I mean, would a Prolog program generated following these rules be parsed correctly by most Prolog interpreters?)
The precise place for this is the standard, ISO/IEC 13211-1:1995, quoted_token (* 6.4.2 *). See this answer how to get it for USD 30.
The precise syntax is quite complex due to a lot of extras like continuation lines and the like. If you are only writing atoms that should be read by Prolog, things are a bit easier. Also in that situation, you could always quote, which makes writing again a bit simpler.
Some things to be aware of:
Only simple spaces may occur as layout in a quoted atom. All other layout characters need to be escaped, like \t and \n (the escape letters are a, b, r, f, t, n, v). Many systems also accept other layout characters, but they differ from each other in very tiny details.
Backslash and quote must be escaped.
Characters outside the printable ASCII range depend on the PCS supported by a system. In a conforming system, the accompanying documentation should define how the additional characters (extended characters) are classified. Documentation quality varies on a wide range.
In any case, test your interface also with GNU-Prolog from 1.4.1 upwards. To date, no differences are known between GNU 1.4.1+ and the standard as far as syntax is concerned.
Here are some 240+ syntax related test cases. Please report any oversight!
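The advice above (always quote, escape backslash and quote, escape layout other than a plain space) can be sketched as follows. This is written in Python for brevity rather than Java, the names ESCAPES and quote_atom are hypothetical, and characters outside printable ASCII are passed through unchanged, which is an assumption that may not suit every Prolog system.

```python
# Escapes for a single-quoted atom: backslash, quote, and the
# layout/control characters with the letter escapes a b r f t n v.
ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    "\a": "\\a",
    "\b": "\\b",
    "\r": "\\r",
    "\f": "\\f",
    "\t": "\\t",
    "\n": "\\n",
    "\v": "\\v",
}


def quote_atom(text):
    """Return text written as an always-quoted Prolog atom."""
    # Anything not in ESCAPES is copied as-is (an assumption of this sketch).
    return "'" + "".join(ESCAPES.get(ch, ch) for ch in text) + "'"


print(quote_atom("hello world"))
print(quote_atom("'"))
print(quote_atom("tab\there"))
```

Always quoting produces more quotes than strictly necessary for plain atoms such as foo, but it avoids having to decide when an atom can be written bare.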
A practical hint: if you issue a writeq with your Prolog, with data you need to know about, you'll get quotes around when required.

Hidden Whitespace Best Practice

I think most people agree that trailing whitespace is not good practice. A lot of editors will display it for you or automatically strip it out.
Consider this Python function as a simple example:
The extra whitespace on lines 11 and 13 is wrong. What I'm wondering about is line 10. Should a blank line inside a control block that doesn't change indentation have leading whitespace?
Most editors I've used will keep the cursor at the indentation level from the preceding line, so making a blank line without leading whitespace takes some extra formatting. What's the best practice? Should line 10 have leading whitespace or not?
When it comes to code execution, it makes absolutely zero difference. The practice I have seen most in Python is the one with whitespace, but I don't think anyone can reasonably say one is objectively better than the other.
I'll try to answer your question sticking to your Python example, quoting their style guide.
From PEP-8:
Method definitions inside a class are separated by a single blank line.
From Wikipedia (blank line):
A blank line usually refers to a line containing zero characters (not counting any end-of-line characters); though it may also refer to any line that does not contain any visible characters (consisting only of whitespace).
If you believe the Wikipedia definition, you might ask why zero characters is preferred.
For one, it's simpler: even if autoindent is turned on in your editor, you're just filling your file with extra bytes for no good reason.
Second, a regex for a zero character blank line is simpler as well, '^$' usually vs something like '^\s*$'.
As other answers have pointed out, it makes no difference execution wise to put in whitespace. With no good reason to do so, I would say the best practice is to leave it out and keep it simple. Can you imagine a situation where a zero character line would be treated differently than a line with some whitespace? I would hate to program in that language. Putting in whitespace seems baroque to me.
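The difference between the two patterns mentioned above can be seen in a small sketch. The sample lines are made up; each is a line with its end-of-line character already removed.

```python
import re

lines = ["def f():", "    x = 1", "", "    ", "    return x"]

empty = re.compile(r"^$")      # zero characters only
blank = re.compile(r"^\s*$")   # zero or more whitespace characters

empty_hits = [i for i, line in enumerate(lines) if empty.match(line)]
blank_hits = [i for i, line in enumerate(lines) if blank.match(line)]

print(empty_hits)   # indexes of truly empty lines
print(blank_hits)   # indexes of empty or whitespace-only lines
```

If blank lines never carry whitespace, the simpler pattern is enough to find all of them.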
As @PinkElephantsOnParade writes, it makes no difference for execution. Thus, it's solely a matter of personal aesthetic preference.
I myself set my editor to display trailing whitespace, since I think trailing whitespace is a bad idea. Thus, your line 10 would be highlighted and stare me in the face all the time.
Nobody wants that, so I'd argue line 10 should not contain whitespace. (Which, coincidentally is how my editor, emacs, handles this automatically.)
To each his own, but one way to look at it is this: code style is about human readability. Therefore, trailing whitespace is only an issue if it extends the line past some preexisting (self-imposed) limit (e.g., an 80-character limit).
On the other hand, if you consistently display white-space in your editor, and this matters to you, I personally would keep it there, as it would be (for some, at least) more efficient to have the white-space present; if you decide to add code at that line at some point, you won't have to add additional white-space.

Resources