Is there a warning in gcc for octal literals? - gcc

ie. so that if you use any octal literals it give you a warning.
Same question for Microsoft's compiler.
If not are there any other tools to detect octal literals.
(vim seems to have a cool trick where it highlights the first leading
zero a different color, but I'm thinking more of an automated tool).

I don't believe gcc has such a warning. I just ran info gcc (for gcc 4.5.2) and searched for "octal". There were only two occurrences, neither of them useful.
I don't know about Microsoft's compiler.
You could search your source files for a regular expression that matches octal constants. If you have grep, something like this should do the trick (warning: I haven't tested this):
grep '\<0[0-7][0-7]*' foo.c
This matches a 0 followed by one or more digits in the range 0..7, at the beginning of a word. It deliberately does not match 0, which is an octal constant but presumably not one you're worried about. It's likely to give you some false positives, for example in string literals and comments. It will also match a character constant like '\007', which is octal but not as error-prone as an octal integer constant.

(Question is tagged gcc, but text asks about Microsoft's compiler as well.)
I don't think MSVC++ has a warning for general octal literals, but it will warn about suspicious octal escape sequences in string literals. For example "foo \669", will trigger "C4125 decimal digit terminates octal escape sequence" if you have the warning level turned all the way up.

Related

What is the meaning of special character sequences like `\027[0K`?

I found this commit from facebook infer, and I have no idea what \027[0K and \027[%iA means.
What does these special string mean? And (I think) if there are more strings like this, where can I find the full documentation about this?
Those are escape sequences to tell your terminal what to do.
For example, the sequence of characters represented by \027[0K (where \027 is ASCII decimal value for Esc character) tells the terminal to "clear line from cursor to the end."
One helpful document/guide on this subject can be found at https://shiroyasha.svbtle.com/escape-sequences-a-quick-guide-1
The facebook code is copied from another source here, which uses hard-coded formatters imitating termcap (this page gives some background). The original has comments indicating where its information came from.
The formatter uses "%i" for integers. That's a repeat-count for the cursor movement "cursor-up" \033[A
In most languages, \033 (octal) is used for the ASCII escape character. But this source (according to the github analysis) is written in OCaml, and is using the decimal value for the ASCII escape character. According to the OCaml syntax, you could use an octal value like this: \o033
Once you see that the formatting parts (how the escape character is represented, the use of %i to format a number), the rest of this is documented in several places.
The relevant standard is ECMA-48
the termcap (or analogous terminfo) information is in the terminal database.

Semantic meaning of '36_864_7_345ms' as a time literal

Reading the spec for verilog, it appears that
36_864_7_345ms
Is a valid time literal: http://www.ece.uah.edu/~gaede/cpe526/SystemVerilog_3.1a.pdf (see section 2)
Note: decimal_digit is defined as [0-9] in the full IEEE spec.
What is the semantic meaning (if any) of this time literal? Or am I misreading the spec?
Edit:
Looking elsewhere in the spec (section 3.7.9), it appears that the underscore characters are silently discarded. Does the underscore act as an arbitrary seperating character in a similar way as numbers in English (ex. 43,251) have commas to visually separate the numbers? Or is there another meaning altogether?
The spec you quoted from is long since obsolete. Please get the latest from the IEEE where it says in section 5.7.1 Integer literal constants:
The underscore character (_) shall be legal anywhere in a number
except as the first character. The underscore character is ignored.
This feature can be used to break up long numbers for readability
purposes.

ANSI escape characters in gprolog

Trying to print bold and underlined text in prolog but can't write them
write('\033[1mbold\033[0m')
Makes this (expected) error:
syntax error: \ expected in \constant\ sequence
What's the correct way to do it with gprolog ? Maybe with format ?
write('\33\[1mbold\33\[0m').
That is, octal escape sequences (and hexadecimal which start with \x) need to be closed with a \ too. En revanche, a leading zero is not required, but possible. This is in no way specific to GNU, in fact, probably all systems close to ISO Prolog have it.

Are VHDL character substitutions ever used in real life?

VHDL allows the following substitutions, presumably because some computers might not support the vertical bar (or pipe symbol) (|) or the hash (or pound sign / number sign) (#):
case A|B can be written as case A!B
16#fff# can be written as 16:fff:
Any computer nowadays supports the vertical bar and the hash symbol, so I figured nobody uses these substitutions... Until somebody requested support for the exclamation mark.
My question: is this a lone case or are other people also using the exclamation mark as substitute for the vertical bar? Anybody using the colon?
Data point 1: Not me :)
And I've never seen it as far as I recall in any code - nor was I taught it at any point (in fact, this is the first I knew of those substitutions).
I had a quick look in Ashenden's Designer's Guide to VHDL, and the ! alternative is not even mentioned when the | is introduced for case statements.
These are inherited from Ada (in which they are obsolescent since Ada95). The Ada83 Rationale says "For portability reasons, it is possible to write any program in a 56 character subset of the ISO character set." in which ISO character set must be understood as ISO-646, aka ASCII (well ISO-646 has provision for replacing some characters for national variant, ASCII can be understood as the US national variant of ISO-646)
There is a third replacement: % can be use instead of " as string delimitor (both must be replaced).
I seem to remember that EBCDIC is using | or ! for the same code point depending on the variant.

LANG and sed on OSX

In a recent question it was noted that on OSX running sed on a non ascii file gave strange results. For instance if you do (/usr/bin/cal is a random binary file)
sed 's/[^A-Z]//' /usr/bin/cal
sed will remove all of the printable characters other than A-Z, but many nonprintable characters remain. If however, you do
LANG='' sed 's/[^A-Z]//' /usr/bin/cal
only A-Z (and newlines) are output. Why?
Normally LANG=en-US.UTF-8 What is going on? I cannot see anyway that the output of sed could be considered correct in UTF-8. Is it broken, or is there some notion of working that I do not understand?
I know that the OSX sed is conforming to POSIX, and is therefore different from the beloved GNU sed.
Binary data, such as the contents of /usr/bin/cal, are not UTF-8, and so will confuse any code that reads it as if it was. In particular, any byte with the high bit set (e.g., >= 128) will be interpreted as part of a multi-byte sequence representing a single character, and will thus be elided from the output. Not all sequences of bytes with the high-bit set are valid UTF-8, so things will get quite confused, but this probably explains why some non-printable characters remain but (possibly) not others.
In short: if you want to use text-oriented tools on binary data, don't.

Resources