Shell # inside (( ))

I am new to shell scripting and don't quite understand the following function, which basically increases the hour by 1.
I am wondering why the developer put "10#" in front of $g_current_hour+1. From my understanding, doesn't # in shell start a comment?
f_increment_hour() {
  g_next_hour=$((10#$g_current_hour+1))
}

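Everything depends on the context. Here 10# means base 10. If g_current_hour is zero-padded (for example 08 or 09, as produced by date +%H), bash would otherwise read the value as octal and reject it, because 8 and 9 are not octal digits; 10# forces the value to be read as decimal. From the bash manual: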
Constants with a leading 0 are interpreted as octal numbers. A
leading 0x or 0X denotes hexadecimal. Otherwise, numbers take the
form [base#]n, where the optional base is a decimal number between
2 and 64 representing the arithmetic base, and n is a number in that
base. If base# is omitted, then base 10 is used.
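The same leading-zero pitfall is easy to reproduce outside bash. The sketch below is a rough Ruby analogue, not bash itself: Ruby's Integer() without a base also treats a leading 0 as octal, and passing base 10 explicitly plays the role of the 10# prefix. The helper shell_style_number is hypothetical and only handles bases 2 to 36, whereas bash accepts up to 64.

```ruby
# Rough Ruby analogue of the bash arithmetic rules quoted above (this is not bash).
# Integer(str) with no base honours prefixes: a leading 0 means octal, 0x means hex.

def next_hour(current_hour)
  # Passing base 10 explicitly plays the role of bash's 10# prefix,
  # so zero-padded hours such as "08" and "09" are read as decimal.
  Integer(current_hour, 10) + 1
end

# Hypothetical helper: parse "[base#]n" for bases 2..36 only
# (bash itself accepts bases up to 64, using extra digit characters).
def shell_style_number(text)
  if text.include?("#")
    base, digits = text.split("#", 2)
    Integer(digits, Integer(base, 10))
  else
    Integer(text)
  end
end

p next_hour("08")               # 9
p shell_style_number("017")     # 15 (octal)
p shell_style_number("0x1f")    # 31 (hex)
p shell_style_number("16#ff")   # 255
p shell_style_number("10#08")   # 8

begin
  Integer("08")                 # like $((08 + 1)) in bash, this is rejected
rescue ArgumentError => e
  puts "error: #{e.message}"
end
```

As in the shell function, the hour is not wrapped back to 0 after 23; that would need a separate modulo step.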

A '#' is interpreted as part of a token unless it is preceded by a space, newline, or semicolon (or any other character that cannot be part of a word).
Section 2.3, "Token Recognition", of the POSIX shell language specification states:
7. If the current character is an unquoted <newline>, the current
token shall be delimited.
8. If the current character is an unquoted <blank>, any token
containing the previous character is delimited and the current
character shall be discarded.
9. If the previous character was part of a word, the current character
shall be appended to that word.
10. If the current character is a '#' , it and all subsequent characters
up to, but excluding, the next <newline> shall be discarded as
a comment. The <newline> that ends the line is not considered
part of the comment.
When the shell is parsing its input and reads "foo#bar", as it is processing the '#' character it applies rule 9 and appends the # to the token. Once rule 9 is applied, it stops checking and rule 10 is never considered. If the character preceding the '#' is whitespace, then rule 9 does not apply, so rule 10 is checked and a comment is started.
In other words, a '#' only starts a comment if the character preceding it is not part of a word (e.g. whitespace or a semicolon), so "foo#bar" is one token, not "foo" followed by a comment, whereas "foo #bar" is the token "foo" followed by a comment.
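To make the rule ordering concrete, here is a toy tokenizer written in Ruby that applies rules 7 to 10 in order, plus rule 11 (start a new word). It is a deliberately simplified sketch, not a real shell parser: it assumes no quoting, no expansions, and no operators, so a ';' is treated as an ordinary word character here, unlike in a real shell.

```ruby
# Toy model of POSIX token recognition rules 7-10, plus rule 11 (start a new word).
# Assumptions: no quoting, no operators such as ';', no expansions; blanks are space and tab.
def toy_tokens(line)
  tokens = []
  word = nil          # the word being built, or nil when between tokens
  i = 0
  while i < line.length
    c = line[i]
    if c == "\n"                      # rule 7: a newline delimits the token
      tokens << word if word
      word = nil
    elsif c == " " || c == "\t"       # rule 8: a blank delimits and is discarded
      tokens << word if word
      word = nil
    elsif word                        # rule 9: previous char was part of a word
      word << c                       # so even '#' is simply appended
    elsif c == "#"                    # rule 10: only reached when no word is open
      i += 1 while i < line.length && line[i] != "\n"
      next                            # the newline itself is not part of the comment
    else                              # rule 11: start a new word
      word = c.dup
    end
    i += 1
  end
  tokens << word if word
  tokens
end

p toy_tokens("echo foo#bar")   # ["echo", "foo#bar"]
p toy_tokens("echo foo #bar")  # ["echo", "foo"]
```

Because rule 9 is checked before rule 10, a '#' glued to a word never reaches the comment branch.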

Related

What exactly does KERN_INFO expand to, and where is it defined?

In the question "Why doesn't the function printk() use a comma to separate parameters?", someone said KERN_INFO expands to "\001" "6". I know the first \0 is the null character, but then what is 01? I suppose it is one in octal. When the preprocessor concatenates them into "\0016", the rest after the null is 016, which is 14 in decimal. So I looked it up in ASCII and found 0E, SO (shift out)? That doesn't make sense to me; it should have something to do with logging (as that is the purpose of printk). So what is the meaning of the KERN_INFO macro's character sequence after expansion?
Also, I tried looking in the source, in /usr/include/linux/kernel.h, but didn't find the macro there. Is it in kernel.h or somewhere else?
"\001" "6" is two string literals that will be concatenated (with any other adjacent string literals) into a single string literal. (The concatenation is done at translation phase 6 as defined in the C standard.)
The first of those string literals, "\001" contains a single octal escape sequence, defining a single character. An octal escape sequence in a string literal or a character constant consists of the backslash (\) followed by from 1 to 3 octal digits (001 in this case). In this case, the single character has numeric code 1, which corresponds to the ASCII SOH (start of heading) character.
The string literal "\0016" contains sequences for two characters '\001' and '6', because an octal escape sequence is always terminated after at most 3 octal digits.
Escape sequences do not cross the boundary between adjacent string literals. (Escape sequences are expanded at translation phase 3, so are already expanded before adjacent string literals are concatenated at translation phase 6). Therefore, the pair of string literals "\1" "6" is equivalent (after concatenation) to the single string literal "\0016", not "\16".
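Ruby happens to follow the same two rules as C here: adjacent string literals are joined, and an octal escape takes at most three digits. That makes it a convenient way to check the reasoning above interactively; this is an illustration in Ruby, not C.

```ruby
# Adjacent string literals are joined, and an octal escape takes at most 3 digits.
kern_info = "\001" "6"

p kern_info == "\0016"   # true
p kern_info.bytes        # [1, 54]  (SOH, then the character '6')
p(("\1" "6").bytes)      # [1, 54]  the escape does not cross the literal boundary
p "\16".bytes            # [14]     a single SO character
p "\0016".length         # 2
```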
As mentioned by Peter L., the KERN_INFO macro and the other "kernel level" macros are defined in "include/linux/kern_levels.h" in the Linux kernel source. That has been the case since kernel version 3.6. Before 3.6, they were defined in "include/linux/printk.h" and used a different string format, with the kernel level number between angle brackets (for example, KERN_INFO used to be defined as "<6>").
The purpose of these kernel level macros is to prefix the format string parameter of the printk function with special codes to designate the log-level to use for the message written to the kernel log (apart from KERN_CONT which specifies that the message is to be appended to the previous message).
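To show how such a two-byte prefix can be told apart from the message text, here is a minimal Ruby sketch. The constant names are modelled on the kernel macros, but split_level is a hypothetical helper, not the kernel's implementation; it assumes levels 0 to 7 only and ignores KERN_CONT.

```ruby
KERN_SOH  = "\001"
KERN_INFO = KERN_SOH + "6"   # same bytes as the "\001" "6" expansion discussed above

# Hypothetical helper (not kernel code): split a SOH + level-digit prefix
# off a message, the way a log reader could recover the level.
def split_level(message)
  if message.start_with?(KERN_SOH) && message[1]&.match?(/\A[0-7]\z/)
    [message[1].to_i, message[2..]]
  else
    [nil, message]   # no level prefix
  end
end

p split_level(KERN_INFO + "hello\n")   # [6, "hello\n"]
p split_level("no prefix")             # [nil, "no prefix"]
```

Using a control character such as SOH as the marker means an ordinary message is very unlikely to be misread as carrying a level, which is presumably why it replaced the older "<6>" text form.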

Regex matching plus or minus

Could someone please look at the following function and explain the regex for me? I don't understand it, and I don't like using something I don't understand, since then I can't replicate it in the future and I don't learn from it.
Also, can someone explain the double !! in front? I know a single ! means not, so does a double mean "not not"?
The function is a patch to String that checks whether the string can be converted to an integer.
class String
  def is_i?
    !!(self =~ /\A[-+]?[0-9]+\z/)
  end
end
The main thing giving me trouble is [-+], which makes little sense to me. If you could explain it in the given context, that would be very helpful.
EDIT:
Since people missed the second part of the question I'll be a little more explicit.
What does !! mean in front of the check? I know a single ! means NOT, but I can't find what !! means.
The [-+] Character Class
[-+] is a character class. It means "match one character specified by the class", i.e. - or +.
Hyphens in Character Classes
I can see how this particular class can be confusing, because the hyphen often plays a special role in a character class: it links two characters to form a character range. For instance, [a-z] means "match one character between a and z", and [a-z0-9] means "match one character between a and z or between 0 and 9".
However, in [-+] the hyphen is the first character of the class, a position where it cannot specify a range, so the - is just a literal hyphen.
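A quick way to convince yourself is to test the class directly in Ruby. The last two lines contrast [-+] with [+-9], where the hyphen sits between two characters and therefore does build a range (assuming ASCII ordering, where '+' is 0x2B and '9' is 0x39).

```ruby
sign = /\A[-+]\z/             # '-' first in the class, so it is a literal hyphen

p "-".match?(sign)            # true
p "+".match?(sign)            # true
p "5".match?(sign)            # false

# Between two characters, '-' builds a range instead:
# [+-9] is every character from '+' (0x2B) through '9' (0x39).
p ",".match?(/\A[+-9]\z/)     # true: ',' (0x2C) falls inside that range
p ",".match?(/\A[+\-9]\z/)    # false: an escaped hyphen is literal again
```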
Decoding the entire expression
Assert position at the beginning of the string \A
Match a single character from the list “-+” [-+]?
Between zero and one times, as many times as possible, giving back as needed (greedy) ?
Match a single character in the range between “0” and “9” [0-9]+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Assert position at the very end of the string \z
A Character Class defines a set of characters, any one of which can occur in a string for a match to succeed.
For example, the regular expression [-+]?[0-9]+ will match 123, -123, or +123, because its first element is the character class [-+] made optional by ?, so the first character can be -, +, or neither.
In context:
\A asserts position at the start of the string.
[-+] any character of: - or + (? optional, meaning between zero and one time)
[0-9] any character of: 0 to 9 (+ quantifier meaning 1 or more times)
\z asserts position at the very end of the string.
What does !! mean?
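!! placed together converts the value to a boolean: the first ! turns any value into true or false and negates it, and the second ! negates it back. Here self =~ /.../ returns the index where the match starts (always 0, because of \A) or nil when there is no match, so !! turns that into true or false. Note that 0 is truthy in Ruby.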
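The following Ruby snippet shows the intermediate values: what =~ returns, what !! does to it, and the resulting is_i? answers for a few sample strings.

```ruby
class String
  # Same check as in the question: optional sign, then one or more digits, nothing else.
  def is_i?
    !!(self =~ /\A[-+]?[0-9]+\z/)
  end
end

p("123" =~ /\A[-+]?[0-9]+\z/)   # 0   (index where the match starts)
p("12a" =~ /\A[-+]?[0-9]+\z/)   # nil (no match)
p(!!0)                          # true  (0 is truthy in Ruby)
p(!!nil)                        # false
p "-42".is_i?                   # true
p "+7".is_i?                    # true
p "4-2".is_i?                   # false
```

On Ruby 2.4 and later, /\A[-+]?[0-9]+\z/.match?(self) returns true or false directly, so the !! is not needed; match? also skips setting $~.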
explain the regex for me as I don't understand it
Pattern explanation: \A[-+]?[0-9]+\z
\A Start of string
[-+]? plus or minus sign [zero or one time (optional)]
[0-9]+ 0 to 9 any digit [one or more times]
\z End of string
The pattern above matches any integer, positive or negative, with an optional + or - sign.
Read more about Character Classes and test your regex pattern online at Rubular

Syntax Highlighting in Notepad++: how to highlight timestamps in log files

I am using Notepad++ to check logs. I want to define custom syntax highlighting for timestamps and log levels. Highlighting log levels works fine (defined as keywords). However, I am still struggling to highlight timestamps of the form
06 Mar 2014 08:40:30,193
Any idea how to do that?
If you just want simple highlighting, you can use Notepad++'s regex search mode. Open the Find dialog, switch to the Mark tab, and make sure Regular Expression is set as the search mode. Assuming the timestamp is at the start of the line, this Regex should work for you:
^\d{2}\s[A-Za-z]+\s\d{4}\s\d{2}:\d{2}:\d{2},[\d]+
Breaking it down bit by bit:
^ means the following Regex should be anchored to the start of the line. If your timestamp appears anywhere but the start of a line, delete this.
\d means match any digit (0-9). {n} is a quantifier that means to match the preceding bit of Regex exactly n times, so \d{2} means match exactly two digits.
\s means match any whitespace character.
[A-Za-z] means match any character in the range A-Z or the range a-z, and the + is a quantifier that means match the preceding bit of Regex 1 or more times. So we're looking for a sequence of one or more alphabetic characters.
\s means match any whitespace character.
\d{4} is just like \d{2} earlier, only now we're matching exactly 4 digits.
\s means match any whitespace character.
\d{2} means match exactly two digits.
: matches a colon.
\d{2} matches exactly two digits.
: matches another colon.
\d{2} matches another two digits.
, matches a comma.
[\d]+ works similarly to the alphabetic search sequence we set up earlier, only this one's for digits. This finds one or more digits.
When you run this Regex on your document, the Mark feature will highlight anything that matches it. Unlike the temporary highlighting the "Find All in Document" search type can give you, Mark highlighting lasts even after you click somewhere else in the document.
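If you want to check the pattern outside Notepad++, Ruby's regex syntax accepts it unchanged, and in Ruby ^ also means start of line. The sketch below runs it over three made-up sample log lines (illustrative input, not real log output).

```ruby
TIMESTAMP = /^\d{2}\s[A-Za-z]+\s\d{4}\s\d{2}:\d{2}:\d{2},[\d]+/

# Made-up sample lines in the format from the question.
log = <<~LOG
  06 Mar 2014 08:40:30,193 INFO  service started
  06 Mar 2014 08:40:31,007 ERROR connection refused
  continuation line without a timestamp
LOG

p log.scan(TIMESTAMP)
# ["06 Mar 2014 08:40:30,193", "06 Mar 2014 08:40:31,007"]
```

[\d]+ is equivalent to \d+; the brackets are harmless.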

Understanding negative look aheads in regular expressions

I want to match URLs that do NOT contain the string 'localhost', using a Ruby regex.
Based on answers and comments here, I put together two solutions, both of which seem to work:
Solution A:
(?!.*localhost)^.*$
Example: http://rubular.com/r/tQtbWacl3g
Solution B:
^((?!localhost).)*$
Example: http://rubular.com/r/2KKnQZUMwf
The problem is that I don't understand what they're doing. For example, according to the docs, ^ can be used in various ways:
[^abc] Any single character except: a, b, or c
^ Start of line
But I don't get how it's being applied here.
Can someone breakdown these expressions for me, and how they differ from one another?
In both of your cases, ^ is just the start of the line (since it's not used inside a character class). Since both ^ and the lookahead are zero-width assertions, we can switch them around in the first case - I think that makes it a bit easier to explain:
^(?!.*localhost).*$
The ^ anchors the expression to the beginning of the string. The lookahead then starts from that position and tries to find localhost anywhere in the string (the "anywhere" is taken care of by the .* in front of localhost). If localhost can be found, the subexpression of the lookahead matches, and therefore the negative lookahead causes the pattern to fail. Since the adjacent ^ pins the lookahead to the beginning of the string, the pattern as a whole cannot match. If, however, .*localhost does not match (and hence localhost does not occur in the string), the lookahead succeeds, and the .*$ simply takes care of matching the rest of the string.
Now the other one
^((?!localhost).)*$
This time the lookahead only checks at the current position (there is no .* inside it), but the lookahead is repeated for every single character, so every position still gets checked. Here is roughly what happens: the ^ makes sure that we're starting at the beginning of the string again. The lookahead checks whether the word localhost starts at that position. If not, all is well, and . consumes one character. The * then repeats both of those steps. We are now one character further into the string, and the lookahead checks whether the second character starts the word localhost; again, if not, all is well, and . consumes another character. This is done for every single character in the string, until we reach the end.
In this particular case both methods are equivalent, and you could select one based on performance (if it matters) or readability (if not; probably the first one). However, in other cases the second variant is preferable, because it allows you to do this repetition for a fixed part of the string, whereas the first variant will always check the entire string.
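A small Ruby check of the equivalence claim: the two patterns from the question, plus the version with ^ moved to the front, agree on a few sample URLs (the URLs and constant names are made up for illustration). Note the third URL: it is rejected because localhost appears anywhere in the string, not just in the host.

```ruby
SOLUTION_A  = /(?!.*localhost)^.*$/
SOLUTION_A2 = /^(?!.*localhost).*$/   # ^ moved in front of the lookahead
SOLUTION_B  = /^((?!localhost).)*$/

urls = [
  "http://example.com/index.html",
  "http://localhost:1234/toto",
  "https://sub.example.org/localhost-docs"
]

urls.each do |url|
  results = [SOLUTION_A, SOLUTION_A2, SOLUTION_B].map { |re| re.match?(url) }
  puts "#{url}: #{results.inspect}"
end
# http://example.com/index.html: [true, true, true]
# http://localhost:1234/toto: [false, false, false]
# https://sub.example.org/localhost-docs: [false, false, false]
```

For whole-string validation in Ruby, \A and \z are safer than ^ and $, which match at every line boundary in a multi-line string.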
You can get them easily explained online. The first:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    localhost                'localhost'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
--------------------------------------------------------------------------------
And the second:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1 (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
      localhost                'localhost'
--------------------------------------------------------------------------------
    )                        end of look-ahead
--------------------------------------------------------------------------------
    .                        any character except \n
--------------------------------------------------------------------------------
  )*                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
--------------------------------------------------------------------------------
As an aside, these two solutions are slow. A better way is to use:
^(?:[^l]+|l(?!ocalhost))+$
In other words: runs of characters that are not l, or single l characters not followed by ocalhost, all the way to the end of the string.
This gives a better result because the lookahead only runs at l characters instead of at every position. (For a URL like http://localhost:1234/toto, this kind of pattern fails in ~15 steps vs ~50 steps for the two other patterns.)
You can improve this pattern using atomic groups and possessive quantifiers to forbid backtracking:
^(?>[^l]++|l(?!ocalhost))++$
Note that in your particular case, if you only need to check the host part of the URL, you can speed the pattern up further. Example:
^http:\/\/(?>[^l\s\/]++|l(?!ocalhost))++(?>\/\S*+|$)
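The faster variants can be checked in Ruby, which supports atomic groups (?>...) and possessive quantifiers such as ++ and *+. The sketch anchors them with \A and \z (the Ruby-safe equivalents of ^ and $); the unanchored line shows why an end anchor matters: without one, the pattern happily matches just the http:// prefix of a localhost URL. The constant names are illustrative only.

```ruby
# Runs of non-'l' characters, or a single 'l' that does not start "localhost".
FAST       = /\A(?:[^l]+|l(?!ocalhost))+\z/
POSSESSIVE = /\A(?>[^l]++|l(?!ocalhost))++\z/
HOST_ONLY  = /\Ahttp:\/\/(?>[^l\s\/]++|l(?!ocalhost))++(?>\/\S*+|\z)/
UNANCHORED = /\A(?:[^l]+|l(?!ocalhost))+/

p FAST.match?("http://example.com/toto")           # true
p FAST.match?("http://localhost:1234/toto")        # false
p POSSESSIVE.match?("http://localhost:1234/toto")  # false
p UNANCHORED.match?("http://localhost:1234/toto")  # true: only "http://" was matched
p HOST_ONLY.match?("http://localhost:1234/toto")   # false
p HOST_ONLY.match?("http://example.com/localhost") # true: only the host part is checked
```

One design note: FAST nests a + inside another +, so on a long string that does contain localhost the engine can try many ways of splitting the prefix before giving up. The atomic/possessive variant cannot backtrack into those runs, which avoids that blow-up.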
according to the docs, ^ can be used in various ways:
[^abc] Any single character except: a, b, or c
^ Start of line
But I don't get how it's being applied here.
In the regex
(?!.*localhost)^.*$
The ^ is not inside any brackets, so the second one applies. Here is a trivial example:
/^x/
That regex says to match the start of the line, followed by the letter x. So it will match lines like this:
xcellent
x-ray
However, the regex will not match the lines:
axb
excellent
...because the x does not appear directly after the start of the line. You may wonder why 'axb' doesn't match. After all, 'a' is at the start of the line, and it is followed by an 'x'. However, 'start of the line' is just to the left of the first character, like this:
|
V
axb
^ is called a zero-width match because it matches the slim sliver just to the left of the 'a', e.g. between the starting quote mark and the 'a' in "axb". There's not really any space there, so ^ matches something that is 0 width.
Here is another example:
/x^/
That says to match the character x followed by the start of the line. Well, no line can have an x first and then the start of the line second, so that won't ever match anything.
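The examples above can be run directly in Ruby. One Ruby-specific detail worth knowing: ^ means start of line there, so it also matches right after a newline inside a string, while \A matches only at the very start of the string.

```ruby
%w[xcellent x-ray axb excellent].each do |line|
  puts "#{line}: #{line.match?(/^x/)}"
end
# xcellent: true
# x-ray: true
# axb: false
# excellent: false

p "axb".match?(/x^/)   # false: nothing can follow an x and still be at a line start

p("a\nxb" =~ /^x/)     # 2: in Ruby, ^ also matches just after a newline
p("a\nxb" =~ /\Ax/)    # nil: \A only matches at the start of the string
```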
Now your regex:
(?!.*localhost)^.*$
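Like the 'start of line' ^, a lookahead is zero-width. What that means is that the lookahead scans ahead in the string but consumes nothing: when it is done, the engine is back at the position where the lookahead started (here, the beginning of the string). Because this lookahead is negative, finding localhost makes the match fail; only when localhost is absent does the engine go on to match the rest of the regex: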
^.*$
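A short Ruby check of the zero-width behaviour: after the lookahead and ^ succeed, the match still starts at position 0 and .*$ consumes the whole line. The sample URL is made up.

```ruby
re = /(?!.*localhost)^.*$/

m = re.match("http://example.com/index.html")
p m.begin(0)   # 0: the lookahead consumed nothing, so the match starts at the beginning
p m[0]         # "http://example.com/index.html": .*$ consumed the whole line

# A positive lookahead followed by the same text matches that text only once.
p "foobar"[/(?=foo)foo/]           # "foo"

p re.match?("http://localhost/")   # false: the negative lookahead found localhost
```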
One word of advice: when a regex requires lookarounds (lookaheads or lookbehinds), there is very often an easier way to do what you want. For instance, you could write:
url = "....."
if url.start_with?('http')
  # the line starts with 'http'
else
  # the line doesn't start with 'http'
end
That's much easier to read, and it doesn't require trying to decipher a complex regex.
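Applied to the original question, the same non-regex approach could look like the sketch below. The method names are hypothetical; the second variant assumes that only the host should be checked and uses the URI library from Ruby's standard library.

```ruby
require "uri"

# Same test as the lookahead regexes: reject any URL containing "localhost".
def no_localhost?(url)
  !url.include?("localhost")
end

# Stricter variant (assumption: only the host matters), using the standard URI library.
def non_local_host?(url)
  URI.parse(url).host != "localhost"
end

p no_localhost?("http://example.com/")              # true
p no_localhost?("http://localhost:1234/toto")       # false
p non_local_host?("http://example.com/localhost")   # true
p non_local_host?("http://localhost:1234/toto")     # false
```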

Regex that allows A-Z, a-z, 0-9, and dashes in the middle, never on the ends?

I'm working on a Ruby regex that meets the following conditions:
Supported:
A-Z, a-z, 0-9, dashes in the middle but never starting or ending in a dash.
At least 5, no more than 500 characters
So far I have:
[0-9a-z]{5,500}
Any suggestions on how to update to meet the criteria above?
Thanks
[A-Za-z\d][-A-Za-z\d]{3,498}[A-Za-z\d]
If you are willing to treat _ as a letter also, it's even simpler:
\w[-\w]{3,498}\w
This should work:
[0-9A-Za-z][0-9A-Za-z-]{3,498}[0-9A-Za-z]
Here you go:
/^[0-9A-Za-z][0-9A-Za-z\-]{3,498}[0-9A-Za-z]$/
This version requires the first and last characters to be 0-9, A-Z or a-z (rather than any non-dash character).
Explanation:
The ^ matches the beginning of the string.
The first [] matches one of A-Z, a-z, 0-9.
The next [] matches 3 to 498 characters from A-Z, a-z, 0-9 and dash. We match 3 to 498 rather than 5 to 500 because one character is matched at the beginning and one at the end.
The last [] again matches one of A-Z, a-z, 0-9.
Finally, $ matches the end of the string.
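Because ^ and $ match at line boundaries in Ruby, a multi-line string can slip past the anchored pattern above. The sketch below uses \A and \z instead and checks the length and dash rules on a few made-up inputs; SLUG is just an illustrative name.

```ruby
# \A and \z anchor to the whole string; ^ and $ would only anchor to a line.
SLUG = /\A[0-9A-Za-z][0-9A-Za-z-]{3,498}[0-9A-Za-z]\z/

p SLUG.match?("abc-12")        # true
p SLUG.match?("-abc12")        # false: starts with a dash
p SLUG.match?("abc12-")        # false: ends with a dash
p SLUG.match?("abcd")          # false: only 4 characters
p SLUG.match?("a" * 500)       # true
p SLUG.match?("a" * 501)       # false

line_anchored = /^[0-9A-Za-z][0-9A-Za-z\-]{3,498}[0-9A-Za-z]$/
p "abcde\n-bad-".match?(line_anchored)  # true: the first line alone satisfies ^...$
p SLUG.match?("abcde\n-bad-")           # false
```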
This assumes that there are either always dashes or never dashes. It also assumes only one dash is allowed between alphanumeric characters. It's the only way I can think of offhand to limit the number of characters rather than the number of repetitions of the group.
(([0-9a-zA-Z]{4,499})|([0-9a-zA-Z][\-]?){2,249})[0-9a-zA-Z]
Assuming there's no limit to the number of adjacent dashes allowed, this would work:
[0-9a-zA-Z][0-9a-zA-Z\-]{3,498}[0-9a-zA-Z]
