What does the colon operator in an expr do? - shell

I am looking at a shell script someone wrote and they wrote this:
expr "$myvariable" : '0*$'
What does this mean?

The colon : in an expr is a matching operator. The strings on the left is tested against a regex on the right. Whatever matches is caught by the regex group ( ). The regex replaces itself with a count of characters, then the : acts kind of like an array indexing operation - you get the left-side string from that location onward (like somestring[n:] in Python)
The '0*$' is matching a zero, any stuff, end of line. I don't know what's in myvariable, but I guess no fireworks unless its value starts with a zero character.
An example of a similar line of shell script - except in assigning the result to a variable, and different specific strings - is explained in http://docstore.mik.ua/orelly/unix3/upt/ch36_23.htm (from O'Reilly Unix Power Tools)
See also http://pubs.opengroup.org/onlinepubs/7908799/xcu/expr.html about halfway down

Related

Finding number range with grep

I have a database in this format:
username:something:UID:something:name:home_folder
Now I want to see which users have a UID ranging from 1000-5000. This is what what I tried to do:
ypcat passwd | grep '^.*:.*:[1-5][0-9]\{2\}:'
My thinking is this: I go to the third column and find numbers that start with a number from 1-5, the next number can be any number - range [0-9] and that range repeats itself 2 more times making it a 4 digit number. In other words it would be something like [1-5][0-9][0-9][0-9].
My output, however, lists even UID's that are greater than 5000. What am I doing wrong?
Also, I realize the code I wrote could potentially lists numbers up to 5999. How can I make the numbers 1000-5000?
EDIT: I'm intentionally not using awk since I want to understand what I'm doing wrong with grep.
There are several problems with your regex:
As Sundeep pointed out in a comment, ^.*:.*: will match two or more columns, because the .* parts can match field delimiters (":") as well as field contents. To fix this, use ^[^:]*:[^:]*: (or, equivalently, ^\([^:]:\)\{2\}); see the notes on bracket expressions and basic vs extended RE syntax below)
[0-9]\{2\} will match exactly two digits, not three
As you realized, it matches numbers starting with "5" followed by digits other than "0"
As a result of these problems, the pattern ^.*:.*:[1-5][0-9]\{2\}: will match any record with a UID or GID in the range 100-599.
To do it correctly with grep, use grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):' (again, see Sundeep's comments).
[Added in edit:]
Concerning bracket expressions and what ^ means in them, here's the relevant section of the re_format man page:
A bracket expression is a list of characters enclosed in '[]'. It
normally matches any single character from the list (but see below).
If the list begins with '^', it matches any single character (but see
below) not from the rest of the list. If two characters in the list
are separated by '-', this is shorthand for the full range of
characters between those two (inclusive) in the collating sequence,
e.g. '[0-9]' in ASCII matches any decimal digit.
(bracket expressions can also contain other things, like character classes and equivalence classes, and there are all sorts of special rules about things like how to include characters like "^", "-", "[", or "]" as part of a character list, rather than negating, indicating a range, class, or end of the expression, etc. It's all rather messy, actually.)
Concerning basic vs. extended RE syntax: grep -E uses the "extended" syntax, which is just different enough to mess you up. The relevant differences here are that in a basic RE, the characters "(){}" are treated as literal characters unless escaped (if escaped, they're treated as RE syntax indicating grouping and repetition); in an extended RE, this is reversed: they're treated as RE syntax unless escaped (if escaped, they're treated as literal characters).
That's why I suggest ^\([^:]:\)\{2\} in the first bullet point, but then actually use ^([^:]*:){2} in the proposed solution -- the first is basic syntax, the second is extended.
The other relevant difference -- and the reason I switched to extended for the actual solution -- is that only extended RE allows | to indicate alternatives, as in this|that|theother (which matches "this" or "that" or "theother"). I need this capability to match a 4-digit number starting with 1-4 or the specific number 5000 ([1-4][0-9]{3}|5000). There's simply no way to do this in a basic RE, so grep -E and the extended syntax are required here.
(There are also many other RE variants, such as Perl-compatible RE (PCRE). When using regular expressions, always be sure to know which variant your regex tool uses, so you don't use syntax it doesn't understand.)
ypcat passwd |awk -F: '$3>1000 && $3 <5000{print $1}'
awk here can go the task in a simple manner. Here we made ":" as the delimiter between the fields and put the condition that third field should be greater than 1000 and less then 5000. If this condition meets print first field.

How to implement Siri/Cortana like functionality in commandline?

I would like to implement a small subset of siri/cortana like features in command line.
For e.g.
$ What is the sum of 100 and 1000
> Response: 1100
$ What is the product of 10 and 12
> Response: 120
The questions are predefined regular expressions. It needs to call the matching function in ruby.
Pattern: What is the sum of (\d)+ and (\d)+
Ruby method to call: sum(a,b)
Any pointers/suggestion is appreciated.
That sounds exactly like cucumber, maybe take a look and see if you can just use their classes to hack something together :) ?
You could do something like the following:
question = gets.chomp
/\A.*(sum |product |quotient |difference )\D+([0-9]+)\D+([0-9]+).*\z/.match question
send($1, $2.to_i, $3.to_i)
Quick explanation for anyone that may be new to matching in Ruby:
This gets a line of input from the command line and scans it for a function name (i.e. sum, product, etc) followed by a space and potentially some non-digit characters. Then, it looks for a first number (similarly followed by a space and 0 or more non-digit characters) and a second number followed by nothing or anything. The parentheses determine what gets assigned to the variables preceded by a $, i.e. the substring that matches the contents of the first set of parentheses gets assigned to $1.
Next, it calls the method whose name is the value of $1 with the arguments (casted to integers) found in $2 and $3.
Obviously, this isn't generalized at all--you're putting the method names in the regex, and it's taking a fixed number of arguments--but it'll hopefully be useful for getting you on the right track.

InDesign Grep: Changing sentence beginnings to Uppercase

I am relatively new to scripting and within an InDesign Script I am trying to change all the first letters of all sentences to uppercase (many of the are lowercase, since I randomly generated the setences from different text sources).
I am so far able to find the text parts with this Grep expression:
\.(\s)+\l
I also found this script by Peter Kahrel, that he shares on InDesign Secrets:
app.findGrepPreferences.findWhat = "^.";
found = app.activeDocument.findGrep();
for (i = 0; i < found.length; i++)
found[i].characters[0].changecase (ChangecaseMode.lowercase);
However, when I now replace the ^. with my own expression, and change lowercase to uppercase, the script does not work, which makes sense, since I do not want to change the first character of my findGrep results, but the last one. But how can I find the last character? The breaks between the sentences have different lengths, so I cannot simply type 2 instead of 0.
Any help would be very appreciated! Thank you!
Edit: I'm working on CS6.
Your GREP returns matches that start with a period, then have any number of spaces (including hard returns, probably), and always end with one lowercase character. So far, so good. You can access the last character (and in fact any last item in any InDesign object collection) in this way:
found[i].characters[-1].changecase (ChangecaseMode.lowercase);
which 'indexes' from the end, rather than from the start.
However! The only character in your matches, other than the period and spaces, is always going to be a lowercase letter. So you can skip the entire "how to find the correct index" thing, and probably slightly speed up the script as well, by simply applying lowercase (or, as you are using it, uppercase) to the entire match:
found[i].changecase (ChangecaseMode.lowercase);
because nothing will happen to not-lowercaseable characters (a word I declare to signify "having the Unicode-defined property of being lowercase and having an uppercase equivalent). (Or the other way around, if I understand your purpose correct.)

Bash, regular expressions, parameter to file

I have an assignment in which I have to send to a file an unlimited list of parameters, the file will have to print the strings which are repeated in the following way:
NumNumNumCharCharChar...
Num- number
Char-character
every three following numbers are the same, as well as the three next characters, then another three numbers and then another three characters.
The string must start with numbers and end with characters in a repeated way.
In order to solve this question, you may use only grep/egrep — up to you, which means that the solution is in regular expressions..
OK, this is what I thought to do for the egrep:
egrep "^([0-9][0-9][0-9][a-b][a-b][a-b])\1*$"
Your attempt is almost correct. The backreference \1 will require repetitions of the matching string, not the matching pattern. Allow the pattern to repeat instead. Inside the repetitions, you do want backreferences:
egrep '^(([0-9])\2{2}([a-z])\3{2})+$' file
As a shell scripting tweak, I switched to single quotes (double quotes are less safe) and I extended the lowercase class to [a-z]. Note that the outer parentheses are group 1, so the backreferences to the inner parenthesized expressions will be \2 and \3.

Replacing partial regex matches in place with Ruby

I want to transform the following text
This is a ![foto](foto.jpeg), here is another ![foto](foto.png)
into
This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)
In other words I want to find all the image paths that are enclosed between brackets (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.
I would like to do this using String#gsub in its block version. Currently my code looks like this:
re = /!\[.*?\]\((.*?)\)/
rel_content = content.gsub(re) do |path|
real_path(path)
end
The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.
My current workaround is to split the path and reassemble it later.
Is there a Ruby regex that matches only the path inside the brackets and not all the contextual required characters?
Post-answers update: The main problem here is that Ruby's regexen have no way to specify zero-width lookbehinds. The most generic solution is to group what the part of regexp before and the one after the real matching part, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.
In this case the solution would be
re = /(!\[.*?\]\()(.*?)(\))/
rel_content = content.gsub(re) do
$1 + real_path($2) + $3
end
A quick solution (adjust as necessary):
s = 'This is a ![foto](foto.jpeg)'
s.sub!(/!(\[.*?\])\((.*?)\)/, '\1(/folder1/\2)' )
p s # This is a [foto](/folder1/foto.jpeg)
You can always do it in two steps - first extract the whole image expression out and then second replace the link:
str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
image.gsub(/(?<=\()(.*)(?=\))/) do |link|
"/a/new/path/" + link
end
end
#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"
I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.
[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):
You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regex expression subject to the following condition. Regexp expressions in lookbehinds they have to be fixed width due to limitations on the regex implementation, which means that they can't include expressions with an unknown number of repetitions or alternations with different-width choices. If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads though).
In your case, the [foto] part has a variable width (foto can be any string) so it can't go into a lookbehind due to the above. However, lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex which only needs to worry about (fixed-length) compulsory open parentheses.
Obviously you can put real_path in from here, but I just wanted a test-able example.
I think that this approach is more flexible and more readable than reconstructing the string through the match group variables
In your block, use $1 to access the first capture group ($2 for the second and so on).
From the documentation:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
As a side note, some people think '\1' inappropriate for situations where an unconfirmed number of characters are matched. For example, if you want to match and modify the middle content, how can you protect the characters on both sides?
It's easy. Put a bracket around something else.
For example, I hope replace a-ruby-porgramming-book-531070.png to a-ruby-porgramming-book.png. Remove context between last "-" and last ".".
I can use /.*(-.*?)\./ match -531070. Now how should I replace it? Notice
everything else does not have a definite format.
The answer is to put brackets around something else, then protect them:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.')
# => "a-ruby-porgramming-book.png"
If you want add something before matched content, you can use:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-porgramming-book-2019-531070.png"

Resources