Pattern matching in Ruby

Could anybody tell me how this expression works?
output = "#{output.gsub(/grep .*$/,'')}"
Before that operation, the value of output is
"df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
but after the operation it becomes
"df -h | \n/dev/mtdblock4 248.5M 248.5M 130.7M 117.8M 53% /mnt/nand\r\n "
Please help me.

Your expression is equivalent to:
output.gsub!(/grep .*$/,'')
which is much easier to read.
The . in a regular expression matches every character except a newline (\n) by default. So, in the string provided, it matches "grep /mnt/nand\r" (the carriage return \r is not a newline, so . matches it too), and gsub substitutes an empty string for that. The result is the provided string without the matched substring.
Here is a simpler example:
"hello\n\n\nworld".gsub(/hello.*$/,'') => "\n\n\nworld"
In both your provided regex, and the example above, the $ is not necessary. It is used as an anchor to match the end of a line, but since the pattern immediately before it (.*) matches everything up to a newline, it is redundant (but does not cause harm).
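To check this against the data in the question, here is a small sketch. The constant names are made up for illustration. It also shows a detail that is easy to miss: in Ruby, . stops only at \n, so the \r just before it is part of the match and gets removed as well.

```ruby
# Sample data copied from the question; the constant names are illustrative only.
BEFORE = "df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# "." matches any character except "\n", so the match runs from "grep "
# through the "\r" and stops just before the "\n".
WITH_ANCHOR    = BEFORE.gsub(/grep .*$/, '')
WITHOUT_ANCHOR = BEFORE.gsub(/grep .*/, '')

p WITH_ANCHOR
# => "df -h | \n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"

# The trailing $ changes nothing here, because .* already ran to the line end.
p WITH_ANCHOR == WITHOUT_ANCHOR
# => true
```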

Since gsub returns a string, your first line is exactly the same as
output = output.gsub(/grep .*$/, '')
which takes the string and removes every occurrence of the regexp pattern
/grep .*$/
i.e. all parts of the string that start with 'grep ' until the end of the string or a line break.
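A quick way to convince yourself that the "#{...}" wrapper adds nothing is to compare both forms directly. The sample string below is hypothetical, not taken from the question. The sketch also shows one thing to watch out for with the gsub! variant: it changes the string in place and returns nil when nothing was replaced.

```ruby
# Hypothetical sample string, used only for illustration.
SAMPLE = "ps aux | grep ruby\nruby app.rb\n"

# Interpolating the result just copies the string that gsub already returned.
INTERPOLATED = "#{SAMPLE.gsub(/grep .*$/, '')}"
PLAIN        = SAMPLE.gsub(/grep .*$/, '')

p PLAIN                  # => "ps aux | \nruby app.rb\n"
p INTERPOLATED == PLAIN  # => true

# gsub! edits the receiver itself and returns nil if there was no match,
# so assigning its return value back to the variable can lose the string.
copy = SAMPLE.dup
BANG_RESULT = copy.gsub!(/no such text/, '')
p BANG_RESULT            # => nil
```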

There's a good regexp tester/reference here. This one matches the word "grep", then a space, then any number of characters until the next line break (\n; note that in Ruby a \r is still matched by "."). "." by itself means any character except a newline, and ".*" together means any number of them, as many as possible. "$" means the end of a line.

For the '$', see here http://www.regular-expressions.info/reference.html
".*$" means "take every character up to the end of the string", but the regexp engine treats "\n" as the end of a line, so the match stops there.

Related

Regexp for certain character to end of line

I have a string
"So on and so forth $5.99"
I would like to extract everything after the $ until the end of the line.
/$ finds the character $. How do I select the rest of the string? I know it's something \z but I can't get the syntax right.
In a regexp, $ represents the end of the line.
So in your case you need \$.*$ to include your escaped $ and everything (.*) up to the end of the line ($).
No, /$ does not match that character. You need to escape it with a backslash (\$) to match a literal $.
string = "So on and so forth $5.99"
result = string.match(/\$(.*)$/)
puts result[1] #=> "5.99"
If you want to capture everything after the $ up to the end of the string (rather than the end of the line), you'll want:
/\$(.*)\z/
See http://rubular.com/r/T4fR1SEl3j
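The difference between $ and \z only shows up once the string has more than one line. Here is a minimal sketch; the one-line string is from the question, the two-line string is made up for illustration.

```ruby
# ONE_LINE comes from the question; TWO_LINES is a hypothetical example.
ONE_LINE  = "So on and so forth $5.99"
TWO_LINES = "Coffee $2.50\nBagel $1.25"

# String#[] with a regexp and a group number returns that capture (or nil).
p ONE_LINE[/\$(.*)$/, 1]     # => "5.99"

# $ anchors at the end of any line, so the first line already matches.
p TWO_LINES[/\$(.*)$/, 1]    # => "2.50"

# \z anchors at the very end of the string; "." cannot cross the "\n",
# so only the last line's amount can match.
p TWO_LINES[/\$(.*)\z/, 1]   # => "1.25"

# An unescaped $ is just the end-of-line anchor: nothing is left to capture.
p ONE_LINE[/$(.*)/, 1]       # => ""
```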

How do I prevent whitespace from appearing in these bash variables?

I'm reading in values from an .ini file, and sometimes may get trailing or leading whitespace.
How do I amend this first line to prevent that?
db=$(sed -n 's/.*DB_USERNAME *= *\([^ ]*.*\)/\1/p' < config.ini);
echo -"$db"-
Result:
-myinivar -
I need:
-myinivar-
Use parameter expansion.
echo "=${db% }="
You don't need the .* inside the capturing group (or the semicolon at the end of line):
db="$(sed -n 's/.*DB_USERNAME *= *\([^ ]*\).*/\1/p' < config.ini)"
To elaborate:
.* matches anything at all
DB_USERNAME matches that literal string
* (a single space followed by an asterisk) matches any number of spaces
= matches that literal string
* (a single space followed by an asterisk) matches any number of spaces
\( starts the capturing group that is used for \1 later
[^ ] matches anything which is not a space character
* repeats that zero or more times
\) ends the capturing group
.* matches anything at all
Therefore, the result will be all the characters after DB_USERNAME = and any number of spaces, up to the next space or end of line, whichever comes first.
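You can use echo to trim whitespace. This relies on the shell's word splitting of the unquoted $db, so it also collapses runs of whitespace inside the value and expands any glob characters: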
db='myinivar '
echo -"$(echo $db)"-
-myinivar-
Use crudini, which handles these ini file edge cases transparently:
db=$(crudini --get config.ini '' DB_USERNAME)
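To get rid of more than one trailing space, use %%, which removes the longest matching pattern from the end of the string. Note that the pattern " *" matches from the first space onward, so this only works when the value itself contains no spaces and has no leading space: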
echo "=${db%% *}="

Ruby regex to split text

I am using the below regex to split a text at certain ending punctuation however it doesn't work with quotes.
text = "\"Hello my name is Kevin.\" How are you?"
text.scan(/\S.*?[...!!??]/)
=> ["\"Hello my name is Kevin.", "\" How are you?"]
My goal is to produce the following result, but I am not very good with regular expressions. Any help would be greatly appreciated.
=> ["\"Hello my name is Kevin.\"", "How are you?"]
text.scan(/"(?>[^"\\]+|\\{2}|\\.)*"|\S.*?[...!!??]/)
The idea is to check for quoted parts first. The subpattern is a bit more elaborate than a simple "[^"]*" so that it can deal with escaped quotes (* see the end of this answer for a more efficient pattern).
pattern details:
" # literal: a double quote
(?> # open an atomic group: all that can be between quotes
[^"\\]+ # all that is not a quote or a backslash
| # OR
\\{2} # 2 backslashes (the idea is to skip even numbers of backslashes)
| # OR
\\. # an escaped character (in particular a double quote)
)* # repeat zero or more times the atomic group
" # literal double quote
| # OR
\S.*?[...!!??]
To deal with single quotes too, you can add '(?>[^'\\]+|\\{2}|\\.)*'| to the pattern (the most efficient option), but if you want to make it shorter you can write this:
text.scan(/(['"])(?>[^'"\\]+|\\{2}|\\.|(?!\1)["'])*\1|\S.*?[...!!??]/)
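where \1 is a backreference to the first capturing group (the found quote) and (?!\1) means "not followed by the found quote". Note that once a pattern contains a capturing group, scan returns the captures instead of the whole matches, so with this version the whole match has to be read inside a block (for example from $&).
(*) Instead of writing "(?>[^"\\]+|\\{2}|\\.)*", you can use "[^"\\]*+(?:\\.[^"\\]*)*+", which is more efficient.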
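Here is a sketch that runs both patterns on the string from the question. Two things in it are assumptions of mine: the punctuation class is written as [.!?], which matches the same characters as the class shown above, and whole_matches is a hypothetical helper, needed because scan hands back the captures (not the full match) as soon as the pattern contains a capturing group.

```ruby
# Patterns follow the answer above, with the punctuation class written as [.!?].
QUOTED_OR_SENTENCE = /"(?>[^"\\]+|\\{2}|\\.)*"|\S.*?[.!?]/
EITHER_QUOTE       = /(['"])(?>[^'"\\]+|\\{2}|\\.|(?!\1)["'])*\1|\S.*?[.!?]/

# Hypothetical helper: collects the whole match for every hit, even when the
# pattern has capturing groups (plain scan would return the captures instead).
def whole_matches(text, pattern)
  matches = []
  text.scan(pattern) { matches << Regexp.last_match[0] }
  matches
end

text = "\"Hello my name is Kevin.\" How are you?"

p text.scan(QUOTED_OR_SENTENCE)
# => ["\"Hello my name is Kevin.\"", "How are you?"]

p text.scan(EITHER_QUOTE)
# => [["\""], [nil]]   (only the captured quote, not the sentence)

p whole_matches(text, EITHER_QUOTE)
# => ["\"Hello my name is Kevin.\"", "How are you?"]
```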
Add optional quote (["']?) to the pattern:
text.scan(/\S.*?[...!!??]["']?/)
# => ["\"Hello my name is Kevin.\"", "How are you?"]

Difference of answers while using split function in Ruby

Given the following inputs:
line1 = "Hey | Hello | Good | Morning"
line2 = "Hey , Hello , Good , Morning"
file1=length1=name1=title1=nil
Using ',' to split the string as follows:
file1, length1, name1, title1 = line2.split(/,\s*/)
I get the following output:
puts file1,length1,name1,title1
>Hey
>Hello
>Good
>Morning
However, using '|' to split the string I receive a different output:
file1, length1, name1, title1 = line1.split(/|\s*/)
puts file1,length1,name1,title1
>H
>e
>y
Both strings are the same except for the separating symbol (a comma in the first case and a pipe in the second). The format of the split call I am using is also the same except, of course, for the delimiting character. What causes this variation?
The problem is that | means OR (alternation) in a regex. If you want the literal character, you need to escape it: \|. So the correct regex is /\|\s*/
Currently, the regex /|\s*/ means "the empty string or a series of whitespace characters". Since the empty string is specified first in the OR, the regex engine breaks the string up at every character (you can imagine that there is an empty string between characters). If you swap it to /\s*|/, whitespace will be preferred over the empty string where possible, so there will be no whitespace in the list of tokens after splitting, although the string is still split at every character.
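The variants can be compared side by side on the pipe-separated line from the question. This is only a sketch; the last two patterns are my own additions for stripping the spaces on both sides of the pipe, not something the asker used.

```ruby
LINE1 = "Hey | Hello | Good | Morning"

# Unescaped |: "empty string OR whitespace", so every character is split off.
p LINE1.split(/|\s*/).first(3)
# => ["H", "e", "y"]

# Escaped pipe: splits at each "|" and swallows the whitespace after it,
# but the space before each pipe stays attached to the token.
p LINE1.split(/\|\s*/)
# => ["Hey ", "Hello ", "Good ", "Morning"]

# Also consuming the whitespace before the pipe gives clean tokens.
p LINE1.split(/\s*\|\s*/)
# => ["Hey", "Hello", "Good", "Morning"]

# With a fixed separator, a plain string pattern avoids regex escaping entirely.
p LINE1.split(' | ')
# => ["Hey", "Hello", "Good", "Morning"]
```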

How to get a substring, starting from the first occurrence of a pattern in bash

I'm trying to get a substring from the start of a pattern.
I would simply use cut, but it wouldn't work if the pattern is a few characters long.
If I needed a single-character delimiter, then this would do the trick:
result=`echo "test String with ( element in parenthesis ) end" | cut -d "(" -f 2-`
edit: sample tests:
INPUT: ("This test String is an input", "in")
OUTPUT: "ing is an input"
INPUT: ("This test string is an input", "in ")
OUTPUT: ""
INPUT: ("This test string is an input", "n")
OUTPUT: "ng is an input"
Note: the parentheses mean that the input takes both a string and a delimiter string.
EDITED:
In conclusion, what was requested was a way to parse out the text from a string beginning at a particular substring and ending at the end of the line. As mentioned, there are numerous ways to do this. Here's one...
egrep -o "DELIM.*" input
... where 'DELIM' is the desired substring.
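Also, with awk (note that if the delimiter does not occur, index returns 0 and the whole line is printed rather than an empty string):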
awk -v delim="in" '{print substr($0, index($0, delim))}'
This can be done without external programs. Assuming the string to be processed is in $string and the delimiter is DELIM:
result=${string#"${string%%DELIM*}"}
The inner part substitutes $string with everything starting from the first occurrence of DELIM (if any) removed. The outer part then removes that value from the start of $string, leaving everything starting from the first occurrence of DELIM or the empty string if DELIM does not occur. (The variable string remains unchanged.)

Resources