How do I modify \1 in gsub? - ruby

When I see text that matches my pattern, I want to turn it into a RedCloth (Textile) link to an external site, using the matched text as the query value.
If I have something like:
Text 123 1234 12345
When I see that I want to replace it with:
"Text 123 1234 12345":http://site.com/query?value=Text%20123%201234%2012345
If I leave the spaces in, RedCloth doesn't recognize this as a link correctly.
Here is where I am at:
s = "this is a string which has Text 123 1234 12345 "
s = s.gsub(/(Text \d+ \d+ \d+)/, '"\1":http://site.com/query?value=\1')
=> "Text 123 1234 12345":http://site.com/query?value=Text 123 1234 12345"
The problem is that RedCloth stops parsing after:
"Text 123 1234 12345":http://site.com/query?value=Text
So I really need:
"Text 123 1234 12345":http://site.com/query?value=Text%20123%201234%2012345"
Is there a way to transform \1 on the right-hand side of gsub so that I get that result? If not, what's the best way to do this?

Ok, thanks to the comment by Narfanator I found the following: "$1 and \1 in Ruby".
The solution was super easy: use the block form of gsub, which lets you run arbitrary Ruby code on each match:
require 'cgi'
s = "this is a string which has Text 123 1234 12345 "
s = s.gsub(/(Text \d+ \d+ \d+)/) { |x| "\"#{x}\":http://site.com/query?value=#{CGI.escape(x)}" }
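One caveat: CGI.escape encodes spaces as "+", not as the "%20" shown in the desired output. If you specifically want %20, here is a minimal sketch that swaps in ERB::Util.url_encode from the standard library; the helper name linkify is hypothetical, and the site URL is just the placeholder from the question:

```ruby
require 'erb'

# Hypothetical helper: wrap each "Text N N N" match in a Textile link
# whose query value is percent-encoded (spaces become %20, not "+").
def linkify(text)
  text.gsub(/Text \d+ \d+ \d+/) do |match|
    # The block receives the whole match; $~ (the MatchData) is also set here.
    %("#{match}":http://site.com/query?value=#{ERB::Util.url_encode(match)})
  end
end

s = "this is a string which has Text 123 1234 12345 "
puts linkify(s)
```

The block form is what makes this possible: a replacement string like '\1' can only splice the captured text in unchanged, while a block can pass it through any method before inserting it.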

Related

Parenthesize first character of each word

I want to parenthesize the first character of each word:
$ echo "Welcome To The Geek Stuff" | sed 's/\(\b[A-Z]\)/\(\1\)/g'
Can anyone explain how this works? I don't understand it.
sed 's/pattern1/pattern2/' --- replaces the first occurrence of pattern1 on each line with pattern2
sed 's/pattern1/pattern2/g' --- a (g)lobal replacement: replaces every occurrence of pattern1 with pattern2
sed 's/\bpattern1/pattern2/g' --- \b is a word boundary, so pattern1 only matches at the start of a word
sed 's/\(\b[A-Z]\)/pattern2/g' --- \(...\) captures a single uppercase letter at the start of a word
sed 's/\(\b[A-Z]\)/(\1)/g' --- replaces that letter with itself wrapped in parentheses, because \1 stands for whatever the first group matched
\1 is a back reference. See the GNU sed manual: https://www.gnu.org/software/sed/manual/html_node/Back_002dreferences-and-Subexpressions.html
In short, it does a global replacement (all occurrences) of every uppercase letter at the start of a word with "(that letter)".
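Ruby's String#gsub uses the same back-reference idea, so the command from the question translates almost directly. A small sketch (the method name parenthesize_initials is hypothetical), which also shows what happens when the \b is dropped:

```ruby
# Translation of: sed 's/\(\b[A-Z]\)/\(\1\)/g'
# \b is a word boundary, ([A-Z]) is capture group 1, and in the
# single-quoted replacement string \1 refers back to that group.
def parenthesize_initials(text)
  text.gsub(/\b([A-Z])/, '(\1)')
end

puts parenthesize_initials("Welcome To The Geek Stuff")
# prints: (W)elcome (T)o (T)he (G)eek (S)tuff

# Without \b, every capital letter is wrapped, not only word-initial ones.
puts "McDonald Farm".gsub(/([A-Z])/, '(\1)')
# prints: (M)c(D)onald (F)arm
```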
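I needed to use sed -E to get that working. Without it, unescaped parentheses are literal characters, so there is no group for \1 to refer to: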
$ echo "Welcome To The Geek Stuff" | sed 's/(\b[A-Z])/(\1)/g'
sed: -e expression #1, char 18: invalid reference \1 on `s' command's RHS
$ echo "Welcome To The Geek Stuff" | sed -E 's/(\<.)/(\1)/g'
(W)elcome (T)o (T)he (G)eek (S)tuff
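The \< anchor means "start of word", whereas \b means "word boundary" (either end of a word). Because \< already pins the match to the start of a word, you can simplify the regex to match any single character there (which is necessarily a word character) and use & for the whole match: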
$ echo "Welcome To The Geek Stuff 123" | sed -E 's/\<./(&)/g'
(W)elcome (T)o (T)he (G)eek (S)tuff (1)23
You could do this:
echo "Welcome To The Our Class" | sed 's/\([A-Z]\)/\(\1\)/g'
(that is, remove the "\b").
The expression between the first and second "/" is the pattern to search for; the text between the second and third "/" is what replaces it.
It searches the sentence for capital letters (characters between A and Z) and puts "(" before each one and ")" after it. \1 refers to the first capture group, i.e. the letter matched by \([A-Z]\). Note that without \b every capital letter is wrapped, not only the first letter of each word.
The output will be:
(W)elcome (T)o (T)he (O)ur (C)lass
To parenthesize the first three letters of each word, you can use sed -E with a bounded repetition; & in the replacement stands for the whole match:
$ echo "the quick brown fox jumps over a lazy dog" | sed -E 's/\b[a-zA-Z]{1,3}/(&)/g'
(the) (qui)ck (bro)wn (fox) (jum)ps (ove)r (a) (laz)y (dog)
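For comparison, a Ruby sketch of the same "first N letters" idea; the method name parenthesize_prefix and its max parameter are hypothetical. In a Ruby replacement string, \0 plays the role of sed's &:

```ruby
# Wrap up to the first `max` letters of every word in parentheses.
# \b anchors the match at a word start, {1,max} takes one to max letters,
# and \0 in the replacement is the whole match (like & in sed).
def parenthesize_prefix(text, max = 3)
  text.gsub(/\b[a-zA-Z]{1,#{max}}/, '(\0)')
end

puts parenthesize_prefix("the quick brown fox jumps over a lazy dog")
# prints: (the) (qui)ck (bro)wn (fox) (jum)ps (ove)r (a) (laz)y (dog)
```

With max = 1 this reduces to the first-character case from the question.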

how to remove lines containing only numbers, special characters or blanks after a delimiter

The following code:
#!/bin/bash
osascript -e \
'tell application "Google_Chrome" to tell tab 1 of window 1 \
set t to execute javascript "document.body.innerText" \
end tell' | grep ':'
Results in output:
line1:blah blah
line2:blah 123
line3:
line4:[456] blah
Line5:blah blah
line6:[789]
line 7:
The desired output:
line1:blah blah
line2:blah 123
line4:[456] blah
I can use cut -d : -f1 to get just the left side and cut -d : -f2 to get just the right side. But I can't figure out how to remove blank lines, or lines with only numbers and/or special characters, while still preserving the structure of the data.
To the best of my knowledge, what I'm trying to achieve follows this specific set of rules:
Every valid line of output contains a : (but not all lines containing : are valid)
No spaces, special characters or capital letters permitted to the left of :
Only lowercase letters, numbers and underscores [a-z] [0-9] and _ permitted to the left of :
Any line not containing letters [a-z] to right of : should be discarded. (case is not important)
Any ideas how to accomplish this?
Replace your grep with this:
... | grep -E '^[a-z0-9_]+:[^a-zA-Z]*[a-zA-Z]'
line1:blah blah
line2:blah 123
line4:[456] blah
This meets your requirements: only [a-z0-9_] characters to the left of :, and at least one of [a-zA-Z] to the right of it.
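If the filtering ever needs to happen in Ruby instead of grep, the same regex carries over unchanged. A minimal sketch, assuming the lines are already in an array (VALID_LINE and keep_valid are hypothetical names); Enumerable#grep keeps the elements the regex matches, much like the grep command:

```ruby
# Same rule as grep -E '^[a-z0-9_]+:[^a-zA-Z]*[a-zA-Z]':
# only [a-z0-9_] before the colon, and at least one letter after it.
VALID_LINE = /\A[a-z0-9_]+:[^a-zA-Z]*[a-zA-Z]/

def keep_valid(lines)
  lines.grep(VALID_LINE)
end

sample = ["line1:blah blah", "line2:blah 123", "line3:", "line4:[456] blah",
          "Line5:blah blah", "line6:[789]", "line 7:"]
puts keep_valid(sample)
# prints line1, line2 and line4 (one per line)
```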

expr does not return the pattern if not at the beginning of the string

Using this version of bash:
GNU bash, version 4.1.2(1)-release (i386-redhat-linux-gnu)
How can I get expr to find my pattern within a string, if the pattern I'm looking for does not begin this string?
Example:
expr match "123 abc 456 def ghi789" '\([0-9]*\)' #returns 123 as expected
expr match "z 123 abc 456 def ghi789" '\([0-9]*\)' #returns nothing
In the second example, I would expect 123 to be returned.
Further analysis:
If I start from the end of the string by adding .* in my command, I get a weird result:
expr match "123 abc 456 def ghi789" '.*\([0-9]*\)' #returns nothing
expr match "123 abc 456 def ghi789" '.*\([0-9]\)' #returns 9 as expected
expr match "123 abc 456 def ghi789 z" '.*\([0-9]\)' #returns also 9
Here, it seems that the pattern can be found at the end of the string (so at the beginning of my search), and also if it's not at the end of the string. But it does not work if I add the * at the end of the regular expression.
On the other hand, the same does not apply if I start from the beginning of my string:
expr match "z 123 abc 456 def ghi789" '\([0-9]\)' #returns nothing
I think I must misunderstand something obvious, but I cannot find what.
Thank you for your help :)
Would
expr match "123 abc 456 def ghi789 z" '[^0-9]*\([0-9]*\)'
do it? (I just put [^0-9]* instead of .* at the beginning.)
As mentioned in the comments, the expression
expr match "123 abc 456 def ghi789 z" '^[^0-9]*\([0-9]*\)'
would fit better, because ^[^0-9]* can be read as "starting from the beginning (the ^ as the first character), skip all characters that are not digits ([^0-9])".
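The behaviour in the question comes from two regex rules, anchoring at the start and greedy matching, and they can be reproduced with Ruby regexes. In this sketch \A stands in for expr's implicit anchoring at the start of the string; it is an illustration of the regex semantics, not of expr itself:

```ruby
s1 = "123 abc 456 def ghi789"
s2 = "z 123 abc 456 def ghi789"

p s2[/\A([0-9]*)/, 1]         # => ""    [0-9]* happily matches zero digits at the start
p s1[/\A.*([0-9]*)/, 1]       # => ""    greedy .* eats everything, [0-9]* matches empty
p s1[/\A.*([0-9])/, 1]        # => "9"   .* has to give back exactly one digit
p s2[/\A[^0-9]*([0-9]*)/, 1]  # => "123" skip the non-digits, then take the digits

# Unanchored search: finds the first run of digits anywhere in the string.
p s2[/\d+/]                   # => "123"
```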

grep the input file with keyword, then generate new report

cat infile
abc 123 678
sda 234 345 321
xyz 234 456 678
I need to grep the file for the keyword sda and report the first and last columns, like this:
sda has the value of 321
I need a Ruby function that does the same as this bash (awk) script:
awk '/sda/{print $1 " has the value of " $NF}' infile
How about something like this?
File.foreach("infile") do |line| # reads line by line and closes the file afterwards
  next unless line =~ /^sda/ # don't process the line unless it starts with "sda"
  entries = line.split(" ")
  var1 = entries.first
  var2 = entries.last
  puts "#{var1} has the value of #{var2}"
end
I don't know where you are defining the "sda" matcher. If it's fixed, you can just put it in there.
If not, you might try taking it from the command-line arguments (ARGV).
File.foreach("infile") do |line|
  key, *_, value = line.split
  next unless key == 'sda' # or "next if key != 'sda'"
  puts "#{key} has the value of #{value}"
end
Alternatively, you could use a regexp matcher in the beginning to see if the line starts with 'sda' or not.
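Putting those suggestions together, a sketch of a reusable function that takes the keyword and the lines as parameters, with the keyword and file path read from ARGV when run as a script. The names report and report.rb are hypothetical; note that this compares the first field exactly, whereas awk's /sda/ matches sda anywhere on the line:

```ruby
# Hypothetical helper: for every line whose first field equals `keyword`,
# build "FIRST has the value of LAST" (like awk's $1 and $NF).
def report(lines, keyword)
  lines.map do |line|
    first, *_, last = line.split
    "#{first} has the value of #{last}" if first == keyword
  end.compact
end

# Usage (assumed script name): ruby report.rb sda infile
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  keyword, path = ARGV
  puts report(File.foreach(path), keyword)
end
```

Passing lines rather than a file name keeps the function easy to test with an array, while File.foreach(path) (which returns an enumerator when no block is given) streams the real file.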

How to use sed command to add a string before a pattern string?

I want to use sed to modify my file named "baz".
When I search for the pattern foo, and foo is not at the beginning or end of the line, I want to insert bar before foo. How can I do it using sed?
Input file named baz:
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
blah_foo_blahblahblah
Output file
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
blah_barfoo_blahblahblah
You can just use something like:
sed 's/foo/barfoo/g' baz
(the g at the end means global, every occurrence on each line rather than just the first).
For an arbitrary (rather than fixed) pattern such as foo[0-9], you could use capture groups as follows:
pax$ echo 'xyz fooA abc
xyz foo5 abc
xyz fooB abc' | sed 's/\(foo[0-9]\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc
xyz fooB abc
The parentheses capture the actual text that matched the pattern and the \1 uses it in the substitution.
You can use arbitrarily complex patterns with this one, including ensuring you match only complete words. For example, only changing the pattern if it's immediately surrounded by a word boundary:
pax$ echo 'xyz fooA abc
xyz foo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc' | sed 's/\(\bfoo[0-9]\b\)/bar\1/g'
xyz fooA abc
xyz barfoo5 abc foo77 qqq xfoo4 zzz
xyz fooB abc
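The same capture-group technique works in Ruby's gsub, where \b is also a word boundary and \1 in a single-quoted replacement refers to group 1. A sketch (prefix_bar is a hypothetical name) that mirrors the word-boundary sed example above:

```ruby
# Prefix "bar" to "foo" followed by one digit, but only as a whole word.
# (foo[0-9]) is captured as group 1 and \1 splices it back in.
def prefix_bar(text)
  text.gsub(/\b(foo[0-9])\b/, 'bar\1')
end

puts prefix_bar("xyz fooA abc\nxyz foo5 abc foo77 qqq xfoo4 zzz\nxyz fooB abc")
```

Only foo5 changes: foo77 fails the trailing \b after one digit, and xfoo4 fails the leading \b.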
In terms of how the capture groups work, you can use parentheses to store the text that matches a pattern for later use in the replacement. The captured identifiers are based on the ( characters reading from left to right, so the regex (I've left off the \ escape characters and padded it a bit for clarity):
( ( \S* )   ( \S* ) )
^ ^     ^   ^     ^ ^
| |     |   |     | |
| +--2--+   +--3--+ |
+---------1---------+
when applied to the text Pax Diablo would give you three groups:
\1 = Pax Diablo
\2 = Pax
\3 = Diablo
as shown below:
pax$ echo 'Pax Diablo' | sed 's/\(\(\S*\) \(\S*\)\)/[\1] [\2] [\3]/'
[Pax Diablo] [Pax] [Diablo]
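Ruby numbers nested groups the same way, by opening parenthesis from left to right, and MatchData lets you inspect each group directly. A short sketch using the same regex:

```ruby
m = "Pax Diablo".match(/((\S*) (\S*))/)
p m[1] # => "Pax Diablo"  (outermost group, opened first)
p m[2] # => "Pax"
p m[3] # => "Diablo"

# The equivalent of the sed substitution above:
puts "Pax Diablo".sub(/((\S*) (\S*))/, '[\1] [\2] [\3]')
# prints: [Pax Diablo] [Pax] [Diablo]
```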
For lines that begin with foo, you can simply substitute the start of the line:
sed '/^foo/s/^/bar/'
To replace or modify every "foo" except those at the beginning or end of a line, I would suggest temporarily replacing the ones at the beginning and end of the line with a unique sentinel value:
sed 's/^foo/____veryunlikelytoken_bol____/
s/foo$/____veryunlikelytoken_eol____/
s/foo/bar&/g
s/^____veryunlikelytoken_bol____/foo/
s/____veryunlikelytoken_eol____$/foo/'
In sed there is no way to specify "cannot match here". In Perl regex and derivatives (meaning languages which borrowed from Perl's regex, not necessarily languages derived from Perl) you have various negative assertions so you can do something like
perl -pe 's/(?!^)foo(?!$)/barfoo/g'
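Ruby's regex engine also supports these negative lookaheads, and in Ruby ^ and $ always match at line starts and ends, so the Perl one-liner carries over directly. A sketch (the method name is hypothetical):

```ruby
# Insert "bar" before every foo that is neither at the start nor at the
# end of a line. (?!^) fails at a line start; (?!$) fails when foo is
# immediately followed by the end of the line.
def bar_before_inner_foo(text)
  text.gsub(/(?!^)foo(?!$)/, 'barfoo')
end

puts bar_before_inner_foo("blah_foo_blahblahblah")
# prints: blah_barfoo_blahblahblah
```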
