Is it possible in Ruby to print a part of a regex (group) and instead of the whole matched substring? - ruby

Is it possible in sed may be even in Ruby to memorize the matched part of a pattern and print it instead of the full string which was matched:
"aaaaaa bbb ccc".strip.gsub(/([a-z])+/, \1) # \1 as a part of the regex which I would like to remember and print then instead of the matched string.
# => "a b c"
I thing in sed it should be possible with its h = holdspace command or similar, but what also about Ruby?

Not sure what you mean. Here are few example:
print "aaaaaa bbb ccc".strip.gsub(/([a-z])+/, '\1')
# => "a b c"
And,
print "aaaaaa bbb ccc".strip.scan(/([a-z])+/).flatten
# => ["a", "b", "c"]

The shortest answer is grep:
echo "aaaaaa bbb ccc" | grep -o '\<.'

You can do:
"aaaaaa bbb ccc".split
and then join that array back together with the first character of each element
[a[0][0,1], a[1][0,1], a[2][0,1], a[3][0,1], ... ].join(" ")
#glennjackman's suggestion: ruby -ne 'puts $_.split.map {|w| w[0]}.join(" ")'

Related

Delete the lines from file between pattern match

How to delete all the lines between two pattern in file using sed.
Here pattern are //test and //endtest, file content:
blah blah blah
c
f
f
[
]
//test
all text to be deleted
line1
line2
xyz
amv
{
//endtest
l
dsf
dsfs
Expected result:
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs
This is common feature of sed
sed '/^\/\/test$/,/^\/\/endtest/d'
As / is used to bound regex, they have to be escaped, in regex.
If you want to keep marks (as requested):
sed '/^\/\/test$/,/^\/\/endtest/{//!d}'
Explanation:
Have a look at info sed, search for sed address -> Regexp Addresses and Range Addresses.
Enclosed by { ... }, symbol // mean any bound.
The empty regular expression '//' repeats the last regular
expression match (the same holds if the empty regular expression is
passed to the 's' command).
! mean not, then d for delete line
Alternative: You could write:
sed '/^\/\/\(end\)\?test$/,//{//!d}'
or
sed -E '/^\/\/(end)?test$/,//{//!d}'
Will work same, but care, this could reverse effect if some extra pattern //endtest may exist before first open pattern (//test).
... All this was done, using GNU sed 4.4!
Under MacOS, BSD sed
Under MacOS, I've successfully dropped wanted lines with this syntax:
sed '/^\/\/test$/,/^\/\/endtest/{/^\/\/\(end\)\{0,1\}test$/!d;}'
or
sed -E '/^\/\/test$/,/^\/\/endtest/{/^\/\/(end)?test$/!d;}'
With awk:
$ awk '/\/\/endtest/{p=0} !p; /\/\/test/{p = 1}' file
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs
if your data in 'd' file, try gnu sed:
sed -E '/\/\/test/,/\/\/endtest/{/\/\/.*test/!d}' d

BASH - Shuffle characters in strings from several rows

I have a file (filename.txt) with the following structure:
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
I would like to shuffle the characters in the strings that do not start wit >. The output would (for example) look like the following:
>line1
DGC
FEI
HBA
>line2
JRP
OKN
QML
>line3
SZV
YXT
UW
This is what I tried to shuffle the characters for each >line[number]: ruby -lpe '$_ = $_.chars.shuffle * "" if !/^>/' filename.txt. The command works (see my post BASH - Shuffle characters in strings from file) but it shuffles line by line. I was wondering how I could modify the command to shuffle characters between all strings of each >line[number]). Using ruby is not a requirement.
First, we need to solve the problem: how to shuffle all characters in multiple lines:
echo -e 'ABC\nDEF\nGHI' |grep -o . |shuf |tr -d '\n'
GDAFHEIBC
and, we also need an array to record the length of each line in origin strings.
s=GDAFHEIBC
lens=(3 3 3)
start=0
for len in "${lens[#]}"; do
echo ${s:${start}:${len}}
((start+=len))
done
GDA
FHE
IBC
So, the origin multiple lines:
ABC
DEF
GHI
have been shuffled to:
GDA
FHE
IBC
Now, we can do our jobs:
lens=()
string=""
function shuffle_lines {
local start=0
local shuffled_string=$(grep -o . <<< ${string} |shuf |tr -d '\n')
for len in "${lens[#]}"; do
echo ${shuffled_string:${start}:${len}}
((start+=len))
done
lens=()
string=""
}
while read -r line; do
if [[ "${line}" =~ ^\> ]]; then
shuffle_lines
echo "${line}"
else
string+="${line}"
lens+=(${#line})
fi
done <filename.txt
shuffle_lines
Examples:
$ cat filename.txt
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
>line4
0123
456
78
9
$ ./solution.sh
>line1
HFG
BED
AIC
>line2
JOP
KMQ
RLN
>line3
UVW
TYZ
XS
>line4
1963
245
08
7
#!/bin/bash
# echo > output.txt # uncomment to write in a file output.txt
mix(){
{
echo "$title"
line="$( fold -w1 <<< "$line" | shuf )"
echo "${line//$'\n'}" | fold -w3
} # >> output.txt # uncomment to write in a file output.txt
unset line
}
while read -r; do
if [[ $REPLY =~ ^\> ]]; then
mix
title="$REPLY"
else
line+="$REPLY"
fi
done < filename.txt
mix # final mix after loop's exit, otherwise line3 will be not mixed
exit
edited with comment of gniourf-gniourf
First create a test file.
str =<<FINI
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
FINI
File.write('test', str)
#=> 56
Now read the file and perform the desired operations.
result = File.read('test').split(/(>line\d+)/).map do |s|
if s.match?(/\A(?:|>line\d+)\z/)
s
else
a = s.scan(/\p{Lu}/).shuffle
s.gsub(/\p{Lu}/) { a.shift }
end
end.join
# ">line1\nECF\nHIA\nGBD\n>line2\nJNP\nKLR\nOQM\n>line3\nTXY\nUZV\nSW\n"
puts result
>line1
ECF
HIA
GBD
>line2
JNP
KLR
OQM
>line3
TXY
UZV
SW
To do this from the command convert the code to a string with statements separated by a semicolon.
ruby -e "puts (File.read('test').split(/(>line\d+)/).map do |s|; if s.match?(/\A(?:|>line\d+)\z/); s; else; a = s.scan(/\p{Lu}/).shuffle; s.gsub(/\p{Lu}/) { a.shift }; end; end).join"
The steps are as follows.
a = File.read('test')
#=> ">line1\nABC\nDEF\nGHI\n>line2\nJKL\nMNO\nPQR\n>line3\nSTU\nVWX\nYZ\n"
b = a.split(/(>line\d+)/)
#=> ["", ">line1", "\nABC\nDEF\nGHI\n", ">line2", "\nJKL\nMNO\nPQR\n",
# ">line3", "\nSTU\nVWX\nYZ\n"]
Notice that the regular expression that is split's argument places >line\d+ within a capture group. Without that, ">line1", ">line2" and ">line3" would not be included in b.
c = b.map do |s|
if s.match?(/\A(?:|>line\d+)\z/)
s
else
a = s.scan(/\p{Lu}/).shuffle
s.gsub(/\p{Lu}/) { a.shift }
end
end
#=> ["", ">line1", "\nEAC\nIHB\nDGF\n", ">line2", "\nKQJ\nROL\nMPN\n",
# ">line3", "\nSUY\nXTV\nZW\n"]
c.join
#=> ">line1\nEAC\nIHB\nDGF\n>line2\nKQJ\nROL\nMPN\n>line3\nSUY\nXTV\nZW\n"
Now consider more closely the calculation of c. The first element of b is passed to the block and the block variable s is set to its value:
s = ""
We then compute
s.match?(/\A(?:|>line\d+)\z/)
#=> true
so "" is returned from the block. The regular expression can be expressed as follows.
/
\A # match the beginning of the string
(?: # begin a non-capture group
# match an empty space
| # or
>line\d+ # match '>line' followed by one or more digits
) # end non-capture group
\z # match the end of the string
/x # free-spacing regex definition mode.
Within the non-capture group an empty space was matched.
The next element of b is then passed to the block.
s = ">line1"
Again
s.match?(/\A(?:|>line\d+)\z/)
#=> true
so s is return from the block.
Now the third element of b is passed to the block. (Finally, something interesting.)
s = "\nABC\nDEF\nGHI\n"
d = s.scan(/\p{Lu}/)
#=> ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
a = d.shuffle
#=> ["D", "C", "G", "H", "B", "F", "I", "E", "A"]
s.gsub(/\p{Lu}/) { a.shift }
#=> "\nDCG\nHBF\nIEA\n"
The remaining calculations are similar.

Ruby one liner to replace only lines that match, discard others

Looking for the ruby one liner substitute to print out a substitution only if the line matches the regular expression:
echo -e "Line 1\nLine 2\nLine 3" | perl -ne "print if s/Line 2/Line 2 replaced, others discarded/g"
Input:
Line 1
Line 2
Line 3
Output:
Line 2 replaced, others discarded
As I know, there is no equivalent to -ne shorthand in ruby. So it will be little longer:
echo -e "Line 1\nLine 2\nLine 3" | ruby -e 'puts $<.read.lines.map {|l| l =~ /Line 2/ ? l.gsub(/Line 2/, "Line 2 replaced, others discarded") : nil }.compact'
Where:
$< also ARGF (docs) is Stream for file argument or STDIO
$<.read will read it all to string
$<.read.lines split by new line character, returns array
map {|l| ... } will collect result of expression in a block to new array
l =~ /Line 2/ check if string match Regex
l.gsub(/Line 2/, "Line 2 replaced") will replace all "Line 2" to "Line 2 replaced"
.compact will remove nil values from array (return new array without nil's)
puts [] will print each element of array on new line
Probably ruby is not a best chose for this task, I would choose sed or do it in text editor. Most of text editors can find and replace by regex nowdays

sed not working as expected (trying to get value between two matches in a string)

I have a file (/tmp/test) the has a the string "aaabbbccc" in it
I want to extract "bbb" from the string with sed.
Doing this returns the entire string:
sed -n '/aaa/,/ccc/p' /tmp/test
I just want to return bbb from the string with sed (I am trying to learn sed so not interested in other solutions for this)
Sed works on a line basic, and a,b{action} will run action for lines matching a until lines matching b. In your case
sed -n '/aaa/,/ccc/p'
will start printing lines when /aaa/ is matched, and stop when /ccc/ is matched which is not what you want.
To manipulate a line there is multiply options, one is s/search/replace/ which can be utilized to remove the leading aaa and trailing ccc:
% sed 's/^aaa\|ccc$//g' /tmp/test
bbb
Breakdown:
s/
^aaa # Match literal aaa in beginning of string
\| # ... or ...
ccc$ # Match literal ccc at the end of the sting
// # Replace with nothing
g # Global (Do until there is no more matches, normally when a match is
# found and replacement is made this command stops replacing)
If you are not sure how many a's and c's you have you can use:
% sed 's/^aa*\|cc*$//g' /tmp/test
bbb
Which will match literal a followed by zero or more a's at the beginning of the line. Same for the c's but just at the end.
With GNU sed:
sed 's/aaa\(.*\)ccc/\1/' /tmp/test
Output:
bbb
See: The Stack Overflow Regular Expressions FAQ

Ruby replace word

exapl
I have specific situation. I am trying to replace some words in string. I have two example strings:
string1 = "aaabbb aaa bbb"
string2 = "a. bbb"
In string1 I want to replace full word "aaa" with "ccc" so I do it right this:
translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") => "aaabbb ccc bbb"
So it work and I am happy but when I try to replace "a." with "aaa" It not work and It returns string2.
I tried also this:
translation = "a."
string2.gsub(translation, "aaa") => "aaa bbb"
But when I use above gsub for string1 I get "cccbbb ccc bbb". Sorry for ma English but I hope that I explained it a little understandable. Thanks for all answers.
Try
string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")
In regex '.' means "any character". by calling escape you are turning 'a.' to 'a\.' which means "a and then the period character".
Update
As #Daniel has noted in the comments, word boundaries have some subtleties. So for the above to work with "a." you need to replace the \b with look-aheads and look-behinds:
string1.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# => "ccc bbb"
Since \w excludes dots, which I guess OP wants to include between token characters, I propose a whitelist lookarounds approach:
string = "a. b.a. a. bbb"
translation = "a."
# Using !\w b.a. is not considered as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"
# Using \s b.a. is considered as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a.
#=> "ccc b.a. ccc bbb"
Anyway, the rightness of my reasoning depends by OP needs ;-)
The . (dot) has a special meaning in regexes: it means match any character.
You should escape it with \.

Resources