Is it possible in sed, or maybe even in Ruby, to remember the matched part of a pattern and print it instead of the full string that was matched?
"aaaaaa bbb ccc".strip.gsub(/([a-z])+/, \1) # \1 is the part of the regex that I would like to remember and then print instead of the whole matched string.
# => "a b c"
I think it should be possible in sed with its h (hold space) command or similar, but what about Ruby?
Not sure what you mean. Here are a few examples:
print "aaaaaa bbb ccc".strip.gsub(/([a-z])+/, '\1')
# => "a b c"
And,
print "aaaaaa bbb ccc".strip.scan(/([a-z])+/).flatten
# => ["a", "b", "c"]
The shortest answer is grep:
echo "aaaaaa bbb ccc" | grep -o '\<.'
You can do:
a = "aaaaaa bbb ccc".split
and then join the first character of each element of that array back together:
a.map { |w| w[0, 1] }.join(" ")
@glennjackman's suggestion: ruby -ne 'puts $_.split.map {|w| w[0]}.join(" ")'
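To compare the two ideas in this thread side by side, here is a minimal Ruby sketch. A quantified group such as ([a-z])+ keeps only its last repetition, so '\1' gives the first letter only when every letter of the word is the same, whereas splitting and taking w[0] always gives the first letter. The method names are hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper names, used for illustration only.

# Replace each run of lowercase letters with whatever the group captured last.
def last_captured_letters(text)
  text.gsub(/([a-z])+/, '\1')
end

# Take the first character of every whitespace-separated word.
def first_letters(text)
  text.split.map { |word| word[0] }.join(" ")
end

puts last_captured_letters("aaaaaa bbb ccc") # a b c
puts first_letters("aaaaaa bbb ccc")         # a b c
puts last_captured_letters("abc def")        # c f
puts first_letters("abc def")                # a d
```

For input made of repeated letters the two agree; they differ as soon as a word contains more than one distinct letter.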
How can I delete all the lines between two patterns in a file using sed?
Here the patterns are //test and //endtest. File content:
blah blah blah
c
f
f
[
]
//test
all text to be deleted
line1
line2
xyz
amv
{
//endtest
l
dsf
dsfs
Expected result:
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs
This is a common feature of sed:
sed '/^\/\/test$/,/^\/\/endtest/d'
Because / delimits the regex, every literal / inside the regex has to be escaped.
If you want to keep the marker lines (as requested):
sed '/^\/\/test$/,/^\/\/endtest/{//!d}'
Explanation:
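Have a look at info sed and search for sed addresses -> Regexp Addresses and Range Addresses.
Inside { ... }, the empty pattern // repeats the last regular expression that was tried: the opening pattern on the first line of the range and the closing pattern afterwards, so it matches exactly the two marker lines.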
The empty regular expression '//' repeats the last regular
expression match (the same holds if the empty regular expression is
passed to the 's' command).
! means not, and d deletes the line.
Alternative: You could write:
sed '/^\/\/\(end\)\?test$/,//{//!d}'
or
sed -E '/^\/\/(end)?test$/,//{//!d}'
These work the same way, but take care: the effect is inverted if a stray //endtest line appears before the first opening //test.
All of this was tested with GNU sed 4.4.
Under MacOS, BSD sed
On macOS, I successfully dropped the unwanted lines with this syntax:
sed '/^\/\/test$/,/^\/\/endtest/{/^\/\/\(end\)\{0,1\}test$/!d;}'
or
sed -E '/^\/\/test$/,/^\/\/endtest/{/^\/\/(end)?test$/!d;}'
With awk:
$ awk '/\/\/endtest/{p=0} !p; /\/\/test/{p = 1}' file
blah blah blah
c
f
f
[
]
//test
//endtest
l
dsf
dsfs
If your data is in a file named d, try GNU sed:
sed -E '/\/\/test/,/\/\/endtest/{/\/\/.*test/!d}' d
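The flag-based logic of the awk answer (and the range-with-exception logic of the sed ones) can also be written out step by step. The following is only a sketch in Ruby: the method name drop_between and its keyword arguments are hypothetical, and the default patterns simply mirror the //test and //endtest markers from the question.

```ruby
# Hypothetical helper: keep the marker lines, drop everything between them.
def drop_between(lines, open_re: %r{\A//test\z}, close_re: %r{\A//endtest})
  inside = false
  lines.reject do |line|
    text = line.chomp
    if inside
      inside = false if text.match?(close_re)
      inside # still inside the range => reject; closing marker => keep
    else
      inside = true if text.match?(open_re)
      false # lines outside the range, and the opening marker itself, are kept
    end
  end
end

sample = %w[c //test line1 xyz //endtest l]
puts drop_between(sample)
# prints c, //test, //endtest and l, one per line
```

To run it against a file, the array could come from File.readlines instead of the sample above.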
I have a file (filename.txt) with the following structure:
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
I would like to shuffle the characters in the strings that do not start with >. The output would (for example) look like the following:
>line1
DGC
FEI
HBA
>line2
JRP
OKN
QML
>line3
SZV
YXT
UW
This is what I tried in order to shuffle the characters for each >line[number]: ruby -lpe '$_ = $_.chars.shuffle * "" if !/^>/' filename.txt. The command works (see my post BASH - Shuffle characters in strings from file), but it shuffles line by line. I was wondering how I could modify the command to shuffle the characters across all the strings under each >line[number]. Using Ruby is not a requirement.
First, we need to solve a sub-problem: how to shuffle all the characters across multiple lines:
echo -e 'ABC\nDEF\nGHI' |grep -o . |shuf |tr -d '\n'
GDAFHEIBC
We also need an array that records the length of each line in the original strings:
s=GDAFHEIBC
lens=(3 3 3)
start=0
for len in "${lens[@]}"; do
    echo "${s:${start}:${len}}"
    ((start+=len))
done
GDA
FHE
IBC
So, the original lines:
ABC
DEF
GHI
have been shuffled to:
GDA
FHE
IBC
Now we can put the pieces together:
lens=()
string=""
# Shuffle the buffered characters, then print them using the original line lengths.
function shuffle_lines {
    local start=0
    local shuffled_string
    shuffled_string=$(grep -o . <<< "${string}" | shuf | tr -d '\n')
    for len in "${lens[@]}"; do
        echo "${shuffled_string:${start}:${len}}"
        ((start+=len))
    done
    lens=()
    string=""
}
while read -r line; do
    if [[ "${line}" =~ ^\> ]]; then
        shuffle_lines    # flush the previous record before printing the next header
        echo "${line}"
    else
        string+="${line}"
        lens+=(${#line})
    fi
done <filename.txt
shuffle_lines    # flush the last record
Examples:
$ cat filename.txt
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
>line4
0123
456
78
9
$ ./solution.sh
>line1
HFG
BED
AIC
>line2
JOP
KMQ
RLN
>line3
UVW
TYZ
XS
>line4
1963
245
08
7
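For readers who prefer to see the same bookkeeping outside the shell, here is a sketch in Ruby of the approach described above: pool the characters of one record, shuffle the pool, then cut it back into pieces with the original line lengths. The method name shuffle_records and the rng keyword are hypothetical; the rng parameter exists only so that a run can be reproduced with a seeded Random.

```ruby
# Hypothetical helper; `rng` is injectable only so runs can be reproduced.
def shuffle_records(text, rng: Random.new)
  out = []
  body = []
  flush = lambda do
    pool = body.join.chars.shuffle(random: rng)              # every character of one record
    body.each { |line| out << pool.shift(line.length).join } # re-cut to the old line lengths
    body.clear
  end

  text.each_line(chomp: true) do |line|
    if line.start_with?(">")
      flush.call # finish the previous record before emitting the new header
      out << line
    else
      body << line
    end
  end
  flush.call # the last record has no header after it
  out.join("\n") + "\n"
end

puts shuffle_records(">line1\nABC\nDEF\nGHI\n>line3\nSTU\nVWX\nYZ\n")
```

The output differs on every run; only the headers, the line lengths and the set of characters per record are preserved.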
#!/bin/bash
# echo > output.txt # uncomment to write in a file output.txt
mix(){
    [[ -n $title ]] || return 0    # nothing is buffered before the first header
    {
        echo "$title"
        line="$( fold -w1 <<< "$line" | shuf )"
        echo "${line//$'\n'}" | fold -w3    # re-cut into 3-character lines (assumes that fixed width)
    } # >> output.txt # uncomment to write in a file output.txt
    unset line
}
while read -r; do
    if [[ $REPLY =~ ^\> ]]; then
        mix
        title="$REPLY"
    else
        line+="$REPLY"
    fi
done < filename.txt
mix # final mix after the loop exits, otherwise the last record would not be mixed
exit
edited with comment of gniourf-gniourf
First create a test file.
str =<<FINI
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
FINI
File.write('test', str)
#=> 56
Now read the file and perform the desired operations.
result = File.read('test').split(/(>line\d+)/).map do |s|
  if s.match?(/\A(?:|>line\d+)\z/)
    s
  else
    a = s.scan(/\p{Lu}/).shuffle
    s.gsub(/\p{Lu}/) { a.shift }
  end
end.join
# ">line1\nECF\nHIA\nGBD\n>line2\nJNP\nKLR\nOQM\n>line3\nTXY\nUZV\nSW\n"
puts result
>line1
ECF
HIA
GBD
>line2
JNP
KLR
OQM
>line3
TXY
UZV
SW
To do this from the command line, convert the code to a string with statements separated by semicolons.
ruby -e "puts (File.read('test').split(/(>line\d+)/).map do |s|; if s.match?(/\A(?:|>line\d+)\z/); s; else; a = s.scan(/\p{Lu}/).shuffle; s.gsub(/\p{Lu}/) { a.shift }; end; end).join"
The steps are as follows.
a = File.read('test')
#=> ">line1\nABC\nDEF\nGHI\n>line2\nJKL\nMNO\nPQR\n>line3\nSTU\nVWX\nYZ\n"
b = a.split(/(>line\d+)/)
#=> ["", ">line1", "\nABC\nDEF\nGHI\n", ">line2", "\nJKL\nMNO\nPQR\n",
# ">line3", "\nSTU\nVWX\nYZ\n"]
Notice that the regular expression that is split's argument places >line\d+ within a capture group. Without that, ">line1", ">line2" and ">line3" would not be included in b.
c = b.map do |s|
  if s.match?(/\A(?:|>line\d+)\z/)
    s
  else
    a = s.scan(/\p{Lu}/).shuffle
    s.gsub(/\p{Lu}/) { a.shift }
  end
end
#=> ["", ">line1", "\nEAC\nIHB\nDGF\n", ">line2", "\nKQJ\nROL\nMPN\n",
# ">line3", "\nSUY\nXTV\nZW\n"]
c.join
#=> ">line1\nEAC\nIHB\nDGF\n>line2\nKQJ\nROL\nMPN\n>line3\nSUY\nXTV\nZW\n"
Now consider more closely the calculation of c. The first element of b is passed to the block and the block variable s is set to its value:
s = ""
We then compute
s.match?(/\A(?:|>line\d+)\z/)
#=> true
so "" is returned from the block. The regular expression can be expressed as follows.
/
\A            # match the beginning of the string
(?:           # begin a non-capture group
              # match an empty string
  |           # or
  >line\d+    # match '>line' followed by one or more digits
)             # end non-capture group
\z            # match the end of the string
/x            # free-spacing regex definition mode
Within the non-capture group, the empty string was matched.
The next element of b is then passed to the block.
s = ">line1"
Again
s.match?(/\A(?:|>line\d+)\z/)
#=> true
so s is returned from the block.
Now the third element of b is passed to the block. (Finally, something interesting.)
s = "\nABC\nDEF\nGHI\n"
d = s.scan(/\p{Lu}/)
#=> ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
a = d.shuffle
#=> ["D", "C", "G", "H", "B", "F", "I", "E", "A"]
s.gsub(/\p{Lu}/) { a.shift }
#=> "\nDCG\nHBF\nIEA\n"
The remaining calculations are similar.
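The two String behaviours this walkthrough relies on can be checked in isolation. The short sketch below uses a made-up two-record string and a fixed, unshuffled letter list (both are assumptions for the sake of a deterministic result) to show that split keeps the separators only when the pattern has a capture group, and that gsub with a block refills each matched position from an array.

```ruby
text = ">line1\nABC\n>line2\nJKL\n"

# Without a capture group the separators are discarded ...
without_group = text.split(/>line\d+/)
# ... with one, they are kept as elements of the result.
with_group = text.split(/(>line\d+)/)

p without_group # ["", "\nABC\n", "\nJKL\n"]
p with_group    # ["", ">line1", "\nABC\n", ">line2", "\nJKL\n"]

# gsub with a block takes one replacement per match, so the newlines stay put.
letters = %w[X Y Z]
refilled = "\nABC\n".gsub(/\p{Lu}/) { letters.shift }
p refilled # "\nXYZ\n"
```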
I am looking for a Ruby one-liner equivalent that prints the result of a substitution only if the line matches the regular expression:
echo -e "Line 1\nLine 2\nLine 3" | perl -ne "print if s/Line 2/Line 2 replaced, others discarded/g"
Input:
Line 1
Line 2
Line 3
Output:
Line 2 replaced, others discarded
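Ruby also accepts the -n and -e switches, so ruby -ne 'print if $_.gsub!(/Line 2/, "Line 2 replaced, others discarded")' is the closest equivalent. A longer version that does not rely on them: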
echo -e "Line 1\nLine 2\nLine 3" | ruby -e 'puts $<.read.lines.map {|l| l =~ /Line 2/ ? l.gsub(/Line 2/, "Line 2 replaced, others discarded") : nil }.compact'
Where:
$< (also ARGF, see the docs) is the stream for file arguments or standard input
$<.read reads all of it into a string
$<.read.lines splits it at newline characters and returns an array
map {|l| ... } collects the result of the block for each element into a new array
l =~ /Line 2/ checks whether the string matches the regex
l.gsub(/Line 2/, "Line 2 replaced") replaces every "Line 2" with "Line 2 replaced"
.compact removes nil values from the array (returns a new array without nils)
puts [] prints each element of the array on its own line
Ruby is probably not the best choice for this task; I would use sed or do it in a text editor. Most text editors can find and replace by regex nowadays.
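The map-then-compact steps listed above can be folded into one call, because Enumerable#grep accepts a block and returns the block's result for each matching element. This is only a sketch: the method name replace_matching is hypothetical, and the input array stands in for the lines read from standard input.

```ruby
# Hypothetical helper: keep only the lines matching `pattern`, with the match replaced.
def replace_matching(lines, pattern, replacement)
  lines.grep(pattern) { |line| line.gsub(pattern, replacement) }
end

input = ["Line 1\n", "Line 2\n", "Line 3\n"]
puts replace_matching(input, /Line 2/, "Line 2 replaced, others discarded")
# prints: Line 2 replaced, others discarded
```

Non-matching lines never reach the block, so there are no nil values to remove afterwards.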
I have a file (/tmp/test) that has the string "aaabbbccc" in it.
I want to extract "bbb" from the string with sed.
Doing this returns the entire string:
sed -n '/aaa/,/ccc/p' /tmp/test
I just want to return bbb from the string with sed (I am trying to learn sed, so I am not interested in other solutions for this).
sed works on a line basis, and a,b{action} runs action from a line matching a through a line matching b. In your case
sed -n '/aaa/,/ccc/p'
will start printing lines when /aaa/ is matched, and stop when /ccc/ is matched which is not what you want.
To manipulate a line there are multiple options. One is s/search/replace/, which can be used to remove the leading aaa and trailing ccc:
% sed 's/^aaa\|ccc$//g' /tmp/test
bbb
Breakdown:
s/
^aaa  # Match literal aaa at the beginning of the line
\|    # ... or ...
ccc$  # Match literal ccc at the end of the line
//    # Replace with nothing
g     # Global (keep going until there are no more matches; without it,
      # the command stops after the first replacement on a line)
If you are not sure how many a's and c's you have you can use:
% sed 's/^aa*\|cc*$//g' /tmp/test
bbb
This matches a literal a followed by zero or more a's at the beginning of the line, and likewise for the c's at the end of the line.
With GNU sed:
sed 's/aaa\(.*\)ccc/\1/' /tmp/test
Output:
bbb
See: The Stack Overflow Regular Expressions FAQ
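The question asks for sed only, but the same two ideas (print the capture group, or strip the surrounding text) may be easier to experiment with interactively. As a sketch for comparison, here they are in Ruby; the variable names are illustrative only.

```ruby
line = "aaabbbccc"

# String#[] with a regexp and a group index returns just that group.
middle = line[/aaa(.*)ccc/, 1]

# Same idea as sed 's/aaa\(.*\)ccc/\1/': replace the whole match with group 1.
replaced = line.sub(/aaa(.*)ccc/, '\1')

# Same idea as stripping the leading a's and the trailing c's.
stripped = line.gsub(/\Aa+|c+\z/, "")

puts middle, replaced, stripped # prints bbb three times
```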
I have a specific situation. I am trying to replace some words in a string. I have two example strings:
string1 = "aaabbb aaa bbb"
string2 = "a. bbb"
In string1 I want to replace the whole word "aaa" with "ccc", so I do it like this:
translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") => "aaabbb ccc bbb"
So it works and I am happy, but when I try to replace "a." with "aaa" it does not work and returns string2 unchanged.
I tried also this:
translation = "a."
string2.gsub(translation, "aaa") => "aaa bbb"
But when I use the above gsub on string1 I get "cccbbb ccc bbb". Sorry for my English; I hope I explained it understandably. Thanks for all answers.
Try
string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")
In a regex, '.' means "any character". By calling escape you turn 'a.' into 'a\.', which means "a and then the period character".
Update
As @Daniel has noted in the comments, word boundaries have some subtleties. For the above to work with "a." you need to replace each \b with a negative lookbehind and a negative lookahead:
string2.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# => "ccc bbb"
Since \w excludes dots, which I guess the OP wants to allow inside tokens, I propose a whitelist approach with lookarounds:
string = "a. b.a. a. bbb"
translation = "a."
# Using !\w b.a. is not considered as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"
# Using \s b.a. is considered as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a.
#=> "ccc b.a. ccc bbb"
Anyway, whether my reasoning is right depends on what the OP needs ;-)
The . (dot) has a special meaning in regexes: it means match any character.
You should escape it with \.
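Putting the answers together: \b fails after "a." because there is no word boundary between the dot and the following space (both are non-word characters), so the token needs both escaping and a different edge test. The sketch below is one way to package that. The method name replace_token is hypothetical, and it writes the whitespace-or-edge test as (?<!\S) and (?!\S), which is my own compact spelling rather than the pattern used in the answers.

```ruby
# Hypothetical helper: replace `token` only when it stands alone between whitespace
# (or at the start/end of the string).
def replace_token(text, token, replacement)
  pattern = /(?<!\S)#{Regexp.escape(token)}(?!\S)/
  text.gsub(pattern, replacement)
end

puts Regexp.escape("a.")                           # a\.
puts replace_token("a. bbb", "a.", "aaa")          # aaa bbb
puts replace_token("aaabbb aaa bbb", "aaa", "ccc") # aaabbb ccc bbb
puts replace_token("a. b.a. a. bbb", "a.", "ccc")  # ccc b.a. ccc bbb
```

With this edge test, b.a. counts as a single token and is left alone; whether that is the desired behaviour depends on the data.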