BASH - Shuffle characters in strings from several rows - ruby

I have a file (filename.txt) with the following structure:
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
I would like to shuffle the characters in the strings that do not start with >. The output would (for example) look like the following:
>line1
DGC
FEI
HBA
>line2
JRP
OKN
QML
>line3
SZV
YXT
UW
This is what I tried to shuffle the characters for each >line[number]: ruby -lpe '$_ = $_.chars.shuffle * "" if !/^>/' filename.txt. The command works (see my post BASH - Shuffle characters in strings from file), but it shuffles line by line. I was wondering how I could modify the command to shuffle characters across all the strings under each >line[number]. Using ruby is not a requirement.

First, we need to solve a simpler problem: how to shuffle all the characters across multiple lines:
echo -e 'ABC\nDEF\nGHI' |grep -o . |shuf |tr -d '\n'
GDAFHEIBC
We also need an array that records the length of each line in the original strings.
s=GDAFHEIBC
lens=(3 3 3)
start=0
for len in "${lens[@]}"; do
echo ${s:${start}:${len}}
((start+=len))
done
GDA
FHE
IBC
So, the original lines:
ABC
DEF
GHI
have been shuffled to:
GDA
FHE
IBC
Now we can put the pieces together:
lens=()
string=""
function shuffle_lines {
local start=0
local shuffled_string=$(grep -o . <<< "${string}" |shuf |tr -d '\n')
for len in "${lens[@]}"; do
echo "${shuffled_string:${start}:${len}}"
((start+=len))
done
lens=()
string=""
}
while read -r line; do
if [[ "${line}" =~ ^\> ]]; then
shuffle_lines
echo "${line}"
else
string+="${line}"
lens+=(${#line})
fi
done <filename.txt
shuffle_lines
Examples:
$ cat filename.txt
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
>line4
0123
456
78
9
$ ./solution.sh
>line1
HFG
BED
AIC
>line2
JOP
KMQ
RLN
>line3
UVW
TYZ
XS
>line4
1963
245
08
7
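For comparison, the same bookkeeping (pool the characters of a record, shuffle the pool, then cut it back into the original line lengths) can be sketched in Ruby, since the question says any tool is acceptable. This is a minimal sketch and not part of the answer above; `shuffle_records` and its `rng:` parameter are hypothetical names introduced here.

```ruby
# Shuffle the characters of each record while keeping every line's length.
# `shuffle_records` is a hypothetical helper, not taken from the answer above.
def shuffle_records(text, rng: Random.new)
  out = []
  body = []                        # data lines of the current record

  flush = lambda do
    pool = body.join.chars.shuffle(random: rng)  # all characters of the record
    body.each { |line| out << pool.shift(line.length).join }
    body = []
  end

  text.each_line(chomp: true) do |line|
    if line.start_with?(">")
      flush.call                   # finish the previous record first
      out << line
    else
      body << line
    end
  end
  flush.call                       # the last record has no following header
  out.join("\n")
end

input = ">line1\nABC\nDEF\nGHI\n>line2\nJKL\nMNO\nPQR\n>line3\nSTU\nVWX\nYZ\n"
puts shuffle_records(input)
```

Passing a seeded generator, for example `rng: Random.new(42)`, makes a run repeatable, which helps when checking the result by hand.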

#!/bin/bash
# echo > output.txt # uncomment to write in a file output.txt
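mix(){
# nothing to print before the first header has been read
[[ -n $title ]] || { unset line; return; }
{
echo "$title"
line="$( fold -w1 <<< "$line" | shuf )"
# fold -w3 assumes every data line is 3 characters wide
echo "${line//$'\n'}" | fold -w3
} # >> output.txt # uncomment to write in a file output.txt
unset line
}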
while read -r; do
if [[ $REPLY =~ ^\> ]]; then
mix
title="$REPLY"
else
line+="$REPLY"
fi
done < filename.txt
mix # final mix after the loop exits, otherwise line3 will not be mixed
exit
edited with comment of gniourf-gniourf

First create a test file.
str =<<FINI
>line1
ABC
DEF
GHI
>line2
JKL
MNO
PQR
>line3
STU
VWX
YZ
FINI
File.write('test', str)
#=> 56
Now read the file and perform the desired operations.
result = File.read('test').split(/(>line\d+)/).map do |s|
if s.match?(/\A(?:|>line\d+)\z/)
s
else
a = s.scan(/\p{Lu}/).shuffle
s.gsub(/\p{Lu}/) { a.shift }
end
end.join
# ">line1\nECF\nHIA\nGBD\n>line2\nJNP\nKLR\nOQM\n>line3\nTXY\nUZV\nSW\n"
puts result
>line1
ECF
HIA
GBD
>line2
JNP
KLR
OQM
>line3
TXY
UZV
SW
To do this from the command line, convert the code to a string with statements separated by semicolons.
ruby -e "puts (File.read('test').split(/(>line\d+)/).map do |s|; if s.match?(/\A(?:|>line\d+)\z/); s; else; a = s.scan(/\p{Lu}/).shuffle; s.gsub(/\p{Lu}/) { a.shift }; end; end).join"
The steps are as follows.
a = File.read('test')
#=> ">line1\nABC\nDEF\nGHI\n>line2\nJKL\nMNO\nPQR\n>line3\nSTU\nVWX\nYZ\n"
b = a.split(/(>line\d+)/)
#=> ["", ">line1", "\nABC\nDEF\nGHI\n", ">line2", "\nJKL\nMNO\nPQR\n",
# ">line3", "\nSTU\nVWX\nYZ\n"]
Notice that the regular expression that is split's argument places >line\d+ within a capture group. Without that, ">line1", ">line2" and ">line3" would not be included in b.
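The effect of the capture group can be seen on a smaller input. This is a short illustration with made-up data; the variable names are not from the answer.

```ruby
text = ">line1\nABC\n>line2\nDEF\n"

without_group = text.split(/>line\d+/)    # delimiters are discarded
with_group    = text.split(/(>line\d+)/)  # captured delimiters are kept

p without_group  #=> ["", "\nABC\n", "\nDEF\n"]
p with_group     #=> ["", ">line1", "\nABC\n", ">line2", "\nDEF\n"]
```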
c = b.map do |s|
if s.match?(/\A(?:|>line\d+)\z/)
s
else
a = s.scan(/\p{Lu}/).shuffle
s.gsub(/\p{Lu}/) { a.shift }
end
end
#=> ["", ">line1", "\nEAC\nIHB\nDGF\n", ">line2", "\nKQJ\nROL\nMPN\n",
# ">line3", "\nSUY\nXTV\nZW\n"]
c.join
#=> ">line1\nEAC\nIHB\nDGF\n>line2\nKQJ\nROL\nMPN\n>line3\nSUY\nXTV\nZW\n"
Now consider more closely the calculation of c. The first element of b is passed to the block and the block variable s is set to its value:
s = ""
We then compute
s.match?(/\A(?:|>line\d+)\z/)
#=> true
so "" is returned from the block. The regular expression can be expressed as follows.
/
\A # match the beginning of the string
(?: # begin a non-capture group
# match an empty string
| # or
>line\d+ # match '>line' followed by one or more digits
) # end non-capture group
\z # match the end of the string
/x # free-spacing regex definition mode.
Within the non-capture group the empty string was matched.
The next element of b is then passed to the block.
s = ">line1"
Again
s.match?(/\A(?:|>line\d+)\z/)
#=> true
so s is returned from the block.
Now the third element of b is passed to the block. (Finally, something interesting.)
s = "\nABC\nDEF\nGHI\n"
d = s.scan(/\p{Lu}/)
#=> ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
a = d.shuffle
#=> ["D", "C", "G", "H", "B", "F", "I", "E", "A"]
s.gsub(/\p{Lu}/) { a.shift }
#=> "\nDCG\nHBF\nIEA\n"
The remaining calculations are similar.

Related

Evaluate a string with indexed array as values

I would like to take a string that contains positional argument markers (not named), supply it with an array (not hash) of values, and have it evaluated.
The use case as an example would be somewhat like ARGV.
For example,
# given:
string = "echo $1 ; echo $@"
values = ["hello", "world"]
# expected result:
"echo hello ; echo hello world"
The below function is the best I could come up with:
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
# Create a variable that can replace $@ with all arguments, and quote
# arguments that had "more than one word" originally
all_arguments = arguments.map{|a| a =~ /\s/ ? "\"#{a}\"" : a}.join ' '
# Replace all $1 - $9 with their respective argument ($1 ==> arguments[0])
string.gsub!(/\$(\d)/) { arguments[$1.to_i - 1] }
# Replace $@ or $* with all arguments
string.gsub!(/\$[*|@]/, all_arguments)
return string
end
And it seems to me like it can and should be simpler.
I was hoping to find something that is closer to the Kernel.sprintf method of doing things - like "string with %{marker}" % {marker: 'value'}
So, although this issue is almost solved for me (I think), I would love to know if there is something I missed that can make it more elegant.
It seems like you're trying to reproduce Bash-style variable expansion, which is an extremely complex problem. At the very least, though, you can simplify your code in two ways:
Use Kernel.sprintf's built in positional argument feature. The below code does this by substituting e.g. $1 with the sprintf equivalent %1$s.
Use Shellwords from the standard library to escape arguments with spaces etc.
require 'shellwords'
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
tmpl = string.gsub(/\$(\d+)/, '%\1$s')
(tmpl % arguments).gsub(/\$[*@]/, arguments.shelljoin)
end
string = "echo $1 ; echo $@"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
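The two building blocks used here can be tried in isolation. The snippet below is only an illustration with made-up values: `%N$s` in a format string selects the N-th argument, and `shelljoin` escapes each element for the shell before joining with spaces.

```ruby
require 'shellwords'

# Positional references: %1$s is the first argument, %2$s the second.
swapped = format('%2$s, %1$s', 'hello', 'world')
puts swapped   # world, hello

# shelljoin backslash-escapes characters such as spaces.
joined = ['hello', 'World War II'].shelljoin
puts joined    # hello World\ War\ II
```

One caveat of routing the template through `%`: any literal percent sign in the input string has to be written as `%%`.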
If you didn't have the $* requirement I'd suggest just dropping the Bash-like format and just using sprintf, since it covers everything else you mentioned. Even so, you could further simplify things by using sprintf formatting for everything else:
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
string.gsub('%#', arguments.shelljoin) % arguments
end
string = "echo %1$s ; echo %#"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
Edit
If you want to use %{1} with sprintf you could turn the input array into a hash where the integer indexes are turned into symbol keys, e.g. ["hello", "world"] becomes { :"1" => "hello", :"2" => "world" }:
require "shellwords"
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
string % {
:* => arguments.shelljoin,
**arguments.map.with_index {|val,idx| [ :"#{idx + 1}", val ] }.to_h
}
end
string = "echo %{1} ; echo %{*}"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
string = "echo $1 ; echo $@ ; echo $2 ; echo $cat"
values = ["hello", "World War II"]
vals = values.map { |s| s.include?(' ') ? "\"#{s}\"" : s }
#=> ["hello", "\"World War II\""]
all_vals = vals.join(' ')
#=> "hello \"World War II\""
string.gsub(/\$\d+|\$[@*]/) { |s| s[/\$\d/] ? vals[s[1..-1].to_i-1] : all_vals }
#=> "echo hello ; echo hello \"World War II\" ; echo \"World War II\" ; echo $cat"

Check if string1 is before string2 on the same line

I am trying to match comment lines in C#/SQL code. CREATE may come before or after /*. They can be on the same line.
line6 = " CREATE /* this is ACTIVE line 6"
line5 = " charlie /* CREATE inside this is comment 5"
In the first case, it will be an active line; in the second, it will be a comment. I probably can do some kind of charindex, but maybe there is a simpler way
regex1 = /\/\*||\-\-/
if (line1 =~ regex1) then puts "Match comment___" + line6 else puts '____' end
if (line1 =~ regex1) then puts "Match comment___" + line5 else puts '____' end
With the regex
r = /
\/ # match forward slash
\* # match asterisk
\s+ # match one or more whitespace chars
CREATE # match the literal text CREATE
\b # match word boundary (to avoid matching CREATED)
/x # free-spacing regex definition mode
you can return an array of the comment lines thus:
[line6, line5].select { |l| l =~ r }
#=> [" charlie /* CREATE inside this is comment 5"]
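The "charindex" idea from the question can also be written directly by comparing positions with String#index. This is a minimal sketch under the assumption that only the first `/*` on the line matters; `active?` is a hypothetical helper name, and `--` comments are not handled.

```ruby
# Hypothetical helper: true when `keyword` appears before the comment opener.
def active?(line, keyword = "CREATE", opener = "/*")
  k = line.index(keyword)
  return false if k.nil?   # keyword not present at all
  c = line.index(opener)
  c.nil? || k < c          # no comment on the line, or keyword precedes it
end

line6 = " CREATE /* this is ACTIVE line 6"
line5 = " charlie /* CREATE inside this is comment 5"
p active?(line6)  #=> true
p active?(line5)  #=> false
```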

How to read the contents from a file in an array in Ruby

I want to read a file of name-value pairs on a remote server. As per the requirement, I need to shell into the remote server, read the file, and then grep for the values. Example:
/domain/srvr/primary = ABC
/host/DEF/second = DEF
/host/XYZ/second = XYZ
/host/GHI/second = GHI
:
:
:
Now I want to read this file and make an array of all secondary servers (e.g. DEF, XYZ, GHI), but I am getting a nil value.
primary = @ssh.exec!("cd /home/dir; grep 'srvr/primary' #{filename} | awk '{print $3}'")
secondary = @ssh.exec!("cd /home/dir; grep '\<host.*second\>' #{filename} | awk '{print $3}'")
It prints the primary server name properly but returns nil for the secondary servers array. I tried to use split("\n"), but it errors out with undefined method 'split' for nil:NilClass.
Need help in getting all the secondary servers in an array.
You can use something like this
file_contents.
split("\n").
map {|line| line.split(" = ") }.
find_all {|pair| pair[0] =~ /second$/ }.
map(&:last)
You can get the file contents with cat or by downloading the file through ssh. If the file is on the same server, just use File.read.
If you can only use bash or just prefer to, you can use
grep -P "^\/host.*second" path/to/file | cut -d" " -f 3
The -P option (GNU grep) enables Perl-compatible regular expression syntax, which gives you all the capabilities you need to search through the file. cut then splits each line on the delimiter given by -d and returns the field at the 1-based index given by -f. Here each line has the form "name = server", so the server is the third field.
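One possible contributor to the nil in the question, offered as an assumption rather than a confirmed diagnosis: inside a Ruby double-quoted string, `\<` and `\>` are not recognized escapes, so the backslashes are dropped before the command ever reaches the remote shell. The snippet below only builds the command strings locally; `servers.txt` is a hypothetical file name.

```ruby
filename = "servers.txt"  # hypothetical file name

# As written in the question: Ruby drops the backslashes.
double  = "grep '\<host.*second\>' #{filename}"
# Doubling the backslashes keeps them in the command.
escaped = "grep '\\<host.*second\\>' #{filename}"

puts double   # grep '<host.*second>' servers.txt
puts escaped  # grep '\<host.*second\>' servers.txt
```

With the first form, grep looks for literal `<` and `>` characters, which the sample file does not contain.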
Instead of trying to do the work on the remote host, you can simplify the task and only grab the data and process it locally. You can use Net::SCP (docs), Net::FTP or Net::SFTP to easily retrieve the data.
I'd use something like this to grab the desired data once the text has been received:
data = <<EOT
/domain/srvr/primary = ABC
/host/DEF/second = DEF
/host/XYZ/second = XYZ
/host/GHI/second = GHI
EOT
data.split("\n").grep(/\bsecond\b/).map{ |l| l.split.last }
# => ["DEF", "XYZ", "GHI"]
or:
data.split("\n").grep(/\bsecond\b/).map{ |l| l[/= (\S+)/, 1] }
# => ["DEF", "XYZ", "GHI"]
or:
data.split("\n").grep(/\bsecond\b/).map{ |l| l.rstrip[/\S+$/] }
# => ["DEF", "XYZ", "GHI"]
Just to make it more interesting:
require 'fruity'
data = <<EOT
/domain/srvr/primary = ABC
/host/DEF/second = DEF
/host/XYZ/second = XYZ
/host/GHI/second = GHI
EOT
compare do
p1 { data.split("\n").grep(/\bsecond\b/).map{ |l| l[/= (\S+)/, 1] } }
p2 { data.split("\n").grep(/\bsecond\b/).map{ |l| l.rstrip[/\S+$/] } }
p3 { data.split("\n").grep(/\bsecond\b/).map{ |l| l.rstrip[/\S+$/] } }
end
# >> Running each test 256 times. Test will take about 1 second.
# >> p1 is faster than p2 by 3.9x ± 0.01
# >> p2 is similar to p3

Is it possible in Ruby to print a part of a regex (group) and instead of the whole matched substring?

Is it possible in sed, or maybe even in Ruby, to memorize the matched part of a pattern and print it instead of the full string that was matched?
"aaaaaa bbb ccc".strip.gsub(/([a-z])+/, \1) # \1 as a part of the regex which I would like to remember and print then instead of the matched string.
# => "a b c"
I think in sed it should be possible with its h (hold space) command or similar, but what about Ruby?
Not sure what you mean. Here are a few examples:
print "aaaaaa bbb ccc".strip.gsub(/([a-z])+/, '\1')
# => "a b c"
And,
print "aaaaaa bbb ccc".strip.scan(/([a-z])+/).flatten
# => ["a", "b", "c"]
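Both examples work on this input because every word repeats a single letter. A quantified capture group keeps only its last repetition, which becomes visible once the letters in a word differ. The strings below are made up for illustration.

```ruby
# ([a-z])+ overwrites the capture on every repetition, so \1 is the last letter.
last_letter  = "abc xyz".gsub(/([a-z])+/, '\1')
# Capturing once and matching the rest outside the group keeps the first letter.
first_letter = "abc xyz".gsub(/([a-z])[a-z]*/, '\1')

p last_letter   #=> "c z"
p first_letter  #=> "a x"
```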
The shortest answer is grep:
echo "aaaaaa bbb ccc" | grep -o '\<.'
You can do:
a = "aaaaaa bbb ccc".split
and then join that array back together with the first character of each element
[a[0][0,1], a[1][0,1], a[2][0,1]].join(" ")
@glennjackman's suggestion: ruby -ne 'puts $_.split.map {|w| w[0]}.join(" ")'

Parsing delimited text with escape characters

I'm trying to parse (in Ruby) what's effectively the UNIX passwd file-format: comma delimiters, with an escape character \ such that anything escaped should be considered literally. I'm trying to use a regular expression for this, but I'm coming up short — even when using Oniguruma for lookahead/lookbehind assertions.
Essentially, all of the following should work:
a,b,c # => ["a", "b", "c"]
\a,b\,c # => ["a", "b,c"]
a,b,c\
d # => ["a", "b", "c\nd"]
a,b\\\,c # => ["a", "b\,c"]
Any ideas?
The first response looks pretty good. With a file containing
\a,,b\\\,c\,d,e\\f,\\,\
g
it gives:
[["\\a,"], [","], ["b\\\\\\,c\\,d,"], ["e\\\\f,"], ["\\\\,"], ["\\\ng\n"], [""]]
which is pretty close. I don't need the unescaping done on this first pass, as long as everything splits correctly on the commas. I tried Oniguruma and ended up with (the much longer):
Oniguruma::ORegexp.new(%{
(?: # - begins with (but doesn't capture)
(?<=\A) # - start of line
| # - (or)
(?<=,) # - a comma
)
(?: # - contains (but doesn't capture)
.*? # - any set of characters
[^\\\\]? # - not ending in a slash
(\\\\\\\\)* # - followed by an even number of slashes
)*?
(?: # - ends with (but doesn't capture)
(?=\Z) # - end of line
| # - (or)
(?=,)) # - a comma
},
'mx'
).scan(s)
Try this:
s.scan(/((?:\\.|[^,])*,?)/m)
It doesn't translate the characters following a \, but that can be done afterwards as a separate step.
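The separate unescaping step could look roughly like the sketch below. It is a variation on the scan above rather than the answer's exact code: the delimiter is moved into its own capture group so that an escaped comma at the end of a field is not confused with a separator. `split_escaped` is a hypothetical helper name.

```ruby
# Hypothetical helper: split on unescaped commas, then drop the escape character.
def split_escaped(s)
  fields = []
  s.scan(/((?:\\.|[^,])*)(,?)/m) do |raw, delimiter|
    fields << raw.gsub(/\\(.)/m, '\1')  # keep the escaped character, drop the backslash
    break if delimiter.empty?           # no comma followed, so this was the last field
  end
  fields
end

p split_escaped('a,b,c')         #=> ["a", "b", "c"]
p split_escaped('\a,b\,c')       #=> ["a", "b,c"]
p split_escaped("a,b,c\\\nd")    #=> ["a", "b", "c\nd"]
p split_escaped('a,b\\\\\\,c')   #=> ["a", "b\\,c"]
```

The last input is three backslashes followed by a comma, written with Ruby's single-quote escaping; the result field is b, one backslash, a comma, and c.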
I'd give the CSV class a try.
And a regex solution (hack?) might look like this:
#!/usr/bin/ruby -w
# contents of test.csv:
# a,b,c
# \a,b\,c
# a,b,c\
# d
# a,b\\\,c
file = File.new("test.csv", "r")
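# Match line breaks first and require non-empty fields: a pattern that can match
# the empty string makes scan emit "" tokens and step over the newlines.
# (Empty fields are skipped by this version.)
tokens = file.read.scan(/\r?\n|(?:\\.|[^,\r\n])+/m)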
puts "-----------"
tokens.length.times do |i|
if tokens[i] == "\n" or tokens[i] == "\r\n"
puts "-----------"
else
puts ">" + tokens[i] + "<"
end
end
file.close
which will produce the output:
-----------
>a<
>b<
>c<
-----------
>\a<
>b\,c<
-----------
>a<
>b<
>c\
d<
-----------
>a<
>b\\\,c<
-----------

Resources