Printing string fields in Ruby

I have this string in the variable var:
cheese dogs cats alligators
I know I could get the second field in this string, "dogs", using awk if I were on a Linux command line:
> echo "$var" | awk '{print $2}'
But how would I do this in Ruby?

Ruby has a String#split method that splits on whitespace by default, returning an array whose second element can then be accessed:
irb(main):001:0> 'cheese dogs cats alligators'.split
=> ["cheese", "dogs", "cats", "alligators"]
irb(main):002:0> 'cheese dogs cats alligators'.split[1]
=> "dogs"

From the command line, the same split works as a one-liner:
echo cheese dogs cats alligators | ruby -ne 'puts $_.split[1]'
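For reuse inside a script, the same idea could be wrapped in a small helper. This is only a minimal sketch: the method name `field` and its 1-based indexing (chosen to mirror awk's $2) are assumptions for illustration, not something from the answers above.

```ruby
# Hypothetical helper: return the n-th whitespace-separated field
# (1-based, like awk's $n), or nil when the line has fewer than n fields.
def field(line, n)
  return nil if n < 1   # keep zero/negative n from wrapping around the array
  line.split[n - 1]
end

var = 'cheese dogs cats alligators'
puts field(var, 2)   # prints "dogs"
p field(var, 9)      # prints nil
```

Returning nil for a missing field loosely mirrors awk, which prints an empty string in that case.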


How do I keep the split token in the second part of what was split in Ruby?

In Ruby, how do you split a string and keep the token you are splitting on in the second part of the result? I have the code below, but the token is getting merged into the first part of the split and I want it in the second part:
2.4.0 :004 > split_token = "aaa"
=> "aaa"
2.4.0 :005 > line = "bbb aaa ccc"
=> "bbb aaa ccc"
2.4.0 :006 > line.split(/(?<=#{Regexp.escape(split_token)})/)
=> ["bbb aaa", " ccc"]
Changing the lookbehind ((?<=) to a lookahead ((?=) seems to do the trick:
split_token = "aaa"
line = "bbb aaa ccc"
line.split(/(?=#{Regexp.escape(split_token)})/)
# => ["bbb ", "aaa ccc"]
This just changes the split point to before the token rather than after it.
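Another possibility is to use slice_before (note that this splits on whitespace first, so the original spacing is not preserved):
line.split.slice_before('aaa').map{|s| s.join(' ')}
# => ["bbb", "aaa ccc"]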
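The lookahead split can be sketched as a small reusable method. The name `split_before` is made up for this example. Because a lookahead is zero-width, nothing is consumed, so joining the pieces gives back the original string.

```ruby
# Hypothetical helper: split `line` so that every piece after the first
# starts with `token`. The lookahead matches a position, not characters.
def split_before(line, token)
  line.split(/(?=#{Regexp.escape(token)})/)
end

p split_before('bbb aaa ccc', 'aaa')   # => ["bbb ", "aaa ccc"]
p split_before('a.b.c', '.')           # => ["a", ".b", ".c"]
```

Regexp.escape matters for tokens such as "." that would otherwise be treated as regex metacharacters.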

Ruby one liner to replace only lines that match, discard others

Looking for the Ruby one-liner equivalent of this Perl command, which prints the substituted line only if the line matches the regular expression:
echo -e "Line 1\nLine 2\nLine 3" | perl -ne "print if s/Line 2/Line 2 replaced, others discarded/g"
Line 1
Line 2
Line 3
Line 2 replaced, others discarded
Ruby does accept the -n and -e switches (and -p), but the version below spells out each step, so it is a little longer:
echo -e "Line 1\nLine 2\nLine 3" | ruby -e 'puts $<.read.lines.map {|l| l =~ /Line 2/ ? l.gsub(/Line 2/, "Line 2 replaced, others discarded") : nil }.compact'
$< is an alias for ARGF, the stream over the file arguments or standard input
$<.read reads all of it into one string
$<.read.lines splits that string at newline characters and returns an array
map {|l| ... } collects the result of the block for each element into a new array
l =~ /Line 2/ checks whether the string matches the regex
l.gsub(/Line 2/, "Line 2 replaced") replaces every "Line 2" with "Line 2 replaced"
.compact removes nil values from the array (returns a new array without nils)
puts [] prints each element of the array on its own line
Ruby is probably not the best choice for this task; I would use sed or do it in a text editor. Most text editors can find and replace by regex nowadays.
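Inside a script, the select-then-substitute steps listed above could also be written with Enumerable#grep, which keeps only the matching elements and, when given a block, maps them in the same pass. This is a minimal sketch; the method name `replace_matching` and the sample input are made up for illustration.

```ruby
# Hypothetical helper: keep only the lines matching `pattern` and
# apply the substitution to each line that is kept.
def replace_matching(lines, pattern, replacement)
  lines.grep(pattern) { |line| line.gsub(pattern, replacement) }
end

input = "Line 1\nLine 2\nLine 3\n"
puts replace_matching(input.lines, /Line 2/, 'Line 2 replaced, others discarded')
# prints: Line 2 replaced, others discarded
```

This avoids the intermediate nil values, so no compact call is needed.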

Remove duplicate substrings from a string

str = "hi ram hi shyam hi jhon"
I want something like:
"ram hi shyam hi jhon"
"ram shyam hi jhon"
I assume you want to remove duplicate occurrences of all words, not just "hi". Here are two ways of doing that.
1 Use String#split, Array#reverse, Array#uniq and Array#join
str = "hi shyam ram hi shyam hi jhon"
str.split.reverse.uniq.reverse.join(' ')
#=> "ram shyam hi jhon"
The doc for uniq states: "self is traversed in order, and the first occurrence is kept."
2 Use a regular expression
r = /
    \b      # match a word break
    (\w+)   # match a word in capture group 1
    \s      # match a trailing space
    (?=     # begin a positive lookahead
      .*    # match any number of characters
      \s    # match a space
      \1    # match the contents of capture group 1
      \b    # match a word break
    )       # end the positive lookahead
    /x      # free-spacing regex definition mode
str.gsub(r, '')
#=> "ram shyam hi jhon"
If words may be separated by more than one space, change \s to \s+ in the "match a trailing space" line of the regex definition to remove the extra spaces as well.
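The reverse/uniq/reverse approach above can be sketched as a pair of helpers to show why the reversal is there: uniq keeps the first occurrence it sees, so reversing first makes it keep the last one. Both method names are hypothetical.

```ruby
# Hypothetical helpers for whitespace-separated words.

# Keep the LAST occurrence of each word (reverse, uniq, reverse back).
def dedupe_keep_last(str)
  str.split.reverse.uniq.reverse.join(' ')
end

# Keep the FIRST occurrence of each word instead.
def dedupe_keep_first(str)
  str.split.uniq.join(' ')
end

str = 'hi shyam ram hi shyam hi jhon'
puts dedupe_keep_last(str)    # prints "ram shyam hi jhon"
puts dedupe_keep_first(str)   # prints "hi shyam ram jhon"
```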
str = "hi ram hi shyam hi jhon"
To remove one occurrence:
str.sub('hi', '').strip.squeeze(' ')
#⇒ "ram hi shyam hi jhon"
To remove n occurrences:
n.times { str.replace(str.sub('hi', '').strip.squeeze(' ')) }
You are looking for sub!:
str = "hi ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram hi shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam hi jhon"
str.sub!("hi ", "")
#=> "ram shyam jhon"
In case you do not want to modify your original string, which is not what the example suggests, you might want to use sub instead and an extra variable.
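A non-mutating variant might look like the sketch below. The method name `remove_first` is an assumption for illustration, and the \b word boundaries assume the token is made of word characters.

```ruby
# Hypothetical helper: return a copy of `str` with the first `n`
# whole-word occurrences of `word` removed; `str` itself is untouched.
def remove_first(str, word, n = 1)
  result = str.dup
  pattern = /\b#{Regexp.escape(word)}\b\s*/
  n.times { result.sub!(pattern, '') }   # sub! returns nil when nothing is left to remove
  result.strip
end

str = 'hi ram hi shyam hi jhon'
puts remove_first(str, 'hi')      # prints "ram hi shyam hi jhon"
puts remove_first(str, 'hi', 2)   # prints "ram shyam hi jhon"
puts str                          # prints "hi ram hi shyam hi jhon"
```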

Is it possible in Ruby to print a part of a regex (group) instead of the whole matched substring?

Is it possible in sed, or maybe even in Ruby, to memorize the matched part of a pattern and print it instead of the full string that was matched:
"aaaaaa bbb ccc".strip.gsub(/([a-z])+/, \1) # \1 as a part of the regex which I would like to remember and print then instead of the matched string.
# => "a b c"
I think in sed it should be possible with its h (hold space) command or similar, but what about Ruby?
Not sure what you mean. Here are a few examples:
print "aaaaaa bbb ccc".strip.gsub(/([a-z])+/, '\1')
# => "a b c"
print "aaaaaa bbb ccc".strip.scan(/([a-z])+/).flatten
# => ["a", "b", "c"]
The shortest answer is grep:
echo "aaaaaa bbb ccc" | grep -o '\<.'
You can do:
a = "aaaaaa bbb ccc".split
and then join that array back together with the first character of each element:
[a[0][0,1], a[1][0,1], a[2][0,1], a[3][0,1], ... ].join(" ")
# glennjackman's suggestion: ruby -ne 'puts $_.split.map {|w| w[0]}.join(" ")'
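The capture-group ideas in these answers can be put side by side in a short sketch. The method names are hypothetical, and the pattern /(\w)\w*/ (first word character captured, rest of the word matched but not captured) is one possible choice rather than the pattern from the question.

```ruby
# Hypothetical helpers: work with capture group 1 instead of the whole match.

# Replace every word with just its captured first character.
def first_letters(str)
  str.gsub(/(\w)\w*/, '\1')
end

# Collect the captured part of every match as a flat array.
def captured_letters(str)
  str.scan(/(\w)\w*/).flatten
end

str = 'aaaaaa bbb ccc'
puts first_letters(str)   # prints "a b c"
p captured_letters(str)   # prints ["a", "b", "c"]
p str[/(\w)\w*/, 1]       # String#[] with a group index: prints "a"
```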

Ruby scan Regular Expression

I'm trying to split the string:
"[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
into the following array:
["foo","bar bar bar"]
["test","abc","123","456 789"]
I tried the following, but it isn't quite right:
"[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
# =>
# [
# ["test", "blah"]
# ["foo", "bar bar bar"]
# ["test", "abc |123 | 456 789"]
# ]
I need to split at every pipe instead of the first pipe. What would be the correct regular expression to achieve this?
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
arr = s.scan(/\[(.*?)\]/).map {|m| m[0].split(/ *\| */)}
Two alternatives:
s = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
s.split(/\s*\n\s*/).map{ |p| p.scan(/[^|\[\]]+/).map(&:strip) }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
s.split(/\s*\n\s*/).map do |line|
  line.sub(/\A\s*\[\s*/, '').sub(/\s*\]\s*\z/, '').split(/\s*\|\s*/)
end
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
Both of them start by splitting on newlines (throwing away surrounding whitespace).
The first one then splits each chunk by looking for anything that is not a [, |, or ] and then throws away extra whitespace (calling strip on each).
The second one then throws away leading [ and trailing ] (with whitespace) and then splits on | (with whitespace).
You cannot get the final result you want with a single scan. About the closest you can get is this:
s.scan /\[(?:([^|\]]+)\|)*([^|\]]+)\]/
#=> [["test", " blah"], ["foo ", "bar bar bar"], ["123 ", " 456 789"]]
…which drops information, or this:
s.scan /\[((?:[^|\]]+\|)*[^|\]]+)\]/
#=> [["test| blah"], ["foo |bar bar bar"], ["test| abc |123 | 456 789"]]
…which captures the contents of each "array" as a single capture, or this:
s.scan /\[(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?(?:([^|\]]+)\|)?([^|\]]+)\]/
#=> [["test", nil, nil, " blah"], ["foo ", nil, nil, "bar bar bar"], ["test", " abc ", "123 ", " 456 789"]]
…which is hardcoded to a maximum of four items, and inserts nil entries that you would need to .compact away.
There is no way to use Ruby's scan to take a regex like /(?:(aaa)b)+/ and get multiple captures for each time the repetition is matched.
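A short sketch of that limitation, using a made-up sample string: a capture group inside a repetition keeps only its last iteration, so the usual workaround is one scan for the bracketed chunks followed by a split on each chunk. The method name `bracket_items` is hypothetical.

```ruby
# Hypothetical helper: one scan to find each bracketed chunk,
# then a split per chunk to recover every item.
def bracket_items(str)
  str.scan(/\[(.*?)\]/).flatten.map { |inner| inner.split('|').map(&:strip) }
end

sample = '[a|b|c]'
p sample.scan(/\[(?:(\w)\|)*(\w)\]/)   # prints [["b", "c"]] ("a" was overwritten by "b")
p bracket_items(sample)                # prints [["a", "b", "c"]]
```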
Why the hard path (single regex)? Why not a simple combo of splits? Here are the steps, to visualize the process.
str = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
arr = str.split("\n").map(&:strip) # => ["[test| blah]", "[foo |bar bar bar]", "[test| abc |123 | 456 789]"]
arr = arr.map {|s| s[1..-2] } # => ["test| blah", "foo |bar bar bar", "test| abc |123 | 456 789"]
arr = arr.map {|s| s.split('|').map(&:strip)} # => [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
This is likely far less efficient than scan, but at least it's simple :)
A "Scan, Split, Strip, and Delete" Train-Wreck
The whole premise seems flawed, since it assumes that you will always find alternation in your sub-arrays and that expressions won't contain character classes. Still, if that's the problem you really want to solve for, then this should do it.
First, str.scan( /\[.*?\]/ ) will net you three array elements, each containing pseudo-arrays. Then you map the sub-arrays, splitting on the alternation character. Each element of the sub-array is then stripped of whitespace, and the square brackets deleted. For example:
str = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
str.scan( /\[.*?\]/ ).map { |arr| arr.split('|').map { |m| m.strip.delete '[]' }}
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
Verbosely, Step-by-Step
Mapping nested arrays is not always intuitive, so I've unwound the train-wreck above into more procedural code for comparison. The results are identical, but the following may be easier to reason about.
string = "[test| blah] \n [foo |bar bar bar]\n[test| abc |123 | 456 789]"
array_of_strings = string.scan( /\[.*?\]/ )
#=> ["[test| blah]", "[foo |bar bar bar]", "[test| abc |123 | 456 789]"]
sub_arrays = array_of_strings.map { |sub_array| sub_array.split('|') }
#=> [["[test", " blah]"],
# ["[foo ", "bar bar bar]"],
# ["[test", " abc ", "123 ", " 456 789]"]]
stripped_sub_arrays = sub_arrays.map { |sub_array| sub_array.map(&:strip) }
#=> [["[test", "blah]"],
# ["[foo", "bar bar bar]"],
# ["[test", "abc", "123", "456 789]"]]
sub_arrays_without_brackets = stripped_sub_arrays.map { |sub_array| sub_array.map {|elem| elem.delete '[]'} }
#=> [["test", "blah"], ["foo", "bar bar bar"], ["test", "abc", "123", "456 789"]]
