Replacing a word in text using Ruby - ruby

I know how to do it by using this code
poem['sleepy'] = 'busy'
What the above object type called? Is it an Array?
How can I assign the poem to the first line of the text in order to replace the word "sleepy" that is sitting on that specific line?

It's not an array, it's String's []= method. If you want to only replace on a certain line, you can split the poem string and get an array by using the split method:
lines = poem.split("\n")
You could then make the replacement on the line you wish:
lines[3]["sleepy"] = "busy"
And then you can use Array's join method to join the Array into a String again:
poem = lines.join("\n")

Related

string capture between duplicates in ruby

string = 'xabcdexfghijk'
In the example above, 'x' appears twice. I want to capture everything between the first 'x' and the next 'x'. Thus, the desired result is a new string that equals 'xabcdex'. Any ideas?
You could use a simple regular expression: /x.*?x/. This basically means "match any characters in between two x characters, as few times as possible (non-greedy)".
The matched text can be extracted with String#[regexp]
string = 'xabcdexfghijk'
string[/x.*?x/] # => "xabcdex"

Performing operations on each line of a string

I have a string named "string" that contains six lines.
I want to remove an "Z" from the end of each line (which each has) and capitalize the first character in each line (ignoring numbers and white space; e.g., "1. apple" -> "1. Apple").
I have some idea of how to do it, but have no idea how to do it in Ruby. How do I accomplish this? A loop? What would the syntax be?
Using regular expression (See String#gsub):
s = <<EOS
1. applez
2. bananaz
3. catz
4. dogz
5. elephantz
6. fruitz
EOS
puts s.gsub(/z$/i, '').gsub(/^([^a-z]*)([a-z])/i) { $1 + $2.upcase }
# /z$/i - to match a trailing `z` at the end of lines.
# /^([^a-z]*)([a-z])/i - to match leading non-alphabets and alphabet.
# capture them as group 1 ($1), group 2 ($2)
output:
1. Apple
2. Banana
3. Cat
4. Dog
5. Elephant
6. Fruit
I would approach this by breaking your problem into smaller steps. After we've solved each of the smaller problems, you can put it all back together for a more elegant solution.
Given the initial string put forth by falsetru:
s = <<EOS
1. applez
2. bananaz
3. catz
4. dogz
5. elephantz
6. fruitz
EOS
1. Break your string into an array of substrings, separated by the newline.
substrings = s.split(/\n/)
This uses the String class' split method and a regular expression. It searches for all occurrences of newline (backslash-n) and treats this as a delimiter, splitting the string into substrings based on this delimiter. Then it throws all of these substrings into an array, which we've named substrings.
2. Iterate through your array of substrings to do some stuff (details on what stuff later)
substrings.each do |substring|
.
# Do stuff to each substring
.
end
This is one form for how you iterate across an array in Ruby. You call the Array's each method, and you give it a block of code which it will run on each element in the array. In our example, we'll use the variable name substring within our block of code so that we can do stuff to each substring.
3. Remove the z character at the end of each substring
substrings.each do |substring|
substring.gsub!(/z$/, '')
end
Now, as we iterate through the array, the first thing we want to do is remove the z character at the end of each string. You do this with the gsub! method of String, which is a search-and-replace method. The first argument for this method is the regular expression of what you're looking for. In this case, we are looking for a z followed by the end-of-string (denoted by the dollar sign). The second argument is an empty string, because we want to replace what's been found with nothing (another way of saying - we just want to remove what's been found).
4. Find the index of the first letter in each substring
substrings.each do |substring|
substring.gsub!(/z$/, '')
index = substring.index(/[a-zA-Z]/)
end
The String class also has a method called index which will return the index of the first occurrence of a string that matches the regular expression your provide. In our case, since we want to ignore numbers and symbols and spaces, we are really just looking for the first occurrence of the very first letter in your substring. To do this, we use the regular expression /[a-zA-Z]/ - this basically says, "Find me anything in the range of small A to small Z or in big A to big Z." Now, we have an index (using our example strings, the index is 3).
5. Capitalize the letter at the index we have found
substrings.each do |substring|
substring.gsub!(/z$/, '')
index = substring.index(/[a-zA-Z]/)
substring[index] = substring[index].capitalize
end
Based on the index value that we found, we want to replace the letter at that index with that same letter, but capitalized.
6. Put our substrings array back together as a single-string separated by newlines.
Now that we've done everything we need to do to each substring, our each iterator block ends, and we have what we need in the substrings array. To put the array back together as a single string, we use the join method of Array class.
result = substrings.join("\n")
With that, we now have a String called result, which should be what you're looking for.
Putting It All Together
Here is what the entire solution looks like, once we put together all of the steps:
substrings = s.split(/\n/)
substrings.each do |substring|
substring.gsub!(/z$/, '')
index = substring.index(/[a-zA-Z]/)
substring[index] = substring[index].capitalize
end
result = substrings.join("\n")

sorting text lines Google Apps Script

Sorry for this extreme beginner question. I have a string variable originaltext containing some multiline text. I can convert it into an array of lines like so:
lines = originaltext.split("\n");
But I need help sorting this array. This DOES NOT work:
lines.sort;
The array remains unsorted.
And an associated question. Assuming I can sort my array somehow, how do I then convert it back to a single variable with no separators?
Your only issue is a small one - sort is actually a method, so you need to call lines.sort(). In order to join the elements together, you can use the join() method:
var originaltext = "This\n\is\na\nline";
lines = originaltext.split("\n");
lines.sort();
joined = lines.join("");

Read from a file into an array and stop if a ":" is found in ruby

How can I in Ruby read a string from a file into an array and only read and save in the array until I get a certain marker such as ":" and stop reading?
Any help would be much appreciated =)
For example:
10.199.198.10:111 test/testing/testing (EST-08532522)
10.199.198.12:111 test/testing/testing (EST-08532522)
10.199.198.13:111 test/testing/testing (EST-08532522)
Should only read the following and be contained in the array:
10.199.198.10
10.199.198.12
10.199.198.13
This is a rather trivial problem, using String#split:
results = open('a.txt').map { |line| line.split(':')[0] }
p results
Output:
["10.199.198.10", "10.199.198.12", "10.199.198.13"]
String#split breaks a string at the specified delimiter and returns an array; so line.split(':')[0] takes the first element of that generated array.
In the event that there is a line without a : in it, String#split will return an array with a single element that is the whole line. So if you need to do a little more error checking, you could write something like this:
results = []
open('a.txt').each do |line|
results << line.split(':')[0] if line.include? ':'
end
p results
which will only add split lines to the results array if the line has a : character in it.

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses through of bunch of java based html template files to replace a period (.) with an underscore if it's contained within a specific tag. I use ruby all the time for these types of utility apps, and thought it would be no problem to whip up something using ruby's regex support. So, I create a Regexp.new... object, open a file, read it in line by line, then match each line against the pattern, if I get a match, I create a new string using replaceString = currentMatch.gsub(/./, '_'), then create another replacement as whole string by newReplaceRegex = Regexp.escape(currentMatch) and finally replace back into the current line with line.gsub(newReplaceRegex, replaceString) Code below, of course, but first...
The problem I'm having is that when accessing the indexes within the returned MatchData object, I'm getting the first result twice, and it's missing the second sub string it should otherwise be finding. More strange, is that when testing this same pattern and same test text using rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+.)+(?:[a-zA-Z0-9]+)(?:>))
Text text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
tagMatch = tagRegex.match(htmlLine)
if(tagMatch)
matchesArray = tagMatch.to_a
firstMatch = matchesArray[0]
secondMatch = matchesArray[1]
puts "First match: #{firstMatch} and second match #{secondMatch}"
tagMatch.captures.each {|lineMatchCapture|
puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
#create a new regex using the match results; make sure to use auto escape method
originalPatternString = Regexp.escape(lineMatchCapture)
replacementRegex = Regexp.new(originalPatternString)
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
#replace original match with underscore replaced copy within line
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
}
end
}
I would think that I should get the first tag in matchData[0] then the second tag in matchData1, or, what I'm really doing because I don't know how many matches I'll get within any given line is matchData.to_a.each. And in this case, matchData has two captures, but they're both the first tag match
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
You want to use the on String#scan instead of the Regexp#match:
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for next ruby questions:
newlines and spaces are your friends, you don't loose points for using more lines on your code ;-)
use do-end on blocks instead of {}, improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
I ended up using the String.scan approach, the only tricky point there was figuring out that this returns an array of arrays, not a MatchData object, so there was some initial confusion on my part, mostly due to my ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
lineCount += 1
puts ("Current line: #{htmlLine} at line num: #{lineCount}")
oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
if(oldMatches.size > 0)
oldMatches.each_index do |index|
arrayMatch = oldMatches[index]
aMatch = arrayMatch[0]
#create a new regex using the match results; make sure to use auto escape method
replacementRegex = Regexp.new(Regexp.escape(aMatch))
#replace any periods with underscores in a copy of lineMatchCapture
periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
#replace original match with underscore replaced copy within line, matching against the new escaped literal regex
htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
puts "The modified htmlLine is now: #{htmlLine}"
end # I kind of still prefer the brackets...;-)
end
end
Now, why does MatchData work the way it does? It seems like it's behavior is a bug really, and certainly not very useful in general if you can't get it provide a simple means of accessing all the matches. Just my $.02
Small bits:
This regexp helps you get "normalMode" .. But not "secondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp helps you get "secondLineMatch" ... But not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexpt but I'll keep toiling at it.. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').

Resources