Splitting a string into multiple strings after a specified character in Ruby - ruby

I am trying to create a script that takes in a string with an unknown number of parts that are separated by ';'.
"blah blah one; blah blah two"
I tried to use the (?: re) example from the Ruby docs but had no luck with it. Can someone show me the best way to do this?

You can just use the split method:
a = 'blah blah blah;blah foo blah'.split(";")
The parts will be stored at indexes 0 and 1 (in the above example).
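For the string in the question, a plain split(";") keeps the space that follows each ';'. A minimal sketch, assuming you want that surrounding whitespace dropped, splits on a regex instead (the variable names are illustrative):

```ruby
input = "blah blah one; blah blah two"

# Split on ';' together with any whitespace around it, so the parts come out trimmed
parts = input.split(/\s*;\s*/)
p parts # prints ["blah blah one", "blah blah two"]

# Equivalent alternative: split on the plain character, then strip each part
trimmed = input.split(";").map(&:strip)
p trimmed

# The number of parts does not need to be known in advance
p "a; b ;c;d".split(/\s*;\s*/).length # prints 4
```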

Related

How to select words that are made up of the same letter using regex?

I have a dictionary text file that contains some words that I don't want.
Example:
aa
aaa
aaaa
bb
b
bbb
etc
I want to use a regular expression to select these words and remove them. However, what I have seems to be getting too long, and there must be a more efficient approach.
Here is my code so far:
/^a{1,6}$|^b{1,6}$|^c{1,6}$|^d{1,6}$|^e{1,6}$|^f{1,6}$|^g{1,6}$|^[i]{2,3}$/
It seems that I have to do this for every letter. How could I do this more succinctly?
It's a lot easier to collapse the word down to unique letters and remove all of those with just one letter in them:
words = "aa aaa aaaa bb b bbb etc aab abcabc"
words.split(/\s+/).select do |word|
word.chars.uniq.length > 1
end
# => ["etc", "aab", "abcabc"]
This splits your string into words, then selects only those words that contain more than one distinct character (.chars.uniq).
^([a-z])\1?\1?\1?\1?\1?$
Match any single letter, followed by 5 optional backreferences to the initial letter.
This might work too:
^([a-z])\1{,5}$
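Applied to a word list, the backreference idea can be anchored with \A and \z so that each whole word is tested. A sketch, assuming the dictionary has already been read into an array of lowercase words; it uses {0,5}, which Ruby treats the same as {,5}:

```ruby
words = %w[aa aaa aaaa bb b bbb etc aab abcabc]

# One letter followed by up to five more copies of that same letter,
# anchored to the whole string with \A and \z
same_letter = /\A([a-z])\1{0,5}\z/

# grep_v keeps only the elements that do NOT match the pattern
kept = words.grep_v(same_letter)
p kept # prints ["etc", "aab", "abcabc"]
```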
Try this
\b([a-zA-Z])\1*\b
If you want to match repeated digits or underscores in addition to letters, use this:
\b([\w])\1*\b
Update:
To exclude I from being removed:
(?i)ii+|\b((?i)[a-hj-z])\1*\b
(?i) is added above to make the match case-insensitive.
Demo:
https://regex101.com/r/gFUWE8/7
You can try this regex:
\b([a-z])\1{0,}\b
and replace the matches with an empty string.
Ruby code sample:
re = /\b([a-z])\1{0,}\b/m
str = 'aa aaa aaaa bb b bbb abc aa a pqaaa '
result = str.gsub(re,'')
puts result
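The gsub above leaves blank gaps where the removed words used to be. If the dictionary has one word per line, a sketch like the following (assuming lowercase words and Unix line endings) removes each matching line together with its newline, relying on ^ and $ being line anchors in Ruby:

```ruby
dictionary = "aa\naaa\napple\nbb\nb\nbbb\netc\n"
# In practice the text could come from File.read("words.txt") (hypothetical file name)

# ^ and $ match at line boundaries; \n? also consumes the line break
cleaned = dictionary.gsub(/^([a-z])\1*$\n?/, "")
puts cleaned
# apple
# etc
```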

Return specific segment from Ruby regex

I have a big chunk of text I am scanning through and I am searching with a regex that is prefixed by some text.
var1 = textchunk.match(/thedata=(\d{6})/)
My result from var1 would return something like:
thedata=123456
How do I return only the number part of the match (just 123456 in the example above) without taking var1 and then stripping thedata= off in a separate line?
If you expect just one match in the string, you can keep your own code and take the first element of captures (the data you need is captured by the first set of unescaped parentheses, which form a capturing group):
textchunk.match(/thedata=(\d{6})/).captures.first
If you have multiple matches, just use scan:
textchunk.scan(/thedata=(\d{6})/)
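Because the pattern contains a capturing group, scan returns one array per match holding just the captured text. A short sketch (the sample textchunk is made up) that flattens those into a plain list of numbers:

```ruby
textchunk = "garbage=42 thedata=123456 more thedata=654321"

# Each match yields an array of its captures: [["123456"], ["654321"]]
matches = textchunk.scan(/thedata=(\d{6})/)

# flatten turns that into a simple array of strings
numbers = matches.flatten
p numbers # prints ["123456", "654321"]
```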
NOTE: to match thedata= followed by exactly 6 digits (and no more), add a word boundary after the group:
/thedata=(\d{6})\b/
or a negative lookahead (if the 6 digits may be followed by word characters other than digits):
/thedata=(\d{6})(?!\d)/
▶ textchunk = 'garbage=42 thedata=123456'
#⇒ "garbage=42 thedata=123456"
▶ textchunk[/thedata=(\d{6})/, 1]
#⇒ "123456"
▶ textchunk[/(?<=thedata=)\d{6}/]
#⇒ "123456"
The latter uses positive lookbehind.

Display characters around regex match

Is it possible to display the characters around a regex match? I have the string below, and I want to substitute every occurrence of "change" while displaying the 3-5 characters before the match.
string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
What I have so far:
while line.match(/change/)
printf "\n\n Substitute the FIRST change below:\n"
printf "#{line}\n"
printf "\n\tSubstitute => "
substitution = gets.chomp
line = line.sub(/change/, "#{substitution}")
end
If you want to get down and dirty, Perl style:
before_chars = $`[-3, 3]
This gives the last three characters just before your pattern match (or nil if fewer than three characters precede it).
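The same prematch text is also available without the global variable, through MatchData#pre_match. A minimal sketch using the question's string, taking up to three characters before the first "change" (the endless range [-3..] needs Ruby 2.6 or later):

```ruby
line = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"

if (m = line.match(/change/))
  before = m.pre_match
  # [-3..] is nil when fewer than three characters precede the match,
  # so fall back to the whole prematch in that case
  context = before[-3..] || before
  puts context # prints al=
end
```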
You would likely use gsub! with block given in the following manner:
line = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
# line.gsub!(/(?<where>.{0,3})change/) {
line.gsub!(/(?<where>\S+)change/) {
printf "\n\n Substitute the change around #{Regexp.last_match[:where]} => \n"
substitution = gets.chomp
"#{Regexp.last_match[:where]}#{substitution}"
}
puts line
Yielding:
Substitute the change around val= =>
111
Substitute the change around anotherval= =>
222
Substitute the change around stringhere: =>
333
val=111 anotherval=222 stringhere:333: foo=bar foofoo=barbar
gsub! substitutes the matches in place, and the pattern \S+ (instead of the commented-out .{0,3}) captures the whole run of non-space characters before each match, which gives a more human-readable hint.
Alternative: Use $1 Match Variable
tadman's answer uses the special prematch variable ($`). Ruby will also store a capture group in a numbered variable, which is probably just as magical but possibly more intuitive. For example:
string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
string.sub(/(.{3})?change/, "\\1#{substitution}")
$1
# => "al="
No matter what method you use, though, make sure you explicitly check your match variables for nils in the event that your last attempted match was unsuccessful.
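Putting the $1 idea and the nil check together, here is a sketch that uses the block form of sub, so $1 is read right after the match. The method name replace_first is hypothetical, and substitution is a hard-coded stand-in for the value the question reads with gets:

```ruby
# Replace the first "change" and report up to three characters before it
def replace_first(text, substitution)
  text.sub(/(.{3})?change/) do
    before = $1 # nil when the optional group did not participate
    if before.nil?
      puts "no context before the match"
    else
      puts "context: #{before}"
    end
    "#{before}#{substitution}"
  end
end

substitution = "111" # stand-in for gets.chomp

string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
result = replace_first(string, substitution)
puts result # prints val=111 anotherval=change stringhere:change: foo=bar foofoo=barbar

edge = replace_first("change here", substitution)
puts edge # prints 111 here
```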

How to get multiple occurrences of a pattern between two definite endpoints

I have a string like:
CREATE TABLE foobar (
bar foo,
foo bar
) DISTRIBUTED BY
I would like to get all column definitions from this string. I tried:
my_string.scan /CREATE TABLE .*\n([^\n]*?)\n.*DISTRIBUTED BY/
But it does not return the desired values (["bar foo,", "foo bar"]). Any ideas?
The key point about the scan method is that each new match begins where the last one ended:
a = "cruel world"
a.scan(/.../) #=> ["cru", "el ", "wor"]
So you would need to define your pattern so that it matches both at the beginning and in the middle of the string. Needless to say, such a look-behind expression won't be easy to build.
But I wonder whether this will be enough for your specific goals:
s = <<HR
CREATE TABLE foobar (
  bar foo,
  foo bar
) DISTRIBUTED BY
HR
ax = s.scan(/\s+(.+?)(?:,\n|\n\))/)
#=> [["bar foo"], ["foo bar"]]
As you see, I didn't try to match CREATE TABLE here, assuming the string has the query ready.
I think this is what you were trying for:
/CREATE TABLE .*\n((?:.*\n)+).*DISTRIBUTED BY/
(?:.*\n) matches an individual line, so ((?:.*\n)+) captures one or more lines in group #1. The linefeed at the end of the last line (foo bar) is included, but you can delete that at the same time you clean up the commas (e.g. from bar foo,).
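A sketch of that cleanup step, assuming the statement text is in my_string with each column definition on its own line (the indentation in the sample is made up):

```ruby
my_string = "CREATE TABLE foobar (\n  bar foo,\n  foo bar\n) DISTRIBUTED BY"

# Group 1 holds every line between the CREATE TABLE line and the DISTRIBUTED BY line
block = my_string[/CREATE TABLE .*\n((?:.*\n)+).*DISTRIBUTED BY/, 1]

# Split into lines, trim whitespace (including the newline), drop a trailing comma
columns = block.lines.map { |l| l.strip.chomp(",") }
p columns # prints ["bar foo", "foo bar"]
```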
If you're thinking about doing anything more complicated, consider using an actual parser; regexes do not play well with SQL.
Probably this is the way to go, assuming each column definition sits on its own line:
my_string.split("\n")[1..-2].map(&:strip)

Parsing String in Ruby then Storing to Database

I'd like to parse a string for two different tags, then store each in the database. Let's call these tag1 and tag2. I have a delimiter of sorts, "?#", that separates tag1 from tag2.
Suppose
t = "random text blah firsttag?#secondtag more blah"
Goal: tag1 should be "firsttag" and tag2 should be "secondtag" without the preceding or trailing random text. Each should get stored as objects in the database.
I tried something like:
t.split("?#")
but it returns
["random text blah firsttag", "secondtag more blah"]
and includes the random text. How can I get the split to stop when it reaches the first space in either direction?
I'd like this to also work if there are multiple tag pairs in the string, for example, if:
m = "random firsttag?#secondtag blah blah 1sttag?#2ndtag blah blah blah"
I'm pretty new to both ruby and rails, so I really appreciate your help on this one!
You can use a regular expression combined with split:
tags = t.match(/\S+\?#\S+/)[0].split('?#')
Explanation
First, let's capture the interesting part of the text, which is tag1?#tag2. We'll use a regex for that:
/\S+\?#\S+/
Explanation:
\S+ matches the non-whitespace characters preceding the delimiter (tag1)
\?# matches the delimiter token
\S+ matches all trailing non-whitespace characters (tag2)
The match of that regex (which we access with [0]) is:
firsttag?#secondtag
Then from that we just split using the delimiter and we get the array with the tags.
Cheers!
PS: I've retagged the post since it has nothing to do with ruby-on-rails. It's just plain Ruby.
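The question also asks about several tag pairs in one string, while the answer above picks only the first one. A sketch for the multi-pair case uses scan with two capture groups; it assumes the delimiter is ?# throughout, and the hash keys are illustrative:

```ruby
m = "random firsttag?#secondtag blah blah 1sttag?#2ndtag blah blah blah"

# Each match yields [tag1, tag2]; \S+ stops at whitespace on both sides
pairs = m.scan(/(\S+)\?#(\S+)/)
p pairs # prints [["firsttag", "secondtag"], ["1sttag", "2ndtag"]]

# Turn each pair into a hash, e.g. to use as attributes when saving a record
records = pairs.map { |tag1, tag2| { tag1: tag1, tag2: tag2 } }
p records.first
```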
