Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I've got a string that has variable length sections. The length of the section precedes the content of that section. So for example, in the string:
13JOHNSON,STEVE
The first 2 characters define the content length (13), followed by the actual content. I'd like to be able to parse this using named capture groups with a backreference, but I'm not sure it is possible. I was hoping this would work:
(?<length>\d{2})(?<name>.{\k<length>})
But it doesn't. Seems like the backreference isn't interpreted as a number. This works fine though:
(?<length>\d{2})(?<name>.{13})
No, that will not work of course. You need to recompile your regular expression after extracting the first number.
I would recommend using two different expressions:
the first extracts the number, and the second extracts the text based on the number extracted by the first.
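A minimal sketch of that two-step approach, assuming the length prefix is always exactly two digits at the start of the string. The first regex reads the number; the second regex is then built with that number interpolated into the quantifier, which is the "recompile" step the backreference cannot do:

```ruby
s = '13JOHNSON,STEVE'

# Step 1: read the two-digit length prefix
length = s[/\A\d{2}/].to_i

# Step 2: build a second regex with that length interpolated into the
# quantifier, and take capture group 1 (the name)
name = s[/\A\d{2}(.{#{length}})/, 1]

puts name
```

Note that `.` does not match newlines by default, so add the `m` flag if the content can contain them.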
You can't do that.
>> s = '13JOHNSON,STEVE'
=> "13JOHNSON,STEVE"
>> length = s[/^\d{2}/].to_i # s[0,2].to_i
=> 13
>> s[2,length]
=> "JOHNSON,STEVE"
This really seems like you're going after this the hard way. I suspect the sample string is not as simple as you said, based on:
I've got a string that has variable length sections. The length of the section precedes the content of that section.
Instead I'd use something like:
str = "13JOHNSON,STEVE 08Blow,Joe 10Smith,John"
str.scan(/\d{2}(\S+)/).flatten # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John"]
If the string can be split accurately, then there's this:
str.split.map{ |s| s[2..-1] } # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John"]
If you only have length bytes followed by strings, with nothing between them, something like this works:
offset = 0
str.delete!(' ') # => "13JOHNSON,STEVE08Blow,Joe10Smith,John"
str.scan(/\d+/).map{ |l| s = str[offset + 2, l.to_i]; offset += 2 + l.to_i ; s }
# => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John"]
won't work if the names have digits in them – tihom
str = "13JOHNSON,STEVE 08Blow,Joe 10Smith,John 1012345,7890"
str.scan(/\d{2}(\S+)/).flatten # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John", "12345,7890"]
str.split.map{ |s| s[2..-1] } # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John", "12345,7890"]
With a minor change and a minor addition, it'll continue to work correctly with strings that contain no delimiters:
str.delete!(' ') # => "13JOHNSON,STEVE08Blow,Joe10Smith,John1012345,7890"
offset = 0
str.scan(/\d{2}/).map{ |l| s = str[offset + 2, l.to_i]; offset += 2 + l.to_i ; s }.compact
# => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John", "12345,7890"]
\d{2} grabs the digits in groups of two. For names whose leading length value is two digits, as in the OP's sample, the correct thing happens. For an all-numeric "name", several false positives are returned, which produce nil values; compact cleans those out. Note that the lengths are taken from the scan results in order, so this only holds when digits appear in the last name: a digit inside an earlier name can be picked up as (part of) the next length.
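A minimal alternative sketch that reads each two-digit prefix at the current offset instead of collecting digit pairs with scan up front, so digits inside any name are left alone. It assumes the input consists only of back-to-back records with two-digit lengths; `parse_records` is a hypothetical helper name:

```ruby
# Parse back-to-back "NNcontent" records, where NN is a two-digit length.
# The length is always read at the current offset, so digits inside a
# name are never mistaken for a length prefix.
def parse_records(str)
  records = []
  offset = 0
  while offset < str.length
    prefix = str[offset, 2]
    unless prefix =~ /\A\d{2}\z/
      raise ArgumentError, "expected a two-digit length at #{offset}"
    end
    len = prefix.to_i
    record = str[offset + 2, len]
    # Fewer than len characters left means the input is truncated
    raise ArgumentError, "record at #{offset} is truncated" if record.nil? || record.length < len
    records << record
    offset += 2 + len
  end
  records
end

p parse_records("13JOHNSON,STEVE08Blow,Joe")
```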
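What about this? (Note that it ignores the length prefix and relies on the comma instead.)
a = '13JOHNSON,STEVE'
m = a.match(/(?<length>\d{2})(?<name>.*,.*)/)
puts m[:name] # prints JOHNSON,STEVE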
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I want to split a query string like:
"(first_name:zach AND last_name:woods) OR (first_name:thomas AND last_name:middleditch) OR (first_name:martin AND last_name:starr) OR "...
into substrings, each not greater than 5000 characters, and I want to split on the pattern " OR ".
Help would be appreciated.
If your query is just like the example, you can split it on OR, then loop through the pieces, joining them together until a chunk would exceed 5000 characters.
original_query = "(first_name:zach AND last_name:woods) OR ..."
split_arr = original_query.split(/(?<= OR )/) # Split after each " OR ", keeping the delimiter
result = []
pattern = ""
split_arr.each do |query|
  if !pattern.empty? && (pattern.length + query.length) > 5000 # Adding this piece would exceed the limit
    result.push(pattern) # Store the current chunk
    pattern = query # Start a new chunk
  else
    pattern += query # Append the piece to the current chunk
  end
end
result.push(pattern) unless pattern.empty? # Store the final chunk
puts result
You will then get the array result with substrings that stay within the 5000-character limit (unless a single condition is itself longer), each ending with the kept OR delimiter except the last. However, given that your string is (maybe) an SQL query, whether the substrings are syntactically correct depends on your original query.
It is better to enforce these size constraints while building the query itself.
If you still want to work with this approach, one way would be to scan the conditions and concatenate them in batches of your preferred size.
# Scan all matching conditions (str holds the original query)
conditions = str.scan(/first_name:[a-z]+ AND last_name:[a-z]+/)
# Final queries array
result = []
# Iterate over the conditions in batches and build one query per batch.
# Assuming each condition averages ~35 characters, plus 6 for the parentheses
# and " OR ", batches of 120 stay around 5000 characters (an estimate, not a guarantee)
conditions.each_slice(120) { |group| result << group.map { |c| "(#{c})" }.join(' OR ') }
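The result array then holds the query split into batches. Because the batches are built by count rather than by length, the 5000-character limit is only respected approximately.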
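If the limit must hold exactly, one option is to pack the scanned conditions by their actual length rather than by count. A minimal sketch, assuming the conditions come from a scan like the one above; `batch_conditions` is a hypothetical helper name, and each condition is wrapped in parentheses so the AND/OR grouping survives:

```ruby
# Greedily pack conditions into OR-joined chunks of at most `limit` characters.
# A single condition longer than the limit still ends up alone in its own chunk.
def batch_conditions(conditions, limit = 5000)
  batches = []
  current = ''
  conditions.each do |cond|
    piece = "(#{cond})"
    candidate = current.empty? ? piece : "#{current} OR #{piece}"
    if candidate.length > limit && !current.empty?
      batches << current # this chunk is full; start a new one
      current = piece
    else
      current = candidate
    end
  end
  batches << current unless current.empty?
  batches
end
```

Usage would then be `batch_conditions(str.scan(/first_name:[a-z]+ AND last_name:[a-z]+/))`.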
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I need to split a string describing a food product, such as "Chocolate Biscuits 200g".
I need to extract the "200g" from the String and then split this by number and then by the measurement/weight.
So I need the "200" and "g" separately.
I have written a Ruby regex to find the "200g" in the String (sometimes there may be space between the number and measurement so I have included an optional whitespace between them):
([0-9]*[?:\s ]?[a-zA-Z]+)
And I think it works. But now that I have the result ("200g") that it matched from the entire String, I need to split this by number and measurement.
I wrote two regexes to split these:
([0-9]+)
to split by number and
([a-zA-Z]+)
to split by letters.
But the .split method is not working with these.
I get the following error:
undefined method 'split' for #<MatchData "200">
Of course I will need to convert the 200 to a number instead of a String.
Any help is greatly appreciated,
Thank you!
UPDATE:
I have tested the 3 regexes on http://www.rubular.com/.
My issue seems to be around splitting up the result from the first regex into number and measurement.
One way among many is to use String#scan with a regex. As the last sentence of the String#scan documentation explains, when the pattern contains capture groups, each result is an array containing one entry per group.
str = "Chocolate Biscuits 200g"
r = /
(\d+)          # match one or more digits in capture group 1
\s*            # allow optional whitespace between the number and the unit
([[:alpha:]]+) # match one or more alphabetic characters in capture group 2
/x             # free-spacing regex definition mode
number, weight = str.scan(r).flatten
#=> ["200", "g"]
number = number.to_i
#=> 200
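A variant using String#match with named captures, which also accepts a decimal quantity. This is a sketch assuming the quantity is the first number in the description and is immediately followed (after optional whitespace) by the unit; `parse_quantity` is a hypothetical helper name:

```ruby
# Return [quantity, unit] for the first "<number><optional space><letters>" found,
# or nil when there is no such quantity.
def parse_quantity(description)
  m = description.match(/(?<qty>\d+(?:\.\d+)?)\s*(?<unit>[[:alpha:]]+)/)
  return nil unless m
  # Keep whole numbers as Integer, decimals as Float
  qty = m[:qty].include?('.') ? m[:qty].to_f : m[:qty].to_i
  [qty, m[:unit]]
end

parse_quantity("Chocolate Biscuits 200g")  # => [200, "g"]
parse_quantity("Chocolate Biscuits 200 g") # => [200, "g"]
parse_quantity("Orange Juice 1.5l")        # => [1.5, "l"]
```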
I'm not an expert in Ruby, but I think the following code does the job:
my_string = "Chocolate Biscuits 200g"
weight = 0
unit = ""
# split keeps the captured letter and digit runs in the resulting array
parts = my_string.split(/(?:([a-zA-Z]+)|([0-9]+))/)
parts.each do |val|
  if val =~ /\A[0-9]+\z/
    weight = val.to_i # the numeric part
  elsif weight > 0 && !val.empty?
    unit = val # the first non-empty piece after the number
  end
end
p weight # prints 200
p unit # prints "g"
I'm trying to write a Ruby script that replaces all rem values in a CSS file with their px equivalents. This would be an example CSS file:
body{font-size:1.6rem;margin:4rem 7rem;}
The MatchData I'd like to get would be:
# Match 1          Match 2
# 1. font-size     1. margin
# 2. 1.6           2. 4
#                  3. 7
However I'm entirely clueless as to how to get multiple and different MatchData results. The RegEx that got me closest is this (you can also take a look at it at Rubular):
/([^}{;]+):\s*([0-9.]+?)rem(?=\s*;|\s*})/i
This will match single instances of value declarations (so it will properly return the desired Match 1 result), but entirely disregards multiples.
I also tried something along the lines of ([0-9.]+?rem\s*)+, but that didn't return the desired result either, and doesn't feel like I'm on the right track, as it won't return multiple result data sets.
EDIT: After the suggestions in the answers, I ended up solving the problem like this:
# search for any declarations that contain rem unit values and modify them blockwise
# (output holds the CSS text)
output.gsub!(/([^ }{;]+):\s*([^}{;]*[0-9.]rem[^;]*)(?=\s*;|\s*})/i) do |match|
  # search for any single rem value
  string = match.gsub(/([0-9.]+)rem/i) do
    # convert the rem value to px by multiplying by 10 (this is not universal!)
    sprintf('%g', Regexp.last_match[1].to_f * 10) + 'px'
  end
  # append the original match result to the replacement
  string + ';' + match
end
You can't capture a dynamic number of match groups (at least not in Ruby).
Instead you could do either one of the following:
Capture the whole value and split it on spaces.
Use multilevel matching: first capture the whole key/value pair, then match within the value. In Ruby you can pass a block to the match method.
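A sketch of the multilevel approach on the sample CSS from the question: scan captures each whole declaration whose value contains rem, and a second scan pulls the numbers out of each captured value. The declaration regex is an assumption tuned to this simple input, not a general CSS parser:

```ruby
css = 'body{font-size:1.6rem;margin:4rem 7rem;}'

# Level 1: capture each whole "property: value" declaration whose value uses rem
declarations = css.scan(/([^{};:\s]+)\s*:\s*([^{};]*rem[^{};]*)/)

# Level 2: extract every rem number from the captured value
# (value.split would also work if the values are only space-separated)
rem_values = declarations.map do |prop, value|
  [prop, value.scan(/([0-9.]+)rem/).flatten]
end

p rem_values # prints [["font-size", ["1.6"]], ["margin", ["4", "7"]]]
```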
This regex will do the job for your example:
([^}{;]+):(?:([0-9\.]+?)rem\s?)?(?:([0-9\.]+?)rem\s?)
But with this you can't match something like margin:4rem 7rem 9rem.
This is what I've been able to do:
Regex: (?<={|;)([^:}]+)(?::)([^A-Za-z]+)
And this is what my result looks like:
# Match 1          Match 2
# 1. font-size     1. margin
# 2. 1.6           2. 4
As @koffeinfrei says, dynamic capture isn't possible in Ruby. It would be smarter to capture the whole value and split it on spaces.
str = 'body{font-size:1.6rem;margin:4rem 7rem;}'
str.scan(/(?<=[{; ]).+?(?=[;}])/)
   .map { |e| e.match(/(?<prop>.+):(?<value>.+)/) }
#⇒ [
# [0] #<MatchData "font-size:1.6rem" prop:"font-size" value:"1.6rem">,
# [1] #<MatchData "margin:4rem 7rem" prop:"margin" value:"4rem 7rem">
# ]
The second match can easily be adapted to return whatever you want: value.split(/\s+/) returns all the individual values, \d+ instead of .+ matches digits only, and so on.
This question already has answers here:
Match a string against multiple patterns
(2 answers)
Closed 8 years ago.
I'm new to Ruby and I'm trying to solve a problem.
I'm parsing several text fields from which I want to remove a header that can take different values. It works fine when the header is always the same:
variable = variable.gsub(/(^Header_1:$)/, '')
But when I put in several arguments it doesn't work:
variable = variable.gsub(/(^Header_1$)/ || /(^Header_2$)/ || /(^Header_3$)/ || /(^Header_4$)/ || /^:$/, '')
You can use Regexp.union:
regex = Regexp.union(
/^Header_1/,
/^Header_2/,
/^Header_3/,
/^Header_4/,
/^:$/
)
variable.gsub(regex, '')
Please note that ^something$ only matches a line consisting of exactly something, so it won't match lines that contain anything more :)
That's because in Ruby ^ matches at the beginning of a line and $ at the end of a line (use \A and \z to anchor to the whole string).
So I intentionally removed the $.
Also, you don't need parentheses (capture groups) when you only want to remove the matched string.
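A small illustration of how these anchors behave on multi-line input, using a made-up sample text:

```ruby
text = "Header_1\nbody line\nHeader_2 extra\nHeader_3\nmore text"

# ^ and $ match at line boundaries, so whole header lines are removed,
# but "Header_2 extra" survives because more text follows on its line
only_exact = text.gsub(/^Header_\d+$/, '')

# Dropping $ removes the header prefix from every line that starts with one
prefixes = text.gsub(/^Header_\d+/, '')

# \A anchors to the start of the whole string, so only the first header goes
first_only = text.gsub(/\AHeader_\d+/, '')
```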
You can also use it like this:
headers = %w[Header_1 Header_2 Header_3]
regex = Regexp.union(*headers.map { |s| /^#{Regexp.escape(s)}/ }, /^:$/) # add further patterns here as needed
variable.gsub(regex, '')
And of course you can remove headers without explicitly defining them.
Is there likely some whitespace after the headers?
If so, you can do it as simply as:
variable = "Header_1 something else"
puts variable.gsub(/^Header\S*\s*/, '')
#=> something else
variable = "Header_BLAH something else"
puts variable.gsub(/^Header\S*\s*/, '')
#=> something else
Just use a proper regexp:
variable.gsub(/^(Header_1|Header_2|Header_3|Header_4|:)$/, '')
If the header always has the format Header_n, where n is an integer, then you can simplify your regex greatly:
/Header_\d+/
will find every one of these:
%w[Header_1 Header_2 Header_3].grep(/Header_\d+/)
[
[0] "Header_1",
[1] "Header_2",
[2] "Header_3"
]
Tweaking it to handle finding words, not substrings:
/^Header_\d+$/
or:
/\bHeader_\d+\b/
As mentioned, using Regexp.union is a good start, but used blindly it can produce slow or inefficient patterns, so think ahead and help the engine by giving it useful sub-patterns to work with:
values = %w[foo bar]
/Header_(?:\d+|#{ values.join('|') })/
=> /Header_(?:\d+|foo|bar)/
Unfortunately, Ruby doesn't have an equivalent of Perl's Regexp::Assemble module, which can build highly optimized patterns from big lists of words. Search here on Stack Overflow for examples of what it can do. For instance:
use Regexp::Assemble;
my @values = ('Header_1', 'Header_2', 'foo', 'bar', 'Header_3');
my $ra = Regexp::Assemble->new;
foreach (@values) {
$ra->add($_);
}
print $ra->re, "\n";
=> (?-xism:(?:Header_[123]|bar|foo))
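The core idea (factoring shared prefixes out of a word list) can be sketched in Ruby with a small trie. This is a simplified illustration, not a port of Regexp::Assemble: for example, it does not collapse single characters into classes like [123]. `build_trie`, `trie_source`, and `assemble` are hypothetical helper names:

```ruby
# Build a nested Hash trie; the key :end marks that a word ends at this node.
def build_trie(words)
  root = {}
  words.each do |word|
    node = root
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  root
end

# Emit regex source for a trie node, factoring out shared prefixes.
def trie_source(node)
  branches = node.reject { |key, _| key == :end }
                 .map { |ch, child| Regexp.escape(ch) + trie_source(child) }
  return '' if branches.empty?
  # If a word also ends here, everything after this point is optional
  return "(?:#{branches.join('|')})?" if node[:end]
  branches.size == 1 ? branches.first : "(?:#{branches.join('|')})"
end

def assemble(words)
  Regexp.new(trie_source(build_trie(words)))
end

p assemble(%w[Header_1 Header_2 foo bar Header_3]) # prints /(?:Header_(?:1|2|3)|foo|bar)/
```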
I have the following function, which accepts text and a word count; if the number of words in the text exceeds the word count, the text is truncated with an ellipsis.
#Truncate the passed text. Used for headlines and such
def snippet(thought, wordcount)
thought.split[0..(wordcount-1)].join(" ") + (thought.split.size > wordcount ? "..." : "")
end
However what this function doesn't take into account is extremely long words, for instance...
"Helloooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
world!"
I was wondering if there's a better way to approach what I'm trying to do so it takes both word count and text size into consideration in an efficient way.
Is this a Rails project?
Why not use the following helper:
truncate("Once upon a time in a world far far away", :length => 17)
If not, just reuse the code.
This is probably a two-step process:
Truncate the string to a maximum length (no regex needed for this).
Using a regex, take at most the maximum number of words from the truncated string.
Edit:
Another approach is to split the string into words and loop through the array, adding up the lengths. When you find the overrun, join the elements from index 0 up to the one just before the overrun.
Hint: the regex ^(\s*.+?\b){5} will match the first 5 "words".
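A sketch of the split-and-accumulate approach, assuming the omission string is not counted toward the character limit and that a single word longer than the limit should be hard-cut rather than dropped; `snippet_by_words` is a hypothetical name:

```ruby
def snippet_by_words(text, max_words, max_chars, omission = '...')
  words = text.split
  kept = []
  length = 0 # length of kept.join(' ') so far
  words.each do |word|
    break if kept.size >= max_words
    # A first word that is already too long gets cut instead of dropped
    return word[0, max_chars] + omission if kept.empty? && word.length > max_chars
    added = kept.empty? ? word.length : word.length + 1 # +1 for the joining space
    break if length + added > max_chars # stop just before the overrun
    kept << word
    length += added
  end
  result = kept.join(' ')
  kept.size < words.size ? result + omission : result
end

snippet_by_words("hello world this is the world", 3, 100) # => "hello world this..."
snippet_by_words("hello world", 5, 8)                     # => "hello..."
```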
The logic for checking both word and char limits becomes too convoluted to clearly express as one expression. I would suggest something like this:
def snippet(str, max_words, max_chars, omission = '...')
  max_chars = omission.size + 1 if max_chars <= omission.size # need at least one char plus the ellipsis
  words = str.split
  snip = words[0...max_words].join(' ')
  omit = words.size > max_words || snip.length > max_chars ? omission : ''
  # leave room for the omission so the total never exceeds max_chars
  snip = snip[0...(max_chars - omit.size)] if snip.length + omit.size > max_chars
  snip + omit
end
As others have pointed out, Rails' String#truncate offers almost the functionality you want (truncating to fit a length at a natural boundary), but it doesn't let you specify the maximum character length and the word count independently.
First 20 characters:
>> "hello world this is the world".gsub(/.+/) { |m| m[0, 20] + (m.size > 20 ? '...' : '') }
=> "hello world this is ..."
First 5 words:
>> "hello world this is the world".gsub(/.+/) { |m| m.split[0, 5].join(' ') + (m.split.size > 5 ? '...' : '') }
=> "hello world this is the..."