String split based on pattern and size [closed] - ruby

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I want to split a query string like:
"(first_name:zach AND last_name:woods) OR (first_name:thomas AND last_name:middleditch) OR (first_name:martin AND last_name:starr) OR "...
into substrings, each not greater than 5000 characters, and I want to split on the pattern " OR ".
Help would be appreciated.

If your query looks just like the example, you can split it on " OR ", then loop through the pieces and join them back together until adding the next piece would push the substring past 5000 characters.
original_query = "(first_name:zach AND last_name:woods) OR ..."
delimiter = " OR "
clauses = original_query.split(delimiter) # Split on the delimiter; it is added back when joining
result = []
pattern = ""
clauses.each do |clause|
  candidate = pattern.empty? ? clause : pattern + delimiter + clause
  if candidate.length > 5000 && !pattern.empty? # Adding this clause would exceed the limit
    result.push(pattern) # Store the current substring
    pattern = clause # Start a new substring
  else
    pattern = candidate # Otherwise keep extending the current substring
  end
end
result.push(pattern) unless pattern.empty? # Store the final substring
puts result
Then you will get the array result, whose substrings are each at most 5000 characters long (unless a single clause is itself longer than that). However, if your string is a query of some kind (it looks like one), whether each substring is syntactically valid on its own depends on your original query.

It is better to enforce these size constraints while building the query itself.
If you still want to work with this approach, one way would be to scan the conditions and concatenate them in batches of the size you prefer.
# Scan all matching conditions (str is the original query string)
conditions = str.scan(/first_name:[a-z]+ AND last_name:[a-z]+/)
# Final queries array
result = []
# Iterate over the conditions array in batches and build one query per batch.
# Each condition averages about 35 characters, or 41 with its parentheses and
# the " OR " separator, so 120 per batch stays under 5000 characters on average.
conditions.each_slice(120) { |group| result << group.map { |c| "(#{c})" }.join(' OR ') }
The result array then holds the queries, each sized by this estimate rather than by an exact character count.
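Because the batch size above relies on an average condition length, a batch of unusually long names could still exceed the limit. The following is a minimal sketch of packing by actual length instead. The names `pack_clauses`, `limit` and `separator` are made up for this illustration, and it assumes that no single clause is longer than the limit.

```ruby
# Greedily pack clauses into chunks whose joined length never exceeds `limit`.
# `pack_clauses`, `limit` and `separator` are hypothetical names for this sketch.
def pack_clauses(clauses, limit: 5000, separator: " OR ")
  chunks = []
  current = []
  current_length = 0
  clauses.each do |clause|
    raise ArgumentError, "clause longer than #{limit} characters" if clause.length > limit

    # Cost of appending this clause: its own length, plus a separator
    # unless it is the first clause in the chunk.
    added = current.empty? ? clause.length : separator.length + clause.length
    if current_length + added > limit
      chunks << current.join(separator) # Close the current chunk
      current = [clause]                # Start a new one with this clause
      current_length = clause.length
    else
      current << clause
      current_length += added
    end
  end
  chunks << current.join(separator) unless current.empty?
  chunks
end

# Small limit so the behaviour is easy to see
query = "(a:1 AND b:2) OR (a:3 AND b:4) OR (a:5 AND b:6)"
chunks = pack_clauses(query.split(" OR "), limit: 30)
# chunks == ["(a:1 AND b:2) OR (a:3 AND b:4)", "(a:5 AND b:6)"]
```

Tracking the running length avoids re-joining the chunk on every iteration, and each chunk can be sent as a query of its own.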

Related

check if an item in weblist is indented or not [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I have a list of items in a weblist that contains both parent and child items. Child items are indented to the right. I need to retrieve the values of the child and parent items into two different columns of a datatable.
My code goes like this:
list = qtp_getroproperty(page.weblist(), "items count", itemsCount)
For n = 1 To itemsCount
items = page.weblist().getitem(n)
In VBScript it's Left():
>> For Each s In Array("x", " x", " x")
>> WScript.Echo s, CStr(" " = Left(s, 1))
>> Next
>>
x False
 x True
 x True
>>
Try this (note that StartsWith is a .NET String method and is not available in VBScript):
if strSurname.StartsWith(" ")
There are several ways to go about this:
Extract the first character with the Left function, as Ekkehard Horner suggested:
If Left(str, 1) = " " Then
...
End If
Check the first character with the InStrRev function:
If InStrRev(str, " ", 1) > 0 Then
...
End If
LTrim the string and compare it to the original string:
If LTrim(str) <> str Then
...
End If
Use a regular expression:
Set re = New RegExp
re.Pattern = "^ "
If re.Test(str) Then
...
End If
Note that this last approach is the most versatile, but also the most expensive. Usually it won't make sense to use this to check for something as simple as "does the string begin with a space". It becomes more useful if for instance you want to check "does the string begin with any kind of whitespace" ("^\s").

Ruby - Split a String to retrieve a number and a measurement/weight and then convert numberFo [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I need to split a string, for food products, such as "Chocolate Biscuits 200g"
I need to extract the "200g" from the String and then split this by number and then by the measurement/weight.
So I need the "200" and "g" separately.
I have written a Ruby regex to find the "200g" in the String (sometimes there may be space between the number and measurement so I have included an optional whitespace between them):
([0-9]*[?:\s ]?[a-zA-Z]+)
And I think it works. But now that I have the result ("200g") that it matched from the entire String, I need to split this by number and measurement.
I wrote two regexes to split these:
([0-9]+)
to split by number and
([a-zA-Z]+)
to split by letters.
But the .split method is not working with these.
I get the following error:
undefined method 'split' for #<MatchData "200">
Of course I will need to convert the 200 to a number instead of a String.
Any help is greatly appreciated,
Thank you!
UPDATE:
I have tested the 3 regexes on http://www.rubular.com/.
My issue seems to be around splitting up the result from the first regex into number and measurement.
One way among many is to use String#scan with a regex. See the last sentence of the doc concerning the treatment of capture groups.
str = "Chocolate Biscuits 200g"
r = /
(\d+) # match one or more digits in capture group 1
([[:alpha:]]+) # match one or more alphabetic characters in capture group 2
/x # free-spacing regex definition mode
number, weight = str.scan(r).flatten
#=> ["200", "g"]
number = number.to_i
#=> 200
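The question mentions that there may be a space between the number and the unit. As a sketch under that assumption, a single regex with named capture groups and an optional `\s*` handles both forms. The helper name `parse_quantity` is made up for this example, and it assumes the quantity sits at the end of the product name.

```ruby
# Hypothetical helper: pull the trailing quantity and unit out of a product name.
def parse_quantity(product)
  # digits, optional whitespace, letters, anchored at the end of the string
  m = product.match(/(?<number>\d+)\s*(?<unit>[[:alpha:]]+)\s*\z/)
  return nil unless m

  [m[:number].to_i, m[:unit]]
end

parse_quantity("Chocolate Biscuits 200g")  # [200, "g"]
parse_quantity("Chocolate Biscuits 200 g") # [200, "g"]
parse_quantity("Plain Flour")              # nil
```

String#match returns a MatchData object (or nil), which is why calling split on it fails. Index it by group name instead.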
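I'm not an expert in Ruby, but I think the following code does the job:
my_string = "Chocolate Biscuits 200g"
weight = 0
unit = ""
# split also returns the captured groups, so the words and the digits
# appear in the array alongside the text between them
string_array = my_string.split(/(?:([a-zA-Z]+)|([0-9]+))/)
string_array.each do |val|
  if val =~ /\A[0-9]+\Z/
    weight = val.to_i # A run of digits is the weight
  elsif weight > 0 && !val.empty?
    unit = val # The non-empty piece after the weight is the unit
  end
end
p weight
p unit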

ruby regex replace the corresponding string [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
str = "1627207:132069:color:green;20518:28421:size:62cm"
aliastr = "20518:28421:S;20518:28358:L;20518:28357:M;1627207:132069:red"
How can I dynamically transform str into "1627207:132069:color:red;20518:28421:size:S"?
It was a pretty unclear question, but I think I got it now. Your aliastr contains mappings which control the replacements, i.e., the key '20518:28421:' should map to value 'S' and the key '1627207:132069:' should map to 'red'. Then you want to search for those keys in str and replace their current value with that new value. This does that:
str = "1627207:132069:color:green;20518:28421:size:62cm"
aliastr = "20518:28421:S;20518:28358:L;20518:28357:M;1627207:132069:red"
mapping = Hash[aliastr.scan(/(\d+:\d+:)(.*?)(?:;|$)/)]
# mapping = {"20518:28421:"=>"S", "20518:28358:"=>"L", "20518:28357:"=>"M", "1627207:132069:"=>"red"}
replaced = str.gsub(/(\d+:\d+:)(\w+:).*?(;|$)/) do |match|
key = $1
value = mapping[$1]
key + $2 + value + $3
end
p replaced
# => "1627207:132069:color:red;20518:28421:size:S"
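One thing to watch for with this kind of mapping is a key in str that has no entry in aliastr. A rough sketch of a variant that leaves such fields unchanged, using split rather than a regex, might look like the following. The method name `apply_aliases` is invented here, and it assumes every field in str has the form id:id:name:value.

```ruby
# Hypothetical helper: replace each field's value using the alias string,
# leaving fields whose key has no alias untouched.
def apply_aliases(str, aliastr)
  # Build {"20518:28421" => "S", ...} from "id:id:value" entries
  mapping = aliastr.split(";").map do |entry|
    first, second, value = entry.split(":", 3)
    ["#{first}:#{second}", value]
  end.to_h

  str.split(";").map do |field|
    first, second, name, value = field.split(":", 4)
    # fetch falls back to the original value when no alias exists
    [first, second, name, mapping.fetch("#{first}:#{second}", value)].join(":")
  end.join(";")
end

str = "1627207:132069:color:green;20518:28421:size:62cm"
aliastr = "20518:28421:S;20518:28358:L;20518:28357:M;1627207:132069:red"
apply_aliases(str, aliastr)
# => "1627207:132069:color:red;20518:28421:size:S"
```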
Your question is not very clear, and probably contains an error ("color:red" in your wanted result vs. "red" in aliastr).
You may try something like this:
str = "1627207:132069:color:green;20518:28421:size:62cm"
aliastr = "20518:28421:S;20518:28358:L;20518:28357:M;1627207:132069:red"
replacements = aliastr.split(";").map { |s| parts = s.split(":"); [/#{parts[0]}:#{parts[1]}:.*/, s] }
src = str.split(";")
src.map { |s| replacements.each { |r| s.sub!(r[0], r[1]) }; s }.join(";")
# => "1627207:132069:red;20518:28421:S"

Retrieving multiple matched tokens from a regexp [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Here's what I want to happen
> /x(y\d)*/.somefunction('xy1y2y3').each { |x| puts x }
y1
y2
y3
This seems like a pretty natural use of the asterisk in a regexp. I've matched a bunch of tokens and I want them printed out.
The closest I've been able to find is:
/x((y\d)*)/.match('xy1y2y3')[1].scan(/y\d/).each { |x| puts x }
Which is just abysmal.
The issue you are running into has to do with the regex rather than Ruby. You are repeating a capture group rather than capturing a repeated group. You could use
str.scan(/x((?:y\d)*)/)
However, this will capture all of the groups combined as one string. In order to do what you actually want to do (check that the string follows the pattern x followed by these groups) you unfortunately need to do two steps as you are doing in your question. Either that, or you can remove the additional requirement and search only for the pattern as other answers are suggesting.
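The two steps described here can be wrapped in a small helper. This is only a sketch: the name `repeated_tokens` is made up, and it assumes the whole string must match the pattern, which is stricter than the unanchored regex in the question.

```ruby
# Hypothetical helper: step 1 validates the overall shape, step 2 extracts tokens.
def repeated_tokens(input)
  # Step 1: the string must be "x" followed by zero or more y<digit> groups;
  # the repeated group as a whole is captured in group 1
  m = /\Ax((?:y\d)*)\z/.match(input)
  return [] unless m

  # Step 2: pull the individual tokens out of the captured run
  m[1].scan(/y\d/)
end

repeated_tokens('xy1y2y3').each { |token| puts token }
# prints y1, y2 and y3 on separate lines
```

A string that does not follow the pattern, such as 'zy1y2', yields an empty array instead of stray matches.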
I assume this is what you want:
'xy1y2y3'.gsub(/y\d/) { |s| puts s }
The gsub method accepts a block.
Based on your input and output, this looks about right:
'xy1y2y3'.scan(/y\d/)
# => ["y1", "y2", "y3"]
Use this if you want to print them:
puts 'xy1y2y3'.scan(/y\d/)
# >> y1
# >> y2
# >> y3
String's scan is your friend if you want to look through a string and capture repeating patterns.

Using regex backreference value as numeric value in regex [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I've got a string that has variable length sections. The length of the section precedes the content of that section. So for example, in the string:
13JOHNSON,STEVE
The first 2 characters define the content length (13), followed by the actual content. I'd like to be able to parse this using named capture groups with a backreference, but I'm not sure it is possible. I was hoping this would work:
(?<length>\d{2})(?<name>.{\k<length>})
But it doesn't. Seems like the backreference isn't interpreted as a number. This works fine though:
(?<length>\d{2})(?<name>.{13})
No, that will not work. A quantifier such as {13} is fixed when the regular expression is compiled, so a backreference cannot supply it. You need to build a new regular expression after extracting the number.
I would recommend using two different expressions:
the first one extracts the number, and the second one extracts the text based on the number extracted by the first one.
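In Ruby the two expressions could be combined roughly as follows. This is a sketch, with `read_length_prefixed` as an invented name, and it assumes the length prefix is always exactly two digits at the start of the string.

```ruby
# Hypothetical helper: read a two-digit length prefix, then build a second
# regex whose quantifier is that length.
def read_length_prefixed(input)
  # First expression: extract the two-digit length
  length = input[/\A\d{2}/]
  return nil if length.nil?

  # Second expression: interpolating the integer yields a literal
  # quantifier such as .{13}
  match = input.match(/\A\d{2}(?<name>.{#{length.to_i}})/m)
  match && match[:name]
end

read_length_prefixed("13JOHNSON,STEVE") # "JOHNSON,STEVE"
```

The method returns nil when the prefix is missing or when the string is shorter than the prefix claims.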
You can't do that.
>> s = '13JOHNSON,STEVE'
=> "13JOHNSON,STEVE"
>> length = s[/^\d{2}/].to_i # s[0,2].to_i
=> 13
>> s[2,length]
=> "JOHNSON,STEVE"
This really seems like you're going after this the hard way. I suspect the sample string is not as simple as you said, based on:
I've got a string that has variable length sections. The length of the section precedes the content of that section.
Instead I'd use something like:
str = "13JOHNSON,STEVE 08Blow,Joe 10Smith,John"
str.scan(/\d{2}(\S+)/).flatten # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John"]
If the string can be split accurately, then there's this:
str.split.map{ |s| s[2..-1] } # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John"]
If you only have length bytes followed by strings, with nothing between them, something like this works:
offset = 0
str.delete!(' ') # => "13JOHNSON,STEVE08Blow,Joe10Smith,John"
str.scan(/\d+/).map{ |l| s = str[offset + 2, l.to_i]; offset += 2 + l.to_i ; s }
# => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John"]
won't work if the names have digits in them – tihom
str = "13JOHNSON,STEVE 08Blow,Joe 10Smith,John 1012345,7890"
str.scan(/\d{2}(\S+)/).flatten # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John", "12345,7890"]
str.split.map{ |s| s[2..-1] } # => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John", "12345,7890"]
With a minor change and a minor addition, it'll continue to work correctly with strings not containing delimiters:
str.delete!(' ') # => "13JOHNSON,STEVE08Blow,Joe10Smith,John1012345,7890"
offset = 0
str.scan(/\d{2}/).map{ |l| s = str[offset + 2, l.to_i]; offset += 2 + l.to_i ; s }.compact
# => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John", "12345,7890"]
\d{2} grabs the digits in groups of two. For names whose leading two characters are a length value, which matches the OP's sample, the correct thing happens. For an all-numeric "name", several false positives are picked up. Their offsets fall past the end of the string, so they yield nil values, and compact cleans those out.
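An alternative that avoids the false positives altogether is to read each length prefix at a known offset instead of scanning for digits. The following is a sketch only. `split_length_prefixed` and `prefix_width` are made-up names, and it assumes records are packed back to back with a fixed-width decimal prefix.

```ruby
# Hypothetical helper: walk the string record by record, reading each
# fixed-width length prefix at the current offset.
def split_length_prefixed(str, prefix_width = 2)
  records = []
  offset = 0
  while offset < str.length
    prefix = str[offset, prefix_width]
    unless prefix =~ /\A\d{#{prefix_width}}\z/
      raise ArgumentError, "bad length prefix at offset #{offset}"
    end

    length = prefix.to_i # to_i parses decimal, so "08" becomes 8
    body = str[offset + prefix_width, length]
    if body.nil? || body.length < length
      raise ArgumentError, "truncated record at offset #{offset}"
    end

    records << body
    offset += prefix_width + length # Jump straight to the next prefix
  end
  records
end

split_length_prefixed("13JOHNSON,STEVE08Blow,Joe10Smith,John1012345,7890")
# => ["JOHNSON,STEVE", "Blow,Joe", "Smith,John", "12345,7890"]
```

Digits inside a name are never mistaken for a prefix, because only the characters at the expected offsets are interpreted as lengths.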
What about this?
a = '13JOHNSON,STEVE'
puts a.match(/(?<length>\d{2})(?<name>(.*),(.*))/)
