Best way to capture multiple matches - ruby

The same text message contains a fixed part that appears once (the item ID) and parts that repeat over multiple lines (several references and the dimensions of each part):
..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53
I want to parse and store the IdCode, the references, and an array of dimensions.
When I apply the REGEX.match(my_text) method, I get only a single occurrence of REF and CMT:
REGEX = %r{
ID\/(?<IdCode> \d{10})\s
(REF\/(?<ReferenceCode> \w{3}\-\d{3}\-\d)\s)+
(CMT\/(?<Length> \d+)\-(?<Width> \d+)\-(?<Height> \d+)\s)+
}x
The result looks like this:
IdCode: "1100008273"
ReferenceCode: "D14-219-0"
Length: "37"
Width: "37"
Height: "20"
Is there a way to capture multiple occurrences without iterating?
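A short sketch of the underlying behavior, using a made-up three-line input: in Ruby's regex engine, a capturing group repeated by a quantifier such as `+` keeps only its last iteration, so `match` cannot return all REF values at once, while `String#scan` returns every match.

```ruby
text = "REF/D14-109-0\nREF/D14-209-0\nREF/D14-219-0\n"

# A quantified group: the capture is overwritten on each repetition.
m = text.match(/(?:REF\/(?<ref>\w{3}-\d{3}-\d)\s)+/)
puts m[:ref]   # prints D14-219-0 (only the last repetition survives)

# String#scan collects every match instead of just one.
refs = text.scan(/REF\/(\w{3}-\d{3}-\d)/).flatten
p refs         # prints ["D14-109-0", "D14-209-0", "D14-219-0"]
```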

Suppose your string were:
str = %w| dog
ID/11000082734
REF/D14-109-0
REF/D14-209-0
CMT/49-41-31
CMT/44-34-53
cat
ID/11000082735
REF/D14-109-1
REF/D14-209-1
CMT/49-41-32
CMT/44-34-54
pig |.join("\n")
#=> "dog\nID/11000082734\nREF/D14-109-0\nREF/D14-209-0\nCMT/49-41-31\nCMT/44-34-53\ncat\nID/11000082735\nREF/D14-109-1\nREF/D14-209-1\nCMT/49-41-32\nCMT/44-34-54\npig"
Then you could write:
r = /(ID\/\d{11}) # match string in capture group 1
\n # match newline
((?:REF\/[A-Z]\d{2}-\d{3}-\d\n)+) # match consecutive REF lines in capture group 2
((?:CMT\/\d{2}-\d{2}-\d{2}\n)+) # match consecutive CMT lines in capture group 3
/x # free-spacing regex definition mode
arr = str.scan(r)
#=> [["ID/11000082734", "REF/D14-109-0\nREF/D14-209-0\n",
# "CMT/49-41-31\nCMT/44-34-53\n"],
# ["ID/11000082735", "REF/D14-109-1\nREF/D14-209-1\n",
# "CMT/49-41-32\nCMT/44-34-54\n"]]
This extracts the desired information without iterating.
At this point it may be desirable to convert arr to a more convenient data structure. For example:
arr.map do |a,b,c|
{ :id => a[/\d+/],
:ref => b.split("\n").map { |s| s[4..-1] },
:cmt => c.scan(/(\d{2})-(\d{2})-(\d{2})/).map { |e|
[:length, :width, :height].zip(e.map(&:to_i)).to_h }
}
end
#=> [{ :id=>"11000082734",
# :ref=>["D14-109-0", "D14-209-0"],
# :cmt=>[{ :length=>49, :width=>41, :height=>31 },
# { :length=>44, :width=>34, :height=>53 }
# ]
# },
# { :id=>"11000082735",
# :ref=>["D14-109-1", "D14-209-1"],
# :cmt=>[{ :length=>49, :width=>41, :height=>32 },
# { :length=>44, :width=>34, :height=>54 }
# ]
# }
# ]

Try this
(?<IdCode>\d{10,})|REF\/(?<ReferenceCode>\w{3}\-\d{3}\-\d)|CMT\/(?<Length>\d+)\-(?<Width>\d+)\-(?<Height>\d+)
Explanation:
(?<name> … ): named capturing group
\d{10,}: ten or more digits
\: escapes a special character
|: alternation / OR operator
+: one or more
Input
..some random text here..
ID/11000082734
REF/D14-109-0
REF/D14-209-0
REF/D14-219-0
CMT/59-40-25
CMT/38-25-28
CMT/59-40-25
CMT/37-37-20
CMT/40-40-20
CMT/37-37-20
CMT/49-41-31
CMT/44-34-53
Output:
MATCH 1
IdCode [29-40] `11000082734`
MATCH 2
ReferenceCode [45-54] `D14-109-0`
MATCH 3
ReferenceCode [59-68] `D14-209-0`
MATCH 4
ReferenceCode [73-82] `D14-219-0`
MATCH 5
Length [87-89] `59`
Width [90-92] `40`
Height [93-95] `25`
MATCH 6
Length [100-102] `38`
Width [103-105] `25`
Height [106-108] `28`
MATCH 7
Length [113-115] `59`
Width [116-118] `40`
Height [119-121] `25`
MATCH 8
Length [126-128] `37`
Width [129-131] `37`
Height [132-134] `20`
MATCH 9
Length [139-141] `40`
Width [142-144] `40`
Height [145-147] `20`
MATCH 10
Length [152-154] `37`
Width [155-157] `37`
Height [158-160] `20`
MATCH 11
Length [165-167] `49`
Width [168-170] `41`
Height [171-173] `31`
MATCH 12
Length [178-180] `44`
Width [181-183] `34`
Height [184-186] `53`
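In Ruby, `String#scan` returns plain arrays of captures, so the group names above are lost. One way to keep them, sketched here on a shortened copy of the input, is to enumerate the matches and read `Regexp.last_match` for each one (the variable names `id`, `refs` and `dims` are illustrative):

```ruby
text = <<~TXT
  ..some random text here..
  ID/11000082734
  REF/D14-109-0
  REF/D14-209-0
  CMT/59-40-25
  CMT/44-34-53
TXT

re = /(?<IdCode>\d{10,})|REF\/(?<ReferenceCode>\w{3}-\d{3}-\d)|CMT\/(?<Length>\d+)-(?<Width>\d+)-(?<Height>\d+)/

# Collect one MatchData per match so the named groups stay accessible.
matches = text.to_enum(:scan, re).map { Regexp.last_match }

id   = matches.map { |m| m[:IdCode] }.compact.first
refs = matches.map { |m| m[:ReferenceCode] }.compact
# Each CMT match becomes a [length, width, height] triple of integers.
dims = matches.select { |m| m[:Length] }
              .map { |m| [m[:Length], m[:Width], m[:Height]].map(&:to_i) }

p id    # prints "11000082734"
p refs  # prints ["D14-109-0", "D14-209-0"]
p dims  # prints [[59, 40, 25], [44, 34, 53]]
```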

Related

Replace pattern in string with value from a ruby array

I have a string like this:
"base: [_0x3e63[241], _0x3e63[242]],
gray: [_0x3e63[243], _0x3e63[244], _0x3e63[245], _0x3e63[246], _0x3e63[247], _0x3e63[248], _0x3e63[249], _0x3e63[250], _0x3e63[251], _0x3e63[252]],
red: [_0x3e63[253], _0x3e63[254], _0x3e63[255], _0x3e63[256], _0x3e63[257], _0x3e63[258], _0x3e63[259], _0x3e63[260], _0x3e63[261], _0x3e63[262]],
pink: [_0x3e63[263], _0x3e63[264], _0x3e63[265], _0x3e63[266], _0x3e63[267], _0x3e63[268], _0x3e63[269], _0x3e63[270], _0x3e63[271], _0x3e63[272]],
grape: [_0x3e63[273], _0x3e63[274], _0x3e63[275], _0x3e63[276], _0x3e63[277], _0x3e63[278], _0x3e63[279], _0x3e63[280], _0x3e63[281], _0x3e63[282]],
violet: [_0x3e63[283], _0x3e63[284], _0x3e63[285], _0x3e63[286], _0x3e63[287], _0x3e63[288], _0x3e63[289], _0x3e63[290], _0x3e63[291], _0x3e63[292]],
indigo: [_0x3e63[293], _0x3e63[294], _0x3e63[295], _0x3e63[296], _0x3e63[297], _0x3e63[298], _0x3e63[299], _0x3e63[300], _0x3e63[301], _0x3e63[302]],
blue: [_0x3e63[303], _0x3e63[304], _0x3e63[305], _0x3e63[306], _0x3e63[307], _0x3e63[308], _0x3e63[309], _0x3e63[310], _0x3e63[311], _0x3e63[312]],
cyan: [_0x3e63[313], _0x3e63[314], _0x3e63[315], _0x3e63[316], _0x3e63[317], _0x3e63[318], _0x3e63[319], _0x3e63[320], _0x3e63[321], _0x3e63[322]],
teal: [_0x3e63[323], _0x3e63[324], _0x3e63[325], _0x3e63[326], _0x3e63[327], _0x3e63[328], _0x3e63[329], _0x3e63[330], _0x3e63[331], _0x3e63[332]],
green: [_0x3e63[333], _0x3e63[334], _0x3e63[335], _0x3e63[336], _0x3e63[337], _0x3e63[338], _0x3e63[339], _0x3e63[340], _0x3e63[341], _0x3e63[342]],
lime: [_0x3e63[343], _0x3e63[344], _0x3e63[345], _0x3e63[346], _0x3e63[347], _0x3e63[348], _0x3e63[349], _0x3e63[350], _0x3e63[351], _0x3e63[352]],
yellow: [_0x3e63[353], _0x3e63[354], _0x3e63[355], _0x3e63[356], _0x3e63[357], _0x3e63[358], _0x3e63[359], _0x3e63[360], _0x3e63[361], _0x3e63[362]],
orange: [_0x3e63[363], _0x3e63[364], _0x3e63[365], _0x3e63[366], _0x3e63[367], _0x3e63[368], _0x3e63[369], _0x3e63[370], _0x3e63[371], _0x3e63[372]]"
_0x3e63 is a Ruby array containing the values:
_0x3e63 = ["#f783ac", "#faa2c1", "#fcc2d7", "#ffdeeb", "#fff0f6", "#862e9c", "#9c36b5", "#ae3ec9", "#be4bdb", "#cc5de8", "#da77f2", "#e599f7", "#eebefa", "#f3d9fa", "#f8f0fc", "#5f3dc4", "#6741d9", "#7048e8", "#7950f2", "#845ef7", "#9775fa", "#b197fc", "#d0bfff", "#e5dbff", "#f3f0ff", "#364fc7", "#3b5bdb", "#4263eb", "#4c6ef5", "#5c7cfa", "#748ffc", "#91a7ff", "#bac8ff", "#dbe4ff", "#edf2ff", "#1864ab", "#1971c2", "#1c7ed6", "#228be6", "#339af0", "#4dabf7", "#74c0fc", "#a5d8ff", "#d0ebff", "#e7f5ff", "#0b7285", "#0c8599", "#1098ad", "#15aabf", "#22b8cf", "#3bc9db", "#66d9e8", "#99e9f2", "#c5f6fa", "#e3fafc", "#087f5b", "#099268", "#0ca678", "#12b886", "#20c997", "#38d9a9", "#63e6be", "#96f2d7", "#c3fae8", "#e6fcf5", "#2b8a3e", "#2f9e44", "#37b24d", "#40c057", "#51cf66", "#69db7c", "#8ce99a", "#b2f2bb", "#d3f9d8", "#ebfbee", "#5c940d", "#66a80f", "#74b816", "#82c91e", "#94d82d", "#a9e34b", "#c0eb75", "#d8f5a2", "#e9fac8", "#f4fce3", "#e67700", "#f08c00", "#f59f00", "#fab005", "#fcc419", "#ffd43b", "#ffe066", "#ffec99", "#fff3bf", "#fff9db", "#d9480f", "#e8590c"]
I cannot find a way to find each _0x3e63[xxxxxxx] in the string and replace it with the corresponding value.
Use String#gsub with a block.
Assuming your input string is stored in the variable input, the following code does the replacement and displays the result:
puts input.gsub(/_0x3e63\[(\d+)\]/){|s| _0x3e63[$1.to_i]}
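(The array _0x3e63 you posted in the question does not contain enough values for indices like 247 or 251; in fact it has fewer than 100 elements, so none of the indices in the string exist. The code still runs, but each such reference is replaced by an empty string, because the block returns nil and gsub converts it to "".)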
The code is very simple. The regular expression /_0x3e63\[(\d+)\]/ matches any string that starts with _0x3e63[, continues with one or more digits (\d+) and ends with ].
For each match the block is executed and the value returned by the block is used to replace the matched piece of the original string.
The replacement uses $1 (which contains the substring matched by the first capturing group) as an index into the array _0x3e63. Because $1 is a string, .to_i converts it to an integer, which is required for indexing the array.
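Since the posted array is shorter than the indices in the string, a variant that leaves unknown references untouched may be handier. A sketch using `Array#fetch` with a default value; the sample `input` and the shortened array are made up for illustration:

```ruby
_0x3e63 = ["#f783ac", "#faa2c1", "#fcc2d7"]
input = "base: [_0x3e63[0], _0x3e63[2]],\ngray: [_0x3e63[1], _0x3e63[99]]"

# $1 holds the digits captured by (\d+); fetch falls back to the
# original matched text s when the index is out of range.
output = input.gsub(/_0x3e63\[(\d+)\]/) { |s| _0x3e63.fetch($1.to_i, s) }
puts output
# base: [#f783ac, #fcc2d7],
# gray: [#faa2c1, _0x3e63[99]]
```

This keeps missing references visible in the output instead of silently turning them into empty strings.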
We are given:
str = <<~END
base: [arr[6], arr[3]],
gray: [arr[0], arr[4], arr[1], arr[5]],
red: [arr[2]]
END
#=> "base: [arr[6], arr[3]],\ngray: [arr[0], arr[4], arr[1], arr[5]],\nred: [arr[2]]\n"
and
arr = ["#f783ac", "#faa2c1", "#fcc2d7", "#ffdeeb", "#fff0f6", "#862e9c",
"#9c36b5"]
We can perform the required replacements by using String#gsub with a regular expression and Kernel#eval:
puts str.gsub(/\barr\[\d+\]/) { |s| eval s }
base: [#9c36b5, #ffdeeb],
gray: [#f783ac, #fff0f6, #faa2c1, #862e9c],
red: [#fcc2d7]
The regular expression performs the following operations:
\b # match a word boundary (to avoid matching 'gnarr')
arr\[ # match string 'arr['
\d+ # match 1+ digits
\] # match ']'
One must be cautious about using eval (to avoid launching missiles inadvertently, for example), but as long as the matches of the string can be trusted it's a perfectly safe and useful method.
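If the string cannot be fully trusted, the same replacement can be done without eval by capturing the index and looking it up directly. A sketch reusing str and arr from above (redefined here so it runs on its own):

```ruby
arr = ["#f783ac", "#faa2c1", "#fcc2d7", "#ffdeeb", "#fff0f6", "#862e9c",
       "#9c36b5"]
str = <<~END
  base: [arr[6], arr[3]],
  gray: [arr[0], arr[4], arr[1], arr[5]],
  red: [arr[2]]
END

# Capture the digits and index into arr instead of evaluating the match.
result = str.gsub(/\barr\[(\d+)\]/) { arr[$1.to_i] }
puts result
# base: [#9c36b5, #ffdeeb],
# gray: [#f783ac, #fff0f6, #faa2c1, #862e9c],
# red: [#fcc2d7]
```

Only digits can reach the array lookup here, so nothing in the string is ever executed; an out-of-range index simply yields an empty string.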

How to get a block at an offset in the IO.foreach loop in ruby?

I'm using the IO.foreach loop to find a string using regular expressions. I want to append the next block (next line) to the file_names list. How can I do that?
file_names = [""]
IO.foreach("a.txt") { |block|
if block =~ /^file_names*/
dir = # get the next block
file_names.append(dir)
end
}
Actually my input looks like this:
file_names[174]:
name: "vector"
dir_index: 1
mod_time: 0x00000000
length: 0x00000000
file_names[175]:
name: "stl_bvector.h"
dir_index: 2
mod_time: 0x00000000
length: 0x00000000
I have a list of file_names entries, and I want to capture the name, dir_index, mod_time and length properties of each one and store them in the file_names array at the index given by file_names[...] in the text.
You can use #each_cons to get the value of the next 4 rows from the text file:
files = IO.foreach("text.txt").each_cons(5).with_object([]) do |block, o|
if block[0] =~ /file_names.*/
o << block[1..4].map{|e| e.split(':')[1]}
end
end
puts files
#=> "vector"
# 1
# 0x00000000
# 0x00000000
# "stl_bvector.h"
# 2
# 0x00000000
# 0x00000000
Keep in mind that the files array contains subarrays of 4 elements. If the : symbol occurs later in the lines, you could replace the third line of my code with this:
o << block[1..4].map{ |e| e.partition(':').last.strip}
I also added #strip in case you want to remove the whitespace around the values. With this line changed, the resulting array will look something like this:
p files
#=>[["\"vector\"", "1", "0x00000000", "0x00000000"], ["\"stl_bvector.h\"", "2", "0x00000000", "0x00000000"]]
(the values don't contain the \ escape character, that's just the way #p shows it).
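A quick illustration of why partition is the safer choice when a value may itself contain a colon (the file name a:b.h is a made-up example):

```ruby
line = "  name: \"a:b.h\"\n"

via_split     = line.split(':')[1]              # stops at the second colon
via_partition = line.partition(':').last.strip  # everything after the first colon

p via_split      # prints " \"a"
p via_partition  # prints "\"a:b.h\""
```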
Another option: if you know that the pattern of one file name followed by four values holds throughout the entire text file, and the file always starts with a file name, you can replace #each_cons with #each_slice and remove the regex completely. This will also speed up the whole process:
files = IO.foreach("text.txt").each_slice(5).with_object([]) do |block, o|
o << block[1..4].map{ |e| e.partition(':').last.strip }
end
It's actually pretty easy to carve up a series of lines based on a pattern using slice_before:
File.readlines("data.txt").slice_before(/\Afile_names/).to_a
Now you have an array of arrays that looks like this:
[
[
"file_names[174]:\n",
" name: \"vector\"\n",
" dir_index: 1\n",
" mod_time: 0x00000000\n",
" length: 0x00000000\n"
],
[
"file_names[175]:\n",
" name: \"stl_bvector.h\"\n",
" dir_index: 2\n",
" mod_time: 0x00000000\n",
" length: 0x00000000"
]
]
Each of these groups could be transformed further, for example into a Ruby Hash using those keys.
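A sketch of that transformation, assuming the input is available as a string (a heredoc stands in for data.txt here) and that the keys can stay as strings. It stores each group under the number from file_names[...], using a Hash rather than an Array so that index 174 does not create 174 empty slots:

```ruby
lines = <<~TXT.lines
  file_names[174]:
     name: "vector"
     dir_index: 1
     mod_time: 0x00000000
     length: 0x00000000
  file_names[175]:
     name: "stl_bvector.h"
     dir_index: 2
     mod_time: 0x00000000
     length: 0x00000000
TXT

files = {}
lines.slice_before(/\Afile_names/).each do |group|
  # "file_names[174]:" -> 174
  index = group.first[/\[(\d+)\]/, 1].to_i
  # "   name: \"vector\"" -> ["name", "\"vector\""]; split only on the first colon
  files[index] = group.drop(1).map { |line| line.strip.split(/:\s*/, 2) }.to_h
end

p files[174]
```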

How to parse username, ID or whole part using Ruby Regex in this sentence?

I have sentences like this:
Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?
What I want is to get #[Pratha](user:1) and #[John](user:3): either their names and IDs, or just the text as quoted so that I can split it and parse the name and ID myself.
But there is an issue: names such as Pratha and John may include non-alphabetic characters like ', ,, -, +, etc., but never [, ], ( or ).
What I tried so far:
c = ''
f = c.match(/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s(\[)|$)))(\w+)(?=\s|$)/i)
But no success.
You may use
/#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
Details
# - a # char
\[ - a [
([^\]\[]*) - Group 1: 0+ chars other than [ and ]
\] - a ] char
\( - a ( char
[^()]* - 0+ chars other than ( and )
: - a colon
(\d+) - Group 2: 1 or more digits
\) - a ) char.
Sample Ruby code:
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"
rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
res = s.scan(rx)
p res
# => [["Pratha", "1"], ["John", "3"]]
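To get something more structured than pairs, the captures can be mapped into hashes. A sketch reusing the regex above; the sample names O'Brien, Jr. and Anne-Marie are made up to show that punctuation other than brackets and parentheses is accepted:

```ruby
s  = "Hi #[O'Brien, Jr.](user:7) and #[Anne-Marie](user:12)!"
rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/

# Each scan result is [name, id]; convert the id to an Integer.
mentions = s.scan(rx).map { |name, id| { name: name, id: id.to_i } }
p mentions
```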
"Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?".scan(/#.*?\)/)
#⇒ ["#[Pratha](user:1)", "#[John](user:3)"]
Since the line does not come from user input, you can probably rely on the part you are interested in starting with # and ending with ).
You could use 2 capturing groups to get the names and the id's:
#\[([^\]]+)\]\([^:]+:([^)]+)\)
That will match:
# Match # literally
\[ Match [
([^\]]+) 1st capturing group, which matches 1+ chars other than ] using a negated character class
\] Match ]
\( Match ( literally
[^:]+: Match 1+ chars other than :, then match :
([^)]+) 2nd capturing group, which matches 1+ chars other than )
\) Match )

extract some info from a complete sentence using preg_match

I want to extract specific information using preg_match in PHP. Any ideas?
sample text = 2xMUA Matte Lipstick - Totally Nude[300]=[600]
Here I want to get:
product_name = MUA Matte Lipstick - Totally Nude
product_qty = 2
product_price = 300
product_subtotal = 600
You could capture your values in (named) captured groups.
Try it like this:
(?<qty>\d+)x(?<name>[^[]+)\[(?<price>[^]]+)\]=\[(?<subtotal>[^]]+)\]
That would match:
(?<qty> # Named captured group qty
\d+ # One or more digits
) # Close group
x # Match x
(?<name> # Named captured group name
[^[]+ # Match not [ one or more times
) # Close group
\[ # Match [
(?<price> # Named captured group price
[^]]+ # Match not ] one or more times
) # Close group
\]=\[ # Match ]=[
(?<subtotal> # Named captured group subtotal
[^]]+ # Match not ] one or more times
) # Close group
\] # Match ]
Or without named captured groups:
(\d+)x([^[]+)\[([^]]+)\]=\[([^]]+)\]
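The question is about PHP, but since the rest of this page is Ruby, here is a sketch of the same named-group pattern ported to Ruby. One difference worth noting: Ruby's regex engine treats an unescaped [ inside a character class as the start of a nested class, so the port escapes the brackets (`[^\[]`, `[^\]]`):

```ruby
text = "2xMUA Matte Lipstick - Totally Nude[300]=[600]"

# Same groups as the PHP pattern, with [ and ] escaped inside classes.
re = /(?<qty>\d+)x(?<name>[^\[]+)\[(?<price>[^\]]+)\]=\[(?<subtotal>[^\]]+)\]/

if (m = re.match(text))
  product_qty      = m[:qty].to_i
  product_name     = m[:name]
  product_price    = m[:price].to_i
  product_subtotal = m[:subtotal].to_i
  p [product_qty, product_name, product_price, product_subtotal]
  # prints [2, "MUA Matte Lipstick - Totally Nude", 300, 600]
end
```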

Join array of strings into 1 or more strings each within a certain char limit (+ prepend and append texts)

Let's say I have an array of Twitter account names:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
And a prepend and append variable:
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
How can I turn this into an array of as few strings as possible, each at most 140 characters long, starting with the prepend text, ending with the append text, and containing in between the Twitter account names, each prefixed with a # sign and separated by spaces? Like this:
tweets = ['Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday', 'Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday', 'Check out these cool people: #example18 #example19 #example20 #FollowFriday']
(The order of the accounts isn't important so theoretically you could try and find the best order to make the most use of the available space, but that's not required.)
Any suggestions? I'm thinking I should use the scan method, but haven't figured out the right way yet.
It's pretty easy using a bunch of loops, but I'm guessing that won't be necessary when using the right Ruby methods. Here's what I came up with so far:
# Create one long string of #usernames separated by a space
tmp = twitter_accounts.map!{|a| a.insert(0, '#')}.join(' ')
# alternative: tmp = '#' + twitter_accounts.join(' #')
# Number of characters left for mentioning the Twitter accounts
length = 140 - (prepend + append).length
# This method would split a string into multiple strings
# each with a maximum length of 'length' and it will only split on empty spaces (' ')
# ideally strip that space as well (although .map(&:strip) could be used too)
tweets = tmp.some_method(' ', length)
# Prepend and append
tweets.map!{|t| prepend + t + append}
P.S.
If anyone has a suggestion for a better title let me know. I had a difficult time summarizing my question.
The String rindex method has an optional parameter where you can specify where to start searching backwards in a string:
arr = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
str = arr.map{|name|"##{name}"}.join(' ')
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
max_chars = 140 - prepend.size - append.size
until str.size <= max_chars do
p str.slice!(0, str.rindex(" ", max_chars))
str.lstrip! #get rid of the leading space
end
p str unless str.empty?
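The loop above prints the bare chunks. A sketch that wraps the same rindex idea into a method returning complete tweets; the method name build_tweets is hypothetical, and it raises instead of looping forever if a single #name cannot fit:

```ruby
# Hypothetical helper: cuts the mention list at spaces with String#rindex so
# that each tweet (prepend + chunk + append) is at most max_len characters.
def build_tweets(names, prepend, append, max_len = 140)
  str = names.map { |name| "##{name}" }.join(' ')
  max_chars = max_len - prepend.size - append.size
  chunks = []
  until str.size <= max_chars
    cut = str.rindex(' ', max_chars)
    raise ArgumentError, 'a single name does not fit' if cut.nil? || cut.zero?
    chunks << str.slice!(0, cut)
    str.lstrip! # drop the space we split on
  end
  chunks << str unless str.empty?
  chunks.map { |chunk| prepend + chunk + append }
end

arr = (1..20).map { |i| "example#{i}" }
tweets = build_tweets(arr, 'Check out these cool people: ', ' #FollowFriday')
puts tweets
```

With these inputs the split falls after #example9 and after #example17, matching the three tweets listed in the question.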
I'd make use of reduce for this:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
prepend = 'Check out these cool people:'
append = '#FollowFriday'
# Extra -1 is for the space before `append`
max_content_length = 140 - prepend.length - append.length - 1
content_strings = string.reduce([""]) { |result, target|
result.push("") if result[-1].length + target.length + 2 > max_content_length
result[-1] += " ##{target}"
result
}
tweets = content_strings.map { |s| "#{prepend}#{s} #{append}" }
Which would yield:
"Check out these cool people: #example1 #example2 #example3 #example4 #example5 #example6 #example7 #example8 #example9 #FollowFriday"
"Check out these cool people: #example10 #example11 #example12 #example13 #example14 #example15 #example16 #example17 #FollowFriday"
"Check out these cool people: #example18 #example19 #example20 #FollowFriday"
