How to extract a number using regular expression in Ruby

I am new to regular expressions and Ruby. Below is the example I started working with:
words= "apple[12345]: {123123} boy 1233 6F74 2AC 28458 1594 6532 1500 D242g
apple[13123]: {123123123} girl Aui817E 9AD453 91321SDF 3423FS 1213FDAS 110FADA4 43ADAC0 1AADS4D8 BASAA24 "
I want to extract boy 1233 6F74 .. to .. D242g into an array.
Similarly, I want to extract girl Aui817E 9AD453 .. to .. 43ADAC0 1AADS4D8 BASAA24 into an array.
I tried the following but could not get it to work. Can someone please help me with this simple exercise?
Thanks in advance.
begin
pattern = /apple\[\d+\]: \{\d+\} (\w) (\d+) (\d+) /
f = pattern.match(words)
puts " #{f}"
end

words.scan(/apple\[\d+\]: \{\d+\}(.+)/).map{|a| a.first.scan(/\S+/)}
or
words.each_line.map{|s| s.split.drop(2)}
Output:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]

array = words.scan(/apple\[\d+\]: {\d+}(.+)/).flatten.map { |line| line.scan(/\w+/) }
({ and } do not need to be escaped in this regex.)
This returns:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]
array[0] gives the array starting with "boy", and array[1] gives the array starting with "girl".
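The answers above can be tried end to end with a sketch like the following. It assumes the sample text is held in a heredoc (the variable names prefix and tokens_by_line are made up for this illustration), and it returns an empty array for any line that does not start with the apple[...]: {...} prefix.

```ruby
# The sample text from the question, held in a heredoc so both lines stay intact.
words = <<~TEXT
  apple[12345]: {123123} boy 1233 6F74 2AC 28458 1594 6532 1500 D242g
  apple[13123]: {123123123} girl Aui817E 9AD453 91321SDF 3423FS 1213FDAS 110FADA4 43ADAC0 1AADS4D8 BASAA24
TEXT

# Capture everything after the "apple[...]: {...}" prefix of a line.
prefix = /\Aapple\[\d+\]: \{\d+\}\s+(.+)/

# For each line, split the captured text on whitespace.
# Lines without the prefix yield an empty array (an assumption of this sketch).
tokens_by_line = words.each_line.map do |line|
  match = prefix.match(line)
  match ? match[1].split : []
end

p tokens_by_line[0].first(3)  # ["boy", "1233", "6F74"]
p tokens_by_line[1].last      # "BASAA24"
```

Matching the prefix explicitly, rather than dropping the first two fields, makes the code skip lines that have a different shape instead of silently returning part of them.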

Related

How to get a block at an offset in the IO.foreach loop in ruby?

I'm using the IO.foreach loop to find a string using regular expressions. I want to append the next block (next line) to the file_names list. How can I do that?
file_names = [""]
IO.foreach("a.txt") { |block|
if block =~ /^file_names*/
dir = # get the next block
file_names.append(dir)
end
}
Actually my input looks like this:
file_names[174]:
name: "vector"
dir_index: 1
mod_time: 0x00000000
length: 0x00000000
file_names[175]:
name: "stl_bvector.h"
dir_index: 2
mod_time: 0x00000000
length: 0x00000000
I have a list of file_names, and I want to capture the name, dir_index, mod_time and length properties of each entry and put them into the file_names array at the index given in the text.
You can use #each_cons to get the value of the next 4 rows from the text file:
files = IO.foreach("text.txt").each_cons(5).with_object([]) do |block, o|
if block[0] =~ /file_names.*/
o << block[1..4].map{|e| e.split(':')[1]}
end
end
puts files
#=> "vector"
# 1
# 0x00000000
# 0x00000000
# "stl_bvector.h"
# 2
# 0x00000000
# 0x00000000
Keep in mind that the files array contains subarrays of 4 elements. If the : symbol occurs later in the lines, you could replace the third line of my code with this:
o << block[1..4].map{ |e| e.partition(':').last.strip}
I also added #strip in case you want to remove the whitespace around the values. With this line changed, the actual array will look something like this:
p files
#=>[["\"vector\"", "1", "0x00000000", "0x00000000"], ["\"stl_bvector.h\"", "2", "0x00000000", "0x00000000"]]
(the values don't contain the \ escape character, that's just the way #p shows it).
Another option: if you know that the pattern of one file_names header followed by four values holds throughout the text file, and the file always starts with a header, you can replace #each_cons with #each_slice and remove the regex completely. This also speeds up the process:
IO.foreach("text.txt").each_slice(5).with_object([]) do |block, o|
o << block[1..4].map{ |e| e.partition(':').last.strip }
end
It's actually pretty easy to carve up a series of lines based on a pattern using slice_before:
File.readlines("data.txt").slice_before(/\Afile_names/)
Now you have an array of arrays that looks like:
[
[
"file_names[174]:\n",
" name: \"vector\"\n",
" dir_index: 1\n",
" mod_time: 0x00000000\n",
" length: 0x00000000\n"
],
[
"file_names[175]:\n",
" name: \"stl_bvector.h\"\n",
" dir_index: 2\n",
" mod_time: 0x00000000\n",
" length: 0x00000000"
]
]
Each of these groups could be transformed further, for example into a Ruby Hash using those keys.
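One possible way to do that transformation is sketched below. It assumes every group has a file_names[N]: header followed by key: value lines, and it inlines the sample input so it runs without a file. The names files, index and attrs are chosen for this illustration only.

```ruby
# Sample input from the question, inlined so the sketch runs without a file.
lines = <<~TEXT.lines
  file_names[174]:
    name: "vector"
    dir_index: 1
    mod_time: 0x00000000
    length: 0x00000000
  file_names[175]:
    name: "stl_bvector.h"
    dir_index: 2
    mod_time: 0x00000000
    length: 0x00000000
TEXT

# Group the lines so each group starts at a "file_names[...]" header,
# then turn the remaining "key: value" lines of each group into a Hash.
pairs = lines.slice_before(/\Afile_names/).map do |header, *fields|
  index = header[/\[(\d+)\]/, 1].to_i
  attrs = fields.map do |field|
    key, _, value = field.partition(':')
    [key.strip, value.strip]
  end.to_h
  [index, attrs]
end
files = pairs.to_h

p files.keys               # [174, 175]
p files[174]["name"]       # "\"vector\""
p files[175]["dir_index"]  # "2"
```

The values stay as strings exactly as they appear in the text (including the double quotes around the name), so any conversion to integers or unquoted names is left to the caller.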

How to parse username, ID or whole part using Ruby Regex in this sentence?

I have a sentence like this:
Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?
What I want is to get #[Pratha](user:1) and #[John](user:3), either as names and ids or just as the text I quoted, so that I can split it and parse the name and id myself.
But there is an issue here. The names Pratha and John may include non-alphabetic characters like ', ,, -, +, etc., but not [] or ().
What I tried so far:
c = ''
f = c.match(/(?:\s|^)(?:#(?!(?:\d+|\w+?_|_\w+?)(?:\s(\[)|$)))(\w+)(?=\s|$)/i)
But no success.
You may use
/#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
Details
# - a # char
\[ - a [
([^\]\[]*) - Group 1: 0+ chars other than [ and ]
\] - a ] char
\( - a ( char
[^()]* - 0+ chars other than ( and )
: - a colon
(\d+) - Group 2: 1 or more digits
\) - a ) char.
Sample Ruby code:
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"
rx = /#\[([^\]\[]*)\]\([^()]*:(\d+)\)/
res = s.scan(rx)
p res
# => [["Pratha", "1"], ["John", "3"]]
"Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?".scan(/#.*?\)/)
#⇒ ["#[Pratha](user:1)", "#[John](user:3)"]
Since the line does not come from user input, you can rely on the part you are interested in starting with # and ending with ).
You could use 2 capturing groups to get the names and the id's:
#\[([^]]+)]\([^:]+:([^)]+)\)
That will match
# Match # literally
\[ Match [
([^]]+) 1st capturing group, which matches any char except ] 1+ times using a negated character class
] Match ]
\( Match (
[^:]+: Match any char except : 1+ times, then match :
([^)]+) 2nd capturing group, which matches any char except ) 1+ times
\) Match )
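A minimal Ruby sketch of how this pattern could be applied with String#scan follows. It escapes the ] inside the character class, which keeps Ruby from warning about it, and the variable names mention, users and texts are illustrative rather than part of the answer.

```ruby
s = "Hello #[Pratha](user:1), did you see #[John](user:3)'s answer?"

# Same idea as the pattern above, with the ] inside the character class escaped.
mention = /#\[([^\]]+)\]\([^:]+:([^)]+)\)/

# With capture groups, scan returns one [name, id] pair per match.
users = s.scan(mention).map { |name, id| [name, id.to_i] }

# Without capture groups, scan returns the whole matched text instead.
texts = s.scan(/#\[[^\]]+\]\([^)]+\)/)

p users  # [["Pratha", 1], ["John", 3]]
p texts  # ["#[Pratha](user:1)", "#[John](user:3)"]
```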

Create array from csv using readlines ruby

I can't seem to get this to work.
I know I can do this with the csv gem, but I'm trying out new stuff and I want to do it this way. All I'm trying to do is read lines in from a CSV file and create one array from each line. I then want to output the second element of each array.
So far I have
filed = "/Users/me/Documents/Workbook3.csv"
if File.exist?(filed)
File.readlines(filed).map { |d| puts d.split(",").to_a }
else
puts "No file here"
end
The problem is that this creates one array which has all the lines in it whereas I want a separate array for each line (perhaps an array of arrays?)
Test data
Trade date,Settle date,Reference,Description,Unit cost (p),Quantity,Value (pounds)
04/09/2014,09/09/2014,S5411,Plus500 Ltd ILS0.01 152 # 419,419,152,624.93
02/09/2014,05/09/2014,B5406,Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75,4284.75,150,-6439.08
29/08/2014,03/09/2014,S5398,Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84,1116.84,520,5795.62
What I would like
S5411
B5406
S5398
Let's write your data to a file:
s =<<THE_BITTER_END
Trade date,Settle date,Reference,Description,Unit cost (p),Quantity,Value (pounds)
04/09/2014,09/09/2014,S5411,Plus500 Ltd ILS0.01 152 # 419,419,152,624.93
02/09/2014,05/09/2014,B5406,Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75,4284.75,150,-6439.08
29/08/2014,03/09/2014,S5398,Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84,1116.84,520,5795.62
THE_BITTER_END
IO.write('temp',s)
#=> 363
We can then do this:
arr = File.readlines('temp').map { |s| s.split(',') }
#=> [["Trade date", "Settle date", "Reference", "Description", "Unit cost (p)",
"Quantity", "Value (pounds)\n"],
["04/09/2014", "09/09/2014", "S5411",
"Plus500 Ltd ILS0.01 152 # 419", "419", "152", "624.93\n"],
["02/09/2014", "05/09/2014", "B5406",
"Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75",
"4284.75", "150", "-6439.08\n"],
["29/08/2014", "03/09/2014", "S5398",
"Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84", "1116.84",
"520", "5795.62\n"]]
The values you want start at the second element of arr (the first element is the header row) and are the third element of each of those arrays. Therefore, you can pluck them out as follows:
arr[1..-1].map { |a| a[2] }
#=> ["S5411", "B5406", "S5398"]
Adopting @Stefan's suggestion of putting [2] within the block containing split, we can write this more compactly as follows:
File.readlines('temp')[1..-1].map { |s| s.split(',')[2] }
#=> ["S5411", "B5406", "S5398"]
You can also use Ruby's standard CSV library to do this very easily.
require "csv"
s =<<THE_BITTER_END
Trade date,Settle date,Reference,Description,Unit cost (p),Quantity,Value (pounds)
04/09/2014,09/09/2014,S5411,Plus500 Ltd ILS0.01 152 # 419,419,152,624.93
02/09/2014,05/09/2014,B5406,Biomarin Pharmaceutical Com Stk USD0.001 150 # 4284.75,4284.75,150,-6439.08
29/08/2014,03/09/2014,S5398,Hargreaves Lansdown plc Ordinary 0.4p 520 # 1116.84,1116.84,520,5795.62
THE_BITTER_END
arr = CSV.parse(s, :headers=>true).collect { |row| row["Reference"] }
p arr
#=> ["S5411", "B5406", "S5398"]
PS: I have borrowed the string from @Cary's answer.
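The two approaches above give the same result on the question's data, but they can differ once a field contains a quoted comma. The sketch below uses a hypothetical row (not taken from the question) to show the difference; the names naive and rows are illustrative.

```ruby
require "csv"

# Hypothetical data (not from the question): the description contains a comma
# and is therefore quoted.
data = <<~ROWS
  Trade date,Settle date,Reference,Description
  04/09/2014,09/09/2014,S5411,"Plus500 Ltd, ILS0.01"
ROWS

# Plain split treats the comma inside the quotes as a separator,
# so the four-column row comes back as five pieces.
naive = data.lines.drop(1).map { |line| line.chomp.split(",") }
p naive.first.size  # 5

# CSV.parse understands the quoting and exposes fields by header name.
rows = CSV.parse(data, headers: true)
p rows.first["Description"]            # "Plus500 Ltd, ILS0.01"
p rows.map { |row| row["Reference"] }  # ["S5411"]
```

For data like the question's, where no field is quoted, the readlines and split version is enough. The CSV version is the safer choice when the file's contents are not under your control.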

Splitting a single string of hashes into an array of hashes

I can't get regex to split the string to give the desired result.
http://rubular.com/r/ytFwP3ivAv - according to rubular this expression should work.
str = "{"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)", "LOCATION"=>"12 S THORNTON AV", "DISTRICT"=>"C5", "INCIDENT"=>"2013-00496193"}, {"DATE"=>"11/26/2013 11:10", "DESC"=>"Hold-up alarm", "LOCATION"=>"4725 S KIRKMAN RD", "DISTRICT"=>"E5", "INCIDENT"=>"2013-00496235"}"
sub_str_array = str.split(/({"[\w"=>\/ :,()-]*})/)
# the desired result - each hash is an element in an array
puts the_split[0] #=> {"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)", "LOCATION"=>"12 S THORNTON AV", "DISTRICT"=>"C5", "INCIDENT"=>"2013-00496193"}
Is there another way (an easier way) to convert these string hashes into an array of hashes?
You can use this:
require 'json'
yourstr = '[' + '{"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)", "LOCATION"=>"12 S THORNTON AV", "DISTRICT"=>"C5", "INCIDENT"=>"2013-00496193"}, {"DATE"=>"11/26/2013 11:10", "DESC"=>"Hold-up alarm", "LOCATION"=>"4725 S KIRKMAN RD", "DISTRICT"=>"E5", "INCIDENT"=>"2013-00496235"}, {"DATE"=>"11/26/2013 11:08", "DESC"=>"Missing person - adult", "LOCATION"=>"4818 S SEMORAN BV 503", "DISTRICT"=>"K1", "INCIDENT"=>"2013-00496198"}, {"DATE"=>"11/26/2013 11:07", "DESC"=>"911 hang up", "LOCATION"=>"311 W PRINCETON ST", "DISTRICT"=>"C2", "INCIDENT"=>"2013-00496231"}' + ']'
my_hash = JSON.parse(yourstr.gsub("=>", ":"))
puts my_hash[0]
Your str is not a valid string literal: the unescaped double quotes inside it end the string early. Wrap it in single quotes and it should work.
It may be better to use %Q(string goes here) rather than double quotes.
You can use eval "[#{str}]", if str is hardcoded and nobody can change it.
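If eval is not an option, one possible alternative is to pull out each {...} group with scan and parse it as JSON. The sketch below abbreviates the question's hashes to three keys each, and it assumes that the hashes are flat (no nested braces) and that no value contains the text =>.

```ruby
require "json"

# Two of the hashes from the question (abbreviated), written as one single-quoted string.
str = '{"DATE"=>"11/26/2013 11:15", "DESC"=>"Accident (minor)", "INCIDENT"=>"2013-00496193"}, ' \
      '{"DATE"=>"11/26/2013 11:10", "DESC"=>"Hold-up alarm", "INCIDENT"=>"2013-00496235"}'

# Pull out each {...} group, rewrite Ruby's "=>" as JSON's ":", and parse it.
# Assumes no nested braces and no "=>" inside the values.
hashes = str.scan(/\{[^{}]*\}/).map { |chunk| JSON.parse(chunk.gsub("=>", ":")) }

p hashes.size        # 2
p hashes[0]["DATE"]  # "11/26/2013 11:15"
p hashes[1]["DESC"]  # "Hold-up alarm"
```

Unlike eval, this never executes the contents of the string, so malformed or hostile input raises a JSON::ParserError instead of running as code.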

How to calculate a formula with undefined variable

I am trying to eval a string containing an undefined variable. For example, for the formula 2 * 3 + a, the result should be the string 6 + a. Can the eval method do something like that? Or can you give me some ideas on how to do this?
Update: Thank you for all the input. I guess this is not as simple as I thought it would be. What if I don't need to simplify the formula and only need to replace the variable with its value in the string?
Example:
a = { "Bob" => 82,
"Jim" => 94,
"Billy" => 58, ........ and more}
How do I convert this string
"2 * 3 + a["Bob"] * b"
to this: "2 * 3 + 82 * b"
Thanks again for your help.
What you are trying to do is complicated, and I don't think it is worth doing it.
You can use some gem to parse the string into a tree of tokens. Then, look for any node under which there is no undefined variable, and replace the node with the calculated value. After doing that, put the tree back to a string.
What about using something like Parslet and making a parsing expression grammar to simplify your expressions? Look at the get started page where you are walked through a simple example in which Parslet reduces integer expressions.
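For the simpler case described in the question's update, where values are only substituted into the string and nothing is simplified, a plain gsub with a block may be enough. This is a minimal sketch: it assumes the hash is always referenced as a["Name"] with a double-quoted key, and the names formula and substituted are made up for the example.

```ruby
a = { "Bob" => 82, "Jim" => 94, "Billy" => 58 }

formula = '2 * 3 + a["Bob"] * b'

# Replace every a["Name"] reference with the value stored under that key.
# Hash#fetch raises KeyError for an unknown name instead of silently inserting nothing.
# The \b keeps the pattern from matching inside a longer identifier such as data["Bob"].
substituted = formula.gsub(/\ba\["([^"]+)"\]/) { a.fetch(Regexp.last_match(1)).to_s }

puts substituted  # 2 * 3 + 82 * b
```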
Here is a naïve solution:
str = "+ 1 * 2 - 3 / 2 + b - a"
terms = str.gsub(/^[^+-]+|[+-][^+-]+/).to_a
=> ["+ 1 * 2 ",
"- 3 / 2 ",
"+ b ",
"- a"]
numbers, variables = terms.partition { |exp| eval(exp.to_s) rescue false }
=> [["+ 1 * 2 ", "- 3 / 2 "],
["+ b ", "- a"]]
numbers.map! { |exp| exp.gsub(/\d+/, &:to_f) }
=> ["+ 1.0 * 2.0 ",
"- 3.0 / 2.0 "]
numbers.map! { |exp| eval(exp) }
=> [2.0, -1.5]
sum = numbers.reduce(:+)
=> 0.5
result = "#{sum} #{variables.join}"
=> "0.5 + b - a"
