Ruby regex: scan all - ruby

I have a string:
TFS[MAD,GRO,BCN],ALC[GRO,PMI,ZAZ,MAD,BCN],BCN[ALC,...]...
I want to convert it into a list:
list = (
  [0] => "TFS"
    [0] => "MAD"
    [1] => "GRO"
    [2] => "BCN"
  [1] => "ALC"
    [0] => "GRO"
    [1] => "PMI"
    [2] => "ZAZ"
    [3] => "MAD"
    [4] => "BCN"
  [2] => "BCN"
    [1] => "ALC"
    [2] => ...
    [3] => ...
)
How do I do this in Ruby?
I tried:
(([A-Z]{3})\[([A-Z]{3},+))
But it returns only the first element in [] and doesn't make a comma optional (at the end of "]").

You need to tell the regex that the , is not required after each element, but instead in front of each argument except the first. This leads to the following regex:
str="TFS[MAD,GRO,BCN],ALC[GRO,PMI,ZAZ,MAD,BCN],BCN[ALC]"
str.scan(/[A-Z]{3}\[[A-Z]{3}(?:,[A-Z]{3})*\]/)
#=> ["TFS[MAD,GRO,BCN]", "ALC[GRO,PMI,ZAZ,MAD,BCN]", "BCN[ALC]"]
You can also use scan's behavior with capturing groups, to split each match into the part before the brackets and the part inside the brackets:
str.scan(/([A-Z]{3})\[([A-Z]{3}(?:,[A-Z]{3})*)\]/)
#=> [["TFS", "MAD,GRO,BCN"], ["ALC", "GRO,PMI,ZAZ,MAD,BCN"], ["BCN", "ALC"]]
You can then use map to split each part inside the brackets into multiple tokens:
str.scan(/([A-Z]{3})\[([A-Z]{3}(?:,[A-Z]{3})*)\]/).map do |x,y|
  [x, y.split(",")]
end
#=> [["TFS", ["MAD", "GRO", "BCN"]],
# ["ALC", ["GRO", "PMI", "ZAZ", "MAD", "BCN"]],
# ["BCN", ["ALC"]]]
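If a lookup table is more convenient than nested arrays, the same scan can feed a hash. This is a minimal sketch, not part of the answer above: the helper name parse_groups is made up for illustration, and it assumes each three-letter key appears only once (a repeated key would overwrite the earlier entry).

```ruby
# Hypothetical helper: turn "AAA[BBB,CCC],DDD[EEE]" into
# { "AAA" => ["BBB", "CCC"], "DDD" => ["EEE"] }.
def parse_groups(str)
  pattern = /([A-Z]{3})\[([A-Z]{3}(?:,[A-Z]{3})*)\]/
  # scan yields one [key, "A,B,C"] pair per group because of the two captures
  str.scan(pattern).each_with_object({}) do |(key, items), result|
    result[key] = items.split(",")
  end
end

groups = parse_groups("TFS[MAD,GRO,BCN],ALC[GRO,PMI,ZAZ,MAD,BCN],BCN[ALC]")
groups["ALC"]  # => ["GRO", "PMI", "ZAZ", "MAD", "BCN"]
```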

Here's another way, using a hash to store the contents and less regex.
string = "TFS[MAD,GRO,BCN],ALC[GRO,PMI,ZAZ,MAD,BCN],BCN[ALC]"
z = {}
# drop the final "]" first, then split on the "]," that ends each group
string.chomp("]").split(/\][ \t]*,/).each do |x|
  o, p = x.split("[")
  z[o] = p.split(",")
end
z.each_pair { |x, y| print "#{x}:#{y}\n" }
Output:
$ ruby test.rb
TFS:["MAD", "GRO", "BCN"]
ALC:["GRO", "PMI", "ZAZ", "MAD", "BCN"]
BCN:["ALC"]

First, split the string into groups:
groups = s.scan(/[^,][^\[]*\[[^\[]*\]/)
# => ["TFS[MAD,GRO,BCN]", "ALC[GRO,PMI,ZAZ,MAD,BCN]"]
Now you have the groups, the rest is pretty straightforward:
groups.map {|x| [x[0..2], x[4..-2].split(',')] }
# => [["TFS", ["MAD", "GRO", "BCN"]], ["ALC", ["GRO", "PMI", "ZAZ", "MAD", "BCN"]]]

If I understood correctly, you want an array like this:
yourexamplestring.scan(/([A-Z]{3})\[([^\]]+)/).map{|a,b|[a,b.split(',')]}
# => [["TFS", ["MAD", "GRO", "BCN"]], ["ALC", ["GRO", "PMI", "ZAZ", "MAD", "BCN"]], ["BCN", ["ALC", "..."]]]

Related

Comparing a string with an array

How do I compare "string1" with ["string1"]? The following results in false:
params[:abc] # => "neon green"
@abc # => ["neon green"]
params[:abc] == @abc # => false
You could use Array#include?. However, this will return true if the array contains "string1" and "string2".
["string1"].include?("string1") # => true
["string1", "string2"].include?("string1") # => true
If you want to check that the array contains only that string, I'd recommend the Array method (Kernel#Array), which converts its argument into an array.
Array(["string1"]) == Array("string1") # => true
Array(["string1", "string2"]) == Array("string1") # => false
How it works:
Array(["string1"]) # => ["string1"]
Array("string1") # => ["string1"]
Array(nil) # => []
Another option - put the string inside an array of itself:
[params[:abc]] == @abc # => true
Or, if you don't know which one is an array, use an array-splat ([*]) combination:
[*params[:abc]] == [*@abc] # => true
Array-splat will work in a similar fashion to @Jkarayusuf's Array():
[*["string1"]] # => ["string1"]
[*"string1"] # => ["string1"]
[*nil] # => []
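The Array() and splat approaches above can be wrapped in a small predicate. This is only a sketch: the method name same_as_array? is invented here, and it inherits the behaviour of Kernel#Array, so nil and [] compare as equal and a hash would be converted to an array of key/value pairs.

```ruby
# Hypothetical helper: compare two values after coercing each to an array,
# so it does not matter which side is the string and which is the array.
def same_as_array?(left, right)
  Array(left) == Array(right)
end

same_as_array?("neon green", ["neon green"])         # => true
same_as_array?(["neon green"], "neon green")         # => true
same_as_array?("neon green", ["neon green", "red"])  # => false
same_as_array?(nil, [])                              # => true
```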
You can wrap the string in an array, or extract the string from the array:
[params[:abc]] == @abc
or
params[:abc] == @abc.first
I kind of like the first one more.
I'd do:
@abc = @abc.join('')
#=> "neon green"
if params[:abc] == @abc
  # do thing 1
else
  # do thing 2
end
Try this (Object#in? comes from ActiveSupport, so it is available in Rails):
params[:abc].in? @abc

How can I upcase first occurrence of an alphabet in alphanumeric string?

Is there any easy way to convert strings like 3500goat to 3500Goat and goat350rat to Goat350rat?
I am trying to convert the first letter in an alphanumeric string to uppercase. I tried the code below, using the sub! method, but had no luck.
stringtomigrate = '3500goat'
stringtomigrate.sub!(/\D{0,1}/) do |w|
  w.capitalize
end
This should work:
string.sub(/[a-zA-Z]/) { |s| s.upcase }
or a shorthand:
string.sub(/[a-zA-Z]/, &:upcase)
examples:
'3500goat'.sub(/[a-zA-Z]/, &:upcase)
# => "3500Goat"
'goat350rat'.sub(/[a-zA-Z]/, &:upcase)
# => "Goat350rat"
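For reuse, the one-liner can live in a small method. This is a sketch with an invented name (upcase_first_letter); like the answer above, it only treats ASCII letters as letters, and it returns a new string rather than modifying the argument.

```ruby
# Hypothetical helper: upcase the first ASCII letter, leave everything else alone.
def upcase_first_letter(str)
  # sub replaces only the first match; &:upcase is shorthand for { |s| s.upcase }
  str.sub(/[a-zA-Z]/, &:upcase)
end

upcase_first_letter('3500goat')    # => "3500Goat"
upcase_first_letter('goat350rat')  # => "Goat350rat"
upcase_first_letter('12345')       # => "12345" (no letter, so nothing changes)
```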
Try this
1.9.3-p545 :060 > require 'active_support/core_ext'
=> true
1.9.3-p545 :099 > "goat350rat to Goat350rat".sub(/[a-zA-Z]/){ |x| x.titleize}
=> "Goat350rat to Goat350rat"

Obfuscating numbers in a string?

I have a challenge that calls for obfuscating numbers in a string, such as an SSN, for example: XXX-XX-4430. I've gotten pretty close:
def hide_all_ssns(string)
string.scan(/\w{3}-\w{2}-\w{4}/)
string.gsub('/\w{3}-\w{2}', 'XXX-XX')
end
but I get an error:
Error! hide_all_ssns obfuscates any SSNs in the string expected:
"XXX-XX-1422, XXX-XX-0744, XXX-XX-8762" got: "234-60-1422,
350-80-0744, 013-60-8762" (using ==)
I initially had the regular-expression (/\d{3}-\d{2}-\d{4}/) but thought that the problem was attempting to convert the integers in the string to X. Now I'm using \w, yet I am getting the same error.
Does anyone have any insight? I'm a newbie to coding and have exhausted Ruby-doc, as well as any blogs I can find on regex/gsub, but I am getting nowhere.
You're misusing gsub (your regular expression needs to be between forward slashes, not inside a quoted string), but I still think gsub! is what you want...
def hide_all_ssns(string)
  string.scan(/\w{3}-\w{2}-\w{4}/)
  string.gsub!(/\w{3}-\w{2}/, 'XXX-XX')
end
Working example:
1.9.3p448 :063 > string = "123-45-6789"
=> "123-45-6789"
1.9.3p448 :064 > def hide_all_ssns(string)
1.9.3p448 :065?> string.scan(/\w{3}-\w{2}-\w{4}/)
1.9.3p448 :066?> string.gsub!(/\w{3}-\w{2}/, 'XXX-XX')
1.9.3p448 :067?> end
=> nil
1.9.3p448 :068 > hide_all_ssns(string)
=> "XXX-XX-6789"
1.9.3p448 :069 > string
=> "XXX-XX-6789"
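One thing to keep in mind with gsub!: it returns nil when nothing was replaced, and it modifies the caller's string. A variant that avoids both, offered only as a sketch, uses the non-destructive gsub and a capture group to keep the last four digits. The pattern assumes SSNs always appear as three digits, two digits, and four digits separated by hyphens.

```ruby
# Sketch: mask the first five digits of every SSN-shaped token in the string.
def hide_all_ssns(string)
  # \1 in the single-quoted replacement refers to the captured last four digits
  string.gsub(/\b\d{3}-\d{2}-(\d{4})\b/, 'XXX-XX-\1')
end

hide_all_ssns("234-60-1422, 350-80-0744, 013-60-8762")
# => "XXX-XX-1422, XXX-XX-0744, XXX-XX-8762"
hide_all_ssns("no numbers here")
# => "no numbers here"
```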
Why does it have to be so hard? All U.S. social security numbers are the same format, right? So, work from that point. Here are some variations on a theme, ordered by escalating obscurity:
ssn = '123-45-6789' # => "123-45-6789"
ssn[0, 6] = 'XXX-XX' # => "XXX-XX"
ssn # => "XXX-XX-6789"
Or:
numbers = ssn.scan(/\d+/) # => ["123", "45", "6789"]
'XXX-XX-' + numbers.last # => "XXX-XX-6789"
Or:
ssn = '123-45-6789' # => "123-45-6789"
ssn[0, 6] = ssn[0, 6].gsub(/\d/, 'X') # => "XXX-XX"
ssn # => "XXX-XX-6789"
Or:
ssn[0,6] = ssn[0, 6].tr('0-9', 'X') # => "XXX-XX"
ssn # => "XXX-XX-6789"
Or:
numbers = ssn.split('-') # => ["123", "45", "6789"]
[*numbers[0, 2].map{ |s| 'X' * s.size }, numbers[-1]].join('-') # => "XXX-XX-6789"
Or:
ssn[/(\d+)-(\d+)-(\d+)/] # => "123-45-6789"
[$1, $2, $3] # => ["123", "45", "6789"]
[$3, *[$2, $1].map{ |s| s.gsub(/./, 'X') }].reverse.join('-') # => "XXX-XX-6789"
Of course, using one of these would be cheating, since you're supposed to figure the challenge out by yourself, but they're good food for thought and a decent starting point for your own solution.
Short and simple... You could maybe try something like this:
crypted = ('X' * 6) + "4543-2329-1354-1111".to_s[14..18]
=> "XXXXXX-1111"

Ruby multiple string replacement

str = "Hello☺ World☹"
Expected output is:
"Hello:) World:("
I can do this: str.gsub("☺", ":)").gsub("☹", ":(")
Is there another way to do this in a single function call? Something like:
str.gsub(['s1', 's2'], ['r1', 'r2'])
Since Ruby 1.9.2, String#gsub accepts a hash as its second argument and replaces each match with the value stored under the matched text. You can use a regular expression to match the substrings that need replacing and pass a hash of replacement values.
Like this:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
'(0) 123-123.123'.gsub(/[()\-,. ]/, '') #=> "0123123123"
In Ruby 1.8.7, you would achieve the same with a block:
dict = { 'e' => 3, 'o' => '*' }
'hello'.gsub(/[eo]/) do |match|
  dict[match.to_s]
end #=> "h3ll*"
Set up a mapping table:
map = {'☺' => ':)', '☹' => ':(' }
Then build a regex:
re = Regexp.new(map.keys.map { |x| Regexp.escape(x) }.join('|'))
And finally, gsub:
s = str.gsub(re, map)
If you're stuck in 1.8 land, then:
s = str.gsub(re) { |m| map[m] }
You need the Regexp.escape in there in case anything you want to replace has a special meaning within a regex. Or, thanks to steenslag, you could use:
re = Regexp.union(map.keys)
and the quoting will be taken care of for you.
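Put together, the mapping table, Regexp.union, and the hash form of gsub fit in one small method. This is a sketch with invented names (EMOTICONS, replace_all). Note that Regexp.union tries the keys in order, so if one key is a prefix of another, the longer key should come first in the hash.

```ruby
# Mapping from the question; the constant name is made up for this sketch.
EMOTICONS = { "☺" => ":)", "☹" => ":(" }.freeze

# Hypothetical helper: replace every key of mapping found in str with its value,
# in a single pass over the string.
def replace_all(str, mapping)
  # Regexp.union escapes each key, so keys like "." are matched literally
  str.gsub(Regexp.union(mapping.keys), mapping)
end

replace_all("Hello☺ World☹", EMOTICONS)  # => "Hello:) World:("
replace_all("a.b.c", "." => "-")         # => "a-b-c"
```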
You could do something like this:
replacements = [ ["☺", ":)"], ["☹", ":("] ]
replacements.each {|replacement| str.gsub!(replacement[0], replacement[1])}
There may be a more efficient solution, but this at least makes the code a bit cleaner
Late to the party but if you wanted to replace certain chars with one, you could use a regex
string_to_replace.gsub(/_|,| /, '-')
In this example, gsub replaces underscores (_), commas (,), or spaces ( ) with a dash (-).
Another simple way, and yet easy to read is the following:
str = '12 ene 2013'
map = {'ene' => 'jan', 'abr'=>'apr', 'dic'=>'dec'}
map.each {|k,v| str.sub!(k,v)}
puts str # '12 jan 2013'
You can also use tr to replace multiple characters in a string at once,
Eg., replace "h" to "m" and "l" to "t"
"hello".tr("hl", "mt")
=> "metto"
looks simple, neat and faster (not much difference though) than gsub
puts Benchmark.measure {"hello".tr("hl", "mt") }
0.000000 0.000000 0.000000 ( 0.000007)
puts Benchmark.measure{"hello".gsub(/[hl]/, 'h' => 'm', 'l' => 't') }
0.000000 0.000000 0.000000 ( 0.000021)
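A single call is too short to time reliably, so a fairer comparison repeats each call many times. The following is a rough sketch only: the iteration count and the names ITERATIONS, TR_MAPPING, via_tr, and via_gsub are chosen for illustration, and the numbers it prints will vary by machine and Ruby version.

```ruby
require 'benchmark'

ITERATIONS = 100_000  # arbitrary count, just large enough to measure
TR_MAPPING = { 'h' => 'm', 'l' => 't' }.freeze

def via_tr(str)
  str.tr("hl", "mt")
end

def via_gsub(str)
  str.gsub(/[hl]/, TR_MAPPING)
end

# Benchmark.bm prints user, system, total, and real time for each report
Benchmark.bm(5) do |x|
  x.report("tr")   { ITERATIONS.times { via_tr("hello") } }
  x.report("gsub") { ITERATIONS.times { via_gsub("hello") } }
end
```

Keep in mind that tr only maps single characters to single characters, so it cannot replace the gsub version when a replacement is longer than one character.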
Riffing on naren's answer above, I'd go with
tr = {'a' => '1', 'b' => '2', 'z' => '26'}
mystring.gsub(Regexp.union(tr.keys), tr)
So
'zebraazzeebra'.gsub(Regexp.union(tr.keys), tr) returns
"26e2r112626ee2r1"

Reading in a fixed number of pipe delimited fields per row?

I have a bunch of pipe-delimited files that weren't properly escaped for carriage returns when generated, so I can't use the CR or newline characters to delimit the rows. I DO know, however, that each record has to have exactly 7 fields.
Splitting the fields is easy with the CSV library in Ruby 1.9 setting the 'col_sep' argument, but the 'row_sep' argument cannot be set because I have newlines within the fields.
Is there a way to parse a pipe-delimited file using a fixed number of fields as the row delimiter?
Thanks!
Here's one way of doing it:
Build a sample string of seven words, with an embedded new-line in the
middle of the string. There are three lines worth.
text = (["now is the\ntime for all good"] * 3).join(' ').gsub(' ', '|')
puts text
# >> now|is|the
# >> time|for|all|good|now|is|the
# >> time|for|all|good|now|is|the
# >> time|for|all|good
Process like this:
lines = []
chunks = text.gsub("\n", '|').split('|')
while (chunks.any?)
lines << chunks.slice!(0, 7).join(' ')
end
puts lines
# >> now is the time for all good
# >> now is the time for all good
# >> now is the time for all good
So, that shows we can rebuild the rows.
Pretending that the words are actually columns from the pipe-delimited file we can make the code do the real thing by taking out the .join(' '):
while (chunks.any?)
lines << chunks.slice!(0, 7)
end
ap lines
# >> [
# >> [0] [
# >> [0] "now",
# >> [1] "is",
# >> [2] "the",
# >> [3] "time",
# >> [4] "for",
# >> [5] "all",
# >> [6] "good"
# >> ],
# >> [1] [
# >> [0] "now",
# >> [1] "is",
# >> [2] "the",
# >> [3] "time",
# >> [4] "for",
# >> [5] "all",
# >> [6] "good"
# >> ],
# >> [2] [
# >> [0] "now",
# >> [1] "is",
# >> [2] "the",
# >> [3] "time",
# >> [4] "for",
# >> [5] "all",
# >> [6] "good"
# >> ]
# >> ]
Say for instance you wanted to parse all charities in the IRS txt file that is pipe delimited.
Say you had a model called Charity that had all the same fields as your pipe delimited file.
class Charity < ActiveRecord::Base
# http://apps.irs.gov/app/eos/forwardToPub78DownloadLayout.do
# http://apps.irs.gov/app/eos/forwardToPub78Download.do
attr_accessible :city, :country, :deductibility_status, :deductibility_status_description, :ein, :legal_name, :state
end
You can make a rake task called import.rake
namespace :import do
  desc "Import Pipe Delimited IRS 5013c Data"
  task :irs_data => :environment do
    require 'csv'
    txt_file_path = 'db/irs_5013cs.txt'
    # readlines returns every line of the file as an array of strings
    results = File.readlines(txt_file_path)
    # Order Field Notes
    # 1 EIN Required
    # 2 Legal Name Optional
    # 3 City Optional
    # 4 State Optional
    # 5 Deductibility Status Optional
    # 6 Country Optional - If Country is null, then Country is assumed to be United States
    # 7 Deductibility Status Description Optional
    results.each do |row|
      # strip the line ending, then keep the first seven pipe-delimited fields
      row = row.chomp.split('|').first(7)
      #ID,Category,Sub Category,State Standard
      Charity.create!({
        :ein => row[0],
        :legal_name => row[1],
        :city => row[2],
        :state => row[3],
        :deductibility_status => row[4],
        :country => row[5],
        :deductibility_status_description => row[6]
      })
    end
  end
end
Finally, you can run this import by typing the following on the command line from your Rails app:
rake import:irs_data
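The mapping from field position to attribute name can also be written without indexing each column by hand. This is a plain-Ruby sketch that needs no Rails: the names CHARITY_FIELDS and charity_attributes are invented, the field order is taken from the comment table in the task above, and the sample line is made-up data.

```ruby
# Field order as listed in the task's comments (EIN first, description last).
CHARITY_FIELDS = %i[
  ein legal_name city state
  deductibility_status country deductibility_status_description
].freeze

# Hypothetical helper: pair each field name with the value at the same position.
def charity_attributes(line)
  # the -1 limit keeps empty trailing fields instead of dropping them
  values = line.chomp.split('|', -1)
  CHARITY_FIELDS.zip(values).to_h
end

attrs = charity_attributes("123456789|Some Charity|Austin|TX|PC||Public charity\n")
attrs[:legal_name]  # => "Some Charity"
attrs[:country]     # => ""
```

The resulting hash could then be passed to Charity.create! in place of the hand-written one.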
Here's one idea, use a regex:
#!/opt/local/bin/ruby
# File.read loads the whole file; gets would return only its first line
text = File.read("pipe_delim.txt")
r1 = /.*?\|.*?\|.*?\|.*?\|.*?\|.*?\|.*?\|/m
results = text.scan(r1)
results.each do |result|
puts result
end
This regex seems to trip up on newlines within a field, but I'm sure you could tweak it to work properly.
Just a thought, but the cucumber testing gem has a Cucumber::Ast::Table class you could use to process this file.
Cucumber::Ast::Table.new(File.read(file))
Then I think it's the rows method you can use to read it out.
Try using String#split and Enumerable#each_slice:
result = []
text.split('|').each_slice(7) { |record| result << record }
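One wrinkle with splitting on pipes alone is that the last field of one record and the first field of the next are separated by a newline, not a pipe, so they land in the same token. The sketch below handles that under one loudly stated assumption: the last field of a record never contains a newline, so the first newline in a boundary token is the record break. The method name split_records is invented for illustration.

```ruby
# Hypothetical helper: rebuild records of fields_per_record fields each.
# Assumes the LAST field of every record contains no embedded newline.
def split_records(text, fields_per_record = 7)
  tokens = text.chomp.split("|", -1)
  pipes_per_record = fields_per_record - 1
  fields = []
  tokens.each_with_index do |token, i|
    # every 6th token (except the first and last) straddles two records
    boundary = i > 0 && (i % pipes_per_record).zero? && i < tokens.size - 1
    if boundary
      last_field, first_field = token.split("\n", 2)
      raise ArgumentError, "no record break in #{token.inspect}" if first_field.nil?
      fields << last_field << first_field
    else
      fields << token
    end
  end
  fields.each_slice(fields_per_record).to_a
end

rows = split_records("1|a|b\nc|d|e|f|g\n2|h|i|j|k|l|m\n")
rows.size      # => 2
rows.first[2]  # => "b\nc" (the embedded newline survives inside the field)
```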
