Generate unique initial strings from array of strings - ruby

I have a very long array of strings. For example:
["Abyssal Specter", "Air Elemental", "Aladdin's Ring", "Ambition's Cost", "Anaba Shaman", "Angel of Mercy", "Angelic Page", "Archivist", "Ardent Militia", "Avatar of Hope", "Aven Cloudchaser","Aven Fisher"]
Now this array must be passed to a method which should return
[["Abyssal Specter","Ab"], ["Air Elemental", "Ai"], ["Aladdin's Ring","Al"], ["Ambition's Cost","Am"], ["Anaba Shaman","Ana"], ["Angel of Mercy","Angel "], ["Angelic Page","Angeli"], ["Archivist","Arc"], ["Ardent Militia","Ard"], ["Avatar of Hope","Ava"], ["Aven Cloudchaser","Aven C"],["Aven Fisher","Aven F"]]
The method should return the unique initials of each string in the array.
For instance, "Abyssal Specter" should return "Ab" as there is no other string starting with "Ab". Similarly for "Air Elemental" to "Ai". But "Aven Cloudchaser" should return "Aven C", as there is a string "Aven Fisher". In short, it should just generate the unique string initials.

Abbrev in Standard Lib does exactly that:
require 'abbrev'
ar = ["Abyssal Specter", "Air Elemental", "Aladdin's Ring", "Ambition's Cost", "Anaba Shaman", "Angel of Mercy", "Angelic Page", "Archivist", "Ardent Militia", "Avatar of Hope", "Aven Cloudchaser","Aven Fisher"]
p ar.abbrev.invert.to_a
# [["Abyssal Specter", "Ab"], ["Air Elemental", "Ai"], ["Aladdin's Ring", "Al"], ["Ambition's Cost", "Am"], ["Anaba Shaman", "Ana"], ["Angel of Mercy", "Angel "], ["Angelic Page", "Angeli"], ["Archivist", "Arc"], ["Ardent Militia", "Ard"], ["Avatar of Hope", "Ava"], ["Aven Cloudchaser", "Aven C"], ["Aven Fisher", "Aven F"]]
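If the abbrev library is not available (as far as I know it is no longer a default gem in recent Ruby versions and may have to be installed separately), a similar result can be approximated by hand. This is only a rough sketch, not how Abbrev itself is implemented: unique_prefixes is a hypothetical helper name, and a string that is entirely a prefix of another string simply falls back to itself.

```ruby
# Hypothetical helper: shortest prefix of each string that no other string shares.
def unique_prefixes(strings)
  strings.map do |s|
    others = strings - [s]
    # Smallest length n such that no other string starts with s[0, n].
    len = (1..s.length).find do |n|
      others.none? { |o| o.start_with?(s[0, n]) }
    end
    # Fall back to the whole string when no prefix is unique.
    [s, len ? s[0, len] : s]
  end
end

names = ["Angel of Mercy", "Angelic Page", "Aven Cloudchaser", "Aven Fisher"]
p unique_prefixes(names)
```

This compares every string against every other one, so for a very long array the library version is likely the better choice.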

Related

Sort hash by values

This is not how I populated my hash. Just for easier reading, here are its contents; the keys are fixed-length strings:
my %country_hash = (
"001 Sample Name New Zealand" => "NEW ZEALAND",
"002 Samp2 Nam2 Zimbabwe " => "ZIMBABWE",
"003 SSS NNN Australia " => "AUSTRALIA",
"004 John Sample Philippines" => "PHILIPPINES",
);
I want to get the sorted keys based on values. So my expectation:
"003 SSS NNN Australia "
"001 Sample Name New Zealand"
"004 John Sample Philippines"
"002 Samp2 Nam2 Zimbabwe "
What I did:
foreach my $line( sort {$country_hash{$a} <=> $country_hash{$b} or $a cmp $b} keys %country_hash ){
print "$line\n";
}
Also (I doubted this would sort, but anyway):
my @sorted = sort { $country_hash{$a} <=> $country_hash{$b} } keys %country_hash;
foreach my $line (@sorted) {
print "$line\n";
}
Neither of them sorted correctly. I hope someone could help.
If you had used warnings, you would have been told that <=> is the wrong operator; it is used for numeric comparison. Use cmp for string comparison instead. Refer to sort.
use warnings;
use strict;
my %country_hash = (
"001 Sample Name New Zealand" => "NEW ZEALAND",
"002 Samp2 Nam2 Zimbabwe " => "ZIMBABWE",
"003 SSS NNN Australia " => "AUSTRALIA",
"004 John Sample Philippines" => "PHILIPPINES",
);
my @sorted = sort { $country_hash{$a} cmp $country_hash{$b} } keys %country_hash;
foreach my $line (@sorted) {
print "$line\n";
}
This prints:
003 SSS NNN Australia
001 Sample Name New Zealand
004 John Sample Philippines
002 Samp2 Nam2 Zimbabwe
This also works (without the extra array):
foreach my $line (sort {$country_hash{$a} cmp $country_hash{$b}} keys %country_hash) {
print "$line\n";
}

How do I search within a nested hash for the value of a specific key?

Say I have a hash like this:
[82] pry(main)> commit2
=> {:sha=>"4df2b779ddfcb27761c71e00e2b241bfa06a0950",
:commit=>
{:author=>
{:name=>"asasa asasa",
:email=>"asa@asasad.com",
:date=>2016-08-06 16:24:04 UTC,
:sha=> "876239789879ab9876c8769287698769876fed"},
:committer=>
{:name=>"asasa asasa",
:email=>"asa@asasad.com",
:date=>2016-08-06 16:26:45 UTC},
:message=>
"applies new string literal convention in activerecord/lib\n\nThe current code base is not uniform. After some discussion,\nwe have chosen to go with double quotes by default.",
:tree=>
{:sha=>"7a83cce62195f7b20afea6d6a8873b953d25cb84",
:url=>
"https://api.github.com/repos/rails/rails/git/trees/7a83cce62195f7b20afea6d6a8873b953d25cb84"},
:url=>
"https://api.github.com/repos/rails/rails/git/commits/4df2b779ddfcb27761c71e00e2b241bfa06a0950",
:comment_count=>0},
:url=>
"https://api.github.com/repos/rails/rails/commits/4df2b779ddfcb27761c71e00e2b241bfa06a0950",
:html_url=>
"https://github.com/rails/rails/commit/4df2b779ddfcb27761c71e00e2b241bfa06a0950",
:comments_url=>
"https://api.github.com/repos/rails/rails/commits/4df2b779ddfcb27761c71e00e2b241bfa06a0950/comments"
}
This hash has many nested hashes, but I want to check to see if any of the nested hashes have a :sha value of 876239789879ab9876c8769287698769876fed.
In the above example, it should return the [:commit][:author] hash, because that one has :sha key whose value is the same as the one we are looking for.
How do I do this?
Here's a recursive method. It returns the first (nested) hash that contains the desired value under any key:
data = {a: {b: :c, d: :e}, f: {g: {h: {i: :j}}}}
def find_value_in_nested_hash(data, desired_value)
data.values.each do |value|
case value
when desired_value
return data
when Hash
f = find_value_in_nested_hash(value, desired_value)
return f if f
end
end
nil
end
p find_value_in_nested_hash(data, :e)
# {:b=>:c, :d=>:e}
With your example:
repo = { sha: '4df2b779ddfcb27761c71e00e2b241bfa06a0950',
commit: { author: { name: 'asasa asasa',
email: 'asa@asasad.com',
date: '2016-08-06 16:24:04 UTC',
sha: '876239789879ab9876c8769287698769876fed' },
committer: { name: 'asasa asasa',
email: 'asa@asasad.com',
date: '2016-08-06 16:26:45 UTC' },
message: "applies new string literal convention in activerecord/lib\n\nThe current code base is not uniform. After some discussion,\nwe have chosen to go with double quotes by default.",
tree: { sha: '7a83cce62195f7b20afea6d6a8873b953d25cb84',
url: 'https://api.github.com/repos/rails/rails/git/trees/7a83cce62195f7b20afea6d6a8873b953d25cb84' },
url: 'https://api.github.com/repos/rails/rails/git/commits/4df2b779ddfcb27761c71e00e2b241bfa06a0950',
comment_count: 0 },
url: 'https://api.github.com/repos/rails/rails/commits/4df2b779ddfcb27761c71e00e2b241bfa06a0950',
html_url: 'https://github.com/rails/rails/commit/4df2b779ddfcb27761c71e00e2b241bfa06a0950',
comments_url: 'https://api.github.com/repos/rails/rails/commits/4df2b779ddfcb27761c71e00e2b241bfa06a0950/comments' }
p find_value_in_nested_hash(repo, '876239789879ab9876c8769287698769876fed')
#=> {:name=>"asasa asasa", :email=>"asa@asasad.com", :date=>"2016-08-06 16:24:04 UTC", :sha=>"876239789879ab9876c8769287698769876fed"}
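The method above matches the value under any key. Since the question asks specifically about the :sha key, a stricter variant could check key and value together. A minimal sketch, where find_hash_with is a hypothetical name and the sample data is a shortened stand-in for the commit hash:

```ruby
# Hypothetical variant: return the first hash (outermost first) whose `key` maps to `value`.
def find_hash_with(data, key, value)
  return data if data[key] == value

  data.each_value do |child|
    next unless child.is_a?(Hash)

    found = find_hash_with(child, key, value)
    return found if found
  end
  nil
end

# Shortened, made-up sample in the same shape as the commit hash.
sample = { sha: 'top', commit: { author: { name: 'a', sha: 'x' }, tree: { sha: 'y' } } }
p find_hash_with(sample, :sha, 'x')
```

Unlike the value-only search, this would not be fooled by another key (say :url) that happens to hold the same string.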

How to extract a number using regular expression in ruby

I am new to regular expressions and Ruby. Below is the example I started working with:
words= "apple[12345]: {123123} boy 1233 6F74 2AC 28458 1594 6532 1500 D242g
apple[13123]: {123123123} girl Aui817E 9AD453 91321SDF 3423FS 1213FDAS 110FADA4 43ADAC0 1AADS4D8 BASAA24 "
I want to extract boy 1233 6F74 .. to .. D242g into an array.
Similarly, I want to extract girl Aui817E 9AD453 .. to .. 43ADAC0 1AADS4D8 BASAA24 into an array.
I tried the following but could not get it to work. Can someone please help me with this simple exercise?
Thanks in advance.
begin
pattern = /apple\[\d+\]: \{\d+\} (\w) (\d+) (\d+) /
f = pattern.match(words)
puts " #{f}"
end
words.scan(/apple\[\d+\]: \{\d+\}(.+)/).map{|a| a.first.scan(/\S+/)}
or
words.each_line.map{|s| s.split.drop(2)}
Output:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]
array = words.scan(/apple\[\d+\]: {\d+}(.+)/).flatten.map { |line| line.scan(/\w+/) }
({ and } do not need to be escaped in this regex.)
This returns:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]
array[0] gives the array starting with "boy", and array[1] gives the array starting with "girl".
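For context, the attempt in the question cannot match because (\w) captures a single character rather than the whole word "boy", and (\d+) cannot match tokens such as 6F74 that contain letters. The following is a small self-contained sketch of the line-by-line approach, using a shortened sample and a named capture (rest is just an illustrative group name):

```ruby
# Shortened sample input in the same shape as the question's data.
words = "apple[12345]: {123123} boy 1233 6F74 2AC\n" \
        "apple[13123]: {123123123} girl Aui817E 9AD453\n"

rows = words.each_line.map do |line|
  # Skip the "apple[...]: {...}" prefix and keep the remainder of the line.
  m = line.match(/\Aapple\[\d+\]: \{\d+\}\s+(?<rest>.+)/)
  # Lines that do not match the prefix yield nil and are dropped by compact.
  m && m[:rest].split
end.compact

p rows
```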

Interpreting this raw text - a strategy?

I have this raw text:
________________________________________________________________________________________________________________________________
Pos Car Competitor/Team Driver Vehicle Cap CL Laps Race.Time Fastest...Lap
1 6 Jason Clements Jason Clements BMW M3 3200 10 9:48.5710 3 0:57.3228*
2 42 David Skillender David Skillender Holden VS Commodore 6000 10 9:55.6866 2 0:57.9409
3 37 Bruce Cook Bruce Cook Ford Escort 3759 10 9:56.4388 4 0:58.3359
4 18 Troy Marinelli Troy Marinelli Nissan Silvia 3396 10 9:56.7758 2 0:58.4443
5 75 Anthony Gilbertson Anthony Gilbertson BMW M3 3200 10 10:02.5842 3 0:58.9336
6 26 Trent Purcell Trent Purcell Mazda RX7 2354 10 10:07.6285 4 0:59.0546
7 12 Scott Hunter Scott Hunter Toyota Corolla 2000 10 10:11.3722 5 0:59.8921
8 91 Graeme Wilkinson Graeme Wilkinson Ford Escort 2000 10 10:13.4114 5 1:00.2175
9 7 Justin Wade Justin Wade BMW M3 4000 10 10:18.2020 9 1:00.8969
10 55 Greg Craig Grag Craig Toyota Corolla 1840 10 10:18.9956 7 1:00.7905
11 46 Kyle Orgam-Moore Kyle Organ-Moore Holden VS Commodore 6000 10 10:30.0179 3 1:01.6741
12 39 Uptiles Strathpine Trent Spencer BMW Mini Cooper S 1500 10 10:40.1436 2 1:02.2728
13 177 Mark Hyde Mark Hyde Ford Escort 1993 10 10:49.5920 2 1:03.8069
14 34 Peter Draheim Peter Draheim Mazda RX3 2600 10 10:50.8159 10 1:03.4396
15 5 Scott Douglas Scott Douglas Datsun 1200 1998 9 9:48.7808 3 1:01.5371
16 72 Paul Redman Paul Redman Ford Focus 2lt 9 10:11.3707 2 1:05.8729
17 8 Matthew Speakman Matthew Speakman Toyota Celica 1600 9 10:16.3159 3 1:05.9117
18 74 Lucas Easton Lucas Easton Toyota Celica 1600 9 10:16.8050 6 1:06.0748
19 77 Dean Fuller Dean Fuller Mitsubishi Sigma 2600 9 10:25.2877 3 1:07.3991
20 16 Brett Batterby Brett Batterby Toyota Corolla 1600 9 10:29.9127 4 1:07.8420
21 95 Ross Hurford Ross Hurford Toyota Corolla 1600 8 9:57.5297 2 1:12.2672
DNF 13 Charles Wright Charles Wright BMW 325i 2700 9 9:47.9888 7 1:03.2808
DNF 20 Shane Satchwell Shane Satchwell Datsun 1200 Coupe 1998 1 1:05.9100 1 1:05.9100
Fastest Lap Av.Speed Is 152kph, Race Av.Speed Is 148kph
R=under lap record by greatest margin, r=under lap record, *=fastest lap time
________________________________________________________________________________________________________________________________
Issue# 2 - Printed Sat May 26 15:43:31 2012 Timing System By NATSOFT (03)63431311 www.natsoft.com.au/results
Amended
I need to parse it into an object with the obvious Position, Car, Driver etc fields. The issue is I have no idea on what sort of strategy to use. If I split it on whitespace, I would end up with a list like so:
["1", "6", "Jason", "Clements", "Jason", "Clements", "BMW", "M3", "3200", "10", "9:48.5710", "3", "0:57.3228*"]
Can you see the issue? I cannot just interpret this list, because people may have just 1 name, or 3 words in a name, or many different words in a car. That makes it impossible to reference the list using indexes alone.
What about using the offsets defined by the column names? I can't quite see how that could be used though.
Edit: So the current algorithm I am using works like this:
Split the text on newlines, giving a collection of lines.
Find the common whitespace characters FURTHEST RIGHT on each line, i.e. the positions (indexes) on each line where every other line also contains whitespace.
Split the lines based on those common characters.
Trim the lines.
Several issues exist:
If the names contain the same lengths like so:
Jason Adams
Bobby Sacka
Jerry Louis
Then it will interpret each name as two separate items: (["Jason", "Adams", "Bobby", "Sacka", "Jerry", "Louis"]).
Whereas if they all differed like so:
Dominic Bou
Bob Adams
Jerry Seinfeld
Then it would correctly split on the last 'd' in Seinfeld, and thus we'd get a collection of three names (["Dominic Bou", "Bob Adams", "Jerry Seinfeld"]).
It's also quite brittle. I am looking for a nicer solution.
This is not a good case for regex; you really want to discover the format and then unpack the lines:
lines = str.split "\n"
# you know the field names so you can use them to find the column positions
fields = ['Pos', 'Car', 'Competitor/Team', 'Driver', 'Vehicle', 'Cap', 'CL Laps', 'Race.Time', 'Fastest...Lap']
header = lines.shift until header =~ /^Pos/
positions = fields.map{|f| header.index f}
# use that to construct an unpack format string
format = 1.upto(positions.length-1).map{|x| "A#{positions[x] - positions[x-1]}"}.join + 'A*'
# A4A5A31A25A21A6A12A10A* (the trailing A* picks up the last column, which has no following offset)
lines.each do |line|
next unless line =~ /^(\d|DNF)/ # skip lines you're not interested in
data = line.unpack(format).map{|x| x.strip}
puts data.join(', ')
# or better yet...
car = Hash[fields.zip data]
puts car['Driver']
end
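To make the header-offset idea easier to try in isolation, here is a minimal sketch on a made-up, much smaller table. The layout string, the four column names and the widths are assumptions for illustration only; the real file has more columns and different widths.

```ruby
# Hypothetical miniature table; the format string guarantees fixed column widths.
layout = '%-4s%-4s%-16s%s'
table = [
  format(layout, 'Pos', 'Car', 'Driver', 'Vehicle'),
  format(layout, '1', '6', 'Jason Clements', 'BMW M3'),
  format(layout, 'DNF', '13', 'Charles Wright', 'BMW 325i')
]

fields = %w[Pos Car Driver Vehicle]
header = table.first
# Column start offsets, found by locating each field name in the header line.
positions = fields.map { |f| header.index(f) }
# One fixed-width "A<n>" directive per column, plus "A*" for the final column.
unpack_format = positions.each_cons(2).map { |a, b| "A#{b - a}" }.join + 'A*'

# The "A" directive strips trailing spaces, so no extra strip is needed here.
rows = table.drop(1).map { |line| fields.zip(line.unpack(unpack_format)).to_h }
p unpack_format
p rows.last
```

Note that header.index finds the first occurrence of each name, so this only works when no column name also appears inside an earlier one.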
Slither, a DSL for parsing fixed-width text files, may solve your problem: http://blog.ryanwood.com/past/2009/6/12/slither-a-dsl-for-parsing-fixed-width-text-files
There are a few more examples, and the project is on GitHub.
Hope this helps!
I think it is easy enough to just use the fixed width on each line.
#!/usr/bin/env ruby
# ruby parsing_winner.rb winners_list.txt
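args = ARGV
# Print usage and stop if no file name was given; otherwise File.open would be called with nil.
abort "Usage: ruby parsing_winner.rb winners_list.txt" if args.empty?
winner_file = File.open(args.shift)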
array_of_race_results, array_of_race_results_array = [], []
class RaceResult
attr_accessor :position, :car, :team, :driver, :vehicle, :cap, :cl_laps, :race_time, :fastest, :fastest_lap
def initialize(position, car, team, driver, vehicle, cap, cl_laps, race_time, fastest, fastest_lap)
@position = position
@car = car
@team = team
@driver = driver
@vehicle = vehicle
@cap = cap
@cl_laps = cl_laps
@race_time = race_time
@fastest = fastest
@fastest_lap = fastest_lap
end
def to_a
# ["1", "6", "Jason", "Clements", "Jason", "Clements", "BMW", "M3", "3200", "10", "9:48.5710", "3", "0:57.3228*"]
[position, car, team, driver, vehicle, cap, cl_laps, race_time, fastest, fastest_lap]
end
end
# Pos Car Competitor/Team Driver Vehicle Cap CL Laps Race.Time Fastest...Lap
# 1 6 Jason Clements Jason Clements BMW M3 3200 10 9:48.5710 3 0:57.3228*
# 2 42 David Skillender David Skillender Holden VS Commodore 6000 10 9:55.6866 2 0:57.9409
# etc...
winner_file.each_line do |line|
next if line[/^____/] || line[/^\w{4,}|^\s|^Pos/] || line[0..3][/\=/]
position = line[0..3].strip
car = line[4..8].strip
team = line[9..39].strip
driver = line[40..64].strip
vehicle = line[65..85].strip
cap = line[86..91].strip
cl_laps = line[92..101].strip
race_time = line[102..113].strip
fastest = line[114..116].strip
fastest_lap = line[117..-1].strip
racer = RaceResult.new(position, car, team, driver, vehicle, cap, cl_laps, race_time, fastest, fastest_lap)
array_of_race_results << racer
array_of_race_results_array << racer.to_a
end
puts "Race Results Objects: #{array_of_race_results}"
puts "Race Results: #{array_of_race_results_array.inspect}"
Output =>
Race Results Objects: [#<RaceResult:0x007fcc4a84b7c8 @position="1", @car="6", @team="Jason Clements", @driver="Jason Clements", @vehicle="BMW M3", @cap="3200", @cl_laps="10", @race_time="9:48.5710", @fastest="3", @fastest_lap="0:57.3228*">, #<RaceResult:0x007fcc4a84aa08 @position="2", @car="42", @team="David Skillender", @driver="David Skillender", @vehicle="Holden VS Commodore", @cap="6000", @cl_laps="10", @race_time="9:55.6866", @fastest="2", @fastest_lap="0:57.9409">, #<RaceResult:0x007fcc4a849ce8 @position="3", @car="37", @team="Bruce Cook", @driver="Bruce Cook", @vehicle="Ford Escort", @cap="3759", @cl_laps="10", @race_time="9:56.4388", @fastest="4", @fastest_lap="0:58.3359">, #<RaceResult:0x007fcc4a8491f8 @position="4", @car="18", @team="Troy Marinelli", @driver="Troy Marinelli", @vehicle="Nissan Silvia", @cap="3396", @cl_laps="10", @race_time="9:56.7758", @fastest="2", @fastest_lap="0:58.4443">, #<RaceResult:0x007fcc4b091ab8 @position="5", @car="75", @team="Anthony Gilbertson", @driver="Anthony Gilbertson", @vehicle="BMW M3", @cap="3200", @cl_laps="10", @race_time="10:02.5842", @fastest="3", @fastest_lap="0:58.9336">, #<RaceResult:0x007fcc4b0916a8 @position="6", @car="26", @team="Trent Purcell", @driver="Trent Purcell", @vehicle="Mazda RX7", @cap="2354", @cl_laps="10", @race_time="10:07.6285", @fastest="4", @fastest_lap="0:59.0546">, #<RaceResult:0x007fcc4b091298 @position="7", @car="12", @team="Scott Hunter", @driver="Scott Hunter", @vehicle="Toyota Corolla", @cap="2000", @cl_laps="10", @race_time="10:11.3722", @fastest="5", @fastest_lap="0:59.8921">, #<RaceResult:0x007fcc4b090e88 @position="8", @car="91", @team="Graeme Wilkinson", @driver="Graeme Wilkinson", @vehicle="Ford Escort", @cap="2000", @cl_laps="10", @race_time="10:13.4114", @fastest="5", @fastest_lap="1:00.2175">, #<RaceResult:0x007fcc4b090a78 @position="9", @car="7", @team="Justin Wade", @driver="Justin Wade", @vehicle="BMW M3", @cap="4000", @cl_laps="10", @race_time="10:18.2020", @fastest="9",
@fastest_lap="1:00.8969">, #<RaceResult:0x007fcc4b090668 @position="10", @car="55", @team="Greg Craig", @driver="Grag Craig", @vehicle="Toyota Corolla", @cap="1840", @cl_laps="10", @race_time="10:18.9956", @fastest="7", @fastest_lap="1:00.7905">, #<RaceResult:0x007fcc4b090258 @position="11", @car="46", @team="Kyle Orgam-Moore", @driver="Kyle Organ-Moore", @vehicle="Holden VS Commodore", @cap="6000", @cl_laps="10", @race_time="10:30.0179", @fastest="3", @fastest_lap="1:01.6741">, #<RaceResult:0x007fcc4b08fe48 @position="12", @car="39", @team="Uptiles Strathpine", @driver="Trent Spencer", @vehicle="BMW Mini Cooper S", @cap="1500", @cl_laps="10", @race_time="10:40.1436", @fastest="2", @fastest_lap="1:02.2728">, #<RaceResult:0x007fcc4b08fa38 @position="13", @car="177", @team="Mark Hyde", @driver="Mark Hyde", @vehicle="Ford Escort", @cap="1993", @cl_laps="10", @race_time="10:49.5920", @fastest="2", @fastest_lap="1:03.8069">, #<RaceResult:0x007fcc4b08f628 @position="14", @car="34", @team="Peter Draheim", @driver="Peter Draheim", @vehicle="Mazda RX3", @cap="2600", @cl_laps="10", @race_time="10:50.8159", @fastest="10", @fastest_lap="1:03.4396">, #<RaceResult:0x007fcc4b08f218 @position="15", @car="5", @team="Scott Douglas", @driver="Scott Douglas", @vehicle="Datsun 1200", @cap="1998", @cl_laps="9", @race_time="9:48.7808", @fastest="3", @fastest_lap="1:01.5371">, #<RaceResult:0x007fcc4b08ee08 @position="16", @car="72", @team="Paul Redman", @driver="Paul Redman", @vehicle="Ford Focus", @cap="2lt", @cl_laps="9", @race_time="10:11.3707", @fastest="2", @fastest_lap="1:05.8729">, #<RaceResult:0x007fcc4b08e9f8 @position="17", @car="8", @team="Matthew Speakman", @driver="Matthew Speakman", @vehicle="Toyota Celica", @cap="1600", @cl_laps="9", @race_time="10:16.3159", @fastest="3", @fastest_lap="1:05.9117">, #<RaceResult:0x007fcc4b08e5e8 @position="18", @car="74", @team="Lucas Easton", @driver="Lucas Easton", @vehicle="Toyota Celica", @cap="1600", @cl_laps="9",
@race_time="10:16.8050", @fastest="6", @fastest_lap="1:06.0748">, #<RaceResult:0x007fcc4b08e1d8 @position="19", @car="77", @team="Dean Fuller", @driver="Dean Fuller", @vehicle="Mitsubishi Sigma", @cap="2600", @cl_laps="9", @race_time="10:25.2877", @fastest="3", @fastest_lap="1:07.3991">, #<RaceResult:0x007fcc4b08ddc8 @position="20", @car="16", @team="Brett Batterby", @driver="Brett Batterby", @vehicle="Toyota Corolla", @cap="1600", @cl_laps="9", @race_time="10:29.9127", @fastest="4", @fastest_lap="1:07.8420">, #<RaceResult:0x007fcc4a848348 @position="21", @car="95", @team="Ross Hurford", @driver="Ross Hurford", @vehicle="Toyota Corolla", @cap="1600", @cl_laps="8", @race_time="9:57.5297", @fastest="2", @fastest_lap="1:12.2672">, #<RaceResult:0x007fcc4a847948 @position="DNF", @car="13", @team="Charles Wright", @driver="Charles Wright", @vehicle="BMW 325i", @cap="2700", @cl_laps="9", @race_time="9:47.9888", @fastest="7", @fastest_lap="1:03.2808">, #<RaceResult:0x007fcc4a847010 @position="DNF", @car="20", @team="Shane Satchwell", @driver="Shane Satchwell", @vehicle="Datsun 1200 Coupe", @cap="1998", @cl_laps="1", @race_time="1:05.9100", @fastest="1", @fastest_lap="1:05.9100">]
Race Results: [["1", "6", "Jason Clements", "Jason Clements", "BMW M3", "3200", "10", "9:48.5710", "3", "0:57.3228*"], ["2", "42", "David Skillender", "David Skillender", "Holden VS Commodore", "6000", "10", "9:55.6866", "2", "0:57.9409"], ["3", "37", "Bruce Cook", "Bruce Cook", "Ford Escort", "3759", "10", "9:56.4388", "4", "0:58.3359"], ["4", "18", "Troy Marinelli", "Troy Marinelli", "Nissan Silvia", "3396", "10", "9:56.7758", "2", "0:58.4443"], ["5", "75", "Anthony Gilbertson", "Anthony Gilbertson", "BMW M3", "3200", "10", "10:02.5842", "3", "0:58.9336"], ["6", "26", "Trent Purcell", "Trent Purcell", "Mazda RX7", "2354", "10", "10:07.6285", "4", "0:59.0546"], ["7", "12", "Scott Hunter", "Scott Hunter", "Toyota Corolla", "2000", "10", "10:11.3722", "5", "0:59.8921"], ["8", "91", "Graeme Wilkinson", "Graeme Wilkinson", "Ford Escort", "2000", "10", "10:13.4114", "5", "1:00.2175"], ["9", "7", "Justin Wade", "Justin Wade", "BMW M3", "4000", "10", "10:18.2020", "9", "1:00.8969"], ["10", "55", "Greg Craig", "Grag Craig", "Toyota Corolla", "1840", "10", "10:18.9956", "7", "1:00.7905"], ["11", "46", "Kyle Orgam-Moore", "Kyle Organ-Moore", "Holden VS Commodore", "6000", "10", "10:30.0179", "3", "1:01.6741"], ["12", "39", "Uptiles Strathpine", "Trent Spencer", "BMW Mini Cooper S", "1500", "10", "10:40.1436", "2", "1:02.2728"], ["13", "177", "Mark Hyde", "Mark Hyde", "Ford Escort", "1993", "10", "10:49.5920", "2", "1:03.8069"], ["14", "34", "Peter Draheim", "Peter Draheim", "Mazda RX3", "2600", "10", "10:50.8159", "10", "1:03.4396"], ["15", "5", "Scott Douglas", "Scott Douglas", "Datsun 1200", "1998", "9", "9:48.7808", "3", "1:01.5371"], ["16", "72", "Paul Redman", "Paul Redman", "Ford Focus", "2lt", "9", "10:11.3707", "2", "1:05.8729"], ["17", "8", "Matthew Speakman", "Matthew Speakman", "Toyota Celica", "1600", "9", "10:16.3159", "3", "1:05.9117"], ["18", "74", "Lucas Easton", "Lucas Easton", "Toyota Celica", "1600", "9", "10:16.8050", "6", "1:06.0748"], ["19", "77", 
"Dean Fuller", "Dean Fuller", "Mitsubishi Sigma", "2600", "9", "10:25.2877", "3", "1:07.3991"], ["20", "16", "Brett Batterby", "Brett Batterby", "Toyota Corolla", "1600", "9", "10:29.9127", "4", "1:07.8420"], ["21", "95", "Ross Hurford", "Ross Hurford", "Toyota Corolla", "1600", "8", "9:57.5297", "2", "1:12.2672"], ["DNF", "13", "Charles Wright", "Charles Wright", "BMW 325i", "2700", "9", "9:47.9888", "7", "1:03.2808"], ["DNF", "20", "Shane Satchwell", "Shane Satchwell", "Datsun 1200 Coupe", "1998", "1", "1:05.9100", "1", "1:05.9100"]]
You can use the fixed_width gem.
Your given file can be parsed with the following code:
require 'fixed_width'
require 'pp'
FixedWidth.define :cars do |d|
d.head do |head|
head.trap { |line| line !~ /\d/ }
end
d.body do |body|
body.trap { |line| line =~ /^(\d|DNF)/ }
body.column :pos, 4
body.column :car, 5
body.column :competitor, 31
body.column :driver, 25
body.column :vehicle, 21
body.column :cap, 5
body.column :cl_laps, 11
body.column :race_time, 11
body.column :fast_lap_no, 4
body.column :fast_lap_time, 10
end
end
pp FixedWidth.parse(File.open("races.txt"), :cars)
The trap method identifies the lines in each section. I used regex:
The head regex looks for lines that don't contain a digit.
The body regex looks for lines starting with a digit or "DNF"
Each section must include the line immediately after the last. Each column definition simply gives the number of characters to grab. The library strips whitespace for you. If you wanted to produce a fixed-width file, you could add alignment parameters, but it doesn't appear you will need that.
The result is a hash that starts like this:
{:head=>[{}, {}, {}],
:body=>
[{:pos=>"1",
:car=>"6",
:competitor=>"Jason Clements",
:driver=>"Jason Clements",
:vehicle=>"BMW M3",
:cap=>"3200",
:cl_laps=>"10",
:race_time=>"9:48.5710",
:fast_lap_no=>"3",
:fast_lap_time=>"0:57.3228"},
{:pos=>"2",
:car=>"42",
:competitor=>"David Skillender",
:driver=>"David Skillender",
:vehicle=>"Holden VS Commodore",
:cap=>"6000",
:cl_laps=>"10",
:race_time=>"9:55.6866",
:fast_lap_no=>"2",
:fast_lap_time=>"0:57.9409"},
Depending on how consistent the formatting is, you can probably use regex for this.
Here is a sample regex that works for the current data - may need to be tweaked depending on precise rules, but it gives the idea:
^
# Pos
(\d+|DNF)
\s+
#Car
(\d+)
\s+
# Team
([\w-]+(?: [\w-]+)+)
\s+
# Driver
([\w-]+(?: [\w-]+)+)
\s+
# Vehicle
([\w-]+(?: ?[\w-]+)+)
\s+
# Cap
(\d{4}|\dlt)
\s+
# CL Laps
(\d+)
\s+
# Race.Time
(\d+:\d+\.\d+)
\s+
# Fastest Lap
(\d+)
\s+
# Fastest Lap Time
(\d+:\d+\.\d+\*?)
\s*
$
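In Ruby, a pattern like this can be written with the x (extended) flag and named groups. Here is a hedged sketch of that. The group names are my own, literal spaces are written as [ ] because extended mode ignores bare whitespace, and the sample line is built with two spaces between columns on the assumption that the real data always has at least two.

```ruby
ROW = /
  ^
  (?<pos>\d+|DNF)                    \s+
  (?<car>\d+)                        \s+
  (?<team>[\w-]+(?:[ ][\w-]+)+)      \s+
  (?<driver>[\w-]+(?:[ ][\w-]+)+)    \s+
  (?<vehicle>[\w-]+(?:[ ]?[\w-]+)+)  \s+
  (?<cap>\d{4}|\dlt)                 \s+
  (?<laps>\d+)                       \s+
  (?<race_time>\d+:\d+\.\d+)         \s+
  (?<fast_lap_no>\d+)                \s+
  (?<fast_lap_time>\d+:\d+\.\d+\*?)  \s*
  $
/x

# Sample row assembled with two spaces between columns (an assumption about the input).
cells = ['1', '6', 'Jason Clements', 'Jason Clements', 'BMW M3',
         '3200', '10', '9:48.5710', '3', '0:57.3228*']
m = ROW.match(cells.join('  '))
puts m[:team], m[:vehicle], m[:fast_lap_time] if m
```

Because team and driver each require at least two words separated by single spaces, a one-word name would not match, which is one of the tweaks the pattern may need.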
If you can verify that the whitespace is space characters rather than tabs, and that overlong text is always truncated to fit the column structure, then I'd hard-code the slice boundaries:
parsed = [rawLine[0:3],rawLine[4:7],rawLine[9:38], ...etc... ]
Depending on the data source, this may be brittle (if, for instance every run has different column widths).
If the header row is always the same, you could extract the slice boundaries by searching for the known words of the header row.
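In Ruby, the hard-coded slicing described above might look roughly like the following. The COLUMNS offsets, the RaceRow struct and the four-column layout are all made up for illustration; the real boundaries would have to be measured from the actual file.

```ruby
# Hypothetical column layout; real offsets must be measured from the actual file.
COLUMNS = { position: 0...4, car: 4...8, driver: 8...24, vehicle: 24..-1 }.freeze
RaceRow = Struct.new(*COLUMNS.keys)

def parse_fixed(line)
  # Slice each column by its range; to_s guards against lines that are too short.
  RaceRow.new(*COLUMNS.values.map { |range| line[range].to_s.strip })
end

# Build a sample line with exactly the widths assumed in COLUMNS.
line = format('%-4s%-4s%-16s%s', 'DNF', '13', 'Charles Wright', 'BMW 325i')
row = parse_fixed(line)
puts "#{row.position} / #{row.driver} / #{row.vehicle}"
```

Keeping the boundaries in one table means a change in the file layout only requires editing COLUMNS.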
Alright, I gotchu:
Edit: I forgot to mention, it's assuming you've stored your input text in the variable input_string
# Choose a delimiter that is unlikely to occur
DELIM = '|||'
# DRY -> extend String
class String
def split_on_spaces(min_spaces = 1)
self.strip.gsub(/\s{#{min_spaces},}/, DELIM).split(DELIM)
end
end
# just get the data lines
lines = input_string.split("\n")
lines = lines[2...(lines.length - 4)].delete_if { |line|
line.empty?
}
# Grab all the entries into a nice 2-d array
entries = lines.map { |line|
[
line[0..8].split_on_spaces,
line[9..85].split_on_spaces(3).map{ |string|
string.gsub(/\s+/, ' ') # replace whitespace with 1 space
},
line[85...line.length].split_on_spaces(2)
].flatten
}
# BONUS
# Make nice hashes
keys = [:pos, :car, :team, :driver, :vehicle, :cap, :cl_laps, :race_time, :fastest_lap]
objects = entries.map { |entry|
Hash[keys.zip entry]
}
Outputs:
entries # =>
["1", "6", "Jason Clements", "Jason Clements", "BMW M3", "3200", "10", "9:48.5710", "3 0:57.3228*"]
["2", "42", "David Skillender", "David Skillender", "Holden VS Commodore", "6000", "10", "9:55.6866", "2 0:57.9409"]
...
# all of length 9, no extra spaces
And in case arrays just don't cut it:
objects # =>
{:pos=>"1", :car=>"6", :team=>"Jason Clements", :driver=>"Jason Clements", :vehicle=>"BMW M3", :cap=>"3200", :cl_laps=>"10", :race_time=>"9:48.5710", :fastest_lap=>"3 0:57.3228*"}
{:pos=>"2", :car=>"42", :team=>"David Skillender", :driver=>"David Skillender", :vehicle=>"Holden VS Commodore", :cap=>"6000", :cl_laps=>"10", :race_time=>"9:55.6866", :fastest_lap=>"2 0:57.9409"}
...
I leave refactoring it into nice functions to you.
Unless there's a clear rule on how the columns are separated, you can't really do it.
The approach you have is good, assuming you know that each column value is properly indented to the column title.
Another approach could be to group words that are separated by exactly one space together (from the text you provided, I can see that this rule also holds).
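The "exactly one space" grouping can be sketched by splitting only on runs of two or more whitespace characters. This assumes that neighbouring columns are always separated by at least two spaces, which will not hold where a value fills its column completely; the sample line is made up.

```ruby
# Made-up row with three spaces between columns and single spaces inside values.
line = ['1', '6', 'Jason Clements', 'Jason Clements', 'BMW M3', '3200'].join('   ')

# Split on runs of 2+ whitespace characters; single spaces stay inside a cell.
cells = line.strip.split(/\s{2,}/)
p cells
```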
Assuming the text will always be spaced the same, you could split the string based on position, then strip away extra spaces around each part. For example, in python:
pos=row[0:3].strip()
car=row[4:7].strip()
and so on. Alternately, you could define a regular expression to capture each part:
([[:alnum:]]+)\s([[:digit:]]+)\s(([[:alpha:]]+ )+)\s(([[:alpha:]]+ )+)\s(([[:alpha:]]* )+)\s
and so on. (The exact syntax depends on your regexp grammar.) Note that the car regexp needs to handle the added spaces.
I'm not going to code this, but one way that definitely works for the above data set is by parsing it by white space and then assigning elements this way:
someArray = array of strings that were split by white space
Pos = someArray[0]
Car = someArray[1]
Competitor/Team = someArray[2] + " " + someArray[3]
Driver = someArray[4] + " " + someArray[5]
Vehicle = someArray[6] + " " + ... + " " + someArray[someArray.length - 6]
Cap = someArray[someArray.length - 5]
CL Laps = someArray[someArray.length - 4]
Race.Time = someArray[someArray.length - 3]
Fastest...Lap = someArray[someArray.length - 2] + " " + someArray[someArray.length - 1]
The vehicle part can be done by some sort of for or while loop.
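In Ruby the vehicle part does not need an explicit loop, because a range with a negative end index does the same job. A rough sketch of the scheme above, with parse_row as a hypothetical name, which inherits the assumption that team and driver are always exactly two words each:

```ruby
# Hypothetical helper: index from the front for the first columns and from the
# back for the last ones; whatever is left in the middle is the vehicle.
def parse_row(line)
  t = line.split
  {
    pos: t[0],
    car: t[1],
    team: t[2..3].join(' '),
    driver: t[4..5].join(' '),
    vehicle: t[6..-6].join(' '),
    cap: t[-5],
    cl_laps: t[-4],
    race_time: t[-3],
    fastest_lap: t[-2..-1].join(' ')
  }
end

p parse_row('1 6 Jason Clements Jason Clements BMW M3 3200 10 9:48.5710 3 0:57.3228*')
```

A row such as the one with team "Uptiles Strathpine" still works, but a single-word or three-word name would shift every later field.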

Match an ID then inject new data

I think I am getting myself massively confused here in Ruby...
I have an array:
array1 = [["4b411f2bf964a52082c125e3", "The Three Pigeons", 51.236318, -0.57055], ["4b444648f964a52049f325e3", "The Royal Oak", 51.23555937678702, -0.5702378403809515], ["4b92c695f964a520aa1a34e3", "Slug And Lettuce", 51.237156, -0.571021], ["4b490136f964a520a56126e3", "The Robin Hood", 51.23603403568268, -0.568686], ["4b425f85f964a52092d225e3", "The Guildford Tup", 51.237734, -0.5703823], ["4b48f87ff964a520096026e3", "The Keep", 51.234704, -0.572574], ["4b426369f964a520e3d225e3", "The Five & Lime", 51.236908, -0.573695], ["4b56243af964a5204f0228e3", "The Albany", 51.23687122552597, -0.5666781994529876], ["4b426047f964a520a4d225e3", "The Kings Head", 51.234176, -0.573656], ["4b4261e4f964a520c6d225e3", "The Live and Let Live", 51.238477, -0.573306], ["4b425ec9f964a52086d225e3", "The Star Inn - Shepherd Neame", 51.23501026190194, -0.5749610066413879], ["4cb995490180721e03e09461", "Prince Albert", 51.242471, -0.572899], ["4e02726dc65b8061424b59f1", "Bar Mambo", 51.236896, -0.577263], ["4b7451e5f964a520fad42de3", "The Rodboro Buildings (Wetherspoon)", 51.236624141592365, -0.5775332450866699], ["4b6de739f964a520769a2ce3", "The White House", 51.23463575311113, -0.5773776769638062], ["4bb504a30ef1c9b6dbc2f412", "The Britannia - Shepherd Neame", 51.233105063438416, -0.5760687589645386], ["4b4447f8f964a5206cf325e3", "The George Abbot", 51.235186599066246, -0.5779409408569336], ["4b894378f964a520e72632e3", "The Boatman", 51.23155028087051, -0.572927], ["4bb475f449bdc9b65bcb0c10", "The Keystone", 51.23437208365849, -0.5779758095741272], ["4ba55291f964a5209afa38e3", "Rogues Bar", 51.23763173808256, -0.5610001087188721], ["4b40ce65f964a5205ebb25e3", "The Drummond", 51.24133950313797, -0.5758380889892578], ["4b48c8a8f964a520cb5626e3", "The Stok", 51.24272843225208, -0.5718989403261525], ["4c4f275651c2c9288af1859f", "The Parkway", 51.248229, -0.569356], ["4b48a60ff964a5208c5126e3", "The King's Head", 51.24666037427897, -0.5728936419289142], ["4b9a86c4f964a520f2bd35e3", "Ye Olde 
Ship Inn", 51.225673503520696, -0.5796146392822266], ["4c582015a7d976b0130cddee", "Wates House", 51.2420380341127, -0.5908584594726562], ["4b484917f964a520184b26e3", "The Rowbarge", 51.25055105804697, -0.5729025186239382], ["4bd86cb6e914a593c92f53fa", "The Wooden Bridge", 51.248547, -0.58514], ["4bd8b0442e6f0f4754240808", "The Seahorse", 51.218605041503906, -0.569018], ["4bb61c6bef159c74ff6b75f7", "The Astolat Public House", 51.23704061748392, -0.5893993377685547], ["4c978b274f16b71312c2ce3f", "The Queen Victoria", 51.21475338935852, -0.567119], ["4bb0eb08f964a520006a3ce3", "Anchor & Horseshoes", 51.254823, -0.548787], ["4c126e3c82a3c9b60ab0f9f8", "The Garage Tavern", 51.261162, -0.586647], ["4bed4208bac3c9b692fcfde9", "The Cricketers", 51.254246288978116, -0.6047425123927289], ["4b92cb0bf964a520501c34e3", "Horse & Groom", 51.24614672635736, -0.5279016494750977], ["4e8b7820be7b1b0656b1f927", "Apple Tree Pub", 51.2460676, -0.61427766], ["4dc58ce152b1e8f9f7d7378b", "Royal Oak", 51.248792, -0.626987], ["4c1a1c70838020a137aae661", "Withies Inn", 51.21199, -0.621551], ["4baded59f964a52031733be3", "The Jolly Farmer", 51.194106, -0.558244], ["4e21b6f052b1f82ffba120b5", "Compton Royal British Legion", 51.21349872454546, -0.6290990092597657], ["4b646071f964a52052ae2ae3", "The Jolly Farmer", 51.27866916930163, -0.5856227874755859], ["4d987f7961a3a1cd32aace42", "White Hart", 51.25011, -0.636736], ["4c8ccb34509e3704d9533655", "The Harrow Inn", 51.213387, -0.6316709518432617], ["4b71bb30f964a5200b592de3", "Bull's Head", 51.253260091783105, -0.5042177438735962], ["4f63b0b5e4b087553c2ae4fa", "The freeholders", 51.194189, -0.603889], ["4bc762a32f94d13aebd2117f", "The Cricketers", 51.194975, -0.608211], ["4be6c014bcef2d7f476805e5", "Worplesdon Place (Beef Eater Grill)", 51.27501810816803, -0.6078529357910156], ["4dbd63785da3ff58ec6192b1", "Scratchers", 51.19211, -0.60234], ["4e687a6bb3ad5d9197518ed6", "Three Lions - Shepherd Neame", 51.19198564344851, -0.6023865938186646], 
["4c714ddcb5a5236acb995252", "The White heart Pub", 51.200254, -0.603593]]
I pass these results to an API that scores them. The result I get back is an array of single-entry hashes: each key is an ID from array1 and each value is the score for that ID.
array2 = [{"4bed4208bac3c9b692fcfde9"=>743.0}, {"4e21b6f052b1f82ffba120b5"=>789.0}, {"4b646071f964a52052ae2ae3"=>921.0}, {"4bb504a30ef1c9b6dbc2f412"=>99.0}, {"4b426369f964a520e3d225e3"=>80.0}, {"4c4f275651c2c9288af1859f"=>254.0}, {"4b92cb0bf964a520501c34e3"=>468.0}, {"4b425f85f964a52092d225e3"=>27.0}, {"4bd86cb6e914a593c92f53fa"=>512.0}, {"4e687a6bb3ad5d9197518ed6"=>622.0}, {"4b4447f8f964a5206cf325e3"=>73.0}, {"4b425ec9f964a52086d225e3"=>26.0}, {"4b484917f964a520184b26e3"=>328.0}, {"4b426047f964a520a4d225e3"=>37.0}, {"4c978b274f16b71312c2ce3f"=>253.0}, {"4b6de739f964a520769a2ce3"=>81.0}, {"4b48c8a8f964a520cb5626e3"=>167.0}, {"4bb475f449bdc9b65bcb0c10"=>80.0}, {"4c126e3c82a3c9b60ab0f9f8"=>739.0}, {"4bd8b0442e6f0f4754240808"=>210.0}, {"4bb61c6bef159c74ff6b75f7"=>231.0}, {"4b56243af964a5204f0228e3"=>56.0}, {"4b411f2bf964a52082c125e3"=>0.0}, {"4b48a60ff964a5208c5126e3"=>211.0}, {"4baded59f964a52031733be3"=>514.0}, {"4b40ce65f964a5205ebb25e3"=>124.0}, {"4b444648f964a52049f325e3"=>81.0}, {"4bb0eb08f964a520006a3ce3"=>376.0}, {"4f63b0b5e4b087553c2ae4fa"=>586.0}, {"4b9a86c4f964a520f2bd35e3"=>192.0}, {"4cb995490180721e03e09461"=>125.0}, {"4dc58ce152b1e8f9f7d7378b"=>955.0}, {"4b92c695f964a520aa1a34e3"=>20.0}, {"4c582015a7d976b0130cddee"=>484.0}, {"4c8ccb34509e3704d9533655"=>743.0}, {"4b48f87ff964a520096026e3"=>48.0}, {"4c1a1c70838020a137aae661"=>640.0}, {"4b894378f964a520e72632e3"=>55.0}, {"4e8b7820be7b1b0656b1f927"=>666.0}, {"4e02726dc65b8061424b59f1"=>78.0}, {"4b4261e4f964a520c6d225e3"=>57.0}, {"4ba55291f964a5209afa38e3"=>77.0}, {"4c714ddcb5a5236acb995252"=>473.0}, {"4b7451e5f964a520fad42de3"=>80.0}, {"4b490136f964a520a56126e3"=>71.0}, {"4d987f7961a3a1cd32aace42"=>1008.0}, {"4dbd63785da3ff58ec6192b1"=>622.0}, {"4b71bb30f964a5200b592de3"=>640.0}, {"4be6c014bcef2d7f476805e5"=>1016.0}, {"4bc762a32f94d13aebd2117f"=>577.0}]
I would like to end up with a new array in which each row from array1 has the score for its matching key in array2 appended, e.g.
["4c714ddcb5a5236acb995252", "The White heart Pub", 51.200254, -0.603593, 473.0]
Not every entry in array1 will get a score; entries without one are not returned in array2 at all. So I need to match entries by ID (e.g. "4c714ddcb5a5236acb995252"), check whether a score is present, and then combine the score with the data from array1.
Array#assoc is nice for this:
array2.each{|h| array1.assoc(h.keys.first) << h.values.first}
p array1
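To see what the one-liner does, here is a minimal sketch on a trimmed-down copy of the data (the three-row pubs array, the short IDs and the scores array are made up for illustration). Note that << appends to the rows of the original array in place, and that assoc returns nil for an ID it cannot find, so this variant skips such IDs instead of raising:

```ruby
# Trimmed-down stand-ins for array1 and array2 (illustrative data only).
pubs = [
  ["a1", "The Three Pigeons", 51.236318, -0.57055],
  ["b2", "The Royal Oak", 51.235559, -0.570237],
  ["c3", "The Keep", 51.234704, -0.572574]
]
scores = [{ "b2" => 81.0 }, { "a1" => 0.0 }, { "zz" => 5.0 }]

scores.each do |h|
  id, score = h.first     # each hash holds exactly one id => score pair
  row = pubs.assoc(id)    # first row whose first element == id, or nil
  row << score if row     # mutates the row inside pubs; unknown ids are skipped
end

p pubs
```

Rows without a score ("c3" here) are left as they were, so they would still need to be filtered out if only scored rows are wanted.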
Since these should really be hashes, let's turn them into hashes:
h1 = array1.each_with_object({}) { |a, h| h[a.first] = a[1..-1] }
#=> {"4b411f2bf964a52082c125e3"=>["The Three Pigeons", 51.236318, -0.57055], ... }
h2 = array2.inject(:merge)
#=> {"4bed4208bac3c9b692fcfde9"=>743.0, ... }
Then we can easily build the final hash:
h2.each { |k, v| h1[k] << v }
result = h1 # Hash#each returns its receiver (h2), so take h1 explicitly
#=> {"4b411f2bf964a52082c125e3"=>["The Three Pigeons", 51.236318, -0.57055, 0.0], ... }
Note that this raises a NoMethodError if h2 has a key that h1 doesn't have, because h1[k] is then nil (or, more generally, if the value stored under that key in h1 isn't an array).
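One way around that, sketched here on made-up sample data with invented short IDs, is to skip any score whose key is missing from h1 and to build a fresh hash rather than appending in place. As a side effect, entries of h1 that received no score are left out of the result, which is what the question asks for:

```ruby
# Illustrative stand-ins for h1 and h2 (not the real data).
h1 = {
  "a1" => ["The Three Pigeons", 51.236318, -0.57055],
  "b2" => ["The Royal Oak", 51.235559, -0.570237]
}
h2 = { "a1" => 0.0, "zz" => 5.0 } # "zz" has no counterpart in h1

# Collect only the scored entries into a new hash; h1 is left untouched.
scored = h2.each_with_object({}) do |(id, score), acc|
  next unless h1.key?(id)    # ignore scores for unknown ids
  acc[id] = h1[id] + [score] # Array#+ returns a new array
end

p scored
```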
If you really want this in the array form you give, you can do:
result = result.map { |k, v| v.unshift(k) }
#=> [["4b411f2bf964a52082c125e3", "The Three Pigeons", 51.236318, -0.57055, 0.0], ... ]
I would use map from Enumerable like this:
array3 = array2.map do |item_from_array2|
  id = item_from_array2.keys[0]
  item_from_array1 = array1.find { |item| item.include?(id) }
  item_from_array1.dup << item_from_array2[id]
end
Here, I enumerate over the items in array2 so that the result only contains items that have a score. For each one, I take the id and search array1 for the row that contains it. I then append the score from array2 to a copy of that row (dup leaves array1 unchanged). Finally, map collects every returned row into a new array.
As a performance consideration, you may want to create a hash from the first array that you can index by the id because calling array1.find every iteration will be slower than using hash1[id] every iteration.
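A minimal sketch of that idea, using made-up sample rows and assuming every id in array2 also occurs in array1 (otherwise hash1[id] is nil and dup << would fail):

```ruby
# Illustrative stand-ins for array1 and array2 (not the real data).
array1 = [
  ["a1", "The Three Pigeons", 51.236318, -0.57055],
  ["b2", "The Royal Oak", 51.235559, -0.570237]
]
array2 = [{ "b2" => 81.0 }, { "a1" => 0.0 }]

# Build the index once: id => full row from array1.
hash1 = array1.each_with_object({}) { |row, h| h[row.first] = row }

array3 = array2.map do |item_from_array2|
  id, score = item_from_array2.first # the single id => score pair
  hash1[id].dup << score             # hash lookup instead of a linear find
end

p array3
```

The index costs one pass over array1 up front; after that, each lookup no longer scans the whole array.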
