How to match between two arrays and update one based on criteria - ruby

I'm trying to match two supplier CSVs and update one based on the contents of the other: for example, if the price differs, update one file with the price from the matching item in the other. If a product is in the first CSV but not in the other, add it to the other. Once the data set is adjusted, I'll write it back to CSV, which I'm OK with. Each supplier file is about 9,000 lines long. Sample output from the two puts lines in the code:
#<struct RecordBUY item_type=nil, buy_product_id="1000", product_name="Plastic Jeweled Crown", product_type=nil, product_code_SKU="105238", option_set=nil, duplicate={"1000"=>["105238"]}, brand_name="Rubies Costumes", prod_desc="This plastic crown has six large jewel stones accross the top. Adjustable headband. (Colors of the jewel stones may vary, our choice please.)", cost_price="$3.76", prod_weight="00.14", prod_width="5.75", prod_height="0.5", prod_depth="23.5", prod_category="Hats, Wigs & Masks", prod_upn="082686025935", prod_size="One Size", prod_color="Gold">
#<struct BCRecord item_type="Product", bc_product_id="620", product_name="Dollar Ring", product_type=nil, product_code_SKU="109624", option_set=nil, duplicate=nil, brand_name="Rubies Costumes", prod_desc="Ring has three large glittery Dollar Signs '$' that extend over your fingers.", cost_price="3.20", prod_weight="0.7200", prod_width="4.0000", prod_height="1.0000", prod_depth="7.0000", prod_category="Accessories & Makeup", prod_upn="82686006996", prod_size=nil, prod_color=nil, option_set=nil, price="5.60", allow_purchases=[21]>
I read the CSV data into arrays of the respective struct objects, but I don't know how to search and update efficiently. I haven't come across techniques that avoid the brute-force approach (or found out whether brute force over 9k lines is actually bad or just frowned upon). What I have is:
puts records[0]
puts recordsBC[1]
#start script
records.each do | buyline |
  recordsBC.each do | bcline |
    if bcline.product_code_SKU == buyline.product_code_SKU
      ##update pricing (brute force);
      #bcline.price = buyline.cost_price * 1.75 #this fails with undefined method `price=' for #<Record:0x007fbb9088b960>
      bcline.cost_price = buyline.cost_price
    end
    ##if product is in BC currently, but not in buy - needs to be marked as inactive in BC
    if bcline.product_code_SKU.include? buyline.product_code_SKU
      #bcline.allow_purchases = "N" # this fails with undefined method `allow_purchases=' for #<Record:0x007fb2878822c8>
    end
    #if product is in Buy but not in BC then add it into BC
    if buyline.product_code_SKU.include? bcline.product_code_SKU
      recordsBC.push buyline
    end
  end
end
I can't figure out a better way, nor understand why I'm getting the undefined method errors on some but not all lines. I'm not after complete answers, just enough to figure out the rest of the solution.

I'd start by reducing the number of iterations. At the moment you are iterating through all of recordsBC for each buyline. So I'd start with:
records.each do |buyline|
  # keep only the BC records that share at least one SKU with this buyline
  record_subset = recordsBC.select{|r| !(r.product_code_SKU.split & buyline.product_code_SKU.split).empty?}
  record_subset.each do |bcline|
    .....
  end
end
That should mean you only iterate through bcline items that have a matching product_code_SKU. You may have to modify the split as your example doesn't show how multiple SKUs are separated (e.g. '123 456', '123,456', or '123/456')
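Note that select still scans all of recordsBC once per buyline. One possible alternative, sketched here under stated assumptions, is to index the BC records by SKU in a Hash first, so that each lookup is a single hash access. The trimmed-down BuyRecord and BCRecord structs and the sync method are hypothetical stand-ins for the question's structs. The sketch also assumes each record carries exactly one SKU and that "N" is the desired inactive marker, as in the question's commented-out line.

```ruby
# Hypothetical, trimmed-down versions of the structs shown in the question.
BuyRecord = Struct.new(:product_code_SKU, :cost_price)
BCRecord  = Struct.new(:product_code_SKU, :cost_price, :price, :allow_purchases)

# Updates bc_records in place from buy_records and returns bc_records.
def sync(buy_records, bc_records)
  # Build the SKU => record index once instead of rescanning the array.
  bc_by_sku = bc_records.each_with_object({}) do |bc, index|
    index[bc.product_code_SKU] = bc
  end
  buy_skus = {}

  buy_records.each do |buy|
    buy_skus[buy.product_code_SKU] = true
    bc = bc_by_sku[buy.product_code_SKU]
    if bc
      # In both files: copy the supplier's cost price across.
      bc.cost_price = buy.cost_price
    else
      # Only in the buy file: add a new BC record (price and status left unset).
      added = BCRecord.new(buy.product_code_SKU, buy.cost_price)
      bc_records << added
      bc_by_sku[added.product_code_SKU] = added
    end
  end

  # Only in BC: mark as inactive ("N" is an assumed marker).
  bc_records.each do |bc|
    bc.allow_purchases = "N" unless buy_skus.key?(bc.product_code_SKU)
  end
  bc_records
end
```

A Struct only generates setters such as price= for the members it was defined with, so the undefined method errors in the question would be consistent with bcline being an object whose class does not declare price or allow_purchases.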

Related

Separate characters and numbers following specific rules

I am trying to distinguish flight numbers.
Example:
flightno = "FR556"
split_data = flightno.upcase.match(/([A-Za-z]+)(\d+)/)
first = split_data[1] # FR
second = split_data[2] # 556
I then go on to query the database to find an airline based on the FR in this example and apply some logic with the result which is Ryanair.
My problem is when the flight number might be:
flightno = "U21920"
split_data = flightno.upcase.match(/([A-Za-z]+)(\d+)/)
first = split_data[1] # U
second = split_data[2] # 21920
I basically want first to be U2, not just U. This is used to search the database of airlines by their IATA code, which in this case is U2.
EDIT
In the interest of clarity: I made some mistakes in terminology when asking my question. Due to the complexities of booking reference numbers, the input is taken from whatever the passenger provides. For an easyJet flight, for example, the passenger may input EZY1920 or U21920. The airline may provide either, so the passenger really doesn't know the difference.
"EZY" = ICAO
"U2" = IATA
I take the input from the user and try to separate the ICAO or IATA code from the flight number "1920", but there is no way of determining that without searching the database or splitting the input into separate fields, which I feel is cumbersome from a user-experience point of view.
Using a regex to separate characters from numbers works until the user inputs an IATA code as part of their flight number (the passenger won't know the difference), and as you can see in the example above, this confuses the regex.
The trouble is I can't think of any other pattern with flight numbers. They always start with at least two characters, made up of just letters or a mixture of a letter and a number, and this prefix can be 3 characters in length. The numeric part can be as short as 1 digit but can also be as long as 4, and it is always digits.
EDIT
As has been mentioned in the comments, there is no fixed size. However, one thing that is always true (at least so far) is that the first character will always be a letter, regardless of whether it is ICAO or IATA.
After considering everybody's input so far, I'm wondering if searching the database and returning airlines with an IATA or ICAO code that matches the first two characters provided by the user (U2), (FR), (EZ) might be one way to go. However, this is subject to obvious problems should an ICAO or IATA code be released that matches another airline, for example "EZY" and "EZT". This is not future-proof, and I'm looking for better Ruby or regex solutions.
Appreciate your input.
EDIT
I have answered my own question below. While the other answers provide a solution for handling some conditions, they would fall down if the flight number began with a number, so I worked out a crass but so far stable way to analyse the string for digits and then work out from that whether it is an ICAO or IATA code.
One solution I can think of is to match your given flight number against a complete list of ICAO/IATA codes: https://raw.githubusercontent.com/datasets/airport-codes/master/data/airport-codes.csv
Spending some time with Google might give you a more appropriate list.
Then use the first three characters (if that is the maximum) of your flight number to find a match within the ICAO codes. If you find one, you will know where to separate your string.
Here is a minimal, ugly example that should set you on track. Feel free to update!
ICAOCODES = %w(FR DEU U21) # grab your data here
def retrieve_flight_information(flightnumber)
  ICAOCODES.each do |icao|
    co = flightnumber.match(icao).to_s
    if co.length > 0
      # airline
      puts co
      # flight number
      puts flightnumber.gsub(co,'')
    end
  end
end
retrieve_flight_information("FR556")
#=> FR
#=> 556
retrieve_flight_information("U21214123")
#=> U21
#=> 214123
The biggest flaw lies in using .gsub(), as it removes every occurrence of the code and so might mess up your flight number if it looks like this: "FR21413FR2".
However, you will find plenty of solutions to this problem on Stack Overflow.
As mentioned in the comments, a list of ICAO airport codes is not what you are looking for. What is relevant here is that you somehow need a list of strings that you can reliably compare against.
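One way to sidestep the gsub problem, sketched under the assumption that a trusted list of airline codes is available: match the code only as a prefix and slice it off by length. KNOWN_CODES and split_by_known_code are hypothetical names, and the codes are sample values rather than a real data set.

```ruby
# Hypothetical sample codes; a real list would come from a database or file.
KNOWN_CODES = %w[FR U2 EZY DEU].freeze

# Returns [code, rest] for the longest known code that prefixes the input,
# or nil when no known code matches.
def split_by_known_code(flight_number, codes = KNOWN_CODES)
  candidate = flight_number.upcase
  # Try longer codes first so a 3-character code wins over a 2-character one.
  code = codes.sort_by { |c| -c.length }.find { |c| candidate.start_with?(c) }
  return nil unless code

  # Slice by length instead of gsub, so later repeats of the code survive.
  [code, candidate[code.length..-1]]
end

split_by_known_code("FR21413FR2") # => ["FR", "21413FR2"]
```

Because only the leading occurrence is removed, a flight number that happens to repeat the code later in the string is left intact.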
I have a fairly crass solution that seems to be working in all the scenarios I can throw at it to date. I wanted to make it available to anybody else who might find it useful.
The general rule of thumb for flight codes/numbers seems to be:
IATA: two characters made up of any combination of letters and digits
ICAO: three characters made up of letters only (to date)
With that in mind we should be able to work out if we need to search the database by IATA or ICAO depending on the condition of the first three characters.
First we take the flight number and convert to uppercase
string = "U21920".upcase
Next we analyse the first three characters to check for any digits.
first_three = string[0,3] # => U21
Is there a digit in first_three?
if first_three =~ /\d/ # => 1 (index of the first digit, which is truthy)
  iata = first_three[0,2] # => U2; if there is a digit, drop the last character
  # Now we go to the database searching IATA (U2)
  search = Airline.where('iata LIKE ?', "#{iata}%") # => Starts-with search, just in case
Otherwise, if no digit is found in the string:
else
  icao = string.match(/([A-Za-z]+)(\d+)/)
  search = Airline.where('icao LIKE ?', "#{icao[1]}%")
end
This seems to work for the random flight numbers I've tested it with today from a few of the major airports' live departure/arrival boards. It's an interesting problem because some airlines issue tickets with either an ICAO or IATA code as part of the flight number, which means passengers won't know any different. In addition, some airports provide flight information in their own format. So, assuming the ICAO and IATA formats don't change, the above should work.
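The rule above can also be sketched as a standalone method with no database access, which makes it easy to try against sample inputs. split_flight_number and the returned hash keys are hypothetical names; the logic assumes, as described above, that IATA codes are two characters and ICAO codes are letters only.

```ruby
# Splits user input into an airline code and a flight number.
# Returns nil when the input does not look like letters followed by digits.
def split_flight_number(input)
  string = input.strip.upcase
  first_three = string[0, 3]

  if first_three =~ /\d/
    # A digit within the first three characters: treat the first two as IATA.
    { kind: :iata, code: string[0, 2], number: string[2..-1] }
  else
    # Letters only in the first three characters: treat the leading letters as ICAO.
    match = string.match(/\A([A-Z]+)(\d+)\z/)
    return nil unless match

    { kind: :icao, code: match[1], number: match[2] }
  end
end

split_flight_number("U21920")  # => {:kind=>:iata, :code=>"U2", :number=>"1920"}
split_flight_number("EZY1920") # => {:kind=>:icao, :code=>"EZY", :number=>"1920"}
```

The kind value indicates which column (iata or icao) a subsequent Airline lookup would search.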
Here is an example script you can run
test.rb
puts "What is your flight number?"
string = gets.chomp.upcase # chomp drops the trailing newline from the input
first_three = string[0,3]
puts "Taking first three from #{string} is #{first_three}"
if first_three =~ /\d/ # Calling String's =~ method.
  puts "The String #{first_three} DOES have a number in it."
  iata = first_three[0,2]
  search = Airline.where('iata LIKE ?', "#{iata}%")
  puts "Searching Airlines starting with IATA #{iata} = #{search.count}"
  puts "Found #{search.first.name} from IATA #{iata}"
else
  puts "The String #{first_three} does not have a number in it."
  icao = string.match(/([A-Za-z]+)(\d+)/)
  search = Airline.where('icao LIKE ?', "#{icao[1]}%")
  puts "Searching Airlines starting with ICAO #{icao[1]} = #{search.count}"
  puts "Found #{search.first.name} from ICAO #{icao[1]}"
end
Airline
Airline(id: integer, name: string, iata: string, icao: string, created_at: datetime, updated_at: datetime )
Stick this in your lib folder and run:
rails runner lib/test.rb
Obviously you can remove all of the puts statements to get straight to the result. I'm using rails runner to include access to my Airline model when running the script.

Scraping tracklist

I'm trying to scrape a tracklist from a website. My relevant code is:
page.css('ol').each do |line|
subarray = line.text.strip.split(" - ")
end
This makes the array take the first artist into the first index (as I want), but adds the track and the artist of track two into the second index like this:
subarray[0] = Rick Wilhite
subarray[1] = Magic Water [Still Music]
Edward
subarray[2] = Into A Better Future [Giegling]
Kassem Mosse
subarray[3] = Zolarem [Mikrodisko Recordings]
After Hours
I included the nested tag so my code reads:
page.css('ol li').each do |line|
subarray = line.text.strip.split(" - ")
end
but this only seems to leave subarray[0] displaying "Klara Lewis" and subarray[1] displaying "Shine [Editions Mego]", which is the last track on the tracklist. All other index values are blank.
A further complication is that I would like to remove the record label from what will end up being the track value. I believe the correct regular expression is \[[\d\D]*?\], but I'm under the impression that this needs to be applied before the data goes into the array to avoid complications involved in iterating over arrays. I tried passing it as a second delimiter to split (along with ' - ') which didn't work, and I also attempted to test it by changing my code to:
page.css('ol').each do |line|
subarray = line.text.strip.split("\[[\d\D]*?\]")
end
but that also appears not to work. Can anyone help me on this or give me the right pointers?
Here's what's happening:
page.css('ol') gives you the entire <ol> with every one of the <li> tags:
<ol>
<li>Rick Wilhite...</li>
<li>Edward...</li>
...
<li>Klara Lewis...</li>
</ol>
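When that one big chunk enters the .each loop, you're only running through the loop once. So when you apply .split(" - "), subarray is filled once with all of the text, split on " - ". Consecutive <li> items are not separated by " - ", which is why each track title ends up in the same element as the next artist.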
On the other hand, page.css('ol li') gives you each individual <li>, like this:
<li>Rick Wilhite...</li>
<li>Edward...</li>
...
<li>Klara Lewis...</li>
This time, you're running through the loop 17 times, once for each <li> tag. The first time through, .split(" - ") is applied to the text and the result is stored in the subarray variable. The problem is that the next time through the loop, subarray is overwritten with the split text of the second <li>. So after the final time through, the only contents of the subarray variable are the split text of the final <li>: "Klara Lewis" and "Shine [Editions Mego]".
I think you've gotten the general idea of how to scrape from a website, but I recommend building your script more incrementally so you understand exactly what you're doing in each step. For example, use puts to check what page.css('ol') gives you and how it differs from page.css('ol li'). What happens when it goes through a loop? What do you get when you apply .split()? Building more slowly and exploring around to make sure you understand what you're doing will help you avoid hitting dead ends. Hope that helps!
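One way to keep every track rather than only the last one is to collect the results with map instead of reassigning a variable inside each. The sketch below is an illustration under assumptions: the lines array stands in for what page.css('ol li').map(&:text) would return, using sample entries from the question, and the label is removed with a regex anchored at the end of the track title.

```ruby
# Stand-in for page.css('ol li').map(&:text); sample entries from the question.
lines = [
  "Rick Wilhite - Magic Water [Still Music]",
  "Edward - Into A Better Future [Giegling]",
  "Klara Lewis - Shine [Editions Mego]"
]

# map returns one [artist, track] pair per <li>, so nothing is overwritten.
tracklist = lines.map do |line|
  artist, track = line.strip.split(" - ", 2)
  # Remove a trailing "[Label]" (and surrounding whitespace) from the title.
  [artist, track.to_s.sub(/\s*\[[^\]]*\]\s*\z/, "")]
end

tracklist.each { |artist, track| puts "#{artist}: #{track}" }
```

Passing 2 as the limit to split keeps any " - " inside a track title from producing extra elements.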

How to iterate only through unique combinations of multiple objects?

The title is a bit of a doozy.
I'm working on a project where users can make bids. The resulting items can be won exclusively or split between up to 3 users. One user can put in an exclusive bid of $20, and another 3 users can all agree to do a 3-way split and each pay only $10, resulting in $30, which beats the first bidder.
I need to run through a list of possibly a dozen different bidders who agreed to the 3-way split to determine the winning trio:
Rza => $20 # loses
ODB + Gza => $25 # loses
InspectahDeck + Ghostface + ODB => $50 # wins
Alternatively
Rza => $100,000 # wins
ODB + Gza => $25 # loses
InspectahDeck + Ghostface + ODB => $50 # loses
All I have is an array of Bid objects, belonging to a variety of users. My goal is to see all possible combinations of those who wish to split with others and see who comes out on top.
I tried to do something like:
bids.each do |bid1|
  bids.each do |bid2|
    bids.each do |bid3|
      # Fill a hash here, but only if the permutation of the bids is unique
    end
  end
end
I'm having a hard time with this since it seems horribly inefficient and produces tons of duplicates, sometimes with the same bid appearing twice. I'd like some help or tips to point me in the right direction.
I'm really stumped.
Thanks in advance.
PS: Another tricky detail: Each bidder can have multiple bids set. So the same guy can have 1 exclusive, 1 2-way and 1 3-way.
Suppose you have something like this:
class Bid
attr_accessor :user # link to the user
attr_accessor :price # dollar amount
attr_accessor :way # 1 means 1-way, 2 means 2-way, 3 means 3-way
end
Get the highest bids of each kind:
# Bid does not define <=>, so compare by price explicitly
best_1_way = bids.select{|bid| bid.way == 1}.max_by(&:price)
best_2_ways = bids.select{|bid| bid.way == 2}.sort_by(&:price).last(2)
best_3_ways = bids.select{|bid| bid.way == 3}.sort_by(&:price).last(3)
Get the total prices:
total_1_way_price = best_1_way.price
total_2_ways_price = best_2_ways.map(&:price).inject(&:+)
total_3_ways_price = best_3_ways.map(&:price).inject(&:+)
Compare these three items, and you get your winner.
If you have a lot of bids and want to optimize:
all_1_ways, all_2_ways, all_3_ways =
bids.group_by{|bid| bid.way }.values_at(1,2,3)
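If the goal is to enumerate each unique grouping exactly once, as the question's title asks, Array#combination is one option: it yields every unordered selection of n elements without repeats. The sketch below is an illustration under assumptions: Bid is a Struct stand-in for the class above, best_group is a hypothetical helper, any bids with the same way value are assumed to be willing to combine, and the individual split amounts are made up to reproduce the question's $25 and $50 totals.

```ruby
# Stand-in for the Bid class above.
Bid = Struct.new(:user, :price, :way)

# Returns the highest-paying group of `way` bids from distinct users,
# or nil when no such group exists.
def best_group(bids, way)
  candidates = bids.select { |bid| bid.way == way }
  # combination(way) yields each unordered group once, with no repeated bids.
  groups = candidates.combination(way).select do |group|
    # A user holding several bids of the same kind cannot split with themselves.
    group.map(&:user).uniq.size == way
  end
  groups.max_by { |group| group.sum(&:price) }
end

bids = [
  Bid.new("Rza", 20, 1),
  Bid.new("ODB", 12, 2), Bid.new("Gza", 13, 2),
  Bid.new("InspectahDeck", 20, 3), Bid.new("Ghostface", 15, 3), Bid.new("ODB", 15, 3)
]

# Compare the best exclusive bid, the best pair, and the best trio.
winner = (1..3).map { |way| best_group(bids, way) }
               .compact
               .max_by { |group| group.sum(&:price) }
puts winner.map(&:user).join(" + ")
```

For a dozen bidders the number of trios is small (220 for 12 bids), so enumerating them all is cheap; taking the top bids by price, as above, avoids the enumeration entirely when no user holds more than one bid of the same kind.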

Python Birthday paradox math not working

It runs correctly, but it should produce around 500 matches and it only produces around 50, and I don't know why!
This is a problem for my computer science class that I am having issues with.
We had to write a function that checks a list for duplicates. I got that part, but then we had to apply it to the birthday paradox (more info here: http://en.wikipedia.org/wiki/Birthday_problem). That's where I am running into a problem: my teacher said that the number of simulations with a match should be around 500, or 50%, but for me it's only around 50-70, or 5%.
duplicateNumber=0
import random
def has_duplicates(listToCheck):
    for i in listToCheck:
        x=listToCheck.index(i)
        del listToCheck[x]
        if i in listToCheck:
            return True
        else:
            return False
listA=[1,2,3,4]
listB=[1,2,3,1]
#print has_duplicates(listA)
#print has_duplicates(listB)
for i in range(0,1000):
    birthdayList=[]
    for i in range(0,23):
        birthday=random.randint(1,365)
        birthdayList.append(birthday)
    x= has_duplicates(birthdayList)
    if x==True:
        duplicateNumber+=1
    else:
        pass
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with atleast one match. The approximate probibilatiy is", round(((duplicateNumber/1000)*100),3),"%"
This code gave me a result in line with what you were expecting:
import random
duplicateNumber=0
def has_duplicates(listToCheck):
    # A set keeps only unique values, so a shorter set means there was a repeat.
    number_set = set(listToCheck)
    # Compare the lengths by value (!=), not by identity (is not).
    if len(number_set) != len(listToCheck):
        return True
    else:
        return False
for i in range(0,1000):
    birthdayList=[]
    for j in range(0,23):
        birthday=random.randint(1,365)
        birthdayList.append(birthday)
    x = has_duplicates(birthdayList)
    if x==True:
        duplicateNumber+=1
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with at least one match. The approximate probability is", round(((duplicateNumber/1000.0)*100),3),"%"
The first change I made was tidying up the indices you were using in those nested for loops. You'll see I changed the second one to j, as they were previously both i. I also divide by 1000.0 rather than 1000 so that Python 2 does not truncate the result to 0.
The big one, though, was to the has_duplicates function. The basic principle here is that creating a set out of the incoming list gets the unique values in the list. By comparing the number of items in the number_set to the number in listToCheck we can judge whether there are any duplicates or not.
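For reference, both figures in this thread can be checked by hand, assuming 365 equally likely birthdays and 23 students. The first expression is the standard birthday-problem probability that at least two students share a birthday. The second is the probability that one particular student (for example, the first in the list) shares a birthday with any of the other 22, which is what a check that returns after examining only the first element would measure.

```latex
P(\text{any shared birthday}) = 1 - \prod_{k=0}^{22} \frac{365 - k}{365} \approx 0.507

P(\text{first student matched}) = 1 - \left(\frac{364}{365}\right)^{22} \approx 0.059
```

Over 1000 simulations these correspond to roughly 500 and roughly 60 matches respectively, in line with the teacher's figure and the count reported in the question.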
Here is what you are looking for. As it is not standard practice to just throw code at a new user, I apologize if this offends any other users. However, I believe that showing the OP a correct way to write a program could do us all a favor, especially if the lack of documentation would otherwise carry on later in their career.
Thus, please take a careful look at the code, and fill in the blanks. Look up the Python documentation (as dry as it is), and try to understand the things that you don't get right away. Even if you understand something just by its name, it would still be wise to see what is actually happening when some built-in method is being used.
Last, but not least, take a look at this code, and take a look at your code. Note the differences, and keep trying to write your code from scratch (without looking at mine), and if it messes up, see where you went wrong, and start over. This sort of practice is key if you wish to succeed later on in programming!
import random

def same_birthdays():
    '''
    This is a program that does ________. It is really important
    that we tell readers of this code what it does, so that the
    reader doesn't have to piece all of the puzzles together,
    while the key is right there, in the mind of the programmer.
    '''
    count = 0
    #Count is going to store the number of times that we have the same birthdays
    timesToRun = 1000 #timesToRun should probably be in a parameter
    #timesToRun is clearly defined in its name as well. Further elaboration
    #on its purpose is not necessary.
    for i in range(0,timesToRun):
        birthdayList = []
        for j in range(0,23):
            random_birthday = random.randint(1,365)
            birthdayList.append(random_birthday)
        birthdayList = sorted(birthdayList) #sorting for easier matching
        #If we really want to, we could provide a check in the above nested
        #for loop to check right away if there is a duplicate.
        #But again, we are here
        for j in range(0, len(birthdayList)-1):
            if (birthdayList[j] == birthdayList[j+1]):
                count+=1
                break #leaving this nested for-loop
    return count
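If you want the fraction of simulations with a match instead (multiply by 100 for a percentage), replace the return statement above with:
    return float(count) / timesToRun #float() avoids integer division in Python 2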
Here's a solution that doesn't use set(). It also takes a different approach with the list, so that each index represents a day of the year. I also removed the has_duplicates() function.
import random
sim_total=0
birthdayList=[]
#initialize an array of 0's representing each calendar day
for i in range(365):
    birthdayList.append(0)
for i in range(0,1000):
    first_dup=True
    #reset the per-day counts before each simulation
    for n in range(365):
        birthdayList[n]=0
    for b in range(0, 23):
        r = random.randint(0,364)
        birthdayList[r]+=1
        #count each simulation at most once, on its first duplicate
        if (birthdayList[r] > 1) and (first_dup==True):
            sim_total+=1
            first_dup=False
avg = float(sim_total) / 1000 * 100
print "after 1000 simulations with 23 students there were", sim_total,"simulations with at least one duplicate. The approximate probability is", round(avg,3),"%"

matching array items in rails

I have two arrays and I want to count the number of items that appear in both.
For example arrays with:
1 -- House, Dog, Cat, Car
2 -- Cat, Book, Box, Car
Would return 2.
Any ideas? Thanks!
EDIT/
Basically, I have two forms (for two different types of users) that use nested attributes to store the skills they have. I can print out the skills via
current_user.skills.each do |skill| skill.name
other_user.skills.each do |skill| skill.name
When I print out the array, I get: #<Skill:0x1037e4948>#<Skill:0x1037e2800>#<Skill:0x1037e21e8>#<Skill:0x1037e1090>#<Skill:0x1037e0848>
So, yes, I want to compare the two users' skills and return the number that match. Thanks for your help.
This works:
a = %w{house dog cat car}
b = %w{cat book box car}
(a & b).size
Documentation: http://www.ruby-doc.org/core/classes/Array.html#M000274
To convert classes to an array using the name, try something like:
class X
def name
"name"
end
end
a = [X.new]
b = [X.new]
(a.map{|x| x.name} & b.map{|x| x.name}).size
In your example, a is current_user.skills and b is other_user.skills. x is simply a reference to the current element of the array as map loops through it. map is documented at the link I provided.
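Applied to the skills case, the same idea can be sketched as follows. Skill here is a hypothetical Struct standing in for the question's model, and shared_skill_count is an illustrative helper name; the only assumption is that each skill responds to name.

```ruby
# Hypothetical stand-in for the Skill model; only `name` is needed.
Skill = Struct.new(:name)

# Counts the skill names that appear in both collections.
def shared_skill_count(skills_a, skills_b)
  # Array#& returns the unique elements common to both arrays.
  (skills_a.map(&:name) & skills_b.map(&:name)).size
end

current_user_skills = %w[House Dog Cat Car].map { |n| Skill.new(n) }
other_user_skills   = %w[Cat Book Box Car].map { |n| Skill.new(n) }

puts shared_skill_count(current_user_skills, other_user_skills) # prints 2
```

Comparing names rather than the Skill objects themselves matters when the two users' skills are separate records: two different objects with the same name would otherwise not count as a match.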

Resources