Ruby String/Array Write program - ruby

For a project that I am working on for school, one of the parts of the project asks us to take a collection of all the Federalist papers and run it through a program that essentially splits up the text and writes new files (per different Federalist paper).
The logic I decided to go with is to run a search, and every time the search is positive for "Federalist No." it would save into a new file everything until the next "Federalist No".
This is the algorithm that I have so far:
file_name = "Federalist"
section_number = "1"
new_text = File.open(file_name + section_number, 'w')
i = 0
n= 1
while i < l.length
if (l[i]!= "federalist") and (l[i+1]!= "No")
new_text.puts l[i]
i = i + i
else
new_text.close
section_number = (section_number.to_i +1).to_s
new_text = File.open(file_name + section_number, "w")
new_text.puts(l[i])
new_text.puts(l[i+1])
i=i+2
end
end
After debugging the code as much as I could (I am a beginner at Ruby), the problem that I run into now is that because the while function always holds true, it never proceeds to the else command.
In terms of going about this in a different way, my TA suggested the following:
Put the entire text in one string by looping through the array(l) and adding each line to the one big string each time.
Split the string using the split method and the key word "FEDERALIST No." This will create an array with each element being one section of the text:
arrayName = bigString.split("FEDERALIST No.")
You can then loop through this new array to create files for each element using a similar method you use in your program.
But as simple as it may sound, I'm having an extremely difficult time putting even that code together.

i = i + i
i starts at 0, and 0 gets added to it, which gives 0, so i never changes. It will always be less than l.length, the loop never advances past the first element, and the else branch is never reached.
Since this is a school assignment, I hesitate to give you a straight-up answer. That's really not what SO is for, and I'm glad that you haven't solicited a full solution either.
So I'll direct you to some useful methods in Ruby instead that could help.
In Array: .join, .each or .map
In String: .split
Fyi, your TA's suggestion is far simpler than the algorithm you've decided to embark on... although technically, it is not wrong. Merely more complex.
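For anyone landing here later, the TA's approach can be sketched roughly as follows. This is only an illustration under a few assumptions: l is an array of lines, the marker is exactly "FEDERALIST No.", and the method name split_into_files, the output directory, and the sample lines are all made up for the example.

```ruby
require 'tmpdir'

# Join the lines into one big string, split on the marker, and write
# each section to its own numbered file. Returns the paths written.
def split_into_files(lines, marker, dir, base_name = "Federalist")
  big_string = lines.join              # one big string from the array of lines
  sections = big_string.split(marker)
  sections.shift                       # drop whatever came before the first marker
  sections.each_with_index.map do |body, index|
    path = File.join(dir, "#{base_name}#{index + 1}")
    File.write(path, marker + body)    # split removes the marker, so put it back
    path
  end
end

# Hypothetical sample input standing in for the real text
sample = ["Preface\n", "FEDERALIST No. 1\n", "General Introduction\n",
          "FEDERALIST No. 2\n", "Concerning Dangers\n"]

Dir.mktmpdir do |dir|
  paths = split_into_files(sample, "FEDERALIST No.", dir)
  puts paths.map { |p| File.basename(p) }.inspect
  # prints ["Federalist1", "Federalist2"]
end
```

Note that split throws the marker away, which is why it is added back when each file is written.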

Related

Scraping tracklist

I'm trying to scrape a tracklist from a website. My relevant code is:
page.css('ol').each do |line|
subarray = line.text.strip.split(" - ")
end
This makes the array take the first artist into the first index (as I want), but adds the track and the artist of track two into the second index like this:
subarray[0] = Rick Wilhite
subarray[1] = Magic Water [Still Music]
Edward
subarray[2] = Into A Better Future [Giegling]
Kassem Mosse
subarray[3] = Zolarem [Mikrodisko Recordings]
After Hours
I included the nested tag so my code reads:
page.css('ol li').each do |line|
subarray = line.text.strip.split(" - ")
end
but this only seems to leave subarray[0] displaying "Klara Lewis" and subarray[1] displaying "Shine [Editions Mego]", which is the last track on the tracklist. All other index values are blank.
A further complication is that I would like to remove the record label from what will end up being the track value. I believe the correct regular expression is \[[\d\D]*?\], but I'm under the impression that this needs to be applied before the data goes into the array to avoid complications involved in iterating over arrays. I tried passing it as a second delimiter to split (along with ' - ') which didn't work, and I also attempted to test it by changing my code to:
page.css('ol').each do |line|
subarray = line.text.strip.split("\[[\d\D]*?\]")
end
but that also appears not to work. Can anyone help me on this or give me the right pointers?
Here's what's happening:
page.css('ol') gives you the entire <ol> with every one of the <li> tags:
<ol>
<li>Rick Wilhite...</li>
<li>Edward...</li>
...
<li>Klara Lewis...</li>
</ol>
When that one big chunk enters the .each loop, you're only running through the loop once. So when you apply the .split(" - ") method, subarray will be filled once with all the text separated by -.
On the other hand, page.css('ol li') gives you each individual <li>, like this:
<li>Rick Wilhite...</li>
<li>Edward...</li>
...
<li>Klara Lewis...</li>
This time, you're running through the loop 17 times, once for each <li> tag. The first time through, .split(" - ") is applied to the text and stored in the subarray variable. The problem is that the next time through the loop, subarray is overwritten with the split text of the second <li>. So after the final time through, the only contents of the subarray variable is the split text of the final <li>: "Klara Lewis" and "Shine [Editions Mego]".
I think you've gotten the general idea of how to scrape from a website, but I recommend building your script more incrementally so you understand exactly what you're doing in each step. For example, use puts to check what page.css('ol') gives you and how it differs from page.css('ol li'). What happens when it goes through a loop? What do you get when you apply .split()? Building more slowly and exploring around to make sure you understand what you're doing will help you avoid hitting dead ends. Hope that helps!
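To make the overwriting point concrete, here is a minimal sketch that collects one [artist, track] pair per <li> instead of reassigning a single variable. It assumes li_texts stands in for page.css('ol li').map(&:text) (the real page is not available here, so plain strings from the question are used), and parse_track is a made-up helper name. It also strips a trailing [label] from the track, which is one possible way to handle the record label.

```ruby
# Stand-in for page.css('ol li').map(&:text); values taken from the question
li_texts = [
  "Rick Wilhite - Magic Water [Still Music]",
  "Edward - Into A Better Future [Giegling]",
  "Klara Lewis - Shine [Editions Mego]"
]

# Split one line into artist and track, dropping a trailing "[label]" if present
def parse_track(text)
  artist, track = text.strip.split(" - ", 2)
  track = track.to_s.sub(/\s*\[[^\]]*\]\s*\z/, "")
  [artist, track]
end

# map builds a new array with one entry per <li>, so nothing is overwritten
tracklist = li_texts.map { |text| parse_track(text) }
tracklist.each { |artist, track| puts "#{artist}: #{track}" }
```

Because the label is removed from each string before it goes into the array, there is no need to iterate over the array a second time.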

Python Birthday paradox math not working

It runs correctly, but it should find around 500 matches and it only finds around 50, and I don't know why!
This is a problem for my comp sci class that I am having issues with.
We had to make a function that checks a list for duplicates. I got that part, but then we had to apply it to the birthday paradox (more info here: http://en.wikipedia.org/wiki/Birthday_problem). That's where I am running into a problem: my teacher said that the total number of matches should be around 500, or 50%, but for me it's only around 50-70, or 5%.
duplicateNumber=0
import random
def has_duplicates(listToCheck):
    for i in listToCheck:
        x=listToCheck.index(i)
        del listToCheck[x]
        if i in listToCheck:
            return True
        else:
            return False
listA=[1,2,3,4]
listB=[1,2,3,1]
#print has_duplicates(listA)
#print has_duplicates(listB)
for i in range(0,1000):
    birthdayList=[]
    for i in range(0,23):
        birthday=random.randint(1,365)
        birthdayList.append(birthday)
    x= has_duplicates(birthdayList)
    if x==True:
        duplicateNumber+=1
    else:
        pass
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with atleast one match. The approximate probibilatiy is", round(((duplicateNumber/1000)*100),3),"%"
This code gave me a result in line with what you were expecting:
import random
duplicateNumber=0
def has_duplicates(listToCheck):
    # A set keeps only unique values, so a shorter set means a duplicate existed
    number_set = set(listToCheck)
    if len(number_set) != len(listToCheck):
        return True
    else:
        return False
for i in range(0,1000):
    birthdayList=[]
    for j in range(0,23):
        birthday=random.randint(1,365)
        birthdayList.append(birthday)
    x = has_duplicates(birthdayList)
    if x==True:
        duplicateNumber+=1
print "after 1000 simulations with 23 students there were", duplicateNumber,"simulations with at least one match. The approximate probability is", round(((duplicateNumber/1000.0)*100),3),"%"
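The first change I made was tidying up the indices you were using in those nested for loops. You'll see I changed the second one to j, as they were previously both i. I also divide by 1000.0 rather than 1000 so that the percentage is not truncated by integer division.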
The big one, though, was to the has_duplicates function. The basic principle here is that creating a set out of the incoming list gets the unique values in the list. By comparing the number of items in the number_set to the number in listToCheck we can judge whether there are any duplicates or not.
Here is what you are looking for. As this is not standard practice (to just throw code at a new user), I apologize if this offends any other users. However, I believe that showing the OP a correct way to write a program does us all a favor, especially if the habit of leaving code undocumented would otherwise follow them further on in their career.
Thus, please take a careful look at the code, and fill in the blanks. Look up the Python documentation (as dry as it is), and try to understand the things that you don't get right away. Even if you understand something just by the name, it would still be wise to see what is actually happening when some built-in method is being used.
Last, but not least, take a look at this code, and take a look at your code. Note the differences, and keep trying to write your code from scratch (without looking at mine), and if it messes up, see where you went wrong, and start over. This sort of practice is key if you wish to succeed later on in programming!
import random
def same_birthdays():
    '''
    This is a program that does ________. It is really important
    that we tell readers of this code what it does, so that the
    reader doesn't have to piece all of the puzzles together,
    while the key is right there, in the mind of the programmer.
    '''
    count = 0
    #Count is going to store the number of times that we have the same birthdays
    timesToRun = 1000 #timesToRun should probably be in a parameter
    #timesToRun is clearly defined in its name as well. Further elaboration
    #on its purpose is not necessary.
    for i in range(0,timesToRun):
        birthdayList = []
        for j in range(0,23):
            random_birthday = random.randint(1,365)
            birthdayList.append(random_birthday)
        birthdayList = sorted(birthdayList) #sorting for easier matching
        #If we really want to, we could provide a check in the above nested
        #for loop to check right away if there is a duplicate.
        #But again, we are here
        for j in range(0, len(birthdayList)-1):
            if (birthdayList[j] == birthdayList[j+1]):
                count+=1
                break #leaving this nested for-loop
    return count
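If you wish to find the fraction of runs with a match instead, get rid of the above return statement and add the following (float() avoids integer division in Python 2; multiply by 100 for a percentage):
    return float(count) / timesToRun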
Here's a solution that doesn't use set(). It also takes a different approach with the array so that each index represents a day of the year. I also removed the has_duplicates() function.
import random
sim_total=0
birthdayList=[]
#initialize an array of 0's representing each calendar day
for i in range(365):
    birthdayList.append(0)
for i in range(0,1000):
    first_dup=True
    for n in range(365):
        birthdayList[n]=0
    for b in range(0, 23):
        r = random.randint(0,364)
        birthdayList[r]+=1
        if (birthdayList[r] > 1) and (first_dup==True):
            sim_total+=1
            first_dup=False
avg = float(sim_total) / 1000 * 100
print "after 1000 simulations with 23 students there were", sim_total,"simulations with at least one duplicate. The approximate probability is", round(avg,3),"%"

when and how to convert section of code to a method in ruby

I had a question regarding identifying all the points next to a given cell (or set of cells) in a matrix (see Need a Ruby way to determine the elements of a matrix "touching" another element). Since no suitable ideas were put forth, I decided to proceed via brute force.
The code below successfully does what I sought to do. The array tmpl (template) contains a map of how to get from a given coordinate (provided by atlantis) to the 8 cells surrounding it. I then construct an array sl (shoreline) that contains all the “underwater” land touching the shoreline of atlantis by summing each element of atlantis with all elements of tmpl.
# create method to determine elements contiguous to atlantis
require 'matrix'
atlantis = [[2,3],[3,4]]
tmpl = [[-1,-1],[-1,0],[-1,1],[0,-1],[0,1],[1,-1],[1,0],[1,1]]
ln = 0
sl = []
while ln < atlantis.length
n = 0
tsl = []
while n < 8
tsl[n] = [atlantis[ln], tmpl[n]].transpose.map { |x| x.reduce(:+) }
n = n+ 1
end
sl = sl + tsl
ln = ln + 1
end
sl = sl - atlantis
sl.uniq!
sl.to_a.each { |r| puts r.inspect }
But I have a problem (one of many remaining) in that I still need 2 levels of loops above what’s shown here (one to keep adding land to atlantis until it reaches a set size and another to make additional islands, Bermuda, Catalina, etc.) and already this is becoming difficult to read and follow. My vague understanding of object oriented programming suggests that this could be improved by turning some of these loops into methods. However, I learned to program 35 years ago in BASIC and am struggling to learn Ruby as it is. So my requests are:
Is it in fact better to turn these into methods?
If so, would anyone be willing to show me how that’s done by changing something into a method?
What do you do when you add additional levels and discover you need to change something in a lower method as a result? (e.g., after figuring out the simple case of how to create sl with just one value in atlantis, I had to go back and rework it for longer values.)
I’m hoping that by asking the question in this way, it becomes something also useful to other newbies.
BTW, this bit, .transpose.map { |x| x.reduce(:+) }, lets you add two arrays element by element, and I have no idea how it works. I found it on Stack Overflow after hours of trying to do it myself (‘cause it should be simple, and if I couldn’t do it I must be missing something obvious; yeah, I bet you know that feeling too).
already this is becoming difficult to read and follow
One way of making it less difficult to read and follow is to try to make the code self-documenting, by using readable variable names and Ruby idioms to reduce the clutter.
A quick refactor of your code gives this:
require 'matrix'
atlantis = [[2,3],[3,4]]
template = [[-1,-1],[-1,0],[-1,1],[0,-1],[0,1],[1,-1],[1,0],[1,1]]
shoreline = []
atlantis.each do |atlantum|
shoreline += template.inject([]) do |memo, element|
memo << [atlantum, element].transpose.map { |x| x.reduce(:+) }
memo
end
end
shoreline = shoreline - atlantis
shoreline.uniq!
shoreline.each { |r| puts r.inspect }
The main processing block is half the size, and (hopefully) more readable, and from here you can use the extract method refactor to tidy it further if you still need/want to.
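As a rough idea of where an extract-method refactor could end up, here is one possible sketch. The method names (neighbor_offsets, add_coordinates, neighbors, shoreline) are invented for the example and are not from the question. The comments also spell out what the transpose/reduce idiom does.

```ruby
# Offsets from a cell to its 8 surrounding cells (same values as the template)
def neighbor_offsets
  [[-1,-1],[-1,0],[-1,1],[0,-1],[0,1],[1,-1],[1,0],[1,1]]
end

# Add two coordinate pairs element by element.
# [a, b].transpose pairs up matching positions,
# e.g. [[2,3],[-1,1]] becomes [[2,-1],[3,1]], and reduce(:+) sums each pair.
def add_coordinates(a, b)
  [a, b].transpose.map { |pair| pair.reduce(:+) }
end

# All 8 cells surrounding a single cell
def neighbors(cell)
  neighbor_offsets.map { |offset| add_coordinates(cell, offset) }
end

# Every cell touching the island that is not part of the island itself
def shoreline(island)
  (island.flat_map { |cell| neighbors(cell) } - island).uniq
end

atlantis = [[2,3],[3,4]]
shoreline(atlantis).each { |cell| puts cell.inspect }
```

With the pieces named like this, the outer loops (growing an island, creating more islands) can call shoreline without caring how it works, and a later change to the neighbor rules only touches neighbor_offsets.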

Can't convert nil into string--Ruby Secret Santa

I wrote a Secret Santa program (ala Ruby Quiz...ish), but occasionally when the program runs, I get an error.
Stats: If there's 10 names in the pot, the error comes up about 5% of the time. If there's 100 names in the pot, it's less than 1%. This is on a trial of 1000 times in bash. I've determined that the gift arrays are coming up nil at some point, but I'm not sure why or how to avoid it.
Providing code...
0.upto($lname.length-1).each do |i|
j = rand($giftlname.length) # should be less each time.
while $giftlname[j] == $lname[i] # redo random if it picks same person
if $lname[i] == $lname.last # if random gives same output again, means person is left with himself; needs to switch with someone
$giftfname[j], $fname[i] = $giftfname[i], $fname[j]
$giftlname[j], $lname[i] = $giftlname[i], $lname[j]
$giftemail[j], $email[i] = $giftemail[i], $email[j]
else
j = rand($giftlname.length)
end
end
$santas.push('Santa ' + $fname[i] + ' ' + $lname[i] + ' sends gift to ' + $giftfname[j] + ' ' + $giftlname[j] + ' at ' + '<' + $giftemail[j] + '>.') #Error here, something is sometimes nil
$giftfname.delete_at(j)
$giftlname.delete_at(j)
$giftemail.delete_at(j)
end
Thanks SO!
I think your problem is right here:
$giftfname[j], $fname[i] = $giftfname[i], $fname[j]
Your i values range from zero to the last index in $fname (inclusive) and, presumably, your $giftfname starts off as a clone of $fname (or at least another array with the same length). But, as you spin through the each, you're shrinking $giftfname, so $giftfname[i] will eventually be nil and the swap operation above will put nil into $giftfname[j] (which is supposed to be a useful entry of $giftfname). Similar issues apply to $giftlname and $giftemail.
I'd recommend using one array of three-element objects (first name, last name, email) instead of your three parallel arrays. There's also a shuffle method on Array that might be of use to you:
Start with an array of people.
Make a copy of that array.
Shuffle the copy until it is different at every index from the original array.
Then zip them together to get your final list of giver/receiver pairs.
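Those steps might look something like the sketch below. It assumes every person in the list is distinct and represents each one as a hash; the method name assign_santas and the sample people are made up for the example.

```ruby
# Pair every person with a different person by reshuffling a copy
# until no index holds the same person in both arrays.
def assign_santas(people)
  raise ArgumentError, "need at least two people" if people.length < 2
  receivers = people.shuffle
  until people.zip(receivers).none? { |giver, receiver| giver == receiver }
    receivers = people.shuffle
  end
  people.zip(receivers)
end

# Hypothetical sample data
people = [
  { first: "Ann",  last: "Lee",   email: "ann@example.com" },
  { first: "Bob",  last: "Stone", email: "bob@example.com" },
  { first: "Cara", last: "Diaz",  email: "cara@example.com" }
]

assign_santas(people).each do |giver, receiver|
  puts "Santa #{giver[:first]} #{giver[:last]} sends gift to " \
       "#{receiver[:first]} #{receiver[:last]} at <#{receiver[:email]}>."
end
```

Since nothing is deleted from either array while pairing, no index can run off the end and produce nil.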
Figured it out and used the retry statement. the if statement now looks like this (all other variables have been edited to be non-global as well)
if lname[i] == lname.last
santas = Array.new
giftfname = fname.clone
giftlname = lname.clone
giftemail = email.clone
retry
That, aside from a few other edits, created the solution I needed without breaking apart the code too much again. Will definitely try out mu's solution as well, but I'm just glad I have this running error-free for now.

Ruby, Count syllables

I am using Ruby to calculate the Gunning Fog Index of some content that I have. I can successfully implement the algorithm described here:
Gunning Fog Index
I am using the below method to count the number of syllables in each word:
Tokenizer = /([aeiouy]{1,3})/
def count_syllables(word)
len = 0
if word[-3..-1] == 'ing' then
len += 1
word = word[0...-3]
end
got = word.scan(Tokenizer)
len += got.size()
if got.size() > 1 and got[-1] == ['e'] and
word[-1].chr() == 'e' and
word[-2].chr() != 'l' then
len -= 1
end
return len
end
It sometimes picks up words with only 2 syllables as having 3 syllables. Can anyone give any advice or is aware of a better method?
text = "The word logorrhoea is often used pejoratively to describe prose that is highly abstract and contains little concrete language. Since abstract writing is hard to visualize, it often seems as though it makes no sense and all the words are excessive. Writers in academic fields that concern themselves mostly with the abstract, such as philosophy and especially postmodernism, often fail to include extensive concrete examples of their ideas, and so a superficial examination of their work might lead one to believe that it is all nonsense."
# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')
word_array = text.split(' ')
word_array.each do |word|
puts word if count_syllables(word) > 2
end
"themselves" is being counted as 3 but it's only 2
The function I gave you before is based upon these simple rules outlined here:
Each vowel (a, e, i, o, u, y) in a word counts as one syllable subject to the following sub-rules:
Ignore final -ES, -ED, -E (except for -LE)
Words of three letters or less count as one syllable
Consecutive vowels count as one syllable.
Here's the code:
def new_count(word)
word.downcase!
return 1 if word.length <= 3
word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
word.sub!(/^y/, '')
word.scan(/[aeiouy]{1,2}/).size
end
Obviously, this isn't perfect either, but all you'll ever get with something like this is a heuristic.
EDIT:
I changed the code slightly to handle a leading 'y' and fixed the regex to handle 'les' endings better (such as in "candles").
Here's a comparison using the text in the question:
# used to get rid of any punctuation
# (gsub rather than gsub!, because gsub! returns nil when nothing is replaced)
text = text.gsub(/\W+/, ' ')
words = text.split(' ')
words.each do |word|
old = count_syllables(word.dup)
new = new_count(word.dup)
puts "#{word}: \t#{old}\t#{new}" if old != new
end
The output is:
logorrhoea: 3 4
used: 2 1
makes: 2 1
themselves: 3 2
So it appears to be an improvement.
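One thing to be aware of: new_count changes its argument in place (downcase! and sub!), which is why the comparison passes word.dup. If that is a nuisance, a variant that leaves the caller's string alone could look like this. The name count_syllables_heuristic is made up here and the rules are unchanged.

```ruby
# Same heuristic as new_count, but works on copies so the
# caller's string is never modified (also safe for frozen strings).
def count_syllables_heuristic(word)
  w = word.downcase
  return 1 if w.length <= 3
  w = w.sub(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')  # ignore final -es, -ed, -e
  w = w.sub(/^y/, '')                                # a leading y is not a vowel here
  w.scan(/[aeiouy]{1,2}/).size                       # each vowel group counts once
end

puts count_syllables_heuristic("themselves")  # prints 2
puts count_syllables_heuristic("used")        # prints 1
```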
One thing you ought to do is teach your algorithm about diphthongs. If I'm reading your code correctly, it would incorrectly flag "aid" as having two syllables.
You can also add "es" and the like to your special-case endings (you already have "ing") and just not count it as a syllable, but that might still result in some miscounts.
Finally, for best accuracy, you should convert your input to a spelling scheme or alphabet that has a definite relationship to the word's pronunciation. With your "themselves" example, the algorithm has no reliable way to know that the "e" in "ves" is silent. However, if you respelled it as "themselvz", or taught the algorithm the IPA and fed it [ðəmsɛlvz], it becomes very clear that the word is only pronounced with two syllables. That, of course, assumes you have control over the input, and is probably more work than just counting the syllables yourself.
To begin with it seems you should decrement len for the suffixes that should be excluded.
len -= 1 if /(?:ing|es|ed)$/.match(word)
You could also check out Lingua::EN::Readability.
It can also calculate several readability measures, such as a Fog Index and a Flesch-Kincaid level.
PS. I think I know where you got the function from. DS.
There is also a rubygem called Odyssey that calculates Gunning Fog, along with some of the other popular ones (Flesch-Kincaid, SMOG, etc.)
