Ruby splitting string into different files - ruby

Here I've created an algorithm that extracts an array of the Federalist papers and splits them up saving them into separate files titled "Federalist No." followed by their respective numbers. Everything works perfectly and the files are being created beautifully; however, the only problem I run into now is that it fails to create the last output.
Maybe it's because I've been staring at this for too many hours but I'm at an impasse.
I've inserted the line puts fedSections.length to see what the output is.
Using a smaller version of the compilation of the Fed papers for testing, the terminal output is 3... it creates "Federalist No. 0" a blank document to take into account empty space and "Federalist No. 1" with the first federalist paper. No "Federalist No. 2."
Any thoughts?
# Create new string to add array l to
fedString = " "
for f in 0...l.length-1
fedString += l[f] + ''
end
# Create variables applied to new files
Federalist_No= "Federalist No."
a = "0"
b = "FEDERALIST No."
fedSections = Array.new() # New array to insert Federalist paper to
fedSections = fedString.split("FEDERALIST No.") # Split string into elements of the array at each change in Federalist paper
puts fedSections.length
# Split gives empty string, off by one
for k in 0...fedSections.length-1 # Use of loop to write each Fed paper to its own file
new_text = File.open(Federalist_No + a + ".txt", "w") # Open said file with write capabilities
new_text.puts(b+a) # Write the "FEDERALIST No" and the number from "a"
new_text.puts fedSections[k] # Write contents of string (section of paper) to a file
new_text.close()
a = a.to_i + 1 # Increment "a" by one to accomodate for consecutive papers
a = a.to_s # Restore to string
end

The error is in your for loop
for k in 0...fedSections.length-1
you actually want
for k in 0..fedSections.length-1
... does not include the last element in the range
but as screenmutt said, it is more idiomatic ruby to use an each loop
fedSections.each do |section|

Related

Ruby file renamer

this is a text file renamer i made, you throw the file in a certain folder and the program renames them to file1.txt, file2.txt, etc
it gets the job done but it's got two problems
it gives me this error no implicit conversion of nil into String error
if i add new files into the folder where there's already organized files, they're all deleted and a new file is created
what's causing these problems?
i=0
Dir.chdir 'C:\Users\anon\Desktop\newfolder'
arr = Dir.entries('C:\Users\anon\Desktop\newfolder')
for i in 2..arr.count
if (File.basename(arr[i]) == 'file'+((i-1).to_s)+'.txt')
puts (arr[i]+' is already renamed to '+'file'+i.to_s)
else
File.rename(arr[i],'file'+((i-1).to_s)+'.txt')
end
end
There are two main problems in your program.
The first is that you are using an out of bounds value in the array arr. Try this a = [1,2,3]; a[a.count] and you will get nil because you are trying at access a[3] but the last element in the array has index 2.
Then, you are using as indexes for names fileINDEX.txt always 2...foobar without taking into account that some indexes may be already used in your directory.
Extra problem, you are using Dir.entries, this in my OS gives regular entries more . and .. which should be managed properly, they are not what you want to manipulate.
So, I wrote you a little script, I hope you find it readable, to me it works. You can improve it for sure! (p.s. I am under Linux OS).
# Global var only to stress its importance
$dir = "/home/p/tmp/t1"
Dir.chdir($dir)
# get list of files
fnames = Dir.glob "*"
# get the max index "fileINDEX.txt" already used in the directory
takenIndexes = []
fnames.each do |f|
if f.match /^file(\d+).txt/ then takenIndexes.push $1.to_i; end
end
# get the first free index available
firstFreeIndex = 1
firstFreeIndex = (takenIndexes.max + 1) if takenIndexes.length > 0
# get a range of fresh indexes for possible use
idxs = firstFreeIndex..(firstFreeIndex + (fnames.length))
# i transform the range to list and reverse the order because i want
# to use "pop" to get and remove them.
idxs = idxs.to_a
idxs.reverse!
# rename the files needing to be renamed
puts "--- Renamed files ----"
fnames.each do |f|
# if file has already the wanted format then move to next iteration
next if f.match /^file\d+.txt/
newName = "file" + idxs.pop.to_s + ".txt"
puts "rename: #{f} ---> #{newName}"
File.rename(f, newName)
end

copy the lines of a file into hashmap in ruby

I have a file with multiple lines. In each line, there two words and a number, split by a comma - for example a, b, 1. It means that string A and string B have the key as 1. I wrote the below piece of code
File.open(ARGV[0], 'r') do |f1|
while line = f1.gets
puts line
end
end
i'm looking for an idea of how to split and copy the characters and number in such a way that the first two words have the last number as key in the hashmap.
Does this work for you?
hash = {}
File.readlines(ARGV[0]).each do |line|
var = line.gsub(' ','').split(',')
hash[var[2]] = var[0],var[1]
end
This would give:
hash['1'] = ['a','b']
I don't know if you want to store number one as an integer or a string, if it's a integer you're looking for, just do var[2].to_i before storing.
Modified your code a little bit, i think it's shorter this way, if i'm in any way wrong, do let me know.

Programming concept

I want to make a program that sort mail from junkmail using a point--system.
For some couple of words in the mail,
I want the program to give different points for each word that I have in my program categorized as "junkwords" where I also have assign different points for different words, so that each word is worth some amount of points.
My pseudocode:
Read text from file
Look for "junk words"
for each word that comes up give the point the word is worth.
If the total points for each junkword is 10 print "SPAM" followed by a list of words that were in the file and categorized as junkwords and their points.
Example (a textfile):
Hello!
Do you have trouble sleeping?
Do you need to rest?
Then dont hesitate call us for the absolute solution- without charge!
So when the programs run and analyzes the text above it should look like:
SPAM 14p
trouble 6p
charge 3p
solution 5p
So what I was planing to write was in this manners:
class junk(object):
fil = open("filnamne.txt","r")
junkwords = {"trouble":"6p","solution":"3p","virus":"4p"}
words = junkwords
if words in fil:
print("SPAM")
else:
print("The file doesn't contain any junk")
So my problem now is how do I give points for each word in my list that comes up in the file?
And how to I sum the total points so that if total_points are > 10 then the program should print "SPAM",
Followed by the list of the 'junkwords' that are found in the file and the total points of each word..
Here is a quick script that might get you close to there:
MAXPOINTS = 10
JUNKWORDS={"trouble":6,"solution":5,"charge":3,"virus":7}
fil = open("filnamne.txt", "r")
foundwords = {}
points = 0
for word in fil.read().split():
if word in JUNKWORDS:
if word not in foundwords:
foundwords[word] = 0
points += JUNKWORDS[word]
foundwords[word] += 1
if points > 10:
print "SPAM"
for word in foundwords:
print word, foundwords[word]*JUNKWORDS[word]
else:
print "The file doesn't contain any junk"
You may want to use .lower() on the words and make all your dictionary keys lowercase. Maybe also remove all non-alphanumeric characters.
Here's another approach:
from collections import Counter
word_points = {'trouble': 6, 'solution': 5, 'charge': 3, 'virus': 7}
words = []
with open('ham.txt') as f:
for line in f:
if line.strip(): # weed out empty lines
for word in line.split():
words.append(word)
count_of_words = Counter(words)
total_points = {}
for word in word_points:
if word in count_of_words:
total_points[word] = word_points[word] * count_of_words[word]
if sum(i[0] for i in total_points.iteritems()) > 10:
print 'SPAM {}'.format(sum(i[0] for i in total_points.iteritems()))
for i in total_points.iteritems():
print 'Word: {} Points: {}'.format(*i)
There are some optimizations you can do, but it should give you an idea of the general logic. Counter is available from Python 2.7 and above.
I have assumed that each word has different points, so I have used a dictionary.
You need to find the number of times a word in words has come in the file.
You should store the point for each word as an integer. not as '6p' or '4p'
So, try this:
def find_junk(filename):
word_points = {"trouble":6,"solution":3,"charge":2,"virus":4}
word_count = {word:0 for word in word_points}
count = 0
found = []
with open(filename) as f:
for line in f:
line = line.lower()
for word in word_points:
c = line.count(word)
if c > 0:
count += c * word_points[word]
found.append(word)
word_count[word] += c
if count >= 10:
print ' SPAM'*4
for word in found:
print '%10s%3s%3s' % (word, word_points[word], word_count[word])
else:
print "Not spam"
find_junk('spam.txt')

How to get text between two strings in ruby?

I have a text file that contains this text:
What's New in this Version
==========================
-This is the text I want to get
-It can have 1 or many lines
-These equal signs are repeated throughout the file to separate sections
Primary Category
================
I just want to get everything between ========================== and Primary Category and store that block of text in a variable. I thought the following match method would work but it gives me, NoMethodError: undefined method `match'
f = File.open(metadataPath, "r")
line = f.readlines
whatsNew = f.match(/==========================(.*)Primary Category/m).strip
Any ideas? Thanks in advance.
f is a file descriptor - you want to match on the text in the file, which you read into line. What I prefer to do instead of reading the text into an array (which is hard to regex on) is to just read it into one string:
contents = File.open(metadataPath) { |f| f.read }
contents.match(/==========================(.*)Primary Category/m)[1].strip
The last line produces your desired output:
-This is the text I want to get \n-It can have 1 or many lines\n-These equal signs are repeated throughout the file to separate sections"
f = File.open(metadataPath, "r")
line = f.readlines
line =~ /==========================(.*)Primary Category/m
whatsNew = $1
you may want to consider refining the .* though as that could be greedy
Your problem is that readlines gives you an array of strings (one for each line), but the regular expression you're using needs a single string. You could read the file as one string:
contents = File.read(metadataPath)
puts contents[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
or you could join the lines into a single string before applying the regular expression:
lines = File.readlines(metadataPath)
puts lines.join[/^=+(.*?)Primary Category/m]
# => ==========================
# => -This is the text I want to get
# => -It can have 1 or many lines
# => -These equal signs are repeated throughout the file to separate sections
# =>
# => Primary Category
The approach I'd take is read in the lines, find out which line numbers are a series of equal signs (using Array#find_index), and group the lines into chunks from the line after the equal signs to the line before (or two lines before) the next lot of equal signs (probably using Enumerable#each_cons(2) and map). That way I don't have to modify much if the section headings change.

Using Ruby to find the first previous occurrence of a string

I'm creating some basic work assistance utilities using Ruby. I've hit a problem that I don't really need to solve, but curiosity has the best of me.
What I would like to be able to do is search the contents of a file, starting from a particular line and find the first PREVIOUS occurrence of a string.
For example, if I have the following text saved in a file, I would like to be able to search for "CREATE PROCEDURE" starting at line 4 and have this return/output "CREATE PROCEDURE sp_MERGE_TABLE"
CREATE PROCEDURE sp_MERGE_TABLE
AS
SOME HORRIBLE STATEMENT
HERE
CREATE PROCEDURE sp_SOMETHING_ELSE
AS
A DIFFERENT STATEMENT
HERE
Searching for content isn't a challenge, but specifying a starting line - no idea. And then searching backwards... well...
Any help at all appreciated!
TIA!
I think you have to read file line one by line
then follwing will work
flag=true
if flag && line.include?("CREATE PROCEDURE")
puts line
flag=false
end
If performance isn't a big issue, you could just use a simple loop:
# pseudocode
line_no = 0
while line_no < start_line
read line from file
if content_found in this line
last_seen = line_no # or file offset
end
line_no += 1
end
return last_seen
I'm afraid you will have to work line by line through the file, unless you have some index over it, pointing to the beginnings of the lines. That would make the loop a little bit simpler but working through the file in backwards manner is harder (unless you keep the whole file in memory).
Edit:
I just had a much better idea, but I'm going to include the old solution anyway.
The benefit of searching backwards means you only have to read the first chunk of the file, upto the specified line number. For proximity, you get closer and closer to the start_line, and if you find a match you just forget the old one.. You still read in some redundant data at the beginning, but at least it's O(n)
path = "path/to/file"
start_line = 20
search_string = "findme!"
#assuming file is at least start_line lines long
match_index = nil
f = File.new(path)
start_line.times do |i|
line = f.readline
match_index = i if line.include? search_string
end
puts "Matched #{search_string} on line #{match_index}"
Of course, bear in mind that the size of this file plays an important role in answering your question.
If you wanted to get really serious, you could look into the IO class - it seems like this might be the ultimate solution. Untested, just a thought.
f = File.new(path)
start_line.downto(0) do |i|
f.lineno = i
break if f.gets.include?(search_string)
end
Original:
For an exhaustive solution, you could try something like the following. The downside is you'd need to read the whole file into memory, but it takes into account continuing from the bottom-up if it gets to the top without a match. Untested.
path = "path/to/file"
start_line = 20
search_string = "findme!"
#get lines of the file into an array (chomp optional)
lines = File.readlines(path).map(&:chomp)
#"cut" the deck, as with playing cards, so start_line is first in the array
lines = lines.slice!(start_line..lines.length) + lines
#searching backwards can just be searching a reversed array forwards
lines.reverse!
#search through the reversed-array, for the first occurence
reverse_occurence = nil
lines.each_with_index do |line,index|
if line.include?(search_string)
reverse_occurence = index
break
end
end
#reverse_occurence is now either "nil" for no match, or a reversed-index
#also un-cut the array when calculating the index
if reverse_occurence
occurence = lines.size - reverse_occurence - 1 + start_line
line = lines[reverse_occurence]
puts "Matched #{search_string} on line #{occurence}"
puts line
end
1) Read the entire file into a string.
2) Reverse the file-data string.
3) Reverse the search string.
4) Search forward. Remember to match end-of-line instead of beginning-of-line, and to start from position end-minus-N rather than from N.
Not very fast or efficient, but it's elegant. Or at least clever.

Resources