Swap Words in File with hash - ruby

I have a text file and I am trying to replace certain lines with the values in a hash. I am trying to make it loop through the file and swap out anything that matches the hash. For some reason this isn't working; it only duplicates the file and doesn't swap anything out. Any ideas?
HASHBROWNS = {
  'mustard' => 'dijon',
  'ketchup' => 'catsup',
}

File.open('new_hashed_file.txt', 'w') do |file|
  File.open('oldfile.txt', 'r').readlines.each do |swaparoo|
    if HASHBROWNS.has_key?(swaparoo.downcase)
      file.puts HASHBROWNS[swaparoo.downcase]
    else
      file.puts swaparoo
    end
  end
end
Thanks
Ryn

Change this line:
File.open('oldfile.txt', 'r').readlines.each do |swaparoo|
to this:
File.open('oldfile.txt', 'r').readlines.map(&:chomp).each do |swaparoo|
The problem is your array of lines contains newlines.

When you read data with readlines there will be a newline present in each string. This is what's making your match miss. The easy way is to just trim it off with chomp. You may want to modify your test slightly:
File.open('new_hashed_file.txt', 'w') do |file|
  File.open('oldfile.txt', 'r').readlines.each do |line|
    line = line.chomp.downcase
    file.puts HASHBROWNS[line] || line
  end
end
One thing to watch for is repeatedly calling methods like downcase when you can simply save the result to a temporary variable and reuse it.
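As an aside, on Ruby 2.4 or newer the chomping can be pushed into the read itself via the chomp: keyword. A minimal sketch of the same swap, where the chomp: option is the only thing assumed beyond the question's code:

HASHBROWNS = {
  'mustard' => 'dijon',
  'ketchup' => 'catsup',
}

File.open('new_hashed_file.txt', 'w') do |file|
  # chomp: true strips the trailing newline from each line as it is read
  File.readlines('oldfile.txt', chomp: true).each do |line|
    key = line.downcase
    file.puts HASHBROWNS[key] || line
  end
end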


How to delete lines from multiple files

I'm trying to read a file (d:\mywork\list.txt) line by line and search whether that string occurs in any of the files (one by one) in a particular directory (d:\new_work).
If it is present in any of the files (there may be one or more), I want to delete the string (car\yrui3,) from the respective files and save them.
list.txt:
car\yrui3,
dom\09iuo,
id\byt65_d,
rfc\some_one,
desk\aa_tyt_99,
.........
.........
The directory d:\new_work contains multiple files:
Rollcar-access.txt
Mycar-access.txt
Newcar-access.txt
.......
......
My code:
value = File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
  line.chomp!
  print "For the string: #{line}"
  Dir.glob("D:/new_work/*-access.txt") do |fn|
    print "checking files: #{fn}\n"
    text = File.read(fn)
    replace = text.gsub(line.strip, "")
    File.open(fn, "w") { |file| file.puts replace }
  end
end
The issue is that the values are not getting deleted as expected. Also, text is empty when I try to print its value.
There are a number of things wrong with your code, and you're not safely handling your file changes.
Meditate on this untested code:
ACCESS_FILES = Dir.glob("D:/new_work/*-access.txt")

File.foreach('D:/mywork/list.txt') do |target|
  target = target.strip.sub(/,$/, '')

  ACCESS_FILES.each do |filename|
    new_filename = "#{filename}.new"
    old_filename = "#{filename}.old"

    File.open(new_filename, 'w') do |fileout|
      File.foreach(filename) do |line_in|
        fileout.puts line_in unless line_in[target]
      end
    end

    File.rename(filename, old_filename)
    File.rename(new_filename, filename)
    File.delete(old_filename)
  end
end
In your code you use:
File.open('D:\\mywork\\list.txt').read
Instead, a shorter, more concise and clearer way is to use:
File.read('D:/mywork/list.txt')
Ruby will automatically adjust the pathname separators based on the OS, so always use forward slashes for readability. From the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb".
The problem with using read is that it isn't scalable. Imagine if you were doing this in a long-term production system and your input file had grown into the TB range. You'd halt the processing on your system until the file could be read. Don't do that.
Instead use foreach to read line-by-line. See "Why is "slurping" a file not a good practice?". That'll remove the need for
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
While
Dir.glob("D:/new_work/*-access.txt") do |fn|
is fine, but its placement isn't. You're running it for every line processed in the file being read, wasting CPU. Glob the directory once, store the result, then iterate over that stored list repeatedly.
Again,
text = File.read(fn)
has scalability issues. Using foreach is a better solution. Again.
Replacing the text using gsub is fast, but it doesn't outweigh the potential problems of scalability when line-by-line IO is just as fast and sidesteps the issue completely:
replace = text.gsub(line.strip, "")
Opening and writing to the same file as you were reading is an accident waiting to happen in a production environment:
File.open(fn, "w") { |file| file.puts replace }
A better practice is to write to a separate, new, file, rename the old file to something safe, then rename the new file to the old file's name. This preserves the old file in case the code or machine crashes mid-save. Then, when that's finished it's safe to remove the old file. See "How to search file text for a pattern and replace it with a given value" for more information.
A final recommendation is to strip all the trailing commas from your input file. They're not accomplishing anything and are only making you do extra work to process the file.
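A one-off cleanup pass could look something like this (untested sketch; it reuses the rename pattern above, and slurping the list into memory is acceptable here only because list.txt is small):

# Rewrite list.txt without the trailing commas, keeping a backup until the swap succeeds.
cleaned = File.foreach('D:/mywork/list.txt').map { |line| line.chomp.sub(/,\z/, '') }

File.open('D:/mywork/list.txt.new', 'w') { |f| f.puts cleaned }
File.rename('D:/mywork/list.txt', 'D:/mywork/list.txt.old')
File.rename('D:/mywork/list.txt.new', 'D:/mywork/list.txt')
File.delete('D:/mywork/list.txt.old')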
I just ran your code and it works as expected on my machine. My best guess is that you're not taking the commas at the end of each line in list.txt into account. Try removing them with an extra chomp!:
value = File.open('D:\\mywork\\list.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
  line.chomp!
  line.chomp!(",")
  print "For the string: #{line}"
  Dir.glob("D:/new_work/*-access.txt") do |fn|
    print "checking files: #{fn}\n"
    text = File.read(fn)
    replace = text.gsub(line.strip, "")
    File.open(fn, "w") { |file| file.puts replace }
  end
end
By the way, you shouldn't need this line: value.gsub!(/\r\n?/, "\n") since you're chomping all the newlines away anyway, and chomp can recognize \r\n by default.
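A quick IRB-style illustration of chomp's defaults, using values shaped like the ones in list.txt:

"car\r\n".chomp         #=> "car"        (chomp strips \n, \r, or \r\n by default)
'dom\09iuo,'.chomp(",") #=> 'dom\09iuo'  (with an argument, chomp strips that exact suffix)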

Alternative code to read and process array by newline in Ruby

My code is supposed to read a file on the server, store its content in an array, then read the array elements (each element is a line) and split each line into 7 parts on the colon (:).
I wrote this code and it works 100% fine.
lines = File.readlines('/etc/passwd')
lines.each do |line|
  line = line.chomp! # I removed the \n
  line_arr = line.split(/:/)
  puts line_arr.inspect
  puts "*************"
end
I just want to know if there is a shortcut for this, since each element of the array ends with \n.
Maybe I am a bit confused between array elements ending with \n and a string that contains \n.
The content of the file looks like this:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
As for the output, there's no specific format, because I am going to use this part and extend my code later. As long as I can access those 7 parts that I extracted into line_arr, I should be fine.
Thank you
require 'etc'

[].tap do |ary|
  Etc.passwd do |u|
    ary << [u.name, u.passwd, u.uid, u.gid, u.gecos, u.dir, u.shell,
            u.change, u.uclass, u.expire]
  end
end
Rule of thumb: never try to reimplement behavior that someone else has already written for you. Unless you are really, really, really, REALLY smart.
Actually, now that you have edited your question, I don't even see why you need those arrays in the first place and cannot just use the Etc.passwd iterator and Struct::Passwd directly.
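For instance, something like this (untested) walks the entries directly, with no intermediate array at all:

require 'etc'

# Etc.passwd with a block yields one Struct::Passwd per passwd entry.
Etc.passwd do |u|
  puts "#{u.name}: uid=#{u.uid} gid=#{u.gid} home=#{u.dir} shell=#{u.shell}"
end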

How to print from specific column range?

I want to grab columns 46 to 245 of only the first line of source.txt and write them to output.txt.
source_file.each { |line|
  File.open(output_file, "a+") { |f|
    f.print ???
  }
}
Bonus: I also need to keep a count of the number of characters in this range, as some may be whitespace, e.g. 38 characters and the rest whitespace.
Example:
source_file: (first line only, columns 45 to 245): 13287912721981239854 + 180 blank columns
output_file: 13287912721981239854
count = 20 characters
Update: appending [46..245].delete(' ').size gives me the desired count.
If I am understanding what you are asking correctly, there's no reason to grab the whole file when you only want the first line. If this isn't what you're asking for, then you need to specify what you're trying to pull out of the source file more clearly.
This should grab the data you need:
output_line = source_file.gets[45..244]
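For the bonus count, the same slice can feed the delete(' ').size trick mentioned in the question's update. A minimal untested sketch, with file names taken from the question:

File.open("source.txt") do |source_file|
  slice = source_file.gets[45..244]   # columns 46 to 245, zero-indexed
  File.open("output.txt", "a") { |f| f.puts slice.strip }
  puts "count = #{slice.delete(' ').size} characters"
end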
If you write:
source_file.each { |line|
File.open(output_file,"a+") { |f|
f.print ???
}
}
You will open, then close, your output file for each line read from the input file. That is the wrong way to do it, even if you only want to read one line of input.
Instead try something like one of these:
File.open(output_file, 'a') do |fo|
  File.open('path/to/input_file') do |fi|
    fo.puts fi.readline[46..245]
  end
end
This uses IO.readline, which reads a single line from the file. The block falls through afterwards, causing both the input and output files to be closed automatically. Also, it opens the output file as 'a' which is append-mode only. 'a+' is wrong unless you intend to append and read, which is rarely done. From the documentation:
"a+" Read-write, starts at end of file if file exists,
otherwise creates a new file for reading and
writing
Or:
File.open(output_file, 'a') do |fo|
  File.foreach('path/to/input_file') do |li|
    fo.puts li[46..245]
    break
  end
end
foreach is used most often when we're reading a file line-by-line. It's the mainstay for reading files in a scalable manner. It wants to loop over the file inside the block, which is why break is there, to break out of that loop.
Or:
File.foreach('path/to/input_file') do |li|
  File.write(output_file, li[46..245], -1, :mode => 'a')
  break
end
File.write is useful when you have a blob of text or binary, and want to write it in one chunk, then move on. The -1 tells Ruby to move to the end of the file. :mode => 'a' overrides the default mode which would normally truncate an existing file.
Maybe this will do the job:
File.open("source.txt") do |f|
  line = f.readline # readline reads just the first line, so nothing else is processed
  columns = line.split
  File.open("output.txt", "w") do |out|
    columns[46, (245 - 46 + 1)].each do |column|
      out.puts column
    end
  end
end
I have used 245 - 46 + 1 to indicate that this is the number of columns we are interested in. I have also assumed that columns are separated by whitespace. If that is not the case, you will need to change the delimiter of split.

Deleting a specific line in a text file?

How can I delete a single, specific line from a text file? For example, the third line, or any other line. I tried this:
line = 2
file = File.open(filename, 'r+')
file.each { last_line = file.pos unless file.eof? }
file.seek(last_line, IO::SEEK_SET)
file.close
Unfortunately, it does nothing. I tried a lot of other solutions, but nothing works.
I think you can't do that safely because of file system limitations.
If you really want to do in-place editing, you could read the file into memory, edit it, and then replace the old file. But beware that there are at least two problems with this approach. First, if your program stops in the middle of rewriting, you will get an incomplete file. Second, if your file is too big, it will eat your memory.
file_lines = ''
IO.readlines(your_file).each do |line|
  file_lines += line unless <put here your condition for removing the line>
end
# <extra string manipulation to file_lines if you wanted>
File.open(your_file, 'w') do |file|
  file.puts file_lines
end
Something along those lines should work, but using a temporary file is a much safer and more standard approach:
require 'fileutils'

File.open(output_file, "w") do |out_file|
  File.foreach(input_file) do |line|
    out_file.puts line unless <put here your condition for removing the line>
  end
end

FileUtils.mv(output_file, input_file)
Your condition could be anything that identifies the unwanted line. For example, file_lines += line unless line.chomp == "aaab" would remove the line "aaab".
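Since the question asks about a specific line number (the third line, say), the condition can also key off the line index. An untested sketch using the same temporary-file pattern:

require 'fileutils'

line_to_delete = 3 # 1-based index of the line to drop

File.open(output_file, "w") do |out_file|
  # with_index(1) numbers lines starting at 1 to match "the third line"
  File.foreach(input_file).with_index(1) do |line, lineno|
    out_file.puts line unless lineno == line_to_delete
  end
end

FileUtils.mv(output_file, input_file)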

How can I further process the line of data that causes the Ruby FasterCSV library to throw a MalformedCSVError?

The incoming data file(s) contain malformed CSV data such as non-escaped quotes, as well as (valid) CSV data such as fields containing new lines. If a CSV format error is detected I would like to use an alternative routine on that data.
With the following sample code (abbreviated for simplicity)
FasterCSV.open(file) do |csv|
  row = true
  while row
    begin
      row = csv.shift
      break unless row
      # Do things with the good rows here...
    rescue FasterCSV::MalformedCSVError => e
      # Do things with the bad rows here...
      next
    end
  end
end
The MalformedCSVError is caused in the csv.shift method. How can I access the data that caused the error from the rescue clause?
require 'csv' # CSV in Ruby 1.9.2 is identical to FasterCSV

# File.open('test.txt', 'r').each do |line|
DATA.each do |line|
  begin
    CSV.parse(line) do |row|
      p row # handle row
    end
  rescue CSV::MalformedCSVError => er
    puts er.message
    puts "This one: #{line}"
    # and continue
  end
end

# Output:
# Unclosed quoted field on line 1.
# This one: 1,"aaa
# Illegal quoting on line 1.
# This one: aaa",valid
# Unclosed quoted field on line 1.
# This one: 2,"bbb
# ["bbb", "invalid"]
# ["3", "ccc", "valid"]

__END__
1,"aaa
aaa",valid
2,"bbb
bbb,invalid
3,ccc,valid
Just feed the file line by line to FasterCSV and rescue the error.
This is going to be really difficult. Some things that make FasterCSV, well, faster, make this particularly hard. Here's my best suggestion: FasterCSV can wrap an IO object. What you could do, then, is to make your own subclass of File (itself a subclass of IO) that "holds onto" the result of the last gets. Then when FasterCSV raises an exception you can ask your special File object for the last line. Something like this:
class MyFile < File
  attr_accessor :last_gets

  def gets(*args)
    line = super
    (@last_gets ||= '') << $/ << line if line
    line
  end
end
# then...
file = MyFile.open(filename, 'r')
csv = FasterCSV.new(file)

row = true
while row
  begin
    break unless row = csv.shift
    # do things with the good row here...
  rescue FasterCSV::MalformedCSVError => e
    bad_row = file.last_gets
    # do something with bad_row here...
    next
  ensure
    file.last_gets = '' # nuke the @last_gets "buffer"
  end
end
Kinda neat, right? BUT! there are caveats, of course:
I'm not sure how much of a performance hit you take when you add an extra step to every gets call. It might be an issue if you need to parse multi-million-line files in a timely fashion.
This might or might not fail if your CSV file contains newline characters inside quoted fields. The reason is described in the source: basically, if a quoted value contains a newline then shift has to do additional gets calls to get the entire line. There could be a clever way around this limitation, but it's not coming to me right now. If you're sure your file doesn't have any newline characters within quoted fields then this shouldn't be a worry for you.
Your other option would be to read the file line by line with gets and pass each line in turn to FasterCSV.parse_line, but I'm pretty sure that in doing so you'd squander any performance advantage gained from using FasterCSV.
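That line-at-a-time fallback would look roughly like this (untested sketch; note it shares the caveat above, since a quoted field spanning physical lines will be split across parse_line calls):

File.open(filename, 'r') do |file|
  file.each_line do |line|
    begin
      row = FasterCSV.parse_line(line)
      # do things with the good row here...
    rescue FasterCSV::MalformedCSVError => e
      # the raw line is already in hand, so handle the bad data directly
      bad_row = line
    end
  end
end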
I used Jordan's file subclassing approach to fix the problem with my input data before CSV ever tries to parse it. In my case, I had a file that used \" to escape quotes, instead of the "" that CSV expects. Hence,
class MyFile < File
  def gets(*args)
    line = super
    if line != nil
      line.gsub!('\\"', '""') # fix the \" that would otherwise cause a parse error
    end
    line
  end
end

infile = MyFile.open(filename)
incsv = CSV.new(infile)
while row = incsv.shift
  # process each row here
end
This allowed me to parse the non-standard CSV file. Ruby's CSV implementation is very strict and often has trouble with the many variants of the CSV format.
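As a closing aside: if you can use a newer Ruby, the bundled CSV library (2.4+) has a liberal_parsing option that tolerates some non-RFC-4180 input, such as stray double quotes inside unquoted fields. A quick sketch with a hypothetical input value:

require 'csv'

# Without liberal_parsing this line raises CSV::MalformedCSVError ("Illegal quoting").
p CSV.parse_line('one,two "quoted" maybe,three', liberal_parsing: true)
#=> ["one", "two \"quoted\" maybe", "three"]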
