Custom serialize and parse methods in Ruby - ruby

I have developed this class Directory that somewhat emulates a directory using hashes. I am having difficulty figuring out how to write the serialize and parse methods. The string returned from the serialize method should look something like this:
2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:
Now to explain what exactly this means. This is the master directory, and the 2 up front is the number of files. Then we have the file name README, and after that the length of the file's contents, 19, followed by the contents themselves, represented with a string that I get from the parse method of the other class in the module. After that comes the second file. Also notice that the two files are not separated by :, we don't need it here since we know the string length. So in a little better look:
<file count><file1_data><file2_data>1:rbfs:4:0:0:, here <file1_data>, encompasses the name, length and contents part.
Now the 1:rbfs:4:0:0: means we have one sub-directory with name rbfs, 4 representing the length of its contents as a string and 0:0: representing that it's empty: no files and no sub-directories. Here is another example:
0:1:directory1:40:0:1:directory2:22:1:README:9:number:420: which is equivalent to:
.
`-- directory1
`-- directory2
`-- README
I have no problem with the files part, and I know how to get the number of directories and their names, but for the rest I have no idea what to do. I know that recursion is the best answer, but I have no clue what the base case of that recursion should be or how to implement it. Also, solving this will help greatly in figuring out how to do the parse method by reversing it.
The code is below:
module RBFS
  class File
    ... # here I have working `serialize` and `parse` methods for `File`
  end
  class Directory
    attr_accessor :content
    def initialize
      @content = {}
    end
    def add_file(name, file)
      @content[name] = file
    end
    def add_directory(name, subdirectory = nil)
      if subdirectory
        @content[name] = subdirectory
      else
        @content[name] = RBFS::Directory.new
      end
    end
    def serialize
      ...?
    end
    def self.parse(string)
      ...?
    end
  end
end
PS: I check the kind of values in the hash with the is_a? method.
Another example for @Jordan:
2:file1:17:string:Test test?file2:10:number:4322:direc1:34:0:1:dir2:22:1:README:9:number:420:direc2::1:README2:9:number:33:0
...should be this structure (if I've formulated it right):
. ->file1,file2
`-- direc1,.....................................direc2 -> README2
`-- dir2(subdirectory of direc1) -> README
direc1 contains only a directory and no files, while direc2 contains only a file.
You can see that the master directory doesn't specify its string length while all others do.

Okay, let's work through this iteratively, starting with your example:
str = "2:README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
entries = {} # No entries yet!
The very first thing we need to know is how many files there are, and we know that's the number before the first ":":
num_entries, rest = str.split(':', 2)
num_entries = Integer(num_entries)
# num_entries is now 2
# rest is now "README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
(The second argument to split says "I only want 2 pieces," so it stops splitting after the first ":".) We use Integer(n) instead of n.to_i because it's stricter. (to_i will convert "10xyz" to 10; Integer will raise an error, which is what we want here.)
Now we know we have two files. We don't know anything else yet, but what's left of our string is this:
README:19:string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:
The next thing we can get is the name and length of the first file.
name, len, rest = rest.split(':', 3)
len = Integer(len)
# name = "README"
# len = 19
# rest = "string:Hello world!spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
Cool, now we have the name and length of the first file, so we can get its content:
content = rest.slice!(0, len)
# content = "string:Hello world!"
# rest = "spec.rb:20:string:describe RBFS1:rbfs:4:0:0:"
entries[name] = content
# entries = { "README" => "string:Hello world!" }
We used rest.slice!, which modifies the string in place: it removes len characters from the front of the string and returns them, so content is just what we want (string:Hello world!) and rest is everything that was after it. Then we added it to the entries Hash. One file down, one to go!
For the second file, we do the exact same thing:
name, len, rest = rest.split(':', 3)
len = Integer(len)
# name = "spec.rb"
# len = 20
# rest = "string:describe RBFS1:rbfs:4:0:0:"
content = rest.slice!(0, len)
# content = "string:describe RBFS"
# rest = "1:rbfs:4:0:0:"
entries[name] = content
# entries = { "README" => "string:Hello world!",
# "spec.rb" => "string:describe RBFS" }
Since we do the exact same thing twice, obviously we should do this in a loop! But before we write that, we need to get organized. So far we have two discrete steps: First, get the number of files. Second, get those files' contents. We also know we'll need to get the number of directories and the directories. We'll take a guess at how this'll look:
def parse(serialized)
files, rest = parse_files(serialized)
# `files` will be a Hash of file names and their contents and `rest` will be
# the part of the string we haven't parsed yet
directories, rest = parse_directories(rest)
# `directories` will be a Hash of directory names and their contents
files.merge(directories)
end
def parse_files(serialized)
# Get the number of files from the beginning of the string
num_entries, rest = serialized.split(':', 2)
num_entries = Integer(num_entries)
entries = {}
# `rest` now starts with the first file (e.g. "README:19:...")
num_entries.times do
name, len, rest = rest.split(':', 3) # get the file name and length
len = Integer(len)
content = rest.slice!(0, len) # get the file contents from the beginning of the string
entries[name] = content # add it to the hash
end
[ entries, rest ]
end
def parse_directories(serialized)
# TBD...
end
That parse_files method is a bit long for my taste, though, so how about we split it up?
def parse_files(serialized)
# Get the number of files from the beginning of the string
num_entries, rest = serialized.split(':', 2)
num_entries = Integer(num_entries)
entries = {}
# `rest` now starts with the first file (e.g. "README:19:...")
num_entries.times do
name, content, rest = parse_file(rest)
entries[name] = content # add it to the hash
end
[ entries, rest ]
end
def parse_file(serialized)
name, len, rest = serialized.split(':', 3) # get the name and length of the file
len = Integer(len)
content = rest.slice!(0, len) # use the length to get its contents
[ name, content, rest ]
end
Clean!
Now, I'm going to give you a big spoiler: Since the serialization format is reasonably well-designed, we don't actually need a parse_directories method, because it would do exactly the same thing as parse_files. The only difference is that after this line:
name, content, rest = parse_file(rest)
...we want to do something different if we're parsing directories instead of files. In particular, we want to call parse(content), which will do all of this over again on the directory's contents. Since parse_files is pulling double-duty now, we should probably change its name to something more general like parse_entries, and we also need to give it another argument to tell it when to do that recursion.
Rather than post more code here, I've posted my "finished" product over in this Gist.
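Since the Gist itself isn't reproduced here, this is a minimal sketch of what the generalized method could look like. It is not the Gist's code: the recurse: keyword argument is a name chosen for illustration, and the result is a plain nested Hash rather than RBFS::Directory objects.

```ruby
# Parses "<count>:<name>:<len>:<content>..." and returns [entries, rest].
# When recurse is true, each entry's content is itself parsed as a directory.
# (`recurse:` is a hypothetical parameter name, not taken from the original.)
def parse_entries(serialized, recurse: false)
  count, rest = serialized.split(':', 2)
  entries = {}
  Integer(count).times do
    name, len, rest = rest.split(':', 3)
    content = rest.slice!(0, Integer(len))
    entries[name] = recurse ? parse(content) : content
  end
  [entries, rest]
end

# A directory is a run of file entries followed by a run of directory entries.
def parse(serialized)
  files, rest = parse_entries(serialized)
  directories, _rest = parse_entries(rest, recurse: true)
  files.merge(directories)
end
```

The recursion bottoms out on its own: an empty directory is "0:0:", so both calls to parse_entries loop zero times and parse returns an empty Hash without calling itself again.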
Now, I know that doesn't help you with the serialize part, but hopefully it'll help get you started. serialize is the easier part because there are plenty of questions and answers on SO about recursively iterating over a Hash.
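For the serialize side, here is a rough sketch under some assumptions. The File class below is only a stand-in (the question elides the real one), and serialize_entries is a helper name made up for this example. The base case the question asks about falls out naturally: a directory with no entries serializes to "0:0:" without any recursive call.

```ruby
module RBFS
  # Hypothetical stand-in for the asker's File class; only #serialize matters here.
  class File
    def initialize(data)
      @data = data
    end

    def serialize
      type = @data.is_a?(Numeric) ? "number" : "string"
      "#{type}:#{@data}"
    end
  end

  class Directory
    attr_reader :content

    def initialize
      @content = {}
    end

    def add_file(name, file)
      @content[name] = file
    end

    def add_directory(name, subdirectory = nil)
      @content[name] = subdirectory || Directory.new
    end

    # Files first, then sub-directories, each group prefixed with its count.
    def serialize
      dirs, files = @content.partition { |_name, entry| entry.is_a?(Directory) }
      serialize_entries(files) + serialize_entries(dirs)
    end

    private

    # Each entry becomes "<name>:<length>:<serialized entry>"; for a directory
    # entry, entry.serialize is the recursive call.
    def serialize_entries(entries)
      body = entries.map do |name, entry|
        data = entry.serialize
        "#{name}:#{data.length}:#{data}"
      end
      "#{entries.size}:#{body.join}"
    end
  end
end
```

Building a directory with the files README ("Hello world!") and spec.rb ("describe RBFS") plus an empty sub-directory rbfs and calling serialize on it should reproduce the first example string from the question.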

Related

Ruby file renamer

This is a text file renamer I made: you put files in a certain folder and the program renames them to file1.txt, file2.txt, etc.
It gets the job done, but it has two problems:
It gives me this error: no implicit conversion of nil into String
If I add new files to a folder that already contains renamed files, they are all deleted and a new file is created.
What's causing these problems?
i=0
Dir.chdir 'C:\Users\anon\Desktop\newfolder'
arr = Dir.entries('C:\Users\anon\Desktop\newfolder')
for i in 2..arr.count
if (File.basename(arr[i]) == 'file'+((i-1).to_s)+'.txt')
puts (arr[i]+' is already renamed to '+'file'+i.to_s)
else
File.rename(arr[i],'file'+((i-1).to_s)+'.txt')
end
end
There are two main problems in your program.
The first is that you are indexing past the end of the array arr. Try a = [1,2,3]; a[a.count] and you will get nil, because that asks for a[3] while the last element has index 2. Your range 2..arr.count includes arr.count, so the last iteration passes nil to File.basename, which raises the "no implicit conversion of nil into String" error.
The second is that you always number the new names fileINDEX.txt from 1 upward, without taking into account that some of those names may already be in use in the directory. Renaming a file onto a name that already exists replaces the existing file, which is why files disappear.
An extra problem: you are using Dir.entries, which on my OS returns the regular entries plus . and .., and those are not something you want to rename, so they need to be filtered out.
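The first and last points can be seen in a small experiment. This is only an illustrative sketch: listing_demo is a made-up helper that works in a temporary directory, and Dir.children needs Ruby 2.5 or newer.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns [Dir.entries, Dir.children] for a scratch directory holding two files.
def listing_demo
  Dir.mktmpdir do |dir|
    FileUtils.touch(File.join(dir, "a.txt"))
    FileUtils.touch(File.join(dir, "b.txt"))
    [Dir.entries(dir).sort, Dir.children(dir).sort]
  end
end

entries, children = listing_demo
# entries contains "." and ".." in addition to the two files; children does not.

# An inclusive range up to entries.count reads one slot past the end (nil).
# Note that "." and ".." only sit at indexes 0 and 1 here because of the sort.
inclusive = (2..entries.count).map { |i| entries[i] }
# The exclusive form (three dots) stops at the last valid index.
exclusive = (2...entries.count).map { |i| entries[i] }
```

Iterating over Dir.children (or a glob) directly avoids both the index arithmetic and the need to skip the two special entries.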
So I wrote you a little script. I hope you find it readable; it works for me, and you can surely improve it. (P.S. I am on Linux.)
# Global var only to stress its importance
$dir = "/home/p/tmp/t1"
Dir.chdir($dir)
# get list of files
fnames = Dir.glob "*"
# get the max index "fileINDEX.txt" already used in the directory
takenIndexes = []
fnames.each do |f|
if f.match(/^file(\d+)\.txt$/) then takenIndexes.push $1.to_i; end
end
# get the first free index available
firstFreeIndex = 1
firstFreeIndex = (takenIndexes.max + 1) if takenIndexes.length > 0
# get a range of fresh indexes for possible use
idxs = firstFreeIndex..(firstFreeIndex + (fnames.length))
# i transform the range to list and reverse the order because i want
# to use "pop" to get and remove them.
idxs = idxs.to_a
idxs.reverse!
# rename the files needing to be renamed
puts "--- Renamed files ----"
fnames.each do |f|
# if file has already the wanted format then move to next iteration
next if f.match(/^file\d+\.txt$/)
newName = "file" + idxs.pop.to_s + ".txt"
puts "rename: #{f} ---> #{newName}"
File.rename(f, newName)
end

Ruby script which can replace a string in a binary file to a different, but same length string?

I would like to write a Ruby script (repl.rb) which can replace a string in a binary file (string is defined by a regex) to a different, but same length string.
It works like a filter, outputs to STDOUT, which can be redirected (ruby repl.rb data.bin > data2.bin), regex and replacement can be hardcoded. My approach is:
#!/usr/bin/ruby
fn = ARGV[0]
regex = /\-\-[0-9a-z]{32,32}\-\-/
replacement = "--0ca2765b4fd186d6fc7c0ce385f0e9d9--"
blk_size = 1024
File.open(fn, "rb") {|f|
while not f.eof?
data = f.read(blk_size)
data.gsub!(regex, replacement)
print data
end
}
My problem arises when the string straddles a block boundary. For example, with blk_size=1024, if the first occurrence of the string begins at byte position 1000, it is split across two reads, so I will not find it in the "data" variable in either read cycle. Should I process the whole file two times with different block sizes to avoid this worst-case scenario, or is there any other approach?
I would posit that a tool like sed might be a better choice for this. That said, here's an idea: Read block 1 and block 2 and join them into a single string, then perform the replacement on the combined string. Split them apart again and print block 1. Then read block 3 and join block 2 and 3 and perform the replacement as above. Split them again and print block 2. Repeat until the end of the file. I haven't tested it, but it ought to look something like this:
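File.open(fn, "rb") do |f|
  # Prime the window with the first block so every later pass sees two blocks
  this_block = f.read(blk_size).to_s
  until f.eof?
    # Join the pending block with the next one and replace across the seam
    data = (this_block + f.read(blk_size)).gsub(regex, replacement)
    # The replacement has the same length, so the first blk_size bytes are final
    last_block, this_block = data.slice!(0, blk_size), data
    print last_block
  end
  # Covers a file that fits in a single block
  print this_block.gsub(regex, replacement)
end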
There's probably a nontrivial performance penalty for doing it this way, but it could be acceptable depending on your use case.
Maybe a cheeky
f.pos = f.pos - replacement.size
at the end of the while loop, just before reading the next chunk.
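A variation on the same idea, sketched here without seeking: hold back the last replacement.bytesize - 1 bytes of each pass and prepend them to the next chunk. The method name replace_stream and its parameters are made up for this example, and it assumes every match is exactly as long as the replacement, as in the question. One caveat: when matches sit back to back and share their "--" delimiters, the output may differ from a gsub over the whole file.

```ruby
require 'stringio'

# Copies input to output, replacing regex matches that may straddle chunk
# boundaries. Assumes each match is exactly replacement.bytesize bytes long.
def replace_stream(input, output, regex, replacement, blk_size: 1024)
  overlap = replacement.bytesize - 1 # longest possible partial match
  carry = "".b
  while (chunk = input.read(blk_size))
    data = (carry + chunk).gsub(regex, replacement)
    keep = [overlap, data.bytesize].min
    # Everything before the held-back tail can no longer change.
    output.write(data.byteslice(0, data.bytesize - keep))
    carry = data.byteslice(data.bytesize - keep, keep)
  end
  output.write(carry)
end
```

Because read returns nil at end of file, the loop ends without the extra eof? check, and it works the same on a File opened with "rb" as on a StringIO.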

Ruby: How do you search for a substring, and increment a value within it?

I am trying to change a file by finding this string:
<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]>
and replacing {CLONEINCR} with an incrementing number. Here's what I have so far:
file = File.open('input3400.txt' , 'rb')
contents = file.read.lines.to_a
contents.each_index do |i|contents.join["<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>"] = "<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>" end
file.close
But this seems to go on forever - do I have an infinite loop somewhere?
Note: my text file is 533,952 lines long.
You are repeatedly concatenating all the elements of contents, making a substitution, and throwing away the result. This is happening once for each line, so no wonder it is taking a long time.
The easiest solution would be to read the entire file into a single string and use gsub on that to modify the contents. In your example you are inserting the (zero-based) file line numbers into the CDATA. I suspect this is a mistake.
This code replaces all occurrences of <![CDATA[{CLONEINCR}]]> with <![CDATA[1]]>, <![CDATA[2]]> etc. with the number incrementing for each matching CDATA found. The modified file is sent to STDOUT. Hopefully that is what you need.
File.open('input3400.txt' , 'r') do |f|
i = 0
contents = f.read.gsub('<![CDATA[{CLONEINCR}]]>') { |m|
m.sub('{CLONEINCR}', (i += 1).to_s)
}
puts contents
end
If what you want is to replace CLONEINCR with the line number, which is what your above code looks like it's trying to do, then this will work. Otherwise see Borodin's answer.
output = File.readlines('input3400.txt').map.with_index do |line, i|
line.gsub "<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>",
"<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>"
end
File.write('input3400.txt', output.join(''))
Also, you should be aware that when you read the lines into contents, you are creating a String distinct from the file. You can't operate on the file directly. Instead you have to create a new String that contains what you want and then overwrite the original file.

Using a Hash of function names, search file and return with line numbers

Given a hash that contains function names like "find_by_user", "find_by_id", ...
I want to search in a directory of files, and return an object that has each file name, along with the line numbers where the function name occurred.
I have this so far:
files = Dir.glob(@folder_path)
files.each do |file_name|
content = File.read(file_name)
end
This will be scanning a few hundred files.
Here's the basic functionality you need:
# Given a path to a file and a regex,
# return an array of paired filename+line number matches
def matching_lines( file_path, regex )
  name = File.basename(file_path)
  File.readlines(file_path)
      .map.with_index(1){ |line,i| [name,line,i] } # number lines from 1, as grep does
      .select{ |_name,line,_i| line =~ regex }
      .map{ |file,_line,i| [file,i] }
end
You can choose to use this as you like, iterating over multiple files and/or patterns, or using Regexp.union to create a pattern matching any one of a set of strings.
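As one possible way to combine those pieces, the sketch below builds a single pattern with Regexp.union and collects the matching line numbers per file. The helper name find_function_lines is made up for this example, the function names are assumed to arrive as an array of strings (for instance the hash's keys), and line numbers are counted from 1.

```ruby
require 'tmpdir'

# Hypothetical helper: maps each file path to the 1-based numbers of the
# lines that mention any of the given function names.
def find_function_lines(paths, function_names)
  pattern = Regexp.union(function_names)
  paths.each_with_object({}) do |path, found|
    line_numbers = File.foreach(path).with_index(1)
                       .select { |line, _number| line =~ pattern }
                       .map { |_line, number| number }
    # Leave out files that never mention any of the names.
    found[path] = line_numbers unless line_numbers.empty?
  end
end
```

It could be called as find_function_lines(Dir.glob("*.rb"), names), where names is whatever list of function names you have. File.foreach reads one line at a time, which keeps memory use flat across a few hundred files.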
However: this is what grep was made for:
C:\>grep --line-number Nokogiri *.rb
push_nav_to_docs.rb:13: nav_dom = Nokogiri.XML(IO.read(NAV))
push_nav_to_docs.rb:39: landing = Nokogiri.XML(html)
push_nav_to_docs.rb:53: doc = Nokogiri.XML(IO.read(doc_path))
push_nav_to_docs.rb:73: if File.exists?(toc_path) && toc = Nokogiri.XML(IO.read(toc_path)).at('ul')
push_nav_to_docs.rb:104: container << Nokogiri.make("<ul/>").tap do |ul|
In Ruby you could call this code and get the output you want via:
lookfor = "Nokogiri"
grepped = `grep --line-number #{lookfor} *.rb`
results = grepped.scan(/^(.+?):(\d+)/)
#=> [["push_nav_to_docs.rb", "13"], ["push_nav_to_docs.rb", "39"], ["push_nav_to_docs.rb", "53"], ["push_nav_to_docs.rb", "73"], ["push_nav_to_docs.rb", "104"]]
Grep can also recurse into directories, match only particular file names, take regular expressions as patterns, and more.

Using Ruby to automate a large directory system

So I have the following little script to make a file setup for organizing reports that we get.
#This script is to create a file structure for our survey data
require 'fileutils'
f = File.open('CustomerList.txt') or die "Unable to open file..."
a = f.readlines
x = 0
while a[x] != nil
Customer = a[x]
FileUtils.mkdir_p(Customer + "/foo/bar/orders")
FileUtils.mkdir_p(Customer + "/foo/bar/employees")
FileUtils.mkdir_p(Customer + "/foo/bar/comments")
x += 1
end
Everything seems to work before the while, but I keep getting:
'mkdir': Invalid argument - Cust001_JohnJacobSmith(JJS) (Errno::EINVAL)
Which would be the first line from the CustomerList.txt. Do I need to do something to the array entry to be considered a string? Am I mismatching variable types or something?
Thanks in advance.
The following worked for me:
require 'fileutils'
IO.foreach('CustomerList.txt') do |customer|
  customer.chomp! # strip the trailing newline in place
  ["orders", "employees", "comments"].each do |dir|
    FileUtils.mkdir_p("#{customer}/foo/bar/#{dir}")
  end
end
with data like so:
$ cat CustomerList.txt
Cust001_JohnJacobSmith(JJS)
Cust003_JohnJacobSmith(JJS)
Cust002_JohnJacobSmith(JJS)
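A few things to make it more like the Ruby way:
Use blocks when opening a file or iterating through arrays; that way you don't need to worry about closing the file or indexing into the array yourself.
As noted by @inger, local variable names start with a lower-case letter, hence customer. A capitalized name such as Customer is a constant.
When you want the value of a variable in a string, using #{} is more idiomatic than concatenating with +.
Also note that we took off the trailing newline using chomp! (which changes the variable in place, as signalled by the trailing ! on the method name). Each line read from the file still ends with its newline, and that newline in the directory name is most likely what made mkdir fail with "Invalid argument".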

Resources