Using Ruby to automate a large directory system

I have the following small script that creates a directory structure for organizing the reports we receive.
#This script is to create a file structure for our survey data
require 'fileutils'
f = File.open('CustomerList.txt') or die "Unable to open file..."
a = f.readlines
x = 0
while a[x] != nil
  Customer = a[x]
  FileUtils.mkdir_p(Customer + "/foo/bar/orders")
  FileUtils.mkdir_p(Customer + "/foo/bar/employees")
  FileUtils.mkdir_p(Customer + "/foo/bar/comments")
  x += 1
end
Everything seems to work before the while, but I keep getting:
'mkdir': Invalid argument - Cust001_JohnJacobSmith(JJS) (Errno::EINVAL)
Which would be the first line from CustomerList.txt. Do I need to do something to the array entry for it to be treated as a string? Am I mismatching variable types or something?
Thanks in advance.

The following worked for me:
require 'fileutils'
IO.foreach('CustomerList.txt') do |customer|
  customer.chomp!
  ["orders", "employees", "comments"].each do |dir|
    FileUtils.mkdir_p("#{customer}/foo/bar/#{dir}")
  end
end
with data like so:
$ cat CustomerList.txt
Cust001_JohnJacobSmith(JJS)
Cust003_JohnJacobSmith(JJS)
Cust002_JohnJacobSmith(JJS)
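A few things to make it more like the Ruby way:
Use blocks when opening a file or iterating through arrays; that way you don't need to worry about closing the file or indexing into the array directly.
As noted by #inger, local variables start with a lowercase letter (customer). A capitalized name such as Customer is a constant, so reassigning it on every pass through the loop triggers an "already initialized constant" warning.
When you want the value of a variable in a string, using #{} interpolation is more idiomatic than concatenating with +.
Also note that we removed the trailing newline with chomp! (which modifies the string in place, as signalled by the ! at the end of the method name). readlines keeps the "\n" at the end of each line, and that newline inside the directory name is the most likely cause of the Errno::EINVAL.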
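To see the fix end to end, here is a minimal sketch that builds the same customer/foo/bar/{orders,employees,comments} tree. It assumes Ruby 2.5+ (for the chomp: keyword on File.foreach and base: on Dir.glob), and it works inside a throwaway temp directory with a made-up two-line CustomerList.txt so it can run anywhere; build_tree is a hypothetical helper name, not part of the answer.

```ruby
require 'fileutils'
require 'tmpdir'

# Subfolder names come from the question; everything else is illustrative.
SUBDIRS = %w[orders employees comments].freeze

# Create <root>/<customer>/foo/bar/<subdir> for every non-blank line of list_path.
def build_tree(list_path, root)
  # chomp: true drops the trailing "\n" that readlines would otherwise keep,
  # so the newline never ends up inside a directory name.
  File.foreach(list_path, chomp: true) do |customer|
    next if customer.strip.empty? # skip blank lines
    SUBDIRS.each do |dir|
      FileUtils.mkdir_p(File.join(root, customer, 'foo', 'bar', dir))
    end
  end
end

created = Dir.mktmpdir do |root|
  list = File.join(root, 'CustomerList.txt')
  File.write(list, "Cust001_JohnJacobSmith(JJS)\nCust002_JohnJacobSmith(JJS)\n")
  build_tree(list, root)
  # List what was created, relative to root.
  Dir.glob('*/foo/bar/*', base: root).sort
end

puts created
```

File.join is used instead of string concatenation so the separator is handled for you; the temp directory is removed automatically when the Dir.mktmpdir block returns.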

Related

Ruby file renamer

This is a text file renamer I made: you put files in a certain folder and the program renames them to file1.txt, file2.txt, etc.
It gets the job done, but it has two problems:
it raises a no implicit conversion of nil into String error;
if I add new files to a folder that already contains renamed files, they're all deleted and a new file is created.
What's causing these problems?
i=0
Dir.chdir 'C:\Users\anon\Desktop\newfolder'
arr = Dir.entries('C:\Users\anon\Desktop\newfolder')
for i in 2..arr.count
  if (File.basename(arr[i]) == 'file'+((i-1).to_s)+'.txt')
    puts (arr[i]+' is already renamed to '+'file'+i.to_s)
  else
    File.rename(arr[i],'file'+((i-1).to_s)+'.txt')
  end
end
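There are two main problems in your program.
The first is that you are indexing past the end of arr. 2..arr.count is an inclusive range, so the last iteration reads arr[arr.count], which is nil. Try a = [1,2,3]; a[a.count] and you will get nil, because you are trying to access a[3] while the last element in the array has index 2. Passing that nil to File.basename is what raises "no implicit conversion of nil into String".
Second, the index in each new fileINDEX.txt name is derived only from the loop position (i-1), without taking into account that some indexes may already be used in your directory. File.rename can then overwrite a file that already has the target name, which would explain the files that disappear.
An extra problem: Dir.entries returns . and .. in addition to the regular entries, and they are not what you want to manipulate. Starting the loop at index 2 assumes they are always listed first, which is not guaranteed.
So I wrote you a little script; I hope you find it readable, and it works for me. You can certainly improve it! (P.S. I am on Linux.)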
# Global var only to stress its importance
$dir = "/home/p/tmp/t1"
Dir.chdir($dir)
# get list of regular files (glob "*" skips . and .., but subdirectories must be filtered out)
fnames = Dir.glob("*").select { |f| File.file?(f) }
# collect the indexes of "fileINDEX.txt" names already used in the directory
takenIndexes = []
fnames.each do |f|
  # \. matches a literal dot; \A and \z anchor the whole name
  takenIndexes.push($1.to_i) if f =~ /\Afile(\d+)\.txt\z/
end
# get the first free index available (one past the highest taken index)
firstFreeIndex = 1
firstFreeIndex = takenIndexes.max + 1 if takenIndexes.length > 0
# get a range of fresh indexes for possible use
idxs = firstFreeIndex..(firstFreeIndex + fnames.length)
# I transform the range to a list and reverse the order because I want
# to use "pop" to get and remove them, smallest first.
idxs = idxs.to_a
idxs.reverse!
# rename the files needing to be renamed
puts "--- Renamed files ----"
fnames.each do |f|
  # if the file already has the wanted format, move to the next iteration
  next if f =~ /\Afile\d+\.txt\z/
  newName = "file" + idxs.pop.to_s + ".txt"
  puts "rename: #{f} ---> #{newName}"
  File.rename(f, newName)
end
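The renaming logic is easier to check if it is separated from the disk operations. Here is a sketch of the same steps as a pure function that only returns the planned [old, new] pairs; rename_plan and PATTERN are hypothetical names, and sorting the names first is an assumption added here so the numbering is deterministic (the script renames in glob order).

```ruby
# Names of the form fileN.txt, matched strictly (literal dot, whole name).
PATTERN = /\Afile(\d+)\.txt\z/

# Return [old_name, new_name] pairs without touching the disk.
def rename_plan(fnames)
  # indexes already used by fileN.txt names
  taken = fnames.map { |f| f[PATTERN, 1] }.compact.map(&:to_i)
  first_free = taken.empty? ? 1 : taken.max + 1
  fnames.reject { |f| f.match?(PATTERN) }
        .sort
        .map.with_index(first_free) { |f, i| [f, "file#{i}.txt"] }
end

plan = rename_plan(%w[notes.txt file1.txt report.txt file2.txt])
plan.each { |old_name, new_name| puts "rename: #{old_name} ---> #{new_name}" }
```

Because new indexes start above the highest one already taken, an existing fileN.txt is never a rename target, so nothing gets overwritten. Applying the plan is then just plan.each { |o, n| File.rename(o, n) }.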

Ruby: How do you search for a substring, and increment a value within it?

I am trying to change a file by finding this string:
<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]>
and replacing {CLONEINCR} with an incrementing number. Here's what I have so far:
file = File.open('input3400.txt' , 'rb')
contents = file.read.lines.to_a
contents.each_index do |i|contents.join["<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>"] = "<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>" end
file.close
But this seems to go on forever - do I have an infinite loop somewhere?
Note: my text file is 533,952 lines long.
You are repeatedly concatenating all the elements of contents, making a substitution, and throwing away the result. This happens once for each of the 533,952 lines, so the total work grows with the square of the file size. It is not an infinite loop; it is just extremely slow.
The easiest solution would be to read the entire file into a single string and use gsub on it to modify the contents. In your example you are inserting the (zero-based) line numbers of the file into the CDATA; I suspect this is a mistake.
This code replaces all occurrences of <![CDATA[{CLONEINCR}]]> with <![CDATA[1]]>, <![CDATA[2]]> etc. with the number incrementing for each matching CDATA found. The modified file is sent to STDOUT. Hopefully that is what you need.
File.open('input3400.txt', 'r') do |f|
  i = 0
  contents = f.read.gsub('<![CDATA[{CLONEINCR}]]>') { |m|
    m.sub('{CLONEINCR}', (i += 1).to_s)
  }
  puts contents
end
If what you want is to replace CLONEINCR with the line number, which is what your above code looks like it's trying to do, then this will work. Otherwise see Borodin's answer.
output = File.readlines('input3400.txt').map.with_index do |line, i|
  line.gsub "<aspect name=\"lineNumber\"><![CDATA[{CLONEINCR}]]></aspect>",
            "<aspect name=\"lineNumber\"><![CDATA[#{i}]]></aspect>"
end
File.write('input3400.txt', output.join(''))
Also, be aware that when you read the lines into contents, you get Strings in memory that are independent of the file; changing them does not change the file. You can't edit the file directly this way. Instead, you have to build a new String that contains what you want and then overwrite the original file.
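For a file of this size, the incrementing counter from Borodin's answer can also be applied while streaming line by line into a new file, so the whole text never has to sit in memory. A sketch under these assumptions: number_clones is a hypothetical helper, the placeholder never spans two lines, and the demo runs on a made-up two-placeholder sample in a temp directory.

```ruby
require 'tmpdir'

PLACEHOLDER = '<![CDATA[{CLONEINCR}]]>'

# Stream src line by line, numbering each placeholder 1, 2, 3, ... and writing
# the result to dst. Returns how many placeholders were replaced.
def number_clones(src, dst)
  counter = 0
  File.open(dst, 'w') do |out|
    File.foreach(src) do |line|
      # A String pattern is matched literally, so the brackets need no escaping.
      out.write(line.gsub(PLACEHOLDER) { "<![CDATA[#{counter += 1}]]>" })
    end
  end
  counter
end

result = Dir.mktmpdir do |dir|
  src = File.join(dir, 'input3400.txt')
  dst = File.join(dir, 'output.txt')
  line = %(<aspect name="lineNumber">#{PLACEHOLDER}</aspect>\n)
  File.write(src, line + "other text\n" + line)
  [number_clones(src, dst), File.read(dst)]
end

puts result[1]
```

Writing to a separate file keeps the original intact until you have checked the output; File.rename(dst, src) then replaces it.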

Extract a single line string having "foo: XXXX"

I have a file with one or more key:value lines, and I want to pull a key:value out if key=foo. How can I do this?
I can get as far as this:
if File.exist?('/file_name')
content = open('/file_name').grep(/foo:??/)
I am unsure about the grep portion, and also once I get the content, how do I extract the value?
People like to slurp the files into memory, which, if the file will always be small, is a reasonable solution. However, slurping isn't scalable, and the practice can lead to excessive CPU and I/O waits as content is read.
Instead, because you could have multiple hits in a file, and you're comparing the content line-by-line, read it line-by-line. Line I/O is very fast and avoids the scalability problems. Ruby's File.foreach is the way to go:
File.foreach('path/to/file') do |li|
  puts $1 if li[/foo:\s*(\w+)/]
end
Because there are no samples of actual key/value pairs, we're shooting in the dark for valid regex patterns, but this is the basis for how I'd solve the problem.
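Building on that, here is a sketch of the line-by-line approach as a reusable helper that stops at the first hit. value_for is a hypothetical name, and it assumes lines of the form key: value where the value runs to the end of the line; the sample file contents are made up.

```ruby
require 'tempfile'

# Return the value of the first "key: value" line in path, or nil if none.
# Only one line is in memory at a time, and reading stops at the first match.
def value_for(path, key)
  # Anchored so "foobar: 1" is not mistaken for "foo"; surrounding
  # whitespace around the value is trimmed.
  pattern = /\A\s*#{Regexp.escape(key)}\s*:\s*(.*?)\s*\z/
  File.foreach(path) do |line|
    m = pattern.match(line)
    return m[1] if m
  end
  nil
end

found, missing = Tempfile.create('kv') do |f|
  f.write("foobar: 1\nfoo: XXXX\nfoo: second\n")
  f.flush
  [value_for(f.path, 'foo'), value_for(f.path, 'baz')]
end

puts found.inspect   # prints "XXXX"
puts missing.inspect # prints nil
```

Returning from inside the File.foreach block is safe: the file is still closed when the method exits early.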
Try this:
IO.readlines('key_values.txt').find_all{|line| line.match('key1')}
I would recommend reading the file into an array and selecting only the lines you need:
regex = /\A\s?key\s?:/
results = File.readlines('file').inject([]) do |f, l|
  l =~ regex ? f << "key = %s" % l.sub(regex, '').chomp : f
end
This detects lines starting with key: and adds them to results as key = value,
where value is the portion after key: (with the trailing newline removed).
So if you have a file like this:
key:1
foo
key:2
bar
key:3
you'll get results like this:
key = 1
key = 2
key = 3
Makes sense?
value = File.read('/file_name').match("key:(.*)").captures[0] rescue nil
File.read('file_name')[/foo: (.*)/, 1]
#=> XXXX

Ruby - detecting the end of the read file

I upload a file through a form and read it in the controller. My problem is that I don't know how to detect the end of the file (i.e. when to stop the loop). This part of the code looks like this:
dat = params[:data]
while(d = dat.read)
  puts d
  break if d.eof #this doesn't work
end
Apart from the error about eof, the result is that this part loops forever.
From http://ruby-doc.org/core-1.9.3/IO.html#method-i-read:
If length is omitted or is nil, it reads until EOF and the encoding conversion is applied. It returns a string even if EOF is met at beginning.
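So read with no length already reads everything up to EOF in one call, and once EOF is reached it returns "" rather than nil. Since "" is truthy in Ruby, while(d = dat.read) never becomes false, which is why your loop never ends. (The eof error comes from d being a String, which has no eof method.) So I guess you should just call dat.read once.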
Edit: if you want all the lines of the file, use dat.readlines - this will return an Array of Strings
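If the upload has to be processed in pieces rather than all at once, IO#read with a positive length returns nil once EOF is reached, so it can drive a while loop. A sketch in which StringIO stands in for params[:data] (an assumption; each_chunk is a hypothetical helper, and the 4-byte chunk size is arbitrary, chosen small only to make the demo visible).

```ruby
require 'stringio'

# Yield an IO-like object's content in fixed-size chunks. read(size) returns
# nil at EOF, so the loop ends; read with no argument would return "" at EOF,
# which is truthy and would keep a while loop going forever.
def each_chunk(io, size = 4)
  while (chunk = io.read(size))
    yield chunk
  end
end

upload = StringIO.new("line one\nline two\n")
chunks = []
each_chunk(upload) { |c| chunks << c }
p chunks.length  # prints 5
p upload.read    # prints "" (already at EOF)
p upload.read(4) # prints nil
```

In practice a much larger chunk size would be used; the point is only that read(length) gives the loop a nil to stop on.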

Ruby MatchData class is repeating captures, instead of including additional captures as it "should"

Ruby 1.9.1, OSX 10.5.8
I'm trying to write a simple app that parses a bunch of Java-based HTML template files and replaces a period (.) with an underscore when it is contained within a specific tag. I use Ruby all the time for these kinds of utility apps, and thought it would be no problem to whip something up using Ruby's regex support. So I create a Regexp.new... object, open a file, read it in line by line, and match each line against the pattern. If I get a match, I create a new string using replaceString = currentMatch.gsub(/\./, '_'), then build a replacement regex for the whole matched string with newReplaceRegex = Regexp.escape(currentMatch), and finally replace it back into the current line with line.gsub(newReplaceRegex, replaceString). Code below, of course, but first...
The problem I'm having is that when accessing the indexes of the returned MatchData object, I get the first result twice, and the second substring it should otherwise be finding is missing. Stranger still, when I test this same pattern against the same test text on rubular.com, it works as expected. See results here
My pattern:
(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))
Test text:
<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText
Here's the relevant code:
tagRegex = Regexp.new('(<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>))+')
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each{|htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  tagMatch = tagRegex.match(htmlLine)
  if(tagMatch)
    matchesArray = tagMatch.to_a
    firstMatch = matchesArray[0]
    secondMatch = matchesArray[1]
    puts "First match: #{firstMatch} and second match #{secondMatch}"
    tagMatch.captures.each {|lineMatchCapture|
      puts "Current capture for tagMatches: #{lineMatchCapture} of total match count #{matchesArray.size}"
      #create a new regex using the match results; make sure to use auto escape method
      originalPatternString = Regexp.escape(lineMatchCapture)
      replacementRegex = Regexp.new(originalPatternString)
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = lineMatchCapture.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    }
  end
}
I would think that I should get the first tag in matchData[0] and then the second tag in matchData[1], or rather, since I don't know how many matches I'll get within any given line, what I'm really doing is matchData.to_a.each. In this case matchData has two captures, but they're both the first tag match,
which is: <WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>
So, what the heck am I doing wrong, why does rubular test give me the expected results?
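You want to use String#scan instead of Regexp#match. match stops at the first match, and the MatchData it returns holds the whole match in [0] and capture group 1 in [1]; in your pattern the group wraps the entire tag, so both show the first tag. scan returns every match in the string: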
tag_regex = /<(?:WEBOBJECT|webobject) (?:NAME|name)=(?:[a-zA-Z0-9]+\.)+(?:[a-zA-Z0-9]+)(?:>)/
lines = "<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>moreNonMatchingText\
<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText"
lines.scan(tag_regex)
# => ["<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>", "<WEBOBJECT NAME=admin.SecondLineMatch>"]
A few recommendations for your next Ruby questions:
newlines and spaces are your friends; you don't lose points for using more lines in your code ;-)
use do...end for multi-line blocks instead of {}; it improves readability a lot
declare variables in snake case (hello_world) instead of camel case (helloWorld)
Hope this helps
I ended up using the String#scan approach. The only tricky point was figuring out that, because my pattern has a capture group, it returns an array of arrays rather than a MatchData object, so there was some initial confusion on my part, mostly due to my Ruby green-ness, but it's working as expected now. Also, I trimmed the regex per Trevoke's suggestion. But snake case? Never...;-) Anyway, here goes:
tagRegex = /(<(?:webobject) (?:name)=(?:\w+\.)+(?:\w+)(?:>))/i
testFile = File.open('RegexTestingCompFix.txt', "r+")
lineCount=0
testFile.each do |htmlLine|
  lineCount += 1
  puts ("Current line: #{htmlLine} at line num: #{lineCount}")
  oldMatches = htmlLine.scan(tagRegex) #oldMatches thusly named due to not explicitly using Regexp or MatchData, as in "the old way..."
  if(oldMatches.size > 0)
    oldMatches.each_index do |index|
      arrayMatch = oldMatches[index]
      aMatch = arrayMatch[0]
      #create a new regex using the match results; make sure to use auto escape method
      replacementRegex = Regexp.new(Regexp.escape(aMatch))
      #replace any periods with underscores in a copy of lineMatchCapture
      periodToUnderscoreCorrection = aMatch.gsub(/\./, '_')
      #replace original match with underscore replaced copy within line, matching against the new escaped literal regex
      htmlLine.gsub!(replacementRegex, periodToUnderscoreCorrection)
      puts "The modified htmlLine is now: #{htmlLine}"
    end # I kind of still prefer the brackets...;-)
  end
end
Now, why does MatchData work the way it does? Its behavior seems like a bug really, and certainly not very useful in general if you can't get it to provide a simple means of accessing all the matches. Just my $.02
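A MatchData object describes a single match, which is why it never holds the second tag. The sketch below shows that behaviour on the question's test text, the array-of-arrays shape scan returns when the pattern has a capture group, and one way to get a MatchData for every match. The pattern is a simplified /i form of the asker's final regex; the to_enum(:scan, ...) plus Regexp.last_match combination is a commonly used idiom, not something from the answers here, and the gsub-with-block rewrite is a simplification swapped in for the escape-and-gsub! steps.

```ruby
# The question's test text: one string containing two tags.
line = '<WEBOBJECT NAME=admin.normalMode.someOtherPatternWeDontWant.moreThatWeDontWant>' \
       'moreNonMatchingText<WEBOBJECT NAME=admin.SecondLineMatch>AndEvenMoreNonMatchingText'

# One capture group wrapping the whole tag, case-insensitive.
tag_regex = /(<webobject name=(?:\w+\.)+\w+>)/i

# Regexp#match stops at the first match; m[0] (whole match) and m[1]
# (capture group 1) hold the same text, so the first tag shows up twice.
m = tag_regex.match(line)
puts m[0] == m[1] # prints true

# With a capture group, String#scan returns one array per match.
grouped = line.scan(tag_regex)
p grouped.length # prints 2

# One MatchData per match: run scan through an enumerator and read
# Regexp.last_match inside the block.
all = line.to_enum(:scan, tag_regex).map { Regexp.last_match }
all.each { |md| puts "#{md[1]} at offset #{md.begin(0)}" }

# The dot-to-underscore rewrite can then be a single gsub with a block.
fixed = line.gsub(tag_regex) { |tag| tag.tr('.', '_') }
puts fixed
```

The gsub-with-block form avoids building an escaped regex for each match, since the block receives exactly the text that was matched.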
Small bits:
This regexp gets you "normalMode", but not "SecondLineMatch":
<webobject name=\w+\.((?:\w+)).+> (with option 'i', for "case insensitive")
This regexp gets you "SecondLineMatch", but not "normalMode":
<webobject name=\w+\.((?:\w+))> (with option 'i', for "case insensitive").
I'm not really good at regexps, but I'll keep toiling at it. :)
And I don't know if this helps you at all, but here's a way to get both:
<webobject name=admin.(\w+) (with option 'i').
