Replace text String from the shell disabling any regular expression - ruby

I need to replace a large set of broken HTML links in a file. For that, I need a find/replace with regular expressions disabled, i.e. the kind of basic Find/Replace you would do in Notepad.
I came across a Ruby script which should do exactly that:
ruby -p -i -e "gsub('Home', 'NEWLINK')" test.txt
However, the file test.txt is not changed, and no output is returned. (I don't know much about Ruby, so I might just be missing something obvious.)
Is there any other tool which does what I need?
Edit: I'd expect that the following test.txt file:
Home
....is changed to:
NEWLINK
Thanks

Instead of a regular expression, consider using an HTML parser, which actually understands HTML and won't leave you with a broken HTML document.
# link_parser.rb
require 'bundler/inline'

gemfile do
  source 'https://rubygems.org'
  gem 'nokogiri'
end

fn = ARGV[0]
if fn && File.exist?(fn)
  puts "Processing #{fn}..."
  # Parse the whole file first, then write it back in one go
  doc = Nokogiri::HTML(File.read(fn))
  links = doc.css('a[href="index.php?option=com_content&view=article&id=130&catid=111&Itemid=324"]')
  if links.any?
    links.each do |link|
      # Attributes are set with hash-style access on the node
      link['href'] = "NEWLINK"
    end
    File.write(fn, doc.to_html)
    puts "#{links.length} links replaced"
  else
    puts "No links found"
  end
else
  puts "File not found."
end
ruby link_parser.rb path/to/file.html
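If a plain, literal find/replace is really all that is needed, here is a minimal sketch using only the standard library. It relies on the fact that String#gsub treats a String pattern (as opposed to a Regexp) literally, and it uses the block form so that backslashes in the replacement are not interpreted either. The helper name replace_literal and the sample link are made up for illustration.

```ruby
require 'tmpdir'

# Replace every literal occurrence of `old` with `new` in the file at `path`.
# Returns the number of replacements made.
def replace_literal(path, old, new)
  text = File.read(path)
  count = 0
  # String pattern => matched literally; block form => replacement used as-is.
  updated = text.gsub(old) { count += 1; new }
  File.write(path, updated) if count > 0
  count
end

# Demo on a throwaway file (hypothetical link text).
Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.txt')
  File.write(path, %(<a href="index.php?id=130&catid=111">Home</a>\n))
  replaced = replace_literal(path, 'index.php?id=130&catid=111', 'NEWLINK')
  puts "#{replaced} replacement(s): #{File.read(path)}"
end
```

Characters such as ?, & and . in the search string need no escaping here, because no regular expression is involved.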

Related

Zip::ZipFile: How to modify contents of inner textfiles without unpacking zip?

Cheers,
as a beginner to ruby, I am currently in the process of solving my smaller-world problems with ruby, to get accustomed to it. Right now I am trying to modify the contents of a text file within a zip container.
The structure is
ZIP
>> directory/
>> mytext.text
And I am able to iterate over the contents
Zip::ZipFile.open(file_path) do |zipfile|
  files = zipfile.select(&:file?)
  files.each do |zip_entry|
    ## ....?
  end
end
...but I find it very difficult to modify the text file without unpacking it.
Any help appreciated!
So with the help of Ben, here's one solution:
require "rubygems"
require "zip/zip"

zip_file_name = "src/test.zip"
Zip::ZipFile.open(zip_file_name) do |zipfile|
  files = zipfile.select(&:file?)
  files.each do |zip_entry|
    contents = zipfile.read(zip_entry.name)
    zipfile.get_output_stream(zip_entry.name){ |f| f.puts contents + ' added some text' }
  end
  zipfile.commit
end
I thought I had tried this before. Anyway, thanks a lot!
This snippet adds " added some text" to the end of myFile.txt.
Zip::File.open(file_path) do |zipfile|
  contents = zipfile.read('myFile.txt')
  zipfile.get_output_stream('myFile.txt') { |f| f.puts contents + ' added some text' }
end
For some reason, the modifications to the zip file aren't saved if the writing (the call to get_output_stream) is done while using each to iterate over the archive's files.
Edit: To modify files while iterating over them via each, open the archive with Zip::ZipFile.open (see Chris's answer for an example).
Hopefully, this snippet will help point you in the right direction.

Ruby - Reading and editing XML file

I am writing a Ruby (1.9.3) script that reads XML files from a folder and edits them if necessary.
My issue is that I was given XML files converted by Tidy, but its output is a little strange, for example:
<?xml version="1.0" encoding="utf-8"?>
<XML>
<item>
<ID>000001</ID>
<YEAR>2013</YEAR>
<SUPPLIER>Supplier name test,
Coproration</SUPPLIER>
...
As you can see, the <SUPPLIER> element has an extra CRLF. I don't know why Tidy behaves this way, but I am addressing it with a Ruby script. I am having trouble, though, as I need to check whether the last character of a line is ">" or the first is "<" so that I can tell if there is something wrong with the markup.
I have tried:
Dir.glob("C:/testing/corrected/*.xml").each do |file|
  puts file
  File.open(file, 'r+').each_with_index do |line, index|
    first_char = line[0,1]
    if first_char != "<"
      # copy this line to the previous line and delete this one?
    end
  end
end
I also feel like I should copy the original file's content to a temporary file as I read it, and then overwrite the original. Is that the best way? Any tips are welcome, as I do not have much experience in altering a file's content.
Regards
Does that extra \n always appear in the <SUPPLIER> node? As others have suggested, Nokogiri is a great choice for parsing XML (or HTML). You could iterate through each <SUPPLIER> node and remove the \n character, then save the XML as a new file.
require 'nokogiri'

# read and parse the old file
file = File.read("old.xml")
xml = Nokogiri::XML(file)

# replace each line break and the whitespace around it with a single space
xml.xpath("//SUPPLIER").each do |node|
  node.content = node.content.gsub(/\s*\n\s*/, " ")
end

# save the output into a new file
File.open("new.xml", "w") do |f|
  f.write xml.to_xml
end
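For the line-based approach from the question (checking whether a line starts with "<", then writing to a temporary file and overwriting the original), a rough standard-library sketch could look like the following. It assumes that no element legitimately starts its text on a line of its own; the method names and the ".tmp" suffix are hypothetical. For anything less regular than Tidy's output, the parser-based version above is the safer route.

```ruby
# Join any line that does not start with "<" onto the previous line.
def join_broken_lines(text)
  merged = []
  text.each_line do |line|
    line = line.chomp # strips "\n" as well as "\r\n"
    if merged.any? && !line.lstrip.start_with?('<')
      # Continuation of the previous line: glue it back on with one space.
      merged[-1] = "#{merged[-1].rstrip} #{line.lstrip}"
    else
      merged << line
    end
  end
  merged.join("\n") + "\n"
end

# Write the repaired text to a temporary file next to the original,
# then rename it over the original.
def rewrite_file(path)
  tmp_path = "#{path}.tmp"
  File.write(tmp_path, join_broken_lines(File.read(path)))
  File.rename(tmp_path, path)
end

sample = "<item>\n<SUPPLIER>Supplier name test,\r\nCoproration</SUPPLIER>\n</item>\n"
puts join_broken_lines(sample)
```

Writing to a separate file first means the original is left untouched if the script fails halfway through.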

Ruby deleting directories

I'm trying to delete a non-empty directory in Ruby and no matter which way I go about it it refuses to work.
I have tried using FileUtils, system calls, recursively going into the given directory and deleting everything, but always seem to end up with (temporary?) files such as
.__afsECFC
.__afs73B9
Does anyone know why this is happening and how I can get around it?
require 'fileutils'
FileUtils.rm_rf('directorypath/name')
Doesn't this work?
Safe method: FileUtils.remove_dir(somedir)
Realised my error, some of the files hadn't been closed.
Earlier in my program I was using
File.open(filename).read
which I swapped for a
f = File.open(filename, "r")
while line = f.gets
  puts line
end
f.close
And now
FileUtils.rm_rf(dirname)
works flawlessly
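A small sketch of the same idea, for anyone hitting this: the block form of File.open closes the handle for you, so nothing is left open by the time the directory is removed. The helper name each_line_closed and the file names are made up for this example.

```ruby
require 'fileutils'
require 'tmpdir'

# Yield each line of the file. The block form of File.open closes the
# handle when the block exits, even if an exception is raised.
def each_line_closed(path)
  File.open(path, 'r') do |f|
    f.each_line { |line| yield line.chomp }
  end
end

# Demo in a throwaway directory.
dir = Dir.mktmpdir('rm_demo')
file = File.join(dir, 'notes.txt')
File.write(file, "first\nsecond\n")

each_line_closed(file) { |line| puts line }

# No handle is left open at this point, so the tree can be removed.
FileUtils.rm_rf(dir)
puts "removed: #{!Dir.exist?(dir)}"
```

File.read(filename) and File.foreach(filename) also close the file on their own, whereas File.open(filename).read leaves the handle open until it is garbage collected.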
I guess the best way to remove a directory with all its content "without using an additional lib" is using a simple recursive method:
def remove_dir(path)
  if File.directory?(path)
    Dir.foreach(path) do |file|
      if ((file.to_s != ".") and (file.to_s != ".."))
        remove_dir("#{path}/#{file}")
      end
    end
    Dir.delete(path)
  else
    File.delete(path)
  end
end

remove_dir(path)
The built-in pathname gem really improves the ergonomics of working with paths, and it has an #rmtree method that can achieve exactly this:
require "pathname"
path = Pathname.new("~/path/to/folder").expand_path
path.rmtree

How to read an open file in Ruby

I want to be able to read a currently open file. The test.rb is sending its output to test.log which I want to be able to read and ultimately send via email.
I am running this using cron:
*/5 * * * * /tmp/test.rb > /tmp/log/test.log 2>&1
I have something like this in test.rb:
#!/usr/bin/ruby
def read_file(file_name)
  file = File.open(file_name, "r")
  data = file.read
  file.close
  return data
end
puts "Start"
puts read_file("/tmp/log/test.log")
puts "End"
When I run this code, it only gives me this output:
Start
End
I would expect the output to be something like this:
Start
Start (from the reading of the test.log since it should have the word start already)
End
Ok, you're trying to do several things at once, and I suspect you didn't systematically test before moving from one step to the next.
First we're going to clean up your code:
def read_file(file_name)
  file = File.open(file_name, "r")
  data = file.read
  file.close
  return data
end
puts "Start"
puts read_file("/tmp/log/test.log")
puts "End"
can be replaced with:
puts "Start"
puts File.read("./test.log")
puts "End"
It's plain and simple; there's no need for a method or anything complicated... yet.
Note that for ease of testing I'm working with a file in the current directory. To put some content in it I'll simply do:
echo "foo" > ./test.log
Running the test code gives me...
Greg:Desktop greg$ ruby test.rb
Start
foo
End
so I know the code is reading and printing correctly.
Now we can test what would go into the crontab, before we deal with its madness:
Greg:Desktop greg$ ruby test.rb > ./test.log
Greg:Desktop greg$
Hmm. No output. Something is broken with that. We knew there was content in the file previously, so what happened?
Greg:Desktop greg$ cat ./test.log
Start
End
Cat'ing the file shows it has the "Start" and "End" output of the code, but the part that should have been read and output is now missing.
What's happening is that the shell truncated "test.log" just before it passed control to Ruby, which then ran the code, which opened the now-empty file to print it. In other words, you're asking the shell to truncate (empty) it just before you read it.
The fix is to read from a different file than you're going to write to, if you're trying to do something with the contents of it. If you're not trying to do something with its contents then there's no point in reading it with Ruby just to write it to a different file: we have cp and/or mv to do those things for us without Ruby being involved. So, this makes more sense if we're going to do something with the contents:
ruby test.rb > ./test.log.out
I'll reset the file contents using echo "foo" > ./test.log, and cat'ing it showed 'foo', so I'm ready to try the redirection test again:
Greg:Desktop greg$ ruby test.rb > ./test.log.out
Greg:Desktop greg$ cat test.log.out
Start
foo
End
That time it worked. Trying it again has the same result, so I won't show the results here.
If you're going to email the file you could add that code at this point. Replacing the puts in the puts File.read('./test.log') line with an assignment to a variable will store the file's content:
contents = File.read('./test.log')
Then you can use contents as the body of an email. (And, rather than use Ruby for all of this, I'd probably do it using mail or mailx or pipe it directly to sendmail, using the command-line and shell, but that's your call.)
At this point things are in a good position to add the command to crontab, using the same command as used on the command-line. Because it's running in cron, and errors can happen that we'd want to know about, we'd add the 2>&1 redirect to capture STDERR also, just as you did before. Just remember that you can NOT write to the same file you're going to read from or you'll have an empty file to read.
That's enough to get your app working.
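The truncation can also be reproduced entirely inside Ruby, which may make it easier to see. This is only a sketch with made-up method and file names: opening a file in "w" mode empties it immediately, much like the shell's > redirection does, so the safe order is to read first and write somewhere else.

```ruby
require 'tmpdir'

# Opening with "w" truncates the file at once, so the read inside the
# block sees an empty file.
def truncated_read(path)
  File.open(path, 'w') { File.read(path) }
end

# Read the log first, then write the report to a different file.
def write_report(log_path, out_path)
  contents = File.read(log_path)
  File.open(out_path, 'w') do |out|
    out.puts 'Start'
    out.puts contents
    out.puts 'End'
  end
end

Dir.mktmpdir do |dir|
  log = File.join(dir, 'test.log')
  out = File.join(dir, 'test.log.out')

  File.write(log, "foo\n")
  write_report(log, out)
  puts File.read(out)

  File.write(log, "foo\n")
  puts "read after truncation: #{truncated_read(log).inspect}"
end
```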
class FileLineRead
  File.open("file_line_read.txt") do |file|
    file.each do |line|
      # strip the trailing newline before looking the number up
      phone_number = line.chomp
      user = User.find_by_phone_number(phone_number)
      user.destroy unless user.nil?
    end
  end
end
open file
read line
DB Select
DB Update
In the cron job you have already opened and cleared test.log (via redirection) before you have read it in the Ruby script.
Why not do both the read and write in Ruby?
It may be a permissions issue or the file may not exist.
f = File.open("test","r")
puts f.read()
f.close()
The above will read the file test, provided the file exists in the current directory.
The problem is, as I can see, already solved by Slomojo. I'll only add:
to read and print a text file in Ruby, just:
puts File.read("/tmp/log/test.log")

command-line ruby scripts accessing a libs folder

I'm trying to create an application that will primarily consist of ruby scripts that will be run from the command-line (cron, specifically). I want to have a libs folder, so I can put encapsulated, reusable classes/modules in there, and be able to access them from any script.
I want to be able to put my scripts into a "bin" folder.
What is the best way to give them access to the libs folder? I know I can add to the load path via command-line argument, or at the top of each command-line script. In PHP, it sometimes made more sense to create a custom .ini file and point the cli to the ini file, so you got them all in one pop.
Anything similar for ruby? Based on your experience, what's the best way to go here?
At the top of each bin/executable, you can put this:
#!/usr/bin/env ruby
$:.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
require 'libfile'
[etc.]
Were you looking for something different?
If you turn your application into a Ruby gem and install the gem on your system, you don't even need to put this stuff at the top. The require statement would suffice in that case.
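To illustrate the load-path approach end to end, here is a hedged sketch that builds a throwaway bin/ and lib/ layout and requires a library from it. The helper add_sibling_lib and the DemoGreeter module are invented for the example; in a real bin/ script you would call the helper with __dir__ (or simply use require_relative '../lib/libfile').

```ruby
require 'fileutils'
require 'tmpdir'

# Prepend the lib/ folder that sits next to the given script directory
# to Ruby's load path and return its absolute path.
def add_sibling_lib(script_dir)
  lib = File.expand_path(File.join('..', 'lib'), script_dir)
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  lib
end

# Demo: build a temporary project with bin/ and lib/ side by side.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'bin'))
  FileUtils.mkdir_p(File.join(root, 'lib'))
  File.write(File.join(root, 'lib', 'demo_greeter.rb'),
             "module DemoGreeter\n  def self.hello\n    'Hello from lib'\n  end\nend\n")

  add_sibling_lib(File.join(root, 'bin'))
  require 'demo_greeter' # resolved through the new load-path entry
  puts DemoGreeter.hello
end
```

Using an absolute path means the scripts keep working when cron runs them from a different working directory.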
Sean,
There is no way to not have to require a library, that I know of. I guess if you want to personalize your Ruby so much you could "roll your own" using eval.
The script below basically works as the interpreter. You can add your own functions and include libraries. Give the file executable permissions and put it in /usr/bin if you really want. Then just use
$ myruby <source>
Here's the code for a very minimal one. As an example I've included the md5 digest library and created a custom function called md5()
#!/usr/bin/ruby -w
require 'digest/md5'

# Evaluate the source file one line at a time
def executeCode(file)
  File.readlines(file).each do |line|
    begin
      eval(line.strip)
    rescue Exception => e
      print "Problem with script '" + file + "'\n"
      # an exception cannot be concatenated with a string; use its message
      print e.message + "\n"
    end
  end
end

def checkFile(file)
  if !File.exist?(file)
    print "No such source file '" + file + "'\n"
    exit(1)
  elsif !File.readable?(file)
    print "Cannot read from source file '" + file + "'\n"
    exit(1)
  else
    executeCode(file)
  end
end

# My custom function for our "interpreter"
def md5(key=nil)
  if key.nil?
    raise "md5 requires 1 parameter, 0 given!\n"
  else
    return Digest::MD5.hexdigest(key)
  end
end

if ARGV[0].nil?
  print "No input file specified!\n"
  exit(1)
else
  checkFile(ARGV[0])
end
Save that as myruby or myruby.rb and give it executable permissions (755). Now you're ready to create a normal ruby source file
puts "I will now generate a md5 digest for mypass using the md5() function"
puts md5('mypass')
Save that and run it as you would a normal ruby script but with our new interpreter. You'll notice I didn't need to include any libraries or write the function in the source code because it's all defined in our interpreter.
It's probably not the most ideal method, but it's the only one I can come up with.
Cheers
There is a RUBYLIB environment variable that can be set to any folder on the system; Ruby adds the directories listed in it to its load path.
If you want to use your classes/modules globally, why not just move them to your main Ruby lib directory? eg: /usr/lib/ruby/1.8/ ?
Eg:
$ cat > /usr/lib/ruby/1.8/mymodule.rb
module HelloWorld
  def hello
    puts("Hello, World!");
  end
end
We have our module in the main lib directory - should be able to
require it from anywhere in the system now.
$ irb
irb(main):001:0> require 'mymodule'
=> true
irb(main):002:0> include HelloWorld
=> Object
irb(main):003:0> hello
Hello, World!
=> nil

Resources