How would I replace the first line of a text file or xml file using ruby? I'm having problems replicating a strange xml API and need to edit the document instruction after I create the XML file. It is strange that I have to do this, but in this case it is necessary.
If you are editing XML, use a tool specially designed for the task. sub, gsub and regex are not good choices if the XML being manipulated is not under your control.
Use Nokogiri to parse the XML, locate nodes and change them, then emit the updated XML.
There are many examples on SO showing how to do this, plus the tutorials on the Nokogiri site.
There are a few different ways you can do this:
Use ARGF (assuming that your ruby program takes a file name as a command line parameter)
ruby -e "puts ARGF.to_a[n]" yourfile.xml
Open the file regularly then read n lines
File.open("yourfile") { |f|
line = nil
n.times { line = f.gets }
puts line
}
This approach uses less memory, since only a single line is held at a time. It is also the simplest method.
Use IO.readlines() (will only work if the entire file will fit in memory!)
IO.readlines("yourfile")[n]
IO.readlines(...) will read every line from your file into an array.
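In all of the above examples, n identifies the line you want. Note that the array-based versions (ARGF.to_a[n] and IO.readlines(...)[n]) are zero-indexed, so n = 0 is the first line, while the gets loop is one-indexed, so n = 1 is the first line.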
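Tying this back to the original question, here is a minimal sketch of replacing the first line of a file while streaming the rest through unchanged. The helper name replace_first_line and the temporary-file approach are assumptions for illustration, not something taken from the answers above.

```ruby
require 'tempfile'
require 'fileutils'

# Hypothetical helper: rewrite `path` so its first line becomes `new_first_line`.
# Lines are streamed one at a time, so the whole file is never held in memory.
def replace_first_line(path, new_first_line)
  Tempfile.create('first_line', File.dirname(path)) do |tmp|
    File.foreach(path).with_index do |line, index|
      # Only line 0 is swapped; every other line is copied as-is.
      tmp.write(index.zero? ? "#{new_first_line.chomp}\n" : line)
    end
    tmp.flush
    FileUtils.cp(tmp.path, path)
  end
end
```

For example, replace_first_line('yourfile.xml', '<?xml version="1.0" encoding="UTF-8"?>') would swap only the declaration line.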
Suppose, I have an input.txt file with the following text:
First line
Second line
Third line
Fourth line
I want to delete, for example, the second and fourth lines to get this:
First line
Third line
So far, I've managed to delete only the second line using this code:
require 'fileutils'
File.open('output.txt', 'w') do |out_file|
File.foreach('input.txt') do |line|
out_file.puts line unless line =~ /Second/
end
end
FileUtils.mv('output.txt', 'input.txt')
What is the right way to delete multiple lines in text file in Ruby?
Deleting lines cleanly and efficiently from a text file is "difficult" in the general case, but can be simple if you can constrain the problem somewhat.
Here are some questions from SO that have asked a similar question:
How do I remove lines of data in the middle of a text file with Ruby
Deleting a specific line in a text file?
Deleting a line in a text file
Delete a line of information from a text file
There are numerous others, as well.
In your case, if your input file is relatively small, you can easily afford to use the approach that you're using. Really, the only thing that would need to change to meet your criteria is to modify your input file loop and condition to this:
File.open('output.txt', 'w') do |out_file|
File.foreach('input.txt').with_index do |line,line_number|
out_file.puts line if line_number.even? # <== line numbers start at 0
end
end
The change is to capture the line number using the with_index method, which is possible because File.foreach returns an Enumerator when called without a block; the block now applies to with_index and receives the line number as a second block argument. Using the line number in the condition gives you the criteria that you specified.
This approach will scale, even for somewhat large files, whereas solutions that read the entire file into memory have a fairly low upper limit on file size. With this solution, you're more constrained by available disk space and speed at which you can read/write the file; for instance, doing this to space-limited online storage may not work as well as you'd like. Writing to local disk or thumb drive, assuming that you have space available, should be no problem at all.
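The same streaming idea can be generalized to an arbitrary set of line numbers. The following is a minimal sketch; the helper name delete_lines, the one-based numbering, and the ".tmp" suffix for the scratch file are assumptions chosen for illustration.

```ruby
require 'fileutils'
require 'set'

# Hypothetical helper: remove the given 1-based line numbers from `path`.
# Streams the input, so memory use stays constant regardless of file size.
def delete_lines(path, line_numbers)
  unwanted = line_numbers.to_set
  tmp_path = "#{path}.tmp" # assumed scratch file next to the original
  File.open(tmp_path, 'w') do |out_file|
    # with_index(1) makes the first line number 1 instead of 0
    File.foreach(path).with_index(1) do |line, line_number|
      out_file.write(line) unless unwanted.include?(line_number)
    end
  end
  FileUtils.mv(tmp_path, path)
end
```

With the sample file from the question, delete_lines('input.txt', [2, 4]) would leave only the first and third lines.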
Use File.readlines to get an array of the lines in your input file.
input_lines = File.readlines('input.txt')
Then select only those with an even index.
output_lines = input_lines.select.with_index { |_, i| i.even? }
Finally, write those in your output file.
File.open('output.txt', 'w') do |f|
output_lines.each do |line|
f.write line
end
end
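The three steps above can also be folded into one small method. This is only a sketch under the same assumption that the file fits in memory; the name copy_even_indexed_lines is made up for illustration.

```ruby
# Hypothetical helper: copy every even-indexed line (0, 2, 4, ...) of
# input_path to output_path. The whole file is read into memory first.
def copy_even_indexed_lines(input_path, output_path)
  kept = File.readlines(input_path).select.with_index { |_, i| i.even? }
  # Each line still carries its own newline, so a plain join is enough.
  File.write(output_path, kept.join)
end
```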
I'm using Ruby 1.9.3 and REXML to parse an XML document, make a few changes (additions/subtractions), then re-output the file. Within this file is a block that looks like this:
<someElement>
some.namespace.something1=somevalue1
some.namespace.something2=somevalue2
some.namespace.something3=somevalue3
</someElement>
The problem is that after re-writing the file, this block always ends up looking like this:
<someElement>
some.namespace.something1=somevalue1
some.namespace.something2=somevalue2 some.namespace.something3=somevalue3
</someElement>
The newline after the second value (but never the first!) has been lost and turned into a space. Later, some other code which I have no control or influence over will be reading this file and depending on those newlines to properly parse the content. Generally in this situation I'd use a CDATA section to preserve the whitespace, but this isn't an option as the code that parses this data later is not expecting one. It's essential that the inner text of this element is preserved exactly as-is.
My read/write code looks like this:
xmlFile = File.open(myFile)
contents = xmlFile.read
xmlDoc = REXML::Document.new(contents, { :respect_whitespace => :all })
xmlFile.close
{perform some tasks}
out = ""
xmlDoc.write(out, 2)
File.open(filePath, "w"){|file| file.puts(out)}
I'm looking for a way to preserve the whitespace of text between elements when reading/writing a file in this manner using REXML. I've read a number of other questions here on stackoverflow on this subject, but none that quite replicate this scenario. Any ideas or suggestions are welcome.
I get correct behavior by removing the indent (second) parameter to Document.write():
#xmlDoc.write(out, 2)
xmlDoc.write(out)
That seems like a bug in Document.write() according to my reading of the docs, but if you don't really need to set the indentation, then leaving that off should solve your problem.
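A minimal round-trip sketch of that suggestion follows. It assumes the rexml library is available (it ships as a bundled gem with recent Ruby versions), and the helper name rewrite_xml is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical helper: parse an XML string and write it back out
# without passing an indent argument to Document#write.
def rewrite_xml(xml)
  doc = REXML::Document.new(xml, { respect_whitespace: :all })
  out = +''
  doc.write(out) # no indent argument, so no pretty-printing is requested
  out
end

SAMPLE_XML = <<~XML
  <someElement>
  some.namespace.something1=somevalue1
  some.namespace.something2=somevalue2
  some.namespace.something3=somevalue3
  </someElement>
XML

puts rewrite_xml(SAMPLE_XML)
```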
Ruby newbie here. I'm using Ruby version 1.9.2. I'm working at a military facility, and whenever we need to send support data to our vendors it needs to be scrubbed of identifying IP and hostname info. This is a new role for me, and now the task of scrubbing files (both text and binary) falls on me when handling support issues.
I created the following script to "scrub" plain text files of IP address info:
File.open("subnet.htm", 'r+') do |f|
text = f.read
text.gsub!(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/, "000.000.000.000")
f.rewind
f.write(text)
end
I need to modify my script to search and replace hostname AND IP address information in text files AND .dat binary files. I'm looking for something really simple like my little script above, and I'd like to keep the processing of txt and dat files as separate scripts. The task of creating one script to do both is one I'd like to take up as a learning exercise from the two separate scripts. Right now I'm under certain time constraints to scrub the support files and send them out.
The priority for me is to scrub my binary .dat trace files which are of data type XML. These are binary performance trace files from our storage arrays and they need to have the identifying IP address information scrubbed out before sending off to support for analysis.
I've searched stackoverflow.com somewhat extensively and haven't found a question with an answer that addresses my specific need, and I'm simply having a hard time trying to figure out string.unpack.
Thanks.
In general Ruby processes binary files the same as other files, with two caveats:
On Windows reading files normally translates CRLF pairs into just LF. You need to read in binary mode to ensure no conversion:
File.open('foo.bin','rb'){ ... }
In order to ensure that your binary data is not interpreted as text in some other encoding under Ruby 1.9+ you need to specify the ASCII-8BIT encoding:
File.open('foo.bin','r:ASCII-8BIT'){ ... }
However, as noted in this post, setting the 'b' flag as shown above also sets the encoding for you. Thus, just use the first code snippet above.
However, as noted in the comment by @ennuikiller, I suspect that you don't actually have true binary data. If you're really reading text files with a non-ASCII encoding (e.g. UTF-8), there is a small chance that treating them as binary will accidentally match only half of a multi-byte character and cause harm in the resulting file.
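Applied to the IP scrubbing task from the question, a binary-safe version might look like the sketch below. The helper name scrub_ips is hypothetical, and replacing each digit with "0" (rather than writing a fixed "000.000.000.000") is an assumption made here so that the file keeps exactly the same byte length, which can matter for binary formats.

```ruby
IP_PATTERN = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

# Hypothetical helper: overwrite every IPv4-looking sequence in `path`
# with zeros, reading and writing the file in binary mode.
def scrub_ips(path)
  data = File.binread(path) # binary mode, ASCII-8BIT string
  scrubbed = data.gsub(IP_PATTERN) do |ip|
    ip.gsub(/\d/, '0') # keep the dots and the original length
  end
  File.binwrite(path, scrubbed)
end
```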
Edit: To use Nokogiri on XML files, you might do something like the following:
require 'nokogiri'
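File.open("foo.xml", 'r+') do |f|
  doc = Nokogiri.XML(f.read)
  doc.xpath('//text()').each do |text_node|
    # You cannot use gsub! here; assign the new string back to the node
    text_node.content = text_node.content.gsub(/.../, '...')
  end
  f.rewind
  f.write doc.to_xml
  # Drop leftover bytes in case the new XML is shorter than the original
  f.truncate(f.pos)
end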
I've done some binary file parsing, and this is how I read it in and cleaned it up:
data = File.open("file", 'rb' ) {|io| io.read}.unpack("C*").map do |val|
val if val == 9 || val == 10 || val == 13 || (val > 31 && val < 127)
end
For me, my binary file didn't have sequential character strings, so I had to do some shifting and filtering before I could read it (hence the .map do |val| ... end). Unpacking with the "C" directive (see http://www.ruby-doc.org/core-1.9.2/String.html#method-i-unpack) gives integer byte values (ASCII codes) rather than the letters, so call val.chr if you'd like to use the interpreted character instead. Note that map leaves nil in place of every byte that fails the test; call compact on the result if you want to drop those entries.
I'd suggest that you open your files in a binary editor and look through them to determine how to best handle the data parsing. If they are XML, you might consider parsing them with Nokogiri or a similar XML tool.
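As a small illustration of the unpack approach described above, the following sketch keeps only tab, newline, carriage return and printable ASCII bytes and turns them back into a string. The helper name printable_text is hypothetical, and it uses select plus pack instead of the map shown earlier.

```ruby
# Hypothetical helper: strip everything except tab, LF, CR and
# printable ASCII (codes 32..126) from a binary string.
def printable_text(binary_string)
  binary_string
    .unpack('C*') # one integer per byte
    .select { |b| b == 9 || b == 10 || b == 13 || (b > 31 && b < 127) }
    .pack('C*')   # back to a (binary-encoded) string
end

# Usage: printable_text(File.binread('file'))
```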
I think I may not have done a good enough job explaining my question the first time.
I want to open a bunch of text, and binary files and scan those files with my regular expression. What I need from the csv is to take the data in the second column, which are the paths to all the files, as the means to point to which file to open.
Once the file is opened and scanned with the regexp, anything that matches is displayed on the screen. I am sorry for the confusion and thank you so much for everything!
Hello,
I am sorry for asking what is probably a simple question. I am new to ruby and will appreciate any guidance.
I am trying to use a csv file as an index to leverage other actions.
In particular, I have a csv file that looks like:
id, file, description, date
1, /dir_a/file1, this is the first file, 02/10/11
2, /dir_b/file2, this is the second file, 02/11/11
I want to open every file defined in the "file" column and search for a regular expression.
I know that you can define the headers in each column with the CSV class
require 'rubygems'
require 'csv'
require 'pp'
index = CSV.read("files.csv", :headers => true)
index.each do |row|
puts row ['file']
end
I know how to create a loop that opens every file and searches for a regexp in each file, and if there is a match, displays it:
regex = /[0-9A-Za-z]{8,8}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{12,12}/
Dir.glob('/home/Bob/**/*').each do |file|
next unless File.file?(file)
File.open(file, "rb") do |f|
f.each_line do |line|
f.each_line do |line|
unless (pattern = line.scan(regex)).empty?
puts "#{pattern}"
end
end
end
end
end
Is there a way I can use the contents of the second column in my csv file as my variable to open each of the files, search for the regexp and, if there is a match in the file, output the row in the csv that had the match to a new csv?
Thank you in advance!!!!
At a quick glance it looks like you could reduce it to:
index.each do |row|
  File.foreach(row['file']) do |line|
    # line[regex] returns the first match in the line, or nil
    match = line[regex]
    puts match if match
  end
end
A CSV file shouldn't be binary, so you can drop the 'rb' when opening the file, letting us reduce the file read to foreach, which iterates over the file, returning it line by line.
The depth of the files in your directory hierarchy is in question based on your sample code. It's not really clear what's going on there.
EDIT:
it tells me that "regex" is an undefined variable
In your question you said:
regex = /[0-9A-Za-z]{8,8}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{4,4}-[0-9A-Za-z]{12,12}/
the files I open to do the search on may be a binary.
According to the spec:
Common usage of CSV is US-ASCII, but other character sets defined by IANA for the "text" tree may be used in conjunction with the "charset" parameter.
It goes on to say:
Security considerations:
CSV files contain passive text data that should not pose any
risks. However, it is possible in theory that malicious binary
data may be included in order to exploit potential buffer overruns
in the program processing CSV data. Additionally, private data
may be shared via this format (which of course applies to any text
data).
So, if you're seeing binary data you shouldn't because it's not CSV according to the spec. Unfortunately the spec has been abused over the years, so it's possible you are seeing binary data in the file. If so, continue to use 'rb' as the file mode but do it cautiously.
An important question to ask is whether you can read the file using Ruby's CSV library, which makes a lot of this a moot discussion.
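For the second half of the question (writing the matching index rows to a new CSV), a minimal sketch could look like this. The helper name rows_with_matches and the output path argument are assumptions. Because the sample index has a space after each comma, the sketch strips the headers and the file path; without that, row['file'] would be nil.

```ruby
require 'csv'

# Hypothetical helper: copy every row of the index CSV whose "file"
# column points at a file containing a match for `regex`.
def rows_with_matches(index_path, output_path, regex)
  # Strip headers because the sample index uses ", " as its separator.
  index = CSV.read(index_path, headers: true,
                               header_converters: ->(h) { h.strip })
  CSV.open(output_path, 'w') do |out|
    out << index.headers
    index.each do |row|
      path = row['file'].to_s.strip
      next unless File.file?(path)
      # 'rb' because the indexed files may be binary
      matched = File.open(path, 'rb') do |f|
        f.each_line.any? { |line| line.match?(regex) }
      end
      out << row.fields if matched
    end
  end
end
```

A call such as rows_with_matches('files.csv', 'matches.csv', regex) would reuse the regex defined in the question.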
Using Ruby, I am reading a file line by line, using IO.gets to incrementally read the next line of the file. Under certain circumstances I want to do the opposite (look at the previous line by decrementing). The way I tried to accomplish this was...
IO.lineno = int
IO.gets
It seems that no matter what I set "lineno" to equal it still just reads the next line when I follow up by calling "gets". How should I go about reading previous lines in the file?
You need to use
IO.readlines("myfile")
This returns the file as an array of strings, which you can then iterate over by index. A stream has no built-in way to step back one line.
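A minimal sketch of that index-based approach, pairing each line with the one before it. The helper name lines_with_previous is made up for illustration, and the whole file is assumed to fit in memory.

```ruby
# Hypothetical helper: return [previous_line, current_line] pairs.
# The first line has no predecessor, so its previous_line is nil.
def lines_with_previous(path)
  lines = File.readlines(path, chomp: true)
  lines.each_with_index.map do |line, i|
    previous = i.zero? ? nil : lines[i - 1]
    [previous, line]
  end
end
```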