Search for a string in a text file and delete that line (Ruby)

I have a text file that contains some numbers and I want to search a specific number then delete that line. This is the content of the file
83087
308877
214965
262896
527530
So if I want to delete 262896, I will open the file, search for the string and delete that line.

You need to write the lines you want to keep to a temporary file.
Something along these lines should do it:
require 'fileutils'
require 'tempfile'
# Open temporary file
tmp = Tempfile.new("extract")
# Write good lines to temporary file; File.foreach closes the source when done
File.foreach('sourcefile.txt') { |l| tmp << l unless l.chomp == '262896' }
# Close tmp, or troubles ahead
tmp.close
# Move temp file to origin
FileUtils.mv(tmp.path, 'sourcefile.txt')
This will run as:
$ cat sourcefile.txt
83087
308877
214965
262896
527530
$ ruby ./extract.rb
$ cat sourcefile.txt
83087
308877
214965
527530
$
You can also do it in-memory only, without a temporary file. But the memory footprint might be huge depending on your file size. The above solution only loads one line at a time in memory, so it should work fine on big files.
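For comparison, the in-memory variant mentioned above might look like the following minimal sketch. The helper name delete_line is hypothetical, and the approach assumes the file fits comfortably in memory.

```ruby
# Hypothetical helper: remove every line whose content equals `target`.
# Reads the whole file into memory, so it suits small files only.
# Returns the number of lines kept.
def delete_line(path, target)
  kept = File.readlines(path).reject { |line| line.chomp == target }
  File.write(path, kept.join)
  kept.size
end

# Usage with the file from the question:
# delete_line('sourcefile.txt', '262896')
```

Unlike the temporary-file version, this rewrites the original file directly, so an interruption in the middle of the write could leave it incomplete.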
-- Hope it helps --

Related

How to create a random unique file directly in /tmp using Ruby?

I am writing an application that creates and places a logfile in /tmp and afterwards moves this logfile to another directory. Unfortunately I faced some issues with this implementation and I would like to make this logfile more unique.
I came across mktemp, which would automatically create a file in /tmp. Perfect, just what I need! Unfortunately I cannot seem to get it to work in Ruby. I have tried the following without success:
def temporary_logfile
@temporary_logfile = `mktemp "#{File.basename($PROGRAM_NAME)}_#{Time.now.strftime('%Y%m%dT%H%M%S')}.logXXXX"`
end
I expected to see my logfile in /tmp but unfortunately nothing happens. I wonder what I did wrong?
The next step would be to use slice! to remove the randomly generated characters that mktemp adds to the logfile name, and then move the file somewhere else.
Have a look at Tempfile: https://ruby-doc.org/stdlib-2.6.3/libdoc/tempfile/rdoc/Tempfile.html
file = Tempfile.new('foo')
begin
# ...do something with file...
ensure
file.close
file.unlink # deletes the temp file
end
The example is taken directly from the documentation.
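If the logfile has to outlive the script and be moved elsewhere, one option is Tempfile.create, which returns an ordinary File when called without a block. The following is a minimal sketch under that assumption; the helper name create_temp_logfile and the prefix are hypothetical.

```ruby
require 'tempfile'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: create a uniquely named, empty logfile in `dir`.
# Without a block, Tempfile.create returns a plain File that is not
# deleted automatically, so it can be moved elsewhere afterwards.
def create_temp_logfile(prefix, dir = Dir.tmpdir)
  Tempfile.create(["#{prefix}_", '.log'], dir)
end

log = create_temp_logfile(File.basename($PROGRAM_NAME, '.*'))
log.puts 'started'
log.close
# FileUtils.mv(log.path, '/some/other/directory/') would then relocate it;
# that destination is a placeholder.
File.unlink(log.path) # clean up, since this sketch does not move the file
```

Because the random part sits between the prefix and the .log suffix, the name stays unique without any need to slice characters off afterwards.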

How to find text file in same directory

I am trying to read a list of baby names from the year 1880 in CSV format. My program, when run in the terminal on OS X, returns an error indicating that yob1880.txt doesn't exist.
No such file or directory @ rb_sysopen - /names/yob1880.txt (Errno::ENOENT)
from names.rb:2:in `<main>'
The location of both the script and the text file is /Users/*****/names.
lines = []
File.expand_path('../yob1880.txt', __FILE__)
IO.foreach('../yob1880.txt') do |line|
lines << line
if lines.size >= 1000
lines = FasterCSV.parse(lines.join) rescue next
store lines
lines = []
end
end
store lines
If you're running the script from the /Users/*****/names directory, and the files also exist there, you should simply remove the "../" from your pathnames to prevent looking in /Users/***** for the files.
Use this approach to referencing your files, instead:
File.expand_path('yob1880.txt', __FILE__)
IO.foreach('yob1880.txt') do |line|
Note that the File.expand_path is doing nothing at the moment, as the return value is not captured or used for any purpose; it simply consumes resources when it executes. Depending on your actual intent, it could realistically be removed.
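If the intent was to resolve the data file relative to the script, the return value has to be captured and passed on. A minimal sketch, assuming Ruby 2.0 or later for __dir__; the helper name data_path is hypothetical:

```ruby
# Hypothetical helper: absolute path of `name` inside `base`.
# By default `base` is __dir__, the directory of the current source file.
def data_path(name, base = __dir__)
  File.expand_path(name, base)
end

path = data_path('yob1880.txt')
# IO.foreach(path) { |line| ... } would then work from any working directory.
```

Note that the second argument is a directory, which is why __dir__ is used here rather than __FILE__.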
Going deeper on this topic, it may be better for the script to be explicit about the directory in which it looks for files. Consider these approaches:
Change to the directory in which the script exists, prior to opening files
Dir.chdir(File.dirname(File.expand_path(__FILE__)))
IO.foreach('yob1880.txt') do |line|
This explicitly requires that the script and the data be stored relative to one another; in this case, they would be stored in the same directory.
Provide a specific path to the files
# do not use Dir.chdir or File.expand_path
IO.foreach('/Users/****/yob1880.txt') do |line|
This can work if the script is used in a small, contained environment, such as your own machine, but it will be brittle if the data is moved to another directory or to another machine. Generally, this approach is not useful, except for short-lived scripts for personal use.
Never put a script using this approach into production use.
Work only with files in the current directory
# do not use Dir.chdir or File.expand_path
IO.foreach('yob1880.txt') do |line|
This will work if you run the script from the directory in which the data exists, but will fail if run from another directory. This approach typically works better when the script detects the contents of the directory, rather than requiring certain files to already exist there.
Many Linux/Unix utilities, such as cat and grep, use this approach, unless command-line options override such behavior.
Accept a command-line option to find data files
require 'optparse'
base_directory = "."
OptionParser.new do |opts|
  opts.banner = "Usage: example.rb [options]"
  # Store the expanded directory itself; Dir.chdir would return 0, not a path
  opts.on('-d', '--dir NAME', 'Directory name') { |v| base_directory = File.expand_path(v) }
end.parse! # without parse!, the options are never read from ARGV
IO.foreach(File.join(base_directory, 'yob1880.txt')) do |line|
  # do lines
end
This will give your script a -d or --dir option in which to specify the directory in which to find files.
Use a configuration file to find data files
This code would allow you to use a YAML configuration file to define where the files are located:
require 'yaml'
config_filename = File.expand_path("~/yob/config.yml")
config = YAML.load_file(config_filename)
base_directory = config["base"]
IO.foreach(File.join(base_directory, 'yob1880.txt')) do |line|
  # do lines
end
This doesn't include any error handling related to finding and loading the config file, but it gets the point across. For additional information on using a YAML config file with error handling, see my answer on Asking user for information, and never having to ask again.
Final thoughts
You have the tools to establish ways to locate your data files. You can even mix-and-match solutions for a more sophisticated solution. For instance, you could default to the current directory (or the script directory) when no config file exists, and allow the command-line option to manually override the directory, when necessary.
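As a rough illustration of mixing these approaches, the sketch below resolves the directory from three sources in a fixed order. The helper name resolve_base_directory and the precedence order are assumptions made for the example, not part of the code above.

```ruby
require 'optparse'
require 'yaml'

# Hypothetical helper combining the approaches above. Precedence:
# -d/--dir option, then "base" from the YAML config, then default_dir.
def resolve_base_directory(argv, config_path, default_dir)
  base = default_dir
  if File.exist?(config_path)
    config = YAML.load_file(config_path)
    base = config['base'] if config.is_a?(Hash) && config['base']
  end
  OptionParser.new do |opts|
    opts.banner = 'Usage: example.rb [options]'
    opts.on('-d', '--dir NAME', 'Directory name') { |v| base = v }
  end.parse(argv)
  File.expand_path(base)
end

# Example wiring, using the config location from earlier in this answer:
# base = resolve_base_directory(ARGV, File.expand_path('~/yob/config.yml'), __dir__)
# IO.foreach(File.join(base, 'yob1880.txt')) { |line| ... }
```

Passing argv and the config path in as arguments keeps the helper easy to exercise without touching the real ARGV or home directory.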
Here's a technique I always use when I want to normalize the current working directory for my scripts. This is a good idea because in most cases you write your script and place the supporting files in the same folder, or in a sub-folder of the main script.
This resets the current working directory to the folder in which the script is located. After that it's much easier to figure out the paths to everything:
# Reset working directory to same folder as current script file
Dir.chdir(File.dirname(File.expand_path(__FILE__)))
After that you can open your data file with just:
IO.foreach('yob1880.txt')

Convert a PDF to .txt gives me an empty .txt file

Hi, I'm trying to read a PDF in Ruby, and as a first step I want to convert it to a .txt file. path is the path to the PDF. The problem is that I get an empty .txt file; someone told me it is a pdftotext problem, but I don't know how to fix it.
spec = path.sub(/\.pdf$/, '')
`pdftotext #{spec}.pdf`
file = File.new("#{spec}.txt", "w+")
text = []
file.readlines.each do |l|
if l.length > 0
text << l
Rails.logger.info l
end
end
file.close
What's wrong with my code? Thanks!
It's not possible to extract text from every PDF. Some PDF files use a font encoding that makes it impossible to extract text with simple tools such as pdftotext (and some PDF files are even completely immune to direct text extraction with any tool known to me -- in these cases you'll have to apply OCR first to have a chance to extract text...).
So if you test your code with the same "weird" PDF file all the time, it may well happen that you're getting frustrated over your code while in reality the fault lies with the PDF.
First make sure that the command-line usage of pdftotext works well with a given PDF, then test (and develop further) your code with that PDF.
The problem is that you are opening the file in write ("w+") mode, which truncates the file. You can see a table of file modes and what they mean at http://ruby-doc.org/core-1.9.3/IO.html.
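Try something like this; it uses a pdftotext option ("-") to send the text to stdout, which avoids creating a temporary file, and uses blocks for more idiomatic Ruby.
require 'shellwords'
# Escape the path so spaces or shell metacharacters do not break the command
text = `pdftotext #{Shellwords.escape(path)} -`
# each_line splits on newlines; String#split would split on every whitespace
text.each_line.map(&:chomp).reject(&:empty?).each { |line|
  Rails.logger.info(line)
}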
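To see the truncation in isolation, here is a minimal sketch that does not need pdftotext at all. The file name spec.txt merely stands in for the converter's output, and the helper lines_after_open is hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: return the lines read from `path` after opening it in `mode`.
def lines_after_open(path, mode)
  File.open(path, mode) { |f| f.readlines }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'spec.txt') # stands in for pdftotext's output

  File.write(path, "first line\nsecond line\n")
  read_only = lines_after_open(path, 'r')  # contents are intact
  truncated = lines_after_open(path, 'w+') # file is emptied on open

  puts "r:  #{read_only.size} lines"
  puts "w+: #{truncated.size} lines"
end
```

Opening with "r" reports both lines, while "w+" reports none, because the file is emptied before anything can be read from it.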
You would need to open the txt file with write permission.
file = File.new("#{spec}.txt", "w")
You could consult How to create a file in Ruby
Update: your code is incomplete and looks buggy.
It isn't clear what path is.
It looks like you are trying to read (file.readlines.each) the text file that you intend to write to.
Check the spelling of length; you have l.lenght.
You may want to paste the actual code.
Check this gist: https://gist.github.com/4160587
As mentioned, your code is not working because you are reading and writing to the same file.
Example
Ruby code file_write.rb to do the file write operation
pdf_file = File.open("in.txt")
output_file = File.open("out.txt", "w") # file to which you want to write
#iterate over input file and write the content to output file
pdf_file.readlines.each do |l|
output_file.puts(l)
end
output_file.close
pdf_file.close
Sample txt file in.txt
Some text in file
Another line of text
1. Line 1
2. Not really line 2
Once you run file_write.rb you should see a new file called out.txt with the same content as in.txt. You could change the content of the input file if you want. In your case you would use a PDF reader to get the content and write it to the text file. Basically, only the first line of the code will change.

Parsing a Zip file and extracting records from text files

I am really new to Ruby and could use some help with a program. I need to open a zip file that contains multiple text files, each of which has many rows of data, e.g.:
CDI|3|3|20100515000000|20100515153000|2008|XXXXX4791|0.00|0.00
CDI|3|3|20100515000000|20100515153000|2008|XXXXX5648|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX3276|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX4342|0.00|0.00
MITR|3|3|20100515000000|20100515153000|0000|XXXXX7832|0.00|0.00
HR|3|3|20100515000000|20100515153000|1114|XXXXX0238|0.00|0.00
I first need to extract the zip file, read the text files located in the zip file and write only the complete rows that start with (CDI and CHO) to two output files, one for the rows of data starting with CDI and one for the rows of data starting with CHO (basically parsing the file). I have to do it with Ruby and possibly try to set the program to an auto function for arrival of continuous zip files of the same stature. I completely appreciate any advice, direction or help via some sample anyone can give.
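One way is to use the rubyzip library. Note that this example uses rubyzip's older zip/zip API; releases from 1.0 onward use require 'zip' and Zip::File instead.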
require 'zip/zip'
# To open the zip file and pass each entry to a block
Zip::ZipFile.foreach(path_to_zip) do |text_file|
# Read from entry, turn String into Array, and pass to block
text_file.read.split("\n").each do |line|
if line.start_with?("CDI") || line.start_with?("CHO")
# Do something
end
end
end
I'm not sure if I entirely follow your question. For starters, if you're looking to unzip files using Ruby, check out this question. Once you've got the file unzipped to a readable format, you can try something along these lines to print to the two separate outputs:
cdi_output = File.open("cdiout.txt", "a") # Open an output file for CDI
cho_output = File.open("choout.txt", "a") # Open an output file for CHO
File.open("text.txt", "r") do |f| # Open the input file
while line = f.gets # Read each line in the input
cdi_output.puts line if /^CDI/ =~ line # Print if line starts with CDI
cho_output.puts line if /^CHO/ =~ line # Print if line starts with CHO
end
end
cdi_output.close # Close cdi_output file
cho_output.close # Close cho_output file
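The same filtering can be wrapped in a small reusable helper. This is only a sketch under a few assumptions: the helper name split_rows_by_type is hypothetical, it compares the whole first "|"-separated field rather than a line prefix, and it relies on Hash#transform_values (Ruby 2.4 or later).

```ruby
# Hypothetical helper: append rows whose first "|"-separated field is a key
# of `outputs` to the corresponding output path. Returns per-key row counts.
def split_rows_by_type(input_path, outputs)
  counts = Hash.new(0)
  files = outputs.transform_values { |path| File.open(path, 'a') }
  begin
    File.foreach(input_path) do |line|
      type = line.split('|', 2).first
      next unless files.key?(type)
      files[type].puts(line.chomp)
      counts[type] += 1
    end
  ensure
    # Close every output file even if reading fails part-way through
    files.each_value(&:close)
  end
  counts
end

# Usage with the file names from the answer above:
# split_rows_by_type('text.txt', 'CDI' => 'cdiout.txt', 'CHO' => 'choout.txt')
```

Matching on the full first field means a row type such as CDIX would not be mistaken for CDI, which a bare /^CDI/ check would allow.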

Replace a line in a file using '+' File IO modes in Ruby

Ruby beginner here!
I am aware that Ruby's File.open method has certain modes like r, w, a, r+, w+, a+ and the complementary b. I totally understand the use of the r, w and a modes, but I cannot seem to understand how to use the ones with the '+' symbol. Can anyone provide me with some links where there are examples as well as explanations of their use?
Can it be used to read a line and edit/replace it in place with an equal amount of content? If so, then how?
Sample data file: a.txt
aaa
bbb
ccc
ddd
Demo.rb
file = File.open "a.txt","r+"
file.each do |line|
line = line.chomp
if(line=="bbb")then
file.puts "big"
end
end
file.close
I am trying to replace "bbb" with "big" but I am getting this:-
in notepad++
aaa
bbb
big
ddd
in notepad
aaa
bbb
bigddd
I took this documentation from another answer, so it is not mine; the solution below is mine.
r Read-only mode. The file pointer is placed at the beginning of the file. This is the default mode.
r+ Read-write mode. The file pointer will be at the beginning of the file.
w Write-only mode. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.
w+ Read-write mode. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.
a Write-only mode. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
a+ Read and write mode. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
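The practical difference between r+ and a+ is where writes land. The following minimal sketch runs against a throwaway file in a temporary directory; the helper name write_with_mode is hypothetical.

```ruby
require 'tmpdir'

# Hypothetical helper: write `text` through a handle opened in `mode`,
# then return the resulting file contents.
def write_with_mode(path, mode, text)
  File.open(path, mode) { |f| f.write(text) }
  File.read(path)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'a.txt')
  File.write(path, "aaa\nbbb\n")

  # r+ starts at the beginning and overwrites bytes in place.
  p write_with_mode(path, 'r+', 'XYZ')   # "XYZ\nbbb\n"

  # a+ always writes at the end, leaving existing content untouched.
  p write_with_mode(path, 'a+', "ccc\n") # "XYZ\nbbb\nccc\n"
end
```

Neither mode inserts text: r+ replaces existing bytes one for one, which is why an in-place edit only works cleanly when the replacement has the same length.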
EDIT: here is the solution for your sample. Most of the time the whole string is gsubbed and written back to the file, but in-place replacement without rewriting the whole file is also possible.
Be careful to replace with a string of the same length.
File.open('a.txt', 'rb+') do |file|
  file.each_line do |line|
    if line =~ /bbb/
      # Step back over the line just read (counted in bytes), then overwrite its start
      file.seek(-line.bytesize, IO::SEEK_CUR)
      file.write 'big'
    end
  end
end
=>
aaa
big
ccc
ddd
And this is a more conventional way, though more concise than most other solutions:
File.open(filename = "a.txt", "r+") { |file| file << File.read(filename).gsub(/bbb/,"big") }
EDIT2: I now realize this can be even shorter:
File.write(f = "a.txt", File.read(f).gsub(/bbb/,"big"))
So you are reading an entire file into a variable, then performing the
substitution, and then writing the variable's contents back to the
file. Am I right? I was looking for something kind of inline.
That's the way to do it. You can alternatively use IO#readlines to read all lines into an Array and then process them.
And this has been already answered:
How to search file text for a pattern and replace it with a given value
If you are worried about performance or memory usage, then use the right tool for the job. On *nix (or Cygwin on Windows):
sed -i -e "s/bbb/big/g" a.txt
Will do exactly what you want.
