I want to remove duplicate lines from a file but only remove duplicate lines that match a specific regular expression, leaving all other duplicates in the file. Here is what I currently have:
unique_lines = File.readlines("Ops.Web.csproj").uniq do |line|
  line[/^.*\sInclude=\".*\"\s\/\>$/]
end
File.open("Ops.Web.csproj", "w+") do |file|
  unique_lines.each do |line|
    file.puts line
  end
end
This will deduplicate the lines correctly but will only add the lines that meet the regular expression back into the file. I need all the other lines in the file to be added back unchanged. I know I am missing something small here. Ideas?
Try this:
lines = File.readlines("input.txt")
out = File.open("output.txt", "w+")
seen = {}
lines.each do |line|
  # check if we want this de-duplicated
  if line =~ /Include/
    if !seen[line]
      out.puts line
      seen[line] = true
    end
  else
    out.puts line
  end
end
out.close
Demo:
➜ 12980122 cat input.txt
a
b
c
Include a
Include b
Include a
Include a
d
e
Include b
f
➜ 12980122 ruby exec.rb
➜ 12980122 cat output.txt
a
b
c
Include a
Include b
d
e
f
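Why the original `uniq` version loses lines: its block returns `nil` for every line the regex does not match, so all of those lines share the key `nil` and `uniq` keeps only the first one. A minimal sketch that keeps the `uniq` idea gives each non-matching line a key that can never collide, namely its index. The helper name `dedupe_matching` is hypothetical and `/Include/` stands in for the real pattern; the key for a matching line is the whole line, so `Include a` and `Include b` stay distinct.

```ruby
# Deduplicate only the lines that match +pattern+; all other lines are kept,
# duplicates included. Each line is paired with its index, and the uniq key
# is the line itself when it matches, otherwise its (always unique) index.
def dedupe_matching(lines, pattern)
  lines.each_with_index.to_a
       .uniq { |line, index| line.match?(pattern) ? line : index }
       .map(&:first)
end

lines = ["a\n", "Include a\n", "b\n", "Include a\n", "b\n"]
p dedupe_matching(lines, /Include/) # ["a\n", "Include a\n", "b\n", "b\n"]
```

To apply it to the project file, something like `File.write("Ops.Web.csproj", dedupe_matching(File.readlines("Ops.Web.csproj"), pattern).join)` should work, with `pattern` being your regex; `readlines` keeps each line's newline, so `join` restores the original layout.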
Related
New to ruby here!
How to replace the whole line in a text file which contains a specific string using ruby?
Example: I want to remove the whole line containing "DB_URL" and replace it with something like "DB_CON=jdbc:mysql:replication://master,slave1,slave2,slave3/test"
DB_URL=jdbc:oracle:thin:#localhost:TEST
DB_USERNAME=USER
DB_PASSWORD=PASSWORD
Here is your solution.
file_data = ""
word = 'Word you want to match in line'
replacement = 'line you want to set in replacement'
IO.foreach('path/to/file.txt') do |line|
  file_data += line.gsub(/^.*#{Regexp.quote(word)}.*$/, replacement)
end
puts file_data
File.open('path/to/samefile.txt', 'w') do |file|
  file.write file_data
end
Here is my attempt:
file.txt
First line
Second line
foo
bar
baz foo
Last line
test.rb
f = File.open("file.txt", "r")
a = f.map do |l|
  (l.include? 'foo') ? "replacing string\n" : l # Please note the double quotes
end
f.close
p a.join('')
Output
$ ruby test.rb
"First line\nSecond line\nreplacing string\nbar\nreplacing string\nLast line"
I commented # Please note the double quotes because inside single quotes \n is not an escape sequence: '\n' is a literal backslash followed by n (which p displays as \\n). Also, think about the last line of your file: if the original file does not end with a newline and its last line gets replaced, this adds a \n that was not there before. If you don't want that, you could do something like:
lines = File.readlines("file.txt")
a = lines.map do |l|
  (l.include? 'foo') ? "replacing string\n" : l
end
# Drop the added "\n" only if the original last line had none
a[-1] = a[-1].chomp unless lines.empty? || lines[-1].end_with?("\n")
p a.join('')
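A sketch that avoids patching the last element afterwards: only lines containing the search word are replaced, and each replacement keeps whatever line ending the original line had, so a final line without a trailing newline stays that way. `replace_matching_lines` is a hypothetical helper name, and it assumes `\n` line endings.

```ruby
# Replace every line that contains +word+ with +replacement+.
# A replaced line keeps its own line ending (assumes "\n" endings), so a
# last line without "\n" stays without one.
def replace_matching_lines(text, word, replacement)
  text.each_line.map do |line|
    next line unless line.include?(word)
    line.end_with?("\n") ? "#{replacement}\n" : replacement
  end.join
end

config = "DB_URL=jdbc:oracle:thin:#localhost:TEST\nDB_USERNAME=USER\nDB_PASSWORD=PASSWORD\n"
puts replace_matching_lines(config, "DB_URL",
                            "DB_CON=jdbc:mysql:replication://master,slave1,slave2,slave3/test")
```

For a file, `File.write(path, replace_matching_lines(File.read(path), word, replacement))` reads, rewrites and saves it in one go.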
There is a file with some marker word in it:
qwerty
I am the marker!
zxcvbn
123456
I want to overwrite all the rest of the file after the marker with some unknown amount of lines instead:
qwerty
I am the marker!
inserted line #1
inserted line #2
inserted line #3
But if there are fewer new lines than old ones, the rest of the old tail is left behind, which I do not want:
qwerty
I am the marker!
inserted line #1
123456
Here is my code (simplified):
File.open("file.txt", "r+") do |file|
  file.gets "marker"
  file.gets
  lines_to_insert.each do |line|
    file.puts line
  end
  # I wish I could do file.put_EOF here
end
File.open("file.txt", "r+") do |file|
  file.gets "marker"
  file.gets
  lines_to_insert.each do |line|
    file.puts line
  end
  # EOF here
  file.truncate(file.pos)
end
Making use of File#pos to specify where to truncate.
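The `truncate(file.pos)` idea can be wrapped in a small method and checked against the example from the question. In this sketch the name `overwrite_after_marker` is hypothetical; it leaves the file untouched if the marker never appears and, like the answer above, relies on Ruby repositioning an `r+` file correctly when switching from reading to writing.

```ruby
require "tempfile"

# Overwrite everything after the first line containing +marker+ with
# +new_lines+, then cut the file off at the current position so no stale
# tail is left. Returns false, leaving the file untouched, if the marker
# is never found.
def overwrite_after_marker(path, marker, new_lines)
  File.open(path, "r+") do |file|
    found = false
    line = nil
    while (line = file.gets)
      if line.include?(marker)
        found = true
        break
      end
    end
    return false unless found

    # A marker on a last line without "\n" needs a line break first
    file.write("\n") unless line.end_with?("\n")
    new_lines.each { |l| file.puts l }
    file.truncate(file.pos) # drop whatever is left of the old content
    true
  end
end

# Try it on a temporary copy of the example from the question
Tempfile.create("marker") do |tmp|
  tmp.write("qwerty\nI am the marker!\nzxcvbn\n123456\n")
  tmp.close
  overwrite_after_marker(tmp.path, "marker", ["inserted line #1"])
  print File.read(tmp.path)
end
```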
How about using a temp file?
require 'fileutils'

File.open("file.tmp", "w") do |tmp_file|
  File.open("file.txt", "r") do |file|
    file.each_line do |line|
      # add each line of the original file up to and including the marker line
      tmp_file.puts line
      break if line.include? "marker" # or however you're indicating the marker
    end
    # add new lines
    lines_to_insert.each do |line|
      tmp_file.puts line
    end
  end
end
FileUtils.mv 'file.tmp', 'file.txt'
This will guarantee a file with a proper EOF line and not a hacky set of lines at the end that are nothing but newline characters or spaces.
Why not fill an array with each line, using something like this:
array = File.read("file.txt").split("\n")
Then find the index of the element that contains the word marker:
marker_index = array.index { |line| line.include?('marker') }
Then replace everything after marker_index with your new lines, e.g. array = array[0..marker_index] + lines_to_insert
Finally, join the strings in your array back together (don't forget to add your \n back in, e.g. array.join("\n") + "\n") and write the result back to your file.
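A runnable version of the array approach, written as a function on the file's text so it can be tried without touching disk. `replace_after_marker` is a hypothetical name; if no line contains the marker, the text is returned unchanged.

```ruby
# Keep every line up to and including the first one containing +marker+,
# replace the rest with +new_lines+, and join everything back together.
# Returns +text+ unchanged if no line contains the marker.
def replace_after_marker(text, marker, new_lines)
  lines = text.split("\n")
  marker_index = lines.index { |line| line.include?(marker) }
  return text if marker_index.nil?

  # Re-add the "\n" separators, plus one at the very end
  (lines[0..marker_index] + new_lines).join("\n") + "\n"
end

original = "qwerty\nI am the marker!\nzxcvbn\n123456\n"
print replace_after_marker(original, "marker", ["inserted line #1", "inserted line #2"])
```

Writing the result back would look like `File.write("file.txt", replace_after_marker(File.read("file.txt"), "marker", lines_to_insert))`.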
I need to find each occurrence of "$" and change it to a number using a count. For example, str = "foo $ bar $ foo $ bar $" should become "foo 1 bar 2 foo 3 bar 4" after running the code.
It feels like this should be a lot easier than I'm making it out to be. Here's my code:
def counter(file)
  f = File.open(file, "r+")
  count = 0
  contents = f.readlines do |s|
    if s.scan =~ /\$/
      count += 1
      f.seek(1)
      s.sub(/\$/, count.to_s)
    else
      puts "Total changes: #{count}"
    end
  end
end
However, I'm not sure if I'm meant to be using .match, .scan, .find or something else.
When I run this it doesn't come up with any errors, but it doesn't change anything either.
Your syntax for scan is incorrect: String#scan needs a pattern argument, so calling it without one would raise an ArgumentError.
You can try something along these lines:
count = 0
str = "foo $ bar $ foo $ bar $ "
occurences = str.scan('$')
# => ["$", "$", "$", "$"]
occurences.size.times do str.sub!('$', (count+=1).to_s) end
str
# => "foo 1 bar 2 foo 3 bar 4 "
Explanation:
I find all occurrences of $ in the string, then call sub! once per occurrence, since sub! replaces only the first remaining occurrence each time.
Note: you may want to improve the scan line by using a regex with boundary matching instead of plain "$", because plain "$" also matches inside words. E.g. exa$mple would be turned into something like exa1mple.
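Both the counting and the boundary issue can be handled in one pass with `String#gsub` and a block: `gsub` calls the block for each match from left to right, so a counter captured by the block numbers them in order. The pattern `(?<!\S)\$(?!\S)` is one possible choice, not the only one: it matches a `$` only when it has whitespace or the string boundary on both sides, which `\b` cannot express because `$` is not a word character. `number_placeholders` is a hypothetical name.

```ruby
# Replace each standalone "$" with a running count in a single pass and
# return [new_string, number_of_replacements]. The lookarounds reject a "$"
# touching any non-whitespace character, so "exa$mple" is left alone.
def number_placeholders(str)
  count = 0
  result = str.gsub(/(?<!\S)\$(?!\S)/) { (count += 1).to_s }
  [result, count]
end

text, total = number_placeholders("foo $ bar $ foo $ bar $ exa$mple")
puts text                      # foo 1 bar 2 foo 3 bar 4 exa$mple
puts "Total changes: #{total}" # Total changes: 4
```

For a file, `File.write(path, number_placeholders(File.read(path)).first)` would apply it.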
Why doesn't your code raise an error?
If you read the description about readlines, you will find:
Reads the entire file specified by name as individual lines, and
returns those lines in an array.
As it reads the entire file at once, there is no point in passing a block to this method; the block is silently ignored. The following example makes this clearer:
contents = f.readlines do |s|
puts "HELLO"
end
# => ["a\n", "b\n", "c\n", "d\n", "asdasd\n", "\n"] #lines of file f
As you can see "HELLO" never gets printed, showing the block code is never executed.
n00b question alert!
Here is the problem:
I am creating a shell script that takes a minimum of 3 arguments: a string, a line number, and at least one file.
I've written a script that will accept EXACTLY 3 arguments, but I don't know how to handle multiple file name arguments.
Here are the relevant parts of my code (skipping the writing back into the file, etc.):
#!/usr/bin/env ruby
the_string = ARGV[0]
line_number = ARGV[1]
the_file = ARGV[2]
def insert_script(str, line_n, file)
  f = file
  s = str
  ln = line_n.to_i
  if (File.file? f)
    read_in(f,ln,s)
  else
    puts "false"
  end
end
def read_in(f,ln,s)
  lines = File.readlines(f)
  lines[ln] = s + "\n"
  return lines
end
# run it
puts insert_script(the_string, line_number, the_file)
Now I know that it's easy to write a block that will iterate through ALL the arguments:
ARGV.each do |a|
puts a
end
but I need to ONLY loop through the args from ARGV[2] (the first file name) to the last file name.
I know there's got to be - at a minimum - at least one easy way to do this, but I just can't see what it is at the moment!
In any case, I'd be more than happy if someone could just point me to a tutorial or an example. I'm sure there are plenty out there, but I can't seem to find them.
Thanks
Would you consider using a helpful gem? Trollop is great for command line parsing because it automatically gives you help messages, long and short command-line switches, etc.
require 'trollop'
opts = Trollop::options do
opt :string, "The string", :type => :string
opt :line, "line number", :type => :int
opt :file, "file(s)", :type => :strings
end
p opts
When I save it as commandline.rb and run it:
$ ruby commandline.rb --string "foo bar" --line 3 --file foo.txt bar.txt
{:string=>"foo bar", :line=>3, :file=>["foo.txt", "bar.txt"], :help=>false, :string_given=>true, :line_given=>true, :file_given=>true}
If you modify the ARGV array to remove the elements you're no longer interested in treating as filenames, you can treat all remaining elements as filenames and iterate over their contents with ARGF.
That's a mouthful; a small example will demonstrate it more easily:
argf.rb:
#!/usr/bin/ruby
str = ARGV.shift
line = ARGV.shift
ARGF.each do |f|
puts f
end
$ ./argf.rb one two argf.rb argf.rb
#!/usr/bin/ruby
str = ARGV.shift
line = ARGV.shift
ARGF.each do |f|
puts f
end
#!/usr/bin/ruby
str = ARGV.shift
line = ARGV.shift
ARGF.each do |f|
puts f
end
$
There are two copies of the argf.rb file printed to the console because I gave the filename argf.rb twice on the command line. It was opened and iterated over once for each mention.
If you want to operate on the files as files, rather than read their contents, you can simply modify the ARGV array and then use the remaining elements directly.
The canonical way is to use shift, like so:
the_string = ARGV.shift
line_number = ARGV.shift
ARGV.each do |file|
  puts insert_script(the_string, line_number, file)
end
Take a look at OptionParser: http://ruby-doc.org/stdlib-1.9.3/libdoc/optparse/rdoc/OptionParser.html. It lets you declare the options your script accepts and whether each option's argument is mandatory or optional, and it raises errors such as OptionParser::MissingArgument or OptionParser::InvalidOption for bad input.
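A minimal `OptionParser` sketch for the question's three inputs. The option names, the banner and the `parse_cli` helper are assumptions; `parse` returns the arguments that are not options, which here become the file list. OptionParser does not enforce that an option is present, so a missing `--string` still has to be checked by hand, but `--line` without a value raises `OptionParser::MissingArgument`.

```ruby
require "optparse"

# Parse --string and --line; everything that is not an option is returned
# as the list of file names. The Integer argument makes OptionParser convert
# (and validate) the --line value.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: insert.rb --string STR --line N FILE..."
    opts.on("-s", "--string STR", "String to insert") { |v| options[:string] = v }
    opts.on("-l", "--line N", Integer, "Line number") { |v| options[:line] = v }
  end
  files = parser.parse(argv) # non-option arguments, in their original order
  [options, files]
end

options, files = parse_cli(["--string", "foo bar", "--line", "3", "foo.txt", "bar.txt"])
p options[:line] # 3
p files          # ["foo.txt", "bar.txt"]
```

In a script this would be called as `options, files = parse_cli(ARGV)`.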
An alternate (and somewhat uglier) trick if you don't want to use another library or change the ARGV array is to use .upto
2.upto(ARGV.length-1) do |i|
puts ARGV[i]
end
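Ruby's splat in multiple assignment is another way to take the first two arguments and collect the rest without modifying `ARGV`. `split_args` is a hypothetical helper so the idea can be tested in isolation; in a script the line would simply be `the_string, line_number, *files = ARGV`, and `ARGV.drop(2)` gives the same file list.

```ruby
# The first two arguments are the string and the line number; the splat
# collects every remaining argument, so one file or many both work.
def split_args(args)
  the_string, line_number, *files = args
  [the_string, line_number.to_i, files]
end

p split_args(["hello", "3", "a.txt", "b.txt"]) # ["hello", 3, ["a.txt", "b.txt"]]
```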
Is it possible to extract a particular line from a file knowing its line number? For example, just get the contents of line N as a string from file "text.txt"?
You could get it by index from readlines.
line = IO.readlines("file.txt")[42] # zero-based index, so this is line 43
Only use this for small files, since readlines loads the entire file into memory.
Try one of these two solutions:
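file = File.open "file.txt"
# Solution 1: reads every line into an array, so it uses a lot of RAM
p [*file][n-1]
file.rewind # only needed if solution 1 already consumed this handle
# Solution 2: reads only the first n lines
n.times{ file.gets }
p $_
file.close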
def get_line_from_file(path, line)
  result = nil
  File.open(path, "r") do |f|
    while line > 0
      line -= 1
      result = f.gets
    end
  end
  return result
end
get_line_from_file("/tmp/foo.txt", 20)
This is a good solution because:
You don't use File.read, so you don't read the entire file into memory. Doing so could become a problem if the file is 20 MB and you read it often enough that the GC can't keep up.
You only read from the file up to the line you want. If your file has 1000 lines, getting line 20 will only read the first 20 lines into Ruby.
You can replace gets with readline if you want to raise an error (EOFError) instead of returning nil when passing an out-of-bounds line.
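`File.foreach` without a block returns an enumerator, which gives another memory-friendly way to fetch one line: number the lines with `with_index(1)` and return as soon as the wanted one appears. `line_at` is a hypothetical name, and line numbers are assumed to start at 1.

```ruby
require "tempfile"

# Return line +n+ (1-based) of the file at +path+, or nil if the file has
# fewer lines. File.foreach reads one line at a time, and the early return
# stops reading as soon as the wanted line has been seen.
def line_at(path, n)
  File.foreach(path).with_index(1) do |line, lineno|
    return line if lineno == n
  end
  nil
end

Tempfile.create("lines") do |f|
  f.write("one\ntwo\nthree\n")
  f.close
  p line_at(f.path, 2) # "two\n"
  p line_at(f.path, 9) # nil
end
```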
File has a nice lineno method.
def get_line(filename, lineno)
  File.open(filename,'r') do |f|
    # also stop at EOF, otherwise a lineno past the end would loop forever
    f.gets until f.lineno == lineno - 1 || f.eof?
    f.gets
  end
end
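linenumber = 5
save = nil # defined outside the block so it is still visible afterwards
File.open("file") do |f|
  f.each_with_index do |line, ind|
    if ind + 1 == linenumber
      save = line
      break # stop reading once the line is found
    end
  end
end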
or
linenumber = 5
f = File.open("file")
while line = f.gets
  if $. == linenumber # $. is the number of the last line read
    print "#{f.lineno} #{line}" # f.lineno is another way to get the line number
    # break # break or exit if needed
  end
end
f.close
If you just want to get the line and do nothing else, you can use this one liner
ruby -ne '(print $_; exit) if $. == 5' file
If you want a one-liner and don't care about memory usage, use this (assuming lines are numbered from 1):
lineN = IO.readlines('text.txt')[n-1]
or
lineN = f.readlines[n-1]
if you already have the file open.
Otherwise it would be better to do like this:
lineN = File.open('text.txt') do |f|
  (n-1).times { f.gets } # skip lines preceding line N
  f.gets # read line N contents
end
These solutions work if you want only one line from a file, or several lines from a file small enough to be read repeatedly. Searching a large file (for example, 10 million lines) for a specific line takes much longer, so it's better to fetch all the lines you need sequentially in a single pass rather than reading the large file multiple times.
Create a large file:
File.open('foo', 'a') { |f| f.write((0..10_000_000).to_a.join("\n")) }
Pick which lines will be read from it and make sure they're sorted:
lines = [9_999_999, 3_333_333, 6_666_666].sort
Print out those lines:
File.open('foo') do |f|
  lines.each_with_index do |line, index|
    # skip the lines between the previously printed line and this one
    (line - (index.zero? ? 0 : lines[index - 1]) - 1).times { f.gets }
    puts f.gets
  end
end
This solution works for any number of lines, does not load the entire file into memory, reads as few lines as possible, and only reads the file one time.
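The approach above needs the line numbers sorted and distinct. A sketch that accepts them in any order, duplicates included, still in a single pass: keep the wanted numbers in a `Set`, record matching lines in a hash, and stop after the highest requested line. `lines_at` is a hypothetical name; numbers are 1-based, so in a file built like `foo` above, line 10 holds `9`.

```ruby
require "set"
require "tempfile"

# Read several lines (1-based numbers, any order, duplicates allowed) in a
# single pass, stopping once the highest requested line has been read.
# Returns a Hash of line number => line; numbers past EOF are simply absent.
def lines_at(path, numbers)
  return {} if numbers.empty?

  wanted = numbers.to_set
  last = numbers.max
  found = {}
  File.foreach(path).with_index(1) do |line, lineno|
    found[lineno] = line if wanted.include?(lineno)
    break if lineno >= last
  end
  found
end

# Small stand-in for the 'foo' file above: line N contains N - 1
Tempfile.create("many") do |f|
  f.write((0..99).to_a.join("\n"))
  f.close
  result = lines_at(f.path, [50, 10, 10, 30])
  p result.keys.sort # [10, 30, 50]
  p result[10]       # "9\n"
end
```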