Python edit file with an insanely long line - python-2.6

I am trying to edit particular HTML files that I download in Python. When I run my code to edit a file, my Python session locks up. I checked the file it's writing to and found two files: the .html file and a .bak file.
The .html file starts out at 0 KB and the .bak file grows steadily up to a point, maybe 12 MB or so; then the .html file grows to a larger size, then the .bak file grows again. This seems to cycle endlessly. The HTML file I am editing is 22 KB. I once watched the output file grow to a gigabyte just to see if it would stop. It doesn't.
Here is the function I am using to edit the file:
def replace(self, search_str, replace_str):
    f = open(self.path,'r+')
    content = f.readlines()
    for i, line in enumerate(content):
        content[i] = line.replace(search_str, replace_str)
    f.writelines(content)
    f.close()
The issue, I imagine, relates to the fact that the HTML file, as downloaded, is mostly a single line of about 21,000 characters. Any ideas?
Edit:
I have also tried another function, but get the same result:
def replace(self, search_str, replace_str):
    assert self.path != None, 'No file path provided.'
    fi = fileinput.FileInput(self.path,inplace=1)
    for line in fi:
        if search_str in line:
            line=line.replace(search_str,replace_str)
        print line
    fi.close()

Try using a generator. That's the way to go if you need to read a large file:
for line in open(self.path,'r+'):
    # do stuff with line

I re-wrote the function to write everything out to a new file and it works.
def replace(self, search_str, replace_str):
    f = open(self.path,'r+')
    new_path = self.path.split('.')[0]+'.TEMP'
    new_f = open(new_path,'w')
    new_lines = [x.replace(search_str, replace_str) for x in f]
    new_f.writelines(new_lines)
    f.close()
    new_f.close()
    os.remove(self.path)
    os.rename(new_path, self.path)

How to combine lines from separate text files into one new text file using Shell Script

I just want to preface this by saying I am an absolute noob at shell scripting, so bear with me here.
Essentially, I am trying to combine lines of text from separate text files (each line from HC_01_MNoFraming.txt appended to the end of the corresponding line from HC_01_Gist.txt) into single lines of text in a newly created text file called HC_01_MGNoFraming.txt. This combination would occur for each of the four lines in each text file. Below is an example:
HC_01_Gist.txt
12.0130754383 33.0026698754 117.01282238700001 182.01823507400002 201.005570202 220.010026843 352.023495725 478.012859369 518.0072172580001
12.012680624100001 56.0118834624 144.026174161 167.018345335 228.002317522 247.00666698400002 356.027611312 434.014129075 478.013307259
56.01133045 142.00709709999998 207.0121417 331.01635039999996 350.0040858 369.0084907 390.01512310000004 409.0028586 430.0265601 512.0175128999999 556.0159299
12.012615221199999 176.01120546400003 199.00347731 243.019245908 430.00998814900004 472.02324678400004 495.01541966900004
HC_01_MNoFraming.txt
262.006565969 283.013202477 375.015624781 541.016055549
33.019710919299996 497.017719727 537.0287543219999
121.0167582 226.0165682
75.01583263170001 113.024949998 262.023691934 386.010783941 537.011983108
HC_01_MGNoFraming.txt
12.0130754383 33.0026698754 117.01282238700001 182.01823507400002 201.005570202 220.010026843 352.023495725 478.012859369 518.0072172580001 262.006565969 283.013202477 375.015624781 541.016055549
12.012680624100001 56.0118834624 144.026174161 167.018345335 228.002317522 247.00666698400002 356.027611312 434.014129075 478.013307259 33.019710919299996 497.017719727 537.0287543219999
56.01133045 142.00709709999998 207.0121417 331.01635039999996 350.0040858 369.0084907 390.01512310000004 409.0028586 430.0265601 512.0175128999999 556.0159299 121.0167582 226.0165682
12.012615221199999 176.01120546400003 199.00347731 243.019245908 430.00998814900004 472.02324678400004 495.01541966900004 75.01583263170001 113.024949998 262.023691934 386.010783941 537.011983108
I tried several things but the simplest solution I could find online was just using the paste command like this:
#(A) First Timing File = Timing File you Want to Be in Front
A=HC_01_Gist.txt
#(B) Second Timing File = Timing File you Want to Be in Back
B=HC_01_MNoFraming.txt
#(C) Output Timing File Name
C=HC_01_MGNoFraming.txt
#------------------------------------------------------------
paste $B $A
paste $B $A >$C
Looking at the terminal output, this is what this code gives:
262.006512.0130754383 33.0026698754 117.01282238700001 182.01823507400002 201.005570202 220.010026843 352.023495725 478.012859369 518.0072172580001
33.0197112.012680624100001 56.0118834624 144.026174161 167.018345335 228.002317522 247.00666698400002 356.027611312 434.014129075 478.013307259
121.016756.01133045 142.00709709999998 207.0121417 331.01635039999996 350.0040858 369.0084907 390.01512310000004 409.0028586 430.0265601 512.0175128999999 556.0159299
75.0158312.012615221199999 176.01120546400003 199.00347731 243.019245908 430.00998814900004 472.02324678400004 495.01541966900004
In the terminal it almost looks as if the paste command starts working, but after the first data point of HC_01_MNoFraming.txt the text gets jumbled and the output resumes with the second data point of HC_01_Gist.txt.
Looking at the HC_01_MGNoFraming.txt file that the code creates:
262.006565969 283.013202477 375.015624781 541.016055549
12.0130754383 33.0026698754 117.01282238700001 182.01823507400002 201.005570202 220.010026843 352.023495725 478.012859369 518.0072172580001
33.019710919299996 497.017719727 537.0287543219999
12.012680624100001 56.0118834624 144.026174161 167.018345335 228.002317522 247.00666698400002 356.027611312 434.014129075 478.013307259
121.0167582 226.0165682
56.01133045 142.00709709999998 207.0121417 331.01635039999996 350.0040858 369.0084907 390.01512310000004 409.0028586 430.0265601 512.0175128999999 556.0159299
75.01583263170001 113.024949998 262.023691934 386.010783941 537.011983108
12.012615221199999 176.01120546400003 199.00347731 243.019245908 430.00998814900004 472.02324678400004 495.01541966900004
I'm wondering about two things:
1) How can I fix the output .txt file so that the data points are in-line?
2) How can I ensure that the data points from HC_01_Gist.txt go before the data points from HC_01_MNoFraming.txt?
Thanks. I'd be more than happy to clarify my problem and answer any questions you have.

How to sequentially create multiple CSV files in Ruby?

Silly question, but I want to do some processing on a dataset and put the results into different CSVs, like UDID1.csv, UDID2.csv, ..., UDID1000.csv. So this is my code:
for i in 1..1000
  logfile = File.new('C:\Users\hp1\Desktop\Datasets\New File\UDID#{i}\.csv',"a")
  #I'll do some processing here
end
But the program throws an error when running because of the UDID#{i} part. How can I get around this? Thanks.
Edit: This is the error:
in `initialize': No such file or directory @ rb_sysopen - C:\Users\hp1\Desktop\Datasets\New File\udid#{1}\.csv (Errno::ENOENT)
from C:/Ruby21/bin/hashedUDID.rb:38:in `new'
from C:/Ruby21/bin/hashedUDID.rb:38:in `<main>'
The single quotes (') are one problem; another problem is the path.
In your code as posted, New File must exist as a directory, and inside it there must be further directories such as UDID1, each of which would get a file named .csv.
A corrected version (I avoid the un-Ruby-like for loop):
1.upto(1000) do |i|
  logfile = File.new("C:\\Users\\hp1\\Desktop\\Datasets\\UDID#{i}.csv", "a")
  #I'll do some processing here
  logfile.close #Don't forget to close the file
end
Inside double quotes the backslash must be escaped (\\). Alternatively, you can use /:
logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID#{i}.csv", "a")
Another possibility is to use %i to insert the number:
logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID%02i.csv" % i, "a")
I prefer to use File.open with a block; the file is then closed at the end of the block:
File.open("C:/Users/hp1/Desktop/Datasets/New File/UDID%04i.csv" % i, "a") do |logfile|
  #I'll do some processing here
end #closes the file
Warning:
I'm not sure whether you really want to create 1000 log files (the file is opened inside the loop, so each iteration creates a file).
If so, the %04i version has the advantage that all file names get the same number of digits (starting with 0001 and ending with 1000).
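As a rough sketch of the zero-padded naming described above, the following creates three files instead of 1000. The temporary directory from Dir.mktmpdir and the placeholder header line are assumptions made so the example runs anywhere; the real base directory from the question would take its place.

```ruby
require 'tmpdir'

created = []

# A temporary directory stands in for the real Datasets folder (assumption).
Dir.mktmpdir do |dir|
  1.upto(3) do |i|
    # "%04i" pads the number with leading zeros to four digits.
    path = File.join(dir, "UDID%04i.csv" % i)
    File.open(path, "a") do |logfile|
      logfile.puts "id,value"   # placeholder content, not from the question
    end                         # the block form closes the file here
    created << File.basename(path)
  end
end

p created
```

Because every name has the same width, the files also sort in numeric order in a directory listing.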
(1..10).each { |i| logfile = File.new("/base/path/UDID#{i}.csv", "a") }
You must use double quotes (") when you need string interpolation.
#{} can only be used in strings with double quotes ("). Inside double quotes the backslashes must also be doubled or replaced with /. So change your code to:
for i in 1..1000
  logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID#{i}.csv", "a")
  # other stuff
end
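The difference between the two quoting styles can be checked in isolation. This is a small illustration only; the variable names are made up for the example.

```ruby
i = 7

single = 'UDID#{i}.csv'   # single quotes: the text #{i} stays literal
double = "UDID#{i}.csv"   # double quotes: #{i} is replaced by the value of i

puts single
puts double

# Backslashes differ too: single quotes keep a lone backslash as typed,
# while double quotes need it written twice to get the same string.
same_path = ('C:\Users' == "C:\\Users")
puts same_path
```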

How to write a file in specific path in ruby

I want to save my files in a specific path.
I have tried this:
file_name = gets
F = open.(Dir.pwd, /data/folder /#{@file_name },w+)
I'm not sure whether the above line is correct. Dir.pwd gives the directory path, followed by my folder path and the given file name.
It should store the value at the specified path under the given file name. Can anyone tell me how to do that?
Your code has multiple errors. Have you ever tried to execute the script?
Your script ends with:
test.rb:7: unknown regexp options - fldr
test.rb:7: syntax error, unexpected end-of-input
F = open.(Dir.pwd, /data/folder /#{@file_name },w+)
First: you need to define the strings with ' or ":
file_name = gets
F = open.(Dir.pwd, "/data/folder/#{@file_name}","w+")
Some other errors:
You use file_name and later @file_name.
The open method belongs to File and needs two parameters.
The file is defined as a constant F. I would use a variable.
The path must be concatenated. I'd use File.join for it.
You don't close the file.
After all these changes you get:
file_name = gets
f = File.open(File.join(Dir.pwd, "/data/folder/#{file_name}"),"w+")
##
f.close
and the error:
test.rb:29:in `initialize': No such file or directory @ rb_sysopen - C:/Temp/data/folder/sdssd (Errno::ENOENT)
The folder must exist, so you must create it first.
Now the script looks like:
require 'fileutils'
dirname = "data/folder"
file_name = gets.strip
FileUtils.mkdir_p(dirname) # creates missing directories; no error if they already exist
f = File.open(File.join(Dir.pwd, dirname, file_name),"w+")
##fill the content
f.close
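The same steps can be wrapped in a small helper and tried without touching the working directory. This is only a sketch: write_in_dir is a hypothetical name, and the temporary base directory and the sample file name are assumptions made for the example.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes content to base/subdir/file_name,
# creating the directory chain first.
def write_in_dir(base, subdir, file_name, content)
  dir = File.join(base, subdir)
  FileUtils.mkdir_p(dir)                  # no error if it already exists
  path = File.join(dir, file_name.strip)  # strip the newline that gets leaves behind
  File.open(path, "w+") { |f| f.write(content) }  # block form closes the file
  path
end

result = nil
Dir.mktmpdir do |base|
  # "notes.txt\n" imitates what gets would return (assumed sample input).
  path = write_in_dir(base, "data/folder", "notes.txt\n", "hello")
  result = File.read(path)
end

p result
```

Using the block form of File.open means the explicit f.close is no longer needed.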

Ruby - CSV works while SmarterCSV doesn't

I want to open a csv file using SmarterCSV.process
market_csv = SmarterCSV.process(market)
p "just read #{market_csv}"
The problem is that the data is not read and this prints:
[]
However, if I attempt the same thing with the default CSV library, the content of the file is read (the following code prints each row of the file).
CSV.foreach(market) do |row|
p row
end
The content of the file I was reading is of the form:
Date,Close
03/06/15,0.1634
02/06/15,0.1637
01/06/15,0.1638
31/05/15,0.1638
The problem could come from the line separator: line endings differ between Windows ("\r\n"), Unix ("\n"), and files saved in the old Mac style ("\r"). Try to identify the separator and specify it in SmarterCSV.process like this:
market_csv = SmarterCSV.process(market, row_sep: "\r")
p "just read #{market_csv}"
or like this:
market_csv = SmarterCSV.process(market, row_sep: :auto)
p "just read #{market_csv}"
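To identify the separator before choosing a row_sep value, one option is to look at the raw bytes of the file. The helper below is a hypothetical sketch using only core Ruby, not part of SmarterCSV; the sample strings imitate the file from the question with different line endings.

```ruby
# Hypothetical helper: guess the row separator from a sample of raw text.
def detect_row_sep(sample)
  return "\r\n" if sample.include?("\r\n")  # Windows
  return "\r"   if sample.include?("\r")    # lone carriage returns
  "\n"                                      # Unix, also the fallback
end

windows = detect_row_sep("Date,Close\r\n03/06/15,0.1634\r\n")
old_mac = detect_row_sep("Date,Close\r03/06/15,0.1634\r")
unix    = detect_row_sep("Date,Close\n03/06/15,0.1634\n")

p windows, old_mac, unix
```

For a real file, the sample could come from File.open(market, "rb") { |f| f.read(4096) }; binary mode matters here because text mode on Windows may translate the line endings before they can be inspected.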

No such file or directory - ruby

I am trying to read the contents of the file from a local disk as follows :
content = File.read("C:\abc.rb","r")
When I execute the .rb file I get an exception: Error: No such file or directory. What am I missing?
In a double-quoted string, "\a" is the non-printable BEL (bell) character, similar to how "\n" is a newline. (I think these originate from C.)
You don't have a file named "C:<BEL>bc.rb", which is why you get the error.
To fix it, use single quotes, where these escape sequences are not processed:
content = File.read('C:\abc.rb')
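The effect of the escape can be seen by inspecting the two strings directly, without touching the file system. The variable names are made up for this illustration.

```ruby
double = "C:\abc.rb"   # "\a" is turned into the BEL control character (byte 7)
single = 'C:\abc.rb'   # the backslash and the letter a are kept as typed

has_bel        = double.bytes.include?(7)
keeps_backslash = single.include?("\\a")

p has_bel
p keeps_backslash
p double.length   # one character shorter, since "\a" collapsed into one
p single.length
```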
content = File.read("C:/abc.rb")
First of all, try using:
Dir.glob("*")
to see what's in the directory (and therefore what directory it's looking at).
a = "" # a must exist before the block appends to it
open("C:/abc.rb", "rb") { |io| a = a + io.read }
EDIT: Unless you're concatenating files together, you could write it as:
data = File.open("C:/abc.rb", "rb") { |io| io.read }
