Opening files from file paths on Windows

I am trying to define a function that takes a file path and returns the file's contents as a string.
This is the definition I came up with:
def get_book(file_path):
    '''Takes a file path and returns the entire book as a string.'''
    with open(file_path, 'r', 'utf-8') as infile:
        content = infile.read()
    return content
AnnaKarenina = get_book('../Python/Data/books/AnnaKarenina.txt')
I now get TypeError: an integer is required (got type str)
I also tried using the os.path, different kinds of slashes and other tricks for opening files with windows, but that all returns the error file not found.
Does anyone know what I am doing wrong?

The third positional argument of open() is buffering, which must be an integer, so passing 'utf-8' in that position raises the TypeError. Pass the encoding as a keyword argument instead:
def get_book(file_path):
    '''Takes a file path and returns the entire book as a string.'''
    with open(file_path, 'r', encoding='utf-8') as infile:
        content = infile.read()
    return content
AnnaKarenina = get_book('../Python/Data/books/AnnaKarenina.txt')
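As a minimal sketch of the same fix, the example below builds the path with pathlib, which is one way to avoid worrying about slash direction on Windows. The helper name read_text_file and the throwaway demo file are illustrative assumptions, not part of the original post.

```python
import tempfile
from pathlib import Path


def read_text_file(file_path):
    """Return the whole file as a string, decoding it as UTF-8."""
    # encoding must be passed by keyword: the third positional
    # argument of open() is buffering, which expects an integer.
    with open(file_path, "r", encoding="utf-8") as infile:
        return infile.read()


# Demo with a throwaway file so the example runs on any machine.
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "books" / "demo.txt"  # "/" joins path parts portably
demo_file.parent.mkdir(parents=True)
demo_file.write_text("All happy families are alike.\n", encoding="utf-8")

book = read_text_file(demo_file)
print(len(book))
```

open() accepts a Path object as well as a string, so the same helper works with either.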

Related

Python: Opening auto-generated file

As part of my larger program, I want to create a logfile with the current time & date as part of the title. I can create it as follows:
malwareLog = open(datetime.datetime.now().strftime("%Y%m%d - %H.%M " + pcName + " Malware scan log.txt"), "w+")
Now, my app is going to call a number of other functions, so I'll need to open the file, write some output to it and close the file, several times. It doesn't seem to work if I simply go:
malwareLog.open(malwareLog, "a+")
or similar. So how should I open a dynamically created txt file that I don't know the actual filename for...?
When you create the malwareLog object, it has a name attribute that contains the file name.
Here's an example (my test stands in for your malwareLog):
import random
test = open(str(random.randint(0,999999))+".txt", "w+")
test.write("hello ")
test.close()
test = open(test.name, "a+")
test.write("world!")
test.close()
with open(test.name, "r") as f: print(f.read())
You can also store the file name in a variable before or after creating the file.
### Before
file_name = "123"
malwareLog = open(file_name, "w")
### After
# open() treats a bare integer as a file descriptor, so build a string name
malwareLog = open(str(random.randint(0,999999))+".txt", "w")
file_name = malwareLog.name
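The "store the name first" approach could be sketched as below. The helper names make_log_name and append_line, the "MYPC" value, and the temporary directory are assumptions made for illustration. The file name is built once and each write reopens the file in append mode.

```python
import datetime
import os
import tempfile


def make_log_name(pc_name, now=None):
    """Build the log file name once so every function can reuse it."""
    now = now or datetime.datetime.now()
    # Format only the timestamp; the PC name is appended afterwards so a
    # stray "%" in it cannot be misread as a strftime directive.
    return now.strftime("%Y%m%d - %H.%M ") + pc_name + " Malware scan log.txt"


def append_line(log_path, text):
    """Open the log in append mode, write one line, and close it again."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(text + "\n")


log_dir = tempfile.mkdtemp()  # throwaway directory for the demo
log_path = os.path.join(log_dir, make_log_name("MYPC"))
append_line(log_path, "scan started")
append_line(log_path, "scan finished")
```

Passing log_path to the other functions means none of them needs to know how the name was generated.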

Tempfile.new vs. File.open on Heroku

I'm capturing/creating user entered text into files from my app, attempting to temporarily store them in my Heroku tmp directory, then upload them to a cloud service such as Google Drive.
In using Tempfile I can successfully upload, but when using File.open I get the following error when attempting to upload:
ArgumentError (wrong number of arguments (1 for 0))
The error is on the call:
@client.upload_file_by_folder_id(save_path, @folder_id)
Where @client is a session with the cloud service, save_path is the location of the attached file for upload and @folder_id is the folder they should go into.
When I use Tempfile.new I am successful in doing so:
tempfile = Tempfile.new([final_filename, '.txt'], Rails.root.join('tmp','text-temp'))
tempfile.binmode
tempfile.write msgbody
tempfile.close
save_path = tempfile.path
upload_file = @client.upload_file_by_folder_id(save_path, @folder_id)
tempfile.unlink
File.open code is:
path = 'tmp/text-temp'
filename = "#{final_filename}.txt"
save_path = Rails.root.join(path, filename)
File.open(save_path, 'wb') do |file|
  file.write(msgbody)
  file.close
end
upload_file = @client.upload_file_by_folder_id(save_path, @folder_id)
File.delete(save_path)
Could it be that the File.path is a string, and Tempfile.path is the full path (but not as a string)? When I put out each, they look identical.
I'd like to use File as I don't want to change the filename of the existing attachments I'm uploading, whereas Tempfile appends to the filename.
Any and all assistance is greatly appreciated. Thanks!
To make it work with File, I had to convert save_path to a string, because Rails.root.join returns a Pathname rather than a String:
save_path = Rails.root.join(path, filename).to_s

Python 3.3 - Getting Header and redirect the results

I would like to create a script that reads a file line by line (each line is a URL) and fetches the HTTP headers for each one.
I have one question:
I am trying to redirect the result to a text file, but whatever I try, it does not work.
Can someone help me with my code, please?
import urllib.request
import sys
open('sorti.txt','w')
sorti = open("sorti.txt",'w')
print('Creation de sorti.txt')
text_file = open ("id.txt", "r")
text_file.read().strip('\n')
for lines in text_file:
    urllib.request.urlopen('lines').write.sorti()
    header = urllib.request.parse_http_list(lines).write.sorti()
sys.stdout(sorti)
text_file.close
sorti.close
Supposing you are looking for something like this
URL1
header_1: value
...
header_n: value
URL2
header_1: value
...
header_n: value
change the code as follows:
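text_file = open("id.txt", "r")
for line in text_file:
    url = line.strip()  # drop the trailing newline
    if not url:
        continue
    sorti.write(url + "\n")  # write the current URL
    obj = urllib.request.urlopen(url)
    headers = dict(obj.info())  # get the headers
    for h, v in headers.items():  # write each header in the format above
        sorti.write("{0}: {1}\n".format(h, v))
    sorti.write("\n")
text_file.close()
sorti.close()
The write.sorti() calls in the question cannot work: write is a method of the file object, so it has to be called as sorti.write(...).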
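The same idea can be sketched with context managers so that both files are closed automatically. The function names format_headers and dump_headers are hypothetical; the file names id.txt and sorti.txt come from the question.

```python
import urllib.request


def format_headers(url, headers):
    """Return the URL followed by one 'name: value' line per header."""
    lines = [url]
    for name, value in headers:
        lines.append("{0}: {1}".format(name, value))
    return "\n".join(lines) + "\n\n"  # blank line separates the entries


def dump_headers(url_file, out_file):
    """Fetch each URL listed in url_file and write its headers to out_file."""
    with open(url_file, "r") as urls, open(out_file, "w") as out:
        for line in urls:
            url = line.strip()  # remove the trailing newline
            if not url:
                continue  # skip blank lines
            with urllib.request.urlopen(url) as response:
                out.write(format_headers(url, response.headers.items()))
```

Calling dump_headers("id.txt", "sorti.txt") would then produce the layout shown above, assuming every line of id.txt is a reachable URL.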

'File path' use causing program exit in Python 3

I have downloaded a set of HTML files and recorded the paths I saved them to in a .txt file, one path per line. I want to look at the first file in the list and then iterate through the whole list, opening each file and extracting data before going on to the next one.
My code works fine with a single path put in directly (for the first file) as:
path = r'C:\path\to\file.html'
and works if I iterate through the text file using:
file_list_fp = r'C:\path\to\file_with_pathlist.txt'
with open(file_list_fp, 'r') as file_list:
    for filepath in file_list:
        pathend = filepath.find('\n')
        path = filepath[:pathend]
        q = open(path, 'r').read()
but it fails when I try getting a single path using either:
with open(file_list_fp, 'r') as file_list:
    path_n = file_list.readline()
    end = path_n.find('\n')
    path_bad1 = path_n[:end]
or:
with open(file_list_fp, 'r') as file_list:
    path_bad2 = file_list.readline().split('\n')[0]
With these two my code exits just after that point. I can't figure out why. Any pointers very welcome. (I'm using Python 3.3.1 on windows.)
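The post does not say which error is raised, so the following is only a sketch of one way to inspect what is actually read from the list file, not a diagnosis. The helper name first_path and the demo paths are assumptions. It strips the line ending and prints the repr() of the result so that any stray characters become visible.

```python
import os
import tempfile


def first_path(list_file):
    """Return the first non-empty line of list_file without its line ending."""
    with open(list_file, "r") as file_list:
        for line in file_list:
            path = line.strip()  # removes '\n', '\r' and surrounding spaces
            if path:
                return path
    return None  # the list was empty


# Demo: build a small list file so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
list_file = os.path.join(demo_dir, "file_with_pathlist.txt")
with open(list_file, "w") as f:
    f.write("C:\\path\\to\\file.html\nC:\\path\\to\\other.html\n")

path = first_path(list_file)
print(repr(path))  # repr() makes stray whitespace or odd characters visible
```

If the printed value looks right, wrapping the later open(path) call in try/except and printing the exception would show why the program stops.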

Is there a way to remove the BOM from a UTF-8 encoded file?

Is there a way to remove the BOM from a UTF-8 encoded file?
I know that all of my JSON files are encoded in UTF-8, but the data entry person who edited the JSON files saved it as UTF-8 with the BOM.
When I run my Ruby scripts to parse the JSON, it is failing with an error.
I don't want to manually open 58+ JSON files and convert to UTF-8 without the BOM.
With Ruby >= 1.9.2 you can use the mode r:bom|utf-8.
This should work (I haven't tested it in combination with JSON):
json = nil #define the variable outside the block to keep the data
File.open('file.txt', "r:bom|utf-8"){|file|
  json = JSON.parse(file.read)
}
It doesn't matter whether the BOM is present in the file or not.
Andrew remarked that File#rewind can't be used with the BOM mode.
If you need a rewind function, you must remember the position after opening and replace rewind with pos=:
#Prepare test file
File.open('file.txt', "w:utf-8"){|f|
  f << "\xEF\xBB\xBF" #add BOM
  f << 'some content'
}
#Read file and skip BOM if available
File.open('file.txt', "r:bom|utf-8"){|f|
  pos = f.pos
  p content = f.read #read and write file content
  f.pos = pos #f.rewind goes to pos 0
  p content = f.read #(re)read and write file content
}
So, the solution was to do a search and replace on the BOM via gsub!
I forced the encoding of the string to UTF-8 and also forced the regex pattern to be encoded in UTF-8.
I was able to derive a solution by looking at http://self.d-struct.org/195/howto-remove-byte-order-mark-with-ruby-and-iconv and http://blog.grayproductions.net/articles/ruby_19s_string
def read_json_file(file_name, index)
  content = ''
  file = File.open("#{file_name}\\game.json", "r")
  content = file.read.force_encoding("UTF-8")
  content.gsub!("\xEF\xBB\xBF".force_encoding("UTF-8"), '')
  json = JSON.parse(content)
  print json
end
You can also specify encoding with the File.read and CSV.read methods, but you don't specify the read mode.
File.read(path, :encoding => 'bom|utf-8')
CSV.read(path, :encoding => 'bom|utf-8')
The "bom|UTF-8" encoding works well if you only read the file once, but fails if you ever call File#rewind, as I was doing in my code. To address this, I did the following:
def ignore_bom
  @file.ungetc if @file.pos==0 && @file.getc != "\xEF\xBB\xBF".force_encoding("UTF-8")
end
which seems to work well. Not sure if there are other similar type characters to look out for, but they could easily be built into this method that can be called any time you rewind or open.
Server side cleanup of utf-8 bom bytes that worked for me:
csv_text.gsub!("\xEF\xBB\xBF".force_encoding(Encoding::BINARY), '')
