I have about 150 .xls and .xlsx files that I need to convert to tab-delimited text. I tried using Automator, but I was only able to do it one by one. It's definitely faster than opening each one individually, though. I have very little scripting knowledge, so I would appreciate a way to do this as painlessly as possible.
If you would be prepared to use Python for this, I have written a script that converts Excel spreadsheets to CSV files. The code is available on Pastebin.
You would just need to change the following line:
writer = csv.writer(fileout)
to:
writer = csv.writer(fileout, delimiter="\t")
to make the output file tab delimited rather than the standard comma delimited.
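To see what that one-line change does in isolation, here is a minimal sketch. It assumes Python 3 and writes to an in-memory buffer rather than the script's fileout; the helper name rows_to_tsv and the sample rows are made up for illustration.

```python
import csv
import io

def rows_to_tsv(rows):
    """Serialize an iterable of rows as tab-delimited text."""
    buffer = io.StringIO()
    # delimiter="\t" is the only difference from the default comma output.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sample data; a field containing a comma needs no quoting here.
sample = [["name", "qty"], ["widget, large", 3]]
print(rows_to_tsv(sample), end="")
```

With a tab delimiter, commas inside a cell are written as-is, so only cells that themselves contain tabs, quotes or line breaks get quoted.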
As it stands this script prompts you for files one at a time (allows you to select from a dialogue), but it could easily be adapted to pick up all of the Excel files in a given directory tree or where the names match a given pattern.
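As a rough idea of what "pick up all of the Excel files in a given directory tree" could look like, here is a sketch assuming Python 3. The function name find_excel_files and the extension set are my own assumptions, not part of the Pastebin script.

```python
import os

# Assumed set of extensions to treat as Excel workbooks.
EXCEL_EXTENSIONS = {".xls", ".xlsx", ".xlsm"}

def find_excel_files(root):
    """Return the sorted paths of all Excel files under root, recursively."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare the extension case-insensitively (.XLS, .Xlsx, ...).
            if os.path.splitext(name)[1].lower() in EXCEL_EXTENSIONS:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Each returned path could then be handed to the conversion function in place of the file picked in the dialogue.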
If you give this a try with an individual file first and let me know how you get on, I can help with the changes to automate the rest if you like.
UPDATE
Here is a wrapper script you could use:
#!/usr/bin/python
import os, sys, traceback
sys.path.insert(0,os.getenv('py'))
import excel_to_csv
def main():
    # drop out if no arg for excel dir
    if len(sys.argv) < 2:
        print 'Usage: Python xl_csv_wrapper <path_to_excel_files>'
        sys.exit(1)
    else:
        xl_path = sys.argv[1]
    xl_files = os.listdir(xl_path)
    valid_ext = ['.xls', '.xlsx', '.xlsm']
    # loop through files in path
    for f in xl_files:
        f_name, ext = os.path.splitext(f)
        if ext.lower() in valid_ext:
            try:
                print 'arg1:', os.path.join(xl_path,f)
                print 'arg2:', os.path.join(xl_path,f_name+'.csv')
                excel_to_csv.xl_to_csv(os.path.join(xl_path,f),
                                       os.path.join(xl_path,f_name+'.csv'))
            except:
                print '** Failed to convert file:', f, '**'
                exc_type, exc_value, exc_traceback = sys.exc_info()
                lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
                for line in lines:
                    print '!!', line
            else:
                # runs only when the conversion raised no exception
                print 'Successfully converted', f, 'to .csv'
if __name__ == '__main__':
    main()
You will need to replace the line:
sys.path.insert(0,os.getenv('py'))
at the top with the absolute path to the directory containing the excel_to_csv script, or with an environment variable on your system that holds that path.
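If it helps, here is one way that lookup could be made a bit more forgiving. This is only a sketch, assuming Python 3: the helper add_script_dir and its fallback argument are hypothetical, and 'py' is simply the variable name used in the wrapper.

```python
import os
import sys

def add_script_dir(env_var="py", fallback=None):
    """Put the directory holding excel_to_csv.py at the front of sys.path.

    The directory is read from the environment variable env_var; if that
    is unset or empty, the (hypothetical) fallback path is used instead.
    """
    script_dir = os.environ.get(env_var) or fallback
    if not script_dir:
        raise RuntimeError("Set %s or pass a fallback directory" % env_var)
    script_dir = os.path.abspath(script_dir)
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)
    return script_dir
```

Calling add_script_dir() before "import excel_to_csv" would fail with a clear message when the variable is missing, instead of inserting None into sys.path.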
Use VBA in a control workbook to loop through the source workbooks in a specified directory or a list of workbooks, opening each, saving out the converted data, then closing each in turn.
Related
I'm trying to write a file with a Python program. When I perform all the actions from the command line, they work fine and the file is created.
When I perform the actions in a python script, the file does not exist after the script terminates.
I created a small script that demonstrates the behavior.
import os
import os.path
current_dir = os.getcwd()
output_file = os.path.join(current_dir, "locations.js")
print output_file
f = open(output_file, "w")
f.write("var locations = [")
f.write("{lat: 55.978467, lng: 9.863467}")
f.write("]")
f.close()
if os.path.isfile(output_file):
    print output_file + " exists"
exit()
Running the script from the command line, I get these results:
D:\Temp\GeoMap>python test.py
D:\Temp\GeoMap\locations.js
D:\Temp\GeoMap\locations.js exists
D:\Temp\GeoMap>dir locations.js
Volume in drive D is Data
Volume Serial Number is 0EBF-9720
Directory of D:\Temp\GeoMap
File Not Found
D:\Temp\GeoMap>
Hence the file is actually created, but removed when the script terminates.
What do I need to do to keep the file?
Problem was solved by changing firewall settings.
I'm trying to write a program that finds a word in some .rtf files.
For example, I have some text edit files holding the words that start with each letter (WordsWhichStartsWithA.rtf, WordsWhichStartsWithB.rtf, WordsWhichStartsWithC.rtf, etc.). When I give a word (e.g. "help"), I want the program to return the file that contains it. In this example "help" is in WordsWhichStartsWithH.rtf, so the program should return WordsWhichStartsWithH.rtf and NOT WordsWhichStartsWithW.rtf, where "whelp" is. I want to match only the whole word, not strings that contain it.
Here is what I have so far :
import os
for fname in os.listdir('/Users/andreivaran/Desktop/Info'):
    if os.path.isfile(fname):
        #print fname
        with open(fname) as f:
            for line in f:
                if 'help' in line:
                    print fname
                    break
I printed fname before the first if (the one that checks whether the path is a file) to find out whether the code visits all the text files, and I get this:
>>>.DS_Store
WordsWhichStartsWithA.rtf
It's returning that .DS_Store which I can't even see in the folder and it's not checking all the .rtf files, it stops after the first one!
Forgot to mention that the directory where the .rtf files are, is different from the directory where the python file is.
Thank you !
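A likely cause of both symptoms is that os.listdir returns bare file names, so os.path.isfile(fname) and open(fname) look in the directory the script runs from rather than in the Info folder. Below is a minimal sketch of one way around that, assuming Python 3; the function name files_containing_word is made up, and the whole-word test uses a regular expression with \b boundaries. RTF files also contain markup, so this raw-text search may occasionally match formatting codes rather than real words.

```python
import os
import re

def files_containing_word(directory, word):
    """Return names of .rtf files in directory that contain word as a whole word."""
    # \b matches a word boundary, so "help" does not match inside "whelp".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    matches = []
    for name in sorted(os.listdir(directory)):
        # Join with the directory: listdir returns names, not full paths.
        path = os.path.join(directory, name)
        # Skipping non-.rtf entries also skips hidden files such as .DS_Store.
        if not name.lower().endswith(".rtf") or not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            if any(pattern.search(line) for line in handle):
                matches.append(name)
    return matches
```

For the example in the question, the call would be files_containing_word('/Users/andreivaran/Desktop/Info', 'help').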
I am working on writing a ruby script to iterate through a file containing a list of file paths to open in Microsoft Excel. I read the file like this:
file_names = IO.readlines('D:\TEST_1\file_names.txt')
Next, I create an array of file names from each line of the parsed file (thus containing an array of file paths). Finally, I loop through that array with the following code, to open the documents:
require 'win32ole'
xl = WIN32OLE.new('Excel.Application')
xl.Visible = 1
file_names.each do |file_name|
  wb1=xl.Workbooks.Open(file_name)
  ws1=wb1.worksheets(1)
end
That first call to parse file_names.txt produces this exception, which I am having difficulty understanding:
Test4.rb:6:in 'method_missing'
OLE error code:800A03EC in Microsoft Office
Excel 'D:\Test_1\1.xlsx' couldnot be found. Check the spelling of
the file name, and verify that the file location is correct.
if you are trying to open the file from your list most recently used
files, make sure that the file has not been renamed, moved or deleted
HR Error code : 0x80020009 Exception occurred. from Test4.rb:6:in
'block in ' from Test4.rb:5:in 'each' from Test4.rb:5:in
''
This error does not appear when I pass a single file name (instead of a file path) as my parameter - so why do I get it here? Any help would be much appreciated.
At first look you are not using the variable "file_name" but a symbol :file_name.
file_array.each do |file_name|
  wb1=xl.Workbooks.Open(file_name)
  ws1=wb1.worksheets(1)
end
Hi, I'm trying to read a PDF in Ruby. First of all I want to convert it into a .txt file; path is the path to the PDF. The problem is that I get an empty .txt file. Someone told me it is a pdftotext problem, but I don't know how to fix it.
spec = path.sub(/\.pdf$/, '')
`pdftotext #{spec}.pdf`
file = File.new("#{spec}.txt", "w+")
text = []
file.readlines.each do |l|
  if l.length > 0
    text << l
    Rails.logger.info l
  end
end
file.close
What's wrong with my code? Thanks!
It's not possible to extract text from every PDF. Some PDF files use a font encoding that makes it impossible to extract text with simple tools such as pdftotext (and some PDF files are even completely immune to direct text extraction with any tool known to me -- in these cases you'll have to apply OCR first to have a chance to extract text...).
So if you test your code with the same "weird" PDF file all the time, it may well happen that you're getting frustrated over your code while in reality the fault lies with the PDF.
First make sure that the command-line usage of pdftotext works well with a given PDF, then test (and develop further) your code with that PDF.
The problem is that you are opening the file in "w+" mode, which truncates the file to zero length before you read it. You can see a table of file modes and what they mean at http://ruby-doc.org/core-1.9.3/IO.html.
Try something like this. It uses a pdftotext option to send the text to stdout, which avoids creating a temporary file, and uses blocks for more idiomatic Ruby.
text = `pdftotext #{path} -`
text.split("\n").select { |line|
  line.length > 0
}.each { |line|
  Rails.logger.info(line)
}
You would need to open the txt file with write permission.
file = File.new("#{spec}.txt", "w")
You could consult How to create a file in Ruby
Update: your code is incomplete and looks buggy:
It is not clear what path is.
It looks like you are trying to read, with file.readlines.each, the same text file you intend to write to.
Check the spelling of length; you have l.lenght.
You may want to paste the actual code.
Check this gist https://gist.github.com/4160587
As mentioned, your code is not working because you are reading and writing to the same file.
Example
Ruby code file_write.rb to do the file write operation
pdf_file = File.open("in.txt")
output_file = File.open("out.txt", "w") # file to which you want to write
#iterate over input file and write the content to output file
pdf_file.readlines.each do |l|
  output_file.puts(l)
end
output_file.close
pdf_file.close
Sample txt file in.txt
Some text in file
Another line of text
1. Line 1
2. Not really line 2
Once you run file_write.rb you should see a new file called out.txt with the same content as in.txt. You could change the content of the input file if you want. In your case you would use a PDF reader to get the content and write it to the text file, so basically only the first line of the code will change.
I am really new to Ruby and could use some help with a program. I need to open a zip file that contains multiple text files, each with many rows of data, e.g.:
CDI|3|3|20100515000000|20100515153000|2008|XXXXX4791|0.00|0.00
CDI|3|3|20100515000000|20100515153000|2008|XXXXX5648|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX3276|0.00|0.00
CHO|3|3|20100515000000|20100515153000|2114|XXXXX4342|0.00|0.00
MITR|3|3|20100515000000|20100515153000|0000|XXXXX7832|0.00|0.00
HR|3|3|20100515000000|20100515153000|1114|XXXXX0238|0.00|0.00
I first need to extract the zip file, read the text files inside it, and write only the complete rows that start with CDI or CHO to two output files: one for the rows starting with CDI and one for the rows starting with CHO (basically parsing the file). I have to do it in Ruby, and ideally I would like the program to run automatically as further zip files of the same kind keep arriving. I would appreciate any advice, direction or sample code anyone can give.
One way is to use the ZipFile library.
require 'zip/zip'
# To open the zip file and pass each entry to a block
Zip::ZipFile.foreach(path_to_zip) do |text_file|
  # Read from entry, turn String into Array, and pass to block
  text_file.read.split("\n").each do |line|
    if line.start_with?("CDI") || line.start_with?("CHO")
      # Do something
    end
  end
end
I'm not sure if I entirely follow your question. For starters, if you're looking to unzip files using Ruby, check out this question. Once you've got the file unzipped to a readable format, you can try something along these lines to print to the two separate outputs:
cdi_output = File.open("cdiout.txt", "a") # Open an output file for CDI
cho_output = File.open("choout.txt", "a") # Open an output file for CHO
File.open("text.txt", "r") do |f| # Open the input file
  while line = f.gets # Read each line in the input
    cdi_output.puts line if /^CDI/ =~ line # Print if line starts with CDI
    cho_output.puts line if /^CHO/ =~ line # Print if line starts with CHO
  end
end
cdi_output.close # Close cdi_output file
cho_output.close # Close cho_output file