I have written a bit of code in Python 2.7 to pull a list of files from a given directory and output it to a CSV file. It's fairly simple and for the most part it works great. I am using this code to gather the file names and add them to a list, which I then print once everything is collected. Command-line options determine which extensions to match; that part is working fine.
for root, dirnames, filenames in os.walk(lines):
    if searchtype == 'all':
        for filename in fnmatch.filter(filenames, '*'):
            matches.append(os.path.join(root, filename))
    elif searchtype == 'audio':
        for extensions in audio_ext:
            for filename in fnmatch.filter(filenames, extensions):
                matches.append(os.path.join(root, filename))
    elif searchtype == 'video':
        for extensions in video_ext:
            for filename in fnmatch.filter(filenames, extensions):
                matches.append(os.path.join(root, filename))
The issue occurs when I attempt to get the modified date before printing, using this:
for entries in matches:
    mod_date = datetime.datetime.fromtimestamp(
        os.path.getmtime(entries)).strftime('%Y-%m-%d %H:%M:%S')
This works for some files and then errors out with error code 2:
[Error 2] The system cannot find the file specified: 'E:\\Mp3s\\Artists_A-D\\Beatles, The\\Anthology 1\\60 - Kansas City Hey-Hey-Hey-Hey!.mp3'
The file as printed from the list (matches) is:
E:\Mp3s\Artists_A-D\Beatles, The\Anthology 1\60 - Kansas City Hey-Hey-Hey-Hey!.mp3
Now the file is definitely there, and if I skip the modified-date lookup and just print the file names there are no issues; it rolls through 50k files without a hitch, so I am a bit stumped. At first I thought the ! was messing up the path, but that does not seem to be the case, since the name prints fine without the mod date. I even updated the file's modified date to see if that was it; still no joy. I am still fairly new to Python, so any thoughts?
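One way to narrow this down (a sketch rather than a fix, reusing the matches list from the question) is to wrap the stat call so the loop reports every failing path instead of dying on the first one:

import os
import datetime

bad = []
for entries in matches:
    try:
        mtime = os.path.getmtime(entries)
        mod_date = datetime.datetime.fromtimestamp(mtime).strftime('%Y-%m-%d %H:%M:%S')
    except OSError as err:
        # collect failures instead of aborting; repr() exposes any odd characters
        bad.append(entries)
        print 'could not stat:', repr(entries), '-', err
print len(bad), 'paths failed'

If every failing path turns out to be longer than about 260 characters, the Windows MAX_PATH limit would be a plausible culprit; short paths that still fail would point elsewhere.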
So I'm trying to convert some files with a script I have that uses the wave and os modules.
For some reason the script wouldn't work when given the absolute path of a file, so I decided to do a little testing and wrote this code:
import os
print os.path.exists("C:\Users\mavri\Desktop\Bubnjevi\Bass-Drum-1.wav")
print os.path.exists("C:\Users\mavri\Desktop\proba\testni.wav")
As you can see in the screenshots I provided below, both files exist. I rechecked the paths a couple of times just to be sure, but I get True for the first statement and False for the second one.
Screenshot 1: (proof that the first file exists) http://prntscr.com/ekns86
Screenshot 2: (proof that the second file exists) http://prntscr.com/eknr2n
Sidenote: It returns True for Bass-Drum-1.wav file, while it returns False for testni.wav
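A likely explanation, for what it's worth: in an ordinary Python 2 string literal, \t is an escape sequence for a tab character, so "proba\testni.wav" actually contains a tab where \t appears. The first path only happens to survive because \U, \m, \D and \B are not recognized escapes in byte strings. A quick demonstration (raw strings or doubled backslashes keep the backslashes literal):

import os

# \t in "proba\testni.wav" is parsed as a tab, so this path can never match
broken = "C:\Users\mavri\Desktop\proba\testni.wav"
print repr(broken)  # shows the embedded tab

# either of these preserves the backslashes
print os.path.exists(r"C:\Users\mavri\Desktop\proba\testni.wav")
print os.path.exists("C:\\Users\\mavri\\Desktop\\proba\\testni.wav")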
I have about 150 .xls and .xlsx files that I need to convert to tab-delimited. I tried using Automator, but I was only able to do it one by one. It's definitely faster than opening up each one individually, though. I have very little scripting knowledge, so I would appreciate a way to do this as painlessly as possible.
If you would be prepared to use Python for this, I have written a script that converts Excel spreadsheets to CSV files. The code is available on Pastebin.
You would just need to change the following line:
writer = csv.writer(fileout)
to:
writer = csv.writer(fileout, delimiter="\t")
to make the output file tab-delimited rather than the default comma-delimited.
As it stands, this script prompts you for files one at a time (it lets you select from a dialogue), but it could easily be adapted to pick up all of the Excel files in a given directory tree, or those whose names match a given pattern.
If you give it a try with an individual file first and let me know how you get on, I can help with the changes to automate the rest if you like.
UPDATE
Here is a wrapper script you could use:
#!/usr/bin/python
import os, sys, traceback

sys.path.insert(0, os.getenv('py'))
import excel_to_csv

def main():
    # drop out if no arg for excel dir
    if len(sys.argv) < 2:
        print 'Usage: python xl_csv_wrapper <path_to_excel_files>'
        sys.exit(1)
    else:
        xl_path = sys.argv[1]

    xl_files = os.listdir(xl_path)
    valid_ext = ['.xls', '.xlsx', '.xlsm']

    # loop through files in path
    for f in xl_files:
        f_name, ext = os.path.splitext(f)
        if ext.lower() in valid_ext:
            try:
                print 'arg1:', os.path.join(xl_path, f)
                print 'arg2:', os.path.join(xl_path, f_name + '.csv')
                excel_to_csv.xl_to_csv(os.path.join(xl_path, f),
                                       os.path.join(xl_path, f_name + '.csv'))
            except Exception:
                print '** Failed to convert file:', f, '**'
                exc_type, exc_value, exc_traceback = sys.exc_info()
                lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
                for line in lines:
                    print '!!', line
            else:
                print 'Successfully converted', f, 'to .csv'

if __name__ == '__main__':
    main()
You will need to replace the line:
sys.path.insert(0, os.getenv('py'))
at the top with an absolute path to the excel_to_csv script, or set a corresponding environment variable on your system.
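For reference, a minimal xl_to_csv with the interface the wrapper expects might look something like the sketch below. It uses the xlrd package, which is an assumption on my part; the actual Pastebin script may be built differently.

import csv
import xlrd  # assumed dependency, not confirmed by the original script

def xl_to_csv(xl_file, out_file, delimiter='\t'):
    # write the first worksheet of xl_file to out_file, one row per line
    book = xlrd.open_workbook(xl_file)
    sheet = book.sheet_by_index(0)
    with open(out_file, 'wb') as fileout:  # csv in Python 2 wants binary mode
        writer = csv.writer(fileout, delimiter=delimiter)
        for rowx in range(sheet.nrows):
            # encode cell values so non-ASCII text survives the py2 csv module
            writer.writerow([unicode(v).encode('utf-8') for v in sheet.row_values(rowx)])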
Use VBA in a control workbook to loop through the source workbooks in a specified directory (or a list of workbooks), opening each one, saving out the converted data, then closing it in turn.
I am struggling to read an *.xls file into R.
I did the following:
I set my working directory to the folder containing the *.xls file and then ran:
> library(gdata) # load the gdata package
> mydata = read.xls("comprice.xls", sheet=1, verbose=FALSE)
Error in findPerl(verbose = verbose) : perl executable not found. Use perl= argument to specify the correct path.
Error in file.exists(tfn) : invalid 'file' argument
However, the path is correct and the file is there! What's wrong?
UPDATE
I have installed it already; however, now I get: Error: could not find function "read.xls"...
This error message means that Perl is not installed on your computer, or that it is not on your path.
If Perl is installed, you can pass the perl= argument to the read.xls() function:
read.xls(xlsfile, perl="C:/perl/bin/perl.exe")
As an alternative, you could try the xlsx package:
read.xlsx("comprice.xls", 1) reads your file and makes a reasonable guess at the data.frame column classes, but it is very slow for large data sets.
read.xlsx2("comprice.xls", 1) is faster, but you'll have to define column classes manually. If you run the command twice, you won't need to count the columns yourself:
data <- read.xlsx2("comprice.xls", 1)
data <- read.xlsx2("comprice.xls", 1, colClasses= rep("numeric", ncol(data)))
Perl is either not installed or cannot be found. You can either install it, or specify the path where it is installed using
perl='path of perl installation'
in the call.
I tried to append the current day and time to an existing file name in a shell script, and found my command is not working as expected.
For example, if my file name is f1.log, I need to append the current time to it, and this appended version must be used for further processing of the file.
I tried the following script but am getting an error:
now=$(date +"%m-%d-%Y/%T")
echo hi >> time.log
mv "time.log" "time.$now.log"    # error here: file or directory not found
echo hello >> time.log$now       # have to continue processing with the new file
You cannot have a / character in a filename. The mv command is looking for a directory named with the month, day, and year from the output of date, and trying to create a file in it named with the time. Just change your format so it does not include / in the filename.
The problem is the / in your date +"%m-%d-%Y/%T": the slash is the path separator and cannot appear in a filename.
Change it to a - instead (or something else, as long as it's not / or another character that will make the files difficult to work with in the future), as in the sketch below.
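For instance, a filename-safe variant of the same steps (this spells the time out as %H-%M-%S; %T would embed colons, which are legal on Linux but awkward elsewhere):

now=$(date +"%m-%d-%Y_%H-%M-%S")
echo hi >> time.log
mv "time.log" "time.$now.log"
echo hello >> "time.$now.log"   # continue processing with the new file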
I'm trying to crawl an FTP server and pull down all the files recursively.
Up until now I have been trying to pull down a directory with
ftp.list.each do |entry|
  if entry.split(/\s+/)[0][0, 1] == "d"
    out[:dirs] << entry.split.last unless black_dirs.include? entry.split.last
  else
    out[:files] << entry.split.last unless black_files.include? entry.split.last
  end
end
But it turns out that if you split the listing on whitespace and take the last field, file and directory names containing spaces are fetched wrong.
Need a little help on the logic here.
You can avoid recursion if you list all files at once:
files = ftp.nlst('**/*.*')
Directories are not included in the list, but the full ftp path is still available in each name.
EDIT
I'm assuming that each file name contains a dot and directory names don't. Thanks to @Niklas B. for pointing that out.
There are a huge variety of FTP servers around.
We have clients who use some obscure proprietary, Windows-based servers and the file listing returned by them look completely different from Linux versions.
So what I ended up doing is, for each file/directory entry, trying to change directory into it; if that doesn't work, I consider it a file :)
The following method is "bulletproof":
# Checks if the given file_name is actually a file.
def is_ftp_file?(ftp, file_name)
  ftp.chdir(file_name)
  ftp.chdir('..')
  false
rescue
  true
end
file_names = ftp.nlst.select {|fname| is_ftp_file?(ftp, fname)}
Works like a charm, but please note: if the FTP directory has tons of files in it, this method takes a while to traverse all of them.
You can also use a regular expression. I put one together; please verify whether it works for you, as I don't know if your directory listing looks different. Note that you have to use Ruby 1.9 for the named capture groups.
reg = /^(?<type>.{1})(?<mode>\S+)\s+(?<number>\d+)\s+(?<owner>\S+)\s+(?<group>\S+)\s+(?<size>\d+)\s+(?<mod_time>.{12})\s+(?<path>.+)$/
match = entry.match(reg)
You can then access the elements by name:
match[:type] contains a 'd' if it's a directory and a space if it's a file.
All the other elements are there as well, most importantly match[:path].
Assuming that the FTP server returns Unix-like file listings, the following code works. At least for me.
regex = /^d[rwx-]+\s+\d+\s+\S+\s+\S+\s+\d+\s+\w+\s+\d+\s+[\d:]+\s(.+)/
ftp.ls.each do |line|
  if dir = line.match(regex)
    puts dir[1]
  end
end
dir[1] contains the name of the directory (given that the inspected line actually represents a directory).
As @Alex pointed out, using patterns in filenames for this is hardly reliable: directories CAN have dots in their names (.ssh, for example), and listings can look very different on different servers.
His method works, but as he himself points out, it takes too long.
I prefer using the size method from Net::FTP. It returns the size of a file, and raises an error if the item is a directory.
def item_is_file?(item)
  # host, username and password are assumed to be defined in the surrounding scope
  ftp = Net::FTP.new(host, username, password)
  begin
    ftp.size(item).is_a? Numeric
  rescue Net::FTPPermError
    false
  ensure
    ftp.close
  end
end
I'll add my solution to the mix...
Using ftp.nlst('**/*.*') did not work for me; the server doesn't seem to support that ** syntax.
The chdir trick with a rescue seems expensive and hackish.
Assuming that every file has at least one character, a single period, and then an extension, I did a simple recursion.
def list_all_files(ftp, folder)
  entries = ftp.nlst(folder)
  file_regex = /.+\.{1}.*/
  files = entries.select { |e| e.match(file_regex) }
  subfolders = entries.reject { |e| e.match(file_regex) }
  subfolders.each do |subfolder|
    files += list_all_files(ftp, subfolder)
  end
  files
end
nlst seems to return the full path of whatever it finds, non-recursively, so each time you get a listing, separate the files from the folders, then process every folder you find recursively and collect all the file results.
To call it, you can pass a starting folder:
files = list_all_files(ftp, "my_starting_folder/my_sub_folder")
or start from the root with any of:
files = list_all_files(ftp, ".")
files = list_all_files(ftp, "")
files = list_all_files(ftp, nil)