Ruby csv for each - clean up characters? - ruby

I have the following code, which reads each line of a CSV file and cleans up each row. Every row is a directory path plus a file name (path\file). The script cannot find a path\file when the file name has a dash in it, because Ruby reads that dash as \x96. Does anyone know how to stop it doing that, so the character is read as a plain dash (-)?
This is what I have, but it is not working:
CSV.foreach("#{batch_File_Dir_sdata}") do |ln|
  line_number += 1
  pathline = ln.to_s
  log_linemsg = "Source #{line_number}= #{pathline}"
  log_line = ["#{$cname}","#{log_linemsg}","","",]
  puts log_linemsg
  insert_logitems(connection, table_namelog, log_line)
  if pathline.include?("\\")
    cleanpath = pathline.gsub!("\\\\","\\")
    #cleanpath = cleanpath.gsub!("[","")
    #cleanpath = cleanpath.gsub!("]","")
    cleanpath.gsub!("\"","")
    #THIS IS THE LINE WHERE I AM TRYING TO FIX THE ISSUE
    cleanpath.gsub!("\\x96","\-")
    cleanpath.slice!(0)
    cleanpath.chop!
    #puts "Clean path - has backslash\n#{cleanpath}"
  else
    cleanpath = pathline
    #puts "#{cleanpath}"
    #puts "Clean path - has NO backslash\n#{cleanpath}"
  end
Any help would be greatly appreciated.
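A note on what \x96 probably is: in the Windows-1252 code page, byte 0x96 encodes the en dash (U+2013), so the CSV most likely contains an en dash rather than an ASCII hyphen. Separately, "\\x96" in the gsub! call is the four literal characters backslash, x, 9, 6, not the single byte, so that replacement can never match. The sketch below is in Python purely to illustrate the decoding step; the sample path is made up and the assumption that the file is Windows-1252 is unverified. In Ruby, the corresponding step would likely be an encoding option on the reader, for example CSV.foreach(path, encoding: "Windows-1252:UTF-8").

```python
def decode_cp1252(raw):
    """Decode bytes that are assumed (not known) to be Windows-1252."""
    return raw.decode("cp1252")


def to_ascii_dash(text):
    """Replace en dashes with plain hyphens; only useful if the real file name uses '-'."""
    return text.replace("\u2013", "-")


# Hypothetical row: a path whose file name contains byte 0x96.
raw_row = b"C:\\reports\\2015 \x96 summary.pdf"

decoded = decode_cp1252(raw_row)
print(repr(decoded))
print(repr(to_ascii_dash(decoded)))
```

Whether to keep the en dash or swap it for a hyphen depends on what the file is actually called on disk: if the real file name contains an en dash, decoding correctly and leaving the character alone is what makes the path match.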

Related

Python - Round Robin file move

I am trying to create a Python script that moves files, round-robin, into whichever directory currently holds the fewest files, so that the files from the source directory are distributed equally between the two target directories.
For example:
If c:\test contains:
test_1.txt
test_2.txt
test_3.txt
test_4.txt
I want test_1.txt and test_3.txt to be moved to c:\test\dir_a, and test_2.txt and test_4.txt to be moved to c:\test\dir_b.
I have been able to do this successfully in Ruby. However, when I try the same thing in Python, the script moves all the files into the directory with the fewest files in it instead of distributing them round-robin.
Here is my Ruby example:
require 'fileutils'
def check_file
  watchfolder_1 = 'F:/Transcoder/testing/dir_a/'
  watchfolder_2 = 'F:/Transcoder/testing/dir_b/'
  if !Dir.glob('F:/Transcoder/testing/prep/*.txt').empty?
    Dir['F:/Transcoder/testing/prep/*.txt'].each do |f|
      node_1 = Dir["#{watchfolder_1}"+'*']
      node_2 = Dir["#{watchfolder_2}"+'*']
      nc_1 = node_1.count
      nc_2 = node_2.count
      loadmin =[nc_1,nc_2].min
      #puts loadmin
      if loadmin == nc_1
        FileUtils.mv Dir.glob("#{f}"), watchfolder_1
        puts "#{f} moved to DIR A"
      elsif loadmin == nc_2
        FileUtils.mv Dir.glob("#{f}"), watchfolder_2
        puts "#{f} moved to DIR B"
      end
      puts 'Files successfully moved to staging area.'
    end
  else
    puts 'No valid files found'
  end
end
check_file
This outputs the following:
C:\Ruby22-x64\bin\ruby.exe -e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift)
F:/ruby/transcode_engine/test.rb
F:/Transcoder/testing/prep/test_1.txt moved to DIR A
Files successfully moved to staging area.
F:/Transcoder/testing/prep/test_2.txt moved to DIR B
Files successfully moved to staging area.
F:/Transcoder/testing/prep/test_3.txt moved to DIR A
Files successfully moved to staging area.
F:/Transcoder/testing/prep/test_4.txt moved to DIR B
Files successfully moved to staging area.
The files move as I want them to.
Now here is my Python script:
import shutil
from glob import glob
import os.path
dir_a = os.listdir('F:\\Transcoder\\testing\\dir_a\\')
dir_b = os.listdir('F:\\Transcoder\\testing\\dir_b\\')
t_a = 'F:\\Transcoder\\testing\\dir_a\\'
t_b = 'F:\\Transcoder\\testing\\dir_b\\'
if os.listdir('F:\\Transcoder\\testing\\prep\\'):
    prep = glob('F:\\Transcoder\\testing\\prep\\*.txt')
    for file in prep:
        ac = len(dir_a)
        bc = len(dir_b)
        load = [ac, bc]
        if min(load) == ac:
            print('Moving' + file + 'to DIR A')
            shutil.move(file, t_a)
        elif min(load) == bc:
            print('Moving' + file + 'to DIR B')
            shutil.move(file, t_b)
else:
    print('No Files')
This script returns this:
C:\Users\3A01\AppData\Local\Programs\Python\Python35-32\python.exe
F:/Projects/python_transcoder/test_2.py
Moving F:\Transcoder\testing\prep\test_1.txt to DIR A
Moving F:\Transcoder\testing\prep\test_2.txt to DIR A
Moving F:\Transcoder\testing\prep\test_3.txt to DIR A
Moving F:\Transcoder\testing\prep\test_4.txt to DIR A
Where am I going wrong with the Python script, why is it not moving the files in a round robin?
dir_a and dir_b are computed once, at the start of your script, so the load is always identical even though you move files in your loop.
Move this into your for loop:
dir_a = os.listdir(r'F:\Transcoder\testing\dir_a')
dir_b = os.listdir(r'F:\Transcoder\testing\dir_b')
Fix proposal (with some other small fixes as well, like not repeating paths and using the "raw" prefix (r"the\data") to avoid escaping the backslashes):
import shutil
from glob import glob
import os.path
t_a = r'F:\Transcoder\testing\dir_a'
t_b = r'F:\Transcoder\testing\dir_b'
prep = glob(r'F:\Transcoder\testing\prep\*.txt')
if prep:
    for file in prep:
        # Recount inside the loop so every move is reflected in the load
        dir_a = os.listdir(t_a)
        dir_b = os.listdir(t_b)
        ac = len(dir_a)
        bc = len(dir_b)
        load = [ac, bc]
        if min(load) == ac:
            print('Moving ' + file + ' to DIR A')
            shutil.move(file, t_a)
        else:
            print('Moving ' + file + ' to DIR B')
            shutil.move(file, t_b)
else:
    print('No Files')
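The same idea, recounting the targets on every iteration, can be tried without touching the real F:\ folders. The following is only a sketch for Python 3: distribute and demo are hypothetical names, and the directories are throwaway ones created with tempfile. The files are sorted so that the order of moves is predictable.

```python
import os
import shutil
import tempfile
from glob import glob


def distribute(src_dir, targets, pattern="*.txt"):
    """Move each matching file into whichever target currently holds the fewest entries."""
    moves = []
    for path in sorted(glob(os.path.join(src_dir, pattern))):
        # Recount on every pass; on a tie, min() picks the first target in the list.
        target = min(targets, key=lambda d: len(os.listdir(d)))
        shutil.move(path, target)
        moves.append((os.path.basename(path), target))
    return moves


def demo():
    """Run distribute() on temporary directories and return the two resulting listings."""
    with tempfile.TemporaryDirectory() as root:
        prep = os.path.join(root, "prep")
        dir_a = os.path.join(root, "dir_a")
        dir_b = os.path.join(root, "dir_b")
        for d in (prep, dir_a, dir_b):
            os.mkdir(d)
        for i in range(1, 5):
            with open(os.path.join(prep, "test_%d.txt" % i), "w") as fh:
                fh.write("x")
        distribute(prep, [dir_a, dir_b])
        return sorted(os.listdir(dir_a)), sorted(os.listdir(dir_b))


if __name__ == "__main__":
    print(demo())
```

Note that this is "least loaded" rather than a strict round robin: if one target already holds more files than the other, the emptier one receives files until the counts are level.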

A pythonic way of finding folder

What's the most pythonic way of finding the child folder from a supplied path?
import os
def get_folder(f, h):
    pathList = f.split(os.sep)
    sourceList = h.split(os.sep)
    src = set(sourceList)
    folderList = [x for x in pathList if x not in src]
    return folderList[0]
print get_folder("C:\\temp\\folder1\\folder2\\file.txt", "C:\\temp") # "folder1" correct
print get_folder("C:\\temp\\folder1\\file.txt", "C:\\temp") # "folder1" correct
print get_folder("C:\\temp\\file.txt", "C:\\temp") # "file.txt" fail should be "temp"
In the example above, file.txt is in "folder2", and the path "C:\temp" is supplied as the starting point to look from.
I want to return the child folder directly below that starting point; if the file in question sits directly in the source folder, it should return the source folder itself.
Try this. I wasn't sure why you said folder1 is correct for the first example; isn't it folder2? I am also on a Mac, so os.sep didn't work for me, but you can adapt this.
import os
def get_folder(f, h):
    pathList = f.split("\\")
    previous = None
    for index, obj in enumerate(pathList):
        if obj == h:
            if index > 0:
                previous = pathList[index - 1]
    return previous
print get_folder("C:\\temp\\folder1\\folder2\\file.txt", "file.txt") # "folder2" correct
print get_folder("C:\\temp\\folder1\\file.txt", "file.txt") # "folder1" correct
print get_folder("C:\\temp\\file.txt", "file.txt") # "temp" correct
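For the behaviour the question actually asks for (the first folder below the supplied start path, or the start folder itself when the file sits directly in it), a sketch with pathlib may be simpler. This assumes Python 3; child_folder is a hypothetical name. PureWindowsPath parses backslash paths on any operating system, which avoids the os.sep problem on a Mac.

```python
from pathlib import PureWindowsPath


def child_folder(file_path, base):
    """Return the first folder below `base` on the way to `file_path`,
    or the name of `base` itself if the file sits directly inside it."""
    file_path = PureWindowsPath(file_path)
    base = PureWindowsPath(base)
    # relative_to() raises ValueError if file_path is not located under base.
    parts = file_path.relative_to(base).parts
    # The last part is the file name; anything before it is a folder below base.
    return parts[0] if len(parts) > 1 else base.name


print(child_folder("C:\\temp\\folder1\\folder2\\file.txt", "C:\\temp"))
print(child_folder("C:\\temp\\folder1\\file.txt", "C:\\temp"))
print(child_folder("C:\\temp\\file.txt", "C:\\temp"))
```

Unlike the set-based version in the question, this compares path prefixes rather than individual folder names, so a folder called "temp" deeper in the tree does not confuse it.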

Why did the order of my script give a Divide by Zero error?

I'm working on some beginner Python exercises. I have the following, working code:
# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname)
inp=fh.readlines()
count=0
total=0.0
for line in inp:
    line=line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    value=line[19:]
    value=float(value)
    count=count+1
    total=total + value
print "Average spam confidence:",total/count
When I first wrote this, I put the "count" line before the "value" line like this:
# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname)
inp=fh.readlines()
count=0
total=0.0
for line in inp:
    line=line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
        count=count+1
    value=line[19:]
    value=float(value)
    total=total + value
print "Average spam confidence:",total/count
This resulted in a divide by zero error. Why?
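One plausible explanation, offered as an assumption because the indentation of the original snippets is not preserved here: moving the count line directly under continue put it inside the if block. A statement that follows continue in the same block never runs, so count stays 0 and the final division fails. The Python 3 sketch below reproduces that situation with made-up sample lines; average_ok and average_misindented are hypothetical names.

```python
# Made-up sample input standing in for mbox-short.txt.
SAMPLE_LINES = [
    "X-DSPAM-Confidence: 0.8475\n",
    "From: someone@example.com\n",
    "X-DSPAM-Confidence: 0.6178\n",
]


def average_ok(lines):
    count = 0
    total = 0.0
    for line in lines:
        line = line.rstrip()
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        count = count + 1  # runs once for every matching line
        total = total + float(line[19:])
    return total / count


def average_misindented(lines):
    count = 0
    total = 0.0
    for line in lines:
        line = line.rstrip()
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
            count = count + 1  # unreachable: continue has already ended this iteration
        total = total + float(line[19:])
    return total / count  # count is still 0 here, so this raises ZeroDivisionError


print(average_ok(SAMPLE_LINES))
try:
    average_misindented(SAMPLE_LINES)
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)
```

If the original file mixed tabs and spaces, the same effect can occur in Python 2 even when the line looks correctly aligned in an editor.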

Ruby XML Reading from one XML and parsing into another

XPath.each( xmldoc, "//speech/speaking") do |element|
  # puts element.attributes['name']
  # puts element.text
  File.open(file_name + "_" + element.attributes['name'] + "-" + year + ".xml", 'a+') do |f|
    f.write("<speaker>" + element.attributes['name'] + "</speaker>")
    f.write("<speech>" + doc.xpath('//speech/speaking').text + "</speech>" + "\n")
  end
end
Hello Stack Overflow, I am looking for help solving a logic issue I am having with XML files. The above code creates a file named after each speaker, and it should then place what that speaker says into that file.
The problem that I am running into is that it places ALL of the speakers into the same file. So I am thinking the problem lies here:
f.write("<speech>" + doc.xpath('//speech/speaking').text + "</speech>" + "\n")
I am hoping that someone has a better way of doing this, but the idea would be to change the above code to:
doc.xpath('//speech/speaking').text WHERE speaker == element.attributes['name']
Ultimately I would like to have each speaker in their own XML file with their own speeches.
<speaking name="Mr. FAZIO">I appreciate my friend yielding.</speaking>
The above is a sample from the XML file.
The xpath you are looking for is:
doc.xpath("//speech/speaking[@name='#{element.attributes['name']}']").text
see XPath to select Element by attribute value
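To see the attribute predicate in isolation, here is a small sketch in Python using the standard library's ElementTree, which supports the [@attr='value'] form. The first speaking element is the sample from the question; the record wrapper, the second speaker, and the extra speech are invented for illustration, and speeches_by is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Only the first <speaking> line comes from the question; the rest is made up.
SAMPLE = """<record>
  <speech>
    <speaking name="Mr. FAZIO">I appreciate my friend yielding.</speaking>
    <speaking name="Mr. SMITH">I yield to the gentleman.</speaking>
    <speaking name="Mr. FAZIO">I thank the gentleman.</speaking>
  </speech>
</record>"""


def speeches_by(root, name):
    """Return the text of every <speaking> element whose name attribute equals `name`."""
    # Attribute predicate: [@name='...'] keeps only elements with that attribute value.
    query = ".//speech/speaking[@name='%s']" % name
    return [el.text for el in root.findall(query)]


root = ET.fromstring(SAMPLE)
print(speeches_by(root, "Mr. FAZIO"))
print(speeches_by(root, "Mr. SMITH"))
```

A name that itself contains a single quote would break this simple string interpolation, in the Ruby version as well, so such names need separate handling.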

test if a PDF file is finished in Ruby (on Solaris/Unix)?

I have a server that generates or copies PDF files to a specific folder.
I wrote a Ruby script (my first ever) that regularly checks for my own PDF files and displays them with Acrobat. So simple, so nice.
But now I have a problem: how do I detect that a PDF is complete?
The generated PDFs end with %%EOF\n,
but the copied ones are generated with some Apple magic (Acrobat Writer, I think) that puts an %%EOF near the beginning of the file, then lots of binary zeros, and another %%EOF near the end, followed by a carriage return (or line feed) and a binary zero at the very end.
while true
  dir = readpfad
  Dir.foreach(dir) do |f|
    datei = File.join(dir, f)
    if File.file?(datei)
      if File.stat(datei).owned?
        if datei[-9..-1].upcase == "__PDF.PDF"
          if File.stat(datei).size > 5
            test = File.new(datei)
            dummy = test.readlines
            if dummy[-1][0..4] == "%%EOF"
              #move the file, so it will not be shown again
              cmd = "mv " + datei + " " + movepfad
              system(cmd)
              acro = ACROREAD + " " + File.join(movepfad, f) + "&"
              system(acro)
            else
              puts ">>>" + dummy[-1] + "<<<"
            end
          end
        end
      end
    end
  end
  sleep 1
end
Any help or idea?
Thanks
Peter
All the %%EOF token tells you is that one should appear within the last 1024 bytes of the physical end of the file. The structure of PDF is such that a PDF document may have one or more %%EOF tokens within it (the details are in the spec).
As such, "contains %%EOF" is not equivalent to "completely copied". Really, the correct answer is that the server should signal when it's done, and your code should be a client of that signal. In general, polling, and especially IO-bound polling, is the wrong answer to this problem.
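To make the point concrete, here is a sketch (in Python, for illustration only) of the "last 1024 bytes" check, run against made-up byte strings rather than real PDFs. has_trailing_eof and check_bytes are hypothetical helpers. The last two cases show why the check is not a reliable completeness test: a copy interrupted shortly after an early %%EOF still passes, while one interrupted inside the run of zeros fails.

```python
import os
import tempfile


def has_trailing_eof(path, window=1024):
    """Return True if b'%%EOF' occurs within the last `window` bytes of the file."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        fh.seek(max(0, size - window))
        return b"%%EOF" in fh.read()


def check_bytes(data):
    """Demo helper: write `data` to a temporary file and run the check on it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        return has_trailing_eof(tmp.name)
    finally:
        os.remove(tmp.name)


# Stand-ins shaped like the two kinds of file described in the question.
generated = b"%PDF-1.4\n...body...\n%%EOF\n"
apple_style = b"%PDF-1.4\n%%EOF\n" + b"\x00" * 4096 + b"...\n%%EOF\r\x00"

print(check_bytes(generated))           # complete file
print(check_bytes(apple_style))         # complete file with two %%EOF tokens
print(check_bytes(apple_style[:200]))   # interrupted copy that still contains the early %%EOF
print(check_bytes(apple_style[:3000]))  # interrupted copy ending inside the zeros
```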
