Opening same file names in multiple directories using Ruby

Looking to do the following using Ruby:
Directory A and Directory B will have the same number of XML files, with the same filenames.
Step 1:
* Go into directory A (directory A has X number of XML files)
* Inside directory A, take the first XML file and save the file name and also open the file
Step 2:
* Go into directory B (directory B will have same number of XML files with the same filenames as directory A)
* Inside directory B, open the same XML filename that was saved and opened in directory A.
Step 3: (I've already completed this part)
* Compare the two files
Step 4:
* Repeat this for ALL XML files in both directories.
I've tried a few things, but for some reason the loop runs for each file instead of once, and the second loop for Dir B does not execute:
id_dir = "#{Dir.pwd}"+"/id_responses"
ht_dir = "#{Dir.pwd}"+"/ht_responses"
Dir.foreach(id_dir) do |id_file|
  next if id_file == '.' or id_file == '..'
  id_file = File.open("#{id_dir}/#{id_file}", 'r')
  doc1 = Nokogiri::XML::Document.parse(File.open(id_file))
  Dir["#{Dir.pwd}"+"/ht_responses/#{id_file}"].each do |ht_file|
    next if id_file == '.' or id_file == '..'
    doc2 = Nokogiri::XML::Document.parse(File.open(ht_file))
  end
end

There is no need to loop through the other directory; just check whether a file with the same name exists there.
require 'nokogiri'

id_dir = File.join(Dir.pwd, 'id_responses')
ht_dir = File.join(Dir.pwd, 'ht_responses')
Dir.foreach(id_dir) do |id_file|
  next if id_file == '.' || id_file == '..'
  id_file_path = File.join(id_dir, id_file)
  ht_file_path = File.join(ht_dir, id_file)
  # Skip files that have no counterpart in the other directory
  next unless File.exist?(ht_file_path)
  # File.read returns the contents and closes the file handle
  doc1 = Nokogiri::XML::Document.parse(File.read(id_file_path))
  doc2 = Nokogiri::XML::Document.parse(File.read(ht_file_path))
  # compare doc1 and doc2 here
end
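
For readers following the Python threads further down, the same pairing idea can be sketched in Python. This is only an illustration under assumptions: the function name paired_xml_files and the "*.xml" pattern are hypothetical, and the comparison step is left to the caller.

```python
from pathlib import Path


def paired_xml_files(dir_a, dir_b):
    """Yield (path_in_a, path_in_b) for each XML file name present in both directories."""
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    for path_a in sorted(dir_a.glob("*.xml")):
        path_b = dir_b / path_a.name  # same file name, other directory
        if path_b.is_file():          # skip names that have no counterpart
            yield path_a, path_b
```

Each yielded pair can then be parsed and compared; only one loop is needed because the second path is derived from the first.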

Related

Open csv from subdirectories with partially unknown name and save all csv in one big file

I have a bunch of files in different subfolders of the root folder. I want to open all the files with the name 'NBack' AND '.csv' extension but not containing the letter 'X'. Then I want to add two columns in each files and merge/concatenate all concerned files into one big file.
So far I have written this code, but for some reason it runs for an eternity and seems to process the same files again and again (though I'm not sure on this point). At the end I don't have a concatenated file, only one single file:
for root, folders, files in os.walk(path):
    for f in files:
        filteredResults = [f for f in files if not "X" in f] #exclude files with the letter 'X'
        for ff in filteredResults:
            dd = [ff for ff in filteredResults if ff.endswith('.csv')] #among remaining files, keep the .csv files
            for g in dd:
                r = [g for g in dd if 'NBack' in g] #among those, keep those containing 'NBack'
                a = pd.DataFrame() #empty dataset for the new big dataset
                for i in r:
                    o = [i for i in r if not '.pdf' in i] #exclude .pdf's (for some reason including only .csv didn't work well enough).
                    appended = [] #necessary to append files before concatenating them????
                    for ii in o: #for the final set of files
                        p = os.path.join(root, ii)
                        data = pd.read_csv(p) #open .csv with specified characteristics in each subdirectory
                        split = ii.split("_") #split file name to get additional information
                        data['Run']=split[3] #add this information as a new column
                        data['IDcheck']=split[0] #add this information as a new column
                        appended.append(data) #necessary to append? creates a list of files
                    a = pd.concat([data]) #should create one big file but the variable a just contains one file
I would be happy for any comment or suggestion what to try.... where is the error...
This code works for me, sharing it if ever someone has a similar question:
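import os
import pandas as pd

os.chdir(r'C:\Users\...')
rootdir = os.getcwd()
frames = []  # one DataFrame per matching file, concatenated once at the end
for root, _, files in os.walk(rootdir):
    for f in files:
        # keep only .csv files whose name contains 'NBack' but no 'X'
        if f.endswith(".csv") and "NBack" in f and "X" not in f:
            path = os.path.join(root, f)
            splitt = f.split('_')
            r = pd.read_csv(path)
            r['Run'] = splitt[2]
            r['IDcheck'] = splitt[0]
            frames.append(r)
df = pd.concat(frames) if frames else pd.DataFrame()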
Thanks Yasir for the help!
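
One pitfall worth spelling out when combining filename conditions: an expression such as `".csv" and "NBack" in path` does not test for ".csv" at all. Python reads it as `".csv" and ("NBack" in path)`, and a non-empty string literal is always truthy. A minimal sketch of an explicit filter follows; the helper name is_wanted and the sample file names are made up for illustration.

```python
def is_wanted(filename):
    """Return True for .csv files whose name contains 'NBack' but no 'X'."""
    return (
        filename.endswith(".csv")   # an actual extension test
        and "NBack" in filename
        and "X" not in filename
    )


# Hypothetical file names, only to show the filter in action
names = ["01_NBack_run1.csv", "01_NBack_run1.pdf", "01_NBackX_run1.csv", "notes.csv"]
print([n for n in names if is_wanted(n)])  # ['01_NBack_run1.csv']
```

Testing the file name rather than the full path also avoids rejecting a file just because one of its parent folders happens to contain an "X".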

PYsimpleGUI create a listbox of folders

I am trying to modify the demo program from PySimpleGUI (Browser_START_HERE_Demo_program_Browser.py) to:
manually select a main folder
list all the subfolders in that folder (but not the files inside them)
make it possible to select a few of those folders, and list them as an output.
I thought I'd do so by editing the code that builds the file list dict, but everything I tried, just makes it
Any ideas? I attached it:
def get_file_list_dict():
    """
    Returns dictionary of files
    Key is short filename
    Value is the full filename and path
    :return: Dictionary of demo files
    :rtype: Dict[str:str]
    """
    demo_path = get_demo_path()
    demo_files_dict = {}
    for dirname, dirnames, filenames in os.walk(demo_path):
        for filename in filenames:
            if filename.endswith('.py') or filename.endswith('.pyw'):
                fname_full = os.path.join(dirname, filename)
                if filename not in demo_files_dict.keys():
                    demo_files_dict[filename] = fname_full
                else:
                    # Allow up to 100 duplicated names. After that, give up
                    for i in range(1, 100):
                        new_filename = f'{filename}_{i}'
                        if new_filename not in demo_files_dict:
                            demo_files_dict[new_filename] = fname_full
                            break
    return demo_files_dict
It would be quite difficult for me to modify the code of Browser_START_HERE_Demo_program_Browser.py to fit your requirements.
Assume the target is:
Select a main directory with a button that calls sg.popup_get_folder
List all subdirectories under the main directory in one sg.Listbox
Show the selected subdirectories in another sg.Listbox as output when the Add button is clicked
Example Code
from pathlib import Path
import PySimpleGUI as sg

font = ("Courier New", 11)
sg.theme("Dark")
sg.set_options(font=font)

subfolders = []
selected = []
frame_subholders = [[sg.Listbox(subfolders, size=(80, 10), key='Subfolders',
    select_mode=sg.LISTBOX_SELECT_MODE_EXTENDED, enable_events=True,
    highlight_background_color='blue', highlight_text_color='white')]]
frame_selected = [[sg.Listbox(selected, size=(80, 10), key='Selected')]]
layout = [
    [sg.Input(readonly=True, expand_x=True, key='Main',
        disabled_readonly_background_color=sg.theme_input_background_color()),
     sg.Button("Main Folder")],
    [sg.Frame("Subfolders", frame_subholders)],
    [sg.Frame("Selected subfolders", frame_selected)],
    [sg.Button('Add')],
]
window = sg.Window('Title', layout, finalize=True)
entry = window['Main'].Widget
input_size = entry.winfo_width()//sg.Text.char_width_in_pixels(font)
print(input_size)

while True:
    event, values = window.read()
    if event == sg.WINDOW_CLOSED:
        break
    elif event == 'Main Folder':
        main_folder = sg.popup_get_folder("", no_window=True)
        if main_folder and Path(main_folder).is_dir():
            main_folder = main_folder.replace("/", '\\') # For Windows
            half = input_size//2
            # Shorten long paths in the middle so they fit the input field
            text = main_folder if len(main_folder) <= input_size else main_folder[:half-3]+"..."+main_folder[-half:]
            window['Main'].update(text)
            # List only the direct subfolders, not the files
            subfolders = sorted([str(f) for f in Path(main_folder).iterdir() if f.is_dir()])
            window['Subfolders'].update(values=subfolders)
            selected = []
            window['Selected'].update(values=selected)
    elif event == 'Add':
        selected = sorted([path for path in values['Subfolders']])
        window['Selected'].update(values=selected)
window.close()

Python - Round Robin file move

I am trying to create a Python script that moves files round-robin into whichever DIR has the fewest files in it, so that the files are distributed equally from the source DIR to the two target DIRs.
For example:
If c:\test contains:
test_1.txt
test_2.txt
test_3.txt
test_4.txt
I want test_1.txt and test_3.txt to be moved to c:\test\dir_a, and test_2.txt and test_4.txt to be moved to c:\test\dir_b.
I have been able to do this successfully in Ruby. However, when I try it in Python, the script moves all the files into the DIR with the fewest files in it instead of distributing them round-robin.
Here is my Ruby example:
require 'fileutils'
def check_file
  watchfolder_1 = 'F:/Transcoder/testing/dir_a/'
  watchfolder_2 = 'F:/Transcoder/testing/dir_b/'
  if !Dir.glob('F:/Transcoder/testing/prep/*.txt').empty?
    Dir['F:/Transcoder/testing/prep/*.txt'].each do |f|
      node_1 = Dir["#{watchfolder_1}"+'*']
      node_2 = Dir["#{watchfolder_2}"+'*']
      nc_1 = node_1.count
      nc_2 = node_2.count
      loadmin =[nc_1,nc_2].min
      #puts loadmin
      if loadmin == nc_1
        FileUtils.mv Dir.glob("#{f}"), watchfolder_1
        puts "#{f} moved to DIR A"
      elsif loadmin == nc_2
        FileUtils.mv Dir.glob("#{f}"), watchfolder_2
        puts "#{f} moved to DIR B"
      end
      puts 'Files successfully moved to staging area.'
    end
  else
    puts 'No valid files found'
  end
end
check_file
This outputs the following:
C:\Ruby22-x64\bin\ruby.exe -e $stdout.sync=true;$stderr.sync=true;load($0=ARGV.shift)
F:/ruby/transcode_engine/test.rb
F:/Transcoder/testing/prep/test_1.txt moved to DIR A
Files successfully moved to staging area.
F:/Transcoder/testing/prep/test_2.txt moved to DIR B
Files successfully moved to staging area.
F:/Transcoder/testing/prep/test_3.txt moved to DIR A
Files successfully moved to staging area.
F:/Transcoder/testing/prep/test_4.txt moved to DIR B
Files successfully moved to staging area.
The files move as I want them to.
Now here is my Python script:
import shutil
from glob import glob
import os.path
dir_a = os.listdir('F:\\Transcoder\\testing\\dir_a\\')
dir_b = os.listdir('F:\\Transcoder\\testing\\dir_b\\')
t_a = 'F:\\Transcoder\\testing\\dir_a\\'
t_b = 'F:\\Transcoder\\testing\\dir_b\\'
if os.listdir('F:\\Transcoder\\testing\\prep\\'):
    prep = glob('F:\\Transcoder\\testing\\prep\\*.txt')
    for file in prep:
        ac = len(dir_a)
        bc = len(dir_b)
        load = [ac, bc]
        if min(load) == ac:
            print('Moving' + file + 'to DIR A')
            shutil.move(file, t_a)
        elif min(load) == bc:
            print('Moving' + file + 'to DIR B')
            shutil.move(file, t_b)
else:
    print('No Files')
This script returns this:
C:\Users\3A01\AppData\Local\Programs\Python\Python35-32\python.exe
F:/Projects/python_transcoder/test_2.py
Moving F:\Transcoder\testing\prep\test_1.txt to DIR A
Moving F:\Transcoder\testing\prep\test_2.txt to DIR A
Moving F:\Transcoder\testing\prep\test_3.txt to DIR A
Moving F:\Transcoder\testing\prep\test_4.txt to DIR A
Where am I going wrong with the Python script, why is it not moving the files in a round robin?
dir_a and dir_b are computed once at the start of your script, so the load is always identical even though you move files inside your loop.
Move this into your for loop:
dir_a = os.listdir(r'F:\Transcoder\testing\dir_a')
dir_b = os.listdir(r'F:\Transcoder\testing\dir_b')
Fix proposal (with some other small fixes as well, such as not repeating paths and using the "raw" prefix (r"the\data") to avoid escaping the backslashes):
import os
import shutil
from glob import glob

t_a = r'F:\Transcoder\testing\dir_a'
t_b = r'F:\Transcoder\testing\dir_b'
prep = glob(r'F:\Transcoder\testing\prep\*.txt')
if prep:
    for file in prep:
        # Re-count on every iteration so each move is reflected in the load
        dir_a = os.listdir(t_a)
        dir_b = os.listdir(t_b)
        ac = len(dir_a)
        bc = len(dir_b)
        load = [ac, bc]
        if min(load) == ac:
            print('Moving ' + file + ' to DIR A')
            shutil.move(file, t_a)
        else:
            print('Moving ' + file + ' to DIR B')
            shutil.move(file, t_b)
else:
    print('No Files')
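
As a variation on the same idea, the directories do not have to be re-listed for every file: the counts can be taken once and then updated in memory after each move. The sketch below is only an illustration under assumptions. The function name distribute, its parameters, and the throwaway demo folders are hypothetical, and it assumes nothing else adds or removes files while it runs.

```python
import shutil
import tempfile
from pathlib import Path


def distribute(source, targets, pattern="*.txt"):
    """Move each matching file in source to the target that currently holds the fewest files."""
    targets = [Path(t) for t in targets]
    # Count each target once, then keep the counts up to date in memory
    counts = {t: sum(1 for _ in t.iterdir()) for t in targets}
    for file in sorted(Path(source).glob(pattern)):
        dest = min(targets, key=lambda t: counts[t])  # ties go to the first target
        shutil.move(str(file), str(dest / file.name))
        counts[dest] += 1
    return counts


# Demo in a temporary directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    prep, dir_a, dir_b = (Path(tmp, name) for name in ("prep", "dir_a", "dir_b"))
    for d in (prep, dir_a, dir_b):
        d.mkdir()
    for i in range(1, 5):
        (prep / f"test_{i}.txt").write_text("")
    distribute(prep, [dir_a, dir_b])
    print(sorted(p.name for p in dir_a.iterdir()))  # ['test_1.txt', 'test_3.txt']
    print(sorted(p.name for p in dir_b.iterdir()))  # ['test_2.txt', 'test_4.txt']
```

Because the counts start from what is already in each target, uneven starting loads are balanced first and the alternation follows once they are level.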

A pythonic way of finding folder

What's the most pythonic way of finding the child folder from a supplied path?
import os
def get_folder(f, h):
    pathList = f.split(os.sep)
    sourceList = h.split(os.sep)
    src = set(sourceList)
    folderList = [x for x in pathList if x not in src]
    return folderList[0]
print(get_folder("C:\\temp\\folder1\\folder2\\file.txt", "C:\\temp")) # "folder1" correct
print(get_folder("C:\\temp\\folder1\\file.txt", "C:\\temp")) # "folder1" correct
print(get_folder("C:\\temp\\file.txt", "C:\\temp")) # "file.txt" fail should be "temp"
In the example above I have a file.txt in "folder 2". The path "C:\temp" is supplied as the start point to look from.
I want to return the child folder from it; in the event that the file in question is in the source folder it should return the source folder.
Try this. I wasn't sure why you said folder1 is correct for the first example, isn't it folder2? I am also on a Mac so os.sep didn't work for me but you can adapt this.
import os
def get_folder(f, h):
    pathList = f.split("\\")
    previous = None
    # Remember the path component that comes right before h
    for index, obj in enumerate(pathList):
        if obj == h:
            if index > 0:
                previous = pathList[index - 1]
    return previous
print(get_folder("C:\\temp\\folder1\\folder2\\file.txt", "file.txt")) # "folder2" correct
print(get_folder("C:\\temp\\folder1\\file.txt", "file.txt")) # "folder1" correct
print(get_folder("C:\\temp\\file.txt", "file.txt")) # "temp" correct
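
If the goal is what the question describes (the first folder below the supplied source, or the source folder itself when the file sits directly in it), a pathlib-based version is one possible sketch. It assumes Windows-style paths, hence PureWindowsPath, which also works on a Mac because it only manipulates strings, and it assumes the file really is located under the source folder.

```python
from pathlib import PureWindowsPath


def get_folder(file_path, source):
    """Return the first folder below source on the way to file_path.

    Falls back to the source folder's own name when the file sits directly in it.
    Raises ValueError if file_path is not located under source.
    """
    src = PureWindowsPath(source)
    rel = PureWindowsPath(file_path).relative_to(src)
    # rel.parts is e.g. ('folder1', 'folder2', 'file.txt'); one part means no subfolder
    return rel.parts[0] if len(rel.parts) > 1 else src.name


print(get_folder("C:\\temp\\folder1\\folder2\\file.txt", "C:\\temp"))  # folder1
print(get_folder("C:\\temp\\folder1\\file.txt", "C:\\temp"))           # folder1
print(get_folder("C:\\temp\\file.txt", "C:\\temp"))                    # temp
```

Working on path components instead of a set of split strings avoids false matches when a folder name occurs more than once in the path.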

How to batch rename images in several folders?

I have 600 images (e.g. images_0, images_1, images_2, ..., images_599) which are saved in 12 folders (e.g. dataset_1, dataset_2, dataset_3, ..., dataset_12).
I am currently using this code to rename images:
mainDirectory = 'C:\Users\Desktop\data';
subDirectory = dir([mainDirectory '/dataset_*']);
for m = 1 : length(subDirectory)
    subFolder = dir(fullfile(mainDirectory, subDirectory(m).name, '*.png'));
    fileNames = {subFolder.name};
    for iFile = 1 : numel( subFolder )
        newName = fullfile(mainDirectory, subDirectory(m).name, sprintf('%00d.png', iFile));
        movefile(fullfile(mainDirectory, subDirectory(m).name, fileNames{iFile}), newName);
    end
end
This code works well but I want to change the format of newName to the following: number-of-dataset_name-of-image (e.g. 1_images_0, 1_images_1, 2_images_0, 2_images_1, etc.). How can I make this change to newName?
You can first split your folder name to get the 1 to 12 number
str = strsplit('dataset_12', '_'); % split along '_'
The folder number will be in str{2}.
Then concatenate this piece of information with
new_name = [str{2} '_' original_image_name]
where original_image_name is the original image name. Alternatively, use sprintf as you already did.
