Speed of directory look-up vs. formatted filename look-up - Windows

I've had this discussion with my co-worker multiple times now, and I'm 99.9% sure that I'm correct, but they have been insisting that they are correct and I'm starting to wonder if I'm the crazy one.
We are uploading images taken by users on their mobile devices; cumulatively they could upload thousands given enough time. Each of these photos belongs to a "work order", which is identified by a sequential integer. We want to optimize for retrieval (based on the work order) rather than writing. We are also on a Windows machine.
My proposed storage method looks like this:
Images
|-- 23875
| |-- f0347b8.png
| |-- b04675b.png
|-- 28765
| |-- aab658c.png
Their proposed storage method looks like this:
Images
|-- 23875_f0347b8.png
|-- 23875_b04675b.png
|-- 28765_aab658c.png
For me, in order to gather the 2 images for work order 23875, I would look in the directory Images/23875 and grab all the .png files.
For them to do the same thing, they would iterate through all the files and run a wildcard filter on all the filenames, something to the effect of 23875_*.png.
I believe my method to be superior because, in the case where there are, say, thousands of images, it doesn't need to run a wildcard filter on potentially thousands of irrelevant files. I've asked why they believe their method to be superior, but I haven't gotten a compelling answer.
Any advice is appreciated.

This method
Images
|-- 23875_f0347b8.png
|-- 23875_b04675b.png
|-- 28765_aab658c.png
requires iterating through every single file in Images to find all files that match 23875_*. Every single time you want to find them. Over and over. Until the world ends and the stars go dark.
Putting all the files in one directory discards information you have when you create the file, thus making the file harder to locate in the future. Trying to encode that information in the file name means the data is mixed in with all other similar data and therefore needs to be filtered out in the future.
Why? You're right - it makes no sense. It's tossing information in the garbage for no good reason.
Your method
Images
|-- 23875
| |-- f0347b8.png
| |-- b04675b.png
|-- 28765
| |-- aab658c.png
has already partitioned the files into the required associations. No filtering or searching is needed to find the files.
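To make the difference concrete, here is a minimal Python sketch of the two lookups (the Images root and the work-order number are just placeholders):

from pathlib import Path

IMAGES = Path("Images")  # placeholder root; adjust to the real storage location

def photos_per_directory(work_order: int) -> list[Path]:
    # Directory-per-work-order layout: only that work order's folder is read.
    return list((IMAGES / str(work_order)).glob("*.png"))

def photos_flat_layout(work_order: int) -> list[Path]:
    # Flat layout: every entry in Images must be enumerated and filtered.
    return list(IMAGES.glob(f"{work_order}_*.png"))

print(photos_per_directory(23875))
print(photos_flat_layout(23875))

The first call touches only the relevant folder; the second has to enumerate the entire Images directory every time, which is exactly the repeated filtering described above.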
"they have been insisting that they are correct"
Oooh-kay. Maybe they like this sort of wrestling...

Related

Find and copy multiple pictures using PowerShell

I've got a list in Excel with picture names I have to find. Is there any way to load the list into PowerShell, find the pictures, and copy them out into one folder?
The list is about 1,310 pictures, and there is a total of 44k pictures spread across a huge number of folders, maybe 500k.
[Screenshot: the folder structure created by the image software]
[Screenshot: exact number of files and folders; the last 14k pictures are in another main folder and not relevant for the list]
Your question is very broad, and I can only give a very general answer. This is clearly scriptable, but it might take a lot of learning and a lot of effort.
First, you might want to consider what the relationship is between the way pictures are named in the Excel sheet and the way the picture files are named in the folders.
If they follow the same naming rules, that gets one big problem out of the way.
Next, you need to learn how to export an Excel table to a CSV file.
Then you need to learn how to use Import-Csv and feed the stream into a pipeline.
Then you need to feed the output of the pipeline into a foreach loop that contains a Copy-Item cmdlet.
If there is a single master folder that contains all the other folders that contain pictures, then you are in luck. Learn the -Path, -Recurse, and -Include parameters.
Perhaps someone who has already dealt with the same problem can provide you with code. But it may not do what you really want.
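Purely as an illustration of those same steps (not the PowerShell pipeline itself), here is a rough Python sketch; the CSV file name, the "Name" column, and the folder paths are all assumptions to be adjusted:

import csv
import shutil
from pathlib import Path

CSV_FILE = Path("picture_list.csv")   # exported from the Excel sheet (assumed name)
SEARCH_ROOT = Path(r"D:\Pictures")    # the single master folder (assumed)
DEST = Path(r"D:\Collected")          # destination folder for the copies (assumed)
DEST.mkdir(exist_ok=True)

# Read the wanted picture names from the CSV (assumes a column called "Name").
with CSV_FILE.open(newline="") as f:
    wanted = {row["Name"].strip().lower() for row in csv.DictReader(f)}

# Walk the whole tree once and copy every file whose name is on the list.
for path in SEARCH_ROOT.rglob("*"):
    if path.is_file() and path.name.lower() in wanted:
        shutil.copy2(path, DEST / path.name)

With roughly 500k folders a single pass will take a while, but it only enumerates the tree once rather than searching per picture.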

Bulk Uploading to MediaWiki (Hierarchical Structure)

I have lots of Markdown files, each contained in a folder with the same name as the Markdown file. I use Pandoc to generate the MediaWiki file in the Rendered folder.
For example:
ComputerScience
|-- ComputerScience.md
|-- Rendered
| |-- ComputerScience.wiki
|-- Image
| |-- Computer.png
|-- Resource
| |-- Algorithms.pdf
Every Markdown file has its own folder, which contains other folders like Image and Resource that are linked in the Markdown file. To explain my structure: let me call the above structure the ComputerScience Container. Each Markdown file has such a container. These containers are classified in a hierarchical way: several such containers can exist in a folder (which I'll call a SuperFolder), and a SuperFolder can contain another SuperFolder. For example (the Markdown folders are marked as Container):
Computer Science
|-- Computer Science Container
|-- Algorithms
| |-- Algorithms Container
| |-- DataStructure Container
|-- Architecture Container
In the above, the Computer Science SuperFolder consists of Containers as well as another SuperFolder called Algorithms.
How can I upload this kind of hierarchical structure into a local MediaWiki?
Also, I would like to keep editing the Markdown files and regenerating the MediaWiki files, ideally updating the wiki with a script.
Any suggestions on how I should approach this?
If you want to learn a tool that's flexible and can be reused in the future, I'd look at pywikibot. If you just want a quick one-off that can be used from bash, and the wiki is local, use edit.php.
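As a rough sketch of the pywikibot route (the folder layout, page-title scheme, and edit summary are assumptions, and pywikibot needs a configured user-config.py pointing at the local wiki):

from pathlib import Path
import pywikibot

site = pywikibot.Site()  # the local wiki configured in user-config.py

ROOT = Path("Computer Science")  # top-level SuperFolder (assumed path)

# Upload every rendered .wiki file, deriving the page title from the folder
# hierarchy, e.g. "Computer Science/Algorithms/Algorithms Container".
for wiki_file in ROOT.rglob("Rendered/*.wiki"):
    container = wiki_file.parent.parent  # the Container folder two levels up
    title = "/".join(container.parts)
    page = pywikibot.Page(site, title)
    page.text = wiki_file.read_text(encoding="utf-8")
    page.save(summary="Bulk upload from Pandoc-rendered Markdown")

Re-running the script after editing the Markdown and regenerating the .wiki files would simply overwrite the pages with the new content.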

Testing File/Folder Navigation and Manipulation

I am working on a module that supplies methods for navigating directories and manipulating files. Basically it will be a combination of the Dir and File classes, with options specific to the needs of a project I'm working on.
Right now I have started writing tests for some of these methods and things are getting messy.
Example
One of the methods I have is a tree function that returns a hash of files and folders, where you can pass options like tree(only: 'folders', limit: 3). In order to test that it only goes down 3 levels, I would have to have 4+ levels of subfolders with dummy files in them.
The Problem
Right now I'm testing on folders outside the project since the subfolders are already there, but I want to move away from this, especially considering the implausibility of testing on system files once I start testing methods equivalent to rm -rf (as well as the lack of portability).
I'm starting to think that I need to create a "lab rat" type folder that I do all my "experiments" on, but I have no clue how to approach creating it.
Do I create a function that creates the files?
Do I pull files and folders from another location?
Do I use some sort of "lorem ipsum" generator for file structures?
Do I make all these files and folders manually (ugh)?
Do I just mock and stub the hell out of everything and not actually create/delete the files and folders? (I don't see this happening)
So...
How would someone normally approach testing excessive amounts of file and folder manipulation?
I don't think you want to use mocks/stubs. The file system of your OS should be well tested and fast, so the benefit of mocks/stubs is minimal. Creating a mock/stub system increases the complexity without much benefit.
Here are my answers:
Do I create a function that creates the files?
Yes. You can create tests for these functions to make sure that they are correct. Instead of calling Dir and File, write helper functions that make the code simple and readable. Maybe you can share the helper functions between the source/test code...
Do I pull files and folders from another location?
Not sure what this is for...
Do I use some sort of "lorem ipsum" generator for file structures?
Yes, if you mean create functions that generate file structures.
Do I make all these files and folders manually (ugh)?
No.
Do I just mock and stub the hell out of everything and not actually create/delete the files and folders? (I don't see this happening)
No. One benefit of creating real files/directories is that you can manually check what is going on and not be 100% dependent on the tests. This is actually a good approach, because without it there could be a bug where both the source code and the test code are not doing what you expect, but you wouldn't know, because everything seems to be working.
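The question is about Ruby, but the "lab rat" idea is easy to show in a small Python sketch: a helper builds a throwaway directory tree from a declarative spec, the tests run against it, and it is deleted afterwards (the structure below is an arbitrary example, deep enough to exercise a depth limit of 3):

import shutil
import tempfile
from pathlib import Path

def make_tree(spec: dict, root: Path) -> None:
    # Build a directory tree from a nested dict: dicts become folders,
    # string values become file contents.
    for name, contents in spec.items():
        path = root / name
        if isinstance(contents, dict):
            path.mkdir()
            make_tree(contents, path)
        else:
            path.write_text(contents)

SPEC = {
    "level1": {
        "a.txt": "",
        "level2": {
            "b.txt": "",
            "level3": {
                "c.txt": "",
                "level4": {"deep.txt": "too deep"},
            },
        },
    }
}

root = Path(tempfile.mkdtemp())  # a fresh "lab rat" folder per test run
make_tree(SPEC, root)
# ... exercise the tree()/deletion methods under test against `root` here ...
shutil.rmtree(root)              # clean up afterwards

Because the fixture is generated, the tests stay portable and never touch real system files.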

Operating system independent image addressing

Because I use both Windows and Ubuntu on my computer, I'd like to be able to create documents that build on either system. I have one directory for logos and I want to use them in any document, everywhere.
I worked around the different file paths with these commands:
\newcommand{\winlogo}{D:/logo/}
\newcommand{\linlogo}{/media/DATA/logo/}
\includegraphics{\winlogo logo_bw}
How can I get something like this behaviour:
if(parameter==windows){address:=D:/logo/}
elseif(parameter==linux){address:=/media/DATA/logo}
else{error}
I've run into this problem as well, and I found that hard-coding the paths is an absolutely terrible idea. Also, keeping these directories in sync will eventually be a problem once your projects begin to grow.
The way I solved this was to put everything in version control (I like git, your mileage may vary).
Then I created an images folder, so my folder hierarchy looks like this:
Working-Dir
|-- images/
|-- myfile.tex
|-- nextfile.tex
Then in the preamble of my documents I use \usepackage{graphicx} and \graphicspath{{images/}}, which tells LaTeX to look for a folder called images and then look for the graphics inside that folder.
Then I do my work on one computer, push my finished work back to the repo, and when I switch computers I just pull from the repo. This way, everything stays in sync, no matter which computer I'm working on.
Treating TeX source like source code has greatly improved my workflow and efficiency. I'd suggest similar measures for anyone dealing with a lot of LaTeX source.
EDIT:
From: http://en.wikibooks.org/wiki/LaTeX/Importing_Graphics
Graphics storage
There is a way to tell LaTeX where to look for images: for example, it can be useful if you store images centrally for use in many different documents. The answer is the command \graphicspath, which you supply with an argument giving the name of an additional directory path you want searched when a file uses the \includegraphics command. Here are some examples:
\graphicspath{{c:\mypict~1\camera}}
\graphicspath{{/var/lib/images/}}
\graphicspath{{./images/}}
\graphicspath{{images_folder/}{other_folder/}{third_folder/}}
Please see http://www.ctan.org/tex-archive/macros/latex/required/graphics/grfguide.pdf
As you may have noticed, in the first example I've used the "safe" (MS-DOS) form of the Windows MyPictures folder, because it's a bad idea to use directory names containing spaces. Using absolute paths, \graphicspath does make your file less portable, while using relative paths (like the last example), you shouldn't have any problem with portability, but remember not to use spaces in file names. Alternatively, if you are using PDFLaTeX, you can use the package grffile, which will then allow you to use spaces in file names.
The third option should serve you well: just specify multiple paths for \graphicspath. I wonder if LaTeX will fail gracefully if you just include all of your paths in there (one for images, one for your logos on Linux, one for your logos on Windows)?
Mica, thank you once more, your advice works perfectly!
I've tested this code in the preamble; in a .sty file it doesn't work:
\usepackage{graphicx}
\graphicspath{{/media/DATA/logo/}{d:/logo/}{img/}}
where
/media/DATA/logo/ is the path to the directory with logos on a mounted partition in Linux,
d:/logo/ is the path to the same directory in Windows, and
img/ is the path to the images for the current document in the actual working directory,
and this code in the document:
\includegraphics{logo_zcu_c} (from the logo dir)
\includegraphics{hvof} (from the img/ dir)

Arbitrary sort key in filesystem

I have a pet project where I am building a text-to-HTML translator. I keep the content and the converted output in a directory tree, mirroring the structure via the filesystem hierarchy. Chapters go into directories and subchapters go into subdirectories. I get the chapter headings from the directory and file names. I want to keep all data in files, with no database or the like.
Kind of a keep-it-simple approach, no need to deal with meta-data.
All works well, except for the sort order of the directories and files to be included. I need sort of an arbitrary key for sorting directories and files in my application. That would determine the order the content goes into the output.
I have two solutions, both not really good:
1) Prepend directories and files with a sort key (e.g. "01_") and strip it in the output so it doesn't pollute the output file names. That works badly for directories, since they must keep the key in their names in order not to break the directory structure. That ends with an ugly "01_Introduction"...
2) Put a config file into each directory with information on how to sort the directory content, to be read by my application. That is error-prone and breaks the keep-it-simple, no-metadata approach.
Do you have an idea? What would you do?
If your goal is to effectively avoid metadata, then I'd go with some variation of option 1.
I really do not find 01_Introduction to be ugly, at all.
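A minimal Python sketch of option 1, assuming a numeric "NN_" prefix convention on both directories and files (the names and paths are placeholders):

import re
from pathlib import Path

PREFIX = re.compile(r"^\d+_")  # assumed convention: "01_Introduction", "02_Basics", ...

def ordered_entries(directory: Path) -> list[tuple[str, Path]]:
    # Sort by the full (prefixed) name, but return display titles with the prefix stripped.
    entries = sorted(directory.iterdir(), key=lambda p: p.name)
    return [(PREFIX.sub("", p.stem if p.is_file() else p.name), p) for p in entries]

for title, path in ordered_entries(Path("content")):  # "content" is a placeholder root
    print(title, "->", path)

The prefix stays in the source tree, so the order is visible there too, but it never leaks into the generated headings or output file names.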
