Bulk Uploading to MediaWiki (Hierarchical Structure) - bash

I have many Markdown files, each contained in a folder with the same name as the Markdown file. I use Pandoc to generate the MediaWiki file in a Rendered folder.
For example:
ComputerScience
|-- ComputerScience.md
|-- Rendered
|   |-- ComputerScience.wiki
|-- Image
|   |-- Computer.png
|-- Resource
    |-- Algorithms.pdf
Every Markdown file has its own folder, which contains other folders like Image and Resource that are linked from the Markdown file. To explain my structure, let me call the structure above the ComputerScience container; each Markdown file has such a container. The containers are organized hierarchically: several containers can live in a folder (which I call a SuperFolder), and a SuperFolder can contain other SuperFolders. For example (the Markdown folders are shown as containers):
Computer Science
|-- Computer Science Container
|-- Algorithms
|   |-- Algorithms Container
|   |-- DataStructure Container
|-- Architecture Container
In the example above, the Computer Science SuperFolder consists of containers as well as another SuperFolder called Algorithms.
How can I upload this kind of hierarchical structure into a local MediaWiki?
Also, I would like to edit the Markdown files and regenerate the MediaWiki files, pushing the updates with a script.
Any suggestions on how I should approach this?

If you want to learn a tool that's flexible and can be reused in the future, I'd look at pywikibot. If you just want a quick one-off that can be used from bash, and the wiki is local, use edit.php.
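For the quick one-off route, here is a minimal bash sketch (the install path /var/www/mediawiki, the root SuperFolder path, and the title mapping are assumptions; edit.php and importImages.php are standard MediaWiki maintenance scripts):

#!/bin/bash
# Walk every container, regenerate the .wiki file with pandoc, and push it
# into the local wiki with the edit.php maintenance script.
WIKI=/var/www/mediawiki        # assumed MediaWiki install path
ROOT=/path/to/SuperFolder      # assumed top-level SuperFolder

find "$ROOT" -name '*.md' | while IFS= read -r md; do
    dir=$(dirname "$md")
    title=$(basename "$md" .md)          # page title = container name
    mkdir -p "$dir/Rendered"
    wiki="$dir/Rendered/$title.wiki"
    pandoc -f markdown -t mediawiki "$md" -o "$wiki"
    # edit.php reads the new page text from stdin
    php "$WIKI/maintenance/edit.php" --summary "Bulk import" "$title" < "$wiki"
done

# Images and PDFs can be bulk-uploaded separately, e.g.:
# php "$WIKI/maintenance/importImages.php" --extensions=png,pdf "$ROOT"

If you want the wiki pages to mirror the SuperFolder hierarchy, you could build the title from the path relative to $ROOT instead (MediaWiki subpages use / in the title). Re-running the same script after editing the Markdown simply overwrites the existing pages, which covers the update case too.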

Related

Speed of directory look-up vs. formatted filename look-up

I've had this discussion with my co-worker multiple times now, and I'm 99.9% sure that I'm correct, but they have been insisting that they are correct and I'm starting to wonder if I'm the crazy one.
We are uploading images taken by users on their mobile devices; cumulatively they could upload thousands given enough time. Each of these photos belongs to a "work order", which is given a sequential integer. We want to optimize for retrieval (based on the work order) rather than writing. We are also on a Windows machine.
My proposed storage method looks like this:
Images
|-- 23875
| |-- f0347b8.png
| |-- b04675b.png
|-- 28765
    |-- aab658c.png
Their proposed storage method looks like this:
Images
|-- 23875_f0347b8.png
|-- 23875_b04675b.png
|-- 28765_aab658c.png
For me, in order to gather the 2 images for work order 23875, I would look in the directory, Images/23875 and grab all the .png files.
For them to do the same thing, they would iterate through all the files and run a wildcard filter on all the filenames, something to the effect of 23875_*.png.
I believe my method to be superior because, in the case where there are, say, thousands of images, it doesn't need to run a wildcard filter on potentially thousands of irrelevant files. I've asked why they believe their method to be superior, but I haven't gotten a compelling answer.
Any advice is appreciated.
This method
Images
|-- 23875_f0347b8.png
|-- 23875_b04675b.png
|-- 28765_aab658c.png
requires iterating through every single file in Images to find all files that match 23875_*. Every single time you want to find them. Over and over. Until the world ends and the stars go dark.
Putting all the files in one directory discards information you have when you create the file, thus making the file harder to locate in the future. Trying to encode that information in the file name means the data is mixed in with all other similar data and therefore needs to be filtered out in the future.
Why? You're right - it makes no sense. It's tossing information in the garbage for no good reason.
Your method
Images
|-- 23875
| |-- f0347b8.png
| |-- b04675b.png
|-- 28765
    |-- aab658c.png
has already partitioned the files into the required associations. No filtering or searching is needed to find the files.
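To make the difference concrete, a small bash sketch of both retrievals (using the folder and file names from the example):

# Directory per work order: the shell only lists Images/23875 to expand the glob.
ls Images/23875/*.png

# Flat directory: the shell has to list every entry in Images and test each
# name against the pattern before it can return the two matches.
ls Images/23875_*.png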
they have been insisting that they are correct
Oooh-kay. Maybe they like this sort of wrestling...

What is a good way to share templates across apps with separate codebases?

We have one Sinatra app and one Backbone app.
I saw Sharing the same codebase across multiple apps but didn't understand it or how I could implement it.
This question is not really specific to Sinatra or Backbone; it could be pretty much any two apps. We're using Heroku and Git.
One idea is to put the HTML on S3, but we aren't using S3 to store HTML. And how would you get it from Git onto S3? It seems very convoluted.
So, is there a good way of sharing HTML templates between the apps?
We do it with a containing parent directory, well-defined paths to the common files, and a common YAML file that tells the different apps where to look.
Create a common YAML file containing a hash, where each key is a common name for a particular resource (or path to resources) and each value is the absolute path to it on disk.
For instance:
---
html: /absolute/path/to/shared/html
images: /absolute/path/to/shared/images
main_css: /absolute/path/to/shared/styles.css
Load that using Ruby with:
require 'yaml'
SHARED_RESOURCES = YAML.load_file('/absolute/path/to/shared_resources.yaml')
# => {"html"=>"/absolute/path/to/shared/html", "images"=>"/absolute/path/to/shared/images", "main_css"=>"/absolute/path/to/shared/styles.css"}
Use the resulting SHARED_RESOURCES hash to retrieve the information you need:
main_css = SHARED_RESOURCES['main_css']
# => "/absolute/path/to/shared/styles.css"
You can use that same YAML file from ANY language that can read YAML, or where you can open that file and parse its contents. At that point, all your code-bases can play from the same sheet of music, and will know how to access the common files when necessary.
For instance, from Perl:
use YAML;
$SHARED_RESOURCES = Load('
---
html: /absolute/path/to/shared/html
images: /absolute/path/to/shared/images
main_css: /absolute/path/to/shared/styles.css
');
print $SHARED_RESOURCES->{'main_css'}, "\n";
>> /absolute/path/to/shared/styles.css
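A shell script can read the same file too; a small sketch, assuming the Go (mikefarah) implementation of yq is installed:

# Look up the shared stylesheet path from the common YAML file.
main_css=$(yq '.main_css' /absolute/path/to/shared_resources.yaml)
echo "$main_css"
# => /absolute/path/to/shared/styles.css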
If you want to get fancier, use a database to hold those shared resources. Either way, the idea is there's just one place for the code to look for a particular resource/file.

Bash - Identify files not referenced by other files

I have a website that runs off an OpenWRT router. I'd like to optimize the site by removing any files that aren't being used. Here is my directory structure...
/www/images
/www/js
/www/styles
/www/otherSubDirectories <--- not really named that
I'm mostly concerned about identifying images that are not used because those take the most space, but it would also be nice to identify stylesheets and JavaScript files that are not being used. So, is there a way I can search /www and all of its subdirectories and print a list of files in /www/images, /www/js, and /www/styles that are not referenced by any other files?
When I'm looking for files that contain a specific string I use this:
find . | xargs grep -Hn 'myImage.jpg'
That would tell me all files that reference the image. Maybe some variation of that?
Any help would be appreciated!
EV
Swiss File Knife is a very nice tool. It can find out which files are used (referenced) by other files through fuzzy content analysis.
Consider using a cross-reference program (for example, lxr) for this problem. (I haven't verified if lxr can do the job, but believe it can.) If an off-the-shelf cross-reference program doesn't work, look for an open source cross-reference program in a language you know, and adapt it.
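A minimal bash sketch along the lines of your grep idea: for each candidate asset, search the rest of /www for its basename and report the ones that never appear. It assumes GNU grep (for --exclude) and that files are referenced by their literal names, so it will miss paths that are built up dynamically in JavaScript:

#!/bin/bash
# List assets in images/, js/, and styles/ whose filenames are never
# mentioned in any other file under /www.
cd /www || exit 1
find images js styles -type f | while IFS= read -r asset; do
    name=$(basename "$asset")
    # Search everything except the asset itself for its filename.
    if ! grep -rqF --exclude="$name" "$name" .; then
        echo "unreferenced: $asset"
    fi
done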

Pretty text tables & trees

I'm wondering what SO users use (if anything) to create text-based tables and trees. I'm running Notepad++, but I'm thinking of making a change.
When I'm not whiteboarding, I'm in Notepad++, but creating both trees and tables there is a painstaking process. I've seen some quick scripts around for CLI-driven processes, like file system output, but I'm looking for something that lets one quickly create arbitrary tables and trees (a self-contained GUI or file import, perhaps) and dump them to text. I found no plugins for Notepad++ in my searches.
I have various graphical modeling tools, but I prefer monospaced text (I'm an ASCII art kid, not to mention it's handy for inclusion in script docs), so there's no sense in mentioning Visio (blech) or the plethora of others (unless they happen to support this sort of functionality).
+- Thank +---------+----------------+
| | Any | Suggestions |
+- You +---------+----------------+
| | | Are | Certainly |
| +- Very +---------+----------------+
| | Welcome | |
+- Much +---------+----------------+
Note: Running Win7x64 + Cygwin
I'm not positive about trees, but I have used the Perl Text::FormatTable module before and have found it very helpful for automating output from scripts into tables.
For straight editing yourself, I'd recommend org-mode for Emacs, which has a fantastic ASCII table-editing mode.
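If you ever want a quick one-off from the Cygwin side, the column utility from util-linux can at least turn delimited text into an aligned, monospaced table (a small sketch, not a full table/tree editor):

# Align comma-delimited text into a monospaced table.
printf 'Any,Suggestions\nAre,Certainly Welcome\n' | column -t -s,
# Output:
# Any  Suggestions
# Are  Certainly Welcome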

Operating system independent image addressing

Because I use both Windows and Ubuntu on my computer, I'd like to be able to create documents independently of the OS. I have one directory for logos, and I want to use them in any document anywhere.
I solved the problem of the differing file paths with these commands:
\newcommand{\winlogo}{D:/logo/}
\newcommand{\linlogo}{/media/DATA/logo/}
\includegraphics{\winlogo logo_bw}
How can I provide this feature?
if(parameter==windows){address:=D:/logo/}
else if(parameter==linux){address:=/media/DATA/logo}
else{error}
I've run into this problem as well, and I found that hard-coding the paths is an absolutely terrible idea. Also, keeping these directories in sync will eventually be a problem once your projects begin to grow.
The way I solved this was to put everything in version control (I like git, your mileage may vary).
Then I created an images folder, so my folder hierarchy looks like this:
Working-Dir
|-- images/
|-- myfile.tex
|-- nextfile.tex
Then, in the preamble of my documents, I use \usepackage{graphicx} and \graphicspath{{images/}}, which tells LaTeX to look for a folder called images and then look for the graphics inside that folder.
Then I do my work on one computer, push my finished work back to the repo, and when I switch computers I just pull from the repo. This way everything stays in sync, no matter which computer I'm working on.
Treating TeX source like source code has greatly improved my workflow and efficiency. I'd suggest similar measures for anyone dealing with a lot of LaTeX source.
EDIT:
From: http://en.wikibooks.org/wiki/LaTeX/Importing_Graphics
Graphics storage
There is a way to tell LaTeX where to look for images: for example, it can be useful if you store images centrally for use in many different documents. The answer is the command \graphicspath, which you supply with an argument giving the name of an additional directory path you want searched when a file uses the \includegraphics command. Here are some examples:
\graphicspath{{c:\mypict~1\camera}}
\graphicspath{{/var/lib/images/}}
\graphicspath{{./images/}}
\graphicspath{{images_folder/}{other_folder/}{third_folder/}}
Please see http://www.ctan.org/tex-archive/macros/latex/required/graphics/grfguide.pdf
As you may have noticed, in the first example I've used the "safe" (MS-DOS) form of the Windows MyPictures folder, because it's a bad idea to use directory names containing spaces. Using absolute paths, \graphicspath does make your file less portable, while with relative paths (like the last example) you shouldn't have any problem with portability; just remember not to use spaces in file names. Alternatively, if you are using PDFLaTeX, you can use the package grffile, which will then allow you to use spaces in file names.
The third option should do you well: just specify multiple paths for \graphicspath. I wonder whether LaTeX will fail gracefully if you just include all of your paths in there (one for images, one for your logos on Linux, one for your logos on Windows)?
Mica, thank you once more, your advice works properly!
I've tested this code in the preamble; in a .sty file it doesn't work:
\usepackage{graphicx}
\graphicspath{{/media/DATA/logo/}{d:/logo/}{img/}}
where
/media/DATA/logo/ is the path to the directory with logos on a mounted partition in Linux,
d:/logo/ is the path to the same directory in Windows, and
img/ is the path to the images for the current document in the working directory,
and this code in the document:
\includegraphics{logo_zcu_c} from the logo dir
\includegraphics{hvof} from the img/ dir
