How to prevent Pandoc from overwriting existing files when extracting media?

I'm using Pandoc to convert a bunch of DOCX files into RST.
pandoc -f docx -t rst file1.docx -o file1.rst --extract-media=.
pandoc -f docx -t rst file2.docx -o file2.rst --extract-media=.
pandoc -f docx -t rst file3.docx -o file3.rst --extract-media=.
...
Images within each file are being extracted into the media directory as expected (media/image1.png, media/image2.png, ...), but my problem is that images from each file overwrite those from the previous one.
The solution I have so far is basically to convert each file into a separate directory:
mkdir file1
pandoc -f docx -t rst file1.docx -o file1/file.rst --extract-media=file1
mkdir file2
pandoc -f docx -t rst file2.docx -o file2/file.rst --extract-media=file2
mkdir file3
pandoc -f docx -t rst file3.docx -o file3/file.rst --extract-media=file3
...
Is there any option or way to have all images in the same directory? Maybe some kind of media prefix?
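For reference, the per-directory workaround described above can be written as a loop instead of one command per file. This is only a sketch of that workaround, not a way to share a single media directory; the function name convert_all_docx is hypothetical, and it assumes the .docx files sit in the current directory.

```shell
# Hypothetical helper: convert every .docx in the current directory,
# giving each document its own output directory so that extracted
# images (media/image1.png, ...) cannot overwrite each other.
convert_all_docx() {
  for f in *.docx; do
    [ -e "$f" ] || continue        # glob matched nothing: skip
    base="${f%.docx}"              # file1.docx -> file1
    mkdir -p "$base"
    pandoc -f docx -t rst "$f" -o "$base/file.rst" --extract-media="$base"
  done
}

# Usage: run from the directory that holds the .docx files.
# convert_all_docx
```

Each document still gets its own media directory (file1/media, file2/media, ...), exactly as in the manual commands.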

Related

I'm using Pandoc to convert Markdown to .docx - how do I remove the .md from the resulting filename?

I've been using a shell script in Automator on MacOS (OSX) successfully, but my method retains the '.md' extension in the resulting filename.
For example, if I input the file myfile.md the output is myfile.md.docx
This is my script:
for f in "$@"
do
if [[ "$f" = *.md ]]; then
/Users/myname/opt/anaconda3/bin/pandoc -o "${f%}.docx" -f markdown -t docx $f && open "${f%}.docx"
fi
done
Can anyone help me with this last step?
Use -o "${f%.*}.docx" to remove the original extension.
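To see what that expansion does in isolation, here is a small sketch. The helper name docx_name and the second example path are made up for illustration; `${1%.*}` removes the shortest trailing match of `.*`, which is the last extension only.

```shell
# Hypothetical helper: print the .docx name for a given input path.
docx_name() {
  printf '%s\n' "${1%.*}.docx"
}

docx_name "myfile.md"                    # prints myfile.docx
docx_name "/Users/myname/notes.v2.md"    # prints /Users/myname/notes.v2.docx
```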

How to convert a folder of xml docbook files to reST?

Using pandoc, it is easy to convert an XML DocBook file to reST (reStructuredText) with the command:
pandoc -f docbook -t rst path_to_xml_file
Is it possible to convert a whole folder of XML DocBook files to reST using pandoc?
You can use a simple shell loop in the directory that contains your DocBook .xml files:
for FILENAME in *.xml; do pandoc -f docbook -t rst -o "${FILENAME/.xml/.rst}" "$FILENAME"; done
Note: I assumed your docbook files have .xml extension.
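A slightly more defensive variant of the same loop, as a sketch. The function name docbook_to_rst_dir is hypothetical. It takes the directory as an argument, skips the case where no .xml files exist, and uses `${src%.xml}` so that only the trailing .xml is replaced (the `${FILENAME/.xml/.rst}` form replaces the first .xml it finds anywhere in the name).

```shell
# Hypothetical helper: convert every .xml file in directory $1
# (default: current directory) to an .rst file next to it.
docbook_to_rst_dir() {
  dir="${1:-.}"
  for src in "$dir"/*.xml; do
    [ -e "$src" ] || continue      # no .xml files: skip the literal glob
    pandoc -f docbook -t rst -o "${src%.xml}.rst" "$src"
  done
}

# Usage:
# docbook_to_rst_dir path/to/docbook/files
```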

Configure pandoc to extract media to different folder

I use pandoc to convert docx to markdown with the following:
pandoc -f docx -t markdown --extract-media="pandoc-output/$filename/" -o "pandoc-output/$filename/full.md" "$fullfile"
Which works OK. However, the media is stored in:
pandoc-output/$filename/media/
I want the media to be stored in
/pandoc-output/media/$filename/
Is this possible?
UPDATE
I ended up with a sed command to search and replace the offending lines together with a mv to the proper directory.
gsed -i -r "s/([a-zA-Z0-9_-]+)\/pandoc-output\/media\/([a-zA-Z0-9]+)/\/public\/media\/\1\/\2/" $ROOTDIR"$d"_"$filename.html.md"
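The move-then-rewrite idea can be sketched in a simpler form for the layout in the question. This is not the exact command above. The function name relocate_media is hypothetical, and it assumes that image links in full.md appear literally as pandoc-output/NAME/media/..., that pandoc-output/media/NAME does not exist yet, and that NAME contains no characters special to sed (such as `|`).

```shell
# Hypothetical helper: move pandoc-output/NAME/media to
# pandoc-output/media/NAME and rewrite the links in full.md to match.
relocate_media() {
  name="$1"
  root="pandoc-output"
  mkdir -p "$root/media"
  mv "$root/$name/media" "$root/media/$name"
  # Rewrite via a temporary file, since sed -i differs between GNU and BSD sed.
  sed "s|$root/$name/media/|$root/media/$name/|g" \
    "$root/$name/full.md" > "$root/$name/full.md.tmp"
  mv "$root/$name/full.md.tmp" "$root/$name/full.md"
}

# Usage, after the pandoc command from the question has run:
# relocate_media "$filename"
```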

Bash Partial Unzip of Archive in Loop

I have a series of zip archives, and from each one I want to extract a single text file to an output directory. The file is at this general location:
archive.zip/archive/summary.txt
I have the following code that I thought should work:
for file in *.zip
do
name=${file##*/}
base=${name%.zip}
unzip -j $name/$base/summary.txt -d /$output/$file-summary.txt
done
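However, unzip cannot find the text files.
In the end the following did what I wanted. unzip takes the archive name first, then the path of the member inside the archive, and -d names an output directory rather than a file: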
for file in *.zip
do
name=${file##*/}
base=${name%.zip}
unzip -j "$name" "$base/summary.txt" -d "$output/$base"
done

Stream chain after Untar

I would like to efficiently convert a couple of JPEG images contained in a tar.gz into an x264 mp4 movie.
gzip -cd Monitor-1-xx.tar.gz|cpio -i --to-stdout|jpegtopnm|ppmtoy4m -F 4:1| \
x264 --crf 24 -o Monitor-1-xx.mp4 --stdin y4m -
The problem here is that, after cpio I have multiple jpg files in a single stream and jpegtopnm only converts the first one.
I would like to find a function to split the stream (or to get it pre-split). Then I would like to run jpegtopnm multiple times for each split. It is somewhat like what xargs does when I untar to disk first. Writing to disk is something I am trying to eschew:
mkdir tmpMonitor && cd tmpMonitor && tar -xf ../Monitor-1-xx.tar.gz
find . -iname "*.jpg"|xargs -n1 jpegtopnm|ppmtoy4m -F 4:1| \
x264 --crf 24 -o ../xx.mp4 --stdin y4m -
cd .. && rm -rf tmpMonitor
Any suggestions?
tar has a couple of options that may be useful here (I have GNU tar, so I apologize in advance for assuming you do in case you actually don't):
--wildcards - lets you pick files to extract from the tar using globs like *.jpeg
--to-command - pipe each extracted file to the given command.
So maybe something like this?
tar -xzf Monitor-1-xx.tar.gz --wildcards '*.jpeg' \
--to-command="jpegtopnm|ppmtoy4m -F 4:1| x264 --crf 24 -o ../xx.mp4 --stdin y4m -"
I don't know much about x264, so treat that as untested code. I only tested the approach with simple .txt files instead of .jpegs and cat -n instead of jpegtopnm etc. The other thing is, I am guessing you want separate output files (one per jpeg), in which case ../xx.mp4 won't do: each invocation of jpegtopnm|ppmtoy4m -F 4:1| x264 ... needs a different output filename for -o. If so, the following hack might work. Note the single quotes, so that date should be evaluated for each file rather than once when the command line is parsed:
tar -xzf Monitor-1-xx.tar.gz --wildcards '*.jpeg' \
--to-command='jpegtopnm|ppmtoy4m -F 4:1| x264 --crf 24 -o ../xx-$(date +%H%M%S%N).mp4 --stdin y4m -'
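If a single movie is the goal after all, another thing that may be worth trying is to let --to-command run only the per-file step and pipe tar's combined output onward. The sketch below assumes GNU tar and that the command's standard output ends up on tar's standard output. The function name stream_members is hypothetical.

```shell
# Hypothetical helper (GNU tar assumed): run a filter command once per
# matching archive member and emit all results on standard output.
#   $1 = .tar.gz archive, $2 = glob for member names, $3 = filter command
stream_members() {
  tar -xzf "$1" --wildcards "$2" --to-command="$3"
}

# Untested usage with the tools from the question:
# stream_members Monitor-1-xx.tar.gz '*.jpg' jpegtopnm |
#   ppmtoy4m -F 4:1 | x264 --crf 24 -o xx.mp4 --stdin y4m -
```

This mirrors the on-disk version in the question, where xargs -n1 jpegtopnm also feeds a concatenated stream into ppmtoy4m, but without writing the images to disk first.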
