I'm using Pandoc to convert a bunch of DOCX files into RST.
pandoc -f docx -t rst file1.docx -o file1.rst --extract-media=.
pandoc -f docx -t rst file2.docx -o file2.rst --extract-media=.
pandoc -f docx -t rst file3.docx -o file3.rst --extract-media=.
...
Images within each file are being extracted into the media directory as expected (media/image1.png, media/image2.png, ...), but my problem is that images from each file overwrite those from the previous one.
The solution I have so far is basically to convert each file into a separate directory:
mkdir file1
pandoc -f docx -t rst file1.docx -o file1/file.rst --extract-media=file1
mkdir file2
pandoc -f docx -t rst file2.docx -o file2/file.rst --extract-media=file2
mkdir file3
pandoc -f docx -t rst file3.docx -o file3/file.rst --extract-media=file3
...
Is there any option or way to have all images in the same directory? Maybe some kind of media prefix?
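One possible way to keep everything under a single top-level directory is to give each document its own subfolder inside a shared media directory. This is only a sketch: the function name convert_all_docx and the media/<name> layout are assumptions, and it is not a true filename prefix, since pandoc is still expected to create its own media folder below whatever path is passed to --extract-media.

```shell
# Hypothetical helper: converts every .docx in the current directory and
# points --extract-media at a per-document folder below a shared "media" dir.
convert_all_docx() {
  for f in *.docx; do
    [ -e "$f" ] || continue          # the glob matched nothing
    base=${f%.docx}                  # file1.docx -> file1
    pandoc -f docx -t rst "$f" -o "$base.rst" --extract-media="media/$base"
  done
}
```

Calling convert_all_docx in the directory holding the DOCX files should leave the .rst files side by side, with images separated per document below media/.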
I've been using a shell script in Automator on macOS (OS X) successfully, but my method retains the '.md' extension in the resulting filename.
For example, if I input the file myfile.md, the output is myfile.md.docx.
This is my script:
for f in "$@"
do
if [[ "$f" = *.md ]]; then
/Users/myname/opt/anaconda3/bin/pandoc -o "${f%}.docx" -f markdown -t docx $f && open "${f%}.docx"
fi
done
Can anyone help me with this last step?
Use -o "${f%.*}.docx" to remove the original extension.
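To see why this works, here is a small sketch comparing the two expansions; the variable names unchanged and stripped are only for illustration. With an empty pattern, ${f%} removes nothing, whereas ${f%.*} removes the shortest trailing match of ".*", which is the extension.

```shell
f="myfile.md"
unchanged="${f%}.docx"      # empty pattern: nothing is stripped
stripped="${f%.*}.docx"     # strips the last ".something" before appending
printf '%s\n' "$unchanged" "$stripped"
# prints:
# myfile.md.docx
# myfile.docx
```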
Using pandoc, it is easy to convert a DocBook XML file to reST (reStructuredText) using the command:
pandoc -f docbook -t rst path_to_xml_file
Is it possible to convert a whole folder of DocBook XML files to reST using pandoc?
You can use a simple shell loop in the directory that contains your DocBook .xml files:
for FILENAME in *.xml; do pandoc -f docbook -t rst -o "${FILENAME%.xml}.rst" "$FILENAME"; done
Note: I assumed your DocBook files have the .xml extension.
I use pandoc to convert docx to markdown with the following:
pandoc -f docx -t markdown --extract-media="pandoc-output/$filename/" -o "pandoc-output/$filename/full.md" "$fullfile"
Which works OK. However, the media is stored in:
pandoc-output/$filename/media/
I want the media to be stored in
/pandoc-output/media/$filename/
Is this possible?
UPDATE
I ended up with a sed command to search and replace the offending lines together with a mv to the proper directory.
gsed -i -r "s/([a-zA-Z0-9_-]+)\/pandoc-output\/media\/([a-zA-Z0-9]+)/\/public\/media\/\1\/\2/" $ROOTDIR"$d"_"$filename.html.md"
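The move-and-rewrite approach can be sketched roughly as below. This is an illustration and not the exact command above: the function name relocate_media is made up, the paths follow the layout from the question, and it assumes the links in full.md begin with the same path that was passed to --extract-media and that directory names contain only letters, digits, hyphens and underscores (so they are safe inside the sed expression).

```shell
# Hypothetical helper: moves <root>/<name>/media to <root>/media/<name>
# and rewrites the matching links in <root>/<name>/full.md.
relocate_media() {
  out_root=$1    # e.g. pandoc-output
  name=$2        # per-document directory name
  md="$out_root/$name/full.md"
  mkdir -p "$out_root/media"
  mv "$out_root/$name/media" "$out_root/media/$name"
  # Write to a temporary file rather than relying on sed -i, whose
  # syntax differs between GNU and BSD sed.
  sed "s|$out_root/$name/media/|$out_root/media/$name/|g" "$md" > "$md.tmp" &&
    mv "$md.tmp" "$md"
}
```

It would be called once per document after pandoc has run, for example relocate_media pandoc-output "$filename".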
I have a series of zip archives from which I wish to extract one text file to an output directory. The file is in the general location:
archive.zip/archive/summary.txt
I have the following code that I thought should work:
for file in *.zip
do
name=${file##*/}
base=${name%.zip}
unzip -j $name/$base/summary.txt -d /$output/$file-summary.txt
done
However, unzip cannot find the text files.
In the end the following did what I wanted:
for file in *.zip
do
name=${file##*/}
base=${name%.zip}
unzip -j "$name" "$base/summary.txt" -d "$output/$base"
done
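The working loop passes the archive as the first argument, the member path as the second, and a directory to -d, so each summary.txt ends up in its own folder. If one flat directory with a renamed file per archive is preferred (which the -summary.txt suffix in the first attempt seems to aim for), a variant using unzip -p might look like this. The function name extract_summaries is hypothetical, and it assumes every archive contains <base>/summary.txt.

```shell
# Hypothetical helper: writes <output>/<archive base>-summary.txt per archive.
extract_summaries() {
  output=$1
  mkdir -p "$output"
  for file in *.zip; do
    [ -e "$file" ] || continue       # no archives matched the glob
    base=${file%.zip}
    # -p sends the member's contents to stdout instead of creating a file
    unzip -p "$file" "$base/summary.txt" > "$output/$base-summary.txt"
  done
}
```

For example, extract_summaries "$output" run from the directory that holds the archives.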
I would like to efficiently convert a set of JPEG images contained in a tar.gz into an x264 mp4 movie.
gzip -cd Monitor-1-xx.tar.gz|cpio -i --to-stdout|jpegtopnm|ppmtoy4m -F 4:1| \
x264 --crf 24 -o Monitor-1-xx.mp4 --stdin y4m -
The problem here is that, after cpio I have multiple jpg files in a single stream and jpegtopnm only converts the first one.
I would like to find a function to split the stream (or to get it pre-split). Then I would like to run jpegtopnm multiple times for each split. It is somewhat like what xargs does when I untar to disk first. Writing to disk is something I am trying to eschew:
mkdir tmpMonitor && cd tmpMonitor && tar -xf ../Monitor-1-xx.tar.gz
find . -iname "*.jpg"|xargs -n1 jpegtopnm|ppmtoy4m -F 4:1| \
x264 --crf 24 -o ../xx.mp4 --stdin y4m -
cd .. && rm -rf tmpMonitor
Any suggestions?
tar has a couple of options that may be useful here (I have GNU tar, so I apologize in advance for assuming you do in case you actually don't):
--wildcards - lets you pick files to extract from the tar using globs like *.jpeg
--to-command - pipe each extracted file to the given command.
So maybe something like this?
tar -xzf Monitor-1-xx.tar.gz --wildcards '*.jpeg' \
--to-command="jpegtopnm|ppmtoy4m -F 4:1| x264 --crf 24 -o ../xx.mp4 --stdin y4m -"
I don't know much about x264, so treat that as untested code; I tested the approach using simple .txt files instead of .jpegs and cat -n instead of jpegtopnm and the rest of the pipeline. The other thing is that --to-command runs once per extracted file, so every invocation would write to the same ../xx.mp4. I am guessing you want separate output files (one per jpeg), in which case each invocation needs a different filename for -o. The following hack might work:
# Single quotes defer $(date ...) so it runs once per extracted file, not once when tar starts.
tar -xzf Monitor-1-xx.tar.gz --wildcards '*.jpeg' \
--to-command='jpegtopnm|ppmtoy4m -F 4:1| x264 --crf 24 -o ../xx-$(date +%H%M%S%N).mp4 --stdin y4m -'
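If the goal is instead a single movie built from all frames, another possibility is to let --to-command run only the per-file step and pipe tar's combined output onward, since each spawned command should inherit tar's standard output. The sketch below assumes GNU tar; the function name stream_members is made up, and it has not been tested with jpegtopnm or x264.

```shell
# Hypothetical helper: runs FILTER once per archive member matching PATTERN;
# the filters' outputs are concatenated on stdout in archive order.
stream_members() {
  archive=$1
  pattern=$2
  filter=$3
  tar -xzf "$archive" --wildcards "$pattern" --to-command="$filter"
}
```

A possible use, assuming all frames share the same dimensions and are stored in playback order: stream_members Monitor-1-xx.tar.gz '*.jpg' jpegtopnm | ppmtoy4m -F 4:1 | x264 --crf 24 -o Monitor-1-xx.mp4 --stdin y4m -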