Using pandoc, it is easy to convert a DocBook XML file to reST (reStructuredText) with the command:
pandoc -f docbook -t rst path_to_xml_file
Is it possible to convert a whole folder of DocBook XML files to reST using pandoc?
You can use a simple shell loop, run inside the directory that holds your DocBook .xml files:
for FILENAME in *.xml; do pandoc -f docbook -t rst -o "${FILENAME/.xml/.rst}" "$FILENAME"; done
Note: this assumes your DocBook files have the .xml extension.
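A slightly more defensive variant of that loop, as a sketch rather than the answerer's own code: it strips only the trailing .xml with ${f%.xml} (the ${FILENAME/.xml/.rst} form replaces the first .xml found anywhere in the name) and skips the unexpanded pattern when the folder contains no matching files. The function name docbook_dir_to_rst is made up for illustration.

```shell
# Convert every DocBook .xml file in a directory to .rst, next to the source.
# docbook_dir_to_rst is a hypothetical helper name, not part of pandoc.
docbook_dir_to_rst() {
    dir=${1:-.}
    for f in "$dir"/*.xml; do
        # With no matches the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        pandoc -f docbook -t rst -o "${f%.xml}.rst" "$f"
    done
}
```

Call it as docbook_dir_to_rst path/to/folder, or with no argument to use the current directory.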
Related
I've been using a shell script in Automator on macOS (OS X) successfully, but my method retains the '.md' extension in the resulting filename.
For example, if I input the file myfile.md, the output is myfile.md.docx.
This is my script:
for f in "$@"
do
if [[ "$f" = *.md ]]; then
/Users/myname/opt/anaconda3/bin/pandoc -o "${f%}.docx" -f markdown -t docx $f && open "${f%}.docx"
fi
done
Can anyone help me with this last step?
Use -o "${f%.*}.docx" to remove the original extension.
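To see why this works, here is a small sketch comparing the expansions on a made-up path (the path is an assumption for illustration only). With an empty pattern, ${f%} removes nothing, which is why the original script produced myfile.md.docx.

```shell
# Hypothetical input path, chosen only for illustration.
f="/Users/myname/Documents/myfile.md"

echo "${f%}.docx"     # empty pattern removes nothing: .../myfile.md.docx
echo "${f%.*}.docx"   # strips the last extension:     .../myfile.docx
echo "${f%.md}.docx"  # strips only a trailing .md:    .../myfile.docx
```

Since the script already checks that the name ends in .md, ${f%.md} and ${f%.*} give the same result there; the second form also handles other extensions.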
I'm using Pandoc to convert a bunch of DOCX files into RST.
pandoc -f docx -t rst file1.docx -o file1.rst --extract-media=.
pandoc -f docx -t rst file2.docx -o file2.rst --extract-media=.
pandoc -f docx -t rst file3.docx -o file3.rst --extract-media=.
...
Images within each file are being extracted into the media directory as expected (media/image1.png, media/image2.png, ...), but my problem is that images from each file overwrite those from the previous one.
The solution I have so far is basically to convert each file into a separate directory:
mkdir file1
pandoc -f docx -t rst file1.docx -o file1/file.rst --extract-media=file1
mkdir file2
pandoc -f docx -t rst file2.docx -o file2/file.rst --extract-media=file2
mkdir file3
pandoc -f docx -t rst file3.docx -o file3/file.rst --extract-media=file3
...
Is there any option or way to have all images in the same directory? Maybe some kind of media prefix?
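This does not answer whether a media prefix option exists, but the per-directory workaround above can at least be scripted so it does not have to be typed once per file. A minimal sketch, assuming the .docx files sit in the current directory; convert_docx_separately is a hypothetical helper name.

```shell
# Run the per-directory workaround for every .docx in the current directory.
# convert_docx_separately is a hypothetical helper name.
convert_docx_separately() {
    for f in *.docx; do
        [ -e "$f" ] || continue       # no .docx files: pattern stays literal
        base=${f%.docx}               # file1.docx -> file1
        mkdir -p "$base" || return 1
        pandoc -f docx -t rst "$f" -o "$base/file.rst" --extract-media="$base"
    done
}
```

Each document's images then land under its own directory, so nothing is overwritten.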
I use pandoc to convert docx to markdown with the following:
pandoc -f docx -t markdown --extract-media="pandoc-output/$filename/" -o "pandoc-output/$filename/full.md" "$fullfile"
Which works OK. However, the media is stored in:
pandoc-output/$filename/media/
I want the media to be stored in
/pandoc-output/media/$filename/
Is this possible?
UPDATE
I ended up with a sed command to search and replace the offending lines together with a mv to the proper directory.
gsed -i -r "s/([a-zA-Z0-9_-]+)\/pandoc-output\/media\/([a-zA-Z0-9]+)/\/public\/media\/\1\/\2/" $ROOTDIR"$d"_"$filename.html.md"
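The move-and-rewrite idea can be sketched in a simplified form for the layout described in the question. This is not the exact command above: it assumes the links in full.md use the same pandoc-output/$filename/media/ prefix as the extracted files, that directory names contain only letters, digits, underscores and hyphens, and that the target directory does not exist yet. relocate_media is a hypothetical helper name, and the sed output goes through a temporary file to avoid the differences between GNU and BSD sed -i.

```shell
# Move <root>/<name>/media to <root>/media/<name> and fix the links in full.md.
# relocate_media is a hypothetical helper name.
relocate_media() {
    root=$1   # e.g. pandoc-output
    name=$2   # per-document directory name
    mkdir -p "$root/media" || return 1
    mv "$root/$name/media" "$root/media/$name" || return 1
    # Rewrite the old media prefix to the new one, via a temporary file.
    sed "s|$root/$name/media/|$root/media/$name/|g" "$root/$name/full.md" \
        > "$root/$name/full.md.tmp" &&
        mv "$root/$name/full.md.tmp" "$root/$name/full.md"
}
```

Usage would be relocate_media pandoc-output "$filename" after each pandoc run.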
I have a series of zip archives from which I wish to extract one text file to an output directory. The file is in the general location:
archive.zip/archive/summary.txt
I have the following code that I thought should work:
for file in *.zip
do
name=${file##*/}
base=${name%.zip}
unzip -j $name/$base/summary.txt -d /$output/$file-summary.txt
done
However, unzip cannot find the text files.
In the end the following did what I wanted:
for file in *.zip
do
name=${file##*/}
base=${name%.zip}
unzip -j "$name" "$base/summary.txt" -d "$output/$base"
done
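The difference from the first attempt is that unzip takes the archive as its first argument and the member to extract as a separate argument, with -d naming the destination directory; the first version joined archive and member into a single path. The sketch below only shows what the variables expand to for a made-up archive path (an assumption for illustration).

```shell
# Hypothetical archive path, chosen only for illustration.
file="downloads/archive.zip"
name=${file##*/}            # drop any leading directory part -> archive.zip
base=${name%.zip}           # drop the .zip suffix            -> archive
member="$base/summary.txt"  # path of the wanted file inside the archive

printf 'archive=%s member=%s\n' "$name" "$member"
```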
I have a tar archive which contains several text files. I would like to write a script to display (stdout) the content of a file without extracting it to the current directory.
Actually I would like to do the same as:
tar tf myArchive.tar folder/someFile.txt
cat folder/someFile.txt
rm -R folder
but without the rm...
I tried this way but it didn't work:
tar tf myArchive.tar folder/someFile.txt | cat
Thanks
Use x to extract (t only lists the archive's contents), with f to name the archive file. Then add the -O option to send the extracted file to standard output instead of writing it to disk.
tar xf myArchive.tar folder/someFile.txt -O
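For use in a script, the same command can be wrapped in a small function. This is a sketch: tar_cat is a hypothetical helper name, and it places -O before the file names so it does not rely on tar accepting options after the member path.

```shell
# Print one member of a tar archive to stdout without writing it to disk.
# tar_cat is a hypothetical helper name.
tar_cat() {
    tar -xOf "$1" "$2"
}
```

For the example above this would be called as tar_cat myArchive.tar folder/someFile.txt, and no folder directory is created.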