I use pandoc to convert docx to markdown with the following:
pandoc -f docx -t markdown --extract-media="pandoc-output/$filename/" -o "pandoc-output/$filename/full.md" "$fullfile"
Which works OK. However, the media is stored in:
pandoc-output/$filename/media/
I want the media to be stored in
/pandoc-output/media/$filename/
Is this possible?
UPDATE
I ended up with a sed command to search and replace the offending lines together with a mv to the proper directory.
gsed -i -r "s/([a-zA-Z0-9_-]+)\/pandoc-output\/media\/([a-zA-Z0-9]+)/\/public\/media\/\1\/\2/" "${ROOTDIR}${d}_${filename}.html.md"
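I'm not aware of a pandoc option that changes this layout, so post-processing seems to be the practical route. Here is a minimal sketch of the same mv-plus-sed idea, written for the directory layout from the question rather than my own setup. The function names `rewrite_media_paths` and `relocate_media` are made up for illustration, and it assumes `pandoc-output/media/$filename` does not exist yet.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, paths follow the question.

# Rewrite pandoc-output/NAME/media/... to pandoc-output/media/NAME/...
# Reads Markdown on stdin and writes the adjusted text to stdout.
rewrite_media_paths() {
  sed -E 's#pandoc-output/([^/]+)/media/#pandoc-output/media/\1/#g'
}

# Move the extracted files for one document and fix the links in its full.md.
relocate_media() {
  local name=$1
  local src="pandoc-output/$name/media"
  local dst="pandoc-output/media/$name"
  local md="pandoc-output/$name/full.md"
  [ -d "$src" ] || return 0          # nothing was extracted for this document
  mkdir -p "pandoc-output/media"
  mv "$src" "$dst"                   # assumes $dst does not exist yet
  rewrite_media_paths < "$md" > "$md.tmp" && mv "$md.tmp" "$md"
}
```

Calling `relocate_media "$filename"` right after the pandoc command should leave the images in pandoc-output/media/$filename/ with the links in full.md updated to match.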
Related
I've been using a shell script in Automator on macOS (OS X) successfully, but my method retains the '.md' extension in the resulting filename.
For example, if I input the file myfile.md the output is myfile.md.docx
This is my script:
for f in "$@"
do
if [[ "$f" = *.md ]]; then
/Users/myname/opt/anaconda3/bin/pandoc -o "${f%}.docx" -f markdown -t docx $f && open "${f%}.docx"
fi
done
Can anyone help me with this last step?
Use -o "${f%.*}.docx" to remove the original extension.
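To spell that out: `${f%.*}` removes the shortest trailing match of `.*`, which is the last extension. A small sketch of the loop with that change applied follows. The helper name `docx_name` and the wrapper `convert_md_files` are my own, and the pandoc path is simply the one from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the .docx name from an input path.
docx_name() {
  printf '%s\n' "${1%.*}.docx"   # %.* drops the last ".ext" from the path
}

# Same loop as in the question, with the expansion fixed and "$f" quoted
# so that paths containing spaces survive.
convert_md_files() {
  local f out
  for f in "$@"; do
    case "$f" in
      *.md)
        out=$(docx_name "$f")
        /Users/myname/opt/anaconda3/bin/pandoc -o "$out" -f markdown -t docx "$f" &&
          open "$out"
        ;;
    esac
  done
}
```

In Automator you would end the script with `convert_md_files "$@"`, so that myfile.md produces myfile.docx.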
I'm using Pandoc to convert a bunch of DOCX files into RST.
pandoc -f docx -t rst file1.docx -o file1.rst --extract-media=.
pandoc -f docx -t rst file2.docx -o file2.rst --extract-media=.
pandoc -f docx -t rst file3.docx -o file3.rst --extract-media=.
...
Images within each file are being extracted into the media directory as expected (media/image1.png, media/image2.png, ...), but my problem is that images from each file overwrite those from the previous one.
The solution I have so far is basically to convert each file into a separate directory:
mkdir file1
pandoc -f docx -t rst file1.docx -o file1/file.rst --extract-media=file1
mkdir file2
pandoc -f docx -t rst file2.docx -o file2/file.rst --extract-media=file2
mkdir file3
pandoc -f docx -t rst file3.docx -o file3/file.rst --extract-media=file3
...
Is there any option or way to have all images in the same directory? Maybe some kind of media prefix?
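I don't know of a media-prefix option, so treat the following as a workaround sketch rather than an answer. It keeps every image under one shared top-level directory by giving each document its own subfolder there, while the .rst files stay side by side. It assumes pandoc keeps placing DOCX images in a media/ subfolder of whatever directory is passed to --extract-media, as in the commands above, so file1.docx would end up with images/file1/media/image1.png. The directory name `images` and both function names are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: "images" and both function names are placeholders.

# Per-document extraction directory under one shared root.
media_dir_for() {
  local base=${1##*/}                 # strip any leading path
  printf 'images/%s\n' "${base%.*}"   # strip the extension
}

# Convert every .docx in the current directory to .rst next to it.
convert_all_docx() {
  local f
  for f in ./*.docx; do
    [ -e "$f" ] || continue           # no .docx files: the glob stayed literal
    pandoc -f docx -t rst "$f" -o "${f%.*}.rst" \
      --extract-media="$(media_dir_for "$f")"
  done
}
```

Running `convert_all_docx` in the folder with the DOCX files replaces the repeated mkdir/pandoc pairs, and no document can overwrite another document's images.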
Using pandoc, it is easy to convert an XML DocBook file to reST (reStructuredText) using the command:
pandoc -f docbook -t rst path_to_xml_file
Is it possible to convert a whole folder of XML DocBook files to reST using pandoc?
You can use a simple shell loop in the directory that contains your DocBook .xml files:
for FILENAME in *.xml; do pandoc -f docbook -t rst -o "${FILENAME%.xml}.rst" "$FILENAME"; done
Note: I assumed your DocBook files have the .xml extension.
So I have 20 subfolders full of files in my main folder and have around 200 files in every subfolder. I've been trying to write a script to convert every picture in every subfolder to DNG.
I have done some research and was able to batch convert images from the current folder.
I've tried developing the idea to get it to work for subfolders, but with no success.
Here is the code I've written:
for D in 'find . -type d'; do for i in *.RW2; do sips -s format jpeg $i --out "${i%.*}.jpg"; cd ..; done; done;
The easiest and fastest way to do this is with GNU Parallel like this:
find . -iname \*rw2 -print0 | parallel -0 sips -s format jpeg --out {.}.jpg {}
because that will use all your CPU cores in parallel. But before you launch any commands you haven't tested, it is best to use the --dry-run option like this so that it shows you what it is going to do, but without actually doing anything:
find . -iname \*rw2 -print0 | parallel --dry-run -0 sips -s format jpeg --out {.}.jpg {}
Sample Output
sips -s format jpeg --out ./sub1/b.jpg ./sub1/b.rw2
sips -s format jpeg --out ./sub1/a.jpg ./sub1/a.RW2
sips -s format jpeg --out ./sub2/b.jpg ./sub2/b.rw2
If you like the way it looks, remove the --dry-run and run it again. Note that the -iname parameter means it is insensitive to upper/lower case, so it will work for ".RW2" and ".rw2".
GNU Parallel is easily installed on macOS via homebrew with:
brew install parallel
It can also be installed without a package manager (like homebrew) because it is actually a Perl script and Macs come with Perl. So you can install by doing this in Terminal:
(wget pi.dk/3 -qO - || curl pi.dk/3/) | bash
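If you would rather not install anything, the same traversal can be done with plain find, one file at a time. This is only a sketch: it is slower because nothing runs in parallel, and the names `jpg_name` and `convert_rw2_tree` are hypothetical. The helper shows what GNU Parallel's {.} expands to, namely the input path without its extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring {.}.jpg: the path minus its extension, plus .jpg.
jpg_name() {
  printf '%s\n' "${1%.*}.jpg"
}

# Walk all subfolders below the given directory (default: current one)
# and convert each RW2 file sequentially with sips.
convert_rw2_tree() {
  find "${1:-.}" -type f -iname '*.rw2' -exec sh -c '
    for f in "$@"; do
      sips -s format jpeg "$f" --out "${f%.*}.jpg"
    done' sh {} +
}
```

For example, `jpg_name ./sub1/a.RW2` prints `./sub1/a.jpg`, which matches the dry-run output above, and `convert_rw2_tree .` does the actual conversion.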
Your question seems confused as to whether you want DNG files like your title suggests, or JPEG files like your code suggests. My code generates JPEGs as it stands. If you want DNG output, you will need to install Adobe DNG Converter, and then run:
find . -iname \*rw2 -print0 | parallel --dry-run -0 \"/Applications/Adobe DNG Converter.app/Contents/MacOS/Adobe DNG Converter\"
There are some other options you can append to the end of the above command:
-e will embed the original RW2 file in the DNG
-u will create the DNG file uncompressed
-fl will add fast load information to the DNG
DNG Converter seems happy enough to run multiple instances in parallel, but I did not test with thousands of files. If you run into issues, just run one job at a time by changing to parallel -j 1 ...
Adobe DNG Converter is easily installed under macOS using homebrew as follows:
brew install --cask adobe-dng-converter
Is there any way of getting the download links from a website and putting them in a text file,
so that I can download those files later with wget?
You need to download the source of the website. You can use wget link-of-the-website-you-want-to-grab-links-from for that. Then you can extract the links with sed like this: sed -n 's/.*href="\([^"]*\).*/\1/p' file
See this question for details.
With this you can download the .jpg files; instead of jpg you can use any file extension that appears in source_file. The list of download links will be in link.txt
grep -Po 'href=\"\/.+\.jpg' source_file | sed -n 's/href="\([^"]*\)/\1/p' >link.txt; wget -i link.txt
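A variant that avoids grep -P, which as far as I know the stock macOS grep does not support, is sketched below. It also copes with several links on one line, where the greedy sed pattern shown earlier would keep only the last href. The function names `extract_links` and `links_with_ext` are made up, and the sketch assumes the href values are double-quoted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every double-quoted href value found on stdin,
# one per line.
extract_links() {
  grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Keep only the links that end in the given extension (case-insensitive).
links_with_ext() {
  extract_links | grep -i "\.$1\$"
}
```

Typical use would be `links_with_ext jpg < source_file > link.txt` followed by `wget -i link.txt`. Relative links such as /img/a.jpg need to be turned into full URLs before wget can fetch them.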