Converting all files in a folder to md using pandoc on Mac - makefile

I am trying to convert an entire directory from html into markdown. The directory tree is quite tall, so there are files nested two and three levels down.
In answering this question, John MacFarlane suggested using the following Makefile:
TXTDIR=sources
HTMLS=$(wildcard *.html)
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown, $(HTMLS))
.PHONY : all
all : $(MDS)
$(TXTDIR) :
	mkdir $(TXTDIR)
$(TXTDIR)/%.markdown : %.html $(TXTDIR)
	pandoc -f html -t markdown -s $< -o $@
Now, this doesn't seem to go inside subdirectories. Is there any easy way to modify this so that it will process the entire tree?
I don't need this to be in make. All I'm looking for is a way of getting a mirror of the initial directory where each html file is replaced by the output of running pandoc on that file.
(I suspect something along these lines should help, but I'm far from confident that I won't break things if I try it on my own. I'm illiterate when it comes to GNU make.)

Since you mentioned you don't mind not using make, you can try bash.
I modified the code from this answer; run it in the parent directory:
find ./ -iname "*.md" -type f -exec sh -c 'pandoc "${0}" -o "${0%.md}.pdf"' {} \;
It worked when I tested it, so it should work for you.
In response to the comment "Any ideas how to specify the output folder?", this version takes html as the input format and writes md files to ./output, which must already exist:
find ./ -iname "*.html" -type f -exec sh -c 'pandoc "${0}" -o "./output/$(basename "${0%.html}").md"' {} \;
I have tested this and it works for me.
Edit: As per a comment, {} is the placeholder that find replaces with the path of each file it finds before running the command given to -exec. The \; marks the end of the -exec command. See here for more explanation.
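The second command above writes every result into a single ./output folder, so the source tree is flattened. Below is a minimal sketch of the mirroring the question asks for, assuming a POSIX shell and pandoc on the PATH. The function name mirror_html_to_md and its two arguments are illustrative and not part of pandoc:

```shell
# mirror_html_to_md SRC DST
# Hypothetical helper: convert every *.html file under SRC to Markdown,
# writing each result to the same relative path under DST.
# The body runs in a subshell (the parentheses), so the cd does not leak.
mirror_html_to_md() (
    src=$1
    dst=$2
    mkdir -p "$dst" || exit 1
    dst=$(cd "$dst" && pwd) || exit 1    # absolute path, survives the cd below
    cd "$src" || exit 1
    # Newline-separated names: fine for spaces, not for names containing newlines.
    find . -type f -name '*.html' | while IFS= read -r f; do
        rel=${f#./}                      # strip the leading "./"
        out=$dst/${rel%.html}.md         # swap the suffix, keep the subdirectories
        mkdir -p "$(dirname "$out")"
        pandoc -f html -t markdown -s "$f" -o "$out"
    done
)

# Usage sketch: mirror_html_to_md ./site ./site-markdown
```

The pandoc options are the ones from the Makefile in the question; only the path handling is new.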

This is how I did it!
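# Convert every Markdown file under INPUT_FOLDER to HTML under DIR/build,
# keeping the same relative paths.
find "${INPUT_FOLDER}" -type f -name '*.md' | while IFS= read -r item
do
    printf " %s\n" "$item"
    # Create the directory that will hold the output file.
    mkdir -p "${DIR}/build/$(dirname "$item")"
    pandoc "$item" -f markdown -t html -o "${DIR}/build/$item.html"
done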

I've created a python script for converting all files under a folder tree which have a given suffix. It's called Pandoc-Folder. It might be useful, so I've put it on github: https://github.com/andrewrproper/pandoc-folder
You can create a settings folder and file (YAML format), and then run it like this:
python pandoc-folder.py ./path/to/book/.pandoc-folder/settings-file.yml
There is an example-book folder, with matching .bat and .sh scripts that show how to convert the markdown from the example-book folder into a single output file.
I hope this might be useful to someone.

John MacFarlane's answer is almost right. However, one needs to create the subfolder for pandoc, in case it doesn't exist. This is how I'd do it:
TXTDIR=sources
HTMLS=$(wildcard *.html)
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown, $(HTMLS))
.PHONY : all
all : $(MDS)
# mkdir -p creates $(TXTDIR) and any missing subfolder, so no directory prerequisite is needed
$(TXTDIR)/%.markdown : %.html
	mkdir -p $(dir $@)
	pandoc -f html -t markdown -s $< -o $@

This is a solution using ipython:
from pathlib import Path
files = list(Path('.').rglob('*.html'))
for f in files:
    !pandoc -s {str(f)} -o {f.name.replace(".html", ".md")}
Note that you must run this from the directory that holds the HTML files. Because only the file name is used for the output, every .md file is written to that directory, even for HTML files found in subdirectories. Change the output path if you need something else.

Related

Bash: Go up and down directories

Dear stackoverflow community,
I am new to bash and I've got a problem regarding loops and directories (code below). I am working in the opensmile directory and want to look for .wav files in the subdirectory opensmile/test-audio/. If I change the directory in the "for" section to test-audio/*.wav, the loop probably finds the .wav files, but then the main action no longer has access to the necessary config file "IS10_paraling.conf". Within the main action the directories have to be written as they are after "-C", that is, without a "/" before the directory.
My loop works, if the wav files are inside the opensmile directory, but not inside a sub-directory. I would like to look for files in the subdirectory test-audio while the main-action still has access to all of the opensmile-directory.
So basically: How do I go up and down directories within a for loop?
Thank you very much in advance!
This works
#! /bin/bash
cd /usr/local/opensmile/
for f in *.wav;
do
/usr/local/opensmile/build/progsrc/smilextract/SMILExtract -C config/is09-13/IS10_paraling.conf -I $f -D output/$f.csv ;
done
This does not work
#! /bin/bash
cd /usr/local/opensmile/
for f in test-audio/*.wav;
do
/usr/local/opensmile/build/progsrc/smilextract/SMILExtract -C config/is09-13/IS10_paraling.conf -I $f -D output/$f.csv ;
done
Saying "this does not work", doesn't tell us anything. What happens? Is there an error message?
Nevertheless, your question was "So basically: How do I go up and down directories within a for loop?"
If I'm tempted to go up and down directories within a loop, I'll do it in a subshell, so that I can be sure that the next time I enter the loop I'll be where I was originally. So I'll put all my commands in ( ).
#! /bin/bash
cd /usr/local/opensmile/
CONFIG=$PWD/config
OUTPUT=$PWD/output
for f in test-audio/*.wav;
do
(
cd test-audio
/usr/local/opensmile/build/progsrc/smilextract/SMILExtract -C "$CONFIG/is09-13/IS10_paraling.conf" -I "$(basename "$f")" -D "$OUTPUT/$(basename "$f").csv"
)
done
though why one would need to do it for this case, I can't fathom
Instead of using a for loop, you could use find for this:
cd /usr/local/opensmile/ && find test-audio -type f -name "*.wav" -exec sh -c '/usr/local/opensmile/build/progsrc/smilextract/SMILExtract -C config/is09-13/IS10_paraling.conf -I "$1" -D "output/$(basename "$1").csv"' sh {} \;
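The question does not include the error message, so the following is only a guess at the cause: in the second loop, output/$f.csv expands to output/test-audio/NAME.wav.csv, and that directory may not exist. A small sketch of deriving the output name from the file name alone, so the loop can stay in the opensmile directory. The helper name csv_path_for is made up for illustration:

```shell
# csv_path_for WAV OUTDIR
# Hypothetical helper: map a .wav path to a .csv path directly inside OUTDIR,
# dropping leading directories such as "test-audio/".
csv_path_for() {
    printf '%s/%s.csv\n' "$2" "$(basename "$1")"
}

# Usage sketch, run from /usr/local/opensmile so the relative config path resolves:
#   for f in test-audio/*.wav; do
#       /usr/local/opensmile/build/progsrc/smilextract/SMILExtract \
#           -C config/is09-13/IS10_paraling.conf \
#           -I "$f" -D "$(csv_path_for "$f" output)"
#   done
```

With this approach there is no need to change directories inside the loop at all.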

shell-script - cd in all subdirectories of a directory, execute command on their files

I am new to bash and I am trying to cd into all subdirectories of a parent directory and execute a command on all the files these subdirectories contain. But it's not working.
for subdir in $parentdirectory
do
for file in $subdir
do
ngram - lm somefilename.lm - ppl file
done
done
There are many ways to do this, but one would require you to explicitly change into each directory. Assuming $parentdirectory is correctly initialized, you could look into something like:
for subdir in "${parentdirectory}"/*/ # every subdirectory of the parent
do
    cd "${subdir}" || continue # go into the subdir
    for file in * # glob expansion
    do
        ngram -lm somefilename.lm -ppl "${file}"
    done
    cd - > /dev/null # go back to where we started
done
Also have a look at the excellent Advanced Bash-Scripting Guide: http://tldp.org/LDP/abs/html/loops1.html
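The same idea can be wrapped in a function that uses a subshell instead of cd-ing back and forth, so the caller's working directory never changes. This is a sketch under the assumption that only the immediate subdirectories matter; the name for_each_file is illustrative:

```shell
# for_each_file PARENT CMD [ARGS...]
# Hypothetical helper: enter each immediate subdirectory of PARENT and run
# CMD ARGS... FILE once per regular file found there (dotfiles are skipped).
for_each_file() {
    parent=$1
    shift
    for subdir in "$parent"/*/; do
        [ -d "$subdir" ] || continue        # the glob matched nothing
        (
            cd "$subdir" || exit 1          # subshell: caller's directory is untouched
            for file in *; do
                [ -f "$file" ] || continue  # skip directories and an unmatched "*"
                "$@" "$file"
            done
        )
    done
}

# Usage sketch (use an absolute path for the model, because the command
# runs inside each subdirectory):
#   for_each_file "$parentdirectory" ngram -lm /path/to/somefilename.lm -ppl
```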
If you want to do this in less space, you could use find -exec.
Such as:
# add a file called foo into every subdirectory
find . -type d -exec sh -c 'touch "$0/foo"' {} \;
Or, if you wanted to append a string to each of those files you just created:
# find every file named foo and append 'ABC' to it
find . -type f -name foo -exec sh -c 'echo "ABC" >> "$0"' {} \;
The find -exec combo is an extremely powerful tool that can save you on a bit of directory / file navigation, and allows you to achieve what it sounds like is the desired functionality without having to play descend/ascend through the directory structure.
Also, as you can probably guess, this kind of thing can go horribly wrong if you're not careful, so use with great caution.

can I use sed or an equivalent to cleanup vala generated files?

The release tarball from the maintainer for this project contains vala generated c files.
I'm looking for a solution to look for the .vala files and remove the equivalent .c file
For example
directory\file1.vala
directory\file1.c
directory\file3.c
directory\subdirectory\file2.vala
directory\subdirectory\file2.c
directory\subdirectory\file4.c
From the above I want to delete file1.c and file2.c but not file3.c and file4.c
So reaching for the trusty find I can use
find . -name "*.vala" -exec ls {} \;
This will list all vala files.
Going slightly further I can change the output to .c via
find . -name "*.vala" | sed -e 's/.vala/.c/'
Now I need to go one step further and delete those .c files.
I suppose I could redirect the output into another file and then write a shell script to loop round each line and delete the file.
Any thoughts on a better way? Is there a better way to clean up vala generated files?
Append this to your sed command:
| xargs echo rm -v
If everything looks okay remove echo.
Cyrus' recommendation to use xargs is probably more appropriate, but if your find supports -exec ... + you can also use:
find . -name "*.vala" -exec bash -c 'echo "${@/vala/c}"' sh {} +
find . -name "*.vala" -exec bash -c 'rm "${@/vala/c}"' sh {} +
(You can use \; as well as +, but the extra shell invocations will make it pretty slow if you have a lot of files. ) Note that this is equivalent to your sed and suffers from the same problem as the sed solution in handling files with names like avala.vala.
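One way around the avala.vala problem is to strip only the trailing .vala suffix instead of replacing the first match. A minimal sketch, assuming a find that supports -exec ... +; the function name remove_generated_c is made up for illustration:

```shell
# remove_generated_c ROOT
# Hypothetical helper: for every NAME.vala under ROOT, delete the sibling
# NAME.c if it exists. ${f%.vala} removes only a trailing ".vala", so
# "avala.vala" maps to "avala.c" and other .c files are left alone.
remove_generated_c() {
    find "$1" -type f -name '*.vala' -exec sh -c '
        for f in "$@"; do
            c=${f%.vala}.c
            if [ -f "$c" ]; then
                rm -- "$c"
            fi
        done
    ' sh {} +
}

# Usage sketch: remove_generated_c .
```

To preview the effect first, replace rm with echo rm, as in the xargs suggestion above.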

Linux bash-script to run make in all subdirectories

I'm trying to write a bash-script in Linux which traverses the current directory and, in every subdirectory, it launches the existing makefile. It should work for each subdirectory, regardless of depth.
Some restrictions:
I cannot use Python;
I don't know in advance how many subdirectories and their names;
I don't know in advance the name of current directory;
the make command for each directory should only be launched if there is makefile in such folder.
Any ideas on how to do it?
Using -exec and GNU make
find . -type f \( -name 'GNUmakefile' -o -name 'makefile' -o -name 'Makefile' \) \
    -exec bash -c 'cd "$(dirname "$1")" && make' bash {} \;
Given that this is make-related. I'd try to use a makefile at the top-level instead of a script. Something like this:
MAKEFILES:=$(shell find . -mindepth 2 -name Makefile -type f)
DIRS:=$(foreach m,$(MAKEFILES),$(realpath $(dir $(m))))
.PHONY: all
all: $(DIRS)
.PHONY: $(DIRS)
$(DIRS):
	$(MAKE) -C $@
I'd accept what @MLSC says about using for with find, and that kind of applies here too: the problem is when you have a space in the directory name. However, in many cases that's not going to happen, and IMHO there are benefits in using a makefile instead of a script. (There might be a solution using make that can cope with spaces in the directory name, but I can't think of it off the top of my head.)
You can use this script https://gist.github.com/legeyda/8b2cf2c213476c6fe6e25619fe22efd0.
Example usage is:
foreach */ 'test -f Makefile && make'
This should work if you don't care about the execution order, or if the parent directory also has a Makefile.
#!/bin/bash
for f in $(find . -name Makefile); do
pushd $(dirname $f)
make
popd
done
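The loop above splits on whitespace, so it breaks on directory names that contain spaces. A sketch of a variant that reads one path per line instead, assuming no path contains a newline; the function name make_everywhere is illustrative:

```shell
# make_everywhere [ROOT]
# Hypothetical helper: run make in every directory under ROOT (default ".")
# that contains a GNUmakefile, makefile or Makefile.
make_everywhere() {
    find "${1:-.}" -type f \( -name GNUmakefile -o -name makefile -o -name Makefile \) |
    while IFS= read -r mf; do
        dirname "$mf"
    done | sort -u |      # one entry per directory, even if it holds two makefiles
    while IFS= read -r dir; do
        # Subshell keeps the cd local; stdin is redirected so make cannot
        # swallow the list of directories still waiting in the pipe.
        ( cd "$dir" && make < /dev/null ) || echo "make failed in $dir" >&2
    done
}

# Usage sketch: make_everywhere .
```

Directories are visited in the sort order of their paths, not in any dependency order.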

Copy all files with a certain extension from all subdirectories

Under unix, I want to copy all files with a certain extension (all excel files) from all subdirectories to another directory. I have the following command:
cp --parents `find -name \*.xls*` /target_directory/
The problems with this command are:
It copies the directory structure as well, and I only want the files (so all files should end up in /target_directory/)
It does not copy files with spaces in the filenames (which are quite a few)
Any solutions for these problems?
--parents is copying the directory structure, so you should get rid of that.
The way you've written this, the find executes, and the output is put onto the command line such that cp can't distinguish between the spaces separating the filenames, and the spaces within the filename. It's better to do something like
$ find . -name \*.xls -exec cp {} newDir \;
in which cp is executed for each filename that find finds, and passed the filename correctly. Here's more info on this technique.
Instead of all the above, you could use zsh and simply type
$ cp **/*.xls target_directory
zsh can expand wildcards to include subdirectories and makes this sort of thing very easy.
From all of the above, I came up with this version.
This version also works for me in the mac recovery terminal.
find ./ -name '*.xls' -exec cp -prv '{}' '/path/to/targetDir/' ';'
It will look in the current directory and recursively in all of the subdirectories for files with the xls extension. It will copy them all to the target directory.
cp flags are:
p - preserve attributes of the file
r - recursive
v - verbose (shows you what's being copied)
I had a similar problem. I solved it using:
find dir_name -name '*.mp3' -exec cp -vuni '{}' "../dest_dir" ";"
The '{}' and ";" make find run the copy once for each file.
I also had to do this myself. I did it via the --parents argument for cp:
find SOURCEPATH -name 'filename*.txt' -exec cp --parents {} DESTPATH \;
In 2022 the zsh solution also works in Bash on Linux, provided the globstar option is enabled (Bash 4 or later):
shopt -s globstar
cp **/*.extension /dest/dir
works as expected.
find [SOURCEPATH] -type f -name '[PATTERN]' |
while IFS= read -r P; do cp --parents "$P" [DEST]; done
You may remove the --parents, but there is a risk of collision if multiple files bear the same name.
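If the flat layout is what you want, the collision risk can at least be made visible. A sketch that copies without --parents and refuses to overwrite a name that is already taken; the function name copy_flat and its argument order are made up for illustration:

```shell
# copy_flat SRC PATTERN DEST
# Hypothetical helper: copy every file under SRC whose name matches PATTERN
# into DEST without recreating subdirectories. A file whose name is already
# taken in DEST is skipped and reported instead of being overwritten.
copy_flat() {
    mkdir -p "$3" || return 1
    find "$1" -type f -name "$2" | while IFS= read -r p; do
        target=$3/$(basename "$p")
        if [ -e "$target" ]; then
            echo "skipped (name collision): $p" >&2
        else
            cp -p "$p" "$target"
        fi
    done
}

# Usage sketch: copy_flat . '*.xls*' /target_directory
```

Which of two colliding files gets copied depends on the order in which find visits them, so check the reported names afterwards.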
On macOS Ventura 13.1, on zsh, I saw the following error when there were too many files to copy:
zsh: argument list too long: cp
I had to use the find command along with cp to get the files copied to my destination:
find ./module/*/src -name \*.java -print | while IFS= read -r filelocation; do cp "$filelocation" mydestinationlocation; done
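An alternative to the read loop is to let find build the cp command lines itself with -exec ... +, which splits the file list into batches that stay under the argument-length limit. A minimal sketch; the function name copy_batched is illustrative, and DEST is assumed to be an existing directory:

```shell
# copy_batched SRC PATTERN DEST
# Hypothetical helper: copy every file under SRC whose name matches PATTERN
# into the existing directory DEST, a batch of files per cp invocation.
copy_batched() {
    # In the inline script, $0 is DEST and "$@" is one batch of matching files.
    find "$1" -type f -name "$2" -exec sh -c 'cp -- "$@" "$0"' "$3" {} +
}

# Usage sketch: copy_batched ./module '*.java' mydestinationlocation
```

This starts far fewer cp processes than copying one file at a time, and it handles file names with spaces.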

Resources