xargs on retaining filename for batch html to text conversion - bash

I'm converting some html files to text using html2text and want to retain the name of the file name charliesheenwinning.html as charliesheenwinning.txt or even charliesheenwinning.html.txt .
find ./ -not -regex ".*\(png\|jpg\|gif\)$" -print0 | xargs -0 -L10 {} max-process=0 html2text {} -o ../potistotallywinning/{}.txt
Of course the last part -o is so wrong. How do I retain reusing the filename beyond the first argument to html2text? Can use a for in -exec, but how can I do it with xargs?
update
Ended up doing
find path/to/dir -type f -not -regex ".*\(gif\|png\|jpg\|jpeg\|mov\|pdf\|txt\)$" -print0 | xargs -0 -L10 --max-procs=0 -I {} html2text -o {}.txt {}
mkdir dir/w/textfiles
cp -r path/to/dir dir/w/textfiles
find dir/w/textfiles -type f -not -regex ".*txt$" -print0 | xargs -0 -L10 --max-procs=0 -I {} rm {}
Not the best .. but whatever..
[just in case you were wondering why it isn't just a simple -name '*html' in the find argument, this was a wget of a mediawiki .. ]

You should try to use basename:
$ man basename

I was facing the same problem – for the record, here's what I came up with to get substition into xargs:
seq 100 | xargs -I % -n 1 -P 16 bash -c 'echo % `sed "s/1/X/" <<< %`'
It will print something like this:
10 X0
3 3
12 X2
4 4
11 X1
1 X
15 X5

Related

Why does my xargs command with a pipe work only for a single file, but not multiple?

I am trying to pipe a few commands in a row; it works with a single file, but gives me an error once I try it on multiple files at once.
On a single file in my working folder:
find . -type f -iname "summary.5runs.*" -print0 | xargs -0 cut -f1-2 | head -n 2
#It works
Now I want to scan all files with a certain prefix/suffix in the name in all subdirectories of my working folder, then write the results to text file
find . -type f -iname "ww.*.out.txt" -print0 | xargs -0 cut -f3-5 | head -n 42 > summary.5runs.txt
#Error: xargs: cut: terminated by signal 13
I guess my problem is to reiterate through multiple files, but I am not sure how to do it.
Your final head stops after 42 lines of total output, but you want it to operate per file. You could fudge around with a subshell in xargs:
xargs -0 -I{} bash -c 'cut -f3-5 "$1" | head -n 42' _ {} > summary.5runs.txt
or you could make it part of an -exec action:
find . -type f -iname "ww.*.out.txt" \
-exec bash -c 'cut -f3-5 "$1" | head -n 42' _ {} \; > summary.5runs.txt
Alternatively, you could loop over all the files in the subshell so you have to spawn just one:
find . -type f -iname "ww.*.out.txt" \
-exec bash -c 'for f; do cut -f3-5 "$f" | head -n 42; done' _ {} + \
> summary.5runs.txt
Notice the {} + instead of {} \;.

How to count files in subdir and filter output in bash

Hi hoping someone can help, I have some directories on disk and I want to count the number of files in them (as well as dir size if possible) and then strip info from the output. So far I have this
find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l) "{}"' | sort -n
This gets me all the dir's that match my pattern as well as the number of files - great!
This gives me something like
2 ./bob/sourceimages/psd/dzv_body.psd,d
2 ./bob/sourceimages/psd/dzv_body_nrm.psd,d
2 ./bob/sourceimages/psd/dzv_body_prm.psd,d
2 ./bob/sourceimages/psd/dzv_eyeball.psd,d
2 ./bob/sourceimages/psd/t_zbody.psd,d
2 ./bob/sourceimages/psd/t_gear.psd,d
2 ./bob/sourceimages/psd/t_pupil.psd,d
2 ./bob/sourceimages/z_vehicles_diff.tga,d
2 ./bob/sourceimages/zvehiclesa_diff.tga,d
5 ./bob/sourceimages/zvehicleswheel_diff.jpg,d
From that I would like to filter based on max number of files so > 4 for example, I would like to capture filetype as a variable for each remaining result e.g ./bob/sourceimages/zvehicleswheel_diff.jpg,d
I guess I could use awk for this?
Then finally I would like like to remove all the results from disk, with find I normally just do something like -exec rm -rf {} \; but I'm not clear how it would work here
Thanks a lot
EDITED
While this is clearly not the answer, these commands get me the info I want in the form I want it. I just need a way to put it all together and not search multiple times as that's total rubbish
filetype=$(find . -type d -name "*,d" -print0 | awk 'BEGIN { FS = "." }; {
print $3 }' | cut -d',' -f1)
filesize=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'du -h
{};' | awk '{ print $1 }')
filenumbers=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c
'echo -e $(find "{}" | wc -l);')
files_count=`ls -keys | nl`
For instance:
ls | nl
nl printed numbers of lines

How to combine these 2 commands into single line command

I have 2 useful bash command below, but i want to combine it together.
Is it possible to do ?
find "$1" -type f -print0 | xargs -0 sha1sum -b
find "$1" -type f ! -iname '*thumbs.db*' -print0 | xargs -0 stat -c "%y %s %n"
If you want to write it in one line, you can just use "&" to combine the commands. Maybe this is what you meant:
find "$1" -type f -print0 | xargs -0 sha1sum -b & find "$1" -type f ! -iname '*thumbs.db*' -print0 | xargs -0 stat -c "%y %s %n"

Bash Script interactive mv issues

I'm working on a bash script to help organize files and I want to use mv -i to make sure I don't write over something important.
The script is working right now except for the -i for the mv.
It shows (y/n [n]) not overwritten part, but then goes and and doesn't allow me to interact with it.
createList()
{
ls *.epub | sed 's/-.*//' |uniq >> list.txt
ls *.mobi | sed 's/-.*//' |uniq >> list2.txt
}
atag()
{
find /Users/j/Desktop/Source -maxdepth 1 -iname "*.epub" -type f -print0 | xargs -0 -I '{}' tag -a Purple {}
find /Users/j/Desktop/Source -maxdepth 1 -iname "*.mobi" -type f -print0 | xargs -0 -I '{}' tag -a Purple {}
}
moveEpub()
{
while read -r line; do
if [ -d "/Users/j/Desktop/Dest/$line" ]; then
if [ -d "/Users/j/Desktop/Dest/$line/EPUB" ]; then
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv -i {} /Users/j/Desktop/Dest/"$line"/EPUB/
else
mkdir "/Users/j/Desktop/Dest/$line/EPUB"
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv -i {} /Users/j/Desktop/Dest/"$line"/EPUB/
fi
fi
done < "list.txt"
}
moveMobi()
{
while read -r line; do
if [ -d "/Users/j/Desktop/Dest/$line" ]; then
if [ -d "/Users/j/Desktop/Dest/$line/MOBI" ]; then
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv -i {} /Users/j/Desktop/Dest/"$line"/MOBI/
else
mkdir "/Users/j/Desktop/Dest/$line/MOBI"
find /Users/j/Desktop/Source/ -maxdepth 1 -iname "*$line*" -and ! -iname ".*$line*" -type f -print0 | xargs -0 -I '{}' mv --interactive {} /Users/j/Desktop/Dest/"$line"/MOBI/
fi
fi
done < "list2.txt"
}
clear
createList
atag
moveEpub
moveMobi
rm list.txt
rm list2.txt
If you want mv -i to interact with the terminal, that means its stdin needs to be attached to that terminal. There are several places, here, where you're overriding stdin.
For instance:
# THIS LOOP OVERRIDES STDIN
while read -r line
...
done <list.txt
...redirects stdin for the entire duration of the loop, so instead of reading from the user, mv reads from list.txt. To change this, use a different file descriptor:
# This loop uses FD 3 for stdin
while read -r line <&3
...
done 3<list.txt
Another place is in calling xargs. Instead of:
# Overrides stdin for xargs and mv to contain output from find
find ... -print0 | xargs -0 -I '{}' mv -i '{}' "$dest"
...use:
# directly executes mv from find, stdin not modified
find ... -exec mv -i '{}' "$dest" ';'
That said, I would suggest ditching list.txt and list2.txt altogether; you simply don't need them; for that matter, you don't need find either.
dest=/Users/j/Desktop/Dest
source=/Users/j/Desktop/Source
moveEpub() {
local -A finished=( ) # WARNING: This requires bash 4.0 or newer.
for name in *.epub; do
prefix=${name%%-*} # remove everything past the first dash
[[ ${finished[$prefix]} ]] && continue # skip if already done with this prefix
finished[$prefix]=1 # set flag to skip other files w/ this prefix
[[ -d $dest/$prefix ]] || continue # skip if no directory exists for this prefix
mkdir -p "$dest/$prefix/EPUB" # create destination if not existing
mv -i "$source"/*"$prefix"* "$dest/$prefix/EPUB"
done
}
You can use built in find action -exec instead of piping to xargs :
find /Users/j/Desktop/Source/ -maxdepth 1 \
-iname "*$line*" -and ! -iname ".*$line*" -type f \
-exec mv -i {} /Users/j/Desktop/Dest/"$line"/EPUB/ \;

find -exec with multiple commands

I am trying to use find -exec with multiple commands without any success. Does anybody know if commands such as the following are possible?
find *.txt -exec echo "$(tail -1 '{}'),$(ls '{}')" \;
Basically, I am trying to print the last line of each txt file in the current directory and print at the end of the line, a comma followed by the filename.
find accepts multiple -exec portions to the command. For example:
find . -name "*.txt" -exec echo {} \; -exec grep banana {} \;
Note that in this case the second command will only run if the first one returns successfully, as mentioned by #Caleb. If you want both commands to run regardless of their success or failure, you could use this construct:
find . -name "*.txt" \( -exec echo {} \; -o -exec true \; \) -exec grep banana {} \;
find . -type d -exec sh -c "echo -n {}; echo -n ' x '; echo {}" \;
One of the following:
find *.txt -exec awk 'END {print $0 "," FILENAME}' {} \;
find *.txt -exec sh -c 'echo "$(tail -n 1 "$1"),$1"' _ {} \;
find *.txt -exec sh -c 'echo "$(sed -n "\$p" "$1"),$1"' _ {} \;
Another way is like this:
multiple_cmd() {
tail -n1 $1;
ls $1
};
export -f multiple_cmd;
find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
in one line
multiple_cmd() { tail -1 $1; ls $1 }; export -f multiple_cmd; find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;
"multiple_cmd()" - is a function
"export -f multiple_cmd" - will export it so any other subshell can see it
"find *.txt -exec bash -c 'multiple_cmd "$0"' {} \;" - find that will execute the function on your example
In this way multiple_cmd can be as long and as complex, as you need.
Hope this helps.
There's an easier way:
find ... | while read -r file; do
echo "look at my $file, my $file is amazing";
done
Alternatively:
while read -r file; do
echo "look at my $file, my $file is amazing";
done <<< "$(find ...)"
Extending #Tinker's answer,
In my case, I needed to make a command | command | command inside the -exec to print both the filename and the found text in files containing a certain text.
I was able to do it with:
find . -name config -type f \( -exec grep "bitbucket" {} \; -a -exec echo {} \; \)
the result is:
url = git#bitbucket.org:a/a.git
./a/.git/config
url = git#bitbucket.org:b/b.git
./b/.git/config
url = git#bitbucket.org:c/c.git
./c/.git/config
I don't know if you can do this with find, but an alternate solution would be to create a shell script and to run this with find.
lastline.sh:
echo $(tail -1 $1),$1
Make the script executable
chmod +x lastline.sh
Use find:
find . -name "*.txt" -exec ./lastline.sh {} \;
Thanks to Camilo Martin, I was able to answer a related question:
What I wanted to do was
find ... -exec zcat {} | wc -l \;
which didn't work. However,
find ... | while read -r file; do echo "$file: `zcat $file | wc -l`"; done
does work, so thank you!
1st answer of Denis is the answer to resolve the trouble. But in fact it is no more a find with several commands in only one exec like the title suggest. To answer the one exec with several commands thing we will have to look for something else to resolv. Here is a example:
Keep last 10000 lines of .log files which has been modified in the last 7 days using 1 exec command using severals {} references
1) see what the command will do on which files:
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "echo tail -10000 {} \> fictmp; echo cat fictmp \> {} " \;
2) Do it: (note no more "\>" but only ">" this is wanted)
find / -name "*.log" -a -type f -a -mtime -7 -exec sh -c "tail -10000 {} > fictmp; cat fictmp > {} ; rm fictmp" \;
I usually embed the find in a small for loop one liner, where the find is executed in a subcommand with $().
Your command would look like this then:
for f in $(find *.txt); do echo "$(tail -1 $f), $(ls $f)"; done
The good thing is that instead of {} you just use $f and instead of the -exec … you write all your commands between do and ; done.
Not sure what you actually want to do, but maybe something like this?
for f in $(find *.txt); do echo $f; tail -1 $f; ls -l $f; echo; done
should use xargs :)
find *.txt -type f -exec tail -1 {} \; | xargs -ICONSTANT echo $(pwd),CONSTANT
another one (working on osx)
find *.txt -type f -exec echo ,$(PWD) {} + -exec tail -1 {} + | tr ' ' '/'
A find+xargs answer.
The example below finds all .html files and creates a copy with the .BAK extension appended (e.g. 1.html > 1.html.BAK).
Single command with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} cp -- "{}" "{}.BAK"
Multiple commands with multiple placeholders
find . -iname "*.html" -print0 | xargs -0 -I {} echo "cp -- {} {}.BAK ; echo {} >> /tmp/log.txt" | sh
# if you need to do anything bash-specific then pipe to bash instead of sh
This command will also work with files that start with a hyphen or contain spaces such as -my file.html thanks to parameter quoting and the -- after cp which signals to cp the end of parameters and the beginning of the actual file names.
-print0 pipes the results with null-byte terminators.
for xargs the -I {} parameter defines {} as the placeholder; you can use whichever placeholder you like; -0 indicates that input items are null-separated.
I found this solution (maybe it is already said in a comment, but I could not find any answer with this)
you can execute MULTIPLE COMMANDS in a row using "bash -c"
find . <SOMETHING> -exec bash -c "EXECUTE 1 && EXECUTE 2 ; EXECUTE 3" \;
in your case
find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
i tested it with a test file:
[gek#tuffoserver tmp]$ ls *.txt
casualfile.txt
[gek#tuffoserver tmp]$ find . -name "*.txt" -exec bash -c "tail -1 '{}' && ls '{}'" \;
testonline1=some TEXT
./casualfile.txt
Here is my bash script that you can use to find multiple files and then process them all using a command.
Example of usage. This command applies a file linux command to each found file:
./finder.sh file fb2 txt
Finder script:
# Find files and process them using an external command.
# Usage:
# ./finder.sh ./processing_script.sh txt fb2 fb2.zip doc docx
counter=0
find_results=()
for ext in "${#:2}"
do
# #see https://stackoverflow.com/a/54561526/10452175
readarray -d '' ext_results < <(find . -type f -name "*.${ext}" -print0)
for file in "${ext_results[#]}"
do
counter=$((counter+1))
find_results+=("${file}")
echo ${counter}") ${file}"
done
done
countOfResults=$((counter))
echo -e "Found ${countOfResults} files.\n"
echo "Processing..."
counter=0
for file in "${find_results[#]}"
do
counter=$((counter+1))
echo -n ${counter}"/${countOfResults}) "
eval "$1 '${file}'"
done
echo "All files have been processed."

Resources