Task: Scan for viruses with ClamAV and report if any infected files exist.
One-line script:
clamscan -ir --exclude=/proc --exclude=/sys --exclude=/dev / | grep "Infected files: [1-9].*" -z | mutt -s 'Viruses detected' -- email1@domain.com email2@domain.com email3@domain.com
Problem: The email is sent even when "clamscan ... | grep" returns empty output (no viruses found, Infected files: 0).
Sub-task: Write a bash script that does not use temporary files. Use only output redirection, and check whether the output is empty; if it is, mutt must not be executed.
You can't make it a one-liner without cheating.
The straightforward solution is to capture the output and use it if there was a match:
if output=$(clam etc | grep etc); then
    mutt etc <<<"$output"
fi
The cheat is to hide this functionality somehow:
mongrel () { # aka "mutt maybe"
    input=$(cat -)
    case $input in '') return 1;; esac
    mutt "$@" <<<"$input"
}
clam etc | grep etc | mongrel etc
If there is a lot of output, I would perhaps actually prefer a temporary file over keeping the results in memory; but if this is your assignment, I won't go there.
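The same idea can be generalized so that the guarded command is a parameter. This is only a minimal sketch: the name run_if_input is made up for illustration, tr stands in for mutt, and like mongrel it keeps all of the input in memory.

```bash
#!/usr/bin/env bash
# Hypothetical helper (name invented for this sketch): run the given
# command only when standard input is non-empty, feeding it that input.
run_if_input() {
    local input
    input=$(cat)                   # slurp stdin; trailing newlines are dropped
    [ -n "$input" ] || return 1    # nothing arrived: do not run the command
    printf '%s\n' "$input" | "$@"
}

# Harmless demonstration, with tr standing in for mutt:
printf 'Infected files: 2\n' | run_if_input tr a-z A-Z
printf '' | run_if_input tr a-z A-Z || echo "no input, command skipped"
```

With this in place the pipeline would read clam etc | grep etc | run_if_input mutt etc.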
Incidentally, the trailing wildcard in your grep regex isn't contributing any value -- unless it somehow helps your understanding (which I think it doesn't; more like it adds confusion) I would leave it out.
Only emailing the summary of the results is of dubious value -- to my mind, it would be better to send the entire report when there is an infection.
output=$(clamscan -ir --exclude=/proc --exclude=/sys --exclude=/dev /)
case $output in *"Infected files: "[1-9]*)
    mutt -s 'Viruses detected' -- email1@domain.com email2@domain.com email3@domain.com <<<"$output" ;;
esac
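To see how the case pattern decides, here is a small sketch that separates the test from the mailing. The function name has_infection is made up, and the sample strings only mimic the "Infected files:" summary line quoted in the question. Note that the bracket expression has to stay outside the quotes, otherwise it is matched as literal text.

```bash
#!/usr/bin/env bash
# Succeeds when the report text contains a non-zero "Infected files:" count.
# The [1-9] must sit outside the quotes; inside them it is a literal string.
has_infection() {
    case $1 in
        *"Infected files: "[1-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

has_infection "Infected files: 0" || echo "clean report: no mail"
has_infection "Infected files: 12" && echo "infected report: would send mail"
```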
Here is my code:
ls | grep -E '^application--[0-9]{4}-[0-9]{2}.tar.gz$' | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}' | xargs -r echo
ls | grep -E '^application--[0-9]{4}-[0-9]{2}.tar.gz$' | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}' | xargs -r rm
As you can see, it gets a list of files, shows it on screen (for logging purposes), and then deletes them.
The issue is that if a file is created between the execution of the first and second lines, I will delete a file without logging that fact.
Is there a way to create a script that will read the same pipe twice, so the awk result will be piped to both xargs echo and xargs rm commands?
I know I can use a file as a temporary buffer, but I would like to avoid that.
You can change your command to something like
touch example
ls example* | tee >(xargs rm)
I would prefer to avoid parsing ls:
while IFS= read -r file; do
    # find prints "./name", so strip the leading "./" before comparing
    if [[ "${file#./}" < "application--${CLEAR_DATE_LEVEL0}.tar.gz" ]]; then
        echo "Removing ${file}"
        rm "${file}"
    fi
done < <(find . -regextype egrep -regex "./application--[0-9]{4}-[0-9]{2}.tar.gz")
EDIT: An improvement:
As @tripleee mentioned in their answer, using rm -v avoids the additional echo, and also avoids printing a name when removing the file failed.
For your specific case, you don't need to read the pipe twice, you can just use rm -v to have rm itself also "echo" each file.
Also, in cases like this, it is better for shell scripts to use globs instead of grep ..., both for robustness and performance reasons.
And once you do that, even better: you can loop on the glob and not go through any pipes at all (even more robust in the general case, because there are even fewer places to worry "could a character in this be special to that program?", and might perform better because everything stays in one process):
for file in application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz
do
    if [[ "$file" < "application--${CLEAR_DATE_LEVEL0}.tar.gz" ]]
    then
        # echo "$file"
        # rm "$file"
        rm -v "$file"
    fi
done
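Inside [[ ]], the < operator compares strings lexicographically (using the current locale), which is why zero-padded YYYY-MM names sort correctly. A dry-run sketch of that comparison, with a made-up function name and made-up file names, and printing instead of removing:

```bash
#!/usr/bin/env bash
# Print the archive names that sort before the cutoff (dry run, nothing is removed).
# $1 is a cutoff such as 2021-03; the remaining arguments are file names.
list_older_than() {
    local cutoff=$1 name
    shift
    for name in "$@"; do
        if [[ "$name" < "application--${cutoff}.tar.gz" ]]; then
            printf '%s\n' "$name"
        fi
    done
}

list_older_than 2021-03 \
    application--2021-01.tar.gz \
    application--2021-03.tar.gz \
    application--2021-05.tar.gz
```

Only the 2021-01 archive is listed, because a name equal to the cutoff is not "less than" it.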
But if you find yourself in a situation where you really do need to get data from a pipe and a glob won't work, there are a couple ways:
One neat trick in the shell is that loops and other compound commands can be pipes - so a loop can read a pipe, and the inside of the loop can have all the commands you wanted to have read from the pipe:
ls ... | awk ... | while IFS="" read -r file
do
    # echo "$file"
    # rm "$file"
    rm -v "$file"
done
(As a general best practice, you'd want to set IFS= to the empty string for the read command so that read doesn't split the input on characters like spaces, and give read the -r argument to tell it to not interpret special characters like backslashes. In your specific case it doesn't matter.)
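A small sketch of what those two settings change, using a made-up input line that has surrounding spaces and a literal backslash:

```bash
#!/usr/bin/env bash
# One input line with leading/trailing spaces and a literal backslash.
sample='  a\tb  '

plain=$(printf '%s\n' "$sample" | { read line; printf '%s' "$line"; })
raw=$(printf '%s\n' "$sample" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain read:   [%s]\n' "$plain"   # whitespace trimmed, backslash removed
printf 'IFS= read -r: [%s]\n' "$raw"     # line preserved exactly
```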
But if a loop doesn't work for what you need, then in the general case, you can catch the result of a pipe in a shell variable:
pipe_contents="$(ls application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}')"
echo "$pipe_contents"
rm $pipe_contents
(This works fine unless your pipe output contains characters that would be special to the shell at the point that the pipe output has to be unquoted - in this case, it needs to be unquoted for the rm, because if it's quoted then the shell won't split the captured pipe output on whitespace, and rm will end up looking for one big file name that looks like the entire pipe output. Part of why looping on a glob is more robust is that it doesn't have these kinds of problems: the pipe combines all file names into one big text that needs to be re-split on whitespace. Luckily in your case, your file names don't have whitespace nor globbing characters, so leaving the pipe output unquoted ends up being fine.)
Also, since you're using bash and your pipe data is multiple separate things, you can use an array variable (bash extension, also found in shells like zsh) instead of a regular variable:
files=($(ls application--[0-9][0-9][0-9][0-9]-[0-9][0-9].tar.gz | awk '{if($1<"application--'"${CLEAR_DATE_LEVEL0}"'.tar.gz") print $1}'))
echo "${files[@]}"
rm "${files[@]}"
(Note that an unquoted expansion still happens with the array, it just happens when defining the array instead of when passing the pipe contents to rm. A small advantage is that if you had multiple commands which needed the unquoted contents, using an array does the splitting only once. A big advantage is that once you recognize array syntax, it does a better job of expressing your big-picture intent through the code itself.)
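If the names might contain spaces, one alternative (a swapped-in technique, not what the answer above uses) is bash's mapfile builtin, which fills an array with one element per input line and involves no unquoted expansion. It needs bash 4 or later, so the stock bash on macOS will not have it. In this sketch printf stands in for the real pipe and the file names are made up:

```bash
#!/usr/bin/env bash
# printf stands in for "ls ... | awk ..."; the file names are made up.
mapfile -t files < <(printf '%s\n' 'application--2021-01.tar.gz' 'name with spaces.tar.gz')

echo "${#files[@]} entries"      # one element per input line
printf '<%s>\n' "${files[@]}"    # each name stays a single word
```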
You can also use a temporary file instead of a shell variable, but you said you want to avoid that. I also prefer a variable when the data fits in memory because Linux/UNIX does not give shell scripts a reliable way to clean up external resources (you can use trap but for example traps can't run on uncatchable signals).
P.S. Ideally, as a general habit, you should use printf '%s\n' "$foo" instead of echo "$foo", because echo has various special cases (and portability inconsistencies, but that doesn't matter as much if you always use bash until you need to care about portable sh). In modern featureful shells like bash, you can also use %q instead of %s in printf, which is great because for example printf '%q\n' "${files[@]}" will actually print each file with any special characters properly quoted or escaped, which can help with debugging if you ever are dealing with files that have special whitespace or globbing characters in them.
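A quick way to see the difference, using an array with made-up names, one of which contains a space:

```bash
#!/usr/bin/env bash
files=('plain.tar.gz' 'two words.tar.gz')   # made-up names

echo "${files[@]}"            # element boundaries are lost in the output
printf '%q\n' "${files[@]}"   # one shell-quoted name per line
```

With echo you cannot tell two elements from three; with %q the space is shown escaped, so each line is unambiguously one name.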
No, a pipe is a stream - once you read something from it, it is forever gone from the pipe.
A good general solution is to use a temporary file; this lets you rewind and replay it. Just take care to remove it when you're done.
temp=$(mktemp -t) || exit
trap 'rm -f "$temp"' ERR EXIT
cat >"$temp"
cat "$temp"
xargs rm <"$temp"
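The ERR pseudo-signal is not in POSIX (it is a Bash extension), and although EXIT is, shells differ on whether the EXIT trap runs when the shell is terminated by a signal. For POSIX portability, you need a somewhat more involved set of trap commands.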
Properly speaking, mktemp should receive an argument which is used as a template for the temporary file's name, so that the user can see which temporary file belongs to which tool. For example, if this script was called rmsponge, you could use mktemp rmspongeXXXXXXXXX to have mktemp generate a temporary file name which begins with rmsponge.
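Putting the pieces together, here is a sketch of a function that buffers its standard input in a named temporary file and then reads it twice. The function name replay_twice is made up, the rmsponge template follows the example above, and the second pass only prints what it would remove instead of calling rm.

```bash
#!/usr/bin/env bash
# Buffer stdin in a temp file, then read it twice: once to log, once to act.
# The body is a subshell ( ... ) so the EXIT trap is scoped to this call.
replay_twice() (
    temp=$(mktemp "${TMPDIR:-/tmp}/rmsponge.XXXXXXXX") || exit
    trap 'rm -f "$temp"' EXIT
    cat >"$temp"
    cat "$temp"                                # first pass: log the names
    while IFS= read -r name; do                # second pass: act on them
        printf 'would remove: %s\n' "$name"    # dry run; swap in rm when ready
    done <"$temp"
)

printf '%s\n' old-1.tar.gz old-2.tar.gz | replay_twice
```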
If you only expect a limited amount of input, perhaps just capture the input in a variable. However, this scales poorly, and could have rather unfortunate problems if the input data exceeds available memory:
# XXX avoid: scales poorly
values=$(cat)
xargs printf "%s\n" <<<"$values"
xargs rm <<<"$values"
The <<< "here string" syntax is also a Bash extension. This also suffers from the various issues from https://mywiki.wooledge.org/BashFAQ/020 but this is inherent to your problem articulation.
Of course, in this individual case, just use rm -v to see which files rm removes.
I'm trying to write a shell script that deletes duplicate commands from my zsh_history file. Having no real shell script experience and given my C background I wrote this monstrosity that seems to work (only on Mac though), but takes a couple of lifetimes to end:
currentLines=$(grep -c '^' $history)
echo "Currently handling a grand total of: $currentLines lines. Please stand by..."
while (( $currentLines - $contrastor > 0 ))
do
    wordToBeSearched=$(awk "NR==$currentLines - $contrastor" $history | cut -d ";" -f 2)
    echo "$wordToBeSearched A BUSCAR"
    while (( $currentLines - $contrastor - $searchdex > 0 ))
    do
        currentWord=$(awk "NR==$currentLines - $contrastor - $searchdex" $history | cut -d ";" -f 2)
        echo $currentWord
        if test "$currentWord" == "$wordToBeSearched"
        then
            sed -i .bak "$((currentLines - $contrastor - $searchdex)) d" $history
            currentLines=$(grep -c '^' $history)
            echo "Line deleted. New number of lines: $currentLines"
            let "searchdex--"
        fi
        let "searchdex++"
    done
    let "contrastor++"
done
I'm now looking for a less life-consuming approach using more shell-like conventions, mainly sed at this point. Thing is, zsh_history stores commands in a very specific way:
: 1652789298:0;man sed
Where the command itself is always preceded by ":0;".
I'd like to find a way to delete duplicate commands while keeping the last occurrence of each command intact and in order.
Currently I'm at a point where I have a functional line that will delete strange lines that find their way into the file (newlines and such):
#sed -i '/^:/!d' $history
But that's about it. I'm not really sure how to get the expression to look for into sed without falling back into everlasting while loops, or how to delete the duplicates while keeping the last-occurring command.
The zsh option hist_ignore_all_dups should do what you want. Just add setopt hist_ignore_all_dups to your zshrc.
I wanted something similar, but I don't care about preserving the last one as you mentioned. This just finds duplicates and removes them.
I used this command, then removed my .zsh_history and replaced it with the .zhistory that this command outputs.
So from your home folder:
cat -n .zsh_history | sort -t ';' -uk2 | sort -nk1 | cut -f2- > .zhistory
This gives you the file .zhistory containing the deduplicated list. In my case it went from 9000 lines to 3000; you can check it with wc -l .zhistory to count the number of lines it has.
Please double check and make a backup of your zsh history before doing anything with it.
The sort command might be able to be modified to sort by numerical value and somehow achieve what you want, but you will have to investigate further about that.
I found the script here, along with some commands to avoid saving duplicates in the future
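For keeping the last occurrence of each command in its original position, one possible approach is to read the file twice with awk: the first pass records the last line number of every command, and the second prints only those lines. This is a sketch under assumptions. The function name dedupe_keep_last is made up, it treats everything after the first ";" as the command (matching the ": 1652789298:0;man sed" format from the question), and it does not handle multi-line history entries.

```bash
#!/usr/bin/env bash
# Print the history file $1 with duplicate commands removed, keeping only
# the LAST occurrence of each command, in the original order.
dedupe_keep_last() {
    awk '
        { cmd = $0; sub(/^[^;]*;/, "", cmd) }   # command = text after the first ";"
        NR == FNR { last[cmd] = FNR; next }     # pass 1: remember the last line number
        last[cmd] == FNR                        # pass 2: print only that line
    ' "$1" "$1"
}
```

It writes to standard output and leaves the input untouched, so a cautious invocation would be dedupe_keep_last ~/.zsh_history > ~/.zsh_history.new, followed by inspecting the result before replacing anything.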
I didn't want to rename the history file.
# dedupe_lines.zsh
if [ $# -eq 0 ]; then
    echo "Error: No file specified" >&2
    exit 1
fi
if [ ! -f "$1" ]; then
    echo "Error: File not found" >&2
    exit 1
fi
sort "$1" | uniq >temp.txt
mv temp.txt "$1"
Add dedupe_lines.zsh to your home directory, then make it executable.
chmod +x dedupe_lines.zsh
Run it.
./dedupe_lines.zsh .zsh_history
I'm using this script to monitor the downloads folder for new .bin files being created. However, it doesn't seem to be working. If I remove the grep, I can make it copy any file created in the Downloads folder, but with the grep it's not working. I suspect the problem is how I'm trying to compare the two values, but I'm really not sure what to do.
inotifywait -m --format %f -e create $downloadDir -q | \
while read line; do
    if [ $(ls $downloadDir -a1 | grep '[^.].*bin' | head -1) == $line ]; then
        cp "$downloadDir/$line" "$mbedDir/$line"
    fi
done
The ls $downloadDir -a1 | grep '[^.].*bin' | head -1 is the wrong way to go about this. To see why, suppose you had files named a.txt and b.bin in the download directory, and then c.bin was added. inotifywait would print c.bin, ls would print a.txt\nb.bin\nc.bin (with actual newlines, not \n), grep would thin that to b.bin\nc.bin, head would remove all but the first line leaving b.bin, which would not match c.bin. You need to be checking $line to see if it ends in .bin, not scanning a directory listing. I'll give you three ways to do this:
First option, use grep to check $line, not the listing:
if echo "$line" | grep -q '[.]bin$'; then
Note that I'm using the -q option to suppress grep's output, and instead simply letting the if command check its exit status (success if it found a match, failure if not). Also, the RE is anchored to the end of the line, and the period is in brackets so it'll only match an actual period (normally, . in a regular expression matches any single character). \.bin$ would also work here.
Second option, use the shell's ability to edit variable contents to see if $line ends in .bin:
if [ "${line%.bin}" != "$line" ]; then
The "${line%.bin}" part gives the value of $line with .bin trimmed from the end if it's there. If that's not the same as $line itself, then $line must've ended with .bin.
Third option, use bash's [[ ]] expression to do pattern matching directly:
if [[ "$line" == *.bin ]]; then
This is (IMHO) the simplest and clearest of the bunch, but it only works in bash (i.e. you must start the script with #!/bin/bash).
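If the script has to run under a plain POSIX sh as well, a case statement gives the same pattern match without [[ ]]. This is a fourth variant, not one of the three options above, and the function name is_bin_file is made up:

```bash
#!/usr/bin/env bash
# Portable suffix test: case patterns are plain sh, no [[ ]] needed.
is_bin_file() {
    case $1 in
        *.bin) return 0 ;;
        *) return 1 ;;
    esac
}

is_bin_file "firmware.bin" && echo "firmware.bin: copy it"
is_bin_file "notes.txt" || echo "notes.txt: ignore it"
```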
Other notes: to avoid some possible issues with whitespace and backslashes in filenames, use while IFS= read -r line; do and follow @shellter's recommendation about double-quotes religiously.
Also, I'm not very familiar with inotifywait, but AIUI its -e create option will notify you when the file is created, not when its contents are fully written out. Depending on the timing, you may wind up copying partially-written files.
Finally, you don't have any checking for duplicate filenames. What should happen if you download a file named foo.bin, it gets copied, you delete the original, then download a different file named foo.bin? As the script is now, it'll silently overwrite the first foo.bin. If this isn't what you want, you should add something like:
if [ ! -e "$mbedDir/$line" ]; then
    cp "$downloadDir/$line" "$mbedDir/$line"
elif ! cmp -s "$downloadDir/$line" "$mbedDir/$line"; then
    echo "Eeek, a duplicate filename!" >&2
    # or possibly something more constructive than that...
fi
I have a folder with backups from a MySQL database that are created automatically. Their name consists of the date the backup was made, like so:
What is a way to get the filename of the last file in the list, i.e. of the one which in alphabetical order comes last?
In a shell script, I would like to do something like
ls -1 | tail -n 1
If you want to assign this to a variable, use $(...) or backticks.
FILE=`ls -1 | tail -n 1`
FILE=$(ls -1 | tail -n 1)
@Sjoerd's answer is correct; I'll just pick a few nits from it:
you don't need the -1 option to enforce one path per line if you pipe the output somewhere:
ls | tail -n 1
you can use -r to get the listing in reverse order, and take the first one:
ls -r | head -n 1
gunzip some.log.gz will write uncompressed data into some.log and remove some.log.gz, which may or may not be what you want (probably isn't). If you want to keep the compressed source, pipe it into gunzip:
gunzip < some.file.gz
you might want to protect the script against the situation where the dir contains no files, since
gunzip $empty_variable
expands to just
gunzip
and such an invocation will wait indefinitely for data on standard input:
latest="$(ls -r /some/where/*.gz | head -1)"
if test -z "$latest"; then
    # there's no logs yet, bail out
    exit
fi
gunzip < "$latest"
ls can yield unexpected results when parsed by other commands if the filenames have unusual characters. The following always works:
for LAST_BACKUP_FILE in *; do : ; done
for LAST_BACKUP_FILE in * loops through every filename (and folder name, if there are any) in order in the current directory, storing each in $LAST_BACKUP_FILE
do : does nothing
done finishes after the last file
Now, the last file is stored in $LAST_BACKUP_FILE.
If you happen to want the first file, use this:
for FIRST_BACKUP_FILE in *; do break; done
The break statement jumps out of the loop after the first file is stored in $FIRST_BACKUP_FILE.
(from comment below) If you want hidden files included in the search, then use the command shopt -s dotglob before running the loops.
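In bash, the same "let the glob do the sorting" idea can also be written with an array, which makes the empty-directory case explicit. A sketch with a made-up function name, last_entry; it relies on the glob expanding in sorted order and on the nullglob option so that an empty directory produces an empty array rather than a literal *:

```bash
#!/usr/bin/env bash
# Print the name that sorts last in directory $1; fail if the directory is empty.
# The body is a subshell so that cd and shopt do not leak into the caller.
last_entry() (
    shopt -s nullglob
    cd "$1" || exit
    entries=(*)
    (( ${#entries[@]} > 0 )) || exit 1
    printf '%s\n' "${entries[${#entries[@]}-1]}"
)
```

Usage would look like LAST_BACKUP_FILE=$(last_entry /path/to/backups) || echo "no backups yet", where the path is a placeholder.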
The shell is more powerful than many think. Just let it work for you. Assuming filenames without spaces,
set -- $(ls -r *.gz)
does the trick with a single fork, no pipes, and you can even avoid the fork if your shell supports arithmetic expansion as in
set -- *.gz
shift $(($# - 1))
I have a perl script (or any executable) E which will take a file foo.xml and write a file foo.txt. I use a Beowulf cluster to run E for a large number of XML files, but I'd like to write a simple job server script in shell (bash) which doesn't overwrite existing txt files.
I'm currently doing something like
PATTERN="[A-Z]*0[1-2][a-j]"; # this matches foo in all cases
todo=`ls *.xml | grep $PATTERN -o`;
isdone=`ls *.txt | grep $PATTERN -o`;
whatsleft=todo - isdone; # what's the unix magic?
#tack on the .xml prefix with sed or something
#and then call the job server;
jobserve E "$whatsleft";
and then I don't know how to get the difference between $todo and $isdone. I'd prefer using sort/uniq to something like a for loop with grep inside, but I'm not sure how to do it (pipes? temporary files?)
As a bonus question, is there a way to do lookahead search in bash grep?
To clarify/extend the problem:
I have a bunch of programs that take input from sources like (but not necessarily) data/{branch}/special/{pattern}.xml and write output to another directory results/special/{branch}-{pattern}.txt (or data/{branch}/intermediate/{pattern}.dat, e.g.). I want to check in my jobfarming shell script if that file already exists.
So E transforms data/{branch}/special/{pattern}.xml->results/special/{branch}-{pattern}.dat, for instance. I want to look at each instance of the input and check if the output exists. One (admittedly simpler) way to do this is just to touch *.done files next to each input file and check for those results, but I'd rather not manage those, and sometimes the jobs terminate improperly so I wouldn't want them marked done.
N.B. I don't need to check concurrency yet or lock any files.
So a simple, clear way to solve the above problem (in pseudocode) might be
for i in `/bin/ls *.xml`
replace xml suffix with txt
if [that file exists]
add to whatsleft list
but I'm looking for something more general.
shopt -s extglob # allow extended glob syntax, for matching the filenames
LC_COLLATE=C # use a sort order comm is happy with
IFS=$'\n' # so filenames can have spaces but not newlines
# (newlines don't work so well with comm anyhow;
# shame it doesn't have an option for null-separated
# input lines).
files_todo=( **([A-Z])0[1-2][a-j]*.xml )
files_done=( **([A-Z])0[1-2][a-j]*.txt )
files_remaining=( \
$(comm -23 --nocheck-order \
<(printf "%s\n" "${files_todo[@]%.xml}") \
<(printf "%s\n" "${files_done[@]%.txt}") ))
echo jobserve E $(for f in "${files_remaining[@]%.xml}"; do printf "%s\n" "${f}.txt"; done)
This assumes that you want a single jobserve E call with all the remaining files as arguments; it's rather unclear from the specification if such is the case.
Note the use of extended globs rather than parsing ls, which is considered very poor practice.
To transform input to output names without using anything other than shell builtins, consider the following:
if [[ $in_name =~ data/([^/]+)/special/([^/]+).xml ]] ; then
    out_name=results/special/${BASH_REMATCH[1]}-${BASH_REMATCH[2]}.txt
else
    : # ...handle here the fact that you have a noncompliant name...
fi
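As a self-contained sketch of that mapping, wrapped in a function so it can be tried on its own: the function name out_name_for is made up, the layout is the data/{branch}/special/{pattern}.xml to results/special/{branch}-{pattern}.txt scheme from the question, and this version anchors the pattern and escapes the dot, which is slightly stricter than the fragment above.

```bash
#!/usr/bin/env bash
# Map data/{branch}/special/{pattern}.xml to results/special/{branch}-{pattern}.txt
out_name_for() {
    local re='^data/([^/]+)/special/([^/]+)\.xml$'
    if [[ $1 =~ $re ]]; then
        printf 'results/special/%s-%s.txt\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1   # name does not follow the expected layout
    fi
}

out_name_for data/main/special/foo.xml   # "main" and "foo" are made-up values
```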
The question title suggests that you might be looking for:
set -o noclobber
The question content indicates a wholly different problem!
It seems you want to run 'jobserve E' on each '.xml' file without a matching '.txt' file. You'll need to assess the TOCTOU (Time of Check, Time of Use) problems here because you're in a cluster environment. But the basic idea could be:
for file in *.xml
do [ -f ${file%.xml}.txt ] || todo="$todo $file"
done
jobserve E $todo
This will work with Korn shell as well as Bash. In Bash you could explore making 'todo' into an array; that will deal with spaces in file names better than this will.
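A sketch of the array variant for bash, wrapped in a function so that it only prints the pending names instead of calling the job server. The function name pending_xml is made up, and nullglob is an added assumption so that a directory without .xml files yields nothing:

```bash
#!/usr/bin/env bash
# Print every .xml in directory $1 that has no matching .txt next to it.
# The body is a subshell so that cd and shopt do not leak into the caller.
pending_xml() (
    shopt -s nullglob            # no .xml files: the loop simply does not run
    cd "$1" || exit
    todo=()
    for file in *.xml; do
        [ -f "${file%.xml}.txt" ] || todo+=("$file")
    done
    if (( ${#todo[@]} > 0 )); then
        printf '%s\n' "${todo[@]}"
    fi
)
```

Because the names are collected in an array and printed one per line, file names with spaces survive; names containing newlines would still be a problem for any line-based consumer.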
If you have processes still generating '.txt' files for '.xml' files while you run this check, you will get some duplicated effort (because this script cannot tell that the processing is happening). If the 'E' process creates the corresponding '.txt' file as it starts processing it, that minimizes the chance of duplicated effort. Or, maybe consider separating the processed files from the unprocessed files, so the 'E' process moves the '.xml' file from the 'to-be-done' directory to the 'done' directory (and writes the '.txt' file to the 'done' directory too). If done carefully, this can avoid most of the multi-processing problems. For example, you could link the '.xml' to the 'done' directory when processing starts, and ensure appropriate cleanup with an 'atexit()' handler (if you are moderately confident your processing program does not crash). Or other trickery of your own devising.
whatsleft=$( ls *.xml *.txt | grep $PATTERN -o | sort | uniq -u )
Note this actually gets a symmetric difference.
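The difference matters when an output exists without a matching input. A sketch with made-up basenames, comparing uniq -u with comm -23 (which keeps only the lines unique to its first sorted input):

```bash
#!/usr/bin/env bash
todo=$(printf '%s\n' a b)        # made-up basenames of the inputs
finished=$(printf '%s\n' b c)    # made-up basenames of the outputs

# uniq -u keeps lines that occur exactly once: a (not done) and c (orphan output)
sym_diff=$(printf '%s\n' "$todo" "$finished" | sort | uniq -u)

# comm -23 keeps lines that appear only in the first sorted input: just a
only_todo=$(comm -23 <(printf '%s\n' "$todo" | sort) <(printf '%s\n' "$finished" | sort))

printf 'uniq -u:  %s\n' "$(echo $sym_diff)"
printf 'comm -23: %s\n' "$only_todo"
```

If every .txt is known to have a corresponding .xml, the two results coincide and the shorter uniq -u form is enough.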
I am not exactly sure what you want, but you can check for the existence of the file first and, if it exists, create a new name. (Or do this check in your E Perl script.)
if [ -f "$file" ];then
    jobserve E .... > $newname
fi
If it's not what you want, describe more clearly in your question what you mean by "don't overwrite files".
For posterity's sake, this is what I found to work:
ls *.xml | grep $PATTERN -o > $TMPA;
ls *.txt | grep $PATTERN -o > $TMPB;
whatsleft=`sort $TMPA $TMPB | uniq -u | sed 's/$/.xml/' | xargs`;