Making a GNU-Grep-Based Script macOS-Friendly

I recently discovered that mutt allows me to do something I've been trying, without success, to do in my GUI e-mail client for years: (largely) automate the process of saving an e-mail message (*.eml) to a local directory of my choice.
This Unix & Linux StackExchange post shared a rough-and-ready mutt macro for handling this process. As you'll see, however, the macro's grep commands reach for the -P flag (Perl-compatible regular expressions) and thus do not run on the MacBook I'm currently using:
#!/usr/bin/env zsh
#Save piped email to "$1/YYMMDD SUBJECT.eml"
# Don't overwrite existing file
set -o noclobber
message=$(cat)
mail_date=$(<<<"$message" grep -oPm 1 '^Date: ?\K.*')
formatted_date=$(date -d"$mail_date" +%y%m%d)
# Get the first line of the subject, and change / to ∕ so it's not a subdirectory
subject=$(<<<"$message" grep -oPm 1 '^Subject: ?\K.*' | sed 's,/,∕,g')
if [[ $formatted_date == '' ]]; then
echo Error: no date parsed
exit 1
elif [[ $subject == '' ]]; then
echo Warning: no subject found
fi
echo "${message}" > "$1/$formatted_date $subject.eml" && echo Email saved to "$1/$formatted_date $subject.eml"
I'm far from comfortable with complex grep queries, so my meager efforts to make this script work (e.g. swapping out the -P flag for the -e flag) have met with failure.
To wit, this is the error message thrown when I swap in the -e flag:
grep: 1: No such file or directory
grep: ^Date: ?\K.*: No such file or directory
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
grep: 1: No such file or directory
grep: ^Subject: ?\K.*: No such file or directory
Error: no date parsed
Press any key to continue...
Mercifully, the error messages here are pretty clear. The script's use of 1 appears to be faulty, as does the last bit of the anchored grep query (e.g. ^Date: ?\K.*).
Unfortunately, I have no idea how to begin to resolve these errors.
What I'm attempting to do is, in fact, quite simple. Rather than manually running | cat > FILE_PATH/email.eml I'd like to simply be able to hit a key in mutt, extract the selected e-mail's date (e.g. everything to the end-of-the-line after Date:) and subject (e.g. everything to the end-of-line after Subject), then use that information to generate the name of the *.eml file saved locally (e.g. YYYY-MM-DD subject.eml).
Does anyone have suggestions on how to make this script play nice on macOS?
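For what it's worth, the header extraction itself can be done portably with sed instead of grep -P, since BSD sed on macOS supports everything used here. A sketch of just the parsing (the sample message is made up; real input would come from mutt's pipe):

```shell
#!/bin/sh
# Hypothetical sample message standing in for mutt's piped input.
message='Date: Mon, 01 Jan 2024 10:00:00 +0000
Subject: Hello/World
Body text here'

# -n with the p flag prints only lines where the substitution succeeded;
# head -n 1 keeps just the first match, mimicking grep's -m 1.
mail_date=$(printf '%s\n' "$message" | sed -n 's/^Date: *//p' | head -n 1)
subject=$(printf '%s\n' "$message" | sed -n 's/^Subject: *//p' | head -n 1 | sed 's,/,-,g')

printf '%s\n' "$mail_date"
printf '%s\n' "$subject"
```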

One option is to use zsh parameter expansions to parse the values, so there's no need to worry about grep versions. As a bonus, this launches fewer subprocesses:
#!/usr/bin/env zsh
# Save piped email to "$1/YYMMDD SUBJECT.eml"
# Don't overwrite existing file
set -o noclobber
# stdin split into array of lines:
message=("${(@f)$(<&0)}")
mail_date=${${(M)message:#Date: *}[1]#Date: }
formatted_date=$(gdate -d"$mail_date" +%y%m%d)
# Get the subject, and change '/' to '-'
subject=${${${(M)message:#Subject: *}[1]#Subject: }//\//-}
if [[ -z $formatted_date ]]; then
print -u2 Error: no date parsed
exit 1
elif [[ -z $subject ]]; then
print -u2 Warning: no subject found
fi
outdir=${1:?Output directory must be specified}
if [[ ! -d $outdir ]]; then
print -u2 Error: no output directory $outdir
exit 1
fi
outfile="$outdir/$formatted_date $subject.eml"
print -l $message > "$outfile" && print Email saved to "$outfile"
This statement gets the date from the array of lines in message:
mail_date=${${(M)message:#Date: *}[1]#Date: }
${(M)...:#...} - gets elements that match a pattern from the array. Here we use it to find elements that start with Date: .
${...[1]} - returns the first match.
${...#Date: } - removes the prefix Date: , leaving the date string.
This similar statement has an additional expansion that replaces all instances of / with -:
subject=${${${(M)message:#Subject: *}[1]#Subject: }//\//-}
The parameter expansions are documented in the zshexpn man page.
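If the stacked flags are hard to read, roughly the same selection written out in plain bash may clarify what each zsh expansion is doing (an illustration only, not part of the answer's script):

```shell
#!/usr/bin/env bash
# Hypothetical array of message lines, standing in for the zsh $message array.
message=('From: someone' 'Date: Mon, 01 Jan 2024' 'Subject: Hi')

mail_date=''
for line in "${message[@]}"; do
  # ${(M)message:#Date: *} -- keep only elements matching the pattern
  if [[ $line == "Date: "* ]]; then
    # [1] -- take the first match; #Date:  -- strip the prefix
    mail_date=${line#Date: }
    break
  fi
done
echo "$mail_date"
```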
PS: trailing newlines will be removed from the message written to the file. This is a difficult-to-avoid consequence of using a command substitution like $(...). It's not likely to be a significant issue.
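The trailing-newline stripping mentioned in the PS is standard $( ) behavior in any POSIX shell, and easy to verify:

```shell
#!/bin/sh
# Command substitution removes every trailing newline from the captured output.
captured=$(printf 'line1\n\n\n')
# "line1" is 5 bytes; the three trailing newlines are gone.
printf '%s' "$captured" | wc -c
```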

Related

Bash File names will not append to file from script

Hello, I am trying to get all files with Jane's name into a separate file called oldFiles.txt. In a directory called "data" I am reading from a list of file names in a file called list.txt, from which I put all the file names containing the name Jane into the files variable. Then I test the files variable against the files in list.txt to ensure they exist in the file system, and append all the files containing jane to the oldFiles.txt file (which will be in the scripts directory) once each item in the files variable passes the test.
#!/bin/bash
> oldFiles.txt
files= grep " jane " ../data/list.txt | cut -d' ' -f 3
if test -e ~data/$files; then
for file in $files; do
if test -e ~/scripts/$file; then
echo $file>> oldFiles.txt
else
echo "no files"
fi
done
fi
The above code gets the desired files and displays them correctly, as well as creates the oldFiles.txt file, but when I open the file after running the script I find that nothing was appended to it. I tried changing the assignment to use command substitution, files= grep " jane " ../data/list.txt | cut -d' ' -f 3 ---> files=$(grep " jane " ../data/list.txt), to see if capturing the raw data to write to the file would help, but then the error "too many arguments on line 5" comes up, which is the first if test statement. The only way I can get the script to work semi-properly is to run ./findJane.sh > oldFiles.txt on the shell command line, which is essentially me creating the file manually. How would I go about creating oldFiles.txt and appending to it entirely within the script?
The biggest problem you have is matching names like "jane" or "Jane's", etc., while not matching "Janes". grep provides the options -i (case-insensitive match) and -w (whole-word match), which can tailor your search to what you appear to want without the kludge of padding your search term with spaces (" jane "). (To do that properly you would use [[:space:]]jane[[:space:]].)
You also have the problem of what your "script dir" is if you call your script from a directory other than the one containing it, such as calling it from your $HOME directory with bash script/findJane.sh. In that case your script will attempt to append to $HOME/oldFiles.txt. The positional parameter $0 contains the path used to invoke the script, so you can capture the script directory no matter where you call the script from with:
dirname "$0"
You are using bash, so store all the filenames resulting from your grep command in an array, not a general variable (especially since your use of " jane " suggests that your filenames contain whitespace).
You can make your script much more flexible if you take the input file (e.g. list.txt), the term to search for (e.g. "jane"), the location to check for existence of the files (e.g. $HOME/data) and the output filename to append the names to (e.g. "oldFiles.txt") as command line [positional] parameters. You can give each a default value so the script behaves as it currently does when no arguments are provided.
Even with the additional flexibility of taking command line arguments, the script actually has fewer lines: simply fill an array using mapfile (synonymous with readarray) and then loop over its contents. You can also avoid the additional subshell for dirname with a simple parameter expansion, testing whether the resulting path component is empty and replacing it with '.' if so -- up to you.
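The mapfile-plus-expansion idea in a minimal form (the sample lines and path are hypothetical):

```shell
#!/usr/bin/env bash
# Fill an array from command output, one element per line; -t strips newlines.
mapfile -t files < <(printf '%s\n' 'a file.txt' 'b file.txt')
echo "${#files[@]}"     # element count; embedded spaces are preserved

# dirname "$0" without the subshell: strip the last /component.
# (Note: a bare name with no slash would be left unchanged; dirname handles
# that case, so this is only a sketch of the common situation.)
path='/home/user/scripts/findJane.sh'
dir=${path%/*}
[ -n "$dir" ] || dir=.  # empty result means the script lives in /
echo "$dir"
```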
If I've understood your goal correctly, you can put all the pieces together with:
#!/bin/bash
# positional parameters
src="${1:-../data/list.txt}" # 1st param - input (default: ../data/list.txt)
term="${2:-jane}" # 2nd param - search term (default: jane)
data="${3:-$HOME/data}" # 3rd param - file location (default: $HOME/data)
outfn="${4:-oldFiles.txt}" # 4th param - output (default: oldFiles.txt)
# save the path to the current script in script
script="$(dirname "$0")"
# if outfn not given, prepend path to script to outfn to output
# in script directory (if script called from elsewhere)
[ -z "$4" ] && outfn="$script/$outfn"
# split names w/term into array
# using the -iw option for case-insensitive whole-word match
mapfile -t files < <(grep -iw "$term" "$src" | cut -d' ' -f 3)
# loop over files array
for ((i=0; i<${#files[@]}; i++)); do
# test existence of file in data directory, redirect name to outfn
[ -e "$data/${files[i]}" ] && printf "%s\n" "${files[i]}" >> "$outfn"
done
(note: test expression and [ expression ] are synonymous, use what you like, though you may find [ expression ] a bit more readable)
(further note: "Janes" being plural is not considered the same as the singular -- adjust the grep expression as desired)
Example Use/Output
As was pointed out in the comment, without a sample of your input file, we cannot provide an exact test to confirm your desired behavior.
Let me know if you have questions.
As far as I can tell, this is what you're going for. This is totally a community effort based on the comments, catching your bugs. Obviously credit to Mark and Jetchisel for finding most of the issues. Notable changes:
Fixed $files to use command substitution
Fixed path to data/$file, assuming you have a directory at ~/data full of files
Fixed the test to not test for a string of files, but just the single file (also using -f to make sure it's a regular file)
Using double brackets — you could also use double quotes instead, but you explicitly have a Bash shebang so there's no harm in using Bash syntax
Adding a second message about not matching files, because there are two possible cases there; you may need to adapt depending on the output you're looking for
Removed the initial empty redirection — if you need to ensure that the file is clear before the rest of the script, then it should be added back, but if not, it's not doing any useful work
Changed the shebang to make sure you're using the user's preferred Bash, and added set -e because you should always add set -e
#!/usr/bin/env bash
set -e
files=$(grep " jane " ../data/list.txt | cut -d' ' -f 3)
for file in $files; do
if [[ -f $HOME/data/$file ]]; then
if [[ -f $HOME/scripts/$file ]]; then
echo "$file" >> oldFiles.txt
else
echo "no matching file"
fi
else
echo "no files"
fi
done
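One caveat on the set -e addition: it makes the script exit at the first unguarded command that fails, which is worth seeing in isolation before relying on it (a quick illustration):

```shell
#!/usr/bin/env bash
# Run a child script with set -e: execution stops at the failing command.
out=$(bash -c 'set -e; echo start; false; echo never-reached' || true)
# Only "start" was printed; "never-reached" was skipped.
echo "$out"
```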

Need help formatting Tshark command string from bash script

I'm attempting to run multiple parallel instances of tshark to comb through a large number of pcap files in a directory and copy the filtered contents to new files. I'm running into an issue where tshark throws an error on the command I'm feeding it.
It must have something to do with the way the command string is interpreted by tshark as I can copy / paste the formatted command string to the console and it runs just fine. I've tried formatting the command several ways and read threads from others who had similar issues. I believe I'm formatting correctly... but still get the error.
Here's what I'm working with:
Script #1: - filter
#Takes user arguments <directory> and <filter> and runs a filter on all captures for a given directory.
#
#TO DO:
#Add user prompts and data sanitization to avoid running bogus job.
#Add concatenation via mergecap /w .pcap suffix
#Delete filtered, unmerged files
#Add mtime filter for x days of logs
starttime=$(date)
if [$1 = '']; then echo "no directory specified, you must specify a directory (VLAN)"
else if [$2 = '']; then echo "no filter specified, you must specify a valid tshark filter expression"
else
echo $2 > /home/captures-user/filtered/filter-reference
find /home/captures-user/Captures/$1 -type f | xargs -P 5 -L 1 /home/captures-user/tshark-worker
rm /home/captures-user/filtered/filter-reference
fi
fi
echo Start time is $starttime
echo End time is $(date)
Script #2: - tshark-worker
# $1 = path and file name
#takes the output from the 'filter' command stored in a file and loads a local variable with it
filter=$(cat /home/captures-user/filtered/filter-reference)
#strips the directory off the current working file
file=$(sed 's/.*\///' <<< $1 )
echo $1 'is the file to run' $filter 'on.'
#runs the filter and places the filtered results in the /filtered directory
command=$"tshark -r $1 -Y '$filter' -w /home/captures-user/filtered/$file-filtered"
echo $command
$command
When I run ./filter ICE 'ip.addr == 1.1.1.1' I get the following output for each file. Note that the inclusion of == in the filter expression is not the issue; I've tried substituting 'or' and get the same output. Also, tshark is not aliased to anything, and there's no script with that name. It's the raw tshark executable in /usr/sbin.
Output:
/home/captures-user/Captures/ICE/ICE-2019-05-26_00:00:01 is the file to run ip.addr == 1.1.1.1 on.
tshark -r /home/captures-user/Captures/ICE/ICE-2019-05-26_00:00:01 -Y 'ip.addr == 1.1.1.1' -w /home/captures-user/filtered/ICE-2019-05-26_00:00:01-filtered
tshark: Display filters were specified both with "-d" and with additional command-line arguments.
As I mentioned in the comments, I think this is a problem with quoting and how your command is constructed due to spaces in the filter (and possibly in the file name and/or path).
You could try changing your tshark-worker script to something like the following:
# $1 = path and file name
#takes the output from the 'filter' command stored in a file and loads a local variable with it
filter="$(cat /home/captures-user/filtered/filter-reference)"
#strips the directory off the current working file
file="$(sed 's/.*\///' <<< "$1")"
echo "$1" 'is the file to run' "$filter" 'on.'
#runs the filter and places the filtered results in the /filtered directory
tshark -r "$1" -Y "${filter}" -w "/home/captures-user/filtered/${file}-filtered"
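If you still want to build the command up front (so you can echo it before running), a bash array preserves quoting where a flat string cannot; each element stays exactly one argument. A sketch with hypothetical paths, stopping short of actually invoking tshark:

```shell
#!/usr/bin/env bash
filter='ip.addr == 1.1.1.1'
infile='/tmp/capture 01.pcap'   # hypothetical file name containing a space

# One array element per argument: the filter stays a single -Y operand.
cmd=(tshark -r "$infile" -Y "$filter" -w "${infile##*/}-filtered")

# Show what would run, with shell-safe quoting, then count the arguments.
printf '%q ' "${cmd[@]}"; echo
echo "${#cmd[@]}"   # a flat string would have word-split into many more
# To execute for real: "${cmd[@]}"
```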

Add time stamp to file which has at least one row in UNIX

I have list of files at a location ${POWERCENTER_FILE_DIR} .
The files consist of row header and values.
MART_Apple.csv
MART_SAMSUNG.csv
MART_SONY.csv
MART_BlackBerry.csv
Requirements:
select only those files which have at least 1 row.
Add a time stamp to the files which have at least 1 row.
For example:
If all the files except MART_BlackBerry.csv have at least one row, then my output file names should be
MART_Apple_20170811112807.csv
MART_SAMSUNG_20170811112807.csv
MART_SONY_20170811112807.csv
Code tried so far
#!/bin/ksh
infilename=${POWERCENTER_FILE_DIR}MART*.csv
echo File name is ${infilename}
if [ wc -l "$infilename"="0" ];
then
RV=-1
echo "input file name cannot be blank or *"
exit $RV
fi
current_timestamp=`date +%Y%m%d%H%M%S`
filename=`echo $infilename | cut -d"." -f1 `
sftpfilename=`echo $filename`_${current_timestamp}.csv
cp -p ${POWERCENTER_FILE_DIR}$infilename ${POWERCENTER_FILE_DIR}$sftpfilename
RV=$?
if [[ $RV -ne 0 ]];then
echo Adding timestamp to ${POWERCENTER_FILE_DIR}$infilename failed ... Quitting
echo Return Code $RV
exit $RV
fi
Encountering errors like:
line 3: [: -l: binary operator expected
cp: target `MART_Apple_20170811121023.csv' is not a directory
failed ... Quitting
Return Code 1
To be frank, I am not able to understand these errors, nor am I sure I am doing this right. I am a beginner in Unix scripting. Can any experts guide me toward the correct way?
Here's an example using just find, sh, mv, basename, and date:
find ${POWERCENTER_FILE_DIR}MART*.csv ! -empty -execdir sh -c "mv {} \$(basename -s .csv {})_\$(date +%Y%m%d%H%M%S).csv" \;
I recommend reading Unix Power Tools for more ideas.
When it comes to shell scripting there is rarely a single/one/correct way to accomplish the desired task.
Often times you may need to trade off between readability vs maintainability vs performance vs adhering-to-some-local-coding-standard vs shell-environment-availability (and I'm sure there are a few more trade offs). So, fwiw ...
From your requirement that you're only interested in files with at least 1 row, I read this to also mean that you're only interested in files with size > 0.
One simple ksh script:
#!/bin/ksh
# define list of files
filelist=${POWERCENTER_FILE_DIR}/MART*.csv
# grab current datetime stamp
dtstamp=`date +%Y%m%d%H%M%S`
# for each file in our list ...
for file in ${filelist}
do
# each file should have ${POWERCENTER_FILE_DIR} as a prefix;
# uncomment 'echo' line for debugging purposes to display
# the contents of the ${file} variable:
#echo "file=${file}"
# '-s <file>' => file exists and size is greater than 0
# '! -s <file>' => file doesn't exist or size is equal to 0, eg, file is empty in our case
#
# if the file is empty, skip/continue to next file in loop
if [ ! -s ${file} ]
then
continue
fi
# otherwise strip off the '.csv'
filebase=${file%.csv}
# copy our current file to a new file containing the datetime stamp;
# keep in mind that each ${file} already contains the contents of the
# ${POWERCENTER_FILE_DIR} variable as a prefix; uncomment 'echo' line
# for debug purposes to see what the cp command looks like:
#echo "cp command=cp ${file} ${filebase}.${dtstamp}.csv"
cp ${file} ${filebase}.${dtstamp}.csv
done
A few good resources for learning ksh:
O'Reilly: Learning the Korn Shell
O'Reilly: Learning the Korn Shell, 2nd Edition (includes the newer ksh93)
at your UNIX/Linux command line: man ksh
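The -s file test that the script above leans on distinguishes empty from non-empty files, and can be checked quickly in any POSIX shell (using throwaway temporary files):

```shell
#!/bin/sh
# -s is true only when the file exists and has size greater than zero.
tmpdir=$(mktemp -d)
printf 'one row\n' > "$tmpdir/full.csv"
: > "$tmpdir/empty.csv"

for f in "$tmpdir"/*.csv; do
  if [ -s "$f" ]; then
    echo "keep: ${f##*/}"   # only full.csv survives the test
  fi
done
rm -rf "$tmpdir"
```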
A simplified script would be something like
#!/bin/bash
# Note I'm using bash above, can't guarantee (but I hope) it would work in ksh too.
for file in ${POWERCENTER_FILE_DIR}/*.csv # Check Ref [1]
do
if [ "$(wc -l < "$file")" -ne 0 ] # checking at least one row? Check Ref [2]
then
mv "$file" "${file%.csv}_$(date +'%Y%m%d%H%M%S').csv" # Check Ref [3]
fi
done
References
File Globbing [1]
Command Substitution [2]
Parameter Substitution [3]
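The parameter and command substitutions in the mv line, shown in isolation with a fixed timestamp standing in for $(date +'%Y%m%d%H%M%S'):

```shell
#!/bin/bash
file='MART_Apple.csv'
stamp='20170811112807'             # stand-in for the date command's output

# ${file%.csv} removes the shortest trailing match of ".csv",
# then the stamp and extension are appended back on.
newname="${file%.csv}_${stamp}.csv"
echo "$newname"                    # MART_Apple_20170811112807.csv
```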

Bash check empty output and disable script execution in one line

Task: Scan viruses with clamav and report if infected files exists
one line script
clamscan -ir --exclude=/proc --exclude=/sys --exclude=/dev / | grep "Infected files: [1-9].*" -z | mutt -s 'Viruses detected' -- email1@domain.com email2@domain.com email3@domain.com
Problem: the email message is sent even if the command "clamscan ... | grep" returns empty output (no viruses found, Infected files: 0)
Sub-task: write a bash script without using temporary files. Use only output redirection, and check whether the output is empty so that mutt is not executed.
You can't make it a one-liner without cheating.
The straightforward solution is to capture the output and use it if there was a match:
if output=$(clam etc | grep etc); then
mutt etc <<<"$output"
fi
The cheat is to hide this functionality somehow:
mongrel () { # aka "mutt maybe"
input=$(cat -)
case $input in '') return 1;; esac
mutt "$@" <<<"$input"
}
clam etc | grep etc | mongrel etc
If there is a lot of output, I would perhaps actually prefer a temporary file over keeping the results in memory; but if this is your assignment, I won't go there.
Incidentally, the trailing wildcard in your grep regex isn't contributing any value -- unless it somehow helps your understanding (which I think it doesn't; more like it adds confusion) I would leave it out.
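The point about the trailing wildcard is easy to demonstrate: grep matches substrings, so [1-9] alone already selects the same lines, and .* adds nothing:

```shell
#!/bin/sh
# Both patterns match the same line; the count is 1 either way.
printf 'Infected files: 3\n' | grep -c 'Infected files: [1-9].*'
printf 'Infected files: 3\n' | grep -c 'Infected files: [1-9]'
# And neither matches a clean report (grep -c prints 0, exits nonzero).
printf 'Infected files: 0\n' | grep -c 'Infected files: [1-9]' || true
```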
Only emailing the summary of the results is of dubious value -- to my mind, it would be better to send the entire report when there is an infection.
output=$(clamscan -ir --exclude=/proc --exclude=/sys --exclude=/dev /)
case $output in *'Infected files: '[1-9]*)
mutt -s 'Viruses detected' -- email1@domain.com email2@domain.com email3@domain.com <<<"$output" ;;
esac

Can't get this code to work

I'm writing a bash shell script that uses a case with three options:
If the user enters "change -r txt doc *", a file extension gets changed in a subdirectory.
If a user enters "change -n -r doc ", it should rename files that end with .-r or .-n (this will rename all files in the current directory called *.-r as *.doc)
If the user enters nothing, as in "change txt doc *", it just changes a file extension in the current directory.
Here's the code I produced for it (I'm not sure how to implement the last two options):
#!/bin/bash
case $1 in
-r)
export currectFolder=`pwd`
for i in $(find . -iname "*.$2"); do
export path=$(readlink -f $i)
export folder=`dirname $path`
export name=`basename $path .$2`
cd $folder
mv $name.$2 $name.$3
cd $currectFolder
done
;;
-n)
echo "-n"
;;
*)
echo "all"
esac
Can anyone fix this for me? Or at least tell me where I'm going wrong?
What you should brush up on are string substitutions. All kinds of them actually. Bash is very good with those. Page 105 (recipe 5.18) of the Bash Cookbook is excellent reading for that.
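A few of the substitutions that recipe covers, since they carry the whole script below (the values are illustrative):

```shell
#!/bin/bash
f='report.txt'
echo "${f%.txt}"        # strip the shortest trailing match of ".txt" -> report
echo "${f%.txt}.doc"    # strip it and re-extend                      -> report.doc

p='a/b/c.txt'
echo "${p//\//_}"       # //pat/rep replaces every occurrence of /    -> a_b_c.txt
```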
#!/bin/bash
# Make it more flexible for improving command line parsing later
SWITCH=$1
EXTENSIONSRC=$2
EXTENSIONTGT=$3
# Match different cases for the only allowed switch (other than file extensions)
case $SWITCH in
-r|--)
# If it's not -r we limit the find to the current directory
[[ "x$SWITCH" == "x-r" ]] || DONTRECURSE="-maxdepth 1"
# Files in current folder with particular pattern (and subfolders when -r)
find . $DONTRECURSE -iname "*.$EXTENSIONSRC"|while read fname; do
# We use a while to allow for file names with embedded blank spaces
# Get canonical name of the item into CFNAME
CFNAME=$(readlink -f "$fname")
# Strip extension through string substitution
NOEXT_CFNAME="${CFNAME%.$EXTENSIONSRC}"
# Skip renaming if target exists. This can happen due to collisions
# with case-insensitive matching ...
if [[ -f "$NOEXT_CFNAME.$EXTENSIONTGT" ]]; then
echo "WARNING: Skipping $CFNAME"
else
echo "Renaming $CFNAME"
# Do the renaming ...
mv "$CFNAME" "$NOEXT_CFNAME.$EXTENSIONTGT"
fi
done
;;
*)
# The -e for echo means that escape sequences like \n and \t get evaluated ...
echo -e "ERROR: unknown command line switch\n\tSyntax: change <-r|--> <source-ext> <target-ext>"
# Exit with non-zero (i.e. failure) status
exit 1
esac
The syntax is given in the script. I took the liberty of using the convention of -- separating command line switches from file names. This way it looks cleaner and is actually easier to implement.
NB: it is possible to condense this further. But here I was trying to get a point across, rather than win the obfuscated Bash contest ;)
PS: also handles the case-insensitive stuff now in the renaming part. However, I decided to make it skip if the target file already exists. Can perhaps be rewritten to be a command line option.
