Command Line Argument for Script Changing Way Code Functions - bash

I'm writing a script to loop through a directory, look through each file, and give the number of occurrences of a certain word in each file. When I write it for the specific directory it works fine, but when I try to make the directory a command line argument, it only gives me the count for the first file. I was thinking maybe this has something to do with the argument being singular ($1), but I really have no idea.
Works
for f in /home/student/Downloads/reviews_folder/*
do
tr -s ' ' '\n' <$f | grep -c '<Author>'
done
Output
125
163
33
...
Doesn't Work
for f in "$1"
do
tr -s ' ' '\n' <$f | grep -c '<Author>'
done
Command Line Input
student-vm:~$ ./countreviews.sh /home/student/Downloads/reviews_folder/*
Output
125

The shell expands wildcards before passing the list of arguments to your script.
To loop over all the files passed in as command-line arguments,
for f in "$@"
do
tr -s ' ' '\n' <"$f" | grep -c '<Author>'
done
Run it like
./countreviews.sh /home/student/Downloads/reviews_folder/*
or more generally
./countreviews.sh ... list of file names ...
As you discovered, "$1" corresponds to the first file name in the expanded list of wildcards.
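If the goal is to pass the directory itself rather than a list of files, one option is to let the script do the globbing. This is only a minimal sketch, assuming the directory arrives as the first argument; count_authors is a made-up helper name, not something from the original script.

```shell
#!/usr/bin/env bash
# count_authors DIR: print one '<Author>' count per regular file in DIR.
# (count_authors is a hypothetical name used for illustration.)
count_authors() {
    local dir=$1 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        tr -s ' ' '\n' <"$f" | grep -c '<Author>'
    done
}
```

With count_authors "$1" as the last line of the script, it would be called as ./countreviews.sh /home/student/Downloads/reviews_folder (no trailing /*).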

If you are using double quotes for the parameter it should work. Like this:
student-vm:~$ ./countreviews.sh "/home/student/Downloads/reviews_folder/*"
At least like this it works for me. I hope this helps you.

Related

bash cat exclude multiple files based on grep results

I have the following cat command that I use in a bash script. I look for $SAMPLE.txt file in subfolders 20* and combine them into 1 output.txt
cat /$FOLDER/20*/$SAMPLE.txt > /$OUTPUTFOLDER/output.txt
I now want to exclude certain files conditionally.
I found the following here https://unix.stackexchange.com/questions/246048/cat-files-except-one
$ shopt -s extglob
$ cat -- !(DISCARD).txt > catKEPT
I want to do something like this.
Look for $SAMPLE and a pattern '$PAT1' in a $SAMPLEFILE. This $SAMPLEFILE is comma separated. If there is a match, I want to store the first field of this line & use it to exclude files from cat
I would use this command to look for $SAMPLE and $PAT1 & then cut to keep my first field. I would assign that to a variable 'EXCLUDE_FOLDER'
EXCLUDE_FOLDER=grep '$SAMPLE' $SAMPLEFILE | grep '$PAT1' | cut -d "," -f 1
And then use it like this
cat /$FOLDER/20*/$SAMPLE.txt -- !($FOLDER/$EXCLUDE_FOLDER/$SAMPLE.txt) > /$OUTPUTFOLDER/output.txt
I'm stuck at putting this into an if statement and dealing with situations where grep results in multiple matches, so multiple files should be excluded
If SAMPLE and PAT are variables, you presumably want them expanded to their contents, which means you must put them in double quotes, not single quotes. Example:
SAMPLE=3
# Compare single quotes versus double
echo '$SAMPLE' # outputs $SAMPLE
echo "$SAMPLE" # outputs 3
If SAMPLEFILE is the name of a file, you must double-quote it, else it will fail if your filename has spaces in it, so you must use:
grep "$SAMPLE" "$SAMPLEFILE"
So, now you can test if your grep works like this:
grep "$SAMPLE" "$SAMPLEFILE" | grep "$PAT1" | cut -d "," -f 1
So, if that works, the next thing is that you want to capture the output of the command, so you need to use $(...). That means:
EXCLUDE_FOLDER=$(grep "$SAMPLE" "$SAMPLEFILE" | grep "$PAT1" | cut -d "," -f 1)
So, test if that works now:
echo "$EXCLUDE_FOLDER"
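For the case where grep returns several matches, one possible approach is to skip the extglob pattern and filter inside a loop. The sketch below assumes that EXCLUDE_FOLDER holds one subfolder name per line (which is what the grep/cut pipeline prints) and that those names match the 20* directory names; concat_except is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# concat_except FOLDER SAMPLE EXCLUDES
# Concatenate FOLDER/20*/SAMPLE.txt, skipping every subfolder whose name
# appears as a line of EXCLUDES (a newline-separated list, possibly empty).
concat_except() {
    local folder=$1 sample=$2 excludes=$3 f sub
    for f in "$folder"/20*/"$sample.txt"; do
        [ -f "$f" ] || continue
        sub=$(basename "$(dirname "$f")")
        # -x: whole-line match, -F: literal string, -q: no output
        if printf '%s\n' "$excludes" | grep -qxF -- "$sub"; then
            continue
        fi
        cat -- "$f"
    done
}
```

It would be used as concat_except "/$FOLDER" "$SAMPLE" "$EXCLUDE_FOLDER" > "/$OUTPUTFOLDER/output.txt". With an empty EXCLUDE_FOLDER nothing is skipped, so no separate if statement is needed.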

how to read a value from filename and insert/replace it in the file?

I have to run many Python scripts which differ in just one parameter. I name them runv1.py, runv2.py, runv20.py. I have the original script, say runv1.py. Then I make all the copies that I need with
cat runv1.py | tee runv{2..20..1}.py
So I have runv1.py,.., runv20.py. But the parameter is still v=1 in all of them.
Q: How can I also set the v parameter from the file name, so that e.g. in runv4.py v=4? I would like to know if there is any one-line shell command or combination of commands. Thank you!
PS: Editing each file by hand is not a proper solution when there are too many files.
The for loop below should serve your purpose, I think:
for i in runv[0-9]*.py
do
# strip the letters and the dot, leaving only the number (runv4.py -> 4)
l=$(echo "$i" | tr -d 'a-z.')
# replace only the v=<number> assignment, not every letter v in the file (GNU sed)
sed -i 's/\bv=[0-9][0-9]*/v='"$l"'/' "$i"
done
The command below passes the parameter, extracted from the filename itself, to each script:
ls | grep 'runv[0-9][0-9]*\.py' | tr -d 'a-z.' | awk '{print "./runv"$0".py "$0}' | xargs -L1 sh
At the end, instead of sh you can use python, bash, or ksh.
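Another option is to do the copy and the substitution in one step, so the copies never contain the wrong value. This is a sketch that assumes the template has a line reading exactly v=1; make_variants is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# make_variants TEMPLATE FIRST LAST
# Write runvN.py next to TEMPLATE for N = FIRST..LAST, turning a line that
# reads exactly "v=1" into "v=N". Every other line is copied unchanged.
make_variants() {
    local template=$1 first=$2 last=$3 dir n
    dir=$(dirname "$template")
    n=$first
    while [ "$n" -le "$last" ]; do
        sed "s/^v=1\$/v=$n/" "$template" >"$dir/runv$n.py"
        n=$((n + 1))
    done
}
```

For the question as asked this would be make_variants runv1.py 2 20.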

Bash variables not acting as expected

I have a bash script which parses a file line by line, extracts the date using a cut command and then makes a folder using that date. However, it seems like my variables are not being populated properly. Do I have a syntax issue? Any help or direction to external resources is very appreciated.
#!/bin/bash
ls | grep .mp3 | cut -d '.' -f 1 > filestobemoved
cat filestobemoved | while read line
do
varYear= $line | cut -d '_' -f 3
varMonth= $line | cut -d '_' -f 4
varDay= $line | cut -d '_' -f 5
echo $varMonth
mkdir $varMonth'_'$varDay'_'$varYear
cp ./$line'.mp3' ./$varMonth'_'$varDay'_'$varYear/$line'.mp3'
done
You have many errors and non-recommended practices in your code. Try the following:
for f in *.mp3; do
f=${f%%.*}
IFS=_ read -r _ _ varYear varMonth varDay <<< "$f"
echo "$varMonth"
mkdir -p "${varMonth}_${varDay}_${varYear}"
cp "$f.mp3" "${varMonth}_${varDay}_${varYear}/$f.mp3"
done
The actual error is that you need to use command substitution. For example, instead of
varYear= $line | cut -d '_' -f 3
you need to use
varYear=$(cut -d '_' -f 3 <<< "$line")
A secondary error there is that $foo | some_command on its own line does not pipe the contents of $foo to the next command as input. Instead, the shell tries to execute the contents of $foo as a command, and the output of that command is passed to the next one.
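To make the command substitution fix concrete, here is a small sketch that builds the folder name from one file name. It assumes the layout implied by the question (fields 3, 4 and 5, separated by underscores, are year, month and day); date_dir is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# date_dir NAME: print MONTH_DAY_YEAR for a name shaped like a_b_YEAR_MONTH_DAY.
date_dir() {
    local year month day
    # printf feeds the string to cut; $(...) captures what cut prints.
    year=$(printf '%s\n' "$1" | cut -d '_' -f 3)
    month=$(printf '%s\n' "$1" | cut -d '_' -f 4)
    day=$(printf '%s\n' "$1" | cut -d '_' -f 5)
    printf '%s_%s_%s\n' "$month" "$day" "$year"
}

date_dir "rec_show_2014_03_27"    # prints 03_27_2014
```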
Some best practices and tips to take into account:
Use a portable shebang line - #!/usr/bin/env bash (disclaimer: That's my answer).
Don't parse ls output.
Avoid useless uses of cat.
Use More Quotes™
Don't use files for temporary storage if you can use pipes. It is literally orders of magnitude faster, and generally makes for simpler code if you want to do it properly.
If you have to use files for temporary storage, put them in the directory created by mktemp -d. Preferably add a trap to remove the temporary directory cleanly.
There's no need for a var prefix in variables.
grep searches for basic regular expressions by default, so .mp3 matches any single character followed by the literal string mp3. If you want to search for a dot, you need to either use grep -F to search for literal strings or escape the regular expression as \.mp3.
You generally want to use read -r (defined by POSIX) to treat backslashes in the input literally.
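The mktemp -d and trap advice could look roughly like this. It is only a sketch: list_stems is a hypothetical helper, and the scratch file is there purely to show the cleanup pattern (a pipe would do the same job without it).

```shell
#!/usr/bin/env bash
# list_stems DIR: print the .mp3 names in DIR without their extension, sorted.
# The body runs in a subshell ( ... ), so the trap fires when that subshell ends.
list_stems() (
    scratch=$(mktemp -d) || exit 1
    trap 'rm -rf "$scratch"' EXIT        # remove the scratch directory on exit
    for f in "$1"/*.mp3; do
        [ -e "$f" ] || continue          # glob matched nothing
        basename "$f" .mp3
    done >"$scratch/list"
    sort "$scratch/list"
)
```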

makefile shell grep doesn't find the file I tried to specify

This approach is not finding the file I think I specified.
SHELL = /bin/bash
PKG_NAME = test
PKG_VERSION := $(shell grep -i '^version' $(PKG_NAME)/DESCRIPTION | cut -d ':' -f2 | cut -d ' ' -f2)
In the shell itself, grep -i '^version' test/DESCRIPTION | cut -d ':' -f2 | cut -d ' ' -f2 does return the version successfully, e.g. 0.4-7
But, running via the makefile returns:
grep: test: Is a directory
grep: /DESCRIPTION: No such file or directory
test is a directory, that's true, but test/DESCRIPTION does exist, so I'm guessing $(PKG_NAME)/DESCRIPTION wasn't the right way to assemble the file name.
Suggestions? Thanks.
That error indicates that grep is seeing test and /DESCRIPTION as two separate arguments. Do you have extra spaces on the PKG_NAME assignment line or an errant space between $(PKG_NAME) and /DESCRIPTION in the $(shell ...) line?
As a general rule you might want to start putting quotes around arguments to shell commands (i.e. '$(PKG_NAME)/DESCRIPTION') to prevent this sort of word splitting issue (though without spaces you generally don't have that sort of problem).
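The word splitting can be reproduced in plain shell. The sketch below assumes the cause is a stray trailing space in the value (GNU make keeps trailing whitespace in variable values); pkg stands in for $(PKG_NAME) and count_args is a hypothetical helper.

```shell
#!/usr/bin/env bash
# count_args: print how many arguments it received.
count_args() { echo "$#"; }

pkg='test '                        # note the trailing space
count_args $pkg/DESCRIPTION        # prints 2: "test" and "/DESCRIPTION"
count_args "$pkg/DESCRIPTION"      # prints 1: "test /DESCRIPTION"
```

Quoting does not fix a stray space, but it turns the two grep errors into a single "No such file or directory" for 'test /DESCRIPTION', which makes the space much easier to spot.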

How to apply shell command to each line of a command output?

Suppose I have some output from a command (such as ls -1):
a
b
c
d
e
...
I want to apply a command (say echo) to each one, in turn. E.g.
echo a
echo b
echo c
echo d
echo e
...
What's the easiest way to do that in bash?
It's probably easiest to use xargs. In your case:
ls -1 | xargs -L1 echo
The -L flag ensures the input is read properly. From the man page of xargs:
-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
You can use a basic prepend operation on each line:
ls -1 | while IFS= read -r line ; do echo "$line" ; done
Or you can pipe the output to sed for more complex operations:
ls -1 | sed 's/^\(.*\)$/echo \1/'
for s in `cmd`; do echo $s; done
If cmd has a large output:
cmd | xargs -L1 echo
You can use a for loop:
for file in * ; do
echo "$file"
done
Note that if the command in question accepts multiple arguments, then using xargs is almost always more efficient as it only has to spawn the utility in question once instead of multiple times.
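The difference between one invocation and one invocation per line is easy to see with echo. In this small sketch demo_items is a hypothetical stand-in for the real command's output.

```shell
#!/usr/bin/env bash
# demo_items: stand-in for a command that prints one item per line.
demo_items() { printf '%s\n' a b c; }

demo_items | xargs echo        # one echo call, prints: a b c
demo_items | xargs -L1 echo    # three echo calls, prints a, b and c on separate lines
```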
You actually can use sed to do it, provided it is GNU sed.
... | sed 's/match/command \0/e'
How it works:
Substitute match with command match
On substitution execute command
Replace substituted line with command output.
A solution that works with filenames that have spaces in them is:
ls -1 | xargs -I %s echo %s
The following is equivalent, but has a clearer divide between the precursor and what you actually want to do:
ls -1 | xargs -I %s -- echo %s
Where echo is whatever it is you want to run, and the subsequent %s is the filename.
Thanks to Chris Jester-Young's answer on a duplicate question.
xargs fails with backslashes and quotes. It needs to be something like
ls -1 |tr \\n \\0 |xargs -0 -iTHIS echo "THIS is a file."
xargs -0 option:
-0, --null
Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are
not special (every character is taken literally). Disables the end of file string, which is treated like
any other argument. Useful when input items might contain white space, quote marks, or backslashes. The
GNU find -print0 option produces input suitable for this mode.
ls -1 terminates the items with newline characters, so tr translates them into null characters.
This approach is about 50 times slower than iterating manually with for ... (see Michael Aaron Safyan's answer) (3.55s vs. 0.066s). But for other input commands like locate, find, reading from a file (tr \\n \\0 <file) or similar, you have to work with xargs like this.
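When the items are files, find can produce the null-terminated list directly, which avoids the ls and tr step. This is a sketch assuming a find and xargs that support -print0, -maxdepth and -0 (GNU and BSD versions do, POSIX does not require them); each_file is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# each_file DIR CMD [ARGS...]
# Run CMD once per entry directly inside DIR. Names are passed null-terminated,
# so spaces, quotes and backslashes in file names are taken literally.
each_file() {
    local dir=$1
    shift
    find "$dir" -mindepth 1 -maxdepth 1 -print0 | xargs -0 -n1 "$@"
}
```

For example, each_file . echo prints every entry of the current directory on its own line. The order is whatever find returns, so pipe through sort if it matters.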
I like to use gawk for running multiple commands on a list, for instance
ls -1 | gawk '{system("/path/to/cmd.sh "$1)}'
However, escaping the special characters can get a little hairy.
Better result for me:
ls -1 | xargs -L1 -d "\n" CMD

Resources