Bash variables not acting as expected

I have a bash script which parses a file line by line, extracts the date using a cut command and then makes a folder using that date. However, it seems like my variables are not being populated properly. Do I have a syntax issue? Any help or direction to external resources is very appreciated.
#!/bin/bash
ls | grep .mp3 | cut -d '.' -f 1 > filestobemoved
cat filestobemoved | while read line
do
varYear= $line | cut -d '_' -f 3
varMonth= $line | cut -d '_' -f 4
varDay= $line | cut -d '_' -f 5
echo $varMonth
mkdir $varMonth'_'$varDay'_'$varYear
cp ./$line'.mp3' ./$varMonth'_'$varDay'_'$varYear/$line'.mp3'
done

You have many errors and non-recommended practices in your code. Try the following:
for f in *.mp3; do
    f=${f%%.*}
    IFS=_ read _ _ varYear varMonth varDay <<< "$f"
    echo "$varMonth"
    mkdir -p "${varMonth}_${varDay}_${varYear}"
    cp "$f.mp3" "${varMonth}_${varDay}_${varYear}/$f.mp3"
done

The actual error is that you need to use command substitution. For example, instead of
varYear= $line | cut -d '_' -f 3
you need to use
varYear=$(cut -d '_' -f 3 <<< "$line")
A secondary error there is that $foo | some_command on its own line does not pipe the contents of $foo into the next command as input; instead, the contents of $foo are executed as a command, and that command's output is what gets piped to the next one.
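To see that concretely, here is a minimal illustration (the filename value is made up for the example):
line=show_recording_2013_05_27
varYear= $line | cut -d '_' -f 3
# bash runs "show_recording_2013_05_27" as a command (typically "command not found"),
# with varYear set to the empty string only in that command's environment,
# so in the current shell varYear stays unset:
echo "year is '$varYear'"    # prints: year is ''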
Some best practices and tips to take into account:
Use a portable shebang line - #!/usr/bin/env bash (disclaimer: That's my answer).
Don't parse ls output.
Avoid useless uses of cat.
Use More Quotes™
Don't use files for temporary storage if you can use pipes. It is literally orders of magnitude faster, and generally makes for simpler code if you want to do it properly.
If you have to use files for temporary storage, put them in the directory created by mktemp -d. Preferably add a trap to remove the temporary directory cleanly (see the short sketch after this list).
There's no need for a var prefix in variables.
grep searches for basic regular expressions by default, so .mp3 matches any single character followed by the literal string mp3. If you want to search for a dot, you need to either use grep -F to search for literal strings or escape the regular expression as \.mp3.
You generally want to use read -r (defined by POSIX) to treat backslashes in the input literally.
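As a minimal sketch of the mktemp -d / trap pattern mentioned in the list (the file and variable names here are just for illustration):
#!/usr/bin/env bash
tmpdir=$(mktemp -d) || exit 1           # private scratch directory
trap 'rm -rf -- "$tmpdir"' EXIT         # remove it when the script exits, even on error

printf '%s\n' *.mp3 > "$tmpdir/filestobemoved"
# ... work with "$tmpdir/filestobemoved" here ...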

Related

bash cat exclude multiple files based on grep results

I have the following cat command that I use in a bash script. I look for the $SAMPLE.txt file in subfolders matching 20* and combine them into one output.txt:
cat /$FOLDER/20*/$SAMPLE.txt > /$OUTPUTFOLDER/output.txt
I now want to exclude certain files conditionally.
I found the following here https://unix.stackexchange.com/questions/246048/cat-files-except-one
$ shopt -s extglob
$ cat -- !(DISCARD).txt > catKEPT
I want to do something like this.
Look for $SAMPLE and a pattern '$PAT1' in a $SAMPLEFILE. This $SAMPLEFILE is comma separated. If there is a match, I want to store the first field of this line and use it to exclude files from cat.
I would use this command to look for $SAMPLE and $PAT1 and then cut to keep my first field. I would assign that to a variable 'EXCLUDE_FOLDER'.
EXCLUDE_FOLDER=grep '$SAMPLE' $SAMPLEFILE | grep '$PAT1' | cut -d "," -f 1
And then use it like this
cat /$FOLDER/20*/$SAMPLE.txt -- !($FOLDER/$EXCLUDE_FOLDER/$SAMPLE.txt) > /$OUTPUTFOLDER/output.txt
I'm stuck at putting this into an if-statement and dealing with situations where grep results in multiple matches, so multiple files should be excluded.
If SAMPLE and PAT are variables, you presumably want them expanded to their contents, which means you must put them in double quotes, not single quotes. Example:
SAMPLE=3
# Compare single quotes versus double
echo '$SAMPLE' # outputs $SAMPLE
echo "$SAMPLE" # outputs 3
If SAMPLEFILE is the name of a file, you must double-quote it as well, or it will fail if your filename has spaces in it, so you must use:
grep "$SAMPLE" "$SAMPLEFILE"
So, now you can test if your grep works like this:
grep "$SAMPLE" "$SAMPLEFILE" | grep "$PAT1" | cut -d "," -f 1
So, if that works, the next thing is that you want to capture the output of the command, so you need to use $(...). That means:
EXCLUDE_FOLDER=$(grep "$SAMPLE" "$SAMPLEFILE" | grep "$PAT1" | cut -d "," -f 1)
So, now test if that works:
echo "$EXCLUDE_FOLDER"

Read file by line then process as different variable

I have created a text file with a list of file names like below
022694-39.tar
022694-39.tar.2017-05-30_13:56:33.OLD
022694-39.tar.2017-07-04_09:22:04.OLD
022739-06.tar
022867-28.tar
022867-28.tar.2018-07-18_11:59:19.OLD
022932-33.tar
I am trying to read the file line by line then strip anything after .tar with awk and use this to create a folder unless it exists.
Then the plan is to copy the original file to the new folder with the original full name stored in $LINE.
$QNAP= "Path to storage"
$LOG_DIR/$NOVA_TAR_LIST= "Path to text file containing file names"
while read -r LINE; do
CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`"
if [ ! -d "$QNAP/$CURNT_JOB_STRIPED" ]
then
echo "Folder $QNAP/$CURNT_JOB_STRIPED doesn't exist."
#mkdir "$QNAP/$CURNT_JOB_STRIPED"
fi
done <"$LOG_DIR/$NOVA_TAR_LIST"
Unfortunately this seems to be joining all the file names together when trying to create the directories, rather than doing them one by one, and I get a
File name too long
output:
......951267-21\n951267-21\n961075-07\n961148-13\n961520-20\n971333-21\n981325-22\n981325-22\n981743-40\n999111-99\n999999-04g\n999999-44': File name too long
Apologies if this is trivial, bit of a rookie...
Try modifying your script as follows:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F ".tar" '{print $1}')
You have to use $(...) for command substitution. Also, you should echo the variable LINE so that its value is passed as input to the next command in the pipe instead of being interpreted by the shell as a command. Finally, you should remove the backticks from the awk expression (backticks are the deprecated syntax for command substitution), since what you want is the result of the piped commands.
For further information, take a look over http://tldp.org/LDP/abs/html/commandsub.html
Alternatively, and far less readably (nor with any better performance, so just as a "curiosity"), you can replace the whole while loop with:
xargs -I{} bash -c 'mkdir -p "${2}/${1%.tar*}"' - '{}' "${QNAP}" < "${LOG_DIR}/${NOVA_TAR_LIST}"
The problem is with the CURNT_JOB_STRIPED="$LINE | `awk -F ".tar" '{print $1}'`" line.
The `command` form is legacy syntax; $(command) should be used instead.
The $LINE variable should be printed so awk can receive its value through a pipe.
If you run the whole thing in a command substitution ( $(command) ), you can assign the output to a variable: var=$(date)
It is safer to put variables into ${} so that if there is surrounding text you will not get unexpected results.
This should work:
CURNT_JOB_STRIPED=$(echo "${LINE}" | awk -F '.tar' '{print $1}')
With variable substitution this can be achieved with more efficient code, and it is also cleaner to read, I believe.
Variable substitution does not change the ${LINE} variable, so it can still be used later as the variable holding the full, unchanged filename, while ${LINE%.tar*} cuts the last .tar text from the variable's value and, with *, anything after it.
while read -r LINE; do
    if [ ! -d "${QNAP}/${LINE%.tar*}" ]
    then
        echo "Folder ${QNAP}/${LINE%.tar*} doesn't exist."
        #mkdir "${QNAP}/${LINE%.tar*}"
    fi
done <"${LOG_DIR}/${NOVA_TAR_LIST}"
This way you do not store the directory name in a variable, and ${LINE} only stores the filename. If you need it in a variable you can do that easily: var="${LINE%.tar*}"
Variable Substitution:
There are more; I only picked these 4 for now as they are similar and relevant here. A short example follows the list.
${var#pattern} - Use value of var after removing text that matches pattern from the left
${var##pattern} - Same as above but remove the longest matching piece instead of the shortest
${var%pattern} - Use value of var after removing text that matches pattern from the right
${var%%pattern} - Same as above but remove the longest matching piece instead of the shortest
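For example, using one of the file names from the question:
file=022694-39.tar.2017-05-30_13:56:33.OLD

echo "${file#*.}"      # shortest *. match removed from the left    -> tar.2017-05-30_13:56:33.OLD
echo "${file##*.}"     # longest *. match removed from the left     -> OLD
echo "${file%.tar*}"   # shortest .tar* match removed from the right -> 022694-39
echo "${file%%.*}"     # longest .* match removed from the right     -> 022694-39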

Command Line Argument for Script Changing Way Code Functions

I'm writing a script to loop through a directory, look through each file, and count the occurrences of a certain word in each file. When I write it for the specific directory it works fine, but when I try to make the directory a command line argument, it only gives me the count for the first file. I was thinking maybe this has something to do with the argument being singular ($1), but I really have no idea.
Works
for f in /home/student/Downloads/reviews_folder/*
do
tr -s ' ' '\n' <$f | grep -c '<Author>'
done
Output
125
163
33
...
Doesn't Work
for f in "$1"
do
tr -s ' ' '\n' <$f | grep -c '<Author>'
done
Command Line Input
student-vm:~$ ./countreviews.sh /home/student/Downloads/reviews_folder/*
Output
125
The shell expands wildcards before passing the list of arguments to your script.
To loop over all the files passed in as command-line arguments,
for f in "$#"
do
tr -s ' ' '\n' <"$f" | grep -c '<Author>'
done
Run it like
./countreviews /home/student/Downloads/reviews_folder/*
or more generally
./countreviews ... list of file names ...
As you discovered, "$1" corresponds to the first file name in the expanded list of wildcards.
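One way to see this is a throwaway script (the name printargs.sh is just for illustration) that prints whatever it receives:
#!/usr/bin/env bash
printf 'arg: %s\n' "$@"    # one line per argument the shell passed in
Running ./printargs.sh /home/student/Downloads/reviews_folder/* prints one arg: line per file, which is why "$1" only ever sees the first of them.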
If you are using double quotes for the parameter it should work. Like this:
student-vm:~$ ./countreviews.sh "/home/student/Downloads/reviews_folder/*"
At least like this it works for me. I hope this helps you.

For Loop Issues with CAT and tr

I have about 700 text files that consist of config output which uses various special characters. I am using this script to remove the special characters so I can then run a different script referencing a sed file to remove the commands that should be there, leaving what should not be in the config.
I got the below from Remove all special characters and case from string in bash but am hitting a wall.
When I run the script it continues to loop and writes the script into the output file. Ideally, it just takes out the special characters and creates a new file with the updated information. I have not gotten to the point of removing the previous text file since it probably won't be needed. Any insight is greatly appreciated.
for file in *.txt
do
cat * | tr -cd '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]' >> "$file" >> "$file".new_file.txt
done
A less-broken version of this might look like:
#!/usr/bin/env bash
for file in *.txt; do
    [[ $file = *.new_file.txt ]] && continue ## skip files created by this same script
    tr -cd '[:alnum:]\n\r' <"$file" \
        | tr '[:upper:]' '[:lower:]' \
        >> "$file".new_file.txt
done
Note:
We're referring to the "$file" variable being set by for.
We aren't using cat. It slows your script down with no compensating benefits whatsoever. Instead, using <"$file" redirects from the specific input file being iterated over at present.
We're skipping files that already have .new_file.txt extensions.
We only have one output redirection (to the new_file.txt version of the file); you can't safely write to the file you're using as input in the same pipeline. A tiny illustration of why follows this list.
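To make that last point concrete, here is a small demonstration of why reading and writing the same file in one pipeline loses data (demo.txt is a throwaway name):
printf 'SOME TEXT\n' > demo.txt
tr '[:upper:]' '[:lower:]' < demo.txt > demo.txt   # > truncates demo.txt before tr gets to read it
cat demo.txt                                       # prints nothing: the contents are gone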
Using GNU sed:
sed -i 's/[^[:alnum:]\n\r]//g;s/./\l&/g' *.txt

bash script grep using variable fails to find result that actually does exist

I have a bash script that iterates over a list of links, curls down an html page per link, greps for a particular string format (syntax is: CVE-####-####), removes the surrounding html tags (this is a consistent format, no special case handling necessary), searches a changelog file for the resulting string ID, and finally does stuff based on whether the string ID was found or not.
The found string ID is set as a variable. The issue is that when grepping for the variable there are no results, even though I positively know there should be for some of the IDs. Here is the relevant portion of the script:
for link in $(cat links.txt); do
curl -s "$link" | grep 'CVE-' | sed 's/<[^>]*>//g' | while read cve; do
echo "$cve"
grep "$cve" ./changelog.txt
done
done
If I hardcode a known ID in the grep command, the script finds the ID and returns things as expected. I've tried many variations of grepping on this variable (e.g. exporting it and doing command expansion, cat'ing the changelog and piping to grep, setting variable directly via command expansion of the curl chain, single and double quotes surrounding variables, half a dozen other things).
Am I missing something nuanced with the outputted variable from the curl | grep | sed chain? When it is echo'd to stdout or >> to a file, things look fine (a single ID with no odd characters or carriage returns etc.).
Any hints or alternate solutions would be much appreciated. Thanks!
FYI:
OSX:$bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin14)
Edit:
The html file that I was curl'ing was chock full of carriage returns. Running the script with set -x was helpful because it revealed the true string being grepped: $'CVE-2011-2716\r'.
+ read -r link
+ curl -s http://localhost:8080/link1.html
+ sed -n '/CVE-/s/<[^>]*>//gp'
+ read -r cve
+ grep -q -F $'CVE-2011-2716\r' ./kernelChangelog.txt
Also investigating from another angle, opening the curled file in vim showed ^M and doing a printf %s "$cve" | xxd also showed the carriage return hex code 0d appended to the grep'd variable. Relying on 'echo' stdout was a wrong way of diagnosing things. Writing a simple html page with a valid CVE-####-####, but then adding a carriage return (in vim insert mode just type ctrl-v ctrl-m to insert the carriage return) will create a sample file that fails with the original script snippet above.
This is pretty standard string sanitization stuff that I should have figured out. The solution is to remove carriage returns, piping to tr -d '\r' is one method of doing that. I'm not sure there is a specific duplicate on SO for this series of steps, but in any case here is my now working script:
while read -r link; do
    curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read -r cve; do
        if grep -q -F "$cve" ./changelog.txt; then
            echo "FOUND: $cve";
        else
            echo "NOT FOUND: $cve";
        fi;
    done
done < links.txt
HTML files can contain carriage returns at the ends of lines; you need to filter those out.
curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | tr -d '\r' | while read cve; do
Notice that there's no need to use grep; you can use a regular expression filter in the sed command. (You could also strip the characters within the sed command itself, but doing that for \r is cumbersome, so I piped to tr instead.)
It should look like this:
# First: Care about quoting your variables!
# Use read to read the file line by line
while read -r link ; do
    # No grep required. sed can do that.
    curl -s "$link" | sed -n '/CVE-/s/<[^>]*>//gp' | while read -r cve; do
        echo "$cve"
        # grep -F searches for fixed strings instead of patterns
        grep -F "$cve" ./changelog.txt
    done
done < links.txt
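For what it's worth, with GNU sed the carriage-return removal can be folded into the sed stage instead of a separate tr -d '\r' (the stock macOS sed may not understand the \r escape, which is one reason the tr pipe is the more portable choice):
curl -s "$link" | sed -n '/CVE-/{s/<[^>]*>//g; s/\r$//; p}' | while read -r cve; do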

Resources