Bash script freezing despite outputting correctly

I have a text file holding numerous records formatted as follows:
Ford:Mondeo:1997:Blue:5
There are around 100 of them that I'm trying to sort via a bash script, and I'm looking to extract all cars made between 1994 and 1999. Here's what I have so far:
awk -F: '$3=="1994"' | awk -F: '$3<="1999"' $CARFILE > output/1994-1999.txt
The output file contains all of the correct information, no duplicates etc., but the script freezes and doesn't echo the confirmation afterwards. I have to Ctrl+D my way out of the script.
Here's the full file for reference:
#CS101 Assignment BASH script
#--help option
#case $1 in
# --help | carslist.txt)
# cat <<-____HALP
# Script name: ${0##*/} [ --help | carslist.txt ]
# This script will organise the given text file and save various sections to new files.
# No continuation checks are used but when each part is finished, a confirmation message will print before the script continues.
#____HALP
# exit 0;;
#esac
CARFILE=$1
while [ ! -f "$CARFILE" ]
do
echo "We cannot detect a car file to load, please enter the new filename and press [ENTER]"
read CARFILE
done
echo "We have detected that you're using $CARFILE as your cars file, please continue."
if [ -f output ]
then
echo "Sorry, a file called 'output' exists in the working directory. The script will now exist."
elif [ -d output ]
then
echo "The directory 'output' has been detected, instead of creating a new one we'll be working in there instead."
else
mkdir output
echo "We couldn't find an existing file or directory named 'output' so we've made one for you. Aren't we generous?"
fi
grep 'Vauxhall' $CARFILE > output/Vauxhall_Cars.txt
echo "We've saved all Vauxhall information in the 'Vauxhall_Cars.txt' file. The script will now continue."
grep '2001' $CARFILE > output/Manufactured_2001.txt
echo "We've saved all cared manufactured in 2001 in the 'Manufactured_2001.txt' file. The script will now continue."
awk -F: '$1=="Volkswagen" && $4=="Blue"' $CARFILE > output/Blue_Volkswagen.txt
echo "We've saved all Blue Volkswagens cars in Blue_Volkswagen.txt. The script will now continue"
grep 'V' $CARFILE > output/Makes_V.txt
echo "All cars with the make starting with 'V' have been saved in Makes_V.txt. The script will now continue."
awk -F: '$3=="1994"' | awk -F: '$3<="1999"' $CARFILE > output/1994-1999.txt
echo "Cars made between 1994 and 1999 have been saved in 1994-1999.txt. The script will now continue."
The script is run with: bash myScript.sh carslist.txt
Can anyone tell me why it's freezing after outputting correctly?
Just noticed that a record from 1993 has slipped through the cracks. Is there a way of constraining the dates in the line above so it only matches 1994-1999?
Thanks in advance.

The expression:
awk -F: '$3=="1994"' | awk -F: '$3<="1999"' $CARFILE > output/1994-1999.txt
This means: run awk on "something" and then pipe the result into another awk. But you are not providing any "something", so the first awk waits for input on stdin. Meanwhile the second awk reads $CARFILE directly and writes the output file, which is why results appear even though the script then sits waiting on stdin.
This is like saying:
awk '{print}' | awk 'BEGIN {print 1}'
It indeed prints the 1 but waits for some kind of input to come.
Just join the conditions:
awk -F: '$3=="1994" && $3<="1999"' $CARFILE > output/1994-1999.txt
Regarding the rest of the script: note you are not using many double quotes. They are a good practice to prevent problems when you have names with spaces, etc. So for example you can say grep 'Vauxhall' "$CARFILE" and allow $CARFILE to contain things like "my car".
You can find these kinds of errors by pasting your script into ShellCheck.

Related

give a file without changing the name in script [duplicate]

This question already has answers here:
How to pass parameters to a Bash script?
At the beginning I have a file.txt, which contains several pieces of information that I will extract using the grep command, as you see in the script.
What I want is to give the script the file I want instead of file.txt, but without changing the file name each time in the script. For example, if the file is named Me.txt, I don't want to go into the script and write Me.txt in each grep command, especially if I have dozens of commands.
Is there a way to do this?
#!/bin/bash
grep teste file.txt > testline.txt
awk '{print $2}' testline.txt > test.txt
echo '#'
echo '#'
grep remote file.txt > remoteline.txt
awk '{print $3}' remoteline.txt > remote.txt
echo '#'
echo '#'
grep adresse file.txt > adresseline.txt
awk '{print $2}' adresseline.txt > adresse.txt
Using a parameter, as many contributors here suggested, is of course the obvious approach, and the one usually taken in such a case, so I want to extend this idea:
If you do it naively as
filename=$1
you have to supply the name on every invocation. You can improve on this by providing a default value for the case where the parameter is missing:
filename=${1:-file.txt}
But sometimes you are in a situation where, for some time (while working on a specific task), you always need the same filename over and over, and the default value happens not to be the one you need. Another possibility for passing information to a program is via the environment. If you set the filename by
filename=${MOOFOO:-file.txt}
it means that - assuming your script is called myscript.sh - if you invoke your script by
MOOFOO=myfile.txt myscript.sh
it uses myfile.txt, while if you call it by
myscript.sh
it uses the default file.txt. You can also set MOOFOO in your shell, as
export MOOFOO=myfile.txt
and then, even a lone execution of
myscript.sh
will use myfile.txt instead of the default file.txt.
The most flexible approach is to combine both, and this is what I often do in such a situation. If you do in your script a
filename=${1:-${MOOFOO:-file.txt}}
it takes the name from the 1st parameter, but if there is no parameter, takes it from the variable MOOFOO, and if this variable is also undefined, uses file.txt as the last fallback.
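A quick illustration of the precedence, assuming myscript.sh contains the combined line above:
./myscript.sh Me.txt                # uses Me.txt (the parameter wins)
MOOFOO=myfile.txt ./myscript.sh     # uses myfile.txt (the environment variable)
./myscript.sh                       # uses file.txt (the last fallback)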
You should pass the filename as a command line parameter so that you can call your script like so:
script <filename>
Inside the script, you can access the command line parameters in the variables $1, $2,.... The variable $# contains the number of command line parameters passed to the script, and the variable $0 contains the path of the script itself.
As with all variables, you can choose to put the variable name in curly braces, which sometimes has advantages: ${1}, ${2}, ...
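One concrete case where the braces are required is positional parameters beyond $9:
echo "$10"     # expands $1 followed by a literal 0
echo "${10}"   # the tenth positional parameter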
#!/bin/bash
if [ $# -eq 1 ]; then
    filename="$1"
else
    echo "USAGE: $(basename "$0") <filename>"
    exit 1
fi
grep teste "${filename}" > testline.txt
awk '{print $2}' testline.txt > test.txt
echo '#'
echo '#'
grep remote "${filename}" > remoteline.txt
awk '{print $3}' remoteline.txt > remote.txt
echo '#'
echo '#'
grep adresse "${filename}" > adresseline.txt
awk '{print $2}' adresseline.txt > adresse.txt
By the way, you don't need two different files to achieve what you want; you can just pipe the output of grep straight into awk, e.g.:
grep teste "${filename}" | awk '{print $2}' > test.txt
but then again, awk can do the regex match itself, reducing it all to just one command:
awk '/teste/ {print $2}' "${filename}" > test.txt
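Taking that one step further, a single awk pass can replace all three grep/awk pairs; here is a sketch built from the three commands above (same patterns and output files):
awk '/teste/   { print $2 > "test.txt" }
     /remote/  { print $3 > "remote.txt" }
     /adresse/ { print $2 > "adresse.txt" }' "${filename}"
Inside awk, > opens each output file once and keeps appending to it for the rest of the run, so this behaves like the three separate redirections.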

How to optimize bash script parsing multiple gzipped files with multiple patterns?

I have a bash script which iterates over many files: f1.gz, f2.gz, .. fn.gz
Each file contains millions of lines, and each line could match one pattern out of a set: p1, p2, .. pn
Depending on that, the matching line should go to a specific file. The patterns are obtained with date manipulations.
I wrote a couple of versions of the same, but I'm not satisfied at all, and I would like to ask if a better way/solution can be achieved without resorting to writing anything in a compiled language.
Here's what I have:
for FILE in `ls f*.gz`
do
    echo "uncompressing only once per file -- $FILE: "
    gzcat $FILE > .myfile.txt
    while IFS='' read -r LINE || [[ -n "$LINE" ]]; do
        for DATE in "$@" # I pass to my script several dates like 20201015, 20201014, etc
        do
            for i in {0..23}
            do
                p="DATE_PATTERNS_$DATE[$i]" # I prepared these before to avoid running "date" millions of times
                echo $LINE | awk -v pat=${!p} -F '"' '$1 ~ pat {print $2" "$4" "$6}' >> $DATE.txt
            done
        done
    done < .myfile.txt
done
Thanks
If you don't want to replace the code with one awk looping through the dates, you can start by removing the while loop (and opening the output file less often):
for FILE in f*.gz; do
    echo "uncompressing only once per file -- $FILE: "
    gzcat $FILE > .myfile.txt
    # I pass to my script several dates like 20201015, 20201014, etc
    for DATE in "$@"; do
        for i in {0..23}
        do
            p="DATE_PATTERNS_$DATE[$i]"
            awk -v pat=${!p} -F '"' '$1 ~ pat {print $2" "$4" "$6}' .myfile.txt
        done >> $DATE.txt
    done
done
When you have tried this and still want improvements, consider moving the for DATE and for i loops into awk, and/or start with gzcat f*.gz > .mycombinedfiles.txt (when disk space is no issue).
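For a rough idea of what moving the loops into awk could look like, here is a sketch. It assumes the hourly patterns have been written out once to a file patterns.txt with one DATE<TAB>PATTERN line each; that file name and format are hypothetical, not part of the original script:
gzcat f*.gz |
awk '
    # First file (patterns.txt): build one alternation regex per date.
    NR == FNR { pat[$1] = pat[$1] (pat[$1] ? "|" : "") $2; next }
    # Then stdin, split on double quotes like the original awk call.
    {
        for (d in pat)
            if ($1 ~ pat[d])
                print $2, $4, $6 >> (d ".txt")
    }
' FS='\t' patterns.txt FS='"' -
Depending on the awk implementation, you may need to close() the per-date output files if there are very many dates.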

while-read loop broken on ssh-command

I have a bash script that moves backup files to a remote location. On a few occasions the temporary HDDs on the remote server had no space left, so I added an md5 check to compare local and remote files.
The remote ssh, however, breaks the while-loop (i.e. it runs only for the first item listed in the dir_list file).
# populate /tmp/dir_list
(while read dirName
do
    # create archive files for sub-directories
    # populate listA variable with archive-file names
    ...
    for fileName in $listA; do
        scp /PoolZ/__Prepared/${dirName}/$fileName me@server:/archiv/${dirName}/
        md5_local=`md5sum /PoolZ/__Prepared/${dirName}/${fileName} | awk '{ print $1 }'`
        tmpRemoteName=`printf "%q\n" "$fileName"` # some file-names have strange characters
        md5_remote=`ssh me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName | awk '{ print $1 }'`
        if [[ $md5_local == $md5_remote ]]; then
            echo "Checksum of ${fileName}: on local ${md5_local}, on remote ${md5_remote}."
            mv -f /PoolZ/__Prepared/${dirName}/$fileName /PoolZ/__Backuped/${dirName}/
        else
            echo "Checksum of ${fileName}: on local ${md5_local}, on remote ${md5_remote}."
            # write eMail
        fi
    done
done) < /tmp/dir_list
When started, the script gives the same md5 sums for the first directory listed in dir_list. The files are also copied to the expected directories, both local and remote, and then the script quits.
If I remove the line:
md5_remote=`ssh me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName | awk '{ print $1 }'`
then apparently the md5-comparison is not working, but the whole script goes through the whole list from dir_list.
I also tried to use double-quotes:
md5_remote=`ssh me@server "md5sum /archiv/${dirName}/${tmpRemoteName}" | awk '{ print $1 }'`
but there was no difference (broken dirName-loop).
I went so far as to replace the md5_remote... line with a remote ls command containing no shell variables, and eventually I even tried a line without assigning to the md5_remote variable at all, i.e.:
ssh me@server "ls /dir/dir/dir/ | head -n 1"
Every solution that contains an ssh command breaks the while-loop. I have no idea why ssh should break a bash loop. Any suggestions are welcome.
I'm plainly stupid. I just found the answer on — what a surprise — stackoverflow.com.
ssh breaks out of while-loop in bash
As suggested there, I added a redirect from /dev/null and it works now:
md5_remote=`ssh me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName < /dev/null | awk '{ print $1 }'`
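For context: ssh was swallowing the rest of /tmp/dir_list because, by default, it passes its own stdin (here, the loop's input) through to the remote command, so read had nothing left to read after the first iteration. ssh's -n option performs the same /dev/null redirection itself, so this variant should behave identically:
md5_remote=`ssh -n me@server 'md5sum /archiv/'${dirName}'/'$tmpRemoteName | awk '{ print $1 }'`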

How to process lines read from standard input in a UNIX shell script?

I am stuck on this problem:
I wrote a shell script and it gets a large file with many lines from stdin; this is how it is executed:
./script < filename
I want to use the file as input to another operation in the script, however I don't know how to store this file's name in a variable.
It is a script that takes a file from stdin and then does an awk operation on that same file. Say I write in the script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads the first line of the input file.
And I found a way to write it like this:
Min=-1
while read line; do
    n=$(echo $line | awk -F$delim '{print NF}')
    if [ $Min -eq -1 ] || [ $n -lt $Min ]; then
        Min=$n
    fi
done
it takes a very, very long time to process; it seems awk takes most of the time.
So how to improve this?
/dev/stdin can be quite useful here.
In fact, it's just a chain of links to your input.
So, writing cat /dev/stdin will give you all the input from your file, and you can avoid using the input filename at all.
Now, the answer to the question :) Recursively read the links, beginning at /dev/stdin, and you will get the filename. Bash code:
r() {
    l=`readlink $1`
    if [ $? -ne 0 ]
    then
        echo $1
    else
        r $l
    fi
}
filename=`r /dev/stdin`
echo $filename
UPD:
in Ubuntu I found the -f option to readlink, i.e. readlink -f /dev/stdin gives the same output. This option may be absent in some systems.
UPD2: tests (test.sh is the code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument, it's going to be the wrapping shell's stdin automatically
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
Aside: the reason your while read line / multiple awk variant (the one in the question) is slow is that it spawns an awk process for each and every line of the input, and process spawning is orders of magnitude slower than awk processing a single line. The reason the generate-tmpfile/single-awk variant (the one in your answer) is still a bit slow is that it generates the tmpfile line by line, reopening it to append every time.
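Putting that together, the whole minimum-field-count computation can run as a single awk invocation reading stdin directly, with no tmpfile and no per-line processes (a sketch using the Min and delim names from the question):
Min=$(awk -F"$delim" 'NR == 1 || NF < min { min = NF } END { print min }')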
Modify your script so that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
How about invoking the script differently: pipe the contents of your file into your script as follows. The standard output of cat filename then becomes standard input to your script, actually in this case to the awk command.
Say I have the file Names.data and the script showNames.sh; execute as follows:
cat Names.data | ./showNames.sh
Contents of filename Names.data
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of script showNames.sh
#!/bin/bash
#whatever awk commands you need
awk "{ print }"
Well, I finally found this way to solve my problem, although it takes several seconds.
grep '.*' >> /tmp/tmpfile
Min=$(awk -F$delim 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)
Just append each line to a temporary file so that, after reading from stdin, the tmpfile is the same as the input file.

How to combine two text files into one with a specific format with shell script?

I'm trying to combine two files into one file with a specific format, the files contain the following:
fileA.txt:
2
1
1
...
fileB.txt:
0023412322
1241231132
1234411177
...
So the output should be:
fileC.txt:
A B
2 0023412322
1 1241231132
1 1234411177
...
Where A and B represent the column names and also form the first line of the output file.
The script should run on Solaris, but I'm also having trouble with the awk command, and I am not allowed to change or install anything on the system. Right now I have a solution using a loop, but it is not very efficient: the script takes too long with large files. So, aside from using awk and loops, any suggestions?
I never managed to get an awk command working, so I don't have an awk version, only the loop:
echo "A B" > fileC.txt
i=echo 1
for line in cat fileA.txt
do
VAR=`sed -n "$i"',1p' fileB.txt`
echo "$line $VAR" >> fileC.txt
echo "$VAR" >> file"$line".txt #Another neccesary command for the script
i=`expr $i + 1`
done
What changes should I make?
paste is a very handy program that does almost exactly what you want, short of printing out the filenames or writing to a file. I wrote this simple shell script to add the filenames:
echo -e "$1\t$2" # print the filenames separated by a tab
paste $1 $2
You can run this by using chmod to make it executable, then running ./myscript file1 file2 (assuming that you name the script myscript). If you want to write to a third file, you can do ./myscript file1 file2 > file3.
Note that as written, the contents of each file are separated by tabs. If you want them to instead be separated by spaces, you can use this script instead:
echo "$1 $2"
paste -d" " $1 $2
