Bash while read line loop does not print every line in condition

I have the following situation:
I have a text file that I'm looping over so I can tell whether each line matches ".mp3". The file is this one:
12 Stones.mp3
randomfile.txt
Aclarion.mp3
ransomwebpage.html
Agents Of The Sun.mp3
randomvideo.mp4
So, I've written the following script to process it:
while read line || [ -n "$line" ]
do
    varline=$(awk '/.mp3/{print "yes";next}{print "no"}')
    echo $varline
    if [ "$varline" == "yes" ]; then
        some-command
    else
        some-command
    fi
done < file.txt
The expected output would be:
yes
no
yes
no
yes
no
Instead, it seems to miss the first line, and I get the following:
no
yes
no
yes
no

You really don't need Awk for a simple pattern match if that's all you used it for.
while IFS= read -r line; do
    case $line in
        *.mp3) some-command;;
        *) some-other-command;;
    esac
done <file.txt
If you are using Awk anyway for other reasons, looping over the lines in a shell loop is inefficient and very often an antipattern. This doesn't really fix that, but it at least avoids executing a new Awk instance on every iteration:
awk '{ print ($0 ~ /\.mp3$/) ? "yes" : "no" }' file.txt |
while IFS= read -r whether; do
    case $whether in
        'yes') some-command ;;
        'no') some-other-command;;
    esac
done
If you need the contents of "$line" too, printing that from Awk as well and reading two distinct variables is a trivial change.
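For example, a minimal sketch of that variant (assuming tab-separated output and that no line contains a tab; the placeholder commands receive the line as an argument purely for illustration):
awk '{ print (($0 ~ /\.mp3$/) ? "yes" : "no") "\t" $0 }' file.txt |
while IFS=$'\t' read -r whether line; do
    case $whether in
        'yes') some-command "$line" ;;
        'no') some-other-command "$line" ;;
    esac
done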
I simplified the read expression on the assumption that you can ensure your input file is well-formed separately. If you can't, you need to put back the more complex guard against a missing newline on the last line of the file.
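For reference, that guard is the same idiom the question already uses:
while IFS= read -r line || [ -n "$line" ]; do
    some-command    # the || [ -n "$line" ] clause still processes a final line that lacks a trailing newline
done <file.txt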

Use awk
$ awk '{if ($0 ~ /mp3/) {print "yes"} else {print "no"}}' file.txt
yes
no
yes
no
yes
no
Or more concisely:
$ awk '/mp3/{print "yes";next}{print "no"}' file.txt
$ awk '{print (/mp3/ ? "yes" : "no")}' file.txt

Have you forgotten something? Your awk has no explicit input; change to this instead:
while IFS= read -r line || [ -n "$line" ]
do
    varline=$(echo "$line" | awk '/.mp3/{print "yes";next}{print "no"}')
    echo "$varline"
    if [ "$varline" == "yes" ]; then
        some-command
    else
        some-other-command
    fi
done < file.txt
In this case, you might want to change the pattern to /\.mp3$/ or /\.mp3[[:space:]]*$/ for precise matching.
Because . matches any character, /.mp3/ would match Exmp3but.mp4 too, for example.
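A quick way to see that false positive (hypothetical filenames; 1 means the pattern matched, 0 means it did not):
printf '%s\n' Exmp3but.mp4 song.mp3 |
awk '{ print ($0 ~ /.mp3/), ($0 ~ /\.mp3$/), $0 }'
# Exmp3but.mp4 -> 1 0 (the loose pattern matches, the anchored one does not)
# song.mp3     -> 1 1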
Update: changed while read line to while IFS= read -r line, to keep each line's content intact when assigning it to the variable.
And the awk part can be improved to:
awk '{print $0~/\.mp3$/ ? "yes":"no"}'
So with awk only, you can do it like this:
awk '{print $0~/\.mp3$/ ? "yes":"no"}' file.txt
Or if your purpose is just the commands in the if structure, you can just do this:
awk '/\.mp3$/{system("some-command");next}{system("some-other-command");}' file.txt
or this:
awk '{system($0~/\.mp3$/ ? "some-command" : "some-other-command")}' file.txt
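If those commands need the matching filename as an argument, you can concatenate $0 onto the command string (a sketch; this naive version assumes filenames contain no shell metacharacters, which real use would have to quote around):
awk '{ system(($0 ~ /\.mp3$/ ? "some-command " : "some-other-command ") $0) }' file.txt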

Related

Detect double new lines with bash script

I am attempting to return the line numbers of lines that have a break. An input example:
2938
383

3938
3

383
33333
But my script is not working and I can't see why. My script:
input="./input.txt"
declare -i count=0
while IFS= read -r line;
do
    ((count++))
    if [ "$line" == $'\n\n' ]; then
        echo "$count"
    fi
done < "$input"
So I would expect 3 and 6 as output.
I just receive a blank response in the terminal when I execute, so there isn't a syntax error; something else is wrong with the approach I am taking. Bit stumped and grateful for any pointers.
Also "just use awk" doesn't help me. I need this structure for additional conditions (this is just a preliminary test) and I don't know awk syntax.
The issue is that "$line" == $'\n\n' can never match: read strips the newline delimiter when it consumes an empty line, so $line is simply the empty string. Instead, you can match an empty line with the regex pattern ^$:
if [[ "$line" =~ ^$ ]]; then
Now it should work.
It's also much easier with an awk command:
$ awk '$0 == ""{ print NR }' test.txt
3
6
As Roman suggested, a line read by read is terminated by a delimiter, and that delimiter will not show up in the line the way you're testing for.
If the pattern you are searching for looks like an empty line (which I infer is how a "double newline" always manifests), then you can just test for that:
while read -r; do
    ((count++))
    if [[ -z "$REPLY" ]]; then
        echo "$count"
    fi
done < "$input"
Note that IFS is for field-splitting data on lines, and since we're only interested in empty lines, IFS is moot.
Or if the file is small enough to fit in memory and you want something faster:
mapfile -t -O1 foo < "$input"
declare -p foo    # debug: show what was read
for n in "${!foo[@]}"; do
    if [[ -z "${foo[$n]}" ]]; then
        echo "$n"
    fi
done
Reading the file all at once (mapfile) then stepping through an array may be easier on resources than stepping through a file line by line.
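One detail worth spelling out: -O1 makes the array indices start at 1 instead of 0, so each index lines up with its line number. A standalone check:
mapfile -t -O1 foo <<< $'a\n\nc'
declare -p foo      # declare -a foo=([1]="a" [2]="" [3]="c")
echo "${!foo[@]}"   # 1 2 3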
You can also just use GNU awk:
gawk -v RS= -F'\n' 'RT ~ /\n\n/ { print (i += NF) + 1; i += length(RT) - 1 }' input.txt
By using FS = ".+", the next one-liner ensures that only truly zero-length lines (i.e. $0 == "") have their line numbers printed, while rows consisting entirely of [[:space:]] characters are skipped:
echo '2938
383

3938
3

383
33333' |
{m,g,n}awk -F'.+' '!NF && $!NF = NR'
3
6
This sed one-liner should do the job at once:
sed -n '/^$/=' input.txt
It simply writes the current line number (the = command) whenever the line read is empty (/^$/ matches an empty line).

How to filter text data in bash more efficiently

I have a data file which I need to filter with a bash script; see the data example:
name=pencils
name=apples
value=10
name=rocks
value=3
name=tables
value=6
name=beds
name=cups
value=89
I need to group name/value pairs like so: apples=10. If the current line starts with name and the next line also starts with name, the first line should be omitted entirely. So the result file should look like this:
apples=10
rocks=3
tables=6
cups=89
I came up with this simple solution, which works but is very slow; it takes 5 minutes to complete for a file with 2000 lines.
VALUES=$(cat input.txt)
for x in $VALUES; do
    if [[ -n $(echo $x | grep 'name=') ]]; then
        name=$(echo $x | sed "s/name=//")
    elif [[ -n $(echo $x | grep 'value=') ]]; then
        value=$(echo $x | sed "s/value=//")
        echo "${name}=${value}" >> output.txt
    fi
done
I'm aware that this kind of task is not very suitable for bash, but the script is already written and this is just a small part of it.
How can I optimize this task in bash?
Do not run commands in subshells; it slows your script down a lot. You can do everything in the current shell.
#! /bin/bash
while IFS== read -r k v ; do
    if [[ $k == name ]] ; then
        name=$v
    elif [[ $k == value ]] ; then
        printf '%s=%s\n' "$name" "$v"
    fi
done < input.txt
There are three easy optimizations you can make that will greatly speed up the script without requiring a major rethink.
1. Replace for with while read
Loading input.txt into a string, and then looping over that string with for x in $VALUES is slow. It requires the whole file to be read into memory even though this task could be done in a streaming fashion, reading a line at a time.
A common replacement for for line in $(cat file) is while read line; do ... done < file. It turns out that loops are compound commands, and like the normal one-line commands we're used to, compound commands can have < and > redirections. Redirecting a file into a loop means that for the duration of the loop, stdin comes from the file. So if you call read line inside the loop then it will read one line each iteration.
while IFS= read -r x; do
    if [[ -n $(echo $x | grep 'name=') ]]; then
        name=$(echo $x | sed "s/name=//")
    elif [[ -n $(echo $x | grep 'value=') ]]; then
        value=$(echo $x | sed "s/value=//")
        echo "${name}=${value}" >> output.txt
    fi
done < input.txt
2. Redirect output outside loop
It's not just input that can be redirected; we can do the same thing with the >> output.txt redirection, and here's where you'll see the biggest speedup. When >> output.txt is inside the loop, output.txt must be opened and closed every iteration, which is crazy slow. Moving it outside means it only needs to be opened once. Much, much faster.
while IFS= read -r x; do
    if [[ -n $(echo $x | grep 'name=') ]]; then
        name=$(echo $x | sed "s/name=//")
    elif [[ -n $(echo $x | grep 'value=') ]]; then
        value=$(echo $x | sed "s/value=//")
        echo "${name}=${value}"
    fi
done < input.txt > output.txt
3. Shell string processing
One final improvement is to use faster string processing. Calling grep requires forking a subprocess every time just to do a simple string split. It'd be a lot faster if we could do the string splitting using just shell constructs. Well, as it happens that's easy now that we've switched to read. read can do more than read whole lines; it can also split on a delimiter from the variable $IFS (inter-field separator).
while IFS='=' read -r key value; do
    case "$key" in
        name) name="$value";;
        value) echo "$name=$value";;
    esac
done < input.txt > output.txt
Further reading
BashFAQ/001 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
This explains why I use IFS= read -r in the first two versions of the loop.
BashFAQ/024 - I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
cmd | while read; do ... done is another popular use of while read, but it has unique pitfalls; see the sketch after this list.
BashFAQ/100 - How do I do string manipulations in bash?
More in-shell string processing options.
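A quick illustration of the BashFAQ/024 pitfall (a sketch: in a pipeline the while loop runs in a subshell, so variables set inside it vanish afterwards):
count=0
printf '%s\n' a b c | while read -r _; do
    ((count++))     # increments a copy that lives only in the subshell
done
echo "$count"       # prints 0, not 3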
If you have performance issues, do not use bash at all. Use a text-processing tool like, for instance, awk:
$ awk -F= '$1 == "value" {print name "=" $2} {name = $2}' data.txt
apples=10
rocks=3
tables=6
cups=89
Explanation: -F= defines the field separator as the character =. The first block is executed only if the first field of the line ($1) equals the string value; it prints the variable name followed by the character = and the second field ($2). The second block is executed on every line and stores the second field ($2) in the variable name.
Normally, if your input resembles what you show, this should automatically skip the first line. Otherwise, we can exclude it explicitly using a test on the NR variable, whose value is the line number, starting at 1:
awk -F= 'NR != 1 && $1 == "value" {print name "=" $2}
         NR != 1 {name = $2}' data.txt
All this works on inputs like the one you show but not on inputs where you would have other types of lines or several value=... consecutive lines. If you really want to test that the name/value pair is on two consecutive lines we need something more. For instance, test if the first field is name and use another variable n to store the line number of the last encountered name=... line. With all these tests we can now put the 2 blocks in a slightly more intuitive order (but the opposite would work the same):
awk -F= 'NR != 1 && $1 == "name" {name = $2; n = NR}
NR != 1 && NR == n+1 && $1 == "value" {print name "=" $2}' data.txt
With awk there might be a more elegant solution but you can have:
awk 'BEGIN{RS="\n?name=";FS="\nvalue="} {if($2) printf "%s=%s\n",$1,$2}' inputs.txt
RS="\n?name=" says that the record separator is name=
FS="\nvalue=" says that the field separator for each record is value=
if($2) says to only proceed the printf is the second field exists
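To make that splitting concrete, here is a sketch that dumps both fields of every record for the sample input (GNU awk or mawk, since a regex RS is used; record 1 is the empty text before the first name=):
awk 'BEGIN{RS="\n?name="; FS="\nvalue="}
     {printf "record %d: $1=[%s] $2=[%s]\n", NR, $1, $2}' input.txt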

Bash Shell: Infinite Loop

The problem is the following: I have a file in which each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
I want to sort alphabetically all the first names in that file and print them one per line, but each name only once.
I have created the following program, but for some reason it creates an infinite loop:
array1=()
while read LINE
do
    if [ ${LINE:0:1} != '#' ]
    then
        IFS="|"
        array=($LINE)
        if [[ "${array1[@]}" != "${array[2]}" ]]
        then
            array1+=("${array[2]}")
        fi
    fi
done < $3
echo ${array1[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ]: this test is used because there are comments in the file that I don't want to print
$3: the filename
array1: used to collect all the distinct names
Wow, there's a MUCH simpler and cleaner way to achieve this, without having to mess with the IFS variable or using arrays. You can use "for" to do this:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
for LINE in `cat file`
Creates a loop that reads each line of "file". The commands between backticks are run by the shell; for example, if you wanted to store the date in a variable you could use "VARDATE=`date`".
grep -v "^#"
The option -v is used to exclude results matching the pattern, in this case the pattern is "^#". The "^" character means "line begins with". So grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option switches the column delimiter from the default (the default is a space) to whatever you put between ' after it, in this case the "|" character.
The '{print$3}' prints the 3rd column.
sort -u
And the "sort -u" command to sort the names alphabetically.

How to read a file for special lines in bash script

I just want to read the even-numbered lines from a file in a bash shell; how do I do it?
Also, I want to read just the fifth line of a file; how do I do that?
awk 'NR % 2 == 0' <filename>
For the second one:
awk 'NR == 5' <filename>
You can also use sed to print lines in a specified range:
sed -ne '5,5p' <filename>
You could use the tail command: put it in a for loop for the first case, and the second is totally trivial once you have the first.
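A sketch of that idea (it re-reads the file once per printed line, so it is only reasonable for small files):
total=$(wc -l < file)
for ((n = 2; n <= total; n += 2)); do
    head -n "$n" file | tail -n 1   # line n is the last line of the first n
done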
Or maybe you could even use awk:
awk NR==5 file_name
To print even-numbered lines using GNU sed (first~step addressing is a GNU extension):
sed -n "2~2 p" file
To print a specific line number from a file using sed:
sed '5q;d' file
(d deletes lines 1 through 4 so they are not printed; at line 5, 5q prints the line and quits before d can run.)
Awk is often the answer (or, nowadays, Perl, Python etc. too)
If for some reason you must do it with only bash and the basic shell utilities:
cat file | \
while read -r line; do
    i=$(( (i + 1) % 2 ))
    if [[ $i -eq 0 ]]; then
        echo "$line"    # or whatever else you wanted to do with it
    fi
done
And to get a specific line:
cat file | head -5 | tail -1
Try this; for example, for lines between 3 and 6:
awk 'NR>=3 && NR<=6'
Here is a start to build on (not complete):
#!/bin/bash
test=$(awk 'NR>=3 && NR<=6' input.txt)
while read -r line; do
    : # do stuff
done <<< "$test"

shell loop match a regex in the current line

I'm trying to create a script to fix a csv file like this:
field_one,field_two,field_three
,field_two,field_three
So I need to check inside my loop whether the current line is missing field_one, and if it is, replace it with sed with a new value for field_one (overwriting the line that is missing field_one).
For this I have a loop, but I need some help with identifying whether the line is missing field one or not. I should probably use grep? But how do I use it in a loop and get its response?
while read -r line; do
    # this is pseudocode:
    # if $line matches regex then
    #     sed 's/,/newfieldone/'
    #     overwrite the corrected line in the file
    # end if
done < my_file
Thanks a lot in advance for your help!!!!
Inside your loop you can run the following sed command:
sed 's/^\s*,/newfieldone,/'
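Wired into your loop, that could look like this (a sketch; it writes to a temporary file and then replaces the original, since overwriting a file while reading from it is unsafe):
while IFS= read -r line; do
    printf '%s\n' "$line" | sed 's/^\s*,/newfieldone,/'
done < my_file > my_file.tmp && mv my_file.tmp my_file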
To see if a line begins with a , and is hence missing field one, you can use if [[ "$line" =~ ^, ]].
For example:
while read -r line; do
    if [[ "$line" =~ ^, ]]
    then
        echo "newfieldone$line"
    else
        echo "$line"
    fi
done < my_file
Just for the heck of it, here's a solution in awk:
awk '{FS=","} {if ($1 == "") print "field_one" $0;else print $0} ' < /tmp/test.txt
$ sed -e "/^,/s/^,\([^,]*\),\([^,]\)/new_field_one,\1,\2/" < my_file
Edit: This probably is too complicated. Take one of the other fine answers :)
With sed, try something like this:
sed -i 's|\(^,.*\)|new_field_one\1|g' <your file>
This might work for you:
a=Field_one,Field_two,Field_three
sed '/^,/c\'$a'' file
field_one,field_two,field_three
Field_one,Field_two,Field_three
Or if just inserting field_one:
a=Field_one
sed '/^,/s/^/'$a'/' file
field_one,field_two,field_three
Field_one,field_two,field_three
Simple bash solution using a case statement:
while read -r line; do
    case "$line" in
        ,*) printf "%s%s\n" newfieldone "$line" ;;
        *) printf "%s\n" "$line" ;;
    esac
done < my_file
case uses "glob" matching, not regular expressions, so ,* matches a string beginning with a comma.
sed -i 's/^,/fieldone,/' YOURFILE
Will replace every line starting with , so that it starts with fieldone, instead (in place, so the original file gets overwritten; if you need a backup, try -i.backup).
If you want a dynamic fieldone value, well, it depends on how dynamic you want it to be :-), e.g.:
MYDYNAMICFIELDONE="DYNAF1"
sed -i "s/^,/${MYDYNAMICFIELDONE},/" YOURFILE
Or with your while loop:
while read -r line; do
    MYDYNAMICFIELDONE="SET IT"
    echo "$line" | sed "s/^,/${MYDYNAMICFIELDONE},/"
done < my_file > tmpfile
mv tmpfile my_file
Or with awk:
awk '/^,/ {
    DYNAF1 = "SET IT HERE"
    print gensub("^,", DYNAF1 ",", "g", $0)
    next
}
{ print }' INPUT > OUTPUT
This is a pretty short one-liner with awk:
awk '{$1="field_one"}1' FS=',' OFS=',' file.csv
. . . and another awk one-liner:
awk '$1==""{$1="field_one"}1' FS=',' OFS=',' file
What about using bash only?
while IFS=, read -r field_one field_two rest_of_line; do
    echo "${field_one:-default_field_one_value},$field_two,$rest_of_line"
done < my_file > my_correct_file
where default_field_one_value is used if field_one is empty
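The ${var:-default} form is standard parameter expansion; a standalone example of the behavior relied on here:
x=""
echo "${x:-fallback}"   # prints "fallback": :- substitutes when the variable is unset or empty
x="value"
echo "${x:-fallback}"   # prints "value"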
