Checking for empty lines in a file - bash

I don't have a code example here since I'm not sure how to do this at all, but I have a file. A legal empty line is one that contains only the newline character. Spaces or tabs are illegal.
How do I check if a line is "legally empty"?
If it doesn't have any words (I can check this with wc -w), how do I check if it has no spaces or tabs either, just new-line?
So I've tried something like this:
while read line; do
if [[ "$line" =~ ^$ ]]; then
echo empty line
continue
fi
done < $1
But it's not working. If I put a " " in an otherwise empty line, it still considers it empty.

If you want the line numbers of those empty lines:
perl -lne 'print $. if(/^$/)' your_file
If you want to delete those lines without Perl:
grep . your_file >new_file
If you want to delete those empty lines in place using Perl:
perl -i -lne 'print if(/./)' your_file

Terminology: a line that contains only white space is a blank line. A line that contains nothing (except for the newline terminator) is an empty line.
The read builtin strips off leading and trailing whitespace. So if it encounters a blank line, it sets its argument to an empty string, regardless of the amount of whitespace. To avoid this behavior and return the input line unmodified, set the field separator characters to nothing (by default, they are space, tab and newline): set the IFS variable to the empty string. See "Why is while IFS= read used so often, instead of IFS=; while read..?" for a more detailed explanation. While you're at it, pass the -r option to read, unless you want backslash-newline sequences to be a line continuation.
while IFS= read -r line; do
if [ -z "$line" ]; then
echo empty line
fi
done <"$1"
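To see the difference concretely, here is a small sketch that feeds the same three lines through a plain read loop and through an IFS= read loop. The function names show_default and show_raw are made up for this illustration.

```bash
# Hypothetical helpers: print every input line wrapped in brackets.
show_default() {
    # Default IFS: read trims leading and trailing spaces and tabs.
    while read -r line; do printf '[%s]\n' "$line"; done
}

show_raw() {
    # Empty IFS: read leaves the line exactly as it was.
    while IFS= read -r line; do printf '[%s]\n' "$line"; done
}

# Three lines: padded text, a blank line (two spaces), an empty line.
printf ' x \n  \n\n' | show_default
printf ' x \n  \n\n' | show_raw
```

With the default IFS the second and third lines both come out as [], so a blank line and an empty line cannot be told apart. With IFS= the blank line keeps its two spaces.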
If you want to tell whether a line is blank:
while IFS= read -r line; do
case "$line" in
'') echo "empty line";;
*[![:space:]]*) echo "non-blank line";;
*) echo "non-empty blank line";;
esac
done <"$1"
You can use Bash regular expression matching if you prefer:
while IFS= read -r line; do
if [[ "$line" =~ ^$ ]]; then
echo "empty line"
elif [[ "$line" =~ ^[[:space:]]+$ ]]; then
echo "non-empty blank line"
else
echo "non-blank line"
fi
done <"$1"
These can be done with pattern matching too (using shell wildcards, which have a different syntax from common regular expressions):
while IFS= read -r line; do
if [[ "$line" == "" ]]; then
echo "empty line"
elif [[ "$line" != *[![:space:]]* ]]; then
echo "non-empty blank line"
else
echo "non-blank line"
fi
done <"$1"
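If you also need to know where each kind of line is, the same classification can be wrapped in a function and combined with a counter. This is only a sketch: classify_line and report_lines are names invented here, and the extra || [ -n "$line" ] is an addition so that a last line without a trailing newline is still processed.

```bash
# Hypothetical helper: name the class of a single line.
classify_line() {
    case $1 in
        '') echo empty ;;
        *[![:space:]]*) echo non-blank ;;
        *) echo blank ;;
    esac
}

# Hypothetical helper: read stdin and print "N: class" for every line.
report_lines() (
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf '%d: %s\n' "$n" "$(classify_line "$line")"
    done
)

# Sample input: text, an empty line, a space-plus-tab line, text.
printf 'foo\n\n \t\nbar\n' | report_lines
```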
If you merely want to look for empty lines in the file and aren't processing the lines in any other way, you can use grep:
if grep -qxF '' <"$1"; then
echo "$1 contains an empty line"
fi
If you're looking for blank lines that are not empty:
if grep -qEx '[[:space:]]+' <"$1"; then
echo "$1 contains a non-empty blank line"
fi
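grep can also report where those lines are, not just whether they exist. The sketch below uses grep -n and cut; the function names and the temporary sample file are assumptions made for the example.

```bash
# Hypothetical helper: line numbers of empty lines in file $1.
empty_line_numbers() {
    # -n prefixes each match with "N:"; cut keeps only the number.
    grep -n '^$' "$1" | cut -d: -f1
}

# Hypothetical helper: line numbers of whitespace-only, non-empty lines.
blank_line_numbers() {
    grep -nE '^[[:space:]]+$' "$1" | cut -d: -f1
}

tmp=$(mktemp)
printf 'foo\n\n  \nbar\n\n' > "$tmp"
empty_line_numbers "$tmp"   # lines 2 and 5 are empty
blank_line_numbers "$tmp"   # line 3 holds two spaces
rm -f "$tmp"
```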

You can check for an empty line with the regex
^$
^ is the beginning of a line and $ is the end of a line, so the above regex matches only if there are no characters in between.
You can now use that in e.g. sed
sed '/^$/d' input.txt
This prints the file's content to the console with all empty lines removed. The file itself remains unchanged.
If you want to remove the empty lines from the file (meaning, changing the file content), then run:
sed -i '/^$/d' input.txt
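Note that /^$/ only matches truly empty lines, so a line holding just spaces or tabs survives. The sketch below contrasts it with a pattern that also drops whitespace-only lines; drop_empty and drop_blank are hypothetical names. Also keep in mind that -i is not part of POSIX sed: GNU sed accepts it as shown, while BSD/macOS sed expects a backup suffix argument (for example -i '').

```bash
# Hypothetical filters reading stdin and writing stdout.
drop_empty() { sed '/^$/d'; }              # removes zero-length lines only
drop_blank() { sed '/^[[:space:]]*$/d'; }  # also removes whitespace-only lines

printf 'a\n\n  \nb\n' | drop_empty   # the two-space line is kept
printf 'a\n\n  \nb\n' | drop_blank   # only a and b remain
```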

Related

Detect double new lines with bash script

I am attempting to return the line number of lines that have a break. An input example:
2938
383

3938
3

383
33333
But my script is not working and I can't see why. My script:
input="./input.txt"
declare -i count=0
while IFS= read -r line;
do
((count++))
if [ "$line" == $'\n\n' ]; then
echo "$count"
fi
done < "$input"
So I would expect, 3, 6 as output.
I just receive a blank response in the terminal when I execute. So there isn't a syntax error, something else is wrong with the approach I am taking. Bit stumped and grateful for any pointers..
Also "just use awk" doesn't help me. I need this structure for additional conditions (this is just a preliminary test) and I don't know awk syntax.
The issue is that "$line" == $'\n\n' can never match: read strips the terminating newline, so after consuming an empty line from the input, $line is simply an empty string. Instead, you can match an empty line with the regex pattern ^$:
if [[ "$line" =~ ^$ ]]; then
Now it should work.
It's also much easier with the awk command:
$ awk '$0 == ""{ print NR }' test.txt
3
6
As Roman suggested, a line read by read is terminated by a delimiter (the newline), and that delimiter does not show up in the variable, so the comparison you're testing can never succeed.
If the pattern you are searching for looks like an empty line (which I infer is how a "double newline" always manifests), then you can just test for that:
while read -r; do
((count++))
if [[ -z "$REPLY" ]]; then
echo "$count"
fi
done < "$input"
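For reference, here is a self-contained sketch of the same idea that can be pasted into a terminal. It assumes the question's sample has empty lines at positions 3 and 6 (which is what the expected output implies), uses a named variable instead of REPLY, and wraps the loop in a made-up function print_empty_line_numbers.

```bash
# Hypothetical helper: print the line number of every empty line on stdin.
print_empty_line_numbers() (
    count=0
    while IFS= read -r line; do
        count=$((count + 1))
        if [ -z "$line" ]; then
            echo "$count"
        fi
    done
)

# Assumed sample input: the question's numbers with empty lines 3 and 6.
printf '2938\n383\n\n3938\n3\n\n383\n33333\n' | print_empty_line_numbers
```

Further conditions can be added inside the loop body next to the -z test, which keeps the structure the question asked for.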
Note that IFS is for field-splitting data on lines, and since we're only interested in empty lines, IFS is moot.
Or if the file is small enough to fit in memory and you want something faster:
mapfile -t -O1 foo < "$input"
declare -p foo
for n in "${!foo[@]}"; do
if [[ -z "${foo[$n]}" ]]; then
echo "$n"
fi
done
Reading the file all at once (mapfile) then stepping through an array may be easier on resources than stepping through a file line by line.
You can also just use GNU awk:
gawk -v RS= -F '\n' '{ print (i += NF); i += length(RT) - 1 }' input.txt
Using FS = ".+" ensures that only the line numbers of truly zero-length lines (i.e. $0 == "") get printed, while rows consisting entirely of [[:space:]] characters are skipped. This works in mawk, gawk and nawk:
echo '2938
383

3938
3

383
33333' |
awk -F'.+' '!NF && $!NF = NR'
3
6
This sed one-liner should do the job at once:
sed -n '/^$/=' input.txt
Simply writes the current line number (the = command) if the line read is empty (the /^$/ matches the empty line).
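As a quick cross-check, the sed one-liner and the awk test $0 == "" from the earlier answer can be run side by side, along with looser variants that also count whitespace-only lines. The sample data and the function names strict_numbers and loose_numbers are assumptions for this sketch.

```bash
# Hypothetical helper: numbers of zero-length lines in file $1 (sed).
strict_numbers() { sed -n '/^$/=' "$1"; }

# Hypothetical helper: numbers of empty or whitespace-only lines (awk).
# With the default field separator, such lines have no fields (NF == 0).
loose_numbers() { awk '!NF { print NR }' "$1"; }

tmp=$(mktemp)
# Line 3 is empty, line 6 holds two spaces.
printf '2938\n383\n\n3938\n3\n  \n383\n' > "$tmp"
strict_numbers "$tmp"                  # sed: zero-length lines only
awk '$0 == "" { print NR }' "$tmp"     # awk equivalent of the sed call
loose_numbers "$tmp"                   # also reports the two-space line
rm -f "$tmp"
```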

How to remove last part of a string with different length in bash

I am trying to collect the lines from a file that don't start with a # as their first character.
With this code I am able to get them:
while IFS= read -r line
do
[[ -z "$line" ]] && continue
[[ "$line" =~ ^# ]] && continue
#echo "LINEREADED: $line"
done < $file
So the output I have is something like this:
modules/core_as/xxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxx [100]
My question is how can I get only the string without the [100]?
I know there are some commands like sed or trim, but the problem is that the string is not always that length; sometimes it is different, like:
cross_modules/core_as/xxxx/xxxxxxxxx [100-103]
or
cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx [100-103]
or anything like that...
And in all these cases I only need the string without the [....] and without the blank space after the last x, whatever the length of the string is, like cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx
echo ${caseReaded:1:${#caseReaded}-7}
This also does the job but is not generic for any length.
Does anyone know how I can get this?
You can strip a certain part of a string in bash
echo "${line% [*}"
cross_modules/core_as/xxxx/xxxxxxxxx
modules/core_as/xxxx/xxxxxxxxxxxxxxxxxxxxxxxxxxx
cross_modules/core_as/xxxxxxxxxxxx/xxxxxxxxx
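The % operator removes the shortest suffix matching the pattern, while %% removes the longest one. The following sketch shows the difference; the function names are invented for the example, and the [ is escaped here so that it is always taken literally.

```bash
# Hypothetical helpers around suffix-removing parameter expansion.
strip_last_bracket() { printf '%s\n' "${1% \[*}"; }    # shortest match
strip_first_bracket() { printf '%s\n' "${1%% \[*}"; }  # longest match

strip_last_bracket 'cross_modules/core_as/xxxx/xxxxxxxxx [100-103]'
strip_last_bracket 'a [1] b [2]'    # removes only " [2]"
strip_first_bracket 'a [1] b [2]'   # removes everything from " [1]" on
strip_last_bracket 'no brackets'    # unchanged when nothing matches
```

For the lines in the question both forms give the same result, because each line has only one " [" in it.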
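If the only space in the line is the one before [, let read split the line on whitespace (this needs the default IFS) and throw away everything after the first field:
while read -r line _
do
[[ -z $line ]] && continue
[[ $line =~ ^# ]] && continue
printf '%s\n' "$line"
done < "$file"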
Use grep to match all lines not starting with # and then display the first field using cut, which works if the first field doesn't contain spaces:
grep -v '^#' "$file" | cut -f1 -d' '
If the thing before [100] contains spaces, this may be the way to go:
grep -v '^#' "$file" | sed -E 's/^(.*) .*$/\1/'
The last one works because the first .* in the sed expression is greedy: it matches up to the last space in the line, so only the final field is left for the trailing .*$ to match.
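The filtering and the stripping can also be done in a single sed call. This is a sketch under the assumption that the part to remove is always a space followed by one bracketed group at the end of the line; clean_paths is a made-up name.

```bash
# Hypothetical filter: drop comment and empty lines, then strip a
# trailing " [...]" group. Reads stdin, writes stdout.
clean_paths() {
    sed -e '/^#/d' -e '/^$/d' -e 's/ \[[^]]*]$//'
}

printf '%s\n' \
    '# a comment' \
    '' \
    'modules/core_as/xxxx/yyyy [100]' \
    'cross modules/zzz [100-103]' | clean_paths
```

Unlike the cut variant, this keeps spaces inside the path, since only the final bracketed group is matched.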

Substitute a variable in a line read from a file

I have read the config file which has the below variable:
export BASE_DIR="\usr\usr1"
In the same script I read a file line by line and I wanted to substitute the ${BASE_DIR} with \usr\usr1.
In the script:
while read line; do
echo $line
done <file.txt
${BASE_DIR}\path1 should be printed as \usr\usr1\path1
Tried eval echo and $(( )).
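You can use sed. This command searches for ${BASE_DIR} and replaces it with the value. The dollar sign is used as the separator, and the trailing 1 means only the first occurrence on each line is replaced.
sed -i -e 's$\${BASE_DIR}$\\usr\\usr1$1' hello.txt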
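You need to set the variable when you read the line that contains the assignment. Then you can replace it later. Use read -r so that the backslashes in the value survive.
#!/bin/bash
basedir=
while IFS= read -r line; do
    # remember the value when the assignment line goes by
    if [[ $line =~ ^(export[[:space:]]+)?BASE_DIR= ]]; then
        basedir=${line#*BASE_DIR=}
        basedir=${basedir//\"/}   # drop the surrounding double quotes
    fi
    # quote the replacement so its backslashes are taken literally
    line=${line/'${BASE_DIR}'/"$basedir"}
    printf '%s\n' "$line"
done < file.txt > newfile.txt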

bash while loop "eats" my space characters

I am trying to parse a huge text file, say 200mb.
the text file contains some strings
123
1234
12345
 12345
so my script looked like
while read line ; do
echo "$line"
done <textfile
however using this above method, my string " 12345" gets truncated to "12345"
I tried using
sed -n "$i"p textfile
but the throughput is reduced from 27 to 0.2 lines per second, which is unacceptable ;-)
Any idea how to solve this?
You want to echo the lines without a fieldsep:
while IFS="" read line; do
echo "$line"
done <<< " 12345"
When you also want to skip interpretation of special characters, use
while IFS="" read -r line; do
echo "$line"
done <<< " 12345"
You can write the IFS without double quotes:
while IFS= read -r line; do
echo "$line"
done <<< " 12345"
This seems to be what you're looking for:
while IFS= read line; do
echo "$line"
done < textfile
The safest method is to use read -r rather than plain read; -r stops read from interpreting backslashes as escape characters (thanks Walter A):
while IFS= read -r line; do
echo "$line"
done < textfile
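A short sketch of what -r changes, using an input line that contains backslashes. The function names are hypothetical, and printf is used instead of echo so the output itself is not subject to escape processing.

```bash
# Hypothetical helpers: copy stdin to stdout line by line.
drop_backslashes() {
    # Without -r, read treats a backslash as an escape character.
    while IFS= read line; do printf '%s\n' "$line"; done
}
keep_backslashes() {
    # With -r, the line is taken literally.
    while IFS= read -r line; do printf '%s\n' "$line"; done
}

printf '%s\n' ' \usr\usr1' | drop_backslashes   # backslashes are consumed
printf '%s\n' ' \usr\usr1' | keep_backslashes   # line comes through intact
```

In both cases the leading space is preserved, because that part is controlled by IFS= and not by -r.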
OPTION 1:
#!/bin/bash
# read whole file into array
readarray -t aMyArray < <(cat textfile)
# echo each line of the array
# this will preserve spaces
for i in "${aMyArray[@]}"; do echo "$i"; done
readarray -- read lines from standard input
-t -- omit trailing newline character
aMyArray -- name of array to store file in
< <() -- execute command; redirect stdout into array
cat textfile -- file you want to store in variable
for i in "${aMyArray[@]}" -- for every element in aMyArray
"" -- needed to maintain spaces in elements
${ [@]} -- reference all elements in array
do echo "$i"; -- for every iteration of "$i" echo it
"" -- to maintain variable spaces
$i -- equals each element of the array aMyArray as it cycles through
done -- close for loop
OPTION 2:
In order to accommodate your larger file you could do this to help alleviate the work and speed up the processing.
#!/bin/bash
sSearchFile=textfile
sSearchStrings="1|2|3|space"
while IFS= read -r line; do
echo "${line}"
done < <(grep -E "${sSearchStrings}" "${sSearchFile}")
This will grep the file (faster) before it cycles it through the while command. Let me know how this works for you. Notice you can add multiple search strings to the $sSearchStrings variable.
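The same pre-filtering idea can be written with a plain pipe, which also works in shells without process substitution. This is only a sketch; filter_lines, the pattern and the sample file are made up for the illustration.

```bash
# Hypothetical helper. $1: extended regex, $2: file to search.
filter_lines() (
    grep -E -- "$1" "$2" | while IFS= read -r line; do
        # Brackets make leading and trailing spaces visible.
        printf '<%s>\n' "$line"
    done
)

tmp=$(mktemp)
printf '123\n abc\n 12345\n' > "$tmp"
filter_lines '1|2|3' "$tmp"   # the " abc" line is filtered out by grep
rm -f "$tmp"
```

One trade-off: with a pipe the loop runs in a subshell, so variables set inside it are not visible afterwards. The < <(...) form used above avoids that in bash.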
OPTION 3:
and an all in one solution to have a text file with your search criteria and everything else combined...
#!/bin/bash
# identify file containing search strings, one per line
sSearchStringsFile="searchstrings.file"
# the combined pattern starts out empty
sSearchStrings=""
while IFS= read -r string; do
    # join the strings with "|", without a leading separator
    sSearchStrings="${sSearchStrings:+${sSearchStrings}|}${string}"
    # read search criteria in from file
done <"${sSearchStringsFile}"
# identify file to be searched
sSearchFile="text.file"
while IFS= read -r line; do
    echo "${line}"
done < <(grep -E "${sSearchStrings}" "${sSearchFile}")
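As an alternative to assembling the pattern by hand, grep can read its patterns from a file with -f. The sketch below adds -F so each line is treated as a fixed string rather than a regular expression, which is a different behavior from the "|"-joined pattern above. The function name and sample files are assumptions.

```bash
# Hypothetical helper. $1: file with one search string per line,
# $2: file to search. Prints every line containing any of the strings.
match_any() {
    grep -F -f "$1" -- "$2"
}

patterns=$(mktemp)
text=$(mktemp)
printf 'foo\nb.r\n' > "$patterns"
printf 'foo 1\nbar\nb.r 2\nbaz\n' > "$text"
match_any "$patterns" "$text"   # "bar" is not matched: the dot is literal
rm -f "$patterns" "$text"
```

Be aware that an empty line in the pattern file matches every input line.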

Bash script get item from array

I'm trying to read file line by line in bash.
Every line has format as follows text|number.
I want to produce file with format as follows text,text,text etc. so new file would have just text from previous file separated by comma.
Here is what I've tried and couldn't get it to work :
FILENAME=$1
OLD_IFS=$IFS
IFS=$'\n'
i=0
for line in $(cat "$FILENAME"); do
array=(`echo $line | sed -e 's/|/,/g'`)
echo ${array[0]}
i=i+1;
done
IFS=$OLD_IFS
But this prints both text and number but in different format text number
here is sample input :
dsadadq-2321dsad-dasdas|4212
dsadadq-2321dsad-d22as|4322
here is sample output:
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
What did I do wrong?
Not pure bash, but you could do this in awk:
awk -F'|' 'NR>1{printf(",")} {printf("%s",$1)}'
Alternately, in pure bash and without having to strip the final comma:
#!/bin/bash
# You can get your input from somewhere else if you like. Even stdin to the script.
input=$'dsadadq-2321dsad-dasdas|4212\ndsadadq-2321dsad-d22as|4322\n'
# Output should be reset to empty, for safety.
output=""
# Step through our input. (I don't know your column names.)
while IFS='|' read left right; do
# Only add a field if it exists. Salt to taste.
if [[ -n "$left" ]]; then
# Append data to output string
output="${output:+$output,}$left"
fi
done <<< "$input"
echo "$output"
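The piece that avoids the trailing comma is ${output:+$output,}: it expands to "$output," when output is non-empty and to nothing otherwise. A minimal sketch of that idiom on its own, with an invented function name join_commas:

```bash
# Hypothetical helper: join all non-empty arguments with commas.
join_commas() (
    out=
    for item in "$@"; do
        # Skip empty items so they do not produce stray commas.
        [ -n "$item" ] || continue
        # First item: ${out:+...} expands to nothing, so no leading comma.
        out="${out:+$out,}$item"
    done
    printf '%s\n' "$out"
)

join_commas a '' b c
```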
No need for arrays and sed:
while IFS='' read line ; do
echo -n "${line%|*}",
done < "$FILENAME"
You just have to remove the last comma :-)
Using sed:
$ sed ':a;N;$!ba;s/|[0-9]*\n*/,/g;s/,$//' file
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
Alternatively, here is a bit more readable sed with tr:
$ sed 's/|.*$/,/g' file | tr -d '\n' | sed 's/,$//'
dsadadq-2321dsad-dasdas,dsadadq-2321dsad-d22as
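Another pipeline along the same lines, swapping in cut and paste instead of sed and tr: cut keeps the first |-separated field and paste -s joins the lines with commas, so there is no trailing comma to remove. This is a sketch; first_fields_csv is a made-up name and the grep stage is an addition to skip blank lines.

```bash
# Hypothetical helper. $1: input file with lines of the form text|number.
first_fields_csv() {
    grep -v '^[[:space:]]*$' "$1" | cut -d'|' -f1 | paste -s -d, -
}

tmp=$(mktemp)
printf 'dsadadq-2321dsad-dasdas|4212\n\ndsadadq-2321dsad-d22as|4322\n' > "$tmp"
first_fields_csv "$tmp"
rm -f "$tmp"
```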
Choroba has the best answer (imho) except that it does not handle blank lines and it adds a trailing comma. Also, mucking with IFS is unnecessary.
This is a modification of his answer that solves those problems:
while read line ; do
if [ -n "$line" ]; then
if [ -n "$afterfirst" ]; then echo -n ,; fi
afterfirst=1
echo -n "${line%|*}"
fi
done < "$FILENAME"
The first if is just to filter out blank lines. The second if and the $afterfirst stuff is just to prevent the extra comma. It echoes a comma before every entry except the first one. ${line%|*} is a bash parameter notation that deletes the end of a parameter if it matches some expression. line is the parameter, % is the symbol that indicates a trailing pattern should be deleted, and |* is the pattern to delete.
