I want to read a file line by line in a Unix shell script. Lines can contain leading and trailing spaces, and I want to keep those spaces in the line.
I tried "while read line", but the read command removes the space characters from the line :(
For example, if the lines in the file are:
abcd efghijk
 abcdefg hijk
the lines should be read as:
1) "abcd efghijk"
2) " abcdefg hijk"
This is what I tried (it did not work):
while read line
do
echo $line
done < file.txt
I want each line to include its space and tab characters.
Please suggest a way.
Try this:
IFS=''
while read -r line
do
echo "$line"
done < file.txt
EDIT:
From man bash
IFS - The Internal Field Separator that is used for word
splitting after expansion and to split lines into words
with the read builtin command. The default value is
``<space><tab><newline>''
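A minimal bash sketch of the read behaviors that the man page excerpt attributes to IFS. With the default value, read trims the ends of the line and splits it into fields. With a non-blank IFS such as ":", it splits on that character instead. With IFS empty, the line is stored verbatim. The sample strings and variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: how IFS changes what the read builtin stores (sample data is hypothetical).

line='   alpha   beta   '

# Default IFS (<space><tab><newline>): the ends are trimmed and the line
# is split into fields, one per variable name.
read -r first second <<< "$line"
printf 'default IFS: first=<%s> second=<%s>\n' "$first" "$second"

# A single variable receives the whole line; only the ends are trimmed.
read -r whole <<< "$line"
printf 'default IFS, one variable: <%s>\n' "$whole"

# Empty IFS: no splitting and no trimming.
IFS= read -r raw <<< "$line"
printf 'empty IFS: <%s>\n' "$raw"

# A non-blank IFS splits on that character.
IFS=: read -r user shell <<< 'alice:/bin/bash'
printf 'IFS=":": user=<%s> shell=<%s>\n' "$user" "$shell"
```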
You want to read raw lines to avoid problems with backslashes in the input (use -r):
while read -r line; do
printf "<%s>\n" "$line"
done < file.txt
This keeps whitespace within the line but removes leading and trailing whitespace. To keep those as well, set IFS to empty, as in:
while IFS= read -r line; do
printf "%s\n" "$line"
done < file.txt
This is now equivalent to cat < file.txt, as long as file.txt ends with a newline. If it does not, the final, unterminated line is skipped.
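The trailing-newline caveat comes from how read reports end-of-file. When read reaches EOF before a newline, it still stores the partial text in line but returns a non-zero status, so the loop stops before processing that text. A common bash idiom also runs the body for that last line. This is a sketch with a hypothetical temporary file.

```shell
#!/usr/bin/env bash
# Sketch: keep an unterminated final line when reading line by line.
# The temporary file and its contents are hypothetical sample data.
tmp=$(mktemp)
printf '  first line\nlast line without newline' > "$tmp"

plain=0
while IFS= read -r line; do
  plain=$((plain + 1))        # the unterminated last line never reaches this body
done < "$tmp"

robust=0
while IFS= read -r line || [[ -n $line ]]; do
  robust=$((robust + 1))      # read failed at EOF, but $line is non-empty, so run once more
  last=$line
done < "$tmp"

rm -f "$tmp"
printf 'plain loop: %d, robust loop: %d, last=<%s>\n' "$plain" "$robust" "$last"
```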
Note that you must double-quote "$line" so that word splitting does not break the line into separate words, which would lose any runs of whitespace.
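Here is a small sketch of the quoting point, using a hypothetical line with leading, trailing, and repeated blanks plus a tab. The unquoted expansion is word-split before echo sees it. The quoted one reaches printf as a single argument.

```shell
#!/usr/bin/env bash
# Sketch: why "$line" must be quoted (the sample line is hypothetical).
line=$'   two   spaces\tand a tab   '

# Unquoted: word splitting yields the words two, spaces, and, a, tab;
# echo joins them with single spaces, so all the original blanks are lost.
unquoted=$(echo $line)

# Quoted: the whole line is one argument, whitespace intact.
quoted=$(printf '%s' "$line")

printf 'unquoted: <%s>\nquoted:   <%s>\n' "$unquoted" "$quoted"
```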
I have a file a.txt with the following content (each line starts with some spaces):
aaa
bbb
When I execute the following script:
while read line
do
echo $line
done < a.txt > b.txt
the generated b.txt contains the following:
aaa
bbb
The leading spaces of the lines have been removed. How can I preserve them?
This is covered in the Bash FAQ entry on reading data line-by-line.
The read command modifies each line read; by default it removes all leading and trailing whitespace characters (spaces and tabs, or any whitespace characters present in IFS). If that is not desired, the IFS variable has to be cleared:
# Exact lines, no trimming
while IFS= read -r line; do
printf '%s\n' "$line"
done < "$file"
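Here is a self-contained sketch of that loop applied to the question's a.txt to b.txt case. The question does not preserve the exact indentation, so the leading spaces in the sample file are an assumption. cmp checks whether the copy is byte-for-byte identical.

```shell
#!/usr/bin/env bash
# Sketch: copy a.txt to b.txt line by line without losing leading spaces.
# The scratch directory and the file contents are hypothetical sample data.
dir=$(mktemp -d)
printf '    aaa\n  bbb\n' > "$dir/a.txt"

while IFS= read -r line; do
  printf '%s\n' "$line"       # quoted, so the leading spaces survive
done < "$dir/a.txt" > "$dir/b.txt"

if cmp -s "$dir/a.txt" "$dir/b.txt"; then identical=yes; else identical=no; fi
first_line=$(head -n 1 "$dir/b.txt")
rm -rf "$dir"

printf 'identical: %s, first line: <%s>\n' "$identical" "$first_line"
```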
As Charles Duffy correctly points out (and I had missed by focusing on the IFS issue), if you want to see the spaces in your output, you also need to quote the variable when you use it. Otherwise the shell will, once again, drop the whitespace.
Some notes on the other differences between that quoted snippet and your original code:
The use of the -r argument to read is covered in a single sentence at the top of the previously linked page.
The -r option to read prevents backslash interpretation (usually used as a backslash newline pair, to continue over multiple lines). Without this option, any backslashes in the input will be discarded. You should almost always use the -r option with read.
As for using printf instead of echo: the behavior of echo is, unfortunately, not consistent across environments, and the differences can be awkward to deal with. printf, on the other hand, behaves consistently and can be used robustly.
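One way to see the echo inconsistency without switching shells is bash's xpg_echo option. It changes whether echo expands backslash escapes, while printf '%s\n' is unaffected. This is a sketch only. The Windows-style path is a hypothetical input, and other shells (for example dash) differ again.

```shell
#!/usr/bin/env bash
# Sketch: the same echo command changes behavior with a bash shell option,
# while printf '%s\n' does not. The sample line is hypothetical.
line='C:\temp\new'

shopt -u xpg_echo
echo_default=$(echo "$line")        # backslashes printed literally
shopt -s xpg_echo
echo_xpg=$(echo "$line")            # \t and \n now become a tab and a newline
printf_out=$(printf '%s\n' "$line") # unaffected by xpg_echo
shopt -u xpg_echo                   # restore the bash default

printf 'default echo:  <%s>\n' "$echo_default"
printf 'xpg_echo echo: <%s>\n' "$echo_xpg"
printf 'printf:        <%s>\n' "$printf_out"
```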
There are several problems here:
Unless IFS is cleared, read strips leading and trailing whitespace.
echo $line word-splits and glob-expands the contents of $line, breaking it into individual words and passing those words as separate arguments to echo. Thus, even with IFS cleared at read time, echo $line would still discard leading and trailing whitespace and turn each run of whitespace between words into a single space. Additionally, a line containing only the character * would be expanded into a list of the file names in the current directory.
echo "$line" is a significant improvement, but it still won't correctly handle values such as -n, which echo treats as one of its own options. printf '%s\n' "$line" fixes this fully.
read without -r treats backslashes as escape and line-continuation characters rather than literal content, so they are not included in the values produced unless doubled up to escape themselves.
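The glob and -n problems from the list above can be reproduced in a few lines. This sketch uses a hypothetical scratch directory containing two files. It assumes bash, whose builtin echo treats -n as an option.

```shell
#!/usr/bin/env bash
# Sketch: an unquoted $line is glob-expanded, and echo "$line" still trips on -n.
# The scratch directory and file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/one.txt" "$dir/two.txt"

line='*'
unquoted=$(cd "$dir" && echo $line)   # * expands to the file names in $dir
quoted=$(printf '%s\n' "$line")       # the literal *

line='-n'
echo_quoted=$(echo "$line")           # taken as an echo option: nothing is printed
printf_quoted=$(printf '%s\n' "$line")

rm -rf "$dir"
printf 'unquoted=<%s> quoted=<%s> echo=<%s> printf=<%s>\n' \
  "$unquoted" "$quoted" "$echo_quoted" "$printf_quoted"
```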
Thus:
while IFS= read -r line; do
printf '%s\n' "$line"
done < a.txt > b.txt
Supposing we have a file list.txt:
line 1
line 2
If I use this:
for line in $(cat list.txt)
do
echo $line"_"
done
Even if I do:
OLD_IFS=$IFS
$IFS='$'
I get:
line1
line2_
instead of:
line1_
line2_
How can I solve the problem?
Setting aside your incorrect assignment $IFS='$' (the variable name in an assignment takes no leading $), $ does not mean newline, line feed, or end-of-line. It is a literal dollar sign.
To assign a line feed, use:
IFS=$'\n'
However, do not attempt to use this to iterate over lines. Instead, use a while read loop, which will not expand globs or collapse empty lines:
while IFS= read -r line
do
echo "${line}_"
done < file
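Here is a sketch comparing the three approaches on a hypothetical file that contains an empty line. With the default IFS, the for loop splits on every blank. With IFS=$'\n' it still silently drops the empty line, because consecutive newlines collapse into one separator. The while read loop runs once per line.

```shell
#!/usr/bin/env bash
# Sketch: for line in $(cat file) vs while IFS= read -r. The file contents are hypothetical.
file=$(mktemp)
printf 'line 1\n\nline 2\n' > "$file"

for_count=0
for line in $(cat "$file"); do      # default IFS: words line, 1, line, 2
  for_count=$((for_count + 1))
done

old_ifs=$IFS
IFS=$'\n'
nl_count=0
for line in $(cat "$file"); do      # newline-only splitting still drops the empty line
  nl_count=$((nl_count + 1))
done
IFS=$old_ifs

while_count=0
while IFS= read -r line; do         # one iteration per line, empty line included
  while_count=$((while_count + 1))
done < "$file"

rm -f "$file"
printf 'for: %d, for with IFS=newline: %d, while read: %d\n' \
  "$for_count" "$nl_count" "$while_count"
```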
Or, with similar benefits, read the lines into an array with mapfile (bash 4.0 or later):
mapfile -t lines < file
for line in "${lines[@]}"
do
echo "${line}_"
done
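This sketch, using a hypothetical file, checks what mapfile -t actually stores. The result is one array element per line, with empty and indented lines kept as they are and the trailing newline removed by -t.

```shell
#!/usr/bin/env bash
# Sketch: mapfile -t keeps each line as one array element (needs bash 4.0+).
# The file contents are hypothetical.
file=$(mktemp)
printf 'line 1\n\n  line 3\n' > "$file"

mapfile -t lines < "$file"    # -t strips the trailing newline from each element
rm -f "$file"

printf 'count: %d\n' "${#lines[@]}"
for line in "${lines[@]}"; do
  printf '<%s_>\n' "$line"
done
```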
I'm learning bash and I saw this construction:
cat file | while IFS= read -r line;
do
...
done
Can anyone explain what IFS= does? I know it's the internal field separator, but why is it being set to nothing?
IFS does many things, but you are asking about that particular loop.
The effect in that loop is to preserve leading and trailing white space in line. To illustrate, first observe with IFS set to nothing:
$ echo " this is a test " | while IFS= read -r line; do echo "=$line=" ; done
= this is a test =
The line variable contains all the white space it received on its stdin. Now, consider the same statement with the default IFS:
$ echo " this is a test " | while read -r line; do echo "=$line=" ; done
=this is a test=
In this version, the white space internal to the line is still preserved, but the leading and trailing white space has been removed.
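Writing IFS= directly before read, as in the question's loop, is a prefix assignment. It changes IFS only for that one read command, so the shell's own IFS, and with it the rest of the loop body, is left alone. Here is a small sketch with a hypothetical input string.

```shell
#!/usr/bin/env bash
# Sketch: IFS= as a prefix assignment only affects the read command it precedes.
IFS=$' \t\n'                    # start from the default value explicitly
before=$IFS

IFS= read -r line <<< '   padded   '
after=$IFS                      # the prefix assignment did not persist

if [[ $before == "$after" ]]; then same=yes; else same=no; fi
printf 'line=<%s> IFS unchanged: %s\n' "$line" "$same"
```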
What does -r do in read -r?
The -r option prevents read from treating backslash as a special character.
To illustrate, we use two echo commands that supply two lines to the while loop. Observe what happens with -r:
$ { echo 'this \\ line is \' ; echo 'continued'; } | while IFS= read -r line; do echo "=$line=" ; done
=this \\ line is \=
=continued=
Now, observe what happens without -r:
$ { echo 'this \\ line is \' ; echo 'continued'; } | while IFS= read line; do echo "=$line=" ; done
=this \ line is continued=
Without -r, two changes happened. First, the double-backslash was converted to a single backslash. Second, the backslash on the end of the first line was interpreted as a line-continuation character and the two lines were merged into one.
In sum, if you want backslashes in the input to have special meaning, don't use -r. If you want backslashes in the input to be taken as plain characters, then use -r.
Multiple lines of input
Since read takes input one line at a time, IFS affects each line of multiple-line input in the same way that it affects single-line input. -r behaves similarly, except that without -r, multiple lines can be combined into one using a trailing backslash, as shown above.
The behavior with multiple line input, however, can be changed drastically using read's -d flag. -d changes the delimiter character that read uses to mark the end of an input line. For example, we can terminate lines with a tab character:
$ echo $'line one \n line\t two \n line three\t ends here'
line one
line two
line three ends here
$ echo $'line one \n line\t two \n line three\t ends here' | while IFS= read -r -d$'\t' line; do echo "=$line=" ; done
=line one
line=
= two
line three=
Here, the $'...' construct was used to enter special characters such as newline (\n) and tab (\t). Observe that with -d$'\t', read divides its input into "lines" at tab characters. Anything after the final tab is ignored: read stores it but returns a non-zero status, so the loop body does not run for it.
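The same -d mechanism works with any single delimiter character. This sketch uses ";" purely for illustration. It feeds the loop through process substitution rather than a pipe. In bash, a pipeline runs the loop in a subshell by default, so an array filled inside it would be lost afterwards. As in the tab example, the text after the last delimiter is not processed.

```shell
#!/usr/bin/env bash
# Sketch: read -d with a custom delimiter (';' chosen for illustration).
items=()
while IFS= read -r -d ';' item; do
  items+=("$item")              # IFS= keeps the spaces around " beta "
done < <(printf 'alpha; beta ;gamma;leftover')

printf 'read %d items:\n' "${#items[@]}"
printf '<%s>\n' "${items[@]}"   # "leftover" has no trailing ';' and is skipped
```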
How to handle the most difficult file names
The most important use of the features described above is to process difficult file names. Since the one character that cannot appear in path/filenames is the null character, the null character can be used to separate a list of file names. As an example:
while IFS= read -r -d $'\0' file
do
# do something to each file, for example print its name
printf '%s\n' "$file"
done < <(find ~/music -type f -print0)
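Here is a runnable variant of that loop that builds its own awkward file names in a hypothetical scratch directory instead of using ~/music. The names include leading spaces and an embedded newline. With NUL-delimited input, each name arrives as exactly one value. In bash, -d '' has the same effect as -d $'\0'.

```shell
#!/usr/bin/env bash
# Sketch: NUL-delimited file names with find -print0 and read -d ''.
# The scratch directory and the file names are hypothetical test data.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/  leading spaces.txt" "$dir/"$'new\nline.txt'

found=()
while IFS= read -r -d '' file; do   # -d '' is equivalent to -d $'\0' in bash
  found+=("$file")
done < <(find "$dir" -type f -print0)

rm -rf "$dir"
printf 'found %d files\n' "${#found[@]}"
```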