I'm doing the following, which basically works.
The script tries to insert some lines into a file to rewrite it.
But it is stripping all blank lines and also all line padding.
The main problem is that it does not process the last line of the file.
I'm not sure why.
while read line; do
<... process some things ...>
echo ${line}>> "${ACTION_PATH_IN}.work"
done < "${ACTION_PATH_IN}"
What can be done to fix this?
while IFS= read -r line; do
## some work
printf '%s\n' "$line" >> output
done < <(printf '%s\n' "$(cat input)")
An empty IFS tells read not to strip leading and trailing whitespace.
read -r prevents backslash at EOL from creating a line continuation.
Double-quote your parameter substitution ("$line") to prevent the shell from doing word splitting and globbing on its value.
Use printf '%s\n' instead of echo because it is reliable when processing values like -e, -n, etc.
< <(printf '%s\n' "$(cat input)") is an ugly way of LF-terminating the contents of input (it also drops any trailing empty lines, because command substitution strips all trailing newlines). Other constructions are possible, depending on your requirements (for example, a pipe instead of redirecting from a process substitution, if it is okay for your whole while loop to run in a subshell).
It might be better if you just ensured that it was LF-terminated before processing it.
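A minimal sketch of such a pre-check, assuming bash and a tail that supports -c. The helper name ensure_lf is hypothetical. tail -c 1 prints the file's last byte, and since command substitution strips trailing newlines, the result is empty exactly when that byte is already a newline.

```shell
#!/usr/bin/env bash
# ensure_lf FILE: append "\n" to FILE only when its last byte is not a newline.
# Hypothetical helper name; a sketch, not a standard command.
ensure_lf() {
    local file=$1
    # -s skips empty files (there is no last line to terminate).
    # $(tail -c 1 ...) is empty when the last byte is "\n", because
    # command substitution removes trailing newlines.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}

# Usage before the loop (path variable taken from the question):
# ensure_lf "${ACTION_PATH_IN}"
```

Calling it twice is harmless: the second call sees the trailing newline and does nothing.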
Better yet, use a tool such as awk instead of the shell's while loop. First, awk is meant for parsing and manipulating files, so for a huge file awk has the advantage. Second, you won't have to care whether the last line has a newline or not (for your case).
The equivalent of your while read loop would be:
awk '{
# process the current line ($0) here
print > "newfile.txt"
}' file
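As a sketch of the awk route, assuming the goal is to insert one extra line after every line that matches a pattern (the pattern ^marker, the text "inserted line", and the helper name insert_after are hypothetical placeholders): a bare print outputs $0 exactly as read, so blank lines and indentation survive, and because print always appends a newline, an unterminated last line is still processed and comes out terminated.

```shell
#!/usr/bin/env bash
# insert_after IN OUT: copy IN to OUT, adding a line after each match.
# "^marker" and "inserted line" are hypothetical placeholders.
insert_after() {
    awk '{
        print                                  # $0 unchanged: padding and blank lines kept
        if (/^marker/) print "inserted line"   # the extra line to insert
    }' "$1" > "$2"
}

# Usage with the question's file names; the .work file only replaces
# the original if awk succeeded.
# insert_after "${ACTION_PATH_IN}" "${ACTION_PATH_IN}.work" &&
#     mv "${ACTION_PATH_IN}.work" "${ACTION_PATH_IN}"
```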
One possible reason for not reading the last line is that the file does not end with a newline. On the whole, I'd expect it to work even so, but that could be why.
On Mac OS X (10.7.1), I got this output, which is the behaviour you are seeing:
$ /bin/echo -n Hi
Hi$ /bin/echo -n Hi > x
$ while read line; do echo $line; done < x
$
The obvious fix is to ensure that the file ends with a newline.
First thing, use
echo "$line" >> ...
Note the quotes. If you leave them out, the shell word-splits the value of $line and the padding is lost.
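A quick way to see the difference, assuming bash (the sample value is made up):

```shell
#!/usr/bin/env bash
# A made-up line with leading, inner and trailing padding.
line='   indented   text   '

# Unquoted: the value is word-split into "indented" and "text",
# and echo joins its arguments with single spaces.
unquoted=$(echo $line)
# Quoted: the value reaches echo as one argument, padding intact.
quoted=$(echo "$line")

printf '[%s]\n' "$unquoted"   # [indented text]
printf '[%s]\n' "$quoted"     # [   indented   text   ]
```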
As for the last line, it is strange. It may have to do with whether the last line of the file is terminated by a \n or not (it is a good practice to do so, and almost any editor will do that for you).
Related
I have an ... odd issue with a bash shell script that I was hoping to get some insight on.
My team is working on a script that iterates through lines in a file and checks for content in each one. We had a bug where, when run via the automated process that sequences different scripts together, the last line wasn't being seen.
The code used to iterate over the lines in the file (whose name is stored in DATAFILE) was
cat "$DATAFILE" | while read line
We could run the script from the command line and it would see every line in the file, including the last one, just fine. However, when run by the automated process (which runs the script that generates the DATAFILE just prior to the script in question), the last line is never seen.
We updated the code to use the following to iterate over the lines, and the problem cleared up:
for line in `cat "$DATAFILE"`
Note: DATAFILE has no newline ever written at the end of the file.
My question has two parts: why would the last line not be seen by the original code, and why would this change make a difference?
The only explanation I could come up with for why the last line would not be seen was:
The previous process, which writes the file, was relying on process exit to close the file descriptor.
The problem script was starting up and opening the file quickly enough that, while the previous process had "ended", it hadn't "shut down/cleaned up" enough for the system to close the file descriptor automatically for it.
That being said, it seems like, if you have 2 commands in a shell script, the first one should be completely shut down by the time the script runs the second one.
Any insight into the questions, especially the first one, would be very much appreciated.
The C standard says that whether the last line of a text file requires a terminating newline is implementation-defined, so data after the last newline may not be read properly.
ISO/IEC 9899:2011 §7.21.2 Streams
A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.
I would not have expected a missing newline at the end of file to cause trouble in bash (or any Unix shell), but that does seem to be the problem reproducibly ($ is the prompt in this output):
$ echo xxx\\c
xxx$ { echo abc; echo def; echo ghi; echo xxx\\c; } > y
$ cat y
abc
def
ghi
xxx$
$ while read line; do echo $line; done < y
abc
def
ghi
$ bash -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ ksh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ zsh -c 'while read line; do echo $line; done < y'
abc
def
ghi
$ for line in $(<y); do echo $line; done # Preferred notation in bash
abc
def
ghi
xxx
$ for line in $(cat y); do echo $line; done # UUOC Award pending
abc
def
ghi
xxx
$
It is also not limited to bash — Korn shell (ksh) and zsh behave like that too. I live, I learn; thanks for raising the issue.
As demonstrated in the code above, the cat command reads the whole file, including the unterminated last line. The for line in `cat $DATAFILE` technique collects all the output and splits it into words at every run of white space (spaces, tabs and newlines), so each word, not each line, becomes one iteration (I conclude that no line in your file contains blanks).
Tested on Mac OS X 10.7.5.
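A small sketch makes the word splitting visible (bash with the default IFS; the helper name count_for_iterations is hypothetical): two lines of two words each give four iterations, and the unterminated last line is still included because word splitting does not need a trailing newline.

```shell
#!/usr/bin/env bash
# count_for_iterations FILE: how many times "for w in $(cat FILE)" loops.
count_for_iterations() {
    local n=0 w
    # Unquoted on purpose: this is the pattern being examined.
    for w in $(cat "$1"); do
        n=$((n + 1))
    done
    echo "$n"
}

sample=$(mktemp)
# Two lines of two words each; the last line has no trailing newline.
printf 'one line\nlast line' > "$sample"
count_for_iterations "$sample"   # prints 4: one iteration per word, not per line
rm -f "$sample"
```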
What does POSIX say?
The POSIX read command specification says:
The read utility shall read a single line from standard input.
By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.
If standard input is a terminal device and the invoking shell is interactive, read shall prompt for a continuation line when it reads an input line ending with a <backslash> <newline>, unless the -r option is specified.
The terminating <newline> (if any) shall be removed from the input and the results shall be split into fields as in the shell for the results of parameter expansion (see Field Splitting); [...]
Note the '(if any)'! It seems to me that if there is no newline, it should still read the result. On the other hand, it also says:
STDIN
The standard input shall be a text file.
and then you get back to the debate about whether a file that does not end with a newline is a text file or not.
However, the rationale on the same page documents:
Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.
That rationale must mean that the text file is supposed to end with a newline.
The POSIX definition of a text file is:
3.395 Text File
A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2008 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
This does not stipulate 'ends with a <newline>' directly, but does defer to the C standard and it does say "A file that contains characters organized into zero or more lines" and when we look at the POSIX definition of a "Line" it says:
3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.
So, per the POSIX definitions, a non-empty text file must end with a newline: it is made up of lines, and each line ends with a terminating newline.
A solution to the 'no terminal newline' problem
Note Gordon Davisson's answer. A simple test shows that his observation is accurate:
$ while read line; do echo $line; done < y; echo $line
abc
def
ghi
xxx
$
Therefore, his technique of:
while read line || [ -n "$line" ]; do echo $line; done < y
or:
cat y | while read line || [ -n "$line" ]; do echo $line; done
will work for files without a newline at the end (at least on my machine).
I'm still surprised to find that the shells drop the last segment (it can't be called a line because it doesn't end with a newline) of the input, but there might be sufficient justification in POSIX to do so. And clearly it is best to ensure that your text files really are text files ending with a newline.
According to the POSIX spec for the read command, it should return a nonzero status if "End-of-file was detected or an error occurred." Since EOF is detected as it reads the last "line", it sets $line and then returns an error status, and the error status prevents the loop from executing on that last "line". The solution is easy: make the loop execute if the read command succeeds OR if anything was read into $line.
while read line || [ -n "$line" ]; do
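The mechanism can be checked directly with a small bash sketch (the function name read_last_fragment is hypothetical): read is given a fragment with no trailing newline, reports failure, and still fills the variable, which is exactly the case the [ -n "$line" ] test catches.

```shell
#!/usr/bin/env bash
# read_last_fragment: feed read a fragment with no trailing newline
# and report both its status and what it stored.
read_last_fragment() {
    printf 'no newline here' | {
        if IFS= read -r line; then
            printf 'read succeeded, line=[%s]\n' "$line"
        else
            printf 'read failed, line=[%s]\n' "$line"
        fi
    }
}

read_last_fragment   # read failed, line=[no newline here]
```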
Adding some additional info:
There's no need to use cat with while loop. while ...;do something;done<file is enough.
Don't read lines with for.
When using while loop to read lines:
Set the IFS properly (you may lose indentation otherwise).
You should almost always use the -r option with read.
Meeting the above requirements, a proper while loop looks like this:
while IFS= read -r line; do
...
done <file
And to make it work with files without a newline at end (reposting my solution from here):
while IFS= read -r line || [ -n "$line" ]; do
echo "$line"
done <file
Or using grep with while loop:
while IFS= read -r line; do
echo "$line"
done < <(grep "" file)
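To check what the while IFS= read -r line || [ -n "$line" ] loop preserves, here is a sketch that wraps it in a hypothetical function copy_lines and feeds it a file with indentation, an empty line, a backslash and no final newline. The output matches the input, except that the last line gains a newline.

```shell
#!/usr/bin/env bash
# copy_lines FILE: print each line of FILE unchanged, including an
# unterminated last line (which gains a newline on output).
copy_lines() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done < "$1"
}

sample=$(mktemp)
# Indentation, an empty line, a backslash, and no final newline.
printf '  first\n\nback\\slash\nlast' > "$sample"
copy_lines "$sample"
rm -f "$sample"
```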
As a workaround, before reading from the text file a newline can be appended to the file.
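echo -e "\n" >> "$file_path"
This ensures that all the lines that were previously in the file will be read. The -e argument makes echo interpret escape sequences. Note that echo also adds its own newline, so this appends two newlines; if the file already ended with one, you get extra empty lines at the end (printf '\n' >> "$file_path" appends exactly one).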
https://superuser.com/questions/313938/shell-script-echo-new-line-to-file
I tested this on the command line:
# create dummy file. last line doesn't end with newline
printf 'line1\nline2\nNo-newline-here' > testing
Test with your first form (piping to while-loop)
cat testing | while read line; do echo $line; done
This misses the last line. read does store the unterminated fragment in line, but it returns a non-zero status because it reached end-of-file before a newline, so the loop body never runs for it.
Test with your second form (command substitution)
for line in `cat testing` ; do echo $line; done
This gets the last line as well.
read reports failure when the input ends without a newline, which is why the first form misses the last line.
On the other hand, in the second form
`cat testing`
expands to the form of
line1\nline2\n...lineM
which is separated by the shell into multiple fields using IFS, so you get
line1 line2 line3 ... lineM
That's why you still get the last line.
p/s: What I don't understand is how you get the first form working...
Use sed to match the last line of the file, append a newline if one is missing, and edit the file in place:
sed -i '' -e '$a\' file
The code is from this stackexchange link
Note: I have added empty single quotes to -i '' because, at least on OS X, -i was using -e as the file extension for the backup file. (GNU sed takes the backup suffix attached to -i, so there the command is sed -i -e '$a\' file.) I would have gladly commented on the original post but lacked 50 points. Perhaps this will gain me a few in this thread, thanks.
I had a similar issue.
I was doing a cat of a file, piping it to a sort and then piping the result to a 'while read var1 var2 var3'.
ie:
cat $FILE|sort -k3|while read Count IP Name
do
The work under the "do" was an if statement that identified changing data in the $Name field and based on change or no change did sums of $Count or printed the summed line to the report.
I also ran into the issue where I couldn't get the last line to print to the report.
I went with the simple expedient of redirecting the cat/sort to a new file, echoing a newline to that new file and THEN ran my "while read Count IP Name" on the new file with successful results.
ie:
cat $FILE|sort -k3 > NEWFILE
echo >> NEWFILE
cat NEWFILE |while read Count IP Name
do
Sometimes the simple, inelegant is the best way to go.
Consider this example:
cat > test.txt <<EOF
hello world
hello bob
super world
alice worldview
EOF
# using cat to simulate another command output piping;
# get only lines that end with 'world'
fword="world"
for line in "$(cat test.txt | grep " ${fword}\$")"; do
echo "for line: $line"
done
echo "-------"
while read line; do
echo "while line: $line"
done <<< "$(cat test.txt | grep " ${fword}\$")"
The output of this script is:
for line: hello world
super world
-------
while line: hello world
while line: super world
So, basically, the command substitution in the for ... in loop ended up being compacted into a single string (with newlines inside), which for ... in still sees as a single "entry", and so it loops only once, dumping the entire output.
The while loop, on the other hand, uses the "classic" here-string, and even with the same quoting of the command substitution (that is, "$(cat test.txt | grep " ${fword}\$")"), the here-string ends up serving lines one by one to the while, so it loops as expected (twice in this example).
Could anyone explain why this difference happens - and if it is possible to "massage" the formatting of the for .. in loop, so it also loops correctly like the while loop?
(It is much easier for me to parse what is going on in the for .. in syntax, so I'd love to be able to use it to run through loops like these, built out of the results of pipelines and command substitution. That is why I'm asking this question.)
why this difference happens
It is read, not while, that reads input line by line: each call to read consumes input up to the next newline character, and while keeps looping as long as read succeeds.
for iterates over words, and any double-quoted expansion (except "$@" and "${array[@]}") always produces exactly one word. Here there is one word, so there is one iteration.
if it is possible to "massage" the formatting of the for .. in loop, so it also loops .. like the while loop?
Unquoted expansion undergoes word splitting expansion, where the result is separated using characters in IFS into words. So you can set IFS to a newline, and the text will be split on newlines into words.
IFS=$'\n'
for i in $(grep " ${fword}\$" test.txt); do
echo "for line: $i"
done
This is still not correct, though:
Unquoted expansion undergoes word splitting and filename expansion. Any text with * ? [ will be replaced by words of matching filenames (or not, if no matches).
read without -r removes \ from the input, and with the default IFS it removes leading and trailing whitespace (spaces and tabs) from each line.
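If you do use the IFS=$'\n' form, a somewhat safer bash sketch (the function name lines_via_for is hypothetical) keeps the IFS change local to a function and turns off filename expansion with set -f while splitting. Empty lines are still silently dropped, because a newline in IFS counts as whitespace and runs of it collapse into one separator.

```shell
#!/usr/bin/env bash
# lines_via_for TEXT: loop over the lines of TEXT with for ... in.
# Hypothetical helper; a sketch, not a drop-in replacement for while read.
lines_via_for() {
    local IFS=$'\n'     # split only on newlines, and only inside this function
    local line n=0
    set -f              # no filename expansion, so a line "*" stays "*"
    for line in $1; do
        n=$((n + 1))
        printf 'for line %d: %s\n' "$n" "$line"
    done
    set +f              # note: re-enables globbing even if the caller had it off
}

# Usage with the question's variables:
# lines_via_for "$(grep " ${fword}\$" test.txt)"
```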
It is much easier for me to parse
But it is just not correct. for i in $(...) is a common anti-pattern; you should not use it. Executing a command, storing its whole output and then splitting it is expensive. While that is fine for small files, it may bite when parsing logs. Usually you want to process the command's output while it is still running, so think in pipelines: <<<"$(stuff)" is also an anti-pattern, and it's better to do stuff |.
Get used to the while syntax and to pipes:
grep " ${fword}\$" test.txt |
while IFS= read -r line; do
echo "while line: $line"
done
Or in Bash with process substitution:
while IFS= read -r line; do
echo "while line: $line"
done < <(grep " ${fword}\$" test.txt)
Or one step at a time, but memory consuming:
tmp=$(grep " ${fword}\$" test.txt)
while IFS= read -r line; do
echo "while line: $line"
done <<<"$tmp"
See https://mywiki.wooledge.org/BashFAQ/001 (and, if going with pipes, https://mywiki.wooledge.org/BashFAQ/024). Check your script with shellcheck; it will catch most mistakes.
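If you want to keep the for ... in shape, one option in bash 4 or later is to read the lines into an array with mapfile and loop over "${array[@]}", which expands to one word per element. This is a sketch (print_matches is a hypothetical name); like the $tmp variant, it still holds the whole output in memory.

```shell
#!/usr/bin/env bash
# print_matches FILE WORD: for ... in over an array filled by mapfile (bash 4+).
print_matches() {
    local -a matches
    local line
    # One array element per line; -t strips each element's trailing newline.
    mapfile -t matches < <(grep " ${2}\$" "$1")
    for line in "${matches[@]}"; do
        echo "for line: $line"
    done
}

# Usage with the question's test.txt:
# print_matches test.txt world
```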
I have a file a.txt with the following content:
aaa
bbb
When I execute the following script:
while read line
do
echo $line
done < a.txt > b.txt
the generated b.txt contains the following:
aaa
bbb
The leading spaces of the lines have been removed. How can I preserve leading spaces?
This is covered in the Bash FAQ entry on reading data line-by-line.
The read command modifies each line read; by default it removes all leading and trailing whitespace characters (spaces and tabs, or any whitespace characters present in IFS). If that is not desired, the IFS variable has to be cleared:
# Exact lines, no trimming
while IFS= read -r line; do
printf '%s\n' "$line"
done < "$file"
As Charles Duffy correctly points out (and I had missed by focusing on the IFS issue), if you want to see the spaces in your output, you also need to quote the variable when you use it, or the shell will once again drop the whitespace.
Some notes on the other differences between that snippet and your original code:
The use of the -r argument to read is covered in a single sentence at the top of the previously linked page.
The -r option to read prevents backslash interpretation (usually used as a backslash newline pair, to continue over multiple lines). Without this option, any backslashes in the input will be discarded. You should almost always use the -r option with read.
As for using printf instead of echo there: the behavior of echo is, somewhat unfortunately, not consistent across environments, and the differences can be awkward to deal with. printf, on the other hand, is consistent and can be used entirely robustly.
There are several problems here:
Unless IFS is cleared, read strips leading and trailing whitespace.
echo $line string-splits and glob-expands the contents of $line, breaking it up into individual words, and passing those words as individual arguments to the echo command. Thus, even with IFS cleared at read time, echo $line would still discard leading and trailing whitespace, and change runs of whitespace between words into a single space character each. Additionally, a line containing only the character * would be expanded to contain a list of filenames.
echo "$line" is a significant improvement, but still won't correctly handle values such as -n, which it treats as an echo argument itself. printf '%s\n' "$line" would fix this fully.
read without -r treats backslashes as continuation characters rather than literal content, such that they won't be included in the values produced unless doubled-up to escape themselves.
Thus:
while IFS= read -r line; do
printf '%s\n' "$line"
done < a.txt > b.txt
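The -n problem is easy to reproduce in bash with its builtin echo and default shell options (the variable names are illustrative):

```shell
#!/usr/bin/env bash
# A line whose entire content is "-n" (illustrative value).
line='-n'

via_echo=$(echo "$line")             # bash's builtin echo treats -n as an option
via_printf=$(printf '%s\n' "$line")  # printf prints the data verbatim

printf 'echo:   [%s]\n' "$via_echo"     # echo:   []
printf 'printf: [%s]\n' "$via_printf"   # printf: [-n]
```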
I have code that requires a response within a for loop.
Prior to the loop I set IFS="\n"
Within the loop echo -n is ignored (except for the last line).
Note: this is just an example of the behavior of echo -n
Example:
IFS='\n'
for line in `cat file`
do
echo -n $line
done
This outputs:
this is a test
this is a test
this is a test$
with the user prompt occurring only at the end of the last line.
Why is this occurring and is there a fix?
Neither IFS="\n" nor IFS='\n' set $IFS to a newline; instead they set it to literal \ followed by literal n.
You'd have to use an ANSI C-quoted string in order to assign an actual newline: IFS=$'\n'; alternatively, you could use a normal string literal that contains an actual newline (spans 2 lines).
Assigning literal \n had the effect that the output from cat file was not split into lines, because an actual newline was not present in $IFS; potentially - though not with your sample file content - the output could have been split into fields by embedded \ and n characters.
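With neither an actual newline nor those characters present, the entire file contents were passed as a single word, resulting in a single iteration of your for loop. echo -n did suppress its own trailing newline; the line breaks you see are the ones embedded in that single word, which is why only the last line appears to lack one.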
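A small bash sketch of the difference (the sample text and the helper name count_fields are made up): IFS='\n' holds two characters, a backslash and the letter n, so text containing neither is not split at all, while IFS=$'\n' holds one real newline and splits the text into lines.

```shell
#!/usr/bin/env bash
# Two lines of sample text; note it contains no backslash and no letter "n".
text=$'this is a test\nthis is a test'

# count_fields TEXT: number of words TEXT splits into under the current IFS.
count_fields() {
    set -f          # keep globbing out of the picture
    set -- $1       # unquoted on purpose: word splitting with the caller's IFS
    set +f
    echo "$#"
}

IFS='\n'    # backslash + n
echo "IFS='\\n' has ${#IFS} characters, fields: $(count_fields "$text")"   # 2 characters, 1 field
IFS=$'\n'   # one real newline
echo "IFS=\$'\\n' has ${#IFS} characters, fields: $(count_fields "$text")" # 1 character, 2 fields
unset IFS   # back to default splitting
```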
That said, your approach to looping over lines from a file is ill-advised; try something like the following instead:
while IFS= read -r line; do
echo -n "$line"
done < file
Never use for loops when parsing files in bash. Use while loops instead. Here is a really good tutorial on that.
http://mywiki.wooledge.org/BashFAQ/001