Writing a Unix filter using bash

A Unix/Linux command that accepts its input data from standard input and produces its output (result) on standard output is known as a filter.
The trivial filter is cat. It just copies stdin to stdout without any modification whatsoever.
How do I implement cat in bash? (neglecting the case that the command gets command line arguments)
I came up with
#! /bin/bash
while IFS="" read -r line
do
echo -E "$line"
done
That seems to work in most cases, also for text files containing some binary bytes as long as they are not null bytes. However, if the last line does not end in a newline character, it will be missing from the output.
How can that be fixed?
I'm nearly sure this must have been answered before, but my searching skills don't seem to be good enough.
Obviously I don't want to re-implement cat in bash: It wouldn't work anyway because of the null byte problem. But I want to extend the basic loop to do some custom processing to certain lines of a text file. However, we have all seen text files without the final line feed, so I'd prefer if that case could be handled.

Assuming you don't need to work with arbitrary binary files (since the shell cannot store null bytes in a variable), you can handle a file that isn't terminated by a newline by checking if line is not empty once the loop exits.
while IFS= read -r line; do
printf '%s\n' "$line"
done
if [ -n "$line" ]; then
printf '%s' "$line"
fi
In the loop, we output the newline that read stripped off. In the final if, we don't output a newline because $line would be empty if the last line read by the while loop had ended with one.
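The same idea can be folded into a single loop that also does some per-line processing. The following is only a sketch: tag_comments and the "COMMENT: " prefix are made-up names for illustration, and the eol variable is an assumption of this example, not part of the answer above. The loop body runs once more for an unterminated last line, and the newline is written only when read actually saw one.

```bash
#!/bin/bash
# tag_comments: hypothetical filter that prefixes lines starting with '#'
# and passes every other line through unchanged.
tag_comments() {
    local line eol
    # read fails at EOF but still stores any unterminated trailing text in
    # $line; in that case eol is cleared so no newline is invented.
    while eol=$'\n'; IFS= read -r line || { eol=''; [ -n "$line" ]; }; do
        case $line in
            '#'*) line="COMMENT: $line" ;;   # the custom processing step
        esac
        printf '%s%s' "$line" "$eol"
    done
}
```

It would be used like any other filter, for example tag_comments < input.txt > output.txt. As with the loops above, null bytes are still not preserved.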

Related

During a while loop file read, where is the first line of stdin lost?

Assume we have a file with the numbers 1 to 5 written down line by line.
When I open a file for reading as standard input and use 'while read,' commands which can read stdin are unable to read the first line of that file.
$ while read x; do sed ''; done<file
2
3
4
5
It makes no difference which command you use: sed, awk, cat, etc. The problem occurs whenever the command is able to read from stdin. There is also no difference between the shells I use: I tried the same thing in sh, bash, and zsh, and the results are identical.
It's worth noting that the loop iterates five times, once for each line. For example:
$ while read x; do printf 'something\n'; done<file
something
something
something
something
something
I understand that if I want to read all lines correctly, I must specify a variable in the read command and then pass it to the command. But I'm trying to figure out what's going on here. Why does this problem occur when I do not specify input for a command directly?
Perhaps it is a side effect with no functional purpose.
I couldn't find any information about this behavior of the 'while read' statement, and neither did I find anyone who had a similar problem.

Your code only iterates once.
while read x; do sed ''; done<file
...behaves as follows:
file is opened and attached to stdin
read consumes the first line of the file from stdin and puts it into $x
sed '' consumes the entire rest of the file from stdin and prints it to stdout without changes.
read sees there's no more data (because sed consumed it all), and the loop ends.
If you want sed to operate on only the one line that read x consumed, and to safeguard against other bugs, you might instead write:
while IFS= read -r x; do printf '%s\n' "$x" | sed ''; done <file
The changes:
Using IFS= prevents leading or trailing whitespace from being deleted by read.
Using the -r argument prevents backslashes from being consumed by read.
Piping from printf '%s\n' "$x" into sed changes sed's stdin, such that instead of containing the rest of the file, it only contains the one line. Thus, this ensures that sed is processing the line that was consumed by read, instead of ignoring that line and processing the entire rest of the file. (Using printf instead of echo is a correctness concern; see Why is printf better than echo? on UNIX & Linux Stack Exchange).
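The behavior described above can be observed by counting iterations. This is a small sketch; count_iterations is a hypothetical helper that runs whatever command it is given inside the loop body, with the loop's stdin inherited.

```bash
#!/bin/bash
# count_iterations: hypothetical helper. Runs "$@" once per line read and
# reports how many times the loop body actually executed.
count_iterations() {
    local n=0 x
    while IFS= read -r x; do
        n=$((n + 1))
        "$@"            # inherits the loop's stdin, just like sed '' above
    done
    echo "iterations: $n"
}

printf '1\n2\n3\n4\n5\n' | count_iterations cat    # cat eats lines 2 to 5
printf '1\n2\n3\n4\n5\n' | count_iterations true   # true reads nothing
```

With cat in the body, the lines 2 through 5 are printed and the report is "iterations: 1"; with true, nothing is consumed by the body and the report is "iterations: 5".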

The first line of stdin is not lost. read consumes it and stores it in x; sed then inherits the same stdin and reads everything that remains, so the only line sed never sees is the one read already took. To keep the loop's input away from the commands in the body, open the file on a separate file descriptor and have only read use it:
$ while read -r x <&3; do printf '%s\n' "$x" | sed ''; done 3<file

Split and display file line in bash

I have a simple bash script and I don't understand the return value.
My script
#!/bin/bash
string=$(head -n 1 test.txt)
IFS=":"
read -r pathfile line <<< "$string"
echo "left"$line"right"
And my test.txt
filepath:file content
others lines
...
This is the output I get on the console:
rightfile content
The problem does not occur when the file has only one line.
I don't understand why the result is not left, then the value, then right.

Your input file has Windows (CRLF) line endings (\x0D\x0A). Therefore, \x0D becomes part of $line, and when it is printed, it moves the cursor back to the beginning of the line, so right overwrites what was already printed there.
Run dos2unix or fromdos on the input file to fix it.
BTW, you don't need to quote left and right. Quoting the variable might be needed, though.
echo left"$line"right
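If converting the file is not an option, the carriage return can also be stripped inside the script. The sketch below simulates the first line of test.txt with a literal string (an assumption made so the example is self-contained) and removes one trailing \r with a parameter expansion. Setting IFS only for the read command also avoids changing it for the rest of the script.

```bash
#!/bin/bash
# Simulated first line of a CRLF file; the newline is already gone,
# the carriage return is still attached.
string=$'filepath:file content\r'

# IFS=: applies to this read only, not to the rest of the script.
IFS=: read -r pathfile line <<< "$string"

# Remove a single trailing carriage return, if there is one.
line=${line%$'\r'}

result="left${line}right"
echo "$result"    # prints: leftfile contentright
```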

Read a file line-by-line on bash; each line containing the path to another unique file

Each line in a given file 'a.txt' contains the directory/path to another unique file. Suppose we want to parse 'a.txt' line-by-line, extract the path in string format, and then use a tool such as vim to process the file at this path, and so on.
After going through this thread - Read a file line by line assigning the value to a variable, I wrote the following script, say 'open-file.sh' on bash (I'm new to it)
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
vim -c ":q" -cq $line # Just open the file and close it using :q
done < "$1"
We would then run the above script as -
./open-file.sh a.txt
The problem is that although the path to a new file is correctly specified by $line, when vim opens the file, vim continues to receive the text contained in 'a.txt' as a command. How can I write a script where I can correctly obtain the path from 'a.txt', open it using vim, and then continue parsing the remaining lines in 'a.txt' ?

Replace:
vim -c ":q" -cq $line
With:
vim -c ":q" -cq "$line" </dev/tty
The redirection </dev/tty tells vim to take its standard input from the terminal. Without that, the standard input for vim is "$1".
Also, it is good practice to put $line in double-quotes to protect it from word splitting, etc.
Lastly, while vim is excellent for interactive work, if your end-goal is fully automated processing of each file, you might want to consider tools such as sed or awk.
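Another way to keep the list away from the program in the loop body is to read it on a separate file descriptor with read -u, so that stdin is left untouched. This is a sketch under assumptions: for_each_path is a hypothetical helper, and descriptor 3 is assumed to be otherwise unused.

```bash
#!/bin/bash
# for_each_path: hypothetical helper. Usage: for_each_path LIST COMMAND [ARGS...]
# Runs COMMAND once per line of LIST, appending the line as the last argument.
for_each_path() {
    local list=$1 path
    shift
    # The list is opened on fd 3 and only read looks at it, so COMMAND
    # keeps whatever stdin the script itself was started with.
    while IFS= read -r -u 3 path || [[ -n $path ]]; do
        "$@" "$path"
    done 3< "$list"
}
```

With this helper, the loop from the question could be written as for_each_path a.txt vim -c ":q" -cq, and vim would see the terminal as its stdin as long as the script itself was started from one.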

Although I'm not sure of your ultimate goal, this shell command will execute vim once per line in a.txt:
xargs -o -n1 vim -c ':q' < a.txt
As explained in the comments to Read a file line by line assigning the value to a variable, the issue you're encountering is that vim is an interactive program that reads its standard input, so inside the loop it consumes the remaining lines of a.txt.

The problem was already mentioned in a comment under the answer you based your script on.
vim is consuming stdin which is given to the loop by done < $1. We can observe the same behavior in the following example:
$ while read i; do cat; done < <(seq 3)
2
3
<(seq 3) simulates a file with the three lines 1, 2, and 3. Instead of three silent iterations we get only one iteration and the output 2 and 3.
stdin is not only passed to read in the head of the loop, but also to cat in the body of the loop. Therefore read reads one line, the loop is entered, cat reads all remaining lines, stdin is empty, read has nothing to read anymore, the loop exits.
You could circumvent the problem by redirecting something to vim, however there is an even better way. You don't need the loop at all:
< "$1" xargs -d\\n -n1 vim -c :q -cq
xargs will execute vim once for every line in the file given by $1.

How do I iterate over each line in a file with Bash?

Given a text file with multiple lines, I would like to iterate over each line in a Bash script. I had attempted to use cut, but cut does not accept \n (newline) as a delimiter.
This is an example of the file I am working with:
one
two
three
four
Does anyone know how I can loop through each line of this text file in Bash?

I found myself in the same problem, this works for me:
cat file.cut | cut -d$'\n' -f1
Or:
cut -d$'\n' -f1 file.cut

Use cat for concatenating or displaying. No need for it here.
file="/path/to/file"
while read line; do
echo "${line}"
done < "${file}"

Simply use:
echo -n `cut ...`
This suppresses the \n at the end

cat FILE|while read line; do # 'line' is the variable name
echo "$line" # do something here
done
or (see comment):
while read line; do # 'line' is the variable name
echo "$line" # do something here
done < FILE

So, some really good (possibly better) answers have been provided already. But given that the original question asked for a Bash for loop, it amazed me that nobody mentioned a solution that changes the field separator, IFS. It's a pure bash solution, just like the accepted read line:
old_IFS=$IFS
IFS=$'\n' # a real newline; '\n' in plain quotes is a backslash followed by n
for field in $(<filename)
do your_thing;
done
IFS=$old_IFS
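A for loop can also iterate over lines without touching IFS by loading the file into an array first. The sketch below assumes bash 4 or later for mapfile; print_bracketed is a hypothetical function name. Because the array is expanded in double quotes, lines such as * are not expanded as glob patterns, which an unquoted $(<filename) would do.

```bash
#!/bin/bash
# print_bracketed: hypothetical example. Loads FILE into an array, one
# element per line, and loops over the elements with a for loop.
print_bracketed() {
    local -a lines
    local field
    mapfile -t lines < "$1"        # -t drops the trailing newline of each line
    for field in "${lines[@]}"; do
        printf '[%s]\n' "$field"   # stand-in for the real per-line work
    done
}
```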

If you are sure that the output will always be newline-delimited, use head -n 1 in lieu of cut -f1 (note that you mentioned a for loop in a script and your question was ultimately not script-related).
Many of the other answers, including the accepted one, have multiple lines unnecessarily. No need to do this over multiple lines or changing the default delimiter on the system.
Also, the solution provided by Ivan with -d$'\n' did not work for me either on Mac OSX or CentOS 7. Since his answer is four years old, I assume something must have changed on the logic of the $ character for this situation.

While loop with input redirection and read command.
You should not be using cut to perform a sequential iteration of each line in a file as cut was not designed to do this.
Print selected parts of lines from each FILE to standard output.
— man cut
TL;DR
You should use a while loop with the read -r command and redirect standard input to your file, inside a function scope where IFS is set to a newline ($'\n'), and use -E when using echo.
processFile() { # Function scope to prevent overwriting IFS globally
local file="$1" # Any file that exists
local IFS=$'\n' # Newline only, so leading/trailing spaces and tabs are kept
while read -r line; do # Read exits with 1 when done; -r allows \
echo -E "$line" # -E allows printing of \ instead of gibberish
done < "$file" # Input redirection allows us to read file from stdin
}
processFile /path/to/file
Iteration
In order to iterate over each line of a file, we can use a while loop. This will let us iterate as many times as we need to.
while <condition>; do
<body>
done
Getting our file ready to read
We can use the read command to store a single line from standard input in a variable. Before we can use that to read a line from our file, we need to redirect standard input to point to our file. We can do this with input redirection. According to the man pages for bash, the syntax for redirection is [fd]<file where fd defaults to standard input (a.k.a. file descriptor 0). For a compound command such as a while loop, bash expects the redirection after done; placing it before while is a syntax error.
while <condition>; do
<body>
done < /path/to/file
Reading the file and ending the loop
Now that our file can be read from standard input, we can use read. The syntax for read in our context is read [-r] var... where -r preserves the \ (backslash) character, instead of using it as an escape character, and var is the name of the variable to store the input in. You can have multiple variables to store pieces of the input in, but we only need one to read an entire line. Along with this, to preserve any backslashes in any output from echo you will likely need to use the -E flag to disable the interpretation of backslash escapes. If you have any indentation (spaces or tabs), you will need to temporarily change the IFS (Internal Field Separator) variable to only a newline, written $'\n' in bash; normally it is set to space, tab, and newline.
main() {
local IFS=$'\n'
read -r line
echo -E "$line"
}
main
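The effect of IFS and -r on a single read can be compared side by side. This is a small sketch with a made-up input string; the variable names are chosen only for this example.

```bash
#!/bin/bash
# Two leading and two trailing spaces, plus a literal backslash followed by t.
input='  indented \t text  '

read -r trimmed <<< "$input"               # default IFS: outer whitespace is stripped
IFS= read -r verbatim <<< "$input"         # empty IFS: the line is kept as-is
IFS=$'\n' read -r newline_ifs <<< "$input" # newline-only IFS: also kept as-is
read unescaped <<< "$input"                # no -r: the backslash is removed

printf '<%s>\n' "$trimmed" "$verbatim" "$newline_ifs" "$unescaped"
```

Both IFS= and IFS=$'\n' keep the indentation, since a line handed to read never contains a newline of its own; the empty form is the one used in the other answers on this page.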
How do we use read to end our while loop?
There is really only one reliable way, that I know of, to determine when you've finished reading a file with read: check the exit value of read. If the exit value of read is 0 then we successfully read a line, if it is 1 or higher then we reached EOF (end of file). With that in mind, we can place the call to read in our while loop's condition section.
processFile() {
# Could be any file you want hardcoded or dynamic
local file="$1"
local IFS=$'\n'
while read -r line; do
# Process line here
echo -E "$line"
done < "$file"
}
processFile /path/to/file1
processFile /path/to/file2
A visual breakdown of the above code via Explain Shell.
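The exit-status rule has one corner case worth seeing: when the last line has no trailing newline, read still stores that text in the variable but returns a non-zero status. The sketch below prints the status and the content of every read call; read_statuses is a hypothetical name.

```bash
#!/bin/bash
# read_statuses: hypothetical helper. Prints the exit status of each read
# call together with what it stored, and stops after the first failure.
read_statuses() {
    local line status
    while :; do
        IFS= read -r line
        status=$?
        printf '%d [%s]\n' "$status" "$line"
        [ "$status" -eq 0 ] || break
    done
}

printf 'a\nb' | read_statuses
```

For the unterminated input above this prints "0 [a]" and then "1 [b]", which is why a plain while read loop skips such a last line unless the condition also checks whether the variable is non-empty.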

If I am executing a command and want to cut the output but it has multiple lines I found it helpful to do
echo $([command]) | cut [....]
This puts all the output of [command] on a single line that can be easier to process.

As far as I know, cut already works line by line, so '\n' effectively separates its records by default.
If you want to use cut, I have two ways:
cut -d^M -f1 file_cut
I type ^M by pressing Ctrl+V followed by Enter. Another way is
cut -c 1- file_cut
Does that help?

How to read entire line from bash

I have a file file.txt with contents like
i love this world
I hate stupid managers
I love linux
I have MS
When I do the following:
for line in `cat file.txt`; do
echo $line
done
It gives output like
I
love
this
world
I
..
..
But I need the output as entire lines like below — any thoughts ?
i love this world
I hate stupid managers
I love linux
I have MS

while read -r line; do echo "$line"; done < file.txt

As @Zac noted in the comments, the simplest solution to the question as posted is simply cat file.txt, so I must assume there is something more interesting going on. I have included two options that solve the question as asked as well:
There are two things you can do here, either you can set IFS (Internal Field Separator) to a newline and use existing code, or you can use the read or line command in a while loop
IFS="
"
or
(while read line ; do
echo "$line" # do something
done) < file.txt

I believe the question was how to read in an entire line at a time. The simple script below will do this. If you don't specify a variable name for "read" it will stuff the entire line into the variable $REPLY.
cat file.txt|while read; do echo $REPLY; done

You can do it by using read if the file is coming into stdin. If you need to do it in the middle of a script that already uses stdin for other purposes, you can temporarily reassign the stdin file descriptor.
#!/bin/bash
file=$1
# save stdin to usually unused file descriptor 3
exec 3<&0
# connect the file to stdin
exec 0<"$file"
# read from stdin
while read -r line
do
echo "[$line]"
done
# when done, restore stdin
exec 0<&3

Try
(while read l; do echo $l; done) < temp.txt
read: Read a line from the standard input and split it into fields.
Reads a single line from the standard input, or from file descriptor FD if the -u option is supplied. The line is split into fields as with word splitting, and the first word is assigned to the first NAME, the second word to the second NAME, and so on, with any leftover words assigned to the last NAME. Only the characters found in $IFS are recognized as word delimiters.
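The field-splitting rule quoted above can be illustrated with several NAMEs. This is a sketch with a made-up passwd-style record; the variable names are chosen only for the example.

```bash
#!/bin/bash
# Sample record, invented for illustration.
record='root:x:0:0:root:/root:/bin/bash'

# IFS=: applies to this read only. The first three fields go to the first
# three names; everything left over, delimiters included, goes to the last.
IFS=: read -r user pw uid rest <<< "$record"

printf 'user=%s uid=%s rest=%s\n' "$user" "$uid" "$rest"
# prints: user=root uid=0 rest=0:root:/root:/bin/bash
```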
