Why does "cut" command skip first line in this "while read line" loop? - bash

I'm writing a bash script, and I need to take the second field of every line in a file and save them in another file. I know there are many possible ways to do this, but I first tried using while read line; do and got stuck. Now I really want to know what is happening.
For example, input file would be:
line1 11111
line2 222222
line3 333
line4 4444
(The field separator is a tab, "\t".)
This is what I was doing:
inputfile=$1
cat "$inputfile" | while read -r line
do
cut -f2 >> results_file
done
The problem is, the output would be:
222222
333
4444
(skipping the first line)
I've already tested hundreds of modifications and tried other commands instead of cut (sed, grep, ...). I would appreciate some help, or someone pointing me in the right direction.
Thank you very much!

You are not using the variable $line set by read. Try instead
inputfile=$1
cat "$inputfile" | while read -r line
do
echo "$line" | cut -f2 >> results_file
done
In your original code, the while loop actually runs only once, not four times: try adding echo 'Hello!' inside the loop of your original code, and you will see the message only once. Without the echo "$line" | part, cut -f2 reads from the loop's stdin and consumes everything left in the pipe.
That is, your while loop first consumes the first line of the stdin and puts this line in the variable $line, leaving the next three lines for later use. But $line is never used. Instead, the remaining three lines are consumed by the command cut.
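A minimal, self-contained way to watch this happen, assuming bash and a temporary file created with mktemp (demo_cut_steals_stdin is just an illustrative name). The loop counts its iterations and runs cut with no file argument, exactly as in the question:

```shell
#!/usr/bin/env bash
# Illustrative demo (hypothetical function name): cut inside the loop
# reads from the loop's stdin, so read never sees lines 2-4.
demo_cut_steals_stdin() {
  local tmp line iterations=0
  tmp=$(mktemp)
  # Same sample data as the question, tab-separated.
  printf 'line1\t11111\nline2\t222222\nline3\t333\nline4\t4444\n' > "$tmp"

  while read -r line; do
    iterations=$((iterations + 1))
    # No file argument and no pipe: cut inherits the loop's stdin
    # and reads everything after the line that read just consumed.
    cut -f2
  done < "$tmp"

  echo "iterations: $iterations"
  rm -f "$tmp"
}

demo_cut_steals_stdin
```

It prints 222222, 333 and 4444 followed by iterations: 1, which matches the output in the question: line1 went into $line (unused) and cut took the rest.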
All commands within a command group are within the scope of any redirections applied to a command group (or any compound command):
— https://mywiki.wooledge.org/BashGuide/CompoundCommands
The pipe operator creates a subshell environment for each command.
— https://mywiki.wooledge.org/BashGuide/InputAndOutput
We can read these quotes as saying that the stdin of your while loop (i.e., the output of cat "$inputfile") is inherited by cut, unless you give cut a stdin of its own, e.g., by piping into it with echo "$line" | ....
By the way, you can just use cut -f2 "$inputfile" >> results_file without the while loop.
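To check that the fixed loop and the single cut call agree, here is a sketch assuming bash; it uses a <<< here-string, which gives cut its own stdin in the same way echo "$line" | cut does. The function names are hypothetical:

```shell
#!/usr/bin/env bash
# Loop version: each cut gets its own stdin (the here-string),
# so the loop's stdin is left for read.
second_field_loop() {
  local line
  while IFS= read -r line; do
    cut -f2 <<< "$line"
  done < "$1"
}

# Single-command version: cut reads the file itself, no loop needed.
second_field_cut() {
  cut -f2 "$1"
}

tmp=$(mktemp)
printf 'line1\t11111\nline2\t222222\nline3\t333\nline4\t4444\n' > "$tmp"
second_field_loop "$tmp"
second_field_cut "$tmp"
rm -f "$tmp"
```

Both calls print the four second fields (11111 through 4444); the loop just starts one cut process per line to get there.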

Regarding your comment asking whether it means to use "\t at the end" as a separator: no. You're confusing what was suggested, $'\t', with '\t$'. $'\t' means "the literal tab character generated from the escape sequence \t".
You also said in your comment that your real second fields are URLs to be fetched with curl. You shouldn't be using a UUOC (Useless Use of Cat) and cut anyway; here's how to really do this:
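A quick way to see the difference, assuming bash (which supports $'...' ANSI-C quoting):

```shell
#!/usr/bin/env bash
ansi_tab=$'\t'   # ANSI-C quoting: one literal tab character
plain='\t'       # ordinary single quotes: a backslash followed by the letter t
echo "${#ansi_tab} ${#plain}"         # prints: 1 2

# $'\t' is the form to hand to cut -d or IFS when the separator is a tab.
cut -d $'\t' -f2 <<< $'line1\t11111'  # prints: 11111
```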
while IFS=$'\t' read -r key url; do
val=$(curl "$url" | whatever)
printf '%s\t%s\n' "$key" "$val"
done < "$inputfile" > results_file
Replace whatever with whatever command you use to produce the output you want from the curl output.
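To try the loop without network access, here is a sketch in which fetch_stub is a hypothetical stand-in for curl "$url" | whatever (it only reports the URL's length), and the input lines are made up. It shows IFS=$'\t' splitting each line into key and url and the result being written tab-separated:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: curl "$url" | whatever
fetch_stub() {
  printf '%s' "${#1}"   # the "result" is just the length of the URL string
}

process_file() {
  local key url val
  # IFS=$'\t' applies only to read: split each line at the tab.
  while IFS=$'\t' read -r key url; do
    val=$(fetch_stub "$url")
    printf '%s\t%s\n' "$key" "$val"
  done < "$1"
}

inputfile=$(mktemp)
results_file=$(mktemp)
printf 'line1\thttps://example.com/a\nline2\thttps://example.com/bb\n' > "$inputfile"
process_file "$inputfile" > "$results_file"
cat "$results_file"
rm -f "$inputfile" "$results_file"
```

If the command you put inside such a loop does read stdin (as cut did in the question), give it its own input, for example with < /dev/null, so it cannot consume the rest of $inputfile.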

Related

parse and echo string in a bash while loop

I have a file with this structure:
picture1_123.txt
picture2_456.txt
picture3_789.txt
picture4_012.txt
I wanted to get only the first segment of the file name, that is, picture1 to picture4.
I first used the following code:
cat picture | while read -r line; do cut -f1 -d "_"; echo $line; done
This returns the following output:
picture2
picture3
picture4
picture1_123.txt
This error got corrected when I changed the code to the following:
cat picture | while read line; do s=$(echo $line | cut -f1 -d "_"); echo $s; done
picture1
picture2
picture3
picture4
Why, in the first version:
are the lines printed in a different order than in the original file?
is no operation done on picture1_123.txt, and why is picture1 not printed?
Thank you!
What Was Wrong
Here's what your old code did:
On the first (and only) iteration of the loop, read line read the first line into line.
The cut command read the entire rest of the file, and wrote the results of extracting only the desired field to stdout. It did not inspect, read, or modify the line variable.
Finally, your echo $line wrote the first line in entirety, with nothing being cut.
Because all input had been consumed by cut, nothing remained for the next read line to consume, so the loop never ran a second time.
How To Do It Right
The simple way to do this is to let read separate out your prefix:
while IFS=_ read -r prefix suffix; do
echo "$prefix"
done <picture
...or to just run nothing but cut, and not use any while read loop at all:
cut -f1 -d_ <picture
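A sketch that runs both suggestions on the sample file names, assuming bash and a temporary file in place of picture (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# read-based version: IFS=_ applies only to this read; prefix gets the
# text before the first "_", suffix gets everything after it.
prefixes_read() {
  local prefix suffix
  while IFS=_ read -r prefix suffix; do
    echo "$prefix"
  done < "$1"
}

# cut-based version: one process for the whole file, no loop.
prefixes_cut() {
  cut -f1 -d_ < "$1"
}

picture=$(mktemp)
printf '%s\n' picture1_123.txt picture2_456.txt picture3_789.txt picture4_012.txt > "$picture"
prefixes_read "$picture"
prefixes_cut "$picture"
rm -f "$picture"
```

Both print picture1 through picture4, in the original order.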

CSV file parsing in Bash

I have a CSV file with the sample entries given below. I want to write a Bash script that reads the CSV file line by line and puts the first entry (e.g. 005) in one variable and the IP (e.g. 192.168.10.1) in another variable, which I need to pass to some other script.
005,192.168.10.1
006,192.168.10.109
007,192.168.10.12
008,192.168.10.121
009,192.168.10.123
A more efficient approach, without the need to fork cut each time:
#!/usr/bin/env bash
while IFS=, read -r field1 field2; do
printf '%s %s\n' "$field1" "$field2"   # replace with whatever you need to do with $field1 and $field2
done < file.csv
The gains can be quite substantial for large files, since the cut-based approaches start at least one extra process for every line.
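Applied to the sample data, with handle_entry as a hypothetical stand-in for "some other script" (replace it with your real command), a sketch could look like this. IFS=, is set only for the read command, so word splitting elsewhere in the script is unaffected:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the other script that receives both values.
handle_entry() {
  printf 'id=%s ip=%s\n' "$1" "$2"
}

process_csv() {
  local id ip
  while IFS=, read -r id ip; do
    handle_entry "$id" "$ip"
  done < "$1"
}

csv=$(mktemp)
printf '%s\n' 005,192.168.10.1 006,192.168.10.109 007,192.168.10.12 > "$csv"
process_csv "$csv"   # first line printed: id=005 ip=192.168.10.1
rm -f "$csv"
```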
Here's how I would do it with GNU tools:
while IFS= read -r line; do
echo "$line" | cut -d, -f1-2 --output-delimiter=' ' | xargs your_command
done < your_input.csv
while read line; do [...]; done < your_input.csv will read your file line by line.
For each line, we cut it down to its first two fields (separated by commas, since it's a CSV) and pass them, separated by spaces, to xargs, which in turn passes them as arguments to your_command.
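To see what reaches your_command, here is a sketch that assumes GNU cut (--output-delimiter is a GNU extension) and uses the external printf as a stand-in for your_command; to_args is a hypothetical name:

```shell
#!/usr/bin/env bash
# Requires GNU coreutils cut for --output-delimiter.
to_args() {
  local line
  while IFS= read -r line; do
    # "005,192.168.10.1" becomes "005 192.168.10.1"; xargs then splits
    # that into two arguments for printf (standing in for your_command).
    printf '%s\n' "$line" \
      | cut -d, -f1-2 --output-delimiter=' ' \
      | xargs printf 'id=%s ip=%s\n'
  done < "$1"
}

csv=$(mktemp)
printf '005,192.168.10.1\n006,192.168.10.109\n' > "$csv"
to_args "$csv"
rm -f "$csv"
```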
If this is a very simple CSV file with no quoted string literals, etc., you can simply use cut:
#!/bin/bash
while IFS= read -r line
do
id_field=$(cut -d',' -f 1 <<<"$line") #here 005 for the first line
ip_field=$(cut -d',' -f 2 <<<"$line") #here 192.168.10.1 for the first line
#do something with $id_field and $ip_field
done < file.csv
The program works as follows: we use cut -d',' to obtain the first and second fields of each line. We wrap this in a while read loop and use I/O redirection to feed the file to the loop.
Of course you substitute file.csv with the name of the file you want to process, and you can use other variable names than the ones in this sample.

Printing file content on one line

I'm completely lost trying to do something I thought would be very straightforward: read a file line by line and output everything on one line.
I'm using bash on RHEL.
Consider a simple test case with a file (test.in) with following content:
one
two
three
four
I want to have a script which reads this files and outputs:
one two three four
Done
I tried this (test.sh):
cat test.in | while read in; do
printf "%s " "$in"
done
echo "Done"
The result looks like:
# ./test.sh
foure
Done
#
It seems that the printf causes the cursor to jump back to the first position of the same line immediately after the %s. The issue persists when using echo -e "$in \c".
Any ideas?
another answer:
tr '[:space:]' ' ' < file
echo
This should be both the safest and the most efficient option. Use '\n' instead of '[:space:]' if you only want to convert newlines rather than all whitespace.
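The foure output looks like what carriage returns would produce (for example, from Windows-style CRLF line endings): each \r sends the cursor back to the start of the line, so later words overwrite earlier ones. That is an inference from the symptom, not something the question confirms. Because [:space:] includes \r, the tr answer copes with that case too, and adding -s squeezes each \r\n pair into a single space. A sketch with a made-up CRLF sample (one_line is a hypothetical name):

```shell
#!/usr/bin/env bash
# Translate every whitespace character (including \r and \n) to a space,
# and squeeze runs of spaces into one.
one_line() {
  tr -s '[:space:]' ' ' < "$1"
  echo   # end the output with a newline
}

sample=$(mktemp)
printf 'one\r\ntwo\r\nthree\r\nfour\r\n' > "$sample"   # CRLF line endings
one_line "$sample"   # "one two three four " (note the trailing space)
echo "Done"
rm -f "$sample"
```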
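You can use:
echo $(<test.in); echo 'Done'
one two three four
Done
If the first word might look like an option to echo (for example -n or -e), you may be tempted to write echo -- ..., but bash's builtin echo does not treat -- as the end of options and prints it literally; that is documented behavior, not a bug. One workaround is to strip the leading "-- " (three bytes) afterwards:
echo -- `cat file` | tail -c +4
So check how your shell's echo handles -- before deciding whether you need | tail -c +4 in your implementation.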

Reading a file line by line in ksh

We use a package called Autosys, which provides some specific commands. I have a list of variables which I'd like to pass, one by one, to one of the Autosys commands.
For example, one such variable is var1; using var1, I would like to launch a command like this:
autosys_showJobHistory.sh var1
When I run the command below, it gives me the desired output:
echo "var1" | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
But if I put var1 in a file, say Test.txt, and run the same command using cat, it gives me nothing. I have the impression that autosys_showJobHistory.sh does not work in that case.
cat Test.txt | while read line; do autosys_showJobHistory.sh $line | grep 1[1..6]:[0..9][0..9] | grep 24.12.2012 | tail -1 ; done
What am I doing wrong in the second command?
I wrote all of the below and then noticed your grep statement.
Note that .. is not a range operator here, neither in ksh nor in grep: inside a bracket expression, [1..6] matches only the characters 1, . and 6, and [0..9] matches only 0, . and 9 (I assume a range was your intent; ranges are written with a hyphen). Leaving the patterns unquoted makes things worse, because the shell may treat them as glob patterns and replace them with matching file names before grep runs, so you can't be sure which regex grep actually receives. It's always better to quote arguments unless you know for sure that you need them unquoted. Try rewriting as
grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012'
Also, are you deliberately using the 'match any character' operator '.', or do you want to match only a period? If you want to match only a period, you need to escape it as \. (for example '24\.12\.2012').
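A small sketch of both points, run against made-up input lines with grep -c (which prints the number of matching lines):

```shell
#!/usr/bin/env bash
# In a bracket expression, [1..6] is a set of characters (1, . and 6), not a range.
printf '12:34\n' | grep -c '1[1..6]:[0..9][0..9]'   # prints 0: "2" is not in {1, ., 6}
printf '12:34\n' | grep -c '1[1-6]:[0-9][0-9]'      # prints 1: hyphen ranges work

# An unescaped "." matches any character; "\." matches only a period.
printf '24x12x2012\n' | grep -c '24.12.2012'        # prints 1
printf '24x12x2012\n' | grep -c '24\.12\.2012'      # prints 0
```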
Finally, if any of the files you're processing were created on a Windows machine and then transferred to Unix/Linux, it's very likely that their line endings (Ctrl-M Ctrl-J, i.e. \r\n) are causing you problems. Clean up your PC-based files (or anything that was sent via FTP) with dos2unix file [file2 ...].
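A sketch of what a CRLF line does to the variable, written for bash (ksh93 also supports the $'...' quoting used here). After read, $line still ends in an invisible \r, so a command given $line would receive var1 followed by a carriage return, which plausibly matches no job. The tr -d '\r' line is shown as an alternative for systems where dos2unix is not installed:

```shell
#!/usr/bin/env bash
# A line ending in \r\n leaves a trailing \r in $line after read.
read -r line <<< $'var1\r'
echo "${#line}"          # prints 5: "var1" plus the trailing \r

# Strip it inside the script...
line=${line%$'\r'}
echo "${#line}"          # prints 4

# ...or strip every \r from the input before it reaches the loop.
printf 'var1\r\nvar2\r\n' | tr -d '\r'
```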
If the above doesn't help, You'll have to "divide and conquer" to debug your problem.
When I ran the following tests, I got the expected output:
$ echo "var1" | while read line ; do print "line=${line}" ; done
line=var1
$ vi Test.txt
$ cat Test.txt
var1
$ cat Test.txt | while read line ; do print "line=${line}" ; done
line=var1
Unrelated to your question, but certain to draw comments, is your use of cat in this context, which will earn you the UUOC (Useless Use of Cat) award. That can be rewritten as
while read line ; do print "line=${line}" ; done < Test.txt
But to solve your problem, now turn on the shell debugging/trace options, either by changing the top line of the script (the shebang line) like
#!/bin/ksh -vx
Or use a matched pair of set commands to trace just these lines, i.e.
set -vx
while read line; do
print -u2 -- "#dbg: Line=${line}XX"
autosys_showJobHistory.sh $line \
| grep '1[1-6]:[0-9][0-9]' \
| grep '24.12.2012' \
| tail -1
done < Test.txt
set +vx
I've added an extra debug step, print -u2 -- ... (-u2 writes to stderr; -- ends option processing for print).
Looking at that output, you can make sure no extra space or tab characters are creeping in: the XX marker right after ${line} makes any trailing whitespace visible.
They shouldn't matter, as you have left your $line unquoted. As part of your testing, I'd recommend quoting it like "${line}".
Then I'd comment out the tail and the grep lines. You want to see which step is causing this to break, right? Does the autosys script by itself still produce the intermediate output you're expecting? Then does autosys + 1 grep produce output as expected, then + 2 greps, then + tail? You should easily be able to see where you're losing your output.
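The stage-by-stage check can be scripted. In this sketch, job_history_stub is a hypothetical stand-in for autosys_showJobHistory.sh with made-up output lines, and grep -c '' simply counts the lines that survive each stage. It is written for bash but avoids bash-only syntax inside the functions:

```shell
#!/usr/bin/env bash
# Hypothetical stub for autosys_showJobHistory.sh; the lines are invented.
job_history_stub() {
  printf '%s 24.12.2012 11:05 SU\n' "$1"
  printf '%s 24.12.2012 09:30 SU\n' "$1"
  printf '%s 23.12.2012 12:10 FA\n' "$1"
}

# Rebuild the pipeline one stage at a time and count surviving lines:
# 1 = command only, 2 = + time grep, 3 = + date grep, 4 = + tail.
debug_stages() {
  echo "stage 1: $(job_history_stub "$1" | grep -c '')"
  echo "stage 2: $(job_history_stub "$1" | grep '1[1-6]:[0-9][0-9]' | grep -c '')"
  echo "stage 3: $(job_history_stub "$1" | grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012' | grep -c '')"
  echo "stage 4: $(job_history_stub "$1" | grep '1[1-6]:[0-9][0-9]' | grep '24.12.2012' | tail -1 | grep -c '')"
}

debug_stages var1
```

The first stage whose count drops to 0 is where the output is being lost.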
IHTH

Echo changes my tabs to spaces

I'm taking the following structure from around the net as a basic example of how to read from a file in BASH:
cat inputfile.txt | while read line; do echo $line; done
My inputfile.txt is tab-delimited, though, and the lines that come out of the above command are space-delimited.
This is causing me problems in my actual application, which is of course more complex than the above: I want to take the line, generate some new stuff based on it, and then output the original line plus the new stuff as extra fields. And the pipeline is going to be complicated enough without a bunch of cut -d ' ' and sed -e 's/ /\t/g' (which wouldn't be safe for tab-delimited data containing spaces anyway).
I've looked at IFS solutions, but they don't seem to help in this case. What I want is an OFS...except that I'm in echo, not awk! I think that if I could just get echo to spit out what I gave it, verbatim, I'd be in good shape. Any thoughts? Thanks!
Try:
cat inputfile.txt | while read line; do echo "$line"; done
instead.
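In other words, it's not read that replaces the tabs between fields; it's the unquoted $line: the shell word-splits it on the tabs, and echo prints the resulting words separated by single spaces.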
See the following transcript (using <<tab>> where the tabs are):
pax$ echo 'hello<<tab>>there' | while read line ; do echo $line ; done
hello there
pax$ echo 'hello<<tab>>there' | while read line ; do echo "$line" ; done
hello<<tab>>there
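Quoting fixes the inner tabs, but with the default IFS, read still strips leading and trailing tabs and spaces, and without -r it treats backslashes as escapes. A sketch of a loop that echoes each line back unchanged, assuming bash; copy_lines is a hypothetical name, and printf '%s\n' is used instead of echo so that a line such as -n is printed literally:

```shell
#!/usr/bin/env bash
# IFS= keeps leading/trailing whitespace, -r keeps backslashes,
# and quoting "$line" prevents word splitting.
copy_lines() {
  local line
  while IFS= read -r line; do
    printf '%s\n' "$line"
  done < "$1"
}

sample=$(mktemp)
printf '\tindented\tfield\nplain\tfield\n' > "$sample"
copy_lines "$sample" | od -c   # the \t characters survive

# For comparison: with the default IFS, read drops the leading tab.
read -r stripped <<< $'\tindented'
printf '%s\n' "$stripped"      # prints: indented
rm -f "$sample"
```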

Resources