Unexpected behavior when processing input via stdin but file input works fine - bash

I have a program which transposes a matrix. It works properly when passed a file as a parameter, but it gives strange output when given input via stdin.
This works:
$ cat m1
1 2 3 4
5 6 7 8
$ ./matrix transpose m1
1 5
2 6
3 7
4 8
This doesn't:
$ cat m1 | ./matrix transpose
5
[newline]
[newline]
[newline]
This is the code I'm using to transpose the matrix:
function transpose {
    # Set file to be argument 1 or stdin
    FILE="${1:-/dev/stdin}"
    if [[ $# -gt 1 ]]; then
        print_stderr "Too many arguments. Exiting."
        exit 1
    elif ! [[ -r $FILE ]]; then
        print_stderr "File not found. Exiting."
        exit 1
    else
        col=1
        read -r line < $FILE
        for num in $line; do
            cut -f$col $FILE | tr '\n' '\t'
            ((col++))
            echo
        done
        exit 0
    fi
}
And this code handles the argument passing:
# Main
COMMAND=$1
if func_exists $COMMAND; then
    $COMMAND "${@:2}"
else
    print_stderr "Command \"$COMMAND\" not found. Exiting."
    exit 1
fi
I'm aware of this answer but I can't figure out where I've gone wrong. Any ideas?

for num in $line; do
    cut -f$col $FILE | tr '\n' '\t'
    ((col++))
    echo
done
This loop reads $FILE over and over, once for each column. That works fine for a file but isn't suitable for stdin, which is a stream of data that can only be read once.
A quick fix would be to read the file into memory and use <<< to pass it to read and cut.
matrix=$(< "$FILE")
read -r line <<< "$matrix"
for num in $line; do
cut -f$col <<< "$matrix" | tr '\n' '\t'
((col++))
echo
done
See An efficient way to transpose a file in Bash for a variety of more efficient one-pass solutions.
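For illustration, a one-pass awk version in that spirit might look like this (a sketch, assuming whitespace-separated input small enough to hold in memory; because it reads its input exactly once, it works for stdin as well as for files):
awk '
{
    # store each cell, indexed by (column, row), and track the widest row
    for (i = 1; i <= NF; i++) a[i, NR] = $i
    if (NF > nf) nf = NF
}
END {
    # emit each column of the input as a row of the output
    for (i = 1; i <= nf; i++)
        for (j = 1; j <= NR; j++)
            printf "%s%s", a[i, j], (j < NR ? "\t" : "\n")
}' "$FILE"
Here $FILE is the same variable the transpose function already sets to "${1:-/dev/stdin}".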

Related

How to run grep with while loop in shell script?

I'm trying to make a shell script that counts, in B.txt, the occurrences of the 9-letter words made up of A, G, C, T.
First,
9bp_cases.txt contains
AAAAAAAAA
AAAAAAAAG
AAAAAAAAC
AAAAAAAAT
AAAAAAAGA
AAAAAAAGG
AAAAAAAGC
...
#!/bin/bash
file=/Dataset/4.synTF/2.Sequence/9bp_cases.txt
# exit if the CSV file does not exist
if [ ! -f "$file" ]; then
    echo "CSV file does not exist: $file" >&2
    exit 1
fi
cat 9bp_cases.txt | while read line
do
    echo $line
    grep -i -o $line B.txt | wc -w
done
The result is like this:
AAAAAAAAA
0
AAAAAAAAG
0
AAAAAAAAC
0
AAAAAAAAT
0
AAAAAAAGA
0
None of the words is counted correctly.
However, when I run the command by itself with a literal pattern, it returns the result correctly.
grep -i -o AAAAAAAA B.txt | wc -w
33410
I guess the $line passed to grep is not being interpreted the way I expect.
Can you please help me?
Thank you.
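One thing worth ruling out (a guess, since the thread doesn't confirm it, but a classic cause of exactly this symptom) is Windows-style CRLF line endings in 9bp_cases.txt: read would then leave a trailing carriage return in $line, so grep searches for the pattern plus an invisible \r and matches nothing. A quick check, and a loop that strips the character defensively:
# any output here means the file contains carriage returns (CRLF endings)
grep -c $'\r' 9bp_cases.txt
while IFS= read -r line; do
    line=${line%$'\r'}    # drop a trailing carriage return, if any
    echo "$line"
    grep -i -o "$line" B.txt | wc -w
done < 9bp_cases.txt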

bash - print a line every X seconds (like sed every X lines)

I know that with sed you can pipe the output of a command and print every Xth line.
make all | sed -n '2~5p'
Is there an equivalent command to print a line every X seconds?
make all | print_line_every_sec '5'
Within a 5-second timeout, read one line and discard anything else:
while
    # timeout 5 seconds
    ! timeout 5 sh -c '
        # read one line
        if IFS= read -r line; then
            # output the line
            printf "%s\n" "$line"
            # discard the input for the rest of the 5 seconds
            cat >/dev/null
        fi
        # we only get here if there is nothing left to read
    '
    # that means `timeout` will always return 124 while stdin is still open,
    # and will return 0 only when there is nothing left to read,
    # so we loop on a nonzero exit status of timeout.
do :; done
And as a one-liner:
while ! timeout 0.5 sh -c 'IFS= read -r line && printf "%s\n" "$line" && cat >/dev/null'; do :; done
But maybe something simpler: just discard 5 seconds of data after each line:
while IFS= read -r line; do
    printf "%s\n" "$line"
    timeout 5 cat >/dev/null
done
or
while IFS= read -r line &&
      printf "%s\n" "$line" &&
      ! timeout 5 cat >/dev/null
do :; done
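To get the exact invocation from the question, the simpler loop above can be dropped into a script (a sketch; the name print_line_every_sec comes from the question, and the delay is read from the first argument):
#!/bin/bash
# print_line_every_sec: pass one line through, then discard input for $1 seconds
delay=${1:?usage: print_line_every_sec SECONDS}
while IFS= read -r line; do
    printf '%s\n' "$line"
    timeout "$delay" cat >/dev/null
done
Then: make all | ./print_line_every_sec 5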
If you want the most recent message every 5 seconds, here is an attempt:
make all | {
    display(){
        if (( $SECONDS >= 5 )); then
            if test -n "${last_line+x}"; then
                # print only if there was a message in the last 5 seconds
                echo $last_line; unset last_line
            fi
            SECONDS=0
        fi
    }
    SECONDS=0
    while true; do
        while IFS= read -t 0.001 line; do
            last_line=$line
            display
        done
        display
    done
}
Even if the proposed solutions are interesting and beautiful, the most elegant solution IMHO is an awk solution. If you want to issue
make all | print_line_every_sec 5
then you have to create the script print_line_every_sec as follows, including a test to avoid an infinite loop:
#!/bin/bash
if [ $1 -le 0 ]; then echo "$(basename $0): invalid argument '$1'"; exit 1; fi
awk -v delay=$1 'BEGIN { t = systime() }
                 { if (systime() >= t) { print $0; t += delay } }'
This might work for you (GNU sed):
sed 'e sleep 1' file
This prints a line every n seconds (in the above example, every 1 second).
To print 5 lines every 2 seconds, use:
sed '1~5e sleep 2' file
You can do it with the watch command.
If you only need to print your output every X seconds, you can use something like this:
watch -n X "Your CMD"
If you want any changes in your output highlighted, the -d switch is useful:
watch -n X -d "Your CMD"
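For example, with a 5-second interval and change highlighting (df -h is just a stand-in command):
watch -n 5 -d "df -h"
Note that watch re-runs the command on each interval rather than reading from a pipe, so it fits commands you can re-execute, not a stream like make all.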

Last line of a file is not read in shell script

I have a text file foo.txt with the following content:
1
2
3
4
5
I have a shell script,
file="foo.txt"
while IFS= read -r line
do
echo "$line"
done < "$file"
But this only prints up to 4.
Actual Output:
1
2
3
4
How to get the expected output as below?
Expected Output:
1
2
3
4
5
This is due to a missing line break at the end of the last line of your input file.
You can use this loop to read everything:
while IFS= read -r line || [ -n "$line" ]; do
    echo "$line"
done < "$file"
For a last line without a line break, read doesn't return success, but it still populates $line; the [ -n "$line" ] check makes sure such a line is printed when $line is not empty.
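You can see the difference in isolation (a minimal demonstration; printf deliberately omits the newline after the 3):
$ printf '1\n2\n3' | while IFS= read -r line; do echo "$line"; done
1
2
$ printf '1\n2\n3' | while IFS= read -r line || [ -n "$line" ]; do echo "$line"; done
1
2
3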
PS: If you don't mind changing your input file, then use printf to append a newline:
printf '\n' >> "$file"
And then read normally:
while IFS= read -r line; do
    echo "$line"
done < "$file"

Output a file in two columns in BASH

I'd like to rearrange a file in two columns after the nth line.
For example, say I have a file like this here:
This is a bunch
of text
that I'd like to print
as two
columns starting
at line number 7
and separated by four spaces.
Here are some
more lines so I can
demonstrate
what I'm talking about.
And I'd like to print it out like this:
This is a bunch           and separated by four spaces.
of text                   Here are some
that I'd like to print    more lines so I can
as two                    demonstrate
columns starting          what I'm talking about.
at line number 7
How could I do that with a bash command or function?
Actually, pr can do almost exactly this:
pr --output-tabs=' 1' -2 -t tmp1
↓
This is a bunch           and separated by four spaces.
of text                   Here are some
that I'd like to print    more lines so I can
as two                    demonstrate
columns starting          what I'm talking about.
at line number 7
-2 for two columns; -t to omit page headers; and without the --output-tabs=' 1', it'll insert a tab for every 8 spaces it added. You can also set the page width and length (if your actual files are much longer than 100 lines); check out man pr for some options.
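For instance, a sketch for a longer file (the numbers are placeholders): -w sets the page width and -l the page length, and the page must be long enough to hold the whole file, or the columns will restart on every page:
pr --output-tabs=' 1' -2 -t -w 80 -l 200 tmp1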
If you're fixed upon "four spaces more than the longest line on the left," then you might need something a bit more complex. The following works with your test input, but is getting to the point where the correct answer would be, "just use Perl, already":
#!/bin/sh
infile=${1:-tmp1}
longest=$(longest=0;
          head -n $(( $( wc -l $infile | cut -d ' ' -f 1 ) / 2 )) $infile | \
          while read line
          do
              current="$( echo $line | wc -c | cut -d ' ' -f 1 )"
              if [ $current -gt $longest ]
              then
                  echo $current
                  longest=$current
              fi
          done | tail -n 1 )
pr -t -2 -w$(( $longest * 2 + 6 )) --output-tabs=' 1' $infile
↓
This is a bunch           and separated by four spa
of text                   Here are some
that I'd like to print    more lines so I can
as two                    demonstrate
columns starting          what I'm talking about.
at line number 7
… re-reading your question, I wonder if you meant that you were going to literally specify the nth line to the program, in which case, neither of the above will work unless that line happens to be halfway down.
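If the split line really is given literally, a workaround (a sketch; split=7 echoes the "line number 7" in the question) is to cut the file in two and let pr -m merge the halves side by side:
split=7
head -n "$((split - 1))" tmp1 > left    # lines before the split
tail -n "+$split" tmp1 > right          # the split line onward
pr -m -t --output-tabs=' 1' left right  # one input file per column
rm left right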
Thank you chatraed and BRPocock (and your colleague). Your answers helped me think up this solution, which answers my need.
function make_cols
{
    file=$1         # input file
    line=$2         # line to break at
    pad=$(($3-1))   # spaces between cols - 1
    len=$( wc -l < $file )
    max=$(( $( wc -L < <(head -$(( line - 1 )) $file ) ) + $pad ))
    SAVEIFS=$IFS; IFS=$(echo -en "\n\b")
    paste -d" " <( for l in $( cat <(head -$(( line - 1 )) $file ) )
                   do
                       printf "%-""$max""s\n" $l
                   done ) \
                <(tail -$(( len - line + 1 )) $file )
    IFS=$SAVEIFS
}
make_cols tmp1 7 4
Could be optimized in many ways, but does its job as requested.
Input data (configurable):
file
num of rows borrowed from file for the first column
num of spaces between columns
format.sh:
#!/bin/bash
file=$1
if [[ ! -f $file ]]; then
    echo "File not found!"
    exit 1
fi
spaces_col1_col2=4
rows_col1=6
rows_col2=$(($(cat $file | wc -l) - $rows_col1))
IFS=$'\n'
ar1=($(head -$rows_col1 $file))
ar2=($(tail -$rows_col2 $file))
maxlen_col1=0
for i in "${ar1[@]}"; do
    if [[ $maxlen_col1 -lt ${#i} ]]; then
        maxlen_col1=${#i}
    fi
done
maxlen_col1=$(($maxlen_col1+$spaces_col1_col2))
if [[ $rows_col1 -lt $rows_col2 ]]; then
    rows=$rows_col2
else
    rows=$rows_col1
fi
ar=()
for i in $(seq 0 $(($rows-1))); do
    line=$(printf "%-${maxlen_col1}s\n" ${ar1[$i]})
    line="$line${ar2[$i]}"
    ar+=("$line")
done
printf '%s\n' "${ar[@]}"
Output:
$ bash format.sh myfile
This is a bunch           and separated by four spaces.
of text                   Here are some
that I'd like to print    more lines so I can
as two                    demonstrate
columns starting          what I'm talking about.
at line number 7

How to verify information using standard linux/unix filters?

I have the following data in a Tab delimited file:
_ DATA _
Col1 Col2 Col3 Col4 Col5
blah1 blah2 blah3 4 someotherText
blahA blahZ blahJ 2 someotherText1
blahB blahT blahT 7 someotherText2
blahC blahQ blahL 10 someotherText3
I want to make sure that the data in the 4th column of this file is always an integer. I know how to do this in Perl:
Read each line, store the value of the 4th column in a variable
check if that variable is an integer
if the above is true, continue the loop
else break out of the loop with a message saying the file data is not correct
But how would I do this in a shell script using standard linux/unix filters? My guess would be to use grep, but I am not sure how.
cut -f4 data | LANG=C grep -q '[^0-9]' && echo invalid
LANG=C for speed
-q to quit at the first error in a possibly long file
If you need to strip the first line then use tail -n+2 or you could get hacky and use:
cut -f4 data | LANG=C sed -n '1b;/[^0-9]/{s/.*/invalid/p;q}'
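Combining the header-stripping with the original check gives (the same data file as above):
tail -n +2 data | cut -f4 | LANG=C grep -q '[^0-9]' && echo invalid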
awk is the tool most naturally suited for parsing by columns:
awk '{if ($4 !~ /^[0-9]+$/) { print "Error! Column 4 is not an integer:"; print $0; exit 1}}' data.txt
As you get more complex with your error detection, you'll probably want to put the awk script in a file and invoke it with awk -f verify.awk data.txt.
Edit: in the form you'd put into verify.awk:
{
    if ($4 !~ /^[0-9]+$/) {
        print "Error! Column 4 is not an integer:"
        print $0
        exit 1
    }
}
Note that I've made awk exit with a non-zero code, so that you can easily check it in your calling script with something like this in bash:
if awk -f verify.awk data.txt; then
    : # action for success
else
    : # action for failure
fi
You could use grep, but it doesn't inherently recognize columns. You'd be stuck writing patterns to match the columns.
awk is what you need.
I can't upvote yet, but I would upvote Jefromi's answer if I could.
Sometimes you need a Bash-only solution, because tr, cut & awk behave differently on Linux/Solaris/AIX/BSD/etc.:
while read a b c d e ; do [[ "$d" =~ ^[0-9] ]] || echo "$a: $d not a number" ; done < data
Edited....
#!/bin/bash
# Note: isdigit returns 0 (success) when its argument is NOT a digit string,
# which is why the branches of the if below look inverted.
isdigit ()
{
    [ $# -eq 1 ] || return 0
    case $1 in
        *[!0-9]*|"") return 0;;
        *) return 1;;
    esac
}
while read line
do
    col=($line)
    digit=${col[3]}
    if isdigit "$digit"
    then
        echo "err, no digit $digit"
    else
        echo "hey, we got a digit $digit"
    fi
done
Use this in a script foo.sh and run it like ./foo.sh < data.txt
See tldp.org for more info
Pure Bash:
linenum=1
while read line; do
    field=($line)
    if ((linenum > 1)); then
        [[ ! ${field[3]} =~ ^[[:digit:]]+$ ]] &&
            echo "FAIL: line number: ${linenum}, value: '${field[3]}' is not an integer"
    fi
    ((linenum++))
done < data.txt
To stop at the first error, add a break:
linenum=1
while read line; do
    field=($line)
    if ((linenum > 1)); then
        [[ ! ${field[3]} =~ ^[[:digit:]]+$ ]] &&
            echo "FAIL: line number: ${linenum}, value: '${field[3]}' is not an integer" &&
            break
    fi
    ((linenum++))
done < data.txt
cut -f 4 filename
will return the fourth field of each line to stdout.
Hopefully that's a good start, because it's been a long time since I had to do any major shell scripting.
Mind, this may well not be the most efficient compared to iterating through the file with something like perl.
tail +2 x.x | sort -n -k 4 | head -1 | cut -f 4 | egrep "^[0-9]+$"
if [ "$?" == "0" ]
then
    echo "file is ok"
fi
tail +2 gives you all but the first line (since your sample has a header)
sort -n -k 4 sorts the file numerically on the 4th column; letters will rise to the top.
head -1 gives you the first line of the file
cut -f 4 gives you the 4th column, of the first line
egrep "^[0-9]+$" checks if the value is a number (integers in this case).
If egrep finds nothing, $? is 1, otherwise it's 0.
There's also:
if [ `tail +2 x.x | wc -l` == `tail +2 x.x | cut -f 4 | egrep "^[0-9]+$" | wc -l` ]; then
    echo "file is ok"
fi
This will be faster, requiring two simple scans through the file, but it's not a single pipeline.
#OP, use awk
awk '$4+0<=0{print "not ok";exit}' file
Here $4+0 forces awk to evaluate the field numerically; a non-numeric string coerces to 0 and fails the test. (Note that this also flags rows where column 4 legitimately holds 0 or a negative number.)
