bash "read line" in while loop never ends - bash

I have the following code reading lines from a file and outputting a count:
while read -u 3 -r line; do
echo $i
i=$(( i + 1))
done 3<"$IN_FILE"
(I want to do more inside the loop, but this illustrates the issue).
This loop never ends for me. My IN_FILE contains 28.8M lines (as confirmed with wc -l), but the loop just keeps going, outputting counts up to ~35M before I manually kill it. If I use head/tail to create a small sample of this file, it runs just fine and terminates as expected.
Does anyone have any idea what could cause this? Is there some special character that my file might contain that would cause the redirect to go into a loop?
If it's relevant, I'm running this bash script in Mac OS X terminal...
Thanks.

Maybe try using split -l 1000000 "$IN_FILE" to break it up into 29 files and see if any of those shows the weird behavior?
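A minimal sketch of that bisection approach (the chunk_ prefix is just an illustrative choice):

# split the big file into 1M-line pieces named chunk_aa, chunk_ab, ...
split -l 1000000 "$IN_FILE" chunk_
for chunk in chunk_*; do
    i=0
    while read -u 3 -r line; do
        i=$((i + 1))
    done 3<"$chunk"
    # each chunk except the last should report exactly 1000000
    echo "$chunk: $i lines read"
done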


variable passing through awk command [duplicate]

Here's my issue: I have a bunch of fastq.gz files and I need to determine the number of lines in each (this is not the issue), and from that line count derive a value that determines a threshold used as a variable further down in the same loop. I've browsed around but cannot find how to do it. Here's what I have so far:
for file in *R1.fastq*; do
var=echo $(zcat "$file" | $((`wc -l`/400000)))
for i in *Bacter*; do
awk -v var1=$var '{if($2 >= var1) print $0}' ${i} | wc -l >> bacter-filtered.txt
done
done
I get the error message: -bash: 14850508/400000: No such file or directory
Any help would be greatly appreciated!
The problem is in the line
var=echo $(zcat "$file" | $((`wc -l`/400000)))
There are a bunch of shell syntax elements here combined in ways that don't connect up with each other. To keep things straight, I'd recommend splitting it into two separate operations:
lines=$(zcat "$file" | wc -l)
var=$((lines/400000))
(You may also have to do something about the output to bacter-filtered.txt -- it's just going to contain a bunch of numbers, with no identifications of which ones come from which files. Also since it always appends, if you run this twice you'll have the output from both runs stuck together. You might want to replace all those appends with a single > bacter-filtered.txt after the last done, so the whole output just gets stored directly.)
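Putting those pieces together, a sketch of what the restructured loop might look like, with each count labelled by the files it came from and a single redirect after the last done (the labelling format here is just one possible choice):

for file in *R1.fastq*; do
    lines=$(zcat "$file" | wc -l)
    var=$((lines / 400000))
    for i in *Bacter*; do
        # label each count with the files it was computed from
        printf '%s %s ' "$file" "$i"
        awk -v var1="$var" '{if ($2 >= var1) print $0}' "$i" | wc -l
    done
done > bacter-filtered.txt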
What's wrong with the original? Well, let's start with this:
zcat "$file" | $((`wc -l`/400000))
Unless I completely misunderstand, the purpose here is to extract $file (with zcat), count the lines in the result (with wc -l), and divide that count by 400000. But the output of zcat isn't piped directly to wc; it's piped to a complex expression involving wc, so it's somewhat ambiguous what should happen, and the behavior is actually different under different shells. In zsh, it does something completely different from that: it lets wc read from the script's stdin (generally your terminal), divides the result by 400000, and then pipes the output of zcat to that ... number?
In bash, it does something closer to what you want: wc actually does read from the output of zcat, so the second part of the pipe essentially turns into:
... | $((14850508/400000))
Now, what I'd expect to happen at this point (and happens in my tests) is that it should evaluate $((14850508/400000)) into 37, giving:
... | 37
which will then try to execute 37 as a command (because it's part of a pipeline, and therefore is supposed to be a command). But for some reason it's apparently not evaluating the division and just trying to execute 14850508/400000 as a command. Which doesn't really work any better or worse than 37, so I guess it doesn't matter much.
So that's where the error is coming from, but there's actually another layer of confusion in the original line. Suppose that internal pipeline was fixed so that it properly output "37" (rather than trying to execute it). The outer structure would then be:
var=echo $(cmdthatprints37)
The $( ) basically means "run the command inside, and substitute its output into the command line here", so that would evaluate to:
var=echo 37
...which, in shell syntax, means "run the command 37 with the variable var set to echo in its environment".
The solution here would be simple: the echo is messing everything up, so remove it:
var=$(cmdthatprints37)
...which evaluates to:
var=37
...which is what you want. Except that, as I said above, it'd be better to split it up and do the command bits and the math separately rather than getting them mixed up.
BTW, I'd also recommend some additional double-quoting of shell variables; shellcheck.net will be happy to point out where.

why does tail -F -n 1 myfile.txt print *all* the contents of myfile as it gets updated?

What I'm trying to do is really simple - I want to monitor a file and print its last line to the screen as the file gets updated. From what I know,
tail -F -n 1 myfile.txt
should do exactly that. However, I get strange behaviour: With the "original" myfile.txt, the command works fine and only the last line is printed to the screen. However, as soon as I alter myfile.txt by appending new lines of text, the entire contents of myfile.txt are printed - rather than just the very last line.
I have never used tail before and I might just be getting something terribly wrong here, but surely that's not the expected behaviour? I purposefully use the -F flag so I can manually alter myfile.txt - could that be the reason for it not working?
Help is very much appreciated...
Thanks so, so much!
No, that's the way it's meant to work: -n 1 controls only the initial output, printing just the last line, but -F follows the file beyond that point, and the documentation states quite clearly:
output appended data as the file grows;
In other words, it outputs all the appended data that's gone to the file.
If you examine the source code, you'll notice that the main() function first processes the -n option, and only at the end does it call tail_forever(), which makes no mention of the argument supplied with -n.
If you execute:
( echo 1; echo 2; echo 3 ) >qq
in one window then start up a tail in another:
tail -F -n 1 qq
you should get only the line with 3.
If you then return to the original window and execute:
( echo 4; echo 5; echo 6 ) >>qq
your second window should output just the new lines (all three of them).
If your second window gives you all six of the lines, it's broken.

mpg123 plays only 10 songs then quits

#!/bin/bash
ls |sort -R |tail -$N |while read file; do
mpg123 "$file"
sleep 3
done
any idea why it only plays 10 mp3's and exits?
There are hundreds of mp3's in the same directory as this file (playmusic.sh)
Thanks
As Marc B said, the problem occurs because the variable N is not set, which causes tail to fall back to its default of 10 lines. (Obviously it can also happen if N is actually set to 10.)
The fundamental problem here is that you didn't understand what this code actually does; I suspect you didn't actually write it yourself. Even though it's a bash script, it expects a variable N to be set. This is highly unorthodox for a bash script; you would normally use
$1
instead of $N, or better still
${1:?}
which would display an error and exit immediately, if you forgot to pass in a command-line argument.
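A minimal sketch of the corrected playmusic.sh, assuming the number of songs is passed as the first command-line argument:

#!/bin/bash
# fail with a usage message if no count is passed on the command line
n=${1:?usage: playmusic.sh number-of-songs}
ls | sort -R | tail -n "$n" | while read -r file; do
    mpg123 "$file"
    sleep 3
done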

Weird characters when I print a number into a file (bash shell)

I have this line in my script.sh
printf "%d" "$endMS_line"
$endMS_line is a number. I get that number with
endMS_line=`cat file | awk '{if($1=='"$variable"') print NR}'`
And to print it I use
printf "%d" "$endMS_line"
or
echo $endMS_line
So everything works perfectly on standard output. The problem is when I want to save that number into a file (because I want to use the result in another script; maybe there is a cleverer way to do this than writing to a file and then reading the number back, etc.).
But for now I am trying to do that. How? Well, I redirect the standard output:
myscript.sh inputs > file.txt
But when I try to see the file (when I open it) I see the result plus weird characters:
^[[H^[[2J867
The correct number in this example is 867. Anyone know how can I fix this?
Thank you!
At the beginning of the script I had the command:
clear
clear writes its terminal escape sequences to standard output, so they were being redirected into the file along with the number. Removing that and using:
echo "$endMS_line"
then redirecting the standard output as before:
myscript.sh input > file.txt
works perfectly.
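If you want to keep the clear, one sketch (assuming stderr is still attached to the terminal) is to send its escape codes there instead:

clear >&2            # escape codes go to the terminal via stderr, not into file.txt
echo "$endMS_line"   # only the number reaches the redirected stdout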

Handle special characters in bash for...in loop

Suppose I've got a list of files
file1
"file 1"
file2
A for...in loop splits it on whitespace, not newlines:
for x in $( ls ); do
echo $x
done
results:
file
1
file1
file2
I want to execute a command on each file. ("file" and "1" above are not actual files.) How can I do that if the filenames contain things like spaces or commas?
It's a little trickier than I think find -print0 | xargs -0 could handle, because I actually want the command to be something like "convert input/file1.jpg .... output/file1.jpg", so I need to transform the filename in the process.
Actually, Mark's suggestion works fine without even doing anything to the internal field separator. The problem is that running ls in a subshell, whether by backticks or $( ), leaves the for loop unable to distinguish the spaces within names from the separators between names. Simply using
for f in *
instead of the ls solves the problem.
#!/bin/bash
for f in *
do
echo "$f"
done
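Applied to the convert example from the question (assuming input/ and output/ directories as described there), the same glob approach might look like:

#!/bin/bash
for f in input/*.jpg; do
    # ${f##*/} strips the leading input/ path, keeping only the filename
    convert "$f" "output/${f##*/}"
done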
UPDATE BY OP: this answer sucks and shouldn't be on top ... #Jordan's post below should be the accepted answer.
one possible way:
ls -1 | while read -r x; do
    echo "$x"
done
I know this one is LONG past "answered", and with all due respect to eduffy, I came up with a better way and I thought I'd share it.
What's "wrong" with eduffy's answer isn't that it's wrong, but that it imposes what for me is a painful limitation: there's an implied creation of a subshell when the output of the ls is piped and this means that variables set inside the loop are lost after the loop exits. Thus, if you want to write some more sophisticated code, you have a pain in the buttocks to deal with.
My solution was to take the "readline" idea and write a program out of it, to which you can pass the number of whichever line you want from any given command's output. ... As a simple example, starting with eduffy's:
ls_output=$(ls -1)
# the cut keeps just the numeric count from wc's output
declare -i line_count=$(echo "$ls_output" | wc -l | cut -d ' ' -f 1)
declare -i cur_line=1
while [ $cur_line -le $line_count ]; do
    # None of the variables set inside this loop are lost to a subshell.
    filename=$(echo "$ls_output" | readline -n $cur_line)
    # Now filename contains one filename from the preceding ls command
    cur_line=cur_line+1
done
Now you have wrapped up all the subshell activity into neat little contained packages and can go about your shell coding without having to worry about the scope of your variable values getting trapped in subshells.
I wrote my version of readline in GNU C; if anyone wants a copy, it's a little big to post here, but maybe we can find a way...
Hope this helps,
RT
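
As an aside, the subshell problem above can also be avoided in bash itself with process substitution, which keeps the while loop in the current shell; a minimal sketch:

count=0
while IFS= read -r f; do
    count=$((count + 1))    # still visible after the loop ends
done < <(ls -1)
echo "$count files"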
