Here's my issue: I have a bunch of fastq.gz files and I need to determine the number of lines in each (this is not the issue), and from that line count derive a value that determines a threshold, used as a variable further down in the same loop. I browsed around but cannot find how to do it. Here's what I have so far:
for file in *R1.fastq*; do
var=echo $(zcat "$file" | $((`wc -l`/400000)))
for i in *Bacter*; do
awk -v var1=$var '{if($2 >= var1) print $0}' ${i} | wc -l >> bacter-filtered.txt
done
done
I get the error message: -bash: 14850508/400000: No such file or directory
Any help would be greatly appreciated!
The problem is in the line
var=echo $(zcat "$file" | $((`wc -l`/400000)))
There are a bunch of shell syntax elements here combined in ways that don't connect up with each other. To keep things straight, I'd recommend splitting it into two separate operations:
lines=$(zcat "$file" | wc -l)
var=$((lines/400000))
(You may also have to do something about the output to bacter-filtered.txt -- as written, it's just going to contain a bunch of numbers, with no indication of which ones come from which files. Also, since it always appends, running the script twice leaves the output of both runs stuck together. You might want to replace all those appends with a single > bacter-filtered.txt after the last done, so the whole output gets stored directly.)
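Putting it all together, here's a minimal sketch of how the whole loop might look with those fixes applied (keeping your original globs; prefixing each count with the two filenames is just one possible way to keep the rows identifiable):
for file in *R1.fastq*; do
    lines=$(zcat "$file" | wc -l)
    var=$((lines / 400000))
    for i in *Bacter*; do
        # count the lines of $i whose second field meets the threshold
        count=$(awk -v var1="$var" '$2 >= var1' "$i" | wc -l)
        printf '%s %s %s\n' "$file" "$i" "$count"
    done
done > bacter-filtered.txt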
What's wrong with the original? Well, let's start with this:
zcat "$file" | $((`wc -l`/400000))
Unless I completely misunderstand, the purpose here is to extract $file (with zcat), count the lines in the result (with wc -l), and divide that by 400000. But the output of zcat isn't piped directly to wc; it's piped to a complex expression involving wc, so what should happen is somewhat ambiguous, and actually differs between shells. In zsh, it does something completely different from what you intended: it lets wc read from the script's stdin (generally your terminal), divides the result by 400000, and then pipes the output of zcat to that ... number?
In bash, it does something closer to what you want: wc actually does read from the output of zcat, so the second part of the pipe essentially turns into:
... | $((14850508/400000))
Now, what I'd expect to happen at this point (and happens in my tests) is that it should evaluate $((14850508/400000)) into 37, giving:
... | 37
which will then try to execute 37 as a command (because it's part of a pipeline, and therefore is supposed to be a command). But for some reason it's apparently not evaluating the division and just trying to execute 14850508/400000 as a command. Which doesn't really work any better or worse than 37, so I guess it doesn't matter much.
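You can reproduce that expected behavior with a trivial pipeline; in my bash, the arithmetic does get evaluated, and the shell then tries to run the result as a command:
$ echo hi | $((1+1))
bash: 2: command not found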
So that's where the error is coming from, but there's actually another layer of confusion in the original line. Suppose that internal pipeline was fixed so that it properly output "37" (rather than trying to execute it). The outer structure would then be:
var=echo $(cmdthatprints37)
The $( ) basically means "run the command inside, and substitute its output into the command line here", so that would evaluate to:
var=echo 37
...which, in shell syntax, means "run the command 37 with var set to echo in its environment".
The solution here is simple: the echo is messing everything up, so remove it:
var=$(cmdthatprints37)
...which evaluates to:
var=37
...which is what you want. Except that, as I said above, it'd be better to split it up and do the command bits and the math separately rather than getting them mixed up.
BTW, I'd also recommend some additional double-quoting of shell variables; shellcheck.net will be happy to point out where.
I'm having difficulty grasping how pipes work. Initially I thought of them as per the title (does the output from the LHS of a pipe become an argument for the RHS?), but I couldn't get a simple example to work, e.g.
mkdir temp
cd temp
echo "rubbish" > txtfile
ls | cat
I'm wondering why it returns the output from ls rather than the output of cat txtfile (i.e. "rubbish"). I've read many pipe tutorials, but none of them seem to go beyond "STDOUT of LHS becomes STDIN for RHS", and I'm left wondering what STDIN of the RHS actually is. Does it become the first argument? Where does it slot in when the RHS of the pipe has options or more than one argument? Is there any kind of macro substitution taking place, or is my thinking wide of the mark?
Edit: I'm still none the wiser 5 comments later. I'll certainly take a look at Roadowl's pv utility but for now if I type
ls | cut -c 2-4
I get
xtf
which I'd expect. So, does cut take its input from stdin but cat doesn't?
Edit2: I stuck the question up on askubuntu (I originally put it up here by mistake). The answer there https://askubuntu.com/questions/1316848/does-output-from-lhs-of-pipe-become-an-arg-for-rhs-of-pipe throws a bit more light on it.
Edit3: While reading the answers here and on Ask Ubuntu, and the links therein, it struck me (again) how woeful bash (& cohorts) can be. It's almost like they're designed to trip you up. I only started using bash a couple of months back, and every time I write a script I have to read endless web pages to get it to work or discover where I'm going wrong. Take a simple [[ $1=="..." ]] condition: forget the spaces around the operator, and the else branch might wipe some files you want without so much as a warning. Yes, you can do great things with it without a lot of typing, but at times it's like using a tightrope to get from skyscraper A to skyscraper B to avoid using 2 lifts. What's wrong with good old C-style code like cat(ls())? That said, thanks to everyone who contributed.
I guess you meant that, while performing
ls | cat
ls should return txtfile, which should then go as a file input to the cat command.
But the things happening in the background are different:
First, your shell creates a pipe using the pipe(int pipefd[2]) system call. This pipe has 2 ends: one for reading and one for writing.
When the ls command executes, it writes its output to the write end of the pipe, and cat simultaneously reads from the read end.
So here, STDOUT of ls is the write end, whereas STDIN of cat is the read end of the pipe.
While reading from the pipe, cat treats the data as a stream of bytes, not as the name of a file.
So basically, cat prints whatever arrives on that stream of bytes.
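You can see the difference directly in the temp directory from the question:
$ ls | cat          # cat copies the byte stream "txtfile" to its stdout
txtfile
$ cat txtfile       # cat opens the file named as an argument
rubbish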
Read about pipe() here: pipe(2) — Linux manual page
ls | cut -c 2-4
Here, cut reads its standard input, gets the line txtfile, takes characters 2 to 4 from it, producing xtf, and prints that on standard output. That's what the command line option tells it to do.
ls | cat
Here, cat reads its standard input, gets the line txtfile, and prints that on standard output, unchanged. That's what cat does. If there were further lines, it would do the same for those.
Both read standard input unless one or more file names are given as arguments. That standard input is connected to the terminal (the same one where you enter the command line), unless you use pipes or redirections to change that.
So, run the command cut -c 2-4, and enter the line abcdefghijkl, and it will print out bcd. Because without any arguments, it reads its standard input, which is the terminal, by default. Similarly for running just cat, you'll get back the same line you entered.
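In transcript form (type the middle line yourself, then Ctrl-D to finish):
$ cut -c 2-4
abcdefghijkl
bcd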
Running ls | cut -c 2-4 changes where the standard input comes from, but it doesn't create any new command line arguments (other than the -c and 2-4 you gave). Command line arguments are not the same as the standard input.
So, echo txtfile | cat is not the same as running cat txtfile, any more than echo txtfile | cut -c 2-4 is the same as cut -c 2-4 txtfile. For some reason, you seem to expect the pipe to work differently for cat than it does for cut.
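And if you genuinely want the piped names to become command-line arguments, that's what xargs is for; in the question's temp directory:
$ ls | xargs cat    # xargs turns the stream "txtfile" into an argument for cat
rubbish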
I am still learning how to shell script, and I have been given a challenge: find an easier way to echo "Name1" "Name2" ... "Name15". I'm not too sure where to start; I've had ideas, but I don't want to look silly if I mess it up. Any help?
I haven't actually tried anything just yet; it's all mostly been thought.
#This is what I wrote to start
#!/bin/bash
echo "Name1"
echo "Name2"
echo "Name3"
echo "Name4"
echo "Name5"
echo "Name6"
echo "Name7"
echo "Name8"
echo "Name9"
echo "Name10"
echo "Name11"
echo "Name12"
echo "Name13"
echo "Name14"
echo "Name15"
My expected results are obviously just for it to output "Name1", "Name2", etc., but I'm looking for a more creative way to do it. If possible, throw in a few ways to do it so I can learn. Thank you.
The easiest (possibly not the most creative) way to do this is to use printf:
printf "%s\n" name{1..15}
This relies on bash brace expansion ({1..15}) to generate the 15 strings.
Use a for loop
for i in {1..15};do echo "Name$i";done
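One caveat worth knowing: brace expansion happens before parameter expansion, so {1..$n} won't work when a bound is a variable. A C-style for loop or seq covers that case; a minimal sketch:
n=15
for ((i=1; i<=n; i++)); do echo "Name$i"; done
# or, equivalently:
printf 'Name%d\n' $(seq 1 "$n")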
A few esoteric solutions, from the least to the most unreasonable:
base64-encoded string:
base64 -d <<<TmFtZTEKTmFtZTIKTmFtZTMKTmFtZTQKTmFtZTUKTmFtZTYKTmFtZTcKTmFtZTgKTmFtZTkKTmFtZTEwCk5hbWUxMQpOYW1lMTIKTmFtZTEzCk5hbWUxNApOYW1lMTUK
The weird string is your expected result encoded in base64, an encoding generally used to represent binary data as text. base64 -d <<< weirdString passes the string as input to the base64 tool and asks it to decode it, which displays your expected result.
generate an infinite stream of "Name", truncate it, use line numbers:
yes Name | awk 'NR == 16 { exit } { printf("%s%s\n", $0, NR) }'
yes outputs an infinite stream of whatever it's passed as an argument (or y by default; it's used to automate interactive scripts that ask for [y/n] confirmation). The awk command exits once it reaches the 16th line, and otherwise prints its input (provided by yes) followed by the line number. The truncation could just as easily be done with head -15; I also tried the nl "number lines" utility and grep -n to number the lines, but they both add the line numbers as a prefix, which would require an extra re-formatting step.
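For instance, the head variant mentioned above:
yes Name | head -15 | awk '{printf("%s%d\n", $0, NR)}'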
read random binary data and hope to stumble on all the lines you want to output:
timeout 1d strings /dev/urandom | grep -Eo "Name(1[0-5]|[1-9])" | sort -uV
strings /dev/urandom extracts ASCII sequences from the binary random source /dev/urandom, grep filters the ones that match the format of a line of your expected output, and sort reorders those lines. Since sort needs to have received its whole input before it can reorder it, and /dev/urandom never stops producing data, we use timeout 1d to stop reading from /dev/urandom after a whole day, in the hope that it has sifted through enough random data to find your 15 lines (I'm not sure that's even remotely likely).
use an HTTP client to retrieve this page, extract the bash script you posted and execute it.
my_old_script=$(curl "https://stackoverflow.com/questions/57818680/" | grep "#This is what I wrote to start" -A 18 | tail -n+4)
eval "$my_old_script"
curl is a command line tool that can be used as an HTTP client, grep with its -A 18 parameter will select the "This is what I wrote to start" text and the 18 lines that follow, tail will remove the first 3 lines, and eval will execute your script.
While it will be much more efficient than the previous solution, it's an even less reasonable solution because high-rep users can edit your question to make this solution execute arbitrary code on your computer. Ideally you'd be using an HTML-aware parser rather than basic string manipulation to extract the code, but we're not talking about best practices here...
I'm trying to create a one-liner that opens less on the last screen of multi-screen output coming from standard input. The reason is that I'm working on a program that produces a long AST, and I need to be able to traverse up and down through it, but I'd prefer to start at the bottom. I came up with this:
$ python a.py 2>&1 | tee >(lines=+$(( $(wc -l) - $LINES))) | less +$lines
First, I need to compute the number of lines in the output and subtract $LINES from it, so I know the topmost line of the last screen. I need to reuse a.py's output later, so I use tee with process substitution for that purpose. As the last step, I point less at the original stdout, opened at that particular line. Of course, it doesn't work in Bash, because $lines is not set in the last step: every subcommand runs in a subshell. In zsh, even though piped commands are not run in a subshell, process substitution still is, so it doesn't work either. This isn't homework or a work task; I just wonder whether it's possible to do what I want without creating a temporary file in Bash or zsh. Any ideas?
less supports this innately. The + syntax you're using accepts any less command you could enter while it's running, including G for go-to-end.
... | less +G
does exactly what you want.
This is actually mentioned explicitly as an example in the man page (search for "+G").
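Applied to the question's command:
$ python a.py 2>&1 | less +G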
The real answer to your question should be the option +G to less, but you indicated that the problem statement is not representative of the abstract problem you want to solve. Therefore, please consider this alternative:
python a.py 2>&1 | \
awk '
{a[NR]=$0}
END{
print NR
for (i=1;i<=NR;i++)print a[i]
}
' | {
read -r l
less -j-1 +$l
}
The awk command prints the number of lines, and then all the lines in sequence; we define the first line of its output to carry this piece of meta information. That output is piped to a group of commands delimited by { and }. The first line is consumed by read, which stores it in the variable l. The rest of the lines are read by less, where that variable can be used. -j-1 is given so that the target line ends up at the bottom of the screen.
I have a directory "Main Dir", and I want to write a script that gets 2 parameters: id_worker and results_num. It should find the id_worker directory inside "Main Dir" (it does exist), and from a file in it called "sent.txt" print results_num (an integer) strokes (lines) sorted by date.
I'm a beginner in bash (my knowledge and skills are mainly in C), and I haven't yet seen how to write scripts, but I've tried to put something together from the few commands I've learned and a little searching on the internet.
Can somebody help a newbie like me with my first script?
I'll paste my first try here:
#!/bin/bash
id_worker = "$1"
results_num = "$2"
sort -k3 -t "./Main Dir/id_worker/sent.text"
head -n+3 $results_num
I'm going to go out on a limb here, and assume your sort command is producing the information you want from the id_worker sent.txt file and that you are talking about the number of lines you want when you say strokes. Given the extended discussion in the comments, that is about the only thing I see that makes sense.
With that in mind, you were not that far off in your first attempt. What you needed to do to fix the sort command was to dereference your id_worker with $ to get the value you passed. In bash you assign variables as id_worker="something", but to get the value back, you must precede the variable with a $, just as you see with your id_worker="$1". NOTE: there are NO spaces allowed on either side of the '=' sign in bash. Putting that together, it looks like you intended:
sort -k3 -t "./Main Dir/$id_worker/sent.text"
This assumes you are running your script from the directory just above Main Dir, since you have given a relative path "./Main Dir/stuff".
Now, if you want to limit the output to the first results_num lines of the sorted output, you can use head, but you need to remove the "+" sign (which is only relevant with the tail command). To use it with the sorted output, you must pipe the results of sort to head using the '|' pipe character. For example:
sort -k3 -t "./Main Dir/$id_worker/sent.text" | head -n $results_num
Putting all of the pieces that I think you intended, and including a short check to make sure both id_worker and results_num are given on the command line, you would end up with something like:
#!/bin/bash
## verify both arguments given
[ -z "$1" -o -z "$2" ] && {
printf "error: insufficient input. usage: %s worker num\n" "${0##*/}"
exit 1
}
id_worker="$1"
results_num="$2"
## pipe the results of sort to head to print first $results_num lines
sort -k3 -t "./Main Dir/$id_worker/sent.text" | head -n $results_num
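Assuming you save the script under a hypothetical name like myscript.sh, you would run it as:
./myscript.sh 12345 10    # 12345 is a made-up worker id; prints the first 10 lines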
Note: if you are having trouble with your script, run it with:
bash -x scriptname id_worker results_num
to enable line-by-line debugging output from bash. Let me know if I have not understood what you were saying or if the results are not what you intended. There are several ways of approaching this problem, but I do need to clearly understand what you want to go further. Good luck.
Suppose I've got a list of files
file1
"file 1"
file2
a for...in loop breaks it up on whitespace, not newlines:
for x in $( ls ); do
echo $x
done
results:
file
1
file1
file2
I want to execute a command on each file. "file" and "1" above are not actual files. How can I do that when the filenames contain things like spaces or commas?
It's a little trickier than I think find -print0 | xargs -0 could handle, because I actually want the command to be something like convert input/file1.jpg .... output/file1.jpg, so I need to transform the filename in the process.
Actually, Mark's suggestion works fine without even doing anything to the internal field separator. The problem is that running ls in a subshell, whether via backticks or $( ), leaves the for loop unable to distinguish the spaces within names from the separators between names. Simply using
for f in *
instead of the ls solves the problem.
#!/bin/bash
for f in *
do
echo "$f"
done
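For the convert example mentioned in the question, the same glob approach works; here's a sketch, assuming ImageMagick's convert and the input/output directory names from the question:
#!/bin/bash
mkdir -p output
for f in input/*.jpg; do
    # ${f##*/} strips the leading "input/", so each result lands in output/
    convert "$f" "output/${f##*/}"
done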
UPDATE BY OP: this answer sucks and shouldn't be on top ... #Jordan's post below should be the accepted answer.
one possible way:
ls -1 | while IFS= read -r x; do
echo "$x"
done
I know this one is LONG past "answered", and with all due respect to eduffy, I came up with a better way and I thought I'd share it.
What's "wrong" with eduffy's answer isn't that it's wrong, but that it imposes what for me is a painful limitation: there's an implied creation of a subshell when the output of the ls is piped and this means that variables set inside the loop are lost after the loop exits. Thus, if you want to write some more sophisticated code, you have a pain in the buttocks to deal with.
My solution was to take the "readline" function and write a program out of it in which you can specify any specific line number that you may want that results from any given function call. ... As a simple example, starting with eduffy's:
ls_output=$(ls -1)
# cut keeps only the first field of wc's output: the line count
declare -i line_count=$(echo "$ls_output" | wc -l | cut -d ' ' -f 1)
declare -i cur_line=1
while [ $cur_line -le $line_count ] ;
do
# NONE of the variables set inside this do loop get trapped in a subshell.
filename=$(echo "$ls_output" | readline -n $cur_line)
# Now filename contains one filename from the preceding ls command
cur_line=cur_line+1
done
Now you have wrapped up all the subshell activity into neat little contained packages and can go about your shell coding without having to worry about the scope of your variable values getting trapped in subshells.
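Worth noting: modern bash can also keep the loop in the current shell without a helper program, by reading from a process substitution instead of a pipe; a minimal sketch:
count=0
while IFS= read -r filename; do
    count=$((count+1))   # survives the loop: no subshell involved
done < <(ls -1)
echo "saw $count entries"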
I wrote my version of readline in GNU C; if anyone wants a copy, it's a little big to post here, but maybe we can find a way...
Hope this helps,
RT