Need help writing this specific bash script - bash

Construct the pipe to execute the following job.
"Output of ls should be displayed on the screen and from this output the lines
containing the word 'poem' should be counted and the count should be
stored in a file."

If bash is allowed, use a process substitution as the receiver for tee
ls | tee >( grep -c poem > number.of.poetry.files)

Your attempt was close:
ls | tee /dev/tty | grep poem | wc -l >number_of_poems
The tee /dev/tty copies all ls output to the terminal. This satisfies the requirement that "Output of ls should be displayed on the screen." while also sending ls's output to grep's stdin.
This can be further simplified:
ls | tee /dev/tty | grep -c poem >number_of_poems
Note that neither of these solutions requires bash. Both work in simpler shells and, in particular, in dash, which is the default /bin/sh on Debian-like systems.
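A minimal sketch of the tee approach described above, written so that it also runs without a controlling terminal. The scratch directory, the file names, and the use of /dev/stderr as a stand-in for /dev/tty are assumptions made for the illustration:

```shell
# Hypothetical scratch data: three files, two with "poem" in the name.
workdir=$(mktemp -d)
count_file=$(mktemp)
touch "$workdir/poem1.txt" "$workdir/poem2.txt" "$workdir/notes.txt"

# tee copies the listing to stderr (standing in for /dev/tty) and passes
# the same lines on to grep, which stores only the count.
ls "$workdir" | tee /dev/stderr | grep -c poem > "$count_file"

count=$(cat "$count_file")
echo "stored count: $count"

rm -r "$workdir" "$count_file"
```

In an interactive session, replacing /dev/stderr with /dev/tty gives the pipeline shown above.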

This sounds like a homework assignment :)
#!/bin/bash
ls
ls -l | grep -c poem >> file.txt
The first ls displays the output on the screen.
The next line pipes the long listing through grep -c to count the lines containing "poem".
If there were 5 files with poem in their names, file.txt would read 5. If file.txt already exists, the new count will be appended to the end. If you want to overwrite the file each time, change the line to read ls -l | grep -c poem > file.txt

Related

Why does "ls > out | cat < out" only output the first time I run it in Bash?

I am programming a Bash-like shell. I am having trouble understanding how this interaction works.
This command
ls > out | cat < out
only outputs the ls the first time I run it, and then nothing. In zsh it outputs every time, but not in Bash.
You're trying to give the parser conflicting directives.
This is like telling someone to "Turn to the left on your right."
<, >, and | all instruct the interpreter to redirect I/O according to rules.
Look at this bash example:
$: echo two>two # creates a file named two with the word two in it
$: echo one | cat < two <<< "three" << END
four
END
four
$: echo one | cat < two <<< three
three
$: echo one | cat < two
two
$: echo one | cat
one
Understand that putting a pipe character (|) between commands links the output of the first one to the input of the second one, so also giving each an input/output redirection that conflicts with that is nonsensical.
ls | cat # works - output of ls is input for cat
ls > out; cat < out # works - ls outputs to out, then cat reads out
ls > >(cat) # works
cat < <(ls) # works
but ls >out | cat sends the output from ls to out, and then attaches the output of that operation (of which there is none, because it's already been captured) to cat, which exits with no input or output.
If what you wanted was to have the output both go to a file and to the console, then either use ls > out; cat < out which makes them separate operations, or try
ls | tee out
which explicitly splits the stream to both the file and stdout.
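The difference can be checked with a small sketch. The scratch file and the word "hello" are placeholders chosen for the illustration:

```shell
out=$(mktemp)   # hypothetical scratch file

# Redirect plus pipe: the word goes to the file, so cat reads an empty pipe.
piped=$(echo hello > "$out" | cat)
saved=$(cat "$out")

# tee writes the file and still passes the data down the pipe.
both=$(echo hello | tee "$out")

printf 'pipe: [%s] file: [%s] tee: [%s]\n' "$piped" "$saved" "$both"
rm "$out"
```

In bash or sh the first pipeline leaves the pipe empty, while the tee version delivers the data to both destinations.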

The flow of stdout from combined commands

I need to edit a bash script that sorts .vcf files. vcf files are roughly structured as shown below:
## header line
## header line
…
Data line
Data line
…
The script is called vcfsort and is part of a library for manipulating vcf files. It looks like this:
head -1000 $1 | grep "^#"; cat $@ | grep -v "^#" | sort -k1,1d -k2,2n
And it is run by writing vcfsort input.vcf > output.vcf.
I understand roughly what it does: since sorting should only be done on the data lines, it gets the header lines:
head -1000 $1 | grep "^#";
And combines it with sorted data lines:
cat $@ | grep -v "^#" | sort -k1,1d -k2,2n
I need the head command to read more lines. Instead of calling vcfsort like above, I thought I could just edit the script myself and write it out directly as a command like this:
head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf
This does not work as expected. My attempt above writes the correct output to stdout, if I leave out > output.vcf. However, if I include it, only the data lines are written to file and the header lines are written to stdout. So, I have a couple of questions:
In this Stack Overflow answer, it is said that to combine semicolon-separated commands, they should be enclosed in parentheses. Why is that not the case in the vcfsort script?
Why is $@ used in the cat command instead of $1? $@ should refer to all of a shell script's arguments, but since only one is given (the input file), why not just use $1? If there is a reason for this, how can I transfer that to my command-line expression?
Why do I only get part of the stdout when I send it to a file?
Could you show me the edits I need to make to get my command to work as intended?
So the script takes the first 1000 lines of the first file, keeps only the header (comment) lines among them, and copies those to the output.
Next, it filters the comment lines out of all the files (leaving only data lines) and sorts what remains.
So if you use
vcfsort file1 file2 file3
then $1 = "file1", and only the header from file1 appears in the output,
while $@ refers to all the files: "file1 file2 file3".
If you need to collect the headers from all the files and merge them, I would recommend a loop:
for file in "$@"; do
head -1000 "$file" | grep "^#"
done
cat "$@" | grep -v "^#" | sort -k1,1d -k2,2n
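For reference, the shell expands $# to the number of positional parameters and "$@" to the parameters themselves, one word per argument. A short sketch with a hypothetical helper function and made-up file names:

```shell
# Hypothetical helper that reports how it was called.
show_args() {
  echo "count: $#"
  for arg in "$@"; do
    echo "arg: $arg"
  done
}

report=$(show_args file1.vcf "file 2.vcf" file3.vcf)
echo "$report"
```

Quoting "$@" keeps an argument such as "file 2.vcf" intact as a single word.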
Why do I only get part of the stdout when I send it to a file?
head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf
Each command is executed separately (they are divided by the semicolon ";"). So in the example above you are only redirecting the sorted data lines; the header part is not redirected to the file.
I would recommend to delete redirecting to file and just use:
vcfsort input.vcf > output.vcf
This does not work as expected
May I know what was expected?
There are two command lists, separated by a ;, inside vcfsort:
head -1000 $1 | grep "^#"
cat $@ | grep -v "^#" | sort -k1,1d -k2,2n
Each list is a single pipeline. The final command in each pipeline inherits its standard output from vcfsort, so that when you run
vcfsort input.vcf > output.vcf
both grep and sort write to output.vcf.
The equivalent using braces would be (replacing ; with a newline for readability)
# Quoting the parameter expansions is important, to protect
# against word-splitting and pathname expansion of the original arguments.
{ head -1000 "$1" | grep "^#"
cat "$@" | grep -v "^#" | sort -k1,1d -k2,2n
} > output.vcf
Output redirections apply only to a single command, not a command list. Here, a command group serves as that single command:
the standard output of the command group is output.vcf, and the two lists in the group inherit that just as before.
Your attempt
head -10000 input.vcf | grep "^#"; cat input.vcf | grep -v "^#" | sort -k1,1d -k2,2n > output.vcf
only opened output.vcf to use as the standard output for sort; the standard output of grep remains whatever standard output it inherits from its parent, namely your terminal.
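A self-contained sketch of the brace-group form, using a tiny made-up file in place of input.vcf (the file contents and temporary paths are assumptions for the illustration):

```shell
input=$(mktemp)    # hypothetical miniature VCF-like file
output=$(mktemp)
printf '%s\n' '##header' 'chr2 5' 'chr1 20' 'chr1 3' > "$input"

# The braces form one compound command, so the single redirection
# covers the header pipeline and the sorted data pipeline alike.
{
  head -n 10000 "$input" | grep '^#'
  grep -v '^#' "$input" | sort -k1,1d -k2,2n
} > "$output"

result=$(cat "$output")
echo "$result"
rm "$input" "$output"
```

The header line lands in the output file first, followed by the sorted data lines.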

How to store the output of a Bash command inside >() in a variable?

I have this:
tee < /some/big/file >(wc -c) >(md5sum) | ...
Instead of writing the results of wc -c and md5sum to stdout, I want to store the results into two variables for later processing. I don't want to read the file more than once. How can I accomplish that?
This doesn't directly answer your question... but you only have to read the file once, and you don't have to use tee.
SIZE=$(wc -c < /some/big/file)
MD5=$(md5 /some/big/file)
So, how many times does this read the file? Once. This is because wc -c doesn't actually read the file, it just looks at how long it is and reports back. Here are my tests:
$ time wc -c /big/file >/dev/null
real 0m0.003s
user 0m0.000s
sys 0m0.000s
$ time wc -c </big/file >/dev/null
real 0m0.004s
user 0m0.000s
sys 0m0.000s
$ time cat /big/file | wc -c >/dev/null
real 0m52.945s
user 0m0.160s
sys 0m19.612s
The lesson: don't mix tee (or cat) with wc -c, because it's a big waste of time. Just do the md5 normally and don't worry about wc -c.
Note: The reason why wc -c <file is fast is because it gets an ordinary file handle, just as if wc had called open() itself.
Pipe performance
You should almost never use cat in a pipe.
cat file | cmd # slow
cmd <file # fast
Calling cat usually means creating an extra process which serves no purpose. In some cases, as with wc -c, it actually slows down the program after the pipe. I mean, you could stick cat anywhere you like, but it's just silly:
echo 'hello, world' | cat
cat file.txt | less
cat file.txt | cat | less
cat file.txt | cat | sort | cat | cat | uniq | cat >file_unique.txt
This is better:
echo 'hello, world'
less file.txt
sort file.txt | uniq >file_unique.txt
You can do this with a FIFO and temporary files.
input=/some/big/file
mkfifo tmp
wc -l <tmp >wc.out &
md5=$(tee <"$input" tmp | md5sum)
wait  # fg needs job control, which scripts lack; wait blocks until the background wc finishes
lines=$(cat wc.out)
rm tmp
rm wc.out
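A variant of the FIFO idea as a self-contained sketch that counts bytes, as the question asks. It assumes a small scratch file in place of /some/big/file and uses cksum as a portable stand-in for md5sum:

```shell
input=$(mktemp)                 # stand-in for /some/big/file
printf 'hello\n' > "$input"
workdir=$(mktemp -d)
mkfifo "$workdir/fifo"

# Background reader: counts the bytes that arrive through the FIFO.
wc -c < "$workdir/fifo" > "$workdir/size" &

# tee reads the input once, feeding the FIFO and the checksum pipe.
sum=$(tee "$workdir/fifo" < "$input" | cksum)
wait                            # let the background wc finish writing
size=$(cat "$workdir/size")

echo "size=$size checksum=$sum"
rm -r "$workdir" "$input"
```

Both results end up in ordinary variables, and the input is read only once.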
Maybe someone can offer something better, but I think your best shot is
LINES=$(cat /some/big/file | wc -l)
MD5=$(cat /some/big/file | md5)
The reason I think it's the best shot is that, one way or another, you have to apply two separate operations to the content of the file. So unless you have a command that can do both at the same time, you will have to read it twice.
Apart from reading the file twice, that should solve your problem.
I don't think you can export a variable from the >() (Process substitution). If you don't mind redirecting the output to temporary files you can then just read the file into a variable.
Example
tee < /some/big/file >(cmd1 > tmp1) >(cmd2 > tmp2) | ...
CMD1_OUT=$(cat tmp1)
CMD2_OUT=$(cat tmp2)

bash echo number of lines of file given in a bash variable without the file name

I have the following three constructs in a bash script:
NUMOFLINES=$(wc -l $JAVA_TAGS_FILE)
echo $NUMOFLINES" lines"
echo $(wc -l $JAVA_TAGS_FILE)" lines"
echo "$(wc -l $JAVA_TAGS_FILE) lines"
And they all produce identical output when the script is run:
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
121711 /home/slash/.java_base.tag lines
I.e. the name of the file is also echoed, which I don't want. Why do these snippets fail, and how should I output a clean:
121711 lines
?
An Example Using Your Own Data
You can avoid having your filename embedded in the NUMOFLINES variable by using redirection from JAVA_TAGS_FILE, rather than passing the filename as an argument to wc. For example:
NUMOFLINES=$(wc -l < "$JAVA_TAGS_FILE")
Explanation: Use Pipes or Redirection to Avoid Filenames in Output
The wc utility will not print the name of the file in its output if input is taken from a pipe or redirection operator. Consider these various examples:
# wc shows filename when the file is an argument
$ wc -l /etc/passwd
41 /etc/passwd
# filename is ignored when piped in on standard input
$ cat /etc/passwd | wc -l
41
# unusual redirection, but wc still ignores the filename
$ < /etc/passwd wc -l
41
# typical redirection, taking standard input from a file
$ wc -l < /etc/passwd
41
As you can see, the only time wc will print the filename is when it's passed as an argument, rather than as data on standard input. In some cases, you may want the filename to be printed, so it's useful to understand when it will be displayed.
wc can't get the filename if you don't give it one.
wc -l < "$JAVA_TAGS_FILE"
You can also use awk:
awk 'END {print NR,"lines"}' filename
Or
awk 'END {print NR}' filename
(works on macOS, and probably on other Unixes)
Actually there is a problem with the wc approach: it does not count the last line if it does not terminate with the end of line symbol.
Use this instead
nbLines=$(cat -n file.txt | tail -n 1 | cut -f1 | xargs)
or even better (thanks gniourf_gniourf):
nblines=$(grep -c '' file.txt)
Note: The awk approach by chilicuil also works.
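The difference between these counting methods shows up on a file whose last line lacks a trailing newline. A small sketch with a made-up three-line file:

```shell
file=$(mktemp)                        # hypothetical scratch file
printf 'one\ntwo\nthree' > "$file"    # last line has no trailing newline

with_wc=$(wc -l < "$file")            # counts newline characters
with_grep=$(grep -c '' "$file")       # counts lines, terminated or not
with_awk=$(awk 'END {print NR}' "$file")

echo "wc: $((with_wc)) grep: $with_grep awk: $with_awk"
rm "$file"
```

Here wc reports 2 while grep and awk report 3.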
It's very simple:
NUMOFLINES=$(cat $JAVA_TAGS_FILE | wc -l )
or
NUMOFLINES=$(wc -l $JAVA_TAGS_FILE | awk '{print $1}')
I normally use the backtick form of command substitution, reading the file on standard input so that wc prints no filename:
export NUM_LINES=`wc -l < filename`
Note that the tick is the backtick (`), not the normal single quote.

Use output of bash command (with pipe) as a parameter for another command

I'm looking for a way to use the output of a command (say command1) as an argument for another command (say command2).
I encountered this problem when trying to grep the output of the who command using a pattern produced by another set of commands (actually tty piped to sed).
Context:
If tty displays:
/dev/pts/5
And who displays:
root pts/4 2012-01-15 16:01 (xxxx)
root pts/5 2012-02-25 10:02 (yyyy)
root pts/2 2012-03-09 12:03 (zzzz)
Goal:
I want only the line(s) regarding "pts/5"
So I piped tty to sed as follows:
$ tty | sed 's/\/dev\///'
pts/5
Test:
The attempted following command doesn't work:
$ who | grep $(echo $(tty) | sed 's/\/dev\///')
Possible solution:
I've found out that the following works just fine:
$ eval "who | grep $(echo $(tty) | sed 's/\/dev\///')"
But I'm sure the use of eval could be avoided.
As a final side note: I've noticed that the "-m" argument to who gives me exactly what I want (only the line of who that is linked to the current user). But I'm still curious about how I could make this combination of pipes and command nesting work...
One usually uses xargs to make the output of one command an option to another command. For example:
$ cat command1
#!/bin/sh
echo "one"
echo "two"
echo "three"
$ cat command2
#!/bin/sh
printf '1 = %s\n' "$1"
$ ./command1 | xargs -n 1 ./command2
1 = one
1 = two
1 = three
$
But ... while that was your question, it's not what you really want to know.
If you don't mind storing your tty in a variable, you can use bash variable mangling to do your substitution:
$ tty=`tty`; who | grep -w "${tty#/dev/}"
ghoti pts/198 Mar 8 17:01 (:0.0)
(You want the -w because if you're on pts/6 you shouldn't see pts/60's logins.)
You're limited to doing this in a variable, because if you put the tty command into a pipe, its standard input is no longer a terminal and it reports "not a tty".
$ true | echo `tty | sed 's:/dev/::'`
not a tty
$
Note that nothing in this answer so far is specific to bash. Since you're using bash, another way around this problem is to use process substitution. For example, while this does not work:
$ who | grep "$(tty | sed 's:/dev/::')"
This does:
$ grep $(tty | sed 's:/dev/::') < <(who)
You can do this without resorting to sed with the help of Bash variable mangling, although as @ruakh points out this won't work in the single line version (without the semicolon separating the commands). I'm leaving this first approach up because I think it's interesting that it doesn't work in a single line:
TTY=$(tty); who | grep "${TTY#/dev/}"
This first puts the output of tty into a variable, then strips the leading /dev/ where grep uses it. But without the semicolon, TTY is not yet set at the moment bash performs the variable expansion/mangling for grep.
Here's a version that does work because it spawns a subshell with the already modified environment (that has TTY):
TTY=$(tty) WHOLINE=$(who | grep "${TTY#/dev/}")
The result is left in $WHOLINE.
@Eduardo's answer is correct (and as I was writing this, a couple of other good answers have appeared), but I'd like to explain why the original command is failing. As usual, set -x is very useful to see what's actually happening:
$ set -x
$ who | grep $(echo $(tty) | sed 's/\/dev\///')
+ who
++ sed 's/\/dev\///'
+++ tty
++ echo not a tty
+ grep not a tty
grep: a: No such file or directory
grep: tty: No such file or directory
It's not completely explicit in the above, but what's happening is that tty is outputting "not a tty". This is because it's part of the pipeline being fed the output of who, so its stdin is indeed not a tty. This is the real reason everyone else's answers work: they get tty out of the pipeline, so it can see your actual terminal.
BTW, your proposed command is basically correct (except for the pipeline issue), but unnecessarily complex. Don't use echo $(tty), it's essentially the same as just tty.
You can do it like this:
tid=$(tty | sed 's#/dev/##') && who | grep "$tid"
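The prefix stripping and the -w match can be tried without a terminal. This sketch hard-codes a sample tty path and sample who output (the pts/50 line is invented to show why -w matters); in a real session you would assign tty_path=$(tty) before the pipeline and pipe from who itself:

```shell
tty_path=/dev/pts/5      # sample value; normally tty_path=$(tty)

# Sample who output; the pts/50 line is hypothetical.
who_output='root pts/4 2012-01-15 16:01 (xxxx)
root pts/5 2012-02-25 10:02 (yyyy)
root pts/50 2012-03-09 12:03 (zzzz)'

# ${tty_path#/dev/} removes the leading /dev/; -w rejects pts/50.
matched=$(printf '%s\n' "$who_output" | grep -w "${tty_path#/dev/}")
echo "$matched"
```

Only the pts/5 line is selected.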

Resources