Whats is the behaviour of the "wc" command?

Whats is the behaviour of the "wc" command? - bash

For example:
myCleanVar=$( wc -l < myFile )
myDirtVar=$( wc -l myFile )
echo $myCleanVar
9
echo $myDirtVar
9 myFile
why in "myCleanVar" I get an "integer" value from the "wc" command and in "myDirtVar" I get something like as: "9 file.txt"? I quoted "integer" because in know that in Bash shell by default all is treated as a string, but can't understand the differences of the behaviour of first and second expression. What is the particular effect of the redirection "<" in this case?

wc will list by default the name of file allowing you to use it on more than one file (and have result for all of them). If no filename is specified, the "standard input", which is usually the console input, is used, and no file name is printed. The < is needed to specify an "input redirection", that is read the input from given file instead of using user input.
Put all this information together and you get the reason of wc's behavior in your example
Question time: what would be the output of cat file | wc -l ?

The man page for wc says:
NAME
wc - print newline, word, and byte counts for each file
SYNOPSIS
wc [OPTION]... [FILE]...
wc [OPTION]... --files0-from=F
DESCRIPTION
Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified.
So when you pass the file as an argument it will also print the name of the file because you could pass more than one file to wc for it to count lines, so you know which file has that line count. Of course when you use the stdin instead it does not know the name of the file so it doesn't print it.
Example
$ wc -l FILE1 FILE2
2 FILE1
4 FILE2
6 total

Related

Writing word count in a text file in bash script

I am very new into bashscript, how can I write word count, word size and character size in the text file itself? My current code is:
#!bin/bash/
echo "start"
cat file #print file.txt
wc -m file #character count
wc -w file # word count
wc -c file # size
echo "end"
I want to append the terminal outputs into my text file. Text file should be like this:
...text
The size of this file: x , word count: x , character count: x. How can I do that?

You mean append the file's metadata to the same file?
Using GNU wc:
#!/bin/bash
if [[ $# -lt 1 ]]; then
echo "Usage: ${0##*/} FILE ..." >&2
exit 1
fi
for file do
wc -cwm "$file" |
awk -v file="$file" \
'{print "File size: "$3", word count: "$1", character count: "$2 >> file}'
done
Also works on Busybox wc. You can provide multiple files.
Note that if wc is given multiple flags, it always prints the numbers in the same order, regardless of the order the flags are given.
From man wc (GNU)
The options below may be used to select which counts are printed, always in the following order: newline, word, character, byte, maximum line length.
However, POSIX says this:
By default, the standard output shall contain an entry for each input file of the form:
"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
If the -m option is specified, the number of characters shall replace the field in this format.
source: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html
I haven't tested on other platforms, but apparently, a POSIX compliant wc would need two separate invocations to get both byte and character count. The reasoning for using one was to avoid a file changing between counts.
If you know the file is ASCII (and not UTF-8 for example), you could just reuse the byte count for the character count.

Trying to create script that uses two arguments to find number of occurences within string

I am trying to write a bash script where it takes two inputs from the user, uses the first as a three letter codon string, and uses the second as a pathname of a file containing a valid DNA string (which I have made and called dnafile.txt in my directory, and already stored a DNA string in). The script is supposed to output the number of occurrences of the codon given as argument 1 in the file given as argument 2. What do I do?

You can use grep's -o flag to only output matches, then count them using wc -l as such:
grep -o $1 $2 | wc -l
For example,
$ cat dnafile.txt
aacgtttgtaaccagaac
$ grep -o 'aac' dnafile.txt | wc -l
3

Send output from `split` utility to stdout

From this question, I found the split utilty, which takes a file and splits it into evenly sized chunks. By default, it outputs these chunks to new files, but I'd like to get it to output them to stdout, separated by a newline (or an arbitrary delimiter). Is this possible?
I tried cat testfile.txt | split -b 128 - /dev/stdout
which fails with the error split: /dev/stdoutaa: Permission denied.
Looking at the help text, it seems this tells split to use /dev/stdout as a prefix for the filename, not to write to /dev/stdout itself. It does not indicate any option to write directly to a single file with a delimiter. Is there a way I can trick split into doing this, or is there a different utility that accomplishes the behavior I want?

It's not clear exactly what you want to do, but perhaps the --filter option to split will help out:
--filter=COMMAND
write to shell COMMAND; file name is $FILE
Maybe you can use that directly. For example, this will read a file 10 bytes at a time, passing each chunk through the tr command:
split -b 10 --filter "tr [:lower:] [:upper:]" afile
If you really want to emit a stream on stdout that has separators between chunks, you could do something like:
split -b 10 --filter 'dd 2> /dev/null; echo ---sep---' afile
If afile is a file in my current directory that looks like:
the quick brown fox jumped over the lazy dog.
Then the above command will result in:
the quick ---sep---
brown fox ---sep---
jumped ove---sep---
r the lazy---sep---
dog.
---sep---

From info page :
`--filter=COMMAND'
With this option, rather than simply writing to each output file,
write through a pipe to the specified shell COMMAND for each
output file. COMMAND should use the $FILE environment variable,
which is set to a different output file name for each invocation
of the command.
split -b 128 --filter='cat ; echo ' inputfile

Here is one way of doing it. You will get each 128 character into variable "var".
You may use your preferred delimiter to print or use it for further processing.
#!/bin/bash
cat yourTextFile | while read -r -n 128 var ; do
printf "\n$var"
done
You may use it as below at command line:
while read -r -n 128 var ; do printf "\n$var" ; done < yourTextFile

No, the utility will not write anything to standard output. The standard specification of it says specifically that standard output in not used.
If you used split, you would need to concatenate the created files, inserting a delimiter in between them.
If you just want to insert a delimiter every N th line, you may use GNU sed:
$ sed '0~3a\-----\' file
This inserts a line containing ----- every 3rd line.

To divide the file into chunks, separated by newlines, and write to stdout, use fold:
cat yourfile.txt | fold -w 128
...will write to stdout in "chunks" of 128 chars.

How to quickly check a .gz file without unzip? [duplicate]

How to get the first few lines from a gziped file ?
I tried zcat, but its throwing an error
zcat CONN.20111109.0057.gz|head
CONN.20111109.0057.gz.Z: A file or directory in the path name does not exist.

zcat(1) can be supplied by either compress(1) or by gzip(1). On your system, it appears to be compress(1) -- it is looking for a file with a .Z extension.
Switch to gzip -cd in place of zcat and your command should work fine:
gzip -cd CONN.20111109.0057.gz | head
Explanation
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing
them.
-d --decompress --uncompress
Decompress.

On some systems (e.g., Mac), you need to use gzcat.

On a mac you need to use the < with zcat:
zcat < CONN.20111109.0057.gz|head

If a continuous range of lines needs be, one option might be:
gunzip -c file.gz | sed -n '5,10p;11q' > subFile
where the lines between 5th and 10th lines (both inclusive) of file.gz are extracted into a new subFile. For sed options, refer to the manual.
If every, say, 5th line is required:
gunzip -c file.gz | sed -n '1~5p;6q' > subFile
which extracts the 1st line and jumps over 4 lines and picks the 5th line and so on.

If you want to use zcat, this will show the first 10 rows
zcat your_filename.gz | head
Let's say you want the 16 first row
zcat your_filename.gz | head -n 16

This awk snippet will let you show not only the first few lines - but a range you can specify. It will also add line numbers which i needed for debugging an error message pointing to a certain line way down in a gzipped file.
gunzip -c file.gz | awk -v from=10 -v to=20 'NR>=from { print NR,$0; if (NR>=to) exit 1}'
Here is the awk snippet used in the one liner above. In awk NR is a built-in variable (Number of records found so far) which usually is equivalent to a line number. the from and to variable are picked up from the command line via the -v options.
NR>=from {
print NR,$0;
if (NR>=to)
exit 1
}

'grep +A': print everything after a match [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 7 years ago.
I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after: http://www.yahoo.com, results looks like below:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3 http://www.yahoo.com
But I don't know how to get the file after line number 3. Also, I know there is a flag in grep -A print the lines after your match. However, you need to specify how many lines you want after the match. I am wondering is there something to get around that issue. Like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.

AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, if you refer to a variable, that variable will be created and is either zero or empty string, depending on context. Before encounter yahoo, variable y is 0, so the script does not print anything. After encounter yahoo, y is 1, so every line after that will be printed.
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt

This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the -e directive means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).

awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com

This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.

Using this script:
# Get index of the "yahoo" word
index=`grep -n "yahoo" filepath | cut -d':' -f1`
# Get the total number of lines in the file
totallines=`wc -l filepath | cut -d' ' -f1`
# Subtract totallines with index
result=`expr $total - $index`
# Gives the desired output
grep -A $result "yahoo" filepath

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Whats is the behaviour of the "wc" command? - bash

Related

Writing word count in a text file in bash script

Trying to create script that uses two arguments to find number of occurences within string

Send output from `split` utility to stdout

How to quickly check a .gz file without unzip? [duplicate]

'grep +A': print everything after a match [duplicate]

Categories

Resources