I am very new to bash scripting. How can I write the word count, file size, and character count into the text file itself? My current code is:
#!/bin/bash
echo "start"
cat file.txt    # print file.txt
wc -m file.txt  # character count
wc -w file.txt  # word count
wc -c file.txt  # byte count (size)
echo "end"
I want to append the terminal output to my text file. The text file should end up like this:
...text
The size of this file: x, word count: x, character count: x. How can I do that?
You mean append the file's metadata to the same file?
Using GNU wc:
#!/bin/bash
if [[ $# -lt 1 ]]; then
    echo "Usage: ${0##*/} FILE ..." >&2
    exit 1
fi

for file do
    wc -cwm "$file" |
        awk -v file="$file" \
            '{print "File size: "$3", word count: "$1", character count: "$2 >> file}'
done
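For example, with a hypothetical script name and illustrative numbers:

$ ./append-counts.sh file.txt
$ tail -n 1 file.txt
File size: 1234, word count: 210, character count: 1230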
Also works with BusyBox wc. You can provide multiple files.
Note that when wc is given multiple flags, it always prints the numbers in the same order, regardless of the order in which the flags are given.
From man wc (GNU):
The options below may be used to select which counts are printed, always in the following order: newline, word, character, byte, maximum line length.
However, POSIX says this:
By default, the standard output shall contain an entry for each input file of the form:
"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
If the -m option is specified, the number of characters shall replace the field in this format.
source: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html
I haven't tested on other platforms, but apparently a POSIX-compliant wc would need two separate invocations to get both the byte and character counts. The reason for using a single invocation above was to avoid the file changing between counts.
If you know the file is ASCII (and not UTF-8 for example), you could just reuse the byte count for the character count.
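A minimal POSIX-portable sketch along those lines (not from the answer above; it assumes the file does not change between the wc invocations):

#!/bin/sh
# Portable variant: one wc call per count; no GNU extensions.
# The $(( ... )) arithmetic strips any leading blanks that some
# wc implementations print.
for file do
    words=$(( $(wc -w < "$file") ))
    bytes=$(( $(wc -c < "$file") ))
    chars=$(( $(wc -m < "$file") ))
    printf 'File size: %s, word count: %s, character count: %s\n' \
        "$bytes" "$words" "$chars" >> "$file"
done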
I am looping through the lines in a text file and running grep for each line through directories, like below:
while IFS="" read -r p || [ -n "$p" ]
do
    echo "This is the field: $p"
    grep -ilr "$p" * >> Result.txt
done < fields.txt
But the above only writes the results for the last line in the file, not for the other lines.
If I manually execute the command with the other lines, it works (which means matches were found). Is there anything I am missing here? Thanks.
The fields.txt file looks like this:
annual_of_measure__c
attached_lobs__c
apple
When the file fields.txt has the DOS/Windows line-ending convention, where each line ends with two characters (a Carriage Return followed by a Line Feed), and that file is processed by Unix tools that expect Unix line endings (a single Line Feed), then the line read by the read command and stored in the variable $p is, for the first line, annual_of_measure__c\r (note the additional \r for the Carriage Return). grep will then not find a match.
From your description in the question and the confirmation in the comments, it seems that the last line in fields.txt has no line ending at all, so the variable $p holds the ordinary string apple and grep can find a match for the last line of the file.
There are tools for converting line endings, e.g. see this answer, or even more options in this answer.
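For example, a minimal fix inside the loop itself (bash-specific, using $'\r' quoting):

while IFS="" read -r p || [ -n "$p" ]
do
    p=${p%$'\r'}    # strip a trailing Carriage Return, if present
    echo "This is the field: $p"
    grep -ilr "$p" * >> Result.txt
done < fields.txt

Alternatively, convert the file once before the loop, e.g. tr -d '\r' < fields.txt > fields.unix.txt.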
I want to do something like this: grep for a string in a particular file, store the result in a variable, and print just the number of occurrences.
#!/bin/bash
count=$(grep 'something' somefile | wc -l)
echo $count
This always gives a 0 value, when I know it should be more.
This is what I intend to do, but it's taking forever to finish the script.
if egrep -iq "Android 6.0.1" "$filename"; then
    count=$(egrep -ic "Android 6.0.1" "$filename")
    echo "Operating System Version leaked number of times: $count"
fi
I have 7 other such if statements, and I am running this for around 20 files.
Is there a more efficient way to make it faster?
grep has its own counting flag:
-c, --count
Suppress normal output; instead print a count of matching lines for
each input file. With the -v, --invert-match option (see below), count
non-matching lines. (-c is specified by POSIX.)
count=$(grep -c 'match' file)
Note that the match part is quoted, so that any special characters in it are not interpreted by the shell.
Also, as stated in the man page excerpt above, multiple matches on a single line are counted as a single match, since -c only counts matching lines:
$ echo "hello hello hello hello
> hello
> bye" | grep -c "hello"
2
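If you want to count every occurrence rather than every matching line, a standard workaround (not part of the answer above) is grep -o, which prints each match on its own line, so wc -l can count them:

$ echo "hello hello hello hello
> hello
> bye" | grep -o "hello" | wc -l
5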
A much more efficient approach would be to run Awk once.
awk -v patterns="foo,bar,baz" 'BEGIN { n = split(patterns, pats, ",") }
    { for (i=1; i<=n; ++i) if ($0 ~ pats[i]) ++hits[i] }
    END { for (i=1; i<=n; ++i) printf("%8d %s\n", hits[i], pats[i]) }' file1 file2 ...
For bonus points, format the output in machine-readable format (depending on where it ends up, JSON might be a good choice); and/or add the human-readable explanation for the significance of each hit to the END block.
If that's not what you want, running grep -Eic once per pattern and discarding any zero values would already improve your run time over grepping the file twice for each pattern, as the if/count version does in the worst case. (The pessimal situation is when the last line, and no other line, matches your pattern.)
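A sketch of that single-grep-per-pattern variant (the pattern list and message are illustrative):

#!/bin/bash
# One grep per pattern instead of an if plus a second grep.
filename=$1
for pattern in "Android 6.0.1" "iOS 9.3.5"; do
    count=$(grep -Eic "$pattern" "$filename")
    if [ "$count" -gt 0 ]; then
        echo "$pattern leaked number of times: $count"
    fi
done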
When I run the word count command in the OS X Terminal, like wc -c file.txt, I get the answer below, which includes spaces padded before the number. Does anyone know why this happens, or how I can prevent it?
   18000 file.txt
I would expect to get:
18000 file.txt
This occurs using bash or the Bourne shell.
The POSIX standard for wc may be read to imply that there are no leading blanks, but it does not say that explicitly. Standards are like that.
This is what it says:
By default, the standard output shall contain an entry for each input file of the form:
"%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
and does not mention the formats for the single-column options such as -c.
A quick check shows me that AIX, OSX, Solaris use a format which specifies the number of digits for the value — to align columns (and differ in the number of digits). HPUX and Linux do not.
So it is just an implementation detail.
I suppose it is a way of getting outputs to line up nicely, and as far as I know there is no option to wc which fine tunes the output format.
You could get rid of them pretty easily by piping through sed 's/^ *//', for example.
There may be an even simpler solution, depending on why you want to get rid of them.
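For example, on a system whose wc pads the output:

$ wc -c < file.txt
   18000
$ wc -c < file.txt | sed 's/^ *//'
18000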
At least under macOS/bash, wc exhibits the behavior of padding its output with leading whitespace.
It can be avoided using expr:
echo -n "some words" | expr $(wc -c)
>> 10
echo -n "some words" | expr $(wc -w)
>> 2
Note: the -n prevents echo from emitting a trailing newline character, which would otherwise be counted as 1 by wc -c.
This bugs me every time I write a script that counts lines or characters. I wish that wc were defined not to emit the extra spaces, but it's not, so we're stuck with them.
When I write a script, instead of
nlines=`wc -l "$file"`
I always say
nlines=`wc -l < "$file"`
so that wc's output doesn't include the filename, but that doesn't help with the extra spaces. The trick I use next is to add 0 to the number, like this:
nlines=`expr $nlines + 0` # get rid of trailing spaces
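A variant of the same trick (not from the original answer) that avoids spawning expr, in any POSIX shell with arithmetic expansion:

nlines=$(( $(wc -l < "$file") ))   # the arithmetic context strips the padding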
For example:
myCleanVar=$( wc -l < myFile )
myDirtVar=$( wc -l myFile )
echo $myCleanVar
9
echo $myDirtVar
9 myFile
Why do I get an "integer" value from the wc command in "myCleanVar", but something like "9 myFile" in "myDirtVar"? I put "integer" in quotes because I know that in the Bash shell everything is treated as a string by default, but I can't understand the difference in behaviour between the first and the second expression. What is the effect of the redirection "<" in this case?
By default, wc prints the name of each file, which allows you to run it on more than one file (and get a result for each of them). If no file name is specified, the standard input, which is usually the console input, is used, and no file name is printed. The < specifies an input redirection: the input is read from the given file instead of from user input.
Put all this information together and you get the reason for wc's behaviour in your example.
Question time: what would be the output of cat file | wc -l ?
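The answer, for the record: the pipe makes wc read its standard input, so it prints just the count and no file name. With the 9-line myFile from above:

$ cat myFile | wc -l
9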
The man page for wc says:
NAME
wc - print newline, word, and byte counts for each file
SYNOPSIS
wc [OPTION]... [FILE]...
wc [OPTION]... --files0-from=F
DESCRIPTION
Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified.
So when you pass the file as an argument, wc also prints the name of the file, because you could pass more than one file for it to count lines in, and you then need to know which file has which line count. Of course, when wc reads stdin instead, it does not know the name of the file, so it doesn't print one.
Example
$ wc -l FILE1 FILE2
2 FILE1
4 FILE2
6 total
I have a list of .txt files, all in the same directory, with the names "w_i.txt" where i runs from 1 to n. Each of these files contains a single number (non-integer). I want to read the value from each of these files into a column in Apple's Numbers, with w_1.txt's value in row 1 and w_n.txt's value in row n of that column. Should I use AppleScript for this, and if so, what code would be required?
I think I would tackle this as a shell script rather than AppleScript.
The following script will iterate over your set of files in numerical order, and produce plain text output. You can redirect this to a text file. I don't have access to Apple's Numbers, but I'd be very surprised if you can't import data from a plain text file.
I have hard-coded the max file index as 5. You'll want to change that.
If there are any files missing, a 0 will be output on that line instead. You could change that to a blank line as required.
Also, I don't know whether your files end in newlines, so the cat/read/echo line is one way to get just the first token of the line and not worry about any trailing whitespace.
#!/bin/bash
for i in {1..5} ; do
    if [ -e "w_$i.txt" ] ; then
        cat "w_$i.txt" | { IFS="" read -r n ; echo -e "$n" ; }
    else
        echo 0
    fi
done
If all files end with newlines, you could just use cat:
cd ~/Documents/some\ folder; cat w_*.txt | pbcopy
This works even if the files don't end with newlines, and sorts w_2.txt before w_11.txt by sorting numerically on the number after the underscore:
sed -n p $(ls w_*.txt | sort -t_ -k2 -n)