Counting words and characters in Bash without wc [duplicate] - bash

I have a variable set like this:
sentence="a very long sentence with multiple spaces"
I need to count how many words and characters are there without using other programs such as wc.
I know counting words can be done like this:
words=( $sentence )
echo ${#words[@]}
But how do I count the characters including spaces?

But how do I count the characters including spaces?
To count the length of the string, use:
echo "${#sentence}"
47
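Both counts together in pure bash, as a minimal sketch (the unquoted expansion in the array assignment relies on default word splitting, and would glob-expand characters like *):

```shell
s="this string"
words=( $s )                 # split on whitespace (default IFS)
echo "words: ${#words[@]}"   # number of array elements -> 2
echo "chars: ${#s}"          # string length, space included -> 11
```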

You can also use grep with a regex that matches everything:
echo "this string" | grep -oP . | grep -c .

Using awk on a single line:
echo this string | awk '{print length}'
Another way of piping stdin text to awk:
awk '{print length}' <<< "this string"

Get substring after a special character [duplicate]

I have many strings that look like the following:
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
Using bash, I would like to just get the string after the last '.'(dot) character.
So the output I want:
xyz
abc
mno
pqr
Is there any way to do this?
AWK will do it. I'm using GNU AWK:
$ awk -F '.' '{print $NF}' <<EOF
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
EOF
xyz
abc
mno
pqr
AWK splits each line into fields, and -F sets the field separator to '.'. Fields are indexed from 1, so $1 is the first field (e.g. word1 in the first line), and $NF is the last field on each line (NF holds the number of fields, so $NF expands to the value of the last one).
https://www.grymoire.com/Unix/Awk.html is a great tutorial on AWK.
You can then just use a for loop to iterate over each of the resulting lines:
$ lines=$(awk -F '.' '{print $NF}' <<EOF
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
EOF
)
$ for line in $lines; do echo $line; done
xyz
abc
mno
pqr
I'm using command substitution here - see the Advanced Bash Scripting Guide for information on loops, command substitution and other useful things.
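Iterating over an unquoted $lines works for these values, but it breaks on lines containing spaces or glob characters. A more robust sketch reads line by line and skips awk entirely, using parameter expansion to strip everything up to the last dot:

```shell
while IFS= read -r line; do
    echo "${line##*.}"   # remove the longest prefix ending in '.'
done <<'EOF'
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
EOF
```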
One simple solution is to split the string on '.' and take the last item of the resulting array:
lines=(word1.word2.word3.xyz word1.word2.word3.xyz word1.word2.word3.word4.abc word1.word2.mno word1.word2.word3.pqr abcdef 'a * b')
for line in "${lines[@]}"
do
line_split=(${line//./ })
echo "${line_split[-1]}"
done
Another clean, shellcheck-approved way would be (the idea is the same):
lines=(word1.word2.word3.xyz word1.word2.word3.xyz word1.word2.word3.word4.abc word1.word2.mno word1.word2.word3.pqr abcdef)
for line in "${lines[@]}"; do
if [[ $line == *.* ]]; then # check if line contains dot character
IFS=. read -r -a split_array <<<"$line" # one-line solution
echo "${split_array[-1]}" # shows the results
else
echo "No dot in string: $line"
fi
done
This is a one-liner solution (after array assignment), without using an explicit loop (but using printf's implicit loop).
arr=( 'word1.word2.word3.xyz'
'word1.word2.word3.word4.abc'
'word1.word2.mno'
'word1.word2.word3.pqr' )
printf '%s\n' "${arr[@]##*.}"

Bad Substitution when I try to print a specific position of array [duplicate]

I'm getting started with bash programming, and I want to print a specific position of an array, but when I try I get this error: Bad substitution
#!/bin/sh
user=`cut -d ";" -f1 $ultimocsv | sort -d | uniq -c`
arr=$(echo $user | tr " " "\n")
a=5
echo "${arr[$a]}" #Error:bad substitution
why?
You are using sh, which does not support arrays. Even if you used bash you would get the same error, because arr would not be an array: $(...) produces a single string. I am also not sure the -c on uniq is what you wanted.
I assume, this is what you are looking for:
#!/bin/bash
mapfile -t arr < <( cut -d ";" -f1 $ultimocsv | sort -d | uniq )
a=5
echo "${arr[$a]}"
This will not give the error, even if your file has less than 5 unique lines, because bash will return an empty string for a defined but empty array.
It works even with "uniq -c", because it puts complete lines in the array.
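If the counts from uniq -c are actually wanted, each output line has the form "<count> <value>", so read can split them apart. A sketch, using a hypothetical file names.csv as a stand-in for $ultimocsv:

```shell
printf 'alice;x\nbob;y\nalice;z\n' > names.csv   # hypothetical sample data

# each line of `uniq -c` output is "<count> <value>"
while read -r count user; do
    echo "user=$user count=$count"
done < <(cut -d ';' -f1 names.csv | sort -d | uniq -c)
```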

Cut a substring in bash

Suppose I have the following string:
some letters foo/substring/goo/some additional letters
I need to extract this substring supposing that foo/ and /goo are constant strings that are known in advance. How can I do that?
This sed one-liner does it.
sed 's#.*foo/##;s#/goo/.*##' file
Besides sed, awk and grep can do the job too. Or with zsh:
kent$ v="some letters foo/substring/goo/some additional letters"
kent$ echo ${${v##*foo/}%%/goo/*}
substring
Note (from a comment by @Nahuel Fouilleul): in ${var%%/goo/*}, var must be a variable name and can't be the result of another expansion, so in bash the line has to be split into two statements.
$ echo $0
bash
$ v="some letters foo/substring/goo/some additional letters"
$ v=${v##*foo/}
$ v=${v%%/goo/*}
$ echo $v
substring
The line worked when I executed it in zsh, but when I tested it in bash it didn't:
$ echo $0
-zsh
$ v="some letters foo/substring/goo/some additional letters"
$ echo ${${v##*foo/}%%/goo/*}
substring
With variable expansion
line='some letters foo/substring/goo/some additional letters'
line=${line%%/goo*} # remove suffix /goo*
line=${line##*foo/} # remove prefix *foo/
echo "$line"
or bash regular expression
line='some letters foo/substring/goo/some additional letters'
if [[ $line =~ foo/([^/]*)/goo ]]; then
echo "${BASH_REMATCH[1]}"
fi
If you know there are no other / characters in the surrounding letters, you can use cut:
> echo "some letters foo/substring/goo/some additional letters" | cut -d'/' -f2
In terms of readability I think awk is a good solution
echo "some letters foo/substring/goo/some additional letters" | awk -v FS="(foo/|/goo)" '{print $2}'

How to remove special characters from strings but keep underscores in shell script

I have a string that looks like "info_A!__B????????C_*". I want to remove the special characters from it but keep underscores and letters. I tried the [:word:] (ASCII letters and _) character class, but it says "invalid character set". Any idea how to handle this? Thanks.
text="info_!_????????_*"
if [ -z `echo $text | tr -dc "[:word:]"` ]
......
Using bash parameter expansion:
$ var='info_A!__B????????C_*'
$ echo "${var//[^[:alnum:]_]/}"
info_A__BC_
A sed one-liner would be
sed 's/[^[:alnum:]_]//g' <<< 'info_!????????*'
gives you
info_
An awk one-liner would be
awk '{gsub(/[^[:alnum:]_]/,"",$0)} 1' <<< 'info_!??A_??????*pi9ngo^%$_mingo745'
gives you
info_A_pi9ngo_mingo745
If you don't wish to have numbers in the output then change :alnum: to :alpha:.
My tr doesn't understand [:word:]. I had to do like this:
$ x=$(echo 'info_A!__B????????C_*' | tr -cd '[:alnum:]_')
$ echo $x
info_A__BC_
Not sure if it's the most robust way, but it worked for your sample text.
sed one-liner:
echo "SamPlE_#tExT%, really ?" | sed -e 's/[^a-zA-Z_]//g'
SamPlE_tExTreally

results of wc as variables

I would like to use the lines coming from 'wc' as variables. For example:
echo 'foo bar' > file.txt
echo 'blah blah blah' >> file.txt
wc file.txt
2 5 23 file.txt
I would like to have something like $lines, $words and $characters associated to the values 2, 5, and 23. How can I do that in bash?
In pure bash: (no awk)
a=($(wc file.txt))
lines=${a[0]}
words=${a[1]}
chars=${a[2]}
This works by using bash arrays: a=(1 2 3) creates an array with elements 1, 2 and 3, and individual elements are accessed with the ${a[index]} syntax.
Alternative: (based on gonvaled solution)
read lines words chars <<< $(wc x)
Or in sh:
a=$(wc file.txt)
lines=$(echo $a|cut -d' ' -f1)
words=$(echo $a|cut -d' ' -f2)
chars=$(echo $a|cut -d' ' -f3)
There are other solutions but a simple one which I usually use is to put the output of wc in a temporary file, and then read from there:
wc file.txt > xxx
read lines words characters filename < xxx
echo "lines=$lines words=$words characters=$characters filename=$filename"
lines=2 words=5 characters=23 filename=file.txt
The advantage of this method is that you do not need to create several awk processes, one for each variable. The disadvantage is that you need a temporary file, which you should delete afterwards.
Be careful: this does not work:
wc file.txt | read lines words characters filename
The problem is that piping to read creates another process, and the variables are updated there, so they are not accessible in the calling shell.
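One bash-specific workaround is process substitution, which keeps read in the current shell while only wc runs in a subshell (a sketch, recreating the question's file.txt):

```shell
printf 'foo bar\nblah blah blah\n' > file.txt

# read executes in the current shell, so the variables survive
read -r lines words characters filename < <(wc file.txt)
echo "lines=$lines words=$words characters=$characters filename=$filename"
```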
Edit: adding solution by arnaud576875:
read lines words chars filename <<< $(wc x)
It works without writing to a file (and does not have the pipe problem). It is bash specific.
From the bash manual:
Here Strings
A variant of here documents, the format is:
<<<word
The word is expanded and supplied to the command on its standard input.
The key is the "word is expanded" bit.
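A related detail: when wc reads from stdin instead of a filename argument, it omits the filename field, so only three variables are needed (a sketch, again recreating file.txt):

```shell
printf 'foo bar\nblah blah blah\n' > file.txt

# redirecting stdin drops the trailing filename from wc's output
read -r lines words chars <<< "$(wc < file.txt)"
echo "lines=$lines words=$words chars=$chars"
```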
lines=`wc file.txt | awk '{print $1}'`
words=`wc file.txt | awk '{print $2}'`
...
You can also store the wc result somewhere first and then parse it, if you're picky about performance. :)
Just to add another variant --
set -- `wc file.txt`
lines=$1
words=$2
chars=$3
This obviously clobbers $* and related variables. Unlike some of the other solutions here, it is portable to other Bourne shells.
I wanted to store the number of csv files in a variable. The following worked for me:
CSV_COUNT=$(ls ./pathToSubdirectory | grep ".csv" | wc -l | xargs)
xargs trims the surrounding whitespace from the wc output.
I ran this bash script from a different folder than the csv files, hence the pathToSubdirectory.
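Parsing ls output is fragile (filenames containing newlines or matching ".csv" mid-name can throw the count off); a glob into an array counts the files without ls, grep, or wc. A sketch using the question's ./pathToSubdirectory as a stand-in path:

```shell
shopt -s nullglob                        # no match -> empty array, not the literal pattern
csv_files=( ./pathToSubdirectory/*.csv )
CSV_COUNT=${#csv_files[@]}
echo "$CSV_COUNT"
```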
You can assign the output to a variable with command substitution (which runs in a subshell):
$ x=$(wc some-file)
$ echo $x
1 6 60 some-file
Now, in order to get the separate variables, the simplest option is to use awk:
$ x=$(wc some-file | awk '{print $1}')
$ echo $x
1
declare -a result
result=( $(wc < file.txt) )
lines=${result[0]}
words=${result[1]}
characters=${result[2]}
echo "Lines: $lines, Words: $words, Characters: $characters"
