How to increment an array count in bash?

The awk code below counts how often each value of $1 occurs. I would like the bash equivalent of this code.
awk '{a[$1]++}END{for(x in a)print a[x],x}'
How would I do this with a bash array?

Although there will be bash-oriented and smarter ways, the literal translation of your code will be:
declare -A a # you need to explicitly declare an associative array
while read -r x rest; do # "x" is assigned to the first field ($1) and "rest" to the rest
((a[$x]++)) # increment the count
done
for x in "${!a[@]}"; do # iterate over the keys of the array
echo "${a[$x]} $x" # print the result (awk's "print a[x],x" separates the fields with a space)
done
The code above reads from stdin, so you need to feed it the input with a redirect or a pipe.
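As a rough illustration of feeding that loop through a pipe, here is a minimal sketch. The function name count_first_field and the sample lines are made up for this example; it assumes bash 4.0 or newer for associative arrays.

```bash
#!/usr/bin/env bash
# count_first_field is a hypothetical helper name, not part of the answer above.
count_first_field() {
    local -A a=()                      # first field -> number of occurrences
    local x rest
    while read -r x rest; do
        [[ -n $x ]] || continue        # skip blank lines (an empty key is not allowed)
        a[$x]=$(( ${a[$x]:-0} + 1 ))   # increment, starting from 0 if unset
    done
    for x in "${!a[@]}"; do
        echo "${a[$x]} $x"
    done
}

# Feed the function via a pipe; the sample data is invented.
result=$(printf '%s\n' 'apple 1' 'pear 2' 'apple 3' | count_first_field)
echo "$result"
```

The order of the printed lines is not guaranteed, since bash does not keep associative-array keys in insertion order.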

To count the number of occurrences of elem in arr:
printf '%s\n' "${arr[@]}" | grep -cw elem
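Note that grep -cw counts every line that contains elem as a whole word, so an element such as "elem extra" is counted too. If only exact matches should count, a plain loop is one possible alternative; the sketch below uses an invented sample array.

```bash
#!/usr/bin/env bash
# Sample array invented for illustration.
arr=(elem other elem "elem extra" elements)

count=0
for item in "${arr[@]}"; do
    if [[ $item == elem ]]; then   # exact string comparison
        count=$((count + 1))
    fi
done
echo "$count"
```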

Related

Reading a particular Digit from a given number in shell [duplicate]

I have a string in a Bash shell script that I want to split into an array of characters, not based on a delimiter but just one character per array index. How can I do this? Ideally it would not use any external programs. Let me rephrase that. My goal is portability, so things like sed that are likely to be on any POSIX compatible system are fine.
Try
echo "abcdefg" | fold -w1
Edit: Added a more elegant solution suggested in comments.
echo "abcdefg" | grep -o .
You can access each letter individually already without an array conversion:
$ foo="bar"
$ echo ${foo:0:1}
b
$ echo ${foo:1:1}
a
$ echo ${foo:2:1}
r
If that's not enough, you could use something like this:
$ bar=($(echo $foo|sed 's/\(.\)/\1 /g'))
$ echo ${bar[1]}
a
If you can't even use sed or something like that, you can use the first technique above combined with a while loop using the original string's length (${#foo}) to build the array.
Warning: the code below does not work if the string contains whitespace. I think Vaughn Cato's answer has a better chance at surviving with special chars.
thing=($(i=0; while [ $i -lt ${#foo} ] ; do echo ${foo:$i:1} ; i=$((i+1)) ; done))
As an alternative to iterating over 0 .. ${#string}-1 with a for/while loop, there are two other ways I can think of to do this with only bash: using =~ and using printf. (There's a third possibility using eval and a {..} sequence expression, but this lacks clarity.)
With the correct environment and NLS enabled in bash these will work with non-ASCII as hoped, removing potential sources of failure with older system tools such as sed, if that's a concern. These will work from bash-3.0 (released 2005).
Using =~ and regular expressions, converting a string to an array in a single expression:
string="wonkabars"
[[ "$string" =~ ${string//?/(.)} ]] # splits into array
printf "%s\n" "${BASH_REMATCH[@]:1}" # loop free: reuse fmtstr
declare -a arr=( "${BASH_REMATCH[@]:1}" ) # copy array for later
The way this works is to perform an expansion of string which substitutes each single character with (.), then match this generated regular expression, with grouping, to capture each individual character into BASH_REMATCH[]. Index 0 is set to the entire string. Since that special array is read-only, you cannot remove it; note the :1 when the array is expanded, which skips over index 0 if needed.
Some quick testing for non-trivial strings (>64 chars) shows this method is substantially faster than one using bash string and array operations.
The above will work with strings containing newlines: =~ supports POSIX ERE, where . matches anything except NUL by default, i.e. the regex is compiled without REG_NEWLINE. (The behaviour of POSIX text processing utilities is allowed to be different by default in this respect, and usually is.)
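To make the newline claim concrete, here is a small sketch of the same =~ technique applied to a string that contains a space and a newline. The sample string and the array name chars are chosen just for this example.

```bash
#!/usr/bin/env bash
# Sample input (made up): five characters, including a space and a newline.
string=$'a b\nc'

# Replace every character with "(.)" so the regex has one capture group per character.
if [[ $string =~ ${string//?/(.)} ]]; then
    chars=("${BASH_REMATCH[@]:1}")   # skip index 0, which holds the whole match
fi

declare -p chars
```

The array should end up with five elements, with the space at index 1 and the newline at index 3.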
Second option, using printf:
string="wonkabars"
ii=0
while printf "%s%n" "${string:ii++:1}" xx; do
((xx)) && printf "\n" || break
done
This loop increments index ii to print one character at a time, and breaks out when there are no characters left. It would be even simpler if the bash printf returned the number of characters printed (as in C) rather than an error status; instead, the number of characters printed is captured in xx using %n. (This works at least as far back as bash-2.05b.)
With bash-3.1 and printf -v var you have slightly more flexibility, and can avoid falling off the end of the string should you be doing something other than printing the characters, e.g. to create an array:
declare -a arr
ii=0
while printf -v cc "%s%n" "${string:(ii++):1}" xx; do
((xx)) && arr+=("$cc") || break
done
If your string is stored in variable x, this produces an array y with the individual characters:
i=0
while [ $i -lt ${#x} ]; do y[$i]=${x:$i:1}; i=$((i+1));done
The most simple, complete and elegant solution:
$ read -a ARRAY <<< $(echo "abcdefg" | sed 's/./& /g')
and test
$ echo ${ARRAY[0]}
a
$ echo ${ARRAY[1]}
b
Explanation: read -a reads stdin as an array and assigns it to the variable ARRAY, treating spaces as the delimiter between array items.
Piping the echoed string through sed just adds the needed spaces between the characters.
We use a here-string (<<<) to feed the stdin of the read command.
I have found that the following works the best:
array=( `echo string | grep -o . ` )
(note the backticks)
then if you do: echo ${array[@]} ,
you get: s t r i n g
or: echo ${array[2]} ,
you get: r
Pure Bash solution with no loop:
#!/usr/bin/env bash
str='The quick brown fox jumps over a lazy dog.'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array (skip first record)
# Character 037 is the octal representation of ASCII Record Separator
# so it can capture all other characters in the string, including spaces.
IFS= mapfile -s1 -t -d $'\37' array <<<"${str//?()/$'\37'}"
# Strip out captured trailing newline of here-string in last record
array[-1]="${array[-1]%?}"
# Debug print array
declare -p array
string=hello123
for i in $(seq 0 $((${#string} - 1)))
do array[$i]=${string:$i:1}
done
echo "zero element of array is [${array[0]}]"
echo "entire array is [${array[@]}]"
The zero element of array is [h]. The entire array is [h e l l o 1 2 3].
Yet another one :). The stated question simply says 'Split string into character array' and doesn't say much about the state of the receiving array, or about special characters such as whitespace and control characters.
My assumption is that if I want to split a string into an array of chars I want the receiving array containing just that string and no left over from previous runs, yet preserve any special chars.
For instance, the proposed family of solutions like
for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
leaves leftovers in the target array:
$ y=(1 2 3 4 5 6 7 8)
$ x=abc
$ for (( i=0 ; i < ${#x} ; i++ )); do y[i]=${x:i:1}; done
$ printf '%s ' "${y[@]}"
a b c 4 5 6 7 8
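One simple way to avoid the leftovers, sketched here with the same variable names, is to empty the target array before filling it:

```bash
#!/usr/bin/env bash
y=(1 2 3 4 5 6 7 8)   # stale contents from an earlier use
x=abc

y=()                  # empty the target array first
for (( i = 0; i < ${#x}; i++ )); do
    y[i]=${x:i:1}
done

printf '%s ' "${y[@]}"; echo
```

With the reset in place the array holds only the three characters of x.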
Besides, writing the long line each time we want to split a string is tedious, so why not hide all this in a function that we can keep in a package source file, with an API like
s2a "Long string" ArrayName
I got this one that seems to do the job.
$ s2a()
> { [ "$2" ] && typeset -n __=$2 && unset $2;
> [ "$1" ] && __+=("${1:0:1}") && s2a "${1:1}"
> }
$ a=(1 2 3 4 5 6 7 8 9 0) ; printf '%s ' "${a[@]}"
1 2 3 4 5 6 7 8 9 0
$ s2a "Split It" a ; printf '%s ' "${a[@]}"
S p l i t I t
If the text can contain spaces:
eval a=( $(echo "this is a test" | sed "s/\(.\)/'\1' /g") )
$ echo hello | awk NF=NF FS=
h e l l o
Or
$ echo hello | awk '$0=RT' RS=[[:alnum:]]
h
e
l
l
o
I know this is a "bash" question, but please let me show you the perfect solution in zsh, a shell very popular these days:
string='this is a string'
string_array=(${(s::)string}) #Parameter expansion. And that's it!
print ${(t)string_array} # -> type: array
print $#string_array # -> 16 items
This is an old post/thread, but bash v5.2+ has a new feature, the shell option patsub_replacement, which can be combined with the =~ regex operator. It is more or less the same as @mr.spuratic's post/answer.
str='There can be only one, the Highlander.'
regexp="${str//?/(&)}"
[[ "$str" =~ $regexp ]] &&
printf '%s\n' "${BASH_REMATCH[@]:1}"
Or by just: (which includes the whole string at index 0)
declare -p BASH_REMATCH
If that is not desired, one can remove the value of the first index (index 0), with
unset -v 'BASH_REMATCH[0]'
instead of using printf or echo to print the value of the array BASH_REMATCH
One can check/see the value of the variable "$regexp" with either
declare -p regexp
Output
declare -- regexp="(T)(h)(e)(r)(e)( )(c)(a)(n)( )(b)(e)( )(o)(n)(l)(y)( )(o)(n)(e)(,)( )(t)(h)(e)( )(H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
or
echo "$regexp"
Using it in a script, one might want to test if the shopt is enabled or not, although the manual says it is on/enabled by default.
Something like.
if ! shopt -q patsub_replacement; then
shopt -s patsub_replacement
fi
But yeah, check the bash version too, if you're not sure which version of bash is in use.
if ! ((BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 2))); then
printf 'No dice! bash version 5.2+ is required!\n' >&2
exit 1
fi
Space can be excluded from regexp variable, change it from
regexp="${str//?/(&)}"
To
regexp="${str//[! ]/(&)}"
and the output is:
declare -- regexp="(T)(h)(e)(r)(e) (c)(a)(n) (b)(e) (o)(n)(l)(y) (o)(n)(e)(,) (t)(h)(e) (H)(i)(g)(h)(l)(a)(n)(d)(e)(r)(.)"
Maybe not as efficient as the other post/answer but it is still a solution/option.
If you want to store this in an array, you can do this:
string=foo
unset chars
declare -a chars
while read -N 1
do
chars[${#chars[@]}]="$REPLY"
done <<<"$string"x
unset chars[$((${#chars[@]} - 1))]
unset chars[$((${#chars[@]} - 1))]
echo "Array: ${chars[@]}"
Array: f o o
echo "Array length: ${#chars[@]}"
Array length: 3
The final x is necessary to handle the fact that a newline is appended after $string if it doesn't contain one.
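As a possible variation, the sentinel x and the two unset calls can be avoided by not using a here-string at all, since it is the here-string that appends the newline. The sketch below feeds the loop from printf through process substitution instead; the variable name c is chosen for this example.

```bash
#!/usr/bin/env bash
string=foo
chars=()

# printf '%s' emits the string without a trailing newline,
# so nothing has to be trimmed from the array afterwards.
while IFS= read -r -N 1 c; do
    chars+=("$c")
done < <(printf '%s' "$string")

echo "Array: ${chars[@]}"
echo "Array length: ${#chars[@]}"
```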
If you want to use NUL-separated characters, you can try this:
echo -n "$string" | while read -N 1
do
printf %s "$REPLY"
printf '\0'
done
AWK is quite convenient:
a='123'; echo $a | awk 'BEGIN{FS="";OFS=" "} {print $1,$2,$3}'
where FS and OFS are the field separators for input and output, respectively
For those who landed here searching how to do this in fish:
We can use the builtin string command (since v2.3.0) for string manipulation.
↪ string split '' abc
a
b
c
The output is a list, so array operations will work.
↪ for c in (string split '' abc)
echo char is $c
end
char is a
char is b
char is c
Here's a more complex example iterating over the string with an index.
↪ set --local chars (string split '' abc)
for i in (seq (count $chars))
echo $i: $chars[$i]
end
1: a
2: b
3: c
zsh solution: To put the scalar string variable into arr, which will be an array:
arr=(${(ps::)string})
If you also need support for strings with newlines, you can do:
str2arr(){ local string="$1"; mapfile -d $'\0' Chars < <(for i in $(seq 0 $((${#string}-1))); do printf '%s\u0000' "${string:$i:1}"; done); printf '%s' "(${Chars[*]@Q})" ;}
string=$(printf '%b' "apa\nbepa")
declare -a MyString=$(str2arr "$string")
declare -p MyString
# prints declare -a MyString=([0]="a" [1]="p" [2]="a" [3]=$'\n' [4]="b" [5]="e" [6]="p" [7]="a")
As a response to Alexandro de Oliveira, I think the following is more elegant or at least more intuitive:
while read -r -n1 c ; do arr+=("$c") ; done <<<"hejsan"
declare -r some_string='abcdefghijklmnopqrstuvwxyz'
declare -a some_array
declare -i idx
for ((idx = 0; idx < ${#some_string}; ++idx)); do
some_array+=("${some_string:idx:1}")
done
for idx in "${!some_array[@]}"; do
echo "$((idx)): ${some_array[idx]}"
done
Pure bash, no loop.
Another solution, similar to/adapted from Léa Gris' solution, but using read -a instead of readarray/mapfile :
#!/usr/bin/env bash
str='azerty'
# Need extglob for the replacement pattern
shopt -s extglob
# Split string characters into array
# ${str//?()/$'\x1F'} replace each character "c" with "^_c".
# ^_ (Control-_, 0x1f) is Unit Separator (US), you can choose another
# character.
IFS=$'\x1F' read -ra array <<< "${str//?()/$'\x1F'}"
# now, array[0] contains an empty string and the rest of array (starting
# from index 1) contains the original string characters :
declare -p array
# Or, if you prefer to keep the array "clean", you can delete
# the first element and pack the array :
unset array[0]
array=("${array[@]}")
declare -p array
However, I prefer the shorter version (which I also find easier to understand), where we remove the initial 0x1f before assigning the array:
#!/usr/bin/env bash
str='azerty'
shopt -s extglob
tmp="${str//?()/$'\x1F'}" # same as code above
tmp=${tmp#$'\x1F'} # remove initial 0x1f
IFS=$'\x1F' read -ra array <<< "$tmp" # assign array
declare -p array # verification

parsing var with blank spaces

I have a variable
some_var="a23=some value&p44=another_value&uw=possibly_another one"
and I want to convert it into several substrings one for each = (breaking at the &). So I would get
a23=some value
p44=another_value
uw=possibly_another one
If I run this code
for s in ${some_var//&/ };do echo $s;done
I get however
a23=some
value
p44=another_value
uw=possibly_another
one
(it breaks at the empty spaces)
How can I run the loop so that it takes space into account?
Read into an array using '&' as the field separator:
#!/bin/bash
some_var='a23=some value&p44=another_value&uw=possibly_another one'
IFS='&' read -r -a arr <<< "$some_var"
for s in "${arr[@]}"; do
printf '%s\n' "$s"
done
A technique similar to yours also would work:
(IFS='&'; for s in $some_var; do printf '%s\n' "$s"; done)
Notice that it runs in a subshell so that it does not change the IFS of the current shell.
You may consider reading this article for detailed information on IFS.
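For comparison, the read-based form needs no subshell, because an assignment placed directly before a command applies only to that command. A small sketch to check this, where old_ifs is a name introduced just for the example:

```bash
#!/usr/bin/env bash
some_var='a23=some value&p44=another_value&uw=possibly_another one'

old_ifs=$IFS                              # remember the current value for the check below
IFS='&' read -r -a arr <<< "$some_var"    # IFS is changed for this read only

if [[ $IFS == "$old_ifs" ]]; then
    echo "IFS unchanged"
fi
echo "${#arr[@]} elements"
```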
Populating an associative array with the argument=value pairs:
#!/usr/bin/env bash
uri_argstring='a23=some value&p44=another_value&uw=possibly_another one'
# Declare an empty associative array
declare -A uri_args=()
# Populate the associative array by reading key values pairs
# delimiting field by &, = or newline,
# delimiting records (argument=value or key=value pairs) with & or End Of File
# The "|| [[ -n $k ]]" part also processes the last pair, which has no trailing &
while IFS=$'=&\n' read -r -d '&' k v || [[ -n $k ]]; do
# Add entry to associative array
uri_args[$k]=$v
done <<<"$uri_argstring" # Feed here-string to the while loop reading
# Fancy printing
printf 'Argument=Value\n--------------\n'
# Iterate the keys from the uri_args associative array
for arg in "${!uri_args[@]}"; do
value=${uri_args[$arg]}
# Printout
printf '%s=%s\n' "$arg" "$value"
done
Output (the order of the keys may vary):
Argument=Value
--------------
a23=some value
p44=another_value
uw=possibly_another one

How can I use AWK to sort this type of data into an array to use in BASH

The file looks something like this:
index:index.html
required:file2.1:file2.2
How do I get it into an array of index containing the string - index.html
and an array of required containing the strings - file2.1 and file2.2. and be able to use it in bash?
Thanks so much for any help.
As John1024 pointed out, because you want bash arrays in the end, you can do the whole job with bash and do not need awk. So, if this is part of your homework assignment, either explain to the teacher that you found a better way or use something far less elegant and efficient, such as:
filename=<name-of-input-file>
a=($(awk -F: '$1 == "required" {for(i=2;i<=NF;i+=1) printf $i" "}' $filename))
echo ${a[0]}
file2.1
echo ${a[1]}
file2.2
Explanation:
a=(word1 word2 ...) assigns bash array a with the listed words.
$(command) evaluates the bash command.
awk -F: '$1 == "required" {for(i=2;i<=NF;i+=1) printf $i" "}' data.txt filters file data.txt with awk, using : as fields separator. If the first field is required it prints all other fields (NF is the number of fields), separating them by a space.
And of course, you can very easily adapt it to create the other bash array. I leave it to you as a way to verify that you got it well.
If you want to create bash arrays, you need to use bash, not awk:
declare -A req
declare -A arr
while IFS=: read a b c
do
if [ "$a" = "required" ]
then
req[$b]=$c
elif [ "$b" ]
then
arr[$a]=$b
fi
done <file
With your sample input, this creates the two arrays that you asked for:
declare -A arr='([index]="index.html" )'
declare -A req='([file2.1]="file2.2" )'
How it works
Associative arrays have to be declared before use:
declare -A req
declare -A arr
The following starts a loop. It reads the first three colon-separated fields from the input:
while IFS=: read a b c
do
If the first field is required, then this uses the second and third fields to create a new entry in array req:
if [ "$a" = "required" ]
then
req[$b]=$c
Otherwise, assuming a nonempty second field is present, this adds an entry to array arr:
elif [ "$b" ]
then
arr[$a]=$b
fi
The code below marks the end of the loop and indicates that loop input should be taken from file file:
done <file
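If indexed arrays named after the first field are preferred (an array index holding index.html and an array required holding file2.1 and file2.2, as the question describes), one possible sketch is to split each line into an array with read -a. The here-document stands in for the input file.

```bash
#!/usr/bin/env bash
index=()
required=()

# Split each line on ':' into the array "fields"; the first field selects the target array.
while IFS=: read -r -a fields; do
    case ${fields[0]} in
        index)    index=("${fields[@]:1}") ;;
        required) required=("${fields[@]:1}") ;;
    esac
done <<'EOF'
index:index.html
required:file2.1:file2.2
EOF

declare -p index required
```

This keeps any number of values per line, not just two.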

Find characters with exactly two occurrences?

For example, say the string "test this" was passed to my application -- I only want the s.
I'm thinking along the lines of grep wildcards, but I've never really used them.
You could write a script.
Iterate over each character.
Increment a counter for each character per character seen.
At the end, check your counters for the one which is equal to 2.
Here's a pure bash implementation of alex' suggestion doing what steve did in awk:
#!/bin/bash
# your string
string="test this"
# First, make a character array out of it
for ((i=0; i<"${#string}"; i++)); do # (quotes just for SO highlighting)
chars[$i]="${string:$i:1}" # (could be space, so quoted)
done
# associative array will keep track of the count for each character
declare -A counts
# loop through each character and keep track of its count
for ((i=0; i<"${#chars[@]}"; i++)); do # (quotes just for SO highlighting)
key="${chars[$i]}" # current character
# (could be space, so quoted)
if [ -z "${counts[$key]}" ]; then # if it doesn't exist yet in counts,
counts["$key"]=1 # record its first occurrence
else
((counts["$key"]++)) # if it exists, increment it
fi
done
# loop through each key/value and print all with count 2
for key in "${!counts[@]}"; do
if [ ${counts["$key"]} -eq 2 ]; then
echo "$key"
fi
done
Note that it uses an associative array, which was introduced in Bash 4.0, so this'll only work on that or newer.
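The same idea can be written more compactly by skipping the intermediate character array. This is only a sketch (bash 4.0+); the names c and twice are introduced here, and the default expansion ${counts[$c]:-0} takes care of the first occurrence of each character.

```bash
#!/usr/bin/env bash
string="test this"
declare -A counts=()

# Count each character directly from the string.
for (( i = 0; i < ${#string}; i++ )); do
    c=${string:i:1}
    counts[$c]=$(( ${counts[$c]:-0} + 1 ))
done

# Collect the characters that occur exactly twice.
twice=()
for c in "${!counts[@]}"; do
    if [[ ${counts[$c]} -eq 2 ]]; then
        twice+=("$c")
    fi
done
printf '%s\n' "${twice[@]}"
```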
One way using GNU awk:
echo "$string" | awk -F '' '{ for (i=1; i<=NF; i++) array[$i]++; for (j in array) if (array[j]==2) print j }'

Splitting string into array

I want to split the string and construct the array. I tried the below code:
myString="first column:second column:third column"
set -A myArray `echo $myString | awk 'BEGIN{FS=":"}{for (i=1; i<=NF; i++) print $i}'`
# Following is just to make sure that array is constructed properly
i=0
while [ $i -lt ${#myArray[@]} ]
do
echo "Element $i:${myArray[$i]}"
(( i=i+1 ))
done
exit 0
It produces the following result:
Element 0:first
Element 1:column
Element 2:second
Element 3:column
Element 4:third
Element 5:column
This is not what I want it to be. When I construct the array, I want that array to contain only three elements.
Element 0:first column
Element 1:second column
Element 2:third column
Can you please advise?
Here is how I would approach this problem: use the IFS variable to tell the shell (bash) that you want to split the string into colon-separated tokens.
$ cat split.sh
#!/bin/bash
# Script to split fields into tokens
# Here is the string where tokens separated by colons
s="first column:second column:third column"
IFS=":" # Set the field separator
set $s # Breaks the string into $1, $2, ...
i=0
for item # A for loop by default loop through $1, $2, ...
do
echo "Element $i: $item"
((i++))
done
Run it:
$ ./split.sh
Element 0: first column
Element 1: second column
Element 2: third column
If you definitely want to use arrays in Bash, you can try it this way:
$ myString="first column:second column:third column"
$ myString="${myString//:/ }" #remove all the colons
$ echo "${myString}"
first column second column third column
$ read -a myArr <<<$myString
$ echo ${myArr[@]}
first column second column third column
$ echo ${myArr[1]}
column
$ echo ${myArr[2]}
second
otherwise, the "better" method is to use awk entirely
Note that saving and restoring IFS, as I have often seen in these solutions, has the side effect that if IFS wasn't set, it ends up changed to an empty string, which causes weird problems with subsequent splitting.
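One way to sidestep the save/restore problem, sketched below, is to make IFS local to a function, so the caller's IFS stays exactly as it was, whether set or unset. The function name split_colons and the global array parts are made up for this example.

```bash
#!/usr/bin/env bash
# split_colons is a hypothetical helper; it fills the global array "parts".
split_colons() {
    local IFS=':'                # scoped to this function only
    read -r -a parts <<< "$1"    # split the argument on colons
}

unset IFS                        # simulate a shell where IFS was never set
split_colons "first column:second column:third column"

if [[ -z ${IFS+set} ]]; then
    echo "IFS is still unset"
fi
printf 'Element: %s\n' "${parts[@]}"
```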
Here's the solution I came up with based on Anton Olsen's extended to handle >2 values separated by a colon. It handles values in the list that have spaces correctly, not splitting on the space.
colon_list=${1} # colon-separate list to split
while true ; do
part=${colon_list%%:*} # Delete longest substring match from back
colon_list=${colon_list#*:} # Delete shortest substring match from front
parts[i++]=$part
# We are done when there is no more colon
if test "$colon_list" = "$part" ; then
break
fi
done
# Show we've split the list
for part in "${parts[@]}"; do
echo "$part"
done
Ksh or Bash
#! /bin/sh
myString="first column:second column:third column"
IFS=: A=( $myString )
echo ${A[0]}
echo ${A[1]}
Looks like you've already found the solution, but note that you can do away with awk entirely:
myString="first column:second column:third column"
OIFS="$IFS"
IFS=':'
myArray=($myString)
IFS=$OIFS
i=0
while [ $i -lt ${#myArray[@]} ]
do
echo "Element $i:${myArray[$i]}"
(( i=i+1 ))
done
