Looping through alphabets in Bash - bash

I want to mv all the files starting with 'x' to directory 'x'; something like:
mv path1/x*.ext path2/x
and do it for all alphabet letters a, ..., z
How can I write a bash script which makes 'x' loop through the alphabet?

for x in {a..z}
do
echo "$x"
mkdir -p path2/${x}
mv path1/${x}*.ext path2/${x}
done
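One thing the loop above does not cover: when no file starts with a given letter, the glob stays unexpanded and mv complains about a literal path1/q*.ext. A minimal sketch that skips such letters, assuming bash and a hypothetical helper name move_by_letter (source directory, destination directory and extension are passed as arguments):

```bash
#!/usr/bin/env bash
# Hypothetical helper: move_by_letter SRC DEST EXT
move_by_letter() {
    local src=$1 dest=$2 ext=$3 letter files
    shopt -s nullglob                     # unmatched globs expand to nothing
    for letter in {a..z}; do
        files=( "$src/$letter"*."$ext" )  # all files for this letter
        (( ${#files[@]} )) || continue    # skip letters with no files
        mkdir -p "$dest/$letter"
        mv -- "${files[@]}" "$dest/$letter/"
    done
    shopt -u nullglob                     # note: also turns it off if it was on before
}
```

Called as move_by_letter path1 path2 ext, it only creates the letter directories that actually receive files.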

This should get you started:
for letter in {a..z} ; do
echo $letter
done

here's how to generate the Spanish alphabet using nested brace expansion
for l in {{a..n},ñ,{o..z}}; do echo $l ; done | nl
1 a
...
14 n
15 ñ
16 o
...
27 z
Or simply
echo -e {{a..n},ñ,{o..z}}"\n" | nl
If you want to generate the obsolete 29-character Spanish alphabet:
echo -e {{a..c},ch,{d..l},ll,{m,n},ñ,{o..z}}"\n" | nl
Something similar could be done for the French or German alphabet.

Using rename:
mkdir -p path2/{a..z}
rename 's|path1/([a-z])(.*)|path2/$1/$1$2|' path1/{a..z}*
If you want to strip off the leading [a-z] character from the filename, the updated perlexpr would be:
rename 's|path1/([a-z])(.*)|path2/$1/$2|' path1/{a..z}*

With uppercase as well
for letter in {{a..z},{A..Z}}; do
echo $letter
done

This question and the answers helped me with my problem, partially.
I needed to loop over a part of the alphabet in bash.
Brace expansion is strictly textual and happens before variable expansion, so {$START..$STOP} does not expand on its own.
I found a solution with eval and made it even simpler:
START=A
STOP=D
for letter in $(eval echo {$START..$STOP}); do
echo $letter
done
Which results in:
A
B
C
D
Hope it's helpful for someone looking to solve the same problem I had
and who ends up here as well.
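Since eval re-parses whatever ends up in START and STOP, it may be worth checking them first when they come from user input. A small sketch, assuming bash and a hypothetical wrapper named letter_range that only accepts single ASCII letters:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: letter_range START STOP
letter_range() {
    local start=$1 stop=$2
    # Refuse anything that is not exactly one letter,
    # so eval never sees arbitrary text.
    [[ $start =~ ^[A-Za-z]$ && $stop =~ ^[A-Za-z]$ ]] || return 1
    eval "echo {$start..$stop}"
}

for letter in $(letter_range A D); do
    echo "$letter"
done
```

With invalid arguments the function returns a non-zero status and prints nothing, so the loop body simply does not run.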

I hope this can help.
for i in {a..z}
for i in {A..Z}
for i in {{a..z},{A..Z}}
use loop according to need.

Related

How to increment a string variable within a for loop

I want a loop that can find the letter that ends words most frequently in multiple languages and output the data in columns.
So far I have
count="./wordlist/french/fr.txt ./wordlist/spanish/es.txt ./wordlist/german/de.$
lang="French Spanish German Portuguese Italian"
(
echo -e "Language Letter Count"
for i in $count
do
(for j in {a..z}
do
echo -e "LANG" $j $(grep -c $j\> $i)
done
) | sort -k3 -rn | head -1
done
) | column -t
I want it to output as shown:
Language Letter Count
French e 196195
Spanish a 357193
German e 251892
Portuguese a 217178
Italian a 216125
Instead I get:
Language Letter Count
LANG z 0
LANG z 0
LANG z 0
LANG z 0
LANG z 0
The words files have the format:
Word Freq(#) where the word and its frequency are delimited by a space.
This means I have 2 problems;
First, the grep command is not handling the argument $j\> to find a character at the end of a word. I have tried using grep -E $j\> and grep '$j\>' and neither worked.
The second problem is that I don't know how to output the name of the language (in the variable lang). Nesting another for loop did not work when I tried it like this (or with i and k in the opposite order):
(
for i in $count
do
for k in $lang
do
for j in {a..z}
do
echo -e $k $j $(grep -c $j\> $i)
done
) | sort -k3 -rn | head -1
done
done
) | column -t
Since this outputs multiples of the name of the language "$k" in places where it does not belong.
I know that I can just copy and paste the loop for each language, but I would like to extend this to every language.
Thanks in advance!
grep word boundaries
To make special delimiters (e.g. \> for word-end) work with egrep when being called from the shell, you should put them into "quotes".
count=$(egrep -c "${char}\>" "${file}")
Btw, you really should use double quote ("), because single quotes will prevent variable-expansion. (e.g. in j="foo"; k='$j\>', the first character of k's value will be $ rather than f)
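To make the quoting point concrete: in grep -c $j\> $i the shell removes the backslash before grep runs, so grep searches for the two characters e> instead of "e at the end of a word". A minimal sketch, assuming a grep that supports \> (GNU grep does) and a hypothetical helper name count_word_end:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count_word_end CHAR FILE
# Prints the number of lines containing a word that ends in CHAR.
count_word_end() {
    local char=$1 file=$2
    # Double quotes keep the backslash for grep and still expand $char.
    grep -c -- "${char}\>" "$file"
}
```

Note that grep -c counts matching lines, not occurrences, and exits with status 1 when the count is 0.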
Language name display
Getting the right language string is a bit more tricky; here's a few suggestions:
Derive the displayed language from the path of the wordlist:
lang=${file%/*}
lang=${lang##*/}
With bash (though not with dash and some other shells) you might even do lang=${lang^} to capitalize the string.
Lookup the proper language name in a dictionary. Bash-4 has dictionaries built in, but you can also use filebased dicts:
$ cat languages.txt
./wordlist/french/fr.txt Français
./wordlist/english/en.txt English
./wordlist/german/de.txt Deutsch
$ file=./wordlist/french/fr.txt
$ lang=$(egrep "^${file}\>" languages.txt | awk '{print $2}')
You can also iterate over file,lang pairs, e.g.
languages="french/fr,French spanish/es,Español german/de,Deutsch"
for l in $languages; do
file=./wordlist/${l%,*}.txt
lang=${l#*,}
# ...
done
Taking word frequencies into account
The third problem I see (though I might misunderstand the problem), is that you are not taking the word frequency into account. e.g. a word A that is used 1000 times more often than the word B will only get counted once (just like B).
You can use awk to sum up the word frequencies of matching words:
count=$(egrep "${char}\>" "${file}" | awk '{s+=$2} END {print s}')
All Together Now
So a full solution to the problem could look like:
languages="french/fr,French spanish/es,Español german/de,Deutsch"
(
echo -e "Language Letter Count"
for l in ${languages}; do
file=./wordlist/${l%,*}.txt
lang=${l#*,}
for char in {a..z}; do
#count=$(egrep -c "${char}\>" "${file}")
count=$(egrep "${char}\>" "${file}" | awk '{s+=$2} END {print s}')
echo ${lang} ${char} ${count}
done | sort -k3 -rn | head -1
done
) | column -t
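If running grep 26 times per file turns out to be slow, the same frequency-weighted count could probably be done in one pass with awk. This is only a sketch under assumptions: each line is "word frequency", only words ending in an ASCII lowercase letter are counted, and top_final_letter is a made-up name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: top_final_letter FILE
# Prints "<letter> <summed frequency>" for the most frequent final letter.
top_final_letter() {
    local file=$1
    awk '$1 ~ /[a-z]$/ { sum[substr($1, length($1), 1)] += $2 }
         END { for (c in sum) print c, sum[c] }' "$file" |
        sort -k2,2nr | head -n 1
}
```

Inside the outer loop you would then print the language name followed by $(top_final_letter "$file"). Words ending in accented letters are skipped by this sketch, which may or may not be what you want.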

How to loop through the first n letters of the alphabet in bash

I know that to loop through the alphabet, one can do
for c in {a..z}; do something; done
My question is, how can I loop through the first n letters (e.g. to build a string) where n is a variable/parameter given in the command line.
I searched SO, and only found answers doing this for numbers, e.g. using C-style for loop or seq (see e.g. How do I iterate over a range of numbers defined by variables in Bash?). And I don't have seq in my environment.
Thanks.
The straightforward way is sticking them in an array and looping over that by index:
#!/bin/bash
chars=( {a..z} )
n=3
for ((i=0; i<n; i++))
do
echo "${chars[i]}"
done
Alternatively, if you just want them dash-separated:
printf "%s-" "${chars[@]:0:n}"
that other guy's answer is probably the way to go, but here's an alternative that doesn't require an array variable:
n=3 # sample value
i=0 # var. for counting iterations
for c in {a..z}; do
echo $c # do something with "$c"
(( ++i == n )) && break # exit loop, once desired count has been reached
done
@rici points out in a comment that you could make do without aux. variable $i by using the conditional (( n-- )) || break to exit the loop, but note that this modifies $n.
Here's another array-free, but less efficient approach that uses substring extraction (parameter expansion):
n=3 # sample value
# Create a space-separated list of letters a-z.
# Note that chars={a..z} does NOT work.
chars=$(echo {a..z})
# Extract the substring containing the specified number
# of letters using parameter expansion with an arithmetic expression,
# and loop over them.
# Note:
# - The variable reference must be _unquoted_ for this to work.
# - Since the list is space-separated, each entry spans 2
# chars., hence `2*n` (you could subtract 1 after, but it'll work either way).
for c in ${chars:0:2*n}; do
echo $c # do something with "$c"
done
Finally, you can combine the array and list approaches for concision, although the pure array approach is more efficient:
n=3 # sample value
chars=( {a..z} ) # create array of letters
# `${chars[@]:0:n}` returns the first n array elements as a space-separated list
# Again, the variable reference must be _unquoted_.
for c in ${chars[@]:0:n}; do
echo $c # do something with "$c"
done
Are you only iterating over the alphabet to create a subset? If that's the case, just make it simple:
$ alpha=abcdefghijklmnopqrstuvwxyz
$ n=4
$ echo ${alpha:0:$n}
abcd
Edit. Based on your comment below, do you have sed?
% sed -e 's/./&-/g' <<< ${alpha:0:$n}
a-b-c-d-
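Both the printf "%s-" and the sed variants leave a trailing dash. If that matters, one option is to let bash join an array slice using IFS. A sketch, assuming bash and a hypothetical function name join_first_letters (the separator argument is optional and defaults to a dash):

```bash
#!/usr/bin/env bash
# Hypothetical helper: join_first_letters N [SEP]
join_first_letters() {
    local n=$1 sep=${2:--}
    local chars
    chars=( {a..z} )
    local IFS=$sep                        # "${array[*]}" joins with the first char of IFS
    printf '%s\n' "${chars[*]:0:n}"
}

join_first_letters 4      # a-b-c-d
```

Because IFS is declared local, the change does not leak out of the function.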
You can loop through the character code of the letters of the alphabet and convert back and forth:
# suppose $INPUT is your input
INPUT='x'
# get the decimal character code and increment it by one
INPUT_CHARCODE=$(printf '%d' "'$INPUT")
(( INPUT_CHARCODE++ ))
# start from character code 97 (hex 61) = 'a'
I=97
while [ "$I" -ne "$INPUT_CHARCODE" ]; do
# convert the code to hex, then to a letter
CURRENT_CHAR=$(printf "\\x$(printf '%x' "$I")")
echo "current character is: $CURRENT_CHAR"
(( I++ ))
done
This question and the answers helped me with my problem, partially.
I needed to loop over a part of the alphabet based on a letter in bash.
Brace expansion is strictly textual and happens before variable expansion, so {$START..$STOP} does not expand on its own.
I found a solution with eval and made it even simpler:
START=A
STOP=D
for letter in $(eval echo {$START..$STOP}); do
echo $letter
done
Which results in:
A
B
C
D
Hope it's helpful for someone looking to solve the same problem I had
and who ends up here as well
(also answered here)
And the complete answer to the original question is:
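START=A
n=4
# decimal code of the last letter: code(START) + n - 1
OFFSET=$(( $(printf '%d' "'$START") + n - 1 ))
# convert the code to hex, then to a letter
STOP=$(printf "\\x$(printf '%x' "$OFFSET")")
for letter in $(eval echo {$START..$STOP}); do
echo $letter
done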
Which results in the same:
A
B
C
D
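If you would rather avoid eval altogether, the letters can be produced directly from their character codes. A sketch, assuming ASCII letters and a hypothetical function name letters_from; the codes are kept in decimal for the arithmetic and only converted to hex for the printf escape:

```bash
#!/usr/bin/env bash
# Hypothetical helper: letters_from START N
# Prints N consecutive letters, one per line, beginning with START.
letters_from() {
    local start=$1 n=$2 code i hex
    code=$(printf '%d' "'$start")          # decimal code of START, e.g. 65 for A
    for (( i = 0; i < n; i++ )); do
        printf -v hex '%x' $(( code + i )) # decimal -> hex
        printf "\\x$hex\n"                 # hex escape -> character
    done
}

for letter in $(letters_from A 4); do
    echo "$letter"
done
```

There is no check that the range stays inside the alphabet, so letters_from y 4 would run past z into punctuation.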

Iterate over letters in a for loop

Is it possible to iterate over a list of letters, as followed:
aaaa, aaab, ..., aaaz, aaba, aabb, ..., aabz, ..., zzzy, zzzz
I know the syntax to iterate over the alphabet:
for i in {a..z}
but couldn't figure out a way to do the extended version...
Thanks in advance
You could use brace expansion:
echo {a..z}{a..z}{a..z}{a..z}
Use it in a loop:
for i in {a..z}{a..z}{a..z}{a..z}; do
echo $i
done
It would produce:
aaaa
aaab
aaac
aaad
aaae
...
zzzv
zzzw
zzzx
zzzy
zzzz
You can read more about combining and nesting brace expansions here.
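Keep in mind that bash builds the whole word list before the loop starts; for four letters that is 26^4 = 456976 words. A scaled-down sketch with a three-letter alphabet and two positions shows how the count and the ordering work (the array name combos is just for illustration):

```bash
#!/usr/bin/env bash
# Two positions over a..c give 3 * 3 = 9 combinations,
# with the rightmost position changing fastest.
combos=( {a..c}{a..c} )
echo "count: ${#combos[@]}"
echo "first: ${combos[0]}"
echo "last:  ${combos[${#combos[@]}-1]}"
```

This prints a count of 9, with aa first and cc last.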
Yes, BASH doesn't care what is contained in the sequence, as long as you give it the sequence.
for i in ducks geese swans; do echo $i; done
ducks
geese
swans
for building further with brace expansion, you just need to work on your brace statements:
for i in aaa{a..z}; do echo $i; done
aaaa
aaab
aaac
aaad
aaae
aaaf
aaag
aaah
aaai
aaaj
aaak
...
Take a look at brace expansion in man bash. You can use the above to satisfy your needs by a set of nested loops with differing levels of prefix for your expansion setups.
I actually don't know bash well, but wouldn't that work?
# Nested loops, one per letter position
for i in {a..z}; do
for j in {a..z}; do
for k in {a..z}; do
for l in {a..z}; do
echo "$i$j$k$l"
done
done
done
done

Find 1st Letter of every word in a string

How would I find the first letter of a word contained within a string using bash.
For example
Code:
str="my-custom-string"
I would want to find m,c,s. I know how to find the very first letter, but this is slightly more complicated.
Many thanks,
$ echo 'my-custom-string' | egrep -o '\b\w'
m
c
s
Pure Bash using parameter substitution. Remove minus, select first character of each word:
str="my-custom-string"
for word in ${str//-/ }; do
echo "${word:0:1}"
done
Result
m
c
s
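One caveat with the unquoted ${str//-/ } is that the words also go through filename expansion, so a word like * would expand to file names. A variant that splits with read instead, as a sketch assuming bash and a hypothetical function name first_letters:

```bash
#!/usr/bin/env bash
# Hypothetical helper: first_letters STRING
# Splits STRING on '-' and prints the first character of each word.
first_letters() {
    local str=$1 word words
    IFS=- read -r -a words <<< "$str"     # split on '-' without globbing
    for word in "${words[@]}"; do
        printf '%s\n' "${word:0:1}"
    done
}

first_letters "my-custom-string"
```

Empty words (as in a--b) produce an empty line here; filter them out if that is not wanted.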
Here's a sed version:
echo 'my-custom-string' | sed 's/\(^\|-\)\(.\)[^-]*/\2\n/g'
This might work for you (GNU sed);
echo 'my-custom-string' | sed 's/\B.//g;y/-/,/'
m,c,s
or:
echo 'my-custom-string' | sed 's/\B.//g;y/-/\n/'
m
c
s

How can I cut(1) camelcase words?

Is there an easy way in Bash to split a camelcased word into its constituent words?
For example, I want to split aCertainCamelCasedWord into 'a Certain Camel Cased Word' and be able to select those fields that interest me. This is trivially done with cut(1) when the word separator is the underscore, but how can I do this when the word is camelcased?
sed 's/\([A-Z]\)/ \1/g'
Captures each capital letter and substitutes a leading space with the capture for the whole stream.
$ echo "aCertainCamelCasedWord" | sed 's/\([A-Z]\)/ \1/g'
a Certain Camel Cased Word
This solution works if you need to not split up words that are all caps. For example, using the top answer you'll get:
$ echo 'FAQPage' | sed 's/\([A-Z]\)/ \1/g'
F A Q Page
But instead with my solution, you'll get:
$ echo 'FAQPage' | sed 's/\([A-Z][^A-Z]\)/ \1/g'
FAQ Page
Note: This does not work correctly when there is a second instance of multiple uppercase words, for example:
$ echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two
The answer above does not work correctly when there is a second run of consecutive uppercase letters:
echo 'FAQPageOneReplacedByFAQPageTwo' | sed 's|\([A-Z][^A-Z]\)| \1|g'
FAQ Page One Replaced ByFAQ Page Two
So an additional expression is required for that:
echo 'FAQPageOneReplacedByFAQPageTwo' | sed -e 's|\([A-Z][^A-Z]\)| \1|g' -e 's|\([a-z]\)\([A-Z]\)|\1 \2|g'
FAQ Page One Replaced By FAQ Page Two
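To get back to the original goal of selecting fields, the two sed expressions can be wrapped in a small filter and piped into cut with a space as delimiter. A sketch, assuming a hypothetical function name split_camel:

```bash
#!/usr/bin/env bash
# Hypothetical filter: reads camel-cased words on stdin,
# writes them split into space-separated words.
split_camel() {
    sed -e 's/\([A-Z][^A-Z]\)/ \1/g' \
        -e 's/\([a-z]\)\([A-Z]\)/\1 \2/g'
}

# Select the 2nd and 4th word, as you would with cut on underscores.
echo 'aCertainCamelCasedWord' | split_camel | cut -d ' ' -f 2,4
```

This prints "Certain Cased". If the input starts with a capital followed by a lowercase letter (e.g. CamelCase), the output begins with a space, so field 1 is empty and the words start at field 2.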
Pure Bash:
name="aCertainCamelCasedWord"
declare -a word # the word array
counter1=0 # count characters
counter2=0 # count words
while [ $counter1 -lt ${#name} ] ; do
nextchar=${name:${counter1}:1}
if [[ $nextchar =~ [[:upper:]] ]] ; then
((counter2++))
word[${counter2}]=$nextchar
else
word[${counter2}]=${word[${counter2}]}$nextchar
fi
((counter1++))
done
echo -e "'${word[@]}'"

Resources