The title actually almost explains it all. I would like to check if a string contains a letter (not a specific letter, really any letter) more than once.
for example:
user:
test.sh this list
script:
if [ "$1" has some letter more than once ]
then
do something
fi
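Use a POSIX character class (note that this tests whether the string contains at least two letters, not whether the same letter occurs twice):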
if [[ $1 =~ [[:alpha:]].*[[:alpha:]] ]]; then
echo "more than one letter"
fi
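This regex (in bash) tells you whether a lowercase letter is repeated, and which one it is. It relies on a back-reference inside an extended regex, which glibc supports but POSIX does not guarantee: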
#!/bin/bash
regex="([a-z]).*\1"
if [[ $1 =~ $regex ]]; then
echo "more than one letter \"${BASH_REMATCH[1]}\""
fi
Call as:
$ script.sh "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZz"
more than one letter "z"
Of course, the range of letters could be changed to lower and upper:
[a-zA-Z]
But only if LC_COLLATE is set to "C". With a UTF-8 locale, accented characters may also fall inside the a-z range, as this shows:
$ ./sc.sh abcdefghijklémnopéqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZz
more than one letter "é"
This restricts letters to what ASCII considers a letter:
$ LC_COLLATE=C ./sc.sh abcdefghijklémnopéqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZz
more than one letter "z"
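The range of characters could also be one of the POSIX character classes:
[[:alpha:]] [[:lower:]] [[:upper:]]
([[:word:]] is not a standard POSIX class; bash accepts it in glob patterns, but it may not work inside a regex.)
Please note that what those classes match also depends on the locale in use.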
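If back-references are not available in your regex library, the same check can be sketched with an associative array instead. This is only a minimal sketch: has_repeated_letter is a made-up name, it assumes bash 4 or later, and unlike the regex it reports the letter whose second occurrence is reached first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answers above); needs bash 4+ for
# associative arrays.
has_repeated_letter() {
    local s=$1 ch i
    local -A seen=()
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        # skip anything the current locale does not consider a letter
        [[ $ch == [[:alpha:]] ]] || continue
        if [[ -n ${seen[$ch]} ]]; then
            printf '%s\n' "$ch"    # print the repeated letter
            return 0
        fi
        seen[$ch]=1
    done
    return 1                       # no letter occurred twice
}

if has_repeated_letter "$*" >/dev/null; then
    echo "some letter occurs more than once"
fi
```

Called as has_repeated_letter "this list", it prints i, because the second i is reached before the second s or t.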
If you want to go by using just basic commands, you can use something like this ...
#!/bin/bash
PATH=/bin/:/usr/bin/:$PATH
if [ "$(echo "$*" | tr -d ' ' | sed 's/\(.\)/\1\n/g' | sort | uniq -c | tr -s ' ' | sort -n | grep -v '^ 1 ' | wc -l)" -ge 1 ]
then
echo "Input contains duplicate characters"
fi
In case it is unclear, try out each step on the command line: run echo test input | tr -d ' ' and look at the output, then add the sed part, and so on.
The first tr -d ' ' ensures that spaces in your input are not counted as duplicates. For example, if the input is "abcd efgh ijkl", the only repeated character is the space. With tr -d ' ' in place, the script does not report duplicates for that input; if you remove it, it does.
Cheers.
-- Parag
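The same idea can probably be written more compactly with fold and uniq -d, which prints only the lines that occur more than once. This is a sketch, not the author's script: dup_chars is a made-up name, and it assumes single-byte (ASCII) input, since some versions of fold count bytes rather than characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every character that occurs more than once,
# ignoring spaces. Assumes ASCII input.
dup_chars() {
    printf '%s' "$*" |
        tr -d ' ' |   # spaces should not count as duplicates
        fold -w1 |    # one character per line
        sort |
        uniq -d       # keep only lines that are repeated
}

if [ -n "$(dup_chars "$@")" ]; then
    echo "Input contains duplicate characters"
fi
```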
Related
I have a string such as plantford1775.274.284b63.11.
I have been using identity=$( echo "$identity" | cut -d'.' -f3) to cut at each dot, and then choose the third section. I am left with 284b63.
The format of this part is always a letter, sandwiched by varying amounts of numbers. I would like to take the first few numbers before the letter. An example code line would be this:
identity=$( echo "$identity" | cut -d'anyletter' -f1)
What do I replace anyletter with to cut at whatever letter is listed there, so that I end with a string of 284?
This could be done in a single awk command. Please try the following, written and tested with your samples.
echo "$identity" | awk -F'.' '{sub(/[^0-9].*/,"",$3);print $3}'
Explanation: the output of echo is passed to awk as standard input. In the awk program, the field separator is set to a dot. In the 3rd field, sub() then removes everything from the first non-digit character to the end of the field, and the result is printed.
Try:
echo plantford1775.274.284b63.11 | cut -d. -f3 | sed 's/[a-z].*//'
Or a slight variation on the REGEX, with [[...]] in bash:
v="plantford1775.274.284b63.11"
[[ $v =~ ^[^.]+\.[^.]+\.([^.]+).*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284b63
Or if you are only interested in the digits before the letter:
[[ $v =~ ^[^.]+\.[^.]+\.([[:digit:]]+)[^.]+.*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284
With bash, using the =~ operator :
[[ $identity =~ ^[^.]*\.[^.]*\.([0-9]+) ]] && identity=${BASH_REMATCH[1]}
or, in POSIX shell:
identity=${identity#*.*.}
identity=${identity%%[^0-9]*}
or, using sed:
identity=$(sed 's/[^.]*\.[^.]*\.\([0-9]*\).*/\1/' <<< "$identity")
Maybe you can use a bash regex and get the result from $BASH_REMATCH.
[[ "$identity" =~ ([0-9]+)[a-z][0-9]+ ]] && identity="${BASH_REMATCH[1]}"
Say we have
identity=284b63
then you can do a
lead=${identity%[a-z]*}
to set lead to 284. Feel free to adapt the pattern to upper case letters and/or other separators.
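One way to adapt this so that the letter's case no longer matters is to cut at the first character that is not a digit. The sketch below combines that with the two-step prefix removal shown in the POSIX answer; third_field_digits is a made-up name and the input format is assumed to be as in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the digits at the start of the third
# dot-separated field.
third_field_digits() {
    local id=$1
    id=${id#*.*.}                    # drop everything up to and including the 2nd dot
    printf '%s\n' "${id%%[^0-9]*}"   # drop from the first non-digit to the end
}

third_field_digits "plantford1775.274.284b63.11"   # prints 284
```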
If the format of this part is always a letter, sandwiched by varying amounts of numbers, and you want to match this format, you might also use gnu awk, setting the field separator to . and use a pattern with a capture group for the 3rd field.
The pattern captures 1 or more digits from the start of the string, and matches one or more chars [a-z] after them, followed by a digit.
echo "$identity" | awk -F'.' 'match($3, /^([0-9]+)[a-z]+[0-9]/, ary) {print ary[1]}'
Output
284
Or using sed with a pattern matching the first 2 dots and the capture group after the 2nd dot:
identity=$(sed 's/^[^.]\+\.[^.]\+\.\([0-9]\+\)[a-z]\+[0-9].*/\1/' <<< "$identity")
My intent is to write a shell script that extracts a pattern, using regular expressions, from a file and fills an array with all the occurrences of the pattern so that I can loop over them.
What is the best way to achieve this?
I am trying to do it using sed. And a problem I am facing is that the patterns can have newlines and these newlines must be considered, eg:
File content:
"My name
is XXX"
"My name is YYY"
"Today
is
the "
When I extract all patterns between double quotes, including the double quotes, the output of the first occurrence must be:
"My name
is XXX"
fill an array with all the occurrences of the pattern
First convert your file to use a meaningful delimiter, e.g. a null byte, for example with GNU sed and its -z switch:
sed -z 's/"\([^"]*\)"[^"]*/\1\x00/g'
I've added the [^"]* on the end, so that characters not between " are removed.
After it it becomes more trivial to parse it.
You can get the first element with:
head -z -n1
Or sort and count the occurrences:
sort -z | uniq -z -c
Or load it into an array with bash's mapfile:
mapfile -d '' -t arr < <(sed -z 's/"\([^"]*\)"[^"]*/\1\x00/g' input)
Alternatively you can use e.g. $'\01' as the separator. As long as it is unique, such data becomes simple to parse in bash.
Handling such streams is a bit hard in bash. You can't store a null byte in a shell variable, and command substitutions sometimes print warnings about it. When handling data with arbitrary bytes, I usually convert it to plain ASCII with xxd -p and back with xxd -r -p. With that, it becomes easier.
The following script:
cat <<'EOF' >input
"My name
is XXX"
"My name is YYY"
"Today
is
the "
EOF
sed -z 's/"\([^"]*\)"[^"]*/\1\x00/g' input > input_parsed
echo "##First element is:"
printf '"'
<input_parsed head -z -n1
printf '"\n'
echo "##Elements count are:"
<input_parsed sort -z | uniq -z -c
echo
echo "##The array is:"
mapfile -d '' -t arr <input_parsed
declare -p arr
will output (the formatting is a bit off, because the output from uniq is not newline-delimited):
##First element is:
"My name
is XXX"
##Elements count are:
1 My name
is XXX 1 My name is YYY 1 Today
is
the
##The array is:
declare -a arr=([0]=$'My name\nis XXX' [1]="My name is YYY" [2]=$'Today\nis\nthe ')
Tested on repl.it.
This may be what you're looking for, depending on the answers to the questions I posted in a comment:
$ readarray -d '' -t arr < <(grep -zo '"[^"]*"' file)
$ printf '%s\n' "${arr[0]}"
"My name
is XXX"
$ declare -p arr
declare -a arr=([0]=$'"My name \nis XXX"' [1]="\"My name is YYY\"" [2]=$'"Today\nis\nthe "')
It uses GNU grep for -z.
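Since the question asks for a way to loop over the matches, here is a small self-contained sketch of that last step. It assumes GNU grep and bash 4.4 or later (for readarray -d); the temporary file and the array name quoted are only there for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep (-z, -o) and bash 4.4+ (readarray -d).
tmp=$(mktemp)
printf '"My name\nis XXX"\n"My name is YYY"\n"Today\nis\nthe "\n' > "$tmp"

# -z makes grep treat the whole file as one record, so a match may span
# newlines; each match is written out terminated by a NUL byte.
readarray -d '' -t quoted < <(grep -zo '"[^"]*"' "$tmp")
rm -f "$tmp"

# "foreach" over the matches, embedded newlines included
for item in "${quoted[@]}"; do
    printf 'match: %s\n' "$item"
done
```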
Sed can extract your desired pattern with or without newlines.
But if you want to store the multiple results into a bash array,
it may be easier to make use of bash regex.
Then please try the following:
lines=$(< "file") # slurp all lines
re='"[^"]+"' # regex to match substring between double quotes
while [[ $lines =~ ($re)(.*) ]]; do
array+=("${BASH_REMATCH[1]}") # push the matched pattern to the array
lines=${BASH_REMATCH[2]} # update $lines with the remaining part
done
# report the result
for (( i=0; i<${#array[@]}; i++ )); do
echo "$i: ${array[$i]}"
done
Output:
0: "My name
is XXX"
1: "My name is YYY"
2: "Today
is
the "
Given "ABCDEFGHIJKLMNOPQRSTUVWXY"
How does one achieve this outcome? "ABCDE-FGHIJ-KLMNO-PQRST-UVWXY"
With sed you can do this by first adding a - after every 5 characters, then removing the trailing - at the end of the line:
$ sed -E 's/.{5}/&-/g; s/-$//' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
In extended (-E) mode:
.{5} matches any 5 characters
&- replaces with the whole match (the 5 characters) plus -
Then the second substitution command matches - at the end of the line ($) and replaces with nothing.
With GNU awk, one option would be to use FPAT to define the way the line is interpreted as a series of fields, then add - between each field:
$ awk -v FPAT='.{5}' -v OFS='-' '{ $1 = $1 } 1' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
The field pattern FPAT is defined as any 5 characters and the Output Field Separator OFS is defined as -. $1 = $1 "touches" every line, causing it to be reformatted (without this part, nothing would happen). 1 is the shortest true condition causing each line to be printed.
It's not too difficult to do this in bash either:
#!/bin/bash
input="ABCDEFGHIJKLMNOPQRSTUVWXY"
parts=()
# build an array from slices of length 5
for (( i = 0; i < ${#input}; i += 5 )) do
parts+=( "${input:i:5}" )
done
# join the array on IFS (use a subshell to avoid modifying IFS for rest of script)
( IFS=-; echo "${parts[*]}" )
Could you please try the following.
echo "ABCDEFGHIJKLMNOPQRSTUVWXY" | sed 's/...../&-/g;s/-$//'
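A simple solution for only letters will be
sed -E 's/[A-Z]{4}./&-/g; s/-$//' file.txt
(the second expression removes the hyphen that is otherwise left at the end when the length is a multiple of 5)
The output will be:
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
If you want to include more than capital letters, just use:
sed -E 's/[A-Za-z]{4}./&-/g; s/-$//' file.txt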
Try this
#!/bin/bash
s="ABCDEFGHIJKLMNOPQRSTUVWXY"
a=($(echo ${s} | grep -o .))
o=""
i=0
while [[ ${i} -lt ${#a[@]} ]]; do
o="${o}${a[${i}]}"
(( i++ ))
[[ $(( i % 5 )) -eq 0 ]] && [[ ${i} -ne ${#a[@]} ]] && o="${o}-"
done
echo ${o}
exit 0
another solution with fold/paste
$ echo {A..Y} | tr -d ' ' | # this is to generate the string
fold -w5 | paste -sd-
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
This might work for you (GNU sed):
sed 's/.\{5\}\B/&-/g' file
Insert a hyphen every five characters as long as the fifth character is inside a word.
Yet another choice
perl -pe 's/(.{5})(?=.)/$1-/g' file
Match 5 characters that are followed by another character (to avoid the trailing hyphen problem)
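For reuse, the pure-bash loop shown earlier could be wrapped in a function that takes the group width and separator as parameters. This is a sketch under that assumption; chunk_join and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: chunk_join STRING [WIDTH] [SEPARATOR]
# Defaults: WIDTH=5, SEPARATOR=-
chunk_join() {
    local s=$1 n=${2:-5} sep=${3:--} out= i
    (( n > 0 )) || return 2              # a width of 0 would loop forever
    for (( i = 0; i < ${#s}; i += n )); do
        # prepend the separator only when something was already emitted,
        # so no trailing separator ever needs to be removed
        out+=${out:+$sep}${s:i:n}
    done
    printf '%s\n' "$out"
}

chunk_join "ABCDEFGHIJKLMNOPQRSTUVWXY"   # prints ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
```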
I'd like a bash-only way (no sed, awk, perl, ...) to find out whether a word is in alphabetical order, in other words whether every letter comes after the one before it.
example:
bdjkz is true,
ahjmno is true,
sdgla is false.
I'm already struggling just comparing ascii values for characters, so if anyone could point me in the right direction for that it would help a lot!
Thanks
Pure bash solution (no external tool used), using Parameter Expansion to address characters inside strings:
function compare () {
word=$1
for (( pos=0; pos<${#word}-1; pos++ )) ; do
[[ ${word:pos:1} < ${word:pos+1:1} ]] || return 1
done
return 0
}
Tested with
for word in bdjkz ahjmno sdgla ; do
if compare $word ; then
echo $word ordered
else
echo $word not ordered
fi
done
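Two details of the function above are worth keeping in mind: < inside [[ ]] compares according to the current locale, and a strict < rejects words with doubled letters such as "abbey". A possible variant, assuming plain ASCII words and that repeated letters should count as ordered, could look like this (is_sorted is a made-up name):

```shell
#!/usr/bin/env bash
# Hypothetical variant: byte-wise comparison, repeated letters allowed.
is_sorted() {
    # local LC_ALL=C forces byte order for the comparisons in this function
    local LC_ALL=C word=$1 i
    for (( i = 0; i < ${#word} - 1; i++ )); do
        # fail as soon as a character is greater than its successor
        [[ ${word:i:1} > ${word:i+1:1} ]] && return 1
    done
    return 0
}

for w in bdjkz ahjmno sdgla; do
    if is_sorted "$w"; then echo "$w ordered"; else echo "$w not ordered"; fi
done
```

In the C locale all uppercase letters sort before lowercase ones, so mixed-case words may need lowercasing first.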
If you can utilize other command line tools (but not awk, sed, perl), you can try:
[[ "YOURSTRING" = "$(echo "YOURSTRING" | grep -o '.' | sort | tr -d '\n')" ]] && \
echo "Alphabetic order"
1. [[ ... ]] tests the expression
2. "YOURSTRING" = does a string comparison
3. "$( ... )" captures the output of the inner pipeline in a string
4. echo "YOURSTRING" | grep -o '.' prints every character of "YOURSTRING" on its own line (-o '.': print only the matches for any single character - NOTE: you might need a recent version of grep for this option)
5. ... sort | sorts the output from 4. (plain sort rather than sort -n, since the input is letters, not numbers)
6. ... tr -d '\n' rejoins the characters from 5. (by deleting the newline characters)
You can use:
p='bdjkz'
q=$(fold -w1 <<< "$p"|sort|tr -d "\n")
[[ "$p" == "$q" ]] && echo "in alphabetical order" || echo "not in alphabetical order"
s=($(echo "existingString" | grep -o .)) # put each character of input string in an array.
k=($(printf '%s\n' "${s[@]}" | sort)) # sorts the input string
if [[ "${s[*]}" == "${k[*]}" ]]; then # comparing the input string array with sorted array
echo "alphabetical"
else
echo "not alphabetical"
fi
How can I shift each letter of a string by a given number of letters down or up in bash, without using a hardcoded dictionary?
Do you mean something like ROT13:
pax$ echo 'hello there' | tr '[a-z]' '[n-za-m]'
uryyb gurer
pax$ echo 'hello there' | tr '[a-z]' '[n-za-m]' | tr '[a-z]' '[n-za-m]'
hello there
For a more general solution where you want to provide an arbitrary rotation (0 through 26), you can use:
#!/usr/bin/bash
dual=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
phrase='hello there'
rotat=13
newphrase=$(echo "$phrase" | tr "${dual:0:26}" "${dual:${rotat}:26}")
echo "${newphrase}"
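The same slicing trick can be turned into a small filter that also handles capitals and takes the rotation as an argument. This is only a sketch: rotate is a made-up name, it reads from standard input, and it defaults to 13 when no argument is given.

```shell
#!/usr/bin/env bash
# Hypothetical filter: rotate N  (reads stdin, writes stdout)
rotate() {
    # normalise the shift into 0..25 so negative values also work
    local n=$(( (${1:-13} % 26 + 26) % 26 ))
    local lower=abcdefghijklmnopqrstuvwxyz
    local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    # second set = each alphabet with its first n letters moved to the end
    tr "${lower}${upper}" "${lower:n}${lower:0:n}${upper:n}${upper:0:n}"
}

echo 'hello there' | rotate 13   # prints uryyb gurer
```

Rotating by n and then by 26 - n gives back the original text.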
If you want to rotate also the capitals you could use something like this:
cat data.txt | tr '[a-z]' '[n-za-m]' | tr '[A-Z]' '[N-ZA-M]'
where data.txt has whatever you want to rotate.
$ alpha=abcdefghijklmnopqrstuvwxyz
$ rot=3
$ sed "y/${alpha}/${alpha:$rot}${alpha::$rot}/" <<< 'foobar'
irredu
Shift by 12 characters (A becomes M)
Encryption
----------
$> echo ABCDE | tr '[A-Z]' '[M-ZA-L]' # prints MNOPQ
Decryption
----------
$> echo MNOPQ | tr '[M-ZA-L]' '[A-Z]' # prints ABCDE
In the encryption example, ABCDE is piped to tr, which takes two arguments. The first is the set of characters to match (here, every uppercase letter). The second is the set of replacements: each matched character is replaced by the character at the same position in the second set, that is, the letter 12 places further along the alphabet. The part that may confuse people is that the second set really runs from M round to L. Since tr does not accept a wrap-around range such as [M-L] directly, it is split into two chunks, M-Z and A-L. It works like search and replace: you search with the first argument and substitute with the second.
For the second example, I simply swapped the two arguments to tr, which works as a decryptor. You could also write the decryption in the same form as the first example, but once you have the encryption command it is quicker to swap the arguments.
Or, with a single tr call, which also works:
cat data.txt | tr 'a-zA-Z' 'n-za-mN-ZA-M'
Without using tr: shift by 1 to 25 characters.
The result can be decrypted by shifting with 26 minus the original key.
#!/bin/bash
#set -x
i=0
for letters in {A..Z}
do
abc_cap[$i]="$letters"
((i++))
done
i=0
for letters in {a..z}
do
abc_small[$i]="$letters"
((i++))
done
read -r -p "Enter message to be encrypted/decrypted: " -a message
read -r -p "Enter shift amount (26 - orig key for decrypt): " shift_amount
# validate the shift before printing any output
if [ "$shift_amount" -gt 25 ] || [ "$shift_amount" -lt 1 ]
then
echo "Shift amount out of range"
exit 1
fi
echo -n "Encrypted message: "
for word in "${message[@]}"
do
while read -r -n 1 letter
do
if [[ "$letter" = [a-z] ]]
then
for a in "${!abc_small[@]}"
do
if [ "${abc_small[$a]}" = "$letter" ]
then
a=$(echo "($a + $shift_amount) % 26" | bc)
echo -n "${abc_small[$a]}"
fi
done
elif [[ "$letter" = [A-Z] ]]
then
for a in "${!abc_cap[@]}"
do
if [ "${abc_cap[$a]}" = "$letter" ]
then
a=$(echo "($a + $shift_amount) % 26" | bc)
echo -n "${abc_cap[$a]}"
fi
done
elif [[ "$letter" = "" ]]
then echo -n " "
else echo -n "$letter"
fi
done < <(echo "$word")
done
echo
exit
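A shorter route without tr and without any lookup table is to do the arithmetic on character codes with printf. The sketch below assumes ASCII letters; shift_text is a made-up name, and everything that is not a-z or A-Z is passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shift_text TEXT N
shift_text() {
    local LC_ALL=C                       # make [a-z] and [A-Z] plain ASCII ranges
    local text=$1 k=$(( ($2 % 26 + 26) % 26 ))
    local out= ch code base i
    for (( i = 0; i < ${#text}; i++ )); do
        ch=${text:i:1}
        case $ch in
            [a-z]) base=97 ;;            # code of 'a'
            [A-Z]) base=65 ;;            # code of 'A'
            *) out+=$ch; continue ;;     # not a letter: copy as is
        esac
        printf -v code '%d' "'$ch"       # character -> numeric code
        # shifted code -> octal escape -> character
        printf -v ch "\\$(printf '%03o' $(( (code - base + k) % 26 + base )))"
        out+=$ch
    done
    printf '%s\n' "$out"
}

shift_text 'hello there' 13   # prints uryyb gurer
```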
Problem statement and how this command can help you:
For example, the password is stored in the file data.txt, where all lowercase (a-z) and uppercase (A-Z) letters have been rotated by 13 positions.
The data.txt file contains one line encrypted with the ROT13 (rotation by 13) algorithm. To decrypt it, I have to replace every letter with the letter 13 positions ahead.
The file contains the data shown below:
cat data.txt
Gur cnffjbeq vf WIAOOSFzMjXXBC0KoSKBbJ8puQm5lIEi
After rotating by 13 characters, the text looks like this:
The password is JVNBBFSmZwKKOP0XbFXOoW8chDz5yVRv
The command to do that is given below.
cat data.txt | tr '[A-Za-z]' '[N-ZA-Mn-za-m]'
Explanation of the command
cat data.txt reads the contents of data.txt and passes them to tr. The tr command takes two arguments. The first, [A-Za-z], is the set of characters to translate: A-Z and a-z. The second is the same set rotated by 13 positions (it is a character set, not a regular expression):
[N-ZA-Mn-za-m]
N : the character 13 positions after A.
Z : through to the end of the alphabet.
A : back to the first character.
M : the character just before N, to complete the circle.
The same pattern is repeated for the lowercase letters.
We rotated by 13; to rotate by x characters, replace N and M with the character x positions after A and the one just before it.