I have a bash script that outputs the following:
SUM = 137892134.0000000
I need to strip off the first part of the string, leaving only the number, formatted as an integer if possible. I'm assuming I need to use sed but I seem to have zero capacity to learn it.
I need to be able to write a conditional statement that can operate if the value is less than 100. I don't know if I can do this in a bash script, but that will be the second part of my challenge.
The basic form of a substitution with sed is:
s/replace this/with this/
Where "replace this" is a regular expression and "with this" is the replacement text. In your case, you want to completely get rid of the literal string "SUM = " at the beginning, and of the decimal point and everything after it at the end. So:
#!/bin/bash
sum=$(your_script.sh | sed 's/^SUM = //' | sed 's/\..*//')
if ! grep -Eq '^[0-9]+$' <<< "$sum"; then
echo "your_script.sh printed unexpected output!"
exit 1
fi
if [ "$sum" -lt 100 ]; then
echo "$sum is less than 100"
else
echo "$sum is not less than 100"
fi
The sum=$(...) line is what turns "SUM = 137892134.0000000" into "137892134". The first sed replaces "SUM = " at the beginning of the string (^) with nothing (i.e., deletes it). The second sed finds the first period character (\.) and replaces it and everything after it (.*) with nothing. The resulting string is then saved to the variable $sum using $(...).
The if-statement that uses grep -E checks that the value we saved in $sum consists only of digits, i.e. is a non-negative integer, and bails if it's not.
The second if-statement compares the value of $sum, which we now know is an integer, with 100.
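If starting sed for this feels heavy, the same trimming can be sketched with bash parameter expansion alone. This is an alternative technique rather than the sed pipeline above; the function name extract_int is made up for illustration, and the sketch assumes the line always has the form "SUM = <number>".

```bash
#!/bin/bash
# extract_int (hypothetical helper): print the integer part of a "SUM = N.NNN" line.
extract_int() {
    local value="${1#SUM = }"      # drop the literal "SUM = " prefix, if present
    value="${value%%.*}"           # drop the first "." and everything after it
    if [[ $value =~ ^[0-9]+$ ]]; then
        printf '%s\n' "$value"
    else
        echo "unexpected input: $1" >&2
        return 1
    fi
}

sum=$(extract_int 'SUM = 137892134.0000000') || exit 1
if [ "$sum" -lt 100 ]; then
    echo "$sum is less than 100"
else
    echo "$sum is not less than 100"
fi
```

In a real script the argument would be the captured output of your_script.sh instead of a literal string.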
It's not clear to me how you want to handle "123.789" (whether you would print 124 or 123 when printing as an integer). Consider:
if [ "$( echo SUM = 137892134.0000000 | awk '{printf "%d", $3}' )" -lt 100 ]; then
echo the value is less than 100!!
fi
You can also do:
if echo SUM = 137892134.0000000 | awk '$3 >= 100 { exit 1}'; then
echo the value is less than 100!!
fi
or
if ! echo SUM = 137892134.0000000 | awk '{exit $3 < 100}'; then
echo the value is less than 100!!
fi
Note that the logic is a little convoluted as awk returning 1 evaluates to failure, so the comparison operator is the inverse of what might be expected.
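To make the 123-versus-124 question concrete, here is a small sketch comparing the two awk format strings. The sample line with 123.789 is only an illustration; which behaviour is wanted is still up to you.

```bash
#!/bin/bash
line='SUM = 123.789'

# %d converts to an integer by truncating toward zero
truncated=$(echo "$line" | awk '{printf "%d\n", $3}')

# %.0f rounds to the nearest integer
rounded=$(echo "$line" | awk '{printf "%.0f\n", $3}')

echo "truncated: $truncated"   # truncated: 123
echo "rounded:   $rounded"     # rounded:   124
```

Either result can then be compared with [ "$value" -lt 100 ] as usual.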
Here is one way to use sed to do this:
echo 'SUM = 137892134.0000000' | sed 's/[^0-9.]//g' | sed 's/\..*//g'
This is what the output should look like: 137892134.
Some explanation on the commands:
sed 's/[^0-9.]//g' tells sed to remove any characters that are not numbers (0-9) or periods (.)
sed 's/\..*//g' tells sed to remove any characters (.*) after a decimal (\.)
Also, instead of using echo, you can use the output from your original script for that first part... and then it can be piped into sed to eventually get the final "int" that you want.
Note: this does not take into account any rounding issues as brought up by William Pursell.
I suggest:
bashScript | sed 's/.* \(.*\)\.0*/\1/'
In English: "Take a bunch of stuff followed by a space, followed by something, followed by a dot and maybe some zeroes, and replace all of that with the something."
I have a string such as plantford1775.274.284b63.11.
I have been using identity=$( echo "$identity" | cut -d'.' -f3) to cut at each dot, and then choose the third section. I am left with 284b63.
The format of this part is always a letter, sandwiched by varying amounts of numbers. I would like to take the first few numbers before the letter. An example code line would be this:
identity=$( echo "$identity" | cut -d'anyletter' -f1)
What do I replace anyletter with to cut at whatever letter is listed there, so that I end with a string of 284?
This can be done in a single awk command; the following was written and tested with your samples.
echo "$identity" | awk -F'.' '{sub(/[^0-9].*/,"",$3);print $3}'
Explanation: the output of echo is passed to awk as standard input. awk splits each line into fields on . (the -F'.' option). In the 3rd field, sub() replaces the first non-digit character and everything after it with an empty string, which leaves only the leading digits; the field is then printed.
Try:
echo plantford1775.274.284b63.11 | cut -d. -f3 | sed 's/[a-z].*//'
Or a slight variation on the REGEX, with [[...]] in bash:
v="plantford1775.274.284b63.11"
[[ $v =~ ^[^.]+\.[^.]+\.([^.]+).*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284b63
Or if you are only interested in the digits before the letter:
[[ $v =~ ^[^.]+\.[^.]+\.([[:digit:]]+)[^.]+.*$ ]] && echo "${BASH_REMATCH[1]}"
Output
284
With bash, using the =~ operator :
[[ $identity =~ ^[^.]*\.[^.]*\.([0-9]+) ]] && identity=${BASH_REMATCH[1]}
or, in POSIX shell:
identity=${identity#*.*.}
identity=${identity%%[^0-9]*}
or, using sed:
identity=$(sed 's/^[^.]*\.[^.]*\.\([0-9]*\).*/\1/' <<< "$identity")
Maybe you can use a bash regex and get the result from $BASH_REMATCH.
[[ "$identity" =~ ([0-9]+)[a-z][0-9]+ ]] && identity="${BASH_REMATCH[1]}"
Say we have
identity=284b63
then you can do a
lead=${identity%[a-z]*}
to set lead to 284. Feel free to adapt the pattern to upper case letters and/or other separators.
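One way to adapt the pattern, sketched below, is to cut at the first character that is not a digit, which covers upper-case letters and other separators alike. The helper name leading_digits is made up for illustration. Note that it uses %% (longest match) rather than %, so that a part containing several non-digits is still cut at the first one.

```bash
#!/bin/bash
# leading_digits (hypothetical helper): print the digits that precede the first non-digit.
leading_digits() {
    # [!0-9]* matches a suffix that starts with a non-digit; %% removes the longest such suffix
    printf '%s\n' "${1%%[!0-9]*}"
}

leading_digits 284b63    # 284
leading_digits 284B63    # 284
leading_digits 284-63    # 284
```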
If the format of this part is always a letter, sandwiched by varying amounts of numbers, and you want to match this format, you might also use gnu awk, setting the field separator to . and use a pattern with a capture group for the 3rd field.
The pattern captures one or more digits from the start of the field, then matches one or more characters in [a-z], followed by a digit.
echo "$identity" | awk -F'.' 'match($3, /^([0-9]+)[a-z]+[0-9]/, ary) {print ary[1]}'
Output
284
Or using sed with a pattern matching the first 2 dots and the capture group after the 2nd dot:
identity=$(sed 's/^[^.]\+\.[^\.]\+\.\([0-9]\+\)[a-z]\+[0-9].*/\1/' <<< "$identity")
I'm learning bash, and I have an assignment where I need to iterate through a list of strings in bash using a for loop, and return the longest string.
This is what I've written:
max=-1
word=""
list=`cat random-text.txt | tr -s [:space:] " " | sed -r 's/([.* ])/\1\n/g' | grep -E "^a.*" | sed -r 's/(.*)[[:space:]]/\1/' | tr -s [:space:] " "`
for i in $list; do
int=`$i | wc -c`
if [ $int > $max ]; then
max=$int
word=$i
fi
done
echo The longest word in $infile that starts with $char is $i
That's probably a bit messy, but I'm having trouble using the for loop (I need the echo command at the end to print the longest string I have found while iterating through the list).
** that's a part of a longer script I've written, I
Thanks in advance, much appreciated!
For some reason, when I run this script I get an error which says: "Command 'an' not found".
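That's because $i | tries to run the content of variable i as a command instead of feeding it to wc. With Bash, wc -c <<<"$i" would do that (although the count then includes the newline that the here-string appends). Better still, use just int=${#i}.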
Then in $int > $max the > is interpreted as an output redirection; the correct arithmetic comparison operator is -gt.
Finally you don't echo the longest word found, but rather the last processed one; change $i to $word there.
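Putting the three corrections together, the loop could be sketched as follows. The function name longest_word and the sample words are made up for illustration; in the original script the words would come from $list.

```bash
#!/bin/bash
# longest_word (hypothetical helper): print the longest of its arguments.
longest_word() {
    local max=-1 word="" w
    for w in "$@"; do
        # ${#w} is the length of $w; -gt is the numeric "greater than" test
        if [ "${#w}" -gt "$max" ]; then
            max=${#w}
            word=$w
        fi
    done
    # print the remembered word, not the loop variable
    printf '%s\n' "$word"
}

longest_word an apple avocado at    # avocado
```

When several words share the maximum length, this version keeps the first one it saw.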
Given "ABCDEFGHIJKLMNOPQRSTUVWXY"
How does one achieve this outcome? "ABCDE-FGHIJ-KLMNO-PQRST-UVWXY"
With sed you can do this by first adding a - after every 5 characters, then removing the trailing - at the end of the line:
$ sed -E 's/.{5}/&-/g; s/-$//' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
In extended (-E) mode:
.{5} matches any 5 characters
&- replaces with the whole match (the 5 characters) plus -
Then the second substitution command matches - at the end of the line ($) and replaces with nothing.
With GNU awk, one option would be to use FPAT to define the way the line is interpreted as a series of fields, then add - between each field:
$ awk -v FPAT='.{5}' -v OFS='-' '{ $1 = $1 } 1' <<<"ABCDEFGHIJKLMNOPQRSTUVWXY"
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
The field pattern FPAT is defined as any 5 characters and the Output Field Separator OFS is defined as -. $1 = $1 "touches" every line, causing it to be reformatted (without this part, nothing would happen). 1 is the shortest true condition causing each line to be printed.
It's not too difficult to do this in bash either:
#!/bin/bash
input="ABCDEFGHIJKLMNOPQRSTUVWXY"
parts=()
# build an array from slices of length 5
for (( i = 0; i < ${#input}; i += 5 )) do
parts+=( "${input:i:5}" )
done
# join the array on IFS (use a subshell to avoid modifying IFS for rest of script)
( IFS=-; echo "${parts[*]}" )
Could you please try the following.
echo "ABCDEFGHIJKLMNOPQRSTUVWXY" | sed 's/...../&-/g;s/-$//'
A simple solution for only letters will be
sed -E 's/[A-Z]{4}./&-/g' file.txt
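With file.txt containing ABCDEFGHIJKLMOPQRSTUVWXY (24 characters, no N), the output will be the line below. If the input length is an exact multiple of 5, a trailing - is appended as well; adding s/-$// as a second substitution removes it.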
ABCDE-FGHIJ-KLMOP-QRSTU-VWXY
if you want them to include more than capital letters just do a:
sed -E 's/[A-Za-z]{4}./&-/g' file.txt
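For inputs whose length is an exact multiple of 5, a group-and-append substitution like this leaves a hyphen at the very end. A minimal sketch of handling that, with a made-up helper name hyphenate and assuming the input does not itself end in a hyphen:

```bash
#!/bin/bash
# hyphenate (hypothetical helper): insert "-" after every 5 characters,
# then strip the "-" that would otherwise be left at the end of the line.
hyphenate() {
    sed -E 's/.{5}/&-/g; s/-$//' <<<"$1"
}

hyphenate ABCDEFGHIJKLMNOPQRSTUVWXY   # ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
hyphenate ABCDEFG                     # ABCDE-FG
```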
Try this
#!/bin/bash
s="ABCDEFGHIJKLMNOPQRSTUVWXY"
a=($(echo ${s} | grep -o .))
o=""
i=0
while [[ ${i} -lt ${#a[@]} ]]; do
o="${o}${a[${i}]}"
(( i++ ))
[[ $(( i % 5 )) -eq 0 ]] && [[ ${i} -ne ${#a[@]} ]] && o="${o}-"
done
echo ${o}
exit 0
another solution with fold/paste
$ echo {A..Y} | tr -d ' ' | # this is to generate the string
fold -w5 | paste -sd-
ABCDE-FGHIJ-KLMNO-PQRST-UVWXY
This might work for you (GNU sed):
sed 's/.\{5\}\B/&-/g' file
Insert a hyphen every five characters as long as the fifth character is inside a word.
Yet another choice
perl -pe 's/(.{5})(?=.)/$1-/g' file
Match 5 characters that are followed by another character (to avoid the trailing hyphen problem)
I have a variable like this:
words="这是一条狗。"
I want to make a for loop on each of the characters, one at a time, e.g. first character="这", then character="是", character="一", etc.
The only way I know is to output each character to separate line in a file, then use while read line, but this seems very inefficient.
How can I process each character in a string through a for loop?
You can use a C-style for loop:
foo=string
for (( i=0; i<${#foo}; i++ )); do
echo "${foo:$i:1}"
done
${#foo} expands to the length of foo. ${foo:$i:1} expands to the substring starting at position $i of length 1.
With sed, run from the dash shell with LANG=en_US.UTF-8, the following works correctly (the \n in the replacement relies on GNU sed):
$ echo "你好嗎 新年好。全型句號" | sed -e 's/\(.\)/\1\n/g'
你
好
嗎
新
年
好
。
全
型
句
號
and
$ echo "Hello world" | sed -e 's/\(.\)/\1\n/g'
H
e
l
l
o
w
o
r
l
d
Thus, output can be looped with while read ... ; do ... ; done
Edit: the sample text, translated into English:
"你好嗎 新年好。全型句號" is Traditional Chinese (zh_TW) text encoded as UTF-8:
"你好嗎" = How are you[ doing]
" " = a normal space character
"新年好" = Happy new year
"。全型句號" = a full-width full stop, followed by the words "full-width full stop"
${#var} returns the length of var
${var:pos:N} returns N characters from pos onwards
Examples:
$ words="abc"
$ echo ${words:0:1}
a
$ echo ${words:1:1}
b
$ echo ${words:2:1}
c
so it is easy to iterate.
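As a sketch of what such an iteration might look like in practice, the function below walks over a string with ${#var} and ${var:pos:N} and counts how often one character occurs. The name count_char is made up for illustration.

```bash
#!/bin/bash
# count_char (hypothetical helper): count how often character $2 occurs in string $1.
count_char() {
    local s=$1 c=$2 i n=0
    for (( i = 0; i < ${#s}; i++ )); do
        # ${s:i:1} is the single character at 0-based position i
        [ "${s:i:1}" = "$c" ] && n=$((n + 1))
    done
    echo "$n"
}

count_char "hello world" l    # 3
```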
another way:
$ grep -o . <<< "abc"
a
b
c
or
$ grep -o . <<< "abc" | while read letter; do echo "my letter is $letter" ; done
my letter is a
my letter is b
my letter is c
I'm surprised no one has mentioned the obvious bash solution utilizing only while and read.
while read -n1 character; do
echo "$character"
done < <(echo -n "$words")
Note the use of echo -n to avoid the extraneous newline at the end. printf is another good option and may be more suitable for your particular needs. If you want to ignore whitespace then replace "$words" with "${words// /}".
Another option is fold. Please note however that it should never be fed into a for loop. Rather, use a while loop as follows:
while read char; do
echo "$char"
done < <(fold -w1 <<<"$words")
The primary benefit to using the external fold command (of the coreutils package) would be brevity. You can feed its output to another command such as xargs (part of the findutils package) as follows:
fold -w1 <<<"$words" | xargs -I% -- echo %
You'll want to replace the echo command used in the example above with the command you'd like to run against each character. Note that xargs will discard whitespace by default. You can use -d '\n' to disable that behavior.
Internationalization
I just tested fold with some of the Asian characters and realized it doesn't have Unicode support. So while it is fine for ASCII needs, it won't work for everyone. In that case there are some alternatives.
I'd probably replace fold -w1 with awk, using an empty field separator so that each character becomes its own field:
awk 'BEGIN{FS=""} {for (i=1;i<=NF;i++) print $i}'
Or the grep command mentioned in another answer:
grep -o .
Performance
FYI, I benchmarked the options above. The while loop and the fold loop were fast and nearly tied (5.8 s and 6.1 s real time, respectively), and the awk and grep loops were close behind. Unsurprisingly, xargs was the slowest: roughly 75x slower.
Here is the (abbreviated) test code:
words=$(python -c 'from string import ascii_letters as l; print(l * 100)')
testrunner(){
for test in test_while_loop test_fold_loop test_fold_xargs test_awk_loop test_grep_loop; do
echo "$test"
(time for (( i=1; i<$((${1:-100} + 1)); i++ )); do "$test"; done >/dev/null) 2>&1 | sed '/^$/d'
echo
done
}
testrunner 100
Here are the results:
test_while_loop
real 0m5.821s
user 0m5.322s
sys 0m0.526s
test_fold_loop
real 0m6.051s
user 0m5.260s
sys 0m0.822s
test_fold_xargs
real 7m13.444s
user 0m24.531s
sys 6m44.704s
test_awk_loop
real 0m6.507s
user 0m5.858s
sys 0m0.788s
test_grep_loop
real 0m6.179s
user 0m5.409s
sys 0m0.921s
I believe there is still no ideal solution that would correctly preserve all whitespace characters and is fast enough, so I'll post my answer. Using ${foo:$i:1} works, but is very slow, which is especially noticeable with large strings, as I will show below.
My idea is an expansion of a method proposed by Six, which involves read -n1, with some changes to keep all characters and work correctly for any string:
while IFS='' read -r -d '' -n 1 char; do
# do something with $char
done < <(printf %s "$string")
How it works:
IFS='' - Redefining internal field separator to empty string prevents stripping of spaces and tabs. Doing it on a same line as read means that it will not affect other shell commands.
-r - Means "raw": backslashes are read literally instead of acting as escape characters (this includes a \ at the end of a line, which would otherwise join it with the next line).
-d '' - Passing empty string as a delimiter prevents read from stripping newline characters. Actually means that null byte is used as a delimiter. -d '' is equal to -d $'\0'.
-n 1 - Means that one character at a time will be read.
printf %s "$string" - Using printf instead of echo -n is safer, because echo treats -n and -e as options. If you pass "-e" as a string, echo will not print anything.
< <(...) - Passing string to the loop using process substitution. If you use here-strings instead (done <<< "$string"), an extra newline character is appended at the end. Also, passing string through a pipe (printf %s "$string" | while ...) would make the loop run in a subshell, which means all variable operations are local within the loop.
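A quick way to check that the options listed above really keep every character is to count loop iterations for a string that contains a space, a tab and a newline. This is only a sketch; the helper name count_all_chars is made up for illustration.

```bash
#!/bin/bash
# count_all_chars (hypothetical helper): count every character of $1,
# including spaces, tabs and newlines.
count_all_chars() {
    local char n=0
    while IFS='' read -r -d '' -n 1 char; do
        n=$((n + 1))
    done < <(printf %s "$1")
    echo "$n"
}

# a, space, b, tab, c, newline
count_all_chars $'a b\tc\n'    # 6
```

The count matches ${#string} for the same input, which suggests that nothing is dropped along the way.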
Now, let's test the performance with a huge string.
I used the following file as a source:
https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt
The following script was called through time command:
#!/bin/bash
# Saving contents of the file into a variable named `string'.
# This is for test purposes only. In real code, you should use
# `done < "filename"' construct if you wish to read from a file.
# Using `string="$(cat makefiles.txt)"' would strip trailing newlines.
IFS='' read -r -d '' string < makefiles.txt
while IFS='' read -r -d '' -n 1 char; do
# remake the string by adding one character at a time
new_string+="$char"
done < <(printf %s "$string")
# confirm that new string is identical to the original
diff -u makefiles.txt <(printf %s "$new_string")
And the result is:
$ time ./test.sh
real 0m1.161s
user 0m1.036s
sys 0m0.116s
As we can see, it is quite fast.
Next, I replaced the loop with one that uses parameter expansion:
for (( i=0 ; i<${#string}; i++ )); do
new_string+="${string:$i:1}"
done
The output shows exactly how bad the performance loss is:
$ time ./test.sh
real 2m38.540s
user 2m34.916s
sys 0m3.576s
The exact numbers may vary on different systems, but the overall picture should be similar.
I've only tested this with ascii strings, but you could do something like:
while test -n "$words"; do
c=${words:0:1} # Get the first character
echo character is "'$c'"
words=${words:1} # trim the first character
done
It is also possible to split the string into a character array using fold and then iterate over this array:
for char in `echo "这是一条狗。" | fold -w1`; do
echo $char
done
The C style loop in @chepner's answer is in the shell function update_terminal_cwd, and the grep -o . solution is clever, but I was surprised not to see a solution using seq. Here's mine:
read word
for i in $(seq 1 ${#word}); do
echo "${word:i-1:1}"
done
#!/bin/bash
word=$(echo 'Your Message' |fold -w 1)
for letter in ${word} ; do echo "${letter} is a letter"; done
Here is the output:
Y is a letter
o is a letter
u is a letter
r is a letter
M is a letter
e is a letter
s is a letter
s is a letter
a is a letter
g is a letter
e is a letter
To iterate ASCII characters on a POSIX-compliant shell, you can avoid external tools by using the Parameter Expansions:
#!/bin/sh
str="Hello World!"
while [ ${#str} -gt 0 ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done
or
str="Hello World!"
while [ -n "$str" ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done
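Quoting $next inside the expansion matters as soon as the string contains pattern characters such as * or ?, because the part after % is otherwise treated as a glob pattern. A minimal sketch, with a made-up function name each_char, that uses only POSIX shell features:

```bash
# each_char (hypothetical helper): print every character of $1 on its own line.
each_char() {
    str=$1
    while [ -n "$str" ]; do
        next=${str#?}                     # everything after the first character
        printf '%s\n' "${str%"$next"}"    # quoted, so $next is matched literally
        str=$next
    done
}

each_char 'a*b'
```

This prints a, * and b on separate lines. With $next left unquoted, the first iteration would strip only the b (the shortest suffix matching the pattern *b) and print a* instead of a.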
sed works with unicode
IFS=$'\n'
for z in $(sed 's/./&\n/g' <(printf '你好嗎')); do
echo hello: "$z"
done
outputs
hello: 你
hello: 好
hello: 嗎
Another approach, if you don't care about whitespace being ignored:
for char in $(sed -E s/'(.)'/'\1 '/g <<<"$your_string"); do
# Handle $char here
done
Another way is:
Characters="TESTING"
index=1
while [ $index -le ${#Characters} ]
do
echo ${Characters} | cut -c${index}-${index}
index=$(expr $index + 1)
done
fold and while read are great for the job as shown in some answers here. Contrary to those answers, I think it's much more intuitive to pipe in the order of execution:
echo "asdfg" | fold -w 1 | while read c; do
echo -n "$c "
done
Outputs: a s d f g
I share my solution:
read word
for char in $(grep -o . <<<"$word") ; do
echo $char
done
TEXT="hello world"
for i in {1..${#TEXT}}; do
echo ${TEXT[i]}
done
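where {1..N} is an inclusive range,
${#TEXT} is the number of characters in the string, and
${TEXT[i]} gets a character from the string like an item from an array.
Note that this is zsh syntax. In bash, brace expansion is performed before parameter expansion, so {1..${#TEXT}} does not become a range, and ${TEXT[i]} does not index into a string.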
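For readers running bash rather than zsh, the sketch below shows what bash makes of the brace expression and one possible bash equivalent of the loop, using substring expansion with 0-based offsets. It is an adaptation, not the snippet above.

```bash
#!/bin/bash
TEXT="hello world"

# In bash the braces are processed before ${#TEXT} is expanded,
# so no range is generated and the braces stay literal:
echo {1..${#TEXT}}          # {1..11}

# A bash equivalent of the loop:
for (( i = 0; i < ${#TEXT}; i++ )); do
    echo "${TEXT:i:1}"
done
```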
How can i shift each letter of a string by a given number of letters down or up in bash, without using a hardcoded dictionary?
Do you mean something like ROT13:
pax$ echo 'hello there' | tr '[a-z]' '[n-za-m]'
uryyb gurer
pax$ echo 'hello there' | tr '[a-z]' '[n-za-m]' | tr '[a-z]' '[n-za-m]'
hello there
For a more general solution where you want to provide an arbitrary rotation (0 through 26), you can use:
#!/usr/bin/bash
dual=abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz
phrase='hello there'
rotat=13
newphrase=$(echo $phrase | tr "${dual:0:26}" "${dual:${rotat}:26}")
echo ${newphrase}
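The same idea can be wrapped in a function that also handles capitals and negative shifts. This is a sketch under assumptions: the name rotate and its argument order are made up for illustration, and it builds the rotated alphabet from two substrings instead of the doubled string used above.

```bash
#!/bin/bash
# rotate (hypothetical helper): rotate the letters of $2 by $1 positions.
rotate() {
    local lower=abcdefghijklmnopqrstuvwxyz
    local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    # normalise the shift into 0..25 so that negative values work too
    local n=$(( ($1 % 26 + 26) % 26 ))
    printf '%s\n' "$2" |
        tr "${lower}${upper}" "${lower:n}${lower:0:n}${upper:n}${upper:0:n}"
}

rotate 13 'Hello There'   # Uryyb Gurer
rotate 3 foobar           # irredu
rotate -3 irredu          # foobar
```

Rotating by -n (or by 26 - n) undoes a rotation by n.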
If you want to rotate also the capitals you could use something like this:
cat data.txt | tr '[a-z]' '[n-za-m]' | tr '[A-Z]' '[N-ZA-M]'
where data.txt has whatever you want to rotate.
$ alpha=abcdefghijklmnopqrstuvwxyz
$ rot=3
$ sed "y/${alpha}/${alpha:$rot}${alpha::$rot}/" <<< 'foobar'
irredu
Shift by 12 characters (A becomes M; decryption maps M back to A)
Encryption
----------
$> echo ABCDE | tr '[A-Z]' '[M-ZA-L]' # prints MNOPQ
Decryption
----------
$> echo MNOPQ | tr '[M-ZA-L]' '[A-Z]' # prints ABCDE
In the encryption example, we pipe ABCDE to tr, which is given two arguments. tr maps each character in the first set to the character at the same position in the second set. The first set, [A-Z], covers every uppercase letter; the second set lists the same alphabet starting at M, so each letter is replaced by the letter 12 positions after it. Conceptually the second set is the range M through L, but tr does not accept a range that wraps around, so it is split into two chunks: M-Z followed by A-L. (The square brackets are ordinary characters here and simply map to themselves.) It works like a search-and-replace: characters found in the first set are replaced by their counterparts in the second.
For the second example, I've just swapped the first argument with the second one in the tr command. Which acts perfectly as a decryptor. You could write it the same way as the first example, but I find it less time consuming when I have the encryption algorithm and I can just swap the arguments to have a decryption algorithm as well.
Or
cat data.txt | tr 'a-zA-Z' 'n-za-mN-ZA-M'
It will also work
Without using tr, shift 1 to 25 characters
and can be decrypted using 26 - original key
#!/bin/bash
#set -x
i=0
for letters in {A..Z}
do
abc_cap[$i]="$letters"
((i++))
done
i=0
for letters in {a..z}
do
abc_small[$i]="$letters"
((i++))
done
read -r -p "Enter message to be encrypted/decrypted: " -a message
read -r -p "Enter shift amount (26 - orig key for decrypt): " shift_amount
if [ "$shift_amount" -gt 25 ] || [ "$shift_amount" -lt 1 ]
then
echo "Shift amount out of range"
exit 1
fi
echo -n "Encrypted message: "
for word in "${message[@]}"
do
while read -r -n 1 letter
do
if [[ "$letter" = [a-z] ]]
then
for a in "${!abc_small[@]}"
do
if [ "${abc_small[$a]}" = "$letter" ]
then
a=$(echo "($a + $shift_amount) % 26" | bc)
echo -n "${abc_small[$a]}"
fi
done
elif [[ "$letter" = [A-Z] ]]
then
for a in "${!abc_cap[@]}"
do
if [ "${abc_cap[$a]}" = "$letter" ]
then
a=$(echo "($a + $shift_amount) % 26" | bc)
echo -n "${abc_cap[$a]}"
fi
done
elif [[ "$letter" = "" ]]
then echo -n " "
else echo -n "$letter"
fi
done < <(echo "$word")
done
echo
exit
Problem statement and how this command can help you:
For example, the password is stored in the file data.txt, where all lowercase (a-z) and uppercase (A-Z) letters have been rotated by 13 positions.
The data.txt file contains one line encrypted with the ROT13 (rotation by 13) algorithm. To decrypt it, I have to replace every letter with the letter 13 positions ahead, wrapping around at the end of the alphabet.
The file contains the data shown below:
cat data.txt
Gur cnffjbeq vf WIAOOSFzMjXXBC0KoSKBbJ8puQm5lIEi
After rotating by 13 characters, the line looks like this:
The password is JVNBBFSmZwKKOP0XbFXOoW8chDz5yVRv
The command to do that is given below.
cat data.txt | tr '[A-Za-z]' '[N-ZA-Mn-za-m]'
Explanation of the Command
cat data.txt reads the contents of data.txt and passes them to tr. tr takes two arguments, both of which are sets of characters rather than regular expressions. The first set, [A-Za-z], selects the letters A-Z and a-z; the second set lists the rotated alphabet, and each character in the first set is replaced by the character at the same position in the second.
[the letter 13 positions after A - Z, then A - the letter before it, and the same again for lowercase letters]
[N-ZA-Mn-za-m]
N : the letter 13 positions after A.
Z : to the end of the alphabet.
A : back to the first letter.
M : the letter just before N, to complete the circle.
The same expression is repeated for lowercase letters.
We rotated by 13 here; to rotate by x characters, replace N and M with the letter x positions after A and the letter just before it.
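As an illustration of that last point, here is a sketch for x = 5, where F is the letter 5 positions after A and E is the letter just before it. The sample word is made up; the same sets would be applied to data.txt.

```bash
#!/bin/bash
# Rotate by 5: the second set starts at F and wraps around to E.
echo 'Hello' | tr 'A-Za-z' 'F-ZA-Ef-za-e'    # Mjqqt

# Undo it by rotating a further 21 (26 - 5) positions: start at V, wrap to U.
echo 'Mjqqt' | tr 'A-Za-z' 'V-ZA-Uv-za-u'    # Hello
```

Only ROT13 is its own inverse; for any other x the decrypting set has to rotate by 26 - x.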