Here is what we have in the $foo variable:
abc bcd cde def
We need to echo the first part of the variable ONLY, and do this repeatedly until there's nothing left.
Example:
$ magic_while_code_here
I am on abc
I am on bcd
I am on cde
I am on def
It should use the first word, then remove it from the variable, then use the new first word, and so on until the variable is empty, at which point it quits.
So the variable would be abc bcd cde def, then bcd cde def, then cde def, etc.
We would show what we have tried, but we are not sure where to start.
If you want to use a while loop and take parts from the beginning of the string, you can use the cut command:
foo="abc bcd cde def"
while :
do
p1=`cut -f1 -d" " <<<"$foo"`
echo "I am on $p1"
foo=`cut -f2- -d" " <<<"$foo"`
if [ "$p1" == "$foo" ]; then
break
fi
done
This will output:
I am on abc
I am on bcd
I am on cde
I am on def
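The same loop can also be written without cut, using only parameter expansion. This is a pure-Bash sketch of the same idea, not part of the original answer:

```shell
foo="abc bcd cde def"
while [ -n "$foo" ]; do
    word=${foo%% *}                 # first word: strip everything from the first space on
    echo "I am on $word"
    [ "$word" = "$foo" ] && break   # last word left: nothing more to strip
    foo=${foo#* }                   # drop the first word and the following space
done
```

It produces the same four "I am on ..." lines, without spawning a cut process per word.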
Assuming the variable consists of sequences of only alphabetic characters separated by spaces, tabs, or newlines, we can (ab)use word-splitting expansion and just use printf:
foo="abc bcd cde def"
printf "I am on %s\n" $foo
will output:
I am on abc
I am on bcd
I am on cde
I am on def
I would use read -a to read the string into an array, then print it:
$ foo='abc bcd cde def'
$ read -ra arr <<< "$foo"
$ printf 'I am on %s\n' "${arr[@]}"
I am on abc
I am on bcd
I am on cde
I am on def
The -r option makes sure backslashes in $foo aren't interpreted as escape characters; -a splits the input on whitespace and stores the resulting words in an array, so $foo can contain almost any characters.
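To see what -r changes, here is a quick illustration (my own, not part of the answer): without -r, read treats a backslash as an escape character, so a backslash-space sequence no longer splits:

```shell
s='a\ b c'
read -ra with_r <<< "$s"      # -r: backslash is literal -> 3 words: 'a\' 'b' 'c'
read -a no_r <<< "$s"         # no -r: '\ ' escapes the space -> 2 words: 'a b' 'c'
echo "with -r: ${#with_r[@]} words; without -r: ${#no_r[@]} words"
```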
Alternatively, if you can use awk, you could loop over all fields like this:
awk '{for (i=1; i<=NF; ++i) {print "I am on", $i}}' <<< "$foo"
I have many strings that look like the following:
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
Using bash, I would like to just get the string after the last '.'(dot) character.
So the output I want:
xyz
abc
mno
pqr
Is there any way to do this?
AWK will do it. I'm using GNU AWK:
$ awk -F '.' '{print $NF}' <<EOF
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
EOF
xyz
abc
mno
pqr
AWK splits lines into fields, and we use -F to set the field separator to '.'. Fields are indexed from 1, so $1 would get the first one (e.g. word1 in the first line). NF (for "number of fields") holds the field count, so $NF is the value of the last field in each line.
https://www.grymoire.com/Unix/Awk.html is a great tutorial on AWK.
You can then just use a for loop to iterate over each of the resulting lines:
$ lines=$(awk -F '.' '{print $NF}' <<EOF
word1.word2.word3.xyz
word1.word2.word3.word4.abc
word1.word2.mno
word1.word2.word3.pqr
EOF
)
$ for line in $lines; do echo $line; done
xyz
abc
mno
pqr
I'm using command substitution here - see the Advanced Bash Scripting Guide for information on loops, command substitution and other useful things.
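Note that for line in $lines relies on word splitting of the unquoted variable. If the extracted fields could ever contain spaces, reading them straight into an array with mapfile (Bash 4+) is a more robust sketch of the same idea:

```shell
# Feed the sample strings through the same awk extraction,
# collecting one array element per output line.
mapfile -t lines < <(
  printf '%s\n' word1.word2.word3.xyz word1.word2.word3.word4.abc \
                word1.word2.mno word1.word2.word3.pqr |
  awk -F '.' '{print $NF}'
)
printf '%s\n' "${lines[@]}"
```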
One simple solution would be to split the string on '.' and then take the last item of the resulting array:
lines=(word1.word2.word3.xyz word1.word2.word3.xyz word1.word2.word3.word4.abc word1.word2.mno word1.word2.word3.pqr abcdef 'a * b')
for line in "${lines[@]}"
do
line_split=(${line//./ }) # intentionally unquoted for word splitting; beware that glob characters (like the '*' in 'a * b') may expand
echo "${line_split[-1]}"
done
Another clean shell-checked way would be (the idea is the same)
lines=(word1.word2.word3.xyz word1.word2.word3.xyz word1.word2.word3.word4.abc word1.word2.mno word1.word2.word3.pqr abcdef)
for line in "${lines[@]}"; do
if [[ $line == *.* ]]; then # check if line contains dot character
IFS=. read -r -a split_array <<<"$line" # one-line solution
echo "${split_array[-1]}" # shows the results
else
echo "No dot in string: $line"
fi
done
This is a one-liner solution (after array assignment), without using an explicit loop (but using printf's implicit loop).
arr=( 'word1.word2.word3.xyz'
'word1.word2.word3.word4.abc'
'word1.word2.mno'
'word1.word2.word3.pqr' )
printf '%s\n' "${arr[@]##*.}"
I am trying to process the output of another script that looks a little something like this:
xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]
What I want to do is to be able to find the first substring surrounded by quotes, confirm the value (i.e. "ABCD") and then take all the remaining substrings (there is a variable number of substrings) and put them in an array.
I've been looking around for the answer to this but the references I've been able to find involve just extracting one substring and not multiples.
This Shellcheck-clean demonstration program shows a way to do it with Bash's own regular expression matching ([[ str =~ regex ]]):
#! /bin/bash -p
input='xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'
# Regular expression to match strings with double quoted substrings.
# The first parenthesized subexpression matches the first string in quotes.
# The second parenthesized subexpression matches the entire portion of the
# string after the first quoted substring.
quotes_rx='^[^"]*"([^"]*)"(.*)$'
if [[ $input =~ $quotes_rx ]]; then
if [[ ${BASH_REMATCH[1]} == ABCD ]]; then
tmpstr=${BASH_REMATCH[2]}
else
echo "First quoted substring is not 'ABCD'" >&2
exit 1
fi
else
echo 'Input does not contain any quoted substrings' >&2
exit 1
fi
quoted_strings=()
while [[ $tmpstr =~ $quotes_rx ]]; do
quoted_strings+=( "${BASH_REMATCH[1]}" )
tmpstr=${BASH_REMATCH[2]}
done
declare -p quoted_strings
See mkelement0's excellent answer to How do I use a regex in a shell script? for information about Bash's regular expression matching.
This awk tests for the content between the first pair of " characters, and extracts everything between subsequent pairs.
awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}'
To populate a bash array, you could use mapfile and process substitution:
mapfile -t arr < <( … )
Testing:
mapfile -t arr < <(
awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}' \
<<< 'xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'
)
printf '%s\n' "${arr[@]}"
EFGH
IJKL
MNOP
QRST
UVWX
YZ12
I have a string containing duplicate words, for example:
abc, def, abc, def
How can I remove the duplicates? The string that I need is:
abc, def
We have this test file:
$ cat file
abc, def, abc, def
To remove duplicate words:
$ sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' file
abc, def
How it works
:a
This defines a label a.
s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g
This looks for a duplicated word consisting of alphanumeric characters and removes the second occurrence.
ta
If the last substitution command resulted in a change, this jumps back to label a to try again.
In this way, the code keeps looking for duplicates until none remain.
s/(, )+/, /g; s/, *$//
These two substitution commands clean up any left over comma-space combinations.
Mac OSX or other BSD System
For Mac OSX or other BSD system, try:
sed -E -e ':a' -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' file
Using a string instead of a file
sed easily handles input either from a file, as shown above, or from a shell string as shown below:
$ echo 'ab, cd, cd, ab, ef' | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
ab, cd, ef
You can use awk to do this (GNU awk specifically, since the script relies on the RT variable).
Example:
#!/bin/bash
string="abc, def, abc, def"
string=$(printf '%s\n' "$string" | awk -v RS='[,[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
string="${string%,*}"
echo "$string"
Output:
abc, def
This can also be done in pure Bash:
#!/bin/bash
string="abc, def, abc, def"
declare -A words
IFS=", "
for w in $string; do
words+=( [$w]="" )
done
echo ${!words[@]}
Output
def abc
Explanation
words is an associative array (declare -A words) and every word is added as
a key to it:
words+=( [${w}]="" )
(We do not need its value, so I have used "" as the value.)
The list of unique words is the list of keys (${!words[@]}).
There is one caveat though: the output is not separated by ", ". (You will
have to iterate again. IFS is only used with ${words[*]}, and even then only
the first character of IFS is used.)
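If you do want the keys joined back with ", ", one more pass is enough. A sketch under the same assumptions (note that the key order of a Bash associative array is unspecified, so the words may come out in any order):

```shell
declare -A words
string="abc, def, abc, def"
IFS=", "
for w in $string; do
    words[$w]=""
done
unset IFS

# Join the unique keys with ", " (order is whatever Bash hashes them to)
result=""
for w in "${!words[@]}"; do
    result="${result:+$result, }$w"
done
echo "$result"
```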
I have another way for this case: I split the string into one word per line with xargs, deduplicate with sort -u, and rejoin:
#string="abc def abc def"
$ echo "abc def abc def" | xargs -n1 | sort -u | xargs | sed "s# #, #g"
abc, def
Thanks for all support!
The problem with an associative array, or with xargs and sort, in the other examples is that the original word order is lost (sort even sorts the words). My solution only skips words that have already been processed; the associative array map keeps track of them.
Bash function
function uniq_words() {
local string="$1"
local delimiter=", "
local words=""
declare -A map
while read -r word; do
# skip already processed words
if [ ! -z "${map[$word]}" ]; then
continue
fi
# mark the found word
map[$word]=1
# don't add a delimiter, if it is the first word
if [ -z "$words" ]; then
words=$word
continue
fi
# add a delimiter and the word
words="$words$delimiter$word"
# split the string into lines so that we don't have
# to overwrite the $IFS system field separator
done <<< "$(sed -e "s/$delimiter/\n/g" <<< "$string")"
echo "$words"
}
Example 1
uniq_words "abc, def, abc, def"
Output:
abc, def
Example 2
uniq_words "1, 2, 3, 2, 1, 0"
Output:
1, 2, 3, 0
Example with xargs and sort
In this example, the output is sorted.
echo "1 2 3 2 1 0" | xargs -n1 | sort -u | xargs | sed "s# #, #g"
Output:
0, 1, 2, 3
This is a follow-up question to this question, regarding how to know the number of grouped digits in a string.
In bash,
How can I find the last occurrence of a group of digits in a string?
So, if I have
string="123 abc 456"
I would get
456
And if I had
string="123 123 456"
I would still get
456
Without external utilities (such as sed, awk, ...):
$ s="123 abc 456"
$ [[ $s =~ ([0-9]+)[^0-9]*$ ]] && echo "${BASH_REMATCH[1]}"
456
BASH_REMATCH is a special array where the matches from [[ ... =~ ... ]] are assigned to.
Test code:
str=("123 abc 456" "123 123 456" "123 456 abc def" "123 abc" "abc 123" "123abc456def")
for s in "${str[@]}"; do
[[ $s =~ ([0-9]+)[^0-9]*$ ]] && echo "$s -> ${BASH_REMATCH[1]}"
done
Output:
123 abc 456 -> 456
123 123 456 -> 456
123 456 abc def -> 456
123 abc -> 123
abc 123 -> 123
123abc456def -> 456
You can use a regex in Bash:
$ echo "$string"
123 abc 456
$ [[ $string =~ (^.*[ ]+|^)([[:digit:]]+) ]] && echo "${BASH_REMATCH[2]}"
456
If you want to capture undelimited strings like 456 or abc123def456 you can do:
$ echo "$string"
test456text
$ [[ $string =~ ([[:digit:]]+)[^[:digit:]]*$ ]] && echo "${BASH_REMATCH[1]}"
456
But if you are going to use an external tool, use awk.
Here is a demo of Bash vs Awk getting the last all-digit field in a string; both handle digit groups delimited by spaces, or at the start or end of the string.
Given:
$ cat file
456
123 abc 456
123 123 456
abc 456
456 abc
123 456 foo bar
abc123def456
Here is a test script:
while IFS= read -r line || [[ -n $line ]]; do
bv=""
av=""
[[ $line =~ (^.*[ ]+|^)([[:digit:]]+) ]] && bv="${BASH_REMATCH[2]}"
av=$(awk '{for (i=1;i<=NF;i++) if (match($i, /^[[:digit:]]+$/)) last=$i; print last}' <<< "$line")
printf "line=%22s bash=\"%s\" awk=\"%s\"\n" "\"$line\"" "$bv" "$av"
done <file
Prints:
line= "456" bash="456" awk="456"
line= "123 abc 456" bash="456" awk="456"
line= "123 123 456" bash="456" awk="456"
line= "abc 456" bash="456" awk="456"
line= "456 abc" bash="456" awk="456"
line= "123 456 foo bar" bash="456" awk="456"
line= "abc123def456" bash="" awk=""
grep -o '[0-9]\+' file | tail -1
grep -o prints only the matched text, one match per line
tail -1 outputs only the last match
And if you have a string instead of a file:
grep -o '[0-9]\+' <<< '123 foo 456 bar' |tail -1
You may use this sed to extract last number in a line:
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/'
Examples:
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 abc 456'
456
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 456 foo bar'
456
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 123 456'
456
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 x'
123
RegEx Details:
(.*[^0-9]|^): Match 0 or more characters at start followed by a non-digit OR line start.
([0-9]+): Match 1+ digits and capture in group #2
.*: Match remaining characters till end of line
\2: Replace it with back-reference #2 (what we captured in group #2)
Another way to do it with pure Bash:
shopt -s extglob # enable extended globbing - for *(...)
tmp=${string%%*([^0-9])} # remove non-digits at the end
last_digits=${tmp##*[^0-9]} # remove everything up to the last non-digit
printf '%s\n' "$last_digits"
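For reference, here is the extglob approach above as a complete, runnable snippet with a sample input (the input string is my own assumption):

```shell
shopt -s extglob               # enable extended globbing for *( ... )
string="123 abc 456 xyz"
tmp=${string%%*([^0-9])}       # remove trailing non-digits -> "123 abc 456"
last_digits=${tmp##*[^0-9]}    # remove everything up to the last non-digit -> "456"
printf '%s\n' "$last_digits"
```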
This is a good job for parameter expansion (note this simply takes the last space-separated word, so it assumes the digits are the final word):
$ string="123 abc 456"
$ echo ${string##* }
456
A simple answer with gawk:
echo "$string" | gawk -v RS=" " '/^[[:digit:]]+$/ { N = $0 } ; END { print N }'
With RS=" ", we read each field as a separate record.
Then we keep the last number found and print it.
$ string="123 abc 456 abc"
$ echo "$string" | gawk -v RS=" " '/^[[:digit:]]+$/ { N = $0 } ; END { print N }'
456
This is really self-explanatory. I'm working in a bash shell and I'm really new to shell scripting. I've found a lot of information about using tr and sed, but all the examples I have found so far remove delimiters and newlines. I want to do the opposite of that: I want to separate a string on a blank space. I have a string like "abcd efgh" and I need it to be "abcd" "efgh" (all without quotes, just to show grouping).
I'm sure this is much simpler than I'm making it, but I'm very confused.
Updated Question:
I have a column of PIDs that I have put into an array, but each element of the array has both the pids in the column.
Column:
1234
5678
when I print out the entire array, all the different columns have been added so I have all the values, but when I print out a single element of my array I get something like:
1234 5678
which is not what I want.
I need to have an element for 1234 and a separate one for 5678.
This is my code so far:
#!/bin/bash
echo "Enter the File Name"
read ips
index=0
IFS=' '
while read myaddr myname; do
myips[$index]="$myaddr"
names[$index]="$myname"
index=$(($index+1))
done < $ips
echo "my IPs are: ${myips[*]}"
echo "the corresponding names are: ${names[*]}"
echo "Total IPs in the file: ${index}"
ind=0
for i in "${myips[@]}"
do
echo $i
pids=( $(jps | awk '{print $1}') )
for pid in "${pids[@]}"; do
echo $pid
done
echo "my PIDs are: ${pids}"
for j in "${pids[@]}"
do
mypids[$ind]="$j"
ind=$(($ind+1))
done
done
echo "${mypids[*]}"
echo "The 3rd PID is: ${mypids[2]}"
SAMPLE OUTPUT:
Total IPs in the file: 6
xxx.xxx.xxx.xxx
5504
1268
1
xxx.xxx.xxx.xxx
5504
4352
1
xxx.xxx.xxx.xxx
5504
4340
1
5504
1268 5504
4352 5504
4340
The 3rd pid is: 5504
4340
I need each pid to be separate, so that each element of the array, is a single pid. So for instance, the line "The 3rd pid is: " needs to look something like
The 3rd pid is: 5504
and the 4th element would be 4340
Try cut:
$ echo "abcd efgh" | cut -d" " -f1
abcd
$ echo "abcd efgh" | cut -d" " -f2
efgh
Alternatively, if at some point you want to do something more complex, do look into awk as well:
$ echo "abcd efgh" | awk '{print $1}'
abcd
$ echo "abcd efgh" | awk '{print $2}'
efgh
To address your updated question:
I have a column of PIDs that I have put into an array, but each element of the array has both the pids in the column.
If you want to load a column of data into an array, you could do something like this:
$ pgrep sshd # example command. Get pid of all sshd processes
795
32046
32225
$ A=(`pgrep sshd`) # store output of command in array A
$ echo ${A[0]} # print first value
795
$ echo ${A[1]} # print second value
32046
To address the example code you posted, the reason for your problem is that you've changed $IFS to a space (IFS=' '), which means that your columns, which are separated by newlines, are no longer being split.
Consider this example:
$ A=(`pgrep sshd`)
$ echo ${A[0]} # works as expected
795
$ IFS=' ' # change IFS to space only
$ A=(`pgrep sshd`)
$ echo ${A[0]} # newlines no longer used as separator
795
32046
32225
To avoid this problem, a common approach is to back up the original IFS and restore it once you're done with the updated value. E.g.
# backup original IFS
OLDIFS=$IFS
IFS=' '
# .. do stuff ...
# restore after use
IFS=$OLDIFS
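Another common pattern avoids the save/restore entirely by scoping IFS to a single command, since a variable assignment prefixed to a command applies only to that command. A small sketch:

```shell
line="795 32046 32225"
# IFS is changed only for this one read; the global IFS is untouched
IFS=' ' read -ra pids <<< "$line"
printf '%s\n' "${pids[@]}"
```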
Sample file:
abcd efgh
bla blue
Using awk you can do the following
cat file.txt | awk '{print $1}'
This will output the following
abcd
bla
or
cat file.txt | awk '{print $2}'
This will output the following
efgh
blue
Awk is a really powerful command; I suggest you learn it as soon as you can. It will save you lots of headaches in bash scripting.
The other solutions are pretty good. I use cut often. However, I just wanted to add that if you always want to split on whitespace then xargs will do that for you. Then the command line version of printf can format the arguments (if reordering of strings is desired use awk as in the other solution). Here is an example for reference:
MYSTR="hello big world"
$ echo $MYSTR |xargs printf "%s : %s > %s\n"
hello : big > world
The read command handles input as entire lines (unless a different delimiter is set with -d):
$ echo "abcd efgh" | while read item
do
echo $item
# Do something with item
done
abcd efgh
If you want to pipe each item to a command, you can do this:
echo "abcd efgh" | tr ' ' '\n' | while read item
do
echo $item
# Do something with item
done
abcd
efgh
No need to use external commands to split strings into words. The set built-in does just that:
string="abcd efgh"
set $string
# Now $1 is "abcd" and $2 is "efgh"
echo $1
echo $2
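A slightly safer variant uses -- so that words beginning with a dash are not mistaken for options to set (a minor hardening of the snippet above):

```shell
string="abcd efgh"
set -- $string     # '--' ends option parsing; the words become $1, $2, ...
echo "$1"          # abcd
echo "$2"          # efgh
echo "$#"          # number of words
```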
There is no difference between the string "abcd efgh" and the two strings "abcd" "efgh" other than that, if passed as arguments to a program, the first will be read as one argument while the second will be two arguments.
The double quotes (") merely turn shell word splitting and globbing off and on; single quotes do the same more aggressively, suppressing all expansion.
Now, you could have a string '"abcd efgh"' which you would like to transform into '"abcd" "efgh"', which you could do with sed 's/ /" "/', but that's probably not what you want.