This is a follow-up to this question, about how to count the groups of digits in a string.
In bash,
How can I find the last occurrence of a group of digits in a string?
So, if I have
string="123 abc 456"
I would get
456
And if I had
string="123 123 456"
I would still get
456
Without external utilities (such as sed, awk, ...):
$ s="123 abc 456"
$ [[ $s =~ ([0-9]+)[^0-9]*$ ]] && echo "${BASH_REMATCH[1]}"
456
BASH_REMATCH is the special array that the matches from [[ ... =~ ... ]] are assigned to.
Test code:
str=("123 abc 456" "123 123 456" "123 456 abc def" "123 abc" "abc 123" "123abc456def")
for s in "${str[@]}"; do
[[ $s =~ ([0-9]+)[^0-9]*$ ]] && echo "$s -> ${BASH_REMATCH[1]}"
done
Output:
123 abc 456 -> 456
123 123 456 -> 456
123 456 abc def -> 456
123 abc -> 123
abc 123 -> 123
123abc456def -> 456
You can use a regex in Bash:
$ echo "$string"
123 abc 456
$ [[ $string =~ (^.*[ ]+|^)([[:digit:]]+) ]] && echo "${BASH_REMATCH[2]}"
456
If you want to capture digits inside undelimited strings like test456text or abc123def456, you can do:
$ echo "$string"
test456text
$ [[ $string =~ ([[:digit:]]+)[^[:digit:]]*$ ]] && echo "${BASH_REMATCH[1]}"
456
But if you are going to use an external tool, use awk.
Here is a demo of Bash vs Awk to get the last group of digits in a string. Both handle digit groups delimited by spaces, or at the start or end of the string.
Given:
$ cat file
456
123 abc 456
123 123 456
abc 456
456 abc
123 456 foo bar
abc123def456
Here is a test script:
while IFS= read -r line || [[ -n $line ]]; do
bv=""
av=""
[[ $line =~ (^.*[ ]+|^)([[:digit:]]+) ]] && bv="${BASH_REMATCH[2]}"
av=$(awk '{for (i=1;i<=NF;i++) if (match($i, /^[[:digit:]]+$/)) last=$i; print last}' <<< "$line")
printf "line=%22s bash=\"%s\" awk=\"%s\"\n" "\"$line\"" "$bv" "$av"
done <file
Prints:
line=                 "456" bash="456" awk="456"
line=         "123 abc 456" bash="456" awk="456"
line=         "123 123 456" bash="456" awk="456"
line=             "abc 456" bash="456" awk="456"
line=             "456 abc" bash="456" awk="456"
line=     "123 456 foo bar" bash="456" awk="456"
line=        "abc123def456" bash="" awk=""
grep -o '[0-9]\+' file|tail -1
grep -o lists matched text only
tail -1 outputs only the last match
If the input is in a string rather than a file, use a here-string:
grep -o '[0-9]\+' <<< '123 foo 456 bar' | tail -1
456
You may use this sed to extract last number in a line:
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/'
Examples:
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 abc 456'
456
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 456 foo bar'
456
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 123 456'
456
sed -E 's/(.*[^0-9]|^)([0-9]+).*/\2/' <<< '123 x'
123
RegEx Details:
(.*[^0-9]|^): Match 0 or more characters at start followed by a non-digit OR line start.
([0-9]+): Match 1+ digits and capture in group #2
.*: Match remaining characters till end of line
\2: Replace it with back-reference #2 (what we captured in group #2)
Another way to do it with pure Bash:
shopt -s extglob # enable extended globbing - for *(...)
tmp=${string%%*([^0-9])} # remove non-digits at the end
last_digits=${tmp##*[^0-9]} # remove everything up to the last non-digit
printf '%s\n' "$last_digits"
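As a quick check of the above (with a made-up test string where the digits are not space-delimited):

```shell
#!/bin/bash
shopt -s extglob                  # enable extended globbing for *(...)

string="123 abc 456xyz"           # made-up test string; last digits not delimited
tmp=${string%%*([^0-9])}          # strip trailing non-digits -> "123 abc 456"
last_digits=${tmp##*[^0-9]}       # strip up to the last non-digit -> "456"
printf '%s\n' "$last_digits"      # prints 456
```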
This is a good job for parameter expansion:
$ string="123 abc 456"
$ echo ${string##* }
456
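Note that ${string##* } strips everything up to the last space, so it returns the last whitespace-delimited word whether or not it consists of digits. A quick illustration with made-up strings:

```shell
string="123 abc 456"
echo "${string##* }"    # prints 456 - here the last word happens to be digits

string="123 456 abc"
echo "${string##* }"    # prints abc - the last word, not the last digit group
```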
A simple answer with gawk:
echo "$string" | gawk -v RS=" " '/^[[:digit:]]+$/ { N = $0 } ; END { print N }'
With RS=" ", we read each field as a separate record.
Then we keep the last number found and print it.
$ string="123 abc 456 abc"
$ echo "$string" | gawk -v RS=" " '/^[[:digit:]]+$/ { N = $0 } ; END { print N }'
456
Related
I have a file like
abc
1234567890
0987654321

cde

fgh

ijk
1234567890
0987654321
I need to write a script that extracts the lines that have a blank line before and after them; in the example, the output should be:
cde
fgh
I guess that awk or sed could do the work but I wasn't able to make them work. Any help?
Here is the solution.
#!/bin/bash
amt=$(sed -n '$=' path-to-your-file)    # number of lines in the file
i=0
while :
do
    ((i++))
    if [ "$i" -eq "$amt" ]; then
        break
    fi
    if [ "$i" -ne 1 ]; then
        j=$((i - 1))
        emp=$(sed -n "${j}p" path-to-your-file)    # previous line
        if [ -z "$emp" ]; then
            j=$((i + 1))
            emp=$(sed -n "${j}p" path-to-your-file)    # next line
            if [ -z "$emp" ]; then
                sed -n "${i}p" path-to-your-file >> extracted
            fi
        fi
    fi
done
With awk:
awk '
BEGIN{
  RS=""    # paragraph mode: records are separated by blank lines
  FS="\n"  # each line of a record is one field
}
NF==1' file    # print only records that consist of a single line
Prints:
cde
fgh
A very simple solution:
cat "myfile.txt" | grep -A 1 '^$' | grep -B 1 '^$' | grep -v -e '^--$' | grep -v '^$'
assuming "--" is the default group separator.
You can get rid of the group separator by other means, such as the --group-separator="" or --no-group-separator options, but that depends on the grep variant (BSD, GNU, OSX...).
Here is what we have in the $foo variable:
abc bcd cde def
We need to echo the first part of the variable ONLY, and do this repeatedly until there's nothing left.
Example:
$ magic_while_code_here
I am on abc
I am on bcd
I am on cde
I am on def
It would take the first word, echo it, remove it from the variable, and repeat until the variable is empty, then quit.
So the variable would be abc bcd cde def, then bcd cde def, then cde def, etc.
We would show what we have tried but we are not sure where to start.
If you need to use the while loop and cut the parts from the beginning of the string, you can use the cut command.
foo="abc bcd cde def"
while :
do
    p1=$(cut -f1 -d" " <<<"$foo")
    echo "I am on $p1"
    foo=$(cut -f2- -d" " <<<"$foo")
    if [ "$p1" == "$foo" ]; then
        break
    fi
done
This will output:
I am on abc
I am on bcd
I am on cde
I am on def
Assuming the variable consists of sequences of only alphabetic characters separated by spaces, tabs or newlines, we can (ab)use the word-splitting expansion and just use printf:
foo="abc bcd cde def"
printf "I am on %s\n" $foo
will output:
I am on abc
I am on bcd
I am on cde
I am on def
I would use read -a to read the string into an array, then print it:
$ foo='abc bcd cde def'
$ read -ra arr <<< "$foo"
$ printf 'I am on %s\n' "${arr[@]}"
I am on abc
I am on bcd
I am on cde
I am on def
The -r option makes sure backslashes in $foo aren't interpreted; read -a allows you to have any characters you want in $foo and split on whitespace.
Alternatively, if you can use awk, you could loop over all fields like this:
awk '{for (i=1; i<=NF; ++i) {print "I am on", $i}}' <<< "$foo"
I have a problem: the program cannot read each word when the words in the text file are separated by a tab instead of a space.
For example, here is file.
part_Q.txt:
NWLR35MQ 649
HCDA93OW 526
abc 1
def 2
ghi 3
Note that between "abc" and "1" there is a tab, not a space.
Also note that between "NWLR35MQ" and "649" there is no tab, only spaces; the same goes for the 2nd line.
Output:
NWLR35MQ
649
HCDA93OW
526
def
2
ghi
3
However, if I replace the tab between "abc" and "1" with a space in the file, then it outputs correctly, as below.
Expected output:
NWLR35MQ
649
HCDA93OW
526
abc
1
def
2
ghi
3
It correctly displays all words in the file. How can I display all words regardless of whether they are separated by tabs or spaces? It should display all words in both cases. It seems that the program treats the tab as an ordinary character.
Below is source code:
#!/bin/sh
tempCtr=0
realCtr=0
copyCtr=0
while IFS= read -r line || [[ -n $line ]]; do
IFS=' '
tempCtr=0
for word in $line; do
temp[$tempCtr]="$word"
let "tempCtr++"
done
# if there are exactly 2 fields in each line, store ID and quantity
if [ $tempCtr -eq 2 ]
then
part_Q[$realCtr]=${temp[$copyCtr]}
let "realCtr++"
let "copyCtr++"
part_Q[$realCtr]=${temp[$copyCtr]}
let "realCtr++"
copyCtr=0
fi
done < part_Q.txt
for value in "${part_Q[@]}"; do
echo $value
done
What are you trying to do? If outputting is your only goal, this can be achieved very easily:
$ cat <<EOF | sed -E 's/[[:blank:]]+/\n/'
NWLR35MQ 649
HCDA93OW 526
abc 1
def 2
ghi 3
EOF
NWLR35MQ
649
HCDA93OW
526
abc
1
def
2
ghi
3
Awk is faster than a loop, but here is how you can implement this with a loop:
realCtr=0
while read -r x1 x2 x3; do
if [ -n "${x2}" ] && [ -z "${x3}" ]; then
part_Q[realCtr]="${x1}"
(( realCtr++ ))
part_Q[realCtr]="${x2}"
(( realCtr++ ))
fi
done < part_Q.txt
echo "Array (2 items each line):"
echo "${part_Q[#]}" | sed 's/[^ ]* [^ ]* /&\n/g'
You might solve this (as in your example) by a single line of code
cat part_Q.txt | tr $'\t' $'\n' | tr -s ' ' $'\n'
which
first translates a tab into a newline, and then
translates space(-s) as well
Note: in bash you can use $'\t' and $'\n' to pass literal tab and newline characters to tr (many tr implementations also understand the escapes \t and \n directly).
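A small sketch of the pipeline on made-up input (one space-separated line and one tab-separated line):

```shell
# two sample lines: spaces on the first, a tab on the second
printf 'NWLR35MQ  649\nabc\t1\n' | tr $'\t' $'\n' | tr -s ' ' $'\n'
# prints:
# NWLR35MQ
# 649
# abc
# 1
```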
Since it has been mentioned, awk can help, too:
awk 'NF==2{print $1"\n"$2}' part_Q.txt
where NF==2 takes care of only using lines with exactly 2 'words'.
Changing IFS=' ' to IFS=$'\t ' solved the problem.
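Since word splitting happens on every character in IFS, setting it to both tab and space makes the unquoted expansion of $line split on either. A minimal sketch with a hypothetical tab-separated line:

```shell
line=$'abc\t1'     # hypothetical sample: "abc", a tab, "1"
IFS=$'\t '         # split on tab or space
for word in $line; do
    echo "$word"
done
# prints:
# abc
# 1
```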
I have to put some data in a file which should be unique.
suppose in
file1 I have following data.
ABC
XYZ
PQR
and now I want to add MNO DES ABC then it should only copy "MNO" and "DES" as "ABC" is already present.
file1 should look like
ABC
XYZ
PQR
MNO
DES
(ABC should be there for only once.)
Easiest way: this should add the non-matching lines to f1.
diff -c f1 f2|grep ^+|awk -F '+ ' '{print $NF}' >> f1
or if '+ ' is going to be a part of actual text:
diff -c f1 f2|grep ^+|awk -F '+ ' '{ for(i=2;i<=NF;i++)print $i}' >> f1
shell script way:
I have a compare script that compares line counts/lengths etc., but for your requirement I think the part below should do the job....
input:
$ cat f1
ABC
XYZ
PQR
$ cat f2
MNO
DES
ABC
output after running the script:
$ ./compareCopy f1 f2
-----------------------------------------------------
comparing f1 f2
-----------------------------------------------------
Lines check - DONE
$ cat f1
ABC
XYZ
PQR
DES
MNO
#!/bin/sh
if [ "$#" -ne 2 ]; then
    echo
    echo "Requires arguments from command prompt"
    echo "Usage: compare <file1> <file2>"
    echo
    exit 1
fi
proc="compareCopy"
# sort files for line-by-line compare
sort "$1" > file1.tmp
sort "$2" > file2.tmp
echo "-----------------------------------------------------"
echo " comparing $1 $2" | tee ${proc}_compare.result
echo "-----------------------------------------------------"
file1_lines=$(wc -l < "$1")
# check each line
x=1
while [ "$x" -le "$file1_lines" ]
do
    f1_line=$(sed -n "${x}p" file1.tmp)
    f2_line=$(sed -n "${x}p" file2.tmp)
    if [ "$f1_line" != "$f2_line" ]; then
        echo "line number $x doesn't match in $1 and $2" >> ${proc}_compare.result
        echo "$1 line: $f1_line" >> ${proc}_compare.result
        echo "$2 line: $f2_line" >> ${proc}_compare.result
        # so add this line to file 1
        echo "$f2_line" >> "$1"
    fi
    x=$((x + 1))
done
rm -f file1.tmp file2.tmp
echo "Lines check - DONE" | tee -a ${proc}_compare.result
Use fgrep:
fgrep -vf file1 file2 > file2.tmp && cat file2.tmp >> file1 && rm file2.tmp
which fetches all lines of file2 that are not in file1 and appends the result to file1.
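A sketch using the sample files from the question (note that fgrep matches fixed substrings anywhere in the line; add -x if you need whole-line matches):

```shell
printf '%s\n' ABC XYZ PQR > file1    # existing data
printf '%s\n' MNO DES ABC > file2    # new data to merge in
fgrep -vf file1 file2 > file2.tmp && cat file2.tmp >> file1 && rm file2.tmp
cat file1
# prints:
# ABC
# XYZ
# PQR
# MNO
# DES
```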
You may want to take a look at this post: grep -f maximum number of patterns?
Perl one liner
file one:
1
2
3
file two:
1
4
3
Print only unique lines:
perl -lne 'print if ++$n{ $_ } == 1' file_one.txt file_two.txt
Or
perl -lne 'print unless $n{ $_ }++' file_one.txt file_two.txt
output
1
2
3
4
The natural way:
sort -u File1 File2 >Temp && mv Temp File1
The tricky way if the files are already sorted:
comm File1 File2 | awk '{$1=$1};1' >Temp && mv Temp File1
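A sketch with the question's sample data (note that sort -u does not preserve the original line order):

```shell
printf '%s\n' ABC XYZ PQR > File1
printf '%s\n' MNO DES ABC > File2
sort -u File1 File2     # union of both files, each line exactly once
# prints:
# ABC
# DES
# MNO
# PQR
# XYZ
```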
I want to remove ABC from every line that does not contain MNO, and drop any line that becomes empty.
$ cat File1
MNO ABC
MNO ABC
ABC
vzcbjzcb
After removing the lines, it should look like this:
MNO ABC
MNO ABC
vzcbjzcb
I would go with Perl as follows:
perl -nle 's/ABC//g if !m/MNO/; print if length($_)' File1
That says... "Process all lines of File1 handling line endings automatically (-nle). Replace (substitute) ABC with nothing if the line doesn't match MNO. Print the line if it is not empty."
Note that this will not retain empty lines in your input file but I trust that is not a problem.
You can do much the same with awk:
awk '!/MNO/{gsub(/ABC/,"")} length($0)' File1
Or with sed:
sed -e '/MNO/!s/ABC//g' -e '/^\s*$/d' File1
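As a quick check, running the sed variant over the sample File1 (using the portable [[:space:]] class in place of GNU sed's \s):

```shell
printf '%s\n' 'MNO ABC' 'MNO ABC' 'ABC' 'vzcbjzcb' > File1
sed -e '/MNO/!s/ABC//g' -e '/^[[:space:]]*$/d' File1
# prints:
# MNO ABC
# MNO ABC
# vzcbjzcb
```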
If you really, really want to do it in bash:
#!/bin/bash
while IFS= read -r line; do
    [[ ! $line =~ MNO ]] && line=${line//ABC/}
    [ ${#line} -gt 0 ] && echo "$line"
done < File1