unix command to extract digits after last alphabetical string - shell

String:"gamma021AH00999NAK41"
The number of trailing digits may vary: it may be 2 digits, 3 digits, 4 digits, etc.
"NAK" in the given string can be any other string, but it contains only letters.
So my intention is to extract the trailing number (for example, 41 in the given string), reading back from the end until the first letter.
Thanks in advance

Using only shell builtins (no external commands such as sed or awk, and thus much faster if you're going to repeat this over and over, for example once per line):
s=gamma021AH00999NAK41
result=${s##*[[:alpha:]]}
echo "$result"
${var##pattern} is a parameter expansion which removes the longest possible match for pattern from the front of the value of var before returning it. *[[:alpha:]], as a wildcard followed by an alphabetic character, will thus remove everything up to and including the K in your string, leaving only 41.
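As a small sketch of how this behaves when the trailing number has a different length, the expansion can be wrapped in a function. The name trailing_digits and the second sample string are made up for illustration; they do not come from the question.

```shell
# trailing_digits is a hypothetical helper name, not a standard command.
# It strips the longest prefix that ends in a letter, leaving the final digits.
trailing_digits() {
    printf '%s\n' "${1##*[[:alpha:]]}"
}

r1=$(trailing_digits gamma021AH00999NAK41)    # sample string from the question
r2=$(trailing_digits gamma021AH00999XYZ1234)  # invented sample with 4 digits
echo "$r1 $r2"
```

If the string contains no letters at all, the pattern does not match and the whole string is returned unchanged; if the string ends in a letter, the result is empty.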

You can replace all the alphabetic characters with, for example, "#" and then take the last field using "#" as the separator:
echo "gamma021AH00999NAK41" | sed "s/[a-zA-Z]/#/g" | awk -F'#' '{print $NF}'
NOTE: This won't work if your string contains characters other than letters and digits.
EDIT: With awk only, no sed (thanks @CharlesDuffy):
echo "gamma021AH00999NAK41" | awk -F'[[:alpha:]]' '{print $NF}'
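The awk-only form also handles several strings at once, one per line. A minimal sketch, assuming the input arrives on standard input; the second sample string is invented for illustration.

```shell
# Every letter acts as a field separator, so the last field ($NF) is
# whatever follows the final letter on each line.
out=$(printf '%s\n' gamma021AH00999NAK41 x7Y123 |
    awk -F'[[:alpha:]]' '{print $NF}')
printf '%s\n' "$out"
```

This prints one trailing number per input line, which is where an external tool starts to pay off compared with calling the shell expansion in a loop.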

I see no mention of varying length, so this command will work:
echo "gamma021AH00999NAK41" | cut -b '19-'
Answer : 41

Related

grep strings based on the length

Is it possible to search for strings of a given length in a specific file using grep?
I tried using awk, but it did not work:
awk '$0~"^s" && length($0)==31' strings.xml
If it cannot be done with grep, is it possible with some other command-line tool?
You can use:
grep -E '^s.{30}$' strings.xml
The regexp matches s at the beginning of the line, followed by any 30 characters, then the end of the line. So it will match a line with exactly 31 characters beginning with s.
But the awk command is equivalent, so if it didn't work, neither will this.
By default awk splits fields on whitespace, so if you want to match against the first field, requiring it to start with s and have a length of 31, you could use:
awk '$1 ~ /^s.{30}$/ {print}' strings.xml
The ^s matches a string starting with s, and .{30}$ matches any character (.) exactly 30 times ({30}), followed by the end of the string.
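To check that the grep and awk forms agree, here is a small sketch run against a throwaway file. The sample lines and variable names are invented for illustration; only the two patterns come from the answers.

```shell
# Build a 30-character filler so line lengths are easy to reason about.
pad=$(printf '%30s' '' | tr ' ' 'x')

sample=$(mktemp)
# Line 1: 31 chars starting with s (should match).
# Line 2: 32 chars starting with s; line 3: 31 chars starting with t; line 4: short.
printf '%s\n' "s$pad" "s${pad}x" "t$pad" "short" > "$sample"

grep_count=$(grep -c -E '^s.{30}$' "$sample")
awk_count=$(awk '/^s/ && length($0) == 31 { n++ } END { print n + 0 }' "$sample")
echo "grep: $grep_count, awk: $awk_count"
rm -f "$sample"
```

Both counts should be the same, which matches the remark that the two commands are equivalent for whole-line matching.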

Align numbers using only sed

I need to align decimal numbers with the "," symbol using only the sed command. The "," should go in the 5th position. For example:
183,7
2346,7
7,999
Should turn into:
 183,7
2346,7
   7,999
The maximum amount of numbers before the comma is 4. I have tried using this to remove spaces:
sed 's/ //g' input.txt > nospaces.txt
And then I thought about adding spaces depending on the number of digits before the comma, but I don't know how to do this using only sed.
Any help would be appreciated.
Assuming that there is only one number on each line, that there are at most four digits before the ",", and that there is always a ",":
sed 's/[^0-9,]*\([0-9]\+,[0-9]*\).*/    \1/;s/.*\(.....,.*\)/\1/;'
The first s gets rid of everything other than the (first) number on the line, and puts four spaces before it. The second one deletes everything before the fifth character prior to the ",", leaving just enough spaces to right-justify the number.
The second s command might mangle input lines which didn't match the first s command. If it is possible that the input contains such lines, you can add a conditional branch to avoid executing the second substitution if the first one failed. With GNU sed, this is trivial:
sed 's/[^0-9,]*\([0-9]\+,[0-9]*\).*/    \1/;T;s/.*\(.....,.*\)/\1/;'
T jumps to the end of the commands if the previous s failed. POSIX standard sed only has a conditional branch on success, so you need to use this circuitous construction:
sed 's/[^0-9,]*\([0-9]\+,[0-9]*\).*/    \1/;ta;b;:a;s/.*\(.....,.*\)/\1/;'
where ta (conditional branch to a on success) is used to skip over a b (unconditional branch to end). :a is the label referred to by the t command.
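A quick way to see the two substitutions at work is to feed a few lines through them and inspect the result. This is only a sketch: the leading spaces on the second input line are invented, and [0-9][0-9]* is used in place of the \+ operator, which is a GNU extension in basic regular expressions.

```shell
# Step 1 pads the number with four spaces; step 2 keeps only the five
# characters before the comma, so the comma always lands in column 6.
aligned=$(printf '%s\n' '183,7' '  2346,7' '7,999' |
    sed 's/[^0-9,]*\([0-9][0-9]*,[0-9]*\).*/    \1/;s/.*\(.....,.*\)/\1/')
printf '%s\n' "$aligned"

# Report the column of the comma on each output line.
printf '%s\n' "$aligned" | awk '{ print index($0, ",") }'
```

Because the second substitution keeps five characters before the comma, a four-digit number still gets one leading space.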
If you change your mind, here is an awk solution:
$ awk -F, 'NF{printf "%5d,%-d\n", $1,$2} !NF' file
  183,7
 2346,7
    7,999
Set the delimiter to a comma and handle both parts as separate fields.
Try with this:
gawk -F, '{ if($0=="") print ; else printf "%5d,%-d\n", $1, $2 }' input.txt
If you are using GNU sed, you could do as below
sed -r 's/([0-9]+),([0-9]+)/printf "%5s,%d" \1 \2/e' input.txt

ignore spaces within/around brackets to count occurrences

(to LaTeX users) I want to search for manually labeled items
(to whom it may concern) script file on GitHub
I tried to find solution, but what I've found suggested to remove spaces first. In my case, I think there should be simpler solution. It could be using grep or awk or some other tool.
Consider the following lines:
\item[a)] some text
\item [i) ] any text
\item[ i)] foo and faa
\item [ 1) ] foo again
I want to find (or count) if there are items with a single ) inside brackets. The format could have blank spaces inside the brackets and/or around it. Also, the char before the closing parentheses could be any letter or number.
Edit: I tried grep "\[a)\]" but it missed [ a) ].
Since there are many possible ways to write an item, I cannot settle on a single pattern. I think something like this is enough for me:
\item<blank spaces>[<blank spaces><letter or number>)<blank spaces>]
Removing the blank spaces would not work, because the pattern above generally has text around it (for example: \item[ a)] consider the function...)
The output should indicate whether there are such patterns or not. It could be zero or the number of occurrences.
So to do it all in the grep itself:
grep -c -E '\\item\s*\[\s*\w+\)\s*\]' file.txt
Note that each \s* allows for optional whitespace, and -c prints the count. (\s and \w are GNU grep extensions.)
Breaking it down:
\\      a backslash (needs escape in grep)
item    "item"
\s*     optional whitespaces
\[      "[" (needs escape in -E)
\s*     optional whitespaces
\w+     at least one 'word' char
\)      ")" (needs escape in -E)
\s*     optional whitespaces
\]      "]" (needs escape in -E)
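Here is a sketch that runs the same idea against the four sample lines from the question plus one invented non-matching line. It swaps \s and \w for the POSIX classes [[:space:]] and [[:alnum:]], on the assumption that portability beyond GNU grep matters; note that [[:alnum:]] excludes the underscore, which \w would accept.

```shell
sample=$(mktemp)
# printf with %s leaves the backslashes in the arguments untouched.
printf '%s\n' \
    '\item[a)] some text' \
    '\item [i) ] any text' \
    '\item[ i)] foo and faa' \
    '\item [ 1) ] foo again' \
    '\item[plain] not a labeled item' > "$sample"

count=$(grep -c -E '\\item[[:space:]]*\[[[:space:]]*[[:alnum:]]+\)[[:space:]]*\]' "$sample")
echo "$count"
rm -f "$sample"
```

The first four lines match and the last one does not, since it has no ")" inside the brackets.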
The following awk may also help here. (I simply remove the spaces between [ and ] and then look for a letter or digit followed by ) in what remains.)
awk '
match($0,/\[.*\]/){
  val=substr($0,RSTART+1,RLENGTH-2);    # text between the brackets
  gsub(/[[:space:]]+/,"",val);          # drop whitespace inside the brackets
  if(val ~ /[[:alnum:]]+\)/){ count++ } # letter or digit followed by ")"
}
END{
  print count+0                         # print 0 instead of an empty line
}' Input_file
So I am thinking something like this:
tr -d " \t" < file.txt | grep -c '\\item\[[0-9A-Za-z])\]'
This will count the number of matches for you.
Edit: Added \t to tr call. Now removes all spaces and tabs.
Here is a grep only version. This could be useful for printing out all of the matches (by removing -c) as well since the above version modifies the input:
grep -c '\\item *\[ *[0-9A-Za-z]) *\]' file.txt
Here is a more versatile answer, if this is what you are looking for. Here, we write the matches to a file and count the lines in that file to get the number of matches:
grep '\\item *\[ *[0-9A-Za-z]) *\]' file.txt > matches.txt
wc -l < matches.txt

Cut number of character in the beginning of string and end of the string

I need to cut a number of characters from the beginning and end of a string. The string does not have a specific format and can be random numbers and words. I am trying to remove 5 characters from the beginning and 11 from the end of the string.
Input string:
342136001788006DEEFF0000060000806000006HSV40002HP
Output string:
6001788006DEEFF000006000080600000
The characters 34213 (at the start) and 6HSV40002HP (at the end) are removed from the input.
It's OK, I found my answer using the cut command. I was so focused on awk and sed, but cut helped in the end:
cut -c6-38 test.txt
You found the cut command, which is the best solution in this case.
You wondered how to do this with sed, which becomes interesting in more complex situations.
The beginner's solution is (using ; to separate the two substitutions and $ for end of line):
echo '342136001788006DEEFF0000060000806000006HSV40002HP' |
sed 's/.....//;s/...........$//'
You do not want to count the dots; you can say how often a pattern repeats with pattern\{count\}.
And you can remember and recall part of a match with 's/..\(pattern\)../\1/'.
echo '342136001788006DEEFF0000060000806000006HSV40002HP' |
sed 's/.\{5\}\(.*\).\{11\}/\1/'
When your sed supports the flag -r, you can avoid all these backslashes:
echo '342136001788006DEEFF0000060000806000006HSV40002HP' |
sed -r 's/.{5}(.*).{11}/\1/'
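As a rough cross-check, the sed and cut approaches can be run side by side. In this sketch the end position for cut is computed from the string length rather than hard-coded as 38, on the assumption that the input length may differ; the variable names are invented.

```shell
s=342136001788006DEEFF0000060000806000006HSV40002HP

# sed: drop 5 characters from the front and 11 from the end.
via_sed=$(printf '%s\n' "$s" | sed 's/^.\{5\}\(.*\).\{11\}$/\1/')

# cut: keep characters 6 through (length - 11).
last=$(( ${#s} - 11 ))
via_cut=$(printf '%s\n' "$s" | cut -c6-"$last")

echo "$via_sed"
echo "$via_cut"
```

For the 49-character sample string, last works out to 38, so this reduces to the cut -c6-38 command from the question.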

Dynamic delimiter in Unix

Input:-
echo "1234ABC89,234" # A
echo "0520001DEF78,66" # B
echo "46545455KRJ21,00"
From the above strings, I need to split the characters to get the alphabetic field and the number after that.
From "1234ABC89,234", the output should be:
ABC
89,234
From "0520001DEF78,66", the output should be:
DEF
78,66
I have many strings that I need to split like this.
Here is my script so far:
echo "1234ABC89,234" | cut -d',' -f1
but it gives me 1234ABC89 which isn't what I want.
Assuming that you want to discard leading digits only, and that the letters will be all upper case, the following should work:
echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\n\2/'
This works fine with GNU sed (I have 4.2.2), but other sed implementations might not like the \n, in which case you'll need to substitute something else.
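One possible substitute for \n, sketched here on the assumption that a POSIX-conforming sed is available: a backslash followed by a literal newline in the replacement text inserts a newline.

```shell
# The replacement spans two lines; the backslash at the end of the first
# line tells sed to insert a real newline between the two groups.
out=$(echo "1234ABC89,234" | sed 's/^[0-9]*\([A-Z]*\)\([0-9].*\)/\1\
\2/')
printf '%s\n' "$out"
```

The letters and the number end up on separate lines, as in the expected output from the question.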
Depending on the version of sed you can try:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1\n\2/'
or:
echo "0520001DEF78,66" | sed -E -e 's/[0-9]*([A-Z]*)([,0-9]*)/\1$\2/' | tr '$' '\n'
DEF
78,66
Explanation: the regular expression replaces the input with the expected output, except that it puts a "$" sign where the newline should go; the tr command then replaces that "$" with a newline.
Where do the strings come from? Are they read from a file (or other source external to the script), or are they stored in the script? If they're in the script, you should simply reformat the data so it is easier to manage. Therefore, it is sensible to assume they come from an external data source such as a file or being piped to the script.
You could simply feed the data through sed:
sed 's/^[0-9]*\([A-Z]*\)/\1 /' |
while read alpha number
do
…process the two fields…
done
The only trick to watch there is that if you set variables in the loop, they won't necessarily be visible to the script after the done. There are ways around that problem — some of which depend on which shell you use. This much is the same in any derivative of the Bourne shell.
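One of those workarounds, sketched under the assumption that writing to a temporary file is acceptable: redirect the loop's input from a file instead of piping into it, so the loop runs in the current shell and its variables survive. The file and variable names here are invented for illustration.

```shell
input=$(mktemp)
split=$(mktemp)
printf '%s\n' 1234ABC89,234 0520001DEF78,66 46545455KRJ21,00 > "$input"

# Same sed as above, but written to a file rather than piped into the loop.
sed 's/^[0-9]*\([A-Z]*\)/\1 /' "$input" > "$split"

count=0
last_alpha=
while read -r alpha number
do
    count=$((count + 1))
    last_alpha=$alpha
    printf '%s -> %s\n' "$alpha" "$number"
done < "$split"
rm -f "$input" "$split"

# Both variables are still set here because no pipeline subshell was involved.
echo "processed $count lines; last key was $last_alpha"
```

With a pipe feeding the loop, many shells would run the loop in a subshell and count would still be 0 after done.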
You said you have many strings like this, so I recommend saving them to a file, if possible, such as input.txt:
1234ABC89,234
0520001DEF78,66
46545455KRJ21,00
On your command line, try this sed command reading input.txt as file argument:
$ sed -E 's/([0-9]+)([[:alpha:]]{3})(.+)/\2\t\3/g' input.txt
ABC 89,234
DEF 78,66
KRJ 21,00
How it works
uses -E for extended regular expressions to save on typing, otherwise for example for grouping we would have to escape \(
uses grouping ( and ), searches three groups:
firstly digits, + specifies one-or-more of digits. Oddly using [0-9] results in an extra blank space above results, so use POSIX class [[:digit:]]
the next is to search for POSIX alphabetical characters, regardless if lowercase or uppercase, and {3} specifies to search for 3 of them
the last group searches for . meaning any character, + for one or more times
\2\t\3 then returns group 2 and group 3, with a tab separator
Thus you are able to extract two separate fields per line, just separated by tab, for easier manipulation later.

Resources