Extract numbers from filename - bash

In Bash I thought of using sed, but I can't figure out how to extract a pattern instead of doing the usual replace.
For example:
FILENAME='blah_blah_#######_blah.ext'
The number of digits (written as "#" in the example above) can be either 7 or 10.
I want to extract only the number.

If all you need is to remove anything but digits, you could use
ls | sed -e 's/[^0-9]//g'
to get all digits grouped per filename (123test456.ext will become 123456), or
ls | grep -Eo '[0-9]+'
for all groups of numbers (123test456.ext will turn up 123 and 456)

You can use this simple code:
filename=zc_adsf_qwer132467_xcvasdfrqw
echo ${filename//[^0-9]/} # ==> 132467

Just bash:
shopt -s extglob
filename=zc_adsf_qwer132467_xcvasdfrqw
tmp=${filename##+([^0-9])}
nums=${tmp%%+([^0-9])}
echo $nums # ==> 132467
or, with bash 3.0 or later (which introduced =~ and BASH_REMATCH)
[[ "$filename" =~ [0-9]+ ]] && nums=${BASH_REMATCH[0]}

Is there any number anywhere else in the file name? If not:
ls | sed 's/[^0-9][^0-9]*\([0-9][0-9]*\).*/\1/g'
That should work.
A Perl one-liner might work a bit better, because Perl has more advanced regular expression support and lets you specify that the run of digits must be between 7 and 10 long:
ls | perl -ne 's/.*\D+(\d{7,10}).*/$1/;print if /^\d+$/;'
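If the number has to be exactly 7 or exactly 10 digits long, one option is to split the name into digit runs first and then filter the runs by length. This is only a minimal sketch: extract_number is a hypothetical helper name, and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper: print the first digit run in "$1" that is exactly
# 7 or exactly 10 digits long; print nothing if there is no such run.
# Assumes grep supports -o, which is an extension (GNU and BSD grep have it).
extract_number() {
    printf '%s\n' "$1" |
        grep -oE '[0-9]+' |
        grep -xE '[0-9]{7}|[0-9]{10}' |
        head -n 1
}

extract_number 'blah_blah_1234567_blah.ext'
extract_number 'blah_blah_0123456789_blah.ext'
extract_number 'blah_blah_123_blah.ext'
```

The first grep puts each digit run on its own line, and the second keeps only lines that consist entirely of 7 or 10 digits, so the last call prints nothing.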

$ ls -1
blah_blah_123_blah.ext
blah_blah_234_blah.ext
blah_blah_456_blah.ext
With files like these in a directory, you can run:
$ ls -1 | sed 's/blah_blah_//' | sed 's/_blah.ext//'
123
234
456
or with a single sed run (the dot is escaped so that it matches only a literal dot):
$ ls -1 | sed 's/^blah_blah_\([0-9]*\)_blah\.ext$/\1/'
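When the prefix and suffix are fixed like this, the shell can also strip them itself, without parsing the output of ls. A minimal sketch using only POSIX parameter expansion; strip_affixes is a hypothetical helper, and its default prefix and suffix are simply the ones from the listing above:

```shell
# Hypothetical helper: strip a fixed prefix and suffix from a file name
# using POSIX parameter expansion only (no ls, no sed).
# The prefix and suffix default to the ones from the listing above.
strip_affixes() {
    name=$1
    prefix=${2:-blah_blah_}
    suffix=${3:-_blah.ext}
    name=${name#"$prefix"}
    name=${name%"$suffix"}
    printf '%s\n' "$name"
}

for f in blah_blah_123_blah.ext blah_blah_234_blah.ext blah_blah_456_blah.ext; do
    strip_affixes "$f"
done
```

In a real directory the loop could iterate over a glob such as blah_blah_*_blah.ext instead of a hard-coded list.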

This will work for you -
echo $FILENAME | sed -e 's/[^(0-9|)]//g' | sed -e 's/|/,/g'

Related

What does the `l` option mean in GNU sed?

I have read the sed manual entry for the -l option. It says:
-l
--line-length=N
Specify the default line-wrap length for the l command. A length of 0 (zero) means to never wrap long lines. If not specified, it is taken to be 70.
I don't see how this is useful. Can someone give me an example?
I tried it like this, but the result is:
[root@kvm ~]# echo 'abcdefg' | sed -l 3 -n '/a/p'
abcdefg
Why not try it and see?
$ echo 'abcdefg' | sed -l 3 'l'
ab\
cd\
ef\
g$
abcdefg
$ echo 'abcdefg' | sed -l 4 'l'
abc\
def\
g$
abcdefg
From the sed manual:
Commands which accept address ranges
...
l
List out the current line in a ``visually unambiguous'' form.
l width
List out the current line in a ``visually unambiguous'' form, breaking it at width characters. This is a GNU extension.
The -l N (--line-length=N) option lets you specify the desired line-wrap length for the l command when the script does not give it an explicit width argument.
$ echo abcdefgh | sed -n 'l 5'
abcd\
efgh$
$ echo abcdefgh | sed -n -l 5 'l'
abcd\
efgh$
$ echo abcdefgh | sed -n -l 5 'l 3'
ab\
cd\
ef\
gh$
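As for why the l command is useful at all: it makes otherwise invisible characters visible, and -l only controls where that output wraps. A small sketch of the typical use, spotting a tab and a trailing space (the input string is made up for illustration):

```shell
# Feed sed a line that contains a tab and a trailing space.
# With -n, only the output of the l command is printed: the tab is
# shown as \t and the end of the line is marked with $.
printf 'a\tb \n' | sed -n 'l'
```

The space before the final $ in the output is the trailing space that would be easy to miss in normal output.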

Count of matching word, pattern or value from unix korn shell scripting is returning just 1 as count

I'm trying to count the occurrences of a pattern in a variable, but the result is always 1. Here is what I'm doing:
x="HELLO|THIS|IS|TEST"
echo $x | grep -c "|"
Expected result: 3
Actual Result: 1
Do you know why it is returning 1 instead of 3?
Thanks.
grep -c counts lines not matches within a line.
You can use awk to get a count:
x="HELLO|THIS|IS|TEST"
echo "$x" | awk -F '|' '{print NF-1}'
3
Alternatively you can use tr and wc:
echo "$x" | tr -dc '|' | wc -c
3
$ echo "$x" | grep -o '|' | grep -c .
3
grep -c does not count the number of matches. It counts the number of lines that match. By using grep -o, we put the matches on separate lines.
This approach works just as well with multiple lines:
$ cat file
hello|this|is
a|test
$ grep -o '|' file | grep -c .
3
The grep manual says:
grep, egrep, fgrep - print lines matching a pattern
and for the -c flag:
instead print a count of matching lines for each input file
and there is just one line that matches.
You don't need grep for this.
pipe_only=${x//[^|]} # remove everything except | from the value of x
echo "${#pipe_only}" # output the length of pipe_only
Try this:
$ x="HELLO|THIS|IS|TEST"; echo -n "$x" | sed 's/[^|]//g' | wc -c
3
With only one pipe with perl:
echo "$x" |
perl -lne 'print scalar(() = /\|/g)'
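The tr and wc approach from above can be wrapped in a small function. This is a sketch under stated assumptions: count_char is a hypothetical name, and the character being counted must not be special to tr (such as \ or -).

```shell
# Hypothetical helper: count how often the single character "$1" occurs in
# the string "$2". tr -cd deletes every character except "$1", and wc -c
# counts what is left. Assumes "$1" is not special to tr (such as \ or -).
count_char() {
    n=$(printf '%s' "$2" | tr -cd "$1" | wc -c)
    # The arithmetic expansion strips the padding some wc versions add.
    echo "$((n))"
}

x="HELLO|THIS|IS|TEST"
count_char '|' "$x"
```

Because printf '%s' adds no trailing newline, nothing but the matching characters reaches wc.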

Split from 40900000 to 409-00-000

Does anybody know a way to convert "40900000" to "409-00-000" with a single command, using sed or awk?
I have already tried a couple of ways with sed, but with no luck. I need to do this in bulk: there are around 40k lines, and some of these lines are not in the proper format, so they need to be fixed.
Thanks in advance.
Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E option enables extended regex mode, which avoids the need to escape the parentheses and braces
\1 is the first capture group (the bits in between the ( ))
[0-9] means the range zero to nine
{3} means three of the preceding character or range
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p means loop over the input lines and print each one after running the code
-w means enable warnings
-e means execute the following argument as code
\d is the "digit" character class (zero to nine)
There is no need to run external commands; bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh
You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'
For golfing:
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
409-00-000
With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
The dot matches any character, and surrounding an expression with \( and \) makes it a group. The \1 references the first group.
Just for the fun of it, an awk version:
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
409-00-000
This tests whether the number has 8 digits (leading zeros are not counted, because $0+0 converts the line to a number).
A more complex version (needs GNU awk because of gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
A workaround, reading from STDIN:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
From a file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.
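Since some of the 40k lines are not well formed, it may be safer to anchor the pattern so that only lines consisting of exactly 8 digits are rewritten. A minimal sketch, assuming a sed that supports -E (GNU and BSD sed do); format_codes is a hypothetical helper name:

```shell
# Hypothetical helper: reformat lines that consist of exactly 8 digits as
# 3-2-3 groups and pass every other line through unchanged.
# Reads standard input, or the files given as arguments.
# Assumes a sed that supports -E (GNU and BSD sed do).
format_codes() {
    sed -E 's/^([0-9]{3})([0-9]{2})([0-9]{3})$/\1-\2-\3/' "$@"
}

printf '%s\n' 40900000 409-00-000 123456789 abc | format_codes
```

Without the ^ and $ anchors, the 9-digit line 123456789 would be turned into 123-45-6789; with them it is left alone so it can be inspected separately.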

sed: interpolating variables in timestamp format

I would like to use sed to extract all the lines between two specific strings from a file.
I need to do this in a script, and my two strings are variables.
The strings will be in a sort of time stamp format, which means they can be something like:
2014/01/01 or 2014/01/01 08:01
I was trying with something like:
sed -n '/$1/,/$2/p' $file
or even
sed -n '/"$1"/,/"$2"/p' $file
with no luck. I also tried replacing / as the delimiter with ;.
I'm pretty sure the problem is due to the / and blank in input variables, but I can't figure out the proper syntax.
The syntax for using an alternate regex delimiter is:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
https://www.gnu.org/software/sed/manual/sed.html#Addresses
So, pick one of
sed -n '\#'"$1"'#,\#'"$2"'#p' "$file"
sed -n "\\#$1#,\\#$2#p" "$file"
sed -n "$( printf '\#%s#,\#%s#p' "$1" "$2" )" "$file"
or awk
awk -v start="$1" -v end="$2" '$0 ~ start {p=1}; p; $0 ~ end {p=0}' "$file"
From the first $1 to the last $2:
sed -n "\\#$1#,\$p" "$file" | tac | sed -n "\\#$2#,\$p" | tac
This prints from the first $1 to the end, reverses the lines, prints from the first $2 to the new end, and reverses the lines again.
An example: from the first "5" to the last "7"
$ set -- 5 7
$ seq 20 | sed -n "\\#$1#,\$p" | tac | sed -n "\\#$2#,\$p" | tac
5
6
7
8
9
10
11
12
13
14
15
16
17
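Try using double quotes instead of single ones so that the shell expands the variables (this still breaks if the values contain a /, as the timestamps in the question do):
sed -n "/$1/,/$2/p" "$file"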
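Another way to sidestep the delimiter problem is to avoid regular expressions altogether and search for the timestamps as plain substrings. A minimal sketch using awk's index(); print_between and the sample log lines are made up for illustration:

```shell
# Hypothetical helper: print the lines from the first one containing "$1"
# through the next later line containing "$2", then look for "$1" again
# (the same behaviour as a sed address range).
# index() does a plain substring search, so / and spaces need no escaping.
# Caveat: awk -v still interprets backslash escapes in the values.
print_between() {
    first=$1
    last=$2
    shift 2
    awk -v first="$first" -v last="$last" '
        !inside && index($0, first) { inside = 1; print; next }
        inside { print; if (index($0, last)) inside = 0 }
    ' "$@"
}

printf '%s\n' \
    '2013/12/31 23:59 a' \
    '2014/01/01 08:01 b' \
    '2014/01/01 09:30 c' \
    '2014/01/02 00:00 d' \
    '2014/01/03 10:00 e' |
    print_between '2014/01/01 08:01' '2014/01/02'
```

This prints the three lines ending in b, c and d. Any file names given after the two strings are passed on to awk; with none, it reads standard input.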

extract characters from filename of newest file

I am writing a bash script that needs to check a directory for existing files and look at the last 4 digits of the first segment of the file name, in order to set the counter used when adding new files to the directory.
Naming structure:
yymmddHNAZXLCOM0001.835
I need to put that portion (0001 in the example) into a CTR variable, so that the next file the script puts into the directory will be
yymmddHNAZXLCOM0002.835
and so on.
What would be the easiest and shortest way to do this?
You can do this with sed:
filename="yymmddHNAZXLCOM0001.835"
first_part=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\1/')
counter=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\2/')
suffix=$(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\3/')
# 10# forces base 10; otherwise a counter such as 0008 is rejected as invalid octal
echo "$first_part$(printf "%04u" $((10#$counter + 1))).$suffix"
=> "yymmddHNAZXLCOM0002.835"
All three sed calls use the same regular expression. The only thing that changes is the group selected to return. There's probably a way to do all of that in one call, but my sed-fu is rusty.
Alternate version, using a Bash array:
filename="yymmddHNAZXLCOM0001.835"
ary=($(echo $filename | sed -e 's/\(.*\)\([0-9]\{4,4\}\)\.\(.*\)/\1 \2 \3/'))
echo "${ary[0]}$(printf "%04u" $((10#${ary[1]} + 1))).${ary[2]}"
=> "yymmddHNAZXLCOM0002.835"
Note: This version assumes that the filename does not have spaces in it.
Try this...
current=`echo yymmddHNAZXLCOM0001.835 | cut -d . -f 1 | rev | cut -c 1-4 | rev`
next=`echo $current | awk '{printf("%04i",$0+1)}'`
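f() {
    if [[ $1 =~ (.*)([[:digit:]]{4})(\.[^.]*)$ ]]; then
        # ctr[0] = prefix, ctr[1] = four-digit counter, ctr[2] = extension
        local -a ctr=("${BASH_REMATCH[@]:1}")
        local next
        # 10# forces base 10 so that 0008 is not read as octal; %04d restores the padding
        printf -v next '%04d' "$((10#${ctr[1]} + 1))"
        touch "${ctr[0]}${next}${ctr[2]}"
        # ...
    else
        echo 'no matches'
    fi
}
shopt -s nullglob
f *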
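For comparison, the same increment can be sketched with POSIX parameter expansion only, stripping the leading zeros by hand instead of relying on the bash-specific 10# prefix. next_name is a hypothetical helper, and it assumes the name always ends in four digits, a dot and an extension:

```shell
# Hypothetical helper: given a name such as yymmddHNAZXLCOM0001.835, print
# the next name in the sequence. Assumes the name ends in four digits
# followed by a dot and an extension; no validation is done.
next_name() {
    base=${1%.*}                 # everything before the last dot
    ext=${1##*.}                 # everything after the last dot
    prefix=${base%????}          # base without its last four characters
    ctr=${base#"$prefix"}        # the last four characters
    # Strip leading zeros so that a counter like 0008 is not read as octal.
    zeros=${ctr%%[!0]*}
    n=${ctr#"$zeros"}
    printf '%s%04d.%s\n' "$prefix" "$(( ${n:-0} + 1 ))" "$ext"
}

next_name yymmddHNAZXLCOM0001.835
next_name yymmddHNAZXLCOM0009.835
```

The two calls print yymmddHNAZXLCOM0002.835 and yymmddHNAZXLCOM0010.835. Picking the newest existing file to feed into the function is left out of this sketch.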

Resources