Split from 40900000 to 409-00-000 - bash

Does anybody know a way to convert "40900000" to "409-00-000" with a single command, using sed or awk?
I already tried a couple of ways with sed, but with no luck. I need to do this in bulk: there are around 40k lines, and some of them are not in the proper format, so they need to be fixed.
Thanks in advance

Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E option enables extended regex mode, which avoids the need to escape all the parentheses and braces.
\1 is the first capture group (the part between the first ( and ))
[0-9] means the range zero to nine
{3} means three of the preceding character or range
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p loops over every input line and prints it after the substitution has run
-w means enable warnings
-e means execute one line
\d is the "digit" character class (zero to nine)
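The patterns above are not anchored, so on a line such as 123456789 they rewrite the first eight digits and leave the ninth dangling (123-45-6789). Since the question mentions roughly 40k lines of mixed quality, here is a minimal sketch that anchors the portable BRE form with ^ and $, so only lines that are exactly eight digits change. The sample lines are hypothetical stand-ins for the real file.

```shell
#!/bin/sh
# Sketch: reformat only lines made of exactly eight digits. The ^ and $ anchors
# stop the pattern from matching part of a longer digit run, and lines that are
# already formatted (or malformed in some other way) pass through unchanged.
# The sample lines below are hypothetical.
input='40900000
409-00-000
123456789
12345678'

result=$(printf '%s\n' "$input" |
  sed 's/^\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)$/\1-\2-\3/')

printf '%s\n' "$result"
```

With a real file you would pass the file name to the same sed command instead of piping the sample in; GNU sed's -i option can then edit the file in place.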

No need to run external commands; bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh
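To apply the same substring expansion to a whole file, one option is a read loop. This is a sketch under the assumption that the input is bash; the helper name format_line is hypothetical, and it uses a digits-only regex test instead of ${#a} = 8, because a length check alone would also reformat a line such as abcdefgh.

```shell
#!/bin/bash
# Sketch: apply the substring expansion to every line of input, but only
# when the line is exactly eight digits. format_line is a hypothetical name.
format_line() {
  local a=$1
  if [[ $a =~ ^[0-9]{8}$ ]]; then
    # ${a:offset:length} extracts a substring; ${a:5} runs to the end.
    printf '%s\n' "${a:0:3}-${a:3:2}-${a:5}"
  else
    printf '%s\n' "$a"
  fi
}

# Read line by line; IFS= and -r keep each line exactly as written.
while IFS= read -r line; do
  format_line "$line"
done <<'EOF'
12345678
abcdefgh
409-00-000
EOF
```

For around 40k lines, a bash read loop like this is typically much slower than a single sed or awk process, so it is mainly useful when the rest of the processing already lives in the shell script.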

You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'

For golfing (requires GNU awk, since FIELDWIDTHS is a gawk extension):
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
409-00-000

With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
The dot matches any single character, and surrounding it with \( and \) makes it a group. \1 references the first group.
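Because the dot matches any character, this version also rewrites non-numeric or already formatted lines. A small sketch comparing it with the [0-9] pattern on two sample inputs (chosen here for illustration):

```shell
#!/bin/sh
# Sketch: compare the dot pattern with the [0-9] pattern.
dot_cmd='s/\(...\)\(..\)\(...\)/\1-\2-\3/'
digit_cmd='s/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/'

# "." matches any character, so letters are split up as well.
dots=$(printf '%s\n' abcdefgh | sed "$dot_cmd")
# [0-9] only matches digits, so the line is left alone.
digits=$(printf '%s\n' abcdefgh | sed "$digit_cmd")
# An already formatted line is mangled by the dot version, because its first
# eight characters (409-00-0) still match three dots, two dots, three dots.
mangled=$(printf '%s\n' 409-00-000 | sed "$dot_cmd")

echo "$dots"     # abc-de-fgh
echo "$digits"   # abcdefgh
echo "$mangled"  # 409--0-0-000
```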

Just for the fun of it, an awk
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
409-00-000
This checks that the input, read as a number, has 8 digits.
A more complex version (requires GNU awk because of gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000

Alternatively, with grep and cut, reading from STDIN:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
From a file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.

Related

Regular expression using sed in UNIX

I want to replace a variable using sed. To replace it, I need to know what is present in the file, so I want to extract that string using a regular expression.
$ cat file1.txt
select * from ${database_name}.tab_name;
I want to capture ${type_database_name_env} in a string and use a sed replace command to replace that variable with the actual name.
sed -n 's/[${][a-z][_][a-z][_][a-z][_][a-z][}]/,/./p' file1.txt
I need the output to be:
$ var1=`sed command` # I am looking for proper sed command
$ echo $var1
${database_name}
With grep, you may use
var1="$(grep -o '\${[^{}]*}' file1.txt | head -1)"
The | head -1 is used to extract the first match in case there are more.
A self-contained demo:
f='select * from ${database_name}.tab_name;'
var1="$(grep -o '\${[^{}]*}' <<< "$f" | head -1)"
echo "$var1"
With sed, you may use
var1="$(sed -En 's/.*(\$\{[^{}]*}).*/\1/p' file1.txt)"
A self-contained demo:
f='select * from ${database_name}.tab_name;'
var1="$(sed -En 's/.*(\$\{[^{}]*}).*/\1/p' <<< "$f")"
echo "$var1"
# => ${database_name}
Regex details
.* - matches 0+ chars
(\$\{[^{}]*}) - captures into Group 1 (\1) a $ char followed by {, 0+ chars other than { and }, and then a }
.* - matches 0+ chars.
As the replacement is the reference to the Group 1 text, it is all that remains after sed does its job. Note the -E option: it enables the POSIX ERE syntax, where (...) is used to specify a capturing group instead of \(...\).
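The question's end goal is to replace the placeholder with an actual name. One way to do that in bash, sketched here with a hypothetical replacement value sales_db, is to reuse the extracted string in a pattern substitution. Quoting it inside the pattern makes bash match it literally, so $, { and } need no escaping, which is awkward to get right in a sed pattern built from a variable.

```shell
#!/bin/bash
# Sketch: extract the first ${...} placeholder, then replace it with a value.
# The replacement value "sales_db" is a hypothetical example.
f='select * from ${database_name}.tab_name;'

# Same extraction as above: keep only the ${...} capture group.
var1=$(sed -En 's/.*(\$\{[^{}]*}).*/\1/p' <<< "$f")

# Quoting "$var1" inside the pattern makes bash match it literally.
new_sql=${f//"$var1"/sales_db}

echo "$var1"     # ${database_name}
echo "$new_sql"  # select * from sales_db.tab_name;
```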
You could just use awk:
$ awk -F'[ .]+' '{print $4}' file
${database_name}

Using grep or sed to get text between {}

This is my first thread on Stack Overflow.
I'm learning bash and I can't figure out how to use grep or sed for a specific use. I want to extract/print all the data between specific characters like { and } or [ and ].
I've searched a lot, but I can't find anything about extracting text when the two characters are not on the same line.
I hope you can help me !
Thanks in advance
I didn't realize that the OP has { and } on separate lines. sed would be easier:
$ sed -n '/{/,/}/{//!p}' inputfile
For square brackets, you have to escape the characters:
$ sed -n '/\[/,/\]/{//!p}' inputfile
inputfile:
$ cat inputfile
Some text inside
{
between braces
}
some other text
[
between square bracket
]
some more text
output:
$ sed -n '/{/,/}/{//!p}' inputfile
between braces
$ sed -n '/\[/,/\]/{//!p}' inputfile
between square bracket
If they are on the same line, use Perl-style regex in grep (-P) together with the -o option:
$ echo 'Some text {between}' | grep -o -P '(?<=\{).*(?=\})'
between
$ echo 'Some text [between]' | grep -o -P '(?<=\[).*(?=\])'
between
You can use sed for the {} case like this:
sed 's!.*{\(.*\)}.*!\1!'
\1 is a back-reference that recalls everything matched by the group \(.*\)
You can try this sed script, though awk is better suited to the job:
sed '
/{/!d
s/[^{]*//
:A
$bB
N
/}/!bA
:B
s/}[^}]*$/}/
t
d
' infile
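The sed script above notes that awk is better suited. A minimal awk sketch of the simpler line-range idea (the same behavior as the sed -n '/{/,/}/{//!p}' answer: print the lines strictly between a line containing the opening delimiter and the next line containing the closing one) could look like this. It does not handle delimiters that share a line with other text, and the sample input mirrors the inputfile shown earlier.

```shell
#!/bin/sh
# Sketch: f is set on the opening line ("next" skips printing that line),
# cleared on the closing line, and the bare pattern f prints every line
# while it is set.
sample='Some text inside
{
between braces
}
some other text
[
between square bracket
]
some more text'

# [{] and [}] are bracket expressions, which keeps the regexes portable.
braces=$(printf '%s\n' "$sample" | awk '/[{]/ {f = 1; next} /[}]/ {f = 0} f')
# A lone ] is literal in an ERE, while [ has to be escaped.
squares=$(printf '%s\n' "$sample" | awk '/\[/ {f = 1; next} /]/ {f = 0} f')

echo "$braces"    # between braces
echo "$squares"   # between square bracket
```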
With modern grep (2.6.3+) - it's easy:
[root#s]$ cat test
aa{bb]
cc}dd [ee] ff [gg
hh] ii {jj} kk
[root#s]$ <test grep -z -P '({[^}]+}|\[[^]]+\])' -o
{bb]
cc}
[ee]
[gg
hh]
{jj}
If your grep is 2.5.1 or older (where -z didn't exist and -P was poorly implemented), the input needs to be converted to a single line first.
Example: tr '\n' '\t' replaces all newlines with tab characters (if desired, the opposite replacement can be done after processing).
[root#s]$ <test tr '\n' '\t'|grep '\(\[[^]]*\]\|{[^}]*}\)' --color=always|tr '\t' '\n'
aa{bb]
cc}dd [ee] ff [gg
hh] ii {jj} kk
[root#s]$ <test tr '\n' '\t'|grep '\(\[[^]]*\]\|{[^}]*}\)' -o
{bb] cc}
[ee]
[gg hh]
{jj}
For both versions, you can choose whichever presentation (-o or --color=always) is more appealing.
P.S. All of the above assumes there's no nesting or escaping of the { } [ ] characters in the input.

How to remove special characters from strings but keep underscores in shell script

I have a string that is something like "info_A!__B????????C_*". I want to remove the special characters from it but keep underscores and letters. I tried the [:word:] (ASCII letters and _) character set, but it says "invalid character set". Any idea how to handle this? Thanks.
text="info_!_????????_*"
if [ -z `echo $text | tr -dc "[:word:]"` ]
......
Using bash parameter expansion:
$ var='info_A!__B????????C_*'
$ echo "${var//[^[:alnum:]_]/}"
info_A__BC_
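The same expansion can be wrapped in a small helper and combined with the emptiness check from the question. This is a sketch; the helper name sanitize is hypothetical, and quoting the command substitution inside [ -z ... ] is simply the safer habit compared with the unquoted backticks in the question.

```shell
#!/bin/bash
# Sketch: strip everything except letters, digits and underscores, then test
# whether anything is left. sanitize is a hypothetical helper name.
sanitize() {
  # [^[:alnum:]_] matches one character that is neither alphanumeric nor "_";
  # the // form of the expansion deletes every such match.
  printf '%s\n' "${1//[^[:alnum:]_]/}"
}

clean=$(sanitize 'info_A!__B????????C_*')
echo "$clean"   # info_A__BC_

# Quote the command substitution so the test stays well-formed.
if [ -z "$(sanitize '!?*%')" ]; then
  echo "nothing left after cleaning"
fi
```

In a UTF-8 locale, [:alnum:] may also keep accented letters; running the script with LC_ALL=C restricts the class to ASCII letters and digits.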
A sed one-liner would be
sed 's/[^[:alnum:]_]//g' <<< 'info_!????????*'
gives you
info_
An awk one-liner would be
awk '{gsub(/[^[:alnum:]_]/,"",$0)} 1' <<< 'info_!??A_??????*pi9ngo^%$_mingo745'
gives you
info_A_pi9ngo_mingo745
If you don't wish to have numbers in the output then change :alnum: to :alpha:.
My tr doesn't understand [:word:]. I had to do like this:
$ x=$(echo 'info_A!__B????????C_*' | tr -cd '[:alnum:]_')
$ echo $x
info_A__BC_
Not sure if it's a robust way, but it worked for your sample text.
sed one-liner:
echo "SamPlE_#tExT%, really ?" | sed -e 's/[^a-zA-Z_]//g'
SamPlE_tExTreally

bash script command output execution doesn't assign full output when using backticks

I have used backticks (``) many times to capture the output of a command in a variable, but with the following code I am not getting the right output.
#!/bin/bash
export XLINE='($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER'
echo 'Original XLINE'
echo $XLINE
echo '------------------'
echo 'Extract all word with $ZWP'
#works fine
echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP
echo '------------------'
echo 'Assign all word with $ZWP to XVAR'
#XVAR doesn't get all the values
export XVAR=`echo $XLINE | sed -e 's/\$/\n/g' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP` #fails
echo "$XVAR"
and i get:
Original XLINE
($ZWP_SCRIP_NAME),$ZWP_LT_RSI_TRIGGER)R),$ZWP_RTIMER
------------------
Extract all word with $ZWP
ZWP_SCRIP_NAME
ZWP_LT_RSI_TRIGGER
ZWP_RTIMER
------------------
Assign all word with $ZWP to XVAR
ZWP_RTIMER
why XVAR doesn't get all the values?
However, if I use $() to capture the output instead of ``, it works fine. Why doesn't `` work?
With GNU grep you can use this command:
XVAR=$(grep -oP '\$\KZWP[A-Z_]+' <<< "$XLINE")
If you pass -P grep is using Perl compatible regular expressions. The key here is the \K escape sequence. Basically the regex matches $ZWP followed by one or more uppercase characters or underscores. The \K after the $ removes the $ itself from the match, while its presence is still required to match the whole pattern. Call it poor man's lookbehind if you want, I like it! :)
Btw, grep -o outputs each match on its own line instead of printing the whole lines that match the pattern.
If you don't have GNU grep or you care about portability you can use awk, like this:
XVAR=$(awk -F'$' '{sub(/[^A-Z_].*/, "", $2); print $2}' RS=',' <<< "$XLINE")
First, the smallest change that makes your code "work":
echo "$XLINE" | tr '$' '\n' | sed -e 's/.*\(ZWP[_A-Z]*\).*/\1/g' | grep ZWP_
The use of tr replaces a sed expression that didn't actually do what you thought it did -- try looking at its output to see.
One sane alternative would be to rely on GNU grep's -o option. If you can't do that...
zwpvars=( ) # create a shell array
zwp_assignment_re='[$](ZWP_[[:alnum:]_]+)(.*)' # ...and a regex
content="$XLINE"
while [[ $content =~ $zwp_assignment_re ]]; do
zwpvars+=( "${BASH_REMATCH[1]}" ) # found a reference
content=${BASH_REMATCH[2]} # stuff the remaining content aside
done
printf 'Found variable: %s\n' "${zwpvars[@]}"
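As for why the backtick version behaves differently: inside old-style backticks, the shell removes a backslash that precedes $, ` or \ before the command runs, even inside single quotes. So sed received 's/$/\n/g', where $ anchors the end of the line, instead of 's/\$/\n/g', and the line was never split. A minimal demonstration (the variable names are made up for the example):

```shell
#!/bin/bash
# Sketch: the same sed command inside $( ) and inside backticks.
input='a$b'

# With $( ), sed receives 's/\$/-/' and replaces the literal dollar sign.
with_dollar_paren=$(printf '%s\n' "$input" | sed 's/\$/-/')

# With backticks, the shell strips the backslash first, so sed receives
# 's/$/-/': $ anchors the end of the line and a dash is appended instead.
with_backticks=`printf '%s\n' "$input" | sed 's/\$/-/'`

echo "$with_dollar_paren"   # a-b
echo "$with_backticks"      # a$b-
```

Inside backticks you would need to write \\$ to get \$ through to sed, which is one reason $( ) is generally preferred.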

Converting CamelCase to lowerCamelCase with POSIX Shell

I am trying to change only the first letter of a string to lowercase using a shell script, ideally with a simple way to go from CamelCase to lowerCamelCase.
GOAL:
$DIR="SomeString"
# missing step
$echo $DIR
someString
I have found some great resources for doing this to the entire string, but not for altering just the first letter and leaving the rest of the string untouched.
If your shell is bash 4.0 or newer, you can use the following parameter expansion:
DIR="SomeString" # Note the missing dollar sign.
echo ${DIR,}
Alternative solution (also works on older versions of bash):
DIR="SomeString"
echo $(echo ${DIR:0:1} | tr "[A-Z]" "[a-z]")${DIR:1}
prints
someString
To assign it to a variable:
DIR2="$(echo ${DIR:0:1} | tr "[A-Z]" "[a-z]")${DIR:1}"
echo $DIR2
prints
someString
Alternatives with Perl:
DIR3=$(echo SomeString | perl -ple 's/(.)/\l$1/')
DIR3=$(echo SomeString | perl -nle 'print lcfirst')
DIR3=$(echo "$DIR" | perl -ple 's/.*/lcfirst/e')
Some terrible solutions:
DIR4=$(echo "$DIR" | sed 's/^\(.\).*/\1/' | tr "[A-Z]" "[a-z]")$(echo "$DIR" | sed 's/^.//')
DIR5=$(echo "$DIR" | cut -c1 | tr '[[:upper:]]' '[[:lower:]]')$(echo "$DIR" | cut -c2-)
All of the above was tested with OS X's /bin/bash.
With sed:
var="SomeString"
echo $var | sed 's/^./\L&/'
^ means the start of the line
\L (a GNU sed extension) converts the rest of the replacement to lowercase
& is the whole match (here, the first character)
Perl solution:
DIR=SomeString
perl -le 'print lcfirst shift' "$DIR"
Since awk hasn't yet been mentioned, here's another way you could do it (requires GNU awk):
dir="SomeString"
new_dir=$(awk 'BEGIN{FS=OFS=""}{$1=tolower($1)}1' <<<"$dir")
This sets the input and output field separators to an empty string, so each character is a field. The tolower function does what you think it does. 1 at the end prints the line. If your shell doesn't support <<< you can do echo "$dir" | awk ... instead.
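If you are using ksh, have a look at typeset. Note that typeset is a ksh feature rather than part of POSIX, and bash's typeset does not support -L, so this will not work in bash or a strictly POSIX sh.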
var='SomeString'
typeset -lL1 b="$var"
echo "${b}${var#?}"
Output:
someString
The typeset command creates a special variable that is lowercase, left-aligned and one character long. ${var#?} removes the shortest match of the pattern ? (which matches any single character) from the start of $var, i.e. it drops the first character.
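Since the question asks for a POSIX shell and typeset -L is ksh-specific, here is a sketch that uses only POSIX parameter expansion plus tr, so it should run in sh, dash, ksh and bash alike. The function name lcfirst is hypothetical (it borrows the name of the Perl function used in other answers).

```shell
#!/bin/sh
# Sketch: lowercase only the first character. lcfirst is a hypothetical name.
# POSIX sh has no "local", so rest and first are ordinary global variables.
lcfirst() {
  rest=${1#?}              # drop the first character
  first=${1%"$rest"}       # whatever was dropped is the first character
  # tr with POSIX character classes converts just that one character.
  first=$(printf '%s' "$first" | tr '[:upper:]' '[:lower:]')
  printf '%s%s\n' "$first" "$rest"
}

DIR="SomeString"
lcfirst "$DIR"   # someString
```

Quoting "$rest" inside ${1%"$rest"} makes the shell treat it as a literal string, so glob characters such as * in the input do not change the result.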
