So basically something like expr index '0123 some string' '012345789' but reversed.
I want to find the index of the first character that is not one of the given characters...
I'd rather not use RegEx, if it is possible...
You can remove chars with tr and pick the first from what is left
left=$(tr -d "012345789" <<< "0123_some string"); echo ${left:0:1}
_
Once you have that character, pass it to expr index to get its (1-based) position:
expr index "0123_some string" "${left:0:1}"
5
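The two steps above can be combined into one function. This is a minimal sketch, not a definitive implementation: the name index_of_first_outside is hypothetical, it computes the position with parameter expansion instead of expr, and it assumes the character set is a plain list that does not start with - and contains no backslashes or ranges (tr would interpret those).

```shell
# index_of_first_outside STRING CHARS   (hypothetical helper name)
# Prints the 1-based index of the first character of STRING that is not in
# CHARS, or 0 if every character is in CHARS (the convention expr index uses).
index_of_first_outside() {
    local s chars left ch prefix
    s=$1
    chars=$2
    left=$(printf '%s' "$s" | tr -d "$chars")   # what remains once CHARS is deleted
    if [ -z "$left" ]; then
        echo 0
        return
    fi
    ch=${left%"${left#?}"}      # first remaining character
    prefix=${s%%"$ch"*}         # text before its first occurrence in STRING
    echo $(( ${#prefix} + 1 ))
}

index_of_first_outside '0123_some string' '012345789'
```

Quoting "$ch" inside the expansion keeps a space or a glob character from being treated specially.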
Using gnu awk and FPAT you can do this:
str="0123 some string"
awk -v FPAT='[012345789]+' '{print length($1)}' <<< "$str"
4
awk -v FPAT='[02345789]+' '{print length($1)}' <<< "$str"
1
awk -v FPAT='[01345789]+' '{print length($1)}' <<< "$str"
2
awk -v FPAT='[0123 ]+' '{print length($1)}' <<< "$str"
5
I know this is in Perl but I got to say that I like it:
$ perl -pe '$i++while s/^\d//;$_=$i' <<< '0123 some string'
4
In case of 1-based index you can use $. which is initialized at 1 when dealing with single lines:
$ perl -pe '$.++while s/^\d//;$_=$.' <<< '0123 some string'
5
I'm using \d because I assume you left the digit 6 out of the list 012345789 by mistake.
Index is currently pointing to the space:
0123 some string
    ^ this space
Even if shell globbing might look similar, it is not a regex.
It could be done in two steps: cut the string, count characters (length).
#!/bin/dash
a="$1" ### string to process
b='0-9' ### range of characters not desired.
c=${a%%[!$b]*} ### cut the string at the first (not) "$b".
echo "${#c}" ### Print the value of the position index (from 0).
It is written to work on many shells (including bash, of course).
Use as:
$ script.sh "0123_some string"
4
$ script.sh "012s3_some string"
3
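The same cut-then-count idea can be wrapped in a function that also reports when no character falls outside the set. This is only a sketch: the name prefix_length and the -1 return convention are assumptions, and the second argument is assumed to be a valid bracket-expression body such as 0-9.

```shell
# prefix_length STRING CLASS   (hypothetical helper name)
# CLASS is the inside of a bracket expression, such as 0-9 or 0123456789.
prefix_length() {
    local head
    head=${1%%[!$2]*}          # drop everything from the first character outside CLASS
    if [ "$head" = "$1" ]; then
        printf '%s\n' "-1"     # assumed convention: nothing outside CLASS was found
    else
        printf '%s\n' "${#head}"
    fi
}

prefix_length '0123_some string' '0-9'
prefix_length '012s3_some string' '0-9'
```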
I have a string of form FOO_123_BAR.bazquux, where FOO and BAR are fixed strings, 123 is a number and bazquux is freeform text.
I need to perform a text transformation on this string: extract 123 and bazquux, increment the number and then arrange them in a different string.
For example, FOO_123_BAR.bazquux ⇒ FOO=124 BAR=bazquux.
(Actual transformation is more complex.)
Naturally, I can do this in a sequence of sed and expr calls, but it's ugly:
shopt -s lastpipe
in=FOO_123_BAR.bazquux
echo "$in" | sed -r 's|^FOO_([0-9]+)_BAR\.(.+)$|\1 \2|' | read number text
out="FOO=$((number + 1)) BAR=$text"
Is there a more powerful text processing tool that can do the job in a single invocation? If yes, then how?
Edit: I apologize for not making this clearer, but the exact structure of the input and output is an example. Thus, I prefer general solutions that work with any delimiters or absence thereof, rather than solutions that depend on e. g. presence of underscores.
With GNU sed, you can execute the entire replacement string as an external command using the e flag.
$ s='FOO_123_BAR.bazquux'
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\2/e'
FOO=124 BAR=bazquux
To avoid conflict with shell metacharacters, you need to quote the unknown portions:
$ s='FOO_123_BAR.$x(1)'
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\2/e'
sh: 1: Syntax error: "(" unexpected
$ echo "$s" | sed -E 's/^FOO_([0-9]+)_BAR\.(.+)$/echo FOO=$((\1 + 1)) BAR=\x27\2\x27/e'
FOO=124 BAR=$x(1)
Using any awk in any shell on every UNIX box and assuming none of your substrings contain _ or .:
$ s='FOO_123_BAR.bazquux'
$ echo "$s" | awk -F'[_.]' '{print $1"="$2+1,$3"="$4}'
FOO=124 BAR=bazquux
You may do it with perl:
perl -pe 's|^FOO_([0-9]+)_BAR\.(.+)$|"FOO=" . ($1 + 1) . " BAR=" . $2|e' <<< "$in"
The ($1 + 1) will increment the number captured in Group 1.
Could you please try following, written and tested with shown samples in GNU awk.
1st solution: Adding solution with match function awk.
echo "FOO_123_BAR.bazquux" |
awk '
match($0,/FOO_[0-9]+_BAR/){
split(substr($0,RSTART,RLENGTH),array,"_")
print array[1]"="array[2]+1,array[3] "=" substr($0,RSTART+RLENGTH+1)
}'
2nd solution:
echo "FOO_123_BAR.bazquux" |
awk '
BEGIN{
FS=OFS="_"
}
{
$2+=1
sub(/_/,"=")
sub(/_/," ")
sub(/\./,"=")
}
1'
A pure bash one-liner would be
[[ $s =~ FOO_([0-9]+)_BAR\.(.*) ]] && echo "FOO=$((BASH_REMATCH[1] + 1)) BAR=${BASH_REMATCH[2]}"
assuming the variable s is set to the string that is being parsed before calling that line (s=FOO_123_BAR.bazquux).
Using var substitution:
in=FOO_123_BAR.bazquux
raw=(${in//_/ })
echo "${raw[0]}=$((raw[1]+1)) ${raw[2]//./=}"
FOO=124 BAR=bazquux
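A variant of the substitution approach that does not depend on word splitting, and so tolerates spaces or shell metacharacters in the free-form part, could look like the sketch below. The function name transform is hypothetical, FOO and BAR are the fixed strings from the question, and the number is assumed to have no leading zeros (shell arithmetic would read 010 as octal).

```shell
# transform INPUT   (hypothetical helper name)
# Turns FOO_<number>_BAR.<text> into FOO=<number+1> BAR=<text>.
transform() {
    local rest number text
    case $1 in
        FOO_*_BAR.*) ;;            # expected overall shape
        *) return 1 ;;
    esac
    rest=${1#FOO_}                 # 123_BAR.bazquux
    number=${rest%%_BAR.*}         # 123
    text=${rest#*_BAR.}            # bazquux
    case $number in
        '' | *[!0-9]*) return 1 ;; # reject anything that is not all digits
    esac
    printf 'FOO=%s BAR=%s\n' "$((number + 1))" "$text"
}

transform 'FOO_123_BAR.bazquux'
```

Because the text is only ever passed as a printf argument, it is never re-parsed by the shell.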
I am trying to extract two pieces of data from a string and I am having a bit of trouble. The string is formatted like this:
11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
What I am trying to achieve is to print the first column (11111111-2222:3333:4444:555555555555) and the third section of the colon string (cccccccc), on the same line with a space between the two, as the first column is an identifier. Ideally in a way that can just be run as one-line from the terminal.
I have tried using cut and awk but I have yet to find a good way to make this work.
How about a sed expression like this?
echo "11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd" |
sed -e "s/\(.*\) .*:.*:\(.*\):.*/\1 \2/"
Result:
11111111-2222:3333:4444:555555555555 cccccccc
The following awk script does the job without relying on the format of the first column.
awk -F: 'BEGIN {RS=ORS=" "} NR==1; NR==2 {print $3}'
Use it in a pipe or pass the string as a file (simply append the filename as an argument) or as a here-string (append <<< "your string").
Explanation:
Instead of lines, this awk script splits the input into space-separated records (RS=ORS=" "). Each record is subdivided into :-separated fields (-F:). The first record is printed as is (NR==1;, which is the same as NR==1 {print $0}). In the second record, only the 3rd field is printed (NR==2 {print $3}); for the record aaa:bbb:ccc:ddd the 3rd field is ccc.
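The same record-then-field split can be sketched in plain shell with parameter expansion. This is an illustration under assumptions, not the answer's method: the name id_and_third is hypothetical, and the input is assumed to have exactly one space with at least three colon-separated sections after it.

```shell
# id_and_third LINE   (hypothetical helper name)
id_and_third() {
    local id rest
    id=${1%% *}        # first space-separated column (the identifier)
    rest=${1#* }       # second column: aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
    rest=${rest#*:}    # drop the 1st colon-separated section
    rest=${rest#*:}    # drop the 2nd
    printf '%s %s\n' "$id" "${rest%%:*}"   # keep what precedes the next colon
}

id_and_third '11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
```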
I think the answer from user803422 is better but here's another option. Maybe it'll help you use cut in the future.
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
first=$(echo "$str" | cut -d ' ' -f1)
second=$(echo "$str" | cut -d ':' -f6)
echo "$first $second"
With pure Bash Regex:
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
[[ $str =~ (.*\ )[^:]*:[^:]*:([^:]*) ]] && echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Explanations:
[[ $str =~ (.*\ )[^:]*:[^:]*:([^:]*) ]]: match $str against the POSIX Extended RegEx (.*\ )[^:]*:[^:]*:([^:]*), which contains two capture groups: group 1, (.*\ ), is 0 or more of any characters followed by a space; group 2, ([^:]*), is any number of characters that are not :. The two [^:]*: in between skip the first two colon-separated sections; a greedy .*: in their place would push group 2 onto the last section (dddddddd).
The match has to run in the current shell: inside a $(...) command substitution it would run in a subshell, and the captured groups would be lost when the subshell exits.
${BASH_REMATCH[1]}${BASH_REMATCH[2]}: expand the content of the RegEx captured groups that Bash keeps in the dedicated BASH_REMATCH array.
I want to read below file line by line from a text file and print how I want in shell scripting
Text file content:
zero#123456
one#123
two#12345678
I want to print this as:
zero#1-6
one#1-3
two#1-8
I tried the following:
file="readFile.txt"
while IFS= read -r line
do echo "$line"
done <printf '%s\n' "$file"
Create a script like below: my_print.sh
file="readFile.txt"
while IFS= read -r line
do
one=$(echo "$line" | awk -F'#' '{print $1}') ## Split the line on '#' and take the 1st field, e.g. zero from 'zero#123456'
len=$(echo "$line" | awk -F'#' '{print $2}' | wc -c) ## Take the 2nd field (123456) and count its bytes; wc -c also counts the trailing newline, so len is one more than the number of digits
two=$(echo "$line" | awk -F'#' '{print $2}' | cut -c 1) ## First character of '123456', which is 1
three=$(echo "$line" | awk -F'#' '{print $2}' | cut -c $((len-1))) ## Last character of '123456', which is 6 (len-1 compensates for the newline)
echo "$one#$two-$three" ## Print in the requested format, e.g. 'zero#1-6'
done <"$file"
Run it like:
mayankp@mayank:~/$ sh my_print.sh > output.txt
mayankp@mayank:~/$ cat output.txt
zero#1-6
one#1-3
two#1-8
Let me know if this helps.
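The loop above starts several processes per line. A sketch that does the same job with parameter expansion only is shown below. The function name summarize is hypothetical, it reads from standard input, and it assumes each line has the form name#digits, possibly followed by trailing blanks.

```shell
# summarize   (hypothetical helper name): reads name#digits lines on stdin
summarize() {
    local name digits first last
    while IFS='#' read -r name digits; do
        digits=${digits%%[!0-9]*}          # keep only the leading run of digits
        [ -n "$digits" ] || continue       # skip lines without digits after '#'
        first=${digits%"${digits#?}"}      # first character
        last=${digits#"${digits%?}"}       # last character
        printf '%s#%s-%s\n' "$name" "$first" "$last"
    done
}

printf '%s\n' 'zero#123456' 'one#123' 'two#12345678' | summarize
```

To process the file from the question, run summarize < readFile.txt.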
It's no shell scripting (missed that first, sorry) but using perl with combined lookahead and lookbehind for a number:
$ perl -pe 's/(?<=[0-9]).*(?=[0-9])/-/' file
Text file content:
zero#1-6
one#1-3
two#1-8
Explained some:
s//-/ replace with a -
(?<=[0-9]) positive lookbehind, if preceded by a number
(?=[0-9]) positive lookahead, if followed by a number
With sed:
sed -r 's/^(.+)#([0-9])[0-9]*([0-9])\s*$/\1#\2-\3/' readFile.txt
-r: use extended regular expressions (so that groups and quantifiers need no backslash escaping)
s/expr1/expr2/: substitute expr1 by expr2
expr1 is a regular expression; the relevant parts of the match are caught by 3 capturing groups (the parenthesized ones).
expr2 retrieves the captured strings (\1, \2, \3) and inserts them into the formatted output (the one you wanted).
Regular-Expressions.info is a good place to start learning them. You can also check your own regexps with regex101.com.
Update: Also you could do that with awk:
awk -F'#' '{ \
gsub(/\s*/,"", $2) ; \
print $1 "#" substr($2, 1, 1) "-" substr($2, length($2), 1) \
}' < test.txt
I added a gsub() call because your file seems to have trailing blank characters.
I am trying to write a script that does the following:
Given a string that look like this "There are 5 apples and 3 oranges"
Extract the two integers (5, 3)
Compare them
I got the extract part done.
NUM=echo $String | grep -o "[0-9]\+"
But NUM will be something like this:
5
3
\n
I tried ${NUM[0]} and ${NUM[@]} just to get the first value but it doesn't work.
Any suggestions?
I would do this with process substitution and mapfile:
$ mapfile -t nums < <(grep -Eo '[[:digit:]]+' <<< 'There are 5 apples and 3 oranges')
$ declare -p nums
declare -a nums='([0]="5" [1]="3")'
This makes sure that only newlines separate array elements. This wouldn't be a problem in this case as the search terms are sequences of numbers, but it's a robust approach that would work for any pattern.
Notice that mapfile requires Bash 4.0 or newer.
The way you assign to NUM is incorrect.
So is the grep pattern in your post.
Write like this:
input='There are 5 apples and 3 oranges'
nums=($(grep -Eo '[0-9]+' <<< "$input"))
${nums[0]} will contain the first number, ${nums[1]} the 2nd, and so on.
If the input comes from a command:
nums=($(cmd | grep -Eo '[0-9]+'))
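For the remaining "compare them" step, a minimal sketch follows. The name compare_first_two is hypothetical, it uses the positional parameters rather than an array so that it also runs in plain sh, and it assumes a grep that supports -o, as the commands above already do.

```shell
# compare_first_two STRING   (hypothetical helper name)
compare_first_two() {
    # Word splitting of the newline-separated grep output is intentional here:
    # each number becomes one positional parameter.
    set -- $(printf '%s\n' "$1" | grep -Eo '[0-9]+')
    if [ "$#" -lt 2 ]; then
        echo "need at least two numbers" >&2
        return 1
    fi
    if [ "$1" -gt "$2" ]; then
        echo "$1 > $2"
    elif [ "$1" -lt "$2" ]; then
        echo "$1 < $2"
    else
        echo "$1 = $2"
    fi
}

compare_first_two 'There are 5 apples and 3 oranges'
```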
With GNU awk for FPAT:
$ echo 'There are 5 apples and 3 oranges' |
awk -v FPAT='[0-9]+' '{print ($1 > $2 ? "greater" : "lesser")}'
greater
$ echo 'There are 2 apples and 3 oranges' |
awk -v FPAT='[0-9]+' '{print ($1 > $2 ? "greater" : "lesser")}'
lesser
with GNU awk:
gawk '{if($1>$2){print $1">"$2}else if($1<$2){print $1"<"$2} else {print $1"="$2}}' FPAT='[0-9]+' <<<'There are 5 apples and 8 oranges'
The value of FPAT should be a string that provides a regular
expression. This regular expression describes the contents of each
field.
How can I convert a string of the form ABC_DEF_GHI to AbcDefGhi using a one-line command such as sed?
Here's a one-liner using gawk:
echo ABC_DEF_GHI | gawk 'function cap(s){return toupper(substr(s,1,1))tolower(substr(s,2))}{n=split($0,x,"_");for(i=1;i<=n;i++)o=o cap(x[i]); print o}'
AbcDefGhi
Optimized awk 1-liner
awk -v RS=_ '{printf "%s%s", substr($0,1,1), tolower(substr($0,2))}'
Optimized sed 1-liner
sed 's/\(.\)\(..\)_\(.\)\(..\)_\(.\)\(..\)/\1\L\2\U\3\L\4\U\5\L\6/'
Edit:
Here's a gawk version:
gawk -F_ '{for (i=1;i<=NF;i++) printf "%s%s",substr($i,1,1),tolower(substr($i,2)); printf "\n"}'
Original:
Using sed for this is pretty scary:
sed -r 'h;s/(^|_)./\n/g;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;x;s/((^|_)(.))[^_]*/\3\n/g;G;:a;s/(^.*)([^\n])\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;ta;s/\n//g'
Here it is broken down:
# make a copy in hold space
h;
# replace all the characters which will remain upper case with newlines
s/(^|_)./\n/g;
# lowercase all the remaining characters
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;
# swap the copy into pattern space and the lowercase characters into hold space
x;
# discard all but the characters which will remain upper case
s/((^|_)(.))[^_]*/\3\n/g;
# append the lower case characters to the end of pattern space
G;
# top of the loop
:a;
# shuffle the lower case characters back into their proper positions (see below)
s/(^.*)([^\n])\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;
# if a replacement was made, branch to the top of the loop
ta;
# remove all the newlines
s/\n//g
Here's how the shuffle works:
At the time it starts, this is what pattern space looks like:
A
D
G
bc
ef
hi
The shuffle loop picks up the string that's between the last newline and the end and moves it to the position before the two consecutive newlines (actually three) and moves the extra newline so it's before the character that it previously followed.
After the first step through the loop, this is what pattern space looks like:
A
D
Ghi
bc
ef
And processing proceeds similarly until there's nothing before the extra newline at which point the match fails and the loop branch is not taken.
If you want to title case a sequence of words separated by spaces, the script would be similar:
$ echo 'BEST MOVIE THIS YEAR' | sed -r 'h;s/(^| )./\n/g;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;x;s/((^| ).)[^ ]*/\1\n/g;G;:a;s/(^.*)( [^\n]*)\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;ta;s/^([^\n]*)(.*)\n([^\n]*)$/\1\3\2/;s/\n//g'
Best Movie This Year
One liner using perl:
$ echo 'ABC_DEF_GHI' | perl -npe 's/([A-Z])([^_]+)_?/$1\L$2\E/g;'
AbcDefGhi
This might work for you:
echo "ABC_DEF_GHI" |
sed 'h;s/\(.\)[^_]*\(_\|$\)/\1/g;x;y/'$(printf "%s" {A..Z} / {a..z})'/;G;:a;s/\(\(^[a-z]\)\|_\([a-z]\)\)\([^\n]*\n\)\(.\)/\5\4/;ta;s/\n//'
AbcDefGhi
Or using GNU sed:
echo "ABC_DEF_GHI" | sed 's/\([A-Z]\)\([^_]*\)\(_\|$\)/\1\L\2/g'
AbcDefGhi
Less scary sed version with tr (note that it lowercases every letter, so it prints abcdefghi rather than AbcDefGhi):
echo ABC_DEF_GHI | sed -e 's/_//g' - | tr 'A-Z' 'a-z'
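For comparison, the conversion can also be written as a shell loop that splits on underscores and fixes the case of each word with tr. This is only a sketch under assumptions: the name to_camel is hypothetical, and it starts two tr processes per word, so it is meant for short strings rather than bulk data.

```shell
# to_camel STRING   (hypothetical helper name): ABC_DEF_GHI -> AbcDefGhi
to_camel() {
    local rest word first others out
    rest=$1
    out=
    while [ -n "$rest" ]; do
        word=${rest%%_*}                 # text up to the next underscore
        case $rest in
            *_*) rest=${rest#*_} ;;      # continue after that underscore
            *)   rest= ;;                # last word reached
        esac
        [ -n "$word" ] || continue       # ignore empty words, e.g. in A__B
        first=${word%"${word#?}"}        # first character of the word
        others=${word#?}                 # the rest of the word
        first=$(printf '%s' "$first" | tr '[:lower:]' '[:upper:]')
        others=$(printf '%s' "$others" | tr '[:upper:]' '[:lower:]')
        out=$out$first$others
    done
    printf '%s\n' "$out"
}

to_camel 'ABC_DEF_GHI'
```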