How to get a substring starting from the first occurrence of a pattern in bash

I'm trying to get a substring from the start of a pattern.
I would simply use cut, but it wouldn't work if the pattern is a few characters long.
If I needed a single-character delimiter, then this would do the trick:
result=`echo "test String with ( element in parenthesis ) end" | cut -d "(" -f 2-`
edit: sample tests:
INPUT: ("This test String is an input", "in")
OUTPUT: "ing is an input"
INPUT: ("This test string is an input", "in ")
OUTPUT: ""
INPUT: ("This test string is an input", "n")
OUTPUT: "ng is an input"
Note: the parentheses mean that the input consists of both a string and a delimiter string.

EDITED:
In conclusion, what was requested was a way to parse out the text from a string beginning at a particular substring and ending at the end of the line. As mentioned, there are numerous ways to do this. Here's one...
grep -o "DELIM.*" input
... where 'DELIM' is the desired substring.

Also
awk -v delim="in" '{i = index($0, delim); print (i ? substr($0, i) : "")}'

This can be done without external programs. Assuming the string to be processed is in $string and the delimiter is DELIM:
result=${string#"${string%%DELIM*}"}
The inner part substitutes $string with everything starting from the first occurrence of DELIM (if any) removed. The outer part then removes that value from the start of $string, leaving everything starting from the first occurrence of DELIM or the empty string if DELIM does not occur. (The variable string remains unchanged.)
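As a rough sketch, the expansion above can be wrapped in a small function and checked against the sample tests from the question. The function name substr_from is made up for illustration, and quoting "$2" is an added precaution so the delimiter is matched literally rather than as a glob pattern.

```shell
# substr_from STRING DELIM  (hypothetical helper name, not from the answer above)
# Prints STRING starting at the first occurrence of DELIM,
# or an empty line if DELIM does not occur.
substr_from() {
    # Everything before the first DELIM (the whole string if DELIM is absent).
    prefix=${1%%"$2"*}
    # Strip that prefix; what remains starts at DELIM, or is empty.
    result=${1#"$prefix"}
    printf '%s\n' "$result"
}

substr_from "This test String is an input" "in"    # ing is an input
substr_from "This test string is an input" "in "   # (empty line)
substr_from "This test string is an input" "n"     # ng is an input
```

Only parameter expansion and printf are used, so no external program is started.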

Related

In bash how can I get the last part of a string after the last hyphen [duplicate]

I have this variable:
A="Some variable has value abc.123"
I need to extract this value i.e abc.123. Is this possible in bash?
Simplest is
echo "$A" | awk '{print $NF}'
Edit: explanation of how this works...
awk breaks the input into different fields, using whitespace as the separator by default. Hardcoding 5 in place of NF prints out the 5th field in the input:
echo "$A" | awk '{print $5}'
NF is a built-in awk variable that gives the total number of fields in the current record. The following returns the number 5 because there are 5 fields in the string "Some variable has value abc.123":
echo "$A" | awk '{print NF}'
Combining $ with NF outputs the last field in the string, no matter how many fields your string contains.
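To make the difference between NF and $NF concrete, here is a small sketch that captures both in shell variables. The variable names are illustrative, and $(NF-1) is an extra shown only to suggest how the same idea extends to the second-to-last field.

```shell
A="Some variable has value abc.123"

# NF is the number of fields; $NF is the content of the last field.
field_count=$(printf '%s\n' "$A" | awk '{print NF}')
last_field=$(printf '%s\n' "$A" | awk '{print $NF}')
# Any arithmetic expression can follow $, e.g. the field before the last one.
second_to_last=$(printf '%s\n' "$A" | awk '{print $(NF-1)}')

echo "$field_count"      # 5
echo "$last_field"       # abc.123
echo "$second_to_last"   # value
```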
Yes; this:
A="Some variable has value abc.123"
echo "${A##* }"
will print this:
abc.123
(The ${parameter##word} notation is explained in §3.5.3 "Shell Parameter Expansion" of the Bash Reference Manual.)
Some examples using parameter expansion
A="Some variable has value abc.123"
echo "${A##* }"
abc.123
Shortest match on " " (space), removed from the end
echo "${A% *}"
Some variable has value
Shortest match on . (dot), removed from the end
echo "${A%.*}"
Some variable has value abc
Longest match on " " (space), removed from the end
echo "${A%% *}"
Some
Read more Shell-Parameter-Expansion
The documentation is a bit painful to read, so I've summarised it in a simpler way.
Note that the '*' needs to swap places with the ' ' depending on whether you use # or %. (The * is just a wildcard, so you may need to take off your "regex hat" while reading.)
${A% *} - remove shortest trailing * (strip the last word)
${A%% *} - remove longest trailing * (strip the last words)
${A#* } - remove shortest leading * (strip the first word)
${A##* } - remove longest leading * (strip the first words)
Of course a "word" here may contain any character that isn't a literal space.
You might commonly use this syntax to trim filenames:
${A##*/} removes all containing folders, if any, from the start of the path, e.g.
/usr/bin/git -> git
/usr/bin/ -> (empty string)
${A%/*} removes the last file/folder/trailing slash, if any, from the end:
/usr/bin/git -> /usr/bin
/usr/bin/ -> /usr/bin
${A%.*} removes the last extension, if any (just be wary of things like my.path/noext):
archive.tar.gz -> archive.tar
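The filename idioms above can be combined. The following is a minimal sketch using a made-up path; it assumes the path contains at least one slash and the file name at least one dot.

```shell
path="/usr/local/src/archive.tar.gz"   # hypothetical example path

dir=${path%/*}          # shortest trailing /*  removed: directory part
base=${path##*/}        # longest leading */    removed: file name
stem=${base%%.*}        # longest trailing .*   removed: name without any extension
ext=${base#*.}          # shortest leading *.   removed: full extension
last_ext=${base##*.}    # longest leading *.    removed: final extension only

echo "$dir"        # /usr/local/src
echo "$base"       # archive.tar.gz
echo "$stem"       # archive
echo "$ext"        # tar.gz
echo "$last_ext"   # gz
```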
How do you know where the value begins? If it's always the 5th and 6th words, you could use e.g.:
B=$(echo "$A" | cut -d ' ' -f 5-)
This uses the cut command to slice out part of the line, using a simple space as the word delimiter.
As pointed out by Zedfoxus here, this is a very clean method that works on all Unix-based systems. Besides, you don't need to know the exact position of the substring.
A="Some variable has value abc.123"
echo "$A" | rev | cut -d ' ' -f 1 | rev
# abc.123
More ways to do this:
(Run each of these commands in your terminal to test this live.)
For all answers below, start by typing this in your terminal:
A="Some variable has value abc.123"
The array example (#3 below) is a really useful pattern, and depending on what you are trying to do, sometimes the best.
1. with awk, as the main answer shows
echo "$A" | awk '{print $NF}'
2. with grep:
echo "$A" | grep -o '[^ ]*$'
the -o says to only retain the matching portion of the string
the [^ ] part says "don't match spaces"; ie: "not the space char"
the * means: "match 0 or more instances of the preceding match pattern (which is [^ ])", and the $ means "match the end of the line." So, this matches the last word after the last space through to the end of the line; i.e., abc.123 in this case.
3. via regular bash "indexed" arrays and array indexing
Convert A to an array, with elements separated by the characters in IFS (Internal Field Separator), which by default are space, tab, and newline:
Option 1 (will "break in mysterious ways", as @tripleee put it in a comment here, if the string stored in the A variable contains certain special shell characters, so Option 2 below is recommended instead!):
# Capture space-separated words as separate elements in array A_array
A_array=($A)
Option 2 [RECOMMENDED!]. Use the read command, as I explain in my answer here, and as is recommended by the bash shellcheck static code analyzer tool for shell scripts, in ShellCheck rule SC2206, here.
# Capture space-separated words as separate elements in array A_array, using
# a "herestring".
# See my answer here: https://stackoverflow.com/a/71575442/4561887
IFS=" " read -r -a A_array <<< "$A"
Then, print only the last element in the array:
# Print only the last element via bash array right-hand-side indexing syntax
echo "${A_array[-1]}" # last element only
Output:
abc.123
Going further:
What also makes this pattern useful is that it lets you easily do the opposite: obtain all words except the last one, like this:
array_len="${#A_array[@]}"
array_len_minus_one=$((array_len - 1))
echo "${A_array[@]:0:$array_len_minus_one}"
Output:
Some variable has value
For more on the ${array[@]:start:length} array slicing syntax above, see my answer here: Unix & Linux: Bash: slice of positional parameters, and for more info on the bash "Arithmetic Expansion" syntax, see here:
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Arithmetic-Expansion
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Arithmetic
You can use a Bash regex:
A="Some variable has value abc.123"
[[ $A =~ [[:blank:]]([^[:blank:]]+)$ ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
Prints:
abc.123
That works with any [:blank:] delimiter in the current locale (usually [ \t]). If you want to be more specific:
A="Some variable has value abc.123"
pat='[ ]([^ ]+)$'
[[ $A =~ $pat ]] && echo "${BASH_REMATCH[1]}" || echo "no match"
echo "Some variable has value abc.123"| perl -nE'say $1 if /(\S+)$/'

Replace the last character in string

How can I just replace the last character (it's a }) from a string? I need everything before the last character but replace the last character with some new string.
I tried many things with awk and sed but didn't succeed.
For example:
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
}'
should become:
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
\\cf2 Its red now
}'
This replaces the last occurrence of:
}
with
\\cf2 Its red now
}
sed would do this:
# replace '}' in the end
echo '\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural \f0 }' | sed 's/}$/\\cf2 Its red now}/'
# replace any last character
echo '\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural \f0 }' | sed 's/\(.\)$/\\cf2 Its red now\1/'
Replacing the trailing } could be done like this (with $ as the PS1 prompt and > as the PS2 prompt):
$ str="...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
> \\f0
> }"
$ echo "$str"
...\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural
\f0
}
$ echo "${str%\}}\cf2 It's red now
}"
...\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural
\f0
\cf2 It's red now
}
$
The first 3 lines assign your string to my variable str. The next 4 lines show what's in the string. The 2 lines:
echo "${str%\}}\cf2 It's red now
}"
contain a (grammar-corrected) substitution of the material you asked for, and the last lines echo the substituted value.
Basically, ${str%tail} removes the string tail from the end of $str; I remember % ends in 't' for tail (and the analogous ${str#head} has hash starting with 'h' for head).
See shell parameter expansion in the Bash manual for the remaining details.
If you don't know the last character, you can use a ? metacharacter to match the end instead:
echo "${str%?}and the extra"
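A minimal sketch of the same idea as a reusable function follows. The name replace_last_char and its parameter roles are assumptions made for illustration.

```shell
# replace_last_char STRING REPLACEMENT  (hypothetical helper name)
# Drops the final character of STRING, whatever it is, and appends REPLACEMENT.
replace_last_char() {
    # ${1%?} removes the shortest trailing match of ?, i.e. exactly one character.
    printf '%s%s\n' "${1%?}" "$2"
}

replace_last_char 'abc}' ' END'    # abc END
replace_last_char 'x' 'yz'         # yz
```

If the string is empty there is nothing to remove, so only the replacement is printed.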
First make a string with newlines
str=$(printf "%s\n%s\n%s" '\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural' '\\f0' "}'")
Now you look for the last } in your string and replace it including a newline.
The $ makes sure it will only replace it at the last line, & stands for the matched string.
echo "${str}" |sed '$ s/}[^}]$/\\\\cf2 Its red now\n&/'
The above solution only works when the } is at the last line. It becomes more difficult when you also want to support str2:
str2=$(printf "Extra } here.\n%s\nsome other text" "${str}")
You cannot match the } on the last line. Removing the address $ for the last line will result in replacing all } characters (I added a } at the beginning of str2). You only want to replace the last one.
Replacing once is forced with ..../1. Replacing the last and not the first is done by reversing the order of lines with tac. Since you will tac again after the replacement, you need to use a different order in your sed replacement string.
echo "${str2}" | tac |sed 's/}[^}]$/&\n\\\\cf2 Its red now/1' |tac
In awk:
$ awk ' BEGIN { RS=OFS=FS="" } $NF="\\\\cf2 Its red now\n}"' file
RS="" puts awk in paragraph mode: records are separated by blank lines, so input without blank lines is read as one record (change it to suit your needs)
OFS=FS="" separates characters each to its own field
$NF="\\\\cf2 Its red now\n}" replaces the char in the last field ($NF=}) with the quoted text
awk '{sub(/\\f0/,"\\f0\n\\\\\cfs Its red now")}1' file
...\\tx4535\\tx5102\\tx5669\\tx6236\\tx6803\\pardirnatural
\\f0
\\cfs Its red now
}'

Copy text from one line and create a new line with that text under it

I have a text file in which I want to find every occurrence of ID:= "abc123". When one is found, I want to take the value abc123 and create a new line with a set string, newId:= "abc123". How can I do this in the terminal?
I'd like to use bash. Below is an example: find the string '"ID": "', copy the value (abc123), and make a new line with this data.
"ID": "abc123"
"newID": "abc123"
You can do this:
sed -e 's/^"ID": "\(.*\)"/&\
"newID": "\1"/' myfile.txt
First, I'll try to explain the regular expression that searches for matches:
^ Matches the start of the line
"ID": " Matches that exact string
\(.*\) Matches a sequence of zero or more (*) of any character (.). Placing this expression between backslashed parenthesis creates a "capture", which allows us to store the resulting part of the match into an auxiliary variable \1.
" Matches the double-quote character
When it finds a match, it replaces it with:
& the match itself. This operator is an auxiliary variable that represents what was matched.
\<new-line> the backslash followed by an actual new line character escapes a new line, i.e. it allows us to print a new line character into the replacement
"newID": " prints that exact string
\1 prints the contents of our capture, so it prints the ID we found
" prints a double quote character.
Hope this helps =)
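For comparison, the same line-by-line logic can be sketched in plain shell without sed. This is an alternative technique rather than the method of the answer above; the function name add_new_id is hypothetical, and the sketch assumes every ID line has exactly the form "ID": "value".

```shell
# add_new_id  (hypothetical helper name)
# Copies stdin to stdout; after each line of the form "ID": "value"
# it adds a line "newID": "value".
add_new_id() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
        case $line in
            '"ID": "'*'"')
                value=${line#'"ID": "'}   # drop the leading  "ID": "
                value=${value%'"'}        # drop the closing quote
                printf '"newID": "%s"\n' "$value"
                ;;
        esac
    done
}

printf '%s\n' '"ID": "abc123"' 'some other line' | add_new_id
# "ID": "abc123"
# "newID": "abc123"
# some other line
```

For a real file the function would be called as add_new_id < myfile.txt, with the result redirected to a new file.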
Try doing this :
sed -r 's#^"ID": "([a-zA-Z0-9]+)"#"newID": "\1"#' file.txt
sed : the executable
-r : extended regex mode (no need to backslash the parentheses)
s : we perform a substitution, skeleton is s#origin#replacement# (the separator can be anything)
^ : means start of line in regex
( ) : parentheses form a capture group
"newID": is the start of the new string
\1 : is the end of the substituted string (the captured string)
Considering your question is very vague I made some assumptions which will become apparent in my implementation.
INPUT FILE -- call it t
ID="one"
dkkd
ID="two"
ffkjf
ID="three"
ldl
Command run on the input file
for line in `cat t`; do newID=`echo $line | grep ID | cut -d= -f2`; if [[ "$newID" != "" ]]; then echo $line >> t2; echo newID=$newID >> t2; else echo $line >> t2; fi; done
OUTPUT FILE -- Name is t2 (apparent from the command)
ID="one"
newID="one"
dkkd
ID="two"
newID="two"
ffkjf
ID="three"
newID="three"
ldl
Basically, this command goes line by line through the file (in this case called t) and looks for an ID line. If it finds one, it gets its value, prints the original line with the ID, and then prints another one with a newID right after it. If the line in question does not have an ID, it just prints the line itself.
Things to note:
If you have any other line in the file that contains "ID" in it but is not the normal ID that you requested, this will not work.

insert a string at specific position in a file by SED awk

I have a string which I need to insert at a specific position in a file:
The file contains multiple semicolons (;) and I need to insert the string just before the last ";".
Is this possible with sed?
Please post an explanation with the command, as I am new to shell scripting.
before :
adad;sfs;sdfsf;fsdfs
string = jjjjj
after
adad;sfs;sdfsf jjjjj;fsdfs
Thanks in advance
This might work for you:
echo 'adad;sfs;sdfsf;fsdfs'| sed 's/\(.*\);/\1 jjjjj;/'
adad;sfs;sdfsf jjjjj;fsdfs
The \(.*\) is greedy and swallows the whole line; the ; then makes the regexp backtrack to the last ;. The \(.*\) creates a back reference, \1. Put together in the RHS of the s command, this inserts jjjjj before the last ;.
sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/' filename
(substitute jjjjj with what you need to insert).
Example:
$ echo 'adad;sfs;sdfsf;fsdfs;' | sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/'
adad;sfs;sdfsfjjjjj;fsdfs;
Explanation:
sed finds the following pattern: \([^;]*\)\(;[^;]*;$\). Escaped round brackets (\(, \)) form numbered groups so we can refer to them later as \1 and \2.
[^;]* is "everything but ;", repeated any number of times.
$ means end of the line.
Then it changes it to \1jjjjj\2.
\1 and \2 are groups matched in first and second round brackets.
For now, the shorter solution using sed : =)
sed -r 's#;([^;]+);$#; jjjjj;\1#' <<< 'adad;sfs;sdfsf;fsdfs;'
-r option stands for extended regexp
# is the delimiter; the usual / separator can be replaced by any other character
we match a ;, followed by a run of characters that are not ;, followed by the final ;, where $ means the end of the line
the run of characters that are not ; is captured with ()
finally, we replace the matched part with "; jjjjj;" followed by the captured part
Edit: version using only POSIX basic regular expressions (more portable; \+ is a GNU extension, so it is written as [^;][^;]* here):
echo 'adad;sfs;sdfsf;fsdfs;' | sed 's#;\([^;][^;]*\);$#; jjjjj;\1#'
echo 'adad;sfs;sdfsf;fsdfs;' | sed -r 's/(.*);(.*);/\1 jjjj;\2;/'
You don't need the negation of ; because sed is greedy by default and will pick up as many characters as it can.
sed -e 's/\(;[^;]*\)$/ jjjj\1/'
Inserts jjjj before the part where a semicolon is followed by any number of non-semicolons ([^;]*) at the end of the line $. \1 is called a backreference and contains the characters matched between \( and \).
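The same split at the last semicolon can also be sketched with shell parameter expansion alone. This is an alternative to the sed commands above, not a restatement of them; the function name insert_before_last_semicolon is hypothetical, and a single space is put in front of the inserted text to match the expected output in the question.

```shell
# insert_before_last_semicolon LINE TEXT  (hypothetical helper name)
# Prints LINE with " TEXT" inserted just before its last semicolon.
insert_before_last_semicolon() {
    case $1 in
        *';'*)
            # ${1%;*}  = everything before the last ;
            # ${1##*;} = everything after the last ;
            printf '%s %s;%s\n' "${1%;*}" "$2" "${1##*;}"
            ;;
        *)
            # No semicolon at all: leave the line unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}

insert_before_last_semicolon 'adad;sfs;sdfsf;fsdfs' 'jjjjj'
# adad;sfs;sdfsf jjjjj;fsdfs
```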
UPDATE: Since the sample input no longer has a ";" at the end.
Something like this may work for you:
echo "adad;sfs;sdfsf;fsdfs"| awk 'BEGIN{FS=OFS=";"} {$(NF-1)=$(NF-1) " jjjjj"; print}'
OUTPUT:
adad;sfs;sdfsf jjjjj;fsdfs
Explanation: awk starts by setting FS (field separator) and OFS (output field separator) to a semicolon (;). NF in awk stands for the number of fields, so $(NF-1) is the second-to-last field. In this awk command, $(NF-1)=$(NF-1) " jjjjj" simply appends jjjjj to the second-to-last field.

pattern matching in ruby

Could anybody tell me how this expression works?
output = "#{output.gsub(/grep .*$/,'')}"
Before that operation, the value of output is
"df -h | grep /mnt/nand\r\n/dev/mtdblock4 248.5M 130.7M 117.8M 53% /mnt/nand\r\n"
but after the operation it becomes
"df -h | \n/dev/mtdblock4 248.5M 248.5M 130.7M 117.8M 53% /mnt/nand\r\n "
Please help.
Your expression is equivalent to:
output.gsub!(/grep .*$/,'')
which is much easier to read.
The . in the regular expression matches every character except the newline (\n) by default. So, in the string provided, it matches "grep /mnt/nand\r" (the carriage return \r is not a newline, so it is consumed as well) and substitutes an empty string for it. The result is the provided string without the matched substring.
Here is a simpler example:
"hello\n\n\nworld".gsub(/hello.*$/,'') => "\n\n\nworld"
In both your provided regex, and the example above, the $ is not necessary. It is used as an anchor to match the end of a line, but since the pattern immediately before it (.*) matches everything up to a newline, it is redundant (but does not cause harm).
Since gsub returns a string, your first line is exactly the same as
output = output.gsub(/grep .*$/, '')
which takes the string and removes any occurrence of the regexp pattern
/grep .*$/
i.e. all parts of the string that start with 'grep ' until the end of the string or a line break.
There's a good regexp tester/reference here. This one matches the word "grep", then a space, then any number of characters up to the next line break (\n; a \r does not count as a line break here and is matched by "."). "." by itself means any character except a newline, and ".*" together means any number of them, as many as possible. "$" means the end of a line.
For the '$', see here http://www.regular-expressions.info/reference.html
".*$" means "take every character up to the end of the string"; but the parser interprets the "\n" as the end of a line, so it stops there.
