How can I extract part of a string via a shell script? - bash

The string is setup like so:
href="PART I WANT TO EXTRACT">[link]

Use awk:
$ echo 'href="PART I WANT TO EXTRACT">[link]' | awk -F'"' '{print $2}'
PART I WANT TO EXTRACT
Or using the shell itself:
$ a='href="PART I WANT TO EXTRACT">[link]'
$ a=${a#*\"}
$ a=${a%%\"*}
$ echo "$a"
PART I WANT TO EXTRACT

Here's another way in Bash:
$ string='href="PART I WANT TO EXTRACT">[link]'
$ entity='"'
$ string=${string#*"${entity}"}
$ string=${string%"${entity}"*}
$ echo "$string"
PART I WANT TO EXTRACT
This illustrates two features: Remove matching prefix/suffix pattern and the use of a variable to hold the pattern (you could use a literal instead).
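As a small sketch of how the shortest-match and longest-match forms of these operators differ (the sample value and variable names are made up for illustration):

```shell
s='a-b-c'
shortest_prefix=${s#*-}    # remove shortest prefix matching *-  -> b-c
longest_prefix=${s##*-}    # remove longest prefix matching *-   -> c
shortest_suffix=${s%-*}    # remove shortest suffix matching -*  -> a-b
longest_suffix=${s%%-*}    # remove longest suffix matching -*   -> a
printf '%s\n' "$shortest_prefix" "$longest_prefix" "$shortest_suffix" "$longest_suffix"
```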

expr "$string" : 'href="\(.*\)">\[link\]'

grep -o "PART I WANT TO EXTRACT" foo
Edit: "PART I WANT TO EXTRACT" can be a regex, e.g.:
grep -o 'http://[a-z/.]*' foo
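If the surrounding text is known but the wanted part is not, one possible approach is to let grep -o isolate the whole attribute and then split on the quote character. This is only a sketch: the sample line is invented, and it assumes the value itself contains no double quotes.

```shell
line='<a href="http://example.com/page">[link]</a>'
# grep -o prints only the matched part, i.e. href="..."
# cut then takes the second field when splitting on the double quote
url=$(printf '%s\n' "$line" | grep -o 'href="[^"]*"' | cut -d'"' -f2)
echo "$url"
```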

Related

Extract specific string from line with standard grep,egrep or awk

I'm trying to extract a specific string from grep output.
uci show minidlna
produces a large list
.
.
.
minidlna.config.enabled='1'
minidlna.config.db_dir='/mnt/sda1/usb/db'
minidlna.config.enable_tivo='1'
minidlna.config.wide_links='1'
.
.
.
So I tried to narrow down what I wanted by running
uci show minidlna | grep -oE '\bdb_dir=\S+'
this narrows the output to
db_dir='/mnt/sda1/usb/db'
What I want is to output only
/mnt/sda1/usb/db
without the quotes and without the leading "db_dir=", so that I can run rm /mnt/sda1/usb/db/file.db
I've used the answers found here:
How to extract string following a pattern with grep, regex or perl
and that's as close as I got.
EDIT: After using Ed Morton's awk command, I needed to pass the output to the rm command.
I used:
| ( read DB; rm "$DB/files.db" )
read DB reads the output into the variable DB.
( ... ) groups the commands in a subshell.
rm "$DB/files.db" deletes the file files.db.
Is this what you're trying to do?
$ awk -F"'" '/db_dir/{print $2}' file
/mnt/sda1/usb/db
That will work in any awk in any shell on every UNIX box.
If that's not what you want then edit your question to clarify your requirements and post more truly representative sample input/output.
Using sed, with some effort to avoid typing single quotes (\s and \S are GNU extensions, so this needs GNU sed):
sed -n 's/^minidlna.config.db_dir=\s*\S\(\S*\)\S\s*$/\1/p' input
Well, so you end up having a string like db_dir='/mnt/sda1/usb/db'.
I would first remove the quotes by piping this to
.... | tr -d "'"
Now you end up with a string like db_dir=/mnt/sda1/usb/db.
Say you have this string stored in a variable named confstr, then
${confstr##*=}
gives you just /mnt/sda1/usb/db: the pattern *= matches everything from the start up to and including an equals sign, and ## removes the longest such match from the beginning of the value.
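Putting the two steps together on the sample line from the question might look like this (a sketch; confstr and db_dir are just illustrative variable names):

```shell
confstr="db_dir='/mnt/sda1/usb/db'"
# Step 1: delete the single quotes -> db_dir=/mnt/sda1/usb/db
confstr=$(printf '%s\n' "$confstr" | tr -d "'")
# Step 2: drop everything up to and including the last "="
db_dir=${confstr##*=}
echo "$db_dir"
```

Because ## removes the longest match, a path that itself contains an equals sign would be cut at its last "="; using ${confstr#*=} instead stops at the first one.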
I would do this:
Once you have either extracted the relevant line into file.txt or piped it into this command, split the fields on the quote character. Use printf to generate the rm command and pass it to bash to execute.
$ awk -F"'" '{printf "rm %s/file.db\n", $2}' file.txt | bash
rm: /mnt/sda1/usb/db/file.db: No such file or directory
With your original command:
$ uci show minidlna | grep -oE '\bdb_dir=\S+' | \
awk -F"'" '{printf "rm %s/file.db\n", $2}' | bash
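A variation that avoids generating shell text and piping it into bash is to capture the directory in a variable and call rm directly, so the path is quoted and never re-parsed by a shell. This is only a sketch: the sample line stands in for the output of uci show minidlna, and a temporary directory stands in for the real database directory.

```shell
# Stand-in for the real setup: a scratch directory with a file.db in it
db_root=$(mktemp -d)
touch "$db_root/file.db"
sample="minidlna.config.db_dir='$db_root'"

# Split on the single quote with awk, printing only the path
db_dir=$(printf '%s\n' "$sample" | awk -F"'" '/db_dir/ {print $2}')

# Only delete when a path was actually found
if [ -n "$db_dir" ]; then
    rm -- "$db_dir/file.db"
fi
```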

Extracting a substring until and including a matching word using bash tools

I have file names like these:
func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-pfobloc_run-01_bold_space-T1w_preproc.nii.gz
func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz
and from each file name I want to extract the part until and including the word bold so that in the end I have:
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
Any ideas how to do that?
The easiest thing to do is to remove bold and everything after it, then append bold again. Obviously, this only works if the terminating string is fixed, as in this case.
$ f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${f%%bold*}"
func/sub-01_task-biommtloc_run-01_
$ echo "${f%%bold*}bold"
func/sub-01_task-biommtloc_run-01_bold
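The same idea can be wrapped in a small function so that the terminating word becomes a parameter. This is a sketch; up_to_word is a hypothetical helper name, and names that do not contain the word are returned unchanged.

```shell
# up_to_word STRING WORD: print STRING up to and including the first WORD
up_to_word() {
    case $1 in
        *"$2"*) printf '%s\n' "${1%%"$2"*}$2" ;;   # cut at first WORD, re-append it
        *)      printf '%s\n' "$1" ;;              # WORD absent: leave unchanged
    esac
}

up_to_word func/sub-01_task-rest_run-01_bold_space-T1w_preproc.nii.gz bold
```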
Is something like this what you want?
echo func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz | sed -e 's#bold_.*$#bold#'
Hope this helps
This is (needlessly) clever: remove the prefix ending with "bold",
and then do some substring index arithmetic based on the length of the suffix that's left over:
$ file=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ echo "$keep"
func/sub-01_task-biommtloc_run-01_bold
If $file does not contain "bold", then $keep will be empty: we can give it the value of $file if it is empty:
$ file=foobar
$ tmp=${file#*bold}
$ keep=${file:0:${#file}-${#tmp}}
$ : ${keep:=$file}
$ echo "$keep"
foobar
But seriously, do what chepner suggests.
Using Perl:
$ echo "func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz" | perl -pe 's/(.*bold).*/$1/'
func/sub-01_task-biommtloc_run-01_bold
This is similar to glenn's solution, but a bit "less clever" in that it doesn't use substrings, just nested substitutions:
$ while IFS= read -r fname; do echo "${fname%"${fname#*bold}"}"; done < infile
func/sub-01_task-biommtloc_run-01_bold
func/sub-01_task-pfobloc_run-01_bold
func/sub-01_task-rest_run-01_bold
The substitution "${fname%"${fname#*bold}"}" says:
Remove "${fname#*bold}" from the end of each filename, where
"${fname#*bold}" is everything up to and including bold removed from the front of the filename
Example for the first filename with explicit intermediate steps:
$ fname=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.gz
$ echo "${fname#*bold}"
_space-T1w_preproc.nii.gz
$ echo "${fname%"${fname#*bold}"}"
func/sub-01_task-biommtloc_run-01_bold
f=func/sub-01_task-biommtloc_run-01_bold_space-T1w_preproc.nii.g
echo "${f//bold*/bold}"
I would recommend using sed for this task. First put all of your input filenames in a file, say namelist.txt, in the current directory. The following will work as long as your sed supports extended regular expressions (most do, GNU sed in particular). Note that the flag for extended regular expressions differs between platforms (GNU sed accepts -r or -E, BSD and macOS sed use -E), so check your sed manual page. On my Linux, it is -r.
bash -c "sed -r 's/(sub-01_task-.{1,10}_run-01_bold).+/\\1/' namelist.txt"

bash scripting - removing trailing chars from the end of a string

I have a string like that:
rthbp-0.0.3-sources.jar
I want to get a new string (say string2) containing just the version, i.e. '0.0.3'. The rthbp- at the start will remain constant, but the part after the version (-sources.jar) may change.
Is it possible in bash to extract just the version info?
I am doing this: echo ${f:6}, but it only gives me 0.0.3-sources.jar
If you're using bash, you can extract an arbitrary substring by specifying both offset and length:
$ filename=rthbp-0.0.3-sources.jar
$ echo "${filename:6:5}"
#=> 0.0.3
But using exact character offsets like that is fragile. You might want to use something like this:
$ IFS=- read pre version post <<<"$filename"
$ echo "$version"
#=> 0.0.3
Or, somewhat more clunkily:
$ ltrim=${filename%%-*}-
$ rest=${filename#$ltrim}
$ version=${rest%%-*}
Or, as others mentioned, you could call out to cut or awk to do the splitting for you.
No one has mentioned regular expression matching yet, so I will.
[[ $string1 =~ rthbp-(.*)-sources.jar ]]
version=${BASH_REMATCH[1]}
(You may want a slightly more general regular expression; this just demonstrates how to match against a regular expression containing a capture group and how to extract the captured value.)
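For instance, a more general pattern could capture a dotted run of digits between the first two dashes. This is only a bash sketch: the regex is an assumption about what version strings look like, not something the question specifies, and re is just an illustrative variable name.

```shell
string1=rthbp-0.0.3-sources.jar
# One or more groups of digits separated by dots, between the first two dashes.
# Keeping the regex in a variable avoids quoting surprises inside [[ ... ]].
re='^[^-]+-([0-9]+(\.[0-9]+)*)-'
if [[ $string1 =~ $re ]]; then
    version=${BASH_REMATCH[1]}
    echo "$version"
fi
```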
You can use read built-in:
s='rthbp-0.0.3-sources.jar'
IFS=- read a ver _ <<< "$s" && echo "$ver"
0.0.3
I would recommend just using cut for this. Define the delimiter as dashes and keep field two:
$ echo "rthbp-0.0.3-sources.jar" | cut -d'-' -f 2
0.0.3
If you want to use pure bash, you can use parameter expansion, but it isn't as clean. Assuming the version always starts in the same place and is the same length, you can use:
$ str="rthbp-0.0.3-sources.jar"
$ echo "${str:6:5}"
0.0.3
Another variant of regex match:
$ echo "rthbp-0.0.3-sources.jar"|grep -e '([[:digit:]]+\.)+[[:digit:]]+' -oP
0.0.3
echo "rthbp-0.0.3-sources.jar" | awk -F- '$2=="0.0.3"{print $2}'
0.0.3
echo "rthbp-0.0.3-sources.jar" | awk -F- '{print $2}'
0.0.3

Reference to a bash variable whose name contains dot

I have a bash variable: agent1.ip with 192.168.100.137 as its value. When I refer to it in echo like this:
echo $agent1.ip
the result is:
.ip
How can I access the value?
UPDATE: my variables are:
Bash itself doesn't understand variable names with dots in them, but that doesn't mean you can't have such a variable in your environment. Here's an example of how to set it and get it all in one:
env 'agent1.ip=192.168.100.137' bash -c 'env | grep ^agent1\\.ip= | cut -d= -f2-'
Since bash.ip is not a valid identifier in bash, the environment string bash.ip=192.168.100.37 is not used to create a shell variable on shell startup.
I would use awk, a standard tool, to extract the value from the environment.
bash_ip=$(awk 'BEGIN {print ENVIRON["bash.ip"]}')
The cleanest solution is:
echo path.data | awk '{print ENVIRON[$1]}'
Try this:
export myval=`env | grep agent1.port | awk -F'=' '{print $2}'`;echo $myval
Is your code nested, and using functions or scripts that use ksh?
Dotted variable names are an advanced feature in ksh93. A simple case is
$ a=1
$ a.b=123
$ echo ${a.b}
123
$ echo $a
1
If you attempt to assign to a.b before a has been set, you'll get
-ksh: a.b=123: no parent
IHTH

Removing substring out of string using sed

I am trying to remove substring out of variable using sed like this:
PRINT_THIS="`echo "$fullpath" | sed 's/${rootpath}//' -`"
where
fullpath="/media/some path/dir/helloworld/src"
rootpath=/media/some path/dir
I want to echo just the rest of the fullpath, like this (I am using this on a whole bunch of directories, so I need to store it in variables and do it automatically):
echo "helloworld/src"
using variable it would be
echo "Directory: $PRINT_THIS"
The problem is that I cannot get sed to remove the substring. What am I doing wrong? Thanks
You don't need sed for that, bash alone is enough:
$ fullpath="/media/some path/dir/helloworld/src"
$ rootpath="/media/some path/dir"
$ echo ${fullpath#${rootpath}}
/helloworld/src
$ echo ${fullpath#${rootpath}/}
helloworld/src
$ rootpath=unrelated
$ echo ${fullpath#${rootpath}/}
/media/some path/dir/helloworld/src
Check out the String manipulation documentation.
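A sketch of the same expansion wrapped in a function, with the pattern quoted so that characters such as * or ? in the root path are taken literally rather than as glob patterns (strip_root is a hypothetical helper name):

```shell
# strip_root FULLPATH ROOTPATH: print FULLPATH relative to ROOTPATH,
# or FULLPATH unchanged when it does not start with ROOTPATH/
strip_root() {
    printf '%s\n' "${1#"$2"/}"
}

fullpath="/media/some path/dir/helloworld/src"
rootpath="/media/some path/dir"
strip_root "$fullpath" "$rootpath"
```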
To use variables in sed, you must write it like this:
sed "s#$variable##g" FILE
Two things:
I use double quotes (the shell doesn't expand variables in single quotes)
I use another separator that doesn't conflict with the slashes in your paths
Ex:
$ rootpath="/media/some path/dir"
$ fullpath="/media/some path/dir/helloworld/src"
$ echo "$fullpath"
/media/some path/dir/helloworld/src
$ echo "$fullpath" | sed "s#$rootpath##"
/helloworld/src
