Extracting part of a string bounded by special symbols - bash

Hello, I am passing strings such as /bin/bash/Xorg.tar.gz to my script, which is:
for i in $*; do
echo "$(expr match "$i" '\.*\.')"
done
I expect it to return only Xorg, but it returns 0. Any ideas why?

It seems weird that your string would be /bin/bash/Xorg.tar.gz (kinda looks like /bin/bash is a directory or something) but either way, you can use standard parameter expansion to get the part you want:
i=${i##*/}
i=${i%%.*}
First remove everything up to the last /, then remove everything from the first . onward.
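As a minimal sketch, the two expansions can be wrapped in a function and applied to every argument. The function name stem is made up for illustration and is not part of the original answer.

```bash
#!/usr/bin/env bash
# stem: hypothetical helper name, used only for this illustration.
stem() {
    local name=${1##*/}          # strip the longest prefix ending in "/"
    printf '%s\n' "${name%%.*}"  # strip the longest suffix starting at "."
}

# Apply it to every argument passed to the script.
for i in "$@"; do
    stem "$i"
done
```

Using "$@" rather than $* keeps arguments that contain spaces intact.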

The expr match directive anchors the regex at the start of the input, so it cannot match a substring in the middle.
However, you can use builtin BASH regex for this:
[[ "$i" =~ .*/([^./]+)\. ]] && echo "${BASH_REMATCH[1]}"
This will print Xorg for your example argument.

The immediate fix (leaving the loop aside):
$ expr '/path/to/Xorg.tar.gz' : '.*/\([^.]*\)'
Xorg
Note:
: is needed after the input string to signal a regex-matching operation.
Note: expr <string> : <regex> is the POSIX-compliant syntax; GNU expr also accepts expr match <string> <regex>, as in your attempt.
expr implicitly matches from the start of the string, so .*/ must be used to match everything up to the last /
\([^.]*\) is used to match everything up to, but not including, the first . of the filename component; note the \-escaping of the ( and ) (the capture group delimiters), which is needed, because expr only supports (the obsolescent and limited) BREs.
Using a capture group ensures that the matched string is output, whereas by default the count of matching chars. is output.
As for the regex you used:
'\.*\.': \.* matches any (possibly empty) sequence (*) of literal . chars. (\.), implicitly at the start of the string, followed by exactly 1 literal . (\.).
In other words: you tried to match 2 or more consecutive . chars. at the start of the string, which is obviously not what you intended.
Because your regex doesn't contain a capture group, expr outputs the count of matching characters, which in this case is 0, since nothing matches.
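The difference between the character count and the captured text can be seen side by side in a short sketch. The sample strings are illustrative; the || true is there only because expr exits with a non-zero status when nothing matches.

```bash
# No capture group: expr prints the number of characters matched.
expr 'Xorg.tar.gz' : 'Xorg'

# The regex from the question finds no leading dots, so expr prints 0
# and exits with status 1 (hence the "|| true").
expr '/path/to/Xorg.tar.gz' : '\.*\.' || true

# With a capture group, expr prints the captured text instead.
expr '/path/to/Xorg.tar.gz' : '.*/\([^.]*\)'
```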
That said, calling an external utility in every iteration of a shell loop is inefficient, so consider:
Tom Fenech's helpful answer, which only uses shell parameter expansions.
anubhava's helpful answer, which only uses Bash's built-in regex-matching operator, =~
If you don't actually need a shell loop and are fine with processing all paths with a single command using external utilities, consider this:
basename -a "$@" | cut -d'.' -f1
Note: basename -a, for processing multiple filename operands, is nonstandard, but both GNU and BSD/macOS basename support it.
To demonstrate it in action:
# Set positional parameters with `set`.
$ set -- '/path/to/Xorg.tar.gz' '/path/to/another/File.with.multiple.suffixes'
$ basename -a "$@" | cut -d'.' -f1
Xorg
File

Related

wrong parameters in unix shell [duplicate]

I wrote a BASH file that features multiple embedded loops of the form
for P in {'0.10','0.20', [...] '0.90','1.00'}; do
for Q in {'0.10','0.20', [...] ,'0.90','1.00'}; do
[...]
I use these variables both as parameters for a command-line application and to create file names directly in BASH. I would like to create duplicates, say P_REP=0_10, that replace the dot with an underscore, without writing an explicit switch statement for every case or some hardcoded equivalent. The (inelegant) way I found to go about it is to
dump the content of P,Q to a temporary file.
replace the dot by an underscore using sed -i 's/./_/'.
read the file again and load its content to the new variable.
Hence, I was wondering if it is possible to run a sed like command directly on the content of a variable?
You can do pattern substitution directly in bash:
P_REP=${P/./_}
Q_REP=${Q/./_}
From the bash(1) man page:
Parameter Expansion
${parameter/pattern/string}
Pattern substitution. The pattern is expanded to produce a pattern just as in pathname expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with #, it must match at the beginning of the expanded value of parameter. If pattern begins with %, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is @ or *, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.
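The forms described in that excerpt can be tried out on a sample value. This is only a sketch; the variable names and the value 1.0.5 are made up for illustration.

```bash
v='1.0.5'

first=${v/./_}       # first match only      -> 1_0.5
all=${v//./_}        # every match           -> 1_0_5
at_start=${v/#1./v}  # must match at start   -> v0.5
at_end=${v/%.5/}     # must match at end, empty replacement -> 1.0
printf '%s\n' "$first" "$all" "$at_start" "$at_end"

# With @ as the parameter, each positional parameter is processed in turn.
set -- 0.10 0.20
printf '%s\n' "${@/./_}"   # prints 0_10 and 0_20 on separate lines
```

Note that the pattern is a shell glob, not a regex, so the dot is literal and needs no escaping.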
John Kugelman's answer is fine for your example, but if you need to process the content of a variable with the actual sed program (or some other arbitrary command), you can do it like this:
P_REP=$(sed 's/\./_/' <<< "$P")
For loops you could use:
#!/bin/bash
# Collect the converted values into arrays, one element per value.
P_REP=( $(for P in '0.10' '0.20' '0.90' '1.00'; do echo "${P/./_}"; done) )
Q_REP=( $(for Q in '0.10' '0.20' '0.90' '1.00'; do echo "${Q/./_}"; done) )
echo "${P_REP[@]}"
echo "${Q_REP[@]}"
For the exact problem you are mentioning, use John's proposition above.
I would however mention, in case you ever have to do something similar that can't be solved with bash's pattern substitution syntax, that you don't need to actually create temporary files to transform content with sed or similar commands. First, you can feed a variable directly to a program as STDIN. Second, you can capture the output of a command (either its STDOUT, its STDERR, or both) directly in a shell variable.
So in your example, you would have had:
for P in 0.10 0.20 [...] 0.90 1.00 ; do
for Q in 0.10 0.20 [...] 0.90 1.00 ; do
P_REP=$( sed 's/\./_/g' <<< "$P" )
Q_REP=$( sed 's/\./_/g' <<< "$Q" )
done
done
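Note also that the brace syntax (that is, {'0.10','0.20',...}) is brace expansion, which is not part of POSIX sh and is mostly specific to Bash and a few shells that follow it. When it is easy to do so, you might prefer the more classical approach to for loops in shell, as I demonstrated above; a plain word list like that is understood by every POSIX-compliant shell. (The <<< here-string is itself not POSIX, so in a strictly portable script you would pipe the value into sed instead.)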
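For a script that has to run under a plain POSIX sh, a sketch along the following lines avoids both the here-string and the ${var/pattern/string} expansion by piping the value through sed. The list of values is shortened for illustration.

```bash
# Portable variant: printf feeds the value to sed on STDIN,
# and command substitution captures sed's STDOUT.
for P in 0.10 0.20 0.90 1.00; do
    P_REP=$(printf '%s\n' "$P" | sed 's/\./_/')
    printf '%s -> %s\n' "$P" "$P_REP"
done
```

This forks sed once per iteration, so it trades speed for portability.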
Why so complicated? There is a simple solution.
Note that this changes ALL occurrences of the string in ALL files under the current folder:
ORG="original_string"
DES="destination_string"
find . -type f -exec sed -i 's/'"${ORG}"'/'"${DES}"'/g' {} +

KSH Shell script Matching file pattern

I am new to shell scripting. I want to iterate over the files in a directory that match the specific patterns below.
Ad_sf_03041500000.dat
SF_AD_0304150.DEL
SF_AD_0404141.EXP
The number of digits must match these patterns exactly.
I am using a KSH shell script. Could you please help me iterate over only those files in a for loop?
The patterns you are looking for are
Ad_sf_{11}([[:digit:]]).dat
SF_AD_{7}([[:digit:]]).DEL
SF_AD_{7}([[:digit:]]).EXP
Note that the {n}(...) pattern, to match exactly n occurrences of the following pattern, is an extension unique to ksh (as far as I know, not even zsh provides an equivalent).
To iterate over matching files, you can use
for f in Ad_sf_{11}(\d).dat SF_AD_{7}(\d).@(DEL|EXP); do
where I've used the "pick one" operator @(...) to combine the two shorter patterns into a single pattern, and I've used \d, which ksh supports as a shorter version of [[:digit:]] when inside parentheses.
Automatic wildcard generation method. Print the filenames with leading text and line numbers...
POSIX shell:
2> /dev/null find \
$(echo Ad_sf_03041500000.dat SF_AD_0304150.DEL SF_AD_0404141.EXP |
sed 's/[0-9]/[0-9]/g' ) |
while read f ; do
echo "Here's $f";
done | nl
ksh (with a spot borrowed from Chepner):
set - Ad_sf_03041500000.dat SF_AD_0304150.DEL SF_AD_0404141.EXP
for f in ${*//[0-9]/[0-9]} ; do [ -f "$f" ] || continue
echo "Here's $f";
done | nl
Output of either method:
1 Here's Ad_sf_03041500000.dat
2 Here's SF_AD_0304150.DEL
3 Here's SF_AD_0404141.EXP
If the line numbers aren't wanted, omit the | nl. echo can be replaced with whatever command needs to be run on the files.
How the POSIX code works. The OP spec is simple enough to churn out the correct wildcard with a little tweaking. Example:
echo Ad_sf_03041500000.dat SF_AD_0304150.DEL SF_AD_0404141.EXP |
sed 's/[0-9]/[0-9]/g'
Which outputs exactly the patterns needed (line feeds added for clarity):
Ad_sf_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].dat
SF_AD_[0-9][0-9][0-9][0-9][0-9][0-9][0-9].DEL
SF_AD_[0-9][0-9][0-9][0-9][0-9][0-9][0-9].EXP
The patterns above go to find, which prints only the matching filenames, (not the pattern itself when there are no files), then the filenames go to a while loop.
(The ksh variant is the same method but uses pattern substitution, set, and test -f in place of sed and find.)
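Bash has no {n}(...) glob operator, but a different technique gives the same exact-count check there: the =~ operator with an ERE interval such as {11}. The sketch below is a Bash alternative, not the ksh method from the answers above, and the function name matches_spec is hypothetical.

```bash
# matches_spec: hypothetical helper; succeeds if the name has one of the
# three shapes from the question, with exactly 11 or 7 digits.
matches_spec() {
    local re='^(Ad_sf_[0-9]{11}\.dat|SF_AD_[0-9]{7}\.(DEL|EXP))$'
    [[ $1 =~ $re ]]
}

# Loop over everything in the current directory and keep only the matches.
for f in *; do
    matches_spec "$f" && printf "Here's %s\n" "$f"
done
```

Keeping the regex in a variable avoids quoting surprises on the right-hand side of =~.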

Extracting snmpdump values (with an exact MIB) from a shell script

I have a a some SNMP dump:
1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178
1.3.6.1.2.1.1.3.0|7|1881685367
1.3.6.1.2.1.1.4.0|6|""
1.3.6.1.2.1.1.5.0|6|"hgfdhg-4365.gfhfg.dfg.com"
1.3.6.1.2.1.1.6.0|6|""
1.3.6.1.2.1.1.7.0|2|6
1.3.6.1.2.1.1.8.0|7|0
1.3.6.1.2.1.1.9.1.2.1|5|1.3.6.1.4.1.9.7.129
1.3.6.1.2.1.1.9.1.2.2|5|1.3.6.1.4.1.9.7.115
I need to grep everything that follows 1.3.6.1.2.1.1.2.0|5| on the first line, without including that prefix in the output. So the result should be 1.3.6.1.4.1.9.1.1178. I've tried this regex:
\b1.3.6.1.2.1.1.2.0\|5\|\s*([^\n\r]*)
But without any success. If a regular expression, or grep, is in fact the right tool, can you help me find the right regex? Otherwise, what tools should I consider instead?
With GNU grep built with PCRE support, you can use Perl's \K escape to discard the part of the string matched so far:
grep -Po "1\.3\.6\.1\.2\.1\.1\.2\.0\|5\|\K.*"
-P enables Perl's regex mode and -o switches output to matched parts rather than whole lines.
I had to escape the characters that have special meaning in Perl regexes, but this can be avoided, as 123 suggests, by enclosing the characters to be interpreted literally between \Q and \E:
grep -Po "\Q1.3.6.1.2.1.1.2.0|5|\E\K.*"
I would usually solve this with sed as follows :
sed -n 's/1\.3\.6\.1\.2\.1\.1\.2\.0|5|\(.*\)/\1/p'
The -n flag disables implicit output and the search and replace command will remove the searched prefix from the line, leaving the relevant part to be printed.
The characters that have special meaning in GNU Basic Regular Expressions (BRE) must be escaped, which in this case is only .. Also note that the grouping tokens are \( and \) rather than the usual ( and ).
An alternate way to do this is in native shell, without any regexes at all. Consider:
prefix='1.3.6.1.2.1.1.2.0|5|'
while IFS= read -r line; do
[[ $line = "$prefix"* ]] && printf '%s\n' "${line#"$prefix"}"
done
If your original string is piped into the while read loop, the output is precisely 1.3.6.1.4.1.9.1.1178.
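To try the loop without a separate input file, it can be wrapped in a function and fed two lines of the dump through a here-document. This is only a sketch; the function name strip_prefix is hypothetical.

```bash
prefix='1.3.6.1.2.1.1.2.0|5|'

# strip_prefix: hypothetical wrapper around the loop; reads lines on STDIN
# and prints the remainder of each line that starts with $prefix.
strip_prefix() {
    local line
    while IFS= read -r line; do
        # Quoting "$prefix" makes the shell treat it literally, not as a glob.
        [[ $line == "$prefix"* ]] && printf '%s\n' "${line#"$prefix"}"
    done
    return 0
}

strip_prefix <<'EOF'
1.3.6.1.2.1.1.2.0|5|1.3.6.1.4.1.9.1.1178
1.3.6.1.2.1.1.3.0|7|1881685367
EOF
```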

Using egrep and regular expression together

I want to search the text file below for words that end in _letter and get the whole portion back to the preceding "::". There are no spaces anywhere in the lines.
blahblah:/blahblah::abc_letter:/blahblah/blahblah
blahblah:/blahblah::cd_123_letter:/blahblah/blahblah
blahblah:::/blahblah::24_cde_letter:/blahblah/blahblah
blahblah::/blahblah::45a6_letter:/blahblah/blahblah
blahblah:/blahblah::fgh_letter:/blahblah/blahblah
blahblah:/blahblah::789_letter:/blahblah/blahblah
I tried
egrep -o '*_letter'
and
egrep -o "*_letter"
But it only returns the word _letter.
Then I want to feed the result to a for loop in a shell script. So the script will look like the following:
for i in [grep command]
mkdir $i
end
It will create the following directories
abc_letter/
cd_123_letter/
24_cde_letter/
45a6_letter/
fgh_letter/
789_letter/
ps: The result between :: and _letter doesn't contain any special character, only alphanumeric character
also my system doesn't have perl
Assuming no spaces or new-lines:
for i in $(sed 's/^.*:\([^/]*_letter\):.*$/\1/g' infile); do
mkdir $i
done
To extract the strings that run from a : to _letter in file.txt and use them in your for loop, you can use the following egrep command and revise your script.sh like this:
#!/bin/bash
for i in $(egrep -o "[^:]+_letter" file.txt); do
mkdir -p $i
done
If you then run ./script.sh and check with ls, you see:
$ ls -1
24_cde_letter
45a6_letter
789_letter
abc_letter
cd_123_letter
fgh_letter
file.txt
script.sh
Explanation
Your original egrep -o '*_letter' probably confused bash filename expansion with regular expressions.
In bash, *something uses the star globbing character, where * on its own means "anything here", followed by something.
In a regular expression, however, the star * means "the preceding character zero or more times". Since * is at the beginning of what you wrote, there is nothing before it to repeat, so it contributes nothing to the match.
The only thing egrep can then match is _letter, and since the -o option displays only the match, each on its own line, you originally saw just a series of _letter lines.
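The difference between the two pattern languages can be checked directly in Bash. The following is a small sketch using one sample line from the question.

```bash
s='abc_letter'

# Glob (shell pattern): * on its own means "any run of characters".
[[ $s == *_letter ]] && echo "glob match"

# Regex: the * needs something to repeat, here "." (any character).
[[ $s =~ .*_letter ]] && echo "regex match"

# grep -E with a bracket expression; -o prints only the matched part.
printf '%s\n' 'blahblah:/blahblah::abc_letter:/blahblah/blahblah' |
    grep -Eo '[^:]+_letter'
```

grep -E is the standard spelling of egrep and accepts the same patterns.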
Our new changes:
egrep pattern starts with [^ ... ], a negation, matches the opposite of what characters you put within. We put : within.
The + says to match the preceding one or more times.
So combined, it says look for anything-but-:, and do this one or more times.
Thus of course it matches anything after :, and keeps matching, until the next part of the pattern
The next part of the pattern is just _letter
egrep -o so only matched text will be shown, one per line
So in this way, from lines such as:
blahblah:/blahblah::abc_letter:/blahblah/blahblah
It successfully extracts:
abc_letter
Then, changes to your bash script:
Bash command substitution $() to have the results of the egrep command sent to the for-loop
for i value...; do ... done syntax
mkdir -p just a convenience in case you are re-testing, it will not error if directory was already made.
So altogether it helps to extract the pattern you wanted and generate directories with those names.
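If the names could ever contain spaces or glob characters, a for loop over $(...) would split or expand them. A slightly more defensive sketch reads the matches line by line instead; the function name make_dirs is hypothetical, and the directories are created in the current directory.

```bash
# make_dirs: hypothetical helper; $1 is the input file.
make_dirs() {
    local name
    # Read one match per line so each name is passed to mkdir unchanged.
    grep -Eo '[^:]+_letter' "$1" | while IFS= read -r name; do
        mkdir -p -- "$name"
    done
}

# Usage (assuming file.txt holds the lines from the question):
#   make_dirs file.txt
```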

How truncate the ../ characters from string in bash?

How can I remove the ../ or .. characters from a string in bash?
So, if I have the strings
str1=../lib
str2=/home/user/../dir1/../dir2/../dir3
then how can I get the strings without any .. characters, so that the result is
str1=lib
str2=/home/user/dir1/dir2/dir3
Please note that I am not interested in the absolute path of the string.
You don't really need to fork a sub-shell to call sed. Use bash parameter expansion:
echo ${var//..\/}
str1=../lib
str2=/home/user/../dir1/../dir2/../dir3
echo ${str1//..\/} # Outputs lib
echo ${str2//..\/} # Outputs /home/user/dir1/dir2/dir3
You could use:
pax> str3=$(echo $str2 | sed 's?\.\./??g') ; echo $str3
/home/user/dir1/dir2/dir3
Just be aware (as you seem to be) that's a different path to the one you started with.
If you're going to be doing this infrequently, forking an external process to do it is fine. If you want to use it many times per second, such as in a tight loop, the internal bash commands will be quicker:
pax> str3=${str2//..\/} ; echo $str3
/home/user/dir1/dir2/dir3
This uses bash pattern substitution as described in the man page (modified slightly to adapt to the question at hand):
${parameter/pattern/string}
The parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with /, all matches of pattern are replaced with string.
If string is null, matches of pattern are deleted and the / following pattern may be omitted.
You can use sed to achieve it
sed 's/\.\.\///g'
For example
echo $str2 | sed 's/\.\.\///g'
Output => /home/user/dir1/dir2/dir3
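The question also mentions a bare .. with no trailing slash, which the substitutions above leave in place. A naive sketch that handles both cases follows; the function name strip_dotdot is hypothetical, and the result is a plain text edit, not a resolved path.

```bash
# strip_dotdot: hypothetical helper; pure text removal, no path resolution.
strip_dotdot() {
    local s=$1
    s=${s//..\//}   # delete every "../"
    s=${s%..}       # delete a bare ".." left at the very end
    printf '%s\n' "$s"
}

strip_dotdot '../lib'
strip_dotdot '/home/user/../dir1/../dir2/../dir3'
```

Because it matches text only, it would also alter a name that merely ends in two dots, so treat it as a starting point.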
