Bash string manipulation, extracting/removing parts - bash

I'm modifying an old bash file and am having some trouble manipulating strings. The problem is that the strings can be anything random to the left of _<date>.<num>. For example, from ThisIsAString-Sub_tag_150827.1, I need to extract _150827.1. In bash, this seems very difficult to do. In any other language, I would split on _, and just grab the last element of the list. How do I do this in bash? I've tried a few different ways (including with awk), but cannot seem to get it right.

With bash's Parameter Expansion:
a="ThisIsAString-Sub_tag_150827.1"
echo "${a##*_}"
Output:
150827.1

Related

Remove square brackets with parameter expansions

I have an environment variable like so:
VAR=[["val1","val2"],["val3","val4"]]
I do not control the data so the actual number of values and arrays may vary but it follows this array format. There may be more or less results than is depicted in the example above. I am trying to strip the angle brackets so it looks like:
"val1","val2","val3","val4"
Using only bash string manipulation.
I am halfway there. If I do:
echo ${VAR//[/}
It removes all the left brackets. But I cannot figure out what sort of syntax is needed to remove left and right bracket at same time. It doesn't appear to be regex format and I am struggling to find any similar example in the docs. (I am using Ubuntu 20.04)
What is the pattern to remove both of these square brackets with the bash filter?
You need a bracket expression to match both opening and closing brackets.
$ VAR='[["val1","val2"],["val3","val4"]]'
$ echo "${VAR//[][]/}"
"val1","val2","val3","val4"
Bracket expressions are documented here, and here.
This will handle arbitrarily complex depths, such as with:
$ VAR='[["val1","val2"],["val3","val4"]],[a,[b,[c,[d,[e,[[[[[[[[[[[f,g]]]]]]]]]]],h],i]]]'
$ echo "${VAR//[][]/}"
"val1","val2","val3","val4",a,b,c,d,e,f,g,h,i

How to generate string for bash word-expansion?

Bash can generate multiple strings from single, if you use {...,...} syntax. Like here:
$ echo pgdb{200,10{0,1}}
pgdb200 pgdb100 pgdb101
Is there any way to take a list of strings and produce (hopefully shorter) string that, upon processing via bash word expansion will produce original list (not necessarily in original order?
For example, I'd like this tool/algorithm, that given:
postgresql
mysql
postgres
miata
would produce (for example): {postgres{ql,},m{iata,ysql}}
I thought about using trie to represent input strings, but can't figure out how to process this trie to build output string.
use Compress::BraceExpansion;?

How to parse a word in shell script

I want to parse the following word in shell script
VERSION=METER1.2.1
Here i want to split it as two words as
WORD1=METER
WORD2=1.2.1
Let me help how to parse it?
Far more efficient than using external tools such is sed is bash's built-in parameter expansion support. For instance, if you want the name variable to contain everything until the first number, and the numbers variable to contain everything after the last alpha character:
version=METER1.2.1
name=${version%%[0-9]*}
numbers=${version##*[[:alpha:]]}
To understand this, see the BashFAQ entry on string manipulation in general, or the BashFAQ entry on parameter expansion in particular.

Using sed to modify line not containing string

I am trying to write a bash script that uses sed to modify lines in a config file not containing a specific string. To illustrate by example, I could have ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
And I want every line's parenthetical list to be changed such that it contains strings anonuid=-1 and anongid=-1 within its parentheses ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
As can be seen from the example, both anonuid and anongid may already exist within the parentheses, but it is possible that the original parenthetical list has one string but not the other (lines 2, 3, and 4), the list has neither (line 1), the list has both already set properly (line 5), or even one or both of them are set incorrectly (line 3). When either anonuid or anongid is set to a value other than -1, it must be changed to the proper value of -1 (line 3).
What would be the best way to edit my config file using sed such that anonuid=-1 and anongid=-1 is contained in each line's parenthetical list, separated by a comma delimiter of course?
I think this does what you want:
sed -e '/anonuid/{s/anonuid=[-0-9]*/anonuid=-1/;b gid;};s/)$/,anonuid=-1)/;:gid;/anongid/{s/anongid=[-0-9]*/anongid=-1/;b;};s/)$/,anongid=-1)/'
Basically, it has two nearly identical parts with the first dealing with anonuid and the second anongid, each with a bit of logic to decide if it needs to replace or add the appropriate values. (It doesn't bother to check if the value is already correct, that would just complicate things while not changing the results.)
You can use sed to specify the lines you are interested in:
$ sed '/anonuid=..*,anongid=..*)$/!p' $file
The above will print (p) all lines that don't match the regular expression between the two slashes. I negated the expression by using the !. This way, you're not matching lines with both anaonuid and anongid in them.
Now, you can work on the non-matching lines and editing those with the sed s command:
$ sed '/anonuid=..*,anongid=..*)$/!s/from/to/`
The manipulation might be fairly complex, and you might be passing multiple sed commands to get everything just right.
However, if the string no_root_squash appear in each line you want to change, why not take the simple way out:
$ sed 's/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/' $file
This is looking for that no_root_squash string, and replacing everything from that string to the end of the line with the text you want. Are there lines you are touching that don't need to be edited? Yes, but you're not really changing those lines. You're basically substituting /no_root_squash,anonuid=-1,anongid=-1) with the same /no_root_squash,anonuid=-1,anongid=-1).
This may be faster even though it's replacing text that doesn't need replacing because there's less processing going on. Plus, it's easier to understand and support in the future.
Response
Thanks David! Yeah I was considering going that route, but I didn't want to rely 100% on every line containing no_root_squash. My current config file only ends in that string, but I'm just not 100% sure that won't potentially be different in the field. Do you think there would be a way to change that so it just overwrites from the end of the last string not containing anonuid=-1 or anongid=-1 onward?
What can you guarantee will be in each line?
You might be able to do a capture group:
sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/' $file
The \(..\) is a capture group. It basically captures that portion of the matching regular expression, and then allows you to reuse it via the \1. I'm capturing from the word sync to a group of characters not including a comma or a closing parentheses. Then, I'm appending the capture group, a comma, and your anon uid and gid.
Will that work?
Maybe I am oversimplifying:
sed 's/anonuid=[-0-9]*[^)]//g;s/anongid=[-0-9]*[^)]//g;s/[)]/anonuid=-1,anongid=-1)/g' test.txt > test3.txt
This just drops any current instance of anonuid or anongid and adds the string
"anonuid=-1,anongid=-1" into the parentheses

Bash variable concatenation

Which form is most efficient?
1)
v=''
v+='a'
v+='b'
v+='c'
2)
v2='a'` `'b'` `'c'
Assuming readability were exactly the same to you, and that's a stretch, would 1) mean creating and throwing away a few string immutables (like in Python) or act as a Java "StringBuffer" with periodical expansion of the buffer capacity? How are string concatenations handled internally in Bash?
If 2) were just as readable to you as 1), would the backticks spawn subshells and would that be more costly, even as a potential 'no-op' than what is done in 1) ?
Well, the simplest and most efficient mechanism would be option 0:
v="abc"
The first mechanism involves four assignments.
The second mechanism is bizarre (and is definitely not readable). It (nominally) runs an empty command in two sub-shells (the two ` ` parts) and concatenates the outputs (an empty string) with the three constants. If the shell simply executes the back-tick commands without noting that they're empty (and it's not unreasonable that it won't notice; it is a weird thing to try — I don't recall seeing it done in my previous 30 years of shell scripting), this is definitely vastly slower.
So, given only options (1) and (2), use option (1), but in general, use option (0) shown above.
Why would you be building up the string piecemeal like that? What's missing from your example that makes the original code sensible but the reduced code shown less sensible.
v=""
x=$(...)
v="$v$x"
y=$(...)
v="$v$y"
z=$(...)
v="$v$z"
This would make more sense, especially if you use each of $x, $y and $z later, and/or use intermediate values of $v (perhaps in the commands represented by triple dots). The concatenation notation used will work with any Bourne-shell derivative; the alternative += shell will work with fewer shells, but is probably slightly more efficient (with the emphasis on 'slightly').
The portable and straight forward method would be to use double quotes and curly brackets for variables:
VARA="beginning text ${VARB} middle text ${VARC}..."
you can even set default values for empty variables this way
VARA="${VARB:-default text} substring manipulation 1st 3 characters ${VARC:0:3}"
using the curly brackets prevents situations where there is a $VARa and you want to write ${VAR}a but end up getting the contents of ${VARa}

Resources