Remove square brackets with parameter expansions - bash

I have an environment variable like so:
VAR=[["val1","val2"],["val3","val4"]]
I do not control the data, so the actual number of values and arrays may vary, but it always follows this array format. There may be more or fewer results than depicted in the example above. I am trying to strip the square brackets so it looks like:
"val1","val2","val3","val4"
Using only bash string manipulation.
I am halfway there. If I do:
echo ${VAR//[/}
It removes all the left brackets. But I cannot figure out what sort of syntax is needed to remove left and right bracket at same time. It doesn't appear to be regex format and I am struggling to find any similar example in the docs. (I am using Ubuntu 20.04)
What is the pattern to remove both of these square brackets with the bash filter?

You need a bracket expression to match both opening and closing brackets.
$ VAR='[["val1","val2"],["val3","val4"]]'
$ echo "${VAR//[][]/}"
"val1","val2","val3","val4"
Bracket expressions are documented here, and here.
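As a minimal sketch of why that pattern works: inside a bracket expression, a ] placed directly after the opening [ is taken literally, so [][] reads as "either ] or [". The variable names other than VAR are illustrative, and the two-pass variant is shown only for comparison.

```shell
#!/usr/bin/env bash
VAR='[["val1","val2"],["val3","val4"]]'

# One pass: the bracket expression [][] matches ']' or '['.
stripped="${VAR//[][]/}"

# Two passes with backslash-escaped brackets give the same result.
tmp="${VAR//\[/}"
two_pass="${tmp//\]/}"

echo "$stripped"
echo "$two_pass"
```

Both echo lines print the same comma-separated list of quoted values.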
This will handle arbitrarily complex depths, such as with:
$ VAR='[["val1","val2"],["val3","val4"]],[a,[b,[c,[d,[e,[[[[[[[[[[[f,g]]]]]]]]]]],h],i]]]'
$ echo "${VAR//[][]/}"
"val1","val2","val3","val4",a,b,c,d,e,f,g,h,i

Related

Bash string manipulation, extracting/removing parts

I'm modifying an old bash file and am having some trouble manipulating strings. The problem is that the strings can be anything random to the left of _<date>.<num>. For example, from ThisIsAString-Sub_tag_150827.1, I need to extract _150827.1. In bash, this seems very difficult to do. In any other language, I would split on _, and just grab the last element of the list. How do I do this in bash? I've tried a few different ways (including with awk), but cannot seem to get it right.
With bash's Parameter Expansion:
a="ThisIsAString-Sub_tag_150827.1"
echo "${a##*_}"
Output:
150827.1
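A short sketch of the related expansions, assuming the same sample string. The # and ## operators strip the shortest and longest matching prefix, while % strips the shortest matching suffix. Since the question asks for the leading underscore as well, it can simply be prepended. The variable names are illustrative.

```shell
#!/usr/bin/env bash
a="ThisIsAString-Sub_tag_150827.1"

suffix="${a##*_}"            # longest prefix ending in "_" removed
with_underscore="_${a##*_}"  # same, with the underscore put back
after_first="${a#*_}"        # shortest prefix ending in "_" removed
prefix="${a%_*}"             # shortest suffix starting with "_" removed

echo "$suffix"
echo "$with_underscore"
echo "$after_first"
echo "$prefix"
```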

Using sed to modify line not containing string

I am trying to write a bash script that uses sed to modify lines in a config file not containing a specific string. To illustrate by example, I could have ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
And I want every line's parenthetical list to be changed such that it contains strings anonuid=-1 and anongid=-1 within its parentheses ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
As can be seen from the example, both anonuid and anongid may already exist within the parentheses, but it is possible that the original parenthetical list has one string but not the other (lines 2, 3, and 4), the list has neither (line 1), the list has both already set properly (line 5), or even one or both of them are set incorrectly (line 3). When either anonuid or anongid is set to a value other than -1, it must be changed to the proper value of -1 (line 3).
What would be the best way to edit my config file using sed such that anonuid=-1 and anongid=-1 is contained in each line's parenthetical list, separated by a comma delimiter of course?
I think this does what you want:
sed -e '/anonuid/{s/anonuid=[-0-9]*/anonuid=-1/;b gid;};s/)$/,anonuid=-1)/;:gid;/anongid/{s/anongid=[-0-9]*/anongid=-1/;b;};s/)$/,anongid=-1)/'
Basically, it has two nearly identical parts with the first dealing with anonuid and the second anongid, each with a bit of logic to decide if it needs to replace or add the appropriate values. (It doesn't bother to check if the value is already correct, that would just complicate things while not changing the results.)
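For readability, here is a sketch of the same logic laid out one command per line and applied to two of the sample lines. The function name fix_anon_ids and the use of printf for input are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the branching sed script.
fix_anon_ids() {
  sed '
    /anonuid/ {
      s/anonuid=[-0-9]*/anonuid=-1/
      b gid
    }
    s/)$/,anonuid=-1)/
    :gid
    /anongid/ {
      s/anongid=[-0-9]*/anongid=-1/
      b
    }
    s/)$/,anongid=-1)/
  '
}

fixed=$(printf '%s\n' \
  '/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)' \
  '/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)' \
  | fix_anon_ids)
echo "$fixed"
```

Each half either rewrites an existing value and branches past the append, or falls through and appends the missing option before the closing parenthesis.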
You can use sed to specify the lines you are interested in:
$ sed -n '/anonuid=..*,anongid=..*)$/!p' $file
The above will print (p) all lines that don't match the regular expression between the two slashes; -n suppresses sed's default output so only those lines appear. I negated the expression by using the !. This way, you're not matching lines with both anonuid and anongid in them.
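A small sketch of the negated address on two of the sample lines, assuming the input is held in a shell variable rather than a file. The -n option turns off sed's automatic printing, so only lines selected by !p are shown.

```shell
#!/usr/bin/env bash
exports='/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)'

# Print only the lines that do NOT already end with anonuid=...,anongid=...)
needs_edit=$(printf '%s\n' "$exports" | sed -n '/anonuid=..*,anongid=..*)$/!p')
echo "$needs_edit"
```

Only the path1 line is printed, because path5 already carries both options in that order.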
Now, you can work on the non-matching lines and edit those with the sed s command:
$ sed '/anonuid=..*,anongid=..*)$/!s/from/to/' $file
The manipulation might be fairly complex, and you might be passing multiple sed commands to get everything just right.
However, if the string no_root_squash appears in each line you want to change, why not take the simple way out:
$ sed 's/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/' $file
This looks for the no_root_squash string and replaces everything from that string to the end of the line with the text you want. Are there lines you are touching that don't need to be edited? Yes, but you're not really changing those lines. You're basically substituting no_root_squash,anonuid=-1,anongid=-1) with the same no_root_squash,anonuid=-1,anongid=-1).
This may be faster even though it's replacing text that doesn't need replacing because there's less processing going on. Plus, it's easier to understand and support in the future.
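A quick sketch of that substitution on one sample line, which also shows that running it a second time leaves the line unchanged. The variable names are illustrative.

```shell
#!/usr/bin/env bash
line='/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)'
replace='s/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/'

fixed=$(printf '%s\n' "$line" | sed "$replace")
again=$(printf '%s\n' "$fixed" | sed "$replace")   # second run is a no-op

echo "$fixed"
```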
Response
Thanks David! Yeah I was considering going that route, but I didn't want to rely 100% on every line containing no_root_squash. My current config file only ends in that string, but I'm just not 100% sure that won't potentially be different in the field. Do you think there would be a way to change that so it just overwrites from the end of the last string not containing anonuid=-1 or anongid=-1 onward?
What can you guarantee will be in each line?
You might be able to do a capture group:
sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/' $file
The \(..\) is a capture group. It captures that portion of the matching regular expression and lets you reuse it via \1. I'm capturing from the word sync through a run of characters that contains neither a comma nor a closing parenthesis. Then I'm writing back the capture group, followed by a comma and your anonuid and anongid settings.
Will that work?
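A sketch of the capture-group variant on a line that does not contain no_root_squash. The option root_squash and the function name add_anon_ids are used purely for illustration. It assumes sync is followed by exactly one option worth keeping; anything after that option is discarded.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the capture-group substitution.
add_anon_ids() {
  sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/'
}

fixed=$(printf '%s\n' \
  '/some/file/path3 ipAddress2/subnetMask(rw,sync,root_squash,anonuid=0)' \
  | add_anon_ids)
echo "$fixed"
```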
Maybe I am oversimplifying:
sed 's/,anonuid=[-0-9]*//g;s/,anongid=[-0-9]*//g;s/)$/,anonuid=-1,anongid=-1)/' test.txt > test3.txt
This just drops any current instance of anonuid or anongid and adds the string "anonuid=-1,anongid=-1" into the parentheses.

Ruby Regular Expressions: Matching if substring doesn't exist

I'm having an issue trying to capture a group on a string:
"type=gist\nYou need to gist this though\nbecause its awesome\nright now\n</code></p>\n\n<script src=\"https://gist.github.com/3931634.js\"> </script>\n\n\n<p><code>Not code</code></p>\n"
My regex currently looks like this:
/<code>([\s\S]*)<\/code>/
My goal is to get everything in between the code tags. Unfortunately, it's matching up to the second closing code tag. Is there a way to match everything inside the code tags up until the first occurrence of the closing code tag?
All repetition quantifiers in regular expressions are greedy by default (matching as many characters as possible). Make the * ungreedy, like this:
/<code>([\s\S]*?)<\/code>/
But please consider using a DOM parser instead. Regex is just not the right tool to parse HTML.
And I just learned that, for going through multiple matches,
# string holds the HTML text; /m lets . match newlines in Ruby
string.scan(/<code>(.*?)<\/code>/m) {
  puts $1
}
is a very nice way of going through all occurrences of code - but yes, getting a proper parser is better...

Ruby regex for text within parentheses

I am looking for a regex to replace all terms in parentheses unless the parentheses are within square brackets.
e.g.
(matches) #match
[(do not match)] #should not match
[[does (not match)]] #should not match
I currently have:
[^\]]\([^()]*\) #Not a square bracket, an opening bracket, any non-bracket character and a closing bracket.
However this is still matching words within the square brackets.
I have also created a rubular page of my progress so far: http://rubular.com/r/gG22pFk2Ld
A regex is not going to cut it for you if you can nest the square brackets (see this related question).
I think you can only do this with a regex if (a) you only allow one level of square brackets and (b) you assume all square brackets are properly matched. In that case
\([^()]*\)(?![^\[]*])
is sufficient - it matches any parenthesised expression not followed by an unpaired ]. You need (b) because of the limitations of negative lookbehind (only fixed length strings in 1.9, and not allowed at all in 1.8), which mean you are stuck matching (match)] even if you don't want to.
So basically if you need to nest, or to allow unmatched brackets, you should ditch the regex and look at the answer to the question I linked to above.
This is a type of expression you cannot parse using a pure-regex approach, because you need to keep track of the current nesting/state_if_in_square_bracket (so you don't have a type 3 language anymore).
However, depending on the exact circumstances, you can parse it with multiple regexes or simple parsers. Example approaches:
Split into sub-strings, delimited by [/[[ or ]/]], change the state when such a square bracket is encountered, replace () in a sub-string if in "not_in_square_bracket" state
Parse for square brackets (including content), remove & remember them (these are "comments"), now replace all the content in normal brackets and re-add the square brackets stuff (you can remember stuff by using unique temp strings)
The complexity of your solution also depends on the detail if escaping ] is allowed.
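The state-tracking idea can be sketched as a small character-by-character scanner. This is written in bash rather than Ruby, purely as an illustration; the function name is hypothetical, and it assumes parentheses do not nest and that ] is never escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop "(...)" groups that sit outside any square brackets.
strip_parens_outside_brackets() {
  local s=$1 out='' c i
  local depth=0      # current square-bracket nesting depth
  local in_paren=0   # 1 while skipping a "(...)" group found at depth 0
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    if ((in_paren)); then
      # Skip characters up to and including the closing ")".
      if [[ $c == ')' ]]; then in_paren=0; fi
      continue
    fi
    case $c in
      '[') depth=$((depth + 1)) ;;
      ']') if ((depth > 0)); then depth=$((depth - 1)); fi ;;
      '(') if ((depth == 0)); then in_paren=1; continue; fi ;;
    esac
    out+=$c
  done
  printf '%s\n' "$out"
}

result=$(strip_parens_outside_brackets 'a(b)c[(d)]e[[f(g)]]h')
echo "$result"
```

Because the scanner keeps a depth counter instead of relying on a single pattern, it copes with any level of square-bracket nesting.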

TEXTMATE: delete comments from document

I know that you can use this to remove blank lines
sed /^$/d
and this to remove comments starting with #
sed /^#/d
but how do you delete all the comments starting with // ?
You just need to "escape" the slashes with the backslash.
/\/\//
The ^ anchor binds the match to the start of the line, so your example will only affect comments starting in the first column. You could try adding spaces and tabs in there, too, and then use the alternation operator | to choose between two comment identifiers.
/^[ \t]*(\/\/|#)/
Edit:
If you simply want to remove comments from the file, then you can do something like:
/(\/\/|#).*/
I don't know what the 'd' operator at the end does, but the above expression should match for you modulo having to escape the parentheses or the alternation operator (the '|' character)
Edit 2:
I just realized that using a Mac you may be "shelling" that command and using the system sed. In that case, you could try putting quotation marks around the search pattern so that the shell doesn't do anything crazy to all of your magic characters. :) In this case, 'd' means "delete the pattern space," so just stick a 'd' after the last example I gave and you should be set.
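A minimal sketch of the quoted form, assuming input arrives on standard input. It deletes whole lines that are blank or start with // after optional whitespace, and leaves trailing comments alone. The function name is hypothetical, and [[:space:]] is used in place of [ \t] because not every sed treats \t inside brackets as a tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove blank lines and lines that are only a // comment.
strip_comment_lines() {
  sed -e '/^[[:space:]]*\/\//d' -e '/^[[:space:]]*$/d'
}

cleaned=$(printf '%s\n' \
  '// header comment' \
  '' \
  'x = 1; // trailing' \
  '   // indented comment' \
  'y = 2;' \
  | strip_comment_lines)
echo "$cleaned"
```

The single quotes keep the shell from interpreting the backslashes, asterisks, and dollar sign before sed sees them.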
Edit 3:
Oh, I just realized: beware of comment markers inside quotes (i.e. you don't want to delete from # to end of line if it's in a string!). The regexp becomes quite a bit more complicated in that case, unfortunately, unless you just forgo checking lines with strings for comments. ...but then you'd need to use sed's substitution operation rather than search-and-delete-match. ...and you'd need to put in more escapes, and it becomes madness. I suggest searching for an online sed helper (there are good regex testers out there, maybe there's one for sed?).
Sorry to sort of abandon the project at this point. This "problem" is one that sed can do but it becomes substantially more complex at every stage, as opposed to just whipping up a bit of Python to do it.
