I have a string like this:
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
I want to replace all non-word characters (symbols and whitespace), except the ### delimiters.
I'm currently using:
str.gsub(/[^\w#]+/, 'X')
which yields:
"JimXBobXsXemailX###hl###address###endhl###XisXjb#exampleXcom"
In practice, this is good enough, but it offends me for two reasons:
The # in the email address is not replaced.
The use of [^\w] instead of \W feels sloppy.
How do I replace all non-word characters, unless those characters make up the ###hl### or ###endhl### delimiter strings?
str.gsub(/(###.*?###|\w+)|./) { $1 || "X" }
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
This approach uses the fact that alternation works like a case statement: the first alternative that matches consumes the corresponding text, and no further matching is done on it. Thus, ###.*?### will consume a whole marker (like ###hl###), and nothing else will be matched inside it. We also match any sequence of word characters. If either of those is captured, we can just return it as-is ($1). If not, then the match is a single other character (i.e. not inside a marker, and not a word character), and we replace it with "X".
Regarding your second point, I think you are asking too much; there is no simple way to avoid that.
Regarding the first point, a simple way is to temporarily replace "###" with a character that you will never use. Let's say your data never contains "\r"; then we can use that character as a temporary replacement.
"Jim-Bob's email ###hl###address###endhl### is: jb#example.com"
.gsub("###", "\r").gsub(/[^\w\r]/, "X").gsub("\r", "###")
# => "JimXBobXsXemailX###hl###address###endhl###XisXXjbXexampleXcom"
Does the newline sequence or even all escape sequences in a file count as only one character, even though it's written \n?
After I separated a one-line-file into multiple lines, only one character per line was added according to the wc -m output of the terminal.
\n is a way of representing a newline character in various languages and programs but, as the name suggests, a newline is stored in a file as a single character (on Unix-like systems; Windows text files use the two-character sequence \r\n).
The backslash helps both computers and humans to realise you are referring to a newline character without you having to actually type one, which would be confusing in a lot of instances.
The \n notation is usually used for a single character. Use a hexdump to see the actual bytes, for example xxd.
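A quick way to check this from a shell is sketched below. It assumes a POSIX printf, wc, and od (od is used here in place of xxd only because it is more widely installed); the variable names are illustrative.

```shell
# printf turns the two-character escape \n into one newline character,
# so "a", newline, "b", newline is four characters in total.
count=$(printf 'a\nb\n' | wc -m | tr -d ' ')
echo "$count"    # 4

# Dump the bytes in hex: each newline is the single byte 0a.
bytes=$(printf 'a\nb\n' | od -An -tx1 | tr -d ' \n')
echo "$bytes"    # 610a620a
```

The tr calls only strip the padding that some wc and od implementations add around their output.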
I am trying to write a bash script that uses sed to modify lines in a config file not containing a specific string. To illustrate by example, I could have ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
And I want every line's parenthetical list to be changed such that it contains strings anonuid=-1 and anongid=-1 within its parentheses ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
As can be seen from the example, both anonuid and anongid may already exist within the parentheses, but it is possible that the original parenthetical list has one string but not the other (lines 2, 3, and 4), the list has neither (line 1), the list has both already set properly (line 5), or even one or both of them are set incorrectly (line 3). When either anonuid or anongid is set to a value other than -1, it must be changed to the proper value of -1 (line 3).
What would be the best way to edit my config file using sed such that anonuid=-1 and anongid=-1 is contained in each line's parenthetical list, separated by a comma delimiter of course?
I think this does what you want:
sed -e '/anonuid/{s/anonuid=[-0-9]*/anonuid=-1/;b gid;};s/)$/,anonuid=-1)/;:gid;/anongid/{s/anongid=[-0-9]*/anongid=-1/;b;};s/)$/,anongid=-1)/'
Basically, it has two nearly identical parts, the first dealing with anonuid and the second with anongid, each with a bit of logic to decide whether it needs to replace or add the appropriate value. (It doesn't bother to check whether the value is already correct; that would just complicate things without changing the result.)
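The same program may be easier to follow with one command per line. This is a sketch, assuming a POSIX-style sed and using the five sample lines from the question; the variable names input, script, and result are only for illustration.

```shell
# Sample lines copied from the question.
input='/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)'

# The one-liner, spread over several lines.
script='
# anonuid present: force its value to -1, then jump past the append step.
/anonuid/{
  s/anonuid=[-0-9]*/anonuid=-1/
  b gid
}
# anonuid absent: append it just before the closing parenthesis.
s/)$/,anonuid=-1)/
:gid
# anongid present: force its value to -1 and finish this line.
/anongid/{
  s/anongid=[-0-9]*/anongid=-1/
  b
}
# anongid absent: append it just before the closing parenthesis.
s/)$/,anongid=-1)/
'

result=$(printf '%s\n' "$input" | sed "$script")
printf '%s\n' "$result"
```

Every output line ends up with both options set to -1. The fourth line keeps anongid first and gets anonuid appended after it, which matches the desired output in the question.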
You can use sed to specify the lines you are interested in:
$ sed -n '/anonuid=..*,anongid=..*)$/!p' $file
The above will print (p) all lines that don't match the regular expression between the two slashes; the -n suppresses sed's default output so that only those lines are shown. I negated the expression by using the !. This way, you're not matching lines with both anonuid and anongid in them.
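A small check of the negated address, as a sketch: it assumes a POSIX-style sed, uses -n so that only the explicitly printed lines appear, and takes three of the sample lines from the question as input. The variable names are illustrative.

```shell
# Lines 1, 2 and 5 of the sample from the question.
input='/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)'

# Print only the lines that do NOT end in "anonuid=...,anongid=...)".
script='/anonuid=..*,anongid=..*)$/!p'

result=$(printf '%s\n' "$input" | sed -n "$script")
printf '%s\n' "$result"
```

Only the path1 and path2 lines are printed. Note that the pattern expects anonuid before anongid, so a line that lists them in the opposite order would also count as non-matching.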
Now, you can work on the non-matching lines and editing those with the sed s command:
$ sed '/anonuid=..*,anongid=..*)$/!s/from/to/' $file
The manipulation might be fairly complex, and you might be passing multiple sed commands to get everything just right.
However, if the string no_root_squash appears in each line you want to change, why not take the simple way out:
$ sed 's/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/' $file
This looks for the no_root_squash string and replaces everything from that string to the end of the line with the text you want. Are there lines you are touching that don't need to be edited? Yes, but you're not really changing those lines. You're basically substituting no_root_squash,anonuid=-1,anongid=-1) with the same no_root_squash,anonuid=-1,anongid=-1).
This may be faster even though it's replacing text that doesn't need replacing because there's less processing going on. Plus, it's easier to understand and support in the future.
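To illustrate that already-correct lines come out unchanged, here is a sketch that applies the substitution once and then a second time to its own output. It assumes a POSIX-style sed and two of the sample lines from the question; the variable names are illustrative.

```shell
# Lines 1 and 3 of the sample from the question.
input='/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)'

script='s/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/'

# First pass rewrites the tail of each line.
once=$(printf '%s\n' "$input" | sed "$script")
# Second pass runs on lines that are already correct.
twice=$(printf '%s\n' "$once" | sed "$script")

printf '%s\n' "$once"
[ "$once" = "$twice" ] && echo "second pass changed nothing"
```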
Response
Thanks David! Yeah I was considering going that route, but I didn't want to rely 100% on every line containing no_root_squash. My current config file only ends in that string, but I'm just not 100% sure that won't potentially be different in the field. Do you think there would be a way to change that so it just overwrites from the end of the last string not containing anonuid=-1 or anongid=-1 onward?
What can you guarantee will be in each line?
You might be able to do a capture group:
sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/' $file
The \(..\) is a capture group. It captures that portion of the matching regular expression and allows you to reuse it via the \1. I'm capturing from the word sync through the following run of characters that are neither a comma nor a closing parenthesis. Then, in the replacement, I'm emitting the capture group followed by a comma and your anon uid and gid.
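A sketch of what the capture group keeps and what it discards, assuming a POSIX-style sed. The path3 line comes from the question; the path6 and path7 lines are hypothetical and were added only to show other option lists.

```shell
# path3 is from the question; path6 and path7 are made-up examples.
input='/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path6 ipAddress3/subnetMask(rw,sync,root_squash)
/some/file/path7 ipAddress3/subnetMask(rw,sync,no_root_squash,no_subtree_check)'

# \1 holds "sync," plus the single option that follows it.
script='s/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/'

result=$(printf '%s\n' "$input" | sed "$script")
printf '%s\n' "$result"
```

The path6 line shows that the option after sync does not have to be no_root_squash. The path7 line shows the cost: everything after that one option is replaced, so no_subtree_check is dropped.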
Will that work?
Maybe I am oversimplifying:
sed 's/,anonuid=[-0-9]*//g;s/,anongid=[-0-9]*//g;s/)$/,anonuid=-1,anongid=-1)/' test.txt > test3.txt
This just drops any current instance of anonuid or anongid and adds the string "anonuid=-1,anongid=-1" into the parentheses.
I'm trying to parse a line within a file that contains "ID" and a numeric entry. However, what the script below is doing is grabbing the "ID" numeric value plus everything after it. How can I just cut it down to "ID" + numeric value and nothing else?
Thanks in advance.
tail -n 1 events.log | sed 's/.*id=\([^)]\+\).*/\1/' > event_id.dat
Your bracket expression is [^)]\+, which means "one or more characters other than a closing parenthesis".
If numerical digits are what you want to catch, you need to change that to [0-9]\+
It's tough to tell what you're looking for without an example of input and expected output, but this might work most generally:
sed -e 's/.*id=\([0-9]*\).*/\1/'
That amounts to:
Look for lines that include "id=" immediately followed by some digits ([0-9]*), with any amount of anything before or after
Replace those lines with just the digits (where \1 references the part within parentheses in the match expression)
Does that do what you want? If not, can you be more explicit with your input/output requirements?
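The difference between the two character classes can be seen on a made-up line. This is a sketch only: the real format of events.log is not shown in the question, so the sample line is hypothetical, and [0-9][0-9]* is used as a portable spelling of [0-9]\+.

```shell
# Hypothetical log line; the actual events.log format is unknown.
line='INFO event id=12345 user=bob'

# Original class: anything except ")" is captured, up to the end of the line.
any_script='s/.*id=\([^)][^)]*\).*/\1/'
# Digits only: the capture stops at the first non-digit.
num_script='s/.*id=\([0-9][0-9]*\).*/\1/p'

greedy=$(printf '%s\n' "$line" | sed "$any_script")
digits=$(printf '%s\n' "$line" | sed -n "$num_script")
# With -n and the p flag, a line without "id=" produces no output at all.
none=$(printf '%s\n' 'INFO nothing here' | sed -n "$num_script")

echo "$greedy"   # 12345 user=bob
echo "$digits"   # 12345
```

Using -n together with the p flag is a design choice rather than part of the answers above: without it, a line that contains no id= would be passed through unchanged and end up in event_id.dat.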
I have a program that loads some tab-separated lines into a MySQL table. One of the values has tabs in it, which is causing some problems. The data is created column by column, so I need to find a way to strip the tab character out of an individual field with gsub. I do not, however, want to get rid of anything else, like spaces.
It's really easy: \t is the tab character.
result = string.gsub(/\t/, '')
or, in-place
string.gsub!(/\t/, '')
\t is the escape sequence for a tab within strings. So you can just search for "\t" and replace it with a space or something.