Parsing String to get required Data - shell

I'm trying to parse a line within a file that contains "ID" and a numeric entry. However, what the script below is doing is grabbing the "ID" numeric value plus everything after it. How can I just cut it down to "ID" + numeric value and nothing else?
Thanks in advance.
tail -n 1 events.log | sed 's/.*id=\([^)]\+\).*/\1/' > event_id.dat

The bracket expression inside your capture group is [^)]\+, which means "one or more characters other than a closing parenthesis".
If digits are what you want to capture, change that to [0-9]\+

It's tough to tell what you're looking for without an example of input and expected output, but this might work most generally:
sed -e 's/.*id=\([0-9]*\).*/\1/'
That amounts to:
Look for lines that include "id=" immediately followed by some digits ([0-9]*), with any amount of anything before or after
Replace those lines with just the digits (where \1 references the part within the parentheses in the match expression)
Does that do what you want? If not, can you be more explicit with your input/output requirements?
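As a minimal sketch of the substitution above, assuming a made-up log line (the real events.log format is not shown in the question): the portable form [0-9][0-9]* stands in for [0-9]\+, and -n together with the p flag prints nothing for lines that have no id= at all. The helper name extract_id is hypothetical.

```shell
# extract_id is a hypothetical helper; the sample line is invented for illustration.
extract_id() {
  # -n plus the trailing p prints only lines where the substitution succeeded
  printf '%s\n' "$1" | sed -n 's/.*id=\([0-9][0-9]*\).*/\1/p'
}

extract_id 'event(type=login, id=4821) status=ok'   # prints 4821
```

Because .*id= is greedy, a line with several id= entries would yield the last one.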

Related

Finding number range with grep

I have a database in this format:
username:something:UID:something:name:home_folder
Now I want to see which users have a UID ranging from 1000-5000. This is what I tried to do:
ypcat passwd | grep '^.*:.*:[1-5][0-9]\{2\}:'
My thinking is this: I go to the third column and find numbers that start with a number from 1-5, the next number can be any number - range [0-9] and that range repeats itself 2 more times making it a 4 digit number. In other words it would be something like [1-5][0-9][0-9][0-9].
My output, however, lists even UID's that are greater than 5000. What am I doing wrong?
Also, I realize the code I wrote could potentially list numbers up to 5999. How can I restrict the numbers to 1000-5000?
EDIT: I'm intentionally not using awk since I want to understand what I'm doing wrong with grep.
There are several problems with your regex:
As Sundeep pointed out in a comment, ^.*:.*: will match two or more columns, because the .* parts can match field delimiters (":") as well as field contents. To fix this, use ^[^:]*:[^:]*: (or, equivalently, ^\([^:]*:\)\{2\}; see the notes on bracket expressions and basic vs. extended RE syntax below)
[0-9]\{2\} will match exactly two digits, not three
As you realized, it matches numbers starting with "5" followed by digits other than "0"
As a result of these problems, the pattern ^.*:.*:[1-5][0-9]\{2\}: will match any record with a UID or GID in the range 100-599.
To do it correctly with grep, use grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):' (again, see Sundeep's comments).
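A quick way to check the pattern is to run it over a few invented records in the same username:something:UID:something:name:home_folder layout. This is only a sketch: the sample data and the helper name uid_in_range are assumptions, and ypcat is replaced by printf so the example runs anywhere.

```shell
# Invented sample records; field 3 is the UID, field 4 is deliberately small (100).
sample='alice:x:999:100:Alice:/home/alice
bob:x:1000:100:Bob:/home/bob
carol:x:4999:100:Carol:/home/carol
dave:x:5000:100:Dave:/home/dave
erin:x:5001:100:Erin:/home/erin
frank:x:12345:100:Frank:/home/frank'

# uid_in_range is a hypothetical helper wrapping the grep -E pattern from the answer.
uid_in_range() {
  grep -E '^([^:]*:){2}([1-4][0-9]{3}|5000):'
}

# Prints bob, carol and dave, one per line
printf '%s\n' "$sample" | uid_in_range | cut -d: -f1
```

The trailing ":" in the pattern is what rejects 12345: after matching 1234 the next character would have to be a colon.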
[Added in edit:]
Concerning bracket expressions and what ^ means in them, here's the relevant section of the re_format man page:
A bracket expression is a list of characters enclosed in '[]'. It
normally matches any single character from the list (but see below).
If the list begins with '^', it matches any single character (but see
below) not from the rest of the list. If two characters in the list
are separated by '-', this is shorthand for the full range of
characters between those two (inclusive) in the collating sequence,
e.g. '[0-9]' in ASCII matches any decimal digit.
(bracket expressions can also contain other things, like character classes and equivalence classes, and there are all sorts of special rules about things like how to include characters like "^", "-", "[", or "]" as part of a character list, rather than negating, indicating a range, class, or end of the expression, etc. It's all rather messy, actually.)
Concerning basic vs. extended RE syntax: grep -E uses the "extended" syntax, which is just different enough to mess you up. The relevant differences here are that in a basic RE, the characters "(){}" are treated as literal characters unless escaped (if escaped, they're treated as RE syntax indicating grouping and repetition); in an extended RE, this is reversed: they're treated as RE syntax unless escaped (if escaped, they're treated as literal characters).
That's why I suggest ^\([^:]*:\)\{2\} in the first bullet point, but then actually use ^([^:]*:){2} in the proposed solution -- the first is basic syntax, the second is extended.
The other relevant difference -- and the reason I switched to extended for the actual solution -- is that only extended RE allows | to indicate alternatives, as in this|that|theother (which matches "this" or "that" or "theother"). I need this capability to match a 4-digit number starting with 1-4 or the specific number 5000 ([1-4][0-9]{3}|5000). POSIX basic RE has no alternation operator (GNU grep accepts \| as an extension, but that is not portable), so grep -E and the extended syntax are the reliable choice here.
(There are also many other RE variants, such as Perl-compatible RE (PCRE). When using regular expressions, always be sure to know which variant your regex tool uses, so you don't use syntax it doesn't understand.)
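To make the basic vs. extended difference concrete, here is a small sketch that runs the same grouping and repetition in both syntaxes against one invented record. The variable names are hypothetical; grep -c simply counts matching lines.

```shell
line='bob:x:1000:100:Bob:/home/bob'   # invented sample record

# Basic RE (plain grep): grouping and repetition need backslashes
bre_count=$(printf '%s\n' "$line" | grep -c '^\([^:]*:\)\{2\}1000:')

# Extended RE (grep -E): the same pattern with bare ( ) and { }
ere_count=$(printf '%s\n' "$line" | grep -c -E '^([^:]*:){2}1000:')

# In a basic RE, unescaped parentheses are ordinary characters
literal_count=$(printf '%s\n' 'f(x)' | grep -c 'f(x)')

echo "$bre_count $ere_count $literal_count"   # prints 1 1 1
```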
ypcat passwd | awk -F: '$3 >= 1000 && $3 <= 5000 {print $1}'
awk can do this task simply. Here ":" is set as the delimiter between the fields, and the condition requires the third field to be between 1000 and 5000 inclusive. When the condition is met, the first field is printed.

How to implement Siri/Cortana like functionality in commandline?

I would like to implement a small subset of Siri/Cortana-like features on the command line.
For example:
$ What is the sum of 100 and 1000
> Response: 1100
$ What is the product of 10 and 12
> Response: 120
The questions are predefined regular expressions. It needs to call the matching function in ruby.
Pattern: What is the sum of (\d)+ and (\d)+
Ruby method to call: sum(a,b)
Any pointers or suggestions are appreciated.
That sounds exactly like Cucumber, maybe take a look and see if you can just use their classes to hack something together :) ?
You could do something like the following:
question = gets.chomp
# The operation name is captured without a trailing space so it is a valid method name
if /\A.*(sum|product|quotient|difference)\D+([0-9]+)\D+([0-9]+).*\z/ =~ question
  send($1, $2.to_i, $3.to_i)
end
Quick explanation for anyone who may be new to matching in Ruby:
This reads a line of input and scans it for a function name (sum, product, etc.) followed by one or more non-digit characters. Then it looks for a first number, one or more further non-digit characters, and a second number followed by anything or nothing. The parentheses determine what gets assigned to the variables preceded by a $, i.e. the substring that matches the contents of the first set of parentheses gets assigned to $1.
Next, if the line matched, it calls the method whose name is the value of $1 with the arguments (cast to integers) found in $2 and $3.
Obviously, this isn't generalized at all--you're putting the method names in the regex, and it's taking a fixed number of arguments--but it'll hopefully be useful for getting you on the right track.

Using sed to modify line not containing string

I am trying to write a bash script that uses sed to modify lines in a config file not containing a specific string. To illustrate by example, I could have ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
And I want every line's parenthetical list to be changed such that it contains strings anonuid=-1 and anongid=-1 within its parentheses ...
/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path2 ipAddress1/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1,anonuid=-1)
/some/file/path5 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=-1,anongid=-1)
As can be seen from the example, both anonuid and anongid may already exist within the parentheses, but it is possible that the original parenthetical list has one string but not the other (lines 2, 3, and 4), the list has neither (line 1), the list has both already set properly (line 5), or even one or both of them are set incorrectly (line 3). When either anonuid or anongid is set to a value other than -1, it must be changed to the proper value of -1 (line 3).
What would be the best way to edit my config file using sed such that anonuid=-1 and anongid=-1 are contained in each line's parenthetical list, separated by a comma delimiter of course?
I think this does what you want:
sed -e '/anonuid/{s/anonuid=[-0-9]*/anonuid=-1/;b gid;};s/)$/,anonuid=-1)/;:gid;/anongid/{s/anongid=[-0-9]*/anongid=-1/;b;};s/)$/,anongid=-1)/'
Basically, it has two nearly identical parts with the first dealing with anonuid and the second anongid, each with a bit of logic to decide if it needs to replace or add the appropriate values. (It doesn't bother to check if the value is already correct, that would just complicate things while not changing the results.)
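The same replace-or-append logic can also be written without labels and branches, which may be easier to carry to sed implementations that are picky about label syntax. This is an alternative formulation, not the command above: it first normalises any existing value, then appends whichever option is still missing. The helper name fix_exports is hypothetical, and the sample lines are taken from the question.

```shell
# fix_exports is a hypothetical helper; it reads export lines on stdin.
fix_exports() {
  sed -e 's/anonuid=[-0-9]*/anonuid=-1/' \
      -e 's/anongid=[-0-9]*/anongid=-1/' \
      -e '/anonuid=/!s/)$/,anonuid=-1)/' \
      -e '/anongid=/!s/)$/,anongid=-1)/'
}

# Lines 1, 3 and 4 from the question
printf '%s\n' \
  '/some/file/path1 ipAddress1/subnetMask(rw,sync,no_root_squash)' \
  '/some/file/path3 ipAddress2/subnetMask(rw,sync,no_root_squash,anonuid=0)' \
  '/some/file/path4 ipAddress2/subnetMask(rw,sync,no_root_squash,anongid=-1)' |
  fix_exports
```

As in the expected output from the question, a line that already had only anongid ends up with anongid first and anonuid appended after it.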
You can use sed to specify the lines you are interested in:
$ sed -n '/anonuid=..*,anongid=..*)$/!p' $file
The above will print (p) only the lines that don't match the regular expression between the two slashes: -n suppresses sed's default output, and the ! negates the match. This way, you skip lines that already have both anonuid and anongid in them.
Now, you can work on the non-matching lines and edit them with the sed s command:
$ sed '/anonuid=..*,anongid=..*)$/!s/from/to/' $file
The manipulation might be fairly complex, and you might be passing multiple sed commands to get everything just right.
However, if the string no_root_squash appears in each line you want to change, why not take the simple way out:
$ sed 's/no_root_squash.*$/no_root_squash,anonuid=-1,anongid=-1)/' $file
This looks for the no_root_squash string and replaces everything from that string to the end of the line with the text you want. Are there lines you are touching that don't need to be edited? Yes, but you're not really changing those lines. You're basically substituting no_root_squash,anonuid=-1,anongid=-1) with the same no_root_squash,anonuid=-1,anongid=-1).
This may be faster even though it's replacing text that doesn't need replacing because there's less processing going on. Plus, it's easier to understand and support in the future.
Response
Thanks David! Yeah I was considering going that route, but I didn't want to rely 100% on every line containing no_root_squash. My current config file only ends in that string, but I'm just not 100% sure that won't potentially be different in the field. Do you think there would be a way to change that so it just overwrites from the end of the last string not containing anonuid=-1 or anongid=-1 onward?
What can you guarantee will be in each line?
You might be able to do a capture group:
sed 's/\(sync,[^,)]*\).*/\1,anonuid=-1,anongid=-1)/' $file
The \(..\) is a capture group. It captures that portion of the matching regular expression and lets you reuse it via \1. I'm capturing from the word sync through the next run of characters that contains neither a comma nor a closing parenthesis. Then I'm replacing the whole match with the capture group, a comma, and your anon uid and gid.
Will that work?
Maybe I am oversimplifying:
sed 's/,anonuid=[-0-9]*//g;s/,anongid=[-0-9]*//g;s/)$/,anonuid=-1,anongid=-1)/' test.txt > test3.txt
This just drops any current instance of anonuid or anongid (together with its leading comma, so it assumes neither is the first option in the list) and then adds the string ",anonuid=-1,anongid=-1" just before the closing parenthesis.

How would I get this portion of the string?

Here's my string:
http://media.example.com.s3.amazonaws.com/videos/1/123ab564we65a16a5w_web.m4v
I want this: 123ab564we65a16a5w
The only variables that will change here are the /1/ and the unique key that I'm trying to pull. Everything else will be exactly the same.
For the /1/ portion, that 1 could be anywhere from 1-3 digits, but will always be numeric.
I'm running Ruby 1.9.2.
Assuming nothing else changes, here's the regex for it:
http://media\.example\.com\.s3\.amazonaws\.com/videos/\d{1,3}/(.*)_web\.m4v
If there are other changes, you need to let us know all the variables.
This is shorter:
s.split(/[\/_.]/)[-3]
Since you've indicated that the value you want will always have a "/" immediately before it (and none after it) and an "_" immediately after it, you could use this generic regex:
^.*/(.*)_.*$
Here's why this would work:
^ matches the beginning of the line
.*/ matches any number of characters up to the slash - this is greedy, so it will go until the last slash in the input value
(.*) matches any number of characters and captures the result
_.* matches an underscore and then any number of characters
$ matches the end of the line
By matching anything up to the last "/" and then anything after the "_", you easily isolate the desired value.
NOTE: I don't know if the Ruby regex syntax is any different than this, so your mileage may vary.
--
EDIT: It looks like in Ruby, you might not need/want the ^ or $ at the beginning and end.
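The thread is about Ruby, but the generic pattern can be tried quickly from a shell with sed, which makes the greedy behaviour of .*/ easy to see. This is only an illustration of the same regex in sed's basic syntax; the helper name extract_key is hypothetical.

```shell
url='http://media.example.com.s3.amazonaws.com/videos/1/123ab564we65a16a5w_web.m4v'

# extract_key is a hypothetical helper; | is used as the s delimiter so the slash needs no escaping.
extract_key() {
  printf '%s\n' "$1" | sed 's|^.*/\(.*\)_.*$|\1|'
}

extract_key "$url"   # prints 123ab564we65a16a5w
```

Since both .* parts are greedy, a key that itself contained an underscore would be captured up to the last underscore in the file name.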

Bash for truncation

I have to make changes to a document where there are two columns separated by a tab (\t) and each record is separated by a newline (\n). The entries in the document look like this:
/something/random/2345.txt
My aim is to remove the rest of the string and keep just the number, 2345 in this case. I used
sed 's/something/random//g' file.csv
but I do not know how to escape the / because sed syntax uses / too. Also, not all records have the same words, so I would be looking for a regex of the type
/*/*.*
But each entry has a number as a part of the record and I would like to extract that.
Also there are a few records which do not contain any number, I would like to delete those records along with the corresponding entry in the next column for that record.
The file is in CSV format.
You can escape the forward slash with a backslash, or you can use a different character than forward slash to delimit your expression. Observe:
echo foobar | sed sIfooIcrowI
> crowbar
Of course, you probably shouldn't use an alphabetic character for the delimiter. I'm just using it here to make the point that pretty much any normal character can be substituted for the slash.
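Applied to the path from the question, both options look roughly like this. It is a minimal sketch: the helper names are hypothetical, and it only strips the fixed prefix, leaving 2345.txt rather than the bare number.

```shell
# Option 1: escape each forward slash with a backslash
strip_prefix_escaped() {
  printf '%s\n' "$1" | sed 's/\/something\/random\///'
}

# Option 2: use | as the delimiter so the slashes need no escaping
strip_prefix_pipe() {
  printf '%s\n' "$1" | sed 's|/something/random/||'
}

strip_prefix_escaped '/something/random/2345.txt'   # prints 2345.txt
strip_prefix_pipe '/something/random/2345.txt'      # prints 2345.txt
```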
You could strip the non-digit characters from the beginning of each entry and keep only the digits that follow, dropping the rest of the first column:
sed 's/^[^0-9]*\([0-9][0-9]*\)[^[:space:]]*/\1/'
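The question also asks to drop records that contain no number. One way to sketch that, assuming the number is the first run of digits in the first tab-separated column and that a record without digits there should be discarded along with its second column, is to let awk handle the columns. The helper name extract_numbers and the sample rows are invented.

```shell
# extract_numbers is a hypothetical helper; it reads tab-separated records on stdin.
extract_numbers() {
  awk -F '\t' -v OFS='\t' '
    # match() sets RSTART and RLENGTH for the first run of digits in column 1
    match($1, /[0-9]+/) {
      $1 = substr($1, RSTART, RLENGTH)
      print
    }
    # records with no digit in column 1 fall through and are not printed
  '
}

# Prints the first record as 2345<TAB>foo and drops the second
printf '/something/random/2345.txt\tfoo\n/no/number/here.txt\tbar\n' | extract_numbers
```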

Resources