How to add a character to a number using "sed"? - bash

I have an output file with a bug. I can't correct the source code, as it is not mine.
The output is a table of numbers (14 columns, hundreds of rows) that look like "1.5398E+02" (format X.XXXXE-YY). The bug is that if the exponent has three digits, the number looks like "6.4492-137" (the "E" is missing). I have been told to run "sed" with something like:
sed ' s/([0-9]\.[0-9]{4})-([0-9]{3})/\1E-\2/' model.txt > modelCorrect.txt
or maybe
sed ' s/([0-9]\.[0-9]{4})-([0-9]{3})/([0-9]\.[0-9]{4})-E([0-9]{3})' model.txt > modelCorrect.txt
But it doesn't work (sed: -e expression #1, char 39: invalid reference \2 on `s' command's RHS, or sed: -e expression #1, char 63: unterminated `s' command). What am I doing wrong?

What am I doing wrong?
By default, sed uses basic regular expressions (BREs), in which groups are written as \(...\). The unescaped parentheses in your command match literal characters, so there is no group and you cannot use the reference \2.
To use the "normal" extended regular expressions (EREs), use sed -E.
Other than that, you forgot to allow the + in your regex. And since your file has 14 columns, you may want to replace all matches in each line (using s/.../.../g) instead of just the first one.
Also, it probably is safer to match numbers with an arbitrary number of places. Why invest so much work into checking that the number has the format 1.4444-333 if you could just allow all numbers?
sed -E 's/([0-9])([-+][0-9]+)/\1E\2/g' <<< "6.4492-137 1.23+4"
prints 6.4492E-137 1.23E+4.
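As a rough sketch of the BRE/ERE difference described above, the two commands below apply the same fix in both dialects. The sample row is made up for illustration, and the pattern is narrowed to three-digit exponents on the assumption that only those lose their "E".

```shell
# Made-up sample row: two well-formed values and two with the "E" missing.
sample='1.5398E+02 6.4492-137 2.0000E-05 3.1415+120'

# BRE form (no -E): groups are \( \) and the interval is \{3\}.
fixed_bre=$(printf '%s\n' "$sample" |
  sed 's/\([0-9]\)\([-+][0-9]\{3\}\)/\1E\2/g')

# ERE form (-E): plain parentheses and braces.
fixed_ere=$(printf '%s\n' "$sample" |
  sed -E 's/([0-9])([-+][0-9]{3})/\1E\2/g')

printf '%s\n' "$fixed_bre"
# 1.5398E+02 6.4492E-137 2.0000E-05 3.1415E+120
```

Well-formed values are left alone because the sign must directly follow a digit, and in "1.5398E+02" it follows the "E".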

Related

Variable contains backslash not working in sed bash

Our log pattern is in the following format dd/Mon/year:time(22/Feb/2018:13).
Goal is we want to find logs between 2 different times. We used sed to get the log between 2 times.
sed -n '/22\/Feb\/2018:13:/,/22\/Feb\/2018:16/p' /var/log/apache2/domlogs/access.log
The above command is working manually. We created a two variables called LAST and NOW in the script and assigned the date variables as mentioned below.
NOW="22/Feb/2018:16"
LAST="22/Feb/2018:13"
We have used the following sed commands to print the same output however it doesn't help us to print the same output.
sed -n '/'"$LAST"'/'"$NOW"'/p' /var/log/apache2/domlogs/access.log
The command gives the below error
sed: -e expression #1, char 5: unknown command: `F'
If we use normal string for LAST and NOW then above command works fine. Only problem is if the variable contains / in the input
You can choose a different delimiter for an address regular expression in sed by preceding the opening delimiter with a backslash, e.g. \!. The following command should work:
sed -n '\!'"$LAST"'!,\!'"$NOW"'!p' /var/log/apache2/domlogs/access.log
If ! is expected to show up in your date time format, you can use your judgment to choose another one.
It is worth reading a good sed tutorial before working with sed, e.g.: http://www.grymoire.com/Unix/sed.html
I think you originally wanted a range search in sed but missed part of its syntax: the two addresses must be separated by a comma. I fixed that above, and it's tested.
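The custom-delimiter approach can be tried on a throwaway file, as in the sketch below. The log lines are invented, and # is used as the delimiter on the assumption that it never appears in the timestamps; inside double quotes it also avoids the history expansion that ! can trigger in an interactive bash.

```shell
# Invented access-log lines, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
1.2.3.4 - - [22/Feb/2018:12:59:59 +0000] "GET /a HTTP/1.1" 200
1.2.3.4 - - [22/Feb/2018:13:00:01 +0000] "GET /b HTTP/1.1" 200
1.2.3.4 - - [22/Feb/2018:15:30:00 +0000] "GET /c HTTP/1.1" 200
1.2.3.4 - - [22/Feb/2018:16:00:00 +0000] "GET /d HTTP/1.1" 200
1.2.3.4 - - [22/Feb/2018:17:00:00 +0000] "GET /e HTTP/1.1" 200
EOF

LAST="22/Feb/2018:13"
NOW="22/Feb/2018:16"

# \# opens an address that uses # as its delimiter, so the slashes
# inside the variables need no escaping. The comma makes it a range.
selected=$(sed -n "\#$LAST#,\#$NOW#p" "$log")
printf '%s\n' "$selected"

rm -f "$log"
```

The range runs from the first line matching $LAST through the next line matching $NOW, so this prints the /b, /c and /d lines. The variables are still treated as regular expressions, so any regex metacharacters in them would need escaping.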

Removing Time Stamp With Sed

I have found a couple of examples online, but I could not find a combination that would work, as the syntax for sed is very tricky. If you could kindly point me in the right direction, I would be highly grateful.
Here is the timestamp that I would like to remove from the file:
00:02:06.580 --> 00:02:07.380
Here is what I already tried:
cat sometextfile.txt | sed -r 's /\[0-9]{2}:[0-9]{2}:[0-9]{2}\/ g'
But I keep getting an error: sed: -e expression #1, char 34: unterminated `s' command
Thanks!!
The syntax is s/ what to replace / what to replace it with /. You are missing the second part. Even if you want to replace it with nothing, you need all three slashes; just don't put anything between the last two. As it is, you have only one slash, because the second one is quoted with \, meaning sed will treat it as part of the expression and look for a literal / in the input.
The beginning of your regex is also wrong. \[0-9]{2} matches the literal string [0-9 followed by exactly two right brackets (]]). Remove the initial backslash (\) if you want to match "exactly two digits".
Also, you never need to do cat filename |; you can just do < filename. In this specific case, sed takes a filename parameter, so you can do without the <, too.
So it should be something like this;
sed -E 's/[0-9]{2}:[0-9]{2}:[0-9]{2}//' sometextfile.txt
(I used -E because it's more portable than -r, which is a GNUism.)
You don't need the g on the end unless there's more than one timestamp per line.
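The command above only strips the HH:MM:SS part. If the goal is to drop the whole "start --> end" line, a sketch along these lines might help. It assumes each timestamp pair sits on its own line with millisecond fractions, as in the question's example; the surrounding lines are invented.

```shell
# One timestamp with milliseconds, e.g. 00:02:06.580
ts='[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}'

# Delete every line that consists solely of "start --> end".
cleaned=$(printf '%s\n' '1' '00:02:06.580 --> 00:02:07.380' 'Hello there' |
  sed -E "/^$ts --> $ts\$/d")

printf '%s\n' "$cleaned"
# 1
# Hello there
```

To blank the line instead of deleting it, the same pattern can be used in a substitution with an empty replacement.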

Use sed to count periods, commas, and numbers?

I have a file that looks like this:
19.217.179.33,175.176.12.8
253.149.205.57,174.210.221.195
222.118.178.218,255.99.100.202
241.55.199.243,167.98.204.104
38.224.198.117,21.11.184.68
Each line is 2 IP addresses, separated by a comma. So, each line should meet these requirements:
Has 1 comma.
Has 6 periods.
Has ONLY numbers, commas, and periods.
If a line is missing a period, has more/less than one commas, has a letter, is blank, or anything like that - it isn't correct. Basically I just want to use sed or something similar to loop through each line in the file and make sure each of them meets the above requirements.
Is this something that can be done with sed? I know you can use it to delete lines that do/don't have matching strings, but I wasn't sure about counting specific characters or verifying that a line only has certain characters.
Any help would be greatly appreciated. Thanks!
I think grep is a better tool for this. You just want to ensure that each line matches a particular regex, so invert the grep with -v and label the input invalid if any line gets output. Something like:
grep -qvE '^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$' input || echo input is valid
You can simplify that a bit:
IP='([0-9]{1,3}\.){3}[0-9]{1,3}'
grep -qvE "^$IP,$IP$" input || echo input is valid
Or if you are more interested in invalid data:
grep -qvE "^$IP,$IP$" input && echo input is invalid
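To see which lines are invalid rather than just whether any exist, dropping -q and adding -n is one option. The sketch below uses a temporary file with made-up contents: one good line and two bad ones.

```shell
IP='([0-9]{1,3}\.){3}[0-9]{1,3}'

# Made-up input: line 2 lacks the comma, line 3 contains letters.
input=$(mktemp)
printf '%s\n' \
  '19.217.179.33,175.176.12.8' \
  '253.149.205.57174.210.221.195' \
  'abc' > "$input"

# -v keeps the non-matching lines, -n prefixes each with its line number.
bad=$(grep -nvE "^$IP,$IP\$" "$input")
printf '%s\n' "$bad"
# 2:253.149.205.57174.210.221.195
# 3:abc

rm -f "$input"
```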
What I'd do is to think up a regular expression that fits the 'proper' lines, and omits them from printing. Like this:
sed -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/d' file
Everything that remains is a wrong line.
Here's the recipe in more detail:
[0-9]{1,3} between one and three digits
\. literal period (just the period is a wildcard and matches any character)
(...){3} three repetitions of something, so together
([0-9]{1,3}\.){3}[0-9]{1,3} makes up something that looks like an IP address. (Though note that it doesn't enforce the <256 rule, so 999.999.999.999 matches.)
/^ ... $/ the match needs to start at the beginning of the line and run until its end.
'/ ... /d' print everything except lines that match what's inside the two slashes
-r is needed to recognise the {1,3} syntax.
This will find and print the lines that are wrong. If you want to delete the wrong lines, you can easily invert this:
sed -i.bak -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/p' file
-i.bak means keep a backup, but overwrite the input file
-n means don't output anything unless expressly directed to output, and
/ ... /p output all the lines that match this regex.
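The pattern above accepts octets such as 999. If the <256 rule matters, one possible tightening is sketched below. The names OCTET, IP4 and is_valid_pair are made up for illustration, and this version also rejects octets with leading zeros such as "01", which may or may not be wanted.

```shell
# One octet in the range 0-255, without leading zeros.
OCTET='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
IP4="($OCTET\.){3}$OCTET"

# Hypothetical helper: succeeds if the argument is "ip,ip" with valid octets.
is_valid_pair() {
  printf '%s\n' "$1" | grep -qE "^$IP4,$IP4\$"
}

is_valid_pair '222.118.178.218,255.99.100.202' && echo "first pair is valid"
is_valid_pair '999.999.999.999,1.1.1.1' || echo "second pair is invalid"
```

The same pattern string can be dropped into the sed commands above in place of the simpler one, since both use extended regular expressions.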
If you only want a message saying whether the file contents are correct, you can use this command:
sed -n -r '/^([0-9]{1,3}\.){3}[0-9]{1,3},([0-9]{1,3}\.){3}[0-9]{1,3}$/!{a \
FILE IS INCORRECT
;q;};$aFILE IS OK'
It's a modified version of chw21's answer, but it prints only an informational message:
FILE IS INCORRECT, or
FILE IS OK.

unterminated address regex while using sed

I am trying to use the sed command to find and print the number that appears between "\MP2=" and "\" in a portion of a line that appears like this in a large .log file
\MP2=-193.0977448\
I am using the command below and getting the following error:
sed "/\MP2=/,/\/p" input.log
sed: -e expression #1, char 12: unterminated address regex
Advice on how to alter this would be greatly appreciated!
Superficially, you just need to double up the backslashes (and it's generally best to use single quotes around the sed program):
sed '/\\MP2=/,/\\/p' input.log
Why? The double-backslash is necessary to tell sed to look for one backslash. The shell also interprets backslashes inside double quoted strings, which complicates things (you'd need to write 4 backslashes to ensure sed sees 2 and interprets it as 'look for 1 backslash') — using single quoted strings avoids that problem.
However, the /pat1/,/pat2/ notation refers to two separate lines. It looks like you really want:
sed -n '/\\MP2=.*\\/p' input.log
The -n suppresses the default printing (probably a good idea on the first alternative too), and the pattern looks for a single line containing \MP2= followed eventually by a backslash.
If you want to print just the number (as the question says), then you need to work a little harder. You need to match everything on the line, but capture just the 'number' and remove everything except the number before printing what's left (which is just the number):
sed -n '/.*\\MP2=\([^\]*\)\\.*/ s//\1/p' input.log
You don't need the double backslash in the [^\] (negated) character class, though it does no harm.
If the starting and ending pattern are on the same line, you need a substitution. The range expression /r1/,/r2/ is true from (an entire) line which matches r1, through to the next entire line which matches r2.
You want this instead;
sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' file
This extracts just the match, by replacing the entire line with just the match (the escaped parentheses create a group which you can refer back to in the substitution; this is called a back reference. Some sed dialects don't want backslashes before the grouping parentheses.)
awk is a better tool for this:
awk -F= '$1=="MP2" {print $2}' RS='\' input.log
Set the record separator to \ and the field separator to '=', and it's pretty trivial.
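For a quick check of the sed substitution approach, it can be run against a single sample string. The line below is a made-up excerpt and not taken from a real log.

```shell
# Made-up excerpt; single quotes keep the backslashes literal.
line='HF=-192.5211\MP2=-193.0977448\RMSD=1.1e-09'

# Replace the whole line with the text captured between "\MP2=" and
# the next backslash, and print only if the substitution happened.
mp2=$(printf '%s\n' "$line" | sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p')

printf '%s\n' "$mp2"
# -193.0977448
```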

grep: keep lines by number in specific column

I know how to do it with awk. For example, to keep the lines that contain the number 3 in the second column: $ awk '$2 == 3'
But how to do the same with only grep?
What about the first column?
Grep is not great for this, awk is better. But assuming your columns are separated by spaces, then you want
grep -E '^[^ ]+ +3( |$)'
Explanation: find something that has a start of line, followed by one or more non-space characters (first column), then one or more space characters (column separator), then the number 3, then either a space (because there's another column) or end of line (if there's no other column).
(Updated to fix syntax after testing.)
Here is the longer explanation for my mysterious command grep -P '^[^\t]*\t3\t' your_file from the comments:
I assumed that the column delimiter is a tab. grep without -P does not understand \t, so you would have to get a literal tab character into the pattern some other way. The -P option makes it possible to just write \t. If your delimiter is ; instead, for example, you could replace each \t with ; and you don't need the -P option.
Having said that, let's explain the idea behind the regular expression. You said you want to match a 3 in the second column:
^ means: at the beginning of the line
[^\t]* means: zero or more (*) occurrences of something that is not a tab ([^\t]; here the ^ means "not a")
followed by tab
followed by 3
followed by tab
Now we have effectively expressed the idea that we need a 3 as the content of the second column (\t3\t) and we are not interested in the precise content of the first column. The ^[^\t]*\t is only necessary to express the idea "what follows is in the second column".
If you want to match something in the fourth column, you could use this to "skip" the first three column and match a 4 in the fourth column:
^([^\t]*\t){3}4. (Note the parentheses and the {3}.)
As you can see, there are many details to get right, and awk is much more elegant and easier.
You can read up on this in the grep documentation, and you will then need to study regular expressions.
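If -P is not available, one way to get a literal tab into the pattern is to store it in a shell variable first. The sketch below uses grep -E on invented tab-separated data, and it also accepts end of line after the 3, so a line where the second column is the last one still matches.

```shell
# A literal tab character, produced portably with printf.
tab=$(printf '\t')

# Invented tab-separated rows.
data=$(printf 'a\t3\tx\nb\t13\ty\nc\t3\n3\t4\tz\n')

# First column: any run of non-tabs. Second column: exactly "3",
# followed by either another tab or the end of the line.
second_is_3=$(printf '%s\n' "$data" |
  grep -E "^[^${tab}]*${tab}3(${tab}|\$)")

printf '%s\n' "$second_is_3"
```

This keeps the "a" and "c" rows. The "b" row is skipped because its second column is 13, and the last row is skipped because its 3 is in the first column.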
