How to use sed to delete the last several characters of a pattern - bash

I've gone through all of the threads but still cannot find the answer.
For example.
I have a timestamp of format: yyyy-mm-dd hh:mm:ss.xxx
where xxx indicates the milliseconds.
I want to get rid of the xxx part. Note that the timestamp is not at a fixed position, so I cannot treat it as part of the end or the start of the line. (A Unix command or a bash script is fine.)
The method I can think of is sed, but all I can do is match the pattern. I don't know how to process the matched text, because a pattern seems to only locate lines rather than the text itself. So, in general: how do I use sed to delete the last several characters of a certain pattern?
Thanks for reading.
Note that xxx can be 0-999, so it can be 1, 2 or 3 digits. Sample input:
asfd,asasfsf,afas,2017-10-20 13:22:22.0,333,222,0.002
nyh,nyhny,nhy,2 23 4 23 32:23:14.czxv,2017-10-20 13:22:22.234,12.0,234.22
nyh,nyhny,nhy,2017-10-20 13:22:22.234,12.0
wn,rrwn,daff,2017-10-20 13:22:32.543,12,32
What I expect is:
asfd,asasfsf,afas,2017-10-20 13:22:22,333,222,0.002
nyh,nyhny,nhy,2 23 4 23 32:23:14.czxv,2017-10-20 13:22:22,12.0,234.22
nyh,nyhny,nhy,2017-10-20 13:22:22,12.0
wn,rrwn,daff,2017-10-20 13:22:32,12,32

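Based on the OP's sample input, here is a new solution. It relies on the time part of the timestamp being the 2nd whitespace-separated field. That holds for every sample line except the second one, which has extra spaces before the timestamp, so its milliseconds are not removed: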
awk '{sub(/\.[^,]*/,"",$2)} 1' Input_file
Explanation of the awk code:
awk '{
  sub(/\.[^,]*/,"",$2) ## sub(regex, replacement, target) is a built-in awk function. The regex runs from a dot up to (not including) the first comma, and it is replaced with nothing in the 2nd field, which holds the time because awk splits on whitespace by default.
}
1 ## awk programs are condition{action} pairs. 1 is an always-true condition with no action, so the default action (print the line) happens.
' Input_file
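A minimal sketch of a whole-line variant for any POSIX awk (no gensub, no interval expressions), so the second sample line is handled too. It uses the standard match() builtin with RSTART/RLENGTH and keeps the first 19 characters of each match (yyyy-mm-dd hh:mm:ss); the helper name strip_ms_awk is hypothetical.

```shell
#!/bin/sh
# strip_ms_awk is a hypothetical helper name.
# Removes ".xxx" after every "yyyy-mm-dd hh:mm:ss" timestamp anywhere in a line.
strip_ms_awk() {
  awk '{
    # Loop so several timestamps on one line are all handled;
    # each pass removes at least two characters, so the loop always ends.
    while (match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+/)) {
      # Keep the 19-character timestamp, drop the dot and the digits after it.
      $0 = substr($0, 1, RSTART + 18) substr($0, RSTART + RLENGTH)
    }
  } 1' "$@"
}
```

Usage: strip_ms_awk Input_file. Because $0 is assigned and printed as a whole, the spacing of the line is kept as-is instead of being rebuilt from fields.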

This might work for you (GNU sed):
sed 's/\(....-..-.. ..:..:..\)\.[0-9]\{1,3\}/\1/g' file
This is very lazy but should work in most cases. It matches on the timestamp separators and then removes the dot plus the 1 to 3 millisecond digits at the end. If you want, you can be more specific, e.g.:
sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\)\.[0-9]\{1,3\}/\1/g' file
Using the -r option (extended regular expressions; -E also works, including on BSD/macOS sed) removes the toothpick mess:
sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})\.[0-9]{1,3}/\1/g' file
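To check the extended-regex form against all four sample lines, here is a minimal sketch that wraps it in a hypothetical strip_ms function. It uses -E, which both GNU sed and BSD/macOS sed accept, instead of the GNU-only -r.

```shell
#!/bin/sh
# strip_ms is a hypothetical helper name: drop the dot and the 1-3
# millisecond digits after every yyyy-mm-dd hh:mm:ss timestamp.
strip_ms() {
  sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})\.[0-9]{1,3}/\1/g' "$@"
}

# Sample lines from the question
strip_ms <<'EOF'
asfd,asasfsf,afas,2017-10-20 13:22:22.0,333,222,0.002
nyh,nyhny,nhy,2 23 4 23 32:23:14.czxv,2017-10-20 13:22:22.234,12.0,234.22
nyh,nyhny,nhy,2017-10-20 13:22:22.234,12.0
wn,rrwn,daff,2017-10-20 13:22:32.543,12,32
EOF
```

The {1,3} interval is what makes the 1-digit value in the first sample line (13:22:22.0) come out right.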

Related

Adding a new line to a text file after 5 occurrences of a comma in Bash

I have a text file that is basically one giant Excel sheet stored on a single line. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change every third comma into a newline so that I get
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and as always thank you in advance for all the help!
With GNU sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, is "zero or more characters other than a comma, followed by a comma"; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed, which makes sed use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, the default on Mac OS X. (Thanks to @EdMorton for the note.)
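The "one less than the number of fields" rule can be computed by the shell. A minimal sketch, assuming GNU sed (for \n in the replacement) and a hypothetical helper name split_every:

```shell
#!/bin/sh
# split_every N [FILE...]: hypothetical helper; ends a line after every N
# comma-separated fields (GNU sed, because of \n in the replacement).
split_every() {
  k=$(( $1 - 1 ))   # repeat count for the "field followed by a comma" group
  shift
  # Double quotes so $k expands; \1 and \n reach sed unchanged.
  sed -E "s/(([^,]*,){$k}[^,]*),/\1\n/g" "$@"
}

echo 'Name,Age,Year,Michael,27,2018,Carl,19,2018' | split_every 3
```

With N=3 the script sed sees is exactly the {2} version from the answer.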
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
The ,[1-9][0-9]\{3\} part matches the YYYY year; \{3\} saves writing [0-9] three more times.
PS: This solution only ever returns four digits for the year: if the data has a typo such as 19811, you still get 1981.
You are looking for 3 fragments, each without a comma and separated by a comma.
The last fields can cause problems (they do not end with a comma, and there may be only two of them).
The following command gets close, although every line except the last keeps its trailing comma:
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
Replace the third comma with a newline, print up to that newline (P), delete it (D) and repeat on the rest of the line.
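The same loop with the field count as a parameter; a minimal sketch assuming GNU sed and a hypothetical helper name nl_every.

```shell
#!/bin/sh
# nl_every N [FILE...]: hypothetical helper built on the s///N;P;D loop (GNU sed).
nl_every() {
  n=$1
  shift
  # s/,/\n/N : turn the Nth comma of the pattern space into a newline
  # P        : print up to that newline
  # D        : delete up to that newline and rerun the script on the rest
  sed "s/,/\n/$n;P;D" "$@"
}

echo 'Name,Age,Year,Michael,27,2018,Carl,19,2018' | nl_every 3
```

When the remainder has fewer than N commas, the substitution does nothing, P prints the remainder and D ends the cycle normally.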

How to detect some pattern with grep -f on a file in terminal, and extract those lines without the pattern

I'm on the Mac terminal.
I have a txt file, allofthem.txt, with one column of 9 IDs, where every ID starts with "rs":
rs382216
rs11168036
rs9296559
rs9349407
rs10948363
rs9271192
rs11771145
rs11767557
rs11
I also have another txt file, useful.txt, with the IDs that were useful in an analysis I did. It looks the same, one column of IDs, but with fewer IDs, only 4:
rs9349407
rs10948363
rs9271192
rs11
Problem: I want to generate a new txt file with the non-useful ones (the ones that appear in allofthem.txt but not in useful.txt).
I want to do the inverse of:
grep -f useful.txt allofthem.txt
I want a systematic way of removing all the IDs in useful.txt and getting a file with the remaining ones. Maybe with awk or sed, but I can't see it. Can you help me, please? Thanks in advance!
Desired output:
rs382216
rs11168036
rs9296559
rs11771145
rs11767557
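The -v option inverts the match for you:
grep -vxf useful.txt allofthem.txt > remaining.txt
The -x option only accepts matches that cover the whole line in allofthem.txt, not part of it. That matters here: without -x, rs11 would also match rs11168036, rs11771145 and rs11767557.
As @hek2mgl rightly pointed out, you need -F if you want to treat the content of useful.txt as fixed strings rather than regular expressions: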
grep -vxFf useful.txt allofthem.txt > remaining.txt
Make sure your files have no leading or trailing whitespace; it would affect the results.
I recommend using awk:
awk 'FNR==NR{patterns[$0];next} !($0 in patterns)' useful.txt allofthem.txt
Explanation:
FNR==NR is true only while awk reads the first file, useful.txt. For every line of useful.txt we create a key in patterns; next stops further processing of that line.
!($0 in patterns) therefore only runs on the lines of allofthem.txt. It checks whether the line is a key in patterns; if it is not, the condition is true and awk prints the line.
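Both approaches can be checked against the sample data, where rs11 is a prefix of three other IDs. A minimal sketch with hypothetical helper names; each takes the useful-ID file first and the full list second.

```shell
#!/bin/sh
# not_useful_grep USEFUL ALL (hypothetical name):
# -v invert, -x whole-line matches only, -F fixed strings, -f patterns from USEFUL
not_useful_grep() { grep -vxFf "$1" "$2"; }

# not_useful_awk USEFUL ALL (hypothetical name): load USEFUL into an array,
# then print the lines of ALL that are NOT keys in that array
not_useful_awk() { awk 'FNR==NR { seen[$0]; next } !($0 in seen)' "$1" "$2"; }
```

Usage: not_useful_grep useful.txt allofthem.txt > remaining.txt. Both print the 5 remaining IDs in their original order.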

Delete all lines before last case of a string

How would I go about deleting all the lines before the last occurrence of a string? For example, if I had a file that looked like this:
Icecream is good
And
Chocolate is good
And
They have lots of sugar
If I want all lines after and including the last occurrence of "And" what's the cleanest way to do this? Specifically, I want
And
They have lots of sugar
I was doing sed -n -E -e '/And/,$p' file, but that prints everything from the first occurrence onward.
This might work for you (GNU sed):
sed -n '/And/h;//!H;$!d;x;//p' file
Replace anything in the hold space with the line containing And. Append all other lines to the hold space. At the end of the file, swap the pattern space and the hold space and print the result as long as it matches the required string And.
I know that you asked for sed and that Potong provided a good sed solution. But, for comparison, here is an awk solution:
$ awk 's{s=s"\n"$0;} /And/{s=$0;} END{print s;}' file
And
They have lots of sugar
How it works:
s{s=s"\n"$0;}
If the variable s is not empty, then add to it the current line, $0.
/And/{s=$0;}
If the current line contains And, then set s to the current line, $0.
END{print s;}
After we have reached the end of the file, print s.
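The awk approach generalizes to a pattern passed in from the shell. A minimal sketch with a hypothetical helper name from_last; it tests s != "" rather than s, so a saved line such as 0 is not treated as false, and it prints nothing when the pattern never matches. Note that awk -v interprets backslash escapes in the pattern.

```shell
#!/bin/sh
# from_last PATTERN [FILE...]: hypothetical helper; prints from the last line
# matching the awk ERE PATTERN through the end of the input.
from_last() {
  pat=$1
  shift
  awk -v pat="$pat" '
    s != ""  { s = s "\n" $0 }   # already collecting: append this line
    $0 ~ pat { s = $0 }          # a newer match: restart from this line
    END      { if (s != "") print s }
  ' "$@"
}

printf '%s\n' 'Icecream is good' And 'Chocolate is good' And 'They have lots of sugar' |
  from_last And
```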
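Or reverse the file with tac (GNU coreutils; tail -r on macOS/BSD), print up to and including the first And it meets (the last one in the original order), then reverse back:
$ tac file | awk '!f; /And/{f=1}' | tac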
And
They have lots of sugar
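Or read the file twice: the first pass records the line number of the last And, the second pass prints from that line on:
$ awk 'NR==FNR{if(/And/)nr=NR;next} FNR>=nr' file file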
And
They have lots of sugar

How to pull a value from between 2 strings which occur several times in a file

I am trying to pull the value from between 2 strings and print each result on its own line. I am then hoping to combine it with another value from the same document, pulled the same way. The problem is there are NO line breaks in this file and it is quite large. Here is an example of the file.
<ID>47</ID><DATACENTER_ID>36</DATACENTER_ID><DNS_NAME>myhost.domain.local</DNS_NAME> <IP_ADDRESS>10.0.0.1</IP_ADDRESS><ID>60</ID><DATACENTER_ID>36</DATACENTER_ID><DNS_NAME>yourhost.domain.local</DNS_NAME><IP_ADDRESS>10.0.0.2</IP_ADDRESS>
My end result would ideally look something like this.
ID-----DNS_NAME
47-----myhost.domain.local
60-----yourhost.domain.local
My closest attempts so far have been creating variables with grep, but I can't seem to format them into a table. I'm also very new to scripting, so forgive my ignorance.
If your grep supports -P (--perl-regexp), you can use the regex below.
$ grep -oP '<ID>\K[^<>]*(?=</ID>)|<DNS_NAME>\K[^<>]*(?=</DNS_NAME>)' file | sed 'N;s/\n/-----/g'
47-----myhost.domain.local
60-----yourhost.domain.local
\K discards the previously matched characters from the output.
(?=...) is a positive lookahead assertion: it requires that text to follow the match, without consuming any characters.
Here is a GNU awk solution (GNU awk because RS is more than one character) to get your data:
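The question also asked for an ID-----DNS_NAME header line. A minimal sketch that prints it before the grep -P pipeline; the helper name id_dns is hypothetical, and it assumes each <ID> is followed by exactly one <DNS_NAME> before the next <ID>, since sed 'N' simply pairs consecutive results.

```shell
#!/bin/sh
# id_dns FILE: hypothetical helper; header plus one ID-----DNS_NAME pair per
# host. Needs a grep with -P (e.g. GNU grep).
id_dns() {
  echo 'ID-----DNS_NAME'
  # grep -o prints each ID and DNS_NAME value on its own line;
  # sed N joins every two lines with "-----"
  grep -oP '<ID>\K[^<>]*(?=</ID>)|<DNS_NAME>\K[^<>]*(?=</DNS_NAME>)' "$1" |
    sed 'N;s/\n/-----/'
}
```

Usage: id_dns file > table.txt.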
awk -v RS="<ID>" -F"<|>" 'NR>1 {print $1"-----"$9}' file
47-----myhost.domain.local
60-----yourhost.domain.local
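The awk version reads DNS_NAME from field $9, which depends on the exact tag order. A minimal sketch that looks the tag up by name instead and prints the header; the helper name id_dns_awk is hypothetical, and it needs an awk that accepts a multi-character RS (e.g. GNU awk).

```shell
#!/bin/sh
# id_dns_awk FILE: hypothetical helper; prints a header and ID-----DNS_NAME
# pairs, finding the DNS_NAME tag by name rather than by field position.
id_dns_awk() {
  awk -v RS='<ID>' -F'[<>]' '
    BEGIN { print "ID-----DNS_NAME" }
    NR > 1 {                      # record 1 is whatever precedes the first <ID>
      for (i = 2; i < NF; i++)    # $1 is the ID value itself
        if ($i == "DNS_NAME") { print $1 "-----" $(i + 1); break }
    }' "$@"
}
```

Usage: id_dns_awk file. Extra or reordered tags between <ID> and <DNS_NAME> no longer shift the output.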

Awk/Sed - how to print selection between two patterns?

Based on a reference from catonmat.net, I thought I could get the selection between two patterns using the following:
Source Text (one line): 6 June 2013 08.32.435 UTF+8 Report /content/folder[#name='....' Failure ....
The important part here is the report path, so I am using:
awk '/content\/folder\[#name=/,/Failure/' source.csv
I got the entire matched line, instead of only the content path between the two matches.
I have also tried to:
sed -n '/content\/folder\[#name/,/Failure/ {/content\/folder\[#name\|Failure/!p}' source.csv
Still returning the entire line...
What was wrong?
Try this:
sed -n '\|content/folder\[#name.*Failure|s|.*content/folder\[#name\(.*\)Failure.*|\1|p' source.csv
/re1/,/re2/ selects a range of lines, not a range of text within a line. Since content/folder and Failure are on the same line, you don't need a range, just a regex that matches a line containing both. Then use s/// to extract the part between them.
sed 's,.*/content/folder\[#name=\(.*\)Failure.*,\1,' source.csv
grep -Po '(?<=#name=).*(?=Failure)' source.csv
