I have a problem similar to this one, but I can't adapt it to my case.
Say I have a file with many of these lines:
f 1/1/519 2/2/2 3/3/520
f 287/4/521 1/5/519 3/6/520
f 5/7/522 1/8/523 287/9/524
I want to replace the content between the two slashes (number/anyNumber/number) of each block.
I would like to have the following result:
f 1//519 2//2 3//520
f 287//521 1//519 3//520
f 5//522 1//523 287//524
What is the correct sed (or anything else) command?
Using MacOS.
$ sed 's:/[^/]*/://:g' file
f 1//519 2//2 3//520
f 287//521 1//519 3//520
f 5//522 1//523 287//524
Easy enough making this pattern-based in sed:
sed 's#/[0-9]*/#//#g' input.txt
This matches any stretch of zero or more digits between two slashes, and replaces the whole bundle with two slashes.
In awk, you might do the same thing this way:
awk '{gsub(/\/[0-9]*\//,"//")} 1' input.txt
The gsub() function is documented on the awk man page. The 1 at the end is a shortcut for "print this line". But you might alternately treat the fields as actual fields:
awk '{for (i=2;i<=NF;i++) {split($i,a,"/"); $i=sprintf("%s//%s",a[1],a[3])} } 1' input.txt
This is more technically "right" in that it treats fields as fields, then treats subfields as subfields. But it'll undoubtedly be slower than the other options, and will also rewrite lines with OFS as field separators.
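To make the OFS point concrete, here is a small sketch comparing the two awk approaches on a line with irregular spacing. The sample line and the function names by_pattern and by_field are made up for illustration; the awk programs themselves are the ones shown above.

```shell
# Hypothetical helpers wrapping the two awk commands from this answer.
by_pattern() { awk '{gsub(/\/[0-9]*\//,"//")} 1'; }
by_field() { awk '{for (i=2;i<=NF;i++) {split($i,a,"/"); $i=sprintf("%s//%s",a[1],a[3])}} 1'; }

# Made-up input with runs of several spaces between fields.
line='f   1/1/519    2/2/2'

# gsub on $0 touches only the matched text, so the spacing survives.
printf '%s\n' "$line" | by_pattern

# Assigning to $i makes awk rebuild the record, joining the fields with OFS
# (a single space by default), so the runs of spaces collapse.
printf '%s\n' "$line" | by_field
```

If the original spacing matters, the pattern-based form is the safer choice.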
Lastly, you could use bash alone, without awk or sed:
shopt -s extglob
while read -r; do echo "${REPLY//\/+([0-9])\////}"; done < input.txt
This works in bash version 3 (since you're using macOS). It reads each line of input then uses Parameter Expansion to make the same translation that was done in the first two options. This solution is likely slower than the others. The extglob shell option is used to make more advanced patterns possible.
Could you please try the following.
awk '{for(i=1;i<=NF;i++){gsub(/\/.*\//,"//",$i)}} 1' Input_file
Output will be as follows.
f 1//519 2//2 3//520
f 287//521 1//519 3//520
f 5//522 1//523 287//524
The answer is rather simple: sed -E 's/([0-9]+\/)[0-9]+(\/[0-9]+)/\1\2/g' file.txt > mod.txt
The -E option selects extended regular expressions, so + and the parentheses need no backslashes; it is accepted by both GNU sed and the BSD sed on macOS.
Putting something in parentheses (()) lets you refer to it later. Here the first capture group remembers the digits before the first slash plus the slash itself, the middle [0-9]+ matches the number between the slashes, and the second capture group remembers the second slash and the digits after it. The replacement is then just the first and second capture groups, which discards everything else.
The g flag makes sed operate on every matching occurrence on the line.
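To see what each capture group holds, a minimal sketch can wrap the groups in square brackets instead of joining them. The helper name show_groups and the sample line are hypothetical, and the sketch assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# Hypothetical helper: print each capture group inside [ ] so it is
# visible which part of the match \1 and \2 refer to.
show_groups() { sed -E 's/([0-9]+\/)[0-9]+(\/[0-9]+)/[\1][\2]/g'; }

# The middle number is matched but belongs to no group, so it disappears.
printf 'f 1/1/519 2/2/2\n' | show_groups
```

Dropping the brackets from the replacement gives back the plain \1\2 form.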
I am doing a find and replace using sed in a bash script. I want to search each file for the words files and no. If both words are present on the same line, replace red with green; otherwise do nothing.
sed -i -e '/files|no s/red/green' $file
But I am unable to do so. I am not receiving any error, and the file doesn't get updated.
What am I doing wrong here, and what is the correct way of achieving my result?
/files|no/ means to match lines with either files or no, it doesn't require both words on the same line.
To match the words in either order, use /files.*no|no.*files/.
sed -i -r -e '/files.*no|no.*files/s/red/green/' "$file"
Notice that you need another / at the end of the pattern, before s, and the s operation requires / at the end of the replacement.
And you need the -r option to make sed use extended regexp; otherwise you have to use \| instead of just |.
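As a quick check of the either-order pattern, here is a sketch that runs it over a few made-up lines. The helper name recolor and the sample text are hypothetical; -E is used in place of -r on the assumption that the same command should also run on BSD/macOS sed, and -i is left out so nothing is overwritten.

```shell
# Hypothetical helper: the address selects lines containing both words,
# in either order; only those lines get the substitution.
recolor() { sed -E '/files.*no|no.*files/s/red/green/'; }

printf '%s\n' \
  'files and no: red' \
  'no red in these files' \
  'files only: red' \
  'notes on files: red' | recolor
```

The last sample line is changed as well, because no also matches inside notes; the pattern tests substrings, not whole words.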
This might work for you (GNU sed):
sed '/files/{/no/s/red/green/}' file
or:
sed '/files/!b;/no/s/red/green/' file
This method allows for easy extension e.g. foo, bar and baz:
sed '/foo/!b;/bar/!b;/baz/!b;s/red/green/' file
or fee, fie, foe and fix:
sed '/fee/!b;/fie/!b;/foe/!b;/fix/!b;s/bacon/cereal/' file
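The chained tests work because b without a label jumps to the end of the script, so the substitution is reached only when every test has matched. A minimal sketch, with a hypothetical helper name and made-up input; each command is given as its own -e expression on the assumption that this spelling is accepted by more sed implementations than the semicolon form:

```shell
# Hypothetical helper: replace red only on lines that contain
# foo, bar and baz, in any order.
all_three() {
  sed -e '/foo/!b' -e '/bar/!b' -e '/baz/!b' -e 's/red/green/'
}

# The first line has all three words; the second lacks baz and is left alone.
printf '%s\n' 'baz bar foo red' 'foo bar red' | all_three
```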
An awk version:
awk '/files/ && /no/ {sub(/red/,"green")} 1' file
/files/ && /no/: files and no have to be on the same line, in any order.
sub(/red/,"green"): replace red with green. Use gsub(/red/,"green") if there are multiple occurrences of red.
1: always true, so the default action runs and the line is printed.
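The difference between sub and gsub shows up as soon as a line holds red more than once. A small sketch with hypothetical helper names and a made-up input line:

```shell
# Hypothetical helpers: identical except for sub versus gsub.
first_only() { awk '/files/ && /no/ {sub(/red/,"green")} 1'; }
every_one() { awk '/files/ && /no/ {gsub(/red/,"green")} 1'; }

printf 'no files: red red\n' | first_only   # only the first red changes
printf 'no files: red red\n' | every_one    # both occurrences change
```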
I have a text file that is basically one giant excel file on one line in a text file. An example would be like this:
Name,Age,Year,Michael,27,2018,Carl,19,2018
I need to change every third occurrence of a comma into a newline so that I get
Name,Age,Year
Michael,27,2018
Carl,19,2018
Please let me know if that is too ambiguous and as always thank you in advance for all the help!
With GNU sed:
sed -E 's/(([^,]*,){2}[^,]*),/\1\n/g'
To change the number of fields per line, change {2} to one less than the number of fields. For example, to change every fifth comma (as in the title of your question), you would use:
sed -E 's/(([^,]*,){4}[^,]*),/\1\n/g'
In the regular expression, [^,]*, is "zero or more characters other than , followed by a ,"; in other words, it is a single comma-delimited field. This won't work if the fields are quoted strings with internal commas or newlines.
Regardless of what Linux's man sed says, the -E flag is an extension to POSIX sed, which causes sed to use extended regular expressions (EREs) rather than basic regular expressions (see man 7 regex). -E also works on BSD sed, used by default on Mac OS X. (Thanks to @EdMorton for the note.)
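Since not every sed accepts \n in the replacement text, the same regrouping can also be sketched in awk by choosing the separator printed after each field. The helper name rows_of and its argument (the number of fields per row) are hypothetical.

```shell
# Hypothetical helper: break a one-line CSV into rows of n fields.
# Same caveat as above: quoted fields containing commas are not handled.
rows_of() {
  awk -F, -v n="$1" '{
    for (i = 1; i <= NF; i++)
      printf "%s%s", $i, ((i % n == 0 || i == NF) ? "\n" : ",")
  }'
}

printf 'Name,Age,Year,Michael,27,2018,Carl,19,2018\n' | rows_of 3
```

A final row with fewer than n fields is still terminated by a newline rather than left with a trailing comma.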
With GNU awk for multi-char RS:
$ awk -v RS='[,\n]' '{ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
With any awk:
$ awk -v RS=',' '{sub(/\n$/,""); ORS=(NR%3 ? "," : "\n")} 1' file
Name,Age,Year
Michael,27,2018
Carl,19,2018
Try this:
$ cat /tmp/22.txt
Name,Age,Year,Michael,27,2018,Carl,19,2018,Nooka,35,1945,Name1,11,19811
$ echo "Name,Age,Year"; grep -o "[a-zA-Z][a-zA-Z0-9]*,[1-9][0-9]*,[1-9][0-9]\{3\}" /tmp/22.txt
Michael,27,2018
Carl,19,2018
Nooka,35,1945
Name1,11,1981
The \{3\} interval saves writing [0-9] three more times for the YYYY part.
PS: This solution keeps only four digits for the year: even if the data holds 19811 (a typo, say), you'll still get 1981.
You are looking for 3 fragments, each without a comma and separated by a comma.
The last group can cause problems: it does not end with a comma and may hold only two fields.
The following command handles both cases, although each matched group keeps its trailing comma when one is present.
grep -Eo "([^,]*[,]{0,1}){0,3}" inputfile
This might work for you (GNU sed):
sed 's/,/\n/3;P;D' file
Replace the third , with a newline, print and then delete the first line of the pattern space, and repeat on what remains.
How would I go about deleting all the lines before the last occurrence of a string. Like if I had a file that looked like
Icecream is good
And
Chocolate is good
And
They have lots of sugar
If I want all lines after and including the last occurrence of "And" what's the cleanest way to do this? Specifically, I want
And
They have lots of sugar
I was doing sed -n -E -e '/And/,$p' file but I see this gives me the first occurrence.
This might work for you (GNU sed):
sed -n '/And/h;//!H;$!d;x;//p' file
Replace anything in the hold space by the line containing And. Append all other lines to the hold space. At the end of the file, swap the pattern space for the hold space and print out the result as long as it matches the required string And.
I know that you asked for sed and that Potong provided a good sed solution. But, for comparison, here is an awk solution:
$ awk 's{s=s"\n"$0;} /And/{s=$0;} END{print s;}' file
And
They have lots of sugar
How it works:
s{s=s"\n"$0;}
If the variable s is not empty, then add to it the current line, $0.
/And/{s=$0;}
If the current line contains And, then set s to the current line, $0.
END{print s;}
After we have reached the end of the file, print s.
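The same three steps can be sketched as a reusable function that takes the pattern as an argument. The name from_last, the variable names and the explicit found flag are additions for illustration; the flag is used because a pattern such as ^$ could match an empty line, which awk would treat as false if the buffer itself were tested.

```shell
# Hypothetical helper: print from the last line matching the pattern in $1
# to the end of input; print nothing if no line matches.
from_last() {
  awk -v pat="$1" '
    found    { buf = buf "\n" $0 }     # after a match, keep collecting lines
    $0 ~ pat { buf = $0; found = 1 }   # a newer match restarts the buffer
    END      { if (found) print buf }
  '
}

printf '%s\n' 'Icecream is good' 'And' 'Chocolate is good' 'And' \
  'They have lots of sugar' | from_last 'And'
```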
$ tac file | awk '!f; /And/{f=1}' | tac
And
They have lots of sugar
$ awk 'NR==FNR{if(/And/)nr=NR;next} FNR>=nr' file file
And
They have lots of sugar
The target always sits between two characters, 'E' and '/'. There will never be more than one occurrence of this combination (e.g. 'E01/') on a line, it appears on most lines of the HTML file, and the number is always between '01' and '90'.
So I need to programmatically read the file and increment the number in each 'Enn/' by 1 throughout the HTML file, where 'nn' is between '01' and '90'. The leading '0' must be kept for the numbers '01' to '09'.
Is this doable and if so how best to go about it?
Edit: Target lines will be in one of these two formats:
<DT>ProgramName
<DT>Program Name
You can use sed inside BASH as a fantastic one-liner, either:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+(10#\2>=90?0:1)))/ge' FILENAME
or if you are guaranteed the number is lower than 100:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+1))/ge' FILENAME
Basically, you'll be doing an in-place search and replace. The first command will not increment anything from 90 up (since you didn't specify the exact nature of the overflow condition). So E89/ -> E90/, E90/ -> E90/, and if by chance you have E91/, it will remain E91/. Put this line inside a loop to process multiple files.
A small explanation of the above command:
-r enables extended regular expressions (so the groups and {2} need no backslashes)
-i writes the result back to the same file (be careful with overwriting!)
s/search/replace/ge is the substitution command you'll be using
s/ starts a substitute (search and replace) command
(.*E) first grouping: all characters up to and including the E that precedes the number (case sensitive)
([0-9]{2}) second grouping: digits 0 through 9, repeated twice (fixed width)
(\/.*) third grouping: the escaped trailing slash and everything after it
/ (slash separator) denotes the end of the search pattern and the beginning of the replacement pattern
printf "format" var is the command built for each replacement
\1 places the first grouping here
%02u is the output format for the var (unsigned, zero-padded to two digits)
\3 places the third grouping here
$((expression)) is the BASH arithmetic expression whose result printf formats
10#\2 forces the second grouping to be read as a base 10 number
+(10#\2>=90?0:1) adds 0 or 1 to the second grouping depending on whether it is >= 90 (as used in the first command)
+1 adds 1 to the second grouping (see second command)
/ge are the flags: g for global replacement, e to execute the result of the replacement as a shell command and use its output
GNU sed and awk are very powerful tools to do this sort of thing.
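For comparison, here is a sketch of the same increment in awk, which avoids executing a shell command for every line. The helper name bump and the sample lines are made up; it assumes at most one Enn/ per line (as stated in the question) and leaves 90 and above unchanged, like the first sed command.

```shell
# Hypothetical helper: increment the first Enn/ on each line, keeping the
# leading zero and leaving values of 90 and above untouched.
bump() {
  awk '{
    if (match($0, /E[0-9][0-9]\//)) {
      n = substr($0, RSTART + 1, 2) + 0   # the string 08 becomes the number 8
      if (n < 90) n++
      $0 = substr($0, 1, RSTART) sprintf("%02d", n) substr($0, RSTART + 3)
    }
    print
  }'
}

printf '%s\n' 'S01E08/a' 'S01E09/b' 'S01E90/c' 'no match here' | bump
```

This writes to standard output; redirect it to a new file and compare before replacing the original.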
You can use the following perl one-liner to increment the numbers while maintaining the ones with leading 0s.
perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
$ cat file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
$ perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
You can add the -i option to make changes in-place. I would recommend creating a backup before doing so.
Not as elegant as one line sed!
Break the commands used into multiple commands and you can debug your bash or grep or sed.
# find the number
# use -o to grep to just return pattern
# use head -n1 for safety to just get 1 number
n=$(grep -o "E[0-9][0-9]\/" file.html |grep -o "[0-9][0-9]"|head -n1)
# 08 and 09 would be parsed as invalid octal, so force base 10
n1=$((10#$n))
echo "Debug n1=$n1 n=$n"
n2=$n1
# bash arithmetic is done inside (( ))
(( n2++ ))
echo debug n2=$n2
# use sed with -i -e for an in-place edit to replace the number
sed -i -e "s/E$n\//E$(printf '%02d' "$n2")\//" file.html
grep "E[0-9][0-9]" file.html
awk might be better. Maybe could do it in one awk command also.
The sed one-liner in other answer is awesome :-)
This works in bash (the (( )) and 10# syntax are not available in plain sh).
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
I am trying to remove all of the matches of $word from a file, but only on lines where $word is placed somewhere within { and } which also appear on the same line, e.g.:
{The cat liked} the fish.
The mouse {did not like} the cat.
The {cat did not} like the spider.
If $word is set to "cat", then lines 1 and 3 are deleted, because "cat" appears between the { and }. If $word is set to "like", then lines 1 and 2 are deleted, because this search term appears on those lines between the { and }. Line 3 is not deleted, because like appears outside of the braces.
The braces are never nested.
The braces never appear split across lines.
I have tried various things, but these all returned errors:
sed -i "/\{*$word*\}/d" ./file.txt
sed -i "/\{.*$word.*\}/d" ./file.txt
sed -i "/\{(.*)$word(.*)\}/d" ./file.txt
How can I remove all of the lines in a file containing a variable, but only when the found variable was on a line and found between two braces?
sed -i "/{.*$word.*}/d" ./file.txt
\{ has a special meaning in sed: in a basic regular expression it opens an interval such as \{2\}, so it does not match a literal {. To match the literal character, just write a plain {. (This can be confusing if you are used to Perl regexes.)
Edit:
Be careful with -i: if this is in a script and $word is accidentally undefined or set to the empty string, this command will delete every line containing a { followed by a }, no matter what is between them.
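One way to guard against the empty-word case is to wrap the command in a function that refuses to run without a search word. This is only a sketch: the name drop_braced is hypothetical, it reads standard input and writes standard output instead of using -i, and it uses [^}]* rather than .* so a match cannot run past the closing brace.

```shell
# Hypothetical helper: delete lines where the word in $1 appears inside {...}.
# Assumes the word contains no regex metacharacters and no "/".
drop_braced() {
  if [ -z "$1" ]; then
    echo 'drop_braced: empty search word, refusing to run' >&2
    return 1
  fi
  sed "/{[^}]*$1[^}]*}/d"
}

printf '%s\n' \
  '{The cat liked} the fish.' \
  'The mouse {did not like} the cat.' \
  'The {cat did not} like the spider.' | drop_braced cat
```

Redirect the output to a new file and inspect it before replacing the original.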
I would take the answer that @cybeliak gave a little further. If you really want to match cat and not, say, scat, then you need to delimit your expression with word boundaries ([[:<:]] and [[:>:]] are the BSD sed spelling, as on macOS; GNU sed uses \< and \>):
sed '/{.*[[:<:]]'"$word"'[[:>:]].*}/d'
Note - I prefer to use ' ' style quotes to prevent any unintended side-effects...
As an aside, I am a big fan of not using the -i flag. Pipe the result into a different file and confirm for yourself that it's good, before deleting the original.
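Because the word-boundary syntax differs between sed implementations, a whole-word test can also be sketched in awk by spelling the boundaries out by hand. The helper name drop_braced_word is hypothetical; the sketch assumes a word character is a letter, digit or underscore, that the word holds no regex metacharacters, and that only the first {...} group on a line matters.

```shell
# Hypothetical helper: drop lines whose first {...} group contains the
# word in $1 as a whole word; all other lines are printed unchanged.
drop_braced_word() {
  awk -v w="$1" -F '[{}]' '
    BEGIN { re = "(^|[^A-Za-z0-9_])" w "([^A-Za-z0-9_]|$)" }
    $2 !~ re
  '
}

# cat is a whole word on the first line only; scat does not count.
printf '%s\n' '{The cat liked} the fish.' '{The scat smelled} bad.' |
  drop_braced_word cat
```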
Much easier to do with awk:
awk -v s="cat" -F '[{}]' '!($2 ~ s)' file
The mouse {did not like} the cat.
awk -v s="like" -F '[{}]' '!($2 ~ s)' file
The {cat did not} like the spider.
This might work for you (GNU sed):
sed -i '/{[^}]*'"$word"'[^}]*}/d' file
N.B. $word should not contain } or /.