Can I replace Affx- with rs in text file in bash - bash

I have a huge text file. I want to replace all strings that start with Affx- followed by some digits (like Affx-74537382 or Affx-4374575) with rs followed by the same digits (rs74537382 or rs4374575). Is this possible with sed -i 's/Affx-/rs/'?
Since the file is so huge I am not sure how to verify that the command is working correctly.

You can use
sed -E 's/^Affx-([0-9]+)/rs\1/' file > tmp && mv tmp file
Details:
-E - POSIX ERE syntax enabled
^ - start of line (drop it, and add the g flag, if the IDs can also appear elsewhere in a line)
Affx- - a literal text, including the hyphen, so the hyphen is not copied to the result
([0-9]+) - Group 1 (\1 refers to the value in this group): one or more digits.
See the online demo:
#!/bin/bash
s='Blah-1233455
Affx-74537382
Some line here
Affx-4374575
End of text 123456778.'
sed -E 's/^Affx-([0-9]+)/rs\1/' <<< "$s"
Output:
Blah-1233455
rs74537382
Some line here
rs4374575
End of text 123456778.
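On the question of verifying the result on a huge file, one possible approach is to count the candidate lines before the change, preview only the lines that would change, and count again afterwards. The following is a minimal sketch under assumptions: the sample file contents and the temporary file names are hypothetical stand-ins for the real data.

```shell
# Hypothetical three-line sample standing in for the huge file.
tmp=$(mktemp)
printf '%s\n' 'Affx-74537382 A G' 'rs123 C T' 'Affx-4374575 T C' > "$tmp"

# Count the lines that carry an Affx- ID before touching anything.
before=$(grep -Ec '^Affx-[0-9]+' "$tmp")

# Preview only the lines the substitution would change:
# -n suppresses normal output, the p flag prints changed lines.
preview=$(sed -En 's/^Affx-([0-9]+)/rs\1/p' "$tmp" | head -n 5)

# Write the result to a copy and count the Affx- IDs that remain.
sed -E 's/^Affx-([0-9]+)/rs\1/' "$tmp" > "$tmp.new"
after=$(grep -Ec '^Affx-[0-9]+' "$tmp.new")

echo "before=$before after=$after"
printf '%s\n' "$preview"
rm -f "$tmp" "$tmp.new"
```

If the count after the change is zero and the preview looks right, the copy can replace the original with mv.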

Related

How to add a character to the end of a line, when a find and replace is done to the beginning?

I am creating a simple script that converts a custom markup to TeX macros:
? What are four kinds of animals?
- elephants
- tigers
- bears
- fish
e
This becomes:
\QUESTION{What are four kinds of animals?}{
\ANSWER{elephants}
\ANSWER{tigers}
\ANSWER{bears}
\ANSWER{fish}
}
I have used a simple syntax to replace the items at the front:
sed 's#^? #\\QUESTION{#' file > temp1
sed 's#^\- #\\ANSWER{#' temp1 > temp2
sed 's#^e #\}{#' temp2 > temp3
How do I get it to also add the }{ to the end when "?" is found at the beginning, and add } to the end when "-" is found at the beginning of the line?
Match the whole line instead of its beginning, and use a replacement pattern referencing the content of the line:
sed -e 's#^? \(.*\)#\\QUESTION{\1}{#' -e 's#^- \(.*\)#\\ANSWER{\1}#' -e 's#^e#}#' file
In this command \(...\) are capturing groups and \1 refers to their content.
I also took the liberty of regrouping your multiple substitutions in a single sed command.
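To see the capture-group approach on the sample markup from the question, here is a small sketch. The input is a shortened copy of the question's example, and the closing line is matched with ^e$ (an assumption that the end marker is a lone e on its own line).

```shell
# Shortened sample of the custom markup from the question.
input='? What are four kinds of animals?
- elephants
- tigers
e'

# \(...\) captures the rest of the line; \1 re-inserts it between the braces.
result=$(printf '%s\n' "$input" |
  sed -e 's#^? \(.*\)#\\QUESTION{\1}{#' \
      -e 's#^- \(.*\)#\\ANSWER{\1}#' \
      -e 's#^e$#}#')

printf '%s\n' "$result"
```

Each -e expression only fires on lines that match its own prefix, so the order of the three expressions does not matter for this input.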
Like this:
sed -E 's/^(\? )(.*)/\\QUESTION{\2}{/;t;s/^- (.*)/\\ANSWER{\1}/;t;s/^e$/}/' file
Explanation:
s/^(\? )(.*)/\\QUESTION{\2}{/ Handle lines starting with ?
t means no further action if the above s command replaced something
s/^- (.*)/\\ANSWER{\1}/ Handle lines starting with -
t means no further action if the above s command replaced something
s/^e$/}/ Handle the closing line that consists of e.
You can "speed it up" a bit by reordering the commands by the complexity of the search pattern, like this:
sed -E 's/^e$/}/;t;s/^- (.*)/\\ANSWER{\1}/;t;s/^(\? )(.*)/\\QUESTION{\2}{/' file
But yeah, that is probably micro-optimization.
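The effect of t is easiest to see on a toy input where the second substitution would otherwise also fire. This is only an illustrative sketch, with a made-up input line, written with separate -e expressions so that it does not depend on how a given sed parses t followed by a semicolon.

```shell
# With t: the first s succeeds, so sed branches to the end of the script
# and the second s never runs.
with_t=$(printf 'ab\n' | sed -e 's/a/b/' -e t -e 's/b/c/')

# Without t: both substitutions run on the same line.
without_t=$(printf 'ab\n' | sed -e 's/a/b/' -e 's/b/c/')

echo "with t:    $with_t"
echo "without t: $without_t"
```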
You can try this sed too :
sed '/^- /s//\\ANSWER{/;/^e/s///;s/$/}/;/^? /{s//\\QUESTION{/;s/$/{/}' infile
sed '
/^- /s//\\ANSWER{/ # line with -
/^e/s/// # line with e
s/$/}/ # add } at the end of each line
/^? / { # line with ?
s//\\QUESTION{/
s/$/{/
}
' infile

sed removing # and ; comments from files up to certain keyword

I have files from which comments and blank lines need to be removed, but only up to a keyword. The line number of the keyword varies. Is it possible to limit several chained sed substitutions based on a keyword?
This removes all comments and white spaces from file :
sed -i -e 's/#.*$//' -e 's/;.*$//' -e '/^$/d' file
For example something like this :
# string1
# string2
some string
; string3
; string4
####
<Keyword_Keep_this_line_and_comments_white_space_after_this>
# More comments that need to be here
; etc.
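You can restrict the deletions to the range from line 1 to the first line that matches the keyword (the match is case-sensitive):
sed -i '1,/Keyword/{/^[#;]/d;/^$/d;}' file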
I would suggest using awk and setting a flag when you reach your keyword:
awk '/Keyword/ { stop = 1 } stop || !/^[[:blank:]]*([;#]|$)/' file
Set stop to true when the line contains Keyword. Do the default action (print the line) when stop is true or when the line doesn't match the regex. The regex matches lines whose first non-blank character is a semicolon or hash, or blank lines. It's slightly different to your condition but I think it does what you want.
The command prints to standard output so you should redirect to a new file and then overwrite the original to achieve an "in-place edit":
awk '...' input > tmp && mv tmp input
Use grep -n keyword to get the line number that contains the keyword.
Use sed -i -e '1,N s/#..., where N is the line number that contains the keyword, to only remove comments on lines 1 to N.
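The grep -n plus line-range idea can be sketched as follows. This is an illustration under assumptions: the sample file is a shortened, hypothetical version of the one in the question, the keyword is taken to be the literal string Keyword, and the output goes to standard output rather than being edited in place.

```shell
# Hypothetical sample file modelled on the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# string1
some string
; string3

<Keyword_Keep_this_line>
# More comments that need to be here
; etc.
EOF

# Line number of the first line containing the keyword.
n=$(grep -n -m1 'Keyword' "$tmp" | cut -d: -f1)

# Apply the question's three commands only to lines 1..n.
result=$(sed -e "1,${n}s/#.*\$//" \
             -e "1,${n}s/;.*\$//" \
             -e "1,${n}{/^\$/d;}" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

If the keyword is missing, n is empty and the sed addresses become invalid, so a real script would want to check for that case first.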

How do I replace text using a variable in a shell script

I have a variable with a bunch of data.
text = "ABCDEFGHIJK"
file = garbage.txt //iiuhdsfiuhdsihf]sdiuhdfoidsoijsf
What I would like to do is replace the ] character in file with text. I've tried using sed but I keep getting odd errors.
output should be:
//iiuhdsfiuhdsihfABCDEFGHIJKsdiuhdfoidsoijsf
Just need to escape the ] character with a \ in regex:
text="ABCDEFGHIJK"
sed "s/\(.*\)\]\(.*\)/\1$text\2/" file > file.changed
or, for in-place editing:
sed -i "s/\(.*\)\]\(.*\)/\1$text\2/" file
Test:
sed "s/\(.*\)\]\(.*\)/\1$text\2/" <<< "iiuhdsfiuhdsihf]sdiuhdfoidsoijsf"
# output => iiuhdsfiuhdsihfABCDEFGHIJKsdiuhdfoidsoijsf
There is always the bash way that should work in your osx:
filevar=$(cat file)
echo "${filevar/]/$text}" #to replace first occurrence
OR
echo "${filevar//]/$text}" #to replace all occurrences
In my bash I don't even have to escape ].
By the way, does the simple sed not work?
$ a="AA"
$ echo "garbage.txt //iiuhdsfiuhdsihf]sdiuhdfoidsoijsf" |sed "s/]/$a/g"
garbage.txt //iiuhdsfiuhdsihfAAsdiuhdfoidsoijsf
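One possible source of "odd errors" with the sed variants above is a variable whose value contains characters that are special in a sed replacement, such as /, & or a backslash. The sketch below shows one way to escape them first. The value of text is a hypothetical example, and values containing newlines are not handled.

```shell
text='A/B&C'    # hypothetical value with sed-special characters
line='//iiuhdsfiuhdsihf]sdiuhdfoidsoijsf'

# Put a backslash in front of every backslash, slash and ampersand.
# The | delimiter keeps the slash inside the bracket expression unambiguous.
escaped=$(printf '%s' "$text" | sed 's|[\\/&]|\\&|g')

# The escaped value can now be interpolated into the replacement safely.
result=$(printf '%s\n' "$line" | sed "s/]/$escaped/")

printf '%s\n' "$result"
```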

String manipulation via script

I am trying to get a substring between &DEST= and the next & or a line break.
For example :
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
In this I need to extract "SFO"
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
In this I need to extract "SANFRANSISCO"
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
In this I need to extract "SANJOSE"
I am reading a file line by line, and I need to update the text after &DEST= and put it back in the file. The modification of the text is to mask the dest value with X character.
So, SFO should be replaced with XXX.
SANJOSE should be replaced with XXXXXXX.
Output :
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Please let me know how to achieve this in script (Preferably shell or bash script).
Thanks.
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=PORTORICA
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
$ sed -E 's/^.*&DEST=([^&]*)[&]*.*$/\1/' file
SFO
PORTORICA
SANFRANSISCO
SANJOSE
should do it
Replacing airports with an equal number of Xs
Let's consider this test file:
$ cat file
MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=SANFRANSISCO&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE
To replace the strings after &DEST= with an equal length of X and using GNU sed:
$ sed -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
To replace the file in-place:
sed -i -E ':a; s/(&DEST=X*)[^X&]/\1X/; ta' file
The above was tested with GNU sed. For BSD (OSX) sed, try:
sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
Or, to change in-place with BSD(OSX) sed, try:
sed -i '' -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta file
If there is some reason why it is important to use the shell to read the file line-by-line:
while IFS= read -r line
do
echo "$line" | sed -Ee :a -e 's/(&DEST=X*)[^X&]/\1X/' -e ta
done <file
How it works
Let's consider this code:
search_str="&DEST="
newfile=chart.txt
sed -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"
-E
This tells sed to use Extended Regular Expressions (ERE). This has the advantage of requiring fewer backslashes to escape things.
:a
This creates a label a.
s/('"$search_str"'X*)[^X&]/\1X/
This looks for $search_str followed by any number of X followed by any character that is not X or &. Because of the parens, everything except that last character is saved into group 1. This string is replaced by group 1, denoted \1 and an X.
ta
In sed, t is a test command. If the substitution was made (meaning that some character needed to be replaced by X), then the test evaluates to true and, in that case, ta tells sed to jump to label a.
This test-and-jump causes the substitution to be repeated as many times as necessary.
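To make the test-and-jump loop more tangible, the same substitution can be unrolled into a shell loop that applies it once per pass and prints each intermediate state. This is purely an illustration with a shortened, hypothetical input line. The single sed command with :a and ta is the practical form.

```shell
line='REQ&DEST=SFO&ORIG=6546'   # shortened sample line

while :; do
  # One pass: turn the first not-yet-masked character after &DEST= into X.
  next=$(printf '%s\n' "$line" | sed -E 's/(&DEST=X*)[^X&]/\1X/')
  # No change means the substitution failed, which is where ta stops jumping.
  [ "$next" = "$line" ] && break
  line=$next
  echo "$line"
done
```

The loop prints the line three times, with one more X each time, and leaves the fully masked value in line.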
Replacing multiple tags with one sed command
$ name='DEST|ORIG'; sed -E ':a; s/(&('"$name"')=X*)[^X&]/\1X/; ta' file
MYREQUESTISTO8764GETTHIS&DEST=XXX&ORIG=XXXX
MYREQUESTISTO8764GETTHIS&DEST=XXXXXXXXXXXX&ORIG=XXXX
MYREQUESTISTO8764GETTHISWITH&DEST=XXXXXXX
Answer for original question
Using shell
$ s='MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
$ s=${s#*&DEST=}
$ echo ${s%%&*}
SFO
How it works:
${s#*&DEST=} is prefix removal. This removes all text up to and including the first occurrence of &DEST=.
${s%%&*} is suffix removal. It removes all text from the first & to the end of the string.
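The same prefix and suffix removal can also be combined to mask the value rather than just extract it. The function below is a sketch, not part of the original answer: mask_dest is a hypothetical helper name, it sets a few global variables, and it only handles the first &DEST= in a line.

```shell
# mask_dest LINE: print LINE with the value after the first &DEST= replaced by X's.
mask_dest() {
  s=$1
  case $s in
    *'&DEST='*) ;;                      # has a DEST field: continue below
    *) printf '%s\n' "$s"; return ;;    # no DEST field: print unchanged
  esac
  prefix=${s%%'&DEST='*}    # text before the first &DEST=
  rest=${s#*'&DEST='}       # text after the first &DEST=
  val=${rest%%'&'*}         # value up to the next & (or end of line)
  masked=$(printf '%s' "$val" | sed 's/./X/g')
  printf '%s&DEST=%s%s\n' "$prefix" "$masked" "${rest#"$val"}"
}

mask_dest 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546'
mask_dest 'MYREQUESTISTO8764GETTHISWITH&DEST=SANJOSE'
```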
Using awk
$ echo 'MYREQUESTISTO8764GETTHIS&DEST=SFO&ORIG=6546' | awk -F'[=\n]' '$1=="DEST"{print $2}' RS='&'
SFO
How it works:
-F'[=\n]'
This tells awk to treat either an equal sign or a newline as the field separator
$1=="DEST"{print $2}
If the first field is DEST, then print the second field.
RS='&'
This sets the record separator to &.
With GNU bash:
re='(.*&DEST=)([^&]*)(.*)'
while IFS= read -r line; do
[[ $line =~ $re ]] && echo "${BASH_REMATCH[1]}fooooo${BASH_REMATCH[3]}"
done < file
Output:
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHIS&DEST=fooooo&ORIG=6546
MYREQUESTISTO8764GETTHISWITH&DEST=fooooo
Replace the characters between &DEST= and & (or EOL) with X's:
awk -F'&DEST=' '{
printf("%s&DEST=", $1);
xlen=index($2,"&");
if ( xlen == 0) xlen=length($2)+1;
for (i=1;i<xlen;i++) printf("%s", "X");
endstr=substr($2,xlen);
printf("%s\n", endstr);
}' file

How do I insert a newline/linebreak after a line using sed

It took me a while to figure out how to do this, so posting in case anyone else is looking for the same.
For adding a newline after a pattern, you can also say:
sed '/pattern/{G;}' filename
Quoting GNU sed manual:
G
Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space.
EDIT:
Incidentally, this happens to be covered in sed one liners:
# insert a blank line below every line which matches "regex"
sed '/regex/G'
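A quick way to check what G does is to run it on a three-line input. This is a small sketch with made-up input. It relies on the hold space being empty, which is the case as long as the script never stores anything in it.

```shell
# G appends a newline plus the (empty) hold space to each matching line,
# which shows up as a blank line after it.
result=$(printf 'a\npattern\nb\n' | sed '/pattern/G')

printf '%s\n' "$result"
```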
This sed command:
sed -i '' '/pid = run/ a\
\
' file.txt
Finds the line with: pid = run
file.txt before
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
; Error log file
and adds a linebreak after that line inside file.txt
file.txt after
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid

; Error log file
Or if you want to add text and a linebreak:
sed -i '/pid = run/ a\
new line of text\
' file.txt
file.txt after
; Note: the default prefix is /usr/local/var
; Default Value: none
;pid = run/php-fpm.pid
new line of text
; Error log file
A simple substitution works well:
sed 's/pattern.*$/&\n/'
Example :
$ printf "Hi\nBye\n" | sed 's/H.*$/&\nJohn/'
Hi
John
Bye
To be standards-compliant, replace \n with a backslash followed by a newline:
$ printf "Hi\nBye\n" | sed 's/H.*$/&\
> John/'
Hi
John
Bye
sed '/pattern/a\\r' filename
This adds a return after each line that matches the pattern, whereas g would replace the matching line with a blank line.
If a new (blank) line has to be added at the end of the file, use this:
sed '$a\\r' filename
Another possibility, e.g. if you don't have an empty hold space, could be:
sed '/pattern/{p;s/.*//}' file
Explanation:
/pattern/{...} = apply the sequence of commands if a line with the pattern is found,
p = print the current line,
; = separator between commands,
s/.*// = replace everything in the pattern space with nothing,
sed then automatically prints the now-empty pattern space as an additional line.
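The difference from G only shows when the hold space is not empty. The sketch below forces that situation with 1h, which copies the first line into the hold space. The input and the use of 1h are assumptions made for the demonstration.

```shell
input=$(printf 'first\npattern\nlast')

# G appends the hold space, so the saved first line reappears after the match.
with_G=$(printf '%s\n' "$input" | sed '1h;/pattern/G')

# p;s/.*// never looks at the hold space, so a blank line is produced.
with_p=$(printf '%s\n' "$input" | sed '1h;/pattern/{p;s/.*//;}')

printf '%s\n' "--- with G ---" "$with_G" "--- with p;s ---" "$with_p"
```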
The easiest option -->
sed 'i\
' filename
