Understanding sed command - shell

Please excuse if the question is too naive. I am new to shell scripting and am not able to find any good resource to understand the specifics. I am trying to make sense of a legacy script. Please can someone tell me what the following command does:
sed "s#s3AtlasExtractName#$i#g" load_xyz.sql >> load_abc.sql;

This command will replace all occurrences of s3AtlasExtractName with whatever $i is.
s - Substitute
# - Delimiter
s3AtlasExtractName - Word that needs substituting
# - Delimiter
$i - i variable that will be used to replace s3AtlasExtractName
# - Delimiter
g - Global: replace all instances of s3AtlasExtractName on each line, not just the first occurrence
So this will parse through load_xyz.sql and change all occurrences of s3AtlasExtractName to the value of $i and append the whole of the contents of load_xyz.sql to a file called load_abc.sql with the sed substitutions.
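A quick way to see the substitution without touching any files. The value given to i and the sample line are made up for illustration; in the legacy script $i is set somewhere else.

```shell
# Hypothetical value; the real script sets $i elsewhere.
i='orders_2020'

# Sample line with two occurrences, to show the effect of the g flag.
line='COPY t FROM s3AtlasExtractName; -- s3AtlasExtractName'

# Double quotes let the shell expand $i before sed sees the script.
result=$(printf '%s\n' "$line" | sed "s#s3AtlasExtractName#$i#g")
printf '%s\n' "$result"
```

Both occurrences on the line are replaced; without the g only the first one would be.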

sed is a command line stream editor. You can find information about it here:
http://www.computerhope.com/unix/used.htm
An easy example is shown below where sed is used to replace the word "test" with the word "example" in myfile.txt but output is sent to newfile.txt
sed 's/test/example/g' myfile.txt > newfile.txt
It seems that your script performs a similar function: it substitutes into the content of the load_xyz.sql file and appends the result to load_abc.sql. Without more code I am just guessing, but it seems that the parameter $i could be used as a counter to insert similar but new values into the load_abc.sql file.

In short, this reads load_xyz.sql and replaces every occurrence of "s3AtlasExtractName" by whatever has been stored in the shell variable "i".
The long version is that sed accepts many subcommands with different formattings. Any "simple" sed command will look like 'sed '. The first letter of the subcommand tells you which operation sed is going to do with your files.
The "s" operation stands for "substitution" and is the most commonly used. It is followed by a separator, the regexp to look for, a separator, the replacement text, a separator, and optional flags. By default sed regexps are POSIX basic regular expressions, not Perl ones. In your case, the separator is '#' which is less common than '/' but perfectly legal, so the command substitutes the value of $i for every instance of 's3AtlasExtractName'. The 'g' flag tells sed to replace every occurrence of the pattern (the default is to only replace its first occurrence on every line in the input).
Finally, the use of "$i" inside a double-quote-delimited string tells the shell to actually expand the shell variable 'i' so you'll want to look for a shell statement setting that (possibly a 'for' statement).
Hope this helps.
edit: I focused on the 'sed' part and kinda missed the redirection part. The '>>' token tells the shell to take the output of the sed command (i.e. the contents of load_xyz.sql with all occurrences of s3AtlasExtractName replaced by the contents of $i) and append it to the file 'load_abc.sql'.
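Here is a small sketch of how such a command might sit inside a 'for' loop. The loop values (alpha, beta), the template line and the temporary directory are assumptions for the demo, not taken from the legacy script.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)
printf '%s\n' 'SELECT * FROM s3AtlasExtractName;' > "$workdir/load_xyz.sql"

# Each pass fills the template with the current $i and appends it (>>),
# so load_abc.sql grows by one copy of the template per iteration.
for i in alpha beta; do
    sed "s#s3AtlasExtractName#$i#g" "$workdir/load_xyz.sql" >> "$workdir/load_abc.sql"
done

combined=$(cat "$workdir/load_abc.sql")
printf '%s\n' "$combined"
rm -rf "$workdir"
```

With '>' instead of '>>' each pass would overwrite the file and only the last value would survive.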

Related

How to make sed avoid replacement after specific symbol

I am writing a script for formatting a Fortran source code.
Simple formatting, like having all keywords in capitals or in small letters, etc.
Here is the main command
sed -i -e "/^\!/! s/$small\s/$cap /gI" $filein
It replaces every keyword $small (followed by a space) with the keyword $cap. The replacement happens only if the line does not start with "!".
It does what it should. Question:
How to avoid replacement if "!" is encountered in the middle of a line.
Or more generally, how to replace patterns everywhere, but not after a specific symbol, which can be either in the beginning of the line or somewhere else.
Example:
Program test ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has Program and not exclamation mark.
"program" is a keyword. After running the script the result is:
PROGRAM test ! It should not change the next PROGRAM to caps
! Hi, is anything changed here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.
I want:
PROGRAM test ! It should not change the next program to caps
! Hi, is anything changed here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.
So far, I've failed to find a nice solution, which does the trick, hopefully with the sed command.
The typical way in sed is to:
split the string into two parts - save one part in hold space.
do operations on pattern space
get hold space and shuffle for output.
It would be something along these lines (GNU sed):
sed '/!/!{s/program/PROGRAM/gI;b};h;s/[^!]*!//;x;s/!.*//;s/program/PROGRAM/gI;G;s/\n/!/'
/!/!{s/program/PROGRAM/gI;b} - if the line has no !, do the substitution on the whole line, print it and start over.
h;s/[^!]*!//;x;s/!.*// - put the part after the first ! in hold space, the part before it in pattern space
s/program/PROGRAM/gI; - do the substitution on that part of the string
G;s/\n/!/ - grab the part from hold space and shuffle output - it's easy here.
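A sketch of the hold-space split written one command per line and run on lines from the question. It assumes GNU sed (for the I flag). Two choices here are mine: lines without a ! are substituted in full, and the split happens at the first ! on the line.

```shell
input='Program test ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line has Program and not exclamation mark.'

result=$(printf '%s\n' "$input" | sed '
# No ! on the line: substitute everywhere, then skip the rest.
/!/!{
    s/program/PROGRAM/gI
    b
}
# Copy the line to hold space, keep only the text after the first !
h
s/[^!]*!//
# Swap: hold space now has the comment text, pattern space the full line.
x
# Keep only the text before the first ! and substitute there.
s/!.*//
s/program/PROGRAM/gI
# Append the saved comment text and put the ! back between the parts.
G
s/\n/!/
')
printf '%s\n' "$result"
```

Only the code part of each line changes; everything from the first ! onward is passed through as it was.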
Assumptions:
OP needs to convert multiple keywords to uppercase
keywords to be capitalized do not include white space (eg, program name will need to be processed as two separate strings program and name)
input delimiter is white space
keywords with 'attached' non-alphanums will be ignored (eg, Program, will be ignored since , will be picked up as part of string) unless OP specifically includes the non-alphanum as part of the keyword definition (eg, keywords includes Program,)
all keywords to be converted to uppercase (ie, not going to worry about any flags to switch between lowercase, uppercase, camelcase, etc)
Sample input data:
$ cat source.txt
Program test ! It should not change the next program to caps # change first 'Program'
! Hi, is anything changing here? like program or MarK? # change nothing
This line does not have any key words ! except here - pRoGraM Mark # change nothing
This line has Program and not exclamation mARk plus MarKer. # change 'Program' and 'mARk' but not MarKer
Hi, hi, hI # change 'Hi,' and 'hi,' but not 'hI'
List of keywords provided in a separate file (whitespace delimited):
$ cat keywords.dat
program
mark hi, # 2 separate keywords: 'mark' and 'hi,' (comma included)
One awk idea:
awk -v comment="!" '                            # define character after which conversions are to be ignored
FNR==NR { for ( i=1; i<=NF; i++ )               # first file contains keywords; process each field as a separate keyword
              keywords[toupper($i)]             # convert to uppercase and use as index in associative array keywords[]
          next
        }
        { for ( i=1; i<=NF; i++ )               # second file, process each field separately
              { if ( $i == comment )            # if field is our comment character then stop processing rest of line else ...
                   break
                if ( toupper($i) in keywords )  # if current field is a keyword then convert to uppercase
                   $i=toupper($i)
              }
          print                                 # print the current line
        }
' keywords.dat source.txt
This generates:
PROGRAM test ! It should not change the next program to caps
! Hi, is anything changing here? like program or MarK?
This line does not have any key words ! except here - pRoGraM Mark
This line has PROGRAM and not exclamation MARK plus MarKer.
HI, HI, hI
NOTES:
while GNU awk can be told to overwrite the input file (eg, awk -i inplace == sed -i), this will require a different approach for processing the keywords.dat file (to keep from overwriting with nothing)
(quite a bit) of additional logic could be added to support uppercase vs lowercase vs camelcase vs whatever ... ignore or include non-alphanums in comparisons ... using multiple/different 'comment' characters ... standardizing other portions of (Fortran) code (eg, indentation) ... etc
This might work for you (GNU sed):
small='Program ' caps='PROGRAM '
sed -E ':a;s/^([^!]*)('"$small"')/\1\n/;ta;s/\n/'"$caps"'/g' file
Replace any occurrence of the variable $small before the symbol ! with a newline, then replace all newlines by the variable $caps.
N.B. The newline is chosen because it cannot normally exist in any line presented by sed, as it is the delimiter sed uses to present lines in the pattern space. Secondly, the words matching $small are iteratively replaced by a newline, then all newlines are globally replaced by $caps. This allows the replacement to be a superset of the match. If this were not the order of operations, the iterative process could become an endless loop.
If $small is to represent a case insensitive match, add the i flag to the first substitution.
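A quick check of the newline-placeholder approach on a single line. The test line is made up; the command is the one above, and it assumes GNU sed because of the \n in the replacement.

```shell
small='Program '
caps='PROGRAM '

# Two matches before the ! and one after it.
line='Program and Program ! not this Program here'

# Loop: mark each match before the first ! with a newline,
# then turn every marker into $caps in one final pass.
result=$(printf '%s\n' "$line" |
    sed -E ':a;s/^([^!]*)('"$small"')/\1\n/;ta;s/\n/'"$caps"'/g')
printf '%s\n' "$result"
```

The loop stops on its own because the newline marker no longer matches $small, which is the point of doing the replacement in two stages.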
I've tried the suggested options, but none of them worked as expected for the whole file.
I have ended up with multiple sed commands; I am sure that it is not the best solution, but it works for me and does what I need.
My main problem was to avoid replacement after "!" if it appears somewhere in the middle of the line.
So I switched this problem to the one I could handle.
sed -i -e "/^\!/! s/!/!c7u!!c7u!/" $filein # 1. If a line does NOT start with !, search next "!" and replace it with "!c7u!!c7u!"
sed -i "s/!c7u!/\n/" $filein # 2. Move that comment to a new line
for ((i=0; i<$nwords; i++ )); do # Loop through all keywords
word=${words[$i]} # Take a keyword from the list
small=${word,,} # Write it in small letters
cap=${word^^} # Write it in capitals
sed -i -e "/^\!/! s/$small\b/$cap/gI" $filein # 3. Actual replacement in lines not starting with "!"
done
sed -i -e :a -e '$!N;s/\n!c7u//;ta' -e 'P;D' $filein # 4. Undo step 1-2, moving inline comments back

Extract a section in a config file line using sed

I'm trying to continue to extract and isolate sections of text within my wordpress config file via bash script. Can someone help me figure out my syntax?
The line of code in the wp-config.php file is:
$table_prefix = 'xyz_';
This is what I'm trying to use to extract the xyz_ portion.
prefix=$(sed -n "s/$table_prefix = *'[^']*'/p" wp-config.php)
echo -n "$prefix"
There's something wrong with my characters obviously. Any help would be much appreciated!
Your sed command is malformed. The substitute command has the form s/regex/replacement/p, where the p flag prints each line on which a substitution was made. Yours, as written, has no replacement part and will fail with an unterminated 's' command error. If you want to print the whole match, you can refer to it in the replacement with & (GNU sed also accepts \0), as in s/<our_pattern>/&/p
Bash interprets $table_prefix as a variable, and because it is in double quotes, it tries to expand it. Unless you set this variable to something, it expands to nothing. This would cause your sed command to match much more liberally, and we can fix it by escaping the $ as \$table_prefix.
Next, this won't actually match. Your line has multiple spaces before the =, so we need another wildcard there as in ...prefix *= *...
Lastly, to extract the xyz_ portion alone, we'll need to do some things. First, we have to make sure our pattern matches the whole line, so that when we substitute, the rest of the line won't be kept. We can do this by wrapping our pattern to match in ^.* ... .*\$. Next, we want to wrap the target section in a capture group. In sed, this is done with \(<stuff>\). The zeroth capture group is the whole match, and then capture groups are numbered in the order the parentheses appear. This means we can do \([^']*\) to grab that section, and \1 to output it.
All that gives us:
prefix=$(sed -n "s/^.*\$table_prefix *= *'\([^']*\)'.*\$/\1/p" wp-config.php)
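To try this without a real WordPress install, here is a sketch against a throwaway file. The file contents are an assumption modelled on the line in the question, with extra spaces before the = to exercise the ' *' parts.

```shell
workdir=$(mktemp -d)
config="$workdir/wp-config.php"

# Minimal stand-in for wp-config.php; only the middle line should match.
printf '%s\n' '<?php' "\$table_prefix  = 'xyz_';" "define( 'WP_DEBUG', false );" > "$config"

# \$ stops bash from expanding $table_prefix; -n plus the p flag prints
# only the line where the substitution happened.
prefix=$(sed -n "s/^.*\$table_prefix *= *'\([^']*\)'.*\$/\1/p" "$config")
printf '%s\n' "$prefix"
rm -rf "$workdir"
```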
The only issue with the regex is that the '$' character tells bash you are using a variable, and since the pattern is wrapped in double quotes ("), bash will attempt to expand it. You can mitigate this by either escaping the $ or wrapping the pattern in single quotes and escaping the single quotes in the pattern.
Lastly, you are using the sed command s, which stands for substitute. It takes a pattern and replaces the matches with text, in the form s/<pattern>/<replace>/. You can omit the 's' and leave the 'p' (print) command at the end. After all that, your command should look something like:
sed -n "/\$table_prefix = *'[^']*'/p" wp-config.php
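For the single-quote route mentioned above, a sketch of what the quoting could look like. Each literal ' inside the pattern is written as '\'' (close the quote, escaped quote, reopen). The sample file is again a made-up stand-in for wp-config.php.

```shell
workdir=$(mktemp -d)
config="$workdir/wp-config.php"
printf '%s\n' '<?php' "\$table_prefix  = 'xyz_';" > "$config"

# Inside single quotes bash expands nothing, so $table_prefix is safe;
# the \$ reaches sed, which reads it as a literal dollar sign.
prefix=$(sed -n 's/^.*\$table_prefix *= *'\''\([^'\'']*\)'\''.*$/\1/p' "$config")
printf '%s\n' "$prefix"
rm -rf "$workdir"
```

Which form is easier to read is a matter of taste; the double-quoted one needs fewer escapes when the pattern itself contains single quotes.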

bash script on specific URL string manipulation

I need to manipulate a string (URL) whose length I don't know.
the string is something like
https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring
I basically need a regular expression which returns this:
https://x.xx.xxx.xxx/keyword/restofstring
where the x is the current IP, which can vary every time, and I don't know the number of dontcares.
I actually have no idea how to do it, been 2 hours on the problem but didn't find a solution.
thanks!
You can use sed as follows:
sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2='
s stands for substitute and has the form s=search pattern=replacement pattern=.
The search pattern is a regex in which we grouped (...) the parts you want to extract.
The replacement pattern accesses these groups with \1 and \2.
You can feed a file or stdin to sed and it will process the input line by line.
If you have a string variable and use bash, zsh, or something similar you also can feed that variable directly into stdin using <<<.
Example usage for bash:
input='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'
output="$(sed -E 's=(https://[^/]*).*(/keyword/.*)=\1\2=' <<< "$input")"
echo "$output" # prints https://x.xx.xxx.xxx/keyword/restofstring
echo "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring" | sed "s/dontcare[0-9]\+\///g"
sed is used to manipulate text. dontcare[0-9]\+\///g is an escaped form of the regular expression dontcare[0-9]+/, which matches the word "dontcare" followed by 1 or more digits, followed by the / character.
sed's pattern works like this: s/find/replace/g, where g is a flag that makes sed replace every instance of the pattern on the line rather than only the first.
Note that this assumes there are no dontcareNs in the rest of the string. If that's the case, Socowi's answer works better.
You could also use read with a / value for $IFS to parse out the trash.
$: IFS=/ read proto trash url trash trash trash keyword rest <<< "https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring"
$: echo "$proto//$url/$keyword/$rest"
https://x.xx.xxx.xxx/keyword/restofstring
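This works when the dontcare... values aren't known and predictable strings, but it assumes there are exactly three of them; the number of trash fields has to match the number of segments to skip.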
This one is pure bash, though I like Socowi's answer better.
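If the number of segments to skip is not known, parameter expansion is another option that needs no external command. This is only a sketch and not part of the answer above; it assumes the literal path component /keyword/ appears in the URL, and the variable names are mine.

```shell
url='https://x.xx.xxx.xxx/dontcare1/dontcare2/dontcareN/keyword/restofstring'

proto=${url%%://*}        # everything before ://
rest=${url#*://}          # everything after ://
host=${rest%%/*}          # up to the first slash
tail=${url#*/keyword/}    # everything after the first /keyword/

output="$proto://$host/keyword/$tail"
printf '%s\n' "$output"
```

It does not care how many components sit between the host and /keyword/.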
Here's a sed variation which picks out the host part and the last two components from the path.
url='http://example.com:1234/ick/poo/bar/quux/fnord'
newurl=$(echo "$url" | sed 's%\(https*://[^/?]*[^?/]\)[^ <>'"'"'"]*/\([^/ <>'"'"'"]*/[^/ <>'"'"'"]*\)%\1/\2%')
The general form is sed 's%pattern%replacement%' where the pattern matches through the end of the host name part (captured into one set of backslashed parentheses) then skips through the penultimate slash, then captures the remainder of the URL including the last slash; and the replacement recalls the two captured groups, joined by a single slash, without the skipped part between them.

Using Perl to replace a string in only a certain line of file

I have a script that I am using for a large scale find and replace. When a match is found in a particular file, I record the file name, and the line number.
What I want to do is for each file name, line number pair, change a string from <foo> to <bar> on only that line of the file.
In my shell script, I am executing a find and replace command on the file given the line number...
run=`perl -pi -e "s/$find/$replace/ if $. = $lineNum" $file`
This however I believe has been ignoring the $. = $lineNum and just does the s/$find/$replace/ on the whole file, which is really bad.
Any ideas how I can do this?
You are using assignment = instead of comparison ==.
Use:
perl -pi -e "s/$find/$replace/ if $. == $lineNum" $file
There are some caveats about the content of $find, $replace and $lineNum that probably aren't going to be a problem: $find cannot contain a slash; $replace can't contain a slash either; $lineNum needs to be a line number; and beware of other extraneous characters that could confuse the code.
I don't see why you'd want to capture the standard output of the Perl process when it writes to the file, not to standard output. So, the assignment to run is implausible. And if it is necessary, you would probably be better off using run=$(perl …) with the $() notation in place of `…`.

sed partial replace or variable

I'd like to use sed to do a replace, but not by searching for what to replace.
Allow me to explain. I have a variable set to a default value initially.
VARIABLE="DEFAULT"
I can do a sed to replace DEFAULT with what I want, but then I would have to put DEFAULT back when I was all done. This is because what gets stored to VARIABLE is unique to the user. I'd like to use sed to search for something other than what to replace. For example, search for VARIABLE=" and " and replace what's between them. That way it just constantly updates and there is no need to reset VARIABLE.
This is how I do it currently:
I call the script and pass an argument
./script 123456789
Inside the script, this is what happens:
sed -i "s%DEFAULT%$1%" file_to_modify
This replaces
VARIABLE="DEFAULT"
with
VARIABLE="123456789"
It would be nice if I didn't have to search for "DEFAULT", because then I would not have to reset VARIABLE at end of script.
sed -r 's/VARIABLE="[^"]*"/VARIABLE="123456789"/' file_to_modify
Or, more generally:
sed -r 's/VARIABLE="[^"]*"/VARIABLE="'"$1"'"/' file_to_modify
Both of the above use a regular expression that looks for 'VARIABLE="anything-at-all"' and replaces it with, in the first example above 'VARIABLE="123456789"' or, in the second, 'VARIABLE="$1"' where "$1" is the first argument to your script. The key element is [^"]. It means any character other than double-quote. [^"]* means any number of characters other than double-quote. Thus, we replace whatever was in the double-quotes before, "[^"]*", with our new value "123456789" or, in the second case, "$1".
The second case is a bit tricky. We want to substitute $1 into the expression but the expression is itself in single quotes. Inside single-quotes, bash will not substitute for $1. So, the sed command is broken up into three parts:
# spaces added for exposition but don't try to use it this way
's/VARIABLE="[^"]*"/VARIABLE="' "$1" '"/'
The first part is in single quotes and bash passes it literally to sed. The second part is in double quotes, so bash will substitute in the value of $1. The third part is in single quotes and gets passed to sed literally.
MORE: Here is a simple way to test this approach on the command line without depending on any files:
$ new=1234 ; echo 'VARIABLE="DEFAULT"' | sed -r 's/VARIABLE="[^"]*"/VARIABLE="'"$new"'"/'
VARIABLE="1234"
The first line above is the command run at the prompt ($). The second is the output from running the command.
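To see that no reset is needed between runs, here is a sketch that updates a file twice in a row. The helper name set_variable and the temporary file are assumptions for the demo; it writes to a temp file and renames it rather than relying on sed -i, but the substitution is the same.

```shell
workdir=$(mktemp -d)
file="$workdir/file_to_modify"
printf '%s\n' 'VARIABLE="DEFAULT"' > "$file"

# Hypothetical helper: $1 is the new value, $2 the file to update.
set_variable() {
    sed 's/VARIABLE="[^"]*"/VARIABLE="'"$1"'"/' "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}

set_variable 123456789 "$file"
first=$(cat "$file")

# Second run: the pattern matches whatever value is there now.
set_variable 42 "$file"
second=$(cat "$file")

printf '%s\n' "$first" "$second"
rm -rf "$workdir"
```

The second call works even though DEFAULT is long gone, because the pattern keys on VARIABLE="..." and not on the old value.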

Resources