extract text from file using sed - bash

I'm trying to write a script in bash which extracts a database name from a PHP file. For example I want to copy CRM_123456789 from the below line:
$sugar_config['dbconfig']['db_name'] = 'CRM_123456789';
I have tried using sed, so essentially I want to copy the text between
['db_name'] = '
and
';
sed -n '/['db_name'] = /,/';/p' myfile.php
However this does not return anything. Does anyone know what I'm doing wrong?
Thanks

You cannot nest single quotes. Your expression evaluates to single-quoted /[ next to unquoted db_name where clearly you want to match on a literal single quote.
One workaround is to use double quotes for the outermost quoting, but make sure you make any necessary changes, because double quotes are weaker than single quotes in the shell. In your case, there's nothing to change in that respect, though.
However, you also appear to misunderstand how sed address expressions work. They identify lines, not substrings on a line. So your script would print between a line matching ['db_name'] and a line matching ';. To extract something from within a line, the common idiom is to substitute out the parts you don't want, then print what's left.
Also, because opening square bracket is a metacharacter in sed, you need to backslash-escape it to match it literally.
sed -n "s/.*\['db_name'] = '\([^']*\)'.*/\1/p" myfile.php
This matches up through ['db_name'] = ', then captures whatever is inside the single-quoted string into \1, then matches anything from the next single quote through the end of line, and substitutes it with just the captured string; and prints that line after performing the substitution.
If the config file supports variable whitespace, a useful improvement would be to allow for optional whitespace around the equals sign, and possibly also within the square brackets. [ ]* will match zero or more spaces (the square brackets aren't really necessary around a single space, but I include them here for legibility reasons).
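A minimal sketch of that whitespace-tolerant variant. The sample line with irregular spacing, the temporary file, and the db_name variable are made up for illustration and are not from the question.

```shell
# Sample input; the irregular spacing is invented for this example.
tmpfile=$(mktemp)
printf '%s\n' "\$sugar_config[ 'dbconfig' ][ 'db_name' ]   =   'CRM_123456789';" > "$tmpfile"

# [ ]* allows zero or more spaces inside the brackets and around the equals sign.
# A closing ] outside a bracket expression is an ordinary character.
db_name=$(sed -n "s/.*\[[ ]*'db_name'[ ]*][ ]*=[ ]*'\([^']*\)'.*/\1/p" "$tmpfile")

echo "$db_name"
rm -f "$tmpfile"
```

If the file may also contain tabs, [[:space:]]* in place of [ ]* would cover those as well.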

You could try the below sed command.
$ sed -n "s/.*\['db_name'\] = '\([^']*\)';.*/\1/p" file
CRM_123456789

Related

Extract a section in a config file line using sed

I'm trying to continue to extract and isolate sections of text within my WordPress config file via a bash script. Can someone help me figure out my syntax?
The line of code in the wp-config.php file is:
$table_prefix = 'xyz_';
This is what I'm trying to use to extract the xyz_ portion.
prefix=$(sed -n "s/$table_prefix = *'[^']*'/p" wp-config.php)
echo -n "$prefix"
There's something wrong with my characters obviously. Any help would be much appreciated!
Your sed command is malformed. The substitute command takes the form s/regex/replacement/flags, and the p flag prints the line when a substitution was made. Yours, as written, has only two delimiters, so sed reports an unterminated `s' command. If you want to print the matched text unchanged, you can use & (or, in GNU sed, \0) as the replacement, as in s/<our_pattern>/&/p
Bash interprets $table_prefix as a variable, and because it is in double quotes, it tries to expand it. Unless you set this variable to something, it expands to nothing. This would cause your sed command to match much more liberally, and we can fix it by escaping the $ as \$table_prefix.
Next, this won't actually match. Your line has multiple spaces before the =, so we need another wildcard there as in ...prefix *= *...
Lastly, to extract the xyz_ portion alone, we'll need to do two things. First, we have to make sure our pattern matches the whole line, so that when we substitute, the rest of the line won't be kept. We can do this by wrapping our pattern in ^.* ... .*\$. Next, we want to wrap the target section in a capture group. In sed, this is done with \(<stuff>\). The zeroth capture group is the whole match, and the remaining capture groups are numbered in the order their opening parentheses appear. This means we can use \([^']*\) to grab that section, and \1 to output it:
All that gives us:
prefix=$(sed -n "s/^.*\$table_prefix *= *'\([^']*\)'.*\$/\1/p" wp-config.php)
The only issue with the regex is that the $ character introduces a bash variable, and since the pattern is wrapped in double quotes ("), bash will attempt to expand it. You can mitigate this by either escaping the $ or wrapping the pattern in single quotes and escaping the single quotes in the pattern.
Lastly, you are using the sed command s, which stands for substitute. It takes a pattern and replaces the matches with text, in the form s/<pattern>/<replace>/. To print the matching line unchanged, you can omit the s and keep the p (print) command at the end. Your command should then look something like:
sed -n "/\$table_prefix = *'[^']*'/p" wp-config.php
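The single-quote alternative mentioned above can be sketched as follows. The file content is sample data, and '\'' is the usual shell idiom for embedding one single quote in a single-quoted string. This is only an illustration, not the answerer's own command.

```shell
# Sample wp-config.php line with extra spaces before the equals sign.
tmpfile=$(mktemp)
printf '%s\n' "\$table_prefix  = 'xyz_';" > "$tmpfile"

# Inside single quotes the shell leaves $table_prefix alone. Each '\'' closes
# the quoted string, adds one literal single quote, and reopens the string.
# The $ needs no sed escape here: in a basic regular expression it is only
# special at the end of the pattern.
prefix=$(sed -n 's/^.*$table_prefix *= *'\''\([^'\'']*\)'\''.*$/\1/p' "$tmpfile")

echo "$prefix"
rm -f "$tmpfile"
```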

sed command: search and replace variables with \n character

I have two variables in a bash script:
$string = "The cat is green.\n"
$line = "Sunny day today.\n"
Each of those variables contains the characters "\n". How can I use sed to search and replace, as in:
sed 's/$string/$line/g' file.txt
This doesn't seem to work, if I erase the "\n" from the strings sed works properly.
If I had only the text I could escape "\n" by adding a backslash:
sed 's/"The cat is green.\\n"/"Sunny day today.\\n"/g' file.txt
How can I manage to do search/replace when variables contain "\n" in them.
Thank you for the help.
It looks like you are trying to match the two-character sequence \n, as opposed to the single newline character that together they represent in some contexts. There is a tremendous difference between these.
As part of your example, you presented
sed 's/$string/$line/g' file.txt
, but that won't work at all, because variable references are not expanded within single-quoted strings. That has nothing whatever to do with the values of shell variables string and line.
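A small sketch of the quoting difference, using short made-up values rather than the ones from the question:

```shell
string='cat'
line='dog'

# Single quotes: sed receives the literal text $string and $line, which does
# not occur in the input, so nothing is replaced.
single=$(printf '%s\n' 'the cat sat' | sed 's/$string/$line/g')

# Double quotes: the shell expands both variables before sed runs.
double=$(printf '%s\n' 'the cat sat' | sed "s/$string/$line/g")

echo "$single"
echo "$double"
```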
But let's consider those values:
string="The cat is green.\n"
line="Sunny day today.\n"
[Extra spaces and the leading $ removed; a shell assignment accepts neither.]
Of course, the problem you're focusing on is that sed recognizes \n as a code for a newline character, but you also have the problem that in a regular expression, the . character matches any character, so if you want it to be treated as a literal then it, too, needs to be escaped (in the pattern, but not in the replacement). If you're trying to support search and replace for arbitrary text, then there are other characters you'll need to escape, too.
Answering the question as posed (escaping only \n sequences) you might do this:
sed "s/${string//\\n/\\\\n}/${line//\\n/\\\\n}/g"
The ${foo//pat/repl} form of parameter expansion performs pattern substitution on the expanded value, but note well that the pattern (pat) is interpreted according to shell globbing rules, not as a regular expression. That specific form replaces every appearance of the pattern; read the bash manual for alternatives that match only the first appearance and/or that match only at the beginning or the end of the parameter's value. Note, too, the extra doubling of the \ characters in the pattern substitution -- they need to be escaped for the shell, too.
Given your variable definitions, that command would be equivalent to this:
sed 's/The cat is green.\\n/Sunny day today.\\n/g'
In other words, exactly what you wanted. Again, however, be warned: that is not a general solution for arbitrary search & replace. If you want that, then you'll want to study the sed manual to determine which characters need to be escaped in the regex, and which need to be escaped in the replacement. Moreover, I don't see a way to do it with just one pattern substitution for each variable.
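For the more general case, one possible approach is to let sed do the escaping itself, in a separate pass per variable. The following is only a sketch under stated assumptions: the values are single-line, the final command uses basic regular expressions with / as the delimiter, and the helper names pattern and replacement are made up for illustration.

```shell
string='The cat is green.\n'
line='Sunny day today.\n'

# Sample file: the second line differs only where the pattern has a dot.
tmpfile=$(mktemp)
printf '%s\n' 'The cat is green.\n' 'The cat is greenX\n' > "$tmpfile"

# Pattern side: put a backslash before each of ] \ / $ * . ^ [
# (| is used as the delimiter so that \ and / stay inside the bracket expression).
pattern=$(printf '%s\n' "$string" | sed 's|[]\/$*.^[]|\\&|g')
# Replacement side: put a backslash before each of \ / &
replacement=$(printf '%s\n' "$line" | sed 's|[\/&]|\\&|g')

result=$(sed "s/$pattern/$replacement/g" "$tmpfile")
printf '%s\n' "$result"
rm -f "$tmpfile"
```

Only the first line is rewritten; the escaped dot no longer matches the X on the second line.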

Unix shell replacing a word containing backtick in a file

I have an SQL file (samplesqlfile) and I want to replace a string which contains backticks with another string. Below is the code.
actualtext="FROM sampledatabase.\`Datatype\`"
replacetext="FROM sampledatabase.\`Datatype_details\`"
sed -i "s/\<${actualtext}\>/${replacetext}/g" samplesqlfile
This is not working. The actual word to be replaced is
FROM sampledatabase.`Datatype`
I added back slashes to escape the backticks. But still it is not working. Please help.
Observe that this does not work:
$ sed "s/\<${actualtext}\>/${replacetext}/g" samplesqlfile
FROM sampledatabase.`Datatype`
But this does:
$ sed "s/\<${actualtext}/${replacetext}/g" samplesqlfile
FROM sampledatabase.`Datatype_details`
The problem was the \>. The string variable $actualtext does not end with a word character. It ends with a backtick. Consequently, \> will never match there. The solution is to remove \>.
To clarify, \> matches at the boundary between a word character and a non-word character where the word character appears first. Word characters are alphanumerics and underscores.
\> is a GNU extension. The behavior under BSD/OSX sed will be different.
For purposes of illustration here, I removed the -i option. For your intended use, of course, add it back.
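The boundary rule can be seen in isolation with a few one-liners. This sketch assumes GNU sed; the input strings are illustrative.

```shell
# "Datatype" followed by a space ends a word, so \> matches there; followed
# by "_" (a word character) it does not.
word_end=$(printf '%s\n' 'Datatype Datatype_details' | sed 's/Datatype\>/X/g')

# The pattern ends in a backtick, which is not a word character, so \> can
# never match right after it and the line comes through unchanged.
with_boundary=$(printf '%s\n' 'FROM sampledatabase.`Datatype`' | sed 's/`Datatype`\>/`Datatype_details`/')

# Without the trailing \> the substitution goes through.
without_boundary=$(printf '%s\n' 'FROM sampledatabase.`Datatype`' | sed 's/`Datatype`/`Datatype_details`/')

printf '%s\n' "$word_end" "$with_boundary" "$without_boundary"
```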

Delete all comments in a file using sed

How would you use sed to delete all comments (introduced by #) from a file, while leaving alone any # that appears inside a string?
This helped out a lot except for the string portion.
If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
Strings can likely span lines
A regular expression can't tell the difference between apostrophes and single quotes
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b if the line does not contain a # bail out
s/^/\n/ insert a unique marker (\n)
ta;:a jump to a loop label (resets the substitute true/false flag)
s/\n$//;t if marker at the end of the line, remove and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
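To see the marker loop in action, here is a sketch that runs the same command on a few made-up lines. It assumes GNU sed, as the answer does; the sample input is not from the question.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'echo "a # b" # trailing comment' \
              '# whole-line comment' \
              "x='#' # another" \
              'y=1' > "$tmpfile"

# Same script as above: the \n marker walks right over quoted strings and
# ordinary characters, and everything from the first unquoted # is dropped.
stripped=$(sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' "$tmpfile")

printf '%s\n' "$stripped"
rm -f "$tmpfile"
```

Note that whitespace before a removed comment is kept, and a whole-line comment leaves an empty line behind rather than being deleted.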
Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script, because bash is used as the tag of the question.
Case 1: entire line is the comment
The following should be sufficient in most cases:
sed '/^\s*#/d' file
It matches any line that has zero or more leading white-space characters (space, tab, or a few others; see man isspace) followed by a #, and deletes that line with the d command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#[^\"']*$//" file
[^\"'] is used to prevent quoted string confusion; however, it also means that comments containing a ' or " will not be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
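A quick sketch of how the final command behaves, including the limitation just mentioned. The sample lines are made up, and \s assumes GNU sed.

```shell
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
# comment started from beginning
   # indented comment
a="foobar in #comment"
if [[ $foo == "#bar" ]]; then # comment here
echo hi # it's a comment
EOF

# Whole-line comments are deleted; a trailing comment is removed only when no
# quote character follows the #.
cleaned=$(sed "/^\s*#/d;s/\s*#[^\"']*$//" "$tmpfile")

printf '%s\n' "$cleaned"
rm -f "$tmpfile"
```

The last sample line keeps its comment because of the apostrophe in "it's".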
To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):
sed '/^[[:space:]]*#[^!]/d; /#$/d' file
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form /regex/d. Commands are separated by ;. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf "\nBEFORE removal:\n\n${S0}\n\n"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf "\nAFTER removal:\n\n${S1}\n\n"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment
Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. -- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*'.
Piecing all of this together, your sed script could look something like this:
s/^\(\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because it needs to be quoted, and both single and double quotes are included in the string, we need one more complication. Recall that the shell lets you glue adjacent quoted strings together: "foo"'bar' becomes foobar, with foo in double quotes and bar in single quotes. Thus you can include a single quote by putting it in double quotes adjacent to your single-quoted string: '"foo"'"'" is "foo" in single quotes next to ' in double quotes, yielding "foo"'; and "' can be expressed as '"' adjacent to "'". And so a string containing both kinds of quotes, foo"'bar, can be quoted as 'foo"' adjacent to "'bar" or, perhaps more realistically for this case, 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' file
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.
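The quote gluing and the resulting command can be tried out as follows. This is a sketch assuming GNU sed; the sample lines are made up, and no p flag is used, so each line is printed exactly once.

```shell
# Adjacent quoted pieces are concatenated by the shell into a single word:
# 'foo"' + "'" + 'bar'
glued='foo"'"'"'bar'
printf '%s\n' "$glued"

tmpfile=$(mktemp)
printf '%s\n' 'echo "a # b" # c' "echo '# not a comment' # real comment" 'no comment here' > "$tmpfile"

# Keep everything up to, but not including, the first unquoted #.
uncommented=$(sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' "$tmpfile")

printf '%s\n' "$uncommented"
rm -f "$tmpfile"
```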
As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $# and ${#param}.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.
sed 's:^#\(.*\)$:\1:g' filename
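Supposing each comment line starts with a single #, the above command strips that leading # and keeps the rest of the line; that is, it uncomments the line rather than deleting the comment. To delete such lines entirely, use sed '/^#/d' filename instead.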

replace substring in lines using sed or grep

I have a file with a lot of lines, two of them are:
videoId: 'S2Rgr6yuuXQ'
var vid_seq=1;
in a shell script, I have two variables,
for id, the value is always 11 characters/numbers
id='fsafsferii2'
id_seq=80
I want to modify these two lines with id and id_seq
videoId: 'fsafsferii2'
var vid_seq=80;
I used
sed -i 's/\(videoId: \).*\\1'${id}'/\2' file
but there are errors, what is wrong with my script?
thanks
The grep command won't "replace" text, it is for "global regular expression print". But sed will.
sed -i'' '/^videoId: /s/: .*/: '"$id"'/;/^var vid_seq=/s/=.*/='"$id_seq"';/' file
I'm not a big fan of inserting variables into sed scripts this way, but sed is simple, and provides no mechanism for actually using actual variables on its own. If you're going to do this, include some format checking for the two variables to make sure they contain the data you want them to contain, before you run this sed script. An accidental / in a variable would cause the sed script to fail.
UPDATE per comments:
Here's a successful test:
$ id=fsafsferii2
$ id_seq=80
$ cat inp686
videoId: 'S2Rgr6yuuXQ'
var vid_seq=1;
$ sed '/^videoId: /s/: .*/: '"$id"'/;/^var vid_seq=/s/=.*/='"$id_seq"';/' < inp686
videoId: fsafsferii2
var vid_seq=80;
$
Of course, you'll need to do some quote magic to get the single quotes into your videoId, but I'm sure you can figure that out yourself.
UPDATE 2
According to sed's man page, the substitute command is in the form:
[2addr]s/regular expression/replacement/flags
The [2addr] means you can specify up to two "addresses", which can be line numbers or regular expressions to match. So the s (substitute) command can take a line, a range, a match, or a span between matches. In our case, we're just using a single match to identify what lines we want to execute the substitution on.
The script above is made up of two sed commands, separated by a semicolon.
/^videoId: / -- Match lines that start with the word videoId:...
s/: .*/: '"$id"'/; -- Substitute all text from the colon to the end of the line with whatever is in the $id shell variable.
/^var vid_seq=/ -- Match lines that ... meh, as above.
s/=.*/='"$id_seq"';/ -- Substitute all text from the equals sign on with $id_seq.
Note that the '"$id"' construct means that we are exiting the single quotes, then immediately entering double quotes for the expansion of the variable ... then exiting the double quotes and going back into a new set of single quotes. Sed scripts are safest inside single quotes because of the frequent use of characters that might be interpreted by a shell.
Note also that because sed's substitute command uses a forward slash as a delimiter, the $id and $id_seq variables may not contain a slash. If they might, you can switch to a different delimiter.
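Putting those notes together, here is one possible sketch that validates the variables, switches the delimiter to |, and writes the single quotes around the id. The allowed character sets in the checks are assumptions about what a valid id looks like, and the temporary file stands in for the real one.

```shell
id='fsafsferii2'
id_seq=80

# Format checks (the permitted characters are an assumption for illustration).
case $id in
  ''|*[!A-Za-z0-9_-]*) echo "unexpected characters in id" >&2; exit 1 ;;
esac
case $id_seq in
  ''|*[!0-9]*) echo "id_seq is not a number" >&2; exit 1 ;;
esac

tmpfile=$(mktemp)
printf '%s\n' "videoId: 'S2Rgr6yuuXQ'" 'var vid_seq=1;' > "$tmpfile"

# The script is double-quoted so the literal single quotes can be written
# directly; | replaces / as the s delimiter, so a / in a value would be
# harmless (a | would break it instead).
updated=$(sed "/^videoId: /s|: .*|: '$id'|;/^var vid_seq=/s|=.*|=$id_seq;|" "$tmpfile")

printf '%s\n' "$updated"
rm -f "$tmpfile"
```

Double-quoting the whole script is only reasonable here because the script contains no other characters the shell would interpret.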
What is wrong with:
sed -i 's/\(videoId: \).*\\1'${id}'/\2' file
1. Missing the third delimiter (/). Valid syntax is s/regex/replace/
2. Incorrect regex pattern (let's assume ${id} has been substituted)
\(videoId: \).*\\1fsafsferii2
is telling it to match a string that looks like this:
videoId: anything\1fsafsferii2
(\\ in regex matches a literal backslash, so \\1 would match a literal backslash followed by 1 instead of the 1st sub-expression)
3. Replace the matched string with \2
But since there is only one set of parentheses, \2 refers to a group that does not exist, and GNU sed rejects the command with an "invalid reference" error.
Also, since the regex pattern in 2. doesn't match anything, nothing would be replaced anyway.
This should work (GNU sed)
sed -i 's/\(videoId: \).*/\1\x27'"${id}"'\x27/
s/\(var vid_seq=\).*/\1'"${id_seq}"'\;/' file
Note:
\x27 is the hexadecimal representation of single quote (to prevent clashing with the other single quote)
\; produces a literal semicolon. The backslash is optional here: a ; inside the replacement part of an s command is ordinary text and does not terminate the command.

Resources