Using placeholders/variables in a sed command - bash

I want to store a specific part of a matched result as a variable to be used in the replacement later. I would like to keep this as a one-liner instead of finding the variable I need beforehand.
When configuring Apache with mod_rewrite, you can specify parts of a pattern to be used as variables, like this:
RewriteRule ^www.example.com/page/(.*)$ http://www.example.com/page.php?page=$1 [R=301,L]
The part of the pattern match that's contained inside the parentheses is stored as $1 for use later. So if the URL was www.example.com/page/home, it would be replaced with www.example.com/page.php?page=home. The "home" part of the match was saved in $1 because it was the part of the pattern inside the parentheses.
I want something like this functionality with a sed command. I need to automatically replace many strings in a SQL dump file to add a DROP TABLE IF EXISTS command before each CREATE TABLE, but I need to know the table name to do this. So if the dump file contains something like:
...
CREATE TABLE `orders`
...
I need to run something like:
cat dump.sql | sed "s/CREATE TABLE `(.*)`/DROP TABLE IF EXISTS $1\N CREATE TABLE `$1`/g"
to get the result of:
...
DROP TABLE IF EXISTS `orders`
CREATE TABLE `orders`
...
I'm using the mod_rewrite syntax in the sed command as a logical example of what I'm trying to do.
Any suggestions?

sed '/CREATE TABLE \([^ ]*\)/ s//DROP TABLE IF EXISTS \1; &/'
Find a CREATE TABLE statement and capture the table name. Replace it with 'DROP TABLE IF EXISTS' and the table name, plus a semi-colon to terminate the statement, and a copy of what was matched to preserve the CREATE TABLE statement.
This is classic sed notation using basic regular expressions, which GNU sed (the likely sed if you're using bash on Linux) also accepts by default. With GNU's -r or -E option, which selects extended regular expressions, the parentheses must be written without the backslashes. I've also not attempted to insert a newline into the output. You can do that with GNU sed if it is important enough to you.
The key points are that parentheses (classically escaped with a backslash) are the capture mechanism, and backslash-number is the replacement mechanism.
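As a minimal sketch of the command above, assuming a one-line sample statement invented here for illustration (it is not taken from a real dump):

```shell
# Hypothetical sample line standing in for one line of dump.sql.
line='CREATE TABLE `orders` ('

# \( ... \) captures the table name (backticks included); \1 reuses it,
# and & stands for the whole matched text.
result=$(printf '%s\n' "$line" | sed 's/CREATE TABLE \([^ ]*\)/DROP TABLE IF EXISTS \1; &/')

printf '%s\n' "$result"
```

Because [^ ]* stops at the first space, the trailing " (" is left untouched after the match.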

sed -r "s/CREATE TABLE (\`.*\`)/DROP TABLE IF EXISTS \1\n&/g" dump.sql
Test:
kent$ cat t.txt
CREATE TABLE `orders`
...
CREATE TABLE `foo`
...
...
CREATE TABLE `bar`
...
kent$ sed -r "s/CREATE TABLE (\`.*\`)/DROP TABLE IF EXISTS \1\n&/g" t.txt
DROP TABLE IF EXISTS `orders`
CREATE TABLE `orders`
...
DROP TABLE IF EXISTS `foo`
CREATE TABLE `foo`
...
...
DROP TABLE IF EXISTS `bar`
CREATE TABLE `bar`
...

This is called a "back reference". sed numbers the groups enclosed in \(...\) from left to right, starting at 1, and the replacement refers to them as \1, \2, and so on.
Note the use of the backslash as an escape character in \( and \).
Ref: https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html
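To illustrate the numbering, here is a small sketch with two groups. The sample name is made up for the example:

```shell
# Two capture groups: \1 is the first word, \2 the second.
swapped=$(printf '%s\n' 'John Smith' | sed 's/\([^ ]*\) \([^ ]*\)/\2, \1/')

printf '%s\n' "$swapped"
```

The replacement is free to use the groups in any order, or to use the same group more than once.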

Here is an answer to the OP's question via title of this thread:
As #bhoom-suktitipat states: This is called a "back reference"...
SUMMARY:
Placeholder variables (back references) are used on the replacement side by writing a backslash followed by a digit, starting at 1, like so: \1
BACKGROUND: I'm here because I've been taking a bunch of code and revamping it. In the process I want to pre-eslint it to find all the double quotes around JSON keys and strip them. ESLint's Quote Props Rule can be used, with --fix, to do the exact opposite of what I need to do.
MISSION OUTLINE:
- Turn { "foo": "bar" } into { foo: "bar" } (a.k.a. using placeholders/variables in a sed command).
- Find the key (the equivalent of "foo") in the second matching group: ("{1})(\w+)(":{1}).
- Use that matching group as a placeholder to render foo: instead of "foo":
- Write the changes to a file.
Write to different file:
sed -r 's/("{1})(\w+)(":{1})/\2:/g' in.js > out.js
Write to same file:
sed -ri 's/("{1})(\w+)(":{1})/\2:/g' in-and-out.js
SED flags used:
-E, -r, --regexp-extended: use extended regular expressions in the script (for portability use POSIX -E).
-i[SUFFIX], --in-place[=SUFFIX]: edit files in place (makes backup if SUFFIX supplied).
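A trimmed-down sketch of the same idea, assuming keys made only of letters, digits and underscores. It uses a single capture group and POSIX bracket expressions in place of \w, which is a GNU extension:

```shell
# Sample input from the mission outline above.
input='{ "foo": "bar" }'

# Group 1 captures the key name; the quotes around it are matched but not kept.
output=$(printf '%s\n' "$input" | sed -E 's/"([A-Za-z_][A-Za-z0-9_]*)":/\1:/g')

printf '%s\n' "$output"
```

The string value "bar" keeps its quotes because it is not followed by a colon.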

Related

Replacing placeholder with multiple lines

I have a template that looks like:
<mydata>
<tag1>
<tag2> etc.
DATAHERE
</mydata>
I want to run a query on a DB and fetch a number of records and place them one below the other in the file at the place where there is the string DATAHERE.
My records are already fetched into mydata.txt. How do I replace the single DATAHERE line with them? I do not want to hardcode the number of lines to skip in the template. DATAHERE is my only marker.
This might work for you (GNU sed):
sed -e '/DATAHERE/{r dataFile' -e 'd}' file
On the line matching DATAHERE, queue the contents of dataFile (mydata.txt in the question) for output, then delete the marker line itself.
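A self-contained sketch of that approach, using a throwaway directory and made-up records. The closing brace is passed as its own -e argument, on the assumption that this form is accepted by non-GNU sed implementations as well:

```shell
tmp=$(mktemp -d)

# Hypothetical template and data files, created only for this demonstration.
printf '%s\n' '<mydata>' '<tag1>' 'DATAHERE' '</mydata>' > "$tmp/template.txt"
printf '%s\n' 'record 1' 'record 2' > "$tmp/mydata.txt"

# r queues mydata.txt for output at the end of the cycle; d drops the marker line.
filled=$(cd "$tmp" && sed -e '/DATAHERE/{r mydata.txt' -e 'd' -e '}' template.txt)

printf '%s\n' "$filled"
rm -rf "$tmp"
```

The number of records never appears in the command, so the template needs no fixed line counts.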

Populate a value in a particular column in csv

I have a folder where there are 50 excel sheets in CSV format. I have to populate a particular value say "XYZ" in the column I of all the sheets in that folder.
I am new to Unix and have looked at a couple of pages. Can anyone please provide a sample script to begin with?
For example :
Let's say column C in this case:
A B C
ASFD 2535
BDFG 64486
DFGC 336846
I want to update column C to value "XYZ".
Thanks.
I would export those files to CSV format
- with a semicolon as the field separator
- possibly leaving out the column descriptions (otherwise see the comment below)
Then the following combination of shell and sed script could more or less do the trick:
#! /bin/sh
for i in *.csv
do
    sed -i -e "s/$/;XYZ/" "$i"
done
-i means to edit the file in place; here it appends the value to every line.
-e specifies the sed script, in this case the substitution command.
If the CSV files also contain column descriptions, you might want to use a similar script to rename "XYZ" to "C" in the first line only.
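One way to handle a header row in a single pass, sketched on made-up semicolon-separated data. Addressing line 1 and lines 2 to the end separately is a variation on the answer's "append, then rename" idea, not the answerer's own script:

```shell
# Hypothetical sample with a header row and two existing columns.
csv=$(printf '%s\n' 'A;B' 'ASFD;2535' 'BDFG;64486')

# Line 1 gets the column name; every later line gets the value.
with_column=$(printf '%s\n' "$csv" | sed -e '1s/$/;C/' -e '2,$s/$/;XYZ/')

printf '%s\n' "$with_column"
```

Run against real files, the same two expressions can be used with -i inside the for loop shown above.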

Using sed to replace text within a java properties file

I have a java properties file that looks like the following:
SiteUrlEndpoint=google.com/mySite
I want to use sed -i to inline replace the url but keep the context path that comes out of it. So for example if I wanted to change the properties file above to use amazon.com then the result would look like:
SiteUrlEndpoint=amazon.com/mySite
I am having trouble with sed to only replace the url and keeping the context path when replacing it inline.
My attempt:
sed -i 's:^[ \t]*siteUrlEndpoint[ \t]*=\([ \t]*.*\)[/]*$:siteUrlEndpoint = 'amazon.com':' file
You can do it with two backreferences, e.g.
sed -i.bak 's|^\(SiteUrlEndpoint=\).*/\(.*\)|\1amazon.com/\2|' file
Note: the match of text up to / is greedy, so it extends to the last / on the line. If you have multiple path components following the domain, you probably want to preserve all of them. To stop at the first / instead, you could use the following:
sed -i.bak 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|' file
(the -i.bak option creates a backup of the original in file.bak)
To accomplish the same thing, you can match SiteUrlEndpoint= at the beginning of the line first, and then use a single backreference for the change, e.g.
sed -i.bak '/^SiteUrlEndpoint=/s|=[^/]*\(/.*\)|=amazon.com\1|' file
For example, given a file sites containing:
$ cat sites
SiteUrlEndpoint=google.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
You can change google.com to amazon.com with (using non-greedy form of first example):
$ sed -i.bak 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|' sites
Confirming:
$ cat sites
SiteUrlEndpoint=amazon.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
and
$ cat sites.bak
SiteUrlEndpoint=google.com/path/to/mySite
SiteUrlSomeOther=google.com/mySite
Explanation (first form)
sed -i.bak 's|^\(SiteUrlEndpoint=\) - locate and save SiteUrlEndpoint=
[^/]*/ - match any following characters up to the first / (non-greedy, adjust as needed)
\(.*\) - match and save anything following the /
|\1amazon.com/\2|' - full replacement (explained below)
\1 - first back-reference, containing SiteUrlEndpoint=
amazon.com - self-explanatory
/\2 - the '/' followed by the second back-reference, containing everything that followed
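The difference between the two forms can be checked without touching any file. This sketch feeds the sample line from the answer through both expressions:

```shell
line='SiteUrlEndpoint=google.com/path/to/mySite'

# .*/ runs to the LAST slash, so only the final path component survives in \2.
greedy=$(printf '%s\n' "$line" | sed 's|^\(SiteUrlEndpoint=\).*/\(.*\)|\1amazon.com/\2|')

# [^/]*/ stops at the FIRST slash, so the whole path survives in \2.
first_slash=$(printf '%s\n' "$line" | sed 's|^\(SiteUrlEndpoint=\)[^/]*/\(.*\)|\1amazon.com/\2|')

printf '%s\n' "$greedy" "$first_slash"
```

Running the substitution on standard input like this is a cheap way to verify an expression before adding -i.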
Look over all the solutions and let me know if you have questions.
Regular expressions are hard, especially with complex regular expressions and/or large input files where unexpected changes are to be avoided.
Therefore I strongly recommend using sed -i.bak to keep a backup of the original file to then run a diff on both of them to see what changed.
Assuming that
- you only want to change things after the tag siteUrlEndpoint (case insensitive)
- you want to change the URL to amazon.com while leaving the path intact
I came up with this solution:
sed -i.bak 's;^\([ \t]*siteurlendpoint[ \t]*=[ \t]*\)[^/]*\(.*\);\1amazon.com\2;Ig' infile
I used a semicolon instead of your colon, that's just my preference when I don't want to use / ;)
Then I wrapped both the leading white spaces and siteurlendpoint as well as everything from the first / onwards into brackets \( \) so that I can take them again in the replacement with \1 and \2. That way I keep the indentation and the capitalisation of SiteUrlEndpoint intact.
For the search options I added an I to the g to make the search case insensitive. I am not sure how standard this option is, you might have to see whether your sed understands it.
The part that I actually want to replace is matched by [^/]*, that is, any run of characters up to but not including the next /.
As for your line:
Your search term only searches for siteUrlEndpoint with lower case s. Since in your examples you wrote it with capital S, it wouldn't have triggered.
The final [/]*$ doesn't make any sense at all. It says "this line can end in zero or more / characters", but you precede it with .*, which already matches zero or more of any character at all.
The single quotes around 'amazon.com' might interfere with the single quotes around the whole search/replace term. It seems to work, but it is sloppy, and will fail if there are ever any spaces in there. It doesn't seem to serve any purpose anyway (except if you want to replace amazon.com with some environment variable like $NEWSITE) so I don't know why you're doing that.
Keep a backreference to the part just before the domain, then match and replace the domain. You can add the -i option after verifying the output of the sed command:
url=amazon.com
sed -r 's/\b(SiteUrlEndpoint\s*=\s*)[^/]+/\1'"$url"'/' file
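A variant sketch of passing a shell variable into the replacement, written with double quotes around the whole script and POSIX character classes instead of \s. The variable name url and the sample line are assumptions for the demonstration:

```shell
url=amazon.com   # hypothetical new host
line='SiteUrlEndpoint=google.com/path/to/mySite'

# Inside double quotes the shell expands ${url}, while \1 reaches sed unchanged.
updated=$(printf '%s\n' "$line" | sed -E "s|^(SiteUrlEndpoint[[:space:]]*=[[:space:]]*)[^/]+|\1${url}|")

printf '%s\n' "$updated"
```

This only works as intended while the value contains no |, & or backslash, since sed would treat those as part of the command.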
Keep it simple:
$ sed -E 's/(SiteUrlEndpoint=)[^.]+/\1amazon/' file
SiteUrlEndpoint=amazon.com/mySite

Pre-pending and appending to a shell variable

My goal is to load an external tables log file into a CLOB column in an oracle database. I've been having issues with the max size you can insert at once but I am able to insert the whole file if I to_clob each line of the log file, concatenate and then insert them (as far as I'm aware this seems to be the quickest and easiest way?):
insert into clob_insert_test values (to_clob('hfsdjhfjsdhfjksd')||chr(10)||to_clob('jhfklsdjfklsdjklfjdsjlk'));
My question is:
I'm reading the file into a shell variable as below so what I need to do is pre-pend to_clob(' to the beginning of each line of the variable and then append ')||chr(10)|| and remove the last ||chr(10)|| from the variable to finish. I can then use that variable in the SQL insert statement for the clob column. Is there a way I can directly do this on the variable rather than modifying the log file before reading it in?
log_content=$(<"$log_file")
Edit:
Sorry I don't think I was clear. Given the example log file I would expect the following variable contents.
Input file:
LOG file opened at 05/05/15 15:12:24
Field Definitions for table ext_loading
Record format DELIMITED BY NEWLINE
Variable contents:
to_clob('LOG file opened at 05/05/15 15:12:24')||chr(10)||to_clob('Field Definitions for table ext_loading')||chr(10)||to_clob('Record format DELIMITED BY NEWLINE')
I assume you have a file like:
this is me||chr(10)||adfasdf
asdas||chr(10)||asdfasdfasdas
And you want it to become something like:
to_clob('this is meadfasdf')||chr(10)||
to_clob('asdasasdfasdfasdas')||chr(10)||
If so, you can use sed like this:
sed -e "s/||chr(10)||//" -e "s/^/to_clob('/" -e "s/$/')||chr(10)||/" file
That is:
- remove ||chr(10)|| once from each line.
- add to_clob(' to the beginning of each line.
- add ')||chr(10)|| to the end of each line.
And to store it in a variable:
log_content=$(sed -e "s/||chr(10)||//" -e "s/^/to_clob('/" -e "s/$/')||chr(10)||/" "$log_file")
Update
To match what you really need, you can also do this:
line=$(sed -e "/./s/^/to_clob('/" -e "/./s/$/')||chr(10)||/" "$log_file")
Then the output is:
$ echo $line # note, without quotes to have all of it together!
to_clob('LOG file opened at 05/05/15 15:12:24')||chr(10)|| to_clob('Field Definitions for table ext_loading')||chr(10)|| to_clob('Record format DELIMITED BY NEWLINE')||chr(10)||
And remove the last ||chr(10)|| with:
$ echo $line | sed 's/||chr(10)||$//'
to_clob('LOG file opened at 05/05/15 15:12:24')||chr(10)|| to_clob('Field Definitions for table ext_loading')||chr(10)|| to_clob('Record format DELIMITED BY NEWLINE')
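Putting the steps together, one possible sketch (not the answerer's exact commands) wraps each line with sed, joins the lines with tr, and trims the trailing separator with a POSIX parameter expansion. The temporary file and its two lines are made up for the example:

```shell
log_file=$(mktemp)
printf '%s\n' 'LOG file opened' 'Field Definitions' > "$log_file"

# Wrap every line, then delete the newlines so the pieces form one expression.
log_content=$(sed -e "s/^/to_clob('/" -e "s/\$/')||chr(10)||/" "$log_file" | tr -d '\n')

# Strip the final ||chr(10)|| left over after the last line.
log_content=${log_content%'||chr(10)||'}

printf '%s\n' "$log_content"
rm -f "$log_file"
```

Joining with tr avoids relying on an unquoted echo to collapse the lines. Log lines that themselves contain apostrophes would still need escaping before being placed inside to_clob('...').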

bash script to update postgres database

I have some html data stored in text files right now. I recently decided to store the HTML data in the pgsql database instead of flat files. Right now, the 'entries' table contains a 'path' column that points to the file. I have added a 'content' column that should now store the data in the file pointed to by 'path'. Once that is complete, the 'path' column will be deleted. The problem that I am having is that the files contain apostrophes that throw my script out of whack. What can I do to correct this issue??
Here is the script
#!/bin/sh
dbname="myDB"
username="username"
fileroot="/path/to/the/files/*"
for f in $fileroot
do
    psql $dbname $username -c "
    UPDATE entries
    SET content='`cat $f`'
    WHERE id=SELECT id FROM entries WHERE path LIKE '*`$f`';"
done
Note: The logic in the id=SELECT...FROM...WHERE path LIKE "" is not the issue. I have tested this with sample filenames in the pgsql environment.
The problem is that when I cat $f, any apostrophe in the contents of $f closes the SQL string, and I get a syntax error.
For the single quote escaping issue, a reasonable workaround might be to double the quotes, so you'd use:
`sed "s/'/''/g" < "$f"`
to include the file contents instead of the cat, and for the second invocation in the LIKE where you appeared to intend to use the file name use:
${f//"'"/"''"}
to include the literal string content of $f instead of executing it, and double the quotes. The ${varname//match/replace} expression (the doubled slash replaces every occurrence) is bash syntax and may not work in all shells; use:
`echo "$f" | sed "s/'/''/g"`
if you need to worry about other shells.
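A small sketch of the portable form wrapped in a helper. The function name sql_quote and the sample value are hypothetical, chosen only for this illustration:

```shell
# Double every apostrophe so the value can sit inside a SQL string literal.
sql_quote() {
  printf '%s' "$1" | sed "s/'/''/g"
}

quoted=$(sql_quote "O'Brien")

# Show the value as it would appear between the SQL quotes.
printf "'%s'\n" "$quoted"
```

Using printf instead of echo avoids surprises with values that begin with a dash or contain backslashes.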
There are a bunch of other problems in that SQL.
You're trying to execute $f in your second invocation. I'm pretty sure you didn't intend that; I imagine you meant to include the literal string.
Your subquery is also wrong, it lacks parentheses; (SELECT ...) not just SELECT.
Your LIKE expression is also probably not doing what you intended; you probably meant % instead of *, since % is the SQL wildcard.
If I also change backticks to $() (because it's clearer and easier to read IMO), fix the subquery syntax, add an alias to disambiguate the columns, and pass the SQL to psql's stdin in a here-document instead, the result is:
psql "$dbname" "$username" <<__END__
UPDATE entries
SET content='$(sed "s/'/''/g" < "$f")'
WHERE id=(SELECT e.id FROM entries e WHERE e.path LIKE '%$(echo "$f" | sed "s/'/''/g")');
__END__
The above assumes you're using a reasonably modern PostgreSQL with standard_conforming_strings = on. If you aren't, change the regexp to escape apostrophes with \ instead of doubling them, and prefix the string with E, so O'Brien becomes E'O\'Brien'. In modern PostgreSQL it'd instead become 'O''Brien'.
In general, I'd recommend using a real scripting language like Perl with DBD::Pg or Python with psycopg to solve scripting problems with databases. Working with the shell is a bit funky. This expression would be much easier to write with a database interface that supported parameterised statements.
For example, I'd write this as follows:
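import sys
import psycopg2

def load_file(connstr, filename):
    # Read the file as text so it is sent as a string rather than bytea.
    with open(filename, "r") as f:
        content = f.read()
    conn = psycopg2.connect(connstr)
    try:
        curs = conn.cursor()
        # Values are passed as parameters, so apostrophes need no escaping.
        # The tuple order must match the order of the %s placeholders.
        curs.execute("""
            UPDATE entries
            SET content = %s
            WHERE id = (SELECT e.id FROM entries e WHERE e.path LIKE '%%'||%s);
            """, (content, filename))
        conn.commit()  # psycopg2 does not autocommit by default
        curs.close()
    finally:
        conn.close()

if __name__ == '__main__':
    try:
        connstr = sys.argv[1]
        filename = sys.argv[2]
    except IndexError:
        print("Usage: %s connect_string filename" % sys.argv[0])
        print("Eg: %s \"dbname=test user=fred\" \"some_file\"" % sys.argv[0])
        sys.exit(1)
    load_file(connstr, filename)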
Note the SQL wildcard % is doubled to escape it, so it results in a single % in the final SQL. That's because Python is using % as its format-specifier so a literal % must be doubled to escape it.
You can trivially modify the above script to accept a list of file names, connect to the database once, and loop over the list of all file names. That'll be a lot faster, especially if you do it all in one transaction. It's a real pain to do that with psql scripting; you have to use bash co-process as shown here ... and it isn't worth the hassle.
In the original post, I made it sound like there were apostrophes in the filename represented by $f. This was NOT the case, so a simple echo "$f" was able to fix my issue.
To make it more clear, the contents of my files were formatted as html snippets, typically something like <p>Blah blah <b>blah</b>...</p>. After trying the solution posted by Craig, I realized I had used single quotes in some anchor tags, and I did NOT want to change those to something else. There were only a few files where this violation occurred, so I just changed these to double quotes by hand. I also realized that instead of escaping the apostrophes, it would be better to convert them to &apos; Here is the final script that I ended up using:
dbname="myDB"
username="username"
fileroot="/path/to/files/*"
for f in $fileroot
do
    psql $dbname $username << __END__
UPDATE entries
SET content='$(sed "s/'/\&apos;/g" < "$f")'
WHERE id=(SELECT e.id FROM entries e WHERE path LIKE '%$(echo "$f")');
__END__
done
The format coloring on here might make it look like the syntax is incorrect, but I have verified that it is correct as posted.
