Using sed with a substitution variable that may have curly braces - bash

I'm writing a script for looping over a set of files in a directory searching for a string (stringA) in one file (srcFile), copying the line that follows it (stringToCopy), and pasting it on the line after another search string (stringB) in another file (outputFile). The copy/paste script that I have so far is as follows
stringA="This is string A"
stringB="This is string B"
srcFile=srcFile.txt
outputFile=outputFile.txt
replacement="/$stringA/{getline; print}"
stringToCopy="$(awk "$replacement" $srcFile)"
sed -i "/$stringB/!b;n;c${stringToCopy}" $outputFile
The script works great, except when stringToCopy ends up containing curly braces. Example is
srcFile.txt:
This is string A
text to copy: {0}
outputFile.txt:
This is string B
line to be replaced
Once the script is done, I would expect outputFile.txt to be
This is string B
text to copy: {0}
But sed chokes with
sed: -e expression #1, char 106: unknown command: `m'
I've tried hardcoding the problematic string and trying different variations of escaping the curlies and quoting the string, but haven't found a winning combination and I'm at a loss for how to make it work.
EDIT
I had a derp moment and forgot that my stringA also has curly braces, which happened to cause my awk command to match multiple lines. This caused my stringToCopy to have newlines in it, which is my real issue, not the curly braces. So the real question is how to make awk treat curly braces as literal characters, so that
srcFile.txt
This is string A: {0}
text to copy: {0}
This is string A:
Other junk
And stringA="This is string A: {0}"
Doesn't set stringToCopy to
text to copy: {0}
Other junk

A bit of a kludge in that we're going to add some extra coding specifically for braces ...
Current situation:
$ awk '/This is string A: {0}/{getline; print}' srcFile.txt
text to copy: {0} # this is the line we want
Other junk # we do not want this line
We can eliminate the second line by escaping the braces in the search pattern, eg:
$ awk '/This is string A: \{0\}/{getline; print}' srcFile.txt
text to copy: {0}
So, how to escape the braces? We can use some explicit parameter expansions to replace the braces with escaped braces in the $stringA variable, keeping in mind that we also need to escape the braces in the parameter expansion phase, too:
$ stringA="This is string A: {0}"
$ stringA="${stringA//\{/\\{}" # replace '{' with '\{'
$ stringA="${stringA//\}/\\}}" # replace '}' with '\}'
$ echo "${stringA}"
This is string A: \{0\}
We can then proceed with the rest of the code as is:
$ replacement="/$stringA/{getline; print}"
$ echo "${replacement}"
/This is string A: \{0\}/{getline; print}
$ stringToCopy="$(awk "$replacement" $srcFile)"
$ echo "${stringToCopy}"
text to copy: {0}
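As for the final sed step, keep the ! so that only the line after a $stringB match is replaced. In an interactive bash session the !b inside double quotes can trigger history expansion, so either run the command from a script or turn history expansion off first with set +H:
$ sed -i "/$stringB/!b;n;c${stringToCopy}" $outputFile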
$ cat "${outputFile}"
This is string B
text to copy: {0}
NOTES:
if you preface your coding with set -xv you can see how variables are being interpreted at each step; use set +xv to turn off
obviously you'll probably run into issues if you do in fact have more than 1 matching row in $srcFile
if you find other characters that need to be escaped then you'll need to add additional parameter expansions for said characters
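An alternative sketch that sidesteps regex escaping altogether: match the search strings literally with awk's index() function instead of treating them as regular expressions. The environment variable names needle and repl and the temporary files are assumptions made for this illustration, and unlike the sed version it replaces the line after every line that contains stringB.

```shell
#!/usr/bin/env bash
# Set up sample files in a scratch directory (hypothetical test data).
workdir=$(mktemp -d)
srcFile="$workdir/srcFile.txt"
outputFile="$workdir/outputFile.txt"
printf '%s\n' 'This is string A: {0}' 'text to copy: {0}' \
              'This is string A:' 'Other junk' > "$srcFile"
printf '%s\n' 'This is string B' 'line to be replaced' > "$outputFile"

stringA='This is string A: {0}'
stringB='This is string B'

# Step 1: print the line after the first line that contains stringA.
# index() does a plain substring search, so braces need no escaping.
stringToCopy=$(needle="$stringA" awk '
  index($0, ENVIRON["needle"]) { if ((getline line) > 0) print line; exit }
' "$srcFile")

# Step 2: replace the line that follows each line containing stringB.
needle="$stringB" repl="$stringToCopy" awk '
  replace_next { print ENVIRON["repl"]; replace_next = 0; next }
  { print }
  index($0, ENVIRON["needle"]) { replace_next = 1 }
' "$outputFile" > "$outputFile.new" && mv "$outputFile.new" "$outputFile"

result=$(cat "$outputFile")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Passing the strings through ENVIRON rather than awk -v also avoids awk's backslash-escape processing of the values.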

Related

I want to extract the strings from file name

one_two_three_four_five.rtf
I need five in A variable
I need four in B variable
And remaining in C variable
The name should be parsed starting from the end.
Note: everything before the last two underscore-separated fields goes into C, however many underscores it contains.
Is it possible?
For example using parameter expansion
#!/bin/ksh
string="one_two_three_four_five.rtf"
base=${string%.rtf}
a=${base##*_}; base=${base%_$a}
b=${base##*_}; base=${base%_$b}
c=$base
echo "$a - $b - $c"
s="one_two_three_four_five.rtf"
source <(sed -r 's/(.*)_([^_]*)_([^_]*)[.].*/C="\1"; B="\2";A="\3"/' <<< "${s}")
# Result:
echo "A=$A, B=$B, C=$C"
A=five, B=four, C=one_two_three
Explanation:
sed -r Use extended regular expressions, so the grouping parentheses need no backslash escaping
(.*)_ Matches largest string until underscore with the condition that there are underscores left for matching the remaining string
([^_]*) String without underscore
[.] A dot without special meaning
"\1" First remembered string
<<< "${s}" Input for sed is like echo "${s}" | sed ...
<(..) Simulates a file, so sourcing these will execute the commands.
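A sketch of the same split done with bash's built-in regex matching, which avoids generating and sourcing code. It assumes bash (for [[ =~ ]] and BASH_REMATCH); the variable name re is just an illustration.

```shell
#!/usr/bin/env bash
s="one_two_three_four_five.rtf"

# Greedy (.*) takes everything up to the second-to-last underscore;
# the two [^_]* groups capture the last two fields; \.[^.]*$ is the extension.
re='^(.*)_([^_]*)_([^_]*)\.[^.]*$'

if [[ $s =~ $re ]]; then
  C=${BASH_REMATCH[1]}
  B=${BASH_REMATCH[2]}
  A=${BASH_REMATCH[3]}
  echo "A=$A, B=$B, C=$C"
else
  echo "no match: $s" >&2
fi
```

Because nothing is evaluated as shell code here, a file name containing quotes or $ characters cannot inject commands.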

Adding a comma after $variable

I'm writing a for loop in bash to run a command and I need to add a comma after one of my variables. I can't seem to do this without an extra space added. When I move "," right next to $bams then it outputs *.sorted,
#!/bin/bash
bams=*.sorted
for i in $bams
do echo $bams ","
done;
Output should be this:
'file1.sorted','file2.sorted','file3.sorted'
The eventual end goal is to be able to insert a list of files into a --flag in the format above. Not sure how to do that either.
First, a literal answer (if your goal were to generate a string of the form 'foo','bar','baz', rather than to run a program with a command line equivalent to somecommand --flag='foo','bar','baz', which is quite different):
shopt -s nullglob # generate a null result if no matches exist
printf -v var "'%s'," *.sorted # put list of files, each w/ a comma, in var
echo "${var%,}" # echo contents of var, with last comma removed
Or, if you don't need the literal single quotes (and if you're passing your result to another program on its command line with the single quotes being syntactic rather than literal, you absolutely don't want them):
files=( *.sorted ) # put *.sorted in an array
IFS=, # set the comma character as the field separator
somecommand --flag "${files[*]}" # run your program with the comma-separated list
try this -
lst=$( echo *.sorted | sed 's/ /,/g' ) # stack filenames with commas
echo $lst
if you really need the single-ticks around each filename, then
lst="'$( echo *.sorted | sed "s/ /','/g" )'" # commas AND quotes
#!/bin/bash
bams=*.sorted
for i in $bams
do flag+="${flag:+,}'$i'"
done
echo $flag
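A sketch of the array approach wrapped in a small helper, so that IFS is changed only inside the function and file names with spaces survive intact. The function name join_by and the sample file names are hypothetical; it assumes bash.

```shell
#!/usr/bin/env bash
# join_by SEP ITEM... : print the items joined by the single character SEP.
join_by() {
  local IFS=$1   # local, so the caller's IFS is left untouched
  shift
  printf '%s\n' "$*"
}

# Hypothetical sample files in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/file1.sorted" "$workdir/file2.sorted" "$workdir/file3 b.sorted"

shopt -s nullglob                 # empty array if nothing matches
files=( "$workdir"/*.sorted )
names=( "${files[@]##*/}" )       # strip the directory part of each element

joined=$(join_by , "${names[@]}")
echo "$joined"
rm -rf "$workdir"
```

The result can then be passed as a single argument, for example somecommand --flag "$joined".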

Why is bash ignoring the ending double quote (")

I have this file:
http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/I_Cuestionario_general_estimaciones_endireh2016.xlsx
http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/IV_Ingresos_y_recursos_estimaciones_endireh2016.xlsx
http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/VI_ambito_escolar_estimaciones_endireh2016.xlsx
http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/VII_ambito_laboral_estimaciones_endireh2016.xlsx
http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/VIII_ambito_comunitario_estimaciones_endireh2016.xlsx
http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/IX_Atencion_Obstetrica_estimaciones_endireh2016.xlsx
http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/X_ambito_familiar_estimaciones_endireh2016.xlsx
And this bash script:
while read p; do
echo "\"$p\""
done < file.txt
I would expect the same file but with double quotes around each line, but this is what bash is outputting:
"http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/I_Cuestionario_general_estimaciones_endireh2016.xlsx
"http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/IV_Ingresos_y_recursos_estimaciones_endireh2016.xlsx
"http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/VI_ambito_escolar_estimaciones_endireh2016.xlsx
"http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/VII_ambito_laboral_estimaciones_endireh2016.xlsx
"http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/VIII_ambito_comunitario_estimaciones_endireh2016.xlsx
"http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/IX_Atencion_Obstetrica_estimaciones_endireh2016.xlsx
"http://www.beta.inegi.org.mx/contenidos/proyectos/enchogares/especiales/endireh/2016/tabulados/X_ambito_familiar_estimaciones_endireh2016.xlsx
Anyone know why bash is behaving this way? And how to output both " double quotes? (beginning and end)
I'm near certain that the line endings on your input file are CR/LF rather than just LF. This would output:
";
the web address;
a CR returning the cursor to the beginning of the line;
"; and, finally,
moving to a new line.
Capture the output to a file and pass it through a dump utility like od -xcb, that should show you the raw bytes being output.
As a test, creating a file consisting of the two lines 123<CR> and 456, I see:
pax> while read p; do echo "\"$p\""; done <testfile
"123
"456"
which seems to indicate the problem is as described.
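If the input does turn out to have CR/LF line endings, one way to handle it inside the loop is to strip a trailing carriage return from each line before printing. This is a minimal sketch assuming bash (for the $'\r' quoting); the test file is created here only for the demonstration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf '123\r\n456\r\n' > "$workdir/testfile"   # sample CR/LF input

output=$(
  # IFS= and -r keep each line intact; the || clause also handles
  # a final line that lacks a trailing newline.
  while IFS= read -r p || [ -n "$p" ]; do
    p=${p%$'\r'}              # drop one trailing CR, if present
    printf '"%s"\n' "$p"
  done < "$workdir/testfile"
)
printf '%s\n' "$output"
rm -rf "$workdir"
```

Converting the file once up front (for example with dos2unix, if it is installed) is another option.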
If you're having trouble escaping the leading and trailing double quotes, you can put each literal double quote inside single quotes and keep the variable itself in double quotes. Double quotes inside single quotes have no special meaning, but variables are not expanded inside single quotes either, so $p has to stay outside them:
while read p; do
echo '"'"$p"'"'
done < file.txt

Shell - what does <<-'EOF' mean [duplicate]

I needed to write a script to enter multi-line input to a program (psql).
After a bit of googling, I found the following syntax works:
cat << EOF | psql ---params
BEGIN;
`pg_dump ----something`
update table .... statement ...;
END;
EOF
This correctly constructs the multi-line string (from BEGIN; to END;, inclusive) and pipes it as an input to psql.
But I have no idea how/why it works, can some one please explain?
I'm referring mainly to cat << EOF, I know > outputs to a file, >> appends to a file, < reads input from file.
What does << exactly do?
And is there a man page for it?
The cat <<EOF syntax is very useful when working with multi-line text in Bash, e.g. when assigning a multi-line string to a shell variable, a file or a pipe.
Examples of cat <<EOF syntax usage in Bash:
1. Assign multi-line string to a shell variable
$ sql=$(cat <<EOF
SELECT foo, bar FROM db
WHERE foo='baz'
EOF
)
The $sql variable now holds the new-line characters too. You can verify with echo -e "$sql".
2. Pass multi-line string to a file in Bash
$ cat <<EOF > print.sh
#!/bin/bash
echo \$PWD
echo $PWD
EOF
The print.sh file now contains:
#!/bin/bash
echo $PWD
echo /home/user
3. Pass multi-line string to a pipe in Bash
$ cat <<EOF | grep 'b' | tee b.txt
foo
bar
baz
EOF
The b.txt file contains bar and baz lines. The same output is printed to stdout.
This is called heredoc format to provide a string into stdin. See https://en.wikipedia.org/wiki/Here_document#Unix_shells for more details.
From man bash:
Here Documents
This type of redirection instructs the shell to read input from the current source until a line containing only word (with no trailing blanks) is seen.
All of the lines read up to that point are then used as the standard input for a command.
The format of here-documents is:
<<[-]word
here-document
delimiter
No parameter expansion, command substitution, arithmetic expansion, or pathname expansion is performed on word. If any characters in word are quoted, the delimiter is the result of quote removal on word, and the lines in the here-document are not expanded. If word is unquoted, all lines of the here-document are subjected to parameter expansion, command substitution, and arithmetic expansion. In the latter case, the character sequence \<newline> is ignored, and \ must be used to quote the characters \, $, and `.
If the redirection operator is <<-, then all leading tab characters are stripped from input lines and the line containing delimiter. This allows here-documents within shell scripts to be indented in a natural fashion.
In your case, "EOF" is known as a "Here Tag". Basically <<Here tells the shell that you are going to enter a multiline string until the "tag" Here. You can name this tag as you want, it's often EOF or STOP.
Some rules about the Here tags:
The tag can be any string, uppercase or lowercase, though most people use uppercase by convention.
The tag will not be considered as a Here tag if there are other words in that line. In this case, it will merely be considered part of the string. The tag should be by itself on a separate line, to be considered a tag.
The tag should have no leading or trailing spaces in that line to be considered a tag. Otherwise it will be considered as part of the string.
example:
$ cat >> test <<HERE
> Hello world HERE <-- Not by itself on a separate line -> not considered end of string
> This is a test
> HERE <-- Leading space, so not considered end of string
> and a new line
> HERE <-- Now we have the end of the string
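The same rules can be checked non-interactively. In this sketch the third body line is a HERE preceded by a single space, so only the final bare HERE ends the document; the variable name output is just for illustration.

```shell
#!/usr/bin/env bash
output=$(cat <<HERE
Hello world HERE
This is a test
 HERE
and a new line
HERE
)
printf '%s\n' "$output"

# Count the lines that ended up in the here-document body.
line_count=$(printf '%s\n' "$output" | wc -l | tr -d ' ')
echo "lines: $line_count"
```

All four lines before the last HERE are treated as content, including the one with the leading space.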
POSIX 7
kennytm quoted man bash, but most of that is also POSIX 7: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04 :
The redirection operators "<<" and "<<-" both allow redirection of lines contained in a shell input file, known as a "here-document", to the input of a command.
The here-document shall be treated as a single word that begins after the next <newline> and continues until there is a line containing only the delimiter and a <newline>, with no <blank> characters in between. Then the next here-document starts, if there is one. The format is as follows:
[n]<<word
here-document
delimiter
where the optional n represents the file descriptor number. If the number is omitted, the here-document refers to standard input (file descriptor 0).
If any character in word is quoted, the delimiter shall be formed by performing quote removal on word, and the here-document lines shall not be expanded. Otherwise, the delimiter shall be the word itself.
If no characters in word are quoted, all lines of the here-document shall be expanded for parameter expansion, command substitution, and arithmetic expansion. In this case, the <backslash> in the input behaves as the <backslash> inside double-quotes (see Double-Quotes). However, the double-quote character ( '"' ) shall not be treated specially within a here-document, except when the double-quote appears within "$()", "``", or "${}".
If the redirection symbol is "<<-", all leading <tab> characters shall be stripped from input lines and the line containing the trailing delimiter. If more than one "<<" or "<<-" operator is specified on a line, the here-document associated with the first operator shall be supplied first by the application and shall be read first by the shell.
When a here-document is read from a terminal device and the shell is interactive, it shall write the contents of the variable PS2, processed as described in Shell Variables, to standard error before reading each line of input until the delimiter has been recognized.
Examples
Some examples not yet given.
Quotes prevent parameter expansion
Without quotes:
a=0
cat <<EOF
$a
EOF
Output:
0
With quotes:
a=0
cat <<'EOF'
$a
EOF
or (ugly but valid):
a=0
cat <<E"O"F
$a
EOF
Outputs:
$a
Hyphen removes leading tabs
Without hyphen:
cat <<EOF
<tab>a
EOF
where <tab> is a literal tab, and can be inserted with Ctrl + V <tab>
Output:
<tab>a
With hyphen:
cat <<-EOF
<tab>a
<tab>EOF
Output:
a
This exists of course so that you can indent your cat like the surrounding code, which is easier to read and maintain. E.g.:
if true; then
<tab>cat <<-EOF
<tab>a
<tab>EOF
fi
Unfortunately, this does not work for space characters: POSIX favored tab indentation here. Yikes.
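One more case from the POSIX text that has no example yet: the optional file descriptor number in [n]<<word, combined with two here-documents on one line. This is a small sketch; the delimiter names FIRST and SECOND and the choice of file descriptor 3 are arbitrary.

```shell
#!/usr/bin/env bash
# The first here-document feeds standard input (fd 0), the second feeds fd 3.
# Their bodies follow in the same order as the operators on the command line.
output=$( { cat; cat <&3; } <<FIRST 3<<SECOND
from first
FIRST
from second
SECOND
)
printf '%s\n' "$output"
```

The first cat drains standard input and the second reads from descriptor 3, so the two bodies come out in order.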
Using tee instead of cat
Not exactly as an answer to the original question, but I wanted to share this anyway: I had the need to create a config file in a directory that required root rights.
The following does not work for that case:
$ sudo cat <<EOF >/etc/somedir/foo.conf
# my config file
foo=bar
EOF
because the redirection is handled outside of the sudo context.
I ended up using this instead:
$ sudo tee <<EOF /etc/somedir/foo.conf >/dev/null
# my config file
foo=bar
EOF
<< EoF basically means:
<< - "read the multi-line input that begins from the next line onward, and feed it to the command as if it were the contents of a separate file"
EoF - "stop reading at the first line that consists of nothing but the word EoF"
As other answers have explained, the multi-line input is called a Here Document
A Here Document is often used to generate output to be passed to a subsequent process. For example cat << EoF can be used to generate a desired output, using a Here Document.
Here's an example of using a Here Document to create a text document on the fly:
cat << EoF > ./my-document.txt
Hello world
Have a nice day
EoF
A little extension to the above answers. The trailing > directs the input into the file, overwriting existing content. However, one particularly convenient use is the double arrow >> that appends, adding your new content to the end of the file, as in:
cat <<EOF >> /etc/fstab
data_server:/var/sharedServer/authority/cert /var/sharedFolder/sometin/authority/cert nfs
data_server:/var/sharedServer/cert /var/sharedFolder/sometin/vsdc/cert nfs
EOF
This extends your fstab without you having to worry about accidentally modifying any of its contents.
Note that cat << \EOT (see the backslash) will not expand any variables inside the here-document, while cat << EOT will.
examples:
FOO="bar"
cat << \EOT > foobar.txt
echo "$FOO"
EOT
will output (into foobar.txt):
echo "$FOO"
while:
FOO="bar"
cat << EOT > foobar.txt
echo "$FOO"
EOT
will output:
echo "bar"
Example to create a json file:
cat << EoF > ./allaccess.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"arn:aws:s3:::*"
]
}
]
}
EoF
As a result:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:*"
],
"Resource": [
"arn:aws:s3:::*"
]
}
]
}
Long story short, the EOF marker (any other literal can be used instead) delimits a heredoc, which lets you provide multi-line input.
A lot of confusion comes from how cat actually works it seems.
You can use cat with >> or > as follows:
$ cat >> temp.txt
line 1
line 2
While cat can be used this way when writing manually into console, it's not convenient if I want to provide the input in a more declarative way so that it can be reused by tools and also to keep indentations, whitespaces, etc.
A heredoc lets you define your entire input as if you were not working with stdin but typing in a separate text editor. This is what the Wikipedia article means by:
it is a section of a source code file that is treated as if it were a
separate file.
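This isn't necessarily an answer to the original question, but a sharing of some results from my own testing. In zsh, this:
<<test > print.sh
#!/bin/bash
echo \$PWD
echo $PWD
test
will produce the same file as:
cat <<test > print.sh
#!/bin/bash
echo \$PWD
echo $PWD
test
That is because zsh runs cat by default (its NULLCMD parameter) when a redirection has no command. In bash the bare redirection runs no command at all, so print.sh is created empty, and cat (or another command that reads stdin) is still needed.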
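A quick way to check what bash itself does with a here-document that has no command in front of it. This sketch assumes bash (zsh behaves differently); the file names bare.sh and with_cat.sh are made up for the test.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Here-document with no command: the redirections are performed,
# but nothing reads the here-document or writes to the file.
<<EOF > "$workdir/bare.sh"
echo hello
EOF

# Same here-document fed to cat, which copies its stdin to the file.
cat <<EOF > "$workdir/with_cat.sh"
echo hello
EOF

bare_size=$(wc -c < "$workdir/bare.sh" | tr -d ' ')
cat_size=$(wc -c < "$workdir/with_cat.sh" | tr -d ' ')
echo "bare: $bare_size bytes, with cat: $cat_size bytes"
rm -rf "$workdir"
```

Under bash the bare version should leave an empty file, while the cat version contains the one line of text.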
Worth noting that here docs work in bash loops too.
This example shows how-to get the column list of table:
export postgres_db_name='my_db'
export table_name='my_table_name'
# start copy
while read -r c; do test -z "$c" || echo $table_name.$c , ; done < <(cat << EOF | psql -t -q -d $postgres_db_name -v table_name="${table_name:-}"
SELECT column_name
FROM information_schema.columns
WHERE 1=1
AND table_schema = 'public'
AND table_name =:'table_name' ;
EOF
)
# stop copy , now paste straight into the bash shell ...
output:
my_table_name.guid ,
my_table_name.id ,
my_table_name.level ,
my_table_name.seq ,
or even without the new line
while read -r c; do test -z "$c" || echo $table_name.$c , | perl -ne 's/\n//gm;print' ; done < <(cat << EOF | psql -t -q -d $postgres_db_name -v table_name="${table_name:-}"
SELECT column_name
FROM information_schema.columns
WHERE 1=1
AND table_schema = 'public'
AND table_name =:'table_name' ;
EOF
)
# output: daily_issues.guid ,daily_issues.id ,daily_issues.level ,daily_issues.seq ,daily_issues.prio ,daily_issues.weight ,daily_issues.status ,daily_issues.category ,daily_issues.name ,daily_issues.description ,daily_issues.type ,daily_issues.owner

How to chop the last n bytes of a string in bash?

for example qa_sharutils-2009-04-22-15-20-39, want chop last 20 bytes, and get 'qa_sharutils'.
I know how to do it in sed, but why $A=${A/.\{20\}$/} does not work?
Thanks!
If your string is stored in a variable called str, then this will give you the substring without the last 20 characters in bash:
${str:0:${#str} - 20}
basically, string slicing can be done using
${[variableName]:[startIndex]:[length]}
and the length of a string is
${#[variableName]}
EDIT:
solution using sed that works on files:
sed 's/.\{20\}$//' < inputFile
similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3
using awk:
echo $str | awk '{print substr($0,1,length($0)-20)}'
or using strings manipulation - echo ${string:position:length}:
echo ${str:0:$((${#str}-20))}
In the ${parameter/pattern/string} syntax in bash, pattern is a path wildcard-style (glob) pattern, not a regular expression. In glob syntax a dot . is just a literal dot, and the backslash-escaped braces and the $ are literal characters as well, so that expansion would only remove the literal text .{20}$ if it occurred in the value. Note also that an assignment is written A=..., without a leading $.
There are several ways to accomplish the basic task.
$ str="qa_sharutils-2009-04-22-15-20-39"
To strip the last 20 characters, use substring expansion (the offset is zero-based, and the length is the string length minus 20):
$ echo ${str::${#str}-20}
qa_sharutils
The "%" and "%%" operators strip a matching pattern from the right-hand side of the string. For instance, if you want the basename, minus everything from the first "-" onward:
$ echo ${str%%-*}
qa_sharutils
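Because the pattern after % is a glob, the "any 20 characters" idea from the question can still be expressed with ? wildcards, one per character. This is a sketch with a hypothetical helper name, chop_last, that builds the run of ? characters instead of typing them out.

```shell
#!/usr/bin/env bash
# chop_last STRING N : print STRING without its last N characters.
# If STRING is shorter than N, the pattern cannot match and STRING is
# printed unchanged.
chop_last() {
  # Build a glob of N question marks, each matching exactly one character.
  pattern=$(printf '%*s' "$2" '' | tr ' ' '?')
  # $pattern is left unquoted inside the expansion so it acts as a glob.
  printf '%s\n' "${1%$pattern}"
}

str="qa_sharutils-2009-04-22-15-20-39"
chopped=$(chop_last "$str" 20)
echo "$chopped"
```

For the sample string this removes the 20-character date suffix, leaving the name in front of it.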
If the last 20 bytes are always a date like this, you can also split the string on the dashes:
$ str="qa_sharutils-2009-04-22-15-20-39"
$ IFS="-"
$ set -- $str
$ echo $1
qa_sharutils
$ unset IFS
or when first dash and beyond are not needed.
$ echo ${str%%-*}
qa_sharutils
