This section of my expect script is sometimes sending "y\r", and sometimes not. On a side note, I don't even see why lvcreate is asking, because before this point the script dd's 20GB of zeros to the drives, and this is the first lvcreate. But whatever, that would be a separate question, and I'd rather just build in the response for wiping. (And, yes, this type of thing is dangerous. It's for fresh system installs, and it steps through with the user to get them to type in several confirmations.)
I don't get why it's not matching on the first glob. First line is the actual output of the program lvcreate.
WARNING: linux_raid_member signature detected on /dev/disk1/terraswap1 at offset 4096. Wipe it? [y/n]:
expect: does "\u001b[K\u001b[?1h\u001b=\u001b[?2004hl\u0008lvcreate disk1 -L 16G -n terraswap1\u001b[?1l\u001b>\u001b[?2004l\r\r\n\u001b]0;root@archiso: lvcreate disk1 -L 16G -n terraswap1\u0007WARNING: linux_raid_member signature detected on /dev/disk1/terraswap1 at offset 4096. Wipe it? [y/n]: " (spawn_id exp7) match glob pattern "Wipe it? [y/n]: "? no
"root*archiso*# "? no
In short, to reduce scrolling, the expect output is:
expect: does "...Wipe it? [y/n]: " (spawn_id exp7) match glob pattern "Wipe it? [y/n]: "? no
"root*archiso*# "? no
The pertinent code section is:
#!/usr/bin/expect -f
# ... connect via ssh as root to machine with hostname archiso
expect "root*archiso*# "
send "lvcreate disk1 -L 16G -n terraswap1\r"
expect {
"Wipe it? \[y/n]: " {
send "y\r"
exp_continue
} "root*archiso*# " {
send "lvcreate disk2 -L 16G -n terraswap2\r"
}
}
The [ is special to both Tcl and the Expect pattern matcher so it is particularly messy. To match a literal [, you have to backslash once from Tcl and then again so that it is not treated as a range during pattern matching. The first backslash, of course, has to be backslashed to prevent it from turning the next backslash into a literal backslash!
expect "\\\[" ; #matches literal '['
So your code should be:
expect {
"Wipe it? \\\[y/n]: " {
send "y\r"
exp_continue
} "root*archiso*# " {
send "lvcreate disk2 -L 16G -n terraswap2\r"
}
}
Note:
Alternatively, we can use braces to match a literal [, since Tcl performs no backslash substitution inside braces:
expect {\[}; # This too matches literal '['
glob patterns are implicitly anchored. Your pattern does not match because the entirety of the text does not match the short pattern. For a string containing "Wipe it? [y/n]", you want:
{*Wipe it\? \[y/n\]:*}
Note that ? is a glob wildcard, so escape it to match a literal question mark. See the string match documentation for details.
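Bash's `[[ text == pattern ]]` handles `[...]`, `?` and `*` much like Expect's glob matcher, so it offers a quick way to reproduce the failure outside Expect. This is a sketch in bash rather than Expect: `glob_matches` is a made-up helper, and only the bracket and question-mark behaviour is meant to carry over.

```shell
#!/usr/bin/env bash
# Demo of glob bracket handling using bash's [[ == ]] (not Expect itself).
prompt='WARNING: linux_raid_member signature detected on /dev/disk1/terraswap1 at offset 4096. Wipe it? [y/n]: '

# Hypothetical helper: succeed if text $1 matches glob pattern $2.
# $2 is deliberately unquoted so bash treats it as a pattern.
glob_matches() {
  [[ $1 == $2 ]]
}

# Unescaped: [y/n] is a character class matching ONE of 'y', '/' or 'n',
# so this pattern wants text like "Wipe it? y: ", not a literal "[y/n]".
if glob_matches "$prompt" '*Wipe it? [y/n]: *'; then
  echo "unescaped: match"
else
  echo "unescaped: no match"
fi

# Escaped: \? \[ \] stand for literal characters.
if glob_matches "$prompt" '*Wipe it\? \[y/n\]: *'; then
  echo "escaped: match"
else
  echo "escaped: no match"
fi
```

The unescaped pattern reports no match against the real prompt, while the escaped one matches, which mirrors what the Expect debug output shows.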
I am trying to retrieve several files using scp.
I already know the paths of the files to get on the remote, so I decided to add them in an array:
declare -a array
array+=("path/to/file1")
array+=("path/to/file2")
array+=("path/to/file3")
scp "$USER@$HOST:${array[@]}" .
outputs:
path/to/file1
cp: cannot stat `path/to/file2': No such file or directory
cp: cannot stat `path/to/file3': No such file or directory
Only the first file gets copied. The scp command only takes the first file into account, then cp is invoked for the remaining files.
Something as simple as this makes it work:
declare -a array
array+=("path/to/file1")
array+=("path/to/file2")
array+=("path/to/file3")
string="${array[@]}"
scp "$USER@$HOST:$string" .
outputs:
path/to/file1
path/to/file2
path/to/file3
When I launch my script with bash -x, it shows that with the array, the command is not properly quoted:
+ scp $USER@$HOST:path/to/file1 path/to/file2 path/to/file3 .
Whereas with the string version:
+ scp '$USER@$HOST:path/to/file1 path/to/file2 path/to/file3' .
What exactly is causing this? And is there a way to make the array version work, or should I use a string every time I want to use scp? (That could be quite inconvenient with special characters.)
Expanding an array with @ results in multiple arguments:
$ array=(foo bar baz)
$ printf '<%s>\n' "${array[@]}"
<foo>
<bar>
<baz>
Expanding it with * results in a single argument separated by the first character in $IFS:
$ array=(foo bar baz)
$ printf '<%s>\n' "${array[*]}"
<foo bar baz>
99% of commands expect one filename per argument, but scp, for historical reasons, accepts multiple filenames per argument. In this case, you can therefore use
scp "$USER@$HOST:${array[*]}" .
though you'll likely want to escape the filenames as well, again for historical scp reasons (the @Q transformation needs bash 4.4 or later):
scp "$USER@$HOST:${array[*]@Q}" .
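To see what `${array[*]@Q}` actually hands to scp, here is a small sketch. `remote_spec` is a hypothetical helper and `u@h` / `user@host` are placeholders. The quoting is aimed at the remote shell that classic scp invokes; newer scp versions may treat remote paths differently, so it is worth testing against your own setup.

```shell
#!/usr/bin/env bash
# Build the single "host:files" argument for a multi-file remote scp source.
# Requires bash >= 4.4 for the @Q transformation.

# Hypothetical helper: $1 = user@host, remaining args = remote paths.
remote_spec() {
  local host=$1
  shift
  local files=("$@")
  local IFS=' '
  # [*] joins the elements with the first character of IFS (a space here);
  # @Q wraps each element in single quotes for the remote shell.
  printf '%s:%s\n' "$host" "${files[*]@Q}"
}

remote_spec user@host 'path/to/file 1' path/to/file2
# prints: user@host:'path/to/file 1' 'path/to/file2'

# Real use would then look something like:
#   scp "$(remote_spec "$USER@$HOST" "${array[@]}")" .
```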
We can substitute printf for scp to see what it actually expands to:
$ printf '>%s<\n' "$USER@$HOST:${array[@]}"
>user@host:path/to/file1<
>path/to/file2<
>path/to/file3<
Which is not what you want: you need to attach the user&host to each array element.
The ${var/pattern/string} expansion can be used here:
$ printf '>%s<\n' "${array[@]/#/$USER@$HOST:}"
>user@host:path/to/file1<
>user@host:path/to/file2<
>user@host:path/to/file3<
This is a tricky one:
we have the "var" as array[@], which expands to each array element
the pattern is #, which means "the empty string anchored at the start of the string"
and the replacement string is the user&host.
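Putting those three parts together, a minimal sketch that stores the prefixed paths in a new array so each one stays a separate scp argument (the third filename has a space added to show that the quoting holds; `user@host:` is a placeholder):

```shell
#!/usr/bin/env bash
# Prefix every array element with "user@host:" so scp gets one remote
# path per argument.
array=("path/to/file1" "path/to/file2" "path/to/file 3")
prefix="user@host:"

# ${array[@]/#/$prefix}: replace the empty string anchored at the start
# of each element with $prefix; the double quotes keep one word per element.
remote_args=("${array[@]/#/$prefix}")

printf '>%s<\n' "${remote_args[@]}"
# >user@host:path/to/file1<
# >user@host:path/to/file2<
# >user@host:path/to/file 3<

# scp "${remote_args[@]}" .
```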
Expanding the files array into a brace expression for the remote host:
$ remote_files=$( IFS=","; printf '%s@%s:{%s}' "$USER" "$HOST" "${array[*]}" )
$ printf '>%s<\n' "$remote_files"
>user@host:{path/to/file1,path/to/file2,path/to/file3}<
I have an application where I want the user to be able to enter IP addresses that are saved to a conf file. The addresses need to be checked to ensure they are valid IP addresses (xxx.xxx.xxx.xxx).
Because this is a persistent, user-set value in an application that runs as a normal user (i.e. not root), the conf file must reside in a user folder. I have chosen the user's home directory (Raspbian).
The conf file test sample looks like this:
interface=eth0
ip_address=172.30.21.40
routers=172.30.21.1
domain_name_server_1=199.85.126.30
damaim_name_server_2=8.8.8.8
If the user saves a valid ip_address, I want to read it and store it in a variable.
If the user saves an invalid ip_address, I want to read and discard it and return an empty string.
I have looked at a range of options to do this.
I looked at using source, but I found this requires the conf to be executable. That would add the risk of a user injecting executable code into the conf file.
I think I should be able to read, check and store the ip_address value in a one line sed command, but I just can't get it to work.
The test script is:
#!/bin/bash
conf_file='/home/user/ip.conf'
v1="$(sed -n 's/\b(?:ip_address=)(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\b/\1/p' $conf_file)"
echo "The ip address is : $v1"
exit
To break this down into parts:
\b(?:ip_address=) # match the string "ip_address=" starting with a word separator \b
(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))
^ ^
# This section checks the format and number range of the ip address. It is made up of three
# groups that are all contained within a set of brackets (marked with ^) to create group 1 with
# the whole ip address. This is what I want to capture. It ends with a word separator \b
/\1/p # This is the substitution section where I specify group 1 and print to save to $v1.
When I run this command I get the error
sed: -e expression #1, char 110: invalid reference \1 on `s' command's RHS
When I enter:
\b(?:ip_address=)(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))\b
into the online regex tester it works without error. It identifies the full ip address as group 1.
The sed command doesn't seem to recognise the back reference \1 and I can't figure out what I am doing wrong.
Edit
I tried a simple command:
v1="$(sed -n -E 's/^\s*(interface=)(.*)\b/\2/p' $conf_file)"
This only worked correctly with the -E option added. This is based on an answer found here. I can't find any documentation on -E but it appears to enable extended regular expressions.
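The `invalid reference \1` error comes from how basic regular expressions (sed's default) treat parentheses: without `-E`, `(` and `)` are literal characters, so the expression defines no group 1 at all. A minimal sketch with GNU sed comparing the two spellings on a made-up sample line:

```shell
#!/usr/bin/env bash
line='ip_address=172.30.21.40'

# Basic regex (default): groups are written \( \); plain ( ) are literals.
bre=$(printf '%s\n' "$line" | sed -n 's/^ip_address=\(.*\)$/\1/p')

# Extended regex (-E): plain ( ) form groups.
ere=$(printf '%s\n' "$line" | sed -n -E 's/^ip_address=(.*)$/\1/p')

echo "BRE: $bre"
echo "ERE: $ere"

# Neither flavour in sed has Perl's non-capturing (?:...) group,
# so that part of the original expression has to go as well.
```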
Too much at once. Big problems are the sum of small problems, so take on one little problem at a time.
# Filter lines with ip_address. Allow leading spaces.
if ! ip_address=$(grep '^[[:space:]]*ip_address=' "$conf_file"); then
    echo "ERROR: no ip_address line found in config file" >&2
    exit 2
fi
# disallow more than one ip_address= line
if [[ "$(printf "%s\n" "$ip_address" | wc -l)" -gt 1 ]]; then
    echo "ERROR: There is more than one ip_address line in config file!" >&2
    exit 2
fi
# remove everything up to and including the last `=`
ip_address=${ip_address##*=}
# check if it's a valid address
re='^(0*(1?[0-9]{1,2}|2([0-4][0-9]|5[0-5]))\.){3}'
re+='0*(1?[0-9]{1,2}|2([0-4][0-9]|5[0-5]))$'
if [[ ! $ip_address =~ $re ]]; then
    echo "ERROR: ip_address option is not a valid IP address" >&2
    exit 2
fi
echo "found ip_address=$ip_address"
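To exercise the validation regex on its own, here is a sketch that wraps it in a function. `is_valid_ipv4` and the sample addresses are illustrative only. Note that because of the `0*` prefixes, leading zeros such as `010.1.1.1` are accepted.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the regex above: returns 0 for a
# dotted-quad IPv4 address with every octet in 0..255.
is_valid_ipv4() {
  local re='^(0*(1?[0-9]{1,2}|2([0-4][0-9]|5[0-5]))\.){3}'
  re+='0*(1?[0-9]{1,2}|2([0-4][0-9]|5[0-5]))$'
  [[ $1 =~ $re ]]
}

for candidate in 172.30.21.40 8.8.8.8 256.1.1.1 1234.123.30.21 ab3.dd30.21.40; do
  if is_valid_ipv4 "$candidate"; then
    echo "valid:   $candidate"
  else
    echo "invalid: $candidate"
  fi
done
```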
But sure, you can do it all in GNU sed, including error handling:
if ! ip_address=$(sed -n '
  # only lines of the form "ip_address=..." (leading spaces allowed)
  /^[[:space:]]*ip_address=/{
    # remove the ip_address= string
    s///
    # if there is something in hold space, we already found an ip_address
    x
    /^$/!{
      s/.*/ERROR: two lines with ip_address found/
      p
      q 1
    }
    x
    # check if the rest is a valid ip address
    /^\(0*\(1\?[0-9]\{1,2\}\|2\([0-4][0-9]\|5[0-5]\)\)\.\)\{3\}0*\(1\?[0-9]\{1,2\}\|2\([0-4][0-9]\|5[0-5]\)\)$/!{
      s/.*/ERROR: Invalid ip address: &/
      p
      q 1
    }
    # keep the valid ip_address in hold space
    h
  }
  # on the last line the ip_address should be in hold space
  ${
    x
    # if it is not, that means error
    /^$/{
      s/.*/ERROR: no ip_address found in the file/
      p
      q 1
    }
    # print the ip address
    p
  }
' "$conf_file"); then
  echo "$ip_address" >&2
  exit 2
fi
echo "Found ip_address=$ip_address"
I believe your idea was to do it just like:
sed -n -E 's/^ip_address=(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))$/\1/p' "$conf_file"
which would be "good enough", but it will be silent if the user makes a mistake.
Thanks to assistance in the comments, the problem turned out to be the ?: (non-capturing group) in the regex, which sed can't process. Here is a demo script of the solution I was looking for:
#!/bin/bash
conf_file='/home/user/ip.conf'
# \. matches a literal dot; (\s.*)? allows an optional trailing comment
v1=$(sed -n -E 's/^\s*(ip_address=)(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(\s.*)?$/\2/p' "$conf_file")
echo "The ip address is : $v1"
v2="$(sed -n -E 's/^\s*(interface=)([^[:space:]#]+).*/\2/p' "$conf_file")"
echo "The interface is : $v2"
v3=$(sed -n -E 's/^\s*(routers=)(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(\s.*)?$/\2/p' "$conf_file")
echo "The router ip is : $v3"
exit 0
The test config file, which intentionally includes errors, is:
interface=eth0 #comment
ip_address=172.30.21.40 # comment
ip_address=123.30.21.40 comment
ip_address=1234.123.30.21.40
ip_address=ab3.dd30.21.40
routers=172.30.21.1 172.123.456.234
domain_name_server_1=199.85.126.30
damaim_name_server_2=8.8.8.8
The output is:
The ip address is : 172.30.21.40
123.30.21.40
The interface is : eth0
The router ip is : 172.30.21.1
The error detection isn't perfect (it doesn't gracefully handle duplicate lines), and for my application it doesn't need to be. It is good enough. In my application, users will not have direct access to the command line or their home directory so this script is intended to be a second line of defense against bad input. Your requirements might vary.
This one line of code completes a number of tasks. It:
opens the file,
searches for an option name,
reads the option value,
checks the option value format/content against the regex filter,
ignores other text and comments on the same line,
returns the option value if valid, or "" if not, and
saves it to a variable.
If the entry is missing or invalid, the return value will be "". In that case, a default value will be used (code not shown).
Getting it to work took way too much time, but I learnt a lot about regex and sed. Now that it works, it is easy to adapt to read values other than IP addresses, as shown in the test script.
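One way to fold this approach into a reusable function with the fallback default described above is sketched below. It rests on several assumptions: the helper name `conf_ipv4`, the fallback addresses and the throwaway file are invented for the demo; the key is assumed to contain only letters, digits and underscores, because it is pasted into the regex; it relies on GNU sed for `-E` and `\s`; and `head -n 1` simply takes the first valid entry when a key is duplicated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IPv4 value of KEY from FILE, or DEFAULT
# when the key is missing or its value is not a valid address.
# Assumes KEY contains only [A-Za-z0-9_] (it is inserted into the regex).
conf_ipv4() {
  local key=$1 file=$2 default=$3 octet value
  octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
  # First well-formed "KEY=a.b.c.d", optionally followed by whitespace and text.
  value=$(sed -n -E "s/^\s*${key}=((${octet}\.){3}${octet})(\s.*)?$/\1/p" "$file" | head -n 1)
  printf '%s\n' "${value:-$default}"
}

# Demo with a throwaway config file.
conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
cat > "$conf" <<'EOF'
interface=eth0 #comment
ip_address=172.30.21.40 # comment
ip_address=123.30.21.40 comment
routers=172.123.456.234
EOF

conf_ipv4 ip_address "$conf" 192.168.0.10   # first valid entry: 172.30.21.40
conf_ipv4 routers    "$conf" 192.168.0.1    # invalid value, so the default
conf_ipv4 missing    "$conf" 10.0.0.1       # missing key, so the default
```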
I'm writing a bash script that uses the read builtin to capture the user's multiline input (a list of packages), but I can't figure out how to delete characters with Backspace while entering text (say I made a mistake). Below is the part of the code I mean.
read -d `echo -e "\e"` -p $'Give me a list:\n' PACKAGES
\e is the escape character that ends the multiline input. Here is what I get when I try to delete a character (I tried all the methods I know: Alt+Backspace, Ctrl+W, Ctrl+U):
# read -d `echo -e "\e"` -p $'Give me a list:\n' vPACKAGES
Give me a list:
line one
line two
line 3^H^W^U^H^
As you can see, ^H is how Backspace was interpreted and ^W is Ctrl+W, so I can't delete any character; the only key that does anything is Escape, which ends the input. I tried playing with read's flags: -e lets me delete characters, but then Enter no longer works as a line break.
Thank you in advance for your advice and help.
I wouldn't do this this way at all. I would just prompt for values in a loop until you get an empty entry or some other delimiter.
Something like this (untested):
pkgs=()
printf 'Give me a list:\n'
while IFS= read -r -e pkg; [ -n "$pkg" ]; do
    # Validate package, etc., etc.
    pkgs+=("$pkg")
done
# Use "${pkgs[@]}" ...
You can use any other test instead of [ -n "$pkg" ] if you have some other terminating entry. (e.g. [ "$pkg" = "." ] to test for a ., etc.)
Or, as glenn jackman correctly adds, with newer (4.0+) versions of bash the readarray builtin can be used here, something like this:
printf 'Enter package names, one per line: hit Ctrl-D on a blank line to stop\n'
readarray -t pkgs
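Both approaches can be tried without typing by feeding them from a pipe. In this sketch the loop is wrapped in a hypothetical `read_until_blank` function that fills the global `pkgs` array. Per the bash manual, `-e` only switches on Readline (and with it Backspace, Ctrl-W and Ctrl-U editing) when standard input is a terminal, so the piped demo just reads plain lines.

```shell
#!/usr/bin/env bash
# Loop approach as a function; results land in the global pkgs array.
read_until_blank() {
  local pkg
  pkgs=()
  # Stop at the first empty line (or at EOF).
  while IFS= read -r -e pkg && [[ -n $pkg ]]; do
    pkgs+=("$pkg")
  done
}

read_until_blank < <(printf 'vim\ngit\n\nnot-read\n')
printf 'loop got %d packages: %s\n' "${#pkgs[@]}" "${pkgs[*]}"

# readarray (bash 4.0+) reads every line until EOF (Ctrl-D when interactive).
readarray -t all_pkgs < <(printf 'vim\ngit\n')
printf 'readarray got %d lines\n' "${#all_pkgs[@]}"
```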
In the below code, I would like to have the output of pyfg() passed exactly as echoed (i.e. with the space between -htns and crl being interpreted literally, not as whitespace, by aoeu()) to aoeu(). Of course, the problem is that in aoeu(), $1 is -htns, $2 is crl, and $3, which I don't want at all in this case, is qjkx. I know this example is thoroughly useless, but the real application to which I'm trying to apply this calls an external program in place of the below aoeu(), so I do need something like what's below.
#!/bin/bash
# pass_space_function.sh
aoeu() {
echo "$1" "$2"
}
pyfg() {
echo "-htns crl" "qjkx"
}
aoeu $(pyfg)
My running the above outputs:
$ ./pass_space_function.sh
-htns crl
My desired output is:
$ ./pass_space_function.sh
-htns crl qjkx
To be clear, I do understand exactly why my code isn't working; what I'm not sure about is how to make it do what I want it to do.
EDIT:
#!/bin/bash
aoeu() {
echo 1:"$1" 2:"$2" 3:"$3"
}
pyfg() {
# These variables might be user-provided.
wvz="/usr/lib/scarychacacters_\"##$:%^&:*(){}[]; a o ;u ;::e i y f.so.4"
bm="/space space space"
snt="/var/cache/normalpath"
printf "%q %q %q" "$wvz" "$bm" "$snt"
}
aoeu $(pyfg)
That code returns, for me, 1:/usr/lib/scarychacacters_\"##\$:%\^\&:\*\(\)\{\}\[\]\;\ 2:a\ 3:o\. It's obviously splitting at the whitespace in $wvz.
The key to correct quoting lies in understanding what happens.
That echo "-htns crl" "qjkx" for example will print just a byte stream to its stdout, so it will be just -htns crl qjkx in the end. The information that -htns crl were grouped more closely than qjkx is lost.
To avoid this loss you can use printf "%q":
pyfg() {
printf "%q %q" "-htns crl" "qjkx"
}
This will generate quoted output: -htns\ crl qjkx which means to the shell the same as "-htns crl" "qjkx" (whether the space is escaped with a backslash or quoted with double quotes does not make a difference).
The next aspect is the use of $() to pass the output of one program to the next.
The typical way is to put that in double quotes:
aoeu "$(pyfg)"
This way everything is passed without interpretation which is desirable in most cases.
In your case, however, you might want to make the output of pyfg quoted instead of quoting the output of pyfg; notice the important difference: the first means that pyfg produces quoted output (as shown above), the second means that pyfg produces output which gets quoted later. The second does not help if the output of pyfg has already lost the information about which parts belong together.
If you now just leave out the double quotes, the output unfortunately just gets split at whitespace (i.e. the characters in $IFS), even where that whitespace is escaped with a backslash. So instead you need to use eval in this case to force the shell to interpret the value of $(pyfg) with the normal shell evaluation mechanism:
eval aoeu "$(pyfg)"
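A quick way to check that `printf %q` plus `eval` preserves argument boundaries, even for a path with awkward characters like the one in the question's edit, is to print what the receiving function sees. The second half swaps in a different technique that is not from this answer: emitting NUL-terminated fields and reading them with `mapfile -d ''` (bash 4.4+), which avoids `eval` entirely. `show_args`, `pyfg_q`, `pyfg_nul` and the sample path are made up for the demo.

```shell
#!/usr/bin/env bash
# Print each argument bracketed on its own line so boundaries are visible.
show_args() {
  printf '[%s]\n' "$@"
}

scary='/usr/lib/odd_"#$:%^&:*(){}[]; a o ;u ;::e i y f.so.4'

# Approach 1 (this answer): shell-quote with %q, re-parse with eval.
pyfg_q() {
  printf '%q %q' "-htns crl" "$scary"
}
eval "show_args $(pyfg_q)"

# Approach 2 (alternative): NUL-terminated fields, no eval involved.
pyfg_nul() {
  printf '%s\0' "-htns crl" "$scary"
}
mapfile -d '' -t args < <(pyfg_nul)
show_args "${args[@]}"
```

Both calls print exactly two bracketed arguments, the first being `[-htns crl]`.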
EDIT: This works
#!/bin/bash
# pass_space_function.sh
aoeu() {
echo $1 x $2
}
pyfg() {
echo "'-htns crl' 'qjkx'"
}
eval aoeu $(pyfg)
I have a script that searches through a very large number of files, and uses sed to substitute a multiple line pattern. The script is iterative, and it works fine on some iterations but sometimes it causes a segmentation fault.
This is what the script is doing:
Search for files that DON'T contain the string X
Out of these files, search the ones that CONTAIN the string Y
Iterate the returned file list with a for-loop
If the file contents match pattern A, replace pattern A with A_TAG
The same for patterns B,C,D (a file can contain only one of A,B,C,D)
Patterns A,B,C,D are multiline, and they are replaced with two lines. X and Y are single line.
Here's the script. I apologise for the long lines, but I decided not to edit them since they're regex. I did, however, shorten the regex by replacing strings with "pattern"; the replaced contents are NOT the same in every regex, but they don't have any special characters, so I don't think the actual contents are relevant to this question. Besides, the regex has been shown to work, so you probably don't need to fully understand it.
#!/bin/sh
STRING_A="Pattern(\n|.)*Pattern\.\""
A_TAG="\$STRING:A$"
STRING_B="(Pattern(\n|.)*)?(Pattern(\n|.)*)?Pattern(\n|.)*Pattern(\n|.)*Pattern\.((\n|.)*will be met\: http\:\/\/www.foo\.org\/example\/temp\.html\.\n)?"
B_TAG="\$STRING:B$"
STRING_C="(Pattern(\n|.)*)?Pattern(\n|.)*http\:\/\/www\.foo\.org\/bar\/old-foobar\/file\-2\.1\.html\.((\n|.)*Pattern.*Pattern)?"
C_TAG="\$STRING:C$"
STRING_D="(Pattern(\n|.)*)?(Pattern(\n|.)*http\:\/\/www\.foo\.org\/bar\/old-foobar\/file\-2\.1\.html.*|Pattern(\n|.)*Pattern)((\n|.)*http\:\/\/www\.some-site\.org/\.)?"
D_TAG="\$STRING:D$"
## params: #1 file, #2 PATTERN, #3 TAG
multil_sed()
{
echo "In multil_sed"
# -n = silent, -r = extended regex (no -i here: the output is redirected to $1.foo)
sed -nr '
# Sed has a hold buffer that we can use to "keep text in memory".
# Here we copy the line to the buffer if it is the first line of the file,
# or append it if it is not
1h
1!H
# We must first save all lines until the nth line to the hold buffer,
# then we can search for our pattern
60 {
# Then we must use the pattern buffer. Pattern buffer holds text that
# is up for modification. With g we can copy the hold buffer into the pattern space
g
# Now we can just use the substitution command as we normally would. Use # as a delimiter
s#([ \t:#*;/".\\-]*)'"$2"'#\1'"$3"'\
\1$QT_END_LICENSE$#Ig
# Finally print what we did
p
}
' $1 > $1.foo;
echo "Done"
}
for p in $(find . -type f -not -iwholename '*.git*' -exec grep -iL '.*STRING_X.*' {} \; | xargs grep -il -E '.*STRING_Y.*')
do
echo
echo "####################"
echo "Working on file" $p
#Find A
if pcregrep -qiM "$STRING_A" "$p";
then
echo "A"
multil_sed "$p" "$STRING_A" "$A_TAG"
#Find B
elif pcregrep -qiM "$STRING_B" "$p";
then
echo "B"
multil_sed "$p" "$STRING_B" "$B_TAG"
#Find C
elif pcregrep -qiM "$STRING_C" "$p";
then
echo "C"
multil_sed "$p" "$STRING_C" "$C_TAG"
#Find D
elif pcregrep -qiM "$STRING_D" "$p";
then
echo "D"
multil_sed "$p" "$STRING_D" "$D_TAG"
else
echo "No match found"
fi
echo "####################"
done
I should probably note that C is essentially a longer version of D, that has some extra contents before the common part.
What happens is that for some iterations this works OK:
####################
Working on file ./src/listing.txt
A
In multil_sed
Done
####################
and sometimes it doesn't.
####################
Working on file ./src/web/page.html
/home/tekaukor/code/project/tag_adder.sh: line 54: 16904 Segmentation fault (core dumped) pcregrep -qiM "$STRING_A" "$p"
No match found
####################
It's not dependent on which pattern is being searched.
####################
Working on file ./src/test/formatter_test.cpp
/home/tekaukor/code/project/tag_adder.sh: line 54: 18051 Segmentation fault (core dumped) pcregrep -qiM "$STRING_B" "$p"
/home/tekaukor/code/project/tag_adder.sh: line 54: 18053 Segmentation fault (core dumped) pcregrep -qiM "$STRING_C" "$p"
/home/tekaukor/code/project/tag_adder.sh: line 54: 18055 Segmentation fault (core dumped) pcregrep -qiM "$STRING_D" "$p"
No match found
####################
Line 54 points to the line "for p in $(find . -type f -not -iwholename '*.git*' -exec grep...".
My guess is that sed is causing a buffer overflow, but I haven't found a way to ascertain or fix this.
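One side effect is visible in the output above: each crash is followed by "No match found", because `if pcregrep ...` treats any non-zero status, including death by a signal, as "no match". Below is a sketch that tells these cases apart. It relies on pcregrep's documented exit statuses (0 match, 1 no match, 2 error) and on the shell convention of reporting 128 plus the signal number for a process killed by a signal (SIGSEGV is 11, hence 139). `classify_status` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe an exit status from a grep-like tool.
classify_status() {
  local status=$1
  if (( status == 0 )); then
    echo "match"
  elif (( status == 1 )); then
    echo "no match"
  elif (( status > 128 )); then
    echo "killed by signal $(( status - 128 ))"
  else
    echo "error (status $status)"
  fi
}

# In the loop this could look like (pcregrep is not run here):
#   pcregrep -qiM "$STRING_A" "$p"; status=$?
#   echo "A: $(classify_status "$status")"

classify_status 0
classify_status 1
classify_status 139
```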
Bash isn't great about locating the source of a fault in a compound statement so
Line 54 points to the line for p in $(find . -type f ....
is misleading as the error could be anywhere in that for statement block. The error message
Segmentation fault (core dumped) pcregrep -qiM "$STRING_D" "$p"
is much more accurate. The likely cause of the fault is the -M flag combined with unbounded patterns like (.|\n)*. As the pcregrep man page notes:
-M, --multiline
Allow patterns to match more than one line. When this option is given, patterns may usefully contain literal newline characters and internal occurrences of ^ and $ characters. The output for any one match may consist of more than one line. When this option is set, the PCRE library is called in "multiline" mode. There is a limit to the number of lines that can be matched, imposed by the way that pcregrep buffers the input file as it scans it. However, pcregrep ensures that at least 8K characters or the rest of the document (whichever is the shorter) are available for forward matching, and similarly the previous 8K characters (or all the previous characters, if fewer than 8K) are guaranteed to be available for lookbehind assertions.
The key part is the limit on the number of lines that can be matched. The single pattern fragment .* or (.|\n)* can literally match an entire file, so yes, it will fill up its lookahead buffer not just to the next literal (e.g. http) but until it finds the last such literal, because by default regular expression quantifiers are greedy and seek the longest conforming match.
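The greedy behaviour is easy to see on a single line. A small sketch using grep and sed in their `-E` (ERE) mode, which has no lazy quantifiers; the sample text is made up:

```shell
#!/usr/bin/env bash
text='PATTERN one TEXT two TEXT three'

# grep -o prints only the matched part of the line:
# the greedy .* runs on to the LAST "TEXT", not the first one.
matched=$(printf '%s\n' "$text" | grep -oE 'PATTERN.*TEXT')
echo "$matched"

# The same greedy pattern in a sed substitution swallows " one TEXT two" too.
replaced=$(printf '%s\n' "$text" | sed -E 's/PATTERN.*TEXT/[match]/')
echo "$replaced"
```

With a multi-line mode the same rule applies to the whole buffer, which is how a single `(.|\n)*` can end up scanning to the last occurrence of the following literal in the file.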
UPDATE #2: So apparently sed doesn't support non-greedy matching, which makes part of my answer invalid. There are ways around this, but I will not include them here as it's far removed from the original question. The answer to this question is using the --disable-stack-for-recursion flag as described below.
The answer by msw helped me in the right direction.
First I changed the regex to be lazy instead of greedy. By default regex is greedy, which (as msw stated) means that a multiline expression with "PATTERN(.|\n)*TEXT" will search through the whole file. By adding "?" after quantifiers (* -> *?) I made the regex lazy, which means that the "(.|\n)*?" in "PATTERN(.|\n)*?TEXT" will stop expanding at the first TEXT.
I also made the optional parts lazy (? -> ??), though I'm not sure if this was necessary.
However this was not enough. I also had to configure pcregrep to use heap instead of stack memory. I downloaded pcre and configured using the flag --disable-stack-for-recursion. Note that using heap is much slower, so you shouldn't do this if you don't have to.
I'm including a step-by-step in case anyone wanders here with the same problem. Note that I'm still a Linux newb and there's a high chance that I did something unnecessary and/or stupid. The instructions are based on http://www.mail-archive.com/pcre-dev@exim.org/msg00817.html and http://www.linuxfromscratch.org/blfs/view/svn/general/pcre.html
Download pcre from http://downloads.sourceforge.net/pcre/pcre-8.33.tar.bz2
tar jxf pcre-8.33.tar.bz2
cd pcre-8.33
./configure --prefix=/usr --docdir=/usr/share/doc/pcre-8.33 --enable-utf --enable-unicode-properties --enable-pcregrep-libbz2 --disable-static --disable-stack-for-recursion
make
sudo make install
There are some additional steps in the provided guide, but I didn't have to do them.
UPDATE: Making the optional elements lazy (? -> ??) is a mistake, as then they will not be included in the matched pattern if possible.