I'm trying to address an issue with an Expect script that logs into a very large number of devices (thousands). The script is about 1500 lines and fairly involved; its job is to audit managed equipment on a network with many thousands of nodes. As a result, it logs into the devices via telnet, runs commands to check on the health of the equipment, logs this information to a file, and then logs out to proceed to the next device.
This is where I'm running into my problem; every expect in my script includes a timeout and an eof like so:
timeout {
lappend logmsg "$rtrname timed out while <description of expect statement>"
logmessage
close
wait
set session 0
continue
}
eof {
lappend logmsg "$rtrname disconnected while <description of expect statement>"
logmessage
set session 0
continue
}
My final expect closes each spawn session manually:
-re "OK.*#" {
close
send_user "Closing session... "
wait
set session 0
send_user "closed.\n\n"
continue
}
The continues bring the script back to the while loop that initiates the next spawn session, assuming session = 0.
The set session 0 tracks when a spawn session closes either manually by the timeout or via EOF before a new spawn session is opened, and everything seems to indicate that the spawn sessions are being closed, yet after a thousand or so spawned sessions, I get the following error:
spawn telnet <IP removed>
too many programs spawned? could not create pipe: too many open files
Now, I'm a network engineer, not a UNIX admin or professional programmer, so can someone help steer me towards my mistake? Am I closing telnet spawn sessions but not properly closing a channel? I wrote a second, test script, that literally just connects to devices one by one and disconnects immediately after a connection is formed. It doesn't log in or run any commands as my main script does, and it works flawlessly through thousands of connections. That script is below:
#!/usr/bin/expect -f
#SPAWN TELNET LIMIT TEST
set ifile [open iad.list]
set rtrname ""
set sessions 0
while {[gets $ifile rtrname] != -1} {
set timeout 2
spawn telnet $rtrname
incr sessions
send_user "Session# $sessions\n"
expect {
"Connected" {
close
wait
continue
}
timeout {
close
wait
continue
}
eof {
continue
}
}
In my main script I'm logging every single connection and why they may EOF or timeout (via the logmessage process which writes a specific reason to a file), and even when I see nothing but successful spawned connections and closed connections, I get the same problem with my main script but not the test script.
I've been doing some reading on killing process IDs, but as I understand it, close should be killing the process ID of the current spawn session, and wait should be halting the script until the process is dead. I've also tried using a simple "exit" command from the devices to close the telnet connection, but this doesn't produce any better results.
I may simply need a suggestion on how to better track the opening and closing of my sessions and ensure that, between devices, no spawn sessions remain open. Any help that can be offered will be much appreciated.
Thank you!
The Error?
spawn telnet too many programs spawned? could not create
pipe: too many open files
This error is likely due to your system running out of file handles (or at least exhausting the count available to you).
I suspect the reason for this is abandoned telnet sessions which are left open.
Now let's talk about why they may still be hanging around.
Not Even, Close?
Close may not actually close the telnet connection, especially if telnet doesn't recognize the session has been closed, only expect's session with telnet (See: The close Command). In this case Telnet is most likely being kept alive waiting for more input from the network side and by a TCP keepalive.
Not all applications recognize close, which is presented as an EOF to the receiving application. Because of this they may remain open even when their input has been closed.
Tell "Telnet", It's Over.
In that case, you will need to interrupt telnet. If your intent is to complete some work and exit. Then that is exactly what we'll need to do.
For "telnet" you can cleanly exit by issuing a "send “35\r”" (which would be "ctrl+]" on the keyboard if you had to type it yourself) followed by "quit" and then a carriage return. This will tell telnet to exit gracefully.
Expect script: start telnet, run commands, close telnet
Excerpt:
#!/usr/bin/expect
set timeout 1
set ip [lindex $argv 0]
set port [lindex $argv 1]
set username [lindex $argv 2]
set password [lindex $argv 3]
spawn telnet $ip $port
expect “‘^]’.”
send – – “\r”
expect “username:” {
send – – “$username\r”
expect “password:”
send – – “$password\r”
}
expect “$”
send – – “ls\r”
expect “$”
sleep 2
# Send special ^] to telnet so we can tell telnet to quit.
send “35\r”
expect “telnet>”
# Tell Telnet to quit.
send – – “quit\r”
expect eof
# You should also, either call "wait" (block) for process to exit or "wait -nowait" (don't block waiting) for process exit.
wait
Wait, For The Finish.
Expect - The wait Command
Without "wait", expect may sever the connection to the process prematurely, this can cause the creation zombies in some rare cases. If the application did not get our signal earlier (the EOF from close), or if the process doesn't interpret EOF as an exit status then it may also continue running and your script would be none the wiser. With wait, we ensure we don't forget about the process until it cleans up and exits.
Otherwise, we may not close any of these processes until expect exits. This could cause us to run out of file handles if none of them close for a long running expect script (or one which connects to a lot of servers). Once we run out of file handles, expect and everything it started just dies, and you won't see those file handles exhausted any longer.
Timeout?, Catch all?, Why?
You may also want to consider using a "timeout" in case that the server doesn't respond when expected so we can exit early. This is ideal for severely lagged servers which should instead get some admin attention.
Catch all can help your script deal with any unexpected responses that don't necessarily prevent us from continuing. We can choose to just continue processing, or we could choose to exit early.
Expect Examples Excerpt:
expect {
"password:" {
send "password\r"
} "yes/no)?" {
send "yes\r"
set timeout -1
} timeout {
exit
# Below is our catch all
} -re . {
exp_continue
#
} eof {
exit
}
}
Related
the question of today is:
If i have an expect script which simply automate some basic telnet operations but sometimes the server "bad act" and close the
connections can i handle it and avoid to wait for some timeout
occurs?
Some specifications:
The string that i got printed on the console, when the connection is dropped by the server, is the classic "Connection closed by foreign host."
The operations which the expect telnet automation must send are the following:
Authentication phase:
← Wait for the prompt
→ Send the username
← Wait for the password
→ Send the password
← Wait for the prompt
Command and output phase:
→ Send a command string (generally something very simple such a word: "voltage", "temp")
← Wait the whole output generally wait until the next prompt do the job. (the output contains many symbols and variable values so the wait for prompt seems a good strategy.)
Connection closing phase:
→ Send the "exit" command
← Wait for the legit "Connection closed by foreign host."
NOTE:
The "Connection closed by foreign host." can occur in any moment, eg in any phase. In detail, i'm interested to understand if is possible to fix this when i'm waiting for prompt or output (steps 1,3,5,7) insted of waiting for the connection termination.
Best regards to everyone and thanks to who will try to help!
Stay safe,
Luca
After the spawn you can setup an expect_before command (once) that will be executed every time your script does an expect. Eg
expect_before "Connection closed" { send_user "Unexpected close" ; exit 1 }
See example
I am using an Expect script to automate the installation of a program. Because of an issue with one of the install dependencies, I need to pause the installation process at a specific point to edit the db.properties file. Once that file is changed, I can resume the installation process. I can spawn a new process in the middle of the installation to do this, but I get the "spawn id exp5 not open" error after closing that process.
db_edit.sh edits the appropriate file:
#!/usr/bin/sh
filename=db.properties
sed -i "s/<some_regex>/<new_db_info>/g" $filename
My automated installation script spawns the above script in the middle of its execution:
#!/usr/bin/expect
# Run the installer and log the output
spawn ./install.bin
log_file install_output.log
# Answer installer questions
# for simplicity, let's pretend there is only one
expect "PRESS <ENTER> TO CONTINUE:"
send "\r"
# Now I need to pause the installation and edit that file
spawn ./db_edit.sh
set db_edit_ID $spawn_id
close -i $db_edit_ID
send_log "DONE"
# Database Connection - the following must happen AFTER the db_edit script runs
expect "Hostname (Default: ):"
send "my_host.com\r"
# more connection info ...
expect eof
The output log install_output.log shows the following error:
PRESS <ENTER> TO CONTINUE: spawn ./db_edit.sh^M
DONEexpect: spawn id exp5 not open
while executing
"expect "Hostname (Default: ):""^M
The database info has been modified correctly, so I know the script works and it is indeed spawned. However, when I close that process, the spawn id of the installation process is also apparently closed, which causes the spawn id exp5 not open error.
Also curious is that the spawn appears to happen before it should. The response to "PRESS <ENTER>" should be "\r" or ^M to indicate that ENTER was sent.
How can I fix this to resume the installation script after closing db_edit.sh?
There is no need to automate any interactivity with that script, so don't use spawn
exec db_edit.sh
This way, you're not interfering with the spawn_id of the currently spawned process.
The expect has been installed, it_a_test is the vps password.
scp /home/wpdatabase_backup.sql root#vps_ip:/tmp
The command can transfer file /home/wpdatabase_backup.sql into my vps_ip:/tmp.
Now i rewrite the process into the following code:
#!/usr/bin/expect -f
spawn scp /home/wpdatabase_backup.sql root#vps_ip:/tmp
expect "password:"
send it_is_a_test\r
Why can't transfer my file into remote vps_ip with expect?
Basically, expect will work with two feasible commands such as send and expect. In this case, if send is used, then it is mandatory to have expect (in most of the cases) afterwards. (while the vice-versa is not required to be mandatory)
This is because without that we will be missing out what is happening in the spawned process as expect will assume that you simply need to send one string value and not expecting anything else from the session, making the script exits and causing the failure.
So, you just have to add one expect to wait for the closure of the scp command which can be performed by waiting for eof (End Of File).
#!/usr/bin/expect -f
spawn scp /home/wpdatabase_backup.sql root#vps_ip:/tmp
expect "password:"
send "it_is_a_test\r"
expect eof; # Will wait till the 'scp' completes.
Note :
The default timeout in expect is 10 seconds. So, if the scp completes, within 10 seconds, then no problem. Suppose, if the operation takes more than that, then expect will timeout and quit, which makes failure in scp transfer. So, you can set increase timeout if you want which can be modified as
set timeout 60; # Timeout is 1 min
I have an expect script that logs into several computers through ssh and starts programs. It has been working fine for a while from but now suddenly a problem has appeared.
It happens at the same time every run; after logging out of a certain computer it attempts to log into the next one before the prompt is ready. That is, the lines
#!/usr/bin/expect
set multiPrompt {[#>$] }
(...)
expect -re $multiPrompt
send "exit\r"
expect -re $multiPrompt
spawn ssh name#computer4.place.com
which should (and normally does) give the result
name#computer3:~$ exit
logout
Connection to computer3.place.com closed.
name#computer1:~$ ssh name#computer4.place.com
instead gives
name#computer3:~$ exit
logout
ssh name#computer4.place.com
Connection to computer3.thphys.nuim.ie closed.
name#computer1:~$
and then it goes bananas. In other words the ssh ... doesn't wait for the prompt to appear.
I am using autossh with a monitoring flag. autossh prints to standard output every time it send test packets.
When using autossh under expect the text packets messages are not printed.
I don't know if they are sent at all which is important to keep the ssh connection alive.
can you tell if "expect" effects the autossh behavior ?
how can I figure out if autossh works correctly ?
the expect code:
#!/usr/bin/expect
set timeout 50
spawn autossh -M 11111 -N -R 4848:localhost:80 user#192.168.1.100
set keepRunning 1
while {$keepRunning} {
expect \
{
"(yes/no)" { send "yes\r" }
"Password:*" { send "1234\r" ; set keepRunning 0 }
"ssh exited prematurely with" { exit 7 }
"remote port forwarding failed*" { exit 8 }
}
}
expect \
{
"remote port forwarding failed*" { exit 9 }
"Password:*" { exit 5 }
}
wait
the periodic output that I see without expect is this:
autossh[2882]: checking for grace period, tries = 0
autossh[2882]: starting ssh (count 1)
autossh[2883]: execing /pfrm2.0/bin/ssh
autossh[2882]: ssh child pid is 2883
autossh[2882]: check on child 2883
autossh[2882]: set alarm for 50 secs
Password: autossh[2882]: connection ok
**autossh[2882]: check on child 2883
autossh[2882]: set alarm for 60 secs
autossh[2882]: connection ok
autossh[2882]: check on child 2883
autossh[2882]: set alarm for 60 secs**
The last 5 lines are test packets sent by autossh.
That lines are printed only when running autossh from bash directly.
When running from using "expect" those lines are not printed and I dont know if autossh sends them.
Thanks.
Frankly, I have not worked on autossh before. But, as per your code, expect will check for the patterns given by you, such as logging in by means of providing the password. If it happen to see an sort of error message. it will exit.
Then at the end. you have added wait. It delays until a spawned process (or the current process if none is named) terminates. I am not sure whether those prints will simply get printed in terminal. In that case, after logging in, you can make the expect to wait for it just like how you did for logging in. If those prints will be seen after providing any sort of command, then make sure you passing it by send and then wait for those patterns.
Unless until, you instruct expect to expect for a pattern, it won't bother waiting for anything which is why it won't be seen in output as well.
I've found out that wait somehow prevents the "autossh" from monitoring the connection.
The "wait" in the end of the "expect" script was replaced with "interact" and it is working fine now.
* Thanks Dinesh
(This is more detailed description and answer.)
What I wanted:
ssh tunnel using autossh and expect and a wrapper bash script.
The bash script should run in background.
when the ssh quits the script should start the tunnel again.
The problems with expect:
After establishing the ssh-tunnel I have tried using the following
interact - when the bash script was running in background interact hangs so expect wasn't waiting for autossh and I couldn't monitor the autossh (without a sleep loop)
wait - for some reason it was preventing autossh monitoring mechanism from working properly.
-nothing- - expect just quits and I couldn't monitor autossh
the solution was using this loop end "expecting" a "connection ok" string
set moreThanAutoSshTimeout [expr {$env(AUTOSSH_POLL) + 10}]
set timeout $moreThanAutoSshTimeout
set connStatus 1
while { $connStatus } {
set connStatus 0
expect \
{
"connection ok" { set connStatus 1 }
}
}