ignore ANSI colors in pexpect response

Can I use pexpect in a way that ignores ANSI escape codes (especially colors) in the output? I am trying to do this:
expect('foo 3 bar 5')
...but sometimes I get output with ANSI-colored numbers. The problem is I don't know which numbers will have ANSI colors and which won't.
Is there a way to use pexpect but have it ignore ANSI sequences in the response from the child process?

Here's a not entirely satisfying proposal: subclass two methods of the pexpect classes pexpect.Expecter and pexpect.spawn so that incoming data has its escape sequences removed before it is added to the buffer and tested for a pattern match. It is a lazy implementation in that it assumes any escape sequence will always be read atomically; coping with split reads would be more difficult.
# https://stackoverflow.com/a/59413525/5008284
import re

import pexpect
from pexpect.expect import searcher_re


class MyExpecter(pexpect.Expecter):
    # regex for vt100 escape sequences, from https://stackoverflow.com/a/14693789/5008284
    ansi_escape = re.compile(rb'\x1B[@-_][0-?]*[ -/]*[@-~]')

    def new_data(self, data):
        # strip escape sequences before the data reaches the match buffer
        data = self.ansi_escape.sub(b'', data)
        return pexpect.Expecter.new_data(self, data)


class Myspawn(pexpect.spawn):
    def expect_list(self, pattern_list, timeout=-1, searchwindowsize=-1,
                    async_=False):  # spelled "async" in older pexpect releases
        if timeout == -1:
            timeout = self.timeout
        exp = MyExpecter(self, searcher_re(pattern_list), searchwindowsize)
        return exp.expect_loop(timeout)
This assumes you use the expect() call with a list, and do
child = Myspawn("...")
rc = child.expect(['pat1'])
For some reason I had to use bytes rather than strings as I get the data before it is decoded, but that may just be because of a currently incorrect locale environment.
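As a quick sanity check that the vt100 regex really strips colour codes at the byte level, here is a small standalone snippet; the coloured byte string is made up for illustration:
import re

ansi_escape = re.compile(rb'\x1B[@-_][0-?]*[ -/]*[@-~]')
raw = b'foo \x1b[31m3\x1b[0m bar 5'  # "3" wrapped in red colour codes
print(ansi_escape.sub(b'', raw))      # b'foo 3 bar 5'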

This workaround partially defeats the purpose of using pexpect but it satisfies my requirements.
The idea is:
expect anything at all (regex match .*) followed by the next prompt (which in my case is "xsh $ ", hence the backslash in the prompt regex: xsh \$ )
get the after property
trim off the prompt: [1:]
remove ANSI escape codes from that
compare the filtered text with my "expected" response regex
with pexpect.spawn(XINU_CMD, timeout=3, encoding='utf-8') as c:
    # from https://stackoverflow.com/a/14693789/5008284
    ansi_escape = re.compile(r"\x1B[@-_][0-?]*[ -/]*[@-~]")
    system_prompt_wildcard = r".*xsh \$ "  # backslash because prompt is "xsh $ "
    # tests is {command:str, responses:[str]}
    for test in tests:
        c.sendline(test["cmd"])
        response = c.expect([system_prompt_wildcard, pexpect.EOF, pexpect.TIMEOUT])  # => (0|1|2)
        if response != 0:  # any error
            continue
        response_text = c.after.split('\n')[1:]
        for expected, actual in zip(test['responses'], response_text):
            # norm_input is another regex defined elsewhere in the original script (not shown)
            norm_a = ansi_escape.sub('', norm_input.sub('', actual.strip()))
            result = re.compile(norm_a).findall(expected)
            if not len(result):
                print('NO MATCH FOUND')

Related

Piped output from Python gets backed up [duplicate]

Is output buffering enabled by default in Python's interpreter for sys.stdout?
If the answer is positive, what are all the ways to disable it?
Suggestions so far:
Use the -u command line switch
Wrap sys.stdout in an object that flushes after every write
Set PYTHONUNBUFFERED env var
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
Is there any other way to set some global flag in sys/sys.stdout programmatically during execution?
If you just want to flush after a specific write using print, see How can I flush the output of the print function?.
From Magnus Lycka's answer on a mailing list:
You can skip buffering for a whole Python process using python -u or by setting the environment variable PYTHONUNBUFFERED.
You could also replace sys.stdout with some other stream-like wrapper which does a flush after every call.
class Unbuffered(object):
    def __init__(self, stream):
        self.stream = stream

    def write(self, data):
        self.stream.write(data)
        self.stream.flush()

    def writelines(self, datas):
        self.stream.writelines(datas)
        self.stream.flush()

    def __getattr__(self, attr):
        return getattr(self.stream, attr)

import sys
sys.stdout = Unbuffered(sys.stdout)
print('Hello')
I would rather have put my answer in How to flush output of print function? or in Python's print function that flushes the buffer when it's called?, but since they were marked as duplicates of this one (which I do not agree with), I'll answer it here.
Since Python 3.3, print() supports the keyword argument "flush" (see documentation):
print('Hello World!', flush=True)
# reopen stdout file descriptor with write mode
# and 0 as the buffer size (unbuffered)
import io, os, sys
try:
    # Python 3: open as binary, then wrap in a TextIOWrapper with write-through.
    sys.stdout = io.TextIOWrapper(open(sys.stdout.fileno(), 'wb', 0), write_through=True)
    # If flushing on newlines is sufficient, as of 3.7 you can instead just call:
    # sys.stdout.reconfigure(line_buffering=True)
except TypeError:
    # Python 2
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
Credits: "Sebastian", somewhere on the Python mailing list.
Yes, it is.
You can disable it on the commandline with the "-u" switch.
Alternatively, you could call .flush() on sys.stdout on every write (or wrap it with an object that does this automatically)
This relates to Cristóvão D. Sousa's answer, but I couldn't comment yet.
A straight-forward way of using the flush keyword argument of Python 3 in order to always have unbuffered output is:
import functools
print = functools.partial(print, flush=True)
afterwards, print will always flush the output directly (unless flush=False is given).
Note (a) that this answers the question only partially, as it doesn't redirect all output. Still, print is probably the most common way of writing to stdout/stderr in Python, so these two lines cover most use cases.
Note (b) that it only works in the module/script where you defined it, which can be good when writing a module, since it doesn't mess with sys.stdout globally.
Python 2 doesn't provide the flush argument, but you could emulate a Python 3-type print function as described here https://stackoverflow.com/a/27991478/3734258 .
import gc, os, subprocess, sys

def disable_stdout_buffering():
    # Appending to gc.garbage is a way to stop an object from being
    # destroyed.  If the old sys.stdout is ever collected, it will
    # close() stdout, which is not good.
    gc.garbage.append(sys.stdout)
    sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

# Then this will give output in the correct order:
disable_stdout_buffering()
print "hello"
subprocess.call(["echo", "bye"])
Without saving the old sys.stdout, disable_stdout_buffering() isn't idempotent, and multiple calls will result in an error like this:
Traceback (most recent call last):
File "test/buffering.py", line 17, in <module>
print "hello"
IOError: [Errno 9] Bad file descriptor
close failed: [Errno 9] Bad file descriptor
Another possibility is:
def disable_stdout_buffering():
    fileno = sys.stdout.fileno()
    temp_fd = os.dup(fileno)
    sys.stdout.close()
    os.dup2(temp_fd, fileno)
    os.close(temp_fd)
    sys.stdout = os.fdopen(fileno, "w", 0)
(Appending to gc.garbage is not such a good idea because it's where unfreeable cycles get put, and you might want to check for those.)
The following works in Python 2.6, 2.7, and 3.2:
import os
import sys

buf_arg = 0
if sys.version_info[0] == 3:
    os.environ['PYTHONUNBUFFERED'] = '1'
    buf_arg = 1
sys.stdout = os.fdopen(sys.stdout.fileno(), 'a+', buf_arg)
sys.stderr = os.fdopen(sys.stderr.fileno(), 'a+', buf_arg)
Yes, it is enabled by default. You can disable it by using the -u option on the command line when calling python.
In Python 3, you can monkey-patch the print function, to always send flush=True:
_orig_print = print

def print(*args, **kwargs):
    _orig_print(*args, flush=True, **kwargs)
As pointed out in a comment, you can simplify this by binding the flush parameter to a value, via functools.partial:
print = functools.partial(print, flush=True)
You can also run Python with stdbuf utility:
stdbuf -oL python <script>
You can create an unbuffered file and assign this file to sys.stdout.
import sys
myFile = open("a.log", "w", 0)
sys.stdout = myFile
You can't magically change the system-supplied stdout, since it's supplied to your Python program by the OS.
You can also use fcntl to change the file flags on the fly.
import fcntl, os

# fd is the already-open file object whose flags you want to change
fl = fcntl.fcntl(fd.fileno(), fcntl.F_GETFL)
fl |= os.O_SYNC  # or os.O_DSYNC (if you don't care about file timestamp updates)
fcntl.fcntl(fd.fileno(), fcntl.F_SETFL, fl)
One way to get unbuffered output would be to use sys.stderr instead of sys.stdout or to simply call sys.stdout.flush() to explicitly force a write to occur.
You could easily redirect everything printed by doing:
import sys; sys.stdout = sys.stderr
print "Hello World!"
Or to redirect just for a particular print statement:
print >>sys.stderr, "Hello World!"
To reset stdout you can just do:
sys.stdout = sys.__stdout__
It is possible to override only the write method of sys.stdout with one that calls flush. A suggested implementation is below.
def write_flush(args, w=stdout.write):
    w(args)
    stdout.flush()
The default value of the w argument keeps a reference to the original write method. After write_flush is defined, the original write can be overridden:
stdout.write = write_flush
The code assumes that stdout is imported this way: from sys import stdout.
A variant that works without crashing (at least on win32; Python 2.7, IPython 0.12) when called subsequently (multiple times):
def DisOutBuffering():
    if sys.stdout.name == '<stdout>':
        sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
    if sys.stderr.name == '<stderr>':
        sys.stderr = os.fdopen(sys.stderr.fileno(), 'w', 0)
(I've posted a comment, but it got lost somehow. So, again:)
As I noticed, CPython (at least on Linux) behaves differently depending on where the output goes. If it goes to a tty, then the output is flushed after each '\n'
If it goes to a pipe/process, then it is buffered and you can use the flush() based solutions or the -u option recommended above.
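If you want that tty-like behaviour even when stdout is a pipe, here is a minimal sketch; it assumes Python 3.7+ for reconfigure, while older versions would need one of the wrapper approaches shown above:
import sys

# force line buffering when stdout is not a terminal, so each '\n' flushes
if not sys.stdout.isatty():
    sys.stdout.reconfigure(line_buffering=True)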
Slightly related to output buffering:
If you iterate over the lines in the input with
for line in sys.stdin:
    ...
then the for implementation in CPython will collect the input for a while and then execute the loop body for a bunch of input lines. If your script writes output for each input line, this might look like output buffering, but it's actually batching, so none of the flush(), etc. techniques will help.
Interestingly, you don't have this behaviour in pypy.
To avoid this, you can use
while True:
    line = sys.stdin.readline()
    ...
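Filled out (and flushing each line explicitly, in keeping with the rest of this thread), that loop might look like this sketch:
import sys

while True:
    line = sys.stdin.readline()
    if not line:  # readline() returns '' at EOF
        break
    sys.stdout.write(line)
    sys.stdout.flush()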

How to interactively run mount command from a (Ruby) script?

I am trying to write a Ruby script that runs the mount command interactively behind the scenes. The problem is, if I redirect input and output of the mount command to pipes, it doesn't work. Somehow, mount seems to realise that it's not talking directly to stdin/stdout and falls over. Either that, or it's a more wide-ranging problem that would affect all interactive commands; I don't know.
I want to be able to parse the output of mount, line by line, and shove answers into its input pipe when it asks questions. This shouldn't be an unreasonable expectation. Can someone help, please?
Examples:
def read_until(pipe, stop_at, timeoutsec = 10, verbose = false)
  lines = []; line = ""
  while result = IO.select([pipe], nil, nil, timeoutsec)
    next if result.empty?
    begin
      c = pipe.read(1) rescue c = nil
    end
    break if c.nil?
    line << c
    break if line =~ stop_at
    # Start a new line?
    if line[-1] == ?\n
      puts line if verbose
      lines << line.strip
      line = ""
    end
  end
  return lines, line.match(stop_at)
end
cmd = "mount.ecryptfs -f /tmp/1 /tmp/2"
status = Open3::popen2e(cmd) { |i,o,t|
o.fcntl(3, 4) # Set non-blocking (this doesn't make any difference)
i.fcntl(3, 4) # Set non-blocking (this doesn't make any difference)
puts read_until(o, /some pattern/, 1, true) # Outputs [[], nil]
}
I've also tried spawn:
a, b = IO.pipe
c, d = IO.pipe
pid = spawn(cmd, :in=>a, :out=>d)
puts read_until(c, /some pattern/, 1, true) # Outputs [[], nil]
I've tried subprocess, pty and a host of other solutions - basically, if it's on Google, I've tried it. It seems that mount just knows if I'm not passing it a real shell, and deliberately blocks. See:
pid = spawn(cmd, :in=>STDIN, :out=>STDOUT) # Works
pid = spawn(cmd, :in=>somepipe, :out=>STDOUT) # Blocks after first line of output, for no reason whatsoever. It's not expecting any input at this point.
I even tried spawning a real shell (e.g. bash) and sending the mount command to it via an input pipe. Same problem.
Please ignore any obvious errors in the above: I have tried several solutions tonight, so the actual code has been rewritten many times. I wrote the above from memory.
What I want is the following:
Run mount command with arguments, getting pipes for its input and output streams
Wait for first specific question on output pipe
Answer specific question by writing to input pipe
Wait for second specific question on output pipe
...etc...
And so on.
You may find Kernel#system useful. It opens a subshell, so if you are OK with the user just interacting with mount directly, this will make everything much easier.
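If you do need to drive the dialogue programmatically, the underlying issue is usually that mount only behaves interactively when it sees a real terminal, which plain pipes don't provide but a pseudo-terminal does. For comparison with the rest of this page, here is a rough Python pexpect sketch of the ask/answer loop described above; the prompt strings and answers are hypothetical placeholders, not mount.ecryptfs's actual questions:
import pexpect

# pexpect runs the child on a pty, so the child believes it is on a terminal.
# The prompts below are made-up placeholders for illustration only.
child = pexpect.spawn('mount.ecryptfs -f /tmp/1 /tmp/2', timeout=10)
child.expect('Passphrase:')          # wait for the first question
child.sendline('secret')             # answer it
child.expect(r'Selection \[aes\]:')  # wait for the next question
child.sendline('1')
child.expect(pexpect.EOF)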

ruby gedcom parser EOF exception

I need to parse GEDCOM 5.5 files for an analysis project.
The first Ruby parser I found causes a stack level too deep error, so I tried to find alternatives. I found this project: https://github.com/jslade/gedcom-ruby
There are some samples included, but I can't get them to work either.
Here is the parser itself: https://github.com/jslade/gedcom-ruby/blob/master/lib/gedcom.rb
If i try the sample like this:
ruby ./samples/count.rb ./samples/royal.ged
i get the following error:
D:/rails_projects/gedom_test/lib/gedcom.rb:185:in `readchar': end of file reached (EOFError)
I wrote a "gets" in every method for better unterstanding, this is the output till the exception raises:
Parsing './samples/royal.ged'...
INIT
BEFORE
CHECK_PROC_OR_BLOCK
BEFORE
CHECK_PROC_OR_BLOCK
PARSE
PARSE_FILE
PARSE_IO
DETECT_RS
The exact line that causes the trouble is
while ch = io.readchar
in the detect_rs method:
# valid gedcom may use either of \r or \r\n as the record separator.
# just in case, also detects simple \n as the separator as well
# detects the rs for this string by scanning ahead to the first occurence
# of either \r or \n, and checking the character after it
def detect_rs io
  puts "DETECT_RS"
  rs = "\x0d"
  mark = io.pos
  begin
    while ch = io.readchar
      case ch
      when 0x0d
        ch2 = io.readchar
        if ch2 == 0x0a
          rs = "\x0d\x0a"
        end
        break
      when 0x0a
        rs = "\x0a"
        break
      end
    end
  ensure
    io.pos = mark
  end
  rs
end
I hope someone can help me with this.
The readchar method of Ruby's IO class will raise an EOFError when it encounters the end of the file. http://www.ruby-doc.org/core-2.1.1/IO.html#method-i-readchar
The gedcom-ruby gem hasn't been touched in years, but there was a fork of it made a couple of years ago to fix this very problem.
Basically it changes:
while ch = io.readchar
to
while !io.eof && ch = io.readchar
You can get the fork of the gem here: https://github.com/trentlarson/gedcom-ruby

HMAC-SHA1 in bash

Is there a bash script available to generate a HMAC-SHA1 hash?
The equivalent of the following PHP code:
hash_hmac("sha1", "value", "key", TRUE);
Parameters
true : When set to TRUE, outputs raw binary data. FALSE outputs lowercase hexits.
Thanks.
In bash itself, no. Bash can do a lot of stuff, but it also knows when to rely on external tools.
For example, the Wikipedia page provides a Python implementation which bash can call to do the grunt work for HMAC_MD5, repeated below to make this answer self-contained:
#!/usr/bin/env python
from hashlib import md5

trans_5C = "".join(chr(x ^ 0x5c) for x in xrange(256))
trans_36 = "".join(chr(x ^ 0x36) for x in xrange(256))
blocksize = md5().block_size

def hmac_md5(key, msg):
    if len(key) > blocksize:
        key = md5(key).digest()
    key += chr(0) * (blocksize - len(key))
    o_key_pad = key.translate(trans_5C)
    i_key_pad = key.translate(trans_36)
    return md5(o_key_pad + md5(i_key_pad + msg).digest())

if __name__ == "__main__":
    h = hmac_md5("key", "The quick brown fox jumps over the lazy dog")
    print h.hexdigest()  # 80070713463e7749b90c2dc24911e275
(keeping in mind that Python also contains SHA1 stuff as well, see here for details on how to use HMAC with the hashlib.sha1() constructor).
Or, if you want to run the exact same code as PHP does, you could try running it with phpsh, as detailed here.
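The same division of labour works for the HMAC-SHA1 the question actually asks about: Python's standard hmac module can compute the digest and bash can call the script. A minimal sketch (Python 3, using the example "key"/"value" strings from the question):
#!/usr/bin/env python3
import hmac
import hashlib

# equivalent of PHP's hash_hmac("sha1", "value", "key", ...)
digest = hmac.new(b"key", b"value", hashlib.sha1)
print(digest.hexdigest())  # lowercase hex, like PHP's default mode
# digest.digest() gives the raw binary form (PHP's raw_output=TRUE)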

Script working in Python2 but not in Python 3 (hashlib)

I worked today on a simple script to checksum files with all the algorithms available in hashlib (md5, sha1, ...). I wrote it and debugged it with Python 2, but when I decided to port it to Python 3 it just won't work. The funny thing is that it works for small files, but not for big files. I thought there was a problem with the way I was buffering the file, but the error message makes me think it is something related to the way I am doing the hexdigest (I think). Here is a copy of my entire script, so feel free to copy it, use it and help me figure out what the problem is with it. The error I get when checksumming a 250 MB file is
"'utf-8' codec can't decode byte 0xf3 in position 10: invalid continuation byte"
I googled it, but can't find anything that fixes it. Also if you see better ways to optimize it, please let me know. My main goal is to make it work 100% in Python 3. Thanks
#!/usr/local/bin/python33
import hashlib
import argparse

def hashFile(algorithm = "md5", filepaths=[], blockSize=4096):
    algorithmType = getattr(hashlib, algorithm.lower())()  # Default: hashlib.md5()
    # Open file and extract data in chunks
    for path in filepaths:
        try:
            with open(path) as f:
                while True:
                    dataChunk = f.read(blockSize)
                    if not dataChunk:
                        break
                    algorithmType.update(dataChunk.encode())
                yield algorithmType.hexdigest()
        except Exception as e:
            print (e)

def main():
    # DEFINE ARGUMENTS
    parser = argparse.ArgumentParser()
    parser.add_argument('filepaths', nargs="+", help='Specified the path of the file(s) to hash')
    parser.add_argument('-a', '--algorithm', action='store', dest='algorithm', default="md5",
                        help='Specifies what algorithm to use ("md5", "sha1", "sha224", "sha384", "sha512")')
    arguments = parser.parse_args()
    algo = arguments.algorithm
    if algo.lower() in ("md5", "sha1", "sha224", "sha384", "sha512"):
        # remainder mirrors the Python 2 version below
        for hashValue in hashFile(algo, arguments.filepaths):
            print(hashValue)
    else:
        print("Algorithm {0} is not available in this script".format(algo))

if __name__ == "__main__":
    main()
Here is the code that works in Python 2; I will just put it here in case you want to use it without having to modify the one above.
#!/usr/bin/python
import hashlib
import argparse

def hashFile(algorithm = "md5", filepaths=[], blockSize=4096):
    '''
    Hashes a file. In order to reduce the amount of memory used by the script, it hashes the file in chunks instead of putting
    the whole file in memory
    '''
    algorithmType = hashlib.new(algorithm)  # getattr(hashlib, algorithm.lower())() #Default: hashlib.md5()
    # Open file and extract data in chunks
    for path in filepaths:
        try:
            with open(path, mode = 'rb') as f:
                while True:
                    dataChunk = f.read(blockSize)
                    if not dataChunk:
                        break
                    algorithmType.update(dataChunk)
                yield algorithmType.hexdigest()
        except Exception as e:
            print e

def main():
    # DEFINE ARGUMENTS
    parser = argparse.ArgumentParser()
    parser.add_argument('filepaths', nargs="+", help='Specified the path of the file(s) to hash')
    parser.add_argument('-a', '--algorithm', action='store', dest='algorithm', default="md5",
                        help='Specifies what algorithm to use ("md5", "sha1", "sha224", "sha384", "sha512")')
    arguments = parser.parse_args()
    # Call generator function to yield hash value
    algo = arguments.algorithm
    if algo.lower() in ("md5", "sha1", "sha224", "sha384", "sha512"):
        for hashValue in hashFile(algo, arguments.filepaths):
            print hashValue
    else:
        print "Algorithm {0} is not available in this script".format(algo)

if __name__ == "__main__":
    main()
I haven't tried it in Python 3, but I get the same error in Python 2.7.5 for binary files (the only difference is that mine is with the ascii codec). Instead of encoding the data chunks, open the file directly in binary mode:
with open(path, 'rb') as f:
    while True:
        dataChunk = f.read(blockSize)
        if not dataChunk:
            break
        algorithmType.update(dataChunk)
    yield algorithmType.hexdigest()
Apart from that, I'd use the method hashlib.new instead of getattr, and hashlib.algorithms_available to check if the argument is valid.
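A minimal sketch of that suggestion (binary reads so nothing gets decoded, hashlib.new for the constructor, and algorithms_available for validation; the function name here is just illustrative):
import hashlib

def hash_file(algorithm, path, block_size=4096):
    if algorithm not in hashlib.algorithms_available:
        raise ValueError("Algorithm {0} is not available".format(algorithm))
    h = hashlib.new(algorithm)
    with open(path, 'rb') as f:  # binary mode: bytes in, no UnicodeDecodeError
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()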
