I am struggling to find a text comparison tool or algorithm that can compare an expected text against the current state of the text being typed.
I will have an experimentee typewrite a text that he has in front of his eyes. My idea is to compare the current state of the text against the expected text whenever something is typed. That way I want to find out when and what the subject does wrong (I also want to find errors that are not in the resulting text but were in the intermediate text for some time).
Can someone point me in a direction?
Update #1
I have access to the typing data in a csv format:
This is example output data of me typing "foOBar". Every line has the form (timestamp, Key, Press/Release)
17293398.576653,F,P
17293398.6885,F,R
17293399.135282,LeftShift,P
17293399.626881,LeftShift,R
17293401.313254,O,P
17293401.391732,O,R
17293401.827314,LeftShift,P
17293402.073046,O,P
17293402.184859,O,R
17293403.178612,B,P
17293403.301748,B,R
17293403.458137,LeftShift,R
17293404.966193,A,P
17293405.077869,A,R
17293405.725405,R,P
17293405.815159,R,R
In Python
Given your input csv file (I called it keyboard_records.csv)
17293398.576653,F,P
17293398.6885,F,R
17293399.135282,LeftShift,P
17293399.626881,LeftShift,R
17293401.313254,O,P
17293401.391732,O,R
17293401.827314,LeftShift,P
17293402.073046,O,P
17293402.184859,O,R
17293403.178612,B,P
17293403.301748,B,R
17293403.458137,LeftShift,R
17293404.966193,A,P
17293405.077869,A,R
17293405.725405,R,P
17293405.815159,R,R
The following code does the following:
Read its content and store it in a list named steps
For each step in steps recognizes what happened and
If it was a shift press or release sets a flag (shift_on) accordingly
If it was an arrow pressed moves the cursor (index of current where we insert characters) – if it the cursor is at the start or at the end of the string it shouldn't move, that's why those min() and max()
If it was a letter/number/symbol it adds it in curret at cursor position and increments cursor
Here you have it
import csv
steps = [] # list of all actions performed by user
expected = "Hello"
with open("keyboard.csv") as csvfile:
for row in csv.reader(csvfile, delimiter=','):
steps.append((float(row[0]), row[1], row[2]))
# Now we parse the information
current = [] # text written by the user
shift_on = False # is shift pressed
cursor = 0 # where is the cursor in the current text
for step in steps:
time, key, action = step
if key == 'LeftShift':
if action == 'P':
shift_on = True
else:
shift_on = False
continue
if key == 'LeftArrow' and action == 'P':
cursor = max(0, cursor-1)
continue
if key == 'RightArrow' and action == 'P':
cursor = min(len(current), cursor+1)
continue
if action == 'P':
if shift_on is True:
current.insert(cursor, key.upper())
else:
current.insert(cursor, key.lower())
cursor += 1
# Now you can join current into a string
# and compare current with expected
print(''.join(current)) # printing current (just to see what's happening)
else:
# What to do when a key is released?
# Depends on your needs...
continue
To compare current and expected have a look here.
Note: by playing around with the code above and a few more flags you can make it recognize also symbols. This will depend on your keyboard. In mine Shift + 6 = &, AltGr + E = € and Ctrl + Shift + AltGr + è = {. I think this is a good point to start.
Update
Comparing 2 texts isn't a difficult task and you can find tons of pages on the web about it.
Anyway I wanted to present you an object oriented approach to the problem, so I added the compare part that I previously omitted in the first solution.
This is still a rough code, without primary controls over the input. But, as you asked, this is pointing you in a direction.
class UserText:
# Initialize UserText:
# - empty text
# - cursor at beginning
# - shift off
def __init__(self, expected):
self.expected = expected
self.letters = []
self.cursor = 0
self.shift = False
# compares a and b and returns a
# list containing the indices of
# mismatches between a and b
def compare(a, b):
err = []
for i in range(min(len(a), len(b))):
if a[i] != b[i]:
err.append(i)
return err
# Parse a command given in the
# form (time, key, action)
def parse(self, command):
time, key, action = command
output = ""
if action == 'P':
if key == 'LeftShift':
self.shift = True
elif key == 'LeftArrow':
self.cursor = max(0, self.cursor - 1)
elif key == 'RightArrow':
self.cursor = min(len(self.letters), self.cursor + 1)
else:
# Else, a letter/number was pressed. Let's
# add it to self.letters in cursor position
if self.shift is True:
self.letters.insert(self.cursor, key.upper())
else:
self.letters.insert(self.cursor, key.lower())
self.cursor += 1
########## COMPARE WITH EXPECTED ##########
output += "Expected: \t" + self.expected + "\n"
output += "Current: \t" + str(self) + "\n"
errors = UserText.compare(str(self), self.expected[:len(str(self))])
output += "\t\t"
i = 0
for e in errors:
while i != e:
output += " "
i += 1
output += "^"
i += 1
output += "\n[{} errors at time {}]".format(len(errors), time)
return output
else:
if key == 'LeftShift':
self.shift = False
return output
def __str__(self):
return "".join(self.letters)
import csv
steps = [] # list of all actions performed by user
expected = "foobar"
with open("keyboard.csv") as csvfile:
for row in csv.reader(csvfile, delimiter=','):
steps.append((float(row[0]), row[1], row[2]))
# Now we parse the information
ut = UserText(expected)
for step in steps:
print(ut.parse(step))
The output for the csv file above was:
Expected: foobar
Current: f
[0 errors at time 17293398.576653]
Expected: foobar
Current: fo
[0 errors at time 17293401.313254]
Expected: foobar
Current: foO
^
[1 errors at time 17293402.073046]
Expected: foobar
Current: foOB
^^
[2 errors at time 17293403.178612]
Expected: foobar
Current: foOBa
^^
[2 errors at time 17293404.966193]
Expected: foobar
Current: foOBar
^^
[2 errors at time 17293405.725405]
I found the solution to my own question around a year ago. Now i have time to share it with you:
In their 2003 paper 'Metrics for text entry research: An evaluation of MSD and KSPC, and a new unified error metric', R. William Soukoreff and I. Scott MacKenzie propose three major new metrics: 'total error rate', 'corrected error rate' and 'not corrected error rate'. These metrics have become well established since the publication of this paper. These are exaclty the metrics i was looking for.
If you are trying to do something similiar to what i did, e.g. compare the writing performance on different input devices this is the way to go.
Related
I've been working on creating a YAML re-formatter based on ruamel.yaml (which you can see here).
I'm currently using version 0.17.20.
Cleaning up comments and whitespace has been difficult. I want to:
ensure there is only one space before the # for EOL comments
align full line comments with the key or item immediately following
remove duplicate blank lines so there is at most one blank line
To get closer to achieving that, I have a custom Emitter class where I extend write_comment to adjust the comments just before writing with super().write_comment(...). However, the Emitter does not know about which key or item comes next because comments are generally attached as post comments.
As I've studied the ruamel.yaml code to figure out how to do this, I found the rtsc mode (Round Trip Split Comments) which looks fantastic because it separates EOLComment, BlankLineComment and FullLineComment instead of lumping them together.
From what I can tell, the Parser and Scanner have been adjusted to capture the comments. So, loading is (mostly?) implemented with this "NEWCMNT" implementation. But Emitter.write_comment expects CommentToken instead of comment line numbers, so dumping does not work yet.
If I update my Emitter.write_comment method, is that enough to finish dumping? Or what else might be necessary? In one of my tries, I ran into a sys.exit in ScannedComments.assign_eol() - what else is needed to finish that?
PS: I wouldn't normally ask how to collaborate on StackOverflow, but this is not a bug report or a feature request, and I'm trying/failing to use a new (undocumented) feature, so I'm filing this here instead of sourceforge.
rtsc is work in progress cq work started but unfinished. It's internals will almost certainly change.
Two of the three points you indicate can relatively easy be implemented:
set the column of each comment to 0 ( by recursively going over a loaded data structure similar to here ) if the column is before the position of the end of the value on a line, you'll get one space between the value and the column
at the same time doing the recursion in the previous point. Take each comment value and do something like:
value = '\n'.join(line.strip() for line in value.splitlines()
while '\n\n\n' in value:
value = value.replace('\n\n\n', '\n\n')
The indentation to the following element is difficult, depends on the
data structure etc. Given that these are full line comments, I suggest
you do some postprocessing of the YAML document you generate:
find a full line comment, gather full line comments until next line is
not full line comment (i.e. some "real" YAML). Since full line comments
are in column[0] if the previous stuff is applied, you don't have to
track if you are in a (multi-line) literal or folded scalar string where
one of the lines happens to start with #
determine number of spaces
before real YAML and apply these to the full line comments.
import sys
import ruamel.yaml
yaml_str = """\
# the following is a example YAML doc
a:
- b: 42
# collapse multiple empty lines
c: |
# this is not a comment
it is the first line of a block style literal scalar
processing this gobbles a newline which doesn't go into a comment
# that is unless you have a (dedented) comment directly following
d: 42 # and some non-full line comment
e: # another one
# and some more comments to align
f: glitter in the dark near the Tannhäuser gate
"""
def redo_comments(d):
def do_one(comment):
if not comment:
return
comment.column = 0
value = '\n'.join(line.strip() for line in comment.value.splitlines()) + '\n'
while '\n\n\n' in value:
value = value.replace('\n\n\n', '\n\n')
comment.value = value
def do_values(v):
for x in v:
for comment in x:
do_one(comment)
def do_loc(v):
if v is None:
return
do_one(v[0])
if not v[1]:
return
for comment in v[1]:
do_one(comment)
if isinstance(d, dict):
do_loc(d.ca.comment)
do_values(d.ca.items.values())
for val in d.values():
redo_comments(val)
elif isinstance(d, list):
do_values(d.ca.items.values())
for elem in d:
redo_comments(elem)
def realign_full_line_comments(s):
res = []
buf = []
for line in s.splitlines(True):
if not buf:
if line and line[0] == '#':
buf.append(line)
else:
res.append(line)
else:
if line[0] in '#\n':
buf.append(line)
else:
# YAML line, determine indent
count = 0
while line[count] == ' ':
count += 1
if count > len(line):
break # superfluous?
indent = ' ' * count
for cline in buf:
if cline[0] == '\n': # empty
res.append(cline)
else:
res.append(indent + cline)
buf = []
res.append(line)
return ''.join(res)
yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
# yaml.preserve_quotes = True
data = yaml.load(yaml_str)
redo_comments(data)
yaml.dump(data, sys.stdout, transform=realign_full_line_comments)
which gives:
# the following is a example YAML doc
a:
- b: 42
# collapse multiple empty lines
c: |
# this is not a comment
it is the first line of a block style literal scalar
processing this gobbles a newline which doesn't go into a comment
# that is unless you have a (dedented) comment directly following
d: 42 # and some non-full line comment
e: # another one
# and some more comments to align
f: glitter in the dark near the Tannhäuser gate
I have to rotate (like "clockwise") a singe entry of a tuple in a list of tuples. I use this single tuple as a focus.
And when the flag was placed on the end of the list of tuples, it next should be again on the begin.
x = []
x.append(('search','https://www.search.com','1')) # '1' is the focus
x.append(('financials','https://www.stock-exchange.com','0'))
x.append(('fastfood','https://www.burgers.com','0'))
x.append(('tv','https://www.tv.com','0'))
print x
[('search','https://www.search.com','1'),('financials','https://www.stock-exchange.com','0'),('fastfood','https://www.burgers.com','0'),('tv','https://www.tv.com','0')]
What I need to have, is this (abstract displayed)...
- - *
- -
- -
- -
Then I must switch the focus to the "next line"...
- -
- - *
- -
- -
...and later...
- -
- -
- - *
- -
...then...
- -
- -
- -
- - *
...and then this again...
- - *
- -
- -
- -
...and so on.
Sometimes I need to find the focus in my code and extract the data from the first and second entry where the focus (third entry) is currently placed.
With this "one-liner", I can find the focus, and can store all entries to a, b and c:
a,b,c = list(data[[index for (index, a_tuple) in enumerate(x) if a_tuple[2]=='1'][0]])
I like those "one-liners". But changing the focus to the "next line" seems to be not so easy.
newX = [] # Create a new List for x
if '1' in x[len(x)-1][2]: # Determine whether the focus is currently placed on the end, and if Yes, place it to the begin
for b in range(0,len(x),1): # Parse
if b == 0: # Set the focus on the begin
newX.append((x[b][0],x[b][1],'1')) # Set new focus to new list
else: # No focus for all other entries
newX.append((x[b][0],x[b][1],"0")) # Set "no-focus" to all other entries in new list
else: # The focus was not on the end. Where is it?
a = 0 # A little helper
for b in range(0,len(x),1): # Parse again
if '1' in x[b][2]: # Focus found
a = b # Set the current tuple-number to a
break # Already found... don't go on with for/next
for b in range(0,len(x),1): # Parse again
if b == a+1: # Set the new focus on next entry as it was before
newX.append((x[b][0],x[b][1],'1')) # Set new focus to new list
else: # No Focus for all other entries
newX.append((x[b][0],x[b][1],"0")) # Set "no-focus" to all other entries in new list
x = newX # Set x with the new list
del newX,a,b # Save some memory
I managed to do this, but I don't like my code. This task looks so simple that I think there also has to be a "one-liner" for it, something that is embedded internally in Python and that is exactly intended for this. I have v2.7. Anyone an idea?
If I understand you correctly, you want simple modulo (%) operator.
Remove the third item from the tuple, make new variable named selected that will hold the value of selected item:
lst = [
("search", "https://www.search.com"),
("financials", "https://www.stock-exchange.com"),
("fastfood", "https://www.burgers.com"),
("tv", "https://www.tv.com"),
]
selected = 0
def next_selected(lst, cur_selected):
return (cur_selected + 1) % len(lst)
selected = next_selected(lst, selected) # will wrap around
Then getting the selected item is simply:
a, b = lst[selected]
I think I've got it
lst = {
0: ("0","search", "https://www.search.com"),
1: ("0", "financials", "https://www.stock-exchange.com"),
2: ("*", "fastfood", "https://www.burgers.com"),
3: ("0", "tv", "https://www.tv.com"),
}
def next_selected(lst, cur_selected):
return (cur_selected + 1) % len(lst)
for index, key in enumerate(lst): # parse
if lst[index][0] == "*": # found focus
break
lst[index] = "0",lst[index][1],lst[index][2] # delete old focus
selected = index # use index from above parsing
selected = next_selected(lst, selected) # will wrap around
lst[selected] = '*',lst[selected][1],lst[selected][2] # store focus to selected
a,b = lst[selected][1],lst[selected][2] # get values of selected
This program displays a home circuit breaker panel. the user can view what is on each breaker on the panel (data taken from an imported dictionary of entered breaker panel info) or the user can check what breakers control any list zone (kitchen basement, etc) The breakerville program closes when the user decides and is supposed to play a wave file at the close. It doesn't play after the program is made into an exe with pyinstaller just the windows 'beep'.
I am suspecting that I may need to edit the spec file to get the wave file to work after compiled. Is this correct and if so how? Do I need to modify the spec file?
from playsound import playsound # CURRENTLY USING
from chart import chart
from BreakerZones import BreakerZones
import time
import sys
import colorama
import yaml # to print the nested_lookup results(n) on separate lines
from nested_lookup import nested_lookup, get_all_keys # importing 2 items from nested_lookup
from colorama import Fore, Back, Style
colorama.init(autoreset=True) # If you don't want to print Style.RESET_ALL all the time,
# reset automatically after each print statement with True
print(colorama.ansi.clear_screen())
print('\n'*4) # prints a newline 4 times
print(Fore.MAGENTA + ' Arriving-' + Fore.GREEN + ' *** BREAKERVILLE USA ***')
def main():
print('\n' * 2)
print(Fore.BLUE + ' Breaker Numbers and Zones')
k = get_all_keys(BreakerZones)
# raw amount of keys even repeats , has quotes
new_l = [] # eliminate extra repeating nested keys
for e in k: # has quotes
if e not in new_l and sorted(e) not in new_l: #
new_l.append(e) #
print()
new_l.sort() # make alphabetical
newer_l = ('%s' % ', '.join(map(str, new_l)).strip("' ,")) # remove ['%s'] brackets so they don't show up when run
print(' ', yaml.dump(newer_l, default_flow_style=False)) # strip("' ,") or will see leading "' ," in output
print(Fore.BLUE + ' ENTER A BREAKER # OR ZONE', Fore.GREEN + ': ', end='')
i = input().strip().lower() # these lines is workaround for the colorama
print() # user input() issue of 'code' appearing in screen output
if i in k:
n = (nested_lookup(i, BreakerZones, wild=False, with_keys=False)) # wild=True means key not case sensitive,
print(yaml.dump(n, default_flow_style=False)) # 'with_keys' returns values + keys also
# for key, value in n.items(): eliminated by using yaml
# print(key, '--', value) eliminated by using yaml
else:
print(Fore.YELLOW + ' Typo,' + Fore.GREEN + ' try again')
main()
print()
print(Fore.GREEN + ' Continue? Y or N: C for breaker chart : ', end='') # see comments ENTER A BREAKER
ans = input().strip().lower() # strip() removes any spaces before or after user input
if ans == 'c':
chart()
print()
print(Fore.GREEN + ' Continue? Y or N : ', end='')
ans = input().strip().lower() # strip() removes any spaces before or after user input
if ans == 'y': # shorter version 'continue Y or N' after printing breaker chart
main()
else:
print()
print(Fore.MAGENTA + ' Departing -' + Fore.GREEN + ' *** BREAKERVILLE ***')
playsound('train whistle.wav')
time.sleep(2) # delay to exit program
sys.exit()
elif ans != 'y':
print()
print(Fore.MAGENTA + ' Good Day -' + Fore.GREEN + ' *** BREAKERVILLE ***')
playsound('train whistle.wav') #CURRENTLY USING
time.sleep(2) # delay to exit program
sys.exit()
else:
main()
main()
For the records: The issue is fixed by providing full path to the sound file.
This is probably linked to the implementation of playsound and how it determines what is the current working directory. Please refer to https://pyinstaller.readthedocs.io/en/stable/runtime-information.html#run-time-information for a better understanding of that topic with pyinstaller
So I have a ZIP reader library, and I read ZIP files by first figuring out where the EOCD record is (the standard way "from the tail"). I have to look for a pattern that is roughly this:
4byte_magic_number, fixed_n_bytes, 2_bytes_of_comment_size, comment
The bytesize of comment is provided in the 2_bytes_of_comment_size. Just scanning for the magic number is insufficient, because I eager-read a substantial portion at the tail of the file - basically the maximum size the ZIP EOCD record can be, and then look for this pattern in there.
So far, I came up with this
def locate_eocd_signature(in_str)
# We have to scan from the _very_ tail. We read the very minimum size
# the EOCD record can have (up to and including the comment size), using
# a sliding window. Once our end offset matches the comment size we found our
# EOCD marker.
eocd_signature_int = 0x06054b50
unpack_pattern = 'VvvvvVVv'
minimum_record_size = 22
end_location = minimum_record_size * -1
loop do
# If the window is nil, we have rolled off the start of the string, nothing to do here.
# We use negative values because if we used positive slice indices
# we would have to detect the rollover ourselves
break unless window = in_str[end_location, minimum_record_size]
window_location = in_str.bytesize + end_location
unpacked = window.unpack(unpack_pattern)
# If we found the signature, pick up the comment size, and check if the size of the window
# plus that comment size is where we are in the string. If we are - bingo.
if unpacked[0] == 0x06054b50 && comment_size = unpacked[-1]
assumed_eocd_location = in_str.bytesize - comment_size - minimum_record_size
# if the comment size is where we should be at - we found our EOCD
return assumed_eocd_location if assumed_eocd_location == window_location
end
end_location -= 1 # Shift the window back, by one byte, and try again.
end
end
but it just screams ugly at me. Is there a better way to do something like this? Is there a pack specifier that says "all the bytes in binary until the the end of the string" that I do not know of? Then I could tack that onto the end of the pack specifier for example... A bit at loss here.
In the end I opted for the following optimization. First, I made a method for finding all the indices of a given substring in a string - there is no stdlib builtin for this.
def all_indices_of_substr_in_str(of_substring, in_string)
last_i = 0
found_at_indices = []
while last_i = in_string.index(of_substring, last_i)
found_at_indices << last_i
last_i += of_substring.bytesize
end
found_at_indices
end
Then, we use it to "latch" onto the offsets in our buffer where our signature was found.
def locate_eocd_signature(in_str)
eocd_signature = 0x06054b50
eocd_signature_str = [eocd_signature].pack('V')
unpack_pattern = 'VvvvvVVv'
minimum_record_size = 22
str_size = in_str.bytesize
indices = all_indices_of_substr_in_str(eocd_signature_str, in_str)
indices.each do |check_at|
maybe_record = in_str[check_at..str_size]
# If the record is smaller than the minimum - we will never recover anything
break if maybe_record.bytesize < minimum_record_size
# Now we check if the record ends with the combination
# of the comment size and an arbitrary byte string of that size.
# If it does - we found our match
*_unused, comment_size = maybe_record.unpack(unpack_pattern)
if (maybe_record.bytesize - minimum_record_size) == comment_size
return check_at # Found the EOCD marker location
end
end
# If we haven't caught anything, return nil deliberately instead of returning the last statement
nil
end
I am working on a larger project to write a code so the user can play Connect 4 against the computer. Right now, the user can choose whether or not to go first and the board is drawn. While truing to make sure that the user can only enter legal moves, I have run into a problem where my function legal_moves() takes 1 positional argument, and 0 are given, but I do not understand what I need to do to male everything agree.
#connect 4
#using my own formating
import random
#define global variables
X = "X"
O = "O"
EMPTY = "_"
TIE = "TIE"
NUM_ROWS = 6
NUM_COLS = 8
def display_instruct():
"""Display game instructions."""
print(
"""
Welcome to the second greatest intellectual challenge of all time: Connect4.
This will be a showdown between your human brain and my silicon processor.
You will make your move known by entering a column number, 1 - 7. Your move
(if that column isn't already filled) will move to the lowest available position.
Prepare yourself, human. May the Schwartz be with you! \n
"""
)
def ask_yes_no(question):
"""Ask a yes or no question."""
response = None
while response not in ("y", "n"):
response = input(question).lower()
return response
def ask_number(question,low,high):
"""Ask for a number within range."""
#using range in Python sense-i.e., to ask for
#a number between 1 and 7, call ask_number with low=1, high=8
low=1
high=NUM_COLS
response = None
while response not in range (low,high):
response=int(input(question))
return response
def pieces():
"""Determine if player or computer goes first."""
go_first = ask_yes_no("Do you require the first move? (y/n): ")
if go_first == "y":
print("\nThen take the first move. You will need it.")
human = X
computer = O
else:
print("\nYour bravery will be your undoing... I will go first.")
computer = X
human = O
return computer, human
def new_board():
board = []
for x in range (NUM_COLS):
board.append([" "]*NUM_ROWS)
return board
def display_board(board):
"""Display game board on screen."""
for r in range(NUM_ROWS):
print_row(board,r)
print("\n")
def print_row(board, num):
"""Print specified row from current board"""
this_row = board[num]
print("\n\t| ", this_row[num], "|", this_row[num], "|", this_row[num], "|", this_row[num], "|", this_row[num], "|", this_row[num], "|", this_row[num],"|")
print("\t", "|---|---|---|---|---|---|---|")
# everything works up to here!
def legal_moves(board):
"""Create list of column numbers where a player can drop piece"""
legals = []
if move < NUM_COLS: # make sure this is a legal column
for r in range(NUM_ROWS):
legals.append(board[move])
return legals #returns a list of legal columns
#in human_move function, move input must be in legal_moves list
print (legals)
def human_move(board,human):
"""Get human move"""
legals = legal_moves(board)
print("LEGALS:", legals)
move = None
while move not in legals:
move = ask_number("Which column will you move to? (1-7):", 1, NUM_COLS)
if move not in legals:
print("\nThat column is already full, nerdling. Choose another.\n")
print("Human moving to column", move)
return move #return the column number chosen by user
def get_move_row(turn,move):
move=ask_number("Which column would you like to drop a piece?")
for m in range (NUM_COLS):
place_piece(turn,move)
display_board()
def place_piece(turn,move):
if this_row[m[move]]==" ":
this_row.append[m[move]]=turn
display_instruct()
computer,human=pieces()
board=new_board()
display_board(board)
move= int(input("Move?"))
legal_moves()
print ("Human:", human, "\nComputer:", computer)
Right down the bottom of the script, you call:
move= int(input("Move?"))
legal_moves()
# ^ no arguments
This does not supply the necessary board argument, hence the error message.