ARM L2 Cache access count differs when stepping through code - performance

I am using OpenOCD to run a program on an embedded board with an ARM Cortex-A9 (a Zynq-7010; I'm using a single core running a baremetal program).
I would like to be able to figure out if each memory access results in a cache hit or a cache miss. The L2 Cache Controller, an L2C-310, provides two counters which I have configured to count Data Read Hits and Data Read Lookups (which result in either a hit or a miss).
The problem is that I get different results when I execute the program without stopping compared to when I execute the program and step through each line. My specific question is: Why is there a difference and is there a way that I can make the output uniform? More generally: Is it possible to determine if a specific line of assembly resulted in a cache hit?
I am controlling OpenOCD using a python script. I am disabling the L1 caches (not shown), turning off pre-fetching, and (I think) turning off the pre-load engine. Here is the code I am running when executing the program without stopping:
# At this point, the program has been loaded and allowed to execute up until the start of the code section I am measuring
telnet = Telnet('127.0.0.1', 4444, 300)
end_addr = 0x100930 # End of code section I am measuring
### 1) Pre-Fetch / Pre-Load stuff ####################
# Turning off the pre-fetch option. Should read, change mask, then write
print('\tSetting pre-fetch option to off\n')
telnet.write('arm mcr 15 0 1 0 1\n')
response = telnet.read_until('arm mcr 15 0 1 0 1\r\n')
response = telnet.read_until('\r>')
print('Response: %s' % (response))
actlr = int(response[:-2])
print('actlr: %d' % (actlr))
actlr = actlr & (~(1 << 2)) # Disable L1 Pre-fetch
actlr = actlr & (~(1)) # Disable TLB / Cache maintenance fwd
print('actlr: %d' % (actlr))
# write to telnet
telnet.write('arm mrc 15 0 1 0 1 ' + str(actlr) + '\n')
telnet.write('arm mcr 15 0 1 0 1\n')
response = telnet.read_until('arm mcr 15 0 1 0 1\r\n')
response = telnet.read_until('\r>')
print('Response: %s' % (response))
print('attempting to pause pre-load engine:')
telnet.write('arm mrc 15 0 11 3 0 0\n')
######################################################
### 2) Enable L2CC counters ##########################
telnet.write('mww 0xF8F02200 0x1\n') # enable counting
# confirm enabled
telnet.write('mdw 0xF8F02200\n')
response = telnet.read_until('mdw 0xF8F02200\r\n')
response = telnet.read_until('\r>')
print('Response should be 1: %s' % (response))
# Counter 0: 0011 to set source to Data Lookup, 01 for increment
telnet.write('mww 0xF8F02208 0xD\n')
# confirm counter 0 settings
telnet.write('mdw 0xF8F02208\n')
response = telnet.read_until('mdw 0xF8F02208\r\n')
response = telnet.read_until('\r>')
print('Response should be 0xD: %s' % (response))
# Counter 1: 0010 to set source to Data Read Hit, 01 for increment
telnet.write('mww 0xF8F02204 0x9\n')
# confirm counter 1 settings
telnet.write('mdw 0xF8F02204\n')
response = telnet.read_until('mdw 0xF8F02204\r\n')
response = telnet.read_until('\r>')
print('Response should be 0x9: %s' % (response))
######################################################
### 3) Different from step mode ######################
print("Setting end tag to 0x%x" % (end_addr))
# Set breakpoint to end of section
telnet.write('bp ' + str(end_addr) + ' 1 hw\n')
telnet.write('resume\n')
response = telnet.read_until('resume\r\n', 1)
response = telnet.read_until('>', 1)
# Allow program to execute
sleep(10)
######################################################
### 4) Check L2CC counters ###########################
# Counter 0: number of data lookups
telnet.write('mdw 0xF8F02210\n')
response = telnet.read_until('mdw 0xF8F02210\r\n')
response = telnet.read_until('\r>')
print('Data Lookups: %s' % (response))
# Counter 1: check number of data read hits
telnet.write('mdw 0xF8F0220C\n')
response = telnet.read_until('mdw 0xF8F0220C\r\n')
response = telnet.read_until('\r>')
print('Data Read Hits: %s' % (response))
system('exit')
The results of which are:
Data Lookups: 0xf8f02210: 00000660
Data Read Hits: 0xf8f0220c: 000004c2
When I run the code by stepping through each instruction, the number of data lookups ends up being the same as the number of read hits... so where did those cache misses go?
Here is the changed code (chunk number 3):
### 3) Different from continuous mode ##############
print("Setting end tag to 0x%x" % (end_addr))
cur_addr = step()
# Run until the end tag
while cur_addr != end_addr:
    cur_addr = step()
And the step() function:
# Commands OpenOCD to step the processor forward
# Returns the new pc address
# Sorry for the messy error checking!
def step():
    telnet.write('step\n')
    response = telnet.read_until('step\r\n', 1)
    # print('Step: %s ****' % (response))
    # Skip line of output
    response = telnet.read_until('\n', 1)
    # print('Step: %s ****' % (response))
    # 2nd line has the pc, 4th column
    response = telnet.read_until('\n', 1)
    # print('Step: %s ****' % (response))
    try:
        cur_addr = int(response.strip().split()[3], 16)
    except (ValueError, IndexError) as e:
        print("Error step() on line: ", response)
        print("\tException: ", e)
        countdown = 20
        error = True
        while (countdown > 0):
            countdown = countdown - 1
            response = telnet.read_until('\n', 1)
            # print("Retry on line: ", response)
            if ('cpsr' in response):  # TODO: Maybe just read until this anyways?
                print("Found a match.\n")
                cur_addr = int(response.strip().split()[3], 16)
                countdown = 0
                error = False
        if error:
            # try sending again?
            telnet.write('step\n')
            response = telnet.read_until('step\r\n', 1)
            print('Step: %s ****' % (response))
            # Skip line of output
            response = telnet.read_until('\n', 1)
            print('Step: %s ****' % (response))
            # 2nd line has the pc, 4th column
            response = telnet.read_until('\n', 1)
            print("Failed to recover from error? Retried:", response)
            cur_addr = int(response.strip().split()[3], 16)  # This line should fail
    response = telnet.read_until('>', 1)  # skip remaining
    return cur_addr
And the output:
Data Lookups: 0xf8f02210: 000004c2
Data Read Hits: 0xf8f0220c: 000004c2
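For the more general question (attributing hits and misses to a single line of assembly), one idea would be to sample both counters around every step and look at the deltas. Below is an untested sketch that reuses the telnet session, the step() function, and the counter addresses from above; read_counter() is a hypothetical helper and not part of my script:
# Hypothetical helper: read one L2C-310 event counter over the existing
# OpenOCD telnet session (same 'mdw' pattern as in the script above).
def read_counter(addr):
    telnet.write('mdw 0x%08x\n' % addr)
    telnet.read_until('mdw 0x%08x\r\n' % addr)
    response = telnet.read_until('\r>')
    # OpenOCD prints e.g. "0xf8f02210: 00000660"; keep the value after the colon
    return int(response.split(':')[1].split()[0], 16)
# Sample both counters, step one instruction, and report the deltas.
lookups_before = read_counter(0xF8F02210)  # counter 0: data lookups
hits_before = read_counter(0xF8F0220C)     # counter 1: data read hits
cur_addr = step()
lookups_delta = read_counter(0xF8F02210) - lookups_before
hits_delta = read_counter(0xF8F0220C) - hits_before
print('0x%x: %d lookups, %d hits' % (cur_addr, lookups_delta, hits_delta))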
Thank you for your time, and I appreciate any help.

Related

Confused about the use of validation set here

For the main.py of the px2graph project, the training and validation part is shown below:
splits = [s for s in ['train', 'valid'] if opt.iters[s] > 0]
start_round = opt.last_round - opt.num_rounds
# Main training loop
for round_idx in range(start_round, opt.last_round):
    for split in splits:
        print("Round %d: %s" % (round_idx, split))
        loader.start_epoch(sess, split, train_flag, opt.iters[split] * opt.batchsize)
        flag_val = split == 'train'
        for step in tqdm(range(opt.iters[split]), ascii=True):
            global_step = step + round_idx * opt.iters[split]
            to_run = [sample_idx, summaries[split], loss, accuracy]
            if split == 'train': to_run += [optim]
            # Do image summaries at the end of each round
            do_image_summary = step == opt.iters[split] - 1
            if do_image_summary: to_run[1] = image_summaries[split]
            # Start with lower learning rate to prevent early divergence
            t = 1/(1+np.exp(-(global_step-5000)/1000))
            lr_start = opt.learning_rate / 15
            lr_end = opt.learning_rate
            tmp_lr = (1-t) * lr_start + t * lr_end
            # Run computation graph
            result = sess.run(to_run, feed_dict={train_flag:flag_val, lr:tmp_lr})
            out_loss = result[2]
            out_accuracy = result[3]
            if sum(out_loss) > 1e5:
                print("Loss diverging...exiting before code freezes due to NaN values.")
                print("If this continues you may need to try a lower learning rate, a")
                print("different optimizer, or a larger batch size.")
                return
            time_str = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            print("{}: step {}, loss {:g}, acc {:g}".format(time_str, global_step, out_loss, out_accuracy))
            # Log data
            if split == 'valid' or (split == 'train' and step % 20 == 0) or do_image_summary:
                writer.add_summary(result[1], global_step)
                writer.flush()
    # Save training snapshot
    saver.save(sess, 'exp/' + opt.exp_id + '/snapshot')
    with open('exp/' + opt.exp_id + '/last_round', 'w') as f:
        f.write('%d\n' % round_idx)
It seems that the author only gets the result for each batch of the validation set. I am wondering, if I want to observe whether the model is improving or reaching its best performance, should I use the result on the whole validation set?
If the validation set is small enough, we can calculate the loss and accuracy on the whole validation set during training to observe performance. However, if the validation set is too large, it is better to compute batch-wise validation results over multiple steps and average them.
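As a minimal sketch (not from the px2graph code) of that second option, batch-wise results can be accumulated over one pass of the validation set and then averaged; valid_iters and the feed_dict values below are assumptions standing in for the project's own variables:
# Hypothetical: average batch-wise metrics over the whole validation set.
val_losses, val_accs = [], []
for _ in range(valid_iters):  # assumed number of validation batches
    out_loss, out_acc = sess.run([loss, accuracy],
                                 feed_dict={train_flag: False, lr: 0.0})
    val_losses.append(float(np.mean(out_loss)))  # reduce per-sample loss to a scalar
    val_accs.append(float(out_acc))
print("validation: loss %.4f, acc %.4f"
      % (sum(val_losses) / len(val_losses), sum(val_accs) / len(val_accs)))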

Tool/Algorithm for text comparison after every key hit

I am struggling to find a text comparison tool or algorithm that can compare an expected text against the current state of the text being typed.
I will have a participant type out a text that they have in front of them. My idea is to compare the current state of the text against the expected text whenever something is typed. That way I want to find out when and what the subject does wrong (I also want to find errors that are not in the final text but were present in intermediate states for some time).
Can someone point me in a direction?
Update #1
I have access to the typing data in a csv format:
This is example output data of me typing "foOBar". Every line has the form (timestamp, Key, Press/Release)
17293398.576653,F,P
17293398.6885,F,R
17293399.135282,LeftShift,P
17293399.626881,LeftShift,R
17293401.313254,O,P
17293401.391732,O,R
17293401.827314,LeftShift,P
17293402.073046,O,P
17293402.184859,O,R
17293403.178612,B,P
17293403.301748,B,R
17293403.458137,LeftShift,R
17293404.966193,A,P
17293405.077869,A,R
17293405.725405,R,P
17293405.815159,R,R
In Python
Given your input csv file (I called it keyboard.csv):
17293398.576653,F,P
17293398.6885,F,R
17293399.135282,LeftShift,P
17293399.626881,LeftShift,R
17293401.313254,O,P
17293401.391732,O,R
17293401.827314,LeftShift,P
17293402.073046,O,P
17293402.184859,O,R
17293403.178612,B,P
17293403.301748,B,R
17293403.458137,LeftShift,R
17293404.966193,A,P
17293405.077869,A,R
17293405.725405,R,P
17293405.815159,R,R
The code below does the following:
Reads the file's content and stores it in a list named steps
For each step in steps, recognizes what happened and:
If it was a shift press or release, sets a flag (shift_on) accordingly
If it was an arrow key press, moves the cursor (the index in current where we insert characters) – if the cursor is at the start or the end of the string it shouldn't move, which is why min() and max() are used
If it was a letter/number/symbol, inserts it into current at the cursor position and increments cursor
Here you have it
import csv

steps = []  # list of all actions performed by user
expected = "Hello"
with open("keyboard.csv") as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        steps.append((float(row[0]), row[1], row[2]))

# Now we parse the information
current = []      # text written by the user
shift_on = False  # is shift pressed
cursor = 0        # where is the cursor in the current text
for step in steps:
    time, key, action = step
    if key == 'LeftShift':
        if action == 'P':
            shift_on = True
        else:
            shift_on = False
        continue
    if key == 'LeftArrow' and action == 'P':
        cursor = max(0, cursor-1)
        continue
    if key == 'RightArrow' and action == 'P':
        cursor = min(len(current), cursor+1)
        continue
    if action == 'P':
        if shift_on is True:
            current.insert(cursor, key.upper())
        else:
            current.insert(cursor, key.lower())
        cursor += 1
        # Now you can join current into a string
        # and compare current with expected
        print(''.join(current))  # printing current (just to see what's happening)
    else:
        # What to do when a key is released?
        # Depends on your needs...
        continue
To compare current and expected have a look here.
Note: by playing around with the code above and a few more flags you can make it recognize symbols as well. This will depend on your keyboard layout. On mine, Shift + 6 = &, AltGr + E = € and Ctrl + Shift + AltGr + è = {. I think this is a good starting point.
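As a minimal illustration of that comparison step (not part of the original answer), the standard difflib module can report which ranges of the two strings differ; diff_report below is a hypothetical helper:
import difflib

def diff_report(current, expected):
    # Report the inserted/deleted/replaced ranges needed to turn expected into current.
    matcher = difflib.SequenceMatcher(None, expected, current)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            print("%s: expected[%d:%d]=%r vs typed[%d:%d]=%r"
                  % (tag, i1, i2, expected[i1:i2], j1, j2, current[j1:j2]))

diff_report("foOBar", "foobar")  # reports the two case mismatches as one 'replace'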
Update
Comparing two texts isn't a difficult task and you can find tons of pages on the web about it.
Anyway, I wanted to present an object-oriented approach to the problem, so I added the compare part that I omitted in the first solution.
This is still rough code, without real input validation. But, as you asked, it points you in a direction.
class UserText:

    # Initialize UserText:
    # - empty text
    # - cursor at beginning
    # - shift off
    def __init__(self, expected):
        self.expected = expected
        self.letters = []
        self.cursor = 0
        self.shift = False

    # compares a and b and returns a
    # list containing the indices of
    # mismatches between a and b
    def compare(a, b):
        err = []
        for i in range(min(len(a), len(b))):
            if a[i] != b[i]:
                err.append(i)
        return err

    # Parse a command given in the
    # form (time, key, action)
    def parse(self, command):
        time, key, action = command
        output = ""
        if action == 'P':
            if key == 'LeftShift':
                self.shift = True
            elif key == 'LeftArrow':
                self.cursor = max(0, self.cursor - 1)
            elif key == 'RightArrow':
                self.cursor = min(len(self.letters), self.cursor + 1)
            else:
                # Else, a letter/number was pressed. Let's
                # add it to self.letters in cursor position
                if self.shift is True:
                    self.letters.insert(self.cursor, key.upper())
                else:
                    self.letters.insert(self.cursor, key.lower())
                self.cursor += 1

                ########## COMPARE WITH EXPECTED ##########
                output += "Expected: \t" + self.expected + "\n"
                output += "Current: \t" + str(self) + "\n"
                errors = UserText.compare(str(self), self.expected[:len(str(self))])
                output += "\t\t"
                i = 0
                for e in errors:
                    while i != e:
                        output += " "
                        i += 1
                    output += "^"
                    i += 1
                output += "\n[{} errors at time {}]".format(len(errors), time)
                return output
        else:
            if key == 'LeftShift':
                self.shift = False
        return output

    def __str__(self):
        return "".join(self.letters)


import csv

steps = []  # list of all actions performed by user
expected = "foobar"
with open("keyboard.csv") as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        steps.append((float(row[0]), row[1], row[2]))

# Now we parse the information
ut = UserText(expected)
for step in steps:
    print(ut.parse(step))
The output for the csv file above was:
Expected: foobar
Current: f
[0 errors at time 17293398.576653]
Expected: foobar
Current: fo
[0 errors at time 17293401.313254]
Expected: foobar
Current: foO
^
[1 errors at time 17293402.073046]
Expected: foobar
Current: foOB
^^
[2 errors at time 17293403.178612]
Expected: foobar
Current: foOBa
^^
[2 errors at time 17293404.966193]
Expected: foobar
Current: foOBar
^^
[2 errors at time 17293405.725405]
I found the solution to my own question around a year ago. Now I have time to share it with you:
In their 2003 paper 'Metrics for text entry research: An evaluation of MSD and KSPC, and a new unified error metric', R. William Soukoreff and I. Scott MacKenzie propose three major new metrics: 'total error rate', 'corrected error rate' and 'not corrected error rate'. These metrics have become well established since the publication of the paper. They are exactly the metrics I was looking for.
If you are trying to do something similar to what I did, e.g. comparing typing performance on different input devices, this is the way to go.
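For reference, here is a minimal, hypothetical sketch of the older MSD (minimum string distance) error rate, i.e. the character-level edit distance between the presented and transcribed text expressed as a percentage; the paper's total, corrected and not-corrected error rates additionally require classifying keystrokes, which is not shown here:
# Sketch: MSD error rate between presented text P and transcribed text T.
def msd(p, t):
    # Classic dynamic-programming edit distance (Levenshtein distance).
    d = [[0] * (len(t) + 1) for _ in range(len(p) + 1)]
    for i in range(len(p) + 1):
        d[i][0] = i
    for j in range(len(t) + 1):
        d[0][j] = j
    for i in range(1, len(p) + 1):
        for j in range(1, len(t) + 1):
            cost = 0 if p[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(p)][len(t)]

def msd_error_rate(presented, transcribed):
    return 100.0 * msd(presented, transcribed) / max(len(presented), len(transcribed))

print(msd_error_rate("foobar", "foOBar"))  # 2 substitutions over 6 characters -> 33.3%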

Why is Google Translate API giving me so many 403s?

I've posted the relevant code below. I have a quota of 100 requests/second and a total quota of 50M characters daily (the latter of which I've never hit). I'm including 75 requests in each batch (i.e. in the code below, there are 75 strings in each group).
I'm constantly running into 403s, usually after a very short span of less than a minute of firing off requests. After that, no amount of backoff works until the next day. This is really debilitating and I'm very unsure why it's happening. So far, their support team hasn't been helpful in diagnosing the issue.
Here's an example error:
"Google Translate Error on checksum 48af8c32261d9cb8911d99168a6f5b21: https://www.googleapis.com/language/translate/v2?q=QUERYSTRING&source=ja&target=en&key=MYKEY&format=text&alt=json returned "User Rate Limit Exceeded">"
def _google_translate_callback(self, request_id, response, err):
    if err:
        print 'Google Translate Error on request_id %s: %s' % (request_id, err)
        print 'Backing off for %d seconds.' % self.backoff
        sleep(self.backoff)
        if self.backoff < 4096:
            self.backoff = self.backoff * 2
        self._translate_array_google_helper()
    else:
        translation = response['translations'][0]['translatedText'] \
            .replace('&quot;', '"') \
            .replace('&#39;', "'")
        self.translations.append((request_id, translation))
        if is_done():
            self.is_translating = False
        else:
            self.current_group += 1
            self._translate_array_google_helper()

def _translate_array_google_helper(self):
    if self.current_group >= len(self.groups):
        self.is_translating = False
        return
    service = self.google_translator.translations()
    group = self.groups[self.current_group]
    batch = self.google_translator.new_batch_http_request(
        callback=self._google_translate_callback
    )
    for text, request_id in group:
        format_ = 'text'
        if is_html(text):
            format_ = 'html'
        batch.add(
            service.list(q=text, format=format_,
                         target=self.to_lang, source=self.from_lang),
            request_id=request_id
        )
    batch.execute()
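For reference, and not taken from the code above: the usual recommendation for rate-limit errors is exponential backoff with random jitter rather than plain doubling; make_request() below is a hypothetical stand-in for the batch execution:
import random
import time

def call_with_backoff(make_request, max_retries=6, base_delay=1.0):
    # Retry a callable, doubling the delay and adding jitter on each failure.
    for attempt in range(max_retries):
        try:
            return make_request()
        except Exception as err:  # ideally catch only the client's rate-limit error
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print('Request failed (%s); retrying in %.1f seconds' % (err, delay))
            time.sleep(delay)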

How to get system info with Ruby?

I'd like to get my Pi's system info, such as CPU usage, CPU temperature, RAM usage, uptime and the available disk size. I know how to do this in Python, but it won't work in Ruby. Can someone please tell me how I can achieve this? It has to be Ruby, because I need it for my Siriproxy and the plugin is written in Ruby.
Thanks in advance!
This is the Python script:
#!/usr/bin/env python
import os, time
# Return CPU temperature as a character string
def getCPUtemperature():
res = os.popen('vcgencmd measure_temp').readline()
return(res.replace("temp=","").replace("'C\n",""))
# Return RAM information (unit=kb) in a list
# Index 0: total RAM
# Index 1: used RAM
# Index 2: free RAM
def getRAMinfo():
p = os.popen('free')
i = 0
while 1:
i = i + 1
line = p.readline()
if i==2:
return(line.split()[1:4])
# Return % of CPU used by user as a character string
def getCPUuse():
return(str(os.popen("top -n1 | awk '/Cpu\(s\):/ {print $2}'").readline().strip(\
)))
# Return information about disk space as a list (unit included)
# Index 0: total disk space
# Index 1: used disk space
# Index 2: remaining disk space
# Index 3: percentage of disk used
def getDiskSpace():
p = os.popen("df -h /")
i = 0
while 1:
i = i +1
line = p.readline()
if i==2:
return(line.split()[1:5])
# CPU informatiom
CPU_temp = getCPUtemperature()
CPU_usage = getCPUuse()
# RAM information
# Output is in kb, here I convert it in Mb for readability
RAM_stats = getRAMinfo()
RAM_total = round(int(RAM_stats[0]) / 1000,1)
RAM_used = round(int(RAM_stats[1]) / 1000,1)
RAM_free = round(int(RAM_stats[2]) / 1000,1)
# Disk information
DISK_stats = getDiskSpace()
DISK_total = DISK_stats[0]
DISK_free = DISK_stats[1]
DISK_perc = DISK_stats[3]
These method definitions should help you. I'm not in a position to test them, but they should be pretty close if not spot on.
def get_cpu_temperature
  %x{vcgencmd measure_temp}.lines.first.sub(/temp=/, '').sub(/C\n/, '')
end

def get_ram_info
  %x{free}.lines.to_a[1].split[1,3]
end

def get_cpu_use
  %x{top -n1}.lines.find{ |line| /Cpu\(s\):/.match(line) }.split[1]
end

def get_disk_space
  %x{df -h /}.lines.to_a[1].split[1,4]
end

Ruby data extraction from a text file

I have a relatively big text file with blocks of data layered like this:
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
(they contain more lines and then are repeated)
I would first like to extract the numerical value after TUNE X = and output it to a text file. Then I would like to extract the numerical values of Line Frequency and Amplitude as pairs and output them to a file.
My question is the following: although I could get something more or less working using a simple regexp, I'm not convinced that it's the right way to do it, and I would like some advice or examples of code showing how I can do this efficiently with Ruby.
Generally, (not tested)
toggle = 0
File.open("file").each do |line|
  if line[/TUNE/]
    puts line.split("=",2)[-1].strip
  end
  if line[/Line Frequency/]
    toggle = 1
    next
  end
  if toggle == 1
    a = line.split
    puts "#{a[1]} #{a[2]}"
  end
end
Go through the file line by line and check for /TUNE/; when it matches, split on "=" and take the last item.
Do the same for lines containing /Line Frequency/ and set the toggle flag to 1. This signifies that the following lines contain the data you want. Since the frequency and amplitude are in fields 2 and 3, split those lines and take the respective positions. That's the general idea. As for toggling, you might want to set the toggle flag back to 0 when the next block starts, using a pattern (e.g. SIGNAL, CASE or ANALYSIS).
file = File.open("data.dat")
@tune_x = []
@frequency = []
@amplitude = []
file.each_line do |line|
  tune_x_scan = line.scan /TUNE X = (\d*\.\d*)/
  data_scan = line.scan /(\d*\.\d*E[-|+]\d*)/
  @tune_x << tune_x_scan[0] unless tune_x_scan.empty?
  @frequency << data_scan[0] unless data_scan.empty?
  @amplitude << data_scan[1] unless data_scan.empty?
end
There are lots of ways to do it. This is a simple first pass at it:
text = 'ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 0.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 0.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 0.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 1.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 1.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 1.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
ANALYSIS OF X SIGNAL, CASE: 1
TUNE X = 2.2561890123390808
Line Frequency Amplitude Phase Error mx my ms p
1 2.2561890123391E+00 0.204316425208E-01 0.164145385871E+03 0.00000000000E+00 1 0 0 0
2 2.2562865535359E+00 0.288712798671E-01 -.161563284233E+03 0.97541196785E-04 1 0 0 0
'
require 'stringio'
pretend_file = StringIO.new(text, 'r')
That gives us a StringIO object we can pretend is a file. We can read from it by lines.
I changed the numbers a bit just to make it easier to see that they are being captured in the output.
pretend_file.each_line do |li|
  case
  when li =~ /^TUNE.+?=\s+(.+)/
    print $1.strip, "\n"
  when li =~ /^\d+\s+(\S+)\s+(\S+)/
    print $1, ' ', $2, "\n"
  end
end
For real use you'd want to change the print statements to a file handle: fileh.print
The output looks like:
# >> 0.2561890123390808
# >> 0.2561890123391E+00 0.204316425208E-01
# >> 0.2562865535359E+00 0.288712798671E-01
# >> 1.2561890123390808
# >> 1.2561890123391E+00 0.204316425208E-01
# >> 1.2562865535359E+00 0.288712798671E-01
# >> 2.2561890123390808
# >> 2.2561890123391E+00 0.204316425208E-01
# >> 2.2562865535359E+00 0.288712798671E-01
You can read your file line by line and cut each line by character position, for example:
to extract TUNE X, take characters 10 to 27 on line 2;
to extract Line Frequency, take characters 3 to 22 on line 6+n.
