I'm trying to describe 6809 assembly in EBNF so that I can write a tree-sitter parser.
I'm stuck on one particular production. In 6809 assembly, you can use a register as an operand and additionally decrement or increment it:
LDA 0,X+ ; loads A from X then bumps X by 1
LDD ,Y++ ; loads D from Y then bumps Y by 2
LDA 0,-U ; decrements U by 1 then loads A from address in U
LDU ,--S ; decrements S by 2 then loads U from address in S
Mind the "missing" first operand in the second line of code. Here are the productions I wrote:
instruction = opcode, [operand], ["," , register_exp];
...
register_exp = [{operator}], register | register, [{operator}];
register = "X" | "Y" | "U" | etc. ;
operator = "+" | "-";
The problem is register_exp = .... I feel like there could be a more elegant way to define this production. Also, what happens if only a register is given to register_exp?
You probably need
register_exp = [{operator}], register | register, [{operator}] | register;
to allow register names without operators. Why do you find it inelegant? It is quite descriptive.
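One way to see which operand strings a production accepts is to transcribe it into a regular expression and try a few inputs. The Ruby sketch below is only an illustration: the register set (X, Y, U, S) stands in for the "etc." in the question, and the names LOOSE, STRICT and accepts? are made up for this example.

```ruby
# Hypothetical register set; the question elides the full list with "etc.".
REG = /(?:X|Y|U|S)/
OP  = /[+-]/

# Transcription of: register_exp = [{operator}], register | register, [{operator}];
LOOSE = /\A(?:#{OP}*#{REG}|#{REG}#{OP}*)\z/

# Tighter variant (an assumption): only the forms shown in the question,
# i.e. -R, --R, R+, R++, plus a bare register.
STRICT = /\A(?:-{1,2}#{REG}|#{REG}\+{0,2})\z/

# Returns true when the whole text matches the pattern.
def accepts?(pattern, text)
  !!(pattern =~ text)
end

%w[X Y++ --S X+- +X].each do |text|
  puts "#{text}: loose=#{accepts?(LOOSE, text)} strict=#{accepts?(STRICT, text)}"
end
```

If {operator} is read as "zero or more", as in ISO EBNF, the transcription already accepts a bare register, so the extra alternative is harmless rather than required. The looser form also lets through mixed operators such as X+-, which is why a tighter rule may be worth considering.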
I'm using the Windows kernel debugger through Visual Studio 2013 and I'm trying to stop (break) in a function (nt!KiSwapContext), but only for a specific process (0x920).
The breakpoint works without a condition: bp nt!KiSwapContext
I determined that the process ID for the current thread can be found with: dt dword poi(gs:[188h])+3B8h
I've confirmed that the following conditional works to check whether I am on the right thread: ? poi(poi(gs:[188h])+3B8h)==0x920
However, when I try to set the conditional breakpoint, it always breaks no matter what I put in the .if/.else. So I am guessing it considers the expression invalid and just ignores it. I've confirmed that if I enter an invalid expression, it is accepted without warning or error and the breakpoint always stops.
The expression I am using is: bp nt!KiSwapContext ".if (poi(poi(gs:[188h])+3B8h)==0x920) {} .else {gc}"
I also tried using the j conditional syntax to no avail.
Any ideas on what I am doing wrong?
[Edit] Oh, as a bonus: how can I do the conditional check with a dword instead of a qword on a 64-bit processor? ? poi(poi(gs:[188h])+3B8h) returns a qword value. I know I can use dd to get the value, but I can't figure out how to add that into the conditional. Something like ? dword(poi(gs:[188h])+3B8h)==0x920 or ? {dd poi(gs:[188h])+3B8h}==0x920
WinDbg allows you to set process-specific breakpoints with /p.
You shouldn't be mucking with the gs and fs registers.
kd> bl
kd> !process 0 0 calc.exe
Failed to get VAD root
PROCESS 8113d528 SessionId: 0 Cid: 07a0 Peb: 7ffde000 ParentCid: 043c
DirBase: 03d27000 ObjectTable: e15ba240 HandleCount: 28.
Image: calc.exe
kd> bp /p 8113d528 nt!KiSwapContext "?? (char *)(@$proc->ImageFileName)"
kd> g
char * 0x8113d69c
"calc.exe"
nt!KiSwapContext:
804db828 83ec10 sub esp,10h
kd> g
char * 0x8113d69c
"calc.exe"
nt!KiSwapContext:
804db828 83ec10 sub esp,10h
Use dwo() and qwo() as required to evaluate a dword or a qword:
kd> ? qwo ( ffb9cda8 + 70)
Evaluate expression: -9142252815570161280 = 81203180`81203180
kd> ? dwo ( qwo ( ffb9cda8 + 70))
Evaluate expression: -4600296 = ffb9ce18
confirmation
kd> dd 81203180 l1
81203180 ffb9ce18
kd> dd ffb9cda8+70 l1
ffb9ce18 81203180
Edit
I can't access an x64 system at the moment, so I can't tell you what the error in your expression is,
but in general you should avoid hardcoding unless it is absolutely necessary.
In your case it is not necessary:
WinDbg provides pseudo-registers for what you are hardcoding.
$thread evaluates, as a C++ expression, to a pointer to the current thread, i.e. (nt!_ETHREAD *),
so $thread->Cid.UniqueProcess is what you are evaluating with your gs:[188h] expression.
With that in mind, you can set a breakpoint like this:
bp nt!KiSwapContext " r? $t0 = @$thread->Cid.UniqueProcess ;.if( @$t0 != 0x740 ) {? @$t0;?? (char * )@$proc->ImageFileName ;gc }"
This conditional breaks only if calc.exe is the current process:
kd> g
Evaluate expression: 404 = 00000194
char * 0x81105c84
"csrss.exe"
XXXXXXXXXXX
Evaluate expression: 4 = 00000004
char * 0x8129196c
"System"
xxxxxxxxxxxxxxxxxxxxxxxxxxx
Evaluate expression: 1404 = 0000057c
char * 0x8114a4bc
"vpcmap.exe"
Evaluate expression: 480 = 000001e0
char * 0x8112a98c
"services.exe"
Evaluate expression: 492 = 000001ec
char * 0x811cc9ac
"lsass.exe"
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
Evaluate expression: 1116 = 0000045c
char * 0xffaf9da4
"explorer.exe"
Evaluate expression: 644 = 00000284
char * 0xffb74f14
"svchost.exe"
nt!KiSwapContext: <---------------------------Conditional broke here
804db828 83ec10 sub esp,10h
kd> ? @$t0;?? (char * )@$proc->ImageFileName
Evaluate expression: 1856 = 00000740
char * 0x8110e76c
"calc.exe"
Keep in mind that evaluating conditions in a very hot path is painfully slow.
nt!KiSwapContext is called hundreds of times within a few seconds,
and you will see a very noticeable performance degradation in your
session.
Whenever possible, use process-specific or thread-specific breakpoints
and do not evaluate conditions.
No, I don't use any cheat sheet (Google says a few are available); I prefer the manual or, in some cases, the online MSDN documentation.
I'm writing a multiarchitecture assembler/disassembler in Common Lisp (SBCL 1.1.5 in 64-bit Debian GNU/Linux); currently the assembler produces correct code for a subset of x86-64. For assembling x86-64 assembly code I use a hash table in which assembly instruction mnemonics (strings) such as "jc-rel8" and "stosb" are keys that return a list of 1 or more encoding functions, like the ones below:
(defparameter *emit-function-hash-table-x64* (make-hash-table :test 'equalp))
(setf (gethash "jc-rel8" *emit-function-hash-table-x64*) (list #'jc-rel8-x86))
(setf (gethash "stosb" *emit-function-hash-table-x64*) (list #'stosb-x86))
The encoding functions are like these (some are more complicated, though):
(defun jc-rel8-x86 (arg1 &rest args)
(jcc-x64 #x72 arg1))
(defun stosb-x86 (&rest args)
(list #xaa))
Now I am trying to incorporate the complete x86-64 instruction set by using NASM's (NASM 2.11.06) instruction encoding data (file insns.dat) converted to Common Lisp CLOS syntax. This would mean replacing regular functions used for emitting binary code (like the functions above) with instances of a custom x86-asm-instruction class (a very basic class so far, some 20 slots with :initarg, :reader, :initform etc.), in which an emit method with arguments would be used for emitting the binary code for given instruction (mnemonic) and arguments. The converted instruction data looks like this (but it's more than 40'000 lines and exactly 7193 make-instance's and 7193 setf's).
;; first mnemonic + operand combination instances (:is-variant t).
;; there are 4928 such instances for x86-64 generated from NASM's insns.dat.
(eval-when (:compile-toplevel :load-toplevel :execute)
(setf Jcc-imm-near (make-instance 'x86-asm-instruction
:name "Jcc"
:operands "imm|near"
:code-string "[i: odf 0f 80+c rel]"
:arch-flags (list "386" "BND")
:is-variant t))
(setf STOSB-void (make-instance 'x86-asm-instruction
:name "STOSB"
:operands "void"
:code-string "[ aa]"
:arch-flags (list "8086")
:is-variant t))
;; then, container instances which contain (or could instead refer to)
;; the possible variants of each instruction.
;; there are 2265 such instances for x86-64 generated from NASM's insns.dat.
(setf Jcc (make-instance 'x86-asm-instruction
:name "Jcc"
:is-container t
:variants (list Jcc-imm-near
Jcc-imm64-near
Jcc-imm-short
Jcc-imm
Jcc-imm
Jcc-imm
Jcc-imm)))
(setf STOSB (make-instance 'x86-asm-instruction
:name "STOSB"
:is-container t
:variants (list STOSB-void)))
;; thousands of objects more here...
) ; this bracket closes (eval-when (:compile-toplevel :load-toplevel :execute)
I converted NASM's insns.dat to Common Lisp syntax (like above) using a trivial Perl script (further below, though there's nothing of interest in the script itself), and in principle it works. However, compiling those 7193 objects is really slow and commonly causes heap exhaustion. On my Linux Core i7-2760QM laptop with 16G of memory, compiling an (eval-when (:compile-toplevel :load-toplevel :execute) code block with 7193 objects like the ones above takes more than 7 minutes and sometimes causes heap exhaustion, like this:
;; Swank started at port: 4005.
* Heap exhausted during garbage collection: 0 bytes available, 32 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
3: 38805 38652 0 0 49474 15433 389 416 0 2144219760 9031056 1442579856 0 1 1.5255
4: 127998 127996 0 0 45870 14828 106 143 199 1971682720 25428576 2000000 0 0 0.0000
5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
6: 0 0 0 0 1178 163 0 0 0 43941888 0 2000000 985 0 0.0000
Total bytes allocated = 4159844368
Dynamic-space-size bytes = 4194304000
GC control variables:
*GC-INHIBIT* = true
*GC-PENDING* = in progress
*STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 9994(tid 46912556431104):
Heap exhausted, game over.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
I had to add the --dynamic-space-size 4000 parameter for SBCL to get it compiled at all, but even after allocating 4 gigabytes of dynamic space the heap sometimes gets exhausted. Even if the heap exhaustion were solved, more than 7 minutes for compiling 7193 instances after only adding a slot to the class (the 'x86-asm-instruction class used for these instances) is way too much for interactive development in the REPL (I use slimv, if that matters).
Here's the output of (time (compile-file ...)):
; caught 18636 WARNING conditions
; insns.fasl written
; compilation finished in 0:07:11.329
Evaluation took:
431.329 seconds of real time
238.317000 seconds of total run time (234.972000 user, 3.345000 system)
[ Run times consist of 6.073 seconds GC time, and 232.244 seconds non-GC time. ]
55.25% CPU
50,367 forms interpreted
784,044 lambdas converted
1,031,842,900,608 processor cycles
19,402,921,376 bytes consed
Using OOP (CLOS) would enable incorporating the instruction mnemonic (such as jc or stosb above, :name), allowed operands of the instruction (:operands), instruction's binary encoding (such as #xaa for stosb, :code-string) and possible architecture limitations (:arch-flags) of the instruction in one object. But it seems that at least my 3-year-old computer is not efficient enough to compile around 7000 CLOS object instances quickly.
My question is: Is there some way to make SBCL's make-instance faster, or should I keep assembly code generation in regular functions like the examples further above? I'd also be very happy to know about any other possible solutions.
Here's the Perl script, just in case:
#!/usr/bin/env perl
use strict;
use warnings;
# this program converts NASM's `insns.dat` to Common Lisp Object System (CLOS) syntax.
my $firstchar;
my $line_length;
my $are_there_square_brackets;
my $mnemonic_and_operands;
my $mnemonic;
my $operands;
my $code_string;
my $flags;
my $mnemonic_of_current_mnemonic_array;
my $clos_object_name;
my $clos_mnemonic;
my $clos_operands;
my $clos_code_string;
my $clos_flags;
my @object_name_array = ();
my @mnemonic_array = ();
my @operands_array = ();
my @code_string_array = ();
my @flags_array = ();
my @each_mnemonic_only_once_array = ();
my @instruction_variants_array = ();
my @instruction_variants_for_current_instruction_array = ();
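# open NASM's instruction table for reading; stop early if it is missing.
open(FILE, '<', 'insns.dat') or die "Cannot open insns.dat: $!";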
$mnemonic_of_current_mnemonic_array = "";
# read one line at once.
while (<FILE>)
{
$firstchar = substr($_, 0, 1);
$line_length = length($_);
$are_there_square_brackets = ($_ =~ /\[.*\]/);
chomp;
if (($line_length > 1) && ($firstchar =~ /[^\t ;]/))
{
if ($are_there_square_brackets)
{
($mnemonic_and_operands, $code_string, $flags) = split /[\[\]]+/, $_;
$code_string = "[" . $code_string . "]";
($mnemonic, $operands) = split /[\t ]+/, $mnemonic_and_operands;
}
else
{
($mnemonic, $operands, $code_string, $flags) = split /[\t ]+/, $_;
}
$mnemonic =~ s/[\t ]+/ /g;
$operands =~ s/[\t ]+/ /g;
$code_string =~ s/[\t ]+/ /g;
$flags =~ s/[\t ]+//g;
# we don't want non-x86-64 instructions here.
unless ($flags =~ "NOLONG")
{
# ok, the content of each field is now filtered,
# let's convert them to a suitable Common Lisp format.
$clos_object_name = $mnemonic . "-" . $operands;
# in Common Lisp object names `|`, `,`, and `:` must be escaped with a backslash `\`,
# but that would get too complicated.
# so we'll simply replace them:
# `|` -> `-`.
# `,` -> `.`.
# `:` -> `.`.
$clos_object_name =~ s/\|/-/g;
$clos_object_name =~ s/,/./g;
$clos_object_name =~ s/:/./g;
$clos_mnemonic = "\"" . $mnemonic . "\"";
$clos_operands = "\"" . $operands . "\"";
$clos_code_string = "\"" . $code_string . "\"";
$clos_flags = "\"" . $flags . "\""; # add first and last double quotes.
$clos_flags =~ s/,/" "/g; # make each flag its own Common Lisp string.
$clos_flags = "(list " . $clos_flags. ")"; # convert to `list` syntax.
push @object_name_array, $clos_object_name;
push @mnemonic_array, $clos_mnemonic;
push @operands_array, $clos_operands;
push @code_string_array, $clos_code_string;
push @flags_array, $clos_flags;
if ($mnemonic eq $mnemonic_of_current_mnemonic_array)
{
# ok, same mnemonic as the previous one,
# so the current object name goes to the list.
push @instruction_variants_for_current_instruction_array, $clos_object_name;
}
else
{
# ok, this is a new mnemonic.
# so we'll mark this as current mnemonic.
$mnemonic_of_current_mnemonic_array = $mnemonic;
push @each_mnemonic_only_once_array, $mnemonic;
# we first push the old array (unless it's empty), then clear it,
# and then push the current object name to the cleared array.
if (@instruction_variants_for_current_instruction_array)
{
# push the variants array, unless it's empty.
push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
}
@instruction_variants_for_current_instruction_array = ();
push @instruction_variants_for_current_instruction_array, $clos_object_name;
}
}
}
}
# the last instruction's instruction variants must be pushed too.
if (@instruction_variants_for_current_instruction_array)
{
# push the variants array, unless it's empty.
push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
}
close(FILE);
# these objects need be created already during compilation.
printf("(eval-when (:compile-toplevel :load-toplevel :execute)\n");
# print the code to create each instruction + operands combination object.
for (my $i=0; $i <= $#mnemonic_array; $i++)
{
$clos_object_name = $object_name_array[$i];
$mnemonic = $mnemonic_array[$i];
$operands = $operands_array[$i];
$code_string = $code_string_array[$i];
$flags = $flags_array[$i];
# print the code to create a variant object.
# each object here is a variant of a single instruction (or a single mnemonic).
# actually printed as 6 lines to make it easier to read (for us humans, I mean), with an empty line in the end.
printf("(setf %s (make-instance 'x86-asm-instruction\n:name %s\n:operands %s\n:code-string %s\n:arch-flags %s\n:is-variant t))",
$clos_object_name,
$mnemonic,
$operands,
$code_string,
$flags);
printf("\n\n");
}
# print the code to create each container object (one per mnemonic).
# for (my $i=0; $i <= $#each_mnemonic_only_once_array; $i++)
for my $i (0 .. $#instruction_variants_array)
{
$mnemonic = $each_mnemonic_only_once_array[$i];
# print the code to create a container object.
printf("(setf %s (make-instance 'x86-asm-instruction :name \"%s\" :is-container t :variants (list \n", $mnemonic, $mnemonic);
@instruction_variants_for_current_instruction_array = $instruction_variants_array[$i];
# for (my $j=0; $j <= $#instruction_variants_for_current_instruction_array; $j++)
for my $j (0 .. $#{$instruction_variants_array[$i]} )
{
printf("%s", $instruction_variants_array[$i][$j]);
# print 3 closing brackets if this is the last variant.
if ($j == $#{$instruction_variants_array[$i]})
{
printf(")))");
}
else
{
printf(" ");
}
}
# if this is not the last instruction, print two newlines.
if ($i < $#instruction_variants_array)
{
printf("\n\n");
}
}
# print the closing bracket to close `eval-when`.
print(")");
exit;
18636 warnings looks really bad. Start by getting rid of all the warnings.
I would start by getting rid of the EVAL-WHEN around all that; it does not make much sense to me. Either load the file directly, or compile and load the file.
Also note that SBCL does not like (setf STOSB-void ...) when the variable is undefined. New top-level variables are introduced with DEFVAR or DEFPARAMETER. SETF just sets them, but does not define them. That should help to get rid of the warnings.
Also :is-container t and :is-variant t smell like these properties should be converted into classes to inherit from (for example as a mixin). A container has variants. A variant does not have variants.
I just found a CodeGolf answer here http://repl.it/2Om/6:
puts"The eight#{e="een-hundreds were a time for "}rum.
The ninet#{e}fun.
The two-thousands are a time to run
a civilized classroom.
"*(?X-??)
Original post: https://codegolf.stackexchange.com/a/40250
I am curious, how does it work? I have never seen (?X-??) before. What's happening here?
?Char gives the ASCII code of Char.
?X = "X".ord = 88
?? = "?".ord = 63
?X - ?? = 88-63 = 25
There is your integer: 25
Then "a"*25 = "aaaaaaaaaaaaaaaaaaaaaaaaa"
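The arithmetic above relies on Ruby 1.8 behaviour, where ?X is an integer character code. From Ruby 1.9 onwards ?X is the one-character string "X", so ?X-?? no longer evaluates to a number. A minimal sketch of the same calculation for current Ruby, using String#ord (the variable names count and repeated are only illustrative):

```ruby
# Character codes via String#ord, which works on Ruby 1.9 and later.
count = "X".ord - "?".ord   # 88 - 63

# String#* repeats the receiver count times, just as in the golfed answer.
repeated = "a" * count

puts count
puts repeated.length
```

In the golfed answer the receiver is the whole four-line string, so the verse is printed 25 times.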
I have a string variable with multiple lines, e.g.:
"SClone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n
I want to get both of the lines that start with "Seq_vec SVEC" and extract the integer values that match...
string = "Clone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"
seqvector = Regexp.new("Seq_vec\\s+SVEC\\s+(\\d+\\s+\\d+)",Regexp::MULTILINE )
vector = string.match(seqvector)
if vector
vector_start,vector_stop = vector[1].split(/ /)
puts vector_start.to_i
puts vector_stop.to_i
end
However, this only grabs the first match's values and not the second, as I would like.
Any ideas what I could be doing wrong?
Thank you
To capture groups use String#scan
vector = string.scan(seqvector)
=> [["1 65"], ["102 1710"]]
match finds just the first match. To find all matches use String#scan e.g.
string.scan(seqvector)
=> [["1 65"], ["102 1710"]]
or to do something with each match:
string.scan(seqvector) do |match|
# match[0] will be the substring captured by your first regexp grouping
puts match.inspect
end
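If the two numbers are wanted as separate integers, one option is to give each number its own capture group and convert the pairs that scan returns. This is a small sketch rather than the only way to do it: the names text, pattern and ranges are illustrative, and the ^ anchor is an added assumption that the lines of interest begin with Seq_vec.

```ruby
text = "Clone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\n" \
       "Sequencing_vector \"pCR2.1-topo\"\n" \
       "Seq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"

# One capture group per number; in Ruby, ^ matches at the start of every line.
pattern = /^Seq_vec\s+SVEC\s+(\d+)\s+(\d+)/

# scan returns one [start, stop] array of strings per matching line.
ranges = text.scan(pattern).map { |start, stop| [start.to_i, stop.to_i] }

ranges.each do |vector_start, vector_stop|
  puts "#{vector_start} #{vector_stop}"
end
```

With two groups there is no need to split the captured text on spaces afterwards.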
Just to make this a bit easier to handle, I would split the whole string into an array first and then would do:
string = "SClone VARPB63A\nSeq_vec SVEC 1 65 pCR2.1-topo\nSequencing_vector \"pCR2.1-topo\"\nSeq_vec SVEC 102 1710 pCR2.1-topo\nClipping QUAL 46 397\n"
selected_strings = string.split("\n").select{|x| /Seq_vec SVEC/.match(x)}
selected_strings.collect{|x| x.scan(/\s\d+/)}.flatten # => [" 1", " 65", " 102", " 1710"]