Memory issue when calculating a huge matrix in VMD's TCL - malloc: can't allocate region - matrix

I wrote a *.tcl script that calculates the pairwise RMSD between the frames of a trajectory (n=5000) using VMD in text mode (command line). However, when I run the script (vmd -dispdev text -e distmatrix.tcl) in the terminal, it stops after processing a number of frames with a memory error:
$vmd -dispdev text -e distmatrix.tcl
Info) VMD for MACOSXX86, version 1.9.3 (November 30, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/
Info) Email questions and bug reports to vmd@ks.uiuc.edu
Info) Please include this reference in published work using VMD:
Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 8 CPUs detected.
Info) Dynamically loaded 2 plugins in directory:
Info) /Applications/VMD/VMD 1.9.3.app/Contents/vmd/plugins/MACOSXX86/molfile
{/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/tcl8.5} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts} {/Applications/VMD/VMD 1.9.3.app/Contents/lib} /Applications/VMD/VMD 1.9.3.app/Contents/Frameworks/Tcl.framework/Versions/8.5/Resources/Scripts ~/Library/Tcl /Library/Tcl /Network/Library/Tcl /System/Library/Tcl ~/Library/Frameworks /Library/Frameworks /Network/Library/Frameworks /System/Library/Frameworks {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/vmd} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/plugins/MACOSXX86/tcl} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/plugins/noarch/tcl} /Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/la1.0
{/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/tcl8.5} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts} {/Applications/VMD/VMD 1.9.3.app/Contents/lib} /Applications/VMD/VMD 1.9.3.app/Contents/Frameworks/Tcl.framework/Versions/8.5/Resources/Scripts ~/Library/Tcl /Library/Tcl /Network/Library/Tcl /System/Library/Tcl ~/Library/Frameworks /Library/Frameworks /Network/Library/Frameworks /System/Library/Frameworks {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/vmd} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/plugins/MACOSXX86/tcl} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/plugins/noarch/tcl} /Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/la1.0 /Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/orient
{/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/tcl8.5} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts} {/Applications/VMD/VMD 1.9.3.app/Contents/lib} /Applications/VMD/VMD 1.9.3.app/Contents/Frameworks/Tcl.framework/Versions/8.5/Resources/Scripts ~/Library/Tcl /Library/Tcl /Network/Library/Tcl /System/Library/Tcl ~/Library/Frameworks /Library/Frameworks /Network/Library/Frameworks /System/Library/Frameworks {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/vmd} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/plugins/MACOSXX86/tcl} {/Applications/VMD/VMD 1.9.3.app/Contents/vmd/plugins/noarch/tcl} /Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/la1.0 /Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/orient /Applications/VMD/VMD 1.9.3.app/Contents/vmd/scripts/vmd/get_total_charge.tcl
2.0.2
file6
5000
::m
Progress: ▏▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▏ 65% (3295/5000)vmd(96350,0x5ae080) malloc: can't allocate region
*** mach_vm_map(size=8388608) failed (error code=3)
vmd(96350,0x5ae080) malloc: *** set a breakpoint in malloc_error_break to debug
vmd(96350,0x5ae080) malloc: can't allocate region
*** mach_vm_map(size=8388608) failed (error code=3)
vmd(96350,0x5ae080) malloc: *** set a breakpoint in malloc_error_break to debug
unable to realloc 1149 bytes
Abort trap: 6
It is interesting to note that when I run the same script in tclsh (./distmatrix.tcl), it runs smoothly and finishes successfully, writing the final 5000x5000 matrix to the output file.
$./distmatrix.tcl
Progress: ▏▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋▋ ▏ 100% (5000/5000)
Done.
For the sake of simplicity I tested a similar simple script (not the real trajectory calculation) of producing a 5000x5000 matrix (code below):
#!/usr/bin/tclsh
package require struct::matrix
################################################################
proc progress_bar {totalsize current} {
# font color light green for the bar
set fgreenl [exec tput setaf 40]
# no color
set nb [exec tput sgr 0]
puts -nonewline "\rProgress: \u258F"
if {$totalsize == 0} {
set totalsize $current
}
set portion [expr 1.0 * $current/$totalsize * 40]
for {set x 0} {$x <= $portion} {incr x} {
puts -nonewline "${fgreenl}\u258B${nb}"
}
for {} {$x <= 40} {incr x} {
puts -nonewline " "
}
puts -nonewline " \u258F [format "%3d" [expr int(100.0 * $current/$totalsize)]]% ($current/$totalsize)"
flush stdout
if {$totalsize == $current} {
puts "\nDone."
}
}
# Output file matrix
set outfile [open mat2.txt w]
################################################################
set n 5000
# Create a new matrix that will store all data (pairwise RMSD values)
struct::matrix m;
# Define the number of columns, which is equal to the number of frames in the MD trajectory
m add columns [expr $n + 1];
for {set i 0} {$i <= $n} {incr i} {
set vecl {}
for {set j 0} {$j <= $n} {incr j} {
set vec [expr sqrt($j)]
lappend vecl $vec
}
m add row $vecl;
progress_bar $n $i
after 1
}
set vec_vals [m format 2string;]
puts $outfile $vec_vals
close $outfile
exit
Is there a possibility to overcome this memory issue (e.g., somehow by chunking the operations executed inside the second loop)?
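One way to read "chunking" here is to avoid holding the whole matrix in memory: compute one row, write it to the output file, and discard it before starting the next. The sketch below shows that idea in Python rather than Tcl, purely as an illustration. The names format_rows, write_matrix and sqrt_row are hypothetical, sqrt_row stands in for the real per-frame RMSD calculation, and whether the same restructuring avoids the allocation failure inside VMD has not been tested.

```python
import math


def sqrt_row(i, n):
    # Hypothetical stand-in for "RMSD of frame i against every frame".
    # Like the test script in the question, it just returns sqrt(j).
    return [math.sqrt(j) for j in range(n + 1)]


def format_rows(n, row_func):
    # Yield one formatted row at a time; only the current row is alive,
    # so memory use stays O(n) instead of O(n^2).
    for i in range(n + 1):
        yield " ".join(f"{v:.6f}" for v in row_func(i, n))


def write_matrix(n, out_path, row_func):
    # Stream the rows straight to disk instead of building a matrix object.
    with open(out_path, "w") as out:
        for line in format_rows(n, row_func):
            out.write(line + "\n")
```

Calling write_matrix(5000, "mat2.txt", sqrt_row) would produce the same kind of text file as the test script, one matrix row per line.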

Related

Script to copy all text from one file into template file

I'm extremely new to scripting (this will be the first I've ever written on my own), and I'm struggling...
Essentially, a program I'm using outputs multiple lists of atomic Cartesian coordinates for a molecule, where each set of coordinates contains a slightly different geometry. I then want to run calculations on this molecule at each of these geometries in another program. To do this, I need to create input files containing these coordinates based on a template file, and am hoping to do this via a bash or zsh script.
The first program outputs all geometries in a single file, of the form:
13
-15.02035015
C 3.0629012683 -0.1237662359 -0.0004161296
C 1.5725410176 -0.4599705612 -0.0010537192
H 3.6545324244 -1.0351015878 -0.0040975574
H 3.3232895577 0.4531937573 0.8855087768
H 3.3225341254 0.4598595336 -0.8822056347
N 0.6972014643 0.7054585380 0.0017284824
H 1.3274001069 -1.0545725774 0.8830977697
H 1.3271390154 -1.0504225891 -0.8878762403
H 0.8745667924 1.2800026498 -0.8166554074
H 0.8753847581 1.2767560879 0.8221982135
S -2.4024384670 -0.0657095889 -0.0009217321
H -2.1207044390 -1.3609141502 0.0227283569
H -1.0945221361 0.2739471520 0.0001162389
13
-15.02029090
C 3.0458878237 -0.1642767706 -0.0538270794
C 1.5490175255 -0.4572401536 0.0316764611
H 3.3628431459 0.4546044246 0.7845264240
H 3.2796163460 0.3602842378 -0.9790015411
H 3.6124852940 -1.0910645341 -0.0311065021
N 0.7057821467 0.7323073404 0.0100678359
H 1.3291157247 -0.9968212951 0.9565729700
H 1.2449884019 -1.0864558318 -0.8086148493
H 0.8643815373 1.2571851525 -0.8447936589
H 0.9361625337 1.3407384060 0.7900308086
S -2.3784808925 -0.1009812166 -0.0319557326
H -2.4637876581 -0.0476175701 1.2900767837
H -1.0744168237 0.2509451631 -0.0171658709
etc...
Essentially, each set consists of one line giving the number of atoms (always the same number within a file, but it depends on the molecule of interest [3 for water, H2O; 5 for methane, CH4; 4 for ammonia, NH3; etc.]), one comment line (this program writes the energies there) and then the Cartesian coordinates, followed directly by the next set of coordinates. In my test file, there are 49 sets of coordinates.
The template file will look something like this:
#Comment line, molecule number CONF_NUMBER
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...
* xyz 0 1
COORDINATES
*
So, ideally I would end up writing a script which would take the coordinates of each molecule from the coordinates file and generate an input file for every listed geometry based on the template, replacing the COORDINATES text in the template with the one of the geometries (and, if possible, to include a number in the first comment line, replacing CONF_NUMBER with a number matching the directory and file name):
~/c1/molecule-c1-name.inp:
#Comment line, molecule number 1
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...
* xyz 0 1
O 1.0 23.21 1.1
H 2.0 2.90 1.1
H 3.0 2.33 1.1
*
~/c2/molecule-c2-name.inp:
#Comment line, molecule number 2
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...
* xyz 0 1
O 2.0 23.21 1.1
H 3.0 2.70 1.43
H 2.0 2.33 1.1
*
etc...
So far, I've been able to split the geometries into individual coordinate files and even remove the two lines above each geometry (which are not needed). Unfortunately, I cannot find a way to copy the whole geometry into the template; I'm stuck in a position where only a single line from the coordinate file is copied over per input file. The code I've got so far is:
#!/bin/zsh
input_file=$1
#arguments
while getopts t:j:p:s: flag
do
case "${flag}" in
t) template_file=${OPTARG};;
j) job_file=${OPTARG};;
p) prefix=${OPTARG};;
s) suffix=${OPTARG};;
esac
done
#Determine number of atoms
n_atoms=$(sed -n 1p $1)
#Determine number of lines to separate
splitlines=$(echo $n_atoms | awk '{print ($0+2)}')
#determine number of conformers
n_conformers=$(grep -c "$n_atoms" $1)
#echo $splitlines
#Split the coordinates into individual .xyz files
split -dl $splitlines $1 coords
#Rename coordinate files
for file in coords*
do
sed -i -e 1,2d $file
mv "$file" "$file.xyz"
done
rm *-e
#Copy coordinates into template file
n=1
while read file
do
sed -i "" "s/COORDINATES/"$file"/r" template.inp > ea.h2s-input${n}.inp
((n++))
done < coords00.xyz #first coordinate file produced
In this example, coords00.xyz is the first of the separated coordinate files the script generates, template.inp is the template file, and ea.h2s-input${n}.inp is the name of the resulting input file (which I will later make customisable with arguments, hopefully).
Bear in mind that, during testing, I've been trying to get the simple things working, so this script is only written to get the first geometry copied into the template file (hence why the files are named explicitly, rather than as variables - although I'm hoping to use the arguments at the beginning of the script to help name each resulting input file), but I can't even get that to work!
Unfortunately, all other forum posts I've found only talk about copying small bits of text (a name, a word, one line of text) into templates - never multiple lines, let alone the entire contents of a file.
I have tried everything I can think of to get this to work, and this script is as close as I have gotten, but I cannot figure out how to print all of the coordinate lines into the template. Any help would be greatly appreciated!
If you are open to using awk:
awk -v tmpl="$(<src.tmpl)" 'BEGIN{cnt=1} \
NR==1 {n_atoms=$1} \
NF==1 {flag=1; close(out); out=sprintf("coords""%02d"".xyz", cnt)} \
{if(flag==0) {print $0 > out; coord_cnt++}} \
{if(coord_cnt==n_atoms){printf "*\n" > out; coord_cnt=0}}\
{if (NF==1 && $1!=n_atoms) { flag=0; \
printf "#Comment line, molecule number %s\n%s\n", cnt, tmpl > out; cnt+=1}}' src.txt
The template file (src.tmpl) contents should look like this (newlines escaped):
# \
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym \
%method \
functional mgga_xc_b97m_v \
end \
etc etc etc... \
\
* xyz 0 1
src.txt contents:
13
-15.02035015
C 3.0629012683 -0.1237662359 -0.0004161296
C 1.5725410176 -0.4599705612 -0.0010537192
H 3.6545324244 -1.0351015878 -0.0040975574
H 3.3232895577 0.4531937573 0.8855087768
H 3.3225341254 0.4598595336 -0.8822056347
N 0.6972014643 0.7054585380 0.0017284824
H 1.3274001069 -1.0545725774 0.8830977697
H 1.3271390154 -1.0504225891 -0.8878762403
H 0.8745667924 1.2800026498 -0.8166554074
H 0.8753847581 1.2767560879 0.8221982135
S -2.4024384670 -0.0657095889 -0.0009217321
H -2.1207044390 -1.3609141502 0.0227283569
H -1.0945221361 0.2739471520 0.0001162389
13
-15.02029090
C 3.0458878237 -0.1642767706 -0.0538270794
C 1.5490175255 -0.4572401536 0.0316764611
H 3.3628431459 0.4546044246 0.7845264240
H 3.2796163460 0.3602842378 -0.9790015411
H 3.6124852940 -1.0910645341 -0.0311065021
N 0.7057821467 0.7323073404 0.0100678359
H 1.3291157247 -0.9968212951 0.9565729700
H 1.2449884019 -1.0864558318 -0.8086148493
H 0.8643815373 1.2571851525 -0.8447936589
H 0.9361625337 1.3407384060 0.7900308086
S -2.3784808925 -0.1009812166 -0.0319557326
H -2.4637876581 -0.0476175701 1.2900767837
H -1.0744168237 0.2509451631 -0.0171658709
Output: (note output files numbering starts with '01')
$ ls coords0*
coords01.xyz coords02.xyz
$ cat coords0*
#Comment line, molecule number 1
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...
* xyz 0 1
C 3.0629012683 -0.1237662359 -0.0004161296
C 1.5725410176 -0.4599705612 -0.0010537192
H 3.6545324244 -1.0351015878 -0.0040975574
H 3.3232895577 0.4531937573 0.8855087768
H 3.3225341254 0.4598595336 -0.8822056347
N 0.6972014643 0.7054585380 0.0017284824
H 1.3274001069 -1.0545725774 0.8830977697
H 1.3271390154 -1.0504225891 -0.8878762403
H 0.8745667924 1.2800026498 -0.8166554074
H 0.8753847581 1.2767560879 0.8221982135
S -2.4024384670 -0.0657095889 -0.0009217321
H -2.1207044390 -1.3609141502 0.0227283569
H -1.0945221361 0.2739471520 0.0001162389
*
#Comment line, molecule number 2
#
!B97M-D4 verytightscf verytightopt freq DefGrid3 NoRI Mass2016 UseSym
%method
functional mgga_xc_b97m_v
end
etc etc etc...
* xyz 0 1
C 3.0458878237 -0.1642767706 -0.0538270794
C 1.5490175255 -0.4572401536 0.0316764611
H 3.3628431459 0.4546044246 0.7845264240
H 3.2796163460 0.3602842378 -0.9790015411
H 3.6124852940 -1.0910645341 -0.0311065021
N 0.7057821467 0.7323073404 0.0100678359
H 1.3291157247 -0.9968212951 0.9565729700
H 1.2449884019 -1.0864558318 -0.8086148493
H 0.8643815373 1.2571851525 -0.8447936589
H 0.9361625337 1.3407384060 0.7900308086
S -2.3784808925 -0.1009812166 -0.0319557326
H -2.4637876581 -0.0476175701 1.2900767837
H -1.0744168237 0.2509451631 -0.0171658709
*
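For comparison, the same split-and-fill step can be sketched in Python. This is a minimal sketch, not part of the answer above. It assumes the template contains the literal placeholders COORDINATES and CONF_NUMBER as shown in the question, and it uses the c1/molecule-c1-name.inp layout from the question's example. The function names are hypothetical.

```python
from pathlib import Path


def split_geometries(xyz_text):
    """Return one coordinate block (a multi-line string) per geometry.

    Each geometry is an atom-count line, a comment line, then that many
    coordinate lines. The count and comment lines are dropped.
    """
    lines = xyz_text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():      # tolerate blank lines between geometries
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])
        start = i + 2                 # skip the count line and the comment line
        blocks.append("\n".join(lines[start:start + n_atoms]))
        i = start + n_atoms
    return blocks


def fill_template(template, coords, number):
    # str.replace inserts the whole multi-line block in one go.
    return (template.replace("CONF_NUMBER", str(number))
                    .replace("COORDINATES", coords))


def write_inputs(xyz_path, template_path, out_dir):
    """Write out_dir/cN/molecule-cN-name.inp for every geometry found."""
    template = Path(template_path).read_text()
    blocks = split_geometries(Path(xyz_path).read_text())
    for number, coords in enumerate(blocks, start=1):
        conf_dir = Path(out_dir) / f"c{number}"
        conf_dir.mkdir(parents=True, exist_ok=True)
        out_file = conf_dir / f"molecule-c{number}-name.inp"
        out_file.write_text(fill_template(template, coords, number))
    return len(blocks)
```

Because the atom count is re-read at the start of each geometry, the sketch does not depend on grep-style counting of lines that happen to contain the number.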

CLOS make-instance is really slow and causes heap exhaustion in SBCL

I'm writing a multi-architecture assembler/disassembler in Common Lisp (SBCL 1.1.5 on 64-bit Debian GNU/Linux); currently the assembler produces correct code for a subset of x86-64. For assembling x86-64 assembly code I use a hash table in which assembly instruction mnemonics (strings) such as "jc-rel8" and "stosb" are keys that return a list of 1 or more encoding functions, like the ones below:
(defparameter *emit-function-hash-table-x64* (make-hash-table :test 'equalp))
(setf (gethash "jc-rel8" *emit-function-hash-table-x64*) (list #'jc-rel8-x86))
(setf (gethash "stosb" *emit-function-hash-table-x64*) (list #'stosb-x86))
The encoding functions are like these (some are more complicated, though):
(defun jc-rel8-x86 (arg1 &rest args)
(jcc-x64 #x72 arg1))
(defun stosb-x86 (&rest args)
(list #xaa))
Now I am trying to incorporate the complete x86-64 instruction set by using NASM's (NASM 2.11.06) instruction encoding data (file insns.dat) converted to Common Lisp CLOS syntax. This would mean replacing regular functions used for emitting binary code (like the functions above) with instances of a custom x86-asm-instruction class (a very basic class so far, some 20 slots with :initarg, :reader, :initform etc.), in which an emit method with arguments would be used for emitting the binary code for given instruction (mnemonic) and arguments. The converted instruction data looks like this (but it's more than 40'000 lines and exactly 7193 make-instance's and 7193 setf's).
;; first mnemonic + operand combination instances (:is-variant t).
;; there are 4928 such instances for x86-64 generated from NASM's insns.dat.
(eval-when (:compile-toplevel :load-toplevel :execute)
(setf Jcc-imm-near (make-instance 'x86-asm-instruction
:name "Jcc"
:operands "imm|near"
:code-string "[i: odf 0f 80+c rel]"
:arch-flags (list "386" "BND")
:is-variant t))
(setf STOSB-void (make-instance 'x86-asm-instruction
:name "STOSB"
:operands "void"
:code-string "[ aa]"
:arch-flags (list "8086")
:is-variant t))
;; then, container instances which contain (or could be refer to instead)
;; the possible variants of each instruction.
;; there are 2265 such instances for x86-64 generated from NASM's insns.dat.
(setf Jcc (make-instance 'x86-asm-instruction
:name "Jcc"
:is-container t
:variants (list Jcc-imm-near
Jcc-imm64-near
Jcc-imm-short
Jcc-imm
Jcc-imm
Jcc-imm
Jcc-imm)))
(setf STOSB (make-instance 'x86-asm-instruction
:name "STOSB"
:is-container t
:variants (list STOSB-void)))
;; thousands of objects more here...
) ; this bracket closes (eval-when (:compile-toplevel :load-toplevel :execute)
I have converted NASM's insns.dat to Common Lisp syntax (like above) using a trivial Perl script (further below, but there's nothing of interest in the script itself). In principle it works, but compiling those 7193 objects is really slow and commonly causes heap exhaustion. On my Linux Core i7-2760QM laptop with 16G of memory, compiling an (eval-when (:compile-toplevel :load-toplevel :execute) code block with 7193 objects like the ones above takes more than 7 minutes and sometimes causes heap exhaustion, like this:
;; Swank started at port: 4005.
* Heap exhausted during garbage collection: 0 bytes available, 32 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000
3: 38805 38652 0 0 49474 15433 389 416 0 2144219760 9031056 1442579856 0 1 1.5255
4: 127998 127996 0 0 45870 14828 106 143 199 1971682720 25428576 2000000 0 0 0.0000
5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
6: 0 0 0 0 1178 163 0 0 0 43941888 0 2000000 985 0 0.0000
Total bytes allocated = 4159844368
Dynamic-space-size bytes = 4194304000
GC control variables:
*GC-INHIBIT* = true
*GC-PENDING* = in progress
*STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 9994(tid 46912556431104):
Heap exhausted, game over.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
I had to add the --dynamic-space-size 4000 parameter for SBCL to get it compiled at all, but even with 4 gigabytes of dynamic space the heap sometimes gets exhausted. Even if the heap exhaustion were solved, more than 7 minutes for compiling 7193 instances after only adding a slot to the class (the 'x86-asm-instruction class used for these instances) is way too much for interactive development in the REPL (I use slimv, if that matters).
Here's the (time (compile-file ...)) output:
; caught 18636 WARNING conditions
; insns.fasl written
; compilation finished in 0:07:11.329
Evaluation took:
431.329 seconds of real time
238.317000 seconds of total run time (234.972000 user, 3.345000 system)
[ Run times consist of 6.073 seconds GC time, and 232.244 seconds non-GC time. ]
55.25% CPU
50,367 forms interpreted
784,044 lambdas converted
1,031,842,900,608 processor cycles
19,402,921,376 bytes consed
Using OOP (CLOS) would enable incorporating the instruction mnemonic (such as jc or stosb above, :name), allowed operands of the instruction (:operands), instruction's binary encoding (such as #xaa for stosb, :code-string) and possible architecture limitations (:arch-flags) of the instruction in one object. But it seems that at least my 3-year-old computer is not efficient enough to compile around 7000 CLOS object instances quickly.
My question is: is there some way to make SBCL's make-instance faster, or should I keep assembly code generation in regular functions like the examples further above? I'd also be very happy to know about any other possible solutions.
Here's the Perl script, just in case:
#!/usr/bin/env perl
use strict;
use warnings;
# this program converts NASM's `insns.dat` to Common Lisp Object System (CLOS) syntax.
my $firstchar;
my $line_length;
my $are_there_square_brackets;
my $mnemonic_and_operands;
my $mnemonic;
my $operands;
my $code_string;
my $flags;
my $mnemonic_of_current_mnemonic_array;
my $clos_object_name;
my $clos_mnemonic;
my $clos_operands;
my $clos_code_string;
my $clos_flags;
my @object_name_array = ();
my @mnemonic_array = ();
my @operands_array = ();
my @code_string_array = ();
my @flags_array = ();
my @each_mnemonic_only_once_array = ();
my @instruction_variants_array = ();
my @instruction_variants_for_current_instruction_array = ();
open(FILE, 'insns.dat');
$mnemonic_of_current_mnemonic_array = "";
# read one line at once.
while (<FILE>)
{
$firstchar = substr($_, 0, 1);
$line_length = length($_);
$are_there_square_brackets = ($_ =~ /\[.*\]/);
chomp;
if (($line_length > 1) && ($firstchar =~ /[^\t ;]/))
{
if ($are_there_square_brackets)
{
($mnemonic_and_operands, $code_string, $flags) = split /[\[\]]+/, $_;
$code_string = "[" . $code_string . "]";
($mnemonic, $operands) = split /[\t ]+/, $mnemonic_and_operands;
}
else
{
($mnemonic, $operands, $code_string, $flags) = split /[\t ]+/, $_;
}
$mnemonic =~ s/[\t ]+/ /g;
$operands =~ s/[\t ]+/ /g;
$code_string =~ s/[\t ]+/ /g;
$flags =~ s/[\t ]+//g;
# we don't want non-x86-64 instructions here.
unless ($flags =~ "NOLONG")
{
# ok, the content of each field is now filtered,
# let's convert them to a suitable Common Lisp format.
$clos_object_name = $mnemonic . "-" . $operands;
# in Common Lisp object names `|`, `,`, and `:` must be escaped with a backslash `\`,
# but that would get too complicated.
# so we'll simply replace them:
# `|` -> `-`.
# `,` -> `.`.
# `:` -> `.`.
$clos_object_name =~ s/\|/-/g;
$clos_object_name =~ s/,/./g;
$clos_object_name =~ s/:/./g;
$clos_mnemonic = "\"" . $mnemonic . "\"";
$clos_operands = "\"" . $operands . "\"";
$clos_code_string = "\"" . $code_string . "\"";
$clos_flags = "\"" . $flags . "\""; # add first and last double quotes.
$clos_flags =~ s/,/" "/g; # make each flag its own Common Lisp string.
$clos_flags = "(list " . $clos_flags. ")"; # convert to `list` syntax.
push @object_name_array, $clos_object_name;
push @mnemonic_array, $clos_mnemonic;
push @operands_array, $clos_operands;
push @code_string_array, $clos_code_string;
push @flags_array, $clos_flags;
if ($mnemonic eq $mnemonic_of_current_mnemonic_array)
{
# ok, same mnemonic as the previous one,
# so the current object name goes to the list.
push @instruction_variants_for_current_instruction_array, $clos_object_name;
}
else
{
# ok, this is a new mnemonic.
# so we'll mark this as current mnemonic.
$mnemonic_of_current_mnemonic_array = $mnemonic;
push @each_mnemonic_only_once_array, $mnemonic;
# we first push the old array (unless it's empty), then clear it,
# and then push the current object name to the cleared array.
if (@instruction_variants_for_current_instruction_array)
{
# push the variants array, unless it's empty.
push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
}
@instruction_variants_for_current_instruction_array = ();
push @instruction_variants_for_current_instruction_array, $clos_object_name;
}
}
}
}
# the last instruction's instruction variants must be pushed too.
if (@instruction_variants_for_current_instruction_array)
{
# push the variants array, unless it's empty.
push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ];
}
close(FILE);
# these objects need be created already during compilation.
printf("(eval-when (:compile-toplevel :load-toplevel :execute)\n");
# print the code to create each instruction + operands combination object.
for (my $i=0; $i <= $#mnemonic_array; $i++)
{
$clos_object_name = $object_name_array[$i];
$mnemonic = $mnemonic_array[$i];
$operands = $operands_array[$i];
$code_string = $code_string_array[$i];
$flags = $flags_array[$i];
# print the code to create a variant object.
# each object here is a variant of a single instruction (or a single mnemonic).
# actually printed as 6 lines to make it easier to read (for us humans, I mean), with an empty line in the end.
printf("(setf %s (make-instance 'x86-asm-instruction\n:name %s\n:operands %s\n:code-string %s\n:arch-flags %s\n:is-variant t))",
$clos_object_name,
$mnemonic,
$operands,
$code_string,
$flags);
printf("\n\n");
}
# print the code to create each instruction + operands combination object.
# for (my $i=0; $i <= $#each_mnemonic_only_once_array; $i++)
for my $i (0 .. $#instruction_variants_array)
{
$mnemonic = $each_mnemonic_only_once_array[$i];
# print the code to create a container object.
printf("(setf %s (make-instance 'x86-asm-instruction :name \"%s\" :is-container t :variants (list \n", $mnemonic, $mnemonic);
@instruction_variants_for_current_instruction_array = $instruction_variants_array[$i];
# for (my $j=0; $j <= $#instruction_variants_for_current_instruction_array; $j++)
for my $j (0 .. $#{$instruction_variants_array[$i]} )
{
printf("%s", $instruction_variants_array[$i][$j]);
# print 3 closing brackets if this is the last variant.
if ($j == $#{$instruction_variants_array[$i]})
{
printf(")))");
}
else
{
printf(" ");
}
}
# if this is not the last instruction, print two newlines.
if ($i < $#instruction_variants_array)
{
printf("\n\n");
}
}
# print the closing bracket to close `eval-when`.
print(")");
exit;
18636 warnings looks really bad. Start by getting rid of all the warnings.
I would start by getting rid of the EVAL-WHEN around all that. Does not make much sense to me. Either load the file directly, or compile and load the file.
Also note that SBCL does not like (setf STOSB-void ...) when the variable is undefined. New top-level variables are introduced with DEFVAR or DEFPARAMETER. SETF just sets them, but does not define them. That should help to get rid of the warnings.
Also :is-container t and :is-variant t smell like these properties should be converted into classes to inherit from (for example as a mixin). A container has variants. A variant does not have variants.

lldb - How to display float with decimals using "type format add"

I have a variable of type float. Xcode displays it using scientific notation (i.e. 3.37626e+07). I'm trying to get it to display using dot notation (i.e. 33762616.00).
I've tried every format provided by lldb, but none displays the float using decimals. I read other posts and watched the WWDC2012 session 415 (as suggested here), but I must be too close to the forest to see the trees. Any help would be greatly appreciated!
Try adding a custom data formatter in your ~/.lldbinit file for type float. e.g.
Process 13204 stopped
* thread #1: tid = 0xb6f8d, 0x0000000100000f33 a.out`main + 35 at a.c:5, stop reason = step over
#0: 0x0000000100000f33 a.out`main + 35 at a.c:5
2 int main ()
3 {
4 float f = 33762616.0;
-> 5 printf ("%f\n", f);
6 }
(lldb) p f
(float) $0 = 3.37626e+07
(lldb) type summ add -v -o "return '%f' % valobj.GetData().GetFloat(lldb.SBError(), 0)" float
(lldb) p f
(float) $1 = 33762616.000000
(lldb)
The default set of formatters provided by lldb can't do this, but dropping into Python allows you a lot of flexibility.
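The one-liner above can also live in a small Python file, which is easier to keep in ~/.lldbinit. The following is a sketch under assumptions: the file name float_fmt.py and the function name float_summary are hypothetical, and the two-decimal format is chosen to match the output requested in the question. The GetData().GetFloat(...) call is the one used in the command above.

```python
try:
    import lldb  # only importable inside LLDB's embedded Python interpreter
except ImportError:
    lldb = None  # lets the formatting logic be exercised outside LLDB


def float_summary(valobj, internal_dict):
    """Summary provider: show a float in fixed-point notation, 2 decimals."""
    error = lldb.SBError() if lldb else None
    value = valobj.GetData().GetFloat(error, 0)
    return "%.2f" % value


def __lldb_init_module(debugger, internal_dict):
    # Runs when the file is loaded with "command script import".
    # Assumes this file is saved as float_fmt.py (hypothetical name).
    debugger.HandleCommand(
        "type summary add -F float_fmt.float_summary float")
```

With the file saved somewhere stable, a line such as "command script import /path/to/float_fmt.py" in ~/.lldbinit would register the summary at startup.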

Tracing all functions calls and printing out their parameters (userspace)

I want to see what functions are called in my user-space C99 program and in what order. Also, which parameters are given.
Can I do this with DTrace?
E.g. for program
int g(int a, int b) { puts("I'm g"); }
int f(int a, int b) { g(5+a,b);g(8+b,a);}
int main() {f(5,2);f(5,3);}
I want to see a text file with:
main(1,{"./a.out"})
f(5,2);
g(10,2);
puts("I'm g");
g(10,5);
puts("I'm g");
f(5,3);
g(10,3);
puts("I'm g");
g(11,5);
puts("I'm g");
I do not want to modify my source, and the program is really huge: about 9 thousand functions.
I have all sources; I have a program with debug info compiled into it, and gdb is able to print function parameters in backtrace.
Is the task solvable with DTrace?
My OS is one of BSD, Linux, MacOS, Solaris. I prefer Linux, but I can use any of listed OS.
Here's how you can do it with DTrace:
script='pid$target:a.out::entry,pid$target:a.out::return { trace(arg1); }'
dtrace -F -n "$script" -c ./a.out
The output of this command looks as follows on FreeBSD 14.0-CURRENT:
dtrace: description 'pid$target:a.out::entry,pid$target:a.out::return ' matched 17 probes
I'm g
I'm g
I'm g
I'm g
dtrace: pid 39275 has exited
CPU FUNCTION
3 -> _start 34361917680
3 -> handle_static_init 140737488341872
3 <- handle_static_init 2108000
3 -> main 140737488341872
3 -> f 2
3 -> g 2
3 <- g 32767
3 -> g 5
3 <- g 32767
3 <- f 0
3 -> f 3
3 -> g 3
3 <- g 32767
3 -> g 5
3 <- g 32767
3 <- f 0
3 <- main 0
3 -> __do_global_dtors_aux 140737488351184
3 <- __do_global_dtors_aux 0
The annoying thing is that I've not found a way to print all the function arguments (see How do you print an associative array in DTrace?). A hacky workaround is to add trace(arg2), trace(arg3), etc. The problem is that for nonexistent arguments there will be garbage printed out.
Yes, you can do this with dtrace. But you probably will never be able to do it on linux. I've tried multiple versions of the linux port of dtrace and it's never done what I wanted. In fact, it once caused a CPU panic. Download the dtrace toolkit from http://www.brendangregg.com/dtrace.html. Then set your PATH accordingly. Then execute this:
dtruss -a yourprogram args...
Your question is exceedingly likely to be misguided. For any non-trivial program, printing the sequence of all function calls executed with their parameters will result in multi-MB or even multi-GB output that you will not be able to make any sense of (too much detail for a human to understand).
That said, I don't believe you can achieve what you want with dtrace.
You might begin by using GCC's -finstrument-functions flag, which would easily allow you to print function addresses on entry/exit to every function. You can then trivially convert addresses into function names with addr2line. This gives you what you asked for (except parameters).
If the result doesn't prove to be too much detail, you can set a breakpoint on every function in GDB (with rb . command), and attach continue command to every breakpoint. This will result in a steady stream of breakpoints being hit (with parameters), but the execution will likely be at least 100 to 1000 times slower.

Programmatically get/set Mac OSX default system keyboard shortcut

I'm trying to find a way to programmatically get/set the default OSX system keyboard shortcuts (hotkeys) found in the System Preferences -> Keyboard & Mouse -> Keyboard Shortcuts tab. I need to be able to do this in the background, so GUI scripting is not a solution.
I'm unable to find a plist or anything where this info might be stored. I tried using Instruments "File Activity" trace while using System Preferences, but again came up empty handed.
Any help is appreciated.
Actually there's a plist for that: the information is stored in com.apple.symbolichotkeys under AppleSymbolicHotKeys, which is a complex structure of nested dicts and lists:
$ defaults read com.apple.symbolichotkeys AppleSymbolicHotKeys
{
10 = {
enabled = 1;
value = {
parameters = (
65535,
96,
8650752
);
type = standard;
};
};
11 = {
enabled = 1;
value = {
parameters = (
65535,
97,
8650752
);
type = standard;
};
};
[...]
}
Let's say you want to programmatically modify the "Show Help Menu" shortcut in System Preferences -> Keyboard -> Shortcuts tab -> App Shortcut -> All Applications. To find the correct entry, print the whole plist to a text file, modify the shortcut in System Preferences, print the plist again to a second file, and diff them:
$ defaults read com.apple.symbolichotkeys AppleSymbolicHotKeys > 1
$ # modify System Preferences
$ defaults read com.apple.symbolichotkeys AppleSymbolicHotKeys > 2
$ diff -U 5 1 2
--- 1 2019-05-27 23:37:58.000000000 -0300
+++ 2 2019-05-27 23:38:24.000000000 -0300
@@ -5063,13 +5063,13 @@
};
98 = {
enabled = 1;
value = {
parameters = (
- 32,
- 49,
- 524288
+ 105,
+ 34,
+ 655360
);
type = standard;
};
};
};
So the entry to be modified is 98. Since it's a complex structure, you'll have to use /usr/libexec/PlistBuddy to do it:
# Set "alt + Space" as shortcut for "Help menu"
/usr/libexec/PlistBuddy -c "Delete :AppleSymbolicHotKeys:98:value:parameters" ~/Library/Preferences/com.apple.symbolichotkeys.plist
/usr/libexec/PlistBuddy -c "Add :AppleSymbolicHotKeys:98:value:parameters array" ~/Library/Preferences/com.apple.symbolichotkeys.plist
/usr/libexec/PlistBuddy -c "Add :AppleSymbolicHotKeys:98:value:parameters: integer 32" ~/Library/Preferences/com.apple.symbolichotkeys.plist
/usr/libexec/PlistBuddy -c "Add :AppleSymbolicHotKeys:98:value:parameters: integer 49" ~/Library/Preferences/com.apple.symbolichotkeys.plist
/usr/libexec/PlistBuddy -c "Add :AppleSymbolicHotKeys:98:value:parameters: integer 524288" ~/Library/Preferences/com.apple.symbolichotkeys.plist
/usr/libexec/PlistBuddy -c "Delete :AppleSymbolicHotKeys:98:enabled" ~/Library/Preferences/com.apple.symbolichotkeys.plist
/usr/libexec/PlistBuddy -c "Add :AppleSymbolicHotKeys:98:enabled bool true" ~/Library/Preferences/com.apple.symbolichotkeys.plist
Note:
I had to delete the bool parameter in order to modify it
Computer must be restarted for the changes to be applied
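The same edit can be sketched with Python's standard plistlib module instead of repeated PlistBuddy calls. This is a sketch under assumptions, not a tested replacement: the function names set_hotkey and update_file are hypothetical, the file is assumed to be a plist that plistlib can parse, and the entry number 98 and parameter values are the ones found with the diff above.

```python
import plistlib
from pathlib import Path

# Location given in this thread for the symbolic hotkeys preferences.
PLIST = Path.home() / "Library/Preferences/com.apple.symbolichotkeys.plist"


def set_hotkey(prefs, key_id, parameters, enabled=True):
    """Update one AppleSymbolicHotKeys entry in an already-loaded dict."""
    hotkeys = prefs.setdefault("AppleSymbolicHotKeys", {})
    entry = hotkeys.setdefault(str(key_id), {})   # plist dict keys are strings
    entry["enabled"] = enabled
    value = entry.setdefault("value", {"type": "standard"})
    value["parameters"] = list(parameters)
    return prefs


def update_file(key_id, parameters, path=PLIST):
    """Load the plist, change one entry, and write the file back."""
    with open(path, "rb") as fh:
        prefs = plistlib.load(fh)      # reads both XML and binary plists
    set_hotkey(prefs, key_id, parameters)
    with open(path, "wb") as fh:
        plistlib.dump(prefs, fh, fmt=plistlib.FMT_BINARY)
```

For the example above, update_file(98, [32, 49, 524288]) would correspond to the PlistBuddy sequence. The restart caveat from the note still applies.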
Oops, I re-ran Instruments, but made sure to close System Preferences this time; the shortcuts weren't getting written out until then.
Turns out the file is located at ~/Library/Preferences/com.apple.symbolichotkeys.plist
But it's pretty cryptic. Nonetheless, this is what I was after.
There is an API for this (getting, not setting).

Resources