lldb: how to call a function from a specific library/framework - macos

Problem: in our project we have localization functions that are specific to a framework/dynamic library. That is, they have identical names but fetch resources from different bundles/folders.
I'd like to call a function from a specific library, something similar to:
lldb> p my_audio_engine.framework::GetL10nString( stringId );
lldb> expr --shlib my_audio_engine.framework -- GetL10nString();
lldb> p my_audio_engine`L10N_Utils::GetString(40000)
but all these variants don't work.
I'm adding the gdb tag in the hope that the same syntax, if it exists there, will work in lldb as well.

lldb's expression parser doesn't currently have the equivalent of gdb's foo.c::function meta-symbol to encode a function from a specific source file.
Please feel free to file a bug requesting this at bugreporter.apple.com. It will get duped to the one I filed a while ago, but dups are votes for features, and we haven't gotten around to this one yet 'cause nobody but me asked for it...
For the nonce, you will have to do this by hand. Here's a silly example for calling printf, which I happen to know is in libsystem_c.dylib on OS X. First, I find the address in the shared library I am interested in:
(lldb) image lookup -vn printf libsystem_c.dylib
1 match found in /usr/lib/system/libsystem_c.dylib:
Address: libsystem_c.dylib[0x0000000000042948] (libsystem_c.dylib.__TEXT.__text + 266856)
Summary: libsystem_c.dylib`printf
Module: file = "/usr/lib/system/libsystem_c.dylib", arch = "x86_64"
Symbol: id = {0x00000653}, range = [0x00007fff91307948-0x00007fff91307a2c), name="printf"
The first address (the one under Address) is the address of the function in the dylib, not where it got loaded in the running program. That's not immediately useful. I could calculate the library's load offset if I wanted to and apply it to the file address, but fortunately the first address in the Symbol's address range is the address in the running program so I don't have to. 0x00007fff91307948 is the address I want.
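For reference, the load-offset calculation mentioned above is a single subtraction. This is a minimal sketch using the two addresses from the image lookup output; the variable and function names are mine, and it assumes the whole image is shifted by one constant offset.

```python
# Addresses copied from the `image lookup` output above.
file_addr = 0x0000000000042948   # address of printf inside the dylib file
load_addr = 0x00007fff91307948   # start of the Symbol range in the running process

# The library's load offset (slide) is the difference between the two.
slide = load_addr - file_addr
print(hex(slide))                # 0x7fff912c5000


def to_load_address(file_address, slide):
    """Apply the slide to a file address to get the in-process address."""
    return file_address + slide


print(hex(to_load_address(file_addr, slide)))   # 0x7fff91307948
```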
Now I want to call that address. I do this in two steps because it makes the casting easier, like:
(lldb) expr typedef int (*$printf_type)(const char *, ...)
(lldb) expr $printf_type $printf_function = ($printf_type) 0x00007fff91307948
Now I have a function I can call over and over:
(lldb) expr $printf_function("Hello world %d times.\n", 400)
Hello world 400 times.
(int) $2 = 23
If you are going to do this over and over, you can write a Python function that finds the symbol in the library of interest and constructs the expression that calls the right function. The Python APIs include calls to get symbols from a particular module (lldb-speak for loadable binary images), get their addresses, evaluate expressions, etc.
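A rough sketch of such a helper follows. The function names build_call_expression and call_in_module are hypothetical; the lldb calls used (GetNumModules, GetModuleAtIndex, GetFileSpec, FindSymbol, GetStartAddress, GetLoadAddress, EvaluateExpression) are from the SB API, but treat the whole thing as an untested outline rather than a finished tool. It assumes symbol_name is spelled exactly as it appears in the module's symbol table.

```python
def build_call_expression(load_addr, return_type, arg_types, args):
    """Build an expression that casts a raw address to a function pointer and calls it."""
    signature = "{} (*)({})".format(return_type, ", ".join(arg_types))
    return "(({}){:#x})({})".format(signature, load_addr, ", ".join(args))


def call_in_module(target, frame, module_name, symbol_name,
                   return_type, arg_types, args):
    """Call symbol_name from the module whose file name is module_name.

    target and frame are expected to be lldb.SBTarget and lldb.SBFrame objects.
    Returns the result of EvaluateExpression, or None if nothing was found.
    """
    for i in range(target.GetNumModules()):
        module = target.GetModuleAtIndex(i)
        if module.GetFileSpec().GetFilename() != module_name:
            continue
        symbol = module.FindSymbol(symbol_name)
        if not symbol.IsValid():
            return None
        # Load address = where the symbol lives in the running process.
        load_addr = symbol.GetStartAddress().GetLoadAddress(target)
        expr = build_call_expression(load_addr, return_type, arg_types, args)
        return frame.EvaluateExpression(expr)
    return None
```

From lldb's interactive script interpreter, where lldb.target and lldb.frame are available, the printf example would look something like call_in_module(lldb.target, lldb.frame, "libsystem_c.dylib", "printf", "int", ["const char *", "..."], ['"Hello world %d times.\\n"', "400"]).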

Related

Is it possible for CLion to evaluate a function when debugging Rust code?

A snippet of Rust code:
pub fn main() {
    let a = "hello";
    let b = a.len();
    let c = b;
    println!("len:{}", c)
}
When debugging in CLion, is it possible to evaluate a function? For example, I step through the code until execution stops at the last line (println!...). When I add the expression a.len() to the watch window, the IDE can't evaluate it. It says: error: no field named len
This is the same reason you can't make conditional breakpoints for Rust code:
Can't create a conditional breakpoint in VSCode-LLDB with Rust
I hope I'm not too late to answer this, but with both lldb and gdb, Rust debugging capability is currently rather constrained.
Straightforward expressions work; anything complex is likely to cause problems.
From trying this in rust-lldb, my observation is that the expression parser understands only a small portion of Rust.
There is no support for macros.
Unused functions are not included in the final binary.
For instance, you cannot call capacity() on a HashMap in the debugger if that method is not included in the binary.
Methods must be called as follows:
struct_value.method(&struct_value)
I haven't discovered any technique for calling monomorphized functions on generic structs (like HashMap).
String constants "..." in lldb expressions are produced as C-style string constants. For example, "hello" is a const char [6] including the trailing NUL byte.
Therefore, they are not valid Rust strings.

Match the left side variable of an assignment to the return value of the right side function call

For the following statement inside function func(), I'm trying to figure out the variable name (which is 'dictionary' in the example) that points to the malloc'ed memory region.
void func() {
    uint64_t * dictionary = (uint64_t *) malloc ( sizeof(uint64_t) * 128 );
}
The instrumented malloc() can record the start address and size of the allocation. However, it has no knowledge of the variable 'dictionary' that the result will be assigned to. Are there any compiler-side features that can help solve this problem without modifying the compiler to instrument such assignment statements?
One approach I've been considering relies on the fact that the variable 'dictionary' and the call to 'malloc' are on the same source line or on adjacent lines, and DWARF provides line information.
One thing you can do with Clang and LLVM is emit the code with debug information and then look for malloc calls. These will be assigned to LLVM values, which can be traced (when not compiled with optimizations, that is) to the original C/C++ source code via the debug information metadata.

What does Arduino's "F()" actually do?

I have asked a similar question before, but I realize that I can't make heads or tails of the macrology and templateness. I'm a C (rather than C++) programmer.
What does F() actually do? When does it stuff characters into pgmem (flash)? When does it pull characters out of pgmem? Does it cache them? How does it handle low-memory situations?
There are no templates involved, only function overloading. The F() macro does two things:
uses PSTR to ensure that the literal string is stored in flash memory (the code space rather than the data space). However, PSTR("some string") cannot be printed because it would receive a simple char * which represents a base address of the string stored in flash. Dereferencing that pointer would access some random characters from the same address in data. Which is why F() also...
casts the result of PSTR() to __FlashStringHelper*. Functions such as print and println are overloaded so that, on receiving a __FlashStringHelper* argument, they correctly dereference the characters in the flash memory.
BTW. For the ESP32 library, both of these functions are defined in the following files:
# PSTR : ../Arduino/hardware/espressif/esp32/cores/esp32/pgmspace.h
# F : ../Arduino/hardware/espressif/esp32/cores/esp32/WString.h
And the F(x):
// An abstract class used as a means to provide a unique pointer type
// but really has no body
class __FlashStringHelper;
#define F(string_literal) (reinterpret_cast<const __FlashStringHelper *>(PSTR(string_literal)))
...
Also for ESP32, PSTR(x) is not needed and is just x: #define PSTR(s) (s).

How to change a call with Reverse engineering

I have an example program test1.exe that uses an example library test2.dll.
test2.dll contains the functions A() and B() of the same type.
test1.exe calls A and then exits.
Here I've found the call to A():
(http://i.stack.imgur.com/5W9Jd.jpg)
Now, if I'm not mistaken, I need to replace 88FDFFFF with the correct offset of B(), but how can I calculate it so that B() will be invoked instead of A()?
If this is an x86 call-relative instruction, the offset value is computed by subtracting the address of the instruction following the call (= call instruction location + 5 bytes) from the address of the target. So, you need to patch the offset to be address(B) - (address(call instruction) + 5).
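The arithmetic can be sketched as follows. Only the bytes 88FDFFFF come from the question; the call address and the address of B() are hypothetical values picked for illustration, and the helper names are mine. The sketch assumes a 5-byte near call (opcode E8 followed by a 32-bit little-endian signed offset).

```python
import struct

CALL_LEN = 5  # E8 opcode + 4-byte relative offset


def rel32_for(call_addr, target_addr):
    """Return the 4 offset bytes for a call located at call_addr that should reach target_addr."""
    offset = target_addr - (call_addr + CALL_LEN)
    return struct.pack("<i", offset)   # little-endian signed 32-bit


def target_of(call_addr, offset_bytes):
    """Inverse operation: where does a call at call_addr with these offset bytes land?"""
    (offset,) = struct.unpack("<i", offset_bytes)
    return call_addr + CALL_LEN + offset


# The existing offset bytes decode to a negative displacement.
old = bytes.fromhex("88FDFFFF")
print(struct.unpack("<i", old)[0])             # -632

# Hypothetical addresses, for illustration only.
call_addr = 0x00401273
addr_b = 0x00401050
print(hex(target_of(call_addr, old)))          # 0x401000 (where A() would be)
print(rel32_for(call_addr, addr_b).hex())      # d8fdffff (bytes to patch in)
```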
If B is imported in test1.exe it is easy; otherwise you have to use LoadLibrary and GetProcAddress.
Press Ctrl+N to see whether B is imported or not.
I would recommend learning asm basics first and playing with the HIEW hex editor/disassembler to change simple code.

What obscure syntax ruined your day? [closed]

Closed 10 years ago.
When have you run into syntax that was so dated, rarely used or just plain obfuscated that you couldn't understand it for the life of you?
For example, I never knew that comma is an actual operator in C. So when I saw the code
if(Foo(), Bar())
I just about blew a gasket trying to figure out what was going on there.
I'm curious what little never-dusted corners might exist in other languages.
C++'s syntax for a default constructor on a local variable. At first I wrote the following.
Student student(); // error
Student student("foo"); // compiles
This led me to about an hour of reading through a cryptic C++ error message. Eventually a non-C++ newbie dropped by, laughed and pointed out my mistake.
Student student;
This is always jarring:
std::vector <std::vector <int> >
                              ^
                              mandatory space.
When using the System.DirectoryServices namespace to bind to an ADAM (Active Directory Application Mode; now called AD LDS, I think), I lost an entire day trying to debug this simple code:
DirectoryEntry rootDSE = new DirectoryEntry(
"ldap://192.168.10.78:50000/RootDSE",
login,
password,
AuthenticationTypes.None);
When I ran the code, I kept getting a COMException with error 0x80005000, which helpfully mapped to "Unknown error."
I could use the login and password and bind to the port via ADSI Edit. But this simple line of code didn't work. Bizarre firewall permission? Something screwed in configuration? Some COM object not registered correctly? Why on earth wasn't it working?
The answer? It's LDAP://, not ldap://.
And this is why we drink.
C++
class Foo
{
// Lots of stuff here.
} bar;
The declaration of bar is VERY difficult to see. More commonly found in C, but especially annoying in C++.
Perl's syntax caused me a bad day a while ago:
%table = {
foo => 1,
bar => 2
};
Without proper warnings (which are unavailable on the platform I was using), this creates a one-element hash with the given hash reference as its key and undef as its value. Note the subtle use of {}, which creates a new hash reference, and not (), which is a list used to populate the %table hash.
I was shocked Python's quasi-ternary operator wasn't a syntax error the first time I saw it:
X if Y else Z
This is stupid and common, but this syntax:
if ( x = y ) {
// do something
}
Has caught me about three times in the past year in a couple of different languages. I really like the R language's convention of using <- for assignment, like this:
x <- y
If the x = y syntax were made to mean x == y, and x <- y to mean assignment, my brain would make a smoother transition to and from math and programming.
C/C++'s bit-field syntax. The worst part about this is trying to google for it simply based on the syntax.
struct C {
unsigned int v1 : 12;
unsigned int v2 : 1;
};
C#'s ?? operator threw me for a loop the first time I saw it. Essentially it will return the LHS if it's non-null and the RHS if the LHS is null.
object bar = null;
object foo = bar ?? new Student(); // gets new Student()
PowerShell's function calling semantics
function foo() {
    param($count, $name);
    ...
}
foo (5, "name")
For the non-PowerShellers out there: this will work, but not how you expect it to. It actually creates an array and passes it as the first argument. The second argument has no explicit value. The correct version is
foo 5 "name"
The first time I saw a function pointer in C++ I was confused. Worse, because the syntax has no key words, it was really hard to look up. What exactly does one type into a search engine for this?
int (*Foo)(float, char, char);
I ended up having to ask the local C++ guru what it was.
VB's (yeah yeah, I have to use it) "And" keyword - as in:
If Object IsNot Nothing And Object.Property Then
See that Object.Property reference, after I've made sure the object isn't NULL? Well, VB's "And" keyword does not block further evaluation and so the code will fail.
VB does have, however, another keyword - AndAlso:
If Object IsNot Nothing AndAlso Object.Property Then
That will work as you'd expect and not explode when run.
I was once very confused by some C++ code that declared a reference to a local variable, but never used it. Something like
MyLock &foo;
(Cut me some slack on the syntax, I haven't done C++ in nearly 8 years)
Taking that seemingly unused variable out made the program start dying in obscure ways seemingly unrelated to this "unused" variable. So I did some digging, and found out that the default ctor for that class grabbed a thread lock, and the dtor released it. This variable was guarding the code against simultaneous updates without seemingly doing anything.
Javascript: This syntax ...
for(i in someArray)
... is for looping through arrays, or so I thought. Everything worked fine until another team member dropped in MooTools, and then all my loops were broken because the for(i in ...) syntax also goes over extra methods that have been added to the array object.
Had to translate some scientific code from old FORTRAN to C. A few things that ruined my day(s):
Punch-card indentation. The first 6 characters of every line were reserved for control characters, goto labels, comments, etc:
^^^^^^[code starts here]
c [commented line]
Goto-style numbering for loops (coupled with 6 space indentation):
      do 20, i=0,10
      do 10, j=0,10
      do_stuff(i,j)
10    continue
20    continue
Now imagine there are multiple nested loops (i.e., do 20 to do 30) which have no differentiating indentation to know what context you are in. Oh, and the terminating statements are hundreds of lines away.
Format statement, again using goto labels. The code wrote to files (helpfully referred to by numbers 1,2,etc). To write the values of a,b,c to file we had:
write (1,51) a,b,c
So this writes a,b,c to file 1 using a format statement at the line marked with label 51:
51 format (f10.3,f10.3,f10.3)
These format lines were hundreds of lines away from where they were called. This was complicated by the author's decision to print newlines using:
write (1,51) [nothing here]
I am reliably informed by a lecturer in the group that I got off easy.
C's comma operator doesn't seem very obscure to me: I see it all the time, and if I hadn't, I could just look up "comma" in the index of K&R.
Now, trigraphs are another matter...
void main() { printf("wat??!\n"); } // doesn't print "wat??!"
Wikipedia has some great examples, from the genuinely confusing:
// Will the next line be executed????????????????/
a++;
to the bizarrely valid:
/??/
* A comment *??/
/
And don't even get me started on digraphs. I would be surprised if there's somebody here who can fully explain C's digraphs from memory. Quick, what digraphs does C have, and how do they differ from trigraphs in parsing?
Syntax like this in C++ with /clr enabled. Trying to create a Managed Dictionary object in C++.
gcroot<Dictionary<System::String^, MyObj^>^> m_myObjs;
An oldie:
In PL/1 there are no reserved words, so you can define variables, methods, etc. with the same name as the language keywords.
This can be a valid line of code:
IF ELSE THEN IF ELSE THEN
(Where ELSE is a boolean, and IF and THEN are functions, obviously.)
VB's Iif(condition, expression, expression) is a function call, not an operator.
Both sides of the conditional are ALWAYS evaluated.
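The effect is easy to reproduce in any language with eager argument evaluation. Here is a small Python sketch; iif and tracked are hypothetical helpers written for this illustration, not part of any library. It contrasts an IIf-style function with a real conditional expression.

```python
def iif(condition, true_part, false_part):
    # A plain function: both arguments were already evaluated by the caller.
    return true_part if condition else false_part


calls = []


def tracked(label, value):
    """Record that this branch was evaluated, then return its value."""
    calls.append(label)
    return value


# Function-call form: both branches run, even though only one is returned.
result = iif(True, tracked("true branch", 1), tracked("false branch", 2))
print(result, calls)   # 1 ['true branch', 'false branch']

# Operator form: only the selected branch runs.
calls.clear()
result = tracked("true branch", 1) if True else tracked("false branch", 2)
print(result, calls)   # 1 ['true branch']
```

This matters whenever one of the branches has a side effect or would fail, for example a division guarded by a zero check.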
It always ruins my day if I have to read/write some kind of Polish notation as used in a lot of HP calculators...
PHP's ternary operator associates left to right. This caused me much anguish one day when I was learning PHP. For the previous 10 years I had been programming in C/C++ in which the ternary operator associates right to left.
I am still a little curious as to why the designers of PHP chose to do that when, in many other respects, the syntax of PHP matches that of C/C++ fairly closely.
EDIT: nowadays I only work with PHP under duress.
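To make the difference concrete, here is a Python sketch that spells out both groupings of the chained expression x ? "one" : y ? "two" : "other" with explicit nesting. The ternary helper is hypothetical and only stands in for the ?: operator; Python's truthiness of a non-empty string matches PHP's here.

```python
def ternary(cond, a, b):
    """Stand-in for the ?: operator."""
    return a if cond else b


x, y = True, True

# Right-to-left grouping (C/C++):  x ? "one" : (y ? "two" : "other")
right = ternary(x, "one", ternary(y, "two", "other"))

# Left-to-right grouping (the PHP behaviour described above):
# (x ? "one" : y) ? "two" : "other"
left = ternary(ternary(x, "one", y), "two", "other")

print(right)   # one
print(left)    # two
```

With left-to-right grouping the inner result "one" is itself used as a condition, and since a non-empty string is truthy the whole expression yields "two". Newer PHP versions reject unparenthesized nested ternaries, so explicit parentheses are the safe choice either way.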
Not really obscure, but whenever I code too much in one language, and go back to another, I start messing up the syntax of the latter. I always chuckle at myself when I realize that "#if" in C is not a comment (but rather something far more deadly), and that lines in Python do not need to end in a semicolon.
While performing maintenance on a bit of C++ code I once spotted that someone had done something like this:
for (i=0; i<10; i++)
{
MyNumber += 1;
}
Yes, they had a loop to add 1 to a number 10 times.
Why did it ruin my day? The perpetrator had long since left, and I was having to bug fix their module. I thought that if they were doing something like this, goodness knows what else I was going to encounter!
AT&T assembler syntax >:(
This counter-intuitive, obscure syntax has ruined many of my days, for example, the simple Intel syntax assembly instruction:
mov dword es:[ebp-5], 1 /* Cool, put the value 1 into the
* location of ebp minus five.
* this is so obvious and readable, and hard to mistake
* for anything else */
translates into this in AT&T syntax
movl $1, %es:-5(%ebp) /* huh? what's "l"? 4 bytes? 8 bytes? arch specific??
* wait, why are we moving 1 into -5 times ebp?
* or is this moving -5 * ebp into memory at address 0x01?
* oh wait, YES, I magically know that this is
* really setting 4 bytes at ebp-5 to 1! */
More...
mov dword [foo + eax*4], 123 /* Intel */
movl $123, foo(, %eax, 4) /* AT&T, looks like a function call...
* there's no way in hell I'd know what this does
* without reading a full manual on this syntax */
And one of my favorites.
It's as if they took the opcode encoding scheme and tried to incorporate it into the programming syntax (read: scale/index/base), but also tried to add a layer of abstraction on the data types, and merge that abstraction into the opcode names to cause even more confusion. I don't see how anyone can program seriously with this.
In a scripting language (Concordance Programming Language) for stand alone database software (Concordance) used for litigation document review, arrays were 0 indexed while (some) string functions were 1 indexed. I haven't touched it since.
This. I have had my run-ins with it more than once.
GNU extensions are often fun:
my_label:
unsigned char *ptr = (unsigned char *)&&my_label;
*ptr = 5; // Will it segfault? Finding out is half the fun...
The syntax for member pointers also causes me grief, more because I don't use it often enough than because there's anything really tricky about it:
template<typename T, int T::* P>
void function(T& t)
{
t.*P = 5;
}
But, really, who needs to discuss the obscure syntax in C++? With operator overloading, you can invent your own!

Resources