gdb7.5 print utf8 string show string address not string content - utf-8

I have just installed the latest released gdb--gdb7.5。 when i used it to debug the c++ program which is encoded with utf-8, i set the gdb charset to utf8 with the command" set charset utf-8". However when i want to print a string : char *str = "明天是个好天气" , the "p str" result is the address of the string rather than the content. So is there anyting necessary for gdb to debug utf8 string?
Breakpoint 1, test () at test.cpp:6
6 char *str = "我们的世界多么美好";
(gdb) n
8 printf( "%s\n" , str );
(gdb) p str
$1 = 0x40067c "0\221们2\204265\214321016好"
(gdb) set charset UTF-8
(gdb) show charset
The host character set is "UTF-8".
The target character set is "UTF-8".
The target wide character set is "auto; currently UTF-32".
(gdb) p str
$2 = 0x40067c
(gdb)

Tell us exactly what you did, and exactly what GDB prints.
Using gdb-7.5:
Temporary breakpoint 1, main () at t.c:5
5 char *str = "明天是个好天气";
(gdb) n
6 return 0;
(gdb) p str
$1 = 0x4005cc "明天是个好天气"
(gdb) show charset
The host character set is "auto; currently UTF-8".
The target character set is "auto; currently UTF-8".
The target wide character set is "auto; currently UTF-32".
(gdb) set charset utf
Undefined item: "utf".
This shows that:
GDB does the right thing "out of the box" for me.
You likely didn't do set charset utf, but did something else.
Your LANG setting in the environment is relevant. Mine is:
$ echo $LANG
en_US.UTF-8
If I unset LANG, I get:
(gdb) p str
$1 = 0x4005cc "\346\230\216\345\244\251\346\230\257\344\270\252\345\245\275\345\244\251\346\260\224"

Related

Declare an lldb summary-string for a sized string type

I would like to have a formatter for the buildin string type of the nim language, but somehow I fail at providing it. Nim compilis to c, and the c representation of the string type you see here:
#if defined(__GNUC__) || defined(__clang__) || defined(_MSC_VER)
# define SEQ_DECL_SIZE /* empty is correct! */
#else
# define SEQ_DECL_SIZE 1000000
#endif
typedef char NIM_CHAR;
typedef long long int NI64;
typedef NI64 NI;
struct TGenericSeq {NI len; NI reserved; };
struct NimStringDesc {TGenericSeq Sup; NIM_CHAR data[SEQ_DECL_SIZE]; };
and here is the output of what I have tried in the lldb session:
(lldb) frame variable *longstring
(NimStringDesc) *longstring = {
Sup = (len = 9, reserved = 15)
data = {}
}
(lldb) frame variable longstring->data
(NIM_CHAR []) longstring->data = {}
(lldb) type summary add --summary-string "${&var[0]%s}" "NIM_CHAR []"
(lldb) frame variable longstring->data
(NIM_CHAR []) longstring->data = {}
(lldb) type summary add --summary-string "${var%s}" "NIM_CHAR *"
(lldb) frame variable longstring->data
(NIM_CHAR []) longstring->data = {}
(lldb) frame variable &longstring->data[0]
(NIM_CHAR *) &[0] = 0x00007ffff7f3a060 "9 - 3 - 2"
(lldb) frame variable *longstring
(lldb) type summary add --summary-string "${var.data%s}" "NimStringDesc"
(lldb) frame variable *longstring
(NimStringDesc) *longstring = NIM_CHAR [] # 0x7ffff7f3a060
(lldb) type summary add --summary-string "${&var.data[0]%s}" "NimStringDesc"
(lldb) frame variable *longstring
(NimStringDesc) *longstring = {
Sup = (len = 9, reserved = 15)
data = {}
}
(lldb)
I simply can't manage, that the output will just be data interpreted as a '\0' terminated c-string
The summary string syntax you've tried is (by design) not as syntax rich as C.
And since you're using a zero-sized array, I don't think we have any magic provision to treat that as a pointer-to string. You might want to file a bug about it, but in this case, it's arguable whether it would help you. Since your string is length-encoded it doesn't really need to be zero-terminated, and that is the only hint LLDB would understand out of the box to know when to stop reading out of a pointer-to characters.
In your case, you're going to have to resort to Python formatters
The things you need are:
the memory location of the string buffer
the length of the string buffer
a process to read memory out of
This is a very small Python snippet that does it - I'll give you enhancement suggestions as well, but let's start with the basics:
def NimStringSummary(valobj,stuff):
l = valobj.GetChildMemberWithName('Sup').GetChildMemberWithName('len').GetValueAsUnsigned(0)
s = valobj.GetChildMemberWithName('data').AddressOf()
return '"%s"'% valobj.process.ReadMemory(s.GetValueAsUnsigned(0),l,lldb.SBError())
As you can see, first of all it reads the value of the length field;
then it reads the address-of the data buffer; then it uses the process that the value comes from to read the string content, and returns it in quotes
Now, this is a proof of concept. If you used it in production, you'd quickly run into a few issues:
What if your string buffer hasn't been initialized yet, and it says the size of the buffer is 20 gigabytes? You're going to have to limit the size of the data you're willing to read. For string-like types it has builtin knowledge of (char*, std::string, Swift.String, ...) LLDB prints out the truncated buffer followed by ..., e.g.
(const char*) buffer = "myBufferIsVeryLong"...
What if the pointer to the data is invalid? You should check that s.GetValueAsUnsigned(0) isn't actually zero - if it is you might want to print an error message like "null buffer".
Also, here I just passed an SBError that I then ignore - it would be better to pass one and then check it
All in all, you'd end up with something like:
import lldb
import os
def NimStringSummary(valobj,stuff):
l = valobj.GetChildMemberWithName('Sup').GetChildMemberWithName('len').GetValueAsUnsigned(0)
if l == 0: return '""'
if l > 1024: l = 1024
s = valobj.GetChildMemberWithName('data').AddressOf()
addr = s.GetValueAsUnsigned(0)
if addr == 0: return '<null buffer>'
err = lldb.SBError()
buf = valobj.process.ReadMemory(s.GetValueAsUnsigned(0),l,err)
if err.Fail(): return '<error: %s>' % str(err)
return '"%s"' % buf
def __lldb_init_module(debugger, internal_dict):
debugger.HandleCommand("type summary add NimStringDesc -F %s.NimStringSummary" % os.path.splitext(os.path.basename(__file__))[0])
The one extra trick is the __lldb_init_module function - this function is automatically called by LLDB whenever you 'command script import' a python file. This will allow you to add the 'command script import ' to your ~/.lldbinit file and automatically get the formatter to be picked up in all debug sessions
Hope this helps!

Can a breakpoint display the contents of "const unsigned char* variable"?

I'm on the trail of why the contents of a TXT record in a Bonjour service discovery is sometimes being incompletely interpreted, and I've reached a point where it would be really useful to have a breakpoint print out the contents of an unsigned char in a callback (I've tried NSLog, but using NSLog in a threaded callback can get really tricky).
The callback function is defined this way:
static void resolveCallback(DNSServiceRef sdRef, DNSServiceFlags flags, uint32_t interfaceIndex, DNSServiceErrorType errorCode,
const char* fullname, const char* hosttarget, uint16_t port, uint16_t txtLen,
const unsigned char* txtRecord, void* context) {
So I'm interested in the txtRecord
Right now my breakpoint is using:
memory read --size 4 --format x --count 4 `txtRecord`
But that's only because that was an example on the lldv.llvm.org example page ;-) It's certainly showing data that I expect to be there, partially.
Do I have to apply informed knowledge of the length or can the breakpoint be coded such that it uses the length that is present? I'm thinking that instead of "hard coding" the two 4s in the example there ought to be a way to wrap in other read instructions inside back ticks like I did with the variable name.
Looking at http://lldb.llvm.org/varFormats.html I thought I'd try a format of C instead of x but that prints out series of dots which must mean I picked a wrong format or something.
I just tried
memory read `txtRecord`
and that's almost exactly what I wanted to see as it gives:
0x1c5dd884: 10 65 6e 30 3d 31 39 32 2e 31 36 38 2e 31 2e 33 .en0=192.168.1.3
0x1c5dd894: 36 0a 70 6f 72 74 3d 35 30 32 37 38 00 00 00 00 6.port=50278....
This looks really close:
memory read `txtRecord` --format C
giving:
0x1d0c6974: .en0=192.168.1.36.port=50278....
If that's the best I can get, I guess I can deal with the length bytes in front of each of the two strings in that txtRecord.
I'm asking this question because I'd like to display the actual and correct values... the bug is that sometimes the IP address comes back wrong, losing the frontmost 1, other times the port comes back "short" (in network byte order) with non-numeric characters at the end, like "502¿" instead of "50278" (in this example run).
My initial response to this question, while informative, was not complete. I originally thought the problem being reported was just about printing a c-string array of type unsigned char * where the default formatters (char *) weren't being used. That answer comes first. Then comes the answer about how to print this (somewhat unique) array of pascal strings data that the program is actually dealing with.
First answer: lldb knows how to handle the char * well; it's the unsigned char * bit that is making it behave a little worse than usual. e.g. if txtRecord were a const char *,
(lldb) p txtRecord
(const char *) $0 = 0x0000000100000f51 ".en0=192.168.1.36.port=50278"
You can copy the type summary lldb has built in for char * for unsigned char *. type summary list lists all of the built in type summaries; copying lldb-179.5's summaries for char *:
(lldb) type summary add -p -C false -s ${var%s} 'unsigned char *'
(lldb) type summary add -p -C false -s ${var%s} 'const unsigned char *'
(lldb) fr va txtRecord
(const unsigned char *) txtRecord = 0x0000000100000f51 ".en0=192.168.1.36.port=50278"
(lldb) p txtRecord
(const unsigned char *) $2 = 0x0000000100000f51 ".en0=192.168.1.36.port=50278"
(lldb)
Of course you can put these in your ~/.lldbinit file and they'll be picked up by Xcode et al from now on.
Second answer: To print the array of pascal strings that this is actually using, you'll need to create a python function. It will take two arguments, the size of the pascal string buffer (txtLen) and the address of the start of the buffer (txtRecord). Create a python file like pstrarray.py (I like to put these in a directory I made, ~/lldb) and load it into your lldb via the ~/.lldbinit file so you have the command available:
command script import ~/lldb/pstrarray.py
The python script is a little long; I'm sure someone more familiar with python could express this more concisely. There's also a bunch of error handling which adds bulk. But the main idea is to take two parameters: the size of the buffer and the pointer to the buffer. The user will express these with variable names like pstrarray txtLen txtRecord, in which case you could look up the variables in the current frame, but they might also want to use an acutal expression like pstrarray sizeof(str) str. So we need to pass these parameters through the expression evaluation engine to get them down to an integer size and a pointer address. Then we read the memory out of the process and print the strings.
import lldb
import shlex
import optparse
def pstrarray(debugger, command, result, dict):
command_args = shlex.split(command)
parser = create_pstrarray_options()
try:
(options, args) = parser.parse_args(command_args)
except:
return
if debugger and debugger.GetSelectedTarget() and debugger.GetSelectedTarget().GetProcess():
process = debugger.GetSelectedTarget().GetProcess()
if len(args) < 2:
print "Usage: pstrarray size-of-buffer pointer-to-array-of-pascal-strings"
return
if process.GetSelectedThread() and process.GetSelectedThread().GetSelectedFrame():
frame = process.GetSelectedThread().GetSelectedFrame()
size_of_buffer_sbval = frame.EvaluateExpression (args[0])
if not size_of_buffer_sbval.IsValid() or size_of_buffer_sbval.GetValueAsUnsigned (lldb.LLDB_INVALID_ADDRESS) == lldb.LLDB_INVALID_ADDRESS:
print 'Could not evaluate "%s" down to an integral value' % args[0]
return
size_of_buffer = size_of_buffer_sbval.GetValueAsUnsigned ()
address_of_buffer_sbval = frame.EvaluateExpression (args[1])
if not address_of_buffer_sbval.IsValid():
print 'could not evaluate "%s" down to a pointer value' % args[1]
return
address_of_buffer = address_of_buffer_sbval.GetValueAsUnsigned (lldb.LLDB_INVALID_ADDRESS)
# If the expression eval didn't give us an integer value, try it again with an & prepended.
if address_of_buffer == lldb.LLDB_INVALID_ADDRESS:
address_of_buffer_sbval = frame.EvaluateExpression ('&%s' % args[1])
if address_of_buffer_sbval.IsValid():
address_of_buffer = address_of_buffer_sbval.GetValueAsUnsigned (lldb.LLDB_INVALID_ADDRESS)
if address_of_buffer == lldb.LLDB_INVALID_ADDRESS:
print 'could not evaluate "%s" down to a pointer value' % args[1]
return
err = lldb.SBError()
pascal_string_buffer = process.ReadMemory (address_of_buffer, size_of_buffer, err)
if (err.Fail()):
print 'Failed to read memory at address 0x%x' % address_of_buffer
return
pascal_string_array = bytearray(pascal_string_buffer, 'ascii')
index = 0
while index < size_of_buffer:
length = ord(pascal_string_buffer[index])
print "%s" % pascal_string_array[index+1:index+1+length]
index = index + length + 1
def create_pstrarray_options():
usage = "usage: %prog"
description='''print an buffer which has an array of pascal strings in it'''
parser = optparse.OptionParser(description=description, prog='pstrarray',usage=usage)
return parser
def __lldb_init_module (debugger, dict):
parser = create_pstrarray_options()
pstrarray.__doc__ = parser.format_help()
debugger.HandleCommand('command script add -f %s.pstrarray pstrarray' % __name__)
and an example program to run this on:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main ()
{
unsigned char str[] = {16,'e','n','0','=','1','9','2','.','1','6','8','.','1','.','3','6',
10,'p','o','r','t','=','5','1','6','8','7'};
uint8_t *p = str;
while (p < str + sizeof (str))
{
int len = *p++;
char buf[len + 1];
strlcpy (buf, (char*) p, len + 1);
puts (buf);
p += len;
}
puts ("done"); // break here
}
and in use:
(lldb) br s -p break
Breakpoint 1: where = a.out`main + 231 at a.c:17, address = 0x0000000100000ed7
(lldb) r
Process 74549 launched: '/private/tmp/a.out' (x86_64)
en0=192.168.1.36
port=51687
Process 74549 stopped
* thread #1: tid = 0x1c03, 0x0000000100000ed7 a.out`main + 231 at a.c:17, stop reason = breakpoint 1.1
#0: 0x0000000100000ed7 a.out`main + 231 at a.c:17
14 puts (buf);
15 p += len;
16 }
-> 17 puts ("done"); // break here
18 }
(lldb) pstrarray sizeof(str) str
en0=192.168.1.36
port=51687
(lldb)
While it's cool that it's possible to do this in lldb, it's not as smooth as we'd like to see. If the size of the buffer and the address of the buffer were contained in a single object, struct PStringArray {uint16_t size; uint8_t *addr;}, that would work much better. You could define a type summary formatter for all variables of type struct PStringArray and no special commands would be required. You'd still need to write a python function, but it could get all the information it needed out of the object directly so it would disappear into the lldb type format system. You could just write (lldb) p strs and the custom formatter function would be called on strs to print all the strings in there.

How can I find out what is changing an object (or a simple variable) in Xcode 4 / lldb?

In some debuggers this is called "setting a trap" on a variable. What I want to do is trigger a breakpoint on any statement that changes the object. Or changes a property of the object.
I have an NSMutableDictionary that gets a value/key added to it but I can't find any statement that could be doing that.
You can set a watchpoint (from here):
Set a watchpoint on a variable when it is written to.
(lldb) watchpoint set variable -w write global_var
(lldb) watch set var -w write global_var
(gdb) watch global_var
Set a watchpoint on a memory location when it is written into. The size of the region to watch for defaults to the pointer size if no '-x byte_size' is specified. This command takes raw input, evaluated as an expression returning an unsigned integer pointing to the start of the region, after the '--' option terminator.
(lldb) watchpoint set expression -w write -- my_ptr
(lldb) watch set exp -w write -- my_ptr
(gdb) watch -location g_char_ptr
Set a condition on a watchpoint.
(lldb) watch set var -w write global
(lldb) watchpoint modify -c '(global==5)'
(lldb) c
...
(lldb) bt
* thread #1: tid = 0x1c03, 0x0000000100000ef5 a.out`modify + 21 at main.cpp:16, stop reason = watchpoint 1
frame #0: 0x0000000100000ef5 a.out`modify + 21 at main.cpp:16
frame #1: 0x0000000100000eac a.out`main + 108 at main.cpp:25
frame #2: 0x00007fff8ac9c7e1 libdyld.dylib`start + 1
(lldb) frame var global
(int32_t) global = 5
List all watchpoints.
(lldb) watchpoint list
(lldb) watch l
(gdb) info break
Delete a watchpoint.
(lldb) watchpoint delete 1
(lldb) watch del 1
(gdb) delete 1
Watchpoints are used to track a write to an address in memory (the default behavior). If you know where an object is in memory (you have a pointer to it), and you know the offset into the object that you care about, that's what watchpoints are for. For instance, in a simple C example, if you have:
struct info
{
int a;
int b;
int c;
};
int main()
{
struct info variable = {5, 10, 20};
variable.a += 5; // put a breakpoint on this line, run to the breakpoint
variable.b += 5;
variable.c += 5;
return variable.a + variable.b + variable.c;
}
Once you're at a breakpoint on variable.a, do:
(lldb) wa se va variable.c
(lldb) continue
And the program will pause when variable.c has been modified. (I didn't bother to type out the full "watch set variable" command).
With a complex object like an NSMutableDictionary, for instance, I don't think watchpoints will do what you need. You would need to know the implementation details of the NSMutableDictionary object layout to know which word (or words) of memory to set a watchpoint.

how could the const char * changed?

I am writing a program which can print the directory recursively,
below is the gdb debug segment
note that the d_path (it is a const char * passed as a parameter to print_dir_tree)
is "/home/cifer/.gftp" before step to "if (dr == NULL) {"
however, it is printed "/home/cifer/!\200" after this clause
who can tell me why?
thanks a lot!
Breakpoint 1, print_dir_tree (d_path=0x805b058 "/home/cifer/.gftp", depth=4)
at dir_demo.c:15
15 DIR *dr = opendir(d_path);
(gdb) print d_path
$2 = 0x805b058 "/home/cifer/.gftp"
(gdb) print d_path
$3 = 0x805b058 "/home/cifer/.gftp"
(gdb) step
16 if (dr == NULL) {
(gdb) print d_path
$4 = 0x805b058 "/home/cifer/!\200"
(gdb) step
20 struct dirent *de = NULL;
(gdb) print d_path
$5 = 0x805b058 "/home/cifer/!\200"
(gdb) step
21 while((de = readdir(dr)) != NULL) {
(gdb) print d_path
$6 = 0x805b058 "/home/cifer/!\200"
(gdb)
I would need to see your code to tell for sure but if you are asking if a const can be changed the answer is yes. If you are asking how it can be changed, that is simple, it is treated just like a regular variable but it will give you a warning when it is changed. If you are trying to change it and avoid a warning you can copy the variable and then make changes to it.

GDB If statement return values?

Is there a way I can read if statements being evaluated? I mean, like get the return value of realpath in -
if(realpath(path.c_str(), realPath) == 0)
You can step into and finish realpath function. The returned value will be printed on screen.
at the gdb prompt (if you got debug symbols enabled when you compiled)
print realpath(path.c_str(),realPath)
it will print the result, very nifty.
If you had a memory pointer laying around on your code, you could use GDB to allocate new memory space for it it, and use it to store the result of the expression you want to evaluate.
Check this page for more info.
(gdb) set variable p = malloc(sizeof(int))
(gdb) print p
$2 = (int *) 0x40013f98 (address allocated by malloc)
(gdb) set variable *p = 255
(gdb) print *p
$3 = 255

Resources