Is there any way to change the default behavior of Visual Studio's debugger such that when hovering over a null-terminated, dynamically allocated character array (C++), it will display the full content of the string, rather than the first character only?
I should mention that I am using Visual Studio 2010. If there is a way to achieve this in VS2012 only though, I would be interested to know that as well!
There's a useful link for Visual Studio, C++ Debugger Tips:
To interpret a pointer expression as a string, you can use ',s' for a simple null-terminated string, ',s8' for a UTF-8 string, or ',su' for a Unicode string. (Note that the expression has to be a pointer type for this to work.)
For example, suppose you break in the following function:
void function(char* s)
{
// break here
}
in the MSVC watch window (or debugger), you might first try adding just s, but it will display only the first character. With the above information, however, you can append the following suffixes to variables in the watch window:
s,s8
or if you know it's unicode, try:
s,su
This even works for arbitrary pointers, or for other data types, e.g. debugging the content of a QString:
QString str("Test");
// break here
For this, possible watch window (or debugger) statements are:
((str).d)->array,su <-- debug QString (Qt4) as unicode char string
(char*)str.d + str.d->offset,su <-- debug QString (Qt5) as unicode char string
0x0c5eae82,su <-- debug any memory location as unicode char string
If appending ,s8 or ,su does not work, try the other variant.
I'm trying to extract the strings from a binary Linux kernel image.
(This specific phenomenon happens in all types of images I've tried: bzImage, vmlinuz, vmlinux, ... and not just a specific one.)
Simply running strings on the image prints many strings with a prefix character, for example:
"4netlink: %d bytes leftover after parsing attributes in process `%s'."
However, looking at the kernel sources, the current string should not include the "4" prefix.
While opening the file using some HEX editor, I've seen that the string actually also includes:
'\x00\x01' and only then '\x34' ("4")
My guess is this is some kind of pointer to a special section, or something of the sort, because many other strings include "3" and other numbers (and even characters).
I would appreciate any information on the matter.
Thanks!
The prefixes OP is seeing are KERN_<LEVEL> prefixes. These are special string literals to be added before the main printk format specifier, using C's concatenation of adjacent string literals. For example:
printk(KERN_ERR "Something has gone wrong!\n");
From kernel version 3.6 onwards, these KERN_<LEVEL> prefix macros are defined in "include/linux/kern_levels.h" and begin with the ASCII SOH character "\001" followed by the log level as an ASCII digit for the numeric levels, or some other ASCII character for special meanings. The string for KERN_DEFAULT changed from "\001" "d" to "" (empty string) in kernel version 5.1. The string for KERN_CONT changed from "" (empty string) to "\001" "c" in kernel version 4.9.
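For reference, the 3.6+ definitions look roughly like this (paraphrased from include/linux/kern_levels.h, comments abbreviated):

#define KERN_SOH     "\001"        /* ASCII Start Of Header */
#define KERN_EMERG   KERN_SOH "0"  /* system is unusable */
#define KERN_ERR     KERN_SOH "3"  /* error conditions */
#define KERN_WARNING KERN_SOH "4"  /* warning conditions */
#define KERN_INFO    KERN_SOH "6"  /* informational */
#define KERN_DEBUG   KERN_SOH "7"  /* debug-level messages */

So the '\x01' followed by '\x34' ("4") that OP sees in the image is just the KERN_WARNING prefix concatenated in front of the format string.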
From kernel version 2.6.37 to 3.5.x, the KERN_<LEVEL> prefix macros were defined in "include/linux/printk.h" and used a different format with the level specified between angle brackets, for example KERN_WARNING was defined as "<4>", KERN_DEFAULT was defined as "<d>", and KERN_CONT was defined as "<c>".
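Paraphrasing that older style for comparison:

#define KERN_WARNING "<4>"  /* warning conditions */
#define KERN_DEFAULT "<d>"  /* the default kernel loglevel */
#define KERN_CONT    "<c>"  /* continued line of printout */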
Besides printk, there are other macros for generating kernel logs, some of which specify the KERN_<LEVEL> part implicitly. OP's example from "lib/nlattr.c":
pr_warn_ratelimited("netlink: %d bytes leftover after parsing attributes in process `%s'.\n",
rem, current->comm);
Here, the pr_warn_ratelimited macro is defined in "include/linux/printk.h" as:
#define pr_warn_ratelimited(fmt, ...) \
printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
There is a lot going on there, but pr_fmt(fmt) expands to one or more string literals including the fmt macro parameter, so the string passed to printk_ratelimited is constructed from concatenated string literals beginning with those from the expansion of KERN_WARNING.
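For reference, the default definition of pr_fmt (which individual source files may override before including printk.h) is simply:

#ifndef pr_fmt
#define pr_fmt(fmt) fmt
#endif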
I'm trying to write a short program (short enough that it has a simple main function). First, I list the dependency in the Cargo.toml file:
[dependencies]
passwords = {version = "3.1.3", features = ["crypto"]}
Then when I use the crate in main.rs:
extern crate passwords;
use passwords::hasher;
fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() < 2
    {
        println!("Error! Needed second argument to demonstrate BCrypt Hash!");
        return;
    }
    let password = args.get(1).expect("Expected second argument to exist!").trim();
    let hash_res = hasher::bcrypt(10, "This_is_salt", password);
    match hash_res
    {
        Err(_) => {println!("Failed to generate a hash!");},
        Ok(hash) => {
            let str_hash = String::from_utf8_lossy(&hash);
            println!("Hash generated from password {} is {}", password, str_hash);
        }
    }
}
The issue arises when I run the following command:
$ target/debug/extern_crate.exe trooper1
And this becomes the output:
?sC�M����k��ed from password trooper1 is ���Ka .+:�
However, this input:
$ target/debug/extern_crate.exe trooper3
produces this:
Hash generated from password trooper3 is ��;��l�ʙ�Y1�>R��G�Ѡd
I'm pretty content with the second output, but is there something within UTF-8 that could cause the "Hash generat" portion of the output statement to be overwritten? And is there code I could use to prevent this?
Note: Code was developed in Visual Studio Code in Windows 10, and was compiled and run using an embedded Git Bash Terminal.
P.S.: I looked at similar questions such as Rust println! problem - weird behavior inside the println macro and Why does my string not match when reading user input from stdin? but those issues seem to be issues with new-line and I don't think that's the problem here.
To complement the previous answer, the answer to your question "is there something within UTF-8 that could cause the 'Hash generat' portion of the output statement to be overwritten?" is this line:
let str_hash = String::from_utf8_lossy(&hash);
The reason is in the name: from_utf8_lossy is lossy. UTF-8 is a pretty prescriptive format. You can use this function to "decode" stuff which isn't actually UTF-8 (for whatever reason), but the way it does this decoding is:
replace any invalid UTF-8 sequences with U+FFFD REPLACEMENT CHARACTER, which looks like this: �
And that is what the odd output you get is: byte sequences which cannot be decoded as UTF-8, replaced by the replacement character.
This happens because hash functions generally return random-looking binary data, meaning bytes across the full range (0 to 255) with no structure. UTF-8 is structured and absolutely does not allow such arbitrary data, so while it's possible for a hash to happen to be valid UTF-8 (not that that would be very useful), the odds are very, very low.
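A tiny standalone illustration (not from the question's code) of how lossy decoding substitutes the replacement character:

fn main() {
    // 0xC3 begins a two-byte UTF-8 sequence, but 0x28 ('(') is not a valid
    // continuation byte, so strict decoding fails and lossy decoding
    // substitutes U+FFFD for the broken sequence.
    let bytes = vec![0xC3, 0x28];
    assert!(String::from_utf8(bytes.clone()).is_err());
    assert_eq!(String::from_utf8_lossy(&bytes), "\u{FFFD}(");
}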
That's why hashes (and binary data in general) are usually displayed in alternative representations, e.g. hex, base32, or base64.
You could convert the hash to hex before printing it to prevent this; see the sketch below.
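A minimal sketch, assuming hash is the byte array returned by hasher::bcrypt in the code above:

// Render each byte of the hash as two lowercase hex digits.
let hex_hash: String = hash.iter().map(|b| format!("{:02x}", b)).collect();
println!("Hash generated from password {} is {}", password, hex_hash);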
Neither of the other answers so far has covered what caused the "Hash generated" part of the output to get overwritten.
Presumably you were running your program in a terminal. Terminals support various "terminal control codes" that tell the terminal things such as which formatting it should use for the text it's showing, and where the text should be output on the screen. These codes are made out of characters, just like strings are, and Unicode and UTF-8 are capable of representing the characters in question – the only difference from "regular" text is that the codes start with a "control character" rather than a more normal sort of character, but control characters have UTF-8 encodings of their own. So if you try to print some randomly generated UTF-8, there's a chance that you'll print something that causes the terminal to do something weird.
There's more than one terminal control code that could produce this particular output, but the most likely possibility is that the hash contained the byte b'\x0D', which UTF-8 decodes as the Unicode character U+000D. This is the terminal control code "CR", which means "print subsequent output at the start of the current line, overwriting anything currently there". (I use this one fairly frequently for printing progress bars, getting the new version of the progress bar to overwrite the old version of the progress bar.) The output that you posted is consistent with accidentally outputting CR, because some random Unicode full of replacement characters ended up overwriting the start of the line you were outputting – and because the code in question is only one byte long (most terminal control codes are much longer), the odds that it might appear in randomly generated UTF-8 are fairly high.
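A quick one-line demonstration of CR's effect (an illustration, not the question's code; assumes a typical terminal):

// Prints "Goodbyeworld!": the \r moves the cursor back to the start of the
// line, so "Goodbye" overwrites "Hello, " and the rest of the line remains.
println!("Hello, world!\rGoodbye");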
The easiest way to prevent this sort of thing happening when outputting arbitrary UTF-8 in Rust is to use the Debug implementation for str/String rather than the Display implementation – it will output control codes in escaped form rather than outputting them literally. (As the other answers say, though, in the case of hashes, it's usual to print them as hex rather than trying to interpret them as UTF-8, as they're likely to contain many byte sequences that aren't valid UTF-8.)
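For instance, swapping the Display formatter in the question's final println! for the Debug one makes stray control characters visible as escapes:

// {:?} uses the Debug implementation, which escapes control characters,
// so a carriage return prints as \r instead of moving the cursor.
println!("Hash generated from password {} is {:?}", password, str_hash);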
I have been creating a gtk+ application in Eclipse. At a point in the code, an alert dialogue is displayed using code similar to the gtk+ hello world. When I run this program, the dialogue ends up displaying the content of 'words' as expected, but the program crashes when I close the dialogue. I am new to C, so I ran the program with debug expecting to find some simple mistake. However, when I ran with debug, the dialogue displayed 'words' preceded by many null characters and logged the message:
Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
This new problem is confusing, and to add to the confusion, the program also did not crash when the dialogue was closed.
In summary, when I run the code, the text is fine, and the program crashes. When I debug the code, the text is invalid, and the program does not crash.
The text in the dialogue is generated with the following code:
char* answerBuffer = (char*)malloc(strlen(s)+strlen(words)+1);
strcat(answerBuffer,words);
char* answer = (char*)malloc(strlen(answerBuffer)+1);
g_strlcpy(answer,answerBuffer,strlen(answerBuffer)+1);
return answer;
As the code executes, the length of answerBuffer is 320 and words is a char* argument set to "a,b,c,d". I am running this on Windows XP through Eclipse with the MinGW compiler using gtk+ 2.24. Can anyone tell me how to debug/fix this?
P.S.: 's' contains text from a file followed by either one or twelve null characters (one if I run, twelve if I debug).
Given the code you've supplied, this line is the problem:
strcat(answerBuffer,words);
Why? Because you don't know what is in answerBuffer. malloc() doesn't necessarily zero the memory it returns to you, so answerBuffer contains essentially random bytes. You need to zero at least the first byte (so the buffer looks like a zero-length string), or use calloc() to allocate it, which gives you zeroed memory. A sketch of the fix is below.
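A minimal corrected sketch (assuming the intent was to build s followed by words, since the buffer is sized for both but the posted code never copies s into it):

char* answerBuffer = (char*)malloc(strlen(s) + strlen(words) + 1);
answerBuffer[0] = '\0';     /* make it a valid empty string before strcat */
strcat(answerBuffer, s);    /* assuming s was meant to be included */
strcat(answerBuffer, words);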
Well, the odds are that the content of 's' isn't a valid UTF-8 sequence.
Look up what UTF-8 is about in case that confuses you. Or make sure your text file contains only ASCII characters, for simplicity.
If that doesn't help you, then you're probably messing up somewhere with the file read or possible encoding conversions.
I have a Win32 Edit window (i.e. CreateWindow with classname "EDIT").
Every time I add a line to the control I append '\r\n' (i.e new line).
However, when I send WM_GETTEXT to get the text of the EDIT window, it is always missing the last '\n'.
If I add 1 to the result of WM_GETTEXTLENGTH and use that as the buffer size, I get the correct character count, and WM_GETTEXT returns the final '\n'.
MSDN says this about WM_GETTEXTLENGTH:
When the WM_GETTEXTLENGTH message is sent, the DefWindowProc function returns the length, in characters, of the text. Under certain conditions, the DefWindowProc function returns a value that is larger than the actual length of the text. This occurs with certain mixtures of ANSI and Unicode, and is due to the system allowing for the possible existence of double-byte character set (DBCS) characters within the text. The return value, however, will always be at least as large as the actual length of the text; you can thus always use it to guide buffer allocation. This behavior can occur when an application uses both ANSI functions and common dialogs, which use Unicode.
... but that doesn't explain the off-by-one conundrum.
Why does this occur, and is it safe for me to just add an unexplained 1 to the text length?
Edit
After disabling the Unicode compile, I can get it working with an ASCII build. However, I would like to get this working with a UNICODE build; perhaps the EDIT window control does not behave well with UNICODE?
Try to set ES_MULTILINE and ES_WANTRETURN styles for your edit control.
\r and \n map to byte constructs, which work when you compile for ASCII.
Because \r and \n are not guaranteed to represent carriage return and line feed (both could map to line feed, for example), it is best to use the hexadecimal code points when building the string. (You would probably use the TCHAR functions.)
Compile for ASCII - sprintf(dest, "%s\x0D\x0A", str);
Compile for UNICODE - wsprintf(dest, L"%s\x000D\x000A", str);
When you call WM_GETTEXT to retrieve the text you might need to call WideCharToMultiByte to convert it to a certain code page or character set such as ASCII or UTF8 in order to save it to a file.
http://msdn.microsoft.com/en-us/library/aa450989.aspx
The documentation for WM_GETTEXT says the supplied buffer has to be large enough to include the null terminator. The documentation for WM_GETTEXTLENGTH says the return value does not include the null terminator. So you have to include room for an extra character when allocating the buffer that receives the text, as in the sketch below.
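A minimal sketch of that allocation pattern (hEdit being a hypothetical handle to the EDIT window):

int len = GetWindowTextLength(hEdit);                    /* sends WM_GETTEXTLENGTH; excludes the terminator */
TCHAR* buf = (TCHAR*)malloc((len + 1) * sizeof(TCHAR));  /* +1 for the null terminator */
if (buf != NULL)
    GetWindowText(hEdit, buf, len + 1);                  /* sends WM_GETTEXT; buffer size includes the null */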
You have to add one character of room for your string terminator, the \0 character.
Using Visual Studio 2010 Professional, I have a ToString() method that looks like this:
public override string ToString()
{
    return "something" + "\n" + "something";
}
Because there are several "something"'s and each is long, I'd like to see
something
something
Sadly, I'm seeing
"something\nsomething"
Is there a way to get what I want?
Actually there is a way. You can use format specifiers in the immediate window to change the format of the display. If you have a string with carriage returns and line feeds in it ("\r\n"), you can follow the print request with the 'no quotes' format specifier.
In the immediate window type:
?MyObj.ToString(),nq
and the \r\n will cause newlines in the immediate window.
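With the ToString() above, the immediate window would then show something like:

something
something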
For more info on format specifiers see:
http://msdn.microsoft.com/en-us/library/e514eeby.aspx
Unfortunately, no, there is not. What's happening here is an artifact of the design of the debugger APIs.
The component responsible for processing the ToString() call is the expression evaluator. It's the data source for the majority of the debugger windows (watch, locals, immediate, etc.).
For every window but the immediate one, the value is displayed on a single line. Displaying a multiline string on a single line doesn't make much sense, so the expression evaluator makes the string more displayable by escaping the newline characters.
This technique works pretty well for the locals and watch windows. But in the immediate window, where it makes more sense to display the multiline value, it makes a lot less sense. Unfortunately, the expression evaluator doesn't know the context in which its data will be displayed, and hence does the safe thing, which is to escape the newlines.