I have always wondered why garbage data appears to not be meaningful. For clarity, what I mean by "garbage" is data that is just whatever happens to be at a particular memory address, that you have access to because of something like forgetting to initialize a variable.
For example, printing out an unused array gave me this:
#°õN)0ÿÿl¯ÿ¯ÿ ``¯ÿ¯ÿ #`¯ÿø+))0 wy¿[d
Obviously, this is useless for my application, but it also seems like it is not anything useful for any application. Why is this? Is there some sort of data protection going on here perhaps?
As you state in your question:
... "garbage" is data that is just whatever happens to be at a particular memory address, that you have access to because of something like forgetting to initialize a variable.
This implies that something else used to be in that memory before you got to use it for your variable. Whatever used to be there may or may not have any relation to how you wish to use the variable. That is, most languages do not force memory used for one type of object to be reused for the exact same type.
This means, if memory was used to store a pointer, and then released, that same memory may be used to store a string. If the pointer value was read out as if it was a string, something that looks like garbage may appear. This is because the bytes used to represent a pointer value are not restricted to the values that correspond to printable ASCII values.
A common way to detect that a buffer overrun has occurred in a program is to examine a pointer value and see whether it contains printable ASCII values. There the roles are reversed: the code using the memory as a pointer sees junk, but that junk happens to be "printable".
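To make this concrete, here is a small C sketch (my own illustration, not code from the answer above) that reinterprets an ordinary pointer's bytes as characters. Most of those bytes fall outside the printable ASCII range, which is exactly why they render as "garbage" when treated as text.

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        int value = 42;
        int *p = &value;                 /* an ordinary, valid pointer           */
        unsigned char bytes[sizeof p];   /* the same storage viewed as raw bytes */

        memcpy(bytes, &p, sizeof p);     /* reinterpret the pointer's bytes      */

        for (size_t i = 0; i < sizeof p; i++) {
            int printable = bytes[i] >= 0x20 && bytes[i] < 0x7f;
            printf("byte %zu: 0x%02x %s\n", i, bytes[i],
                   printable ? "(printable)" : "(not printable)");
        }
        return 0;
    }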
Of course memory is never garbage, unless you make a conscious effort. After all, you are on a deterministic machine, even if it doesn't always seem like it. (Of course, if you interpret arbitrary bytes as text, it's unlikely that you'll see yourself rendered as ASCII art, although you would deserve it.)
That was the cause of one of the worst bugs in recent history, Heartbleed; cf. https://xkcd.com/1354/. Where do you live to have missed it?
I have a use case where I need to dispose of some data the very moment I don't need it anymore, for security reasons.
I am writing a server in Ruby that deals with logins and passwords. I use BCrypt to store passwords in my database. My server receives the password, makes a bcrypt hash out of it, and then doesn't use the original password anymore.
I know of a kind of cyberattack that involves stealing data right from RAM, and I am concerned that an attacker might steal a user's password in raw string form during the period of time that the password is still in memory. I am not sure if simply using password_in_string_form = nil would be enough.
I want to nullify the variable that holds the user's password the moment I am done with it. By nullify I mean something akin to using /dev/zero to fill it with zeroes. The end goal is irreversible destruction of the data.
I am not sure if simply using password_in_string_form = nil would be enough.
No, it would not be enough. The object might or might not be garbage collected immediately, and even if it was, that does not cause the contents to be erased from memory.
However, unless they have been frozen, Ruby strings are mutable. Thus, as long as you do not freeze the password string, you can replace its contents with zeroes, or random characters, or whatever, before you let go of it. In particular, this should work, subject to a few provisos covered later:
(0 ... password_in_string_form.length).each do |i|
  password_in_string_form[i] = ' '
end
But care needs to be exercised, because this approach, which may seem more idiomatic, does not work:
# SURPRISE! This does not reliably remove the password from memory!
password_in_string_form.replace(' ' * password_in_string_form.length)
Rather than updating the target string's contents in-place, replace() releases the contents to Ruby's internal allocator (which does not modify them), and chooses a strategy for the new contents based on details of the replacement.
The difference in effect between those two approaches should be a big warning flag for you, however. Ruby is a pretty high-level language. It gives you a lot of leverage, but at the cost of control over fine details, such as whether and how long data are retained in memory.
And that brings me to the provisos. Here are the main ones:
As you handle the password string, you must take care to avoid making copies of it or of any part of it, or else to capture all the copies and trash them, too. That will take some discipline and attention to detail, because it is very easy to make such copies.
Trashing the password string itself may not be enough to achieve your objective. You also need to trash any other copies of the password in memory, such as copies made upstream, before the password string was isolated. If yours is a web application, for instance, that would include the contents of the HTTP request in which the password was delivered to your application, and probably more strings derived from it than just the isolated password string. Something similar applies to other kinds of applications.
Passwords may not be the only thing you need to protect. If an adversary is in a position to steal passwords from the host machine's memory, then they are also in a position to steal the sensitive data that users access after logging in.
For these and other reasons, if the security requirements for your server dictate that in-memory copies of user passwords be destroyed as soon as they are no longer needed, then (pure) Ruby may not be an appropriate implementation language.
On the other hand, if an adversary obtains sufficient access to scrape passwords from memory or swap, then it's probably game over already. At minimum, they will have access to everything your application can access. That doesn't make protecting the passwords altogether moot, but you should take it into consideration when evaluating how much effort to devote to this issue.
This is not possible in Ruby.
You will have to write some code specific to each implementation (Opal, TruffleRuby, JRuby, Rubinius, MRuby, YARV, etc.) to ensure that. Depending on the implementation, it may not even be possible to do inside the implementation's managed memory at all; you may need a separate piece of memory that you manage yourself.
I.e. you will probably need to have some tiny piece of native code that manages its own tiny piece of native memory and injects it into your Ruby program.
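To illustrate the kind of native helper meant here (a sketch under my own assumptions, not code from any Ruby implementation), the C function below zeroes a buffer through a volatile pointer so the compiler cannot optimize the wipe away. In a real setup you would expose something like this to Ruby as a C extension and call it on the buffer before releasing it; platforms also offer dedicated functions such as explicit_bzero or SecureZeroMemory for the same purpose.

    #include <stddef.h>

    /* Hypothetical helper: overwrite a secret buffer in place.             */
    /* Writing through a volatile pointer keeps the compiler from removing  */
    /* the stores as "dead" writes to memory that is never read again.      */
    void wipe_secret(void *buf, size_t len)
    {
        volatile unsigned char *p = (volatile unsigned char *)buf;
        while (len--)
            *p++ = 0;
    }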
I see some variables named 'dirty' in some source code at work and some other code. What does it mean? What is a dirty flag?
Generally, dirty flags are used to indicate that some data has changed and needs to eventually be written to some external destination. It isn't written immediately because adjacent data may also change, and writing data in bulk is generally more efficient than writing individual values.
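As a rough illustration (C, with invented names), a dirty flag typically pairs some in-memory data with a boolean that is set on modification and cleared when the data is flushed to its destination:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative settings object: 'dirty' records that the in-memory */
    /* copy has diverged from whatever was last written out.            */
    struct settings {
        int  volume;
        bool dirty;
    };

    static void set_volume(struct settings *s, int volume)
    {
        if (s->volume != volume) {
            s->volume = volume;
            s->dirty  = true;       /* remember that a write is pending */
        }
    }

    static void save_if_needed(struct settings *s)
    {
        if (!s->dirty)
            return;                 /* nothing changed: skip the expensive write */
        printf("writing settings: volume=%d\n", s->volume);
        s->dirty = false;           /* in-memory and stored copies now match */
    }

    int main(void)
    {
        struct settings s = { .volume = 5, .dirty = false };
        set_volume(&s, 7);
        save_if_needed(&s);         /* writes once  */
        save_if_needed(&s);         /* no-op: clean */
        return 0;
    }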
There's a deeper issue here. Rather than "What does 'dirty' mean?" in the context of code, I think we should really be asking: is 'dirty' an appropriate term for what is generally intended?
'Dirty' is potentially confusing and misleading. To many new programmers it will suggest corrupt or erroneous form data. The word 'dirty' implies that something is wrong and that the data needs to be purged or removed. Something dirty is, after all, undesirable, unclean and unpleasant.
If we mean 'the form has been touched' or 'the form has been amended but the changes haven't yet been written to the server', then why not 'touched' or 'writePending' rather than 'dirty'?
That I think, is a question the programming community needs to address.
Dirty could mean a number of things, you need to provide more context. But in a very general sense a "dirty flag" is used to indicate whether something has been touched / modified.
For instance, see usage of "dirty bit" in the context of memory management in the wiki for Page Table
"Dirty" is often used in the context of caching, from application-level caching to architectural caching.
In general, there are two kinds of caching mechanisms: (1) write-through and (2) write-back. We use WT and WB for short.
WT means that the write is done synchronously both to the cache and to the backing store. (In the context of databases, for example, 'the cache' and 'the backing store' can stand for main memory and disk, respectively.)
In contrast, for WB, initially, writing is done only to the cache. The write to the backing store is postponed until the cache blocks containing the data are about to be modified/replaced by new content.
Data that has been written to the cache but not yet to the backing store is 'dirty'. When implementing a WB cache, you set dirty bits to indicate whether or not a cache block contains dirty values.
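A minimal C sketch of the write-back idea (invented names, a single cache slot standing in for a whole cache): writes touch only the cache and set the dirty bit, and the backing store is updated only when the slot is evicted.

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_BLOCKS 16

    static int backing_store[NUM_BLOCKS];   /* stands in for disk / main memory */

    /* One cache slot: which block it holds, its data, and whether that data */
    /* has been modified since it was loaded (the dirty bit).                */
    struct cache_slot {
        int  block;
        int  data;
        bool valid;
        bool dirty;
    };

    static void evict(struct cache_slot *slot)
    {
        if (slot->valid && slot->dirty) {
            backing_store[slot->block] = slot->data;   /* write back only now */
            printf("wrote back block %d\n", slot->block);
        }
        slot->valid = false;
        slot->dirty = false;
    }

    static void cache_write(struct cache_slot *slot, int block, int value)
    {
        if (!slot->valid || slot->block != block) {
            evict(slot);                        /* make room for the new block */
            slot->block = block;
            slot->data  = backing_store[block];
            slot->valid = true;
        }
        slot->data  = value;    /* write-back: update the cache only...       */
        slot->dirty = true;     /* ...and remember the backing store is stale */
    }

    int main(void)
    {
        struct cache_slot slot = { 0 };
        cache_write(&slot, 3, 42);   /* cached, not yet in backing_store */
        cache_write(&slot, 7, 99);   /* evicts block 3, writing it back  */
        evict(&slot);                /* flush block 7 as well            */
        return 0;
    }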
What can a driver do defensively to protect against a user-space app that issues an ioctl call with a pointer whose pointee is of a type/size different from what the driver expects/specified as part of its interface.
For example, say IOCTL x expects a (struct foo *) but the caller issues it with (unsigned long) ((struct bar *)&bar). Will copy_from_user blow up or compromise system stability?
Maybe one way is to expect caller to have CAP_SYS_ADMIN and have the implicit trust but is there another/better way?
Thanks.
copy_to/from_user use void pointers, meaning they are ignorant of any data types you pass. And given your example, even if they were aware of the data type, you still could not trust your user: they could simply cast to the type you want:
struct bar *x;
/* even a hypothetical type-aware copy function is defeated by a cast */
copy_to_kernel_aware_of_foo((struct foo*)x);
Expecting the caller to have any kinds of root privileges or capabilities also does not solve your problem - root can also make mistakes or be evil.
Things that can help a bit:
Only use copy_to/from_user to copy around untyped byte buffers. Don't rely on kernel and user space having the same notion of complex data structures.
If you only worry about data types being wrong by mistake, you might consider tagging your data structure so that it contains some magic values in between the 'real' data. This will not help you against the caller deliberately faking data, though.
In terms of an attack surface, the attacker will probably not attack you by passing a wrong data type, but by providing wrong values. Nothing will help you there other than proper validation of all data that is passed to you from user space. Never trust anything without checking!
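As a sketch of the "fixed-size copy plus validation" approach (struct foo, FOO_MAGIC and the function name are invented for the example), copy_from_user copies a caller-specified number of bytes and reports failure instead of crashing, even if the user pointer is bogus; everything after that is up to your own checks:

    #include <linux/types.h>
    #include <linux/uaccess.h>   /* copy_from_user */
    #include <linux/errno.h>

    #define FOO_MAGIC 0x464f4f01u   /* tag so obviously-wrong structs can be rejected */

    struct foo {
        u32 magic;
        u32 length;
        u32 value;
    };

    static long foo_ioctl_set(unsigned long arg)
    {
        struct foo kbuf;

        /* Copy a fixed number of bytes; never trust a size from user space. */
        if (copy_from_user(&kbuf, (const void __user *)arg, sizeof(kbuf)))
            return -EFAULT;    /* bad user pointer: fail cleanly, no crash */

        /* Validate everything before acting on it. */
        if (kbuf.magic != FOO_MAGIC || kbuf.length > sizeof(kbuf))
            return -EINVAL;

        /* ... safe to use kbuf.value here ... */
        return 0;
    }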
I've been reading some books on Windows programming in C++ lately, and I have some confusion about some of the recurring concepts in WinAPI. For example, there are tons of data types that start with the handle prefix 'H'; are these supposed to be used like pointers? But then there are other data types that start with the pointer prefix 'P', so I guess not. Then what is a handle exactly? And why were pointers to some data types given separate type names in the first place? For example, PCHAR could easily have been defined as CHAR*.
Handles used to be pointers in early versions of Windows but are not anymore. Think of them as a "cookie", a unique value that allows Windows to find the resource that was allocated earlier. For example, CreateFile() returns a new handle; you later use it in SetFilePointer() and ReadFile() to read data from that same file, and CloseHandle() to clean up the internal data structure, closing the file as well. That is the general pattern: one API function to create the resource, one or more to use it, and one to destroy it.
Yes, the types that start with P are pointer types. And yes, they are superfluous; it works just as well if you use the * yourself. I'm not actually sure why C programmers like to declare them; I personally think it reduces code readability and I always avoid them. But do note the compound types, like LPCWSTR, a "long pointer to a constant wide string". The L doesn't mean anything anymore; it dates back to the 16-bit versions of Windows. But pointer, const and wide are important. I do use that typedef; not doing so risks future portability problems, which is the core reason these typedefs exist.
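A short sketch of that create / use / destroy pattern with real API calls (the file name is just an example): the HANDLE that CreateFileW returns is an opaque value you hand back to the system rather than something you dereference, and LPCWSTR is the "pointer to a constant wide string" typedef mentioned above.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LPCWSTR path = L"example.txt";    /* pointer to a constant wide string */

        /* Create: ask the system for a resource, get an opaque handle back. */
        HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                                   OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE) {
            printf("CreateFileW failed: %lu\n", GetLastError());
            return 1;
        }

        /* Use: every operation takes the handle; we never peek inside it. */
        char  buffer[64];
        DWORD bytesRead = 0;
        if (ReadFile(hFile, buffer, sizeof(buffer), &bytesRead, NULL))
            printf("read %lu bytes\n", bytesRead);

        /* Destroy: tell the system we are done so it can release its state. */
        CloseHandle(hFile);
        return 0;
    }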
A handle is the same as a pointer only insofar as both identify a particular item. A pointer is the address of the item, so if you know its structure you can start reading fields from the item. A handle may or may not be a pointer; even if it is a pointer, you don't know what it is pointing to, so you can't get at the fields.
The best way to think of a handle is as a unique ID for something in the system. When you pass it to the system, the system knows what to cast it to (if it is a pointer) or how to treat it (if it is just some ID or index).
Short version: The default inspect method for a class displays the object's address.* How can I do this in a custom inspect method of my own?
*(To be clear, I want the 8-digit hex number you would normally get from inspect. I don't care about the actual memory address. I'm just calling it a memory address because it looks like one. I know Ruby is memory-safe.)
Long version: I have two classes, Thing and ThingList. ThingList is a subclass of Array specifically designed to hold Things. Due to the nature of Things and the way they are used in my program, Things have an instance variable #container that points back to the ThingList that holds the Thing.
It is possible for two Things to have exactly the same data. Therefore, when I'm debugging the application, the only way I can reliably differentiate between two Things is to use inspect, which displays their address. When I inspect a Thing, however, I get pages upon pages of output because inspect will recursively inspect #container, causing every Thing in the list to be inspected as well!
All I need is the first part of that output. How can I write a custom inspect method on Thing that will just display this?
#<Thing:0xb7727704>
EDIT: I just realized that the default to_s does exactly this. I didn't notice this earlier because I have a custom to_s that provides human-readable details about the object.
Assume that I cannot use to_s, and that I must write a custom inspect.
You can get the address by taking object_id and multiplying it by 2*, then display it in hex using sprintf (aka %):
"#<Thing:0x%08x>" % (object_id * 2)
Of course, as long as you only need the number to be unique and don't care that it's the actual address, you can just leave out the * 2.
* For reasons that you don't need to understand (meaning: I don't understand them), object_id returns half the object's memory address, so you need to multiply by 2 to get the actual address.
This is impossible. There is no way in Ruby to get the memory address of an object, since Ruby is a memory-safe language which has (by design) no methods for accessing memory directly. In fact, in many implementations of Ruby, objects don't even have a memory address. And in most of the implementations that do map objects directly to memory, the memory address potentially changes after every garbage collection.
The reason why using the memory address as an identifier in current versions of MRI and YARV accidentally works, is because they have a crappy garbage collector implementation that never defragments memory. All other implementations have garbage collectors which do defragment memory, and thus move objects around in memory, thereby changing their address.
If you tie your implementation to the memory address, your code will only ever work on slow implementations with crappy garbage collectors. And it isn't even guaranteed that MRI and YARV will always have crappy garbage collectors, in fact, in both implementations the garbage collector has been identified as one of the major performance bottlenecks and it is safe to assume that there will be changes to the garbage collectors. There are already some major changes to YARV's garbage collector in the SVN, which will be part of YARV 1.9.3 and YARV 2.0.
If you want an ID for objects, use Object#object_id.
Instead of subclassing Array, your class could hold an internal array and delegate the desired methods to it, so that you don't inherit Array's inspect method.