Encoding stronger safety of the WinApi through the Rust FFI - winapi

I'm playing around with the winapi crate, but it doesn't seem to me to add safety to the Windows API - it seems merely to provide the types and signatures and allows us to program in mostly the same unsafe paradigms, but using Rust syntax.
Is it possible to, say, subdivide the native types further in the Rust FFI to encode the implicit lifetime information so that winapi programming is actually safer? When a WinApi call allocates a pointer or a handle that must later be deallocated/released with some matching call, can we attach the correct Drop behavior to that value? Is Rust expressive enough?
Of course we could completely wrap the winapi calls with safer objects that map between caller and the winapi, but that incurs a runtime hit during the copy/mapping and that's no fun.
(Perhaps it's clear, but I'm new to Rust and to the WinApi and even to native programming.)
I realize that string data would usually have to be converted to Rust's UTF-8. But then I wonder if it would be possible to automatically wrap a native string in a memoizing struct where the string doesn't get converted to UTF-8 (transparently) unless it's needed in Rust code (vs. just being passed back to the WinApi in the same format).
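For illustration, a minimal sketch of what I'm imagining - WideString is just a made-up name, not an existing winapi type, and this assumes a NUL-terminated UTF-16 buffer:

use std::cell::OnceCell; // stable since Rust 1.70

struct WideString {
    raw: Vec<u16>,          // NUL-terminated UTF-16, as the WinApi expects
    utf8: OnceCell<String>, // filled in lazily, at most once
}

impl WideString {
    // Hand the original buffer straight back to the WinApi - no conversion.
    fn as_ptr(&self) -> *const u16 {
        self.raw.as_ptr()
    }

    // Convert to UTF-8 only on first use from Rust code, then memoize.
    fn as_str(&self) -> &str {
        self.utf8.get_or_init(|| {
            let len = self.raw.len().saturating_sub(1); // drop the trailing NUL
            String::from_utf16_lossy(&self.raw[..len])
        })
    }
}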
Handles and pointers, though, wouldn't need any conversion; they just need the right lifetimes. But there are many kinds of pointers and many kinds of handles, and those type differences ought to be preserved in Rust. Then, to encode each library-specific free() in a Drop impl, I think there would be many permutations, and we'd need overloads for the other winapi functions which don't care who allocated the value. Right?
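Something like this rough sketch is what I have in mind for the Drop side (OwnedHandle is a made-up name; I'm assuming the winapi 0.3 crate with the appropriate features enabled):

use winapi::um::handleapi::CloseHandle;
use winapi::um::winnt::HANDLE;

// A zero-cost owner: same size as a bare HANDLE, but the compiler now
// guarantees the handle is released exactly once, when the owner dies.
struct OwnedHandle(HANDLE);

impl OwnedHandle {
    // Borrow the raw handle for WinApi calls that don't take ownership.
    fn as_raw(&self) -> HANDLE {
        self.0
    }
}

impl Drop for OwnedHandle {
    fn drop(&mut self) {
        // Runs exactly once, when the owning value goes out of scope.
        unsafe { CloseHandle(self.0); }
    }
}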

Related

ARM-SVE: wrapping runtime sized register

In a generic SIMD library, eve, we were looking into supporting length-agnostic SVE.
However, we cannot wrap a sizeless register into a struct to do some meta-programming around it.
struct foo {
  svint8_t a;
};
Is there a way to do it, in either Clang or GCC?
I found some talk of __sizeless_struct and some patches flying around, but I think it didn't go anywhere.
I also found these GCC tests - no wrapping of a register in a struct.
No, unfortunately this isn't possible (at the time of writing). __sizeless_struct was an experimental feature that Arm added as part of the initial downstream implementation of the SVE ACLE in Clang. The main purpose was to allow tuple types like svfloat32x3_t to be defined directly in <arm_sve.h>. But the feature had complex semantics that ran against the direction of the language. It broke one of the fundamental rules of C++, which is that all class objects have a constant size, so it would have been an ongoing maintenance burden for upstream compilers.
__sizeless_struct (or something like it) probably wouldn't be acceptable for a portable SIMD framework, since the sizeless struct would inherit all of the restrictions of sizeless vector types: no global variables, no uses in normal structs, etc. Either all SIMD targets would have to live by those restrictions, or the restrictions would vary by target (limiting portability).
Function-based abstraction might be a better starting point than class-based abstraction for SIMD frameworks that want to support variable-length vectors. Google Highway is an example of this and it works well for SVE.

Rust manual memory management

When I began learning C, I implemented common data structures such as lists, maps and trees. I used malloc, calloc, realloc and free to manage the memory manually when requested. I did the same thing with C++, using new and delete.
Now comes Rust. It seems like Rust doesn't offer any functions or operators corresponding to those of C or C++, at least in the stable release.
Are the Heap structure and the ptr module (marked with experimental) the ones to look at for this kind of thing?
I know that these data structures are already in the language. It's for the sake of learning.
Although it's really not recommended to do this ever, you can use malloc and free like you are used to from C. It's not very useful, but here's how it looks:
extern crate libc; // 0.2.65

use std::mem;

fn main() {
    unsafe {
        // malloc returns a void pointer, so cast it to the type we want.
        let my_num: *mut i32 = libc::malloc(mem::size_of::<i32>() as libc::size_t) as *mut i32;
        if my_num.is_null() {
            panic!("failed to allocate memory");
        }
        // Nothing is freed automatically; call free by hand, as in C.
        libc::free(my_num as *mut libc::c_void);
    }
}
A better approach is to use Rust's standard library:
use std::alloc::{alloc, dealloc, Layout};

fn main() {
    unsafe {
        // Describe the size and alignment of the allocation.
        let layout = Layout::new::<u16>();
        let ptr = alloc(layout);
        // alloc returns a null pointer if the allocation fails.
        if ptr.is_null() {
            panic!("failed to allocate memory");
        }
        *(ptr as *mut u16) = 42;
        assert_eq!(*(ptr as *mut u16), 42);
        // Deallocation requires the same layout used to allocate.
        dealloc(ptr, layout);
    }
}
It's very unusual to directly access the memory allocator in Rust. You generally want to use the smart pointer constructors (Box::new, Rc::new, Arc::new) for single objects and just use Vec or Box<[T]> if you want a heap-based array.
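For comparison, here's roughly what the idiomatic forms of the allocations above look like:

use std::rc::Rc;

fn main() {
    let single = Box::new(42u16); // one heap-allocated value
    let shared = Rc::new(42u16);  // reference-counted shared value
    let array: Box<[u16]> = vec![1, 2, 3].into_boxed_slice(); // heap array

    assert_eq!(*single, 42);
    assert_eq!(*shared, 42);
    assert_eq!(array.len(), 3);
    // All three are deallocated automatically when they go out of scope.
}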
If you really want to allocate memory and get a raw pointer to it, you can look at the implementation of Rc. (Not Box. Box is magical.) To get its backing memory, it actually creates a Box and then uses its into_raw_non_null function to get the raw pointer out (into_raw_non_null was an unstable API that has since been removed; the stable route is Box::into_raw or Box::leak combined with NonNull). For destroying, it uses the allocator API, but could alternatively use Box::from_raw and then drop that.
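A minimal sketch of that Box-based pattern, using only stable APIs:

use std::ptr::NonNull;

fn main() {
    // Allocate with Box, then take over the raw pointer.
    let raw: NonNull<i32> = NonNull::from(Box::leak(Box::new(42)));

    unsafe {
        assert_eq!(*raw.as_ptr(), 42);
        // Rebuild the Box so its destructor frees the allocation.
        drop(Box::from_raw(raw.as_ptr()));
    }
}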
Are the Heap structure and the ptr module (marked with experimental) the ones to look at for this kind of thing?
No, as a beginner you absolutely shouldn't start there. When you started learning C, malloc was all there was, and it's still a hugely error-prone part of the language - but you can't write any non-trivial program without it. It's very important for C programmers to learn about malloc and how to avoid all the pitfalls (memory leaks, use-after-free, and so on).
In modern C++, people are taught to use smart pointers to manage memory, instead of using delete by hand, but you still need to call new to allocate the memory for your smart pointer to manage. It's a lot better, but there's still some risk there. And still, as a C++ programmer, you need to learn how new and delete work, in order to use the smart pointers correctly.
Rust aims to be much safer than C or C++. Its smart pointers encapsulate all the details of how memory is handled at low-level. You only need to know how to allocate and deallocate raw memory if you're implementing a smart pointer yourself. Because of the way ownership is managed, you actually need to know a lot more details of the language to be able to write correct code. It can't be lesson one or two like it is in C or C++: it's a very advanced topic, and one many Rust programmers never need to learn about.
If you want to learn how to allocate memory on the heap, the Box type is the place to start. In the Rust book, the chapter about smart pointers is the chapter about memory allocation.

Are pointers in C++ performing low-level memory manipulation?

So I was reading something about C++ on Wikipedia and came across this phrase, "low-level memory manipulation" - it said C++ facilitates low-level memory manipulation, and the first thing that came into my head was pointers.
So can someone give me a brief and correct description of what low-level memory manipulation actually means, with examples of C++ features that do it? Don't comment if you are not sure.
https://en.wikipedia.org/wiki/C%2B%2B
I guess that the text you are referring to is saying that raw pointer manipulation is low-level in genuine C++, and that idiomatic C++11 programs should use smart pointers (like std::unique_ptr or std::shared_ptr) and standard containers (both internally use raw pointers, but this is hidden from the programmer).
Low-level memory manipulation would mean explicitly using raw pointers in your code, like YourType* ptr;, raw memory allocation like ptr = new YourType(something);, and later explicit deletion with delete ptr;.
You should read Programming: Principles and Practice Using C++.
C++ evolved from C which is, by far, the dominant language for coding operating systems. The fundamental variable types closely adhere to those of the compiler's target machine, and C is used as a kind of high-level assembly language. A subset of C++ is increasingly used for the same purpose, as it provides close-to-the-metal programming. When C++ programmers look at the assembly code, they find a close correlation to their program source.

GTK data types vs base data types

I'm starting to fiddle around a little bit with GTK+ for some little project.
GLib defines a series of data types, like gint, gpointer and so on, which are just typedefs of base data types (gint is a typedef for int, gpointer for void*, and so on).
Now, say I have a function or a class that in no way makes use of GTK. I would be really tempted to use the base data types so that I can reuse the class/function somewhere else even if I don't include the GTK headers.
On the other hand, I find it quite ugly to have a mix of gint and int in the code, when they are actually the same thing.
In summary, I am wondering whether there is a standard practice of when to use one or the other, or if one should just mix them at will...
I deal with this issue a lot working with third party libraries where they all want their own type alias for integers, floats, longs, shorts, byte aliases instead of chars, etc.
It's very annoying. This is often done to ensure portability but ends up giving each library its own standards.
What I find most displeasing here is from a coupling perspective. I might have a general mesh interface which should be decoupled from any rendering concerns. Yet some of its data may be passed directly to an OpenGL function which wants to assume that the size of the integers we pass will match sizeof(GLint).
In some cases this isn't merely aesthetic. It might not even be plausible to include GL headers in this mesh header, as it may be part of a widely-used software development kit which should not impose such compile-time dependencies on the third-party plugin writers who use it.
Yet portability is an issue. I managed to survive a nightmarish scenario in a very large-scale legacy C codebase where the implicit assumption was made throughout the codebase that sizeof(int) == sizeof(void*). It took years of looking for needles in a haystack to port this codebase to 64-bit.
What I've settled on personally is to start favoring plain old unaliased data types over the years. I've also taken a liking to just using signed integers, e.g. I found it a nuisance in the past to even avoid warnings in basic loops through containers where some would use int, others unsigned int, others size_t, etc. to indicate the number of elements contained. At least personally, I found my maintenance time reduced by just favoring int without a very good reason not to do so.
To try to mitigate a potential worst-case scenario on some platform where sizeof(int) != sizeof(GLint), for example, I tend to liberally sprinkle assertions around code that makes the assumption that these two are equal: assert(sizeof(int) == sizeof(GLint));. This should significantly mitigate the pain associated with the kind of nightmarish scenario I faced before when porting from 32-bit to 64-bit. It also explicitly expresses these assumptions.
I've found this to establish a comfortable balance for my case. Of course this is all subjective and can vary considerably based on your use cases. But this is one possible solution that might allow you to just favor plain old unaliased data types more and more in spite of all these third party libraries and not face a worst-case scenario if your assumptions cease to be correct on some platform.

Why doesn't Haskell have symbols (a la ruby) / atoms (a la erlang)?

The two languages where I have used symbols are Ruby and Erlang and I've always found them to be extremely useful.
Haskell does have algebraic datatypes, but I still think symbols would be mighty convenient. An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".
The syntactic sugar for atoms can be minor - :something or <something> is an atom. All atoms are values of a single type called Atom, which derives Show and Eq. You can then use it for more descriptive error codes, for example
type ErrorCode = Atom
type Message = String
data Error = Error ErrorCode Message
loginError = Error :redirect "Please login first"
In this case :redirect is more efficient than using a string ("redirect") and easier to understand than an integer (404).
The benefit may seem minor, but I say it is worth adding atoms as a language feature (or at least a GHC extension).
So why have symbols not been added to the language? Or am I thinking about this the wrong way?
I agree with camccann's answer that it's probably missing mainly because it would have to be baked quite deeply into the implementation and it is of too little use for this level of complication. In Erlang (and Prolog and Lisp) symbols (or atoms) usually serve as special markers and serve mostly the same notion as a constructor. In Lisp, the dynamic environment includes the compiler, so it's partly also a (useful) compiler concept leaking into the runtime.
The problem is the following: symbol interning is impure (it modifies the symbol table). Because we never modify an existing object it is still referentially transparent, but if implemented naïvely it can lead to space leaks in the runtime. In fact, as currently implemented in Erlang, you can actually crash the VM by interning too many symbols/atoms (the current limit is 2^20, I think), because they can never get garbage collected. It's also difficult to implement in a concurrent setting without a huge lock around the symbol table.
Both problems can be (and have been) solved, however. For example, see Erlang EEP 20. I use this technique in the simple-atom package. It uses unsafePerformIO under the hood, but only in (hopefully) rare cases. It could still use some help from the GC to perform an optimisation similar to indirection shortening. It also uses quite a few IORefs internally which isn't too great for performance and memory usage.
In summary, it can be done but implementing it properly is non-trivial. Compiler writers always weigh the power of a feature against its implementation and maintenance efforts, and it seems like first-class symbols lose out on this one.
I think the simplest answer is that, of the things Lisp-style symbols (which is where both Ruby and Erlang got the idea, I believe) are used for, in Haskell most are either:
Already done in some other fashion--e.g. a data type with a bunch of nullary constructors, which also behave as "convenient names for integers".
Awkward to fit in--things that exist at the level of language syntax instead of being regular data usually have more type information associated with them, but symbols would have to either be distinct types from each other (nearly useless without some sort of lightweight ad-hoc sum type) or all the same type (in which case they're barely different from just using strings).
Also, keep in mind that Haskell itself is actually a very, very small language. Very little is "baked in", and of the things that are most are just syntactic sugar for other primitives. This is a bit less true if you include a bunch of GHC extensions, but GHC with -XAndTheKitchenSinkToo is not the same language as Haskell proper.
Also, Haskell is very amenable to pseudo-syntax and metaprogramming, so there's a lot you can do even without having it built in. Particularly if you get into TH and scary type metaprogramming and whatever else.
So what it mostly comes down to is that most of the practical utility of symbols is already available from other features, and the stuff that isn't available would be more difficult to add than it's worth.
Atoms aren't provided by the language, but can be implemented reasonably as a library:
http://hackage.haskell.org/package/simple-atom
There are a few other libs on hackage, but this one looks the most recent and well-maintained.
Haskell uses type constructors* instead of symbols so that the set of symbols a function can take is closed, and can be reasoned about by the type system. You could add symbols to the language, but it would put you in the same place that using strings would - you'd have to check all possible symbols against the few with known meanings at runtime, add error handling all over the place, etc. It'd be a big workaround for all the compile-time checking.
The main difference between strings and symbols is interning - symbols are atomic and can be compared in constant time. Both are types with an essentially infinite number of distinct values, though, which goes against the grain of Haskell's practice of specifying arguments and results with finite types.
* I'm more familiar with OCaml than Haskell, so "type constructor" may not be the right term. Things like None or Just 3.
An immediate use that springs to mind is that since symbols are isomorphic to integers you can use them where you would use an integral or a string "primary key".
Use Enum instead.
data FileType = GZipped | BZipped | Plain
  deriving Enum

descr ft = ["compressed with gzip",
            "compressed with bzip2",
            "uncompressed"] !! fromEnum ft
