Heterogeneous vs homogeneous data structures

I'm a little confused about heterogeneous and homogeneous lists and arrays. In an OOP context, if I define a base class and a derived class, why is an array of the base class homogeneous if I can store derived classes in it as well? It seems like the same principle as a void pointer in C (e.g. https://gist.github.com/rawcoder/9720851). All the literature says that a homogeneous structure holds elements of the same (semantic) type, so can you please explain this to me a bit further?

The simplest way of putting it is that instances of a derived type are instances of the base type, so they can be stored in a collection of the base type.
It has more to do with the type of the collection than with what's stored in it. If a collection has type array[B], and D < B (type D derives from B), then storing a D in an array[B] doesn't violate the type of the array or the operations on the array or its elements. However, if the array were defined to hold only descendants of B but not B itself (e.g. array[ D < B ]; array<D extends B> in Java-speak), then it would be heterogeneous, because it's declared to hold multiple types rather than a single type. Note that in some programming languages, "heterogeneous" means the collection holds any type (e.g. array[Any?], array<*>), not just multiple different types.
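To make this concrete, here's a minimal sketch in Go (a hedged analogy: Go's interface satisfaction stands in for class inheritance, and Animal, Dog, and Cat are names invented for the example). The slice is homogeneous because its type is []Animal, a single type, even though it stores values of several concrete types:
package main

import "fmt"

// Animal plays the role of the base type B.
type Animal interface {
    Sound() string
}

// Dog and Cat play the role of derived types D < B.
type Dog struct{}
type Cat struct{}

func (Dog) Sound() string { return "woof" }
func (Cat) Sound() string { return "meow" }

func main() {
    // Homogeneous at type []Animal, whatever the elements' concrete types.
    animals := []Animal{Dog{}, Cat{}}
    for _, a := range animals {
        fmt.Println(a.Sound()) // woof, then meow
    }
}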
Outside of such languages, union types muddy the waters, because they allow disparate types to be treated as one. array[A | B | C] could be viewed as heterogeneous, or as the homogeneous array[U], where U = A | B | C.
The point of a type system in programming is to ensure a certain level of correctness by restricting what is allowed, based on type. Homogeneity strengthens the type system, while unions weaken it, as they:
can allow an instance of one type to be treated as another, causing a run-time type violation, and
allow for some types to not be handled, which is potentially another error.
Some type systems avoid these problems by using the more restricted sum type (aka "variant type"), which includes type information along with the instances, instead of using union types. This prevents both of the above issues with union types, restoring homogeneity. However, type systems that have sum types are usually so strong that the concepts of homogeneity and heterogeneity aren't as useful (basically, every collection is homogeneous, even the heterogeneous ones, which are collections of the maximal type from which all other types descend; consider the array[Any?] example above).
Storing subtypes in a homogeneous collection is a consequence of the Liskov Substitution Principle (LSP), which states that a property of a type should also be true of its subtypes (i.e. subtypes should be substitutable for a supertype in any context). If the method (e.g. operator[], add) that adds an element to an array takes type B, it should also take subtypes. If code operates on elements of a collection expecting them to be of type B, it should operate just as correctly on subtypes.
Note this is distinct from the matter of subtype relationships between collection types (e.g. if D < B, is array[D] < array[B]?), which is a matter of (type) variance.
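Continuing the Go sketch above (reusing Animal and Dog), here is how the variance question surfaces there; Go happens to answer it with invariance, so this is one language's answer, not a general rule:
// Even though Dog satisfies Animal, a []Dog is not a []Animal.
func invarianceDemo() []Animal {
    dogs := []Dog{{}, {}}

    // var animals []Animal = dogs
    // ^ does not compile: Go slices are invariant in their element type.

    // Converting requires an explicit element-by-element copy:
    animals := make([]Animal, len(dogs))
    for i, d := range dogs {
        animals[i] = d
    }
    return animals
}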

Constraint with slice and map union: invalid operation: cannot index c (variable of type T constrained by sliceOrMap)

I decided to dive into Go since 1.18 introduced generics. I want to implement an algorithm that accepts only sequential types (arrays, slices, maps, strings), but I can't work out how.
Is there a way to write a constraint that targets indexability?
You can use a constraint with a union; however, the only meaningful one you can have is:
type Indexable interface {
    ~[]byte | ~string
}

func GetAt[T Indexable](v T, i int) byte {
    return v[i]
}
And that's all, for the time being. Why?
The operations allowed on a type parameter with a union constraint are only those allowed for all types in the constraint's type set.
To allow indexing, the types in the union must have identical key types and identical element types.
The type parameter proposal suggested that map[int]T could be used in a union with []T, but this has been disallowed. The spec now mentions this under Index expressions: "If there is a map type in the type set of P, all types in that type set must be map types, and the respective key types must be all identical".
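This restriction is exactly what produces the error in the question's title. A minimal reproduction (sliceOrMap and get are names made up here to mirror the question):
type sliceOrMap interface {
    ~[]byte | ~map[int]byte
}

func get[T sliceOrMap](c T, i int) byte {
    // error: invalid operation: cannot index c (variable of type T constrained by sliceOrMap)
    return c[i]
}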
For arrays, the length is part of the type, so a union would have to specify every length you want to handle, e.g. [1]T | [2]T, etc. That's quite impractical, and prone to out-of-bounds issues (there's a proposal to improve this).
So the only union of diverse types that supports indexing appears to be []byte | string (possibly with the ~ approximation). Since byte is an alias for uint8, you can also instantiate with []uint8.
Beyond that, there's no way to define a constraint that supports indexing on all possible indexable types.
NOTE that []byte | string supports indexing but not range, because this union doesn't have a core type.
Playground: https://gotipplay.golang.org/p/uatvtMo_mrZ
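For completeness, a quick usage sketch of GetAt as defined above (repeated here so the sketch is self-contained; myString is a made-up defined type showing what the ~ approximation buys you):
package main

import "fmt"

type Indexable interface {
    ~[]byte | ~string
}

func GetAt[T Indexable](v T, i int) byte {
    return v[i]
}

type myString string // admitted by ~string, but not by a plain string term

func main() {
    fmt.Println(GetAt("hello", 1))         // 101, the byte value of 'e'
    fmt.Println(GetAt([]byte{7, 8, 9}, 0)) // 7
    fmt.Println(GetAt(myString("go"), 0))  // 103, the byte value of 'g'
}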

Enums in computer memory

A quote from Wikipedia's article on enumerated types would be the best opening for this question:
In other words, an enumerated type has values that are different from each other, and that can be compared and assigned, but which are not specified by the programmer as having any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily.
While I understand the definition and uses of enums, I can't yet grasp the interaction between enums and memory: when an enum type is declared without creating an instance of an enum-type variable, is the type definition stored in memory, as a union or a structure? And what is the meaning of the Wikipedia excerpt above?
The Wikipedia excerpt isn't talking specifically about C's enum types. The C standard has some specific requirements for how enums work.
An enumerated type is compatible with either char or some signed or unsigned integer type. The choice of representation is up to the compiler, which must document its choice (it's implementation-defined), but the type must be able to represent all the values of the enumeration.
The values of the enumeration constants start at 0 by default, and increment by 1 for each successive constant:
enum foo {
    zero, // equal to 0
    one,  // equal to 1
    two   // equal to 2
};
The constants are always of type int, regardless of what the enum type itself is compatible with. (It would have made more sense for the constants to be of the enumerated type; they're of type int for historical reasons.)
You can specify values for some or all of the constants, which means that the values are not necessarily distinct:
enum bar {
    two  = 2,
    deux = 2,
    zwei = 2,
    one  = 1,
    dos  // implicitly equal to 2
};
Defining an enumerated type doesn't result in anything being stored in memory at run time. If you define an object of the enumerated type, that object's value will be stored in memory (unless it's optimized away), and will occupy sizeof (enum whatever) bytes. It's the same as for objects of any other type.
An enumeration constant is treated as a constant expression. The expression two is treated almost identically to a constant 2.
Note that C++ has some different rules for enum types. Your question is tagged C, so I won't go into details.
It means that the enum constants are not required to be located in memory. You cannot take their addresses.
This allows the compiler to replace all references to enum constants with their actual values. For example, the code:
enum { x = 123 };
int y = x;
may compile as if it were:
int y = 123;
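As a cross-language aside, the same "not located in memory" behavior shows up with named constants in Go (a sketch in Go rather than C): they exist only at compile time and are not addressable.
package main

const x = 123 // exists only at compile time, like a C enum constant

var y = x // compiles exactly as if it were: var y = 123

func main() {
    // _ = &x // does not compile: cannot take the address of a constant
    _ = y
}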
When an enum type is declared without creating an instance of an enum-type variable, is the type definition stored in memory as a union or a structure?
In C, types are mostly compile-time constructs; once the program has been compiled to machine code, all the type information disappears*. Accessing a struct member is instead "access the memory n bytes past this pointer".
So if the compiler inlines all the enums as shown above, then enums do not exist at all in compiled code.
* Except optionally in the debugging info section, but that's usually only read by debuggers.

Untyped set operations in Isabelle

I have the following code in Isabelle:
typedecl type1
typedecl type2
consts
  A :: "type1 set"
  B :: "type2 set"
When I want to use a set operation (here, intersection) on A and B, as below:
axiomatization where
c0: "A ∩ B = {}"
Since A and B are sets of different types, I get a clash-of-types error, which makes sense!
I am wondering whether there are counterparts of the set operations that implicitly treat their operands as pure sets (i.e., ignore their types), so that something like A ∩' B becomes possible (where ∩' is the counterpart of ∩ in the above sense).
PS:
Another workaround is type casting, which I asked about as a separate question here, to better organize my questions.
Thanks
Sets in Isabelle/HOL are always typed, i.e., they contain only elements of one type. If you want to have untyped sets, you have to switch to another logic such as Isabelle/ZF.
Similarly, all values in HOL are annotated with their type, and this is fundamental to the logic. Equality, for example, can only be applied to two values of the same type. Hence, there is no notion of equality between values of different types. Consequently, there is no set operation that ignores the type of values, because such an operation would necessarily have to know how to compare values of different types.
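The same constraint appears in any strongly typed setting, not just Isabelle/HOL. As a loose analogy in Go (intersect is a made-up helper; Go's generics play the role of HOL's type discipline), an intersection is only well formed when both operands are sets over one and the same element type:
// intersect computes the intersection of two sets over the same element type T.
func intersect[T comparable](a, b map[T]bool) map[T]bool {
    out := map[T]bool{}
    for k := range a {
        if b[k] {
            out[k] = true
        }
    }
    return out
}

// intersect(map[int]bool{1: true}, map[string]bool{"x": true})
// ^ does not compile: no single T can be both int and string,
// just as A ∩ B is ill-typed for A :: "type1 set" and B :: "type2 set".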

d2: immutability of partially known structures

In D, immutable is transitive, so assignment to any field of an immutable structure is prohibited. As far as I understand, an immutable structure variable is strongly guaranteed to never change, and so are all of its contents.
But what if I have declared thing like this?
struct OpaqueData;
immutable(OpaqueData*) data;
How can D guarantee transitive immutability of a structure that is not implemented in D and possibly contains indirections?
What is the right way to encapsulate this kind of pointer to opaque data in an immutable class?
Since you don't know of any fields in OpaqueData, you can't assign to any contents of it in the first place.
You can, of course, break the type system entirely by casting away immutable (D does give you the power to do so) and assigning to the raw memory an OpaqueData* value points to, but then you're asking for whatever problems you'll end up with... If you don't do this and respect that your OpaqueData pointer is immutable, you cannot alter it in any way due to the transitive nature of type qualifiers.
This is, in fact, the entire point of type qualifiers: they are mathematically sound.

Why is the new Tuple type in .Net 4.0 a reference type (class) and not a value type (struct)

Does anyone know the answer and/or have an opinion about this?
Since tuples would normally not be very large, I would assume it would make more sense to use structs than classes for these. What say you?
Microsoft made all tuple types reference types in the interests of simplicity.
I personally think this was a mistake. Tuples with more than 4 fields are very unusual and should be replaced with a more typeful alternative anyway (such as a record type in F#) so only small tuples are of practical interest. My own benchmarks showed that unboxed tuples up to 512 bytes could still be faster than boxed tuples.
Although memory efficiency is one concern, I believe the dominant issue is the overhead of the .NET garbage collector. Allocation and collection are very expensive on .NET because its garbage collector has not been very heavily optimized (e.g. compared to the JVM). Moreover, the default .NET GC (workstation) has not yet been parallelized. Consequently, parallel programs that use tuples grind to a halt as all cores contend for the shared garbage collector, destroying scalability. This is not only the dominant concern but, AFAIK, was completely neglected by Microsoft when they examined this problem.
Another concern is virtual dispatch. Reference types support subtypes and, therefore, their members are typically invoked via virtual dispatch. In contrast, value types cannot support subtypes so member invocation is entirely unambiguous and can always be performed as a direct function call. Virtual dispatch is hugely expensive on modern hardware because the CPU cannot predict where the program counter will end up. The JVM goes to great lengths to optimize virtual dispatch but .NET does not. However, .NET does provide an escape from virtual dispatch in the form of value types. So representing tuples as value types could, again, have dramatically improved performance here. For example, calling GetHashCode on a 2-tuple a million times takes 0.17s but calling it on an equivalent struct takes only 0.008s, i.e. the value type is 20× faster than the reference type.
A real situation where these performance problems with tuples commonly arise is the use of tuples as keys in dictionaries. I actually stumbled upon this thread by following a link from the Stack Overflow question "F# runs my algorithm slower than Python!", where the author's F# program turned out to be slower than his Python program precisely because he was using boxed tuples. Manually unboxing with a hand-written struct type made his F# program several times faster, and faster than Python. These issues would never have arisen if tuples had been represented by value types rather than reference types to begin with...
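As a loose cross-language illustration of the dictionary-key point (a Go sketch, not .NET; pair is a made-up type): a small value type used as a map key is stored inline, so hashing and lookups involve no per-key heap allocation and no virtual dispatch.
package main

import "fmt"

// pair is a small value type, analogous to an unboxed 2-tuple.
type pair struct{ a, b int64 }

func main() {
    counts := map[pair]int{}
    counts[pair{1, 2}]++ // the key is copied by value; nothing is boxed
    counts[pair{1, 2}]++
    fmt.Println(counts[pair{1, 2}]) // 2
}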
The reason is most likely because only the smaller tuples would make sense as value types since they would have a small memory footprint. The larger tuples (i.e. the ones with more properties) would actually suffer in performance since they would be larger than 16 bytes.
Rather than have some tuples be value types and others be reference types, and force developers to know which are which, I would imagine the folks at Microsoft thought making them all reference types was simpler.
Ah, suspicions confirmed! Please see Building Tuple:
The first major decision was whether to treat tuples either as a reference or value type. Since they are immutable any time you want to change the values of a tuple, you have to create a new one. If they are reference types, this means there can be lots of garbage generated if you are changing elements in a tuple in a tight loop. F# tuples were reference types, but there was a feeling from the team that they could realize a performance improvement if two, and perhaps three, element tuples were value types instead. Some teams that had created internal tuples had used value instead of reference types, because their scenarios were very sensitive to creating lots of managed objects. They found that using a value type gave them better performance. In our first draft of the tuple specification, we kept the two-, three-, and four-element tuples as value types, with the rest being reference types. However, during a design meeting that included representatives from other languages it was decided that this "split" design would be confusing, due to the slightly different semantics between the two types. Consistency in behavior and design was determined to be of higher priority than potential performance increases. Based on this input, we changed the design so that all tuples are reference types, although we asked the F# team to do some performance investigation to see if it experienced a speedup when using a value type for some sizes of tuples. It had a good way to test this, since its compiler, written in F#, was a good example of a large program that used tuples in a variety of scenarios. In the end, the F# team found that it did not get a performance improvement when some tuples were value types instead of reference types. This made us feel better about our decision to use reference types for tuple.
If the .NET System.Tuple<...> types were defined as structs, they would not be scalable. For instance, a ternary tuple of long integers currently scales as follows:
type Tuple3 = System.Tuple<int64, int64, int64>
type Tuple33 = System.Tuple<Tuple3, Tuple3, Tuple3>
sizeof<Tuple3>  // Gets 4 (the size of a reference, on a 32-bit system)
sizeof<Tuple33> // Gets 4
If the ternary tuple were defined as a struct, the result would be as follows (based on a test example I implemented):
sizeof<Tuple3> // Would get 32
sizeof<Tuple33> // Would get 104
As tuples have built-in syntax support in F#, and they are used extremely often in this language, "struct" tuples would put F# programmers at risk of writing inefficient programs without even being aware of it. It would happen so easily:
let t3 = 1L, 2L, 3L
let t33 = t3, t3, t3
In my opinion, "struct" tuples would carry a high probability of creating significant inefficiencies in everyday programming. On the other hand, the currently existing "class" tuples also cause certain inefficiencies, as mentioned by Jon. However, I think that the product of "occurrence probability" times "potential damage" would be much higher with structs than it currently is with classes. Therefore, the current implementation is the lesser evil.
Ideally, there would be both "class" tuples and "struct" tuples, both with syntactic support in F#!
Edit (2017-10-07)
Struct tuples are now fully supported as follows:
Built into mscorlib (.NET >= 4.7) as System.ValueTuple
Available as a NuGet package for other versions
Syntactic support in C# >= 7
Syntactic support in F# >= 4.1
For 2-tuples, you can still always use the KeyValuePair<TKey,TValue> from earlier versions of the Common Type System. It's a value type.
A minor clarification to the Matt Ellis article would be that the difference in use semantics between reference and value types is only "slight" when immutability is in effect (which, of course, would be the case here). Nevertheless, I think it would have been best in the BCL design not to introduce the confusion of having Tuple cross over to a reference type at some threshold.
I don't know, but if you have ever used F#, tuples are part of the language. If I made a .dll that returned a tuple, it would be nice to have a type to put that in. I suspect that now that F# is part of the platform (.NET 4), some modifications were made to the CLR to accommodate some common structures in F#.
From http://en.wikibooks.org/wiki/F_Sharp_Programming/Tuples_and_Records
let scalarMultiply (s : float) (a, b, c) = (a * s, b * s, c * s);;
val scalarMultiply : float -> float * float * float -> float * float * float
scalarMultiply 5.0 (6.0, 10.0, 20.0);;
val it : float * float * float = (30.0, 50.0, 100.0)
