On signature polymorphic methods in Java-7 - java-7

As far as I can tell, with the introduction of MethodHandle in Java 7 came the introduction of compiler-generated method overloads.
The javadoc for MethodHandle states (I've trimmed the examples):
Here are some examples of usage:
Object x, y; String s; int i;
mh = ...
// (Ljava/lang/String;CC)Ljava/lang/String;
// (String, char, char) -> String
s = (String) mh.invokeExact("daddy",'d','n');
// (Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
// (Object, Object, Object) -> Object
x = mh.invokeExact((Object)1, (Object)2, (Object)3);
// (Ljava/util/List;)I
// (List) -> int
i = (int) mh.invokeExact(java.util.Arrays.asList(1,2,3));
// (Ljava/io/PrintStream;Ljava/lang/String;)V
// (PrintStream, String) -> void
mh.invokeExact(System.out, "Hello, world.");
Each of the above calls generates a single invokevirtual instruction
with the name invoke and the type descriptors indicated in the
comments. The argument types are taken directly from the actual
arguments, while the return type is taken from the cast immediately
applied to the call. This cast may be to a primitive. If it is
missing, the type defaults to Object if the call occurs in a context
which uses the return value. If the call occurs as a statement, a cast
is impossible, and there is no return type; the call is void.
In effect, invokeExact and friends behave as if there is an overload for every possible combination of parameters and return type.
I've heard that MethodHandles are preparing for features in Java 8 like lambdas. (I know they are already useful for scripting languages.)
[/introduction]
So, are there more of these compiler-generated overloads hiding around in Java? Are there hints there will be more of them in the future (say, with extension methods)? Why is it necessary in the first place? Merely speed? How does it help lambdas out (I thought lambdas would compile to an anonymous inner class)?
In short, what's the rationale; why are they (generated overloads) useful now and in the future?
UPDATE: What I call compiler-generated overloads here, the Oracle guys call signature polymorphic.

I just came across a HotSpot internals wiki page on MethodHandles and invokedynamic.
It makes a few interesting points that answer these questions (and a few more).
What the question calls compiler-generated overloads, the Java developers call signature polymorphic.
MethodHandle.invokeExact and friends are unique, being the only signature polymorphic methods.
On the HotSpot VM, the invokevirtual bytecode for MethodHandle.invoke* is secretly converted to an invokehandle instruction.
invokehandle is like invokedynamic; a few internals are different, and where each invokedynamic instruction must point to its own Constant Pool Cache Entry (CPCE), invokehandle instructions can share CPCEs.
invokedynamic uses the non-public MethodHandle.invokeBasic on the HotSpot VM
MethodHandle.invokeBasic is like invokeExact but looser; for one, it does not check the types at the call site against those of the callee.
Hot method handles (including invokedynamic) can be JIT-compiled
Additionally, lambda expressions will be implemented via invokedynamic. (Got that from Edwin Dalorzo's answer.) This means lambda expressions
will indirectly use MethodHandle.invokeBasic on the HotSpot VM (see above), and
are eligible to be JIT-compiled

These two links may not answer all your questions, but they might be a good starting point:
Translation of Lambda Expressions
From Lambdas to Bytecode
This is reference material coming from the expert group currently working on JDK 8: Project Lambda. With luck you can find some explanations there, above all about your misconception of lambda expressions as inner classes.


What happens when a method is used on an object created from a built in class?

I understand that classes are like molds from which you can create objects, and that a class defines a number of methods and variables (class, instance, local...) inside of it.
Let's say we have a class like this:
class Person
  def initialize(name, age)
    @name = name
    @age = age
  end
  def greeting
    "#{@name} says hi to you!"
  end
end
me = Person.new "John", 34
puts me.greeting
As I can understand, when we call Person.new we are creating an object of class Person and initializing some internal attributes for that object, which will be stored in the instance variables @name and @age. The variable me will then be a reference to this newly created object.
When we call me.greeting, what happens is that the greeting method is called on the object referenced by me, and that method will use the instance variable @name that is directly tied/attached to that object.
Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.
Let's say now that we have the string "hello". We created it using a string literal, just like: string = "hello".
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
My doubt arises because I can't understand what happens when we call something like string.upcase, how does the #upcase method "work" on string? I guess that in order to return the string in uppercase, the string object previously declared has some instance variables attached to, and the instances methods work on those variables?
Hence, when calling a method on an object you are actually "talking" to that object, inspecting and using its attributes that are stored in its instance variables. All good for now.
No, that is very much not what you are doing in an Object-Oriented Program. (Or really any well-designed program.)
What you are describing is a break of encapsulation, abstraction, and information hiding. You should never inspect and/or use another object's instance variables or any of its other private implementation details.
In Object-Orientation, all computation is performed by sending messages between objects. The only thing you can do is sending messages to objects and the only thing you can observe about an object is the responses to those messages.
Only the object itself can inspect and use its attributes and instance variables. No other object can, not even objects of the same type.
If you send an object a message and you get a response, the only thing you know is what is in that response. You don't know how the object created that response: did the object compute the answer on the fly? Was the answer already stored in an instance variable and the object just responded with that? Did the object delegate the problem to a different object? Did it print out the request, fax it to a temp agency in the Philippines, and have a worker compute the answer by hand with pen and paper? You don't know. You can't know. You mustn't know. That is at the very heart of Object-Orientation.
This is, BTW, exactly how messaging works in real-life. If you send someone a message asking "what is π²" and they answer with "9.8696044011", then you have no idea whether they computed this by hand, used a calculator, used their smart phone, looked it up, asked a friend, or hired someone to answer the question for them.
You can imagine objects as being little computers themselves: they have internal storage, RAM, HDD, SSD, etc. (instance variables), they have code running on them, the OS, the basic system libraries, etc. (methods), but one computer cannot read another computer's RAM (access its instance variables) or run its code (execute its methods). It can only send it a request over the network and look at the response.
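The point about not being able to tell how a response was produced can be sketched in a few lines of Ruby. Both classes below are hypothetical and exist only for illustration: one stores its answer in an instance variable, the other computes it on demand, and the caller sees the same response either way.

```ruby
# Hypothetical classes: two different ways of answering the same message.
class StoredSquare
  def initialize(side)
    @area = side * side   # answer precomputed and kept in an instance variable
  end

  def area
    @area
  end
end

class ComputedSquare
  def initialize(side)
    @side = side
  end

  def area
    @side * @side         # answer computed when the message arrives
  end
end

[StoredSquare.new(3), ComputedSquare.new(3)].each do |shape|
  puts shape.area         # the caller only ever sees the response
end

# Neither class defines a `side` message, so the caller cannot ask for it.
puts ComputedSquare.new(3).respond_to?(:side)
```

From the outside, the only observable difference between the two objects is which messages they respond to, not how they store their data.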
So, in some sense, your question is meaningless: from the point of view of Object-Oriented Abstraction, it should be impossible to answer your question, because it should be impossible to know how an object is implemented internally.
It could use instance variables, or it could not. It could be implemented in Ruby, or it could be implemented in another programming language. It could be implemented as a standard Ruby object, or it could be implemented as some secret internal private part of the Ruby implementation.
In fact, it could even not exist at all! (For example, in many Ruby implementations small integers do not actually exist as objects at all. The Ruby implementation will just make it look like they do.)
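A few observations hint at this special treatment of small integers. This is only a sketch of behavior I would expect on a recent CRuby; other implementations may differ, and none of it proves how integers are represented internally.

```ruby
# Every occurrence of the literal 1 is the very same "object".
puts 1.equal?(1)

# Two separately created Strings with the same content are distinct objects.
puts String.new("x").equal?(String.new("x"))

# Integers are frozen, so they cannot be given instance variables.
puts 1.frozen?
begin
  1.instance_variable_set(:@note, "hi")
rescue StandardError => e
  puts "raised #{e.class}"   # FrozenError on recent CRuby versions
end
```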
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
[…] [W]hat happens when we call something like string.upcase, how does the #upcase method "work" on string? I guess that in order to return the string in uppercase, the string object previously declared has some instance variables attached to, and the instances methods work on those variables?
There is nothing in the Ruby Language Specification that says how the String#upcase method is implemented. The Ruby Language Specification only says what the result is, but it doesn't say anything about how the result is computed.
Note that this is not specific to Ruby. This is how pretty much every programming language works. The Specification says what the results should be, but the details of how to compute those results is left to the implementor. By leaving the decision about the internal implementation details up to the implementor, this frees the implementor to choose the most efficient, most performant implementation that makes sense for their particular implementation.
For example, in the Java platform, there are existing methods available for converting a string to upper case. Therefore, in an implementation like TruffleRuby, JRuby, or XRuby, which sits on top of the Java platform, it makes sense to just call the existing Java methods for converting strings to upper case. Why waste time implementing an algorithm for converting strings to upper case when somebody else has already done that for you? Likewise, in an implementation like IronRuby or Ruby.NET, which sit on top of the .NET platform, you can just use .NET's builtin methods for converting strings to upper case. In an implementation like Opal, you can just use ECMAScript's methods for converting strings to upper case. And so on.
Unfortunately, unlike many other programming languages, the Ruby Language Specification does not exist as a single document in a single place. Ruby does not have a single formal specification that defines what certain language constructs mean.
There are several resources, the sum of which can be considered kind of a specification for the Ruby programming language.
Some of these resources are:
The ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification – Note that the ISO Ruby Specification was written around 2009–2010 with the specific goal that all existing Ruby implementations at the time would easily be compliant. Since YARV only implements Ruby 1.9+ and MRI only implements Ruby 1.8 and lower, this means that the ISO Ruby Specification only contains features that are common to both Ruby 1.8 and Ruby 1.9. Also, the ISO Ruby Specification was specifically intended to be minimal and only contain the features that are absolutely required for writing Ruby programs. Because of that, it does for example only specify Strings very broadly (since they have changed significantly between Ruby 1.8 and Ruby 1.9). It obviously also does not specify features which were added after the ISO Ruby Specification was written, such as Ractors or Pattern Matching.
The Ruby Spec Suite aka ruby/spec – Note that the ruby/spec is unfortunately far from complete. However, I quite like it because it is written in Ruby instead of "ISO-standardese", which is much easier to read for a Rubyist, and it doubles as an executable conformance test suite.
The Ruby Programming Language by David Flanagan and Yukihiro 'matz' Matsumoto – This book was written by David Flanagan together with Ruby's creator matz to serve as a Language Reference for Ruby.
Programming Ruby by Dave Thomas, Andy Hunt, and Chad Fowler – This book was the first English book about Ruby and served as the standard introduction and description of Ruby for a long time. This book also first documented the Ruby core library and standard library, and the authors donated that documentation back to the community.
The Ruby Issue Tracking System, specifically, the Feature sub-tracker – However, please note that unfortunately, the community is really, really bad at distinguishing between Tickets about the Ruby Programming Language and Tickets about the YARV Ruby Implementation: they both get intermingled in the tracker.
The Meeting Logs of the Ruby Developer Meetings.
New features are often discussed on the mailing lists, in particular the ruby-core (English) and ruby-dev (Japanese) mailing lists.
The Ruby documentation – Again, be aware that this documentation is generated from the source code of YARV and does not distinguish between features of Ruby and features of YARV.
In the past, there were a couple of attempts of formalizing changes to the Ruby Specification, such as the Ruby Change Request (RCR) and Ruby Enhancement Proposal (REP) processes, both of which were unsuccessful.
If all else fails, you need to check the source code of the popular Ruby implementations to see what they actually do.
For example, this is what the ISO/IEC 30170:2012 Information technology — Programming languages — Ruby specification has to say about String#upcase:
15.2.10.5.42 String#upcase
upcase
Visibility: public
Behavior: The method returns a new direct instance of the class String which contains all the characters of the receiver, with all the lower-case characters replaced with the corresponding upper-case characters.
As you can see, there is no mention of instance variables or really any details at all about how the method is implemented. It only specifies the result.
If a Ruby implementor wants to use instance variables, they are allowed to use instance variables; if a Ruby implementor doesn't want to use instance variables, they are allowed to do that, too.
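To make "the specification only fixes the result" concrete, here is a minimal sketch with two hypothetical, ASCII-only implementations of upcasing. The module names and the conforms? helper are made up for this illustration; the check only looks at what the ISO text quoted above requires (a new String with lower-case characters replaced), never at how the result was produced.

```ruby
# Hypothetical implementation 1: character translation.
module ViaTr
  def self.upcase(str)
    str.tr("a-z", "A-Z")
  end
end

# Hypothetical implementation 2: byte arithmetic on the ASCII range.
module ViaBytes
  def self.upcase(str)
    bytes = str.bytes.map { |b| (97..122).cover?(b) ? b - 32 : b }
    bytes.pack("C*").force_encoding(str.encoding)
  end
end

# A result-only check in the spirit of the specification text.
def conforms?(impl)
  input = "Hello"
  result = impl.upcase(input)
  result == "HELLO" && result.instance_of?(String) && !result.equal?(input)
end

[ViaTr, ViaBytes].each do |impl|
  puts "#{impl}: #{conforms?(impl)}"
end
```

Both implementations pass the same check even though they share no code, which is exactly the freedom the specification leaves to implementors.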
If you check the Ruby Spec Suite for String#upcase, you will find specifications like these (this is just an example, there are quite a few more):
describe "String#upcase" do
it "returns a copy of self with all lowercase letters upcased" do
"Hello".upcase.should == "HELLO"
"hello".upcase.should == "HELLO"
end
describe "full Unicode case mapping" do
it "works for all of Unicode with no option" do
"äöü".upcase.should == "ÄÖÜ"
end
it "updates string metadata" do
upcased = "aßet".upcase
upcased.should == "ASSET"
upcased.size.should == 5
upcased.bytesize.should == 5
upcased.ascii_only?.should be_true
end
end
end
Again, as you can see, the Spec only describes results but not mechanisms. And this is very much intentional.
The same is true for the Ruby-Doc documentation of String#upcase:
upcase(*options) → string
Returns a string containing the upcased characters in self:
s = 'Hello World!' # => "Hello World!"
s.upcase # => "HELLO WORLD!"
The casing may be affected by the given options; see Case Mapping.
There is no mention of any particular mechanism here, nor in the linked documentation about Unicode Case Mapping.
All of this only tells us how String#upcase is specified and documented, though. But how is it actually implemented? Well, lucky for us, the majority of Ruby implementations are Free and Open Source Software, or at the very least make their source code available for study.
In Rubinius, you can find the implementation of String#upcase in core/string.rb lines 819–822 and it looks like this:
def upcase
str = dup
str.upcase! || str
end
It just delegates the work to String#upcase!, so let's look at that next, it is implemented right next to String#upcase in core/string.rb lines 824–843 and looks something like this (simplified and abridged):
def upcase!
  return if @num_bytes == 0
  ctype = Rubinius::CType
  i = 0
  while i < @num_bytes
    c = @data[i]
    if ctype.islower(c)
      @data[i] = ctype.toupper!(c)
    end
    i += 1
  end
end
So, as you can see, this is indeed just standard Ruby code using instance variables like @num_bytes, which holds the length of the String in platform bytes, and @data, which is an Array of platform bytes holding the actual content of the String. It uses two helper methods from the Rubinius::CType library (a library for manipulating individual characters as byte-sized integers). The "actual" conversion to upper case is done by Rubinius::CType::toupper!, which is implemented in core/ctype.rb and is extremely simple (to the point of being simplistic):
def self.toupper!(num)
num - 32
end
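The subtraction works because, in ASCII, each lower-case letter sits exactly 32 code points above its upper-case counterpart. The small sketch below uses a hypothetical helper named ascii_toupper (a stand-in for the Rubinius method, not its actual code) to show both the arithmetic and why the islower guard in upcase! matters.

```ruby
# Hypothetical stand-in for the subtraction shown above.
def ascii_toupper(num)
  num - 32
end

puts "a".ord                        # code point of lower-case a
puts ascii_toupper("a".ord).chr     # 97 - 32 = 65, which is "A"

# Without a guard, the same arithmetic turns other bytes into nonsense:
brace = "{".ord                     # 123, not a lower-case letter
puts ascii_toupper(brace).chr       # 91, which is "["
```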
Another very simple example is the implementation of String#upcase in Opal, which you can find in opal/corelib/string.rb and looks like this:
def upcase
`self.toUpperCase()`
end
Opal is an implementation of Ruby for the ECMAScript platform. Opal cleverly overloads the Kernel#` method, which is normally used to spawn a sub shell (which doesn't exist in ECMAScript) and execute commands in the platform's native command language (which on the ECMAScript platform arguably is ECMAScript). In Opal, Kernel#` is instead used to inject arbitrary ECMAScript code into Ruby.
So, all that `self.toUpperCase()` does, is call the String.prototype.toUpperCase method on self, which does work because of how the String class is defined in Opal:
class ::String < `String`
In other words, Opal implements Ruby's String class by simply inheriting from ECMAScript's String "class" (really the String Constructor function) and is therefore able to very easily and elegantly reuse all the work that has been done implementing Strings in ECMAScript.
Another very simple example is TruffleRuby. Its implementation of String#upcase can be found in src/main/ruby/truffleruby/core/string.rb and looks like this:
def upcase(*options)
s = Primitive.dup_as_string_instance(self)
s.upcase!(*options)
s
end
Similar to Rubinius, String#upcase just delegates to String#upcase!, which is not surprising since TruffleRuby's core library was originally forked from Rubinius's. This is what String#upcase! looks like:
def upcase!(*options)
mapped_options = Truffle::StringOperations.validate_case_mapping_options(options, false)
Primitive.string_upcase! self, mapped_options
end
The Truffle::StringOperations::validate_case_mapping_options helper method is not terribly interesting; it is just used to implement the rather complex rules for what the Case Mapping Options that you can pass to the various String methods are allowed to look like. The actual "meat" of TruffleRuby's implementation of String#upcase! is just this: Primitive.string_upcase! self, mapped_options.
The syntax Primitive.some_name was agreed upon between the developers of multiple Ruby implementations as "magic" syntax within the core of the implementation itself to be able to call out from Ruby code into "primitives" or "intrinsics" that are provided by the runtime system, but are not necessarily implemented in Ruby.
In other words, all that Primitive.string_upcase! self, mapped_options tells us is "there is a magic function called string_upcase! defined somewhere deep in the bowels of TruffleRuby itself, which knows how to convert a string to upper case, but we are not supposed to know how it works".
If you are really curious, you can find the implementation of Primitive.string_upcase! in src/main/java/org/truffleruby/core/string/StringNodes.java. The code looks dauntingly long and complex, but all you really need to know is that the Truffle Language Implementation Framework is based on constructing Nodes for an AST-walking interpreter. Once you ignore all the machinery related to constructing the AST nodes, the code itself is actually fairly simple.
Once again, the implementors are relying on the fact that the Truffle Language Implementation Framework already comes with a powerful implementation of strings, which the TruffleRuby developers can simply reuse for their own strings.
By the way, this idea of "primitives" or "intrinsics" is an idea that is used in many programming language implementations. It is especially popular in the Smalltalk world. It allows you to write the definition of your methods in the language itself, which in turn allows features like reflection and tools like documentation generators and IDEs (e.g. for automatic code completion) to work without them having to understand a second language, but still have an efficient implementation in a separate language with privileged access to the internals of the implementation.
For example, because large parts of YARV are implemented in C instead of Ruby, but YARV is the implementation that the documentation on Ruby-Doc and Ruby-Lang is generated from, that means that the RDoc Ruby Documentation Generator actually needs to understand both Ruby and C. And you will notice that sometimes documentation for methods implemented in C is missing, incomplete, or corrupted. Similarly, trying to get information about methods implemented in C using Method#parameters sometimes returns non-sensical or useless results. This would not happen if YARV used something like Intrinsics instead of directly writing the methods in C.
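The difference in reflection quality is easy to observe. In this sketch, greet is a hypothetical method defined in Ruby; the output for the built-in method depends on the implementation and version, so no particular result is claimed for it.

```ruby
# A method written in Ruby: reflection reports parameter kinds and names.
def greet(name, greeting: "hi")
  "#{greeting}, #{name}"
end

p method(:greet).parameters        # => [[:req, :name], [:key, :greeting]]

# A built-in method: what comes back depends on how the implementation
# defines it, and may carry no parameter names at all.
p "".method(:upcase).parameters
```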
JRuby implements String#upcase in several overloads of org.jruby.RubyString.upcase and String#upcase! in several overloads of org.jruby.RubyString.upcase_bang.
However, in the end, they all delegate to one specific overload of org.jruby.RubyString.upcase_bang defined in core/src/main/java/org/jruby/RubyString.java like this:
private IRubyObject upcase_bang(ThreadContext context, int flags) {
modifyAndKeepCodeRange();
Encoding enc = checkDummyEncoding();
if (((flags & Config.CASE_ASCII_ONLY) != 0 && (enc.isUTF8() || enc.maxLength() == 1)) ||
(flags & Config.CASE_FOLD_TURKISH_AZERI) == 0 && getCodeRange() == CR_7BIT) {
int s = value.getBegin();
int end = s + value.getRealSize();
byte[]bytes = value.getUnsafeBytes();
while (s < end) {
int c = bytes[s] & 0xff;
if (Encoding.isAscii(c) && 'a' <= c && c <= 'z') {
bytes[s] = (byte)('A' + (c - 'a'));
flags |= Config.CASE_MODIFIED;
}
s++;
}
} else {
flags = caseMap(context.runtime, flags, enc);
if ((flags & Config.CASE_MODIFIED) != 0) clearCodeRange();
}
return ((flags & Config.CASE_MODIFIED) != 0) ? this : context.nil;
}
As you can see, this is a very low-level way of implementing it.
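For readers more comfortable with Ruby than Java, the ASCII fast path (the first branch of the if) can be approximated as follows. This is only a rough sketch: ascii_upcase_bang is a hypothetical helper, and it ignores the encoding checks, code-range bookkeeping, and flags of the real JRuby method.

```ruby
# Hypothetical Ruby approximation of the ASCII-only branch above.
def ascii_upcase_bang(str)
  modified = false
  str.bytesize.times do |i|
    c = str.getbyte(i)
    if c >= 97 && c <= 122            # 'a'..'z'
      str.setbyte(i, 65 + (c - 97))   # same as 'A' + (c - 'a')
      modified = true
    end
  end
  modified ? str : nil                # nil signals "nothing changed"
end

s = "hello".dup
p ascii_upcase_bang(s)                # the same object, now upper case
p ascii_upcase_bang("HELLO".dup)      # nil, because no byte was modified
```

Like the Java code, the sketch mutates the bytes in place and returns nil when nothing was modified, which is the usual contract of Ruby's bang methods.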
In MRuby, the implementation looks again very different. MRuby is designed to be light-weight, small, and easy to embed into a larger application. It is also designed to be used in small embedded systems such as robots, sensors, and IoT devices. Because of that, it is designed to be very modular: a lot of the parts of MRuby are optional and are distributed as "MGems". Even parts of the core language are optional and can be left out, such as support for the catch and throw keywords, big numbers, the Dir class, meta programming, eval, the Math module, IO and File, and so on.
If we want to find out where String#upcase is implemented, we have to follow a trail of breadcrumbs. We start with the mrb_str_upcase function in src/string.c which looks like this:
static mrb_value
mrb_str_upcase(mrb_state *mrb, mrb_value self)
{
mrb_value str;
str = mrb_str_dup(mrb, self);
mrb_str_upcase_bang(mrb, str);
return str;
}
This is a pattern we have already seen a couple of times: String#upcase just duplicates the String and then delegates to String#upcase!, which is implemented just above in mrb_str_upcase_bang:
static mrb_value
mrb_str_upcase_bang(mrb_state *mrb, mrb_value str)
{
struct RString *s = mrb_str_ptr(str);
char *p, *pend;
mrb_bool modify = FALSE;
mrb_str_modify_keep_ascii(mrb, s);
p = RSTRING_PTR(str);
pend = RSTRING_END(str);
while (p < pend) {
if (ISLOWER(*p)) {
*p = TOUPPER(*p);
modify = TRUE;
}
p++;
}
if (modify) return str;
return mrb_nil_value();
}
As you can see, there is a lot of machinery in there to extract the underlying data structure from the Ruby String object, iterate over that data structure making sure not to run over the end, etc., but the real work of converting to uppercase is performed by the TOUPPER macro defined in include/mruby.h:
#define TOUPPER(c) (ISLOWER(c) ? ((c) & 0x5f) : (c))
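The mask 0x5f clears bit 5 (value 0x20, i.e. 32) of the byte, which for ASCII lower-case letters has the same effect as subtracting 32. A short Ruby sketch of the bit arithmetic, where mask_upper is a hypothetical helper mirroring only the masking part of the macro:

```ruby
# Hypothetical helper mirroring the (c) & 0x5f part of the macro.
def mask_upper(byte)
  byte & 0x5f
end

printf("%08b  a\n", "a".ord)              # 01100001
printf("%08b  mask\n", 0x5f)              # 01011111
printf("%08b  A\n", mask_upper("a".ord))  # 01000001

# For every ASCII lower-case letter, masking equals subtracting 32.
all_match = ("a".."z").all? { |ch| mask_upper(ch.ord) == ch.ord - 32 }
puts all_match
```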
There you have it! That's how String#upcase works "under the hood" in five different Ruby implementations: Rubinius, Opal, TruffleRuby, JRuby, and MRuby. And it will again be different in IronRuby, YARV, RubyMotion, Ruby.NET, XRuby, MagLev, MacRuby, tinyrb, MRI, IoRuby, or any of the other Ruby implementations of present, future, and past.
This shows you that there are many different ways of approaching how to implement something like String#upcase in a Ruby implementation. There are almost as many different approaches as there are implementations!
My question is, when creating an object from a built in class (String, Array, Integer...), are we actually storing some information on some instance variables for that object during its creation?
Yes, we are, basically:
string = "hello" is shorthand for string = String.new("hello")
take a look at the following:
https://ruby-doc.org/core-3.1.2/String.html#method-c-new (ruby 3)
https://ruby-doc.org/core-2.3.0/String.html#method-c-new (ruby 2)
What's the difference between String.new and a string literal in Ruby?
You can also check the following (to extend the functionalities of the class):
Extend Ruby String class with method to change the contents
So the short answer is:
Dealing with built in classes (String, Array, Integer, ...etc) is almost the same thing as we do in any other class we create
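One caveat worth checking for yourself: although built-in classes are used like any other class, their state is not necessarily kept in Ruby-level instance variables. The sketch below reuses the Person class from the question; the empty result for the String is what I would expect on CRuby, where the content lives in an internal structure, and other implementations may report something different.

```ruby
class Person
  def initialize(name, age)
    @name = name
    @age = age
  end
end

p Person.new("John", 34).instance_variables   # => [:@name, :@age]
p "hello".instance_variables                  # => [] on CRuby
p String.new("hello") == "hello"              # same content either way
```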

Return Ruby's Fiddle::Pointer from C function

I am currently working on a high-performance Vector/Matrix Ruby gem C extension, as I find the built-in implementation cumbersome and not ideal for most cases that I have personally encountered, as well as lacking in other areas.
My first approach was implementing in Ruby as a subclass of Fiddle::CStructEntity, as a goal is to make them optimized for interop without need for conversion (such as passing to native OpenGL functions). Implementing in C offers a great benefit for the math, but I ran into a roadblock when trying to implement a minor function.
I wished to have a method return a Fiddle::Pointer to the struct (basically a pointer to Rdata->data). I wished to return an actual Fiddle::Pointer object. Returning an integer address, packed string, etc. is trivial, and using that could easily be extended in a Ruby method to convert to a Fiddle::Pointer like this:
def ptr
# Assume address is an integer address returned from C
Fiddle::Pointer.new(self.address, self.size)
end
This raised a question for me: is it even possible to do this from C? Fiddle is not part of the core library; it is part of the standard library, and as such is really just an extension itself.
The problem is trivial, and can be easily overcome with a couple lines of Ruby code as demonstrated above, but was more curious if returning a Fiddle object was even possible from a C extension without hacks? I was unable to find any examples of this being done, and as always when it comes to the documentation involving Fiddle, it is quite basic and does not explain much.
The solution for this is actually rather simple, though admittedly not as elegant or clean a solution as I was hoping to discover.
There are possibly more elaborate ways to go about this by including the headers for Fiddle, and building against it, but this was not really a viable solution, as I didn't want to restrict my C extension to only work with Ruby 2.0+, and would be perfectly acceptable to simply omit the method in the event Ruby version was less than 2.0.
First I include version.h, which defines the macro RUBY_API_VERSION_MAJOR; that is all I really need in order to know whether Fiddle will be present.
This will be an abbreviated version to simply show how to get the Fiddle::Pointer class as a VALUE, and to create an instance.
#if RUBY_API_VERSION_MAJOR >= 2
rb_require("fiddle");
VALUE fiddle = rb_const_get(rb_cObject, rb_intern("Fiddle"));
rb_cFiddlePointer = rb_const_get(fiddle, rb_intern("Pointer"));
#endif
In this example, the class is stored in rb_cFiddlePointer, which can then be used to create and return a Fiddle::Pointer object from C.
// Get basic data about the struct
struct RData *rdata = RDATA(self);
VALUE *args = xmalloc(sizeof(VALUE) * 2);
// Set the platform pointer-size address (could just use size_t here...)
#if SIZEOF_INTPTR_T == 4
args[0] = LONG2NUM((long) rdata->data);
#elif SIZEOF_INTPTR_T == 8
args[0] = LL2NUM((long long) rdata->data);
#else
args[0] = INT2NUM(0);
#endif
// Get size of structure
args[1] = INT2NUM(SIZE_OF_YOUR_STRUCTURE);
VALUE ptr = rb_class_new_instance(2, args, rb_cFiddlePointer);
xfree(args);
return ptr;
After linking the function to an actual Ruby method, you can then call it to get a sized pointer to the internal structure in memory.
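On the Ruby side, a caller could then read the struct through the returned pointer. The sketch below does not use the C extension itself: it assumes the struct is two 32-bit integers, uses memory from Fiddle::Pointer.malloc as a stand-in for the extension's data, and read_int_pair is a hypothetical helper.

```ruby
require "fiddle"

# Hypothetical helper: wrap a raw address in a sized Fiddle::Pointer
# and decode two native 32-bit signed integers from it.
def read_int_pair(address, size)
  ptr = Fiddle::Pointer.new(address, size)
  ptr[0, size].unpack("l2")
end

# Stand-in for memory owned by the extension (freed by Ruby's allocator).
backing = Fiddle::Pointer.malloc(8, Fiddle::RUBY_FREE)
backing[0, 8] = [3, 4].pack("l2")

p read_int_pair(backing.to_i, backing.size)
```

Note that a pointer built from a bare address does not keep the underlying object alive, so the Ruby object that owns the memory has to outlive the pointer.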

Unable to update Kotlin method parameter's value [duplicate]

This question already has answers here:
Kotlin function parameter: Val cannot be reassigned
(4 answers)
Closed 5 years ago.
I have the following Kotlin method:
fun getpower(base:Int,power:Int):Int
{
var result = 1
while(power > 0){
result = result * base
power-- // <---- error in this line
}
return result
}
The Kotlin compiler gives the following error:
Error:(6, 8) Val cannot be reassigned
What's wrong with updating the variable?
What's wrong with updating the variable?
The other answers address the question by effectively saying "because Kotlin function parameters are immutable." Of course, that is a (the) correct answer.
But given the fact so many languages, including Java, allow you to re-assign function parameters, it might be valid to interpret your question as "why does Kotlin disallow re-assigning function parameters?"
My Answer: Kotlin and Swift have many features in common, so I went to Swift 3 to see why they decided to deprecate function parameter re-assignment and found this motivation.
Using var annotations on function parameters have limited utility, optimizing for a line of code at the cost of confusion with inout, which has the semantics most people expect. To emphasize the fact these values are unique copies and don't have the write-back semantics of inout, we should not allow var here.
In summary, the problems that motivate this change are:
• var is often confused with inout in function parameters.
• var is often confused to make value types have reference semantics.
• Function parameters are not refutable patterns like in if-, while-, guard-, for-in-, and case statements.
Of course, Kotlin has no inout decoration. But the writers could have chosen to allow val and var, with val being the default. Then they would have had behavior consistent with many other languages. Instead, they opted for code clarity.
The OP's example code shows a valid case where parameter re-assignment is clear and natural. Having to add one more line to a very short function (to get a local variable to do what the parameter variable could have done) IMHO reduces clarity. Again, IMHO, I would have preferred optionally being able to declare my parameters as var.
A function's parameters inside the function are read-only (like variables created with val), therefore they cannot be reassigned.
You can see discussions about this design decision here and here.
In Kotlin, method parameters are of val (immutable) type, not var (mutable) type, similar to final in Java.
That's why you can't mutate (change) them.
The error you saw has more to do with scoping. A function's parameter is by design immutable or, more accurately, read-only, which is what the val keyword stands for; that is why you see that error.

In what way does this struct-field-aliasing code invoke Undefined Behavior

Given the code:
#include <stdlib.h>
#include <stdint.h>
typedef struct { int32_t x, y; } INTPAIR;
typedef struct { int32_t w; INTPAIR xy; } INTANDPAIR;
void foo(INTPAIR * s1, INTPAIR * s2)
{
    s2->y++;
    s1->x^=1;
    s2->y--;
    s1->x^=1;
}
int hey(int x)
{
    static INTPAIR dummy;
    void *p = calloc(sizeof (INTANDPAIR),1);
    INTANDPAIR *p1 = p;
    INTPAIR *p2a = p;
    INTPAIR *p2b = &p1->xy;
    p2b->x = x;
    foo(p2b,p2a);
    int result= p2b->x;
    free(p);
    return result;
}
#include <stdio.h>
int main(void)
{
    for (int i=0; i<10; i++)
        printf("%d.",hey(i));
}
Behavior depends upon gcc optimization level, which implies that gcc thinks this code invokes Undefined Behavior (the definition of "foo" collapses to nothing, but interestingly the definition of "hey" increments the value passed in). I'm not quite sure what, if anything, it does that runs afoul of the Standard's rules, though.
The code very deliberately and evilly constructs two pointers such that p2a->y and p2b->x will alias, but the pointers are deliberately constructed in such a way that both identify legitimate potential objects of type INTPAIR. Because the code uses calloc to get the memory, all field members have legitimate initial defined values of zero. All accesses to the allocated memory are done via an int32_t member of an INTPAIR*.
I can understand why it would make sense for the Standard to forbid aliasing structure fields in this fashion, but I couldn't find anything in the Standard which actually does so. Is gcc operating in Standard-compliant fashion here, or is it violating some clause in the Standard which isn't referenced by Annex J.2 and doesn't use any of the terms I searched for?
UPDATE:
I felt this answer was OK, but still a little imprecise, and not cut and dried as to what the UB was. After a lot of very interesting discussion and comments I have tried again with a new answer.
The right part of the C99 standard is quoted in this answer. I'm copying it here for convenience. The question and several of the answers are quite thorough.
C99 (ISO/IEC 9899:1999) 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types 73) or 88):
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
73) or 88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
What is an effective type then? C99 (ISO/IEC 9899:1999) 6.5/6:
The effective type of an object for an access to its stored value is the declared type of the object, if any. 87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
So at the line p2b->x = x, the object at p+4 acquires the effective type INTPAIR. Is it aligned correctly? If it isn't, that is Undefined Behavior (UB). But to keep things interesting, assume it is, as it must be in this case because of the layout of INTANDPAIR.
By the same analysis there are two 8-byte objects: p2a (s2) at p and p2b (s1) at p+4. As your example demonstrates, the second member of p2a and the first member of p2b end up aliased.
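The overlap can be checked without touching the allocation at all. The sketch below is only an illustration: the helper names are hypothetical, and it assumes a typical ABI in which there is no padding between the int32_t members. It computes, relative to the start of the calloc'd block, the byte offset of p2a->y and of p2b->x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t x, y; } INTPAIR;
typedef struct { int32_t w; INTPAIR xy; } INTANDPAIR;

/* Offset of p2a->y from the start of the block (p2a == p). */
size_t offset_of_p2a_y(void)
{
    return offsetof(INTPAIR, y);
}

/* Offset of p2b->x from the start of the block (p2b == &p1->xy). */
size_t offset_of_p2b_x(void)
{
    return offsetof(INTANDPAIR, xy) + offsetof(INTPAIR, x);
}

/* Returns 1 when both members occupy the same bytes of the block. */
int members_overlap(void)
{
    return offset_of_p2a_y() == offset_of_p2b_x();
}

/* Aborts (unless NDEBUG is defined) if the layout assumption does not hold. */
void check_overlap(void)
{
    assert(members_overlap());
}
```

On such an ABI both offsets come out equal, which is exactly the aliasing the question sets up.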
In foo(), the object p2b at p+4 is accessed in the normal way via s1->x. But the "stored value" of object p2b is also accessed as a side effect of modifying a different object, p2a at p. Since this falls under none of the bullets of 6.5/7, it is UB. Note that 6.5/7 says "only", so objects shall not be accessed in any other way.
I think the main distinction is that the "object" in question is the whole structure (p2a/s2 and p2b/s1), not the integer members. If you change the function to take pointers to the integers and alias them, it works "fine" because the compiler can no longer assume that s1 and s2 don't alias. For example:
void foo2(int *s1, int *s2)
{
    (*s2)++;
    (*s1)^=1;
    (*s2)--;
    (*s1)^=1;
}
...
/*foo(p2b,p2a);*/
foo2((int*)p, (int*)p); /* or p+4 or whatever you want */
This more or less confirms that this is the way GCC chose to interpret things: modifying a member is modifying the whole struct object and that since side effects of modifying one object are not on the listed legal ways to indirectly modify a different object, whee! we can do whatever silly thing we feel like doing.
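To make the difference concrete, here is a minimal sketch of what foo2 has to compute when the two int pointers really do alias, compared with when they do not. The wrappers run_aliased and run_distinct are hypothetical names added for illustration; only foo2 comes from the answer. Both calls are fully defined, because two int * parameters are always allowed to point at the same int.

```c
#include <assert.h>

/* Same body as foo2 in the answer above. */
void foo2(int *s1, int *s2)
{
    (*s2)++;
    (*s1) ^= 1;
    (*s2)--;
    (*s1) ^= 1;
}

/* Both pointers name the same int, so all four statements hit one object. */
int run_aliased(int v)
{
    foo2(&v, &v);
    return v;
}

/* Distinct ints: the ++/-- pair cancels and the two XORs cancel. */
int run_distinct(int v)
{
    int other = 0;
    foo2(&v, &other);
    assert(other == 0);   /* the second object ends up unchanged */
    return v;
}
```

With distinct objects the function is a no-op, which is what an optimizer produces for foo once it assumes s1->x and s2->y cannot overlap. With aliased pointers the interleaving matters, for example 5 becomes 6, 7, 6, 7.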
So either GCC resolves the ambiguities in the standard by deciding that deriving s1 and s2 through pointers of different types (p1 and p) and then accessing them constitutes indirectly accessing the memory via different original types, or it reads the standard the way I'm suggesting, so that the "object" that s2->y modifies is not just the integer but the whole s2 object. It is UB either way. Or is GCC just being especially snarky and pointing out that if the standard doesn't very clearly specify the semantics of dynamically allocated yet overlapping objects, it is free to do whatever it wants because by definition it is "undefined"?
I don't think anyone other than the standards body can definitively answer, at this microscopic level, whether this should be UB or not, because at this level it requires some "interpretation". The GCC implementers' opinions seem to favor very aggressive interpretations.
I like Linus's reaction to this whole thing. And it is true, why not just be conservative and let the programmer tell the compiler when it is safe? Very Excellent Linus Rant
My previous answer was lacking; maybe not completely wrong, but the sample program is deliberately designed to sidestep each of the more obvious explicit Undefined Behaviors (UB) dictated by the C99 standard, like 6.5/7. Yet with both GCC and Clang this example shows symptoms that look like a strict aliasing failure under optimization. They appear to be assuming s1->x and s2->y can't alias. So, is the compiler wrong? Is this a loophole in the strict aliasing legalese?
Short answer: No. I wouldn't be surprised if there was a loophole of some kind in the standard, given its complexity. But in this example, creating overlapping objects on the heap is explicitly undefined behavior, and there are several other things happening that the standard does not define.
I think the point of the example is not that it fails - it is obvious that "playing fast and loose" with pointers is a bad idea, and relying on corner cases and legalese to prove the compiler "wrong" is of little help if the code doesn't work. The key questions are: is GCC wrong, and what in the standard says so?
First, let's look at the obvious strict aliasing rules and how this example is trying to avoid them.
C99 6.5/7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 76)
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
This is the main strict aliasing section. It means that accessing the same memory through pointers of two different, incompatible types is UB. This example sidesteps it by accessing both objects using INTPAIR pointers in foo().
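As a point of comparison, the usual way to stay inside 6.5/7 when the same bytes really must be viewed as two different types is to copy them as bytes instead of dereferencing a cast pointer. The sketch below is not from the question; the function names are hypothetical, and it assumes float is a 32-bit IEEE-754 type. Writing *(float *)&bits instead would be the kind of access 6.5/7 forbids.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit integer as a float by copying bytes.
   memcpy accesses the storage as characters, which 6.5/7 permits. */
float float_from_bits(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);   /* assumption: 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The reverse direction, again through a byte copy. */
uint32_t bits_from_float(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Compilers typically turn these small fixed-size copies into a single register move, so the defined form usually costs nothing at run time.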
The key problem with 6.5/7 is that it talks about accessing the stored value via lvalues of two different types (e.g. through pointers of different types). It doesn't talk about accessing it via two different objects.
What is being accessed? Is it the integer member or the entire object s1 / s2? Is accessing s2->y via s1->x an access via "a type compatible with the effective type of the object"? I believe an argument can be made that a) the access as a side effect of modifying a different object does not fall under the permissible methods in 6.5/7 and that b) modifying one member of the aggregate transitively modifies the aggregate (*s1 or *s2) also.
Since this is not specified, it is UB, but it is a bit hand-wavy.
How did we get pointers to two overlapping objects? Are the pointer casts leading to them OK? Section 6.3.2.3 contains the rules for casting pointers and the example carefully does not violate any of them. In particular, because p2b is a pointer to INTANDPAIR member xy the alignment is guaranteed to be right, otherwise it would definitely run afoul of 6.3.2.3/7.
Furthermore, &p1->xy is not a problem - it can't be - it is a perfectly legitimate pointer to an INTPAIR. Simply casting pointers and/or taking addresses is safely outside the definition of "access" (3.1/1).
It is clear that the problem comes about by accessing two integer members that overlay each other as different parts of overlapping objects. Any attempt to do this via pointers of different types would clearly run afoul of 6.5/7. If accessed by the same type pointer at the same address, there would be no problem whatsoever. So the only way left that they could alias this way is that if two objects at different addresses overlapped in some fashion.
Obviously this could occur as part of a union, but that is not the case for this example. Type punning through unions may not be UB in C99, but it would be a different question whether a variant of this example could be made to misbehave via unions.
The example uses dynamic allocation and casts the resultant void pointer to two different types. Going from a pointer to an object to void * and back again is valid (6.3.2.3/1). Several other ways of obtaining pointers to objects that would overlap are explicitly UB by the pointer conversion rules of 6.3.2.3, the aliasing rules of 6.5/7, and/or the compatible type rules of 6.2.7.
So what else is wrong?
6.2.4 Storage durations of objects
1 An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated. Allocated storage is described in 7.20.3
The storage for each of the objects is allocated by calloc() so the duration we want is "allocated". So we check 7.20.3: (emphasis added)
7.20.3 Memory management functions
1 The order and contiguity of storage allocated by successive calls to the calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.
...
2 The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, 25) and retains its last-stored value throughout its lifetime. 26) If an object is referred to outside of its lifetime, the behavior is undefined.
To avoid UB, the accesses to the two different objects must be to a valid object within its lifetime. You can get a single valid object (or an array) with malloc()/calloc(), but these guarantee that you will receive a pointer disjoint from all other objects. So is the object returned from calloc() p or is it p1? It can't be both.
The UB is triggered by attempting to reuse the same dynamically allocated object to hold two objects that are not disjoint. While calloc() guarantees it will return a pointer to a disjoint object, there is nothing that says it will still work if you then start using parts of the buffer for a 2nd overlapping one. In fact, it even explicitly says it is UB if you access an object outside its lifetime and there is only a single allocation ergo a single lifetime.
Also note:
4. Conformance
In this International Standard, ‘‘shall’’ is to be interpreted as a requirement on an implementation or on a program; conversely, ‘‘shall not’’ is to be interpreted as a prohibition.
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
For this to be a compiler error, the compiler must fail on a program that uses only explicitly defined constructs. Anything else is outside the safe harbor and is still undefined, even if the standard doesn't explicitly state that it is Undefined Behavior.

Initialize member variables in a method and not the constructor

I have a public method that takes a parameter, which we will call A, used only within the scope of that public method. This method calls a private method multiple times, and the private method also requires the parameter.
At present I am passing the parameter every time, but it looks weird. Is it bad practice to make this a member variable of the class, or would the uncertainty about whether it is initialized outweigh the advantages of not having to pass it?
Simplified pseudo code:
public_method(parameter a)
    do something with a
    private_method(string_a, a)
    private_method(string_b, a)
    private_method(string_c, a)
private_method(String, parameter a)
    do something with String and a
Additional information: parameter a is a read only map with over 100 entries and in reality I will be calling private_method about 50 times
I had this same problem myself.
I implemented it differently in 3 different contexts to see hands-on what the results are using 3 different strategies, see below.
Note that I am the type of programmer who makes many changes to the code, always trying to improve it. Thus I settle only for code that is amenable to changes and readable; you might call this "flexible" code. I settle only for very clear code.
After experimentation, I came to these results:
1. Passing a as a parameter is perfectly OK if you have one or two (a small number) of such values. Passing in parameters has very good visibility, clarity, clear passing lines and a clearly visible lifetime (initialization points, destruction points); it is amenable to changes and easy to track.
If the number of such values begins to grow to >= 5-6 values, I switch to approach #3 below.
2. Passing values through class members did not do any good for the clarity of my code; eventually I got rid of it. It makes for less clear code. Code becomes muddled. I did not like it. It had no advantages.
3. As an alternative to (1) and (2), I adopted the inner class approach, in cases when the number of such values is > 5 (which makes for too long an argument list).
I pack those values into a small inner class and pass such an object by reference as an argument to all internal members.
A public function of the class usually creates an object of the inner class (I call it Impl or Ctx or Args) and passes it down to the private functions.
This combines clarity of arg passing with brevity. It's perfect.
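A rough C rendering of that context-object idea might look like the following. Everything here is an assumption made for illustration: the Entry and Ctx types, the scale and total fields, and the lookup logic stand in for whatever the real map and private method do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the read-only map "a". */
typedef struct { const char *key; int value; } Entry;

/* Context object bundling everything the private helpers need. */
typedef struct {
    const Entry *entries;   /* the read-only map */
    size_t count;           /* number of entries in the map */
    int scale;              /* an extra value that would otherwise be another argument */
    int total;              /* result accumulated by the helpers */
} Ctx;

/* "Private" helper: looks up name and adds its scaled value to the total. */
static void private_method(Ctx *ctx, const char *name)
{
    for (size_t i = 0; i < ctx->count; i++) {
        if (strcmp(ctx->entries[i].key, name) == 0) {
            ctx->total += ctx->entries[i].value * ctx->scale;
            return;
        }
    }
}

/* "Public" entry point: builds the context once and passes it down. */
int public_method(const Entry *entries, size_t count, int scale)
{
    Ctx ctx = { entries, count, scale, 0 };
    assert(entries != NULL || count == 0);
    private_method(&ctx, "string_a");
    private_method(&ctx, "string_b");
    private_method(&ctx, "string_c");
    return ctx.total;
}
```

The context lives only for the duration of the public call, so there is no question of whether it has been initialized, which was the worry with a class member.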
Good luck
Edit
Consider preparing an array of strings and using a loop rather than writing 50 almost-identical calls. Something like char *strings[] = {...} (C/C++).
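A minimal sketch of that loop, assuming the strings are known at compile time. The body of private_method is a placeholder (it just adds the string length to the shared parameter), since the question does not say what the real method does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder for the real private method: combines one string with a. */
static size_t private_method(const char *s, size_t a)
{
    assert(s != NULL);
    return strlen(s) + a;
}

/* One loop over a table replaces the repeated, almost-identical calls. */
size_t public_method(size_t a)
{
    static const char *const strings[] = { "string_a", "string_b", "string_c" };
    size_t total = 0;

    for (size_t i = 0; i < sizeof strings / sizeof strings[0]; i++)
        total += private_method(strings[i], a);
    return total;
}
```

With the calls collapsed into one loop, the parameter a is written once at the call site, so passing it explicitly no longer looks repetitive.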
This really depends on your use case. Does 'a' represent a state that your application/object care about? Then you might want to make it a member of your object. Evaluate the big picture, think about maintenance, extensibility when designing structures.
If your parameter a is of a class of your own, you might consider making private_method a public method of that class.
Otherwise, I do not think this looks weird. If you only need a in just 1 function, making it a private variable of your class would be silly (at least to me). However, if you'd need it like 20 times I would do so :P Or even better, just make 'a' an object of your own that has that certain function you need.
A method should ideally not take more than 7 parameters. Needing more than 6-7 parameters usually indicates a problem with the design (do the 7 parameters represent an object of a nested class?).
As for your question, if you want to make the parameter private only for the sake of passing between private methods without the parameter having anything to do with the current state of the object (or some information about the object), then it is not recommended that you do so.
From a performance point of view (memory consumption), reference parameters can be passed around as method parameters without any significant impact on memory consumption, as they are passed by reference rather than by value (i.e. a copy of the data is not created). For a small number of parameters that can be grouped together you can use a struct. For example, if the parameters represent the x and y coordinates of a point, then pass them in a single Point structure.
Bottom line
Ask yourself this question: does the parameter that you are making a member represent any information (data) about the object? (Data can be state or unique identification information.) If the answer to this question is a clear no, then do not include the parameter as a member of the class.
More information
Limit number of parameters per method?
Parameter passing in C#

Resources