What is the top type in the Hack language?

In the Hack language type system, is there a "top" type, also known as an "any" type, or a universal "Object" type? That is, a type of which all other types are subtypes?
The manual mentions the "mixed" type, which might be similar but is not really explained. It is also possible to simply omit the type declaration in some places. However, this cannot be done everywhere: e.g. if I want to declare something to be a function from string to the top type, it's not clear how to do this. function (string): mixed?

I'm an engineer working on Hack at Facebook. This is a really insightful and interesting question. Depending on what exactly you're getting at, Hack has a couple of different variations of this.
First, let's talk about mixed. It's the supertype of everything. For example, this typechecks:
<?hh // strict
function f(): mixed {
  return 42;
}
But since it's the supertype of everything, you can't do much with a mixed value until you case-analyze what it actually is, via is_int, instanceof, etc. Here's an example of how you'd have to use the result of f():
<?hh // strict
function g(): int {
  $x = f();
  if (is_int($x)) {
    return $x;
  } else {
    return 0;
  }
}
The "missing annotation" type ("any") is somewhat different from this. Whereas mixed is the supertype of everything, "any" unifies with everything -- it's both the supertype and the subtype of everything. This means that if you leave off an annotation, we'll assume you know what you're doing and just let it pass. For example, the following code typechecks as written:
<?hh
// No "strict" since we are omitting annotations
function f2() {
  return 42;
}
function g2(): string {
  return f2();
}
This clearly isn't sound -- we just broke the type system, and executing the above code will cause a runtime type error -- but it's admitted in partial mode in order to ease conversion. Strict mode requires that you annotate everything, so if all of your code is in strict mode, you can't get a value of type "any" and break the type system in this way. Consider how you'd have to annotate the code above in strict mode: either f2 would have to return int, and that would be a straight-up type error ("string is not compatible with int"), or f2 would have to return mixed, and that would be a type error as written ("string is not compatible with mixed") until you did a case analysis with is_int etc., as I did in my earlier example.
Hope this clears things up -- if you want clarification let me know in the comments and I'll edit. And if you have other questions that aren't strict clarifications of this, continue tagging them "hacklang" and we'll make sure they get responded to!
Finally: if you wouldn't mind, could you press the "file a documentation bug" button on the docs pages that were confusing or unclear, or could in any way be improved? We ideally want docs.hhvm.com to be a one-stop place for stuff like this, but there are definitely holes in the docs that we're hoping smart, enthusiastic folks like yourself will help point out. (I.e., I thought this stuff was explained well in the docs, but since you are confused, that is clearly not the case, and we'd really appreciate a bug report detailing where you got lost.)

Related

Why is GraphQL Variable Mutation Syntax So Redundant?

GraphQL queries/mutations are super clean: they only require what they actually require, nothing else. Or at least the basic ones are.
But if you use variables with either one, then your syntax inevitably has redundancy in it:
query HeroNameAndFriends($episode: Episode) {
  hero(episode: $episode) {
    name
    friends {
      name
    }
  }
}
Note the $episode: Episode and episode: $episode. The thing is EVERY GraphQL mutation requires this same redundancy: if you use variables, every argument has to be defined twice (and if you're making programmatic queries, you undoubtedly are using variables).
My question is, why? It seems so unnecessary to make everyone who uses GraphQL have to repeat their arguments a second time.
Why not just make the syntax:
query HeroNameAndFriends() {
  hero($episode: Episode) {
    name
    friends {
      name
    }
  }
}
or, if you really need to allow for differing variable names, allow an optional third part:
query HeroNameAndFriends() {
  hero(episode: Episode : $episode) {
    name
    friends {
      name
    }
  }
}
To be clear, I understand that variable-using queries are different from non-variable ones, but what I'm asking about is, why pick a syntax for those queries that forces everyone to repeat themselves?
It just seems so ... not DRY! Surely I'm missing an important reason why this repetition is necessary?
Why do I have to list my arguments in a JavaScript function? Isn't it obvious that the function
const getFullName = () => firstname + ' ' + lastname;
takes two arguments, one named firstname and one named lastname? Well, here it is easy to see through, but there are more complicated cases in which it is not so obvious. The example above may seem weird, but there are some functional programming languages where expressions like (_ + 2) and _.concat(_) are valid function expressions. And you can create some pretty obnoxious code that way (looking at you, point-free Haskell). But a lot of languages seem to think that declaring the inputs of a piece of code explicitly is a good idea.
Back to GraphQL: let's look at some more complicated uses of variables, because variables are really a full variable implementation. You can do more with them than just applying them to an argument:
query NestedInput($name: String) {
  user(where: { name: { contains: $name } }) { ... }
}
query WithDirective($long: Boolean) {
  users {
    name
    bio @include(if: $long)
    friends(showAll: $long) {
      name
    }
  }
}
So variables can be used in a lot of places and can also be used multiple times. And GraphQL queries often become really large. Would this work with the second syntax you proposed? Yes, but I think readability would suffer. DRY is not about reducing the number of characters you have to type but about reducing errors. And often, being explicit about things can also reduce errors (e.g. a lot of type systems these days are about being explicit about input parameters and their types).
So I guess it was simply a trade-off decision made by the developers of GraphQL, and they chose the explicit version. Don't forget the context: GraphQL was created at Facebook, one of the biggest web apps in the world.
Since the benefits of explicitness are not obvious, here is an edit that lists a few:
* The declaration allows a developer to quickly understand all variables and their corresponding types in a query.
* GraphQL developer tooling does not have to expensively infer the types of the variables from the schema and can instead simply look up the declared input type.
* Error messages are easier to understand. Imagine the argument showAll does not take booleans but ints. With the declaration we can say "mismatching types for variable $long: $long is Boolean but expected Int". If it is not declared, we would have to say "$long is sometimes used as Boolean, sometimes as Int". And what if we use it as three different types?
* Breaking queries can be detected by GraphQL-only static code analysis. Again, imagine we change the type of showAll to Int. In this case the query can be detected as wrong, while without the explicit type declaration the query might instead break at runtime.
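The error-message argument can be sketched in a few lines of Python. This is a hypothetical mini-checker, not how any real GraphQL library is implemented: it assumes the expected type of each argument has already been looked up in the schema, and it ignores non-null (!) and list types entirely. Because every variable's type is declared exactly once, each mismatch can be reported against that single declaration.

```python
from typing import Dict, List, Tuple


def check_variables(declared: Dict[str, str],
                    usages: List[Tuple[str, str, str]]) -> List[str]:
    """Return one error message per undeclared or mismatched variable use.

    declared: variable name -> declared type, e.g. {"long": "Boolean"}
    usages:   (variable name, argument name, argument's expected type),
              assumed to be already resolved from the schema
    """
    errors = []
    for var, arg, expected in usages:
        actual = declared.get(var)
        if actual is None:
            # No declaration to compare against: report the use itself.
            errors.append(f"Variable ${var} is used by {arg} but never declared")
        elif actual != expected:
            # Compare against the single declared type, not other usage sites.
            errors.append(f"Variable ${var} is {actual} but {arg} expects {expected}")
    return errors
```

For example, check_variables({"long": "Boolean"}, [("long", "showAll", "Int")]) reports that $long is Boolean while showAll expects Int, without having to reconcile several conflicting usage sites.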

C++11 Aggregate initialization of private member, is it correct?

Is it correct to initialize a private member type through aggregate initialization while passing it as an argument to a member function of the owning class? Just look at the code below.
class A {
    struct S {
        int t, u;
    };
public:
    void f(const S& s) {}
};
int main() {
    A a;
    a.f({1, 2}); // correct?
    return 0;
}
I checked the standard and the web, and there seems to be no exact answer. The mechanics look like this:
* the braced initializer is a public thing, so the user doesn't violate access restrictions;
* the implicit conversion from the initializer into "S" is internal to "S", so it is also fine for the compiler.
The question is: is there a reference in the standard, a draft, or at least cppreference that describes this behaviour?
Yes, this is correct. The only thing private about S is its name. Access control only controls access through the name ([class.access]p4). So you could, for example, use a type trait to get the type of S through f's type.
So it is allowed, because there is no restriction in [dcl.init.agg] that prohibits initializing "private" types.
There is also a note, found by @StephaDyatkovskiy.
It doesn't matter whether it's officially valid; you should avoid this corner case.
I would claim that "is it valid C++" is the wrong question here.
When you look at a piece of code and, try as you might, you can't decide whether it should be valid C++ or not; and you know it's going to be some corner case depending on the exact wording of the standard - it's usually a good idea not to rely on that corner case, either way. Why? Because other people will get confused too; they will waste time trying to figure out what you meant; they will go look it up in the standard - or worse, not look it up, and make invalid assumptions; and they will be distracted from what they actually need to focus on.
So, with this code, I would ask myself: "Is type S really private? Does outside code really not need to know about it?"
If the answer is "Yes, it is" - then I would change f, to take the parameters for an S constructor (and forward them to the ctor):
void f(int t, int u) { S s{t, u}; /* etc. etc. */ }
If the answer is "No, code calling f() can know that it's passing an S reference" - then I would make S public.

Using function arguments as local variables

Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack and are pretty much identical performance-wise; I'm wondering about the best-practice aspects of this.
I feel like an idiot when I add an extra, quite redundant line such as int numCopy = num to that function; however, modifying the parameter directly does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short, simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
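For comparison, here is a minimal Python sketch of CountDigits written the way this answer suggests, with a separate local processing variable (named remaining here, an illustrative choice). Like the original, it assumes a non-negative input. Note that in Python, rebinding a parameter inside a function never affects the caller's variable, so the copy here is purely about readability.

```python
def count_digits(num: int) -> int:
    """Count the decimal digits of a non-negative int without reassigning num."""
    remaining = num  # local processing variable; num keeps its original value
    count = 1
    while remaining >= 10:
        count += 1
        remaining //= 10  # integer division drops the last digit
    return count
```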
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
    name = (name != null ? name.trim() : "");
    ...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
    final String normalizedName = (name != null ? name.trim() : "");
    ...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base), then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
    // num is consumed
    int count = 1;
    while (num >= 10) {
        count++;
        num /= 10;
    }
    return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5
int y = foo(x)
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
    // Null means "use the default", which we would document to be UTF-8
    encoding = encoding ?? Encoding.UTF8;
    // Rest of code here
}
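The same "null means use the default" normalization is common in Python, where None is the usual sentinel (default values are evaluated once, when the function is defined, which is one reason None is preferred over mutable defaults). The helper below is hypothetical, not from the answer; it assumes UTF-8 as the documented default and normalizes the parameter once, at the top:

```python
from typing import Optional


def encode_text(text: str, encoding: Optional[str] = None) -> bytes:
    # None means "use the default"; here the default is assumed to be UTF-8.
    if encoding is None:
        encoding = "utf-8"  # normalize the parameter once, then treat it as fixed
    return text.encode(encoding)
```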
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel silly copying the parameter just because of this, it indicates your personality doesn't like it, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO, mutating parameter values is not bad practice in general; it depends on how you're going to use them in your code.
My team's coding standard recommends against this because it can get out of hand. To my mind, for a function like the one you show, it doesn't hurt, because everyone can see what is going on. The problem is that, over time, functions get longer and they get bug fixes in them. As soon as a function is more than one screenful of code, this starts to get confusing, which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self-sufficient as possible. What I mean by that is that you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to pass-by-reference, then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
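To make the pass-by-reference concern concrete: in Python, a list argument refers to the caller's own object, so mutating it in place is visible to the caller, while working on a copy is not. Both functions below are hypothetical illustrations, not code from the question:

```python
from typing import List


def clamp_in_place(scores: List[int]) -> List[int]:
    # Mutates the caller's list: the caller sees every change.
    for i, s in enumerate(scores):
        scores[i] = max(0, s)
    return scores


def clamped_copy(scores: List[int]) -> List[int]:
    # Copies the inbound list first, so the caller's data stays unchanged.
    result = list(scores)
    for i, s in enumerate(result):
        result[i] = max(0, s)
    return result
```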
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practice here varies by language. For example, in Perl you can use local to temporarily override a package variable, or even part of a data structure such as a single hash element, for the duration of a scope; the original value is restored when that scope exits, so the change does not have any effect outside of it:
sub my_function
{
    my ($arg1, $arg2) = @_;   # copy the arguments off the stack into lexical variables
    $arg1++;                  # $arg1 is already a copy (local cannot be applied to a my variable), so this is not visible to the caller
    local $arg2->{key1};      # only the key1 portion of the hashref referenced by $arg2 is localized
    $arg2->{key1}->{key2} = 'foo';  # undone when the function returns, so this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.

Is using flags very often in code advisable?

I came across a lot of flags while reading someone else's code:
if (condition1)
    var1 = true
else
    var1 = false
then later,
if (var1 == true)
    // do something.
There are a lot of flags like this. I'm eager to know: is using flags very often in code advisable?
This:
if (condition1)
    var1 = true;
else
    var1 = false;
is classic badly written code.
Instead you should write:
var1 = condition1;
And yes, flags are very useful for making the code more readable and, possibly, faster.
It's advisable if condition1 is something quite complicated, like if (A && (B || C) && !D), or involves a lot of overhead, like if (somethingTimeConsumingThatWontChange()); then it makes sense to store that result instead of copy-pasting the code.
If condition1 is just a simple comparison then no, I wouldn't use a flag.
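A minimal Python sketch of the "complicated condition" case, with a hypothetical order-review rule standing in for A && (B || C) && !D: the condition is evaluated once, given a descriptive name, and reused, rather than being copy-pasted at each use.

```python
def handle_order(order: dict) -> list:
    # Hypothetical rule: evaluate the compound condition once and name it.
    needs_manual_review = (order["total"] > 1000
                           and (order["new_customer"] or order["foreign"])
                           and not order["prepaid"])
    actions = []
    if needs_manual_review:
        actions.append("hold")
    actions.append("log")
    if needs_manual_review:  # reused without re-evaluating the expression
        actions.append("notify")
    return actions
```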
This is pretty subjective, and depends on the rest of the code. "Flags" as you call them have their place.
First of all, this code should read like this:
var1 = condition1;
if( var1 )
    // No need to compare *true* to *true* when you're looking for *true*
As for the number of flags, there are more elegant ways of branching your code. For instance, in JavaScript you can do something like this:
var methodName = someFunctionThatReturnsAString();
// assuming you name the method according to what's returned
myObject[ methodName ]();
instead of
if( someFunctionThatReturnsAString() === 'myPreferedMethod' ){
    myObject.myPreferedMethod();
}else{
    myObject.theOtherMethod();
}
If you're using a strongly typed language, polymorphism is your friend. I think the technique is referred to as polymorphic dispatch.
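Python is dynamically typed, but method dispatch works the same way, so the flag-free version might look like the sketch below. The shipping classes and prices are hypothetical; the point is that quote() never checks which kind of object it was given.

```python
from abc import ABC, abstractmethod


class Shipping(ABC):
    @abstractmethod
    def cost(self, weight_kg: float) -> float: ...


class Standard(Shipping):
    def cost(self, weight_kg: float) -> float:
        return 5.0 + 1.0 * weight_kg


class Express(Shipping):
    def cost(self, weight_kg: float) -> float:
        return 15.0 + 2.0 * weight_kg


def quote(method: Shipping, weight_kg: float) -> float:
    # No flag and no if/else: the object's own cost() is selected at run time.
    return method.cost(weight_kg)
```

Adding a new shipping method means adding a class, not another branch in every caller.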
This reminds me of the Replace Temp with Query refactoring from the Refactoring book.
I think this refactoring will make the code more readable, but, I agree that it might affect performance when the query method is expensive ... (But, maybe the query method can be put in its own class, and the result can be cached into that class).
This question is a bit generic. The answer depends on what you want to do and in which language you want to do it. Assuming an OO context, there could be better approaches.
If the condition is the result of some object state, then the "flag" should probably be a property of the object itself. If it is a condition of the running application and you have a lot of these things, you should probably consider a state pattern/state machine.
Flags are very useful - but give them sensible names, e.g. using "Is" or similar in their names.
For example, compare:
if(Direction) {/* do something */}
if(PowerSetting) {/* do something else */}
with:
if(DirectionIsUp) {/* do something */}
if(PowerIsOn) {/* do something else */}
If it is readable and does the job, then there's nothing wrong with it. Just use "has" and "is" prefixes to make it more readable:
var $isNewRecord;
var $hasUpdated;
if ($isNewRecord)
{
}
if ($hasUpdated)
{
}
Bearing in mind that that code could be more readably written as
var1 = condition1
, this assignment has some useful properties if used well. One use case is to name a complicated calculation without breaking it out into a function:
user_is_on_fire = condition_that_holds_when_user_is_on_fire
That allows one to explain what one is using the condition to mean, which is often not obvious from the bare condition.
If evaluating the condition is expensive (or has side effects), it might also be desirable to store the result locally rather than reevaluate the condition.
Some caveats: Badly named flags will tend to make the code less readable. So will flags that are set far from the place where they are used. Also, the fact that one wants to use flags is a code smell suggesting that one should consider breaking the condition out into a function.
Call it flags when you work in a pre-OO language. They are useful to parameterize the behaviour of a piece of code.
You'll soon find the code hard to follow, however. It would be easier to read, change, and maintain if you abstracted away the differences, e.g. by providing a reference to the changeable functionality.
In languages where functions are first-class citizens (e.g. Javascript, Haskell, Lisp, ...), this is a breeze.
In OO languages, you can implement some design patterns like Abstract Factory, Strategy/Policy, ...
Too many switches I personally regard as code smell.
That depends on the condition and how many times it's used. Anyway, refactoring it into a function (preferably caching the result if the condition is slow to calculate) might give you much more readable code.
Consider for example this:
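def checkCondition():
    # Cache the slow result as an attribute of the function object itself
    # (a missing attribute raises AttributeError, not NameError)
    try:
        return checkCondition.conditionValue
    except AttributeError:
        checkCondition.conditionValue = someSlowFunction()
        return checkCondition.conditionValue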
As for coding style:
if (condition1)
    var1 = true
else
    var1 = false
I hate that kind of code. It should be either simply:
var1 = condition1
or, if you want to ensure the result is a boolean:
var1 = bool(condition1)
if (var1 == true)
Again. Bad coding style. It's:
if (var1)
What I don't like about flags is when they are literally called flag, with no comment whatsoever,
e.g.
void foo(...){
bool flag;
//begin some weird looking code
if (something)
[...]
flag = true;
}
They work against code readability. And the poor guy who has to read it months or years after the original programmer is gone is going to have a hard time trying to understand what its purpose originally was.
However, if the flag variable has a representative name, then I think flags are OK, as long as they are used wisely (see other responses).
Yes, that is just silly nonsensical code.
You can simplify all that down to:
if (condition1)
{
// do something
}
Here's my take.
Code using flags:
...
if (dogIsBarking && smellsBad) {
    cleanupNeeded = true;
}
doOtherStuff();
... many lines later
if (cleanupNeeded) {
    startCleanup();
}
...
Very unclean. The programmer simply happens to code in whatever order his mind tells him to. He just added code at a random place to remind himself that cleanup is needed later on... Why didn't he do this:
...
doOtherStuff();
... many lines later
if (dogIsBarking && smellsBad) {
    startCleanup();
}
...
And, following advice from Robert Martin (Clean Code), you can refactor the logic into a more meaningful method:
...
doOtherStuff();
... many lines later
if (dogTookADump()) {
    startCleanup();
}
...
boolean dogTookADump() {
    return (dogIsBarking && smellsBad);
}
So, I have seen lots and lots of code where simple rules like the above could be followed, yet people keep adding complications and flags for no reason! Now, there are legitimate cases where flags might be needed, but in most cases they are a style that programmers carry over from the past.

Is there a case where parameter validation may be considered redundant?

The first thing I do in a public method is to validate every single parameter before they get any chance to get used, passed around or referenced, and then throw an exception if any of them violate the contract. I've found this to be a very good practice as it lets you catch the offender the moment the infraction is committed but then, quite often I write a very simple getter/indexer such as this:
private List<Item> m_items = ...;
public Item GetItemByIdx( int idx )
{
    if( (idx < 0) || (idx >= m_items.Count) )
    {
        throw new ArgumentOutOfRangeException( "idx", "Invalid index" );
    }
    return m_items[ idx ];
}
In this case the index parameter directly relates to the indexes in the list, and I know for a fact (e.g. documentation) that the list itself will do exactly the same and will throw the same exception. Should I remove this verification or I better leave it alone?
I wanted to know what you guys think, as I'm now in the middle of refactoring a big project and I've found many cases like the above.
Thanks in advance.
It's not just a matter of taste, consider
if (!File.Exists(fileName)) throw new ArgumentException("...");
var s = File.OpenText(fileName);
This looks similar to your example but there are several reasons (concurrency, access rights) why the OpenText() method could still fail, even with a FileNotFound error. So the Exists-check is just giving a false feeling of security and control.
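The same point in Python terms: rather than checking os.path.exists() first, attempt the operation and handle the failure it actually reports. This is a minimal sketch; returning None for a missing file is an assumed policy, and other errors such as PermissionError still propagate to the caller.

```python
from typing import Optional


def read_text_or_none(path: str) -> Optional[str]:
    # Just try the operation; an exists() check done beforehand could be
    # out of date by the time open() runs.
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None
```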
It is a mind-set thing, when you are writing the GetItemByIdx method it probably looks quite sensible. But if you look around in a random piece of code there are usually lots of assumptions you could check before proceeding. It's just not practical to check them all, over and over. We have to be selective.
So in a simple pass-along method like GetItemByIdx I would argue against redundant checks. But as soon as the function adds more functionality or if there is a very explicit specification that says something about idx that argument turns around.
As a rule of thumb an exception should be thrown when a well defined condition is broken and that condition is relevant at the current level. If the condition belongs to a lower level, then let that level handle it.
I would only do parameter verification where it would lead to some improvement in code behavior. Since you know, in this case, that the check will be performed by the List itself, then your own check is redundant and provides no extra value, so I wouldn't bother.
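Whether the lower level "does exactly the same" check depends on its exact contract. In Python, for example, list indexing accepts negative indexes (items[-1] is the last item), so relying on the list alone would not enforce 0 <= idx < len(items). A hypothetical Python port of GetItemByIdx that keeps the explicit check for that reason:

```python
from typing import List


def get_item_by_idx(items: List[str], idx: int) -> str:
    # The list's own check is not the same contract: items[-1] succeeds.
    if not 0 <= idx < len(items):
        raise IndexError(f"idx {idx} out of range for {len(items)} items")
    return items[idx]
```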
It's true that you possibly duplicated work that's already done in the API, but it's there now. If your error handling framework works, is solid, and isn't causing performance issues (profiling is your friend), then I reckon you should leave it and gradually phase it out if you have time. It doesn't sound like a top priority!
