In my current project, I am persisting protobufs with enums. To ensure backwards compatibility, I need to make sure that these enums stay the same and I want to write unit tests for that.
Example:
legacy_mood.proto:
enum Mood {
HAPPY = 0;
SAD = 1;
}
mood.proto:
enum Mood {
EXCITED = 0;
HAPPY = 1;
SAD = 2;
}
I am looking for a way to compare these two protos and, in this case, let the test fail because the constant values of HAPPY and SAD changed.
I want to allow new values, so I really just want to check equivalence for elements that exist in the legacy proto, so EXCITED should be ignored in this case.
Before I implement this myself, is there a library for that? I've been googling for a bit now, but couldn't find anything. It could be in Java, C++ or Python.
In any of the languages you mentioned, you can convert enum value A in message Foo to its string name, use that name to look up enum value B in message Bar, and check that the two numeric values are equal.
Python API: https://developers.google.com/protocol-buffers/docs/reference/python-generated#enum
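A minimal Python sketch of that name-based check follows. Plain dicts stand in for the two enums so that the snippet runs on its own; with generated protobuf modules the dicts could presumably be built from the enum wrappers instead, e.g. dict(legacy_mood_pb2.Mood.items()), where the module name is an assumption. The helper name find_enum_incompatibilities is hypothetical, not part of any library.

```python
def find_enum_incompatibilities(legacy, current):
    """Return a list of problems for the names defined in `legacy`.

    Names that exist only in `current` are ignored, so adding values is allowed.
    """
    problems = []
    for name, legacy_number in sorted(legacy.items()):
        if name not in current:
            problems.append(f"{name} was removed")
        elif current[name] != legacy_number:
            problems.append(
                f"{name} changed from {legacy_number} to {current[name]}"
            )
    return problems


# Stand-ins for the enums in legacy_mood.proto and mood.proto (name -> number).
LEGACY_MOOD = {"HAPPY": 0, "SAD": 1}
MOOD = {"EXCITED": 0, "HAPPY": 1, "SAD": 2}

problems = find_enum_incompatibilities(LEGACY_MOOD, MOOD)
print(problems)
```

In a unit test, asserting that the returned list is empty would make the test fail for the mood.proto shown in the question, while a version that only appends EXCITED with a fresh number would pass.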
I am currently working on a high-performance Vector/Matrix Ruby gem C extension, as I find the built-in implementation cumbersome and not ideal for most cases that I have personally encountered, as well as lacking in other areas.
My first approach was implementing in Ruby as a subclass of Fiddle::CStructEntity, as a goal is to make them optimized for interop without need for conversion (such as passing to native OpenGL functions). Implementing in C offers a great benefit for the math, but I ran into a roadblock when trying to implement a minor function.
I wished to have a method return a Fiddle::Pointer to the struct (basically a pointer to RData->data), as an actual Fiddle::Pointer object. Returning an integer address, packed string, etc. is trivial, and that result could easily be converted to a Fiddle::Pointer in a Ruby method like this:
def ptr
# Assume address is an integer address returned from C
Fiddle::Pointer.new(self.address, self.size)
end
This opened up a question for me: is it even possible to do this from C? Fiddle is not part of the core library; it is part of the standard library and, as such, is really just an extension itself.
The problem is trivial and can easily be overcome with a couple of lines of Ruby code as demonstrated above, but I was more curious whether returning a Fiddle object was even possible from a C extension without hacks. I was unable to find any examples of this being done and, as always when it comes to the documentation involving Fiddle, it is quite basic and does not explain much.
The solution for this is actually rather simple, though admittedly not as elegant or clean a solution as I was hoping to discover.
There are possibly more elaborate ways to go about this by including the headers for Fiddle and building against it, but this was not really a viable solution, as I didn't want to restrict my C extension to only work with Ruby 2.0+, and it would be perfectly acceptable to simply omit the method in the event that the Ruby version was less than 2.0.
First I include version.h, which defines the macro RUBY_API_VERSION_MAJOR. That is all I really need to know in order to decide whether or not Fiddle will be present.
This will be an abbreviated version to simply show how to get the Fiddle::Pointer class as a VALUE, and to create an instance.
#if RUBY_API_VERSION_MAJOR >= 2
rb_require("fiddle");
VALUE fiddle = rb_const_get(rb_cObject, rb_intern("Fiddle"));
rb_cFiddlePointer = rb_const_get(fiddle, rb_intern("Pointer"));
#endif
In this example, the class is stored in rb_cFiddlePointer, which can then be used to create and return a Fiddle::Pointer object from C.
// Get basic data about the struct
struct RData *rdata = RDATA(self);
// Keep the arguments on the stack so nothing leaks if Fiddle::Pointer.new raises
VALUE args[2];
// Set the platform pointer-size address (could just use size_t here...)
#if SIZEOF_INTPTR_T == 4
args[0] = LONG2NUM((long) rdata->data);
#elif SIZEOF_INTPTR_T == 8
args[0] = LL2NUM((long long) rdata->data);
#else
args[0] = INT2NUM(0);
#endif
// Get size of structure
args[1] = INT2NUM(SIZE_OF_YOUR_STRUCTURE);
// Equivalent to Fiddle::Pointer.new(address, size) in Ruby
VALUE ptr = rb_class_new_instance(2, args, rb_cFiddlePointer);
return ptr;
After linking the function to an actual Ruby method, you can then call it to get a sized pointer to the internal structure in memory.
I started using Golang recently and stumbled across a problem:
I have two structs, human and alien, which are both based on the creature struct. I want to initialize one of them based on the value of the isAlien boolean inside of an if-statement.
Using the human := human{} notation or the alien equivalent inside the if blocks to initialize, the instances aren't accessible from outside of the if-statement.
On the other hand, the usual solution of declaring the type and the name of the variable before the if-statement and initializing the variable inside the if-statement doesn't work, because the two are different types:
var h human //use human or alien here?
if isAlien {
h = alien{} //Error: incompatible types
} else {
h = human{}
}
//same when swapping human with alien at the declaration
I know that I could just declare both types before the if-statement but that solution doesn't seem elegant to me.
Is there some obvious solution that I'm missing here?
As you noted, the problem is clearly represented by this statement:
var h human //use human or alien here?
If you plan to use that h variable after creating the objects, then the type of h must be one that can accept either a human or an alien as a value.
The way to do this in Go is by using an interface that both alien and human can fulfil.
So you should declare an interface like:
type subject interface {
// you should list all functions that you plan to use on "h" afterwards
// both "human" and "alien" must implement those functions
}
Then:
var h subject
Will do the trick.
So, I'm going to go out on a limb and say you're probably thinking about this the wrong way.
The first question that occurs to me looking at your example is: what's the return type of this function? In other words, what signature do you need h to be? If alien has an embedded struct creature (which seems to be the inheritance pattern you're trying to follow), and you return a human from your function after declaring h to be a creature, anything that consumes your function will only know that it's dealing with a creature, so there's no point in declaring it a human or an alien in the first place.
I suspect that what you really want to be doing is moving away from concrete structs here and instead using interfaces. In that world, you'd have a creature interface, and both human and alien would satisfy the creature interface. You wouldn't necessarily know which one you were dealing with downstream, but you'd be able to reliably call creature methods and the appropriate human or alien implementation would be invoked.
In the Hack language type system, is there a "top" type, also known as an "any" type, or a universal "Object" type? That is, a type which all types are subclasses of?
The manual mentions "mixed" types, which might be similar, but are not really explained. There is also the possibility of simply omitting the type declaration in some places. However, this cannot be done everywhere, e.g. if I want to declare something to be a function from string to the top type, it's not clear how I do this. function (string): mixed?
I'm an engineer working on Hack at Facebook. This is a really insightful and interesting question. Depending on what exactly you're getting at, Hack has a couple different variations of this.
First, let's talk about mixed. It's the supertype of everything. For example, this typechecks:
<?hh // strict
function f(): mixed {
return 42;
}
But since it's the supertype of everything, you can't do much with a mixed value until you case analyze on what it actually is, via is_int, instanceof, etc. Here's an example of how you'd have to use the result of f():
<?hh // strict
function g(): int {
$x = f();
if (is_int($x)) {
return $x;
} else {
return 0;
}
}
The "missing annotation" type ("any") is somewhat different than this. Whereas mixed is the supertype of everything, "any" unifies with everything -- it's both the supertype and subtype of everything. This means that if you leave off an annotation, we'll assume you know what you're doing and just let it pass. For example, the following code typechecks as written:
<?hh
// No "strict" since we are omitting annotations
function f2() {
return 42;
}
function g2(): string {
return f2();
}
This clearly isn't sound -- we just broke the type system and will cause a runtime type error if we execute the above code -- but it's admitted in partial mode in order to ease conversion. Strict requires that you annotate everything, and so you can't get a value of type "any" in order to break the type system in this way if all of your code is in strict. Consider how you'd have to annotate the code above in strict mode: either f2 would have to return int and that would be a straight-up type error ("string is not compatible with int"), or f2 would have to return mixed and that would be a type error as written ("string is not compatible with mixed") until you did a case analysis with is_int etc as I did in my earlier example.
Hope this clears things up -- if you want clarification let me know in the comments and I'll edit. And if you have other questions that aren't strict clarifications of this, continue tagging them "hacklang" and we'll make sure they get responded to!
Finally: if you wouldn't mind, could you press the "file a documentation bug" on the docs pages that were confusing or unclear, or could in any way be improved? We ideally want docs.hhvm.com to be a one-stop place for stuff like this, but there are definitely holes in the docs that we're hoping smart, enthusiastic folks like yourself will help point out. (i.e., I thought this stuff was explained well in the docs, but since you are confused that is clearly not the case, and we'd really appreciate a bug report detailing where you got lost.)
What is the reasoning behind the two common variable declaration syntaxes that many popular languages use, such as:
int foo = 0;
and
foo:int = 0;
One problem I have with the second option, is that it almost looks like you are doing, "int = 0;". Why do languages use a particular way? Is it easier to parse or something of the like?
I have studied the basics of compiler development, and I do not think that parsers have any problem at all with either form, given current solutions and techniques.
For me it's clearly a matter of readability for human eyes. I think it's easier to read
int foo = 0
than
foo:int = 0
In fact, I would say that it's even easier to simply write foo = 0, since one can recognize that 0 is an integer number :) I personally like this approach, instead of having type identifiers.
Something like this (yes, this doesn't deal with some edge cases - that's not the point):
int CountDigits(int num) {
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
What's your opinion about this? That is, using function arguments as local variables.
Both are placed on the stack, and pretty much identical performance wise, I'm wondering about the best-practices aspects of this.
I feel like an idiot when I add an additional and quite redundant line to that function consisting of int numCopy = num, however it does bug me.
What do you think? Should this be avoided?
As a general rule, I wouldn't use a function parameter as a local processing variable, i.e. I treat function parameters as read-only.
In my mind, intuitively understandable code is paramount for maintainability, and modifying a function parameter to use as a local processing variable tends to run counter to that goal. I have come to expect that a parameter will have the same value in the middle and bottom of a method as it does at the top. Plus, an aptly-named local processing variable may improve understandability.
Still, as @Stewart says, this rule is more or less important depending on the length and complexity of the function. For short simple functions like the one you show, simply using the parameter itself may be easier to understand than introducing a new local variable (very subjective).
Nevertheless, if I were to write something as simple as countDigits(), I'd tend to use a remainingBalance local processing variable in lieu of modifying the num parameter as part of local processing - just seems clearer to me.
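For illustration, here is the question's function transcribed to Python with a separate working variable. This is only a sketch of the style being described; the name remaining is an assumption chosen for the example, and like the original it ignores edge cases such as negative input.

```python
def count_digits(num):
    """Count the decimal digits of a non-negative integer.

    `num` is treated as read-only; all mutation happens on `remaining`.
    """
    remaining = num  # working copy, so `num` keeps its original value
    count = 1
    while remaining >= 10:
        count += 1
        remaining //= 10  # drop the last decimal digit
    return count


print(count_digits(12345))
```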
Sometimes, I will modify a local parameter at the beginning of a method to normalize the parameter:
void saveName(String name) {
name = (name != null ? name.trim() : "");
...
}
I rationalize that this is okay because:
a. it is easy to see at the top of the method,
b. the parameter maintains its original conceptual intent, and
c. the parameter is stable for the rest of the method
Then again, half the time, I'm just as apt to use a local variable anyway, just to get a couple of extra finals in there (okay, that's a bad reason, but I like final):
void saveName(final String name) {
final String normalizedName = (name != null ? name.trim() : "");
...
}
If, 99% of the time, the code leaves function parameters unmodified (i.e. mutating parameters are unintuitive or unexpected for this code base), then, during that other 1% of the time, dropping a quick comment about a mutating parameter at the top of a long/complex function could be a big boon to understandability:
int CountDigits(int num) {
// num is consumed
int count = 1;
while (num >= 10) {
count++;
num /= 10;
}
return count;
}
P.S. :-)
parameters vs arguments
http://en.wikipedia.org/wiki/Parameter_(computer_science)#Parameters_and_arguments
These two terms are sometimes loosely used interchangeably; in particular, "argument" is sometimes used in place of "parameter". Nevertheless, there is a difference. Properly, parameters appear in procedure definitions; arguments appear in procedure calls.
So,
int foo(int bar)
bar is a parameter.
int x = 5
int y = foo(x)
The value of x is the argument for the bar parameter.
It always feels a little funny to me when I do this, but that's not really a good reason to avoid it.
One reason you might potentially want to avoid it is for debugging purposes. Being able to tell the difference between "scratchpad" variables and the input to the function can be very useful when you're halfway through debugging.
I can't say it's something that comes up very often in my experience - and often you can find that it's worth introducing another variable just for the sake of having a different name, but if the code which is otherwise cleanest ends up changing the value of the variable, then so be it.
One situation where this can come up and be entirely reasonable is where you've got some value meaning "use the default" (typically a null reference in a language like Java or C#). In that case I think it's entirely reasonable to modify the value of the parameter to the "real" default value. This is particularly useful in C# 4 where you can have optional parameters, but the default value has to be a constant:
For example:
public static void WriteText(string file, string text, Encoding encoding = null)
{
// Null means "use the default" which we would document to be UTF-8
encoding = encoding ?? Encoding.UTF8;
// Rest of code here
}
About C and C++:
My opinion is that using the parameter as a local variable of the function is fine because it is a local variable already. Why then not use it as such?
I feel silly too when copying the parameter into a new local variable just to have a modifiable variable to work with.
But I think this is pretty much a personal opinion. Do it as you like. If you feel silly copying the parameter just because of this, it indicates that you don't like doing it, and then you shouldn't do it.
If I don't need a copy of the original value, I don't declare a new variable.
IMO I don't think mutating the parameter values is a bad practice in general; it depends on how you're going to use it in your code.
My team coding standard recommends against this because it can get out of hand. To my mind for a function like the one you show, it doesn't hurt because everyone can see what is going on. The problem is that with time functions get longer, and they get bug fixes in them. As soon as a function is more than one screen full of code, this starts to get confusing which is why our coding standard bans it.
The compiler ought to be able to get rid of the redundant variable quite easily, so it has no efficiency impact. It is probably just between you and your code reviewer whether this is OK or not.
I would generally not change the parameter value within the function. If at some point later in the function you need to refer to the original value, you still have it. In your simple case, there is no problem, but if you add more code later, you may refer to 'num' without realizing it has been changed.
The code needs to be as self sufficient as possible. What I mean by that is you now have a dependency on what is being passed in as part of your algorithm. If another member of your team decides to change this to a pass by reference then you might have big problems.
The best practice is definitely to copy the inbound parameters if you expect them to be immutable.
I typically don't modify function parameters, unless they're pointers, in which case I might alter the value that's pointed to.
I think the best practices for this vary by language. For example, in Perl you can localize a variable, or even part of a variable, to a local scope, so that changing it in that scope will not have any effect outside of it:
sub my_function
{
my ($arg1, $arg2) = @_; # copy the arguments into lexical variables
$arg1++; # $arg1 is a lexical copy, so this change is not visible outside this scope
local $arg2->{key1}; # only the key1 portion of the hashref referenced by $arg2 is localized
$arg2->{key1}->{key2} = 'foo'; # this change is not visible outside the function
}
Occasionally I have been bitten by forgetting to localize a data structure that was passed by reference to a function, that I changed inside the function. Conversely, I have also returned a data structure as a function result that was shared among multiple systems and the caller then proceeded to change the data by mistake, affecting these other systems in a difficult-to-trace problem usually called action at a distance. The best thing to do here would be to make a clone of the data before returning it*, or make it read-only**.
* In Perl, see the function dclone() in the built-in Storable module.
** In Perl, see lock_hash() or lock_hash_ref() in the built-in Hash::Util module.