How does protobuf parse enums from strings?

Given a proto definition like:
enum Foo {
STUFF = 1;
A_THING = 2;
}
...will protobuf parse any of the following strings into values correctly?
STUFF
Stuff
stuff
AThing
aThing
A_THING
Furthermore, does the parser implementation differ by language?
(The parser function in C++ would be named Foo_Parse(const std::string&, Foo*); not sure what it would be in other languages.)

The enum name-parsing helper for your Foo would only accept the exact strings "STUFF" and "A_THING".
Note that this name-parsing helper is just a convenience function for you. This is not a core Protobuf feature. In particular, enum names are not sent on the wire using the standard Protobuf encoding. It's important to make this clear since when talking about "parsing" it's easy to mistake this as having to do with Protobuf wire-format parsing, which it does not.
Since this is just a random helper and not a core feature, implementations in other languages could in theory offer helpers implementing arbitrary logic. You need to check the documentation for the specific language. However, I would be somewhat surprised if any language implemented an enum name-parsing helper that accepted different strings from the C++ one.
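To make the exact-match behavior concrete, here is a minimal Ruby sketch. It does not use the Protobuf library at all: FOO_VALUES and parse_foo are hypothetical names that only mimic the matching the answer describes for a helper like Foo_Parse, so check your language's generated code for the real API.

```ruby
# Hypothetical stand-in for a generated name-parsing helper.
# This is not Protobuf code; it only mimics exact, case-sensitive matching.
FOO_VALUES = { "STUFF" => 1, "A_THING" => 2 }.freeze

# Returns the numeric value for an exact name match, or nil otherwise.
def parse_foo(name)
  FOO_VALUES[name]
end

# The six candidate strings from the question.
candidates = %w[STUFF Stuff stuff AThing aThing A_THING]
accepted = candidates.select { |name| parse_foo(name) }
puts accepted.inspect
```

Only the two names spelled exactly as in the definition are accepted; every other spelling yields nil.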


Is there a way to perform compile time type-check in Ruby?

I know Ruby is dynamically and strongly typed, but AFAIK, current syntax doesn't allow checking the type of arguments at compile time due to lack of explicit type notation (or contract) for each argument.
If I want to perform compile-time type check, what (practically matured) options do I have?
Update
By type-check I mean something like what a typical statically typed language does, such as C.
For example, C function denotes type of each argument and compiler checks passing-in argument is correct or not.
void func1(struct AAA aaa)
{
struct BBB bbb;
func1(bbb); // Wrong type. Compile time error.
}
As another example, Objective-C does that by requiring explicit type information.
- (id)method1:(AAA*)aaa
{
BBB* bbb = [[AAA alloc] init]; // Though we actually use correctly typed object...
[self method1:bbb]; // Compile time warning or error due to type contract mismatch.
}
I want something like that.
Update 2
Also, I mean compile-time = before running the script. I don't have better word to describe it…
There was a project for developing a type system, a type inferencer, a type checker and a syntax for type annotations for (a subset of) Ruby, called Diamondback Ruby. It was abandoned 4 years ago, you can find its source on GitHub.
But, basically, that language would no longer be Ruby. If static types are so important to you, you should probably just use a statically typed language such as Haskell, Scala, ML, Agda, Coq, ATS etc. That's what they're here for, after all.
RDL is a library for static type checking of Ruby/Rails programs. It has type annotations included for the standard library and (I think) for Rails. It lets you add types to methods/variables/etc. like so:
file.rb:
require 'rdl'
type '(Fixnum) -> Fixnum', typecheck: :now
def id(x)
"forty-two"
end
And then running file.rb will perform static type checking:
$ ruby file.rb
.../lib/rdl/typecheck.rb:32:in `error': (RDL::Typecheck::StaticTypeError)
.../file.rb:5:5: error: got type `String' where return type `Fixnum' expected
.../file.rb:5: "forty-two"
.../file.rb:5: ^~~~~~~~~~~
It seems to be pretty well documented!
While you can't check this statically, you can use conditionals in your methods so that the code runs only after checking the object's type.
Here the #is_a? and #kind_of? come in handy...
def method(variable)
if variable.is_a? String
...
else
...
end
end
You then have the choice of returning a specified error value or raising an exception. Hopefully this is close to what you are looking for.
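As a minimal sketch of the "raise an exception" option, assuming a hypothetical method named shout that should only accept strings:

```ruby
# Hypothetical method: accepts only String arguments and rejects anything else.
def shout(text)
  raise TypeError, "expected String, got #{text.class}" unless text.is_a?(String)

  text.upcase
end

puts shout("hello")

begin
  shout(42)
rescue TypeError => e
  # The check runs when the method is called, not before the script starts.
  puts "rejected: #{e.message}"
end
```

Note that this is still a runtime check: the bad call is only detected when that line actually executes.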
You are asking for a "compile-time" type check, but in Ruby, there is no "compile" phase. Static analysis of Ruby code is almost impossible, since any method, even from the built-in classes, can be redefined at runtime. Classes can also be dynamically created and instantiated at runtime. How would you do type-checking for a class which doesn't even exist when the program starts?
Surely, your real goal is not just to "type-check your code". Your goal is to "write code that works", right? Type-checking is just a tool which can help you "write code that works". However, while type-checking is helpful, it has its limits. It can catch some simple bugs, but not most bugs, and not the most difficult bugs.
When you choose to use Ruby, you are giving up the benefits of type-checking. However, Ruby may allow you to get things done with much less code than other languages you are used to. Writing programs with less code generally means there are fewer bugs for you to fix. If you use Ruby skillfully, I believe the tradeoff is worth it.
Although you can't type-check your code in Ruby, there is great value in using assertions which check method arguments. In some cases, those assertions might check the type of an argument. More frequently, they will check other properties of the arguments. Then you need some tests which exercise the code. You will find that with a relatively small number of tests, you will catch more bugs than your C/C++ compiler can do.
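A rough sketch of that combination, using a hypothetical square_root method: the assertions check a property of the argument (numeric and non-negative) rather than one exact class, and a couple of plain checks stand in for a real test suite.

```ruby
# Hypothetical example: the assertions check properties of the argument,
# not its exact class (Integer, Float and Rational all pass).
def square_root(value)
  raise ArgumentError, "value must be numeric" unless value.is_a?(Numeric)
  raise ArgumentError, "value must be non-negative" if value.negative?

  Math.sqrt(value)
end

# A few plain checks standing in for a test suite.
def run_checks
  results = []
  results << (square_root(9) == 3.0)
  results << begin
    square_root(-1)
    false
  rescue ArgumentError
    true
  end
  results.all?
end

puts run_checks
```

In a real project the checks would live in a test framework such as Minitest or RSpec rather than in a helper method.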
It seems you want static types. There is not an effective way to do this in Ruby due to the language's dynamic nature.
A naive approach I can think of is to make a "contract" like this:
def up(name)
# name(string)
name.upcase
end
So the first line of each method will be a comment declaring what type each argument must have.
Then implement a tool that will statically scan & analyze the source and catch such errors by scanning the call sites of the above method and check the type of the passed argument whenever possible.
For example this would be easy to check:
x = "George"
up(x)
but how would you check this one:
x = rand(2).zero? ? "George" : 5
up(x)
In other words, most of the time the types cannot be deduced before runtime.
However if you do not care about the "type checking" happening statically, you could also do:
def up(name)
raise "TypeError etc." unless name.is_a? String
# ...
end
In any case, I don't think you will benefit from the above. I would recommend making use of duck typing instead.
You might be interested in the idea of a "pluggable type system". It means adding a static type system to a dynamic language, where the programmer decides what should be typed and what is left untyped. The typechecker sits alongside the core language and is usually implemented as a library. It can either do static checking or check types at runtime in a special "checked" mode that should be used during development and when executing tests.
The type checker for Ruby I found is called Rtc (Ruby Type Checker). Github, academic paper. The motivation is to make the requirements on a type of a parameter of a function or method explicit, move the requirements out of the tests into type annotations and turn the type annotation into an "executable documentation". Source.
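To give a feel for the runtime "checked mode" idea, here is a small sketch. It is not Rtc's API: the Contracts module, its contract method and the checked_mode switch are all hypothetical names invented for this illustration.

```ruby
# Hypothetical mini "pluggable" checker: annotations are optional and are
# only enforced while checked_mode is on (e.g. during development or tests).
module Contracts
  class << self
    attr_accessor :checked_mode
  end
  self.checked_mode = true

  # Wraps an existing method so its arguments are checked against the given classes.
  def contract(method_name, *arg_types)
    original = instance_method(method_name)
    define_method(method_name) do |*args|
      if Contracts.checked_mode
        args.zip(arg_types).each do |arg, type|
          next if type.nil? || arg.is_a?(type)

          raise TypeError, "#{method_name}: expected #{type}, got #{arg.class}"
        end
      end
      original.bind(self).call(*args)
    end
  end
end

class Greeter
  extend Contracts

  def greet(name)
    "Hello, #{name}"
  end
  contract :greet, String # typed by choice

  def echo(value) # left untyped on purpose
    value
  end
end

puts Greeter.new.greet("Ann")
```

With checked_mode switched off, the wrapper skips the checks and simply forwards the call, which is the trade-off the "checked mode" description refers to.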

How to Work with Ruby Duck Typing

I am learning Ruby and I'm having a major conceptual problem concerning typing. Allow me to detail why I don't understand the paradigm.
Say I am method chaining for concise code, as you do in Ruby. I have to know precisely what the return type of each method call in the chain is, otherwise I can't know what methods are available on the next link. Do I have to check the method documentation every time? I'm running into this constantly while working through tutorial exercises. It seems I'm stuck with a process of reference, infer, run, fail, fix, repeat to get code running, rather than knowing precisely what I'm working with during coding. This flies in the face of Ruby's promise of intuitiveness.
Say I am using a third-party library. Once again, I need to know what types I am allowed to pass as parameters, otherwise I get a failure. I can look at the code, but there may or may not be any comments or declaration of what type the method is expecting. I understand that you code based on the methods available on an object, not its type. But then I have to be sure that whatever I pass as a parameter has all the methods the library expects, so I still have to do type checking. Do I have to hope and pray everything is documented properly on an interface so I know if I'm expected to give a string, a hash, a class, etc.?
If I look at the source of a method I can get a list of methods being called and infer the type expected, but I have to perform analysis.
Ruby and duck typing: design by contract impossible?
The discussions in the preceding Stack Overflow question don't really answer anything other than "there are processes you have to follow", and those processes don't seem to be standard: everyone has a different opinion on what process to follow, and the language has zero enforcement. Method validation? Test-driven design? Documented API? Strict method naming conventions? What's the standard and who dictates it? What do I follow? Would these guidelines solve this concern https://stackoverflow.com/questions/616037/ruby-coding-style-guidelines? Are there editors that help?
Conceptually I don't get the advantage either. You need to know what methods are needed for any method called, so regardless you are typing when you code anything. You just aren't informing the language or anyone else explicitly, unless you decide to document it. Then you are stuck doing all type checking at runtime instead of during coding. I've done PHP and Python programming and I don't understand it there either.
What am I missing or not understanding? Please help me understand this paradigm.
This is not a Ruby specific problem, it's the same for all dynamically typed languages.
Usually there are no guidelines for how to document this either (and most of the time it is not really possible). See for instance map in the Ruby documentation:
map { |item| block } → new_ary
map → Enumerator
What is item, block and new_ary here and how are they related? There's no way to tell unless you know the implementation or can infer it from the name of the function somehow. Specifying the type is also hard since new_ary depends on what block returns, which in turn depends on the type of item, which could be different for each element in the Array.
A lot of the time you also stumble across documentation that says that an argument is of type Object, which again tells you nothing since everything is an Object.
OCaml has a solution for this, it supports structural typing so a function that needs an object with a property foo that's a String will be inferred to be { foo : String } instead of a concrete type. But OCaml is still statically typed.
Worth noting is that this can be a problem in statically typed languages too. Scala has very generic methods on collections which leads to type signatures like ++[B >: A, That](that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Array[T], B, That]): That for appending two collections.
So most of the time, you will just have to learn this by heart in dynamically typed languages, and perhaps help improve the documentation of libraries you are using.
And this is why I prefer static typing ;)
Edit One thing that might make sense is to do what Scala also does. It doesn't actually show you that type signature for ++ by default, instead it shows ++[B](that: GenTraversableOnce[B]): Array[B] which is not as generic, but probably covers most of the use cases. So for Ruby's map it could have a monomorphic type signature like Array<a> -> (a -> b) -> Array<b>. It's only correct for the cases where the list only contains values of one type and the block only returns elements of one other type, but it's much easier to understand and gives a good overview of what the function does.
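A short sketch of where the simplified signature Array<a> -> (a -> b) -> Array<b> holds and where it stops describing the result. The method names label_all and normalize are made up for this illustration.

```ruby
# Homogeneous input and a block returning one type:
# fits Array<Integer> -> (Integer -> String) -> Array<String>.
def label_all(numbers)
  numbers.map { |n| "item #{n}" }
end

# Mixed input and a block returning different types:
# the simple signature no longer describes what comes back.
def normalize(values)
  values.map { |x| x.is_a?(Integer) ? x * 2 : x.to_s }
end

p label_all([1, 2, 3])
p normalize([1, "two", :three])
```

The first call returns an array of strings only, while the second returns a mix of integers and strings, which is exactly the case the monomorphic signature cannot express.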
Yes, you seem to misunderstand the concept. It's not a replacement for static type checking. It's just different. For example, if you convert objects to json (for rendering them to client), you don't care about actual type of the object, as long as it has #to_json method. In Java, you'd have to create IJsonable interface. In ruby no overhead is needed.
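A minimal sketch of the #to_json case, using two unrelated hypothetical classes (User and Point) and a hypothetical render method that never inspects the class of its argument:

```ruby
require "json"

class User
  def initialize(name)
    @name = name
  end

  def to_json(*args)
    { name: @name }.to_json(*args)
  end
end

class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  def to_json(*args)
    [@x, @y].to_json(*args)
  end
end

# Hypothetical renderer: it only relies on the object responding to #to_json.
def render(object)
  object.to_json
end

puts render(User.new("Ann"))
puts render(Point.new(1, 2))
```

Neither class declares a shared interface or base class; responding to #to_json is the whole contract.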
As for knowing what to pass where and what returns what: memorize this or consult docs each time. We all do that.
Just the other day, I saw a Rails programmer with 6+ years of experience complain on Twitter that he can't memorize the order of parameters to alias_method: does the new name go first or last?
This flies in the face of Ruby's promise of intuitiveness.
Not really. Maybe it's just badly written library. In core ruby everything is quite intuitive, I dare say.
Statically typed languages with their powerful IDEs have a small advantage here, because they can show you documentation right here, very quickly. This is still accessing documentation, though. Only quicker.
Consider that the design choices of statically typed languages (C++, Java, C#, et al.) enforce strict declarations of the types passed to methods and the types returned by methods. This is because these languages were designed to validate that arguments are correct (and since these languages are compiled, this work can be done at compile time). But some questions can only be answered at run time, and C++, for example, has RTTI (Run-Time Type Information) to examine and enforce type guarantees. But as the developer, you are guided by syntax, semantics and the compiler to produce code that follows these type constraints.
Ruby gives you the flexibility to take dynamic argument types and return dynamic types. This freedom enables you to write more generic code (read Stepanov on the STL and generic programming), and gives you a rich set of introspection methods (is_a?, instance_of?, respond_to?, kind_of?, et al.) which you can use dynamically. Ruby enables you to write generic methods, but you can also explicitly enforce design by contract, and handle a contract failure by whatever means you choose.
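For reference, a quick sketch of how the introspection methods named above differ when applied to an integer and a string:

```ruby
value = 42

puts value.is_a?(Integer)         # true
puts value.kind_of?(Numeric)      # true: Integer inherits from Numeric, and kind_of? is an alias of is_a?
puts value.instance_of?(Numeric)  # false: instance_of? ignores ancestors
puts value.respond_to?(:upcase)   # false: integers have no #upcase
puts "text".respond_to?(:upcase)  # true: asks about behavior, not class
```

respond_to? is the one closest to duck typing, since it asks what an object can do rather than what it is.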
Yes, you will need to use care when chaining methods together, but learning Ruby is not just a few new keywords. Ruby supports multiple paradigms; you can write procedural, object-oriented, generic, and functional programs. The cycle you are in right now will improve quickly as you learn about Ruby.
Perhaps your concern stems from a bias towards statically typed languages (C++, Java, C#, et al.). Duck typing is a different approach. You think differently. Duck typing means that if an object looks like a duck and behaves like a duck, then it is a duck. (Almost) everything is an Object in Ruby, so everything is polymorphic.
Consider templates (C++ has them, C# and Java have generics, C has macros). You build an algorithm, and then have the compiler generate instances for your chosen types. You aren't doing design by contract with generics, but when you recognize their power, you write less code, and produce more.
Some of your other concerns,
third party libraries (gems) are not as hard to use as you fear
Documented API? See Rdoc and http://www.ruby-doc.org/
Rdoc documentation is (usually) provided for libraries
coding guidelines - look at the source for a couple of simple gems for starters
naming conventions - snake_case is the norm for methods and variables, CamelCase for classes and modules
Suggestion - approach an online tutorial with an open mind, do the tutorial (http://rubymonk.com/learning/books/ is good), and you will have more focused questions.

Can I define C functions that accept native Go types through CGo?

For the work I'm doing to integrate with an existing library, I ended up needing to write some additional C code to provide an interface that was usable through CGo.
In order to avoid redundant data copies, I would like to be able to pass some standard Go types (e.g. Go strings) to these C adapter functions.
I can see that there are GoString and GoInterface types defined in the header CGo generates for use by exported Go functions, but is there any way to use these types in my own function prototypes that CGo will recognise?
At the moment, I've ended up using void * in the C prototypes and passing unsafe.Pointer(&value) on the Go side. This is less clean than I'd like though (for one thing, it gives the C code the ability to write to the value).
Update:
Just to be clear, I do know the difference between Go's native string type and C char *. My point is that since I will be copying the string data passed into my C function anyway, it doesn't make sense to have the code on the Go side make its own copy.
I also understand that the string layout could change in a future version of Go, and its size may differ by platform. But CGo is already exposing type definitions that match the current platform to me via the documented _cgo_export.h header it generates for me, so it seems a bit odd to talk of it being unspecified:
typedef struct { char *p; int n; } GoString;
But there doesn't seem to be a way to use this definition in prototypes visible to CGo. I'm not overly worried about binary compatibility, since the code making use of this definition would be part of my Go package, so source level compatibility would be enough (and it wouldn't be that big a deal to update the package if that wasn't the case).
Not really. You cannot safely mix, for example, Go strings (string) and C "strings" (char *) without using the helpers provided for that, i.e. GoString and CString. The reason is that, to conform to the language specs, a full copy of the string's content must be made between the Go and C worlds. Not only that, the garbage collector must know what to consider (Go strings) and what to ignore (C strings). And there are even more things to do about this, but let me keep it simple here.
Similar and/or other restrictions/problems apply to other Go "magical" types, like map or interface{} types. In the interface types case (but not only it), it's important to realize that the inner implementation of an interface{} (again not only this type), is not specified and is implementation specific.
That's not only about the possible differences between, say gc and gccgo. It also means that your code will break at any time the compiler developers decide to change some detail of the (unspecified and thus non guaranteed) implementation.
Additionally, even though Go doesn't (now) use a compacting garbage collector, it may change and without some pinning mechanism, any code accessing Go run time stuff directly will be again doomed.
Conclusion: Pass only simple entities as arguments to C functions. POD structs with simple fields are safe as well (pointer fields generally not). From the complex Go types, use the provided helpers for Go strings; they exist for a (very good) reason.
Passing a Go string to C is harder than it should be. There is no really good way to do it today. See https://golang.org/issue/6907.
The best approach I know of today is
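// #include <stddef.h>
// typedef struct { const char *p; ptrdiff_t n; } gostring;
// extern void CFunc(gostring s);
import "C"
import "unsafe"
func GoFunc(s string) {
	// Reinterpret the Go string header as the C struct; the bytes are not copied.
	C.CFunc(*(*C.gostring)(unsafe.Pointer(&s)))
}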
This of course assumes that Go representation of a string value will not change, which is not guaranteed.

Converting Protobuf definitions to Thrift

Are there any tools that exist to generate a Thrift interface definition from a Protobuf definition?
It appears the answer is "not yet". One problem you will face is that thrift defines a full RPC system with services and method calls, while protobuf really focuses on the datatypes and the serialization bits. Thrift's data model is a bit more restricted than protobuf's (no recursive structures, etc.), but that should not be a problem in the thrift -> protobuf direction.
Sure, you could quite easily convert all the thrift data types to protobuf definitions, while ignoring the service section entirely. You could even add something like that as a built in generator in the thrift compiler if you wanted to.
Thrift and Protobuf are not interchangeable though. Take a look at Biggest differences of Thrift vs Protocol Buffers? to see some key differences. What exactly are you trying to accomplish?
I wrote a translator to convert a subset of Thrift into Protobuf and vice-versa.
This is some Thrift code:
enum Operation{
ADD=1,SUBTRACT=2,MULTIPLY=3,DIVIDE=4
}
struct Work{1:i32 num1,2:i32 num2,4:string comment
}
which is automatically converted into this Protobuf code:
enum Operation{
ADD=1,SUBTRACT=2,MULTIPLY=3,DIVIDE=4
}
message Work{int32 num1 = 1;int32 num2 = 2;string comment = 4;
}
I don't think there is. If you're up to write code for that, you can write a language generator that produces .thrift files for you.
You can write the tool in any language (I wrote in C++ in protobuf-j2me [1], and adapted protobuf-csharp-port code in [2]).
You can have protoc.exe call it like this:
protoc.exe --plugin=protoc-gen-thrift.exe --thrift_out=. file.proto
You need to name it protoc-gen-thrift.exe to make the --thrift_out option available.
[1] https://github.com/igorgatis/protobuf-j2me
[2] https://github.com/jskeet/protobuf-csharp-port/blob/6ac38bf84452f1f9bd3df59a276708725be2ed49/src/ProtoGen/ProtocGenCs.cs
I believe there is.
Check out protobuf-thrift, a non-destructive transformer I wrote, which can transform Protobuf to Thrift and vice versa. What's more, it preserves the declaration order and comments, so as much of the source formatting as possible is retained.
You can also try out the web interface here.
BTW, I also agree with captncraig's answer. It's true that Thrift and Protobuf have many differences, such as nested types in Protobuf and union types in Thrift. But you can't deny that they have a lot of syntax in common, such as enum => enum, struct => message, service => service, which are the constructs used most often.
So protobuf-thrift is a tool that helps you reduce repetitive work, while accepting that the two formats are not 100% convertible.

Document duck types with multiple methods in YARD

YARD allows me to specify types for method parameters and return values. As I really like to duck type it is nice to see that YARD also supports defining types by specifying methods they must support.
As you can see here, expressions like #first_method, #second_method are interpreted as a logical disjunction. This means an object needs to support #first_method or #second_method or both. This is not what I need.
I would like to be able to specify that an object is required to support both #first_method and #second_method for my parameter. Is there a way to specify this?
There is no idiomatic syntax for specifying a compound duck-type interface. That said, note that all type specifications are really just free-form text; the YARD types parser link given in the question is simply a set of conventional or idiomatic syntaxes for specifying common interfaces. If you think of a smart way to specify these types (and it makes sense to your users), you are free to do so. Perhaps something like #first#second might work, or #first&#second.
My recommendation, however, would be to create a concrete type to wrap this ad-hoc interface. Even if you don't actually use the type at runtime, it would serve as an object in your domain that explains the interface to your users. For example, by creating a class Foo with the two methods #first and #second, and then documenting those two methods, you could now use Foo as your type. You could explain in the Foo class documentation that it is never actually used in code and just exists as an "interface", if that's the case. You could also make Foo a "module" to denote that it is more of an "interface" (or "protocol", if you prefer that language) than a concrete type.
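A sketch of what that could look like with Foo as a documentation-only module. The swap method and the Pair struct are hypothetical and exist only to show the tags in use; the @param and @return tags are standard YARD.

```ruby
# Documentation-only "interface": any object passed where a Foo is
# expected must respond to both #first and #second.
# This module is never included or checked at runtime.
module Foo
  # @return [Object] the first element
  def first; end

  # @return [Object] the second element
  def second; end
end

# @param pair [Foo] any object responding to both #first and #second
# @return [Array] the two values, swapped
def swap(pair)
  [pair.second, pair.first]
end

# Pair never mentions Foo, yet it satisfies the documented interface.
Pair = Struct.new(:first, :second)
p swap(Pair.new(1, 2))
```

Nothing here is enforced by Ruby or by YARD; the module only gives readers of the generated documentation a single named place to look up the required methods.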
