What is the point of google.protobuf.StringValue? - protocol-buffers

I've recently encountered all sorts of wrappers in Google's protobuf package. I'm struggling to imagine the use case. Can anyone shed some light: what problem were these intended to solve?
Here's one of the documentation links: https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/string-value (it says nothing about what this can be used for).
One difference in behavior between this and a plain string field is that this field will be written less efficiently (a couple of extra bytes, plus a redundant memory allocation). For other wrappers the story is even worse, since the repeated variants of those fields will be written inefficiently (Google's official Protobuf serializer doesn't support packed encoding for non-numeric types).
Neither seems to be desirable. So, what's this all about?

There are a few reasons, mostly to do with where these are used - see struct.proto.
StringValue can be null; a string often can't be in a language interfacing with protobufs. E.g. in Go, strings are always set; the "zero value" for a string is "", the empty string, so it's impossible to distinguish between "this value is intentionally set to the empty string" and "there was no value present". StringValue can be null and so solves this problem. It's especially important when they're used in a StructValue, which may represent arbitrary JSON: to do so it needs to distinguish between a JSON key which was set to the empty string (StringValue with an empty string) and a JSON key which wasn't set at all (null StringValue).
Also, if you look at struct.proto, you'll see that these aren't fully fledged message types in the proto - they're all generated from message Value, which has a oneof kind { number_value, string_value, bool_value, ... }. By using a oneof, struct.proto can represent a variety of different values in one field. Again, this makes sense considering what struct.proto is designed to handle - arbitrary JSON - where you don't know ahead of time what type of value a given JSON key has.
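The null-versus-empty distinction described above can be sketched without any protobuf dependency. This is only a model, not generated protobuf code: PlainMessage, WrappedMessage and describe are hypothetical names, with None standing in for an unset wrapper field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlainMessage:
    # Models a plain `string name = 1;` field: it always holds a str.
    name: str = ""


@dataclass
class WrappedMessage:
    # Models a `google.protobuf.StringValue name = 1;` field: None means unset.
    name: Optional[str] = None


def describe(msg) -> str:
    """Report what a reader of the message can tell about `name`."""
    if msg.name is None:
        return "absent"
    if msg.name == "":
        return "empty"
    return "set"


results = [
    describe(PlainMessage()),           # never assigned
    describe(PlainMessage(name="")),    # explicitly set to ""
    describe(WrappedMessage()),         # never assigned
    describe(WrappedMessage(name="")),  # explicitly set to ""
]
print(results)
```

With the plain field the first two cases are indistinguishable; with the wrapper model they are not.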

In addition to George's answer, you can't use a Protobuf primitive as the parameter or return value of a gRPC procedure.


Given an `RDF::Term` from the `RDF::Vocab` library, how do I infer the XSD datatype(s) I should expect?

I'm using Ruby scripts to round-trip a SKOS vocabulary definition in Turtle format through a spreadsheet (via CSV) and back, to allow non-technical people to check and update the localised phrases. The result then needs to be converted back into Turtle with as little non-significant churn and variation from the original as possible.
The spreadsheet has these columns (for the sake of this example):
vocab ID
term ID
property ID
value
EN
FR
...
vocab ID contains an abbreviated URI for the vocabulary, e.g. foo:bar.
property ID contains abbreviated URIs identifying a property of either the vocabulary itself or a term in it (such as dcterm:created or dc:title for the former case, or skos:prefLabel or skos:altLabel for the latter). A special case of the former is base_uri, defining the vocab's base URI.
term ID contains IDs identifying a term in the vocabulary when appropriate - or it is blank for properties of the vocabulary itself.
value is an unlocalised string literal, or some other sort of literal (like a date), or a URI, as appropriate. It may be blank if the value is a localised string - the other columns then contain the translated versions of the property in various languages. The column name is the two-letter identifier for the language.
Creating the CSV is not the problem - what is a little tricky is reading back the literal values and recreating the correct literal values.
Here's the thing: I'd like to be able to infer the XSD datatype of the property from the RDF::Term for it. I can look the latter up from the abbreviated URI, using RDF::Vocab. However there seems to be no mapping I can find in these libraries to the XSD datatype (whether mandatory or merely suggested.)
This seems to mean I must create a mapping from property IDs to the XSD datatype myself if I'm to avoid ending up with all the values becoming string literals by default (which wouldn't preserve the original datatypes).
Can anyone advise if I'm correct here, or is there a way to infer the nominal XSD datatype to use using the Ruby RDF libraries?
I presume you mean an RDF::Vocabulary::Term instance, which typically is an IRI, but contains accessors for a related vocabulary definition.
The documentation for RDF::Vocabulary::Term describes the generic accessors you can use. For a term based on a property, you might look at either the range or rangeIncludes accessor to get an idea of the preferred values that might be used as the object of a triple using this term.
The built-in vocabularies are minimal, pretty much limited to RDF, RDFS, XSD, and OWL. Load the rdf-vocab gem, and many other vocabularies are loaded. You can also use the RDF::Vocabulary.from_graph class method to instantiate a new vocabulary, including its term definitions, from a graph.
For example, see the following:
require 'rdf/vocab'
RDF::Vocab::SCHEMA.name.rangeIncludes # => [RDF::Vocab::SCHEMA.Text]
RDF::Vocab::FOAF.name.range # => [RDF::RDFS.Literal]
Other common accessors correspond to basic RDFS, OWL, SKOS, and schema.org annotation properties. Or, you can access an arbitrary annotation property using #attribute_value and #properties accessors.
In some cases, the property range may be more complex. Take, for example, the term definition for skos:member:
property :member,
  definition: "Relates a collection to one of its members.".freeze,
  domain: "http://www.w3.org/2004/02/skos/core#Collection".freeze,
  isDefinedBy: "http://www.w3.org/2004/02/skos/core".freeze,
  label: "has member".freeze,
  range: term(
    type: "http://www.w3.org/2002/07/owl#Class".freeze,
    unionOf: list("http://www.w3.org/2004/02/skos/core#Concept".freeze, "http://www.w3.org/2004/02/skos/core#Collection".freeze)
  ),
  type: ["http://www.w3.org/1999/02/22-rdf-syntax-ns#Property".freeze, "http://www.w3.org/2002/07/owl#ObjectProperty".freeze]
Additionally, the rdf-reasoner gem can form entailments over vocabularies to provide additional domain and range (and other) information based on subProperty hierarchies (as well as class hierarchies for rdf:type).
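Where the range accessors do not yield a usable datatype, the fallback the question describes (a hand-maintained mapping from property ID to XSD datatype) might look roughly like this. It is written in Python purely as a language-neutral sketch, not with the Ruby RDF API; the DATATYPES entries and the to_turtle_literal helper are hypothetical, and the xsd:date choice is an assumption about the data, not something the vocabularies mandate.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, hand-maintained table: property ID -> XSD datatype IRI.
DATATYPES = {
    "dcterm:created": XSD + "date",
}


def to_turtle_literal(property_id, value, lang=None):
    """Render one spreadsheet cell as a Turtle literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    if lang:
        # Localised columns (EN, FR, ...) become language-tagged strings.
        return f'"{escaped}"@{lang.lower()}'
    datatype = DATATYPES.get(property_id)
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    # No entry in the table: fall back to a plain string literal.
    return f'"{escaped}"'


print(to_turtle_literal("dcterm:created", "2020-01-31"))
print(to_turtle_literal("skos:altLabel", "chat", lang="FR"))
```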

Can anyone explain the difference between Uuid::generate and DB::generateKey?

Without thinking too hard about it, I created a column of type [UUID] and successfully stored "strings" (as noted in the documentation, and generally referred to as a separate type altogether) returned from DB::generateKey in it.
It feels like I've done something I shouldn't have.
Can anyone shed some light on this? Thanks in advance.
Mostly they return different types.
For clarity, DB::generateKey is equivalent to Uuid::generate |> toString
According to the standard library docs, the difference is the return type:
Uuid::generate() -> UUID
Generate a new UUID v4 according to RFC 4122
DB::generateKey() -> Str
Returns a random key suitable for use as a DB key
I believe the UUID type is a bitstring representation, that is, a specific sequence of bits in memory.
Whereas the Str type is a string representation of a UUID.
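As an analogy only (this is Python's standard uuid module, not Dark), the same split between a UUID value and its string form looks like this. The variable names are illustrative.

```python
import uuid

key_as_uuid = uuid.uuid4()      # a UUID object wrapping a 128-bit value
key_as_text = str(key_as_uuid)  # canonical 36-character text form

# Parsing the text form recovers an equal UUID value.
round_tripped = uuid.UUID(key_as_text)

print(type(key_as_uuid).__name__, type(key_as_text).__name__)
```

The two forms carry the same information, which is why converting between them is lossless, but they remain different types.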

Is there an off the shelf binary format that allows string caching

I am investigating migrating a highly customized and efficient binary format to one of the available off-the-shelf binary formats. The data is stored on low-powered mobile devices, among other places, so performance is an important requirement.
An advantage of the current format is that all strings are stored in a pool. This means that we don't repeat the same string hundreds of times in the file; we read it only once during deserialization, and all objects reference it by its index. It also means that we keep only one copy in memory. So a lot of advantages :)
I was not able to find a way for capnproto or flatbuffers to support this. Or would I need to build a layer on top and use integer indexes to strings explicitly in the generated objects?
Thank you!
FlatBuffers supports string pooling. Simply serialize a string once, then refer to that string multiple times in other objects. The string will only occur in memory once.
Simplest example, schema:
table MyObject { name: string; id: string; }
code (C++):
flatbuffers::FlatBufferBuilder fbb;
auto s = fbb.CreateString("MyPooledString");
// Both string fields point to the same data:
auto o = CreateMyObject(fbb, s, s);
fbb.Finish(o);
With Cap'n Proto, you can always do this manually, like:
struct MyMessage {
  stringTable @0 :List(Text);
  # Now encode string fields as integer indexes into the string table.
  someString @1 :UInt32;
  otherString @2 :UInt32;
}
Cap'n Proto could in theory allow multiple pointers to point at the same object, but currently prohibits this for security reasons: it would be too easy to DoS servers that don't expect it by sending messages that are cyclic or contain lots of overlapping references. See the section on amplification attacks in the docs.
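The manual string-table approach is independent of the serialization format. A minimal in-memory sketch of it, assuming records with hypothetical "name" and "city" fields, could look like this; StringPool, encode and decode are illustrative names, not part of FlatBuffers or Cap'n Proto.

```python
class StringPool:
    """Assigns each distinct string a stable integer index."""

    def __init__(self):
        self.strings = []   # index -> string
        self._index = {}    # string -> index

    def intern(self, s):
        idx = self._index.get(s)
        if idx is None:
            idx = len(self.strings)
            self.strings.append(s)
            self._index[s] = idx
        return idx


def encode(records):
    """Replace string fields with indexes into a shared string table."""
    pool = StringPool()
    rows = [(pool.intern(r["name"]), pool.intern(r["city"])) for r in records]
    return {"strings": pool.strings, "rows": rows}


def decode(message):
    """Rebuild records; repeated strings share a single object in memory."""
    table = message["strings"]
    return [{"name": table[n], "city": table[c]} for n, c in message["rows"]]


records = [
    {"name": "alice", "city": "Oslo"},
    {"name": "bob", "city": "Oslo"},
]
message = encode(records)
decoded = decode(message)
print(message)
```

Each distinct string is stored once in the table, and every decoded record that uses it refers to the same object.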

Why are there no custom default values in proto3?

The proto2 version of Protocol Buffers allows you to specify default values for message fields:
optional double scaling_factor = 3 [default = 1.0];
Why is this no longer possible in proto3? I consider this a neat feature to save additional bytes on the wire without the need of writing any wrapper code.
My understanding is that proto3 no longer allows you to detect field presence and no longer supports non-zero default values because this makes it easier to implement protobufs in terms of "plain old structs" in various languages, without the need to generate accessor methods. This is perceived as making Protobuf easier to use in those languages.
(I personally think that languages which lack accessors and properties aren't very good languages and protobuf should not design down to them, but it's not my project anymore.)
This is a workaround rather than a direct answer to your question, but I've found myself using the optional values from wrappers.proto and then setting the default value myself programmatically when I absolutely must know whether this was a default value or a value that was explicitly set.
It's not optimal that your code has to enforce the value instead of the generated code itself, but if you own both sides, at least it's a viable alternative to having no idea whether the value was the default or explicitly set as such, especially when looking at a bool set to false.
I am unclear how this affects bytes on the wire. For the instances where I've used it, message length was not a design constraint.
Proto File
import "google/protobuf/wrappers.proto";
google.protobuf.BoolValue optional_bool = 1;
Java code
// load or receive message here (note: setOptionalBool is a Builder method; built messages are immutable)
if( !message.hasOptionalBool() )
message.setOptionalBool( BoolValue.newBuilder().setValue( true ) );
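On the bytes-on-the-wire question, the cost of the wrapper can be worked out by hand from the protobuf wire format. The sketch below is a hand-rolled encoder, not the protobuf library; it assumes field number 1 for the outer field (as in the snippet above) and uses the fact that BoolValue's own value field is also number 1. The function names are illustrative.

```python
VARINT, LEN = 0, 2  # wire types: varint scalar, length-delimited


def varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number, wire_type):
    return varint((field_number << 3) | wire_type)


def plain_bool(field_number, value):
    """A proto3 `bool` field: omitted entirely when false."""
    if not value:
        return b""
    return tag(field_number, VARINT) + varint(1)


def wrapped_bool(field_number, value):
    """A set BoolValue field: a nested message holding `bool value = 1;`."""
    inner = plain_bool(1, value)
    return tag(field_number, LEN) + varint(len(inner)) + inner


print(plain_bool(1, True).hex(), wrapped_bool(1, True).hex())
print(plain_bool(1, False).hex(), wrapped_bool(1, False).hex())
```

So a set wrapper costs two extra bytes here (tag plus length) compared with the plain field, while an unset wrapper, like an unset plain field, costs nothing.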
In my autogenerated .pb.cc file I see a few places like this:
if (this->myint() != 0) {
and a few like this:
myint_ = 0;
So, why not enable default values and generate
static ::google::protobuf::int32 myint_defaultvalue = 5;
...
if (this->myint() != myint_defaultvalue) {
...
...
myint_ = myint_defaultvalue;
...
instead?

Type mismatch error while reading lotus notes document in vb6

I am trying to read a Lotus Notes document using VB6. I am able to read the values of the fields, but then a type mismatch error is suddenly thrown. When I reinitialise the VB6 variable it works again, but it stops after a certain point.
E.g., the address field in Lotus Notes:
lsaddress=ImsField(doc.address)
private function ImsField(pValue)
ImsField=pValue(0)
end function
I read the remaining fields the same way, but at a certain point runtime error "13" (type mismatch) is thrown.
I have to manually reinitialise with:
set doc=view.getdocumentbykey(doclist)
The type mismatch error occurs for a certain field. The issue should be a data type incompatibility. Try to figure out which field causes the error.
Use GetItemValue() instead of short notation for accessing fields and don't use ImsField():
lsaddress=doc.GetItemValue("address")(0)
The type mismatch is occurring because you are encountering a case where pValue is not an array. That will occur when you attempt to reference a NotesItem that does not exist. I.e., doc.MissingItem.
You should not use the shorthand notation doc.itemName. It is convenient, but it leads to sloppy coding. You should use getItemValue as everyone else is suggesting, and also you should check to see if the NotesItem exists. I.e.,
if doc.hasItem("myItem") then
lsaddress=doc.getItemValue("myItem")(0)
end if
Notes and Domino are schema-less. There are no data integrity checks other than what you write yourself. You may think that the item always has to be there, but the truth is that there is nothing that will ever guarantee that, so it is always up to you to write your code so that it doesn't assume anything.
BTW: There are other checks that you might want to perform besides just whether or not the field exists. You might want to check the field's type as well, but to do that requires going one more level up the object chain and using getFirstItem instead of getItemValue, which I'm not going to get into here. And the reason, once again, is that Notes and Domino are schema-less. You might think that a given item must always be a text list, but all it takes is someone writing sloppy code in a one-time fix-it agent and you could end up having a document in which that item is numeric!
Checking your fields is actually a good reason (sometimes) to encapsulate your field access in a function, much like the way you have attempted to do. The reason I added "sometimes" above is that your code's behavior for a missing field isn't necessarily always going to be the same, but for cases where you just want to return a default value when the field doesn't exist you can use something like this:
lsaddress = ImsField("address", "")
Private Function ImsField(fieldName, defaultValue)
    ' Return the first value of the item, or the default if the item is missing.
    If doc.HasItem(fieldName) Then
        ImsField = doc.GetItemValue(fieldName)(0)
    Else
        ImsField = defaultValue
    End If
End Function
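The same guard-then-default idea, including the type check mentioned above, can be sketched outside of Notes. This is only a Python model of a schema-less document (a dict whose values are normally lists); first_value is a hypothetical helper, not a Notes API.

```python
def first_value(doc, field_name, default=""):
    """Return the first value of a multi-value field, or `default` if the
    field is missing, empty, or not a list at all."""
    values = doc.get(field_name)
    if not isinstance(values, (list, tuple)) or not values:
        return default
    return values[0]


good_doc = {"address": ["1 Main St"]}
sloppy_doc = {"address": 42}  # someone stored a number instead of a text list

print(first_value(good_doc, "address"))
print(first_value(good_doc, "phone", default="n/a"))
print(first_value(sloppy_doc, "address", default="n/a"))
```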
A type mismatch occurs when you try to assign a value of one data type to a variable of a different, incompatible data type.
E.g.:
Dim x As String
Dim z As Variant
z = Split("Test:XXX", ":")
x = z
The above will throw the error you mentioned.
So check the code below:
lsaddress = ImsField(doc.address)
What is the datatype of lsaddress?
What is the return type of ImsField(doc.address)?
If the function parameter is a string, then you should pass the parameter as doc.address(0).
