How are "long" type integers handled in Hyperledger Composer transaction processor functions? - hyperledger-composer

When defining a Hyperledger Composer model, fields can be designated to have the type long, which is implemented as an int64.
How are long values passed to transaction processor functions, when int64 is not natively supported by Node.js? Are they converted to a Number? If so, wouldn't that mean they are effectively limited to 53 bits of integer precision?

Looking at "preserve int64 values when parsing json in Go", it would seem that even if the value is stored internally as an int64 and serialized into JSON from Go as such, Node.js would indeed parse it into a regular Number, thereby losing precision.
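To make the precision concern concrete, here is a small, self-contained Go sketch (the Asset struct and its field names are made up for illustration): an int64 above 2^53 survives marshaling on the Go side, but a JavaScript consumer that parses the plain JSON number will round it, whereas encoding it as a string avoids the loss.

package main

import (
	"encoding/json"
	"fmt"
)

// Asset is an illustrative struct; the names are made up for this sketch.
type Asset struct {
	// Serialized as a JSON number: a JavaScript consumer parses this into a
	// float64-backed Number and can silently lose precision above 2^53 - 1.
	Quantity int64 `json:"quantity"`

	// Serialized as a JSON string: the consumer must convert it explicitly
	// (e.g. with BigInt), but no precision is lost in transit.
	QuantityStr int64 `json:"quantityStr,string"`
}

func main() {
	a := Asset{Quantity: 9007199254740993, QuantityStr: 9007199254740993} // 2^53 + 1
	b, err := json.Marshal(a)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// {"quantity":9007199254740993,"quantityStr":"9007199254740993"}
	// JSON.parse of the first field in Node.js yields 9007199254740992.
}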

Related

Is it OK to have hundreds of fields in a protobuf message?

We are developing a set of C++ applications that exchange data through protobuf messages. One of the messages we want to exchange contains a list of type-value pairs. The type is just an integer; the value can be one of a number of different data types, both basic ones like integer or string and more complex ones like IP addresses or prefixes. But for every specific type, only one data type is allowed for the value.
type | value data type
---- | ---------------
1    | string
2    | integer
3    | list<ip_addr>
4    | integer
5    | struct
6    | string
...  | ...
Note: one of the communicating apps will ultimately encode this list of type-value pairs into a byte array in a network packet according to a fixed protocol format.
There are a few ways to encode this into a protobuf message, but we're currently leaning towards creating a separate protobuf message for each type number:
message Type1
{
    string value = 1;
}
message Type2
{
    int32 value = 1;
}
message Type3
{
    repeated IpAddr value = 1;
}
...
message TVPair
{
    oneof type
    {
        Type1 type_1 = 1;
        Type2 type_2 = 2;
        Type3 type_3 = 3;
        ...
    }
}
message Foo
{
    repeated TVPair tv_pairs = 1;
}
This is clear and easy to use for all applications and it hides the details of the network protocol encoding in the only app that actually needs to take care of it.
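For illustration, here is roughly how an application might populate and read such a message from Go once the schema is compiled with protoc-gen-go; the import path and the exact generated identifiers (TVPair_Type_2, TvPairs, and so on) are assumptions based on the usual naming conventions, not taken from a real build.

package main

import (
	"fmt"

	pb "example.com/tvpair/pb" // hypothetical import path for the generated code
)

func main() {
	// The sender picks one member of the oneof per pair.
	pair := &pb.TVPair{
		Type: &pb.TVPair_Type_2{Type_2: &pb.Type2{Value: 42}},
	}
	msg := &pb.Foo{TvPairs: []*pb.TVPair{pair}}

	// The receiver switches on which member of the oneof is actually set.
	for _, p := range msg.GetTvPairs() {
		switch t := p.GetType().(type) {
		case *pb.TVPair_Type_1:
			fmt.Println("string:", t.Type_1.GetValue())
		case *pb.TVPair_Type_2:
			fmt.Println("integer:", t.Type_2.GetValue())
		}
	}
}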
The only worry I have is that the list of type numbers is on the order of a few hundred items. This means a few hundred protobuf messages need to be defined, and the oneof structure in the TVPair message will contain that many members. I know the field numbers in protobuf messages can be a lot higher (~500,000,000), so that's not really an issue. But are there any downsides to having hundreds of fields in a single protobuf message?
The comment from @DazWilkin pointed me towards some best practices on the Protocol Buffers documentation website:
Don’t Make a Message with Lots of Fields
Don’t make a message with “lots” (think: hundreds) of fields. In C++ every field adds roughly 65 bits to the in-memory object size whether it’s populated or not (8 bytes for the pointer and, if the field is declared as optional, another bit in a bitfield that keeps track of whether the field is set). When your proto grows too large, the generated code may not even compile (for example, in Java there is a hard limit on the size of a method).
Large Data Sets
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are a collection of small pieces, where each small piece is structured data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.
Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do while other times you want something more like a database. Each solution should be developed as a separate library, so that only those who need it need pay the costs.
So although it might be technically possible, it is not advised to create big messages with lots of fields.

What are Marshal and Unmarshal in Golang Proto?

I understand that we need to marshal an object to serialise it, and unmarshal it to deserialise it. However, my question is: when do we invoke marshal and unmarshal, and why do we serialise objects if we are going to deserialise them again soon?
PS: I just started learning Go and Proto, so I would really appreciate your help. Thank you.
Good question!
Marshaling (also called serializing) converts a struct to raw bytes. Usually, you do this when you're sending data to something outside your program: you might be writing to a file or sending the struct in an HTTP request body.
Unmarshaling (also called deserializing) converts raw bytes to a struct. You do this when you're accepting data from outside your program: you might be reading from a file or an HTTP response body.
In both situations, the program sending the data has it in memory as a struct. We have to marshal and unmarshal because the recipient program can't just reach into the sender's memory and read the struct. Why?
Often the two programs are on different computers, so they can't access each other's memory directly.
Even for two processes on the same computer, shared memory is complex and usually dangerous. What happens if you're halfway through overwriting the data when the other program reads it? What if your struct includes sensitive data (like a decrypted password)?
Even if you share memory, the two programs need to interpret the data in the same way. Different languages, and even different versions of Go, represent the same struct differently in memory. For example, do you represent integers with the most-significant bit first or last? Depending on their answers, one program might interpret 001 as the integer 1, while the other might interpret the same memory as the integer 4.
Nowadays, we're usually marshaling to and unmarshaling from some standardized format, like JSON, CSV, or YAML. (Back in the day, this wasn't always true - you'd often convert your struct to and from bytes by hand, using whatever logic made sense to you at the time.) Protobuf schemas make it easier to generate marshaling and unmarshaling code that's CPU- and space-efficient (compared to CSV or JSON), strongly typed, and works consistently across programming languages.
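As a minimal illustration of that round trip, here is a self-contained Go sketch using encoding/json (the User struct is made up for the example); with protobuf the shape is the same, except you call proto.Marshal and proto.Unmarshal from google.golang.org/protobuf/proto on a message type generated from your .proto schema.

package main

import (
	"encoding/json"
	"fmt"
)

// User is an illustrative struct for this sketch.
type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

func main() {
	// Marshal: struct -> raw bytes, ready to write to a file, socket, or HTTP body.
	sent := User{Name: "Ada", Age: 36}
	raw, err := json.Marshal(sent)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw)) // {"name":"Ada","age":36}

	// Unmarshal: raw bytes -> struct, on the receiving side.
	var received User
	if err := json.Unmarshal(raw, &received); err != nil {
		panic(err)
	}
	fmt.Println(received.Name, received.Age) // Ada 36
}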

Hadoop own data types

I have been using Hadoop for quite a while now, but I'm not sure why Hadoop uses its own data types and not the Java data types. I have been searching for the same thing over the internet, but nothing helped. Please help.
The short answer is the serialization and deserialization performance that they provide.
Long version:
The primary benefit of using Writables (Hadoop's data types) is in their efficiency. Compared to Java serialization, which would have been an obvious alternative choice, they have a more compact representation. Writables don't store their type in the serialized representation, since at the point of deserialization it is known which type is expected.
Here is a more detailed excerpt from Hadoop: The Definitive Guide:
Java serialization is not compact: classes that implement java.io.Serializable or java.io.Externalizable write their classname and the object representation to the stream. Subsequent instances of the same class write a reference handle to the first occurrence, which occupies only 5 bytes. However, reference handles don't work well with random access, because the referent class may occur at any point in the preceding stream - that is, there is state stored in the stream. Even worse, reference handles play havoc with sorting records in a serialized stream, since the first record of a particular class is distinguished and must be treated as a special case. All these problems can be avoided by not writing the classname to the stream at all, which is the approach Writable takes. The result is that the format is considerably more compact than Java serialization, and random access and sorting work as expected because each record is independent of the others (so there is no stream state).
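This isn't Hadoop code, but a rough Go analogy of the same trade-off (the Record struct is made up for illustration): gob, like Java serialization, writes a self-describing stream that includes type information, while a fixed-layout encoding such as encoding/binary writes only the values, because both sides already know which type to expect - which is the Writable approach.

package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"fmt"
)

// Record is an illustrative fixed-size record.
type Record struct {
	ID    int64
	Score int64
}

func main() {
	r := Record{ID: 1, Score: 42}

	// Self-describing: gob writes a type description alongside the values,
	// much like Java serialization writes the class name.
	var g bytes.Buffer
	if err := gob.NewEncoder(&g).Encode(r); err != nil {
		panic(err)
	}

	// Fixed layout: only the two int64 values are written (16 bytes),
	// because the reader already knows what type it is deserializing.
	var b bytes.Buffer
	if err := binary.Write(&b, binary.BigEndian, r); err != nil {
		panic(err)
	}

	fmt.Println("gob bytes:   ", g.Len()) // noticeably larger: type description + values
	fmt.Println("binary bytes:", b.Len()) // 16
}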

GobEncoder for Passing Anonymous Function via RPC

I'm trying to build a system that will execute a function on multiple machines, passing the function anonymously via RPC to each worker machine (a la MapReduce) to execute on some subset of data. Gob doesn't support encoding functions, though the docs for GobEncoder say that "A type that implements GobEncoder and GobDecoder has complete control over the representation of its data and may therefore contain things such as private fields, channels, and functions, which are not usually transmissible in gob streams" so it seems possible.
Any examples of how this might work? I don't know much about how this encoding/decoding should be done with Gob.
IMHO this won't work. While it is true that if your type implements Gob{En,De}coder it can (de)serialize unexported fields of structs, it is still impossible to (de)serialize code: Go is statically compiled and linked, without runtime code generation capabilities (which would circumvent compile-time type safety).
Short: you cannot serialize functions, only data. Your workers must provide the functions you want to execute. Take a look at net/rpc.
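A minimal sketch of that pattern with net/rpc, assuming the worker binary compiles in the functions it is willing to run and the coordinator merely names one of them; all identifiers here (Worker, Task, Invoke, the registry) are made up for illustration.

package main

import (
	"errors"
	"net"
	"net/rpc"
)

// Task names a registered function and carries the data subset to run it on.
type Task struct {
	Name string
	Data []int
}

type Result struct {
	Values []int
}

// The worker binary compiles in the functions it is willing to execute;
// nothing executable ever travels over the wire, only the name.
var registry = map[string]func(int) int{
	"square": func(x int) int { return x * x },
	"double": func(x int) int { return x + x },
}

type Worker struct{}

// Invoke looks up the named function and applies it to every input value.
func (w *Worker) Invoke(task Task, result *Result) error {
	fn, ok := registry[task.Name]
	if !ok {
		return errors.New("unknown function: " + task.Name)
	}
	for _, v := range task.Data {
		result.Values = append(result.Values, fn(v))
	}
	return nil
}

func main() {
	if err := rpc.Register(new(Worker)); err != nil {
		panic(err)
	}
	ln, err := net.Listen("tcp", ":4000")
	if err != nil {
		panic(err)
	}
	// The coordinator calls client.Call("Worker.Invoke", task, &result).
	rpc.Accept(ln)
}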
You may want to try GoCircuit, which provides a framework that basically lets you do this:
http://www.gocircuit.org/
It works by copying your binary to the remote machine(s), starting it, then doing an RPC that effectively says "execute function X with args A, B, ..."

Is 'handle' synonymous with 'pointer' in WinAPI?

I've been reading some books on Windows programming in C++ lately, and I'm confused by some of the recurring concepts in WinAPI. For example, there are tons of data types that start with the handle prefix 'H'; are these supposed to be used like pointers? But then there are other data types that start with the pointer prefix 'P', so I guess not. Then what is a handle exactly? And why were pointers to some data types given separate data types in the first place? For example, PCHAR could easily have been defined as CHAR*.
Handles used to be pointers in early versions of Windows, but they are not anymore. Think of them as a "cookie": a unique value that allows Windows to look up a resource that was allocated earlier. CreateFile(), for example, returns a new handle; you later use it in SetFilePointer() and ReadFile() to read data from that same file, and CloseHandle() to clean up the internal data structure, closing the file as well. That is the general pattern: one API function to create the resource, one or more to use it, and one to destroy it.
Yes, the types that start with P are pointer types. And yes, they are superfluous; it works just as well if you use the * yourself. I'm not actually sure why C programmers like to declare them; I personally think they reduce code readability and I always avoid them. But do note the compound types, like LPCWSTR, a "long pointer to a constant wide string". The L doesn't mean anything anymore; it dates back to the 16-bit versions of Windows. But pointer, const, and wide are important. I do use that typedef; not doing so risks future portability problems, which is the core reason these typedefs exist.
A handle is the same as a pointer only insofar as both identify a particular item. A pointer is the address of the item, so if you know its structure you can start reading its fields. A handle may or may not be a pointer; even if it is a pointer, you don't know what it points to, so you can't get at the fields.
The best way to think of a handle is as a unique ID for something in the system. When you pass it to the system, the system knows what to cast it to (if it is a pointer) or how to treat it (if it is just some ID or index).
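Not Windows code, but a toy Go sketch of the create/use/destroy pattern described above, with all names made up for illustration: the handle is just an opaque ID that the "system" maps to a private structure, so the caller can never dereference it directly.

package main

import (
	"errors"
	"fmt"
)

// Handle is just an opaque number handed out to callers.
type Handle uint32

// file is the internal structure the "system" keeps; callers never see it.
type file struct {
	name   string
	offset int
}

var (
	table         = map[Handle]*file{}
	nextID Handle = 1
)

// OpenFile creates the resource and returns a handle to it.
func OpenFile(name string) Handle {
	h := nextID
	nextID++
	table[h] = &file{name: name}
	return h
}

// ReadFile uses the handle; only the system can reach the underlying struct.
func ReadFile(h Handle) (string, error) {
	f, ok := table[h]
	if !ok {
		return "", errors.New("invalid handle")
	}
	f.offset++ // internal state the caller never touches
	return f.name, nil
}

// CloseHandle destroys the resource and invalidates the handle.
func CloseHandle(h Handle) {
	delete(table, h)
}

func main() {
	h := OpenFile("example.txt")
	name, _ := ReadFile(h)
	fmt.Println(name, h)
	CloseHandle(h)
}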
