Protobuf parse text - ruby

I have the following protobuf text and I am using google-protobuf to parse it but I'm not sure how to do it.
# HELP Type about service.
# TYPE gauge
metadata_server1{namespace="default",service="nginx"} 1
metadata_server2{namespace="default",service="operator"} 1
metadata_server3{namespace="default",service="someservice"} 1
...
Whenever I try to decode it, I get this error:
/usr/lib/ruby/gems/2.3.0/gems/protobuf-3.8.3/lib/protobuf/decoder.rb:21:in `decode_each_field'
This is how I am trying to decode it:
class Metrics < ::Protobuf::Message
  required :string, :namespace, 1
  required :string, :value, 2
  required :string, :map, 3
end
class Message < ::Protobuf::Message
  repeated Metrics, :metrics, 1
end
data = get_data('http://localhost:8080/')
parsed_data = Metrics.decode(data)
puts parsed_data.metrics # does not work
Does anyone know how I can parse this?

Your data is not a Protobuf. Protobuf is a binary format, not text, so it would not be human-readable like the data you are seeing. Technically, Protobuf has an alternative text representation used for debugging, but your data is not that format either.
Instead, your data appears to be Prometheus text format, which is not a Protobuf format. To parse this, you will need a Prometheus text parser. Usually, only Prometheus itself consumes this format, so not a lot of libraries for parsing it are available (whereas there are lots of libraries for creating it). The format is pretty simple, though, and you could probably parse it with a suitable regex.
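For example, here is a rough sketch of such a regex-based parser (my own example, not a library; it only handles simple name{labels} value lines like the ones shown above and skips the # HELP / # TYPE comments):
LINE_RE = /\A(?<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?<labels>[^}]*)\})?\s+(?<value>\S+)\z/
def parse_prometheus_text(body)
  metrics = []
  body.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")  # skip comments and blank lines
    m = LINE_RE.match(line)
    next unless m
    labels = (m[:labels] || "").scan(/(\w+)="([^"]*)"/).to_h
    metrics << { name: m[:name], labels: labels, value: m[:value].to_f }
  end
  metrics
end
parse_prometheus_text(%(metadata_server1{namespace="default",service="nginx"} 1))
#=> [{:name=>"metadata_server1", :labels=>{"namespace"=>"default", "service"=>"nginx"}, :value=>1.0}]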
Some servers which export Prometheus metrics also support exporting it in an alternative Protobuf-based format. If your server supports that, you can request it by sending the header:
Accept: application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited
If you send that header in the request, you might get the Protobuf-based format back, provided the server supports it. Note that the Protobuf exposition format was deprecated and removed in Prometheus 2, so fewer servers are likely to support it these days.
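For instance, with Net::HTTP the request could look something like this (a sketch using the URL from your question; the server may simply ignore the header and answer with the text format anyway):
require 'net/http'
uri = URI('http://localhost:8080/')
req = Net::HTTP::Get.new(uri)
req['Accept'] = 'application/vnd.google.protobuf; proto=io.prometheus.client.MetricFamily; encoding=delimited'
res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
puts res['Content-Type']  # check which format the server actually returned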
If your server does support this format, note that the result is still not a plain Protobuf. Rather, it is a collection of Protobufs in "delimited" format. Each Protobuf is prefixed by a varint-encoded length ("varint" is Protobuf's variable-width integer encoding). In C++ or Java, there are "parseDelimitedFrom" functions you can use to parse this format, but it looks like Ruby does not have built-in support currently.
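Reading the delimited framing by hand is not too bad, though. Here is a rough sketch (it assumes MetricFamily is a message class you have generated from Prometheus's metrics.proto, which is not shown here):
require 'stringio'
def each_delimited(io)
  until io.eof?
    # decode the varint length prefix
    len = 0
    shift = 0
    loop do
      byte = io.readbyte
      len |= (byte & 0x7f) << shift
      break if (byte & 0x80).zero?
      shift += 7
    end
    yield io.read(len)  # the next `len` bytes are one serialized message
  end
end
# res.body: the raw bytes returned by the request in the sketch above
each_delimited(StringIO.new(res.body)) do |bytes|
  p MetricFamily.decode(bytes)  # MetricFamily: your generated message class
end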

Related

What is the point of google.protobuf.StringValue?

I've recently encountered all sorts of wrappers in Google's protobuf package. I'm struggling to imagine the use case. Can anyone shed some light: what problem were these intended to solve?
Here's one of the documentation links: https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/string-value (it says nothing about what this can be used for).
One thing that will differ in behavior between this and a simple string field is that the wrapper will be written less efficiently (a couple of extra bytes, plus a redundant memory allocation). For the other wrappers the story is even worse, since the repeated variants of those fields will be written inefficiently (the official Google Protobuf serializer doesn't support packed encoding for non-numeric types).
Neither seems to be desirable. So, what's this all about?
There are a few reasons, mostly to do with where these are used; see struct.proto.
StringValue can be null; a plain string often can't be in the languages that interface with protobufs. For example, in Go strings are always set: the "zero value" for a string is "", the empty string, so it's impossible to distinguish between "this value is intentionally set to the empty string" and "there was no value present". StringValue can be null and so solves this problem. This is especially important when these are used to represent arbitrary JSON (as struct.proto's Value does): to do so, you need to distinguish between a JSON key that was set to the empty string (a StringValue holding an empty string) and a JSON key that wasn't set at all (a null StringValue).
Also, if you look at struct.proto, you'll see that these aren't fully fledged message types in the proto; they're all represented by message Value, which has a oneof kind { number_value, string_value, bool_value, ... }. By using a oneof, struct.proto can represent a variety of different values in one field. Again, this makes sense considering what struct.proto is designed to handle (arbitrary JSON): you don't know what type of value a given JSON key has ahead of time.
In addition to George's answer, you can't use a Protobuf primitive as the parameter or return value of a gRPC procedure.

Why do CSV::HeaderConverters stop processing when a non-String is returned?

Why does processing of header converters stop with the first non-String that's returned from a header converter?
Details
After the built-in :symbol header converter is triggered, no other converters will be processed. It seems that processing of header converters stops with the first converter that returns anything that's not a String (i.e. same behavior if you write a custom header converter that returns a Fixnum, or anything else).
This code works as expected, throwing the exception in :throw_an_exception
require 'csv'
CSV::HeaderConverters[:throw_an_exception] = lambda do |header|
  raise 'Exception triggered.'
end
csv_str = "Numbers\n" +
          "1\n" +
          "4\n" +
          "7"
puts CSV.parse(
  csv_str,
  {
    headers: true,
    header_converters: [
      :throw_an_exception,
      :symbol
    ]
  }
)
However, if you switch the order of the header converters so that the :symbol converter comes first, the :throw_an_exception lambda is never called.
...
  header_converters: [
    :symbol,
    :throw_an_exception
  ]
...
So I reached out to JEG2.
I had been thinking of converters as a series of steps in a chain, where every value was supposed to go through every step. In fact, that's not the best way to use the CSV library, especially if you have a very large amount of data.
The way it should be used (and this is the answer to the "why" question, as well as the explanation for why this is better for performance) is to treat the converters as a series of matchers: the first converter that matches returns a non-String, which tells the CSV library that the current value has been converted successfully. That way the parser can stop as soon as it sees a non-String and move on to the next header/cell value.
In this way you remove a TON of overhead when parsing CSV data. The larger the file you're processing, the more overhead you eliminate.
Here is the email response I got back:
...
The converters are basically a pipeline of conversions to try. Let's say you're using two converters, one for dates and one for numbers. Without the linked line, we would try both for every field. However, we know a couple of things:
An unconverted CSV field is a String, because that's how we read it in.
A field that is now a non-String has been converted, so we can stop searching for a converter that matches.
Given that, the optimization helps our example skip checking the number converter if we already have a Date object.
...
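To see that matcher behavior in action with ordinary field converters, here is a toy example (my own, not from the email):
require 'csv'
# Each converter returns a non-String only when it can handle the field;
# otherwise it hands the original String on to the next converter.
int_converter   = ->(field) { Integer(field) rescue field }
float_converter = ->(field) { Float(field)   rescue field }
row = CSV.parse_line("7,3.14,hello", converters: [int_converter, float_converter])
#=> [7, 3.14, "hello"]
# "7" became an Integer at the first converter, so the float converter never
# ran for it; "hello" matched neither converter and stays a String.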
For some unknown reason, the CSV#convert_fields method has a hilarious
break unless field.is_a? String # short-circuit pipeline for speed
line inside its converters.each loop. I doubt I could suggest anything better than monkeypatching this method, but at least the cause is clear now.
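If you'd rather not monkeypatch, one way around the short-circuit is to compose all of your steps into a single header converter, so there is nothing left for the pipeline to skip; a rough sketch:
require 'csv'
chain = lambda do |header|
  steps = [
    ->(h) { h.strip },
    ->(h) { h.downcase },
    ->(h) { h.to_sym }  # returning a non-String is fine here, it's the last step
  ]
  steps.reduce(header) { |value, step| step.call(value) }
end
CSV.parse(" Name \nAlice", headers: true, header_converters: [chain]).headers
#=> [:name]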

Primitive type as data structure for API Blueprint

I want to use a primitive type to describe a data structure, like so:
# Data Structures
## Video Delete (enum[number])
+ `0` - Successful deletion.
+ `1` - Error occurred.
And the output is:
{
  "enum": [
    1,
    0
  ],
  "$schema": "http://json-schema.org/draft-04/schema#"
}
So the description is missing. I've tried putting the description in different places, and I did a lot of other things (do not wanna talk about them). I've also tried adding info to the enum values like so:
+ `0` (number) - Successful deletion.
I don't know whether this problem comes from the MSON syntax or from the Aglio generator.
The syntax above is supported by MSON as far as I can tell. The problem is that Aglio doesn't do anything with the description, and when I went to look into adding it I realized that it isn't really supported in JSON Schema. There seem to be two methods people use to get around that fact:
The first is to add the enumerated value descriptions to the main description; the Olio theme 1.6.2 has support for this, but the C++ parser seems to still have some bugs around this feature:
## Video Delete (enum[number]) - 0 for success, 1 for error
The second is to use a weird oneOf syntax where you create sets of single enums, each with a description, roughly like the sketch below. I don't recommend this.
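Hand-written to illustrate the shape (not actual Aglio output), that workaround would look something like this for the example above:
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "oneOf": [
    { "enum": [0], "description": "Successful deletion." },
    { "enum": [1], "description": "Error occurred." }
  ]
}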
Unfortunately the first option requires work on your part and can't easily be done in Aglio. Does anyone else have a better description and some samples of MSON input -> JSON Schema output?

Compressing large string in ruby

I have a web application (Ruby on Rails) that sends some YAML as the value of a hidden input field.
Now I want to reduce the size of the text that is sent across to the browser. What is the most efficient form of lossless compression that would send across minimal data? I'm OK with incurring the additional cost of compressing and decompressing on the server side.
You could use the zlib implementation that ships with Ruby to deflate/inflate the data:
require "zlib"
data = "some long yaml string" * 100
compressed_data = Zlib::Deflate.deflate(data)
#=> "x\x9C+\xCE\xCFMU\xC8\xC9\xCFKW\xA8L\xCC\xCDQ(.)\xCA\xCCK/\x1E\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15\x1C\x15D\x15\x04\x00\xB3G%\xA6"
You should base64-encode the compressed data to make it printable:
require 'base64'
encoded_data = Base64.encode64 compressed_data
#=> "eJwrzs9NVcjJz0tXqEzMzVEoLinKzEsvHhUcFRwVHBUcFRwVHBUcFUQVBACz\nRyWm\n"
Later, on the client side, you might use pako (a zlib port to JavaScript) to get your data back. This answer probably helps you with implementing the JS part.
To give you an idea on how effective this is, here are the sizes of the example strings:
data.size # 2100
compressed_data.size # 48
encoded_data.size # 66
The same thing works in reverse when compressing on the client and inflating on the server:
Zlib::Inflate.inflate(Base64.decode64(encoded_data))
#=> "some long yaml stringsome long yaml str ... (shortened, as the string is long :)
Disclaimer:
The Ruby zlib implementation should be compatible with the pako implementation, but I have not tried it.
The string-size numbers above are a bit of a cheat: Zlib is extremely effective here because the string repeats a lot; real-life data usually doesn't repeat that much.
If you are working on a Rails application, you can also use the ActiveSupport::Gzip wrapper that allows compression/decompression of strings with gzip.
compressed_log = ActiveSupport::Gzip.compress('large string')
=> "\x1F\x8B\b\x00yq5c\x00\x03..."
original_log = ActiveSupport::Gzip.decompress(compressed_log)
=> "large string"
Behind the scenes, the compress method uses the Zlib::GzipWriter class, which writes gzip-compressed data. Similarly, the decompress method uses the Zlib::GzipReader class, which reads gzip-compressed data.
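Roughly, it boils down to something like this (a simplified sketch, not the actual Rails source):
require 'zlib'
require 'stringio'
def gzip(string)
  io = StringIO.new(''.b)          # in-memory buffer instead of a file
  gz = Zlib::GzipWriter.new(io)
  gz.write(string)
  gz.close                         # finishes the gzip stream
  io.string
end
def gunzip(data)
  Zlib::GzipReader.new(StringIO.new(data)).read
end
gunzip(gzip('large string')) #=> "large string"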

BSON to Messagepack

The problem I am facing is that BSON comes with ObjectId and Timestamp, which are not supported in MessagePack, and it isn't possible to define a custom serializer for MessagePack (at least as far as I know).
I wrote a piece of Python code to compare pymongo's BSON against msgpack. Without much optimization I could achieve a 300% performance improvement.
So, is there any way to convert BSON to MessagePack?
Here is how I solved the problem.
Unfortunately, since MongoDB's non-REST API doesn't come with a Strict or JS mode for document retrieval (as opposed to its REST API, where you can specify the format you want documents returned in), we are left with no option but to do the conversion manually.
import json
from bson import json_util
from pymongo import Connection
import msgpack

con = Connection()
db = con.test
col = db.collection
d = col.find().limit(1)[0]
s = json.dumps(d, default=json_util.default)  # s is now in a JSON-compatible format (ObjectId => '$oid')
packer = msgpack.Packer()
packer.pack(s)  # MessagePack can serialize it now that the format is JSON-compatible
The nice observation is that even with the extra json.dumps step, the MessagePack serializer is still faster than BSON's encode, though not 3 times faster. For 10,000 repetitions, the difference is about three tenths of a second.
