Is schema backward & forward compatibility better than optional fields? [protocol-buffers]

I’m using JSON Schema and I wonder why technologies like Confluent Schema Registry are needed, since their “killer feature” (other than version control, which I can achieve in Git) is backward & forward compatibility.
I’d like to know why this feature is worth anything when I can easily add/remove fields anywhere in a microservices architecture as long as they’re optional (the same applies to Protobuf). If that’s the case, what is the use case for backward & forward compatibility?
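To make the question concrete, here is a sketch (Python with the third-party jsonschema package; the schemas and field names are made up) of the breakage a registry's compatibility check is supposed to catch:

```python
# Sketch: what a registry's BACKWARD compatibility check catches.
# Requires the third-party "jsonschema" package; schemas are made up.
from jsonschema import ValidationError, validate

# v1: consumers were written against this; "user_id" is required.
schema_v1 = {
    "type": "object",
    "properties": {"user_id": {"type": "string"}, "email": {"type": "string"}},
    "required": ["user_id"],
}

# v2 producer drops "user_id" -- a legal schema change on its own, but not
# backward compatible with consumers still validating against v1.
message_from_v2_producer = {"email": "alice@example.com"}

try:
    validate(instance=message_from_v2_producer, schema=schema_v1)
except ValidationError as err:
    # Without a registry, an old consumer discovers the break at runtime,
    # message by message. A registry rejects the v2 schema at registration
    # time instead, before anything incompatible is ever produced.
    print("old consumer broke:", err.message)
```

Put differently, "keep every field optional" is itself a compatibility rule; the question is whether you enforce it centrally at registration time or rely on every team remembering it.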

Related

How to define compatible logs

I have the chance to influence the log format of a logging solution we are about to set up for an existing backend system. It is not OpenTelemetry-based and may never be, but at the moment I can still make suggestions and would like to make sure the logs are written in a compatible format. Is there some kind of overview or definition I can use as a base? Some kind of list of mandatory fields that need to be filled?
I see you found the data model (https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/logs/data-model.md) in the specification - keep in mind, logging support for OpenTelemetry is currently not stable and so this may change. Generally, I suspect that if you use something like the Elastic Common Schema (https://www.elastic.co/guide/en/ecs/master/ecs-log.html) then you should be broadly compatible going forward.
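As a concrete illustration, a single ECS-style log event rendered as JSON could look like this (the field names follow the ECS docs linked above; the service and message are made up):

```python
# Illustration: one log event rendered as an ECS-style JSON line.
# Field names follow the Elastic Common Schema; service/message are made up.
import json
from datetime import datetime, timezone

def ecs_log_line(level: str, message: str, service: str) -> str:
    """Render a single log event in ECS layout, one JSON object per line."""
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "log": {"level": level},
        "message": message,
        "service": {"name": service},
    }
    return json.dumps(event)

print(ecs_log_line("info", "user login succeeded", "auth-backend"))
```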

Heroku and Elasticsearch - which add-on to use?

I plan to use Elasticsearch on Heroku.
I was looking for the best Elasticsearch add-on to use.
Found was my first choice, for the following reasons:
It is now part of Elastic.
When using Elasticsearch on Heroku the cluster is open to the world, so a secure wrapper for the transport client was introduced: https://github.com/foundit/elasticsearch-transport-module/
But it looks like this repository is not actively maintained, and Elasticsearch 1.5 is the latest supported version.
What is the recommended add-on then?
If I want to use the latest version of Elasticsearch, am I doomed to use an insecure connection?
Maybe use the official java client?
Nick with Bonsai here. Based on your question, and my own obvious bias, I'll suggest Bonsai for the following reasons:
All of our clusters have SSL with basic auth to secure the connection. We feel pretty strongly that security comes as a standard feature.
We were the first hosted Elasticsearch provider, ever. (And one of the first add-on providers on Heroku, ever, with our first search add-on, Websolr.) So we've got plenty of experience hosting search, and thousands of happy Heroku customers.
One definite tradeoff with using Bonsai is that we're generally always going to lag a bit behind the latest version of ES. As of this posting we're still running ES 1.7, but updates to ES 2.2 are just around the corner.
This is probably going to be true in the future as well. Part of the reason is that we're a small, bootstrapped company, and we have to be pragmatic about where we focus our engineering efforts. Plus, as an operations company serving thousands of businesses, we like to let major new upgrades spend a few months in the wild before we commit to supporting them.
We also work hard on providing managed upgrades, at least for versions that are sufficiently backwards compatible. Everyone has their tools for helping to manage upgrades, but I don't think any of the other providers do actual in-place upgrades.
Unless you have a hard requirement for a specific feature in 2.x (and if you do, please let me know) you may do fine on 1.7 until our 2.x support is fully baked. Drop us a line at info#bonsai.io to get whitelisted for the first release of that in the coming weeks.
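For reference, connecting securely from application code is straightforward; a rough sketch with the official Python client (the config-var name and URL are examples of how add-ons typically expose credentials, not an official API of ours):

```python
# Illustration: SSL + basic auth from the official Python client.
# The config var name and URL are examples of how add-ons expose credentials.
import os
from elasticsearch import Elasticsearch

# e.g. https://user:secret@my-cluster.example.bonsai.io (set by the add-on)
url = os.environ["BONSAI_URL"]

es = Elasticsearch([url])  # https with the basic-auth credentials embedded
print(es.info())           # sanity check: cluster name and version
```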

ZeroMQ vs Crossroads I/O

I am looking into using ZeroMQ as the messaging/transport layer for a fairly large distributed system, mainly targeting monitoring and data collection (many producers, a few consumers).
As far as I can see there are currently two different implementations of the same concept; ZeroMQ and Crossroads I/O, the latter being a fork of ZeroMQ (in 2012?).
I am trying to figure out which one to use and wonder about the differences between them, but have so far not found much information regarding this.
For example:
Are they compatible on the wire?
Are they API compatible, i.e. some kind of common base API, possibly with different add-ons?
Do they both implement support for ZMTP (ZeroMQ Message Transport Protocol)?
Do they share some kind of common understanding of future development or will they continue in two separate and possible different directions?
What are the pros/cons in relation to the other?
Basically, how does one choose one over the other?
Crossroads.io is pretty dead since Martin Sustrik started on a new stack, in C, called nano: https://github.com/250bpm/nanomsg
Crossroads.io does not, afaik, implement ZMTP/1.0 or ZMTP/2.0 but its own version of the protocol.
Nano has pluggable transports and we'll probably make a ZMTP transport for it. Nano is really nice, a rethinking of the original libzmq library, and if it's successful it would make a good new kernel.
Ideally, Nano would interoperate both at the API and the protocol level, so be a pluggable replacement for libzmq. It does have quite a long way to go, though.
Note that there are now several rewrites of libzmq emerging, including JeroMQ (Java) and NetMQ (C#). These two do implement ZMTP/1.0 and ZMTP/2.0 properly. There are also other libraries like Axon (https://github.com/visionmedia/axon) which are heavily inspired by 0MQ but not compatible.
Based on experience, users value interoperability more than almost anything else, so it's quite likely that different 0MQ-like stacks will end up speaking the same protocols.
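As a point of reference for the "many producers, a few consumers" topology in the question, a minimal pyzmq sketch over PUSH/PULL sockets (the address and payload format are made up):

```python
# Illustration: "many producers, a few consumers" over PUSH/PULL sockets.
# Requires pyzmq; the address and payload format are made up.
import zmq

def consumer(bind_addr="tcp://*:5557"):
    """Collector: binds once and fair-queues messages from all producers."""
    sock = zmq.Context.instance().socket(zmq.PULL)
    sock.bind(bind_addr)
    while True:
        print("got:", sock.recv_string())

def producer(connect_addr="tcp://localhost:5557"):
    """One of many monitoring agents: connects and pushes a measurement."""
    sock = zmq.Context.instance().socket(zmq.PUSH)
    sock.connect(connect_addr)
    sock.send_string("host42 cpu=0.73")
```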

Do any open-source standalone restful image servers exist?

I'm planning to develop a standalone RESTful image server with the following functionality, but first would like to know if something similar already exists in the open-source world (language not important):
RESTful (CRUD) on the master image, e.g.: /GET/asd983249as
possibly bulk gets / LIST
support for metadata (Creative Commons info, dimensions, etc.) that directly relates to the image (references from the domain to these images are NOT included)
RESTful lazy GET of different 'renditions' of an image, i.e. if a rendition doesn't exist, it is created upon request (sketched below). Obviously the original image needs to exist. Different operations are allowed (resize and crop to begin with)
e.g.: /GET/asd983249as/100x100 (simple resize)
allowed dimensions are configurable, so as not to get DoS'ed (not as quickly, anyway)
Non functional:
Reasonably performant / scalable / HA (yeah, I know this doesn't say anything really)
Possibly in-mem caching
I'm thinking about going the Mongo GridFS route, getting MongoDB sharding and replication almost for free. Putting nginx in front, perhaps (in part) directly using nginx-gridfs (see below), should allow for the REST side and, with some config, some simple caching if GridFS can't handle that by itself (I don't know whether it can).
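For the lazy-rendition part, something like this is roughly what I have in mind (Flask + Pillow purely as an illustration; any stack would do):

```python
# Rough sketch of the lazy-rendition endpoint (Flask + Pillow purely as an
# illustration). Renditions are created on first request and cached to disk;
# only whitelisted sizes are served, per the DoS concern above.
import os
from flask import Flask, abort, send_file
from PIL import Image

app = Flask(__name__)
ALLOWED_SIZES = {(100, 100), (640, 480)}     # configurable whitelist
os.makedirs("renditions", exist_ok=True)

@app.route("/<image_id>/<int:w>x<int:h>")
def rendition(image_id, w, h):
    if (w, h) not in ALLOWED_SIZES:
        abort(400)                           # refuse arbitrary dimensions
    original = f"originals/{image_id}.jpg"
    if not os.path.exists(original):
        abort(404)                           # the master image must exist
    cached = f"renditions/{image_id}_{w}x{h}.jpg"
    if not os.path.exists(cached):           # lazy: create on first request
        img = Image.open(original)
        img.thumbnail((w, h))                # fit within the requested box
        img.save(cached)
    return send_file(cached, mimetype="image/jpeg")
```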
Sources:
nginx-gridfs
http://www.coffeepowered.net/2010/02/17/serving-files-out-of-gridfs/
Idea of lazy-gets (and a simple implementation of what I'm looking for, although it seemed more hobbyish than an actively maintained project)
http://sumitbirla.com/2011/11/how-to-build-a-scalable-caching-resizing-image-server/
other stuff that comes close, but isn't an end solution
https://github.com/adamdbradley/foresight.js/wiki/Server-Resizing-Images
Anything that already does this?
I would recommend this project:
https://github.com/imbo/imbo
It's easy to use, stable, and used in big projects.
But I am still curious about alternatives.
I was looking for options for a project, and I found those two below. They are not a perfect match to your requirements but seem quite mature. I have no experience with them yet, though.
https://imageresizing.net/ The Essential edition is open source; the more advanced editions are not.
http://thumborize.me/ (with associated github) has many interesting features like face detection, new codecs, smart cropping.

What is a good choice for full-text indexing when developing an OS X application?

Hi,
I'm implementing an IMAP client as a Mac OS X application using MacRuby.
For the sake of offline availability, I wanted to allow full-text indexing and attribute-based indexing of all messages. Attributes include common e-mail fields like from:, to:, etc.
This would allow for advanced results sprinkled with faceting, analytic calculations and such.
Now I'm unsure about the choices and good practices when it comes to integrating such a search feature. I have a strong web development background, so my intuitive move would be to set up a Solr server and start feeding it data. This might work in theory, as I could write an agent that manages the Solr instance for my application in the background. But to me, this approach seems like an infrastructure hassle.
On the other side, I've read about people using the FTS3 functionality in SQLite. This approach is easily accessible from Core Data. I haven't used SQLite's FTS3, but I don't think it is as powerful as Solr can be.
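For illustration, the FTS route boils down to something like this (shown in Python for brevity; from MacRuby you'd issue the same SQL through a SQLite wrapper, and it assumes a SQLite build with the FTS module compiled in):

```python
# Illustration: SQLite full-text search (Python's sqlite3 for brevity).
# Assumes an SQLite build with the FTS module compiled in; data is made up.
import sqlite3

db = sqlite3.connect(":memory:")
# FTS virtual table: every column is full-text indexed.
db.execute("CREATE VIRTUAL TABLE messages USING fts4(sender, subject, body)")
db.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [("alice@example.com", "Quarterly report", "Numbers look good"),
     ("bob@example.com", "Lunch", "Sushi on Friday?")],
)

# MATCH is the tokenized full-text query; plain column filters cover the
# attribute-style lookups (from:, to:, ...) mentioned above.
for row in db.execute(
        "SELECT sender, subject FROM messages WHERE messages MATCH 'report'"):
    print(row)
```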
What is your weapon of choice for a use case like mine?
I'm mainly interested in solutions that are actually in use by Objective-C/Cocoa/MacRuby developers.
If you're going to develop the app in Ruby, give picky a try. It is very simple to use.
There is an Objective-C port of Lucene:
http://svn.gna.org/viewcvs/etoile/trunk/Etoile/Frameworks/LuceneKit/
I have not used it, but in your situation I'd at least check it out. In my experience, SQL-based full-text search can't compete with Lucene, but I haven't tried SQLite for this.
EDIT: just noticed the ruby tag -- this started out as a port of Lucene:
https://github.com/dbalmain/ferret
