ElasticSearch as EventStore

I would like to ask for your opinion on the pros and cons of using ElasticSearch as an event store.
I would also like to hear whether anybody has experience using ElasticSearch as an event store, what the results were, how reliable it was, and whether there were any issues.

eta: Reminded of this post by a no-explanation-down-voter... The "Kafka is not for Event Sourcing" post is pretty useful when thinking through this sort of question.
Elasticsearch is simply not designed to be an authoritative persistent store for actual app state.
Neither is Redis.
Neither is Kafka.
All three may potentially be useful in the context of an app which does employ an event store.
May I suggest reading a book like NoSQL Distilled to get an idea of the selection criteria in this space, so you'll know how to select something appropriate.
Also, for a question like this to be more answerable (in general, questions like this are considered subjective and get closed), it's important to supply context: what sort of data you're going to maintain, what sort of access patterns, what sort of scale, and/or any other constraints. A laundry list of contextless pros/cons is not going to help much.

Related

Event model or schema in event store

Events in an event store (event sourcing) are most often persisted in a serialized format, with versions to represent a change in the model or schema for an event type. I haven't been able to find good documentation showing the actual model or schema for an actual event (often a data table in the event store schema if using an RDBMS), but I understand that ideally it should be generic.
What are the most basic fields/properties that should exist in an event?
I've contemplated using json-api as a specification for my events but perhaps that's too "heavy". The benefits I see are flexibility and maturity.
Am I heading down the "wrong path"?
Any well defined examples would be greatly appreciated.
I've contemplated using json-api as a specification for my events but perhaps that's too "heavy". The benefits I see are flexibility and maturity.
Am I heading down the "wrong path"?
Don't overlook forward and backward compatibility.
You should plan to review Greg Young's book on event versioning; it doesn't directly answer your question, but it does cover a lot about the basics of interpreting an event.
Short answer: pretty much everything is optional, because you need to be able to change it later.
You should also review Hohpe's Enterprise Integration Patterns, in particular his work on messaging, which details a lot of cases you may care about.
de Graauw's Nobody Needs Reliable Messaging helped me to understand an important point.
To summarize: if reliability is important on the business level, do it on the business level.
So while there are some interesting bits of meta data tracking that you may want to do, the domain model is really only going to look at the data; and that is going to tend to be specific to your domain.
You also have the fun that the representation of events that you use in the service that produces them may not match the representation that it shares with other services, and in particular may not be the same message that gets broadcast.
I worked through an exercise trying to figure out the minimum amount of information a subscriber needs in order to look at an event and decide whether it cares. My answers were: an id (have I seen this specific event before?), a token that tells you the semantic meaning of the message (is this something I care about?), and a location (URI) to get a richer representation if it is something I care about.
But outside of the domain -- for example, when you are looking at the system as a whole trying to figure out what is going on, having correlation identifiers and causation identifiers, time stamps, signatures of the source location, and so on stored in a consistent location in the meta data can be a big help.
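To make that concrete, here is a minimal sketch in Python of the kind of event envelope described above. The class and field names (EventEnvelope, event_id, and so on) are illustrative assumptions, not a prescribed schema, and per the advice above almost all of the metadata should be treated as optional.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional
import uuid

@dataclass
class EventEnvelope:
    # Minimal sketch; every field name here is illustrative, not a prescribed schema.
    event_id: str                         # "have I seen this specific event before?"
    event_type: str                       # semantic meaning, e.g. "order.shipped.v2"
    occurred_at: str                      # timestamp; mainly useful outside the domain model
    correlation_id: Optional[str] = None  # ties related messages together across services
    causation_id: Optional[str] = None    # id of the message that caused this one
    payload: Dict[str, Any] = field(default_factory=dict)  # domain-specific data

def new_event(event_type: str, payload: Dict[str, Any],
              correlation_id: Optional[str] = None) -> EventEnvelope:
    return EventEnvelope(
        event_id=str(uuid.uuid4()),
        event_type=event_type,
        occurred_at=datetime.now(timezone.utc).isoformat(),
        correlation_id=correlation_id,
        payload=payload,
    )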
Just modelling with basic types that map to JSON, written as you would for an API, can go a long way.
You can spend a lot of time generating overly complex models if you throw too much tooling at it - things like Apache Thrift and/or Protocol Buffers (or derived things) will provide all sorts of IDL mechanisms for you to generate incidental complexity with.
In .NET land, and on many other platforms, if you namespace the types you can do various projections from the types.
Personally, I've used records and DUs in F# as a design and representation tool:
you get IntelliSense, syntax highlighting, and types you can use from F# or C# for free;
if someone wants to look, types.fs has all they need.
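The answer above is about F# records and discriminated unions; as a rough Python analogue (the cart events below are invented purely for illustration), the same idea of a closed set of typed events might look like this:

from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class CartCreated:
    cart_id: str

@dataclass(frozen=True)
class ItemAdded:
    cart_id: str
    sku: str
    quantity: int

# The "DU": the closed set of event types one stream can contain.
CartEvent = Union[CartCreated, ItemAdded]

def apply(state: dict, event: CartEvent) -> dict:
    # Exhaustive handling by type; in F# the compiler checks this for you.
    if isinstance(event, CartCreated):
        return {"cart_id": event.cart_id, "items": {}}
    if isinstance(event, ItemAdded):
        items = dict(state["items"])
        items[event.sku] = items.get(event.sku, 0) + event.quantity
        return {**state, "items": items}
    raise TypeError("unhandled event type: %r" % type(event))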

Use Cases of NiFi

I have a question about NiFi and its capabilities, as well as the appropriate use case for it.
I've read that NiFi is really aiming to create a space which allows for flow-based processing. After playing around with NiFi a bit, what I've also come to realize is its capability to model/shape the data in a way that is useful for me. Is it fair to say that NiFi can also be used for data modeling?
Thanks!
Data modeling is a bit of an overloaded term, but in the context of your desire to model/shape the data in a way that is useful for you, it sounds like it could be a viable approach. The rest of this is under that assumption.
While NiFi employs dataflow, through principles and a design closely related to flow-based programming (FBP), as a means, its function is a matter of getting data from point A to B (and possibly back again). Of course, systems don't inherently talk the same protocols, formats, or schemas, so there needs to be something to shape the data into what the consumer is anticipating from what the producer is supplying. This gets into common enterprise integration patterns (EIP) [1] such as mediation and routing. In a broader sense, though, it is simply getting the data to those that need it (systems, users, etc.) when and how they need it.
Joe Witt, one of the creators of NiFi, gave a great talk at a Meetup that may be in line with this idea of data shaping in the context of Data Science. The slides are available [2].
If you have any additional questions, I would point you to check out the community mailing lists [3] and ask any additional questions so you can dig in more and get a broader perspective.
[1] https://en.wikipedia.org/wiki/Enterprise_Integration_Patterns
[2] http://files.meetup.com/6195792/ApacheNiFi-MD_DataScience_MeetupApr2016.pdf
[3] http://nifi.apache.org/mailing_lists.html
Data modeling might well mean many things to many folks, so I'll be careful using that term here. What I do think is very clear in what you're asking is that Apache NiFi is a great system to use to help mold the data into the right format, schema, and content you need for your follow-on analytics and processing. NiFi has an extensible model, so you can add processors that can do this, or in many cases you can use the existing processors; you can even use the ExecuteScript processors to write scripts on the fly to manipulate the data.
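To give a flavour of that ExecuteScript route, here is a minimal sketch of a script body (Python/Jython engine) for NiFi's ExecuteScript processor; the attribute name is made up, and the exact session/relationship bindings should be checked against the NiFi version you are running.

# Minimal ExecuteScript sketch (Python/Jython engine); 'session' and REL_SUCCESS
# are provided by NiFi as script bindings. The attribute name is illustrative.
flowFile = session.get()
if flowFile is not None:
    # Tag the flow file; real flows would typically transform content or route on attributes.
    flowFile = session.putAttribute(flowFile, "shaped.by", "ExecuteScript")
    session.transfer(flowFile, REL_SUCCESS)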

What makes a web-based framework perform well?

Thanks in advance.
Somewhere else I asked the same question but about scalability. Please let's not confuse the two terms. I'd really like to understand what particular parameters I should look at when assessing whether a framework performs acceptably.
I mean parameters such as not over-querying the database for simple tasks just because the ORM above it is not very well written, etc... Please let's try to answer this question without diving into a particular technology.
Lastly, let's assume that the underlying hardware is anything that allows you to perform well, that is to say, there is no hardware bottleneck. I don't want software that uses a cannon to kill mosquitoes :).
Thanks again.
I mean parameters such as not over-querying the database for simple tasks just because the ORM above it is not very well written, etc... Please let's try to answer this question without diving into a particular technology.
Successful ORM technologies are not under-performant. But if your web application is extremely simple (you don't define the scope of your question), then introducing a robust ORM would be overkill. Do a DAO pattern yourself instead.
Finally, you can always do test measurements yourself to evaluate whether the performance suits you.
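A minimal sketch of the hand-rolled DAO idea suggested above, using Python's standard sqlite3 module; the table and method names are placeholders.

import sqlite3

class UserDao:
    # Tiny hand-rolled data-access object: each method issues exactly the query it needs,
    # instead of going through a general-purpose ORM layer.
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def find_by_id(self, user_id: int):
        return self.conn.execute(
            "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()  # None if not found

    def count_active(self) -> int:
        # A simple aggregate stays in SQL instead of loading every row into memory.
        (count,) = self.conn.execute(
            "SELECT COUNT(*) FROM users WHERE active = 1"
        ).fetchone()
        return count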

What is the name of this anti-pattern?

Surely some of you have dealt with this one. It tends to happen when programmers get a bit too taken with OO and forget about performance and having a database.
For example, let's say we have an Email table, and the emails need to be sent by this program. At start-up, it looks for anything that needs to be sent, as follows:
Emails = find_every_damn_email_in_the_database()
for Email in Emails:
    if not Email.IsSent():
        Email.Send()
This is good from a do-not-repeat-yourself perspective, but sometimes repeating yourself is unavoidable, and it should be:
Emails = find_unsent_emails()
for Email in Emails:
    Email.Send()
Is there a name of this one?
I'll have a go at it and coin the name "the lazy filter (anti) pattern".
I saw that once. That programmer wasn't around too long.
We called that the "firehose method".
To me it's Joel Spolsky's leaky abstraction.
It's not exactly an anti-pattern, but whoever wrote this code didn't really understand where the Active Record pattern's abstraction leaks.
I call that "The Shotgun Approach".
I'm not sure this is necessarily database related, since you could have a complex and expensive procedure (e.g., more than a flag) for applying a filter for a group.
I don't think there's a name for it, since the first design is simply not good, and it violates the one-responsibility-only principle. If you search, filter, and print the filtered results, you are doing multiple things, so you need to refactor it into "search for the filtered results" and then print them.
The only thing different from a simple refactoring here is that it also affects performance, in the same way that inner loops can be designed in ways that harm performance.
It appears to have derived from the following anti-patterns:
Standing On The Shoulders Of Midgets
If It Is Working Don't Change
The original developer would have possibly not been allowed to write the find_unsent_emails() implementation, and would therefore have reused the midget function. And then, why change it after development and testing?
This is frequently due to it being a lot easier to use an existing query and then filtering in code than getting a new SQL query added. Maybe because the DBAs control all queries and getting a new query approved takes days, or maybe because the ORM tool you're using makes it very difficult to define your own custom queries.
If I were to name it I'd call it the "Easy Way Out" (anti)pattern. Whether it's an antipattern or not really depends on the individual situation. If it will always be a fairly small number of items you need to retrieve, doing the filtering in code really isn't a big problem. But if the number of items is large and has the potential to continually grow, then obviously the filtering should be done on the server.
I've seen similar issues elsewhere, where instead of a simple array of things to do, there was a "transaction cluster" based on a "list cluster" based on a "collection cluster" based on a "memory cluster". Needless to say, the simplest thing turned into a great big freakin' deal.
I called it galloping generality.
Stoopid Amateurs.
Seriously, I've only seen this one in people with Computer Science degrees and no professional experience at all. When I was teaching at Duke, my advisor and I ran a "Large Scale Programming" class where we made people look at exactly these sorts of errors.
The performance of the first one can actually be fine, depending on the type of Emails. If it's just an iterator (think of std::vector::begin() in C++) then it's fine and better than storing all unsent e-mails in some container first.
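In Python terms, the distinction that answer is drawing is roughly the one between a function that materializes every row in memory and one that yields rows lazily; the function and table names below are assumptions for illustration.

import sqlite3

def find_every_email_in_memory(conn: sqlite3.Connection) -> list:
    # Materializes the whole table in application memory before any filtering happens.
    return list(conn.execute("SELECT id, sent FROM emails"))

def find_every_email_lazily(conn: sqlite3.Connection):
    # A generator: rows are pulled one at a time, so nothing beyond the current row is stored.
    for row in conn.execute("SELECT id, sent FROM emails"):
        yield row

Even in the lazy case every row still crosses the database boundary, which is why the other answers still prefer pushing the filter into the query itself.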
This antipattern has several possible names.
"Don't-know-SQL" antipattern
"Fascist-DBA" antipattern
"What-does-'latency'-mean?" antipattern
There is a nice example at The Daily WTF.
Inspired partly by 1800's "the lazy filter (anti) pattern", how about "dysfunctional programming" (ie the opposite of functional programming)?

What logging implementation do you prefer?

I'm about to implement a logging class in C++ and am trying to decide how to do it. I'm curious to know what kind of different logging implementations there are out there.
For example, I've used logging with "levels" in Python, where you filter out log events that are lower than a certain threshold. It also includes logging "names", where you can filter out events via a hierarchy; for example, "app.apples.*" will not be displayed but "app.bananas.*" will be.
I've had thoughts about using "tags", but I'm unsure of the implementation. I've seen games use "bits" for compactness.
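For reference, a small sketch of the level-plus-hierarchical-names scheme described a couple of paragraphs up, using Python's standard logging module; the app.apples / app.bananas names mirror the example above.

import logging

logging.basicConfig(level=logging.INFO)

# Per-subtree filtering via hierarchical logger names.
logging.getLogger("app.apples").setLevel(logging.CRITICAL + 1)  # effectively silenced
logging.getLogger("app.bananas").setLevel(logging.DEBUG)        # shown

logging.getLogger("app.apples.core").warning("not displayed")   # filtered out by the parent's level
logging.getLogger("app.bananas.peel").info("displayed")         # passes the threshold and is printed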
So my questions:
What implementations have you created or used before?
What do you think the advantages and disadvantages of them are?
I'd read this post by Jeff Atwood.
It's about the overuse of logging and how to avoid it.
There are lots of links on the Log4J wikipedia page.
One of our applications uses Registry entries to dynamically control logging/tracing during production execution.
For example:
if (Logger.TraceOptionIsEnabled(TraceOption.PLCF_ShowConfig)) { /* ...whatever */ }
When executed at run-time, if the registry value PLCF_ShowConfig is true, the call returns true and whatever is inside the block is executed.
Quite handy.
Jeff Atwood had a pretty interesting blog entry about logging. The ultimate message of it was that logging is generally unnecessary (to some extent).
Logging generally doesn't scale well (too much data on high traffic systems).
I think the best point of it is that you generally don't need it. It's easier to trace through your code by hand to understand what values are being assigned to things than it is to sift through lots of log files.
It's just information overload.
Now the same can't be said for single user applications. For things like media encoding or general OS usage, it can be nice to have a log for small apps because debug info is useful (to me) in this situation. If you're burning a DVD and something goes wrong, looking at log info can be very helpful to troubleshoot with if you understand the log output.
I think having a few levels would help for the user, such as:
No logs
Basic logging for general user feedback
Highly technical data for a developer or tech-support person to interpret
Depending on the situation, it may be useful to store ALL log data and only display to the user the basic info, or perhaps giving the option to see all detailed data.
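As one possible reading of those three levels in code (the mapping below is just an assumption, sketched with Python's standard logging module):

import logging

# One reasonable mapping from the user-facing choices to standard log levels.
VERBOSITY = {
    "none": logging.CRITICAL + 1,  # effectively no logs
    "basic": logging.INFO,         # general user feedback
    "technical": logging.DEBUG,    # detail for a developer or tech-support person
}

def configure_logging(choice: str) -> None:
    logging.basicConfig(level=VERBOSITY[choice])

configure_logging("basic")
logging.info("burn started")        # shown at "basic" and "technical"
logging.debug("sector 42 written")  # shown only at "technical"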
It all depends on the domain.
