Something here doesn't feel right to me, and so I would like the community's input - perhaps I am approaching this in the wrong way....
Q: Is it appropriate to use traditional infrastructure logging frameworks (like log4net) to log business events?
When I say business events, I mean I want a global log like this:
xx:xx Customer A purchased widget B.
xx:xx Widget B was dispatched from warehouse.
xx:xx Customer B payment declined.
Most traditional infrastructure logging frameworks have event levels something like this:
FATAL
ERROR
WARN
INFO
DEBUG
And of course these messages don't fit well into that. The best description would be INFO, but of course these are important events, and INFO implies very low importance.
I would still like this as a 'log' (i.e. I don't want to have to extract it from my business objects each time I want to see it).
Seems to me I have two options:
1) Use a framework like log4net and just define a special logger for this (and live with the fact that it doesn't feel right).
2) Provide a service for performing this that doesn't rely on a traditional logging service.
I'm leaning towards 2. What has anyone else done in similar situations?
Thanks!
What you're wanting sounds like an auditing service, not a logging service. If I'm right, your goals are to keep track of these business events for historical and maybe even reporting purposes. You can use the details in the audit to, for lack of a better phrase, place blame for events that happen in the system.
I probably wouldn't use a logging system, like log4j, for this purpose. In our system, auditing is a first class citizen as a full service.
--
HTH,
Dusty
Leave the logger for things having to do with the program, not the business. It is just a tool to help the developers.
Write your own system to log business events. If it is a business requirement to have a record, you will want something you have control over and you will need to use the logger above to keep track of how it works.
Basically, #2 in your question.
To me the idea of a Business Event is that it plays a role in some future business processing, anything from actually triggering Business Actions to simply available for analytics.
Hence it has completely different QoS requirements and needs its own API.
Conceivably that initially maps down to logging, but in future it could go to reliable messaging or a DB.
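To illustrate what such a dedicated API might look like, here is a minimal, hypothetical Java sketch (the BusinessEventLogger name and the Slf4jBusinessEventLogger implementation are invented for this example): the calling code depends only on the interface, and the backing implementation can start as a plain logger and later move to messaging or a database.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical application-facing API for recording business events.
public interface BusinessEventLogger {
    void record(String eventType, String subject, String details);
}

// Initial implementation that simply delegates to a logging framework;
// it could later be replaced by a messaging- or database-backed one.
class Slf4jBusinessEventLogger implements BusinessEventLogger {
    private static final Logger BUSINESS = LoggerFactory.getLogger("business.events");

    @Override
    public void record(String eventType, String subject, String details) {
        BUSINESS.info("{} | {} | {}", eventType, subject, details);
    }
}

A purchase from the question above would then be recorded as something like record("PURCHASE", "Customer A", "widget B"), regardless of where the events ultimately end up.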
These sound like the sorts of things that your customers might potentially want to query or report on from within your application - the obvious choice would be the database.
In particular, I feel traditional logging frameworks wouldn't be suitable here because, for data that you might later want to access within your application, logging frameworks let you do things that don't really make sense. For example, you might be able to change where the logging is sent based on the app.config file (which is unhelpful if you then try to read it from a different location).
That said, if a logging framework already allows you to do exactly what you want, then there isn't any shame in just using the logging framework as your implementation and saving yourself the effort:
class TransactionLogger
{
    // Thin wrapper: the rest of the application depends on this class,
    // not directly on the chosen logging framework.
    public void Log(Message message)
    {
        MyLoggingFramework.Log(message.ToString() /* etc... */);
    }
}
In my experience a business event comprises a large or even huge number of technical operations behind the scenes, with only certain events being important to the business.
This creates problems when trying to use a generic logging methodology, so in general, in the systems I've worked on, both are used.
Logging for the technical aspects, and business event logging for the business events.
The business event logging doesn't use the same technology as the technical logging; instead it logs to a custom-designed history/audit table (sometimes these are split, depending on the required detail), which is designed specifically for each application. (This keeps the auditors and users nice and happy.)
This allows easy reporting and management of the information, while obviously expanding the scope of each specification slightly.
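For what it's worth, such a history/audit table is usually just a narrow, append-only record. Here is a minimal, hypothetical JPA sketch in Java (the BusinessEventRecord name and its columns are invented for illustration, assuming a jakarta.persistence stack - older stacks use javax.persistence; a real table would be tailored to what the auditors actually need):

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import java.time.Instant;

// Hypothetical append-only audit record: one row per business event.
@Entity
public class BusinessEventRecord {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private Instant occurredAt; // when the business event happened
    private String eventType;   // e.g. "ORDER_PLACED", "PAYMENT_DECLINED"
    private String actor;       // who triggered it (customer, operator, system)
    private String details;     // free-form description or serialized payload

    protected BusinessEventRecord() {
        // required by JPA
    }

    public BusinessEventRecord(Instant occurredAt, String eventType, String actor, String details) {
        this.occurredAt = occurredAt;
        this.eventType = eventType;
        this.actor = actor;
        this.details = details;
    }
}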
You could use a logging framework for this, but what you really need is business activity monitoring and event processing software. Off the top of my head, IBM WebSphere Business Monitor provides this capability. It processes Common Base Events (an IBM implementation of the Web Services Distributed Management Web Event Format standard) and then takes that data and creates business activity dashboards.
Check out DiALog: A Distributed Model for Capturing Provenance and Auditing Information. Leaving aside the distributed aspect, you can use its subject-predicate-object principle to record the business events and afterwards reconstruct certain trails.
Here is a related post of mine: Audit logging and exception management framework.
I am thinking about the best way to structure microservices. In the past, the team I was working with used Axon Framework and PostgreSQL; each microservice had its own event store in the PostgreSQL database, and we built the communication between them using REST.
I am thinking that it would be smarter to have all microservices talk to the same event store, as we would be able to share events faster instead of rewriting the communication lines using REST.
The questions that follows from the backstory is:
What is the best practice for having an event store?
Would each service have its own? Would they share the same event store?
Where would I find information to inspire and gather more answers? As searching the internet for best practices and how to structure the Event Store seems like searching for a needle in a haystack.
Bear in mind, the question stated is in no way aimed at Axon Framework, but more at the general idea of building scalable and good code, as the applications would each work with their own event store for the write model and read models.
Thank you for reading and I wish you all the best
-- Me
I'd add a slightly different notion to Tore's response, although the mainline is identical to what I'm sharing here. So, I don't aim to overrule Tore, just hoping to provide additional insight.
If the (micro)services belong to the same Bounded Context, then they're allowed to "learn about each other's language."
This language thus includes the events these applications publish and store.
Whenever there's communication required between different Bounded Contexts, you'd separate the stores, as one context shouldn't be bothered by the specifics of another context.
Hence it is beneficial to deduce what services belong to which Bounded Context since that would dictate the required separation.
Axon aims to support this by allowing multiple contexts with the Axon Server, as you can read here.
It simply allows the registration of applications to specific contexts, within which it will completely separate all message streams (so commands, events, and queries) and the Event Store.
You can also set this up from scratch yourself, of course. Tore's recommendation of Kafka is what's used quite broadly for Event Streaming needs between applications. Honestly, any broadcast type of infrastructure suits event distribution, as that's how events are typically propagated.
You want to have one event store per service, just as you would want to have one relational database per service for a non-event-sourced system.
Sharing a database/eventstore between services creates coupling and we have all learned the hard way that this is an anti-pattern today.
If you want to use an event log to share events across services, then Kafka is a popular choice.
It is important to remember that you only do event sourcing within a service's bounded context.
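To make the Kafka suggestion concrete, here is a minimal Java sketch of publishing a domain event to a shared topic (the topic name, key and JSON payload are assumptions for illustration; serialization formats and schema management are a separate concern):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Each service keeps its own event store for its write model;
        // the topic only carries the events other contexts are allowed to consume.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String key = "order-42";                                    // aggregate identifier
            String value = "{\"type\":\"OrderPlaced\",\"orderId\":42}"; // event payload as JSON
            producer.send(new ProducerRecord<>("order-events", key, value));
        }
    }
}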
I have an application which acts as a proxy between different systems, without its own database. There are a few possible use cases covered by the application:
Display data from specific system or systems
Store data to specific system or systems
This application has its own front-end and back-end (a Spring Boot and Angular stack). The back-end is responsible for getting/putting data from/to the external systems, and the front-end communicates only with the back-end and does not know anything about the external systems. Also, the back-end follows hexagonal architecture and has its own defined domain models.
Currently there are requirements to cover auditing for business use cases related to the application. For instance, if a user goes to some feature of the application and makes some changes there, it should be audited.
I've googled this topic but I only found entity-based auditing like this https://docs.spring.io/spring-data/jpa/docs/1.7.0.DATAJPA-580-SNAPSHOT/reference/html/auditing.html. For my case I would need something similar but based on domain models rather than on entities.
Could you please recommend some direction to cover this? Specifically, which library could be used for such a use case, taking the state of a domain model to prepare audit events? I've found something like this https://logging.apache.org/log4j-audit/latest/gettingStarted.html, but I am really not sure if it is the right way to go.
I would say you can build your own auditing strategy based on events.
Let us take the example you gave: "if user goes to some feature related to the application and make some changes there, it should be audited.".
I assume you have a service that handles these requests from a REST API or something similar. That same service would not only communicate with the external systems but would also publish an event with, let's say, the information about the user and the performed changes or updates (here you can rely on Redis for example, but there are other options like RabbitMQ or even Kafka, depending on how reliable you want your auditing feature to be).
Then you would have another component of your app listening for these events so that you can store them in a database (I guess that is the purpose). Or you can even have a separate microservice only for this purpose, depending on how complex this auditing system is meant to be.
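A minimal Spring sketch of that flow, using in-process application events for simplicity (AuditEvent, WidgetFacade and AuditEventListener are hypothetical names; swap the in-process publisher for Redis/RabbitMQ/Kafka if the audit trail has to survive a crash of the proxy application):

import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import java.time.Instant;

// Hypothetical audit event built from the domain model, not from a JPA entity.
record AuditEvent(Instant at, String user, String action, String payload) { }

@Service
class WidgetFacade {
    private final ApplicationEventPublisher events;

    WidgetFacade(ApplicationEventPublisher events) {
        this.events = events;
    }

    public void updateWidget(String user, String widgetJson) {
        // ... call the external system through the outbound port ...
        events.publishEvent(new AuditEvent(Instant.now(), user, "WIDGET_UPDATED", widgetJson));
    }
}

@Component
class AuditEventListener {
    @EventListener
    public void on(AuditEvent event) {
        // Persist to an audit store, forward to a message broker, etc.
        System.out.printf("AUDIT %s %s %s%n", event.at(), event.user(), event.action());
    }
}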
If you want something more "magical" and automated you can try to take a look at Spring Boot Data Audit code to see how it is implemented, but you might end up building an overengineered solution.
I have a requirement to persist some data in a table (single table). The data is coming from the UI. Do I need to write just the System API and persist the data, or do I need to write both a Process API and a System API? I don't see a use for a Process API in this case. Please suggest. Is it always necessary to access a System API through a Process API, or can a System API be invoked without a Process API as well?
I would recommend a fine-grained approach to this. We should still go through the Experience layer even though we do not have much customization to the data.
In short: an Experience layer API calling the System layer API directly (if there is no orchestration/data conversion/formatting needed).
Why do we need a System API and an Experience API? A couple of points.
Firstly, the System API should be closely tied to the underlying system, so that if that system changes in the future, the change does not impact any of the clients.
Secondly, having an upper layer gives us the flexibility to apply different SLAs, policies, logging and lots more to different clients. Even if you have a single client right now, it's better to architect for the future. Reuse is the key advantage of these APIs.
Please check Pattern 2 in this document
That is a question for the enterprise architect in your organisation. In this case, the process API would probably be a simple proxy for the system API, but that might not always be the case in future. Also, it is sometimes useful to follow a standard architectural pattern even if it creates some spurious complexity in the implementation. As always, there are design trade-offs and the answer will depend on factors that cannot be known by people outside of your organisation.
In the current plan, incoming commands are handled via Function Apps, resulting in events being sent to an Event Hub, which are then used to materialize the views.
Someone is arguing that instead of storing events in something like table storage, and materializing views based on events and snapshots, that we should:
Just stream events to a log in Azure Monitor to have auditing
We can make changes to a domain object immediately in response to a command and use the change feed as our source of events for materialized views.
He doesn't see the advantage of even having a materialized view. Why not just use a query? His argument is that we don't expect a lot of traffic.
He wants to fulfill the whole audit log requirement by saving events to the Azure Monitor log, i.e. just an application log. Instead, commands would directly modify the representation of an entity in Cosmos DB, and we'd use the Cosmos DB change feed as our domain object events, or we would create new events off of that via subscribers to that stream.
Is this actually an advantageous approach? Can y'all think of any reasons why we wouldn't want to do that? It seems like we'd be losing something here.
He's saying we'd no longer need to be concerned with eventual consistency, as we'd have immediate consistency.
Every reference implementation I've evaluated does NOT do it the way he's suggesting. I'm not deeply versed in the advantages/disadvantages of the event sourcing / CQRS paradigm, so I'm at a loss at the moment. Currently researching furiously.
This is a conceptual issue, so there's not much of a code example. However, here are some references that seem to back up the approach I'm taking:
https://medium.com/@thomasweiss_io/planet-scale-event-sourcing-with-azure-cosmos-db-48a557757c8d
https://sajeetharan.com/2019/02/03/event-sourcing-with-azure-eventhub-and-cosmosdb/
https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing
If your goal is only to have the audit log, state-based persistence could be a good choice. Event sourcing adds some complexity to the implementation side and unless you can identify more advantages of using it, you might not convince your team to bring this complexity to the system. There are numerous questions and answers on SO, as well as in some blog posts, about pros and cons of event sourcing, so I won't get into that discussion here.
I can warn you, though, that the second article in your list is very weak and would most probably lead you to many difficulties. The role of Event Hub there is completely unclear and it doesn't explain anything about projections and read-models (what you call "materialised views"). Only a very limited number of use-cases can live with only getting one entity by id and without being able to execute a query across multiple entities. That also probably answers your concern of having read-models at all. You will need them very soon when for the first time you will start figuring out how to get a list of entities based on some condition (query).
Using CosmosDb as the event store is completely feasible, as described in the first article if you can manage the costs involved. Just remember to set the change feed TTL to -1, otherwise, you won't be able to replay your projections when you need to.
To summarise:
Keeping the audit log can be done without event sourcing, but you need to ensure that events are published reliably, preferably in the same transaction as the entity state update. This is often hard or impossible, but you might accept the risk if your audit requirement is not strict. You can also base your audit log on the Cosmos DB change feed, just collecting document changes and logging them somewhere.
Event sourcing is a powerful technique but it has both pros and cons. The most common prejudice against using event sourcing is its implementation complexity. It might not be a big issue if you have a team that is somewhat experienced in building event-sourced systems. If you don't have such a team, you might want to build a small-scale spike to get some experience.
If you don't get full buy-in from the team to use event sourcing, you will later get all the blame if anything goes wrong. And it will go wrong at some point, especially with little experience in this area.
Spend some time reading books and trying out things yourself, before going wild in production.
Don't use Event Hub for anything that it is not designed for. Event Hub is the powerful event ingestion transport with limited TTL and it should be used for that purpose.
Don't use Table Storage as the event store, unless you only read entities by id. I used it in production for such a scenario and it worked (to some extent) but you can't project read-models from there.
A simple rule of thumb is to not use products for tasks they weren't designed for.
Azure Monitor was not designed to store application domain data. Azure Monitor is designed to store telemetry data from your applications and services and provides features such as alerts and other types of integration into DevOps tools for managing the operation and health of your apps.
There is a simple reason why you were able to find articles on event sourcing using Cosmos DB and why our own docs talk about it. Because it was designed to be used this way. It is simple to set up Cosmos DB to be an append only event store for your applications and use Change Feed to fire off messages in other apps or services or, in your case, to maintain a materialized view state of domain objects within your app.
Background: I've inherited a web application that is intended to create on-the-fly connections between local and remote equipment. There have been a tremendous number of moving parts recently: the app itself has changed significantly; the development toolchain was just updated; and both the local and remote equipment have been "modified" to support those changes.
The bright side is that it has a reasonable logging system that will write debug messages to a file, and it will log to both the file and a real-time user screen. I have an opportunity to re-work the entire log/debug mechanism.
Examples:
All messages are time-stamped and prefixed with a severity level.
Logs are for the customer. They record the system's responses to his/her requests.
Any log that identifies a problem also suggests a solution.
Debugs are for developers and Tech Support. They reveal the system internals.
Debugs indicate the function and/or line that generated them.
The customer can adjust the debug level on the fly to set the verbosity.
Question: What best practices have you used as a developer, or seen as a consumer, that generate useful logs and debugs?
Edit: Many helpful suggestions so far, thanks! To clarify: I'm more interested in what to log: content, format, etc.--and the reasons for doing so--than specific tools.
What was it about the best logs you've seen that made them most helpful?
Thanks for your help!
Don't confuse Logging, Tracing and Error Reporting. Some people I know do, and it creates one hell of a log file to grep through in order to get the information I want.
If I want to have everything churned out, I separate it into the following:
Tracing -> dumps every action and step, timestamped, with the input and output data of that stage (the ugliest and largest file)
Logging -> logs the business process steps only; the client does an enquiry, so log the enquiry criteria and output data, nothing more
Error Reporting / Debugging -> exceptions logged detailing where they occurred, timestamped, with input/output data if possible, user information, etc.
That way, if any errors occur and the Error/Debug log doesn't contain enough information for my liking, I can always do a grep -A 50 -B 50 'timestamp' tracing_file to get more detail.
EDIT:
As has also been said, sticking to standard packages, like the built-in logging module for Python, is always good. Rolling your own is not a great idea unless the language does not have one in its standard library. I do like wrapping the logging in a small function that generally takes the message and a value determining which logs it goes to, i.e. 1 = tracing, 2 = logging, 4 = debugging, so sending a value of 7 writes to all three, etc.
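The same wrapper idea as a minimal Java/SLF4J sketch (the bit values and the logger names "tracing", "logging" and "debugging" are just this answer's convention, not something the framework prescribes):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class LogRouter {
    public static final int TRACING = 1, LOGGING = 2, DEBUGGING = 4;

    private static final Logger TRACE_LOG = LoggerFactory.getLogger("tracing");
    private static final Logger BUSINESS_LOG = LoggerFactory.getLogger("logging");
    private static final Logger DEBUG_LOG = LoggerFactory.getLogger("debugging");

    // Route a message to one or more logs using a bitmask,
    // e.g. log(7, "...") writes to all three.
    public static void log(int targets, String message) {
        if ((targets & TRACING) != 0)   TRACE_LOG.info(message);
        if ((targets & LOGGING) != 0)   BUSINESS_LOG.info(message);
        if ((targets & DEBUGGING) != 0) DEBUG_LOG.info(message);
    }

    private LogRouter() { }
}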
The absolutely most valuable thing done with any logging framework is a "1-click" tool that gathers all logs and mails them to me, even when the application is deployed on a machine belonging to a customer.
And make good choices at what to log so you can roughly follow the main paths in your application.
As frameworks, I've used the standards (log4net, log4j, log4c++).
Do NOT implement your own logging framework when there is already a good one out of the box. Most people who do just reinvent the wheel.
Some people never use a debugger but log everything. Those are different philosophies; you have to make your own choice. You can find plenty of advice like these, or this one. Note that this advice is not language-related...
The Coding Horror guy has an interesting post about the logging problem and why excessive logging can be a waste of time in certain conditions.
I simply believe logging is for tracing things that could remain in production. Debug is for development. Maybe that's too simple a way of seeing things, because some people use logs for debugging since they can't stand debuggers. But debugger mode can be a waste of time too: you shouldn't use it like a sort of test case, because it's not written down and will disappear after the debug session.
So I think my opinion about this is :
logging for necessary and useful traces through development and production environments, with development and production levels, with the use of a log framework (log4 family tools)
debugging-mode for special strange cases when things are going out of control
test cases are important and can save time spent in infernal labyrinthine debugging sessions, used as an anti-regression method. Note that most people don't use test cases.
Coding Horror said to resist the tendency to log everything. That's right, but I've already seen a huge app that does the exact opposite in a pretty way (and through a database)...
I would just set up your logging system to have multiple logging levels. On the services I write I have logging/auditing for almost every action, and each is assigned an audit level 1-5; the higher the number, the more audit events you get.
The very basic logging: starting, stopping, and restarting
Basic logging: Processing x number of files etc
Standard logging: Beginning processing, finished processing, etc
Advanced logging: Beginning and ending of every stage in Processing
Everything : every action taken
You set the audit level in a config file so it can be changed on the fly.
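A rough Java sketch of that kind of configurable audit level (the audit.level property name and the Audit class are invented for illustration; a real implementation would delegate to whatever logger you already use):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public final class Audit {
    // 1 = start/stop only ... 5 = every action; read from a config file
    // so it can be changed without redeploying.
    private static volatile int auditLevel = 3;

    public static void reloadLevel(String configPath) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(configPath)) {
            props.load(in);
        }
        auditLevel = Integer.parseInt(props.getProperty("audit.level", "3"));
    }

    public static void log(int level, String message) {
        if (level <= auditLevel) {
            System.out.println(message); // or delegate to the real logger
        }
    }

    private Audit() { }
}

Anything logged at level 1 is always written; level 5 entries only appear when the config is turned all the way up.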
Some general rules-of-thumb I have found to be useful in server-side applications:
requestID - assign a request ID to each incoming (HTTP) request and then log it on every log line, so you can easily grep those logs later by that ID and find all relevant lines. If you think it is very tedious to add that ID to every log statement, then at least Java logging frameworks have made it transparent with the use of a Mapped Diagnostic Context (MDC).
objectID - if your application/service deals with manipulating business objects that have a primary key, then it is useful to also attach that primary key to the diagnostic context. Later, if someone comes with the question "when was this object manipulated?", you can easily grep by the objectID and see all log records related to that object. In this context it is (sometimes) useful to actually use a Nested Diagnostic Context instead of MDC.
when to log? - at the very least, you should log whenever you cross an important service/component boundary. That way you can later reconstruct the call flow and drill down to the particular codebase that seems to cause the error.
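For example, with SLF4J's MDC the request ID and object ID are set once per request, and every log line written on that thread can include them via %X{requestId} / %X{objectId} in the log pattern (the RequestHandler class here is just an illustrative sketch):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import java.util.UUID;

public class RequestHandler {
    private static final Logger log = LoggerFactory.getLogger(RequestHandler.class);

    public void handle(String orderId) {
        // Put the IDs into the diagnostic context once; every log line on this
        // thread can then carry them without changing each log statement.
        MDC.put("requestId", UUID.randomUUID().toString());
        MDC.put("objectId", orderId);
        try {
            log.info("Crossing service boundary: processing order");
            // ... business logic ...
            log.info("Finished processing order");
        } finally {
            MDC.clear(); // avoid leaking context to the next request on this thread
        }
    }
}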
As I'm a Java developer, I will also give my experience with Java APIs and frameworks.
API
I'd recommend to use Simple Logging Facade for Java (SLF4J) - in my experience, it is the best facade to logging:
full-featured: it has not followed the least-common-denominator approach (like commons-logging); instead, it uses a graceful degradation approach.
has adapters for practically all popular Java logging frameworks (e.g. log4j)
has solutions available on how to redirect all legacy logging APIs (log4j, commons-logging) to SLF4J
Implementation
The best implementation to use with SLF4J is logback - written by the same guy who also created the SLF4J API.
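Basic usage, for reference: the application code touches only the SLF4J API, and logback on the classpath provides the actual implementation; parameterized messages avoid string concatenation when a level is disabled. (OrderService and its method are just illustrative names.)

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderService {
    // The application depends only on the SLF4J API; logback (or another
    // binding) on the classpath provides the actual implementation.
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    public void placeOrder(String customerId, String widgetId) {
        log.info("Customer {} purchased widget {}", customerId, widgetId);
        try {
            // ... business logic ...
        } catch (RuntimeException e) {
            log.error("Order failed for customer {}", customerId, e);
        }
    }
}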
Use an existing logging format, such as that used by Apache, and you can then piggyback on the many tools available for analysing the format.