Calculating similarities between sentences - algorithm

I have a database with thousands of rows of error logs and their descriptions. The error log is for an application that runs 24/7. I want to create a dashboard/UI for production support that shows the common errors currently happening.
The problem I am having is that even though there are a lot of common errors, the error description differs by the transaction ID, user ID, or other things that are unique to that single process.
e.g. 1. Error transaction XYz failed for user 233
e.g. 2. Error transaction XYz failed for user 567
I consider these two errors to be the same. So I want a program that will go through the new error logs and classify them into groups. I am trying to use "edit distance", but it's very slow. Since I already have old error logs, I am trying to think of solutions that use that information too. Any thoughts?

I'm assuming that the error messages are generated by a program, so they probably fall into very specific patterns.
That means you don't have to do anything particularly complex. Just parse the error messages: use regular expressions (or maybe something more powerful) to split the messages into tuples. Then group or count or do something with the individual fields. For example, you could do a regex like "Error transaction ([A-Z]*) failed for user ([0-9]*)". You could then make a histogram of the error codes (first capture group) or users (second capture group).
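A minimal sketch of that regex-based grouping, assuming the message format from the question. The pattern, the `<TXN>`/`<USER>` template, and the function names are illustrative assumptions, and the character classes are widened to `\w+` and `\d+` so that a mixed-case ID such as XYz still matches:

```python
import re
from collections import Counter

# Hypothetical pattern for the example messages; adjust the character
# classes to whatever the real transaction and user IDs look like.
ERROR_PATTERN = re.compile(r"Error transaction (\w+) failed for user (\d+)")
TEMPLATE = "Error transaction <TXN> failed for user <USER>"


def group_key(message):
    """Return a template that identifies the group a message belongs to."""
    if ERROR_PATTERN.fullmatch(message.strip()):
        return TEMPLATE
    return "UNPARSED"  # messages no pattern recognises yet


def histogram_by_transaction(messages):
    """Count parsed errors per transaction ID (first capture group)."""
    counts = Counter()
    for message in messages:
        match = ERROR_PATTERN.fullmatch(message.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


logs = [
    "Error transaction XYz failed for user 233",
    "Error transaction XYz failed for user 567",
    "Disk quota exceeded",
]
group_counts = Counter(group_key(m) for m in logs)
transaction_counts = histogram_by_transaction(logs)
```

Messages that end up under "UNPARSED" show which patterns still need to be written, and the old error logs can be used to find them.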

There are other metrics (apart from Levenshtein) which might be more appropriate. Have you considered Cosine Similarity?
SimMetrics is an F/OSS library that offers an extensive collection of similarity algorithms and their corresponding cost functions.
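To show what cosine similarity measures on such messages, here is a small hand-rolled sketch over bag-of-words token counts. It is not the SimMetrics API; the tokenizer and function names are assumptions made for the example:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between the token-count vectors of a and b."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty message is similar to nothing
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(
    "Error transaction XYz failed for user 233",
    "Error transaction XYz failed for user 567",
)
```

The two example messages share six of their seven tokens, so the score is 6/7, while messages with no tokens in common score 0. A threshold for "same group" would have to be tuned against the old logs.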

Related

How to properly create Prometheus metrics with unique field

I have a system that regularly downloads files and parses them. However, sometimes something might go wrong with the parsing, and I have the task of creating a Prometheus alert for when a certain file fails. My initial idea is to create a custom counter in Prometheus, something like processed_files_total, and use status as a label, because a file that fails has FAILED status and one that succeeds has SUCCESS. The alert expression would then look like
increase(processed_files_total{status="FAILED"}[24h]) > 0
and I hope that this will alert me when at least 1 file has failed status.
The problem is that I also want the exact filename in the alert message. Since each file has a unique name, I'm almost sure that it is not a good idea to put it in a label, e.g. filename={filename}. According to the Prometheus docs:
Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
Is there any other way I can get the filename from the alert, or is this the way to go?
It's a good question.
I think the correct answer is that the alert should notify you that something failed and the resolution is to go to the app's logs to identify the specific file(s) that failed.
Lightning won't strike you for using the filename as a label value in Prometheus if you really must, but I think, as you do, that using an unbounded value should give you pause as to whether you're abusing the tool.
Metrics seem intrinsically (hunch) about monitoring aggregate state (an unusual number of files are failing) rather than specific (why did this one fail); logs and tracing tools help with the specific cases.

Handling multiple user input in RASA

I am making a symptom checker chatbot, and I cannot figure out a way to take multiple user inputs before analyzing them and giving output, i.e. collecting all the user input and then giving the output after analyzing all of it. Forms can be used, but I am confused about how to implement them in the system.
You can wait for multiple user messages by using action_listen. However, it might be hard to know when to stop listening. Depending on how many messages you're expecting from the user, it may be easiest to have a custom action loop, with the bot saying something like "Anything else you want to add?", and accumulating the user's responses so they can all be analysed together.
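The accumulate-then-analyse loop can be sketched independently of any framework. The class below is plain Python, not the Rasa SDK; `SymptomCollector`, its stop words, and the placeholder analysis are all assumptions made for illustration, and in Rasa the same logic would live inside a custom action or form:

```python
class SymptomCollector:
    """Accumulates user messages until the user signals they are done."""

    PROMPT = "Anything else you want to add?"
    DONE_WORDS = {"no", "nope", "done", "that's all"}  # hypothetical stop words

    def __init__(self):
        self.symptoms = []
        self.finished = False

    def handle(self, message):
        """Return the bot's reply to one user message."""
        text = message.strip()
        if text.lower() in self.DONE_WORDS:
            self.finished = True
            return self.analyse()
        self.symptoms.append(text)
        return self.PROMPT

    def analyse(self):
        # Placeholder: a real checker would analyse all symptoms together here.
        return "You reported: " + ", ".join(self.symptoms)


collector = SymptomCollector()
replies = [collector.handle(m) for m in ["headache", "fever", "no"]]
```

The key design point is that nothing is analysed until the user declines to add more, so the analysis sees every input at once.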

Kibana - get logs surrounding certain document with patterns to mark start and end range of a context

For example, I see an error log in Kibana, but I am not interested only in this error, but also the context of this line, i.e., I want to know what happens before and after this error. Such as:
the order failed with status "FAILED", but the log just before this line would contain the name of the method that caused this error
some 5-10 lines before this, I know there would be a line like "Start processing order xxxxx with status xxx"
and 15-20 lines after this log, there would be something like "End processing with status xxx"
All of this together marks the life cycle of processing this particular order, and all these lines are what I mean by "context".
How can I get all these lines with a search in Kibana? (Let's suppose all the literals are in the field "message".)
For now, I know we can "view surrounding documents", but that is not efficient enough.
https://www.elastic.co/guide/en/kibana/current/xpack-apm.html
Well, I just learned about Elastic APM, and it can solve part of the problem. APM can record "spans" and "transactions" to form a "distributed trace", then add that info to the "trace" field of the logs; we can then aggregate all logs with the same trace id to learn the context of this event across the microservices.
The question now changes to "How to use APM to add a trace". And one of our microservices is reactive, which cannot be easily adapted to use APM: the reactive pipeline context is thread-based, and there is no easy way to transfer the trace from one thread's context to another. So this is the part APM cannot solve.
But at least now we know in imperative apps we have a way.
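Once every log line carries a trace id, collecting the context of an error reduces to grouping by that id. A rough sketch over already-exported log records follows; the field names `trace_id` and `message` are assumptions, and in Kibana itself the equivalent step would be filtering on the trace id field:

```python
from collections import defaultdict


def group_by_trace(log_records, trace_field="trace_id"):
    """Group log messages by trace id, preserving their original order."""
    groups = defaultdict(list)
    for record in log_records:
        trace_id = record.get(trace_field)
        if trace_id is not None:  # skip lines that were never traced
            groups[trace_id].append(record["message"])
    return dict(groups)


records = [
    {"trace_id": "t1", "message": "Start processing order 42 with status NEW"},
    {"trace_id": "t2", "message": "Start processing order 43 with status NEW"},
    {"trace_id": "t1", "message": "order failed with status FAILED"},
    {"message": "heartbeat"},
    {"trace_id": "t1", "message": "End processing with status FAILED"},
]
context = group_by_trace(records)["t1"]
```

Here `context` holds the three lines of order 42's life cycle even though lines from another order are interleaved with them.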

HTTP500 error LoadRunner Oracle NCA script

I have recorded a script from login until the opening of an Oracle form.
Then I split the program into two parts: one with the login, and the other with the navigation to the form and opening it.
The login executes successfully, but the navigation script gives me an HTTP 500 error:
T03_Amar_Navigation.c(95): Error -26612: HTTP Status-Code=500 (Internal Server Error) for the URL [MsgId: MERR-26612].
There is no problem when logging in and opening the Oracle form manually.
Can someone help me figure out what I may be missing?
I tried copying all the correlation parameters into the navigation script as well; there is no error or mismatch with the correlation parameters.
Best guess, based upon seeing this 500 condition hundreds of times in my career, is that you need to check your script for the following:
Explicit checking for success on each step, or expected results. This is more than just accepting an HTTP 200. This involves actually processing the content that is returned and objectively looking at the page for elements you expect to be present. If they are not present then you will want to branch your code and elegantly exit your iteration. A majority of 500 level events are simply the result of poor testing practices and not checking for expected results.
Very carefully examine your code for unhandled dynamic elements. These could be related to session, state, time or a variable related to the user or business process. A mishandled or unhandled dynamic element cascading for just a few pages results in an application where the data being submitted does not match the actual state of the business process. As this condition is something that would not be possible with the actual website, you wind up with an unaddressed exception in the code and a 500 pushed back to the user. There are roughly half a dozen methods for examining your requests for dynamic elements. I find the most powerful to be the oldest: simply record the application twice for the same data, then compare the scripts. Once you have addressed the items related to session, state and time, record with a different data set (user, account, etc.) and look at the dynamic elements related to your actual data in use.
Address the two items above and your 500 will quite likely go away.
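The "record twice, then compare" technique can be sketched with Python's standard difflib. This is only an illustration of the comparison step, not LoadRunner tooling; the two recordings below are made-up placeholder text rather than real script output:

```python
import difflib


def find_dynamic_lines(recording_a, recording_b):
    """Return (lines_a, lines_b) pairs that differ between two recordings."""
    lines_a = recording_a.splitlines()
    lines_b = recording_b.splitlines()
    matcher = difflib.SequenceMatcher(a=lines_a, b=lines_b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, insert or delete: a candidate dynamic value
            differences.append((lines_a[i1:i2], lines_b[j1:j2]))
    return differences


# Hypothetical recordings of the same business process with the same data.
first = "request /login\nparam session=ABC123\nrequest /form"
second = "request /login\nparam session=XYZ789\nrequest /form"
dynamic = find_dynamic_lines(first, second)
```

Every line that differs between two recordings of identical input is a value the server generated, and therefore a candidate for correlation in the script.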

How to use compensating measures in a CQRS and DDD based application

Let's assume we host two microservices: RealEstate and Candidate.
The RealEstate service is responsible for managing rental properties, landlords and so forth.
The Candidate service provides commands to apply for a rental property.
There would be a CandidateForRentalProperty command which requires the RentalPropertyId and all necessary Candidate information.
Now the crucial point: Different types of RentalPropertys require a different set of Candidate information.
Therefore the commands and aggregates were split up:
Commands: CandidateForParkingLot, CandidateForFlat, and so forth.
Aggregates: ParkingLotCandidature, FlatCandidature, and so forth.
The UI asks the read model to decide which command has to be called.
It's reasonable for me to validate the Candidate information and all the business logic involved with that in the Candidate domain layer, but leave out validation whether the correct command got called based on the given RentalPropertyId. Reason: Multiple aggregates are involved in this validation.
The microservice should be autonomous, and its read model consumes events from the RealEstate domain, hence it's not guaranteed to be up to date. We don't want to reject candidates based on that but rather use eventual consistency.
Yes, this could lead to inept Candidate information used for a certain kind of RentalProperty. Someone could just call the CandidateForFlat command with a parking lot rental property id.
But how do we handle the cases in which this happens?
The RealEstate domain does not know anything about Candidates.
Would there be an event handler which checks if there is something wrong and execute an appropriate command to compensate?
On the other hand, this "mapping" is domain logic and I'd like to accommodate it in the domain layer. But I don't know who's accountable for these kinds of compensating measures. Would the Candidate aggregate be informed, with IneptApplicationTypeUsed or something like that?
As an aside - commands are usually imperative verbs. ApplyForFlat might be a better spelling than CandidateForFlat.
The pattern you are probably looking for here is that of an exception report; when the candidate service matches a CandidateForFlat message with a ParkingLot identifier, then the candidate service emits as an output a message saying "hey, we've got a problem here".
If a follow up message fixes the problem -- the candidate service gets an updated message that fixes the identifier in the CandidateForFlat message, or the candidate service gets an update from real estate announcing that the identifier actually points to a Flat, then the candidate service can emit another message "never mind, the problem has been fixed"
I tend to find in this pattern that the input commands to the service are really all just variations of handle(Event); the user submitted, the http request arrived; the only question is whether or not the microservice chooses to track that event. In other words, the "command" stream is just another logical event source that the microservice is subscribed to.
As you said, validation of commands should be performed at the point of command generation - at client side - where read models are available.
Command processing is performed by aggregate, so it cannot and should not check validity or existence of other aggregates. So it should trust a command issuer.
If commands come from an untrusted environment like a public API, then your API gateway becomes a client, and it should have the necessary read models to validate references.
If you want to accept a command fast and check it later, then log events like ClientAppliedForParkingLot, and have a Saga/Process manager handle further workflow by keeping its internal state, and issuing commands like AcceptApplication or RejectApplication.
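A rough sketch of such a process manager, under the assumption that it learns property types from RealEstate events and receives an application event for every accepted command. The class, method, and field names are hypothetical, and commands are modelled as plain tuples rather than real messages:

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationProcessManager:
    # Eventually consistent local knowledge of property types (assumed read model).
    property_types: dict = field(default_factory=dict)
    # Applications waiting for a property this process manager has not seen yet.
    pending: dict = field(default_factory=dict)

    def on_client_applied(self, application_id, property_id, applied_type):
        """Handle an event such as ClientAppliedForParkingLot."""
        actual_type = self.property_types.get(property_id)
        if actual_type is None:
            self.pending[application_id] = (property_id, applied_type)
            return []  # nothing to decide until RealEstate catches up
        return [self._decide(application_id, actual_type, applied_type)]

    def on_property_registered(self, property_id, property_type):
        """Handle a RealEstate event and settle any waiting applications."""
        self.property_types[property_id] = property_type
        commands = []
        for application_id, (pid, applied_type) in list(self.pending.items()):
            if pid == property_id:
                del self.pending[application_id]
                commands.append(
                    self._decide(application_id, property_type, applied_type)
                )
        return commands

    @staticmethod
    def _decide(application_id, actual_type, applied_type):
        name = "AcceptApplication" if actual_type == applied_type else "RejectApplication"
        return (name, application_id)
```

For example, an application for a parking lot against an unknown property is held back, and once the property turns out to be a flat the manager issues RejectApplication for it.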
I understand the need for validation but I don't think the example you gave calls for cross-Aggregate (or cross-microservice for that matter) compensating measures as stated in the Q title.
Verifications like checking that the ID the client gave along with the flat rental command matches a flat and not a parking lot, that the client has permission to do that, and so forth, are legitimate. But letting the client create such commands in the wild and waiting for an external actor to come around and enforce these rules seems subpar because the rules could be made intrinsic properties of the object originating the process.
So what I'd recommend is to change the entry point into the operation - to create the Candidature Aggregate Root as part of another Aggregate Root's behavior. If that other Aggregate (RentalProperty in our case) lives in another Bounded Context/microservice, you can maintain a list of RentalProperties in the Candidate Bounded Context with just the amount of info needed, and initiate the Candidature from there.
So you would have
FlatCandidatureHandler ==loads==> RentalProperty ==creates==> FlatCandidature
or
FlatCandidatureHandler ==checks existence==> local RentalProperty data
==creates==> FlatCandidature
As a side note, what could actually necessitate compensating actions are factors extrinsic to the root object of the process, for instance if the property becomes unavailable in the meantime. Then whatever Aggregate holds that information should emit an event when that happens, and the compensation should be initiated.
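The handler flow described in this answer, where the Candidature is created through a local RentalProperty, might look roughly like this. All class and method names besides those in the diagram are assumptions, and the local RentalProperty is reduced to an id and a kind:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatCandidature:
    rental_property_id: str
    candidate_name: str


@dataclass(frozen=True)
class RentalProperty:
    """Local copy in the Candidate context holding just the info needed."""
    property_id: str
    kind: str  # e.g. "Flat" or "ParkingLot"

    def apply_for_flat(self, candidate_name):
        # The rule is intrinsic to the object that originates the process.
        if self.kind != "Flat":
            raise ValueError(f"{self.property_id} is a {self.kind}, not a Flat")
        return FlatCandidature(self.property_id, candidate_name)


class FlatCandidatureHandler:
    def __init__(self, rental_properties):
        self._rental_properties = rental_properties  # property_id -> RentalProperty

    def handle(self, property_id, candidate_name):
        rental_property = self._rental_properties.get(property_id)
        if rental_property is None:
            raise LookupError(f"unknown rental property {property_id}")
        return rental_property.apply_for_flat(candidate_name)


handler = FlatCandidatureHandler({
    "p1": RentalProperty("p1", "Flat"),
    "p2": RentalProperty("p2", "ParkingLot"),
})
candidature = handler.handle("p1", "Alice")
```

With this entry point, a flat application against a parking lot id fails immediately instead of needing compensation later.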
