Apache Derby: how to read the transaction log

This may be related to an existing question; I think the answer is possibly, but I'm not sure.
I'm looking for a utility that lets me easily read the transaction logs produced by a Derby database (embedded for now, although it may end up running as a server later!).
I'm not hugely interested in the system logs as a whole, only the updates to fields in the database. My application grabs the name of the current user on the system and uses this to log into the embedded DB.
I'm fairly sure Derby logs the user, the field name, the table name (and some sort of 'key' reference for the row), and the new value that was written to the field. If it includes the old value, great; even if not, a search should make it reasonably easy to work out what the old and updated values were.
I've looked at the documentation for the Derby log format, but it is fairly technical, and I don't immediately see how to get at the values I'm interested in.
If the solution suggested in the first link would also work for me, that's great. Or does Derby store this info in a system table somewhere, for easier access? If so, I haven't yet found it.
A good alternative would be the ability to use log4j to view the files, if at all possible.
Your thoughts and answers are much appreciated.
Thanks in advance.
David

Check out these tools: https://issues.apache.org/jira/browse/DERBY-5195
They sound like what you're interested in.


How to store heroku logs for data science purposes?

We can see how to view Heroku logs, as well as how to write the last n lines to a text file.
Is there any established pattern for sensible and easy log storage, (potentially ETL), and analysis?
At least, this would involve:
storing logs
moving logs (e.g. via an ETL) to somewhere they can be analysed en masse (e.g. AWS S3 or GCP GCS)
Is there any established pattern to achieve this?
Background
Why would anyone want logs en masse? In case it's relevant, a specific task I'm trying to achieve is to use Bayesian inference on web logs to answer questions like: "if a person clicked on A, B and C, then they're x% likely to click on D" (so as to better understand which other pages a user may be interested in, and therefore suggest more relevant pages to the user). This is all pretty straightforward in Python or R. But obviously one needs access to the logs (all of them) before such data science can be carried out.
What I know so far
Heroku provides several logging addons
Really the best solution is probably to set up the Heroku app to also pipe your logs into an S3 bucket or something like that. Though perhaps you want to set it up so it only sends the log data you are actually interested in. Even better if you can find something that does this for you.
Looks like PaperTrail at least allows this. Here is the current documentation link:
https://documentation.solarwinds.com/en/Success_Center/papertrail/Content/kb/how-it-works/automatic-s3-archive-export.htm?cshid=pt-how-it-works-automatic-s3-archive-export
Though using an outside service might get rather costly, depending on the volume of logs you need to handle. Otherwise, you may just need to roll your own solution (or better yet, look for gems that can help).
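If you do roll your own, the "move to S3" step itself is only a few lines. A minimal sketch in Java with the AWS SDK v2 (shown in Java here, though the Ruby aws-sdk gem is similar; the bucket name, object key, and file name are made up for illustration):

import java.nio.file.Paths;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class LogArchiver {
    public static void main(String[] args) {
        // Ship a captured log file to S3 for later analysis.
        try (S3Client s3 = S3Client.create()) {
            PutObjectRequest req = PutObjectRequest.builder()
                    .bucket("my-log-archive")   // hypothetical bucket
                    .key("heroku/app.log")      // hypothetical object key
                    .build();
            s3.putObject(req, RequestBody.fromFile(Paths.get("captured.log")));
        }
    }
}

You'd still need something (a log drain consumer or a scheduled job) to produce captured.log in the first place; the point is only that the storage half of the pipeline is small.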

HBase - hotspotting check

I am using HBase, and I suspect that my rowkey has caused hotspotting. Before trying to salt the rowkey, I would like to check whether hotspotting has already occurred. Is there any way in HBase to analyze the data distribution across region servers to check for hotspotting?
Thanks,
Partha
You can use the HMaster Info Web UI to detect this.
It should be http://master-address:16010 by default.
If it's not available, check that the UI is not disabled in the conf (hbase-site.xml) and make sure that hbase.master.info.port is not set to -1.
When you are on it, you have to click on the table that you want to check.
You will be on this page
https://docs.prediction.io/images/cloudformation/hbase-32538c47.png
Then if you see that one region server has a lot more regions than the others, this is a good hint that one of your region servers is probably hotspotted.
It means that the regions in this part of the rowkey scope are split more often! Requests per second can also be an indicator, but in my experience it's not always very accurate.
These are just hints, though, and the only simple, reliable way I know to be sure that a hotspot is occurring is to benchmark it, because when it happens the write performance is really, REALLY different. So measure the throughput you get with a hashed rowkey on the same data, then compare. You'll see very quickly if there is a hotspot.
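For reference, "salting" or hashing the rowkey just means prepending a stable hash-derived prefix to the natural key so writes spread across regions. A rough Java sketch of the idea (the bucket count of 16 is arbitrary; pick something close to your region count):

import java.nio.charset.StandardCharsets;

public class SaltedKeys {
    private static final int BUCKETS = 16; // arbitrary; roughly match your region count

    // Prepend a one-byte salt derived from the key itself: the same key
    // always gets the same salt, but different keys spread evenly.
    static byte[] saltedRowKey(String naturalKey) {
        byte[] key = naturalKey.getBytes(StandardCharsets.UTF_8);
        byte salt = (byte) ((naturalKey.hashCode() & 0x7fffffff) % BUCKETS);
        byte[] salted = new byte[key.length + 1];
        salted[0] = salt;
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }
}

The trade-off is that scans over the natural key order now require one scan per bucket, so only salt if the benchmark above actually shows a hotspot.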

How to write "Last Seen" logic like that on Stack Overflow

I'm working on an application that has similar logic to SO regarding when a user was last seen. I've run into a conceptual problem that I'm hoping some of you gurus can help me out with.
All activity is logged in an ActivityLog table in the database
When a logged in user hits the site and a new session is created, I update the activity log with the UserID and some very generic info. Same thing happens when they create a new record, update their profile, etc.
The problem I'm having is this.
If I use the most recent activity item and then navigate to my personal account page, "Last Seen" shows up as 1 second ago, because I JUST hit the db on session start... This is no good, because I want to see when I was last there, not that I'm there now.
However, if I use Skip(1).Take(1) to get the second record from the database, then when someone else views my profile while I may have "just" signed on, they'll see that I was on, say, a week ago and not today.
What kind of logic would you use in order to have your cake and eat it too?
I'm using ASP.NET MVC2 and Linq to SQL, but I think this question is more language agnostic.
It sounds like you could just show the second most recent record for the current user, and for all other users show the most recent one.
What I would do (simply to avoid a large logic loop) would be to add two fields: current_seen and last_seen. On login, move current_seen to last_seen and set current_seen to the current timestamp. Then display last_seen as their "Last seen on XX/XX/XXX".
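A minimal sketch of that two-field rotation, in Java (the question is ASP.NET MVC2, but the idea translates directly to C#; the type and field names here are just for illustration):

import java.time.Instant;

class User {
    Instant currentSeen = Instant.now();
    Instant lastSeen = Instant.now();
}

class SeenTracker {
    // On login: rotate before overwriting, so lastSeen always holds the
    // previous visit rather than the session that just started.
    void recordLogin(User user) {
        user.lastSeen = user.currentSeen;
        user.currentSeen = Instant.now();
    }

    // Show the user their own lastSeen; other visitors can be shown
    // currentSeen if "online just now" is the desired behaviour.
    String lastSeenLabel(User user) {
        return "Last seen on " + user.lastSeen;
    }
}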
One place to look is the source code of OSQA (the open-source Q&A system):
http://www.osqa.net/
And yes, it looks a lot like StackOverflow (to say the least).

Cache like mechanism (which Data Structure)?

I am fetching questions from the server (database) and showing them to the client (user) in the browser. The client answers the question, and based on the answer the next set of questions is fetched from the database. Now, I want to pre-fetch the next set of questions while the user reads the present question, so that the waiting time before the user sees the next question is shorter.
My question is: how should I store the pre-fetched questions, i.e. which data structure should I use to hold them in memory for good performance? I want a "cache" type of thing, with the twist that once the user hits any question from the cache, that question should no longer be there.
PS: Each question has unique Id.
Thanks
Naveen
There are multiple ways to go about it: one makes a big difference, one makes little.
The little-difference option is to fetch the questions and store them in the user's session. Where exactly they end up depends on where your session is stored, which could also be a database, or a file. This only makes sense if your DB tables are very denormalized and it requires lots of joins to get the answer. I doubt that's the case, so this won't make much difference for the user no matter which data structure you use.
The big-difference option is to prefetch them with AJAX, using JavaScript, straight into the browser. In this case a simple array would suffice; JS gives you the flexibility to build any objects with any properties, so anything would be good enough. Write a poller in JS which fetches questions from the server while the user is looking at the current one, returning them as JSON, for example. The JSON becomes a plain object. Since each user stores only a couple of prefetched questions in their browser, the particular data structure choice won't make a difference here either.
Try using LinkedHashMap: you will have an LRU algorithm implemented quickly, with good performance.
Read this link as well :
LinkedHashMap as cache
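A minimal sketch of that, including the remove-on-read behaviour you described (the capacity and value type are illustrative):

import java.util.LinkedHashMap;
import java.util.Map;

class QuestionCache {
    private static final int MAX_ENTRIES = 50; // illustrative capacity

    // Access-ordered LinkedHashMap: evicts the least-recently-used entry
    // once the map grows beyond MAX_ENTRIES.
    private final Map<Integer, String> cache =
            new LinkedHashMap<Integer, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    void prefetch(int questionId, String questionText) {
        cache.put(questionId, questionText);
    }

    // remove() rather than get(): the question leaves the cache once the
    // user has hit it, as required.
    String take(int questionId) {
        return cache.remove(questionId);
    }
}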
First, a few questions to adapt to your context:
are you using Java?
are you also using Hibernate?
If you want to prefetch in the server, many caching solutions exists.
Taking into account your unique ID (see PS): if this ID is database-related and you are using Hibernate, the easiest solution would be to configure the Hibernate second-level cache for that entity. Then your only code would be to run the query in advance....
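A hedged sketch of what that configuration usually looks like (the Question entity is hypothetical, and the exact property names depend on your Hibernate version and cache provider):

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Marking the entity cacheable means loads by id can be served from the
// second-level cache instead of hitting the database again.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Question {
    @Id
    private Long id;
    private String text;
}

// plus, in your Hibernate properties:
//   hibernate.cache.use_second_level_cache = true
//   hibernate.cache.region.factory_class   = <your cache provider's factory>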
If these prerequisites do not fit, I have used EhCache as the caching solution.
It's fairly easy to start using, and it has plenty of features available for when you later need them.

What logging implementation do you prefer?

I'm about to implement a logging class in C++ and am trying to decide how to do it. I'm curious to know what kind of different logging implementations there are out there.
For example, I've used logging with "levels" in Python, where you filter out log events that are below a certain threshold. It also includes logger "names", where you can filter events via a hierarchy; for example, "app.apples.*" will not be displayed but "app.bananas.*" will be.
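For comparison, the same level-plus-name filtering exists in Java's built-in java.util.logging; a tiny sketch of what I mean (the logger names mirror the example above):

import java.util.logging.Level;
import java.util.logging.Logger;

public class FilterDemo {
    public static void main(String[] args) {
        // Loggers form a dot-separated hierarchy, like the Python names.
        Logger apples = Logger.getLogger("app.apples");
        Logger bananas = Logger.getLogger("app.bananas");

        apples.setLevel(Level.OFF);   // suppress everything under app.apples
        bananas.setLevel(Level.INFO); // keep app.bananas at INFO and above

        apples.info("hidden");
        bananas.info("displayed");
    }
}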
I've had thoughts about using "tags", but unsure of the implementation. I've seen games use "bits" for compactness.
So my questions:
What implementations have you created or used before?
What do you think the advantages and disadvantages of them are?
I'd read this post by Jeff Atwood.
It's about the overuse of logging and how to avoid it.
There are lots of links on the Log4J wikipedia page.
One of our applications uses Registry entries to dynamically control logging/tracing during production execution.
For example:
if (Logger.TraceOptionIsEnabled(TraceOption.PLCF_ShowConfig)) {
    // ... whatever should only run when tracing is enabled
}
When executed at run time, if the registry value PLCF_ShowConfig is true, the call returns true and the guarded code runs.
Quite handy.
Jeff Atwood had a pretty interesting blog entry about logging. The ultimate message of it was that logging is generally unnecessary, to some extent.
Logging generally doesn't scale well (too much data on high traffic systems).
I think the best point of it is that you generally don't need it. It's easier to trace through your code by hand to understand what values are being assigned to things than it is to sift through lots of log files.
It's just information overload.
Now the same can't be said for single-user applications. For things like media encoding or general OS usage, it can be nice to have a log for small apps, because debug info is useful (to me) in those situations. If you're burning a DVD and something goes wrong, log info can be very helpful for troubleshooting, if you understand the log output.
I think having a few levels would help for the user, such as:
No logs
Basic logging for general user feedback
Highly technical data for a developer or tech-support person to interpret
Depending on the situation, it may be useful to store ALL log data and only display the basic info to the user, or perhaps give the option to see all the detailed data.
It all depends on the domain.
