I have a requirement where I have to calculate the number of hits for the application day-wise and generate a report from it. What I have been told is to create a table and then keep updating the count whenever a user accesses the application. Can anyone guide me on how to proceed? Any guidance will be very much appreciated.
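A minimal sketch of what that could look like in SQL Server syntax (the table and column names here are placeholders, adapt them to your schema):

-- One row per day
CREATE TABLE HitCount (
    HitDate DATE NOT NULL PRIMARY KEY,
    Hits    INT  NOT NULL DEFAULT 0
)
-- Run on every request: bump today's counter, inserting the row on the first hit of the day
UPDATE HitCount SET Hits = Hits + 1 WHERE HitDate = CAST(GETDATE() AS DATE)
IF @@ROWCOUNT = 0
    INSERT INTO HitCount (HitDate, Hits) VALUES (CAST(GETDATE() AS DATE), 1)
-- The day-wise report is then a simple scan
SELECT HitDate, Hits FROM HitCount ORDER BY HitDate

Under heavy concurrency you would want to wrap the UPDATE/INSERT pair in a transaction (or use MERGE), but this shows the shape of it.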
I need a Twitter dataset for the last 3-4 months relating to any company/commodity, for performing sentiment analysis and thereby stock price prediction.
But the Twitter API only goes back about 10-12 days. The code that I've prepared works well, but I need more data to reach a reliable conclusion. I can't wait 3-4 months since I have to submit the project soon.
If anyone knows a link where I can find such a dataset, or if anyone has one,
please let me know.
Thank you in advance.
You could probably check out these discussions (1, 2) and also this site.
I'm enjoying working with Neo4j as I build a social network, and it's working well for me so far. Please advise on these points:
1) I'm stuck on deciding whether to store the number of likes on a post (denormalized) somewhere in the database, or to count the number of edges to that post dynamically every time.
For example, when retrieving the "post" JSON data, for each user who requests that data I need to count the number of edges every time I generate the JSON.
2) I'm stuck on deciding the best way to notify users about likes or comments.
For example, I want to push a notification to the user saying "John and 3 others also commented on Cena's post".
This notification might be updated as the number of comments increases. So it's easier for me to update the notification if I'm using count(*) rather than storing the counter somewhere, because I can fetch the count for "${count} new replies on your post" easily. But I'm worried about the performance.
3) Can I use Redis or another in-memory cache with Neo4j? Does that make a "significant" difference?
Please help me decide which is better.
P.S.: Please keep in mind the efficiency and scalability of the application.
I'm using a JVM to perform API calls to the Google Apps Administrator API.
I've noticed with the User Usage Reports that I'm not getting complete data for a field I'm interested in (num_docs_externally_visible) and the fields that feed into its calculation. I generally request a single day's usage report at a time, across my entire user base (~40k users).
According to the developer documentation, I should be able to see that field in a report 2-6 days afterwards; however, after running reports for the first 3 weeks of February, I've only gotten it for 60% of the days. The pattern appears to be random (I've had streaks of up to 4 days in a row of the field appearing and 3 days in a row of it missing, but there is no consistency to it).
Has anyone else experienced this issue, and if so, were you able to resolve it? Or, if this is an issue with what the API returns and outside my control, should I expect this behavior to continue?
I think it's only natural that the data you get is not yet complete; it takes a certain number of days to receive the complete data.
This SO question is not exactly the same as yours, but I think it will help you, especially the part about needing to use your account's time zone.
I have a requirement in which I need to call a process that sends a particular message every X days for a customer, for up to N days.
Basically, the process runs every day, fetching the customers into a cursor; it should then check when the last message was sent to each customer, and if it was sent exactly X days ago, send the message to that customer.
I can handle this in the process by adding an extra column to track the last notification date and referring to it when sending, but I'm worried that will be a performance hit.
So can anyone suggest a simpler way to handle this?
Kindly let me know if you need clarification on any part.
I don't think that would be a performance hit!
If you are adding the column to the same table, only one query is going to be executed anyway, so it's unlikely to be a performance problem.
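For what it's worth, here is a rough sketch of what the daily job could look like with that column (SQL Server syntax as an example; the table and column names are assumptions):

-- Customers whose last message went out exactly X days ago,
-- and who are still inside the N-day window (StartDate is assumed)
SELECT CustomerId
FROM Customers
WHERE DATEDIFF(DAY, LastNotifiedDate, GETDATE()) = @X
  AND DATEDIFF(DAY, StartDate, GETDATE()) <= @N
-- After sending, stamp the customer so the next send waits another X days
UPDATE Customers SET LastNotifiedDate = GETDATE() WHERE CustomerId = @id

With an index on LastNotifiedDate this is one cheap query per run, which is why the extra column shouldn't hurt.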
Here is the issue.
A site I've recently taken over tracks "miles" you ran in a day. So a user can log into the site and add that they ran 5 miles; this is then added to the database.
At the end of the day, around 1 AM, a service runs which totals all the miles all the users ran that day and outputs a text file to App_Data. That text file is then displayed in Flash on the home page.
I think this is kind of ridiculous. I was told they had to do this due to massive performance issues. They won't tell me exactly how they were doing it before or what the major performance issue was.
So what approach would you guys take? The first thing that popped into my mind was a web service which gets the data via an AJAX call. Perhaps every time a new "mile" entry is added, a trigger fires and updates the "GlobalMiles" table.
I'd appreciate any info or tips on this.
Thanks so much!
Answering this question is a bit difficult since we don't know all of your requirements, and something clearly didn't work before. So here are some different ideas.
First, revisit your assumptions. Generating a static report once a day is a perfectly valid solution if all you need is daily reports. Why hit the database multiple times throughout the day if all that's needed is a snapshot? (For instance, lots of blog software used to write HTML files when a post was published rather than serving the entry from the database each time -- many still do, as an optimization.) Is the "real-time" feature something you are adding?
I wouldn't jump to AJAX right away. Use the same input method; just move the report from static to dynamic. Doing too much at once is a good way to get yourself buried. When changing existing code I try to find areas that I can change in isolation with the least amount of impact on the rest of the application. Once you have the dynamic report, then you can add AJAX (and please use progressive enhancement).
As for the dynamic report itself you have a few options.
Of course you could just SELECT SUM(), but it sounds like that would cause performance problems if each user has a large number of entries.
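For reference, the naive per-user query would be something along these lines (using the same Miles table as the examples below):

SELECT SUM([Count]) AS TotalMiles
FROM Miles
WHERE UserId = @id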
If your database supports it, I would look at using an indexed view (sometimes called a materialized view). It allows fast reads of the real-time sum data by maintaining the aggregate as the underlying rows change:
CREATE VIEW vw_Miles WITH SCHEMABINDING AS
SELECT SUM([Count]) AS TotalMiles,
       COUNT_BIG(*) AS EntryCount,  -- COUNT_BIG(*) is required in an indexed view that uses SUM
       UserId
FROM dbo.Miles                      -- SCHEMABINDING requires two-part names
GROUP BY UserId
GO
CREATE UNIQUE CLUSTERED INDEX ix_Miles ON vw_Miles (UserId)
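Reading from the view is then a cheap index seek (on non-Enterprise editions of SQL Server you need the NOEXPAND hint for the optimizer to use the view's index):

SELECT TotalMiles FROM vw_Miles WITH (NOEXPAND) WHERE UserId = @id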
If the overhead of that is too much, @jn29098's solution is a good one. Roll it up using a scheduled task. If there are a lot of entries for each user, you could add only the delta from the last time the task ran:
UPDATE GlobalMiles
SET [TotalMiles] = [TotalMiles] +
    ISNULL((SELECT SUM([Count])
            FROM Miles
            WHERE UserId = @id
              AND EntryDate > @lastTaskRun), 0)  -- ISNULL guards against no new entries since the last run
WHERE UserId = @id
If you don't care about storing the individual entries but only the total, you can update the count on the fly:
UPDATE Miles SET [Count] = [Count] + @newCount WHERE UserId = @id
You could use this method in conjunction with the sproc that adds the entry and get the best of both worlds.
Finally, your trigger method would work as well. It's an alternative to the indexed view where you do the update yourself on a table instead of SQL Server doing it automatically. It's also similar to the previous option, except that the global update moves out of the sproc and into a trigger.
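A sketch of such a trigger, assuming a GlobalMiles table that already has a row per user (adjust to your schema):

CREATE TRIGGER trg_Miles_Insert ON Miles
AFTER INSERT
AS
BEGIN
    -- Add the newly inserted miles to each affected user's running total
    UPDATE g
    SET g.[TotalMiles] = g.[TotalMiles] + i.NewMiles
    FROM GlobalMiles g
    JOIN (SELECT UserId, SUM([Count]) AS NewMiles
          FROM inserted
          GROUP BY UserId) i ON i.UserId = g.UserId
END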
The last three options make it more difficult to handle the situation when an entry is removed, although if that's not a feature of your application then you may not need to worry about that.
Now that you've got materialized, real-time data in your database, you can generate your report dynamically. Then you can add the AJAX fanciness.
If they were truly having performance issues due to too many hits on the database, then I suggest you take all the input and push it into a message queue (MSMQ). Then you can have a service on the other end that picks up the messages and does a bulk insert of the data. This way you have fewer DB hits. You can then output to the text file on the update too.
I would create a summary table that's rolled up once an hour or nightly and calculates total miles run. For individual requests, you could pull from the summary table plus any additional miles logged between the last rollup and when the user views the page, to get that user's total.
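A rough sketch of that read query (the summary table and column names here are made up): the user's total is the last rollup plus anything logged since.

SELECT s.TotalMiles
       + ISNULL((SELECT SUM(m.[Count])
                 FROM Miles m
                 WHERE m.UserId = s.UserId
                   AND m.EntryDate > s.LastRollupTime), 0) AS CurrentTotal
FROM MilesSummary s
WHERE s.UserId = @id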
How many users are you talking about and how many log records per day?