Which is the best way to implement visitor-tracking logic?
Create a visitors table: |ip|resource_type|resource_id|
Create a serialized field in the records (Post, Pet, Event, Ad, etc.)
Use a NoSQL solution
Any other idea?
In the 1st case, the table grows with every visit.
In the 2nd, we end up with a very long field.
In the 3rd, I have trouble with Mongoid in production (CentOS).
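For reference, option 1 would look roughly like this as a plain Rails migration plus a polymorphic association (a sketch only; the names and the migration version are placeholders for whatever your app uses):

    # Hypothetical sketch of option 1: one row per visit, polymorphic resource.
    class CreateVisits < ActiveRecord::Migration[5.2]
      def change
        create_table :visits do |t|
          t.string  :ip,            null: false
          t.string  :resource_type, null: false
          t.integer :resource_id,   null: false
          t.timestamps
        end
        add_index :visits, [:resource_type, :resource_id]
      end
    end

    # app/models/visit.rb
    class Visit < ActiveRecord::Base
      belongs_to :resource, polymorphic: true
    end

    # Recording a visit from a controller action, e.g. PostsController#show:
    # Visit.create!(ip: request.remote_ip, resource: @post)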
Not sure I'm answering your question, but I would not implement this myself; rather, take a look at existing solutions. For basic counting:
Vanity
Google Analytics
For more detailed metrics about what each user does, I would go toward cohort analysis.
A completely different option could be to use just the logs and something like lograge to log each request. It is very easy to add fields (such as the IP). You can then extract all the information from your logs.
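For the lograge route, attaching the IP to every request line is roughly this (a sketch for a standard Rails app; adjust to your setup):

    # config/environments/production.rb (sketch)
    Rails.application.configure do
      config.lograge.enabled = true
      # Merge extra fields into every request log line.
      config.lograge.custom_options = lambda do |event|
        { ip: event.payload[:ip] }
      end
    end

    # app/controllers/application_controller.rb
    class ApplicationController < ActionController::Base
      # Expose the client IP to lograge through the instrumentation payload.
      def append_info_to_payload(payload)
        super
        payload[:ip] = request.remote_ip
      end
    end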
I'm happy working with Neo4j as I'm building a social network, and Neo4j is working well for me. Please help me with these points:
1) I'm stuck on deciding whether to store the number of likes on a post (de-normalized) somewhere in the database, or to count the number of edges to that post dynamically every time.
For example, when retrieving the "post" JSON data, for each user who needs that data, I need to count the number of edges every time I generate the JSON.
2) I'm stuck on deciding the best way to notify users about likes or comments.
For example, I want to push a notification to the user saying "John and 3 others also commented on Cena's post".
This notification might be updated as the number of comments increases. So it's helpful for me to update the notification using count(*) rather than storing the counter somewhere, because I can easily fetch the count for "${count} new replies on your post". But I'm worried about the performance.
3) Can I use Redis or another cache such as Memcached with Neo4j? Does that make a "significant" difference?
Please help me decide which is better.
P.S.: Please keep in mind the efficiency and scalability of the application.
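For reference, here is roughly how I picture the two options from point 1 against Neo4j's transactional HTTP endpoint (a sketch only: the endpoint path, auth, labels, and property names are placeholders and depend on the Neo4j version and data model):

    require 'net/http'
    require 'json'
    require 'uri'

    # Hypothetical model: (:User)-[:LIKES]->(:Post {uuid: ...}), local server, auth disabled.
    NEO4J_TX = URI('http://localhost:7474/db/data/transaction/commit')

    def run_cypher(statement, parameters = {})
      body = { statements: [{ statement: statement, parameters: parameters }] }
      res  = Net::HTTP.post(NEO4J_TX, body.to_json, 'Content-Type' => 'application/json')
      JSON.parse(res.body)
    end

    # Option A: count the LIKES edges on demand (always accurate, costs a traversal per read).
    likes = run_cypher(
      'MATCH (:User)-[r:LIKES]->(p:Post {uuid: $uuid}) RETURN count(r) AS likes',
      uuid: 'post-123'
    )

    # Option B: keep a de-normalized counter on the post, bumped only when a new like is created.
    run_cypher(
      'MATCH (u:User {uuid: $user}), (p:Post {uuid: $post})
       MERGE (u)-[:LIKES]->(p)
       ON CREATE SET p.like_count = coalesce(p.like_count, 0) + 1',
      user: 'user-7', post: 'post-123'
    )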
So I will be embarking on designing a dashboard that will display KPIs and other relevant information for my team. Since I am in the early stages of this project and am not very familiar with the technical process behind designing a dashboard, I need some questions vetted first before I go and shop for solutions, to avoid reinventing the wheel.
Here are some of my questions:
We want a dashboard that can provide information from our data sources in real time (or as close to real time as possible). What functionality allows a dashboard to update itself from its data sources concurrently? From a conceptual standpoint, I can understand creating a dashboard out of Microsoft Excel and having the dashboard depend on the values you may have set within your pivot table.
How do you make a dashboard request information from multiple data sources on its own? As in the Excel example, a user may have to go into the pivot tables to update values, but I want to know how a dashboard would request this by itself, and what the exact method is from a programming standpoint. Does the code execute itself every time you refresh the web page?
How do you create data sources organically? I know that for some solutions, such as SharePoint BI Center, there are pre-supported data sources like an Excel sheet or SharePoint, and it's as easy as uploading your document and letting the tool handle the rest. However, there are going to be some data sources that I know will need to be fetched. Do I need to understand something else, like an event recorder, in order to navigate this issue?
Introduction
A dashboard (or a report, respectively) is usually the result of a long chain of steps. Very much simplified, it could look like this:
src 1 ----\
src 2 -----+--> [DWH]      --> [BR] --+--> Dashboards
src n ----/     [Big Data]            \--> Reports etc.
Keep in mind, this is only a very, very simplified view of a data backend/frontend.
DWH means Data Warehouse, where data might be stored temporarily (you referred to this as fetching). This could be a database, a Big Data engine, or a combination of both...
After that come the Business Rules (BR). Those might be specific rules for how different departments calculate and relate data, but also simple things like algebra.
Questions
So, the main question should not be about the technology:
What software should we choose?
How can we create a dashboard?
but should instead focus on your business processes (think of it as a top-down view):
What does our core process look like? Where would I like to measure data?
How does department A calculate sales differently from department B? Should they all use the same rule?
Where does everyone store the data? Can we access it? Do we need structured data?
And, easy to forget but sometimes one of the biggest issues: is the identifier of a business object (say, a sales ID) built and formatted the same way everywhere?
Conclusion
When those questions are at least in the back of your head and you keep working in this direction, data will more or less automatically spill out at certain points of that process.
Then it won't matter whether you use Excel, a small-to-medium tool like Tableau, TIBCO Spotfire, QlikView, or Power BI, or whether you go full scale with a big Hadoop backend, databases, and JasperReports, Apache Drill, Pentaho, or SSIS on top of it... it will come out eventually.
TL;DR
Focus on the processes first. Make sure you understand them. Draft in Excel. Then proceed to get the data and the tools you need for your use cases. It will work out much better from a "top-down" approach than trying to solve your requirements with tools alone.
I'm trying to make a database table for every single username. I see that for every username I can add more columns in its row, but I want to dedicate a full table to each one. How can I do that?
Thanks,
Eli
First let me say, what you are trying to do sounds like really, really bad database design, and you should rethink your idea of creating a table per user. To get a good answer on that, you should add much more detail about your reasoning to your question. As far as I know, there is also a maximum number of classes you can create on Parse, so sooner or later you will run into problems, either performance-wise or due to technical limitations of the platform.
That being said, you can use the Schema API to programmatically create/delete/update tables of your Parse app. It always requires the master key, so doing this from the client side is not recommended for security reasons. You could put this into a Cloud Code function, for example, and call it from your app/admin tool to create a new table for a user on the fly or delete a user's table.
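As a rough sketch (the class name, field names, keys, and URL below are placeholders; run this from a trusted server, never from the client), creating a class through the Schema API could look like this:

    require 'net/http'
    require 'json'
    require 'uri'

    # Sketch: create a new class (table) via Parse's Schema API.
    # Requires the master key, so this must only run server-side.
    APP_ID     = 'YOUR_APP_ID'      # placeholder
    MASTER_KEY = 'YOUR_MASTER_KEY'  # placeholder
    uri = URI('https://api.parse.com/1/schemas/UserData_eli42')  # hypothetical per-user class

    req = Net::HTTP::Post.new(uri)
    req['X-Parse-Application-Id'] = APP_ID
    req['X-Parse-Master-Key']     = MASTER_KEY
    req['Content-Type']           = 'application/json'
    req.body = {
      className: 'UserData_eli42',
      fields: { note: { type: 'String' }, score: { type: 'Number' } }
    }.to_json

    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
    puts res.code, res.body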
Again, it's better not to do this; think about a better way to design your database. A full discussion of that is out of scope here.
For a summer internship, I am asked to collect some specific data about the pages a user visits on the startup's website.
To simplify things, we can consider the website as a dating site, where each user has their own profile page and is tagged under certain categories (hair color, city, etc.).
I would like to know the best way, in the Rails framework, to keep track of each visit a user makes to a profile or to a tag page.
Should it be logged to a file or stored in a database, and where exactly in the code should the functions be called?
Maybe a gem already exists for this specific purpose?
The question is about both where the functions should be called in Rails and how the data should be stored, because the goal is ultimately to build a recommendation system.
There are a wide range of options available to you. I'd recommend one of the following:
Instrument detailed logging of the relevant controller actions. Periodically run a rake task that aggregates data from the log files and makes it available to your relevance engine.
Use a key/value store such as Redis to increment user/action specific counters during requests. Your relevance engine can query this store for the required metrics. Again, periodic aggregation of metrics is advised.
Both approaches lend themselves well to before_filter statements. You can interrogate the input params before the controller action executes to transparently implement the collection of statistics.
I wouldn't recommend using a relational database to store the raw data.
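A minimal sketch of the second approach, assuming the redis gem and a show action on the profile/tag controllers (the key names are arbitrary):

    require 'redis'  # gem 'redis'

    # app/controllers/application_controller.rb (sketch)
    class ApplicationController < ActionController::Base
      REDIS = Redis.new  # assumes a local Redis on the default port

      private

      # Wire up via `before_filter :track_visit, only: [:show]` in the relevant
      # controllers (before_action in newer Rails).
      def track_visit
        # e.g. "visits:profiles:42:2024-06-01" -> per-day counter
        key = "visits:#{controller_name}:#{params[:id]}:#{Date.today}"
        REDIS.incr(key)
        # Keep the set of distinct visitor IPs per resource as well.
        REDIS.sadd("visitors:#{controller_name}:#{params[:id]}", request.remote_ip)
      end
    end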
Have you ever noticed how Facebook says "3 friends and 33 others liked this"? I was wondering what the best approach to this is. I don't think going through the friends list and the list of users who liked it and comparing them is efficient at all! Do they keep track of this in the database? That would make the database very large.
What do you guys think?
Thanks!
I would guess they outer join their friends table with their likes table to count both regular likes and friend likes at the same time.
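Roughly, with hypothetical likes and friendships tables, such a query might look like this (shown with the mysql2 gem purely for illustration):

    require 'mysql2'  # gem 'mysql2'

    client = Mysql2::Client.new(host: 'localhost', username: 'app', database: 'social')

    post_id   = 123  # the post being viewed (placeholder)
    viewer_id = 42   # the logged-in user (placeholder)

    # One pass over the likes: total likes, plus likes made by the viewer's friends.
    row = client.query(<<~SQL).first
      SELECT COUNT(*)           AS total_likes,
             COUNT(f.friend_id) AS friend_likes
      FROM likes l
      LEFT OUTER JOIN friendships f
        ON f.friend_id = l.user_id
       AND f.user_id   = #{viewer_id}
      WHERE l.post_id = #{post_id}
    SQL

    puts "#{row['friend_likes']} friends and #{row['total_likes'] - row['friend_likes']} others liked this"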
With the proper indexes, it wouldn't be a slow query at all. Huge databases aren't necessarily slow, so there's really no reason not to store all of this information in a database. The trick is to make sure the indexes and partitions (if any) are set up well.
Facebook uses Cassandra, a NoSQL database, for at least some things. Here's a more detailed discussion of what some of the bigger social media sites do to solve these problems:
http://www.25hoursaday.com/weblog/2009/09/10/BuildingScalableDatabasesDenormalizationTheNoSQLMovementAndDigg.aspx
Lots of interesting reading in there if you follow the links from it to the Digg blog post, etc.
Yes, they definitely keep it in their database, as they have more than one server that needs to access the data.
As for scalability, I'm sure they use a lot of caching.
Here is an example:
If you have 1 million rows to go through, an index needs only about O(log n) = 20 operations (in the worst case) to find what you need.
For 2 million rows, you need only 21 operations (in the worst case) to find what you need.
Every time you double the number of rows to go through, you need only one more operation (in the worst case) with an O(log n) index.
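Those numbers are just base-2 logarithms:

    # Worst-case comparisons for a balanced (B-tree-like) index lookup.
    Math.log2(1_000_000).ceil  # => 20
    Math.log2(2_000_000).ceil  # => 21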
They also have a distributed architecture or a clustered database.
Facebook must be using a trigger (which is executed automatically as soon as an event occurs).
For example, suppose a trigger is created to store the count and names of people who liked the status; it will then be executed every time someone likes your status, and implicitly (automatically) at that.
This makes the operation much easier, and Facebook doesn't have to update the database manually or maintain a huge database for this. This approach is also faster.
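For example, a like-counter trigger could be sketched like this (MySQL syntax, shown here inside a Rails migration; the table and column names are made up):

    # Sketch of a de-normalized like counter maintained by a database trigger.
    class AddLikeCountTrigger < ActiveRecord::Migration[5.2]
      def up
        execute <<~SQL
          CREATE TRIGGER bump_like_count
          AFTER INSERT ON likes
          FOR EACH ROW
            UPDATE posts
               SET like_count = like_count + 1
             WHERE id = NEW.post_id
        SQL
      end

      def down
        execute 'DROP TRIGGER bump_like_count'
      end
    end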
In designing social networking software (mothsorchid.com), I found the only way to address this is to pre-cache streams of notifications. You don't query the database at page-load time to count how many friends and others 'liked this'. When someone 'likes' something, that is recorded on the object itself, and when retrieving the object you can compare against the current user's friend list. If someone updates their profile, makes a comment, etc., notification objects are sent to friends and pre-cached in their feeds. This cuts down tremendously on database work at the expense of disk space, but disk space is cheap.
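A rough sketch of that fan-out, using Redis lists as the pre-cached feeds (all key names and fields here are illustrative, not how any particular site does it):

    require 'redis'
    require 'json'

    REDIS = Redis.new

    # When a user likes an object, record it on the object and push a pre-built
    # notification into each friend's cached feed (fan-out on write).
    def record_like(actor_id:, object_id:, friend_ids:)
      REDIS.hincrby("object:#{object_id}", 'like_count', 1)
      REDIS.sadd("object:#{object_id}:likers", actor_id)

      note = { type: 'like', actor: actor_id, object: object_id, at: Time.now.to_i }.to_json
      friend_ids.each do |fid|
        REDIS.lpush("feed:#{fid}", note)
        REDIS.ltrim("feed:#{fid}", 0, 999)  # cap each cached feed
      end
    end

    # At page load there is no counting query: just read the cached feed.
    def cached_feed(user_id, limit = 20)
      REDIS.lrange("feed:#{user_id}", 0, limit - 1).map { |n| JSON.parse(n) }
    end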
As to how Facebook does this, they use Cassandra DBMS, which is probably a little different to what you have in mind.
Keep in mind that Facebook makes heavy use of memcached, so they retain a lot of data in memory and only refresh it when absolutely necessary. See this blog post for some scalability discussion around this:
http://www.facebook.com/note.php?note_id=39391378919
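In Ruby terms, that pattern is essentially a read-through cache, e.g. with the dalli gem (the key name, TTL, and backing query are placeholders):

    require 'dalli'  # gem 'dalli'

    CACHE = Dalli::Client.new('localhost:11211')

    # Read-through: serve the like summary from memcached and only hit the
    # database when the cached value is missing (or has been invalidated).
    def like_summary(post_id)
      CACHE.fetch("post:#{post_id}:like_summary", 300) do
        compute_like_summary_from_db(post_id)  # hypothetical expensive query
      end
    end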
Each entry that somebody can like probably contains a list of everybody who likes it (all of this is, of course, in a database). When you view that entry, they match that list against your friends list to see which of the likers are your friends. Voilà.
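In code, that match is just a set intersection (the ids are placeholders):

    require 'set'

    likers     = Set.new([3, 7, 42, 99, 104])  # everyone who liked the entry
    my_friends = Set.new([7, 42, 500])         # the viewer's friend list

    friend_likers = likers & my_friends
    others        = likers.size - friend_likers.size

    puts "#{friend_likers.size} friends and #{others} others liked this"
    # => "2 friends and 3 others liked this"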
A lot of this is explained by Facebook's Director of Engineering in this QCon presentation:
http://www.infoq.com/presentations/Facebook-Software-Stack
A great presentation to watch.