Recommendation On Data Store For Website's User Activity Log - performance

I am looking for some recommendations on a good data store for activity feeds. The goal is to have a Twitter/Facebook-style feed log consisting of the various activities users can perform throughout our website. The "wall" or "feed" would be updated via AJAX, showing what users of the website are currently doing. It will be written to often, and the most recent entries will be displayed on the site.
(e.g. John Smith recommended Jane Smith's article 2 seconds ago)
We are currently storing the feeds in MySQL, but performance has been poor, and I'm concerned about hindering performance throughout the rest of the website if we are constantly hitting the database both to grab the most recent user activity and to write new feed entries.
Any recommendations would be greatly appreciated!

Make use of a good caching solution such as Memcached to improve performance. Other than caching and scaling out, there isn't much more you can do to speed up an activity feed.
I would vote for using http://redis.io/ or http://www.mongodb.org/ as an alternative to MySQL for the short-term, near-live activity feed across the site, plus a cron job that dumps the activity history into MySQL for record keeping.
A look at Tumblr's or Twitter's architectures can point you in the right direction as well.
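A minimal sketch of the Redis approach suggested above, assuming the Jedis client and an illustrative `activity:feed` list key (neither is prescribed by the answer): every action is LPUSHed onto a capped list, and the AJAX endpoint reads the newest entries back with LRANGE, so the feed never touches MySQL on the hot path.

```java
import java.util.List;
import redis.clients.jedis.Jedis;

public class ActivityFeed {
    private static final String FEED_KEY = "activity:feed"; // hypothetical key name
    private static final int MAX_ENTRIES = 1000;            // keep only the newest 1000 items

    private final Jedis jedis;

    public ActivityFeed(Jedis jedis) {
        this.jedis = jedis;
    }

    // Called whenever a user performs an action, e.g. "John Smith recommended Jane Smith's article".
    public void recordActivity(String message) {
        jedis.lpush(FEED_KEY, message);            // newest entries go to the head of the list
        jedis.ltrim(FEED_KEY, 0, MAX_ENTRIES - 1); // cap the list so it never grows unbounded
    }

    // Called by the AJAX endpoint that refreshes the "wall".
    public List<String> latestActivities(int count) {
        return jedis.lrange(FEED_KEY, 0, count - 1);
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            ActivityFeed feed = new ActivityFeed(jedis);
            feed.recordActivity("John Smith recommended Jane Smith's article");
            feed.latestActivities(20).forEach(System.out::println);
        }
    }
}
```

The cron job mentioned above would then simply LRANGE the full list on a schedule and bulk-insert the rows into MySQL for archival.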

You should take a microservices approach and separate the datastore that records the users' actions from the one that stores the actual content.
Pub/Sub is the right approach for handling a large stream of user actions.
Use Kafka or the Google Cloud Pub/Sub service for a scalable data pipeline; both are built to absorb this kind of load.
Then consume the messages from Kafka independently into a database such as MySQL or Google BigQuery for whatever analytics you need.
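A minimal sketch of the producer side of that pipeline, assuming a local Kafka broker and a topic named `user-activity` (both are hypothetical names, not from the answer): the web application publishes one message per user action, and a separate consumer process can then write them to MySQL or BigQuery on its own schedule without slowing the site down.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ActivityProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by user id so all of a user's actions land in the same partition (per-user ordering).
            producer.send(new ProducerRecord<>("user-activity",
                    "user-42",
                    "{\"user\":\"John Smith\",\"action\":\"recommended\",\"target\":\"Jane Smith's article\"}"));
        }
    }
}
```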

Related

Real-time updates SaaS

I'm developing a web app, and one of my requirements is a page in which multiple users share the same state, rather like a game room where all users see the same game and, when one user interacts with an item, everybody sees the progress. I've got this covered with Firebase and Firestore. Now I want all the users to see each other's cursors. Since this requires a lot of real-time updates, using Firestore would be too slow and too pricey. Can anyone suggest a solution, perhaps some form of SaaS that provides real-time updates for a reasonable price?
I gave it a go with Firestore, but it is too slow and too pricey.

How to integrate Firestore health checks and dashboard metrics with our internal company systems

Context: this is my first use of Firestore. I want to use it to push notification status to our mobile application. I can see that there is a Google Firestore dashboard under the Analytics umbrella. In our company we mainly use three tools for monitoring our applications: Zabbix, Dynatrace, and an internal solution based on Elasticsearch. I need to integrate our internal monitoring systems with the metrics produced by our first Firestore project.
What I am looking for, based on personal assumptions:
1) Maybe there are GET endpoints that I can connect to and poll for information, say once a minute.
2) Maybe, following the idea of the Realtime Database pushing events across a long-lived connection, I can write a Spring Boot application that imports the Firebase SDK and connects to some specific Firestore endpoint that pushes the events I'm interested in (e.g. delays based on custom logic, or a dead service).
3) Maybe there is a plugin I can connect straight to a Kafka cluster hosted in our internal data centre.
4) Some plugin to connect Firestore/Firebase to one of the third-party tools (e.g. Zabbix, Dynatrace, or Elasticsearch).
5) Some dependency I could import into Google Cloud Functions triggered by a Firestore health-check engine, in order to post the data to an internal endpoint.
Perhaps there is already an approach commonly used when you have to connect Firestore to an internal monitoring system. I would highly appreciate being told what it is, so I can narrow my searching, because I am not finding anything useful.
Please note that comparing monitoring approaches is not part of this question. It is an established fact that our company uses internal dashboards and custom alert triggers; I only mentioned the names above to clarify what I mean by internal monitoring tools. The focus of this question is HOW to import/integrate/observe/consume Firestore monitoring data. Our internal stack is beyond the scope of this question.
Here is the official documentation for Cloud Monitoring, which lets you collect metrics, events, and metadata from Google Cloud Platform products and use them to create dashboards, charts, and alerts.
Please let me know if you have further questions.
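To make option 1) from the question concrete, below is a minimal sketch of polling the Cloud Monitoring API with its Java client. The project id is a placeholder, and `firestore.googleapis.com/document/read_count` is used only as an example metric type - the exact metric names should be checked against the Cloud Monitoring documentation above. Forwarding each point to Zabbix/Dynatrace/Elasticsearch would be your own code.

```java
import com.google.cloud.monitoring.v3.MetricServiceClient;
import com.google.monitoring.v3.ListTimeSeriesRequest;
import com.google.monitoring.v3.ProjectName;
import com.google.monitoring.v3.TimeInterval;
import com.google.monitoring.v3.TimeSeries;
import com.google.protobuf.util.Timestamps;

public class FirestoreMetricsPoller {
    public static void main(String[] args) throws Exception {
        String projectId = "my-gcp-project"; // hypothetical project id
        long now = System.currentTimeMillis();

        // Look at the last 5 minutes of the Firestore document read counter (example metric type).
        TimeInterval interval = TimeInterval.newBuilder()
                .setStartTime(Timestamps.fromMillis(now - 5 * 60 * 1000))
                .setEndTime(Timestamps.fromMillis(now))
                .build();

        ListTimeSeriesRequest request = ListTimeSeriesRequest.newBuilder()
                .setName(ProjectName.of(projectId).toString())
                .setFilter("metric.type=\"firestore.googleapis.com/document/read_count\"")
                .setInterval(interval)
                .setView(ListTimeSeriesRequest.TimeSeriesView.FULL)
                .build();

        try (MetricServiceClient client = MetricServiceClient.create()) {
            for (TimeSeries series : client.listTimeSeries(request).iterateAll()) {
                // Forward each point to the internal monitoring stack (Zabbix, Elasticsearch, ...).
                series.getPointsList().forEach(point ->
                        System.out.println(point.getInterval().getEndTime() + " -> " + point.getValue()));
            }
        }
    }
}
```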

Logging user interactions with a program

We have an app and we want to log how users are interacting with it; for example, whether they are using the pages we expect them to. I don't want to log this via the app itself, as it would be very hard for me to then get that information off the device. Each page interacts with web services, so I was planning to log that interaction instead.
I have had some thoughts on this:
* as the web service is being called, write to a logging table in the database - the problem here could be the performance impact
* use log4j's async mode to log these details.
Does anyone have any other suggestions on how to do this? I'm reading The Lean Startup at the moment (very good so far) and this sort of measurement seems fundamental to it, so I'm wondering if there are any other tips.
Thanks
Since no one answered this for a couple of months, I thought a couple of pointers might help you...
Use mobile analytics tools
Fabric.io
Google Analytics for Mobile Apps
Flurry
Amazon Mobile Analytics
appsee
Have the server record what users access (that's the approach you're considering). To offload the overhead, there are a couple of tactics you could employ (mix 'n' match as you will):
Use async mechanisms (async operations in the server, such as Futures; log4j async mode; async databases; etc.) - see the sketch after this list.
Use a separate database.
Use a NoSQL database only to write accesses. Later on you process that information in a separate analytics application.
Have the client (mobile app) record the actions and send them in bulk to the server once in a while (as frequently as you need / want / can afford).
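A minimal sketch of the log4j async idea, under a couple of assumptions not stated in the answer: log4j 2 with the LMAX Disruptor on the classpath, all loggers made asynchronous via the `Log4jContextSelector` system property, and a hypothetical hook in the web service layer that logs one line per call. The appender (file, JDBC, Kafka, ...) is whatever you configure in log4j2.xml.

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Assumes log4j 2 plus the LMAX Disruptor dependency, started with
//   -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
// so that every logger is asynchronous and the web service thread never blocks on log I/O.
public class InteractionLogger {

    private static final Logger USAGE = LogManager.getLogger("usage"); // dedicated usage logger

    // Call this from a servlet filter / interceptor around each web service endpoint (hypothetical hook).
    public static void logCall(String userId, String endpoint, long durationMs) {
        USAGE.info("user={} endpoint={} durationMs={}", userId, endpoint, durationMs);
    }
}
```

The resulting usage log can then be processed by a separate analytics job, as suggested in the NoSQL point above.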
Cheers

How to send batches of a million push notifications using Amazon Simple Notification Service (SNS)

Like many companies, the one I work for isn't comfortable using Apple's APNS directly: no official library, streams that get cut off randomly, etc. The same goes for Android's push system: limited to small batches, and completely different from Apple's APNS. That's why we are looking for an alternative, and when Amazon claimed to be able to send millions of push notifications almost for free, we thought that SNS would be the perfect solution.
The issue is that we frequently have more than one million devices to address, and obviously our push campaigns rarely target the same devices.
As far as we can tell, the only option is to use the AWS API, which only provides a method to create endpoints one at a time! That is a big deal for us, because after some testing we figured that creating 1,000,000 endpoints would take approximately 15 hours (~17 calls/sec).
Even after all the endpoints are created, in order to send all the pushes at once, the endpoints need to be added to a topic, and again, this has to be done one endpoint at a time (so 15 more hours).
Even if we multithreaded our calls across, say, 30 threads, it would still take an hour!
So, could anyone tell us if there is anything we missed? Is Amazon really expecting us to flood their web services for 30 hours in order to create one push campaign? How can they claim to send a million pushes in a second if it takes hours to prepare them? Are they working on a batch API for SNS? Is it possible to plug an Amazon database containing the tokens into an SNS topic?
It looks like Amazon provides a few methods of adding endpoints/tokens, including a CSV importer (but limited to 2 MB CSV files at a time). They also provide an API and a sample Java application for bulk uploading tokens (link).
The topic subscription point is addressed by an Amazon SNS employee here, essentially explaining that there is no batch API available for this unfortunately.
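For what it's worth, here is a minimal sketch of the multithreaded endpoint creation the question already considers, using the AWS SDK for Java v1 and a hypothetical platform application ARN. It does not get around the one-call-per-endpoint limitation, it only parallelises it.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.CreatePlatformEndpointRequest;

public class BulkEndpointCreator {

    private static final String APPLICATION_ARN =
            "arn:aws:sns:eu-west-1:123456789012:app/APNS/my-app"; // hypothetical platform application ARN

    public static void createEndpoints(List<String> deviceTokens) throws InterruptedException {
        AmazonSNS sns = AmazonSNSClientBuilder.defaultClient();
        ExecutorService pool = Executors.newFixedThreadPool(30); // ~30 parallel CreatePlatformEndpoint calls

        for (String token : deviceTokens) {
            pool.submit(() -> {
                CreatePlatformEndpointRequest request = new CreatePlatformEndpointRequest()
                        .withPlatformApplicationArn(APPLICATION_ARN)
                        .withToken(token);
                String endpointArn = sns.createPlatformEndpoint(request).getEndpointArn();
                // Persist endpointArn so the campaign can publish to it (or subscribe it to a topic) later.
                System.out.println(token + " -> " + endpointArn);
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```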
There are a few other 3rd party push notification providers that may better meet your needs when it comes to frequently creating custom segments/topics:
OneSignal (Disclosure: I run this company)
MixPanel
Parse

Does the Cloud solve the hosting location dilemma?

My startup is located in Europe where most of our current users are.
I'm looking for a host that will allow us to scale to the US and Asia without latency taking its toll on performance.
Does the cloud solve the distance = latency problem?
If not, where would be the ideal hosting location for a growing startup?
Some data:
ASP.NET 3.5
SQL Server 2005
jQuery (lots of AJAX)
MVC
Thanks
The Cloud is just an abstraction. It doesn't change the underlying physical nature of the servers running your code and hosting your data. If the systems storing your data are a long way from your users, there will be some latency, no matter how you access them.
Most Cloud providers allow you to choose where you want your data - for example, Amazon S3 lets you choose to store your data in either the US or Europe - but no provider is going to be able to magically store all your data in multiple locations simultaneously.
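To illustrate the point about choosing where your data lives, here is a minimal sketch using the AWS SDK for Java and the S3 example above; the bucket name and the eu-west-1 region are arbitrary examples, and the same kind of region choice applies to most other cloud services.

```java
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class RegionalBucket {
    public static void main(String[] args) {
        // Build a client pinned to the EU (Ireland) region; buckets created through it live there.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1)
                .build();

        s3.createBucket("my-startup-eu-data"); // hypothetical bucket name
        System.out.println("Bucket region: " + s3.getBucketLocation("my-startup-eu-data"));
    }
}
```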
If you want the benefit of multiple data centres you'd have to allow simultaneous updates at each location and there is no way to synchronise such updates without knowledge of the business logic of the application, so you're going to have to write some code to do this.
You're still going to have to look at what each Cloud provider offers and work out how each can help solve your problems, but you're going to have to do some of the work yourself.
What you're looking for is CDN (Content Delivery Network) hosting for your Windows application. With a CDN, your content is cached at various POPs located across the continents, so a request coming from India is served a cached copy of the content stored on an Indian POP; the same goes for US, EU, and other clients.
This technology is still in an early phase of development, and there are two types of CDN: PUSH and PULL. PUSH means content is pushed to the POPs immediately whenever there is a change on the master server; PULL means the POP servers pull content from the master server at a regular interval, usually every 12 to 24 hours.
If your site is database-driven and frequently updated, a PUSH-type CDN will be the right choice.
