Dynamically loaded Markers: DDOS prevention - performance

My app shows a map where locations (or markers) are dynamically loaded via an AJAX (and database) request every time the map bounds change.
I'm convinced that this solution is not scalable: at the moment, the Europe area shows a total of 10 markers.
If the database grows and I display, for instance, 1000 locations, that means 1000 rows would be returned to the user.
This is not a JS / UI issue, since I use the MarkerCluster plugin and I avoid redrawing markers for already-loaded locations.
I made some tweaks:
- Delay the AJAX request using the gmaps idle event
- Increase the minimum zoom level, so the entire world can't be displayed at once.
But this is not enough.
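For reference, the idle-event tweak looks roughly like this. It is only a sketch: it assumes jQuery, an already-initialised map and MarkerClusterer, and a hypothetical /markers endpoint.

```js
// Sketch: request only the markers inside the current viewport, once the map
// has settled (the "idle" event fires after a pan/zoom ends).
// Assumes `map` and `markerClusterer` are already set up; /markers is a placeholder.
var loadedIds = {};

google.maps.event.addListener(map, 'idle', function () {
  var bounds = map.getBounds();
  var sw = bounds.getSouthWest();
  var ne = bounds.getNorthEast();

  $.getJSON('/markers', {
    swLat: sw.lat(), swLng: sw.lng(),
    neLat: ne.lat(), neLng: ne.lng()
  }, function (locations) {
    locations.forEach(function (loc) {
      if (loadedIds[loc.id]) return; // already drawn on a previous request
      loadedIds[loc.id] = true;
      markerClusterer.addMarker(new google.maps.Marker({
        position: new google.maps.LatLng(loc.lat, loc.lng)
      }));
    });
  });
});
```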

There are lots of ways to approach this, but I will just describe the two I think are most appropriate given your question.
The first is to really control, from your web app, what information is requested and when. You could write this all yourself in JavaScript and implement caching techniques etc., but there are a number of libraries out there that do most of this work for you.
I would recommend one of the following:
OpenGeo SDK
OpenLayers
GeoExt
Leaflet
All of these have ways of controlling local caching, when to fetch data, and what data is gathered from the server. Most of them can also be extended to add any functionality that is missing. The top two, as far as I know, support Google Maps as a base layer (as well as a number of other providers).
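To give an idea of what that client-side control looks like, here is a rough Leaflet sketch that only requests features for the current bounding box and remembers which boxes it has already fetched; the /features endpoint returning GeoJSON is an assumption, not something these libraries provide for you.

```js
// Rough sketch with Leaflet: request features for the visible bounds only,
// and keep a small cache of bounding boxes that have already been loaded.
// The /features endpoint (returning GeoJSON) is a placeholder.
var map = L.map('map').setView([48.85, 2.35], 6);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);

var fetched = {};                           // bbox strings already requested
var featureLayer = L.geoJSON().addTo(map);

function loadVisibleFeatures() {
  var bbox = map.getBounds().toBBoxString();  // "west,south,east,north"
  if (fetched[bbox]) return;                  // this view was already loaded
  fetched[bbox] = true;

  fetch('/features?bbox=' + encodeURIComponent(bbox))
    .then(function (res) { return res.json(); })
    .then(function (geojson) { featureLayer.addData(geojson); });
}

map.on('moveend', loadVisibleFeatures);
loadVisibleFeatures();
```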
If you need to add even more control over your data locally you could even look at implementing something like PouchDB. I think this is more suited to mobile applications or instances where the network connection is either really slow or intermittent.
This sort of solution should be able to easily handle 1000's to 10000's of features with 100's of users.
If you are really going to scale up to 100000's to 1000000's of features with 100's to 1000's of users, then I would suggest adding a tile server to the solution above. The tile server will sit between your web application and your database. Most of them have lots of caching settings and optimizations for dealing with large datasets and pushing them out to a client. Because they push out tiles rather than features, the data output remains reasonably constant even as the number of features grows. The OpenGeo SDK and OpenLayers libraries I mentioned above can work really well with any of the following tile servers (a small example of consuming one from the client follows the list):
GeoServer
Mapserver
MapGuide
Quantum GIS Server
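As an example of what the client side of that looks like, here is a minimal Leaflet sketch pointing at a GeoServer WMS endpoint; the URL and layer name are placeholders for your own setup.

```js
// Sketch: consume rendered tiles from a tile server (GeoServer WMS here)
// instead of raw features. URL and layer name are placeholders.
var map = L.map('map').setView([48.85, 2.35], 6);

L.tileLayer.wms('https://example.com/geoserver/wms', {
  layers: 'myworkspace:locations',   // hypothetical workspace:layer
  format: 'image/png',
  transparent: true
}).addTo(map);
```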
If you are reluctant to do any coding, there are some offerings that work out of the box for enterprise environments. They are all expensive, and from your question I think they are probably not what you are looking for.

Conditional Incremental builds in Nextjs

Context
I am learning Nextjs, a framework for developing React applications quickly by providing many features, such as Server-Side Rendering and Fast Refresh, out of the box without any configuration. It also provides the option to generate some web pages statically, so they are pre-rendered at build time instead of being rendered on demand; it achieves this by querying the data required for the page at build time. Nextjs also accepts an optional argument, expressed in seconds, after which the data is re-queried and the page re-rendered. All of this happens at the page level rather than rebuilding the entire website.
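The feature I am referring to looks roughly like this; it is only a sketch, and fetchProducts and the page component are placeholders of mine:

```js
// Sketch of the Nextjs feature described above: the page is generated at build
// time and re-generated at most every `revalidate` seconds.
// `fetchProducts` is a hypothetical data-layer call.
export async function getStaticProps() {
  const products = await fetchProducts();

  return {
    props: { products },
    revalidate: 60, // re-query the data and re-render this page at most once per minute
  };
}

export default function ProductsPage({ products }) {
  return (
    <ul>
      {products.map((p) => (
        <li key={p.id}>{p.name}</li>
      ))}
    </ul>
  );
}
```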
Problems
1. We cannot know in advance how frequently the data will change; it may change after 1 second or after 10 minutes, and it is extremely hard to predict. It is most certainly not a constant number of seconds. With this approach, I might show outdated information because the interval is too long, or I might end up querying the database unnecessarily even though the data hasn't changed.
2. Suppose I have implemented some sort of pagination and I want to exploit the fact that most users only visit the first few pages before following a different link. I could statically pre-render the first 1000 pages, so the most visited pages are served statically without going to the database, whereas the rest are server-side rendered. Now, if my data changes frequently, I would have to re-render the first 1000 pages at regular intervals, and each page would issue a separate query against the same database or external API, which would cause too many round trips. I am not aware of the internals of Nextjs, but I suspect this is true because Nextjs does not assume anything about the function that pulls the data, and a generic implementation would necessitate it.
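For reference, the pagination setup I am describing would look roughly like this; fetchPage is a placeholder:

```js
// Sketch of the pagination idea: pre-render the first 1000 pages at build
// time, render the rest on demand. `fetchPage` is a hypothetical query.
export async function getStaticPaths() {
  return {
    paths: Array.from({ length: 1000 }, (_, i) => ({
      params: { page: String(i + 1) },
    })),
    fallback: 'blocking', // pages beyond the first 1000 are rendered on request
  };
}

export async function getStaticProps({ params }) {
  const items = await fetchPage(Number(params.page)); // separate query per page
  return {
    props: { items },
    revalidate: 60, // each of the 1000 pages re-queries independently
  };
}
```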
Attempted Solution
Both problems can be solved by client- or server-side rendering, because the data would then be fetched on demand, but we lose the main benefit of static generation, namely serving static assets instead of querying the database. I believe static generation is most useful when mutations to my data happen infrequently most of the time, but we still want to show the updated information as fast as we can when it becomes available.
If I forget about Nextjs for a while, both problems could be solved by spawning a new process that listens for mutations to the relevant data and only rebuilds the static assets that need to be updated; kind of like how React updates components, but on the server side. However, Nextjs offers a lot of functionality that would be difficult to replicate, so I cannot use this approach.
If I want to use Nextjs, problem (1) seems impossible to solve due to a (perceived?) limitation of Nextjs, which only offers one way to rebuild static pages: periodically re-rendering them after a predetermined time. However, (2) can be solved by using some sort of in-memory cache that pulls all the required data from the data store in one round trip and structures it for every page. Then every page pulls its data from this cache instead of the database. However, it looks like a hack to me.
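To make (2) concrete, the cache I have in mind would be something like the module below, shared by all the getStaticProps calls running in the same process; all names here are placeholders of mine:

```js
// cache.js -- naive in-memory cache shared by getStaticProps calls running in
// the same Node.js process. All names are placeholders for illustration.
let cached = null;
let fetchedAt = 0;
const TTL_MS = 60 * 1000;

async function fetchEverythingInOneQuery() {
  // hypothetical: one bulk query / API call returning the data for every page
  return {};
}

export async function getAllPagesData() {
  const now = Date.now();
  if (cached && now - fetchedAt < TTL_MS) {
    return cached; // reuse the single round trip for every page
  }
  cached = await fetchEverythingInOneQuery();
  fetchedAt = now;
  return cached;
}
```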
Questions
Are there other ways to deal with the problem that I might have missed?
Is there a built-in way to deal with problem (1) and (2) in Nextjs?
Is my assessment of attempted solutions and their viability correct?

LocalStorage, several Ajax requests or huge Ajax request?

I'm facing a really huge issue (at least for me). I'll give you some background.
I'm developing a mobile web app which will show information about bus stations and bike stations in my city. It will show GMap markers for both bus and bike stops and, if you click one, you will see info about the arrival time of the bus or how many bikes are available. I have that covered pretty nicely.
The problematic part is loading the stations.
My first approach was to load the whole set of stations at page load. That is about 200 KB of JSON, plus the time it takes to iterate through the array and put the markers on the map. They are loaded hidden, so when the user clicks on a line name the corresponding stations appear via the 'findMarker' function. If other stations were present on the map, they get hidden to avoid having too many markers on the map.
This worked well on newer phones such as the iPhone 4 or a brand-new HTC, but it doesn't perform well on two-year-old phones.
The second approach I thought of was to load them on request, so you click on a station, it is loaded onto the map and then shown. But this leads to two problems:
- The first is that you may (or may not) end up performing several requests that return the same data.
- The second is that, to avoid having too many markers, they have to be hidden or deleted, so you may be deleting info that will be needed again in a while, like bike stations, which are loaded as a group and not by line.
Finally, I thought about using LocalStorage to store them, as they won't change that much, but that is a large amount of data and would be a pain to retrieve as key-value pairs. Also (and I'm not really sure about this), I could run into devices with no support for this feature and have to fall back to one of the other options.
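For reference, the LocalStorage option I have in mind would look roughly like this; it is only a sketch, it assumes jQuery, and the /stations endpoint is a placeholder:

```js
// Sketch: store the whole station list as one JSON string instead of one key
// per station, with a feature check and a fallback to a plain AJAX request.
// The /stations endpoint is a placeholder.
function loadStations(callback) {
  var KEY = 'stations-v1';

  if (window.localStorage) {
    var cached = localStorage.getItem(KEY);
    if (cached) {
      callback(JSON.parse(cached));
      return;
    }
  }

  $.getJSON('/stations', function (stations) {
    if (window.localStorage) {
      try {
        localStorage.setItem(KEY, JSON.stringify(stations)); // ~200 KB fits well under typical quotas
      } catch (e) {
        // quota exceeded or private mode: just skip caching
      }
    }
    callback(stations);
  });
}
```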
So, that said, I thought that may be someone faced some similar problem and solved it in some way or have some tips for me :).
Any help would be really appreciated.
The best approach depends on the behaviour of your users. Do they typically click on a lot of stations or just a few? Do they prefer a faster startup (on-demand loading) or a more responsive station detail display (pre-loading data)?
An approach worth investigating would be to load the data by request but employ browser caching (ETag and Expires headers) to avoid retrieving the same information over and over again. This would solve both your concerns without dealing with LocalStorage.
See this question for the different approaches to browser caching: ETag vs Header Expires
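A minimal sketch of the server side of that, assuming a Node/Express backend (your actual stack is not specified in the question, and getStations is a placeholder):

```js
// Sketch: conditional requests with ETag plus a short cache lifetime.
// Assumes Node/Express; getStations() is a hypothetical data lookup.
const express = require('express');
const crypto = require('crypto');
const app = express();

function getStations() {
  return [{ id: 1, name: 'Main St' }]; // placeholder data
}

app.get('/stations', (req, res) => {
  const body = JSON.stringify(getStations());
  const etag = crypto.createHash('md5').update(body).digest('hex');

  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end(); // the browser already has this version
  }

  res.set('ETag', etag);
  res.set('Cache-Control', 'max-age=300'); // also allow 5 minutes without revalidation
  res.type('application/json').send(body);
});

app.listen(3000);
```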

Performance Issue with Doctrine, PostGIS and MapFish

I am developing a WebGIS application using Symfony with the MapFish plugin http://www.symfony-project.org/plugins/sfMapFishPlugin
I use the GeoJSON produced by MapFish to render layers through OpenLayers, in a vector layer of course.
When I show layers with up to 3k features everything works fine. When I try layers with 10k features or more, the application crashes. I don't know the exact threshold, because my layers either have 2-3k features or 10-13k features.
I think the problem is related to Doctrine, because the last entry in the log is something like:
Sep 02 13:22:40 symfony [info] {Doctrine_Connection_Statement} execute :
and then the query to fetch the geographical records.
As I said, I think the problem is the number of features, so I used OpenLayers.Strategy.BBOX() to decrease the number of features fetched and shown. The result is the same: the app seems stuck while executing the query.
If I add a limit to the query string used to get the features' GeoJSON, the application works. So I don't think this is related to the MapFish plugin, but to Doctrine.
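For reference, this is roughly how the BBOX strategy is wired up on my side; the URL is a placeholder for the MapFish GeoJSON endpoint:

```js
// Rough sketch of the BBOX strategy: only features intersecting the current
// view extent are requested. Assumes `map` is an existing OpenLayers.Map;
// the URL is a placeholder.
var vectorLayer = new OpenLayers.Layer.Vector("features", {
    strategies: [new OpenLayers.Strategy.BBOX()],
    protocol: new OpenLayers.Protocol.HTTP({
        url: "/geojson/features",
        format: new OpenLayers.Format.GeoJSON()
    })
});
map.addLayer(vectorLayer);
```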
Does anyone have any enlightenment?
Thanks!
Even if it is theoretically possible, it's a bad idea to try to show so many vector features on a map.
You'd better change the way features are displayed (e.g. raster for low zoom levels, fetching features on click…).
Even if your service answers in a reasonable time, your browser will be stuck, or at least will have very bad performance…
I'm the author of sfMapFishPlugin, and I have never tried to query so many features, let alone tried to show them on an OL map.
Check out the OpenLayers FAQ on this subject: http://trac.osgeo.org/openlayers/wiki/FrequentlyAskedQuestions#WhatisthemaximumnumberofCoordinatesFeaturesIcandrawwithaVectorlayer . It is a bit outdated given recent browser improvements, but 10k vector features on a map is not reasonable.
HTH,

Which is better: {REST API, website} --> {database}, or {website} --> {REST API} --> {database}?

I have a product that gathers and displays measurements of all kinds (won't go into it). The display portion is, as one would expect, a database + website built on top of it (with Symfony).
However, we'll probably be creating an API to expose the data to third-parties as well.
Now, we have the choice of either building both the website and the API on top of the database, or building just the API on top of it and having the website consume the API.
I would greatly prefer the latter, since otherwise I'll have to adapt both model layers (for the API and the website) every time the schema changes (which can happen a few times).
With the latter, I obviously have the advantage of only adapting the API model; if the API contract stays the same, the website wouldn't need adapting.
However, there is obviously a downside in performance: website <-> database will clearly be faster than website <-> API <-> database.
My question is: what is your opinion on this trade-off?
I'm hoping the performance can be almost evened out, since all the machines will be on the same LAN + there will be caching. If that's the case, the ease of development would certainly make my life easier :-)
Looking forward to your opinions and experience!
If there was ever a case of premature optimization, this is it! You're not going to know the answer without more information, and I suspect very much that the performance differences between the two will be so negligible as to be irrelevant in your domain.
The best approach, IMO, is to spike on a few of your models using both approaches and see where that gets you.
There is no better way to make sure your API is going to be usable by others than to use it yourself. I would go website -> API -> database. Write it once; you can always tune it and "cheat" later if you have to.
Many modern websites use JavaScript (AJAX etc) and then make service calls to an API. If you took that approach you would simply have a carefully designed, reusable API layer in front of your DB.
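As a minimal illustration of that approach (the endpoint and response shape are assumptions, not part of the question):

```js
// Sketch: the website's front end consumes the same REST API that third
// parties will use. Endpoint and response fields are placeholders.
fetch('/api/v1/measurements?limit=50')
  .then((res) => res.json())
  .then((measurements) => {
    const list = document.getElementById('measurements');
    measurements.forEach((m) => {
      const li = document.createElement('li');
      li.textContent = m.sensor + ': ' + m.value + ' ' + m.unit;
      list.appendChild(li);
    });
  });
```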
I find that there's little or no extra effort here, and I'm sceptical that you'll incur noticeable performance penalties.

search APIs versus screen scraping

As a newbie programmer, I would like to know what the benefits are of using, for example, the Google Search API or the newer Buzz API for gathering data, instead of screen scraping; obviously apart from the legal aspects.
APIs are less likely to change than a screen layout.
One big downside of screen scraping is that the screen can change and break your scraper. So you end up having to continually adjust your code to match theirs, and since you don't know about changes ahead of time, you suffer downtime/outages as a result.
Also, you may be violating their TOS, and they won't like it. If you have paying customers for your service, you can find yourself between a rock and a hard place pretty quickly.
Also, if you're simulating many users, you'll produce an unanticipated drag on the servers. So using a published/permitted API would be much more efficient for you, and for the web site serving up the source material.
