As a newbie programmer, I would like to know what the benefits are of using, for example, the Google Search API or the newer Buzz API for gathering data content instead of screen scraping (obviously apart from the legal aspects).
APIs are less likely to change than a screen layout.
One big downside of screen scraping is that the screen can change and break your scraper. So you end up having to continually adjust your code to match theirs, and since you don't know about changes ahead of time, you suffer downtime/outages as a result.
Also, you may be violating their TOS, and they won't like it. If you have paying customers for your service, you can find yourself between a rock and a hard place pretty quickly.
Also, if you're simulating many users, you'll produce an unanticipated drag on the servers. So using a published/permitted API would be much more efficient for you, and for the web site serving up the source material.
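To make the contrast concrete, here is a minimal Go sketch of consuming a published JSON API (the endpoint and response fields are made up for illustration). A scraper would instead have to parse HTML whose structure can change without notice; the API's response shape is part of its documented contract.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Result mirrors the (hypothetical) documented response shape of the API.
// Because the shape is part of the API contract, it rarely changes,
// whereas the HTML a scraper parses can change at any time.
type Result struct {
	Title string `json:"title"`
	URL   string `json:"url"`
}

func main() {
	resp, err := http.Get("https://api.example.com/search?q=golang") // hypothetical endpoint
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var results []Result
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		panic(err)
	}
	for _, r := range results {
		fmt.Println(r.Title, r.URL)
	}
}
```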
I'd like each page to display its number of visitors; using Google Analytics does not seem to be an option, since too many visitors use ad blockers.
I guess there is no smart way to do it with Go, since I doubt there is an efficient way of storing the counter in a text file. Using Datastore seems like a good option, but I suspect the cost of updating and retrieving the view count on every request is not worth it.
I was hoping someone might be able to recommend a good and cheap solution (preferably one that does not rely on third-party services outside the Google Cloud Platform).
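For what it's worth, here is a minimal sketch of what the Datastore approach could look like with the Cloud Datastore Go client (the project ID, entity kind, and page key are made up). It also makes the cost concern visible: one read plus one write per page view, which is why sharding the counter across several entities is the usual refinement to cut write contention.

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/datastore"
)

// Counter holds a single page's view count.
type Counter struct {
	Count int64
}

// incrementViews atomically bumps the counter for one page inside a
// transaction. Every page view costs one read and one write, which is
// exactly the cost concern raised above.
func incrementViews(ctx context.Context, client *datastore.Client, page string) (int64, error) {
	key := datastore.NameKey("Counter", page, nil)
	var c Counter
	_, err := client.RunInTransaction(ctx, func(tx *datastore.Transaction) error {
		if err := tx.Get(key, &c); err != nil && err != datastore.ErrNoSuchEntity {
			return err
		}
		c.Count++
		_, err := tx.Put(key, &c)
		return err
	})
	return c.Count, err
}

func main() {
	ctx := context.Background()
	client, err := datastore.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	n, err := incrementViews(ctx, client, "/about")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("views: %d", n)
}
```

The transaction keeps concurrent increments from losing updates; a sharded counter spreads those writes across several entities so a hot page does not contend on a single row.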
My app shows a map where locations (markers) are dynamically loaded via an Ajax (and database) request whenever the map bounds change.
I'm convinced that this solution is not scalable: at the moment, the Europe view shows a total of 10 markers.
If the database grows and I display, for instance, 1000 locations, that means 1000 rows would be returned to the user.
This is not a JS / UI problem, since I use the MarkerCluster plugin and I avoid redrawing markers for already-loaded locations.
I made some tweaks:
- Delay the Ajax request using the gmaps idle event
- Increase the minimal zoom level, so the entire world can't be displayed.
But this is not enough.
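For reference, the server side of that Ajax request can itself cap and filter what it returns. A minimal sketch (Go with Postgres, not necessarily the asker's stack; the table, columns, and query parameters are made up):

```go
package main

import (
	"database/sql"
	"encoding/json"
	"net/http"
	"strconv"

	_ "github.com/lib/pq" // hypothetical driver choice
)

// Marker is one row sent back to the map client.
type Marker struct {
	ID  int64   `json:"id"`
	Lat float64 `json:"lat"`
	Lng float64 `json:"lng"`
}

var db *sql.DB

// markersHandler returns only the markers inside the current map bounds,
// capped with LIMIT so a zoomed-out view can never pull thousands of rows.
func markersHandler(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query()
	parse := func(name string) float64 {
		v, _ := strconv.ParseFloat(q.Get(name), 64)
		return v
	}
	south, north := parse("south"), parse("north")
	west, east := parse("west"), parse("east")

	// Note: a bounding box straddling the antimeridian needs extra care.
	rows, err := db.Query(
		`SELECT id, lat, lng FROM locations
		 WHERE lat BETWEEN $1 AND $2 AND lng BETWEEN $3 AND $4
		 LIMIT 200`,
		south, north, west, east)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer rows.Close()

	var markers []Marker
	for rows.Next() {
		var m Marker
		if err := rows.Scan(&m.ID, &m.Lat, &m.Lng); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		markers = append(markers, m)
	}
	json.NewEncoder(w).Encode(markers)
}

func main() {
	var err error
	db, err = sql.Open("postgres", "postgres://localhost/app?sslmode=disable") // hypothetical DSN
	if err != nil {
		panic(err)
	}
	http.HandleFunc("/markers", markersHandler)
	http.ListenAndServe(":8080", nil)
}
```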
There are lots of ways to approach this, but I will just mention the two I think are most appropriate given your question.
First is to really control from your web app what information is asked for and when. You could write this all yourself in JavaScript and implement caching techniques etc. There are a number of libraries out there that do most of this work for you, though.
I would recommend one of the following:
OpenGeo SDK
OpenLayers
GeoExt
Leaflet
All of these have ways of controlling local caching, when to fetch data, and what data is requested from the server. Most of them can also be extended to add any functionality that is missing. I know the top two support Google Maps (as well as a number of other map providers).
If you need to add even more control over your data locally you could even look at implementing something like PouchDB. I think this is more suited to mobile applications or instances where the network connection is either really slow or intermittent.
This sort of solution should be able to easily handle thousands to tens of thousands of features with hundreds of users.
If you are really going to scale up to hundreds of thousands or millions of features with hundreds to thousands of users, then I would suggest adding a tile server to the solution above. The tile server will sit between your web application and your database. Most of them have lots of caching settings and optimisations for dealing with large datasets and pushing them out to a client. Because they push out tiles rather than features, the data output remains reasonably constant even as the number of features grows (see the sketch after the list below for how tile numbering works). The OpenGeo SDK and OpenLayers libraries I mentioned above can work really well with any of the following tile servers:
GeoServer
MapServer
MapGuide
Quantum GIS Server
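To give a feel for why tiles scale: the client requests a fixed grid of tile images keyed by zoom/x/y rather than individual features, so a viewport always costs roughly the same number of requests no matter how big the dataset is. Here is the standard "slippy map" tile numbering (Web Mercator) in a short Go sketch, independent of any particular tile server:

```go
package main

import (
	"fmt"
	"math"
)

// tileXY converts a lat/lng (in degrees) and zoom level to slippy-map tile
// coordinates. At zoom z the world is a 2^z x 2^z grid of tiles, so the
// number of tiles a viewport covers stays constant no matter how many
// features the database holds.
func tileXY(lat, lng float64, zoom int) (x, y int) {
	n := math.Exp2(float64(zoom))
	x = int((lng + 180.0) / 360.0 * n)
	latRad := lat * math.Pi / 180.0
	y = int((1.0 - math.Log(math.Tan(latRad)+1.0/math.Cos(latRad))/math.Pi) / 2.0 * n)
	return x, y
}

func main() {
	// Central Paris at zoom 12.
	x, y := tileXY(48.8566, 2.3522, 12)
	fmt.Printf("tile %d/%d/%d\n", 12, x, y) // maps to a URL path like /12/2074/1409.png
}
```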
If you are reluctant to do any coding, there are some offerings that work out of the box for enterprise environments. They are all expensive, and from your question I think they are probably not what you are looking for.
I have a product that gathers and displays measurements of all kinds (won't go into it). The display portion is, as one would expect, a database + website built on top of it (with Symfony).
However, we'll probably be creating an API to expose the data to third-parties as well.
Now, we have the choice of either building both the website and the API on top of the database, or building just the API on top of the database and having the website use the API.
I would greatly prefer the latter, since otherwise I'll have to adapt both the API's and the website's model layers every time the schema changes (which can happen a few times).
With the latter, I obviously only have to adapt the API's model. If the API contract stays the same, the website wouldn't need adapting.
However, obviously there is a downside in performance.
With website <-> database versus website <-> API <-> database, the former will obviously be faster.
My question is: what is your opinion on this trade-off?
I'm hoping the performance can be almost evened out, since all the machines will be on the same LAN + there will be caching. If that's the case, the ease of development would certainly make my life easier :-)
Looking forward to your opinions and experience!
If there was ever a case of premature optimization, this is it! You're not going to know the answer without more information, and I suspect very much that the performance differences between the two will be so negligible as to be irrelevant in your domain.
The best approach, IMO, is to spike on a few of your models using both approaches and see where that gets you.
No better way to make sure your API is going to be usable by others than to use it yourself. I would go website -> API -> database. Write it once; you can always tune it and "cheat" later if you have to.
Many modern websites use JavaScript (AJAX etc) and then make service calls to an API. If you took that approach you would simply have a carefully designed, reusable API layer in front of your DB.
I find that there's little or no extra effort here, and I'm sceptical that you'll incur noticeable performance penalties.
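To make the "LAN + caching" point concrete, here is a rough sketch of the website side calling the API through a small read-through cache (Go rather than the asker's Symfony stack; the internal URL and TTL are made up). With a cache like this, most page views never pay the extra hop at all:

```go
package main

import (
	"io"
	"net/http"
	"sync"
	"time"
)

// The API lives on the same LAN; the URL is hypothetical.
const apiURL = "http://api.internal:8080/api/measurements"

var (
	mu      sync.Mutex
	cached  []byte
	fetched time.Time
)

// measurements returns the API response, serving from a 30-second
// in-memory cache so repeated page views skip the API (and DB) entirely.
func measurements() ([]byte, error) {
	mu.Lock()
	defer mu.Unlock()
	if cached != nil && time.Since(fetched) < 30*time.Second {
		return cached, nil
	}
	resp, err := http.Get(apiURL)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	cached, fetched = body, time.Now()
	return cached, nil
}

func main() {
	http.HandleFunc("/dashboard.json", func(w http.ResponseWriter, r *http.Request) {
		body, err := measurements()
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	})
	http.ListenAndServe(":8081", nil)
}
```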
I definitely appreciate a good interface, and as a developer I try to create them for my users. But appreciating a good interface and designing one are different things. I'm looking for good interfaces (such as, IMHO, Stack Overflow and Gmail) as examples of good UI on which I can model my own UIs.
I personally think that Netflix has an excellent web UI. Responsive, easy to navigate. Not much CRUD going on, but I find it very comfortable.
Pretty much anything by Google, really. Their interfaces are all very simple and to the point, focusing on usability.
You should get yourself a copy of both Don't Make Me Think and The Non-Designer's Design Book for your base knowledge/insight.
From there, it's much easier for you to dissect and analyze the layouts you already know and like, and recreate them for your own amusement.
Edit: To avoid misunderstanding, the point I'm trying to make is that you probably don't need as many good examples of nice layouts if you know what to look for. For example, I could be shown a thousand haute couture dresses and I still couldn't make one myself, because I don't know what to look for.
My favorites
Stack Overflow: This is a community wiki answer, so it's not a rep-point grab. I just really love the interface on this site; I've been to too many crappy Q&A sites.
Google Reader
MSDN: It's gotten a ton better in recent years and is a great way to grab little esoteric details about various APIs
iStockPhoto.com: it's simple, effective, and handles a large amount of information and data without getting bogged down. It also doesn't get in the way of the info you are looking for.
A good user interface fulfills a specific need of its users effectively.
As an example, here is a site (translation) that I have created for finding out what food is available in the cafeterias of the University of Helsinki. The typical use case is that when a student is hungry, he needs to know what food is available in the neighborhood student cafeterias (which are cheap for students), so that he can choose where to eat and what. He knows where each of those cafeterias is, but does not know what food they have today.
That site shows all the needed information at once. Because the students typically have a couple of cafeterias where they go, they can either bookmark the page with those cafeterias selected, or save the selection as a cookie. After that they can reach their goal without any navigation on the web site.
I don't use it on a day-to-day basis, but I'm very impressed with the Perseus Project digital library.
Here's a link to a poem from Catullus' Carmina in Latin as an example of the interface. Some features that I really like:
Click on the bar near the top to jump to any poem in the work. Larger chunks of the bar represent larger sections of the work (poems, chapters, however that particular work is logically broken up by the author).
Click on a Latin word in the poem to bring up a window (be patient; it seems to take a while) with lexicon entries, user voting and statistics on the word form (i.e. what the inflection means in the context of the sentence; it can be ambiguous in Latin) and so forth.
There are a number of resources down the right column, including various English translations, notes, references, etc. Any of them can be either shown in the right column, or swapped out with whatever is in the main content area in the center.
One of my personal favs: newspond.com
I've read about systems which use the Flickr database of photos to fill in gaps in photos (http://blogs.zdnet.com/emergingtech/?p=629).
How feasible is a system like this? I was toying with the idea (not just as a way of killing time but as a good addition to something I am coding) of using Flickr to get photos of a certain entity (in this case, race tracks) and reconstruct a model from them. My biggest concern is that there may not be enough photos of a particular track, and even then it would be difficult to tell whether two photos show the same part of the racetrack, in which case one of them may be irrelevant.
How feasible is something like this? Is it worth attempting by a sole developer?
Sounds like you're wanting to build a Photosynth-style system - check out Blaise Aguera y Arcas' demo at TED back in 2007. There's a section about 4 minutes in where he builds a model of the Sagrada Família from photographs.
I say +1 for the Photosynth answer; it's a great tool. Not sure how well you could incorporate it into your own app, though.
It's definitely feasible. Anything is possible. And yes, it's doable for a single developer; it just depends on how much free time you have. It would be great to see something like this integrated into Virtual Earth or Google Maps Street View. Someone who could nail software like this could help 3D-model the entire world based purely on photographs. That would be a great product and would make any single developer rich and famous.
So get coding. :)
I have plenty of free time, as I am in between jobs.
One way to do it would be to get an overhead view of the track layout, make a blueprint based on it, then take one photo of the track and mimic the track's road colour. That would be a start.
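If it helps as a starting point, sampling a photo for its dominant colour is straightforward. A rough Go sketch (the file name is made up, and averaging the whole frame is a crude stand-in for isolating the road surface):

```go
package main

import (
	"fmt"
	"image"
	_ "image/jpeg" // register JPEG decoding for image.Decode
	"os"
)

// averageColour samples every pixel and returns the mean RGB, a crude
// first stab at "mimicking the track's road colour" from one photo.
// In practice you would average only a hand-picked region of the image.
func averageColour(img image.Image) (r, g, b uint8) {
	bounds := img.Bounds()
	var rSum, gSum, bSum, n uint64
	for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
		for x := bounds.Min.X; x < bounds.Max.X; x++ {
			pr, pg, pb, _ := img.At(x, y).RGBA() // 16-bit channels
			rSum += uint64(pr >> 8)
			gSum += uint64(pg >> 8)
			bSum += uint64(pb >> 8)
			n++
		}
	}
	return uint8(rSum / n), uint8(gSum / n), uint8(bSum / n)
}

func main() {
	f, err := os.Open("track.jpg") // hypothetical input photo
	if err != nil {
		panic(err)
	}
	defer f.Close()

	img, _, err := image.Decode(f)
	if err != nil {
		panic(err)
	}
	r, g, b := averageColour(img)
	fmt.Printf("road colour ≈ #%02x%02x%02x\n", r, g, b)
}
```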
LINQ to Flickr on CodePlex has a great API and would be helpful for your task.