Most performant live search technique for mobile safari - performance

I am building a mobile web application that targets webkit. I have a requirement to perform a live search (on keypress) against a database of ~5000 users.
I've tried a number of different techniques:
On page load, making an AJAX call which loads an in-memory representation of all 5000 users, and querying them on the client. I tried sending JSON, which proved to be too large, and also a custom delimited string, which was then parsed using split(). This was better, but ultimately searches against this array of users were slow.
I tried using a conventional AJAX call, which would return users based on a query, also using the custom delimited string technique. This was better, but I was forced to tune it so that searches were only performed with a minimum of 3 characters. This is not optimal, as I would like to be able to start filtering after 1 character. I could also throttle the calls so that not every keystroke within a certain threshold triggered a request. This could help with performance, but I'd rather not have to fiddle with that sort of thing.
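For reference, the throttling I'm describing would look roughly like this (a minimal sketch only; the endpoint name and response shape are placeholders, not my actual API):

    // Debounced live search: wait ~200 ms after the last keystroke before
    // issuing the request, so fast typing results in a single call.
    // "/api/users/search" and the string[] response are placeholders.
    let debounceTimer: number | undefined;

    function onSearchInput(query: string): void {
      if (debounceTimer !== undefined) {
        clearTimeout(debounceTimer);
      }
      debounceTimer = window.setTimeout(async () => {
        const response = await fetch(`/api/users/search?q=${encodeURIComponent(query)}`);
        const users: string[] = await response.json();
        renderResults(users); // hypothetical rendering function
      }, 200);
    }

    function renderResults(users: string[]): void {
      // ...update the result list in the DOM...
    }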
Facebook mobile does this very well if you try their friend search. Searches happen instantaneously, and are triggered after 1 character.
My question is, does anyone have any suggestions for faster live searches for a mobile app? Should I be looking at localStorage? Is this reliable, feasible?

Is there any reason you can't use a binary search? If the list is sorted, the names you're looking for will sit in one contiguous block. If you want both first and last name search, you could create a second copy of the data sorted by last name and look in both sets.
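A minimal sketch of that kind of prefix lookup (assuming the client keeps the ~5000 names pre-sorted and lower-cased; the function and parameter names are illustrative):

    // Binary search for the first name >= prefix, then walk forward while
    // names still start with the prefix. Assumes `names` is sorted and
    // lower-cased ahead of time.
    function prefixSearch(names: string[], prefix: string): string[] {
      const p = prefix.toLowerCase();
      let lo = 0;
      let hi = names.length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        if (names[mid] < p) {
          lo = mid + 1;
        } else {
          hi = mid;
        }
      }
      const matches: string[] = [];
      for (let i = lo; i < names.length && names[i].startsWith(p); i++) {
        matches.push(names[i]);
      }
      return matches;
    }

Each keystroke then costs a handful of string comparisons plus a walk over the matching block, which stays fast even at 5000 entries.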
Some helpful but more complicated data structures that address this type of problem include:
http://en.wikipedia.org/wiki/Directed_acyclic_word_graph
http://en.wikipedia.org/wiki/Trie

Related

Get number of results for Google Search Appliance query

We are trying to implement a feature on our search results page similar to dynamic navigation, but different enough that the included dynamic navigation feature cannot be used to meet our business requirements. We've got everything working the way we need, except for one thing.
Using "RES/M" will display the number of results for the current search, but we need the number for a different search, so that we could display that number before the user was to try that search, very similar to dynamic navigation, and if the number is 0, to not enable that other search.
For Example:
Current Search (234)
Search 2 (2)
Search 3 (67)
Search 4 (0)
Is there any way to get the number of results for a search other than the search that's being run at the time?
As far as I know, the only way to do this is to put one AJAX call in your frontend for each additional search that you need.
This is not a clean solution, but it is the only one I know of: for one of our customers we implemented a similar solution using jQuery and it works very well.
Be careful about the number of parallel AJAX requests you make, because the GSA has a maximum number of concurrent requests it can handle, depending on the version.
Finally, if you want an accurate count for each additional search, you can use the parameter rc=1; see Appendix A ("Estimated vs. Actual Number of Results") here: http://static.googleusercontent.com/media/www.google.com/it//support/enterprise/static/gsa/docs/admin/70/gsa_doc_set/xml_reference/xml_reference.pdf. But be careful, because this parameter can add significant latency to the search.
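For illustration, each extra count lookup could be a small request along these lines (a sketch only; I'm assuming the GSA's XML output, where the estimated total appears in the <M> element, and the host, frontend and collection names below are placeholders):

    // Fetch the estimated result count for one additional query.
    // The host, client (frontend) and site (collection) values are placeholders.
    async function fetchResultCount(query: string): Promise<number> {
      const url = "http://gsa.example.com/search"
        + "?q=" + encodeURIComponent(query)
        + "&output=xml&client=default_frontend&site=default_collection&num=1";
      // Append "&rc=1" here if you need the actual rather than estimated count,
      // at the cost of the extra latency mentioned above.
      const response = await fetch(url);
      const xml = new DOMParser().parseFromString(await response.text(), "text/xml");
      const m = xml.querySelector("M"); // estimated total; absent when there are no results
      return m ? parseInt(m.textContent ?? "0", 10) : 0;
    }

    // Run the additional searches in parallel, but keep the batch small because
    // of the GSA's concurrent-request limit.
    function fetchAllCounts(queries: string[]): Promise<number[]> {
      return Promise.all(queries.map(fetchResultCount));
    }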
Regards

How do you RESTfully get a complicated subset of records?

I have a question about getting 'random' chunks of available content from a RESTful service, without duplicating what the client has already cached. How can I do this in a RESTful way?
I'm serving up a very large number of items (little articles with text and urls). Let's pretend it's:
/api/article/
My (software) clients want to get random chunks of what's available. There are too many to load them all onto the client. They do not have a natural order, so it's not a situation where they can just ask for the latest. Instead, there are around 6-10 attributes that the client may give to 'hint' what type of articles they'd like to see (e.g. popular, recent, trending...).
Over time the clients get more and more content, but at the server I have no idea what they have already, and because they're sent randomly, I can't just pass in the 'most recent' one they have.
I could conceivably send up the GUIDS of what's stored locally. The clients only store 50-100 locally. That's small enough to stuff into a POST variable, but not into the GET query string.
What's a clean way to design this?
Key points:
Data has no logical order
Clients must cache the content locally
Each item has a GUID
Want to avoid pulling down duplicates
You'll never be able to make this work satisfactorily if the data is truly kept in a random order (bear in mind the Dilbert RNG Effect); you need to fix the order for a particular client so that they can page through it properly. That's easy to do though; just make that particular ordering be a resource itself; at that point, you've got a natural (if possibly synthetic) ordering and can use normal paging techniques.
The main thing to watch out for is that you'll be creating a resource in response to a GET when you do the initial query: you probably should use a resource name that is a hash of the query parameters (including the client's identity if that matters) so that if someone does the same query twice in a row, they'll get the same resource (so preserving proper idempotency). You can always delete the resource after some timeout rather than requiring manual disposal…
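A rough sketch of that shape (assuming an Express-style server; the routes, hash helper and in-memory store are all illustrative, and a real service would persist and expire the feeds):

    import express from "express";
    import { createHash } from "crypto";

    const app = express();

    // feedId -> frozen ordering of article GUIDs for that query
    const feeds = new Map<string, string[]>();

    // GET /api/feed?hint=popular : create (or reuse) a fixed ordering for the query.
    // Hashing the query parameters keeps repeated queries pointing at the same resource.
    app.get("/api/feed", (req, res) => {
      const hint = String(req.query.hint ?? "");
      const feedId = createHash("sha1").update(hint).digest("hex");
      if (!feeds.has(feedId)) {
        feeds.set(feedId, buildOrderingFor(hint));
      }
      res.json({ feed: `/api/feed/${feedId}` });
    });

    // GET /api/feed/:id?page=0 : page through the frozen ordering.
    app.get("/api/feed/:id", (req, res) => {
      const ordering = feeds.get(req.params.id);
      if (!ordering) {
        res.sendStatus(404);
        return;
      }
      const page = Number(req.query.page ?? 0);
      const pageSize = 50;
      res.json({ items: ordering.slice(page * pageSize, (page + 1) * pageSize) });
    });

    // Placeholder: however you select and shuffle articles for a hint.
    function buildOrderingFor(hint: string): string[] {
      return [];
    }

    app.listen(3000);

The client then just remembers which page of its feed it has reached, and duplicates never come down because the ordering is fixed for the lifetime of the feed resource.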

Search function for Windows Phone app

I am kind of a newbie at programming (have worked a bit with Delphi years back) but have started to build an application for Windows Phone 7.5 Mango, as I have a great idea for an app :D
In the application the user should be able to pick different locations from a list (a very large list, 5k+ items). To make sure that all users always get the latest list, I have created a SQL query on my website that generates the list as XML, which I load into the application via HttpWebRequest. I am not quite sure what best practice is when dealing with a large list that will be updated frequently.
That is not the main question though, because this seems to work pretty okay. My real question is: how do I add a search function to my application, so the user can search for a location instead of scrolling through the entire list?
My SQL table is built up with ID, Country, State, Region, City (and a few more fields that are irrelevant for a search function).
I do not know what the best way to approach this is. Should I run the query on my website, generate the result as XML and use HttpWebRequest to get the result to the phone, or should the search run on the device against the entire list? And if so, how do I do that?
Thank you ;-)
First of all, I have to point out that fetching a list with 5k+ items on a smartphone that is not on a wireless network will take a while. So, in my opinion, downloading the whole list would be a huge waste of traffic if the user is only interested in a few items. That basically means you are downloading a bunch of data but only using 0.01% of it, which is not the way you should build a program.
So, in my opinion, you should create a web service that the user can call with an HTTP request, passing the search text as a parameter. Then you basically use that parameter in a SQL search query, which could be a stored procedure or just inline SQL; I don't know how your server/database is built and structured, so choose whatever fits.
Here is an example:
SELECT * FROM TABLE_NAME WHERE (ID=@ID) OR (Country=@Country) OR (State=@State) OR (Region=@Region) OR (City=@City)
If I were building this application I would use two parameters: one representing the user input, in other words the search text, and one that says which field the search applies to (ID, Country, State, Region or City?).
You have to register a handler for TextBox.TextChanged with some logic to filter queries that happen too often (for example, typing the name John may cause 4 requests: J, Jo, Joh, John). This can be done with a System.Threading.Timer with a delayed start, resetting its start time whenever the user types a new character (if you need an example - ask).
Then I recommend using a WCF service to "talk" to the SQL database. In the WCF service, use any ORM (Entity Framework is the simplest) to query your database.

Does soCaseInsensitive greatly impact performance for a TdxMemIndex on a TdxMemDataset?

I am adding some indexes to my DevExpress TdxMemDataset to improve performance. The TdxMemIndex has SortOptions which include the option for soCaseInsensitive. My data is usually a GUID string, so it is not case sensitive. I am wondering whether I am better off just forcing all the data to the same case, or whether the soCaseInsensitive flag (and using the loCaseInsensitive flag with the call to Locate) carries only a minor performance penalty (roughly equal to converting the case of my string every time I need to use the index).
At this point I am leaving soCaseInsensitive off and just converting case.
IMHO, the best approach is to ensure data quality at Post time. Reasoning:
You (usually) know the nature of the data, so you can, for example, use UpperCase (knowing that GUIDs are all in the ASCII range) instead of the much slower AnsiUpperCase, which a general component like TdxMemDataSet is forced to use.
You enter the data only once, whereas searching/sorting/filtering, which all go through TdxMemDataSet's internal uppercasing engine, are repeated actions. Also, there are other chained actions which will trigger this engine without you realizing it (e.g. a TcxGrid which is sorted by default, has GridMode:=True (I assume you use the DevExpress components), and has a class acting as a broker passing the sort message to the underlying dataset).
Usually data entry is done in steps, one or a few records per batch. The only notable exception is data acquisition applications. But in both cases users will tolerate much greater response times there, which gives you room to play with (IOW, how much would an UpperCase call add to a record post which lasts 0.005 ms?). OTOH, users are very demanding about the speed of data retrieval operations (searching, sorting, filtering etc.). Keep data retrieval as fast as you can.
Having the data in the database already in the form you want to expose reduces the risk of processing errors when you write (if you write) other modules, since you would otherwise need to remember to AnsiUpperCase the data in every module, in every language, that you write. A classic example here is when you use external tools to access the data (for example, a DB manager executing an SQL SELECT over the data).
hth.
Maybe the DevExpress forums (or even a support email, if you have access to it) would be a better place to seek an authoritative answer to that performance question.
Anyway, it is better to guarantee that the data is in the format you want, for the reasons plainth already explained, at the moment you save it. So, in this specific case, make sure the GUID is written in upper (or lower, it's a matter of taste) case. If it is SQL Server or another database server that has a GUID datatype, let the SELECT do the work; if applicable and possible, even the sort.

Fast Text Search Over Logs

Here's the problem I'm having, I've got a set of logs that can grow fairly quickly. They're split into individual files every day, and the files can easily grow up to a gig in size. To help keep the size down, entries older than 30 days or so are cleared out.
The problem is when I want to search these files for a certain string. Right now, a Boyer-Moore search is too slow to be practical. I know that applications like dtSearch can provide a really fast search using indexing, but I'm not sure how to implement that without taking up twice the space a log already takes up.
Are there any resources I can check out that can help? I'm really looking for a standard algorithm that'll explain what I should do to build an index and use it to search.
Edit:
Grep won't work as this search needs to be integrated into a cross-platform application. There's no way I'll be able to swing including any external program into it.
The way it works is that there's a web front end that has a log browser. This talks to a custom C++ web server backend. This server needs to search the logs in a reasonable amount of time. Currently searching through several gigs of logs takes ages.
Edit 2:
Some of these suggestions are great, but I have to reiterate that I can't integrate another application; it's part of the contract. But to answer some questions: the data in the logs ranges from messages received in a health-care-specific format to messages relating to those. I'm looking to rely on an index because, while it may take up to a minute to rebuild, searching currently takes a very long time (I've seen it take up to 2.5 minutes). Also, a lot of the data IS discarded before it is even recorded: unless some debug logging options are turned on, more than half of the log messages are ignored.
The search basically goes like this: a user on the web form is presented with a list of the most recent messages (streamed from disk as they scroll, yay for AJAX). Usually they'll want to search for messages containing some information, maybe a patient ID or some string they've sent, so they enter that string into the search. The search is sent asynchronously and the custom web server linearly searches through the logs 1MB at a time for results. This process can take a very long time when the logs get big, and it's what I'm trying to optimize.
grep usually works pretty well for me with big logs (sometimes 12G+). You can find a version for windows here as well.
Check out the algorithms that Lucene uses to do its thing. They aren't likely to be very simple, though. I had to study some of these algorithms once upon a time, and some of them are very sophisticated.
If you can identify the "words" in the text you want to index, just build a large hash table of the words which maps a hash of the word to its occurrences in each file. If users repeat the same search frequently, cache the search results. When a search is done, you can then check each location to confirm the search term falls there, rather than just a word with a matching hash.
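As a rough sketch of that word index (a concept sketch only, shown in TypeScript rather than the C++ backend; the tokenizer and posting format are deliberately simplified):

    // Inverted index: word -> list of (file, byte offset) postings.
    // To search, look up each query word, then read the line at the recorded
    // offset to confirm the term really occurs there (this also filters out
    // tokenizer/hash collisions, as suggested above).
    type Posting = { file: string; offset: number };

    const index = new Map<string, Posting[]>();

    function addLine(file: string, offset: number, line: string): void {
      for (const word of line.toLowerCase().split(/\W+/)) {
        if (!word) continue;
        let postings = index.get(word);
        if (!postings) {
          postings = [];
          index.set(word, postings);
        }
        postings.push({ file, offset });
      }
    }

    function lookup(word: string): Posting[] {
      return index.get(word.toLowerCase()) ?? [];
    }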
Also, who really cares if the index is larger than the files themselves? If your system is really this big, with so much activity, is a few dozen gigs for an index the end of the world?
You'll most likely want to integrate some type of indexing search engine into your application. There are dozens out there, Lucene seems to be very popular. Check these two questions for some more suggestions:
Best text search engine for integrating with custom web app?
How do I implement Search Functionality in a website?
More details on the kind of search you're performing could definitely help. Why, in particular, do you want to rely on an index, since you'll have to rebuild it every day when the logs roll over? What kind of information is in these logs? Can some of it be discarded before it is ever even recorded?
How long are these searches taking now?
You may want to check out the source for BSD grep. You may not be able to rely on grep being there for you, but nothing says you can't recreate similar functionality, right?
Splunk is great for searching through lots of logs. May be overkill for your purpose. You pay according to the amount of data (size of the logs) you want to process. I'm pretty sure they have an API so you don't have to use their front-end if you don't want to.
