"Sounds like" matching based on Soundex or Metaphone is a fairly common option for proprietary full-text search in databases (Oracle, MS SQL Server) and in open source search engines such as Lucene.
I'm having difficulty using Google :) to find out whether anything similar exists for advanced Google search. Wildcard search seems to be only implied (through stemming, which doesn't always return everything a true wildcard would), but what about "sounds like"? Is anything similar available only at Google App level, but not on the Google website itself?
Google has a spell-checking algorithm, which includes sounds-like, typos, and common misspellings. It usually shows the corrected spelling as the "Did You Mean" link at the top of the results page, and sometimes it automatically shows the search results of the suggested spelling as well, as in: http://www.google.com/search?q=haskel+programing+language.
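To make the "sounds like" idea from the question concrete, here is a minimal sketch of classic American Soundex in Python. It only illustrates the kind of phonetic key that database and Lucene-style "sounds like" options build on; it is not a description of how Google's spell-checker works, and the function name soundex is simply chosen for this example.

```python
import string

# Letter-to-digit table of classic American Soundex.
# Vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key of word ('' if it has no ASCII letters)."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is emitted again
    return (result + "000")[:4]


print(soundex("Haskell"), soundex("Haskel"))  # both spellings share one key
```

Two spellings are treated as "sounding alike" when their keys are equal, which is why a misspelling such as "haskel" can still be matched to "haskell".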
I'm facing a weird problem with Google search. When I search for my website using the keywords "dardasha newspaper" ... I get the expected result: my site comes first, with site-links included.
https://www.google.com/search?q=dardasha+newspaper&ie=utf-8&oe=utf-8
But when I search for my website using the keywords "جريدة دردشة", I get the correct result but without site-links:
https://www.google.com/search?q=dardasha+newspaper&ie=utf-8&oe=utf-8#q=%D8%AC%D8%B1%D9%8A%D8%AF%D8%A9+%D8%AF%D8%B1%D8%AF%D8%B4%D8%A9
Yet my website's language is Arabic, the language used for the second search. ... Why do the search results differ depending on the keywords used?
Google expands a result with site-links when you search for the website's domain name or something very close to it.
Your website is www.dardashanewspaper.com and you searched for "dardasha newspaper", which is the domain name.
Another problem is that Google thinks that "dardasha" in Arabic is درداشا, not دردشة.
The comment in the following post is particularly helpful in understanding part of the algorithm.
How does Chrome update URL bar completions?
Yet questions remain. I ran some experiments in Chrome:
When I type "eddit", it only suggests "reddit" as a general Google search, whereas if I type "reddit" in full, the reddit URL from my history pops up.
If I type a substring of "facebook", "google" or "youtube" (say "ceb", "ogl" or "utu"), the URLs pop up successfully. Hence tries cannot be the (only) data structure used here.
Furthermore, I know Chrome uses SQLite's FTS for full-text search (SQLite attributes FTS3/4 to Google). So I guess that Chrome keeps an inverted index of URLs in SQLite.
My question is:
How does Chrome manage to autocomplete "utu" -> "youtube" (based on my local history URLs)?
I know a suffix array/tree can match substrings efficiently. But finding the particular word "youtube" will be linear.
I guess a tailored tokenizer (for FTS3/4) could achieve this. Say "google" -> {"g",..., "gle", ..}. But that would generate too many tokens.
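The tokenizer idea in the question can be sketched as a small character-trigram inverted index in Python. This is only an illustration of the technique under my own assumptions (fixed n-gram length of 3, in-memory sets instead of SQLite FTS, a hypothetical TrigramIndex class); nothing here is taken from Chrome's actual implementation. Using a single fixed n-gram length is one way to limit the "too many tokens" problem, because a string of length m yields at most m - 2 trigrams.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of all character n-grams of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class TrigramIndex:
    """Hypothetical in-memory inverted index: trigram -> set of URLs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url):
        for gram in ngrams(url.lower()):
            self.postings[gram].add(url)

    def search(self, query):
        query = query.lower()
        grams = ngrams(query)
        if not grams:
            # Queries shorter than 3 characters are not covered by this sketch.
            return set()
        # A URL can only contain the query if it contains every trigram of it.
        candidates = set.intersection(
            *(self.postings.get(gram, set()) for gram in grams))
        # Trigrams may occur in a different order, so verify the real substring.
        return {url for url in candidates if query in url.lower()}


index = TrigramIndex()
for url in ("youtube.com", "facebook.com", "google.com"):
    index.add(url)
print(index.search("utu"))
```

The lookup touches only the posting lists of the query's trigrams, so matching "utu" against "youtube.com" does not require scanning every stored URL.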
I'm curious what programming term or methodology is used when Google shows you the "did you mean" link for a word that is made up of multiple words.
For example, if I type in "redflower.jpg", it knows to break that up into "Red Flower".
Is there a common paradigm for doing that sort of operation? Would a Lucene search give you that?
thanks!
If Google does not see many matching results for redflowers.jpg, it might then try to split the word into multiple words until it finds a lot of matching results.
It might also recognize the extension (.jpg) as an image extension and then try to find images with a similar name.
If I had to write an algorithm like this, I would use a huge EXISTING database (either a dictionary or a search engine) and then try what I described at the beginning of my post.
Perhaps they could look at what other people did after searching for redflowers.jpg. Maybe a number of people searched for "redflowers.jpg", didn't click on any links, and then searched for "Red Flower" and found some results worth clicking on.
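The dictionary-based splitting described above can be sketched in Python as a small dynamic program. It is an illustration only: the vocabulary here is a tiny made-up set standing in for the "huge existing database", the segment function is a name chosen for this example, and preferring the split with the fewest words is my own simplification of "until it finds a lot of matching results".

```python
import os


def segment(text, vocabulary):
    """Split text into known words, preferring the fewest words.

    Returns a list of words, or None if no complete split exists.
    """
    text = text.lower()
    # best[i] is the best split found for text[:i], or None if there is none.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


# Tiny stand-in vocabulary (an assumption for this example).
vocabulary = {"red", "flower", "flow", "er", "re", "d"}

# Strip the file extension first, as suggested for ".jpg" above.
stem, extension = os.path.splitext("redflower.jpg")
print(segment(stem, vocabulary), extension)
```

A real system would presumably rank candidate splits by how often the words occur (or by result counts) rather than by word count alone.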
Of course they would have to take into account that the queries are similar (contain matching strings), otherwise some strange results might appear.
I've been using Usenet searches since about 1995 to get programming information, mostly for Microsoft APIs: first via dejanews, and now via Google Groups, which bought out dejanews. Over the last few years I've noticed a steady decline in the quantity of Usenet search results from Google, and today I find I'm completely unable to get a working Usenet search on their advanced group search page. I'm used to searching on "microsoft.*", sometimes supplemented with "microsoft" or "microsoft*". Just try to find a post from the 1996-1998 time period on "database" in either the comp.* or microsoft.* hierarchies, and if you can do it, please show your search expression. There should be thousands of results.
http://groups.google.com/groups/search?safe=off&q=database+group%3Amicrosoft*&btnG=Rechercher&as_mind=1&as_minm=1&as_miny=1996&as_maxd=1&as_maxm=1&as_maxy=1999&as_drrb=b&sitesearch=
seems to work nicely... 994 results (not thousands, but still...)
It appears to be a problem with the advanced search form. I can't get the one at
http://groups.google.fr/advanced_search?hl=fr&q=&hl=fr&
to work either. But I can use the basic form with "database group:microsoft*" and I get many results as expected.
http://www.google.ca/groups/search?safe=off&q=database+group%3Acomp.*&btnG=Search&sitesearch=
returns 3,000 results
The advanced search isn't working for me either:
Broken advanced search results URL
However, removing lr=selected from the query string in that URL makes it work, for some reason:
Working advanced search results URL
In fact, hitting the search button again on the broken advanced search results page will return those results as well for me.
Or actually, it's only partly working, since entering multiple comma-separated groups in the advanced search form (or using the group: search operator) doesn't quite work as expected and ends up adding all the words in the additional group names as search keywords too.
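Removing a single parameter such as lr=selected from a results URL can also be scripted. The sketch below uses Python's standard urllib.parse; the URL in it is a made-up placeholder on example.com (not the actual Google Groups URL from this answer), and drop_query_param is a helper name invented for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_query_param(url, name):
    """Return url with every occurrence of the query parameter name removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URL, used only to show the transformation.
broken = "http://example.com/search?q=database&lr=selected&hl=en"
print(drop_query_param(broken, "lr"))
```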
You could try learning Julian dates and using the daterange search operator:
Search results using daterange:
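The Julian day numbers for such a query do not have to be worked out by hand. Here is a minimal Python sketch, assuming the operator takes two whole Julian day numbers in the form daterange:START-END; the helper names are my own, and the offset constant comes from the fact that 2000-01-01 is Julian day number 2451545.

```python
from datetime import date

# date.toordinal() counts days from 0001-01-01 (ordinal 1);
# adding this offset gives the Julian day number of that calendar date.
JDN_OFFSET = 1721425


def julian_day_number(day):
    """Return the (integer) Julian day number of a Gregorian calendar date."""
    return day.toordinal() + JDN_OFFSET


def daterange_operator(start, end):
    """Build a daterange: term, assuming the START-END integer format."""
    return f"daterange:{julian_day_number(start)}-{julian_day_number(end)}"


# The 1996 to 1999 window from the question.
print("database group:microsoft* "
      + daterange_operator(date(1996, 1, 1), date(1999, 1, 1)))
```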
The latest makefiles we've received from a third party vendor contain rules with --depend at the end of build rules, so I thought I would look it up on Google, but try as I might, I can't persuade it to display any pages with exactly the characters --depend
I've tried surrounding it with quotes ("--depend"). I've tried the Advanced Search. I've tried backslashes ("\-\-depend") in the (vain) hope that there is some sort of unpublished regular expression search available.
Am I missing something blindingly obvious?
Please note that this is NOT a question about what --depend does (I know that); it's a question about how you Google for very precise, programmer-oriented text.
You can specify literal symbols in Google Code Search but not in Google Web Search.
Examples:
Google Code Search for +"--depend"
Google Web Search for +"--depend"
I had the same issue searching for 'syntax-rules'. You would think they would have solved this by now.
I remember reading somewhere that Google's web search does not index non-alphanumeric characters, treating them as word separators, so that's not possible.
The reason for this problem is that a minus sign at the start of a token indicates that you want to EXCLUDE it from the search.
This is how you can filter out really popular results that have nothing to do with what you want.
For example, try searching for "wow". Then try searching for "wow -warcraft".