Full-text search engine other than Lucene

I need a full-text search engine that supports internationalization.
Thanks

Use Sphinx with MySQL

There is one called Xapian. I haven't used it but I've heard good things.

I've used Ferret (Ruby) and it worked for me. Unfortunately, it only works on Ruby 1.8.x (it's not supported on Ruby 1.9).
Other solutions were already mentioned: Sphinx and Xapian. Solr (based on Java/Lucene) should also work.

An old yet good reference:
http://wrg.upf.edu/WRG/dctos/Middleton-Baeza.pdf

Related

jedi-vim autocompletion: complete by grep-style searching

I'm comparing PyCharm autocomplete search and jedi-vim in Vim.
In PyCharm I can see a list of methods where the search pattern may appear at the beginning, in the middle, or at the end of the name.
In Vim I can only see methods whose names start with the search pattern.
Is this my configuration, or is it expected behavior? If it is expected, what would you suggest? Thanks.
PyCharm example:
Vim example:
Ivan
It's currently not a feature of jedi-vim to search for strings in names (not that this would be hard, but it's not a Jedi feature).
If you really want this, please try out YouCompleteMe. It also uses Jedi and has support for generic substring searching.

Sphinx alternative without Rails

I need an indexing and searching gem like Sphinx, but one that doesn't need Rails. Any suggestions? It has to run under Ruby 1.9.3 on a Windows box. I tried Sphinx without Rails, but it needs MySQL and a lot of configuration, and I didn't succeed. Can you recommend something that uses a built-in database or something like SQLite?
While certain plugins (like thinking_sphinx) are Rails-specific, Sphinx itself is just a search server, and you can use the client gem directly to index and search whatever you want; it need not be in ActiveRecord (which, incidentally, you could also use outside of Rails if you wished).
Another alternative is Apache Solr, which provides similar functionality to Sphinx, including a sophisticated search system (supporting stemming and lots of other nice things).
Although Sphinx itself works regardless of whether you use Rails, if you're having difficulties with it you can always try something like Solr or Elasticsearch. I've heard good things about the latter, and it is quite easy to get running on Windows, which you seem to be using.
Another alternative to Sphinx and Solr is OpenSearchServer, a search engine based on Lucene. It has good Windows support. Have a look at the documentation for how to install it on Windows. You can use the API to integrate it with Ruby.

Approximate string matching in Ruby

I am implementing user search functionality in my Rails application. I want the application to suggest the correct spelling if the user makes a typing mistake. Is there a plugin for this in Ruby? Can this be done in SQL?
Regards,
Pankaj
It looks like the hunspell gem can help you. It requires some external dependencies, so it's not pure Ruby, but according to this readme it seems to be exactly what you are looking for.
Alternatively, you can try BOSSMan. It looks like it spell checks via Yahoo.
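If you'd rather avoid external dependencies, here is a minimal pure-Ruby sketch of one common approach to approximate matching: Levenshtein edit distance against a word list. This is not what hunspell or BOSSMan do internally. The `suggest` helper, the `WORDS` list, and the default threshold of 2 edits are all made up for illustration, and scanning the whole list like this only makes sense for a small dictionary.

```ruby
# Levenshtein edit distance: the number of single-character insertions,
# deletions, and substitutions needed to turn a into b.
# Computed one row at a time to keep memory small.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper (not from any gem): return the dictionary words
# within max_distance edits of the query, closest first.
def suggest(query, dictionary, max_distance = 2)
  q = query.downcase
  dictionary
    .map { |word| [word, levenshtein(q, word.downcase)] }
    .select { |_, distance| distance <= max_distance }
    .sort_by { |word, distance| [distance, word] }
    .map(&:first)
end

# Illustrative word list; in a real app this would come from your own data.
WORDS = %w[search engine index ruby rails sphinx].freeze
suggestions = suggest("serch", WORDS)
```

With this word list, `suggest("serch", WORDS)` returns only "search", since it is the only entry within two edits of the query.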

Ruby web spider & search engine library

I'm looking for a Ruby library or gem (or set of gems) which will not only do spidering, but also collect the data into, say, a database, and allow basic searches on the data (i.e. a typical web search).
I've found several spidering libraries, so that part seems well covered (I was going to try Anemone first), but I can't find anything that will take the spidered data and allow querying on it. For lack of an existing one, I was going to write something myself with Anemone.
Any suggestions?
That blog post might give you some pointers. Also, look into ferret for the search part.
There is a Ruby gem that may help you:
http://spidr.rubyforge.org/
There is also lots of great stuff on github.com.
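If you do end up writing the search part yourself, the core idea is an inverted index: a map from each term to the pages that contain it. Below is a rough in-memory sketch using only the standard library. `TinyIndex` and the example URLs are hypothetical, crawling is left out entirely (pages are passed in as plain strings), and the tokenizer is ASCII-only. A real setup would persist the index and use a proper engine such as Ferret.

```ruby
require "set"

# Minimal in-memory inverted index: maps each term to the set of page
# URLs containing it. Illustrative only; nothing is persisted.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Split text into lowercase ASCII alphanumeric terms.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  # Record every term of the page text under the page's URL.
  def add(url, text)
    tokenize(text).each { |term| @postings[term] << url }
  end

  # Return the URLs that contain every query term (AND semantics), sorted.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms
      .map { |term| @postings.fetch(term, Set.new) }
      .reduce { |acc, set| acc & set }
      .to_a
      .sort
  end
end

# Example pages; in practice the text would come from your spider.
index = TinyIndex.new
index.add("http://example.com/a", "Ruby web spider tutorial")
index.add("http://example.com/b", "Full-text search in Ruby")
results = index.search("ruby search")
```

Here `results` contains only the second page, because it is the only one with both "ruby" and "search" in its text.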

Segmentation fault in hpricot

I'm using hpricot to read HTML and I got a segmentation fault. I googled it, and some say to upgrade to the latest version of Ruby. I am using Rails 2.3.2 and Ruby 1.8.7. How can I resolve this error?
I was trying to parse html pages with many unicode characters in them and Hpricot kept crashing. Finally, I used the monkey patch from sanitize and put it in the environment.rb for my rails application. There hasn't been a single crash since I added this patch:
http://github.com/rgrove/sanitize/blob/1e1dc9681de99e32dc166f591343dfa60fc1f648/lib/sanitize/monkeypatch/hpricot.rb
If you're free to choose your HTML parsing library, switch it.
Why, the creator of Hpricot, recently posted that you should use Nokogiri instead of Hpricot nowadays.
You may also have a look at HTTParty.
On Ruby 1.8.5, try using hpricot version 0.6.161 (-v 0.6.161).
That worked for me.
From memory, since I last used it about a year ago:
Hpricot stores attributes in a fixed-size buffer, and some frameworks generate outrageously long hashes in document attributes. There's some static field you can set before parsing that lets you set the size of this buffer.
I remember it being fairly prominent in the docs on the webpage, though of course it's gone now.
Well, based on your own question, I'd say "Upgrade to the latest version of Ruby". However, I've also had problems with hpricot segfaulting, which seemed to be related to my usage of threading.
This appears to be an outstanding issue on the bug list. I have experienced it too. My theory is that it has to do with the HTML structure or a bad/corrupt character in the file, but I have not found where exactly.
Here are the links to the issues:
http://github.com/why/hpricot/issues/#issue/10
http://github.com/why/hpricot/issues/#issue/4
I'm having the same segfault issue but sadly can't consult the issues Dave cited above, even via Google cache. From what I've been googling, the parse.rb segfaults have to do with encoded entities or alternate character sets (accented characters, perhaps).
The sanitize lib encountered the same issue and posted a monkeypatch here:
http://github.com/rgrove/sanitize/blob/1e1dc9681de99e32dc166f591343dfa60fc1f648/lib/sanitize/monkeypatch/hpricot.rb
