Can ElasticSearch return relevant passages and not the entire document - elasticsearch

I'm looking for a search engine that is able to return just relevant passages as a result and not the entire documents. Is ElasticSearch able to do this?

If you are looking to extract part of a long document, look at Highlighting. Parameters such as fragment_size (100 characters by default) and boundary_chars control how large the returned fragments are and where they are cut, which lets you build that functionality.
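As a rough sketch of what such a request could look like, the snippet below builds a search body that asks for highlighted fragments. The field name "content", the helper name build_passage_query and the chosen sizes are assumptions for illustration only; the body would still have to be sent to your cluster with whatever client you use.

```python
import json


def build_passage_query(text, field="content", fragment_size=150, number_of_fragments=3):
    """Build a search body that requests highlighted fragments for `field`.

    `field` and the default sizes are illustrative assumptions, not values
    taken from the question.
    """
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                field: {
                    # Approximate size of each returned fragment, in characters.
                    "fragment_size": fragment_size,
                    # Upper bound on how many fragments come back per hit.
                    "number_of_fragments": number_of_fragments,
                }
            }
        },
    }


body = build_passage_query("relevant passages")
print(json.dumps(body, indent=2))
```

Each hit in the response should then carry a highlight section with the matching fragments, which you can display instead of the full document.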

Related

FileNet Social Collaboration - search by comments

We have social collaboration enabled on our FileNet system. I can add comments, tags and likes, and track how many times a document has been downloaded. These features are nice. When I tag a document, I can search for documents by the tag text.
For example, if I tag a document as "test", I can use a search template to find the document by its tag value, i.e. test.
When I add a comment, however, I can't search for the document by the comment text.
Say I added the comment "good doc". I can't search by that text. Instead I have to provide an integer value such as 1, and the search then behaves like "get all documents whose number of comments = 1". I don't want this behavior; I want to be able to search on the comment text.
Can anybody help with this?
One way to achieve this would be to use CBR (content-based retrieval) on the property. See how to enable CBR on a property.
The property will then be full-text searchable using the CONTAINS statement, see doc.
Optionally (but I'm not sure, as I've never personally used it), the satisfies operator might be exactly what you're looking for, according to the documentation.

Elasticsearch: Return same search results regardless of diacritics/accents

I've got a word in the text (e.g. nagymező) and I want to be able to type in the search query nagymező or nagymezo and it should show this text which contains that word in the search results.
How can it be accomplished?
You want to use a Unicode folding strategy, probably the asciifolding filter. I'm not sure which version of Elasticsearch you're on, so here are a couple of documentation links:
asciifolding for ES 2.x (older version, but much more detailed guide)
asciifolding for ES 6.3
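A minimal sketch of the idea, assuming an analyzer named "folding" (a hypothetical name) that chains the built-in lowercase and asciifolding filters. The fold helper is only a local approximation of what the filter does, using Python's unicodedata; it is not Elasticsearch's implementation.

```python
import unicodedata

# Index settings with a custom analyzer; "folding" is an assumed name.
# The text field still has to be mapped to this analyzer, and the mapping
# syntax depends on the Elasticsearch version.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}


def fold(text):
    """Approximate accent folding: decompose, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# Both spellings reduce to the same token, so either query can match.
print(fold("nagymező") == fold("nagymezo"))
```

Because the same analyzer runs at index time and at search time, the indexed text and the query end up as the same folded token.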
The trick is to remove the diacritics when you index them so they don't bother you anymore.
Have a look at ignore accents in elastic search with haystack
and also at https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-analyzers.html (look for 'diacritic' on the page).
Then, just because it will probably be useful to someone one day or the other, know that the regular expression \p{L} will match any Unicode letter :D
Hope this helps,

autocomplete and search in Elasticsearch

Is there any possibility to make a search on two non-complete words in the same field using Elasticsearch in Rails? I mean the situation when I could successfully search for example "victorian buildings" phrase by inserting into search input for example "vict bui" phrase (only beginnings of words, also with fuzziness).
Partial match (word_start, text_start etc. available in Searchkick) doesn't work in this project. I've also tried using wildcard queries, but it also failed. Maybe writing some custom mappings/settings would be a good idea?
Can I ask you for any suggestions on what to search/read to do this task?
Try this example
"%#{params[:place]}%"
Since % is a wildcard, doing a like on '%%' matches everything,
and you get all the records in the result.
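The wildcard behavior described above can be checked with a small sketch. It uses Python's built-in sqlite3 rather than the Rails stack from the question, and the table name places and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?)",
    [("Victorian buildings",), ("Modern flats",)],
)


def search(place):
    """Return names containing `place`, using the same %...% LIKE pattern."""
    pattern = f"%{place}%"
    rows = conn.execute(
        "SELECT name FROM places WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in rows]


print(search(""))          # empty input becomes '%%' and matches every row
print(search("vict"))      # a single word prefix matches as a substring
print(search("vict bui"))  # two partial words do not form one substring
```

Note that a plain substring pattern like this does not cover the "vict bui" case from the question, since the two partial words never appear next to each other in the stored text.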

Elastic Greeklish to Greek conversion

I am new to Elastic and I am trying to find a way to convert Greeklish characters to Greek when the search executes.
e.g. the word "papoutsia" should be searched as "παπουτσια" (shoes)
While searching I found the following plugins:
elasticsearch-analysis-greeklish
elasticsearch-skroutz-greekstemmer
I applied the filters to my index as in the example, but my queries still return no hits.
Do I have to apply the filter in some way in every query, or write a special query?
Sorry if this question calls for a very broad answer.
I have been trying for a couple of days to figure out how the whole filtering mechanism works, to understand whether I am even heading in the right direction or need to find another way to solve this.
Unfortunately, the intention of the greeklish plugin / char filter is the inverse of what you want to achieve:
Using this filter, you can retrieve greek text from a document, using a query that is written in latin characters ("greeklish").
So, for your example, you can add a document with the text παπούτσια and retrieve it using the terms papoutsia, papoutsi, etc.
We have prepared a detailed text pipeline example in the repo's wiki for future reference.

Google Search Appliance isn't displaying document's title

For some reason, our Google Search Appliance isn't displaying the title of some of our larger files (even though they have a title property). Instead, it's showing the file path. For example, it does this for three Word documents that are about 4 MB each, but it doesn't do it for a PowerPoint file that is around 5 MB. Any idea what causes this, and is there a workaround to get the title to display?
The GSA will fetch a title based on meta, or the title it can find in the document. If it cannot find a suitable title tag it will use the filepath. Suitability could be length, format, character encoding, position, etc.
This used to be well documented, but I am struggling to find it now apart from vague mentions here https://support.google.com/gsa/answer/4411411?hl=en
Also, as a less important check make sure file sizes are not being exceeded as per the configuration. https://support.google.com/gsa/answer/4411411?hl=en
You can read about how GSA determines the title of documents here.
I don't think file size matters here unless you have specified a very low value under Crawl and Index > Index Settings.
You might be missing the "&getfields=title" query string in your search call. To get all fields in tags, just set the query string to "&getfields=*".
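A small sketch of building such a search call follows. The host name gsa.example.com and the collection and front end values are placeholders; only the getfields parameter comes from the answer above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_gsa_search_url(host, query, getfields="*"):
    """Build a GSA search URL that also requests metadata fields.

    `host`, the collection and the front end below are assumed placeholder
    values; adjust them to your appliance's configuration.
    """
    params = {
        "q": query,
        "site": "default_collection",
        "client": "default_frontend",
        "output": "xml_no_dtd",
        "getfields": getfields,  # "*" asks for all fields, "title" for one
    }
    return f"http://{host}/search?{urlencode(params)}"


url = build_gsa_search_url("gsa.example.com", "annual report")
print(url)
```

Passing getfields="title" instead of the default would restrict the request to the title field only.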
