Given a document ID, find the matching documents in Elasticsearch

I have indexed some articles in Elasticsearch. Now suppose a user likes an article, and I want to recommend similar articles to them. Assume the articles are precise, well written, and to the point, and that all articles are of the same type.
I know this amounts to extracting the tokens of that article and searching all other articles for them. Is there anything in Elasticsearch that does this for me?
Or is there any other way of doing this?

You can use the More Like This query.
According to the docs, it selects a set of representative terms from the input documents, forms a query using these terms, executes the query, and returns the results.
Usage (replace the index name and document ID with your own):
{
"query": {
"more_like_this" : {
"fields" : ["title", "description"],
"like" : [
{
"_index" : "your index",
"_type" : "articles",
"_id" : "1"
}
],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
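As a minimal sketch, the same request body can be built programmatically. The helper name `build_mlt_query` and the index name `articles` are assumptions for illustration, and the sketch omits `_type`, which newer Elasticsearch versions no longer use. It only constructs the JSON body; sending it to a cluster is left out.

```python
import json


def build_mlt_query(index, doc_id, fields, min_term_freq=1, max_query_terms=12):
    """Build a More Like This request body for one already-indexed document.

    `index`, `doc_id`, and `fields` are supplied by the caller; nothing here
    talks to a cluster.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # The liked document is referenced by index and ID,
                # so its text does not need to be sent again.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Hypothetical index name and document ID.
body = build_mlt_query("articles", "1", ["title", "description"])
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of a search request against the index that holds the articles.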

Related

Elasticsearch Partial Phrase Match

In Elasticsearch, I would like to match the record "John Oxford" when searching "John Ox". I'm currently using a match_phrase_prefix as such:
{
"query": {
"match_phrase_prefix":{
"SearchName": {
"query": "John Ox"
}
}
}
}
I know this doesn't work because, as the docs state:
While easy to set up, using the match_phrase_prefix query for search autocompletion can sometimes produce confusing results.
For example, consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.
The problem is that the first 50 terms may not include the term fox so the phrase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears.
For better solutions for search-as-you-type see the completion suggester and the search_as_you_type field type.
Is there another way to achieve this, then, without changing the way the data is stored in ES?
You can use the match_bool_prefix query. Here is a working example with the index data, the search query, and the search result.
Index Data:
{
"name":"John Oxford"
}
Search Query:
{
"query": {
"match_bool_prefix" : {
"name" : "John Ox"
}
}
}
Search Result:
"hits" : [
{
"_index" : "idx",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.287682,
"_source" : {
"name" : "John Oxford"
}
}
]
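To see why this matches, it may help to look at what match_bool_prefix is documented to do: it analyzes the input, turns every term except the last into a term query, and turns the last term into a prefix query, all combined in a bool query. The sketch below mimics that expansion. It assumes that lowercasing and splitting on whitespace is a good enough stand-in for the real analyzer, and the helper name `expand_match_bool_prefix` is made up for illustration.

```python
def expand_match_bool_prefix(field, text):
    """Approximate the bool query that match_bool_prefix builds internally."""
    # Assumption: lowercase + whitespace split stands in for the field's analyzer.
    terms = text.lower().split()
    if not terms:
        return {"bool": {"should": []}}
    # Every term except the last must match as a whole term ...
    clauses = [{"term": {field: term}} for term in terms[:-1]]
    # ... while the last term only needs to match as a prefix.
    clauses.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": clauses}}


expanded = expand_match_bool_prefix("name", "John Ox")
print(expanded)
```

Because the clauses are `should` clauses, term order does not matter, and a document that matches only one of the clauses can still be returned with a lower score.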

How do I combine different indexes in a "more like this" query?

The docs of the MLT query give the following example (abbreviated by me) to retrieve documents similar to an existing document:
"query": {
"more_like_this" : {
"fields" : ["title", "description"],
"like" : [
{
"_index" : "imdb",
"_id" : "1"
}],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
This seems to compare the "title" and "description" fields of all movies to those of the one movie with ID 1. Suppose, though, that I have an index of people's comments and would like to get all movie titles whose "title" or "description" is similar to one particular comment.
I know that I could provide free text as a value for the "like" field - the document (comment) is already part of another index though, so I would like to use that one. Just not based on the "title" and "description" fields (which would not exist on a comment), but let's say its "body" field. How would I do that?
You can add the same alias to both indexes (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html) and run the query against the alias.
Note: this will put a higher load on your Elasticsearch cluster.
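A rough sketch of the alias step: the body below is what one would send to the `_aliases` endpoint to put both indexes behind one alias. The alias name `movies_and_comments` and the index name `comments` are assumptions (only `imdb` appears in the question), and the helper `build_alias_actions` is hypothetical. This covers only the alias setup, not the MLT query that is then run against it.

```python
import json


def build_alias_actions(alias, indices):
    """Build an _aliases request body that adds `alias` to each index."""
    return {
        "actions": [
            {"add": {"index": name, "alias": alias}} for name in indices
        ]
    }


# Hypothetical alias and comment index names.
actions = build_alias_actions("movies_and_comments", ["imdb", "comments"])
print(json.dumps(actions, indent=2))
```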

How to query just the document names of an index in Elasticsearch

PS: I'm new to Elasticsearch.
http://localhost:9200/indexname/domains/<mydocname>
Suppose indexname is our index and I'm uploading a lot of documents at <mydocname>, named after domains, e.g.:
http://localhost:9200/indexname/domains/google.com
http://localhost:9200/indexname/domains/company.com
Looking at http://localhost:9200/indexname/_count, it says we have "count": 119687 documents.
I just want Elasticsearch to return the document names of all 119687 entries, which are domain names.
How do I achieve that, and is it possible in a single query?
Looking at the example http://localhost:9200/indexname/domains/google.com, I am assuming your doc_type is domains and the doc ID ("document name") is google.com.
_id is the document name here, and it is always part of the response. You can use source filtering to disable _source, so that the response looks something like this:
GET indexname/_search
{
"_source": false
}
Output
{
...
"hits" : [
{
"_index" : "indexname",
"_type" : "domains",
"_id" : "google.com",
"_score" : 1.0
}
]
...
}
If documentname is a mapped field, you can still use source filtering to include only that field.
GET indexname/_search
{
"_source": ["documentname"]
}
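One caveat worth sketching: a search returns only one page of hits (10 by default), so collecting all 119687 IDs needs a larger `size` or some form of paging such as the scroll API. The snippet below shows only the two pieces that stay the same either way: a request body with `_source` disabled and a function that pulls `_id` out of a response. The helper names are made up, the page size of 1000 is an arbitrary assumption, and `sample_response` is a hand-written stand-in shaped like the output above, not real cluster output.

```python
def build_id_only_search(size=1000):
    """Request body that matches everything but returns no _source."""
    return {"_source": False, "size": size, "query": {"match_all": {}}}


def extract_ids(response):
    """Collect the _id of every hit in one search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Hand-written stand-in for a search response (assumption, not real output).
sample_response = {
    "hits": {
        "hits": [
            {"_index": "indexname", "_id": "google.com", "_score": 1.0},
            {"_index": "indexname", "_id": "company.com", "_score": 1.0},
        ]
    }
}

ids = extract_ids(sample_response)
print(ids)
```

When paging, `extract_ids` would be called once per page and the results concatenated.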

Similar searching with elastic search

Suppose I have a table containing a lot of persons. Each person has attributes such as name, social ID, age, sex, number of children...
Given a person A who is a 40-year-old male with 2 children, give me all persons that are similar to person A.
Is this something I can do with Elasticsearch? I'm thinking about the More Like This query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
Thank you very much.
Yes, MLT can work for this. You can specify the fields the More Like This query is applied to; in your case that would be age and number of children, plus any other field you want to match on. Here is an example query:
GET /_search
{
"query": {
"more_like_this" : {
"fields" : ["age", "number_of_children"],
"like" :
{
"_index" : "people",
"_type" : "person",
"_id" : "1"
},
"min_term_freq" : 1
}
}
}
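MLT works by picking representative terms, and its documentation describes it for text or keyword fields, so numeric attributes such as age may not behave as hoped unless they are mapped accordingly. As an alternative sketch (a swapped-in technique, not the MLT approach from the answer above), "similar" can be spelled out with a bool query: a range around the age and exact matches on the other attributes. The field names `sex` and `number_of_children`, the helper name, and the tolerance of 5 years are all assumptions for illustration.

```python
def build_similar_person_query(age, sex, children, age_tolerance=5):
    """Bool query for persons of the same sex who are close in age or
    have the same number of children."""
    return {
        "query": {
            "bool": {
                # Hard requirement: same sex.
                "filter": [{"term": {"sex": sex}}],
                # Soft signals: each matching clause raises the score.
                "should": [
                    {
                        "range": {
                            "age": {
                                "gte": age - age_tolerance,
                                "lte": age + age_tolerance,
                            }
                        }
                    },
                    {"term": {"number_of_children": children}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


# Person A from the question: 40-year-old male with 2 children.
person_query = build_similar_person_query(age=40, sex="male", children=2)
print(person_query)
```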

ElasticSearch search query processing

I have been reading up on ElasticSearch and couldn't find an answer for how to do the following:
Say you have some records with "study" in the title, and a user searches for "studying" instead of "study". How would you set up ElasticSearch to match this?
Thanks,
Alex
ps: Sorry, if this is a duplicate. Wasn't sure what to search for!
You might be interested in this: http://www.elasticsearch.org/guide/reference/query-dsl/flt-query/
For example, I indexed some book titles and ran this query:
{
"query": {
"bool": {
"must": [
{
"fuzzy": {
"book": {
"value": "ringing",
"min_similarity": "0.3"
}
}
}
]
}
}
}
I got
{
"took" : "1",
"timed_out" : "false",
"_shards" : {
"total" : "5",
"successful" : "5",
"failed" : "0"
},
"hits" : {
"total" : "1",
"max_score" : "0.19178301",
"hits" : [
{
"_index" : "library",
"_type" : "book",
"_id" : "3",
"_score" : "0.19178301",
"_source" : {
"book" : "The Lord of the Rings",
"author" : "J R R Tolkein"
}
}
]
}
}
which is the only correct result..
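Fuzzy matching is based on edit distance (Levenshtein distance) between terms. The sketch below is a plain Python implementation of that distance, written for illustration and not taken from Elasticsearch; it can be used to get a feel for how far apart two terms are.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("ringing", "rings"))   # 3
print(levenshtein("study", "studying"))  # 3
```

Both pairs are three edits apart. As far as I know, newer Elasticsearch versions express the limit as `fuzziness` in edits and cap it at 2, so a gap of three edits between "study" and "studying" suggests that fuzzy matching alone may not be the best fit for this question.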
You could apply stemming to your documents, so that when you index studying, under the hood you are actually indexing study. At query time you do the same, so that when you search for studying, you are really searching for study, and you will find a match whether you look for study or studying.
Stemming depends on the language, of course, and there are different techniques; for English, snowball is fine. The trade-off is that you lose some information at index time, since you can no longer distinguish between studying and study. If you want to keep that distinction, you can index the same text in different ways using a multi_field and apply different text analysis to each variant. That way you can search on multiple fields, both the non-stemmed and the stemmed version, and optionally give them different weights.
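A minimal sketch of that setup, written as Python dictionaries that mirror the JSON bodies. It assumes a recent Elasticsearch version, where the multi_field idea is expressed through the `fields` option of a mapping. The names `english_snowball`, `stemmed_english`, `title.stemmed`, and the boost of 2 are illustrative assumptions.

```python
import json

# Index creation body: the title is indexed twice, once with the default
# analysis and once through a snowball stemmer.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "english_snowball": {"type": "snowball", "language": "English"}
            },
            "analyzer": {
                "stemmed_english": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_snowball"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "stemmed": {"type": "text", "analyzer": "stemmed_english"}
                },
            }
        }
    },
}

# Search body: query both variants, weighting the exact (non-stemmed) field higher.
search_body = {
    "query": {
        "multi_match": {
            "query": "studying",
            "fields": ["title^2", "title.stemmed"],
        }
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

With this layout, a search for "studying" is stemmed on `title.stemmed` and can match a title containing "study", while exact matches on `title` still rank higher because of the boost.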
