I'm (extremely) new to ElasticSearch so forgive my potentially ridiculous question. I currently use MySQL to perform full-text searches, and want to move this to ElasticSearch. Currently my table has a fulltext index spanning three columns:
title,description,tags
In ES, each document would therefore have title, description and tags fields, allowing me to do a fulltext search for a general phrase, or filter on a given tag.
I also want to add further searchable fields such as username (so I can retrieve posts by a given user). So, how do I specify that a fulltext search should match title OR description OR tags but not username?
From the OR filter example, I'd assume I'd have to use something like this:
{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"or" : [
{
"term" : { "title" : "foobar" }
},
{
"term" : { "description" : "foobar" }
},
{
"term" : { "tags" : "foobar" }
}
]
}
}
}
Coming at this new, it doesn't seem like this is very efficient. Is there a better way of doing this, or do I need to move the username field to a separate index?
This is fine.
In general, I would suggest getting familiar with Elasticsearch mapping types and options.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html
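As a rough sketch of restricting the full-text part to specific fields: a multi_match query takes an explicit list of fields, so username is simply left out of that list and can be applied as a separate filter. The Python helper below only builds the request body as a dict. The function name is made up for illustration, username is assumed to be indexed as an exact-value (not analyzed) field, and the bool/filter form assumes an Elasticsearch version newer than the one using the filtered query shown in the question.

```python
import json


def build_post_search(text, username=None):
    """Build a search body matching `text` in title, description or tags only.

    `username` is optional and is applied as a non-scoring filter.
    """
    body = {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            # username is deliberately not listed here
                            "fields": ["title", "description", "tags"],
                        }
                    }
                ]
            }
        }
    }
    if username is not None:
        # Assumption: username is stored as an exact value (keyword).
        body["query"]["bool"]["filter"] = [{"term": {"username": username}}]
    return body


if __name__ == "__main__":
    print(json.dumps(build_post_search("foobar", username="alice"), indent=2))
```

With this shape there is no need for a separate index just to keep username out of the full-text match.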
Related
I have an elasticsearch document repository with ~15M documents.
Each document has an 11-char string field (it comes from a mongo DB) that is unique to the document. This field is indexed as keyword.
I'm using C#.
When I run a search, I want to be able to limit the search to a set of documents that I specify (via some list of the unique field ids).
My query text uses bool with must to supply a filter for the unique identifiers and additional clauses to actually search the documents. See example below.
To search a large number of documents, I generate multiple query strings and run them concurrently. Each query handles up to 64K unique ids (determined by the limit on terms).
In this case, I have 262,144 documents to search (list comes, at run time, from a separate mongo DB query). So my code generates 4 query strings (see example below).
I run them concurrently.
Unfortunately, this search takes over 22 seconds to complete.
When I run the same search but drop the terms node (so it searches all the documents), a single such query completes the search in 1.8 seconds.
An incredible difference.
So my question: Is there an efficient way to specify which documents are to be searched (when each document has a unique self-identifying keyword field)?
I want to be able to specify up to a few 100K of such unique ids.
Here's an example of my search specifying unique document identifiers:
{
"_source" : "talentId",
"from" : 0,
"size" : 10000,
"query" : {
"bool" : {
"must" : [
{
"bool" : {
"must" : [ { "match_phrase" : { "freeText" : "java" } },
{ "match_phrase" : { "freeText" : "unix" } },
{ "match_phrase" : { "freeText" : "c#" } },
{ "match_phrase" : { "freeText" : "cnn" } } ]
}
},
{
"bool" : {
"filter" : {
"bool" : {
"should" : [
{
"terms" : {
"talentId" : [ "goGSXMWE1Qg", "GvTDYS6F1Qg",
"-qa_N-aC1Qg", "iu299LCC1Qg",
"0p7SpteI1Qg", ... 4,995 more ... ]
}
}
]
}
}
}
}
]
}
}
}
@jarmod is right.
But if you don't wanna completely redo your architecture, is there some other single talent-related shared field you could query instead of thousands of talentIds? It could be one more simple match_phrase query.
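For reference, the batching described in the question can be sketched as follows. This is only an illustration in Python (the question uses C#); the helper names and the batch_size parameter are assumptions, and the sketch does not by itself address the slowdown. It places the terms clause directly under filter, without the extra bool/should wrappers, which should select the same documents in a flatter form.

```python
def chunked(ids, size):
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_restricted_query(phrases, talent_ids):
    """Search `freeText` for all phrases, restricted to the given talentIds."""
    return {
        "_source": "talentId",
        "from": 0,
        "size": 10000,
        "query": {
            "bool": {
                # Scoring clauses: every phrase must be present.
                "must": [{"match_phrase": {"freeText": p}} for p in phrases],
                # Non-scoring restriction to the requested documents.
                "filter": [{"terms": {"talentId": list(talent_ids)}}],
            }
        },
    }


def build_batches(phrases, talent_ids, batch_size=65536):
    """Build one query body per batch of at most `batch_size` ids."""
    return [
        build_restricted_query(phrases, batch)
        for batch in chunked(talent_ids, batch_size)
    ]
```

For the 262,144 ids mentioned in the question and a batch size of 65,536, this produces four request bodies, matching the four queries described there.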
I have an Elasticsearch engine running locally with an index which contains data from multiple customers. When a customer makes a query, is there a way to dynamically add the customer ID to the filtering criteria so that a customer cannot access the records of other customers?
Yes, you can achieve that using filtered aliases. So you'd create one alias per customer like this:
POST /_aliases
{
"actions" : [
{
"add" : {
"index" : "customer_index",
"alias" : "customer_1234",
"filter" : { "term" : { "customer_id" : "1234" } }
}
}
]
}
Then your customer can simply query the alias customer_1234 and only his data is going to come back.
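If there are many customers, the alias bodies can be generated rather than written by hand. The following is a minimal sketch that only builds the request body for POST /_aliases as a Python dict; the function name is hypothetical, and the index name, alias naming scheme and customer_id field are taken from the example above.

```python
def customer_alias_action(customer_id, index="customer_index"):
    """Build the /_aliases body that adds a filtered alias for one customer."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": "customer_%s" % customer_id,
                    # Only documents of this customer are visible via the alias.
                    "filter": {"term": {"customer_id": str(customer_id)}},
                }
            }
        ]
    }


if __name__ == "__main__":
    print(customer_alias_action(1234))
```

The application would then send each customer's searches to that customer's alias instead of to customer_index directly.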
I'm using a completion suggester in Elasticsearch on a single field. The type contains documents of several users. Is there a way to limit the returned suggestions to documents that match a specific query?
I'm currently using this query:
{
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
Is there a way to combine this query with a different one, e.g.
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
}
}
Have a look at the context suggester, which is just a specialized completion suggester with filtering capabilities. Keep in mind, however, that this is still not a regular query filter.
You can specify both the query and the suggester in your query, like this:
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
},
"suggest": {
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
}
I have a similar use case, and I've posted my question on elastic search forum, see here
From what I've read so far, I don't think you can limit the documents with the completion suggester. It essentially creates a finite state transducer (prefix tree) at index time; this makes it fast, but you lose the flexibility of filtering on additional fields. I don't think the context suggester would work in your case (let me know if I am wrong), because the cardinality of user_id is very high.
I think edge-ngrams partial matching is more flexible and might actually work in your use case.
Let me know what you end up implementing.
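A minimal sketch of the edge-ngram idea, assuming a recent Elasticsearch version with typeless mappings. The field name "name", the analyzer and filter names, and the gram sizes are all assumptions for illustration. The bodies are plain Python dicts: one for creating the index, one for a prefix search that is restricted to a single user with an ordinary term filter.

```python
# Index body: "name" is indexed as edge n-grams, but searched with the
# standard analyzer so the user's input is not split into n-grams again.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "autocomplete",
                "search_analyzer": "standard",
            },
            "user_id": {"type": "keyword"},
        }
    },
}


def suggest_for_user(prefix, user_id):
    """Build a search body: prefix match on name, limited to one user."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": prefix}}],
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }
```

Because this is a regular query, any additional filter can be combined with it, at the cost of a larger index than the completion suggester needs.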
I am very new to Elasticsearch.
I would like to know how to perform a partial, multi-word search.
For example, my document is:
{
"title":"harry porter"
}
I need this document to be returned when I search with the following string:
1.) har por
This is the same as the SQL query (select * from books where title like '%har%' or title like '%por%').
Using a completion suggester will provide most of the features you want. It will find words starting with an arbitrary string, like "har" or "por".
Check out this question for a full example on how to set up a completion suggester.
As described in the documentation, you can achieve multi-word search (i.e. returning "harry porter" from a search for "por") by setting the option preserve_position_increments to false on the completion field:
PUT books
{
"mappings": {
"book" : {
"properties" : {
"suggest" : {
"type" : "completion",
"preserve_position_increments": false
},
"title" : {
"type": "keyword"
}
}
}
}
}
Refer to this : Edge NGram Tokenizer
This helps in partial multi-word search (similar to autocomplete suggestions). Hope this helps!
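To see why edge n-grams cover a search like "har por", here is a small, self-contained Python illustration. It does not call Elasticsearch; it only imitates what an edge n-gram analysis would index for a title, using whitespace splitting and lowercasing as a simplifying assumption. The function names and gram sizes are made up for the example.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram=1, max_gram=10):
    """Collect the edge n-grams of every whitespace-separated word."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


def matches_all(query, text):
    """True if every word of `query` is a prefix of some word in `text`."""
    terms = index_terms(text)
    return all(word in terms for word in query.lower().split())


if __name__ == "__main__":
    print(matches_all("har por", "harry porter"))  # True
    print(matches_all("rry", "harry porter"))      # False: not a prefix
```

Note that this gives prefix matching per word, which is narrower than the '%har%' pattern in the SQL example; matching in the middle of a word would need full n-grams instead of edge n-grams.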
I'm using elasticsearch. I'm already pretty deep into it but I'm very confused as to how to go about writing advanced queries. There are queries / filters / etc. I'm confused as to how to proceed.
I have a schema that looks like this:
photos: {people: [{person_id: 1, person_name: "john kealy"}],
tags: [{tag_id: 1, tag_name: "other tag"}],
by_line: "John D Kealy/My website.com",
location: "Some Place OUt West"}
I need to be able to string together these queries dynamically ALWAYS pulling in FULL MATCHES, e.g. I would like to search for
people.person_id: [1,2] (pulls in only photos with BOTH or more people)
tags.tag_id: [1,2,3] (pulls in only photos with all three or more tags)
by_line: "John D. Kealy/My Website.com" (the full name including the slash)
location: "some place out west"
I would like to write one query with all these items. I need to include the slash in "by_line", and I don't care about upper or lower case. I need the exact match "some place out west". What do I use here? Queries or filters / filtered?
General guidelines for bool filters/queries can be found here.
If you are constructing an "exact match" query, you can often use the term filter (or query).
If you are constructing a search that needs to be fast, a filtered query is often advisable: filters do not compute relevance scores and their results can be cached, which often improves performance.
As for your specific example, the filters below should work; wrap them around a match_all query or anything else you need. The first version is for a non-analyzed by_line field; the version for an analyzed field adds a query. This should give you an idea of how to construct future queries:
NOTE: This assumes that your by_line field is not analyzed. A term filter compares the value as-is, so the slash needs no escaping. If you are using an analyzed field, you must use a match query.
Without analyzer on by_line
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : [
{ "terms" : {"people.person_id" : ["1", "2"]}},
{ "terms" : {"tags.tag_id" : ["1", "2", "3"]}},
{ "term" : {"by_line" : "John D. Kealy/My Website.com"}},
{ "term" : {"location" : "some place out west"}}
]
}
}
}
}
}
I will keep the above for future readers. However, I see in your post history that you are using the standard analyzer, so your query should be structured as follows.
With analyzer on by_line
{
"query" : {
"filtered" : {
"query": {
"match": {
"by_line": "John Kealy/BFA.com"
}
},
"filter" : {
"bool" : {
"must" : [
{ "terms" : {"people.person_id" : ["1", "2"]}},
{ "terms" : {"tags.tag_id" : ["1", "2", "3"]}},
{ "term" : {"location" : "some place out west"}}
]
}
}
}
}
}
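One caveat regarding the "BOTH" and "all three" requirements in the question: a terms clause matches a document that contains any of the listed values. A hedged sketch of requiring all of them is to emit one term clause per id inside bool/must. The Python helpers below only build that filter as a dict; their names are hypothetical, and the sketch assumes people and tags are plain object fields rather than nested ones.

```python
def all_of(field, values):
    """One term clause per value; placed under `must`, all are required."""
    return [{"term": {field: value}} for value in values]


def build_photo_filter(person_ids, tag_ids):
    """Bool filter requiring every given person id and every given tag id."""
    return {
        "bool": {
            "must": all_of("people.person_id", person_ids)
            + all_of("tags.tag_id", tag_ids)
        }
    }


if __name__ == "__main__":
    print(build_photo_filter([1, 2], [1, 2, 3]))
```

The resulting bool object can take the place of the bool filter in either of the queries above, with the by_line and location clauses appended to the same must list.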