In my dataset, a document contains 20+ fields with nested objects. Most of them are long text fields. These fields are important for full-text search but we only need to show the title, short-description and Id in output.
Is it possible to specify the output fields in ElasticSearch for a full text query? (like projection in MongoDB)
I think you're looking for the fields property of a search request:
Allows to selectively load specific fields for each document
represented by a search hit. Defaults to load the internal _source
field.
{
"fields" : ["user", "postDate"],
"query" : {
"term" : { "user" : "kimchy" }
}
}
The fields will automatically load stored fields (store mapping set to
yes), or, if not stored, will load the _source and extract it from it
(allowing to return nested document object).
Take care in ElasticSearch 1.0.0.RC1 the fields return values now are always lists,
if need the result to be a long instead of a list of longs (which might be a single value list for you most of the time) you can limit those with _source
{"_source" : ["field1", "field2", ...],
"query" : {
"term" : { "user" : "kimchy" }
}
}
Related
I'm looking for data in two fields with one filed must be the same, one using query
i have data
{
"NUMBER" : "5587120",
"SID" : "121213-13131-_X",
"ADDRESS" : "purwakarta"
}
i have tried use query string like this
GET test/_doc/_search
{
"query" : {
"bool" : {
"must" : [
{"match" : {"NUMBER" : "5587120"}}
],
"filter" : {
"query_string" : {
"default_field" : "SID.keyword",
"query" : "*X*"
}
}
}
}
when I enter the same text as the one recorded, the data I want appears, but when I write the text with lowercase, the data doesn't appear
As it's not clear from your question, that on which field you want the case insensitive search, based on the context I am assuming its the SID.keyword field.
Why your solution not working: Please note that keyword fields are not analyzed and indexed in elasticsearch as it is, so in case of your field SID.keyword you are providing its value 121213-13131-_X so it will be stored as it is, it will not create just one token which is exactly same as the provided value.
Now you are using the query_string on-field SID.keyword, hence your query string will use the same analyzer configured for the field which is the keyword analyzer which is again no-op analyzer, hence doesn't lowercase the *X* provided in the query.
Solution : If you want the insensitive search than instead of using the SID.keyword field, simply creates a custom analyzer which uses the keyword analyzer and then passes it to lowercase token filter, so your 121213-13131-_X will be converted to 121213-13131-_x(Note small case x). And then your query string will also use the same analyzer and will match the document as ultimately elasticsearch works on tokens match.
I am trying to boost fields using multi match query without specifying complete field list but I cannot find out how to do it. I am searching through multiple indices on all fields, which I don't know at the run time, but I know which are the important ones.
For example I have index A with the fields 1,2,3,4 and index B with fields 1,5,6,7,8. I need to search across both indexes through all fields with the boosting on field 1.
So far I got
GET A,B/_search
{
"query": {
"multi_match" : {
"query" : "somethingToSearch"
}
}
}
Which goes through all fields on both indices, but I would like to have something like this (boosting match on field 1 before the others)
GET A,B/_search
{
"query": {
"multi_match" : {
"query" : "somethingToSearch",
"fields" : ["1^5,*"]
}
}
}
Is there any way how to do it without using bool queries?
I want to populate a filtering field based on the data I have indexed inside Elasticsearch. How can I retrieve this data? For example, my documents inside index "test" and type "doc" could be
{"id":1, "tag":"foo", "name":"foothing"}
{"id":2, "tag":"bar", "name":"barthing"}
{"id":3, "tag":"foo", "name":"something"}
{"id":4, "tag":"quux", "name":"quuxthing"}
I'm looking for something like GET /test/doc/_magic?q=tag that would return [foo,bar,quux] from my data. I don't know what this is called or even possible. I don't want to get all index entries into memory and do this programmatically, I have millions of documents in the index with around a hundred different tags.
Is this possible with ES?
Yes, that's possible and this is called a terms aggregation
You can do it like this:
GET /test/doc/_search
{
"size": 0,
"aggs" : {
"tags" : {
"terms" : {
"field" : "tag.keyword",
"size": 100
}
}
}
}
Note that depending on the cardinality of your tag field, you can increase/decrease the size setting (10 by default).
Consider the following JSON file:
{
"titleSony": "Matrix",
"cast": [
{
"firstName": "Keanu",
"lastName": "Reeves"
}
]
}
Now, I know in ElasticSearch, you can apply a synonym token filter to field values as given in the following link: Elasticsearch Analysis: Synonym token filter.
Hence, I can create a "synonym.txt" file with Matrix => Matx, then if I search for titleSony:Matx, it will return the documents with Matrix as well.
Now, what I would like is to create a synonym for the field name titleSony. For example - titleSony => titleAll, such that when I search for titleAll, I should get all documents with titleSony as well.
Is there any way to accomplish this in ElasticSearch?
Now, what I would like is to create a synonym for the field name "titleSony". For example - titleSony => titleAll , hence when I search for "titleAll", I should get all documents with "titleSony" as well.
Yes, somewhat. Elasticsearch has some default behavior very similar to this, which I'll touch on in a bit.
The feature you're looking for is called "Copy to field." It allows you to specify that the terms in one field should be copied into another. This is useful for consolidating terms you expect to match into a single field, to help simplify your query when you would like to match against any one of a number of fields.
In this example, you would specify in your mapping that the terms in the titleSony field ought to be copied into the titleAll field. Presumably you'd have other fields (say, titleDisney) which also copy into that field as well. So a search against titleAll will effectively match the other fields whose terms are copied into it.
An excerpt of your mapping might look something like this:
{
"movies" : {
"properties" : {
"titleSony" : { "type" : "string", "copy_to" : "titleAll" },
"titleDisney" : { "type" : "string", "copy_to" : "titleAll" },
"titleAll" : { "type" : "string" },
"cast" : { ... },
...
}
}
I mentioned earlier that Elasticsearch does something like this. By default it creates a special field called _all into which all the document's terms are copied. This field lets you construct very simple queries to match against terms that occur in any field on the document. So as you see, this is a fairly common convention in Elasticsearch. (Elasticsearch mapping: _all field.)
I'm using ElasticSearch to index forum threads and reply posts. Each post has a date field associated with it. I'd like to perform a query that includes a date range which will return threads that contain posts matching a date range. I've looked at using a nested mapping but the docs say the feature is experimental and may lead to inaccurate results.
What's the best way to accomplish this? I'm using the Java API.
You haven't said much about your data structure, but I'm inferring from your question that you have post objects which contain a date field, and presumably a thread_id field, ie some way of identifying which thread a post belongs to?
Do you also have a thread object, or is your thread_id sufficient?
Either way, your stated goal is to return a list of threads which have posts in a particular date range. This means that you need to group your threads (rather than returning the same thread_id multiple times for each post in the date range).
This grouping can be done by using facets.
So the query in JSON would look like this:
curl -XGET 'http://127.0.0.1:9200/posts/post/_search?pretty=1&search_type=count' -d '
{
"facets" : {
"thread_id" : {
"terms" : {
"size" : 20,
"field" : "thread_id"
}
}
},
"query" : {
"filtered" : {
"query" : {
"text" : {
"content" : "any keywords to match"
}
},
"filter" : {
"numeric_range" : {
"date" : {
"lt" : "2011-02-01",
"gte" : "2011-01-01"
}
}
}
}
}
}
'
Note:
I'm using search_type=count because I don't actually want the posts returned, just the thread_ids
I've specified that I want the 20 most frequently encountered thread_ids (size: 20). The default would be 10
I'm using a numeric_range for the date field because dates typically have many distinct values, and the numeric_range filter uses a different approach to the range filter, making it perform better in this situation
If your thread_ids look like how-to-perform-a-date-range-elasticsearch-query then you can use these values directly. But if you have a separate thread object, then you can use the multi-get API to retrieve these
your thread_id field should be mapped as { "index": "not_analyzed" } so that the whole value is treated as a single term, rather than being analyzed into separate terms