retrieving the multivalued array from elasticsearch - elasticsearch

I am having trouble retrieving a facet from my index.
Basically, I want to get the details of one particular facet, say "Company", in a separate array.
I have tried many queries, but they all return every facet under the facet array. How can I get only a particular facet in the facet array?
My index is https://gist.github.com/4015817
Please help me. I am badly stuck here.

Considering how complex your data structure is, the simplest way to extract this information might be to use script fields:
curl "localhost:9200/index/doc/_search?pretty=true" -d '{
"query" : {
"match_all" : {
}
},
"script_fields": {
"entity_facets": {
"script": "result=[];foreach(facet : _source.Categories.Types.Facets) {if(facet.entity==entity) result.add(facet);} result",
"params": {
"entity": "Country"
}
},
"first_facet": {
"script": "_source.Categories.Types.Facets[0]"
}
}
}'
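If script fields are not available, the same filtering can be done on the client after fetching _source. The following is only a minimal sketch: it assumes that Categories and Types are plain objects and that Facets is a list of objects with an "entity" key, which is inferred from the path used in the script above and not from the actual index. The sample document and the helper name facets_for_entity are hypothetical.

```python
def facets_for_entity(source, entity):
    """Return only the facets whose 'entity' equals the requested value."""
    # Assumed path, mirroring _source.Categories.Types.Facets in the script.
    facets = source["Categories"]["Types"]["Facets"]
    return [facet for facet in facets if facet.get("entity") == entity]


# Hypothetical document, shaped only to match the path used in the script.
sample_source = {
    "Categories": {
        "Types": {
            "Facets": [
                {"entity": "Country", "value": "Iceland"},
                {"entity": "Company", "value": "Acme"},
                {"entity": "Country", "value": "Ghana"},
            ]
        }
    }
}

# Equivalent of the "entity_facets" script field with params.entity = "Country".
country_facets = facets_for_entity(sample_source, "Country")

# Equivalent of the "first_facet" script field.
first_facet = sample_source["Categories"]["Types"]["Facets"][0]
```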

Related

How to rank ElasticSearch documents based on scores

I have an Elasticsearch index that contains thousands of documents; each document represents a user.
Each document has a set of fields (is_verified: boolean, country: string, is_creator: boolean). I also have another service that calls ES search to look up documents. How can I rank the retrieved documents based on those fields? For example, a matching verified user should come before an unverified one.
Is there some kind of document scoring while indexing the documents? If yes, can I modify it based on my criteria?
What should I read to understand how ranking works in Elasticsearch?
Thanks.
I guess the sorting function mentioned by Mikael is pretty straightforward and should cover your use cases. Check the Elastic docs for more information on that.
But in case you want to do really fancy sorting, maybe you could use a bool query with different boost values to set your desired relevancy for each matched field. I tried to come up with a real-life example, but honestly didn't find one. For the sake of completeness, the following snippet should give you an idea of how to achieve results similar to the sort API (but still, I would prefer using sort).
GET /yourindexname/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Monica"
}
}
],
"should": [
{
"term": {
"is_verified": {
"value": true,
"boost": 2
}
}
},
{
"term": {
"is_creator": {
"value": true,
"boost": 2
}
}
}
]
}
}
}
is there some kind of document scoring while indexing the documents ? if yes can i modify it based on my criteria ?
I wouldn't assign a fixed score to a document while indexing, as the score should depend on the query. However, if you insist on having a predefined relevancy for each document, you could in theory add a field named relevancy that holds that value and use it for ordering later in the query:
GET /yourindexname/_search
{
"query" : {
"match" : {
"name": "Monica"
}
},
"sort" : [
{
"relevancy": {
"order": "desc"
}
},
"_score"
]
}
You can consider using the Sort API inside your search queries. In the example below, we search on the field country and sort the results by the boolean field is_verified. You can also add the other boolean field inside the sort brackets.
GET /yourindexname/_search
{
"query" : {
"match" : {
"country": "Iceland"
}
},
"sort" : [
{
"is_verified": {
"order": "desc"
}
}
]
}
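To combine the ideas above (boolean fields first, relevance as a tie-breaker), the request body can be assembled programmatically. This is a rough sketch and not a definitive implementation: the helper name build_ranked_search is made up for illustration, and it only builds the JSON body, it does not talk to a cluster.

```python
import json


def build_ranked_search(field, text, priority_fields):
    """Build a search body that sorts by the given fields, then by _score."""
    # One descending sort clause per priority field, in the order given.
    sort = [{name: {"order": "desc"}} for name in priority_fields]
    # Fall back to the relevance score when all priority fields are equal.
    sort.append("_score")
    return {"query": {"match": {field: text}}, "sort": sort}


body = build_ranked_search("name", "Monica", ["is_verified", "is_creator"])
print(json.dumps(body, indent=2))
```

With this body, verified users come before unverified ones, creators come first within each of those groups, and the match score decides the rest.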

Elasticsearch aggregate by field prefix

I have data entries of the form
{
"id": "ABCxxx",
// Other fields
}
Here ABC is a unique identifier that defines the "type" of the record. (For example, a user would be USR1234..., an image would be IMG1234...)
I want to get a list of all the different types of records that I have in my ES. So in essence I want to group by id, but looking only at the first three characters of the id.
This obviously doesn't work, because it groups by the full id (so USR123 is different from USR456):
{
"fields": ["id"],
"aggs": {
"group_by_id": {
"terms": {
"field": "id"
}
}
}
}
How do I write this query?
You can use the Painless scripting language to accomplish this.
{
"fields": ["id"],
"aggs": {
"group_by_id": {
"terms": {
"script" : {
"inline": "doc['id'].value.substring(0,3)",
"lang": "painless"
}
}
}
}
}
More info here. Please note that the syntax for the substring method may not be exactly right.
As paqash already suggested, this can be achieved with a script, but I would suggest an alternative: store "type" as a separate field in your schema.
For example:
USR1234 : {id:"USR1234", type:"USR"}
IMG1234 : {id:"IMG1234", type:"IMG"}
This would avoid unnecessary complications in scripting and keep your query interface clean.
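The separate "type" field can be derived just before indexing. The sketch below assumes the prefix is always the first three characters of id, as described in the question; the helper with_type and the sample ids are hypothetical. The local Counter only mimics what a terms aggregation on "type" would return.

```python
from collections import Counter


def with_type(doc, prefix_len=3):
    """Return a copy of the document with 'type' set to the id prefix."""
    enriched = dict(doc)
    enriched["type"] = doc["id"][:prefix_len]
    return enriched


docs = [{"id": "USR1234"}, {"id": "USR4567"}, {"id": "IMG1234"}]
enriched = [with_type(d) for d in docs]

# Local stand-in for the bucket counts of a terms aggregation on "type".
type_counts = Counter(d["type"] for d in enriched)

# Once "type" is indexed, a plain terms aggregation needs no script.
agg_body = {"size": 0, "aggs": {"group_by_type": {"terms": {"field": "type"}}}}
```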

Is it possible to return a specific field when running a query in sense for elasticsearch

I have loaded some data into Elasticsearch and written a query against it; however, the results contain all of the data for the matching records. Is it possible to filter the results to show only a particular field?
Example:
A query to find all records for a specific country, but returning only a list of registration numbers.
All the data is available in Elasticsearch, but I get a full JSON record back for each match.
I'm running this query in Sense (within Kibana 4.5.0).
The query is:
GET _search
{
filter_path=reg_no.*,
"fields" : ["reg_no"],
"query" : {
"fields" : ["country_cd", "oprg_stat"],
"query" : "956 AND 9074"
}
}
If I remove the two lines
filter_path=reg_no.*,
"fields" : ["reg_no"],
the query runs but brings back all the data.
Try this query:
POST _search
{
"_source": [
"reg_no"
],
"query": {
"bool": {
"filter": [
{
"term": {
"country_cd": "956"
}
},{
"term": {
"oprg_stat": "9074"
}
}
]
}
}
}
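The same body can be generated from a list of fields to return and a mapping of exact-match filters. This is only an illustrative sketch; build_filtered_search is a hypothetical helper that produces the JSON body shown above and nothing more.

```python
import json


def build_filtered_search(return_fields, term_filters):
    """Build a body that returns only some _source fields for exact matches."""
    # One term clause per field/value pair, all combined in filter context.
    filters = [{"term": {name: value}} for name, value in term_filters.items()]
    return {
        "_source": list(return_fields),
        "query": {"bool": {"filter": filters}},
    }


body = build_filtered_search(
    ["reg_no"], {"country_cd": "956", "oprg_stat": "9074"}
)
print(json.dumps(body, indent=2))
```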

Sort documents by size of a field

I have documents like below indexed,
1.
{
"name": "Gilly",
"hobbyName" : "coin collection",
"countries": ["US","France","Georgia"]
}
2.
{
"name": "Billy",
"hobbyName":"coin collection",
"countries":["UK","Ghana","China","France"]
}
Now I need to sort these documents based on the array length of the field "countries", such that the result after the sorting would be of the order document2,document1. How can I achieve this using elasticsearch?
You can use script based sorting to achieve this.
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "number",
"script": "doc['countries'].values.size()",
"order": "desc"
}
}
}
I would suggest using the token_count type in Elasticsearch.
It can be done with scripts (you can check here for how to do it using scripts), but then the results won't be perfect.
Scripts mostly use the fielddata cache, in which duplicates are removed.
You can read more on how to use the token count type here.
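A simpler variant of the same idea, which is not the token_count mapping itself, is to store the array length in its own numeric field at index time and sort on that. The sketch below assumes a hypothetical field name countries_count; the local sort only illustrates the order the query would be expected to produce for the two documents from the question.

```python
def with_country_count(doc):
    """Return a copy of the document with the array length precomputed."""
    enriched = dict(doc)
    # Raw list length, so duplicate values are still counted.
    enriched["countries_count"] = len(doc.get("countries", []))
    return enriched


docs = [
    {"name": "Gilly", "countries": ["US", "France", "Georgia"]},
    {"name": "Billy", "countries": ["UK", "Ghana", "China", "France"]},
]
enriched = [with_country_count(d) for d in docs]

# Search body that sorts on the precomputed field instead of a script.
sort_body = {
    "query": {"match_all": {}},
    "sort": [{"countries_count": {"order": "desc"}}],
}

# Local illustration of the resulting order.
ordered_names = [
    d["name"]
    for d in sorted(enriched, key=lambda d: d["countries_count"], reverse=True)
]
```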

How to retrieve all document ids (_id) in a specific index

I am trying to retrieve all documents in an index, while getting only the _id field back.
Basically I want to retrieve all the document ids I have.
While using:
{
"query": {
"match_all": {}
},
"fields": []
}
The hits I get contain "_index", "_type", "_id", "_score" and "_source",
which is way more than I need.
Edit(Answer):
So my problem was that I used KOPF to run the queries, and the results were not accurate (got the _source and some more..)! When using curl I got the correct results!
So the above query actually achieved what I needed!
You can also use:
{
"query": {
"match_all": {}
},
"_source": false
}
or
{
"query": {
"match_all": {}
},
"fields": ["_id"]
}
In Elasticsearch, the fields array can only select specific _source fields.
_index, _type, _id and _score are always returned by Elasticsearch.
There is no way to remove them from the response.
I am assuming you mean the _id of the documents in your index, not of the index itself.
In newer versions of Elasticsearch, "_source" is used to retrieve selected fields of a document, because the _source field contains the whole original document that was indexed.
Example:
Let's say the index name is "movies", the type is "movie", and you want to retrieve the movieName and movieTitle of all Elasticsearch records.
curl -XGET 'http://localhost:9200/movies/movie/_search?pretty=true' -d '
{
"query" : {
"match_all" : {}
},
"_source": ["movieName","movieTitle"]
}'
Or: http://localhost:9200/movies/movie/_search?pretty=true&_source=movieName,movieTitle
By default it returns 10 results. To get n records, add &size=n to the URL.
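Since the hit metadata always comes back, the ids can simply be picked out of the response on the client. The following is a small sketch that works on an already parsed JSON response; the sample response is made up and only follows the usual hits.hits layout, and the helper names are hypothetical.

```python
def build_id_search(size):
    """Build a match_all body that skips _source and returns `size` hits."""
    return {"query": {"match_all": {}}, "_source": False, "size": size}


def extract_ids(response):
    """Collect the _id of every hit in a parsed search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Hypothetical parsed response, trimmed to the parts used here.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "movies", "_type": "movie", "_id": "1", "_score": 1.0},
            {"_index": "movies", "_type": "movie", "_id": "2", "_score": 1.0},
        ],
    }
}

id_body = build_id_search(100)
ids = extract_ids(sample_response)
```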
