Elasticsearch - force a field to index only, avoid store - elasticsearch

How do I force a field to be indexed only, without storing the data? This option is available in Solr, and I'm not sure whether it's possible in Elasticsearch.

From the documentation:
By default, field values are indexed to make them searchable, but they
are not stored. This means that the field can be queried, but the
original field value cannot be retrieved.
Usually this doesn’t matter. The field value is already part of the
_source field, which is stored by default. If you only want to retrieve the value of a single field or of a few fields, instead of
the whole _source, then this can be achieved with source filtering
If you don't want the field to be stored in _source either, you can exclude it from _source in the mapping.
Mapping:
{
"mappings": {
"properties": {
"title":{
"type":"text"
},
"description":{
"type": "text"
}
},
"_source": {
"excludes": [
"description"
]
}
}
}
Query:
GET logs/_search
{
"query": {
"match": {
"description": "b" --> field description is searchable(indexed)
}
}
}
Result:
"hits" : [
{
"_index" : "logs",
"_type" : "_doc",
"_id" : "-aC9V3EBkD38P4LIYrdY",
"_score" : 0.2876821,
"_source" : {
"title" : "a" --> field "description" is not returned
}
}
]
Note:
Removing fields from _source has downsides. The following features are affected:
The update, update_by_query, and reindex APIs.
On the fly highlighting.
The ability to reindex from one Elasticsearch index to another, either to change mappings or analysis, or to upgrade an index to a new major version.
The ability to debug queries or aggregations by viewing the original document used at index time.
Potentially in the future, the ability to repair index corruption automatically.
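If the goal is only to keep a field out of the response, rather than out of _source altogether, the source filtering mentioned in the quoted documentation avoids these downsides. Below is a minimal sketch, assuming Python on the client side. The function name build_filtered_search is hypothetical, and the dict it returns is simply the JSON body you would send to GET logs/_search.

```python
import json


def build_filtered_search(field, text, source_includes):
    """Build a search body that matches `text` on `field` and asks
    Elasticsearch to return only the listed _source fields."""
    return {
        "_source": list(source_includes),
        "query": {"match": {field: text}},
    }


# Search on "description" but return only "title" from _source.
body = build_filtered_search("description", "b", ["title"])
print(json.dumps(body, indent=2))
```

With this approach the description field stays in _source on disk, so update and reindex keep working on the full document.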

Related

Is there a way to enable _source on existing data?

I created an index without the _source field (for memory reasons).
I want to enable this field on the existing data. Is there a way to do that?
For example:
I will create dummy-index :
PUT /dummy-index?pretty
{
"mappings": {
"_doc": {
"_source": {
"enabled": false
}
}
}
}
and I will add the next document :
PUT /dummy-index/_doc/1?pretty
{
"name": "CoderIl"
}
When I search, I get only the hit metadata, without the name field:
{
"_index" : "dummy-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0
}
The question is whether I can change _source to enabled so that, when I search again, I get the missing data (in this example, the "name" field):
{
"_index" : "dummy-index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "CoderIl"
}
}
As clarified in the chat, the issue is that the _source field is disabled.
In the search results, the asker wants the values that were sent for the fields, which would be returned as part of _source if it were enabled, like below:
"_source" : {
"name" : "CoderIl"
}
To achieve this, the store option must be enabled on the field. Please note that this can't be changed dynamically: you have to re-index the data with the updated mapping.
Example
Index mapping
{
"mappings": {
"_source": {
"enabled": false
},
"properties": {
"name": {
"type": "text"
},
"title" :{
"type" : "text",
"store" : true
}
}
}
}
Index sample docs
{
"name" : "coderIL"
}
{
"name" : "coderIL",
"title" : "seconds docs"
}
Search docs and retrieve field content using stored fields
{
"stored_fields": [
"title"
],
"query": {
"match": {
"name": "coderIL"
}
}
}
And search result
"hits": [
{
"_index": "without_source",
"_type": "_doc",
"_id": "1",
"_score": 0.18232156
},
{
"_index": "without_source",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"fields": {
"title": [
"seconds docs"
]
}
}
]
The store option on a field controls this. From the same official doc:
By default, field values are indexed to make them searchable, but they
are not stored. This means that the field can be queried, but the
original field value cannot be retrieved.
Usually this doesn’t matter. The field value is already part of the
_source field, which is stored by default. If you only want to retrieve the value of a single field or of a few fields, instead of
the whole _source, then this can be achieved with source filtering.
As mentioned in the doc, store is disabled by default. If you want to save space, you can enable it only on the specific fields you need to retrieve, and you then need to re-index the data.
Edit: The index option (enabled by default) controls whether a field is indexed, which is required for searching on the field. The store option controls whether the original value is stored; use it if you want to get back the non-analyzed value, i.e. what you sent to ES in your index request, as opposed to the output of text analysis, which depends on the field type and is part of the index option. Refer to this SO question for more info.
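As the search result above shows, stored values come back under a "fields" key as lists, and a hit that has no stored value for the requested field has no "fields" key at all. A minimal sketch of reading such hits on the client, assuming Python; the helper name stored_field_values is hypothetical and the sample hits are copied from the result above.

```python
def stored_field_values(hits, field):
    """Map each hit's _id to its stored values for `field`.

    Hits without a "fields" section (or without this field) map to
    an empty list instead of raising a KeyError.
    """
    return {hit["_id"]: hit.get("fields", {}).get(field, []) for hit in hits}


hits = [
    {"_index": "without_source", "_type": "_doc", "_id": "1",
     "_score": 0.18232156},
    {"_index": "without_source", "_type": "_doc", "_id": "2",
     "_score": 0.18232156, "fields": {"title": ["seconds docs"]}},
]

values = stored_field_values(hits, "title")
print(values)
```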

Can I use Elasticsearch with data that needs authentication to view (e.g. logged-in users only)?

I want to implement search on my website. Users should be able to search for products that they have in their shop. Obviously, the products returned should only be theirs, and the same applies when customers search on their website. How can I implement this with Elasticsearch? Obviously, I will have my backend do the query, not the front end, but how do I limit the search results to a single user? Is it only possible through filtering in my own code? Does Elasticsearch have something like WHERE from SQL? Am I going about it the wrong way? Would it be better to use full-text search from PostgreSQL?
I am using Go, by the way.
Best regards
Update: my use case, as requested:
Each user is paired with an ID. The user is in their dashboard and searches for a product they have in their shop. The request passes the session token cookie, and I get the user's ID on my server. Then I need to get the products that match the query, and only that user's products.
In SQL it would be SELECT * FROM products WHERE shop_id=ID, for example. Is this possible with Elasticsearch? Is it more trouble than it's worth compared with implementing full-text search on PostgreSQL?
This can be easily achieved using Elasticsearch. Define shop_id as a keyword field and use it in the filter context of the query to make sure you search only the products that belong to a particular shop_id.
Using shop_id in filter context can also improve search performance, because frequently used filters are cached automatically by Elasticsearch, as explained in the official doc:
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated. Filter context is mostly used for
filtering structured data, e.g.
Is the status field set to "published"?
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.
Sample mapping and query according to your requirement:
Index mapping
{
"mappings" :{
"properties" : {
"product" : {
"type" : "text"
},
"shop_id" :{
"type" : "keyword"
}
}
}
}
Index sample docs for 2 different shop IDs
{
"product" : "foo",
"shop_id" : "stackoverflow"
}
{
"product" : "foo",
"shop_id" : "opster"
}
Search for foo product where shop_id is stackoverflow
{
"query": {
"bool": {
"must": [
{
"match": {
"product": "foo"
}
}
],
"filter": [
{
"term": {
"shop_id": "stackoverflow"
}
}
]
}
}
}
Search result
"hits": [
{
"_index": "productshop",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": { --> note that only the foo belonging to `stackoverflow` is returned
"product": "foo",
"shop_id": "stackoverflow"
}
}
]
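On the backend, the shop_id in the filter should come from the authenticated session, never from user input, so that a user cannot search another shop's products. A minimal sketch of building the query above, written in Python for illustration (the asker uses Go, where the same structure applies); the function name build_shop_search is hypothetical.

```python
import json


def build_shop_search(shop_id, text):
    """Build the bool query from the answer: full-text match on
    `product`, restricted to one shop by a non-scoring term filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"product": text}}],
                "filter": [{"term": {"shop_id": shop_id}}],
            }
        }
    }


# shop_id would be resolved server-side from the session token.
body = build_shop_search("stackoverflow", "foo")
print(json.dumps(body, indent=2))
```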

How to query just the document names of all documents in an index in Elasticsearch

PS: I'm new to Elasticsearch.
http://localhost:9200/indexname/domains/<mydocname>
Let's suppose indexname is our index, and I'm uploading a lot of documents at <mydocname> with domain names, for example:
http://localhost:9200/indexname/domains/google.com
http://localhost:9200/indexname/domains/company.com
http://localhost:9200/indexname/_count reports "count": 119687, so that is how many documents we have.
I just want Elasticsearch to return the document names of all 119687 entries, which are domain names.
How do I achieve that, and is it possible in a single query?
Looking at the example http://localhost:9200/indexname/domains/google.com, I am assuming your doc_type is domains and the doc ID ("document name") is google.com.
_id is the document name here, and it is always part of the response. You can use source filtering to disable _source, and the response will then look something like this:
GET indexname/_search
{
"_source": false
}
Output
{
...
"hits" : [
{
"_index" : "indexname",
"_type" : "domains",
"_id" : "google.com",
"_score" : 1.0
}
]
...
}
If documentname is a mapped field, you can still use source filtering to include only that field.
GET indexname/_search
{
"_source": ["documentname"]
}
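Once the response comes back, the document names are the _id values of the hits. A minimal sketch of collecting them, assuming Python and a response already parsed from JSON; the helper name document_ids is hypothetical. Note that a single search returns only one page of hits (10 by default), so collecting all entries would need paging through the results, which is not shown here.

```python
def document_ids(response):
    """Return the _id of every hit in a parsed search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


# Trimmed-down response in the shape returned by GET indexname/_search.
response = {
    "hits": {
        "hits": [
            {"_index": "indexname", "_type": "domains",
             "_id": "google.com", "_score": 1.0},
            {"_index": "indexname", "_type": "domains",
             "_id": "company.com", "_score": 1.0},
        ]
    }
}

ids = document_ids(response)
print(ids)
```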

Attempting to use Elasticsearch Bulk API when _id is equal to a specific field

I am attempting to bulk insert documents into an index. I need to have _id equal to a specific field that I am inserting. I'm using ES v6.6
POST productv9/_bulk
{ "index" : { "_index" : "productv9", "_id": "in_stock"}}
{ "description" : "test", "in_stock" : "2001"}
GET productv9/_search
{
"query": {
"match": {
"_id": "2001"
}
}
}
When I run the bulk statement it runs without any error. However, when I run the search statement it is not getting any hits. Additionally, I have many additional documents that I would like to insert in the same manner.
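The _id in the bulk action line is taken literally, so the document above was indexed with the ID in_stock rather than 2001. What I suggest is to create an ingest pipeline that sets the _id of your document based on the value of the in_stock field.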
First create the pipeline:
PUT _ingest/pipeline/set_id
{
"description" : "Sets the id of the document based on a field value",
"processors" : [
{
"set" : {
"field": "_id",
"value": "{{in_stock}}"
}
}
]
}
Then you can reference the pipeline in your bulk call:
POST productv9/doc/_bulk?pipeline=set_id
{ "index" : {}}
{ "description" : "test", "in_stock" : "2001"}
By calling GET productv9/doc/2001 (using the same type as in the bulk call) you will get your document.
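A different, client-side approach (not the pipeline method used above) is to generate the bulk body yourself and copy the field value into each action line. A minimal sketch, assuming Python; the function name build_bulk_body is hypothetical, and the index and type are assumed to be given in the request URL as in the bulk call above.

```python
import json


def build_bulk_body(docs, id_field):
    """Build an NDJSON bulk body whose action lines take _id from
    the `id_field` value of each document."""
    lines = []
    for doc in docs:
        # Action line: the literal value of the field becomes the _id.
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself, unchanged.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"description": "test", "in_stock": "2001"}]
body = build_bulk_body(docs, "in_stock")
print(body, end="")
```

This trades the pipeline for a little client code, which may be simpler when the documents are already being assembled in a script.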

How to index same doc in different indices with different routing

I need to be able to index the same document in different indexes with different routing value.
Basically the problem to solve is to be able to calculate complex aggregations about payment information from the perspective of payer and collector. For example, "payments made / received in the last 15 days grouped by status"
I was wondering how we can achieve this using the Elasticsearch Bulk API.
Is it possible to achieve this without generating redundancy in the ndjson? Something like this for example:
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "1", "routing": "1234" } }
{ "index" : { "_index" : "test_2", "_id" : "1", "routing": "5678" } }
{ "field1" : "value1" }
I looked through the documentation but didn't find a place that explains this.
By only using the bulk API, you'll need to repeat the document each time.
Another way of doing it is to bulk-index the documents into the first index and then using the Reindex API you can create the second index with a different routing value for each document.
POST _bulk
{ "index" : { "_index" : "test_1", "_id" : "1", "routing": "1234" } }
{ "field1" : "value1", "routing2": "5678" }
You can then reindex into a second index using the second routing value (which you need to store in the document somehow):
POST _reindex
{
"source": {
"index": "test_1"
},
"dest": {
"index": "test_2"
},
"script": {
"source": "ctx._routing = ctx._source.routing2",
"lang": "painless"
}
}
That way, you only send the data once through the bulk API, which should take roughly half the time of sending every document twice, and the Reindex API then copies all the data internally (i.e. without the added network latency of sending the potentially big payload).
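For the bulk-only route, where the document has to be repeated for each index, the repetition can at least be generated rather than written by hand. A minimal sketch, assuming Python; the function name build_multi_index_bulk is hypothetical, and the index names and routing values are the ones from the question.

```python
import json


def build_multi_index_bulk(doc, doc_id, targets):
    """Build an NDJSON bulk body that indexes the same document into
    several indices, each with its own routing value.

    `targets` is a list of (index_name, routing) pairs.
    """
    lines = []
    for index_name, routing in targets:
        action = {"index": {"_index": index_name, "_id": doc_id,
                            "routing": routing}}
        lines.append(json.dumps(action))
        # The bulk API needs the source line after every action line.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


body = build_multi_index_bulk(
    {"field1": "value1"}, "1", [("test_1", "1234"), ("test_2", "5678")]
)
print(body, end="")
```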
