Elasticsearch index with no fields indexed

I need to create an Elasticsearch index whose contents will be accessed only by the document-id. There will never be any queries related to the contents of documents. These documents can contain any JSON, including instances where the same field can contain different types of data, etc.
I've hunted for this info, and have found much about indexing individual fields, but nothing about treating the entire document as essentially opaque.
Any help would be much appreciated.

You can do what you want, but in my opinion it is not the right way.
First, you need to create a mapping for the index:
PUT index_name_here
{
  "mappings": {
    "document_type_here": {
      "properties": {
        "field_name_for_json_data_here": {
          "type": "object",
          "enabled": false
        }
      }
    }
  }
}
After this, you can create documents with any field structure. You just need to store your JSON not directly in the document, but inside a field of the document (in my example, inside the field "field_name_for_json_data_here").
If possible, tell me why you chose Elasticsearch to store this kind of data. If I understood the question correctly, you need a simple key/value store (you could store your JSON as a string), and many databases are better suited for that.
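For example (using the placeholder names from the mapping above), a document containing arbitrary, even type-inconsistent, JSON can then be indexed inside the disabled field and fetched back by id:
PUT index_name_here/document_type_here/1
{
  "field_name_for_json_data_here": {
    "some_field": "a string this time",
    "nested": { "mixed": [1, "two", true] }
  }
}

GET index_name_here/document_type_here/1
Because the field is disabled, Elasticsearch keeps it in _source but does not index or validate its contents, so conflicting types across documents are fine.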

How to find what index a field belongs to in elasticsearch?

I am new to elasticsearch. I have to write a query using a given field but I don't know how to find the appropriate index. How would I find this information?
Edit: here's an easier/better way, using the mapping API:
GET _mapping/field/<fieldname>
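For instance, assuming a field named user_id (the index and type names below are hypothetical), the response lists every index that maps that field, which directly answers "which index does this field belong to":
GET _mapping/field/user_id

{
  "orders": {
    "mappings": {
      "order": {
        "user_id": {
          "full_name": "user_id",
          "mapping": { "user_id": { "type": "keyword" } }
        }
      }
    }
  }
}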
Another way is to search for documents in which the field exists. Replace <fieldName> with your field's name. /_search searches across all indices and returns any document that has the field. Set _source to false, since you don't care about the document contents, only the index name (returned in each hit's _index).
GET /_search
{
  "_source": false,
  "query": {
    "exists": {
      "field": "<fieldName>"
    }
  }
}
Another, more visual way to do this is through the Kibana Index Management UI (assuming you have the privileges to access it).
There you can click on an index and open its Mappings tab to see all the fields of that particular index. Then just search for the desired field.
Summary:
@Polynomial Proton's answer is the way to go 90% of the time. I just wanted to show you another way to solve your issue. It requires more manual steps than @Polynomial Proton's answer, and if you have a large number of indices it is not appropriate.

Elasticsearch: Set type of field while reindexing? (can it be done with _reindex alone)

Question: Can the Elasticsearch _reindex API be used to set/reset the "field datatypes" of fields that are copied through it?
This question comes from looking at Elastic's docs for reindex: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/docs-reindex.html
Those docs show the _reindex API can modify things while they are being copied. They give the example of changing a field name:
POST _reindex
{
  "source": {
    "index": "from-index"
  },
  "dest": {
    "index": "new-index"
  },
  "script": {
    "source": "ctx._source.New-field-name = ctx._source.remove(\"field-to-change-name-of\")"
  }
}
The script clause will cause "new-index" to have a field called New-field-name instead of the field named field-to-change-name-of from "from-index".
The documentation implies there is a great deal of flexibility in the "script" functionality, but it's not clear to me whether that includes projecting datatypes (for instance, quoting data to turn it into strings/text/keywords, or treating strings as literals to attempt to turn them into non-strings, which is obviously fraught with danger).
If setting the datatypes in a _reindex is possible, I'm not assuming it will be efficient or without (perhaps harsh) limits. I just want to better understand the limits of the _reindex functionality (and figure out whether I can force a datatype in a single interaction, instead of setting the mapping on the new index before I run the reindex command).
(P.S. I happen to be working on Elasticsearch 6.2, but I think my question holds for all versions that have the _reindex API, which sounds like everything 2.3.0 and greater.)
Maybe you are confusing some terms. The part of the documentation you are pointing to refers to the metadata associated with a document; in this case the _type meta field just tells Elasticsearch that a particular document belongs to a specific type (e.g. a user type). It is not related to the datatype of a field (e.g. integer or boolean).
If you want to set/reset the mapping of particular fields, you don't even need to use scripting depending on your case. You just have to create the destination index with the new mapping and execute the _reindex API.
But if you want to change the mapping between incompatible values (e.g. a non numerical string into a field with an "integer" datatype), you would need to do some transformation through scripting or through the ingest node.
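A minimal sketch of the no-scripting approach (index, type, and field names here are hypothetical): create the destination index with the desired mapping first, then let _reindex copy the documents into it:
PUT new-index
{
  "mappings": {
    "doc": {
      "properties": {
        "price": { "type": "integer" }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "from-index" },
  "dest": { "index": "new-index" }
}
As long as the source values are coercible (e.g. the string "42" into an integer), Elasticsearch will index them under the new mapping; incompatible values will cause the affected documents to fail during reindexing.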

difference between a field and the field.keyword

If I add a document with several fields to an Elasticsearch index, when I view it in Kibana, I get each time the same field twice. One of them will be called
some_field
and the other one will be called
some_field.keyword
Where does this behaviour come from and what is the difference between both of them?
PS: one of them is aggregatable (not sure what that means) and the other (without keyword) is not.
Update: a short answer would be that type: text is analyzed, meaning it is broken up into distinct words when stored, which allows free-text searches on one or more words in the field. The .keyword field takes the same input and keeps it as one single string, meaning it can be aggregated on, and you can use wildcard searches on it. Aggregatable means you can use it in aggregations in Elasticsearch, which resemble a SQL GROUP BY if you are familiar with that. In Kibana you would typically use the .keyword field with aggregations to count distinct values etc.
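For instance (index and field names below are hypothetical), a terms aggregation on the .keyword sub-field counts distinct whole values, much like a GROUP BY:
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "cities": {
      "terms": { "field": "city.keyword" }
    }
  }
}
Running the same aggregation on the plain text field would fail (or require enabling fielddata), because text fields are analyzed and not designed for aggregations.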
Please take a look at this article about text vs. keyword.
Briefly: since Elasticsearch 5.0, the string type has been replaced by the text and keyword types. Since then, when you do not specify an explicit mapping, for a simple document with a string:
{
  "some_field": "string value"
}
the following dynamic mapping will be created:
{
  "some_field": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
As a consequence, it will both be possible to perform full-text search on some_field, and keyword search and aggregations using the some_field.keyword field.
I hope this answers your question.
Look at this issue. There is some explanation of your question in it. Roughly speaking, some_field is analyzed and can be used for full-text search. On the other hand, some_field.keyword is not analyzed and can be used in term queries or in aggregations.
I will try to answer your questions one by one.
Where does this behavior come from?
It is introduced in Elastic 5.0.
What is the difference between the two?
some_field is used for full text search and some_field.keyword is used for keyword searching.
Full-text searching is used when we want individual tokens of a field's value to be included in the search. For instance, if you are searching for all hotel names that have "farm" in them, such as hay farm house, Windy harbour farm house, etc.
Keyword searching is used when we want to match the whole value of the field, not individual tokens from it. For example, suppose you are indexing documents with a city field. Aggregating on the analyzed field would produce separate counts for "new" and "york" instead of "new york", which is what you usually expect.
From Elastic 5.0 onwards, strings now will be mapped both as keyword and text by default.
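To make the difference concrete (index and field names below are hypothetical), a match query on the text field hits documents whose city contains the token "new", including "new york":
GET my_index/_search
{
  "query": { "match": { "city": "new" } }
}
whereas a term query on the keyword sub-field matches only the exact, unanalyzed value:
GET my_index/_search
{
  "query": { "term": { "city.keyword": "new york" } }
}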

Elasticsearch: Is it possible to index fields not present in source?

Is it possible to make Elasticsearch index fields that are not present in the source document? An example of what I want to do is to index a Geo Point, but not store the value and to leave it out of the _source field. Then I could do searches and aggregations based on location, geohash etc., but not return the position in the result documents themselves, e.g., for privacy reasons.
The possibility seems not too far-fetched, since mappings can cause fields in the source to be indexed in several different ways; for instance, the Geo Point type can index pos.lon, pos.lat and pos.geohash even though these are not in the original source document.
I have looked at source filtering, but that seems to only apply to searches and not indexing. I did not find a way to use it in aliases.
The only way I've found to accomplish something like this would be to not store _source, but do store all other fields, except the single one I want to hide. That seems overly clumsy though.
I think you can do this with mappings:
In my index creation code, I have the following:
"mappings" : {
"item" : {
"_source" : {"excludes" : ["uploader"]},
"properties" : { ... }
}
},
"settings" : { ... }
('item' is the document type of my index. 'uploader' in this case is an email address - something we want to search by, but don't want to leak to the user.)
Then I just include 'uploader' as usual when indexing source documents. I can search by it, but it's not returned in any results.
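A quick sketch of the resulting behavior (index, type, and field names follow the snippet above; the values are made up): index a document including uploader, then search by it; the returned _source will not contain the field:
PUT my_index/item/1
{
  "title": "sunset photo",
  "uploader": "alice@example.com"
}

GET my_index/_search
{
  "query": { "match": { "uploader": "alice@example.com" } }
}
The document is still found, because the field is indexed; it is only excluded from the stored _source.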
My related question: How to create elasticsearch index alias that excludes specific fields - not quite the same :)

Field not searchable in ES?

I created an index myindex in elasticsearch, loaded a few documents into it. When I visit:
localhost:9200/myindex/mytype/1023
I noticed that my particular index has the following metadata for mappings:
mappings: {
  mappinggroupname: {
    properties: {
      Aproperty: {
        type: string
      },
      Bproperty: {
        type: string
      }
    }
  }
}
Is there some way to add "store:yes" and index: "analyzed" without having to reload/reindex all the documents?
Note that when I want to view a single document,
i.e. localhost:9200/myindex/mytype/1023,
I can see that the _source field contains all the fields of that document, and when I go to the "Browser" section of the head plugin, all the columns appear correct and correspond to my field names. So why is "stored" not showing up in the metadata? I can even perform a _search on them.
What is the difference between "stored": "true" and the fact that I can see all my fields and values after indexing my documents via the means I mention above?
Nope, no way! That's how your documents got indexed in the underlying Lucene. The only way to change it is to reindex them all!
You see all those fields because you see the content of the special _source field in Lucene, which is stored by default by Elasticsearch. You are not storing all the fields separately, but you do have the originally indexed document in _source, a single field that contains the whole document.
Generally the _source field is just enough, you don't usually need to configure every field as stored.
Also, the default is "index":"analyzed" if not specified for all the string fields. That means those fields are indexed and analyzed using the standard analyzer if not specified in the mapping. Therefore, as far as I can see from your mapping those two fields should be indexed, thus searchable.
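If you later decide you really do need stored fields, the usual path (a sketch; index and field names are hypothetical, and the old-style string/store syntax assumes a pre-5.0 cluster like the one in the question) is to create a new index with the updated mapping and copy the data over:
PUT myindex_v2
{
  "mappings": {
    "mytype": {
      "properties": {
        "Aproperty": { "type": "string", "store": true }
      }
    }
  }
}

POST _reindex
{
  "source": { "index": "myindex" },
  "dest": { "index": "myindex_v2" }
}
Note that _reindex is only available from Elasticsearch 2.3 onwards; on older versions you would re-ingest from the original source or scroll-and-bulk the documents yourself.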