Is it possible to change the mapping of an existing field from not_analyzed to analyzed?
If not, what can I do to keep all my documents in the index?
I cannot delete the mapping (because all documents would then be gone), and I need that old field to be analyzed.
You cannot modify an existing field, however, you can either create another field or add a sub-field to your not_analyzed field.
I'm going for the latter solution. So first, add a new sub-field to your existing field, like this:
curl -XPUT localhost:9200/index/_mapping/type -d '{
"properties": {
"your_field": {
"type": "string",
"index": "not_analyzed",
"fields": {
"sub": {
"type": "string"
}
}
}
}
}'
Above, we've added the sub-field called your_field.sub (which is analyzed) to the existing your_field (which is not_analyzed)
Next, we'll need to populate that new sub-field. If you're running the latest ES 2.3, you can use the powerful Reindex API
curl -XPOST localhost:9200/_reindex -d '{
"source": {
"index": "index"
},
"dest": {
"index": "index"
},
"script": {
"inline": "ctx._source.your_field = ctx._source.your_field"
}
}'
Otherwise, you can simply use the following Logstash configuration which will re-index your data in order to populate the new sub-field
input {
elasticsearch {
hosts => "localhost:9200"
index => "index"
docinfo => true
}
}
filter {
mutate {
remove_field => [ "@version", "@timestamp" ]
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
manage_template => false
index => "%{[@metadata][_index]}"
document_type => "%{[@metadata][_type]}"
document_id => "%{[@metadata][_id]}"
}
}
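Once the documents have been re-indexed, the parent field and the new sub-field can be queried differently. The following is only a minimal Python sketch that builds the two request bodies. The helper names and the sample values are made up for illustration, and the bodies would still have to be sent to the _search endpoint with a client of your choice.

```python
import json

def exact_query(value):
    # term query on the not_analyzed parent field: only the whole,
    # unmodified value matches
    return {"query": {"term": {"your_field": value}}}

def full_text_query(text):
    # match query on the analyzed sub-field: the text is run through the
    # analyzer first, so single words can match
    return {"query": {"match": {"your_field.sub": text}}}

# Hypothetical sample values, for illustration only
print(json.dumps(exact_query("Quick Brown Fox")))
print(json.dumps(full_text_query("brown")))
```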
https://www.elastic.co/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
You can use this: the multi-field type mapping lets you define more than one mapping for a single field, and you can then query each of those mappings separately.
I have an index mapping with the following configuration:
"mappings" : {
"_source" : {
"excludes" : [
"special_field"
]
},
"properties" : {
"special_field" : {
"type" : "text",
"store" : true
}
}
}
So, when a new document is indexed using this mapping, I get the following result:
{
"_index": "********-2021",
"_id": "************",
"_source": {
...
},
"fields": {
"special_field": [
"my special text"
]
}
}
If a _search query is performed, special_field is not returned inside _source because it is excluded.
With the following _search query, the special_field data is returned correctly:
GET ********-2021/_search
{
"stored_fields": [ "special_field" ],
"_source": true
}
Right now I'm trying to reindex all documents in that index, but I'm losing the data stored in special_field; only the _source field is getting reindexed.
Is there a way to put special_field back inside the _source field?
Is there a way to reindex those documents without losing the special_field data?
How could these documents be migrated to another cluster without losing the special_field data?
Thank you all.
Thx Hamid Bayat, I finally got it using a small logstash pipeline.
I will share it:
input {
elasticsearch {
hosts => "my-first-cluster:9200"
index => "my-index-pattern-*"
user => "****"
password => "****"
query => '{ "stored_fields": [ "special_field" ], "_source": true }'
size => 500
scroll => "5m"
docinfo => true
docinfo_fields => ["_index", "_type", "_id", "fields"]
}
}
filter {
if [@metadata][fields][special_field] {
mutate {
add_field => { "special_field" => "%{[@metadata][fields][special_field]}" }
}
}
}
output {
elasticsearch {
hosts => ["http://my-second-cluster:9200"]
password => "****"
user => "****"
index => "%{[@metadata][_index]}"
document_id => "%{[@metadata][_id]}"
template => "/usr/share/logstash/config/index_template.json"
template_name => "template-name"
template_overwrite => true
}
}
I had to add fields to the docinfo_fields setting of the elasticsearch input plugin (docinfo_fields => ["_index", "_type", "_id", "fields"]); all my stored fields then showed up under the [@metadata][fields] event field.
As the @metadata field is not indexed, I had to add a new field at the root level with the value of [@metadata][fields][special_field].
It's working like a charm.
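The idea behind that filter, copying a stored field from the search hit back into the document body, can be sketched in plain Python. This is only an illustration of the transformation, not what Logstash runs internally: the function name and the sample hit are hypothetical, and unwrapping single-value lists is an assumption about how you want the data shaped.

```python
def merge_stored_fields(hit, names):
    """Return a copy of hit["_source"] with the listed stored fields added."""
    doc = dict(hit.get("_source", {}))
    stored = hit.get("fields", {})
    for name in names:
        values = stored.get(name)
        if values:
            # Stored fields come back as lists; unwrap single values
            # (assumption: the field held one value per document).
            doc[name] = values[0] if len(values) == 1 else values
    return doc

# Hypothetical search hit shaped like the response shown in the question
hit = {
    "_index": "my-index-2021",
    "_id": "1",
    "_source": {"title": "example"},
    "fields": {"special_field": ["my special text"]},
}
print(merge_stored_fields(hit, ["special_field"]))
```

The returned dictionary is what would be indexed into the target cluster, so special_field ends up inside _source there.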
I have two indices:
1) employee_data
{"code":1, "name":"xyz", "city":"Mumbai" }
2) transaction_data
{"code":1, "Month":"June", "payment":78000 }
I want a third index like this:
3) join_index
{"code":1, "name":"xyz", "city":"Mumbai", "Month":"June", "payment":78000 }
How is this possible?
I am trying this in Logstash:
input {
elasticsearch {
hosts => "localhost"
index => "employees_data,transaction_data"
query => '{ "query": { "match": { "code": 1} } }'
scroll => "5m"
docinfo => true
}
}
output {
elasticsearch {
hosts => ["localhost"]
index => "join1"
}
}
You can use the elasticsearch input on employees_data.
In your filters, use the elasticsearch filter on transaction_data:
input {
elasticsearch {
hosts => "localhost"
index => "employees_data"
query => '{ "query": { "match_all": { } } }'
sort => "code:desc"
scroll => "5m"
docinfo => true
}
}
filter {
elasticsearch {
hosts => "localhost"
index => "transaction_data"
query => "code:%{[code]}"
fields => {
"Month" => "Month"
"payment" => "payment"
}
}
}
output {
elasticsearch {
hosts => ["localhost"]
index => "join1"
}
}
Then send the new document to your third index with the elasticsearch output.
You'll have three Elasticsearch connections, so the result can be a little slow.
But it works.
You don't need Logstash to do this; Elasticsearch itself supports it through the enrich processor.
First, you need to create an enrich policy (use the smallest index, let's say it's employees_data):
PUT /_enrich/policy/employee-policy
{
"match": {
"indices": "employees_data",
"match_field": "code",
"enrich_fields": ["name", "city"]
}
}
Then you can execute that policy in order to create an enrichment index
POST /_enrich/policy/employee-policy/_execute
When the enrichment index has been created and populated, the next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/employee_lookup
{
"description" : "Enriching transactions with employee data",
"processors" : [
{
"enrich" : {
"policy_name": "employee-policy",
"field" : "code",
"target_field": "tmp",
"max_matches": "1"
}
},
{
"script": {
"if": "ctx.tmp != null",
"source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
}
}
]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
"source": {
"index": "transaction_data"
},
"dest": {
"index": "join1",
"pipeline": "employee_lookup"
}
}
After running this, the join1 index will contain exactly what you need, for instance:
{
"_index" : "join1",
"_type" : "_doc",
"_id" : "0uA8dXMBU9tMsBeoajlw",
"_score" : 1.0,
"_source" : {
"code":1,
"name": "xyz",
"city": "Mumbai",
"Month": "June",
"payment": 78000
}
}
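To make the data flow easier to follow, here is a rough in-memory Python model of what the policy, the enrich processor and the script processor do together. It is a sketch of the logic only, not of how Elasticsearch implements it; the function name is hypothetical and the sample documents are the ones from the question.

```python
def enrich(transactions, employees, match_field="code",
           enrich_fields=("name", "city")):
    # The lookup table plays the role of the enrich index built by _execute.
    lookup = {
        e[match_field]: {f: e[f] for f in enrich_fields if f in e}
        for e in employees
    }
    joined = []
    for tx in transactions:
        doc = dict(tx)
        tmp = lookup.get(tx.get(match_field))
        if tmp is not None:   # same guard as "ctx.tmp != null"
            doc.update(tmp)   # same effect as ctx.putAll(ctx.tmp)
        joined.append(doc)
    return joined

employees = [{"code": 1, "name": "xyz", "city": "Mumbai"}]
transactions = [{"code": 1, "Month": "June", "payment": 78000}]
print(enrich(transactions, employees))
```

A transaction whose code has no matching employee passes through unchanged, which mirrors the "if" condition on the script processor.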
As far as I know, this cannot be done using Elasticsearch APIs alone. To handle it, you need to set a unique ID on the documents that belong together. For example, the code you mentioned in your question would make a good document ID. You can then reindex the first index into the third one, read the documents from the second index, and use the Update API to update the matching documents in the third index by their IDs. I hope this helps.
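The update-by-ID approach could be sketched as follows: build a Bulk API body in which every document from one index is upserted into the third index under an ID derived from code. This is a minimal sketch under the assumption that there is one transaction per code, because several transactions sharing a code would overwrite each other. The function name is hypothetical, and the resulting body would still need to be sent to the _bulk endpoint.

```python
import json

def bulk_upsert_lines(index, docs, id_field="code"):
    """Build a newline-delimited Bulk API body of partial-document upserts."""
    lines = []
    for doc in docs:
        # Action line: update the document whose _id is the shared code
        lines.append(json.dumps(
            {"update": {"_index": index, "_id": str(doc[id_field])}}))
        # Payload line: merge these fields, creating the document if missing
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    # The Bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

body = bulk_upsert_lines(
    "join1", [{"code": 1, "Month": "June", "payment": 78000}])
print(body)
```

Running the same function over the employee documents and then over the transaction documents would leave one merged document per code in join1.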
I have the Logstash config file below. Elasticsearch is reading my data as "a b", whereas I want it to read it as "ab". I found that I need to use not_analyzed for my sscat field, and max_shingle_size and min_shingle_size for products, to get the best result.
Should I use not_analyzed for the products field as well? Will that give a better result?
How should I fill in my_id_analyzer to actually use the analyzer on different fields?
How should I connect the template with the Logstash config file?
input{
file{
path => "path"
start_position =>"beginning"
}
}
filter{
csv{
separator => ","
columns => ["Index", "Category", "Scat", "Sscat", "Products", "Measure", "Price", "Description", "Gst"]
}
mutate{convert => ["Index", "float"] }
mutate{convert => ["Price", "float"] }
mutate{convert => ["Gst", "float"] }
}
output{
elasticsearch{
hosts => "host"
user => "elastic"
password => "pass"
index => "masterdb"
}
}
I also have a template that can do it for all the future files that I upload:
curl -u user:pass -XPUT "host/_template/logstash-id" -d '{
"template": "logstash-*",
"settings" : {
"analysis": {
"analyzer": {
"my_id_analyzer": {
}
}
}
},
"mappings": {
"properties" : {
"id" : { "type" : "string", "analyzer" : "my_id_analyzer" }
}
}
}'
You can use ignore_above together with not_analyzed when creating the mapping, so that the text is not analyzed and values longer than the given length are not indexed.
Declaring the type as keyword instead of text is another alternative.
As for connecting the template with Logstash, why do you need this? Once the template has been created in Elasticsearch, any index you create whose name matches the template pattern will follow the template definition, and you can start indexing.
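As a rough illustration of both suggestions, the Python sketch below builds an index-creation body in which Sscat is an un-analyzed keyword field with ignore_above and Products uses a shingle-based analyzer. It assumes Elasticsearch 7 or later (typeless mappings and the keyword type). The names product_shingles and product_analyzer, the shingle sizes and the limit of 256 are illustrative choices, not values from the question.

```python
import json

def build_index_body():
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; sizes are example values
                    "product_shingles": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 3,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name
                    "product_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "product_shingles"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                # keyword: indexed as a single term, never analyzed
                "Sscat": {"type": "keyword", "ignore_above": 256},
                "Products": {"type": "text", "analyzer": "product_analyzer"},
            }
        },
    }

print(json.dumps(build_index_body(), indent=2))
```

The printed JSON could be used as the body of an index-creation request, or its settings and mappings could be moved into a template so that future indices pick them up.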
I have already parsed a log file using logstash and put it into elasticsearch. I have a field called IP and it is mapped as a string now. I want to convert the existing mapping in elasticsearch to geoip without running logstash again. I have few million records in elasticsearch with this field. I want to convert the mapping of IP from string to geoip in all the records.
I'm afraid you still have to use Logstash for this because geoip is a Logstash filter and Elasticsearch doesn't have access to the GeoIP database by itself.
Fear not, though: you won't need to re-run Logstash on the raw log lines. You can simply re-index your ES documents using an elasticsearch input plugin and an elasticsearch output plugin, with the geoip filter placed in between to transform the IP field into the geoip one.
Since you can't modify the mapping of your current IP field from string to geo_point, we need to make sure your index is ready to ingest GeoIP data. First check with the following command if your index already contains a geoip field in your mapping (which would have been created by Logstash using its predefined standard logstash-* template).
curl -XGET localhost:9200/logstash-xyz/_mapping
If you see a geoip field in the output of the above command, then you're good to go. Otherwise, we first need to create the geoip field with the type geo_point:
curl -XPUT localhost:9200/logstash-xyz/_mapping/your_type -d '{
"your_type": {
"properties": {
"geoip": {
"type": "object",
"dynamic": true,
"properties": {
"ip": {
"type": "ip",
"doc_values": true
},
"location": {
"type": "geo_point",
"doc_values": true
},
"latitude": {
"type": "float",
"doc_values": true
},
"longitude": {
"type": "float",
"doc_values": true
}
}
}
}
}
}'
Now your mapping is ready to receive GeoIP data. So, next we create a Logstash configuration file called geoip.conf that looks like this:
input {
elasticsearch {
hosts => "localhost:9200"
index => "logstash-xyz"
}
}
filter {
mutate {
remove_field => [ "@version", "@timestamp" ]
}
geoip {
# the field containing the IP string
source => "IP"
}
}
output {
elasticsearch {
host => "localhost"
port => 9200
protocol => "http"
manage_template => false
index => "logstash-xyz"
document_id => "%{id}"
workers => 1
}
}
And then after setting the correct values (host + index), you can run this with bin/logstash -f geoip.conf. After running this, your documents should contain a new field called geoip with the GeoIP information.
Going forth, I suggest you directly add the geoip filter to your normal logstash configuration.
(I'm doing this with a fresh copy of Elasticsearch 1.5.2)
I've defined a custom analyzer and it's working:
curl -XPUT 127.0.0.1:9200/test -d '{
"settings": {
"index": {
"analysis": {
"tokenizer": {
"UrlTokenizer": {
"type": "pattern",
"pattern": "https?://([^/]+)",
"group": 1
}
},
"analyzer": {
"accesslogs": {
"tokenizer": "UrlTokenizer"
}
}
}
}
}
}'; echo
curl '127.0.0.1:9200/test/_analyze?analyzer=accesslogs&text=http://192.168.1.1/123?a=2#1111' | json_pp
Now I apply it to an index:
curl -XPUT 127.0.0.1:9200/test/accesslogs/_mapping -d '{
"accesslogs" : {
"properties" : {
"referer" : { "type" : "string", "copy_to" : "referer_domain" },
"referer_domain": {
"type": "string",
"analyzer": "accesslogs"
}
}
}
}'; echo
From the mapping I can see both of them are applied.
Now I try to insert some data,
curl 127.0.0.1:9200/test/accesslogs/ -d '{
"referer": "http://192.168.1.1/aaa.php",
"response": 100
}';echo
And the copy_to field, aka referer_domain was not generated and if I try to add a field with that name, the tokenizer is not applied either.
Any ideas?
copy_to works, but you are assuming that because you don't see the field being generated, it doesn't exist.
When you retrieve your document (with GET /test/accesslogs/1, for example), you don't see the field under _source. _source contains the original document that was indexed, and you didn't index any referer_domain field, just referer and response. That is why you don't see it.
But Elasticsearch does create that field in the inverted index. You can use it to query, to compute, or to retrieve if you stored it.
Let me illustrate these statements:
You can query that field and you will get results back based on it. If you really want to see what has been stored in the inverted index, you can do this:
GET /test/accesslogs/_search
{
"fielddata_fields": ["referer","response","referer_domain"]
}
You can also retrieve that field if you stored it:
"referer_domain": {
"type": "string",
"analyzer": "accesslogs",
"store" : true
}
with this:
GET /test/accesslogs/_search
{
"fields": ["referer","response","referer_domain"]
}
In conclusion, copy_to modifies the indexed document, not the source document. You can query your documents having that field and it will work because the query looks at the inverted index. If you want to retrieve that field you need to store it, as well. But you will not see that field in the _source field because _source is the initial document that has been indexed. And the initial document doesn't contain referer_domain.
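The distinction between _source and the inverted index can be illustrated with a toy Python model. This is not how Elasticsearch is implemented; it only mimics the behaviour described above, reusing the regular expression from the UrlTokenizer in the question. The function name is hypothetical.

```python
import re

# Same pattern and capture group as the UrlTokenizer defined in the question
URL_HOST = re.compile(r"https?://([^/]+)")

def index_document(source):
    """Toy model: return the stored _source and the indexed terms per field."""
    terms = {}
    # copy_to feeds the referer value into referer_domain at index time,
    # where the accesslogs analyzer extracts the host part
    terms["referer_domain"] = URL_HOST.findall(source.get("referer", ""))
    # _source is kept exactly as it was sent
    return dict(source), terms

stored_source, terms = index_document(
    {"referer": "http://192.168.1.1/aaa.php", "response": 100})
print("referer_domain" in stored_source)  # False
print(terms["referer_domain"])            # ['192.168.1.1']
```

The field is searchable through the terms, yet it never appears in the document returned from _source.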