ElasticSearch : field not returned - elasticsearch

I am new to ElasticSearch, please forgive my stupidity.
I can't seem to get the keepalive field out of ES.
{
"_index" : "2013122320",
"_type" : "log",
"_id" : "Y1M18ZItTDaap_rOAS5YOA",
"_score" : 1.0
}
I can get other fields out of it, for example cdn:
{
"_index" : "2013122320",
"_type" : "log",
"_id" : "2neLlVNKQCmXq6etTE6Kcw",
"_score" : 1.0,
"fields" : {
"cdn" : "-"
}
}
The mapping is there:
{
"log": {
"_timestamp": {
"enabled": true,
"store": true
},
"properties": {
"keepalive": {
"type": "integer"
}
}
}
}
EDIT
We create a new index every hour using the following Perl code:
create_index(
index => $index,
settings => {
_timestamp => { enabled => 1, store => 1 },
number_of_shards => 3,
number_of_replicas => 1,
},
mappings => {
varnish => {
_timestamp => { enabled => 1, store => 1 },
properties => {
content_length => { type => 'integer' },
age => { type => 'integer' },
keepalive => { type => 'integer' },
host => { type => 'string', index => 'not_analyzed' },
time => { type => 'string', store => 'yes' },
<SNIPPED>
location => { type => 'string', index => 'not_analyzed' },
}
}
}
);

With so little information, I can only guess:
In the mapping you gave, keepalive is not explicitly stored, and Elasticsearch defaults to not storing fields. If you do not store a field, you can only get it via the complete _source, which is stored by default. Or you change the mapping, adding "store" : "yes" to your field, and reindex.
Good luck with ES, it is well worth a few days of learning.
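To make the two options concrete, here is a rough sketch (index, type, and field names are taken from the question; the exact request syntax may differ slightly on very old 0.90.x clusters):
# Option 1: keepalive is still inside the stored _source, so read it from there
curl -XGET 'localhost:9200/2013122320/log/_search?pretty' -d '{
"query" : { "match_all" : {} }
}'
# each hit's _source then contains "keepalive"

# Option 2: mark the field as stored in the mapping used for the next hourly index, e.g.
#   keepalive => { type => 'integer', store => 'yes' },
# and then request it as a stored field:
curl -XGET 'localhost:9200/2013122320/log/_search?pretty' -d '{
"query" : { "match_all" : {} },
"fields" : [ "keepalive" ]
}'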

Related

creating data stream through logstash

I have installed an Elasticsearch cluster, v7.14.
I have created an ILM policy and an index template. However, the data stream parameters mentioned in the Logstash pipeline file are giving an error.
ILM policy -
{
"testpolicy" : {
"version" : 1,
"modified_date" : "2021-08-28T02:58:25.942Z",
"policy" : {
"phases" : {
"hot" : {
"min_age" : "0ms",
"actions" : {
"rollover" : {
"max_primary_shard_size" : "900mb",
"max_age" : "2d"
},
"set_priority" : {
"priority" : 100
}
}
},
"delete" : {
"min_age" : "2d",
"actions" : {
"delete" : {
"delete_searchable_snapshot" : true
}
}
}
}
},
"in_use_by" : {
"indices" : [ ],
"data_streams" : [ ],
"composable_templates" : [ ]
}
}
}
Index template -
{
"index_templates" : [
{
"name" : "access_template",
"index_template" : {
"index_patterns" : [
"test-data-stream*"
],
"template" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"number_of_replicas" : "0"
}
},
"mappings" : {
"_routing" : {
"required" : false
},
"dynamic_date_formats" : [
"strict_date_optional_time",
"yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
],
"numeric_detection" : true,
"_source" : {
"excludes" : [ ],
"includes" : [ ],
"enabled" : true
},
"dynamic" : true,
"dynamic_templates" : [ ],
"date_detection" : true
}
},
"composed_of" : [ ],
"priority" : 500,
"version" : 1,
"data_stream" : {
"hidden" : false
}
}
}
]
}
Logstash pipeline config file -
input {
beats {
port => 5044
}
}
filter {
if [log_type] == "access_server" and [app_id] == "pa"
{
grok {
match => {
"message" => "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:%{MINUTE}(?::?%{SECOND})\| %{USERNAME:exchangeId}\| %{DATA:trackingId}\| %{NUMBER:RoundTrip:int}%{SPACE}ms\| %{NUMBER:ProxyRoundTrip:int}%{SPACE}ms\| %{NUMBER:UserInfoRoundTrip:int}%{SPACE}ms\| %{DATA:Resource}\| %{DATA:subject}\| %{DATA:authmech}\| %{DATA:scopes}\| %{IPV4:Client}\| %{WORD:method}\| %{DATA:Request_URI}\| %{INT:response_code}\| %{DATA:failedRuleType}\| %{DATA:failedRuleName}\| %{DATA:APP_Name}\| %{DATA:Resource_Name}\| %{DATA:Path_Prefix}"
}
}
mutate {
replace => {
"[type]" => "access_server"
}
}
}
}
output {
if [log_type] == "access_server" {
elasticsearch {
hosts => ['http://10.10.10.76:9200']
user => elastic
password => xxx
data_stream => "true"
data_stream_type => "logs"
data_stream_dataset => "access"
data_stream_namespace => "default"
ilm_rollover_alias => "access"
ilm_pattern => "000001"
ilm_policy => "testpolicy"
template => "/tmp/access_template"
template_name => "access_template"
}
}
elasticsearch {
hosts => ['http://10.10.10.76:9200']
index => "%{[#metadata][beat]}-%{[#metadata][version]}-%{+YYYY.MM.dd}"
user => elastic
password => xxx
}
}
After all the deployment was done, I can only see system indices; the data stream is not created.
[2021-08-28T12:42:50,103][ERROR][logstash.outputs.elasticsearch][main] Invalid data stream configuration, following parameters are not supported: {"template"=>"/tmp/pingaccess_template", "ilm_pattern"=>"000001", "template_name"=>"pingaccess_template", "ilm_rollover_alias"=>"pingaccess", "ilm_policy"=>"testpolicy"}
[2021-08-28T12:42:50,547][ERROR][logstash.javapipeline ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<LogStash::ConfigurationError: Invalid data stream configuration: ["template", "ilm_pattern", "template_name", "ilm_rollover_alias", "ilm_policy"]>, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-elasticsearch-11.0.2-java/lib/logstash/outputs/elasticsearch/data_stream_support.rb:57:in `check_data_stream_config!'"
[2021-08-28T12:42:50,702][ERROR][logstash.agent ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}
The error says that parameters like "template" => "/tmp/pingaccess_template", "ilm_pattern" => "000001", "template_name" => "pingaccess_template", "ilm_rollover_alias" => "pingaccess", and "ilm_policy" => "testpolicy" are not valid, but they are mentioned in the link below.
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-data-streams
The solution is to use Logstash without it being "aware" of data_stream.
First of all (before running Logstash), create your ILM policy and index template, but add "index.lifecycle.name" to the settings. That way, you link the template to the ILM policy. Also, don't forget the data_stream section in the index template.
{
"index_templates" : [
{
"name" : "access_template",
"index_template" : {
"index_patterns" : [
"test-data-stream*"
],
"template" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"number_of_replicas" : "0",
"index.lifecycle.name": "testpolicy"
}
},
"mappings" : {
...
}
},
"composed_of" : [ ],
"priority" : 500,
"version" : 1,
"data_stream" : {
"hidden" : false
}
}
}
]
}
Keep the Logstash output as if data_stream didn't exist, but add action => "create". This is because you can't use the "index" API with data streams; you need the _create API call.
output {
elasticsearch {
hosts => ['http://10.10.10.76:9200']
index => "test-data-stream"
user => elastic
password => xxx
action => "create"
}
}
That way, Logstash will output to ES, the index template will be applied automatically (because of the pattern match), and the ILM policy and data stream will be applied as well.
Important: to make it work, you need to start from scratch. If the index "test-data-stream" already exists in ES (as a traditional index), then the data stream will NOT be created. Test with another index name to make sure it works; see the sanity-check sketch below.
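As a quick sanity check (a sketch using standard APIs and the names from this example), you can confirm that the data stream was created and that the ILM policy is attached, and remove a conflicting traditional index if one exists:
GET _data_stream/test-data-stream
GET test-data-stream/_ilm/explain

# Only needed if a traditional index with the same name already exists (this deletes its data):
DELETE test-data-stream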
The documentation is unclear, but the plugin does not support those options when datastream output is enabled. The plugin is logging the options returned by the invalid_data_stream_params function, which allows action, routing, data_stream, anything else that starts with data_stream_, the shared options defined by the mixin, and the common options defined by the output plugin base.

Elasticsearch - Reindex documents with stored / excluded fields

I have an index mapping with the following configuration:
"mappings" : {
"_source" : {
"excludes" : [
"special_field"
]
},
"properties" : {
"special_field" : {
"type" : "text",
"store" : true
},
}
}
So, when a new document is indexed using this mapping, I get the following result:
{
"_index": "********-2021",
"_id": "************",
"_source": {
...
},
"fields": {
"special_field": [
"my special text"
]
}
}
If a _search query is performed, special_field is not returned inside _source as it is excluded.
With the following _search query, special_field data is returned perfectly:
GET ********-2021/_search
{
"stored_fields": [ "special_field" ],
"_source": true
}
Right now I'm trying to reindex all documents inside that index, but I'm losing the info stored in special_field and only the _source field is getting reindexed.
Is there a way to put that special_field back inside the _source field?
Is there a way to reindex those documents without losing special_field data?
How could these documents be migrated to another cluster without losing special_field data?
Thank you all.
Thanks Hamid Bayat, I finally got it using a small Logstash pipeline.
I will share it:
input {
elasticsearch {
hosts => "my-first-cluster:9200"
index => "my-index-pattern-*"
user => "****"
password => "****"
query => '{ "stored_fields": [ "special_field" ], "_source": true }'
size => 500
scroll => "5m"
docinfo => true
docinfo_fields => ["_index", "_type", "_id", "fields"]
}
}
filter {
if [@metadata][fields][special_field] {
mutate {
add_field => { "special_field" => "%{[@metadata][fields][special_field]}" }
}
}
}
output {
elasticsearch {
hosts => ["http://my-second-cluster:9200"]
password => "****"
user => "****"
index => "%{[#metadata][_index]}"
document_id => "%{[#metadata][_id]}"
template => "/usr/share/logstash/config/index_template.json"
template_name => "template-name"
template_overwrite => true
}
}
I had to add "fields" to docinfo_fields => ["_index", "_type", "_id", "fields"] in the elasticsearch input plugin, and all my stored_fields ended up in the [@metadata][fields] event field.
As the @metadata field is not indexed, I had to add a new field at the root level with the [@metadata][fields][special_field] value.
It's working like a charm.
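To double-check the result on the target cluster (a sketch; the index pattern and field name are the ones used above), the copied value should now appear directly in _source:
GET my-index-pattern-*/_search
{
"_source": [ "special_field" ],
"query": { "exists": { "field": "special_field" } }
}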

Nested object in Elasticsearch Using NEST

We have created a nested object in our index mapping as shown:
.Nested<Bedroom>(n=>n.Name(c=>c.Beds).IncludeInParent(true).Properties(pp=>pp
.Number(d => d.Name(c => c.BedId).Type(NumberType.Long))
.Number(d => d.Name(c => c.PropertyId).Type(NumberType.Long))
.Number(d => d.Name(c => c.SingleDoubleShared).Type(NumberType.Integer))
.Number(d => d.Name(c => c.Price).Type(NumberType.Integer))
.Number(d => d.Name(c => c.RentFrequency).Type(NumberType.Integer))
.Date(d => d.Name(c => c.AvailableFrom))
.Boolean(d => d.Name(c => c.Ensuite))
))
However, we are experiencing two problems.
1- The AvailableFrom field does not get included in the index mapping (the following shows the fields on the Kibana index pattern page; availableFrom is missing):
beds.bedId
beds.ensuite
beds.price
beds.propertyId
beds.rentFrequency
beds.singleDoubleShared
Thanks JFM for a constructive comment. This is the relevant part of the mapping in Elasticsearch:
"beds" : {
"type" : "nested",
"include_in_parent" : true,
"properties" : {
"availableFrom" : {
"type" : "date"
},
"bedId" : {
"type" : "long"
},
"ensuite" : {
"type" : "boolean"
},
"price" : {
"type" : "integer"
},
"propertyId" : {
"type" : "long"
},
"rentFrequency" : {
"type" : "integer"
},
"singleDoubleShared" : {
"type" : "integer"
}
}
}
I can see availableFrom here but not in the index pattern.
Why?
2- When we try to index a document with a nested object, the whole Application (MVC Core 3) crashes.
Would appreciate any assistance.

Logstash couchdb_changes doesn't correctly propagate document deletion to Elasticsearch

I am trying to use the couchdb_changes Logstash plugin to detect my CouchDB changes and update Elasticsearch index adequately.
Document creations/updates work fine, but somehow deletions do not work.
Here is my Logstash configuration:
input {
couchdb_changes {
host => "localhost"
db => "products"
sequence_path => ".couchdb_products_seq"
type => "product"
tags => ["product"]
keep_revision => true
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "products"
# Pass the CouchDB document ID to Elastic, otherwise it is lost and Elastic generates a new one
document_id => "%{[#metadata][_id]}"
}
# Debug
stdout {
codec => rubydebug {
metadata => true
}
}
}
I came across this link but the "protocol" parameter no longer exists in the elasticsearch Logstash plugin, and I would expect such a huge bug to be fixed by now.
In my Logstash console I see this when I delete a CouchDB document (from Futon):
{
"#version" => "1",
"#timestamp" => "2016-05-13T14:06:55.734Z",
"type" => "product",
"tags" => [
[0] "product"
],
"#metadata" => {
"_id" => "15d6f519d6827a2f28de4df1d40082d5",
"action" => "delete",
"seq" => 10020
}
}
So instead of deleting the document with id "15d6f519d6827a2f28de4df1d40082d5", it replaces its content. Here is the document "15d6f519d6827a2f28de4df1d40082d5" after the deletion, in Elasticsearch:
curl -XGET 'localhost:9200/products/product/15d6f519d6827a2f28de4df1d40082d5?pretty'
{
"_index" : "products",
"_type" : "product",
"_id" : "15d6f519d6827a2f28de4df1d40082d5",
"_version" : 3,
"found" : true,
"_source" : {
"#version" : "1",
"#timestamp" : "2016-05-13T14:06:55.734Z",
"type" : "product",
"tags" : [ "product" ]
}
}
Any idea of why the deletion doesn't work? Is this a bug of the couchdb_changes plugin? The elasticsearch plugin?
For information, here are my app versions:
Elasticsearch 2.3.2
Logstash 2.3.2
Apache CouchDB 1.6.1
I think I found the problem.
I had to manually add this line to the Logstash output.elasticsearch configuration:
action => "%{[@metadata][action]}"
in order to pass the "delete" action from the metadata to Elasticsearch.
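For context, this is roughly what the output block looks like with that line added (a sketch based on the configuration shown above):
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "products"
document_id => "%{[@metadata][_id]}"
action => "%{[@metadata][action]}"
}
}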
Now there is another issue with upsert, but it's tracked in a GitHub ticket.
Edit: To bypass the upsert issue, I actually changed my configuration to this (mainly, adding a field to store whether the action is a delete):
input {
couchdb_changes {
host => "localhost"
db => "products"
sequence_path => ".couchdb_products_seq"
type => "product"
tags => ["product"]
keep_revision => true
}
}
filter {
if [@metadata][action] == "delete" {
mutate {
add_field => { "elastic_action" => "delete" }
}
} else {
mutate {
add_field => { "elastic_action" => "index" }
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "products"
document_id => "%{[#metadata][_id]}"
action => "%{elastic_action}"
}
# Debug
stdout {
codec => rubydebug {
metadata => true
}
}
}
I am nowhere near an expert in Logstash/Elasticsearch, but this seems to work for the moment.

elasticsearch aggregation PHP

I am trying to get the unique values from my Elasticsearch database.
So I want the unique names from my Elasticsearch database.
So I am aggregating like so:
$paramss = [
'index' => 'myIndex',
'type' => 'myType',
'ignore_unavailable' => true,
'ignore' => [404, 500]
];
$paramss['body'] = <<<JSON
{
"size": 0,
"aggs" : {
"langs" : {
"terms" : { "field" : "name" }
}
}}
JSON;
$results = $client->search($paramss);
print_r(json_encode($results));
I get the result like so:
{
took: 3,
timed_out: false,
_shards: {
total: 5,
successful: 5,
failed: 0
},
hits: {
total: 1852,
max_score: 0,
hits: [
]
},
aggregations: {
langs: {
buckets: [
{
key: "aaaa.se",
doc_count: 430
},
{
key: "bbbb.se",
doc_count: 358
},
{
key: "cccc.se",
doc_count: 49
},
{
key: "eeee.com",
doc_count: 46
}
]
}
}
}
But the problem is I am not getting all the unique values; I am getting only 10 values, which is the default size for an Elasticsearch terms aggregation.
So how can I change the query size?
I tried like so:
$paramss = [
'index' => 'myIndex',
'type' => 'myType',
'size' => 1000,
'ignore_unavailable' => true,
'ignore' => [404, 500]
];
which returns me some weird documents.
So, does anyone know the solution to this problem?
How can I get all the unique names from my Elasticsearch database? Can someone help me fix this?
You are doing everything right, except for the size: the top-level "size" only controls the number of hits returned, while the terms aggregation has its own "size" parameter.
That "size": 0 should come after the targeted field's name, inside the terms aggregation.
$client = new Elasticsearch\Client($params);
$query['body'] = '{
"aggs" : {
"all_sources" : {
"terms" : {
"field" : "source",
"order" : { "_term" : "asc" },
"size": 0
}
}
}
}';
You need to put the size parameter inside terms:
{
"aggs" : {
"langs" : {
"terms" : {
"field" : "name",
"size": 0
}
}
}}
Link to documentation where you can find more info:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
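Putting it together with the original PHP request from the question, the body would look like this (a sketch; note that on Elasticsearch 5.x and later, "size": 0 inside terms is no longer accepted, so use a sufficiently large explicit value there instead):
$paramss['body'] = <<<JSON
{
"size": 0,
"aggs" : {
"langs" : {
"terms" : {
"field" : "name",
"size" : 0
}
}
}
}
JSON;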
