Elasticsearch - set max_clause_count

I have a pretty big terms query in Elasticsearch, so I get
too_many_clauses: maxClauseCount is set to 1024
I tried increasing it in the elasticsearch.yml by
index:
  query:
    bool:
      max_clause_count: 10240
and via
curl -XPUT "http://localhost:9200/plastic/_settings" -d '{ "index" : { "max_clause_count" : 10000 } }'
but nothing worked. My index is named plastic.

In Elasticsearch 5, index.query.bool.max_clause_count has been deprecated/removed.
Insert indices.query.bool.max_clause_count: n in your elasticsearch.yml file instead (where n is the new maximum number of clauses).
NOTE: See the "Settings changes" section of the Elasticsearch 5.0 breaking changes documentation.
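For example, to raise the limit to the value used in the question, add this single line to elasticsearch.yml:
indices.query.bool.max_clause_count: 10240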

Background
NOTE: The advice given in this answer only applies to versions of Elasticsearch below 5.0, since the property it references was removed in 5.0.
Search Settings
The setting index.query.bool.max_clause_count has been removed. In order
to set the maximum number of boolean clauses
indices.query.bool.max_clause_count should be used instead.
Ref - Breaking changes in 5.0 » Settings changes
Original Answer
Add the following:
index.query.bool.max_clause_count: 10240
To the file elasticsearch.yml on each node of the cluster, then restart the nodes (any change in the config file requires a restart).
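For example, on a systemd-based installation (an assumption; use your service manager's equivalent command):
sudo systemctl restart elasticsearch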

This can also be achieved by updating the configuration in the elasticsearch.yml file (inside the config folder of the Elasticsearch installation):
indices.query.bool.max_clause_count: 4096

If you're looking to make changes to your cloud elasticsearch.yml file in a cloud deployment, see Add Elasticsearch user settings for the steps to follow and the changes you're allowed to make. Note, however, that only a limited set of settings can be changed in a cloud deployment.
If you're looking to make changes to your local cluster, you can make the change directly in your elasticsearch.yml file, located at "elasticsearch-x.x.x/config/elasticsearch.yml", by simply adding a line with the required setting. For example, if you want to change indices.query.bool.max_clause_count from its default value of 1024 to 4096, you can add the line indices.query.bool.max_clause_count: 4096 to your elasticsearch.yml file.
Remember that indices.query.bool.max_clause_count is 1024 by default; elasticsearch.yml does not explicitly contain a line stating this value, so the cluster simply falls back to the default. The only way to change it is to explicitly add the line indices.query.bool.max_clause_count: 4096. By including this line and specifying your preferred value in your elasticsearch.yml file, you have modified indices.query.bool.max_clause_count for your cluster.
After adding this line and saving the changes to your elasticsearch.yml file, start Elasticsearch, and then Kibana (if you're interacting with Elasticsearch through Kibana). You can verify your setting by running GET /_cluster/settings?include_defaults in Kibana, or curl -XGET "http://localhost:9200/_cluster/settings?include_defaults" on the command line, and looking for max_clause_count in the output to verify the value of indices.query.bool.max_clause_count.
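For example (a minimal sketch, assuming a local node on the default port), you can narrow the response to just this setting with the filter_path parameter:
curl -XGET "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=defaults.indices.query.bool.max_clause_count"
The value should appear under the defaults section, since it is a static node-level setting rather than a dynamic cluster setting.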

Related

Pick each file in a folder as a single log entry in elastic search with Logstash and Filebeat

I have a requirement where my ELK stack has to pick up each file in a folder as a single log entry.
I have parent_folder/, inside which a folder is created for each run (run_folder/), inside which a few types of log files are created. I need to push each file as a single log entry into Elasticsearch.
Folder Structure
parent_folder/run1/file1.log
parent_folder/run1/file2.err
parent_folder/run1/file3.diff
...
parent_folder/run2/file1.log
parent_folder/run2/file2.err
parent_folder/run2/file3.diff
Elasticsearch should have:
doc1 {
  message: the content of parent_folder/run1/file1.log
}
doc2 {
  message: the content of parent_folder/run1/file2.err
}
doc3 {
  message: the content of parent_folder/run2/file2.err
}
... so on
These files (like parent_folder/run2/file2.err) are written once and never changed or touched again, so there is no need to monitor for changes.
Thanks
With Filebeat, you can make use of multiline patterns. Find a pattern that never matches your log lines and configure something like the following in the Filebeat configuration (note the added multiline.negate: true; with the default negate: false, a never-matching pattern would leave every line as its own event instead of gluing the file together; see the sketch below):
multiline.pattern: 'never_matching_pattern'
multiline.negate: true
multiline.match: after
Reference: https://discuss.elastic.co/t/filebeat-send-the-entire-logfile-as-a-single-message/118265
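A fuller sketch of the input section (a sketch only: the path is assumed from the question's folder layout, and the stated defaults are for Filebeat 7.x):
filebeat.inputs:
  - type: log
    paths:
      - /parent_folder/*/*
    # A pattern that never matches, combined with negate: true, makes every
    # line a "continuation" line, so the whole file is folded into one event.
    multiline.pattern: 'never_matching_pattern'
    multiline.negate: true
    multiline.match: after
    multiline.max_lines: 100000   # default is 500; raise it to cover whole files
    multiline.timeout: 5s         # flush the event after 5s with no new lines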

How to specify pipeline for Filebeat Nginx module?

I have a web server (Ubuntu) with Nginx + PHP.
It has Filebeat, which sends Nginx logs to the Elasticsearch ingest node directly (no Logstash or anything else).
When I first installed it, I made some customizations to the pipeline that Filebeat created.
Everything worked great for a month or so.
But I noticed that every Filebeat upgrade results in the creation of a new pipeline. Currently I have these:
filebeat-7.3.1-nginx-error-pipeline: {},
filebeat-7.4.1-nginx-error-pipeline: {},
filebeat-7.2.0-nginx-access-default: {},
filebeat-7.3.2-nginx-error-pipeline: {},
filebeat-7.4.1-nginx-access-default: {},
filebeat-7.3.1-nginx-access-default: {},
filebeat-7.3.2-nginx-access-default: {},
filebeat-7.2.0-nginx-error-pipeline: {}
I can create a new pipeline, but how do I tell (how do I configure) Filebeat to use a specific pipeline?
Here is what I tried, and it doesn't work:
- module: nginx
  # Access logs
  access:
    enabled: true
    # Set custom paths for the log files. If left empty,
    # Filebeat will choose the paths depending on your OS.
    var.paths: ["/var/log/nginx/*/*access.log"]
    # Convert the timestamp to UTC
    var.convert_timezone: true
    # The Ingest Node pipeline ID associated with this input. If this is set, it
    # overwrites the pipeline option from the Elasticsearch output.
    output.elasticsearch.pipeline: 'filebeat-nginx-access-default'
    pipeline: 'filebeat-nginx-access-default'
It is still using the filebeat-7.4.1-nginx-error-pipeline pipeline.
Here are the Filebeat instructions on how to configure it (but I can't make it work):
https://github.com/elastic/beats/blob/7.4/filebeat/filebeat.reference.yml#L1129-L1130
Question:
How can I configure a Filebeat module to use a specific pipeline?
Update (Nov 2019): I submitted a related bug: https://github.com/elastic/beats/issues/14348
In the Beats source code, I found that the pipeline ID is determined by the following parameters:
Beats version
module name
module's fileset name
pipeline filename
The source code snippet is as follows:
// formatPipelineID generates the ID to be used for the pipeline ID in Elasticsearch
func formatPipelineID(module, fileset, path, beatVersion string) string {
    return fmt.Sprintf("filebeat-%s-%s-%s-%s", beatVersion, module, fileset, removeExt(filepath.Base(path)))
}
So you cannot assign the pipeline ID yourself; that would need official support from Elastic.
For now, the pipeline ID changes along with these four parameters, so you MUST update the pipeline ID in Elasticsearch whenever you upgrade Beats.
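To see how the four parameters combine, here is a tiny standalone driver (hypothetical, for illustration only; removeExt is reimplemented here because it is a private helper in the Beats source, and the pipeline filename default.json is an assumption consistent with the IDs listed in the question):
package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// removeExt mirrors the Beats helper: strip the extension from the pipeline filename.
func removeExt(path string) string {
    return strings.TrimSuffix(path, filepath.Ext(path))
}

// formatPipelineID is copied from the snippet above.
func formatPipelineID(module, fileset, path, beatVersion string) string {
    return fmt.Sprintf("filebeat-%s-%s-%s-%s", beatVersion, module, fileset, removeExt(filepath.Base(path)))
}

func main() {
    // Prints: filebeat-7.4.1-nginx-access-default
    fmt.Println(formatPipelineID("nginx", "access", "ingest/default.json", "7.4.1"))
}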
Refer to /{filebeat-HOME}/module/nginx/access/manifest.yml; maybe you should set ingest_pipeline in /{filebeat-HOME}/modules.d/nginx.yml.
The value seems to be a local file path.
The pipeline can be configured either in your input or output configuration, not in the modules one.
So in your configuration you have different sections, the one you show in your question is for configuring the nginx module. You need to open filebeat.yml and look for the output section where you have configured elasticsearch and put the pipeline configuration there:
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["elk.slavikf.com:9200"]
  pipeline: filebeat-nginx-access-default
If you need to be able to use different pipelines depending on the nature of data you can definitely do so using pipeline mappings:
output.elasticsearch:
  hosts: ["elk.slavikf.com:9200"]
  pipelines:
    - pipeline: "nginx_pipeline"
      when.contains:
        type: "nginx"
    - pipeline: "apache_pipeline"
      when.contains:
        type: "apache"

How do I know if my changes in the elasticsearch.yml config file are taking effect or not? I have switched script.inline and script.indexed to true

How do I know if my changes in the elasticsearch.yml config file are taking effect or not? I have added the following two lines and restarted Elasticsearch.
script.inline: true
script.indexed: true
I get the same error even after commenting out these two lines, when I run a query with an inline script.
Attached screenshot of query and error message.
Thanks

Generating filebeat custom fields

I have an Elasticsearch cluster (ELK) and some nodes sending logs to Logstash using Filebeat. All the servers in my environment are CentOS 6.5.
The filebeat.yml file on each server is enforced by a Puppet module (both my production and test servers get the same configuration).
I want to have a field in each document which tells if it came from a production/test server.
I wanted to generate a dynamic custom field in every document which indicates the environment (production/test) using the filebeat.yml file.
In order to work this out, I thought of running a command which returns the environment (it is possible to determine the environment through facter) and adding it under an "environment" custom field in the filebeat.yml file, but I couldn't find any way of doing so.
Is it possible to run a command through filebeat.yml?
Is there any other way to achieve my goal?
In your filebeat.yml:
filebeat:
  prospectors:
    -
      paths:
        - /path/to/my/folder
      input_type: log
      # Optional additional fields. These fields can be freely picked
      # to add additional information to the crawled log files
      fields:
        mycustomvar: production
In filebeat-7.2.0 I use the following syntax:
processors:
  - add_fields:
      target: ''
      fields:
        mycustomfieldname: customfieldvalue
Note: target: '' means that mycustomfieldname is a top-level field.
official 7.2 docs
Yes, you can add fields to the document through Filebeat.
The official doc shows you how.
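Since the question notes that filebeat.yml is enforced by a Puppet module, another option (a sketch, assuming an ERB template and that the environment is exposed as a Puppet variable named @environment, e.g. from a custom fact; both names are hypothetical) is to interpolate the value when Puppet renders the file:
# filebeat.yml.erb -- rendered by Puppet, so facter data is available
filebeat:
  prospectors:
    -
      paths:
        - /path/to/my/folder
      input_type: log
      fields:
        # resolved at catalog compilation time, per server
        environment: <%= @environment %>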

Cannot load index to elasticsearch from external file, using logstash

I am running one instance of Elasticsearch and one of Logstash in parallel on the same computer.
When trying to load a file into Elasticsearch using Logstash running the config file below, I get the following output messages from Elasticsearch and no file is loaded
(when input is configured to be stdin, everything seems to be working just fine).
Any ideas?
"
[2014-06-17 22:42:24,748][INFO ][cluster.service ] [Masked Marvel] removed {[logstash- Eitan-PC-5928-2010][Ql5fyvEGQyO96R9NIeP32g][Eitan-PC][inet[Eitan-PC/10.0.0.5:9301]]{client=true, data=false},}, reason: zen-disco-node_failed([logstash-Eitan-PC-5928-2010][Ql5fyvEGQyO96R9NIeP32g][Eitan-PC][inet[Eitan-PC/10.0.0.5:9301]]{client=true, data=false}), reason transport disconnected (with verified connect)
[2014-06-17 22:43:00,686][INFO ][cluster.service ] [Masked Marvel] added {[logstash-Eitan-PC-5292-4014][m0Tg-fcmTHW9aP6zHeUqTA][Eitan-PC][inet[/10.0.0.5:9301]]{client=true, data=false},}, reason: zen-disco-receive(join from node[[logstash-Eitan-PC-5292-4014][m0Tg-fcmTHW9aP6zHeUqTA][Eitan-PC][inet[/10.0.0.5:9301]]{client=true, data=false}])
"
config file:
input {
  file {
    path => "c:\testLog.txt"
  }
}
output {
  elasticsearch {
    host => localhost
    index => amat1
  }
}
When you use "elasticsearch" as your output (http://logstash.net/docs/1.4.1/outputs/elasticsearch), as opposed to "elasticsearch_http" (http://logstash.net/docs/1.4.1/outputs/elasticsearch_http), you are going to want to set "protocol".
The reason is that it can have 3 different values, "node", "http" or "transport" with different behavior for each and the default selection is not well documented.
From the look of your log files it appears it's trying to use "node" protocol as I see connection attempts on port 9301 which indicates (along with other log entries) that logstash is trying to join the cluster as a node. This can fail for any number of reasons including mismatch on the cluster name.
I'd suggest setting protocol to "http" - that change has fixed similar issues before.
See also:
http://logstash.net/docs/1.4.1/outputs/elasticsearch#cluster
http://logstash.net/docs/1.4.1/outputs/elasticsearch#protocol
EDIT:
A few other issues I see in your config -
Your host and index should be strings, which in a Logstash config file should be wrapped with double quotes: "localhost" and "amat1". No quotes may work, but they recommend you use quotes.
http://logstash.net/docs/1.4.1/configuration#string
If you don't use "http" as the protocol or don't use "elasticsearch_http" as the output, you should set cluster equal to your ES cluster name (as it will be trying to become a node of the cluster).
You should set start_position under file in input to "beginning". Otherwise it will default to reading from the end of the file and you won't see any data. This is a particular problem on Windows right now, as the other way of tracking position within a file, sincedb, is broken on Windows:
https://logstash.jira.com/browse/LOGSTASH-1587
http://logstash.net/docs/1.4.1/inputs/file#start_position
You should change the path to your log file to "C:/testLog.txt". Logstash prefers forward slashes and upper-case drive letters under Windows.
https://logstash.jira.com/browse/LOGSTASH-430
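Putting those suggestions together, a revised version of the config (a sketch; the path, host, and index come from the question) would be:
input {
  file {
    path => "C:/testLog.txt"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    protocol => "http"
    host => "localhost"
    index => "amat1"
  }
}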
