Does Logstash support Elasticsearch's _update_by_query? - elasticsearch

Does the Elasticsearch output plugin support elasticsearch's _update_by_query?
https://www.elastic.co/guide/en/logstash/6.5/plugins-outputs-elasticsearch.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html

The elasticsearch output plugin can only make calls to the _bulk endpoint, i.e. using the Bulk API.
If you want to call the Update by Query API, you need to use the http output plugin and construct the query inside the event yourself. If you explain what you want to achieve, I can update my answer with some more details.
Note: There's an issue requesting this feature, but it's still open after two years.
UPDATE
So if your input event is {"cname":"wang", "cage":11} and you want to update by query all documents with "cname":"wang" to set "cage":11, your query needs to look like this:
POST your-index/_update_by_query
{
  "script": {
    "source": "ctx._source.cage = params.cage",
    "lang": "painless",
    "params": {
      "cage": 11
    }
  },
  "query": {
    "term": {
      "cname": "wang"
    }
  }
}
So your Logstash config should look like this (your input may vary but I used stdin for testing purposes):
input {
  stdin {
    codec => "json"
  }
}
filter {
  mutate {
    add_field => {
      "[script][lang]" => "painless"
      "[script][source]" => "ctx._source.cage = params.cage"
      "[script][params][cage]" => "%{cage}"
      "[query][term][cname]" => "%{cname}"
    }
    remove_field => ["host", "@version", "@timestamp", "cname", "cage"]
  }
}
output {
  http {
    url => "http://localhost:9200/index/doc/_update_by_query"
    http_method => "post"
    format => "json"
  }
}
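One caveat with the config above: sprintf references like %{cage} always interpolate as strings, so the script parameter would be sent as "11" rather than 11. If the target field is numeric, a second mutate placed after the one above can cast it back (a sketch, assuming cage is an integer):
filter {
  mutate {
    # %{cage} interpolated to the string "11"; cast the script
    # parameter back to a number before the event is sent
    convert => { "[script][params][cage]" => "integer" }
  }
}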

The same result can be obtained with standard elasticsearch plugins:
input {
  elasticsearch {
    hosts => "${ES_HOSTS}"
    user => "${ES_USER}"
    password => "${ES_PWD}"
    index => "<your index pattern>"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
filter {
  ...
}
output {
  elasticsearch {
    hosts => "${ES_HOSTS}"
    user => "${ES_USER}"
    password => "${ES_PWD}"
    action => "update"
    document_id => "%{[@metadata][_id]}"
    index => "%{[@metadata][_index]}"
  }
}
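For the original example (setting cage on every document the input's query matched), the filter section could be as simple as this sketch (cname/cage are the hypothetical fields from the earlier event):
filter {
  mutate {
    # hypothetical: hard-code the new value from the example event;
    # replace sets a string, so convert it back to a number afterwards
    replace => { "cage" => "11" }
    convert => { "cage" => "integer" }
  }
}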

Related

Splitting A Nested JSON In Logstash

I am working on a project, ingesting CVE data from NVD into Elasticsearch.
My input data looks something like this:
{
  "resultsPerPage": 20,
  "startIndex": 0,
  "totalResults": 189227,
  "result": {
    "CVE_data_type": "CVE",
    "CVE_data_format": "MITRE",
    "CVE_data_version": "4.0",
    "CVE_data_timestamp": "2022-11-23T08:27Z",
    "CVE_Items": [
      {...},
      {...},
      {...},
      {
        "cve": {
          "data_type": "CVE",
          "data_format": "MITRE",
          "data_version": "4.0",
          "CVE_data_meta": {
            "ID": "CVE-2022-45060",
            "ASSIGNER": "cve@mitre.org"
          },
          "problemtype": {...},
          "references": {...},
          "description": {
            "description_data": [
              {
                "lang": "en",
                "value": "An HTTP Request Forgery issue was discovered in Varnish Cache 5.x and 6.x before 6.0.11, 7.x before 7.1.2, and 7.2.x before 7.2.1. An attacker may introduce characters through HTTP/2 pseudo-headers that are invalid in the context of an HTTP/1 request line, causing the Varnish server to produce invalid HTTP/1 requests to the backend. This could, in turn, be used to exploit vulnerabilities in a server behind the Varnish server. Note: the 6.0.x LTS series (before 6.0.11) is affected."
              }
            ]
          }
        },
        "configurations": {
          "CVE_data_version": "4.0",
          "nodes": [
            {
              "operator": "OR",
              "children": [],
              "cpe_match": [
                {
                  "vulnerable": true,
                  "cpe23Uri": "cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r2:*:*:*:*:*:*",
                  "cpe_name": []
                },
                {
                  "vulnerable": true,
                  "cpe23Uri": "cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r1:*:*:*:*:*:*",
                  "cpe_name": []
                },
                {
                  "vulnerable": true,
                  "cpe23Uri": "cpe:2.3:a:varnish_cache_project:varnish_cache:7.2.0:*:*:*:*:*:*:*",
                  "cpe_name": []
                }
              ]
            }
          ]
        },
        "impact": {...},
        "publishedDate": "2022-11-09T06:15Z",
        "lastModifiedDate": "2022-11-23T03:15Z"
      }
    ]
  }
}
Each query from the API gives back a result like this, containing 20 individual CVEs, and each CVE contains multiple configurations. What I want to achieve is one document per configuration, where all other data stays the same and only the cpe23Uri field changes. I've tried using the split filter in multiple different ways, with no success.
My pipeline looks like this:
filter {
  json {
    source => "message"
  }
  split {
    field => "[result][CVE_Items]"
  }
  split {
    field => "[result][CVE_Items][cve][description][description_data]"
  }
  kv {
    source => "message"
    field_split => ","
  }
  mutate {
    remove_field => [...]
    rename => ["[result][CVE_Items][configurations][nodes]", "Affected_Products"]
    ...
    rename => ["[result][CVE_data_timestamp]", "CVE_Timestamp"]
  }
  fingerprint {
    source => ["CVE", "Affected Products"]
    target => "fingerprint"
    concatenate_sources => "true"
    method => "SHA256"
  }
}
For splitting I've tried simply using:
split {
  field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
and also with adding [cpe23Uri] to the end,
as well as:
split {
  field => "[result][CVE_Items][configurations][nodes][cpe_match]"
  target => "cpeCopy"
  remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
if [result][CVE_Items][configurations][nodes][cpe_match] {
  ruby {
    code => "event['cpeCopy'] = event['[result][CVE_Items][configurations][nodes][cpe_match]'][0]"
    remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
  }
}
if [cpeCopy] {
  mutate {
    rename => { "cpeCopy" => "CPE" }
  }
}
As I said, I've tried multiple ways; I won't list them all, but unfortunately nothing has worked so far.
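One pattern that may help (a sketch, untested against the full NVD schema): the split filter only fans out one array per invocation, so each nested array has to be split in its own step, top level first:
filter {
  json {
    source => "message"
  }
  # one event per CVE
  split {
    field => "[result][CVE_Items]"
  }
  # one event per configuration node
  split {
    field => "[result][CVE_Items][configurations][nodes]"
  }
  # one event per cpe_match entry, i.e. per cpe23Uri
  split {
    field => "[result][CVE_Items][configurations][nodes][cpe_match]"
  }
}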

Read a CSV in Logstash level and filter on basis of the extracted data

I am using Metricbeat to get process-level data and push it to Elasticsearch using Logstash.
Now, the aim is to categorize the processes into 2 tags, i.e. the running process is either a browser or something else.
I am able to do that statically using this block of code:
input {
  beats {
    port => 5044
  }
}
filter {
  if [process][name] == "firefox.exe" or [process][name] == "chrome.exe" {
    mutate {
      add_field => { "process.type" => "browsers" }
      convert => {
        "process.type" => "string"
      }
    }
  } else {
    mutate {
      add_field => { "process.type" => "other" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    # manage_template => false
    index => "metricbeatlogstash"
  }
}
But when I try to make that if condition dynamic by reading the process list from a CSV, I am not getting any valid results in Kibana, nor an error at the Logstash level.
The CSV config file code is as follows:
input {
  beats {
    port => 5044
  }
  file {
    path => "filePath"
    start_position => "beginning"
    sincedb_path => "NULL"
  }
}
filter {
  csv {
    separator => ","
    columns => ["processList","IT"]
  }
  if [process][name] in [processList] {
    mutate {
      add_field => { "process.type" => "browsers" }
      convert => {
        "process.type" => "string"
      }
    }
  } else {
    mutate {
      add_field => { "process.type" => "other" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    # manage_template => false
    index => "metricbeatlogstash2"
  }
}
What you are trying to do does not work that way in Logstash: the events in a Logstash pipeline are independent from each other.
The events received by your beats input have no knowledge of the events received by your csv input, so you can't use fields from different events in a conditional.
To do what you want, you can use the translate filter with the following config.
translate {
  field => "[process][name]"
  destination => "[process][type]"
  dictionary_path => "process.csv"
  fallback => "others"
  refresh_interval => 300
}
This filter checks the value of the field [process][name] against a dictionary loaded into memory from the file process.csv. The dictionary is a .csv file with two columns: the first is the name of the browser process, and the second is always browser.
chrome.exe,browser
firefox.exe,browser
If the filter gets a match, it populates the field [process][type] (not process.type) with the value from the second column, in this case always browser. If there is no match, it populates [process][type] with the value of the fallback option, in this case others. It also reloads the content of the process.csv file every 300 seconds (5 minutes).
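Putting it together, the pipeline could look like this (a sketch; the dictionary_path here is a hypothetical location for process.csv):
input {
  beats {
    port => 5044
  }
}
filter {
  translate {
    field => "[process][name]"
    destination => "[process][type]"
    # hypothetical absolute path to the dictionary file
    dictionary_path => "/etc/logstash/process.csv"
    fallback => "others"
    refresh_interval => 300
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "metricbeatlogstash2"
  }
}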

logstash output to elasticsearch with document_id; what to do when I don't have a document_id?

I have some logstash input where I use the document_id to remove duplicates. However, most input doesn't have a document_id. The following plumbs the actual document_id through, but if it doesn't exist, it gets accepted as literally %{document_id}, which means most documents are seen as a duplicate of each other. Here's what my output block looks like:
output {
  elasticsearch_http {
    host => "127.0.0.1"
    document_id => "%{document_id}"
  }
}
I thought I might be able to use a conditional in the output. It fails, and the error is given below the code.
output {
  elasticsearch_http {
    host => "127.0.0.1"
    if document_id {
      document_id => "%{document_id}"
    }
  }
}
Error: Expected one of #, => at line 101, column 8 (byte 3103) after output {
elasticsearch_http {
host => "127.0.0.1"
if
I tried a few "if" statements and they all fail, which is why I assume the problem is having a conditional of any sort in that block. Here are the alternatives I tried:
if document_id <> "" {
if [document_id] <> "" {
if [document_id] {
if "hello" <> "" {
You're close with the conditional idea but you can't place it inside a plugin block. Do this instead:
output {
  if [document_id] {
    elasticsearch_http {
      host => "127.0.0.1"
      document_id => "%{document_id}"
    }
  } else {
    elasticsearch_http {
      host => "127.0.0.1"
    }
  }
}
(But the suggestion in one of the other answers to use the uuid filter is good too.)
One way to solve this is to make sure a document_id is always available. You can achieve this by adding a uuid filter in the filter section that creates the document_id field if it is not present.
filter {
  # generate an id only when the event does not already have one
  if ![document_id] {
    uuid {
      target => "document_id"
    }
  }
}
Edited per Magnus Bäck's suggestion. Thanks!
Reference: docinfo_fields
For any document added to Elasticsearch, the _id is auto-generated if not specified during insert. We can reuse this same _id later in update/delete/search queries by using the docinfo_fields feature.
Example:
filter {
  json {
    source => "message"
  }
  elasticsearch {
    hosts => "http://localhost:9200/"
    user => "elastic"
    password => "elastic"
    query => "..."
    docinfo_fields => {
      "_id" => "docid"
      "_index" => "document_index"
    }
  }
  if ("_elasticsearch_lookup_failure" not in [tags]) {
    # ... doc update logic ...
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200/"
    user => "elastic"
    password => "elastic"
    index => "%{document_index}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{docid}"
  }
}

ELK for windows logs processing

I've made a working ELK stack on Debian Wheezy and have set up Nxlog to gather Windows logs. I see the logs in Kibana and everything is working fine, but I get too much data and want to filter it by removing some fields that I don't need.
I've made a filter section, but it's not working at all. What could be the reason?
The config is below:
input {
  tcp {
    type => "eventlog"
    port => 3515
    format => "json"
  }
}
filter {
  type => "eventlog"
  mutate {
    remove => { "Hostname", "Keywords", "SeverityValue", "Severity", "SourceName", "ProviderGuid" }
    remove => { "Version", "Task", "OpcodeValue", "RecordNumber", "ProcessID", "ThreadID", "Channel" }
    remove => { "Category", "Opcode", "SubjectUserSid", "SubjectUserName", "SubjectDomainName" }
    remove => { "SubjectLogonId", "ObjectType", "IpPort", "AccessMask", "AccessList", "AccessReason" }
    remove => { "EventReceivedTime", "SourceModuleName", "SourceModuleType", "@version", "type" }
    remove => { "_index", "_type", "_id", "_score", "_source", "KeyLength", "TargetUserSid" }
    remove => { "TargetDomainName", "TargetLogonId", "LogonType", "LogonProcessName", "AuthenticationPackageName" }
    remove => { "LogonGuid", "TransmittedServices", "LmPackageName", "ProcessName", "ImpersonationLevel" }
  }
}
output {
  elasticsearch {
    cluster => "wisp"
    node_name => "io"
  }
}
I think you are trying to remove fields that do not exist in some logs.
Do all your logs contain all the fields you're trying to remove?
If not, you have to identify your logs before removing fields.
Your filter config will look like this:
filter {
  type => "eventlog"
  if [somefield] == "somevalue" {
    mutate {
      remove => { "specificfieldtoremove1", "specificfieldtoremove2" }
    }
  }
}
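For example, to strip the security-audit fields only from the logs that actually carry them, something like this could work (a sketch; the SourceName value is hypothetical, the field names are taken from your lists above, and remove takes an array in the mutate filter):
filter {
  if [SourceName] == "Microsoft-Windows-Security-Auditing" {
    mutate {
      # these fields only exist on security-audit events
      remove => [ "SubjectUserSid", "SubjectUserName", "SubjectDomainName" ]
    }
  }
}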

Changing the elasticsearch host in logstash 1.3.3 web interface

I followed the steps in this document and was able to get some reports on the Shakespeare data.
I want to do the same thing with a remotely installed Elasticsearch. I tried configuring the "host" in the config file, but the queries still run against the local host as opposed to the remote one. This is my config file:
input {
  stdin {
    type => "stdin-type"
  }
  file {
    type => "accessLog"
    path => [ "/Users/akushe/Downloads/requests.log" ]
  }
}
filter {
  grok {
    match => ["message", "%{COMMONAPACHELOG} (?:%{INT:responseTime}|-)"]
  }
  kv {
    source => "request"
    field_split => "&?"
  }
  if [lng] {
    kv {
      add_field => [ "location", ["%{lng}", "%{lat}"] ]
    }
  } else if [lon] {
    kv {
      add_field => [ "location", ["%{lon}", "%{lat}"] ]
    }
  }
}
output {
  elasticsearch {
    host => "slc-places-qa-es3001.slc.where.com"
    port => 9200
  }
}
You need to add protocol => http to the elasticsearch output to make it use the HTTP transport rather than joining the cluster using multicast.
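That is (a sketch extending the output above; protocol was an option of the Logstash 1.x elasticsearch output):
output {
  elasticsearch {
    host => "slc-places-qa-es3001.slc.where.com"
    port => 9200
    # talk to Elasticsearch over HTTP instead of joining the
    # cluster as a node via multicast
    protocol => "http"
  }
}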
