I'm trying to use the http_poller input to fetch data from Elasticsearch and write it into another ES. For this, the ES query needs to be made as a POST request.
In the examples provided I could not find the parameter that should be used to post the body; the documentation referred to the Manticore client from Ruby. Based on that, I used the params parameter to post the body.
The http_poller configuration looks like this:
input {
  http_poller {
    urls => {
      some_other_service => {
        method => "POST"
        url => "http://localhost:9200/index-2016-03-26/_search"
        params => '"query": { "filtered": { "filter": { "bool": { "must": [ { "term": { "SERVERNAME": "SERVER1" }}, {"range": { "eventtime": { "gte": "26/Mar/2016:13:00:00" }}} ]}}} }"'
      }
    }
    # Maximum amount of time to wait for a request to complete
    request_timeout => 300
    # How far apart requests should be
    interval => 300
    # Decode the results as JSON
    codec => "json"
    # Store metadata about the request in this key
    metadata_target => "http_poller_metadata"
  }
}
output {
  stdout {
    codec => json
  }
}
When I execute this, Logstash gives an error:
Error: Name may not be null {:level=>:error}
Any help is appreciated.
My guess is that params really needs to be a set of key-value pairs, but then the question is how to post a query body using Logstash.
I referred to this link to get the available options for the HTTP client:
https://github.com/cheald/manticore/blob/master/lib/manticore/client.rb
Since I found the answer by trying different options, I thought I would share the solution as well.
Replace params with body in the above payload (and make sure the body is a complete JSON document wrapped in { }).
The correct configuration to do a POST using the HTTP poller is:
input {
  http_poller {
    urls => {
      some_other_service => {
        method => "POST"
        url => "http://localhost:9200/index-2016-03-26/_search"
        body => '{ "query": { "filtered": { "filter": { "bool": { "must": [ { "term": { "SERVERNAME": "SERVER1" }}, { "range": { "eventtime": { "gte": "26/Mar/2016:13:00:00" }}} ]}}}}}'
      }
    }
    # Maximum amount of time to wait for a request to complete
    request_timeout => 300
    # How far apart requests should be
    interval => 300
    # Decode the results as JSON
    codec => "json"
    # Store metadata about the request in this key
    metadata_target => "http_poller_metadata"
  }
}
output {
  stdout {
    codec => json
  }
}
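For completeness, the "Name may not be null" error most likely comes from the fact that Manticore expects params to be a hash of name/value pairs rather than a raw string, so each entry needs a name; the raw request payload is exactly what body is for.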
Related
I am working on a project, ingesting CVE data from NVD into Elasticsearch.
My input data looks something like this:
{
"resultsPerPage":20,
"startIndex":0,
"totalResults":189227,
"result":{
"CVE_data_type":"CVE",
"CVE_data_format":"MITRE",
"CVE_data_version":"4.0",
"CVE_data_timestamp":"2022-11-23T08:27Z",
"CVE_Items":[
{...},
{...},
{...},
{
"cve":{
"data_type":"CVE",
"data_format":"MITRE",
"data_version":"4.0",
"CVE_data_meta":{
"ID":"CVE-2022-45060",
"ASSIGNER":"cve#mitre.org"
},
"problemtype":{...},
"references":{...},
"description":{
"description_data":[
{
"lang":"en",
"value":"An HTTP Request Forgery issue was discovered in Varnish Cache 5.x and 6.x before 6.0.11, 7.x before 7.1.2, and 7.2.x before 7.2.1. An attacker may introduce characters through HTTP/2 pseudo-headers that are invalid in the context of an HTTP/1 request line, causing the Varnish server to produce invalid HTTP/1 requests to the backend. This could, in turn, be used to exploit vulnerabilities in a server behind the Varnish server. Note: the 6.0.x LTS series (before 6.0.11) is affected."
}
]
}
},
"configurations":{
"CVE_data_version":"4.0",
"nodes":[
{
"operator":"OR",
"children":[
],
"cpe_match":[
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r2:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish-software:varnish_cache_plus:6.0.8:r1:*:*:*:*:*:*",
"cpe_name":[
]
},
{
"vulnerable":true,
"cpe23Uri":"cpe:2.3:a:varnish_cache_project:varnish_cache:7.2.0:*:*:*:*:*:*:*",
"cpe_name":[
]
}
]
}
]
},
"impact":{...},
"publishedDate":"2022-11-09T06:15Z",
"lastModifiedDate":"2022-11-23T03:15Z"
}
]
}
}
Each query to the API gives back a result like this, containing 20 individual CVEs, and each CVE contains multiple configurations. What I want to achieve is one document per configuration, where the other data stays the same and only the cpe23Uri field changes. I've tried using the split filter in multiple different ways, with no success.
My pipeline looks like this:
filter {
  json {
    source => "message"
  }
  split {
    field => "[result][CVE_Items]"
  }
  split {
    field => "[result][CVE_Items][cve][description][description_data]"
  }
  kv {
    source => "message"
    field_split => ","
  }
  mutate {
    remove_field => [...]
    rename => ["[result][CVE_Items][configurations][nodes]", "Affected_Products"]
    .
    .
    .
    rename => ["[result][CVE_data_timestamp]", "CVE_Timestamp"]
  }
  fingerprint {
    source => ["CVE", "Affected Products"]
    target => "fingerprint"
    concatenate_sources => "true"
    method => "SHA256"
  }
}
For splitting I've tried simply using:
split {
field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
and also with adding [cpe23Uri] to the end,
as well as:
split {
  field => "[result][CVE_Items][configurations][nodes][cpe_match]"
  target => "cpeCopy"
  remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
}
if [result][CVE_Items][configurations][nodes][cpe_match] {
  ruby {
    code => "event['cpeCopy'] = event['[result][CVE_Items][configurations][nodes][cpe_match]'][0]"
    remove_field => "[result][CVE_Items][configurations][nodes][cpe_match]"
  }
}
if [cpeCopy] {
  mutate {
    rename => { "cpeCopy" => "CPE" }
  }
}
As I've said, I've tried multiple approaches; I won't list them all here, but unfortunately nothing has worked so far.
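For reference, one pattern that generally works for nested arrays like this is to split every level of the hierarchy in turn and only rename fields afterwards, since the split filter operates on a field that holds an array in the current event. A minimal sketch along those lines (field names are taken from the sample document above; this is untested against the full NVD payload, so treat it as a starting point rather than a drop-in answer):
filter {
  json {
    source => "message"
  }
  # one event per CVE entry
  split {
    field => "[result][CVE_Items]"
  }
  # one event per configuration node of that CVE
  split {
    field => "[result][CVE_Items][configurations][nodes]"
  }
  # one event per cpe_match entry, i.e. per cpe23Uri
  split {
    field => "[result][CVE_Items][configurations][nodes][cpe_match]"
  }
  mutate {
    rename => { "[result][CVE_Items][configurations][nodes][cpe_match][cpe23Uri]" => "CPE" }
  }
}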
I am trying to add a _ttl field in Logstash so that Elasticsearch removes the document after a while (120 seconds in this case, but that's just for testing).
filter {
  if "drop" in [message] {
    drop { }
  }
  add_field => { "_ttl" => "120s" }
}
but now nothing is logged in Elasticsearch.
I have two questions.
Where is it logged what is going wrong? Maybe the syntax of the filter is wrong?
How do I add a _ttl field to Elasticsearch so documents are removed automatically?
When you add the field in logstash.conf with a mutate filter, it works:
filter {
  mutate {
    add_field => { "_ttl" => "120s" }
  }
}
POST myindex/_search
{
  "query": {
    "match_all": {}
  }
}
Results:
"hits": [
{
"_index": "myindex",
...................
"_ttl": "120s",
For the other question, I can't really help there. I'm running Logstash as a container, so the logs are read with:
docker logs d492eb3c3d0d
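As for where problems show up: an invalid filter (such as a bare add_field outside of a mutate or other plugin block) is a configuration error, so Logstash will refuse to start the pipeline and write the error to its own log, which in a container setup is exactly what docker logs shows. You can also validate a config before starting Logstash with bin/logstash -f <path to config> --config.test_and_exit (path depends on your install).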
I'm experimenting with the ip_range field type in Elasticsearch 6.8 (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/range.html) and struggling to find a way to load IP data into that field properly via Logstash.
I was able to load some sample data via Kibana Dev Tools, but cannot figure out a way to do the same via Logstash.
Index definition
PUT test_ip_range
{
  "mappings": {
    "_doc": {
      "properties": {
        "ip_from_to_range": {
          "type": "ip_range"
        },
        "ip_from": {
          "type": "ip"
        },
        "ip_to": {
          "type": "ip"
        }
      }
    }
  }
}
Add sample doc:
PUT test_ip_range/_doc/3
{
  "ip_from_to_range" : {
    "gte" : "<dotted_ip_from>",
    "lte": "<dotted_ip_to>"
  }
}
Logstash config (reading from DB)
input {
  jdbc {
    ...
    statement => "SELECT ip_from, ip_to, <???> AS ip_from_to_range FROM sample_ip_data"
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    "hosts" => "<host>"
    "index" => "test_ip_range"
    "document_type" => "_doc"
  }
}
Question:
How do I get the ip_from and ip_to DB fields into their respective gte and lte parts of ip_from_to_range via the Logstash config?
I know I can also insert the IP range in CIDR notation, but I would like to have both options available: loading in CIDR notation and loading as a range.
After some trial and error, I finally figured out the Logstash config.
I had posted about a similar issue here, which finally got me on the right track with the syntax for this use case as well.
input { ... }
filter {
  mutate {
    add_field => {
      "[ip_from_to_range]" =>
        '{
          "gte": "%{ip_from}",
          "lte": "%{ip_to}"
        }'
    }
  }
  json {
    source => "ip_from_to_range"
    target => "ip_from_to_range"
  }
}
output { ... }
Filter parts explained
mutate add_field: creates a new field [ip_from_to_range] whose value is a JSON string ( '{...}' ). It is important to reference the field as [field_name], otherwise the next step that parses the string into a JSON object doesn't work.
json: parses the string representation into a JSON object.
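For the CIDR option mentioned in the question, an ip_range field also accepts a plain CIDR string, so no gte/lte object is needed in that case. A minimal sketch, assuming the SQL statement returns a column named ip_cidr (that column name is only an illustration, not part of the original setup):
filter {
  mutate {
    # ip_range fields accept CIDR notation directly, e.g. "192.168.0.0/24"
    add_field => { "ip_from_to_range" => "%{ip_cidr}" }
  }
}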
Does the Elasticsearch output plugin support elasticsearch's _update_by_query?
https://www.elastic.co/guide/en/logstash/6.5/plugins-outputs-elasticsearch.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
The elasticsearch output plugin can only make calls to the _bulk endpoint, i.e. using the Bulk API.
If you want to call the Update by Query API, you need to use the http output plugin and construct the query inside the event yourself. If you explain what you want to achieve, I can update my answer with some more details.
Note: There's an issue requesting this feature, but it's still open after two years.
UPDATE
So if your input event is {"cname":"wang", "cage":11} and you want to update by query all documents with "cname":"wang" to set "cage":11, your query needs to look like this:
POST your-index/_update_by_query
{
  "script": {
    "source": "ctx._source.cage = params.cage",
    "lang": "painless",
    "params": {
      "cage": 11
    }
  },
  "query": {
    "term": {
      "cname": "wang"
    }
  }
}
So your Logstash config should look like this (your input may vary but I used stdin for testing purposes):
input {
  stdin {
    codec => "json"
  }
}
filter {
  mutate {
    add_field => {
      "[script][lang]" => "painless"
      "[script][source]" => "ctx._source.cage = params.cage"
      "[script][params][cage]" => "%{cage}"
      "[query][term][cname]" => "%{cname}"
    }
    remove_field => ["host", "@version", "@timestamp", "cname", "cage"]
  }
}
output {
  http {
    url => "http://localhost:9200/index/doc/_update_by_query"
    http_method => "post"
    format => "json"
  }
}
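One caveat with the config above: %{cage} is a sprintf reference, so params.cage will arrive in the script as the string "11" rather than the number 11. If the numeric type matters for your documents, convert the field first, for example with mutate { convert => { "[script][params][cage]" => "integer" } }.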
The same result can be obtained with standard elasticsearch plugins:
input {
  elasticsearch {
    hosts => "${ES_HOSTS}"
    user => "${ES_USER}"
    password => "${ES_PWD}"
    index => "<your index pattern>"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}
filter {
  ...
}
output {
  elasticsearch {
    hosts => "${ES_HOSTS}"
    user => "${ES_USER}"
    password => "${ES_PWD}"
    action => "update"
    document_id => "%{[@metadata][_id]}"
    index => "%{[@metadata][_index]}"
  }
}
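Note that docinfo => true is what makes the document metadata available (by default under [@metadata] in the Logstash versions this config targets), which is what the %{[@metadata][_id]} and %{[@metadata][_index]} references in the output resolve to.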
I am new to the Elastic Stack and not sure how to approach this problem. I have managed to get a live stream of tweets with a specific keyword using the Twitter input plugin, but I want to get a sample of real-time tweets with no specific keyword, just a percentage of all real-time tweets. I tried to search for how to do it but cannot find good documentation. I believe I need to use the GET statuses/sample API, but there is no documentation on it. This is what I have for now:
input {
  twitter {
    consumer_key => "consumer_key"
    consumer_secret => "consumer_secret"
    oauth_token => "token"
    oauth_token_secret => "secret"
    keywords => ["something"]
    languages => ["en"]
    full_tweet => true
  }
}
output {
  elasticsearch {}
}
How would I search for all sample tweets without using the keyword?
Thank you so much in advance.
Here's an example random score query; this should solve your problem:
GET /twitter/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "random_score": {}
        }
      ]
    }
  }
}
Edit - Adding a logstash config that takes random entries as well:
input {
  twitter {
    consumer_key => "consumer_key"
    consumer_secret => "consumer_secret"
    oauth_token => "token"
    oauth_token_secret => "secret"
    keywords => ["something"]
    languages => ["en"]
    full_tweet => true
    use_samples => true
  }
}
output {
  elasticsearch {}
}
use_samples:
Returns a small random sample of all public statuses. The tweets returned by the default access level are the same, so if two different clients connect to this endpoint, they will see the same tweets. If set to true, the keywords, follows, locations, and languages options will be ignored. Default ⇒ false
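Note the difference between the two approaches: the random_score query samples tweets that are already indexed in Elasticsearch at search time, while use_samples switches the input plugin to Twitter's sample stream, so the sampling happens at ingest time.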