For loop in Logstash

I have an event in Logstash that looks like this:
{
"terms" : { "A" : 1, "B" : 0.5, "c" : 1.6 }
}
I would like to change it to:
{
"terms" : [ "A", "B", "C" ]
}
I couldn't find any documentation about a for loop, or about getting the keys of a dictionary.
I would like to do something like:
filter {
for key in [terms]{
mutate {
merge => ["tmp_terms", key]
}
mutate {
remove_field => ["terms"]
rename => ["tmp_terms", "terms"]
}
}
Any suggestions?

Logstash doesn't have a loop construct, but you can use the ruby filter plugin:
filter {
ruby {
code => "event['terms'] = event['terms'].keys"
}
}
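Note that in Logstash 5.0 and later the direct event['...'] hash access was removed in favor of the event API, so the equivalent there would be (a minimal sketch using event.get/event.set):
filter {
  ruby {
    # Logstash 5+ event API: replace the hash with just its keys
    code => "event.set('terms', event.get('terms').keys)"
  }
}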


"If" condition in logstash Filter plugin

I have an input log stream in Logstash 8.5.3 like this:
{
"ibm_product": "IBM App Connect Enterprise",
"messageFlowUniqueName": "BROKER12.T2.TT.MQTEST",
"#timestamp": "2023-02-15T13:01:17.030747900Z",
"localTransactionId": "b5b9418a-d9aa-4d6a-b7c6-6106751d55ce-24",
"ibm_recordtype": "monitor",
"type": "ace_monitoring_event",
"eventName": "MQ Input.CatchTerminal"
}
I've set up my Logstash conf as follows:
input {
http {
port => 3333
}
}
filter {
grok {
match => {
"message" => ["%{JSON:payload_raw}"]
}
pattern_definitions => {
"JSON" => "{.*$"
}
}
if [eventName] == "MQ Input.CatchTerminal" {
mutate {
add_field => { "status" => "Failure"}
}
}
mutate {
copy => {
"message" => "message_copy"
}
}
json {
source => "payload_raw"
target => "payload"
}
}
output {
elasticsearch{
hosts => ["172.20.191.90:9200"]
}
stdout { codec => json_lines }
}
I can't get the 'if' condition to match, and I get a "_grokparsefailure" in my stream. Can someone help me with the 'if' statement?
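One likely cause: filters run in order, so the conditional on [eventName] is evaluated before the json filter has parsed payload_raw, and the field being tested doesn't exist yet at that point. A minimal sketch that parses first and tests afterwards, assuming the whole JSON document arrives in [message]:
filter {
  json {
    source => "message"
  }
  if [eventName] == "MQ Input.CatchTerminal" {
    mutate {
      add_field => { "status" => "Failure" }
    }
  }
}
Note also that the http input applies a json codec to requests with a Content-Type of application/json; in that case the fields are already at the top level, there is no [message] to grok (which would explain the "_grokparsefailure"), and the json filter can be dropped as well.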

How to use ConcatArrays in Spring Data MongoDB

I have the following JS code:
$concatArrays: [
"$some.path",
[
{
"foo": "baz",
"x": 5
}
]
]
How do I do the same with ArrayOperators.ConcatArrays?
You can use it like the following. You can pass an aggregation expression or a reference field name to concat(). So what you can do is add the new array in a stage before concat(), like this:
db.collection.aggregate([
{
$addFields: {
secondArray: [
{ "foo": "baz", "x": 5 }
]
}
},
{
"$project": {
combined: {
"$concatArrays": [ "$path", "$secondArray" ]
}
}
}
])
In Spring Data MongoDB, the same expression looks like this:
ArrayOperators.ConcatArrays concatArrays = ArrayOperators.ConcatArrays
.arrayOf("path")
.concat("secondArray");
Assuming the array field is defined as follows in a document:
{
"_id": ObjectId("60bdc6b228acee4a5a6e1736"),
"some" : {
"path" : [
{
"foo" : "bar",
"x" : 99
}
]
},
}
The following Spring Data MongoDB code returns the result: an array with two elements. The array field "some.path" and the input array myArr are concatenated using the ArrayOperators.ConcatArrays API.
// assumes static imports of Aggregation.newAggregation and Aggregation.addFields,
// plus org.springframework.data.mongodb.core.aggregation.ArrayOperators.ConcatArrays
String json = "{ 'foo': 'baz', 'x': 5 }";
Document doc = Document.parse(json);
List<Document> myArr = Arrays.asList(doc);
MongoOperations mongoOps = new MongoTemplate(MongoClients.create(), "testDB");
Aggregation agg = newAggregation(
addFields()
.addFieldWithValue("myArr", myArr)
.build(),
addFields()
.addField("result").withValue(ConcatArrays.arrayOf("$some.path").concat("$myArr"))
.build()
);
AggregationResults<Document> results = mongoOps.aggregate(agg, "testColl", Document.class);
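The concatenated array can then be read from results.getMappedResults(); each returned Document should contain the original fields plus the added myArr and result fields.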

Elasticsearch ingest pipeline: how to recursively modify values in a HashMap

Using an ingest pipeline, I want to iterate over a HashMap and remove underscores from all string values (where underscores exist), leaving underscores in the keys intact. Some values are arrays that must further be iterated over to do the same modification.
In the pipeline, I use a function to traverse and modify the values of a Collection view of the HashMap.
PUT /_ingest/pipeline/samples
{
"description": "preprocessing of samples.json",
"processors": [
{
"script": {
"tag": "remove underscore from sample_tags values",
"source": """
void findReplace(Collection collection) {
collection.forEach(element -> {
if (element instanceof String) {
element.replace('_',' ');
} else {
findReplace(element);
}
return true;
})
}
Collection samples = ctx.samples;
samples.forEach(sample -> { //sample.sample_tags is a HashMap
Collection sample_tags = sample.sample_tags.values();
findReplace(sample_tags);
return true;
})
"""
}
}
]
}
When I simulate the pipeline ingestion, I find the string values are not modified. Where am I going wrong?
POST /_ingest/pipeline/samples/_simulate
{
"docs": [
{
"_index": "samples",
"_id": "xUSU_3UB5CXFr25x7DcC",
"_source": {
"samples": [
{
"sample_tags": {
"Entry_A": [
"A_hyphentated-sample",
"sample1"
],
"Entry_B": "A_multiple_underscore_example",
"Entry_C": [
"sample2",
"another_example_with_underscores"
],
"Entry_E": "last_example"
}
}
]
}
}
]
}
// Result
{
"docs" : [
{
"doc" : {
"_index" : "samples",
"_type" : "_doc",
"_id" : "xUSU_3UB5CXFr25x7DcC",
"_source" : {
"samples" : [
{
"sample_tags" : {
"Entry_E" : "last_example",
"Entry_C" : [
"sample2",
"another_example_with_underscores"
],
"Entry_B" : "A_multiple_underscore_example",
"Entry_A" : [
"A_hyphentated-sample",
"sample1"
]
}
}
]
},
"_ingest" : {
"timestamp" : "2020-12-01T17:29:52.3917165Z"
}
}
}
]
}
Here is a modified version of your script that will work on the data you provided:
PUT /_ingest/pipeline/samples
{
"description": "preprocessing of samples.json",
"processors": [
{
"script": {
"tag": "remove underscore from sample_tags values",
"source": """
String replaceString(String value) {
return value.replace('_',' ');
}
void findReplace(Map map) {
map.keySet().forEach(key -> {
if (map[key] instanceof String) {
map[key] = replaceString(map[key]);
} else {
map[key] = map[key].stream().map(this::replaceString).collect(Collectors.toList());
}
});
}
ctx.samples.forEach(sample -> {
findReplace(sample.sample_tags);
return true;
});
"""
}
}
]
}
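One caveat: the else branch assumes every non-string value is a list of strings. If sample_tags could also contain nested maps or mixed-type lists, that branch would need its own recursive handling.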
The result looks like this:
{
"samples" : [
{
"sample_tags" : {
"Entry_E" : "last example",
"Entry_C" : [
"sample2",
"another example with underscores"
],
"Entry_B" : "A multiple underscore example",
"Entry_A" : [
"A hyphentated-sample",
"sample1"
]
}
}
]
}
You were on the right path, but you were working on copies of the values and weren't setting the modified values back onto the document context ctx, which is what the pipeline eventually returns. This means you'll need to keep track of the current iteration indexes, for the array lists as well as the hash maps and everything in between, so that you can target each field's position in the deeply nested context.
Here's an example taking care of strings and (string-only) array lists. You'll need to extend it to handle hash maps (and other types) and then perhaps extract the whole process into a separate function. But AFAIK you cannot return multiple data types from a single Java method, so it may be challenging...
PUT /_ingest/pipeline/samples
{
"description": "preprocessing of samples.json",
"processors": [
{
"script": {
"tag": "remove underscore from sample_tags values",
"source": """
ArrayList samples = ctx.samples;
for (int i = 0; i < samples.size(); i++) {
def sample = samples.get(i).sample_tags;
for (def entry : sample.entrySet()) {
def key = entry.getKey();
def val = entry.getValue();
def replaced_val;
if (val instanceof String) {
replaced_val = val.replace('_',' ');
} else if (val instanceof ArrayList) {
replaced_val = new ArrayList();
for (int j = 0; j < val.size(); j++) {
replaced_val.add(val[j].replace('_',' '));
}
}
// else if (val instanceof HashMap) {
// do your thing
// }
// crucial part
ctx.samples[i].sample_tags[key] = replaced_val;
}
}
"""
}
}
]
}

Is there a more efficient way to refactor the iteration of the hash in Ruby?

I have an iteration here:
container = []
summary_data.each do |_index, data|
container << data
end
The structure of the summary_data is listed below:
summary_data = {
"1" => { orders: { fees: '25.00' } },
"3" => { orders: { fees: '30.00' } },
"6" => { orders: { fees: '45.00' } }
}
I want to remove the numeric key, e.g., "1", "3".
And I expect to get the following container:
[
{
"orders": {
"fees": "25.00"
}
},
{
"orders": {
"fees": "30.00"
}
},
{
"orders": {
"fees": "45.00"
}
}
]
Is there a more efficient way to refactor the code above?
I'd appreciate any help.
You can use the Hash#values method, like this:
container = summary_data.values
If the inner hashes all have the same structure, the only interesting information is the fees:
summary_data.values.map{|h| h[:orders][:fees] }
# => ["25.00", "30.00", "45.00"]
If you want to do some calculations with those fees, you could convert them to numbers:
summary_data.values.map{|h| h[:orders][:fees].to_f }
# => [25.0, 30.0, 45.0]
It might be even better to work with cents as integers to avoid any floating point error:
summary_data.values.map{|h| (h[:orders][:fees].to_f * 100).round }
# => [2500, 3000, 4500]
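And if what you're actually after is a single total, Enumerable#sum folds that into one step:
summary_data.values.sum { |h| h[:orders][:fees].to_f }
# => 100.0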
You need an array containing the values of the provided hash, which you can get directly with the values method.
summary_data.values

Ignoring duplicates within Elasticsearch

I have many records where the msg is 'a'. Some of these records have the same type.
I'm trying to write a query that counts the number of records with msg 'a', but doesn't count duplicates.
Example:
1: msg = 'a', type = 'b'
2: msg = 'a', type = 'b'
3: msg = 'a', type = 'c'
This should return a count of two because the first and second records have the same type and are only counted once.
Here is my query so far.
body: {
query: {
bool: {
must: [
{
range: {
"#timestamp" => { from: 'now-1d', to: 'now' }
}
},
{ match: { msg: 'a' }}
]
}
}
}
Any help is appreciated!
Try using aggregations; they'll count it for you :)
Read here:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-terms-aggregation.html
And try something like this:
body: {
query: {
bool: {
must: [
{
range: {
"#timestamp" => { from: 'now-1d', to: 'now' }
}
},
{ match: { msg: 'a' }}
]
}
},
aggs: {
"type":{
"terms":{
"field":"type"
}
}
}
}
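The number of buckets returned is your distinct count. If you only need that number rather than the per-type breakdown, a cardinality aggregation returns it directly (a sketch; note that cardinality is approximate on high-cardinality fields):
aggs: {
  "distinct_types": {
    "cardinality": {
      "field": "type"
    }
  }
}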
