I'm trying to read a log file like this one:
199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
unicomp6.unicomp.net - - [01/Jul/1995:00:00:06 -0400] "GET /shuttle/countdown/ HTTP/1.0" 200 3985
199.120.110.21 - - [01/Jul/1995:00:00:09 -0400] "GET /shuttle/missions/sts-73/mission-sts-73.html HTTP/1.0" 200 4085
burger.letters.com - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/countdown/liftoff.html HTTP/1.0" 304 0
199.120.110.21 - - [01/Jul/1995:00:00:11 -0400] "GET /shuttle/missions/sts-73/sts-73-patch-small.gif HTTP/1.0" 200 4179
I'm sending 1000 lines each time I run this exercise. I'm using a SplitText processor, and in the ExtractText processor I use these regexes:
successCode -> ^[0-9A-Z\-a-z\.]* - - \[[0-9A-Za-z\/\:]* -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" ([0-9]*) [0-9]*
tiemStamp -> ^[0-9A-Z\-a-z\.]* - - \[([0-9A-Za-z\/\:]*) -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" [0-9]* [0-9]*
important -> ^([0-9A-Z\-a-z\.]*) - - \[[0-9A-Za-z\/\:]* -[0-9]*\] \"[A-Z]* [0-9A-Za-z\/\.\- ]*\" [0-9]* [0-9]*
There may be a mistake in them; that is probably where my problem is.
Then I tried to send different logs to different routes. If successCode == 200, I tried to put the line on the route /user//success/%{tiemStamp}/, but all my lines go the third way: "unmatched".
On the RouteOnContent processor I've tried:
successCode -> ${successCode:equals("200")}
successCode -> ${successCode:contains(2)}
successCode -> ${successCode:contains("2")}
Has anyone worked with the "RouteOnContent" processor?
According to the documentation, the ExtractText Processor "Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes [...]"
So you should not use a RouteOnContent but a RouteOnAttribute processor in the next step.
(If you stop your RouteOnXXX processor in order to keep the messages in the queue, you can see the content of the flowfiles. On the "Attributes" tab of a flowfile, you can see the values of the different attributes. And I confirm that with your regexp, I have successCode=200. )
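For example, on a RouteOnAttribute processor (with Routing Strategy left at "Route to Property name"), the first expression you tried becomes a dynamic property (a sketch; the property name success is arbitrary):
success -> ${successCode:equals("200")}
FlowFiles whose successCode attribute is 200 are then routed to the success relationship; everything else goes to unmatched.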
Basically you can use either RouteOnAttribute or RouteOnContent, but each uses different parameters.
If you choose to use ExtractText, the properties you defined are populated for each line (after the original file is split by the SplitText processor).
Now, you have two options:
Route based on the attributes that have been extracted (RouteOnAttribute).
Route based on the content (RouteOnContent). In this case, you don't really need ExtractText.
Each processor routes the FlowFile differently:
RouteOnAttribute queries the attributes of the FlowFile with a NiFi Expression Language query. For example, say I extracted a property 'name'; routing on its value can be done with a dynamic property like this (a sketch; the property name matchedName and the value John are made up):
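matchedName -> ${name:equals("John")}
Each dynamic property adds a relationship of the same name, and a FlowFile whose query evaluates to true is routed to it.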
On the other hand, RouteOnContent queries the content of the FlowFile with a regular expression. For example (again a sketch, with Match Requirement set to "content must contain match"):
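success -> \s200\s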
After defining these properties, you can continue to route based on the resulting dynamic relationships, connecting each one (success, unmatched, ...) to a different downstream processor.
I want to drop lines in Promtail using an AND condition from two different JSON fields.
I have JSON log lines like this.
{"timestamp":"2022-03-26T15:40:41+00:00","remote_addr":"1.2.3.4","remote_user":"","request":"GET / HTTP/1.1","status": "200","body_bytes_sent":"939","request_time":"0.000","http_referrer":"http://5.6.7.8","http_user_agent":"user agent 1"}
{"timestamp":"2022-03-26T15:40:41+00:00","remote_addr":"1.2.3.4","remote_user":"","request":"GET /path HTTP/1.1","status": "200","body_bytes_sent":"939","request_time":"0.000","http_referrer":"http://5.6.7.8","http_user_agent":"user agent 1"}
{"timestamp":"2022-03-26T15:40:41+00:00","remote_addr":"1.2.3.4","remote_user":"","request":"GET / HTTP/1.1","status": "200","body_bytes_sent":"939","request_time":"0.000","http_referrer":"http://5.6.7.8","http_user_agent":"user agent 2"}
My local Promtail config looks like this.
clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: testing-my-job-drop
    pipeline_stages:
      - match:
          selector: '{job="my-job"}'
          stages:
            - json:
                expressions:
                  http_user_agent:
                  request:
            - drop:
                source: "http_user_agent"
                expression: "user agent 1"
            # I want this to be AND
            - drop:
                source: "request"
                expression: "GET / HTTP/1.1"
                drop_counter_reason: my_job_healthchecks
    static_configs:
      - labels:
          job: my-job
Using a Promtail config like this drops lines using OR from my two JSON fields.
How can I adjust my config so that I only drop lines where http_user_agent = user agent 1 AND request = GET / HTTP/1.1?
If you provide multiple options they will be treated like an AND clause, where each option has to be true to drop the log.
If you wish to drop with an OR clause, then specify multiple drop stages.
https://grafana.com/docs/loki/latest/clients/promtail/stages/drop/#drop-stage
Drop logs by time OR length
Would drop all logs older than 24h OR longer than 8kb bytes
- json:
    expressions:
      time:
      msg:
- timestamp:
    source: time
    format: RFC3339
- drop:
    older_than: 24h
- drop:
    longer_than: 8kb
Drop logs by regex AND length
Would drop all logs that contain the word debug AND are longer than 1kb bytes
- drop:
    expression: ".*debug.*"
    longer_than: 1kb
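Since your two conditions live in two different JSON fields, a single drop stage can also express the AND by matching the raw log line (no source:) with one regex that covers both fields. A sketch, which assumes request appears before http_user_agent in the line, as in your samples:
- drop:
    expression: '.*"request":"GET / HTTP/1\.1".*"http_user_agent":"user agent 1".*'
    drop_counter_reason: my_job_healthchecks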
clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: testing-my-job-drop
    pipeline_stages:
      - match:
          selector: '{job="my-job"}'
          stages:
            - json:
                expressions:
                  http_user_agent:
                  request:
            - labels:
                http_user_agent:
                request:
            #### method 1
            - match:
                selector: '{http_user_agent="user agent 1"}'
                stages:
                  # the line is dropped only when both conditions match
                  - drop:
                      source: "request"
                      expression: "GET / HTTP/1.1"
                      drop_counter_reason: my_job_healthchecks
            #### method 2
            - match:
                selector: '{http_user_agent="user agent 1",request="GET / HTTP/1.1"}'
                action: drop
            #### method 3, in case you need a regex pattern
            - match:
                selector: '{http_user_agent="user agent 1"} |~ "(?i).*GET / HTTP/1.1.*"'
                action: drop
    static_configs:
      - labels:
          job: my-job
Note that a match stage can include nested match stages, as in method 1.
I want to set the Logstash document_id to the line number of the log file, as below (FYI, the reason I need to do this is shown here):
elasticsearch {
    host => yourEsHost
    cluster => "yourCluster"
    index => "logstash-%{+YYYY.MM.dd}"
    document_id => "%{lineNumber}"
}
For example, if the log file is:
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
I want the document_id of 3 documents to be 0, 1, 2 respectively.
In my scenario, one Elasticsearch index is generated from only one log file, which guarantees that such a document_id will not be duplicated inside one index.
Is there any way to achieve this? Thanks.
According to the answer here: https://discuss.elastic.co/t/get-line-number-of-the-log-file-line-being-processed/40960, it is not possible for now. But there is an open issue about it: https://github.com/logstash-plugins/logstash-input-file/issues/7, so it may be possible in a future version. For now, your options are modifying the file input plugin or writing your own input plugin.
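In the meantime, a possible workaround is a ruby filter that numbers events itself. This is only a sketch: it assumes a single pipeline worker (-w 1) so events keep the file's order, and one log file per run, as in your scenario:
filter {
    ruby {
        # start at -1 so the first line gets document_id 0
        init => "@line_number = -1"
        # lineNumber matches the field referenced in your elasticsearch output
        code => "@line_number += 1; event['lineNumber'] = @line_number"
    }
}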
I am using a combination of API Blueprint and Dredd to test an API my application depends on. I am using attributes in API Blueprint to define the structure of the response body.
Apparently I'm missing something, though, because the tests always pass even though I've purposely defined a fake "required" parameter that I know is missing from the API's response. It seems that Dredd is only testing the type of the response body (array), rather than the type and the parameters within it.
My API Blueprint file:
FORMAT: 1A
HOST: http://somehost.net
# API Title
## Endpoints [GET /endpoint/{date}]
+ Parameters
    + date: `2016-09-01` (string, required) - Date

+ Response 200 (application/json; charset=utf-8)
    + Attributes (array[Data])
## Data Structures
### Data
- realParameter: 2432432 (number)
- realParameter2: `some string` (string, required)
- realParameter3: `Something else` (string, required)
- realParameter4: 1 (number, required)
- fakeParam: 1 (number, required)
The response body:
[
    {
        "realParameter": 31,
        "realParameter2": "some value",
        "realParameter3": "another value",
        "realParameter4": 8908
    },
    {
        "realParameter": 54,
        "realParameter2": "something here",
        "realParameter3": "and here too",
        "realParameter4": 6589
    }
]
And my Dredd config file:
reporter: apiary
custom:
  apiaryApiKey: somekey
  apiaryApiName: somename
dry-run: null
hookfiles: null
language: nodejs
sandbox: false
server: null
server-wait: 3
init: false
names: false
only: []
output: []
header: []
sorted: false
user: null
inline-errors: false
details: false
method: []
color: true
level: info
timestamp: false
silent: false
path: []
blueprint: myApiBlueprintFile.apib
endpoint: 'http://ahost.com'
Does anyone have any idea why Dredd ignores the fact that "fakeParam" doesn't actually show up in the response body and still allows the test to pass?
You've run into a limitation of MSON, the language API Blueprint uses for describing attributes. In many cases, MSON describes what MAY be present in the data structure rather than what MUST exactly be present.
The most prominent case is arrays, where basically any content of the array is optional, and thus the underlying generated JSON Schema doesn't put any constraints on the array contents. Dredd just respects that, so indirectly it becomes a Dredd issue too; however, there's not much Dredd can do about it.
There's an issue for the problem: apiaryio/mson#66. You can follow and comment under the issue to get updates about this. Dredd is usually very prompt in adopting the latest API Blueprint parser, so once this is implemented in the language itself, it won't take long to appear in Dredd.
An obvious (but tedious) workaround is to specify your own JSON Schema with stricter rules, using a + Schema section alongside the + Attributes section.
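For instance, a stricter schema for the response above might look like this (a sketch written by hand against the Data structure, not generated by the parser; it makes a missing fakeParam fail validation):

+ Response 200 (application/json; charset=utf-8)
    + Attributes (array[Data])
    + Schema

            {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["realParameter2", "realParameter3", "realParameter4", "fakeParam"]
                }
            }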
server:
- import:
    cmd: GET GPRS <gprsEn> <gprsVa> <gprsSt>
- update:
    gprsEn: 1
    gprsVa: 202
    gprsSt: reegan
This is my YAML file. How do I refer to the gprsEn, gprsVa and gprsSt values inside cmd? The output I need is:
GET GPRS 1 202 reegan
There is no string substitution defined anywhere in the YAML specification, so you have to do this yourself, e.g. by doing:
import ruamel.yaml as yaml

yaml_str = """\
server:
- import:
    cmd: GET GPRS <gprsEn> <gprsVa> <gprsSt>
- update:
    gprsEn: 1
    gprsVa: 202
    gprsSt: reegan
"""

# the round-trip loader preserves the mapping structure as-is
data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
# turn the <placeholder> markers into {placeholder} so str.format() can fill them
cmd = data['server'][0]['import']['cmd'].replace('<', '{').replace('>', '}')
keywords = data['server'][1]['update']
print(cmd.format(**keywords))
which prints exactly the output you want:
GET GPRS 1 202 reegan
You can of course also extend the parser, but it would still need to jump through hoops to specify the source of the keyword/value expansion, which in your case is not obvious (i.e. it is not some top-level mapping).
How to use a regular expression extractor to extract the ETag from an HTTP header response?
I have the following output from my GET request, and I want to extract the ETag and use it in my next test case, where I have to pass it in my HTTP header to do an If-None-Match. I've tried the following: \Etag:\s? and have chosen "Headers" in "Response Field to Check", but I don't see this value being sent in my header.
Thread Name: Fetch_Links 1-1
Sample Start: 2012-05-24 10:15:10 PDT
Load time: 135
Latency: 131
Size in bytes: 4950
Headers size in bytes: 641
Body size in bytes: 4309
Sample Count: 1
Error Count: 0
Response code: 200
Response message: OK
I'm using JMeter version 2.6. Thank you in advance.
I'm able to extract the ETag in JMeter (Regular Expression Extractor - Headers) with the following parameters:
Regular Expression: ETag: "(.*?)"
Template: $1$
Then add an HTTP Header Manager to pass the value of the ETag with 'If-None-Match'.
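Putting it together, the two elements could look like this (a sketch; the Reference Name etag is arbitrary, and the quotes around ${etag} restore the ones the regex strips off):

Regular Expression Extractor (Response Field to check: Headers)
    Reference Name: etag
    Regular Expression: ETag: "(.*?)"
    Template: $1$
    Match No.: 1

HTTP Header Manager (on the next request)
    If-None-Match: "${etag}"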