Packetbeat - How to drop_fields from nested object - elasticsearch

I recently started working with Packetbeat.
For my use-case, I only need some specific fields (to the point where if I could I would completely rewrite the mapping, but am leaving that as a last resort).
I tried removing some of the fields from the "dns.answers" array of objects, but what I did doesn't seem to have any effect:
- include_fields:
    fields:
      - dns.question.name
      - dns.question.type
      - dns.answers
      - dns.answers_count
      - dns.resolved_ip
- drop_fields:
    fields:
      - dns.answers.name
In addition, I also tried including only the fields I want, but that didn't seem to work either, e.g.:
- include_fields:
    fields:
      - dns.question.name
      - dns.question.type
      - dns.answers.data
      - dns.answers_count
      - dns.resolved_ip
Any ideas? If rewriting the template/mapping of the index is the best choice, or perhaps using the Ingest Node Pipelines is a better approach, I'd love to hear it.
Thanks
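For reference, one way the ingest pipeline route mentioned above could look is a foreach processor wrapping a remove. This is only a sketch: the pipeline name dns_trim is made up, and it assumes the stock Packetbeat dns.answers objects.
PUT _ingest/pipeline/dns_trim
{
  "description": "strip per-answer fields from Packetbeat DNS events",
  "processors": [
    {
      "foreach": {
        "field": "dns.answers",
        "processor": {
          "remove": {
            "field": "_ingest._value.name"
          }
        }
      }
    }
  ]
}
Packetbeat would then have to send events through that pipeline (the pipeline setting under output.elasticsearch), and events without a dns.answers array would need an if condition or ignore_missing guard on the foreach processor.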

Related

Lookup tables and synonyms are not working in RASA

I have used a lookup table and synonyms, but the entities listed in the lookup table are not detected by RASA, and the synonyms don't work either.
nlu:
- intent: place_order
  examples: |
    - wanna [large](size) shoes for husky
    - need a [small](size) size [green](color) boots for pupps
    - have [blue](color) socks
    - would like to place an order
- lookup: size
  examples: |
    - small
    - medium
    - large
- synonym: small
  examples: |
    - small
    - s
    - tiny
- synonym: large
  examples: |
    - large
    - l
    - big
- synonym: medium
  examples: |
    - medium
    - m
    - average
    - normal
- lookup: color
  examples: |
    - black
    - blue
    - white
    - red
    - green
    - orange
    - yellow
    - purple
It works correctly for "I would like to place an order for large blue shoes", but if the input is just "medium" (which is in the lookup table), it won't be recognized.
It also won't work if a synonym of "large", such as "big", is used.
After doing some research I found that adding RegexEntityExtractor to the pipeline resolves the lookup table issue:
name: RegexEntityExtractor
But it still didn't resolve the synonym problem. By default the pipeline was using DIETClassifier (which I think is a pretty good intent classifier and entity extractor), and its output collided with that of RegexEntityExtractor when I used the two together.
Can someone suggest an extractor, or a combination of extractors (intent and entity), that works well with lookup tables and synonyms without any conflicts?
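For what it's worth, a pipeline that combines the pieces mentioned above usually looks roughly like the sketch below. This is only a sketch: the component options shown are assumptions, and EntitySynonymMapper only rewrites entity values that some extractor has already picked up.
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: RegexEntityExtractor
    use_lookup_tables: true
    use_regexes: true
  - name: EntitySynonymMapper
The collision described in the question usually means DIETClassifier and RegexEntityExtractor both return the same entity; deciding which one to keep (for example by turning off entity extraction in DIETClassifier, if your version exposes that option) is a tuning choice rather than something the pipeline resolves on its own.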

Facing an issue while sending data from Filebeat to multiple Logstash files

To be precise, I am handling a log file which has millions of records. Since it is a billing summary log, customer information is recorded in no particular order.
I am using customized grok patterns and the Logstash XML filter plugin to extract the data that is sufficient for tracking. To track individual customer activities, I use "Customer_ID" as a unique key, so even though I use multiple Logstash files and multiple grok patterns, all of a customer's information can be bound/aggregated by their "Customer_ID" (unique key).
Here is a sample from my log file:
7-04-2017 08:49:41 INFO abcinfo (ABC_RemoteONUS_Processor.java52) - Customer_Entry :::<?xml version="1.0" encoding="UTF-8"?><ns2:ReqListAccount xmlns:ns2="http://vcb.org/abc/schema/"/"><Head msgId="1ABCDEFegAQtQOSuJTEs3u" orgId="ABC" ts="2017-04-27T08:49:51+05:30" ver="1.0"/><Cust id="ABCDVFR233cd662a74a229002159220ce762c" note="Account CUST Listing" refId="DCVD849512576821682" refUrl="http://www.ABC.org.in/" ts="2017-04-27T08:49:51+05:30"
My grok pattern:
grok {
  patterns_dir => "D:\elk\logstash-5.2.1\vendor\bundle\jruby\1.9\gems\logstash-patterns-core-4.0.2\patterns"
  match => [ "message", "%{DATESTAMP:datestamp} %{LOGLEVEL:Logseverity}\s+%{WORD:ModuleInfo} \(%{NOTSPACE:JavaClass}\)%{ABC:Customer_Init}%{GREEDYDATA:Cust}" ]
  add_field => { "Details" => "Request" }
  remove_tag => ["_grokparsefailure"]
}
My customized pattern, which is stored inside the patterns_dir:
ABC ( - Customer_Entry :::)
My XML filter plugin:
xml {
  source => "Cust"
  store_xml => false
  xpath => [
    "//Head/@ts", "Cust_Req_time",
    "//Cust/@id", "Customer_ID",
    "//Cust/@note", "Cust_note"
  ]
}
So whatever details come after " - Customer_Entry :::", I am able to extract using the XML filter plugin (stored similarly to a multiline codec). I have written 5 different Logstash files to extract the different customer activities with 5 different grok patterns, covering:
1. Customer_Entry
2. Customer_Purchase
3. Customer_Last_Purchase
4. Customer_Transaction
5. Customer_Authorization
All of the above grok patterns capture a different set of information, which is grouped by Customer_ID as I said earlier.
I am able to extract the information and visualize it clearly in Kibana without any flaw, using my customized patterns with the different log files.
Since I have hundreds of log files to feed into Logstash every day, I opted for Filebeat, but Filebeat runs with only one port, 5044. I tried running 5 different ports for the 5 different Logstash files, but that was not working: only one of the 5 Logstash configs was loaded, and the rest of the config files sat idle.
Here is a sample of my Filebeat output configuration:
output.logstash:
  hosts: ["localhost:5044"]
output.logstash:
  hosts: ["localhost:5045"]
output.logstash:
  hosts: ["localhost:5046"]
I couldn't add all the grok patterns to one Logstash config file, because the XML filter plugin takes its source from the GREEDYDATA capture; in that case I would have 5 different source => settings for the 5 different grok patterns.
I even tried that, but it was not working.
I'm looking for a better approach.
Sounds like you're looking for scale, with parallel ingestion. As it happens, Filebeat supports load balancing across outputs, which sounds like what you're looking for.
output.logstash:
  hosts: ["localhost:5044", "localhost:5045", "localhost:5046"]
  loadbalance: true
That's for the outputs. Though, I believe you want multithreading on the input side. Filebeat is supposed to track all files specified in the prospector config, but you've found its limits: globbing or specifying a directory will single-thread the files in that glob/directory. If your file names support it, creative globbing may get you better parallelism by defining multiple globs over the same directory.
Assuming your logs are coming in by type:
- input_type: log
  paths:
    - /mnt/billing/*entry.log
    - /mnt/billing/*purchase.log
    - /mnt/billing/*transaction.log
This would enable prospectors on multiple threads, reading the files here in parallel.
If your logs come in with random names, you could use a similar setup:
- input_type: log
  paths:
    - /mnt/billing/a*
    - /mnt/billing/b*
    - /mnt/billing/c*
    [...]
    - /mnt/billing/z*
If you are processing lots of files with unique names that never repeat, adding the clean_inactive config option to your prospectors will keep Filebeat running fast.
- input_type: log
  ignore_older: 18h
  clean_inactive: 24h
  paths:
    - /mnt/billing/a*
    - /mnt/billing/b*
    - /mnt/billing/c*
    [...]
    - /mnt/billing/z*
This will remove all state for files older than 24 hours and won't bother processing any file more than 18 hours old.
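As for folding the five pipelines into a single Logstash config, which the question gave up on, the usual approach is one conditional per marker string, so that each grok/xml pair only runs on matching lines. A rough sketch, reusing the Customer_Entry pattern from above; the Customer_Purchase branch and its Details value are assumptions, not taken from the real configs:
filter {
  if " - Customer_Entry :::" in [message] {
    grok {
      patterns_dir => "D:\elk\logstash-5.2.1\vendor\bundle\jruby\1.9\gems\logstash-patterns-core-4.0.2\patterns"
      match => [ "message", "%{DATESTAMP:datestamp} %{LOGLEVEL:Logseverity}\s+%{WORD:ModuleInfo} \(%{NOTSPACE:JavaClass}\)%{ABC:Customer_Init}%{GREEDYDATA:Cust}" ]
      add_field => { "Details" => "Request" }
    }
    xml {
      source => "Cust"
      store_xml => false
      xpath => [ "//Cust/@id", "Customer_ID" ]
    }
  } else if " - Customer_Purchase :::" in [message] {
    mutate { add_field => { "Details" => "Purchase" } }
    # the Customer_Purchase grok/xml pair would go here, and so on for the other activity types
  }
}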

Is there a way to shorten this YAML?

Is there a way to make the following YAML shorter so that the same resources aren't repeated?
---
classes:
- roles::storage::nfs
samba::config::includeconf:
- alpha
- beta
- charlie
- delta
- echo
- foxtrot
smb_shares:
  alpha:
    name: alpha
  beta:
    name: beta
  charlie:
    name: charlie
  delta:
    name: delta
  echo:
    name: echo
    path: /path/to/file
  foxtrot:
    name: foxtrot
If there's a way to reduce any of the repetition, that would be great. Ideally, each resource name would only appear once.
Yes, you can vastly reduce this with two optimisations, one of which nullifies most of the effect of the other. You will, however, have to change your program from reading in simple sequences and mappings to creating smarter objects (which I have called ShareInclude, Shares, and Share):
---
classes:
- roles::storage::nfs
samba::config::includeconf: !ShareInclude
- alpha
- beta
- charlie
- delta
- echo
- foxtrot
smb_shares: !Shares
  echo: !Share
    path: /path/to/file
When creating ShareInclude, you should create a Share for each sequence element, with an initial name equal to the scalar value, and insert it into some global list.
The above takes care of most of the Share objects, except for the extra info on echo. When
echo: !Share
  path: /path/to/file
is processed, a temporary anonymous Share should be created with path set as an attribute or other retrievable value (if the name were different, that could be stored as well). Then, once Shares is created, it will know the name of the share to look up (echo, from the key of the mapping) and take one of two actions:
- If the name can be looked up, update the Share object with the information from the anonymous Share.
- If the name cannot be found, promote the anonymous Share by providing the key value as its name, and store it.
This way you have to specify echo twice, otherwise there is no way to associate the explicit path with the specific Share object created when processing ShareInclude. If that is still too much, you can approach it from the other direction: leave ShareInclude empty and implicitly create those entries when dealing with Shares:
---
classes:
- roles::storage::nfs
samba::config::includeconf: !ShareInclude
smb_shares: !Shares
  alpha:
  beta:
  charlie:
  delta:
  echo:
    path: /path/to/file
  foxtrot:
Although this is somewhat shorter, depending on your YAML parser you might no longer have a guaranteed ordering in the creation of the Share objects. And if you have to make Shares a sequence of key-value pairs, the shortening advantage is gone.
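The thread does not name a YAML library, but as a sketch of the parser side for the first variant, a PyYAML program could register constructors for the three tags roughly as follows (the file and function names here are made up; a ruamel.yaml or other tag-aware loader could be wired up the same way):
import yaml

shares = {}  # share name -> attributes, filled in while the document loads

def construct_share_include(loader, node):
    # every scalar listed under !ShareInclude becomes a Share with default settings
    names = loader.construct_sequence(node)
    for name in names:
        shares.setdefault(name, {"name": name})
    return names

def construct_share(loader, node):
    # an anonymous Share: just the overriding attributes (e.g. path)
    return loader.construct_mapping(node)

def construct_shares(loader, node):
    # merge per-share overrides into the Shares created by !ShareInclude,
    # promoting any key that was not listed there to a new Share
    overrides = loader.construct_mapping(node, deep=True)
    for name, extra in overrides.items():
        shares.setdefault(name, {"name": name}).update(extra or {})
    return shares

yaml.add_constructor("!ShareInclude", construct_share_include, Loader=yaml.SafeLoader)
yaml.add_constructor("!Share", construct_share, Loader=yaml.SafeLoader)
yaml.add_constructor("!Shares", construct_shares, Loader=yaml.SafeLoader)

with open("storage.yaml") as f:
    data = yaml.safe_load(f)

print(shares["echo"])   # {'name': 'echo', 'path': '/path/to/file'}
print(shares["alpha"])  # {'name': 'alpha'}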

How to get the difference between 2 dates as a duration using XPath on BPM 11g

I need to get the difference between 2 dateTime payload objects in duration format, for use as an HT expiration value (i.e. returned as PT2055M or P1DT10H15M).
I checked the functions at this link: http://docs.oracle.com/cd/E23943_01/dev.1111/e10224/bp_appx_functs.htm#autoId13
And I tried to solve the issue by building the duration myself, such as:
concat("P", xp20:year-from-dateTime(string(bpmn:getDataObject('myPayloadDate'))) - xp20:year-from-dateTime(xp20:current-dateTime()) ,"Y", xp20:month-from-dateTime(string(bpmn:getDataObject('myPayloadDate'))) - xp20:month-from-dateTime(xp20:current-dateTime()),"M", xp20:day-from-dateTime(string(bpmn:getDataObject('myPayloadDate'))) - xp20:day-from-dateTime(xp20:current-dateTime()),"DT", xp20:hour-from-dateTime(string(bpmn:getDataObject('myPayloadDate'))) - xp20:hour-from-dateTime(xp20:current-dateTime()),"H",xp20:minute-from-dateTime(string(bpmn:getDataObject('myPayloadDate'))) - xp20:minute-from-dateTime(xp20:current-dateTime()),"M")
But I realized that this approach only subtracts the individual components, not the values as a whole, so it does not give the result I expected.
I could not find the right composition of functions to solve this.
Could you please guide me?
Assuming you have an XPath 2.0 processor, just use the subtraction operator. For example
current-date() - xs:date('2001-03-04')
gives
P4744D
(You said "dates" but your examples look more like dateTime's. The subtraction operator will work with either.)

Lucene.net and relational data searching

I have the following situation:
I have a person with default fields like: Name, FirstName, Phone, Email, ...
A person has many language skills; the language skill entity has the following fields: Language, Speaking, Writing, Understanding, MotherTongue
A person has many work experiences, with fields: Office, Description, Period, Location
How would I index something like this with Lucene.net?
The following searches could be possible:
- FirstName:"Arno" AND LanguageSkill:(Language:"Dutch" AND Speaking:[3 TO 5])
- FirstName:"Arno" AND WorkExperience:(Description:"Marketing")
- FirstName:"Arno" AND WorkExperience:(Description:"Marketing" OR Description:"Sales")
- FirstName:"Arno" AND WorkExperience:(Description:"Programmer") AND LanguageSkill:(Language:"English" AND Speaking:[3 TO 5] AND MotherTongue:"true")
Would something like this be possible in Lucene? I've already tried flattening my relations, where a document could look like this:
Name:"Stallen"
FirstName:"Arno"
WorkExperience:"Office=Lidl Description=Sales Location=London"
WorkExperience:"Office=Abro Description=Programmer Location=London"
LanguageSkill:"Language=Dutch Speaking=3 Writing=1 Understanding=3"
LanguageSkill:"Language=Egnlish Speaking=5 Writing=4 Understanding=5 MotherTongue=true"
"if all you have is a hammer, everything looks like a nail"
Your requirements are better suited to a relational database. I would go that way, since I don't see anything here related to free-text search.
However, if you have to use Lucene.Net, you should flatten your data a little bit more, such as:
Name:"Stallen"
FirstName:"Arno"
WorkExperienceDescription:Sales
WorkExperienceLocation:London
LanguageSkillLanguage:Dutch
LanguageSkillLanguage:English
Of course this results in some loss of information, and you would not be able to make a search like:
FirstName:"Arno" AND LanguageSkill:(Language:"Dutch" AND Speaking:[3 TO 5])
PS: You can use the same field name (e.g., LanguageSkillLanguage) multiple times in a single document.
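For instance, with that flattened scheme a query along these lines (a sketch using the field names above) would still match:
FirstName:"Arno" AND LanguageSkillLanguage:Dutch AND WorkExperienceDescription:Sales
though, as noted, there is no way to tie a particular speaking level to the Dutch skill, which is the information loss mentioned above.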
I ended up using the Java version of Lucene (3.6), which supports parent-child documents. I used IKVM to generate the .NET DLL from it.
