ElasticSearch - Search by IP[regex] - elasticsearch

I have Kibana and ES. I have many indexes. I am using message field in ElasticSearch. My goal is to mask all IP addresses, which I already do using Logstash.
Now, given the fact there are many different indexes, and also different log types, I would like to run either Kibana or ES query for any occurence of IP. Just in case, that I missed any of them. Also, I would like to do it for email format as well.
Question is, how can I run IP/email regex search on ElasticSearch or Kibana?
Message field is string type, and is indexed.

I have found what I was looking for. In my case this approach is valid, since I do not care about performance. This was just a test to make sure I don't 'leak' information.
ElasticSearch regex query.

Related

elasticsearch / kibana, search for documents where message contains '=' char

i have an issue which i suspect is quite basic but i have been stuck on this for too long and i fear i am missing something so basic that i can't see it by now.
we are using the ELK stack today for log analysis of our application logs.
logs are created by the JAVA application into JSON format, shipped using filebeat into logstash which in turn processes the input and queues it into ES.
some of the messages contain unstructured data in the message field which i currently cannot parse into separate fields so i need to catch them in the message field. problem is this:
the string i need to catch is: "57=1" this is an indication of something which i need to filter documents upon. i need to get documents which contain this exact string.
no matter what i try i can't get kibana to match this. it seems to always ignore the equal char and match either 57 or 1.
please advise.
thanks
You may check the Elasticsearch mapping on the field type of the referring field. If it is analyzed, the '=' may not have been indexed due to the default-analyzer. (source 1, source 2)

Dealing with random failure datatypes in Elasticsearch 2.X

So im working on a system that logs bad data sent to an api and what the full request was. Would love to be able to see this in Kibana.
Issue is the datatypes could be random, so when I send them to the bad_data field it fails if it dosen't match the original mapping.
Anyone have a suggestion for the right way to handle this?
(2.X Es is required due to a sub dependancy)
You could use ignore_malformed flag in your field mappings. In that case wrong format values will not be indexed and your document will be saved.
See elastic documentation for more information.
If you want to be able to query such fields as original text you could use fields in your mapping for multi-type indexing, to get fast queries on raw text values.

Elastic search document storing

Basic usecase that we are trying to solve is for users to be able to search from the contents of the log file .
Lets say a simple situation where user searches for a keyword and this is present in a log file which i want to render it back to the user.
We plan to use ElasticSearch for handling this. The idea that i have in mind is to use elastic search as a mechanism to store the indexed log files.
Having this concept in mind, i went through https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Couple of questions i have,
1) I understand the input provided to elastic search is a JSON doc. It is going to scan this JSON provided and create/update indexes. So i need a mechanism to convert my input log files to JSON??
2) Elastic search would scan this input document and create/update inverted indexes. These inverted indexes actually point to the exact document. So does that mean, ES would store these documents somewhere?? Would it store them as JSON docs? Is it purely in memory or on file sytem/database?
3) No when user searches for a keyword , ES returns back the document which contains the searched keyword. Now do i need to have the ability to convert back this JSON doc to the original log document that user expects??
Clearly im missing something.. Sorry for asking questions this silly , but im trying to improve my skills and its WIP.
Also , i understand that there is ELK stack out there. For some reasons we just want to use ES and not the LogStash and Kibana part of the stack..
Thanks
Logs needs to be parsed to JSON before they can be inserted into Elasticsearch
All documents are stored on the filesystem and some data is kept in memory but all data is persistent.
When you search Elasticsearch you get back matching JSON documents. If you want to display the original error message, you can store that original message in one of the JSON fields and display just that.
So if you just want to store log messages and not break them into fields or anything, you can simply take each row and send it to Elasticsearch like so:
{ "message": "This is my log message" }
To parse logs, break them into fields and add some logic, you will need to use some sort of app, like Logstash for example.

Elasticsearch - Autocomplete return word/term/token suggestions instead of whole documents

I am trying to implement a simple auto completion for query terms.
There are many different approaches but most of them do return documents instead of terms
- or the authors simply stopped explaining from that point and i am not able to adapt.
A user is typing in a query - e.g. phil
What i want is to provide a list of term completion suggestions like philipp, philius, philadelphia, ...
I am able to get document matches via (edge)ngrams, phrase_prefix and so on but i am am stuck at retrieving matching terms (completion suggestions).
Can someone give me a hint?
I have documents like this {"title":"...", "description":"...", "content":"..."}
All fields have larger string values but especially the field content contains fulltext content.
I do not want to suggest the whole title of a document containing e.g. Philadelphia. Just the word "Philadelphia".
Looking for something like that, myself.
In SOLR it was relatively simple to configure (although a pain to build and keep up-to-date) using solr.SpellCheckComponent. Somehow the same underlying Lucene functionality is used differently between SOLR and ElasticSearch, and in ElasticSearch it is geared towards finding whole documents (or whole field values, if you will) or so it seems...
Despite the profusion of "elasticsearch autocomplete" articles, none appears to deal with this particular issue. Like it doesn't exist. Maybe their use case is different and ElasticSearch works for them just fine, who knows?
At this point I think that preparing the exact field values to use with ElasticSearch autocomplete (yes, that's the input field values, not analyzer tokens) maybe the only way to solve the problem. Which is terrible, because the performance is going to be very low.
Try term suggester:
The term suggester suggests terms based on edit distance. The provided
suggest text is analyzed before terms are suggested. The suggested
terms are provided per analyzed suggest text token. The term suggester
doesn’t take the query into account that is part of request.

Kibana: How to visualise based on two fields

I have imported weblogs into Elasticsearch via Logstash. This has completed successfully.
I have a field in the log file (clientip) that is always populated and another field that is sometimes populated (trueclientip). I want to aggregate based on the coalescing of the two; e.g. if trueclientip is not empty then use that otherwise use clientip.
How can I do this with the Visualisation in Kibana? Do I need to generate a scripted field or is there another approach?
Thanks.
Define a scripted field that should have this formula: doc['trueclientip'].value ? doc['trueclientip'].value : doc['clientip'].value and use this in your aggregations.
But, there is a downside to this scripted fields functionality AND the ip type: it seems what you get back from the script is the number itself (which is logic because the scripted fields in Kibana 4 only use Lucene expressions as a language), not the string representation. IPs internally are actually long numbers in Lucene.
For example, 127.0.0.1 is represented internally as 2130706433. And this is what you will see in Visualize.
Is not ideal, indeed, and it would be good to have a more advanced scripting language in scripted fields, but a github issue already exists.

Resources