How can I make Elasticsearch map a base64 field?

I have a JSON file as input to my Elasticsearch 7.10.1 cluster. The JSON looks something like this:
{
"data" : "eyJtZXRyaWNfc3RyZWFtX25hbWUiOiJtGltZW5zaW9ucy...
}
The data value is a base64-encoded JSON document. How can I create a mapping in Elasticsearch that decodes the base64 value and indexes each field inside the decoded JSON?

An ingest pipeline to the rescue! You can create an ingest pipeline that decodes the base64-encoded field, parses the resulting JSON, and adds all of its fields to the document. It basically goes like this:
PUT _ingest/pipeline/b64-decode
{
  "processors": [
    {
      "script": {
        "source": "ctx.decoded = ctx.b64.decodeBase64();"
      }
    },
    {
      "json": {
        "field": "decoded",
        "add_to_root": true
      }
    },
    {
      "remove": {
        "field": "decoded"
      }
    }
  ]
}
Then you can refer to that ingest pipeline when indexing new documents, as shown below:
PUT index/_doc/1?pipeline=b64-decode
{
"b64": "eyJmaWVsZCI6ICJoZWxsbyB3b3JsZCJ9"
}
The b64 field contains the following base64-encoded JSON:
{ "field" : "hello world" }
Finally, the document that will be indexed will look like this:
{
"b64" : "eyJmaWVsZCI6ICJoZWxsbyB3b3JsZCJ9",
"field" : "hello world"
}
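To check what the pipeline should produce before indexing anything, the same three steps (decode, parse, merge into the root) can be reproduced locally. This is only a minimal Python sketch of the logic, not Elasticsearch code; the function name simulate_b64_pipeline is made up for illustration, and it assumes the decoded payload is a JSON object.

```python
import base64
import json


def simulate_b64_pipeline(doc, field="b64"):
    """Mimic the script -> json (add_to_root) -> remove steps on a plain dict."""
    # Step 1: decode the base64 string into JSON text (the script processor).
    decoded = base64.b64decode(doc[field]).decode("utf-8")
    # Step 2: parse the JSON text (the json processor).
    parsed = json.loads(decoded)
    if not isinstance(parsed, dict):
        raise ValueError("decoded payload must be a JSON object")
    # Merge the parsed keys into a copy of the document root (add_to_root).
    result = dict(doc)
    result.update(parsed)
    # Step 3: the temporary decoded text is never stored, mirroring "remove".
    return result


if __name__ == "__main__":
    print(simulate_b64_pipeline({"b64": "eyJmaWVsZCI6ICJoZWxsbyB3b3JsZCJ9"}))
```

Running it on the sample document above gives a dict with both the original b64 field and the decoded field key, matching the indexed document shown.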

Related

Splitting a json array format with same fields name

Currently, I have a JSON array whose elements all share the same field names. I want to split this data into independent fields, with each field named after the element's "name" value.
events.parameters (this is the field name of the JSON array)
{
"name": "USER_EMAIL",
"value": "dummy#yahoo.com"
},
{
"name": "DEVICE_ID",
"value": "Wdk39Iw-akOsiwkaALw"
},
{
"name": "SERIAL_NUMBER",
"value": "9KJUIHG"
}
expected output:
events.parameters.USER_EMAIL : dummy#yahoo.com
events.parameters.DEVICE_ID: Wdk39Iw-akOsiwkaALw
events.parameters.SERIAL_NUMBER : 9KJUIHG
Thanks.
TL;DR
There is no filter that does exactly what you are looking for.
You will have to use the ruby filter.
I just fixed the problem. For anyone wondering, here's my ruby script:
if [events][parameters] {
  ruby {
    code => '
      # Copy each name/value pair into its own field under [events][parameters_split]
      event.get("[events][parameters]").each { |a|
        name = a["name"]
        value = a["value"]
        event.set("[events][parameters_split][#{name}]", value)
      }
    '
  }
}
The output was just what I wanted.
Cheers!
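For readers who want to see the transformation outside Logstash, here is a minimal Python sketch of what the ruby filter above does to the array. The function name split_parameters is hypothetical, and the sample data is copied from the question.

```python
def split_parameters(parameters):
    """Turn [{"name": N, "value": V}, ...] into {N: V, ...}."""
    split = {}
    for item in parameters:
        # Each element contributes one field, keyed by its "name".
        split[item["name"]] = item["value"]
    return split


params = [
    {"name": "USER_EMAIL", "value": "dummy#yahoo.com"},
    {"name": "DEVICE_ID", "value": "Wdk39Iw-akOsiwkaALw"},
    {"name": "SERIAL_NUMBER", "value": "9KJUIHG"},
]

if __name__ == "__main__":
    for name, value in split_parameters(params).items():
        print(f"events.parameters_split.{name}: {value}")
```

Note that, like the ruby script, this keeps only the last value if two elements share the same name.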

How to use a nested JSON field as the Elasticsearch document in Logstash

Say the event looks like this:
{
"name": "xxx",
"data": {
"a": xxx
}
}
With Logstash, how can I send only the inner data field to Elasticsearch as the document source, like this:
{
"a": xxx
}
Any response would be appreciated!
I tried to use the json filter:
filter {
json {
source => "data"
}
}
But it seems the event is already parsed as JSON; the terminal just prints this error message:
Error parsing json {:source=>"data", :raw=>{"a"=>xxx}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO}
FYI, I found an answer that works:
https://discuss.elastic.co/t/move-subarrays-to-document-root/143876
Just use ruby code to move the nested fields to the document root, then remove all other fields:
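ruby {
  # Copy every key under "data" to the document root
  code => 'event.get("data").each { |k, v| event.set(k, v) }'
}
mutate {
  # Drop the original fields so only the promoted keys remain
  remove_field => [ "name", "data" ]
}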
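The two filters above amount to "copy the nested keys up, then drop the originals". A minimal Python sketch of that logic on a plain dict follows; the function name promote_nested and its parameters are hypothetical, and the placeholder value xxx from the question is replaced with 1.

```python
def promote_nested(event, nested_key="data", drop=("name", "data")):
    """Copy the keys under nested_key to the root, then remove the listed fields."""
    result = dict(event)
    # Equivalent of the ruby filter: event.set(k, v) for each nested pair.
    for key, value in event.get(nested_key, {}).items():
        result[key] = value
    # Equivalent of mutate remove_field. As in Logstash, a promoted key
    # that is also listed in drop gets removed again.
    for key in drop:
        result.pop(key, None)
    return result


if __name__ == "__main__":
    print(promote_nested({"name": "xxx", "data": {"a": 1}}))
```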

ElasticSearch: populating ip_range type field via logstash

I'm experimenting with the ip_range field type in Elasticsearch 6.8 (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/range.html) and am struggling to find a way to load IP data into the field properly via Logstash.
I was able to load some sample data via Kibana Dev Tools, but cannot figure out how to do the same via Logstash.
Index definition:
PUT test_ip_range
{
"mappings": {
"_doc": {
"properties": {
"ip_from_to_range": {
"type": "ip_range"
},
"ip_from": {
"type": "ip"
},
"ip_to": {
"type": "ip"
}
}
}
}
}
Add sample doc:
PUT test_ip_range/_doc/3
{
"ip_from_to_range" :
{
"gte" : "<dotted_ip_from>",
"lte": "<dotted_ip_to>"
}
}
Logstash config (reading from DB)
input {
jdbc {
...
statement => "SELECT ip_from, ip_to, <???> AS ip_from_to_range FROM sample_ip_data"
}
}
output {
stdout { codec => json_lines }
elasticsearch {
"hosts" => "<host>"
"index" => "test_ip_range"
"document_type" => "_doc"
}
}
Question:
How do I get the ip_from and ip_to DB fields into the respective gte and lte parts of ip_from_to_range via the Logstash config?
I know I can also insert the IP range in CIDR notation, but I would like to have both options: loading in CIDR notation and loading as a range.
After some trial and error, I finally figured out the Logstash config.
I had posted about a similar issue here, which got me on the right track with the syntax for this use case as well.
input { ... }
filter {
  mutate {
    add_field => {
      "[ip_from_to_range]" =>
        '{
          "gte": "%{ip_from}",
          "lte": "%{ip_to}"
        }'
    }
  }
  json {
    source => "ip_from_to_range"
    target => "ip_from_to_range"
  }
}
output { ... }
Filter parts explained:
mutate add_field: creates a new field [ip_from_to_range] whose value is a JSON string ('{...}'). It is important to write the field as [field_name]; otherwise the next step, which parses the string into a JSON object, doesn't work.
json: parses the string representation into a JSON object, replacing the string in the same field.
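The two filter steps can be pictured as "build a JSON string by substitution, then parse it". The Python sketch below imitates that on a single row and adds a sanity check that is not part of the Logstash config; the function name build_ip_range and the sample addresses are made up for illustration.

```python
import ipaddress
import json


def build_ip_range(ip_from, ip_to):
    """Return the {"gte": ..., "lte": ...} object for an ip_range field."""
    start = ipaddress.ip_address(ip_from)
    end = ipaddress.ip_address(ip_to)
    # Extra validation, not done by the Logstash filters above.
    if start.version != end.version:
        raise ValueError("ip_from and ip_to must use the same IP version")
    if start > end:
        raise ValueError("ip_from must not be greater than ip_to")
    # Step 1 (mutate add_field): substitute the values into a JSON string.
    as_string = '{"gte": "%s", "lte": "%s"}' % (start, end)
    # Step 2 (json filter): parse the string into an object.
    return json.loads(as_string)


if __name__ == "__main__":
    print(build_ip_range("10.0.0.1", "10.0.0.255"))
```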

Elasticsearch: define the order of fields in returned documents

I'm sending queries to Elasticsearch, and it responds with the fields of each document in an unpredictable order.
How can I fix the order in which Elasticsearch returns the fields inside documents?
I mean, I'm sending this query:
{
"index": "my_index",
"_source":{
"includes" : ["field1","field2","field3","field14"]
},
"size": X,
"body": {
"query": {
// stuff
}
}
}
When it responds, the fields are not in the order I asked for.
I ultimately want to convert this to CSV, and want fixed CSV headers.
Is there something I can do to get something like
doc1 :{"field1","field2","field3","field14"}
doc2 :{"field1","field2","field3","field14"}
...
in the same order as my "_source"?
Thanks for your help.
A document in Elasticsearch is a JSON hash/map and by definition maps are unordered.
One solution around this would be to use Logstash in order to extract docs from ES using an elasticsearch input and output them in CSV using a csv output. That way you can guarantee that the fields in the CSV file will have the exact same order as specified. Another benefit is that you don't have to write your own boilerplate code to extract from ES and sink to CSV, Logstash does it all for you for free.
The Logstash configuration would look something like this:
input {
  elasticsearch {
    hosts => "localhost"
    query => '{ "query": { "match_all": {} } }'
    size => 100
    index => "my_index"
  }
}
filter {}
output {
  csv {
    fields => ["field1","field2","field3","field14"]
    path => "/path/to/file.csv"
  }
}
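If you already have the search hits in your own code, the same idea applies: let the writer, not the document, decide the column order. Below is a minimal Python sketch using the standard csv module; the function name hits_to_csv and the sample hits are hypothetical, and it assumes each hit carries its fields under "_source" as in a search response.

```python
import csv
import io

FIELDS = ["field1", "field2", "field3", "field14"]


def hits_to_csv(hits, fields=FIELDS):
    """Write hits as CSV with columns in the order given by fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,       # fixes the header and column order
        extrasaction="ignore",   # skip keys that are not requested
        restval="",              # leave missing fields empty
        lineterminator="\n",
    )
    writer.writeheader()
    for hit in hits:
        writer.writerow(hit["_source"])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"_source": {"field3": "c", "field1": "a", "field14": "d", "field2": "b"}},
        {"_source": {"field2": "y", "field1": "x"}},
    ]
    print(hits_to_csv(sample), end="")
```

The key order inside each hit no longer matters, because every row is written by looking up the names in FIELDS one by one.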

How to use mapping in elasticsearch?

After processing logs with Logstash, all my fields have the same type, string. I want to use a mapping in Elasticsearch to change the type of some fields, such as ip, port, etc., but I don't know how to do it. I'm a complete beginner with Elasticsearch.
Any help?
The first thing to do would be to install the Marvel plugin in Elasticsearch. It allows you to work with the Elasticsearch REST API very easily - to index documents, modify mappings, etc.
Go to the Elasticsearch folder and run:
bin/plugin -i elasticsearch/marvel/latest
Then go to http://localhost:9200/_plugin/marvel/sense/index.html to access Marvel Sense from which you can send commands. Marvel itself provides you with a dashboard about Elasticsearch indices, performance stats, etc.: http://localhost:9200/_plugin/marvel/
In Sense, you can run:
GET /_cat/indices
to learn what indices exist in your Elasticsearch instance.
Let's say there is an index called logstash.
You can check its mapping by running:
GET /logstash/_mapping
Elasticsearch will return a JSON document that describes the mapping of the index. It could be something like:
{
"logstash": {
"mappings": {
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "string"
},
"y": {
"type": "string"
}
}
}
}
}
}
}
}
...in this case doc is the document type (collection) in which you index documents. In Sense, you could index a document as follows:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
... that's a command to index the JSON object under the id 1.
Once a document field such as Foo.x is mapped as type string, it cannot be changed to a number in place. You have to set the mapping first and then reindex.
First delete the index:
DELETE logstash
Then create the index and set the mapping as follows:
PUT logstash
PUT logstash/doc/_mapping
{
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "long"
},
"y": {
"type": "long"
}
}
}
}
}
}
Now, even if you index a doc with the properties as JSON strings, Elasticsearch will index them as numbers:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
Search for the new doc:
GET logstash/_search
Notice that the returned document, in the _source field, looks exactly the way you sent it to Elasticsearch. That's on purpose: Elasticsearch always preserves the original doc this way. The properties are indexed as numbers though. You can run a range query to confirm:
GET logstash/_search
{
"query":{
"range" : {
"Foo.x" : {
"gte" : 500
}
}
}
}
With respect to Logstash, you might want to set a mapping template for index name logstash-* since Logstash creates new indices automatically: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-templates.html
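As a rough illustration of what such a template body could look like, here is a small Python sketch that builds one as a dict and prints it as JSON. It assumes the 1.x template format described at the link above, where the index pattern goes under the "template" key; the function name logstash_template is hypothetical, and the Foo mapping is simply reused from the example earlier. The printed body would be sent with PUT /_template/<name>, using a template name of your choice.

```python
import json


def logstash_template(properties, pattern="logstash-*", doc_type="doc"):
    """Build a 1.x-style index template body (format assumed, see lead-in)."""
    return {
        "template": pattern,  # applied to every new index matching the pattern
        "mappings": {
            doc_type: {"properties": properties},
        },
    }


body = logstash_template(
    {"Foo": {"properties": {"x": {"type": "long"}, "y": {"type": "long"}}}}
)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

Later Elasticsearch versions changed the template format, so check the reference for the version you run before using this shape.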
