Separate multiple events in logstash input into separate documents in elasticsearch index - elasticsearch

INPUT in logstash :
{
"Teacher": {
"Name": "Mary",
"age": 20
},
"Student": [
{
"Name": "Tim",
"age": 12
},
{
"Name": "Eric",
"age":13
}
]
}
I need to filter this input with Logstash so that three separate documents are sent to Elasticsearch:
doc1: {
"Name": "Mary",
"age": 20
}
doc2: {
"Name": "Tim",
"age": 12
}
doc3: {
"Name": "Eric",
"age": 13
}
I tried the split, mutate and ruby filters but did not get the desired result. Could someone help me separate these into separate documents in the Elasticsearch index?

Since you want a separate event for 'Mary', use the clone filter to create two events. Delete the 'Student' array from one copy so that only 'Mary' is left.
In the second copy, delete 'Teacher' and use the split filter on the 'Student' array, which will give you separate events for 'Tim' and 'Eric'.
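The clone-then-split idea can be sketched in plain Python on a dict. This is only a model of the logic, not Logstash configuration; the function name `clone_and_split` is hypothetical, and promoting the nested fields to the top level is an assumption about the desired document shape that the answer does not spell out.

```python
import copy


def clone_and_split(event):
    """Model of clone + split on a plain dict (not Logstash itself)."""
    # First copy: drop the "Student" array so only the teacher remains.
    teacher_copy = copy.deepcopy(event)
    teacher_copy.pop("Student", None)
    # Assumption: the teacher's fields are promoted to the top level.
    docs = [teacher_copy["Teacher"]]

    # Second copy: drop the teacher, then emit one document per array
    # element, which is what splitting on "Student" amounts to.
    student_copy = copy.deepcopy(event)
    student_copy.pop("Teacher", None)
    for student in student_copy.get("Student", []):
        docs.append(student)
    return docs


event = {
    "Teacher": {"Name": "Mary", "age": 20},
    "Student": [
        {"Name": "Tim", "age": 12},
        {"Name": "Eric", "age": 13},
    ],
}
documents = clone_and_split(event)
```

Each element of `documents` corresponds to one event that would be sent to the index.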

Related

How to cleanly batch queries together in Gremlin

I am writing a GraphQL resolver that retrieves all vertices by a particular edge using the following query (created returns label person):
software {
created {
name
}
}
Which would resolve to the following Gremlin Query for each software node found:
g.V().hasLabel('software').has('name', 'ripple').in('created')
This returns a result that includes all properties of the object:
{
"result": [
{
"#type": "d",
"#rid": "#24:0",
"#version": 6,
"#class": "person",
"in_knows": [
"#35:0"
],
"name": "josh",
"out_created": [
"#32:0",
"#33:0"
],
"age": 32,
"#fieldTypes": "in_knows=g,out_created=g"
}
],
"dbStats": {
...
}
}
I realize that this will fall foul of GraphQL's N+1 query problem, so I'm trying to batch queries together using a Dataloader pattern. (I'm also hoping to do property selections, so that I'm not asking the database to return too much info.)
So I'm trying to craft a query like so:
g.V().union(
__.hasLabel('software').has('name', 'ripple').
project('parent', 'child').by('id').
by(__.in('created').fold()),
__.hasLabel('software').has('name', 'lop').
project('parent', 'child').by('id').
by(__.in('created').fold())
)
But this results in the following where the props are missing and it just includes the id of the vertices I want:
{
"result": [
{
"parent": "ripple",
"child": [
"#24:0"
]
},
{
"parent": "lop",
"child": [
"#22:0",
"#23:0",
"#24:0"
]
}
],
"dbStats": {
...
}
}
My question is: how can I have the Gremlin query return all of the props for the found vertices and none of the other props? Should I even be doing batching this way?
For anyone else reading: the query I was trying to write wouldn't work because the TraversalSet created in the by(__.in('created')) step can't be cast from a List to an ElementMap, as the stream cardinality wouldn't be enforced. (You can only have one record per row, I think?)
My working query duplicates the keys for each row and specifies the props needed. The query below works on Gremlin 3.3 as used in ODB; if you have a newer Gremlin (3.4+) that supports elementMap, you can replace the last by step with by(elementMap('name', 'age')):
g.V().union(
__.hasLabel('software').has('name', 'ripple').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value())),
__.hasLabel('software').has('name', 'lop').
as('parent').
in('created').as('child').
select('parent', 'child').
by(values('name')).
by(properties('id', 'name', 'age').
group().by(__.key()).
by(__.value()))
)
So that you get a result like this:
{"data": [
{
"parent": "ripple",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
},
{
"parent": "lop",
"child": {
"id": 5709,
"name": "peter",
"age": 35
}
},
{
"parent": "lop",
"child": {
"id": 5713,
"name": "marko",
"age": 29
}
},
{
"parent": "lop",
"child": {
"id": 5717,
"name": "josh",
"age": 32
}
}
]
}
Which would allow you to create a lookup where you concat all results for "lop" and "ripple" into arrays.
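The lookup step described above can be sketched as follows. It is a minimal sketch that assumes the rows arrive as a list of `parent`/`child` dicts shaped like the result shown; the helper names `group_children_by_parent` and `load_many` are hypothetical. DataLoader-style batch functions are expected to return one entry per requested key, in the same order as the keys, which is why missing keys map to an empty list.

```python
from collections import defaultdict


def group_children_by_parent(rows):
    """Collect every child row under its parent key."""
    lookup = defaultdict(list)
    for row in rows:
        lookup[row["parent"]].append(row["child"])
    return dict(lookup)


def load_many(keys, lookup):
    """Return one list of children per requested key, in key order."""
    return [lookup.get(key, []) for key in keys]


rows = [
    {"parent": "ripple", "child": {"id": 5717, "name": "josh", "age": 32}},
    {"parent": "lop", "child": {"id": 5709, "name": "peter", "age": 35}},
    {"parent": "lop", "child": {"id": 5713, "name": "marko", "age": 29}},
    {"parent": "lop", "child": {"id": 5717, "name": "josh", "age": 32}},
]
lookup = group_children_by_parent(rows)
# "gremlin" is a key with no rows, to show the empty-list fallback.
batched = load_many(["lop", "ripple", "gremlin"], lookup)
```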

graphQL filter array containing ALL

I am quite new to GraphQL, and after searching the whole afternoon, I haven't found an answer to a relatively simple problem.
I have two objects in my Strapi backend:
"travels": [
{
"id": "1",
"title": "Bolivia: La Paz y Salar de Uyuni",
"travel_types": [
{
"name": "Culturales"
},
{
"name": "Aventura"
},
{
"name": "Ecoturismo"
}
]
},
{
"id": "2",
"title": "Europa clásica 2020",
"travel_types": [
{
"name": "Clasicas"
},
{
"name": "Culturales"
}
]
}
]
I am trying to build a filter that returns the travels containing ALL of the user-selected travel_types.
I wrote a query like this:
query($where: JSON){
travels (where:$where) {
id # Or _id if you are using MongoDB
title
travel_types {name}
}
}
And these are the parameters I pass in for testing:
{
"where":{
"travel_types.name_contains": ["Aventura"],
"travel_types.name_contains": ["Clasicas"]
}
}
This should return an empty array, because none of the travels has both the Aventura and Clasicas travel types.
But instead it returns the travel with id=2. It seems that only the second filter is applied.
I searched for a query that would behave like Array.every() in JavaScript, but I wasn't able to find one.
Does anyone have an idea how to achieve this type of filtering?
Thank you very much.
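One possible explanation for the observed behaviour is that both filters use the same key, and many JSON parsers keep only the last value when a key is repeated. The sketch below shows this with Python's `json` module; whether the Strapi backend parses the variables the same way is an assumption. It also shows the "contains all" semantics as a client-side check, with `has_all_types` being a hypothetical helper rather than anything provided by Strapi or GraphQL.

```python
import json

raw = """
{
  "where": {
    "travel_types.name_contains": ["Aventura"],
    "travel_types.name_contains": ["Clasicas"]
  }
}
"""
# Python's json module keeps the last value for a repeated key,
# so only the "Clasicas" filter survives parsing.
parsed = json.loads(raw)


def has_all_types(travel, wanted):
    """True if the travel has every wanted type (like Array.every())."""
    names = {t["name"] for t in travel["travel_types"]}
    return set(wanted) <= names


travels = [
    {
        "id": "1",
        "title": "Bolivia: La Paz y Salar de Uyuni",
        "travel_types": [
            {"name": "Culturales"},
            {"name": "Aventura"},
            {"name": "Ecoturismo"},
        ],
    },
    {
        "id": "2",
        "title": "Europa clásica 2020",
        "travel_types": [{"name": "Clasicas"}, {"name": "Culturales"}],
    },
]
matches = [t["id"] for t in travels if has_all_types(t, ["Aventura", "Clasicas"])]
```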

Receive JSON data from Neo4j to Spring Boot API

I'm trying to send data from Neo4j to Spring Boot, and I want to receive JSON like this:
{
"name": "Alex Statham",
"people": [
{
"name": "Jason Statham",
"people": [
{
"name": "Lyna Statham"
},
{
"name": "John Statham"
}
]
},
{
"name": "Will Statham",
"people": [
{
"name": "Michael Statham"
}
]
}
]
}
I have tried many queries, and this one may be the best: it returns a correct family tree in all cases. However, it only returns nodes without their relationships, so I can't pass that data to the library I use:
MATCH (p1:Person {maBN:3})
CALL apoc.path.subgraphNodes(p1, {
sequence: '>Person,FCHILD,>Person,FCHILD,>Person',
maxLevel: 6
}) YIELD node
RETURN node, p1
Here is that library: https://www.npmjs.com/package/react-tree-graph
What should I do? I don't know too much about Cypher and Neo4j.
You could use apoc.convert.toTree which takes a list of paths and creates a structure like yours.
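To illustrate what "a list of paths turned into a tree" means, here is a minimal Python sketch that merges root-to-descendant paths into one nested structure. It is not how APOC is implemented: `paths_to_tree` is a hypothetical helper, paths are simplified to lists of names, all paths are assumed to start at the same root, and the key names in the real output of apoc.convert.toTree may differ from the `people` key used here.

```python
def paths_to_tree(paths, child_key="people"):
    """Merge paths (lists of names sharing one root) into a nested dict."""
    root = None
    for path in paths:
        if not path:
            continue  # ignore empty paths
        if root is None:
            root = {"name": path[0]}
        node = root
        for name in path[1:]:
            children = node.setdefault(child_key, [])
            # Reuse the child if an earlier path already created it.
            match = next((c for c in children if c["name"] == name), None)
            if match is None:
                match = {"name": name}
                children.append(match)
            node = match
    return root


paths = [
    ["Alex Statham", "Jason Statham", "Lyna Statham"],
    ["Alex Statham", "Jason Statham", "John Statham"],
    ["Alex Statham", "Will Statham", "Michael Statham"],
]
tree = paths_to_tree(paths)
```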

Elasticsearch to return documents based on 2 criteria where one is based on the other

I have documents in the following format:
{
"id": number,
"chefId": number,
"name": String,
"ingredients": List<String>,
"isSpecial": boolean
}
Here is a list of 5 documents:
{
"id": 1,
"chefId": 1,
"name": "Roasted Potatoes",
"ingredients": ["Potato", "Onion", "Oil", "Salt"],
"isSpecial": false
},
{
"id": 2,
"chefId": 1,
"name": "Dauphinoise potatoes",
"ingredients": ["Potato", "Garlic", "Cream", "Salt"],
"isSpecial": true
},
{
"id": 3,
"chefId": 2,
"name": "Boiled Potatoes",
"ingredients": ["Potato", "Salt"],
"isSpecial": true
},
{
"id": 4,
"chefId": 3,
"name": "Mashed Potatoes",
"ingredients": ["Potato", "Butter", "Milk"],
"isSpecial": false
},
{
"id": 5,
"chefId": 4,
"name": "Hash Browns",
"ingredients": ["Potato", "Onion", "Egg"],
"isSpecial": false
}
I will be doing a search where "Potatoes" is contained in the name field. Like this:
{
"query": {
"wildcard": {
"name": {
"value": "*Potatoes*"
}
}
}
}
But I also want to add some extra criteria when returning documents:
If the ingredients contain onion or milk, then return the documents. So documents with the id 1 and 4 will be returned. Note that this means that we have documents returned where chef ids are 1 and 3.
Then, for the documents where we haven't already got another document with the same chef id, return where the isSpecial flag is set to true. So only document 3 will be returned. 2 wouldn't be returned as we already have a document where the chef id is equal to one.
Is it possible to do this kind of chaining in Elasticsearch? I would like to be able to do this in a single query so that I can avoid adding logic to my (Java) code.
You can't express that sort of logic in a single Elasticsearch query. You could write a tricky query with aggregations, post_filter and so on to fetch all the data you need in one request, and then transform it in your Java application.
But the best (and most maintainable) approach is to use two queries.
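The two-step logic can be sketched in plain Python over the five sample documents. This only models the chaining; in practice each pass would be its own Elasticsearch query, and the second could exclude the chef ids collected from the first (for example with a bool query using must_not). The function names are hypothetical, and the case-insensitive name match is an assumption based on document 2 being treated as a match in the question.

```python
def name_matches(doc, term="potatoes"):
    """Case-insensitive 'contains' check on the name field."""
    return term in doc["name"].lower()


def first_pass(docs):
    """Documents whose ingredients contain Onion or Milk."""
    wanted = {"Onion", "Milk"}
    return [d for d in docs if name_matches(d) and wanted & set(d["ingredients"])]


def second_pass(docs, seen_chefs):
    """Special documents from chefs not already returned by the first pass."""
    return [
        d for d in docs
        if name_matches(d) and d["isSpecial"] and d["chefId"] not in seen_chefs
    ]


docs = [
    {"id": 1, "chefId": 1, "name": "Roasted Potatoes",
     "ingredients": ["Potato", "Onion", "Oil", "Salt"], "isSpecial": False},
    {"id": 2, "chefId": 1, "name": "Dauphinoise potatoes",
     "ingredients": ["Potato", "Garlic", "Cream", "Salt"], "isSpecial": True},
    {"id": 3, "chefId": 2, "name": "Boiled Potatoes",
     "ingredients": ["Potato", "Salt"], "isSpecial": True},
    {"id": 4, "chefId": 3, "name": "Mashed Potatoes",
     "ingredients": ["Potato", "Butter", "Milk"], "isSpecial": False},
    {"id": 5, "chefId": 4, "name": "Hash Browns",
     "ingredients": ["Potato", "Onion", "Egg"], "isSpecial": False},
]
first = first_pass(docs)
seen_chefs = {d["chefId"] for d in first}
second = second_pass(docs, seen_chefs)
```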

logstash filter: how to get the designated fields from log data

The log looks like this:
{
"playerId": 2,
"args": {
"uid": 2024657127,
"__route__": "userCenter.playerHandler.getOnLineUids"
},
"time": "03122053",
"timeUsed": 8,
"resp": {
"code": 200,
"uidState": {
"imId": 2024657127,
"uid": 0,
"state": 0
}
}
}
I just need "__route__" and "timeUsed", so I use this filter:
filter {
if "__route__" in [message] {
json {
source => "message"
remove_field => ["args.uid", "playerId", "time", "resp"]
}
}}
In the result shown in Kibana, the field "args.uid" is still there. How can I delete a field like this? Or is there a better way to get only "__route__" and "timeUsed"?
Just replace args.uid with [args][uid] and it should work, because in Logstash every subfield is accessed using the [parent][child] notation.
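The difference between the two spellings can be modelled in plain Python. This is a sketch of the behaviour the answer describes, not Logstash code: `parse_field_reference` and `remove_field` are hypothetical helpers, and treating a dotted name as a single literal key is the assumption being illustrated.

```python
import re


def parse_field_reference(ref):
    """Split '[args][uid]' into ['args', 'uid']; a bare name stays one key."""
    parts = re.findall(r"\[([^\[\]]+)\]", ref)
    return parts if parts else [ref]


def remove_field(event, ref):
    """Remove the referenced field; return True if something was removed."""
    path = parse_field_reference(ref)
    node = event
    for key in path[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return False
    if path[-1] in node:
        del node[path[-1]]
        return True
    return False


event = {
    "playerId": 2,
    "args": {
        "uid": 2024657127,
        "__route__": "userCenter.playerHandler.getOnLineUids",
    },
    "timeUsed": 8,
}
# The dotted name is looked up as one top-level key, so nothing matches.
removed_dotted = remove_field(event, "args.uid")
# The bracket form walks into "args" and removes "uid".
removed_nested = remove_field(event, "[args][uid]")
```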
