We have a complex object with nested fields whose names can be dynamic and can contain dots. When I try to ingest the data into Elasticsearch, it gives me the following error:
Object mapping for [x] tried to parse field [x.y] as object, but found a concrete value
One record can have key/values like a.b.c: 4 while another record can have a.b: 3. We have no control over the source of the incoming data, so the only option is to transform the object in Logstash. Here is an example of an incoming object:
{
  "result": "https://www.yahoo.com",
  "tags": {
    "url": "https://www.yahoo.com",
    "projectName": "monitor",
    "host": "ttt",
    "dd": 12345,
    "vv": "kk"
  },
  "timestamp": 1586599441000,
  "runId": 12345,
  "performance": {
    "x.y.z": 31307
  },
  "channel": "clientperf",
  "asset": {
    "a.b.c": 5,
    "a.b": 4
  }
}
As you can see, the keys inside asset and performance contain dots. The fields at the root (like runId, performance, etc.) are fine. How can I resolve this, either by replacing the dots in Logstash or by any other approach that avoids the error? I'm aware of the de_dot plugin, but to use it we have to name the nested fields explicitly, and we cannot enforce the naming of the incoming records. I also know this can probably be achieved with the ruby plugin, but I have zero knowledge of Ruby. Any help would be appreciated.
You could use Hash#deep_transform_keys from ActiveSupport, or define an equivalent yourself:
class Hash
  # Recursively apply the block to every key, descending into nested hashes
  def deep_transform_keys(&block)
    result = {}
    each do |key, value|
      result[yield(key)] = value.is_a?(Hash) ? value.deep_transform_keys(&block) : value
    end
    result
  end
end

puts hash.deep_transform_keys { |key| key.to_s.gsub(".", "") }
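In a Logstash pipeline, this logic could live inside a ruby filter. Below is a minimal, untested sketch that replaces dots in nested keys with underscores; the underscore replacement and the helper name dedot are my own choices, not from the answer above. It leaves root-level field names untouched, since the question says those are fine.

filter {
  ruby {
    # init runs once at startup, so the helper is defined a single time
    init => '
      # Recursively replace dots in hash keys with underscores
      def dedot(obj)
        case obj
        when Hash
          obj.each_with_object({}) { |(k, v), h| h[k.to_s.gsub(".", "_")] = dedot(v) }
        when Array
          obj.map { |v| dedot(v) }
        else
          obj
        end
      end
    '
    code => '
      # Only rewrite values that are nested hashes; root keys are safe per the question
      event.to_hash.each do |key, value|
        event.set(key, dedot(value)) if value.is_a?(Hash)
      end
    '
  }
}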
Currently, I have a JSON array of objects that share the same fields. What I want is to split this data into independent fields, where each field name is taken from the object's "name" field.
events.parameters (this is the field name of the JSON array)
[
  {
    "name": "USER_EMAIL",
    "value": "dummy@yahoo.com"
  },
  {
    "name": "DEVICE_ID",
    "value": "Wdk39Iw-akOsiwkaALw"
  },
  {
    "name": "SERIAL_NUMBER",
    "value": "9KJUIHG"
  }
]
expected output:
events.parameters.USER_EMAIL : dummy@yahoo.com
events.parameters.DEVICE_ID: Wdk39Iw-akOsiwkaALw
events.parameters.SERIAL_NUMBER : 9KJUIHG
Thanks.
TL;DR:
There is no filter that does exactly what you are looking for.
You will have to use the ruby filter
I just fixed the problem. For everyone wondering, here's my ruby filter:
if [events][parameters] {
  ruby {
    code => '
      event.get("[events][parameters]").each { |a|
        name = a["name"]
        value = a["value"]
        event.set("[events][parameters_split][#{name}]", value)
      }
    '
  }
}
The output was just what I wanted.
Cheers!
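One note on the script above: it writes into [events][parameters_split] rather than back into [events][parameters], which avoids modifying the array while it is still being iterated. If the original path is required, the split field could then be moved with a pair of mutate filters (my own suggestion, untested; two separate blocks so the removal is guaranteed to run before the rename):

mutate { remove_field => [ "[events][parameters]" ] }
mutate { rename => { "[events][parameters_split]" => "[events][parameters]" } }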
I am confused by this behavior that I'm seeing with Absinthe.
For a top-level field, e.g.
field :projects, list_of(:project) do
  arg :user_id, :string
  resolve(&ProjectResolver.list_projects/2)
end
If ProjectResolver.list_projects/2 returns {:ok, []}, then the JSON result will correctly be
{
  "data": {
    "projects": []
  }
}
However, for a subfield, e.g. the tags field in
object :task do
  field :id, :string
  # ... Other fields
  field :tags, list_of(:tag) do
    resolve(&TaskResolver.list_tags/3)
  end
  # ... Other subfields
end
If TaskResolver.list_tags/3 returns {:ok, []}, I get
{
  "data": {
    "task": {
      "id": "ba156cde-8c5f-4806-b161-62071b0098b3",
      "tags": [
        null
      ]
    }
  }
}
instead of
{
  "data": {
    "task": {
      "id": "ba156cde-8c5f-4806-b161-62071b0098b3",
      "tags": []
    }
  }
}
which I think should be the reasonable response.
Now the non-empty array containing a single null is causing headaches for me on the frontend (Apollo), and I'm not sure there's any way to easily work around it. It would be ideal if the returned data were an empty array in the first place, and I don't see why it's not.
Immediately after posting this question I realized that my resolver might not have been returning {:ok, []} after all... Indeed, it was returning {:ok, [nil]} because the Ecto query was wrong (:left_join instead of :join); that's why the returned JSON contains [null]. I just needed to fix my resolver function to actually return {:ok, []} in this case. I guess writing about an issue really does help clear your thoughts on it.
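For anyone hitting the same symptom, a minimal sketch of the difference (the Task/tags schema and Repo module here are hypothetical, not from the original post): with a left join, a task that has no tags still yields one row whose tag side is nil, so the resolver returns {:ok, [nil]}; an inner join yields no rows at all, giving {:ok, []}.

import Ecto.Query

# Buggy version: the left join emits one row with a nil tag for tag-less tasks
from(t in Task,
  left_join: tag in assoc(t, :tags),
  where: t.id == ^task_id,
  select: tag
)

# Fixed version: an inner join emits no rows, so Repo.all/1 returns []
from(t in Task,
  join: tag in assoc(t, :tags),
  where: t.id == ^task_id,
  select: tag
)
|> Repo.all()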
I am ingesting CSV data into Elasticsearch using the append processor. I already have two fields that are objects (object1 and object2), and I want to append them both into an array on a different field (mainlist), so it would come out as mainlist: [{object1}, {object2}]. I have tried the set processor with the copy_from parameter, and I get an error that I am missing the required property "value", even though the Elasticsearch documentation clearly doesn't use the "value" property when it uses copy_from:

{ "set": { "field": "mainlist", "copy_from": ["object1", "object2"] } }

My syntax is copied almost exactly from the documentation. Please help.
Furthermore, I need to drop empty fields at the ingest level so they are not returned; I don't want "fieldname": "" returned to the user. What is the best way to do that? I am new to Elasticsearch and it has not been going well.
As to dropping the empty fields at the ingest level, set up an ingest pipeline:
PUT _ingest/pipeline/no_empty_fields
{
  "description": "Removes empty-ish fields from a doc",
  "processors": [
    {
      "script": {
        "source": """
          def keys_to_remove = ctx.keySet()
            .stream()
            .filter(field -> ctx[field] == null || ctx[field] == "")
            .collect(Collectors.toList());
          for (key in keys_to_remove) {
            ctx.remove(key);
          }
        """
      }
    }
  ]
}
and apply it upon indexing
POST myindex/_doc?pipeline=no_empty_fields
{
  "fieldname23": 123,
  "fieldname": null,
  "fieldname123": ""
}
You can of course extend the conditions to ditch other fields such as "undefined", "Infinity" and others.
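As for the copy_from error in the first half of the question: as far as I can tell, the set processor's copy_from option expects a single source field name, not an array, which would explain the rejected request. One way to build the combined array is a script processor instead; a sketch, under the assumption that object1 and object2 are always present on the document:

{
  "script": {
    "description": "Combine object1 and object2 into the mainlist array",
    "source": "ctx.mainlist = [ctx.object1, ctx.object2];"
  }
}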
Following the Elasticsearch example in this article for a nested query, I noticed that it assumes the nested objects are inside an ARRAY and that queries are based on some object PROPERTY:
{
  "nested_objects": [                  <== array
    { "name": "x", "value": 123 },
    { "name": "y", "value": 456 }      <== "name" property searchable
  ]
}
But what if I want nested objects to be arranged in key-value structure that gets updated with new objects, and I want to search by the KEY? example:
{
  "nested_objects": {                  <== key-value, not an array
    "x": { "value": 123 },
    "y": { "value": 456 },             <== how can I search by the "x" and "y" keys?
    ...                                <== more arbitrary keys are added now and then
  }
}
Thank you!
You can try to do this using the query_string query, like this:
GET my_index/_search
{
  "query": {
    "query_string": {
      "query": "nested_objects.\\*.value:123"
    }
  }
}
It will try to match the value field of any sub-field of nested_objects.
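If the goal is only to check whether a given key is present on a document, an exists query on a concrete leaf field under that key should also work; a sketch, reusing the same my_index assumption:

GET my_index/_search
{
  "query": {
    "exists": {
      "field": "nested_objects.x.value"
    }
  }
}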
Ok, so my final solution after some ES insights is as follows:
1. The fact that my object keys "x", "y", ... are arbitrary causes a mess in my index mapping. Generally speaking, it's not good ES practice to plan this kind of structure... So for the sake of the mappings, I resorted to the structure described in the "Weighted tags" article:
{ "name":"x", "value":123 },
{ "name":"y", "value":456 },
...
This means that when it's time to update the value of the sub-object named "x", I have a harder (and slower) time finding it: I first need to fetch the entire top-level object, traverse the sub-objects until I find the one named "x", update its value, and then write the entire sub-object array back to ES.
The above approach also causes concurrency issues when multiple processes update the same index. ES has optimistic locking I can use to retry when needed, or I can queue updates and handle them serially.
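The find-and-write-back step can also be pushed down to Elasticsearch with a scripted update, so only the matching sub-object is modified server-side. A sketch (my_index and the document id 1 are placeholders), where retry_on_conflict gives the optimistic-locking retry mentioned above:

POST my_index/_update/1?retry_on_conflict=3
{
  "script": {
    "source": "for (item in ctx._source.nested_objects) { if (item.name == params.name) { item.value = params.value; } }",
    "params": { "name": "x", "value": 999 }
  }
}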
Given the JSON response:
{
  "tags": [
    {
      "id": 81499,
      "name": "sign-in"
    },
    {
      "id": 81500,
      "name": "user"
    },
    {
      "id": 81501,
      "name": "authentication"
    }
  ]
}
Using RSpec 2, I want to verify that this response contains the tag with the name authentication. Being fairly new to Ruby, I figured there is a more efficient way than iterating the array and checking each value of name using include? or map/collect. I could simply use a regex to check for /authentication/i, but that doesn't seem like the best approach either.
This is my spec so far:
it "allows filtering" do
response = #client.story(15404)
#response.tags.
end
So, if
t = JSON.parse '{ ... }'
Then this expression will either return nil, which is false, or it will return the thing it detected, which has a boolean evaluation of true.
t['tags'].detect { |e| e['name'] == 'authentication' }
This will raise NoMethodError if there is no tags key. I think that's handled just fine in a test, but you can arrange for that case to also show up as false (i.e., nil) with:
t['tags'].to_a.detect { |e| e['name'] == 'authentication' }
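Putting it together in the spec, using RSpec 2's should syntax (a sketch; it assumes @client.story returns the raw JSON string):

require 'json'

it "allows filtering" do
  response = @client.story(15404)
  t = JSON.parse(response)
  # detect returns the matching tag hash, or nil if none matches
  tag = t['tags'].to_a.detect { |e| e['name'] == 'authentication' }
  tag.should_not be_nil
end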