How can I add to an existing object using JSONata?

I need to be able to add an element to an arbitrarily complex object using JSONata.
I don't know all the elements in the object in advance.
For example, let's say I want to add
"newElement": { "a": 1, "b": 2 }
To an object that looks like:
{ "xx": "An", "yy": "Example", "zz": 1 }
But it might have any number or mix of other elements.
I can replace the whole object but I can't work out how to add to it.

Starting with JSONata 1.3, you can use the $merge function to do this.
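For example, a minimal sketch against the question's input (at the top level, $ refers to the input object itself):
$merge([$, { "newElement": { "a": 1, "b": 2 } }])
which evaluates to:
{
  "xx": "An",
  "yy": "Example",
  "zz": 1,
  "newElement": { "a": 1, "b": 2 }
}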

Here is one technique that I have used to merge two objects...
Split all the object keys/values into an array of pairs and build a new object:
$zip(*.$keys(), *.*) {
  $[0]: $[1]
}
Note that this requires a single input object which contains the old and new objects in separate fields. In fact, since the $keys() function can operate on an array of objects, you are not limited to two objects; the input could even be an array of objects rather than separate fields. Your mileage may vary.
{
  "newObject": { "a": 1, "b": 2 },
  "oldObject": { "xx": "An", "yy": "Example", "zz": 1, "b": 3 }
}
Also, the order of the two objects matters, since the first occurrence of a key takes precedence. For instance, if newObject comes first and both objects contain the field "b", the output takes the value of "b" from the first object (effectively overwriting the oldObject value for "b"). So the merged output object is:
{
  "a": 1,
  "b": 2,
  "xx": "An",
  "yy": "Example",
  "zz": 1
}

Related

Can you represent an object that can have arbitrary fields in proto3?

Consider the following JSON representation of an object:
{
  "format": "0.0.1",
  "has_impl": true,
  "mtv_1b": 1,
  "mtv_1c": "h",
  "ktc_12": true,
  "ktc_zz": true
}
The first two fields, format and has_impl, are known. In addition, the object may have an arbitrary number of mtv_XX- and ktc_XX-like fields.
Is such an object representable in proto3, and how would you go about it?
The following could be an obvious starting point. Is there a combination of oneof and well-known types that could be used here?
message MyObject {
  string format = 1;
  bool has_impl = 2;
  // Is there anything that can go in here?
  ....
}
Not directly. The closest you can get is a Struct (which is a map<string, Value>, where Value is a oneof over common types), using struct.proto. Not quite the same, but it allows for the same ideas.
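A minimal sketch of what that could look like (the extra field name is just illustrative):
syntax = "proto3";

import "google/protobuf/struct.proto";

message MyObject {
  string format = 1;
  bool has_impl = 2;
  // The arbitrary mtv_XX / ktc_XX style fields go here as a map<string, Value>.
  google.protobuf.Struct extra = 3;
}
You lose static typing for those fields, but arbitrary JSON-like values can still round-trip through the Struct.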

Is it possible to transform JSON data in an Elasticsearch Painless script, and perform further operations on it?

We have a large corpus of JSON-formatted documents to search through to find patterns and historical trends. Elasticsearch seems like the perfect fit for this problem. The first trick is that the documents are collections of tens of thousands of "nested" documents (with a header). The second trick is that these nested documents represent data with varying types.
In order to accommodate this, all the value fields have been "encoded" as an array of strings, so a single integer value has been stored in the JSON as "[\"1\"]", and a table of floats is flattened to "[\"123.45\",\"678.9\",...]" and so on. (We also have arrays of strings, which don't need converting.) While this is awkward, I would have thought this would be a good compromise, given the way everything else involved in Elasticsearch seems to work.
The particular problem here is that these stored data values might represent a bitfield, from which we may need to inspect the state of one bit. Since this field will have been stored as a single-element string array, like "[\"14657\"]", we need to convert that to a single integer, and then bit-shift it multiple times to reach the desired bit (or apply a mask, if such a function is available).
With Elasticsearch, I see that I can embed "Painless" scripts, but examples vary, and I haven't been able to find one that shows how I can convert the arbitrary-length string-array data field to appropriate types for further comparison. Here's my query script as it stands.
{
  "_source": false,
  "from": 0, "size": 10,
  "query": {
    "nested": {
      "path": "Variables",
      "query": {
        "bool": {
          "must": {
            "match": { "Variables.Designation": "Big_Long_Variable_Name" }
          },
          "must_not": {
            "match": { "Variables.Data": "[0]" }
          },
          "filter": {
            "script": {
              "script": {
                "source": "def vals = doc['Variables.Data']; return vals[0] != params.setting;",
                "params": {
                  "setting": 3
                }
              }
            }
          }
        }
      },
      "inner_hits": {
        "_source": "Variables.Data"
      }
    }
  }
}
I need to somehow transform the vals variable to an array of ints, pick off the first value, do some bit operations, and make a comparison to return true or false. In this example, I'm hoping to be able to set "setting" equal to the bit position I want to check for on/off.
I've already been through the exercise with Elasticsearch in finding out that I needed to make my Variables.Data field a keyword so I could search on specific values in it. I realize that this is getting away from the intent of Elasticsearch, but I still think this might be the best solution, for other reasons. I created a new index, and reimported my test documents, and the index size went up about 30%. That's a compromise I'm willing to make, if I can get this to work.
What tools do I have in Painless to make this work? (Or, am I crazy to try to do this with this tool?)
I would suggest that you encode your data in Elasticsearch-provided types wherever possible (and even when it seems inconvenient) to make the most out of Painless. For instance, for the bit strings, you could encode them as an array of 1s and 0s for easier operations with Painless.
Painless, in my opinion, is still primitive. It's hard to debug. It's hard to read. It's hard to maintain. And, it's a horrible idea to have large functions in Painless.
To answer your question, you'd basically need to parse the array string with Painless and get it into one of the available datatypes in order to do the comparison you desire. For example, for the list, you'd use something like the split function, and then manually cast each item in the results to int, float, String, and so on.
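As a rough sketch of that parse-and-test step for the bitfield case (this assumes Variables.Data is mapped as keyword and holds a single-element string array such as "[\"14657\"]"; adjust the doc access to your mapping):
def raw = doc['Variables.Data'].value;    // e.g. "[\"14657\"]"
String digits = raw.replace("[", "").replace("]", "").replace("\"", "");
int value = Integer.parseInt(digits);     // -> 14657
int bit = (int) params.setting;           // which bit to test
return ((value >> bit) & 1) == 1;         // true if that bit is set
Something along these lines could go into the source of the filter script in your query, in place of the plain vals[0] != params.setting comparison.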
Use the execute API to test small bits before adding this to your scripted field:
POST /_scripts/painless/_execute
{
  "script": {
    "source": """
      ArrayList arr = []; // to start with
      // use arr.add(INDEX, VALUE) to add after parsing
    """,
    "params": {
      "foo": 100.0,
      "bar": 1000.0
    }
  }
}
On the other hand, if you save your data in Elasticsearch-provided datatypes (note that Elasticsearch supports saving lists inside documents), then this task would be far easier to do in Painless.
For example, instead of having my_doc.foo = "[\"123.45\",\"678.9\",...]" as a string to be parsed later, why not save it as a native list of floats, like my_doc.foo = [123.45, 678.9, ...]?
This way, you avoid the unnecessary Painless code required to parse the text document.

What types of GraphQL variable are allowed?

GraphQL common practice, especially over HTTP, is that there is a query parameter, an optional operationName and an optional JSON object providing variables.
All examples of the variables parameter I've seen are simple JSON objects:
{
  "var1": "value1",
  "var2": 2
}
Each key has a simple JSON type value, such as a numeric or a string.
Can GraphQL variable values be more expansive than this, with deeper mixes of arrays and objects?
{
  "var1": "value1",
  "var2": {
    "name": "My Name",
    "numbers": [1, 5, 11, 10]
  }
}
Absolutely; however, you're going to want to make sure you have a matching input type in your schema. To accept an input with a structure like the one in var2, you'd need something like this:
input VarInput {
  name: String
  numbers: [Int]
}
Then, for the type definition of your query/mutation...
type Query {
  foo(var1: String, var2: VarInput): Foo
}
Your resolver can then fetch var2 like any other argument.
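For completeness, the operation itself then declares and passes the variables as usual (the selection on Foo is just a placeholder):
query GetFoo($var1: String, $var2: VarInput) {
  foo(var1: $var1, var2: $var2) {
    id
  }
}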

How is the memory allocation done for ElasticSearch types in an index?

I was reading the Elasticsearch documentation and found an interesting line in "Index vs. Type": under the heading "What is a type?", the second point says:
Fields that exist in one type will also consume resources for documents of types where this field does not exist.
I am not able to understand what it actually means. Does it mean to say that if I create two types:
Type 1: [a: string, b: text, c: keyword]
Type 2: [c: keyword, d: string]
then even if I am storing a document of type 2, Elasticsearch will take space for all 5 fields? I don't think that should be the case, but that is how it reads in the documentation.
Elasticsearch is built on top of Lucene, which does not have the concept of a "type". With Lucene, you just have an index and you fill it with documents. A type is an abstraction that only exists at the Elasticsearch layer.
So when you write a document for type 1, it is like writing this document in Lucene:
{
  "a": "Foo",
  "b": "Bar",
  "c": "Foobar",
  "d": null
}
or, when writing to type 2:
{
  "a": null,
  "b": null,
  "c": "Foobar",
  "d": "Foobazz"
}
Even though writing a document for one type leaves the other type's fields blank, those empty fields can still consume resources in Lucene. For example, both norms and doc_values are still computed on empty fields (assuming they are enabled, which they are by default, depending on the field type).
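If you do not need scoring or aggregations on some of those fields, the corresponding structures can be switched off in the mapping, which reduces that overhead. A sketch (using the modern single-type mapping format and the field names from the example above; the index name and field types are illustrative):
PUT my_index
{
  "mappings": {
    "properties": {
      "a": { "type": "keyword", "doc_values": false },
      "b": { "type": "text", "norms": false }
    }
  }
}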
Also worth reading: https://www.elastic.co/blog/great-mapping-refactoring

Use capitalized hash keys in a map?

I'm trying to create some sample data:
1.upto(19).map {|n| { Keyword: "some-term-#{n}", TotalSearches: n } }
But the data that comes back has slightly different hash keys:
[{:Keyword=>"some-term-1", :TotalSearches=>1}, ...
How can I force it to use the hash keys I specified, like this?
[{"Keyword"=>"some-term-1", "TotalSearches"=>1}, ...
If I put quotes around the hash keys:
1.upto(19).map {|n| { "Keyword": "some-term-#{n}", "TotalSearches": n } }
I get an error.
There are two notations for Ruby hashes.
The first is the traditional notation:
{ :key => "value" }
{ "key" => "value" }
The new notation looks more like JavaScript and others:
{ key: "value" }
This is equivalent to the traditional notation { :key => "value" }, which is how it shows up in inspect mode, so don't worry if the presentation changes; it's actually the same thing. The new notation forces the use of symbol keys.
If you want to use string keys you need to use the traditional notation.
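Applied to the snippet from the question, that gives exactly the string-keyed hashes you're after:
1.upto(19).map { |n| { "Keyword" => "some-term-#{n}", "TotalSearches" => n } }
#=> [{"Keyword"=>"some-term-1", "TotalSearches"=>1}, ...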
It's worth noting you can mix and match in the same definition:
{ key: "value", "Key" => "value" }
Ruby encourages the use of symbol keys in cases where the keys are predictable and used frequently. It discourages symbol keys when you're dealing with arbitrary user data such as parameters from forms. In that case strings are a better plan.
