Use capitalized hash keys in a map? - ruby

I'm trying to create some sample data:
1.upto(19).map {|n| { Keyword: "some-term-#{n}", TotalSearches: n } }
But the data that comes back has slightly different hash keys:
[{:Keyword=>"some-term-1", :TotalSearches=>1}, ...
How can I force it to use the hash keys I specified, like this?
[{"Keyword"=>"some-term-1", "TotalSearches"=>1}, ...
If I put quotes around the hash keys:
1.upto(19).map {|n| { "Keyword": "some-term-#{n}", "TotalSearches": n } }
I get an error.

There are two notations for Ruby hashes.
The first is the traditional notation:
{ :key => "value" }
{ "key" => "value" }
The new notation looks more like JavaScript and others:
{ key: "value" }
This is equivalent to the traditional notation { :key => "value" }, which is how it shows up in inspect mode, so don't worry if the presentation changes; it's the same thing. The new notation always produces symbol keys.
If you want to use string keys you need to use the traditional notation.
It's worth noting you can mix and match in the same definition:
{ key: "value", "Key" => "value" }
Ruby encourages the use of symbol keys in cases where the keys are predictable and used frequently. It discourages symbol keys when you're dealing with arbitrary user data such as parameters from forms. In that case strings are a better plan.
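Applied to the question's sample data, the traditional hash-rocket notation produces the string keys the asker wanted:

```ruby
# Build the sample data with string keys via the traditional => notation.
data = 1.upto(19).map { |n| { "Keyword" => "some-term-#{n}", "TotalSearches" => n } }

p data.first
# {"Keyword"=>"some-term-1", "TotalSearches"=>1}
```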

Related

Can you represent an object that can have arbitrary fields in proto3?

Consider the following json representation of an object
{
"format": "0.0.1",
"has_impl": true,
"mtv_1b": 1,
"mtv_1c": "h",
"ktc_12": true,
"ktc_zz": true,
}
The first two format and has_impl fields are known. In addition, the object may have arbitrary number of mtv_XX and ktc_XX like fields.
Is such an object representable in proto3, and how would you go about it?
The following could be an obvious starting point. Is there a combination of oneof and well-known types that could be used here?
message MyObject {
  string format = 1;  // note: proto3 field numbers must start at 1
  bool has_impl = 2;
  // Is there anything that can go in here ?
  ...
}
Not directly. The closest you can do would be to have a Struct (which is a map<string, Value>, where Value is a oneof over common types), using struct.proto. Not quite the same, but allows the same ideas.
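A sketch of the Struct approach the answer describes (the field name extra_fields and the field numbers are illustrative assumptions, not part of the question):

```proto
syntax = "proto3";

import "google/protobuf/struct.proto";

message MyObject {
  string format = 1;
  bool has_impl = 2;

  // Catch-all for the arbitrary mtv_XX / ktc_XX fields.
  // Struct is effectively a map<string, Value>, where Value is a
  // oneof over null, number, string, bool, Struct, and ListValue.
  google.protobuf.Struct extra_fields = 3;
}
```

In the JSON mapping, a Struct field serializes as a plain JSON object, so the wire representation stays close to the original document.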

Is it possible to transform JSON data in an Elasticsearch Painless script, and perform further operations on it?

We have a large corpus of JSON-formatted documents to search through to find patterns and historical trends. Elasticsearch seems like the perfect fit for this problem. The first trick is that the documents are collections of tens of thousands of "nested" documents (with a header). The second trick is that these nested documents represent data with varying types.
In order to accommodate this, all the value fields have been "encoded" as an array of strings, so a single integer value has been stored in the JSON as "[\"1\"]", and a table of floats is flattened to "[\"123.45\",\"678.9\",...]" and so on. (We also have arrays of strings, which don't need converting.) While this is awkward, I would have thought this would be a good compromise, given the way everything else involved in Elasticsearch seems to work.
The particular problem here is that these stored data values might represent a bitfield, from which we may need to inspect the state of one bit. Since this field will have been stored as a single-element string array, like "[\"14657\"]", we need to convert that to a single integer, and then bit-shift it multiple times to the desired bit (or apply a mask, if such a function is available).
With Elasticsearch, I see that I can embed "Painless" scripts, but examples vary, and I haven't been able to find one that shows how I can convert the arbitrary-length string-array data field to appropriate types for further comparison. Here's my query script as it stands.
{
  "_source": false,
  "from": 0, "size": 10,
  "query": {
    "nested": {
      "path": "Variables",
      "query": {
        "bool": {
          "must": {
            "match": { "Variables.Designation": "Big_Long_Variable_Name" }
          },
          "must_not": {
            "match": { "Variables.Data": "[0]" }
          },
          "filter": {
            "script": {
              "script": {
                "source": "def vals = doc['Variables.Data']; return vals[0] != params.setting;",
                "params": {
                  "setting": 3
                }
              }
            }
          }
        }
      },
      "inner_hits": {
        "_source": "Variables.Data"
      }
    }
  }
}
I need to somehow transform the vals variable to an array of ints, pick off the first value, do some bit operations, and make a comparison to return true or false. In this example, I'm hoping to be able to set "setting" equal to the bit position I want to check for on/off.
I've already been through the exercise with Elasticsearch in finding out that I needed to make my Variables.Data field a keyword so I could search on specific values in it. I realize that this is getting away from the intent of Elasticsearch, but I still think this might be the best solution, for other reasons. I created a new index, and reimported my test documents, and the index size went up about 30%. That's a compromise I'm willing to make, if I can get this to work.
What tools do I have in Painless to make this work? (Or, am I crazy to try to do this with this tool?)
I would suggest that you encode your data in Elasticsearch-provided types wherever possible (and even when not) to make the most of Painless. For instance, for the bit strings, you can encode them as an array of 1s and 0s for easier operations in Painless.
Painless, in my opinion, is still primitive. It's hard to debug. It's hard to read. It's hard to maintain. And, it's a horrible idea to have large functions in Painless.
To answer your question, you'd basically need to parse the array string with Painless and get it into one of the available datatypes in order to do the comparison you want. For example, for the list, you'd use something like the split function, and then manually cast each item in the result to int, float, string, etc.
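Painless is Java-like, so for clarity here is the parse-and-bit-test logic sketched in Ruby (the stored value "[\"14657\"]" is taken from the question; in Painless you'd do the equivalent with its whitelisted Java string and integer methods):

```ruby
# A single-element string array, encoded as text (from the question).
raw = '["14657"]'

# Strip brackets and quotes, split on commas, and cast each entry to an integer.
values = raw.delete('[]"').split(',').map(&:to_i)
value  = values.first                 # => 14657

# Test whether a given (zero-based) bit is set, via shift-and-mask.
bit_set = ->(v, bit) { (v >> bit) & 1 == 1 }

bit_set.call(value, 6)                # => true  (14657 = 0b11100101000001)
bit_set.call(value, 3)                # => false
```

The same shift-and-mask comparison is what the script filter's source would return, with the bit position passed in via params.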
Use the execute API to test small bits before adding this to your scripted field:
POST /_scripts/painless/_execute
{
  "script": {
    "source": """
      ArrayList arr = []; // to start with
      // use arr.add(INDEX, VALUE) to add after parsing
    """,
    "params": {
      "foo": 100.0,
      "bar": 1000.0
    }
  }
}
On the other hand, if you save your data in Elasticsearch-provided datatypes (note that Elasticsearch supports saving lists inside documents), this task becomes far easier in Painless.
For example, instead of having my_doc.foo = "[\"123.45\",\"678.9\",...]" as a string to be parsed later, why not save it as a native list of floats, like my_doc.foo = [123.45, 678.9, ...]?
This way, you avoid the unnecessary Painless code required to parse the text.

How can I add to an existing object using JSONata?

I need to be able to add an element to an arbitrarily complex object using JSONata.
I don't know all the elements in the object in advance.
For example, let's say I want to add
"newElement": { "a": 1, "b": 2 }
To an object that looks like:
{ "xx": "An", "yy": "Example", "zz": 1 }
But it might have any number or mix of other elements.
I can replace the whole object but I can't work out how to add to it.
Starting with JSONata 1.3, you can use the $merge function to do this.
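For the question's input, a $merge call could look like this (a sketch; $ is the input object, and newElement's contents are taken from the question):

```jsonata
$merge([$, { "newElement": { "a": 1, "b": 2 } }])
```

$merge takes an array of objects and combines them left to right, with keys from later objects overriding earlier ones, so the existing fields pass through untouched.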
Here is one technique that I have used to merge two objects...
Split all the object keys/values into an array of pairs and build a new object:
$zip(*.$keys(), *.*) {
$[0]: $[1]
}
Note that this requires a single input object which contains the old and new objects in separate fields. (actually, since the $keys() function can operate on an array of objects, you are not limited to only two objects -- in fact, it could be an array of objects instead of separate fields -- your mileage may vary)
{
"newObject": { "a": 1, "b": 2 },
"oldObject": { "xx": "An", "yy": "Example", "zz": 1, "b": 3 }
}
Also, the order of the two objects matters, since the first unique key value will take precedence. For instance, if the newObject is first, and both objects contain field "b", the output value will be used from the first object (effectively overwriting the oldObject value for "b"). So the merged output object is:
{
"a": 1,
"b": 2,
"xx": "An",
"yy": "Example",
"zz": 1
}

Can Puppet differentiate between a scalar and a single-element array as a resource attribute?

Is there a way to make Puppet differentiate between
my_custom_type { 'key':
value => 'blah',
}
and
my_custom_type { 'key':
value => ['blah'],
}
when declaring resource attributes?
This is for a custom type, so I have full Ruby-land control, but both show up to Puppet::Type#set_parameters, and consequently Puppet::Property#should=, as 'blah'.
I'm using Puppet 3.4.3 on top of Ruby 2.0.0 (through Boxen). I'm not sure how easy it would be for me to change either of those versions.
CONTEXT: the custom type I'm implementing edits Apple property lists (.plist files), where a string and an array containing a single string element are quite different.
declaring the property like
newproperty(:value, :array_matching => :all) do
along the lines of
https://docs.puppetlabs.com/guides/custom_types.html#customizing-behaviour
doesn't seem to change what set_parameters or should= receive; it just makes Puppet::Property#should return ['blah'] instead of 'blah' in both cases. It appears the differentiation is tossed out further up, at the parser level.
providing
my_custom_type { 'key':
value => [['blah']],
}
doesn't help either - same result.
PLEASE NOTE:
I realize I can work around this by providing additional information in the declaration, like so:
my_custom_type { 'key':
value => ['blah'],
is_array => true,
}
or
my_custom_type { 'key':
value_array => ['blah'],
}
I'm wondering if there is a way to capture whether an array or a scalar was declared... though feel free to explain why doing so would be unwise or heretical in Puppet-world; I'm a little new to this strange place.
The underlying single-element special-casing was deprecated in Puppet 3 and has not been part of the language's behaviour for a long while. See https://tickets.puppetlabs.com/browse/PUP-1299.

How do I access the value for this nested hash's key in Ruby?

I have the following hash:
{:charge_payable_response=>{:return=>"700", :ns2=>"http://ws.myws.com/"}}
How can I get the value of the key :return, which in this example is 700?
If you have:
h = {:charge_payable_response=>{:return=>"700", :ns2=>"http://ws.myws.com/"}}
Then use:
h[:charge_payable_response][:return]
# => "700"
The colon prefix means that the key in the hash is a symbol, a special sort of unique identifier:
Symbol objects represent names and some strings inside the Ruby interpreter. They are generated using the :name and :"string" literals syntax, and by the various to_sym methods. The same Symbol object will be created for a given name or string for the duration of a program's execution, regardless of the context or meaning of that name. Thus if Fred is a constant in one context, a method in another, and a class in a third, the Symbol :Fred will be the same object in all three contexts.
If:
data = { :charge_payable_response=> { :return=>"700", :ns2=>"http://ws.myws.com/" } }
Then to get the return value use:
data[:charge_payable_response][:return]
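If any intermediate key might be absent, Hash#dig (Ruby 2.3+) returns nil instead of raising NoMethodError partway through the nested lookup:

```ruby
data = { charge_payable_response: { return: "700", ns2: "http://ws.myws.com/" } }

data.dig(:charge_payable_response, :return)  # => "700"
data.dig(:missing_key, :return)              # => nil (no error on the missing key)
```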
I would say it should be:
hash[:charge_payable_response][:return]
Note that although return is a reserved word in Ruby, :return is just a symbol, so using it as a hash key causes no problem.
