I have a YAML file that includes the following:
:common
:substitue
:foo: fee
I read this data like:
data = YAML.load(erb_data[File.basename(__FILE__, '.*')].result(binding))
common = data[:common]
def substitute_if_needed(original_value)
  mapping = common.dig(:substitue, original_value)
  if mapping.nil? ? original_value : mapping
end
Unfortunately, this doesn't do the substitution that I want. I want to call substitute_if_needed('foo') and get 'fee' back. I also want to call substitute_if_needed('bar') and get 'bar' back.
How can I do this?
There are several problems in your code:
The YAML example looks broken. The proper one should look like:
common:
  substitute:
    foo: fee
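You're trying to fetch the common key with common = data[:common], using a symbol as the key, but with the YAML above the keys are loaded as strings, so it should be data["common"]. Also, splitting the fetching logic into two pieces (first extracting "common" outside of substitute_if_needed and then digging into it inside) doesn't work: def opens a new scope, so the local variable common is not visible inside the method. Pass the data in as an argument instead.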
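The if statement is broken. As written, the whole ternary becomes the condition of an if with an empty body, so the expression evaluates to nil. It should be either a proper if/else or a proper ternary operator.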
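A small sketch that reproduces the first two problems, assuming Ruby's bundled YAML (Psych) and the corrected YAML above: plain YAML keys come back as Strings, and a top-level local variable is not visible inside a def.

```ruby
require "yaml"

data = YAML.load(<<~DOC)
  common:
    substitute:
      foo: fee
DOC

# Plain YAML keys load as Strings, so a Symbol lookup misses.
string_lookup = data["common"]
symbol_lookup = data[:common]

common = data["common"]

def substitute_if_needed(original_value)
  # `def` starts a fresh scope: the top-level local `common` is not visible,
  # so Ruby treats `common` as a method call and raises NameError.
  common.dig("substitute", original_value)
end

scope_error =
  begin
    substitute_if_needed("foo")
    nil
  rescue NameError => e
    e
  end

p symbol_lookup          # prints: nil
puts scope_error.class   # prints: NameError
```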
Fixing all this gives us something like the following (I've replaced the file with a StringIO for convenience, so the snippet is executable as is):
require "yaml"
require "stringio"

yaml = StringIO.new(<<~DATA)
  common:
    substitute:
      foo: fee
DATA

def substitute_if_needed(data, original_value)
  mapping = data.dig("common", "substitute", original_value)
  mapping.nil? ? original_value : mapping
end

data = YAML.load(yaml)
substitute_if_needed(data, "foo") # => "fee"
substitute_if_needed(data, "bar") # => "bar"
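If you'd rather keep the one-argument substitute_if_needed(value) signature from the question, one possible variation (a sketch, not the only option) is to extract the substitution table once into a constant, which, unlike a local variable, is visible inside def, and use Hash#fetch with a fallback block. The SUBSTITUTIONS name is introduced here for illustration.

```ruby
require "yaml"

data = YAML.load(<<~DOC)
  common:
    substitute:
      foo: fee
DOC

# Extract the table once; fall back to an empty Hash if the section is missing.
SUBSTITUTIONS = (data.dig("common", "substitute") || {}).freeze

def substitute_if_needed(original_value)
  # Hash#fetch runs the block only when the key is absent.
  SUBSTITUTIONS.fetch(original_value) { original_value }
end

puts substitute_if_needed("foo") # prints: fee
puts substitute_if_needed("bar") # prints: bar
```

One behavioral difference from the nil? version: if a key is present but mapped to null in the YAML (e.g. `foo:` with no value), fetch returns nil instead of the original value.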
I'm trying to understand some YAML syntax used with Hydra in a machine learning project. Given the following, extracted from the original GitHub repo:
datamodule:
  _target_: cdvae.pl_data.datamodule.CrystDataModule
  datasets:
    train:
      _target_: cdvae.pl_data.dataset.CrystDataset
      name: Formation energy train
      path: ${data.root_path}/train.csv
      prop: ${data.prop}
      niggli: ${data.niggli}
      primitive: ${data.primitive}
      graph_method: ${data.graph_method}
      lattice_scale_method: ${data.lattice_scale_method}
      preprocess_workers: ${data.preprocess_workers}
I don't understand what the syntax ${} stands for in this context. Is that some kind of text formatting? Or is it calling some specific module? Could anybody provide some examples?
EDIT:
OK, as pointed out by @flyx, it seems that the ${} syntax comes from Hydra, which is indeed part of the project I'm analyzing.
The string foo: ${bar} in YAML corresponds roughly to the dict {"foo": "${bar}"} in Python. That is, the dollar-bracket notation ${} is not given any special meaning by the YAML grammar.
OmegaConf, which is the backend used by Hydra, does give special meaning to the dollar-bracket syntax; it is used for "variable interpolation." See the OmegaConf docs on variable interpolation.
To summarize briefly, variable interpolation means the string "${bar}" acts as a "pointer" to the value stored under the key bar elsewhere in your OmegaConf config object. Dotted paths such as ${data.root_path} walk into nested nodes, starting from the root of the config.
Here's a short demo of variable interpolations in action:
from omegaconf import OmegaConf
yaml_data = """
foo: ${bar}
bar: baz
"""
config = OmegaConf.create(yaml_data)
assert config.bar == "baz"
assert config.foo == "baz" # the ${bar} interpolation points to the "baz" value
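To make the dotted-path idea concrete outside Python, here is a toy resolver in Ruby. It is only an illustration of the concept, not how OmegaConf is implemented: the resolve and lookup helpers and the sample config values are hypothetical, and it ignores relative paths, custom resolvers like ${oc.env:...}, and reference cycles.

```ruby
REFERENCE = /\$\{([^}]+)\}/           # one ${...} anywhere in a string
WHOLE_REFERENCE = /\A\$\{([^}]+)\}\z/ # the string is exactly one ${...}

# Replace ${a.b.c} references in a nested Hash with values looked up from root.
def resolve(node, root)
  case node
  when Hash
    node.transform_values { |value| resolve(value, root) }
  when String
    if (match = WHOLE_REFERENCE.match(node))
      lookup(match[1], root) # keeps the referenced value's type (e.g. true)
    else
      node.gsub(REFERENCE) { lookup($1, root).to_s } # splices text into a larger string
    end
  else
    node
  end
end

def lookup(dotted_path, root)
  value = root.dig(*dotted_path.split("."))
  raise KeyError, "missing key: #{dotted_path}" if value.nil?
  resolve(value, root) # the referenced value may itself contain ${...}
end

# Hypothetical config shaped like the question's data / datasets sections.
config = {
  "data" => { "root_path" => "/datasets/example", "niggli" => true },
  "train" => {
    "path" => "${data.root_path}/train.csv",
    "niggli" => "${data.niggli}"
  }
}

resolved = resolve(config, config)
puts resolved["train"]["path"] # prints: /datasets/example/train.csv
p resolved["train"]["niggli"]  # prints: true
```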
I have this input repeated in 1850 files:
[
  {
    "id"=>66939,
    "login"=>"XXX",
    "url"=>"https://website.com/XX/users/XXX"
  },
  ...
  {}
]
I want to build a list so that, given a login, I can retrieve the ID with syntax like:
users_list[XXX]
This is my desired output:
{"XXX"=>"66570", "XXX"=>"66570", "XXX"=>"66570", "XXX"=>"66570", ... }
My code is:
i2 = 1
while i2 != users_list_raw.parsed.count
  temp_user = users_list_raw.parsed[i2]
  temp_user_login = temp_user['login']
  temp_user_id = temp_user['id']
  user = {
    temp_user_login => temp_user_id
  }
  users_list << user
  i2 += 1
end
My output is:
[{"XXX":66570},{"XXX":66569},{"XXX":66568},{"XXX":66567},{"XXX":66566}, ... {}]
but this is not what I want.
What's wrong with my code?
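Use hash[key] = value to add an entry to a hash. So I guess in your case, with users_list initialized as a Hash ({}) rather than an Array, that would be users_list[temp_user_login] = temp_user_id instead of users_list << user. Also note that starting at i2 = 1 skips the first element, since Ruby arrays are zero-indexed.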
But I'm not sure why you'd want to do that. You could look up a user's id from their login with something like:
login = "XXX"
user = users_list.find { |u| u["login"] == login }
id = user["id"]
and maybe put that in a function get_id(login) which takes the login as its parameter?
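A minimal sketch of such a get_id(login) function, assuming the records have already been parsed into an Array of Hashes. The USERS constant and its logins are hypothetical sample data; the optional second parameter just makes the method usable with other lists.

```ruby
# Hypothetical sample records in the same shape as the question's input,
# including the trailing empty hash.
USERS = [
  { "id" => 66939, "login" => "alice" },
  { "id" => 66570, "login" => "bob" },
  {}
].freeze

# Linear search: simple, but O(n) per lookup, so building a Hash
# is better when you look up many logins.
def get_id(login, users = USERS)
  user = users.find { |u| u["login"] == login }
  user && user["id"] # nil when no user has this login
end

puts get_id("bob")  # prints: 66570
p get_id("nobody")  # prints: nil
```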
Also, you might want to look into databases if you're going to manipulate large amounts of data like this. Ruby has ORMs (object-relational mappers) such as DataMapper and Active Record (which comes bundled with Rails); they let you "model" the data and work with Ruby objects backed by a database, without writing SQL queries by hand.
If your goal is to look up users_list["XXX"], a Hash works well. We can construct one quite simply:
users_list = users_list_raw.parsed.each_with_object({}) do |user, list|
  list[user['login']] = user['id']
end
Any time you find yourself writing a while loop in Ruby, there might be a more idiomatic solution.
If you want to keep track of a mapping from keys to values, the best data structure is a hash. Be aware that assignment via the index operator (hash[key] = value) replaces any existing value for that key.
login_to_id = {}
Dir.glob("*.txt") { |filename| # Use Dir.glob to find all files that you want to process
  data = eval(File.read(filename)) # Your data seems to be Ruby-encoded hashes/arrays. eval is unsafe; I hope you know what you are doing.
  data.each { |hash|
    login_to_id[hash["login"]] = hash["id"]
  }
}
puts login_to_id["XXX"] # prints 66939
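To see the overwrite behavior of hash[key] = value with duplicate logins, and one way to keep every id instead, here is a small sketch on hypothetical records (the logins and ids are made up; the trailing empty hash mirrors the question's input).

```ruby
# Hypothetical records: "alice" appears twice with different ids.
records = [
  { "login" => "alice", "id" => 1 },
  { "login" => "bob", "id" => 2 },
  { "login" => "alice", "id" => 3 },
  {}
]

login_to_id = {}
records.each do |record|
  next unless record["login"]                 # skip the empty hash
  login_to_id[record["login"]] = record["id"] # a later duplicate overwrites
end
puts login_to_id["alice"] # prints: 3

# To keep every id per login instead, collect them into arrays.
ids_by_login = Hash.new { |hash, key| hash[key] = [] }
records.each do |record|
  next unless record["login"]
  ids_by_login[record["login"]] << record["id"]
end
p ids_by_login["alice"] # prints: [1, 3]
```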
I'm trying to create a validation against a predetermined list of valid brands as part of an ETL pipeline. My validation needs to be case-insensitive, since some brands are compound words or abbreviations whose capitalization is insignificant.
I created a custom predicate, but I cannot figure out how to generate the appropriate error message.
I read the error messages doc, but am having a hard time interpreting:
How do I build the message syntax for my custom predicate?
Can I apply the messages in my schema class directly, without referencing an external .yml file? I looked here and it seems like it's not as straightforward as I'd hoped.
Below I've given code that represents what I have tried using both built-in predicates, and a custom one, each with their own issues. If there is a better way to compose a rule that achieves the same goal, I'd love to learn it.
require 'dry/validation'

CaseSensitiveSchema = Dry::Validation.Schema do
  BRANDS = %w(several hundred valid brands)

  # :included_in? from https://dry-rb.org/gems/dry-validation/basics/built-in-predicates/
  required(:brand).value(included_in?: BRANDS)
end

CaseInsensitiveSchema = Dry::Validation.Schema do
  BRANDS = %w(several hundred valid brands)

  configure do
    def in_brand_list?(value)
      BRANDS.include? value.downcase
    end
  end

  required(:brand).value(:in_brand_list?)
end
# A valid string if case insensitive
valid_product = {brand: 'Valid'}
CaseSensitiveSchema.call(valid_product).errors
# => {:brand=>["must be one of: here, are, some, valid, brands"]} # This message will be ridiculous when the full brand list is applied
CaseInsensitiveSchema.call(valid_product).errors
# => {} # Good!
invalid_product = {brand: 'Junk'}
CaseSensitiveSchema.call(invalid_product).errors
# => {:brand=>["must be one of: several, hundred, valid, brands"]} # Good... (Except this error message will contain the entire brand list!!!)
CaseInsensitiveSchema.call(invalid_product).errors
# => Dry::Validation::MissingMessageError: message for in_brand_list? was not found
# => from .. /gems/2.5.0/gems/dry-validation-0.12.2/lib/dry/validation/message_compiler.rb:116:in `visit_predicate'
The correct way to reference my error message was to key it by the predicate method's name. There's no need to worry about arg, value, etc.
en:
  errors:
    in_brand_list?: "must be in the master brands list"
Additionally, I was able to load this error message without a separate .yml file by doing this:
CaseInsensitiveSchema = Dry::Validation.Schema do
  BRANDS = %w(several hundred valid brands)

  configure do
    def in_brand_list?(value)
      BRANDS.include? value.downcase
    end

    def self.messages
      super.merge({en: {errors: {in_brand_list?: "must be in the master brand list"}}})
    end
  end

  required(:brand).value(:in_brand_list?)
end
I'd still love to see other implementations, specifically for a generic case-insensitive predicate. Many people say dry-rb is fantastically organized, but I find it hard to follow.
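One detail worth noting about a generic case-insensitive check: `BRANDS.include? value.downcase` only matches if the brand list itself is already lowercase. A plain-Ruby sketch (independent of dry-validation) that normalizes both sides is below; the brand names are hypothetical, and the same method body could presumably be used inside the `configure` block, though that wiring is not shown here.

```ruby
require "set"

# Hypothetical brand list with mixed capitalization.
BRAND_LIST = %w[Acme HyperX adidas].freeze

# Downcase once up front; a Set gives fast membership checks
# for a list of several hundred names.
NORMALIZED_BRANDS = BRAND_LIST.map(&:downcase).to_set.freeze

def in_brand_list?(value)
  value.is_a?(String) && NORMALIZED_BRANDS.include?(value.downcase)
end

p in_brand_list?("hyperx") # prints: true
p in_brand_list?("ACME")   # prints: true
p in_brand_list?("Junk")   # prints: false
```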
I am very new to Ruby. I am able to connect to AWS S3 using Ruby, with the following code:
filePath = '/TMEventLogs/stable/DeviceWiFi/20160803/1.0/20160803063600-2f9aa901-2ce7-4932-aafd-f7286cdb9871.csv'
s3.get_object({bucket: "analyticspoc", key:"TMEventLogs/stable/DeviceWiFi/20160803/1.0/"}, target:filePath ) do |chunk|
  puts "1"
end
In the code above, s3 is the client and "analyticspoc" is the root bucket. The path to my CSV file (as shown under All Buckets) is /analyticspoc/TMEventLogs/stable/DeviceWiFi/20160803/1.0/20160803063600-2f9aa901-2ce7-4932-aafd-f7286cdb9871.csv.
With the code above I get the error Error getting objects: [Aws::S3::Errors::NoSuchKey] - The specified key does not exist. I want to read the contents of the file. How do I do that, and what is the mistake in the code above?
Got the answer. You can use list_objects to get the keys (file names) under a prefix, in pages of up to 1,000, whereas get_object fetches the contents of a single file, as follows:
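BUCKET = "analyticspoc"
path = "TMEventLogs/stable/DeviceWiFi/20160803/1.0/"

# list_objects is paginated: each iteration yields one page of up to 1,000 objects
s3.list_objects(bucket: BUCKET, prefix: path).each do |response|
  response.contents.each { |object| puts object.key }
end

file_name = "TMEventLogs/stable/DeviceWiFi/20160803/1.0/012121212121"
response = s3.get_object(bucket: BUCKET, key: file_name)
puts response.body.read # body is an IO holding the object's contents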
As far as I can tell, you're passing in the arguments incorrectly. According to the documentation for get_object, it should be a single options hash:
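s3.get_object(
  bucket: "analyticspoc",
  # the key must name the object itself, not its "folder" prefix
  key: "TMEventLogs/stable/DeviceWiFi/20160803/1.0/20160803063600-2f9aa901-2ce7-4932-aafd-f7286cdb9871.csv",
  target: filePath
) do |chunk|
  puts "1"
end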
I believe it was trying to use your hash as a string key which is obviously not going to work.
In Ruby, the curly braces { } around a hash argument are only necessary in a method call if additional arguments follow it, whether another hash or a non-hash value. This makes the syntax a lot less ugly in most cases, where options are deliberately last (and sometimes first and last, by virtue of being the only argument).
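A quick way to see this is a hypothetical method that just returns the positional arguments it receives (show_args is made up for the demo; the keys mirror the S3 call). Note that it takes no keyword parameters, so trailing `key: value` pairs arrive as a single positional Hash.

```ruby
# Returns the positional arguments exactly as received.
def show_args(*args)
  args
end

# Trailing hash: braces are optional, so each call passes ONE hash.
one_a = show_args(bucket: "b", key: "k", target: "t")
one_b = show_args({ bucket: "b", key: "k", target: "t" })

# Explicit braces close the first hash, so this call passes TWO hashes.
two = show_args({ bucket: "b", key: "k" }, target: "t")

p one_a.length # prints: 1
p two.length   # prints: 2
```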
I have a text file (objects.txt) that contains objects and their attributes.
The content of the file is something like:
Object.attribute = "data"
In a different file, I load objects.txt, and if I type:
puts object.attribute
it prints out data.
The issue comes when I am trying to access the object and/or the attribute with a string. What I am doing is:
var = "object" + "." + "access"
puts var
It prints out object.access, not its content, "data".
I have already tried instance_variable_get and it works, but only if I modify objects.txt and prepend an @ to make it an instance variable. I cannot do this, because I am not the owner of the objects.txt file.
As a workaround, I could parse objects.txt and extract the data I need, but I don't want to do this, as I want to take advantage of what is already there.
Any suggestions?
Yes, puts is correctly spitting out "object.access" because that is exactly the string you are creating.
To evaluate a string as if it were Ruby code, you need to use eval(), e.g.:
var = "object" + "." + "access"
puts eval(var)
# prints: data
Be aware that doing this is quite dangerous if you are evaluating anything that potentially comes from another user.
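If the object is reachable through a constant, a narrower alternative to eval (swapped in here, not part of the answer above) is to look the object up with Object.const_get and call the attribute with public_send. AppConfig and read_setting are hypothetical names standing in for whatever objects.txt defines. This can only call an existing public method on a named constant, though if the names come from users you would still want to whitelist them.

```ruby
# Hypothetical stand-in for an object defined in objects.txt.
class AppConfig
  class << self
    attr_accessor :attribute
  end
end
AppConfig.attribute = "data"

# Resolve the object by constant name, then call the attribute by name.
def read_setting(object_name, attribute_name)
  target = Object.const_get(object_name) # raises NameError if undefined
  return nil unless target.respond_to?(attribute_name)
  target.public_send(attribute_name)     # calls only public methods
end

puts read_setting("AppConfig", "attribute") # prints: data
```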