kedro PartitionedDataSet lazy saving to spare memory? - kedro

I am working with PartitionedDataSet in kedro. One of the datasets is of type pillow.ImageDataSet:
raw_images:
  type: PartitionedDataSet
  <<: *data_path_on_disk
  dataset:
    type: pillow.ImageDataSet
    filename_suffix: ".png"
I want to process this set of images (for example, cropping them) and save them in a new PartitionedDataSet of type pillow (same as before except for the path).
node(func=crop_image, inputs="raw_images", outputs="cropped_images")
where crop_image is defined as follows:
def crop_image(images: dict):
    return {image_path: image().crop([10, 10, 20, 20]) for image_path, image in images.items()}
How will the dictionary be built? Will it be built and stored completely in memory (which will soon overflow for a big dataset), or will the images be written to disk progressively as they are computed?

In the form you've presented it:
def crop_image(images: dict):
    return {image_path: image().crop([10, 10, 20, 20]) for image_path, image in images.items()}
it has nothing to do with Kedro. Your crop_image will be executed eagerly by Python, allocating the dictionary as well as all the cropped images in memory before you return it.
If you want to have lazy saving, follow the Partitioned dataset lazy saving guide: https://kedro.readthedocs.io/en/stable/data/kedro_io.html#partitioned-dataset-lazy-saving
You still need to allocate the dictionary with keys (so if you have millions of them, the dictionary itself will get large), but the save operation can be deferred and executed lazily, like this (note the use of closures):
def crop_single_image_lazy(img):
    def crop():
        return img().crop([10, 10, 20, 20])
    return crop
def crop_image(images: dict):
    """
    Returns:
        Dictionary mapping each partition to create to a function that creates it.
    """
    return {
        image_path: crop_single_image_lazy(image)
        for image_path, image in images.items()
    }

If you want to enable lazy saving, you can return a dict of callables from your node.
See this part of the documentation:
https://kedro.readthedocs.io/en/stable/data/kedro_io.html#partitioned-dataset
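For illustration, here is a minimal sketch of such a node using lambdas instead of the closure helper shown above; the default-argument capture is only there to avoid Python's late-binding pitfall, and any zero-argument callable would work just as well:
def crop_image(images: dict) -> dict:
    # Each value is a zero-argument callable; the PartitionedDataSet calls it
    # only when it saves that partition, so the cropped images are never all
    # held in memory at once.
    return {
        image_path: (lambda img=image: img().crop([10, 10, 20, 20]))
        for image_path, image in images.items()
    }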

Related

How to iterate over a list of values returned from ops to jobs in Dagster

I am new to the Dagster world and working with the ops and jobs concepts.
My requirement is to read a list of data from config_schema, pass it to an @op function, and return the same list to the job.
The code is shown below:
@op(config_schema={"table_name": list})
def read_tableNames(context):
    lst = context.op_config['table_name']
    return lst

@job
def write_db():
    tableNames_frozenList = read_tableNames()
    print(f'-------------->', type(tableNames_frozenList))
    print(f'-------------->{tableNames_frozenList}')
When the @op function accepts the list, it shows up as a frozenlist type, but when I try to return it to the job, it is converted into the <class 'dagster._core.definitions.composition.InvokedNodeOutputHandle'> data type.
My requirement is to fetch the list of data, iterate over it, and perform some operations on the individual items using @ops.
Please help me understand this.
Thanks in advance!
When using ops / graphs / jobs in Dagster it's very important to understand that the code defined within a @graph or @job definition is only executed when your code is loaded by Dagster, NOT when the graph is actually executing. The code defined within a @graph or @job definition is essentially a compilation step that only serves to define the dependencies between ops - there shouldn't be any general-purpose Python code within those definitions. Whatever operations you want to perform on data flowing through your job should take place within the @op definitions. So if you wanted to print the values of your list that is input via the config schema, you might do something like:
@op(config_schema={"table_name": list})
def read_tableNames(context):
    lst = context.op_config['table_name']
    context.log.info(f'--------------> {type(lst)}')
    context.log.info(f'--------------> {lst}')
Here's an example using two ops to do this data flow:
@op(config_schema={"table_name": list})
def read_tableNames(context):
    lst = context.op_config['table_name']
    return lst

@op
def print_tableNames(context, table_names):
    context.log.info(f'--------------> {type(table_names)}')

@job
def simple_flow():
    print_tableNames(read_tableNames())
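For completeness, here is a hedged sketch of how such a job might be launched with its config supplied from Python; execute_in_process and the run_config layout are standard Dagster APIs, but the table names below are made up:
if __name__ == "__main__":
    # Supply the op's config_schema values under run_config["ops"][<op name>]["config"]
    result = simple_flow.execute_in_process(
        run_config={
            "ops": {
                "read_tableNames": {
                    "config": {"table_name": ["customers", "orders"]}  # sample values
                }
            }
        }
    )
    assert result.success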
Have a look at some of the Dagster tutorials for more examples

Kv related question - How to bind an on_press/release function to the viewclass of the RecycleView?

I've been working on a project which required me to learn kv.
What I'm trying to do is use RecycleView to display a list of people that are part of a dataset I built, and allow easy editing of that dataset.
What I've done is read the documentation and simply use the first example from there (with a slight change: the viewclass being a ToggleButton):
[The Example][1]
So, as for my question: I want to bind an on_press/release function to the viewclass objects. For example, I want to bind a function to all of the toggle buttons which appends the button's text to a list when it's pressed and removes the name from the list when it's released.
[1]: https://i.stack.imgur.com/55FlM.png
You can do that by adding the on_press to the data:
from functools import partial

from kivy.uix.recycleview import RecycleView

class RV(RecycleView):
    def __init__(self, **kwargs):
        self.list = []
        super(RV, self).__init__(**kwargs)
        self.data = [{'text': str(x), 'on_press': partial(self.toggle, str(x))} for x in range(100)]

    def toggle(self, name):
        print('toggle')
        if name in self.list:
            self.list.remove(name)
        else:
            self.list.append(name)
        print('list is now:', self.list)
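For completeness, here is one way the RV class above might be wired into a small app. This sketch is an assumption on top of the answer (the kv rule, the ToggleButton viewclass, and the TestApp name are not part of the original), roughly following the documentation example the question refers to:
from kivy.app import App
from kivy.lang import Builder

KV = '''
<RV>:
    viewclass: 'ToggleButton'  # each data item is rendered as a ToggleButton
    RecycleBoxLayout:
        default_size: None, dp(56)
        default_size_hint: 1, None
        size_hint_y: None
        height: self.minimum_height
        orientation: 'vertical'
'''

class TestApp(App):
    def build(self):
        Builder.load_string(KV)
        return RV()  # RV is the class defined in the answer above

if __name__ == '__main__':
    TestApp().run()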

How can I extract, edit and replot a data matrix in Abaqus?

Good afternoon,
We've been working on an animal model (skull), applying a series of forces and evaluating the resultant stresses in Abaqus. We got some of those beautiful and colourful (blue-to-red) contour plots. Now, we'd like to obtain a similar image, but coloured by a new matrix, which will be the result of some mathematical transformations.
So, how can I extract the data matrix used to set those colour patterns (I guess with X-, Y-, Z-, and von Mises-values or so), apply my transformation, and replot the data to get a new (comparable) figure with the new values?
Thanks a lot and have a great day!
I've never done it myself but I know that this is possible. You can start with the documentation (e.g. here and here).
After experimenting in the GUI you can check out the corresponding Python code, which should be automatically recorded in the abaqus.rpy file in your working directory (or in C:\temp). Working it through, you could get something like:
myodb = session.openOdb('my_fem.odb')  # or alternatively `session.odbs['my_fem.odb']` if it is already loaded into the session

# Define a temporary step for accessing your transformed output
tempStep = myodb.Step(name='TempStep', description='', domain=TIME, timePeriod=1.0)

# Define a temporary frame to store your transformed output
tempFrame = tempStep.Frame(frameId=0, frameValue=0.0, description='TempFrame')

# Define a new field output
s1f2_S = myodb.steps['Step-1'].frames[2].fieldOutputs['S']  # Stress tensor at the second frame of the 'Step-1' step
s1f1_S = myodb.steps['Step-1'].frames[1].fieldOutputs['S']  # Stress tensor at the first frame of the 'Step-1' step
tmpField = s1f2_S - s1f1_S
userField = tempFrame.FieldOutput(
    name='Field-1', description='s1f2_S - s1f1_S', field=tmpField
)
Now, to display your new Field Output using Python, you can do the following:
session.viewports['Viewport: 1'].odbDisplay.setFrame(
    step='TempStep', frame=0
)
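To actually colour the contour plot by the new field, you will likely also need to select it as the primary variable. A hedged sketch, assuming the standard setPrimaryVariable method of odbDisplay (check the exact keywords and constants against your Abaqus version):
from abaqusConstants import INTEGRATION_POINT, INVARIANT

session.viewports['Viewport: 1'].odbDisplay.setPrimaryVariable(
    field=userField, outputPosition=INTEGRATION_POINT,
    refinement=(INVARIANT, 'Mises'),  # pick the von Mises invariant of the tensor field
)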
For more information on the methods and objects used, you can consult the "Abaqus Scripting Reference Guide" documentation:
Step(): Odb commands -> OdbStep object -> Step();
Frame(): Odb commands -> OdbFrame object -> Frame();
FieldOutput object: Odb commands -> FieldOutput object;

Any AR.js multimarkers learning tutorial?

I have been searching for an AR.js multimarkers tutorial, or anything that explains how it works, but all I can find is 2 examples with no tutorials or explanations.
So far, I understand that it requires learning the pattern or order of the markers, which is then stored in localStorage. This data is used later to display the image.
What I don't understand is how this "learner" is implemented. Also, the learning process is only used once by the "creator", right? The output file should be stored and then served later when needed, not created from scratch on each person's phone or computer.
Any help is appreciated.
Since the question is mostly about the learner page, I'll try to break it down as much as I can:
1) You need to have an array of {type, URL} objects.
A sample of creating the default array is shown below (source code):
var markersControlsParameters = [
    {
        type : 'pattern',
        patternUrl : 'examples/marker-training/examples/pattern-files/pattern-hiro.patt',
    },
    {
        type : 'pattern',
        patternUrl : 'examples/marker-training/examples/pattern-files/pattern-kanji.patt',
    }]
2) You need to feed this to the 'learner' object.
By default the above object is encoded into the URL (source) and then decoded by the learner site. What's important happens on that site:
for each object in the array, an ArMarkerControls object is created and stored:
// array.forEach(function(markerParams){
var markerRoot = new THREE.Group()
scene.add(markerRoot)
// create markerControls for our markerRoot
var markerControls = new THREEx.ArMarkerControls(arToolkitContext, markerRoot, markerParams)
subMarkersControls.push(markerControls)
The subMarkersControls is used to create the object used to do the learning. At long last:
var multiMarkerLearning = new THREEx.ArMultiMakersLearning(arToolkitContext, subMarkersControls)
The example learner site has multiple utility functions, but as far as I know, the most important here are the ArMultiMakersLearning members, which can be used in the following order (or any other):
// this method resets previously collected statistics
multiMarkerLearning.resetStats()
// this member flag enables data collection
multiMarkerLearning.enabled = true
// this member flag stops data collection
multiMarkerLearning.enabled = false
// To obtain the 'learned' data, simply call .toJSON()
var jsonString = multiMarkerLearning.toJSON()
That's all. If you store the jsonString as
localStorage.setItem('ARjsMultiMarkerFile', jsonString);
then it will be used as the default multimarker file later on. If you want a custom name or more areas, then you'll have to modify the name in the source code.
3) 2.1.4 debugUI
It seems that the debug UI is broken - the UI buttons do exist but are nowhere to be seen. A hot fix would be using the 'markersAreaEnabled' span style for the div
containing the buttons (see this source bit).
It's all in this glitch, you can find it under the phrase 'CHANGES HERE' in the arjs code.

Specifying styles for portions of a PyYAML dump

I'm using YAML for a computer and human-editable and readable input format for a simulator. For human readability, some parts of the input are mostly amenable to block style, while flow style suits others better.
The default for PyYAML is to use block style wherever there are nested maps or sequences, and flow style everywhere else. *default_flow_style* allows one to choose all-flow-style or all-block-style.
But I'd like to output files more of the form
bonds:
  - { strength: 2.0 }
  - ...
tiles:
  - { color: red, edges: [1, 0, 0, 1], stoic: 0.1 }
  - ...
args:
  block: 2
  Gse: 9.4
As can be seen, this doesn't follow a consistent pattern for styles throughout, and instead changes depending upon the part of the file. Essentially, I'd like to be able to specify that all values in some block-style sequences be in flow style. Is there some way to get that sort of fine-grained control over dumping? Being able to dump the top-level mapping in a particular order while not requiring that order (e.g., omap) would be nice as well for readability.
It turns out this can be done by defining subclasses with representers for each item I want not to follow default_flow_style, and then converting everything necessary to those before dumping. In this case, that means I get something like:
import yaml

class blockseq(dict): pass
def blockseq_rep(dumper, data):
    return dumper.represent_mapping(u'tag:yaml.org,2002:map', data, flow_style=False)

class flowmap(dict): pass
def flowmap_rep(dumper, data):
    return dumper.represent_mapping(u'tag:yaml.org,2002:map', data, flow_style=True)

yaml.add_representer(blockseq, blockseq_rep)
yaml.add_representer(flowmap, flowmap_rep)

def dump(st):
    st['tiles'] = [flowmap(x) for x in st['tiles']]
    st['bonds'] = [flowmap(x) for x in st['bonds']]
    if 'xgrowargs' in st.keys():
        st['xgrowargs'] = blockseq(st['xgrowargs'])
    return yaml.dump(st)
Annoyingly, the easier-to-use dumper.represent_list and dumper.represent_dict don't allow flow_style to be specified, so I have to specify the tag, but the system does work.
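As a usage illustration (the dictionary below is made up to mirror the question's structure; only the call to dump() comes from the code above):
st = {
    'tiles': [{'color': 'red', 'edges': [1, 0, 0, 1], 'stoic': 0.1}],
    'bonds': [{'strength': 2.0}],
    'xgrowargs': {'block': 2, 'Gse': 9.4},
}
# 'tiles' and 'bonds' items come out as flow-style mappings inside block-style
# sequences, while 'xgrowargs' is dumped as a block-style mapping.
print(dump(st))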
