Set the name for each ParallelFor iteration in KFP v2 on Vertex AI

I am currently using kfp.dsl.ParallelFor to train 300 models. It looks something like this:
...
models_to_train_op = get_models()
with dsl.ParallelFor(models_to_train_op.outputs["data"], parallelism=100) as item:
    prepare_data_op = prepare_data(item)
    train_model_op = train_model(prepare_data_op.outputs["train_data"])
...
Currently, the iterations in Vertex AI are labeled in a dropdown as something like for-loop-worker-0, for-loop-worker-1, and so on. For tasks (like prepare_data_op), there's a method called set_display_name. Is there a similar method that allows you to set the iteration name? It would be helpful to relate the iterations to the training data so that the dropdown UI Vertex AI provides is easier to look through.

I reached out to a contact I have at Google. They suggested passing the values from the list that is fed to ParallelFor to set_display_name for each 'iteration' of the loop; when the pipeline is compiled, it will know to name the corresponding iteration.
# Create component that returns a range list
model_list_op = model_list(n_models)
# Parallelize jobs
with dsl.ParallelFor(model_list_op.outputs["model_list"], parallelism=100) as x:
    x.set_display_name(str(model_list_op.outputs["model_list"]))
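For reference, a minimal sketch of what that could look like in a full pipeline, here naming the inner task after the loop item instead. The component names are hypothetical, and whether the loop-item placeholder renders as a readable name in the Vertex AI dropdown is an assumption to verify:
from kfp import dsl

@dsl.pipeline(name="train-many-models")
def training_pipeline(n_models: int):
    # Hypothetical component returning the list of models to train
    model_list_op = model_list(n_models)
    with dsl.ParallelFor(model_list_op.outputs["model_list"], parallelism=100) as item:
        prepare_data_op = prepare_data(item)
        # Tag the task with the loop item so each iteration is identifiable in the UI
        prepare_data_op.set_display_name(f"prepare-data-{item}")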

Related

How to iterate over a list of values returned from ops to jobs in Dagster

I am new to the Dagster world and working on the ops and jobs concepts.
My requirement is to read a list of data from config_schema, pass it to an @op function, and return the same list to the job.
The code is shown below:
#op(config_schema={"table_name":list})
def read_tableNames(context):
lst=context.op_config['table_name']
return lst
#job
def write_db():
tableNames_frozenList=read_tableNames()
print(f'-------------->',type(tableNames_frozenList))
print(f'-------------->{tableNames_frozenList}')
When the @op function accepts the list, it shows up as a frozenlist type, but when I try to return it to the job it gets converted into the <class 'dagster._core.definitions.composition.InvokedNodeOutputHandle'> data type.
My requirement is to fetch the list of data, iterate over the list, and perform some operations on the individual items of the list using @ops.
Please help me understand this.
Thanks in advance!!!
When using ops / graphs / jobs in Dagster it's very important to understand that the code defined within a @graph or @job definition is only executed when your code is loaded by Dagster, NOT when the graph is actually executing. The code defined within a @graph or @job definition is essentially a compilation step that only serves to define the dependencies between ops - there shouldn't be any general-purpose Python code within those definitions. Whatever operations you want to perform on data flowing through your job should take place within the @op definitions. So if you wanted to print the values of your list that is input via a config schema, you might do something like
#op(config_schema={"table_name":list})
def read_tableNames(context):
lst=context.op_config['table_name']
context.log.info(f'-------------->',type(tableNames_frozenList'))
context.log.info(f'-------------->{tableNames_frozenList}')
Here's an example using two ops to do this data flow:
#op(config_schema={"table_name":list})
def read_tableNames(context):
lst=context.op_config['table_name']
return lst
#op
def print_tableNames(context, table_names):
context.log.info(f'-------------->',type(table_names)
#job
def simple_flow():
print_tableNames(read_tableNames())
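To actually run that job with a concrete list, one option (a sketch; the table names here are made up) is execute_in_process with the matching run config:
# Sketch: executes the job in-process with hypothetical config values.
result = simple_flow.execute_in_process(
    run_config={
        "ops": {
            "read_tableNames": {"config": {"table_name": ["users", "orders"]}}
        }
    }
)
# result.success tells you whether the run completed.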
Have a look at some of the Dagster tutorials for more examples.

How can I extract, edit and replot a data matrix in Abaqus?

Good afternoon,
We've been working on an animal model (skull), applying a series of forces and evaluating the resultant stresses in Abaqus. We got some of those beautiful and colourful (blue-to-red) contour plots. Now, we'd like to obtain a similar image but coloured by a new matrix, which will be the result of some mathematical transformations.
So, how can I extract the data matrix used to set those colour patterns (I guess with X-, Y-, Z-, and von Mises values or so), apply my transformation, and replot the data to get a new (comparable) figure with the new values?
Thanks a lot and have a great day!
I've never done it myself but I know that this is possible. You can start with the documentation (e.g. here and here).
After experimenting using the GUI you can check out the corresponding Python code, which should be automatically recorded in the abaqus.rpy file in your working directory (or at C:\temp). Working through it, you could get something like:
myodb = session.openOdb('my_fem.odb') # or alternatively `session.odbs['my_fem.odb']` if it is already loaded into the session
# Define a temporary step for accessing your transformed output
tempStep = myodb.Step(name='TempStep', description='', domain=TIME, timePeriod=1.0)
# Define a temporary frame to store your transformed output
tempFrame = tempStep.Frame(frameId=0, frameValue=0.0, description='TempFrame')
# Define a new field output
s1f2_S = myodb.steps['Step-1'].frames[2].fieldOutputs['S'] # Stress tensor at the second frame of the 'Step-1' step
s1f1_S = myodb.steps['Step-1'].frames[1].fieldOutputs['S'] # Stress tensor at the first frame of the 'Step-1' step
tmpField = s1f2_S - s1f1_S
userField = tempFrame.FieldOutput(
    name='Field-1', description='s1f2_S - s1f1_S', field=tmpField
)
Now, to display your new Field Output using Python you can do the following:
session.viewports['Viewport: 1'].odbDisplay.setFrame(
    step='TempStep', frame=0
)
For more information on the methods and objects used, you can consult the documentation ("Abaqus Scripting Reference Guide"):
Step(): Odb commands -> OdbStep object -> Step();
Frame(): Odb commands -> OdbFrame object -> Frame();
FieldOutput object: Odb commands -> FieldOutput object;
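And if you need the raw numbers themselves (for instance to apply your transformation outside of the field-output arithmetic shown above), a rough sketch, assuming the standard FieldOutput/FieldValue attributes, would be:
# Sketch: iterate over the values of an existing stress field and read the numbers.
# .mises and .data are standard FieldValue attributes, but which ones are
# populated depends on the output you requested.
stress = myodb.steps['Step-1'].frames[2].fieldOutputs['S']
for v in stress.values:
    print(v.elementLabel, v.mises, v.data)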

How does TextCategorizer.predict work with spaCy?

I've been following the spaCy quick-start guide for text classification.
Let's say I have a very simple dataset.
TRAIN_DATA = [
    ("beef", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("apple", {"cats": {"POSITIVE": 0, "NEGATIVE": 1}})
]
I'm training a pipe to classify text. It trains and has a low loss rate.
textcat = nlp.create_pipe("pytt_textcat", config={"exclusive_classes": True})
for label in ("POSITIVE", "NEGATIVE"):
    textcat.add_label(label)
nlp.add_pipe(textcat)
optimizer = nlp.resume_training()
for i in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=8):
        texts, cats = zip(*batch)
        nlp.update(texts, cats, sgd=optimizer, losses=losses)
    print(i, losses)
Now, how do I predict whether a new string of text is "POSITIVE" or "NEGATIVE"?
This will work:
doc = nlp(u'Pork')
print(doc.cats)
It gives a score for each category we've trained to predict on.
But that seems at odds with the docs, which say I should use a predict method on the underlying pipeline component subclass.
That doesn't work though.
Trying textcat.predict('text') or textcat.predict(['text']) etc. throws:
AttributeError Traceback (most recent call last)
<ipython-input-29-39e0c6e34fd8> in <module>
----> 1 textcat.predict(['text'])
pipes.pyx in spacy.pipeline.pipes.TextCategorizer.predict()
AttributeError: 'str' object has no attribute 'tensor'
The predict methods of pipeline components actually expect a Doc as input, so you'll need to do something like textcat.predict(nlp(text)). The nlp used there does not necessarily have a textcat component. The result of that call then needs to be fed into a call to set_annotations() as shown here.
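In other words, the lower-level route looks roughly like the sketch below; note that the exact return value of predict() (e.g. whether it also returns tensors) differs between spaCy versions, so treat this as an assumption to check against the docs for your version:
doc = nlp(u'Pork')                      # build the Doc first
scores = textcat.predict([doc])         # may be (scores, tensors) in some versions
textcat.set_annotations([doc], scores)
print(doc.cats)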
However, your first approach is just fine:
...
nlp.add_pipe(textcat)
...
doc = nlp(u'Pork')
print(doc.cats)
...
Internally, when calling nlp(text), first the Doc for the text is generated, and then each pipeline component, one by one, runs its predict method on that Doc and keeps adding information to it with set_annotations. Eventually the textcat component sets the cats attribute of the Doc.
The API docs you're citing for the other approach kind of give you a look "under the hood", so they're not really conflicting approaches ;-)

vars.put function not writing the desired value into the jmeter parameter

Below is the code I have been using to try to address the use case below in JMeter. Quick help is appreciated.
Use case:
A particular text like "History" in a page response needs to be validated; if the text count is more than 50, a random selection of the options within the page needs to be made, and if the text count is less than 50, the 1st option needs to be selected.
I am new to JMeter and trying to solve this using a JSR223 PostProcessor, but I am stuck at the vars.put function, where I am unable to see the desired number being populated in the V parameter.
I am using a Boundary Extractor, where match no. 1 should handle the 1st selection and 0 the random selection.
def TotalInstanceAvailable = vars.get("sCount_matchNr").toInteger()
log.info("Total Instance Available = ${TotalInstanceAvailable}");
def boundary_analyzer = 50;
def DesiredNumber, V
if (TotalInstanceAvailable < boundary_analyzer)
{
    log.info("I am inside the loop")
    DesiredNumber = 0;
    log.info("DesiredNumber= ${DesiredNumber}");
    vars.put("V", DesiredNumber)
    log.info("v= ${V}");
}
else {
    DesiredNumber = 1;
    log.info("DesiredNumber=${DesiredNumber}");
    vars.put("V", "DesiredNumber")
    log.info("v= ${V}");
}
def sCount = vars.get("sCount")
log.info("Text matching number is ${sCount_matchNr}")
You cannot store an integer in JMeter Variables using the vars.put() function; you need to cast it to a String first, i.e. change this line:
vars.put("V", DesiredNumber)
to this one
vars.put("V", DesiredNumber as String)
Alternatively you can use the vars.putObject() function, which can store literally anything; however, you will only be able to use the value in JSR223 elements, by calling vars.getObject()
Whenever you face a problem with your JMeter script, get used to looking at the jmeter.log file or toggling the Log Viewer window - in the absolute majority of cases you will find the root cause of your problem in the log file.

XPath: Select Certain Child Nodes

I'm using XPath with Scrapy to scrape data off of a movie website BoxOfficeMojo.com.
As a general question: I'm wondering how to select certain child nodes of one parent node all in one XPath string.
Depending on the movie web page from which I'm scraping data, sometimes the data I need is located at different child nodes, for example depending on whether or not there is a link. I will be going through about 14000 movies, so this process needs to be automated.
Using this page as an example, I will need the actor(s), director(s) and producer(s).
This is the XPath to the director. Note: the %s corresponds to a determined index where that information is found - in the Action Jackson example the director is found at [1] and the actors at [2]:
//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font/text()
However, would a link exist to a page on the director, this would be the Xpath:
//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font/a/text()
Actors are a bit more tricky, as there are <br> tags included for subsequent actors listed, which may be children of an /a or children of the parent /font, so:
//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font//a/text()
This gets almost all of the actors (except those with font/br).
Now, the main problem here, I believe, is that there are multiple //div[@class="mp_box_content"] elements - everything I have works EXCEPT that I also end up getting some digits from other mp_box_content divs. Also, I have added numerous try:/except: statements in order to get everything (actors, directors, producers, whether or not they have links associated with them). For example, the following is my Scrapy code for actors:
actors = hxs.select('//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font//a/text()' % (locActor,)).extract()
try:
    second = hxs.select('//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font/text()' % (locActor,)).extract()
    for n in second:
        actors.append(n)
except:
    actors = hxs.select('//div[@class="mp_box_content"]/table/tr[%s]/td[2]/font/text()' % (locActor,)).extract()
This is an attempt to cover the cases where the first actor has no link associated with him/her but subsequent actors do, or the first actor has a link but the rest do not.
I appreciate the time taken to read this and any attempts to help me find/address this problem! Please let me know if any more information is needed.
I am assuming you are only interested in textual content, not the links to actors' pages etc.
Here is a proposition using lxml.html (and a bit of lxml.etree) directly
First, I recommend you select td[2] cells by the text content of td[1], with expressions like .//tr[starts-with(td[1], "Director")]/td[2] to account for "Director", or "Directors"
Second, testing various expressions with or without <font>, with or without <a> etc., makes code difficult to read and maintain, and since you're interested only in the text content, you might as well use string(.//tr[starts-with(td[1], "Actor")]/td[2]) to get the text, or use lxml.html.tostring(e, method="text", encoding=unicode) on selected elements
And for the <br> issue with multiple names, the way I generally do it is to modify the lxml tree containing the targeted content to add a special formatting character to the <br> elements' .text or .tail, for example a \n, with one of lxml's iter() functions. This can be useful with other HTML block elements, like <hr> for example.
You may see better what I mean with some spider code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import lxml.etree
import lxml.html

MARKER = "|"

def br2nl(tree):
    for element in tree:
        for elem in element.iter("br"):
            elem.text = MARKER

def extract_category_lines(tree):
    if tree is not None and len(tree):
        # modify the tree by adding a MARKER after <br> elements
        br2nl(tree)
        # use lxml's .tostring() to get a unicode string
        # and split lines on the marker we added above
        # so we get lists of actors, producers, directors...
        return lxml.html.tostring(
            tree[0], method="text", encoding=unicode).split(MARKER)

class BoxOfficeMojoSpider(BaseSpider):
    name = "boxofficemojo"
    start_urls = [
        "http://www.boxofficemojo.com/movies/?id=actionjackson.htm",
        "http://www.boxofficemojo.com/movies/?id=cloudatlas.htm",
    ]

    # locate 2nd cell by text content of first cell
    XPATH_CATEGORY_CELL = lxml.etree.XPath('.//tr[starts-with(td[1], $category)]/td[2]')

    def parse(self, response):
        root = lxml.html.fromstring(response.body)

        # locate the "The Players" table
        players = root.xpath('//div[@class="mp_box"][div[@class="mp_box_tab"]="The Players"]/div[@class="mp_box_content"]/table')

        # we have only one table in "players" so the for loop is not really necessary
        for players_table in players:
            directors_cells = self.XPATH_CATEGORY_CELL(players_table,
                                                       category="Director")
            actors_cells = self.XPATH_CATEGORY_CELL(players_table,
                                                    category="Actor")
            producers_cells = self.XPATH_CATEGORY_CELL(players_table,
                                                       category="Producer")
            writers_cells = self.XPATH_CATEGORY_CELL(players_table,
                                                     category="Writer")
            composers_cells = self.XPATH_CATEGORY_CELL(players_table,
                                                       category="Composer")

            directors = extract_category_lines(directors_cells)
            actors = extract_category_lines(actors_cells)
            producers = extract_category_lines(producers_cells)
            writers = extract_category_lines(writers_cells)
            composers = extract_category_lines(composers_cells)

            print "Directors:", directors
            print "Actors:", actors
            print "Producers:", producers
            print "Writers:", writers
            print "Composers:", composers

            # here you should of course populate scrapy items
The code can be simplified for sure, but I hope you get the idea.
You can do similar things with HtmlXPathSelector of course (with the string() XPath function for example), but without modifying the tree for <br> (how would you do that with hxs?) it only works for non-multiple names in your case:
>>> hxs.select('string(//div[@class="mp_box"][div[@class="mp_box_tab"]="The Players"]/div[@class="mp_box_content"]/table//tr[contains(td, "Director")]/td[2])').extract()
[u'Craig R. Baxley']
>>> hxs.select('string(//div[@class="mp_box"][div[@class="mp_box_tab"]="The Players"]/div[@class="mp_box_content"]/table//tr[contains(td, "Actor")]/td[2])').extract()
[u'Carl WeathersCraig T. NelsonSharon Stone']
