Can someone explain how to read these diagrams? I understand the flow from head to tail, but I am specifically wondering about how to read the field (bracket) transitions between ellipses (Pipes/Taps).
By way of example using the Fields following the Every Pipe in the image, the way I have been able to interpret these is the first Field set i.e. [{2}:'token', 'count'] is what goes into the next Pipe/Tap, but what is the significance of the second Field set [{1}: 'token']?
Is this the field set that went into the previous Pipe above? Is there a programmatic significance to the second bracket i.e. are we able to access it within that pipe with particular Cascading code? (In the case where the second Fields set is greater than the first)
(source: cascading.org)
The second field set represents which fields are available for subsequent operations in that map or reduce.
In your example above, in the reduce step, since you grouped by 'token', only 'token' is available for subsequent aggregations (Everys) in that reduce step. You could, for example, add another aggregation which outputs the average token length, but you could not yet use an aggregation which utilizes the 'count'.
The reason for this behaviour is that subsequent aggregations on the same group happen in parallel. Thus, the Count will not have completed in time to feed into any other aggregations you chain on.
Referencing this example from Practical Gremlin and this stack overflow post:
Gremlin Post Filter Path
g.withSack(0).V().
has('code','AUS').
repeat(out().simplePath().has('country',within('US','UK')).
choose(has('code','MAN'),sack(sum).by(constant(1)))).
until(has('code','EDI')).
where(sack().is(1)).
path().by('code').
limit(10)
Is it possible to perform a sack sum in such a way as to only sum the first time a property value is found? For instance, instead of the 'code' property inside of the choose(), which will only sum once per 'code' encountered thanks to the simplePath(), what if there was another property called 'airport_color'? As we perform the traversal, I would only want the sack sum to increment the first time it encountered 'blue' or 'white', for example, even though multiple airports could have the same color as we go through the traversal. This would help me in the where() clause: if I had a couple of colors I was interested in (say blue and white), I could set the where() clause to be equal to two and know that two wasn't arrived at just because I passed through blue twice, but because both blue and white were encountered.
I tried using aggregation to make the sack sum increment only on the first encounter but couldn't get it to work, something like this:
g.withSack(0).V().
has('code','AUS').
repeat(out().simplePath().has('country',within('US','UK')).
choose(has('airport_color','blue').has('airport_color', without('airport_color_agg')),sack(sum).by(constant(1))).
aggregate('airport_color_agg').by('airport_color')).
until(has('code','EDI')).
where(sack().is(1)).
path().by('code').
limit(10)
There could be multiple colors in the choose() via or() but I limited it to just one to keep the example more straightforward.
Thanks for your help!
Using the sample graph below:
g.addV('A').property(id,'a1').property('color','red').as('a1').
addV('A').property(id,'a2').property('color','blue').as('a2').
addV('A').property(id,'a3').property('color','red').as('a3').
addV('A').property(id,'a4').property('color','yellow').as('a4').
addV('A').property(id,'a5').property('color','green').as('a5').
addV('A').property(id,'a6').property('color','blue').as('a6').
addE('R').from('a1').to('a2').
addE('R').from('a2').to('a3').
addE('R').from('a3').to('a4').
addE('R').from('a4').to('a5').
addE('R').from('a5').to('a6')
we can inspect the path through the graph as follows:
g.V('a1').
repeat(out()).
until(not(out())).
path().
by('color')
which shows us the colors found
path[red, blue, red, yellow, green, blue]
the next thing we need to do is to remove the duplicates and filter out the colors not in our want list.
g.withSideEffect('want',['red','green']).
V('a1').
repeat(out()).
until(not(out())).
path().
by('color').
dedup(local).
unfold().
where(within('want'))
which gives us:
red
green
finally, we just need to count them:
g.withSideEffect('want',['red','green']).
V('a1').
repeat(out()).
until(not(out())).
path().
by('color').
dedup(local).
unfold().
where(within('want')).
count()
Which, as expected, gives us:
2
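As a rough illustration of what the dedup(local), unfold(), where(within('want')) and count() steps compute, here is the same logic in plain JavaScript. This is not Gremlin, only a sketch over the colors from the sample path; countWantedColors is a hypothetical helper name.

```javascript
// Colors along the sample path, as shown by path().by('color').
const pathColors = ["red", "blue", "red", "yellow", "green", "blue"];
const want = ["red", "green"];

// Hypothetical helper (not a Gremlin API): counts the distinct colors on a
// path that also appear in the want list.
function countWantedColors(colors, wanted) {
  const distinct = [...new Set(colors)]; // like dedup(local)
  return distinct.filter((c) => wanted.includes(c)).length; // like where(within('want')).count()
}

console.log(countWantedColors(pathColors, want)); // 2
```

Because the duplicates are removed before counting, passing through the same color twice never counts twice.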
UPDATED 2022-09-01 To reflect discussion in comments.
To change the query so that only paths that visit each of the required colors at least once are returned, the previous steps leading up to the count need to be turned into a filter.
g.withSideEffect('want',['red','green']).
V('a1').
repeat(out()).
until(not(out())).
filter(
path().
by('color').
dedup(local).
unfold().
where(within('want')).
count().is(2)).
path()
which for our sample graph returns:
path[v[a1], v[a2], v[a3], v[a4], v[a5], v[a6]]
The query as written gets the job done and hopefully is not too hard to follow. There are some things we could change/improve:
- Pass in the count for the is step as another parameter like the want list.
- Rather than pass in a parameter, use the size of the want list instead. This makes the query a little more complex.
- Use a sack as the query proceeds to collect the colors seen. The query gets more complex in that case as, while you can maintain a list in a sack, updating it requires a few extra steps.
Here is the query rewritten to use a sack. This assumes the start node has a color that should be included. If that is not the case, the first sack(assign) can be removed and a withSack([]) added after the withSideEffect.
g.withSideEffect('want',['red','green']).
V('a1').
sack(assign).by(values('color').fold()).
repeat(out().sack(assign).by(union(sack().unfold(),values('color')).fold())).
until(not(out())).
filter(
sack().
unfold().
dedup().
where(within('want')).
count().is(2)).
path()
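To make the idea of taking the threshold from the size of the want list (rather than hard-coding 2) more concrete, here is a plain JavaScript sketch of what the sack-based filter checks. It is not Gremlin: collectColors and visitsAllWanted are hypothetical helper names, and the array of colors stands in for the a1..a6 chain of the sample graph.

```javascript
// Colors of vertices a1..a6 in the sample graph, in traversal order.
const colorsAlongPath = ["red", "blue", "red", "yellow", "green", "blue"];

// Mimics the sack: the list grows by one color per vertex visited
// (hypothetical helper, not part of Gremlin).
function collectColors(colors) {
  let sack = [];
  for (const color of colors) {
    sack = [...sack, color];
  }
  return sack;
}

// Mimics the final filter: dedup, keep only wanted colors, then compare the
// count with the size of the want list instead of a hard-coded number.
function visitsAllWanted(sack, want) {
  const wanted = new Set(want);
  const hits = [...new Set(sack)].filter((c) => wanted.has(c));
  return hits.length === wanted.size;
}

console.log(visitsAllWanted(collectColors(colorsAlongPath), ["red", "green"])); // true
console.log(visitsAllWanted(["blue", "blue"], ["blue", "white"])); // false
```

The second call shows the case from the question: seeing blue twice does not satisfy a want list of blue and white.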
Writing a script in LR for Siebel Open UI. All my requests contain this parameter, with different values. What does it mean?
Examples (from different requests):
"Name=SWEIPS", Value = #0'0'1'0'GetProfileAttr'3'attrName'SBRF Position Id'"
"Name=SWEIPS", Value = #0'0''0'3'1-SQE21A, 1-SQL21E, 1SQE31"
And so on.
Can I simply delete it?
Can I simply delete it? - No, you’re not supposed to delete it.
Compare the SWEIPS values by recording twice or thrice with different data sets, and check whether there are any date/time values in SWEIPS. If there is nothing to correlate, leave it as it is; there is no need to delete it.
Be sure to correlate values such as SWET, ROWID, SWECount, SWEC and so on.
Still using DC.JS to get some analysis tools written for our tool performance. Thanks so much for having this library available.
I am trying to show which recipe setup times are the worst for a given set of data. Everything works great as long as you show the whole group. When you only display the specified topN using .rowsCap on the rowChart, the following happens:
The chart will show the right number of bars and they are even sorted properly but the chart has picked the topN unfiltered bars first and then ordered them. I want it to pick the topN from the ordered list, not the other way around. See jsfiddle for demo. (http://jsfiddle.net/za8ksj45/24/)
In the fiddle, the longest setup time belongs to recipeD. But if you have more than two recipes selected before recipeD, it is dropped off the right (top2) chart.
line 099-110: reductio definition
line 120-140: removal of empty bins (works okay)
(This is very similar to a problem Gordon helped resolve earlier (dc.js rowChart topN without zeros) and I reused the code from that solution. Something went 'wrong' when I combined it with the reductio.js library.)
I think I am not returning the value portion of the reductio group somewhere but have been unable to figure it out. Any help would be appreciated.
The issue is that at the time you .slice(0,n) the group in your function to remove empty bins, the group is not ordered, so you effectively get a random 2 groups, not the top 2 groups. This is actually clear from the unfiltered view, as the "top2" view shows the 2nd and 3rd group from the "all" view, not the actual top 2 (at least for me).
The previous example worked because Crossfilter's standard groups are ordered by default, but in the case of a complex group like the one you are generating with Reductio, what should it order by? There's no way it can know, so Reductio doesn't mess with the ordering at all, which I suppose means it is ordering by the value property, which is an object.
You need to add one line to order your FactsByRecipe group by average and I think it should fix your problem:
FactsByRecipe.order(function(d) { return d.avg; });
Note that there can only be one ordering on a Crossfilter group, so if you want to show "top X" for more than one property of that group you'll need to create another wrapper (like the remove empty bins wrapper) but have the "top" function re-sort the group by the ordering you want.
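A minimal sketch of such a wrapper is shown below. It assumes each bin's value is an object with count and avg properties (as a Reductio group with count and avg enabled would produce) and that the chart reads its capped rows through top(n). The names removeEmptyAndSortBy and fakeGroup are hypothetical, and fakeGroup is only a stand-in for a real Crossfilter group.

```javascript
// Hypothetical wrapper: drops empty bins and makes top(n) sort by a chosen
// accessor before slicing, so the slice really is the "top n".
function removeEmptyAndSortBy(sourceGroup, accessor) {
  const nonEmpty = () => sourceGroup.all().filter((d) => d.value.count > 0);
  return {
    all: nonEmpty,
    top: function (n) {
      // filter() returns a new array, so sorting here does not touch the group
      return nonEmpty()
        .sort((a, b) => accessor(b.value) - accessor(a.value))
        .slice(0, n);
    },
  };
}

// Stand-in for a Crossfilter group; all() returns bins in key order.
const fakeGroup = {
  all: () => [
    { key: "recipeA", value: { count: 3, avg: 12 } },
    { key: "recipeB", value: { count: 2, avg: 20 } },
    { key: "recipeC", value: { count: 0, avg: 0 } },
    { key: "recipeD", value: { count: 1, avg: 45 } },
  ],
};

const byAvg = removeEmptyAndSortBy(fakeGroup, (v) => v.avg);
console.log(byAvg.top(2).map((d) => d.key)); // [ 'recipeD', 'recipeB' ]
```

Slicing before sorting would have returned recipeA and recipeB here, which is the symptom described in the question.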
Good luck!
Been looking all over the place for a solution to this issue. I have a Yahoo Pipe (http://pipes.yahoo.com/pipes/pipe.info?_id=e5420863cfa494ee40e4c9be43f0e812) that I've created to pull back image content from the Bing Search API. The URL builder includes a $skip attribute that takes an integer and uses it to select the starting (index) point for the result set that the query returns.
My initial plan had been to use the math engine in the Wolfram Alpha API to generate a random number (randomInteger[1000]) that I could use to seed the $skip value each time that the pipe is run. I have an earlier version of the pipe where I was able to get the query / result steps working using either "XPath Fetch" or "Fetch Data". However, regardless of how I fetch the result, the response returns as an attribute / value pair in a list item. Even when I use "Emit items as string" in XPath Fetch, I still get a list with a single item, when what I really want is the integer that I can plug into my $skip attribute.
I've tried everything in Pipes I can think of, and spent a lot of time online looking for an answer. Is there any way to extract text (in this case, a number) from a single list item and then use the output as input to "wire" a text parameter in another Pipes block? Any suggestions / ideas welcome. In the meantime, I'm generating a sorta-random number by manipulating a timecode hash, but it just feels tacky :-)
Thanks!
All the sources are for repeated items. You can't have a source that just makes a single number.
I'm not really clear what you're trying to do. You want to put a random number into part of the URL string that gets an RSS feed?
I need to count how many times a particular node occurs in a document based on the values of two of its attributes. So, given the following small sample of XML:
<p:entry timestamp="2012-11-15T17:53:34.642-05:00" ticks="89709622449012" system="OSD" component="OSD5" marker=".\Launcher.cpp:1741" severity="Info" type="Driver" subtype="Start" tags="" sensitivity="false">
This can occur one or more times in the document with different attribute sets. I need to count how many show up with type="Driver" AND subtype="Start". I am able to count how many just have type="Driver" using:
count(//p:entry[@type="Driver"])
but haven't been able to combine them. This didn't work:
count(//p:entry[@type="Driver" and @subtype="Start"])
This works for the OP. Specifying two predicates in succession has the same effect as combining them with the and operator:
count(//p:entry[@type="Driver"][@subtype="Start"])
By rights, the original expression count(//p:entry[@type="Driver" and @subtype="Start"]) should also work, as far as I know.