What are prefuse data groups and how do I put nodes into them? - prefuse

Many components of prefuse seem to use a String group to identify some subset of data. How do you use groups in practice? Some documentation seems to imply that a single row of data can belong to multiple groups, but I cannot work out how to make this happen.
Ideally, I could put some nodes of a graph into multiple groups, but have them each visualized once, and apply various forces and layouts to them.

Indeed most prefuse components such as Layout, ColorAction or the RendererFactory use group names.
There are different types of groups:
groups created from raw data, e.g. by vis.addGraph(...)
focus groups that contain some items (= rows) from another group
decorator groups, e.g., for labels
aggregate groups that represent items merged to aggregated items
In order to put "some nodes of a graph into multiple groups" you can use focus groups.
Then some Action may be added to run only on the focus group.
Alternatively you could pass a Predicate to the Action, so that only items (= rows) matching the Predicate are handled by the Action.
I recommend looking at the demos to see how this works:
for example https://github.com/prefuse/Prefuse/blob/master/demos/prefuse/demos/ZipDecode.java

(Google Sheets) How to remove certain dropdown options after a certain number of cells with said option is met?

I'm currently working on a google sheets file to organize the members of my class. I am currently assigning committees and I want them to choose their committee in Google Sheets. However, I want to apply only a certain limit per committee.
What I want to happen is this: once a certain choice has been chosen, say, 5 times, it should disappear from the dropdown, and it should reappear if a student later changes their choice. However, I do not know how to do this with a formula or through data validation.
I would really appreciate your help. Thank you!
Here's a toy example you may be able to adapt to your needs:
Create a list of options a,b,c,d,e in A1:E1 of Sheet1
Create a list of the limits for each option in A2:E2 (for instance 2,1,3,5,3)
Create a list of people Person1,Person2,Person3 in G2:G4
Apply data validation to H2:H4:
Use criteria 'drop down (from a range)'
Set the data range to =Sheet1!$A3:$E3 (only lock columns, not rows)
In A3 enter the following formula:
=lambda(people,choices,list,limits,
makearray(counta(people),counta(list),lambda(r,c,
if(index(choices,r)<>index(list,,c),if(countif(choices,index(list,,c))<index(limits,,c),index(list,,c),),index(list,,c)))))(
$G$2:$G$4,$H$2:$H$4,$A$1:$E$1,$A$2:$E$2)
We are using MAKEARRAY to create a 2D array with the list of options on each line, but we ask it to omit an option from a line if that option is not the one already selected on that line AND the preset limit on the number of selections for that option has been reached. Obviously in a 'real' example you would place the data range for validation in a separate sheet and probably hide and protect that sheet as well. You could also potentially use an array literal of strings rather than a cell range as the list of options in order to make the validation list formula completely self-contained.
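To make the rule easier to check outside the spreadsheet, here is a minimal Python sketch of the same per-row logic. The function name available_options and the sample choices are made up for illustration. Unlike the sheet formula, which leaves a blank cell for an omitted option, this sketch simply drops it from the row.

```python
def available_options(choices, options, limits):
    """Return, for each person, the options their dropdown should offer.

    choices: current selection per person ("" when nothing is chosen yet),
             like H2:H4 in the toy sheet.
    options, limits: parallel lists, like A1:E1 and A2:E2.
    """
    # How many times each option is currently selected (the COUNTIF part).
    counts = {opt: choices.count(opt) for opt in options}
    rows = []
    for current in choices:
        row = [
            opt
            for opt, limit in zip(options, limits)
            # Keep the person's own current choice; otherwise keep only
            # options that still have room under their limit.
            if opt == current or counts[opt] < limit
        ]
        rows.append(row)
    return rows


options = ["a", "b", "c", "d", "e"]
limits = [2, 1, 3, 5, 3]
# Hypothetical state: Person1 chose b, Person2 nothing yet, Person3 chose a.
choices = ["b", "", "a"]
rows = available_options(choices, options, limits)
for person, row in zip(["Person1", "Person2", "Person3"], rows):
    print(person, row)
```

With these sample values, b has a limit of 1 and is already taken by Person1, so it stays in Person1's list but is dropped from the other two.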

Using PowerQuery, is there a way to view a sheet of different-sized groups of data as a single table?

I have a sheet of non-table groups and would like to view them as a single table.
Each group consists of 4 or 5 rows and 2+ columns with 1 or more blank rows/columns between.
Overall, the groups are organized into rows and columns on the sheet. There shouldn't be more than 3 groups in a row, but some rows may have blank spaces for future groups.
New groups and group columns are added regularly so existing groups can be relocated on the sheet anytime.
The group names are a unique combination of letters and a number. Unfortunately, they are not prefixed "Group" or numbered consecutively like in my example. Giving a generic example that conveys all criteria is harder than it looks 😅.
This is a shared document so I'd like to avoid making structural changes that would affect other users.
I've experimented with some of the transform options, but I'm new to PQ and didn't make much progress.
This answer to a similar question is a step in the right direction, but it looks like I'll need additional steps since my starting data isn't quite as consistent.
Thank you for your time.
Before Example
After Example

How to filter entries that are not duplicates of entries from other columns in Google Sheets?

I have a column called "Masterlist" which contains values from Lists 1, 2 and 3. It also contains values which are present only in Masterlist.
How can I filter them, as shown in the attached image, in Google Sheets?
EDIT: The lists will have more than one entry.
Solution 1
In E2, type in
=filter(A2:A,arrayformula(iserror(match(A2:A,B2:D2,0))))
Check the documentation of filter or match for how to use them. With match, be sure to include the third argument. That is an easy one to forget. arrayformula iterates a formula over a range. The output can be a range, in which case it will print over any un-written cells. When arrayformula interacts with match, it only iterates over the first argument, which is why this solution works.
EDIT: If you have a two-dimensional range to match to, you need to collapse them into a one-dimensional range using the concatenation operators such as
=filter(A2:A,arrayformula(iserror(match(A2:A,{B2:B4;C2:C4;D2:D4},0))))
You can also experiment with open-ended ranges (ranges without an ending row index, such as B2:B) and let Google Sheets pick the last row for you.
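If it helps to see the idea outside of Sheets, the filter/iserror/match combination amounts to "keep every Masterlist value that does not appear in any of the other lists". Below is a rough Python sketch of that logic. The function name and sample values are invented for illustration, and it uses exact, case-sensitive equality, which may differ from how MATCH compares text.

```python
def only_in_master(master, *other_lists):
    """Keep the values of master that appear in none of the other lists."""
    # Collapse all lists into one lookup set, like the {B2:B4;C2:C4;D2:D4}
    # array literal collapses three columns into one.
    seen = set()
    for values in other_lists:
        seen.update(values)
    # A value is kept when the lookup fails, like iserror(match(...)).
    return [value for value in master if value not in seen]


# Hypothetical sample data standing in for columns A to D.
masterlist = ["apple", "pear", "plum", "kiwi", "fig", "lime"]
list1 = ["apple"]
list2 = ["kiwi", "fig"]
list3 = ["pear"]

unique_to_master = only_in_master(masterlist, list1, list2, list3)
print(unique_to_master)
```

The result keeps the order of the Masterlist column, just as FILTER does.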
Solution 2
Use the native Filter View feature. Good for the scenarios where you don't need to separately print a list of the unique values in "masterlist".
Go to Data -> Create Filter View
Consult the relevant help pages for the details. I can see a few ways to implement what you want, including
filter by value on the same column (selecting the actual values manually);
filter by value on a "helper column" where you include a formula in the cells to check whether the content in "masterlist" belongs to the list you want to check against. You can use the match and iserror combo here;
custom formula using a similar formula as above.
If your column A, ie. the "masterlist", is something a user would add to, then Data Validation can be used to good effect in conjunction with Filter View.

TABLEAU: Create global filter from a secondary data source to multiple data sources on dashboard

I have a Tableau dashboard with various visualizations created from 3 data sources (i.e. A,B, C).
Each data source has a relationship (join) with the same secondary data source (i.e. D), and the secondary data source provides information to create a filter for each data source. In other words, there is the following relationship for my data sources:
A - D
B - D
C - D
I would like to create a global filter on a dashboard I have created. I would like one filter card from "D" to show up and be applied to "A," "B," and "C" at once rather than having a separate filter card show up for each data source.
I tried to create a global filter via a parameter and calculated field, but the parameter requires layers of connections because data sources "A,B, and C" only have "D" in common.
Thoughts?
It's not completely clear from your question, but it sounds like you are using Tableau data blending on your worksheets to include data from multiple data sources, rather than a join to create a data source based on multiple tables. If all your tables are on the same database server or spreadsheet, then traditional joins are usually more efficient than data blending.
The following approach often works well.
Instead of using Tableau's quick filter feature, create a worksheet based solely on D that shows the values you wish to use for filtering. It can be a simple list of names, or a bubble chart or anything you like. Use that worksheet as your filter by creating actions where it is the source and all the other worksheets on your dashboard are the target. Typically, you would want to specify the field names explicitly.
Data blending is useful but can be complex. Depending on details, you may need to make D the primary data source on your other worksheets. Experiment.
The parameter and calculated field you mentioned can be even simpler and faster than using actions, but users are restricted to selecting a single value for a parameter unlike the filter action approach. (Of course, one parameter value can represent multiple values in your target data source field depending entirely on how your calculated field interprets the parameter).
I can't tell why that didn't work for you or what you mean by "layers of connections". You might consider clarifying that part of your question.

Hadoop map-reduce : Order of records while grouping

I have a record in each line of input and each record has around 10 fields. First, I group the records by three fields (field1, field2, field3) thus one mapper/reducer is responsible for one unique group (based on the three fields). Within each group, I sort the records based on another integer field timestamp and I tag each record in the group with the same tag aTag by adding another field.
Let's say that in mapper#1, I tag a sorted group as aTag and in mapper#2, I tag another group (a different group because I initially grouped the records based on the three fields) with the same tag aTag.
Now, if I group the records based on the tag field (i.e., grouping the groups in different mappers), I notice that the ordering within each group is no longer preserved. I was expecting that since each mapper has a group with all records having the same tag, grouping by the tag name should just involve getting the relevant groups from other mappers and just concatenating them without re-ordering each individual group.
Is it because I am trying to store the records in gzip format and hence it tries to re-order the records for better compression? Also I would like to know how to preserve the order after grouping by the tag name.
It seems that you are trying to implement the sort step of MapReduce yourself in local memory, but the framework then completely ignores what you did and re-sorts the items in each group anyway. The proper way to fix this would be to specify a comparator on the keys, so that within each partition the merged input to the reducer is ordered according to that comparison function. This means that
You don't have to do the sorting yourself
You don't run out of memory on one machine trying to sort a really large group.
In your case, it seems that you'd want to add timestamp to the set of keys, tell it to partition on the first three keys, and tell it to sort on the timestamp.
For more information, see the following diagram, and Where is Sort used in MapReduce phase and why?
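This pattern is commonly called a secondary sort. In Hadoop itself you would configure it on the Job with a partitioner, a sort comparator, and a grouping comparator. The following is only a small Python simulation of the idea, not Hadoop code. The helper names and sample records are made up, and the field names follow the question.

```python
from itertools import groupby
from zlib import crc32


def natural_key(record):
    """The three grouping fields from the question."""
    return (record["field1"], record["field2"], record["field3"])


def partition(record, num_reducers):
    # Partition on the natural key only, so every timestamp of a group
    # lands on the same (simulated) reducer.
    return crc32(repr(natural_key(record)).encode()) % num_reducers


def shuffle_and_reduce(records, num_reducers=2):
    """Simulate the shuffle: partition, sort by composite key, then group."""
    buckets = [[] for _ in range(num_reducers)]
    for record in records:
        buckets[partition(record, num_reducers)].append(record)

    output = {}
    for bucket in buckets:
        # Sort on the composite key: natural key first, then timestamp.
        bucket.sort(key=lambda r: (natural_key(r), r["timestamp"]))
        # Group on the natural key only; values arrive already ordered.
        for key, group in groupby(bucket, key=natural_key):
            output[key] = [r["timestamp"] for r in group]
    return output


# Hypothetical input: two groups with timestamps out of order.
records = [
    {"field1": "x", "field2": "y", "field3": "z", "timestamp": 30},
    {"field1": "p", "field2": "q", "field3": "r", "timestamp": 7},
    {"field1": "x", "field2": "y", "field3": "z", "timestamp": 10},
    {"field1": "p", "field2": "q", "field3": "r", "timestamp": 3},
    {"field1": "x", "field2": "y", "field3": "z", "timestamp": 20},
]
result = shuffle_and_reduce(records)
print(result)
```

The point of the design is that no group is ever sorted in memory by your own code: the ordering falls out of the framework's sort on the composite key, while grouping still happens on the first three fields.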
