Determine unique values across multiple arrays (with `d3.nest`) - d3.js

I have a large dataset in which each entry has this shape:
{
id: 'foo',
name: 'bar',
tags: ['baz', 'qux']
}
I know how to find, say, all unique names in my dataset using d3.nest:
d3.nest()
.key(d => d.name)
.rollup(d => d[0])
.entries(data)
.map(d => d.key);
How can I find all unique tags in my dataset, preferably using d3.nest()? I could roll my own reducer, but would prefer to stick to d3 paradigms if possible.

Ok, sometimes it's better to just skip the library and roll your own answer. It's as simple as:
let allTags = Object.keys(data.reduce((acc, d) => {
d.tags.forEach(n => acc[n] = true);
return acc;
}, {}));
Maybe that will help someone in the future.
¯\_(ツ)_/¯
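For comparison, here is a minimal sketch of the same idea using a `Set`. It assumes an environment with `Array.prototype.flatMap` (ES2019), and the sample `data` is hypothetical, made up only to match the shape in the question.

```javascript
// Hypothetical sample data in the shape described in the question.
const data = [
  { id: 'a', name: 'bar', tags: ['baz', 'qux'] },
  { id: 'b', name: 'bar', tags: ['qux', 'quux'] },
  { id: 'c', name: 'foo', tags: [] },
];

// Flatten every tags array into one list, then let Set drop the duplicates.
// A Set keeps insertion order, so tags appear in first-seen order.
const uniqueTags = [...new Set(data.flatMap(d => d.tags))];

console.log(uniqueTags);
```

Unlike the object-key version, this keeps non-string tags (numbers, for example) as they are instead of coercing them to strings.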

Related

RxJS logic which solves a filter/merge issue

This is more of a logic problem than an RxJS problem, I guess, but I can't work out how to solve it.
[input 1]
From a cities stream, I will receive 1 or 2 objects (cities1 or cities2 are test fixtures).
1 object if there is only one language available, 2 objects for a city with both languages.
[input 2]
I do also have a selectedLanguage ("fr" or "nl")
[algo]
If the language of the object matches the selectedLanguage, I pluck the city. This works in my RxJS code when I receive 2 objects (cities2).
But since I can also receive just 1 object, the filter is not the right thing to do.
[question]
Should I FIRST check whether the cities stream holds only one object and, if so, add another object? Or are there better RxJS/logical options?
const cities1 = [
{city: "LEUVEN", language: "nl"}
];
const cities2 = [
{city: "BRUSSEL", language: "nl"},
{city: "BRUXELLES", language: "fr"}
];
const selectedLang = "fr"
const source$ = from(cities1);
const result = source$.pipe(
mergeMap((city) => {
return of(selectedLang).pipe(
map(lang => {
return {
lang: city.language,
city: city.city,
selectedLang: lang
}
}),
filter(a => a.lang === selectedLang),
pluck('city')
)
}
)
);
result.subscribe(console.log)
If selectedLang is not an observable (i.e. you don't want this to change) then I think it would make it way easier if you keep it as a value:
const result = source$.pipe(
filter(city => city.language === selectedLang),
map(city => city.city)
);
There's nothing wrong with using external parameters, and it makes the stream easier to read.
Now, if selectedLang is an observable, and you want result to always give the city with that selectedLang, then you probably need to combine both streams, while keeping all the cities received so far:
const selectedLang$ = of(selectedLang); // This is actually a stream that can change value
const cities$ = source$.pipe(
scan((acc, city) => [...acc, city], [])
);
const result = combineLatest([selectedLang$, cities$]).pipe(
map(([selectedLang, cities]) => cities.find(city => city.language === selectedLang)),
filter(found => Boolean(found)),
map(city => city.city)
);
Edit: note that this result will emit every time cities$ or selectedLang$ changes and one of the cities matches. If you don't want repeats, you can use the distinctUntilChanged() operator - Probably this could be optimised using an exhaustMap or something, but it makes it harder to read IMO.
Thanks for your response. It's of great value to me. Indeed, I will forget about selectedLang$ and pass it as a regular string. Problem 1 solved.
I'll explain my question in a bit more detail. My cities$ observable is in fact a GET and will always return 1 or 2 rows.
leuven:
[ { city: 'LEUVEN', language: 'nl', selectedLanguage: 'fr' } ]
brussel:
[
{ city: 'BRUSSEL', language: 'nl', selectedLanguage: 'fr' },
{ city: 'BRUXELLES', language: 'fr', selectedLanguage: 'fr' }
]
When it returns two rows, I am able to filter out the right value:
filter(city => city.language === selectedLang) => BRUXELLES when selectedLang is "fr"
But when I receive only one row, I should always return that city.
What is the best solution to this without using if statements? I've been trying to work with object destructuring and scanning the array, but the result is always one record.
// HTTP get
const leuven: City[] = [ {city: "LEUVEN", language: "nl"} ];
// same HTTP get
const brussel: City[] = [ {city: "BRUSSEL", language: "nl"},
{city: "BRUXELLES", language: "fr"}
];
mapp(of(brussel), "fr").subscribe(console.log);
function mapp(cities$: Observable<City[]>, selectedLanguage: string): Observable<any> {
return cities$.pipe(
map(cities => {
return cities.map(city => { return {...city, "selectedLanguage": selectedLanguage }}
)
}),
// scan((acc, value) => [...acc, { ...value, selectedLanguage} ])
)
}
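The follow-up above asks for the matching city when two rows arrive and the only city when one row arrives. One possible sketch, assuming plain arrays shaped like the fixtures, is a `find` with a fallback to the first row. `pickCity` is a hypothetical helper name, not part of RxJS.

```javascript
// Return the city name for the selected language. If no row matches
// (for example, only one row came back, in another language), fall back
// to the first row. An empty array yields undefined.
function pickCity(cities, selectedLanguage) {
  const match = cities.find(c => c.language === selectedLanguage);
  return (match ?? cities[0])?.city;
}

// Fixtures copied from the question.
const leuven = [{ city: 'LEUVEN', language: 'nl' }];
const brussel = [
  { city: 'BRUSSEL', language: 'nl' },
  { city: 'BRUXELLES', language: 'fr' },
];

console.log(pickCity(leuven, 'fr'));
console.log(pickCity(brussel, 'fr'));
```

Inside the pipeline this could be used as `map(cities => pickCity(cities, selectedLanguage))`, which keeps the branching out of the stream itself.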

Splitting long piped chains into sub-chains without losing information about parameters passed between sub-chains

I have a long chain of operations within a pipe. Sub-parts of this chain represent some sort of high level operation. So, for instance, the code could look something like
firstObservable().pipe(
// FIRST high level operation
map(param_1_1 => doStuff_1_1(param_1_1)),
concatMap(param_1_2 => doStuff_1_2(param_1_2)),
concatMap(param_1_3 => doStuff_1_3(param_1_3)),
// SECOND high level operation
map(param_2_1 => doStuff_2_1(param_2_1)),
concatMap(param_2_2 => doStuff_2_2(param_2_2)),
concatMap(param_2_3 => doStuff_2_3(param_2_3)),
)
To improve readability of the code, I can refactor the example above as follows
firstObservable().pipe(
performFirstOperation(),
performSecondOperation(),
)
performFirstOperation() {
return pipe(
map(param_1_1 => doStuff_1_1(param_1_1)),
concatMap(param_1_2 => doStuff_1_2(param_1_2)),
concatMap(param_1_3 => doStuff_1_3(param_1_3)),
)
}
performSecondOperation() {
return pipe(
map(param_2_1 => doStuff_2_1(param_2_1)),
concatMap(param_2_2 => doStuff_2_2(param_2_2)),
concatMap(param_2_3 => doStuff_2_3(param_2_3)),
)
}
Now, the whole thing works, and I personally find the code in the second version more readable. What I lose, though, is the information that performFirstOperation() returns a parameter, param_2_1, which is then used by performSecondOperation().
Is there a different strategy for breaking up a long pipe chain without losing the information about the parameters passed from sub-pipe to sub-pipe?
Setting aside the improper usage of forkJoin here, if you want to preserve that data, you should set things up a little differently:
firstObservable().pipe(
map(param_1_1 => doStuff_1_1(param_1_1)),
switchMap(param_1_2 => doStuff_1_2(param_1_2)),
// forkJoin(param_1_3 => doStuff_1_3(param_1_3)), this isn't an operator
concatMap(param_2_1 => {
const param_2_2 = doStuff_2_1(param_2_1); // run this sync operation inside
return doStuff_2_2(param_2_2).pipe(
concatMap(param_2_3 => doStuff_2_3(param_2_3)),
map(param_2_4 => ([param_2_1, param_2_4])) // add inner map to gather data
);
})
)
This way you've built your second pipeline inside your higher-order operator, so you can preserve the data from the first set of operations and gather it with an inner map once the second set of operations has concluded.
For readability, you could do something like what you had:
firstObservable().pipe(
performFirstOperation(),
performSecondOperation(),
)
performFirstOperation() {
return pipe(
map(param_1_1 => doStuff_1_1(param_1_1)),
switchMap(param_1_2 => doStuff_1_2(param_1_2)),
// forkJoin(param_1_3 => doStuff_1_3(param_1_3)), this isn't an operator
)
}
performSecondOperation() {
return pipe(
concatMap(param_2_1 => {
const param_2_2 = doStuff_2_1(param_2_1);
return doStuff_2_2(param_2_2).pipe(
concatMap(param_2_3 => doStuff_2_3(param_2_3)),
map(param_2_4 => ([param_2_1, param_2_4]))
);
})
)
}
An alternative solution would involve multiple subscribers:
const pipe1$ = firstObservable().pipe(
performFirstOperation(),
share() // don't repeat this part for all subscribers
);
const pipe2$ = pipe1$.pipe(performSecondOperation());
Then you could subscribe to each pipeline independently.
I broke one complex operation into two like this:
Main Code
dataForUser$ = this.userSelectedAction$
.pipe(
// Handle the case of no selection
filter(userName => Boolean(userName)),
// Get the user given the user name
switchMap(userName =>
this.performFirstOperation(userName)
.pipe(
switchMap(user => this.performSecondOperation(user))
))
);
First Operation
// Maps the data to the desired format
performFirstOperation(userName: string): Observable<User> {
return this.http.get<User[]>(`${this.userUrl}?username=${userName}`)
.pipe(
// The query returns an array of users, we only want the first one
map(users => users[0])
);
}
Second Operation
// Merges with the other two streams
performSecondOperation(user: User) {
return forkJoin([
this.http.get<ToDo[]>(`${this.todoUrl}?userId=${user.id}`),
this.http.get<Post[]>(`${this.postUrl}?userId=${user.id}`)
])
.pipe(
// Map the data into the desired format for display
map(([todos, posts]) => ({
name: user.name,
todos: todos,
posts: posts
}) as UserData)
);
}
Notice that I used another operator (switchMap in this case), to pass the value from one operator method to another.
I have a blitz here: https://stackblitz.com/edit/angular-rxjs-passdata-deborahk

Add/modify text between parentheses

I'm trying to build a classified text, and I'm having a problem turning
(class1 (subclass1) (subclass2 item1 item2))
into
(class1 (subclass1 item1) (subclass2 item1 item2))
I have no idea how to turn the first text into the second without caching subclass1 in memory. I'm using Perl on Linux, so any solution using a shell script or Perl is welcome.
Edit: I've tried using grep, saving the whole of subclass1 in a variable, then modifying it and exporting it to the list; but the list may get larger, and that approach will use a lot of memory.
I have no idea to turn text above to below one
The general approach:
Parse the text.
You appear to have lists of space-separated lists and atoms. If so, the result could look like the following:
{
type => 'list',
value => [
{
type => 'atom',
value => 'class1',
},
{
type => 'list',
value => [
{
type => 'atom',
value => 'subclass1',
},
]
},
{
type => 'list',
value => [
{
type => 'atom',
value => 'subclass2',
},
{
type => 'atom',
value => 'item1',
},
{
type => 'atom',
value => 'item2',
},
],
}
],
}
It's possible that something far simpler could be generated, but you were light on details about the format.
Extract the necessary information from the tree.
You were light on details about the data format, but it could be as simple as the following if the above data structure was created by the parser:
my $item = $tree->{value}[2]{value}[1]{value};
Perform the required modifications.
You were light on details about the data format, but it could be as simple as the following if the above data structure was created by the parser:
my $new_atom = { type => 'atom', value => $item };
push @{ $tree->{value}[1]{value} }, $new_atom;
Serialize the data structure.
For the above data structure, you could use the following:
sub serialize {
my ($node) = @_;
return $node->{type} eq 'list'
? "(".join(" ", map { serialize($_) } @{ $node->{value} }).")"
: $node->{value};
}
Other approaches could be available depending on the specifics.
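The parsing step is the one part the steps above leave unwritten. Below is a rough end-to-end sketch of all four steps, written in JavaScript rather than Perl. It assumes the input is only parentheses and whitespace-separated atoms (no quoting or escapes), and `parse` is a hypothetical helper that builds the same `type`/`value` tree shown earlier.

```javascript
// Step 1: parse text like "(a (b) (c d))" into a { type, value } tree.
function parse(text) {
  const tokens = text.match(/[()]|[^\s()]+/g) ?? [];
  let pos = 0;

  function readNode() {
    const token = tokens[pos++];
    if (token === undefined) throw new Error('Unexpected end of input');
    if (token === ')') throw new Error('Unexpected ")"');
    if (token !== '(') return { type: 'atom', value: token };

    const children = [];
    while (tokens[pos] !== ')') {
      if (pos >= tokens.length) throw new Error('Missing ")"');
      children.push(readNode());
    }
    pos++; // consume the closing ")"
    return { type: 'list', value: children };
  }

  const tree = readNode();
  if (pos !== tokens.length) throw new Error('Trailing input');
  return tree;
}

// Step 4: turn the tree back into text.
function serialize(node) {
  return node.type === 'list'
    ? '(' + node.value.map(serialize).join(' ') + ')'
    : node.value;
}

const tree = parse('(class1 (subclass1) (subclass2 item1 item2))');
const item = tree.value[2].value[1].value;               // Step 2: extract
tree.value[1].value.push({ type: 'atom', value: item }); // Step 3: modify
const output = serialize(tree);
console.log(output);
```

This still holds the whole tree in memory, so it addresses the parsing question but not the memory concern for very large inputs.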

Using subscribe on a GroupedObservable in map

I have an array of objects like the following:
private questions: Question[] = [
{
title: "...",
category: "Technologie",
answer: `...`
},
{
title: "...",
category: "Technologie",
answer: `...`
},
{
title: "...",
category: "eID",
answer: `...`
}
];
And I would like to group them by categories, filter them based on a value and return the result as an array. Currently, I'm using this:
Observable
.from(this.questions)
.groupBy(q => q.category)
.map(go =>
{
let category: Category = { title: go.key, questions: [] };
go.subscribe(d => category.questions.push(d));
return category;
})
.filter(c => c.title.toLowerCase().indexOf(value.toLowerCase()) >= 0 || c.questions.filter(q => q.title.toLowerCase().indexOf(value.toLowerCase()) >= 0).length > 0)
.toArray()
This finds the question with the value in the category title but not the one with the value in the question title. I think that's because I'm using a subscribe in map, therefore, the questions are not yet available in the filter method, so I was wondering if there's a possibility to wait for the subscribe to end before going into filter. My research pointed me to flatMap but I can't get it to do what I want.
EDIT
I figured out that I can fix the issue like this:
Observable
.from(this.questions)
.filter(q => q.category.toLowerCase().indexOf(value.toLowerCase()) >= 0 || q.title.toLowerCase().indexOf(value.toLowerCase()) >= 0)
.groupBy(q => q.category)
.map(go =>
{
let category: Category = { title: go.key, questions: [] };
go.subscribe(d => category.questions.push(d));
return category;
})
.toArray()
But I'm still interested in the answer.
When you use groupBy, you get a grouped observable that can be flattened with operators like concatMap, mergeMap, switchMap etc. Within those operators, grouped observables can be transformed separately for each category, i.e. collect the questions together into an array with reduce, and then create the desired object with map.
Observable
.from(questions)
.groupBy(q => q.category)
.mergeMap(go => {
return go.reduce((acc, question) => { acc.push(question); return acc; }, [])
.map(questions => ({ title: go.key, questions }));
})
.filter(c => "...")
.toArray()
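Because the questions already live in a plain array, the same group-then-filter logic can also be sketched without observables. This is an illustration only: `groupAndFilter` is a hypothetical helper, the sample titles are invented, and the filter mirrors the one in the question (keep a category if its title or any of its question titles contains the search value).

```javascript
function groupAndFilter(questions, value) {
  const needle = value.toLowerCase();

  // Group the questions by category, preserving first-seen category order.
  const byCategory = new Map();
  for (const q of questions) {
    if (!byCategory.has(q.category)) byCategory.set(q.category, []);
    byCategory.get(q.category).push(q);
  }

  // Build the category objects, then filter once every group is complete.
  return [...byCategory]
    .map(([title, items]) => ({ title, questions: items }))
    .filter(c =>
      c.title.toLowerCase().includes(needle) ||
      c.questions.some(q => q.title.toLowerCase().includes(needle)));
}

// Hypothetical sample data.
const sampleQuestions = [
  { title: 'What is a chip?', category: 'Technologie', answer: '...' },
  { title: 'How do I log in?', category: 'Technologie', answer: '...' },
  { title: 'Where is my card?', category: 'eID', answer: '...' },
];

console.log(groupAndFilter(sampleQuestions, 'card'));
```

The key point is the same as in the observable version: filtering happens only after each group has been fully collected.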

Mongoid Complex Query Including Embedded Docs

I have a model with several embedded models. I need to query for a record to see if it exists. The issue is that I will have to reference multiple embedded documents; my query would have to include the following params:
{
"first_name"=>"Steve",
"last_name"=>"Grove",
"email_addresses"=>[
{"type"=>"other", "value"=>"steve@stevegrove.com", "primary"=>"true"}
],
"phone_numbers"=>[
{"type"=>"work_fax", "value"=>"(720) 555-0631"},
{"type"=>"home", "value"=>"(303) 555-1978"}
],
"addresses"=>[
{"type"=>"work", "street_address"=>"6390 N Main Street", "city"=>"Elbert", "state"=>"CO"}
],
}
How can I query for all the embedded docs even though some fields are missing such as _id and associations?
A few things to think about.
Are you sure the query HAS to contain all these parameters? Is there not a subset of this information that uniquely identifies the record, say first_name, last_name, and an email_addresses.value? It would be silly to query on all the conditions if you could accomplish the same thing with less work.
In Mongoid, the where criteria allows you to use straight JavaScript, so if you know how to write the criteria in JavaScript you could just pass a string of JavaScript to where.
Otherwise you're left writing a really awkward where criteria statement; thankfully, you can use dot notation.
Something like:
UserProfile.where(first_name: "Steve",
last_name: "Grove",
:email_addresses.matches => {type: "other",
value: "steve@stevegrove.com",
primary: "true"},
..., ...)
In response to the request for embedded JS:
query = %{
function () {
var email_match = false;
for(var i = 0; i < this.email_addresses.length && !email_match; i++){
email_match = this.email_addresses[i].value === "steve@stevegrove.com";
}
return this.first_name === "Steve" &&
this.last_name === "Grove" &&
email_match;
}
}
UserProfile.where(query).first
It's not pretty, but it works
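The loop inside that function could probably be shortened with `Array.prototype.some`. Here is a sketch of the predicate on its own, assuming the server's JavaScript engine supports ES5 array methods. `matchesProfile` is a hypothetical name, and the sample document is invented for the demonstration.

```javascript
// Same check as the function above: `this` is the document being tested.
function matchesProfile() {
  return this.first_name === 'Steve' &&
    this.last_name === 'Grove' &&
    this.email_addresses.some(function (e) {
      return e.value === 'steve@stevegrove.com';
    });
}

// Hypothetical document, used only to exercise the predicate locally.
const doc = {
  first_name: 'Steve',
  last_name: 'Grove',
  email_addresses: [{ type: 'other', value: 'steve@stevegrove.com' }],
};

console.log(matchesProfile.call(doc));
```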
With Mongoid 3 you could use elem_match http://mongoid.org/en/origin/docs/selection.html#symbol
UserProfile.where(:email_addresses.elem_match => {value: 'steve@stevegrove.com', primary: true})
This assumes
class UserProfile
include Mongoid::Document
embeds_many :email_addresses
end
Now if you needed to include every one of these fields, I would recommend using the UserProfile.collection.aggregate(query). In this case you could build a giant hash with all the fields.
query = { '$match' => {
'$or' => [
{'email_addresses' => {'$elemMatch' => {value: 'steve@stevegrove.com', primary: true}}}
]
} }
It starts to get a little crazy, but hopefully that gives you some insight into what your options might be. See https://coderwall.com/p/dtvvha for another example.
