Extract a column from crossfilter - dc.js

Here's my problem. I have a working dc.js-based dashboard with some data in it. One column of the data contains text (Twitter info). Is it possible to extract that specific column from crossfilter? My aim is to create some charts, with the crossfilter text data feeding a d3-based word cloud, so that I also get the drill-down filtering that dc and crossfilter provide out of the box. I tried dimension.top(Infinity), but that returns all the key-value pairs in the data. I just need the values for a particular key across the whole data set. I hope my question makes some sense.
EDIT:
More research reveals that the word cloud accepts data as key-value pairs, where the key is the word and the value is its frequency of appearance. So I am guessing that will need to be implemented as well. If there is a ready-made library out there, kindly let me know. This changes things a bit as far as crossfilter is concerned. I need to produce this calculated key-value data (fit for word cloud consumption) whenever a filter is triggered. How do I go about it?
Looking forward to hearing from you all.
Best,
Anmol

Answer to the first part of the question: Probably dimension.top(Infinity) and then use an accessor to get the values you need. Not exactly efficient, but it is what it is.
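A minimal sketch of that accessor step, written against a plain array so it runs without crossfilter. The helper name pluckColumn and the field names tweetText and user are made up for illustration; in the dashboard the array would come from dimension.top(Infinity).

```javascript
// Hypothetical helper: pull one field out of every record.
// `records` stands in for the array returned by dimension.top(Infinity),
// which holds the records that pass the current filters.
function pluckColumn(records, key) {
  return records.map(function (d) { return d[key]; });
}

// Sample records; `user` and `tweetText` are assumed field names.
var records = [
  { user: 'a', tweetText: 'hello world' },
  { user: 'b', tweetText: 'hello dc' }
];

var tweets = pluckColumn(records, 'tweetText');
console.log(tweets);
```

Because the whole record array is rebuilt on every call, this is only reasonable for data sets small enough to re-scan after each filter change.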
Answer to the 2nd part of the question:
You need groupAll, I think. You want to take a tweet, generate an array of tokens (words), then generate a Crossfilter grouping that is a count per word, right? You can code your own custom crossfilter.dimension.groupAll reducers (if you want to do that, create a working example and I can probably cook it up). Or if you want to use Reductio:
    tweetWords = data.dimension(function(d) { return d.tweetText.split(' '); });
    wordCounts = tweetWords.groupAll();
    reducer = reductio()
        .groupAll(function(d) {
            return d.tweetText.split(' ');
        })
        .count(true);
    reducer(wordCounts);
    wordCounts.all();
If you want to filter on this dimension you'll have to override the filter handler and check if the group key is in the dimension array for the record using a filterFunction.
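For the custom-reducer route mentioned above, here is a rough sketch of what the three reduce functions might look like. It is not the answerer's code. The tokenizer, the tweetText field, and the {text, size} output shape are assumptions; adjust the last one to whatever the word cloud library expects. The crossfilter wiring is shown only in comments so the block runs on its own.

```javascript
// Naive tokenizer: lowercase, split on whitespace, drop empty strings.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(function (w) { return w.length > 0; });
}

// Reducers in the shape groupAll().reduce(add, remove, initial) expects.
// `p` is the running word -> count map, `d` is one record.
function reduceAdd(p, d) {
  tokenize(d.tweetText).forEach(function (w) { p[w] = (p[w] || 0) + 1; });
  return p;
}
function reduceRemove(p, d) {
  tokenize(d.tweetText).forEach(function (w) {
    p[w] -= 1;
    if (p[w] === 0) { delete p[w]; }
  });
  return p;
}
// Prototype-less object so words like "constructor" count correctly.
function reduceInitial() { return Object.create(null); }

// Convert the map into an array of { text, size } entries (assumed shape).
function toWordList(counts) {
  return Object.keys(counts).map(function (w) { return { text: w, size: counts[w] }; });
}

// Wiring sketch, assuming `data` is the crossfilter instance:
//   var wordCounts = data.groupAll().reduce(reduceAdd, reduceRemove, reduceInitial);
//   toWordList(wordCounts.value()); // read again after each filter change

// Stand-alone demonstration without crossfilter:
var counts = reduceInitial();
var rows = [{ tweetText: 'Hello world' }, { tweetText: 'hello dc' }];
rows.forEach(function (d) { reduceAdd(counts, d); });
reduceRemove(counts, rows[1]); // simulate a filter excluding the second tweet
console.log(toWordList(counts));
```

Crossfilter calls the add and remove functions itself as filters change, so the word list only needs to be read from value() and passed to the cloud on each redraw.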

Related

Filter Data for Each Row in a Column

EVE Online Manufacturing Spreadsheet
In Batch!F3:G, I'm attempting to break down the data input from columns B3:C to their components (and eventually materials/minerals in I3:J) by using filter to compare results in Engine!P:R. Multiplied of course by the total number of each finished product I need.
I've been trying to figure out ways to arrayformula this together, and even tried quite a few query functions without success. The best I've been able to come up with is to string the actual formula together, appending them with {}, but this gets bloated quickly. I need this to be open ended because I have a tendency to build a lot of things at once. Any help would be appreciated, even just point me in the right direction!
Well, based on my limited knowledge of Google Sheets, I can only think of one way to do this automatically.
Here's a sheet I constructed based on your sheet.
https://docs.google.com/spreadsheets/d/1AfX8o05gUGPiN5S90w4o0yxuIYjsJRaXsaYUFTJuEPo/edit?usp=sharing
First, on the Engine sheet, add one more column that gives the number of materials required for that part, looked up in the PART LIST of the BATCH sheet. For this I use VLOOKUP, as you can see in D2.
Then, on the BATCH sheet, query the materials for which VLOOKUP returns a positive value, multiply each by the item amount, and sum them.
This is done by the QUERY used in F3.
This method only works if you don't have duplicate items in your PART LIST, because of the way VLOOKUP works.
Of course, if you want to break the material list down further, you can use the same approach.

jsGrid - get data typed by the user in a search navGrid

I'm trying to bind a table and a graph using d3 and jqGrid library. For that I have to get the search typed by the user in the searchbox (my table looks like this : http://www.guriddo.net/demo/guriddojs/)
I've found this function :
grid.getGridParam("postData").filters
but I don't know how to use it. I thought about using the "jqGridToolbarAfterSearch" event to get the data after each search, but it doesn't seem to work...
If someone has an idea I'll be very grateful!
Thanks.
PS: if a similar method exists to set data, I'm interested too.
I hope that I understand your problem correctly. I suppose that you first convert the CSV data of the demo to a more convenient data format: an array of items with some properties (name, economy, cylinders, displacement, power, weight, mph, year). Then you can use datatype: "local" and data as the input data. I suppose that the user applies a local filter and you then want to get the filtered data.
If you use the free jqGrid fork of jqGrid (the fork that I develop), then you can read the lastSelectedData parameter (var filteredData = $grid.jqGrid("getGridParam", "lastSelectedData");) to get the array of filtered items (see the demo). After that you can use d3 with the filtered items.
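As for the postData.filters value from the question, here is a small sketch of how it might be read. It assumes filters is a JSON string of the form {groupOp, rules: [{field, op, data}]}, which is how jqGrid serializes multi-rule searches. The helper parseFilters and the sample value are made up for illustration, and the helper ignores groupOp and op entirely.

```javascript
// Hypothetical helper: turn a jqGrid `postData.filters` JSON string into
// a plain { field: searchText } map. Returns {} when no filter is set.
function parseFilters(filters) {
  if (!filters) { return {}; }
  var parsed = typeof filters === 'string' ? JSON.parse(filters) : filters;
  var result = {};
  (parsed.rules || []).forEach(function (rule) {
    result[rule.field] = rule.data;
  });
  return result;
}

// In the page the string would come from the grid, e.g.:
//   var filters = grid.getGridParam("postData").filters;
// Sample value, assumed for illustration:
var sample = '{"groupOp":"AND","rules":[{"field":"name","op":"cn","data":"ford"}]}';
console.log(parseFilters(sample));
```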

Tableau calculated-field filter on pie-chart doesn't work

Based on a previous question, I had to create a calculated field for Location and use it as a quick filter, i.e.
Location Filter:
LOOKUP(ATTR([Location (Loc)]),0)
Workbook is on Public Tableau
For hovering over points in a map, the calculated field works, but when I create pie chart, it doesn't work.
For instance, if I select All, this is the result
And if I select a business from Location Filter, this is what I get
How to troubleshoot?
Additional Info
However, if I use regular Location filter, then it works, i.e
There are two separate issues to address here:
1. LOOKUP(ATTR([Location (Loc)]),0) is a sneaky way of filtering the data in the view while still maintaining all of the locations in the partition (by disguising the field as a table calculation, the filtered partition is created before this table calculation is ever executed). Because you've used it here, you still have every location in the partition, even when you filter them out with the quick filter. Because they're still in the partition, when you calculate the percent of total, those other locations will be included in that total, even if they're not displayed in the view.
I don't see a reason for you to keep all of the locations in your partition in this case, so I'd just replace that filter with [Location].
2. It looks like you've dragged [Location] into your mark as a dimension. As a result, it's broken up the pie slices into smaller chunks, one per location. If you add a dimension to your data, then Tableau will have to group by that dimension when calculating the aggregations.
If you want the Location to appear in the tooltip of your pie chart, you'll have to either add it as an attribute (in which case you'll have to deal with the "*" when you have more than one location in the partition), or you'll just have to deal with the slices being broken up further.

Slickgrid: array scheme inconsistent when data exceeds initial row count

I initialize an editable Slickgrid with n rows. A user keys or pastes >n rows of data. The grid adds rows to hold the extra data, but when I then grid.getData() the resulting array has a different format for rows >n. For example, when n=2 it looks like this
[["A","I","X"],
["B","J","Y"],
{"0":"C","1":"K","2":"Z"}]
I need this array to be uniformly constructed. I tried this but without effect:
grid.updateRowCount();
grid.render();
thatdata=grid.getData();
Hopefully I'm missing something simple in the docs--any help appreciated!
Edit: I should have mentioned I'm using the Celebio/Nereo labs fork, so this isn't purely a Slickgrid question.
SlickGrid does not add the data to the array for you. You do that by subscribing to onAddNewRow and providing the implementation to add a new item to the array, so it's your code that adds the data in the wrong format.
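A sketch of what such a handler could look like when the data rows are arrays. It assumes the column fields are the indices 0, 1, 2 (as the object in the question suggests) and that grid, data, and columns exist on the page. The helper itemToRow is hypothetical, and the fork mentioned in the question may handle pasted rows differently.

```javascript
// Hypothetical helper: convert the object passed as `args.item`
// (keyed by column field, here "0", "1", "2") into a plain array row.
// Columns that are missing from the item become empty strings.
function itemToRow(item, columnCount) {
  var row = [];
  for (var i = 0; i < columnCount; i++) {
    row.push(item[i] === undefined ? '' : item[i]);
  }
  return row;
}

// Wiring sketch (assumes `grid`, `data`, and `columns` exist on the page):
//   grid.onAddNewRow.subscribe(function (e, args) {
//     data.push(itemToRow(args.item, columns.length));
//     grid.invalidateRow(data.length - 1);
//     grid.updateRowCount();
//     grid.render();
//   });

// Stand-alone demonstration with the rows from the question:
var data = [['A', 'I', 'X'], ['B', 'J', 'Y']];
data.push(itemToRow({ '0': 'C', '1': 'K', '2': 'Z' }, 3));
console.log(JSON.stringify(data));
```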

couchdb - retrieve unique documents for a view that emits non-unique two array keys

I have a map function in a view in CouchDB that emits non-unique two-element array keys for documents of type message, e.g.
The first position in the array key is a user_id, the second position represents whether or not the user has read the message.
This works nicely in that I can set include_docs=true and retrieve the actual documents. However, I'm retrieving duplicate documents in that case, as you can see above in the view results. I need to be able to write a view that can be queried to return unique messages that have been read by a given user. Additionally, I need to be able to efficiently paginate the resultset.
1. Notice in the image above that [66, true] is emitted twice for doc id 26a9a271de3aac494d37b17334aaf7f3. As far as I can tell, with the keys in my map function, I cannot reduce in such a way that unique documents will be returned.
2. The next idea I had was to also emit doc._id in the map function and reduce with group_level=exact, the result being:
Now I am able to get unique document ids, but I cannot get the documents without doing a second query. And even in the case of a second query, it will require a lot of complexity to do pagination like this (at least I think so).
3. The last idea I came up with is to emit the entire document rather than the doc._id in the third position in the array key; then I can access the entire document and likely paginate. This seems really brutish.
So my question is:
Is #3 above a terrible idea? Is there something I'm missing? Is there a better approach?
Thanks in advance.
See @WickedGrey's comment to the question. The solution is to ensure that I never emit the same key twice for one document. I do this in the map function by keeping track of the keys as I emit them in an array, then skipping the emit if the key exists in the array.
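A sketch of that de-duplication, under loudly stated assumptions: the question does not show the document shape, so the recipients array and its user_id and read fields are invented for illustration. It tracks seen keys in an object rather than the array the answer describes. The emit stub exists only so the sketch runs outside CouchDB; in a design document the function would be stored anonymously and CouchDB supplies emit.

```javascript
// Stand-in for CouchDB's emit(), only for running this sketch locally.
var emitted = [];
function emit(key, value) { emitted.push(key); }

// Map function that never emits the same key twice for one document.
// `recipients`, `user_id`, and `read` are hypothetical field names.
function map(doc) {
  if (doc.type !== 'message') { return; }
  var seen = {}; // keys already emitted for this document
  (doc.recipients || []).forEach(function (r) {
    var key = [r.user_id, r.read];
    var id = JSON.stringify(key); // array keys need a string form to compare
    if (seen[id]) { return; }     // skip duplicates
    seen[id] = true;
    emit(key, null);
  });
}

map({
  type: 'message',
  recipients: [
    { user_id: 66, read: true },
    { user_id: 66, read: true },
    { user_id: 70, read: false }
  ]
});
console.log(JSON.stringify(emitted));
```

With each key emitted at most once per document, querying with include_docs=true returns each message once per key, so the usual key-based pagination keeps working.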
