How to sort in filter without using Dynamic Ranking in Endeca?

We are using Endeca to fetch records and display them in the frontend as a datagrid. The datagrid has 10 columns, and we display the data sorted on the basis of 2 of them (say X and Y). For this, we use Endeca.stratify(collection()/record[not%20(X)])||X|1||*,Endeca.stratify(collection()/record[not%20(Y)])||Y|1.
We can also apply a filter on the columns where the data is displayed sorted asc/desc. We used Dynamic Ranking in Endeca: we created a dimension for each field with dynamic ranking selected, and set the maximum number of dimension values to return to 20, as per the requirement. Since dynamic ranking is relevancy ranking, it fetches the most-used records and sorts that data.
However, we need to select 20 unique values and sort them in asc/desc order. For example, if the column is a date, we need to fetch 20 unique dates with the most recent at the top, i.e. in descending order.
Is there any other way to sort the filter values apart from dynamic ranking? If we disable dynamic ranking, we no longer have the option to set the maximum dimension value count to 20 in Developer Studio.
Please suggest an approach.

We finally found a solution! I unchecked "dynamic ranking" for the dimensions in the pipeline using Developer Studio. I had been reluctant to remove it, since we had already selected the sort option "alphabetically" (instead of "dynamically") on the Dynamic Ranking tab of the dimensions.
However, once dynamic ranking is unchecked, the option to set a maximum number of dimension values to display (which we had set to 20, per the requirement) disappears as well.
So I handled that limit in Java: I put a check on the results obtained and kept a counter that adds values only until 20 have been received. This is now working as required!
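For anyone hitting the same wall, here is a minimal sketch of that Java-side cap, assuming the sorted dimension values arrive as a List of Strings from your Endeca query layer (the class and method names here are hypothetical, not part of the Endeca API):

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class DimensionValueLimiter {

    private static final int MAX_VALUES = 20;

    // Keeps the first 20 unique values in their incoming (already sorted) order.
    public static List<String> firstUniqueValues(List<String> sortedValues) {
        Set<String> seen = new LinkedHashSet<>();
        for (String value : sortedValues) {
            if (seen.size() >= MAX_VALUES) {
                break;        // counter check: stop once 20 unique values are collected
            }
            seen.add(value);  // duplicates are ignored by the set
        }
        return new ArrayList<>(seen);
    }
}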

Related

PowerBI groupby with filters

My company has tasked me with slicing the information on turnover and creating different graphs.
My source data looks like this: Relevant columns are: Voluntary/Involuntary, Termination Reason, Country, Production, and TermDateKey
I am trying to get counts using different filters on the data. I managed to get the basic monthly total using the formula:
Term Month Count = GROUPBY('Turnover Source','Turnover Source'[TermDateKey],"Turnover Total Count", COUNTX(CURRENTGROUP(),'Turnover Source'[TermDateKey]))
This gave me a new sheet with the counts for each month.
(Table that shows TermDateKey in column 1 and counts in column 2.)
I am trying to add onto this table by adding counts but using different filters.
For example, I am trying to add another column that gives me the monthly count filtered for 'Turnover Source'[Voluntary/Involuntary]=="Voluntary", then another column for 'Turnover Source'[Voluntary/Involuntary]=="Involuntary", and so on. I have not found anything that shows me how to do this, and when I add in the FILTER function it says that GROUPBY(...) can only work on CURRENTGROUP().
Can someone point me to a resource that will give me the solution I need? I am at a loss. Thank you all.
It looks like you may not be aware that you don't have to calculate all possible groupings with DAX formulas.
The very nature of Power BI is that you use a column like "Termination Reason" on an X axis or in the legend of a visual. Any measure that you have created on the values of another column, e.g. a count of all rows, will then automatically be grouped by the values in "Termination Reason", giving you a count for each of the values in the column.
You do NOT need DAX functions to calculate the grouping values for each measure for each column value combination.
Here is some simple sample data that has been grouped into dates and colours, one chart showing a count of each colour and one chart showing a sum of the Value column. No DAX was written for that.
If your scenario is different, please explain.

Grafana Templating

I am wondering if there is a way to limit the number of rows generated by Grafana templating.
I can have a drop-down variable "$x" on my Grafana page, select the row editor, and say repeat row for every value under $x.
Based on different criteria, $x can produce anywhere between 1 and about 160 rows. I only need to look at about 10 at a time, so I am wondering if I can control the number of rows shown, or change which rows are shown, somewhere in Grafana.
I can manually select items from the $x drop-down to show only a few of course, but it's a matter of showing only, say, 10 items right when the page loads.
Please let me know if I am not describing the problem correctly or if I need to clarify more.
Thanks,
Karan
As far as I know there is no direct option for this, but there are some ways you may be able to achieve what you want.
You could select your ~10 default entries and then save the dashboard; this stores the current selection in the dashboard's JSON. (Or modify the JSON of the dashboard directly.)
You could use the regex field in the template settings to filter some of your values and split them into groups this way (one variable per regex group).
You could change your data in Elasticsearch so that you have multiple fields to split on.
See PR #5616, as @Daniel Lee mentioned.
In general, I think you'll get a faster response to this from the Grafana team directly on GitHub.

Linq to SQL - Random Select Order and Paging

We have a database with 200,000 vendors in 100-plus categories. When someone visits the website, we want to let them select a category and show them 25 vendors per page. At first we ordered by VendorId, but that always returned the same first 25, so we removed the ordering; now the paging sometimes repeats a vendor. Is there a way to get a random 25 vendors while still keeping the paging consistent?
Regards
You can randomize your result, but every time you do the query it will create a new random list. So unless you randomize once, save the randomized state in your code, and page over that, it can't be done in a straightforward way.
Refer to: SQL Query results pagination with random Order by in SQL Server 2008
I believe this requirement is impossible to implement if a new random order is needed every time, while also keeping good performance and giving every item an equal chance of being selected. I believe you should redesign the way your application works.
One possible workaround is to have a few extra columns in the table and fill them with random numbers. When a user requests the list, assign one of the random columns to them (stick it in the URL, for example). Then order by that column and display the results; a sketch follows. Rotate among 4-5 such columns to create the appearance of randomness, and update the random numbers in the columns once a day.
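Roughly, that workaround could look like the JDBC sketch below (not LINQ, but the SQL is the same idea; the Vendor table, CategoryId column, and rand1..rand5 columns are hypothetical, and OFFSET/FETCH needs SQL Server 2012+; on 2008 you would emulate it with ROW_NUMBER()):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public final class RandomVendorPager {

    private static final int PAGE_SIZE = 25;

    // randColumn (1..5) is assigned once per visitor (e.g. carried in the URL),
    // so the order stays stable while that visitor pages through the results.
    public static ResultSet page(Connection conn, int categoryId,
                                 int randColumn, int pageIndex) throws Exception {
        String sql = "SELECT VendorId, Name FROM Vendor " +
                     "WHERE CategoryId = ? " +
                     "ORDER BY rand" + randColumn + " " +   // pre-filled random column
                     "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
        PreparedStatement ps = conn.prepareStatement(sql);
        ps.setInt(1, categoryId);
        ps.setInt(2, pageIndex * PAGE_SIZE);
        ps.setInt(3, PAGE_SIZE);
        return ps.executeQuery();
    }
}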

Joining grouped tables

I have two different scripted data sets that I am pulling data from and aggregating (on the same key). What I want is to display on one line the aggregated data from both sources. The data is coming from a scripted data source (POJOs).
A simplified example is given below, in which an Order has many Components, each component being for a different customer at a different quoted price. Each Order is then filled in different lots (Fills) at different prices. I want to produce a summary of each Order with the total Ordered and Filled quantity, and the weighted average quoted price and fill price.
An Order Component table
Order ID, Customer Num, Qty, Quoted Px
Ord01,Cust01,3,100
Ord01,Cust02,3,102
Ord02,Cust01,5,200
Ord02,Cust03,5,204
And then an Order Fulfillment table
OrderID,FillId,Qty,CostPx
Ord01,F01,4,100
Ord01,F02,2,106
Ord02,F03,2,200
Ord02,F04,8,210
I would like to display something like this:
Order ID, Order Qty, Fill Qty, Avg Order Px, Avg Fill Px
Ord01, 6, 6, 101, 102
Ord02, 10, 10, 202, 208
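(For clarity, the average price columns are quantity-weighted averages, e.g. for Ord01's components:)

\[
\text{Avg Px} = \frac{\sum_i q_i\,p_i}{\sum_i q_i},
\qquad
\text{Avg Order Px}(\text{Ord01}) = \frac{3 \cdot 100 + 3 \cdot 102}{3 + 3} = 101
\]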
I've tried using subreports, and that seems to get me the results but in a terrible format: the subtable headers repeat, so every order gets its own headers.
You may want to create a BIRT joined dataset between your two scripted datasets, based on a full outer join on the "order ID" column, and then use this joined dataset in your report. It should meet your needs.
I solved my problem by more or less following a guide I found.
I created a List linked to my first data source, then added a group on Order ID so that I had one list row per Order. In the group header I added a 2x1 grid; I placed a table of the Order Components on one side of the grid and a table of Fills on the other. I had to add filters to both of these so that they only contained data for the correct OrderId. I then grouped the tables by OrderId and added my aggregation fields.
All that was left was to set the visibility. I set the visibility of the table details to false. To show the table header only once (instead of once per order), I added a Running Count aggregation to the List and set the visibility to invisible whenever this aggregation was greater than 1.
It was actually quite easy in the end, but it took me ages to work out how to do it.

Random exhaustive (non-repeating) selection from a large pool of entries

Suppose I have a large (300-500k) collection of text documents stored in a relational database. Each document can belong to one or more (up to six) categories. I need users to be able to randomly select documents in a specific category so that a single entity is never repeated, much like how StumbleUpon works.
I don't really see a way to implement this with slow NOT IN queries given a large number of users and documents, so I figure I might need to implement a custom data structure for this purpose. Perhaps there is already a paper describing an algorithm that could be adapted to my needs?
Currently I'm considering the following approach (a sketch in Java follows the list):
Read all the entries from the database.
Create a linked-list-based index for each category from the IDs of the documents belonging to that category, and shuffle it.
Create a Bloom filter containing all of the entries viewed by a particular user.
Traverse the index using an iterator, using the Bloom filter to skip already-viewed items.
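Here is a minimal sketch of that structure, assuming integer document IDs and Guava's BloomFilter; loading the IDs from the database is left out:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public final class CategoryIndex {

    private final List<Integer> shuffledIds;   // per-category index, shuffled once

    public CategoryIndex(List<Integer> idsForCategory) {
        this.shuffledIds = new ArrayList<>(idsForCategory);
        Collections.shuffle(this.shuffledIds);
    }

    public Iterator<Integer> cursor() {
        return shuffledIds.iterator();
    }

    // One filter per user, sized for the views you expect, with ~1% false positives.
    public static BloomFilter<Integer> newUserFilter(int expectedViews) {
        return BloomFilter.create(Funnels.integerFunnel(), expectedViews, 0.01);
    }

    // Advances the user's cursor past viewed items; returns null when the
    // category is exhausted. A Bloom filter can give false positives, so a
    // document may occasionally be skipped as "viewed" even though it was not
    // (never the reverse).
    public static Integer nextUnviewed(Iterator<Integer> cursor,
                                       BloomFilter<Integer> viewedByUser) {
        while (cursor.hasNext()) {
            Integer id = cursor.next();
            if (!viewedByUser.mightContain(id)) {
                viewedByUser.put(id);
                return id;
            }
        }
        return null;
    }
}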
If you track via a table which entries the user has seen... try this. I'm going to use MySQL because that's the quickest example I can think of, but the gist should be clear.
On a link being 'used'...
insert into viewed (userid, url_id) values ("jj", 123)
On looking for a link...
select p.url_id
from pages p left join viewed v on v.url_id = p.url_id
where v.url_id is null
order by rand()
limit 1
This causes the database to do a one-for-one join, and you're limiting your query to return only one entry that the user has not seen yet.
Just a suggestion.
Edit: It is possible to make this a single operation, but there's no guarantee that the URL will be passed successfully to the user.
It depends on how users get their random entries.
Option 1:
A user pages through some entities and stops after a couple of them: the user sees the current random entity, moves on to the next one, reads it, repeats that a couple of times, and that's it.
The next time this user (or another) requests an entity from this category, the set of already-viewed entities is cleared, so an already-viewed entity may be returned again.
In that option I would save a (hash) set of already-viewed entity IDs, and every time the user asks for a random entity, choose one randomly from the DB and check that it is not already in the set (sketch below).
Because the set is so small and your data is so big, the chance of picking an already-viewed ID is tiny, so this takes O(1) time in most cases.
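A quick Java sketch of that rejection-sampling idea; randomIdFromDb() is a hypothetical stand-in for a query like the ORDER BY rand() LIMIT 1 shown earlier:

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public final class SessionRandomPicker {

    private final Set<Integer> viewedThisSession = new HashSet<>();
    private final Random random = new Random();

    public int pickUnviewed() {
        while (true) {
            int id = randomIdFromDb();        // one random candidate from the DB
            if (viewedThisSession.add(id)) {  // add() returns false if already seen
                return id;                    // retries are rare: the viewed set is
            }                                 // tiny compared to the document pool
        }
    }

    // Hypothetical stand-in for e.g.
    // SELECT id FROM documents WHERE category_id = ? ORDER BY rand() LIMIT 1
    private int randomIdFromDb() {
        return random.nextInt(500_000);
    }
}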
Option 2:
A user pages through the entities, the viewed entities are shared between all users, and they persist across every visit to your page.
In that case you will probably end up using all the entities in each category, and saving all the viewed entities plus checking whether an entity has been viewed will take some time.
In that option I would get all the IDs for the topic, shuffle them, and store them in a linked list. When you want a random not-yet-viewed entity, just take the head of the list and delete it, which is O(1) (sketch below).
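A matching sketch of that shuffle-once, pop-the-head structure (the IDs are assumed to be loaded from the database beforehand):

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public final class ShuffledCategoryQueue {

    private final Queue<Integer> queue;

    public ShuffledCategoryQueue(List<Integer> idsForCategory) {
        List<Integer> ids = new ArrayList<>(idsForCategory);
        Collections.shuffle(ids);        // the random order is fixed up front
        this.queue = new LinkedList<>(ids);
    }

    // Each ID is handed out exactly once across all users; null when exhausted.
    public Integer next() {
        return queue.poll();             // removes the head in O(1)
    }
}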
I assume that for any given <user, category> pair, the number of documents viewed is pretty small relative to the total number of documents available in that category.
So can you just store indexed triples <user, category, document> indicating which documents have been viewed, and then just take an optimistic approach with respect to randomly selected documents? In the vast majority of cases, the randomly selected document will be unread by the user. And you can check quickly because the triples are indexed.
I would opt for a pseudorandom approach:
1.) Determine the number of elements in the category to be viewed (SELECT COUNT(*) WHERE ...).
2.) Pick a random number in the range 1 ... count.
3.) Select a single document (SELECT * FROM ... WHERE [same as when counting] ORDER BY [generate stable order]). Depending on the SQL dialect in use, there are different clauses for retrieving only the part of the result set you want (MySQL's LIMIT clause, SQL Server's TOP clause, etc.).
If the number of documents is large, the chance of serving the same user the same document twice is negligibly small. Using the scheme described above, you don't have to store any state information at all; a JDBC sketch follows.
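Something like this, assuming MySQL's LIMIT/OFFSET syntax and a hypothetical documents table:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ThreadLocalRandom;

public final class StatelessRandomPick {

    public static Integer pick(Connection conn, int categoryId) throws Exception {
        // 1) count the matching documents
        PreparedStatement count = conn.prepareStatement(
                "SELECT COUNT(*) FROM documents WHERE category_id = ?");
        count.setInt(1, categoryId);
        ResultSet rs = count.executeQuery();
        rs.next();
        int total = rs.getInt(1);
        if (total == 0) {
            return null;
        }

        // 2) pick a random offset in [0, total)
        int offset = ThreadLocalRandom.current().nextInt(total);

        // 3) fetch exactly one row at that offset, using a stable order
        PreparedStatement pickStmt = conn.prepareStatement(
                "SELECT id FROM documents WHERE category_id = ? " +
                "ORDER BY id LIMIT 1 OFFSET ?");
        pickStmt.setInt(1, categoryId);
        pickStmt.setInt(2, offset);
        ResultSet row = pickStmt.executeQuery();
        return row.next() ? row.getInt(1) : null;
    }
}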
You may want to consider a NoSQL solution like Apache Cassandra. These seem ideally suited to your needs: there are many ways to design the algorithm you need in an environment where you can easily add new columns to a table (column family) on the fly, with excellent support for very sparsely populated tables.
Edit: one of many possible solutions below:
Create a CF (column family, i.e. table) for each category (creating these on the fly is quite easy).
Add a row to each category CF for each document belonging to the category.
Whenever a user hits a document, add a column named for that user to the document's row and set it to true. Obviously this table will be huge, with millions of columns, and probably quite sparsely populated, but that's no problem; reading it is still constant time.
Finding a new document for a user in a category is then simply a matter of selecting any row where that user's column is null.
You should get constant-time writes and reads, amazing scalability, etc., if you can accept Cassandra's "eventually consistent" model (i.e., it is not mission-critical that a user never gets a duplicate document).
I've solved a similar problem in the past by indexing the relational database into a document-oriented form using Apache Lucene. This was before the recent rise of NoSQL servers and is basically the same thing, but it's still a valid alternative approach.
You would create a Lucene Document for each of your texts, with a textId (relational database ID) field and multi-valued categoryId and userId fields. Populate the categoryId field appropriately. When a user reads a text, add their ID to the userId field. A simple query will return the set of documents with a given categoryId and without a given userId; pick one randomly and display it.
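That query could look like the sketch below, using Lucene's modern BooleanQuery.Builder API and the field names described above:

import java.util.concurrent.ThreadLocalRandom;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public final class UnreadTextPicker {

    public static Document pickUnread(IndexSearcher searcher, String categoryId,
                                      String userId) throws Exception {
        BooleanQuery query = new BooleanQuery.Builder()
                .add(new TermQuery(new Term("categoryId", categoryId)),
                        BooleanClause.Occur.MUST)       // in the category...
                .add(new TermQuery(new Term("userId", userId)),
                        BooleanClause.Occur.MUST_NOT)   // ...not yet read by this user
                .build();

        TopDocs hits = searcher.search(query, 1000);    // cap the candidate set
        if (hits.scoreDocs.length == 0) {
            return null;                                // category exhausted
        }
        ScoreDoc choice = hits.scoreDocs[
                ThreadLocalRandom.current().nextInt(hits.scoreDocs.length)];
        return searcher.doc(choice.doc);
    }
}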
Store a user's past X selections in a cookie or something.
Return the last selections to the server along with the user's new criteria.
Randomly choose one of the texts satisfying the criteria until it is not a member of the user's last X selections.
Return this choice of text and update the list of the last X selections.
I would experiment to find the best value of X, but I have in mind something like X = 16.
