Get 50 random rows from Sphinx query

I have a PHP website that queries Sphinx through the SphinxClient library.
My Sphinx query has some filters and a limit of 2500 rows. From those 2500 rows (there could be fewer), I want to fetch just 50 random rows.
Is there a way to do it using only Sphinx?
Edit: The original query is already sorted by the popularity of the rows; the main idea is to get 50 random products from the 2500 most popular. Because of that, I can't sort by random.

See SPH_SORT_EXTENDED. It supports an @random sort order:
http://sphinxsearch.com/docs/current.html#sorting-modes

What's the issue with using ORDER BY RAND() LIMIT 50?
See this related question for a similar case:
sphinxQL fetching random?
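
If the popularity sort has to stay in the Sphinx query, one option that does not rely on Sphinx alone is to fetch the (up to) 2500 matches and pick 50 of them on the client. The following is only a minimal sketch in Python, assuming the matches are already in a list; `sample_rows` and the row IDs are hypothetical, and no Sphinx API is called here.

```python
import random

def sample_rows(rows, k=50, seed=None):
    """Return up to k rows chosen at random, without repeats.

    `rows` stands in for the popularity-sorted matches (hypothetical data).
    If fewer than k rows came back, all of them are returned in random order.
    """
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))

# Hypothetical IDs standing in for the 2500 most popular products.
top_rows = list(range(1, 2501))
picked = sample_rows(top_rows, k=50)
print(len(picked))
```

The `min(k, len(rows))` guard covers the "could be fewer" case from the question, where the filtered result set is smaller than 50.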

Related

Error: Data too large, in Google Sheets with IMPORTRANGE and QUERY criteria [duplicate]

I think I have read every forum online and haven't been able to find a solution.
I am trying to query data in a different Google Sheets workbook. My data has approximately 127,000 rows and 5 columns.
It seems that IMPORTRANGE only works up to about 5,000 rows. Is there any other way to query data of this size? The results would be up to 100 rows, but it is necessary to search all 127,000 rows.

Divide your QUERY / IMPORTRANGE into smaller chunks and stack them in an array literal:
={QUERY(IMPORTRANGE("ID", "A1:A5000"), "select Col1", 0);
QUERY(IMPORTRANGE("ID", "A5001:A10000"), "select Col1", 0);
QUERY(IMPORTRANGE("ID", "A10001:A15000"), "select Col1", 0)}
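
Writing out every chunk for 127,000 rows by hand is tedious, so the formula text can be generated. The sketch below is an illustration in Python, not part of the original answer; `chunked_ranges` and `build_formula` are hypothetical helper names, and the 5,000-row chunk size simply mirrors the formula above.

```python
def chunked_ranges(total_rows, chunk_size=5000, column="A"):
    """Split rows 1..total_rows into A1-style ranges of at most chunk_size rows."""
    ranges = []
    for start in range(1, total_rows + 1, chunk_size):
        end = min(start + chunk_size - 1, total_rows)
        ranges.append(f"{column}{start}:{column}{end}")
    return ranges

def build_formula(sheet_id, ranges, select="select Col1"):
    """Join one QUERY(IMPORTRANGE(...)) per range into a stacked array literal."""
    parts = [
        f'QUERY(IMPORTRANGE("{sheet_id}", "{r}"), "{select}", 0)'
        for r in ranges
    ]
    return "={" + ";\n".join(parts) + "}"

# "ID" is the same placeholder used in the formula above.
print(build_formula("ID", chunked_ranges(15000)))
```

Calling `chunked_ranges(127000)` yields 26 ranges, the last one covering rows 125001 to 127000.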

Filtering out nested sub-rows without using index in Tableau?

[image: nested rows]
For example, how would I filter out, per company, all the issues that have fewer than 2,000 counts? I've done it with INDEX, but that only shows the first N rows for each company, whereas I want to show the rows that have more than 2,000 counts.
[image: nested rows]
Filtering by the condition issue count > 2000 seems to be evaluated on the sum of all counts of that issue across companies, when I want the filter applied per company.
Edit: Added snapshots of the data for clarification.
[image: Snapshot]
[image: Snapshot2]

Try filtering on CNT([ISSUE]) instead of [ISSUE]. That should help.
Alternatively, use an LOD calculation as the filter (and add it to context): create a calculated field CF with the formula below and filter on CF = TRUE (that is, > 2000).
{FIXED [company] : COUNT([ISSUE])} > 2000
If neither of these works, please show me a snapshot of your data.

How to do pagination in clickhouse

Can you please suggest how I can do pagination in ClickHouse?
For example, in Elasticsearch I run an aggregation query like the one below. Elasticsearch takes a partition number and a partition size as parameters and returns the result. Let's say we have 100 records in total: if we give a partition size of 10 and partition number 2, then we get records 11-20 of the latest records.
How can we do this in ClickHouse, considering that data is being inserted into the table?
SearchResponse response = elasticClient.prepareSearch(index)
        .setTypes(documentType)
        .setQuery(boolQueryBuilder)
        .setSize(0)
        .addAggregation(AggregationBuilders.terms("unique_uids")
                .field(Constants.UID_NAME)
                .includeExclude(new IncludeExclude(partition, numPartitions))
                .size(Integer.MAX_VALUE))
        .get();

According to the documentation, the common SQL syntax for LIMIT and OFFSET works:
LIMIT n, m allows you to select the first m rows from the result after skipping the first n rows. The LIMIT m OFFSET n syntax is also supported.
https://clickhouse.yandex/docs/en/query_language/select/#limit-clause
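
To map the page number and page size from the question onto that clause, the offset is (page number - 1) * page size. The snippet below is a minimal sketch in Python that only builds the query text; `page_clause`, the table `events`, and the columns `uid` and `event_time` are hypothetical names used for illustration.

```python
def page_clause(page_number, page_size):
    """Build a LIMIT/OFFSET clause for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be >= 1")
    offset = (page_number - 1) * page_size
    return f"LIMIT {page_size} OFFSET {offset}"

# Page 2 with 10 rows per page skips 10 rows and returns rows 11-20.
# Table and column names here are assumptions, not from the question.
query = "SELECT uid FROM events ORDER BY event_time DESC " + page_clause(2, 10)
print(query)
```

An explicit ORDER BY matters here: without one the row order is not guaranteed, and even with one, rows inserted between requests can shift which records land on a given page.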

I think you want to select only a subset of the result set? I haven't needed to do this yet, but it seems you could specify the format you want ClickHouse to return the data in (https://clickhouse-docs.readthedocs.io/en/latest/formats/index.html) and go from there. For instance, select one of the JSON formats shown in that documentation and then extract the subset of results appropriate for your situation from the JSON response.

How to sort in filter without using Dynamic Ranking in Endeca?

We are using Endeca to fetch records and display them in the frontend as a datagrid. The datagrid has 10 columns, and we display the data sorted on the basis of 2 columns (say X and Y). For this, we use Endeca.stratify(collection()/record[not%20(X)])||X|1||*,Endeca.stratify(collection()/record[not%20(Y)])||Y|1.
We can also apply a filter on the columns where we display data sorted asc/desc. We used Dynamic Ranking in Endeca: we created a dimension for each field with dynamic ranking selected and set the maximum number of dimension values to return to 20, as per the requirement. Since dynamic ranking is a relevancy ranking, it fetches the most used records and sorts that data.
However, we need to select 20 unique values and sort them in asc/desc order. Example: if the column is a date, then we need to fetch 20 unique dates with the most recent at the top, i.e. in descending order.
Is there any other way to sort the filter values apart from dynamic ranking? If we disable dynamic ranking, then we no longer have the option to set the maximum number of dimension values to 20 in Developer Studio.
Please suggest an approach for the ranking.

We finally found a solution! I unchecked "dynamic ranking" for the properties in the dimensions of the pipeline using Developer Studio. I had not wanted to remove it, since we had already selected "alphabetically" instead of "dynamically" as the sort option in the dynamic ranking tab of the dimensions.
Also, once dynamic ranking is unchecked, the option for setting the maximum number of dimension values to display (which was set to 20 for us as per the requirement) is gone as well.
So I handled this in Java to display only 20 values: I put a check on the results obtained and created a counter that adds values only until 20 have been received. Now this is working as required!
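
The Java code from this answer is not shown. As a rough illustration of the same idea (sort the unique values, then stop adding once a counter reaches the limit), here is a sketch in Python; `first_unique` and the sample dates are hypothetical and not taken from the original solution.

```python
def first_unique(values, limit=20, descending=True):
    """Return at most `limit` distinct values in sorted order."""
    ordered = sorted(set(values), reverse=descending)
    picked = []
    for value in ordered:
        # Counter check: stop as soon as `limit` values have been collected.
        if len(picked) >= limit:
            break
        picked.append(value)
    return picked

# Hypothetical ISO-formatted dates; string order matches date order.
dates = ["2020-01-03", "2020-01-01", "2020-01-03", "2020-01-02"]
print(first_unique(dates, limit=2))
```

With ISO-formatted date strings, plain string sorting gives the most recent date first when `descending=True`.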

How to get the total count after Kaminari pagination

I am using Rails 3.2. I am paginating my results using .page(1).per(10), like:
@users = User.method().page(1).per(10)
Now, how do I find the total count of the users from the paginated result? @users.count gives 10 (the size of the first page), not the total count.
How do I get the total count of the users even after pagination?
EDIT: @users.total_count gives the total count across all pages.

As mentioned in the question, you can get the total count of the records with
@users.total_count
I've added this as an answer for the sake of completeness.

You can use @users.total_count.
However, if you're trying to show the count for each page, i.e. something like:
20 of 135 results
then total_count will just return 135, not the number of results on the page.
If you want to handle this case as well as the case where the total number of results is smaller than the page size, then I'd go with something like this:
(per_size > @users.total_count) ? @users.total_count : per_size
where per_size is the value you pass to the per scope.
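
The expression above covers the first page. For later pages, the last page may also be shorter than per_size. The arithmetic can be sketched as follows in Python (not Kaminari code); `shown_on_page` is a hypothetical helper, and the numbers reuse the "20 of 135 results" example.

```python
def shown_on_page(total_count, per_page, page=1):
    """Number of records displayed on a 1-based page."""
    remaining = total_count - (page - 1) * per_page
    # Never more than a full page, never fewer than zero.
    return max(0, min(per_page, remaining))

print(f"{shown_on_page(135, 20)} of 135 results")
```

For 135 records at 20 per page, page 7 holds the remaining 15 records, and any page past that holds none.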

User.count would give you the count, but it would hit the database. If you are using MongoDB, @users.length would give you the total count.
