data.gov.in : limit parameter not working - url-parameters

I am trying the following
https://data.gov.in/api/datastore/resource.json/?resource_id=e16c75b6-7ee6-4ade-8e1f-2cd3043ff4c9&api-key=APIKEY&limit=200
I still get only 100 records. If I change the limit to 50 it gives me 50 records. How do I get the records from 101 - 200 and beyond?
I also tried using the offset parameter like so:
&offset=50
expecting it to give me records 50-150, but it doesn't.
Does anyone have an idea?

Try this query on the OGD Platform.
You will find total_records = 2947 and count = 100.
Here you have a total of 2947 records, and a maximum of 100 records can be fetched per query. If you want the next block of results (records 201 to 300), set offset=2, and so on. Keep increasing the offset by 1 in each query up to offset=29 (that last query returns the remaining 47 records, since 2947 = 29 x 100 + 47) to get all the data.
The limit parameter controls how many records are fetched per query, and it is capped at 100. That's why you got 50 records when you set limit=50, but you will still get only 100 records if you set limit=110.
Hope my answer is clear enough. Good luck.
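If it helps, here is a minimal sketch of paging through the whole resource in Python. The "records" field name in the response is an assumption (adjust it to the JSON the API actually returns), and APIKEY is a placeholder:
import requests

URL = "https://data.gov.in/api/datastore/resource.json/"
params = {
    "resource_id": "e16c75b6-7ee6-4ade-8e1f-2cd3043ff4c9",
    "api-key": "APIKEY",  # placeholder - use your own key
    "limit": 100,         # the platform caps this at 100 per query
}

all_records = []
offset = 0
while True:
    params["offset"] = offset            # page index (0, 1, 2, ...), not a row offset
    data = requests.get(URL, params=params).json()
    batch = data.get("records", [])      # field name assumed
    if not batch:
        break
    all_records.extend(batch)
    offset += 1                          # next block of 100 records

print(len(all_records))                  # should end up at total_records (2947 here)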

Related

In Pentaho Data Integration can I output conditionally?

I need to output a different CSV file every 100 rows. For example, if there are 305 rows in a stream, I'd need to output a CSV for rows one through 100, 101 to 200, 201 to 300, and 301 to 305.
I have a column for the last row number and built a page-number variable that increments every 100 rows. I then tried searching online, since I can't yet conceptualize a solution.
// total number of output pages at 100 rows per page
var numberOfInvoicePages = Math.ceil(Number(lastRow) / 100);
// bump the page number after every 100th row
if (rowNumber % 100 == 0) {
    pageNumber += 1;
}
I expect to get a CSV named ${baseTitle} ${pageNumber} for each page, but I don't yet know how to build this.
In the Text File output step you can set after how many rows the output splits into another file, under the option 'Split every ... rows'.
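If you ever need to reproduce that splitting outside of PDI, here is a minimal sketch of the every-100-rows logic in Python; the "${baseTitle} ${pageNumber}" file naming is just illustrative:
import csv

def split_into_pages(rows, base_title, page_size=100):
    # write one CSV per chunk of page_size rows, numbering the pages from 1
    for page, start in enumerate(range(0, len(rows), page_size), start=1):
        with open(f"{base_title} {page}.csv", "w", newline="") as f:
            csv.writer(f).writerows(rows[start:start + page_size])

# e.g. 305 rows -> "Invoice 1.csv" ... "Invoice 4.csv" (100, 100, 100 and 5 rows)
split_into_pages([[i] for i in range(1, 306)], "Invoice")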

How to Calculate only distinct values in rdlc report

Hey, I want to sum only distinct values in an RDLC report but I have failed every time. Please give me some good tips to complete this task.
I have an RDLC report and the values are returned like this:
Given Wt Retained Given% Discount Net Payment
23.86 10.225 70 1406 68911
6.007 2.575 70 177 17528
23.86 10.225 70 1406 68911
6.007 2.575 70 177 17528
I want to sum only the distinct values, i.e. 23.86 + 6.007 = 29.867, but when I use =SUM(CDec(Fields!Given_wt.Value)) I get 59.734.
Please give some useful tips.
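Not an RDLC answer, but just to illustrate the calculation being asked for with the sample numbers above (sum over all rows versus sum of the distinct values), sketched in Python:
given_wt = [23.86, 6.007, 23.86, 6.007]
print(sum(given_wt))       # 59.734 - what summing every row returns
print(sum(set(given_wt)))  # 29.867 - sum of the distinct values only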

OFFSET/LIMIT only count DISTINCT values in Activerecord query

I am running this query
Playlistship.order("created_at desc").select("distinct playlist_id").limit(12).offset(2)
This query does not necessarily return 12 records. It returns the number of distinct records in the set of 12 defined by the LIMIT, OFFSET and ORDER parameters.
For example, if the Playlistships between id=13 and id=24 had playlist_ids of [2,3,3,5,6,3,5,6,8,11,12,12], then this query will only return 7 records, corresponding to the first ones having the playlist_ids [2,3,5,6,8,11,12].
What I would like to find is a query that yields 12 records with distinct playlist_ids, with the correct offset so that running this query again with an OFFSET of 3 would yield the next 12 records with distinct playlist_ids.
Hopefully I didn't "over explain" this one, as I think it's a relatively straightforward question. Please ask for more details if you need them.
Thanks!
Have you tried with subqueries? Give this a try:
Playlistship.select("distinct playlist_id").limit(12).where(playlist_id: Playlistship.order("created_at desc").select('playlist_id').offset(2))

Way to fetch rows in batches from MySQL using Hibernate from a given row onward

I have this query in a service class:
PageRequest page = new PageRequest(pageNo, batchSize, new Sort(new Order(Direction.ASC, "Id")));
Page pPage = this.pRepository.findByStatusAndParentId(Status.PENDING, -1, page);
where batchSize=500
In the repository we have the following code:
@Query("select p from Part p where p.succeedOn IS NOT NULL and p.Status=?1 and p.parentId=?2")
Page<Payment> findByStatusAndParentId( String status, Integer parentId, Pageable p);
Now the flow is like this: I want to fetch 500 rows at a time whose status is PENDING, process them, and change their status to SUCCESS. Using Pageable gives me the wrong result: suppose the first query fetches the first 500 pending rows, I process them, and their status changes to success. The next query then fetches rows 501 to 1000, but because the first 500 rows are no longer pending, the rows that are now the first 500 of the pending set get skipped by that query.
To solve this I want to paginate from the last fetched Id: say the last fetch covered Ids 100 to 600, then I want to pass 601 as an argument and fetch the rows onward from there. Hopefully I was able to explain my problem. LIMIT in the query does not work in JPA.
Thanks
I think you can just keep fetching the first page (page index 0). Assuming each batch is processed in its own transaction, by the time you ask for the next batch, the first page will again give you the next 500 unprocessed rows.
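A rough, self-contained Python sketch of that idea; the in-memory rows and the fetch function are hypothetical stand-ins for the repository call above:
# simulate 1200 pending rows
rows = [{"id": i, "status": "PENDING"} for i in range(1, 1201)]

def fetch_first_pending_page(batch_size=500):
    # stands in for pRepository.findByStatusAndParentId(PENDING, -1, new PageRequest(0, 500, ...))
    return [r for r in rows if r["status"] == "PENDING"][:batch_size]

while True:
    batch = fetch_first_pending_page()
    if not batch:
        break                        # nothing left to process
    for r in batch:                  # process the batch, then mark it SUCCESS so the
        r["status"] = "SUCCESS"      # next "first page" picks up the following 500 rows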

Cassandra slow get_indexed_slices speed

We are using Cassandra for log collecting.
About 150,000 - 250,000 new records per hour.
Our column family has several columns like 'host', 'errorlevel', 'message', etc., and a special indexed column 'indexTimestamp'.
This column contains the time rounded to hours.
So, when we want to get some records, we use get_indexed_slices() with a first IndexExpression on indexTimestamp (with the EQ operator) and then some other IndexExpressions - by host, errorlevel, etc.
When getting records just by indexTimestamp everything works fine.
But when getting records by indexTimestamp and, for example, host, Cassandra works for a long time (more than 15-20 seconds) and throws a timeout exception.
As I understand it, when getting records by an indexed column and a non-indexed column, Cassandra first gets all records by the indexed column and then filters them by the non-indexed columns.
So why is Cassandra so slow at this? For a given indexTimestamp there are no more than 250,000 records. Shouldn't it be possible to filter them within 10 seconds?
Our Cassandra cluster is running on one machine (Windows 7) with 4 CPUs and 4 GB of memory.
You have to bear in mind that Cassandra is very bad with this kind of query. Secondary-index queries are not meant for big tables. If you want to search your data with this type of query, you have to tailor your data model around it.
In fact, Cassandra is not a DB you can query. It is a key-value storage system. To understand that, have a quick look here: http://howfuckedismydatabase.com/
The most basic pattern to help you is bucketed rows and ranged slice queries.
Let's say you have the object
user : {
    name : "XXXXX",
    country : "UK",
    city : "London",
    postal_code : "N1 2AC",
    age : "24"
}
and of course you want to query by city OR by age (combining conditions with AND or OR would require yet another data model).
Then you would have to save your data like this, assuming the name is a unique id:
write(row = "UK", column_name = "city_XXXX", value = {...})
AND
write(row = "bucket_20_to_25", column_name = "24_XXXX", value = {...})
Note that I bucketed by country for the city search and by age bracket for the age search.
The range query for age EQ 24 would be:
get_range_slice(row= "bucket_20_to_25", from = "24-", to = "24=")
As a note, the from and to bounds are chosen to sort just below and just above the "_" separator (pick the bound characters according to your comparator's ordering), effectively giving you all the columns that start with "24_".
This also allows you to query for ages between 21 and 24, for example.
Hope it was useful.
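A rough Python sketch of how the bucket keys and slice bounds above could be composed (reading the "city_XXXX" column name as <city>_<name>); the actual write/read calls are left out since they depend on the client library, and the bound characters here are derived from the "_" separator rather than the literal "-"/"=" shown above:
def age_bucket(age, width=5):
    # hypothetical bucketing of ages into 5-year rows, e.g. 24 -> "bucket_20_to_25"
    low = (int(age) // width) * width
    return f"bucket_{low}_to_{low + width}"

def bucket_writes(user):
    # one (row_key, column_name) pair per query you want to support; the column
    # name starts with the value you range on, followed by the unique name
    return [
        (user["country"], f"{user['city']}_{user['name']}"),          # query by city
        (age_bucket(user["age"]), f"{user['age']}_{user['name']}"),   # query by age
    ]

def slice_bounds(prefix, sep="_"):
    # bounds that sort just below and just above "<prefix>_" in byte order, so a
    # range slice returns every column whose name starts with that prefix
    return prefix + chr(ord(sep) - 1), prefix + chr(ord(sep) + 1)

user = {"name": "XXXXX", "country": "UK", "city": "London", "age": "24"}
print(bucket_writes(user))   # [('UK', 'London_XXXXX'), ('bucket_20_to_25', '24_XXXXX')]
print(slice_bounds("24"))    # ('24^', '24`') - covers every column starting with "24_"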
