MailChimp retrieve large list data - mailchimp

I have a campaign that I want to analyze, and for this I use the MailChimp open-details API call:
curl -X GET 'https://<server>.api.mailchimp.com/3.0/reports/<CAMPAIGN ID>/open-details?offset=50&count=1000' --user "anystring:APIKEY"
However, the campaign has 36 thousand members, and the limit seems to be one thousand. I have set the offset to 50, but I cannot see any link for the next page in the pagination. Is it possible to retrieve such a large number from MailChimp?

Offset is the number of records to skip, not the number of pages. To get the 2nd page, you need to set offset to 1000.
curl -X GET 'https://<server>.api.mailchimp.com/3.0/reports/<CAMPAIGN ID>/open-details?offset=1000&count=1000' --user "anystring:APIKEY"
To get the 3rd page, you need to set offset to 2000.
curl -X GET 'https://<server>.api.mailchimp.com/3.0/reports/<CAMPAIGN ID>/open-details?offset=2000&count=1000' --user "anystring:APIKEY"
And so on.
Check the number of records in the result to determine when to stop iterating. If the number of records is less than 1000, you have reached the last page.

Related

REST API - Retrieve previous query in dynamoDB

I have 100 rows of data in DynamoDB and an API with the path api/get/{number}.
When number=1, the API should return the first 10 values; when number=2, it should return the next 10. I implemented this with a query, lastEvaluatedKey, and sorting on createdOn. The problem is that if the user passes number=10 after number=2, the lastEvaluatedKey is still that of page 2, so the result is the data of page 3. How can I get the data for a given page directly? Likewise, if the user goes from number=3 to number=1, the data returned is not that of page 1.
I am using this for API calls that drive pagination in an HTML page.
I am using Java 1.8 and aws-java-sdk-dynamodb.
Non-sequential pagination in DynamoDB is tough - you have to design your data model around it, if it's an operation that needs to be efficient at all times. For a recommendation in your specific case I'd need more details about the data and access patterns.
In general you have the option of setting the ExclusiveStartKey attribute in the query call, which is similar to an offset in relational databases, but only similar and not identical. The ExclusiveStartKey is the key after which the query will continue, meaning data from your table and not just a number.
That means you usually can't guess it, unless it's a sequential number - which isn't ideal.
For sequential pagination, i.e. the user goes from page 1 to page 2, page 2 to page 3 etc. you can pass that along in the request as a token, but that won't work if the user moves in the other direction page 3 to page 2 or just randomly navigates to page 14.
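To illustrate passing the key along as a token, here is a minimal Python sketch (the question uses Java, so treat it as an illustration of the idea only). It assumes the key from the previous query response is a JSON-serializable dict; the attribute names `id` and `createdOn` in the example key are hypothetical, and no DynamoDB call is made.

```python
import base64
import json
from typing import Optional


def encode_page_token(last_evaluated_key: Optional[dict]) -> Optional[str]:
    """Turn the key returned by the previous query into an opaque token."""
    if last_evaluated_key is None:
        return None  # no more pages
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_page_token(token: Optional[str]) -> Optional[dict]:
    """Recover the key to pass as ExclusiveStartKey on the next query."""
    if not token:
        return None  # first page: start from the beginning
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


# Hypothetical key shape, for illustration only.
key = {"id": {"S": "item-20"}, "createdOn": {"N": "1700000000"}}
token = encode_page_token(key)
print(token)
```

The client sends the token back with the next request, and the server decodes it and uses the result as the ExclusiveStartKey. This only supports moving forward one page at a time, which is the limitation described above.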
In your case you only have a limited amount of data (100 items), so my solution for your specific case would be to query all items and limit the number of items in the response to n * 10, where n is the result page. Then you return the last 10 items from that result to your client.
This solution would get expensive at scale (time + cost). Fortunately, not many people will use the pagination to go to page 7 or 8 (you could bury a body on page 2 of the Google search results).
Yan Cui has written an interesting post on this problem on Hackernoon, you might want to check it out.
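The "fetch n * 10 items, return the last page" approach can be sketched as follows. This is an in-memory stand-in written in Python, not a DynamoDB call: `sorted_items` is assumed to be the table contents already ordered by createdOn, and the slice `[: n * page_size]` plays the role of a query limited to the first n * 10 items.

```python
from typing import List, Sequence


def get_page(sorted_items: Sequence[dict], n: int, page_size: int = 10) -> List[dict]:
    """Return page n (1-based) by reading the first n * page_size items."""
    if n < 1:
        raise ValueError("page numbers start at 1")
    # Stand-in for a query limited to the first n * page_size items.
    fetched = sorted_items[: n * page_size]
    # Keep only the items that belong to page n. Slicing from the page start,
    # rather than taking the last page_size items, handles a short final page.
    return list(fetched[(n - 1) * page_size:])


# Hypothetical data: 100 items ordered by createdOn.
items = [{"id": i, "createdOn": i} for i in range(1, 101)]
print([item["id"] for item in get_page(items, 3)])
```

Because every request starts from the beginning, the user can jump from page 2 to page 10 or back to page 1 without any stored state, at the cost of reading more items for later pages.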

Infinite/paginated scrolling with caching

I have a requirement where I need to display a long table. It doesn't have to be displayed all at once, so ajax loading it is (load the first 50 records, then get another 50 rows every time the user scrolls to/past the tenth row from the last).
But I'm not sure which of the two, pagination and infinite scrolling, is better. I'd like the user to be able to skip to the last scrolled-to point when returning to the page (through Back button, definitely; if I can do that whenever, however user visits the page, even better!) with the previous rows visible as well. At the same time, for performance, I want to restrict the number of ajax calls to as low as I can keep it.
Any thoughts?
To implement such a scenario, first consume an API that takes the page number and the number of records as request parameters.
For example: 'www.abc.com/v1/tableData/pageId=1&noOfRecords=50'
This returns the first 50 records. The response should also give you the total number of records available in the database.
When the user scrolls down, increase pageId by 1.
For example: 'www.abc.com/v1/tableData/pageId=2&noOfRecords=50'
In the same way, keep increasing pageId until the number of records you have loaded so far equals the total number of records reported by the API.
This is how you can implement it.
As for performance, it does not matter whether you use pagination or scrolling, since in both cases you restrict the number of records displayed.
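The steps above can be sketched as a small loader that is called each time the user scrolls near the end of the table. This is a Python sketch under stated assumptions, not a definitive implementation: `fetch` is a hypothetical callable standing in for the ajax call, assumed to return a tuple of (rows for that page, total number of records). The class name `ScrollLoader` is also made up for illustration.

```python
from typing import Callable, List, Optional, Tuple


class ScrollLoader:
    """Loads one page per call until the reported total has been reached."""

    def __init__(self, fetch: Callable[[int, int], Tuple[List[dict], int]], page_size: int = 50):
        self._fetch = fetch
        self.page_size = page_size
        self.next_page_id = 1
        self.total: Optional[int] = None
        self.rows: List[dict] = []  # everything loaded so far

    @property
    def done(self) -> bool:
        return self.total is not None and len(self.rows) >= self.total

    def load_next(self) -> List[dict]:
        """Call when the user scrolls near the end; returns the new rows."""
        if self.done:
            return []
        page_rows, self.total = self._fetch(self.next_page_id, self.page_size)
        if not page_rows:
            # Nothing came back: treat what we have as the full set.
            self.total = len(self.rows)
        self.rows.extend(page_rows)
        self.next_page_id += 1
        return page_rows


# Hypothetical stand-in for the server: a table of 120 rows.
_TABLE = [{"row": i} for i in range(120)]


def fake_fetch(page_id: int, page_size: int) -> Tuple[List[dict], int]:
    start = (page_id - 1) * page_size
    return _TABLE[start:start + page_size], len(_TABLE)


loader = ScrollLoader(fake_fetch)
calls = 0
while not loader.done:
    loader.load_next()
    calls += 1
print(calls, len(loader.rows))
```

Keeping the loaded rows and the next page number in one object also gives a natural place to save and restore the scroll position when the user returns to the page.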

How to combine multiple page records?

I have a model called DemoModel that contains 1000 records in the DB. I am paginating with Django's Paginator (assume 15 records per page, so I have 67 pages).
I want to get the records of pages 3, 4 and 5 and append them to a list.
Can I get the object list based on a page range, or is there something else I should do?
Example:
records.page(1)
Here I get only one page of records at a time. How can I get the records of multiple pages, i.e. from the first page to the third page?
Assuming you are asking about the API request to get the paginated resources, and you are using the rest_framework.pagination.LimitOffsetPagination pagination class, you can make a request such as:
https://api.example.org/accounts/?limit=30&offset=15
which in turn gives you the 2nd and 3rd "page" when the page size is 15.
The limit indicates the maximum number of items to return, and is equivalent to the page_size in other styles. The offset indicates the starting position of the query in relation to the complete set of unpaginated items.
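The arithmetic for turning a page range into limit and offset values can be sketched as below. The helper name `limit_offset_for_pages` is hypothetical and is not part of Django or Django REST framework; it only computes the two query parameters.

```python
from typing import Tuple


def limit_offset_for_pages(first_page: int, last_page: int, page_size: int) -> Tuple[int, int]:
    """Return (limit, offset) covering pages first_page..last_page (1-based)."""
    if first_page < 1 or last_page < first_page or page_size < 1:
        raise ValueError("invalid page range or page size")
    offset = (first_page - 1) * page_size               # items to skip
    limit = (last_page - first_page + 1) * page_size    # items to return
    return limit, offset


# Pages 3, 4 and 5 with 15 records per page, as in the question.
limit, offset = limit_offset_for_pages(3, 5, 15)
query = f"?limit={limit}&offset={offset}"
print(query)
```

Note that if the pagination class sets max_limit, a requested limit larger than that value is capped, so a wide page range may still need more than one request.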

Elasticsearch Scroll

I am a little bit confused by the scroll functionality of Elasticsearch.
In Elasticsearch, is it possible to call the search API every time the user scrolls on the result set?
From documentation
"search_type" => "scan", // use search_type=scan
"scroll" => "30s", // how long between scroll requests. should be small!
"size" => 50, // how many results *per shard* you want back
Does that mean it will perform a search every 30 seconds and return all the sets of results until there are no more records?
For example, my ES returns 500 records in total. I am getting the data from ES as two sets of 250 records each. Is there any way I can display the first set of 250 records first and then, when the user scrolls, the second set of 250 records? Please suggest.
What you are looking for is pagination.
You can achieve your objective by querying for a fixed size and setting the from parameter. Since you want to display results in batches of 250, you can set size = 250 and, with each consecutive query, increment the value of from by 250.
GET /_search?size=250 ---- return first 250 results
GET /_search?size=250&from=250 ---- next 250 results
GET /_search?size=250&from=500 ---- next 250 results
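The from/size arithmetic for each batch can be sketched in Python as follows. The helper name `from_size_params` is hypothetical; it only builds the query parameters and does not call Elasticsearch. The 10,000 default used for the guard is the default value of the index.max_result_window setting.

```python
from typing import Dict


def from_size_params(batch_index: int, size: int = 250, max_result_window: int = 10000) -> Dict[str, int]:
    """Build the from/size parameters for a zero-based batch index."""
    if batch_index < 0 or size < 1:
        raise ValueError("batch_index must be >= 0 and size must be >= 1")
    start = batch_index * size
    # from + size may not exceed the index's result window.
    if start + size > max_result_window:
        raise ValueError("requested window exceeds max_result_window")
    params = {"size": size}
    if start:
        params["from"] = start
    return params


for batch in range(3):
    print(from_size_params(batch))
```

Each time the user scrolls to the end of the displayed results, the client requests the next batch index.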
By contrast, scan & scroll lets you retrieve a large set of results with a single search and is ideally meant for operations like re-indexing data into a new index. Using it for displaying search results in real time is not recommended.
To explain scan & scroll briefly: it scans the index for the query provided with the scan request and returns a scroll_id. This scroll_id can be passed to the next scroll request to return the next batch of results.
Consider the following example:
from elasticsearch import Elasticsearch

# Assumes an older client that still accepts doc_type and search_type='scan'
# (the scan search type was removed in Elasticsearch 5.0).
es = Elasticsearch()

# Initialize the scroll
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    }
)
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page
In the above example, the following events happen:
The scroller is initialized. This returns the first batch of results along with the scroll_id.
For each subsequent scroll request, the updated scroll_id (received in the previous scroll request) is sent and the next batch of results is returned.
The scroll time is the time for which the search context is kept alive. If the next scroll request is not sent within that timeframe, the search context is lost and results will not be returned. This is why it should not be used for real-time results display for indexes with a huge number of docs.
You are misunderstanding the purpose of the scroll property. It does not mean that Elasticsearch will fetch the next page of data after 30 seconds. When you make the first scroll request, you need to specify when the scroll context should be closed. The scroll parameter tells Elasticsearch to close the scroll context after 30 seconds.
After the first scroll request you will get back a scroll_id parameter in the response. For the next pages you need to pass that value to get the next page of the scroll response. If you do not make the next scroll request within 30 seconds, the scroll context will be closed and you will not be able to get the next pages for that scroll request.
What you described as an example use case is actually search results pagination, which is available for any search query and is limited to 10k results. Scroll requests are needed when you have to go over that 10k limit; with a scroll query you can fetch even the entire collection of documents.
Probably the source of confusion here is that the term scroll is ambiguous: it is the type of a query, and it is also the name of a parameter of such a query (as was mentioned in other comments, it is the time ES will keep waiting for you to fetch the next chunk of scrolling).
Scroll queries are heavy and should be avoided unless absolutely necessary. In fact, the docs say:
Scrolling is not intended for real time user requests, but rather for processing large amounts of data, ...
Now, regarding your other question:
In elasticsearch is it possible to call search API everytime whenever the user scrolls on the result set?
Yes, even several parallel scroll requests are possible:
Each scroll is independent and can be processed in parallel like any scroll request.
The documentation of the Scroll API at elastic explains this behaviour also.
The result limit of 10k is a default value and can be overridden at runtime, if necessary:
PUT <index>/_settings { "index" : { "max_result_window" : 500000 } }
The life time of the scroll id is defined in each scroll request with the parameter "scroll", e.g.
..
"scroll" : "5m"
..
In recent versions of Elasticsearch, you'll use search_after together with a point in time (PIT). The keep_alive you set there, much like the timeout in the scroll, only needs to cover the time it takes you to process one page.
That's because Elasticsearch keeps your "search context" alive for that amount of time and then removes it. Also, Elasticsearch won't fetch the next page for you automatically; you have to do that by sending requests with the ID from the last request.
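As a rough illustration, the request body for each page of a search_after query over a point in time can be assembled like this. This is a sketch that only builds a Python dict and does not call Elasticsearch; the helper name `build_search_after_body`, the sort field `created_on`, and the PIT id string are all hypothetical. The values passed as `last_sort_values` are assumed to be the sort array of the last hit from the previous page.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_search_after_body(
    pit_id: str,
    sort: List[Dict[str, str]],
    size: int,
    last_sort_values: Optional[Sequence[Any]] = None,
    keep_alive: str = "1m",
    query: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the search body for one page of a search_after query."""
    body: Dict[str, Any] = {
        "size": size,
        "sort": sort,
        # keep_alive only needs to cover the time to process one page.
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }
    if query is not None:
        body["query"] = query
    if last_sort_values is not None:
        # Continue right after the last hit of the previous page.
        body["search_after"] = list(last_sort_values)
    return body


# Hypothetical sort field and PIT id, for illustration only.
sort_spec = [{"created_on": "asc"}]
first_page = build_search_after_body("example-pit-id", sort_spec, size=250)
next_page = build_search_after_body(
    "example-pit-id", sort_spec, size=250, last_sort_values=[1700000000000]
)
print(next_page)
```

The first request omits search_after; every later request repeats the same sort and passes the sort values of the last hit it received.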
It is wise to use the scroll API, as by default one cannot get more than 10K results through from/size pagination in Elasticsearch.

Youtube Data API v3 - Able to fetch page results only for limited number of pages

I am using the Search.List method from the YouTube Data API v3 to do a keyword search with maxResults=50 per page. The totalResults value is more than 13000, and I am able to send the nextPageToken from the second query onwards and fetch the subsequent page results. But beyond 10-12 pages I do not get the nextPageToken parameter in my response at all. (Since the totalResults is more than 13000, I should get at least around 260 pages.)
How do I get page results for the remaining pages? Is this something to do with the quota?

Resources