I'm currently working on integrating an external API that uses page-based pagination with links, and that also enforces limits we have to respect, for example: no more than 1 request per second.
So I've created a command that will be scheduled every hour. Its goal is to fetch all the data from the external API and dispatch it to queued jobs, so that I can throttle the requests with Redis.
The main problem is that this API paginates its responses with links like these:
"links": [
{
"href": "https://testapi.io/v1/datas?size=50&limit=2000&page=1",
"rel": "first",
"method": "GET"
},
{
"href": "https://testapi.io/v1/datas?size=50&limit=2000&page=1",
"rel": "previous",
"method": "GET"
},
{
"href": "https://testapi.io/v1/datas?size=50&limit=2000&page=2",
"rel": "next",
"method": "GET"
},
{
"href": "https://testapi.io/v1/datas?size=50&limit=2000&page=4",
"rel": "last",
"method": "GET"
}
]
So I've tried to write an algorithm which:
Creates a job to fetch the first page of results
This job makes a request to the external API and saves the first results
Loops over the pages until I reach the page given by the "last" entry in the links
Each page dispatches the same job to store its data in my database
The main problem is that my jobs will not wait for each other before being fired. Once the first page's job is dispatched, I can't get back the page I'm on, so how can I know when I've reached the end of the results without returning data from that first job?
I think this is not a problem related to Laravel, but rather a logic problem. This is the code I'm trying to implement in my command:
$testService = new testService();

// We fetch all the data from the external API
$page = 1;
$lastPage = 0;

// We get the data of the first page of the API results
$results = MyJob::dispatch($this->size, $page, $testService)->onConnection('medium');
$page++;

// After the first request, we would read the "last" page from the "links" of the response.
// This can't work, because dispatching the job does not return any data.
$pages = $results->links;
foreach ($pages as $singlePage) {
    if ($singlePage->rel == "last") {
        $lastPage = substr($singlePage->href, -1, 1);
    }
}

// While we still have pages to explore, we fetch them
while ($page <= $lastPage) {
    MyJob::dispatch($this->size, $page, $testService)->onConnection('medium');
    $page++;
}
The problem is that I can only know the total number of pages from inside the first executed job... which means my code would either run into an infinite loop or not work at all, because I can't return data from the first executed job.
Is there a logical way to fetch and loop over a paginated external API with one job per request, so that I can be sure to respect the limitations?
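One idea I'm considering, sketched below, is to let each job dispatch the job for the following page once it has parsed the links of its own response, so the command only has to dispatch page 1 and the chain stops by itself when there is no "next" link. This is a rough, untested sketch, not my real code: FetchPageJob is a hypothetical job name, and I use Laravel's Http facade here in place of my real testService.

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Redis;

// Hypothetical job: fetches one page, stores it, then queues the next page.
class FetchPageJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(public int $size, public int $page)
    {
    }

    public function handle(): void
    {
        // Allow at most 1 request per second across all workers (Redis throttle).
        Redis::throttle('external-api')->allow(1)->every(1)->then(function () {
            $response = Http::get('https://testapi.io/v1/datas', [
                'size' => $this->size,
                'page' => $this->page,
            ])->json();

            // ... store the results of this page in the database ...

            // If the response advertises a "next" link, queue the next page.
            foreach ($response['links'] as $link) {
                if ($link['rel'] === 'next') {
                    self::dispatch($this->size, $this->page + 1)->onConnection('medium');
                }
            }
        }, function () {
            // Could not get the throttle lock: release the job back onto the queue.
            $this->release(2);
        });
    }
}

This way I would never need to know the last page up front, and the throttle would keep me under one request per second.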
While developing a pipeline that will use Elasticsearch as a source, I ran into an issue related to paging. I am using the Elasticsearch SQL API. Basically, I started by making the request in Postman and it works well. The request body looks like this:
{
    "query": "SELECT Id,name,ownership,modifiedDate FROM \"core\" ORDER BY Id",
    "fetch_size": 20,
    "cursor": ""
}
After the first run, the response body contains a cursor string, which is a pointer to the next page. If I send the request again in Postman and provide the cursor value from the previous response, it returns the data for the second page, and so on. I am trying to achieve the same result in Azure Data Factory. For this I use a Copy activity, which stores the response to an Azure blob. The source setup is the following:
[screenshot: copy activity source configuration]
This is the expression for the body:
{
    "query": "SELECT Id,name,ownership,modifiedDate FROM \"#{variables('TableName')}\" WHERE ORDER BY Id",
    "fetch_size": #{variables('Rows')},
    "cursor": ""
}
I have no idea how to correctly set up the pagination rule. The pipeline works properly, but only for the first request. I've tried to set up Headers.cursor with the expression $.cursor, but this setup leads to an infinite loop and the pipeline fails when it hits the Elasticsearch restriction.
I've also tried to read the documentation at https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support, but it seems pretty limited in terms of usage examples and difficult to understand.
Could somebody help me understand how to build the pipeline so that it makes use of paging?
The response with the cursor looks like:
{
    "columns": [
        {
            "name": "companyId",
            "type": "integer"
        },
        {
            "name": "name",
            "type": "text"
        },
        {
            "name": "ownership",
            "type": "keyword"
        },
        {
            "name": "modifiedDate",
            "type": "datetime"
        }
    ],
    "rows": [
        [
            2,
            "mic Inc.",
            "manufacture",
            "2021-03-31T12:57:51.000Z"
        ]
    ],
"cursor": "g/WuAwFaAXNoRG5GMVpYSjVWR2hsYmtabGRHTm9BZ0FBQUFBRUp6VGxGbUpIZWxWaVMzcGhVWEJITUhkbmJsRlhlUzFtWjNjQUFBQUFCQ2MwNWhaaVIzcFZZa3Q2WVZGd1J6QjNaMjVSVjNrdFptZDP/////DwQBZgljb21wYW55SWQBCWNvbXBhbnlJZAEHaW50ZWdlcgAAAAFmBG5hbWUBBG5hbWUBBHRleHQAAAABZglvd25lcnNoaXABCW93bmVyc2hpcAEHa2V5d29yZAEAAAFmDG1vZGlmaWVkRGF0ZQEMbW9kaWZpZWREYXRlAQhkYXRldGltZQEAAAEP"
}
I finally found the solution; hopefully it will be useful for the community.
Basically, what needs to be done is to split the solution into four steps.
Step 1: Make the first request as in the question description and stage the file to blob storage.
Step 2: Read the blob file, get the cursor value, and set it to a variable.
Step 3: Keep requesting data with a changed body:
{"cursor" : "#{variables('cursor')}" }
The pipeline looks like this:
[screenshot: pipeline]
The pagination configuration looks like the following:
[screenshot: pagination rule] It is a workaround, as the server ignores this header, but we need something that allows sending the request in a loop.
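For reference, and only as a rough sketch of what the screenshot expresses (not verified against every connector version), the equivalent pagination rule in the Copy activity's REST source JSON should look roughly like this; the value "$.cursor" tells ADF where to read the continuation token from the response:

"source": {
    "type": "RestSource",
    "paginationRules": {
        "Headers.cursor": "$.cursor"
    }
}

Since Elasticsearch expects the cursor in the request body rather than in a header, the header set by this rule is ignored by the server; it only serves to keep the Copy activity looping while the steps above inject the real cursor into the body.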
"items": {
"a" : {
"size" : "small",
"text" : "small thing"
},
"b" : {
"size" : "medium",
"text" : "medium sample"
},
"c" : {
"size" : "large",
"text" : "large widget"
}
}
Suppose I have data as above. I want to get the data for key a from the items list in my component file, without iterating over the whole list. I tried the docs but could not find this kind of requirement, and the solutions on this platform relate to versions before 5.0. Does anyone know how we can achieve it?
I suppose you are referring to the Realtime Database, because you mentioned an "items list", whereas in Firestore it would be a collection.
To get an object, use the object() method of the AngularFireDatabase service.
Examples:
// Get an object, just the data (valueChanges function), only once (take(1) operator).
// "db" is the AngularFireDatabase service injected into the component's constructor.
import 'rxjs/add/operator/take';

let subscription = this.db.object('items/a').valueChanges().take(1).subscribe(
    data => {
        console.log(data);
    }
);
Another way is to call subscription.unsubscribe() inside the subscription callback instead of using the take() operator:
let subscription = this.db.object('items/a').valueChanges().subscribe(
    data => {
        console.log(data);
        subscription.unsubscribe();
    }
);
The example above gets the data at the time of execution; it does not receive changes that occur to the object after that.
// Get an object, just the data (valueChanges function), and all subsequent changes
let subscription = this.db.object('items/a').valueChanges().subscribe(
    data => {
        console.log(data);
    }
);
The example above gets the data again every time the object changes.
I have tested both examples.
You can read about AngularFire2 version 5 here and here.
You can read about retrieving data from objects here.
I am using the Firebase Realtime Database in my application, but I am facing one weird issue: the first Firebase call takes a very long time; after the first response, it works much faster.
Database.database().reference()
    .child(FireBaseTable.bpmTable)
    .child(firebaseKey)
    .queryOrdered(byChild: "timestamp")
    .observeSingleEvent(of: .value, with: { (snapshot) in
        print("Initial load done2")
    })
After the first response, the same code with a different (or the same) key responds much faster.
A solution could be to index data in your rules: https://firebase.google.com/docs/database/security/indexing-data
{
    "lambeosaurus": {
        "height": 2.1,
        "length": 12.5,
        "weight": 5000
    },
    "stegosaurus": {
        "height": 4,
        "length": 9,
        "weight": 2500
    }
}
An index can be set like this:
{
    "rules": {
        "dinosaurs": {
            ".indexOn": ["height", "length"]
        }
    }
}
"Firebase allows you to do ad-hoc queries on your data using an
arbitrary child key. If you know in advance what your indexes will be,
you can define them via the .indexOn rule in your Firebase Realtime
Database Rules to improve query performance."
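Applied to the query in the question, which orders by "timestamp", the .indexOn rule would go on the node whose children are being ordered. The sketch below assumes the data lives under bpmTable/<firebaseKey>/<pushId>/timestamp (the actual value of FireBaseTable.bpmTable and the structure below it are assumptions; adjust the path to your real layout):

{
    "rules": {
        "bpmTable": {
            "$firebaseKey": {
                ".indexOn": ["timestamp"]
            }
        }
    }
}

Without such an index, the SDK warns about an unspecified index and the data at that location is downloaded and ordered on the client, which can make the first, uncached query noticeably slower for larger nodes.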
In a previous discussion thread (How to output multiple blobs from an Azure Function?) it was explained to me how to use imperative binding to output multiple blobs from a single Azure Function invocation. I have had partial success with that approach, and need guidance on diagnosing this problem. My function triggers on a blob, processes it, and generates multiple output blobs; basically it is partitioning a big data table by date.
When I trigger it with a small blob (8K) it works fine. When I process a bigger blob (2M), all the logging in the function indicates that it was successful, but the function monitor blade shows that it failed:
Failure: The operation was canceled.
Again, the function log has all my logging and no errors.
The invocation that succeeded took 1,785 ms.
The invocation that failed has multiple entries in the invocation log (I assume because the blob didn't get marked as processed). Their times are all around 12,000 ms, which is well within the five-minute limit for a function.
I assume that I've hit some limit with imperative binding and it is timing out.
I am seeking guidance on how to diagnose and resolve this problem. The files I actually have to process are up to 20M, so they will take even longer to process, but should still stay under five minutes.
function.json:
{
    "bindings": [
        {
            "name": "myBlob",
            "type": "blobTrigger",
            "direction": "in",
            "path": "dms/{blobname}",
            "connection": "deal2_STORAGE"
        },
        {
            "type": "queue",
            "name": "emailQueueItem",
            "queueName": "emailqueue",
            "connection": "deal2_STORAGE",
            "direction": "out"
        }
    ],
    "disabled": false
}
run.csx
public static async Task Run(Stream myBlob, string blobname, IAsyncCollector<string> emailQueueItem, Binder binder, TraceWriter log)
{
    ...
    try {
        ...
        foreach (var dt in dates) {
            blobPath = $"json/{fileNamePart}_{dateString}";
            // Imperatively bind an output blob for this date partition
            var attributes = new Attribute[] {
                new BlobAttribute(blobPath),
                new StorageAccountAttribute("deal2_STORAGE")
            };
            using (var writer = await binder.BindAsync<TextWriter>(attributes).ConfigureAwait(false)) {
                writer.Write( jsonString.ToString() );
            }
        }
        ...
        await emailQueueItem.AddAsync( $"{{\"script\":\"DmsBlobTrigger\",\"tsvFileName\":\"{tsvFileName}\",\"status\":\"{retval}\",\"message\":\"{statusMessage}\"}}" );
    } catch (Exception excp) {
        Logger.Info(excp.ToString());
    }
}
How can I read JSON from a URL in MQL5?
For example, this simple JSON from https://api.myjson.com/bins/56z28:
{ "employees": [ { "firstName": "John",
"lastName": "Doe"
},
{ "firstName": "Anna",
"lastName": "Smith"
},
{ "firstName": "Peter",
"lastName": "Jones"
}
]
}
Simple, but restrictions apply.
MetaTrader Terminal 5 is a code-execution environment that can communicate with an external URL target (if it is explicitly configured as a permitted URL) via the HTTP/HTTPS protocols over ports 80/443 respectively.
string aCookieHOLDER = NULL,
       aHttpHEADERs;
char   postBYTEs[],
       replBYTEs[];
int    aRetCODE;
string aTargetURL = "https://api.myjson.com/bins/56z28";
/* to enable access to the URL-> pointed server,
   you should append "https://api.myjson.com/bins/56z28"
   to the list of allowed URLs in
   ( Main Menu -> Tools -> Options, tab "Expert Advisors" ):
*/
ResetLastError();                     // Reset the last error code
int aTIMEOUT = 5000;                  // less than 1 sec. is NOT
                                      // enough for slow Internet connection
aRetCODE = WebRequest( "GET",
                       aTargetURL,
                       aCookieHOLDER,
                       NULL,
                       aTIMEOUT,
                       postBYTEs,
                       0,
                       replBYTEs,
                       aHttpHEADERs
                       );
if ( aRetCODE == -1 )                 // Check errors: WebRequest() returns -1 on failure
{  Print( "Error in WebRequest(). Error code = ", GetLastError() );
}
else
{ // Load was successful, PROCESS THE STRING ... assumed to be a JSON
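  // A hedged sketch of the next step, assuming the reply bytes hold plain text:
  // convert the raw bytes into a string first ( CharArrayToString() is a standard
  // MQL5 function ); actually parsing the JSON needs a library or custom code,
  // since MQL5 has no built-in JSON parser.
  string aJsonSTRING = CharArrayToString( replBYTEs );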
}
As noted in the code, to use the WebRequest() function, one has to add the addresses of all the required URLs (servers) a priori to the list of allowed URLs in the "Expert Advisors" tab of the "Options" window. The server port is automatically selected on the basis of the specified protocol - 80 for "http://" and 443 for "https://" (not a free option...).
The WebRequest() function is synchronous, which means it breaks/blocks(!) the program execution and waits for the response from the requested URL. Since the delays in receiving a response can be large, the function is not available for calls from indicators, because indicators run in a common thread shared by all indicators and charts on one symbol. A delay in one indicator on one of the charts of a symbol may stop the updating of all charts of the same symbol (!!!!).
The function can be called only from Expert Advisors and scripts, as they run in their own execution threads. If you try to call the function from a Custom Indicator, GetLastError() will return error 4060 – "Function is not allowed for call".
WebRequest() cannot be executed in the Strategy Tester.
Bad news?
If all this sounds like bad news for your project, do not give up. MQL code can call DLL functions, so one can integrate a fair, distributed, non-blocking communicator that cooperates with MQL code smoothly and does not suffer from any of the above limitations in a production system.