Power Query sends multiple requests instead of one. How to deal with it? - powerquery

My code for requesting data is simple. The server returns 404 (with the simple message "Broken data") for every request except every 10th one; on the 10th request the server returns 200 (with a different simple text, "Data from server").
So in Power Query I have this piece of code:
producer = (val) =>
    let
        result = Web.Contents(url, [ManualStatusHandling = {404}]), // (1)
        status = Value.Metadata(result)[Response.Status], // (2)
        actualResult = if status = 404 then null else result // (3)
    in
        Text.FromBinary(actualResult)
So, when I run the query, the result of (1) is okay, and the status in (2) is okay. But when line (3) is evaluated, instead of reusing (1) it resends the request and gets the wrong result.
I've tried wrapping (1) in Binary.Buffer, but in that case Value.Metadata returns an empty record.
How can I force Power Query to send only one request, or otherwise handle this manually? Thanks!
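For comparison only, here is a minimal sketch of the intended one-request flow outside Power Query, written in Python with the requests library (the url is a placeholder). It is not a Power Query fix, just an illustration of the behaviour the question is after: fetch once, inspect the status of that single response, and reuse its body instead of issuing a second request.

import requests   # illustration only; this is not a Power Query / M solution

def producer(url):
    # Send the request exactly once and keep the response object.
    response = requests.get(url)
    # Inspect the status of that single response...
    if response.status_code == 404:
        return None
    # ...and reuse its already-downloaded body instead of requesting again.
    return response.text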

Related

Web.Content calling API service and merging pages with List.Transform started to fail

I created a Power BI report which connects to the data source via an API service. The returned JSON contains thousands of entities. The API service is called via the Web.Contents function. The API always returns the total record count, so we are able to calculate the nr. of pages that have to be called to obtain the whole dataset. This report displays data from our servicedesk app, which is deployed on many servers for many customers, and it uses query parameters to connect to any of these servers.
The detail of the Power Query is below.
Why am I writing here: this report had been working without any issue for more than 1.5 years, but on August 17th one of the servers started causing errors in the step Pages, where some random lines (pages) contain errors - see the attached picture labeled "Errors in step Pages". This is the reason that the next step, Entities (List.Union), stops the refresh and generates errors with the message:
Expression.Error: We cannot apply field access to the type List. Details: Value=[List] Key=requests
What is notable:
The API service returns records in the same order, but the faulty lists are random when calling with the same parameters.
Sometimes the refresh completes without any error.
The same Power Query run against another server works correctly; the problem is only with one specific server.
This problem started without notice on the most important server after 1.5 years without any problem.
Here is the full text of the query for this main source, which is used later in other queries to extract all the necessary data. The JSON is really complicated and I extract from it a list of requests, a list of solvers, a list of solver groups, ...; this base query and its output are the input for many referenced queries.
Errors in step Pages
let
    BaseAPIUrl = apiurl & "apiservice?", /*apiurl is a parameter - the name of the server, e.g. https://xxxx.xxxxxx.sk/ */
    EntitiesPerPage = RecordsPerPage, /*RecordsPerPage is a parameter and defines the nr. of records per page - we used 200-400 records per page as the optimum, but it also works with 4000 records per page*/
    ApiToken = FnApiToken(), /*this function returns the apitoken value, which is the return value of another API service apiurl&"api/auth/login" that uses the username and password in the body of the call to get the apitoken*/
    GetJson = (QParm) => /*definition of the general function used to get data from the data source*/
        let
            Options =
                [
                    Query = QParm,
                    Headers =
                        [
                            Accept = "application/json",
                            ApiKeyName = "apitoken",
                            Authorization = ApiToken
                        ]
                ],
            RawData = Web.Contents(BaseAPIUrl, Options),
            Json = Json.Document(RawData)
        in
            Json,
    GetEntityCount = () => /*function called once to get the nr. of records via GetJson; the count is returned as part of each call*/
        let
            QParm = [pp = "1", pg = "1"],
            Json = GetJson(QParm),
            Count = Json[totalRecord]
        in
            Count,
    GetPage = (Index) => /*repeatedly called function to get each page of JSON via GetJson*/
        let
            PageNr = Text.From(Index + 1),
            PerPage = Text.From(EntitiesPerPage),
            QParm = [pg = PageNr, pp = PerPage],
            Json = GetJson(QParm),
            Value = Json[data][requests]
        in
            Value,
    EntityCount = List.Max({ EntitiesPerPage, GetEntityCount() }), /*store the nr. of records in a variable*/
    PageCount = Number.RoundUp(EntityCount / EntitiesPerPage), /*compute the nr. of pages*/
    PageIndices = { 0 .. PageCount - 1 },
    Pages = List.Transform(PageIndices, each GetPage(_) /*Function.InvokeAfter(()=>GetPage(_),#duration(0,0,0,1))*/), /*here GetPage is called for each page to get the whole dataset - the commented-out variant adds a delay between GetPage calls, but it was not necessary*/
    Entities = List.Union(Pages),
    Table = Table.FromList(Entities, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
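Purely as an illustration of the same pattern (get the total record count, compute the page count, then request each page), here is a minimal Python sketch using the requests library rather than M; the base URL, the pg/pp parameter names and the data.requests path mirror the query above, authentication headers are omitted, and the concrete values are placeholders rather than part of the original report.

import math
import requests

BASE_URL = "https://xxxx.xxxxxx.sk/apiservice"   # placeholder, mirrors apiurl & "apiservice?"
PER_PAGE = 400                                   # mirrors the RecordsPerPage parameter

def get_json(params):
    # One HTTP call per page; the apitoken headers are omitted here for brevity.
    return requests.get(BASE_URL, params=params).json()

def get_all_requests():
    # Ask for one record first, only to read the total record count.
    total = get_json({"pp": "1", "pg": "1"})["totalRecord"]
    page_count = math.ceil(max(total, PER_PAGE) / PER_PAGE)
    entities = []
    for index in range(page_count):
        page = get_json({"pg": str(index + 1), "pp": str(PER_PAGE)})
        entities.extend(page["data"]["requests"])
    return entities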
I also tried another way of appending the pages to a list, using List.Generate. This also produces random errors in the list, but
unlike the original List.Transform way it still makes it possible to transform the result to a table; however, the other referenced queries then fail and contain errors on the last row.
When I explore the content of a faulty page/list by extracting it via Add as New Query, all records are always there without any failure...
Source = List.Generate( /*another way to generate the list of all pages*/
    () => [Page = 0, ReqPageData = GetPage(0)],
    each [Page] < PageCount,
    each [ReqPageData = GetPage([Page]),
          Page = [Page] + 1],
    each [ReqPageData]
),
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), /*here I am able to generate a table from the list, in contrast to the List.Transform variant*/
#"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"), /*here I can expand the list into a column*/
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Expanded Column1", {"Column1"}) /*here I try to exclude errors, but I don't know what happened and which records (if any) are excluded*/
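For readers less familiar with List.Generate, its four arguments (initial state, condition, next state, selector) behave roughly like the following loop, sketched here in Python; this only illustrates the semantics of List.Generate itself, not the specific query above.

def list_generate(initial, condition, next_state, selector):
    # Rough counterpart of M's List.Generate(initial, condition, next, selector):
    # start from the initial state, emit selector(state) while the condition holds,
    # and advance the state with next_state after each emitted value.
    results = []
    state = initial()
    while condition(state):
        results.append(selector(state))
        state = next_state(state)
    return results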
Extracting errored page
And finally, I am totally clueless and unable to find the cause of this behavior on this specific server. I tested calling the errored pages via Postman, and I discussed the issue with the author of the API service; he also tried to call the API service with all the parameters, but the server returns every page OK. Only Power Query is not able to List.Transform them...
I will be grateful for any tips or advice, or to hear from anybody who has solved the same issue in the past...
Kuby
No, each errored line of the list in the List.Transform step could be extracted as a new query, and all records from that page are OK. Hmmmm.
Finally: the problem described in this issue was caused by "corrupted" content in the returned JSON. The provider of the core system informed me that they found a bug, and after the fix on the servicedesk side everything is OK again. I tried to find the problem in Power Query, but the problem was in the servicedesk. :(

Get the only failed document response in Bulk API Elasticsearch

I am struggling with the Bulk API. I am sending 100 requests (index, update) in every bulk request, and it gives me a response with the status of each request. Suppose my 97th request fails: I have to loop through the response to find that particular error document. I don't think that is an optimal way, and if I send a larger number of bulk requests it makes my process slow. Is there any way to get only the failed documents, or the count of failed/successful documents, in the response? I am using the php-elasticsearch SDK.
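For context, the bulk response already carries a top-level errors flag and a per-item status, so the failed documents can be picked out without hand-checking every successful item. Below is a minimal sketch of that filtering in Python (the question uses the PHP SDK, but the decoded response body has the same shape either way); the count-based approach in the answer that follows is an alternative when only totals are needed.

def failed_items(bulk_response):
    # bulk_response is the decoded JSON body returned by the _bulk endpoint.
    if not bulk_response.get("errors"):
        return []  # fast path: nothing failed in this bulk request
    failures = []
    for item in bulk_response["items"]:
        # Each item is keyed by its action, e.g. "index" or "update".
        action, result = next(iter(item.items()))
        if result.get("error") is not None:
            failures.append((action, result["_id"], result["error"]))
    return failures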
For the count of failed/successful documents you can use this method:
Get the document count of the index before the bulk action (you can skip this if the index does not exist yet):
$parameters = ["index" => "your_index", "type" => "your_type"];
$response = $esclient->count($parameters);
$old_count = $response['count'];
Use the refresh key with the value true in the parameters that you send with the bulk call; this refreshes the index after performing the bulk action:
$params['refresh'] = true;
$params['body'] = ...;
$total_count = count($params['body']) / 2; // get the total request count
$esclient->bulk($params);
After that you can use the count method again to find out how many documents exist:
$response = $esclient->count($parameters);
$new_count = $response['count'];
Get the total number of successes:
$total_success = $new_count - $old_count;
Get the total number of failures:
$total_fail = $total_count - $total_success;

How can I receive real-time updates from a long asynchronous process?

I'm writing a small, internal web application that reads in form data and creates an Excel file, which then gets emailed to the user.
However, I'm struggling to understand how I can implement real-time updates for the user as the process is being completed. Sometimes the process takes 10 seconds, and sometimes the process takes 5 minutes.
Currently the user waits until the process is complete before they see any results; they do not see any updates while the process is running. The front-end waits for a 201 response from the server before displaying the report information, and the user is "blocked" until the report creation (RC) is complete.
I'm having difficulty understanding how I can asynchronously start the RC process and at the same time allow the user to navigate to other pages of the site, or see updates happening in the background. I should clarify here that some of the steps in the RC process use Promises.
I'd like to poll the server every second to get an update on the report being generated.
Here's some simple code to clarify my understanding:
Endpoints
// CREATE REPORT
router.route('/report')
    .post(function(req, res, next) {
        // Generate unique ID to keep track of report later on.
        const uid = generateRandomID();
        // Start report process ... this should keep executing even after a response (201) is returned.
        CustomReportLibrary.createNewReport(req.formData, uid);
        // Respond with a successful creation.
        res.status(201).end();
    });
// GET REPORT
router.route('/report/:id')
    .get(function(req, res, next) {
        // Get our report from ID.
        let report = CustomReportLibrary.getReport(req.params.id);
        // Respond with report data
        if (report) { res.status(200).json(report); }
        else { res.status(404).end(); }
    });
CustomReportLibrary
// Initialize array to hold reports
let _dataStorage = [];

function createNewReport(data, id) {
    // Create an object to store our report information
    let reportObject = {
        id: id,
        status: 'Report has started the process',
        data: data
    };
    // Add new report to global array.
    _dataStorage.push(reportObject);
    // ... continue with report generation. Assume this takes 5 minutes.
    // ...
    // ... update _dataStorage[length-1].status after each step
    // ...
    // ... finish generation.
}

function getReport(id) {
    // Iterate through the array until a report with a matching ID is found.
    // Return the report if a match is found; return null if no match is found.
    return _dataStorage.find(report => report.id === id) || null;
}
From my understanding, CustomReportLibrary.createNewReport() will execute in the background even after a 201 response is returned. In the front-end, I'd make an AJAX call to /report/:id on an interval to get updates on my report (a polling sketch follows below). Is this the right way to do this? Is there a better way?
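A minimal sketch of that polling loop, written in Python with the requests library purely for illustration; the front-end would do the equivalent with setInterval and AJAX, and both the base URL and the "finished" status value are assumptions, since the final status string is not defined above.

import time
import requests

BASE_URL = "http://localhost:3000"   # assumption: wherever the Express app above is listening

def poll_report(report_id, interval_seconds=1):
    # Ask the server for the current report state once per interval and surface each status update.
    while True:
        response = requests.get(f"{BASE_URL}/report/{report_id}")
        if response.status_code == 404:
            return None                        # unknown report id, nothing to poll for
        report = response.json()
        print(report["status"])                # show the latest progress message to the user
        if report["status"] == "finished":     # hypothetical final status value
            return report
        time.sleep(interval_seconds)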
I think you are on the right track. HTTP 202 (The request has been accepted for processing, but the processing has not been completed) is a proper way to handle your case.
It can be done like this:
the client sends POST /reports, the server starts creating the new report and returns:
202 Accepted
Location: http://api.domain.com/reports/1
the client issues GET /reports/1 to get the status of the report
All the above flow is async, so users are not blocked.
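A minimal sketch of that flow, written here in Python with Flask rather than Express; the endpoint names and the in-memory store are placeholders, and a real report generator would replace the background thread.

import threading
import uuid
from flask import Flask, jsonify, url_for

app = Flask(__name__)
reports = {}  # in-memory status store, keyed by report id

def generate_report(report_id):
    # Placeholder for the long-running report creation; a real job would update the status as it progresses.
    reports[report_id]["status"] = "done"

@app.route("/reports", methods=["POST"])
def create_report():
    report_id = str(uuid.uuid4())
    reports[report_id] = {"status": "in progress"}
    threading.Thread(target=generate_report, args=(report_id,)).start()
    # 202 Accepted plus a Location header pointing at the status resource.
    return "", 202, {"Location": url_for("get_report", report_id=report_id)}

@app.route("/reports/<report_id>", methods=["GET"])
def get_report(report_id):
    report = reports.get(report_id)
    return (jsonify(report), 200) if report else ("", 404)

The client then repeats GET /reports/<id> (the URL from the Location header) until the status it returns is final.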

correct status code / description returning empty collection as web response

I have a method that returns all items purchased by a user, like this:
http://myapihost/items/{user_id}
Some users won't have any purchased items, so items will be empty: items = [].
What's the correct response when returning an empty collection?
//pass empty array with 200 OK
Request.CreateResponse(HttpStatusCode.OK, items);
Or
//pass message with 200 OK
Request.CreateResponse(HttpStatusCode.OK, "No items were purchased by the user.");
The problem with passing a string with 200 OK is that the end user would be forced to do another check before de-serializing the response to List<Item>, since 200 OK normally returns a List<Item>.
Or
Request.CreateResponse(HttpStatusCode.NoContent, "No items were purchased by the user.");
RFC defines NoContent as
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation.
So this response cannot be correct, right?
Or
Request.CreateResponse(HttpStatusCode.NotFound, "No items were purchased by the user.");
RFC defines NotFound as
The server has not found anything matching the Request-URI.
So this response cannot be correct too, right?
Or is there any other response?
Sending the empty array would be the best option, for two reasons (a minimal sketch follows below):
The end client always receives a consistent object type; of course the collection can be empty, but the client code will be a lot cleaner.
The data transmitted over the wire is minimal in the empty case, rather than sending a message; the client should be responsible for the message and for the language in which it is shown.
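As an illustration of the recommended shape (an empty JSON array with 200 OK), here is a minimal sketch in Python with Flask; the route and the stand-in data store are placeholders for the ASP.NET handler discussed above.

from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in data store; user "42" has purchases, everyone else has none.
PURCHASES = {"42": [{"id": 1, "name": "book"}]}

@app.route("/items/<user_id>")
def get_items(user_id):
    items = PURCHASES.get(user_id, [])
    # Always answer 200 OK with a JSON array, even when it is empty,
    # so the client can deserialize the body into the same list type every time.
    return jsonify(items), 200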

How to synchronize HttpRequest or WebClient in Wp7?

Now, I know I can only download a string asynchronously in Windows Phone 7, but in my app I want to know which request has completed.
Here is the scenario:
I make a download request using WebClient().
I use the following code for the download-completed event:
WebClient stringGrab = new WebClient();
stringGrab.DownloadStringCompleted += ClientDownloadStringCompleted;
stringGrab.DownloadStringAsync(new Uri(<some http string>, UriKind.Absolute));
I give the user the option of issuing another download request if this one takes too long for their liking.
My problem is that when/if the two requests return, I have no way of knowing which is which, i.e. which was the former request and which was the second!
Is there a way of identifying/synchronizing the requests?
I can't change the requests to return to different DownloadStringCompleted methods!
Thanks in Advance!
Why not do something like this:
void DownloadAsync(string url, int sequence)
{
    var stringGrab = new WebClient();
    stringGrab.DownloadStringCompleted += (s, e) => HandleDownloadCompleted(e, sequence);
    stringGrab.DownloadStringAsync(new Uri(url, UriKind.Absolute));
}

void HandleDownloadCompleted(DownloadStringCompletedEventArgs e, int sequence)
{
    // The sequence param tells you which request was completed
}
It is an interesting question, because by default WebClient doesn't carry any unique identifier. However, you are able to get the hash code, which will be unique for each given instance.
So, for example:
WebClient client = new WebClient();
client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted);
client.DownloadStringAsync(new Uri("http://www.microsoft.com", UriKind.Absolute));
WebClient client2 = new WebClient();
client2.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted);
client2.DownloadStringAsync(new Uri("http://www.microsoft.com", UriKind.Absolute));
Each instance will have its own hash code - you can store it before actually invoking the DownloadStringAsync method. Then you will add this:
int FirstHash = client.GetHashCode();
int SecondHash = client2.GetHashCode();
Inside the completion event handler you can have this:
if (sender.GetHashCode() == FirstHash)
{
// First completed
}
else
{
// Second completed
}
REMEMBER: A new hash code is given for every re-instantiation.
If the requests are essentially the same, then rather than keeping track of which request is being returned, why not just keep track of whether one has previously been returned, or of how long it has been since the last one returned?
If you're only interested in getting this data once, but are trying to allow the user to reissue the request if it takes a long time, you can just ignore all but the first successfully returned result. This way it doesn't matter how many times the user makes additional requests and you don't need to track anything unique to each request.
Similarly, if the user can request/update data from the remote service at any point, you could keep track of how long it has been since you last got successful data back and not bother updating the model/UI if you get another response shortly after that. It'd be preferable not to make the extra requests in this scenario, but if you've got to deal with long delays and race conditions in responses, you could use this technique and still keep the UI/data up to date within a threshold of a few minutes (or however long you specify).
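The two ideas above (tag each request with a sequence number via a closure, then keep only the first response that comes back) translate directly outside WP7. Here is a minimal Python sketch for illustration, with a placeholder URL and a plain in-memory dict standing in for whatever the app would really do with the result.

import threading
import urllib.request

def download_async(url, sequence, on_complete):
    # Tag each request with a sequence number, mirroring the closure trick in the first answer.
    def worker():
        body = urllib.request.urlopen(url).read().decode()
        on_complete(sequence, body)
    threading.Thread(target=worker).start()

results = {}

def handle_completed(sequence, body):
    # The sequence number says which request finished; keep only the first completed
    # result, as suggested above, and ignore later ones.
    results.setdefault("first", (sequence, body))

download_async("http://www.example.com", 1, handle_completed)
download_async("http://www.example.com", 2, handle_completed)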
