How do I verify if data is new?

I need to retrieve some data from an XML document on the web and
display it, but only if I haven't already output it.
I check the XML document every 5 minutes.
The data items don't have any timestamps associated with them, so I can't compare them by time.
What is the best way to check whether the data is new?

The XML you're reading may not have any timestamps in it, but can you provide a specific example of the web resource you're accessing?
If it is just a resource accessed over regular HTTP, and if the HTTP server is following standards, there should be a Last-Modified HTTP Header that you could use to determine when the file was last modified. (Details at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html.)
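A minimal sketch of the Last-Modified approach, in Python with only the standard library. It assumes the server really does send Last-Modified and honors If-Modified-Since; the function names and the example URL are illustrative and not taken from the question.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None):
    """Build a GET request that asks the server to skip unchanged content."""
    request = urllib.request.Request(url)
    if last_modified:
        # Echo back the Last-Modified value from the previous response.
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_modified(url, last_modified=None):
    """Return (body, last_modified); body is None when nothing changed."""
    request = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the old validator
            return None, last_modified
        raise


# Hypothetical usage, polling every 5 minutes:
#   body, stamp = fetch_if_modified("http://example.com/feed.xml", stamp)
#   if body is not None:
#       ...parse and display the XML...
```

Keep the returned last_modified value between polls and skip processing whenever body is None. Note that this only tells you the document as a whole changed, not which items inside it are new.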

Related

JMeter encoding issue on "application/soap+msbin1"

I'm working with JMeter and trying to send a SOAP request to the server, which returns the error message below.
Error message: Cannot process the message because the content type 'application/soap+msbin1' was not the expected type 'application/xml; charset=utf-8'.
We need help encoding the XML in the 'application/soap+msbin1' format.

Bit late to the party, but I encountered a similar issue - I had a template for SOAP request which uses embedded-binary XML (xop:Include cid="...") and had to scratch my head to figure out how to do that with the stock HTTP Request.
The answer: you can't, at least not in a simple way. To solve the issue, I ended up customizing JMeter. (I also looked at HTTPRawRequest, but it doesn't seem to support HTTPS, and I would have had to rewrite a lot of the test script to use it.) Since the HTTP Request sampler does 99% of the job, the quickest way to support binary data is to change the source code to handle it.
There are two main issues. First, the Function interface in JMeter is designed around returning String, not byte[], so __FileToString() (which I used to read an external binary file) already encodes the content of the file. Second, the HTTP Request sampler and HTTPHC4Impl itself (excluding the "upload file" bit) encode the parts of the HTTP request before sending them over the wire.
Changing that meant modifying Function, AbstractFunction and CompoundVariable, and creating a new function class, FileToStringBinary, which encodes the binary data in a way that can be decoded afterwards (by the changes made to HTTPHC4Impl).
If I have the time, I'll find somewhere to post the idea and the source (I can't submit it to JMeter because my update to HTTPHC4Impl only handles the specific requests I need to test, where the embedded binary is in a multipart/related part, and I have no time or inclination to handle the general cases). If you still need help making it work, drop me a line.

How to do persistent caching in golang?

I am creating a command-line utility in Go that requests data from a particular server and receives it as a JSON response. The utility can make multiple requests for multiple products.
I am able to make the requests and get the response for each product properly in JSON format.
Now I want to save the results locally, in a cache or in local files, so that on every request I check the local cache before sending a request to the server. If data is available for that product, no request is sent.
I want to save the whole JSON response in the cache or a local file and consult that data every time before requesting data from the server.
Use case:
Products {"A","B","C","D","E"} (it could be any number of products)
Test 1: Get data for A, B, C
Check local storage to see whether the data is available; otherwise make the request.
Save the JSON response in storage for each product.
So for test 1, it will have an entry like:
[{test 1 : [{product a : json response} , {product b : json response} ,{product c : json response}]}]
If the test fails partway through after getting results for two products, it should save the responses for those 2 products, and if we rerun the test it should fetch the result for the 3rd product only.

There's a bunch of Go libraries to do HTTP caching transparently.
https://github.com/lox/httpcache
https://github.com/gregjones/httpcache
Choose whichever you prefer and fits your needs better; both have examples in their README to get you started quickly.
If for some reason you can't or don't want to use third-party libraries, check out this answer https://stackoverflow.com/a/32885209/322221 which uses httputil.DumpResponse and http.ReadResponse, both in Go's standard library. The answerer also provides an example implementation you can base your work on.

You can store the data in a map and look it up by key. You can implement this yourself or use a library such as go-cache.
As an alternative, you can use Redis to store the data; Go drivers for it are available.

Perform server-side caching of 3rd party images

I just added some functionality to my site which, when a user hovers their mouse over a link (to a 3rd party page), a preview of the link is created from the meta tags on the target page and displayed. I'm worried about the implications of hot-linking in my current implementation.
I'm now thinking of implementing some kind of server-side caching such that the first request for the preview fetches the info and image from the target page, but each subsequent request (up to some age limit) is served from a cache on my host. I'm relatively confident that I could implement something of my own, but is there an off-the-shelf solution for something like this? I'm self-taught so I'm guessing that my DIY solution would be less than optimal. Thanks.
Edit: I implemented a DIY solution (see below), but I'm still open to suggestions as to how this could be accomplished efficiently.

I couldn't find any off-the-shelf solutions so I wrote one in PHP.
It accepts a URL as an HTTP GET parameter and does some error checking. If the error checking passes, it opens a JSON-encoded database from disk and parses the data into an array of Record objects that contain the info I want. The supplied URL is used as the array key. If the key exists in the array, the cached info is returned. Otherwise, the web page is fetched, its meta tags are parsed, the image is saved locally, and the resulting info is inserted into the database and returned. After the info has been returned to the requesting page, each record's expiration date is examined and expired records are removed. Each request for a cached record extends its expiration date. Lastly, the database is JSON-encoded and written back to disk.
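The flow described above can be sketched roughly as follows. This is a Python illustration, not the author's PHP: the record layout, the one-day TTL and the fetch callable (standing in for "fetch the page, parse meta tags, save the image") are all assumptions. It also purges expired records before the lookup rather than after, so a stale entry is never served.

```python
import json
import os
import time

TTL_SECONDS = 24 * 60 * 60  # assumed age limit; the post does not state one


def load_db(path):
    """Read the JSON database from disk, or start empty if it is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_db(path, db):
    """Write the whole database back to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(db, handle)


def get_preview(db, url, fetch, now=None, ttl=TTL_SECONDS):
    """Return cached info for url, calling fetch(url) only on a miss."""
    now = time.time() if now is None else now
    # Drop expired records first (the post does this after responding).
    for key in [k for k, rec in db.items() if rec["expires"] <= now]:
        del db[key]
    record = db.get(url)
    if record is None:
        record = {"info": fetch(url)}  # hypothetical fetch-and-parse step
        db[url] = record
    # Every request, hit or miss, pushes the expiration date forward.
    record["expires"] = now + ttl
    return record["info"]
```

A request handler would call load_db, then get_preview, then save_db. Rewriting the whole file on every request is simple but not safe under concurrent requests; a file lock or a small database would be the usual next step.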

HTTP GET vs POST for Idempotent Reporting

I'm building a web-based reporting tool that queries but does not change large amounts of data.
In order to verify the reporting query, I am using a form for input validation.
I know the following about HTTP GET:
It should be used for idempotent requests
Repeated requests may be cached by the browser
What about the following situations?
The data being reported changes every minute and must not be cached?
The query string is very large and greater than the 2000 character URL limit?
I know I can easily just use POST and "break the rules", but are there definitive situations in which POST is recommended for idempotent requests?
Also, I'm submitting the form via AJAX and the framework is Python/Django, but I don't think that should change anything.

I think that using POST for this sort of situation is acceptable. Citing the HTTP 1.1 RFC:
The action performed by the POST method might not result in a
resource that can be identified by a URI. In this case, either 200
(OK) or 204 (No Content) is the appropriate response status,
depending on whether or not the response includes an entity that
describes the result.
In your case, a "search result" resource is created on the server, which fits the HTTP POST specification. You can either return the result resource in the response itself, or return a separate URI pointing to the newly created resource, which can be deleted once it is no longer needed, i.e. after one minute (since, as you said, the data changes every minute).
The data being reported changes every minute
Given that statement, every request you make is going to create a new resource.
Additionally, you could return a 201 status and a URL from which to retrieve the search result resource. I'm not sure you want this sort of behavior; I mention it only as a side note.
The second part of your first question says the results must not be cached. This is something you configure on the server: have it return the HTTP response headers that tell intermediary proxies and clients not to cache the result, for example Cache-Control: no-store or Expires.
Your second question answers itself: the URL length limit means you have to use a POST request instead of a GET request.
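As a rough illustration of the "must not be cached" part, here is a minimal WSGI sketch in Python that attaches no-cache headers to a report response. The app name and the empty JSON payload are placeholders; in Django the same effect is usually achieved with the never_cache decorator from django.views.decorators.cache.

```python
import json

# Response headers that tell browsers and proxies not to store the result.
NO_CACHE_HEADERS = [
    ("Cache-Control", "no-store"),
    ("Pragma", "no-cache"),  # honored by old HTTP/1.0 caches
    ("Expires", "0"),        # an invalid date is treated as already expired
]


def report_app(environ, start_response):
    """Hypothetical report endpoint; the payload is a stand-in."""
    body = json.dumps({"rows": []}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ] + NO_CACHE_HEADERS
    start_response("200 OK", headers)
    return [body]
```

Responses to POST are not cached by default in practice, so these headers matter mostly if the report is ever served over GET.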

HTTP POST - nameless data VS named data

Our server A notifies 3rd party server B with an XML-formatted message, sent as HTTP POST request. It's us who specify the message format and other aspects of interaction.
We can specify that the XML is sent as
a) raw data (just the XML)
b) single POST parameter having some specific name (say, xml=XML)
The question is which way is better for the 3rd party in general, if we don't know the platform and language they are using.
I thought I had seen problems in certain languages with easily parsing nameless raw data, though I don't remember any specific case. My colleague, however, insists that the parameter name is redundant and that it's really better to send the raw data without any name.

If you don't need to send extra information in other POST parameters, the xml parameter name is redundant and unnecessary, as your teammate said. If the 3rd party expects only XML data, just send the raw data in the POST body with the correct MIME type and encoding, and don't complicate things.
Getting at the raw data is easy in most application server containers, so you don't need to worry about that; most of them use a Reader to receive the data and work with it.
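To illustrate how little the receiver needs in the raw-body case, here is a minimal sketch of a WSGI endpoint in Python that reads the unnamed POST body and parses it as XML. The app name and the reply format are made up for the example; the 3rd party's actual platform is unknown.

```python
import xml.etree.ElementTree as ET


def notification_app(environ, start_response):
    """Read the raw POST body and parse it as XML (illustrative only)."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)  # the whole body is the XML
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"invalid XML"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    # Reply with the root element name to show the message was understood.
    return [root.tag.encode("utf-8")]
```

With the named-parameter variant (xml=XML), the sender would additionally have to URL-encode the document and the receiver decode it, which is the extra step the raw form avoids.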
