Difference between net/http/pprof and runtime/pprof - go

I have been reading about profiling; here it is explained using net/http/pprof and here using runtime/pprof. So, what is the difference between these two, and which one should be preferred over the other? And please don't paste the overview definitions from the given links.

net/http/pprof exposes the pprof profiler data via a web interface. If you're writing a server-based system, this may be useful since you may not have access to the host.
runtime/pprof will write the profiler data to a data file that you can then convert into various image or document types. If you're writing a system that can't or won't have a web interface, this is probably the one to use. Either way they expose the same data.
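For concreteness, a minimal sketch combining both approaches in one program; the doWork function is just a stand-in for whatever you actually want to profile:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
	"os"
	"runtime/pprof"
)

func doWork() {
	// Stand-in for the code you actually want to profile.
	for i := 0; i < 1e7; i++ {
		_ = i * i
	}
}

func main() {
	// net/http/pprof: expose live profiles over HTTP, e.g.
	//   go tool pprof http://localhost:6060/debug/pprof/profile
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// runtime/pprof: write a CPU profile to a file you can inspect later with
	//   go tool pprof cpu.prof
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	doWork()
}
```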

Related

Looking for a faster way to export Resources in HAPI FHIR

Following the documentation on the $export API (link), it is possible to export FHIR data, e.g. Patient, Observation, etc. However, this process is very slow (it can take many days) if the Resources to be exported are substantial in size. Querying the database, serializing, writing the Resources into NDJSON files, then communicating the data over the Internet, etc. is an expensive process.
Is there any other way to export FHIR data? E.g. a more efficient technique? Low-level exporting (a customized program that exports data directly from the database on the server)? Or perhaps increasing the server capacity (CPU, memory, process priority/multi-threading configuration, etc.)?
Feel free to suggest any solutions. The ultimate goal is to have NDJSON files (after the export) in order to ingest them into other third-party data warehouses for further analysis.
Environment
HAPI FHIR 6.0.4 REST Server
If you have access to the database, you can write whatever program you want to get the export done, in any format you would like, provided you know how to convert the FHIR data. Or maybe you can use existing database tools that can dump data to files.
If you really need to go through the REST API, you can also request the resources with regular searches and write a program that iterates over the result Bundles to get all the data out and into NDJSON files. However, this will lead to a lot of calls and network load. One of the reasons for $export is to prevent clogging the system/network, and have the server be available for other requests while exporting. Maybe after the initial export you can use the $export parameters to only export changed data, which should be quicker than the whole set.
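As a rough illustration of that search-and-iterate approach, here is a sketch in Go that pages through a Patient search by following each Bundle's "next" link and writes one resource per line (NDJSON); the base URL, resource type, page size, and output file name are all placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// Just enough of a FHIR search Bundle to page through results.
type bundle struct {
	Link []struct {
		Relation string `json:"relation"`
		URL      string `json:"url"`
	} `json:"link"`
	Entry []struct {
		Resource json.RawMessage `json:"resource"`
	} `json:"entry"`
}

func main() {
	out, err := os.Create("patients.ndjson")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// Placeholder server and page size; _count asks the server for larger pages.
	url := "http://localhost:8080/fhir/Patient?_count=500"
	for url != "" {
		resp, err := http.Get(url)
		if err != nil {
			panic(err)
		}
		var b bundle
		if err := json.NewDecoder(resp.Body).Decode(&b); err != nil {
			panic(err)
		}
		resp.Body.Close()

		// One resource per line gives the same NDJSON shape $export produces.
		for _, e := range b.Entry {
			fmt.Fprintln(out, string(e.Resource))
		}

		// Follow the Bundle's "next" link until the server stops providing one.
		url = ""
		for _, l := range b.Link {
			if l.Relation == "next" {
				url = l.URL
			}
		}
	}
}
```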

Off-Chain Worker Framework

I haven’t entirely given up on the idea of validators moonlighting as oracles for off-chain computation…based on this extensive discussion: https://gov.near.org/t/off-chain-computation-framework/1400/6
So far from studying Sputnik’s code, I have figured out the mechanics of how to upload a blob to a smart contract. Let's say that a blob represents a storage-less contract, having only stateless functions that act only on input to the function, and return those inputs modified.
Now I’m missing the piece of how Validators can download and execute the blob. As mentioned by Ilya in the link above, the NearSDK would be able to interpret the blob (if the blob is essentially a compiled contract), but it needs to be a modified version of the SDK...
Think of this like sandbox mode…blob cannot modify state of any other contract, but can read state (forget about the internet access part for now). Results of the blob execution are then fed back to a smart contract, where they have to match the results of every other validator who executed the blob. This can be done by hash comparison (rather than looping through the results individually), so it’s not an expensive comparison, especially because it’s all or nothing.
Question: how can a Validator download the blob and execute it via a sandboxed SDK, and post the result via the regular SDK to the blockchain? I am missing a lot of architectural context…and this is bringing me to the edge of giving up on the idea. Please help prevent that from happening!
If you are implementing this as a separate binary, your binary will be doing the following things:
Use RPC to load the WASM file from the blockchain. See the RPC reference.
Use runtime-standalone to run this WASM with specific inputs. An example of using runtime-standalone is here, but you will need to customize it in a few ways.
The result should be sent as a transaction signed by this binary again via RPC.
If you want these WASM files to have access to state, you will need to load state inside this binary. There are two options:
Modify a nearcore node to also do the above items
Run nearcore in parallel, and open the database read-only when you are initializing the Trie (e.g. here, load from disk instead).
If you want to add more host functions (like accessing internet), you will need to fork runtime-standalone to expose those functions.
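As a rough sketch of the first step only, the WASM blob can be fetched over JSON-RPC with a view_code query; the RPC endpoint and account ID below are placeholders, and the runtime-standalone execution and the signed result transaction are left out:

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"net/http"
	"os"
)

func main() {
	// Placeholder RPC endpoint and account ID; adjust for your network and
	// for the contract account that hosts the uploaded blob.
	reqBody, err := json.Marshal(map[string]interface{}{
		"jsonrpc": "2.0",
		"id":      "dontcare",
		"method":  "query",
		"params": map[string]string{
			"request_type": "view_code",
			"finality":     "final",
			"account_id":   "blob-host.testnet",
		},
	})
	if err != nil {
		panic(err)
	}

	resp, err := http.Post("https://rpc.testnet.near.org", "application/json", bytes.NewReader(reqBody))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The contract code comes back base64-encoded in result.code_base64.
	var out struct {
		Result struct {
			CodeBase64 string `json:"code_base64"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	wasm, err := base64.StdEncoding.DecodeString(out.Result.CodeBase64)
	if err != nil {
		panic(err)
	}

	// Hand this blob to runtime-standalone for execution (not shown here),
	// then sign and submit the result transaction over RPC.
	if err := os.WriteFile("blob.wasm", wasm, 0o644); err != nil {
		panic(err)
	}
}
```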

Golang Cache HTTP GET Results In Memory

I am working on a CLI in Go that scrapes a webpage to collect the href attributes of all the links on the page into a slice. I want to store this slice in memory for some time so that the scraper is not being called on every execution of the CLI command. Ideally, the scraper would only be called after the cache expires or the user provides some sort of --update flag.
I came across the library go-cache and other similar libraries, but from what I could tell they only work for something that is continuously running, like a server.
I thought about writing the links to a file, but then how would I expire the results after a specific duration? Would it make sense to create a small server in the background that shuts down after a while in order to use a library like go-cache? Any help is appreciated.
There are two main approaches in these scenarios:
Create a daemon, service or background application that acts as your data repository. You can run it as an HTTP server / RPC server depending on your requirements. Your CLI application then interacts with this daemon as required;
Implement a persistence mechanism that will allow data to be written and read across multiple CLI application executions. You may use plain text files, databases, or even Go's encoding/gob to write and read your slice (a map would probably be better) to and from a binary file.
You can timestamp entries and remove them after their TTL expires, either by explicitly deleting them or by simply not rewriting them during subsequent executions, according to the strategy/approach selected above.
Such an open-ended question allows for far too many possible examples to cover in a single answer, and it will most likely require multiple, more specific questions.
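That said, here is a minimal sketch of the second approach: persisting the slice plus a fetch timestamp with encoding/gob and treating it as a cache miss once a chosen TTL has elapsed. The file name, TTL, and placeholder links are arbitrary:

```go
package main

import (
	"encoding/gob"
	"errors"
	"fmt"
	"os"
	"time"
)

// cacheFile wraps the scraped links with the time they were fetched.
type cacheFile struct {
	FetchedAt time.Time
	Links     []string
}

const ttl = 24 * time.Hour

func save(path string, links []string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return gob.NewEncoder(f).Encode(cacheFile{FetchedAt: time.Now(), Links: links})
}

// load returns an error if the file is missing, unreadable, or expired,
// which tells the CLI it needs to scrape again.
func load(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var c cacheFile
	if err := gob.NewDecoder(f).Decode(&c); err != nil {
		return nil, err
	}
	if time.Since(c.FetchedAt) > ttl {
		return nil, errors.New("cache expired")
	}
	return c.Links, nil
}

func main() {
	const path = "links.cache"
	links, err := load(path)
	if err != nil {
		// Cache miss, decode error, or expiry: scrape again (placeholder data
		// here) and rewrite the cache file. A --update flag would force this path.
		links = []string{"https://example.com/a", "https://example.com/b"}
		if err := save(path, links); err != nil {
			panic(err)
		}
	}
	fmt.Println(links)
}
```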
Use a database and store as much detail as you can (fetched_at, host, path, title, meta_desc, anchors, etc.). You'll be able to query the data later, and it will be useful to have it in a structured format. If you don't want to deal with a database dependency, you could embed something like boltdb (pure Go) or SQLite (cgo).
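If you go the embedded route, a rough sketch using bbolt (the maintained fork of boltdb, pulled in as the external module go.etcd.io/bbolt) might look like this; the bucket name and record fields are arbitrary:

```go
package main

import (
	"encoding/json"
	"time"

	bolt "go.etcd.io/bbolt"
)

// linkRecord holds the kind of structured detail worth keeping per link.
type linkRecord struct {
	Host      string    `json:"host"`
	Path      string    `json:"path"`
	Title     string    `json:"title"`
	FetchedAt time.Time `json:"fetched_at"`
}

// saveLink stores one record, keyed by URL, inside a "links" bucket.
func saveLink(db *bolt.DB, url string, rec linkRecord) error {
	return db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("links"))
		if err != nil {
			return err
		}
		data, err := json.Marshal(rec)
		if err != nil {
			return err
		}
		return b.Put([]byte(url), data)
	})
}

func main() {
	db, err := bolt.Open("links.db", 0o600, nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	if err := saveLink(db, "https://example.com/a", linkRecord{
		Host: "example.com", Path: "/a", Title: "Example", FetchedAt: time.Now(),
	}); err != nil {
		panic(err)
	}
}
```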

Which API does Windows Resource Monitor use?

Windows Resource Monitor displays (among other things) which files on disk are currently accessed by which processes. And it does that in realtime. How?
I know that it probably uses ETW and that I can generate traces with tools like xperf. But how to get realtime information without having to start, stop and parse a trace file?
I need to programmatically access the data, i.e. from C# or C++.
OpenTrace/ProcessTrace/StopTrace can get the data in real time as long as you know the provider GUID. They can run on Win2000, but you need to parse the raw data in your callback functions. To convert the raw data into human-readable text, you need the TMF/MOF; I'm not sure if those are public, though.
For Vista/Win7, there is a new set of TDH (Trace Data Helper) APIs (e.g. TdhFormatProperty).
Scroll down a little on the above links and you can see them. The good thing about TDH is that it can parse the data for you (you still need to provide TDH with the TMF/MOF, though).
I tried to write my own .etl-to-readable-.txt program using the Open/Process/StopTrace APIs (because I need to support XP), and found it quite difficult. The TMF file is not hard to interpret since it's pure text; the hard part is deciphering the internal structures of more than 50 different undocumented printf-like format specifications. So I gave up in the end and stuck with the powerful tracefmt.exe provided in the Microsoft WDK.

How can Windows API calls to an application/service be monitored?

My company is looking at implementing a new VPN solution, but requires that the connection be maintained programmatically by our software. The VPN solution consists of a background service that seems to manage the physical connection and a command line/GUI utility that initiates the request to connect/disconnect. I am looking for a way to "spy" on the API calls between the front-end utility and back-end service so that our software can make the same calls to the service. Are there any recommended software solutions or methods to do this?
Typically, communications between a front-end application and back-end service are done through some form of IPC (sockets, named pipes, etc.) or through custom messages sent through the Service Control Manager. You'll probably need to find out which method this solution uses, and work from there - though if it's encrypted communication over a socket, this could be difficult.
Like Harper Shelby said, it could be very difficult, but you may start with FileMon, which can tell you when certain processes create or write to files, RegMon, which can do the same for registry writes and reads, and Wireshark to monitor the network traffic. This can get you some data, but even with the data, it may be too difficult to interpret in a manner that would allow you to make the same calls.
I don't understand why you want to replace the utility, instead of simply running the utility from your application.
Anyway, you can run "dumpbin /imports whatevertheutilitynameis.exe" to see the static list of API function names to which the utility is linked; this doesn't show the sequence in which they're called, nor the parameter values.
You can then use a system debugger (e.g. Winice or whatever its more modern equivalent might be) to set breakpoints on these API, so that you break into the debugger (and can then inspect parameter values) when the utility invokes these APIs.
You might be able to glean some information using tools such as Spy++ to look at Windows messages. Debugging/tracing tools (WinDbg, etc.) may allow you to see API calls in progress. The Sysinternals tools can show you system usage information in some detail.
Although I would recommend against this for the most part: is it possible to contact the solution provider and get documentation? One reason for that is fragility; if a vendor is not expecting users to utilize that aspect of the interface, they are more likely to change it without notice.
