Off-Chain Worker Framework

Off-Chain Worker Framework - nearprotocol

I haven’t entirely given up on the idea of validators moonlighting as oracles for off-chain computation, based on this extensive discussion: https://gov.near.org/t/off-chain-computation-framework/1400/6
So far, from studying Sputnik’s code, I have figured out the mechanics of uploading a blob to a smart contract. Let's say that a blob represents a storage-less contract: it has only stateless functions, which act solely on their inputs and return those inputs modified.
What I’m missing is how validators can download and execute the blob. As Ilya mentioned in the link above, the NearSDK would be able to interpret the blob (if the blob is essentially a compiled contract), but it would need to be a modified version of the SDK...
Think of this like a sandbox mode: the blob cannot modify the state of any other contract, but it can read state (forget about the internet access part for now). The results of the blob execution are then fed back to a smart contract, where they have to match the results of every other validator who executed the blob. This can be done by hash comparison (rather than comparing the results individually), so it's not an expensive check, especially because acceptance is all or nothing.
Question: how can a validator download the blob, execute it via a sandboxed SDK, and post the result to the blockchain via the regular SDK? I am missing a lot of architectural context, and this is bringing me to the edge of giving up on the idea. Please help prevent that from happening!
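The hash comparison described in the question can be sketched as follows. This is a minimal illustration, not contract code. It assumes each validator submits the hex SHA-256 digest of its serialized result, and the function names `resultHash` and `allAgree` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// resultHash returns the hex-encoded SHA-256 digest of a validator's
// serialized blob output. Validators submit this instead of the raw result.
func resultHash(result []byte) string {
	sum := sha256.Sum256(result)
	return hex.EncodeToString(sum[:])
}

// allAgree reports whether every submitted hash matches the first one.
// Acceptance is all-or-nothing, so a single mismatch rejects the round.
func allAgree(submitted []string) bool {
	if len(submitted) == 0 {
		return false
	}
	for _, h := range submitted[1:] {
		if h != submitted[0] {
			return false
		}
	}
	return true
}
```

This only works if every validator serializes its result byte-for-byte identically (for example with Borsh or canonical JSON). Otherwise, identical logical results would still produce different hashes.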

If you are implementing this as a separate binary, your binary will need to do the following:
Use RPC to load the WASM file from the blockchain. See the RPC reference.
Use runtime-standalone to run this WASM with specific inputs. An example of using runtime-standalone is here, but you will need to customize it in a few ways.
The result should be sent as a transaction signed by this binary again via RPC.
If you want these WASM files to have access to state, you will need to load state inside this binary. There are two options:
Modify a nearcore node to also do the above items
Run nearcore in parallel, and open its database read-only when you initialize the Trie (e.g. here, load from disk instead).
If you want to add more host functions (like internet access), you will need to fork runtime-standalone to expose those functions.

Related

Index data entered in the smart contract for offchain computation

I want to index the data entered into a NEAR Protocol smart contract for off-chain computation.
How can I get each new entry in the smart contract into an off-chain SQL database or Elasticsearch for real-time indexing?
I can do that in the frontend, but I don't know if it's the right or best method, since different users can use different frontends to query the blockchain.

Great question. There is currently no "event" system in NEAR, so the solution will have to be polling for now.
I would recommend looking at this experimental method:
https://docs.near.org/docs/api/rpc-experimental#example-of-data-changes
The code contained in that resource is meant to be run in Terminal, similar to running a curl command. These types of RPC calls are fairly easy to understand if you look in near-api-js. For an example of how a NodeJS app uses this library which talks to the RPC, I would recommend looking in this directory of near-shell:
https://github.com/near/near-shell/tree/master/commands

Golang Cache HTTP GET Results In Memory

I am working on a CLI in Go that scrapes a webpage to collect the href attributes of all the links on the page into a slice. I want to store this slice in memory for some time so that the scraper is not being called on every execution of the CLI command. Ideally, the scraper would only be called after the cache expires or the user provides some sort of --update flag.
I came across the library go-cache and other similar libraries, but from what I could tell they only work for something that is continuously running, like a server.
I thought about writing the links to a file, but then how would I expire the results after a specific duration? Would it make sense to create a small server in the background that shuts down after a while in order to use a library like go-cache? Any help is appreciated.

There are two main approaches in these scenarios:
Create a daemon, service or background application that acts as your data repository. You can run it as an HTTP server / RPC server depending on your requirements. Your CLI application then interacts with this daemon as required;
Implement a persistence mechanism that will allow data to be written and read across multiple CLI application executions. You may use normal text files, databases or even an implementation of golang's encoding/gob to write and read your slice (a map would probably be better) to and from a binary file.
You can timestamp entries and remove them after their TTL expires, either by explicitly deleting them or simply by not rewriting them during subsequent executions, depending on the approach selected above.
The range of possible examples for such an open-ended question is too broad to cover in a single answer, and would most likely require multiple, more specific questions.

Use a database and store as much detail as you can (fetched_at, host, path, title, meta_desc, anchors etc). You'll be able to query over the data later and it will be useful to have it in a structured format. If you don't want to deal with a db dependency you could embed something like boltdb (pure go) or sqlite (cgo).

Setting up multiple network layers in Relay Modern

I am building a React Native app with Relay Modern.
Currently our app's fetchQuery implementation just does a fetch on the network (like in https://facebook.github.io/relay/docs/en/network-layer.html).
There is also the possibility of a local network layer like https://github.com/relay-tools/relay-local-schema, which returns data from a local DB such as SQLite or Realm.
Is there a way to set up an offline-first response from the local network layer, followed by an automatic request to the real network that also populates the store with fresher data (along with writing it to the local DB)?
Also, should/can they share the same store?
According to the requirements of Network.create(), it should return a promise containing the payload, so there does not seem to be a way to return multiple values.
Any ideas/help/suggestions are appreciated.

What you are trying to achieve is complex; I'd go for the easier approach, which is a long-lived cache.
As you might know, Relay Modern keeps a local store that is an exact copy of the data you fetch. You can configure this store's cache to suit your needs, with no caching on mutations.
To see how this is achieved, the best library around for customising the Relay Modern or Classic network layer is https://github.com/nodkz/react-relay-network-modern
My recommendation: set up your cache and watch your requests (you're going to love it).
Thinking in Relay,
https://facebook.github.io/relay/docs/en/thinking-in-relay.html

Core Data cloud sync - need help with logic

I'm in the middle of brainstorming a cloud sync solution for a Core Data app that I am currently developing. I'm planning to open source the code for this once it's done, for anyone to use with their Core Data apps, so input from the community on how this system should work is much appreciated :-) Here's what I'm thinking:
Server Side
Storage Provider
As with all cloud sync systems, storage is a major piece of the puzzle. There are many ways to handle this. I could set up my own server for storage, or use a service like Amazon S3, but because I'm starting out with $0 capital, a paid storage solution isn't a viable option at the moment. After some thought, I decided to settle on Dropbox (an already well established cloud sync application and storage provider). The pros of using Dropbox are:
It's free (for a limited amount of space)
In addition to being a storage service, it also handles cloud sync
They recently released an Objective-C SDK which makes it much easier to interface with it in Mac and iPhone apps
In case I decide to switch to a different storage provider in the future, I intend to add "services" to this cloud sync framework, basically allowing anyone to create a service class to interface with their choice of storage provider, which can then simply be plugged into the framework.
Storage Structure
This is a really difficult part to figure out, so I need as much input as I can here. I've been thinking about a structure like this:
CloudSyncFramework
======> [app name]
==========> devices
=============> (device id)
================> deviceinfo
================> changeset
==========> entities
=============> (entity name)
================> (object id)
A quick explanation of this structure:
The master "CloudSyncFramework" (name undecided) folder will contain separate folders for each app that uses the framework
Each app folder contains a devices folder and an entities folder
The devices folder will contain a folder for each device that is registered with the account. The device folder will be named according to the device ID, obtained using something like [[UIDevice currentDevice] uniqueIdentifier] (on iOS) or a serial number (on Mac OS).
Each device folder contains two files: deviceinfo and changeset. deviceinfo contains information about the device (e.g. OS version, last sync date, model, etc.) and the changeset file contains information about objects that have changed since the device last synchronized. Both files will just be simple NSDictionaries archived into files using NSKeyedArchiver.
Each Core Data entity has a subfolder under the entities folder
Under each entity folder, every object that belongs to that entity will have a separate file. This file will contain a JSON dictionary with the key-value pairs.
Simultaneous Sync
This is one of the areas where I am almost completely clueless. How would I handle 2 devices connecting and syncing with the cloud at the same time? There seems to be a high risk of things getting out of sync here, or even data corruption.
Handling migrations
Once again, another clueless area here. How would I handle migrations of the Core Data managed object model? The easiest thing to do here seems to be just to wipe the cloud data store clean and upload a new copy of the data from a device which has undergone the migration process, but this seems somewhat risky, and there may be a better way.
Client Side
Converting NSManagedObjects into JSON
Converting attributes into JSON isn't a very hard task (there's plenty of code for it floating around the web). Relationships are the key problem here. In this stackoverflow post, Marcus Zarra posts code in which the relationship objects themselves are added to the JSON dictionary. However, he mentions that this can cause an infinite loop depending on the structure of the model, and I'm not sure if this would work with my method, because I store each object as an individual file.
I've been trying to find a way to get an ID as a string for an NSManagedObject. Then I could save relationships in JSON as an array of IDs. The closest thing I found was [[managedObject objectID] URIRepresentation], but this isn't really an ID for an object; it's more of a location for the object in the persistent store, and I don't know if it's concrete enough to use as a reference for an object.
I suppose I could generate a UUID string for each object and save it as an attribute, but I'm open for suggestions.
Syncing changes to the cloud
The first (and still best) solution that popped into my head for this was to listen for the NSManagedObjectContextObjectsDidChangeNotification to get a list of changed objects, then update/delete/insert those objects in the cloud data store. After the changes have been saved, I would need to update the changeset file for every other registered device to reflect the newly changed objects.
One problem that comes up here is how to handle a failed or interrupted sync. One idea I have is to first push changes to a temporary directory in the cloud, then, once that has been confirmed as successful, merge it with the master data in the cloud, so that an interruption in the middle of the sync won't corrupt data. I would also save records of the objects that need to be updated in the cloud into a plist file or similar, to be pushed the next time the app is connected to the internet.
Retrieving changed objects
This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly.
And that sums up my thoughts for the logic that this system will use :-) Any insight, suggestions, answers to problems, etc. is greatly appreciated.
UPDATE
After lots of thinking, and reading TechZen's suggestions, I have come up with some modifications to my concept.
The largest change I've thought up is to make each device have a separate data store in the cloud. Basically, every time the managed object context saves (thanks TechZen), it will upload the changes to that device's data store. After those changes are updated, it will create a "changeset" file with change details, and save it into the changeset folders of the OTHER devices that are using the application. When the other devices connect to sync, they will go through the changeset folder and apply each changeset to the local data store, then update their respective data stores in the cloud as well.
Now, if a new device is registered with the account, it will find the newest copy of the data out of all the devices and download that for use as its local storage. This solves the problem of simultaneous sync and reduces the chances of data corruption because there is no "central" data store: each device touches only its own data and just pushes changes, rather than every device accessing and modifying the same data at the same time.
There are some obvious conflict situations to deal with, mainly related to deleting objects. If a downloaded changeset instructs the app to delete an object that is currently being edited, for example, there needs to be a way to deal with this.

You want to look at this pessimistic take on cloud sync: Why Cloud Sync Will Never Work.
It covers a lot of the issues that you are wrestling with. Many of them are largely intractable.
It is very, very, very difficult to synchronize information, period. Adding in different devices, different operating systems, different data structures, etc. snowballs the complexity, often fatally. People have been working on variants of this problem since the 70s and things really haven't improved much.
The fundamental problem is that if you leave the system flexible and customizable, then the complexity of synchronizing all the variations explodes exponentially as a function of the number of customizations. If you make it rigid, you can sync, but you are limited in what you can sync.
How would I handle 2 devices connecting and syncing with the cloud at the same time?
If you figure that out, you will be rich. It's a big issue for current cloud sync providers. The real problem here is that you're not "syncing", you're merging. Software sucks at merging because it's very hard to establish a predefined rule set that describes all the possible merges.
The simplest system is to establish either a canonical device or a device hierarchy such that the system always knows which input to choose. This however, destroys flexibility.
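A device hierarchy of that kind reduces conflict handling to a lookup. The sketch below is a minimal illustration with hypothetical names (`edit`, `resolve`): the device listed earliest in the priority order wins, and unlisted devices rank below all listed ones. It is deterministic, but, as the answer points out, it simply discards the losing edits.

```go
package main

// edit is one device's version of a conflicting value.
type edit struct {
	Device string
	Value  string
}

// resolve picks the winning edit by a fixed device priority. Devices not in
// priority rank last; among equal ranks the earliest edit in the slice wins.
func resolve(priority []string, edits []edit) (edit, bool) {
	if len(edits) == 0 {
		return edit{}, false
	}
	rank := make(map[string]int, len(priority))
	for i, d := range priority {
		rank[d] = i
	}
	rankOf := func(d string) int {
		if r, ok := rank[d]; ok {
			return r
		}
		return len(priority)
	}
	best := edits[0]
	for _, e := range edits[1:] {
		if rankOf(e.Device) < rankOf(best.Device) {
			best = e
		}
	}
	return best, true
}
```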
How would I handle migrations of the Core Data managed object model?
The migration of the Core Data model is largely irrelevant to the server. That's something that Core Data manages internally to itself. Model migration updates the model i.e. the entity graph, not the actual data.
Converting NSManagedObjects into JSON
Modeling relationships is hard, especially with tools that don't support it as easily as Core Data does. However, the URI of a permanent managed object ID is supposed to serve as a UUID that nails the object down to a specific location in a specific store on a specific device. It's not technically guaranteed to be universally unique, but it's close enough for all practical purposes.
Syncing changes to the cloud
I think you're confusing implementation details of Core Data with the cloud itself. If you use NSManagedObjectContextObjectsDidChangeNotification, you will trigger network traffic every time the observed context changes, regardless of whether those changes are persisted or not. Depending on the app, this could open connections thousands of times in a few minutes. Instead, you want to sync at most when the context is saved.
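The "sync at most on save" advice amounts to buffering change notifications and flushing on save. The Go sketch below is a stand-in for the Cocoa side. In the app, `Changed` would be driven by NSManagedObjectContextObjectsDidChangeNotification and `Saved` by NSManagedObjectContextDidSaveNotification. The `changeBuffer` type and its `push` hook are hypothetical.

```go
package main

import "sync"

// changeBuffer collects IDs reported by change notifications and only hands
// them to push when the context is saved, so unsaved edits cause no traffic.
type changeBuffer struct {
	mu      sync.Mutex
	pending map[string]struct{}
	push    func(ids []string) error
}

// Changed records an ID; repeated edits to one object collapse into one entry.
func (b *changeBuffer) Changed(id string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.pending == nil {
		b.pending = make(map[string]struct{})
	}
	b.pending[id] = struct{}{}
}

// Saved flushes the buffered IDs in one push. If push fails, the IDs stay
// buffered and are retried on the next save.
func (b *changeBuffer) Saved() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.pending) == 0 {
		return nil
	}
	ids := make([]string, 0, len(b.pending))
	for id := range b.pending {
		ids = append(ids, id)
	}
	if err := b.push(ids); err != nil {
		return err
	}
	b.pending = nil
	return nil
}
```

A context rollback would also need to clear the buffer, since the discarded edits should never be pushed.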
One problem that comes up here is, how would I handle a failed or interrupted sync?
You don't commit changes until the sync completes. Interrupted syncs are a big problem and lead to corrupt data. Again, you can have flexibility, complexity and fragility, or inflexibility, simplicity and robustness.
Retrieving changed objects: This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly
It's only simple if you have an inflexible data structure. Describing changes to a flexible data structure is a nightmare.
Not sure if I have helped any. None of these problems have elegant solutions. Most designers end up with rigidity and/or slow, brute-force iterative merging.

Take a serious look at RestKit.
It is an open source project that aims to help with integrating iOS apps with cloud data, including but not limited to the scenario where there is a core-data model for that data on the client.
I have recently started to use it in one of my projects, and found it to be quite useful. In the Core Data scenario, you implement declarative mappings between your data model and the content you GET from and POST to the server. It then takes care of injecting objects from the cloud into your client model, posting new objects to the server, and incorporating server-generated object IDs into your client-side model, doing all of this in a background thread and handling the Core Data context threading issues and so on.
RestKit is by no means a mature product, but it has a fairly good foundation and quite a few areas that could use help from other contributors. Especially if your goal is to create an open source solution, it would be great to contribute to and improve something like this rather than reinvent a new solution. Unless, of course, you see serious differences between what you have in mind and other existing solutions :-)

Since this post was current, there are several new options available. It is possible to develop a solution, and there are apps shipping with these solutions.
Here is a short list of the main Core Data sync options:
Apple's native Core Data/iCloud sync. (Had a rocky start. Seems better now.)
TICDS
Wasabi Sync, a paid service.
Simperium (Seems abandoned.)
ParcelKit with Dropbox Datastore API
Ensembles, the most recent. (Disclosure: I am the founder of the project)

It's like Apple answered my question for me with the announcement of the iCloud SDKs, which come complete with Core Data integration. Win!

Using Windows LockResource to access binary resource data

First, my concrete question: In my attempt to access cursor raw data, my call to LockResource succeeds, but the SizeOfResource call tells me that the data is only 20 bytes, which is just too small...
What I'm really trying to do: I am exploring possibilities for remoting cursors from mixed code server application to a CLR client application. My (quite possibly naive) idea is to use LockResource to access the binary data of a resource (embedded in a native dll), pass this data to the client and treat it in the same way as resource data that has been retrieved from a local assembly using Assembly.GetManifestResourceStream to get the resource stream and Resources.ResourceSet to iterate through the resources. I am hoping that since .NET no doubt makes the same underlying system calls as native code, this makes sense. On the other hand...
Does anyone have any comments or better ideas? (It would of course be easier to simply provide a compatible resource package on the client and remote some cursor id, but we seem to have a requirement for cursors to be dynamically added at runtime.)
Any comments gratefully received!

In the end, I used Win32 calls to get at the cursor bitmap, serialised this and the hotspot location to the client, and recreated the cursor there using the Win32 API again. Once you've got an HCURSOR client-side, you can construct a .NET WinForms Cursor from it if you want (but such an object cannot be serialised using straightforward .NET; otherwise that would have been a much easier way to remote it!).
