How to back up data of a NEAR smart contract? - nearprotocol

Is it possible to back up or make snapshots of the data stored in a NEAR smart contract?
I tried using near view-state, but that method returns base64-encoded data and doesn't work on contracts whose state is larger than that of a hello-world app.

You can query the state of any contract / account at any given block height since genesis; the chain is essentially a giant history of backups / snapshots. See this Stack Overflow post, which outlines how you can view the data and how you can get past the Error: [-32000] Server error: State of contract _____.near is too large to be viewed error by running your own node.
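For illustration, here is a minimal sketch (Python, using the requests library) of pulling such a state snapshot through NEAR's JSON-RPC query endpoint; the account name and block height are placeholders, and for old blocks or large contracts you would point it at your own (archival) node:

    import base64
    import requests

    RPC_URL = "https://rpc.mainnet.near.org"  # or your own (archival) node to avoid the size limit

    payload = {
        "jsonrpc": "2.0",
        "id": "backup",
        "method": "query",
        "params": {
            "request_type": "view_state",
            "account_id": "example.near",   # hypothetical contract account
            "prefix_base64": "",            # empty prefix = all keys
            "block_id": 50000000,           # placeholder height; or use "finality": "final" instead
        },
    }

    result = requests.post(RPC_URL, json=payload).json()["result"]
    for entry in result["values"]:
        key = base64.b64decode(entry["key"])
        value = base64.b64decode(entry["value"])  # raw contract storage bytes
        print(key, value)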

Related

Azure Data Factory (Graph Data Connect/Office365 Linked Service): how to work with Binary sink dataset?

Here's what I'm doing.
My company needs me to dump all group members and their corresponding groups into an SQL database. Power Automate takes forever with too many loops and API calls...so I'm trying Data Factory for the first time.
Using the Office365 Linked Service, we can get all organization members, but the only compatible sink option is Azure Blob storage (or Data Lake) because the sink MUST be binary.
Ok, fine. So we got an Azure Blob storage account configured and set up.
But now that the pipeline's 'copy data' activity has completed (after 4 hours?), I don't know what to do with this binary data. There seems to be no function, method, or data flow option to interpret the binary data as JSON, delimited text, or otherwise. The storage account shows 1042 different blobs, ranging haphazardly from a few kilobytes to dozens of megabytes (why???). Isn't there anything in Data Factory that can interpret this binary data and let me dump the columns I need into SQL?
I was able to load the blob data into Power Automate and parse it into usable JSON using the base64 and json functions, but this is robbing Peter to pay Paul, because I have to use a loop to load the contents of 1042 different blobs and I'm exceeding our bandwidth quota. Besides that, some of the blobs are empty!! (again... why??)
I've looked everywhere for answers, no luck. So thank you for any insight.
You can use a Binary dataset in the Copy activity, Get Metadata activity, or Delete activity. When using a Binary dataset, the service does not parse the file content but treats it as-is.
So the Data Flow activity, which is used to transform data in Azure Data Factory, isn't supported for Binary datasets.
Hence, you can take another approach, such as Azure Databricks, in which you can use Python, OpenCV, or any other data-engineering library in your preferred programming language.
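If you go the Databricks / Python route, a minimal sketch could look like this (assuming the azure-storage-blob package and that the Office 365 copy produced newline-delimited JSON; the connection string, container name, and field names are placeholders):

    import json
    from azure.storage.blob import BlobServiceClient

    CONN_STR = "<storage-connection-string>"   # placeholder
    CONTAINER = "office365-sink"               # hypothetical container name

    service = BlobServiceClient.from_connection_string(CONN_STR)
    container = service.get_container_client(CONTAINER)

    rows = []
    for blob in container.list_blobs():
        data = container.download_blob(blob.name).readall()
        if not data:
            continue  # some blobs in the sink can be empty
        for line in data.decode("utf-8").splitlines():
            if line.strip():
                record = json.loads(line)
                # pick whatever columns you need before bulk-inserting into SQL
                rows.append((record.get("id"), record.get("displayName")))

    print(f"parsed {len(rows)} rows from {CONTAINER}")

From there, the rows can be bulk-inserted into SQL with whatever driver you already use (pyodbc, a JDBC write from Databricks, etc.).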

Where in the stack to best merge analytical data-warehouse data with data scraped+cached from third-party APIs?

Background information
We sell an API to users, that analyzes and presents corporate financial-portfolio data derived from public records.
We have an "analytical data warehouse" that contains all the raw data used to calculate the financial portfolios. This data warehouse is fed by an ETL pipeline, and so isn't "owned" by our API server per se. (E.g. the API server only has read-only permissions to the analytical data warehouse; the schema migrations for the data in the data warehouse live alongside the ETL pipeline rather than alongside the API server; etc.)
We also have a small document store (actually a Redis instance with persistence configured) that is owned by the API layer. The API layer runs various jobs to write into this store, and then queries data back as needed. You can think of this store as a shared persistent cache of various bits of the API layer's in-memory state. The API layer stores things like API-key blacklists in here.
Problem statement
All our input data is denominated in USD, and our calculations occur in USD. However, we give our customers the query-time option to convert the response just-in-time to another currency. We do this by having the API layer run a background job to scrape exchange-rate data, and then cache it in the document store. Individual API-layer nodes then do (in-memory-cached-with-TTL) fetches from this exchange-rates key in the store, whenever a query result needs to be translated into a specific currency.
At first, we thought that this unit conversion wasn't really "about" our data, just about the API's UX, and so we thought this was entirely an API-layer concern, where it made sense to store the exchange-rates data into our document store.
(Also, we noticed that, by not pre-converting our DB results into a specific currency on the DB side, the calculated results of a query for a particular portfolio became more cache-friendly; the way we're doing things, we can cache and reuse the portfolio query results between queries, even if the queries want the results in different currencies.)
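A minimal sketch of that per-node lookup (illustrative only: redis-py, a hypothetical "exchange_rates" key written by the scraper job as a JSON map of currency to USD rate, and an arbitrary 60-second TTL) might look like:

    import json
    import time
    import redis  # redis-py

    _redis = redis.Redis(host="localhost", port=6379)
    _cache = {"rates": None, "fetched_at": 0.0}
    TTL_SECONDS = 60  # arbitrary in-memory TTL

    def get_rates():
        # refresh the node-local copy from the shared document store at most once per TTL
        if _cache["rates"] is None or time.time() - _cache["fetched_at"] > TTL_SECONDS:
            raw = _redis.get("exchange_rates")  # written by the background scraper job
            _cache["rates"] = json.loads(raw) if raw else {}
            _cache["fetched_at"] = time.time()
        return _cache["rates"]

    def convert_usd(amount_usd, currency):
        # query results are computed and cached in USD; conversion happens just in time
        if currency == "USD":
            return amount_usd
        return amount_usd * get_rates()[currency]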
But recently we've been expanding into also allowing partner clients to execute complex data-science/Business Intelligence queries directly against our analytical data warehouse. And it turns out that they will often need to do final exchange-rate conversions in their BI queries as well, despite there being no API layer involved there.
It seems like, to serve the needs of BI querying, the exchange-rate data "should" actually live in the analytical data warehouse alongside the financial data; and the ETL pipeline "should" be responsible for doing the API scraping required to fetch and feed in the exchange-rate data.
But this feels wrong: the exchange-rate data has a different lifecycle and integrity constraints than our financial data. The exchange rates are dirty and ephemeral point-in-time samples attained by scraping, whereas the financial data is a reliable historical event stream. The exchange rates get constantly updated/overwritten, while the financial data is append-only. Etc.
What is the best practice for serving the needs of analytical queries that need to access backend "application state" for "query result presentation" needs like this? Or am I wrong in thinking of this exchange-rate data as "application state" in the first place?
What I find interesting about your scenario is when the exchange-rate data is applicable.
In the case of the API, it's all about the realtime value in the other currency and it makes sense to have the most recent value in your API app scope (Redis).
However, I assume your analytical data warehouse has tables with purchases that were made at a certain time. In those cases, the current exchange rate is not really relevant to the value of the transaction.
This might mean that you want to store the exchange rate history in your warehouse or expand the "purchases" table to store the values in all the currencies at that moment.
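For the first option, a minimal sketch of the point-in-time join (pandas merge_asof, with hypothetical table and column names) would be:

    import pandas as pd

    purchases = pd.DataFrame({
        "purchased_at": pd.to_datetime(["2024-01-02 10:00", "2024-01-05 15:30"]),
        "amount_usd": [100.0, 250.0],
    })
    rate_history = pd.DataFrame({
        "sampled_at": pd.to_datetime(["2024-01-01", "2024-01-04"]),
        "usd_to_eur": [0.91, 0.92],
    })

    # merge_asof needs both frames sorted by their time keys
    converted = pd.merge_asof(
        purchases.sort_values("purchased_at"),
        rate_history.sort_values("sampled_at"),
        left_on="purchased_at",
        right_on="sampled_at",
        direction="backward",   # use the latest rate at or before the purchase time
    )
    converted["amount_eur"] = converted["amount_usd"] * converted["usd_to_eur"]
    print(converted[["purchased_at", "amount_usd", "amount_eur"]])

The same "as of" logic can of course be expressed in SQL in the warehouse itself; the point is that the rate applied to each transaction is the one in effect at the transaction's own timestamp, not the current one.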

Azure Data Factory Missing Blob Triggers

I have created an ADF pipeline that should trigger when a blob is added to a storage container (say container1) and copy the blob to another storage container (say container2). All my blob names are alphanumeric with '-' (basically a GUID). I see that the ADF pipeline is triggered only a few times compared to the number of blobs in container1 (i.e. if I have n files in container1, the pipeline is triggered only x times, where x < n).
I also observed that whenever the number of blobs created per second in container1 is high, there are more missed triggers. I am not using any event batching in Event Grid. My storage account is v2 BlockBlobStorage.
Is there a way I can resolve this?
I think it is difficult to get the correct answer here in the community. This is better raised as a support issue with Microsoft, so they can run a stress test and find out whether there is a bug.

Handling Parse 64KB limit

We are using Parse to send out push notifications via the Parse REST API. We compute the audience of the push notification based on dynamic user data, such as the user's current location. In our production system, we observe that there are times when this user base can be quite large, given the time of day. During such times, we have seen the ParseException:
org.parse4j.ParseException: Neither where clause nor data may exceed 64KB
This is because the where clause has a large number of "installation ids" or "device tokens" specified as we find a large number of users in a given location.
I understand that Channels / Parse Audience is a way to deal with larger sets of users. But this requires me to store the dynamic data like a user's current location in the Parse database as part of the Installation metadata.
My questions are:
Is it the right way to implement it if we decide to store the user's location in Parse? This would also mean that we would need to update the Installation object very frequently for each user.
Is it advisable to just send push notifications via Parse in chunks, that is, first to a set of 2000 users, then the next 1000, etc.? (A sketch of this chunked approach is shown below the list.)
Is there any other way to handle such a case?
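Here is a minimal sketch of the chunked approach mentioned above, against the Parse REST push endpoint (the app keys, field name, and chunk size are placeholders):

    import json
    import requests

    PARSE_PUSH_URL = "https://api.parse.com/1/push"
    HEADERS = {
        "X-Parse-Application-Id": "<app-id>",      # placeholder
        "X-Parse-REST-API-Key": "<rest-api-key>",  # placeholder
        "Content-Type": "application/json",
    }

    def push_in_chunks(device_tokens, alert, chunk_size=2000):
        # split the audience so each "where" clause stays well under the 64KB limit
        for i in range(0, len(device_tokens), chunk_size):
            chunk = device_tokens[i:i + chunk_size]
            body = {
                "where": {"deviceToken": {"$in": chunk}},
                "data": {"alert": alert},
            }
            resp = requests.post(PARSE_PUSH_URL, headers=HEADERS, data=json.dumps(body))
            resp.raise_for_status()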

Realistic Data Backup method for Parse.com

We are building an iOS app with Parse.com, but still can't figure out the right way to back up data efficiently.
As a premise, we have and will have a LOT of data store rows.
Say we have a class with 1 million rows; assume we have it backed up and then want to bring it back to Parse after a hazardous situation (like data loss in production).
The few solutions we have considered are the following:
1) Use external server for backup
BackUp:
- use the REST API to constantly back up data to a remote MySQL server (we chose MySQL for customized analytics purposes, since it's much faster and easier for us to handle data with MySQL)
ImportBack:
a) - recreate JSON objects from the MySQL backup and use the REST API to send them back to Parse.
Say we use the batch operation, which permits up to 50 objects to be created per request; assuming 1 second per request, 1 million records will take about 5.5 hours to transfer back to Parse.
b) - recreate one JSON file from the MySQL backup and use the Dashboard to import the data manually.
We just tried this method with a 700,000-record file: it took about 2 hours for the loading indicator to stop and show the number of rows in the left pane, but it still never opens in the right pane (it says "operation time out"), and it's been over 6 hours since the upload started.
So we can't rely on 1.b, and 1.a (sketched below) seems to take too long to recover from a disaster (if we have 10 million records, it'll be about 55 hours = 2.2 days).
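For reference, a minimal sketch of the batch import in 1.a against the Parse REST /1/batch endpoint (app keys and class name are placeholders):

    import json
    import requests

    PARSE_BATCH_URL = "https://api.parse.com/1/batch"
    HEADERS = {
        "X-Parse-Application-Id": "<app-id>",      # placeholder
        "X-Parse-REST-API-Key": "<rest-api-key>",  # placeholder
        "Content-Type": "application/json",
    }

    def restore_objects(records, class_name="BackupClass", batch_size=50):
        # the batch endpoint accepts up to 50 operations per request
        for i in range(0, len(records), batch_size):
            batch = {
                "requests": [
                    {"method": "POST", "path": f"/1/classes/{class_name}", "body": record}
                    for record in records[i:i + batch_size]
                ]
            }
            resp = requests.post(PARSE_BATCH_URL, headers=HEADERS, data=json.dumps(batch))
            resp.raise_for_status()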
Now we are thinking about the following:
2) Constantly replicate data to another app
Create the following in Parse:
- Production App: A
- Replication App: B
So while A is in production, every single query will be duplicated to B (using a background job running constantly).
The downside, of course, is that it'll eat up the burst limit of A, as it'll simply double the number of queries. So it's not ideal when thinking about scaling up.
What we want is something like AWS RDS which gives an option to automatically backup daily.
I wonder how this could be difficult for Parse since it's based on AWS infra.
Please let me know if you have any idea on this, will be happy to share know-hows.
P.S.:
We’ve noticed an important flaw in the above 2) idea.
If we replicate using the REST API, all the objectIds of all classes will change, so every 1-to-1 or 1-to-many relation will be broken.
So we are thinking about adding a uuid to every object class.
Is there any problem with this method?
One thing we want to achieve is
query.include(“ObjectName”)
( or in Obj-C “includeKey”),
but I suppose that won’t be possible if we don’t base our app logic on objectId.
We're looking for a workaround for this issue;
but will uuid-based management be functional under Parse's Datastore logic? (A rough sketch of uuid-based fetching follows.)
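For what it's worth, this is roughly what a uuid-based fetch would look like via the REST API (class and field names are hypothetical); since include/includeKey follows objectId pointers, resolving a uuid "relation" takes a second query:

    import json
    import requests

    PARSE_BASE = "https://api.parse.com/1/classes"
    HEADERS = {
        "X-Parse-Application-Id": "<app-id>",      # placeholder
        "X-Parse-REST-API-Key": "<rest-api-key>",  # placeholder
    }

    def fetch_post_with_author(post_uuid):
        # first query: find the object by its uuid field
        post = requests.get(
            f"{PARSE_BASE}/Post",
            headers=HEADERS,
            params={"where": json.dumps({"uuid": post_uuid}), "limit": 1},
        ).json()["results"][0]

        # second query: resolve the uuid-based "pointer" manually (no include possible)
        author = requests.get(
            f"{PARSE_BASE}/Author",
            headers=HEADERS,
            params={"where": json.dumps({"uuid": post["authorUuid"]}), "limit": 1},
        ).json()["results"][0]

        return post, author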
Parse has never lost production data. While we don't currently offer automated backups, you can request one any time you like, and we're working on making all of this even nicer. Additionally, it's easier in most cases to import the JSON export file through the data browser rather than using the REST batch.
I can confirm that today, Parse did lose my data. Or at least it appeared to be so.
After several errors were detected on multiple apps (confirmed by the Parse Status Twitter account), we could not retrieve data for one app, without any error being returned.
It was because an entire column of one of our classes (of type pointer) disappeared, and the data was no longer present in the dashboard.
We are using this pointer column to filter / retrieve data, so the returned queries and collections were empty.
So we decided to recreate the column manually. By chance, recreating the column with the same name and type solved the issue and the data was still there... I can't explain it, but I really thought (and the app reacted as if) the data had been lost.
So an automated backup and restore option is mandatory; it is not optional.
In December 2015, parse.com released a new dashboard with an improved export feature.
Just select your app and click on "App Settings" -> "General" -> "Export app data". Parse generates a JSON file for every class in your app and sends you an email when the export is done.
UPDATE:
Sad but true, parse.com is winding down: http://blog.parse.com/announcements/moving-on/
I had the same issue of backing up Parse Server data. Since Parse Server uses MongoDB, backing up data is not a problem. I just did a simple thing: downloaded the MongoDB dump from the server and then restored it using
mongorestore /path-to-mongodump (extracted files)
Since Parse has been open-sourced, we can adopt this technique.
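A minimal sketch of scripting that round trip (the URI and paths are placeholders):

    import subprocess

    MONGO_URI = "mongodb://localhost:27017"   # placeholder MongoDB instance backing Parse Server
    DUMP_DIR = "/backups/parse-dump"          # placeholder backup location

    def backup():
        # dump the Parse Server databases to DUMP_DIR
        subprocess.run(["mongodump", "--uri", MONGO_URI, "--out", DUMP_DIR], check=True)

    def restore():
        # restore the previously dumped data
        subprocess.run(["mongorestore", "--uri", MONGO_URI, DUMP_DIR], check=True)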
For accidental deletes, writing a 'beforeDelete' cloud function to back up the current row to another class would work.
For regular backups, a manual export of changed records (use a filter) will be useful. For recovery, this requires you to write scripts / use the import option (not so sure) in the data browser. You could also write a cloud function to replicate data to your backup server (haven't tried this yet).
However, there are some limitations to Cloud Code that you should consider before venturing into it:
https://parse.com/docs/cloud_code_guide#functions-resource
