How can I, in a Data Factory pipeline, download a PDF file (or any type of file) to Blob Storage? The file is retrieved through an API, but it comes back as base64.
I'm not sure which API you download the PDF file from, but you could consider these connectors in Data Factory:
The REST connector specifically supports copying data from RESTful APIs.
The HTTP connector is generic and retrieves data from any HTTP endpoint, e.g. to download a file. Before the REST connector became available, you might have used the HTTP connector to copy data from a RESTful API; that is supported but less functional compared to the REST connector.
The Web table connector extracts table content from an HTML web page.
One of these may meet your needs.
Update:
In the end, you decided to use a Logic App, for cost reasons:
"During the weekend I was trying to do this by DataFlow, but
unfortunately it is not possible to do this type of integration by
Data Factory, for this reason I will have to make a small logic app
that allows me to do it, I do it by logic app because it is what
represents the lowest cost to me."
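For reference, whichever piece ends up calling the API (a Logic App, a small Azure Function, or similar), the base64 handling itself is small. Here is a minimal TypeScript sketch using the @azure/storage-blob package, assuming a hypothetical API that returns JSON with the PDF as a base64 string in a field named "content":

import { BlobServiceClient } from "@azure/storage-blob";

// Placeholders: the API URL, the "content" field name, and the container name
// are assumptions -- adjust them to the real service.
const API_URL = "https://example.com/api/documents/42";
const CONNECTION_STRING = process.env.AZURE_STORAGE_CONNECTION_STRING!;

async function copyPdfToBlob(): Promise<void> {
  // 1. Call the API; assumed to return { "content": "<base64 string>" }.
  const apiResponse = await fetch(API_URL);
  const { content } = (await apiResponse.json()) as { content: string };

  // 2. Decode the base64 payload into raw bytes.
  const pdfBytes = Buffer.from(content, "base64");

  // 3. Write the bytes to Blob Storage with a PDF content type.
  const service = BlobServiceClient.fromConnectionString(CONNECTION_STRING);
  const container = service.getContainerClient("documents");
  const blob = container.getBlockBlobClient("document-42.pdf");
  await blob.upload(pdfBytes, pdfBytes.length, {
    blobHTTPHeaders: { blobContentType: "application/pdf" },
  });
}

copyPdfToBlob().catch(console.error);

In a Logic App the same three steps map onto an HTTP action, a base64ToBinary() expression, and a Create blob action.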
We have a use case of streaming data from the main transactional system to downstream consumers such as the data analytics and machine learning teams.
One of the requirements is to ensure data governance: the data source can control who can read which column, and potentially the lifecycle of the data, so that data sitting in another domain gets purged when the source removes it. For example, if a user deletes their account, we need to make sure the data in all downstream systems gets removed.
While we are considering Thrift, Avro and Protobuf, what are the common frameworks we can use for such data governance? Do any of these protocols support metadata for data governance around authorization and lifecycle?
Let me get this straight:
Protobuf is not a security device; to someone with the right tools it is just as readable as XML or JSON, with the slight caveat that it can be unclear how to interpret some values.
It's not much different from JSON or XML; it is just an interface language. Sure, its encoding is a bit different and a lot more customizable, but it does not address security in any way. It is up to you to secure the channel between sender and receiver.
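To make that concrete, here is a small TypeScript sketch using the protobufjs package (message and field names chosen arbitrarily) showing that anyone holding the bytes and the schema can read everything back; nothing in the encoding itself enforces who may read which field, so authorization and lifecycle rules have to live in the surrounding infrastructure:

import * as protobuf from "protobufjs";

// Throwaway schema parsed inline; message and field names are illustrative only.
const { root } = protobuf.parse(`
  syntax = "proto3";
  message Account {
    string owner = 1;
    string email = 2;
  }
`);
const Account = root.lookupType("Account");

// Encoding is just encoding: no permissions travel with the bytes.
const message = Account.create({ owner: "alice", email: "alice@example.com" });
const bytes = Account.encode(message).finish();

// Any consumer holding the bytes and the schema (or protoc --decode_raw without
// the schema) reads it all back; column-level authorization and purge rules
// must be enforced elsewhere.
const decoded = Account.toObject(Account.decode(bytes));
console.log(decoded); // { owner: 'alice', email: 'alice@example.com' }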
My app uses a Google Cloud Firestore instance. Among the data my app manages there is some classical data (strings, numbers, ...): no problem with that; Firestore handles these use cases easily.
But my app also needs to consume images that are linked to the other data.
So I'm looking for the right solution to manage images. I tried to use the "reference" field type from my Firestore instance, but I'm not sure that's the right way...
Is there another solution outside Firestore?
What about Google Cloud Filestore? It seems to be available only from App Engine or a VM...
Disclosure: I work on the Firebase team at Google.
When I want to use both structured and unstructured data in my application, I use Cloud Firestore for the structured data, and Cloud Storage for the unstructured data. I use both of these through their Firebase SDKs, so that I can access the data and files directly from within my application code, or from server-side code (typically running in Cloud Functions).
There is no built-in reference type between Firestore and Storage, so you'll need to manage that yourself. I usually store either the path to the image in Firestore, or the download URL of the image. The choice between these two mostly depends on whether I want the file to be publicly accessible, or whether access needs to be controlled more tightly.
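As an illustration, here is a minimal sketch with the Firebase Web SDK (v9 modular API), assuming a hypothetical "photos" collection and placeholder project config; it uploads an image to Cloud Storage and stores both the storage path and the download URL next to the structured data in Firestore:

import { initializeApp } from "firebase/app";
import { getFirestore, doc, setDoc } from "firebase/firestore";
import { getStorage, ref, uploadBytes, getDownloadURL } from "firebase/storage";

// Placeholder config -- replace with your project's values.
const app = initializeApp({ projectId: "my-project", storageBucket: "my-project.appspot.com" });
const db = getFirestore(app);
const storage = getStorage(app);

async function savePhoto(id: string, file: Blob): Promise<void> {
  // 1. Put the unstructured bytes in Cloud Storage.
  const path = `photos/${id}.jpg`;
  const imageRef = ref(storage, path);
  await uploadBytes(imageRef, file);

  // 2. Optionally resolve a download URL (handy when the file may be public).
  const url = await getDownloadURL(imageRef);

  // 3. Keep the structured data plus the link to the image in Firestore.
  await setDoc(doc(db, "photos", id), {
    title: "My photo",   // ordinary structured fields
    storagePath: path,   // resolve via the SDK, access controlled by Storage rules
    downloadUrl: url,    // convenient, but readable by anyone who has the URL
  });
}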
Since there is no managed relationship between Firestore and Storage (or any other Firebase/Google Cloud Platform services), you'll need to manage this yourself. This means that you'll need to write the related data (like the path above), check for its integrity when reading it (and handle corrupt data gracefully), and consider periodically running a script that removes/fixes up corrupt data.
I have a React Native app integrated with Relay and I want to deliver an offline-first experience for users.
So, on the first app launch a placeholder should be shown while data is being loaded. After that, every time the app is launched I want to show the last cached data while fresh data is loaded.
I found this issue from 2015 and, based on eyston's answer, I've tried to implement a CacheManager based on relay-cache-manager using AsyncStorage. With the CacheManager I can save and load Relay records from the cache, but when the network is disabled the app isn't able to show cached data.
Is there any way of use relay cached data while relay is fetching fresh data?
We have a production app which uses Relay and RealmDB for offline experience. We took a separate approach from CacheManager because CacheManager was not quite ready at that time. We used relay-local-schema for this.
We defined the entire schema required for mobile using relay-local-schema. This could be the same file your backend server uses to define its GraphQL schema, with the resolve functions changed to resolve the data from the Realm DB. For this we also created a schema in Realm that has nearly the same structure as the GraphQL schema, to make it simple to write the data returned by the backend server into Realm. You can also automate generating this schema by using the GraphQL introspection query.

We defined a custom network layer where we made sure that all Relay queries always touch the local DB. In the sendQueries function, all queries are resolved with relay-local-schema, which resolves very quickly, so the React views show the old data; at the same time, a network request is made for each request in the sendQueries function. When data comes back from a network request, it is written to Realm and the Relay in-memory store is also populated with the new data, which automatically refreshes all React views whose data changed. To write data to the Relay in-memory store we used the following undocumented method:
Relay.Store.getStoreData().handleQueryPayload(query, response);
You can get query object from request that you receive in sendQueries function using request.getQuery().
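A rough sketch of such a network layer (Relay Classic) is below, assuming relay-local-schema answers immediately from the Realm-backed schema while the real server response is written back through handleQueryPayload. The GraphQL URL and the writeResponseToRealm helper are placeholders for your own code:

import Relay from "react-relay";
import RelayLocalSchema from "relay-local-schema";
import { schema } from "./localSchema";          // GraphQL schema whose resolvers read from Realm
import { writeResponseToRealm } from "./realm";  // hypothetical helper that persists server payloads

const GRAPHQL_URL = "https://example.com/graphql"; // placeholder
const localLayer = new RelayLocalSchema.NetworkLayer({ schema });

const offlineFirstNetworkLayer = {
  sendQueries(requests: any[]) {
    // 1. Resolve every query against the local schema immediately so views
    //    render cached Realm data without waiting for the network.
    const localResult = localLayer.sendQueries(requests);

    // 2. In parallel, fetch fresh data for each request.
    requests.forEach(async (request) => {
      try {
        const res = await fetch(GRAPHQL_URL, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            query: request.getQueryString(),
            variables: request.getVariables(),
          }),
        });
        const payload = await res.json();

        // 3. Persist for the next cold start, then push into the Relay
        //    in-memory store so mounted views re-render with fresh data.
        writeResponseToRealm(request.getQuery(), payload.data);
        Relay.Store.getStoreData().handleQueryPayload(request.getQuery(), payload.data);
      } catch (e) {
        // Offline: the locally resolved data has already satisfied the request.
      }
    });

    return localResult;
  },
  sendMutation(mutationRequest: any) {
    // Mutations go straight to the server in this sketch.
    return fetch(GRAPHQL_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        query: mutationRequest.getQueryString(),
        variables: mutationRequest.getVariables(),
      }),
    })
      .then((res) => res.json())
      .then((payload) => mutationRequest.resolve({ response: payload.data }));
  },
  supports() {
    return false;
  },
};

Relay.injectNetworkLayer(offlineFirstNetworkLayer);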
Our current implementation is a bit tied up with our business logic, so it is difficult to open source. I'll try to provide a demo app if possible.
I need to create a website that reads the contents of different websites and helps compare them.
One example of a similar website:
http://www.mysmartprice.com/mobile/samsung-galaxy-grand-2-msp3633
This helps us compare prices of a Samsung mobile across different online stores.
Now I need to know:
1. How to read data from different websites.
Using Java, I can read and fetch the HTML data. But the question arises: what is the best way to parse the HTML content to get the desired information?
I want to use Spring XD. Please suggest the best strategy.
Regards,
Jubin
I think you need to develop a Java application for each data source, then develop a custom "source" module, and use Spring XD to ingest the data.
Another solution is to have your applications write the required data to CSV files and transfer them into a path like /tmp/xd/input automatically when the program runs, and then use Spring XD to ingest the data from the CSV files into whatever destination you need.
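To make the second option concrete: your code is Java, where Jsoup is the usual choice for parsing the fetched HTML, but the flow itself is the same in any language. Here is a small TypeScript sketch, with hypothetical URLs and CSS selectors, that fetches each product page, extracts a price with the cheerio package, and drops a CSV into the directory Spring XD ingests from:

import { writeFileSync } from "fs";
import * as cheerio from "cheerio";

// Placeholders: each store needs its own URL and CSS selector for the price element.
const SOURCES = [
  { store: "store-a", url: "https://example.com/samsung-galaxy-grand-2", selector: ".price" },
  { store: "store-b", url: "https://example.org/galaxy-grand-2.html", selector: "#product-price" },
];

async function scrapeOnce(): Promise<void> {
  const rows: string[] = [];
  for (const { store, url, selector } of SOURCES) {
    // Fetch the product page and pull the price out of the parsed HTML.
    const html = await (await fetch(url)).text();
    const $ = cheerio.load(html);
    const price = $(selector).first().text().trim();
    rows.push([store, url, price, new Date().toISOString()].join(","));
  }

  // Write a fresh CSV into the directory that the Spring XD stream ingests from.
  writeFileSync(`/tmp/xd/input/prices-${Date.now()}.csv`, rows.join("\n") + "\n");
}

scrapeOnce().catch(console.error);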
We have an application provided by a third party which takes a stream of market data (provided by said third party), and writes it into a JDBC compatible database.
The only configuration parameters it has are the JDBC connection string, plus settings allowing us to pick what pieces of data we'd like to be stored in this database.
This is very good for static data, but we'd like to feed this data into our internal ActiveMQ messaging fabric (in addition to writing it into the DB).
The database updates are triggered by pushes of market data to us. I'd like to have this application write the data directly to a set of MQ topics by implementing some kind of JDBC "facade" that would re-route the data straight into MQ.
What I don't want to do is poll the database for new information, as I want to keep the same fluidity of the data (e.g. fast-moving stocks will generate a lot more data than slow-moving ones, and we'd want to retain this).
Advice and pointers are very much welcome!
Camel is the answer, but potentially only if you're OK with polling the database. It's great for integration issues like this. If there were some other trigger you could work with, you could use that to kick off reading the database.