Read file from Azure Databricks DBFS REST 2.0 API

I am working on an application to read and write files using the Azure Databricks DBFS API 2.0. The reference documentation for this API is:
https://docs.azuredatabricks.net/api/latest/dbfs.html#read
I am able to upload a file (say a 1.4MB file) by converting it to base64 and then dividing it into 1MB chunks of data.
As the read length is limited to 1MB, I loop to read data from offset 0 to 1000000 and then from 1000001 to the end of the file. The first chunk of data, 0 - 1000000, is valid, and I can confirm it against the original file that I used for the upload.
But for the second and later chunks, the base64 data is completely different and is not present anywhere in the original base64 file.
Following is my test code:
First iteration - 1MB
const axios = require('axios')
const fs = require('fs')
axios({
  method: 'get',
  url: 'https://********.azuredatabricks.net/api/2.0/dbfs/read',
  data: {
    path: '/Test/backlit-beach-clouds-1684881.jpg',
    offset: 0,
    length: 1000000
  },
  headers: {
    'Authorization': `Bearer ****`
  }
}).then(({data}) => {
  if (data) {
    console.log('Success', data.bytes_read)
    fs.writeFile('./one.txt', data.data, function (err) {
      console.log('done', err)
    })
  } else {
    console.log('failed')
  }
})
Second iteration - 0.4MB
const axios = require('axios')
const fs = require('fs')
axios({
  method: 'get',
  url: 'https://********.azuredatabricks.net/api/2.0/dbfs/read',
  data: {
    path: '/Test/backlit-beach-clouds-1684881.jpg',
    offset: 1000001,
    length: 1000000
  },
  headers: {
    'Authorization': `Bearer ****`
  }
}).then(({data}) => {
  if (data) {
    console.log('Success', data.bytes_read)
    fs.writeFile('./two.txt', data.data, function (err) {
      console.log('done', err)
    })
  } else {
    console.log('failed')
  }
})
Here, the asterisks stand in for the actual domain and token.
As you can see, the two test samples above generate one.txt and two.txt. With cat one.txt two.txt > final.txt I get final.txt, which I then use to decode back to the original file.
As this is just test code, I haven't used any loops or a cleaner structure; it is only meant to show what goes wrong.
I have been stuck on this for over a week now. I have been referring to other code samples written for Python, but with no luck.
I am not trying to waste anyone's time, but could someone please help me figure out what went wrong, or point me to another standard procedure I can follow?

I ran into the same type of issue and solved it with an iterative extraction.
However, in order to make it work I had to extract from 0 to 1000000 and then from 1000000 to the end of the file.
Otherwise the base64 decoding did not line up correctly at the beginning of the second slice.
In other words, you don't need to add an extra byte when moving from one slice to the next: the next offset is simply the previous offset plus the length already read.
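As a minimal sketch of that loop (reusing the masked host, token, and file path from the question, and decoding each returned base64 chunk to a Buffer before concatenating, which also avoids any base64 padding issues between chunks), the read side could look something like this:
const axios = require('axios')
const fs = require('fs')

// Sketch only: reads a DBFS file in 1MB chunks and reassembles the raw bytes.
async function readDbfsFile(path) {
  const CHUNK = 1000000
  const chunks = []
  let offset = 0
  while (true) {
    const { data } = await axios.get('https://********.azuredatabricks.net/api/2.0/dbfs/read', {
      params: { path: path, offset: offset, length: CHUNK },
      headers: { 'Authorization': `Bearer ****` }
    })
    if (data.bytes_read === 0) break
    chunks.push(Buffer.from(data.data, 'base64'))
    offset += data.bytes_read          // next offset = previous offset + bytes read (no +1)
    if (data.bytes_read < CHUNK) break // short read means we hit the end of the file
  }
  return Buffer.concat(chunks)
}

// Example: readDbfsFile('/Test/backlit-beach-clouds-1684881.jpg')
//   .then((buf) => fs.writeFileSync('./backlit-beach-clouds-1684881.jpg', buf))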

Related

Memory leaks in Appcelerator Android HTTP?

I am running into what looks like a memory leak on Android using Appcelerator. I am making an HTTP GET call repeatedly until all data is loaded. This call happens about 50 times, for a total of roughly 40 MB of JSON. I am seeing the memory usage spike dramatically if this is executed. If I execute these GETs the heap size (as reported by Android Device Monitor, the preferred method to check memory according to the official Appcelerator docs) gets up to ~240 MB and stays there for as long as the app runs. If I do not execute these GETs, it only uses about 50 MB. I don't think this is a false heap reading either, because if I execute the GETs again (from page 1) I run out of memory.
I have looked through the code and cannot find any obvious leaks, such as storing all results in a global variable or something. Are the HTTP responses being cached somewhere?
Here is my code, for reference. syncThings(1, 20) (sanitized name :) ) gets called during startup. It in turn calls a helper function syncDocuments(). Here are the two functions. Don't worry about launchMainWindow() unless you think it could be relevant, but assume it does no cleanup.
function syncThings(page, itemsPerPage) {
  var url = "the_url";
  console.log("Getting page " + page);
  syncDocuments(url,
    function(response) {
      if (response.totalDocumentsInQuery == itemsPerPage) {
        // More pages to get
        setTimeout(function() {
          syncThings(page + 1, itemsPerPage);
        }, 1);
      } else {
        // This was the last page
        launchMainWindow();
      }
    },
    function(e) {
      Ti.API.error('Default error callback called for syncThings;', e);
      dispatcher.trigger('app:update:stop');
    });
}
function syncDocuments(url, successCallback, errorCallback) {
  new HTTPRequest({
    url: url,
    method: 'GET',
    headers: {
      'Content-Type': 'application/json'
    },
    timeout: 30000,
    success: function (response) {
      Ti.API.info('Success callback called for ' + url);
      successCallback(response);
    },
    error: function (error) {
      errorCallback(error);
    }
  }).send();
}
Any ideas? Am I doing something wrong here?
Edit: I am using Titanium SDK 6.0.1.GA. This happens on all Android versions.
Try using the file property of the HTTPClient: http://docs.appcelerator.com/platform/latest/#!/api/Titanium.Network.HTTPClient-property-file
Otherwise the whole response is loaded into memory.
There will also be a memory leak fix in 6.1.0 (https://github.com/appcelerator/titanium_mobile/pull/8818) that might help.
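As a rough sketch of that suggestion (using a plain Ti.Network.createHTTPClient instead of the HTTPRequest wrapper from the question; the helper name and the per-page temp file are hypothetical), streaming each response to disk instead of memory could look like this:
// Sketch only: write each page's response straight to a temp file instead of buffering it.
function syncDocumentsToFile(url, page, successCallback, errorCallback) {
  var target = Ti.Filesystem.getFile(Ti.Filesystem.tempDirectory, 'page_' + page + '.json');
  var client = Ti.Network.createHTTPClient({
    timeout: 30000,
    onload: function () {
      successCallback(target); // hand back the file; parse it lazily if and when needed
    },
    onerror: function (e) {
      errorCallback(e);
    }
  });
  client.open('GET', url);
  client.file = target.nativePath; // response body goes to disk, not into the JS heap
  client.send();
}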

decompress-zip in Node WebKit app cannot unzip downloaded file, downloaded file "corrupt" according to WinRAR

I'm attempting to write an automated installer for a *.exe file. I am using node-webkit, and my unzipper is decompress-zip. I am downloading the installer via AJAX:
$.ajax({
  type: "GET",
  url: 'https://mywebste.com/SyncCtrl.zip',
  contentType: "application/zip;",
  success: function (dat) {
    console.log(dat)
    fs.writeFile("./SyncCtrl.zip", dat, function () {
      console.log(dat)
    })
  },
  error: function (err) {
    console.log(err)
    fs.writeFile("./SyncCtrl.zip", err.responseText, function () {
    })
  }
})
The .zip is written through the err.responseText content. I know this isn't best practice, but I haven't been able to get it into the success callback, even though the response code is 200. This is for another question though.
After I write the .zip file to disk, I wait for an authentication request, then unzip it in the success callback:
var unzip = new dc("./SyncCtrl.zip")
unzip.on('error', function (err) {
  console.log("Something went terribly wrong.")
  console.log(err)
})
unzip.on('extract', function (log) {
  console.log("Finished!")
  console.log(log)
})
unzip.on('progress', function (i, c) { // index, count (irrelevant for a single file)
  console.log("Extraction progress.")
})
unzip.extract({
  path: "./SyncCtrl"
})
This is nearly copy/pasted directly from the decompress-zip GitHub page. It fails, and in the error handler it prints:
Error {message: "File entry unexpectedly large: 80606 (max: 4096)"}
I assume this limit is in MB? This is very confusing, as in both locations the file size on disk is 1.7MB for the file I'm trying to extract. Any help is greatly appreciated.
The best way to accomplish this is to treat the remote file as a stream.
Downloading the static file from the server directly, this is my working node-webkit code:
var fs = require('fs');
var https = require('https');

var file = fs.createWriteStream("./MOCSyncCtrl.zip");
var request = https.get("https://moc.maps-adr.com/MOCSyncCtrl.zip", function (response) {
  response.pipe(file);
});
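Following on from that snippet (a sketch only, reusing the decompress-zip module that the question requires as dc), you would then wait for the write stream to finish before starting the extraction:
var DecompressZip = require('decompress-zip'); // the module the question imports as `dc`

// Only start unzipping once the download has been fully flushed to disk.
file.on('finish', function () {
  var unzip = new DecompressZip("./MOCSyncCtrl.zip");
  unzip.on('error', function (err) {
    console.log("Extraction failed.", err);
  });
  unzip.on('extract', function (log) {
    console.log("Finished!", log);
  });
  unzip.extract({ path: "./SyncCtrl" });
});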

Getting binary file content instead of UTF-escaped using file.get

I'd like to know if it's possible to get the exact binary data using a callback from the drive.files.get method of the Node.js Google API. I know that the object returned by calling this API endpoint is a normal request object that can, for example, be piped like this:
drive.files.get({
  fileId: fileId,
  alt: 'media'
}).pipe(fs.createWriteStream('test'));
However, I would like to know if it's possible to get the binary data from within a callback using this syntax:
drive.files.get({
  fileId: fileId,
  alt: 'media'
}, function(err, data) {
  // Here I have binary data exposed
});
As far as I know, it should be possible to get that kind of data from request when creating it, by passing {encoding: null} in the request options object like this:
var requestSettings = {
  method: 'GET',
  url: url,
  encoding: null // This is the important part
};
request(requestSettings, function (err, data) { /* ... */ });
However, it seems that Google hides this configuration object inside its library.
So my question is: is it possible to do this without interfering with or hacking the library?
OK, so I found an answer that could be useful for others :)
The aforementioned drive.files.get method returns a Stream object, so it can be handled directly with the proper event handlers. Then the buffered chunks can be concatenated into a single Buffer and passed back through a callback like this:
var stream = drive.files.get({
  fileId: fileId,
  alt: 'media'
});
// Build the buffer; cb is the caller-supplied (err, buffer) callback
var chunks = [];
stream.on('data', (chunk) => {
  chunks.push(chunk);
});
stream.on('error', (err) => {
  return cb(err);
});
stream.on('end', () => {
  return cb(null, Buffer.concat(chunks));
});
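Wrapped up as a helper (a sketch only; getFileBuffer is a hypothetical name and it assumes an already-authorized drive client), the same pattern can be called with a normal (err, buffer) callback:
// Hypothetical helper around the stream-to-buffer pattern above.
function getFileBuffer(drive, fileId, cb) {
  var stream = drive.files.get({ fileId: fileId, alt: 'media' });
  var chunks = [];
  stream.on('data', function (chunk) { chunks.push(chunk); });
  stream.on('error', function (err) { cb(err); });
  stream.on('end', function () { cb(null, Buffer.concat(chunks)); });
}

// Example:
// getFileBuffer(drive, fileId, function (err, buf) {
//   if (err) return console.error(err);
//   fs.writeFileSync('test', buf); // raw binary, not UTF-escaped
// });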

Cloud Code: Creating a Parse.File from URL

I'm working on a Cloud Code function that uses the Facebook Graph API to retrieve a user's profile picture. So I have access to the proper picture URL, but I'm not able to create a Parse.File from that URL.
This is pretty much what I'm trying:
Parse.Cloud.httpRequest({
  url: httpResponse.data["attending"]["data"][key]["picture"]["data"]["url"],
  success: function (httpImgFile) {
    var imgFile = new Parse.File("file", httpImgFile);
    fbPerson.set("profilePicture", imgFile);
  },
  error: function (httpResponse) {
    console.log("unsuccessful http request");
  }
});
And it's returning the following:
Result: TypeError: Cannot create a Parse.File with that data.
    at new e (Parse.js:13:25175)
    at Object.Parse.Cloud.httpRequest.success (main.js:57:26)
    at Object.<anonymous> (<anonymous>:842:19)
Ideas?
I was having trouble with this exact same problem just now. For some reason this question is already a top Google result for parsefile from httprequest buffer!
The Parse.File documentation says the data for the file can be:
1. an Array of byte value Numbers, or
2. an Object like { base64: "..." } with a base64-encoded String, or
3. a File object selected with a file upload control (this one only works in Firefox 3.6+, Safari 6.0.2+, Chrome 7+, and IE 10+).
I believe for Cloud Code the easiest solution is (2). The thing that was tripping me up earlier is that I didn't notice it expects an Object with the format { base64: {{your base64 encoded data here}} }.
Also, a Parse.File can only be set on a Parse.Object after it has been saved (this behaviour is also present in all the client SDKs). I strongly recommend using the Promise version of the API, as it makes it much easier to compose such asynchronous operations.
So the following code will solve your problem:
Parse.Cloud.httpRequest({...}).then(function (httpImgFile) {
  var data = {
    base64: httpImgFile.buffer.toString('base64')
  };
  var file = new Parse.File("file", data);
  return file.save();
}).then(function (file) {
  fbPerson.set("profilePicture", file);
  return fbPerson.save();
}).then(function (fbPerson) {
  // fbPerson is saved with the image
});

Node: Download raw bytes of jpeg without piping output

Here is what I'm trying to do:
Retrieve raw data of an image (jpeg) from a URL given to me by an API
Pass the raw data or buffer to a function that uploads it to another server
NEVER PIPE THE IMAGE TO THE DISK
I've followed every example I can find (that doesn't pipe to disk), but still the content comes out corrupted. I have tried forcing various "accept-encodings" (gzip, deflate) but they basically resolve to the same data, just compressed.
I believe this has something to do with the response encoding rather than how I am asking for the data.
Here's the code so far:
var parsedUrl = require('url').parse(PATH_TO_IMAGE)
var params = {
  hostname: parsedUrl.hostname,
  path: parsedUrl.path,
}
return http.get(params, function(photo_res) {
  var photoData = '';
  res.setEncoding('binary');
  photo_res.on('data', function(chunk) {
    photoData += chunk;
  });
  photo_res.on('end', function() {
    // DO STUFF TO UPLOAD IMAGE
  });
  photo_res.on('error', function(err) {
    console.error('Unable to download photo:', err);
    return done(err);
  });
});
You have a simple typographic error which may be causing Node to interpret your data stream as the wrong type. The error is in this line:
res.setEncoding('binary');
The response variable in your callback is named photo_res, so res here refers to something else entirely. To avoid confusion you should name the response variable res, and since your data is binary, it is better not to force a string encoding at all and to collect the chunks as Buffers.
http.get(options, function(res) {
  var photoData = [];
  // No setEncoding() call: without an encoding, the 'data' chunks arrive as Buffers.
  res.on('data', function(chunk) {
    photoData.push(chunk);
  });
  res.on('end', function() {
    var photo = Buffer.concat(photoData);
    // DO STUFF TO UPLOAD IMAGE using `photo`
  });
  res.on('error', function(err) {
    console.error('Unable to download photo:', err);
  });
});
In the example, I store all the chunks of data in an array and then use Buffer.concat() to create a single Buffer. This is better because you were originally appending your image's data to a string, which is likely what caused the corruption.
