Convert binary data from URL to an image file - image

I read several articles on StackOverflow, but none of them seems to work in my case so here is the situation.
I have a webpage that is not under my control. It contains an image that is referenced in the markup as something like <img src="getimage.asp?pic=4c54aae0ea..." />. Given the URL of that image, I would like to download it, save it to disk and do something with it.
When I enter the URL directly in my browser I get a binary stream. This is the first load of characters.
ÿØÿàJFIFHHÿþLEAD Technologies Inc. V1.01ÿÛ„ÿÄ¢ }!1AQa"q2‘¡#B±ÁRÑð$3br‚ %&'()*456789:CDEFGHIJSTUVWXYZcdefghijstuvwxyzƒ„…†‡ˆ‰Š’“”•–—˜™š¢£¤¥¦§¨©ª²³´µ¶·¸¹ºÂÃÄÅÆÇÈÉÊÒÓÔÕÖ×ØÙÚáâãäåæçèéêñòóôõö÷øùúw!1AQaq"2B‘¡±Á #
How do I convert that data to an image using e.g. C# or any other language? Since I do not control the page, I have no idea how the data is encoded, so can I still decode it?
As can be seen from the first few characters, the string "LEAD Technologies Inc." is included in the data, so I guess it's not all image data. But at least Chrome obviously knows how to decode it. A quick Google check reveals that "LEAD Technologies" is an imaging SDK, but their website doesn't seem to offer much information about its use, and I'm also not proficient in image manipulation. Any ideas would be appreciated.

The first few characters indicate that the response is probably a JPEG file interpreted as text. I guess the Content-Type header in the HTTP response has the wrong value, probably something like text/plain or text/html instead of image/jpeg. This makes Chrome display the image as plain text.
I don't think you have to convert the data. Just save the response stream to a file and you will have a proper jpeg file:
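// Requires: using System.Net;
string url = "http://my-domain/getimage.asp?pic=4c54aae0ea...";
string fileLocation = @"C:\MyImage.jpg"; // verbatim string literal, so the backslash is not an escape
using (var client = new WebClient())
{
    client.DownloadFile(url, fileLocation); // writes the raw response bytes to disk unchanged
}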
The reason I think the response is probably JPEG is that a JPEG (JFIF) file begins with the bytes FF D8 FF E0, which look like ÿØÿà when displayed as ISO 8859-1 encoded text.
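To check that reasoning yourself, here is a minimal Python sketch using only the standard library. The helper name looks_like_jpeg is made up for illustration; it only tests the leading bytes and does not validate the rest of the file.

```python
# First three bytes of a JPEG: the start-of-image marker (FF D8)
# followed by FF, the first byte of the next marker.
JPEG_PREFIX = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the data starts like a JPEG file (hypothetical helper)."""
    return data.startswith(JPEG_PREFIX)


# The four bytes mentioned above, as they appear at the start of a JFIF file.
header = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# Decoding the bytes as ISO 8859-1 shows what a browser prints when it
# treats the response as text instead of as an image.
print(header.decode("iso-8859-1"))
print(looks_like_jpeg(header))
```

The same check can be run on the first bytes of the downloaded file before deciding which extension to give it.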

Related

AppleScript: renaming PDF with content of PDF

I am trying to do exactly what is described in the following thread:
AppleScript/Automator: renaming PDF with extracted text content of this PDF
So I am using Chino22's version and there are two issues with it:
First, instead of the contents of the PDF, theFileContentsText gets some metadata stuff.
Second, although the script runs to the end, I get the following error for the last step:
error "The variable thisFile is not defined." number -2753 from "thisFile"
So, how do I get the text contents instead, and how do I define thisFile to the current pdf that is being processed in the loop?
Thanks in advance!
I would not expect the linked script to work.
Except for document metadata, extracting text content from PDF is notoriously difficult and unreliable, and not a road you want to go down if you can possibly avoid it. Adobe’s PDF file format is designed for printing, not for data processing. PDF files contain blocks of Postscript-like page drawing instructions, typically compressed, and while it’s possible for PDFs also to include the original plain text for accessibility use, most PDF generators do not do this so the only way to get the original text is by reconstructing it from those low-level drawing instructions—not a trivial job.
AppleScript’s read command only reads that raw file data; it does not parse it into drawing instructions, never mind translating those drawing instructions back into plain text. Change a PDF file’s extension to .txt and open it in a plain text editor, and you’ll see what I mean. Nasty.
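To make that concrete, here is a small Python sketch using only the standard library. The content stream is hand-written and hypothetical, and the regular expression is a toy that handles a single simple literal string. Real PDFs use font-specific encodings, hex strings and text split across many operators, which is why a proper library is needed.

```python
import re
import zlib

# A hypothetical page content stream: select a font, move, draw one string.
content = b"BT /F1 12 Tf 72 720 Td (Hello, world) Tj ET"

# PDF generators typically store such streams Flate-compressed (zlib/deflate).
stored = zlib.compress(content)


def strings_drawn(stream: bytes) -> list:
    """Decompress a stream and return literal strings passed to the Tj operator."""
    return re.findall(rb"\((.*?)\)\s*Tj", zlib.decompress(stream))


print(stored)                 # roughly what a raw file read hands you
print(strings_drawn(stored))  # what you get only after decompressing and parsing
```

Even in this best case, the text only appears after decompression and after interpreting the drawing operators, and a raw file read performs neither step.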
If you need to work with the PDF’s original content (text, images, whatever), your best solution is to get those files before they were converted into a PDF.
If you must extract content from a PDF file, use an existing tool that knows how to do it.
For instance, if you’re lucky enough to have PDFs that contain XFDF (XML form) or accessibility data, there are 3rd-party apps and libraries to extract that content in readable form. I can’t think offhand of any that are AppleScriptable (Adobe Acrobat has only minimal AS support) so you’ll probably need to find one you can run from command line (do shell script in AS).
Or, if the PDFs have a consistent visual structure, a 3rd-party library such as Python’s PDFMiner (which I’ve used in the past) can identify blocks of characters by position and convert those back into strings with varying degrees of reliability (it has to convert font glyphs back into Unicode characters, guess at which characters are close enough to constitute a word, and where to insert space and return characters between those words). You’ll have to write some Python code to extract the bits you want, so look for tutorials to get started (or pay someone to write it for you).
But again, if you can possibly avoid having to extract text from PDF, you should. You will save yourself a lot of trouble.

Part of Image Missing From Data URL

Backstory to the below issue:
I'm using the jQuery plugin Cropit to produce an image which I get in data URL form (the user uploads an image and Cropit allows them to manipulate it, when the user is happy, Cropit exports the final image).
This data URL is attached to the product (this is a Shopify website) via Shopify properties (in a similar way you would attach text for an engraved product) and then when the order is created, I have an app listening for new orders and I pull the data URL from the order.
From testing, I can confirm that the data URL is wrong / corrupted / broken at the time the order is placed and not being broken in transit.
Original Question
I have a bit of a weird situation and I can't find any similar situations online.
I'm being sent an image in data URL format (from Shopify if it's relevant, I have written a private app and their webhook is sending me an image)
The image is in a data URL format that starts with, as an example,
.....
The problem I am having is sometimes (and it's maybe less than 10% of the time) when I get the image and try to print it, it's missing the bottom chunk of the image. In a PDF, it considers the image corrupt, and in a web browser, it just sees the bottom of the image as transparent, however much is missing.
This is what it looks like in Inspect Element on Google Chrome when you hover over the image URL (image has been purpled out for anonymity)
My question is, does anyone know why?
We can't find a correlation with browser or device type. And I'm not sure if it's because part of the data URL is somehow missing (maybe a character limit, because it's a really long string!) or if it's the type of image. Might possibly be something going wrong in the upload process?
Is anyone able to shed any light? It's such a weird issue I'm not even sure what to google!
And just to confirm, the image absolutely has to be sent in this format for a whole series of reasons, mainly Shopify restrictions so I can't send the image in file format.
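One way to test the hypothesis that part of the string is missing is to check each received data URL for signs of truncation. The following Python sketch is only a diagnostic under assumptions: the function name find_problems is made up, it handles only base64 PNG and JPEG data URLs, and a JPEG may legitimately carry extra bytes after its end marker, so a hit is a hint rather than proof.

```python
import base64
import binascii

# Byte sequences a complete file is expected to end with.
END_MARKERS = {
    "image/png": b"IEND\xaeB`\x82",  # IEND chunk type plus its fixed CRC
    "image/jpeg": b"\xff\xd9",       # end-of-image marker
}


def find_problems(data_url: str) -> list:
    """Return a list of human-readable truncation hints (hypothetical helper)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        return ["not a base64 data URL"]
    problems = []
    if len(payload) % 4:
        problems.append("base64 length is not a multiple of 4 (cut mid-stream?)")
    try:
        # Decode only whole 4-character groups so a cut string still decodes.
        usable = payload[: len(payload) - len(payload) % 4]
        raw = base64.b64decode(usable, validate=True)
    except binascii.Error as exc:
        return problems + [f"invalid base64: {exc}"]
    mime = header[len("data:"):-len(";base64")]
    marker = END_MARKERS.get(mime)
    if marker and not raw.endswith(marker):
        problems.append(f"{mime} data does not end with its end marker")
    return problems


# Demo with a fake PNG: real signature and IEND chunk, dummy bytes in between.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 32 + b"\x00\x00\x00\x00IEND\xaeB`\x82"
good = "data:image/png;base64," + base64.b64encode(fake_png).decode("ascii")
print(find_problems(good))
print(find_problems(good[:-10]))  # simulate a string that was cut short
```

Running this at the point where the order is created and again where the webhook is received would show at which step the string gets shorter.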

I want to batch extract gps data (exif) then convert to address and save that text to a jpg

I have 1500 pictures that need the address where they were taken to be shown in the corner of the picture. I have the pictures geo-tagged.
I need help extracting the GPS data and converting that to an address.
Then getting that address and saving it into the picture in the bottom right corner. Can anyone help or point me in the right direction please?
You're going to need two things. First you need an application that will extract the EXIF data that you are interested in. You should be able to write this yourself as it is fairly simple to do. You will need the JPEG standard and just need enough of it to identify the markers; specifically the APPn markers. You are also going to need the EXIF and (possibly the) TIFF standards to figure out how to extract the data you need from the EXIF APPn marker.
Writing the information to the corner of the image is the tough part. There are probably command line applications that will allow you to do that already. If worst comes to worst, there are various language APIs that will allow you to read a JPEG stream into a buffer; draw text to the buffer; then write the buffer back to a JPEG stream.
You will most likely need to use a programming language for this; I think Python would be suitable as it's easy to get started and has libraries needed for your task.
For example, in order to extract the location (coordinates) from the JPEG files you can use pyexiv2.
To transform those coordinates to addresses you need to use a geocoding service such as Google's Geocoding API - you can use their Python library directly or code your own using something like requests.
Now that you have the address data you can overlay that data onto images using Python's pillow library.
If you're looking for some code to get started let me shamelessly plug my own project called photomap; you can find code to read GPS information from images here: https://github.com/iticus/photomap/blob/master/handlers.py#L170
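One step that sits between reading the EXIF data and calling a geocoding service is converting the coordinates. EXIF stores latitude and longitude as three rationals (degrees, minutes, seconds) plus a reference letter (N/S or E/W), while geocoding services generally expect signed decimal degrees. A minimal sketch of that conversion follows; the function name and the tuple-of-pairs input shape are assumptions, since each EXIF library returns its own types, and the coordinates are example values.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den))
    plus a reference letter to signed decimal degrees (hypothetical helper)."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return float(-value if ref in ("S", "W") else value)


# Example values only: 40 deg 26' 46" N, 79 deg 58' 56" W.
lat = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
lon = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
print(f"{lat:.6f}, {lon:.6f}")
```

The resulting pair is what you would pass to the reverse-geocoding call, and the address it returns is the text to draw onto the image.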

Why it is possible to show a base64 encoded PNG with an "image/jpeg" data URL?

Here's an example of a base64 encoded PNG data URL (using "image/png" data type, of course):

I noticed that (in Firefox and Chrome) things work even if data type is set to "image/jpeg" (and leaving all the rest untouched) like this:

But... Why?
They're both using the image handling subsystem, which ignores the mime type and just goes with the actual format of the image.
Specifically, most browsers will translate the viewing of an image into the viewing of an HTML page with an <img> tag in it. Since servers lie and browsers are supposed to be able to show even badly configured websites, the part of the browser that deals with images will in most cases completely ignore any extensions or MIME types. There was no point in programming an exception for data: URIs.
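The idea of going by the actual format rather than the label can be sketched in a few lines of Python. This is not how any particular browser implements it; the function names and the small signature table are assumptions made for illustration.

```python
# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data: bytes):
    """Return the format implied by the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def effective_format(declared_mime: str, data: bytes) -> str:
    """Prefer the sniffed format; fall back to the declared MIME subtype."""
    return sniff_image_format(data) or declared_mime.split("/", 1)[-1]


# PNG signature plus dummy filler, deliberately labelled as JPEG.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(effective_format("image/jpeg", png_bytes))
```

With this approach the declared type only matters when the bytes themselves are not recognised, which matches the behaviour observed in the question.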

What exactly is going on with this image URL

I was copying image links from Google and I'm seeing more and more URLs like this. What exactly is going on here, and why are developers doing this? Here's an example.

That is the image, stored as a base64 encoded string.
So they're not giving you a link to the image, and are instead giving you the image data directly.
Copy and paste everything after the first comma into e.g. https://www.base64decode.org/ and you'll see the picture.
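The same decoding can be done locally. Below is a minimal Python sketch; the function name decode_data_url is made up, it only handles the ;base64 form, and the example string is a commonly used 1x1 pixel GIF rather than the image from the question.

```python
import base64


def decode_data_url(data_url: str):
    """Split a base64 data URL into (mime type, raw bytes) (hypothetical helper)."""
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime = header[len("data:"):-len(";base64")] or "text/plain"
    return mime, base64.b64decode(payload)


example = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)
mime, raw = decode_data_url(example)
print(mime, len(raw), raw[:6])

# To view the result, write the bytes to a file with a matching extension:
# with open("decoded.gif", "wb") as f:
#     f.write(raw)
```

Writing the decoded bytes to disk produces an ordinary image file, which shows that the data URL carries the image itself and not a pointer to it.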
This article gives a very detailed breakdown of why someone might choose to do so. One of the main reasons is to cut down the number of requests to your web server. (Clicking that link did not make a request to Google, as you had already downloaded the data.)
This image is simply encoded using base64.
Why do people do this? It depends on the project. The page will be somewhat heavier, but the image is embedded in the page itself. If you use a URL instead, the browser has to make an additional request to load the image.
Perhaps the best reason to do it is that you don't depend on another website to store and keep serving your image.

Resources