I am trying to parse the text in an image of a restaurant bill. I've been able to set up the Ruby AWS SDK, which includes the Rekognition client, following this example, and I have been able to make a call to Rekognition, passing in a local image.
When I make the call with #detect_text (docs), I get a response whose TextDetections represent either lines or words in the image. However, I would like the response to contain only TextDetections of type LINE. Here are my questions:
Is it possible to get a response back that only contains TextDetections of type LINE?
Is it possible to increase the limit of words detected in an image? Apparently according to the docs:
DetectText can detect up to 50 words in an image
That sounds like a hard limit to me.
Is there a way I can get around the limit of 50 words in an image? Perhaps I can make multiple calls on the same image where Rekognition can parse the same image multiple times until it has all the words?
Yes, the 50-word limit is a hard one: you cannot detect more than 50 words in a single image. A workaround is to crop the image into multiple images and run DetectText on each cropped image.
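As for your first question, there is no request parameter that restricts the response to LINE detections, so you filter the response client-side. A minimal sketch of that filter; the sample `detections` array below is hypothetical data in the shape `Aws::Rekognition::Client#detect_text` returns in `response.text_detections`:

```ruby
# Rekognition returns both WORD and LINE detections together; keep only
# the LINE entries. Real response members are structs, which also support
# [:member] access, so this works on the SDK response as well.
def lines_only(detections)
  detections.select { |d| d[:type] == "LINE" }
end

# Hypothetical data mimicking response.text_detections:
detections = [
  { detected_text: "Total: $42.10", type: "LINE", confidence: 99.1 },
  { detected_text: "Total:",        type: "WORD", confidence: 99.3 },
  { detected_text: "$42.10",        type: "WORD", confidence: 98.9 }
]
lines = lines_only(detections)
# lines now holds only the LINE detection
```

With the real client you would call `lines_only(client.detect_text(image: { s3_object: { bucket: "...", name: "..." } }).text_detections)`.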
My current task is to serve an image from my host (currently S3), but the catch is that nothing about this image should be persistent. In particular, I cannot expose its URL directly: S3 always includes the same key name in the URL, even when it is presigned. One solution would be an image server that downloads the image from S3 and sends it back to the client, where the URL for this image is always dynamic and random (secured with a JWT). The remaining problem is that the base64 the client receives is still persistent; it never changes. A tradeoff I can accept is randomly modifying a few characters inside the base64 string. That would corrupt a pixel or two, which is fine with me as long as it is not noticeable, but this technique seems slow because of the bandwidth involved. Is there any way to make the image non-persistent and random every time a client receives it?
Due to a hard drive failure I lost my separated photos. I then recovered them using image-recovery software, but now all of the images, possibly more than 500, are in a single folder.
The images have customized names, and they vary in both size and dimensions.
Clustering them and separating them into new folders manually is too time consuming. Is there any online solution or software to automatically cluster them and move them into folders?
For example :
Image set 1 :
Image set 2 :
Image set 3 :
In each of the above sets of pictures, every image has the same background, so those images should be clustered together and put into one folder.
Along these lines, is there any software or API-level solution to simplify this manual work?
If they are JPEG images, you can try running jhead on them and it should be able to find the dates in the files. See jhead.
It can then rename the files based on the date for you, then you could separate them by their names/dates.
It may also tell you the GPS latitude/longitude, so you could move them to folders based on their proximity to each other.
Try the -v option to see the full information in a file:
jhead -v recovered123.jpg
Get the time information from the EXIF metadata.
Use this to automatically name and sort the images. Since you likely did not operate two cameras at two different events at the same time, this will work extremely well, unless you managed to destroy this metadata.
I currently have two buckets in S3 - let's call them photos and photos-thumbnails. Right now, when a user uploads an image from our iOS app, we directly upload that photo to the photos bucket, which triggers a lambda function that resizes the photo into a thumbnail and uploads the thumbnail into the photos-thumbnails bucket.
I now want to include some image compression for the images in the photos bucket before a thumbnail is created. However, if I set the compression lambda function to be triggered whenever an object is created in the photos bucket, it will wind up in a never-ending loop: the user uploads the original photo, which triggers the compression and places the result back in the same bucket, which triggers the compression again, and so on.
Is there a way I can intercept this before it becomes a recursive call for image compression? Or is the only way to create a third bucket?
A third bucket would probably be the best. If you really want to use the same bucket, just choose some criteria controlling whether the image in photos should be modified or not (perhaps image file size or something), then ensure that images that have been processed once fall below the threshold. The lambda will still run twice, but the second time it will examine the image and find it has already been processed and thus not process it again. To my knowledge there is no way to suppress the second run of the lambda.
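The "examine the image and find it has already been processed" guard can be as simple as a size threshold read from the S3 event record. A sketch of that check as it might appear at the top of a Ruby Lambda handler; the 500 KB threshold is an illustrative assumption, not a recommendation:

```ruby
# Returns true only when the object reported in the S3 event is larger
# than the threshold, i.e. has presumably not been compressed yet. The
# second Lambda invocation then becomes a no-op.
def needs_compression?(event, threshold_bytes: 500 * 1024)
  size = event.dig("Records", 0, "s3", "object", "size")
  !size.nil? && size > threshold_bytes
end
```

In the handler you would return early when `needs_compression?(event)` is false, before downloading the object at all. For this to terminate, the compression step must reliably produce outputs below the threshold.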
Another option might be to filter based on how the object is created. The following event types can be used in S3. Use one for what your users upload (maybe POST?) and the other for what your lambda does (maybe PUT?):
s3:ObjectCreated:Put
s3:ObjectCreated:Post
s3:ObjectCreated:Copy
s3:ObjectCreated:CompleteMultipartUpload
A third bucket would work or, for essentially the same effect, rename the file with a prefix after compressing and then check for that prefix before reprocessing the file.
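The prefix check is a couple of lines. A sketch, where the `compressed/` prefix is an assumed naming convention, not anything S3 mandates:

```ruby
# Assumed convention: compressed objects are written under "compressed/".
COMPRESSED_PREFIX = "compressed/"

# Guard: skip keys the compression step already produced.
def already_compressed?(key)
  key.start_with?(COMPRESSED_PREFIX)
end

# Where the compression step should write its output.
def output_key(key)
  COMPRESSED_PREFIX + key
end
```

With this convention you can also scope the S3 event notification itself with a key-prefix filter, so the Lambda is never invoked for its own outputs in the first place.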
If you name the outputs of your function in a predictable way, you can just filter any files that were created by your function at the start of the function.
However, as was mentioned previously, using a different bucket for the output would be simpler.
I am using wkhtmltopdf on my Ubuntu server to generate PDFs from HTML templates.
wkhtmltopdf is started from a PHP script with shell_exec.
My problem is that I want to create up to 200 PDFs at (almost) the same time, and the wkhtmltopdf runtimes stack up: one file takes 0.6 seconds, 15 files take 9 seconds.
My idea was to start wkhtmltopdf in a screen session to decrease the runtime, but I can't make it work from PHP. This might not make much sense anyway, because I want to merge all the PDFs into one after creation, so I would have to check whether every session had terminated.
Do you have any ideas how I can decrease the runtime for this number of PDFs, or can you give me advice on how to realize this correctly with screen?
My script looks like the following:
loop up to 200times {
- get data for html-template from database
- fill template-string and write .html-file
- create pdf out of html-template via shell_exec("wkhtmltopdf....")
- delete template-file
}
merge all generated pdfs together to one and send it via mail
Thank you in advance, and sorry for my bad English.
best wishes
Just create a single large HTML file and convert it in one pass instead of merging multiple PDFs afterwards.
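Concretely, instead of filling 200 template files, concatenate the filled templates into one HTML document and force a page break between them, so a single wkhtmltopdf invocation produces the final merged PDF. A sketch of the combined document's structure (class name and layout are illustrative assumptions, not part of wkhtmltopdf):

```html
<!-- One combined document; each bill starts on a new PDF page. -->
<html>
  <head>
    <style>.bill { page-break-after: always; }</style>
  </head>
  <body>
    <div class="bill"><!-- first filled template --></div>
    <div class="bill"><!-- second filled template --></div>
    <!-- ...remaining filled templates... -->
  </body>
</html>
```

This removes both the per-process startup cost and the separate merge step.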
I'm trying to open an image file and store a list of pixels by color in a variable/array so I can output them one by one.
Image type: Could be BMP, JPG, GIF or PNG. Any of them is fine and only one needs to be supported.
Color Output: RGB or Hex.
I've looked at a couple of libraries (RMagick, Quick_Magick, Mini_Magick, etc.) and they all seem like overkill. Heroku also has some difficulties with ImageMagick, and my tests don't run there. My application is in Sinatra.
Any suggestions?
You can use RMagick's each_pixel method for this. each_pixel takes a block; for each pixel, the block is passed the pixel, its column number, and its row number. It iterates over the pixels from left to right and top to bottom.
So something like:
require 'rmagick'

img = Magick::Image.read('path/to/image.file').first
pixels = []
img.each_pixel do |pixel, c, r|
  pixels.push(pixel)
end
# pixels now contains each individual Magick::Pixel of img
I think Chunky PNG should do it for you. It's pure ruby, reasonably lightweight, memory efficient, and provides access to pixel data as well as image metadata.
If you are only opening the file to display the bytes, and don't need to manipulate it as an image, then it's a simple process of opening the file like any other, reading X number of bytes, then iterating over them. Something like:
File.open('path/to/image.file', 'rb') do |fi|
  byte_block = fi.read(1024)
  byte_block.each_byte do |b|
    puts b  # each_byte yields Integers, so no conversion is needed
  end
end
That will merely output the bytes as decimal values. You'll want to look at the byte values and build up RGB values to determine colors, so using each_slice(3) to read the bytes in groups of three will help.
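A sketch of that grouping, assuming `data` is raw, uncompressed 24-bit RGB pixel data with no header (real formats need their headers skipped first, as described below):

```ruby
# Turn a raw byte string of 24-bit RGB data into hex color strings.
# Each consecutive group of three bytes is one pixel: red, green, blue.
def rgb_triples(data)
  data.bytes.each_slice(3).map do |r, g, b|
    format("#%02X%02X%02X", r, g, b)
  end
end

rgb_triples("\xFF\x00\x00\x00\xFF\x00".b)
# => ["#FF0000", "#00FF00"]
```

The same idea extends to 32-bit data with each_slice(4) when an alpha channel is present.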
Various image formats contain differing header and trailer blocks that store information about the image, the data format, and EXIF information for the capturing device, depending on the type. If you are going to read a file and output the bytes directly, going with an uncompressed format, such as uncompressed TIFF, would probably be good. Once you've decided on that, you can jump into the file to skip the headers if you want, or read them too to learn what's in them. Wikipedia's Image file formats page is a good jumping-off place for more info on the various formats available.
If you only want to see the image data, then one of the high-level libraries will help, as they have interfaces to grab particular sections of the image. But actually accessing the bytes isn't hard, and neither is jumping around in the file.
If you want to learn more about the EXIF block, which is used to describe many different vendors' JPEG and TIFF formats, ExifTool can be handy. It's written in Perl, so you can look at how the code works. The docs nicely show the header blocks and fields, and you can read/write values using the app.
I'm in the process of testing a new router so I haven't had a chance to test that code, but it should be close. I'll check it in a bit and update the answer if that didn't work.