Proper way to convert an image to TFRecord format (writing an image to TFRecord format generates a SparseTensor)

I am reading images from the local file system, converting them to bytes, and finally wrapping them in tf.train.Feature to write them out in TFRecord format. Things work fine until I read the TFRecord back and extract the image bytes, which come out as a sparse tensor. Below is my code for the complete process flow.
reading df and image file: No Error
import tensorflow as tf
from PIL import Image

img_bytes_list = []
for img_path in df.filepath:
    with tf.io.gfile.GFile(img_path, "rb") as f:
        raw_img = f.read()
    img_bytes_list.append(raw_img)
defining features: No Error
write_features = {
    'filename': tf.train.Feature(bytes_list=tf.train.BytesList(value=df['filename'].apply(lambda x: x.encode("utf-8")))),
    'img_arr': tf.train.Feature(bytes_list=tf.train.BytesList(value=img_bytes_list)),
    'width': tf.train.Feature(int64_list=tf.train.Int64List(value=df['width'])),
    'height': tf.train.Feature(int64_list=tf.train.Int64List(value=df['height'])),
    'img_class': tf.train.Feature(bytes_list=tf.train.BytesList(value=df['class'].apply(lambda x: x.encode("utf-8")))),
    'xmin': tf.train.Feature(int64_list=tf.train.Int64List(value=df['xmin'])),
    'ymin': tf.train.Feature(int64_list=tf.train.Int64List(value=df['ymin'])),
    'xmax': tf.train.Feature(int64_list=tf.train.Int64List(value=df['xmax'])),
    'ymax': tf.train.Feature(int64_list=tf.train.Int64List(value=df['ymax']))
}
create example: No Error
example = tf.train.Example(features=tf.train.Features(feature=write_features))
writing data in TfRecord Format: No Error
with tf.io.TFRecordWriter('image_data_tfr') as writer:
    writer.write(example.SerializeToString())
Read and print data: No Error
read_features = {"filename": tf.io.VarLenFeature(dtype=tf.string),
"img_arr": tf.io.VarLenFeature(dtype=tf.string),
"width": tf.io.VarLenFeature(dtype=tf.int64),
"height": tf.io.VarLenFeature(dtype=tf.int64),
"class": tf.io.VarLenFeature(dtype=tf.string),
"xmin": tf.io.VarLenFeature(dtype=tf.int64),
"ymin": tf.io.VarLenFeature(dtype=tf.int64),
"xmax": tf.io.VarLenFeature(dtype=tf.int64),
"ymax": tf.io.VarLenFeature(dtype=tf.int64)}
reading single example from tfrecords format: No Error
for serialized_example in tf.data.TFRecordDataset(["image_data_tfr"]):
    parsed_s_example = tf.io.parse_single_example(serialized=serialized_example,
                                                  features=read_features)
reading image data from tfrecords format: No Error
image_raw = parsed_s_example['img_arr']
encoded_jpg_io = io.BytesIO(image_raw)
Here it is giving the error: TypeError: a bytes-like object is required, not 'SparseTensor'
image = Image.open(encoded_jpg_io)
width, height = image.size
print(width, height)
Please tell me what changes are required at the input of "img_arr" so that it does not produce a SparseTensor and instead returns the raw bytes.
Is there anything I can do to optimize my existing code?
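A tf.io.VarLenFeature always parses to a SparseTensor, which is why PIL rejects it. Below is a minimal sketch of one way to handle this (assuming TF 2.x eager execution and the write-side code above, which packs every row of the DataFrame into a single Example): densify the parsed feature with tf.sparse.to_dense and pull out the raw bytes before handing them to PIL.

import io
import tensorflow as tf
from PIL import Image

for serialized_example in tf.data.TFRecordDataset(["image_data_tfr"]):
    parsed = tf.io.parse_single_example(serialized=serialized_example,
                                        features=read_features)
    # VarLenFeature -> SparseTensor; make it dense, then take the bytes
    img_arr_dense = tf.sparse.to_dense(parsed['img_arr'], default_value=b'')
    for raw in img_arr_dense.numpy():  # one bytes object per stored image
        image = Image.open(io.BytesIO(raw))
        print(image.size)

Alternatively, writing one tf.train.Example per image and declaring 'img_arr' as tf.io.FixedLenFeature([], tf.string) on the read side avoids the SparseTensor entirely.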

Related

CoreML Compile Error after changing multiArray type

I am trying to use an '.mlmodel' converted from a Google MediaPipe '.tflite' model.
However, after changing the type of the multiArray input, a CoreML compile error occurs that I cannot resolve.
The input type of the mlmodel was changed as follows.
# convert_inputType.py :convert multiArray to image type
import coremltools as ct
from coremltools.proto import FeatureTypes_pb2 as ft
spec = ct.utils.load_spec('model_coreml_float32.mlmodel') # multiArray type
builder = ct.models.neural_network.NeuralNetworkBuilder(spec=spec)
# check input/output features
print('--- Before change:')
builder.inspect_input_features()
builder.inspect_output_features()
# change the input so the model can accept 256x256 RGB images
input = spec.description.input[0]
# del input.type.multiArrayType.shape[0]
input.type.imageType.colorSpace = ft.ImageFeatureType.RGB
input.type.imageType.width = 256
input.type.imageType.height = 256
# converted input/output features
print('--- After change:')
builder.inspect_input_features()
builder.inspect_output_features()
# save inputType-converted model
ct.utils.save_spec(spec, 'selfie_segmentation.mlmodel') # changed type
--- Before change:
[Id: 0] Name: input_1
Type: multiArrayType {
shape: 1
shape: 256
shape: 256
shape: 3
dataType: FLOAT32
}
[Id: 0] Name: activation_10
Type: multiArrayType {
dataType: FLOAT32
}
--- After change:
[Id: 0] Name: input_1
Type: imageType {
width: 256
height: 256
colorSpace: RGB
}
[Id: 0] Name: activation_10
Type: multiArrayType {
dataType: FLOAT32
}
'model_coreml_float32.mlmodel': converted from the MediaPipe TFLite model at PINTO_model_zoo
'selfie_segmentation.mlmodel': the changed-type mlmodel, saved
The following error occurs when loading the changed-type mlmodel into an Xcode project.
Espresso exception: "Invalid blob shape": generic_elementwise_kernel: cannot broadcast:
----------------------------------------
SchemeBuildError: Failed to build the scheme "testSelfieSegmentation"
compiler error: Espresso exception: "Invalid blob shape": generic_elementwise_kernel: cannot broadcast:
Compile CoreML model selfie_segmentation.mlmodel:
coremlc: error: compiler error: Espresso exception: "Invalid blob shape": generic_elementwise_kernel: cannot broadcast:
(1, 16, 8, 128)
(1, 16, 2, 128)
I have checked the model configuration with Netron, and cannot find any layer with a shape like (1,16,8,128) or (1,16,2,128) as in the error message.
Is there something wrong with the conversion code, or does it have something to do with the shape of the converted multiArray-type mlmodel? I guess that the shape (1,256,256,3) should perhaps be (3,256,256), but I don't know how to change it.
Any tips would be appreciated.
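One way to narrow this down, as a debugging sketch rather than a confirmed fix: list the layers of the saved spec and look for the elementwise layer whose inputs would broadcast to (1, 16, 8, 128) and (1, 16, 2, 128). Switching the input from a (1, 256, 256, 3) multiArray to an image type changes how CoreML lays out the input tensor, so a shape mismatch deep in the network would be consistent with the channels-last vs. channels-first guess in the question.

import coremltools as ct

spec = ct.utils.load_spec('selfie_segmentation.mlmodel')
builder = ct.models.neural_network.NeuralNetworkBuilder(spec=spec)
# print every layer with its inputs/outputs to locate the broadcasting op
builder.inspect_layers(last=-1, verbose=True)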

Google Cloud Translation API: Creating glossary error

I tried to test the Cloud Translation API using a glossary.
So I created a sample glossary file (.csv) and uploaded it to Cloud Storage.
However, when I ran my test code (copied from the sample in the official documentation), an error occurred. It seems that there is a problem with my sample glossary file, but I cannot find it.
I attached my code, error message, and screenshot of the glossary file.
Could you please tell me how to fix it?
And can I use the glossary so that the original language is used when translated into another language?
Ex) Translation English to Korean
I want to visit California. >>> 나는 California에 방문하고 싶다.
Sample Code)
from google.cloud import translate_v3 as translate
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my_service_account_json_file_path"

def create_glossary(
    project_id="YOUR_PROJECT_ID",
    input_uri="YOUR_INPUT_URI",
    glossary_id="YOUR_GLOSSARY_ID",
):
    """
    Create an equivalent term sets glossary. Glossary can be words or
    short phrases (usually fewer than five words).
    https://cloud.google.com/translate/docs/advanced/glossary#format-glossary
    """
    client = translate.TranslationServiceClient()
    # Supported language codes: https://cloud.google.com/translate/docs/languages
    source_lang_code = "ko"
    target_lang_code = "en"
    location = "us-central1"  # The location of the glossary

    name = client.glossary_path(project_id, location, glossary_id)
    language_codes_set = translate.types.Glossary.LanguageCodesSet(
        language_codes=[source_lang_code, target_lang_code]
    )
    gcs_source = translate.types.GcsSource(input_uri=input_uri)
    input_config = translate.types.GlossaryInputConfig(gcs_source=gcs_source)
    glossary = translate.types.Glossary(
        name=name, language_codes_set=language_codes_set, input_config=input_config
    )
    parent = client.location_path(project_id, location)
    # glossary is a custom dictionary Translation API uses
    # to translate the domain-specific terminology.
    operation = client.create_glossary(parent=parent, glossary=glossary)
    result = operation.result(timeout=90)
    print("Created: {}".format(result.name))
    print("Input Uri: {}".format(result.input_config.gcs_source.input_uri))

create_glossary("my_project_id", "file_path_on_my_cloud_storage_bucket", "test_glossary")
Error Message)
Traceback (most recent call last):
File "C:/Users/ME/py-test/translation_api_test.py", line 120, in <module>
create_glossary("my_project_id", "file_path_on_my_cloud_storage_bucket", "test_glossary")
File "C:/Users/ME/py-test/translation_api_test.py", line 44, in create_glossary
result = operation.result(timeout=90)
File "C:\Users\ME\py-test\venv\lib\site-packages\google\api_core\future\polling.py", line 127, in result
raise self._exception
google.api_core.exceptions.GoogleAPICallError: None No glossary entries found in input files. Check your files are not empty. stats = {total_examples = 0, total_successful_examples = 0, total_errors = 3, total_ignored_errors = 3, total_source_text_bytes = 0, total_target_text_bytes = 0, total_text_bytes = 0, text_bytes_by_language_map = []}
Glossary File)
https://drive.google.com/file/d/1RaladmLjgygai3XsZv3Ez4ij5uDH5EdE/view?usp=sharing
I solved my problem by changing the encoding of the glossary file to UTF-8.
And I also found that I can use the glossary so that the original language is kept when translating into another language.
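For reference, a minimal re-encoding step looks like this (assuming the CSV was originally saved in a Korean Windows codepage such as cp949; adjust the source encoding to whatever your editor actually used):

# re-save the glossary CSV as UTF-8 before uploading it to Cloud Storage
with open('glossary.csv', 'r', encoding='cp949') as src:  # cp949 is an assumption
    content = src.read()
with open('glossary_utf8.csv', 'w', encoding='utf-8') as dst:
    dst.write(content)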

Saving Dask DataFrame with image column to HDF5

I am trying to load images of varying sizes into a Dask DataFrame column and save the dataframe to HDF5 file format.
Here's the standard approach:
import glob
import dask.dataframe as dd
import pandas as pd
import numpy as np
from skimage.io import imread
dir = '/Users/petioptrv/Downloads/mask'
filenames = glob.glob(dir + '/*.png')[:5]
df = pd.DataFrame({"paths": filenames})
ddf = dd.from_pandas(df, npartitions=2)
ddf['images'] = ddf['paths'].apply(imread, meta=('images', np.uint8))
ddf.to_hdf('test.h5', '/data')
I get the following error message:
...
File "/Users/petioptrv/miniconda3/envs/dask/lib/python3.7/site-packages/pandas/io/pytables.py", line 2214, in set_atom_string
item=item, type=inferred_type
TypeError: Cannot serialize the column [images] because
its data contents are [mixed] object dtype
Essentially, PyTables detects that the column has an object dtype and checks if it's of type str. It's not, so it throws an exception.
I can probably hack it by opening the images into byte-arrays and converting those to strings, but that is far from the ideal scenario.
Try specifying the data_columns as suggested in this issue.
ddf.to_hdf('test.h5', '/data', format='table', data_columns=['images'])

Vision API: How to get JSON-output

I'm having trouble saving the output given by the Google Vision API. I'm using Python and testing with a demo image. I get the following error:
TypeError: [mid:...] + is not JSON serializable
Code that I executed:
import io
import os
import json

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

# Instantiates a client
vision_client = vision.ImageAnnotatorClient()

# The name of the image file to annotate
file_name = os.path.join(
    os.path.dirname(__file__),
    'demo-image.jpg')  # Your image path from current directory

# Loads the image into memory
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

image = types.Image(content=content)

# Performs label detection on the image file
response = vision_client.label_detection(image=image)
labels = response.label_annotations

print('Labels:')
for label in labels:
    print(label.description, label.score, label.mid)

with open('labels.json', 'w') as fp:
    json.dump(labels, fp)
The output appears on the screen; however, I do not know exactly how to save it. Does anyone have any suggestions?
FYI to anyone seeing this in the future, google-cloud-vision 2.0.0 has switched to using proto-plus which uses different serialization/deserialization code. A possible error you can get if upgrading to 2.0.0 without changing the code is:
object has no attribute 'DESCRIPTOR'
Using google-cloud-vision 2.0.0, protobuf 3.13.0, here is an example of how to serialize and de-serialize (example includes json and protobuf)
import io, json
from google.cloud import vision_v1
from google.cloud.vision_v1 import AnnotateImageResponse

with io.open('000048.jpg', 'rb') as image_file:
    content = image_file.read()

image = vision_v1.Image(content=content)
client = vision_v1.ImageAnnotatorClient()
response = client.document_text_detection(image=image)

# serialize / deserialize proto (binary)
serialized_proto_plus = AnnotateImageResponse.serialize(response)
response = AnnotateImageResponse.deserialize(serialized_proto_plus)
print(response.full_text_annotation.text)

# serialize / deserialize json
response_json = AnnotateImageResponse.to_json(response)
response = json.loads(response_json)
print(response['fullTextAnnotation']['text'])
Note 1: proto-plus doesn't support converting to snake_case names, which is supported in protobuf with preserving_proto_field_name=True. So currently there is no way around the field names being converted from response['full_text_annotation'] to response['fullTextAnnotation'].
There is a (now closed) feature request for this: googleapis/proto-plus-python#109
Note 2: The Google Vision API doesn't return an x coordinate if x=0. If x doesn't exist, the protobuf will default to x=0. In google-cloud-vision 1.0.0, using MessageToJson(), these x values weren't included in the JSON, but now with google-cloud-vision 2.0.0 and .to_json() these values are included as x:0.
Maybe you were already able to find a solution to your issue (if that is the case, I invite you to share it as an answer to your own post too), but in any case, let me share some notes that may be useful for other users with a similar issue:
As you can check using the type() function in Python, response is an object of type google.cloud.vision_v1.types.AnnotateImageResponse, while labels[i] is an object of type google.cloud.vision_v1.types.EntityAnnotation. Neither seems to have any out-of-the-box implementation to transform it to JSON, as you are trying to do, so I believe the easiest way would be to turn each EntityAnnotation in labels into a Python dictionary, then group them all into an array, and transform that into JSON.
To do so, I have added some simple lines of code to your snippet:
[...]

label_dicts = []  # Array that will contain all the EntityAnnotation dictionaries

print('Labels:')
for label in labels:
    # Write each label (EntityAnnotation) into a dictionary
    dict = {'description': label.description, 'score': label.score, 'mid': label.mid}
    # Populate the array
    label_dicts.append(dict)

with open('labels.json', 'w') as fp:
    json.dump(label_dicts, fp)
There is a library released by Google
from google.protobuf.json_format import MessageToJson
webdetect = vision_client.web_detection(blob_source)
jsonObj = MessageToJson(webdetect)
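MessageToJson returns a plain JSON string, so it can be written straight to disk, for example:

with open('webdetect.json', 'w') as fp:
    fp.write(jsonObj)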
I was able to save the output with the following function:
# Save output as JSON
def store_json(json_input):
    with open(json_file_name, 'a') as f:
        f.write(json_input + '\n')
And as @dsesto mentioned, I had to define a dictionary. In this dictionary I have defined what types of information I would like to save in my output.
with open(photo_file, 'rb') as image:
    image_content = base64.b64encode(image.read())
    service_request = service.images().annotate(
        body={
            'requests': [{
                'image': {
                    'content': image_content
                },
                'features': [{
                    'type': 'LABEL_DETECTION',
                    'maxResults': 20,
                },
                {
                    'type': 'TEXT_DETECTION',
                    'maxResults': 20,
                },
                {
                    'type': 'WEB_DETECTION',
                    'maxResults': 20,
                }]
            }]
        })
The objects in the current Vision library lack serialization functions (although this is a good idea).
It is worth noting that they are about to release a substantially different library for Vision (it is on master of vision's repo now, although not released to PyPI yet) where this will be possible. Note that it is a backwards-incompatible upgrade, so there will be some (hopefully not too much) conversion effort.
That library returns plain protobuf objects, which can be serialized to JSON using:
from google.protobuf.json_format import MessageToJson
serialized = MessageToJson(original)
You can also use something like protobuf3-to-dict
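For example, a small sketch assuming the pre-2.0 google-cloud-vision client, where each annotation is a plain protobuf message (install the package with pip install protobuf3-to-dict):

from protobuf_to_dict import protobuf_to_dict

# turn each EntityAnnotation message into a plain dict that json.dump accepts
label_dicts = [protobuf_to_dict(label) for label in response.label_annotations]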

How to extract characters from images where their positions are known?

I have a set of PNG images at 300 dpi. Each image is full of text (not handwritten) and digits (not handwritten).
I want to extract each character and save it in a separate image.
For each character in the image I have its position stored in a CSV file.
For instance, in image1.png, for a given character "k" I have its position:
"k" = [left=656, right=736, top=144, down=286]
Is there any Python library which allows me to do that? As input I have the images (PNG format) and a CSV file that contains the position of each character of each image.
After executing the code I am stuck at this line:
img_charac=img[int(coords[2]):int(coords[3]),int(coords[0]):int(coords[1])]
I got the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object has no attribute '__getitem__'
So if I understood correctly, this has nothing to do with image processing, just file opening, image cropping and saving.
(The original answer illustrated this with an example CSV file, an input image, and the resulting cropped characters; the images are omitted here.)
import cv2
import numpy as np
import csv

path_csv = ""  # path to your csv

# stock coordinates of characters from your csv in a numpy array
npa = np.genfromtxt(path_csv + "cs.csv", delimiter=',', skip_header=1, usecols=(1, 2, 3, 4))
nb_charac = len(npa[:, 0])  # number of characters

# stock the actual letters of your csv in an array
characs = []
cpt = 0

# take characters
f = open(path_csv + "cs.csv", 'rt')
reader = csv.reader(f)
for row in reader:
    if cpt >= 1:  # skip header
        characs.append(str(row[0]))
    cpt += 1

# open your image
path_image = ""  # path to your image
img = cv2.imread(path_image + "yourimagename.png")

path_save = ""  # path you want to save to

# for every line in your csv,
for i in range(nb_charac):
    # get coordinates
    coords = npa[i, :]
    charac = characs[i]
    # actual cropping of the image (easy with numpy)
    img_charac = img[int(coords[2]):int(coords[3]), int(coords[0]):int(coords[1])]
    # saving the image
    cv2.imwrite(path_save + "carac" + str(i) + "_" + str(charac) + ".png", img_charac)
This is sort of quick and dirty; the CSV opening is a bit messy, for example (you could get all the info with a single opening and conversion), and it should be adapted to your CSV file anyway.
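As for the TypeError: 'NoneType' object has no attribute '__getitem__' mentioned in the question: cv2.imread does not raise on a bad path, it just returns None, and the crop then fails. A small guard makes the real problem visible:

img = cv2.imread(path_image + "yourimagename.png")
if img is None:
    raise FileNotFoundError("Could not read the image; check path_image and the file name")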
