How to extract EXIF metadata from a BLOB column - Oracle

I am trying to extract the geolocation from the EXIF metadata of a JPEG image stored in a BLOB column in Oracle Database 12c, using PL/SQL.
The Oracle Multimedia package is installed. The image contains geolocation data before uploading, and when I download it again from the BLOB, the latitude and longitude are still there in the metadata. But when I try to read it in the database, I can only read basic metadata such as image width, height, size, and MIME type.
I am using this procedure to extract the metadata from the BLOB column:
PROCEDURE extractMetadata(inID IN INTEGER) IS
  img       ORDSYS.ORDIMAGE;
  metav     XMLSequenceType;
  meta_root VARCHAR2(40);
  xmlORD    XMLType;
  xmlXMP    XMLType;
  xmlEXIF   XMLType;
  xmlIPTC   XMLType;
BEGIN
  -- select the image
  SELECT ordsys.ordimage(d.blob, 1)
    INTO img
    FROM PHOTOS d
   WHERE d.id = inID;
  -- extract all the metadata
  metav := img.getMetadata('ALL');
  -- process the result array to discover which types of metadata were returned
  FOR i IN 1 .. metav.count() LOOP
    meta_root := metav(i).getRootElement();
    DBMS_OUTPUT.PUT_LINE(meta_root);
    CASE meta_root
      WHEN 'ordImageAttributes' THEN xmlORD  := metav(i);
      WHEN 'xmpMetadata'        THEN xmlXMP  := metav(i);
      WHEN 'iptcMetadata'       THEN xmlIPTC := metav(i);
      WHEN 'exifMetadata'       THEN xmlEXIF := metav(i);
      ELSE NULL;
    END CASE;
  END LOOP;
  -- update the metadata columns
  UPDATE photos
     SET metaORDImage = xmlORD,
         metaEXIF     = xmlEXIF,
         metaIPTC     = xmlIPTC,
         metaXMP      = xmlXMP
   WHERE id = inID;
END extractMetadata;
But only metaORDImage gets populated.
It seems as if no EXIF metadata is available, yet when I download the image to my PC and check its details, the location data is there.
What could be wrong here? Is it possible at all to extract EXIF from a BLOB, and should I store the image in some other format rather than a BLOB?
UPDATE: I used the JavaScript library exif-js to read the metadata in the page once the image is loaded, and the geolocation is there again. I can read it anywhere but not in PL/SQL:
{
  "ImageWidth": 720,
  "ImageHeight": 1520,
  "ExifIFDPointer": 74,
  "Orientation": 0,
  "GPSInfoIFDPointer": 92,
  "LightSource": "Unknown",
  "GPSLatitude": [43, xx, 17.861],
  "GPSLatitudeRef": "N",
  "GPSLongitudeRef": "E",
  "GPSLongitude": [17, xx, 7.29],
  "thumbnail": {}
}
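For reference, the dump above came from a call along these lines (the element id 'photo' is a placeholder, not from my actual page):

EXIF.getData(document.getElementById('photo'), function () {
    // getAllTags returns every parsed tag, including the GPS block
    console.log(JSON.stringify(EXIF.getAllTags(this), null, 2));
});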

My proposal would be a Java class using a library such as Apache Commons Imaging, org.w3c.tools.jpeg, or metadata-extractor.
First you need to load the Java classes into your database. This can be done with the loadjava tool:
loadjava -verbose -user username/password@DATABASE_NAME -r -o .\commons-imaging-1.0-alpha3.jar
Then you need to define the PL/SQL procedure/function:
CREATE OR REPLACE FUNCTION getMetadata(img IN BLOB) RETURN VARCHAR2
AS LANGUAGE JAVA NAME 'Imaging.getMetadata(byte[] bytes) return java.lang.String';
/
In the final step you can use the PL/SQL function as usual. The function above does not work as written, because of the wrong return type (Commons Imaging's Imaging.getMetadata returns an ImageMetadata object rather than a String), but I hope you get an idea of how it works in general.
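For completeness, here is a minimal sketch of what such a wrapper class could look like with Commons Imaging; the class name ExifReader, the java.sql.Blob parameter, and the semicolon-separated return format are my own assumptions, not part of the original proposal:

import org.apache.commons.imaging.Imaging;
import org.apache.commons.imaging.common.ImageMetadata;
import org.apache.commons.imaging.formats.jpeg.JpegImageMetadata;
import org.apache.commons.imaging.formats.tiff.TiffImageMetadata;

public class ExifReader {
    // Must be public static and return a SQL-mappable type (String -> VARCHAR2).
    public static String getMetadata(java.sql.Blob img) throws Exception {
        byte[] bytes = img.getBytes(1, (int) img.length());
        ImageMetadata metadata = Imaging.getMetadata(bytes);
        if (metadata instanceof JpegImageMetadata) {
            TiffImageMetadata exif = ((JpegImageMetadata) metadata).getExif();
            TiffImageMetadata.GPSInfo gps = (exif == null) ? null : exif.getGPS();
            if (gps != null) {
                // e.g. "43.123456;17.654321"
                return gps.getLatitudeAsDegreesNorth() + ";" + gps.getLongitudeAsDegreesEast();
            }
        }
        return null;
    }
}

The matching call spec would then end with AS LANGUAGE JAVA NAME 'ExifReader.getMetadata(java.sql.Blob) return java.lang.String'; note that the NAME signature lists parameter types only, without parameter names.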
See Calling Java Methods in Oracle Database and https://www.tabnine.com/code/java/classes/oracle.sql.BLOB

Related

Proper way to convert an image to TFRecord format (writing an image to TFRecord format generates a sparse tensor)

I am reading images from the local file system, converting them to bytes, and finally ingesting them into tf.train.Feature to convert into TFRecord format. Things work fine until the moment I read the TFRecord back and extract the image bytes, which turn out to be a sparse tensor in the end. Below is my code for the complete process flow.
reading df and image file: No Error
import io  # needed later for io.BytesIO
import tensorflow as tf
from PIL import Image

img_bytes_list = []
for img_path in df.filepath:
    with tf.io.gfile.GFile(img_path, "rb") as f:
        raw_img = f.read()
    img_bytes_list.append(raw_img)
defining features: No Error
write_features = {
    'filename':  tf.train.Feature(bytes_list=tf.train.BytesList(
                     value=df['filename'].apply(lambda x: x.encode("utf-8")))),
    'img_arr':   tf.train.Feature(bytes_list=tf.train.BytesList(value=img_bytes_list)),
    'width':     tf.train.Feature(int64_list=tf.train.Int64List(value=df['width'])),
    'height':    tf.train.Feature(int64_list=tf.train.Int64List(value=df['height'])),
    'img_class': tf.train.Feature(bytes_list=tf.train.BytesList(
                     value=df['class'].apply(lambda x: x.encode("utf-8")))),
    'xmin':      tf.train.Feature(int64_list=tf.train.Int64List(value=df['xmin'])),
    'ymin':      tf.train.Feature(int64_list=tf.train.Int64List(value=df['ymin'])),
    'xmax':      tf.train.Feature(int64_list=tf.train.Int64List(value=df['xmax'])),
    'ymax':      tf.train.Feature(int64_list=tf.train.Int64List(value=df['ymax']))
}
create example: No Error
example = tf.train.Example(features=tf.train.Features(feature=write_features))
writing data in TFRecord format: No Error
with tf.io.TFRecordWriter('image_data_tfr') as writer:
    writer.write(example.SerializeToString())
Read and print data: No Error
read_features = {
    "filename":  tf.io.VarLenFeature(dtype=tf.string),
    "img_arr":   tf.io.VarLenFeature(dtype=tf.string),
    "width":     tf.io.VarLenFeature(dtype=tf.int64),
    "height":    tf.io.VarLenFeature(dtype=tf.int64),
    "img_class": tf.io.VarLenFeature(dtype=tf.string),  # key must match the one written above
    "xmin":      tf.io.VarLenFeature(dtype=tf.int64),
    "ymin":      tf.io.VarLenFeature(dtype=tf.int64),
    "xmax":      tf.io.VarLenFeature(dtype=tf.int64),
    "ymax":      tf.io.VarLenFeature(dtype=tf.int64)
}
reading a single example from the TFRecord file: No Error
for serialized_example in tf.data.TFRecordDataset(["image_data_tfr"]):
    parsed_s_example = tf.io.parse_single_example(serialized=serialized_example,
                                                  features=read_features)
reading image data from the TFRecord file: Error
image_raw = parsed_s_example['img_arr']
encoded_jpg_io = io.BytesIO(image_raw)
Here it gives the error: TypeError: a bytes-like object is required, not 'SparseTensor'
image = Image.open(encoded_jpg_io)
width, height = image.size
print(width, height)
Please tell me what changes are required to "img_arr" so that it will not generate a sparse tensor and will return the bytes instead.
Is there anything I can do to optimize my existing code?
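A minimal sketch of one way to get the bytes back, assuming TF 2.x eager execution and reusing read_features from above: tf.io.VarLenFeature always parses to a SparseTensor, so densify it before handing the bytes to PIL.

import io
import tensorflow as tf
from PIL import Image

for serialized_example in tf.data.TFRecordDataset(["image_data_tfr"]):
    parsed = tf.io.parse_single_example(serialized=serialized_example,
                                        features=read_features)
    # Densify the SparseTensor, then take the raw bytes of the first stored image.
    dense_imgs = tf.sparse.to_dense(parsed['img_arr'], default_value=b'')
    image_raw = dense_imgs.numpy()[0]
    image = Image.open(io.BytesIO(image_raw))
    print(image.size)

If each record held exactly one image, declaring "img_arr" as tf.io.FixedLenFeature([], tf.string) would avoid the sparse tensor entirely; here all images were packed into a single example, so densifying is the simpler route.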

Error using PolyBase to load Parquet file: class java.lang.Integer cannot be cast to class parquet.io.api.Binary

I have a snappy.parquet file with a schema like this:
{
  "type": "struct",
  "fields": [{
    "name": "MyTinyInt",
    "type": "byte",
    "nullable": true,
    "metadata": {}
  }
  ...
  ]
}
Update: parquet-tools reveals this:
############ Column(MyTinyInt) ############
name: MyTinyInt
path: MyTinyInt
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Int(bitWidth=8, isSigned=true)
converted_type (legacy): INT_8
When I try and run a stored procedure in Azure Data Studio to load this into an external staging table with PolyBase I get the error:
11:16:21 Started executing query at Line 113
Msg 106000, Level 16, State 1, Line 1
HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: class java.lang.Integer cannot be cast to class parquet.io.api.Binary (java.lang.Integer is in module java.base of loader 'bootstrap'; parquet.io.api.Binary is in unnamed module of loader 'app')
The load into the external table works fine when the columns are varchars only:
CREATE EXTERNAL TABLE [domain].[TempTable]
(
    ...
    MyTinyInt tinyint NULL,
    ...
)
WITH
(
    LOCATION = ''' + @Location + ''',
    DATA_SOURCE = datalake,
    FILE_FORMAT = parquet_snappy
)
The data will eventually be merged into a Data Warehouse Synapse table. In that table the column will have to be of type tinyint.
I have the same issue and a good support plan in Azure, so I got an answer from Microsoft:
"There is a known bug in ADF for this particular scenario: the date type in Parquet should be mapped as data type date in SQL Server; however, ADF incorrectly converts this type to datetime2, which causes a conflict in PolyBase. I have confirmation from the core engineering team that this will be rectified with a fix by the end of November and will be published directly into the ADF product."
In the meantime, as a workaround:
Create the target table with data type DATE as opposed to DATETIME2
Configure the Copy Activity Sink settings to use Copy Command as opposed to PolyBase
but even the Copy Command didn't work for me, so the only remaining workaround was to use bulk insert. Bulk insert is extremely slow, though, so on big datasets that would be a problem.
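For completeness, here is roughly what the Copy Command route amounts to in T-SQL; the storage path and credential are placeholders, not values from the original post. Unlike PolyBase, COPY does not go through the external-table type mapping.

-- Hypothetical storage URL and credential; adjust to your data lake setup.
COPY INTO [domain].[TempTable]
FROM 'https://<account>.dfs.core.windows.net/<container>/<path>/*.snappy.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);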

Unable to load data into parquet file format?

I am trying to parse log data into Parquet file format in Hive; the separator used is "||-||".
The sample row is
"b8905bfc-dc34-463e-a6ac-879e50c2e630||-||syntrans1||-||CitBook"
After performing the data staging I am able to get the result
"b8905bfc-dc34-463e-a6ac-879e50c2e630 syntrans1 CitBook ".
While converting the data to Parquet file format I got this error:
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2185)
    at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:137)
    at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
    ... 24 more
This is what I have tried:
create table log (a String, b String, c String)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES (
    "field.delim"="||-||",
    "collection.delim"="-",
    "mapkey.delim"="#"
);

create table log_par (
    a String,
    b String,
    c String
) stored as PARQUET;

insert into log_par select * from log;
Aman Kumar,
To resolve this issue, run the Hive query after adding the following jar:
hive> add jar hive-contrib.jar;
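If Hive cannot resolve the bare jar name, point it at the full path instead (the exact location varies by HDP version, so the path below is only a guess):
hive> add jar /usr/hdp/current/hive-client/lib/hive-contrib.jar;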
To add the jar permanently, do the following:
1. On the Hive Server host, create a /usr/hdp/<version>/hive/auxlib directory.
2. Copy /usr/hdp/<version>/hive/lib/hive-contrib-<version>.jar to /usr/hdp/<version>/hive/auxlib.
3. Restart the HS2 server.
For further reference, see:
https://community.hortonworks.com/content/supportkb/150175/errororgapachehadoophivecontribserde2multidelimits.html
https://community.hortonworks.com/questions/79075/loading-data-to-hive-via-pig-orgapachehadoophiveco.html
Let me know if you face any issues.

How to insert a giant geometry into a field of type SDO_GEOMETRY?

I am trying to insert this http://pastebin.com/cKXnCqx7 geometry into an SDO_GEOMETRY field, as follows:
declare
  str clob := '...'; -- the 98K+ WKT literal from http://pastebin.com/cKXnCqx7
begin
  INSERT INTO TEMP_TEST_GEOMETRY VALUES (SDO_CS.TRANSFORM (SDO_GEOMETRY (str, 4674), 1000205));
end;
And it generates the following error:
ORA-01704: string literal too long
The maximum size of a string literal is 32K in PL/SQL. You're trying to assign a 98K+ literal, which is what is causing the error (although since that refers to 4k, that seems to be from a plain SQL call rather than the PL/SQL block you showed). It isn't getting as far as trying to create the sdo_geometry object from the CLOB.
Ideally you'd load the value from a file (there is a sketch of that further below) or use some other mechanism that presents you with the complete CLOB. If you have to treat it as a literal you'd have to manually split it up into chunks; at least 4 with this value if you make them large:
declare
str clob;
begin
dbms_lob.createtemporary(str, false);
dbms_lob.append(str, 'POLYGON ((-47.674240062208945 -2.8066454423517624, -47.674313162 -2.8066509996, <...snip...>');
dbms_lob.append(str, '47.6753521374 -2.8067875453, -47.6752566506 -2.8067787567, -47.6752552377 -2.8067785117, <...snip...>');
dbms_lob.append(str, '-47.658134547 -2.8044846153, -47.6581360233 -2.8044849964, -47.6581811229 -2.8044926289, <...snip...>');
dbms_lob.append(str, '-47.6717079633 -2.8057792966, -47.6717079859 -2.80577931, -47.6717083252 -2.8057795101, -47.6718125136 -2.8058408619, -47.6719547186 -2.8059291721, -47.6719573483 -2.8059307844, -47.6719575243 -2.8059308908, -47.6719601722 -2.8059324729, -47.6720975574 -2.8060134925, -47.6721015308 -2.8060157903, -47.6721017969 -2.8060159412, -47.6721058088 -2.806018171, -47.6721847946 -2.8060611947, -47.6721897923 -2.8060650985, -47.6722059263 -2.8060767675, -47.6722070291 -2.8060775047, -47.6722416572 -2.8060971165, -47.6722428566 -2.8060976832, -47.6722611616 -2.8061055189, -47.6722666301 -2.8061076243, -47.6722849174 -2.806116847, -47.6722862515 -2.8061174528, -47.6723066339 -2.8061257231, -47.6723316499 -2.8061347029, -47.6723426416 -2.8061383836, -47.6723433793 -2.8061386131, -47.672354519 -2.8061418177, -47.6723803034 -2.8061486384, -47.6725084039 -2.8061908942, -47.6725130545 -2.8061923817, -47.6725133654 -2.806192478, -47.6725180423 -2.806193881, -47.6728423039 -2.8062879629, -47.6728698649 -2.8062995965, -47.6728952856 -2.8063088527, -47.672897007 -2.8063093833, -47.672949984 -2.8063200428, -47.6729517767 -2.8063202193, -47.672966443 -2.8063209226, -47.6729679855 -2.8063213223, -47.6733393514 -2.8064196858, -47.6733738728 -2.8064264543, -47.6733761939 -2.8064267537, -47.6733804796 -2.8064270239, -47.6733890639 -2.806431641, -47.6734057692 -2.8064398944, -47.6734069006 -2.8064404055, -47.6734241363 -2.8064474849, -47.6736663052 -2.8065373005, -47.6736676833 -2.8065378073, -47.6736677752 -2.8065378409, -47.673669156 -2.8065383402, -47.6737754465 -2.8065764489, -47.673793217 -2.8065850801, -47.6737945765 -2.8065856723, -47.6738366895 -2.8066000123, -47.6738381279 -2.8066003728, -47.6738811819 -2.8066074503, -47.6739137725 -2.8066153813, -47.6739159177 -2.806615636, -47.6739318345 -2.8066165588, -47.673951326 -2.806622013, -47.6739530661 -2.8066223195, -47.6739793945 -2.8066256302, -47.6740948445 -2.8066344025, -47.674240062208945 -2.8066454423517624))');
INSERT INTO TEMP_TEST_GEOMETRY
VALUES (SDO_CS.TRANSFORM (SDO_GEOMETRY (str, 4674), 1000205));
dbms_lob.freetemporary(str);
end;
/
You can make the chunks smaller and have many more appends, which may be more manageable in some respects, but is more work to set up. It's still painful to do manually though, so this is probably a last resort if you can't get the value some other way.
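To load from a file instead, as suggested above, here is a minimal sketch using dbms_lob.loadclobfromfile; the directory object GEOM_DIR and the file name geometry.txt are hypothetical and would need to exist on the database server:

declare
  str      clob;
  src      bfile := bfilename('GEOM_DIR', 'geometry.txt');
  dest_off integer := 1;
  src_off  integer := 1;
  lang_ctx integer := dbms_lob.default_lang_ctx;
  warning  integer;
begin
  dbms_lob.createtemporary(str, false);
  dbms_lob.fileopen(src, dbms_lob.file_readonly);
  -- stream the whole file into the temporary CLOB
  dbms_lob.loadclobfromfile(str, src, dbms_lob.lobmaxsize,
                            dest_off, src_off,
                            dbms_lob.default_csid, lang_ctx, warning);
  dbms_lob.fileclose(src);
  INSERT INTO TEMP_TEST_GEOMETRY
  VALUES (SDO_CS.TRANSFORM (SDO_GEOMETRY (str, 4674), 1000205));
  dbms_lob.freetemporary(str);
end;
/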
If the string is coming from an HTTP call you can read the response in chunks and convert it into a CLOB as you read it, in much the same way, using dbms_lob.append. Exactly how depends on the mechanism you're currently using to get the response. Also worth noting that Oracle 12c has built-in JSON handling; in earlier versions you may be able to use the third-party PL/JSON module, which seems to handle CLOBs.

MATLAB: insert an image (.jpg) file into a Word file

I want to insert an image into a Word file. If I try the code below, the Word file shows some unknown symbols like
"ΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰΰ"
My code:
figure,imshow(img1);
fid = fopen('mainfile.doc', 'a', 'n', 'UTF-8');
fwrite(fid, img1, 'char');
fclose(fid);
open('mainfile.doc');
fwrite won't do this directly; a .doc file is not plain text, so writing raw image bytes into it just produces garbage characters. You could try the MATLAB Report Generator if you have access to it, or a File Exchange submission such as OfficeDoc.
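If you are on Windows with Word installed, another common route (a different technique from the suggestions above, offered as an untested sketch with placeholder file names) is COM automation via actxserver:

imwrite(img1, 'img1.jpg');                          % write the image to disk first
word = actxserver('Word.Application');              % requires Word on Windows
doc  = word.Documents.Add;                          % create a new empty document
doc.Content.InlineShapes.AddPicture(fullfile(pwd, 'img1.jpg'));
invoke(doc, 'SaveAs2', fullfile(pwd, 'mainfile.doc'));
doc.Close;
word.Quit;
delete(word);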
