Could protobuf read text file which has no schema but just data? - protocol-buffers

For example, the proto file is like this:
// The message name "Entry" is assumed; the original snippet omitted it.
message Entry {
  required int32 key = 1;
  repeated int32 value = 2;
}
The text file looks like this, where the first column is the key and the remaining columns are the repeated values.
3391 [ 4847 3948 4849 ]
9483 [ 4938 48497 71 ]
...
Could protobuf read and parse this text file?

No, protobuf has no support for custom text formats.
You'll have to write custom parser code for it, which can then convert to protobuf or whatever other representation you might want.
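A custom parser for the sample format can be quite small. The following is only a sketch in Python, assuming every line has the shape "<key> [ <value> <value> ... ]" as in the question; the name parse_lines is hypothetical and not part of any protobuf API.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed line shape, taken from the sample in the question:
#   <key> [ <value> <value> ... ]
LINE_RE = re.compile(r"^\s*(-?\d+)\s*\[([^\]]*)\]\s*$")


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[int, List[int]]]:
    """Yield (key, values) pairs; blank lines are skipped, malformed lines raise ValueError."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        key = int(match.group(1))
        # int() raises ValueError for any token that is not an integer.
        values = [int(token) for token in match.group(2).split()]
        yield key, values


if __name__ == "__main__":
    sample = ["3391 [ 4847 3948 4849 ]", "9483 [ 4938 48497 71 ]"]
    for key, values in parse_lines(sample):
        print(key, values)
```

Each (key, values) pair can then be copied into whatever representation you want, for example a message object generated from the .proto file.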

Related

regex as protobuf message field name?

Can we define a regular expression as a protobuf field name? I send the request as a list of dictionaries in the client.py file:
"cur_cur_bin" : [{"cur_cur_bin1_bin3_bin1" : 4,"cur_cur_bin3_bin5_bin8" : 6} ]
I defined the .proto file like this:
message cur_cur_BIN {
  int32 cur_cur_bin1_bin3_bin1 = 1;
}
message Message {
  repeated cur_cur_BIN cur_cur_bin = 1;
}
Can anyone explain how to define this type of field in a .proto file dynamically? I ask because (bin1) has a range like (1 - [1-8]), and likewise (bin3) has one like (3 - [8-11]).
No, as far as I know there is no mechanism to generate field names automatically or dynamically in protoc.
You can create messages dynamically through the python-protobuf reflection API, but that probably doesn't make sense for your use case.
Instead, define a reasonable .proto format, and then write some custom Python code to convert your JSON data to that.
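The conversion step could look roughly like the sketch below. It assumes the keys always follow the pattern "cur_cur_bin<A>_bin<B>_bin<C>" seen in the sample request, and it targets a hypothetical fixed-shape record (a list of bin indices plus a count) that a message with a repeated int32 field and an int32 field could hold. The names to_entries, bins and count are illustrative, not taken from the question.

```python
import re
from typing import Dict, List

# Assumed key shape, based on the sample request: cur_cur_bin<A>_bin<B>_bin<C>
KEY_RE = re.compile(r"^cur_cur_bin(\d+)_bin(\d+)_bin(\d+)$")


def to_entries(raw: Dict[str, int]) -> List[dict]:
    """Turn {"cur_cur_bin1_bin3_bin1": 4, ...} into a list of fixed-shape records."""
    entries = []
    for name, count in raw.items():
        match = KEY_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected key: {name!r}")
        # The variable part of the key becomes data instead of a field name.
        bins = [int(group) for group in match.groups()]
        entries.append({"bins": bins, "count": count})
    return entries


if __name__ == "__main__":
    request = {"cur_cur_bin1_bin3_bin1": 4, "cur_cur_bin3_bin5_bin8": 6}
    for entry in to_entries(request):
        print(entry)
```

The point of the design is that the .proto schema stays fixed while the changing bin numbers travel as values.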

How can we parse a proto file if it has "." and "-" signs in field names?

I created one proto file using Java and another using Python. When I put a "." or "-" sign in a message field name, I get the error "missing field number", and I am not able to parse that file. If proto files do not accept these characters, could you please share the list of characters that are not accepted?
Please consult the protobuf documentation for the proper syntax for defining messages in protos. Field names may consist only of letters, digits and underscores, and may not start with a digit.
. is used for selecting an attribute of an object in Python, so one shouldn't use it in message field names of a proto. Instead, you can use _ as the character for separating words in a single field name.
A small sample for you:
syntax = "proto3";
message SearchRequest {
string query = 1;
int32 page_number = 2;
int32 result_per_page = 3;
}
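If the field names come from somewhere else (for example, column names in a data file), they can be checked before the .proto file is written. This is a minimal sketch, assuming the identifier shape given in the protobuf language specification (a letter followed by letters, digits or underscores); both helper names are hypothetical, and reserved words are not checked.

```python
import re

# Identifier shape as given in the protobuf language specification:
# a letter followed by letters, digits or underscores.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_field_name(name: str) -> bool:
    """Return True if the whole name matches the identifier shape above."""
    return IDENT_RE.fullmatch(name) is not None


def sanitize_field_name(name: str) -> str:
    """Hypothetical helper: replace every disallowed character with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Prefix names that are empty or do not start with a letter.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "f_" + cleaned
    return cleaned


if __name__ == "__main__":
    for candidate in ["page_number", "page.number", "page-number", "1st-item"]:
        print(candidate, is_valid_field_name(candidate), sanitize_field_name(candidate))
```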

Get TIFF tag value (including non-ASCII characters) from TIFF images in Java 11

I am trying to read different tag values (like tags 259 (Compression), 33432 (Copyright), 306 (DateTime), 315 (Artist) etc.) from a TIFF image in Java. Can anyone suggest the best way to get those values in Java 11?
I tried to get those values using tiffinfo commands (like "tiffinfo -c myfile.tif"), but I did not find any specific command in tiffinfo (libtiff) or any Java library that gives me specific tag values (e.g. DateTime) of a TIFF image.
Update:
As haraldK suggested, I tried with ImageIO like the following:
try (ImageInputStream input = ImageIO.createImageInputStream(tiffFile)) {
ImageReader reader = ImageIO.getImageReaders(input).next(); // TODO: Handle reader not found
reader.setInput(input);
IIOMetadata metadata = reader.getImageMetadata(0);
TIFFDirectory ifd = TIFFDirectory.createFromMetadata(metadata);
TIFFField dateTime = ifd.getTIFFField(306);
String dateString = dateTime.getAsString(0);
}
But it does not give the exact value of the tag. In the case of non-ASCII values (ö, ü, ä etc.), question marks replace the real characters.
Can anyone tell me how to get the exact value (including non-ASCII characters) of the tag from TIFFField?
You can use standard ImageIO to read the TIFF image metadata and get the requested values from it. Starting from Java 9, the JDK includes extra support classes (in the javax.imageio.plugins.tiff package) that make this straightforward:
try (ImageInputStream input = ImageIO.createImageInputStream(tiffFile)) {
ImageReader reader = ImageIO.getImageReaders(input).next(); // TODO: Handle reader not found
reader.setInput(input);
IIOMetadata metadata = reader.getImageMetadata(0); // 0 is the index of first image
TIFFDirectory ifd = TIFFDirectory.createFromMetadata(metadata);
TIFFField dateTime = ifd.getTIFFField(306); // Yes, that's 3 F's...
String dateString = dateTime.getAsString(0); // TIFF dates are strings...
}
tiffFile must be a valid (existing, readable) java.io.File, java.io.RandomAccessFile or java.io.InputStream (or other supported input, this is plugin-based, really). If not, input will be null, and the code will fail.
You can use a similar, but a lot more verbose, version that will work in older versions of Java, as long as you have a TIFF plugin:
try (ImageInputStream input = ImageIO.createImageInputStream(tiffFile)) {
ImageReader reader = ImageIO.getImageReaders(input).next(); // TODO: Handle reader not found
reader.setInput(input);
IIOMetadata metadata = reader.getImageMetadata(0); // 0 is the index of first image
// Get "native" TIFF metadata for first IFD
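IIOMetadataNode root = (IIOMetadataNode) metadata.getAsTree("com_sun_media_imageio_plugins_tiff_image_1.0"); // getAsTree returns Node, so cast
Element ifd = (Element) root.getFirstChild(); // Cast to Element: Node has no getElementsByTagName
NodeList fields = ifd.getElementsByTagName("TIFFField"); // Yes, that's 3 F's...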
for (int i = 0; i < fields.getLength(); i++) {
Element field = (Element) fields.item(i);
if ("306".equals(field.getAttribute("number"))) {
// This is your DateTime (306) tag,
// now do something with it 😀
// ...
}
}
}
Hardly elegant code, though... The Java 9+ approach is much cleaner and easier to reason about.

What does the serialized protobuf text format look like?

Given the following proto file
syntax = "proto3";
package tutorial;
message MyMessage {
string my_value = 1;
}
What should the corresponding serialized text file look like?
my_value : "abc"
or
MyMessage {
my_value : "abc"
}
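The first form is closest. Protobuf's most common data format is the binary protobuf format; there is also an opinionated JSON variant. In addition, the official libraries (C++, Java and Python, among others) provide a human-readable text format, mostly used for debugging and configuration files. In that text format the fields of the top-level message are written directly, without an enclosing MyMessage { ... } wrapper, so the message would look like my_value: "abc". If you are instead talking about the JSON version, we would expect valid JSON (note that I'm not accounting for whitespace here) similar to: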
{
"my_value" : "abc"
}

Read image IPTC data

I'm having some trouble reading out the IPTC data of some images. The reason I want to do this is that my client already has all the keywords in the IPTC data and doesn't want to re-enter them on the site.
So I created this simple script to read them out:
$size = getimagesize($image, $info);
if(isset($info['APP13'])) {
$iptc = iptcparse($info['APP13']);
print '<pre>';
var_dump($iptc['2#025']);
print '</pre>';
}
This works perfectly in most cases, but it's having trouble with some images.
Notice: Undefined index: 2#025
Yet I can clearly see the keywords in Photoshop.
Are there any decent small libraries that could read the keywords in every image? Or am I doing something wrong here?
I've seen a lot of weird IPTC problems. It could be that you have two APP13 segments. I've noticed that, for some reason, some JPEGs have multiple IPTC blocks. This is possibly caused by using several photo-editing programs or by some manual file manipulation.
It could be that PHP is trying to read an empty APP13 segment or even embedded "thumbnail metadata".
It could also be a problem with segment lengths: APP13 and 8BIM have length marker bytes that might hold wrong values.
Try a hex editor and check the file "manually".
I have found that IPTC is almost always embedded as XML using the XMP format, and is often not in the APP13 slot. You can sometimes get the IPTC info by using iptcparse($info['APP1']), but the most reliable way to get it without a third-party library is to simply search through the image file for the relevant XML string (I got this from another answer, but I haven't been able to find it, otherwise I would link!):
The XML for the keywords always has the form "<dc:subject>...<rdf:Seq><rdf:li>Keyword 1</rdf:li><rdf:li>Keyword 2</rdf:li>...<rdf:li>Keyword N</rdf:li></rdf:Seq>...</dc:subject>"
So you can just get the file as a string using file_get_contents(get_attached_file($attachment_id)), use strpos() to find each opening (<rdf:li>) and closing (</rdf:li>) XML tag, and grab the keyword between them using substr().
The following snippet works for all JPEGs I have tested it on. It will fill the array $keys with IPTC tags taken from an image on WordPress with id $attachment_id:
$content = file_get_contents(get_attached_file($attachment_id));
// Initialize empty array to store keywords
$keys = array();
// Look for XMP data: the XML tag "dc:subject" is where keywords are stored.
// strpos() returns false when nothing is found, so test that before adding the tag length.
$subject_pos = strpos($content, '<dc:subject>');
$xmp_data_end = ($subject_pos === false) ? false : strpos($content, '</dc:subject>', $subject_pos);
// Only proceed if able to find both dc:subject tags
if ($xmp_data_end !== false) {
    $xmp_data_start = $subject_pos + 12;
    $xmp_data = substr($content, $xmp_data_start, $xmp_data_end - $xmp_data_start);
    // Look for tag "rdf:Seq" where individual keywords are listed
    $seq_pos = strpos($xmp_data, '<rdf:Seq>');
    $key_data_end = ($seq_pos === false) ? false : strpos($xmp_data, '</rdf:Seq>', $seq_pos);
    // Only proceed if able to find both rdf:Seq tags
    if ($key_data_end !== false) {
        $key_data_start = $seq_pos + 9;
        $key_data = substr($xmp_data, $key_data_start, $key_data_end - $key_data_start);
        // $ctr will track position of each <rdf:li> tag, starting with first
        $ctr = strpos($key_data, '<rdf:li>');
        // While loop stores each keyword and searches for next XML keyword tag.
        // Compare with !== false, because a tag at position 0 is a valid match.
        while ($ctr !== false) {
            // Skip past the tag to get the keyword itself
            $key_begin = $ctr + 8;
            // Keyword ends where closing tag begins
            $key_end = strpos($key_data, '</rdf:li>', $key_begin);
            // Make sure keyword has a closing tag
            if ($key_end === false) break;
            // Make sure keyword is not too long (not sure what WP can handle)
            $key_length = min(100, $key_end - $key_begin);
            // Add keyword to keyword array
            $keys[] = substr($key_data, $key_begin, $key_length);
            // Find next keyword open tag
            $ctr = strpos($key_data, '<rdf:li>', $key_end);
        }
    }
}
I have this implemented in a plugin to put IPTC keywords into WP's "Description" field, which you can find here.
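The same string-search idea can be sketched outside PHP as well. The Python version below is only an illustration under a few assumptions: the keywords sit in <rdf:li> elements inside the first <dc:subject> block, the XMP packet is UTF-8 encoded, and XML entities such as &amp; are left undecoded. The function name extract_xmp_keywords is hypothetical.

```python
import re
from typing import List

# Non-greedy patterns over raw bytes, since the image file is binary.
SUBJECT_RE = re.compile(rb"<dc:subject>(.*?)</dc:subject>", re.DOTALL)
ITEM_RE = re.compile(rb"<rdf:li[^>]*>(.*?)</rdf:li>", re.DOTALL)


def extract_xmp_keywords(data: bytes) -> List[str]:
    """Return the keywords found in the first dc:subject block, or [] if there is none."""
    subject = SUBJECT_RE.search(data)
    if subject is None:
        return []
    keywords = []
    for item in ITEM_RE.finditer(subject.group(1)):
        # Assumption: XMP text is UTF-8; undecodable bytes are replaced.
        keywords.append(item.group(1).decode("utf-8", errors="replace").strip())
    return keywords


if __name__ == "__main__":
    sample = (b"\xff\xd8...<dc:subject><rdf:Seq><rdf:li>Keyword 1</rdf:li>"
              b"<rdf:li>Keyword 2</rdf:li></rdf:Seq></dc:subject>...")
    print(extract_xmp_keywords(sample))
```

For a real file, the bytes would come from reading the image in binary mode, for example open(path, "rb").read().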
ExifTool is very robust if you can shell out to that (from PHP it looks like?)
