Serialize/Deserialize Binary Data

Serialize/Deserialize Binary Data - lazarus

I am writing a socket-based application in which I need to serialize and deserialize data of various types, such as strings, integers, and objects (key-value pairs of a string and another data type), each with a predefined length. How would you go about it?
I haven't provided any code because I couldn't think of an appropriate way to do this.
Hopefully you can suggest something.
The data is formatted as follows:
$1 is a Boolean marker; the next byte tells whether the value is true or false.
$2 is a string marker, followed by a 16-bit integer that gives the length of the string.
$3 is an object marker. Data is stored as key-value pairs, where the key is always a string and the value can be a string, a Boolean, etc. An object ends with $0 $0 $9.

Create one or more classes that declare properties for all of your data, and implement a converter that reads from the socket and populates the objects. Depending on the complexity of your data, implementing an interpreter could also help.
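As a rough illustration of such a converter, here is a minimal Python sketch of the marker format described in the question. Several details are assumptions rather than anything stated above: the 16-bit length is taken to be big-endian, strings are taken to be UTF-8, and object keys are assumed to be written as a length plus bytes without a leading $2 marker. The names encode and decode are hypothetical.

```python
import struct

# Marker bytes from the question; OBJECT_END is the "$0 $0 $9" terminator.
BOOL, STRING, OBJECT = 0x01, 0x02, 0x03
OBJECT_END = b"\x00\x00\x09"


def _encode_str_body(text):
    # Assumption: 16-bit big-endian length followed by UTF-8 bytes.
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def _decode_str_body(buf, pos):
    (length,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8"), pos + length


def encode(value):
    """Serialize a bool, str, or dict of str keys into bytes."""
    if isinstance(value, bool):
        return bytes([BOOL, 1 if value else 0])
    if isinstance(value, str):
        return bytes([STRING]) + _encode_str_body(value)
    if isinstance(value, dict):
        out = bytearray([OBJECT])
        for key, item in value.items():
            # Assumption: keys carry no marker, only length + bytes.
            out += _encode_str_body(key) + encode(item)
        return bytes(out + OBJECT_END)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf, pos=0):
    """Return (value, next_position) for the value starting at pos."""
    marker = buf[pos]
    pos += 1
    if marker == BOOL:
        return buf[pos] != 0, pos + 1
    if marker == STRING:
        return _decode_str_body(buf, pos)
    if marker == OBJECT:
        obj = {}
        while buf[pos:pos + 3] != OBJECT_END:
            key, pos = _decode_str_body(buf, pos)
            obj[key], pos = decode(buf, pos)
        return obj, pos + 3
    raise ValueError(f"unknown marker: {marker:#04x}")
```

On a real socket you would first accumulate enough bytes for a complete value before calling decode; the sketch only covers the in-memory conversion.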

Related

h2o Steam Prediction Servlet not accepting character values from python script

I am using Steam to attempt to build a prediction service using a python preprocessing script. When python passes the cleaned data to the prediction service in the
variable:value var2:value2 var3:value3
format (as seen in the Spam Detection Example) I get a
ERROR PredictPythonServlet - Failed to parse
error from the service. When I look at the PredictPythonServlet.java file it seems to only use the strMapToRowData function which assumes every value in the input string is a number:
for (String p : pairs) {
String[] a = p.split(":");
String term = a[0];
double value = Float.parseFloat(a[1]);
row.put(term, value);
}
Are character values not allowed in this format? If so, is there a way to get the PredictPythonServlet file to use the csvToRowData function, which is defined but never used? I'd rather not use one-hot encoding for my models, so being able to pass the actual character string would be ideal.
Additionally, I passed the numeric representation found in the model pojo file for the categorical variables and received the error:
hex.genmodel.easy.exception.PredictUnknownTypeException: Unexpected object type java.lang.Double for categorical column home_team
So it looks like the service expects a character string but I can't figure out how to pass it along to the actual model. Any help would be greatly appreciated!

The prediction service uses EasyPredictModelWrapper, which can only accept what the underlying model accepts. It is not clear which model you use, but most take numerical float values. The for loop in the snippet shows that every value is parsed as a float.
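To make the parsing issue concrete, here is a small Python sketch of the same "variable:value" loop with a string fallback. It is an illustration of the logic only, not the servlet's code and not an H2O API; the function name str_map_to_row is hypothetical.

```python
def str_map_to_row(line):
    """Parse 'var:value var2:value2' into a dict.

    Values that parse as numbers become floats; anything else is kept
    as a string, which is what a categorical column would need.
    """
    row = {}
    for pair in line.split():
        term, _, raw = pair.partition(":")
        try:
            row[term] = float(raw)
        except ValueError:
            row[term] = raw  # non-numeric: keep the character value
    return row
```

A loop that always calls the float parser, as in the quoted Java snippet, fails at exactly the point where this sketch falls back to the string.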

I need to unwrap/parse snmp results

Let me preface this with I know next to nothing about SNMP but I am learning. I am trying to get the device name from a printer.
I get the '1.3.6.1.2.1.1.5.0' OID. But it has a lot of additional information in it and I think it's some type of wrapper, but I don't know how to unwrap it.
Here are the results of my get
varBinds=[ObjectType(ObjectIdentity(ObjectName('1.3.6.1.2.1.1.5.0')), DisplayString(b'OFHP1', subtypeSpec=ConstraintsIntersection(ConstraintsIntersection(ConstraintsIntersection(ConstraintsIntersection(), ValueSizeConstraint(0, 65535)), ValueSizeConstraint(0, 255)), ValueSizeConstraint(0, 255))))]
the printer name is OFHP1. That's all I need. Is there a command to unwrap this, or do I need to just parse it by brute force?

When it comes to SNMP, you typically deal with so-called variable bindings, or OID-value pairs. They are conceptually similar to the key-value pairs you may encounter in other applications.
So your varBinds is a list of objects, each of which represents a key-value pair. To get the value, traverse down to the component you need:
varBind = varBinds[0] # first var-bind
oid, value = varBind # unpack var-bind into OID and value
Now, values in SNMP are typed and constrained (they are actually ASN.1 types). That is why they are not just base Python types, but specialized objects. But you can strip extra information they carry and get a pure Python string (or int) from any SNMP scalar:
py_value = str(value) # turn SNMP value object into Python str
py_value = value.prettyPrint() # turn SNMP value object into a MIB-guided, human friendly representation

Is there an off the shelf binary format that allows string caching

I am investigating migrating a highly customized, efficient binary format to one of the available off-the-shelf binary formats. The data is stored on low-powered mobile devices, among other places, so performance is an important requirement.
An advantage of the current format is that all strings are stored in a pool. This means we don't repeat the same string hundreds of times in the file: we read it only once during deserialization, and all objects reference it by its index. It also means we keep only one copy in memory. So a lot of advantages :)
I was not able to find a way for capnproto or flatbuffers to support this. Or would I need to build a layer on top and use explicit integer indexes to strings in the generated objects?
Thank you!

FlatBuffers supports string pooling. Simply serialize a string once, then refer to that string multiple times in other objects. The string will only occur in memory once.
Simplest example, schema:
table MyObject { name: string; id: string; }
code (C++):
FlatBufferBuilder fbb;
auto s = fbb.CreateString("MyPooledString");
// Both string fields point to the same data:
auto o = CreateMyObject(fbb, s, s);
fbb.Finish(o);

You can always do this manually, like:
struct MyMessage {
stringTable @0 :List(Text);
# Now encode string fields as integer indexes into the string table.
someString @1 :UInt32;
otherString @2 :UInt32;
}
Cap'n Proto could in theory allow multiple pointers to point at the same object, but currently prohibits this for security reasons: it would be too easy to DoS servers that don't expect it by sending messages that are cyclic or contain lots of overlapping references. See the section on amplification attacks in the docs.
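The manual string-table approach can be sketched independently of any serialization library. The Python class below is a hypothetical illustration (the names StringPool, intern, and lookup are not part of Cap'n Proto or FlatBuffers): each distinct string is stored once, and objects keep only its integer index.

```python
class StringPool:
    """Store each distinct string once and hand out integer indexes."""

    def __init__(self):
        self.strings = []   # index -> string; this list is what gets serialized
        self._index = {}    # string -> index; used only while building

    def intern(self, text):
        """Return the index of text, adding it to the pool if it is new."""
        idx = self._index.get(text)
        if idx is None:
            idx = len(self.strings)
            self.strings.append(text)
            self._index[text] = idx
        return idx

    def lookup(self, idx):
        """Resolve an index back to its string after deserialization."""
        return self.strings[idx]


pool = StringPool()
# Two records sharing a string store the same index, not two copies.
record_a = {"name": pool.intern("MyPooledString")}
record_b = {"name": pool.intern("MyPooledString")}
```

Written out, pool.strings would fill the stringTable field and the record indexes would fill the UInt32 fields.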

I wanted to develop a map reduce logic to find sentence count from the input file

I am new to Hadoop and have a basic idea of MapReduce: the input to the map function is a key-value pair. How do I identify when a sentence is complete, and how can I count sentences? Can the default input format, TextInputFormat, be used, or is there another input format that makes this easier?

I suppose you'd just check each line for periods. Decide whether an ellipsis (...) should be ignored, etc. Then, as each line is passed to the map() method, you'd write a key/value pair counting those legitimate periods to the context. The definition of what it means to end a sentence is your call. The logic to do that should be straightforward.
You can make it so that entire sentences are passed, one at a time, to the map() method, but that's much harder to do. You basically take that same logic and put it in a new input format type and corresponding RecordReader. If you have a choice, go with the logic in the map() method rather than a custom input format type and record reader.
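The per-line counting logic might look like the following Python sketch. It uses plain functions rather than the Hadoop API, and it assumes one possible definition of a sentence end: a period that is not next to another period, so an ellipsis is ignored. Abbreviations such as "e.g." would still be counted, which is part of the definition you have to settle. The names map_line and reduce_counts are hypothetical.

```python
import re

# A period with no period directly before or after it, so "..." never matches.
SINGLE_PERIOD = re.compile(r"(?<!\.)\.(?!\.)")


def map_line(line):
    """Map step: emit one (key, count) pair per input line."""
    return ("sentences", len(SINGLE_PERIOD.findall(line)))


def reduce_counts(pairs):
    """Reduce step: sum the counts for each key."""
    totals = {}
    for key, count in pairs:
        totals[key] = totals.get(key, 0) + count
    return totals


lines = ["Hello. World... Yes.", "No end here", "One."]
totals = reduce_counts(map_line(line) for line in lines)
```

In a real job the same counting would sit inside map(), with the line arriving as the value from the default line-oriented input format.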

Is it possible to add an array to Microsoft.Office.Interop.Outlook.UserProperties

Is it possible to add an array/list of integers to Microsoft.Office.Interop.Outlook.UserProperties, and if so, how? Using the type OlUserPropertyType.olEnumeration leads to an exception when the property is added.

There is no array support in the MAPI-supported user properties. You would have to serialize the array to a string (OlUserPropertyType.olText, PT_STRING8) using some serialized array format (XML, CSV, JSON, etc.).
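The serialization step itself is small. The Python sketch below shows the round trip through a JSON string and does not touch the Outlook object model; the helper names are hypothetical, and the resulting string is what you would store in the text property.

```python
import json


def ints_to_text(values):
    """Serialize a list of integers to a string for a text property."""
    return json.dumps(list(values))


def text_to_ints(text):
    """Restore the list of integers from the stored string."""
    return [int(v) for v in json.loads(text)]


stored = ints_to_text([3, 1, 4])
restored = text_to_ints(stored)
```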
