I've run CoreNLP on a large corpus and stored the annotations in JSON format by calling jsonMinified() on an edu.stanford.nlp.simple.Document object.
Is there a method to create a simple.Document starting from a string containing the JSON representation of the document?
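For reference, here is the writing side being described, in a sketch that also shows the protobuf round trip the simple API supports. Whether a JSON-accepting constructor exists is exactly the open question, so serialize() plus the Document(CoreNLPProtos.Document) constructor is offered only as a hedge, in case re-serializing the corpus is an option:

import edu.stanford.nlp.pipeline.CoreNLPProtos;
import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;

public class RoundTripSketch {
    public static void main(String[] args) {
        // Writing side, as described above: annotate and dump minified JSON.
        Document doc = new Document("Stanford CoreNLP makes NLP easy.");
        String json = doc.jsonMinified(Sentence::posTags);

        // Hedged alternative: the simple API round-trips protobuf, which
        // avoids the JSON question entirely if re-serializing is acceptable.
        CoreNLPProtos.Document proto = doc.serialize();
        Document restored = new Document(proto);
        System.out.println(json.length() + " " + restored.text());
    }
}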
Related
I have a field in Elasticsearch that is sometimes a string and sometimes a string array. The field in my .NET model is a string array. During deserialization I would like to always convert this to a string array to match my model. Would I use IElasticsearchSerializer, or is that for handling the entire source rather than a single field? Does anyone have a simple example I could try?
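Not a NEST answer, but for illustration, here is the same single-value-or-array coercion in Java with Jackson, which is the serializer-level behavior you would be looking for in NEST's hooks as well; the Doc class and its tags field are invented for the sketch:

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SingleOrArraySketch {
    // Model with an array-typed field that the source sometimes sends as a scalar.
    public static class Doc {
        public String[] tags;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper()
                // Coerce "tags": "a" into "tags": ["a"] during deserialization.
                .enable(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY);
        Doc scalar = mapper.readValue("{\"tags\":\"a\"}", Doc.class);
        Doc array = mapper.readValue("{\"tags\":[\"a\",\"b\"]}", Doc.class);
        System.out.println(scalar.tags.length + " " + array.tags.length); // prints: 1 2
    }
}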
I have written an API to create an index with custom settings and mappings using the Java High Level REST Client. The Elasticsearch version is 7. The index is created successfully. The signature of the API method is given below.
public IndexResponse createIndexWithCustomSettingAndMappings(String indexName, String settings, String fieldsMapping)
After this I tested with GET localhost:9200/indexName and got:
{"testing":{"aliases":{},"mappings":{"properties":{"category_title_en":{"type":"text"}}},"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"1","provided_name":"testing"}}}}
After this I also wrote an API to add a document to an index, but I have not specified any mapping because I want to use dynamic mapping and have Elasticsearch figure out the types itself.
public CreateAndUpdateResponse createDocumentById(@RequestParam String indexName)
This is the controller method. The document is basically a Java class whose values I populate; in the service layer I pass it as a String argument by converting the Java POJO to a String. The service method signature is given below.
public CreateAndUpdateResponse createDocumentById(String indexName, String jsonData)
Here jsonData is the String prepared by converting the POJO, as mentioned above.
On running this API, the document is inserted into the given index, but I don't see any mappings for the fields that were inserted. Elasticsearch should have used dynamic mapping.
{"demo":{"aliases":{},"mappings":{"properties":{"{\"index_name\":\"demo\",\"status\":\"success\",\"updated_at\":\"Wed Sep 29 14:12:54 IST 2021\",\"created_at\":\"Wed Sep 29 14:12:54 IST 2021\",\"created_at_epoch\":0}":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"settings":{"index":{"routing":{"allocation":{"include":{"_tier_preference":"data_content"}}},"number_of_shards":"1","provided_name":"demo","creation_date":"1632904975248","number_of_replicas":"1","uuid":"1vdUxgafQDeub6GHp2C-IA","version":{"created":"7140099"}}}}}
In this response you can see the document was inserted into index demo, but I don't see the types of all the fields. There is also a field whose mapping, given below, looks generic. If Elasticsearch is using dynamic mapping, where can we see what datatype it has given to the fields of the document?
{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}
I have a process that imports data from external sources into Elasticsearch. I use C# and the NEST client.
Some of the classes have string properties that contain JSON. The same property may contain a different JSON schema depending on the source.
I want to index and analyze the JSON objects in these properties.
I tried object type mapping using [ElasticProperty(Type=FieldType.Object)] but it doesn't seem to help.
What is the right way to index and analyze these strings?
E.g. I import objects like the one below and then want to query all START events of customer 9876 that have status rejected, and then see how they are distributed over time (using Kibana).
var e = new Event { id = 123, source = "test-log", input = "{type:'START',params:[{name:'customerid',value:'9876'},{name:'region',value:'EU'}]}", result = "{status:'rejected'}" };
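Whatever the client, the usual first step is to deserialize the embedded JSON string into a real object or map before indexing, so the mapper sees structured fields (which Kibana can aggregate on) instead of one opaque string. A minimal sketch of that step in Java with Jackson; the question's stack is C#/NEST, where Json.NET plays the same role:

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.HashMap;
import java.util.Map;

public class ParseEmbeddedJsonSketch {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String input = "{\"type\":\"START\",\"params\":[{\"name\":\"customerid\",\"value\":\"9876\"}]}";

        // Parse the string payload into a generic map...
        Map<String, Object> parsed = mapper.readValue(input, Map.class);

        // ...and index the map as a sub-object instead of the raw string,
        // so "input.type" and "input.params.value" become queryable fields.
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", 123);
        doc.put("source", "test-log");
        doc.put("input", parsed);
        System.out.println(doc);
    }
}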
I am using the Java API for MongoDB to parse a JSON string into a Java object and store it in MongoDB.
My JSON string will have fields that are dates, like the one given below.
"created":"2012-10-17 21:39:06.385987 +0000"
When I try to save the parsed Java object into MongoDB, it stores the value as a String. I would like to store it as a datetime field. Can someone shed some light on this?
The Java driver, like other default drivers, maps basic language types to MongoDB types, so to save a date to Mongo you just need to save a Date object, not a date string.
The MongoDB documentation has a simple sample of this.
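A minimal sketch of that idea, assuming the current Java driver and a MongoCollection<Document> named collection; note that SimpleDateFormat only parses millisecond precision, so the question's ".385987" fraction is truncated to ".385" here:

import com.mongodb.client.MongoCollection;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.bson.Document;

public void saveCreated(MongoCollection<Document> collection) throws Exception {
    // Parse the string into a java.util.Date (millisecond precision)...
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS Z");
    Date created = fmt.parse("2012-10-17 21:39:06.385 +0000");

    // ...and store the Date object itself; the driver writes a BSON date, not a string.
    collection.insertOne(new Document("created", created));
}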
What's the difference between these two packages:
org.apache.hadoop.hive.serde2.objectinspector
org.apache.hadoop.hive.serde2.typeinfo
Is one a newer API? Are they both current, but somehow different? They seem pretty similar to me :/
Since the two packages are both under serde2, I think both of them are currently in use.
TypeInfo stores information about a type, and each type is represented by exactly one TypeInfo object. So TypeInfo is just read-only information about an object's type (category, type name, etc.).
Hive has multiple in-memory data formats for a given type (e.g. for integers: Integer, IntWritable, and LazyInteger). The data is stored in an object and the format/operations are stored in the ObjectInspector, so a data object plus its ObjectInspector together represent a data unit; you can deserialize the object using the information provided by the ObjectInspector.
ObjectInspectors are used to serialize an object. Suppose you are creating a JSON SerDe and using a JSON library to convert Java objects into JSON and vice versa: the Hive object you receive is an internal representation of a row, which needs to be converted to a Java object, which is then converted to JSON. For the Hive-to-Java conversion we need ObjectInspectors, e.g. ListObjectInspector.
Similarly, when you deserialize, you convert JSON into a Hive row object; for that we use the TypeInfo classes, e.g. ListTypeInfo.
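A minimal sketch of the two sides, assuming you already have a row object and its inspector in hand (named rowObject and inspector here):

import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public Object firstElement(Object rowObject, ObjectInspector inspector) {
    // TypeInfo side: a read-only description of a type, built from its name.
    TypeInfo listType = TypeInfoUtils.getTypeInfoFromTypeString("array<int>");
    System.out.println(listType.getTypeName()); // prints: array<int>

    // ObjectInspector side: pulls real values out of whatever in-memory
    // format Hive chose for this data (e.g. LazyInteger vs IntWritable).
    ListObjectInspector loi = (ListObjectInspector) inspector;
    return loi.getListLength(rowObject) > 0 ? loi.getListElement(rowObject, 0) : null;
}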