Index object in ElasticSearch?

I have a dictionary containing a lot of fields, and if possible I would like to index it as-is in ElasticSearch. I am using Python for this. Can this work, or is it a limitation of ElasticSearch that it only supports simple data types? Thanks!
Later edit: My dictionary looks like this:
{
'lingpipe': Tagging(tagger_id = 'lingpipe', raw_tagging = '', tagger_config = None, tagger_version = None, generation_time = StreamTime(epoch_ticks = 1364699587.0, zulu_timestamp = '2013-03-31T04:13:07.663542Z')),
'serif': Tagging(tagger_id = 'serif', raw_tagging = '', tagger_config = 'English', tagger_version = '6.1', generation_time = None),
'stanford': Tagging(tagger_id = 'stanford', raw_tagging = '1\tFree\tFree\tNNP\tO\t23\t27\n2\tBiscayne\tBiscayne\tNNP\tO\t28\t36\n3\tNational\tNational\tNNP\tO\t37\t45\n4\tPark\tPark\tNNP\tO\t46\t50\n5\tGuide\tGuide\tNNP\tO\t51\t56\n6\tGet\tget\tVB\tO\t57\t60\n7\tyour\tyou\tPRP$\tO\t61\t65\n8\tquick\tquick\tJJ\tO\t66\t71\n9\tguide\tguide\tNN\tO\t72\t77\n10\tto\tto\tTO\tO\t78\t80\n11\tthe\tthe\tDT\tO\t81\t84\n12\ttop\ttop\tJJ\tO\t85\t88\n13\thotels\thotel\tNNS\tO\t89\t95\n14\t,\t,\t,\tO\t95\t96\n15\trestaurants\trestaurant\tNNS\tO\t97\t108\n16\tand\tand\tCC\tO\t109\t112\n17\tthings\tthing\tNNS\tO\t113\t119\n18\tto\tto\tTO\tO\t120\t122\n19\tdo\tdo\tVB\tO\t123\t125\n20\t.\t.\t.\tO\t125\t126\n\n1\tGrab\tgrab\tVB\tO\t127\t131\n2\tit\tit\tPRP\tO\t132\t134\n3\tand\tand\tCC\tO\t135\t138\n4\tGo\tgo\tVB\tO\t139\t141\n5\t!\t!\t.\tO\t141\t142', tagger_config = 'annotators: {tokenize, cleanxml, ssplit, pos, lemma, ner}, properties: pos.maxlen=100', tagger_version = 'Stanford CoreNLP ver 1.2.0', generation_time = StreamTime(epoch_ticks = 1338505200.0, zulu_timestamp = '2012-06-01T00:00:00.0Z'))
}
So it's basically a mix of data types. I am not yet sure how I can serialize it; ES is throwing a serialization error. How can I convert this object to JSON?
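The Tagging and StreamTime values are custom objects, so the standard JSON encoder (and therefore the Elasticsearch client) cannot serialize them directly; the document has to be flattened into plain dicts, strings and numbers first. A minimal sketch of one way to do that, assuming these objects expose their fields as ordinary instance attributes (the to_plain helper and the taggings_dict name are just illustrative):

import json

def to_plain(obj):
    """Recursively convert Tagging / StreamTime style objects into JSON-friendly values."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    if hasattr(obj, "__dict__"):
        # Tagging, StreamTime, ... : turn their attributes into a nested dict
        return {key: to_plain(value) for key, value in vars(obj).items()}
    return obj  # str, int, float, None pass through unchanged

doc = to_plain(taggings_dict)  # taggings_dict is the dictionary shown above
print(json.dumps(doc))         # now serializes cleanly; pass doc as the document body when indexing

Once everything is plain dicts and scalars, the regular index call of the Python client can accept it as the document body.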

Related

Not able to read data from elasticsearch using alpakka

I was trying to read documents (JSON data stored in ES) from Elasticsearch using Alpakka.
I found alpakka-elasticsearch.
It says that you can stream messages from or to Elasticsearch using
ElasticsearchSource, ElasticsearchFlow or ElasticsearchSink.
I tried to implement the ElasticsearchSource approach, so my code looks like this:
val url = "http://localhost:9200"
val connectionSettings = ElasticsearchConnectionSettings(url)
val sourceSettings = ElasticsearchSourceSettings(connectionSettings)
val elasticsearchParamsV7 = ElasticsearchParams.V7("category_index")

val copy = ElasticsearchSource
  .typed[CategoryData](
    elasticsearchParamsV7,
    query = query,
    sourceSettings
  ).map { message: ReadResult[CategoryData] =>
    println("Inside message==================> " + message)
    WriteMessage.createIndexMessage(message.id, message.source)
  }.runWith(
    ElasticsearchSink.create[CategoryData](
      elasticsearchParamsV7, ElasticsearchWriteSettings(connectionSettings)
    )
  )

println("Final data==============>. " + copy)
At the end, the copy value returns Future[Done], but I was not able to read data from ES.
Is there something I am missing?
Also, is there any other way to do the same using the Akka HTTP client API?
What is the preferred way to use ES with Akka?
To read data from Elasticsearch, something like this should be enough:
val matchAllQuery = """{"match_all": {}}"""
val result = ElasticsearchSource
  .typed[CategoryData](
    elasticsearchParamsV7,
    query = matchAllQuery,
    sourceSettings
  ).map { message: ReadResult[CategoryData] =>
    println("Read message==================> " + message)
  }.runWith(Sink.seq)

result.onComplete(res => res.foreach(col => println(s"Read: ${col.size} records")))
If the type CategoryData does not match what is stored in the index, the query may not return results.
If in doubt, it's possible to read raw JSON:
val elasticsearchSourceRaw = ElasticsearchSource
  .create(
    elasticsearchParamsV7,
    query = matchAllQuery,
    settings = sourceSettings
  )
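For the typed variant, the source also needs to know how to deserialize each hit into CategoryData; with the spray-json integration that usually means having an implicit format in scope. A minimal sketch, assuming a hypothetical CategoryData with just an id and a name (adjust the fields to whatever is actually stored in category_index):

import spray.json._

// Hypothetical document shape; replace the fields with what "category_index" really holds.
final case class CategoryData(id: String, name: String)

object CategoryDataJsonProtocol extends DefaultJsonProtocol {
  // Picked up implicitly by ElasticsearchSource.typed[CategoryData]
  implicit val categoryDataFormat: RootJsonFormat[CategoryData] = jsonFormat2(CategoryData)
}

Import CategoryDataJsonProtocol._ (or otherwise bring the format into scope) before calling ElasticsearchSource.typed.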

Spring Boot + Mongo - com.mongodb.BasicDocument, you can't add a second 'id' criteria

Any ideas why I get this error when making a query:
org.springframework.data.mongodb.InvalidMongoDbApiUsageException: Due to limitations of the com.mongodb.BasicDocument, you can't add a second 'id' criteria. Query already contains '{ "id" : "123"}'
I'm using Spring Boot and Mongo:
fun subGenreNames(subGenreIds: List<String>?): List<String> {
    val results = mutableListOf<String>()
    var query = Query()
    subGenreIds!!.forEach {
        query.addCriteria(Criteria.where("id").`is`(it))
        var subGenreName = mongoTemplate.findById(it, SubGenre::class.java)
        results.add(subGenreName!!.name)
    }
    return results
}
I have the class SubGenre set with:
@Document(collection = "subgenres")
data class SubGenre(
    @Field("id")
    val id: String,
    val name: String
)
Thanks
The exception is thrown because the same Query object gets another 'id' criteria added on every iteration of the loop. Based on your code, you need to use either
query.addCriteria(Criteria.where("id").`is`(it))
var subGenreName = mongoTemplate.find(query, SubGenre::class.java)
or
var subGenreName = mongoTemplate.findById(it, SubGenre::class.java)
but not both.
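For instance, here is a minimal sketch of the findById route, using the mongoTemplate and SubGenre from the question (the exact structure is just illustrative):

fun subGenreNames(subGenreIds: List<String>?): List<String> {
    // Look each id up directly; no shared Query object, so no second 'id' criteria is ever added
    return subGenreIds.orEmpty().mapNotNull { id ->
        mongoTemplate.findById(id, SubGenre::class.java)?.name
    }
}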

Producer.send doesn't work inside of .map

I'm making an application that fetches data from Elasticsearch and sends it to Kafka. The producer.send() function does not work inside of map; outside of it, everything works perfectly.
val f1 = ElasticsearchSource
  .create(
    indexName = "products",
    typeName = "product",
    query = """{"match_all": {}}"""
  )
  .map { message: OutgoingMessage[spray.json.JsObject] =>
    val product = message.source
    producer.send(new ProducerRecord("test", product))
    println("publishing message ")
    IncomingMessage(Some(message.id), message.source)
  }
  .runWith(Sink.seq)
What might be the cause of it?

Elasticsearch (NEST client) - How to search across multiple indices using OIS

I need to search across multiple indices using OIS (Object Initializer Syntax).
I have seen examples of executing a search across multiple indices with the Fluent DSL, but I still do not know how to execute an equivalent search with OIS.
Here is my OIS search (only searching against one index):
var searchResult =
    await _client.LowLevel.SearchAsync<string>(ApplicationsIndexName, "application", new SearchRequest()
    {
        From = (query.PageSize * query.PageNumber) - query.PageSize,
        Size = query.PageSize,
        Query = GetQuery(query),
        Aggregations = GetAggregations()
    });
What modifications need to be made so I can search across multiple indices?
After some research, I found out how to search across multiple indices:
var searchResult =
    await _client.LowLevel.SearchAsync<string>(new SearchRequest()
    {
        IndicesBoost = new Dictionary<IndexName, double>
        {
            { "applications", 1.4 },
            { "attachments", 1.4 }
        },
        From = (query.PageSize * query.PageNumber) - query.PageSize,
        Size = query.PageSize,
        Query = GetQuery(query),
        Aggregations = GetAggregations()
    });

How do I do a raw query using Nest with object initializer syntax?

I am trying to do a search in ElasticSearch using NEST. I want to use the object initializer syntax because I need to build the parts of the search dynamically. I have figured out how to build much of the request, but I am not clear on how to initialize a raw query. The OIS doesn't seem to have QueryRaw as a parameter on the request.
Code that I have now:
var searchResults = client.Search<dynamic>(s => s
    .Index("myIndex")
    .Type("myType")
    .Aggregations(a => a
        .DateHistogram("my_date_histogram", h => h
            .Field("DateField")
            .Interval("day")
        )
    )
    .QueryRaw(queryText)
);
Code that I am trying to create:
var request = new SearchRequest<dynamic>
{
    Index = "MyIndex",
    Type = "MyType",
    QueryRaw = <doesn't exist>
};
You can do this by
var searchResponse = client.Search<dynamic>(new SearchRequest
{
    Query = new RawQuery(yourquery)
});
Tested with NEST 2.0.0.alpha2 and ES 2.1.0
Here's how to do raw queries using the new object structure:
var response = client.Search<dynamic>(s => s
    .Query(qry => qry
        .Raw(yourRawQueryStringHere)
    )
);
