Suggestions on full-text search or existing search algorithms

Can someone suggest an easy way to solve the search problem below? Is there an existing algorithm for it, or will full-text search suffice?
The item data is classified as follows:
| ItemCategory | ItemCluster | ItemSubCluster | SubCluster | Items |
|---|---|---|---|---|
| Vegetable | Root vegetables | Root | WithOutSkin | potato, sweet potato, yam |
| Vegetable | Root vegetables | Root | WithSkin | onion, garlic, shallot |
| Vegetable | Greens | Leafy green | Leaf | lettuce, spinach, silverbeet |
| Vegetable | Greens | Cruciferous | Flower | cabbage, cauliflower, Brussels sprouts, broccoli |
| Vegetable | Greens | Edible plant stem | Stem | celery, asparagus |
The inputs will be something like:
sweet potato, yam
Yam, Potato
garlik, onion
lettuce, spinach, silverbeet
lettuce, silverbeet
lettuce, silverbeet, spinach
For each input, I want to find which ItemCategory, ItemCluster, ItemSubCluster, and SubCluster its items belong to.
Any help will be much appreciated.

You are nearly following the right approach.
You don't need full text searching here.
What you can create here is a kind of inverted index as follows:
Taking potato as an example, create a map entry for potato that stores its ItemCategory, ItemCluster, ItemSubCluster, and SubCluster.
For example:
"potato": {
"ItemCategory": "Vegetable",
"ItemCluster": "Root vegetables",
"ItemSubcluster": "Root",
"Subcluster": "Without Skin"
}
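A minimal Ruby sketch of such an inverted index, assuming the classification is available as in-memory rows. The names ROWS, FIELDS, INDEX, normalize, and lookup are made up for illustration. Lookups are case-insensitive so that an input such as "Yam, Potato" matches, but a misspelling such as "garlik" will not match without an extra fuzzy-matching step.

```ruby
# Each row: ItemCategory, ItemCluster, ItemSubCluster, SubCluster, Items
ROWS = [
  ["Vegetable", "Root vegetables", "Root", "WithOutSkin", ["potato", "sweet potato", "yam"]],
  ["Vegetable", "Root vegetables", "Root", "WithSkin", ["onion", "garlic", "shallot"]],
  ["Vegetable", "Greens", "Leafy green", "Leaf", ["lettuce", "spinach", "silverbeet"]],
  ["Vegetable", "Greens", "Cruciferous", "Flower", ["cabbage", "cauliflower", "Brussels sprouts", "broccoli"]],
  ["Vegetable", "Greens", "Edible plant stem", "Stem", ["celery", "asparagus"]]
]

FIELDS = %i[item_category item_cluster item_sub_cluster sub_cluster]

# Normalise item names so that "Yam" and " yam " hit the same key.
def normalize(name)
  name.strip.downcase
end

# Inverted index: normalised item name => its classification.
INDEX = ROWS.each_with_object({}) do |row, index|
  *path, items = row
  items.each { |item| index[normalize(item)] = FIELDS.zip(path).to_h }
end

# Maps every item of a comma-separated input to its classification
# (nil when the item is unknown).
def lookup(input)
  input.split(",").map { |raw| [raw.strip, INDEX[normalize(raw)]] }.to_h
end

p lookup("Yam, Potato")
p lookup("garlik, onion")
```

In this sketch the second call returns nil for "garlik" and the WithSkin classification for "onion".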
Storing this kind of data for each vegetable would be expensive, because the same strings are repeated in every entry.
You can optimise the storage by using an encoding scheme.
For example:
let ItemCategory be denoted by 0,
let ItemCluster be denoted by 1,
let ItemSubcluster be denoted by 2,
let Subcluster be denoted by 3
and the values be denoted by a similar encoding scheme:
let Vegetable be denoted by 0,
let Root vegetables be denoted by 1,
let Root be denoted by 2,
let Without Skin be denoted by 3
Now, your mapping becomes:
"potato": {
"0": "0",
"1": "1",
"2": "2",
"3": "3"
}
To optimise this further, you can also maintain an index of vegetables. For example, potato can be denoted by 0.
So your final index becomes:
"0": {
"0": "0",
"1": "1",
"2": "2",
"3": "3"
}
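The encoding idea can be sketched in Ruby as follows. This is only an illustration: the Interner class and the constant names are hypothetical, and the position inside each code array plays the role of the field number (0 for ItemCategory up to 3 for Subcluster).

```ruby
# Assigns each distinct string the next free integer ID.
class Interner
  def initialize
    @ids = {}    # name => ID
    @names = []  # ID => name
  end

  # Returns the ID for name, allocating a new one if it is unseen.
  def intern(name)
    @ids[name] ||= begin
      @names << name
      @names.size - 1
    end
  end

  # Returns the ID for name, or nil if it was never interned.
  def id_of(name)
    @ids[name]
  end

  def name_of(id)
    @names[id]
  end
end

VALUES = Interner.new   # classification values
ITEMS = Interner.new    # item names
ENCODED = {}            # item ID => array of value IDs

[
  ["Vegetable", "Root vegetables", "Root", "Without Skin", ["potato", "sweet potato", "yam"]],
  ["Vegetable", "Root vegetables", "Root", "With Skin", ["onion", "garlic", "shallot"]]
].each do |row|
  *path, names = row
  codes = path.map { |value| VALUES.intern(value) }
  names.each { |name| ENCODED[ITEMS.intern(name)] = codes }
end

# Turns an item name back into its readable classification (nil if unknown).
def decode(item_name)
  codes = ENCODED[ITEMS.id_of(item_name)]
  codes && codes.map { |code| VALUES.name_of(code) }
end

p ENCODED[ITEMS.id_of("potato")]  # => [0, 1, 2, 3]
p decode("onion")                 # => ["Vegetable", "Root vegetables", "Root", "With Skin"]
```

Whether the saving is worth the extra indirection depends on how many items there are; for a few hundred entries the plain map is probably simpler to maintain.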

Related

Ruby: compile a hash from two 2d arrays with appropriate combinations

I have two 2d arrays:
product_names_array = [product_one_names = ["Product One A", "Product One B"],
product_two_names = ["Product Two C", "Product Two D"]]
product_prices_array = [product_one_price = [product_one_price_1, product_one_price_2, product_one_price_3, product_one_price_4],
product_two_price = [product_two_price_1, product_two_price_2, product_two_price_3, product_two_price_4]]
In the first 2D array, I have 2 sub-arrays (in reality there are 16 of them) - one for each product. Each of them lists different names for the same product (each product can have from 1 to 22 alternative names).
In the second 2D array, I have 2 sub-arrays (in reality there are also 16 of them) - one per price list for each product. Each of them lists different prices (in reality, 10 price options) of the same product (which may have several names) from the corresponding sub-array in the previous 2D array.
From these arrays I want to build a hash like this:
my_hash = {"Product One A" => [product_one_price_1, product_one_price_2, product_one_price_3, product_one_price_4],
"Product One B" => [product_one_price_1, product_one_price_2, product_one_price_3, product_one_price_4],
"Product Two C" => [product_two_price_1, product_two_price_2, product_two_price_3, product_two_price_4],
"Product Two D" => [product_two_price_1, product_two_price_2, product_two_price_3, product_two_price_4]}
As you can see, each value from the arrays in the first 2D array creates all possible combinations with each corresponding array in the second 2D array.
Then I want to use the created hash like this:
puts my_hash["Product One B"[2]] # => product_one_price_3
(I doubt the correctness of this expression, so I will be grateful if you will help me here too...)
I would also like to avoid defining new methods, because this code will be used in the plugin Computed Custom Field for Redmine, and it does not accept def.
I have already re-read a bunch of information about arrays and hashes in Ruby, well, so far I have not even come close to solving my problem. Any help would be helpful!

You can use the zip method, which pairs up corresponding elements of the two arrays
([0] with [0], [1] with [1], and so on):
result = product_names_array.zip(product_prices_array)
.flat_map { |names, prices| names.map { |name| [name, prices] }}
.to_h
result
# output
# product_one_prices: 11, 12, 13, 14
# product_two_prices: 21, 22, 23, 24
{
"Product One A"=>[11, 12, 13, 14],
"Product One B"=>[11, 12, 13, 14],
"Product Two C"=>[21, 22, 23, 24],
"Product Two D"=>[21, 22, 23, 24]
}
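As for the lookup expression from the question, here is a small self-contained sketch with the same sample prices (11..14 and 21..24 are placeholder values, not real data). It uses no def, so it should fit the constraint mentioned in the question. The index has to be applied to the array returned by the hash, not to the key string.

```ruby
product_names_array = [
  ["Product One A", "Product One B"],
  ["Product Two C", "Product Two D"]
]
product_prices_array = [
  [11, 12, 13, 14],
  [21, 22, 23, 24]
]

# Pair each name list with its price list, then give every name that price list.
my_hash = product_names_array.zip(product_prices_array)
  .flat_map { |names, prices| names.map { |name| [name, prices] } }
  .to_h

# Correct: look up the price array first, then take its third element.
puts my_hash["Product One B"][2]     # prints 13

# The expression from the question indexes the string instead:
# "Product One B"[2] is "o", and my_hash["o"] is nil.
p my_hash["Product One B"[2]]        # prints nil
```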

How can I obtain the word context for a list of search terms in elasticsearch?

I just set up my first elasticsearch cluster and uploaded a few thousand documents. Now I would like to perform a relatively simple task: I have a list of search terms and, for each term, would like to obtain a list of the documents in my database that contain this search term together with the word context (5 words before search term, 5 words after search term).
Is there a simple way to do this? I already searched a lot but have not found a satisfying answer.
Example
I have a database with one document (id: 1): "The dog runs up the
hill to fly a yellow kite. He looks happy."
I have one search term: "hill".
I would like to write a request that returns the id 1 together with
the 5 words before ("The dog runs up the") and the 5 words after ("to
fly a yellow kite.") the search term ("hill").

I don't think it is possible to make elasticsearch return exactly n words around a match, but you can use the highlighting feature to retrieve the rough context and then post-process the result in your application.
By default, elasticsearch tries to determine what context makes a good snippet, so you may have to increase the size of this window by setting fragment_size (the number of characters in the returned snippet).
Here is an example query:
{
"query": {
"match": {
"yourtextfield": "hill"
}
},
"highlight": {
"fields": {
"yourtextfield": {}
},
"boundary_scanner": "word",
"type": "plain",
"fragment_size": 150,
"pre_tags": "",
"post_tags": ""
}
}
Normally, the match is wrapped in <em> and </em>, but you can modify or remove these tags via pre_tags and post_tags. It might be useful to keep them as markers so that you know exactly which word matched your query.
Please have a look at the documentation as well, there are many good examples that might help you.
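The post-processing step could look roughly like the Ruby sketch below. It assumes the highlighted fragment (or the full field text) arrives as a plain string without markers; the method name word_context and its parameters are made up for illustration and are not part of any elasticsearch client.

```ruby
# Returns, for every occurrence of term in text, the `window` words
# before and after it. Matching ignores case and surrounding punctuation.
def word_context(text, term, window = 5)
  words = text.split
  needle = term.downcase

  hits = words.each_index.select do |i|
    words[i].downcase.gsub(/[^[:alnum:]]/, "") == needle
  end

  hits.map do |i|
    {
      before: words[[i - window, 0].max...i].join(" "),
      match: words[i],
      after: (words[(i + 1)..(i + window)] || []).join(" ")
    }
  end
end

doc = "The dog runs up the hill to fly a yellow kite. He looks happy."
p word_context(doc, "hill")
```

For the document from the question, this yields "The dog runs up the" as the words before the match and "to fly a yellow kite." as the words after it.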

OCaml: design datatypes for a text adventure game

I am trying to make a simple, naive text adventure game (based on this page) to learn OCaml.
The task is to build a game engine, so all the information about rooms, items, etc. is stored in a JSON file.
A sample JSON file looks like this:
{
"rooms":
[
{
"id": "room1",
"description": "This is Room 1. There is an exit to the north.\nYou should drop the white hat here.",
"items": ["black hat"],
"points": 10,
"exits": [
{
"direction": "north",
"room": "room2"
}
],
"treasure": ["white hat"]
},
{
"id": "room2",
"description": "This is Room 2. There is an exit to the south.\nYou should drop the black hat here.",
"items": [],
"points": 10,
"exits": [
{
"direction": "south",
"room": "room1"
}
],
"treasure": ["black hat"]
}
],
"start_room": "room1",
"items":
[
{
"id": "black hat",
"description": "A black fedora",
"points": 100
},
{
"id": "white hat",
"description": "A white panama",
"points": 100
}
],
"start_items": ["white hat"]
}
I've almost finished the game, but the project description page says that two of the objectives are:
Design user-defined data types, especially records and variants.
Write code that uses pattern matching and higher-order functions on lists and on trees.
However, the only user-defined data type I made is a record type that captures the current state of the game; I did not use trees or variants:
type state = {
current_inventory : string list ;
current_room : string ;
current_score : int ;
current_turn : int ;
}
Then I just parse the user input and use pattern matching to handle the different situations.
I've been trying to figure out how I should use variants (or polymorphic variants) and trees in my game.
Can anyone please provide some suggestions?

JSON is inherently a tree. You may, of course, parse the JSON without building an in-memory representation and perform side-effecting computations as you descend through the JSON data, filling hash tables with the data you have read. This is a valid option, but it looks like the authors of the course expect you to first read the entire JSON file, represent it in memory as a tree, and then perform lookups on that tree.
As for variants, you should use a variant type to represent the following data:
movement directions: type dir = N | NE | E ...
verbs: type verb = Go | Take of item | Drop of item
Also, it would be a good idea to create abstract data types for rooms and items, which would guarantee that they are actually present in the JSON database. You're using string to represent them, but that type includes all string values, including those that don't represent valid identifiers, as well as those that don't occur in the game description file. Inventory items also deserve their own type.
In general, in languages with a rich type system, you should try to express as much as possible with the type system.
To be less theoretical: if I were you, I would have the following types in my game (as a first approximation):
type game
type room
type item
type verb
type dir
type treasure
type state
(** a static representation of a game (using a tree inside) *)
module Game : sig
type t = game
val from_json : string -> t option
val start : t -> room
val room_exits : t -> room -> (dir * room) list
end
module Room : sig
type t = room
val description : t -> string
val items : t -> item list
val points : t -> int
val treasure : t -> treasure list
end
...

Couchbase query using "\uefff" break the next conditional keys

I have a map() function like this in the beer design document:
function (doc, meta) {
if(doc.brewery_id)
emit([ doc.brewery_id, doc.abv], [doc.name, doc.abv, doc.type, doc.brewery_id, doc.style, doc.category]);
}
I need to get all docs that satisfy two rules:
1. [brewery_id] starts with "21st"
2. [abv] is between 3 and 4
My filter is:
startkey=["21st", 3]
endkey=["21st\uefff", 4]
But the result is not correct: rule 1 works as expected, but rule 2 is ignored.
Please help me find out what's wrong.
Thanks!!!
Here's the result:
{"total_rows":5891,"rows":[
{"id":"21st_amendment_brewery_cafe-bitter_american","key":["21st_amendment_brewery_cafe",3.6],"value":["Bitter American",3.6,"beer","21st_amendment_brewery_cafe","Special Bitter or Best Bitter","British Ale"]},
{"id":"21st_amendment_brewery_cafe-563_stout","key":["21st_amendment_brewery_cafe",5],"value":["563 Stout",5,"beer","21st_amendment_brewery_cafe","American-Style Stout","North American Ale"]},
{"id":"21st_amendment_brewery_cafe-south_park_blonde","key":["21st_amendment_brewery_cafe",5],"value":["South Park Blonde",5,"beer","21st_amendment_brewery_cafe","Golden or Blonde Ale","North American Ale"]},
{"id":"21st_amendment_brewery_cafe-amendment_pale_ale","key":["21st_amendment_brewery_cafe",5.2],"value":["Amendment Pale Ale",5.2,"beer","21st_amendment_brewery_cafe","American-Style Pale Ale","North American Ale"]},
{"id":"21st_amendment_brewery_cafe-potrero_esb","key":["21st_amendment_brewery_cafe",5.2],"value":["Potrero ESB",5.2,"beer","21st_amendment_brewery_cafe","Special Bitter or Best Bitter","British Ale"]},
{"id":"21st_amendment_brewery_cafe-general_pippo_s_porter","key":["21st_amendment_brewery_cafe",5.5],"value":["General Pippo's Porter",5.5,"beer","21st_amendment_brewery_cafe","Porter","Irish Ale"]},
{"id":"21st_amendment_brewery_cafe-watermelon_wheat","key":["21st_amendment_brewery_cafe",5.5],"value":["Watermelon Wheat",5.5,"beer","21st_amendment_brewery_cafe","Belgian-Style Fruit Lambic","Belgian and French Ale"]},
{"id":"21st_amendment_brewery_cafe-north_star_red","key":["21st_amendment_brewery_cafe",5.8],"value":["North Star Red",5.8,"beer","21st_amendment_brewery_cafe","American-Style Amber/Red Ale","North American Ale"]},
{"id":"21st_amendment_brewery_cafe-oyster_point_oyster_stout","key":["21st_amendment_brewery_cafe",5.9],"value":["Oyster Point Oyster Stout",5.9,"beer","21st_amendment_brewery_cafe","American-Style Stout","North American Ale"]},
{"id":"21st_amendment_brewery_cafe-21a_ipa","key":["21st_amendment_brewery_cafe",7.2],"value":["21A IPA",7.2,"beer","21st_amendment_brewery_cafe","American-Style India Pale Ale","North American Ale"]}
]
}

If you need to filter your results by two varying ranges, you can use LINQ, but it can be slow if you have a large number of documents. To make it faster, you can do two things:
1. After applying the LINQ "filter", cache the results in memcached or Couchbase.
2. If your data model allows it, create a separate view for one of the ranges, i.e. move one of the ranges from the key into the map function, like this:
View for 21sts:
map: function (doc, meta) { if (doc.subtype === "21sts") emit(doc.abv, null) }
where the docs that have subtype == "21sts" are the docs that you can get from a view with:
map: function (doc, meta) { emit(doc.brewery_id, null) }
and startkey="21st", endkey="21st\uefff".
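Why the second rule is ignored can be illustrated with Ruby arrays, which are also ordered element by element. This is only an analogy: the view engine uses its own Unicode collation rather than Ruby's byte-wise string comparison, and the constant names below are made up. Once the first elements differ, the second element is never consulted, so every key whose brewery_id falls in the prefix range lies between startkey and endkey regardless of its abv.

```ruby
STARTKEY = ["21st", 3]
ENDKEY = ["21st\uefff", 4]

# A few of the keys from the result shown in the question.
KEYS = [
  ["21st_amendment_brewery_cafe", 3.6],
  ["21st_amendment_brewery_cafe", 5],
  ["21st_amendment_brewery_cafe", 7.2]
]

# Arrays compare element by element; the first differing element decides.
IN_RANGE = KEYS.select do |key|
  (STARTKEY <=> key) <= 0 && (key <=> ENDKEY) <= 0
end
p IN_RANGE.size   # => 3, the abv bounds 3 and 4 had no effect

# The abv rule therefore needs its own filter, for example in application code.
FILTERED = IN_RANGE.select { |_brewery_id, abv| abv.between?(3, 4) }
p FILTERED        # => [["21st_amendment_brewery_cafe", 3.6]]
```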

how to find best matching element in array of numbers?

I need help with something that seems simple but confuses me. I'm trying to write a fuzzy matching method that copes with differences in format between the value computed as needed and the values actually available in a selection list.
The value (option strike price) is always a computed Float like 85.0 or Int.
The array contains numbers in string form, unpredictable in either increment or whether they will be shown rounded to some decimal (including extra zeros like 5.50) or no decimal (like 85), e.g.:
select_list = ["77.5", "80", "82.5", "85", "87.5", "90", "95", "100", "105"]
I am unsure how to write a simple line or two of code that will return the closest matching element (by number value) as it appears in the array. For example, if select_list.contains? 85.0 returned "85"
Actually, the selection choices come from a Watir::Webdriver browser.select_list(:id, "lstStrike0_1") HTML object whose visible text (not HTML value) are those numbers; maybe there is a more direct way to just call browser.select_list(:id, "lstStrike0_1").select X without having to figure out in Watir how to convert all those choices into a Ruby array?
xs = ["77.5", "80", "82.5", "85", "87.5", "90", "95", "100", "105"]
xs.min_by { |x| (x.to_f - 82.4).abs }
#=> "82.5"

I'm not a Ruby coder, so this might not be the best way to do it:
def select_closest(list, target)
return (list.map {|x| [(x.to_f - target).abs, x]}).min[1]
end
select_list = ["77.5", "80", "82.5", "85", "87.5", "90", "95", "100", "105"]
puts select_closest(select_list, 81) # yields 80
