I have to compare two strings using something similar to the to_tsvector function used by PostgreSQL.
Basically I need to strip prepositions and stop words from each string and normalize plural/singular forms so that equivalent phrases compare as equal.
For example, all of these strings should be equal under a Ruby == comparison.
"cover iphone", "covers iphone", "cover for iphone", "cover iphones"
SELECT
strip(to_tsvector('english', 'cover iphone')) = strip(to_tsvector('english', 'cover for iphone'))
This should support the main languages, such as English, German, Spanish...
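To make the requirement concrete, here is a minimal, English-only sketch of the comparison. The stop-word list and the "strip a trailing s" rule are illustrative assumptions, not a real stemmer and not what to_tsvector does internally; a real solution for several languages would need per-language stop words and stemming.

```ruby
require "set"

# Hypothetical, deliberately tiny stop-word list; a real one would be per language.
STOP_WORDS = Set.new(%w[a an the for of to in on with]).freeze

# Lowercase, split into words, drop stop words, and crudely strip a plural "s".
# Sorting makes the result order-independent, like comparing stripped tsvectors.
def normalize(text)
  text.downcase.scan(/[[:alnum:]]+/)
      .reject { |word| STOP_WORDS.include?(word) }
      .map { |word| word.length > 3 ? word.delete_suffix("s") : word }
      .sort
end

# True when both phrases reduce to the same set of normalized words.
def same_phrase?(left, right)
  normalize(left) == normalize(right)
end
```

With this sketch, same_phrase?("cover iphone", "cover for iphones") is true, but words such as "glass" would be mangled by the naive plural rule, which is why a proper stemmer is needed in practice.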
I am implementing an internal search that looks at various normalized fields to determine relevance for a user's search terms. The best_fields strategy seems to yield strange results sometimes because a "less important" field will generate the highest score and beat out other more important fields with weaker matches. I've included a boost, but cranking that value up seems like it will also skew results; as does moving to a most_fields strategy since not all pages will have all the fields.
What is the right way to go about tuning the below query & incorporating scores from each field?
Below is an example where the content field ends up winning the "max" evaluation for best_field (because the search term is present more times) and scores higher than the second page which I want to come first because the search term is a literal match for the keywords field. What's more, since more keywords are added to important pages, their match seems to get further devalued since the field length is much longer than average.
Query Example
{
  "query": {
    "multi_match": {
      "query": "Hello World",
      "fields": ["keywords^3", "name^2", "content^1"]
    }
  }
}
Document/Results Example:
[{
  "name": "Howdy!",
  "keywords": "",
  "content": "Hello everybody, I'm in the world. hello there, i like saying hello"
},{
  "name": "Hey",
  "keywords": "Hello World, Hello, World",
  "content": "Lot's of text, Lot's of text, Lot's of text, Lot's of text, Lot's of text, Hello"
}]
What you have right now is a static boost, which is a good way to tune search relevance, but for advanced use cases like yours I would advise looking at the function score query to fine-tune scoring and relevance.
Please go through the function score documentation; it's quite exhaustive and should cover your use case.
I have two folders named CLP2_v6 and CLP_DE0_v7. When I sort them using an IComparer, the result is:
Using StrCmpLogicalW(Windows):
CLP_DE0_v7
CLP2_v6
I'm confused, because when I sorted the same names using this Text Line Sorter I got a different answer:
Using TextLineSorter:
CLP2_v6
CLP_DE0_v7
What I want to display on my program is like this:
Preferred:
CLP2_v6
CLP_DE0_v7
Here is my VB.NET code:
Public Class StringCompare
    Implements IComparer(Of String)
    Declare Unicode Function StrCmpLogicalW Lib "shlwapi.dll" _
        (ByVal s1 As String, ByVal s2 As String) As Int32
    Public Function Compare(x As String, y As String) As Integer Implements _
        System.Collections.Generic.IComparer(Of String).Compare
        Return StrCmpLogicalW(x, y)
    End Function
End Class
Sub Main()
    Dim UsortedArray() As String = {
        "CLP_DE0_v7",
        "CLP2_v6"
    }
    Dim rc As New StringCompare()
    Console.WriteLine(vbLf & "Windows Sorting:")
    Array.Sort(UsortedArray, rc)
    Console.WriteLine()
    For Each dinosaur As String In UsortedArray
        Console.WriteLine(dinosaur)
    Next
End Sub
What is the right order when sorting these two names? Does sorting have many rules to follow, or are there many sorting standards?
There are many different approaches to sorting, depending on the required results, the context, and the available tools. Since you're on .NET, you're limited not by the technology but by your own requirements.
Consider these situations:
Culture specific
'Array sorted by English culture
{"aa", "bb", "cc", "ch", "dd", "ee", "ff", "gg", "hh", "ii"}
'Same array sorted by Czech culture
{"aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh", "ch", "ii"}
And have you ever heard of ě, ê, è, é ? :)
Where do you put them? Before "e", after "e", after "z"? That depends on your culture and needs.
Technology specific
Let's say you have your ANSI string in an array of bytes. Sorting by byte value returns something different than sorting by each character's position in the alphabet.
User-needs specific
Is "a" greater than "A"? In general? For your specific need?
Is a directory named "9" greater than a directory named "10"? Sort them as strings and you'll get {"10", "9"}; open the folder in Windows Explorer and you'll see {"9", "10"}. Open it in Total Commander and you'll get {"10", "9"} again for the same directory.
Conclusion
You should define what you really need in your specific case, and then find a proper or easy way to do it. In .NET your results will depend on Threading.Thread.CurrentThread.CurrentCulture or on your own IComparer, which you can pass to List.Sort or Array.Sort, or to the SortedList/SortedSet constructors.
Risks
You should be aware that sorting differs under different culture info. For example, creating and filling a SortedList(Of String, Object) under the "hu-HU" culture can cause weird exceptions in some cases when the items are later read under the "cs-CZ" culture, since the items are not sorted the way that culture expects and the binary search gets confused.
I am trying to make a simple, naive text adventure game (based on this page) to learn OCaml.
The project is about making a game engine, so all the information about rooms, items, etc. is stored in a JSON file.
A sample JSON file looks like this:
{
  "rooms":
  [
    {
      "id": "room1",
      "description": "This is Room 1. There is an exit to the north.\nYou should drop the white hat here.",
      "items": ["black hat"],
      "points": 10,
      "exits": [
        {
          "direction": "north",
          "room": "room2"
        }
      ],
      "treasure": ["white hat"]
    },
    {
      "id": "room2",
      "description": "This is Room 2. There is an exit to the south.\nYou should drop the black hat here.",
      "items": [],
      "points": 10,
      "exits": [
        {
          "direction": "south",
          "room": "room1"
        }
      ],
      "treasure": ["black hat"]
    }
  ],
  "start_room": "room1",
  "items":
  [
    {
      "id": "black hat",
      "description": "A black fedora",
      "points": 100
    },
    {
      "id": "white hat",
      "description": "A white panama",
      "points": 100
    }
  ],
  "start_items": ["white hat"]
}
I've almost finished the game, but the project description page says that two of the objectives are:
Design user-defined data types, especially records and variants.
Write code that uses pattern matching and higher-order functions on lists and on trees.
However, the only user-defined data type I made is a record type that captures the current state of the game; I did not use any trees or variants:
type state = {
current_inventory : string list ;
current_room : string ;
current_score : int ;
current_turn : int ;
}
Then I just parse the user input and use pattern matching to handle the different situations.
I've been trying to figure out how I should use variants (or polymorphic variants) and trees in my game.
Can anyone please provide some suggestions?
JSON is inherently a tree. You may, of course, parse the JSON without building an in-memory representation and perform side-effecting computations as you descend through the JSON data, filling hash tables with the data you read. This is a valid option, but it looks like the authors of the course expect you to first read the entire JSON, represent it in memory as a tree, and then perform lookups on the tree.
As for variants, you should represent the following data with variant types:
movement directions: type dir = N | NE | E ...
verbs: type verb = Go | Take of item | Drop of item
Also, it would be a good idea to create abstract data types for rooms and items, which will guarantee that they are actually present in the JSON database. You're using string to represent them, but this type includes all values, including those that don't represent valid identifiers as well as those that don't occur in the game description file. Inventory items also deserve their own type.
In general, in languages with a rich type system, you should try to express as much as possible with the type system.
Just to be less theoretical, if I were you, I would have the following types in my game (as a first approximation):
type game
type room
type item
type verb
type dir
type treasure
type state
(** a static representation of a game (using a tree inside) *)
module Game : sig
  type t = game
  val from_json : string -> t option
  val start : t -> room
  val room_exits : t -> room -> (dir * room) list
end
module Room : sig
  type t = room
  val description : t -> string
  val items : t -> item list
  val points : t -> int
  val treasure : t -> treasure list
end
...
I feel this is a strange one. It comes from nowhere specific but it's a problem I've started trying to solve and now just want to know the answer or at least a starting place.
I have an array of x number of sentences,
I have a count of how many sentences each word appears in,
I have a count of how many sentences each word appears in with every other word,
I can search for a sentence using typical case insensitive boolean search clauses (AND +/- Word)
My data structure looks like this:
{ words: [{ word: '', count: x, concurrentWords: [{ word: '', count: x }] }] }
I need to generate an array of searches which will group the sentences into arrays of n size or less.
I don't know if it's even possible to do this in a predictable way so approximations are cool. The solution doesn't have to use the fact that I have my array of words and their counts. I'm doing this in JavaScript, not that that should matter.
Thanks in advance
I just started to learn Erlang, and really like their list comprehension syntax, for example:
Weather = [{toronto, rain}, {montreal, storms}, {london, fog}, {paris, sun}, {boston, fog}, {vancounver, snow}].
FoggyPlaces = [X || {X, fog} <- Weather].
In this case, FoggyPlaces will evaluate to [london, boston].
What's the best way to do this in Ruby?
For example, an Array like (very common, I believe):
weather = [{city: 'toronto', weather: :rain}, {city: 'montreal', weather: :storms}, {city: 'london', weather: :fog}, {city: 'paris', weather: :sun}, {city: 'boston', weather: :fog}, {city: 'vancounver', weather: :snow}]
The best I got 'til now is:
weather.collect {|w| w[:city] if w[:weather] == :fog }.compact
But in this case, I have to call compact to remove the nil values, and the example itself is not as readable as the Erlang version.
And even more, in the Erlang example, both city and weather are atoms. I don't even know how to get something that makes sense and looks good like this in Ruby.
First off, your data structures aren't equivalent. The equivalent Ruby data structure to your Erlang example would be more like
weather = [[:toronto, :rain], [:montreal, :storms], [:london, :fog],
[:paris, :sun], [:boston, :fog], [:vancouver, :snow]]
Secondly, yes, Ruby doesn't have list comprehensions, and it only gained pattern matching (case/in) in Ruby 2.7. So, the example will probably be more complex. Your list comprehension first filters all foggy cities, then projects the name. Let's do the same in Ruby:
weather.select {|_, weather| weather == :fog }.map(&:first)
# => [:london, :boston]
However, Ruby is centered around objects, but you are using abstract data types. With a more object-oriented data abstraction, the code would probably look more like
weather.select(&:foggy?).map(&:city)
which isn't too bad, is it?
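For that last line to work, the elements need to respond to foggy? and city. A minimal sketch of such an object is below; the Forecast struct and its field names are illustrative assumptions, not something from the question. The second method shows filter_map (available since Ruby 2.7), which filters and projects in one pass and avoids the compact call the question complains about.

```ruby
# Hypothetical value object; Forecast, city and condition are illustrative names.
Forecast = Struct.new(:city, :condition) do
  def foggy?
    condition == :fog
  end
end

FORECASTS = [
  [:toronto, :rain], [:montreal, :storms], [:london, :fog],
  [:paris, :sun], [:boston, :fog], [:vancouver, :snow]
].map { |city, condition| Forecast.new(city, condition) }.freeze

# Filter first, then project, mirroring the Erlang comprehension.
def foggy_cities(forecasts)
  forecasts.select(&:foggy?).map(&:city)
end

# Same result in a single pass: filter_map drops nil/false block results.
def foggy_cities_one_pass(forecasts)
  forecasts.filter_map { |forecast| forecast.city if forecast.foggy? }
end
```

Both foggy_cities(FORECASTS) and foggy_cities_one_pass(FORECASTS) return [:london, :boston].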