Zend lucene - search within range - zend-search-lucene

I have the following code to create the Zend Lucene index:
$doc->addField(Zend_Search_Lucene_Field::UnStored('keywords', $job->getKeywords()));
$doc->addField(Zend_Search_Lucene_Field::UnStored('title', $job->getTitle()));
$doc->addField(Zend_Search_Lucene_Field::UnStored('region', $job->getRegion()));
$doc->addField(Zend_Search_Lucene_Field::Keyword('minSalary', $minSalary));
$doc->addField(Zend_Search_Lucene_Field::Keyword('maxSalary', $maxSalary));
$doc->addField(Zend_Search_Lucene_Field::UnStored('type', $job->getType()));
and my search query is:
$query = 'minSalary:[0 TO 20000]';
Here I am trying to get all jobs whose minSalary is less than or equal to 20000, but the result includes jobs with the following minSalary values:
110000
100000
20000
10000
Can anyone advise on this?
Thanks
B

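I suggest indexing the values as fixed-length strings instead of raw numbers. The range appears to be compared as text, character by character, which is why 110000 and 100000 fall between 0 and 20000. During indexing, convert every numeric value (e.g. 1000) to a zero-padded string of the same length (e.g. 0001000). Then, to search for a minSalary from 0 to 20000, your query string has to look like this: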
$query = "minSalary:[0000000 TO 0020000]";
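To see why the padding matters, here is a minimal sketch in Ruby (used purely for illustration, not Zend code) that imitates a text-based range check. The helper names `pad`, `in_text_range?`, `matches_unpadded` and `matches_padded` are hypothetical, the width of 7 digits is taken from the example above, and the salaries 5000 and 30000 are added sample values that do not come from the question.

```ruby
WIDTH = 7 # assumed fixed width, as in the 0001000 example

# Zero-pad a number to WIDTH digits, e.g. 1000 -> "0001000".
def pad(n)
  format("%0#{WIDTH}d", n)
end

# Inclusive range check on strings, i.e. a lexicographic comparison.
def in_text_range?(value, low, high)
  low <= value && value <= high
end

# Range check on the raw decimal strings (what the original query does).
def matches_unpadded(salaries, low, high)
  salaries.select { |s| in_text_range?(s.to_s, low.to_s, high.to_s) }
end

# Range check on zero-padded strings (the suggested fix).
def matches_padded(salaries, low, high)
  salaries.select { |s| in_text_range?(pad(s), pad(low), pad(high)) }
end

# First four values are from the question; 5000 and 30000 are illustrative.
SALARIES = [110000, 100000, 20000, 10000, 5000, 30000].freeze

p matches_unpadded(SALARIES, 0, 20000)
p matches_padded(SALARIES, 0, 20000)
```

Without padding, the check wrongly accepts 110000 and 100000 and wrongly rejects 5000; with padding, text order and numeric order agree. The padded width has to be large enough for the largest value you expect to index.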

Related

NRediSearch - Getting total documents matched count

Is there a way to get the total results count when calling the Aggregate function?
Note that I'm not using the Aggregate function to aggregate results but as an advanced search query, because the Search function does not allow sorting by multiple fields.
RediSearch returns the total count of matched documents, but I can't find a way to get this number using the NRediSearch library.
With NRediSearch
Using NRediSearch, you need to build and execute an aggregation that runs a GROUPBY 0 with the COUNT reducer. Say you have a person-idx index and you want to count all the Person documents in Redis:
var client = new Client("person-idx", muxer.GetDatabase());
var result = await client.AggregateAsync(new AggregationBuilder().GroupBy(new List<string>(), new List<Reducer>{Reducers.Count()}));
Console.WriteLine(result.GetResults().First().Values.First());
This prints the count you are looking for.
With Redis.OM
There's a newer library, Redis.OM, which makes these aggregations a bit simpler. The same operation would look like this:
var peopleAggregations = provider.AggregationSet<Person>();
Console.WriteLine(peopleAggregations.Count());

Rails compare same object's 1 field with another + addition of string in Active Record

I have two string fields that contain dates as strings, e.g. field_1 = "2003.11.14". I use them in the ORM and they work just fine. Now I want to compare one field's value with another field's value minus 18.months. Here is an example:
User.where("users.field_1 > '#{Date.today - 18.months}' AND users.field_2 > (users.field_1 - 18.months)")
or something like that. Can anyone help me?
Thanks in advance
Most databases support date calculations in SQL. Something like this should work:
query = User.where("users.field_1 > ?", 18.months.ago)
query = query.where("users.field_2 > users.field_1 - :time", time: 18.months.ago)
Edit: I just saw that the values are stored as strings, in which case you cannot use SQL date arithmetic like this.
I cannot do that because the table has millions of records.
I don't really understand why the size of the table prevents you from using the correct data type.
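As a side note, fixed-width "YYYY.MM.DD" strings sort the same way as the dates they represent, so the first condition (field_1 newer than 18 months ago) can be written as a plain string comparison against a formatted cutoff. The sketch below is plain Ruby without Rails, and everything in it is an assumption for illustration: the helper names are hypothetical, and `Date#<<` (shift back by N months) stands in for ActiveSupport's `18.months`. The second condition needs real date arithmetic, so the sketch parses the strings first.

```ruby
require "date"

FORMAT = "%Y.%m.%d" # assumed storage format, as in "2003.11.14"

# Parse a stored string such as "2003.11.14" into a Date.
def parse_stored(str)
  Date.strptime(str, FORMAT)
end

# Cutoff string for "N months before today", in the stored format.
# The result can be compared to field_1 as text, e.g. in a WHERE clause.
def cutoff_string(today, months)
  (today << months).strftime(FORMAT)
end

# Second condition: field_2 > field_1 - N months, evaluated on real dates.
def field_2_after_shifted_field_1?(field_1, field_2, months)
  parse_stored(field_2) > (parse_stored(field_1) << months)
end

puts cutoff_string(Date.new(2005, 5, 14), 18)
puts field_2_after_shifted_field_1?("2003.11.14", "2002.06.01", 18)
```

Evaluating the second condition in Ruby means loading the rows, which may be impractical for millions of records; converting the columns to a date type remains the cleaner fix.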

How to filter clickhouse table by array column contents?

I have a clickhouse table that has one Array(UInt16) column. I want to be able to filter results from this table to only get rows where the values in the array column are above a threshold value. I've been trying to achieve this using some of the array functions (arrayFilter and arrayExists) but I'm not familiar enough with the SQL/Clickhouse query syntax to get this working.
I've created the table using:
CREATE TABLE IF NOT EXISTS ArrayTest (
    date Date,
    sessionSecond UInt16,
    distance Array(UInt16)
) Engine = MergeTree(date, (date, sessionSecond), 8192);
Where the distance values will be distances from a certain point at a certain amount of seconds (sessionSecond) after the date. I've added some sample values so the table looks like the following:
Now I want to get all rows which contain distances greater than 7. I found the array operators documentation here and tried the arrayExists function but it's not working how I'd expect. From the documentation, it says that this function "Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0". But when I run the query below I get three zeros returned where I should get a 0 and two ones:
SELECT arrayExists(
    val -> val > 7,
    arrayEnumerate(distance))
FROM ArrayTest;
Eventually I want to perform this select and then join it with the table contents to only return rows that have an exists = 1 but I need this first step to work before that. Am I using the arrayExists wrong? What I found more confusing is that when I change the comparison value to 2 I get all 1s back. Can this kind of filtering be achieved using the array functions?
Thanks
You can use arrayExists in the WHERE clause.
SELECT *
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance) = 1;
Another way is to use ARRAY JOIN, if you need to know which values are greater than 7:
SELECT d, distance, sessionSecond
FROM ArrayTest
ARRAY JOIN distance as d
WHERE d > 7
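I think the reason you get three zeros is that arrayEnumerate returns the array indexes (1, 2, 3, ...), not the array values, so your lambda compares indexes against 7. Since none of your rows has more than 7 elements, arrayExists returns 0 for every row. That is presumably also why a threshold of 2 returns all 1s: every row has more than 2 elements.
To make this work, use the index to look up the value:
SELECT arrayExists(
    val -> distance[val] > 7,
    arrayEnumerate(distance))
FROM ArrayTest;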
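The difference between checking indexes and checking values can be imitated outside ClickHouse. The plain-Ruby sketch below uses made-up sample arrays (the question's sample data was not preserved) and hypothetical helper names; `array_enumerate` mimics the 1-based indexes that arrayEnumerate returns.

```ruby
# Mimics ClickHouse arrayEnumerate: [1, 2, ..., length(arr)].
def array_enumerate(arr)
  (1..arr.length).to_a
end

# What the original query tests: is any INDEX greater than the threshold?
def exists_over_indexes?(arr, threshold)
  array_enumerate(arr).any? { |i| i > threshold }
end

# What the corrected query tests: is any VALUE greater than the threshold?
# Ruby arrays are 0-based, hence arr[i - 1].
def exists_over_values?(arr, threshold)
  array_enumerate(arr).any? { |i| arr[i - 1] > threshold }
end

# Hypothetical sample rows, three distances each.
SAMPLE_ROWS = [[1, 2, 3], [4, 9, 2], [8, 1, 10]].freeze

p SAMPLE_ROWS.map { |r| exists_over_indexes?(r, 7) }
p SAMPLE_ROWS.map { |r| exists_over_indexes?(r, 2) }
p SAMPLE_ROWS.map { |r| exists_over_values?(r, 7) }
```

With these sample rows, the index-based check is false everywhere for a threshold of 7 and true everywhere for a threshold of 2, which matches the behaviour described in the question, while the value-based check flags only the rows that contain a distance above 7.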

OFFSET/LIMIT only count DISTINCT values in Activerecord query

I am running this query
Playlistship.order("created_at desc").select("distinct playlist_id").limit(12).offset(2)
This query does not necessarily return 12 records. It returns the number of distinct records in the set of 12 defined by the LIMIT, OFFSET and ORDER parameters.
For example, if the Playlistships between id=13 and id=24 had playlist_ids of [2,3,3,5,6,3,5,6,8,11,12,12], then this query will only return 7 records, corresponding to the first ones having the playlist_ids [2,3,5,6,8,11,12].
What I would like to find is a query that yields 12 records with distinct playlist_ids, with the correct offset, so that running this query again with an OFFSET of 3 would yield the next 12 records with distinct playlist_ids.
Hopefully I didn't "over explain" this one, as I think it's a relatively straightforward question. Please ask for more details if you need them.
Thanks!
Have you tried with subqueries? Give this a try:
Playlistship.select("distinct playlist_id").limit(12).where(playlist_id: Playlistship.order("created_at desc").select('playlist_id').offset(2))
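Another option, offered as an assumption rather than a tested answer: first collapse the rows to one per playlist_id, order those by their most recent created_at, and only then apply the limit and offset. In ActiveRecord that would presumably mean grouping by playlist_id and ordering by MAX(created_at) before calling limit and offset. The plain-Ruby sketch below imitates that logic on in-memory data; `Row`, `distinct_page` and the integer created_at values are hypothetical, and only the playlist_ids come from the question.

```ruby
Row = Struct.new(:id, :playlist_id, :created_at)

# Return one page of distinct playlist_ids, newest first.
# Deduplication happens BEFORE offset/limit, so every page is full
# as long as enough distinct playlists remain.
def distinct_page(rows, limit:, offset:)
  rows
    .group_by(&:playlist_id)                                   # one group per playlist
    .map { |pid, group| [pid, group.map(&:created_at).max] }   # latest created_at per playlist
    .sort_by { |_pid, latest| -latest }                        # newest first
    .drop(offset)
    .first(limit)
    .map(&:first)
end

# playlist_ids from the question; created_at is a made-up integer tick,
# with the first row being the newest.
PLAYLIST_IDS = [2, 3, 3, 5, 6, 3, 5, 6, 8, 11, 12, 12].freeze
PLAYLIST_ROWS = PLAYLIST_IDS.each_with_index.map do |pid, i|
  Row.new(13 + i, pid, 100 - i)
end

p distinct_page(PLAYLIST_ROWS, limit: 3, offset: 0)
p distinct_page(PLAYLIST_ROWS, limit: 3, offset: 3)
```

Because the offset counts distinct playlists rather than raw rows, consecutive pages do not overlap or come up short.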

queryContext - filtering with numbers neo4j/lucene

I'm trying to filter a wildcard query in neo/lucene using numeric range.
I want to search for all nodes (documents) having key "actor" starting with "rob" and age > 20:
WildcardQuery luceneQuery = new WildcardQuery(new Term("actor", "rob*"));
QueryContext qx = new QueryContext(luceneQuery)
    .numericRange("age", 20, null)
    .sortNumeric("age", true);
IndexHits<Node> hits = lucene.query(qx);
Once I add the numeric range, the wildcard query no longer works; the results are only ordered by the numeric range.
Is it possible to combine both wildcard and numeric?
Thanks,
Daniele
I suspect you want to use a BooleanQuery to combine the WildcardQuery with the numeric range query. (I normally use QueryParser, myself, rather than building the queries by hand.)
For your example query, the QueryParser syntax would look like:
+actor:rob* +age:{20 TO 123}
where +age:{20 TO 123} asks for age > 20 AND age < 123 (the oldest well-documented person lived to 122). The "+" operators force both of those terms to occur in the document.
