Query two or more sphinx indexes - full-text-search

I'm using the PHP API to query two Sphinx indexes as below:
$cl->Query("test","index1 index2");
I get results from both successfully, but I can't tell which result comes from which index. Is there a way to tell them apart, or do I need to run two separate queries?

Set a unique attribute on each source:
source source1 {
sql_query = SELECT id, 1 as index_id, ....
sql_attr_uint = index_id
}
source source2 {
sql_query = SELECT id, 2 as index_id, ....
sql_attr_uint = index_id
}
Results will contain an 'index_id' attribute, so each match shows which index it came from.
It's almost the same with RT indexes: just define an rt_attr_uint attribute and populate it appropriately when you insert data into the index.
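A minimal Python sketch of reading that attribute back on the client side. It assumes the matches are shaped like the PHP API's result (document id mapped to a dict with "weight" and "attrs"); the sample data and the split_by_index helper are hypothetical.

```python
from collections import defaultdict


def split_by_index(matches):
    """Group matched document ids by their index_id attribute."""
    by_index = defaultdict(list)
    for doc_id, match in matches.items():
        # index_id is the constant column (1 or 2) added by each source's sql_query
        by_index[match["attrs"]["index_id"]].append(doc_id)
    return dict(by_index)


# Hypothetical matches from a query over "index1 index2"
matches = {
    101: {"weight": 2, "attrs": {"index_id": 1}},
    205: {"weight": 1, "attrs": {"index_id": 2}},
    102: {"weight": 1, "attrs": {"index_id": 1}},
}
print(split_by_index(matches))  # {1: [101, 102], 2: [205]}
```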
The other way: presumably you've already arranged for the IDs in the two indexes not to overlap (querying both at once won't work properly if the same IDs exist in both indexes), so you can look at the ID to deduce the source index.
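One hypothetical way to keep the IDs apart is to shift the second source's IDs by a fixed offset (for example "SELECT id + 1000000000 AS id, ..." in source2) and undo the shift on the client. The offset value and the source_of helper below are assumptions for illustration, not something Sphinx provides:

```python
ID_OFFSET = 1_000_000_000  # hypothetical: source2 emits "id + 1000000000" as its document id


def source_of(doc_id):
    """Return (index name, original row id) for a Sphinx document id."""
    if doc_id >= ID_OFFSET:
        return "index2", doc_id - ID_OFFSET
    return "index1", doc_id


for doc_id in (42, 1_000_000_042):
    print(doc_id, source_of(doc_id))
```

This only holds as long as every row ID in index1 stays below the offset.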

Related

Laravel query builder - Select elements unique or null on specific column

I have a model Form for table forms. There is a column called guid which can be null, or contain some sort of grouping random hash.
I need to select all forms that have column guid either null or unique in the current search. In other words, for repeating guid values in the current search I select only the first occurrence of every guid hash.
I tried:
$results = App\Form::where(... some where clauses .. )->groupBy('guid');
It's almost OK, but it groups all rows where guid is NULL together and selects only one of them (and I need all of them).
How can I get the unique or null rows either by building proper SQL query or filtering the results in PHP?
Note: I need my $results to be an Illuminate\Database\Eloquent\Builder instance
EDIT:
I found out that the SQL version of the query I need is:
SELECT * FROM `forms` WHERE .... GROUP BY IFNULL(guid, id)
What would be the equivalent query for Laravel's database query builder?
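UPDATE: Using DB::raw (note that the column names must not be in single quotes: IFNULL('guid', 'id') just returns the constant string 'guid' for every row, which puts all rows in one group):
App\Form::where(... conditions ...)
->groupBy(DB::raw("IFNULL(guid, id)"));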
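To check what GROUP BY IFNULL(guid, id) does, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also has IFNULL and also treats single-quoted text as a string literal). The rows are made up, and MIN(id) is used so that "first occurrence" is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (id INTEGER PRIMARY KEY, guid TEXT)")
conn.executemany(
    "INSERT INTO forms (id, guid) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, None), (4, None), (5, "b")],
)


def ids(group_expr):
    # group_expr is a fixed expression from this demo, never user input.
    # MIN(id) picks the first row of each group.
    sql = f"SELECT MIN(id) FROM forms GROUP BY {group_expr} ORDER BY 1"
    return [row[0] for row in conn.execute(sql)]


print(ids("guid"))                  # NULL guids collapse into one group
print(ids("IFNULL(guid, id)"))      # each NULL-guid row falls back to its own id
print(ids("IFNULL('guid', 'id')"))  # quoted names are literals: one group for everything
```

The last call shows why the quotes matter: every row yields the same constant, so only one row survives.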
Another way:
You can also use whereNotNull and whereNull, and then merge both collections with merge():
First, get the results grouped by guid (excluding null guids here):
$unique_guid_without_null = App\Form::whereNotNull('guid')->groupBy('guid')->get();
Now, get the results where guid is null:
$all_guid_with_null = App\Form::whereNull('guid')->get();
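Finally, merge both collections using the merge() method:
$filtered_collection = $unique_guid_without_null->merge($all_guid_with_null);
Note that this runs two queries and returns a Collection, not the Builder instance the question asks for.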
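The question also mentions filtering the results in PHP. The rule itself (the first row per non-null guid plus every row with a null guid) is a simple one-pass filter; here is a language-neutral sketch in Python over hypothetical rows represented as dicts:

```python
def unique_or_null(rows):
    """Keep the first row for each non-null guid and every row whose guid is None."""
    seen = set()
    result = []
    for row in rows:
        guid = row["guid"]
        if guid is None:
            result.append(row)  # null guids are never collapsed
        elif guid not in seen:
            seen.add(guid)      # remember the guid so later duplicates are skipped
            result.append(row)
    return result


rows = [{"id": 1, "guid": "a"}, {"id": 2, "guid": "a"}, {"id": 3, "guid": None},
        {"id": 4, "guid": None}, {"id": 5, "guid": "b"}]
print([row["id"] for row in unique_or_null(rows)])  # [1, 3, 4, 5]
```

Unlike the two-query merge() version, this keeps the original row order.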
Hope this helps!
For your edited question, you can use raw() like this:
->groupBy(DB::raw("IFNULL(guid, id)"))
So your final query will be:
$results = App\Form::where(... some where clauses .. )
->groupBy(DB::raw("IFNULL(guid, id)"));
With the query above, $results will be an instance of Illuminate\Database\Eloquent\Builder.

In Solr, how can I get a list of one field (document ID) for all documents?

I am working with a Solr instance that is populated from an Oracle database. As records are added to and deleted from the Oracle database, they are supposed to also be added to and removed from Solr.
The schema.xml has this setup, which we use to store the ID that is also the primary key in Oracle:
<uniqueKey>id</uniqueKey>
<field name="id" type="string" indexed="true" stored="true"/>
Furthermore, the IDs are not in sequential order. The Solr admin interface has not been much help: I can only see the IDs along with the rest of each record, a few at a time, paginated.
There are about a million documents in this Solr core.
I can easily get the IDs of the records from the Oracle database, so I would like to also get a list of the document IDs from the Solr index for comparison.
I haven't been able to find any information on how to do this but I may be searching
If you really need to get the id of all your documents, use the fl parameter. Something like this:
SolrQuery q = new SolrQuery("*:*");
q.setFields("id");
// ^^^^^^^^^^^^^^^
// return only the `id` field (sets the fl parameter)
q.setRows(10000000);
// ^^^^^^^^
// insanely high number: retrieve _all_ rows
// see: http://wiki.apache.org/solr/CommonQueryParameters#rows-1
return server.query(q).getResults();
(untested)
For a simple comparison between the contents of Oracle and Solr, you might just want to count documents:
SolrQuery q = new SolrQuery("*:*");
q.setRows(0);
// ^
// don't retrieve _any_ row
return server.query(q).getResults().getNumFound();
// ^^^^^^^^^^^^^
// just get the number of matching documents
(untested)
In recent Solr versions (4.10 and later), the /export request handler can export large numbers of records.
However, if you really just want one field, you can request only that field (fl=id) and get the response as CSV (wt=csv). That minimizes the formatting overhead.
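As a sketch of such a request, the Python snippet below only builds the URL with the standard q, fl, wt and rows parameters; the host, port and core name are assumptions, and nothing is sent:

```python
from urllib.parse import urlencode

# Hypothetical endpoint: default port 8983, core name "mycore" is made up.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def id_export_url(rows):
    """Build a select URL that returns only the id field, formatted as CSV."""
    params = {"q": "*:*", "fl": "id", "wt": "csv", "rows": rows}
    return SOLR_SELECT + "?" + urlencode(params)


print(id_export_url(1000000))
```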
For Solr 7, the syntax has changed a bit. This is what worked for me (in Java):
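import java.util.Set;
import java.util.stream.Collectors;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

CloudSolrClient solrClient = ...;
solrClient.setDefaultCollection("collection1");
SolrQuery q = new SolrQuery("*:*");
q.set("fl", "id");   // return only the id field
q.setRows(10000000); // very high limit so all documents come back in one response
Set<String> uniqueIds = solrClient.query(q).getResults()
        .stream().map(x -> (String) x.get("id"))
        .collect(Collectors.toSet());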
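Once you have both ID lists, the comparison itself is a pair of set differences. A minimal Python sketch; it assumes both sides are normalized to strings, since the schema stores id as a string while the Oracle primary key may be numeric:

```python
def compare_ids(oracle_ids, solr_ids):
    """Return (ids missing from Solr, ids Solr still has although Oracle no longer does)."""
    oracle = {str(i) for i in oracle_ids}
    solr = {str(i) for i in solr_ids}
    return sorted(oracle - solr), sorted(solr - oracle)


missing, stale = compare_ids([1, 2, 3], ["2", "3", "4"])
print(missing, stale)  # ['1'] ['4']
```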

RethinkDB - filtering by value in another table

In our RethinkDB database, we have a table for orders, and a separate table that stores all the order items. Each entry in the OrderItems table has the orderId of the corresponding order.
I want to write a query that gets all SHIPPED order items (just the items from the OrderItems table ... I don't want the whole order). But whether the order is "shipped" is stored in the Order table.
So, is it possible to write a query that filters the OrderItems table based on the "shipped" value for the corresponding order in the Orders table?
If you're wondering, we're using the JS version of RethinkDB.
UPDATE:
OK, I figured it out on my own! Here is my solution. I'm not positive that it is the best way (and certainly isn't super efficient), so if anyone else has ideas I'd still love to hear them.
I did it by running a .merge() to create a new field based on the Order table, then did a filter based on that value.
A semi-generalized query with filter from another table for my problem looks like this:
r.table('orderItems')
.merge(function(orderItem){
return {
orderShipped: r.table('orders').get(orderItem('orderId')).pluck('shipped') // I am plucking just the "shipped" value, since I don't want the entire order
}
})
.filter(function(orderItem){
return orderItem('orderShipped')('shipped').gt(0) // Filtering based on that new "shipped" value
})
It's much easier to do it in a single filter:
r.table('orderItems').filter(function(orderItem){
return r.table('orders').get(orderItem('orderId'))('shipped').default(0).gt(0)
})
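To make the semantics of that filter explicit, here is an in-memory Python equivalent. The field names come from the question; the sample data is hypothetical, and this is not RethinkDB code:

```python
def shipped_value(order):
    """Mirror .default(0): a missing order, missing field or null counts as 0."""
    if order is None:
        return 0
    value = order.get("shipped")
    return 0 if value is None else value


def shipped_items(order_items, orders):
    orders_by_id = {order["id"]: order for order in orders}  # like get() by primary key
    return [item for item in order_items
            if shipped_value(orders_by_id.get(item["orderId"])) > 0]


orders = [{"id": 1, "shipped": 1}, {"id": 2, "shipped": 0}, {"id": 3}]
items = [{"id": "a", "orderId": 1}, {"id": "b", "orderId": 2},
         {"id": "c", "orderId": 1}, {"id": "d", "orderId": 3},
         {"id": "e", "orderId": 99}]
print([item["id"] for item in shipped_items(items, orders)])  # ['a', 'c']
```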
It's also better to add .default(0), so that an item whose order is missing (get returns null) doesn't make the query fail.
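It's probably better to create proper indexes before doing any lookups. Without an index, filter has to scan the entire table, which gets slow on big tables, and operations that load results into memory can run into RethinkDB's default array limit of 100,000 elements.
Also, get only looks documents up by primary key; to look documents up by another field (such as orderId), you need a secondary index.
A proper way is to use getAll and concatMap.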
First, create index:
r.table("orderItems").indexCreate("orderId")
r.table("orders").indexCreate("shipStatus", r.row("shipped").default(0).gt(0))
With that index, we can find all shipped orders:
r.table("orders").getAll(true, {index: "shipStatus"})
Now we use concatMap to turn each shipped order into its order items:
r.table("orders")
.getAll(true, {index: "shipStatus"})
.concatMap(function(order) {
return r.table("orderItems").getAll(order("id"), {index: "orderId"}).coerceTo("array")
})
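The index-based query can be mirrored in memory too. In this Python sketch (hypothetical data, not RethinkDB code) the dictionary stands in for the "orderId" secondary index and the list comprehension for concatMap:

```python
from collections import defaultdict


def shipped_items_via_index(order_items, orders):
    # Stand-in for the "orderId" secondary index on orderItems
    items_by_order = defaultdict(list)
    for item in order_items:
        items_by_order[item["orderId"]].append(item)
    # Stand-in for getAll(true, {index: "shipStatus"}): orders whose shipped value is > 0
    shipped_ids = [o["id"] for o in orders if (o.get("shipped") or 0) > 0]
    # concatMap: every shipped order expands into its (possibly empty) list of items
    return [item for order_id in shipped_ids for item in items_by_order.get(order_id, [])]


orders = [{"id": 1, "shipped": 1}, {"id": 2, "shipped": 0}, {"id": 3}]
items = [{"id": "a", "orderId": 1}, {"id": "b", "orderId": 2},
         {"id": "c", "orderId": 1}, {"id": "d", "orderId": 3}]
print([item["id"] for item in shipped_items_via_index(items, orders)])  # ['a', 'c']
```

In RethinkDB the database maintains the indexes, so the lookups avoid a full table scan; the dictionary here only imitates that.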

Is it possible to detect if the selected item is the first in LINQ-to-SQL?

I wonder how I can build a query expression that knows whether the item being selected is the first one or not. Say I'm selecting 10 items from the DB:
var query = db.Table.Take(10).Select(t => IsFirst ? t.Value1 : t.Value2);
There is an indexed variant of Select but that is not supported in LINQ-to-SQL. If it was supported my problems would be solved. Is there any other trick?
In T-SQL I could use ROW_NUMBER(), for instance, which LINQ-to-SQL uses internally but doesn't give access to.
I know I can Concat two queries, using the first expression in the first query and so forth, but I don't want to change the rest of the query, only the select statement itself, because the query is built in multiple places and this is where I want the first row to behave differently. I'll consider other options if that is not possible.
You can use the indexed overload, but you need to use the LINQ to Objects version (Take(10) is still translated to SQL; only the Select runs in memory):
var query =
    db.Table.Take(10).AsEnumerable()
    .Select((t, index) => index == 0 ? t.Value1 : t.Value2);
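The indexed Select works like enumerating with a position counter. A Python sketch of the same selection rule over hypothetical rows:

```python
def pick_values(rows):
    """Like Select((t, index) => index == 0 ? t.Value1 : t.Value2)."""
    return [row["value1"] if index == 0 else row["value2"]
            for index, row in enumerate(rows)]


rows = [{"value1": "A1", "value2": "A2"},
        {"value1": "B1", "value2": "B2"},
        {"value1": "C1", "value2": "C2"}]
print(pick_values(rows))  # ['A1', 'B2', 'C2']
```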
If Table has a primary key, you could do this:
var result = (
    from t in db.Table.Take(10)
    // assumes both Take(10) calls see rows in the same order; an OrderBy makes that deterministic
    let first = db.Table.Take(10).Select(ta => ta.PrimaryKey).First()
    select new
    {
        Value = (t.PrimaryKey == first ? t.Value1 : t.Value2)
    }
);

Linq Contains issue: cannot formulate the equivalent of 'WHERE IN' query

In the table ReservationWorkerPeriods there are records of all workers that are planned to work on a given period on any possible machine.
The additional table WorkerOnMachineOnConstructionSite contains columns workerId, MachineId and ConstructionSiteId.
From the table ReservationWorkerPeriods I would like to retrieve just the workers who work on the selected machine.
In order to retrieve just relevant records from WorkerOnMachineOnConstructionSite table I have written the following code:
var relevantWorkerOnMachineOnConstructionSite = (from cswm in currentConstructionSiteSchedule.ContrustionSiteWorkerOnMachine
where cswm.MachineId == machineId
select cswm).ToList();
workerOnMachineOnConstructionSite = relevantWorkerOnMachineOnConstructionSite as List<ContrustionSiteWorkerOnMachine>;
These records are also used in the application, so I don't want to bypass the above code, even if it is possible to directly retrieve just the workerPeriods for workers who work on the selected machine. Anyway, I haven't figured out how to retrieve the relevant workerPeriods once we know which userIDs are relevant.
I have tried the following code:
var userIDs = from w in workerOnMachineOnConstructionSite select new {w.WorkerId};
List<ReservationWorkerPeriods> workerPeriods = currentConstructionSiteSchedule.ReservationWorkerPeriods.ToList();
allocatedWorkers = workerPeriods.Where(wp => userIDs.Contains(wp.WorkerId));
but it seems to be incorrect, and I don't know how to fix it. Does anyone know what the problem is and how to retrieve just the records whose WorkerId is in the list of userIDs?
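Currently you are constructing an anonymous object on the fly with one property, so userIDs is a sequence of anonymous objects rather than IDs, and Contains(wp.WorkerId) cannot match them. You'll want to grab the ID directly (note the missing curly braces):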
var userIDs = from w in workerOnMachineOnConstructionSite select w.WorkerId;
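Also, in cases like this, don't call ToList on it: the variable userIDs only holds the query, not its results. If you use that variable in a further query against the database, the provider can translate both into a single SQL query. (That only applies while the outer query is still an IQueryable; in the code above, workerPeriods has already been loaded with ToList, so the filtering happens in memory.)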
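The fix boils down to a WHERE ... IN over bare IDs. A Python sketch of that logic with hypothetical data shaped after the question's tables; building a set first keeps each membership check cheap when the filtering runs in memory (in C#, a HashSet of the ID type would play the same role):

```python
def allocated_workers(worker_periods, assignments, machine_id):
    # Bare ids, not wrapper objects: the equivalent of "select w.WorkerId"
    user_ids = {a["WorkerId"] for a in assignments if a["MachineId"] == machine_id}
    # WHERE WorkerId IN (...): set membership instead of a linear Contains scan
    return [wp for wp in worker_periods if wp["WorkerId"] in user_ids]


assignments = [{"WorkerId": 1, "MachineId": 7}, {"WorkerId": 2, "MachineId": 8},
               {"WorkerId": 3, "MachineId": 7}]
periods = [{"WorkerId": 1, "Day": "Mon"}, {"WorkerId": 2, "Day": "Mon"},
           {"WorkerId": 3, "Day": "Tue"}]
print([p["WorkerId"] for p in allocated_workers(periods, assignments, 7)])  # [1, 3]
```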
