Sort lucene over score - sorting

I'm new to Lucene and I'm testing its sorting feature, with no luck.
I've tried using both TopFieldCollector and TopFieldDocs, but no sorting seems to be applied.
Below is my test code. What's wrong with it?
private void testNumericSorting() throws IOException, ParseException {
// 1. index some data
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
IndexWriter w = new IndexWriter(index, config);
addDoc(w, "orange", 1);
addDoc(w, "lemon orange", 10);
w.close();
// 2. query
String querystr = "orange";
Query q = new QueryParser(Version.LUCENE_35, "title", analyzer).parse(querystr);
int hitsPerPage = 10;
IndexSearcher searcher = new IndexSearcher(index, true);
// Normal score, with no sorting
//TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
//searcher.search(q, collector);
//ScoreDoc[] hits = collector.topDocs().scoreDocs;
// Score with TopFieldCollector
Sort sort = new Sort(new SortField[] {
SortField.FIELD_SCORE,
new SortField("num", SortField.INT) });
TopFieldCollector topField = TopFieldCollector.create(sort, hitsPerPage, true, true, true, false);
searcher.search(q, topField);
ScoreDoc[] sortedHits = topField.topDocs().scoreDocs;
// Score with TopFieldDocs
// TopFieldDocs topFields = searcher.search(q, null, hitsPerPage, sort);
// ScoreDoc[] sortedHits = topFields.scoreDocs;
System.out.println("Found " + sortedHits.length + " hits.");
for(int i=0;i<sortedHits.length;++i) {
int docId = sortedHits[i].doc;
float score = sortedHits[i].score;
Document d = searcher.doc(docId);
System.out.println((i + 1) + ". " + d.get("title")+ " score:"+score);
}
searcher.close();
}
private static void addDoc(IndexWriter w, String value, Integer num) throws IOException {
Document doc = new Document();
doc.add(new Field("title", value, Field.Store.YES, Field.Index.ANALYZED));
//doc.add(new NumericField("num", Field.Store.NO, false).setIntValue(num));
doc.add(new Field ("num", Integer.toString(num), Field.Store.NO, Field.Index.NOT_ANALYZED));
w.addDocument(doc);
}
If I print the results with and without sorting, I get the following output (basically no change):
Without sorting, found 2 hits.
1. orange score:0.5945348
2. lemon orange score:0.37158427
With sorting, found 2 hits.
1. orange score:0.5945348
2. lemon orange score:0.37158427

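The problem is that you are adding the "num" field as a String and then trying to sort it as an integer. You should either add it as an integer (using NumericField) or sort it as a String (but beware that it will then be sorted in lexicographical order, so "10" comes before "2"). Note also that your Sort lists SortField.FIELD_SCORE first, so "num" is only used to break ties between documents with equal scores. Since your two hits have different scores, the result order will match the plain score order either way.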
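To illustrate the lexicographical-order caveat, here is a minimal plain-Java sketch (no Lucene involved; the class and method names are made up for this example) that sorts the same values once as strings and once as integers:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StringVsNumericSort {
    // Sorts the values as plain strings (lexicographic order).
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Parses each value as an int and sorts numerically.
    static List<String> sortAsInts(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt(Integer::parseInt));
        return copy;
    }

    public static void main(String[] args) {
        List<String> nums = List.of("10", "2", "1");
        System.out.println(sortAsStrings(nums)); // [1, 10, 2]
        System.out.println(sortAsInts(nums));    // [1, 2, 10]
    }
}
```

The same effect is what you would see when a numeric value is indexed and sorted as text.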

Related

top "N" rows in each group using hibernate criteria

Top / Bottom / Random "N" rows in each group using Hibernate Criteria / Projections
How to retrieve top 5 Questions for Each Sub Topic
DetachedCriteria criteria = DetachedCriteria.forClass(Question.class);
ProjectionList projList = Projections.projectionList();
projList.add(Projections.property("questionId"));
projList.add(Projections.groupProperty("subTopicId"));
criteria.setProjection(projList);
List<Object> resultList = (List<Object>) getHibernateTemplate().findByCriteria(criteria);
Iterator<Object> itr = resultList.iterator();
List<Integer> questionIdList = new ArrayList<Integer>();
while(itr.hasNext()){
Object ob[] = (Object[])itr.next();
System.out.println(ob[0]+" -- "+ob[1]);
}
I am using the code below as a temporary workaround. Is there a solution using the Hibernate Criteria API, or any other way to get this result?
Set<Integer> getQuestionsBySubTopicWithLimit(Set<Integer> questionIdsSet, Integer subjectId, Integer limit, Integer status) {
DetachedCriteria criteria = DetachedCriteria.forClass(Question.class);
if(subjectId!=null && subjectId!=0){
criteria.add(Restrictions.eq("subjectId", subjectId));
}
if(status!=null){
criteria.add(Restrictions.eq("status", status));
}
if(questionIdsSet!=null && !questionIdsSet.isEmpty()){
criteria.add(Restrictions.not(Restrictions.in("questionId", questionIdsSet)));
}
ProjectionList projList = Projections.projectionList();
projList.add(Projections.property("questionId"));
projList.add(Projections.property("subTopicId"));
criteria.add(Restrictions.sqlRestriction("1=1 order by sub_topic_id, rand()"));
criteria.setProjection(projList);
List<Object> resultList = (List<Object>) getHibernateTemplate().findByCriteria(criteria);
Iterator<Object> itr = resultList.iterator();
Set<Integer> tmpQuestionIdsSet = new HashSet<Integer>();
Integer subTopicId = 0, tmpSubTopicId = 0;
Integer count = 0;
while(itr.hasNext()){
Object ob[] = (Object[])itr.next();
if(count==0){
subTopicId = (Integer) ob[1];
}
tmpSubTopicId = (Integer) ob[1];
if(!tmpSubTopicId.equals(subTopicId)){ // compare Integer values, not object references
subTopicId = tmpSubTopicId;
count = 0;
}
count++;
if(count<=limit){
tmpQuestionIdsSet.add((Integer) ob[0]);
}
}
return tmpQuestionIdsSet;
}
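For reference, the in-memory part of this workaround can be written more compactly. The following is only a sketch, not a Criteria-based solution: `Row` is a hypothetical holder for the projected (questionId, subTopicId) pairs, and the method keeps the first `limit` ids per sub-topic in the order the rows arrive. Unlike the loop above, it does not depend on the rows being sorted by sub-topic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopNPerGroup {
    // Hypothetical row type: one projected (questionId, subTopicId) pair.
    static final class Row {
        final int questionId;
        final int subTopicId;

        Row(int questionId, int subTopicId) {
            this.questionId = questionId;
            this.subTopicId = subTopicId;
        }
    }

    // Keeps at most `limit` question ids per sub-topic, in encounter order.
    static Map<Integer, List<Integer>> firstNPerSubTopic(List<Row> rows, int limit) {
        Map<Integer, List<Integer>> byTopic = new LinkedHashMap<>();
        for (Row row : rows) {
            List<Integer> ids = byTopic.computeIfAbsent(row.subTopicId, k -> new ArrayList<>());
            if (ids.size() < limit) {
                ids.add(row.questionId);
            }
        }
        return byTopic;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 10), new Row(2, 10), new Row(3, 10), new Row(4, 20));
        System.out.println(firstNPerSubTopic(rows, 2)); // {10=[1, 2], 20=[4]}
    }
}
```

If the rows are shuffled before grouping (as `rand()` does in the query above), this yields a random N per group.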

how to find average of each column of a datatable using c#

I have a .csv file containing names, roll numbers, and marks for each subject. I parsed it into a DataTable and calculated the highest mark for each subject. Now I want to calculate the average mark for each subject. Can anyone help me with this?
This was my output.
Highest mark for ComputerScience:
Name : Manoj
Roll Number : 1212334556
Mark : 94
Highest Mark for Biology:
Name : Sandeep
Roll Number : 1223456477
Mark : 90
Highest Mark for Commerce:
Name : BarathRam
Roll Number : 1212345664
Mark : 97
The CSV file contains the columns Names, Rollno, Computer, Biology, Commerce.
Now all I need is the average of each subject.
My code:
static DataTable table;
static void Main(string[] args)
{
StreamReader r = new StreamReader(@"C:\Users\GOPINATH\Desktop\stud1.csv");
string line = r.ReadLine(); //reads first line - column header
string[] part = line.Split(','); //splits the line by comma
createDataTable(part);
//copy from CSV to DataTable<String,String,int,int,int>
while ((line = r.ReadLine()) != null)
{
try
{
part = line.Split(',');
table.Rows.Add(part[0], part[1], Convert.ToInt32(part[2]), Convert.ToInt32(part[3]), Convert.ToInt32(part[4]));
}
catch(Exception e)
{
Console.WriteLine(e.Message);
}
}
r.Close();
int mark1_index = 0, mark2_index = 0, mark3_index = 0; //initialize the row index of the highest mark for each subject to 0
//finding the index of the highest mark for each subject
for(int i=0 ; i<table.Rows.Count ; i++)
{
if (Convert.ToInt32(table.Rows[i][2]) > Convert.ToInt32(table.Rows[mark1_index][2])) //subject1
{
mark1_index = i;
}
if (Convert.ToInt32(table.Rows[i][3]) > Convert.ToInt32(table.Rows[mark2_index][3])) //subject2
{
mark2_index = i;
}
if (Convert.ToInt32(table.Rows[i][4]) > Convert.ToInt32(table.Rows[mark3_index][4])) //subject3
{
mark3_index = i;
}
}
printmark(table,mark1_index, 2);
printmark(table,mark2_index, 3);
printmark(table,mark3_index, 4);
Console.Read();
}
public static void createDataTable(string[] columnName)
{
//create DataTable<String,String,int,int,int>
table = new DataTable();
table.Columns.Add(columnName[0], typeof(String));
table.Columns.Add(columnName[1], typeof(String));
table.Columns.Add(columnName[2], typeof(int));
table.Columns.Add(columnName[3], typeof(int));
table.Columns.Add(columnName[4], typeof(int));
}
public static void printmark(DataTable t, int rowIndex, int columnIndex)
{
Console.WriteLine("Highest mark for " + t.Columns[columnIndex].ColumnName + ":");
Console.WriteLine("\tName: " + (string)t.Rows[rowIndex][0]);
Console.WriteLine("\tRoll Number: " + (string)t.Rows[rowIndex][1]);
Console.WriteLine("\tMark: " + (int)t.Rows[rowIndex][columnIndex]);
}
}
}
You could use Linq and do this.
DataTable t;
var average = t.AsEnumerable().Average(x=> x.Field<int>("columnname"));
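var result = table.AsEnumerable()
.GroupBy(row => row.Field<string>("Subject"))
.Select(g => new
{
Subject = g.Key,
Average = g.Average(row => row.Field<int>("Mark"))
}).ToList();
The second form applies when the table has one row per (Subject, Mark) pair: to calculate the average mark by Subject, you first group by Subject and then calculate the average for each group. With the table from the question, which has one column per subject, the first form is enough: call Average once for each subject column.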
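The per-column calculation itself is language-independent. As a rough sketch in Java (the mark values and the class name are invented for illustration, and rows hold only the three mark columns):

```java
import java.util.List;

final class ColumnAverages {
    // Averages one numeric column of a row-major table; returns NaN for an empty table.
    static double columnAverage(List<int[]> rows, int column) {
        return rows.stream().mapToInt(r -> r[column]).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Each row: marks for three subjects (hypothetical sample data).
        List<int[]> rows = List.of(new int[] {94, 80, 70}, new int[] {60, 90, 97});
        for (int col = 0; col < 3; col++) {
            System.out.println("Average of column " + col + ": " + columnAverage(rows, col));
        }
    }
}
```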

How do you use linq to group records based on an accumulator?

Given an enumeration of records in the format:
Name (string)
Amount (number)
For example:
Laverne 4
Lenny 2
Shirley 3
Squiggy 5
I want to group the records, so that each group's total Amount does not exceed some limit-per-group. For example, 10.
Group 1 (Laverne,Lenny,Shirley) with Total Amount 9
Group 2 (Squiggy) with Total Amount 5
The Amount number is guaranteed to always be less than the grouping limit.
If you allow for captured variables to maintain state, then it becomes easier. If we have:
int limit = 10;
Then:
int groupTotal = 0;
int groupNum = 0;
var grouped = records.Select(r =>
{
int newCount = groupTotal + r.Amount;
if (newCount > limit)
{
groupNum++;
groupTotal = r.Amount;
}
else
groupTotal = newCount;
return new{Records = r, Group = groupNum};
}
).GroupBy(g => g.Group, g => g.Records);
It's O(n), and just a Select and a GroupBy, but the use of captured variables may not be as portable across providers as one may want though.
For linq-to-objects though, it's fine.
Here I have a solution using only LINQ functions:
// Record definition
class Record
{
public string Name;
public int Amount;
public Record(string name, int amount)
{
Name = name;
Amount = amount;
}
}
// actual code for setup and LINQ
List<Record> records = new List<Record>()
{
new Record("Laverne", 4),
new Record("Lenny", 2),
new Record("Shirley", 3),
new Record("Squiggy", 5)
};
int groupLimit = 10;
// the solution
List<Record[]> test =
records.GroupBy(record => records.TakeWhile(r => r != record)
.Concat(new[] { record })
.Sum(r => r.Amount) / (groupLimit + 1))
.Select(g => g.ToArray()).ToList();
This gives the correct result:
test =
{
{ [ "Laverne", 4 ], [ "Lenny", 2 ], [ "Shirley", 3 ] },
{ [ "Squiggy", 5 ] }
}
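One downside is that this is O(n²). It essentially groups by the index of the group, computed from the running sum of the records up to and including the current one. Note that groupLimit + 1 is needed so that a running sum from 0 to groupLimit, inclusive, maps to group 0. Be aware that the running sum is never reset at a group boundary, so this does not hold for every input: with five records of 4 each, the sums 12, 16 and 20 all land in group 1, which then totals 12 and exceeds the limit.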
I'm trying to find a way of making it prettier, but it doesn't look easy.
A dotnet fiddle with a solution using Aggregate:
https://dotnetfiddle.net/gVgONH
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
// Record definition
public class Record
{
public string Name;
public int Amount;
public Record(string name, int amount)
{
Name = name;
Amount = amount;
}
}
public static void Main()
{
// actual code for setup and LINQ
List<Record> records = new List<Record>()
{
new Record("Alice", 1),
new Record("Bob", 5),
new Record("Charly", 4),
new Record("Laverne", 4),
new Record("Lenny", 2),
new Record("Shirley", 3),
new Record("Squiggy", 5)
};
int groupLimit = 10;
int sum = 0;
var result = records.Aggregate(new List<List<Record>>(), (accumulated, next) =>
{
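// Note: ">=" starts a new team as soon as the total would reach groupLimit; use ">" to allow a team to total exactly groupLimit.
if ((sum + next.Amount >= groupLimit) || accumulated.Count() == 0)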
{
Console.WriteLine("New team: " + accumulated.Count());
accumulated.Add(new List<Record>());
sum = 0;
}
sum += next.Amount;
Console.WriteLine("New member {0} ({1}): adds up to {2} ", next.Name, next.Amount, sum);
accumulated.Last().Add(next);
return accumulated;
}
);
Console.WriteLine("Team count: " + result.Count());
}
}
With output:
New team: 0
New member Alice (1): adds up to 1
New member Bob (5): adds up to 6
New team: 1
New member Charly (4): adds up to 4
New member Laverne (4): adds up to 8
New team: 2
New member Lenny (2): adds up to 2
New member Shirley (3): adds up to 5
New team: 3
New member Squiggy (5): adds up to 5
Team count: 4
There is no 'performant' way to do this with the built in Linq operators that I am aware of. You could create your own extension method, though:
public static class EnumerableExtensions
{
public static IEnumerable<TResult> GroupWhile<TSource, TAccumulation, TResult>(
this IEnumerable<TSource> source,
Func<TAccumulation> seedFactory,
Func<TAccumulation, TSource, TAccumulation> accumulator,
Func<TAccumulation, bool> predicate,
Func<TAccumulation, IEnumerable<TSource>, TResult> selector)
{
TAccumulation accumulation = seedFactory();
List<TSource> result = new List<TSource>();
using(IEnumerator<TSource> enumerator = source.GetEnumerator())
{
while(enumerator.MoveNext())
{
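// Only close the current group if it has members; otherwise an oversized first record would yield an empty group.
if(result.Count > 0 && !predicate(accumulator(accumulation, enumerator.Current)))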
{
yield return selector(accumulation, result);
accumulation = seedFactory();
result = new List<TSource>();
}
result.Add(enumerator.Current);
accumulation = accumulator(accumulation, enumerator.Current);
}
if(result.Count > 0)
{
yield return selector(accumulation, result);
}
}
}
}
And then call it like this:
int limit = 10;
var groups =
records
.GroupWhile(
() => 0,
(a, x) => a + x.Amount,
(a) => a <= limit,
(a, g) => new { Total = a, Group = g });
The way it is currently written, if any single record exceeds that limit then that record is returned by itself. You could modify it to exclude records that exceed the limit or leave it as is and perform the exclusion with Where.
This solution has O(n) runtime.
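The single-pass idea behind the O(n) answers can be sketched independently of LINQ. The Java version below is an illustration only (the class and method names are made up, and it groups bare amounts rather than records): it starts a new group whenever adding the next amount would push the running total over the limit.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupByRunningTotal {
    // Splits amounts into consecutive groups whose totals do not exceed limit.
    // An amount larger than limit ends up in a group of its own.
    static List<List<Integer>> groupByLimit(List<Integer> amounts, int limit) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int total = 0;
        for (int amount : amounts) {
            if (!current.isEmpty() && total + amount > limit) {
                groups.add(current);          // close the full group
                current = new ArrayList<>();  // and start a fresh one
                total = 0;
            }
            current.add(amount);
            total += amount;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Amounts for Laverne, Lenny, Shirley, Squiggy from the question.
        System.out.println(groupByLimit(List.of(4, 2, 3, 5), 10)); // [[4, 2, 3], [5]]
    }
}
```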

lucene.net search multiple fields with one value AND other field with another value

I have a Lucene doc with various fields: Name, BriefData, FullData, ParentIDs (comma-delimited string), ProductType, Experience.
I have a search form with a text box, drop down of parents, dropdown of product types, dropdown of experience.
If I search from the text box I get the results I should. If I search from any of dropdowns (or all of them) I get the results I want. If I use the dropdowns AND the textbox I get all results as a search of textbox OR dropdowns. What I want is textbox AND dropdowns.
So, my search builds something like so:
if (string.IsNullOrWhiteSpace(searchTerm))
{
searchTerm = "";
if (!string.IsNullOrWhiteSpace(Request.QueryString["textbox"]))
{
string tester = Request.QueryString["textbox"];
searchTerm += tester;
}
if (!string.IsNullOrWhiteSpace(Request.QueryString["parent"]))
{
searchTerm += searchTerm.Length > 0 ? " " : "";
searchTerm += "+ParentIDs:" + Request.QueryString["parent"];
}
if (!string.IsNullOrWhiteSpace(Request.QueryString["product"]))
{
ProductTypes pt = db.ProductTypes.Find(int.Parse(Request.QueryString["product"]));
if (pt != null) {
searchTerm += searchTerm.Length > 0 ? " " : "";
searchTerm += "+ProductType:" + pt.TypeName;
}
}
if (!string.IsNullOrWhiteSpace(Request.QueryString["experience"]))
{
searchTerm += searchTerm.Length > 0 ? " " : "";
searchTerm += "+Experience:" + Request.QueryString["experience"];
}
}
if (!Directory.Exists(Helper.LuceneSearch._luceneDir))
Directory.CreateDirectory(Helper.LuceneSearch._luceneDir);
_searchResults = string.IsNullOrEmpty(searchField)
? Helper.LuceneSearch.Search(searchTerm).Distinct()
: Helper.LuceneSearch.Search(searchTerm, searchField).Distinct();
return View(_searchResults.Distinct());
If I am searching just textbox and dropdown parent I get a searchterm of "north +ParentIDs:62"
What I want is the search to ONLY return results with a parent of 62 AND (Name OR BriefData OR FullData of "north").
I have tried creating a searchTerm of "+(Name:north BriefData:north FullData:north) +ParentIDs:62" and "Name:north BriefData:north FullData:north +ParentIDs:62". The first returns no results and the second returns the same as just searching +ParentIDs:62.
I think the logic behind this is pretty simple. However, I have no idea what it is that I need to write in code.
Please help. :)
Thanks to JF Beaulac giving me cause to look at the Lucene.Net code I had included (Helper.LuceneSearch.Search(searchTerm).Distinct()) I rewrote my search to essentially not bother using that but instead to somewhat duplicate it.
I did this by using the MultiFieldQueryParser for the, oddly enough, multi-field search I wanted. I then used the TermQuery for single field queries. These were all added to a BooleanQuery and my search was executed against said BooleanQuery.
var hits_limit = 1000;
var analyzer = new StandardAnalyzer(Version.LUCENE_29);
BooleanQuery bq = new BooleanQuery();
if (string.IsNullOrWhiteSpace(searchTerm))
{
searchTerm = "";
if (!string.IsNullOrWhiteSpace(Request.QueryString["textbox"]))
{
string tester = Request.QueryString["textbox"];
var parser = new MultiFieldQueryParser(Version.LUCENE_29, new[] { "Name", "BriefData", "FullData" }, analyzer);
var query = Helper.LuceneSearch.parseQuery(tester.Replace("*", "").Replace("?", ""), parser);
bq.Add(query, BooleanClause.Occur.MUST);
}
if (!string.IsNullOrWhiteSpace(Request.QueryString["parent"]))
{
bq.Add(new TermQuery(new Term("ParentIDs", Request.QueryString["parent"])), BooleanClause.Occur.MUST);
}
if (!string.IsNullOrWhiteSpace(Request.QueryString["product"]))
{
ProductTypes pt = db.ProductTypes.Find(int.Parse(Request.QueryString["product"]));
if (pt != null) {
bq.Add(new TermQuery(new Term("ProductType", pt.TypeName)), BooleanClause.Occur.MUST);
}
}
if (!string.IsNullOrWhiteSpace(Request.QueryString["experience"]))
{
bq.Add(new TermQuery(new Term("Experience", Request.QueryString["experience"])), BooleanClause.Occur.MUST);
}
}
if (!System.IO.Directory.Exists(Helper.LuceneSearch._luceneDir))
System.IO.Directory.CreateDirectory(Helper.LuceneSearch._luceneDir);
var searcher = new IndexSearcher(Helper.LuceneSearch._directory, false);
var hits = searcher.Search(bq, null, hits_limit, Sort.RELEVANCE).ScoreDocs;
var results = Helper.LuceneSearch._mapLuceneToDataList(hits, searcher).Distinct();
analyzer.Close();
searcher.Close();
searcher.Dispose();
return View(results);
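It should be noted that to get the product and experience fields to work I had to set them to "Field.Index.NOT_ANALYZED" when adding them to the index. I originally guessed this was because they would only ever have a single value per document. The more likely reason is that a TermQuery does not run its term through the analyzer, so it only matches when the indexed token is exactly the supplied value, whereas an analyzed field is lowercased and split into tokens by StandardAnalyzer. The other searched fields are "Field.Index.ANALYZED".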

Dynamic Linq to Datatable Derived Field

Is it possible to use Dynamic Linq to run a query similar to:
Select a, b, a + b as c
from MyDataTable
I have an application where the user can enter SQL statements, the results of these statements are then assigned to a DataTable. There is also the option to derive a field based on other fields. (e.g. user can say field C = a + b, or field D = A*B+10 etc).
Ideally I would like to do something similar to:
string myCalc = "Convert.ToDouble(r.ItemArray[14])+Convert.ToDouble(r.ItemArray[45])";
var parameters = from r in dt.AsEnumerable()
select (myCalc);
What I want to do in this example is add the value of column 14 to column 45 and return it. It's up to the user to decide what expression to use so the text in the select needs to be from a string, I cannot hard code the expression. The string myCalc is purely for demonstration purposes.
You could do that using a Dictionary, a DataReader and dynamic queries. Here is an example based in part on RecordToExpando from Rob Conery's Massive ORM:
void Main()
{
string connString = "your connection string";
System.Data.SqlClient.SqlConnection conn = new SqlConnection(connString);
string statement = "SUM = EstimatedEffort + OriginalEstimate, Original = OriginalEstimate";
// Note: You should parse the statement so it doesn't have any updates or inserts in it.
string sql = "SELECT " + statement +" FROM Activities";
List<IDictionary<string, object>> results = new List<IDictionary<string, object>>();
conn.Open();
using(conn)
{
var cmd = new SqlCommand(sql, conn);
var reader = cmd.ExecuteReader();
while (reader.Read())
{
var dic = new Dictionary<string, object>();
for (int i = 0; i < reader.FieldCount; i++)
{
dic.Add(
reader.GetName(i),
DBNull.Value.Equals(reader[i]) ? null : reader[i]);
}
results.Add(dic);
}
}
foreach (var dicRow in results)
{
foreach (string key in dicRow.Keys)
{
Console.Write("Key: " + key + " Value: " + dicRow[key]);
}
Console.WriteLine();
}
}
Something like this:
void Main()
{
var dataTable = new DataTable();
dataTable.Columns.Add("a", typeof(double));
dataTable.Columns.Add("b", typeof(double));
dataTable.Rows.Add(new object[] { 10, 20 });
dataTable.Rows.Add(new object[] { 30, 40 });
string myCalc = "Convert.ToDouble(ItemArray[0]) + Convert.ToDouble(ItemArray[1])";
var query = dataTable.AsEnumerable().AsQueryable();
var result = query.Select(myCalc);
foreach (Double c in result)
{
System.Console.WriteLine(c);
}
}
