Examine search: matching words that contain the search term

I am trying to write a search script using the Examine fluent API.
The search has to meet these conditions:
search must find nodes starting with searchTerm
search must find nodes containing searchTerm
search must find nodes ending with searchTerm
search must support multiple words
search must not fail on & * ` and other special characters
So far I can only match words starting with the search string. When I execute the code below, I only get words starting with the searchTerm:
public IEnumerable<SearchResultItem> Search(string searchTerm)
{
    // Create search criteria
    var sc = ExamineManager.Instance.CreateSearchCriteria();
    // Define the query
    var query = sc.NodeName(searchTerm.MultipleCharacterWildcard())
        .Or()
        .Field("content", searchTerm.MultipleCharacterWildcard())
        .Compile();
    var results = ExamineManager.Instance.SearchProviderCollection["ContentSearcher"].Search(query);
    return results.OrderBy(x => x.Score).Select(MapSearchResults);
}
How do I update the search script so that it meets all of these conditions?

Solution with a raw query.
This should cover:
search must find nodes starting with searchTerm
search must find nodes containing searchTerm
search must find nodes ending with searchTerm
search must support multiple words
// Split the search text on whitespace into individual words
var searchTerm = Request["term"].Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var searchCriteria = searcher.CreateSearchCriteria();
var luceneString = new System.Text.StringBuilder();
luceneString.Append("nodeTypeAlias:");
luceneString.Append("*");
for (int i = 0; i < searchTerm.Length; i++)
{
    // Add one title:*word* clause per word; the wildcards on both sides match
    // words that start with, contain, or end with the term.
    // Note: the term is appended unescaped, so Lucene special characters in it
    // can still break the query.
    luceneString.Append(" AND ");
    luceneString.Append("title:");
    luceneString.Append("*");
    luceneString.Append(searchTerm[i]);
    luceneString.Append("*");
}
var query = searchCriteria.RawQuery(luceneString.ToString());
var searchResults = searcher.Search(query);
This article helped me: http://www.lucenetutorial.com/lucene-query-syntax.html
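The raw query above does not address the last condition from the question: a word containing Lucene syntax characters is appended as-is. A minimal sketch of the same string-building step with escaping added, written in Python for brevity. The function names and the default field name "title" are assumptions for illustration; the logic would need to be ported to C# before it could be used with Examine.

```python
import re

# Characters with special meaning in the classic Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene syntax characters in a single word."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def build_contains_query(search_text: str, field: str = "title") -> str:
    """Build field:*word* clauses joined by AND, one per whitespace-separated word."""
    words = search_text.split()
    clauses = [f"{field}:*{escape_term(word)}*" for word in words]
    return " AND ".join(clauses)


print(build_contains_query("lee jeans"))
print(build_contains_query("a&b c*"))
```

Escaping each word before wrapping it in wildcards keeps the surrounding * characters as the only wildcards in the clause. Whether the escaped characters are actually present in the index depends on the analyzer, which this sketch does not model.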

Related

Browse all documents and bulk update some of them

I am using the Jest client for Elastic to browse an index of documents and update one field. My workflow is to run an empty query with paging and check whether I can compute the extra field. If I can, I update the relevant documents in one bulk update.
Pseudo-code
private void process() {
    int from = 0
    int size = this.properties.batchSize
    boolean moreResults = true
    while (moreResults) {
        moreResults = handleBatch(from, this.properties.batchSize)
        from += size
    }
}
private boolean handleBatch(int from, int size) {
    log.info("Processing records $from to " + (from + size))
    def result = search(from, size)
    if (result.isSucceeded()) {
        // Check each element and perform an upgrade
    }
    // return true if the query returned at least one item
}
private SearchResult search(int from, int size) {
    String query =
        '{ "from": ' + from + ', ' +
        '"size": ' + size + '}'
    Search search = new Search.Builder(query)
        .addIndex("my-index")
        .addType('my-document')
        .build();
    jestClient.execute(search)
}
I don't get any errors, but when I run the batch several times, it looks like it is finding "new" documents to upgrade even though the total number of documents hasn't changed. I suspected that an updated document was being processed several times, which I confirmed by checking the processed IDs.
How can I run a query so that the original documents are the ones processed and any update wouldn't interfere with it?
Instead of running a normal search (i.e. using from + size), you need to run a scroll search. The main difference is that a scroll freezes a snapshot of the documents as they were at the time of the initial query and iterates over that snapshot. Changes made after the initial scroll request are not reflected in the results.
Using Jest, you need to modify your code to look more like this:
// 1. Initiate the scroll request
Search search = new Search.Builder(searchSourceBuilder.toString())
    .addIndex("my-index")
    .addType("my-document")
    .addSort(new Sort("_doc"))
    .setParameter(Parameters.SIZE, size)
    .setParameter(Parameters.SCROLL, "5m")
    .build();
JestResult result = jestClient.execute(search);
// 2. Get the scroll_id to use in subsequent requests
String scrollId = result.getJsonObject().get("_scroll_id").getAsString();
// 3. Issue scroll search requests until you have retrieved all results
boolean moreResults = true;
while (moreResults) {
    // The initial response already carries the first batch, so read hits before scrolling
    def hits = result.getJsonObject().getAsJsonObject("hits").getAsJsonArray("hits");
    moreResults = hits.size() > 0;
    if (moreResults) {
        // Check each hit and perform the upgrade here, then fetch the next batch
        SearchScroll scroll = new SearchScroll.Builder(scrollId, "5m")
            .setParameter(Parameters.SIZE, size).build();
        result = jestClient.execute(scroll);
        // Always continue with the most recent scroll_id
        scrollId = result.getJsonObject().get("_scroll_id").getAsString();
    }
}
You need to modify your process and handleBatch methods with the above code. It should be straightforward, let me know if not.
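To see why paging over a live result set can process some documents twice and skip others, here is a toy in-memory model in Python. It is not Elasticsearch code. It assumes, purely for illustration, that updating a document moves it to the end of the result order; the function names are hypothetical.

```python
def process_with_paging(index, batch_size):
    """Page with from/size over a live list; each update moves the document."""
    processed = []
    start = 0
    while start < len(index):
        for doc_id in list(index[start:start + batch_size]):
            processed.append(doc_id)
            # Toy model of an update: the rewritten document moves to the end.
            index.remove(doc_id)
            index.append(doc_id)
        start += batch_size
    return processed


def process_with_snapshot(index, batch_size):
    """Freeze the ID order first (the role a scroll plays), then update freely."""
    snapshot = tuple(index)
    processed = []
    for start in range(0, len(snapshot), batch_size):
        for doc_id in snapshot[start:start + batch_size]:
            processed.append(doc_id)
            index.remove(doc_id)
            index.append(doc_id)
    return processed


print(process_with_paging([1, 2, 3, 4, 5, 6], 2))    # [1, 2, 5, 6, 5, 6]
print(process_with_snapshot([1, 2, 3, 4, 5, 6], 2))  # [1, 2, 3, 4, 5, 6]
```

In the paged run, documents 3 and 4 are never seen while 5 and 6 are processed twice, which matches the symptom described in the question. Iterating over the frozen snapshot visits every document exactly once.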

Sitecore search: Get results term by term

Here, I am trying to get search results for multiple terms. Say fulltext="Lee jeans", then regexresult={"lee","jeans"}.
Code:
IProviderSearchContext searchContext = index.CreateSearchContext();
IQueryable<SearchItem> scQuery = searchContext.GetQueryable<SearchItem>();
var predicate = PredicateBuilder.True<SearchItem>();
// Check whether the fulltext includes terms within " "
var regexResult = SearchRegexHelper.getSearchRegexResult(fulltext);
regexResult.Remove(" ");
foreach (string term in regexResult)
{
    predicate = predicate.Or(p => p.TextContent.Contains(term));
}
scQuery = scQuery.Where(predicate);
IEnumerable<SearchHit<SearchItem>> results = scQuery.GetResults().Hits;
results = sortResult(results);
Sorting is based on sitecore fields:
switch (query.Sort)
{
    case SearchQuerySort.Date:
        results = results.OrderBy(x => GetValue(x.Document, FieldNames.StartDate));
        break;
    case SearchQuerySort.Alphabetically:
        results = results.OrderBy(x => GetValue(x.Document, FieldNames.Profile));
        break;
    case SearchQuerySort.Default:
    default:
        results = results.OrderByDescending(x => GetValue(x.Document, FieldNames.Updated));
        break;
}
Now, what I need is to get the results for "lee" first and sort them, then get the results for "jeans" and sort them. The final search result is the concatenation of the two sorted sets: the items for "lee" first, then the items for "jeans".
Is there a way to get results term by term?
You can use Query-Time Boosting to give the terms more relevance and therefore affect the ranking:
Sitecore 7: Six Types of Search Boosting
Lucene Boost With LINQ in Sitecore 7 ContentSearch
You want to give the first term the highest boost, and then gradually reduce for each additional term:
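var regexResult = SearchRegexHelper.getSearchRegexResult(fulltext);
regexResult.Remove(" ");
float boost = regexResult.Count();
foreach (string term in regexResult)
{
    // Copy the current boost into a local, because an expression tree cannot contain boost--
    float termBoost = boost--;
    // Apply the boost to the term clause inside the lambda so the search provider sees it
    predicate = predicate.Or(p => p.TextContent.Contains(term).Boost(termBoost));
}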
EDIT:
Boosting and sorting cannot be combined usefully in the same query: sorting overrides the relevance order that boosting produced.
An alternative is to search once per term and concatenate the results into a single list. This is less efficient, since you are running multiple searches:
IProviderSearchContext searchContext = index.CreateSearchContext();
var items = new List<SearchResultItem>();
var regexResult = SearchRegexHelper.getSearchRegexResult(fulltext);
regexResult.Remove(" ");
foreach (string term in regexResult)
{
    var results = searchContext.GetQueryable<SearchResultItem>()
        .Where(p => p.Content.Contains(term));
    // OrderBy returns a new sequence, so add the sorted results that the helper returns
    items.AddRange(SortSearchResults(results));
}
NOTE: The above does not take into account duplicates between the result sets.
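One way to handle the duplicates is to keep only the first occurrence of each item while concatenating, so an item matching both "lee" and "jeans" stays in the "lee" group. A minimal Python sketch of that idea follows; the document structure, the "id" key and the in-memory search function are assumptions for illustration and stand in for the real Sitecore calls.

```python
def concat_per_term(terms, search, sort_key):
    """Concatenate per-term result lists, keeping each item's first occurrence."""
    seen = set()
    combined = []
    for term in terms:
        for item in sorted(search(term), key=sort_key):
            if item["id"] not in seen:
                seen.add(item["id"])
                combined.append(item)
    return combined


# Hypothetical in-memory "index" standing in for the real search call.
DOCS = [
    {"id": 1, "title": "Lee jeans", "text": "lee jeans"},
    {"id": 2, "title": "Blue jeans", "text": "blue jeans"},
    {"id": 3, "title": "Lee jacket", "text": "lee jacket"},
]


def search(term):
    return [doc for doc in DOCS if term in doc["text"]]


result = concat_per_term(["lee", "jeans"], search, sort_key=lambda d: d["title"])
print([doc["id"] for doc in result])  # [3, 1, 2]
```

The items for "lee" come first, sorted by title, followed by the remaining items for "jeans"; document 1 matches both terms and appears only once.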

How to retrieve the total view count of a large number of pages combined from the GA API

We are interested in the combined statistics of several pages from the Google Analytics Core Reporting API. The only way I found to query statistics for multiple pages at the same time is to create a filter like this:
ga:pagePath==page?id=a,ga:pagePath==page?id=b,ga:pagePath==page?id=c
This gets escaped inside the filter parameter of the GET query.
However when the GET query gets over 2000 characters I get the following response:
414. That’s an error.
The requested URL /analytics/v3/data/ga... is too large to process. That’s all we know.
Note that, as in the example call, the only part that differs per page is a GET parameter in the pagePath, but each ORed filter has to repeat both the dimension (ga:pagePath) and the part of the path that is always identical.
Is there any way to specify a large number of different pages to query without hitting this limit in the GET query (I can't find any documentation for doing POST requests)? Or are there alternatives to creating batches of a max of X different pages per query and adding them up on my end?
Instead of using ga:pagePath as part of a filter you should use it as a dimension. You can get up to 10,000 rows per query this way and paginate to get all results. Then parse the results client side to get what you need. Additionally use a filter to scope the results down if possible based on your site structure or page names.
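The client-side step described above can be sketched as follows, in Python for brevity. The fetch_page callable is a hypothetical stand-in for the real API request; it is assumed to return rows of (pagePath, pageviews) with the count as a string, and an empty list once the results are exhausted.

```python
def total_pageviews(fetch_page, wanted_paths, page_size=10000):
    """Sum pageviews over all result pages, keeping only the wanted paths."""
    wanted = set(wanted_paths)
    total = 0
    start_index = 1  # assumed 1-based, like the start index in the C# sample
    while True:
        rows = fetch_page(start_index, page_size)
        if not rows:
            break
        for path, views in rows:
            if path in wanted:
                total += int(views)
        start_index += page_size
    return total


# Stand-in for the real API call: slices a fixed list of (pagePath, pageviews) rows.
FAKE_ROWS = [("page?id=a", "10"), ("page?id=b", "5"), ("page?id=z", "99"), ("page?id=c", "1")]


def fake_fetch(start_index, max_results):
    return FAKE_ROWS[start_index - 1:start_index - 1 + max_results]


print(total_pageviews(fake_fetch, ["page?id=a", "page?id=b", "page?id=c"], page_size=2))  # 16
```

Because the list of wanted pages is only used on the client, it never ends up in the request URL, so its length does not count toward the URL size limit.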
Here is sample code that fetches more than 10,000 records by requesting one page of results at a time:
private void GetDataofPpcInfo(DateTime dtStartDate, DateTime dtEndDate, AnalyticsService gas, List<PpcReportData> lstPpcReportData, string strProfileID)
{
    int intStartIndex = 1;
    int intIndexCnt = 0;
    int intMaxRecords = 10000;
    var metrics = "ga:impressions,ga:adClicks,ga:adCost,ga:goalCompletionsAll,ga:CPC,ga:visits";
    var r = gas.Data.Ga.Get("ga:" + strProfileID, dtStartDate.ToString("yyyy-MM-dd"), dtEndDate.ToString("yyyy-MM-dd"),
        metrics);
    r.Dimensions = "ga:campaign,ga:keyword,ga:adGroup,ga:source,ga:isMobile,ga:date";
    r.MaxResults = intMaxRecords;
    r.Filters = "ga:medium==cpc;ga:campaign!=(not set)";
    while (true)
    {
        // Request the next page of up to intMaxRecords rows
        r.StartIndex = intStartIndex;
        var dimensionOneData = r.Fetch();
        // Check for null before touching the response; a page without rows ends the loop
        if (dimensionOneData != null && dimensionOneData.Rows != null)
        {
            dimensionOneData.ItemsPerPage = intMaxRecords;
            intIndexCnt++;
            foreach (var lstFirst in dimensionOneData.Rows)
            {
                var objPPCReportData = new PpcReportData();
                objPPCReportData.Campaign = lstFirst[dimensionOneData.ColumnHeaders.IndexOf(dimensionOneData.ColumnHeaders.FirstOrDefault(h => h.Name == "ga:campaign"))];
                objPPCReportData.Keywords = lstFirst[dimensionOneData.ColumnHeaders.IndexOf(dimensionOneData.ColumnHeaders.FirstOrDefault(h => h.Name == "ga:keyword"))];
                lstPpcReportData.Add(objPPCReportData);
            }
            // Start index of the next page (1-based)
            intStartIndex = intIndexCnt * intMaxRecords + 1;
        }
        else break;
    }
}
The only limitation is that the length of your query must not exceed roughly 2000 characters.

Reading the next line using LINQ and File.ReadAllLines()

I have a file that represents items: one line holds the item GUID, followed by 5 lines describing the item.
Example:
Line 1: Guid=8e2803d1-444a-4893-a23d-d3b4ba51baee name= line1
Line 2: Item details = bla bla
.
.
Line 7: Guid=79e5e39d-0c17-42aa-a7c4-c5fa9bfe7309 name= line7
Line 8: Item details = bla bla
.
.
I first want to read this file to get the GUIDs of the items that meet the given criteria using LINQ, e.g. where line.Contains("line1"). That gives me the whole line, from which I extract the GUID. I then want to pass this GUID to another function, which should access the file again, find that line (where line.Contains("line1") && line.Contains("8e2803d1-444a-4893-a23d-d3b4ba51baee")) and read the next 5 lines starting from that line.
Is there any efficient way to do so?
I don't think it really makes sense to use LINQ for all of this, given what you need to do and given that the index of the line in the array is fairly integral. I would also recommend doing everything in one pass: opening the file multiple times won't be as efficient as reading everything once and processing it immediately. As long as the file is structured as regularly as you describe, this won't be terribly difficult:
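// Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
private void GetStuff()
{
    var lines = File.ReadAllLines("foo.txt");
    var result = new Dictionary<Guid, String[]>();
    // Each item occupies 6 lines: the GUID line plus 5 description lines
    for (var index = 0; index < lines.Length; index += 6)
    {
        // The GUID line also carries a name, so extract just the GUID text
        var guidText = Regex.Match(lines[index], @"Guid=([0-9a-fA-F-]{36})").Groups[1].Value;
        var item = new
        {
            Guid = new Guid(guidText),
            Description = lines.Skip(index + 1).Take(5).ToArray()
        };
        result.Add(item.Guid, item.Description);
    }
}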
I tried a couple different ways to do this with LINQ but nothing allowed me to do a single scan of the file. For this scenario you're talking about I would go down to the Enumerable level and use the GetEnumerator like this:
public IEnumerable<LogData> GetLogData(string filename)
{
    // Groups: 1 = line number, 2 = GUID, 3 = name
    var line1Regex = @"Line\s(\d+):\sGuid=([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\sname=\s(\w*)";
    // The question describes 5 detail lines after each GUID line
    int detailLines = 5;
    var lines = File.ReadAllLines(filename).GetEnumerator();
    while (lines.MoveNext())
    {
        var line = (string)lines.Current;
        var match = Regex.Match(line, line1Regex);
        if (!match.Success)
            continue;
        // Read the detail lines that follow from the same enumerator
        var details = new string[detailLines];
        for (int i = 0; i < detailLines && lines.MoveNext(); i++)
        {
            details[i] = (string)lines.Current;
        }
        yield return new LogData
        {
            Id = new Guid(match.Groups[2].Value),
            Name = match.Groups[3].Value,
            LineNumber = int.Parse(match.Groups[1].Value),
            Details = details
        };
    }
}
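The single-pass idea shared by both answers, scanning the lines once and pulling the detail lines from the same iterator, can be sketched compactly in Python. The sketch assumes that the "Line N:" prefixes in the question are annotations rather than file content and that each GUID line looks like "Guid=<guid> name= <name>"; the function name and record shape are hypothetical.

```python
import re
from itertools import islice

GUID_LINE = re.compile(r"Guid=([0-9a-fA-F-]{36})\s+name=\s*(\w*)")


def read_items(lines, detail_count=5):
    """Yield (guid, name, details) tuples by scanning the lines once."""
    iterator = iter(lines)
    for line in iterator:
        match = GUID_LINE.search(line)
        if not match:
            continue
        # Pull the following detail lines from the same iterator.
        details = list(islice(iterator, detail_count))
        yield match.group(1), match.group(2), details


sample = [
    "Guid=8e2803d1-444a-4893-a23d-d3b4ba51baee name= line1",
    "Item details = a", "b", "c", "d", "e",
    "Guid=79e5e39d-0c17-42aa-a7c4-c5fa9bfe7309 name= line7",
    "Item details = f",
]
items = list(read_items(sample))
print(len(items))
```

Filtering by name or GUID can then be done on the yielded tuples, so the file is never reopened.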

LINQ: Entity string field contains any of an array of strings

I want to get a collection of Product entities where the product.Description property contains any of the words in a string array.
It would look something like this (the result would be any product with the word "mustard" OR "pickles" OR "relish" in the Description text):
Dim products As List(Of ProductEntity) = New ProductRepository().AllProducts
Dim search As String() = {"mustard", "pickles", "relish"}
Dim result = From p In products _
Where p.Description.Contains(search) _
Select p
Return result.ToList
I already looked at this similar question but couldn't get it to work.
Since you want to know whether the description of p contains any of the words in search, you need to test, for each value in search, whether it is contained in the description of p:
result = from p in products
where search.Any(val => p.Description.Contains(val))
select p;
This is C# syntax for the lambda, since my VB is not that great. The VB equivalent:
Dim result = From p In products _
Where search.Any(Function(s) p.Description.Contains(s)) _
Select p
You can use a simple LINQ query, if all you need is to check for substrings:
var q = words.Any(w => myText.Contains(w));
// returns true if myText == "This password1 is weak";
If you want to check for whole words, you can use a regular expression:
Matching against a regular expression that is the disjunction of all the words:
// you may need to call ToArray if you're not on .NET 4
var escapedWords = words.Select(w => @"\b" + Regex.Escape(w) + @"\b");
// the following line builds a regex similar to: (word1)|(word2)|(word3)
var pattern = new Regex("(" + string.Join(")|(", escapedWords) + ")");
var q = pattern.IsMatch(myText);
Splitting the string into words with a regular expression, and testing for membership in the words collection (this will get faster if you make words a HashSet instead of a List):
var pattern = new Regex(@"\W");
var q = pattern.Split(myText).Any(w => words.Contains(w));
To filter a collection of sentences by this criterion, all you have to do is put it into a function and call Where:
// Given:
// bool HasThoseWords(string sentence) { blah }
var q = sentences.Where(HasThoseWords);
Or put it in a lambda:
var q = sentences.Where(s => Regex.Split(s, @"\W").Any(w => words.Contains(w)));
Answer taken from "How to check if any word in my List<string> contains in text", by @R. Martinho Fernandes
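The difference between the substring check and the whole-word check is easy to see side by side. A small Python sketch of both, using the "password1" example from above; the function names and the contents of the word list are assumptions for illustration.

```python
import re


def contains_any_substring(text, words):
    """True if any word occurs anywhere in text, even inside a longer word."""
    return any(word in text for word in words)


def contains_any_whole_word(text, words):
    """True if any word occurs in text delimited by word boundaries."""
    if not words:
        return False
    pattern = "|".join(r"\b" + re.escape(word) + r"\b" for word in words)
    return re.search(pattern, text) is not None


words = ["password", "relish"]
print(contains_any_substring("This password1 is weak", words))   # True
print(contains_any_whole_word("This password1 is weak", words))  # False
print(contains_any_whole_word("Extra relish, please", words))    # True
```

The substring check matches "password" inside "password1", while the word-boundary pattern does not, because the trailing digit counts as a word character.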
