Search an array of strings in a large string and check if any exist using LINQ

I have an array of strings:
var searchString = new string[] {"1:PS", "2:PS"};
and a large result string, e.g.:
var largeString = "D9876646|10|1:PS^CD9876647100|11|2:PS";
How do I check whether any of the options in searchString exist in largeString?
I know it can be done quite easily with a loop, but I am looking for another way, since I need to append this as a search clause in a LINQ query.

You can use LINQ for it with a simple Any() call, like this:
var hasAny = searchString.Any(sub => largeString.Contains(sub));
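However, this is no faster than a foreach loop, since it scans largeString once per search term. A regex built from searchString tests all terms in a single pass, which may be faster for long inputs (measure before relying on it):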
var regex = string.Join("|", searchString.Select(Regex.Escape));
var hasAny = Regex.IsMatch(largeString, regex);
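As a minimal, self-contained sketch of the two approaches above (the class and method names are made up for illustration), the following wraps each test in a helper. It also guards against an empty search array, because an empty pattern would match any input:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SubstringSearch
{
    // True if any candidate occurs in text; candidates are tested one at a time.
    public static bool ContainsAnyLinq(string text, string[] candidates)
    {
        return candidates.Any(c => text.Contains(c));
    }

    // Same test using a single alternation pattern. Regex.Escape keeps
    // characters such as '|' or '^' inside a candidate from being read as regex syntax.
    public static bool ContainsAnyRegex(string text, string[] candidates)
    {
        if (candidates.Length == 0)
        {
            return false; // an empty pattern would match every input
        }
        var pattern = string.Join("|", candidates.Select(c => Regex.Escape(c)));
        return Regex.IsMatch(text, pattern);
    }
}
```

Both helpers run in memory only; a database-backed LINQ provider generally cannot translate a Regex call into SQL.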

Depending on the nature of your LINQ provider (assuming it isn't LINQ to Objects), you may want to add an individual test for each member of searchString. The best way to do this is probably PredicateBuilder from the LINQKit library (here dbType is the entity type and q is the existing query):
var sq = PredicateBuilder.New<dbType>();
foreach (var s in searchString)
    sq = sq.Or(r => r.largeString.Contains(s));
q = q.Where(sq);

Related

What is the performance optimum (or even better coding practise) for writing this Linq query

I am new to linq so please excuse me if I am asking a very basic question:
paymentReceiptViewModel.EntityName = payment.CommitmentPayments.First().Commitment.Entity.GetEntityName();
paymentReceiptViewModel.HofItsId = payment.CommitmentPayments.First().Commitment.Entity.ResponsiblePerson.ItsId;
paymentReceiptViewModel.LocalId = payment.CommitmentPayments.First().Commitment.Entity.LocalEntityId;
paymentReceiptViewModel.EntityAddress = payment.CommitmentPayments.First().Commitment.Entity.Address.ToString();
This code is too repetitive and I am sure there is a better way of writing this.
Thanks in advance for looking this up.
Instead of executing the query on every line, get the commitment entity once:
var commitment = payment.CommitmentPayments.First().Commitment.Entity;
paymentReceiptViewModel.EntityName = commitment.GetEntityName();
paymentReceiptViewModel.HofItsId = commitment.ResponsiblePerson.ItsId;
paymentReceiptViewModel.LocalId = commitment.LocalEntityId;
paymentReceiptViewModel.EntityAddress = commitment.Address.ToString();
It depends a bit on what you are selecting into: you cannot project from one entity type into another in LINQ to Entities. If you are using LINQ to SQL and creating the paymentReceiptModel, you can do this:
var paymentReceiptModel = payment.CommitmentPayments.Select(x => new {
    EntityName = x.Commitment.Entity.GetEntityName(),
    HofItsId = x.Commitment.Entity.ResponsiblePerson.ItsId,
    LocalId = x.Commitment.Entity.LocalEntityId,
    EntityAddress = x.Commitment.Entity.Address
}).FirstOrDefault();
If you are using an already instantiated paymentReceiptModel and just need to assign properties, then you are better off with the solution by lazyberezovsky.
To get around the limitation in LINQ to Entities, if that is what you are using, you could do this:
var result = payment.CommitmentPayments.AsEnumerable(); // everything after this runs as LINQ to Objects
var paymentReceiptModel = result.Select(x => new
{
    EntityName = x.Commitment.Entity.GetEntityName(),
    HofItsId = x.Commitment.Entity.ResponsiblePerson.ItsId,
    LocalId = x.Commitment.Entity.LocalEntityId,
    EntityAddress = x.Commitment.Entity.Address
}).FirstOrDefault();
This essentially makes the majority of your query LINQ to Objects; only the first line is LINQ to Entities.

Delete files that are more than 10 days old using LINQ

I'm using the code below to delete files that are more than 10 days old. Is there a simpler/smarter way of doing this?
string source_path = ConfigurationManager.AppSettings["source_path"];
string filename = ConfigurationManager.AppSettings["filename"];
var fileQuery = from file in Directory.GetFiles(source_path, filename, SearchOption.TopDirectoryOnly)
                where File.GetCreationTime(file) < System.DateTime.Now.AddDays(-10)
                select file;
foreach (var f in fileQuery)
{
    File.Delete(f);
}
Well, there are two things I'd change:
- Determine the cut-off DateTime once, rather than re-evaluating DateTime.Now repeatedly.
- I wouldn't use a query expression when you've just got a where clause.
So I'd rewrite the query part as:
var cutoff = DateTime.Now.AddDays(-10);
var query = Directory.GetFiles(sourcePath, filename, SearchOption.TopDirectoryOnly)
.Where(f => File.GetCreationTime(f) < cutoff);
Another alternative would be to use DirectoryInfo and FileInfo:
var cutoff = DateTime.Now.AddDays(-10);
var path = new DirectoryInfo(sourcePath);
var query = path.GetFiles(filename, SearchOption.TopDirectoryOnly)
.Where(fi => fi.CreationTime < cutoff);
(In .NET 4 you might also want to use EnumerateFiles instead.)
It is possible to do this as a LINQ "one-liner" (All is used here only for its side effect of visiting every file):
Directory.GetFiles(source_path, filename, SearchOption.TopDirectoryOnly)
    .Where(f => File.GetCreationTime(f) < System.DateTime.Now.AddDays(-10))
    .All(f => { File.Delete(f); return true; });
Don't forget to wrap the code in a try/catch.
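A minimal sketch of how the pieces discussed in this thread could fit together: a cutoff computed once, lazy enumeration, and a try/catch around each delete. The class name, method name, and the choice to skip files that cannot be deleted are assumptions for illustration, not part of the original answers:

```csharp
using System;
using System.IO;
using System.Linq;

public static class OldFileCleaner
{
    // Deletes files in `directory` matching `pattern` whose creation time is
    // older than `maxAge`. Returns the number of files actually deleted.
    public static int DeleteOlderThan(string directory, string pattern, TimeSpan maxAge)
    {
        var cutoff = DateTime.Now - maxAge; // computed once, not per file

        // Snapshot the matches first so deletion does not disturb the enumeration.
        var oldFiles = new DirectoryInfo(directory)
            .EnumerateFiles(pattern, SearchOption.TopDirectoryOnly)
            .Where(fi => fi.CreationTime < cutoff)
            .ToList();

        var deleted = 0;
        foreach (var fi in oldFiles)
        {
            try
            {
                fi.Delete();
                deleted++;
            }
            catch (IOException)
            {
                // File is in use; skip it (assumed policy).
            }
            catch (UnauthorizedAccessException)
            {
                // No permission or read-only file; skip it (assumed policy).
            }
        }
        return deleted;
    }
}
```

A call such as OldFileCleaner.DeleteOlderThan(source_path, filename, TimeSpan.FromDays(10)) would correspond to the question's 10-day rule. Catching per file means one locked file does not stop the remaining deletions.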

Truncating a collection using Linq query

I want to extract part of a collection to another collection.
I can easily do this with a for loop, but my LINQ query is not working.
I am a neophyte in LINQ, so please help me correct the query (if possible with an explanation or a link to a beginner's tutorial).
The legacy way of doing it:
Collection<string> testColl1 = new Collection<string> {"t1", "t2", "t3", "t4"};
Collection<string> testColl2 = new Collection<string>();
for (int i = 0; i < newLength; i++)
{
    testColl2.Add(testColl1[i]);
}
Where testColl1 is the source & testColl2 is the desired truncated collection of count = newLength.
I have used the following linq queries, but none of them are working ...
var result = from t in testColl1 where t.Count() <= newLength select t;
var res = testColl1.Where(t => t.Count() <= newLength);
Use Enumerable.Take:
var testColl2 = testColl1.Take(newLength).ToList();
Note that there's a semantic difference between your for loop and the version using Take. The for loop will throw an ArgumentOutOfRangeException if there are fewer than newLength items in testColl1, whereas the Take version silently returns however many items are available, up to newLength.
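Since the question asks for a Collection<string> rather than a List<string>, here is a small sketch of a helper that truncates with Take and wraps the result. The helper name is hypothetical; the behavior for short inputs follows Take (no exception, fewer items):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public static class Truncation
{
    // Returns a new collection holding at most newLength leading items of source.
    public static Collection<string> Truncate(IEnumerable<string> source, int newLength)
    {
        // Collection<T>(IList<T>) wraps the list produced by ToList().
        return new Collection<string>(source.Take(newLength).ToList());
    }
}
```

If a too-short source should be treated as an error, as in the for loop, an explicit length check before calling Take would be needed.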
The correct way is by using Take:
var result = testColl1.Take(newLength);
An equivalent way using Where is:
var result = testColl1.Where((item, i) => i < newLength);
These expressions will produce an IEnumerable, so you might also want to attach a .ToList() or .ToArray() at the end.
Both ways return at most newLength items, so newLength == 0 yields no items. That is one fewer than a loop bounded by i <= newLength would produce, which is the more natural behavior.
You could convert the for loop to something like this:
testColl1.Take(newLength)
Use Take:
var result = testColl1.Take(newLength);
This extension method returns the first N elements from the collection where N is the parameter you pass, in this case newLength.

Finding strings that are not in DB already

I have some bad performance issues in my application. One of the big operations is comparing strings.
I download a list of strings, approximately 1000 - 10000. These are all unique strings.
Then I need to check if these strings already exist in the database.
The linq query that I'm using looks like this:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var selection = from a in allNewStrings
where !(from o in context.Items
select o.TheUniqueString).Contains(a)
select a;
Am I doing something wrong or how could I make this process faster preferably with Linq?
Thanks.
Your query runs the inner subquery against the database once for every element of allNewStrings, i.e. 1,000 to 10,000 times, so it is extremely inefficient.
Fetch the existing strings separately, and materialize them so that the database query is executed only once:
IEnumerable<string> allNewStrings = DownloadAllStrings();
var uniqueStrings = (from o in context.Items
                     select o.TheUniqueString).ToList(); // runs the query once
var selection = from a in allNewStrings
                where !uniqueStrings.Contains(a)
                select a;
The last query can also be written with Except, which is more efficient for set-style operations like this one because it builds a hash set internally instead of scanning uniqueStrings for every element:
var selection = allNewStrings.Except(uniqueStrings);
An alternative solution would be to use a HashSet:
var set = new HashSet<string>(DownloadAllStrings());
set.ExceptWith(context.Items.Select(s => s.TheUniqueString));
The set will now contain the strings that are not in the DB.
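The HashSet approach can be tried without a database. The sketch below uses two in-memory sequences as stand-ins for DownloadAllStrings() and the database column; the helper name and the ordinal (case-sensitive) comparer are assumptions, and a database with a case-insensitive collation may call for a different comparer:

```csharp
using System;
using System.Collections.Generic;

public static class NewStringFinder
{
    // Returns the downloaded strings that do not appear in `existing`.
    // `existing` is enumerated exactly once, so a database-backed sequence
    // would be queried a single time.
    public static HashSet<string> FindNew(IEnumerable<string> downloaded, IEnumerable<string> existing)
    {
        var set = new HashSet<string>(downloaded, StringComparer.Ordinal);
        set.ExceptWith(existing); // each removal is an O(1) hash lookup on average
        return set;
    }
}
```

With 10,000 downloaded strings this performs one pass over each input, rather than one database round trip per string.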

TableServiceContext and dynamic query

I'm trying to do something that looks very simple, but I hit massive difficulties when I try to make it more dynamic.
Expression<Func<TableServiceEntity, bool>> predicate = (e) => e.PartitionKey == "model" && (e.RowKey == "home" || e.RowKey == "shared");
context.CreateQuery<TableServiceEntity>(tableName).Where(predicate);
I would like to pass an array of row keys instead of having to hard-code the predicate.
When I try to build an expression tree, I receive a NotSupportedException; I think the provider doesn't support Invoke as part of the expression tree.
Does someone know how to build an expression tree exactly like the predicate above, to avoid the exception?
Thank you in advance.
So, you can build the query dynamically by using something like this (taken from PhluffyFotos sample):
Expression<Func<PhotoTagRow, bool>> search = null;
foreach (var tag in tags)
{
    var id = tag.Trim().ToLowerInvariant();
    if (String.IsNullOrEmpty(id))
    {
        continue;
    }
    Expression<Func<PhotoTagRow, bool>> addendum = t => t.PartitionKey == id;
    if (search == null)
    {
        search = addendum;
    }
    else
    {
        search = Expression.Lambda<Func<PhotoTagRow, bool>>(Expression.OrElse(search.Body, addendum.Body), search.Parameters);
    }
}
Now, once you have 'search' you can just pass that as the predicate in your Where clause.
However, I want to convince you not to do this. I am answering your question, but telling you that it is a bad idea to do a multiple '|' OR clause in Table storage. The reason is that today at least, these queries cannot be optimized and they cause a full table scan. The performance will be horrendous with any non-trivial amount of data. Furthermore, if you build your predicates dynamically like this you run the risk of blowing the URL limit (keep that in mind).
This code in PhluffyFotos shows how, but it is actually a bad practice (I know, I wrote it). It really should be optimized to run each OR clause separately in parallel. That is how you really should do it. AND clauses are ok, but OR clauses should be parallelized (use PLINQ or TPL) and you should aggregate the results. It will be much faster.
HTH.
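The advice to run each OR branch as its own query and merge the results can be sketched with PLINQ as follows. The queryByKey delegate is a hypothetical placeholder for whatever executes one single-key query against the table; nothing here calls a real Table storage API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelOrQuery
{
    // Runs one query per distinct key in parallel and concatenates the results.
    // The order of the combined results is not guaranteed.
    public static List<TResult> QueryEach<TResult>(
        IEnumerable<string> keys,
        Func<string, IEnumerable<TResult>> queryByKey)
    {
        return keys
            .Distinct()                               // avoid issuing the same query twice
            .AsParallel()
            .SelectMany(k => queryByKey(k).ToList())  // materialize each query on its worker
            .ToList();
    }
}
```

Each single-key query can use an exact key match, which avoids a combined OR filter in one request.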
From the documentation I have read, I believe what the previous answer said about this kind of query doing a full table scan is incorrect. Azure will perform a PARTITION scan rather than a TABLE scan, which is a big difference in performance.
Here is my solution. Please also read the previous answer, which points out that this is not a best practice.
var parameter = Expression.Parameter(typeof(TableServiceEntity), "e");
var getPartitionKey = typeof(TableServiceEntity).GetProperty("PartitionKey").GetGetMethod();
var getRowKey = typeof(TableServiceEntity).GetProperty("RowKey").GetGetMethod();
var getPartition = Expression.Property(parameter, getPartitionKey);
var getRow = Expression.Property(parameter, getRowKey);
var constPartition = Expression.Constant("model", typeof(string));
var constRow1 = Expression.Constant("home", typeof(string));
var constRow2 = Expression.Constant("shared", typeof(string));
var equalPartition = Expression.Equal(getPartition, constPartition);
var equalRow1 = Expression.Equal(getRow, constRow1);
var equalRow2 = Expression.Equal(getRow, constRow2);
var and = Expression.AndAlso(equalPartition, Expression.OrElse(equalRow1, equalRow2));
return Expression.Lambda<Func<TableServiceEntity, bool>>(and, parameter);
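The same construction can be generalized to an arbitrary array of row keys, which is what the question asks for. The sketch below uses a hypothetical DemoEntity class in place of TableServiceEntity so that it compiles on its own; whether a given Table storage provider accepts the resulting tree is not verified here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for TableServiceEntity.
public class DemoEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
}

public static class RowKeyPredicate
{
    // Builds e => e.PartitionKey == partitionKey && (e.RowKey == k1 || e.RowKey == k2 || ...)
    public static Expression<Func<DemoEntity, bool>> Build(string partitionKey, IEnumerable<string> rowKeys)
    {
        // A single parameter shared by every sub-expression.
        var e = Expression.Parameter(typeof(DemoEntity), "e");
        var partition = Expression.Property(e, "PartitionKey");
        var row = Expression.Property(e, "RowKey");

        // One equality test per row key.
        var tests = rowKeys
            .Select(k => (Expression)Expression.Equal(row, Expression.Constant(k, typeof(string))))
            .ToList();
        if (tests.Count == 0)
        {
            throw new ArgumentException("At least one row key is required.", "rowKeys");
        }

        // Fold the tests together with OrElse, then AND with the partition test.
        var anyRow = tests.Aggregate((a, b) => Expression.OrElse(a, b));
        var body = Expression.AndAlso(
            Expression.Equal(partition, Expression.Constant(partitionKey, typeof(string))),
            anyRow);

        return Expression.Lambda<Func<DemoEntity, bool>>(body, e);
    }
}
```

Because every comparison is built from the same parameter instance, the tree contains no Invoke nodes and no mismatched parameters.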
