LINQ and case sensitivity

I have this LINQ Query:
TempRecordList = new ArrayList(TempRecordList.Cast<string>().OrderBy(s => s.Substring(9, 30)).ToArray());
It works great and sorts accurately, but a little differently from what I want. Among the results of the query I see something like this:
Palm-Bouter, Peter
Palmer-Johnson, Sean
Whereas what I really need is to have names sorted like this:
Palmer-Johnson, Sean
Palm-Bouter, Peter
Basically, I want the '-' character to sort after letters, so that names containing it show up later in an ascending sort.
Here is another example. I get:
Dias, Reginald
DiBlackley, Anton
Instead of:
DiBlackley, Anton
Dias, Reginald
As you can see, again, the order is switched due to how the uppercase letter 'B' is treated.
So my question is: what do I need to change in my LINQ query to make it return results in the order I specified? Any feedback would be greatly appreciated.
By the way, I tried using s.Substring(9, 30).ToLower() but that didn't help.
Thank you!

To customize the sort order, you need to create a comparer class that implements the IComparer<string> interface. The OrderBy() method takes a comparer as its second parameter.
using System;
using System.Collections.Generic;
internal sealed class NameComparer : IComparer<string> {
    private static readonly NameComparer DefaultInstance = new NameComparer();
    static NameComparer() { }
    private NameComparer() { }
    public static NameComparer Default {
        get { return DefaultInstance; }
    }
    // Ordinal, character-by-character comparison, except that '-' sorts after every other character.
    public int Compare(string x, string y) {
        int length = Math.Min(x.Length, y.Length);
        for (int i = 0; i < length; ++i) {
            if (x[i] == y[i]) continue;
            if (x[i] == '-') return 1;
            if (y[i] == '-') return -1;
            return x[i].CompareTo(y[i]);
        }
        // Equal prefixes: the shorter string sorts first.
        return x.Length - y.Length;
    }
}
This works at least with the following test cases:
var names = new[] {
    "Palmer-Johnson, Sean",
    "Palm-Bouter, Peter",
    "Dias, Reginald",
    "DiBlackley, Anton",
};
var sorted = names.OrderBy(name => name, NameComparer.Default).ToList();
// sorted:
// [0]: "DiBlackley, Anton"
// [1]: "Dias, Reginald"
// [2]: "Palmer-Johnson, Sean"
// [3]: "Palm-Bouter, Peter"
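For readers who want to experiment outside .NET, here is a rough Python port of the same rule (an illustration, not part of the original answer; `name_compare` is a made-up name). `functools.cmp_to_key` turns a C#-style three-way comparison into a sort key, so the logic carries over almost line for line:

```python
from functools import cmp_to_key


def name_compare(x: str, y: str) -> int:
    """Python port of NameComparer.Compare: ordinal per character, but '-' sorts last."""
    for a, b in zip(x, y):
        if a == b:
            continue
        if a == "-":
            return 1
        if b == "-":
            return -1
        # Ordinal comparison by code point, like char.CompareTo in C#.
        return -1 if a < b else 1
    # One string is a prefix of the other: the shorter one sorts first.
    return len(x) - len(y)


names = [
    "Palmer-Johnson, Sean",
    "Palm-Bouter, Peter",
    "Dias, Reginald",
    "DiBlackley, Anton",
]
sorted_names = sorted(names, key=cmp_to_key(name_compare))
print(sorted_names)
```

The `zip` loop stops at the shorter string, mirroring `Math.Min(x.Length, y.Length)`, and the final length difference plays the same role as `x.Length - y.Length`.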

As already mentioned, the OrderBy() method takes a comparer as a second parameter.
For strings, you don't necessarily have to implement an IComparer<string>. You might be fine with System.StringComparer.CurrentCulture (or one of the others in System.StringComparer).
In your exact case, however, there is no built-in comparer that also sorts '-' after letters.
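To see why plain ordinal ordering does not solve it either, compare code points. Python's default string sort compares code points, which for ASCII text behaves like an ordinal comparison (for example `StringComparer.Ordinal` in .NET); the Python below is only an analogy, not .NET code:

```python
names = ["Palm-Bouter, Peter", "Palmer-Johnson, Sean", "Dias, Reginald", "DiBlackley, Anton"]

# Code points of the characters that decide the two disputed comparisons.
codes = {ch: ord(ch) for ch in "-Bae"}
print(codes)  # {'-': 45, 'B': 66, 'a': 97, 'e': 101}

ordinal = sorted(names)
print(ordinal)
# 'B' (66) < 'a' (97): DiBlackley now comes first, as wanted,
# but '-' (45) < 'e' (101): Palm-Bouter still comes before Palmer-Johnson.
```

An ordinal ordering fixes the uppercase example but not the hyphen example, which is why a custom comparer is still needed.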

OrderBy() returns results in ascending order.
'e' comes before 'h', hence the first result (remember that you're comparing on a substring that starts at index 9, i.e. the 10th character, not at the beginning of the string), and 'i' comes before 'y', hence the second. Case sensitivity has nothing to do with it.
If you want results in descending order, use OrderByDescending():
TempRecordList = new ArrayList(TempRecordList.Cast<string>()
    .OrderByDescending(s => s.Substring(9, 30)).ToArray());
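A minimal Python sketch of the same idea, assuming hypothetical fixed-width records with a 9-character prefix before the name (as `Substring(9, 30)` suggests). The slice `record[9:39]` takes the same 30 characters, except that it returns a shorter string instead of throwing when the record is too short. Sorting in descending order only reverses the ascending result; it does not change how individual characters compare:

```python
# Hypothetical fixed-width records: a 9-character prefix, then the name field.
records = [
    "REC000001Dias, Reginald",
    "REC000002DiBlackley, Anton",
    "REC000003Palmer-Johnson, Sean",
]


def name_field(record: str) -> str:
    # Like Substring(9, 30), but slicing does not throw when the record is shorter.
    return record[9:39]


ascending = sorted(records, key=name_field)
descending = sorted(records, key=name_field, reverse=True)
print(descending)
```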

You might want to just implement a custom IComparer object that will give a custom priority to special, upper-case and lower-case characters.
http://msdn.microsoft.com/en-us/library/system.collections.icomparer.aspx
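One way to express such a custom priority is a per-character sort key. The Python sketch below assumes a hypothetical scheme (uppercase letters first, then lowercase letters, then everything else such as '-', spaces and commas); `char_priority` and `priority_key` are illustrative names, and the same ranking could be written as an IComparer in C#:

```python
def char_priority(ch: str) -> tuple:
    """Hypothetical ranking: uppercase (0), then lowercase (1), then anything else (2)."""
    if ch.isupper():
        group = 0
    elif ch.islower():
        group = 1
    else:
        group = 2
    # Within a group, fall back to the character itself.
    return (group, ch)


def priority_key(s: str) -> list:
    # Lists compare element by element, so this gives a character-by-character ordering.
    return [char_priority(ch) for ch in s]


names = ["Palm-Bouter, Peter", "Palmer-Johnson, Sean", "Dias, Reginald", "DiBlackley, Anton"]
print(sorted(names, key=priority_key))
```

Because spaces and commas also fall into the last group, this particular scheme sorts "Palma, Ann" before "Palm, Bob"; adjust the groups if that matters.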

Related

How to apply an if/else condition using the filter operation of a Java stream?

Question:
From a list of integers, get the square of each odd number and half of each even number, then return the list of values.
My answer: I would write the logic, including the if/else condition, inside the map() method:
List<Integer> output = intArray.stream().map(x-> {
if(x%2 ==0){
x=x/2;
}else{
x= x*x;
}
}).collect(Collectors.toList());
Is there any better way to do this, especially using filter?
Try using map:
map(x -> x % 2 == 0? x / 2: x * x);
Let me know if this works for you.
You can learn more about map and filter here
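The same conditional-expression idea in Python, for comparison (a sketch, not Java): every element goes through the transformation and none are removed. Integer division by 2 is exact for even numbers, so `//` here gives the same result as Java's `/` in this case, including negative even numbers:

```python
nums = [1, 2, 3, 4, 5, -4]

# Every element is transformed, none are removed: halve the evens, square the odds.
output = [x // 2 if x % 2 == 0 else x * x for x in nums]

# The same thing with map() and a lambda, closer in shape to the Java version.
output_via_map = list(map(lambda x: x // 2 if x % 2 == 0 else x * x, nums))

print(output)  # [1, 1, 9, 2, 25, -2]
```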
As you are transforming data (performing a math operation), you cannot use filter here. filter is used to filter out elements of your stream; for example, if you only wanted to keep the even numbers, you could use filter.
What you need is map, as you already used. Do note that the function passed to map must always return a value; your lambda is missing its return statement.
To make it more readable, you could move your mapping logic into a method. This makes your stream easy to read and easy to follow (provided you give the method a good name, of course).
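Code example
import java.util.List;
import java.util.stream.Collectors;
// inside an instance method of the same class:
List<Integer> output = intArray.stream()
        .map(this::divideOrPow)
        .collect(Collectors.toList());
private int divideOrPow(int x) {
    if (x % 2 == 0) {
        return x / 2;
    } else {
        return x * x;
    }
}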
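To make the filter-versus-map distinction in this answer concrete, here is a small Python sketch with made-up values: filter can only keep or drop elements, while map produces one output per input:

```python
nums = [1, 2, 3, 4, 5, 6]

# filter: decides which elements survive; the survivors are unchanged.
evens = list(filter(lambda x: x % 2 == 0, nums))

# map: transforms every element; the length stays the same.
halved = list(map(lambda x: x // 2, nums))

# Filtering first drops the odd numbers entirely, which is why filter
# cannot express "square the odds, halve the evens" on its own.
evens_halved = [x // 2 for x in nums if x % 2 == 0]

print(evens, halved, evens_halved)
```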

Avoid counting Int values with a for loop in Kotlin

I have a list of A class objects
data class A(
    val abc: Abc,
    val values: Int?
)
val list: List<A> = listOf(/* ... */)
If I want to count how many objects I have in the list, I use:
val count = list.count()
or val count = list.count { /* condition */ }
How do I append all the values in the list of A objects without a for loop? Generally, I'm looking for the proper Kotlin syntax to avoid code like this:
var counter = 0
if (list != null) {
    for (i in list) {
        counter += i.values!!
    }
}
Either use sumBy, or sum if you already have a list of non-nullable numbers available, e.g.:
val counter = list.sumBy { it.values ?: 0 }
// or
val counter = extractedNonNullValues.sum()
The latter only makes sense if you already mapped your A.values before to a list of non-nullable values, e.g. something like:
val extractedNonNullValues= list.mapNotNull { it.values } // set somewhere else before because you needed it...
If you do not need such an intermediate extractedNonNullValues list, just go for the sumBy variant. (Since Kotlin 1.5, sumBy is deprecated in favor of sumOf.)
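For readers comparing languages, here are the two variants sketched in Python, with `None` standing in for Kotlin's `null` and a hypothetical `A` that keeps only the `values` field:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class A:
    # Hypothetical stand-in for the question's class; the abc field is left out.
    values: Optional[int]


items = [A(3), A(None), A(4)]

# Like list.sumBy { it.values ?: 0 }: a missing value counts as 0.
total_with_default = sum(a.values if a.values is not None else 0 for a in items)

# Like list.mapNotNull { it.values } followed by sum(): drop the missing values first.
non_null_values = [a.values for a in items if a.values is not None]
total_from_non_null = sum(non_null_values)

print(total_with_default, total_from_non_null)  # 7 7
```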
I don't see you doing any appending to a list in the question. Based on your for loop, I believe what you meant was "How do I sum properties of objects in my list?". If that's the case, you can use sumBy, the extension function on List (more precisely, on Iterable) that takes a lambda ((T) -> Int) and returns an Int, like so:
val sum = list.sumBy { a -> a.values ?: 0 }
Also, calling an Int property values is pretty confusing, I think it should be called value. The plural indicates a list...
On another note, there is a possible NPE in your original for loop. Avoid using !! on nullable values: if the value is null, you will get an NPE. Instead, use the null-coalescing (Elvis) operator ?: to fall back to a default value; this is perfectly acceptable in a sum function. If the iteration has nothing to do with summing, you may need to handle the null case differently.

How to return -7-6-5-4-3-2-1012345678910111213

The code below is Objective-C, in Xcode. I am trying to return -7-6-5-4-3-2-1012345678910111213, as the method is expecting that response. number = -7 and otherNumber = 13. How do I return the series of numbers? I tried the method below, but with no success:
while (number < otherNumber) {
++number;
return number;
}
Another thing to look out for is how your parameters are passed in to the method. Since we don't know whether number is always going to be less than otherNumber, you should check which of the two numbers being passed in is lower before using them in your while loop.
This is very similar to the previous post, but it might make it a tad clearer:
// assumes number and otherNumber are NSInteger and the method returns NSString *
// find which number is low and which is high and set them accordingly
NSInteger low = MIN(number, otherNumber);
NSInteger high = MAX(number, otherNumber);
NSMutableString *result = [NSMutableString string];
while (low <= high) {
    // append low to the end of the string
    [result appendFormat:@"%ld", (long)low];
    ++low;
}
// return your string
return result;
And this handles the case when the numbers are equal.
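The low/high approach can be sketched in Python (not Objective-C; `number_series` is a hypothetical name) to check the expected output, including swapped and equal arguments:

```python
def number_series(number: int, other_number: int) -> str:
    # Find which argument is lower, so the order of the arguments doesn't matter.
    low, high = min(number, other_number), max(number, other_number)
    parts = []
    while low <= high:  # <= so that high itself (and the equal case) is included
        parts.append(str(low))
        low += 1
    return "".join(parts)


print(number_series(-7, 13))  # -7-6-5-4-3-2-1012345678910111213
```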
In Objective-C, a method can only have one return value, so the return inside your loop ends the method on the very first iteration.
If your method returns an array, something like this would work:
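// Create an NSMutableArray
NSMutableArray *numbers = [NSMutableArray array];
while (number < otherNumber) {
    // Add the number to the array (boxed as an NSNumber)
    [numbers addObject:@(number)];
    ++number;
}
// Return the array
return numbers;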
Or, similarly, if your method returns a string:
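// Create an NSMutableString
NSMutableString *string = [NSMutableString string];
while (number < otherNumber) {
    // Append the number to the end of the string (formatted as text)
    [string appendFormat:@"%ld", (long)number];
    ++number;
}
// Return the string
return string;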
A few notes:
Your conditional, number < otherNumber, won't capture the case where number == otherNumber. Since in your example otherNumber is 13 and you want it included, you may want to use number <= otherNumber.
You can only compare scalar numbers (like NSInteger or CGFloat) with the inequality operators (like < and >). However, you can only add objects to an NSMutableArray, and only strings to an NSMutableString. So you'll need to convert the scalar numbers to NSNumber (for the array) or format them as text (for the string) as appropriate.
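A Python sketch of the two notes above (illustrative names; Python has no scalar/object split, so `str(n)` stands in for the conversion step): the `<` loop stops one short of `otherNumber`, the `<=` loop includes it, and the numbers have to be turned into text before they can be joined into one string:

```python
def collect_exclusive(number: int, other_number: int) -> list:
    values = []
    while number < other_number:  # stops before reaching other_number
        values.append(number)
        number += 1
    return values


def collect_inclusive(number: int, other_number: int) -> list:
    values = []
    while number <= other_number:  # includes other_number itself
        values.append(number)
        number += 1
    return values


as_list = collect_inclusive(-7, 13)
# Each number is converted to text before joining, the analogue of formatting it into a string.
as_string = "".join(str(n) for n in as_list)
print(collect_exclusive(-7, 13)[-1], as_list[-1])  # 12 13
print(as_string)
```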
Since it looks like you're learning Objective-C, note that this is different from Swift, where a function can return multiple values bundled together in a tuple.

Linq mixed with string.Compare(...,...,...)

I was searching for a way to perform a LINQ query that ignores case. I found this:
m_context.Users.SingleOrDefault(u => string.Compare(u.UserName, username, StringComparison.InvariantCultureIgnoreCase) == 0);
It searches for a user object based on the provided username, ignoring case. It works, so that's not the question here, but when analysing the code it seems strange to me. Inside the LINQ query, string.Compare(...,...,...) returns an integer. So what? How is that handled by LINQ (SingleOrDefault)?
Thanks for your help.
You are passing a predicate into the SingleOrDefault method. The predicate evaluates to true or false, and this method returns the single element in the sequence that satisfies that predicate.
u => string.Compare(u.UserName, username, StringComparison.InvariantCultureIgnoreCase) == 0
This is a Func<User, bool> predicate, which means it is a function that accepts a User as an argument u and returns a boolean value as the result of the string.Compare(...) == 0 evaluation. The single element in the sequence of users that satisfies this condition is then returned. If more than one satisfies the predicate, it is an error. If none satisfies it, you get the default value for the type, which for a reference type is simply null.
Think of it as very roughly
public static T SingleOrDefault<T>(this IEnumerable<T> sequence, Func<T, bool> predicate)
{
    T foundItem = default(T); // null for reference types
    int count = 0;
    foreach (T item in sequence)
    {
        if (predicate(item)) // evaluates the u => string.Compare(...)
        {
            count += 1;
            if (count > 1)
                throw new InvalidOperationException("...");
            foundItem = item;
        }
    }
    return foundItem;
}
The above is again just my rough draft of what the method does, not the actual implementation. If you're interested in a more in-depth look at LINQ to Objects implementations, consider reading Jon Skeet's Edulinq series, where he reimplements (give or take) every method and explains it along the way. Again, that's not the actual source code of the library, but it is very educational.
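For comparison, the same rough logic sketched in Python (again not the real implementation; `single_or_default` is a made-up name). The predicate passed in already evaluates to a bool, just as `string.Compare(...) == 0` does. `str.casefold` is used as an approximation of a case-insensitive comparison and is not identical to `InvariantCultureIgnoreCase` for every character:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def single_or_default(sequence: Iterable[T], predicate: Callable[[T], bool]) -> Optional[T]:
    """Rough analogue of SingleOrDefault: None if nothing matches, an error if more than one does."""
    found: Optional[T] = None
    count = 0
    for item in sequence:
        if predicate(item):  # the predicate already returns a bool
            count += 1
            if count > 1:
                raise ValueError("Sequence contains more than one matching element")
            found = item
    return found


usernames = ["Alice", "bob", "Carol"]
target = "BOB"
# casefold() approximates a case-insensitive comparison.
match = single_or_default(usernames, lambda u: u.casefold() == target.casefold())
print(match)  # bob
```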
You have string.Compare(...) == 0, and that expression is a Boolean.

Word distribution problem

I have a big file of words (~100 GB) and limited memory (4 GB). I need to calculate the word distribution of this file. One option is to divide it into chunks, sort each chunk, and then merge them to calculate the word distribution. Is there any faster way to do it? One idea is to sample, but I'm not sure how to implement that so it returns a close-to-correct solution.
Thanks
You can build a trie where each leaf (and some inner nodes) holds the current count. Because words share prefixes with each other, 4 GB should be enough to process 100 GB of data.
Naively, I would just build up a hash table until it hits a certain memory limit, then sort it in memory and write it out. Finally, you can do an n-way merge of the chunks. At most you will have 100/4 chunks or so, but probably many fewer, provided some words are more common than others (and depending on how they cluster).
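A minimal Python sketch of that approach, under stated assumptions: `max_distinct` is a hypothetical stand-in for "a certain limit in memory", words contain no tabs or newlines (the run files are tab-separated), and the merged result is collected into a list for the demo, whereas for 100 GB you would write it out instead. `heapq.merge` does the n-way merge of the sorted runs and `itertools.groupby` adds up the counts of equal words:

```python
import heapq
import itertools
import os
import tempfile
from typing import Dict, Iterable, Iterator, List, Tuple


def _spill(counts: Dict[str, int], directory: str) -> str:
    """Write one sorted run of word<TAB>count lines and return the file path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for word in sorted(counts):
            f.write(f"{word}\t{counts[word]}\n")
    return path


def _read_run(path: str) -> Iterator[Tuple[str, int]]:
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, count = line.rstrip("\n").split("\t")
            yield word, int(count)


def external_word_counts(words: Iterable[str], max_distinct: int) -> List[Tuple[str, int]]:
    """Count words with a bounded hash table: spill sorted runs to disk, then n-way merge them."""
    result: List[Tuple[str, int]] = []
    with tempfile.TemporaryDirectory() as directory:
        runs: List[str] = []
        counts: Dict[str, int] = {}
        for word in words:
            counts[word] = counts.get(word, 0) + 1
            if len(counts) >= max_distinct:  # the "memory limit" for this sketch
                runs.append(_spill(counts, directory))
                counts = {}
        if counts:
            runs.append(_spill(counts, directory))
        # heapq.merge lazily merges the already-sorted runs; groupby sums equal words across runs.
        merged = heapq.merge(*(_read_run(path) for path in runs))
        for word, group in itertools.groupby(merged, key=lambda pair: pair[0]):
            result.append((word, sum(count for _, count in group)))
    return result


words = "the cat and the hat and the bat".split()
print(external_word_counts(words, max_distinct=2))
```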
Another option is to use a trie which was built for this kind of thing. Each character in the string becomes a branch in a 256-way tree and at the leaf you have the counter. Look up the data structure on the web.
If you can pardon the pun, "trie" this:
using System;
using System.Collections.Generic;
using System.Linq; // for Skip()
public class Trie : Dictionary<char, Trie>
{
    public int Frequency { get; set; }
    public void Add(string word)
    {
        this.Add(word.ToCharArray());
    }
    private void Add(char[] chars)
    {
        if (chars == null || chars.Length == 0)
        {
            throw new System.ArgumentException();
        }
        var first = chars[0];
        if (!this.ContainsKey(first))
        {
            this.Add(first, new Trie());
        }
        if (chars.Length == 1)
        {
            this[first].Frequency += 1;
        }
        else
        {
            this[first].Add(chars.Skip(1).ToArray());
        }
    }
    public int GetFrequency(string word)
    {
        return this.GetFrequency(word.ToCharArray());
    }
    private int GetFrequency(char[] chars)
    {
        if (chars == null || chars.Length == 0)
        {
            throw new System.ArgumentException();
        }
        var first = chars[0];
        if (!this.ContainsKey(first))
        {
            return 0;
        }
        if (chars.Length == 1)
        {
            return this[first].Frequency;
        }
        else
        {
            return this[first].GetFrequency(chars.Skip(1).ToArray());
        }
    }
}
Then you can call code like this:
var t = new Trie();
t.Add("Apple");
t.Add("Banana");
t.Add("Cherry");
t.Add("Banana");
var a = t.GetFrequency("Apple"); // == 1
var b = t.GetFrequency("Banana"); // == 2
var c = t.GetFrequency("Cherry"); // == 1
You should be able to add code to traverse the trie and return a flat list of words and their frequencies.
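For example, here is a Python sketch of a similar trie (not a port of the C# class above: it uses a plain dict of children and walks the word iteratively instead of copying arrays with Skip(1).ToArray()) with a depth-first traversal that returns each word and its frequency:

```python
from typing import Dict, Iterator, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.frequency = 0  # number of times a word ends exactly at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.frequency += 1

    def get_frequency(self, word: str) -> int:
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                return 0
            node = child
        return node.frequency

    def items(self) -> Iterator[Tuple[str, int]]:
        """Depth-first traversal yielding (word, frequency) pairs in code-point order."""
        stack: List[Tuple[TrieNode, str]] = [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.frequency > 0:
                yield prefix, node.frequency
            # Push children in reverse order so the smallest character is popped first.
            for ch in sorted(node.children, reverse=True):
                stack.append((node.children[ch], prefix + ch))


t = Trie()
for word in ["Apple", "Banana", "Cherry", "Banana"]:
    t.add(word)
print(list(t.items()))  # [('Apple', 1), ('Banana', 2), ('Cherry', 1)]
```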
If you find that this still blows your memory limit, I'd suggest that you divide and conquer: scan the source data for all the first characters, run the trie separately for each of them, and then concatenate the results after all of the runs.
Do you know how many distinct words you have? If there aren't many (e.g. a hundred thousand), you can stream the input, split it into words, and use a hash table to keep the counts. When the input is done, just traverse the result.
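The divide-and-conquer idea can be sketched in Python as one pass to find the first characters, then one counting pass per first character. For brevity this uses a `collections.Counter` per pass in place of the trie, and `read_words` is a hypothetical callable that re-reads the source each time:

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def count_by_first_char(read_words: Callable[[], Iterable[str]]) -> List[Tuple[str, int]]:
    """Divide and conquer on the first character.

    read_words is assumed to re-read the source (e.g. re-open the file) on every call.
    """
    # Pass 1: collect the distinct first characters (a small set).
    first_chars = sorted({word[0] for word in read_words() if word})
    results: List[Tuple[str, int]] = []
    for first in first_chars:
        # One pass per first character: only words starting with `first` are counted.
        counts = Counter(word for word in read_words() if word and word[0] == first)
        # Sorting within each partition keeps the concatenated result globally sorted.
        results.extend(sorted(counts.items()))
    return results


source = ["banana", "apple", "cherry", "apple", "avocado"]
print(count_by_first_char(lambda: iter(source)))
```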
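A small Python sketch of the streaming approach, assuming words are whitespace-separated and should be counted case-insensitively (both are assumptions). Only the current line and the hash table (a `collections.Counter`) are held in memory; `io.StringIO` stands in for a real file such as a hypothetical `open("words.txt", encoding="utf-8")`:

```python
import io
from collections import Counter
from typing import Iterable, Iterator


def stream_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield lowercased, whitespace-separated words one line at a time."""
    for line in lines:
        yield from line.lower().split()


def count_words(lines: Iterable[str]) -> Counter:
    # The Counter is the hash table; the input is never held in memory all at once.
    return Counter(stream_words(lines))


counts = count_words(io.StringIO("the cat\nThe hat\nthe bat\n"))
print(counts.most_common(1))  # [('the', 3)]
```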
Just use a DBM file. It’s a hash on disk. If you use the more recent versions, you can use a B+Tree to get in-order traversal.
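A rough Python version using the standard `dbm` module, which opens whichever DBM backend is installed (`count_with_dbm` is a hypothetical helper). Keys and values are stored as bytes, and the standard module offers no B-tree mode, so this sketch sorts the keys when reading them back:

```python
import dbm
import os
import tempfile
from typing import Dict, Iterable


def count_with_dbm(words: Iterable[str], path: str) -> Dict[str, int]:
    """Keep the counts in an on-disk hash (DBM file) rather than in RAM."""
    with dbm.open(path, "n") as db:  # "n": always create a new, empty database
        for word in words:
            key = word.encode("utf-8")
            current = int(db[key]) if key in db else 0
            db[key] = str(current + 1)  # values are stored as bytes
        # Hash order is arbitrary, so sort the keys when reading them back.
        return {k.decode("utf-8"): int(db[k]) for k in sorted(db.keys())}


with tempfile.TemporaryDirectory() as directory:
    counts = count_with_dbm(["b", "a", "b"], os.path.join(directory, "counts"))
print(counts)  # {'a': 1, 'b': 2}
```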
Why not use a relational DB? The procedure would be as simple as:
Create a table with the word and its count.
Create an index on the word. Some databases have a word index (e.g. Progress).
For each word, do a SELECT on this table.
If the word exists, increase its counter.
Otherwise, add it to the table.
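The steps above can be sketched with Python's built-in `sqlite3` module standing in for the relational DB (the table and column names are made up). The PRIMARY KEY on `word` supplies the index; for real volumes you would want to batch the work in large transactions:

```python
import sqlite3
from typing import Iterable, List, Tuple


def count_with_sqlite(words: Iterable[str], conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    # Table with the word and its count; the PRIMARY KEY also gives an index on word.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts (word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    for word in words:
        # Look the word up.
        row = conn.execute("SELECT freq FROM word_counts WHERE word = ?", (word,)).fetchone()
        if row is not None:
            # The word exists: increase its counter.
            conn.execute("UPDATE word_counts SET freq = freq + 1 WHERE word = ?", (word,))
        else:
            # Otherwise, add it to the table.
            conn.execute("INSERT INTO word_counts (word, freq) VALUES (?, 1)", (word,))
    conn.commit()
    return conn.execute("SELECT word, freq FROM word_counts ORDER BY word").fetchall()


conn = sqlite3.connect(":memory:")
rows = count_with_sqlite("to be or not to be".split(), conn)
conn.close()
print(rows)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```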
If you are using Python, look at the built-in iter function and the iterator protocol: iterating over a file object reads it line by line, so it will not cause memory problems. You should not "return" each value but "yield" it.
Here is a sample that I used to read a file and get the vector values.
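def __iter__(self):
    # The with block closes the file once iteration finishes.
    with open(self.temp_file_name) as f:
        for line in f:
            # One bag-of-words vector per line (self.dictionary is presumably a gensim Dictionary).
            yield self.dictionary.doc2bow(line.lower().split())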
