HowTo sort std::map by first and second params? - sorting

Here is my std::map example, like std::map< string, string > my_map;
// ABC | aaa ABC | aaa
// DEF | def ABC | dcd
// BCD | def -> ABC | zzz
// DEF | bcd BCD | def
// ABC | dcd DEF | bcd
// ABC | zzz DEF | def
As you can see, I'm trying to sort left std::map and get the right one.
And here is my code (I used not strings, but my custom types. any way, in final, I'm sorting strings):
template < typename T1, typename T2 >
struct less_second
{
typedef std::pair< T1, T2 > type;
bool operator ()( type const& _left, type const& _right ) const
{
return ( (*_left.first).name() < (*_right.first).name() ) &&
( (*_left.second).name() < (*_right.second).name() );
}
};
Problem: when I use only in less_second
return (*_left.first).name() < (*_right.first).name();
All data from the first column sorted, but second column not (of course, because we are used only first!)
The mirrored situation, when I use only
return (*_left.second).name() < (*_right.second).name();
The second column sorted.
BUT I need to sort and first and the second columns at once. How to code this? What I'm doing wrong?
Thanks for help!
Sorry, forget this code:
std::vector< std::pair< CompanyPtr, ContractorPtr > > n_map_( buddies_ccm_.begin(), buddies_ccm_.end() );
std::sort( n_map_.begin(), n_map_.end(), less_second< CompanyPtr, ContractorPtr >() );

Your comparison function is wrong. It does not work when both first.name() are equal. Try something like this:
bool operator ()( type const& _left, type const& _right ) const
{
if ((*_left.first).name() > (*_right.first).name())
return false;
if ((*_left.first).name() < (*_right.first).name())
return true;
return ( (*_left.second).name() < (*_right.second).name() );
}

I'm a little bit confused by the term "columns". Are you talking about keys and values?
A std::map is always ordered by key. You can specify a compare object at construction time of your map to define that order. But this compare object does not compare std::pairs, but objects of the key type of your map.
Moreover, a key in a map is unique. Thus, there cannot be two entries with the key "ABC" in the map.
I suppose you try to sort the map with std::sort from <algorithm>. I'm not sure what happens in this case, but I think it is not what you expect to happen.

You only want to take the second column into account if _left->first and _right->first are equal. Put another way, you want to make your code compare the first fields first, and fall back to comparing the second fields only if the first comparison is inconclusive (which happens when both firsts are equal).

You cannot achieve what you want with a std::map as it has been already explained in another answer. If your pairs don't have repetitions you could use a set of pairs:
std::set<std::pair<std::string, std::string> > pair_set;
You don't need to provide a comparator as std::pair already supports the kind of comparison you require.

Related

How to sort multiple columns: CSV? c++

I am attempting to sort a CSV file by specifying which column order to sort in:
for example: ./csort 3, 1, 5 < DATA > SORTED_DATA
or ./csort 3, 4, 6, 2, 1, 5 < DATA ...
example line of DATA: 177,27,2,42,285,220
I used a vector split(string str) function to store the columns specified in the arguments which require sorting. Creating a vector:
vector<string> columns {3, 1, 5}; // for example
Not entirely sure how to use this columns vector to proceed with the sorting process; though, I am aware that I could use sort.
sort(v.begin(), v.end(), myfunction);
As I understand your question, you have already parsed your data into 4 vectors, 1 vector per column, and you want to be able to sort your data, specifying the prececedence of the column to sort -- i.e. sort by col1, then col3, then col4...
What you want to do isn't too difficult, but you'll have to backtrack a bit. There are multiple ways to approach the problem, but here's a rough outline. Based on the level of expertise you exhibit in your question, you might have to look a few terms in the following outline, but if you do you'll have a good flexible solution to your problem.
You want to store your data by row, since you want to sort rows... 4 vector for 4 columns won't help you here. If all 4 elements in the row are going to be a the same type, you could use a std::vector or std::array for the row. std::array is solid if # cols is known compile time, std::vector for runtime. If the types are inhomogeneous, you could use a tuple, or a struct. Whatever type you use, let's call it RowT.
Parse and store into your rows, make a vector of RowT.
Define a function-object which provides the () operator for a left and right hand side of RowT. It must implement the "less than operation" following the precedence you want. Lets call that class CustomSorter.
Once you have that in place, your final sort will be:
CustomSorter cs(/*precedence arguments*/);
std::sort(rows.begin(), rows.end(), cs);
Everything is really straightforward, a basic example can bee seen here in the customsort example. In my experience the only part you will have to work at is the sort algorithm itself.
The easiest way is to use a class that has a list of indexes as a member, and go through the list in order to see if the item is less than the other.
class VecLess
{
std::vector<int> indexes;
public:
VecLess(std::vector<int> init) : indexes(init)
{
}
bool operator()(const std::vector<string> & lhs, const std::vector<string> rhs)
{
for (auto i = indexes.begin(); i != indexes.end(); ++i)
{
if (lhs[*i] < rhs[*i])
return true;
if (rhs[*i] < lhs[*i])
return false;
}
return false;
}
};

Good algorithm to turn stl map into sorted list of the keys based on a numeric value

I have a stl map that's of type:
map<Object*, baseObject*>
where
class baseObject{
int ID;
//other stuff
};
If I wanted to return a list of objects (std::list< Object* >), what's the best way to sort it in order of the baseObject.ID's?
Am I just stuck looking through for every number or something? I'd prefer not to change the map to a boost map, although I wouldn't be necessarily against doing something that's self contained within a return function like
GetObjectList(std::list<Object*> &objects)
{
//sort the map into the list
}
Edit: maybe I should iterate through and copy the obj->baseobj into a map of baseobj.ID->obj ?
What I'd do is first extract the keys (since you only want to return those) into a vector, and then sort that:
std::vector<baseObject*> out;
std::transform(myMap.begin(), myMap.end(), std::back_inserter(out), [](std::pair<Object*, baseObject*> p) { return p.first; });
std::sort(out.begin(), out.end(), [&myMap](baseObject* lhs, baseObject* rhs) { return myMap[lhs].componentID < myMap[rhs].componentID; });
If your compiler doesn't support lambdas, just rewrite them as free functions or function objects. I just used lambdas for conciseness.
For performance, I'd probably reserve enough room in the vector initially, instead of letting it gradually expand.
(Also note that I haven't tested the code, so it might need a little bit of fiddling)
Also, I don't know what this map is supposed to represent, but holding a map where both key and value types are pointers really sets my "bad C++" sense tingling. It smells of manual memory management and muddled (or nonexistent) ownership semantics.
You mentioned getting the output in a list, but a vector is almost certainly a better performing option, so I used that. The only situation where a list is preferable is really when you have no intention of ever iterating over it, and if you need the guarantee that pointers and iterators stay valid after modification of the list.
The first thing is that I would not use a std::list, but rather a std::vector. Now as of the particular problem you need to perform two operations: generate the container, sort it by whatever your criteria is.
// Extract the data:
std::vector<Object*> v;
v.reserve( m.size() );
std::transform( m.begin(), m.end(),
std::back_inserter(v),
[]( const map<Object*, baseObject*>::value_type& v ) {
return v.first;
} );
// Order according to the values in the map
std::sort( v.begin(), v.end(),
[&m]( Object* lhs, Object* rhs ) {
return m[lhs]->id < m[rhs]->id;
} );
Without C++11 you will need to create functors instead of the lambdas, and if you insist in returning a std::list then you should use std::list<>::sort( Comparator ). Note that this is probably inefficient. If performance is an issue (after you get this working and you profile and know that this is actually a bottleneck) you might want to consider using an intermediate map<int,Object*>:
std::map<int,Object*> mm;
for ( auto it = m.begin(); it != m.end(); ++it )
mm[ it->second->id ] = it->first;
}
std::vector<Object*> v;
v.reserve( mm.size() ); // mm might have less elements than m!
std::transform( mm.begin(), mm.end(),
std::back_inserter(v),
[]( const map<int, Object*>::value_type& v ) {
return v.second;
} );
Again, this might be faster or slower than the original version... profile.
I think you'll do fine with:
GetObjectList(std::list<Object*> &objects)
{
std::vector <Object*> vec;
vec.reserve(map.size());
for(auto it = map.begin(), it_end = map.end(); it != it_end; ++it)
vec.push_back(it->second);
std::sort(vec.begin(), vec.end(), [](Object* a, Object* b) { return a->ID < b->ID; });
objects.assign(vec.begin(), vec.end());
}
Here's how to do what you said, "sort it in order of the baseObject.ID's":
typedef std::map<Object*, baseObject*> MapType;
MapType mymap; // don't care how this is populated
// except that it must not contain null baseObject* values.
struct CompareByMappedId {
const MapType &map;
CompareByMappedId(const MapType &map) : map(map) {}
bool operator()(Object *lhs, Object *rhs) {
return map.find(lhs)->second->ID < map.find(rhs)->second->ID;
}
};
void GetObjectList(std::list<Object*> &objects) {
assert(objects.empty()); // pre-condition, or could clear it
// or for that matter return a list by value instead.
// copy keys into list
for (MapType::const_iterator it = mymap.begin(); it != mymap.end(); ++it) {
objects.push_back(it->first);
}
// sort the list
objects.sort(CompareByMappedId(mymap));
}
This isn't desperately efficient: it does more looking up in the map than is strictly necessary, and manipulating list nodes in std::list::sort is likely a little slower than std::sort would be at manipulating a random-access container of pointers. But then, std::list itself isn't very efficient for most purposes, so you expect it to be expensive to set one up.
If you need to optimize, you could create a vector of pairs of (int, Object*), so that you only have to iterate over the map once, no need to look things up. Sort the pairs, then put the second element of each pair into the list. That may be a premature optimization, but it's an effective trick in practice.
I would create a new map that had a sort criterion that used the component id of your objects. Populate the second map from the first map (just iterate through or std::copy in). Then you can read this map in order using the iterators.
This has a slight overhead in terms of insertion over using a vector or list (log(n) time instead of constant time), but it avoids the need to sort after you've created the vector or list which is nice.
Also, you'll be able to add more elements to it later in your program and it will maintain its order without need of a resort.
I'm not sure I completely understand what you're trying to store in your map but perhaps look here
The third template argument of an std::map is a less functor. Perhaps you can utilize this to sort the data stored in the map on insertion. Then it would be a straight forward loop on a map iterator to populate a list

C++: shared_ptr as unordered_set's key

Consider the following code
#include <boost/unordered_set.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
int main()
{
boost::unordered_set<int> s;
s.insert(5);
s.insert(5);
// s.size() == 1
boost::unordered_set<boost::shared_ptr<int> > s2;
s2.insert(boost::make_shared<int>(5));
s2.insert(boost::make_shared<int>(5));
// s2.size() == 2
}
The question is: how come the size of s2 is 2 instead of 1? I'm pretty sure it must have something to do with the hash function. I tried looking at the boost docs and playing around with the hash function without luck.
Ideas?
make_shared allocates a new int, and wraps a shared_ptr around it. This means that your two shared_ptr<int>s point to different memory, and since you're creating a hash table keyed on pointer value, they are distinct keys.
For the same reason, this will result in a size of 2:
boost::unordered_set<int *> s3;
s3.insert(new int(5));
s3.insert(new int(5));
assert(s3.size() == 2);
For the most part you can consider shared_ptrs to act just like pointers, including for comparisons, except for the auto-destruction.
You could define your own hash function and comparison predicate, and pass them as template parameters to unordered_map, though:
struct your_equality_predicate
: std::binary_function<boost::shared_ptr<int>, boost::shared_ptr<int>, bool>
{
bool operator()(boost::shared_ptr<int> i1, boost::shared_ptr<int> i2) const {
return *i1 == *i2;
}
};
struct your_hash_function
: std::unary_function<boost::shared_ptr<int>, std::size_t>
{
std::size_t operator()(boost::shared_ptr<int> x) const {
return *x; // BAD hash function, replace with somethign better!
}
};
boost::unordered_set<int, your_hash_function, your_equality_predicate> s4;
However, this is probably a bad idea for a few reasons:
You have the confusing situation where x != y but s4[x] and s4[y] are the same.
If someone ever changes the value pointed-to by a hash key your hash will break! That is:
boost::shared_ptr<int> tmp(new int(42));
s4[tmp] = 42;
*tmp = 24; // UNDEFINED BEHAVIOR
Typically with hash functions you want the key to be immutable; it will always compare the same, no matter what happens later. If you're using pointers, you usually want the pointer identity to be what is matched on, as in extra_info_hash[&some_object] = ...; this will normally always map to the same hash value whatever some_object's members may be. With the keys mutable after insertion is it all too easy to actually do so, resulting in undefined behavior in the hash.
Notice that in Boost <= 1.46.0, the default hash_value of a boost::shared_ptr is its boolean value, true or false.
For any shared_ptr that is not NULL, hash_value evaluates to 1 (one), as the (bool)shared_ptr == true.
In other words, you downgrade a hash set to a linked list if you are using Boost <= 1.46.0.
This is fixed in Boost 1.47.0, see https://svn.boost.org/trac/boost/ticket/5216 .
If you are using std::shared_ptr, please define your own hash function, or use boost/functional/hash/extensions.hpp from Boost >= 1.51.0
As you found out, the two objects inserted into s2 are distinct.

LINQ and CASE Sensitivity

I have this LINQ Query:
TempRecordList = new ArrayList(TempRecordList.Cast<string>().OrderBy(s => s.Substring(9, 30)).ToArray());
It works great and performs sorting in a way that's accurate but a little different from what I want. Among the the result of the query I see something like this:
Palm-Bouter, Peter
Palmer-Johnson, Sean
Whereas what I really need is to have names sorted like this:
Palmer-Johnson, Sean
Palm-Bouter, Peter
Basically I want the '-' character to be treated as being lower than the character so that names that contain it show up later in an ascending search.
Here is another example. I get:
Dias, Reginald
DiBlackley, Anton
Instead of:
DiBlackley, Anton
Dias, Reginald
As you can see, again, the order is switched due to how the uppercase letter 'B' is treated.
So my question is, what do I need to change in my LINQ query to make it return results in the order I specified. Any feedback would be greatly appreaciated.
By the way, I tried using s.Substring(9, 30).ToLower() but that didn't help.
Thank you!
To customize the sorting order you will need to create a comparer class that implements IComparer<string> interface. The OrderBy() method takes comparer as second parameter.
internal sealed class NameComparer : IComparer<string> {
private static readonly NameComparer DefaultInstance = new NameComparer();
static NameComparer() { }
private NameComparer() { }
public static NameComparer Default {
get { return DefaultInstance; }
}
public int Compare(string x, string y) {
int length = Math.Min(x.Length, y.Length);
for (int i = 0; i < length; ++i) {
if (x[i] == y[i]) continue;
if (x[i] == '-') return 1;
if (y[i] == '-') return -1;
return x[i].CompareTo(y[i]);
}
return x.Length - y.Length;
}
}
This works at least with the following test cases:
var names = new[] {
"Palmer-Johnson, Sean",
"Palm-Bouter, Peter",
"Dias, Reginald",
"DiBlackley, Anton",
};
var sorted = names.OrderBy(name => name, NameComparer.Default).ToList();
// sorted:
// [0]: "DiBlackley, Anton"
// [1]: "Dias, Reginald"
// [2]: "Palmer-Johnson, Sean"
// [3]: "Palm-Bouter, Peter"
As already mentioned, the OrderBy() method takes a comparer as a second parameter.
For strings, you don't necessarily have to implement an IComparer<string>. You might be fine with System.StringComparer.CurrentCulture (or one of the others in System.StringComparer).
In your exact case, however, there is no built-in comparer which will handle also the - after letter sort order.
OrderBy() returns results in ascending order.
e comes before h, thus the first result (remember you're comparing on a substring that starts with the character in the 9th position...not the beginning of the string) and i comes before y, thus the second. Case sensitivity has nothing to do with it.
If you want results in descending order, you should use OrderByDescending():
TempRecordList.Cast<string>
.OrderByDescending(s => s.Substring(9, 30)).ToArray());
You might want to just implement a custom IComparer object that will give a custom priority to special, upper-case and lower-case characters.
http://msdn.microsoft.com/en-us/library/system.collections.icomparer.aspx

Boost Spirit rule with custom attribute parsing

I am writing a Boost Spirit grammar to parse text into a vector of these structs:
struct Pair
{
double a;
double b;
};
BOOST_FUSION_ADAPT_STRUCT(
Pair,
(double, a)
(double, a)
)
This grammar has a rule like this:
qi::rule<Iterator, Pair()> pairSequence;
However, the actual grammar of pairSequence is this:
double_ % separator
I want this grammar to produce a Pair with a equal to the double and b equal to some constant. I want to do something like this:
pairSequence = double_[_val = Pair(_1, DEFAULT_B)] % separator;
The above does not compile, of course. I tried adding a constructor to Pair, but I still get compile errors (no matching function for call to 'Pair::Pair(const boost::phoenix::actor >&, double)').
First of all, the signature of pairSequence needs to be:
qi::rule<Iterator, std::vector<Pair>()> pairSequence;
as the list operator exposes a std::vector<Pair> as its attribute.
All functions called from inside a semantic action have to be 'lazy', so you need to utilize phoenix:
namespace phx = boost::phoenix;
pairSequence =
double_[
phx::push_back(_val,
phx::construct<Pair>(_1, phx::val(DEFAULT_B))
)
] % separator
;
Another possibility would be to add a (non-explicit) constructor to Pair:
struct Pair
{
Pair(double a) : a(a), b(DEFAULT_B) {}
double a;
double b;
};
which allows to simplify the grammar:
pairSequence = double_ % separator;
and completely relies on Spirit's built-in attribute propagation rules.
BTW, for any of this to work, you don't need to adapt Pair as a Fusion sequence.

Resources