Boost.Intrusive associative containers offer an interface similar to STL associative containers. However, STL's ordered and unordered simple associative containers (std::set, std::multiset, std::unordered_set and std::unordered_multiset) have some inefficiencies caused by the interface of several search, insertion and erasure functions (equal_range, lower_bound, upper_bound, find, insert, erase...): the user can only operate with value_type objects or (starting from C++14 for ordered containers and C++20 for unordered ones) heterogeneous comparison lookups. The latter are not flexible enough, because key_compare itself must support the comparison between the provided key and value_type, which precludes the use of user-defined comparison objects that can partition the search in a compatible but advanced way.
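To make the limitation concrete, the following is a minimal sketch of standard heterogeneous lookup with a C++14 transparent comparator. The names Person, PersonNameLess, PersonSet and age_of are hypothetical and chosen only for this illustration. Note that every key type to be searched has to be anticipated inside the comparator, which is part of the container type; a different comparison object cannot be supplied at the call site.

```cpp
#include <cassert>
#include <cstring>
#include <set>
#include <string>

// Hypothetical value type, used only for this illustration.
struct Person
{
   std::string name;
   int age;
};

// Transparent comparator: the is_transparent member enables the
// heterogeneous overloads of std::set::find, lower_bound, etc. (C++14).
struct PersonNameLess
{
   using is_transparent = void;

   bool operator()(const Person &a, const Person &b) const
   {  return a.name < b.name;  }
   bool operator()(const Person &a, const char *b) const
   {  return std::strcmp(a.name.c_str(), b) < 0;  }
   bool operator()(const char *a, const Person &b) const
   {  return std::strcmp(a, b.name.c_str()) < 0;  }
};

// The comparator is fixed in the container type.
using PersonSet = std::set<Person, PersonNameLess>;

// Returns the age stored for `name`, or -1 if no such person exists.
// No temporary Person (and no std::string) is constructed for the lookup.
int age_of(const PersonSet &people, const char *name)
{
   PersonSet::const_iterator it = people.find(name);
   return it == people.end() ? -1 : it->age;
}
```

Calling age_of(people, "alice") searches with the C-string directly, but only because PersonNameLess was written in advance to accept const char *.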
To solve these problems, Boost.Intrusive containers offer functions where a key type different from key_type and a comparison object are provided by the user. This applies to:
* equal_range
* lower_bound
* upper_bound
* count
* find
* erase
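Requirements for such functions are:
* For unordered containers, the provided comparison and hashing functions, applied to the given key, shall induce the same hash and equivalence as key_compare and hasher.
* For ordered associative containers, the container to be searched shall be partitioned with respect to the supplied comparison object and key.
For more details, see the Requires clauses of such functions in the reference.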
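The idea of a comparison object that is compatible with an existing order can be illustrated with standard algorithms alone. The sketch below uses a sorted std::vector as a stand-in for an ordered container; Name, FamilyComp and count_family are hypothetical names introduced for this example. The elements are ordered by (family, given), and FamilyComp compares only the family name. Because all elements with the same family name are contiguous in that order, the sequence is partitioned with respect to FamilyComp and the search is valid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record, ordered by family name and then by given name.
struct Name
{
   std::string family;
   std::string given;
};

inline bool operator<(const Name &a, const Name &b)
{
   if(a.family != b.family) return a.family < b.family;
   return a.given < b.given;
}

// Compares a Name against a bare family name. It looks at less
// information than operator<, but never contradicts its order.
struct FamilyComp
{
   bool operator()(const Name &n, const std::string &family) const
   {  return n.family < family;  }
   bool operator()(const std::string &family, const Name &n) const
   {  return family < n.family;  }
};

// Counts the entries with the given family name in a vector that is
// already sorted with operator<.
std::size_t count_family(const std::vector<Name> &sorted_names,
                         const std::string &family)
{
   typedef std::vector<Name>::const_iterator It;
   std::pair<It, It> range = std::equal_range
      (sorted_names.begin(), sorted_names.end(), family, FamilyComp());
   return static_cast<std::size_t>(range.second - range.first);
}
```

A comparison object that looked only at the given name would not be compatible here: entries with the same given name are scattered across the sorted sequence, so it would not be partitioned and the result would be meaningless.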
Imagine that the object to be searched is quite expensive to construct (called Expensive in the example):
    #include <boost/intrusive/set.hpp>
    #include <boost/intrusive/unordered_set.hpp>
    #include <cstring>
    #include <string>

    // Hash function for strings
    struct StrHasher
    {
       std::size_t operator()(const char *str) const
       {
          //Simple example, use favorite hash function (like boost::hash_combine)
          std::size_t seed = 0;
          for(; *str; ++str){
             std::size_t x = std::size_t(*str);
             x = ((x >> 16) ^ x) * 0x45d9f3b;
             x = (x >> 16) ^ x;
             seed += x;
          }
          return seed;
       }
    };

    using namespace boost::intrusive;

    class Expensive : public set_base_hook<>, public unordered_set_base_hook<>
    {
       std::string key_;
       // Other members...

       public:
       Expensive(const char *key)
          :  key_(key)
          {}  //other expensive initializations...

       const std::string & get_key() const
          {  return key_;   }

       friend bool operator <  (const Expensive &a, const Expensive &b)
          {  return a.key_ < b.key_;  }

       friend bool operator == (const Expensive &a, const Expensive &b)
          {  return a.key_ == b.key_;  }

       friend std::size_t hash_value(const Expensive &object)
          {  return StrHasher()(object.get_key().c_str());  }
    };

    // A set and unordered_set that store Expensive objects
    typedef set<Expensive>           Set;
    typedef unordered_set<Expensive> UnorderedSet;

    // Search functions
    Expensive *get_from_set(const char* key, Set &set_object)
    {
       Set::iterator it = set_object.find(Expensive(key));
       if( it == set_object.end() )     return 0;
       return &*it;
    }

    Expensive *get_from_uset(const char* key, UnorderedSet &uset_object)
    {
       UnorderedSet::iterator it = uset_object.find(Expensive (key));
       if( it == uset_object.end() )  return 0;
       return &*it;
    }
If the "key" C-string is quite long, Expensive has to construct a std::string using heap memory. As with Expensive, it is common that only a small part of a class takes part in ordering. For example, with Expensive, only the internal std::string is needed to compare objects.
In both containers, if we call get_from_set or get_from_uset in a loop, we might pay a performance penalty, because we are forced to create a whole Expensive object just to find an equivalent one.
Sometimes the problem is not only performance-related: we might not have enough information to construct the object, but we might have enough information to find it. In this case, a name is enough to search for Expensive objects in the container, but constructing an Expensive object might require more information than the searcher has.
To solve this, we can use the functions that take any type comparable with the value and a 'compatible' comparison object (and a hash function, when the associative container is unordered). Let's see the optimized search functions:
    // These compare Expensive and a c-string
    struct StrExpComp
    {
       bool operator()(const char *str, const Expensive &c) const
       {  return std::strcmp(str, c.get_key().c_str()) < 0;  }

       bool operator()(const Expensive &c, const char *str) const
       {  return std::strcmp(c.get_key().c_str(), str) < 0;  }
    };

    struct StrExpEqual
    {
       bool operator()(const char *str, const Expensive &c) const
       {  return std::strcmp(str, c.get_key().c_str()) == 0;  }

       bool operator()(const Expensive &c, const char *str) const
       {  return std::strcmp(c.get_key().c_str(), str) == 0;  }
    };

    // Optimized search functions
    Expensive *get_from_set_optimized(const char* key, Set &set_object)
    {
       Set::iterator it = set_object.find(key, StrExpComp());
       if( it == set_object.end() )   return 0;
       return &*it;
    }

    Expensive *get_from_uset_optimized(const char* key, UnorderedSet &uset_object)
    {
       UnorderedSet::iterator it = uset_object.find(key, StrHasher(), StrExpEqual());
       if( it == uset_object.end() )  return 0;
       return &*it;
    }
A similar issue happens with insertions in simple ordered and unordered associative containers with unique keys (std::set and std::unordered_set). In these containers, if a value is already present, the value to be inserted is discarded. With expensive values, if the value is already present, we can suffer efficiency problems.
set and unordered_set-like containers have insertion functions (insert_check, insert_unique_check,...) to check efficiently, without constructing the value, whether a value is present and, if it is not, a function to insert it immediately (insert_commit) without any further lookup.
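Requirements for functions that check the existence of such a value are:
* For unordered containers, the provided comparison and hashing functions, applied to the given key, shall induce the same hash and equivalence as key_compare and hasher.
* For ordered associative containers, the provided comparison function, applied to the given key, shall induce the same strict weak order as key_compare.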
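To sum up, insert_check is similar to a normal insert but:
* insert_check can be used with arbitrary keys.
* If the insertion is possible (there is no equivalent value), insert_check collects all the needed information in an insert_commit_data structure, so that insert_commit:
  * does not execute further comparisons
  * can be executed with constant-time complexity
  * has a no-throw guarantee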
These functions must be used with care: no other insertion or erasure may be executed between an insert_check and insert_commit pair. Otherwise, the behaviour is undefined.
See the reference of set and unordered_set-like containers for more information about insert_check and insert_commit.
With multiple ordered and unordered associative containers (multiset and unordered_multiset) there is no need for these advanced insertion functions, since insertions are always successful.
For example, using the same Expensive class, these functions can be inefficient:
    // Insertion functions
    bool insert_to_set(const char* key, Set &set_object)
    {
       Expensive *pobject = new Expensive(key);
       bool success = set_object.insert(*pobject).second;
       if(!success)   delete pobject;
       return success;
    }

    bool insert_to_uset(const char* key, UnorderedSet &uset_object)
    {
       Expensive *pobject = new Expensive(key);
       bool success = uset_object.insert(*pobject).second;
       if(!success)   delete pobject;
       return success;
    }
If the object is already present, we are constructing an Expensive that will be discarded, and this is a waste of resources. Instead of that, let's use the insert_check and insert_commit functions:
    // Optimized insertion functions
    bool insert_to_set_optimized(const char* key, Set &set_object)
    {
       Set::insert_commit_data insert_data;
       bool success = set_object.insert_check(key, StrExpComp(), insert_data).second;
       if(success) set_object.insert_commit(*new Expensive(key), insert_data);
       return success;
    }

    bool insert_to_uset_optimized(const char* key, UnorderedSet &uset_object)
    {
       UnorderedSet::insert_commit_data insert_data;
       bool success = uset_object.insert_check
          (key, StrHasher(), StrExpEqual(), insert_data).second;
       if(success) uset_object.insert_commit(*new Expensive(key), insert_data);
       return success;
    }
Some ordered associative containers offer low-level functions to bypass ordering checks and insert nodes directly in desired tree positions. These functions are provided for performance reasons when values to be inserted in the container are known to fulfill order (sets and multisets) and uniqueness (sets) invariants. A typical usage of these functions is when intrusive associative containers are used to build non-intrusive containers and the programmer wants to speed up assignments from other associative containers: if the ordering and uniqueness properties are the same, there is no need to waste time checking the position of each source value, because values are already ordered: back insertions will be much more efficient.
Note: These functions don't check preconditions, so they must be used with care. They are low-level operations that will break container invariants if ordering and uniqueness preconditions are not assured by the caller.
Let's see an example:
    #include <boost/intrusive/set.hpp>
    #include <vector>
    #include <functional>
    #include <cassert>

    using namespace boost::intrusive;

    //A simple class with a set hook
    class MyClass : public set_base_hook<>
    {
       public:
       int int_;

       MyClass(int i) :  int_(i) {}
       friend bool operator< (const MyClass &a, const MyClass &b)
          {  return a.int_ < b.int_;  }
       friend bool operator> (const MyClass &a, const MyClass &b)
          {  return a.int_ > b.int_;  }
    };

    int main()
    {
       //Create some ORDERED elements
       std::vector<MyClass> values;
       for(int i = 0; i < 100; ++i)  values.push_back(MyClass(i));

       {  //Data is naturally ordered in the vector with the same criteria
          //as multiset's comparison predicate, so we can just push back
          //all elements, which is more efficient than normal insertion
          multiset<MyClass> mset;
          for(std::size_t i = 0; i != 100u; ++i)  mset.push_back(values[i]);

          //Now check the ordering invariant
          multiset<MyClass>::const_iterator next(mset.cbegin()), it(next++);
          for(std::size_t i = 0; i < 99u; ++i, ++it, ++next) assert(*it < *next);
       }
       {  //Now the correct order for the set is the reverse order
          //so let's push front all elements
          multiset<MyClass, compare< std::greater<MyClass> > > mset;
          for(std::size_t i = 0; i < 100u; ++i)  mset.push_front(values[i]);

          //Now check the ordering invariant
          multiset<MyClass, compare< std::greater<MyClass> > >::
             const_iterator next(mset.cbegin()), it(next++);
          for(std::size_t i = 0; i < 99u; ++i, ++it, ++next) assert(*it > *next);
       }
       {  //Now push the first and the last and insert the rest
          //before the last position using "insert_before"
          multiset<MyClass> mset;
          mset.insert_before(mset.begin(), values[0]);
          multiset<MyClass>::const_iterator pos =
             mset.insert_before(mset.end(), values[99]);
          for(std::size_t i = 1; i < 99u; ++i)  mset.insert_before(pos, values[i]);

          //Now check the ordering invariant
          multiset<MyClass>::const_iterator next(mset.cbegin()), it(next++);
          for(int i = 0; i < 99; ++i, ++it, ++next) assert(*it < *next);
       }

       return 0;
    }
For more information about advanced lookup and insertion functions see the associative containers' documentation (e.g. the set, multiset, unordered_set and unordered_multiset references).