unordered_set problem

Question

nitin1 15 Master Poster

12 Years Ago

can anyone please explain me in brief that how exactly this container work ? or can u giveme some link of this. i have tried to look on some links by google, but i am not much clear about my doubt. please help me if u can. thanks.

c++

4 Contributors
16 Replies
423 Views
1 Day Discussion Span
Latest Post 12 Years Ago Latest Post by mike_2000_17

All 16 Replies

NathanOliver 429 Veteran Poster

12 Years Ago

Check this out

NathanOliver 429 Veteran Poster

12 Years Ago

Lets say you need to read in a text file and display a list of all of the unique words in the the file in the order they appeared. Using an unordered set you would be able to do that by reading in each word from the file and inserting it into the set. then all you have to do is print out the set from first to last.

nitin1 commented: nicely answered!! +2

mike_2000_17 2,669 21st Century Viking

12 Years Ago

the words will be stored in the order they are added to the unordered set unlike a set where they are put in order of thier key value.

That's not true. Not at all. The values stored in an unordered_set will appear in whatever order they map to through the hashing function and the bucketing (to resolve collisions). This is why it is specified as "store elements in no particular order". If an implementation was required to preserve the order of insertion (as you suggest), there would be no way to implement this without doing an indirect indexing, and that is an unacceptable compromise for most purposes (if you want indirect indexing, just do it yourself through a vector and an unordered_set working together).

if i iterate it, then will they be in any specific order ?

No. The elements appear in "no particular order", which means neither sorted nor in their original insertion order.

what can be the use of it ?

If you don't need the data to be sorted (as in a std::set), then you can use an std::unordered_set for the same purposes you would use an std::set. Then, lookups, insertions, removals, and just about any operation that requires finding a particular value in the set can be done in constant time O(1) (amortized) as opposed to logarithmic time O(logN). For large numbers of elements, this can be a huge performance benefit. For smaller sets, you get less of a performance boost because of the additional overhead of creating hash values (more expensive than comparisons, in general), and some of the memory overhead of the somewhat more complicated underlying memory structure.

just i am inserting the strings and then in future i can check that i have added it already or not ?

Yes, that is one additional purpose for an unordered_set.

that's all ?

The basic purpose of a std::set is to be able to do fast lookups by value (and ensure uniqueness), the fact that the values are sorted is an additional bonus. If you have no specific need for the values being sorted (i.e., you don't need to traverse the elements in any particular order, or at all), then std::set and std::unordered_set are essentially interchangeable, and the choice boils down to performance.

The performance choices look like this:
Fast lookups on large data sets: Use std::unordered_set or std::unordered_map.
A large data set in sorted order (in-order traversals): Use a sorted std::vector.
Fast lookups AND/OR in-order traversals on small data sets: Use std::set or std::map.
Fast lookups AND in-order traversals on large data sets: You're in trouble. You will probably have to use a indirect indexing or multi-index implementation.

N.B.: Large / small in the context of lookup speed is about whether the overhead of that hash function / table is acceptable for the benefit of the average constant-time lookups.
N.B.: Large / small in the context of in-order traversals is about the size of the cache and general fragmentation of your program's memory footprint.

i want to know how unordered_set works and does it hash the values or not and how it retrive the results for queries.

It's a hash table implementation, the element values are processed by a hash function that generates a hash-value which is essentially an index into an array of buckets (usually sparsely stored). Within each bucket of the array that a hash value might map to, there is allowance for a multiple number of elements that result in the same hash value (collisions), and then there is probably some rather simple method to arrange them (or no arrangement at all).

A query is resolved by first processing the query value through the hash function to get its hash value. Then, directly lookup the corresponding bucket (that's just a simple indexing into an array). If the bucket is empty, then the query fails (no match). If the bucket has one or more elements in it, the some simple method is used to find the unique match (e.g., linear search or binary search, depending on the arrangement within the buckets).

The performance of a hash table depends strongly on how good the hash function is with respect to the distribution (and amount) of the data you store in the hash table, this is a fine art. This is probably why it wasn't included in the original standard library, even though the std::hash_set and std::hash_map has been available in most implementations since the beginning.

As for the actual details of the implementation, I don't know them, and they will likely vary between implementations, you'd have to look it up.

mike_2000_17 2,669 21st Century Viking

12 Years Ago

we can say that set (ordered or unordered) are MAINLY used for look-ups ? and i have a big advantage with set(ordered) that they are arranged in sorted order when i will traverse it. am i roght ?

Yes that's right.

secondly, if can i have a set of structre ? if yes, then how will it check for uniqueness ? thanks.

If you want to use unordered_set with a custom class (or structure), you have to define a hash function as well as an equality comparison function, as you see from the signature of the class template:

template < class Key,                        // unordered_set::key_type/value_type
           class Hash = hash<Key>,           // unordered_set::hasher
           class Pred = equal_to<Key>,       // unordered_set::key_equal
           class Alloc = allocator<Key>      // unordered_set::allocator_type
           > class unordered_set;

The default equality comparison is the == operator for the class (which you can define).

For a std::set, it is similar, you have to define your own comparator (analogous to less-than), or simply overload the less-than operator.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

nitin1 15 Master Poster · Answer 1 · 2013-02-14T17:57:07+00:00

i have already read out this. i want to knoe the working or some knowledge about its implementation. thanks

deceptikon 1,790 Code Sniper Team Colleague Featured Poster · Answer 2 · 2013-02-14T18:00:07+00:00

i want to knoe the working or some knowledge about its implementation.

The whole point of abstracting things into an interface is that you don't need to know about the implementation. But it's a fairly safe bet that you're looking at some variation of a hash table. Open up the header and see if you can decipher the code, it'll tell you everything you want to know.

nitin1 15 Master Poster · Answer 3 · 2013-02-14T18:07:47+00:00

sir, i know that codes are really tuff to understand. but i was willing to know just about basic working. like, when you u have told me about map in STL. (they are implemented as red black trees and they have logn complexity and all..). so , i want to know how unordered_set works and does it hash the values or not and how it retrive the results for queries. thanks

deceptikon 1,790 Code Sniper Team Colleague Featured Poster · Answer 4 · 2013-02-14T18:11:09+00:00

so , i want to know how unordered_set works and does it hash the values or not and how it retrive the results for queries. thanks

Please read my posts in their entirety. You'll find that I already answered this question: it's some variation of hash table. If you want to learn how unordered_set probably works under the hood, go learn about hash tables.

nitin1 15 Master Poster · Answer 5 · 2013-02-14T18:15:03+00:00

unordered_set<string>mm;

what can be the use of it ? just i am inserting the strings and then in future i can check that i have added it already or not ? that's all ?

secondly, how does it save the collisons ? i mean if it hashes the values , then collisions will be there for sure.in short, i am asking that it this always reliable to use it ? thanks

deceptikon 1,790 Code Sniper Team Colleague Featured Poster · Answer 6 · 2013-02-14T18:31:25+00:00

unordered_set is a hash table, it does manage collisions. How it works is solely dependent on which variation of hash table the author chose to use. You use it for any number of cases where a hash table is a viable solution.

nitin1 15 Master Poster · Answer 7 · 2013-02-14T18:42:13+00:00

if i iterate it, then will they be in any specific order ? (as u said "in order"). please throw some light on that. thanks for this help.

NathanOliver 429 Veteran Poster Featured Poster · Answer 8 · 2013-02-14T18:43:51+00:00

the words will be stored in the order they are added to the unordered set unlike a set where they are put in order of thier key value.

NathanOliver 429 Veteran Poster Featured Poster · Answer 9 · 2013-02-15T13:11:27+00:00

Mike thanks for the clarification. I thought that the set maintained the order that the elements were inserted in. I guess I need to read a further and not make assumptions. Sorry for the bad advice.

nitin1 15 Master Poster · Answer 10 · 2013-02-15T14:53:14+00:00

we can say that set (ordered or unordered) are MAINLY used for look-ups ? and i have a big advantage with set(ordered) that they are arranged in sorted order when i will traverse it. am i roght ?

secondly, if can i have a set of structre ? if yes, then how will it check for uniqueness ? thanks.

nitin1 15 Master Poster · Answer 11 · 2013-02-16T05:59:37+00:00

sir, please tell me that can i say both map and set as associative arrays ? i have read this and seen that associative arrays have key,value pairs (like map). but set don't have value , then why are they called associtiave array ?

secondly, map are implemented as Red-black trees. right ? that mean map don't hash the key but unordered_set hashes it. can we have a attached value with unordered_set ?

thridly, i have read that hash tables are used to implement assoctiave arrays. but again , i saw that maps are assoctive arrays, but map dont use hashing. then how hash tables are used to implement assoctive arrays? thanks.

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 12 · 2013-02-16T07:35:16+00:00

sir, please tell me that can i say both map and set as associative arrays ? i have read this and seen that associative arrays have key,value pairs (like map). but set don't have value , then why are they called associtiave array ?

Yes, they are both types of associative containers, at least, in the world of C++ (it might not be universal among computer scientists, I would presume some would only consider things like std::map or std::unordered_map as associative containers). Here is the definition of "associative containers" from the C++ standard (section 23.2.4/1-2):

Associative containers provide fast retrieval of data based on keys. The library provides four basic kinds of associative containers: set, multiset, map and multimap.
Each associative container is parameterized on Key and an ordering relation Compare that induces a strict weak ordering (25.4) on elements of Key. In addition, map and multimap associate an arbitrary type T with the Key. The object of type Compare is called the comparison object of a container.

The defining characteristic of associative containers is the "fast retrieval of data based on keys", whether the "data" is the key itself (e.g., std::set) or another value (e.g., std::map) doesn't change that, it's still called an associative container. The second clause also is the defining characteristic of ordered containers (set or map), that is, they are organized by a method of strict weak ordering (e.g., less-than, greater-than, etc.).

By contrast, unordered_set or unordered_map are "unordered associative containers", as in section 23.2.5/1-5:

Unordered associative containers provide an ability for fast retrieval of data based on keys. The worst-case complexity for most operations is linear, but the average case is much faster. The library provides four unordered associative containers: unordered_set, unordered_map, unordered_multiset, and
unordered_multimap.
[snip]
Each unordered associative container is parameterized by Key, by a function object type Hash that meets the Hash requirements (17.6.3.4) and acts as a hash function for argument values of type Key, and by a binary predicate Pred that induces an equivalence relation on values of type Key. Additionally, unordered_map and unordered_multimap associate an arbitrary mapped type T with the Key.
A hash function is a function object that takes a single argument of type Key and returns a value of type std::size_t.
Two values k1 and k2 of type Key are considered equivalent if the container’s key_equal function object returns true when passed those values. If k1 and k2 are equivalent, the hash function shall return the same value for both.

secondly, map are implemented as Red-black trees. right ?

Presumably, yes. They don't have to, but it is a very likely choice. If not, it will be a very similar type of self-balancing binary search tree.

BTW, the std::set is most likely the same implementation as std::map on most implementation, except without the mapped values (only the keys).

can we have a attached value with unordered_set ?

Sure, just like you would with std::set. If you create a custom class that contains a number of different fields, and you want to do the lookups based on one of the fields only, then you would create a custom hash function that hashes only that one particular field and a equality predicate that compares only that one particular field, and you'll be able to use it like that. This is the same with std::set, except that in that case you need to create a custom comparison function only. But, of course, it is much easier to do this with a map or unordered_map (and that's why they exist).

thridly, i have read that hash tables are used to implement assoctiave arrays. but again , i saw that maps are assoctive arrays, but map dont use hashing. then how hash tables are used to implement assoctive arrays? thanks.

They're all associative containers. std::set, std::multiset, std::map and std::multimap are ordered associative containers (mostly likely implemented with red-black trees or some other variations of a binary search tree). std::unordered_set, std::unordered_multiset, std::unordered_map and std::unordered_multimap are unordered associative containers (hash tables).

I think you are confusing the implementation method with the category of functionality. When we say "associative container", we are referring to a particular functionality that it offers, i.e., "fast retrieval of data based on keys". This says nothing about how it's implemented. In fact, by and large, the standard rarely specifies how anything is to be implemented, it specifies what functionalities must be provided and the observable behaviour of invoking those functionalities. The implementation details are left to the implementer (compiler vendor) and are of no concern to the standard or the users (programmers), but, by and large, the required functionality and behavior (incl. performance) on the operations significantly narrow down the possibilities (e.g., red-black tree, hash table, etc.).

unordered_set problem

Recommended Answers Collapse Answers

All 16 Replies

Recommended Answers