比较两个集合的相等性,无论它们中的项顺序如何

I would like to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.

I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.

In my case, two collections would be equal if they both contain the same items (no matter the order).

Example:

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1 == collection2; // true

What I usually do is to loop through each item of one collection and see if it exists in the other collection, then loop through each item of the other collection and see if it exists in the first collection. (I start by comparing the lengths).

if (collection1.Count != collection2.Count)
    return false; // the collections are not equal

foreach (Item item in collection1)
{
    if (!collection2.Contains(item))
        return false; // the collections are not equal
}

foreach (Item item in collection2)
{
    if (!collection1.Contains(item))
        return false; // the collections are not equal
}

return true; // the collections are equal

However, this is not entirely correct, and it's probably not the most efficient way to do compare two collections for equality.

An example I can think of that would be wrong is:

collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}

Which would be equal with my implementation. Should I just count the number of times each item is found and make sure the counts are equal in both collections?


The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish, it does not matter.

Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).


How about algorithm? All answer related by compare something, generic lists compare linq etc. Really did we promised to someone that we will never use algorithm as an old fashioned programmer?

You are not checking for Equality you are checking for Equivalence. It's nitpicky but an important distinction. And a long time ago. This is a good Q+A.

You may be interested in this post, which discusses a tuned version of the dictionary-based method described below. One issue with most simple dictionary approaches is that they don't handle nulls properly because .NET's Dictionary class doesn't permit null keys.

I'm not 100% sure but I think your answer violates Microsoft's terms of use against reverse engineering.

Hello Ohad, Please read the following long debate in the topic , stackoverflow.com/questions/371328/… If you change object hashcode , while its in a hashset it will interrupt with the hashset proper action and might cause an exception . The rule is as following : If two objects are equals - they must have same hash code. If two objects have same hashcode - it isnt a must for them to be equal. Hashcode must stay same for entire object's lifetime! Thats why you impelment ICompareable and IEqualrity .

JamesRoeiter Perhaps my comment was misleading. When a dictionary encounters a hashcode it already contains, it checks for actual equality with an EqualityComparer (either the one you supplied or EqualityComparer.Default, you can check Reflector or the reference source to verify this). True, if objects change (and specifically thier hashcode changes) while this method is running then the results are unexpected, but that just means this method is not thread safe in this context.

JamesRoeiter Suppose x and y are two objects we want to compare. If they have different hashcodes, we know they are different (because equal items have equal hashcodes), and the above implementation is correct. If they have the same hashcode, the dictionary implementation will check for actual equality using the specified EqualityComparer (or EqualityComparer.Default if none was specified) and again the implementation is correct.

CADbloke the method has to be named Equals because of the IEqualityComparer<T> interface. What you should be looking at is the name of the comparer itself. In this case it's MultiSetComparer which makes sense.

You just have to add a using System.Linq; first to make it work

if this code is within a loop and collection1 gets updated and collection2 remains untouched, notice even when both collections have the same object, debugger would show false for this "equal" variable.

Chaulky - I believe the OrderBy is needed. See: dotnetfiddle.net/jA8iwE

Which was the other answer referred to as "above"? Possibly stackoverflow.com/a/50465/3195477 ?

This is almost what I want. However, I'd like to be able to do this even if I am not using integers. I'd like to use reference objects, but they do not behave properly as keys in dictionaries.

Mono, your question is moot if your Items are not comparable. If they cannot be used as keys in Dictionary, there is no solution available.

I think Mono meant the keys are not sortable. But Daniel's solution is clearly intended to be implemented with a hashtable, not a tree, and will work as long as there's an equivalence test and a hash function.

Upvoted of course for the help, but not accepted since it's missing an important point (which I cover in my answer).

FWIW, you can simplify your last foreach loop and return statement with this: return dict.All(kvp => kvp.Value == 0);

Nice job, but Note: 1. In contrast to Daniel Jennings solution, This is not O(N) but rather O(N^2), because of the find function inside the foreach loop on the bar collection; 2. You can generalize the method to accept IEnumerable<T> instead of ICollection<T> with no further modification to the code

The keys of a dictionary are compared by reference, so we have to find the original key that is equivalent to the "item" - this is not true. The algorithm is based on wrong assumptions and while works, it is terribly inefficient.

of course, using a HashSet assumes no duplicates but if so HashSet is the best way to go

I've just found on latest version that there is a parameter bool ignoreOrder on ShouldBe method.

The only issue with this code is that it only works when comparing value types or comparing the pointers to reference types. I could have two different instances of the same object in the collections, so I need to be able to specify how to compare each. Can you pass a comparison delegate to the intersect method?

Sure, you can pass a comparer delegate. But, note the above limitation regarding sets that I added, which puts a significant limit on its applicability.

The Intersect method returns a distinct collection. Given a = {1,1,2} and b ={2,2,1}, a.Intersect(b).Count() != a.Count, which causes your expression to correctly return false. {1,2}.Count != {1,1,2}.Count See link[/link] (Note that both sides are made distinct before comparison.)

What is the point of Serializable for Comparer class? :o Also you can change the input to ISet<T> to express it is meant for sets (ie no duplicates).

nawfal thanks, don't know what I was thinking when I marked it Serializable... As for ISet, the idea here was to treat the IEnumerable as a set (because you got an IEnumerable to begin with), though considering the 0 upvotes in over 5 years that may not have been the sharpest idea :P

Except won't work for counting duplicate items. It will return true for sets {1,2,2} and {1,1,2}.

CristiDiaconescu you could do a ".Distinct()" first to remove any duplicates

The OP is asking for [1, 1, 2] != [1, 2, 2] . Using Distinct would make them look equal.

How well does this perform, any ideas?

I only use this for small collections, so have not thought about Big-O complexity or done benchmarking. HaveMismatchedElements alone is O(M*N) so it may not perform well for large collections.

If IEnumerable<T>s are queries then calling Count() is not a good idea. Ohad's original answer's approach of checking if they are ICollection<T> is the better idea.