标签云

微信群

扫码加入我们

WeChat QR Code

I would like to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.

I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.

In my case, two collections would be equal if they both contain the same items (no matter the order).

Example:

collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};

collection1 == collection2; // true

What I usually do is to loop through each item of one collection and see if it exists in the other collection, then loop through each item of the other collection and see if it exists in the first collection. (I start by comparing the lengths).

if (collection1.Count != collection2.Count)
    return false; // the collections are not equal

foreach (Item item in collection1)
{
    if (!collection2.Contains(item))
        return false; // the collections are not equal
}

foreach (Item item in collection2)
{
    if (!collection1.Contains(item))
        return false; // the collections are not equal
}

return true; // the collections are equal

However, this is not entirely correct, and it's probably not the most efficient way to do compare two collections for equality.

An example I can think of that would be wrong is:

collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}

Which would be equal with my implementation. Should I just count the number of times each item is found and make sure the counts are equal in both collections?


The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish, it does not matter.

Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).


How about algorithm? All answer related by compare something, generic lists compare linq etc. Really did we promised to someone that we will never use algorithm as an old fashioned programmer?

2018年08月14日50分22秒

You are not checking for Equality you are checking for Equivalence. It's nitpicky but an important distinction. And a long time ago. This is a good Q+A.

2018年08月14日50分22秒

You may be interested in this post, which discusses a tuned version of the dictionary-based method described below. One issue with most simple dictionary approaches is that they don't handle nulls properly because .NET's Dictionary class doesn't permit null keys.

2018年08月14日50分22秒

I'm not 100% sure but I think your answer violates Microsoft's terms of use against reverse engineering.

1970年01月01日00分03秒

JamesRoeiter Perhaps my comment was misleading. When a dictionary encounters a hashcode it already contains, it checks for actual equality with an EqualityComparer (either the one you supplied or EqualityComparer.Default, you can check Reflector or the reference source to verify this). True, if objects change (and specifically thier hashcode changes) while this method is running then the results are unexpected, but that just means this method is not thread safe in this context.

2018年08月14日50分22秒

JamesRoeiter Suppose x and y are two objects we want to compare. If they have different hashcodes, we know they are different (because equal items have equal hashcodes), and the above implementation is correct. If they have the same hashcode, the dictionary implementation will check for actual equality using the specified EqualityComparer (or EqualityComparer.Default if none was specified) and again the implementation is correct.

2018年08月14日50分22秒

CADbloke the method has to be named Equals because of the IEqualityComparer<T> interface. What you should be looking at is the name of the comparer itself. In this case it's MultiSetComparer which makes sense.

2018年08月14日50分22秒

Ah, yes, sorry. I overlooked the Interface implementation requiring you to call it Equals. Thanks.

2018年08月14日50分22秒

You just have to add a using System.Linq; first to make it work

2018年08月14日50分22秒

if this code is within a loop and collection1 gets updated and collection2 remains untouched, notice even when both collections have the same object, debugger would show false for this "equal" variable.

2018年08月15日50分22秒

Chaulky - I believe the OrderBy is needed. See: dotnetfiddle.net/jA8iwE

2018年08月15日50分22秒

This is almost what I want. However, I'd like to be able to do this even if I am not using integers. I'd like to use reference objects, but they do not behave properly as keys in dictionaries.

2018年08月15日50分22秒

Mono, your question is moot if your Items are not comparable. If they cannot be used as keys in Dictionary, there is no solution available.

2018年08月15日50分22秒

I think Mono meant the keys are not sortable. But Daniel's solution is clearly intended to be implemented with a hashtable, not a tree, and will work as long as there's an equivalence test and a hash function.

2018年08月14日50分22秒

Upvoted of course for the help, but not accepted since it's missing an important point (which I cover in my answer).

2018年08月15日50分22秒

FWIW, you can simplify your last foreach loop and return statement with this: return dict.All(kvp => kvp.Value == 0);

2018年08月15日50分22秒

Sounds good! Glad my answer helped.

2018年08月15日50分22秒

Nice job, but Note: 1. In contrast to Daniel Jennings solution, This is not O(N) but rather O(N^2), because of the find function inside the foreach loop on the bar collection; 2. You can generalize the method to accept IEnumerable<T> instead of ICollection<T> with no further modification to the code

2018年08月14日50分22秒

of course, using a HashSet assumes no duplicates but if so HashSet is the best way to go

2018年08月14日50分22秒

The only issue with this code is that it only works when comparing value types or comparing the pointers to reference types. I could have two different instances of the same object in the collections, so I need to be able to specify how to compare each. Can you pass a comparison delegate to the intersect method?

2018年08月14日50分22秒

Sure, you can pass a comparer delegate. But, note the above limitation regarding sets that I added, which puts a significant limit on its applicability.

2018年08月14日50分22秒

The Intersect method returns a distinct collection. Given a = {1,1,2} and b ={2,2,1}, a.Intersect(b).Count() != a.Count, which causes your expression to correctly return false. {1,2}.Count != {1,1,2}.Count See link[/link] (Note that both sides are made distinct before comparison.)

2018年08月14日50分22秒

What is the point of Serializable for Comparer class? :o Also you can change the input to ISet<T> to express it is meant for sets (ie no duplicates).

2018年08月15日50分22秒

nawfal thanks, don't know what I was thinking when I marked it Serializable... As for ISet, the idea here was to treat the IEnumerable as a set (because you got an IEnumerable to begin with), though considering the 0 upvotes in over 5 years that may not have been the sharpest idea :P

2018年08月14日50分22秒

Except won't work for counting duplicate items. It will return true for sets {1,2,2} and {1,1,2}.

2018年08月14日50分22秒

CristiDiaconescu you could do a ".Distinct()" first to remove any duplicates

2018年08月15日50分22秒

The OP is asking for [1, 1, 2] != [1, 2, 2] . Using Distinct would make them look equal.

2018年08月14日50分22秒

How well does this perform, any ideas?

2018年08月14日50分22秒

I only use this for small collections, so have not thought about Big-O complexity or done benchmarking. HaveMismatchedElements alone is O(M*N) so it may not perform well for large collections.

2018年08月15日50分22秒

If IEnumerable<T>s are queries then calling Count() is not a good idea. Ohad's original answer's approach of checking if they are ICollection<T> is the better idea.

2018年08月14日50分22秒

I've just found on latest version that there is a parameter bool ignoreOrder on ShouldBe method.

1970年01月01日00分03秒