** UPDATE * - this is messed up! read the comments for more details...
A question came up today on MsWebDev as follows:
I have two lists of integers, each with approximately 300,000 integers in them. I need to work out which integers are in List A but not in List B.
Their first thought was to "literate through List A and ask if List B contains the current integer. If it doesn’t then I add it to a third list and return that at the end", but this was giving poor performance (10 minutes!)
The use of a Hashtable was suggested, and this gave dramatic performance improvements taking the operation down to 1 second!
foreach (int current in listB)
{
htB.Add(current, current);
}
List<int> cList2 = new List<int>();
foreach (int current in listA)
{
if (htB[current] != null)
{
cList2.Add(current);
}
}
However I was intrigued to see how Linq would perform. The code was beautifully simple:
var cList = from c in aList
where bList.Contains(c)
select c;
And when I ran a simple test, Linq was significantly faster than the Hashtable too:

That is quite interesting and at some point I will investigate how it is so.
For now if you want to play with the very rough and ready test code, you can download it from my sky drive here:
Cheers
Ian
Technorati Tags:
Linq,
Hashtable