** UPDATE * - this is messed up!  read the comments for more details...

A question came up today on MsWebDev as follows:

I have two lists of integers, each with approximately 300,000 integers in them. I need to work out which integers are in List A but not in List B.

Their first thought was to "literate through List A and ask if List B contains the current integer. If it doesn’t then I add it to a third list and return that at the end", but this was giving poor performance (10 minutes!)

The use of a Hashtable was suggested, and this gave dramatic performance improvements taking the operation down to 1 second!

 

foreach (int current in listB)
            {
                htB.Add(current, current);
            }

            List<int> cList2 = new List<int>();

            foreach (int current in listA)
            {
                if (htB[current] != null)
                {
                    cList2.Add(current);
                }
            }

 

However I was intrigued to see how Linq would perform.  The code was beautifully simple:

var cList = from c in aList
            where bList.Contains(c)
            select c;

 

And when I ran a simple test, Linq was significantly faster than the Hashtable too:

clip_image002

That is quite interesting and at some point I will investigate how it is so. 

For now if you want to play with the very rough and ready test code, you can download it from my sky drive here:

Cheers

Ian

 

Technorati Tags: ,