Thursday, March 15, 2007

Performance: ArrayList vs List<>

Quite recently I was trying to optimize/cleanup some .net 2.0 code that had recently been moved from framework 1.1. Naturally various old fashioned Arraylists was scattered over most of the code, and I began the tedious work of tidying up and using generic lists.
As I was doing this I began to wonder how big a performance difference there actually is with various list types, and I decided to investigate this.
The three list types I decided to check were these:

  • ArrayList. Leftover from .net 1.0, basically a dynamically growing array of objects. Everything has to be boxed to an object in order to use this.
  • List<>. The generic List in 2.0. For valuetypes, this doesn't box them, but stores them quite effiecently.
  • Normal arrays. For comparison.

So far I've only decided to test adding/setting the values. I expect similar measurements can be made with regards to removing, accessing and finding. Later on it might also be interesting to look at searching and sorting.
I decided to compare how the ArrayList and ListType does, when adding strings, integers and finally when adding strings after the list has been initialized to the expected size. Just for fun I also tested how the ArrayList performed in framework 1.1.

For the integer tests I added the number 42 to the collections, and for the string tests I added "42".
These are the results I got(times are in ms, running on my laptop):


Items 1 mil 2 mil 4 mil 8 mil
ArrayList Add (int) 1.1 200 500 1121 3000
ArrayList Add (int) 150 450 1100 2600
ArrayList Add (string) 1.1 40 150 250 530
ArrayList Add (string) 70 150 290 630
ArrayList Add (string) PreSize 20 60 110 230
List Add (int)
30 50 120 230
List Add (string)
60 120 280 520
List Add (string) PreSize 20 50 100 200
Array (int) Set 10 20 30 60
Array (string) Set 20 30 70 170

Graphically the data looks something like this:


The conclusions I draw from this is as follows:
  • Always use arrays or List's when possible, when working with valuetypes. ArrayLists are naughty. The big overhead is probably due to the boxing that takes place (an Int32 object has to be created for every number, and a reference to it is stored in the ArrayList).
  • When working with reference-type data (like strings) it's typically not worth the effort to change it to Lists unless you are dealing with really big amounts of data. The List is only around 110 ms faster than the arraylist when dealing with 8 mil. items and both seem to scale linearly.
  • When possible it really does help a lot to initialize your list or ArrayList to the expected size. It makes the performance approach that of a normal array.
  • When Microsoft claims that they've improved performance a great deal for collection-types in framework 2.0, it's only true for the "new" types like List. There's almost no difference in ArrayList performance!

12 comments:

Alexey Rusakov said...

I think they always referred to generic collections when saying that, at least I never expected ArrayLists to become noticeably faster.

And if not for the performance, it might be worth changing to List<> for cleaner API if you're doing major conversion anyway.

Allan Thræn said...

I totally agree that Lists certainly does make prettier code. But then again, if it works don't fix it :-)

Alexey Rusakov said...

"But then again, if it works don't fix it :-)"

I try to fight with that ;)

Stavros said...

In general is good to use generics. Once you start, you never come back.. For Benefits of Generics, read here.

You cannot compare the performance of Array and List. They are different data structures. By definition they have different running times in different tasks because they are meant to fullfil different needs.. I guess you have CLRS. If not, I highly recommend it.

Nice blog by the way!

Allan Thræn said...

I agree, Stavros, it's not fair to compare Lists and Arrays cause they fullfill different purposes. The idea in this entry was actually more to compare ArrayList and Generic List (I'm also fan of generics, but sometimes I run into old code, and the question always arises if it's really worth the trouble of cleaning up the old lists).
The arrays I simply threw in as a baseline, since most lists internally uses array (in various combinations) to store their data.

aiyipianni said...
This comment has been removed by a blog administrator.
aiyipianni said...
This comment has been removed by a blog administrator.
dghnfgj said...
This comment has been removed by a blog administrator.
John said...
This comment has been removed by a blog administrator.
bing said...
This comment has been removed by a blog administrator.
Pachydermitis said...

This was a really helpful post - thanks. I realize that list and arrays can have different functionality, but if I don't need that, it's really nice to know if there is additional overhead to avoid by using one or the other.

Javin Paul said...

Nice article, you have indeed covered the array and arraylist in java in detail with code example. very handy for quick use.