Monday, December 3, 2007

C# Xml List Serialization Behavior

Xml serialization in C# is, for the most part, simple and transparent. Public properties of your objects are automatically written to and read from a corresponding Xml representation. Usually this happens exactly as you would expect, but there are a few gotchas in serialization of lists.

Take this example object:

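// Requires: using System.Collections.Generic;
public class XmlTest
{
    // The field initializer runs every time an XmlTest is constructed.
    private List<int> integerList = new List<int> { 1, 2, 3 };

    public List<int> IntegerList
    {
        get { return integerList; }
        set { integerList = value; }
    }
}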


We have a single property which is a list of integer values. If we create an object and serialize it:

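// Requires: using System.IO; using System.Xml.Serialization;
XmlTest xmlTest = new XmlTest();
XmlSerializer serializer = new XmlSerializer(typeof(XmlTest));
using (TextWriter writer = new StreamWriter("test.xml"))
{
    // The using block closes the file even if Serialize throws.
    serializer.Serialize(writer, xmlTest);
}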


we get the following Xml, just like we would expect:

<?xml version="1.0" encoding="utf-8"?>
<XmlTest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <IntegerList>
    <int>1</int>
    <int>2</int>
    <int>3</int>
  </IntegerList>
</XmlTest>


But when we deserialize the generated Xml like so:

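// Requires: using System.IO; using System.Xml;
using (FileStream fs = new FileStream("test.xml", FileMode.Open))
using (XmlReader reader = new XmlTextReader(fs))
{
    // Reuses the serializer created for the serialization step.
    xmlTest = (XmlTest)serializer.Deserialize(reader);
}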


we get a surprise. The integer list in the deserialized object has six items rather than three: it ends up being "1, 2, 3, 1, 2, 3".

Why is this? It has to do with the details of how lists are deserialized. If you put a breakpoint in the "set" of the IntegerList property, you will find that it never gets called during deserialization. Instead, it seems that the .NET deserializer uses "get" to obtain the existing list, and then calls "Add()" on it for each item in the Xml. Because the field initializer has already filled the list with "1, 2, 3" when the object is constructed, those items are there before the items from the Xml are added.
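To make the sequence concrete, here is a minimal sketch that mimics what I observed. It does not use the real deserializer at all: DeserializerSketch and Populate are hypothetical names of my own, and the code only imitates the "get, then Add()" pattern described above, not the actual .NET implementation.

```csharp
using System;
using System.Collections.Generic;

public class XmlTest
{
    private List<int> integerList = new List<int> { 1, 2, 3 };

    public List<int> IntegerList
    {
        get { return integerList; }
        set { integerList = value; }
    }
}

// Hypothetical stand-in for the list-handling step of deserialization.
public static class DeserializerSketch
{
    public static XmlTest Populate(IEnumerable<int> valuesFromXml)
    {
        // Constructing the object runs the field initializer: 1, 2, 3.
        XmlTest target = new XmlTest();

        // The property's "get" is used; "set" is never touched.
        List<int> list = target.IntegerList;

        // Each item read from the Xml is appended to the existing list.
        foreach (int value in valuesFromXml)
        {
            list.Add(value);
        }

        return target;
    }
}
```

Calling DeserializerSketch.Populate(new[] { 1, 2, 3 }) returns an object whose IntegerList holds six items, which matches the doubled list seen after the real round trip.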

I don't know that I am prepared to call this behavior "wrong", but it certainly was unexpected for me. Ideally, serializing and then deserializing an object would result in the exact same data. In this case, not so much.

A better behavior, I think, would be for the deserializer to create a new list, populate it from the Xml, and then call my property's "set" method to hook it into my object. This would also allow you to put business logic in your property and have it survive serialization. At the very least, it would probably make sense to clear the list before adding items during deserialization.
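One way to approximate that behavior today is sketched below. It assumes, as far as I know, that XmlSerializer handles array-typed properties differently from lists: it builds the whole array first and then assigns it through the property's "set". The IntegerListItems property and the RoundTrip helper are hypothetical names introduced only for this illustration, and the sketch round-trips through a string rather than test.xml.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class XmlTest
{
    private List<int> integerList = new List<int> { 1, 2, 3 };

    // Hidden from the serializer so it cannot call Add() on the live list.
    [XmlIgnore]
    public List<int> IntegerList
    {
        get { return integerList; }
        set { integerList = value; }
    }

    // Serialization-only view (hypothetical name). The XmlArray attribute
    // keeps the original <IntegerList> element name in the Xml.
    [XmlArray("IntegerList")]
    public int[] IntegerListItems
    {
        get { return integerList.ToArray(); }
        // Replace the list instead of appending to the defaults.
        set { integerList = new List<int>(value ?? new int[0]); }
    }
}

// Hypothetical helper: serialize to a string and read it straight back.
public static class RoundTrip
{
    public static XmlTest Run(XmlTest original)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(XmlTest));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, original);
            using (StringReader reader = new StringReader(writer.ToString()))
            {
                return (XmlTest)serializer.Deserialize(reader);
            }
        }
    }
}
```

The cost is an extra public property that exists only for the serializer, which is exactly the kind of clutter I would rather not need.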

7 comments:

  1. Generally the XmlSerializer likes to deal with fields. I usually create pure data objects that I use for my serialization and put a facade of the actual access objects in front of them. That also means I can control the public interface of my objects, since the XmlSerializer insists on having everything public (unlike the binary formatter).

  2. Yeah, I'm usually too lazy to do that, though. Especially when I have lots of relatively simple objects.

  3. Mike,

    The reason for the discrepancy is as follows. Have a look at the integerList instance field; this is populated upon instantiation. Your example then serializes to an .xml file (in this case test.xml) -- so far so good. Just before deserialization, {1, 2, 3} is already in the list, because the field initializer runs whenever an XmlTest is constructed, including the instance the deserializer creates. Deserialization then adds the items from the file, and you're left with {1, 2, 3, 1, 2, 3}.

    Add something like:

    public void setList() { integerList = new List<int> { 1, 2, 3 }; }

    to the XmlTest class, initialize the integerList field to null instead, and call setList() just before you serialize.

  4. //
    // Only a proof of concept.
    //
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;
    using System.Xml.Serialization;

    class Program
    {
        static void Main(string[] args)
        {
            XmlTest xmlTest = new XmlTest();
            xmlTest.setList();

            // Outputs 3
            Console.WriteLine(xmlTest.IntegerList.Count);

            TextWriter writer = new StreamWriter("test.xml");
            XmlSerializer serializer = new XmlSerializer(typeof(XmlTest));
            serializer.Serialize(writer, xmlTest);
            writer.Close();

            FileStream fs = new FileStream("test.xml", FileMode.Open);
            XmlReader reader = new XmlTextReader(fs);
            xmlTest = (XmlTest)serializer.Deserialize(reader);
            fs.Close();

            // Outputs 3
            Console.WriteLine(xmlTest.IntegerList.Count);
            Console.ReadKey();
        }
    }

    [Serializable]
    public class XmlTest
    {
        // Fills in the defaults explicitly instead of in a field initializer.
        public void setList() { integerList = new List<int> { 1, 2, 3 }; }

        // Starts out null, so a freshly deserialized object has no default items.
        private List<int> integerList = null;

        public List<int> IntegerList
        {
            get { return integerList; }
            set { integerList = value; }
        }
    }

  5. Hi Pete - I understand "why" it happens, and that it is not too difficult to work around. I just don't think it is the correct behavior. Default initialization of properties should not interact with serialization in this manner.

    It would be easy for Microsoft to fix. I've since rolled my own Xml serializer to work around this, and other issues/limitations with the way that .NET serialization works (the biggest one being the inability to serialize interfaces).

  6. Hey Mike, I stumbled onto your post and thought I'd shed a little more light on the matter.

    My guess for the cause of this behavior has to do more with the guts of List(Of T); if you look at the list's Items() collection you'll see it's declared as a ReadOnly instance of IList(Of T).

    The fact that it's ReadOnly says that it's not compatible with serialization, which works against public properties.

    Then again, it's nothing I would have expected either--but at least it looks like there's a reason for the behavior.

  7. Hi Mike,

    Maybe related to this is the non-intuitive behaviour of serializing/deserializing List(List(T)). (I write round brackets because the post doesn't seem to accept angle brackets.) If the "inner" List(T) is actually null, it is deserialized as a List(T) object with Count=0, which is also not "the correct behavior"...

    Regards,
    Bart Donders
    the Netherlands
