C7 Save Structure Proposal #377
-
Will Godot be compatible with .NET 7.0? Also, it seems like this approach will require game objects implementing their serialisation or some function that traverses the game objects serialising them, in which case the lack of support for serialising polymorphic classes wouldn't be an issue?
-
I have started exploring using Newtonsoft for this purpose. Serializing basic objects is easy. For UnitPrototype.cs, I have written a test that serializes it with one line:
No changes necessary in the file! The result (with the unit test object) is:
This is nice and readable. I think I'm going to continue building out tests for one part of the save at a time, and they will gradually tie together (I intentionally picked UnitPrototype first as it doesn't reference any of our other objects). If we migrate to Godot 4 when it is released, we could use .NET 7, as that change has now been merged, and with it the built-in .NET 7 serialization with polymorphic support. But moving to Godot 4 might require a significant number of changes, and I don't know if .NET 7 is being backported to Godot 3.6. Thus I favor using Newtonsoft now rather than waiting for Godot. References on polymorphic serialization/deserialization: https://www.newtonsoft.com/json/help/html/SerializeTypeNameHandling.htm and https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_TypeNameHandling.htm . Note that the first example uses the "All" option. I think we would be okay with the less verbose "Objects" option, but we shall see.
-
Rather than continue to incrementally add Newtonsoft support, I decided it would be beneficial to get a high-level view by creating a UML diagram showing the relationships between all of the classes in our save file. That is now part of this Wiki article.
-
Did a bit more reading of the Newtonsoft documentation. This page is interesting as it talks about preserving references to objects. Notably, it shows how we can avoid the problem of objects being duplicated without having to manually handle their IDs. This would, for instance, reduce the duplication of terrain types we currently have. I'm still trying to grok how IsReference works. My primary concern with just setting PreserveReferencesHandling to Objects is: how do we control where the master copy is? E.g. if the serializer encounters TerrainType in the GameMap before it does in the GameData, where will the terrain types be defined? It might work regardless of where they are defined, but I would want to test this out. If things work out in the ideal way, which is TBD, this could allow us to avoid duplication and keep polymorphism without having to manually re-wire object links at load time, as we do currently.
-
Also wanted to capture a thought I had here before I forget it. There has been some talk in the IDs thread about network support. I realized yesterday that we don't necessarily have to use the same ID for human-readable and network IDs. If we need some degree of randomness/guaranteed-uniqueness for synchronization purposes, that doesn't have to affect the save game.
-
Leaving a comment before I forget: the current Godot 4 beta has .NET 7 support, so it may be worth waiting to try the .NET 7 JSON marshalling before pursuing Newtonsoft further.
-
I was experimenting with loading JSON in .NET 7, since the Godot 4 branch runs with .NET 7, and I found that preserving references causes a fairly large set of problems. In particular, the naive serialisation of the current GameData class results in reference-not-found errors when deserialising it. Looking at what the save data consists of, there aren't that many references stored in GameData to begin with, and I'm thinking the extra effort of saving everything by value may not be that much more difficult than resolving reference issues in the save. Some benefits: it forces us to write some ID system (i.e. replacing copies of MapUnit with an ID), and it drastically simplifies loading C7 saves from previous versions. Saving C# references in JSON breaks if any of the referenced class definitions change, whereas if the save is plain old data, it would only require some custom deserialisation logic based on the version number to convert the map to the current version.
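A rough sketch of what version-based conversion of a plain-data save could look like, using the built-in System.Text.Json.Nodes API. The `SaveMigrator` class, the `version` field, and the `map` to `gameMap` rename are all invented for illustration; none of them are part of the actual C7 save.

```csharp
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical migration helper; the "version" field and the renamed key are invented.
public static class SaveMigrator
{
    public const int CurrentVersion = 2;

    public static JsonObject Upgrade(string json)
    {
        JsonObject save = JsonNode.Parse(json)?.AsObject()
            ?? throw new JsonException("save file is empty");

        // Saves without a version field are treated as version 1.
        int version = save["version"]?.GetValue<int>() ?? 1;

        if (version < 2)
        {
            // Invented v1 -> v2 change: the "map" key was renamed to "gameMap".
            var map = save["map"];
            if (map != null)
            {
                save.Remove("map");      // detaches the node so it can be re-attached
                save["gameMap"] = map;
            }
            version = 2;
        }

        save["version"] = version;
        return save;
    }
}
```

Each later format change would add another `if (version < N)` step, so an old save walks forward one version at a time before being deserialised into the current classes.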
-
I did a save structure experiment in #413, trying to use only POD in the save (what I really mean is only classes that can be serialised to regular JSON without references or cycles) along with human-friendly IDs. There's a more descriptive write-up in the PR.
-
C7 Save Structure Proposal
Author's note: This is an attempt to move forward on the save format question, while also providing a high-level overview/documentation of that proposed format. As the conclusion notes, provided no catastrophic flaws are discovered, I plan to go forward with this vision, keeping in mind that we can always make future revisions as we learn more from experience.
Goals/Problem Statement
So far in C7, we have been running with essentially the default serialization provided by .NET, with the occasional [JsonIgnore] to avoid saving transient/calculated data, or data that might loop. Nevertheless, significant problems have become evident, not least that many nested data structures are saved, resulting in the same data appearing in many places throughout the save file. As we add more elements to the save data, this problem intensifies.
This document proposes the following guiding principles for our save design: e pluribus unum, easy to understand for human readers, and support for polymorphic classes.
We'll now cover each in turn.
E pluribus unum
The most important design goal is e pluribus unum, or "from many, one" for those whose native tongue is not Latin. What does this mean in the context of a C7 save? It means that while there may be many copies of data in memory while the game is running, to optimize performance and simplify reading data, there will only be one copy of each piece of data in the save. References to the data can be made from elsewhere within the save, but the data itself will not be duplicated.
For example, the terrain data for "ocean" (how much food it produces, which graphics it uses, etc.) will be defined in only one place. A tile may record that its terrain is ocean, but it will do so only by ID, without replicating all the other data about ocean terrain.
This has several advantages, such as reducing the uncompressed save size, allowing data to be updated in one place only with a text editor and having that update be recognized universally in-game, and avoiding excessive nesting and potential infinite looping of the save structure when there are cyclical references.
The disadvantage is that when the save is loaded, the in-memory copies will have to be re-created. However, some amount of that is expected to be inevitable as the game progresses from a rough proof of concept to having significant amounts of data.
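A minimal sketch of the idea, using the built-in System.Text.Json. The `SavedTerrainType`, `SavedTile`, `SaveFile`, and `SaveLoader` names and their fields are hypothetical shapes chosen for illustration, not C7's actual classes. Terrain is defined once at the top level, tiles carry only the terrain key, and the in-memory links are re-created at load time.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical save shapes for illustration; not C7's actual classes.
public class SavedTerrainType
{
    public string Key { get; set; } = "";
    public int BaseFood { get; set; }
}

public class SavedTile
{
    public int X { get; set; }
    public int Y { get; set; }
    public string Terrain { get; set; } = "";   // terrain key only, e.g. "ocean"
}

public class SaveFile
{
    public List<SavedTerrainType> TerrainTypes { get; set; } = new List<SavedTerrainType>();
    public List<SavedTile> Tiles { get; set; } = new List<SavedTile>();
}

public static class SaveLoader
{
    // Pairs each tile with the single terrain definition that its key names.
    public static List<(SavedTile Tile, SavedTerrainType Terrain)> Link(string json)
    {
        SaveFile save = JsonSerializer.Deserialize<SaveFile>(json)
            ?? throw new JsonException("save file is empty");
        Dictionary<string, SavedTerrainType> byKey =
            save.TerrainTypes.ToDictionary(t => t.Key);
        return save.Tiles.Select(t => (t, byKey[t.Terrain])).ToList();
    }
}
```

After linking, every ocean tile shares the same `SavedTerrainType` instance, so editing the one "ocean" entry in the save changes what all of those tiles see.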
Easy to understand for human readers
This point has received the most attention previously. All agree that binary is not very human-readable. But the debate on what to use in its place has never been fully resolved. Everyone presents their opinion, and soon the discussion fizzles out. We need a whiteboard and a conference room, and maybe a follow-up meeting the next day after having had a chance to think on it.
One of the oft-debated arguments revolves around how things will work over a network. Fundamentally, however, that depends on a theoretical multiplayer architecture that is also not defined, and toward which no one has published any significant work. Thus, instead of attempting to design for that, I am going to punt on it until more is known about the network design, and allow that to be added in a future revision.
Thus, I will be making the executive decision that data identifiers should use a key:number format, where the number can be generated by a sequential iterator, and is only required in cases where the data can be dynamically generated. A few examples will further this point.
Example One: Terrain types
Terrain types are static and unchanging. New ones cannot be added during the game. Thus, they will not use the :number part of the identifier. Instead, they will be static strings such as "ocean", "sea", and "plains".
Terrain types already follow this format, but it may be appropriate for other types as well. For example, for unit types, swordsman could be a key.
Example Two: Units
Units, as in individual units on the map, require a numerical identifier to distinguish them. This document proposes that they be in the form of unitType:sequentialNumber, for example "warrior:2".
This makes them easy to identify both in the save file and in logs. A downside is that if their type changes, either their ID should change to match, or it will be out of sync. Since their actual type will be stored in the save, and their type cannot currently change regardless, this problem is currently not considered a dealbreaker.
An alternative could be labeling them simply unit:12, but specifying the type provides some aid in the user knowing at a glance what the type is when reading the save.
Example Three: Tiles
Some items have natural identifiers, such as tiles, which have coordinates. Tiles could have their identifiers be something like: tile:52,41.
Cities are another candidate. Instead of city:17, the city could be city:sofia. Like units, there is a risk of a city being renamed, but so long as duplicate names are not allowed, there would still be consistency among individual save files.
Even citizens could follow a similar scheme, e.g. citizen:sofia:3.
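The three examples above can be sketched as a small helper. `SaveId` and `SequentialIdGenerator` are hypothetical names introduced here for illustration, and starting the counter at 1 for each key is an assumption; the proposal only says the number comes from a sequential iterator.

```csharp
using System.Collections.Generic;

// Hypothetical helpers for illustration; none of these names exist in C7.
public static class SaveId
{
    // Tiles use their coordinates as a natural identifier, e.g. "tile:52,41".
    public static string ForTile(int x, int y) => $"tile:{x},{y}";

    // Splits at the first ':' only, so "citizen:sofia:3" yields ("citizen", "sofia:3").
    // A static ID such as "ocean" has no ':' and yields an empty suffix.
    public static (string Key, string Suffix) Parse(string id)
    {
        int i = id.IndexOf(':');
        return i < 0 ? (id, "") : (id.Substring(0, i), id.Substring(i + 1));
    }
}

// Hands out key:number identifiers with a separate counter per key.
public sealed class SequentialIdGenerator
{
    private readonly Dictionary<string, int> last = new Dictionary<string, int>();

    public string Next(string key)
    {
        last.TryGetValue(key, out int previous);   // previous is 0 for an unseen key
        last[key] = previous + 1;
        return $"{key}:{previous + 1}";
    }
}
```

Calling `Next("warrior")` twice on a fresh generator returns "warrior:1" and then "warrior:2". When a save is loaded, the counters would presumably need to be re-seeded from the highest number already in use for each key; that step is left out here.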
Supports Polymorphic Classes
The AI classes have been designed with polymorphism as a key design element. By designing AI classes to fit defined contracts, new subclasses can be added by modders without having to understand the entire structure of the AI design.
The AI classes can also store data, which needs to be persisted across saves. The most straightforward way to do this is to allow polymorphic classes to be serialized. Although the built-in serializer in .NET 6.0 does not support this, many others, including Newtonsoft for .NET, do. In addition, the built-in serializer will add support for polymorphic classes in .NET 7.0, which is scheduled to launch in November of 2022.
An alternative could be to store all AI data in maps on a separate, non-polymorphic class. However, that makes AI programming less intuitive than using traditional flat variable structures. Assuming Newtonsoft/.NET 7.0 do not present unforeseen challenges, polymorphic classes are preferable.
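A minimal sketch of the .NET 7.0 route, using the `JsonPolymorphic` and `JsonDerivedType` attributes from System.Text.Json. The AI class names, their fields, and the discriminator strings are hypothetical and chosen only for illustration; this requires .NET 7 or later.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical AI data classes for illustration; not C7's actual types.
[JsonPolymorphic(TypeDiscriminatorPropertyName = "$type")]
[JsonDerivedType(typeof(SettlerAIData), typeDiscriminator: "settler")]
[JsonDerivedType(typeof(DefenderAIData), typeDiscriminator: "defender")]
public abstract class UnitAIData
{
    public string UnitId { get; set; } = "";
}

public class SettlerAIData : UnitAIData
{
    public string Destination { get; set; } = "";
}

public class DefenderAIData : UnitAIData
{
    public string GuardedCity { get; set; } = "";
}

public static class AISaveDemo
{
    public static UnitAIData RoundTrip(UnitAIData data)
    {
        // Serializing through the base type writes the "$type" discriminator,
        // which lets deserialization pick the matching subclass.
        string json = JsonSerializer.Serialize<UnitAIData>(data);
        return JsonSerializer.Deserialize<UnitAIData>(json)
            ?? throw new JsonException("unexpected null");
    }
}
```

One caveat with this attribute-based form is that the derived types are listed on the base class, so subclasses added by modders would need some other registration route. How well that fits the modding goal is untested here.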
Conclusion
The save format has suffered from a lack of focus, aided by its apparent success in the first few releases. However, more formal principles and definitions are now required.
One of the challenges has been a failure to resolve differences of opinion on the preferable form of the format. Given our current pre-alpha status, there can always be future, incompatible updates. Thus, unless someone presents convincing arguments that these proposals are catastrophic mistakes, I'll be going ahead with this plan. Perfect is the enemy of the good. Network concerns with IDs can be resolved once we have an idea what a network implementation would look like. Getting a functional base that can be built upon for save games is more important.
Future appendices will detail the specific structure, including which items will be top level in the e pluribus unum approach.