C7 Save Structure Proposal #377

QuintillusCFC · 2022-11-05T00:26:44Z

QuintillusCFC
Nov 5, 2022
Maintainer

C7 Save Structure Proposal

Author's note: This is an attempt to move forward on the save format question, while also providing a high-level overview/documentation of that proposed format. As the conclusion notes, provided no catastrophic flaws are discovered, I plan to go forward with this vision, keeping in mind that we can always make future revisions as we learn more from experience.

Goals/Problem Statement

So far in C7, we have been running with essentially the default serialization provided by .NET, with the occasional [JsonIgnore] to avoid saving transient/calculated data, or data that might loop. Nevertheless, significant problems have become evident, not least that many nested data structures are saved, resulting in the same data appearing in many places throughout the save file. As we add more elements to the save data, this problem intensifies.

This document proposes the following guiding principles for our save design:

E pluribus unum
Easy to understand for human readers
Supports polymorphic classes

We'll now cover each in turn.

E pluribus unum

The most important design goal is e pluribus unum, or "from many, one" for those whose native tongue is not Latin. What does this mean in the context of a C7 save? It means that while there may be many copies of data in memory while the game is running, to optimize performance and simplify reading data, there will only be one copy of each piece of data in the save. References to the data can be made from elsewhere within the save, but the data itself will not be duplicated.

For example, the terrain data for "ocean" (how much food it produces, which graphics it uses, etc.) will only be defined in one place. A tile may reference that its terrain is ocean, but it will only reference that by ID, not replicating all the other data about ocean terrain.

This has several advantages, such as reducing the uncompressed save size, allowing data to be updated in one place only with a text editor and having that update be recognized universally in-game, and avoiding excessive nesting and potential infinite looping of the save structure when there are cyclical references.
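A minimal sketch of the idea, assuming the built-in System.Text.Json serializer. All class names, property names, and values here are hypothetical stand-ins rather than C7's real types: each terrain type is defined once at the top level, and tiles carry only its key.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical, simplified stand-ins for C7's real classes.
public class TerrainType
{
    public string Key { get; set; } = "";
    public int BaseFood { get; set; }
    public string Graphics { get; set; } = "";
}

public class TileRecord
{
    public int X { get; set; }
    public int Y { get; set; }
    // Only the key is saved, not a nested copy of the terrain type.
    public string Terrain { get; set; } = "";
}

public class SaveSketch
{
    // Each terrain type is defined exactly once, at the top level.
    public List<TerrainType> TerrainTypes { get; set; } = new List<TerrainType>();
    public List<TileRecord> Tiles { get; set; } = new List<TileRecord>();
}

public static class SaveSketchDemo
{
    public static string Serialize()
    {
        var save = new SaveSketch();
        // Illustrative values only.
        save.TerrainTypes.Add(new TerrainType { Key = "ocean", BaseFood = 1, Graphics = "ocean.pcx" });
        save.Tiles.Add(new TileRecord { X = 0, Y = 0, Terrain = "ocean" });
        save.Tiles.Add(new TileRecord { X = 2, Y = 0, Terrain = "ocean" });
        return JsonSerializer.Serialize(save, new JsonSerializerOptions { WriteIndented = true });
    }

    // On load, the in-memory links are rebuilt by looking each key up.
    public static Dictionary<string, TerrainType> IndexTerrain(SaveSketch save)
    {
        var byKey = new Dictionary<string, TerrainType>();
        foreach (var terrain in save.TerrainTypes) byKey[terrain.Key] = terrain;
        return byKey;
    }
}
```

Editing the single "ocean" entry in a text editor would then affect every tile that references it, at the cost of the lookup step when loading.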

The disadvantage is that when the save is loaded, the in-memory copies will have to be re-created. However, some amount of that is seen as inevitable as the game progresses from a rough proof of concept to having significant amounts of data.

Easy to understand for human readers

This point has received the most attention previously. All agree that binary is not very human-readable. But the debate on what to use in its place has never been fully resolved. Everyone presents their opinion, and soon the discussion fizzles out. We need a whiteboard and a conference room, and maybe a follow-up meeting the next day after having had a chance to think on it.

One of the oft-debated arguments revolves around how things will work over a network. Fundamentally, however, that depends on a theoretical multiplayer architecture that is also not defined, and on which no one has published significant work. Thus, instead of attempting to design for that, I am going to punt on it until more is known about the network design, and allow that to be added in a future revision.

Thus, I will be making the executive decision that data identifiers should use a key:number format, where the number can be generated by a sequential iterator, and is only required in cases where the data can be dynamically generated. A few examples will further this point.

Example One: Terrain types

Terrain types are static and unchanging. New ones cannot be added during the game. Thus, they will not use the :number part of the identifier. Instead, they will be static strings such as "ocean", "sea", and "plains".

Terrain types already follow this format, but it may be appropriate for other types as well. For example, for unit types, swordsman could be a key.

Example Two: Units

Units, as in individual units on the map, require a numerical identifier to distinguish them. This document proposes that they be in the form of unitType:sequentialNumber, for example "warrior:2".

This makes them easy to identify both in the save file and in logs. A downside is that if their type changes, either their ID should change to match, or it will be out of sync. Since their actual type will be stored in the save, and their type cannot currently change regardless, this problem is currently not considered a dealbreaker.

An alternative could be labeling them simply unit:12, but specifying the type provides some aid in the user knowing at a glance what the type is when reading the save.

Example Three: Tiles

Some items have natural identifiers, such as tiles, which have coordinates. Tiles could have their identifiers be something like: tile:52,41.

Cities are another candidate. Instead of city:17, the city could be city:sofia. Like units, there is a risk of a city being renamed, but so long as duplicate names are not allowed, there would still be consistency among individual save files.

Even citizens could follow a similar scheme, e.g. citizen:sofia:3.
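The three examples above could be sketched roughly as follows. The class and method names are hypothetical, and two details are assumptions not settled in the proposal: the counter is kept per key rather than globally, and numbering starts at 1.

```csharp
using System.Collections.Generic;

// Hypothetical helper; names are illustrative only.
public class IdFactory
{
    // Assumption: one sequential counter per key, so each unit type counts independently.
    private readonly Dictionary<string, int> lastNumber = new Dictionary<string, int>();

    // Dynamically created data: key plus sequential number, e.g. "warrior:2".
    public string Next(string key)
    {
        lastNumber.TryGetValue(key, out int current); // current is 0 if the key is new
        current += 1;
        lastNumber[key] = current;
        return $"{key}:{current}";
    }

    // Static data such as terrain types uses the bare key, e.g. "ocean".
    public static string ForStatic(string key) => key;

    // Data with a natural identifier, such as a tile's coordinates.
    public static string ForTile(int x, int y) => $"tile:{x},{y}";
}
```

On load, the counters would need to be restored (for example from the highest number seen per key) so that new IDs do not collide with saved ones.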

Supports Polymorphic Classes

The AI classes have been designed with polymorphism as a key design element. By designing AI classes to fit defined contracts, new subclasses can be added by modders without having to understand the entire structure of the AI design.

The AI classes can also store data, which needs to be persisted across saves. The most straightforward way to do this is to allow polymorphic classes to be serialized. Although the built-in serializer in .NET 6.0 does not support this, many others, including Newtonsoft for .NET, do. In addition, the built-in serializer will add support for polymorphic classes in .NET 7.0, which is scheduled to launch in November of 2022.

An alternative could be to store all AI data in maps on a separate, non-polymorphic class. However, that makes AI programming less intuitive than using traditional flat variable structures. Assuming Newtonsoft/.NET 7.0 do not present unforeseen challenges, polymorphic classes are preferable.
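For reference, a minimal sketch of what polymorphic serialization looks like with the .NET 7 built-in serializer, using its JsonDerivedType attribute. The AI classes and discriminator strings below are hypothetical and not C7's real types.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical AI classes; C7's real AI types will differ.
// Each attribute registers a subclass and the "$type" value written for it.
[JsonDerivedType(typeof(SettlerAI), typeDiscriminator: "settler")]
[JsonDerivedType(typeof(DefenderAI), typeDiscriminator: "defender")]
public abstract class UnitAI
{
    public string UnitId { get; set; } = "";
}

public class SettlerAI : UnitAI
{
    public string Destination { get; set; } = "";
}

public class DefenderAI : UnitAI
{
    public string GuardedTile { get; set; } = "";
}

public static class PolymorphismSketch
{
    // Serializing through the base type writes the discriminator.
    public static string Save(UnitAI ai) => JsonSerializer.Serialize(ai);

    // Deserializing through the base type reads it and creates the matching subclass.
    public static UnitAI Load(string json) => JsonSerializer.Deserialize<UnitAI>(json);
}
```

Note that with these attributes the base class lists its subclasses, so subclasses added by modders would need to be registered some other way.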

Conclusion

The save format has suffered from a lack of focus, aided by its apparent success in the first few releases. However, more formal principles and definitions are now required.

One of the challenges has been a failure to resolve differences of opinion on the preferable form of the format. Given our current pre-alpha status, there can always be future, incompatible updates. I will thus be going ahead with this plan unless convincing arguments show that these proposals are catastrophic mistakes. Perfect is the enemy of the good. Network concerns with IDs can be resolved once we have an idea what a network implementation would look like. Getting a functional base that can be built upon for save games is more important.

Future appendices will detail the specific structure, including which items will be top level in the e pluribus unum approach.

pcen · 2023-01-20T06:15:34Z

pcen
Jan 20, 2023
Collaborator

Will Godot be compatible with .NET 7.0? Also, it seems like this approach will require game objects implementing their serialisation or some function that traverses the game objects serialising them, in which case the lack of support for serialising polymorphic classes wouldn't be an issue?

1 reply

QuintillusCFC Jan 20, 2023
Maintainer Author

I believe Godot eventually will be compatible with .NET 7.0; the question is when. According to godotengine/godot-proposals#5780 it might not be yet, but I just installed .NET 7.0 and started C7 from Godot 3.5.1 and it worked. Is it running on .NET 7 or is it still on .NET 6? More investigation is required.

The target I am hoping to strike is that by implementing our own logic for IDs and whether nested objects are referred to in whole or just by their ID, we can get all the properties on those objects (including lists and maps where appropriate) for free. You are right that if we implemented our own serialization altogether, the lack of support for polymorphic classes wouldn't be a problem, but it is tedious writing out manually "write out this integer property, and that one, and this string property, etc." - I did that for my editor because I didn't know any other approach at the time (and Civ3 data is binary which makes it harder).

I am open to other proposals. I should also probably add a code example to this one, maybe even a small sample branch, to complement the written descriptions. I ran out of steam on the project just after writing this up, so thus far I haven't moved forward with it.

Also, welcome back! 👋

QuintillusCFC · 2023-02-05T16:57:10Z

QuintillusCFC
Feb 5, 2023
Maintainer Author

I have started exploring using Newtonsoft for this purpose. Serializing basic objects is easy. For UnitPrototype.cs, I have written a test that serializes it with one line:

string serialized = JsonConvert.SerializeObject(prototype, Formatting.Indented);

No changes necessary in the file! The result (with the unit test object) is:

{
  "categories": [
    "Sea"
  ],
  "actions": [
    "Move"
  ],
  "attributes": [
    "Can move on Sea"
  ],
  "name": "Frigate",
  "shieldCost": 70,
  "populationCost": 0,
  "attack": 4,
  "defense": 4,
  "bombard": 1,
  "movement": 5,
  "iconIndex": 72
}

This is nice and readable. I think I'm going to continue building out tests for one part of the save at a time, and they will gradually tie together (I intentionally picked UnitPrototype first as it doesn't reference any of our other objects). If we migrate to Godot 4 when it is released, we could use .NET 7 as that change has now been merged, and thus .NET 7 built-in serialization with polymorphic support. But moving to Godot 4 might require a significant number of changes, and I don't know if .NET 7 is being backported to Godot 3.6. Thus I favor using Newtonsoft now rather than waiting for Godot.

References on polymorphic serialization/deserialization: https://www.newtonsoft.com/json/help/html/SerializeTypeNameHandling.htm, https://www.newtonsoft.com/json/help/html/T_Newtonsoft_Json_TypeNameHandling.htm . Note that in the first example, it uses the "All" option. I think we would be okay with the less verbose "Objects" option, but we shall see.
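A minimal sketch of the "Objects" option with Newtonsoft, assuming hypothetical CityAI classes that are not part of C7. With TypeNameHandling.Objects, a "$type" property is written for objects, which lets a base-class reference be restored as the right subclass.

```csharp
using Newtonsoft.Json;

// Hypothetical classes for illustration; not C7's real types.
public abstract class CityAI
{
    public string CityId { get; set; } = "";
}

public class BuilderCityAI : CityAI
{
    public int TurnsPlanned { get; set; }
}

public static class TypeNameSketch
{
    // The same settings are used for saving and loading.
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        TypeNameHandling = TypeNameHandling.Objects,
        Formatting = Formatting.Indented
    };

    public static string Save(CityAI ai) => JsonConvert.SerializeObject(ai, Settings);

    // The "$type" property tells Newtonsoft which subclass to instantiate.
    public static CityAI Load(string json) => JsonConvert.DeserializeObject<CityAI>(json, Settings);
}
```

The Newtonsoft documentation cautions against using TypeNameHandling on JSON from external sources without validating the incoming types, which may matter if saves are shared between players.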


QuintillusCFC · 2023-02-05T19:34:04Z

QuintillusCFC
Feb 5, 2023
Maintainer Author

Rather than continue to incrementally add Newtonsoft support, I decided it would be beneficial to get a high-level view by creating a UML diagram showing the relationships between all of the classes in our save file. That is now part of this Wiki article.


QuintillusCFC · 2023-02-07T04:59:55Z

QuintillusCFC
Feb 7, 2023
Maintainer Author

Did a bit more reading on the Newtonsoft documentation. This page is interesting as it talks about preserving references to objects. Notably it shows how we can avoid the problem of objects being duplicated, without having to manually handle their IDs. This would, for instance, reduce the duplication of terrain types we currently have.

I'm still trying to grok how IsReference works. My primary concern with just setting PreserveReferencesHandling to Objects is: how do we control where the master copy is? E.g. if it encounters TerrainType in the GameMap before it does in the GameData, where will the terrain types be defined? It might work regardless of where they are defined, but I would want to test this out.

This could allow us to effectively avoid duplication and have polymorphism without having to manually re-wire object links at load time, as we do currently, if things work out in the ideal way, which is TBD.
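A small sketch for testing this, assuming hypothetical trimmed-down classes rather than the real GameData and GameMap. As far as I understand, Newtonsoft writes the full object (with an "$id") where it first encounters it and a "$ref" everywhere after, so the position of the master copy follows serialization order. The JsonProperty Order values below are one way to pin that order.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical, trimmed-down shapes; the real classes differ.
public class TerrainDef
{
    public string Name { get; set; } = "";
}

public class MapTile
{
    public TerrainDef Terrain { get; set; }
}

public class GameSketch
{
    // Serialized first, so the full definitions should land here.
    [JsonProperty(Order = 1)]
    public List<TerrainDef> TerrainTypes { get; set; } = new List<TerrainDef>();

    // Serialized second, so tiles should only carry "$ref" entries.
    [JsonProperty(Order = 2)]
    public List<MapTile> Tiles { get; set; } = new List<MapTile>();
}

public static class ReferenceSketch
{
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        PreserveReferencesHandling = PreserveReferencesHandling.Objects,
        Formatting = Formatting.Indented
    };

    public static string Save(GameSketch game) => JsonConvert.SerializeObject(game, Settings);

    public static GameSketch Load(string json) => JsonConvert.DeserializeObject<GameSketch>(json, Settings);
}
```

After loading, tiles that shared a terrain instance before saving should share one again, without any manual re-wiring.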

2 replies

QuintillusCFC Feb 7, 2023
Maintainer Author

See also https://www.newtonsoft.com/json/help/html/PreserveReferencesHandlingObject.htm for another good example, showing the difference between Objects and All. In this example it is clear that if we set it to Objects, it assigns IDs for all the objects, without us having to figure out an ID scheme. This is good, and I think this significantly increases the chance that we can use Newtonsoft's features to make this easier than I expected.

I think we would need All if the same array instance is shared between several objects. I don't think that's going to be the case but am not entirely sure.

There's still an argument to be made for having our own more human-friendly ID scheme, but as can be seen in the file system example, it would not be bad searching for the item with ID 343, for example, in that setup.

QuintillusCFC Feb 7, 2023
Maintainer Author

Newtonsoft also has OnSerializing/OnSerialized and equivalent deserialization methods: https://www.newtonsoft.com/json/help/html/SerializationCallbackAttributes.htm

If we do need to perform some manual re-inflation, this would allow us to annotate the methods properly and not to have to invoke them manually. This is nice. I was hoping their annotation support was advanced enough to allow this.
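A minimal sketch of such a callback, assuming hypothetical record classes: a transient lookup is excluded from the save with [JsonIgnore] and rebuilt in a method marked [OnDeserialized], which Newtonsoft invokes once the object has been populated.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using Newtonsoft.Json;

// Hypothetical example types; not C7's real classes.
public class UnitRecord
{
    public string Id { get; set; } = "";
}

public class PlayerRecord
{
    public List<UnitRecord> Units { get; set; } = new List<UnitRecord>();

    // Transient lookup: rebuilt after load rather than saved.
    [JsonIgnore]
    public Dictionary<string, UnitRecord> UnitsById { get; private set; } = new Dictionary<string, UnitRecord>();

    // Called by Newtonsoft after this object has been deserialized.
    [OnDeserialized]
    internal void RebuildLookups(StreamingContext context)
    {
        UnitsById = new Dictionary<string, UnitRecord>();
        foreach (var unit in Units) UnitsById[unit.Id] = unit;
    }
}
```

Loading with JsonConvert.DeserializeObject<PlayerRecord>(json) is then enough; no separate re-inflation call is needed at the call site.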

QuintillusCFC · 2023-02-07T05:03:05Z

QuintillusCFC
Feb 7, 2023
Maintainer Author

Also wanted to capture a thought I had here before I forget it. There has been some talk in the IDs thread about network support. I realized yesterday that we don't necessarily have to use the same ID for human-readable and network IDs. If we need some degree of randomness/guaranteed-uniqueness for synchronization purposes, that doesn't have to affect the save game.

2 replies

pcen Feb 7, 2023
Collaborator

How would you synchronise when joining a multiplayer game? All the clients download the save data from the host, and the host generates new network IDs for every entity in the save?

QuintillusCFC Feb 8, 2023
Maintainer Author

Hmm, when hotjoining like in Civ4? I haven't thought about that. I was thinking more about the question of, if a player creates a new unit, how is that synchronized? If only the host can generate new objects, it's simpler than if any client can and the ID has to be unique.

Network design definitely needs more work, preferably from someone who plays multiplayer Civ or has multiplayer game design experience (I play single player). I'm wary of trying to design it at the same time as save support though, since IMO save support is much more important.

pcen · 2023-02-08T23:57:30Z

pcen
Feb 8, 2023
Collaborator

Leaving a comment before I forget: the current Godot 4 beta has .NET 7 support, so it may be worth waiting to try .NET 7 JSON marshalling before pursuing Newtonsoft further.


pcen · 2023-02-20T22:34:40Z

pcen
Feb 20, 2023
Collaborator

I was experimenting with loading JSON in .NET 7, since the Godot 4 branch runs with .NET 7, and I found that preserving references causes a fairly large set of problems. In particular, the naive serialisation of the current GameData class results in reference not found errors when deserialising it. Looking at what the save data consists of, there aren't that many references stored in GameData to begin with, and I'm thinking the extra effort of saving everything by value may not be that much more difficult than resolving reference issues in the save. Some benefits include forcing us to write some ID system (i.e. replacing copies of MapUnit with an ID). It also drastically simplifies loading C7 saves from previous versions: saving C# references in JSON breaks if any of the referenced class definitions change, whereas if the save is plain old data, it would only require some custom deserialisation logic based on the version number to convert the map to the current version.
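A rough sketch of what version-based conversion of plain save data could look like, using System.Text.Json.Nodes. The "version", "units", and "mapUnits" field names and the version numbers are invented for illustration and do not describe any real C7 save.

```csharp
using System.Text.Json.Nodes;

// Hypothetical migration step; field names and versions are invented.
public static class SaveMigrationSketch
{
    public const int CurrentVersion = 2;

    // Upgrades plain JSON save data one version at a time.
    public static JsonObject Upgrade(string json)
    {
        JsonObject save = JsonNode.Parse(json).AsObject();

        // Assumption: saves without a version field are treated as version 1.
        int version = save["version"]?.GetValue<int>() ?? 1;

        if (version == 1)
        {
            // Imagined change: version 2 renamed "units" to "mapUnits".
            var units = save["units"];
            save.Remove("units");      // detach the node before re-attaching it
            save["mapUnits"] = units;
            version = 2;
        }

        save["version"] = version;
        return save;
    }
}
```

Because the data is plain JSON, the conversion never needs the old class definitions; the upgraded object can be deserialised into the current classes afterwards.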


pcen · 2023-06-30T22:08:31Z

pcen
Jun 30, 2023
Collaborator

I did a save structure experiment (#413) trying to use only POD in the save (what I really mean is only using classes that can be serialised to regular JSON without references or cycles), along with human-friendly IDs. There's a more descriptive write-up in the PR.
