
How does this differ from Spotify, DropBox etc? #6

magnusart opened this issue Dec 2, 2014 · 2 comments

@magnusart

Hi

I read the website. I agree that this problem has some UX challenges, but I don't see it as a particularly unexplored area. Then again, I might be missing something?

Please bear with me as I expand below.

Offline first == File sync

As I see it, your problem domain becomes very similar to the challenges of a distributed version control system (git), a multi-client file backup (Dropbox) or even an offline music player (Spotify). These have all attacked the UX problems you mention and have a lot of ready solutions.

File sync, but we're working with JSON data in our APIs?

If you treat your data as files in a file system, then the filename is the checksum of said data (again, this is how git stores data; a minimal sketch follows the list below).

This has two benefits:

  1. Storage is idempotent: if you upload the same data twice and your key is a checksum, the value is guaranteed to be identical.
  2. You can factor the sync problem out of your application API equation.
    • This is important because you don't want to change your sync API for each new feature or type of data you add support for. Trust me on this one, it is very painful.
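A minimal sketch of such a content-addressed store, assuming Node's built-in crypto module; the class and method names here are illustrative, not from any particular library:

```ts
import { createHash } from "crypto";

// Content-addressed storage: the key is the SHA-256 checksum of the value,
// so uploading the same JSON twice is a no-op (idempotent by design).
class ContentStore {
  private blobs = new Map<string, string>();

  put(value: unknown): string {
    // Canonical serialization matters: the same logical object must always
    // produce the same bytes, or identical data will get different keys.
    const body = JSON.stringify(value);
    const key = createHash("sha256").update(body).digest("hex");
    this.blobs.set(key, body); // overwriting with identical bytes is harmless
    return key;
  }

  get(key: string): unknown | undefined {
    const body = this.blobs.get(key);
    return body === undefined ? undefined : JSON.parse(body);
  }

  keys(): IterableIterator<string> {
    return this.blobs.keys();
  }

  delete(key: string): void {
    this.blobs.delete(key);
  }
}
```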

Typically you would store this in a document or key-value database on the backend. Your client basically holds an LRU cache (Least Recently Used) on disk. This means that the most recently used data is always available (Spotify), but you can limit storage to a certain amount.
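For the client side, a minimal in-memory LRU sketch; a real client would bound the cache by bytes on disk rather than entry count, but the eviction logic is the same:

```ts
// A minimal LRU cache: Map preserves insertion order, so re-inserting a key
// on every access keeps the most recently used entries at the end and lets
// us evict from the front once the budget is exceeded.
class LruCache<V> {
  private entries = new Map<string, V>();

  constructor(private capacity: number) {}

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      this.entries.delete(key); // move to the "most recently used" end
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (the oldest key in the Map).
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}
```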

How do you store and read these files?

Structuring and accessing your data becomes a bit different from the traditional model. Since you are syncing files with checksums as keys, you can't hardcode application logic to a particular REST API call.

Luckily you don't have to search very far to find an excellent example, one you already know about and use every day. It is called the World Wide Web (or HTTP + HTML).

Yes, this means that you should view your client as a browser when exploring your data. A browser typically makes no assumptions about what content a URI will contain.

Example: Web browser

  1. You start off by clicking a link or entering a web address in the URL bar.
  2. This prompts you to download a resource, but only if it is not already in your cache.
  3. You know this by either asking if the file has been modified since a particular date (If-Modified-Since) or by sending the checksum in an If-None-Match header (a conditional GET is sketched after this list).
  4. When you download it you look at the MIME type (which can be derived from the file ending), which decides how you should interpret this particular resource.
  5. If it is an HTML page you start to download all the images into your local cache, and some browsers also prefetch and cache links to other pages.
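A sketch of that conditional GET using the standard fetch API, with an illustrative cache shape (in a real browser the HTTP cache does this for you transparently):

```ts
// Conditional GET: send the cached checksum as If-None-Match; on
// 304 Not Modified, reuse the local copy without re-downloading the body.
interface CachedResource {
  etag: string;
  body: string;
}

async function fetchWithCache(
  url: string,
  cache: Map<string, CachedResource>
): Promise<string> {
  const cached = cache.get(url);
  const headers: Record<string, string> = {};
  if (cached) headers["If-None-Match"] = cached.etag;

  const res = await fetch(url, { headers });
  if (res.status === 304 && cached) {
    return cached.body; // server confirmed our copy is still fresh
  }

  const body = await res.text();
  const etag = res.headers.get("ETag") ?? "";
  cache.set(url, { etag, body });
  return body;
}
```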

Example: Your data browser

Your data model needs to be structured like a file system (a tree data structure), where you have one or several root nodes (your index.html). The child nodes contain links to data and links to other nodes.

  1. You access a root node and download that data, unless you have a local up-to-date copy (If-None-Match).
  2. By looking at the MIME type you know how to interpret the downloaded data in your client code (application/vnd.myapp.user.v1+json); see the sketch after this list.
    • This is instead of hardcoding logic to an API call.
  3. Files or pages typically represent one screen or one set of screens of related data that will be updated together.
  4. Just like HTML pages you can link to other resources, some you download directly (images) and some are navigational elements to other pages. You can choose to download and cache these aggressively or defer download until the user clicks them.
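A sketch of that loop, building on the fetchWithCache helper above; the node shape, the rel values and the renderer registry are assumptions for illustration:

```ts
// The "data browser": download a node, dispatch on its media type instead of
// hardcoding logic per endpoint, then follow its links.
interface IndexNode {
  mediaType: string; // e.g. "application/vnd.myapp.user.v1+json"
  links: { href: string; rel: "data" | "page" }[];
}

// One renderer per media type: supporting a new kind of data means
// registering a renderer, not changing the sync layer.
const renderers: Record<string, (node: IndexNode) => void> = {
  "application/vnd.myapp.user.v1+json": (node) => {
    /* render the user screen from the node's data */
  },
};

async function browse(url: string, cache: Map<string, CachedResource>) {
  const node = JSON.parse(await fetchWithCache(url, cache)) as IndexNode;
  renderers[node.mediaType]?.(node);

  for (const link of node.links) {
    if (link.rel === "data") {
      await fetchWithCache(link.href, cache); // fetch directly, like images
    }
    // "page" links can be prefetched here as well, or deferred until the
    // user navigates to them, mirroring how browsers treat anchors.
  }
}
```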

When persisting your data you need to end up with what would be called materialized views (like HTML pages are). This kind of data is typically a much better fit for document stores than relational databases.

Writing

If you update a file and therefore change the file name (file name == checksum, remember), you need to make sure all the links to that document are updated. You do this by changing all the nodes in that branch of the tree and lastly updating the root element (which becomes your transactional boundary).
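A sketch of that write path, reusing the ContentStore from earlier; old nodes are never mutated, so a changed leaf produces a brand-new key for every ancestor up to the root:

```ts
// Writing never mutates stored nodes: store the changed leaf, then rewrite
// every ancestor so its link points at the new checksum. Publishing the new
// root key is the single atomic step (the transactional boundary).
interface TreeNode {
  children: Record<string, string>; // child name -> checksum key
}

function updateLeaf(
  store: ContentStore,
  nodeKey: string,
  path: string[], // child names from this node down to the leaf, e.g. ["users", "42"]
  newLeaf: unknown
): string {
  if (path.length === 0) return store.put(newLeaf);

  const node = store.get(nodeKey) as TreeNode;
  const [head, ...rest] = path;
  const newChildKey = updateLeaf(store, node.children[head], rest, newLeaf);

  // The changed child checksum changes this node's bytes, hence its own key.
  return store.put({
    ...node,
    children: { ...node.children, [head]: newChildKey },
  });
}
```

Calling updateLeaf with the current root key returns a new root key; swapping the published root pointer from the old key to the new one is the only mutation, which is exactly what makes the root the transactional boundary.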

This model also solves versioning, because you never have to remove old data that a user needs to continue working. If you delete data, you only do it once no links to it exist anymore (basically a garbage collector).
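A mark-and-sweep sketch over the same ContentStore: mark everything reachable from the live root(s), then sweep the rest:

```ts
// Mark-and-sweep: anything not reachable from a live root has no links left
// pointing at it, so it can be deleted safely. Old versions survive exactly
// as long as some published root still references them.
function collectGarbage(store: ContentStore, liveRoots: string[]): void {
  const reachable = new Set<string>();
  const stack = [...liveRoots];

  while (stack.length > 0) {
    const key = stack.pop()!;
    if (reachable.has(key)) continue;
    reachable.add(key);
    const node = store.get(key) as TreeNode | undefined;
    if (node?.children) stack.push(...Object.values(node.children));
  }

  for (const key of [...store.keys()]) {
    if (!reachable.has(key)) store.delete(key);
  }
}
```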

Maybe I'm making a lot of assumptions, but if this problem is in fact not new, does that not imply that there are already well-known UX models we can build upon as well?

@irshadc

irshadc commented Aug 9, 2018

Great analogy, and correct on all points. Just wanted to add to it.

When we design our offline model the browser way, it works like a browser connecting to our server:
If the browser requests a resource for the first time, it informs the server about its state and asks for a fresh copy.
If the browser wants to refresh, it has to send the file identifier along with the timestamp of the last update, and the server will tell it whether the copy is expired or okay to use.
If the browser has to update the resource, it tries pushing the data to the server, and the server will do the conflict resolution.

These scenarios need special handling, considering that a client that has hit a page once may never refresh it:

  • Data needs to refresh automatically in the background when the server receives a fresh copy.
  • One file depends on another: how do we handle it?
  • A record is updated offline, without internet, and only gets synced 3 hours later.

I am looking for an architecture where the client doesn't have to handle so many scenarios:

  • How to migrate from one version to another with a seamless upgrade experience?
  • How to handle large datasets (how much to sync, and how to refresh such datasets)?
  • How to work efficiently when you have both an active connection and offline data?

We are working on a similar kind of architecture for our mobility accelerator framework to tackle the above issues. Hopefully we can open-source it.

@magnusart

Hi

Not sure I followed all your points, but for the statements below:

Data needs to refresh automatically in the background when the server receives a fresh copy.
You can implement push functionality on changes. Since you only ever commit changes on the backend, and you have materialized views for each user that contain exactly what they are allowed to view, you can also know when something has changed for that user, even for a specific client. You only need to keep track of which index version a specific user/device is at and push a notification when a new index is available.
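A backend-side sketch of that bookkeeping; the device state shape and the push callback are assumptions, standing in for whatever transport (WebSocket, FCM, ...) you use:

```ts
// After each backend commit, compare every device's last-seen root key with
// the user's new root and notify the stale ones. The payload is just the new
// root key; the client walks the tree and fetches only unknown checksums.
interface DeviceState {
  deviceId: string;
  lastSeenRoot: string; // checksum key of the last index this device synced
}

function notifyStaleDevices(
  newRootKey: string,
  devices: DeviceState[],
  push: (deviceId: string, rootKey: string) => void
): void {
  for (const device of devices) {
    if (device.lastSeenRoot !== newRootKey) {
      push(device.deviceId, newRootKey);
    }
  }
}
```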

One file depends on another: how do we handle it?
Changes are pushed to the backend. If a conflict occurs you create a diff (in whatever format makes sense for your application) and present it to the user. If it cannot be resolved that way, you can resolve it manually, offline, through your support.

A record is updated offline, without internet, and only gets synced 3 hours later.
Yes, just like you commit code into your git repository. You have a local queue of changes that get uploaded. Your client can still let the user interact with the changes, but should indicate that they are unsaved data. You could have a lookup table on the client that maps the cached file to the unsynced edited file until a new index has been downloaded.
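A sketch of such a client-side outbox; the class and method names are illustrative:

```ts
// A client-side outbox: edits are queued locally and reads go through an
// overlay, so the UI sees unsaved changes until a fresh index arrives from
// the server and the overlay entry can be dropped.
class Outbox {
  private pending = new Map<string, unknown>(); // cached key -> edited value

  edit(cachedKey: string, newValue: unknown): void {
    // In practice this should also be persisted to disk to survive restarts.
    this.pending.set(cachedKey, newValue);
  }

  read(cachedKey: string, cachedValue: unknown): unknown {
    // Prefer the local unsynced edit over the cached server copy.
    return this.pending.get(cachedKey) ?? cachedValue;
  }

  async sync(upload: (value: unknown) => Promise<void>): Promise<void> {
    for (const [key, value] of this.pending) {
      await upload(value); // like pushing local commits to a git remote
      this.pending.delete(key); // cleared; a new index will reference it
    }
  }
}
```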

How to migrate from one version to another with a seamless upgrade experience?
It would happen like this:

  1. Upload all pending changes from the client to the backend. This is a file upload, so even if a conflict occurs the backend still keeps the change and the upgrade can proceed. You can always do conflict resolution later.
  2. Update the client code.
  3. The client will download an index that is intended for the new client version.
  4. The new index will point to whatever resources it needs. If a resource has changed (new structure or data) it will need to be downloaded again; if not, the cached file is reused.
  5. Garbage collect any orphaned/unreferenced files.

How to handle large datasets (how much to sync, and how to refresh such datasets)?
It would work just like a web browser. As an example: you browse to a photo gallery and cache a lot of large photos. When the author updates the title of the index page, you do not have to re-download all the cached photos on your next visit.

This translates to: if you have large data blobs, make them separate files! If you have data that changes very often, do not put it in the same files as data that changes very seldom. Do not use deeply nested hierarchies for your files if your leaves change often; that will invalidate all the parents.

How to work efficiently when you have both an active connection and offline data?
You don't need to apply the model to everything in your application. A chat application that streams a lot of data should perhaps not use this model. You can segment your application into parts that depend on an active connection and parts that do not. It all depends on what your non-functional requirements are. There are always trade-offs to consider.
