I have been struggling with the principles of creating a client-server design where the server maintains a data structure that multiple clients can view and manipulate with various restrictions on which client can access which parts of the structure. I don't know what to properly call a system like that, but on the surface it seems like a simple problem. Unfortunately, I've found ways to make it complicated for myself and I've been slowly teaching myself how to simplify it to manageable levels. For example, one of my top design goals is that each client should have a local copy of the accessible part of the structure. This is necessary for creating a fast user interface, but it complicates the design by creating partial versions of the data structure and requiring consistency between client data structures and server data structures.

With or without a consistency requirement, I would want the server to send out messages to all clients who have permission to view a change whenever that change happens, because keeping clients promptly notified about such changes is another of my design goals. So the client serializes and sends each change that it wants to make to the server, the server applies that change to its copy, and then the server rebroadcasts the change to all relevant clients so they can update their local copies. At first I wanted to have the originating client update its local copy immediately, but I later realized that consistency is easier to maintain if all clients see the changes in the same order, so now I prefer to have the originating client wait for the change to bounce back from the server before applying it locally.
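In rough Python terms, the echo-back scheme described above might look like the following sketch. All of the names here are illustrative, not from any real library, and a real system would serialize the changes over sockets rather than call methods directly:

```python
class Server:
    def __init__(self):
        self.state = {}      # the authoritative copy of the structure
        self.clients = []    # connected Client objects

    def submit(self, change):
        """Apply a client's change, then rebroadcast it to every client
        that may view it -- including the client that originated it."""
        key, value = change
        self.state[key] = value
        for client in self.clients:
            if client.may_view(key):
                client.notify(change)

class Client:
    def __init__(self, server, visible_keys):
        self.local = {}              # partial local copy
        self.visible = visible_keys  # keys this client may view
        self.server = server
        server.clients.append(self)

    def may_view(self, key):
        return key in self.visible

    def request_change(self, change):
        # Do NOT update self.local here; wait for the echo from the
        # server so every client applies changes in the same order.
        self.server.submit(change)

    def notify(self, change):
        key, value = change
        self.local[key] = value
```

The key design point is that `request_change` never touches the local copy; the local copy changes only in `notify`, so the server's broadcast order is the single source of truth.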

Before I realized that this might be difficult, I jumped straight into the problem without thinking about it carefully. I had an application in mind. I knew what objects I wanted the server to manage, so I designed classes to represent those objects in the straightforward way, then created serialization methods for each of those classes and for every modification that might legally happen to objects of those classes. I used special values in fields to represent when the true value of a field was absent due to lack of permission to view it. I knew what sorts of permissions clients might have for each object and added fields in the classes to store each possible permission. I managed to get all of this mostly working before things started to fall apart. The whole system was complicated and unmanageable. Its problems were very difficult to find and fix, and making any sort of change was a huge effort.

In the hope of avoiding the problems that plagued my first effort, I restarted from scratch and designed the system more carefully, but it was still designed on the same principles. This time I had less patience for putting in a large amount of work because I was painfully aware of how badly it might end up, and when I saw the same sorts of problems starting to form halfway through the project, I abandoned the entire idea for a long time.

Now I am attempting it again, but this time I have studied more object-oriented design and have radically different ideas. I have learned from my past failures, and this time I won't create the system for just one application. A system like this is too difficult and too much work to be practical unless it can be reused many times in many different applications, so I will try to exclude anything application-specific from the design. I also realize that I was working at too low a level; instead of creating a class to represent each kind of object in the data structure, I should create a class to represent the concept of a kind of object. My kinds are now objects, not classes, and each kind object contains a description of what sort of content might appear in an object of that kind.

The objects of the data structure are now all uniformly instances of a single class. For example, call that class "Thing." The various roles of the objects are represented by a field of each object that points to an object of class "Kind" that represents that role. So, to represent cars, trees, and houses in the server data structure, the server would have three objects of class Kind: car, tree, and house. And each Thing object x would have a field "kind" and x.kind == car, or x.kind == tree, or x.kind == house. The Thing object stores the details of a car or a tree or a house in an uninterpreted way, like a hash table from field identifiers to objects of class Data, where Data is a totally abstract class with very few members. Thing can't do much with Data objects, but each Kind object knows what Data objects its Thing should have, so it can downcast to extract the necessary values for serialization. The real power of the Thing class is that it can keep track of permissions, modifications, and fields that point to other Things, and so all the difficult parts of the system no longer need to be designed and implemented for every kind of object in the structure; it can all be done just once in the relatively simple Thing class.
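To make the Thing/Kind/Data scheme above concrete, here is a toy Python sketch (class and field names are illustrative; the `isinstance` check plays the role that a downcast would play in a statically typed language):

```python
class Data:
    """Totally abstract payload; Thing never interprets it."""

class IntData(Data):
    def __init__(self, value):
        self.value = value

class StrData(Data):
    def __init__(self, value):
        self.value = value

class Kind:
    """Describes what Data a Thing of this kind should carry."""
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields  # field id -> expected Data subclass

    def serialize(self, thing):
        parts = []
        for field, data_cls in self.fields.items():
            datum = thing.data[field]
            # The Kind knows the concrete Data type, so it can safely
            # "downcast" here to extract the value for serialization.
            assert isinstance(datum, data_cls)
            parts.append(f"{field}={datum.value}")
        return f"{self.name}({', '.join(parts)})"

class Thing:
    """Uniform node: permissions, links, and uninterpreted Data."""
    def __init__(self, kind):
        self.kind = kind
        self.data = {}   # field id -> Data
        self.links = {}  # field id -> other Things
```

The point of the split is visible in `serialize`: Thing stores everything uniformly, and only the Kind knows enough to interpret the payloads, so permissions and change tracking can live once in Thing.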

This seems like a powerful and promising technique and I have rushed ahead with an implementation, but questions and doubts nag at me and I want to look into how this sort of problem is handled by people on the Internet. Unfortunately I don't know what to google to get the answers I want. One question I have is what sort of structure Thing should be. My first thought was to make it two hash tables, one mapping identifiers to Thing objects and one mapping identifiers to Data objects. Then I realized that a list of Things would be needed by almost any application. Do I have the applications create a linked list where each cell is a Thing with a reference to one of the Things we want to store and the next cell in the list? If I do that then perhaps I should take a lesson from Lisp and simplify the Thing structure to nothing more than a cons cell, where the car can be either a Data or a Thing and the cdr can be either a Thing or a null. Perhaps I should even have more than one Thing subclass. One subclass could use hash tables while another stores a list of Things, but this seems to complicate the system and would introduce more downcasting which repels me because I have a natural fear of downcasting.
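The cons-cell variant speculated on above could be as small as this toy Python sketch (`ConsThing` and `to_list` are illustrative names, not part of any existing design):

```python
class ConsThing:
    """A Thing reduced to a Lisp-style cons cell: car is a Data or a
    Thing, cdr is another Thing or None (end of list)."""
    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

def to_list(cell):
    """Walk the cdr chain and collect the cars into a plain list."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out

# A three-element shared list built entirely from cons cells.
lst = ConsThing(1, ConsThing(2, ConsThing(3)))
```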

Using cons-cell-Things seems to simplify the server, but it might complicate the clients. To make a meaningful change, a client would need to send many small changes to update all of the cells involved, and it would need to acquire a lock that stops other clients from reading or writing until all of the changes were complete, so that no client sees the structure in an illegal state. Clients also need to interpret the change notifications that they get from the server. When a client gets a sequence of cons cell modifications, such as new car and cdr values, it would need to figure out whether an item is being added to or removed from a list so that it can update the user interface. Even if the client knew that an item was being added to a list, determining which list was being modified would be troublesome when all it has are cons cells.

It's possible that Kind objects could contain more than just serialization capabilities. A Kind object could also have a reference to objects that perform complex operations upon a client's request and send out appropriate messages. This could help the system be customized for specific applications and also allow lists to be made out of cons-cell-Things without great burdens being placed on clients. On the other hand, it seems like a step backward, away from a general solution and toward a solution that needs a large amount of application-specific work.

Another issue with lists is that I can imagine a wide variety of permissions one might have for them. In comparison, a hash table seems quite simple: you either know what value is associated with an identifier or you don't. With a list, you might know all of it or none of it, or only the first element, or only the last, or only the first three elements. You might know the contents of the list without knowing their order, and you might know the number of things in the list without knowing what they are. How can I have a generalized solution to permissions for lists in the data structure?

Most of all I'd like to be directed to examples of systems like this that are well designed so I can learn from them, and I would appreciate learning the correct terminology for this sort of project so that I can do useful web searches on this topic. Any comments or ideas would also be appreciated.

All 10 Replies

I don't know what to properly call a system like that, but on the surface it seems like a simple problem.

I'd call such a system "stupid". If the client needs to make changes to or retrieve data on the server side, it should send a request to the server and the server should make those changes or push the requested data. A client with access to data on the server side, on top of being awkward and counterintuitive given the concept of a client-server setup, would be extremely brittle by nature.

A client with access to data on the server side, on top of being awkward and counterintuitive given the concept of a client-server setup, would be extremely brittle by nature.

I don't understand which problem you are talking about. I certainly agree that the client should send requests to the server to cause things to happen; I can't even imagine an alternative to that. As far as I can imagine, requests are the only way for clients to communicate with servers.

As far as I understand it, clients access server data frequently in client-server relationships. Even now as I type this I am reading your post because my browser accessed it as data from a web server. And by posting this message my browser is modifying server data. FTP servers also come to mind as clear examples of clients accessing data on a server.

I hope you will expand upon what is awkward and counterintuitive, because I don't think you mean the only thing I can guess you mean, and if my project really is awkward and brittle by nature then you must be seeing a critical problem in it. Frustratingly, I don't quite see what the problem is.

Even now as I type this I am reading your post because my browser accessed it as data from a web server.

Your browser made a structured request and the server sent you the data. The client (your browser) didn't access the data directly. That's the difference from my reading of your first post. You seem to want clients that actually hit the server and fetch data directly rather than sending a request and having the server fetch it.

FTP servers also come to mind as clear examples of clients accessing data on a server.

FTP is a structured protocol for making a data request. Once again, the client doesn't go and get the data directly, regardless of how it might seem with some FTP interfaces. It makes a request for data which the server then responds to by fetching the data and sending it.

FTP is a structured protocol for making a data request. Once again, the client doesn't go and get the data directly

So then the key is getting the data directly as opposed to asking for it in a request and receiving it as a consequence of the request. But I don't understand what it means to get the data directly. I'm trying to imagine something more direct than a request from a client and a response from a server, but as far as I understand clients and servers, requests and responses are all that exist.

I'm trying really hard to interpret your advice. I've even considered the idea that you might be talking about the client and server being on the same computer and actually sharing memory, so that they can access each other's data by simply passing around pointers. Is that what you mean? I hope not, because it seems like it would completely break down the whole idea of a client and server. What I want are clients and servers that communicate through serialization, as I expect all clients and servers do, and if they talk by passing pointers then it's contrary to my design goals and of no use to me. I hope you've been talking about a problem in what I'm trying to do, not a problem in servers and clients that share the same memory, because that seems pretty useless.

My next possible interpretation is that you are talking about doing serialization by taking the memory occupied by the data structure and using it byte-by-byte exactly as it is stored. If the server did that, I can see why it might be called giving the client direct access to the data, but that's also something I've never even considered doing. Serialization can be tedious, but I would never want to rely upon how the compiler chooses to lay out my data structures in memory just to avoid doing my own serialization, because I would never be able to trust the compiler to always use the same layout. Even if it promised to do that I would have doubts, and I think such promises are rare. Even serialization libraries give me doubts; I think that serialization should be customized to the task at hand.

My third idea for what direct access may mean is that the protocols for client-server interactions come in several flavors which I am not aware of and are designed based on principles that I've never heard of, and direct access is terminology from that field of study. It could mean a certain style of message passing that I don't know about, probably an antipattern that knowledgeable people know to avoid, which I have unfortunately stumbled into in my ignorance.

I'm attempting to use google to learn more about this terminology, but "DirectAccess" seems to be the name of a Microsoft virtual private network system, and that's clouding up the results. My initial searches have had no interesting results. Whether I find the answer to this question or not, research on this topic can only help my project.

I'm trying to imagine something more direct than a request from a client and a response from a server

VPN into the server and access its file system like you would any computer on a local network. That's what I mean by "direct". If that's not what you were thinking of, then I'm curious how you're imagining a data structure that's shared between server and clients yet is somehow revolutionary or different from the usual request-response designs.

I'm not aware of what the usual request-response designs look like in detail because I don't have a name for what I'm trying to do and without that it is hard to research how other people do it.

I have imagined various protocols for working with a data structure that is shared between a server and clients, but in general all I want is something like a model-view-controller pattern, where you have a model that notifies its views whenever a change is made and the views consult the model to determine what to display, except the views are not being displayed on the same machine, each machine has its own copy of the model, and a server is used to coordinate all the copies. Additionally, since not every view needs every part of the model, clients only keep a copy of the parts of the model that they need.

The messages passed from client to server would be requests to modify the model and requests for values contained in the model so that a client can refresh its copy of the model. Messages from the server to clients would be responses to requests and notifications whenever the model is changed.
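A small Python sketch of that message vocabulary, with a server-side dispatcher over the four message types (all names here are illustrative, and a real protocol would also define a wire encoding for each record):

```python
from dataclasses import dataclass

@dataclass
class Modify:   # client -> server: apply a change to the model
    path: str
    value: object

@dataclass
class Query:    # client -> server: refresh part of the local copy
    path: str

@dataclass
class Reply:    # server -> client: response to a Query
    path: str
    value: object

@dataclass
class Notify:   # server -> all permitted clients: the model changed
    path: str
    value: object

def handle(model, msg):
    """Server-side dispatch: apply Modify messages and answer Queries;
    the returned Notify would be broadcast to the relevant clients."""
    if isinstance(msg, Modify):
        model[msg.path] = msg.value
        return Notify(msg.path, msg.value)
    if isinstance(msg, Query):
        return Reply(msg.path, model.get(msg.path))
```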

Just setting up a model-view-controller system would be no trouble, but the complexity of it grows enormously when everything needs to be serialized and transmitted. Even just serializing the objects of the model wouldn't be too troublesome, but one also needs to serialize all the ways that the model might be modified. It certainly deserves some careful planning and research, but I don't have any search terms to enter into google to produce helpful results.

@deceptikon: "I'd call such a system "stupid"."
Don't be so quick to judge, these design principles are in contradiction to the "server-client" paradigm, but they are not "stupid", in fact, they are the basis for most decentralized systems, with some notable examples like Git, Mercurial, ROS, and the Internet!

I have been struggling with the principles of creating a client-server design where the server maintains a data structure that multiple clients can view and manipulate with various restrictions on which client can access which parts of the structure. I don't know what to properly call a system like that, but on the surface it seems like a simple problem.

The most appropriate name I can think of is a Database. It is a general term, I know, but this is exactly what you are describing. And yes, as deceptikon pointed out, most of these systems are based on some centralized server that responds to requests for changes to the database. If you think about it, most websites, like this forum for example, have very similar requirements: a population of users (members) with varying levels of permissions, the ability for any user to view / create / modify some of the content, keeping things in sync between multiple users viewing the same page, and also keeping things in sync between multiple servers. These are the core operations that occur under the hood of most community-driven websites (Daniweb, facebook, youtube, etc.), and all these websites use one form or another of a database server.

For example, one of my top design goals is that each client should have a local copy of the accessible part of the structure. This is necessary for creating a fast user interface, but it complicates the design by creating partial versions of the data structure and requiring consistency between client data structures and server data structures.

Well, obviously, this is very much like downloading one page of a website and being able to view that page. But, in that scenario, any change (like submitting a comment or forum post) is still something that gets immediately sent as a request to the database server to submit the entry.

When you describe it in terms of making a number of local changes and then trying to reunite the different client versions into the server's database, it sounds a lot more like a version control system. Some version control systems are designed in a classic central "database" way, i.e., a client-server architecture with standard requests to the server; examples include SVN and CVS. Others rely on a decentralized system (i.e., there is no "server" or "client", all nodes are equal, similar to peer-to-peer or the general design of the Internet), examples of which are Git and Mercurial. In either case, all these systems seem to boil down to just a few key elements (data structures or operations):

  • A means by which to represent a "change" to the data. For example, most version control systems use "diff files" which represent any removed / modified / added lines of text in a text file (e.g., source code). Others, like Git, also use checksums (such as SHA-1) to identify changes (versions). This implies that you have some method to generate a diff file (or record of changes) from the current state of the data and a record of its state when it was last obtained from the server.
  • A means by which to commit and merge changes. The term "commit" usually means that you want your local changes to be applied to the database (i.e., published). The term "merge" usually means that you want to take multiple versions of the database (that were the same at some point in the past, but have had committed changes since then) and make one version that would include all the changes since the common point in the past.
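A toy Python version of the first element, a "record of changes": compare the state last fetched from the server with the locally edited state and emit the minimal per-field change list to commit (the `diff`/`apply_diff` names and tuple format are purely illustrative):

```python
def diff(base, edited):
    """Produce a list of changes turning dict `base` into `edited`."""
    changes = []
    for key in edited:
        if key not in base:
            changes.append(("add", key, edited[key]))
        elif base[key] != edited[key]:
            changes.append(("set", key, edited[key]))
    for key in base:
        if key not in edited:
            changes.append(("del", key))
    return changes

def apply_diff(state, changes):
    """Apply a change list to a state dict, returning the new state."""
    state = dict(state)
    for change in changes:
        if change[0] == "del":
            state.pop(change[1])
        else:
            state[change[1]] = change[2]
    return state
```

A commit then amounts to shipping the change list to the server, and a merge amounts to applying two change lists that share the same base, resolving any keys they both touch.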

I think this might be something to think about in your design. You must understand that the challenge in this kind of design is not the problem of representing / working with the data, but rather the problem of managing the changes made to the data and resolving conflicts or simply synchronizing concurrent changes. This is probably where all the trouble is coming from. If you set up your software to "represent the data", then all the operations of "changing the data" become really complicated and difficult to resolve. Most databases or version control systems, or other similar systems, have two components: a protocol (or set of commands) to apply changes to the data, and some underlying data structure to handle and respond to the queries quickly. And these are generally quite separate designs. There are carefully designed data structures and protocols to represent and apply changes to the data. And then the local server or client application is written to execute those changes on a local data structure.

With or without a consistency requirement, I would want the server to send out messages to all clients who have permission to view a change when that change happens because keeping clients promptly notified about such changes is another one of my design goals. So I need the client to serialize and send each change that it wants to make to server, then the server makes that change to its copy, then it resends the change to all relevant clients so they can update their local copies. At first I wanted to have the original client update its local copy immediately, but I later realized that consistency is easier to maintain if all clients see the changes in the same order, so now I prefer to have the original client wait for the change to bounce back from the server before making the change locally.

That is the beginning of a solution to the "commit / merge" problem. The problem I see with this, however, is that you cannot control the local copies that each client has. Say Client-A and Client-B are both editing roughly the same "chunk" of the database. Client-A finishes and produces Change-A, which is committed to the server and then propagated to the clients, including Client-B. Client-B receives Change-A, but it conflicts with its own uncommitted changes. What do you do? Do you "refresh" with the server data and essentially destroy the work of Client-B? Do you bounce it back to the server saying there is a conflict, so the changes are reversed on the server and repropagated to the clients (with possible secondary conflicts), and then wait for Client-B to commit Change-B so that you can resolve the conflict on the server? Will you be able to resolve the conflict on the server? Who has permission to resolve such conflicts?

These are hard questions that are really fundamental to the way your system works. And there is no easy way out of it.
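One common way to at least detect the Client-A/Client-B conflict described above is optimistic concurrency with a version counter: each commit states which version it was based on, the server rejects commits based on a stale version, and the rejected client must merge and retry. A minimal Python sketch (the class and method names are illustrative):

```python
class VersionedStore:
    """Server-side value guarded by an optimistic version counter."""
    def __init__(self):
        self.value = None
        self.version = 0

    def commit(self, base_version, new_value):
        """Accept the commit only if it was based on the current
        version; otherwise the client has a conflict to resolve."""
        if base_version != self.version:
            return False   # stale base: merge locally and retry
        self.value = new_value
        self.version += 1
        return True
```

This answers "who detects the conflict" (the server, cheaply) but deliberately leaves "who resolves it" to the client, which matches the division of labor in systems like SVN.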

I also realize that I was working at too low a level; instead of creating a class to represent each kind of object in the data structure, I should create a class to represent the concept of a kind of object. My kinds are now objects, not classes, and each kind object contains a description of what sort of content might appear in an object of that kind.

This sounds very familiar to me. I've written such architectures in the past, and I'm actually writing one right now. Having objects represent the "kind" of other objects is a core element of what is called Run-Time Type Identification (RTTI) or type introspection. A great many libraries have their own system for type introspection (unless the built-in language feature is sufficient). I have one for my C++ library.

The second part: "each kind object contains a description of what sort of content might appear in an object of that kind". This is what I would refer to as a "schema", as in XML Schemas or Protobuf schemas. In other words, it is a sort of list of "fields" that a certain kind of object can contain. Most serialization and database systems have a certain way to represent that. In the example of protobuf, there is even a parser that can generate C / C++ code for a class containing those fields. There are very complicated schema systems (like XML) and some that are very simple and minimalistic (like protobuf). They vary depending on the needs.
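A schema in this sense can be as small as a mapping from field names to expected types, plus a validator. A toy Python sketch, loosely in the spirit of a protobuf message definition (the names `CAR_SCHEMA` and `validate` are illustrative):

```python
# A "kind" description: the fields an object of this kind may contain,
# each with its expected type.
CAR_SCHEMA = {"wheels": int, "colour": str}

def validate(schema, obj):
    """Check that `obj` (a plain dict) carries every schema field
    with a value of the declared type."""
    for field, ftype in schema.items():
        if field not in obj or not isinstance(obj[field], ftype):
            return False
    return True
```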

The objects of the data structure are now all uniformly instances of a single class. For example, call that class "Thing." The various roles of the objects are represented by a field of each object that points to an object of class "Kind" that represents that role. So, to represent cars, trees, and houses in the server data structure, the server would have three objects of class Kind: car, tree, and house. And each Thing object x would have a field "kind" and x.kind == car, or x.kind == tree, or x.kind == house. The Thing object stores the details of a car or a tree or a house in an uninterpreted way, like a hash table from field identifiers to objects of class Data, where Data is a totally abstract class with very few members. Thing can't do much with Data objects, but each Kind object knows what Data objects its Thing should have, so it can downcast to extract the necessary values for serialization. The real power of the Thing class is that it can keep track of permissions, modifications, and fields that point to other Things, and so all the difficult parts of the system no longer need to be designed and implemented for every kind of object in the structure; it can all be done just once in the relatively simple Thing class.

This all seems powerful. But be careful: you should try to keep things as unintrusive as possible. What I mean is that you must not create some kind of complicated class structure that will insert itself massively into your library's existing (or future) class hierarchy.

For example, the system I'm writing right now relies solely on an RTTI system and a serialization system; everything else is non-intrusive (done externally to the classes in the library). In other words, each class in my library must be registered with the RTTI (through a simple macro in the class declaration that generates a couple of virtual functions), and must provide two functions, "save" and "load", that simply serialize/deserialize to some kind of "archive" (a polymorphic type, not necessarily going to a file). Everything else is external. I use classes derived from the archive classes to generate the schemas (or lists of fields, with name and type) for the objects, I use a special pair of archives to make a shadow image of the object hierarchy in a structure that can be edited in a general manner (e.g., someObject.setField("some_field", 42);), and so on. At the end of the day, this minimizes the work required when creating new classes in the library, and minimizes their size and overall complexity. I can't stress enough how important that is; this is what makes the difference between something being a "feature" of a library and something being a "burden" on it.
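The registration-plus-two-hooks pattern described above translates to roughly this in Python (a decorator stands in for the registration macro; `TYPE_REGISTRY`, `Point`, and the dict-shaped "archive" are all illustrative, not the actual C++ system being described):

```python
TYPE_REGISTRY = {}

def register(cls):
    """Class decorator standing in for the registration macro."""
    TYPE_REGISTRY[cls.__name__] = cls
    return cls

@register
class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    # The only two hooks a class must provide: save/load against an
    # abstract "archive" (here just a dict-like sink).
    def save(self, archive):
        archive["x"], archive["y"] = self.x, self.y

    def load(self, archive):
        self.x, self.y = archive["x"], archive["y"]

# Everything else is external and generic, built on the two hooks.
def serialize(obj):
    archive = {}
    obj.save(archive)
    return {"type": type(obj).__name__, "fields": archive}

def deserialize(record):
    obj = TYPE_REGISTRY[record["type"]]()
    obj.load(record["fields"])
    return obj
```

Note that `serialize` and `deserialize` know nothing about `Point`; the class pays only the cost of registering and providing the two hooks, which is the "non-intrusive" property being advocated.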

One question I have is what sort of structure Thing should be.

The structure needs to reflect the object hierarchy of your software. Unless your software is very trivial, the most likely structure is a directed graph. Of course, depending on your needs, you might want to have multi-directional indices into and out of your graph (i.e., map nodes to actual objects, or map actual objects to nodes). A hash map is a very reasonable candidate for that task. You talked about a linked list... forget it, it's useless in this case. In any case, you must differentiate between a data structure and an index into a data structure (or a mapping). So, long story short: data structure = directed graph, index = hash map (or search tree).
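The data-structure/index split suggested above, in a toy Python sketch (the `Node`/`index` names are illustrative): the structure itself is a directed graph of nodes, and a separate hash map indexes nodes by identifier.

```python
class Node:
    """A node in the directed graph: a payload plus outgoing edges."""
    def __init__(self, payload):
        self.payload = payload
        self.edges = []   # outgoing links to other Nodes

index = {}               # id -> Node: the hash-map index, kept separate

def add_node(node_id, payload):
    node = Node(payload)
    index[node_id] = node
    return node

a = add_node("a", "first")
b = add_node("b", "second")
a.edges.append(b)        # directed edge a -> b lives in the graph,
                         # not in the index
```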

It's possible that Kind objects can contain more than just serialization capabilities. A Kind object could also have a reference to objects that perform complex operations upon a client's request and sends out appropriate messages.

This sounds to me like a straightforward plugin system.

Most of all I'd like to be directed to examples of systems like this that are well designed so I can learn from them, and I would appreciate learning the correct terminology for this sort of project so that I can do useful web searches on this topic. Any comments or ideas would also be appreciated.

That's mostly what I tried to do, I hope it helped.

I hope it helped

I think you helped greatly! You seem quite knowledgeable, and you have settled many of my doubts. From what you have told me, there is no real name for what I am trying to do, only names for various pieces of what I'm doing and names for other things that are similar but not really what I want. This means that I can stop worrying about whether there are libraries that would do what I want and how I could find more advice on the internet, and just accept that it is a task for myself alone.

I intend to deal with the issues of representing and committing changes by using very small changes. Imagine a version control system that commits while you are typing after every keystroke and where changes made by other users appear instantly in the text you are editing. That would be impractical, but it illustrates how merging gets easier when each change is small.

What I mean is that you must not create some kind of complicated class structure that will insert itself massively into your library's existing (or future) class hierarchy.

I intend to use objects that map from constants to values to form a directed graph that is shared through a server. That simple structure simplifies the server, while the client can keep the shared objects isolated from the business part of the application by wrapping the shared objects in objects that have a more meaningful interface. That way the shared objects and the constants used to look up data only appear in as few places as possible.
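The wrapping idea above might look like this in Python: the shared object is a bare mapping from constants to values, and an application-facing wrapper hides those constants behind a meaningful interface. All names and constant values here are hypothetical:

```python
# Field constants; only the wrapper ever mentions them.
WHEELS = "f001"
COLOUR = "f002"

class SharedThing:
    """The only structure the server sees: constants -> values."""
    def __init__(self):
        self.fields = {}

class Car:
    """Client-side wrapper with a meaningful interface; the rest of
    the application never touches `fields` or the constants."""
    def __init__(self, thing):
        self._thing = thing

    @property
    def wheels(self):
        return self._thing.fields[WHEELS]

    @wheels.setter
    def wheels(self, n):
        # In the real system this write would go through the server
        # round-trip; here it just updates the local shared object.
        self._thing.fields[WHEELS] = n
```

The payoff is exactly the isolation described: the server stays generic, and the constants appear in one wrapper class instead of being scattered through the application.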

You talked about a linked list.. forget it, it's useless in this case.

I see what you mean. Linked lists will be needed somewhere without a doubt, but putting them inside the Thing objects is more trouble than benefit. More than anything, a complex shared object complicates the design of the schema and makes the whole system less intuitive. On the other hand, working with local linked lists is easy, so if I need a shared list, it is better to construct it like a local linked list using Things as links; that way the list can be used like a local linked list with the sharing handled automatically by the system.

The only issue is how to track down which list has changed when the server reports that a link has changed, since the server knows nothing about lists. That is simply a matter of giving each link a pointer to the list. That's not something one would normally have in a linked list, but it's really a trivial cost compared to the cost of sharing the structure.
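The back-pointer scheme described above, as a toy Python sketch (`SharedList`, `Link`, and `on_link_changed` are illustrative names): each link carries a reference to its owning list, so when the server reports that a link changed, the client knows which list to redraw.

```python
class SharedList:
    def __init__(self, name):
        self.name = name
        self.head = None

class Link:
    """A list cell built from a Thing; `owner` is the extra pointer
    that a plain linked list would not normally carry."""
    def __init__(self, owner, value, next=None):
        self.owner = owner   # back-pointer to the SharedList
        self.value = value
        self.next = next

def on_link_changed(link):
    """Client-side handler for a server notification about one link:
    the back-pointer identifies the list whose view must refresh."""
    return f"redraw {link.owner.name}"
```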

From what you have told me, there is no real name for what I am trying to do, only names for various pieces of what I'm doing and names for other things that are similar, but not really what I want.

I'm actually more inclined to think that you don't have a very precise idea of what you want (or, at least, you are not communicating it well). The different things I mentioned all have very similar mechanisms in place to manipulate and synchronize data, except that they are each specifically tailored to very particular requirements.

For example, in a version control system, you expect programmers to split up their tasks, work more or less independently, and commit code to the repository every few days or weeks. With these requirements, systems like SVN or Git make a lot of sense. The example of a forum / social-network website is very different: there, the main activity is submitting new content (new entries / uploads / posts), which can be handled with much simpler mechanics because concurrent new entries never conflict with each other.

Both of these examples have very clear requirements about what kind of modifications are allowed, the frequency of the changes to the server's data, the work-flow (and pace) that the average user is expected to follow, etc. Solutions are tailored to the required tasks. You seem to have put a lot of focus on throwing ideas around about how you could do this or that, but you have to start with what is required (or desired) in terms of user experience / library features, and let that guide you to a solution.

And that's also the problem between general vs. specific, because very often, looking for a "general solution" is an ill-defined process. There is no such thing as a general solution; there are only different sets of restrictions / constraints / requirements and different application domains, some wider than others.

This means I can stop worrying about whether there are libraries that do what I want, or about how to find more advice on the internet, and just accept that it is a task for me alone.

Very often, the search is not for a library / algorithm / data-structure / architecture that does exactly what you require, but rather a search for inspiration, by looking at the techniques people use to solve similar or related problems. It's important to see problems in a wider scope and recognize the patterns and sub-problems that characterize them; then you can see related problems in the same light and draw appropriate inspiration from their solutions. You will never find an exact solution to your exact problem, but you will find a vast number of related problems and solutions.

I intend to deal with the issues of representing and committing changes by using very small changes. Imagine a version control system that commits after every keystroke as you type, where changes made by other users appear instantly in the text you are editing.

The smaller the changes, the greater the bandwidth requirements and the worse the latency of the operations. You can't have arbitrarily small changes; there is a limit. Imagine every new letter typed was a "change" to be committed. Every letter typed would require sending a request to the database to insert that letter; the database has to perform that operation (find where it needs to go and do the insertion), send a reply to confirm it, then broadcast the change to all other clients, and then respond to a whole bunch of requests to download the updated version of the data (or the relevant parts of it). And that's not even counting the latency at the other client sites. Additionally, this means that your client programs must be ready to do all these things (update the data based on a received broadcast from the server) at a very fast pace, within the time it takes to type a character, which is a fraction of a second. Do you understand what I'm getting at? This is impossible, period. There just isn't enough bandwidth to do this, and the resulting latency would be way beyond practical levels. So, you have to set the bar at much larger changes and much less frequent updates from the server.

That would be impractical, but it illustrates how merging gets easier when each change is small.

Merging does get easier with smaller changes, but not because the changes themselves are smaller; it's because the smaller the changes, the less chance there is of a conflict between them. It is a simple question of probabilities: if you have a word document of 10 thousand words and two people each add one word of ten characters, the chance that they happen to insert the word at the same location (which would be a conflict) is very small, but if both wrote a thousand words into the document, the chances of a conflict are quite high.
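As a rough back-of-envelope version of that probability argument, here is a small Python calculation. It assumes a crude model (each insertion point chosen uniformly and independently among the word boundaries, and "conflict" meaning two writers pick the same point), which is illustrative only.

```python
# Crude model: a document of N words has N+1 insertion points. Two writers
# pick insertion points uniformly at random; a "conflict" means they pick
# the same point. This is a rough illustration, not a precise model.

def conflict_probability(doc_words, inserts_a, inserts_b):
    points = doc_words + 1
    # Approximate: each of A's insertion points independently "misses"
    # all of B's points with probability (1 - inserts_b / points).
    p_no_conflict = (1 - inserts_b / points) ** inserts_a
    return 1 - p_no_conflict

# One word each into a 10,000-word document: conflict is very unlikely.
small = conflict_probability(10_000, 1, 1)

# A thousand insertions each: a conflict is nearly certain.
large = conflict_probability(10_000, 1_000, 1_000)
```

Under this model, `small` comes out around one in ten thousand, while `large` is essentially 1, which matches the intuition in the paragraph above.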

Making the changes smaller is not the only way to affect the probability of conflicts. One solution is to have a segmented locking mechanism (or check-out mechanism). In other words, if I'm working on an article with a few colleagues, I might receive the last version of the draft from the last person who worked on it, then send an email back saying "I'll work on section 3 of the article", effectively telling the others not to touch that section until I have made my modifications to it. You can use that same mechanism for your problem: require that clients obtain an exclusive lock on a section of the data (or data structure) before they can modify it, and release that lock when the changes are complete. This could be annoying, but it has the great advantage that it completely eliminates conflicts between changes.

Linked lists will be needed somewhere without a doubt

IMO, linked-lists are toy examples used in computer science 101 courses as a good introduction to some useful concepts in programming. Their applicability in real-life programming is very small, because they have far more disadvantages than advantages, and the alternatives are almost always better (contiguous storage, semi-contiguous storage, contiguous storage of pointers, search-tree storage, etc.). The only practical use of a linked list that I know of is when the links point outside the program (i.e., the data / objects are distributed among different programs in different address spaces), but I don't think that's what you are talking about.

On the other hand, working with local linked lists is easy, so if I need a shared list, it is better to construct it like a local linked list using Things as links; that way the list can be used like a local linked list with the sharing handled automatically by the system.

It seems to me like you would be better off with a linked-list style graph implementation, like an adjacency list (which is essentially a generalization of a linked list, and is actually rich enough to make it a useful data structure for representing a graph). I don't understand why you would want to have a "local linked list using Things as links"; what is it about "Things" that makes a linked list the natural choice for putting them together? If "Things" represent objects loaded in your software, then the relationships that bind them together are their mutual ownerships (e.g., A owns B and C, and B refers to C). These ownership relations are best modeled as a graph, not a linked list. If you simply want a list of "Things", then you will probably need some form of index to address them (by name, by ID, by something), and that mandates either a search-tree structure or a hash-map structure; either way, it excludes the possibility of using a linked list, which is incompatible with both kinds of indices.

The different things I mentioned all have very similar mechanisms in place to manipulate and synchronize data, except that they are each specifically tailored to very particular requirements.

That's true. SVN and forums both have requirements that are quite contrary to my design goals. They deal with large, slow modifications and the server waits for each client to request an update before telling the client about the latest changes. My design goals require something closer to real-time interaction between the clients and the server. I'm not worried about avoiding the natural delays due to internet transmission, but prompt notification of all clients with each change is a design goal.
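A minimal sketch of the echo-back ordering I have in mind (the server applies each change and rebroadcasts it to every client, including the sender, so all clients apply changes in the same server-chosen order) might look like this in Python. The class names and the in-process "network" are stand-ins for real sockets and serialization.

```python
# Sketch of the echo-back protocol: a client never applies its own change
# directly; it submits the change to the server, and applies it only when
# the server's broadcast comes back. Names are illustrative only.

class Server:
    def __init__(self):
        self.state = []      # the authoritative copy
        self.clients = []

    def submit(self, change):
        self.state.append(change)     # server applies the change first...
        for c in self.clients:
            c.on_broadcast(change)    # ...then notifies every client

class Client:
    def __init__(self, server):
        self.local = []               # the client's local copy
        self.server = server
        server.clients.append(self)

    def make_change(self, change):
        self.server.submit(change)    # do NOT update self.local here

    def on_broadcast(self, change):
        self.local.append(change)     # apply only when it bounces back

server = Server()
alice, bob = Client(server), Client(server)
alice.make_change("x")
bob.make_change("y")
# All copies now agree and saw "x" before "y".
```

Because every client, including the originator, applies changes only in broadcast order, the server's ordering is the single source of truth, which is exactly why waiting for the bounce-back simplifies consistency.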

You can't have arbitrarily small changes, there is a limit.

I can have arbitrarily small changes; it just requires giving up something else in return. The keystroke-by-keystroke version control system was impractical, but something where the small changes come slightly less frequently shouldn't be. A better example might be a spreadsheet that is shared through a server, where every time the value of any cell changes, all clients see the change promptly. The user isn't likely to make changes like that fast enough to be a problem. And since each user would be seeing the work of other users in approximately real time, merging can just be "latest change wins". The users would be able to see it happen and correct it if there were a problem.
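A last-write-wins merge for spreadsheet cells can be sketched in a few lines, assuming the server stamps each update with a monotonically increasing sequence number (the `CellStore` name and message shape are invented for the example).

```python
# Last-write-wins for spreadsheet cells: each update carries a
# server-assigned sequence number, and the highest number wins.
# Illustrative sketch only; the names are made up.

class CellStore:
    def __init__(self):
        self.cells = {}   # cell name -> (seq, value)

    def apply(self, cell, seq, value):
        current = self.cells.get(cell)
        if current is None or seq > current[0]:
            self.cells[cell] = (seq, value)
        # else: a stale (older) update arrived late; ignore it

store = CellStore()
store.apply("B2", seq=1, value=10)
store.apply("B2", seq=2, value=20)   # the later change wins
store.apply("B2", seq=1, value=99)   # the stale change is ignored
```

The sequence-number check is what makes the merge safe even when broadcasts arrive at different clients in different network conditions: a late-arriving stale update can never clobber a newer value.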

IMO, linked-lists are toy-examples used in computer science 101 courses as a good introduction to some useful concepts in programming. Their applicability in real-life programming is very small, because they have far more disadvantages than advantages, and the alternatives are almost always better (contiguous storage, semi-contiguous storage, contiguous storage of pointers, search-tree storage, etc.).

The advantage of contiguous storage and search trees is that you can look things up quickly. Of course that is a huge advantage in some cases, but when the data is being transmitted over the internet, the look-up speed advantage is vastly overwhelmed by transmission time. When I transmit a change to a contiguous-storage list, I can either transmit the entire new list, or transmit that a certain sublist has been replaced by another sublist. When I transmit a change to a linked list, I transmit requests for the creation of new links and changes to the values of the fields of those links. If a client wants contiguous-storage look-up speed, then the list can be copied into contiguous storage locally every time it changes; compared to transmission time, that cost would be trivial. Unlike most cases, transmitting a change to the list is the only operation that matters, and I don't see any significant improvement from using the contiguous option.

And the linked list has other sorts of advantages. It fits more naturally with representing the structure as a directed graph than contiguous storage does. Search trees have the same advantage, but search trees are no better than linked lists if searching is not needed. Each node can just store an adjacency list of a small fixed set of out-going edges. Some of those edges might sometimes point to null, but we can give every edge a label, and then every change is just assigning a node to a label (or a small number of nodes to a small number of labels).
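That "assign a node to a label" idea can be sketched as follows; the fixed label set and the function names are invented for illustration.

```python
# Sketch: a node with a small fixed set of labelled out-going edges.
# A change message is then just (node, label, target-or-None).
# The label set and names here are made up for the example.

class Node:
    LABELS = ("next", "prev", "child")   # fixed, arbitrary label set

    def __init__(self, name):
        self.name = name
        self.edges = {label: None for label in self.LABELS}

def apply_change(node, label, target):
    """Apply a change message: assign `target` to `label` on `node`."""
    if label not in node.edges:
        raise KeyError(f"unknown edge label {label!r}")
    node.edges[label] = target

a, b = Node("a"), Node("b")
apply_change(a, "next", b)     # link a -> b
apply_change(b, "prev", a)     # and b -> a
apply_change(a, "child", None) # edges may legally point to null
```

Every change is the same tiny, uniform message shape, which keeps the serialization and the permission checks simple on both ends.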

To fit contiguous storage into that system, the only option I can imagine is a node with as many out-going edges as there are elements in the list, each edge labelled by an index. Then a change can sometimes be simply assigning a node to a label, but if you want to do an insertion anywhere but the end of the list, you need to either transmit a request to shift every later edge forward, or build a more complicated system where a different kind of message exists for insertions.

If you simply want a list of "Things", then you will probably need some form of an index to address them (either by name, by ID, by something), and that mandates either a search-tree structure or a hash-map structure, either way, it excludes the possibility of using a linked-list, which is incompatible with both types of indices.

I don't think that needing an index would be the most probable case. If indexing really were necessary, then I think a search tree would be best, but in most cases just getting the content of the list would be enough. From my perspective, just having two applications on remote machines operating on the same data at the same time is a wondrous thing, and you can have that as long as they both have lists with the same elements in the same order. Sharing an index into that list is pure bonus, and in the cases where it is needed, you could reconstruct the index locally every time the list changes, because the time needed to do that would usually be overwhelmed by the transmission time.
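Reconstructing the index locally on each change is a one-liner in Python; this sketch (with made-up names) shows the idea of keeping the index entirely client-side.

```python
# Sketch: the index is never shared; each client rebuilds it from the
# shared list whenever a change arrives. Names are illustrative only.

def rebuild_index(items, key):
    """Build a key -> position index over the current list contents."""
    return {key(item): pos for pos, item in enumerate(items)}

shared = ["alpha", "beta", "gamma"]      # the replicated list contents
index = rebuild_index(shared, key=lambda s: s)

shared.insert(1, "delta")                # a change arrives from the server
index = rebuild_index(shared, key=lambda s: s)   # cheap next to network time
```

An O(n) rebuild per change is wasteful in CPU terms, but as the paragraph above argues, it is trivial next to the round-trip time of the change itself, and it keeps the wire protocol free of index bookkeeping.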
