Generic Entity

Motivation

As technology companies grow rapidly, software products are becoming more integrated than ever before, or are being replaced by larger software platforms. Data is no longer locked in product silos. Raw data often flows from users across organizations and services, and frequently ends up in the cloud, to be returned in a higher, more intelligent form.

The motivation is to create a simple generic data model and programming model that can be accessed by any technology. It can be a fundamental part of software platforms that want to leverage open source instead of creating proprietary implementations. The key design aspects are reusability, extensibility, and a simple interface.

Traditional entities

Almost every piece of software provides functionality built on top of some data model. Such data models are usually represented by entities, which store information in fields. The standard implementation of an entity is a simple class that exposes its fields via get and set methods (or properties in C#). In the .NET world, such a class is also called a POCO (plain old CLR object).

Example of a traditional entity implemented in C#:

namespace MyNamespace
{
    public class Address
    {
        public int Id { get; set; }
        public string AddressLine1 { get; set; }
        public string City { get; set; }
        public string PostalCode { get; set; }
        public string Country { get; set; }
    }
}

Depending on the software, various capabilities are built on top of these entities: UI views, validation, business logic, API controllers, processing, and data storage. Such functionality can be bound to concrete entities or implemented in a generic way.

Functionality coupled to concrete entities

The assumption is that the data model is known in advance and can be represented by concrete entities. Every piece of functionality built on top of concrete entities and their fields creates a high number of dependencies on these entities. This is called tight coupling, and it worsens the maintainability of the software. Another issue is code reusability: the same functionality is often duplicated for different entities (e.g., separate API controllers with CRUD operations for every entity, or a separate data access layer for every entity). When a new entity is introduced, the existing functionality is usually copy-pasted from some other entity and adjusted (and possibly broken by the developer in the process). Each new piece of functionality adds a new interface which, together with the interfaces added for other entities, creates a large surface to be maintained and tested. Such development is also very inefficient. How much development does it take to add a new entity, or just a new field to an existing entity, while supporting all the capabilities? How long does it take to test it? How long does it take to produce bug fixes and test them?

Finally, it requires the data model to be implemented before the functionality that follows. Such data models create a contract that the other functionality will depend on. But what if the concrete entities and their fields are not known at the time the functionality needs to be implemented? What if they are provided later by an external team or by customers? The dependencies on concrete entities cause work organization problems.

Generic functionality

Usually, every piece of software contains functionality that can be identified as common and is the same for all entities, for example a UI datagrid view, validation, an API controller with CRUD operations, or data storage. Such common functionality can be implemented effectively in a generic way.

In fact, there are many existing software frameworks that can be used to support some common functionality. UI frameworks offer components such as a datagrid view, which use data binding to display the fields of any provided entity. There are also validation frameworks that can be used to validate entity fields. For data storage, object-relational mappers (e.g., Entity Framework, NHibernate) are used to map and store provided entities in the database. JSON serializers are also very common nowadays. These frameworks usually work with traditional entities, which must be provided by the application developer in code. The framework usually uses reflection to access the entity fields. Some frameworks dynamically generate, compile, and use code for accessing fields (e.g., expression trees in C#).

However, a third-party framework may not fit your project. You can extend existing frameworks, but you can’t fully control their basic flow. You might want to implement your own UI datagrid, validation system, or workflow to fulfill your customer requirements. The functionality your project identifies as common can also be so specific to your domain or business requirements that no such framework exists at all. Therefore, some common functionality can only be addressed by implementing your own software frameworks.

But creating generic functionality on top of traditional entities is not straightforward. First, it requires abstract thinking to identify common functionality. A non-generic data model also leads developers to build non-generic functionality. The generic functionality would need to use reflection, or dynamic generation and compilation of code, to access fields. Such a programming model becomes complicated and requires solid programming skills. Using reflection also incurs a performance cost. As a result, a lot of common functionality is either not identified or not implemented in a generic way.
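To make the reflection approach concrete, here is a minimal sketch of generic functionality written against traditional entities. The `ReflectionReader` class and its `ReadFields` method are hypothetical names chosen for illustration, and the shortened `Address` class stands in for any POCO. Only standard `System.Reflection` calls are used.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// A shortened traditional entity, used here only as sample input.
public class Address
{
    public int Id { get; set; }
    public string City { get; set; } = "";
}

// Hypothetical helper: generic functionality that works with any entity class.
public static class ReflectionReader
{
    // Reads every public, readable, non-indexed instance property
    // of the given object into a name/value map.
    public static Dictionary<string, object> ReadFields(object entity)
    {
        var result = new Dictionary<string, object>();
        PropertyInfo[] properties = entity.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (PropertyInfo property in properties)
        {
            if (property.CanRead && property.GetIndexParameters().Length == 0)
            {
                result[property.Name] = property.GetValue(entity);
            }
        }
        return result;
    }
}
```

Calling `ReflectionReader.ReadFields(new Address { Id = 7, City = "Prague" })` yields a map with the keys `Id` and `City`. The code works for any class, but the field names and types are discovered indirectly at runtime, which is the complexity and performance cost described above.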

The data model stuck in the code

Because traditional entities are implemented as simple software classes, the whole data model is stuck in the code. Every change to an entity or to an entity field requires changing the code and releasing a new version. The development cycle also delays the moment when the new version, containing the new entity or field, becomes available to the customer (the time-to-market factor).

These problems can be mitigated if entities are defined by a schema outside of the code and the software classes implementing them are generated automatically from this schema.

Generic data model requirement

Imagine that you are building a software platform that should work with any data provided by customers. Basically, any customer can bring their own data into your platform so they can leverage its functionality. The functionality you are trying to build can’t depend on the concrete customers’ data models, because you don’t know them in advance. Therefore, the platform functionality needs to be implemented in a generic way, so it can work with any provided data model. One possibility would be to access the provided data models via reflection. Another, more straightforward way is to build such functionality on top of a generic data model. Such a data model also encourages building the dependent functionality in a generic way.

The generic entity is a concrete implementation of the generic data model. It is an abstraction of traditional entities. In traditional entities, fields and their data types are defined by the code of the concrete entity class. The basic idea behind the generic entity is to move the definitions of concrete entities out of the code and into language-independent definition files (schemas).

Such an abstraction allows us to use a single software class (representing the generic entity) to substitute for all possible software classes representing concrete entities. The generic entity class also stores information in fields. Such a field can store basically any data type, just as a concrete field of a traditional entity can. The difference is that the information about concrete fields is not hardcoded but provided by a concrete schema.
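As a rough illustration of the idea, a single class can hold the fields of any entity when the field definitions are passed in as data. This is only a sketch under my own assumptions: `FieldDefinition`, `GenericEntity`, `Get`, and `Set` are hypothetical names, and it is not the programming model announced for the follow-up article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: one field definition as it might be supplied by a schema.
public sealed class FieldDefinition
{
    public FieldDefinition(string name, string type)
    {
        Name = name;
        Type = type;
    }

    public string Name { get; }
    public string Type { get; }
}

// Hypothetical: one class that can stand in for any concrete entity.
public sealed class GenericEntity
{
    private readonly Dictionary<string, FieldDefinition> _definitions =
        new Dictionary<string, FieldDefinition>();
    private readonly Dictionary<string, object> _values =
        new Dictionary<string, object>();

    public GenericEntity(string name, IEnumerable<FieldDefinition> fields)
    {
        Name = name;
        foreach (FieldDefinition field in fields)
        {
            _definitions[field.Name] = field;
        }
    }

    public string Name { get; }

    public IEnumerable<FieldDefinition> Fields => _definitions.Values;

    // Stores a value; only fields declared in the definitions are accepted.
    public void Set(string fieldName, object value)
    {
        EnsureDefined(fieldName);
        _values[fieldName] = value;
    }

    // Returns the stored value, or null if the field has not been set yet.
    public object Get(string fieldName)
    {
        EnsureDefined(fieldName);
        return _values.TryGetValue(fieldName, out var value) ? value : null;
    }

    private void EnsureDefined(string fieldName)
    {
        if (!_definitions.ContainsKey(fieldName))
        {
            throw new ArgumentException(
                $"Unknown field '{fieldName}' on entity '{Name}'.");
        }
    }
}
```

An Address entity would then be created with `new GenericEntity("Address", new[] { new FieldDefinition("id", "integer"), new FieldDefinition("city", "string") })`, with no `Address` class in the code. Generic functionality can enumerate `Fields` directly instead of using reflection.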

The Address entity from the earlier C# example, defined using the built-in schema:

{
  "namespace": "MyNamespace",
  "name": "Address",
  "fields": [
    {
      "name": "id",
      "type": "integer",
      "displayName": "Id"
    },
    {
      "name": "addressLine1",
      "type": "string",
      "displayName": "Address line 1"
    },
    {
      "name": "city",
      "type": "string",
      "displayName": "City"
    },
    {
      "name": "postalCode",
      "type": "string",
      "displayName": "Postal code"
    },
    {
      "name": "country",
      "type": "string",
      "displayName": "Country"
    }
  ]
}

Built-in schema

The generic entity uses its own JSON schema format for definitions (with the structure above), to make it independent of other technologies. It defines the entity “name” in a particular “namespace”, along with its concrete fields. A field definition has the mandatory “name” and “type” properties. The “type” is language independent. The schema definition above uses the “integer” and “string” types. The types supported by the schema can easily be extended with any new type. A field definition can optionally contain the “displayName” property (a presentable name). The full schema model is here (C# version).
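For orientation, the structure above could be mapped to C# roughly as follows. This is a simplified sketch, not the full schema model: the class names `EntitySchema`, `FieldSchema`, and `SchemaLoader` are assumptions, and the JSON is parsed with `System.Text.Json` (available in .NET Core 3.0 and later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model of one field definition in the schema.
public sealed class FieldSchema
{
    public string Name { get; set; } = "";      // mandatory
    public string Type { get; set; } = "";      // mandatory, language independent
    public string DisplayName { get; set; }     // optional presentable name
}

// Hypothetical model of one entity definition in the schema.
public sealed class EntitySchema
{
    public string Namespace { get; set; } = "";
    public string Name { get; set; } = "";
    public List<FieldSchema> Fields { get; set; } = new List<FieldSchema>();
}

public static class SchemaLoader
{
    // Case-insensitive matching maps "displayName" in JSON to DisplayName in C#.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    // Parses a schema document and checks the mandatory field properties.
    public static EntitySchema Load(string json)
    {
        EntitySchema schema = JsonSerializer.Deserialize<EntitySchema>(json, Options);
        if (schema == null)
        {
            throw new ArgumentException("The schema document is empty.");
        }

        foreach (FieldSchema field in schema.Fields ?? new List<FieldSchema>())
        {
            if (string.IsNullOrWhiteSpace(field.Name) ||
                string.IsNullOrWhiteSpace(field.Type))
            {
                throw new ArgumentException(
                    "Every field needs a 'name' and a 'type'.");
            }
        }
        return schema;
    }
}
```

Passing the Address definition above to `SchemaLoader.Load` would return an `EntitySchema` with five fields. A field without a “name” or “type” is rejected, while a missing “displayName” is accepted.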

Validation of fields

The traditional entity model is subject to compiler checks (in compiled languages). Because the generic entity is defined by data and not by code, this option is not available. Therefore, the validation of generic entity fields and their types needs to be done at runtime.
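A runtime check of this kind might look like the sketch below. `FieldValidator` is a hypothetical helper, and the mapping of the “integer” and “string” schema types to .NET types is my assumption. Treating null as "no value" and accepting it for any type is also an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class FieldValidator
{
    // Assumed mapping from language-independent schema types to runtime checks.
    private static readonly Dictionary<string, Func<object, bool>> TypeChecks =
        new Dictionary<string, Func<object, bool>>
        {
            ["integer"] = value => value is int || value is long,
            ["string"] = value => value is string,
        };

    // fieldTypes: field name -> schema type name (taken from the schema).
    // values:     field name -> value held by the entity at runtime.
    // Returns one message per problem; an empty list means the values are valid.
    public static List<string> Validate(
        IReadOnlyDictionary<string, string> fieldTypes,
        IReadOnlyDictionary<string, object> values)
    {
        var errors = new List<string>();
        foreach (KeyValuePair<string, object> pair in values)
        {
            if (!fieldTypes.TryGetValue(pair.Key, out var typeName))
            {
                errors.Add($"Field '{pair.Key}' is not defined in the schema.");
            }
            else if (!TypeChecks.TryGetValue(typeName, out var check))
            {
                errors.Add($"Field '{pair.Key}' has unsupported type '{typeName}'.");
            }
            else if (pair.Value != null && !check(pair.Value))
            {
                errors.Add($"Field '{pair.Key}' expects a value of type '{typeName}'.");
            }
        }
        return errors;
    }
}
```

With the schema types `id: integer` and `city: string`, the values `id = "abc"` and `zip = "11000"` each produce an error, while `city = "Prague"` passes. These are the errors a compiler would have caught for a traditional entity.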

Unlocking new capabilities

Now the concrete data model is no longer stuck in the code. It is defined by data. You can start building cross-product features around the generic data model. Sharing schemas across organizations lets one organization easily understand the data provided by another. Organizations can also collaborate on the same schema. You don’t need to create a new version of the software to support a new concrete entity or to modify an existing one. It is just a matter of changing the data (the schema), which can be done even at runtime. It can also be done directly by the user (e.g., in the UI). You only need development if you want to extend the schema with new field types.

Future schema evolution

The schema can evolve to support new capabilities of the generic entity. Good examples would be inheritance or versioning. It could also support the import and export of existing industry-standard schemas, such as Microsoft EDMX, CDM, or JSON Schema, to leverage existing data models and tools.

Programming model

The programming model for the generic entity will be introduced soon, in my next article.
