Managing Complex Data Models


Complex data models are now commonplace. A single data stream might pass through multiple hubs and technologies: the front end, APIs, Kafka pub/sub systems, Lambda functions, ETL pipelines, data lakes, and data warehouses, among others. A schema travels with that stream of data, and each schema has its own language and syntax, its own data types, and its own life cycle. As a result, data modellers' jobs have become far more challenging over time.

Another element that makes data modelling more challenging is that data modellers must also be familiar with the target technology. Machine learning, natural language processing, artificial intelligence, and blockchain are only a few examples of how data is now consumed. The world of data has grown far more complicated than it used to be, and data modelling is made harder still by the many options available across NoSQL databases and communication methods.

Physical Data Models Become Difficult

Physical data models depict a data design as it has been implemented in a database management system, or as it will be deployed in the future. A physical model is database-specific: it depicts relational data items (columns, tables, primary and foreign keys) and the relationships between them. From a physical data model, tools can also generate DDL statements, which are then run against the database server.
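To make the idea concrete, here is a minimal, purely illustrative sketch of generating a DDL statement from a simple physical-model description. The function and column names are hypothetical, not taken from any particular modelling tool.

```python
# Illustrative sketch (not any vendor's implementation): rendering a
# CREATE TABLE DDL statement from a column -> SQL-type mapping.

def render_ddl(table, columns, primary_key):
    """Build a CREATE TABLE statement from a dict of column -> SQL type."""
    lines = [f"    {name} {sql_type}" for name, sql_type in columns.items()]
    lines.append(f"    PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"

# Hypothetical model for a 'users' table.
users = {"id": "INTEGER", "email": "VARCHAR(255)", "created_at": "TIMESTAMP"}
print(render_ddl("users", users, "id"))
```

Real modelling tools must also handle vendor-specific type systems, constraints, and indexes, which is where the complexity described above comes from.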

Implementing a physical data model requires a thorough grasp of the database system's characteristics and performance factors. When working with a relational database, for example, it's important to understand how the columns, tables, and relationships between them are organised. Understanding the intricacies of the DBMS is critical to implementing the model, regardless of the type of database (columnar, multidimensional, or otherwise). Founder and CEO of Hackolade, Pascal Desmarets, says:

“Physical Data Modeling has traditionally been centred on the design of single relational databases, with DDL statements as the expected artefact. Those statements were usually fairly generic, with only slight differences in functionality and SQL dialect between vendors. However, at the scale required by today's huge corporations, these models become complex.”

Enterprises are increasingly adopting modern IT architectures based on APIs and microservices, which include complex communication protocols such as message queuing and remote procedure calls. Desmarets explained, “They practise polyglot persistence, using specialist databases from a range of NoSQL providers.” Each of them has a distinct storage model. Machine learning, natural language processing, artificial intelligence, blockchain, and other technologies are being used to consume data in new ways. As a result, the environment is becoming far more complex than in the past.

“They used to merely generate DDLs, which was rather straightforward in terms of target technology. Now data modellers must understand and integrate the characteristics of each technology so that the physical data model can fully exploit the benefits of each.”

Data Models with Multiple Languages

A “polyglot” is someone who is fluent in multiple languages. The term “polyglot data model” refers to the use of multiple database technologies to handle different types of data. With a polyglot data model, data services can use and interact with different database systems, giving them multiple ways to store and retrieve data.

Many businesses, however, are still operating with outdated logical models that fall short of this objective. New data models are needed that can describe both at-rest and in-motion data. Modern data has complex, nested data types and can be polymorphic, so transforming a classic logical model into each of the many physical schemas used by different technologies takes far more effort.

Desmarets had this to say about polyglot data models:

“We see that businesses have amassed conceptual and logical models to represent their operations and information systems. They’ve made a significant investment. Clearly, information architecture departments want to get the most out of their investment, even as technology evolves and becomes more sophisticated.”

According to his clients’ feedback:

“We believe that the definition of a logical model should be broadened. A logical model should not merely be the least common denominator of data definitions with the danger of making sacrifices to match the most limiting technology while remaining technology agnostic.”

Scale and Complexity

The more technologies that are deployed, the more complicated the organisation and its physical data models become. Different departments within a company can be thought of as links in a chain, with some using different technologies than others. As a result, not all of the chain's links can be altered at the same time or with a single command. Desmarets made the following observation:

“Another problem is the ability to deal with scale and complexity. Companies can have hundreds, thousands, or even tens of thousands of APIs and microservices, managed using a variety of technologies. Their schemas are all over the place, each with its own life cycle.”

The scale at which data modelling must operate is driven by the number of microservices and APIs. Scale was not an issue a few decades ago, when monolithic programmes with a three-tier architecture were the norm. Modern systems, by contrast, rely on a wide range of services, and data modelling practice has to scale up to match them.

Data Modeling in the Future

As the demand for understanding how a system works, and for manipulating it, grows, Data Modeling will become increasingly critical. In 2020, Data Modeling will place a greater emphasis on metadata (data tags used to find data), due in part to its importance in data discovery. When metadata is included in a data model, the model becomes easier to visualise, and metadata's value in data management becomes clear.
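As a small, purely illustrative example of the metadata idea, a model can carry tags alongside each field so that data is easier to find and govern. The entity, field, and tag names below are all hypothetical.

```python
# Illustrative only: a model whose fields carry metadata tags, making
# the data discoverable by tag (e.g. finding all PII fields).

MODEL = {
    "entity": "order",
    "fields": {
        "order_id": {"type": "integer", "tags": ["identifier"]},
        "email":    {"type": "string",  "tags": ["pii", "contact"]},
        "total":    {"type": "decimal", "tags": ["financial"]},
    },
}

def find_by_tag(model, tag):
    """Return the names of all fields carrying a given metadata tag."""
    return [name for name, spec in model["fields"].items()
            if tag in spec["tags"]]

print(find_by_tag(MODEL, "pii"))  # -> ['email']
```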

Desmarets, when asked about the future of Data Modeling, said:

“Our road map is divided into two halves. One is for providing functionality that every data modeller expects and requires from an application in order to execute data modelling and schema design, including NoSQL. Simultaneously, we’re expanding our support for target technologies to meet our clients’ rising demands: additional NoSQL databases, JSON in relational databases, big data analytics platforms, storage formats, cloud databases, communication protocols, and so on.”

Hackolade is currently working on a polyglot data model that will let modellers define a structure once and generate schemas for a variety of technologies simply and efficiently. Because customers are confronting this new challenge, Hackolade has condensed it into a list of planned additions.

“We’re working on another project,” he explained. “Perhaps more tactical is a capability to infer the schema of JSON stored in blobs in relational databases, leading to a more complete data model of semi-structured data.”
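The idea of inferring a schema from JSON blobs can be sketched in a few lines. This is a hedged simplification, not Hackolade's method: real inference must handle nesting, optionality, and cardinality, while this sketch only merges top-level field types across sample documents.

```python
# Hedged sketch: inferring a rough schema from JSON documents stored as
# text blobs (e.g. in a relational column). Merges top-level field types
# seen across all samples; the sample documents are invented.
import json

def infer_schema(blobs):
    """Map each field name to the sorted set of Python type names seen."""
    schema = {}
    for blob in blobs:
        for key, value in json.loads(blob).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in schema.items()}

samples = ['{"id": 1, "tags": ["a"]}', '{"id": 2, "name": "x"}']
print(infer_schema(samples))  # -> {'id': ['int'], 'tags': ['list'], 'name': ['str']}
```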

He clarified that the goal is not to compete with existing data modelling tools, but to complement them and provide value to customers. Hackolade focuses on helping enterprises solve new challenges while leveraging their existing investments in traditional solutions.


