Introduction

Databases are an essential building block in today’s digital systems. The data management features they provide are so fundamental that almost no application could function without them. Today, database management systems (DBMS) are complex and varied, providing facilities far beyond the basic requirement to store and retrieve data. But databases are often the unsung hero of the project – an unseen foundation layer working quiet miracles beneath the application.

In the era of online systems with millions of users, the demands on the DBMS are greater than ever: it must manage high volumes of transactions with low latency, always be available, and allow for complex queries to support analytics.

The cutting-edge use cases of database technology include applications such as online advertising and payment processing, where low latency and high availability are essential. However, the public sector also manages very large datasets, especially in areas like healthcare and taxation, and will make more use of this data to improve public services. Inevitably, public sector organisations will need to follow the lead of the private sector in implementing very large and complex database systems.

Background

As computing power has evolved, so have databases. The first hard disk drive, built in the 1950s by IBM, was the size of a fridge and could store around 3.75MB of data. By the 1980s, drives had shrunk enough to be used in desktop computers, and had capacities in the tens of megabytes.

The fast-declining cost of storage, coupled with the rapid increase in processing power, made DBMS accessible to organisations that had previously been priced out of the IBM-dominated mainframe market. This created a requirement for high-quality database software, and during the 1980s, a competitive DBMS market emerged. The clear market leader was Oracle, which remains a leading player in the database market; other leading DBMSs of that era included Ingres, Informix and Sybase, all of which were eventually acquired by larger technology companies.

These systems were all relational databases: the relational model is based on set theory, a branch of mathematics, and stores data as a set of tables with columns (fields) and rows (records). The relational model is flexible, and has a widely supported query language (SQL) for accessing data. However, it is poorly suited to rich data that doesn’t naturally fit into a table structure: handling such data can make high demands on computing power, and may not scale well.
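To make the idea of tables, rows and SQL concrete, here is a minimal sketch using SQLite, the small relational database bundled with Python. The table names, column names and sample data are invented for illustration; they do not come from any system mentioned in this article.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), "
        "total REAL)"
    )
    # Each tuple is one row (record); each position is one column (field).
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Leeds"), (2, "Ben", "Cardiff")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 20.0), (2, 1, 35.5), (3, 2, 12.0)],
    )
    return conn


def spend_per_customer(conn):
    """Join the two tables and total each customer's orders using SQL."""
    return conn.execute(
        "SELECT c.name, SUM(o.total) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id, c.name "
        "ORDER BY c.name"
    ).fetchall()


if __name__ == "__main__":
    for name, total in spend_per_customer(build_demo_db()):
        print(name, total)
```

The join is where the relational model earns its keep: data is stored once, in the table where it belongs, and combined at query time. It is also where the cost shows up, since joins across very large tables are exactly the kind of work that can strain computing power.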

The current database ecosystem

By the 1990s, relational databases ruled the IT world, and coped with all but the most demanding applications. “What’s striking”, says David Walker, a consultant specialising in database technologies, “is that there was little significant progress in database technologies for a couple of decades. Relational databases got faster, more scalable and more reliable, but were still limited in many ways. But now there are multiple database models. Organisations will no longer select a single database supplier – they instead create a data layer based on four or five different systems. You may have a database for transactional work, one for analytical, another for search, and so on.”

Oracle still rules the roost in terms of market share, but has “become an ecosystem”, says David. “Other databases have moved back to just being databases. People expect them to be really efficient. Scale is the thing now, whether it’s about the sheer volume of data, transactions per second or the ability to distribute data geographically.”

New database models have emerged to cope with new demands. Of key interest are graph and vector databases.

Graph databases

Graphs are mathematical structures in which data items (nodes) are connected to each other via edges. Graph databases are optimised for applications that need to understand the relatedness of data items – for example, social networks (which store the relationships between people) or the recommendation engines used in retail and video streaming applications, which store the relationships between products. Graph databases are searched via graph query languages, which allow queries to quickly find people or products based on their connectedness to other people or products.
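As a rough illustration of a connectedness query, the sketch below holds a tiny social network in memory and finds "friends of friends" with a breadth-first search. It is plain Python, not a graph database or a graph query language, and the names and function names are made up for the example.

```python
from collections import deque

# Hypothetical social network: each person (node) maps to their friends (edges).
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Return {node: hops} for every node reachable within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not explore beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]
    return distances


def suggest_friends(graph, person):
    """Suggest people exactly two hops away: friends of friends, not yet friends."""
    reachable = within_hops(graph, person, 2)
    return sorted(name for name, hops in reachable.items() if hops == 2)


if __name__ == "__main__":
    print(suggest_friends(FRIENDS, "alice"))
```

A graph database performs this kind of traversal natively and at scale, so a query such as "who is two steps away from this person?" does not need to be assembled from repeated table joins.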

Vector databases

In vector databases, data objects are searched and compared via lists of numbers (known as vectors) that capture their characteristics. Vectors are far more than simple data attributes – they are typically generated by an algorithm, such as a machine learning model, which means that vector databases can be used to find complex, non-exact matches between data items. This makes vector databases ideal for finding data based on how similar it is (in some way) to other data; they have become particularly popular for use in generative AI applications.
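The sketch below shows the basic idea of similarity search, assuming each item is already represented as a short list of numbers. The three-number vectors and item names are invented for illustration; in practice vectors usually have hundreds or thousands of dimensions and are produced by a model. Cosine similarity is one common way to compare vectors, chosen here as an assumption rather than as the method of any product named in this article.

```python
import math


def cosine_similarity(a, b):
    """Score how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical items with made-up three-dimensional vectors.
ITEMS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def nearest(items, query, k=2):
    """Return the names of the k items most similar to the query vector."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(items[name], query),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    print(nearest(ITEMS, [0.9, 0.7, 0.1]))
```

This version compares the query against every stored item, which is fine for three items but not for billions. Vector databases typically rely on specialised indexes that find approximate nearest neighbours much faster.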

David Walker says that understanding today’s database market can be challenging. “There are now three hundred-plus databases. Analytical databases have progressed a long way – Teradata has been in that space for a long time, but they’re now joined by products like Cassandra, MongoDB, Elasticsearch and Neo4j. For very high transaction volumes, there are products like Aerospike and Yugabyte, which do well in telcos and banks. When it comes to AI, the challenge is to handle enormous datasets.”

Database example – Aerospike

In practice, database companies often provide solutions in multiple areas. Aerospike’s core product is an ultra-fast transactional, real-time database that the company describes as being massively scalable with predictable sub-millisecond latency.

“We’re a mature startup”, says Aerospike’s Martin James, “with a multi-model database platform that includes key-value, document, graph, and vector. Whether the solution calls for gigabytes, terabytes or petabytes, we can confidently deliver it. Our architecture allows for replication across clusters, and can be highly distributed across geographies. It’s extremely efficient: we’ve enabled some customers to cut their hardware usage by over 80%.”

The product was originally developed for the online advertising industry, which requires very high transaction throughput. “One of our ad customers, Criteo, migrated to our platform”, Martin says, “allowing them to reduce their hardware footprint from 3,200 to 800 servers. On this infrastructure, 120,000 ads can be served each second. At peak, this needs 290 million simple transactions per second.” [At this point in the interview, your interviewer nearly fainted]. As Criteo Engineering point out in a blog post, such efficiency is not only good for cutting costs and complexity, but is also instrumental in advancing sustainability.
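For a sense of scale, the quoted figures can be turned into some back-of-envelope numbers. This is only a sketch: it assumes the load is spread evenly across all 800 servers and that the peak transaction rate coincides with the quoted ad rate, neither of which is stated in the interview.

```python
# Figures quoted above; the even-spread assumption is ours, not Criteo's.
SERVERS_BEFORE = 3_200
SERVERS_AFTER = 800
PEAK_TRANSACTIONS_PER_SECOND = 290_000_000
ADS_PER_SECOND = 120_000


def reduction_percent(before, after):
    """Percentage reduction when going from `before` units to `after` units."""
    return 100 * (before - after) / before


def per_unit(total, units):
    """Average share of `total` carried by each of `units`."""
    return total / units


if __name__ == "__main__":
    print(f"Server reduction: {reduction_percent(SERVERS_BEFORE, SERVERS_AFTER):.0f}%")
    print(
        "Peak transactions per server per second: "
        f"{per_unit(PEAK_TRANSACTIONS_PER_SECOND, SERVERS_AFTER):,.0f}"
    )
    print(
        "Transactions per ad served: "
        f"{per_unit(PEAK_TRANSACTIONS_PER_SECOND, ADS_PER_SECOND):,.0f}"
    )
```

On those assumptions, the migration cut the server count by three quarters, each remaining server handles several hundred thousand simple transactions per second at peak, and every ad served involves a few thousand transactions.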

Aerospike recently added graph and vector databases to its product set, layered on top of its scalable architecture. The concept for the Aerospike graph database was based on an anti-fraud project by PayPal, which built graph capabilities on the core Aerospike platform. “We looked at what PayPal had done and launched our graph database in 2023,” explains Martin. “Neo4j is the dominant player in that sector; we bring unlimited scalability and low latency to the graph database market.”

Databases in the public sector

In terms of technology, the public sector tends to lag well behind much of the private sector, and that rule applies equally to database technology. In theory, large organisations like the NHS and HMRC, which serve tens of millions of people, should be able to leverage the new generation of databases to enhance their offerings to UK citizens. With the addition of AI capabilities into public sector systems, the demand for high-performance solutions will grow even faster. But sometimes, being late to the party can be beneficial: David Walker suggests that “the public sector should skip a generation of technology. If they follow the same cycle as the private sector, they’ll never catch up. One key issue (which Ofcom has raised) is the lock-in risk associated with cloud infrastructures. What happens if Amazon goes down? Regulators in fintech have insisted on multi-cloud strategies – the public sector should take note and follow this advice.”

Several Bramble Hub partners specialise in database implementation in the public sector. Our partner Servita, for example, designed, implemented and currently runs the National Patient Care Aggregator for NHS England. This system gives patients access to their referrals and hospital appointments via the NHS App, which is currently available to 38 million UK citizens. The solution architected by Servita, which uses advanced containerised technology and Amazon’s DocumentDB, aggregates numerous data sources and manages over 11 million requests per month, equating to many more database transactions.

While the public sector is not (yet) in the same league as the online advertising industry, its use of advanced database technology seems set to grow rapidly in the coming years.

With thanks to: David Walker, Aerospike and Servita