Wednesday, 21 August 2024 23:42

Want GenAI? Then you need a document database for the best result, says MongoDB exec

By David M Williams

Generative AI is bringing vast business benefits from summarising documents, to helping with customer service, even aiding organisations in asking questions of complex systems in simple plain language. However, you might not be using the right tools for the job. MongoDB field CTO Rick Houlihan experimented and found vast performance gains when using a document database over a relational one.

In the world of databases, MongoDB is a leader in the NoSQL movement. Relational databases trace their roots to mathematical set theory, and the rules of relational algebra were formalised by IBM researcher Edgar F. Codd and others from 1970 onwards. Relational databases power sales, payrolls, inventory, flight schedules, and all kinds of enterprise purposes around the world.

Yet, in a modern world where text - and specifically, natural language - is becoming a major force, the relational database may simply not be the right choice.

"Third normal form is a great mathematical and logical representation of data," said MongoDB field CTO Rick Houlihan, referring to the relational model of using tables with linked fields to ensure a single set of master data without redundancy. "But it has a high time complexity to map data together."

"MongoDB makes it easier to work with data. Our core database, the document database, was built to remove abstractions from data. A document database brings data out in a better way," he said. "We don't have to work with data in third normal form in apps."

In fact, getting their heads around relational database mechanisms can be a complex challenge for developers. It's why, Houlihan notes, there's a whole slew of object-relational mapping (ORM) tools. One such example is the popular Entity Framework for .NET; developers use such tools to abstract away the relational model. Houlihan says you simply don't have that fuss in a document database; it just works differently.

"We say just store the data how you use it. Make document structures that map to your access patterns," he said. "It's more efficient."
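To make the idea concrete, here is a minimal Python sketch contrasting the two layouts. It uses plain dictionaries as a stand-in for both kinds of database, and the order data, field names, and function names are hypothetical illustrations, not anything from Houlihan's work. The normalised version has to follow keys across three structures to rebuild an order, while the document version stores the order in the shape the application reads it.

```python
# Normalised (third-normal-form style) layout: three "tables" linked by keys.
# All names and values here are hypothetical, for illustration only.
customers = {1: {"name": "Ada"}}
orders = {100: {"customer_id": 1, "placed": "2024-08-21"}}
order_lines = [
    {"order_id": 100, "sku": "A-1", "qty": 2},
    {"order_id": 100, "sku": "B-7", "qty": 1},
]


def load_order_normalised(order_id):
    """Reassemble one order by following keys across the three tables."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    lines = [
        {"sku": line["sku"], "qty": line["qty"]}
        for line in order_lines
        if line["order_id"] == order_id
    ]
    return {
        "_id": order_id,
        "customer": customer["name"],
        "placed": order["placed"],
        "lines": lines,
    }


# Document layout: the order is stored the way the application uses it.
order_documents = {
    100: {
        "_id": 100,
        "customer": "Ada",
        "placed": "2024-08-21",
        "lines": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
    }
}


def load_order_document(order_id):
    """Fetch the whole order with a single lookup; no reassembly needed."""
    return order_documents[order_id]
```

Both functions return the same order; the difference is how much mapping work happens on every read.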

And, when it comes to generative AI - which institutions all around the world are working hard to find and pilot use cases for - the choice of database can make a huge difference.

Houlihan is more than willing to put his money where his mouth is. "I've always been a big fan of Grace Hopper, who said 'one accurate measurement is worth a thousand expert opinions'," he said. Thus, earlier this year he tested for himself how well different databases could support generative AI, with truly eye-opening results.

Using identical hardware, with Postgres and MongoDB each set up under clearly stated configurations and parameters, Houlihan loaded single attributes and multiple attributes of increasing size into the databases. This replicates the type of data generative AI deals with; it's not about simple numeric order IDs or product SKUs or surnames and first names. Rather, GenAI is all about huge chunks of text - contracts, manuals, documentation. Even if the text is chunked, it's still in blocks of 4KB or more. It's a scenario that a document database excels at, and a relational database does not.

Houlihan's testing showed that for small payloads MongoDB and Postgres performed relatively evenly, and diverged as the payload size ramped up. Whether using Postgres JSON (JSON being a widely used data interchange format popular across many applications and technology stacks) or JSONB, the time Postgres took to insert data began increasing significantly after a mere 200 bytes. Meanwhile, MongoDB retained a reasonably linear insert time irrespective of the size of the data.

For example, to insert 200 attributes at 4000 bytes, Postgres took 37.2s using JSONB and 17.5s using JSON, while MongoDB did the same work in fractionally over one second.

The read workload running against the same data took 53.8s and 27.8s in Postgres for JSONB and JSON respectively, against 8.4s in MongoDB.
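As a quick worked check of the read figures quoted above, the short Python sketch below expresses each Postgres timing as a multiple of the MongoDB timing. The dictionary keys and the function name are hypothetical labels; only the three timings come from the article. The insert figures are left out because the MongoDB insert time is given only approximately.

```python
# Read-workload timings in seconds, as reported in the article.
read_seconds = {
    "postgres_jsonb": 53.8,
    "postgres_json": 27.8,
    "mongodb": 8.4,
}


def slowdown_vs(baseline, timings):
    """Return each other system's time as a multiple of the baseline's time."""
    base = timings[baseline]
    return {
        name: round(seconds / base, 1)
        for name, seconds in timings.items()
        if name != baseline
    }


ratios = slowdown_vs("mongodb", read_seconds)
print(ratios)
```

On these numbers the Postgres reads took roughly 6.4 times (JSONB) and 3.3 times (JSON) as long as the MongoDB read.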

Data types like JSON or VARIANT can help to shoe-horn large objects into relational databases, but the takeaway from Houlihan's experiments is clear. Relational databases suffer from performance limitations of wide rows and large data attributes, while a document database such as MongoDB takes them in its stride.

Relational databases are valuable, and important, of course, but Houlihan's message is you need to use the right tool for the right task. And, when it comes to generative AI, the right tool is a document database like MongoDB.

Houlihan published his work in a GitHub repo for the world to see and for anyone to validate for themselves.

It's not simply an experiment; when it comes to MongoDB and GenAI, "there are companies doing real things, with real impact," Houlihan said. He cites Pathfinder as an example, an organisation which uncovers cybercrime evidence, collates it, and then uses AI to find similarities and identify perpetrators of such evils as human trafficking and exploitation.

Another is Novo Nordisk, which is leveraging MongoDB and generative AI to improve health care and advance medical treatment for common diseases like diabetes and cancer. Reducing the time required to compile clinical research reports for regulatory approval of new pharmaceuticals from 12 weeks to just 10 minutes has empowered the business to do more with less.

Meanwhile, MongoDB has a big Australian connection, Houlihan explained. While MongoDB is available as a free product, it also comes as a managed service called Atlas. A new feature, Atlas Charts, uses natural language to easily visualise data and help end users self-serve to get meaningful, actionable insights into their data without waiting for specialised BI developers to become available.

Atlas Charts is the work of MongoDB's Sydney-based engineering team. This team has also been a big part of the company's Relational Migrator, a service that uses generative AI to help organisations migrate relational databases to MongoDB, among many other projects.

Houlihan previously spent time as the first technical product manager for AWS DocumentDB, building a NoSQL centre of excellence there. Now he is at MongoDB and taking his big ideas further.

What attracted him to MongoDB was how the product "wraps functionality behind a unified API where devs don't have to learn five or six different tech stacks. We don't reinvent the wheel; we invest in the core service and then add the best-of-breed from the industry into the product." An example is Lucene; it's the most popular full-text search engine, and the same one that backs Elastic, among other popular products. "So, we built Lucene into our own API to reduce developer overhead in working with the data."

"On top of that," he said, "our founders had geographical data distribution in mind from the beginning. The relational database is not built with geographic distribution in mind. Layers can be added on top like log shipping for Postgres, or GoldenGate for Oracle, but it's often left for the developer to solve."

By contrast, "it's a first-class citizen in MongoDB. We combine the flexibility of the document model with the ability to determine on each individual write how the data should be replicated and what level of consistency is required. It's a really novel way of working with data and how to store and access it."
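The per-write control described here corresponds to what MongoDB calls a write concern. As a rough illustration only, the Python sketch below builds the shape of an `insert` command document with a `writeConcern` attached, without sending anything to a server. The helper name, the collection names, and the sample values are hypothetical; a real application would normally set this through its driver instead of assembling commands by hand.

```python
def build_insert_command(collection, documents, w="majority", journal=True,
                         wtimeout_ms=5000):
    """Build an insert command document with a per-write write concern.

    w           -- how many replica set members must acknowledge the write
                   (a number, or "majority")
    journal     -- whether the write must reach the on-disk journal first
    wtimeout_ms -- how long to wait for acknowledgement, in milliseconds
    """
    return {
        "insert": collection,
        "documents": list(documents),
        "writeConcern": {"w": w, "j": journal, "wtimeout": wtimeout_ms},
    }


# A hypothetical payment: wait for a majority of replicas and the journal.
payment = build_insert_command("payments", [{"amount": 125}])

# A hypothetical low-value event: acknowledge from one member only.
click = build_insert_command(
    "clickstream", [{"page": "/home"}], w=1, journal=False
)
```

The two writes target the same kind of database but ask for different durability guarantees, which is the trade-off Houlihan describes being chosen write by write.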

"We work with the largest financial institutions in the world. These companies run extremely high-velocity trading and payment processing applications; the tech we provide for those kinds of workloads drives an enormous amount of efficiency," Houlihan said.

David M Williams

David has been computing since 1984, when he instantly gravitated to the family Commodore 64. He completed a Bachelor of Computer Science degree from 1990 to 1992, commencing full-time employment as a systems analyst at the end of that year. David subsequently worked as a UNIX Systems Manager, Asia-Pacific technical specialist for an international software company, Business Analyst, IT Manager, and other roles. David has been the Chief Information Officer for national public companies since 2007, delivering IT knowledge and business acumen, seeking to transform the industries within which he works. David is also involved in the user group community, the Australian Computer Society technical advisory boards, and education.
