Oracle Raises MySQL HeatWave’s Thermostat for In-Database Machine Learning

To further strengthen our commitment to providing industry-leading data technology coverage, VentureBeat is pleased to welcome Andrew Brust and Tony Baer as regular contributors. Look for their articles in the Data Pipeline.

Oracle has owned MySQL since its acquisition of Sun Microsystems more than a decade ago. Under Oracle’s stewardship, MySQL has remained a distinct product, but unless you were MariaDB, few people paid much attention to that stewardship until a few years ago. And with each of the major cloud providers rolling out its own managed MySQL database service, Oracle offered relatively few reasons for customers to gravitate to MySQL under Oracle’s support.

Well, no more. Fifteen months ago, Oracle introduced MySQL HeatWave, its own optimized implementation of MySQL on Oracle Cloud Infrastructure (OCI, Oracle’s public cloud platform); those optimizations are transparent to the application. And now Oracle is rolling out the 3.0 release of HeatWave, increasing node data density, reducing costs for some workloads, and introducing machine learning in the database, which can take advantage of the higher-density data nodes.

HeatWave is not just vanilla open source MySQL; it differentiates itself with Oracle-developed extensions (described below). That’s not particularly unusual in open source: Amazon Aurora and Azure PostgreSQL Hyperscale, not to mention the countless other PostgreSQL variants on the market, demonstrate that open source databases provide a clean slate for differentiation.

In making the move to become a serious competitor in the MySQL space, Oracle took the database in a unique direction with HeatWave: It optimized for analytics alongside transaction processing by leveraging MySQL’s support for pluggable storage engines. In this case, it plugged in an in-memory columnar storage engine that works alongside the row storage, with optimizations tailored to handle analytic queries.
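To make the row-versus-column distinction concrete, here is a minimal Python sketch. This is a toy illustration of the general idea, not HeatWave’s internals: a row layout keeps each record together, which favors point lookups, while a columnar layout keeps each column together, which favors analytic scans that touch only the columns they need.

```python
# Toy illustration (not HeatWave internals): the same table stored
# row-wise and column-wise, and one query style suited to each.

rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 45.5},
]

# Row store: each record lives together -- good for OLTP point lookups.
def lookup(rows, rid):
    return next(r for r in rows if r["id"] == rid)

# Column store: each column lives together -- good for analytic scans,
# since a SUM(amount) reads only the one column it needs.
columns = {k: [r[k] for r in rows] for k in rows[0]}

def total_amount(columns):
    return sum(columns["amount"])

print(lookup(rows, 2)["region"])  # US
print(total_amount(columns))      # 245.5
```

A real columnar engine adds compression, vectorized execution, and many other optimizations on top of this layout, but the access-pattern advantage is the core of it.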

Pairing a columnar storage engine with a row-oriented engine is not uncommon; MariaDB has done it, and in fact Oracle went down a similar path a few years ago, with different technology, for its flagship database. But to this day, Oracle is the only vendor to have developed an analytics-optimized engine for MySQL.

In the latest release, Oracle has introduced new improvements to reduce compute costs and bring machine learning into the database.

Let’s start with operating costs. HeatWave 3.0 doubles the data density of each compute node without changing the price of compute, so you can now use (and pay for) half the number of nodes to run the same workload. And by the way, Oracle set the stage for this in the previous HeatWave 2.0 release, which doubled the maximum cap for HeatWave clusters to 64 nodes.
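The arithmetic behind that claim can be sketched in a few lines. The capacities below are illustrative round numbers, not Oracle’s published node shapes:

```python
import math

def nodes_needed(data_gb, node_capacity_gb, max_nodes=64):
    """Back-of-the-envelope node count for a given dataset size.
    Capacities are illustrative, not actual HeatWave shapes."""
    n = math.ceil(data_gb / node_capacity_gb)
    if n > max_nodes:
        raise ValueError("workload exceeds the cluster cap")
    return n

# Doubling per-node data density at the same per-node price
# halves the bill for the same dataset:
print(nodes_needed(3200, 400))  # 8 nodes before
print(nodes_needed(3200, 800))  # 4 nodes after density doubles
```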

Combined, cost efficiency and scale should come in handy now that machine learning models can be run in the database. Hold that thought.

In addition to data density, HeatWave 3.0 makes it more economical to scale, as you can add any number of nodes (up to the maximum of 64) in each step. This is in line with what Oracle has introduced for its Autonomous Database cloud service, doing away with standard “T-shirt sizes.” So elasticity with HeatWave doesn’t mean doubling the number of active nodes every time your workload grows. HeatWave also improves availability during resizing, with query processing paused only briefly.

HeatWave 3.0 adds a few tricks to further speed up processing. Like any columnar storage engine, HeatWave makes ample use of data compression, and it applies some common techniques, such as Bloom filters, which cut the memory needed to process queries. Notably, HeatWave has implemented Blocked Bloom filters, which can perform the necessary membership checks with much less overhead, significantly reducing the amount of intermediate memory required.
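The idea behind a blocked Bloom filter is that all of a key’s probe bits land in one small, cache-line-sized block, so a membership check touches a single block instead of scattering reads across the whole bit array. Here is a minimal Python sketch of the general technique; the hashing scheme and sizes are illustrative, and this is not HeatWave’s implementation:

```python
import hashlib

class BlockedBloomFilter:
    """Sketch of a blocked Bloom filter: every key maps to one small
    block (a stand-in for a cache line), and all of its probe bits are
    set within that one block."""

    def __init__(self, num_blocks=1024, block_bits=64, probes=4):
        self.num_blocks = num_blocks
        self.block_bits = block_bits
        self.probes = probes
        self.blocks = [0] * num_blocks  # each int holds one 64-bit block

    def _positions(self, key):
        # Derive one block index plus `probes` bit offsets from a hash.
        h = hashlib.sha256(key.encode()).digest()
        block = int.from_bytes(h[:4], "big") % self.num_blocks
        bits = [h[4 + i] % self.block_bits for i in range(self.probes)]
        return block, bits

    def add(self, key):
        block, bits = self._positions(key)
        for b in bits:
            self.blocks[block] |= 1 << b

    def might_contain(self, key):
        # All probe bits live in one block, so this is a single lookup.
        block, bits = self._positions(key)
        return all(self.blocks[block] & (1 << b) for b in bits)

bf = BlockedBloomFilter()
bf.add("order_42")
print(bf.might_contain("order_42"))   # True
print(bf.might_contain("order_999"))  # almost certainly False; Bloom
                                      # filters allow rare false positives
```

The trade-off is a slightly higher false-positive rate than a classic Bloom filter of the same size, in exchange for far fewer cache misses per lookup.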

These capabilities, in turn, pave the way for Oracle to introduce the ability to process machine learning models in the database, without the need for an external ETL engine or machine learning execution environment. And with that, Oracle is following a trend that also includes AWS (Amazon Redshift ML), Google (BigQuery ML), Microsoft (SQL Server with in-database R and Python functions), Snowflake (with Snowpark), and Teradata (via extended SQL). But comparing these approaches is like apples and oranges, as each provider takes different paths ranging from developing models externally to providing limited, managed choices for running ML, while others extend SQL itself.

HeatWave goes the curated route. It’s an approach suited to business analysts or “citizen data scientists,” democratizing machine learning in the same way that self-service visualization put BI in the hands of the average user. In contrast, the external route targets data scientists at organizations that compete on their ability to develop their own unique, highly sophisticated models.

A bonus of the curated approach is that it doesn’t require any external tools, meaning the selection, configuration, training, and execution of ML models is done entirely within the database. That eliminates the overhead and cost of moving data to tools or ML services running on separate nodes. Oracle also touts the fact that keeping everything in the database reduces potential attack surfaces and thus reduces security exposure.

Here’s how HeatWave’s AutoML approach works. The user chooses the table, the columns, and the algorithm type (e.g., regression or classification), and then specifies where to store the model artifacts. The system automatically determines the best algorithm, the relevant features, and the optimal hyperparameters, and generates a tuned model.

It streamlines key steps. For example, when testing a candidate model, it separates the individual tasks or steps the model performs, evaluating each using proxies or stubs that simulate the algorithm against a representative sample of hyperparameters. It then automatically documents the choice of data, algorithms, and hyperparameters to make the model explainable, as shown in the figure below.
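The selection loop described above can be caricatured in a few lines of Python. This is a hypothetical stand-in for illustration only, not Oracle’s AutoML code: each candidate configuration is scored cheaply on a representative sample, and the best-scoring one wins.

```python
import random

# Hypothetical sketch of AutoML-style selection (not Oracle's code):
# score each candidate on a small sample, keep the winner.

random.seed(7)
# Toy dataset: label is 1 when x > 0.5 (a classification task).
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(200))]
sample = data[:50]  # cheap proxy: evaluate on a sample, not the full set

def make_threshold_classifier(t):
    return lambda x: int(x > t)

# Candidate "hyperparameters": different decision thresholds.
candidates = {f"threshold={t}": make_threshold_classifier(t)
              for t in (0.2, 0.5, 0.8)}

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

best_name, best_model = max(candidates.items(),
                            key=lambda kv: accuracy(kv[1], sample))
print(best_name)  # threshold=0.5
```

A real AutoML pipeline searches far larger spaces of algorithms and hyperparameters, but the pattern is the same: cheap proxy evaluations prune the search before any expensive full training run.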

The advantage of ML processing in the database is a flatter architecture and the elimination of data movement overhead. While the downside of bringing application processing into the database is heavier processing load, several design features mitigate these issues.

Cloud-native architecture, which allows compute to scale as needed, eliminates the problem of contention for limited resources. In addition, most cloud analytics platforms that support in-database ML streamline the process or support limited libraries of models to prevent AI workloads from spiraling out of control, especially training runs, which tend to be the most time-consuming and compute-intensive. Oracle has published ML benchmarks for HeatWave 3.0 that are available on GitHub for customers and prospects to run and verify themselves.

Oracle’s introduction of ML processing in HeatWave complements an ML-related feature from the previous release, version 2.0, from last summer. That release included MySQL Autopilot, which uses machine learning internally to help customers operate the database, offering database provisioning and loading suggestions while providing a closed loop for error handling/remediation and query execution.

With version 3.0, MySQL HeatWave comes full circle, using ML to run the database and supporting the running of ML models in it. This is another example of a prediction I made for this year: that machine learning will be at the heart of both optimizing the operation of the database and giving customers the ability to develop and/or run models in the database.

