How data scientists can create a more inclusive landscape for financial services

For those who understand its real-world applications and its potential, artificial intelligence is one of the most valuable tools we have today. From disease detection and drug discovery to climate change models, AI continuously provides the insights and solutions that help us address the most pressing challenges of our time.

In financial services, one of the key issues we face is inequality in financial inclusion. While many factors contribute to this disparity, the common denominator is arguably data (or the lack of it). Data is the lifeblood of most organizations, especially those looking to implement advanced automation through AI and machine learning. It is therefore up to financial services firms and the data science community to understand how models can be used to create a more inclusive financial services landscape.

Lending a hand

Lending is an essential financial service today. It generates revenue for banks and lenders while providing a core service for individuals and businesses alike. A loan can be a lifeline in difficult times or a boost for a young start-up. In either case, however, the credit risk must be evaluated.

Today, most default risk is assessed with automated tools, and increasingly this automation is provided by algorithms that vastly speed up decision-making. The data informing these models is extensive, but as with any decision-making algorithm, models tend to be most accurate for the majority group in their training data. Depending on the model used, this can put certain individuals and minority groups at a disadvantage.
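The accuracy gap can be made concrete by scoring a model separately for each group rather than only overall. The minimal Python sketch below does this. The group labels, the toy outcomes and predictions, and the helper name `accuracy_by_group` are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately for each group label.

    y_true, y_pred: parallel sequences of 0/1 outcomes (e.g. 1 = repaid).
    groups: a parallel sequence of (hypothetical) group labels used for auditing.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Toy data: the majority group "A" dominates the evaluation sample.
y_true = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "A", "A", "A", "B", "B"]

# A single overall score hides how the model treats the smaller group.
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"overall accuracy: {overall:.2f}")  # overall accuracy: 0.80
print(accuracy_by_group(y_true, y_pred, groups))  # {'A': 1.0, 'B': 0.0}
```

On this toy data the overall accuracy of 0.80 looks respectable, yet every applicant in group B is misclassified. Reporting metrics per group is a simple way to surface that kind of gap.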

This business model is obviously unsustainable, which is why lenders need to consider the more nuanced factors behind making “the right decision.” Demand for loans is soaring, especially as point-of-sale options such as buy-now-pay-later offer new and flexible ways to obtain credit. As a result, the industry is now highly competitive, with traditional lenders, challengers and fintechs all competing for market share. As regulatory and social pressure around fairness and equitable outcomes continues to mount, organizations that prioritize and codify these principles within their business and data science models will become increasingly attractive to customers.

Building for fairness

When a loan risk model rejects an application, many unsuccessful applicants may implicitly understand the logic behind the decision. They may have applied knowing they probably wouldn’t meet the eligibility criteria, or they may simply have miscalculated their eligibility. But what happens when an individual, or a member of a minority group, is rejected because they fall outside the majority group on which the model was trained?

Customers don’t need to be data scientists to recognize when unfairness, algorithmic or otherwise, has occurred. If a small business owner has the means to repay a loan but is rejected for no apparent reason, they will rightly be upset at their treatment and may turn to a competitor for the services they need. And if customers from similar backgrounds are also being rejected unfairly, something is likely wrong with the model. The most common explanation is that bias has somehow crept into it.

Recent history offers several examples:

- Insurers have used machine learning to set premiums that discriminate against older people.
- Online pricing has been used to discriminate between customers.
- Product personalization has steered minorities toward higher rates.

The cost of these egregious mistakes is serious reputational damage and an irreparable loss of customer trust.

This is where the data science and financial services communities must now refocus their priorities. Equitable outcomes for all should come before high-performing models that work only for the majority. We must prioritize people alongside model performance.

Eliminating bias in models

Although regulations rightly prohibit the use of sensitive information in decision-making algorithms, unfairness can still creep in through biased data. To illustrate how, here are five ways data bias can occur:

- Missing data: the data set used is missing certain fields for certain groups in the population.
- Sample bias: the data sets sampled to train models do not accurately reflect the population they are intended to model. The models are therefore largely blind to certain minority groups and individuals.
- Exclusion bias: data is deleted or left out because it is considered unimportant. This is why robust data validation and diverse data science teams are essential.
- Measurement bias: the data collected for training does not accurately represent the target population, or erroneous measurements distort the data.
- Label bias: a common pitfall at the data labeling stage of a project, label bias occurs when similar types of data are labeled inconsistently. Again, this is largely a validation issue.
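Two of these checks, missing data and sample bias, are straightforward to automate before any model is trained. The sketch below uses only the Python standard library and does two things:

- It reports, for each group, how often a field is missing.
- It compares each group's share of the training sample with a reference population share.

The field name `income`, the group labels, the helper names and the population shares are all hypothetical placeholders. Real reference shares would have to come from an appropriate external source.

```python
from collections import Counter


def missing_rate_by_group(records, field, group_key="group"):
    """Fraction of records in each group whose `field` is missing (None or absent)."""
    totals, missing = Counter(), Counter()
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row.get(field) is None:
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}


def representation_gap(records, population_share, group_key="group"):
    """Sample share minus reference population share, for each group.

    Negative values mean the group is under-represented in the training sample.
    """
    counts = Counter(row[group_key] for row in records)
    n = len(records)
    return {g: counts[g] / n - share for g, share in population_share.items()}


# Hypothetical training records: group "B" is small and has a missing income.
records = [{"group": "A", "income": 50_000 + 1_000 * i} for i in range(8)]
records += [{"group": "B", "income": None}, {"group": "B", "income": 48_000}]

# Hypothetical reference shares for the population the model should serve.
population_share = {"A": 0.6, "B": 0.4}

print(missing_rate_by_group(records, "income"))
print({g: round(gap, 3) for g, gap in representation_gap(records, population_share).items()})
```

Here group B has income missing in half of its records and makes up 20% of the sample against an assumed 40% of the population. Both signals are worth investigating before training. Exclusion, measurement and label bias are harder to detect mechanically and depend more on validation and domain review.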

None of the items on this list can be described as malicious. Still, it’s easy to see how bias can find its way into models if a robust framework that builds in fairness isn’t incorporated from the start of a data science project.

Data scientists and machine learning engineers are used to very specific pipelines that have traditionally favored high performance.

- Data is at the heart of modeling, so we start any data science project with exploratory data analysis, exploring our data sets and identifying relationships.
- Next comes preprocessing, where we wrangle and clean the data.
- We then move on to the intensive process of feature engineering, which helps us create more useful descriptions of the data.
- After that, we experiment with different models, tune parameters and hyperparameters, and validate our models. We repeat this cycle until we reach our desired performance metrics.
- Once this is done, we deploy our solutions to production and maintain them there.

It’s a lot of work, but this traditional pipeline leaves a major problem unaddressed. At no point in this cycle is the model’s fairness assessed, nor is data bias examined in any depth. We need to work with domain experts, including legal and governance teams, to understand what fairness means for the problem at hand. We can then try to reduce bias at the foundation of our modeling, i.e., the data.
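What “fairness” means has to be agreed with domain, legal and governance experts. Once a definition is settled, though, it can be checked as an explicit step in the validation loop. As one illustration, the sketch below compares approval rates across groups, a notion often called demographic parity. It then computes the ratio of the lowest rate to the highest, sometimes called the disparate impact ratio. The decisions, the group labels and the helper names are hypothetical. The 0.8 threshold echoes the “four-fifths rule” heuristic from US employment-selection guidance. Whether that threshold, or this metric at all, suits a given lending product is an assumption to confirm with legal and governance teams.

```python
from collections import defaultdict


def approval_rates(decisions, groups):
    """Fraction of applicants approved (decision == 1) within each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for decision, group in zip(decisions, groups):
        total[group] += 1
        approved[group] += decision
    return {g: approved[g] / total[g] for g in total}


def disparate_impact_ratio(rates):
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical model decisions (1 = approved, 0 = rejected) for two groups.
decisions = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = approval_rates(decisions, groups)
ratio = disparate_impact_ratio(rates)
print(rates)  # {'A': 0.8, 'B': 0.4}
print(f"disparate impact ratio: {ratio:.2f}")  # disparate impact ratio: 0.50

FAIRNESS_THRESHOLD = 0.8  # assumed value; agree on it with legal and governance
if ratio < FAIRNESS_THRESHOLD:
    print("flag for review: approval rates differ substantially across groups")
```

Demographic parity is only one of several fairness definitions. Others, such as equal opportunity, compare error rates rather than approval rates, and different definitions can conflict with one another. That is one reason the choice of metric belongs in the conversation with domain experts rather than being fixed by the data science team alone.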

Simply understanding how bias can find its way into models is a good start when it comes to creating a more inclusive financial services environment. By testing ourselves against the above points and reassessing how we approach data science projects, we can try to create models that work for everyone.

Adam Lieberman is head of artificial intelligence and machine learning at Finastra

DataDecisionMakers

DataDecisionMakers is where experts, including the technical people who do data work, can share data-related insights and innovation.