Data and AI are the keys to digital transformation – how can you ensure their integrity?


If data is the new oil of the digital economy, artificial intelligence (AI) is the steam engine. Companies that capitalize on the power of data and AI hold the key to innovation – just as oil and steam engines fueled transportation and, ultimately, the industrial revolution.

In 2022, data and AI are paving the way for the next chapter of the digital revolution, powering businesses around the world. How can companies ensure that responsibility and ethics are at the heart of these revolutionary technologies?

Defining responsibility in data and AI

One of the biggest contributors to bias in AI is undoubtedly the lack of diversity among the annotators and data labelers who train the models that AI ultimately learns from.

Saiph Savage, a panelist at VentureBeat’s Data Summit and assistant professor and director of the Civic AI Lab at the Khoury College of Computer Sciences at Northeastern University, says responsible AI starts with groundwork that’s inclusive from the start.

“One of the critical things to think about is the ability to get different types of workers to create the data labels for your company,” said Savage at VentureBeat’s Data Summit conference. “Why? Suppose you only recruit workers from New York. It is very likely that those New York workers will have different ways of labeling information than a worker from a rural area, based on their different types of experiences and even the different types of prejudices that workers may have.”

Industry experts understand that many AI models in production today require annotated, labeled data to learn from in order to amplify the intelligence of the AI and, ultimately, the machine’s overall capabilities.

The technologies that support this are also complex, such as natural language processing (NLP), computer vision, and sentiment analysis. Unfortunately, with this complexity, the margin of error in how the AI is trained can be quite large.

Research shows that even well-known NLP language models contain racial, religious, gender and occupational biases. Similarly, researchers have documented evidence of pervasive biases in computer vision algorithms showing that these models automatically learn biases from how groups of people (by ethnicity, gender, weight, etc.) are stereotypically portrayed online. Sentiment analysis models face the same challenges.
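One common way researchers probe for such biases is a counterfactual test: vary only an identity term in otherwise identical sentences and compare the model's scores. A rough sketch of the idea follows; the scoring function is a toy stand-in (not a real sentiment model), and all names in it are hypothetical.

```python
# Counterfactual bias probe sketch: swap only the identity term in a
# template sentence and measure the spread of sentiment scores.

def sentiment_score(text: str) -> float:
    """Toy stand-in for a real sentiment model (hypothetical)."""
    negative = {"lazy", "rude"}
    positive = {"brilliant", "kind"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def counterfactual_gap(template: str, groups: list[str]) -> float:
    """Max score difference across identity-term substitutions."""
    scores = [sentiment_score(template.format(group=g)) for g in groups]
    return max(scores) - min(scores)

gap = counterfactual_gap("The {group} engineer wrote brilliant code.",
                         ["young", "elderly", "immigrant"])
# Only the identity term varies, so an unbiased scorer yields gap == 0.
```

With a real model in place of the toy scorer, a nonzero gap on templates like this is exactly the kind of learned stereotype the research above documents.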

“Responsible AI is a very important topic, but it is only as good as it is doable,” said Olga Megorskaya, Data Summit panelist and CEO of global data labeling platform Toloka AI. “If you’re a business, applying AI responsibly means constantly monitoring the quality of the models you’ve deployed in production at all times and understanding where AI’s decisions come from. [You must] understand the data on which these models have been trained and continuously update the training models to the current context in which the model operates. Second, responsible AI means responsible treatment of people who actually act behind the scenes of training AI models. And here we work closely with many researchers and universities.”

Explainability and transparency

If responsible AI is only as good as it is actionable, then the explainability and transparency behind AI are only as good as the visibility afforded both to the annotators and labelers who work with the data and to the customers of companies using services such as Toloka.

In particular, Toloka, which launched in 2014, positions itself as a crowdsourcing and microtasking platform that sources diverse individuals around the world to quickly label large amounts of data, which is eventually used for machine learning and improving search algorithms.

Over the past eight years, Toloka has expanded; today, the project has more than 200,000 contributors in more than 100 countries annotating and labeling data. The company is also developing tools to help detect biases in datasets, as well as tools that provide quick feedback on issues related to labeling projects that could affect the requesting company’s interfaces, projects, or tools. Toloka also works closely with researchers from labs such as the Civic AI Lab at the Khoury College of Computer Sciences at Northeastern University, where Savage works.

According to Megorskaya, companies in the AI and data labeling market need to work on transparency and explainability in a way that “…[es] both the interests of the employees and the companies to make it a win-win situation where everyone gets the benefit of common development.”

Megorskaya advises companies to attend to the following to ensure transparency and accountability on both internal and external fronts:

- Constantly adjust the data that AI is trained on to reflect current situations and real-life data.
- Measure the quality of your models and use that information to build metrics that track their improvement and performance over time.
- Stay agile.
- Treat transparency as understanding the guidelines that data labelers should follow when performing annotations.
- Make feedback accessible and prioritize addressing it.
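The monitoring advice above can be sketched in a few lines: score each model release against a fixed, human-labeled evaluation set, keep the history of scores, and flag regressions. The function names and the regression threshold below are illustrative assumptions, not anything from Toloka.

```python
# Sketch of release-over-release quality tracking for a deployed model.

def accuracy(predictions: list[str], golden: list[str]) -> float:
    """Fraction of predictions matching the human-labeled golden set."""
    assert len(predictions) == len(golden)
    return sum(p == g for p, g in zip(predictions, golden)) / len(golden)

def check_release(history: list[float], new_score: float,
                  max_drop: float = 0.02) -> bool:
    """Allow a release only if it doesn't regress past max_drop
    below the best score seen so far."""
    if not history:
        return True
    return new_score >= max(history) - max_drop

history = [0.91, 0.93]
new_score = accuracy(["cat", "dog", "dog", "cat"],
                     ["cat", "dog", "cat", "cat"])  # 3 of 4 correct
ok = check_release(history, new_score)  # 0.75 regresses past 0.93 - 0.02
```

In practice the golden set itself must also be refreshed, per the first bullet above, so the metric keeps reflecting the current context the model operates in.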

For example, Toloka’s platform provides insight into available tasks, as well as the guidelines for the labelers doing the work. This creates a direct, rapid feedback loop between the workers performing the labeling and the companies requesting that work. If a labeling rule or guideline needs to be adjusted, that change can be made almost instantly. This lets labeling teams approach the rest of the data labeling process in a more unified, accurate, and up-to-date way, allowing a people-centric approach to addressing biases as they arise.

Bringing ‘humanity’ to the forefront of innovation

Both Megorskaya and Savage agree that when a company outsources its data labeling and annotation, that decision itself can create a crack in the responsible development of the AI it will eventually train. Often, companies that outsource the labeling and training of their AI models have no way to communicate directly with the people who are actually labeling the data.

By focusing on removing bias from the AI production sphere and breaking the cycle of disconnected systems, Toloka says AI and machine learning will become more inclusive and representative of society.

Toloka hopes to pave the way for this change and encourages the engineers who develop models to meet the data labelers in person, so they can see the diversity of the people who will ultimately shape the data and the AI. Engineering without understanding the real people, places, and communities that a company’s technology will ultimately affect creates a gap; closing that gap adds a new layer of responsible development for teams.

“In the modern world, no effective AI model can be trained on data collected by a small group of pre-selected people who spend their lives just doing this task,” Megorskaya said.

Toloka creates datasheets to surface the biases its annotators may have. “If you label data, these sheets show information like what kinds of backgrounds the workers have and what backgrounds might be missing,” Savage says. “This is especially helpful for developers and researchers to see, so they can make decisions to bring in the backgrounds and perspectives that may be missing in the next run and make the models more inclusive.”
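The kind of datasheet Savage describes amounts to summarizing which annotator backgrounds contributed labels and which are absent. The sketch below shows one minimal way to do that; the field names, the "expected" set of backgrounds, and the sample records are all illustrative assumptions, not Toloka's actual schema.

```python
# Sketch of an annotator-background datasheet: count who labeled the data
# and report which expected backgrounds are missing from the pool.
from collections import Counter

EXPECTED_REGIONS = {"urban", "suburban", "rural"}  # assumed taxonomy

annotators = [
    {"id": "a1", "region": "urban"},
    {"id": "a2", "region": "urban"},
    {"id": "a3", "region": "suburban"},
]

def datasheet(annotators: list[dict]) -> dict:
    counts = Counter(a["region"] for a in annotators)
    missing = sorted(EXPECTED_REGIONS - set(counts))
    return {"counts": dict(counts), "missing": missing}

sheet = datasheet(annotators)
# sheet["missing"] lists "rural" here, signaling which perspective to
# recruit for the next labeling run.
```

A real datasheet would cover more dimensions than region, but the output serves exactly the purpose Savage names: showing developers which perspectives to recruit for the next run.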

While it may seem daunting to include a world of myriad ethnicities, backgrounds, experiences, and upbringings in every dataset and model, Savage and Megorskaya emphasize that the most important way for companies, researchers, and developers to keep climbing toward fair and accountable AI is to involve as many of the key stakeholders your technology will affect as possible from the start, since correcting biases becomes much more difficult later on.

“It might be hard to say that AI can ever be absolutely responsible and ethical, but it’s important to approach this goal as closely as possible,” Megorskaya said. “Having the broadest and most inclusive representation possible is critical to give engineers the best tools to build AI as effectively as possible.”

