This is what makes deep learning so powerful


The use of deep learning has grown rapidly over the past decade, thanks to the adoption of cloud-based technology and the use of deep learning systems on big data, according to Emergen Research, which expects deep learning to become a $93 billion market by 2028.

But what exactly is deep learning and how does it work?

Deep learning is a subset of machine learning that uses neural networks to learn from data and make predictions. Deep learning has shown impressive performance on a wide range of tasks, be it text, time series or computer vision. Its success comes mainly from the availability of large amounts of data and computing power. However, there is more to it than that, and this extra ingredient is what lets deep learning outperform classic machine learning algorithms on so many tasks.

Deep Learning: Neural Networks and Functions

A neural network is an interconnected network of neurons, where each neuron is a constrained function approximator. Taken together, neural networks are regarded as universal function approximators. If you remember high school math, a function is a mapping from an input space to an output space. For example, sin(x) maps angular space (-180° to 180°, or 0° to 360°) to the real numbers between -1 and 1.

Let’s see why neural networks are considered universal function approximators. Each neuron learns a constrained function: f(.) = g(W*X), where W is the weight vector to be learned, X is the input vector and g(.) is a nonlinear transformation. W*X can be visualized as a hyperplane (being learned) in high-dimensional space, and g(.) can be any nonlinear, (almost everywhere) differentiable function, such as the sigmoid, tanh or ReLU functions commonly used in the deep learning community. Learning in neural networks is nothing more than finding the optimal weight vector W. As an example, y = mx + c has two weights: m and c. Depending on how the points are distributed in 2D space, we find the values of m and c that satisfy one criterion: the difference between the predicted y and the actual y is minimal over all data points.
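To make the y = mx + c example concrete, here is a minimal Python sketch. It assumes tanh as the default nonlinearity g and fits m and c with the closed-form least-squares formula (the standard way to minimize squared differences for a straight line; neural networks instead use gradient descent). The function names `neuron` and `fit_line` and the data points are hypothetical, chosen for illustration.

```python
import math


def neuron(weights, inputs, g=math.tanh):
    """One neuron: f(X) = g(W*X), where W*X is a dot product (a hyperplane)
    and g is a nonlinearity (tanh by default, an illustrative choice)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return g(z)


def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c.

    Returns the m and c that minimize the sum of squared differences between
    predicted and actual y over all points (closed-form solution).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean
    return m, c


# Hypothetical points scattered around the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
m, c = fit_line(xs, ys)
print(f"m = {m:.2f}, c = {c:.2f}")  # m = 1.97, c = 1.06

# The line itself is a neuron with g = identity and input X = [x, 1]:
# the constant 1 lets c act as a bias weight.
print(neuron([m, c], [5.0, 1.0], g=lambda z: z))
```

Treating the constant input 1 as part of X is a common trick for folding the bias into the weight vector W, so "learning W" covers both m and c.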

The layer effect

Since each neuron is a nonlinear function, we stack several such neurons in a “layer”, where each neuron receives the same set of inputs but learns different weights W. Each layer therefore has a set of learned functions, [f1, f2, …, fn], whose outputs are called hidden-layer values. These values are combined again in the next layer, h(f1, f2, …, fn), and so on. In this way, each layer is a composition of functions from the previous layer (something like h(f(g(x)))). The universal approximation theorem shows that, given enough neurons, such networks can approximate any continuous function to arbitrary accuracy.
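The layering and composition described above can be sketched in a few lines of Python. This is an illustrative sketch, not how frameworks implement it: each layer here adds an explicit bias term (the “c” from y = mx + c), and the weights are hand-picked rather than learned. They make a two-layer network compute XOR, a function no single linear neuron can represent, which shows what composition buys. The names `dense`, `forward` and `xor_net` are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(weights, biases, inputs, g):
    """One layer: neuron j sees the SAME inputs but has its own weights[j] and biases[j]."""
    return [g(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(layers, x):
    """Function composition: the output of each layer is the input of the next."""
    for weights, biases, g in layers:
        x = dense(weights, biases, x, g)
    return x


# Hand-picked (not learned) weights for illustration:
# hidden layer f1 = relu(a + b), f2 = relu(a + b - 1); output h = f1 - 2*f2
xor_net = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[1.0, -2.0]], [0.0], identity),
]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, forward(xor_net, [a, b])[0])
```

In a real network the same `forward` structure is used, but the weight matrices are found by training rather than chosen by hand.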

Deep learning uses neural networks with many hidden layers (usually more than two). Effectively, deep learning is a complex composition of functions, layer after layer, that finds the function mapping input to output. For example, if the input is an image of a lion and the output is the classification that the image belongs to the class of lions, deep learning learns a function that maps image vectors to classes. Likewise, if the input is a sequence of words and the output is whether the sentence has a positive, neutral or negative sentiment, deep learning learns a map from input text to the output classes: neutral, positive or negative.
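The last step of such a mapping, from the network’s final-layer outputs to one of the three sentiment classes, can be sketched as follows. This assumes a softmax output layer, a common choice for classification that the article does not specify, and the scores are made-up numbers standing in for what a trained network might produce for one sentence.

```python
import math

CLASSES = ["negative", "neutral", "positive"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, classes=CLASSES):
    """Pick the class with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], probs


# Hypothetical final-layer scores for one input sentence
label, probs = classify([0.2, 0.1, 2.3])
print(label, [round(p, 3) for p in probs])
```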

Deep learning as interpolation

From a biological perspective, people process images of the world by interpreting them hierarchically, bit by bit, from low-level features such as edges and contours to high-level features such as objects and scenes. Function composition in neural networks is in line with this: each successive layer of composed functions learns more complex features of an image. The most common neural network architecture for images is the convolutional neural network (CNN), which learns these features hierarchically and then uses a fully connected neural network to classify the image features into different classes.
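A low-level feature such as an edge is what the first layer of a CNN typically picks up. The sketch below applies a hand-picked, Prewitt-style vertical-edge kernel to a tiny made-up image with a dark left side and a bright right side. In a real CNN the kernel values are learned rather than chosen, and, as in most CNN libraries, the operation computed is cross-correlation (the kernel is not flipped). `conv2d` is a hypothetical helper name.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation a CNN layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# 5x5 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Vertical-edge detector: responds where brightness changes from left to right
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

for row in conv2d(image, kernel):
    print(row)
```

The output is zero over the flat region and large where the window straddles the dark-to-bright boundary; deeper layers combine such edge maps into higher-level features.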

Using high school math again: given a set of data points in 2D, we try to fit a curve through interpolation that approximately represents the function that defines those data points. The more complex the function we fit (in polynomial interpolation, for example, complexity is determined by the polynomial degree), the more closely it fits the data, but the less well it generalizes to new data points. This is where deep learning faces challenges, in what is commonly called the overfitting problem: fitting the data as closely as possible at the expense of generalization. Almost every deep learning architecture has had to deal with this important factor in order to learn a general function that performs equally well on unseen data.
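The polynomial-degree trade-off can be reproduced with a small experiment. The sketch below, under made-up assumptions (8 training points on y = 2x + 1 with alternating ±0.5 noise, test points halfway between them), compares a degree-7 polynomial that passes through every training point (computed by Lagrange interpolation) with a simple least-squares line. The helper names are hypothetical.

```python
def true_function(x):
    return 2.0 * x + 1.0


def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial passing through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = m*x + c (a simple, degree-1 model)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    m = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    return m, y_mean - m * x_mean


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Training data: 8 points on y = 2x + 1 with alternating +/-0.5 noise (made up)
train_x = [float(k) for k in range(8)]
train_y = [true_function(x) + (0.5 if k % 2 == 0 else -0.5)
           for k, x in enumerate(train_x)]
# Unseen points halfway between training points, with noise-free targets
test_x = [k + 0.5 for k in range(7)]
test_y = [true_function(x) for x in test_x]

m, c = fit_line(train_x, train_y)
results = {
    "degree-7 polynomial": (
        mse([lagrange_eval(train_x, train_y, x) for x in train_x], train_y),
        mse([lagrange_eval(train_x, train_y, x) for x in test_x], test_y),
    ),
    "straight line": (
        mse([m * x + c for x in train_x], train_y),
        mse([m * x + c for x in test_x], test_y),
    ),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

The polynomial has zero training error but swings wildly between points near the ends of the range, so its test error is far worse than the line’s: the overfitting pattern described above.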

Deep learning pioneer Yann LeCun (creator of the convolutional neural network and winner of the ACM Turing Award) posted on Twitter (based on a paper): “Deep Learning isn’t as impressive as you think because it’s mere interpolation resulting from glorified curve fitting. But in high dimensions there is no such thing as interpolation. In high dimensions, everything is extrapolation.” So, viewed as function learning, deep learning is nothing but interpolation or, in some cases, extrapolation. That’s all!
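A common formal definition treats a new point as interpolation when it lies inside the convex hull of the training data. Testing convex-hull membership needs a linear program, so this sketch uses a simpler proxy: the axis-aligned bounding box of the training points. Because the hull lies inside the box, a point outside the box is certainly outside the hull, so the printed fraction is an upper bound on how often a new point could count as interpolation. Uniformly random data and the helper name `fraction_inside_box` are assumptions for illustration.

```python
import random


def fraction_inside_box(dim, n_train=100, n_test=200, seed=0):
    """Fraction of new random points that fall inside the bounding box
    of n_train random training points in `dim` dimensions."""
    rng = random.Random(seed)
    train = [[rng.random() for _ in range(dim)] for _ in range(n_train)]
    lo = [min(col) for col in zip(*train)]
    hi = [max(col) for col in zip(*train)]
    inside = 0
    for _ in range(n_test):
        point = [rng.random() for _ in range(dim)]
        if all(l <= p <= h for p, l, h in zip(point, lo, hi)):
            inside += 1
    return inside / n_test


for dim in (2, 10, 100, 500):
    print(dim, fraction_inside_box(dim))
```

In 2 dimensions almost every new point lands inside the box; by a few hundred dimensions almost none do, even though each coordinate on its own usually stays within the training range.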

The learning aspect

So, how do we learn this complex function? It depends entirely on the problem, and the problem determines the neural network architecture. For image classification, we use a CNN. For time-dependent predictions or text, we use recurrent neural networks (RNNs) or transformers, and for a dynamic environment (like driving a car) we use reinforcement learning. Beyond the choice of architecture, learning involves dealing with several challenges:

- Ensuring that the model learns a general function and does not just fit the training data; this is handled through regularization.
- Choosing the loss function, which depends on the problem; loosely speaking, the loss function measures the error between what we want (the true value) and what we currently have (the current prediction).
- Setting the learning rate for gradient descent, the algorithm used to converge to an optimal function. This is challenging because when we are far from the optimum we want to move toward it faster, and when we are near it we want to slow down so that we settle into a minimum (ideally the global minimum) instead of overshooting.
- Handling the vanishing gradient problem, which networks with many hidden layers must deal with; architectural changes such as skip connections and a suitable nonlinear activation function help solve this.
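Three of the challenges listed above can be seen in a small sketch, under stated assumptions. `train_line` runs gradient descent on a mean-squared-error loss with an L2 regularization term and a learning rate that decays as lr0 / (1 + decay * epoch); this schedule is one simple option among many, and all hyperparameter values are arbitrary. `chained_gradient` shows the vanishing gradient problem in a toy network where every weight is 1 and every pre-activation is the same value, so backpropagation just multiplies one activation derivative per layer. All names are hypothetical.

```python
import math


def train_line(xs, ys, lr0=0.05, decay=0.01, l2=0.01, epochs=500):
    """Gradient descent for y = m*x + c on an L2-regularized mean squared error.

    loss(m, c) = mean((m*x + c - y)**2) + l2 * m**2
    The l2 term penalizes large weights; by convention the bias c is not
    penalized. The learning rate shrinks over time: big steps early, small
    steps near the minimum.
    """
    m, c = 0.0, 0.0
    n = len(xs)
    for epoch in range(epochs):
        errors = [m * x + c - y for x, y in zip(xs, ys)]
        grad_m = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n + 2.0 * l2 * m
        grad_c = 2.0 * sum(errors) / n
        lr = lr0 / (1.0 + decay * epoch)
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # never larger than 0.25 (reached at z = 0)


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def chained_gradient(activation_grad, depth, z):
    """Product of activation derivatives through `depth` layers, in a toy
    network where every weight is 1 and every pre-activation equals z."""
    g = 1.0
    for _ in range(depth):
        g *= activation_grad(z)
    return g


m, c = train_line([0.0, 1.0, 2.0, 3.0, 4.0], [1.1, 2.9, 5.2, 6.8, 9.0])
print(f"m = {m:.2f}, c = {c:.2f}")

for depth in (2, 10, 30):
    print(depth,
          chained_gradient(sigmoid_grad, depth, 0.0),  # shrinks like 0.25**depth
          chained_gradient(relu_grad, depth, 1.0))     # stays 1 for active units
```

Even in the sigmoid’s best case the gradient reaching the first layer shrinks by at least a factor of 4 per layer, which is one reason ReLU-style activations became popular for deep networks.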

Computational Challenges

Now that we know deep learning is essentially learning a complex function, we can see that it brings computational challenges of its own:

- To learn a complex function, we need a large amount of data.
- To process large amounts of data, we need fast computing environments.
- We need an infrastructure that supports such environments.

Parallel processing with CPUs is often not enough to compute millions or billions of weights (also called the parameters of a deep learning model). Learning these weights requires vector (or tensor) multiplications, and that’s where GPUs come in handy, because they can perform many vector multiplications in parallel very quickly. Depending on the deep learning architecture, the data size and the task at hand, sometimes one GPU is enough and sometimes several are needed; a data scientist makes that call based on published literature or by measuring performance on a single GPU.
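A quick back-of-the-envelope calculation helps when deciding how much hardware a model needs. The sketch below counts the weights and biases of a fully connected network, the multiply-add operations in one forward pass, and a rough training-memory figure. The network shape (784 inputs, as for a 28×28 image, three hidden layers of 4,096 neurons, 10 output classes) is hypothetical, and the memory estimate assumes 32-bit floats and four copies of each parameter (weights, gradients and two optimizer buffers, as the Adam optimizer keeps), ignoring activations.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def forward_flops_per_example(layer_sizes):
    """One multiply and one add per weight: about 2 * (number of weights) FLOPs."""
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def training_memory_gb(n_params, bytes_per_value=4, copies=4):
    """Rough lower bound on training memory, ignoring activations.

    Assumes fp32 values (4 bytes) and 4 copies of every parameter:
    weights, gradients and two optimizer buffers.
    """
    return n_params * bytes_per_value * copies / 1e9


sizes = [784, 4096, 4096, 4096, 10]  # hypothetical network
params = mlp_parameter_count(sizes)
print(params)                            # 36818954
print(forward_flops_per_example(sizes))  # 73613312
print(f"{training_memory_gb(params):.2f} GB")  # 0.59 GB
```

Multiplying the per-example FLOPs by the dataset size and the number of training passes gives a first estimate of total compute, which can then be compared against a single GPU’s measured throughput.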

With the right neural network architecture (number of layers, number of neurons, nonlinear function, etc.) and enough data, a deep learning network can learn to approximate almost any mapping from one vector space to another. That’s what makes deep learning such a powerful tool for almost any machine learning task.

Abhishek Gupta is the principal data scientist at Talentica Software.

DataDecisionMakers

Welcome to the VentureBeat Community!

DataDecisionMakers is where experts, including the technical people who do data work, can share data-related insights and innovation.
