AI Is Changing The Way We Look At Data Science

October 9, 2017

I’m Francesco Gadaleta, the Chief Data Officer at Abe AI, where we streamline banking with financial artificial intelligence. I want to explain the difference between data science and artificial intelligence as we know it today, why we are excited about it, and what’s different this time around compared to the past.

A few years ago, the Chief Economist of Google, Hal Varian, said, “I keep saying that the sexy job in the next ten years will be statisticians and I am not kidding.” Indeed, he was not kidding: the sexy job of the next ten years will be statisticians, data scientists, and artificial intelligence experts. What Hal didn’t tell us is what would happen after those ten years. Data science is the sexiest job of the 21st century, not only in the United States, but in Asia, Europe, and pretty much everywhere else in the world. The problem is that it’s also the most vulnerable job. It’s probably at the top of the list of jobs that are going to disappear because of artificial intelligence.

Artificial intelligence is disrupting every industry. It’s changing the financial industry, healthcare, entertainment, marketing, and even personal projects. AI is also changing data science itself.

Now, if you are here and I am here, it’s because we are all involved in some way with data science projects or with something that deals with artificial intelligence. Let me show you a picture of what data science is and how we’ve used it so far.

Here you can see the typical waterfall pipeline of the data scientist. This still works for several domains in everyday life. Usually, you start from the discovery phase, where you ask yourself whether you have enough information to at least start an analytic plan. From there, you move to the second phase, data preparation, where you ask yourself and your colleagues whether you have data of sufficient quality to start building a model. Then you move to phase three, model planning or design. This is where you design your model and work out the algorithmic complexity and feasibility from a mathematical perspective. When model planning is done, you move to model building, where you implement the model in whichever programming language you are comfortable with. Between phases four and five, you validate the model by assessing its robustness and accuracy. After that, you communicate the results to the rest of your team or to the business, and the model is made operational.

In this waterfall pipeline, you can see why steps two and three are extremely important: they determine the quality of the data and the quality of the model. These two phases are usually referred to as the feature engineering step, in which data scientists use gut feelings, previous expertise, or domain-specific knowledge to extract features and feed the models with transformed data. What a model sees is not the raw data but a transformed version of it, and this transformation is exactly what I mean by feature engineering: selecting the features that are, or are very likely to be, relevant to the problem you would like to solve. For instance, in a financial transaction, a feature might be the date, the account balance, or any other information related to the purchase, such as when you bought and how much you spent. Traditional machine learning relies on the data scientist creating these features manually. As you’ll see, this is a time-consuming approach with a lot of human interaction, which makes it prone to errors.
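
As a minimal sketch of what such hand-made features might look like in Python: the record fields (`timestamp`, `amount`, `balance_before`) and the four features chosen here are assumptions made up for illustration, not the features used at Abe AI.

```python
from datetime import datetime


def extract_features(transaction):
    """Turn one raw transaction record into hand-crafted features.

    The field names and the features themselves are hypothetical;
    a real project would pick them using domain knowledge.
    """
    timestamp = datetime.fromisoformat(transaction["timestamp"])
    amount = transaction["amount"]
    balance = transaction["balance_before"]
    return {
        "hour_of_day": timestamp.hour,
        # weekday() returns 5 for Saturday and 6 for Sunday
        "is_weekend": timestamp.weekday() >= 5,
        "amount": amount,
        # Guard against division by zero for empty accounts
        "amount_to_balance_ratio": amount / balance if balance > 0 else 0.0,
    }


raw = {
    "timestamp": "2017-10-07T23:15:00",
    "amount": 250.0,
    "balance_before": 1000.0,
}
features = extract_features(raw)
print(features)
```

Every feature in this sketch is a human decision, which is why the step is slow, error-prone, and hard to carry over to another domain.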

This is exactly how data science has been done so far. You start from input data, go through the feature engineering process, then design your algorithm, and from there implement and execute it. The feature engineering part, as I said, is prone to human error and very time-consuming. In addition, it’s usually very hard to generalize across domains. For example, once you have solved a problem in the financial domain and decide to move that model to the healthcare domain, most of the time you will have to start from scratch because the feature engineering step is very much domain specific. Algorithms that work in finance might not work in healthcare and will definitely not work for social media problems.

Then artificial intelligence came along. Now, take a few seconds to answer this question: how many neural networks do you think there are out there? Many of you are probably thinking of an impressive number of neural networks, because the problems they are applied to are quite different from each other. For example, think about speech recognition, computer vision, or numerical and time series analysis done with neural networks.

There is just one neural network for all these problems. Of course, I am generalizing here, but when we talk about artificial intelligence, and neural networks specifically, we are usually referring to a black box that you feed with some inputs and that gives you some outputs. This is an oversimplification. Many people who are neither researchers nor practitioners know artificial intelligence only as this magic box that can solve a lot of problems.

Other folks prefer to look at neural networks this way, focusing on the hidden layers, which is whatever happens between the input and the output. The input is usually raw sensory input. It could be financial transactions, the heartbeat of an individual or of a million individuals, medical claims, tax data, clinical lab tests, and so on. The output could be, in the case of financial transactions, a score of an individual’s financial health. Whatever happens in between, in the so-called hidden layer (there can, of course, be more than one), is what the network learns about the relationship between input and output. That’s where the network tunes itself in order to optimize this relationship.
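
To make the input, hidden layer, and output concrete, here is a small sketch of a forward pass through a network with a single hidden layer, in plain Python. The layer sizes, the weights, and the sigmoid activation are all assumptions picked for illustration; in a real network the weights would be learned from data.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute hidden activations and the final score for one input vector."""
    # Each hidden node takes a weighted sum of all inputs, then applies sigmoid
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output node does the same with the hidden activations
    score = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)
    return hidden, score


# Hypothetical numbers: 3 input values, 2 hidden nodes, 1 output score
inputs = [0.5, -1.2, 3.0]
hidden_weights = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [0.7, -0.6]
output_bias = 0.05

hidden, score = forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias)
print(hidden, score)
```

The `hidden` list is the intermediate representation the text refers to: it is neither the raw input nor the final answer, and training adjusts the weights that produce it.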

People like me prefer to look at neural networks in this way: as a bunch of logistic regressions, or regressions in general, where you have to minimize a function, called the loss function, that measures the difference between the predicted value and the true value.
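
As a rough illustration of this view, the sketch below shows a single logistic regression unit and one common choice of loss function, the log loss (cross-entropy). The talk does not say which loss is meant, so the choice of log loss, the function names, and the example numbers are assumptions.

```python
import math


def predict(weights, bias, features):
    """Logistic regression: weighted sum of the features passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(true_labels, predicted_probs, eps=1e-12):
    """Average cross-entropy between true 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a prediction is exactly 0 or 1
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(true_labels)


labels = [1, 0]
confident = [0.9, 0.1]  # predictions close to the true labels
unsure = [0.6, 0.4]     # predictions closer to a coin flip

print(log_loss(labels, confident))
print(log_loss(labels, unsure))
```

The better the predictions match the true values, the lower the loss, and training amounts to searching for the weights that make this number as small as possible.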

I also like to look at neural networks in this way, where we apply the workhorse of artificial intelligence, the backpropagation algorithm, which of course requires a much longer explanation.

Let me go back to this picture where everybody is happy and we all agree that neural networks work as long as you treat them like black boxes.

Why do they work? It all started as a game, a fun project by some guys at Google who decided to tackle the problem of distinguishing cats from dogs. Those who have been practicing computer vision for a while know how difficult this problem can be. Distinguishing a cat from a dog is extremely easy for human beings: we can go out in the street, see a cat and a dog, and tell them apart. Even a kid can do that. But if we ask that kid what’s going on in their brain while they do it, we hardly get an answer, because it’s very difficult to know which neurons are involved or what’s going on in our brain when we distinguish a cat from a dog. Now, if we want to solve this problem in an algorithmic way, the traditional way of data science, we have to define some features, as I said in the feature engineering step, that are specific to the cat and not to the dog. Those features are what would allow us to distinguish a cat from a dog.

So we go through this feature engineering process and start defining some features for the cat. We say the cat has four legs, two eyes, and two ears, but then you’d say the dog also has four legs, two eyes, and two ears. Yes, but the cat has whiskers and comes in multiple colors. Dogs, too, have whiskers and come in multiple colors. As you can see, it becomes really challenging to find features that are specific to the cat and not the dog, and the other way around.

As you can understand, finding the right features is a very hard task, and expert knowledge usually plays a fundamental role. For instance, when you want to detect special breeds of cats or dogs, you need to be an animal breed expert. It’s very difficult to handcraft these features, and even when you succeed, the approach usually does not generalize across domains. Cracking the problem of cats versus dogs doesn’t mean that you are going to crack the problem of distinguishing chairs from tables. You will probably have to start from scratch.

So how does deep learning deal with that? Well, when I was asked to define neural models in a very compact way, I went back to the cat and dog problem and said, “neural models learn to distinguish a dog from a cat just like the brain of a baby.” What happens in the brain of a baby? We never explain what the typical features of a cat or of a dog are. We just show our baby what a cat is and what a dog is. Then something happens in the baby’s brain: some neurons start connecting with other neurons, forming active synapses that will get excited the next time the baby sees a new cat or a new dog. This is exactly the principle behind neural networks, and behind deep learning in particular. Once you feed the network a very high number of pictures, in this case of cats and dogs, the network starts self-tuning: groups of neurons emerge and connect with each other in order to recognize these specific objects. After training the network for a while, you start to identify neurons that recognize, for instance, diagonal lines, blobs, or other pixels aggregated in a specific way, but also more complicated shapes such as a face. You even find a node that gets excited when it sees a cat and a node that gets excited when it sees a dog: the so-called cat neuron and dog neuron. This happens with pretty much every type of data. I’m using images as an example because they are easier to represent and explain, but this generalizes to pretty much every type of data, especially numeric data.

Why is this happening today? There are three major reasons that I believe are behind these amazing times for deep learning and artificial intelligence: big data, GPU power, and algorithmic progress.

Big data is not just about the amount. It’s not just large data; it’s integrated data. A lot of heterogeneous data can be pulled together and integrated much more easily than in the past. This, of course, facilitates the training of very data-hungry algorithms like deep learning.

The second reason is GPUs, which stands for graphics processing units. In general, it’s about better and faster hardware. GPUs, in particular, can speed up the training of a network by a factor of a hundred or more compared with regular CPUs. Just so you know, you have a GPU in your mobile phone. So imagine what can happen with a standalone server that is specifically designed to compute on GPUs.

The third reason is a lot of research. Today we have much better algorithms, not just because there are smarter researchers out there, but because we are all focused on improving the tools that power deep learning, mainly backpropagation and function optimization. One such optimization method is stochastic gradient descent, which is the way neural networks actually learn.
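
To give a feel for stochastic gradient descent, the sketch below trains a single logistic unit on a toy dataset, updating the weights after every individual sample. The data, the learning rate, and the number of epochs are assumptions chosen for illustration; a real network would repeat the same kind of update across many layers, with backpropagation supplying the gradients.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_train(samples, labels, learning_rate=0.1, epochs=200, seed=0):
    """Fit a logistic unit with stochastic gradient descent on the log loss."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        # "Stochastic": visit the samples in a fresh random order each epoch
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # For the log loss, the gradient with respect to the weighted sum is p - y
            error = p - y
            # Step each parameter a little in the direction that lowers the loss
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Toy data with one made-up feature: negative values are class 0, positive are class 1
samples = [[-2.0], [-1.0], [1.0], [2.0]]
labels = [0, 0, 1, 1]

weights, bias = sgd_train(samples, labels)
predictions = [sigmoid(weights[0] * x[0] + bias) for x in samples]
print(weights, bias, predictions)
```

Nobody tells the model where the boundary between the two classes lies. It finds the boundary by repeatedly nudging its parameters to reduce the error on one example at a time.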

Of course, we are not very interested in distinguishing cats from dogs, especially here at Abe AI, but we are very interested in detecting fraudulent transactions, for example. Here you can see how easy it can be for a neural network to separate legitimate transactions from fraudulent ones in a fraud detection algorithm, in the same way it dealt with cats and dogs.

This is exactly what we have been working on for a while. We developed an amazing artificial-intelligence-based algorithm that could, for instance, detect fraudulent or high-risk forex transactions for a global bank. Before Abe, there was a lot of human intervention: literally a person behind four screens who couldn’t cope with transactional data coming in at a very high pace. He had to go through each record manually, flagging and clicking around. It was very time-consuming. When we implemented this approach, we learned from the human process, of course, and developed an algorithm that at some point started running on autopilot. It could therefore classify transactional data in real time.

Artificial intelligence, and deep learning specifically, offer amazing algorithms. These are very exciting times for us and for everyone out there. The problem is that these algorithms require a lot of data; they are very data hungry. A lot of research is now being done to optimize these networks so that they need less data. They also require researchers and practitioners to think differently: we have to transform a problem into something that deep learning can solve. The cat and dog problem can easily become a fraud detection problem in finance, if we transform the problem into something that artificial intelligence can understand. Another amazing feature of deep learning is knowledge transfer: the capability to train a network in one domain, tune it in another, and predict in yet another. We can export this artificial brain from domain to domain while still maintaining decent accuracy, so we don’t have to start from scratch as we did in the past.

As you can imagine, there is a plethora of opportunities for machine learning, deep learning, and artificial intelligence more broadly. These are all mathematical tools that are really changing the way we solve problems, and that’s exactly how we solve problems at Abe AI.
