Creators of Intelligence

By: Dr. Alex Antic
Overview of this book

A Gartner prediction in 2018 led to numerous articles stating that “85% of AI and machine learning projects fail to deliver.” Although it's unclear whether a mass extinction event occurred for AI implementations at the end of 2022, the question remains: how can I ensure that my project delivers value and doesn't become a statistic? The demand for data scientists has only grown since 2015, when they were dubbed the new “rock stars” of business. But how can you become a data science rock star? As a new senior data leader, how can you build and manage a productive team? And what is the path to becoming a chief data officer?

Creators of Intelligence is a collection of in-depth, one-on-one interviews where Dr. Alex Antic, a recognized data science leader, explores the answers to these questions and more with some of the world's leading data science leaders and CDOs.

Interviews with: Cortnie Abercrombie, Edward Santow, Kshira Saagar, Charles Martin, Petar Veličković, Kathleen Maley, Kirk Borne, Nikolaj Van Omme, Jason Tamara Widjaja, Jon Whittle, Althea Davis, Igor Halperin, Christina Stathopoulos, Angshuman Ghosh, Maria Milosavljevic, Dr. Meri Rosich, Dat Tran, and Stephane Doyen.
Chapter 1: Introducing the Creators of Intelligence

Establishing a strong data culture

AA: Data culture is a somewhat related issue. In your opinion, what does an effective data culture look like? How do you advise organizations on building a data culture?

CA: It goes back to the C-suite culture: the people in charge, how they view data, and how involved or how data-literate they are really affect the data culture. The reason that people have a poor data culture is usually that they have people who don’t know anything about data at the top.

There can be problems at both the bottom and the top of the organization. I have a top 10 list on my website about this, in an article about when, as a data professional, you should just walk away from a situation because you’re not really going to get anywhere (https://www.aitruth.org/post/10-signs-you-might-want-to-walk-away-from-an-ai-initiative). There can be this over-amorous feeling about data. The CEOs and C-suite-level people can sometimes think it’s going to solve world hunger! They don’t have a clue what it’s actually supposed to be able to do within their bounds and their company, but they think that it does a lot more, and they think that somehow the data’s going to do whatever analysis needs to be done itself. They don’t think about the people who are actually performing the analysis, how long it takes people to get things done, and how much data needs to be cleaned up.

We used to laugh a little bit when I was at IBM about executives who would promise to get data solutions up and running within four weeks. We would say, “Yeah, that’s going to be just an initial investigation of your data.” Anytime you’re working with data, you have to understand the quality of the data that you have and so many other aspects, such as where you’re going to get it.

At AT&T, they did projects for everybody on the planet and they had 1,000 people acting as “data librarians” – that’s my term, not theirs. You could go to these expert data-sourcing resources and say, “I’m on a computer vision harvesting project for John Deere tractors, and they want to know the varying stages of ripeness for a red cabbage. Do you happen to have pictures of red cabbages in varying stages of ripeness somewhere?” There was a 1,000-person team that could say, “Yes, there’s a place over in Iowa that’s been working on this. We will procure some datasets or an API for you.”

Sometimes, the data is easy and readily available, depending on what you’re trying to accomplish, but other times, it’s a hard use case and you’re going to be tapped to try to figure out where to get it. Where am I going to source this information, and is it even possible to do so? There’s a whole investigation that has to happen. If your C-level leader doesn’t understand what goes into it and doesn’t trust the people that are working for them, it’s not going to work. You’re working with all these vendors that have been hodge-podged together, which a lot of C-suite people do because they just see the numbers: “Oh, it’s cheaper if I outsource this.” But what you’re dealing with sometimes is that they’re just throwing bodies at the problem as opposed to actually having expertise – expertise can sometimes cost more.

The C-suite can have a great effect in terms of how much time they give to a project and how much leeway they give to people about finding data sources, investigating them, and pulling them together in the right ways. I’ve seen that when people are not given enough time or budget, they’ll just go for the cut-throat version of things. They’ll say, “OK! I’ve got $1 per record to spend on something that should normally cost $100 per record,” or, “I need genetic information on this but I’m not allowed to have that, so I’m just going to make up some stuff.”

You see all kinds of bad practices because the C-suite has unrealistic expectations. But then, you see bad behaviors going up too, from the bottom up. You see some data scientists that are just lazy. They don’t want to do things the right way. They are collecting experience on their resume like baseball cards. They just want to go from the project that they just got offered to this project, to the next project, and then they’re going to put that on their resume, and they’re going to keep moving their salary up from $100,000, to $200,000, to $300,000, to $400,000.

There’s bad data culture everywhere.

The best thing you can do is be literate about the data issues, have some trusted people that you work with, and pay them well.

Look at the types of projects people have taken on and how long they spend at a company. If they are one of those people that’s just in it for 12 to 18 months and you only see one project per place on their resume, that’s a pretty good sign that they’re just going to rush through, not document anything, and then leave you holding the bag with no Return on Investment (ROI) at the end of it. That’s my personal opinion.

AA: Yes, that all resonates with me. I love the way you also address the issue with data scientists themselves. People are often very guarded about speaking negatively, but let’s be honest: there are many data scientists who are collecting experience, just trying to move up the ladder like any working professional. They’re no different from anyone else. They can be quite crafty with what they put on their resumes.

CA: That’s exactly right. My thought is, “Go with the people who you trust, and if those people happen to be inside the company already, then just teach them the skills.” Remember my boss from before? He said, “I know you can do this. You’re ambitious. We’re just going to give you all the classes you need.” I go for trust and the personality types that I think would do well, and then I just give them the skills. That’s how I approach it, as opposed to the opposite, where data science candidates come in with promising skills, turn out not to be that fantastic, and yet you’ve paid a ton of money for them and possibly signed a long-term contract. You don’t want to get to the point where you’re at the end of the game and thinking, “I’ve already sunk a million dollars, and now I have no idea what this person did.”

There might even be no documentation because a lot of people see failing to document what they’ve done and how they’ve munged data together as a way to control the situation, provide job security for themselves, and increase their salary. Some people do that. I’m not saying everybody does, but some do.

Then, we also have the opposite situation: there are abused data scientists out there too, who are truly trying to do the right things, but the time frames and budgets they’ve been given are so unrealistic that they couldn’t possibly deliver a quality end result. Every profession has good people, bad people, and people in between just trying to survive.

AA: I’m sure you’ve seen many examples where things have gone awry in terms of poor data cultures.

CA: Let’s face it: data culture is just so important. One other aspect of this that I’m learning through research right now is that within the pod structure of data engineers, junior data scientists, and senior data scientists, it is the junior data scientists who are most likely to blow the whistle when they’re not being paid attention to. When they’re raising objections in a process, they need to be listened to. What we’re not seeing in these big companies is the ability within the agile process to push back, to assume that there will be some red flags, and to slow down. We’ve become so accustomed to our agile processes delivering the Minimum Viable Product (MVP), even if it’s just an API feed, that we’ve become used to delivering in six to eight weeks. Sometimes, that’s just not possible.

Anytime someone pushes back or people are under pressure, it’s important to know what they’re going to do. Are they just going to drop everything and say, “OK, fine, whatever. I’ll just be unethical – it’s fine. I’m just going to deliver this because my bonus depends on it,” or do they actually care? Will they push back on this part of the process, going up against a senior data engineer and a senior data scientist? Can they hold their own there, and do you have support for them to hold their own? Will the lead data scientist interfacing with the chief marketing officer or digital officer give these people enough pushback ability? I think that’s not happening at all.

I think what we have is a whole lot of abused data scientists who are trying to raise the flags, and others are saying, “No, you’re slowing down our process. We’re not going to get that patent filed in time, or we’re not going to get this thing out to market in time, so we’re just going to push you down.” If that’s the case, you have a very toxic, very risky data culture, and you have to figure out how to address that so that raising red flags (and, more importantly, fixing critical issues that cause red flags) is the norm, not something that you’re blacklisted for. You shouldn’t be blacklisted as a data scientist for bringing up risks that need to be addressed before a product is released.

AA: You’ve touched on an issue that isn’t normally voiced: the realities that data scientists face with unrealistic expectations and the culture they have to abide by to survive, which is often very toxic in nature.

What do you think practitioners (machine learning engineers and data scientists) should be doing to make sure they’re contributing in a positive way to the ethical development of their models and products?

CA: The overarching way that it needs to be approached is from the top down and the bottom up. Practitioners definitely have to be that last stop. The buck stops here. The responsibility is in everybody’s role. That’s why I semi-hesitate when I see AI ethics being called out as a separate group within a company. It bothers me because I think that gives you an easy scapegoat when things go wrong.

You just get to point over to a group that somehow failed you, when in reality, every single person that’s involved in AI development should be feeling that they are responsible in some way, shape, or form for every part of what they’re doing.

The thing that concerns me the most is this attitude in the practitioner groups of, “Why does that even matter to me? That doesn’t have anything to do with me.” Those are the people that need to find the problems and bring them forward because they are the people who should be involved in actually investigating data and understanding how models are set up. Only they know what they chose to use, what they chose not to use, and the decisions that went into that.

Let’s say they feel responsible, but they don’t feel like they can raise their hand, raise the red flag, and say, “Hey! I think we could do better on this data. I’ve seen a lot of flaws in it. I’ve seen that it doesn’t have enough of this type of people. We just left things out in general, and I think that the proxy data was crap” – that’s a problem.

In the case of Amazon’s hiring algorithm, we actually saw that the data scientists themselves were the ones who did the whistleblowing. They did it anonymously, and I’m so grateful that they did. Those executive sponsors (such as the chief human resources officer, for example) don’t really know what they don’t know, and so they would have continued forward with that hiring algorithm had the data science development team not raised the flag. The fact that we still have to call it whistleblowing means that we don’t have a culture or a set of norms yet that is conducive to pushing back, and that’s the problem.

AA: Yes, it’s everyone’s responsibility. I couldn’t agree more.

With the growing influence of AI in our daily lives, trust is a big issue. How do we develop trust in AI? I think your book goes a long way by helping the layperson understand what is and isn’t AI. Beyond that, more broadly as a society, what should we be doing? Also, how important is regulation in your opinion?

CA: We could strip those down, and each one has an answer.

In my book, I do talk about the 12 Tenets of Trust because I think that all across the globe, we’re in a time where we have the lowest levels of trust among people.

How can we hope to build and scale AI at this point when there’s no explainability, no transparency, and no accessible information about what goes into models? I don’t think anybody has earned the trust of anybody right now in the AI space. The typical answer is, “Well, I have accuracy,” but we all know that 100% accuracy on garbage is still garbage. I hate hearing, “Well, my accuracy rate is so great.” That is not a good, solid statement of whether or not we used sound judgment for the things that went into the model in the first place, the data that went in there, or even the way that we conducted the training of the model, and so forth.

People don’t trust each other. There was a research study out here in the States from a group called Pew Research Center (https://www.pewresearch.org/topic/politics-policy/trust-facts-democracy/). Right now, we don’t trust scientists, we don’t trust the government, we don’t trust the news, we don’t trust social media, and we don’t even trust each other as neighbors to do the right things. For all the many hundreds – possibly thousands – of ethical AI frameworks out there, I would say this. There’s one simple thing that you have to remember, and that’s the golden rule, which is to do unto others as you would have them do unto you. As long as you can remember that, you should do OK with trying to develop trust.

I think we’ve become addicted to moving fast and breaking things. That’s why I recommend this 12-step program to break that addiction, which I’m calling the 12 Tenets of Trust. Whatever you are doing in your data science, if you wouldn’t steal $20 off of the ground if someone in front of you just dropped it, and you would instead go find them and say, “Hey, here’s your money back,” then you should do that in your data science too. If your model’s stealing money from people and you as a person would not do that in real life, and instead would chase someone down and give them their money back, you should adopt a similar mindset when you develop your models as well.

We tend to have this thought that we leave part of ourselves behind when we come into the office. We somehow think, “OK, to move forward in data science, we just need a corporate mindset – we have to make money.” That’s our fiduciary responsibility to our stakeholders, but there’s still a line. If you wouldn’t do it otherwise, then don’t do it just because you’re working at a company because, at the end of the day, you still have to look your kids in the face. You have to look at yourself in the mirror and say, “Hey, I did something good today,” or, “I did something terrible.” So, the golden rule “do unto others” is the number-one way to make sure you’ve got trustworthy AI.

AA: Can you please elaborate on the 12 tenets?

CA: These are the 12 tenets that I think the public should expect all of us in AI development to adhere to in order to put their trust and faith in our products or services. The very first tenet we should meet is developing AI that is humane. People ask, “What does being humane have to do with data science?” But I think the very first thing we have to do when we come up with a high-impact use case to fund is ask, “Is it a humane thing to do?” The chatbot that delivers cancer diagnoses is probably not a good idea, for example. The same goes for that article in Wired that I was quoted in where a CEO wanted to use taser-powered drones in schools for children. That’s not a humane use of AI. Do we need taser drones inside school buildings with small children? No, probably not. You’ve got to start with the use case, and say, “Could this cause more harm than good? Is this an appropriate use of AI? Is this humane?”

The second thing to ask is, “Is this consensual? Am I taking some data from a person that is gathered for a whole different context from where it’s being used?” A prime example I use for that is a group called Clearview AI (https://www.oaic.gov.au/updates/news-and-media/clearview-ai-breached-australians-privacy). They scrape people’s information and faces from social media and work with law enforcement to provide potential facial matches based on that data. When you find yourself begging, borrowing, and stealing information from people on social media sites, that is not the original intent of the information. If you want to violate people’s trust, go right ahead and keep doing that. But if you want to build trust and make sure that you have a consensual approach where people know what their data is being used for and that it’s being used in a way that is consistent with what they’ve agreed to, then you want to be transparent. We’ve already talked about that.

Also, transparency is not enough. It’s not enough to say, “Hey, I have the ability for you to know about these things.” If you don’t inform people and make the data you hold accessible to them online so that they can see it for themselves, that’s not what I consider to be transparent or accessible.

The other part of this is personal agency. I need to be able to go in and affect something to do with myself, or at least understand it. It should be explainable, which is the next tenet. Can I point out that the address on file is wrong, for example? Can I say, “That’s not the right location. Please go find that information and rectify that for me”? That rectification is actually part of the 12 tenets as well, which we don’t often see. We see a lot of explainability right now, which is fine. It’s hilarious to me, though, that everybody talks about explainability and nobody talks about accountability, traceability, or the ability to govern and rectify these situations, which are the other tenets.

We also need privacy and security. You can’t just take the X-rays of people who have had COVID and stick them where any hacker can get to them, along with dates of birth and everything else that would reside with those medical records. That could happen if you come up with a fantastic X-ray-diagnostic type of machine learning capability. Healthcare is fraught with those kinds of fast-and-loose methods of just throwing people’s information into a single place and then leaving it unlocked. It’s like saying, “I just threw all your stuff into a huge purse, and then I put a big advertisement on the outside of it, and I can’t believe people went in there and stole your data. I don’t understand why that happened.” People are not thinking about that. What people are thinking about as start-ups is, “I’ve got to hurry up and get this product out there and claim first-mover status. I’m going fast and loose with people’s information. I know there are some guidelines I’m supposed to be following, but within my own company I thought it would be OK if I just did this.” No. Just because you thought that would be OK within your own company, doesn’t mean it’s OK. Most of the breaches that we see today are from within, especially in the competitive environment among start-ups where infiltrations can be common and workers move from one competing start-up to another.

The final thing is making sure that your data’s actually correct. You need to make sure you have fair, quality information going into your model, not just some random bits that you were able to procure from somebody who just played a game on Facebook, swears that they now have the psychographic information of people, goes out and labels people as persuadable or whatever else, and then sells the data to the highest bidder (true story: that’s the Cambridge Analytica case). That’s just wrong. I think that you have to know where your data’s coming from. Does it have bias in it? Is it actually correct? If the data quality is really bad, you’re going to have overfitting, which is also going to cause the model to put out weird results. It’ll look like you’re getting some level of accuracy, but when you go back and look at what you’ve got accuracy on, it’s not going to come out right. You’ve got to pay attention to the data itself, and I’ve been amazed at how many data scientists don’t actually go in and thoroughly investigate their data.
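To make the idea of thoroughly investigating your data concrete, here is a minimal Python sketch of a first-pass data audit of the kind being described; the file name and the “label” column are hypothetical assumptions, not anything specified in the interview:

    # Illustrative sketch only: a first-pass audit of a training dataset.
    # The file name and the "label" column are hypothetical.
    import pandas as pd

    df = pd.read_csv("training_data.csv")

    # Share of missing values per column, worst offenders first
    print(df.isna().mean().sort_values(ascending=False).head(10))

    # Duplicate rows that could leak across train/test splits
    print("Duplicate rows:", df.duplicated().sum())

    # Is the target badly imbalanced or under-representing some groups?
    print(df["label"].value_counts(normalize=True))

Checks like these are cheap to run and can surface the kinds of quality problems described above before a model is ever trained.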

To recap, the first 10 of the 12 tenets are these: humane, consensual, transparent, accessible, agency-imbuing, explainable, private and secure (that’s one), fair and quality, accountable, and traceable. Traceability is about whether you know when something went wrong and how it went wrong, and whether it can be traced back to a moment in time. People are using blockchain to do some amazing things with traceability, especially in credit reporting.

Tenet 11 is incorporating feedback. If something isn’t working the way it should, there should be a way to input feedback. This is especially required for expert systems, such as AI trying to take the place of actuaries. Believe it or not, people don’t find that a fun profession anymore, and we’re finding that Gen Z and millennials don’t really want to go into that field. Now we’re training AI to try to do it, but if you don’t have experts continuing to weigh in in some way, shape, or form, you’ll have drift. Bias can also occur if you don’t have ongoing feedback loops incorporated where humans are definitely in the loop.
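A minimal sketch of such a feedback loop, assuming a model’s prediction scores were logged at deployment time and again over a recent window (the file names and the 0.05 threshold are hypothetical choices, not part of the 12 tenets), might look like this:

    # Illustrative sketch only: flag a possible shift in model behaviour
    # so that a human expert reviews it. File names and threshold are hypothetical.
    import numpy as np
    from scipy.stats import ks_2samp

    baseline = np.load("scores_at_deployment.npy")
    recent = np.load("scores_last_30_days.npy")

    # Two-sample Kolmogorov-Smirnov test: has the score distribution drifted?
    statistic, p_value = ks_2samp(baseline, recent)

    if p_value < 0.05:
        # Escalate to a human reviewer rather than silently retraining
        print(f"Possible drift (KS={statistic:.3f}, p={p_value:.4f}); escalate for expert review.")
    else:
        print("No significant shift detected in prediction scores.")

The escalation step is the point: it keeps experts in the loop rather than letting the model quietly drift.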

The last one is governed and rectifiable. What’s the point of having explainable AI if you’re not intending in any way, shape, or form to fix anything that goes wrong? We’re not even talking about that as an industry. We’re so focused on bias and explainability that nobody’s stopped to ask the question, “Well, what do you do when you find bias? What do you do when you find out that your model has drifted all the way over here, or it’s turned into a racist, like Tay? What do you do? Do you just shut it down?”

Tay

Tay was an AI Twitter chatbot created by Microsoft in 2016. Intended to learn from its interactions with human users on Twitter, Tay ended up being taken offline after just a few hours as users taught the algorithm to make offensive statements (https://www.bbc.co.uk/news/technology-35890188).

Think about the Tesla example. We have self-driving cars that can’t even be shut down from outside of the car. What do police do when they find a couple sleeping in their car that’s just racing down the highway? None of us can do anything. Do we just have to hope and pray? No, that’s not acceptable. We need to start considering how we’re going to rectify situations when we build. It needs to be in the design from the get-go, and that’s how people will trust us: they need technology that they find trustworthy. Can we shut things down? Can we govern them? Do we know when things actually went wrong? Is someone even accountable to fix this? That’s my other pet peeve: you have model drift. Who’s going to fix it? “I don’t know. That person already left the company.” In what business are you allowed to say, “I don’t know,” throw up your hands, and say, “Well, anyway, on to the next customer”?

AA: Have you seen that happen?

CA: Yes, absolutely. It’s because of the way that AI gets implemented with all the different vendors and contractors. People are transitory in this space, and so what you see is that a project gets developed, and then it gets left behind with the people that are using it, but the people who are using it don’t know how it works. So, there’s nobody accountable.

AA: How important is regulation now in all of this at a government level and an organizational level? Are you an advocate for regulation?

CA: I usually am not an advocate for regulation. Not because of the intent of the regulation, but because of the government’s ability to execute the regulation in a way that doesn’t overreach to the point of creating barriers to entry for smaller companies who can’t afford to lobby. That said, I do feel like some areas such as social media, autonomous weapons and vehicles, and healthcare urgently require regulations in order to prevent major disasters from happening. For example, we have major social media platforms that are influencing everything from teen suicide rates to how elections are won and the full-on destabilization of countries through the use of fake news and the amplification of hate through bots. This cannot be allowed to continue. For these firms to continue profiting from ads while society goes down in flames is akin to Nero watching as Rome burned.

The big question on my mind is, “How can we get the regulations we need when we have congressional representatives whose average age is 70 or above, who don’t even understand how these firms make their money or how the technology works?” The Facebook (now Meta) and Google congressional hearings in the US, where a key congressman asked how Facebook makes its money, were a sad demonstration of just how little Congress knows about the tech industry. If they don’t know how the business model works, how can they ever hope to understand what drives these companies to do what they do and under what circumstances they do it?

Facebook congressional hearings

In 2018, Facebook executives were required to speak at a congressional hearing in the US (https://www.vox.com/policy-and-politics/2018/4/10/17222062/mark-zuckerberg-testimony-graham-facebook-regulations).

While I have little faith in the actual lawmakers themselves understanding the problems, I do think there are enough of us out there advocating that we can educate them and the public at large. I wrote my book specifically for people who aren’t in the tech industry to enable them to understand the issues and impacts better. I want to get my book into the hands of as many legislators as I can so that they can be brought up to speed on what AI is, how it works, why adverse impacts happen with AI, and more importantly, what we can do about them.

AA: So, you think education is important?

CA: Yes!

I think the more that people – both inside and outside the tech industry – know, the better. That way, we can all exert pressure on both regulators and the businesses that use data science and AI. We need all people in society to be aware and understand the areas where these technologies affect our lives.

The areas people ask me about most often relate to jobs being replaced by AI – such as helping a student pick a profession that won’t be usurped by AI – or to how ads track them around online. All these years, I’ve been collecting AI questions from lots of different types of people who I’ve encountered: ride-share and taxi drivers, church ladies, family members, postal workers, and parents in line with me at the grocery store. I’ve tried to understand the main questions and frustrations in the minds of people from a slew of different backgrounds. I’ve taken those main categories of questions and made them each into a chapter in my book. The topics include the following:

  • AI in hiring – including AI interviews and social media flagging
  • Job replacement and automation – which jobs and what skills
  • Impacts on kids and teens – tech addiction, suicide, harm challenges, and trafficking
  • Political polarization and radicalization – fake news, conspiracies, and bots
  • Rights and liberties being usurped with AI use in criminal justice – predictive policing and facial matching algorithms
  • Life-and-death AI decisions in healthcare

Globally, one of the areas where I think people feel most frustrated, even if they can’t quite pinpoint why, is politics. We are all constantly outraged, pointing fingers back and forth between political parties. We cannot get beyond it. But what the average person doesn’t understand is that there are actually bots out there keeping the angst going by magnifying outrageous, salacious content designed to provoke visceral emotional reactions. These bots are intended to polarize and destabilize democratic countries. There is a lot more of this bot content online than there is content from moderate people. The theory is that there are a lot more moderate people out there, but their content is boring, so it doesn’t attract eyeballs. Because moderate content doesn’t attract eyeballs, it doesn’t get advertising dollars, and because of that, it doesn’t get amplified.

If the standard person understands that this is how everything operates on social media, that in fact their next-door neighbor is just forwarding something that was put out there by a bot, or is fake information, I think it would change the dynamics of how much they invest their personal psyche into the hatred that goes into comments. If you know you’re interacting with a bot, you’re probably not going to be as hateful about it as you are if you think you’re truly being somewhat usurped in your immediate society. I think that is what happens with bots. I think people see some of these comments online and think, “Oh my gosh! This is a personal attack on me.” These bots are so rampant, it’s not even funny. They’re there just to provoke hate in the most devious little ways.

AA: That’s great context. It’s something a lot of people just aren’t really aware of. I think it’s very important that data scientists are cognizant of how market forces and technology combine and are shaping society.