Design Thinking for Data Scientists

I gave a talk on Design Thinking for Data Scientists at the O'Reilly Strata Conference in February 2015. The talk was pretty much orthogonal to every other talk at the conference. But it was well received, so I thought I'd share the transcript with you (or you can watch the video). I'd be delighted to hear your thoughts and experiences around this topic of increasing the business impact of Data Scientists working in industry.

Welcome everyone!

This talk is primarily directed towards Data Scientists, especially early career Data Scientists, but it's also highly relevant to those who manage Data Scientists.

I’m using the term Data Scientist in a very broad sense, to mean anyone who is using data plus statistics plus programming to do something new, something that’s never been done before in their organization.

I’m going to talk about how and why Intuit is transforming its Data Scientists into Design Thinking leaders.

And I also thought it might be useful to you, if I go into enough detail about our Design Thinking process for Data Science projects so that you could try it in your organization.

So, here’s the deep dark secret about Data Science projects:

If we define a successful Data Science project as one that gets deployed and delivers the promised business results, then Data Science projects have a very low success rate.

How do I know?

First, although I’m not aware of any studies on Data Science projects, studies of Big Data projects find a success rate below 30%.

And given that many Data Science projects are exploring the bleeding edge, it wouldn’t be surprising if their success rate was much lower than that.

The other way I know is that I’ve been a Data Scientist for over 20 years, since way before it was called Data Science, and certainly way before it was called the Sexiest Job of the 21st Century by the Harvard Business Review!

I’ve worked in academia (that’s a solar flare on the left), and in multiple industries.

I’ve led many Data Science projects. And I’ve witnessed many more Data Science projects.

And I’d bet good money that the success rate of Data Science projects, across all industries and across everyone who calls themselves a Data Scientist, is well below 10%.

I hate this waste. And I’m sure many of you hate this waste too.

It’s a waste of Data Science talent, and it’s a waste of company money.

But I know that we Data Scientists can do something to reduce this waste.

Here’s my conjecture:

A Data Scientist can increase the probability that their project is successful by 5-10 times if they approach it as a Design Thinking leader.

You’ll notice that I say Design Thinking leader, not just Design Thinking participant.

The good news is that the essential Design Thinking leadership skills are not that hard to acquire, once someone walks you through them, and once you see them modeled. After that, it's a matter of practice.

The bigger challenge, as we’ll see in a moment, is to acquire a new mindset.

When I first encountered Design Thinking, many of the techniques looked familiar, because I’d stumbled upon many of them in the past, without calling them Design Thinking techniques.

At the same time it was confusing, because there are so many Design Thinking techniques out there, and it’s not always obvious how to adapt them to Data Science projects.

What I wanted back then, was for someone to show me a specific, step-by-step, best practices Design Thinking process that was tailored to Data Science projects.

What I want to give you today is exactly such a process, so that you can try it in your organization.

I’m certainly not making the claim that this is the only process, or the best process, but it is a battle-tested Minimum Viable Product.

Before we jump into the process, I want to make a disclaimer.

I’m not saying it’s easy to transform Data Scientists into Design Thinking leaders.

Even at Intuit, where the conditions are nearly perfect for doing so, it’s still challenging.

Because remember, there is also a mindset issue.

And that’s why I’m an evangelist!

Intuit is a spry 30 year old company that is reinventing itself from top-to-bottom using Design Thinking.

We have a home-grown Design Thinking framework - Design for Delight (or D4D).

Our Design for Delight process has three stages:

  • Stage 1 is Deep Customer Empathy,
  • Stage 2 is Go Broad to Go Narrow, and
  • Stage 3 is Rapid Experiments with Customers

I’ll unpack these three stages later in this talk, and I’ll show how we tailor them for Data Science projects.

My point is that Intuit takes Design Thinking very seriously.

On your first day of work at Intuit, you get a laptop, and you get Design for Delight training!

And your job performance is partly evaluated on how well you apply Design for Delight to your work.

So there is amazing organizational emphasis and widespread adoption of Design Thinking at Intuit, and Data Scientists have been participants in the Design for Delight process for years.

But it's still challenging to persuade more Data Scientists to step up as Design Thinking leaders.

I think this is partly because, in doing so, we’re asking them to step out of their comfort zone:

  • We’re asking them to become change management leaders, and
  • We’re asking them to go from playing an individual sport to playing a team sport. Data Science is an individual sport by and large, and Design Thinking is most definitely a team sport.

So, how do we persuade more Data Scientists to step up to Design Thinking leadership?

Well, personally, in my evangelism, and also in my coaching of early career Data Scientists, I start by influencing their mindset. This is Step 1 in my 12 Step Program!

Data Scientists are a skeptical bunch.

Their initial reaction to Design Thinking leadership is often that it sounds like something a Product Manager should be doing.

And my answer to that is yes, exactly!

Step 1 in my 12 Step program is for the Data Scientist to see themselves as the Product Manager for a new product that must satisfy the needs of a very diverse group of customers.

Data Scientists are certainly creating new products.

The new product might be:

  • A one-time analysis, to provide internal consumers with insights on some burning issue, or
  • A decision support system, to allow internal consumers to find insights on their own, or
  • A decision engine, to, say, automatically generate recommendations for external consumers

I’m using consumer here to mean whoever is the target audience for this new product.

And the Data Scientist certainly serves a diverse group of customers.

Who is a customer? My definition of a customer is anyone who can derail the Data Science project!

The primary customer of course is the consumer, whether internal or external to the organization.

But most Data Science projects have a host of other customers, who may include

  • The executive sponsor
  • The sales and marketing team
  • The development team
  • The operations team
  • And so on

As you can imagine, the needs of each class of customers are very different.

And lack of support from any of these customers can derail the Data Science project.

Don’t get me wrong. I’m not saying there is a malicious conspiracy to kill the Data Science project. I’m saying the Data Science project gets deprioritized, or it doesn’t get sufficient resources allocated to it, etc.

So, my argument to Data Scientists is that, in a very real sense, the Data Scientist is the Product Manager for a new product that must satisfy the needs of a very diverse group of customers.

If you buy this argument, then it’s not inherently crazy that some Product Management best practices might also be helpful for Data Scientists.

But why exactly does the Design Thinking process help a Data Science project? Data Scientists, being scientists, want to know why it works; they want to know the mechanism.

I’ve found that it helps because it addresses the biggest reason why Data Science projects fail:

Bad algorithms.

No! Not bad algorithms. I’m just seeing if you’re paying attention!

Most Data Science projects fail because of this:

Getting any organization to adopt a new idea is exceedingly difficult, and most Data Scientists are terrible at it.

I don’t mean to be insulting when I say that, but as a Data Scientist myself, I just want to call a spade a spade.

Introducing a new idea into an organization is like injecting foreign DNA into an organism. It will most likely be rejected.

Pushing this analogy further, I’ve found that the Design Thinking process works because it transforms this foreign DNA into something that is recognized and embraced by the organism.

It transforms “my idea” into “our idea.”

And it transforms “my project” into “our project.”

So, on to the process.

To be clear, I’m not saying you need to apply the full machinery of the process to every Data Science project.

But I’ve found that applying even a lightweight version of the process is incredibly helpful.

I’m calling this process Design for Delight Plus Plus, because I’ve added some steps that are specific to Data Science projects.

The first stage in the Design for Delight process is Deep Customer Empathy.

The goals of the Data Scientist in this first stage are to

  • Understand the people you are solving for
  • Define the problem you are solving for them
  • Map the environment in which the solution must operate, and
  • Define the characteristics of a good solution

It’s very important in this first stage for the Data Scientist to be self-disciplined enough not to propose a solution. To remind ourselves of this at Intuit we say “Fall in love with the customer problem, not a particular solution”.

In the Deep Customer Empathy stage, the Data Scientist is a detective, interviewing and observing each class of customer (in the broad sense of customer), then synthesizing what you’ve learned.

In practice this means spending time with your customers, and asking them all the “W” questions.

Customer, from your perspective:

  • What is your role in the project?
  • What problem should this project be solving?
  • What is the rationale for this project?
  • What are your hopes and concerns around this project?
  • What data, systems, and processes should this project consider?
  • What criteria would you use to evaluate a proposed solution?
  • What criteria would you use to evaluate a deployed solution?

Most customers will appreciate the opportunity to clarify their own thinking by answering these questions. But some customers may not appreciate, for example, probing questions about the business rationale for the project. My best advice for the Data Scientist in those cases is to explain what you're doing and why you're doing it, then put on the big boy pants or the big girl pants as the case may be, and keep asking those probing questions!

The outputs from the Deep Customer Empathy stage are five artifacts that synthesize what you’ve learned from your detective work.

The purpose of these five artifacts is to help everyone involved in the Data Science project stay customer-focused (again, in the broad sense of customer) throughout the Data Science project.

And these five artifacts are the inputs to the next stage of the Design for Delight process.

And remember that we’re talking here about making simple posters, not about writing War and Peace!

The first set of artifacts are the personas.

Each persona summarizes what you’ve learned about a particular class of customers.

So you’ll create a persona for your primary customer - the consumer - and you’ll also create personas for your secondary customers – for example, an executive sponsor, a salesperson, an operations engineer, and so on.

The second set of artifacts are the problem statements.

Each persona should have a corresponding problem statement.

Don’t be surprised if this turns into a very difficult task. But I’ve found that if you can craft good problem statements, you are halfway to a good solution. As Einstein is supposed to have said “If I had an hour to solve a problem, and my life depended on it, I would spend 55 minutes thinking about the problem, and 5 minutes solving it.”

A good problem statement defines

  • Who has the problem
  • What they are trying to do
  • Why they want to do it
  • What’s blocking them from doing it, and
  • How that makes them feel
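To make the template concrete, the five elements above can be captured in a tiny structure. Here's a minimal sketch in Python; the field names and the example content are my own illustrations, not part of Design for Delight:

```python
from dataclasses import dataclass

@dataclass
class ProblemStatement:
    """One problem statement per persona (field names are illustrative)."""
    who: str      # who has the problem
    goal: str     # what they are trying to do
    why: str      # why they want to do it
    blocker: str  # what's blocking them from doing it
    feeling: str  # how that makes them feel

    def as_poster(self) -> str:
        # A one-sentence version suitable for a simple poster on the wall.
        return (f"{self.who} is trying to {self.goal} because {self.why}, "
                f"but {self.blocker}, which makes them feel {self.feeling}.")

# Hypothetical example for an executive-sponsor persona:
stmt = ProblemStatement(
    who="Our executive sponsor",
    goal="forecast quarterly churn",
    why="retention budgets are set a quarter ahead",
    blocker="current reports only show last month's churn",
    feeling="anxious about committing budget blindly",
)
print(stmt.as_poster())
```

One sentence per persona is usually enough; the point is a poster, not a specification document.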

Please don’t neglect feelings!

Emotion is the engine of decision making, so if your Data Science project can tap into your customers’ emotion, then you have exponentially increased its chances of success.

We can talk about this topic for a long time, but the bottom line is please don’t neglect feelings.

The third set of artifacts, which are an extension specific to Data Science projects, are descriptions of the environment in which the solution must operate.

This includes the available data to drive algorithms, the current systems and processes that a solution must integrate with, and so on.

It’s especially important to understand the current systems and processes, including both machines and humans, because systems and processes are exceedingly hard to change. So it’s highly advisable at this early stage to understand which aspects are going to be easier to change and which aspects are going to be harder to change. If you’ve ever shucked an oyster, you know how important it is to find out where the hinge is located.

The fourth artifact is the criteria that will be used to evaluate a proposed solution.

Remember, these criteria come from interviewing your customers.

So, as the Data Scientist, don’t be surprised if these criteria include interpretability and simplicity of the algorithm.

It’s just human nature to mistrust something new that you don’t understand, and that you can’t fix if it breaks.

This criterion can be a real bummer for some Data Scientists because some of us like to apply the latest, greatest, most complex, most inscrutable machine learning algorithm. But I’ve found it’s better to discover such a criterion earlier rather than later.

The fifth, and final, artifact is the criteria that will be used to evaluate the solution once it’s deployed.

Again, as the Data Scientist, don’t be surprised if an internal customer, for example the executive sponsor, wants assurance that the algorithm is working at all times.

This goes back to the trust issue!

So a one-time or even periodic cross-validation may not be sufficient to satisfy this customer. They may want to know that the algorithm is working on a daily basis, which probably means that the machine learning pipeline has to include testing, monitoring, and alerting from the very beginning.
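A daily health check of this kind can be very simple. Here's a minimal sketch; the metric, the threshold, and the alerting hook are all assumptions for illustration, and in practice you'd wire the alert into your organization's actual monitoring system:

```python
def daily_health_check(y_true, y_pred, min_accuracy=0.80, alert=print):
    """Compare one day's predictions against observed outcomes; alert on degradation.

    y_true, y_pred: observed and predicted labels for the day's traffic.
    min_accuracy:   acceptable floor (illustrative threshold, not a standard).
    alert:          callback into your monitoring/alerting system (stubbed with print).
    Returns the measured accuracy so it can also be logged and plotted over time.
    """
    if not y_true:
        alert("ALERT: no labeled outcomes received today")
        return 0.0
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    if accuracy < min_accuracy:
        alert(f"ALERT: model accuracy {accuracy:.2%} fell below {min_accuracy:.2%}")
    return accuracy
```

Even a check this crude gives the executive sponsor a daily "the algorithm is still working" signal, which is exactly the assurance they asked for.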

So, that’s the first stage of Design for Delight, Building Customer Empathy.

The second stage in the Design for Delight process is Go Broad to Go Narrow.

This stage requires the Data Scientist to convene a diverse group of customers for a brainstorming session where they generate lots of solution ideas, filter to select the best solution idea, frame a solution hypothesis, and finally define rapid experiments to test the solution hypothesis.

Depending on the details of the Data Science project, this stage may require several brainstorming sessions, each dealing with a different facet of the problem, say the consumer experience created by the algorithm, the architecture of the machine learning pipeline, and so on.

I’ve found that these brainstorming sessions are most effective when they run for about 2 hours and include about 6-8 customers.

The role of the Data Scientist in this second stage is to convene a diverse group of customers, and then to act as the facilitator of the brainstorming session.

This diversity requirement might sound odd to some Data Scientists, who might argue "What does a non-technical customer know about algorithms?"

The answer is that they may not know anything about algorithms, but they probably know a lot about the problem domain, and they know the characteristics of a good solution.

But the main reason for not excluding any class of customers from the brainstorming session, is that the act of participating makes customers feel that the solution has their fingerprints on it, and that makes customers feel invested in the success of the solution.

Remember, the goal is to go from “my idea” to “our idea.”

The first step in the brainstorming session is for the Data Scientist as the Design Thinking leader to post the “rules of the road” for the brainstorming session up on a wall, and to review these with the group.

It’s all common sense, it’s what we learned in kindergarten, but I’ve found it greatly helps the brainstorming session to go smoothly.

The second step in the brainstorming session is for the Data Scientist to post all the artifacts from the Deep Customer Empathy stage up on the wall, and to review each artifact with the group.

Remember we’re talking about simple posters here, so this shouldn’t take a long time. Perhaps 15 minutes.

But I’ve found that this practice is incredibly helpful in keeping everyone grounded and focused throughout the brainstorming session, especially if someone is easily distracted or perhaps has a competing agenda.

The third step in the brainstorming session is generating and filtering solution ideas.

Here are the mechanics of facilitation that I’ve found work best:

  • Remind everyone of the problem statement
  • Next have everyone quietly think for 15 minutes, and write down their ideas on stickies
  • Then have each person individually share their ideas, while all the others listen and build on those ideas
  • Next, as a group organize the ideas into themes, and collapse similar ideas into one idea wherever possible
  • Then remind everyone of the criteria for evaluating a proposed solution
  • And finally vote on the best idea

I’ve found that it’s helpful to give each person say 3 voting dots, so that the loudest or highest paid person in the room doesn’t determine the outcome.

The final step of the brainstorming session is for the group to take the best idea and formulate a solution hypothesis:

  • State the best idea succinctly
  • List the leap-of-faith assumptions – the things that must be true for the idea to work, and
  • Identify the minimal experiment that will validate or invalidate those assumptions

For a Data Science project, the leap-of-faith assumptions may be:

  • The availability and condition of data, or
  • The feasibility of building an accurate, fast, and scalable algorithm, or
  • The feasibility of building the machine learning pipeline

At this point, we’ve arrived at the end of the second stage of Design for Delight.

And, at this point, the Data Science project should have gone from “my project” to “our project.”

The third and final stage of the Design for Delight process is Rapid Experiments with Customers.

For the Data Science project, this usually translates to rapid prototyping and iterative development.

The role of the Data Scientist in this third stage is to do iterative development starting from the solution hypothesis, and to get feedback on each iteration from the customers, so that you stay grounded in the customers and their problems, you don’t over-engineer or fall in love with your solution, and even more importantly that you maintain the sense of joint ownership.

Of course, the solution that is hardest to over-engineer, and the hardest to fall in love with, is the paper prototype. So I highly recommend it!

As a Data Scientist, you can use rough sketches and storyboards to produce a paper prototype whether your new product is a one-time analysis, or a decision support system, or a decision engine.

It’s just emotionally a lot easier to tear up, throw away, and redo a paper prototype than something that you’ve been polishing even for a day.

There is a very important extension to this last stage of Design for Delight that is specific to Data Science projects.

I’ve found it’s incredibly helpful, very early in a Data Science project, to create a stable methodology and a stable dataset for objectively comparing algorithm A with algorithm B.

And I’ve also found it’s incredibly helpful at that time to also implement the simplest baseline algorithm you can possibly imagine.

This sounds so obvious, but it's amazing how many times Data Scientists don't do it, and I constantly have to remind myself to do it.

Having a stable testing methodology and dataset, and a simple baseline algorithm, is a great tonic against over-engineering. Because as you fiddle with your algorithm, you and everyone else can see when you are reaching the point of diminishing returns.

Start with a dumb algorithm, but a smart test.
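Here's a minimal sketch of that idea in plain Python: a frozen evaluation harness plus a majority-class baseline. The dataset and metric are made up for illustration; in practice you'd freeze your real holdout data and could use something like scikit-learn's DummyClassifier as the baseline:

```python
from collections import Counter

# Frozen evaluation: the same dataset and the same metric for every
# algorithm A vs. algorithm B comparison. (Illustrative stand-in data.)
HOLDOUT = [({"amount": 120}, "fraud"), ({"amount": 5}, "ok"),
           ({"amount": 7}, "ok"), ({"amount": 200}, "fraud"),
           ({"amount": 3}, "ok")]

def evaluate(predict):
    """Accuracy of any predict(features) -> label function on the frozen holdout."""
    return sum(predict(x) == y for x, y in HOLDOUT) / len(HOLDOUT)

# The simplest baseline imaginable: always predict the majority class.
majority = Counter(y for _, y in HOLDOUT).most_common(1)[0][0]
baseline = lambda x: majority

# Any fancier algorithm only earns its keep by beating this number:
print(f"baseline accuracy: {evaluate(baseline):.2f}")
```

The harness stays fixed while the algorithms change, so everyone can see, iteration by iteration, whether the added complexity is actually buying anything.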

I grew up in Australia. And, as we say in Australia, at this point we’re almost home and hosed. In case you’re curious, it's a horse-racing thing. You finish your race, and you get hosed down. So, we’re almost home and hosed. See, you’ve learned something in this talk! Nobody can take that away from you.

Starting from a paper prototype, the Data Scientist iterates to produce higher and higher fidelity solutions, taking the customers along on the journey.

And, you’ll notice the Data Scientist doesn’t work on the nose first, get that exactly perfect, then move on to the eye next, and get that exactly perfect, etc.

The goal is to quickly get something working end-to-end with

  • a basic algorithm
  • basic integrations, and
  • basic user interfaces

Why? Because generally these are where the leap-of-faith assumptions reside.

A lovely secondary benefit of rapid prototyping and iterative development is that it encourages the Data Scientist to use off-the-shelf components, rather than building their own from the ground up, which many of us enjoy doing. I swear if I see another home-grown k-means library, I'll be on the front page of KDnuggets for some violent act.

So there you have it, a Design Thinking process tailored to Data Science projects.

Easy, right?

Sadly, not easy, but very worthwhile.

I said earlier that Intuit has an advantage because Design Thinking is already in our DNA. But what if you want to try this process, and Design Thinking is not yet in your organization’s DNA?

Here’s what I’d suggest:

  • Start small
  • Tell people what you are doing and why you're doing it
  • Frame it as a short-term experiment, not as a massive culture shift
  • Start with one small project
  • Role model the process in that project
  • Get a collective win in that project, and
  • Gradually get like-minded people to role model the process in their projects

I hope you take two things away from this talk:

  • A different way of thinking about why Data Science projects fail, and what you can do about that
  • A specific Design Thinking process that can help Data Scientists increase the odds of success for their projects

If you want to try this at home, here is a link to a Design for Delight handbook that you might find helpful.

And of course please feel free to reach out to me with any questions. I’d love to hear your thoughts and experiences around this topic.

Thank you so much for listening, and good luck on your Design Thinking journey!

Also, feel free to connect with me on LinkedIn.
