All posts by Mike Zawitkowski

Mike Zawitkowski helps you create that magical blend of technology and humanity that can change worlds. As a data scientist and expert advisor to leaders across the globe, Mike has deep expertise in dramatically improving how engaged employees and customers are with your organization.

Less Vain Growth Metrics

[Chart: cumulative signups over time, from a real startup's funding pitch deck]

(This is part two of a series.  Previously I criticized the use of the “One Metric That Matters” approach to business management. You can read the previous post here.)

Above is a chart from the funding pitch deck of a real startup. Nearly everything in it has been altered to protect everyone involved. The chart was used in the deck because it looks like the kind of chart you would think every investor wants to see: it goes up and to the right and shows an exponential growth curve.

In this post I’ll argue that growth is not something you can or should manage with just one metric. As with most aspects of a business, trying to apply the “One Metric That Matters” (OMTM) to a company you wish to be successful is a myopic recipe for failure. Instead you must use a collection of metrics to provide a realistic assessment.

Growth seems pretty simple, right? “The numbers need to go up and to the right” in a scale that suggests you and your company are “riding a rocket.” Perhaps you think you “have the bull by the horns” or the “tiger by the tail.” It sounds exciting, a little terrifying, but positive, right?

Vanity and Theater

We as a culture are obsessed with growth, and the media has fanned this obsession with what Lean Startup author Eric Ries refers to in his writing and speaking as “vanity metrics” employed in the service of “success theater.” This is the publicizing of some number that is so large as to appear sensational. Its purpose is to give the press something to write about, entice prospective investors, and make your competitors jealous. The truth is that the number means nothing, and certainly provides no insights that your competition can use against you. 

One of my favorite recordings from the Commonwealth Club of California is this conversation between Eric Ries and 500 Startups founder Dave McClure. The example they describe is that if you are talking about having a million clicks to your website a week, that could be from one million individuals that each visit once and then abandon it, never again to return. Those million clicks could also be a single user who is clicking a million times over and over (probably a bot, or your mom).

It’s true that there are some stories of incredible startups that became giant, like Instagram and Facebook, and in retrospect we can look at the history of those companies and see giant numbers. We make the mistake of believing that giant numbers cause a huge success, when more accurately it is the huge success that produces giant numbers as a by-product.

For every company that was as successful as Instagram or Facebook (that’s a very small club), there are hundreds that started out similarly but died. We can look through the dead pool and find examples of startups that everybody thought would be the next big thing. Here’s an article by Business Insider reporting on what happened to some particularly overhyped companies.

Opportunity Costs

Let’s return to the chart from earlier. The main problem with this chart is that it is based on a single number: cumulative signups. It is impossible for a cumulative signup number to decrease over time, unless your company is proactively deleting accounts and factoring that into the equation. Success theater, however, is an arms race, and no company is going to intentionally make this chart look less sensational by doing something like deleting users. The same problem afflicts metrics like total installs, total app store downloads, or any number that is aggregated and totaled like this.
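To see why a cumulative metric hides trouble, here is a minimal sketch with made-up numbers: weekly signups collapse, yet the cumulative line still climbs and looks like "growth."

```python
# Weekly signups are clearly dying, but the cumulative total can only rise.
# These numbers are illustrative, not from any real company.
weekly_signups = [500, 400, 300, 100, 20, 5]

cumulative = []
total = 0
for n in weekly_signups:
    total += n  # a cumulative total never decreases
    cumulative.append(total)

print(weekly_signups)  # the real story: growth is collapsing
print(cumulative)      # the pitch-deck story: up and to the right
```

Plot the second list and you get exactly the kind of chart that opened this post.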

What’s worse, spending your precious time, talent, and dollars on activities that serve success theater and vanity metrics becomes very expensive when you consider the opportunity costs involved. A startup that chooses options that make the numbers more sensational over what makes for a better customer experience is a startup that has been seduced into building a sub-par software stack, database architecture, user features, and company workflow.

A Better Way

Stop being so vain! Instead of spending any time on this sort of vanity number to track and flaunt your “growth,” there’s a better way. Below are two types of analysis that in the long run will give you a clearer, better picture of the growth and health of your company:

[Chart: example cohort analysis]

Cohort Analysis

Look at the behavior of users over time. An easier way to do this than examining users individually is to look at them in cohorts. A cohort in this context is a group of users who share a time-based milestone. It’s like a graduating class. If you are making weekly changes to your product, try grouping individual users into weekly cohorts. These might be the individuals who completed the gauntlet of your signup process using the set of features that were available that week. For a slower-moving company, a monthly view may make more sense. The important thing is to make sure that each cohort is large enough to be statistically meaningful. A weekly cohort with a total of just 3 users isn’t going to give you information you can trust.

If a new feature improves your product, that improvement will be reflected in the numbers for that cohort. If you break the experience or make it worse somehow, you’ll see that too, because future cohorts will not look as good as the ones from previous weeks.
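Here is a minimal sketch of a weekly cohort analysis in Python with pandas. The event log is made-up illustrative data; in practice you would load your own signup and activity events.

```python
# A minimal weekly cohort analysis: group users by the week they first
# appeared, then count distinct active users per cohort per week.
# The event log below is made-up illustrative data.
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "event_date": pd.to_datetime([
        "2016-08-01", "2016-08-09", "2016-08-16",  # user 1 keeps returning
        "2016-08-02", "2016-08-03",                # user 2 churns after week 0
        "2016-08-08", "2016-08-15", "2016-08-22",  # user 3 joins a week later
        "2016-08-10",                              # user 4 is one-and-done
    ]),
})

# Cohort = the Monday of the week each user first appeared
# (their "graduating class").
first_seen = events.groupby("user_id")["event_date"].transform("min")
events["cohort_week"] = first_seen.dt.to_period("W").dt.start_time
events["weeks_since_signup"] = (events["event_date"] - first_seen).dt.days // 7

# Rows: cohorts. Columns: weeks since signup. Values: distinct active users.
cohorts = (
    events.groupby(["cohort_week", "weeks_since_signup"])["user_id"]
    .nunique()
    .unstack(fill_value=0)
)
print(cohorts)
```

Reading across a row shows how well one "graduating class" retains over time; comparing rows shows whether the product changes you shipped between cohorts helped or hurt.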

There’s more about cohort analysis I can and should discuss but I’ll save it for another post. One of my favorite references for generating cohort analysis in Python is this post by Greg Reda. Please comment or contact me if you have specific questions about cohort analysis and I’ll be glad to help.

DAU:MAU

This is one of my favorite measurements. It is incredibly easy to calculate, and there is a wealth of publicly available data you can use to compare your metrics against other companies’. It also seems fairly robust across industries, although you should always check this before blindly trusting external data. DAU is an acronym for “daily active users”; MAU stands for “monthly active users.”

The way this works is that for each 24-hour period you count the total number of distinct users who engaged with your product. Assuming you are logging a user id along with a timestamp, this is fairly easy to calculate. If a user id appears inside a given 24-hour window, you add it to that day’s tally. Every user gets counted only once per day: if a user clicks 1,000 times in a single day, they are still counted one time.

Then you do the same analysis again, but with a 30-day window instead of a 1-day window. That gives you your MAU number. Some teams stick with calendar months; I’ve also seen it calculated by taking a single day and counting backwards over the trailing 30 days. I prefer the latter approach.

Finally, you simply divide your DAU by your MAU. The result is your DAU:MAU ratio, and it is a number I believe is very useful. I’m not alone in this assessment; you can read more about DAU:MAU here, here, and here.
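The whole calculation can be sketched in a few lines of Python, using the trailing-30-day window described above. The event log is made-up illustrative data; each event is a (user_id, date) pair.

```python
# A minimal sketch of DAU, trailing-30-day MAU, and the DAU:MAU ratio,
# computed from a raw event log of (user_id, date) pairs.
# The log below is made-up illustrative data.
from datetime import date, timedelta

events = [
    (1, date(2016, 8, 1)), (1, date(2016, 8, 15)), (1, date(2016, 8, 15)),
    (2, date(2016, 8, 10)), (2, date(2016, 8, 15)),
    (3, date(2016, 7, 20)),   # active earlier in the 30-day window
    (4, date(2016, 8, 14)),
]

def dau(events, day):
    # Each user counts once per day, no matter how many events they logged.
    return len({uid for uid, d in events if d == day})

def mau(events, day, window=30):
    # Distinct users active in the trailing `window` days ending on `day`.
    start = day - timedelta(days=window - 1)
    return len({uid for uid, d in events if start <= d <= day})

day = date(2016, 8, 15)
ratio = dau(events, day) / mau(events, day)
print(dau(events, day), mau(events, day), ratio)  # 2 4 0.5
```

Note that user 1 appears twice on August 15 but is counted once, exactly as described above: sets deduplicate for free.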

Another Disclaimer

It’s important to note that you should use both approaches, and more. Paying attention to DAU:MAU without doing cohort analysis will make it difficult to understand the longevity and retention of your users. Conversely, ignoring DAU:MAU to focus only on cohort analysis means you’ll miss spikes or drops in behavior.

Besides the above examples there are other important indicators of the health of the business that your company should monitor. For instance, how comfortable are you with the balance sheet? Have you considered alternative runway scenarios in your forecast?

Still, if you are reviewing DAU:MAU plus cohort analysis charts regularly, you are doing better than at least 80% of the companies I’ve worked with. Do this now so that when we work together we can jump past the basics.

 

That’s it, for now. I hope this encourages you and your company to do less playing around with vanity metrics and success theater. I hope it will lead to you seeing a more accurate picture of the growth and health of your company. If you like this post and would like to see more like it, please let me know!

 

The Deadly “One Metric That Matters”

Have you heard of the “one metric that matters” (OMTM)? This concept was first introduced to me through the book Lean Analytics, by Alistair Croll and Ben Yoskovitz.

I have a lot of respect for Ben and Alistair. I’m a proud owner of two copies of Lean Analytics. I keep one copy at home and one at the office because I refer to them at least every couple of months.

It’s a good book with solid advice, save for a few exceptions. The OMTM concept is one of them. In a series of posts, I’ll explain why the OMTM is dangerous advice, starting with why it could leave you high but dry.

OMTM vs TMTM

When I was checking Google for what others say about the OMTM, I came across this one post from an entrepreneur who went through Y Combinator. This blogger cites an essay by the YC co-founder Paul Graham, “Startup = Growth,” as evidence that the one metric that matters is the real deal.

This blogger and Paul Graham are not alone in their opinion about the importance of growth for startups. Many in Silicon Valley advocate that growth is the fundamental characteristic separating a startup from any other type of business endeavor. In most cases I would agree with this sentiment, but I would also stress that focusing on growth to the exclusion of all other metrics is one of the fastest ways to kill a company. There is a second metric that can’t be ignored, and it’s best described by Graham’s idea of “cockroach mode,” which is somewhat contradictory to, or complementary with, the OMTM.

In a guide to investors Paul Graham defines what it means for a startup to operate in “cockroach mode”:

Apparently the most likely animals to be left alive after a nuclear war are cockroaches because they’re so hard to kill. That’s what you want to be as a startup, initially. 

Graham’s point about cockroach mode is that your startup should be able to operate on so little money that it won’t die even in a nuclear winter of funding. A startup’s ability to manage its working capital is extremely important, and guess what: this is the other metric. You can’t spend all of your investment on a single month of growth just because growth is your OMTM. You also can’t make capital your OMTM, ignore growth, and expect to make it to a Series A or become a profitable business. Balancing these two metrics together is the real way to achieve what I’m calling the “two metrics that matter” (TMTM).

Soylent as the TMTM 

Have you heard of Soylent from Rosa Labs? This startup succeeded in pulling off a massive pivot thanks to the clever use of TMTM. The story published by Inc. summarizes how co-founder Rob Rhinehart began to resent the cost of food and how it was eating away the little startup capital they had to work with while they were in YC. Despite running a wireless technology startup, Rhinehart decided to address the capital metric by making low-cost food. This startup wasn’t created because the team focused on growth; it was born out of necessity as they watched the bank account balance shrink week after week. It was a solution intended to resolve the capital problem of the original wireless startup idea, and it ended up having a growth metric of its own.

Blindsided by OMTM

Here is an example where TMTM narrowly saved a startup from financial ruin. I met a startup that almost died adhering to the OMTM mantra. This hardware startup was founded by a young team of first-time entrepreneurs who had secured a couple million dollars while they were still pre-revenue. They had been spending the money for the better part of a year building prototypes, dreaming big, and even landing customers, but they weren’t tracking the capital metric. Fortunately, a friend of mine who is a CFO was brought in to help the company. One of his first questions was, “Where are you keeping track of your spending? Are you using QuickBooks?”

The answer was no, they were not using any bookkeeping solution. The focus was on getting customers: the growth metric. Two weeks into the job, this CFO quickly set up their accounting system, populated it with the necessary data, and made a discovery. He sat down with the young lions to share this news: ‘At your current rate of spending, you will be bankrupt in two months. Did you realize that last month when you spent this much money?’

The team did not realize they were burning so quickly through their cash, nor did they realize they were two months away from empty bank accounts. The new mission for this very promising startup was to throw out the OMTM and balance their metrics by meeting with investors to extend the runway.

This is why the one-metric-that-matters philosophy is dangerous, particularly to new startup founders. We want entrepreneurs to succeed; successful startups grow into big companies and add to our economy. Don’t let hyper-focused founders use the OMTM to burn unnecessary holes in investors’ wallets. We can prevent startups and investors from getting burned by teaching them the collection of important metrics that measure the health of any business, including rapid-growth startups.

Conclusion

I wish Ben and Alistair had called their theory the “most important metric” (MIM). Another good, if lengthy, option would have been the “one metric that matters provided your other metrics look good too” (OMTMPYOMLGT). At the very least, we need TMTM. Unfortunately, I couldn’t find any such wiggle room in the OMTM literature. The concept suffers from as much tunnel vision as the founders who employ it.

I’m urgently cautioning you, dear reader, about the OMTM danger. You simply cannot be successful by focusing on one metric to the exclusion of other business metrics. There’s a reason the income statement, cash flow statement, and balance sheet are all important to managing a business: all three are used in conjunction to tell the story. It’s possible to have a great-looking income statement (aka profit and loss statement) and still go out of business quickly for lack of cash.

In the next post in this OMTM series, I’ll describe how even growth is not one metric but a collection of metrics that provide a realistic assessment of viability. Put down the rose-colored glasses of the OMTM.

 

3 Stolen Analytics Team Workflows

I googled “data science team photo” and found this pic of the 1876 Yale Bulldogs, national champions. Apparently this predates photographers yelling, “Look at the camera, and say cheese!” (Courtesy Wikipedia)

TL;DR: Data science teams don’t need to create a new way to work together. It’s better to steal ideas on how to collaborate from older, more established disciplines. Below are three possible models for your data science team to improve collaboration on your next data science project.


Obama on Data Science Team Sports

At Strata + Hadoop 2015, our Commander-in-Chief had a very important message to share with us data scientists. In this video presentation, President Obama decreed, “data science is a team sport.”

Data science is no different than any other activity where multiple brains are better than one. Its close relative, software engineering, has already explored and established ways to work together as a team. We don’t need to re-invent the wheel. We can borrow and steal collaborative approaches from those disciplines that struggled before us.

Here are three examples you can use to improve your data science team’s effectiveness, or simply to collaborate better on your next analytics project.

Relay Race Pipeline Model

This model is easy to understand and easy to implement, but it has some drawbacks. It works well when the pipeline has clear parts: a beginning, a middle, and an end. In this example you would have three people, each responsible for one of those three parts. You want to make sure someone is responsible for each leg of the relay race so your project gets across the finish line.

The problem is that real-world data science projects rarely move in one direction. Just like software projects attempting waterfall, they get messy and often need to backtrack. That’s OK, though: the clear division of labor, with one person in charge of the front-end visualization, another in charge of the cleaning and munging, and another responsible for feeding the raw source data into the top of the pipeline, makes communication easy. It’s really about the roles; ignore the other parts of the relay race metaphor.

Microservices Delegation Model

This is an upgrade from the relay race, because it’s multi-dimensional. I credit Tom DeMarco, the author of Peopleware and The Deadline, for clearly explaining it.

In one of those books (and possibly both), DeMarco argues that work should be delegated in an isometric fashion: if you have a blob representing 100% of a project, it should be carved up into sub-components small and well-defined enough for each one to be owned by a single person.

I would argue that what DeMarco described is the basis for microservice architectures, except that he wrote about the concept many years before the word became the hype it is today.

Another way to imagine this is that each part of your project is a sub-component, and each sub-component is a black box that accepts some input and provides some output. The sub-components don’t have to run in any particular order. For instance, someone can make a little mini-app that receives a zipcode plus a latitude and longitude as input and returns true if they match, otherwise false. A tight little single-purpose program is modular, which makes it easy to reuse in future projects, too.
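The zipcode mini-app above might look like the toy sketch below. The centroid table and distance tolerance are illustrative stand-ins; a real version would query a proper geocoding dataset.

```python
# A toy version of the single-purpose "black box" described above: it takes
# a zipcode plus a latitude/longitude and returns True if they match.
# The lookup table and tolerance are illustrative stand-ins, not real data
# sources; a real service would query a geocoding dataset.
ZIP_CENTROIDS = {
    "94103": (37.7726, -122.4099),  # San Francisco, CA
    "10001": (40.7506, -73.9972),   # New York, NY
}

def zip_matches(zipcode, lat, lon, tolerance=0.05):
    """Return True if (lat, lon) falls near the zipcode's centroid."""
    centroid = ZIP_CENTROIDS.get(zipcode)
    if centroid is None:
        return False  # unknown zipcode: can't confirm a match
    return (abs(lat - centroid[0]) <= tolerance
            and abs(lon - centroid[1]) <= tolerance)

print(zip_matches("94103", 37.77, -122.41))  # True: near the SF centroid
print(zip_matches("94103", 40.75, -73.99))   # False: those are NY coordinates
```

Because the function does exactly one thing behind a tiny interface, another teammate can swap in a better data source later without anyone else's code changing.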

Open Source Model

This is the most sophisticated of the three examples, but perhaps the most important. The software world has enabled distributed teams to collaborate on great software for many years. There are successes and failures from that experience worth considering when trying to work as a team on data science.

This is even more important when working with volunteers, as on projects like Linux, Firefox, or Code For America, where I have the honor of being part of the Data Science Working Group of the SF Brigade.

I really like the world of open source as a model for effectively collaborating with other people. Someone makes a copy of your code or analytics package, makes changes to it, and then shares it back with you. As the owner of the original masterpiece, you can choose to accept their suggestion or ignore it. If you accept it, the change is merged automatically; you don’t have to futz with the code. And if something does need work, you can record it as an “issue,” and someone who wants to become a contributor can put their name down.

Version Control is Mandatory

I’ve heard an argument against this approach for volunteer projects like the ones we do at Code For America: when your volunteers may be entry-level analysts and data scientists, requiring version control or a particular service like GitHub is an unnecessary barrier to entry. At the same time, the same people who think GitHub is too complicated are usually the ones who struggle to merge everyone’s analytical contributions at the end of the project.

I would challenge the arguments against volunteers learning version control with this: a data scientist is jokingly defined as someone who knows more statistics than the average programmer and more programming than the average statistician. There are certain skills you must learn in order to be a data scientist. You must learn a programming language like R or Python. You must understand basic statistics. I would argue that understanding version control is just as important.

You may not make a lifelong career out of data science. Regardless, understanding the basics of version control will give you a major advantage over those who don’t. If you know how to contribute to an open source project through a system like GitHub, you have a powerful skill to add to your resume, and actual contributions to the open source community are great experience. If you are volunteering with a team like the one we have at Code For San Francisco, is it because you want to learn? Because you want to collaborate with other great people? Then why limit yourself by choosing not to learn a tool as powerful as version control?

There are plenty of resources that explain how GitHub works and how to contribute to open source with it, as well as plenty of articles about getting started with open source in general. By the way, you don’t need to be a programmer to contribute to open source projects. Also, GitHub is not the only option out there, though it is arguably the most popular.

Better Teams are Thieves

The point of this post was to jog your brain about theft. Steal ideas from the successes and failures of disciplines that have already figured out how to play as a team. Learn how these non-data-science models work, and emulate the parts that will make your next project better. This won’t just improve your data science: being a thief of teamwork models will reward you in unexpected ways. In the meantime, go forth and be a more successful, happier collaborator on your next project.