Using ChatGPT for a job search: what worked, didn’t, and what’s dangerously bad

(I didn’t use ChatGPT for any part of writing this, and there’s no “ha ha actually I did” at the end)

This year, I quit after three years during which I’d neglected to update my resume or online profiles and hadn’t done anything you could consider networking (in fairness, it’s been a weird three years). All the things you’re supposed to keep up on so you’re prepared, I didn’t do any of them.

And as a product person, I wanted to exercise these tools, so I tried to use them in every aspect of my job search. I subscribed, used ChatGPT 4 throughout, and here’s what happened:

ChatGPT was great for:

  • Rewriting things, such as reducing a resume or a cover letter
  • Interview prep

It was useful for:

  • Comparing resumes to a job description and offering analysis
  • Industry research and comparison work

I don’t know if it helped at:

  • Keyword stuffing
  • Success rates, generally
  • Success in particular with AI screening tools

It was terrible, in some cases harmful, at:

  • Anything where there’s latitude for confabulation — it really is like having an eager-to-please research assistant who has dosed something
  • Writing from scratch
  • Finding jobs and job resources

This job search ran from May until August of 2023, when I started at Sila.

An aside, on job hunting and the AI arms race

It is incredible how hostile this is on all sides. As someone hiring, I was swamped by the volume of resumes, many of them entirely irrelevant to the position no matter how carefully crafted the job description was. I like to screen resumes myself, and that meant I spent a chunk of every day scanning a resume and immediately hitting the “reject” hotkey in Greenhouse.

In a world where everyone’s armed with tools that spam AI-generated resumes tailored to meet the job description, screening that way is going to be impossible. I might write a follow-up on where I see that going (let me know if there’s any interest in that).

From an applicant standpoint, it’s already a world where no response is the default, form responses months later are frequent, and it’s nigh-impossible to get someone to look at your resume. So there’s a huge incentive to arm up: if every company makes me complete an application process that takes a minimum of 15 minutes and then doesn’t reply, why not use tools to automate that and apply to every job?

And a quick caution about relying on ChatGPT in two ways

ChatGPT is unreliable right now, in both the “is it up” sense and the “can you rely on results” sense. As I wrote this, I went back to copy examples from my ChatGPT history and it just would not load them. No error, nothing. This isn’t a surprise — during the months I used it, I’d frequently encounter outages, both large (like right now) and small, where it would error on a particular answer.

When it is working, the quality of that work can be all over the place. There are questions I once got excellent responses to that, as I check my work now, just trigger a web search for a reworded version of the query, follow a couple of links, and summarize whatever SEO garbage was ingested.

While yes, this is all in its infancy and so forth: if you have to get something done by a deadline, don’t depend on ChatGPT to get you there.

Then there’s the “can you rely on the results” sense. I’ll give examples as I go, but even using ChatGPT 4 throughout, I frequently encountered confabulation. I heard these language models described as eager-to-please research assistants armed with Wikipedia and tripping on a modest dose of mushrooms, and that’s the best way to describe it.

Don’t copy-paste anything from ChatGPT or any LLM without looking at it closely.

What ChatGPT was great for

Rewriting

I hadn’t done a deep resume scrub in years, so I needed to add my last three years in and chop my already long and wordy resume down to something humans could read (and here I’ll add: if you’re submitting to an Applicant Tracking System, who cares, try to hit all the keywords) while keeping the whole thing to a reasonable length. As a wordy person with a long career, I needed to get the person-readable version down to a couple of pages. ChatGPT was a huge help there: I could feed it my resume and a JD and say “what can I cut out of here that’s not relevant?” or “help me get to 2,000 words” or “this draft I wrote goes back and forth between present and past tense, can you rewrite it in past tense?”

I’d still want to tweak the text, but there were times where I had re-written something so many times I couldn’t see the errors, and ChatGPT turned out a revision that got me there. And in these cases, I rarely caught an instance of facts being changed.

Interview Prep

I hadn’t interviewed in years, either, and found trying to get answers off Glassdoor, Indeed, and other sites was a huge hassle, because of forced logins, the web being increasingly unsearchable and unreadable, all that.

So I’d give ChatGPT something along the lines of

Act as a recruiter conducting a screening interview. I’ll paste the job description and my resume in below. Ask me interview questions for this role, and after each answer I give, concisely offer 2-3 strengths and weaknesses of the answer, along with 2-3 suggestions.

This was so helpful. The opportunity to sit and think without wasting anyone’s time was excellent, and the evaluations of the answers were helpful to think about. I practiced answering out loud to get better at giving my answers on my feet, and I’d save good points and examples I’d made so I could be sure to hit them.

I attempted having ChatGPT drill into answers (adding an instruction such as “…then, ask a follow-up question on a detail”) and I never got these to be worthwhile.

What ChatGPT was useful for

Comparing resumes to a job description and offering analysis

Job descriptions are long, so boring (and shouldn’t be!), often repetitive from section to section, and they’re all structured just differently enough to make the job-search-fatigued reader fall asleep on their keyboards.

I’d paste the JD and the latest copy of my resume in and say “what are the strengths and weaknesses of this resume compared to this job description?” and I’d almost always get back a couple of things on both sides that were worth calling out, and why:

“The job description repeatedly mentions using Tableau for data analysis work, and the resume does not mention familiarity with Tableau in any role.”

“The company’s commitment to environmental causes is a strong emphasis in the About Us and in the job description itself, while the resume does not…”

Most of these were useful for tailoring a resume: they’d flag that the JD called for something I’d done, but hadn’t included on my resume for space reasons since no one else cared.

It was also good at thinking about what interview questions might come, and what I might want to address in a cover letter.

An annoying downside was that it frequently flagged things a human wouldn’t have. I hadn’t expected this, given the descriptions of how good LLMs and ChatGPT were at knowing that “managing” and “supervising” are pretty close in meaning. For me, this meant being told I hadn’t worked in finance technology, even though my last position was at a bank’s technology arm. For a while, I would say “you mentioned this, but this is true,” and it would do the classic “I apologize for the confusion…” and offer another point, but it was rarely worth it: if I didn’t get useful points in the first response, I’d move on.

Industry research and comparison work

This varied more than any other answer. Sometimes I would ask about a company I was unfamiliar with, request a summary of its history, competitors, and current products, and get something that checked out 100% and was extremely helpful. Other times it was understandably off; so many tech companies have similar names, it’s crazy. And still other times it was worthless: the information would be wrong but plausible, or haphazard, or lazy.

Figuring out whether an answer is correct requires effort on your part, but usually I could eyeball one and immediately know if it was worth reading.

It felt sometimes like an embarrassed and unprepared student making up an answer after being called on in class: “Uhhhh yeahhhhh, competitors of this fintech startup that do one very specific thing are… Amazon! They do… payments. And take credit cards. And another issssss uhhhhh Square! Or American Express!”

Again, eager-to-please — ChatGPT would give terrible answers rather than no answer.

I don’t know if ChatGPT helped on

Keyword stuffing

Many people during my job search told me this was amazingly important, and I tried this — “rewrite this resume to include relevant keywords from this job description.” It turned out what seemed like a pretty decent, if spammy-reading, resume, and I’d turn it in.

I didn’t see any difference in response rates when I did this, though my control group was using my basic resume and checking for clear gaps I could address (see above), so perhaps that was good enough?

From how people described the importance of keyword stuffing, though, I’d have expected the response rate to go through the roof, and it stayed at basically zero.

Success rates, generally and versus screening AI

I didn’t feel like there was much of a return on any of this. If I hadn’t felt like using ChatGPT for rewrites was improving the quality of my resumes as I saw them, I’d have given up.

One of the reasons people told me to do keyword stuffing (and often, that I should just paste the JD in at the end, in 0-point white letters — this was the #1 piece of advice people would give me when I talked to them about job searching) was that everyone was using AI tools to screen, and if I didn’t have enough keywords, in the right proportion, I’d get booted from jobs.

I didn’t see any difference in submitting to the different ATS systems, and if you read up on what they offer in terms of screening tools, you don’t see the kind of “if <80% keyword match, discard” process happening.

I’d suggest part of this is because using LLMs for this would be crazy prejudicial against historically disadvantaged groups, and anyone who did it would and should be sued into a smoking ruin.

But even if someone did it anyway: from my experience here, with ChatGPT pointing out gaps in my resume where any human would have made the connection, I wouldn’t want to trust it to reject candidates. Maybe you’re willing to take a lot of false negatives if you still get true positives entering the hiring process, but as a hiring manager, I’m always worried about turning down good people.

There are sites claiming to use AI to compare your resume to job descriptions and measure how they’re going to do against AI screening tools — I signed up for trials and I didn’t find any of them useful.

Things ChatGPT was terrible at

Writing from scratch

If I asked “given this resume and JD, what are key points to address in a cover letter?” I would get a list of things, of which a few were great, and then I’d write a nice letter.

If I asked ChatGPT to write that cover letter, it was the worst. Sometimes it would make things up to address the gaps, or offer meaningless garbage in that eager-to-please voice. The making-things-up part was bad, but even when it succeeded, I hate ChatGPT’s writing.

This has been covered elsewhere — the tells that give away that it’s AI-written, the overly-wordy style, the strange cadence of it — so I’ll spare you that.

For me, both as job seeker and someone who has been a hiring manager for years, it’s that it’s entirely devoid of personality in addition to being largely devoid of substance. They read like the generic cover letters out of every book and article ever written on cover letters — because that’s where ChatGPT’s pulling from, so as it predicts what comes next, it’s in the deepest of ruts. You can do some playing around with the prompts, but I never managed to get one I thought was worth reading.

What I, on both sides of the process, want is to express personality, and talk about what’s not on the resume. If I look at a resume and think “cool, but why are they applying for this job?” and the cover letter kicks off with “You might wonder why a marine biologist is interested in a career change into product management, and the answer to that starts with an albino tiger shark…” I’m going to read it, every time, and give some real thought to whether they’d be bringing in a new set of tools and experiences.

I want to get a sense of humor, of their writing, of why this person for this job right now.

ChatGPT responses read like “I value your time at the two seconds it took to copy and paste this.”

And yes, cover letters can be a waste of time. Set aside the case where you’re talking about a career jump — I’d rather no cover letter than a generic one. A ChatGPT cover letter, or its human-authored banal equivalent, says the author values the reader’s time not at all, while a good version is a signal that they’re interested enough to invest time to write something half-decent.

Don’t use ChatGPT to write things that you want the other person to care about. If the recipient wants to see you, or even just to see that you care about the effort of your communication, don’t do it. Do the writing yourself.

For anything where there’s latitude for confabulation

(And there’s always latitude for confabulation)

If you ask ChatGPT to rewrite a resume to better suit a job description, you’ll start to butt up against it writing the resume to match the job description. You have to watch very closely.

I’d catch things like managerial scope creep: if you say you lead a team, on a rewrite you might find you were in charge of things often associated with managing that you did not do. Sometimes it’s innocuous: hey, I did work across the company with stakeholders! And sometimes it’s not: I did not manage pricing and costs across product lines, so where did that come from?

The direction was predictable, along the eager-to-please lines — always dragging it towards what it perceived as a closer match, but it often felt like a friend encouraging you to exaggerate on your resume, and sometimes, to lie entirely. I didn’t like it.

When I was doing resume rewriting, I made a point of never using its text immediately, while I was in the flow of writing, because I’d often look back at a section of the resume and think “I can’t submit that, that’s not quite true.”

That’s annoying, right? A thing you have to keep an eye on, drag it back towards the light, mindful that you need to not split the difference, to always resist the temptation to let it go.

Creepy. Do not like.

In some circumstances it’s wild, though. I tried to get fancy and have it ask standard interview questions and then, based on my resume, answer them as best it could. I included an “if there’s no relevant experience, skill, or situation in the resume, please say you don’t know” clarification. It generally did okay, until it was asked about managing conflicting priorities: it described a high-stakes conflict between the business heads and the technology team, where we had to hit a target but also had to do a refactor, and ChatGPT entirely made up the whole example situation, following the STAR (Situation, Task, Action, Result) model for answering, with a happy conclusion for everyone involved.

Reminded that that didn’t happen and that it should pass on questions it didn’t have a good response to, ChatGPT replied “Apologies for the confusion, I misunderstood the instructions…”, restated the clarification to my satisfaction, and we proceeded. It did the same thing two questions later: a totally made-up, generic example of a situation that could have happened at my seniority level.

If I’d just been pasting in answers to screener questions, I’d have claimed credit for results never achieved, and been the hero in crises that never occurred. And if I’d been asked about them, they’re generic enough that someone could have lied their way through it for a while.

No one wants to be caught staring at their interviewer when asked “this situation with the dinosaur attack on your data center is fascinating, can you tell me more about how you quarterbacked your resiliency efforts?”

My advice here — don’t use it in situations like this. Behavioral questions proved particularly prone, but any time there was a goal like “create an answer that will please the question-asker” strange behavior started coming out of the woodwork. It’s eager to please, it wants to get that job so so badly!

Finding jobs and job resources

Every time I tried looking for resources specific to Product Management jobs, the results were garbage: “Try Indeed!” I’d regenerate and get “Try Glassdoor and other sites…” In writing this I went back to try again, and it’s still almost all garbage:

LinkedIn: This platform is not only a networking site but also a rich resource for job listings, including those in product management. You can find jobs by searching for “product management” and then filtering by location, company, and experience level. LinkedIn also allows you to network with other professionals in the field and join product management groups for insights and job postings.

But… by regenerating the response, I got it, amongst the general-purpose junk, to mention Mind the Product, a conference series with a job board, after it went through the standard list of things you already know about. Progress?

I got similarly useless results when I was looking for jobs in particular fields, like climate change, or at B-corps (“go find a list of B-corporations!”). It felt frustratingly like it wasn’t even trying, which… you have to try not to anthropomorphize the tool; it’s not helpful.

It is, though, another example of how ChatGPT really wants to please: it does not like saying “I don’t know” and would rather say “searching the web will turn up things, have you tried that?”

What I’d recommend

Use the LLM of your choice for:

  • Interview preparation, generally and for specific jobs
  • Suggestions for tailoring your resume
  • Help editing your resume

And keep an eye on it. Again, imagine you’ve been handed the response by someone with a huge grin, wide eyes with massively dilated pupils, an expectant expression, and who is sweating excessively for no discernible reason.

I got a lot out of it. I didn’t spend much time in GPT-3.5, but it seemed good enough for those tasks compared to GPT-4. When I tried some of the other LLM-based tools, they seemed much worse. My search started in May 2023, though, so obviously, things have already changed substantially.

And hey, if there are better ways to utilize these tools, let me know.

Where Reddit’s gone wrong: 3rd party apps are invaluable user research and a competitive moat, not parasites

By supporting the ability of anyone to build on top of Reddit’s platform, Reddit created an invaluable user research arm that also provides a long-term competitive advantage by keeping potential competitors and their customers contributing to Reddit. This is an incredibly difficult thing to do, and they seem suddenly blind to why it was worth it.

In a recent Verge interview with the CEO Steve Huffman:

PETERS: I want to stop you for a second there. So you’re saying that Apollo, RIF, Sync, they don’t add value to Reddit?

HUFFMAN: Not as much as they take. No way.

(and I’m going to ignore for the moment questions on how they’ve handled this, monetization, and so on, focusing only on this core value they’ve created and are destroying)

A vast community of people all working on new designs, development innovations, and approaches, responding immediately to user feedback to try new things – compare this to what you have to do internally. 

Every company I’ve been at has had a limited user research budget for discovering their customers and their needs, and just as limited room to get feedback on possible solutions by building prototypes or even showing paper drawings. To focus entirely on new ideas? You might be lucky to get a Hack Day once a quarter.

If you have a thriving third party development community, you have an almost unlimited budget for all of these things, happening immediately, and on a hundred, a thousand different ideas at any one time, and those ideas are beyond what you might be able to brainstorm.

It’s a dream, and once you’ve done the hard work of getting the ecosystem healthy, it does it on its own. Anything you want to think about you’ll find someone has already broken the trail for you to follow, and sometimes they’ve built a whole highway.

You can think small, like “how can we make commenting easier?” There will be a half-dozen different interpretations of what comment threading should look like, and you have the data to see if those changes help people comment more, and if that in turn makes them more engaged in conversation.

And it goes far beyond that, to entirely new visions of how your product might work for entirely new customers.

If you’re sitting around the virtual break room and someone says “what if we leaned into the photo sharing aspect, and made Reddit a totally visual, photo-first experience?” in even the best company you’re going to need to make a case to spend the time on it, then build it, figure out how to get it cleared with the gatekeepers of experimentation… 

Or if you have a 3rd party ecosystem as strong as Reddit’s, you can type “multireddit photo browser” or something into a search engine and tada, there you go, a whole set of them, fully functional, taking different approaches, different customer groups. I just did that search and there’s a dozen interesting takes on this.

Every different take on the UX, and every successful third-party application, is a set of customer insights any reasonable company would pay millions for. Having a complete set of APIs publicly available lets other people show you things you might not have dreamed possible (this is also a hidden reason why holding back features or content from your APIs is more harmful than it initially seems).

Successful third party applications give you insight into:

  • A customer group
  • What they’re trying to do
  • By comparison, how you’re failing to give it to them
  • A minimum for what they’re willing to pay to solve that problem

Even when these applications don’t discover something that’s useful to you (say someone builds a tool that’s perfect for 0.1% of the user base, but it requires so much client-side code that it’s just not worth bringing into the main application), it’s still a huge win, because those users are still on the platform, participating in the core activities that make the system run, building the network effects (and, because you’re a business, making money in total).

And if those developers of these niche apps ever hit gold and start to grow explosively, you’ll see it, and be able to respond, far earlier than you would if they weren’t on your platform.

That’s great!

The biggest barrier for any challenger app isn’t the idea, or even the design and execution; it’s attracting enough users to be viable, and surviving the scale problems if it does start to grow. By supporting a strong third-party application ecosystem, you’re ensuring that they never have to solve those problems: their user growth is your user growth. They don’t have to solve the scaling infrastructure problem because you did. It will always make short-term sense to stay with you.

Instead of building competitors, you’re building collaborators, who will be pulling you to make your own APIs ever-better, who are working with you and contributing to the virtuous cycle at the heart of a successful user-based product like Reddit.

I know, from the outside we just don’t get it. Reddit’s under huge pressure to IPO, and the easy MBA-approved path to a successful IPO is ad revenue, which means getting all those users on the first-party apps, seeing the ads, harvesting their data, all that gross stuff. And we can imagine that the people pushing this path to riches look at all of these third party apps and say “there’s a million people on Apollo, if they were on our app, we’d make $5m more in ad revenue next year.”

This zero-sum short-sighted thinking may not be the doom of Reddit – they may well shut down all the third-party apps and survive the current rebellion of moderators and users (and the long-term effects of their response to it).

It was, and could have remained, such a beautiful partnership, where Reddit thrived by learning from, cooperating with, and improving itself along with its outside partners. As this developer community now looks to rebuild around free and decentralized platforms like Mastodon, it’s easy to see how Reddit’s lost ecosystem might eventually return to topple it.

How human brains drive anti-customer design decisions on shopping sites

Or, “The reason no one strictly obeys your shopping filters (the reason is money)”

Why do sites sometimes disobey filters? Often only a little bit, but noticeably, enough that it feels like an obstinate toddler testing your boundaries?

“You said you wanted a phone that was in stock and blue, huh? Got so many of those!”

“I’ll lead off by showing you some white phones that are really cheap… and hey if you want to narrow it down further, try narrowing it down –“

“Then I’ll show you phones that are blue. Mostly. More than this result set at least.”

I have cracked from frustration and yelled “I told you morning departures!” while searching for flights at a travel site that employed me to work on those shopping paths.

So why? Why does everyone do this when it annoys us, the shopper?

Because our brains don’t work right. We’re not rational beings, and that ends up forcing everyone to cater to irrational cognitive biases in order to compete. I’ll focus here on availability and price, and on travel, because that’s where I have the most experience, but you’ll see this play out everywhere.

The worst thing, from a website’s view, is for you to think they don’t have what you want, or that they do but it’s too expensive, and this drives almost all the usability compromises that cause you to grind your teeth. The people who run the website know it, and they have to keep doing it anyway.

Let’s start with availability. Few sites brag about the raw number of items they stock any more, but the moment you start shopping, they want you to know they have everything you could possibly be looking for. They want you to not bother shopping elsewhere.

Even when a site wants to present a focused selection (they might not have a million things), they want you to think they have all of that specific niche.

Tablet Hotels focuses on expert-selected, boutique hotels. And here’s them walking you through their selection:

Do you believe there are 161 hip golf hotels? I didn’t. 161 hip golf hotels seems like it’s all the hip golf hotels that might be curated by hotel experts at the MICHELIN Guide™.

The desire to seem like they have all the available things makes sites compromise to make the store shelves seem full:

  • You search for dates and you get places that have partial availability
  • You search on VRBO for a place and get 243 results, all “unavailable”
  • You search for a location and get 3 in the city and then results from increasingly far away until it gets to a couple hundred results

As long as they can keep you from thinking “ugh, they don’t have anything” they’re winning — because the next time you’re shopping, you will shop where you think there’s the most selection.

They must also appear the cheapest. Our brains are terrible about this (see: the anchoring effect), and it creates a huge incentive to do whatever you have to in order to show the cheapest price, even if that price is irrelevant.

This sounds crazy, but I’m here to tell you, having spent a wild amount of time and money doing user studies in my shopping-site career: if someone’s shopping for non-stop flights between Los Angeles and Boston, and

  • Site A leads with a $100 14-hour flight that stops in Newark, Philadelphia, then La Guardia to give you the highest possible chance at further delays, followed by ten non-stop results for $200
  • Site B shows the same ten non-stop results for the same $200

Shoppers will rate Site A as being less expensive.

I have sat in on sessions where I wanted to scream “but you wrote down the same prices for the flight you ended up picking!” I have asked people why they thought that, and they’ll say “they had the lower prices” even though that lower price was junk. They will buy from that site, and return to shop there first next time.

It’s incredibly frustrating, and it happens that session, and the next. It’s not 50% of people in sessions; it’s 75, 90%. We all think we’re savvy customers, but our brains… our brains want to take those shortcuts so badly.

This drives even worse behavior, like “basic economy”: if an airline can get a price displayed that makes it look like the cheapest, even if after adding seat selection, a checked bag, and free access to the lavatory the person will pay far more than a normal ticket on a different airline, they’re going to be perceived as the better value and the less expensive airline. They also have a better chance of making the sale, because fewer people will go to the trouble of pricing out all the add-ons and then comparing the two.

(And even then, and I swear this is true, once a shopper’s brain has “Airline A is cheaper” there is a very good chance even if they price out the whole thing, taking notes on a pad of paper next to their computer, when they do the math that shows Airline B is cheaper for what they need, they will get all consternated, scrunch their face, and say “well that can’t be right”, at which point there’s a crash in the distance as a product manager throws a chair in frustration.)

All of this combines to put anyone working on the user experience of a site in an uncomfortable situation:

  • Do we show a junk result up top that shows that we could get the lowest price possible, even though it’s not at all what the customer asked for, or
  • Do we lose the customer’s sale to the competitor who does show that result, and also risk them not shopping with us in the future?

The noble, user-advocate choice means the business fails over the long-term, and so eventually, the business puts junk in there.

So what do we, as people who care about users and want to minimize this, do?

We can start by trying. It’s easy to sigh, give in, make the result set “get results for the filters, then throw the cheapest option in at #1 no matter whether it ranks or should appear at all,” and move on to something seemingly more interesting. But this seemingly intractable conflict is where we should be dissatisfied, and where we have a chance to be creative.

We can approach it with empathy: how can we be as open and helpful as possible when we’re forced to compete this way? Instead of presenting a flight result in the same way as the others, we can say “$200 if you’re willing to compromise on stops, see more options… $300 without your airline restrictions…”

Let customers know there’s another option, and don’t pass it off as part of the result set they asked for, call it out as a different approach.
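As a toy sketch of that choice, in Python, with hypothetical flight data and field names (nothing here is any real site’s ranking code): the lazy version quietly corrupts the filtered set, while the labeled version keeps it honest and presents the compromise as what it is, a suggestion.

```python
# Toy sketch: hypothetical flight dicts with "stops" and "price" fields.

def lazy_results(flights, max_stops):
    """The sigh-and-give-in version: filter, then jam the cheapest
    flight back in at #1 whether or not it matches the filter."""
    matching = [f for f in flights if f["stops"] <= max_stops]
    cheapest = min(flights, key=lambda f: f["price"])
    if cheapest not in matching:
        # A junk result the shopper explicitly filtered out.
        return [cheapest] + matching
    return matching

def labeled_results(flights, max_stops):
    """Keep the filtered set honest; surface the compromise separately."""
    matching = [f for f in flights if f["stops"] <= max_stops]
    cheapest = min(flights, key=lambda f: f["price"])
    suggestion = None
    if cheapest not in matching:
        suggestion = {
            "note": f"${cheapest['price']} if you're willing to compromise on stops",
            "flight": cheapest,
        }
    return matching, suggestion
```

The difference is a few lines of code; the difference in what the shopper is told is the whole argument above.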

Or take the common “we have 200 hotels that aren’t available”: don’t show me 200 listings of places I can’t go; that doesn’t help anyone. If you have to tell me there are at most 200, tell me that 50 of your normal 200 have availability if I move my dates, or show me 75 that are a ways off.

Or think about this in terms of a problem you’re having — even if you write a sigh-and-an-eye-roll of a user story like “as a business, I want to build trust with users, so I can survive” that’s a starting point. What’s trust? What builds and undermines trust with your customers? Can you show your math? Can you explain what you’re trying to do to them?

It’s unrealistic to expect that you can start a conversation with a random shopper about how anchoring works and how to combat it, but what would you want to say? Are there tools you would arm them with so that they don’t fall prey to CheaperCoolerStuffwithFeesFeesFees?

Because if nothing else, knowing that this is all true, we can at least apply this to ourselves. The more time I spent in user studies watching smart people lose their way and come to entirely reasonable but incorrect conclusions because they’d been misled by having their brain trip up, the more I was able to not only ask questions like “which of these sites has the best prices for the thing I want?” but also questions like “which of these sites helps me find the thing I need?”

Concede what you must, but if you seek to help customers get what they want, rather than annoying them, seeming untrustworthy, and feeling like you’re only doing it because you’re forced to, you should be able to compete, help them succeed, and build a better and more durable relationship.

Unchecked AB Testing Destroys Everything it Touches

Every infuriating thing on the web was once a successful experiment. Some smart person saw

  • Normal site: 1% sign up for our newsletter
  • Throw a huge modal offering 10% off first order: +100% sign ups for our newsletter

…and they congratulated themselves on a job well done before shipping it.

As an experiment, I went through a list of holiday weekend sales, and opened all the sites. They all — all, 100% — interrupted my attempt to give them some money.

It’s like those Flash mini-game ads except instead of a virus-infested site it’s a discount on something always totally unlike what you were shopping for!

As an industry, we are blessed with the ability to do fast, lightweight AB testing, and we are cursing ourselves by misusing that to juice metrics in the short term.

I was there for an important, very early version of this, and it has haunted me: urgency messages.

I worked at Expedia during good times and bad. During some of the worst days, when Booking.com was an up-and-comer and we just could not seem to get our act together to compete, we began to realize what it must have felt like to be at an established travel player when Expedia was ascendant and they were unable to react fast enough. We were scared, and Booking.com tested something like this:

Next to some hotels, a message that supply was limited.

Why? It could be to inform customers so they make better decisions. Orrrrrr it could be to instill a sense of fear and urgency to buy now, rather than shop around and possibly buy from somewhere else. If that’s the last room, what are the chances it’ll be there if I go shop elsewhere?

There’s a ton of consumer behavioral research on how scarcity increases chances of getting someone to buy, so it’s mostly the second one. If a study came out that said deafening high-pitched noises increased conversion rates, we would all be bleeding from our ears by end of business tomorrow, right?

So we stopped work on other, interesting things to get a version of this up. Then Booking took it down, our executives figured it had failed A/B and thus wasn’t worth pursuing, so we returned to work. Naturally Booking then rolled it out to everyone all the time, and we took up a crash effort to get it live.

(Expedia was great to me, by the way. This was just a grim time there.)

You know what happened because you see it everywhere: urgency messaging worked to get customers to buy, and buy then. Expedia today, along with almost every e-commerce site that can, still does this.

It wasn’t just urgency messages, either. We ran other experiments and if they made money and didn’t affect conversion numbers (or if the balance was in favor of making money), out they rolled. It just felt bad to watch things like junky ads show up in search results, and look at the slate of work and see more of the same coming.

I and others argued, to the more practical side, that each of those things might increase conversion and revenue immediately and in isolation but in total they made shopping on our site unpleasant. In the same way you don’t want to walk onto a used car lot where you know you’ll somehow drive off with a cracked-odometer Chevrolet Cavalier that coughs its entire drivetrain up the first time you come to a stop, no one wants to go back to a site that twists their arm and makes them feel bad.

Right? Who recommends the cable company based on how quick it was to cancel?

And yet, if you show your executives the results

  • Control group: 3% purchased
  • Pop-up modals with pictures of spiders test group: 5% purchased
  • 95% confidence

How many of them pause to ask more questions? (And if they have a question, it’s “is this live yet, why isn’t this live yet?”)
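For what it’s worth, that “95% confidence” typically comes from a two-proportion z-test, which says nothing at all about long-term effects. A minimal sketch — the sample sizes here are invented, since the post doesn’t give any:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical sample sizes -- only the 3% and 5% rates come from the example
z = two_proportion_z(conv_a=300, n_a=10_000,   # control: 3% purchased
                     conv_b=500, n_b=10_000)   # spider modals: 5% purchased
print(z > 1.96)  # True -- "significant at 95% confidence"
```

The test answers “is the lift real?” and nothing else — not whether those customers ever come back.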

And the justifications for each of the compromises are myriad, from the apathetic to the outright cynical: they have to shop somewhere; everyone’s doing it, so we have to do it; people shop for travel so infrequently they’ll forget; no one’s complaining.

There are two big problems with this:
1) if you’re not looking at the long-term, you may be doing serious long-term damage and not know it, and you’ll spiral out of control
2) you’ll open the door to disruptive competition that you almost certainly will be unable to respond to as a practical matter

Let’s walk through those email signups as an example case.

Yes, J. Crew is still here. Presumably their email list is just “still here” every couple weeks, until they’re not.

What this tells me as a customer is that, at the very least, they want me to sign up for their email more than they want me to have an uninterrupted experience. It’s like having a polite salesperson at the store ask if you need help, except it’s every couple seconds of browsing, and the more seriously you look the more of your information they want.

They’re willing to risk me not buying whatever it was I wanted, or at least they are so hungry to grow their list they’ll pay me to join, which in turn should make anyone suspect they’re going to spam the sweet bejeezus out of their list in order to make back whatever discount they’re giving out.

As a product manager, it means that company has an equation somewhere that looks like

(Average cart purchase) * (discount percentage) + (cost of increased abandon rate) < ($ lifetime value of a mailing list customer)

…hopefully.
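As a sketch of that equation — every number below is invented, only the shape of the trade-off comes from the post:

```python
def modal_pays_off(avg_cart, discount_pct, abandon_cost, subscriber_ltv):
    """The signup modal earns its keep only if a subscriber's lifetime
    value exceeds what it cost to acquire them."""
    acquisition_cost = avg_cart * discount_pct + abandon_cost
    return subscriber_ltv > acquisition_cost

# Invented numbers, just to show the shape of the trade-off
print(modal_pays_off(avg_cart=80, discount_pct=0.10,
                     abandon_cost=5, subscriber_ltv=40))  # True
print(modal_pays_off(avg_cart=80, discount_pct=0.10,
                     abandon_cost=5, subscriber_ltv=10))  # False
```

Note that everything interesting happens in `subscriber_ltv` — which is exactly the number that decays as the list fills with people who never open an email.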

It may also be that the Marketing team’s OKRs include “increase purchases from mailing list subscribers by 30% year over year.”

So there’s some balance you’re drawing between the cost of getting these emails — and if you’re putting one or two of these shopping-interrupting messages on each page, it’s going to be a substantial cost — and the value you get from them. Now you have to get value out of those emails you mined.

You may think your communications team is so amazing, your message so good, that you’re going to be able to build an engaged customer base that eagerly opens every email you send, gets hyped for every sale, and forwards your hilarious memes to all their friends.

Maybe! Love the confidence. But everyone else also thinks that, soooooo… good luck?

As a customer, I quickly get signed up for way too many email lists, so my eyes glaze over. I’m not opening any of them. Maybe I mark them as spam because some people make it real hard to unsubscribe and it’s not worth it to see if you made opt-out easy…

Now your mailing list is starting to get filed directly to spam by automated filters, so by percentage fewer and fewer people are purchasing based on emails. Once your regular customers have all signed up for email, subscription growth is slowing even with that incentive. And if you’re sharp, you’ve noticed the math on

(Average cart purchase) * (discount percentage) + (cost of increased abandon rate) < ($ lifetime value of a mailing list customer)

is rapidly deteriorating, and now you’re really in trouble.

What do you do?

  • Drive new customers to the site with paid marketing! It’s expensive even if you manage to target only good target customers. These new customers want that coupon, so you juice subscriptions and sales. And hey, that marketing spend doesn’t affect the equation… for a while.
  • Send more emails to the people who are seeing your emails! They’re overwhelmed with emails so you need to be up in their face every day! You see increased overall purchase numbers, and way more unsubscribes/marked as spam, and people are turned off to your brand. Which also doesn’t affect that equation… for a while.
  • Increase the discount offered!
  • Well everyone, it’s been a good run here, I’ve loved working with you all, but this other company’s approached me with this opportunity I just can’t pass up…

This is true of so many of these: if you think through the possible longer-term consequences of the thing you’re testing, you’ll see that your short-term gains often create loops that quickly undo even the short-term gain and leave you in a worse position than when you started.

But no one tests for that. The kind of immediate, hey-why-not, slather-Optimizely-on-the-site-and-see-what-happens testing will inevitably reveal that some of the worst ideas juice those metrics.

Also, can we talk about how AB testing got us to this kind of passive-aggressive not-letting-people-say-no wording and design?

How many executive groups will, when shown an AB test for something like “ask users if we can turn on notifications” showing positive results that will juice revenue short-term, ask “can we test how this plays out long-term?”

As product managers, as designers, as humans who care, it is our responsibility to never, ever present something like that without context. We need to be careful, think through the long-term implications of changes as part of the initial experiment design, and include them in planning the tests.

If we present results of early testing, we need to clearly elucidate both what we do and don’t know:

“Our AB test on offering free toffee to shoppers showed a 2% increase in purchase rate, so next up we’re going to test if it’s a one-time effect or if it works on repeat shoppers, whether our customers might prefer Laffy Taffy, and also what the rate of filling loss is, because we might be subject to legal risk as well as take a huge PR hit…”

Show how making the decision based on preliminary data carries huge risks. Executives hate huge risks almost as much as they like renovating their decks or being shown experiment results suggesting there’s a quick path to juicing purchase rates. At the very least, if they insist on shipping now, you can get them to agree to continue AB testing from there, and set parameters on what you’d need to see to continue, or pull, the thing you’re rolling out.

It’s not just the short-term versus the long-term consequences of that one thing, though. It’s the whole thing, all of them, together. When you make the experience of your customers unpleasant or even just more burdensome, you open the door for competition you will not be able to respond to.

I’ll return to travel. You make the experience of shopping at any of the major sites unpleasant, and someone will come along with a niche, easy-to-use, friendly site, probably with some cute mascot, and people will flock to it.

Take Hotel Tonight — started off small, slick, very focused, mobile only, and they did one thing, and you could do it faster and with less hassle than any of the big sites.

Airbnb ended up buying Hotel Tonight for ~$400 million. $400 million US dollars.

You’re paying for customer acquisition, they’re growing like crazy as everyone spreads their word for free. It’s so easy and so much more pleasant than your site! They raise money and get better, offer more things, you wonder where your lunch went…

If you’re a billion-dollar company, unwinding your garbage UX is going to be next to impossible. The company has growth targets, and that means every group has growth targets, and now you’re going to argue they should give up something known to increase purchase rates? Because some tiny company of idiots raised $100m on a total customer base that is within the daily variance of yours?

I’ve made that argument. You do not win. If you are lucky, the people in that room will sigh and give you sympathetic looks.

They’re trying to make a 30% year-over-year revenue growth target. They’re not turning off features that increase conversion. Plus they’ll be somewhere else in the 3-5 years it takes for it to be truly a threat, and that’s a whole other discussion. And if they are around when they have to buy this contender out, that’s M&A over in the other building, whole other budget, and we’ll still be trying to increase revenue 10% YoY after that deal closes.

There are things we can try though. In the same way good companies measure their success against objectives while also monitoring health metrics (if you increase revenue by 10% and costs by 500%, you know you’re going the wrong way), we should as product managers propose that any test have at least two measurable and opposed metrics we’re looking at.

To return to the example of juicing sales by increasing pressure on customers — we can monitor conversion and how often customers return.
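A minimal sketch of what pairing opposed metrics could look like in a test readout — the thresholds and metric names here are invented, not anyone’s real launch criteria:

```python
def ship_decision(conv_lift, return_rate_change,
                  min_lift=0.01, max_return_drop=0.0):
    """Ship only when the primary metric improves AND the opposed
    health metric (how often customers come back) doesn't degrade."""
    if conv_lift >= min_lift and return_rate_change >= max_return_drop:
        return "ship"
    if conv_lift >= min_lift:
        return "hold: short-term win, likely long-term damage"
    return "abandon"

print(ship_decision(conv_lift=0.02, return_rate_change=0.01))   # ship
print(ship_decision(conv_lift=0.02, return_rate_change=-0.05))  # hold
```

The point isn’t the thresholds — it’s that “hold” exists as an outcome at all, instead of conversion lift automatically meaning ship.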

This does require us to start taking a longer view, like we’re testing a new drug, as well — are there long-term side-effects? Are there negative things happening because we’re layering 100 short-term slam-dunk wins on top of each other?

I’m less sure, though, of how to deal with this.

I’d propose maintaining a control experiment of the cleanest, fastest, most-friendly UX, to use as a baseline for how far the experiment-laden ones drift, and monitor whether the clean version starts to win on long-term customer value, and NPS, as a start.

From there, we have other options, but they all start from being passionate and persistent advocates for customers as actual people who actually shop, and from trying to design our experiments to measure for their goals as well as our own.

We can’t undo all of this ourselves, but we can make it better in each of our corners by having empathy for the customer and looking out for our businesses as a whole. And over the long term, we start turning AB testing back into a force for long-term

…improvement.

Hinge’s Standout stands out as a new low in dating monetization

Hinge’s new Standout feature pushes them further into a crappy microtransaction business model and also manages to turn their best users into bait, and if you’re a user like me, you should be looking for a way out.

I understand why they’re looking for new ways to make money. First, they’re a part of the Match.com empire, and if they don’t show up with a bag of money that contains 20% more money every year, heads roll.

Second, though, every dating app struggles to find a profit model that’s aligned with their users. If you’re there to find a match and stop using the app, the ideal model would be “you only pay when you find your match and delete the app” but no one’s figured out how to make that work.

(Tinder-as-a-hookup-enabler aligns reasonably well with a subscription model: “we’ll help you scratch that regular itch you have”)

Generally, monetization comes in two forms:

  • ads, shown to free users while they’re browsing, plus selling your data

  • functionality to make the whole experience less terrible

Which, again, presents a dating business with mixed incentives. Every feature that makes the experience less painful offers an incentive to make not paying even more painful.

For example: if you’re a guy, you know it’s going to be hard to stand out given how many other men are competing for a potential match’s attention. So sites offer you a way to have your match shown ahead of users not spending money. If a customer notices that their “likes” are getting way more responses when they pay for that extra thing, they’re going to be more likely to buy them… so why not make the normal experience even more harrowing?

Dating apps increasingly borrow from free-to-play games — for instance, setting time limits on activities. You can only like so many people… unless you payyyyy. Hinge’s “Preferred” is in on that:

[screenshot]

They also love to introduce different currencies, which they charge money for. Partly because they can sell you 500 of their currency in a block and then charge in different increments, so you always need more or have some left over that will nag at you to spend, which requires more real money. Mostly because once it’s in that other currency, they know that we stop thinking about it in real money terms, which encourages spending it.

One of the scummiest things is to reach back into the lizard brain to exploit people’s fear of loss. Locked loot boxes are possibly the most famous example: you give them a chest that holds random prizes, and if they don’t pay for the key, they lose the chest. It’s such a shitty thing to do that Valve, having made seemingly infinite money from it, gave up the practice.

Hinge likes the sound of all this. Introducing:

[screenshot]

Wait, people you won’t see elsewhere? Yup.

[screenshot]

This is a huge shift.

Hinge goes from “we’re going to work to present you with the best matches, with paid features that make that experience better” to “we’re taking the best away to a new place, and you need this new currency to act on them or you’ll lose them.”

If you believed before that you could use the app’s central feature to find the best match, well, now there’s doubt. They’re taking people out of that feed. You’ll never see them again! That person with the prompt that makes you laugh will never show up in your normal feed! And maybe they’ll never show up on Discover!

Keep in mind too that even from their description, they’re picking out people and their extremely successful prompts. They’ve used data to find the most-successful bait, and they’re about to charge you to bite.

[screenshot]

$4. Four bucks! Let’s just pause and think about how outrageous this is. Figure 90% of conversations don’t get to a first date — that’s $36 in dead-end roses for every first date this gets you. And what percentage of first dates are successful? What would you end up paying to — as Hinge claims to want to do — delete the app because you’ve found your match?
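Back-of-the-envelope, using the 10% conversation-to-date rate above and counting the rose spent on the conversation that does pan out as well as the ones that don’t:

```python
def cost_per_first_date(rose_price, date_rate):
    """Expected rose spend per first date, including roses
    spent on conversations that go nowhere."""
    return rose_price / date_rate

# $4 a rose, 1 in 10 conversations reaches a first date
print(cost_per_first_date(rose_price=4.0, date_rate=0.10))  # 40.0
```

And that’s before asking what fraction of first dates lead anywhere.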

Or, think about it the other way: if Hinge said “$500, all our features, use us until you find a match” that would be a better value. But they don’t because no one would buy that, and likely they’ve run the math and think that people are more likely to buy that $20 pack, use the roses, recharge, and they’ve got a steady income, or the purchaser will give up after getting frustrated, and that person wasn’t going to spend $500. More money overall from more people spending.

If you’re featured on this — and they don’t tell you if you are — you’re the bait to get people to spend on microtransactions. This just… imagine you’ve written a good joke or a nice thing about yourself, and people dig it.

Now you’re not going to appear normally to potential matches. Now people have to pay $4 for a chance to talk to you.

Do you, as the person whose prompt generated that rose, receive one to use yourself?

You do not.

Do you have the option to not be paraded about in this way?

You do not.

This rankles me, as a user, and also professionally. As a good Product Manager, you want to figure out how to help your customers achieve their goals. You try to set goals and objectives around this — “help people’s small businesses thrive by reducing the time they spend managing their money and making it less stressful” — and then try to find ways you can offer something that delivers.

Sometimes this results in some uncomfortable compromises. Like price differentiation — offering some features that are used by big businesses with big budgets at a much higher price, while you offer a cheaper, limited version for, say, students. The big business is happy to pay to get the value you’re offering them, but they’d certainly like to pay the student price.

Or subscription models generally — I want to read The Washington Post, and I would love not to pay for it.

This, though… this is gross. It’s actively hostile to the user, and you want to at least feel that the people you’re trusting to help you find a partner are on your side.

I can only imagine that if this goes well — as measured by profit growth, clearly — there’s a whole roadmap of future changes to make it ever-more-expensive to look for people, and to be seen by others, and it’ll be done in similarly exploitative, gross ways.

I don’t want to be on Hinge any more.

Chemistry.com fizzles: a product manager attempts online dating, pt 3

So far: Match.com was not fun; then EliteSingles looked at Match.com’s heterosexual bias, said “hold my privilege,” and set out to make the experience even more coercive, white, and hetero-normative. I did not have a good time. Then I took a couple-month break, because I got an insane flu and then met someone delightful I dated for a couple months, and I didn’t want to revisit this.

Still, we persist.

Next up I went to Chemistry.com. Chemistry, like OkCupid used to, claims to do matching based on a huge number of questions and science. It’s got Dr. Helen Fisher, who I’ve heard on podcasts and seems great! 

Chemistry claims their test is “fun, engaging, and provides an in-depth look at who you are and what you want in a relationship.”

I’ll spoil it for you: it is none of those things, and Chemistry offers some clear signs that you shouldn’t trust them.

Anyway, let’s get started? Sure match and EliteSingles were white and heteronormative, but a science-based site like this is going to have a more diverse and — 

[screenshot]

DAMMIT.

(And I am again using VPNs to test these things from cities with wildly different demographics, that’s not just them guessing I’m straight and in Portland)

I’m sure Chemistry will have a more nuanced set of who can look for what, right?

[screenshot]

Nope. You’re straight or you’re gay.

😐

So let’s get into the meat of this. Let’s kick off this personality test.

[screenshot]

😑

I kinda gave up immediately. Was the next question going to ask me to feel the lumps on my head and pick the diagram closest to it? What could this possibly indicate about one’s personality?

That critical question answered, you’re introduced to the bulk of the test. It’s 45 minutes of questions, often asking for almost the same thing in quick succession:

[screenshot]

and

[screenshot]

Occasionally with a curve ball like this:

[screenshot]

or

[screenshot]

These moments were welcome breaks from the world of bubbles. Eventually you’re granted questions with different numbers of answers:

[screenshot]

When you’re through that ordeal, you get to describe yourself.

Again, I’m really hoping for some better options than we’ve seen in our last two adventures.

Eye color… hair… build… 

[screenshot]

Hmmm.

[screenshot]

…also an interesting set of choices…

[screenshot]

Again, I hate this question, hate the “marriage is the most important thing” framing where you’re either not in one, on your way out, or involuntarily taken out of one. In a loving long-term partnership? Nope! Doesn’t matter… ughhhh.

[screenshot]

It takes the “forced choice” approach to getting you to pick some interests. You have to pick three, and only three count.

Now to upload your photo. You have two choices. Facebook, or upload.

[screenshot]

Wait, what’s that tiny grey text there? “Skip this step.”

Look, it’s voluntary to sign up for a site like this. If it’s that important to their success, and to the success of everyone else, that there be a photo there, make it mandatory. Maybe don’t spring it on them this late in the process — which is another thing, Chemistry does not tell you it’s going to take so long to sign up.

Then you get the sell on subscribing —

[screenshot]

Okay, well, thanks for telling me. I’m curious what those features are — it’s pretty vague what “enhanced search” means, and having the two communication features makes it seem like you might not be able to contact people. It’s an odd choice — I’d really think they’d want to do a better job expressing what the value is here before they make you the pitch.

BUT THIS IS THE PITCH! Continue is actually sign up — now you’re asked for payment. Did you want to skip? Hidden grey text again. Note that here it’s not next to the continue button, but all the way over on the left. This is… intentionally deceptive.

[screenshot]

This page is so jarringly different from the design you’ve seen to that point I thought for a moment that I’d clicked on an ad or gone awry somehow. Clearly this is some vestigial code owned by a troll under a bridge, or something.

However I want to focus on a huge breach of trust here.

Let’s say you want that “special profile highlight offer” they’re pushing. $38.94, right?

No!

No.

[screenshot]

There is an extra $4 added for no reason. “All new upgrade orders” — is this an upgrade? It’s a new account. What are they talking about? Why does that say “upgrade now?” Am I even in the right place?

What are the chances you realize you’re moving forward with a different amount, given this confusing presentation? This is like a hidden fee on your hotel bill, where if you look up at the person at the desk they immediately remove it out of embarrassment.

You’re prompted to set up some things that people can ask you, what you’re looking for… I was out by this point, though. However, I’d been sent 

The results of my personality test!

[screenshot: personality test results]

What, all those questions about whether I’m into new experiences told you whether I’m into new experiences? THAT IS AMAZING.

Truly a marvel of science. Who knows what the future might bring us?

Yeah, this very much rubbed me the wrong way. It felt like a particularly sophisticated “What Zootopia character are you?” Where all the questions are “do you like carrots?” “Are you good at multiplication?” “Do you have over 1,000 people at your family reunions?” “OMG YOU’RE JUDY HOPPS”

Still, this was — as personality tests can be — an interesting break before I had to face:

The cancellation test!

One of the best ways to learn about a company is by how they act when you cancel. Do they make it difficult? Do you have to call someone? Do they make you go out under a full moon and hold up a solved Rubik’s Cube with both hands and turn three times counter-clockwise, so that you end facing South-by-South-East?

[screenshot]

Probably an account status, right?

[screenshot]

“Other account status changes” is cryptic… 

[screenshot]

Oh there it is, the last option.

[screenshot]

Why is Date capitalized here? Why is the distinction between casual/serious made here? Why would you stop if you made a friend — isn’t Chemistry about serious people here to meet their partners? 

Why aren’t you allowed to tell them you don’t like their site? That’s not a “Technical issue.”

Anyway, so pick a reason…

[screenshot]

We’re into bad breakup territory here, where everything you say requires more explanation. So you type something in — 

[screenshot]

You have, by my count, gone through at least six screens (and probably a lot more, possibly including looking up a help article on how to remove your profile). You’ve just told them more about why you want to remove your profile. And you get this last “wait” modal. It’s just…

I will say it’s nice that they clearly tell you what each of those does, but it’s probably deliberately confusing if someone’s going through this thinking “cancel my account” at each step, gets to the end, and — because Chemistry’s been trying to divert them the whole time — sees “Cancel” as the option they want and “Remove Profile” as a different, non-deletion step. This is not helped by how many other sites — see Match for one example — very much want to keep your zombie self up and boosting their numbers, and try to dance around what “profile” and “account” mean.

The end

I’m disappointed. I thought given the association with Dr. Fisher that Chemistry might actually be more… on the up-and-up? More inclusive? By the time I got through the questions, though, I had no desire to see what the rest of the experience was like, and getting out of it only reinforced my impression that I didn’t want to do business with Chemistry. I continue on.

EliteSingles: a Product Manager attempts to date online, pt 2

Fresh off our Match adventure, let’s check out EliteSingles. First, scan the page, who do we think their target market is?

[screenshot]

[screenshot]

[screenshot]

Huhhhhh.

Even the stock image of the customer support person — I’m presuming, yes — 

[screenshot]

😐😑

Moving on. They make some bold claims:

[screenshot: EliteSingles’ claims]

High success rate? Compared to what? What’s the rate? Weird they won’t tell me.

 

[screenshot]

It’s a little odd they claim elsewhere that EliteSingles is all about serious dating, then here it lists out some dating phrases (for SEO, presumably) followed by “or find love, idk 🤷‍♀️ ”

I’m on board with the approach conceptually — if they can intelligently select the people, only showing a few could be a huge win. It would let you consistently put effort into it, keep from being sucked in (on a site like Tinder or OkCupid or wherever, you can put in regular, incredible amounts of effort). You would hopefully be able to know that you did your best, that it went to the right places, and feel good about having put the time and energy into it.

I’m hyped! Let’s run through some warning signs!

[screenshot]

Disappointing, in exactly the same way Match was. However, this is even more aggressively coercive than Match: once you pick what you’re looking for, you’re auto-taken to sign up:

[screenshot]

Please note the creepy results of fading between pictures. I didn’t even notice this until I pasted the picture in here.

If you’re, for instance, bisexual, and wanted to click both check boxes, nope. You gotta go. EliteSingles, I know you want to get people through the process and make it quick and easy, but this is gross.

Do Match and Elite do usability testing with a diverse set of testers? At this point I’m willing to bet that Elite’s decided their target customers are heterosexuals, predominantly white heterosexuals, and maybe — maybe — they think about homosexual people occasionally. And if not, they seemingly made a conscious decision to keep same-sex couples off their home page. Why would you do that? Do you think that there are enough heterosexual customers who’ll leave if they see a gay couple? Why?

Anyway. I’m writing a blog, I persist.

[screenshot]

It’s weird they ask you for email and password and then take you to this page, which is asking you the same things, including getting you to agree to the T&C. Why would you duplicate this step, especially knowing this kind of early gate is going to have a huge impact on conversion?

Despite having said you’re new — or at least, having passed up the chance to say you’re a returning customer and login — the “already a registered user?” is the most prominent box, the first thing people are going to see on this page. Do that many people enter their email and password on the first page as if they’re new? Regardless, this feels like something you could clarify in one step, rather than have two confusing duplicative pages.

There is also, at the top, a duplicate email/password path to log in. This feels like it might be a vestigial page that’s still in the path after a redesign because it’s load-bearing.

A big point about a small thing: note it would not let me use a + in my email address, which is a handy way to do useful things with your Gmail address. It claims the address is not valid. It is! The + is explicitly allowed in the spec for email addresses (RFC 5322’s local part)! Sure, many sites don’t let you use some special characters, but they are lying when they say it’s not valid. This is not a great way to go for a site you’re going to trust with incredibly intimate information.

I know, it’s an email validation thing, and maybe they have problems with people using valid special characters and it bouncing (I would wager this justification is not backed by good data). 
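To make the spec point concrete, here is a sketch (my illustration, not EliteSingles’ actual validation code) of how an overly strict pattern rejects a perfectly valid address, while a pattern that admits the characters RFC 5322 allows in the local part accepts it:

```python
import re

# A common-but-wrong validator: only letters, digits, dots, and hyphens
# in the local part. This is the kind of pattern that rejects "+".
NAIVE = re.compile(r"^[A-Za-z0-9.\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$")

# Still simplified, but it admits the special characters RFC 5322
# allows unquoted in the local part, including "+".
PERMISSIVE = re.compile(
    r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}$"
)

addr = "derek+elitesingles@gmail.com"  # hypothetical plus-tagged address
print(bool(NAIVE.match(addr)))       # False: the naive pattern rejects it
print(bool(PERMISSIVE.match(addr)))  # True: the valid address passes
```

Neither pattern is a full RFC 5322 parser; the point is only that a “+” in the local part is legal, and a validator that rejects it is the site’s bug, not the user’s.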

How you act in the small things is how you act in the big things. I don’t like this. 

Sigh, let’s go.

elite-hand-holding.png

Please note that one hand has painted nails and a smooth, seemingly hairless arm, and the other — it’s another white heterosexual couple is my point. Do you see this if you’ve said you’re homosexual? Place your bets…

It does. Of course it does.

Anyway, I too am excited to be provided with exciting matches.

elite-gender.png

What a pointless, offensive screen. One, you’ve already identified yourself as “man” or “woman” in the first screen — a question asked without making it about gender.

But here you go! You have to confirm your gender. Your gender has to be one of those two things. What a crock of shit. I can’t even imagine what it feels like to have dealt with gender identity discrimination and come across a gate like this… when you’re trying to meet people.

Why would you do this?

Anyway, I was mad enough about this that I emailed their support address and said

Hi!
The first question in the registration process is to pick a “gender” of
male/female. It’s 2019, gender’s a spectrum and biological sex is kinda
irrelevant. Why are you asking this as if it’s a binary? What happens
to people who don’t identify with either binary?

A little glib, maybe. Anyway, they got back to me:

Dear Derek,

Thank you for your message.

I appreciate what you are saying and that opinions differ around this
subject currently. We currently don’t have any plans to change this part
of the process however I will be sure to pass your comments along to
the relevant department.

Let me know if you have any further questions.

Please note that the question is not answered. But it’s the “opinions vary” that infuriates me. To have this page, to have this page like this, is a clear choice on which opinion you’re lining up with, and it’s the most regressive and hurtful one. This is such a crazy, dismissive, bullshit response. I want nothing to do with them. I abandoned the first time I hit that screen, and now… I’m writing this. Afterwards I’ll go into one of those sci-fi decontamination chambers where a burst of radiation destroys the outer layer of my skin entirely, hopefully before this seeps in.

How bad does this get? It gets pretty bad doesn’t it? Some highlights!

elite-partner-gender.png

I’m supposed to confirm their gender? Look, you’re the ones asking intrusive questions so I don’t have to. 

elite-marital.png

Why, EliteSingles. Why. Are you doing this because Match does it?

elite-beliefs.png

Beliefs. Okay, that’s better than asking what your religion is and forcing everything under that label. I’m at least grateful —

elite-partner-beliefs.png

WHY WOULD YOU DO THAT.

elite-ethnic-group.png

Is this the touch screen kiosk at a checkpoint in a fevered sequel to the Turner Diaries? What… whyyyyy… ugh.

Can you pick more than one? Guess. No, go ahead. You’ll be surprised at this answer.

I lied, you’re not going to be surprised: one. You’re auto-advanced to the next screen after you pick one. This is yet another point where I wanted to throw in the towel.

After answering enough questions to get to about the one-third mark on the progress meter, it’s time for an intermission slide!

Reader…

Is it a picture of what is almost certainly a heterosexual couple?

What do you think?

What are the odds that it is not? What would I have to offer against your $1 for you to take the bet that it’s not a heterosexual couple?

$10?

$100?

$1,000?

You’re still not taking $1,000.

Because of course it’s a heterosexual couple. Of course it is. Are they white? Your call, but… yes? It seems like a safe assumption.

elite-intermission.png

Many questions later…

elite-traits.png

Why not one line that says “Choose as many terms as apply to you” or another single sentence, rather than two?

Then… aren’t many of these terms everyone wants to ascribe to themselves? And who is going to check “unsuccessful” besides people who meant to click other bubbles and missed?

“Yeah, I’m good looking and attractive, but I also want to click honest, soooo distant, cold, argumentative, both dominant and dominating, irritable…”

You’re free to click as many as you want, but then they make you focus:

elite-your-friends-say.png

This is an interesting approach, and I’d love to see the data comparing what people initially pick to what they narrow it down to. I’d also be interested in how they use each of those: are they put to different purposes? Is (as I’d suspect) the second set weighed heavily in potential matches while the first is not?

Hey there’s another intermission!

Quick, is it a white heterosexual couple looking towards the future?

Don’t roll your eyes at me. 

elite-intermission-2.png

There’s a ton of multiple choice questions in the process like this:

elite-income-q.png

Interspersed with free text answers like:

Screen Shot 2019-11-18 at 11.22.38 PM.png


It’s unfortunate that these are so late. At this point you’ve been in the process for at least 30 minutes from the start. Who’s going to answer these well?

Their distance question is interesting to me:

elite-distance.png

Even if you want 200, you have to move the slider before ‘OK’ is usable, which I don’t understand.

But my point: defaulting to 200 allows them to mitigate a huge problem for non-Tinder sites. If you join a site like this, especially if you’re paying — and do note that at no point in this process has there been a glimpse of “there are actually people on the other side, see?” as Match.com did — you need to come out of the gate with something. So it’s framed as “are you willing to travel in your search for a partner” and it’s anchored cleverly: the top option is infinite! Are you a true romantic or do you have blinders on in your search?

I’m a little surprised the language isn’t stronger: “How far would you travel to meet your partner?” or anything else that directly suggests not traveling farther might mean you’re not going to meet your match. And the “I don’t mind” for infinite distance seems tepid. “I’ll travel anywhere” or “We’ll figure it out” would be stronger language.

I will now reveal what happens when I’m tired, have spent 45 minutes on this process, and come across some baffling UX. It’s… it’s not flattering.

elite-answer-questions.png

First, they make it seem like you’re just doing this for funsies, while they’re lining up matches for you. But there’s no progress meter, no updates that they’ve got 1, 20, -4 matches in the queue. So why hang around indefinitely? And why take these seriously, if they’re already able to go find your matches?

I don’t have answers.

Second, let’s talk about how confusing this is. 

There is a “Next Question” button. What do you think “Next Question” does? You type your answer, you hit next question, right?

Because there’s a button there that says “Save & Continue” that, presumably, saves how far you’ve gotten and you come back to it. That’s a reasonable assumption. It’s even labeled “Later.”

No. “Later” is a link, it bails you out entirely. It is a third, different action, which does something larger than those buttons, but is in smaller text. 

If you enter an answer and then hit “Next Question” you are not given a “Discard answer and go to the next question?” warning or anything. You just get the next question, as if everything’s fine. 

Why would you do this? You have three actions:

  • move to the next question, discarding any answer the person’s entered

  • move to the next question, saving the entered answer

  • quit answering questions entirely

I’m baffled why you’d choose the UI they did, where the first button… argghhh.

As you can probably guess by now, I spent minutes answering questions and hitting “Next Question” until — 

Screen Shot 2019 11 18 at 11 35 45 PM

I bailed on questions, and as a reward received this screen!

elite-subscriptions.png

I found this page incomprehensible.

“Member Favorite” like it’s a mobile game asking you to purchase qDollaz or something.

First, the tiers make no sense to me (and I will not be typing them in all-caps). Premium can’t be Light. Premium Classic, sure, and Premium Comfort. But the distinction is Light to Classic? Light to Comfort? Light has fewer features, but there’s seemingly no difference other than length of contract between the other two. But they have different color schemes! Classic is in italics!

Comfort has that gold sparkle on the name! And is otherwise shown in a drab color scheme that isn’t like what we see elsewhere in the site. Why is that one not done in the EliteSingles green? Why is even the “Continue” in a different color — one that hasn’t meant ‘go’ in the process so far?! Whyyyyy?

Classic is also the only one with a discount! It gets a red badge! And a red price!

Second, this pricing is just wild. As far as I could decipher:

  • Light has fewer features, but you’re only on the hook for 3 months, so it’s $174

  • Classic has all the features, for six months, so $210

  • Comfort runs for twice that term at a slightly lower effective rate, so $384
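Doing the per-month arithmetic on those totals (a quick sketch, using the prices as listed above) makes the inversion plain: the tier with the fewest features costs the most per month, and drab-looking Comfort is actually the cheapest rate.

```python
# Price totals and contract lengths as shown on the subscription page.
tiers = {"Light": (174, 3), "Classic": (210, 6), "Comfort": (384, 12)}

for name, (total, months) in tiers.items():
    print(f"{name}: ${total} over {months} months = ${total / months:.2f}/month")
# Light comes to $58.00/month, Classic to $35.00/month, Comfort to $32.00/month.
```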


I don’t get why it’s framed this way either. The worst quality product is presented as if it’s the most expensive. Maybe they’re trying to anchor people with that higher price, but that’s confusing as hell. “Wait this one is terrible… and expensive?” And then the one they’re pushing, which has the “just right” middle price and middle term, shares the color scheme with the crappy one, features with the Comfort, is discounted, but is still more expensive…

It’s like they wanted the reaction to be “Ugh, what? Oh hey, Goldilocks option before it gets all drab over there, I can’t be seen buying something where the buy button isn’t even enabled…”

This isn’t how anchoring works, though. If you do anchoring well, it’s more like…

  • Luxury! Our platinum toothpicks… $50,000 for a set of two. Monograms available, inquire.

  • Good! These reusable toothpicks are made of stylish graphite! $10.

  • Enh! Toothpicks, like you get in the store. A box you’ll spill on the floor before you get through it, $2.

  • Awful! Made of the worst, most splinter-prone ash we could source; you’ll hate using these only a little less than your dentist hates you for using them! Crate of millions, $1.


Right? You can experiment with pricing, number of options, and features, but this is what you want. You want the customer to say “Can I afford the better version I want?”

Here, the worst version is also presented as the most expensive.

I can only think that they’re doing this because they want everyone to pick that option. If that’s the case, why present options? Or, why not just present one product, one price? Or perhaps just show different term/rate combinations as the way to offer choice?

I don’t get this screen.

I also didn’t get EliteSingles Premium Classic, or any of their options.


I feel like I need to offer some kind of conclusion, some neat summary of the experience and recommendations from a Product Manager-y perspective.

I can’t. I’m going through this both as someone who has lived in this world and as someone who would dearly like to find one of these services that is not terrible. When I go through something like EliteSingles’ onboarding, I’m just sad. Why not pay attention to things like diversity of appearances? Even if you truly, fervently believe there are only two genders, why be a jerk about it to people who don’t fit that binary? What’s the point? Gender isn’t binary; it’s hugely complicated, and I’ll acknowledge we as a society haven’t figured out how to work with that complexity. We can try, though. We can at least do that.

I also always feel like when you notice something like this — and I’m a privileged heterosexual cis white man, not nearly as good at seeing these things as I could be, but I’m trying — if even I see it, it’s a huge red flag that the people and/or the company as a whole are lacking in empathy.

Dating is incredibly difficult and stressful. You don’t want to go into it with a company you don’t trust, and don’t trust to be honest and understanding.

I don’t, anyway.

A Product Manager attempts to date: Match.com onboarding

I’ve been in and out of online dating for ages, sometimes paying, often not. After years of hearing I should be on a pay site, I watched a friend join one, meet their partner, and move in with them. Since that’s what I want, it seems worth a try. As a Product Manager by trade who has worked on onboarding before, I’m finding the experience fascinating.

Let’s talk about Match.com

NewImage

Okay, that’s a ton of white people, and all the women have long hair and all the guys have short hair and… I wondered if they were IP sniffing and knew I’m somewhere that’s been gentrified to hell, but I VPNed in from other locations (Atlanta was one where I figured it’d have to be different) and got the same picture.

That’s a great sign.

So what are our options?

Screen Shot 2019 11 15 at 12 46 10 PM

Sigh. That’s not… great. So if you’re bisexual, you have to pick one? If it’s even more complicated than that, you’re entirely screwed? This seems so immediately exclusionary, and to what end?

I’ll continue on because I’m curious if this is going to stick and I enjoy torturing myself with hope. It’s worth noting that the prompt to go forward is “View Photos” which isn’t the user’s goal when they come to the page. They want to find a match at Match, not view photos. If they wanted to make themselves feel awful by looking at beautiful people they’re not going to get with, they’re not going to come here.

Match free

Nowwwww they’re putting the carrot in front of you. Here’s the interface you’ll see, there are 1,596 matches if only you’ll give them your Facebook or email.

Once you do that, though, the carrot is removed. There’s work to be done!

Match so marriage huh

That this is not “Relationship Status” but “past relationship status” always pisses me off. I’m divorced. Why does that matter? Why is that different than someone who lived with someone for a decade? Why is this question framed around marriage?

Further, this entirely neglects that there are a ton of other possibilities just within the boundaries of marriage. I don’t like this one at allllllll.

But I really want to talk about something in the background. 

That background image of the (seemingly) hetero couple? Shows on all the paths. If you’re looking for a same-sex relationship and signing up for Match, you get to stare at those two while you’re answering all the signup questions. There will be many questions.

I wonder if the non-hetero abandon rate is higher than it should be just on this. I hope so.

This is so low-effort, too! You need two more pictures so gay men and women see couples that reflect them. Is this just… no one who works on Match is gay? That can’t be true.

NewImage

This isn’t a potentially painful question or anything, thanks. Also, another question where you’re forced into an answer and maybe you don’t know, or maybe the answer doesn’t fit into these. This whole thing feels… vaguely coercive.

Interestingly — and we can chalk this up to progress, or to Match just having one path — the questions appear identical between paths. I admit I was afraid I wouldn’t be asked this when I tried those paths.

“Of course they’re asking gay people if they want kids,” you might say. “It’s 2019!”

And again, you have to identify in step one as a man or a woman, seeking a man or a woman. 

Anyway check out this progress bar:

NewImage

You’ve made progress, for sure, but how much longer? We have no idea. If it was soon they’d tell us, though.

What about only showing dots?

It must be deliberate to not reveal the icons as you go — if you know there are more steps, you forge ahead, but if you knew you were about to get asked about babies and smoking, that’s a look into a future that requires energy and is likely to be unpleasant. I would love to see their A/B results if they tried this.

NewImage

I hate that ethnicity is always a thing. Why? Because people are searching for or to avoid other ethnicity? Oh right, that’s exactly it.

NewImage

Framing here is also interesting: you must have a religion. Atheist? Agnostic? That’s a religion. I know, I’m being crotchety now.

They only show the top six options initially, and you have to expand to get to more. Why six? 

Your thought process to even get to those six has to be something like

“Protestants, of course, they’re nearly half the population, and Catholics, another quarter or so. Then it’s people who say no religion, but we can’t just put that, because we’re making them choose a religion. Ugh. Let’s break that down into ‘non-believers,’ ‘don’t care,’ and ‘New Age hippy-dippy people,’ and what’s that leave us? Non-Christian religions and Judaism as a whole? Ugh, just… put it behind a ‘more options’ button and call it a day.”

And that “Christian/Other” is first, when that wording makes it seem like a thrift store red tag item. Demographically, you’d think “Christian/Protestant” would be first — it’s almost half the US population. 

This interests screen intrigues me:

NewImage

One, you’re forced to pick five and only five. There are a lot of implications in this approach. If you’re like me, you look at this and start to trade off what you’re going to put in there if you only have five. You have to pick the things you see as most defining, because you can’t do a laundry list like OkCupid. But also, what if nothing I like is on here? Do I just look at my overfull glass of Pinot and click “Wine Tasting” while sighing?

Also, Pinterest is its own thing? Facebook isn’t. Instagram isn’t. I’m curious how they got to that list.

Now we get to the search.

NewImage

Nah, I’m not looking for anything specific. I’ll just root through your dumpster or whatever. Don’t mind me.

Derek, how would you word it then?

I’d try something more goal-oriented, like “What do you want in a match?” 

Also, yes, this exercise is wearing on me.


Again, their “marital status” one makes me want to scream:

NewImage

“Hey how are you doing?”

“I used to be bad.”

“Okay, but now?”

“I can only tell you what happened in the past.”

“Cool.”


What proportion of people seriously want only widowed or currently separated matches? Is it really enough that adding this to both the profile and the search flow is worth the cost? That seems so unlikely.

And again, I hate that they let you search by ethnicity. It’s just gross.

NewImage

Now that you’ve picked the kind of flesh you’re interested in, time to flesh out your profile ha ha ha ha ha I’m dying inside, help, please, help.


NewImage

You’re forced to add one topic, and then you have the option to do two more… and wait! Now there’s only one dot remaining at the bottom? We’re ALMOST DONEEEE.

Now! Upload that image!

NewImage


I dig this, actually. It’s clear about why you should upload a picture, it’s clear what the next step is — but again, do people come to Match to view and be viewed, or to find a partner? If it’s a partner, then how much do my chances improve there?

The actual experience fell down for me, too: when you upload, nothing happens for a long time, so you might upload the same image repeatedly, and then it only complains on the next page… anyway.

full member

Hey wait, I’m not a full member? I just did all that work! Continue…

NewImage


This concludes my examination of Match.com’s onboarding process.

Of course it doesn’t.

I will also note that if you have an adblocker on, throughout the process you’ll see you’re avoiding a staggering amount of crap. Disabling the ad blocking does not get that page to load though.

I’ll do them the solid of heading back to the base site. Oh sweet, a CAPTCHA in order to log back in.

NewImage

Kinda feeling like that fire hydrant actually. Fiiiine —

NewImage


Ha. This concludes my look at Match.com’s onboarding.

Learning from uncooperative A/B testers

One of the joys of working at a tiny startup packed into an ill-equipped, too-small space was running an account at Khaladi Brothers, the coffee shop across the street, because all small meetings had to be done outside the office. As the top coffee nerd, I took on running fresh vacuum pots over (and yelling “Fresh pot!” as I entered) and exchanging the empty ones. When we moved to a newer, spacious, swanky, and quite expensive office space (hot tip to startups: don’t do this) with an actual kitchen and drip coffee maker, I was put in charge of deciding which coffee beans we’d order. We had many options and an office of high-volume consumers with strong opinions on everything, and needed to get down to one or two for the recurring bulk order.

Naturally, as a Product Manager, I decided to do selection through a series of A/B tests.

Must-have for the tests:

  • end up with a winner
  • clear methodology, publicly exposed
  • easy to participate — or not
  • takes as little of my time as possible, because this was amusing but everything’s on fire all the time
  • keep momentum out of it (so the first voter didn’t disproportionately determine the day’s winner)

I discarded forced-choice, so coffee drinkers didn’t have to vote if they didn’t feel like trying both or didn’t find a winner; I decided against setting up a dedicated “today’s test” table or doing “three-sample, can you tell if one’s different” advanced testing; I didn’t try to timestamp votes to determine if one did well fresh and one did well through the day… nope!

I went straight single-bracket, winner-advances, random seeding at each round. Every day I tried to get to the office before everyone, and made two giant pots of coffee labelled “A” and “B”. If someone wanted to vote for a winner, they could write it down and drop it in a tin, which I tallied at the end of the day. I will admit that having come out of Expedia, where our A/B tests were at colossal scale with live customers, this whole thing seemed trivial and I didn’t spend as much time as I might have.
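The mechanics, sketched as code (the coffee names and vote function here are hypothetical; this illustrates the bracket logic, not my actual tallies):

```python
import random

def run_round(field, votes_for):
    """One elimination round: random seeding, winner by any margin advances.
    `votes_for(a, b)` returns the day's tin count as (votes_a, votes_b)."""
    field = list(field)
    random.shuffle(field)
    winners = []
    if len(field) % 2:  # odd field: one coffee gets a bye this round
        winners.append(field.pop())
    for a, b in zip(field[::2], field[1::2]):
        va, vb = votes_for(a, b)
        if va == vb:  # a tie rule I never wrote down: coin flip
            winners.append(random.choice((a, b)))
        else:
            winners.append(a if va > vb else b)
    return winners

def tournament(field, votes_for):
    """Single bracket, winner-advances, re-seeded randomly each round."""
    while len(field) > 1:
        field = run_round(field, votes_for)
    return field[0]

# Hypothetical example: the alphabetically-first coffee always wins 9-8.
beans = ["generic", "kirkland", "starbucks", "tullys", "stumptown"]
print(tournament(beans, lambda a, b: (9, 8) if a < b else (8, 9)))  # prints "generic"
```

Random re-seeding each round means no coffee gets a protected path, at the cost of possibly pairing the two strongest coffees early.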

You may already see where some of this is going. “I know! I too am from the future,” as Mike Birbiglia says.

It was not trivial, and I ended up learning from the experience.

Test assumptions, set baselines: I didn’t have 32 coffees, which was good, because some days I did an A/A test to see what the difference would be. I was surprised: on those days voting for winners was down, and results were remarkably close to 50/50 — the highest split was 58% (10 of 17), one vote off as straight a split as 17 votes allow.
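For scale: a 10-7 day sounds lopsided, but under a fair 50/50 null it’s entirely unremarkable. A quick check (exact binomial; my own after-the-fact math, not something I computed at the time):

```python
from math import comb

def two_sided_split_pvalue(k, n):
    """Probability, under a fair 50/50 null, of a split at least as
    lopsided as k out of n votes (two-sided exact binomial)."""
    k = max(k, n - k)
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

# The most lopsided A/A day: 10 of 17 votes for one pot.
print(round(two_sided_split_pvalue(10, 17), 3))  # → 0.629
```

In other words, identical coffee produces a 10-7 split (or worse) almost two-thirds of the time, which is worth remembering before reading anything into a close head-to-head day.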

Know that blind tests mean subjects may reject results, or: Starbucks did really well. I don’t know what to say. I figured they’d barely beat out the clearly awful generic ones and Tully’s, but some of their whole beans did well, making it all the way to the semi-finals. Participants were not happy to learn this, came by to ask questions, and were generally reluctant to accept that they’d preferred it. If a Starbucks bean had won but it had made people unhappy, would I have gone through with ordering it? I’m glad I didn’t have to confront that.

Also… yeah, Seattleites have issues with Starbucks.

Consider the potential cost of testing itself. The relatively small amount of time I thought it would take each day turned into way more effort than I’d hoped. Doing testing in public is a colossal hassle. Even having told everyone how I was doing it, during the month this went on, there were those offering constant feedback:

  • it should be double-blind so I don’t know which pot is which
  • it should have three pots, and they might all be the same, or different
  • no, they’re wrong…
  • it’s easy to come in early and see which one I’m making

…and so on. By week two, getting up early to make two pots of coffee as someone offered methodological criticism was an A/B trial of my patience.

If testers can tamper, they will — how will you deal with it? For one example, I came into the kitchen one day to get a refill and found a developer telling everyone he knew which pot was which: he’d seen me brewing and poured an early cup off one batch, so he knew the pot with the lower level indicator was that one. He was clearly delighted to tell everyone drinking coffee which one they’d picked. I honored the day’s results anyway.

This kind of thing happened all the time. At one point I was making the coffee in a conference room to keep the day’s coffees concealed. In a conference room! Like a barbarian!

I was reminded of the perils of pricing A/B experiments, which Amazon was being called out for at the time — if customers know they might be part of a test and start clearing their browser cookies and trying to get into the right bucket, how does that skew the results? “People who reloaded the page over four times converted at a much higher rate… we should encourage refreshing!”

Think through potential “margin of error” decisions when structuring tests. There was a coffee I liked that dominated early rounds and then in the semi-finals lost by two votes to a coffee that had squeaked by in previous rounds by 1-2 votes each time. What should I have done in cases where the vote was so close? I’d decided the winner by any margin would advance, but was that the way it should have been? Should I have had a loser bracket?
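One way to settle that in advance is to compute, for a given day’s turnout, how big a winning margin you’d need before the result is distinguishable from a coin flip. A sketch (exact binomial under a fair-coin null; the 0.05 cutoff is my illustrative choice, not a rule I used at the time):

```python
from math import comb

def min_decisive_margin(n, alpha=0.05):
    """Smallest winning margin at which a k-vs-(n-k) split of n votes
    is significant at `alpha` under a two-sided fair-coin null."""
    for k in range((n + 1) // 2, n + 1):
        tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
        if 2 * tail <= alpha:
            return 2 * k - n  # winner's votes minus loser's votes
    return None  # no split of n votes clears alpha

for n in (10, 17, 30):
    print(n, min_decisive_margin(n))
```

By this standard a 17-vote day needs a 13 to 4 result (a margin of 9) before it clears p ≤ 0.05, so advancing a coffee on a two-vote win really was a judgment call, not a signal.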

In the end, we had a winner, and it was quite good — and far better than what the default choice would have been — but I was left unsatisfied. I’d met the requirements for the test, it’d been a pain in the ass for me but not taken that much time. I couldn’t help but think though that if I’d just set up a giant tasting session for anyone who cared, and let them vote all at once, I’d have saved everyone a lot of trouble and possibly had a better result.

But more importantly, like every other time I’ve done A/B testing in my product management career, the time I spent on the test and in thinking through its implications and the process helped me in every subsequent test, and was well worth it. I encourage everyone to find places to do this kind of lightweight learning. Surely there are dog owners out there wondering what treats are best, and dogs who would be happy to participate (and cheat, if you’re not wary).

Go forth!

Two dual-track agile attempts, and lessons for future generations

I’ve been through two tries at implementing dual-track agile, and I’d like to offer some perspective on the travails, the pros, the cons, and offer some advice to those who might attempt it.

What’s dual-track agile?

In short — you pull a product manager, a designer, and a developer with broad knowledge of the area you’re working in forward in the process, taking on scouting and evaluating work, which then drops into the backlog of whatever agile methodology you’re using.

This is intended to solve the problem of how you develop the stories a team will work on — the design sprint, the reset sprint, the investigation card, or carving capacity for those tasks out of the team’s capacity — approaches that often provoke both methodology challenges and a level of bikeshed argumentation about methodology that can be immensely draining.

Here’s Marty Cagan on this in 2012
And here’s this good Mind the Product article


Implementation one: everyone jump into the pool at once

At Simple, we’d been through a couple of massive crash re-platformings that hadn’t delivered new features to customers. We had two product teams (one, mine, working on what would become Shared Accounts, and another working on long-neglected customer service features), so we were going after things that would create value for our customers, but we were still not shipping.

We brought in Silicon Valley Product Group to do an assessment and recommendation. One of their number came in, did interviews with key stakeholders, and then in a day-long session in which almost everyone was present (we had our recruiters there!), told us what they’d seen as problems and offered a set of prescriptions. The biggest and most systemic one was to go to dual track agile.

Our leadership then declared we’d adopt dual-track agile, and so it was.

It didn’t take. We adopted dual-track in name, but in practice we couldn’t get the developers with the required knowledge to participate, and so discovery withered. Our developers couldn’t move into that work while being continually pulled into on-call, retiring deep technical debt, and doing the architecture work that would keep a team running.

Without developer participation, the discovery track could evaluate whether work was valuable to the end user and whether that value could be realized by customers, but items about to go to the team still didn’t have an ROI, because we still needed to figure out the I. That in turn meant there still needed to be an additional step in what should be the “delivery” track to do even high-level development investigation.

What could we have done?

  • no methodology exists in a vacuum — consider the people and teams that will be doing the work. The people may be willing, but if circumstances don’t permit, you’re set up to fail
  • if there are changes you can make that would allow you to use a methodology, you have to make the changes or wait — don’t do the reorg or change your approach and just hope
  • a top-down mass implementation wasn’t the way to go — ever the pragmatist, I’d have rather found a team where it was a particularly good fit, done it there, and then learned lessons that could spread. When we started doing Scrum on teams at Expedia, we got approval to try it in three teams with good conditions and very different problems (flight search, hotels, and packages) and were able to learn from each other and measure our success against other teams and spread the good word.

Two: easing into it

As Lead Product Manager at Manifold, I was able to drive methodology. I decided to follow my own advice and do it real subtle-like. We started to use Product Pad, doing more and more discovery activities… and my intention was to use more and more of the process until we’d be using dual-track across the teams without having to talk about it.

During this time, my boss, the VP of Eng, encouraged me to just do it! We’re small, you can do a lunch-and-learn and just go!

I, having been burned by the previous experience and encouraged by the success of gradual Scrum adoption at Expedia, declined and kept to the gradual approach. Let this be a lesson to us all:

  • If you have the opportunity to jump ahead, and the support to do so, maybe just do it

Dual-track took… kind of. Unfortunately for dual-track, the company made a hard business pivot and organized around long-term contractual obligations (so product teams organized around delivering promised functionality rather than pursuing objectives of their own). There’s a little discovery work to be done within that, but you’re not going out to do basic user research, address problems, and so on.

Fundamentally, dual-track exists to support testing ideas, learning, and exploration. If you’re not doing that as a business, don’t adopt a framework that supports it. It’s like the difference between a road trip and a regular commute. One requires research, planning, friends, snacks, and a good dog to hang out the window if you’re lucky; the other requires you to figure out the vehicle and route once and then pay attention.

I also encountered more resistance, in the form of nearly endless tool and process questioning, than I’d expected or was prepared for. I found myself giving answers like “What we do when robosaurs attack local food distribution centers is a good question, and we’d have to talk about what that would mean for ticket handling…”

Now, what was going on? There were two culture clashes with Manifold’s stated values of transparency and autonomy, where anyone should be able to do anything as long as there was an audit log and the action was reversible.

1: dual-track itself seemed to clash with the culture: there would be three people who, outside of a product team’s normal activities, would be making decisions on the direction of the team and what work there was to be done.
2: tooling: introducing another tool for people to use raised the question “why not just do all this in (JIRA/GitHub/___)?” In particular, the tool I’d introduced, Product Pad, is great for a lot of things but has a ton of restrictions on roles (for instance, a normal user can’t add stories to an idea filed by someone else) that rankled people. I had not done enough to consider this.

What happened?

We started: ideas and feedback went into the hopper, there were processes for discovery, and the process was used on some more-nebulous pieces to create a roadmap.

I feel like I should have abandoned it when we made our pivot — I should have treated a change in direction and in how we work as a company as worth wiping the whiteboard clean and evaluating the process challenges against the kind of work we’d be doing over the next 12–18 months.

Questions to ask and things to do as you weigh dual-track

First, think about this:

  • What’s your elevator pitch for why the change is worth it — what will you gain, or what known harms would it have prevented?
  • What’s that pitch for each of the stakeholders in the process? What’s your pitch to your Marketing partners, for instance?
  • Who’s your champion at the exec level?
  • Consider the culture: will people react poorly to the appearance of a trio of people taking over the direction of the team where once it seemed open to all? What can you do in designing the process and communication to address those concerns?

And then, block out your calendar for a long time so you can concentrate, and

  • Map out the process from end-to-end before any public rollout or discussion.

How will an idea or customer request go through the process? What tools will be used? Consider the maximally complex cases —

  • A developer has an idea
  • We write up the idea — where? How is it tracked?
  • How do you decide that that idea is worth investigating out of all the other ideas? Who makes that decision? Where?
  • We need to do some user research on the idea — who does it? Where is that request tracked? How does the result get tracked?
  • We need to show a prototype to users, and it requires both design and a little bit of dev. How do you start and track that work? How do you record the work?
  • You now know that the idea has merit and can be made usable. How do you track that?
  • How do you cost out the work? How is that work assigned, and tracked?
  • How do you choose what goes onto the roadmap from all of the ideas that have cost/benefit ratings?
  • If you’re doing regular user research or usability testing, how will that fit? Where will those results go?
  • How will you communicate each decision out to everyone who is interested in the results?

Then, for each person in the process, list how many tools they must use, including any new ones, and how they’ll track their work at each stage. If they live in a world of GH issues and you now require them to put their head into Product Pad regularly, or participate in discussions there, you’re adding complexity to their life, and they’ll have to see a lot of value come out of this extra effort. Demonstrating that value may fall to you as the product manager.

Now you’ve got a rollout plan, of sorts, with an idea of the cost and what will need to happen for the process to be a success, and you can make an informed decision.

In summary, dual-track agile is a methodology of contrasts

Having been through two attempts to implement it, I’m much more likely to look at existing processes and find ways to build in things like regular user testing and fast feedback loops, and see if they improve things — or start to create the need for dedicated discovery process and activities.

I would love to hear other people’s success or failure stories implementing dual-track agile (or any of the latest hot methodologies) — please, drop me a line if you’ve got them.