The past, present, and future of AI, with Stripe

How machine learning fingers credit card fraud, how approaches have evolved over the years, and what is new in AI, with Stripe's Emily Sands.

This week I'm joined by Emily Sands, who leads the Information org at Stripe, my former employer. We discuss Stripe's use of machine learning and AI techniques over the course of the last 15 years, with specific reference to making high-velocity decisions to reduce fraud and increase the number of legitimate transactions which are approved without a hitch. (A credit card transaction is, as discussed, a 5-way handshake, except when there are even more hands.) We also share some of Emily's observations from Stripe's front row seat to seeing the new wave of AI companies scale rapidly.

Sponsor:

Vanta automates security compliance and builds trust, helping companies streamline ISO, SOC 2, and AI framework certifications. Learn more at https://vanta.com/complex

Timestamps

(00:00) Intro
(01:21) Stripe's role in financial infrastructure
(02:20) AI and machine learning at Stripe
(07:36) Understanding payment processing
(15:06) The evolution of fraud detection
(19:22) Advanced fraud detection techniques
(22:55) Sponsor: Vanta
(24:14) Advanced fraud detection techniques (part 2)
(34:53) Card testing and fraud prevention
(45:29) Adaptive acceptance and future of payments
(46:35) Optimizing payment systems with machine learning
(47:13) Understanding ISO messages and credit card protocols
(48:37) The importance of semantically neutral changes
(51:22) Stripe's enhanced issuer network
(56:00) Handling chargebacks and friendly fraud
(01:04:55) Personalizing the checkout experience
(01:16:44) The rise of AI in the internet economy
(01:26:05) Agent-assisted commerce: the future of shopping
(01:32:58) Wrap

Transcript

Patrick McKenzie: Hideho everybody, my name is Patrick McKenzie, better known as patio11 on the Internet, and I'm here with my buddy Emily Sands at Stripe.

Emily Sands: Thanks for having me, patio11.

Patrick McKenzie: Thanks very much for coming. So can you tell people what you do at Stripe?

Emily Sands: I lead the Information org. We work on our data stack for machine learning (ML) infrastructure. We ensure engineers, scientists, ML engineers at Stripe can build and deploy production grade ML applications all the way through to a bunch of the applied ML and science work directly.

Patrick McKenzie: Today we’ll cover the history and recent practice of machine learning and other forms of artificial intelligence in the financial sector, with a particular interest in credit card transactions and other forms of transactions, because that is where the bread is mutually buttered.

I suppose before we start, I'll give a disclaimer. I used to work at Stripe for about six years and left in early 2023. I'm still an advisor and I am not necessarily speaking for Stripe in anything I say. Unfortunately, my technical knowledge is degrading over time as I no longer have access to the GitHub repository and similar.

That out of the way:

Stripe's role in financial infrastructure

So Stripe has been working on ML for a very long time, hasn't it?

Emily Sands: We have. Stripe has for about 15 years now been building global programmable financial infrastructure. So tools to increase the GDP of the internet. And we started really by helping digitally native startups accept payments online. And Patrick, you certainly had a front row seat to that era.

If you fast forward to today, there are millions of businesses that rely on Stripe for much more than just accepting payments. They rely on us to reduce fraud, to manage complex money flows, to bring together their online commerce with their offline brick and mortar commerce, to launch embedded financial products, including things like providing capital, so loans to the businesses that are running on them. And the list goes on.

But basically we help businesses, from newly formed startups to now also half of the Fortune 100, innovate and grow more easily.

AI and machine learning at Stripe

And AI has been a core part of that story for a long time. Certainly as you alluded to in the charge path directly, so for fraud detection, for making sure good payments go through, but AI also plays an important role beyond the charge path.

We use it to personalize checkout experiences, very up funnel from the transaction. We use it to help the 13,000 platforms and marketplaces that run on Stripe better understand the merchants that are onboarding to them. And we use AI to help businesses of all sizes and all business models learn from their own data so they can grow their revenue more efficiently.

Patrick McKenzie: And we will talk about some of the interesting things that one does with all the data once one has it. But Stripe is processing an awful lot of data these days connected to an awful lot of transactions, as anyone who reads Stripe’s annual letter [PDF] can read. Would you like to quote the eye-popping numbers from that?

Emily Sands: There were a few different eye popping numbers to quote, but top of mind for me is just the Stripe network generates very rich real time data on the internet economy. So in 2024, we processed $1.4 trillion in payments and it's a lot of zeros.

To put it in perspective, that's on the order of about 1.3% of global GDP. And that payments volume is up about 36% year over year. And it's really that scale that allows us to detect and ship optimization opportunities very, very quickly for our users.

Patrick McKenzie: And one thing that I would like to emphasize to people is that we're going through heady times right now with the adoption of LLMs and other sort of cutting edge approaches in the AI field. But there is in some circles around the Internet a question on whether LLMs are going to turn into productionizable value at real companies anytime soon. I'm less skeptical than those circles are, but be that as it may, this is mature technology that has been deployed for 15 plus years now.

We're going to talk about some of the evolution of the sort of orchestra of technologies used over that time. This really runs every day, really does reduce customers' risk, fraud risk, helps increase the amount of revenue they generate. This is very much not science fiction. This is financial industry fact.

Emily Sands: Definitely not science fiction and maybe since you referenced LLMs it's useful to just level set on sort of what kind of AI are we talking about here? The answer at Stripe is we use the full stack from sort of traditional machine learning all the way to large language models and it just depends on the job to be done.

Traditional machine learning models like gradient boosted decision trees are really great when we need fast and interpretable predictions and where the signals that are informing the decision, so the features we're gonna hand engineer to feed the model, are relatively obvious things. So say we're gonna estimate the likelihood that a charge might result in a chargeback.

But we also run a lot of deep learning models, especially transformers in places like complex fraud detection. And deep learning models are just really great at picking up patterns that aren't obvious. There are a lot of transactions that would seem fine based on hand curated features that an engineer sort of could develop and intuit. But then there are underlying signals that would say otherwise.

Foundation models are obviously sort of a supercharged version of deep learning models. These are massive deep learning models trained on broad data and built to be general purpose. 

And then, when it comes to large language models or LLMs that you referenced at the top, the kind that power chatbots and assistants for language relevant contexts, those are used at Stripe in applications like Sigma Assistant, which is about letting teams ask a question in plain English of their Stripe data and get back a SQL query that they can run right away, so like talking directly to your business data.

So I would really reason about it as a layered stack. We use the full stack from very focused models to general purpose ones, and they work together based on the task at hand.

Patrick McKenzie: And I have a disadvantage in that my undergrad concentration in AI is now almost 20 years out of date, but I'll take the liberty of defining one word that means a different thing in machine learning research land than it does in standard English, even in the tech industry.

The word feature, which you referenced a couple of times: typically you write a small amount of code to extract a bit of signal from some underlying transaction or document or other thing that you're looking at. Either the signal that you're extracting or the code that generates it gets called the feature.

And so it could be something like, a bugbear which we'll return to later, “Is this transaction conducted in a country different from the country of the financial institution which originated the card that the transaction is running over?”

Emily Sands: Yes, and lots of counter features too, right? How many times have we seen something from this card at this IP versus another IP over the last n days or n transaction counts and so on? [Patrick notes: Velocity checks are hugely probative and surprisingly difficult to get running at scale. We’ll return to this later.]
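A counter (velocity) feature of the kind Emily describes can be sketched in a few lines. This is a minimal in-memory illustration, not a description of how Stripe computes it: the class name `VelocityCounter`, the seven-day window, and the card and IP values are all hypothetical, and a production system would need shared, low-latency storage rather than a Python dictionary, which is a large part of why these checks are hard to run at scale.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class VelocityCounter:
    """Counts events per key inside a sliding time window (hypothetical helper)."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def _evict(self, key, now: datetime) -> None:
        # Drop timestamps that have aged out of the window.
        q = self._events[key]
        while q and now - q[0] > self.window:
            q.popleft()

    def record(self, key, now: datetime) -> None:
        self._evict(key, now)
        self._events[key].append(now)

    def count(self, key, now: datetime) -> int:
        """How many recorded events for `key` fall inside the window ending at `now`."""
        self._evict(key, now)
        return len(self._events[key])


counter = VelocityCounter(window=timedelta(days=7))
t0 = datetime(2025, 1, 1, 12, 0)
card, ip_a, ip_b = "card_123", "203.0.113.5", "198.51.100.7"

counter.record((card, ip_a), t0)
counter.record((card, ip_a), t0 + timedelta(days=1))
counter.record((card, ip_b), t0 + timedelta(days=2))

now = t0 + timedelta(days=7, hours=1)
seen_at_ip_a = counter.count((card, ip_a), now)  # the t0 event has aged out
seen_at_ip_b = counter.count((card, ip_b), now)
```

Each (card, IP) pair yields one number, and a model would consume many such counters side by side.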

Patrick McKenzie: And a lot of the techniques that we'll discuss in a moment are sort of aggregating a universe of hundreds or thousands of features and trying to reduce that hilariously complex and dimensional space down to a single number or a single decision with regards to should we allow this transaction to go through.

Understanding payment processing

But before we go into the joys of clustering algorithms, I'm going to stand up on my soapbox as the person who writes about financial infrastructure occasionally [Patrick notes: someday I shall slay the self-deprecation demon; today is not that day] and do just the sort of high level detail of what happens when people buy something in a web browser so that they can understand the various points for intervention for machine learning models and similar. Sound good?

Emily Sands: Sounds good.

Patrick McKenzie: I assume everyone listening to this podcast has probably bought something online before. I'll use the simple case where you have a webpage open and you are at the checkout part of the form, either because the website is sending you via redirect to a payment processor or because they're using an iframe or similar technology to get information directly from you.

[Patrick notes: This is an improvement from the old way of doing things, where the business first receives your payment credentials to its own server, then sends them to the payment processor itself. This tends to cause businesses to accidentally retain many credentials, which results in them all being compromised retrospectively if the business’ server is compromised. It is now heavily discouraged by PCI-DSS, an industry self-regulation, to do this, and so most of the out-of-the-box payments experiences you’d implement in 2025 would have the business never receive the credentials directly. They receive a “token”—a reference to the credentials, generated by the payments provider, which gets the credentials from the user’s browser directly using one of a few technical methods.]
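The token flow in the note above can be illustrated with a toy model. Everything here is an assumption made for illustration: `HypotheticalProcessor`, its methods, and the `tok_` prefix are invented names, and a real provider adds encryption, authentication, and expiry that this sketch omits.

```python
import secrets


class HypotheticalProcessor:
    """Stands in for a payments provider; the name and methods are illustrative only."""

    def __init__(self) -> None:
        # token -> card number; this mapping never leaves the processor
        self._vault: dict[str, str] = {}

    def tokenize(self, card_number: str) -> str:
        """Called from the customer's browser: store the card, return an opaque reference."""
        token = "tok_" + secrets.token_urlsafe(16)
        self._vault[token] = card_number
        return token

    def charge(self, token: str, amount_cents: int) -> bool:
        """Called from the business's server, which only ever holds the token."""
        return token in self._vault and amount_cents > 0


processor = HypotheticalProcessor()
token = processor.tokenize("4242424242424242")  # a commonly used test card number
business_database = {"order_1001": token}       # a breach here exposes no card numbers
charged = processor.charge(business_database["order_1001"], 2500)
```

The design point is that the business's own storage holds only references, so compromising it does not yield reusable credentials.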

Patrick continues: Most websites these days will not have you actually send your credit card details or other payment credentials direct to the business that you're buying from. That goes directly to their payment processor, which might be a company like Stripe, or might be a company like PayPal or Adyen. There might actually be multiple companies, for example, one being in charge of selling the customer on the desirability of using payment processing, but another company on the back end actually doing the math and data collection. [Patrick notes: This is an extremely common practice in financial services. It’s all infra, all the way down, and many of the largest infra companies are ones that the typical user will never even learn the name of, like e.g. FinServ.] 

The payment processor is going to do a really fascinating bit of multi-party, multi-stage orchestration with the data that you've just provided. In the most typical case for credit cards, there's at least two financial institutions involved. One is the payment processor's bank, which is called the acquiring bank. And the other is the bank that is listed on the credit card. That's called the issuing bank.

In between those two, there are some technical systems for moving data between the two, which are associated with most typically the brand that created the credit card. So for example, Visa, although we'll talk about other potential options for being in the middle there in a few minutes.

The legal, organizational, and technical infrastructure that gets information from point A to point B are collectively known in the industry as rails. So when you say something is traveling over Visa rails, you're saying, okay, a) this is a Visa transaction b) the data is moving through Visa’s systems and processes to the financial institutions that are ultimately making the choices with regards to the transaction.

Which is an important thing to say. Every party that we've just named, and sometimes even more parties besides, gets a point of view on whether the transaction goes through. And their point of view is generally arrived at by a computer system in substantially real time. And those computers don't typically have another way to talk to each other. They are just implementing this protocol that has existed for, broad strokes, multiple decades.

Each of them is developed in parallel. They have independent views on the transaction based on having independent views on the economic substance of the life of their customer and the life of this person that is transacting with the customer who they might have no point of view on right now.

[Patrick notes: For example, a bank will typically have high confidence in the actual real-world identity of a customer using their plastic. The transacting business, on the other hand, might have just met this person over the Internet, and so they have vastly different levels of knowledge as to what the person typically does with their card(s). The business’ processor, on the other hand, has a great deal of knowledge of what the typical customer of this business does at the business and potentially at some other businesses (which it also processes for), which is information the bank doesn’t typically have.

Complicating things, some of the largest issuers have sufficiently distributed customers such that they have a partial census of the book of business of most processors, and can infer things about transacting businesses with material history because they’ve seen them as an issuing bank frequently enough.]  

Patrick continues: And then if all of them decide the transaction should go through, the transaction goes through. And if even one of them decides “Nope!”, then the transaction fails to go through. [Patrick notes: This is extremely surprising to business owners the first time they start running cards, particularly once they learn that they and their customer together can be overruled by intermediaries and the exact identity of the overruling intermediary is often opaque.]
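The "everyone must say yes" logic can be written down directly. This is a toy sketch: the party names follow the discussion above, but the individual checks (a risk score cutoff, a currency list, a credit limit) are invented placeholders, not what any real processor, network, or issuer evaluates.

```python
from typing import Callable, Optional

Transaction = dict
Check = Callable[[Transaction], bool]


def authorize(txn: Transaction, parties: list[tuple[str, Check]]) -> tuple[bool, Optional[str]]:
    """Approve only if every party approves; otherwise report the first party that declined."""
    for name, approves in parties:
        if not approves(txn):
            return False, name
    return True, None


# Hypothetical, independent views of the same transaction.
parties = [
    ("processor", lambda t: t["risk_score"] < 0.8),
    ("card network", lambda t: t["currency"] in {"USD", "JPY"}),
    ("issuing bank", lambda t: t["amount"] <= t["available_credit"]),
]

txn = {"amount": 120.0, "available_credit": 100.0, "risk_score": 0.1, "currency": "USD"}
approved, declined_by = authorize(txn, parties)
```

In this sketch the caller learns who declined; as the note above says, in practice that identity is often opaque to the business.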

Patrick continues: And so this introduces sort of the fundamental tension in processing payments, which is we have a choice between which payments we deny at the margin for reasons like, for example, presumed fraud risk. There are trade-offs to be made here.

We would obviously prefer there to be less fraud in the system, generally speaking. But fraud is an expense that businesses can choose to bear just like they choose to bear other expenses like rent or smart employees or investing in technical systems. And some businesses might choose to bear a little bit more fraud in the course of getting more legitimate transactions at the margin. Do you want to talk about how that trade-off presents differently to different businesses?

Emily Sands: Yes, absolutely. So before working in this industry, I actually had no idea how many things could go wrong in payments. I super appreciate you laying that out. And transaction fraud is an obvious one. And then on the flip side, all sorts of things can also go awry for good transactions.

[Patrick notes: When Tyler Cowen asked me what caused so-called spurious declines, I somewhat jokingly attributed it to “gremlins.” The spurious decline rate is a bit of a trade secret, but it is far higher than most people would assume you can tolerate in a well-established technology core to the modern economy. For some reasons, see my explanation there; we discuss countermeasures against spurious declines in a moment.]

And one of the tensions, although certainly not the only tension, is that the more a business restricts transactions that may be fraudulent, the more good transactions, all else equal on the methodology and the model and the data and so on, they are inadvertently blocking as well.

And these trade-off decisions or how you jointly optimize between conversion and fraud is heavily dependent, as you note, on the type of business and their business model. Usually, businesses are—I'm an economist by trade, so it may help me here—usually profit maximizing.

And how that manifests in sort of two concrete cases is that you reason differently about a SaaS provider's objective function than about the objective function of a brick and mortar goods seller, somebody shipping out a physical good. So both of them are probably profit maximizing, but in the case of the physical good, the cost of fraud to the business is very high.

And so when they get a chargeback, they've already shipped out the good, they've paid shipping, they've lost the good. And that's very expensive to the business. Whereas in the case of a SaaS provider, sort of other end of the spectrum, it's much more likely that their marginal costs of serving are low. And so a chargeback is much less costly.

And so then how that manifests in sort of the optimization function, both are probably wanting to optimize for profits, but the physical goods seller will bias towards blocking more fraud and be more okay taking a cost in terms of a conversion hit than will the SaaS provider, who will by and large accept some more fraud in exchange for driving up conversion.

And there are also cost implications here beyond the costs of the goods and services, which is the costs of the payment and the retries and so on, which I've abstracted away a little bit from in that example, but you can think of it as fraud and conversion are two sides of a coin and companies are trying to find their optimal trade-offs given their profit function.
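One way to make the trade-off concrete is a break-even calculation. This is a simplified sketch under stated assumptions: it treats each transaction in isolation, ignores retries and payment costs (as Emily notes her example does), and every dollar figure below is invented for illustration. Accepting a charge with fraud probability p is worth (1 - p) x profit_if_good - p x loss_if_fraud, which is positive as long as p stays below profit_if_good / (profit_if_good + loss_if_fraud).

```python
def expected_profit(p_fraud: float, profit_if_good: float, loss_if_fraud: float) -> float:
    """Expected value of accepting one transaction with fraud probability p_fraud."""
    return (1 - p_fraud) * profit_if_good - p_fraud * loss_if_fraud


def break_even_fraud_probability(profit_if_good: float, loss_if_fraud: float) -> float:
    """Fraud probability at which accepting and declining are worth the same."""
    return profit_if_good / (profit_if_good + loss_if_fraud)


# Hypothetical $100 sale of a physical good: $30 margin if legitimate;
# if fraudulent, the seller loses $70 of goods, $10 shipping, and a $15 dispute fee.
goods_threshold = break_even_fraud_probability(profit_if_good=30.0, loss_if_fraud=95.0)

# Hypothetical $100 SaaS sale: $90 margin if legitimate;
# if fraudulent, the seller loses $10 of serving cost and a $15 dispute fee.
saas_threshold = break_even_fraud_probability(profit_if_good=90.0, loss_if_fraud=25.0)
```

Under these made-up numbers the goods seller should decline anything riskier than about a 24% fraud probability, while the SaaS provider can rationally accept charges up to roughly 78%.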

Patrick McKenzie: And this gets fractally complex too. One dimension of the complexity is that there are some companies that have internal payments teams that have expertise around this sort of thing and understand that they are in a multivariable optimization function and have sort of internal essentially research results where we have tried this in the past. We've had these things. We would like to continue exploring some directions and not put additional efforts into other directions in the future.

Then there are other companies that are perhaps not as sophisticated along that dimension who just want the thing to work out of the box.

And then within even generalizations such as SaaS companies typically have high margins and can tolerate more fraud. Definitely true as an overall statement, but there are some SaaS businesses where detecting payments fraud is also detecting non-payment abuse of the service.

Classically, SaaS companies that send a lot of email, for example. And since your reputation as a company for sending good email or bad email is a very important economic asset of the company, you want to use the payment fraud gateway to protect you from bad actors who would send out spammy email and thus hurt your algorithmic reputation and your legitimate customers' ability to get email into inboxes at Gmail, which is ultimately the economic lifeblood of the business. 

This is fractally complicated. Every topic you double click into, there's a million more that are below that, but we have to start somewhere.

The evolution of fraud detection

Let's start back in the day. So when we were first dragging commerce online, the early approaches with regards to automating the detection of fraud tended to use heuristics a lot. Do you want to give people a gloss of what the heuristic era looked like?

Emily Sands: Heuristics are just rules. Rules are nice because they're fast to implement and they're very explainable: you can understand exactly what's happening. For example, you could say block all transactions over $1,000 from an IP in a certain geo [Patrick notes: geographic area] where the card details are from a different geo.
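Emily's example rule translates almost word for word into code, which is much of the appeal of heuristics. The field names and the example country codes below are assumptions made for the sketch; this is plain Python, not Radar's rule syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    amount_usd: float
    ip_country: str    # where the request appears to come from
    card_country: str  # where the card was issued


def should_block(txn: Transaction, watched_ip_countries: set[str]) -> bool:
    """Block charges over $1,000 from a watched geo when the card is from a different geo."""
    return (
        txn.amount_usd > 1000
        and txn.ip_country in watched_ip_countries
        and txn.ip_country != txn.card_country
    )


watched = {"XX"}  # placeholder country code
blocked = should_block(Transaction(1500.0, "XX", "US"), watched)
allowed_small = should_block(Transaction(200.0, "XX", "US"), watched)
allowed_domestic = should_block(Transaction(1500.0, "XX", "XX"), watched)
```

Anyone reading the three conditions knows exactly what the rule does, and exactly which legitimate travelers it will catch.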

You can look at the rules and immediately know what they're doing. And sometimes that's exactly what you need. So you may find an attack vector that you've never seen before. And in those cases, a rule is your best friend because you can very quickly shut down the attack vector and speed matters a lot.

Just to pause for a moment, and then we'll talk about alternatives to rules. We actually in Radar, our fraud product, support users in writing their own rules. And last year we launched Radar Assistant, which is just a natural language interface so that non-technical folks—think fraud analysts who aren't writing code—can very quickly write and deploy rules in code without having to do the engineering work.

And so this is an LLM-based example to our conversation earlier, where you just describe the logic you want in plain English, you know, block payments from new users in country X over amount Y, and it'll generate that rule code for you on the spot.

And the genesis of that is just that the speed of rule writing is very important, because rules are blunt instruments and we'll talk a little bit about their downsides, but the upside is you can ship them quickly and you can use them against new attack vectors.

[Patrick notes: We discuss why rules tend to degrade in effectiveness over time in a moment. Briefly, they’re elements in a dynamic process which is being observed by, in many cases, a sophisticated adversary. The ground truth of the good transactions changes over time, relatively slowly, and the ground truth of the bad transactions changes very quickly indeed. Of course, this is just a general rule of thumb: at the scale of the economy, someone is always experiencing nearly unprecedented circumstances for their business: a sudden surge of virality, a new product launch in Japan, a global pandemic which fundamentally altered grocery store transaction patterns in less than 6 weeks, etc.]

Patrick McKenzie: Back when I was still selling software for a living, I remember I was once targeted by a card testing ring. We'll talk about card testing later. Whoever the programmer working for the card testing ring was, they were clever but not infinitely clever. All of their spoofed email addresses were some pattern and then six random numbers at hotmail.com. And I had sold, I think, 8,000 copies of the software to legitimate users but had literally never sold anything to six random digits at hotmail.com. And so I came up with an interface to flag those as presumptively fraud-y. [Patrick notes: If a transaction looked sufficiently fraud-y, based on hitting a few if statements, I gave away the software for free using the same messaging that would ordinarily happen if I charged the card. This quickly poisoned the results the attackers were getting from me, since their non-working cards got passed as working, and I was swiftly dropped from their set of card verifiers.]
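A rule like the one in this anecdote is typically a single regular expression. The exact pattern the attackers used is not given, so the prefix below (lowercase letters or dots) is a guess made for illustration; only the "six digits at hotmail.com" part comes from the story.

```python
import re

# Assumed shape: some letters, then exactly six digits, at hotmail.com.
SUSPICIOUS_EMAIL = re.compile(r"[a-z.]+\d{6}@hotmail\.com")


def looks_like_card_tester(email: str) -> bool:
    """True when the whole address matches the assumed attacker pattern."""
    return SUSPICIOUS_EMAIL.fullmatch(email.strip().lower()) is not None


flagged = looks_like_card_tester("jsmith482913@hotmail.com")
not_flagged = looks_like_card_tester("jsmith@hotmail.com")
```

A check this narrow is only safe because the business's own history showed no legitimate customers matching it; the same pattern could be a poor rule elsewhere.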

Like you mentioned, sometimes businesses have a really good understanding of rules that they can implement that are very good on precision versus recall, which are the magic words that we use to explain type one and type two errors in this industry. Sometimes there are rules that feel intuitively great, which might not be great if you just did the math on them.
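"Doing the math" on a rule means computing precision (of the transactions the rule flags, what share were really fraud) and recall (of the real fraud, what share the rule flags). The sketch below uses an invented eight-transaction history purely to show the arithmetic.

```python
from typing import Sequence


def precision_recall(flagged: Sequence[bool], is_fraud: Sequence[bool]) -> tuple[float, float]:
    """Return (precision, recall) for a rule's flags against known outcomes."""
    tp = sum(f and y for f, y in zip(flagged, is_fraud))      # fraud, flagged
    fp = sum(f and not y for f, y in zip(flagged, is_fraud))  # legitimate, flagged (type I error)
    fn = sum(not f and y for f, y in zip(flagged, is_fraud))  # fraud, missed (type II error)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical history: the rule flags four transactions, only one of which was fraud,
# and it misses the other fraudulent transaction entirely.
flagged = [True, True, True, True, False, False, False, False]
is_fraud = [True, False, False, False, True, False, False, False]
precision, recall = precision_recall(flagged, is_fraud)
```

Here three of every four blocked customers were legitimate and half the fraud still got through, which is what an intuitively appealing rule can look like once measured.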

[Patrick notes: A far subtler problem than this one, offered as an aside: you can create a great feature, where the feature successfully flags an absurd portion of bad transactions relative to misclassified good transactions, and not have that feature be useful. How? Because it might not be additive: if the existing constellation of features you’ve implemented already correctly classifies those transactions, then the marginal value of that feature will be low, and you might even see a computer decide that your weeks of work are not worth the CPU cycles to keep executing.

That’s not a great feeling, to put it mildly, and it substantially predates LLMs being able to give you emotional support while explaining that the math says the new feature is basically valueless at the current margins.] 
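The additivity problem in the note above is easy to see with sets. The transaction IDs and which ones each feature catches are invented; the point is only the difference between a feature's standalone quality and what it adds on top of what is already caught.

```python
def standalone_precision(flagged: set[str], fraud: set[str]) -> float:
    """Share of the feature's flags that were really fraud."""
    return len(flagged & fraud) / len(flagged) if flagged else 0.0


def marginal_catches(flagged: set[str], fraud: set[str], already_caught: set[str]) -> set[str]:
    """Fraud the new feature catches that the existing features did not."""
    return (flagged & fraud) - already_caught


fraud = {"t1", "t2", "t3", "t4", "t5"}
caught_by_existing_features = {"t1", "t2", "t3", "t4"}
flagged_by_new_feature = {"t2", "t3", "t4"}  # never flags a legitimate transaction

quality = standalone_precision(flagged_by_new_feature, fraud)
added = marginal_catches(flagged_by_new_feature, fraud, caught_by_existing_features)
```

The new feature is perfectly precise on its own, yet `added` is empty: everything it finds was already found.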

Patrick continues: And one of the common ones businesses come up with is, well, if a customer is buying something in a different country than the one that issued their credit card, that sounds very suspicious to me. So let's block all those off the top.

And I have bashed my head into that rule at any number of companies over the years because I live life partly in Japan, partly in the United States. And I am pretty sure that I'm a relatively desirable consumer for most of the companies that I'm doing business with. And it turns out that human edge cases like myself are not particularly infrequent at the scale of the economy.

Emily Sands: As a consumer, yeah, patio11, getting blocked.

Patrick McKenzie: And so one of the things that I like about Radar is simply that when you propose a new rule, it can do backtesting against your prior historical data and say, OK, this would have blocked 438 instances of fraud at a total value of blah. But just so you know, it would also have blocked x number of legitimate transactions.

And then you as a business can make a determination on whether that makes sense to just default block those or whether you want to raise them internally with the fraud team or take some other action or maybe tighten your rulemaking until you find one that plays well with history.
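Backtesting a proposed rule amounts to replaying it over past charges whose outcomes are now known. The sketch below is a generic illustration, not Radar's implementation; the record fields, the four-charge history, and the cross-border rule being tested are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PastCharge:
    amount_usd: float
    ip_country: str
    card_country: str
    was_fraud: bool  # ground truth learned later, e.g. from a dispute


@dataclass(frozen=True)
class BacktestResult:
    fraud_blocked: int
    fraud_value_usd: float
    legit_blocked: int
    legit_value_usd: float


def backtest(rule: Callable[[PastCharge], bool], history: Iterable[PastCharge]) -> BacktestResult:
    """Report what the rule would have blocked, split by whether it was actually fraud."""
    fraud_n = legit_n = 0
    fraud_usd = legit_usd = 0.0
    for charge in history:
        if not rule(charge):
            continue  # the rule would have let this one through
        if charge.was_fraud:
            fraud_n += 1
            fraud_usd += charge.amount_usd
        else:
            legit_n += 1
            legit_usd += charge.amount_usd
    return BacktestResult(fraud_n, fraud_usd, legit_n, legit_usd)


def cross_border(charge: PastCharge) -> bool:
    """The intuitive rule discussed above: block when IP and card countries differ."""
    return charge.ip_country != charge.card_country


history = [
    PastCharge(250.0, "JP", "US", was_fraud=False),  # a legitimate traveler
    PastCharge(900.0, "BR", "US", was_fraud=True),
    PastCharge(40.0, "US", "US", was_fraud=False),
    PastCharge(60.0, "US", "US", was_fraud=True),    # domestic fraud the rule misses
]
result = backtest(cross_border, history)
```

Seeing both halves of the result side by side is what lets a business decide whether to block by default, route to manual review, or tighten the rule.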

Emily Sands: Yeah. I think on principle, you basically don't want big static rules sticking around forever because they're not nuanced. And even if you go to the dashboard and you see that you're not blocking many good transactions with this rule today, that may change tomorrow or the next day or the day after. And you may or may not know to make the swap.

Advanced fraud detection techniques

Actually just last month, we launched a big upgrade to Radar's AI tooling to blend rules with AI. And what we support now is dynamic risk-based rules. So these are smarter rules that bring together the best of both worlds. You still get the transparency of having the rules, but then they can be overridden in select cases with the intelligence of machine learning.

So a very simple example, Radar can block a transaction if the CVC, so that sort of three or four digit number you type in when you type in your credit card, is wrong, or if your postal code check fails. [Patrick notes: Many people assume that postal code is a requirement to charge credit cards. Actually not the case! It is used for AVS verification, and it is typically the only level of AVS verification that is actually probative. (You can also get back AVS return codes like e.g. “Exactly matched the street number but did not match the street name”, but these are often not very signalful.) But you can run a charge without a postal code, or with an incorrect postal code, if your processor (and the issuing bank, etc) let you.]

Emily continues: So that's a pretty straightforward block rule. But now we let businesses use rules that combine that verification data with real-time risk scores from our ML models, and also, to your comment earlier about all the different players in the ecosystem, from the card issuer.

So for example, instead of blocking everyone with a CVC mismatch, we can say, okay, this is a CVC mismatch, yes, but it looks low risk. Maybe they fat fingered their CVC this time. I certainly do that. And so we let it through. Whereas on the flip side, if it's high risk traffic with a CVC mismatch or a postal code check that fails, we still block it.
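The difference between the static rule and the risk-based version can be shown side by side. This is a sketch of the idea only: the 0-to-1 risk score, the 0.5 cutoff, and the function names are assumptions, and nothing here reflects Radar's actual rule language or thresholds.

```python
def static_rule(cvc_ok: bool, postal_ok: bool) -> str:
    """Old behavior: any failed verification check blocks the charge."""
    return "allow" if cvc_ok and postal_ok else "block"


def risk_based_rule(cvc_ok: bool, postal_ok: bool, risk_score: float, block_at: float = 0.5) -> str:
    """Block on a failed check only when the model also considers the charge risky."""
    verification_failed = not (cvc_ok and postal_ok)
    if verification_failed and risk_score >= block_at:
        return "block"
    return "allow"


# A low-risk customer who mistyped their CVC.
fat_finger_static = static_rule(cvc_ok=False, postal_ok=True)
fat_finger_dynamic = risk_based_rule(cvc_ok=False, postal_ok=True, risk_score=0.05)

# High-risk traffic with the same mismatch.
risky_dynamic = risk_based_rule(cvc_ok=False, postal_ok=True, risk_score=0.9)
```

This sketch covers only the one rule; a high-risk charge that passes both checks would be handled by other rules not shown here.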

And just to give you a sense, that one change alone moves the needle for businesses' revenue. So it improves payment success rates by 1.3 percentage points: not percent, not relative, an absolute 1.3 percentage points, without increasing fraud. So that's just billions in recovered revenue across Stripe's user base, just for making that one rule smarter.

Patrick McKenzie: It always feels ludicrous to me, when I saw the numbers internally or even the ones that get published, because the notion that just a sort of under the hood tweak at an infrastructure provider could have that level of uplift in the economy is a bit mind-blowing, a bit inspiring, and a bit, wow, we've put up with such a high decline rate for what turns out to be no good reason for so many decades. [Patrick notes: I dock myself one cookie for describing legacy infrastructure as having “no good reason” for the failures. They are an emergent behavior of a complex system which has the desirable property of existing, and our current Internet economy owes its existence to having had that imperfect but available substrate to grow up on.] 

Emily Sands: Yeah, I think a lot of the impact from these optimizations comes from just how well suited this space is to AI. So first, there's a ton of decision points where very small changes make a difference. So if you just take the example of a failed payment, you decide to retry the failed payment. You can tweak the timing of the retry, how you route it, how you format the request. And these are all kind of perfect opportunities for explore, exploit ML algorithms to help us learn and optimize on the fly. So just all of these decision points make it really ripe for optimization.
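Explore/exploit algorithms come in many forms; epsilon-greedy is one of the simplest and is used here only as an example, since the transcript does not say which algorithms Stripe uses. The retry timings and their success rates below are invented, and the simulated environment stands in for real retry outcomes.

```python
import random


class EpsilonGreedy:
    """Mostly pick the best-looking option; occasionally try a random one."""

    def __init__(self, arms, epsilon: float = 0.1, seed=None) -> None:
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in self.arms}
        self.successes = {arm: 0 for arm in self.arms}

    def success_rate(self, arm) -> float:
        n = self.pulls[arm]
        return self.successes[arm] / n if n else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)         # explore
        return max(self.arms, key=self.success_rate)  # exploit

    def update(self, arm, succeeded: bool) -> None:
        self.pulls[arm] += 1
        self.successes[arm] += int(succeeded)


# Hypothetical retry timings with made-up true success rates, unknown to the learner.
true_success = {"retry_after_1h": 0.20, "retry_after_24h": 0.35, "retry_after_72h": 0.50}
bandit = EpsilonGreedy(true_success, epsilon=0.1, seed=0)
environment = random.Random(1)

for _ in range(5000):
    arm = bandit.choose()
    bandit.update(arm, environment.random() < true_success[arm])

estimated_rates = {arm: bandit.success_rate(arm) for arm in bandit.arms}
```

Because the learner keeps exploring a fixed fraction of the time, it can notice when the best option changes, which matters in a setting that shifts as often as payments does.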

Second, because of the scale of the Stripe network, because of that sort of $1.4 trillion that we processed in 2024, up 36% year on year, we can detect and ship these optimization opportunities really quickly.

And then third, I think it's a perfect space for AI because it's constantly evolving. Certainly fraudsters are evolving and new fraud tactics emerge every week and a rule that worked last month may now be blocking good customers. And so it's really a space where you need models because the models keep learning, whereas static heuristics get really stale really fast. And so this isn't just a place where AI is nice to have. It's actually a place where AI thrives.

Patrick McKenzie: To respond to a few sub points there, we have an interesting degree of regard for the adversary here.

[Patrick notes: It’s important not to have unlimited regard for them, because many are or work for professional criminal organizations which execute the full range of gangster activities, up to and including murder. But I came up on the scruffier side of the Internet, and often reflect that when I was reading e.g. ratware forums as an anti-spam researcher early in my career, the writers of that ratware were not hugely different in character from me. A different passport, a different network, a different set of opportunities, and maybe they would have ended up selling bingo cards to U.S. schoolteachers, and I would have ended up writing for loops to defeat spam filters.

And, indeed, a more-or-less open secret in Silicon Valley is that some founders and engineers have… youthful indiscretions, which covers a spectrum from “engaged in some SEO shenanigans” to “nope, out-and-out financial fraud” to “one of the few people who can say that they, literally, broke the Internet.”]

So there's a variety of topologies of credit card fraud and other forms of financial fraud. And some of it is conducted by relatively unsophisticated actors. There are teenagers who steal Nikes via credit cards in the same way that there are teenagers who walk into stores and steal Nikes the old-fashioned way.

But a lot of fraud is conducted by well-resourced, effectively professional actors who are long-term players in the economy, just like the good guys are long-term players in the economy. And these are extremely smart people. They get to inspect the results that their work on Tuesday gets and go back to the lab again and try again on Wednesday. And they are capable of evolving their approaches at a very high cycle rate.

And so we have been in sort of a Red Queen's race with them as an industry for 30-plus years at this point, attempting to avoid being the easiest place in the economy to squeeze illegitimate gains out of. And when that fails, legitimate users of the economy get very large dollar losses associated with that failure. And so the ability to constantly adapt to what they're trying on Wednesday and to stay ahead of them is extremely important.

Another thing you mentioned is changing the routing for requests. To just give people an intuitive understanding of what is happening here, we mentioned rails earlier, and you might have a piece of plastic in your wallet, like a debit card, for example, that has multiple rails associated with it.

So on the front of the card, it might have a logo for Visa or MasterCard. But if you flip it over, on the back there might be a number of ATM networks that are associated with it, like Cirrus, for example. These are typically much less known to the typical consumer. But in the case where for whatever reason a transaction over rail A fails, it is not necessarily the case that a transaction over rail B conducted half a second later would also fail.

So Stripe can intuit which cases are the ones where retrying on rail B rather than rail A would work well, and just silently do the retry. And if the retry works, then from the user's and the business's perspective, the same amount of money was charged. It was charged to the same card. This is just a success. And you don't have to tell the user, by the way, something failed in the background, but don't worry, we handled it for you.

And that notion of just at a very, very high velocity making these decisions on behalf of individual users is one of the other things that makes this a rich space for AI approaches. Because unlike things that have sort of direct consequences in the physical world, I don't know, rerouting a package, you don't have to actually get the package back and wait a number of days for it to move from San Francisco to Chicago to make another try at rerouting it. You can try it effectively instantly and get the results back before any human could have known that there was a failure being covered for. So it is a fun industry to work in from that perspective.

So we talked about heuristics, but soon after heuristics, the industry started doing the early stages of machine learning. I get to go back to 2004 when the hot new thing at research universities was clustering algorithms. And the one that I had the most experience with back then was called k-means.

And we'll avoid going deep into the woods on what k-means actually did because it's been thoroughly obsoleted by this point. But essentially, we have a vector of features that we've created. Vector just means a long string of them, but with math that people might remember from high school or undergraduate. We would like to somehow crunch that vector against all the data that we have on good transactions and bad transactions and cluster it into a pool: is it a good transaction or is it a bad transaction, or is it a transaction that would not benefit from a retry right now or is it a transaction that would benefit from a retry right now, or similar. And k-means was just a way to transform those very complicated vectors into discrete decisions.
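
[Editor's note: For readers who want to see the mechanics, here is a minimal sketch of textbook k-means (Lloyd's algorithm) in plain Python. The two-feature toy data is invented for illustration. Nothing here reflects how any particular fraud system chose or scaled its features.]

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean_point(members)
    return centroids, assignments


# Toy feature vectors: (scaled amount, scaled attempts in the last hour).
transactions = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),   # ordinary-looking traffic
    (8.0, 9.0), (8.5, 9.2), (7.9, 8.8),   # unusual-looking traffic
]
centroids, labels = k_means(transactions, k=2)
print(labels)
```

[Editor's note: The output is only a cluster index per transaction. The algorithm does not say why a point landed where it did, which is the explainability gap discussed next.]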

Unfortunately, what that made up for in executing very complicated math very quickly to come up with a decision, it lacked in terms of explainability. We mentioned that a core feature of rules is that you can explain to an actual person, OK, the reason why we flag this transaction is this. And that explanation will make sense. With clustering algorithms, it's often, well, the math just said it's bad. And that's a bit unsatisfying to many users.

There's nothing you can tell the user directly to help them optimize for the next time. There are often stakeholders involved who might want you to guarantee invariants, such as whether your system is fair to people living in different states, for example, or along many other axes that we care about fairness on. And you have no great answer for them at the moment because you almost don't know what the clustering algorithm is clustering on. There is no internal inspectability of it.

And as an example, people might have heard of FICO scores. I don't know if this is true today, but back in the day, they were heavily informed by clustering algorithms. [Patrick notes: I once upon a time was a research assistant on a project which FICO contributed substantial expertise on. They were, as you can imagine, cagey at describing the secret sauce, but implied that they had more-than-academic interest in clustering.] 

FICO could tell you with a very high degree of statistical confidence that the risk of a default at FICO 500 versus FICO 800 is quite different from each other. But when people took their decisions they would make in April and saw what those decisions did to their FICO score, they said, “I paid down my debt and my FICO score went down. That seems counterintuitive. Why did that happen?”

The best anyone could do was say: “Clustering math. What are you going to do?” 

And so we have moved on from those approaches to other approaches over the years. Do you want to talk about some of those other approaches that have been developed in the industry?

Emily Sands: So certainly one downside of clustering is explainability, right? You take all your transactions, maybe think about a fraud case. You take all your transactions, you engineer a handful of features, you try to group them into clusters based on similarity. And then maybe one cluster looks like mostly good traffic and then another cluster has a suspicious spike in failed payments and weird IP addresses and mismatched billing info.

I think that the bigger issue with clustering is that while it works decently for broad pattern recognition, especially if something looks really out of place, and it works okay in non-adversarial environments where the features can be reasonably well-defined and are reasonably stable, the clusters are only as good as the features you give them. And those features are hand-engineered: device type, transaction size, time of day, whatever else.

So the minute fraudsters change their tactics or start behaving more like good users, the clusters just get very blurry, very fast. 

Patrick McKenzie: And so what these are effectively doing is reverse engineering the business process of Evil, Incorporated frequently to try to fingerprint it in much the same way that Evil, Incorporated is trying to fingerprint the defenses of the good guys to interdict more of their transactions at scale.

We talk about some stuff here, which is quite a dense level of intellectual content. There's literally computer science PhDs awarded to people who come up with some of these algorithms. But then there's also sort of meat and potatoes work under the hood that supports it. And you've mentioned that Stripe runs a full stack.

One of the things that Stripe did was put a lot of effort into improving infrastructure to do things that when I tell people that this is a difficult problem for many well-regarded companies, they often don't believe me. [Patrick notes: Matt Levine has repeated encounters with this phenomenon as well.] But it turns out that adding numbers and keeping counters really, really quickly in real time for a robust number of things to count is actually a challenge for computers, even though adding numbers quickly is probably the thing that computers do best.

[Patrick notes: There are, indeed, interview questions which test whether you can quickly identify a shibboleth such as “this gets easier in a NoSQL database or perhaps an in-memory token bucket sort of data structure” in response to the question “How would you, at a large credit card issuer, identify the number of attempted transactions for a particular card within the last 12 hours, keeping in mind performance considerations.” Those shibboleths are effective shibboleths because people really did have to develop them at various places in the industry, and they have not percolated everywhere yet.

The obvious example, “Just ask the database”, has a bunch of increasingly non-obvious failure modes (e.g. we don’t want the count to be stale if the bad guys stuff in several transactions in a 100ms window) and implementation felicities (e.g. are we going to need a new table/index/etc for every counter we develop? This could get painful fast…). If you think discovering them in a job interview is embarrassing, wait until you discover them over the course of a decade or two of your customers being defrauded.] 

Patrick continues: But for various reasons, the sort of quote unquote counters that we want to persist like "How many times has this credit card been seen over the entire network over the course of the last 12 hours, 24 hours, 30 days, et cetera?" Persisting a large number of counters and updating them all very, very quickly in a number of milliseconds after a transaction has happened turns out to be difficult. So Stripe created some infrastructure to do that and then open sourced it so that other companies that found themselves in this position would be able to take advantage of it.
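
[Editor's note: As a rough illustration of the kind of counter being described, here is a minimal in-memory sliding-window counter. It is a teaching sketch only. It stores every timestamp exactly, runs in a single process, and assumes events for a given key arrive in time order, none of which would hold up at network scale. It is not the open-sourced infrastructure mentioned here.]

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Count events per key within the most recent `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # key -> timestamps, oldest first

    def _evict(self, key, now):
        timestamps = self.events[key]
        # Drop everything that has aged out of the window.
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()

    def record(self, key, now):
        self._evict(key, now)
        self.events[key].append(now)

    def count(self, key, now):
        self._evict(key, now)
        return len(self.events[key])


TWELVE_HOURS = 12 * 3600
seen = SlidingWindowCounter(TWELVE_HOURS)
for t in (0, 3600, 7200):          # three attempts, one hour apart
    seen.record("card_a", t)

print(seen.count("card_a", 7200))              # all three are in the window
print(seen.count("card_a", TWELVE_HOURS + 1))  # the attempt at t=0 has aged out
```

[Editor's note: Even this toy shows why "just ask the database" gets awkward. Each new counter and each new window length is more state to keep fresh within milliseconds of every transaction.]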

Emily Sands: Yeah, so our next gen feature engineering platform is called Shepherd. And as you noted, we built it in a partnership with Airbnb and have since open sourced it with them under the name of Chronon. And it's really about making sure that multiple teams can generate new features quickly. So it's easy to build and deploy new features, and those features, including counter features, can work efficiently on the charge path.

So latency matters a lot in payments and the speed with which we can decision throughout the charge path. And there are many micro decisions. It's not just one binary decision of fraud or no fraud. Maybe we can step back on some of those micro decisions in a bit as well. But the speed with which we can decision is an important determinant of whether or not the charge is ultimately successful.

And so for us, our feature engineering platform was really about making it easy to collaborate on quick feature generation and then inference in the charge path based on those features.

Patrick McKenzie: And the scale of data is just mind-blowing, both in terms of the absolute numbers and in terms of the sort of qualitative looks you could take on it. Not to pat myself on the back too hard, but I remember back in 2016 I had just joined Stripe and was teaching myself some of the internal data science tools, and pulled out a stat. We were surprised it was that positive and have been keeping an eye on it for the last couple of years.

The current version of the stat is that: if a new card is seen by a new business, Stripe has seen that card 92% of the time in the past.

And so we have some notion of an idea of what the typical normal use behavior is for normal users because they are transacting over the Stripe network at many different stores on the internet. And given that we know what normal use looks like, we are more likely to know if their card gets purloined by the bad guys or guessed randomly by the bad guys. We are more likely to detect that faster than firms that have less of a data set to work with.

Emily Sands: Yes, and I think that's particularly powerful in the context of card testing, which we've maybe touched on a couple times, but not explicitly defined.

Card testing and fraud prevention

Patrick McKenzie: Yeah, let's talk about card testing. What is card testing? Why does a bad guy want to test cards in the first place?

Emily Sands: Card testing is actually one of the most significant fraud threats to the financial ecosystem. And it's a little bit of a subtle one because at the moment that the card testing attack is happening, it's not super costly to the business that is experiencing the attack. 

[Patrick notes: I’d have a slightly different POV here, for what it is worth. Each transaction which is reversed will ultimately cost the business a de facto penalty fee from their processor, and if a sufficient number go through relative to legitimate volumes, the business may lose ability to process credit card payments. Economic damage to the business isn’t the primary purpose of the attack, but being hit by card testers can be an emergency situation, particularly for small businesses/charities.]

These card testers are doing just that. They're bad actors trying to validate which stolen card numbers can be used in the future.

And they generally do this in one of two ways. There are verification attacks and enumeration attacks. Verification attacks are when they have a finite set of stolen card credentials and they're just looping through those stolen card credentials, attempting very small transactions to verify which ones work. And then enumeration attacks are when they don't have a finite set of stolen card credentials and they're instead just repeatedly guessing card numbers.

But in both of those cases, it's very small transactions during the testing phase, one cent, two cent, five cent, 50 cents, a dollar. So it doesn't actually matter much for the business that's undergoing the card testing attack. The minute the fraudulent actor discovers that the card works, it gets through testing, they either use it directly to make substantial purchases or sell it on the illegal market.

Patrick McKenzie: It's important to understand what's going on economically here for the card tester. There's an entire supply chain of evils that supports the fraudulent use of credit cards. And there are various actors specializing in various things.

And so someone who has a pool of purloined credit cards knows that the value of that pool goes up if they can winnow it down only to cards that still work. Because after you steal a card, you're on a clock until the user or the bank discovers that the card has been stolen, invalidates that card number, and issues the user a new card. [Patrick notes: This can be because the card might be stolen by multiple people in parallel, because someone fraudulently sold you a card that had been previously sold (remember to leave them a negative review so that upstanding fraudsters like yourself are not defrauded by them in the future; your e-commerce platform of evil has this as a core feature), because early attempts to abuse the card were fingerprinted by the bank, or because the user noticed an anomalous transaction on their app/statement and called the bank.] 

And because fraudsters need to put some amount of effort and burn some amount of scarce resources into attempting to exploit a card, they would rather do that against a list that has 100% good cards or as close to 100% as they can get versus a list that might only have 25% good cards anymore.

So there's someone whose job it is to go down from that existing list of cards to the 25% of them that are actually good to resell that for a higher per card value on, for example, a darknet marketplace. There's also another kind of card testing attack, isn't there? The enumeration attack. Do you want to talk about what enumeration looks like?

Emily Sands: Enumeration is when the card tester doesn't have some finite list of cards and instead is just repeatedly guessing card numbers. So you'll see them sort of move off by one in number or off by one in expiration date in the very non-savvy example, but they often have much savvier approaches where you don't observe the off by one in any narrow window, but that's actually what is happening overall across the ecosystem.

And whether it's a verification attack or an enumeration attack, once the card gets through testing, that's when the pain to the ecosystem comes in because the fraudulent actor uses the card either directly to make large fraudulent purchases or sells the card on the illegal market for someone else to make those purchases. And while card testing is one of the most significant threats to the financial ecosystem, it's also one of the most challenging to detect.

And there's two reasons for that. One is that fraudulent actors usually hide their card testing attacks in businesses who have a lot of transactions. And so even if a spike is evident, is it meaningful or could it be something else like a flash sale? And then second, the tactics that they're using, like this is their full-time job, the tactics that they're using are constantly changing.

And so across the internet, card testing attacks are rising by a bunch, but on Stripe they're down 80% over the last two years. And our approach here has been very AI based. So rapid detection and rapid retraining. And it's a nice flywheel. Detection lets us add new data labels and features, including in the Shepherd feature engineering platform, which we then feed back into the models for retraining and redeployment. And basically the simplest version is the model generates a probabilistic guess about whether the transaction is card testing.

Above a certain threshold, it's blocked. Below that threshold, it moves forward. But determining what is the right threshold is a very nuanced calculation. And to generate it, we actually have models and thresholds at multiple levels of abstraction. So what's the overall prevalence of card testing on Stripe, which updates our daily risk posture.

Where is the card testing likely to be taking place? Which businesses, which issuers, which surfaces are experiencing an attack? And then at the individual transaction level is when the final decisioning point comes in. But that threshold again is determined by our understanding of the overall prevalence and our understanding of the likely prevalence within given businesses, given issuers, and given surfaces.
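
[Editor's note: One generic way to let the same model score mean different things at different prevalence levels is a prior-shift correction: re-weight a calibrated probability by the ratio of the current base rate to the base rate the model was trained on. The sketch below illustrates that standard technique. It is an assumption about how such layering could work, not a description of Stripe's system, and all the numbers are invented.]

```python
def adjust_for_prevalence(score, training_prevalence, current_prevalence):
    """Re-weight a calibrated probability (0 < score < 1) for a new base rate."""
    prior_odds_ratio = (current_prevalence / (1 - current_prevalence)) / (
        training_prevalence / (1 - training_prevalence)
    )
    odds = score / (1 - score) * prior_odds_ratio
    return odds / (1 + odds)


def should_block(score, training_prevalence, current_prevalence, threshold=0.5):
    adjusted = adjust_for_prevalence(score, training_prevalence, current_prevalence)
    return adjusted >= threshold


# Same transaction-level score, two different contexts (hypothetical numbers).
score = 0.20
quiet_business = should_block(score, training_prevalence=0.01, current_prevalence=0.01)
under_attack = should_block(score, training_prevalence=0.01, current_prevalence=0.20)
print(quiet_business, under_attack)
```

[Editor's note: In this framing, `current_prevalence` is where a network-wide, per-business, per-issuer, or per-surface estimate would plug in. Raising it is equivalent to lowering the blocking threshold for that context.]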

Patrick McKenzie: One of the industries that are most affected by card testing, possibly because it usually allows the user to specify the amount that they are transacting, but also because of the asymmetric resources they have in their fraud departments, is charities.

And so the bad guys will attempt to do a lot of small dollar transactions on legitimate charities to attempt to see if the transaction goes through as planned. And then if it does, the card goes into the good column for them to sell on to people who are doing typically higher dollar transactions against non-charities because the ultimate goal of the bad guys is very rarely to fund the widows and orphans fund.

So one of the interesting things that we can do with detection is the bad guy might have a programmer that is scripting up an attack on just one business in the ecosystem at any given time. And given that we have heuristically identified that today it is, for example, this local charity, we can, only on days where we think that particular business is being attacked, inject a bit of transactional friction into it. Maybe ask for a little more verification of CVV codes than usual, ask for a zip code where that might not normally be required, et cetera. We only do that for a business which looks like it is under attack, and only when it is under attack, and skip the friction most of the time, because as a general rule in optimizing for conversion, the more friction you have, the more legitimate transactions get blocked.

Emily Sands: Yeah, I think it's worth noting too that the life cycle of a payment is pretty long and it's easy to think about a step like blocking card testing or blocking fraudulent charges in general as just happening at one fixed moment. But in practice, there's a bunch of decisions, which I think is what you're implying, patio11, that can be made far up funnel and in some cases also down funnel of that.

And so if you think about sort of the life of a transaction, it actually starts at the checkout page, right? The user is deciding to buy the thing. And at that moment, we can adjust, I mean, certainly what payment methods are shown and the layout of the page, but also what fields we ask for, what fields we require, which in cases where we're concerned about fraud, we can add additional checks and controls.

Then there's authentication, right? So the customer has clicked submit, they're trying to pay, we need to confirm that they are who they say they are, and especially in regions like Europe and the UK, where there are strong customer authentication rules that apply, we can help businesses chart the best path through that complexity. Like what are the right exemptions to request? What rails to use? What challenge indicators to set? Because again, the name of the game isn't just blocking fraud, it's also balancing that with healthy conversion.

Patrick McKenzie: Can I call a quick time out here to give people a little bit of context on this? So most businesses on the Internet, particularly ones not based in those jurisdictions, didn't get into business because they really, really wanted to think about multi-device fraud loops. But by regulatory fiat in certain regions, there is a requirement for what's called Strong Customer Authentication, which means that you can't just have one piece of information to imply the authorization and authentication of a user.

You can't just ask them for the credit card number.

We want you to do a second authorization, ideally off-device. And so, for example, that might be we're going to have you instruct the bank to raise a push notification on their phone from the banking app, which will pop up, "Are you right now trying to transact this amount with this company, yes or no?" And if they hit yes, great, it's authorized. If they hit no, no, it's not authorized. That's the strong customer authentication (SCA) loop. [Patrick notes: Notice that now Apple’s Push Notifications Service (or Google’s equivalent used for Android phones) having a bad day means credit card transactions fail, because there just weren’t enough independent single points of failure in credit card transactions.] 

But as you've just alluded to, that's only one possible example of the loop. There's actually many different ways that could be implemented. And there are various ways that a business could say, "OK, generally speaking, this is the rule. However, we're allowed to have software cite a legal exemption on our behalf." [Patrick notes: <butterfly meme>Is this a smart contract?</butterfly meme>]

For example, we are a magazine and we authenticated someone the first time that they signed up for a monthly subscription. We shouldn't have to authenticate them in month six. Maybe you don't need to know how your computer is supposed to raise a legal argument on your behalf, but somebody in the chain does. And so we can use ML to say, well, okay, we mentioned that these multi-party transactions might involve six parties and there are computers listening to legal arguments made by other computers on behalf of actual humans.

And some of those computers like some arguments more than they like others. And so you can say, well, if you have three possible exemptions you could request, the one that is most likely to work with a particular bank in the UK is this one, and it's available to you. So make that argument and not one of the other two even though you are entitled to those. 
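
[Editor's note: The "pick the argument this bank likes best" step can be sketched as choosing the eligible exemption with the highest smoothed historical approval rate for that issuer. The exemption labels and counts below are illustrative placeholders, and the add-one smoothing is one simple choice among many.]

```python
def smoothed_rate(approved, attempted, prior_approved=1, prior_attempted=2):
    """Approval rate with a small prior so tiny samples are not over-trusted."""
    return (approved + prior_approved) / (attempted + prior_attempted)


def pick_exemption(eligible, history):
    """history maps exemption label -> (approved, attempted) for one issuer."""
    return max(eligible, key=lambda e: smoothed_rate(*history.get(e, (0, 0))))


# Hypothetical history for a single issuer.
issuer_history = {
    "low_value": (700, 1000),
    "transaction_risk_analysis": (900, 1000),
    "trusted_beneficiary": (3, 3),
}

print(pick_exemption(
    ["low_value", "transaction_risk_analysis", "trusted_beneficiary"],
    issuer_history,
))
```

[Editor's note: The smoothing matters. Three approvals out of three looks like a perfect record, but it should not outrank 900 out of 1,000.]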

So this has been your detour into the wonderful world of high-frequency lawyering.

Emily Sands: And if you don't want to live in the wonderful world of high-frequency lawyering, our models can just chart the best path for you through that complexity. For folks who don't experience these authentication challenges directly, it may be hard to reason about them as a consumer, but it does matter for whether the consumer gets through the transaction. In the last year, we've actually seen 20% fewer authentication challenges and 8% less fraud.

[Patrick notes: A constant, constant source of friction for payments is that people who build payment flows necessarily have different life experiences than people who transact over them, and they are frequently simply unaware of what the ergonomics of those flows look like. This includes the most plain vanilla purchasing experiences imaginable, such as e.g. when the users are similar in socioeconomic class and transacting in the same country as the developers/designers. I tried to convince consulting clients for years that if they didn’t force engineers/etc to run a real live transaction with a credit card quarterly those flows would degrade over time, and was only middling successful in doing this.

Pip pip: Stripe engineers/designers see checkout pages a lot more than your engineering team sees checkout pages, and they are more likely to sweat the details there, including in edge cases. (e.g. I once filed a bug covering a display issue in the case where a user was using a U.S. card in a Japanese-language browser. Fixed almost immediately, for all our users worldwide.)] 

Emily continues: We are intervening less, providing smoother checkouts while also driving down fraud. And we talked at the top about trade-offs between fraud and conversion, but oftentimes there are optimizations that can actually accomplish both at the same time. [Patrick notes: Free lunches do, actually, exist, despite the strongly-held heuristic that they must not, particularly in places where many smart people are presumably sniffing for underpriced lunches and converting them into money. See the extensive discussion in Inadequate Equilibria about this; it’s one of the best and most useful parts of the book.]

Adaptive acceptance and future of payments

So you do authentication and then we do fraud decisioning on our side, but then after that there's the next step in the payment lifecycle—authorization. So we've determined that we believe the transaction is legitimate, but it still needs to be approved by the bank. And false declines there are a huge issue. So over half of US card holders have a good payment wrongly declined in any given year.

So the best tool we have here, and you referenced it briefly earlier, albeit not by name, is adaptive acceptance. And it's one of those Stripe features that just works quietly behind the scenes, but delivers very real results. So your card gets declined, often for vague or temporary reasons. We call it a soft decline. And adaptive acceptance just uses ML to find the cases of decline that are soft declines and dynamically retry the transactions in real time, usually with just small tweaks behind the scenes, changing how it's routed, changing how it's formatted. And the consumer or customer doesn't even know.
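
[Editor's note: The control flow of "retry soft declines, stop on hard declines" can be sketched in a few lines. In this sketch a fixed set of labels stands in for the ML model that decides what counts as a soft decline. The decline labels, variant names, and the `send` callable are all hypothetical.]

```python
# Hypothetical labels; a real system would learn which declines are worth retrying.
SOFT_DECLINES = {"try_again_later", "issuer_unavailable", "generic_decline"}


def charge_with_retries(send, variants):
    """Try request variants in order until one is approved or a hard decline occurs.

    `send` is any callable that takes a variant and returns an outcome string.
    Returns (winning_variant, outcome), with None as the variant on failure.
    """
    outcome = None
    for variant in variants:
        outcome = send(variant)
        if outcome == "approved":
            return variant, outcome
        if outcome not in SOFT_DECLINES:
            break  # hard decline: another attempt will not help
    return None, outcome


# A fake issuer that dislikes one request format but accepts another.
fake_issuer = {"pan_default": "generic_decline", "network_token": "approved"}
print(charge_with_retries(fake_issuer.get, ["pan_default", "network_token"]))
```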

Optimizing payment systems with machine learning

So in 2024, we recovered about $6 billion in good payments that would have been lost, which is about a 60% year-over-year improvement in retry success rates, just from incremental improvements to how the ML and how the systems work. And that's an example of a retry, but we're also getting better at first attempt optimization. There's a lot you can do on the first attempt, how you adjust the ISO messages to the particular preferences of the individual bank, how you switch between PAN and token depending on the context.

Understanding ISO messages and credit card protocols

Patrick McKenzie: So I wish we could plumb the full depth of ISO messages, but we will save people from that level of geekery. But essentially, all of these messages that go over credit card rails, you can round them to about a tweet worth of data only that passes between the business that is ultimately attempting to charge the card and the bank. And it goes over a complicated protocol that's been specified for a number of decades.

Just like there would be a variety of ways to write tweets and maybe some people like them a little more and some people like them a little less, there are semantically neutral ways to rewrite a credit card transaction. Again, these systems at the—there are about 4,000 banks in the United States, and 4,000 credit unions. Each of them has a system which might be entirely different than any other system in the world for approving credit card transactions. [Patrick notes: In practice many of the smaller ones use very similar systems from so-called core processors, with some amount of integration engineering drizzled on top of them, but these do sometimes break in distinctive ways.]

Some of those systems have idiosyncratic preferences for which messages they would like to see, and they are more likely to approve a transaction if you do it in a way which is sort of idiosyncratic to them versus in a semantically neutral equivalent way which works at many other banks. It just doesn't work for them, and no one really knows why. There's probably no one that got into a meeting and said "absolutely, this is the way I should decide it."

It's just legacy software that was written in 1988 being projected into the present.

The importance of semantically neutral changes

Emily Sands: Isn't it interesting that semantically neutral changes matter?

Patrick McKenzie: Yeah, I think this is a common bit of folklore that everyone, I don't know that every programmer has experienced it, comments in computer code should never, ever, ever affect the actual function in the computer code. But there's quite a bit of folklore over the years of "actually, you know, we came up with this one thing where the presence or absence of a comment really did matter and that shook us to our core." [Patrick notes: This is, of course, mindbending to the emerging generation of programmers who expect comments to be semantically meaningful to the AI that wrote most of the non-commented code. Why would you write a comment if it didn’t affect the output of the computer program?] 

Another classic example from general programming is that there was once a case where, I believe it was a university, they found that emails sent more than 500 miles would fail, which is not normally a dimension you think of emails as functioning or not functioning on, whether it's 490 or 510 miles to the recipient. And there turned out to be a reason for that.

So these semantically neutral things, regardless of the reason, they empirically do matter. At a certain scale, if you have truly ridiculous scales of data available to you, you can just do the math and figure out which works better for different issuers, for different rails, for different even genres of transactions.

And so my understanding is Stripe keeps something of a holdback set for retries where of some small portion of retries, which is on a day-to-day basis a huge number of retries, just try perturbing the message a little bit in a semantically neutral fashion and try to see statistically which of these perturbations that are available succeeds more often.

And then gradually over time, come up with essentially a private Bible of all the quirks that are known about the various programs running in the financial industry. Take a bank in central Japan: across the Stripe network, it is very possible that most businesses don't even know that this bank exists, but 500,000 customers of that bank generally do, and they would really like to be able to use their cards to buy your stuff. We've worked out that it would strongly prefer recurring transactions to be specified not in field A, but field C, which is an optional part of the spec.
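
[Editor's note: "Just do the math" on a holdback set can be as simple as comparing two success rates. Here is a minimal sketch using a two-proportion z-test. The counts are invented, and a real system would also need to account for testing many perturbations across many issuers at once.]

```python
import math


def two_proportion_z(successes_a, trials_a, successes_b, trials_b):
    """z-statistic for whether variant B's success rate differs from variant A's."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b)
    )
    return (p_b - p_a) / standard_error


# Hypothetical retries at one issuer: the usual message vs. a perturbed one.
z = two_proportion_z(successes_a=4000, trials_a=10000,
                     successes_b=4300, trials_b=10000)
print(round(z, 2), "significant at the 5% level" if abs(z) > 1.96 else "not significant")
```

[Editor's note: A three-point lift is easy to confirm with 10,000 retries per arm. The same lift on a few hundred retries would be indistinguishable from noise, which is why scale matters so much for finding these quirks.]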

Emily Sands: And the reinforcement learning goes a long way here. News Corp Australia relatively recently moved to Stripe and when they did, they saw a 5% increase in authorization rates, which meant that they could retain 10,000 more readers who would have otherwise churned. And especially in the world of subscription, recurring businesses, when you churn someone, that's super costly to the business in the long run: they don't bother to go back and add another card or restate their credentials. So a ton of value that can be created both for customers and for businesses here.

Stripe's enhanced issuer network

Patrick McKenzie: Another thing that Stripe has been doing recently is we've talked about there are five to six plus plus plus parties involved in any given credit card transaction. They historically operate as sort of data silos.

Stripe has a view of its network. The bank has a different overlapping view of any user's use of the credit card. It's not that either of these views is necessarily better than each other, but they are different silos with different bits of information in them.

And they are historically in parallel trying to underwrite a transaction completely in a vacuum without being able to swap notes aside from that, again, tweet length, very, very short bit of context that gets passed over the rails. So what did Stripe do with regards to issuer partnerships?

Emily Sands: So we launched a couple years back now our enhanced issuer network, which is basically just sharing data back to banks and they're welcome to use that data for their decisioning or not. But the simplest way to reason about it is Stripe sends some set of major issuers fraud scores from our Radar fraud product. And that turns out helps them make better decisions and helps them reduce false declines. So they very quickly adopted it. And because they adopted it, it keeps things smoother for our customers because we're all operating with more symmetric information than before we were sharing that data.

Patrick McKenzie: And these are largely affecting the decisions made on the margins. For example, an issuer operating in its own silo said, "Hmm, this transaction looks sort of iffy for me, a priori, for whatever my own reasons are. Maybe I think this user is currently vacationing in Florida. And I know that because I've seen an ATM transaction from them. I also have an app on a phone which shares location information with me. So I'm pretty sure they're in Florida. This transaction isn't in Florida."

And Stripe says, "This is a 92, a very good transaction."

Emily Sands: Small numbers are good in Radar. [Patrick notes: I forgot which end of the 0-100 score range was the good end. It has been a while.] 

Patrick McKenzie: "Okay, so Stripe says, this transaction's an eight." The issuer might say, "You know, on previous experience, when we get eights on iffy transactions, those overwhelmingly turn out to be good transactions retrospectively. So I should let this one through." And that means the user's vacation to Florida wasn't interrupted. The business gets the revenue they were entitled to. 

And the bank doesn't get a phone call from an annoyed user saying, "Why is my credit card declined?"

Which, I think anyone who's had this conversation with the bank over the phone knows the customer service representative has no real way to introspect why the bank made any individual decision. [Patrick notes: I don’t want to blame the CSR for the system design here. There are many banks which could not, in less than $100k of staff time, come to a definitive conclusion as to why a transaction failed. I realize that sounds unlikely. It happens to be true, and it is one of the reasons spurious declines have bedeviled so much commerce for so long. The systems accumulated like sediment over the years, the original architects are long-since retired, and it keeps spitting out money, which fact defeats most internal attempts to do a costly rewrite with unknown levels of execution risk simply to gain complete end-to-end understanding.] 

And sometimes they'll say "we don't know." Sometimes they will come up with an answer which might or might not be very reflective of reality. But often it's just, "I don't know, try again. Do you have another card? You could try that too. Best of luck. I need to get to someone else in the queue now."

Emily Sands: Did you just imply that humans hallucinate too?

Patrick McKenzie: As someone who paid for university doing telephone customer service, I'm extremely aware that sometimes the person that you connect to is paying for university doing telephone customer service and might not be 100% accurate with respect to 100% of the implementation details of the COBOL mainframe that is not physically at their location.

System-wide resilience is based on top of components like humans, like circuits, like databases, and like LLMs, which all have some error rate associated with them. Hallucination is what we call it for LLMs. We don't have a great word for humans, operating error, I guess. But we can get more system-wide resilience from not saying, "I would never build a system on top of something that comes to incorrect decisions," but instead using those decisions in sort of an orchestra model where we can not use any of them as a sort of binary gateway to the decision, but rather weigh them across each other and see which in a particular configuration is most likely to be predictive of success.
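The contrast between a binary gateway and the "orchestra model" can be sketched as follows. This is an illustrative sketch only: the signal names, the risk values on a 0 to 1 scale, the weights, and the 0.5 threshold are all hypothetical, and a real system would learn the weights from outcomes instead of hard-coding them.

```python
def gate_approves(risk_scores, threshold=0.5):
    """Binary-gateway design: any single component can veto the transaction."""
    return all(score < threshold for score in risk_scores.values())


def ensemble_approves(risk_scores, weights, threshold=0.5):
    """Orchestra design: blend every component's opinion, then decide once."""
    total_weight = sum(weights[name] for name in risk_scores)
    blended = sum(weights[name] * score for name, score in risk_scores.items())
    return blended / total_weight < threshold


# Hypothetical signals: the issuer's location check is alarmed,
# while the network-wide fraud score considers the charge low risk.
risk_scores = {"issuer_location_check": 0.70, "network_fraud_score": 0.08}
weights = {"issuer_location_check": 0.4, "network_fraud_score": 0.6}

print(gate_approves(risk_scores))               # the single alarmed signal vetoes
print(ensemble_approves(risk_scores, weights))  # the blended view lets it through
```

Each component still has an error rate; the difference is that no single error decides the outcome on its own.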

So, we've talked a lot about the sort of fraudulent side of credit card use and granted the options to reduce fraud that don't restrict legitimate commerce as much are intrinsically interesting. But it turns out that there's ML being used at many different points in the stack too. One of them just came to me.

Handling chargebacks and friendly fraud

So if you've run a business online, you know that to its credit, the United States awards users of credit cards and debit cards the wide right under the law to request a refund from transactions made. The magic word for that is a dispute or sometimes a chargeback in the industry. And if you run a business, chargebacks are no fun. Nobody likes them, in part because a chargeback will come in and say, "Okay, this user, who is often poorly identified in the chargeback response, but this user disputes that—they say the $132 that they spent with you was not authorized. What would you like to do?"

And your decision at that point as the business owner or as someone working in customer service is, "Okay, I click one of a few buttons, I accept this dispute, they get their $132 back and I pay a small fee, or I would like to have a sort of moderation process take over here where I'm going to submit evidence to the credit card issuer. And some employee at the credit card issuer is going to, aspirationally speaking, read that evidence and then make a decision as to who wins. And if I win, I keep the money. If the customer wins, they get their $132 back. And I probably pay a fee for the service in either way."

[Patrick notes: The individual at a credit card issuer making this call is a low-seniority employee who is approximately equal to Customer Service Tier Two in terms of training, educational background, organizational heft, and similar, and they will be managed to make at least a hundred similar decisions in the course of an 8 hour workday. This is not the model that the legal system operates on. That is partly why it is available for $15, not the $100,000 that a short trial over a contractual dispute might cost.]  

Patrick continues: And the tough thing for the business owners to do is saying, "My business does not exactly exist to generate credit card disputes for the obvious reason. And so I don't have, in most cases, a go-to strategy for creating evidence that will responsibly say, 'Hey, this person is the legitimate user of my SaaS company, or they legitimately stayed at my hotel, or similar. Here's the one-button form of evidence to give.'"

And so a thing that Stripe has done is code chargebacks based on your likelihood of prevailing if you submit evidence because there are some issuers that make sort of a business decision where "You know, we love our customers. We love them an awful lot. If they have a problem with a business, that's the business's problem." And so submitting evidence to those issuers might be for certain sectors in the economy basically futile.

So if you knew a priori, like you can submit evidence here, it might take your staff three hours to come up with the one pager that you're allowed to submit, but it will basically never work. Then you don't go forward with spending those three hours, you just take the hit. Where if there's a different company that has say well-thought processes and Stripe says that, relative to our baseline, many chargebacks that are disputed here are actually ruled in the business's favor, then you would say, "Okay, well, that might or might not be worth my time as a business owner to develop that evidence today."

And so that's a small example of one sort of corner case for the credit card ecosystem. Thankfully, most charges are not disputed. But in the case where a business and a customer legitimately have a dispute and have called in the mediators, it's good for the business to be able to say, is the mediator talking to someone who it is worth my time to try persuading or someone who that just shouldn't be a priority for me.
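The decision described above is an expected-value comparison, and a rough sketch makes the trade-off concrete. The $132 charge and the three hours of staff time come from the conversation; the win probabilities, the hourly cost, and the optional `contest_fee` parameter are assumptions chosen purely for illustration.

```python
def expected_gain_from_contesting(amount, win_probability, staff_hours,
                                  hourly_cost, contest_fee=0.0):
    """Expected dollars gained by submitting evidence instead of accepting the dispute.

    contest_fee covers any extra fee incurred only when contesting (assumed 0 here).
    """
    expected_recovery = win_probability * amount
    cost_of_fighting = staff_hours * hourly_cost + contest_fee
    return expected_recovery - cost_of_fighting


# An issuer that almost never sides with the business (hypothetical 5% win rate).
futile = expected_gain_from_contesting(132, 0.05, staff_hours=3, hourly_cost=30)

# An issuer with a well-run review process (hypothetical 80% win rate).
promising = expected_gain_from_contesting(132, 0.80, staff_hours=3, hourly_cost=30)

print(round(futile, 2), round(promising, 2))
```

With these made-up inputs the first case loses money in expectation and the second comes out modestly ahead, which is why a per-issuer estimate of the win probability is the useful number to surface to the business.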

Emily Sands: Yeah, and some chargebacks are inevitable. You are right that most charges are not disputed, but many charges are, like both are true. And one of the biggest culprits of chargebacks is actually friendly fraud, which I don't know how I feel about the term friendly fraud. It's kind of a misnomer. It sounds so benign, but it's a real cost to business.

So this is when someone makes a legitimate purchase. So, you know, it's not a fraudster stealing your card. You are making a legitimate purchase, but then you go on to later dispute it. And, you know, maybe you just forgot. This has happened to me. I called AmEx, like, "What's this charge from Bloomingdale's? I don't shop at Bloomingdale's." And then I remember I was on one of these aggregators for somebody's wedding gifts and I clicked a button and next thing I knew I was buying something from Bloomingdale's. So they literally forgot they made the payment, or someone else in the household uses the card.

But there are also times when the individual has received the good or service and is just trying to get the item and also get their money back. And so when these chargebacks come in, we do a couple of things. First, we, as you know, predict the odds of winning it because there is a cost, not just a time cost, but also a fee to go about disputing. So businesses want to know which cases are worth fighting.

But we are also working on automating the evidence package. So all of the documentation that you need to fight a dispute. And as you might imagine, LLMs are proving super useful here. And we'll actually be announcing this feature at Sessions just next month.

Patrick McKenzie: Nice. I know there used to be a way to do sort of automated evidence generation and have the business post it via an API or similar, but that requires a certain sophistication level of the business and having the LLM be able to extract data that is already in the database and just say, "Okay, well, if I were to spend an hour worth of staff time on writing a persuasive letter here on things that we think are true about the world, here's that letter in five seconds and for a cost of a few tokens." I'll be waiting with interest to see that presentation.

Emily Sands: Yeah, a lot of the disputes that come in are actually disputes on subscription software products. And with our billing product and with our usage-based billing product, we actually see whether the software has been used within the billing period. And so you have a pretty compelling case as a business if someone has used your product in the window during which they were charged for the product. And that's actually written in software and well instrumented. And so that is definitely an area where I think businesses will meaningfully benefit from this feature.
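As a rough sketch of the check described here, assuming usage is recorded as timestamped events: the function name, the event list, and the billing period boundaries are hypothetical and do not reflect any actual Stripe API.

```python
from datetime import datetime, timezone


def usage_in_period(usage_events, period_start, period_end):
    """Return the usage timestamps that fall inside the disputed billing period."""
    return sorted(t for t in usage_events if period_start <= t < period_end)


# Hypothetical billing period for the disputed charge.
period_start = datetime(2025, 3, 1, tzinfo=timezone.utc)
period_end = datetime(2025, 4, 1, tzinfo=timezone.utc)

# Hypothetical usage log for the customer who filed the dispute.
usage_events = [
    datetime(2025, 2, 27, 9, 30, tzinfo=timezone.utc),
    datetime(2025, 3, 5, 14, 0, tzinfo=timezone.utc),
    datetime(2025, 3, 20, 8, 15, tzinfo=timezone.utc),
]

evidence = usage_in_period(usage_events, period_start, period_end)
print(f"{len(evidence)} usage events inside the billed period")
```

A non-empty result is the kind of well-instrumented fact that could go into an evidence package; an empty one suggests the dispute may not be worth contesting on usage grounds.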

Patrick McKenzie: As someone who used to run a SaaS company, the most frustrating form of chargeback that would come back is when there was, what's the movie phrase? "What we have here is a failure to communicate" where the software was legitimately used by a company, but the company using the software is complex. It has a bookkeeper who is not the same person logging into the software and punching buttons every day.

And the bookkeeper (no, this is no slight against the bookkeeper) didn't know who Kalzumeus Software LLC was, saw that on the statement, and thought, "Well, naming your LLC after a dragon is a terrible idea. I'm going to charge that back." So I'd end up in a discussion with the user on what happened here. The bookkeeper says that they charged back. Can we please submit the equivalent of an affidavit to the credit card company? And then it's still like 50-50 on whether, even with an affidavit from the user, the credit card company will say, "Yeah, we agree that that was actually good use of the credit card."

Emily Sands: Yeah. And I think a lot of these disputes are particularly painful for startups and SMBs, new businesses who are trying to get going with very lean staffing who just have neither the expertise in this domain nor the sort of manpower on hand to fight back.

I was talking to a startup founder the other day who runs a jobs marketplace, and she has a relatively high dispute rate from job seekers. So it's one of the few jobs marketplaces that monetizes job seekers instead of monetizing the employers. And she was telling me that she had all these cases of disputes where the individual would say, "I've never used this service. I've never heard of this company." And she literally has in her systems their resume that they have updated that has all of their personal professional information.

And so I do think this sort of friendly fraud is a quite real problem for the efficiency with which markets operate. And it shows up in all sorts of ways, including businesses needing to charge higher prices to good users to cover their costs of serving bad users who later go on to dispute. So definitely we'll smooth the market a bit if we can just make this more efficient.

Patrick McKenzie: Yeah, and without going into too much detail about it, there are geographical hotspots of friendly fraud. There are business model hotspots of it where being able to address it at scale makes it more likely that businesses will continue offering services to sectors of the economy that we would, a priori, hope would continue to be able to access valuable goods and services, versus needing to do something like use a broad brush rule and say, "I absolutely don't want to sell to this jurisdiction anymore because my pockets get picked when I do that."

Personalizing the checkout experience

But moving back into the happier parts of the transaction lifecycle, you mentioned that even before we have the credit card charge ready for submission, there are opportunities to influence the user's purchasing journey. Let's talk a little bit about what Stripe does to those users where we control the pixels that are painted in our checkout session. How can we make that a better experience for people?

Emily Sands: I think shopping has certainly come a long way with the move from fully in person to also being online and has gotten even smoother in the world of mobile. But still, most checkout experiences on the internet continue to feel pretty generic. So no matter who you are or where you're shopping from or how you like to pay, you usually get more or less the same old form and it doesn't adapt and it doesn't know you.

Sometimes that's all it takes for customers to drop off at the finish line. And so we have been working on fixing that and AI is a bit of a magic wand here. Stripe has an optimized checkout suite, which is all about making the checkout experience increasingly personalized for our customers, tailoring it to each end user and doing that in real time.

And a simple example is payment methods. There is a proliferation of payment methods in the world. Stripe now supports well over 100, whether that's Apple Pay or iDEAL or Buy Now, Pay Later. And we automatically surface the most relevant payment methods based on who you are and what you're buying. And it works, right? So we've all been to a checkout that only had a couple of credit card options and we didn't have that credit card option on hand and we bounced from the checkout.

Businesses that just show one relevant payment method beyond cards see about a 7% boost in conversion and a 12% boost in revenue. And that's a big deal for something as small as what are the buttons on the screen and in what order do we show them. Of course, it's not just payment methods. We talked earlier about fraud prevention, so layering in dynamic fraud prevention to checkout instead of always requiring the same fields, adjusting which fields you require based on what you ex ante believe to be the riskiness of the buyer, is important here.

And then moving forward, we are investing in even more personalization. So exploring ways to adapt the entire checkout UX, including the layout to the individual buyer within each session.

Patrick McKenzie: We love payment methods because as we mentioned, we're experiencing something of a Cambrian explosion of them in the world. My friendly local convenience store had, I want to say, 42 logos on it the last time I shopped at it. That will probably be higher in a year or two.

Emily Sands: And my mental model for businesses is when it comes to payment methods, more is always better. But for the business's customer, more is not better, which is to say the business needs to have at their fingertip the full set of payment methods, especially if they want to reach diverse audiences across a bunch of different geos, appeal to the generation that is delighted to pay $15 for a latte, but needs four installments for their $60 hoodie.

But then when it comes to the individual buyer, you certainly don't want to induce any kind of choice anxiety. And you want to make it very obvious to them the payment method that is the best fit based on what they're buying. And it's not as simple as just looking at the individual. Like there are individuals who prefer buy now, pay later. But then in the moment that they're buying travel, they really want to use a card with great travel points.

And so it is a complex combination of who is the individual, what are they buying from whom for how much and that together informs, by the way, also on what device and that together informs the optimal payment methods to show them. So I guess it's another example of like, payment methods in what order? It sounds kind of boring, but maybe that's why the nerds thrive at Stripe because being obsessive about optimizing the little things at the current scale really goes a long way.

Patrick McKenzie: It also allows us to deduplicate effort across the ecosystem because you could have every business employ someone whose job was being a very niche ethnographer and understanding, "17-year-old Japanese person, what are they most likely to want to use today to buy sneakers," for example, and then have a thousand other things that they constantly keep up to date and are robust against what the new hotness is as of next year. And we would need hundreds of thousands of these people working at hundreds of thousands of companies.

We could, there probably is an ethnographer with that description somewhere in Stripe at this point. But have them, plus some level of ML models, come up with both the deep cuts, as it were, and the fairly straightforward things that you would implement if you spent even an hour thinking about this, but where most businesses have not even spent that hour.

One really simple concrete example for people: Japanese businesses sell lots of things to Japanese people and they sell lots of things to people who are not in Japan. If you are a customer of an e-commerce shop in Japan buying the latest Harajuku fashion, you might choose to pay for that by walking into your friendly local convenience store and paying with cash. It's called Convenience Payments. It's about 10 to 20% of the Japanese fashion e-commerce market.

If, on the other hand, you are an American fan of Harajuku fashion or—

Emily Sands: If you're patio11, what might happen?

Patrick McKenzie: It might be my wife more than me, but if she is buying something from Harajuku, it's not very convenient for her living in Chicago to just go down to a Tokyo convenience store and pay in cash. And so it would be more relevant to present her with a typical card-based method or similar.

And particularly when you have this, an iPhone or another small device where you are really, really competing for every pixel of screen real estate, putting like the convenience store payment method front and center on the iPhone and demoting the things that she actually has available to her is just, it's not in her interest, it's not in the fashion shop's interest, and it's not in the interest of any of the payment methods because the wonderful thing about the payments industry is that basically every actor indexes on successful transactions.

And so if you're presenting a method that will never successfully transact, that method doesn't gain from being presented, the businesses that would successfully transact are losing the opportunity at the sale, et cetera. So, make this incentive compatible for everyone to grease the wheels of commerce.
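The convenience-store example can be sketched as a two-step rule: drop the methods a buyer cannot actually use, then order the rest by how likely they are to convert. Everything specific here is an assumption for illustration: the method names, the country restriction, and the conversion estimates, which stand in for whatever a real model would predict from buyer, basket, and device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PaymentMethod:
    name: str
    buyer_countries: Optional[frozenset] = None  # None means usable anywhere


def methods_to_show(methods, buyer_country, estimated_conversion, max_shown=2):
    """Filter out unusable methods, then rank the rest by estimated conversion."""
    usable = [
        m for m in methods
        if m.buyer_countries is None or buyer_country in m.buyer_countries
    ]
    ranked = sorted(
        usable,
        key=lambda m: estimated_conversion.get((m.name, buyer_country), 0.0),
        reverse=True,
    )
    return [m.name for m in ranked[:max_shown]]


methods = [
    PaymentMethod("card"),
    PaymentMethod("convenience_store", frozenset({"JP"})),
    PaymentMethod("wallet"),
]

# Made-up estimates keyed by (method, buyer country).
estimated_conversion = {
    ("card", "JP"): 0.5, ("convenience_store", "JP"): 0.6, ("wallet", "JP"): 0.3,
    ("card", "US"): 0.6, ("wallet", "US"): 0.5,
}

print(methods_to_show(methods, "JP", estimated_conversion))
print(methods_to_show(methods, "US", estimated_conversion))
```

On a small screen, `max_shown` is the scarce resource: every slot given to a method that can never transact for this buyer is a slot taken from one that could.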

Another thing in a similar genre, not exactly ML, but has similar character of impacts to me. Can you tell people a little bit about what Link is?

Emily Sands: Link is our one-click checkout product. So an individual, a consumer or a business can have an identity in Link and payment methods linked in Link, no pun intended. And that will work anywhere on the Stripe network that accepts Link with a capital L, which is the product.

And so I actually had a delightful experience the other day where I was buying a baby gift for a friend from Lovevery, and I hadn't bought from them before, but I was on their website, and the first thing I hit in the checkout page was enter your email. And I entered my email, and I immediately got this pop-up modal and a buzz on my phone to two-factor auth, and I added my six-digit auth code, and next thing I knew, it had all my payment details, it had my name.

It had a default shipping address, which was also my billing address, which I was able to very quickly change to my friend's address, and off you go. And there are many benefits of the Link network, but the primary one is just a very convenient one-click checkout that knows who the customer is and can quickly get them through the transaction flow, thereby driving up conversion.

Patrick McKenzie: One of the things I like here is that similar to the Shopify phrase—Shopify, Stripe customer of long standing—they call it "arming the rebels" where there are some businesses who the internet economy has been very good for like say Amazon or Apple or similar and they've had your card on file for forever and your address is updated there because of course it is. And then there's the longer tail of internet businesses that don't have that advantage, that don't have 100,000 customers on them. They're just getting started today.

And Link allows them to sort of bootstrap with the benefit of the Stripe network behind them where, OK, it might be the first time this user is transacting with you because you literally weren't open for business yesterday. But they've transacted before online. And so we can make it as friction-free as possible for them to give you a try. And if they love doing business with you and want to come back next month, wonderful for all of you.

And so give the smaller or newer or quickly growing aspirants in the economy, the tools and the benefit of the rich data set that larger companies have earned over the years by tending to being very large and satisfying many hundreds of millions of customers.

Emily Sands: It's also accelerating the efficiency with which folks transact on payment methods outside of cards. So for example, I rarely pay with my bank account because I don't want to provide my bank account to a bunch of random places on the internet. But I do have my bank account linked from Link. And so my kids are three and five. We live in Palo Alto. They do Coach Ken's soccer. You go to Coach Ken's website to check out.

And either you get the three and a half percent surcharge to pay with a credit card, or you can pay with your bank account. But I don't really need Coach Ken to have my bank account. The good news is Coach Ken uses Link, so Stripe has my bank account. I sign in with Link and I can avoid that three and a half percent fee and help Coach Ken out as well.

Patrick McKenzie: And I found as a user myself, not just to talk our book here, but there is a Stripe user (I'll drop a link to a short video about them), MyMiniFactory, for one of my hobbies of painting and printing miniatures. And they also use Link for payments on these miniatures, where a typical one would be like $5 or so to get a model of a dragon to paint. Often when I'm impulse buying my dragons late at night, I don't want to have to do things like update the expiry on the credit card that just rolled around last year.

Emily Sands: It's super delightful when Link shows up on these little businesses, but just so folks have a sense of where else it's being used. Like if you have a ChatGPT subscription, it's probably on Link. If you use Airbnb or Uber, you're probably doing your one click checkout through Link. So it also works on very large merchants and is growing a very robust network currently.

Patrick McKenzie: And so if you are one of those consumers and you get a new credit card in the mail from your bank, which might update it every three years as a security measure, you only have to update it in one place once. It could be any of the places, but you update it once, and then all of the other ones will continue to be able to bill you through the standard Link interface with your ongoing consent when you show up to a new place.

As you mentioned, the core model is you give your email address or phone number, you get an instant ping on your phone, it takes two seconds. So making life easier for users, easier for the transacting businesses, and potentially, in some cases, decreasing the costs of commerce for various participants in the chain.

The meteoric rise of AI

So one of the things that I miss most, a tear of sadness for no longer being on Stripe email lists, is that Stripe has a very interesting front row seat on the internet economy. And in addition to being a first party user, an innovator in AI, we've also had the opportunity to see some of the breakout successes of the last few years in the AI field start to scale up to what I think is probably going to be world changing levels of impact. You probably see a lot of very interesting stuff happening. What is Stripe seeing recently in the AI world?

Emily Sands: Can't believe I just saw some tears, patio11. I don't want to rub it in, but yeah, one of the things I do relish, I'm an economist by training, and one of the things I definitely relish about working at Stripe is getting a front-row seat to today, the AI boom.

So we're working with some of the fastest growing companies in the space across the stack, from the infrastructure and modeling layer to vertical applications. OpenAI and Anthropic and Suno and Perplexity and Midjourney and ElevenLabs and Pinecone, Sierra, and a long tail of others that probably aren't household names yet, but very well could be soon. We were looking at the Forbes AI 50 list and 78% of those folks are Stripe users today.

And they're mostly using Stripe to launch and scale global payments fast, to power really complex billing setups. So of course, usage-based billing for metered inference, which if you're an AI company is both mission critical and really hard to get right, especially if you're operating with a really lean team.

It's interesting to see, there's a lot of hype around the AI tech. And I think that has produced very fair questions about monetization and is it real and is there customer demand and how are these companies scaling? What we see in our data is pretty clear, which is this current wave of AI startups is monetizing very fast. In fact, faster than anything we've seen before.

We recently looked at the top hundred highest grossing AI companies on Stripe and plotted their growth trajectory. And for those that have already hit 30 million in annualized revenue, they got to 30 million in ARR in just about a year and a half from founding.

Patrick McKenzie: And these are mind-blowing compared to the last generation of very skilled software businesses, which was the software as a service (SaaS) generation. I did a little bit of work on all sides of that one. [Patrick notes: Sold two small SaaS companies, consulted for a few dozens, and spent more than I ever want to have to tell my wife in $20 a month increments.]

Our industry had indicative finger to the wind benchmarks of like how long it will take a VC-funded company to hit various revenue milestones. The current generation of AI companies seem to anecdotally be blowing those benchmarks out of the water. [Patrick notes: The price of learning some facts about the world is being able to keep confidences about them. Sorry.]

Emily Sands: Yes. And quantitatively also, right? Like if you look at the same comparison, like the hundred highest grossing SaaS startups on Stripe back five, six years ago, 2018, 2019, it took them more than five and a half years to hit that same 30 million ARR mark. So you think of this AI wave as scaling revenue at three times the speed of the SaaS boom. [Patrick notes: See the annual letter.]

And that's—we talked here about 30 million plus, those are the big players, but this is also true for the little startups. Like if you use a benchmark, like how long does it take them to hit 1 million in ARR? The AI companies are getting there in five months and they're earning four times more revenue than their peers that launched just a couple years earlier.

So that trajectory has been really fun to watch. There's also, recently, like over the last year or so, been an interesting shift in what people are building in the AI space. So it actually, it's starting to look a lot like the trajectory we saw in SaaS, horizontal first and then vertical. So Salesforce, horizontal first and then Toast vertical second in SaaS. And similarly, AI is going from broad tools like ChatGPT to these kind of highly specialized and industry specific applications.

Patrick McKenzie: And in our prior conversations about this, we shook our virtual fists at the word "wrappers," which is sometimes invoked to—

Emily Sands: Yeah, that kind of misses the point.

Patrick McKenzie: Yeah, the nature of the game in SaaS for a while was being an if-then statement and for loop wrapper. The majority of the intellectual effort and the majority of value created by the business was on, there are these fundamental primitives that we have. We can expose them to the realities of particular industry. There's a little bit of secret sauce from our domain experts.

And then do the hard yards of going out to the industry, being in all the places, spinning up a sales motion that actually repeatedly connects it to the right people. You gradually convince the industry to move away from a model which was either doing it all on paper or doing most of it in Excel to "You should use the same set of applications that everyone else in the industry now uses."

[Patrick notes: Why does DocuSign have X,000 people working on e-signatures? Because they convinced the entire U.S. real estate industry that Contracts Live In DocuSign. That took about a thousand people to do. Repeat over other verticals, etc etc.]

Emily Sands: These aren't throwaway apps. These are tools that are actually making LLMs useful in specific industries. And as you know, they're bringing the right context, they're bringing the right data, they're bringing the right workflows, they're bringing the right relationships. And that's where the real economic impact is starting to show up.

So if you think about healthcare, an industry that is of course highly regulated, really interesting companies like Abridge and Nabla and DeepScribe are just changing how doctors work. My niece, Natalie, matched a week ago last Friday at UCSF for her residency. And she told me they have a whole team rolling out AI scribe tools. And I think it's a great example of just like residency already looks different than it did a year ago.

But we wouldn't be there just with pure horizontal applications. Most institutions are not UCSF with their own AI scribe tools team. Instead, they are really looking to companies like Abridge and DeepScribe to solve a bunch of those challenges for their vertical.

Patrick McKenzie: In some ways, the consumer applications that are built on top of, for example, Anthropic's Claude or OpenAI's suite of tools are the fastest growing consumer apps in history that people are actually paying money for. But in another very real sense, they're sort of an extended tech demo to every decision maker in every industry on, this now exists in the world.

Would you be doing things exactly the way that you are doing right now if you had come up in an environment where this technology was the thing? And increasingly, companies are saying, no, we would definitely architect our processes differently if we had this as a foundational primitive available to us.

And the way that actually gets done in most companies is rather than spinning up an internal AI practice and doing 15 years of pioneering machine learning work, they tap somebody in the industry that has a vertical SaaS or a vertical AI or vertical SaaS embedding vertical AI and use that as the adoption vector.

Emily Sands: Right? Like even the architects are architecting things differently. Sorry, bear with my pun. But if you look at like SketchPro, architects are basically using text prompts to quickly generate a bunch of their early stage architectural renderings. And of course there are examples across all industries, like Harvey for legal and Studeo for real estate businesses and so on.

Patrick McKenzie: I have to give a shout out to my friend Alicia, who walked into a number of meetings with Chicago local political leaders. She is attempting to get some housing built in Chicago and it turns out that it's much easier to have a sort of productive discussion with someone if you can give them a way to visualize that. So she was just using relatively commodity AI tools to say, "Look, you're aware of the realities of life in this corner right now, they aren't wonderful, here's what that corner could look like." And then just get as far as the next step in the approval process.

Emily Sands: So it's very much not hype. This is a real wave of businesses that are building real value and doing it faster than we've seen in any previous tech cycle. And I think the economic impact, business impact is being meaningfully accelerated by, again, the shift from let's just be horizontal to let's be deeply invested in solving for a particular vertical domain with the right context and the right data and the right workflows and the right relationships.

Patrick McKenzie: So I think the takeaway for many people is that LLMs are not just a flash in the pan. This is producing real value in the status quo for businesses at all ends of the size and sophistication spectrum. But there are some things coming down the pipe which feel a bit like science fiction and which we will have to put more than a little bit of thinking into because they seem to be those kind of science fictions that are likely to be realized in our homes and in our businesses much faster than we would expect.

Agent-assisted commerce

One of those things is agent-assisted commerce. Can you tell me a little bit about what Stripe is seeing in the agent-assisted commerce space?

Emily Sands: I think it starts with real leaps in AI right now, especially around reasoning. And it's not just about knowing things anymore, it's about doing things. So AI is moving from just answering questions to actually taking action on our behalf. As consumers, we probably see this shift. We used to just use ChatGPT to look something up. Now we use say Operator or Perplexity where the AI can actually go ahead and do what was asked.

And that shift from knowing to doing is gonna transform a lot of industries, and commerce, to your question about agentic commerce, is a big one. Online shopping, yes, has gotten more convenient over the years. Yes, with e-commerce, yes, with mobile, but even now it's still kind of a pain. I mentioned my little kids. I haven't yet mentioned that they're growing like weeds and just keeping them in clothes that fit and that they'll actually wear takes way more of my time than I like to admit.

You're digging through tabs and you're searching emails to remember what size and you're re-entering card info. There's absolutely 90% of that that can be handed off very quickly here to an agent. You mentioned sci-fi, but actually a bunch of it's already happening. I can go to Perplexity right now and make purchases straight from chat results. No endless tabs, no recheckouts, memory of my past purchases. And Perplexity agents actually go out and execute those purchases across the web for me.

So it's really kind of beautiful. And there's an important bit there where Stripe comes in, which is the money movement bit. So Stripe has always been financial infrastructure for the internet, for years optimizing payments and checkout and billing flows for human buyers and businesses. And now we're tweaking that for a new kind of user, which is these AI agents.

So back in November, we launched the agent toolkit, which just makes it really easy for agents to hit the Stripe APIs. They can spend money, they can move money, they can collect money. And in the Perplexity example, Perplexity's agents are using Stripe to securely make purchases on my behalf or your behalf or whoever else likes to shop through Perplexity.

I think the simplest analogy for what's happening from a money movement perspective is actually thinking about the physical marketplace that exists in a food delivery app. So maybe you are in a food delivery app, you find a salad from Sweetgreen. Behind the scenes, the app is actually issuing a single use virtual card to your human delivery driver. And it's locked to the merchant that you're buying the salad from for the exact amount of your salad. And once it's used, it's done. The card cannot be used after that. And no one, not the driver, not the restaurant, ever has full access to your card info.

And that sort of money movement, money flow is also what is happening in the AI agent example I just gave. So the agent that Perplexity has doesn't get my card credentials when I tell it to find a pink rain jacket for less than $30 in 6T, but it can still securely complete the transaction on my behalf with the limits that I define. And it's all powered by Stripe Issuing. Stripe Issuing just lets developers create virtual cards with built-in controls where they can be used, how much they can spend, and when.
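The single-use card flow described above can be sketched in a few lines. This is a minimal illustration only, not Stripe Issuing's API: the class `SingleUseCard`, its fields, the merchant names, and the decline messages are all hypothetical, chosen to mirror the three controls mentioned in the conversation (where the card can be used, how much it can spend, and that it works once).

```python
from dataclasses import dataclass


@dataclass
class SingleUseCard:
    """Hypothetical model of a single-use virtual card (not Stripe's API)."""
    allowed_merchant: str      # the only merchant this card may pay
    max_amount_cents: int      # hard ceiling for the one purchase
    used: bool = False         # flips to True after the first approval

    def authorize(self, merchant: str, amount_cents: int) -> tuple[bool, str]:
        """Apply fixed rules; what the agent intended plays no part here."""
        if self.used:
            return False, "card already used"
        if merchant != self.allowed_merchant:
            return False, "merchant not allowed"
        if amount_cents <= 0 or amount_cents > self.max_amount_cents:
            return False, "amount outside limit"
        self.used = True
        return True, "approved"


# The user asks for a rain jacket with a $30 ceiling; the card encodes that.
card = SingleUseCard(allowed_merchant="rain-jacket-shop", max_amount_cents=3_000)

# A confused agent tries a very different purchase: the card declines it.
print(card.authorize("fire-truck-dealer", 40_000_000))
# The intended purchase goes through once.
print(card.authorize("rain-jacket-shop", 2_799))
# A second attempt on the same card is declined.
print(card.authorize("rain-jacket-shop", 2_799))
```

The point of the sketch is that the checks live on the card rather than in the agent: whatever the agent decides to attempt, the outcome depends only on the merchant, the amount, and whether the card has been used.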

Patrick McKenzie: And so over the course of the next couple of years, we expect the breathtaking pace of R&D work in AI to continue working on things like the hallucination problem where sometimes you tell a chatbot, spend $30 on a rain jacket, and it says, "Absolutely, I'm going to buy a fire truck for $400,000." But even in the case where the chatbot might have an interesting understanding of what the instructions were, the actual financial rail that the chatbot is accessing will act as a second layer of controls there.

Emily Sands: The controls are deterministic. And the agent toolkit is being downloaded by thousands of businesses every week, because there are all sorts of businesses that see the value of being able to have agents collect and earn and spend and move money. And this isn't just for e-commerce. There are AI agents out there right now that can help you run your business. ElevenLabs is a cool voice AI startup.

And they just use our toolkit. So a voice agent can handle subscriptions and refunds. Agents can do a bunch of the boring stuff. They can generate invoices off of super messy spreadsheets. They can update the card on file. They can update the billing plan. They can issue refunds. They can send payouts. They can analyze business metrics. They can do a lot of things without needing a human. And for the financial bits that they do, we're just making it easy for them to interact with Stripe's APIs.

Patrick McKenzie: While this feels very new and very sci-fi, I love helping to ground people in the realization that we've had non-human intelligences transacting with non-human intelligences for a very long time. We just called them corporations. So ultimately, the notion of two systems interacting with each other is not a priori threatening. These are ultimately attempting to accomplish goals on behalf of actual people who live in the world.

There's an incredible amount of legal, social, organizational and technical infrastructure that allows LLCs to transact with C corporations and similar. And we are having to both reuse some of that existing infrastructure but also come up with new novel approaches to support these use cases which might be agents transacting on behalf of people with firms or eventually agents transacting with other agents to knit high frequency transactions between firms together.

And we talked about the various ways that ML gets even more powerful when we are doing a stupidly high level of transactions with a couple of milliseconds involved in between them in the retry for credit card transactions use case. But you can eventually imagine a world in which just like there are dueling robots bidding for your attention every time you open up a newspaper these days, there might be an agent or even a panoply of agents executing on your behalf while you're off on vacation to optimize things down to what flight you're taking and dealing with problems so that you don't have to sacrifice your limited time and attention to deal with them yourself. So it will be a wild couple of years.

Emily Sands: Absolutely.

Patrick McKenzie: So Emily, thanks very much for the conversation today. I've learned a lot. Hope it's been useful for the audience. If they'd like to dig even more in on these topics, where can they find more about them?

Emily Sands: We have Stripe Sessions coming up actually just next month. It is the Internet Economy Conference, May 6th to 8th, and you can learn more at stripesessions.com.

Patrick McKenzie: Thanks very much. Hope that many of the folks meet you there. And for all the folks, thanks very much for your time and attention this week. And we'll be back next week on Complex Systems.

Emily Sands: Thanks, Patrick.