Decisions Nobody Made, policy edition, with Dave Guarino

Patrick McKenzie Sep 19th, 2024

Why government programs tend towards equilibria that no one would endorse and UXes that no one would ship.

Today I'm joined by my buddy Dave Guarino (Substack, Twitter). He has been in and around government for much of his career, having built GetCalFresh in California, and spending his days today at Propel working on SNAP navigation tools. I've long had an off-again on-again hobby interest (and then a burning all-consuming professional interest) in why government operations generally and software specifically do not seem to be delivered with levels of competence routinely achieved in the United States. I think Dave's perspective brings nuance and experience to how this discussion often goes in tech-adjacent circles, without being as handwavy, exculpatory, or politicized as it frequently is in policy circles.

Sponsors:
Check is the leading payroll infrastructure provider and pioneer of embedded payroll. Check makes it easy for any SaaS platform to build a payroll business, and already powers 60+ popular platforms. Head to checkhq.com/complex and tell them patio11 sent you.

Building an enterprise-ready SaaS app? WorkOS has got you covered with easy-to-integrate APIs for SAML, SCIM, and more. Start now at https://bit.ly/WorkOS-Turpentine-Network

Timestamps:

(00:00) Intro
(01:03) Complexity of naming government programs
(03:45) How policy decisions are made
(07:19) Why SNAP applications are so complex
(14:17) Why no one stops overly complex applications
(18:44) Political economy of different benefit programs
(24:56) Sponsor: Check | WorkOS
(26:13) Limited visibility into user experience
(29:24) Lack of application completion rate tracking
(35:27) Starting where you are
(43:44) Challenges of modernizing legacy systems
(48:35) Broken feedback loops in government
(53:01) Tech's understanding of service design
(57:07) Issues with improper payments methodology
(1:04:45) Effective ways to influence policy
(1:09:43) Increasing agency in government agencies
(1:14:56) Getting niche policy ideas into circulation
(1:18:04) Importance of frontline knowledge and user feedback
(1:21:33) Improving government services
(1:22:06) Wrap

Transcript

Patrick McKenzie: Hideho everybody, my name is Patrick McKenzie, better known as patio11 on the internets, and I'm here with my buddy Dave Guarino.

Dave Guarino: Hello.

Patrick McKenzie: Dave has worked in and around government for many years, serving in a variety of civil society roles. I first became aware of his work through his efforts with California's Supplemental Nutrition Assistance Program (SNAP), which was formerly known as Food Stamps. I’ve admired his writing on the subject and wanted to bring him on to discuss some of the broader issues we see in government, and how both insiders and outsiders can contribute to improving things.

Before we started recording, we had an interesting aside about naming conventions. Can you explain a bit about the complexity of calling the program SNAP versus Food Stamps, and the nuances involved there?

Complexity of naming government programs

Dave Guarino: The complexity of naming is illustrated by the fact that I’m about to mention four different terms. First, there's SNAP, the Supplemental Nutrition Assistance Program, which has been the official name for a couple of decades. Then there’s the more colloquial term, "food stamps," which people have used for a very long time because that’s how the benefits were originally delivered.

That term is a bit outdated now since the benefits are provided via EBT cards—Electronic Benefits Transfer cards—so "EBT" is another term people might use. And then, states often come up with their own names for the program. Here in California, for example, we call it CalFresh.

I once had an interesting conversation with someone who was involved in the focus groups and testing around the naming of CalFresh. They shared some of the other options that were considered, and it was fascinating. The name itself reflects a few phenomena. Even though SNAP is the official term and was chosen to help reduce the stigma around the program, many people still call it "food stamps" because it’s a familiar brand. People recognize it, and it’s a term that resonates across different stages of life.

Oh, and I forgot to mention this earlier, but there was a fun Twitter thread about six months ago from a former congressional staffer. They were part of the negotiations when the program was renamed to SNAP, and it was a reminder that we often think of public policy decisions as final, but names like these can evolve and carry history with them.

After a lot of deliberation, trade-offs, and technocratic analysis—testing and counting the numbers—the reality is often quite different. The person who shared the story on Twitter, whose avatar is a cartoon raccoon, explained that, in the end, they just needed a name, and they really wanted to emphasize that it’s supplemental. The idea was that people shouldn’t rely entirely on this program for their food; they should be able to afford food through other means as well. So, it became SNAP.

How policy decisions are made

I don’t remember all the exact details, but I think this story points to another important phenomenon: big policy decisions often cast a long shadow. And yet, the window in which those decisions are actually made can be surprisingly narrow. It usually involves a small group of people, and the process isn’t always entirely rational or coherent.

That, too, leaves a lasting mark on how government programs are shaped and perceived.

Patrick McKenzie: I think that's true in the private sector as well. You’d assume there’s extensive testing and deliberation over a number of years, but often it’s just two exhausted twenty-somethings in a room at 9 PM who just want to go home. They come up with something, put it in the memo for the next day, and if no one objects, that becomes policy.

That’s sometimes how policy gets made.

One thing I’d like to highlight here, and this shapes a lot of my thinking about government versus citizenry interaction, is the wide range of socioeconomic circumstances and social classes among U.S. residents. In contrast, the range within decision-makers in government is much narrower. That group is often referred to as the "professional managerial class." That class, our class, thrives on complexity. We're comfortable with legalistic reasoning, 100-page documents describing edge cases, and so on.

[Patrick notes: We are extremely and painfully aware that many in society are not like us and yet we will force any system we make to work the way we work. Not through malevolence and not entirely through ignorance; we’re just following incentive gradients alllll the way down.]

Dave Guarino: Yep.

Patrick McKenzie: Realistically, the vast majority of people who rely on SNAP don’t thrive on these forms of complexity.

We're asking them to juggle multiple names and concepts, all of which essentially mean the same thing, but differ from the ten other programs they’re eligible for—programs that may serve similar needs in their lives, but look completely different on our organizational chart.

And that’s the problem: they need to understand our org chart to navigate these programs. This doesn’t serve the people these programs are designed to help.

[Patrick notes: How does one apply for food stamps in Illinois? It’s simple really.

First you Google how to apply for food stamps, skip past the two AdSense ads because obviously Google’s first two results are trying to trick you, get to the official dot gov website because you certainly know that unlike dot org, dot gov is actually restricted by the registrar, choose to apply online through ABE (Application for Benefits Eligibility, it’s a Lincoln reference, get it, sounded great in the meeting), hit the page that tells you that ABE is experiencing temporary downtime (a word and concept you are quite familiar with and therefore causes you no confusion), fail over to their suggestion of sending in a paper application, download an IL444-2378 B, use the DHS Office Locator to find where you should send it (which will require you to correctly select which of 16 sub-departments you’re looking for, but of course you’re an expert in navigating government org charts and picked up on the obvious hint dropped on the ABE downtime page), and then you get to the hard part.]

Dave Guarino: Yeah, and I think that’s one of the big dynamics at play in all of this—complexity. Think about the term "policymakers." No one takes that job with the goal of eliminating policies that are overwhelming people. It's the same with lawmakers.

I recently heard about a California legislator who moved into a more executive, operational role. After some time, they said, “I’ve made a huge mistake over the last decade.” They realized just how many laws were already on the books—and how many new ones were being passed every day. And that experience made them understand that more laws don’t necessarily lead to better outcomes. In fact, by piling more policies and laws on top of what already exists, we often make it much harder to achieve the desired results.

So, yeah, that’s definitely a major part of the issue.

Patrick McKenzie: I think this is also a political economy problem. When you're dealing with different constituent groups and stakeholders, it’s easy to point to something like, “We fought hard for this extra paragraph—section 2.1.a—that’s for you.” But it’s much harder to say, "Look, by simplifying this program, even if you didn’t love the changes directly, the population you care about has actually benefited over the last eight years, as shown by this research," assuming such research gets done, of course.

Why SNAP applications are so complex

You’ve worked a lot on SNAP eligibility, and I think you once mentioned that the application process involved 211 questions—not pages, thankfully. I wouldn’t want to impugn SNAP’s honor by implying it is that difficult. But still, 211 questions to determine eligibility sounds like a lot.

How did we end up there?

[Patrick notes: Dave’s experiences with an earlier effort to improve this program are recounted in Recoding America by Jennifer Pahlka. I misremembered: the book quotes the application as being 212 questions long, at Kindle location 2477. Dave and Jennifer had a really good appearance together on the Odd Lots podcast on this topic, incidentally.]

Dave Guarino: That’s a great question. The 211 figure comes from a review of the online application process for 18 counties in California back in 2014. So, for any policy wonks listening who want to quibble with that number, they’re doing their job correctly by pointing out that it might be slightly off. It’s possible the exact number has changed, but directionally, it’s about 200 questions.

How did we get there? Well, there are a few factors at play. SNAP, or food stamps—the program I’m most familiar with and have worked on extensively over the last decade—has a seemingly simple mission. You can actually see this in the statute, which, paraphrasing a bit, says something like, "It’s not good when people don’t have enough money for food. Therefore, we should ensure that every American has at least some amount of money for food. And if you don’t have enough, we’ll give you a supplement to ensure you can meet a minimum threshold."

[Patrick notes: It’s not the most stirring bit of American legislative writing ever, but I think it’s instructive to actually read the text, as a carefully considered encapsulation of the compromise a nation quite divided on this subject could endorse. Note in particular the shoutouts to sectors of the political economy Congress chooses to make, which we shall return to shortly.

It is hereby declared to be the policy of Congress, in order to promote the general welfare, to safeguard the health and well-being of the Nation’s population by raising levels of nutrition among low-income households. Congress hereby finds that the limited food purchasing power of low-income households contributes to hunger and malnutrition among members of such households. Congress further finds that increased utilization of food in establishing and maintaining adequate national levels of nutrition will promote the distribution in a beneficial manner of the Nation’s agricultural abundance and will strengthen the Nation’s agricultural economy, as well as result in more orderly marketing and distribution of foods. To alleviate such hunger and malnutrition, a supplemental nutrition assistance program is herein authorized which will permit low-income households to obtain a more nutritious diet through normal channels of trade by increasing food purchasing power for all eligible households who apply for participation. That program includes as a purpose to assist low-income adults in obtaining employment and increasing their earnings. Such employment and earnings, along with program benefits, will permit low-income households to obtain a more nutritious diet through normal channels of trade by increasing food purchasing power for all eligible households who apply for participation.

]

That’s the broad intent. But the reason you end up with 200 questions is that, when you start operationalizing this broad intent, you have to define every little subpart of it. What does it mean to "not have enough money for food"? What’s a measurable, verifiable way to assess that?

You can’t just say, "Let’s take it on a case-by-case basis and decide who needs it." That approach would invite a lot of bias into the process.

If you allow extreme discretion and just say, "Whoever you think needs it, gets it," you open the door to bias, graft, and corruption. So, you need some objective standards to maintain the legitimacy of the program. But then, when you start adding those standards, you end up with a lot of questions.

For example, "How much money do you make?" Well, that’s too vague. So, you refine it: "How much did you make in the last 30 days?" But then you think, "What about the next 30 days—are you expecting to make more?" And, "How much money do you have on hand? Could you pay for food out of your savings?"

[Patrick notes: I’d note that “how much did you make?” is a notoriously difficult subject to quantify in law. The area where we have the best prior art (defining income to tax it) has well-known cases where the law does not map to our moral intuitions.

To say nothing of the frictional burden for getting beneficiaries to understand concepts like gross versus net income, which routinely befuddle e.g. professional software engineers. (How many colleagues do you have who think that ticking into a higher tax bracket costs them more than a dollar on the dollar that causes the transition between brackets?)

Dave has previously described the felt difference between gross income (what the statute looks at) and take-home pay (what many other-than-sophisticated individuals think of as their “total” income) as the single greatest source of acrimony between applicants and case officers.]

Dave Guarino: Then you start considering different sources of income. And here’s where competing goals and compromises come into play. If someone’s working, we don’t want to penalize them too much for that. So, if they’re working part-time, making minimum wage, and supporting a family of four, they’re still very much struggling. We want to ensure they still qualify for benefits, which leads to something like an earned income deduction—because they worked for that income, it counts for less.
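To make the "counts for less" arithmetic concrete, here is a minimal sketch in Python. The function name, the split into earned and unearned income, and the rate used in the example are illustrative assumptions, not the program's actual rules or figures.

```python
def countable_income(earned: float, unearned: float, deduction_rate: float) -> float:
    """Return income counted toward eligibility after discounting earned income.

    deduction_rate is the share of earned income that is ignored. It is a
    hypothetical parameter here; the real rate is set by statute and regulation.
    """
    if not 0.0 <= deduction_rate <= 1.0:
        raise ValueError("deduction_rate must be between 0 and 1")
    if earned < 0 or unearned < 0:
        raise ValueError("income amounts must be non-negative")
    # Earned dollars are discounted; unearned dollars count in full.
    return earned * (1.0 - deduction_rate) + unearned


# Two households with the same total income. The one that worked for it
# has less income counted against it. The 0.25 rate is a placeholder.
worker = countable_income(earned=1200.0, unearned=0.0, deduction_rate=0.25)
non_worker = countable_income(earned=0.0, unearned=1200.0, deduction_rate=0.25)
print(worker, non_worker)
```

Even this toy version needs the applicant to separate earned from unearned income, which is one more question on the form.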

As you start operationalizing these core questions—"Who doesn’t have enough money for food?" "How do we know that?" "How do we measure and verify that?"—you end up layering on additional questions and policies. And then, on top of that, you add other goals, like creating the right incentives.

And then we introduce incentives to encourage work and disincentivize not working. For example, there's the time limit for Able-Bodied Adults Without Dependents (ABAWDs), where generally you can’t get SNAP for more than three months in a row without working, volunteering, or engaging in education or other activities.

[Patrick notes: I had to stifle a sound when Dave spelled out that acronym because even on an aesthetic-of-the-acronym level I thought drafters would find it revolting, but no, that’s a thing.]

Dave Guarino: As you layer these policies on, what started as something simple—like in a small community where you know which families are struggling—becomes much more complex at the scale of a country like the United States. SNAP, which is likely the largest anti-hunger nutrition program in the world by dollars, becomes incredibly intricate. Every new rule adds another question to the process.

These rules are often made for good reasons. One of my favorite examples is college students. If you ask a policy wonk, "Are college students eligible for SNAP?" they’ll say no. But then they’ll immediately add, "Unless they meet an exception." You wonder, why couldn’t they just say that from the start? It’s because the eligibility rules are full of exceptions.

[Patrick notes: As an example of the complexity of communicating this to beneficiaries, on hearing this explained as a fairly sophisticated adult, I correctly intuited that one of the exceptions would cover severe physical or similar disability. I would not have guessed that “your work-study program is federally funded” counts for one.]

Dave Guarino: So you end up with branching logic: "You’re not eligible... unless you are, if you meet an exception." And every bit of that complexity ends up on the application.
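The "not eligible, unless an exception applies" pattern can be sketched as a default answer plus a list of exceptions. This is an illustration only: it includes just the two exceptions mentioned in this conversation, and the labels and function name are hypothetical.

```python
from typing import Iterable

# Only the two exceptions mentioned in this conversation. The real list is
# longer, and these labels are hypothetical, not official terms.
STUDENT_EXCEPTIONS = frozenset({
    "severe_disability",
    "federally_funded_work_study",
})


def passes_student_rule(is_college_student: bool, circumstances: Iterable[str]) -> bool:
    """Apply the 'not eligible, unless an exception applies' pattern.

    This covers the student rule only; income and other tests are out of scope.
    """
    if not is_college_student:
        return True  # the student rule does not restrict non-students
    # The default answer for students is "no", flipped by any matching exception.
    return any(c in STUDENT_EXCEPTIONS for c in circumstances)


print(passes_student_rule(True, []))
print(passes_student_rule(True, ["federally_funded_work_study"]))
```

Every entry in the exception set corresponds to at least one more question the application has to ask.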

The last point I’ll make is that there’s a significant structural issue: we don’t have strong feedback loops between the complexity of the application process and the lawmakers or policymakers who design these rules. That gap—the lack of feedback from the people navigating these 200-question applications to the people setting the policies—is probably the biggest reason we have this level of complexity.

Patrick McKenzie: So, we have complex applications because we have complex preferences as a society, and complex moral intuitions. We need to account for, you know, the “deserving poor” and other layers of moral reasoning, which could fill books in theology or philosophy courses. That’s fundamentally why we end up with over 200 questions.

[Patrick notes: My own feelings on desirable size of the welfare system and eligibility criteria notwithstanding, I am referring here to something frequently referred to in the literature, generally by people disparaging of the concept attempting to describe the views of their political opponents.]

Why no one stops overly complex applications

Patrick McKenzie: At some point, policymakers are sitting around a table, looking at this 200-question application. And yet, no one says, "We cannot possibly ship this." Why? Why can no one in that meeting stand up and say, "This is unworkable"?

Sorry for a brief digression—I used to work in private industry, and, knock on wood, in a well-run company, someone—whether it's the CEO or someone hired into the marketing department 30 minutes ago—will look at a 211-question form and say, "Our conversion rate is going to be absolutely shit. We cannot possibly ship this. What’s the real plan here?"

[Patrick notes: Apologies for the profanity. I was imagining being stuck in a long meeting with a team of incompetent people, who had just laboriously stepped through an obviously unshippable project plan and nonetheless believed they were about to be greenlit.]

Patrick McKenzie: So, why does no one say that in a government context?

Dave Guarino: One big reason is that the people looking at the final application—like, say, the vendor coding the online version—aren’t empowered to make changes. The vendor might say, "I’m just responsible for translating the paper form into a website." Then, if you go upstream to the people who design the paper form, they’ll say, "My job is to make sure this form complies with the federal regulations." There’s a set of citations from the federal oversight body that checks the form against specific requirements: "Does it ask this? Does it cover that?"

Their job isn’t to question those requirements; it’s to ensure compliance. If you go further upstream, you reach the people interpreting federal policy, and beyond that, those setting the policy within the framework of federal law and congressional intent.

The somewhat dissatisfying answer is that it comes down to a lack of agency and diffuse responsibility. No single person along the chain feels empowered to say, "No, we can’t send this out as is." Often, there simply isn’t anyone with the authority to make that call.

Another part of it is the profound status quo bias. Once something is already in place, the inertia to keep it going is strong, even when things aren’t working well. Change is hard in a system like this. That’s why people sometimes say, and I think they’re right, that we should create a new program rather than try to reform an existing one.

But as you alluded to with Seeing Like a State, there’s a valid concern that if we spin up a new "greenfield" program, it might be full of well-intentioned ideas but completely detached from reality. And we could end up without a proper feedback loop during the crafting of that new program.

The answer is probably yes—that’s exactly what can happen. I can point to an example with pandemic unemployment insurance. A new program can come with a host of unintended consequences because it lacks the built-in responses that a long-running program has developed over time. Over the years, existing programs often evolve almost allergic reactions to mismatches between how the world actually operates and how the program was modeled, like realizing, "Oh, we didn’t mean to include this group in that category."

Take college students, for example. Technically, they may have very little income, but if you’re going to Harvard, most students in that situation aren’t truly struggling. So, we don’t want them to be eligible for SNAP. But we do want students at Harvard who receive Pell Grants, and come from families that genuinely can’t afford tuition, to be eligible.

This is where you end up in a difficult spot. There’s no perfect solution. The ideal is somewhere in the middle, where you can start fresh but also iterate quickly and adjust based on real data. Unfortunately, we haven’t yet built institutions that are particularly good at both innovating and responding quickly to feedback. That’s the direction we should be heading, but it could take generations to get there.

Patrick McKenzie: Yeah, and I think there’s a real question about the cost of greenfield development. One of the reasons the intended beneficiaries of these programs get mired in complexity is because well-off, highly educated people—who aren’t worried about their own food security—are sitting in meeting rooms thinking, I don’t want to be responsible for adding the 212th question to the SNAP questionnaire. So, instead, they decide to create an entirely new benefits program because the 27 we already have aren’t quite right. Let’s go for a 28th.

Now, the burden falls on the beneficiaries, who are left to figure out which of the 28 programs they need to apply to first. In effect, we’re externalizing the burden of understanding and navigating the complicated policy preferences of the polity of the United States onto the people within it who are probably least equipped to handle that level of complexity.

[Patrick notes: Some beneficiaries of government assistance are complexity mavens, of course. Perhaps they write software for a living, produce podcasts, or appoint themselves the U.S.’s czar for vaccine location information. But, on balance of probability, that probably does not happen the same week they apply for benefits. One imagines that that week, their household probably had an awful lot going on, and grossly insufficient free cycles to navigate complexity that, in better times, they would happily play like a newly released video game.]

Political economy of different benefit programs

Dave Guarino: Yeah, it’s definitely a case of negative externalities that we push onto people. We say, "This friction doesn’t really affect us, but we get some benefits on the distribution side." We can say, "X group doesn’t get these benefits, Y group does," which might look good optically but makes the process harder for those who need to navigate it.

When we talk about creating new programs, I wish people considered more the different political economies of various public benefits in the U.S. Each benefit has its own unique political dynamics. Over the last decade, I’ve worked a lot on SNAP and, for a brief period, on unemployment insurance (UI). I’m not an expert on either, but I noticed some interesting structural differences between the two.

SNAP has a very durable political economy. Why? Because a lot of grocers and retailers love the program—it increases demand for groceries and food. They’re strong proponents of keeping SNAP robust and serving lower-income communities. Plus, it’s tied to the farm bill, which connects it to farm politics, adding to its political strength.

On the other hand, UI has this strange "race to the bottom" dynamic. It’s primarily a state program, with some quirky legal details, but largely state-controlled. Since UI is funded through employer taxes, states have an incentive to lower benefit amounts or restrict eligibility to keep businesses happy by reducing taxes.

By reducing business taxes, states effectively cater to the political economy of businesses, which is often far more powerful than that of unemployed workers. It’s not a difficult calculation to make. Scholars who are much deeper into unemployment insurance (UI) than I am have studied this and observed an empirical "race to the bottom" across states, where benefits get lower and lower.

That’s where people sometimes miss the bigger picture. If you want to create a "greenfield" program—a single, streamlined benefits system that simplifies everything—one of the major challenges is designing a political economy that ensures its longevity. That’s the decades-long feedback loop people don’t consider when they’re focusing on the complexity of a program in any given year.

It’s a bit of a geeky spectrum, but you need to zoom in and zoom out when thinking hard about these problems. You have to understand both the immediate details, like the number of questions on an application, and the larger, structural dynamics that affect how programs endure over time.

Patrick McKenzie: I think there’s also something interesting in the political economy around food specifically. There’s a deep cultural tradition of providing food in particular, and of providing it to poor people in particular. The group of beneficiaries, who are generally lower-income, tends to exist over long time horizons and has established institutions connected to it (churches, food banks, advocacy groups) that provide ongoing support.

In contrast, people who have recently lost paid employment and haven’t yet regained paid employment don’t have a natural cultural anchor or a strong advocacy community. [Patrick notes: Witness the fact that people in our social class feel the need to invent new words which describe the state of not having a work email address but do not imply that one is unemployed.] They don’t have the same institutional presence. So, when it comes time to "divide the pie," the groups with established representatives at the table, the ones who can show up consistently, tend to get a bigger slice.

Dave Guarino: Absolutely, yeah.

Patrick McKenzie: We’ve both referenced Seeing Like a State a few times, both before and after we started recording, and what I consider to be its modern update: Dan Davies’ concept of the Unaccountability Machine. [Patrick notes: Previously, on Complex Systems.] I think you mentioned it could have been titled Decisions No One Made, but...

Dave Guarino: I will hold a grudge against Dan Davies in perpetuity for not choosing that as his title. It strikes to the heart of systems geekery.

Patrick McKenzie: It would have been a wonderful title.

Organizations, especially governments, often struggle to make their own decision-making processes legible, even to themselves. One reason the government doesn’t realize that a 211-question application is causing harm to end users is because the people who actually see the website—the ones who see what users are experiencing—are insulated by multiple layers from the decision-makers. So that insight doesn’t "percolate up the tree," so to speak.

Limited visibility into user experience

Another factor is the limited visibility into the actual user experience when people interact with the system and the bureaucracy behind it. Can you talk a little about why that is?

Dave Guarino: Yeah, I think there’s both a structural level and a mechanical level to this. Structurally, the problems that are measured well and have legible feedback loops to decision-makers tend not to be the biggest problems. It’s almost by definition.

Take SNAP, which I know best. One well-measured metric is Application Processing Timeliness (APT). This tracks what percentage of applications are processed within the required federal timeline—30 days for a regular case or 7 days for an expedited case. There are about seven asterisks attached to that, and if you're curious, you can dig into the FNS QC Handbook 310, which I believe is 400 pages long. [Patrick notes: Fact check: exactly accurate.]

Dave Guarino: I do recommend it. The FNS QC Handbook 310 has a big scroll on the front, which is delightful—it really communicates what you're getting into. But back to application processing timeliness (APT), it’s one of those feedback loops that states are measured on every year. Cases are sampled, reviewed, and on the whole, when there are issues, a feedback loop builds up and those problems tend to get fixed because it’s known and recurring. It’s a feedback loop you can anticipate.

APT is one of the most important metrics, alongside another key measure: payment accuracy, or the payment error rate. These two are crucial because if a state gets them wrong, the USDA, which oversees SNAP, can impose fiscal sanctions—literally taking dollars from the state. And that’s something no one wants. So, these two structural goals—application processing timeliness and payment accuracy—tend to self-correct over time.
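As a rough illustration of what a timeliness metric computes, here is a minimal Python sketch. It uses only the 30-day and 7-day deadlines mentioned above and ignores the "asterisks"; the data layout, the function name, and the choice to count the deadline day itself as on time are assumptions, not the official methodology.

```python
from datetime import date

# Deadlines as stated in the conversation; the "asterisks" are ignored here.
DEADLINE_DAYS = {"regular": 30, "expedited": 7}


def timeliness_rate(cases):
    """Share of cases decided within their deadline.

    cases: iterable of (kind, filed, decided) tuples, where kind is
    "regular" or "expedited" and the other two are datetime.date values.
    Counting the deadline day itself as on time is an assumption.
    """
    cases = list(cases)
    if not cases:
        raise ValueError("no cases to measure")
    on_time = sum(
        1 for kind, filed, decided in cases
        if (decided - filed).days <= DEADLINE_DAYS[kind]
    )
    return on_time / len(cases)


sample = [
    ("regular", date(2024, 1, 2), date(2024, 1, 25)),    # 23 days: on time
    ("regular", date(2024, 1, 2), date(2024, 2, 15)),    # 44 days: late
    ("expedited", date(2024, 1, 2), date(2024, 1, 8)),   # 6 days: on time
    ("expedited", date(2024, 1, 2), date(2024, 1, 12)),  # 10 days: late
]
print(timeliness_rate(sample))
```

Note what the inputs are: filing and decision dates for applications that were actually submitted. Someone who abandons the form partway through never appears in this data at all.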

Take Alaska, for example. A few years ago, they had a major backlog—applications for benefits were sitting for months. Eventually, they fixed it because there was a clear feedback loop, lots of attention on the issue, and it became obvious that things were really bad.

But when someone is handed a 10- or 20-page paper application, or they’re going through a 90-page online application with 200 questions, there’s no similar feedback loop.

The frustration people feel (the "ugh" they express, or the fact that they just give up in the middle of applying and say, "Screw it, I’m not doing this") doesn’t get through because there’s no legible feedback loop for that. Frontline workers see it all the time, of course, but that information doesn’t make its way up the chain in any meaningful or structured way.

Lack of application completion rate tracking

Patrick McKenzie: Simple question for you. If there’s a web application of this type, is there actually a dashboard somewhere that senior decision-makers look at to track things like the application completion rate? Or is that something that, technically speaking, just doesn’t exist?

[Patrick notes: Prior to the recorded part of the conversation, Dave and I had a chat about this topic, and I was hoping he’d repeat the observation that—across all government sources—we’re improving on this specific topic, and now as many as 20%ish of forms would be instrumented sufficiently to track abandonment rate in a granular fashion.]

Dave Guarino: It depends. It really varies. This is one of those situations where leadership matters, but leadership alone is insufficient structurally. Anytime someone says, "You just need a good leader," that’s not a real structural solution to the problem.

That said, leadership can make a difference. There are places where leaders who care about this issue ensure it gets measured, create dashboards, track it, and focus on it. But for the most part, the average government benefits web application probably doesn’t have a dashboard that tracks the percentage of people who start and actually finish the application.

One of my long-term advocacy goals in this area is to make sure that’s measured—because it’s practically zero cost to track—and reported up to the federal government. Imagine if the USDA could say, "Look, State X, you have twice the drop-off rate of every other state." That kind of data would be incredibly valuable.

[Patrick notes: I concur with this analysis, having directly observed nationwide metrics causing a step function increase in urgency where people were dying by the tens of thousands without achieving that result.

The state of California didn’t feel like not knowing where the covid vaccine was was a very high priority for senior leadership until Bloomberg (the publication) used federal data to make a league table for efficiency in injecting delivered doses. California’s numbers, at approximately 25% in January of 2021, placed it at 48th in the nation.

That caused consternation in the corridors of power, including a bit of “That’s not actually simply measuring administration of medicine, that’s measuring our ability to produce a distributed count of things. We don’t think being bad at that task is a pressing issue under the circumstances and think Bloomberg is unfair to hit us for it.” (That is not a direct quote.)

A drum I have to beat, which many good government advocates are very quiet about to avoid impugning the character of friends, colleagues, and people they need the future cooperation of: manifest incompetence at doing very simple things is not the sole structural reason for breakage in government services. But widespread manifest incompetence is a factor, and models which suppress knowledge of it for political reasons will fail to allocate the correct number of points to increasing competence levels, and then observe carefully crafted policies fail for want of e.g. counting things, maintaining spreadsheets accurately, or otherwise performing at the capacity level of a bright middle schooler.]

Increasing agency in government agencies

Dave Guarino: Exactly—what the hell’s going on there? That’s a useful and meaningful feedback loop. So, most places don’t have it, some do, and others have it but don’t use it. During the pandemic, I worked with one government entity—I won’t name them—where I had a funny conversation early on. I suggested setting up instrumentation to see where people were dropping off, and to track how many were using mobile versus desktop, since the site wasn’t mobile-responsive at all.

The program administrators on the call were like, "Yeah, that would be great." Then, someone from the technical team, who was actually from a vendor, said, "Oh, we already have that." The program folks were surprised—"Wait, really? You do?" And the vendor confirmed, "Yeah, we’ve always had that data."

So sometimes it’s not that the data doesn’t exist, it’s that no one is actively driving the process to use it. We don’t have many structural forces pushing for the measurement and management of this kind of friction, and that’s a huge opportunity.

Patrick McKenzie: Bureaucracies differ in their capabilities and operations, but sometimes their inefficiencies rhyme. A good example comes from the mini financial crisis in 2023, when many banks, somewhat surprisingly to non-specialists, found it difficult to determine, on a minute-by-minute basis, how much money had left the bank that day. Normally, this isn’t a number anyone cares about [Patrick notes: not at that latency, at any rate], but during a financial crisis, “How many wires were sent between 3 PM and 4 PM?” suddenly matters a lot.

It turned out that the teams running the mobile apps had better technology to estimate day-over-day increases in wire transfers than the teams overseeing the bank’s actual money flow. So, while I don’t want to poke at the government without reflecting on my own industry’s shortcomings, this is something we see frequently.

Government services that are highly visible to retail users—like education, public benefits, etc.—are often administered locally in the U.S., partly for political economy reasons and partly to avoid the Seeing Like a State problem, where a central authority tries to simplify and homogenize all the ways in which Americans are different. The idea is to locate decision-making at the local level, where people are more likely to understand local conditions.

However, this creates systemic challenges. You have 3,000 counties in the U.S., all structurally dissimilar, trying to implement one national SNAP policy under the Department of Agriculture. Each county may have different vendors, different websites, different backends. And when the federal government asks for information on how well SNAP is being implemented, they’re essentially asking 3,000 different offices—each with varying levels of capability and technology—to provide data, if they’ve even collected it.

[Patrick notes: Previously on Complex Systems, Dave Kasten and I discussed how Californian counties range in population from “smaller than my high school” to “about as big as Tokyo.” This directly affects the resourcing available to institutions which sound comparable, like e.g. “county health departments.”]

Patrick McKenzie: Then someone in the middle has to collate all this data into a report for decision-makers so that Congress can assess whether SNAP is being properly implemented across the country.

And empirically, that kind of distributed search for information, followed by making sense of it, is something America doesn’t do very well. I say this as someone who had to lead a nationwide effort to track down the COVID vaccine. After successfully developing the vaccine in a remarkably short time, the U.S. couldn’t locate it because we just shipped it out everywhere.

Each place had it in a different database, and none of those databases talked to each other. So, until someone wrote the SQL query to end all SQL queries, we literally didn’t know where the vaccines were. That continues to be a mind-blowing statement about 2021, but it’s true.

I won’t rant too much about my experience with VaccinateCA, but it definitely deepened my interest in government and public systems.

Starting where you are

You mentioned something before the call about “starting where we are.” Could you expand on that thought a bit?

Dave Guarino: Sure. One of the things I’ve come to spend a lot of time thinking about, and which was particularly counterintuitive, is the realization that working in, around, or alongside government entities often reveals a paradox. Where you think there’s a lot of power and agency, there’s often quite the opposite. You might assume that someone like a director has control over everything and can make things happen at a moment’s notice. But when you get close to them and try to help with one of their priorities, you realize they often feel quite powerless—facing all sorts of rules, structural constraints, budget limitations, and so on.

I like the saying, "start where you are," which I believe originally comes from a Buddhist nun’s book in Berkeley, but I first heard it in a Google Geodata talk a decade ago. It’s durable advice if there ever was any. "Start where you are" means accepting, in a radical way, the constraints you face and asking, "What can I actually do right now to make things a little better?" It’s about taking practical action within the system, without waiting for anyone to approve something, build an API, or go through procurement.

This philosophy is something I apply radically to my work because people outside of government often underestimate just how much relative agency they have to solve problems for those on the inside. For example, when we were working on a simplified SNAP application, the first version we built was a one-page, four-question application: name, address, signature, and date. Federal law says that’s enough to start the clock on someone’s application. We built it as a bare-bones, minimally-styled form that sent a fax to the office. It was clunky—a Bootstrap design running on a single hobby dyno on Heroku, with my personal credit card on the fax machine API—but it worked.

When we showed it to the county director, he saw it as real, working software that solved a problem for him. In one of our conversations, he asked, "When can you start sending applications?" We were like, "Whoa, this is just a minimal prototype." But he understood something important: while the normal application had 200 questions, we didn’t need all of them. The next step in the process was an interview, so really, we just needed contact information. The reality was, the application could be drastically simplified.

Starting where you are helps avoid the trap of thinking, "Let’s just ask for an API; it’ll be easier." What you don’t realize is that for someone in government, that simple request might trigger a six-year procurement process in their mind. But when you can demonstrate a working solution, many people in government are very responsive. They’re used to working within small silos and feeling limited in their ability to make change. If someone comes in and says, "I can handle 99% of this; I just need you to not shut it down," a lot of people are eager to make things better.

[Patrick notes: This is sometimes described as “Ask for forgiveness, not permission.”

A cynical person might also note that government enforcement for high priorities with dedicated agencies is a dollar short and a decade late, so perhaps you can just choose to roll to disbelieve the illusion that there is a competent authority figure capable of stopping you or willing to impose consequences for failure.]

Patrick McKenzie: Yep. VaccinateCA had a very similar experience in the early days. We talked to a few decision-makers in California, and one of them was pretty bearish on the idea of having a map. They said something like, "The people who need the vaccine the most are probably the least competent to use a smartphone to find it. They might live in areas with limited internet access or not even have a smartphone."

[Patrick notes: To not unnecessarily impugn the character of California public health decisionmakers, the full quote was:

It would probably help with public communication to have a map like the one you describe, and we could probably build (and verify) in less time than you would expect, especially given what I read as your sometimes-correct view of the glacial pace of government. However, it comes with a lot of ancillary considerations. For example, it would be very bad to send people to a location for a vaccine only to be turned away because a website was wrong, especially if those people are seniors, or front-line workers who only get limited time off of work. It also doesn’t help folks who don’t have smartphones or lack the technical savvy to understand this kind of information, who, demographically speaking, are more likely to be the ones who need the early vaccine doses anyway.]

Patrick McKenzie: At the time, I thought—and came to believe even more strongly as we operated—that while the ultimate beneficiaries of the software might not be the direct users, others could benefit from it. It turned out some of our most avid users were actually working in county health departments. They didn’t have a simple way to just Ctrl+F for vaccines in their county, and we gave them that ability. And crucially, we didn’t say, "Okay, we’re going to need two years of public commentary to write the RFP, followed by an 18-month bid process, with delivery sometime in 2028." Instead, we just said, "Here’s the URL. Use it if you want; it’s free."

By doing this, we avoided triggering the government’s internal antibodies against waste, graft, and so on. The result? County health departments used our tool directly, and it helped them successfully route a large number of patients to available vaccines.

Dave Guarino: I find that incredibly frustrating because so many objections come from these broad, anecdotal assumptions. The reality is that people’s lives are far more complex than those simplistic arguments suggest. Over the past decade, I’ve heard countless times (though thankfully less often now) that "no one would fill out a long benefits application on their phone." But the truth is, even when sites weren’t mobile responsive, people would pinch, zoom, fill out a field, submit, repeat. People find a way. They use what you put in front of them.

And like you said, there are intermediaries who may act on behalf of someone else—whether it’s a child helping a parent who doesn’t speak English well, or someone at a government office who doesn’t have the right tools but works directly with the people coming in. Those users might not be online themselves, but better online tools would still improve their experience.

The idea that an online system "won’t reach people" is one of the worst myths because it holds us back from improving those online channels. And when those systems are low friction, more people use them. People don’t type out the full URL for a state benefits portal—they’ll search for it in Google first, which presents a huge opportunity for better user experience design.

[Patrick notes: I would have very pointed observations to make on whether government officials consider success delivered through Google to be incentive compatible.

It is difficult to convey the depth of my despair as to our decisionmaking processes without making specific claims about conversations that may or may not have happened between particular government officials and particular tech companies. That would be rude.

It is not considered rude to reprint an email given permission to do so, so here’s Google, talking to VaccinateCA, which was not the government:

The uplift from VaxCA is ~5,000 sites (up from 127 [that Google otherwise had reliable information on]). For all intents and purposes, VaxCA is enabling our launch [of vaccine search] in California, and we greatly appreciate the partnership . . . VaxCA listings are now live on Local Search and Maps.

We prioritized getting the vaccine onto Google because putting the most sought after object in the history of the world onto the thing everyone uses to search seemed like an obviously good idea.]

Patrick McKenzie: I think the perfect definitely shouldn’t be the enemy of the good with these things. Ship online systems, and if your moral calculus centers on those least able to use online systems, remember that simply having them available likely won’t harm those individuals. In fact, it will probably help by enabling their family members, advocates, and community to assist them.

[Patrick notes: It seems like a crazy thing to say, but putatively serious individuals were concerned in 2021 that publishing accurate information about the availability of the vaccine would let people good at getting information outcompete more preferred potential users of the vaccine, and therefore it was justice maximizing to not publish that information. How were those more preferred users supposed to find it? Figuring that out was somebody else’s job.]

Challenges of modernizing legacy systems

Patrick McKenzie: So, ship it, see who comes through the door, and if the composition isn’t what you hoped for, then ship the next thing that adds value. Don’t shoot down a new use case with net positive expected value just because it’s not perfect from day one.

Dave Guarino: One other thing that shouldn’t be overlooked is the capacity aspect. There’s usually a fixed number of staff who can provide hands-on help for people who really can’t manage something in a self-service way online. So, when you increase the number of people who can successfully use the so-called "self-service" channels—which, by the way, I’m not a huge fan of that framing—you free up capacity. If it becomes easier for people to handle things themselves rather than waiting on hold for 30 minutes to talk to a call center, suddenly your call center queues shrink.

That allows you to spend more time helping the people who truly need one-on-one assistance. It’s a more complex, dynamic system than just the binary idea of people either using online tools or not. There are side effects that play into the overall capacity and effectiveness of the system.

Patrick McKenzie: And, oh man, this is fractally complex. You zoom in on any part of it, and there’s infinite complexity within.

One aspect, as someone who’s worked in a call center—that’s how I paid for university and got some experience the hard way—is that as you increase the relative attractiveness of self-service channels, the population that arrives at the human-assisted channel changes.

Instead of representing a cross-section of your general customer or beneficiary base, the people coming through now are the ones having the most severe problems with the self-service channel. This can be acutely frustrating for customer service staff.

Instead of dealing with one severe challenge per shift, it becomes challenge after challenge without any "palate cleansers"—like a straightforward call from a grandma in Oklahoma who never swears at the rep and just has a simple issue with the system.

[Patrick notes: I worked in relatively humane environments where e.g. a colleague in tears after speaking to a customer who had threatened her life would have a number of people attempt to console her, production numbers be damned.

Not all call centers are managed in that fashion.

And thus, an underacknowledged challenge here: the people staffing benefits programs, who are in many ways the primary beneficiaries of the benefits programs (in a way it is rude to acknowledge without making it false), have incentives which occasionally run opposite those of the acknowledged intended beneficiaries. And it’s important to think clearly about what margins are important to us. How many minutes from how many single mothers is worth Samantha enjoying her Tuesday?

Careful: zero does sound like an attractive answer to this question, but people rarely believe Samantha should be replaced with a short shell script if that provably saves beneficiaries minutes. And so now we’re arguing about the price.]

Dave Guarino: I had a funny version of this just the other day with a benefits card. To activate it, I needed to enter a three-digit code, and the issue was that I’d entered an "O" instead of a zero. When I called in, they immediately knew what had happened and said, "Oh, this is exactly what happened." I asked, "Do you get this call a lot?" and they replied, "I get this call constantly."

Somehow, there’s no feedback loop to fix that issue. But like you’re pointing out, in some ways, that call center agent’s life is a lot easier dealing with these simpler cases than if they were only handling extremely complex situations—like overlapping benefits, replacement cards, and all the rest of it.

Patrick McKenzie: Like you said, there’s often a broken feedback loop between frontline employees, the upper echelons of the operations group, and then between operations and the actual decision-makers. One thing that happened repeatedly during my time at Stripe, an organization that’s well-run in many ways, is that we used a lot of internal software built by our engineering teams. The users of that software were often people in operations.

Frequently, when engineers would embed with operations for a day, they’d see someone doing something obviously suboptimal and ask, "Does this happen a lot?" The response was often, "Oh yeah, I deal with this six times a day." And then you’d ask, "How long has this been going on?" and they’d say, "About 18 months, ever since the software broke in this way. But I’ve developed my own workaround."

At that point, the engineer would typically apologize, run back to their desk, and fix it—which worked for us. But this doesn’t work in organizations where the engineers, often part of an outsourced consultancy, never sit with the users. Even when feedback makes it up the chain to the head of operations, you need a culture where that person is empowered, incentivized, and willing to take it to the engineering team and say, "Hey, this is a top priority—let’s fix it immediately."

And sometimes, even in a well-run organization like Stripe, that still didn’t happen.

When senior operations management was asked why they didn’t try to push for changes, the answer was often something like, "Well, I’ve worked in the financial industry for a long time, and operations always gets the short end of the stick. Having fully functional software has never happened in my career. So why waste my points with senior management demanding working software when I have other asks to spend political capital on that have some chance of being satisfied?"

That’s where a phrase I use a lot comes in: the will to have nice things. We need the will to have nice things in government offices too. We should instill a culture where the systems people use are expected to actually work. And when those systems don’t work, management should view it as a problem and take action to fix it. Aspirationally, knock on wood.

Broken feedback loops in government

Dave Guarino: Yeah, and a lot of people are fighting those good fights, and I don’t want to diminish that. But you need that top-level feedback loop to make management think about these issues. That’s often where the gap is. Over time, you get narratives like, "It’s always been bad, it’ll always be bad, so why bother?" And at a certain point, if you want to fix it, you have to deal with the fact that people have maladapted to the system. Even when the system gets fixed, you’re left with people who’ve adapted to the dysfunction, and now they’re the ones creating barriers, even though the material barriers are gone.

That’s why these things are truly complex and distributed. Simply improving the software isn’t always enough. I heard a story the other day about how some operations still rely on green-screen mainframes, and the people using them are incredibly efficient—they know every keystroke to get things done quickly. But if you reimagine it as a shiny new web interface with all the Ajax and drag-and-drop features that bring government tech into the early 2000s, you risk overriding the practical, on-the-ground knowledge that made the old system work. And sometimes, that leads to things going sideways.

Patrick McKenzie: The United States immigration system, especially the green card process, is near and dear to my heart—my wife has one. They replaced the old paper-based system with a modern web application, which seemed like the kind of generic consulting solution where you think, "How bad could it be? It’s just paper to web." But it turned out that the expert staff handling these complex cases frequently relied on comparing multiple pages of paper applications side by side—something that was quick and easy to do physically.

[Patrick notes: See generally the Office of the Inspector General report around page 17 for documentation of this issue, though I think I originally read about it in a news article.]

Patrick McKenzie: On paper, you’d just pull out the three relevant pages from a 500-page application, place them next to each other, and run your finger across to make sure they matched. The new web app didn’t support that. Instead, they had to click into one section, write down a number on paper, then click to another section and do the same. Essentially, they recreated a less secure, handwritten version of the paper application at each worksite, which killed throughput, when computerization was supposed to improve it.

This all comes back to issues of legibility—making the system legible to itself. One reason this is so difficult, as you’ve pointed out, is that we’re trying to push the complexity of not just the SNAP application but a significant portion of the U.S. economy—let’s say 20 to 40 percent—through a very small number of decision-makers. There might be 600 senior decision-makers in the U.S.—members of Congress, governors, the president, cabinet staff, etc. If each of them can devote even 3 percent of their time to one program, that’s a win. But that only equates to about 18 full-time individuals managing the complexity of 1 percent of GDP, or some similarly absurd number.

We have to think about how this complexity is presented to the staff who support these decision-makers and empower their decision-making.

And the answer there isn’t great either. A typical member of Congress has fewer than 10 people on their policy staff, and the typical member of that staff is often, descriptively, a 20-something liberal arts graduate who might have excelled in college debating but probably doesn’t have much experience in things like conversion rate optimization or web applications for benefits programs.

One thing we can do, though, is recognize that these people have a very hard job. But the real question is: what can we, as individuals outside the system, do to take this complex system and help it operate better in line with the values we care about?

Tech's understanding of service design

Dave Guarino: Yeah. One of the things I think about in this context is the relationship between friction and the usage of benefits—specifically, how friction acts as a form of rationing. One thing the tech industry understands innately, part of its core epistemology, is that the design and form of services directly impact the utilization of those services.

I know it sounds a bit obvious when you put it like that, but the important point is that government, up to this point, doesn’t usually have that same mindset by default. There’s no built-in understanding that the way something appears on a form, how simple or quick it is to complete, will affect whether people actually use it.

So I think one thing...

Patrick McKenzie: There is no Paul Graham in government who is screaming at you every day: “Have you talked to a user yet today? Have you talked to a user yet today?”

[Patrick notes: PG probably wouldn’t scream, but counterfactual government PG might.]

Dave Guarino: Though there are probably some excellent program administrators out there asking, "Have you talked to a customer today? Have you talked to a client?" I’ve seen some amazing examples where people completely buck the structural incentives and make things work by hook or by crook. They create that accountability. But structurally, that’s not common. In tech, it often comes from the practical reality that, in a startup, if your onboarding is difficult, people won’t use the product, and then you don’t get revenue, and you go bye-bye. That dynamic doesn’t happen in government.

In fact, depending on the program and its political economy, friction can sometimes be seen as a positive by certain stakeholders. "Oh, fewer people are using this? That’s good." There’s a great book by Don Moynihan and Pam Herd, who I believe are now at the University of Michigan, called Administrative Burden: Policymaking by Other Means. It delves into the academic and political science argument around this, but the bottom line is clear: if we want government systems to care about the friction they impose on users, we need to make them care.

These issues need to be part of the "zero-sum agenda"—the small set of priorities that must be measured alongside other key metrics. It kills me that every benefits web application out there isn’t reporting the percentage of people who start versus those who actually complete the process, along with where drop-offs happen on each screen. That would be a simple and profound way to change the legibility of these systems. And it would basically cost nothing.
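
As a rough illustration of the measurement described above, here is a minimal Python sketch that computes an overall completion rate and per-screen drop-off. Everything in it is an assumption made for illustration: the screen names, the session data, and the `funnel_report` helper are hypothetical, and it assumes the application already logs the furthest screen each session reached.

```python
from collections import Counter

def funnel_report(furthest_screen_by_session, screens):
    """Return (completion_rate, per_screen_rows) for an application funnel.

    furthest_screen_by_session: dict of session id -> last screen reached
        (a hypothetical summary of whatever event log the site keeps).
    screens: ordered screen names; the final one means "submitted".
    Assumes every session reached the first screen.
    """
    stopped_at = Counter(furthest_screen_by_session.values())
    started = len(furthest_screen_by_session)
    completed = stopped_at[screens[-1]]

    rows = []
    reached = started
    for name in screens[:-1]:
        dropped = stopped_at[name]  # sessions that never got past this screen
        rate = dropped / reached if reached else 0.0
        rows.append((name, reached, dropped, rate))
        reached -= dropped  # the rest moved on to the next screen
    completion_rate = completed / started if started else 0.0
    return completion_rate, rows

# Made-up sessions, purely for illustration.
screens = ["start", "household", "income", "submitted"]
sessions = {
    "a": "start", "b": "household", "c": "income", "d": "submitted",
    "e": "submitted", "f": "income", "g": "household", "h": "submitted",
}
completion_rate, rows = funnel_report(sessions, screens)
print(f"completion rate: {completion_rate:.1%}")
for name, reached, dropped, rate in rows:
    print(f"{name}: {dropped} of {reached} stopped here ({rate:.1%})")
```

The same two numbers, reported per state, are what would let a federal agency make the comparison Dave describes ("you have twice the drop-off rate of every other state").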

Issues with improper payments methodology

Patrick McKenzie: Another thing you’ve mentioned to me before is the way formal accounting standards work in these systems. For example, if society establishes a set of rules and we accidentally overpay someone—let’s say $200 more than they were eligible for—we take a $200 hit to our score. If we underpay someone by $50 of the $200 they were eligible for, we take a $50 hit to our score.

But if society establishes a set of rules, and someone is eligible but never completes the application process, we take no hit—because we don’t measure that number.

Dave Guarino: Exactly. This falls under an area of federal policy called improper payments, and it’s one of my two pet issues. Like you just said, if someone applies and is denied inappropriately, or, even worse, can’t complete the application because it’s so full of friction, then really, we should view that as a $200 underpayment if they were eligible for $200 but got nothing. That should be considered just as bad as someone receiving $200 when they were eligible for zero.

But under current improper payments methodology, which applies across all government benefits programs, we don’t measure that. This comes from federal statutes and regulations implemented by OMB. The way improper payments are defined excludes non-payments, which makes sense in one way. If you start with a universe of improper payments, you’re naturally excluding those who didn’t receive payments at all. But that subtle difference has huge consequences—we don’t count the people who didn’t get through the system.
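
The asymmetry can be made concrete with a toy Python model using the dollar figures from this exchange. This is a sketch under stated assumptions, not the actual OMB or SNAP quality control methodology: the function names are hypothetical, and `None` stands in for "no payment was ever made."

```python
def improper_payment_total(cases):
    """Sum payment errors over cases that actually received a payment.

    cases: list of (eligible_amount, paid_amount) pairs, with paid_amount
    set to None when no payment was made (hypothetical encoding).
    """
    total = 0
    for eligible, paid in cases:
        if paid is None:
            continue  # outside the "universe of payments", so never scored
        total += abs(paid - eligible)
    return total

def access_adjusted_total(cases):
    """Same sum, but treat a missing payment as a payment of $0."""
    return sum(abs((paid or 0) - eligible) for eligible, paid in cases)

cases = [
    (0, 200),     # paid $200 while eligible for nothing: $200 error
    (200, 150),   # underpaid by $50: $50 error
    (200, None),  # eligible for $200, never finished applying
]
print(improper_payment_total(cases))  # 250
print(access_adjusted_total(cases))   # 450
```

Under the first function, the third case contributes nothing, which is the gap being described: the person who got zero is invisible to the score.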

Patrick McKenzie: This is a wonky "legibility of the system to itself" kind of topic. I think a lot of people or advocates would hear this and say, "Oh, this is just the United States being stingy to poor people, film at 11." But it’s not simply that. If, due to a math error or similar, someone is paid $20 of the $200 they’re eligible for, that $180 difference would be surfaced in reports, counted as a mark against the program, and tracked over time across the beneficiary population.

But the zeros—the people who don’t get through the application process at all—don’t get counted. This isn’t because someone comes into the office with the goal of dropping as many people off the welfare rolls as possible. It’s simply because we started with the mindset that improper payments are the problem to solve. So, we look at the universe of payments and zoom in on which of those are improper, rather than focusing on the bigger picture of access.

Dave Guarino: Exactly. When you start from the flawed foundation that improper payments are the primary concern, it leads to absurd conclusions. Take it to the extreme: if a benefits program made zero improper payments because zero payments were made, you’d quickly realize something is seriously wrong. A benefits program is meant to distribute benefits.

And you're right—there’s no one sitting in a room, smoking a cigar, and gleefully excluding 100% of underpayments from the improper payments methodology. But structurally, we just don’t account for it. This ties into the Seeing Like a State aspect of it. One of the reasons I've heard for this is that it’s a bit harder, bureaucratically and administratively, to determine what someone was eligible for if they didn’t complete the application process. Plus, it’s more difficult to get in contact with those individuals afterward.

But I firmly believe there are ways to get around this. For example, you could call the person and say, "We’re evaluating whether the agency wrongfully denied you, and if we find you were eligible, we’ll give you all the back pay." I assure you, most people who get that call would gladly comply with the review.

Patrick McKenzie: Or we could use, you know, groundbreaking new technology from social science, like sampling.

[Patrick notes: One might detect that I feel a certain degree of frustration.]

Dave Guarino: Well, that’s the thing—we do use sampling to calculate improper payments. That’s how SNAP conducts quality control for payment errors. And this leads to my other big frustration: SNAP has a structural asymmetry in what it cares about, or rather, what it measures, which shapes what it cares about as a system. It’s not because people don’t care, but because the feedback loops only exist for certain things.

For example, we don’t really measure or report why people were denied benefits. One of the requirements is an interview. Just today, I was helping someone who said, "I didn’t get the call for my interview, I called back, and now I’ve been on hold for two hours." Many of those people end up receiving a denial letter saying, "You didn’t complete your interview," but we don’t measure that.

It would be one thing if we consciously decided, "Well, the interview is still worth it." But we haven’t even done that. In our previous work on building GetCalFresh—a simplified SNAP application, funny enough—we turned out to be one of the few data sources actually measuring why people were denied. And now there are academic papers pointing to our little dataset as evidence of how many applicants were denied for missing an interview.

This highlights another part of the improper payments problem. Unless 100% of denied applicants are truly ineligible, there’s waste in the system. The mission isn’t being accomplished. But we lack the feedback loops necessary to help administrators choose between a decision that would get more eligible people benefits and one that might just make it harder for ineligible people to receive benefits.

Administrators face this dilemma constantly. But because payment accuracy is the primary measure of success, they’re structurally pushed toward one side—unless they make a very intentional decision to do otherwise.

Patrick McKenzie: And the default, for structural reasons, will be to err on the side of making fewer payments. That’s because making fewer payments is the only way their score improves. Making more payments just increases the risk at the margin.

Dave Guarino: Yeah, and it’s even more subtle than that. A frontline worker might want to approve someone, but they might think, "I’ve got two pay stubs, but maybe I should ask for a third one, just to be safe." They could probably approve the person now, but if in the past year one of their cases got sampled and the quality control team flagged them for not asking for that third pay stub, they’d have gotten in trouble. Why? Because quality control is focused on payment accuracy, and that’s what gets reported up.

It’s not that anyone’s being a bad actor. In fact, most workers probably lean toward approving someone because, even cynically, that’s less likely to result in a time-consuming appeal. But the system is set up to push them toward asking for just a little more—to be extra sure of eligibility, to get that additional document.

This subtle systems pressure creates friction. If we want to reduce that friction, we need to intervene at that level. And that’s why these high-level issues, like improper payments, end up being the key. Including 100% underpayments in that measurement would be a simple trick to fix a lot of things.

Someone should really do that.
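
[Illustrative sketch, not from the conversation: the arithmetic behind this point can be made concrete with a toy calculation. The `Case` record, both function names, and the choice of denominators are assumptions made for illustration; the actual SNAP quality control methodology is more involved. The sketch compares an error rate computed only over cases that received a payment with one that also counts eligible households who received nothing as 100% underpayments.]

```python
from dataclasses import dataclass


@dataclass
class Case:
    eligible_amount: float  # what a review finds the household was owed (hypothetical field)
    paid_amount: float      # what was actually paid; 0.0 if denied or never completed


def payment_only_error_rate(cases):
    """Error dollars divided by dollars paid, ignoring cases with no payment."""
    paid = [c for c in cases if c.paid_amount > 0]
    total_paid = sum(c.paid_amount for c in paid)
    if total_paid == 0:
        return 0.0  # no payments means no improper payments, by construction
    errors = sum(abs(c.paid_amount - c.eligible_amount) for c in paid)
    return errors / total_paid


def access_inclusive_error_rate(cases):
    """Error dollars divided by dollars at stake, counting zeros as underpayments."""
    # "Dollars at stake" is an assumed denominator: the larger of owed and paid per case.
    at_stake = sum(max(c.paid_amount, c.eligible_amount) for c in cases)
    if at_stake == 0:
        return 0.0
    errors = sum(abs(c.paid_amount - c.eligible_amount) for c in cases)
    return errors / at_stake


# Eight correct payments, one household paid $20 of $200,
# and two eligible households denied (for example, over a missed interview).
sample = (
    [Case(200.0, 200.0)] * 8
    + [Case(200.0, 20.0)]
    + [Case(200.0, 0.0)] * 2
)

print(f"payment-only:     {payment_only_error_rate(sample):.1%}")      # 11.1%
print(f"access-inclusive: {access_inclusive_error_rate(sample):.1%}")  # 26.4%
```

[In the extreme case described earlier, a program that pays nobody scores a perfect 0% on the first measure and 100% on the second, provided the applicants were in fact eligible.]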

Effective ways to influence policy

Patrick McKenzie: So, being more concrete than just saying "someone should do that," what are the effective ways to operationalize the—sometimes called—posting-to-policy pipeline, where us randos on the internet type things into a web form, and somehow that ends up in the halls of power and actually gets things done?

Dave Guarino: Well, I’ll admit I’m only medium at best when it comes to this, but there are some very smart people who do this much better than I do, and I’m trying to improve. One key thing is understanding what’s legible to the policymaking space. People often talk about "the binder." If you have a binder, a white paper, or an issue paper, that’s often what policymakers want. They don’t want a Substack post or a tweet thread, even though it might start that way.

Patrick McKenzie: There is an enormous benefit in printing out a copy of a blog post and putting the words White Paper on top of it.

[Patrick notes: For similar reasons, I have long suggested that professionals not call their personal sites blogs, and similarly avoid packaging essays to look like they are in the blog format, with very prominent dates and similar. I have a mixed record of taking my own advice here.]

Dave Guarino: A hundred and ten percent. Someone should probably create a language model-based website where you input a tweet storm, and it spits out the DC-circles white paper version of it. That would be a high-leverage intervention—and we could do that in 30 minutes today.

The other thing is, while that might help, raising the salience of these issues is key. I firmly believe there are a lot of non-controversial, extremely niche, in-the-weeds details that just haven’t had a champion pushing for them. That’s why, when I dig into these systems and pull on enough threads, I find the root causes behind things like the worker asking for extra verification or the quality control person flagging them for not asking for it last time. These systems run on inertia.

One important thing to remember is that you can’t assume someone else will point out the irrationality of it or push for change. A lot of these issues are just on cruise control. Status quo bias is incredibly strong. Sometimes just putting it down on paper in a way that others can pick up later is the intergenerational work that needs to be done.

Another thing about the posting-to-policy pipeline is that policy has a very odd shape: nothing’s possible for 20 years, everything’s possible for six months, and then nothing’s possible again for another 20 years. It’s like healthcare policy—people spent 20 years building up to the Affordable Care Act, then there was six to twelve months of horse-trading, and suddenly every idea and white paper was relevant. Then the window closed, and we wait another 20 years.

That’s very different from the tech sector, where you can just start building something now. In policy, you’re waiting for that window. But if you care about an issue, get into the weeds, write the paper, and share it—ask for feedback. You might be surprised to find that you’re the only person pointing it out.

Patrick McKenzie: I think Nat [Friedman] observed the other day that the world is definitely not efficient, even though people often believe it is. When you get into the weeds of Requests for Comments on various federal regulations—even for large parts of the economy—you might see only 25 comments submitted. And those will consist of three industry advocates, two academics who’ve wandered into the field for some reason, three people who’ve "done the reading," and the rest is… not well calibrated.

So, when the United States of America effectively empowers a committee to make the next decision on something, it’s often those eight people in the room—and almost no one else. The reality is, you could be one of those eight people on whatever issue you care about, for the price of what is essentially a jumped-up blog post (just don’t call it a blog post).

Dave Guarino: Yeah.

Patrick McKenzie: So, I think one key takeaway is the simple fact that it is possible to make an impact. Neither of us was bitten by a radioactive spider, yet we’ve had outsized influence on outcomes in states and nations simply by deciding to take up the torch and do relatively easy things that were already in our professional wheelhouses. Someone listening to this can do the same.

Do you have any other thoughts on how we can increase agency within agencies—especially in places where agency seems hardest to cultivate right now?

Dave Guarino: Well, there are a few things I’ve observed from spending time in and around government, including about nine months in the federal government during the pandemic. I was supposed to be there for six months, but it stretched to nine. There’s a lot of current discussion around increasing state capacity—enhancing the government’s ability to actually get things done. I think there are several structural factors at play, but also a lot of rules we could simply get rid of because they don’t pass a cost-benefit analysis.

At a higher level, I’m not sure we have an effective way to systematically and continuously chip away at those unnecessary rules. One that comes up a lot in my community is the Paperwork Reduction Act. I don’t have as strong of an opinion as those who’ve spent a decade fighting it from within, but I did find it interesting that even when I worked on something with a statutory exclusion from the Paperwork Reduction Act, the specter of it still came up. It created a chilling effect, with people worrying that something might not be allowed—even when it technically was.

The bigger issue is that we’ve made government a place where the default is that something is disallowed rather than allowed—even when it aligns with the mission. That creates enormous friction and makes it very difficult to achieve anything. One part of the solution is simply removing barriers that hold us back.

I think there’s plenty to look at in terms of government-wide mandates. Why are we expecting NOAA, NASA, the Department of Defense, Health and Human Services, the VA, and the Department of Transportation to all act the same way? These are government-wide mandates, but I’d guess those agencies have very different domains. A good starting point would be asking: Do these mandates really make sense across all agencies, or should we be excluding certain ones?

That’s one way to remove unnecessary constraints. The other aspect is shifting the focus from compliance to mission. Right now, in many government agencies, it’s easier to get fired for being non-compliant with a policy than for not achieving the mission. Oversight is, to some extent, mission-oriented, but it could be more so. We need more flexibility in how the mission is accomplished. It sounds simple, but it’s not a panacea. However, when you look inside these agencies, so much of the day-to-day conversation is about compliance with procedures rather than "What will help us accomplish the mission more effectively?"

To the extent we can make it easier for internal teams to focus on that, it’s better. Other people have more fine-grained interventions, but mine would be this issue with improper payments methodology. You can’t fulfill the mission of delivering benefits if the structural incentive is to not make payments. That’s always going to hamper progress.

We should think about this for all government missions: What’s the best operationalization of the mission? And if you’re not meeting it, how can we remove constraints instead of adding more? How can we remove hiring and budget constraints? Shouldn’t we be flooding underperforming agencies with resources and removing compliance barriers, not to reward failure but to actually help them improve? Because more rules probably won’t make it easier for them to succeed.

Getting niche policy ideas into circulation

Patrick McKenzie: So, I think our homework for today is writing up the improper payments rule white paper and finding a way to inject it into policy circles. And while "injecting into policy circles" sounds like we need to call K Street and figure out the lobbying apparatus of the U.S., there are more accessible ways. There are blogs—ones known to much of the audience—that are practically part of the U.S. org chart, in the same way the New York Times editorial board is. Simply getting something onto those blogs is a relatively low-friction way to push it into policy conversations.

Alternatively, getting an op-ed into a major metro newspaper is surprisingly effective—and surprisingly accessible—if you can write a compelling 800 words. [Patrick notes: My neighbor recently got an editorial into the newspaper I grew up reading. This is a very effective piece of collateral to bring into a discussion with one’s representative about the same issue.]

Dave Guarino: Yes, and I’ll add two bits of tacit knowledge I’ve picked up. I sometimes post about niche topics on LinkedIn or Twitter, and you’d be surprised how the niche shows up in those spaces. Someone’s writing about this specific issue? "Oh, that’s my niche!" I’m always delighted when I see a random state.gov email subscribe to my Substack. And, of course, slightly worried I might unintentionally upset someone at some point.

But I get excited because it shows how you can reach the right people. Being public is, surprisingly, a form of differential agency. A lot of people working inside government can’t be as public with these issues, so when they can point to something public, it can be very useful internally.

That’s why it’s helpful to find a pet issue, put up a flare, and say, "Hey, everyone—this is where we’re talking about this." You’d be surprised who shows up.

Patrick McKenzie: Mm hmm. Sometimes we can produce the artifacts that an internal agent of change would like to produce but can’t, because doing so would be a career-limiting move. We can simply allow those artifacts to be introduced into a discussion where they wouldn’t otherwise exist.

Dave Guarino: Yes, some people are probably mad at me because I’ve definitely sent government officials Reddit posts where their customers are having trouble with a process. I send it in a kind-hearted way, like, "This might be worth looking into. You’re in leadership, and maybe you don’t have a legible measure of this, but here’s something to consider." I’m sure 30% of people get annoyed by it, but I’d wager a decent chunk of the other 70% are actually grateful. Either they can’t look at Reddit themselves, or it validates a hunch they’ve had but haven’t been able to prove because middle management hasn’t given them the data to make the argument.

Now they can point to that Reddit post and say, "Why is this happening?"

A lot of these problems are hiding in plain sight. If you’re concerned about a government process, look at the ad hoc Facebook groups, Reddit threads, TikToks, and other places where people are talking about their experiences.

Importance of frontline knowledge and user feedback

Patrick McKenzie: These groups were a goldmine during the VaccinateCA days. We’d go to the so-called "vaccine hunter" groups and see collaborations between people who were relatively well-versed in navigating the medical system and those who weren’t. We’d then point this out to decision-makers, saying, "Look, you practically need to be a lawyer to get an appointment right now—maybe fix that."

Even just surfacing this for decision-makers is crucial because they’re often managing by metrics or focusing on the stakeholders who show up to meetings—usually people on a government pay scale, not the actual beneficiaries. So, if they won’t go out into the world and seek the voice of the users, we can do everyone a favor by bringing the voice of users into those conversations.

Dave Guarino: Exactly. Even the best-intentioned people—those who might have started as a social worker, then became a policy analyst, a budget analyst, and eventually the director—experience a kind of knowledge calcification over time. Their understanding of the frontline degrades, and a recency bias sets in. They begin to think that the metrics they now have access to are more reflective of what’s happening on the ground.

But these external voices—like people posting online—can serve as an escape hatch. Yes, some of these stories might be anecdotal and not fully generalizable, but if they keep coming up, maybe your metrics are wrong. It’s like seeing 100% uptime in your monitoring system, but someone sends you a screenshot of a 404 error on your website. Sure, it could be a one-off issue like a DNS problem or bad internet connection, but you should absolutely look into it. And if seven people are posting about it, you really need to dig deeper.

Patrick McKenzie: And this is something we constantly have to force ourselves to do, even in well-operated capitalist institutions. We regularly tell the engineering team, take out your own credit card and go through the ordering process once a quarter—buy the software. Because if you don’t, it’ll be three years before anyone in the building checks that page again, and we’ll never notice that, oh, Japanese Amex cards don’t work anymore—whoops.

That’s feedback that is unlikely to get through the multiple layers of defenses designed to protect senior decision-makers from reading an annoyed email from a customer in Japanese saying, "Why won’t my Amex work?"

Improving government services

Dave Guarino: What you’re saying points to another thing more people should do. Every time you use a public service, do a screen share or record it as a Zoom session. Use a tool to chop it up, or even just take screenshots and post them later. Send them to the person who runs that program, along with feedback on what was good and what was bad.

You’d be shocked—this might be the first time they’ve received that level of granular feedback in years. It’s such an easy thing for anyone to do, and I’d love to see more screenshots of government websites floating around the internet.

Patrick McKenzie: As someone who did this as my stock consulting engagement for well-regarded firms in the U.S. tech industry for over three years, I can almost guarantee that no senior decision maker has ever seen that level of detail. And there are plenty of problems where, if you just put a red circle around something and add a question mark, someone will make a phone call that day. "Why are we doing that? This is clearly broken." So, more red circle annotations and PowerPoint slides will increase value in the world.

Dave Guarino: Exactly. If even a small percentage of people who use public services recorded their experience or took screenshots and shared them, the world would look very different in 10 years. Being able to point to a screenshot with a red circle and a question mark is like comparing a cassette tape to the best possible digital audio—it’s night and day.

Patrick McKenzie: Yep. "The website is hard" invites a 90-minute meeting with no outcome about digital access, the digital divide, and a whole lot of bike-shedding. But "Step 7 is broken for this user population and should be fixed today" invites a small, actionable task that doesn’t require a lengthy stakeholder meeting. It’s incentive-compatible for everyone—no one’s going to say no to fixing the obvious issues.

Anyhow, with that stirring note, thanks so much for coming on today, Dave. And I hope to see the rest of you folks around the internet.
