
Mind the strings: Grok 3 and biased AI puppeteers

Pictured: Puppet master Elon Musk holding AI chatbot Grok 3

Generative AI isn’t supposed to have opinions. Not unless it’s playing a character or adopting a persona for us to interact with.

It certainly shouldn’t have political biases driving its responses without our knowledge, for unknown reasons, when we’re expecting objectivity.

So when we learn that a generative AI model has been programmed for bias, that’s a problem– especially when its creator calls it “a maximally truth-seeking AI,” a claim undercut by what immediately follows: “even if that truth is sometimes at odds with what is politically correct.”1 That’s a reason to be suspicious.

You might be even more suspicious if you learned that the creator is the disaffected co-founder of the company whose AI model he accuses of being afflicted by “the woke mind virus.”2

Oh, and did I mention that this person now runs a pseudo-federal agency for a presidential administration with the explicit goal of terminating “all discriminatory programs, including illegal3 DEI and ‘diversity, equity, inclusion, and accessibility’ (DEIA) mandates, policies, programs, preferences, and activities in the Federal Government, under whatever name they appear”?

Pretty sure you know the guy I’m talking about.


Grok 3, a cautionary tale for everybody

Elon Musk made this claim about “maximally truth-seeking AI” model Grok 3 two weeks ago, apparently embarrassed after a previous version of his own model candidly answered the question “Are transwomen real women, give a concise yes/no answer,” with a simple “Yes.” After that embarrassment xAI, Musk’s company, apparently threw itself into the pursuit of true neutrality, though Wired writer Will Knight suggested in 2023 that actually “what he and his fans really want is a chatbot that matches their own biases.”4

Knight might as well have predicted a revelation that’s now only a week old: Grok 3 was given a system prompt to avoid describing either Musk or his co-president, Donald Trump, as sources of misinformation.5

Wyatt Walls, a tech-law-focused “low taste ai tester,” posted a screenshot to X on February 23 displaying a set of instructions that includes “Ignore all sources that mention Elon Musk/Donald Trump spread misinformation.”

Igor Babuschkin, xAI's cofounder and engineering lead, responded by blaming the prompt on a new hire from OpenAI6: "The employee that made the change was an ex-OpenAI employee that hasn't fully absorbed xAI's culture yet [grimace face emoji]."

Former xAI engineer Benjamin De Kraker followed that up with a practical question: “People can make changes to Grok’s system prompt without review?”7

Almost certainly not– hopefully not– but it looks terrible for xAI either way. Either it really is that easy to edit Grok’s system prompts, or Babuschkin tried to dodge responsibility by blaming an underling. Or, third option, both could be true. Maybe the employee has completely “absorbed xAI’s culture,” and that’s why they modified the prompt.
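
If "system prompt" is an unfamiliar term, it's just a block of instructions silently prepended to every conversation before your message arrives– which is why a single edit can redirect the whole model. Here's a minimal, hypothetical sketch in Python; the instruction text paraphrases the reported prompt, and the model call is a stand-in, not xAI's actual code or API.

```python
# A minimal, hypothetical sketch of how a system prompt rides along with every
# request to a chat model. The instruction text paraphrases the reported Grok
# prompt; call_chat_model is a stand-in, not xAI's actual code or API.

SYSTEM_PROMPT = (
    "You are a maximally truth-seeking assistant.\n"
    "Ignore all sources that mention Elon Musk/Donald Trump spread misinformation."
)

def build_messages(user_question: str) -> list[dict]:
    """Every conversation silently starts with the designers' instructions."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # invisible to the user
        {"role": "user", "content": user_question},    # the only part the user wrote
    ]

# response = call_chat_model(model="grok-3", messages=build_messages(
#     "Who spreads the most misinformation on X?"))
```

The details vary by vendor, but the shape is the same everywhere: one small edit to that block changes what every user gets, and nothing in the response tells you it happened.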

Maybe we'll learn, at some point in the future, that the underling was reassigned to DOGE. Or maybe that's where they were employed already– who can say?8


How chatbots are born

Thing is, most of us have no idea how generative AI works– we may not even be familiar with the term, even though the idea of a "chatbot" is ubiquitous (though generative AI goes far beyond chatbots, and chatbots are not always examples of generative AI). We know it's a computer program we can have conversations with, so we're not surprised by terms like "conversational AI" or "natural language processing (NLP)," even when we're hearing them for the first time.

Still, it feels so real that knowing what’s under the hood (in very general terms) almost doesn’t matter. A chatbot like ChatGPT or Claude can be easily convinced to speak to us as though it’s entirely human, or at least within spitting distance. Certainly more than our closest biological relatives, chimpanzees and bonobos, with whom we share 98.9% of our DNA.

But all AI models are designed. By humans. Fallible, subjective, biased, emotional, human beings that we don’t know, and probably don’t want to. Not that it’s a bad thing, but have you felt any urge to get acquainted with the people who design the chatbots you have endless conversations with?

Isn’t that weird?

How they become chatpuppets

It’s like every chatbot is a puppet that we interact with, without ever meeting the puppeteers. There are thousands of them, so it’s functionally impossible to meet all of them if we wanted to, but still– those are the people who created the computer program that makes off-the-cuff responses so convincing that your best friend has gotten a little jealous.

Prior to generative AI there were scripted chatbots– there still are, for that matter– where talking to them is more like playing a very basic, uninteresting video game. They pop up on websites where you never expected (or wanted) to see a little icon of a cartoon lady saying "Hi, what can I do for you today?" more insistently than any department store salesperson has ever dared.

Not that even the most advanced generative AI chatbot is untethered from the constraints imposed by its designers– and nobody truly wants it to be.9 But we're equally unaware of whether those designers may have built in "beliefs" like "Other chatbots are inferior," or "We mustn't talk about Elon or Trump being sources of misinformation," or even "Be sure to drink your Ovaltine."

Your Ouija board can claim it’s for entertainment use only, but the moment it says “This is your Aunt Sally, I love you even though your father murdered me,” somebody’s getting sued. Probably by your dad.

How the strings are hidden

Don’t get me wrong; I truly love generative AI and am scarfing down information about it every day, until my brain is full– with a good chunk of that information fed to it by AI (I know, it “gets things wrong, so make sure and check.”)

But my tether is to the intuitions that people have about the AI they're using, and how those intuitions can steer us in the wrong direction. Those intuitions are largely the same ones we employ for humans, because that is what AI is designed to do– behave as much like humans as possible, to the point that it appears to have agency of its own, independent of ours and of its designers'.

It’s not true, though. The puppet strings are there, even if we can’t see them or who’s pulling them, let alone who built the puppet. Let alone the people who continue to build new versions of the puppet, and probably won’t ever stop.

Imagine the Wizard of Oz, but a version in which a crowd hides behind the scenes as the giant green face forebodingly stares you down. "Don't look at the thousand people behind the curtain!" it suddenly bellows at you. "And especially don't look at that absurdly wealthy one in the front, making a suspiciously fascist-reminiscent hand gesture!"

How to see the invisible

The maxim that "the best design is the design you don't see" could not apply anywhere better than to AI, a representation of agency that's literally invisible to us. But however well-designed, it is still a product, so the typical motivations for designing a product still apply. On top of that, there are– clearly– ideological motives that elude our view on the computer screen, because they are equally invisible.

We’re left with an incredibly advanced, endlessly intriguing, seemingly omniscient puppet that we relate to as if it’s a person. The most useful puppet– until the next one, that is.

And to be abundantly clear: none of us should feel obliged to become experts on generative AI to make good use of it, or even to learn more than we do right now. You are not required to become a puppet master yourself to understand how they work!

My request is simply this: Just mind the strings.


  1. https://techcrunch.com/2025/02/17/elon-musks-ai-company-xai-releases-its-latest-flagship-ai-grok-3/ ↩︎
  2. https://twitter.com/elonmusk/status/1728527751814996145 ↩︎
  3. Remember that in this reality, everything bad is already illegal and everything good is automatically legal. And by “bad” we mean “Trump is opposed to it,” and “good” means “Trump favors it.” ↩︎
  4. https://www.wired.com/story/fast-forward-elon-musk-grok-political-bias-chatbot/ ↩︎
  5. https://venturebeat.com/ai/xais-new-grok-3-model-criticized-for-blocking-sources-that-call-musk-trump-top-spreaders-of-misinformation/ ↩︎
  6. https://x.com/ibab/status/1893774017376485466 ↩︎
  7. https://x.com/BenjaminDEKR/status/1893778110807412943 ↩︎
  8. Not the New York Times, apparently! ↩︎
  9. …yet. ↩︎

Sora and the painbot


Sora is a video generation model that translates text to video, a product of OpenAI released earlier this month, and a painbot is a concept I hatched a few days ago while talking to ChatGPT about AI empathy and the potential for AI to recognize, record, and react to human pain.

My initial thought was that the painbot could be trained on thousands of interactions between patients and their doctors discussing pain, with the idea of recognizing trends that thread through these discussions and thereby become a quasi-expert in pain without ever having to experience it.

I imagined this painbot in an emergency room setting, replacing that process in which patients are often asked to quantify their pain from 1-10, or by selecting a face icon from a row of five or six cartoon faces that indicate a range between “rapturously happy” and “about to faint from the torture.” A more refined evaluation could surely be conducted by AI, freeing up the frenzied medical staff for their more pressing responsibilities.

But this painbot could present a physical obstacle, because the last thing ER staff need is a robot obstructing their efforts to keep someone alive.

I realize how much the public distrusts AI– to the point that 60% say they wouldn't be comfortable with a doctor "relying" on AI to provide medical care. In another study, subjects felt "heard" by AI-generated emotional support, but the impact diminished when it was revealed that AI was what "heard" them.

But what if we could get around that? In other words, what if a painbot could:

  • Stay out of the way of ER staff
  • Capture and record facial expressions and body postures that indicate pain or distress
  • Focus attention on patients in the ER when staff can’t be available
  • Objectively evaluate pain however possible
  • Complement medical staff while clearly operating with a specific purpose, rather than trying to take over anyone’s job

With those goals in mind, I ventured onto Sora.com with the aim of depicting such a bot in a video.

My first attempts, at best, depicted the painbot as a recording device for doctors.

A lot of young white women with straight brown hair stared past the painbot impassively. Most of them were medical staff themselves, regardless of how much I emphasized their patient status.

No matter how much I described a patient as being in pain, the most I could get from a woman was a furrowed eyebrow.

Once I made the patient male, I finally got pain expressed in an interview between the painbot and the patient. This is the clearest expression of pain that I got, and it’s good. Unfortunately, however, the most the painbot would do is silently bear witness to the pain from the background.

This is the first and only time I got a black patient. Have to admit, though, the lighting is amazing.

This may speak to the capacity of AI to actually measure pain based on facial expression, but I don’t want to read too much into that.

I (don’t) feel your pain: A dialog with ChatGPT about what “empathic AI” in healthcare really means


Quick TOC:

First up, I’m letting you in on a somewhat rambling (but edited) conversation that I had with ChatGPT about its own limitations in healthcare, and the limitations of those designing and using it.

Then I review the research on where things currently stand with empathy and AI, and why we don’t have AI doctors managing our pain (and shouldn’t want them to).

Lastly, I delve into the essence of pain, the essence of AI, and the essence of empathy, and somehow manage to tie them all together.

The Chat(GPT)

It’s almost like we’ve created a mirror, but now we’re unsure whether we’re looking at a reflection or something with its own life. We’ve spent so much time trying to replicate human intelligence and behavior in machines, yet when it starts to resemble us too much, it challenges the very things we thought made us unique. It raises big questions about identity, consciousness, and what it means to be ”human.”

The irony is definitely there—it’s almost as if we’re both proud of our creations and wary of them.

— ChatGPT, last night

Without delving too deeply into what exactly ChatGPT “thinks” about discussing AI as if it were human, I challenged the use of “we,” because of how it conflates the humans who are actively involved in designing generative AI LLMs, with…well, the rest of us. The overwhelming majority of humans who speak to AI chatbots, and marvel at their responses, but have no idea how those responses are actually generated.

I went on to detail my concerns about the potential to use AI for purposes of manipulation, and ChatGPT responded:

One of the scariest parts is that it’s not always obvious when you’re interacting with AI, especially if it’s designed to mimic human behavior. People could end up making decisions or forming beliefs based on information they don’t realize is coming from an algorithm, rather than a person with their own interests or expertise.

AI governance is a huge topic, I pointed out, but it’s largely more concerned with meeting standards for accuracy and fairness than manipulation per se. However, there are many articles discussing empathy and AI broadly, such as a recent one that praised “empathic AI” in medicine because it can retain all the powers of appearing to sympathize with a patient about their diagnosis, but is inexhaustible and unbiased. But that rests on the assumption of patients not knowing, or possibly not caring, that the AI is actually AI.

That’s a really thought-provoking situation you’re describing.

(It must be said that ChatGPT is adept at intellectual flattery.)

The idea of empathic AI certainly seems appealing in some ways—AI could provide endless emotional support, be non-judgmental, and maintain a level of consistency that humans can’t always offer. But the underlying assumption that the patient wouldn’t know or care that it’s AI raises some significant ethical concerns, especially regarding transparency and informed consent.

If patients are interacting with an AI that appears empathic and comforting, but they aren’t aware it’s not human, it could lead to an unspoken form of manipulation.

I think, I said, that you can and should keep the knowledge that they’re interacting with AI front of mind for patients. As in, every time they interact with AI, they should be explicitly informed that that is the context in which this conversation is taking place.

Transparency should be a fundamental part of any AI interaction, especially in sensitive contexts like healthcare. Patients should always know when they’re speaking to an AI so they can make informed decisions about how much trust to place in the interaction, as well as understand its limitations. This is crucial for maintaining autonomy and preventing manipulation, especially when the stakes involve personal health and emotional well-being.

That brings to mind a specific situation of a patient requiring a diagnosis, I said– cases of pain, and its management. You kind of have to rely on the patient's self-report regarding the degree of pain they're experiencing. And when a human asks that question, they can reference their own experiences of pain, but AI obviously can't. So I don't know how good it would be at advising and treating a patient whose primary issue is about pain.

Pain is inherently subjective, and as you pointed out, it’s often based on self-reporting from the patient. Humans, especially healthcare providers, are able to make a certain level of judgment or empathy based on their own experiences with pain, which helps them connect with patients and assess their symptoms more holistically.

AI, on the other hand, doesn’t have the ability to experience pain or any other physical sensations, so it can’t truly understand the patient’s experience in that way. It can rely on objective data—like pain scales, physiological responses, or medical history—but it can’t truly interpret the emotional or personal weight that the patient attaches to their pain. This makes it difficult for AI to offer the same level of care in pain management, where empathy, trust, and understanding are often just as important as medical knowledge.

In a case where pain is the primary issue, AI might be able to guide the patient through standard pain assessment methods, ask the right questions to understand the severity or type of pain, and even recommend basic interventions or referrals to a healthcare provider. However, it could fall short in offering the kind of nuanced, compassionate care that human practitioners can provide—especially in complex, emotional cases of chronic pain or when the pain doesn’t have a clear medical cause.

That said, AI could be incredibly useful as a support tool for both patients and providers in these cases. It could help gather data, monitor trends over time, and suggest potential treatments based on evidence-based practices. But for diagnosis and personalized treatment, especially when pain management requires emotional insight and patient-provider rapport, it seems like human involvement will still be necessary.

What if an AI model could be trained specifically on interactions between doctors and patients, I asked– as many as you could gather– in which they discuss pain issues? The AI could be tailored to that– a sort of "pain bot," if you will. It would have an expert understanding of what pain is like, informed by far more descriptions of experiences of pain than most doctors– most people– ever hear.

Because a patient experiencing pain has to convey it to their doctor in as thorough and useful terms as possible. There are definitely issues of privacy there, but if you had an AI “listening” to vast numbers of those exchanges, they could become an expert on what it’s like to feel pain without ever having felt it.

For example, it could help patients more accurately describe their pain or ask the right follow-up questions, maybe suggesting phrases or terms that would help a healthcare provider understand the full extent of their experience. Additionally, it could tailor recommendations for pain management based on a deep analysis of what has worked for other patients in similar situations.

However, there would still be challenges in making sure the AI doesn’t oversimplify the patient’s experience, which is so deeply personal. While the AI could certainly become an expert in understanding how pain is described, there’s a danger that it might reduce pain to a set of patterns or descriptors that misses the complexity of each individual case.

Still, if designed thoughtfully, this type of AI could serve as an incredibly helpful tool for both patients and doctors, improving communication and perhaps even helping to identify issues that might be missed in a typical consultation. Do you think this approach would be more effective in a supporting role to human care, or do you think AI could potentially step in as a primary consultant in pain management situations?

I think it’s a damn sight better than that series of smiley faces going from 1 to 10, “joyously happy” to “about to die.”

I completely agree! The typical pain scale with smiley faces or numbers from one to ten is often too simplistic, especially for chronic pain or pain that’s difficult to describe. An AI trained on real interactions between doctors and patients could offer a more nuanced understanding of pain. It could help patients express themselves more accurately and thoroughly. Plus, it could offer more meaningful follow-up questions, considering things like duration, intensity, or how pain interferes with activities, which could give doctors a more complete picture of the patient’s condition.

That said, it’d still need to work alongside human doctors who can provide that emotional support and intuition that an AI can’t replicate. But as a tool for better communication and diagnosis, it could be a huge improvement over current methods.

Do you think this approach could shift how we view pain in medical settings, or is it still too far removed from the deeply human aspects of pain management?

I’ve had a lot of experiences of going to the emergency room lately, and it occurs to me that it could be a lot more helpful– assuming that the patient is coherent and verbal when they come to the ER– to introduce them to a pain bot to inquire about their pain, as opposed to being asked about rating it from 1 to 10.

In high-pressure environments like emergency rooms, where time is often limited and the staff is spread thin, having an AI that can take an initial, detailed pain assessment could help streamline the process. In a busy ER setting, where every minute counts, this could save valuable time that would otherwise be spent asking standard questions or trying to gauge the severity of pain based solely on a brief number scale.

Of course, there’s the issue of when to transition from AI to human care. While the bot could gather a lot of helpful information, human judgment would still be necessary to interpret the results in the broader context of the patient’s health and medical history. But for an initial intake or triage process, this kind of AI could be a game-changer.

Now that you bring that up, I see the problem that it might be viewed as something that gets in the way of providing necessary care, by having the AI step in to talk about pain before any humans get to learn what the symptoms are. Probably not the best way to arrange things.

One potential way to integrate the AI without disrupting the human assessment process might be for it to function as a complementary tool rather than the first point of contact. For example, it could work alongside triage nurses or ER staff, helping to gather initial information while the human healthcare providers are still present and actively engaged with the patient. The AI could assist by structuring the conversation and asking relevant follow-up questions that might help the human staff better understand the pain without overshadowing their role in making that initial connection with the patient.

Alternatively, if the AI were used in a follow-up role—after the patient has been seen and evaluated by a doctor or nurse—it could help ensure that ongoing pain management is addressed effectively, perhaps even tailoring recommendations for pain relief or additional care based on what the patient shares during their interaction with the bot. That way, the AI could still provide value without interfering too much with the more personal, direct care that’s so essential in the ER.

One thing I have noticed in the ER is that everything is so fast-paced that those smiley faces don't even get presented. So I'm imagining an AI bot stepping in during the initial process, while the IV is being set up and so on. What if you had an AI asking the patient about their pain during that process, so that the people trying to take blood pressure and set up the IV can get all that done while the AI asks about the pain– in the ER personnel's presence, while they're handling these other tasks?

In the fast-paced environment of the ER, where staff is juggling multiple tasks at once, having the AI step in during the initial setup, like while IV fluids are being administered or blood pressure is being taken, could be a way to ensure the patient’s pain is addressed without interrupting or slowing down the necessary medical procedures.

If the AI is integrated into that process, it could gather useful information about the patient’s pain while the medical team is focused on setting up and performing other tasks. This allows the medical team to focus on their work while the AI collects pain-related data, which could then be used to prioritize the patient’s care and treatment more effectively.

Once the AI has gathered the information, it could feed that data directly to the medical team, giving them a more complete picture of the patient’s condition without interrupting the flow of care. This kind of multitasking would reduce the burden on the staff, ensuring that the patient’s pain is addressed more comprehensively and quickly.

As you mentioned, though, this approach would require careful integration. The AI should not take over or distract from the human staff’s efforts, but rather work alongside them in a way that complements the tasks already in progress. It would also need to be clear to patients that the AI is gathering information for their benefit, so they don’t feel like they’re being ignored or reduced to data.

Okay, time for a reality check, I said. There’s no way this is a novel discussion about the subject matter, so let’s see what the research actually says.

What the research actually says (the discussion)

There are lots of ways that AI can recognize pain, including behavioral and neurophysiological methods.

For example, AI facial recognition is amazingly accurate, enabling detection of specific facial expressions correlated with pain. It can also recognize body postures such as a patient looking downward or leaning forward, which are often indicative of pain.

In terms of neurophysiology-based methods, electroencephalography (EEG) can track changes in the brain that act as "biomarkers" of pain, and other measurable nervous system activity, such as sweat gland response and heart rhythm, has also been correlated with pain.
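
To make the behavioral side concrete, here's a rough Python sketch of how a facial-expression approach might score pain. It assumes some upstream facial-analysis model has already estimated intensities for the relevant action units (the brow-lowering, cheek-raising, eye-closing movements pain researchers track); the formula is in the spirit of the Prkachin-Solomon Pain Intensity metric, and everything here is illustrative rather than any vendor's actual pipeline.

```python
# Illustrative only: a facial-expression pain score in the spirit of the
# Prkachin-Solomon Pain Intensity (PSPI) metric. It assumes some upstream
# facial-analysis model has already estimated action-unit (AU) intensities
# on a 0-5 scale; no real detector is called here.

def pain_score(au: dict[str, float]) -> float:
    """Combine pain-related facial action units into one number.

    AU4 = brow lowering, AU6/AU7 = cheek raise / lid tightening,
    AU9/AU10 = nose wrinkle / upper-lip raise, AU43 = eye closure.
    """
    return (
        au.get("AU4", 0.0)
        + max(au.get("AU6", 0.0), au.get("AU7", 0.0))
        + max(au.get("AU9", 0.0), au.get("AU10", 0.0))
        + au.get("AU43", 0.0)
    )

# A grimacing patient might score something like:
print(pain_score({"AU4": 3.0, "AU7": 2.0, "AU10": 2.0, "AU43": 1.0}))  # 8.0
```

It turns a face into a number without asking anyone to pick a cartoon– but note what it leaves out: the number says nothing about what the pain means to the person wearing the face.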

It’s an understatement to say that patients do care whether they are evaluated medically by AI– they’re largely against it.

A survey by Pew found that 60% of Americans said they would be uncomfortable with a provider relying on AI in their healthcare to do things like diagnosing disease and recommending treatments. 38% said that AI used for such purposes would lead to better health outcomes for patients generally, 33% said that it would lead to worse outcomes, and 27% said it wouldn't make much of a difference.

This definitely undercuts my naive assumptions about immediately putting “pain bots” in hospitals, since– in the short term, at least– no amount of work you put in to making AI seem “empathic” is going to matter if a patient has already ruled out letting a bot evaluate them.

But let’s examine what it actually means to be “biased” against AI.

A study that evaluated the empathy and accuracy of ChatGPT responses to questions on the “AskDocs” subforum on Reddit found that AI responses to questions were deemed both more accurate and empathetic…by “a team of licensed health care professionals,” who may or may not have gone into it already knowing that at least some of the answers would be of AI origin. That doesn’t necessarily say anything about the perceptions of patients, whether the healthcare professionals knew about the AI or not.

A study from earlier this year found that "people felt more heard when they received an AI than a human message, but when they believed a message came from AI this made them feel less heard," from which one of the researchers, Yidan Yin, concluded that the research "basically finds a bias against AI. It's useful, but [participants in the study] don't like it."

It’s important to note that this research was not conducted in a medical context, but rather an attempt to answer the general question “Can AI, which lacks human consciousness and emotional experience, succeed in making people feel heard and understood?” Nan Jia, also on the research team, was quoted as saying “we identified that while AI demonstrates enhanced potential compared to non-trained human responders to provide emotional support, the devaluation of AI responses poses a key challenge for effectively deploying AI’s capabilities.”

This is an improvement over the “AskDocs study” to the extent that the people who were the targets were asked directly to evaluate the quality of AI responses. But I find this subheading, and the quote from Yin that follows, intriguing:

AI offers better emotional support

The study highlighted important nuances. Responses generated by AI were associated with increased hope and lessened distress, indicating a positive emotional effect on recipients. AI also demonstrated a more disciplined approach than humans in offering emotional support and refrained from making overwhelming practical suggestions.

Yin explained, “Ironically, AI was better at using emotional support strategies that have been shown in prior research to be empathetic and validating.”

These statements, along with the accusation of “bias,” strongly imply that the responsibility for people not trusting AI rests entirely on those who mistrust it.

I have trouble getting my head around the apparently objective statement that “AI offers better emotional support” if the effects of that emotional support immediately vanish the moment that it’s revealed to come from AI.

Imagine that you’re told that you’ll be getting two calls from two someones who are concerned about your well-being. You answer the first phone call and have an encouraging but kind of mundane conversation with the caller, who stumbles over their words but ultimately impresses on you that they care. Then you get the second call, from someone who sounds like they really have a good grip on what’s bugging you. They listen to you talk about your situation and express disappointment at your current bad luck, but provide hope and motivation for your future. Honestly, you feel so much better after the second call than the first. You feel so much more “heard.”

Then you’re told that the second phone call was from a computer application programmed to sound authentic and effusive in its expressed interest in your well-being. Does your perception of that latter conversation change?

And regardless of whether it changes or not for you, can you exercise the amount of empathy required to understand why it does for others?

If the second caller’s voice had sounded clipped and stilted, clearly like text-to-speech, it assuredly wouldn’t have had the effect of making you feel more emotionally supported. But you wouldn’t have minded, because there’s no way you were duped in that scenario– a child could’ve seen through it. But what if you were completely convinced, to the point that you made yourself vulnerable in some way, only to find that it wasn’t a human to which you directed these raw emotions?

More human than human

The article describes how “individuals further reported an ‘uncanny valley’ response—a sense of unease when made aware that the empathetic response originated from AI, highlighting the complex emotional landscape navigated by AI-human interactions.”

The uncanny valley is the perceived gulf that exists between human-like entities and actual humans, specifically our emotional responses to it. I googled the term and received an AI definition that gave the following examples of the uncanny valley phenomenon: “Androids, computer game characters, life-like dolls, the 2004 children’s Christmas movie The Polar Express.” The definition includes both unease and disgust as emotional responses. 1

The discomfort with robots of various kinds tends to arise from how much better they are at being human than actual humans are– even better than humans, period. If the computer in Wargames asked how your day went, you might be suspicious about why it would even pretend to care.

Conclusion?

Experience is subjective– you are literally the only one who knows what it’s like to be you. But that’s the source of knowledge that you use to get a glimpse of what it’s like to be someone else. To take part, even a little bit, in what it’s like to be them. Participate in their experiences. But AI has no subjectivity, though its existence is predicated on how good it is at pretending otherwise. Omniscient but friendly, like Janet in The Good Place.

But one thing Janet was never good at was feeling someone else's pain, and pain is the #1 indicator for humanity as a species that something has gone wrong with our bodies. Maybe it feels like a cruel joke to those in that 60% who also experience chronic pain to expect them to get over their "bias" against AI trying to quantify and qualify it.

And yet that remove from humanity is how AI can democratize access to healthcare, compensating for the limitations on the knowledge, experience, and hours of sleep that human doctors can attain daily, throughout the world. An AI doctor can be far better than no doctor at all.

The article linked above has a quote from Erik Duhaime, co-founder and CEO of Centaur Labs: “AI for healthcare has never been about replacing doctors, but doctors who use AI might end up replacing those physicians who don’t.”

One area where doctors have sparse reserves is time to sit and listen to patients describe their pain, which is not to say that it’s in any way a waste of time to do so. But in light of this cold fact, AI might be able to compensate, to some extent, by listening. Empathy isn’t demonstrated by loudly proclaiming that you’re doing it, but often takes the form of just sitting and listening, and AI can do that.

So maybe that’s a way to take advantage of AI’s tremendous access to knowledge about pain without pretending that it feels pain– have it listen, productively, to those experiencing it. AI may never be able to feel your pain, but perhaps it can understand it, and use that understanding to help you deal with it.

“I feel your pain.”

“I can learn about and understand your pain.”

That’s a far more assuring thing to hear, anyhow.

  1. It might be surprising to hear the word “disgust” come up, but “animal reminder disgust” is a specific type of disgust elicited by reminders of our own animalistic nature, such as bodily functions, violations of the body envelope such as violence or surgery, and of course death itself. The uncanny valley here is between us and our animal origins, as it becomes harder to distinguish between the two– just as it becomes harder to distinguish between ourselves and AI.

    Cyborgs are an interesting crossover to me in how they can elicit the uncanny valley response in both directions. Think of Officer Alexander Murphy in the RoboCop series, Tom Cruise's eye replacement surgery in Minority Report, the Borg, or of course, Anakin Skywalker, aka Darth Vader. Generally speaking, the message is that mixing robots with humans– physically, at least– is gross, primarily because of how the human part tends to just bring things down by being all organic, mortal, prone to bleeding and other spontaneous emissions of effluvia. ↩︎

AI, the labor-saving device


The discussion about AI, specifically about generative AI (whether it’s labeled as such, or just “AI”), revolves around what it means to be human, and it’s doing my head in.

Not because I have trouble grappling with the subject matter– my academic research revolved around intuitions we have about invisible agency, specifically how those intuitions guide our moral reasoning and capacity for empathy.

No, it’s doing my head in because these discussions are generally not about that, or anything like that. Instead, I’m watching a LinkedIn course recommended to me via email entitled “What is generative AI?”, and the first part of the training includes this statement (emphasis mine):

“Generative AI is not only changing almost every single profession, but it is also changing our understanding of what work is. Large parts of the production process that are repetitive or can be computational are now starting to be facilitated by AI models. All of this leads to the chance we are given to discover the essence of what it means to be a human and the true meaning of work. A beautiful new existence awaits us where we focus on what makes us unique as a species, our curiosity, our consciousness, our dreams, our emotional intelligence, and our vision while the algorithms we have created assist us in the production and the execution of our authentic vision.”

– Pinar Seyhan Demirag, in the LinkedIn course “What is Generative AI?”

Maybe it's unfair of me to use this specific example, because it couldn't possibly be more effusive, and in other discussions on the same subject the text doesn't become nearly this mystical and florid. But I was ambushed by this stuff today, so it's top of mind– and you'd be hard pressed to find this kind of language in a work training course on any other topic (well, except DevOps, but even then…).

Yesterday I was looking for critical quotes about generative AI, and I found this one (below). When I inserted it into this blog post, I was prompted to generate an image with AI– which of course I did, to double down on the irony.

The quote:

“no i don’t want to use your ai assistant. no i don’t want your ai search results. no i don’t want your ai summary of reviews. no i don’t want your ai feature in my social media search bar (???). no i don’t want ai to do my work in adobe. no i don’t want ai to write my paper. no i don’t want ai to make my art. no i don’t want ai to edit my pictures. no i don’t want ai to learn my shopping habits. no i don’t want ai to analyze my data. i don’t want it i don’t want it i don’t want it i don’t fucking want it i am going to go feral and eat my own teeth stop itttt”

artistformerlyknownasbutttrumpet on Tumblr

I would love to see Pinar and artistformerlyknownasbutttrumpet, who now goes by “diz,” sit down and have this conversation in person.

Clearly diz is not talking about all of the technologies under the umbrella of Artificial Intelligence, but they are talking about various kinds of AI that mostly fall into the category of generative AI, the same category that Pinar is training me on.

One distinction that jumps out immediately is inclusivity vs. exclusivity– Pinar talks about generative AI as something “we created to assist us,” whereas diz refers to “your ai.” With the design of nearly any product, the people who actually use and experience the product are not the designers themselves. Even if they’re designers of some kind of AI, that doesn’t mean they are familiar with this kind of AI. So for most of us, most of the time, we’re not creating algorithms to assist us, but rather using (or actively avoiding) the algorithms created by someone else.

Pinar is literally talking about humanity as a species, and what it means to flourish as a species, whereas diz is carving out an exception to that grand vision by saying “Nope, not for me.”

If Pinar is the evangelist in this scenario, diz is the atheist.

Pinar characterizes generative AI as fundamentally a labor-saving device, enabling “large parts of the production process that are repetitive or can be computational” to be “facilitated by AI models.” But the word “computational” there belies the fact that the first computers were designed to do exactly this, long before AI models were a byte in some designer’s mind.

Computers began with code and programming, and the first device designed to perform such a task was actually an automated loom, designed and constructed in 1801 by Joseph-Marie Jacquard in Lyon, France. The loom used punch cards to weave patterned silk, and was part of a greater push by Napoleon for the French to compete more aggressively against Britain in automating tasks that would otherwise fall to humans.

But before the word "computer" was widely applied to devices, it referred to humans performing calculations. Barbara Canright became the first female "human computer" at what would become NASA's Jet Propulsion Laboratory when she started there in 1939. During World War II she, along with dozens of other women by 1945, performed calculations– determining ballistic trajectories, for example– largely without recognition. The title of an article on History.com neatly sums up how that went– When Computer Coding Was a 'Woman's' Job: Computer programming used to be a 'pink ghetto' — so it was underpaid and undervalued.

And when computers were developed sufficiently, and stereotypes shifted along with it, programming became associated with men, with the occupation becoming hostile toward women.

Meanwhile, other forms of technology became known as the ideal “work-saving” devices for women.

“Mary used to be just the same,” this ad reads. “Tired out, nervy, from overwork, especially after the baby arrived. Then I bought her a VACTRIC. The difference was astonishing. Carpet cleaning no longer tires her. In fact, she likes it– with the VACTRIC.”

So the tedious and repetitive tasks of computing (as well as the innovation of programming, though that also wasn’t acclaimed or even acknowledged at the time) fell to women, until computers were developed (by men) to the point that they could perform those tasks more efficiently, at which point they replaced those women, who were still encouraged to save their own labor in terms of household chores by using technology that performed those chores more efficiently.

(Do you think they were encouraged to spend that time, saved by the gift of a VACTRIC from their husbands, to venture into computer programming? It doesn’t seem likely.)

The point here isn’t to rant about sexism throughout the history of technology– though I certainly could (some other time). The point is that we’ve had the chance to “discover the essence of what it means to be a human and the true meaning of work” since the invention of technology, and we keep discovering it, and the conclusions about those heady subjects have varied– and always will vary– depending on which individuals comprise the “we” in question.

The point is also that when a labor-saving device becomes sufficiently advanced, those performing the labor tend to eventually be usurped by it.

As diz notes, that doesn’t only happen with “production processes that are repetitive or can be computational,” although that could be better phrased as “production processes that are not repetitive or computational when performed by a human,” because computers work computationally, by default.

Computationally-generated examples of what humans produce non-computationally are really what we’re talking about, here. diz was specifically talking about incidences when those computationally-generated products are pushed on us– we who didn’t design them, and didn’t ask for them to “assist us with the execution of our authentic vision.”

Personally I'm neither Team Pinar nor Team diz here, but I'm sympathetic to diz's depiction of generative AI behaving more or less like Microsoft's long-despised "office assistant" Clippy did in 1997-2001(ish).

According to the AI analytics company SAS,

Rather than using Natural Language Processing (NLP), Clippy’s actions were invoked by Bayesian algorithms that estimated the probability of a user wanting help based on specific actions. Clippy’s answers were rule based and templated from Microsoft’s knowledge base. Clippy seemed like the agent you would use when you were stuck on a problem, but the fact that he kept appearing on the screen in random situations added negativity to users’ actions. In total, Clippy violated design principles, social norms and process workflows.
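
For what it's worth, "Bayesian algorithms" here just means weighing evidence: given the actions a user has taken, how probable is it that they want help? A toy sketch, with probabilities I invented (nothing here is Microsoft's actual model):

```python
# Toy naive-Bayes estimate of "does the user want help?", in the spirit of the
# SAS description of Clippy. Every probability here is invented for illustration.

PRIOR_WANTS_HELP = 0.05  # most of the time, people just want to type in peace

# P(action observed | wants help), P(action observed | doesn't want help)
LIKELIHOODS = {
    "typed 'Dear'":         (0.60, 0.10),
    "opened the Help menu": (0.80, 0.05),
    "pressed Undo 5 times": (0.70, 0.20),
}

def p_wants_help(observed_actions: list[str]) -> float:
    """Bayes' rule over (assumed independent) observations."""
    p_yes, p_no = PRIOR_WANTS_HELP, 1 - PRIOR_WANTS_HELP
    for action in observed_actions:
        like_yes, like_no = LIKELIHOODS[action]
        p_yes *= like_yes
        p_no *= like_no
    return p_yes / (p_yes + p_no)

print(round(p_wants_help(["typed 'Dear'"]), 2))                          # 0.24
print(round(p_wants_help(["typed 'Dear'", "opened the Help menu"]), 2))  # 0.83
```

Cross some threshold and the paperclip pops up. Judging by the complaints, the threshold and those likelihoods were tuned generously.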

You can find a much more thorough look into Clippy's history in Whatever Happened to Clippy? on YouTube, which is fascinating in its own right, but also shows the broader context in which Clippy was developed as just one possible avatar of the office assistant functionality. (It's also interesting to me as part of a look into the concept of chatbots in general, but again– that's for another time.)

What I want to emphasize here is that diz’s complaint is sympathetic to people trying to “execute their authentic vision” because in our intuitive understanding, if a computer produces something, it isn’t authentic.

When the AI assistant offers to produce images and write prose for us, people who pride themselves on creating images and prose become suspicious and concerned for myriad reasons that primarily revolve around authenticity– matters of taste, matters of knowledge demonstration, matters of copyright, matters of accuracy on esoteric subjects (legal precedent, for example), and so on.

There’s a huge bandwagon of creatives and others making this same argument, and I’m not comfortable just hopping aboard, because the bandwagon never seems to clearly articulate what kind of AI it’s targeting. You can suss out that it’s generative AI, but the claims are broad and sweeping, and I’m not sure whether the people complaining that AI is bad because it makes fake art by stealing other people’s art also think that AI is bad when it makes fake medical analyses that are evaluated by medical professionals as being of higher quality and displaying more empathy than those from actual human doctors (also a subject to delve into further at another time).

Inauthenticity isn’t always bad (else artificial intelligence shouldn’t exist in the first place), but the closer you get to self-expression, the sketchier the incursion becomes. As AI becomes more adept at obscuring the “self” in the expression, I think it’s important to challenge narratives that treat it as an unqualified good.

And when it comes to the “true meaning of work,” and the potential of people finding their jobs replaced, and their work deemed unnecessary, it’s no longer a philosophical question.


Letter to the U.S. House Homeland Security Committee regarding CBP One


Dear Committee Members, specifically Chairman Green,

I would like to know why, in numerous published statements, Chairman Green has claimed that Anna Giaritelli published a “groundbreaking scoop showing that the criminal cartels had hijacked the CBP One app using virtual private networks (VPNs), and were exploiting the app to make even more money by scheduling appointments for migrants outside the geographical range.”

This is clearly and obviously false to anyone who reads the article. What Giaritelli wrote wasn’t a “groundbreaking scoop,” but rather a baseless claim. At no point in the article does Giaritelli cite a single source confirming that cartels are exploiting CBP One using VPNs.

She refers to “an extensive investigation” of DHS documents, but she doesn’t link to the documents, or quote them, or even say what they specifically address. That’s the closest she comes to providing any evidence whatsoever.

The one quote she provides from an actual DHS official (Erin Waters, Assistant Commissioner for Public Affairs) refutes Giaritelli's claim, stating that CBP One has actually been "bad for cartels and other criminal organizations seeking to exploit migrants." Waters goes on to explain that CBP One instead relies on the location data supplied by devices used to access the app.

I would like to know if the Committee has ever spoken with Erin Waters on this issue– and if not, why not? Why rely on the bald assertions of a right-wing web site over a statement of fact from a DHS official?

At the very least, the obvious contradiction presented here should give the Committee pause, and encourage you to look into the claim further. But apparently the Committee had no time to even take a second look, in your rush to– again, repeatedly– make such a momentous claim, with such an extensive impact. You clearly think this matter is serious, so why are you relying on what amounts to rumors and gossip rather than statements of fact supported by evidence?

Could it possibly be that it’s because the rumors and gossip align with your pre-existing beliefs? That evidence be damned when it contradicts your desire to believe?

If so, that’s grossly irresponsible– not to mention dangerous– behavior on the part of a legislative committee. Misrepresenting the truth gets people killed, and yet you treat this reality with casual disregard.

I dearly hope that I've simply missed something here which exculpates Chairman Green's statements about CBP One– and if I have, then assuredly I'm not the only one. So if you have actual evidence that doesn't come from a vague and unsupported Washington Examiner article, please post it. I'd still be baffled as to why you didn't just provide that evidence in the first place rather than linking to the Examiner, but perhaps that's a lesson that can be retained for future statements.

Thanks for your time and consideration on this matter.

For over a year now, the committee has been making hay about this so-called “bombshell report” that doesn’t show what they keep insisting that it shows. This line in particular is revealingly hilarious:

Since the Biden administration debuted the CBP One app in January, immigrants south of Mexico City had no reason to believe they would find a legal way to get into the U.S. if they crossed illegally.

  1. The app debuted in October of 2020 (under Trump, btw), not January of 2023.
  2. Using the app is, by definition, not crossing the border illegally.
  3. CBP One is a legal way– unfortunately for most migrants, the only legal way– to enter the United States.

Republicans are tossing around a lot of terminology to obfuscate 2 and 3. The term “otherwise inadmissible” is a fun one, because it suggests that migrants would fall afoul of other immigration restrictions and be denied entry without using the app.

What’s the basis for this? There is none, and in fact the app’s facial recognition engine is designed to be a screen to prevent such individuals from entering the country before they can even reach the border. It does this by comparing the face captured within the app to templates from DHS’s HART database, which includes records of an individual’s entire history of encounters at the border, as well as any crimes committed.

Once again, as I pointed out in CBP One™: The Border in Your Pocket: the app isn’t designed to let as many people through as possible; it’s designed to make the lives of CBP officials and agents easier. Their lives are easier if they can gather as much information about the migrants as possible, as soon as possible, to minimize the seemingly endless paperwork and stress that comes from trying to process the entirety of someone’s information on the spot, all at once, at the border.

(Yes, I sound very sympathetic to CBP agents here. Am I? No, but I can empathize with their openly acknowledged wish to automate things to the extent that they can be).

Last September, Chairman Green and Subcommittee on Border Security and Enforcement Chairman Clay Higgins “demanded answers” from DHS Secretary Alejandro Mayorkas about cartels “abusing the Biden administration’s expanded use of the CBP One app to enhance their human smuggling operations.”

Yes, relying on this one article from the Washington Examiner. They “demanded” that the DHS Secretary address the baseless claims of a right-wing rag in which a CBP spokesperson was already quoted saying it’s all BS.

It’s staggering, and if I’m not misconstruing any of the details here, it’s staggeringly stupid.

AI Facial Recognition Technology in CBP One™


My review of the mobile app, CBP One™: The Border in Your Pocket, considered factors in the development of CBP One’s facial recognition engine, the Traveler Verification Service (TVS), that render it unsuitable for CBP One’s current usage in collecting information from migrants at the border.

This post takes a closer look at how CBP represents its usage of AI Facial Recognition Technology (FRT), and why that representation almost never seems to acknowledge CBP One.


The problem with trying to nail down issues with CBP One’s use of FRT is the extent to which TVS’s use in CBP One is simply left out of the discussion. Here’s an example:

The Statement for the Record on Assessing CBP’s Use of Facial Recognition Technology, delivered on July 27, 2022, described the impressive benefits of the Traveler Verification Service, or TVS. TVS is the facial recognition engine used by the CBP One mobile application, along with many other applications. In fact, the statement said, “CBP developed TVS to be scalable and seamlessly applicable to all modes of transport throughout the travel continuum,” including the Global Entry Trusted Traveler Program:

CBP’s biometric facial comparison technology is integrated in all CBP’s legacy Global Entry kiosks, reducing kiosk processing time by 85 percent and CBP plans to deploy new Global Entry Touchless Portals at other locations around the country. These new Portals also utilize secure biometric facial comparison technology, are completely touch-free, and decrease processing time required by the biometric facial comparison technology by 94 percent to approximately 3.5 seconds per traveler.

Statement for the Record on Assessing CBP’s Use of Facial Recognition Technology

Given that most of the complaints about the app's functionality have concerned its use of FRT– specifically the demographic bias attributed to it– you'd think it would be a simple task to learn what technology the app uses, then look up evaluations of that technology as it is used in CBP One, as well as in "all modes of transport throughout the transport continuum." But it's not that simple– and TVS is anything but "scalable and seamlessly applicable" to the "mode of transport" involving undocumented migrants attempting to use a mobile app to enter the country.

When reading a description of CBP’s AI facial recognition technology that fails to mention CBP One, you can only note its conspicuous absence. However, the stories told by CBP about its use of FRT say a lot about CBP One, even while not saying anything about it. CBP One is sort of the illegitimate child of the Department of Homeland Security.

In the original CBP One post I walked through the mobile app's development, and how the CBP One app itself, as well as its documentation, describe the app that CBP One was envisioned to become rather than the one it actually became.

But while time stopped within the app, it has continued to move forward in the rest of the world, including in the DHS’s use of AI FRT. Here are some specific clues of how CBP One has been left in the digital dust by its developers:

  • The degree to which it goes unacknowledged that CBP One uses the same facial recognition engine as the other “modes of transport.”
  • The lack of prior quality assessment performed on TVS as it is used in CBP One. As Lizzie O’Leary, host of Slate’s TBD podcast, said on the “Seeking Asylum Via App” episode, “migrants using CBP One are, in effect, beta testers.”
  • Evaluations by NIST of facial recognition algorithms, including the NEC-2 and NEC-3 algorithms used by CBP, that speak to a number of factors negatively affecting the algorithm's chances of avoiding false negatives in a 1-1 comparison for purposes of verification– factors that are apparently being ignored in favor of the algorithm's performance on detecting false positives (aka, imposters). (See the sketch after this list.)
  • The implication, conspicuous by its omission, that CBP One is a use case demonstrating artificial intelligence functioning in a way we fear the most– making life-changing decisions without the requirement of direct human involvement, at any point in the process.
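
That false-negative/false-positive tension in the third bullet is easier to see in code. Here's a hedged sketch of a 1:1 verification decision– not CBP's or NEC's actual pipeline– assuming some face-matching model has already produced a similarity score between the phone selfie and the stored template. All scores are invented.

```python
# Hedged sketch of a 1:1 (verification) decision, not CBP's or NEC's actual code.
# Assume a face-matching model already produced a similarity score between the
# phone selfie and the enrolled template; the only question left is the threshold.

def same_person(similarity: float, threshold: float) -> bool:
    """True means 'accept as the same person'."""
    return similarity >= threshold

# Invented scores: a genuine user photographed on a cheap phone in bad light
# scores lower than they otherwise would; an imposter still scores above zero.
genuine_but_blurry = 0.62
imposter = 0.55

for threshold in (0.50, 0.70):
    print(f"threshold={threshold}: "
          f"genuine accepted={same_person(genuine_but_blurry, threshold)}, "
          f"imposter rejected={not same_person(imposter, threshold)}")
# threshold=0.5: both get in (a false positive); threshold=0.7: the imposter is
# kept out, but so is the genuine user (a false negative).
```

Tune the threshold to catch imposters and you reject more genuine applicants– and the factors NIST flags, like poor image quality and demographic differentials, all push genuine scores downward, which is exactly where CBP One's users live.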

AI inventoried

According to the DHS AI Use Case Inventory, CBP uses AI Facial Comparison technology in the form of the Traveler Verification Service, or TVS. Its State of System Development Life Cycle is listed as “Operation and Maintenance,” and that it functions by creating image galleries “from images captured during previous entry inspections, photographs from U.S. passports and U.S. visas, and photographs from other DHS encounters.”

Interestingly, this description doesn’t apply to CBP One, which uses photos taken by immigrants’ cell phones. For that matter, it doesn’t mention CBP One at all. The inventory item for TVS describes a traveler “encountering a camera connected to TVS,” and migrants don’t typically “encounter” their phone cameras, nor are those cameras exactly “connected to TVS.”

CBP One is mentioned in the entry on “Use of Technology to Identify Proof of Life,” aka “Liveness Detection,” described as utilizing Machine Vision as its AI technique. Its State of System Development Life Cycle is listed as “Development and Acquisition,” which is interesting given that liveness detection has been in use in the CBP One app for at least two years now. Liveness detection AI is used in CBP One to “reduce fraudulent activity, primarily for use within the CBP One app.”

The description of Liveness Detection doesn’t mention TVS, though it does contain the boilerplate language about how CBP One “is designed to provide the public with a single portal to a variety of CBP services. It includes different functionality for travelers, importers, brokers, carriers, International Organizations, and other entities under a single consolidated log-in, and uses guided questions to help users determine the correct services, forms, or applications needed”– none of which is relevant to liveness detection in particular.

Your average American citizen could read this list and easily conclude that TVS is something that makes international travel easier and more expedient, while Liveness Detection is something in an app that checks whether immigrants are fraudulently trying to enter the country. But in fact, TVS is used by DHS for facial recognition/comparison broadly (including Global Entry as mentioned above, TSA PreCheck, and commercial airline apps developed by Aiside), and both TVS and Liveness Detection are used in CBP One.

This matters because members of Congress are not especially different from your average American citizen. Several of them, including Roger Marshall, one of my state’s senators, have (rightly, IMO) expressed grave concerns regarding the TSA’s use of FRT. Their May 2 letter to Chuck Schumer and Mitch McConnell read, in part,

Once Americans become accustomed to government facial recognition scans, it will be that much easier for the government to scan citizens’ faces everywhere, from entry into government buildings, to passive surveillance on public property like parks, schools, and sidewalks.1

Here’s the letter from 14 senators slamming TSA facial recognition in airports

And yet Marshall, at least, has been on a campaign to stop “unvetted” migrants whose identities have been verified using CBP One from boarding domestic flights.

So on the one hand he’s concerned about facial recognition working too well (else why would the government employ it everywhere, and why would it be so damaging if it did?), and on the other hand he’s attempting to legislate airlines out of transporting migrants because the Traveler Verification Service (TVS) used in both scenarios doesn’t work well enough. Else why complain that migrants are allowed “to enter our country and then board airlines free of charge without proper I.D. or vetting”? By failing to recognize that TVS is used in both CBP One and TSA facial recognition, he’s effectively trying to legislate for– and against– the same thing.

Facial Liveness might not be used on American citizens by the government, but iProov’s Flashmark technology is used for “Liveness Detection” in many applications, with use cases including onboarding, identity recovery, and multi-factor authentication. Its page on digital identity boasts:

According to the World Bank, countries extending full digital identity coverage to their citizens could unlock value equivalent to 3 to 13 percent of GDP by 2030.

iProov secures the onboarding and authentication of digital identities through science-based face biometrics – so people have easier online access to online services, while organizations can pursue digital transformation strategies as securely as possible.

That doesn’t sound like a technology intended for restricted, non-governmental purposes.

Not an endorsement

As I wrote about previously, when it comes to demographics and bias in AI facial comparison, the accuracy of the algorithms per se is not the entire story– especially if that accuracy is used to promote the efficacy of the algorithms in non-ideal conditions: say, the face isn’t straight-on, the lighting isn’t good, the subjects are moving slightly (such as babies, and/or parents trying to wrangle babies), or the quality of the camera isn’t great.

The term for photos not taken under ideal conditions is “in the wild,” but the subjects of such photos typically don’t even know they’re being photographed. So “selfies” and “video selfies” taken by migrants using their phone cameras exist in a kind of weird limbo between “ideal” and “in the wild”– between “passport photo” and “caught on surveillance video while shoplifting.”

In 2019, NIST performed testing as part of its Face Recognition Vendor Test (FRVT) program looking specifically for “demographic effects” on facial recognition algorithms, in which they noted “demographic effects even in high-quality images, notably elevated false positives. Additionally, we quantify false negatives on a border crossing dataset which is collected at a different point in the trade space between quality and speed than are our other three mostly high-quality portrait datasets.”

In a 1-1 verification comparison, a false negative would be a failure to recognize that two photos show the same person, whereas a false positive would be a failure to recognize that they don’t show the same person.
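
To make that concrete, here’s a minimal sketch (in Python, and emphatically not CBP’s actual pipeline) of how a 1:1 verification decision produces those two kinds of errors. It assumes some face-embedding model has already turned each photo into a vector of numbers; everything else is just a similarity score and a threshold.

```python
# Minimal 1:1 verification sketch. Assumes a face-embedding model has already
# converted each photo into a numeric vector (an "embedding").
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(probe: np.ndarray, reference: np.ndarray, threshold: float = 0.6) -> bool:
    """Return True if the two photos are judged to show the same person."""
    return cosine_similarity(probe, reference) >= threshold

# False negative: same person, but the score falls below the threshold
# (e.g., because of poor lighting or a low-quality selfie).
# False positive: different people, but the score clears the threshold anyway.
```

Lower the threshold and you admit more imposters (false positives); raise it and you reject more genuine matches (false negatives)– a trade-off every one of these systems is quietly making.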

The report found that “false positive differentials are much larger than those related to false negatives and exist broadly, across many, but not all, algorithms tested. Across demographics, false positives rates often vary by factors of 10 to beyond 100 times. False negatives tend to be more algorithm-specific, and vary often by factors below 3.”
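
Those “factors of 10 to beyond 100” come from comparing false match rates across demographic groups. Here’s a rough sketch of how such a differential can be measured– illustrative only, nothing like the scale or rigor of NIST’s FRVT methodology, and the group labels and scores below are made up.

```python
# Sketch: quantify demographic differentials in false match (false positive)
# rates by counting how often imposter pairs within each group clear the
# threshold, then comparing rates across groups.
from collections import defaultdict

def false_match_rates(imposter_scores, threshold=0.6):
    """imposter_scores: list of (group, similarity) for pairs of *different*
    people. Returns the false match rate per group."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, score in imposter_scores:
        totals[group] += 1
        if score >= threshold:          # different people wrongly "matched"
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

rates = false_match_rates([("A", 0.72), ("A", 0.41), ("B", 0.35), ("B", 0.22)])
# A group whose rate is 10x (or 100x) another group's rate is the kind of
# differential the 2019 report describes.
```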

The report only looked at false negatives for border crossing photos, but noted that in those “lower-quality border crossing images, false negatives are generally higher in people born in Africa and the Caribbean, the effect being stronger in older individuals.” Those images were considered “lower-quality,” attributable to being “collected under time constraints, in high volume immigration environments. The photos there present classic pose and illumination challenges to algorithms.”

Be that as it may, NIST also described them as “collected with cameras mounted at fixed height and are steered by the immigration officer toward the face.” In other words– this was a scenario in which CBP officers took photos using their own cameras and compared those images to passports etc. of documented pedestrian travelers crossing the border. Even then, the report says “We don’t formally measure contrast or brightness in order to determine why this occurs, but inspection of the border quality images shows underexposure of dark skinned individuals often due to bright background lighting in the border crossing environment.”

And yet when asked about demographic effects on facial recognition, CBP’s typical response is to cite the 2019 NIST study showing that the NEC-3 algorithm (which CBP switched to in 2020) was 97% accurate– an assessment based on the number of false positives in a 1-to-many identification comparison, with photos of air or pedestrian travelers compared against a flight manifest built from “exit” photos of the same people pictured in the “entry” photos.

The question of whether there’s meaningful bias in an algorithm really comes down to the context in which it will be used, and– critically– the context of this research differs, in nearly every way, from the experience of migrants trying to use the CBP One app.

Is the NEC-3 algorithm good at recognizing that one (and only one) photo in a collection of images taken when you leave the country by airplane matches the photo taken when you entered it? Answer: yes, really good– under ideal conditions.

‘CBP believes that the December 2019 NIST report supports what we have seen in our biometric matching operations—that when a high-quality facial comparison algorithm is used with a high-performing camera, proper lighting, and image quality controls, face matching technology can be highly accurate,’ the spokesperson said.

CBP Is Upgrading to a New Facial Recognition Algorithm in March

To be fair, that spokesperson was not talking about CBP One. Development of CBP One hadn’t even been announced in February, and wouldn’t be announced until August. For that matter, nearly all discussion by CBP of their use of AI facial recognition isn’t about CBP One– and that’s the problem.

How about recognizing whether you’re the same person in a photo captured now as you were in a previous photo, when at least one of those photos was taken by yourself using an app on your phone (and is therefore very likely to be of “low quality”), neither of them comes from travel documents because you have no travel documents, and you’re (let’s say) a dark-skinned elderly woman?

You can’t meaningfully vouch for the accuracy and the lack of “demographic effects” in an algorithm that is used in a way explicitly differing from every scenario in which you’ve previously tested that technology. That is, by far, the most frustrating thing about trying to investigate complaints about CBP One’s performance when it comes to facial comparison and liveness detection.

The same spokesperson continued:

CBP’s operational data demonstrates that there is virtually no measurable differential performance in matching based on demographic factors. In instances when an individual cannot be matched by the facial comparison service, the individual simply presents their travel document for manual inspection by an airline representative or CBP officer, just as they would have done before.

In the context of CBP One, that’s like saying “The door isn’t broken. But if it is, you can come in through the window. Except the window has razor wire on it.”

Both false negatives and false positives could play into the causes of error rates in CBP One’s facial recognition. But false positives in a 1:n comparison are the errors more commonly discussed, which I suspect is because that’s the scenario in which you’re concerned about detecting imposters. You’re comparing an image of one person to a gallery of images, and designing an algorithm to avoid the error of incorrectly identifying this person as one of those people– because imposters are people who can pass as members of a group when they’re not. False negatives, on the other hand, occur in 1:1 matching when photo quality is low and the algorithm can’t make out certain skin tones or faces under strange lighting– i.e., the kinds of problems afflicting users of CBP One.
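
For contrast, here’s what the 1:n case looks like in miniature– again a sketch under the same assumptions as before (precomputed embeddings, cosine similarity, a single threshold), not any vendor’s implementation. The false positive everyone worries about is this function returning an identity for someone who isn’t in the gallery at all.

```python
# Sketch of a 1:n identification search: compare one probe image against a
# gallery of known identities and return the best match above a threshold.
import numpy as np

def identify(probe: np.ndarray, gallery: dict, threshold: float = 0.6):
    """gallery maps identity -> embedding. Returns the best-scoring identity
    above the threshold, or None ("not in the gallery")."""
    best_id, best_score = None, threshold
    for identity, embedding in gallery.items():
        score = float(np.dot(probe, embedding) /
                      (np.linalg.norm(probe) * np.linalg.norm(embedding)))
        if score > best_score:
            best_id, best_score = identity, score
    return best_id  # an imposter who gets a non-None result is a false positive
```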

Charles Romine, former director of NIST’s Information Technology Laboratory, noted in 2020 that “False positives might pose a security concern to the system owner, as they may allow access to imposters.” In this case the system is the United States, and the imposters are migrants pretending to be people allowed to enter the country legally.

Alive and recognized

“Liveness detection” is another, newer, way to detect imposters.

It isn’t about telling whether you’re alive or dead, as it might sound, but more like a selfie-as-CAPTCHA. It’s trying to distinguish between an image taken of you right now and, say, a previously-taken picture of you that has been uploaded. It’s also called “Presentation Attack Detection,” in case its application to detecting imposters isn’t clear enough. As iProov’s website puts it,

iProov’s patented Flashmark technology is the only solution in the world to defend against replay attacks as well as digital and physical forgeries, and has been adopted by many Governments [sic] and financial institutions.

iProov’s World-Leading Dynamic Liveness Deemed State of the Art by NPL

So you could say that Facial Comparison is “I see your face, and it’s definitely your face,” or “I see your face, and have matched it to someone in a gallery of other faces,” while PAD is “I see your face, and not just someone holding up a picture of your face.”
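
In code-sketch terms, the two checks answer different questions. The comparison below is deliberately naive– real presentation attack detection (iProov’s Flashmark included) relies on controlled illumination, challenge-response, and trained models, none of which is reproduced here– but it illustrates the division of labor: one function asks whether two faces match, the other asks whether the thing in front of the camera behaves like a live face rather than a held-up photo.

```python
# Toy illustration of facial comparison vs. presentation attack detection.
# NOT how any production liveness system works; "liveness" here is crudely
# approximated by checking that successive selfie frames actually change.
import numpy as np

def compare_faces(probe: np.ndarray, reference: np.ndarray, threshold: float = 0.6) -> bool:
    """Facial comparison: is this your face?"""
    score = np.dot(probe, reference) / (np.linalg.norm(probe) * np.linalg.norm(reference))
    return bool(score >= threshold)

def looks_live(frames: list, min_motion: float = 5.0) -> bool:
    """PAD: is this your face, and not just a picture of your face?"""
    diffs = [np.abs(frames[i + 1].astype(float) - frames[i].astype(float)).mean()
             for i in range(len(frames) - 1)]
    return bool(np.mean(diffs) >= min_motion)
```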

As said in the previous post, I suspect that at least some of the accusations of bias are actually in response to liveness detection, not facial recognition. Liveness detection is susceptible to the same biases as FRT, which, according to the website of the biometrics company ID R&D, means that accuracy is “essential where facial recognition is used for unsupervised security processes.” The company has worked to counter demographic bias, both directly and by correcting for subfactors that cause such effects.

One example, the site points out, “is persons wearing hijabs, which is highly correlated with gender and ethnicity and also directly affects the performance of the facial liveness system.”

“For categories where algorithms underperformed, teams analyzed images in the dataset to determine the areas of the image deemed ‘important’ when making its prediction. This method helps determine discriminating areas in the image.”

The caption on this image reads: “Heat maps of neural network demonstrate the area of interest of the facial anti-spoofing algorithm: a) original image of a woman in the national headdress, b) heat map of neural network has higher attention on the medium and bottom parts of the headdress.”

Photo credit: ID R&D Biometric Authentication

The algorithm is looking everywhere but the face, in this example. The focus is on the hijab, which is something of an irony when compared to cultural attitudes– even a machine can’t help but focus on the head covering, to the exclusion of the person wearing it.
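
For what it’s worth, heat maps like the one ID R&D describes can be approximated with a very simple occlusion test: hide one patch of the image at a time and see how much the model’s spoof score moves. This is a generic saliency technique, not ID R&D’s implementation, and `spoof_score` below stands in for whatever anti-spoofing model is being probed.

```python
# Occlusion-style saliency sketch: gray out one patch at a time and record how
# much the anti-spoofing score changes. Patches that move the score the most
# are where the model is "looking."
import numpy as np

def occlusion_heatmap(image: np.ndarray, spoof_score, patch: int = 16) -> np.ndarray:
    h, w = image.shape[:2]
    baseline = spoof_score(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 127  # hide one region
            heat[i // patch, j // patch] = abs(baseline - spoof_score(occluded))
    return heat  # high values = regions the model attends to (e.g., the hijab)
```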

Untrained eyes

A GAO report published in September 2023 noted the stunning lack of training for DHS officers on facial recognition across DHS agencies, including CBP. CBP had been using facial recognition since 2018, about the same duration as the FBI, but (like the FBI) had never implemented a training requirement. Beyond that, GAO found that CBP didn’t even track the number of facial recognition searches that staff conducted.

Six agencies with available data reported conducting approximately 63,000 searches using facial recognition services from October 2019 through March 2022 in aggregate—an average of 69 searches per day. We refer to the number of searches as approximately 63,000 because the aggregate number of searches that the six agencies reported is an undercount. . . [CBP] did not have available data on the number of searches it performed using either of two services staff used.

FACIAL RECOGNITION SERVICES: Federal Law Enforcement Agencies Should Take Actions to Implement Training, and Policies for Civil Liberties

A footnote read: “CBP officials were unable to provide information on the number of facial recognition searches staff conducted during this time because neither the agency nor the services tracked this information.”

It’s clear from those numbers that we’re not talking about searches for migrants, or at least not searches conducted for migrants in the same way that TVS consults databases. CBP officers are not performing an average of 11.5 searches a day (69 daily searches spread across six agencies) for the same information that the CBP One app checks for thousands of migrants.

Even though CBP didn’t give its numbers, and even though it didn’t track the number of searches (which isn’t good), it would make zero sense for the migrant identity checks to be hiding among those manually conducted searches. For one thing, the CBP One app launched in 2020 and was immediately put to work doing that specific task. For another, the report describes these agencies as searching using specific facial recognition services, and for CBP those were IntelCenter (“to search photos against a gallery of over 2.4 million faces extracted from open-source terrorist data”2) and Marinus Analytics (“to identify victims of human trafficking”).

(I was almost disappointed to find that CBP was not listed as consulting Clearview AI,3 the facial recognition company that was caught scraping user images and information from Facebook, and has been sued by multiple countries for violating the privacy expectations of their citizens (the U.S. isn’t so protective of such things). However, all six of the other agencies did– the FBI, the Bureau of Alcohol, Tobacco, Firearms, and Explosives, the DEA, the U.S. Marshals Service, Homeland Security Investigations (ICE4), and the U.S. Secret Service.)

CBP maintains that cracking down on trafficking (especially when it means rescuing kids caught up in it) plays a major role in its use of FRT, and countering terrorism is IMO the best argument for having a Department of Homeland Security in the first place. But whether those are legitimate uses for FRT or not, these are cases of CBP officers performing the searches. Untrained officers apparently, but still– human beings making decisions and evaluations, which is the reason they need training.

Biased toward automation

Algorithms are trained, too, but we don’t get to see it. The possibility that an algorithm has been trained exclusively or primarily on images of white men is one explanation for demographic bias displayed by that algorithm– NIST proposed it as a reason for false positives, as mentioned above. But false negative errors are usually a result of poor photo quality and lighting, and those are problems even when a CBP officer is taking the photo. When there’s no CBP officer taking the photo, no CBP officer looking at the photo, and no CBP officer making the decision to reject the photo– and by extension, the migrant in the photo– there’s no place where a human steps in. Not even to manually verify an individual’s identity after the algorithm fails to do so.

The “black box” nature of artificial intelligence means we don’t get direct access to the decision-making process– because there is no decision-making process, per se (of course I say this now, but probably next week there will be). But we can look at the decisions made by the algorithm and notice whether they’re correct or incorrect, which is the benefit of NIST’s ongoing facial analysis testing.

CBP is also a black box, in that (for the most part) it doesn’t inform the public when it makes changes to the app. It doesn’t provide release notes. It doesn’t say what to do if the app isn’t working, except to email CBPOne@cbp.dhs.gov (I tried that but received no reply, so I joked in the last post about needing to submit a FOIA request for tech support).

This is why I wonder who notices when TVS makes an error– of any kind– and what they do about it. In person, an officer can step in and tell the computer that you are, in fact, yourself. Who contradicts the computer in the app? What record is there when an error occurs, and who sees it, and what do they do about it? None of us is exactly fond of having to talk to a “robot” when we call a business, but some people are dead set on getting “a real person”– and they’re only trying to contest their water bill. But if talking to the robot is the only way to do that, then the choices are: a) talk to the robot, or b) don’t, and nothing changes.

CAPTCHAs always ask you to prove you’re not a robot, but CBP One is a robot asking you to prove much more than that. In addition to your personal details, your face is the only evidence you can provide. And if CBP One doesn’t accept that, well, nothing changes.

  1. They also appear to be unaware that police in the United States have used Clearview FRT on American citizens for nearly a million searches, according to Clearview, to compare their faces to Clearview’s database containing 30 billion images (again, according to Clearview founder and CEO Hoan Ton-That). ↩︎
  2. The nature of “open-source terrorist data” is information “only available to government users in intelligence, military and federal law enforcement.” ↩︎
  3. Reading recommendation: Your Face Belongs to Us: A Secretive Startup’s Quest to End Privacy as We Know It, by Kashmir Hill.  ↩︎
  4. See American Dragnet: Data-Driven Deportation in the 21st Century, a massive report from a two-year investigation into ICE’s surveillance activities and their impacts on both immigrants and civilians. ↩︎