LLMorphism: Do We See Ourselves in the Statistical Mirror?
Exploring the philosophical and sociological implications of humans increasingly viewing their own minds through the lens of large language model capabilities.

The uncanny echo between our linguistic output and the sophisticated prose generated by Large Language Models (LLMs) is blurring the lines of self-perception. This isn’t just about the practical applications of AI; it’s a subtle, profound shift in how we understand our own minds, our intelligence, and what it fundamentally means to be human. We are witnessing, and perhaps participating in, a phenomenon we can call LLMorphism: the emergent tendency for humans to increasingly view themselves through the lens of language model capabilities.
For decades, artificial intelligence was a theoretical pursuit, a distant horizon. Now, with LLMs like GPT-4 and its successors, AI is no longer an abstract concept but an intimate, interactive presence in our daily lives. They draft our emails, summarize our documents, and even help us brainstorm creative ideas. This ubiquitous interaction, coupled with the astonishing fluency and coherence of LLM-generated text, is prompting a disconcerting introspection. If a machine can produce text that is virtually indistinguishable from human writing, what, then, is unique about our own cognitive processes?
This essay delves into the philosophical and sociological implications of LLMorphism, arguing that while LLMs are powerful tools for understanding and manipulating language, they are fundamentally not models of the human mind. Our perceived resemblance to these models is a biased inference, a fascinating but ultimately misleading reflection that risks devaluing the very essence of human cognition.
At their core, LLMs are sophisticated statistical engines. Their remarkable ability to generate coherent and contextually relevant text stems from a deep analysis of vast corpora of human language. The Transformer architecture, the bedrock of modern LLMs, excels at identifying and replicating intricate patterns within this data. These models predict the next “token” (a word or sub-word unit) based on the preceding sequence, drawing on probabilities learned during their extensive training.
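To make that loop concrete, here is a deliberately tiny sketch in Python: a bigram model that counts which token follows which in a small corpus, then generates text by repeatedly sampling the next token from those learned counts. It is nothing like a Transformer in scale or sophistication, and the corpus is invented for illustration, but the generation loop, predict a token, append it, predict again, has the same basic shape.

```python
# A toy "next-token predictor": learn bigram transition counts from a tiny
# corpus, then generate text by sampling each next token from those counts.
# Real LLMs condition on long contexts with a Transformer; this only
# illustrates the shape of the predict-and-append loop.
import random
from collections import Counter, defaultdict

corpus = ("the apple falls because gravity pulls the apple toward the ground "
          "and the apple rests on the ground").split()

# "Training": count how often each token follows each preceding token.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def predict_next(token):
    """Sample the next token in proportion to the learned transition counts."""
    counts = transitions[token]
    if not counts:  # token was never seen with a successor during "training"
        return None
    candidates, weights = zip(*counts.items())
    return random.choices(candidates, weights=weights)[0]

# "Generation": feed the model's own output back in as the new context.
token, generated = "the", ["the"]
for _ in range(10):
    token = predict_next(token)
    if token is None:
        break
    generated.append(token)

print(" ".join(generated))
```

The output is fluent-looking recombination of the training text, which is exactly the point: nothing in the loop requires a model of the world, only learned statistics over sequences.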
This process, while immensely effective for language generation, starkly contrasts with human cognition. Humans don’t merely predict tokens; we possess an internal model of the world, built through continuous, multimodal sensory input and embodied interaction. Our understanding of “meaning” is not solely derived from statistical co-occurrence but from a rich tapestry of experience, causality, and subjective qualia.
Consider the difference between an LLM “understanding” the concept of “gravity” and a human grasping it. An LLM can articulate the laws of physics, discuss Isaac Newton, and even generate poetic descriptions of falling apples, all based on patterns in its training data. However, it has never felt the pull of gravity, nor has it experienced the consequences of a fall. Its knowledge is purely symbolic, devoid of the grounding that comes from embodiment and direct interaction with the physical world.
Furthermore, LLM “memory” is not akin to human episodic recall. When we ask an LLM to remember something from a previous conversation, we are typically injecting that context back into its prompt. There is no persistent, internalized recollection of past interactions that shapes its ongoing cognitive state. This contrasts with human memory, which is a dynamic, reconstructive process that deeply influences our present thoughts and future actions. Whatever durable “learning” an LLM exhibits is confined to the training phase; its weights do not undergo the continuous, lifelong adaptation in response to new experiences that biological neural structures do.
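A minimal sketch makes the point about memory concrete. In a typical chat setup, the application, not the model, stores the conversation and re-sends the whole history with each request; the call_model function below is a hypothetical stand-in for any chat-completion API, not a specific vendor’s interface.

```python
# Sketch of how LLM "memory" usually works in practice: the surrounding
# application stores the transcript and injects it back into every prompt.
# Nothing persists inside the model between calls. `call_model` is a
# hypothetical placeholder, not a real API.
from typing import Dict, List

def call_model(messages: List[Dict[str, str]]) -> str:
    """Placeholder for a model call; just reports how much context it was given."""
    return f"(reply conditioned on {len(messages)} messages of injected context)"

history: List[Dict[str, str]] = []

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # The only "memory" is this list, re-sent in full on every turn.
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("My name is Ada."))
print(chat("What is my name?"))   # answerable only because the history was re-sent
```

Delete the history list and the “recollection” vanishes; the model itself carried none of it between calls.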
The ecosystem surrounding LLMs reflects this debate. Online forums buzz with discussions comparing LLMs to “stochastic parrots” versus nascent forms of “true intelligence.” The prevailing sentiment, and I would argue the more accurate one, highlights the fundamental differences. Human intelligence is characterized by the continuous integration of multimodal data (sight, sound, touch, smell, taste), the constant modification of neural pathways, and a rich internal world of motivations, goals, and emotions, all of which are absent in current LLM architectures. The concern is that by anthropomorphizing these statistical models, we risk devaluing genuine human expertise and fostering a societal “deskilling” effect, in which reliance on AI diminishes our own cognitive faculties.
A significant aspect of LLMorphism is the projection of human-like agency and intentionality onto LLMs. When an LLM provides a seemingly insightful answer or offers a creative suggestion, it’s easy to attribute intent and purpose to its output. However, this is a classic case of projecting our own cognitive frameworks onto a system that operates on entirely different principles.
LLMs, by design, lack intrinsic goals, motivations, or a sense of self. They do not “want” to be helpful, nor do they “desire” to produce accurate information. Their output is the result of sampling from probability distributions learned during training, a sophisticated form of pattern completion. While APIs can enable LLMs to interact with external tools or participate in agentic frameworks (such as AutoGPT or HuggingGPT), this is orchestrated behavior, a programmed sequence of actions rather than genuine, self-directed volition. The “agent” is not the LLM itself but the architecture around the LLM, which leverages its linguistic capabilities as one component.
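That division of labor is easier to see in code. In the sketch below, the “agent” is an ordinary loop that inspects the model’s text output, decides whether to run a tool, and decides when to stop. fake_llm, the message format, and the single-entry tool registry are hypothetical stand-ins rather than any framework’s actual API; the point is only that the apparent volition lives in the orchestration code, not in the model.

```python
# Sketch of an "agentic" loop under simplified assumptions: the model emits
# only text, while tool execution, looping, and stopping criteria all live in
# the surrounding orchestration code.

def fake_llm(prompt: str) -> str:
    """Stand-in for a model call; returns either a tool request or a final answer."""
    if "TOOL_RESULT" not in prompt:
        return "CALL_TOOL: add 2 3"
    return "FINAL_ANSWER: 5"

TOOLS = {"add": lambda a, b: str(int(a) + int(b))}

def run_agent(task: str, max_steps: int = 5) -> str:
    prompt = task
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        if reply.startswith("FINAL_ANSWER:"):
            return reply.removeprefix("FINAL_ANSWER:").strip()
        # The orchestrator, not the model, chooses to execute the tool and loop again.
        _, name, *args = reply.split()
        result = TOOLS[name](*args)
        prompt = f"{task}\nTOOL_RESULT: {result}"
    return "no answer within step budget"

print(run_agent("What is 2 + 3?"))
```

Strip away the loop and the tool registry and what remains is a text generator; the goal-directedness belongs to the scaffolding, which is exactly the distinction the next paragraph draws.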
This distinction is crucial for understanding why LLMorphism is misleading. Humans, by contrast, are driven by a complex interplay of biological imperatives, learned values, and personal aspirations. Our agency is rooted in our lived experiences, our capacity for abstract thought, and our ability to form intentions and pursue them. The “learned scripts” that some argue humans also operate on are embedded within a biological framework that allows for emergent creativity, ethical reasoning, and profound emotional depth, qualities that remain elusive for current LLMs.
The critical perspective here is that LLMorphism is a biased inference driven by the output similarity, not by any underlying cognitive architectural similarity. The linguistic fluency of an LLM can mask its fundamental differences from human cognition. It’s like mistaking a highly detailed painting of a bird for a living creature. The resemblance is striking, but the underlying nature is entirely different.
When we critically examine the operational logic of LLMs, their limitations become starkly apparent. They exhibit a form of “brittle logic,” meaning their reasoning can collapse unexpectedly when presented with slight variations or novel contexts not well-represented in their training data. They struggle with true generalization in a way that humans, even young children, do not.
One of the most significant divergences is in the realm of critical thinking and genuine reasoning. LLMs excel at surface-level pattern recognition. They can identify correlations and reproduce arguments, but they often lack the capacity for deep causal understanding or the ability to reason beyond the explicit patterns in their training data. This is why they sometimes “confabulate”—generating plausible-sounding but ultimately fabricated information—rather than “hallucinate” in a way that might suggest a flawed internal model attempting to make sense of novel input. Confabulation in LLMs is more about stitching together plausible linguistic sequences that fit the statistical profile, even if they lack factual grounding.
Moreover, LLMs are fundamentally tethered to their text-based input. They cannot process or generate information in the rich, multimodal ways that humans do. A child learns about the world not just through words, but through seeing, hearing, touching, tasting, and interacting. This embodied, dynamic learning process is essential for developing a robust understanding of causality, spatial relationships, and social dynamics—areas where LLMs are significantly constrained.
The verdict on LLMs as comprehensive models of the human mind is clear: they are powerful linguistic tools, invaluable for tasks involving language processing and generation. However, their limitations in fundamental areas—continuous learning, embodied experience, causal understanding, and genuine reasoning—render them inadequate as direct analogies for human cognition. To fall into the trap of LLMorphism is to risk a profound misunderstanding of ourselves, potentially leading to misguided societal expectations, educational reforms, and ethical frameworks. We must resist the temptation to see our own intelligence reflected in the statistical mirror of LLMs, and instead, continue to explore and celebrate the unique, multifaceted nature of human consciousness.