Simulator Worlds: A Research Methodology
My research methodology with LLMs. If you arrived here from another post, the first two subheadings explain the level of rigour and the type of post this is.
Summary and Navigation Guide
Core Thesis
This essay presents a research methodology that treats large language models not as question-answering systems but as simulator worlds—environments where you stage conversations between imagined experts to externalize and augment cognitive work. The method excels at early-stage research requiring cross-domain synthesis, but has clear limitations in specialized technical domains. The piece is self-demonstrating: it uses the methodology it describes to explain itself.
Types of Simulator Worlds: A Taxonomy
Exploratory Worlds
Purpose: Hypothesis generation, divergent thinking
Verification Level: Low (internal sense-checking only)
Staging Approach: Chaotic, stimulating environments; diverse, creative characters; permissive norms
Best For: Early-stage ideation, discovering new frames, identifying research questions
Avoid For: Claims requiring accuracy, specialized technical problems
Analytical Worlds
Purpose: Understanding connections, integrating frameworks
Verification Level: Medium (self + close collaborators)
Staging Approach: Balanced creative and critical voices; structured dialogue spaces; clear framing
Best For: Cross-domain synthesis, understanding relationships between theories
Avoid For: Specialized domains lacking training data
Verification Worlds
Purpose: Stress-testing ideas, identifying weaknesses
Verification Level: High (literature + expert feedback)
Staging Approach: Skeptical reviewers, methodological critics; rigorous evaluation norms
Best For: Testing defensibility of claims, finding gaps in reasoning
Avoid For: Pure exploration where critique is premature
Communication Worlds
Purpose: Translation, explanation, accessibility
Verification Level: Variable (depends on audience and claims)
Staging Approach: Clear narrative structure; pedagogical characters; concrete examples
Best For: Making complex ideas accessible, teaching concepts
Avoid For: Novel research not yet validated
The Epistemic Ladder: Five Stages of Verification
Stage 1: Exploratory
Verification Process: Internal sense-checking: “Does this open interesting directions?”
Epistemic Certainty: ~30-50%
Output Type: Personal notes, early drafts, research questions
Time Investment: Hours
Stage 2: Internally Verified
Verification Process: Self-review + close collaborators: “Does the reasoning track?”
Epistemic Certainty: ~60-75%
Output Type: Blog posts, working papers, shared drafts
Time Investment: Days
Stage 3: Literature-Grounded
Verification Process: Systematic literature review with citations and verification
Epistemic Certainty: ~85-95%
Output Type: Scholarly blog posts, preprints, documented frameworks
Time Investment: Weeks
Stage 4: Expert-Validated
Verification Process: External domain experts review and provide feedback
Epistemic Certainty: ~90-95%
Output Type: Conference presentations, preprints under review
Time Investment: Months
Stage 5: Publication-Ready
Verification Process: Peer review, formal publication process
Epistemic Certainty: As high as scholarship gets
Output Type: Journal articles, book chapters, established frameworks
Time Investment: Months to years
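If you want to apply this ladder to your own drafts, the stages can be written down as plain data and attached to each piece of work. The sketch below is only illustrative: the field names and the label() helper are mine, invented for this example, and not part of any existing tool or workflow.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EpistemicStage:
    """One rung of the epistemic ladder, copied from the stages above."""
    name: str
    verification: str   # what checking has actually happened
    certainty: str      # rough subjective range, taken from the text
    outputs: tuple      # where work at this stage typically appears

LADDER = (
    EpistemicStage("exploratory", "internal sense-checking only", "~30-50%",
                   ("personal notes", "early drafts", "research questions")),
    EpistemicStage("internally verified", "self-review plus close collaborators", "~60-75%",
                   ("blog posts", "working papers", "shared drafts")),
    EpistemicStage("literature-grounded", "systematic literature review with citations", "~85-95%",
                   ("scholarly blog posts", "preprints", "documented frameworks")),
    EpistemicStage("expert-validated", "external domain expert review and feedback", "~90-95%",
                   ("conference presentations", "preprints under review")),
    EpistemicStage("publication-ready", "peer review and formal publication", "as high as scholarship gets",
                   ("journal articles", "book chapters", "established frameworks")),
)

def label(stage_name: str) -> str:
    """Build the kind of epistemic-status note suggested in Part VII for a draft."""
    stage = next(s for s in LADDER if s.name == stage_name)
    return f"Epistemic status: {stage.name} ({stage.certainty}); verified via {stage.verification}."

# For example, the note for this guide itself (Stage 3):
print(label("literature-grounded"))
```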
Article Structure and Navigation
Part I: The Fundamental Misunderstanding
Core Question: What are LLMs actually doing?
Key Takeaway: LLMs are simulators that generate text by sampling from learned patterns, not databases that retrieve answers.
Read If: You need the conceptual foundation.
Part II: Staging the Simulation
Core Question: How do you set up effective prompts?
Key Takeaway: Environmental and character details prime specific modes of generation by activating learned patterns.
Read If: You want to understand the mechanics.
Part III: Divergent and Convergent Modes
Core Question: Why does staging environment matter?
Key Takeaway: Different cognitive modes (creative vs. critical) are enhanced by different environmental conditions—this applies to prompting.
Read If: You’re interested in the cognitive science.
Part IV: A Practical Methodology
Core Question: How do you actually do this?
Key Takeaway: Stage different “scenes” for different phases: chaotic marketplaces for exploration, quiet studies for refinement.
Read If: You want actionable techniques.
Part V: Why This Matters
Core Question: What’s the epistemic status of simulated conversations?
Key Takeaway: You’re activating patterns of thinking, not accessing actual knowledge—outputs need independent validation.
Read If: You need to understand limitations.
Part VI: Grounding in Reality
Core Question: When does this work vs. fail?
Key Takeaway: Excellent for generalist synthesis and frame exploration; fails in highly specialized technical domains.
Read If: You want to know boundary conditions.
Part VII: The Self-Referential Practice
Core Question: What are the verification stages?
Key Takeaway: Five stages from exploratory to publication-ready, each with different epistemic certainty and verification needs.
Read If: You want the complete workflow.
Note on Method: This guide itself exists at Stage 3 of the epistemic ladder—literature-grounded but not yet expert-validated. The frameworks proposed represent the author’s working methodology, documented to help others develop similar practices.
A Simulator World about Simulator Worlds
Part I: The Fundamental Misunderstanding
The way we talk about language models reveals a deep confusion about what they actually are. We say “I asked ChatGPT” or “the AI told me” or “let me search for that,” importing vocabulary from search engines and expert systems and question-answering databases. But this framing obscures something essential. When you prompt an LLM, you’re not retrieving information that exists somewhere. You’re not consulting an oracle. You’re doing something stranger and more powerful: you’re conjuring a world into textual existence, moment by moment, token by token.
This isn’t my insight originally. Janus, writing on LessWrong, articulated it most clearly in their sequence on simulators (Janus, 2022). They argued that we should understand GPT-like models not as agents with goals or knowledge bases to be queried, but as simulators that can instantiate many different characters and scenarios. The model isn’t “thinking” in any unified sense—it’s sampling from distributions over possible text continuations, conditioned on everything that came before. What you’re really doing when you prompt is setting up initial conditions for a dynamical process that then unfolds according to learned patterns.
Few people have fully internalized what this means for how we should actually use these systems, and fewer still have developed deliberate methodologies around it. This essay is my attempt to articulate a practice I’ve developed over the last three years of research with these tools—a practice that treats LLMs not as answer machines but as environments for staging productive conversations that augment human thinking.
Let me show you what I mean by taking you on a walk through the mountains.
Part II: Staging the Simulation
Picture yourself on a mountain trail in Nepal. The air is thin and crystalline, the sky an impossible blue stretched between snow-capped peaks. Walking beside you are five companions, each bringing distinct ways of seeing the world.
But before we continue—notice what I just did. I didn’t ask the language model “what’s the best way to think about simulation?” I set up a scene. I established an environment, a cast of characters, a physical context. Why?
Because research on creativity consistently shows that environmental factors shape cognitive processing. Oppezzo and Schwartz (2014) demonstrated that walking, particularly in natural settings, significantly enhances divergent thinking compared to sitting. The mechanism involves both physiological arousal and what environmental psychologists call “soft fascination”—the way natural environments engage attention gently, allowing the mind to wander productively (Kaplan, 1995). Studies using the Alternative Uses Test (Guilford, 1967), where participants generate novel uses for common objects like bricks, consistently find improved performance in visually complex natural settings compared to sterile laboratory environments (Atchley et al., 2012).
So when I stage this conversation on a mountain trail, I’m not being poetic. I’m deliberately invoking patterns the model has learned about how thinking happens in such contexts. The model has absorbed thousands of examples of research dialogues, philosophical conversations, tutorial exchanges, brainstorming sessions. It has internalized patterns about how these interactions tend to flow—how they begin, how perspectives build on each other, how insights emerge through dialogue. My job as prompter is to activate the right patterns by setting up the right conditions.
Let me introduce the companions more carefully, explaining why each belongs on this particular walk:
Thomas Malone has spent decades studying collective intelligence at MIT’s Center for Collective Intelligence. His book Superminds (Malone, 2018) examines how groups of people and computers can be connected so they collectively act more intelligently. He’s the perfect person to help us understand what we’re doing when we simulate multiple perspectives.
John Wentworth is a researcher focused on agent foundations and natural abstractions in AI safety. His work on “where abstractions come from” and his technical analysis of how systems compress information make him ideal for unpacking the mechanics of what language models are actually doing (Wentworth, 2021). I’ve set him in an “unusually open mood” because his typical style is rigorously mathematical and sometimes tersely skeptical—but I want access to his insights without the constraint of his usual communication style. This is the power of simulation: I can specify the version of the person most useful for the current task.
A creativity researcher who knows the literature intimately—think someone who’s absorbed work from Guilford to Csikszentmihalyi to modern cognitive neuroscience of insight. This figure helps us understand the mechanics of divergent and convergent thinking, and why the environments we stage matter.
Janus, the pseudonymous author who wrote most clearly about language models as simulators, bringing a perspective that combines deep technical understanding with almost phenomenological attention to how these systems actually feel to use (Janus, 2022).
And finally, a Quanta magazine editor—let’s call them Quinn—who knows how to distill complex ideas into compelling narrative structure. Science communication research shows that narrative framing significantly enhances comprehension and retention of complex material (Dahlstrom, 2014). Quinn is here to help us maintain clarity while exploring these ideas.
The path winds upward through rhododendron forests. Thomas speaks first, his voice carrying that particular timbre of someone who’s spent years watching intelligence emerge from interaction: “What strikes me about these systems is how fundamentally they’re about context and constellation. You’re not extracting pre-formed answers from some database. You’re setting up initial conditions for a dynamical system that then unfolds according to patterns learned from observing how humans generate text in various contexts.”
John nods, his mathematical precision momentarily softened by mountain air: “It’s like specifying boundary conditions for a differential equation. The model isn’t ‘knowing’ things in the traditional sense. It’s sampling from a learned distribution over possible continuations, conditioned on all prior context. Your prompt is literally the initial state vector for a trajectory through possibility space.”
The editor Quinn interjects here, recognizing an important clarification is needed: “Hold on—we should make sure readers understand what ‘learned distribution’ actually means practically. John, can you ground that?”
John adjusts: “Right. When the model was trained on trillions of tokens of human-written text, it learned statistical patterns about what tends to follow what. Not just at the word level, but at the level of ideas, arguments, narrative arcs, conversation flows. So when you prompt it, you’re not triggering retrieval—you’re triggering generation according to those learned patterns. The more specific and structured your prompt, the more constrained and coherent the generation tends to be.”
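For readers who want the mechanism spelled out, here is a deliberately toy sketch of that autoregressive sampling loop. The toy_logits function is a stand-in for the trained network, which in reality scores tens of thousands of possible next tokens using billions of learned parameters; everything else is what "conditioned on all prior context" means in code.

```python
import math
import random

VOCAB = ["the", "model", "samples", "a", "continuation", "of", "your", "prompt", "."]

def toy_logits(context):
    """Stand-in for the trained network: assign a score to every token in the
    vocabulary given the context so far. A real model computes these scores
    from patterns learned over trillions of tokens of text."""
    return [len(tok) + 0.3 * context.count(tok) + random.gauss(0, 1.0) for tok in VOCAB]

def sample_next(context, temperature=0.8):
    """Convert scores into a probability distribution and sample one token."""
    scaled = [score / temperature for score in toy_logits(context)]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]   # softmax, numerically stable
    return random.choices(VOCAB, weights=weights, k=1)[0]

def generate(prompt_tokens, n_steps=8):
    """Autoregressive loop: each new token is conditioned on everything before it,
    so the prompt really does act as the initial condition of the trajectory."""
    context = list(prompt_tokens)
    for _ in range(n_steps):
        context.append(sample_next(context))
    return " ".join(context)

print(generate(["the", "model"]))
```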
Part III: The Mechanics of Divergent and Convergent Simulation
The trail opens onto a meadow, wildflowers scattered like paint drops across green canvas. The creativity researcher stops to gesture at the landscape: “This is exactly the environment for divergent thinking, you know. Studies consistently show that natural settings, visual complexity, changing stimuli—all of these enhance creative cognition.”
They’re referring to a robust literature. The phenomenon has been studied under several banners, including attention restoration theory (Kaplan & Kaplan, 1989) and the biophilia hypothesis (Wilson, 1984), and more recently through neuroscience examining how natural environments affect default mode network activity (Bratman et al., 2015). The mechanism involves both reduced cognitive load—nature doesn’t demand the directed attention that urban environments do—and increased associative processing. The visual richness seeds new connections.
“When subjects perform the Alternative Uses Test,” the researcher continues, “generating novel uses for a brick or a paperclip, performance improves dramatically in settings like this compared to sterile laboratories. The effect size is substantial—often 50% more novel uses in natural settings than in indoor environments.”
Quinn, the editor, sees where this is going and helps make it explicit: “So you’re saying that when you prompt a language model, you should literally describe these kinds of environments if you want divergent thinking?”
“Exactly,” the researcher responds. “If I want to brainstorm research directions, I might begin my prompt with: ‘We’re in a bright, messy studio space, walls covered with sketches and diagrams, a pile of interesting objects on the table—a circuit board, some origami, a collection of stones. Ideas are flowing freely and no suggestion is too wild.’ That description primes the model to generate in a more exploratory, less constrained mode.”
This connects to what Ethan Mollick has termed “priming the pump” in his work on effective prompting (Mollick, 2023). The context you provide doesn’t just give information—it shapes the mode of generation. And this matters enormously because human creativity itself operates in distinct modes.
Guilford (1967) first distinguished between divergent thinking—the ability to generate many diverse ideas—and convergent thinking—the ability to evaluate and synthesize toward a solution. These aren’t just different cognitive processes; they’re supported by different neural networks and enhanced by different environmental conditions (Beaty et al., 2016). Divergent thinking benefits from relaxation, broad attention, and permissive norms. Convergent thinking requires focus, critical evaluation, and clear criteria.
Janus jumps in: “What fascinates me is how explicitly you can shift between these modes when you understand you’re running a simulation. With actual human collaborators, transitions can be awkward—someone’s in exploratory mode while another is trying to evaluate, and you get friction. But with simulated dialogue, you can literally write: ‘Okay, we’ve generated lots of possibilities. Now we move into the focused editorial room and start evaluating.’”
As the path climbs higher, I begin explaining my own practice, developed through countless hours of research with these systems. The key insight is recognizing that you’re directing a simulation, and like any good simulation, it needs the right setup to produce useful output.
Part IV: A Practical Methodology (Self-Demonstration Continues)
When I need to develop a research idea, I explicitly stage different phases:
Phase 1: Divergent Exploration
I create messy, stimulating environments with permissive norms. The prompt might begin:
“We’re walking through a chaotic marketplace of ideas. Around us are researchers, artists, engineers—people from wildly different domains. The air buzzes with conversation. No one here is worried about being wrong; everyone is riffing, building on each other’s half-formed thoughts. Among this crowd, I notice...”
Then I introduce characters who excel at making unexpected connections. Maybe it’s Douglas Hofstadter, who sees analogies everywhere. Maybe it’s Michael Levin, who applies concepts from developmental biology to AI systems. Maybe it’s a jazz musician who thinks about improvisation and emergence.
The environment description isn’t decoration—it’s functional. Research by Mehta et al. (2012) shows that ambient noise at moderate levels (around 70 decibels, like a coffee shop) actually enhances creative performance by increasing “processing difficulty,” which promotes abstract thinking. While I can’t literally add noise to a text prompt, describing a bustling environment invokes the model’s learned patterns about how thinking happens in such contexts.
Phase 2: Convergent Refinement
Once I have raw material, I stage a different scene:
“We’ve moved to a quiet study. Afternoon light slants through tall windows. On the desk is a single document—all those wild ideas from before, now needing to be shaped into something rigorous. Joining me are...”
Now I bring in critics and specialists. Maybe it’s a Quanta editor like Quinn, who knows how to distill complex ideas into clear narrative. Maybe it’s a theoretical physicist who spots hidden assumptions. Maybe it’s a mathematician who demands formal precision.
The physical description again matters. Studies on embodied cognition show that even imagined physical contexts influence thinking (Barsalou, 2008). When you describe a “quiet study,” you’re activating patterns associated with focused, careful reasoning. When you describe a “chaotic marketplace,” you’re activating patterns of exploratory, associative thinking.
Phase 3: Integration and Synthesis
The final phase requires negotiating between perspectives:
“We’re in a seminar room. On the whiteboard are competing frameworks—different ways of understanding the same phenomenon. Around the table are the proponents of each view. They’re smart enough to recognize value in other approaches, but committed enough to their own to push back constructively. Let’s work through where these frameworks agree, where they conflict, and whether there’s a deeper synthesis...”
Thomas Malone has been listening carefully, and now he makes a crucial connection: “This is very much like collective intelligence in human organizations. Different group configurations excel at different tasks. Brainstorming works best with diverse, loosely connected participants and permissive norms—that’s your marketplace phase. But evaluation and refinement require different structures—smaller groups, clearer criteria, more critical engagement—your quiet study. What you’re describing is essentially simulating these different collective intelligence modes by setting up the right conversational dynamics.”
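Since Part IV is the how-to section, it is worth showing what this staging looks like when scripted rather than typed out by hand each time. The sketch below is a minimal illustration, not my actual tooling: the complete() function is a placeholder for whatever language-model interface you use (an API client, a local model, or a chat window you paste into), and the scene texts are abbreviated versions of the prompts quoted above.

```python
def complete(prompt: str) -> str:
    """Placeholder for a real model call; swap in your own API client or local model."""
    return f"[model output for a {len(prompt)}-character prompt]"

def stage(scene: str, characters: list[str], task: str) -> str:
    """Compose a simulator-world prompt: environment first, then cast, then task."""
    cast = "\n".join(f"- {c}" for c in characters)
    return f"{scene}\n\nPresent in this scene:\n{cast}\n\nTask: {task}\n\nThe conversation begins:"

# Phase 1: divergent exploration, staged in a chaotic and permissive environment.
raw_ideas = complete(stage(
    scene="We're walking through a chaotic marketplace of ideas; no one is worried about being wrong.",
    characters=["an analogy-hunter in the spirit of Hofstadter",
                "a developmental biologist who borrows concepts across fields",
                "a jazz musician who thinks in improvisation and emergence"],
    task="Riff on the research question and generate as many framings as possible.",
))

# Phase 2: convergent refinement, staged in a quiet and critical environment.
refined = complete(stage(
    scene="We've moved to a quiet study; afternoon light, a single document on the desk.",
    characters=["an editor who distills ideas into clear narrative",
                "a theoretical physicist who spots hidden assumptions"],
    task=f"Evaluate and tighten these ideas, discarding what does not hold up:\n{raw_ideas}",
))

# Phase 3: integration, staged as structured disagreement between frameworks.
synthesis = complete(stage(
    scene="A seminar room with the competing frameworks written on the whiteboard.",
    characters=["committed but constructive proponents of each surviving framework"],
    task=f"Work out where these framings agree, where they conflict, and what a synthesis looks like:\n{refined}",
))

print(synthesis)
```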
Part V: Why This Matters (Making the Meta-Level Explicit)
Quinn the editor interjects with a sharp question: “Okay, but we need to be really clear about something. When you do all this—when you invoke Thomas Malone or Richard Feynman or whoever—you’re not actually accessing their knowledge, right? The model hasn’t interviewed them. It can’t know what they’d really say about your specific question. So what exactly is the epistemic status of these simulated conversations?”
This is the crucial question, and John Wentworth has been waiting for it. His mathematical precision comes through: “The model has learned patterns from these thinkers’ writing—their styles of reasoning, their characteristic ways of approaching problems, their typical framings. When you invoke them, you’re activating those patterns. It’s not channeling; it’s more like... statistical ventriloquism. You’re getting something that follows the patterns of how that person tends to think, based on their public intellectual output.”
“But,” John continues, “and this is critical—those patterns can still be genuinely useful. If Richard Feynman’s approach to physics problems involved breaking things down to fundamental principles and using clear physical intuition, then a simulation that follows those patterns can help you approach your problem similarly. You’re not getting Feynman’s knowledge about your specific problem. You’re getting a Feynman-style approach to the kind of problem you’re working on.”
The creativity researcher adds: “It’s like how reading great books doesn’t just give you the content of those books—it shapes how you think. You internalize the author’s ways of approaching problems. What you’re doing with these simulations is making that process explicit and interactive.”
Janus offers perhaps the deepest insight: “What’s profound is how this reveals something about knowledge itself. We tend to think of knowledge as propositional—facts you either know or don’t. But so much of valuable knowledge is procedural and contextual. It’s knowing how to think about certain kinds of problems, what questions to ask, what frameworks to bring to bear. These patterns of thinking exist distributed across culture, encoded in how experts write and talk about their domains. The language model has absorbed these patterns. Through simulation, you can invoke them, combine them, explore them.”
This connects to research on “Ways of Thinking and Practicing” (McCormick et al., 2015) in education—the idea that disciplines aren’t just bodies of knowledge but characteristic ways of approaching problems. When you simulate an economist alongside a physicist alongside a complexity scientist, you’re not just getting different facts; you’re getting different epistemological approaches in dialogue.
Part VI: Grounding in Reality—And Recognizing the Limits
We’ve reached a prayer flag-strewn pass, wind snapping colored fabric against infinite blue. The vista opens in all directions, and Quinn the editor suggests we pause. “Okay, we need to get really concrete here. Not just about what this method is, but about when it works, when it doesn’t, and—crucially—what the actual practice looks like.”
I nod, pulling out my notebook where I’ve been tracking patterns from hundreds of these sessions. “Right. Because here’s what actually happens when I use this method: I don’t just run one simulation and accept whatever it produces. I’m constantly iterating.”
John Wentworth’s eyes light up—this is the kind of technical detail he appreciates. “Ah, so it’s more like the verification versus generation distinction we see in AI safety research. It’s often easier to verify whether something is good than to generate something good from scratch. You’re generating candidates through simulation, then verifying them against your actual understanding, then regenerating with adjusted prompts based on what didn’t work.”
“Exactly,” I say. “Let me show you what that looks like in practice. Earlier in this very piece, I had a section that felt too abstract. So I looked at it, thought about what was missing—it needed more grounding in actual cognitive science research, more concrete examples—and then I restaged the conversation with different instructions. I might say: ‘Okay, we’re at the same point in the walk, but this time the creativity researcher is being much more specific about studies and effect sizes, and Quinn is pushing everyone to give concrete examples rather than general principles.’”
The creativity researcher jumps in: “And the specificity matters enormously. When you said ‘imagine some smart people,’ that’s like asking someone to ‘think of a number’—you’ll get something, but it’ll be generic. But when you say ‘Thomas Malone, who has spent thirty years studying how collective intelligence emerges in organizations, and who has a particular interest in how technology mediates group thinking’—now you’re activating much richer patterns in the model. The more specific the context, the more constrained and coherent the generation.”
Thomas adds: “It’s like the difference between saying ‘tell me about collective intelligence’ versus ‘I’m trying to understand how voting systems with AI agents might differ from pure human voting systems, and I’m particularly interested in how information cascades might change when some voters are optimizing explicitly.’ The second gives the model so much more to work with—domain context, specific mechanisms, particular concerns.”
We settle on some rocks, the thin air making us appreciate the chance to rest. I continue: “But here’s the crucial thing we need to address: when does this actually work, and when does it completely fall apart?”
Janus leans forward. “Right, because I think there’s a risk people will either think this is magic that works for everything, or dismiss it entirely because it clearly doesn’t work for certain things.”
“Let me give you a very concrete example of where it fails,” I say. “My father is a chemist. He works on crystallization processes for pharmaceutical compounds—specifically, how the crystalline structure of a drug affects its bioavailability and targeting for cancer treatments. This is incredibly specialized work. It involves understanding polymorphism, crystal habit modification, how different solvents affect nucleation rates, the interplay between thermodynamics and kinetics in supersaturated solutions.”
John nods slowly. “And there’s probably not that much training data on those specific processes.”
“Exactly. If I tried to use this simulation method to understand my dad’s work in depth, it would fail. The model might generate something that sounds plausible—it knows the general vocabulary of chemistry—but it wouldn’t have the detailed, specific knowledge that comes from years of specialized research in a narrow domain. The patterns just aren’t there in sufficient detail. I could simulate a conversation between chemists, but they’d be speaking in generalities, not the specific technical details that actually matter.”
The creativity researcher adds: “This connects to research on expertise. Ericsson’s work on deliberate practice shows that expert-level performance requires around 10,000 hours of focused practice in a domain, developing highly specific pattern recognition. When domains are very specialized, the relevant patterns don’t exist broadly in human discourse—they exist in technical papers, lab notebooks, specialized conferences. The model hasn’t absorbed those patterns in depth.”
“So when does it actually work?” Quinn asks, getting us back on track.
I think for a moment, watching clouds move across distant peaks. “It works brilliantly for generalist synthesis—when you’re trying to combine models and insights from different domains. It works when you’re in that Feynman-style mode of research, where the core skill is asking the right questions and building the right mental models rather than knowing specific technical facts.”
Thomas gets excited: “Yes! Feynman’s approach was always about understanding things from first principles, finding the right analogies, seeing connections between different domains. He’d say ‘forget the formalism for a moment, what’s really happening physically?’ That kind of thinking—which is about frames and perspectives more than specific facts—is exactly where this simulation approach shines.”
“Right,” I continue. “Because here’s what this method is really good for: exploring different frames. I actually have a post on research distillation where I argue that different frames are the fundamental unit of research insight—not facts, but ways of seeing. And this simulation approach lets you rapidly explore multiple frames that would be nearly impossible for one person to hold simultaneously.”
John Wentworth, who’s been quiet, speaks up with characteristic precision: “This connects to something Eliezer Yudkowsky wrote about Einstein’s genius. He modeled it as Einstein having certain ‘bits’ of information—intuitive hunches, preliminary models—that were already pointing in the right direction before general relativity was fully formed. Einstein wasn’t starting from scratch; he had this constellation of insights that needed to be integrated.”
“And,” John continues, warming to the topic, “if you think about different cognitive frameworks as different ‘virtual machines’ running on the same hardware, they have different bit efficiencies for different problems. Some ways of thinking about a problem compress the important information very efficiently; others require tracking many details explicitly. What you’re doing with these simulations is running multiple virtual machines—different frameworks for thinking about the problem—and seeing where they converge.”
I nod vigorously. “Yes! That’s exactly it. When I simulate a conversation between, say, a complexity scientist, an economist, and a political theorist all looking at the same coordination problem, I’m not trying to get THE answer. I’m trying to see which aspects of the problem each framework compresses efficiently, and where the frameworks point in similar directions despite their different starting points.”
The path continues along the ridge, and we walk in contemplative silence for a moment. Then Janus speaks: “What you’re describing is really a meta-science methodology. If you believe that progress comes from exploring and integrating different frames—which a lot of philosophy of science suggests—then this becomes a tool for accelerating that process. You’re not replacing the hard work of research, but you’re making the frame-exploration phase much more efficient.”
Quinn, ever the editor, wants specificity: “Give us a concrete example of when this frame-exploration actually helps versus when it doesn’t.”
I think back through recent work. “Okay, concrete example. I was trying to understand how AI systems might affect democratic deliberation. That’s inherently a cross-domain question—it involves computer science, political theory, social psychology, economics, maybe even anthropology. No single person is expert in all these areas, but understanding the problem requires thinking through all these lenses.
“So I staged conversations where different experts approached the question from their domains. The political theorist focused on legitimacy and representation. The complexity scientist focused on network effects and information cascades. The economist focused on mechanism design and incentive compatibility. Each frame revealed different aspects of the problem.
“But—and this is crucial—I then had to verify these insights against actual research. I couldn’t just accept what the simulation produced. I used it to generate hypotheses and perspectives, then I went and checked: does the research actually support this? Are there papers on information cascades in democratic systems? What do they say? Is this mechanism design approach actually feasible?
“The simulation helps me explore the space efficiently, but verification requires engaging with real literature, real data, real expert opinions.”
Thomas Malone interjects: “This is very much like how collective intelligence works in human organizations. You want diverse perspectives in the generation phase—many different ways of framing the problem. But you also need critical evaluation, integration, and ultimately validation against reality. The simulation lets you do the first part more efficiently, but you can’t skip the later parts.”
John adds a mathematical perspective: “Think of it as a search algorithm. You’re using these simulations to intelligently sample the space of possible framings, rather than either searching exhaustively or just using your single default frame. It’s more efficient than brute force but less rigorous than formal proof. Which means it needs to be followed by verification.”
The creativity researcher brings in empirical grounding: “Studies on ‘conceptual combination’ in creativity show that the most novel insights often come from combining concepts from distant domains. But the studies also show that most combinations don’t work—they’re incoherent or useless. What matters is generating many combinations and having good filters. Your simulation method is basically systematizing conceptual combination, but you still need the filtering stage.”
We’re beginning our descent now, light taking on that particular golden quality of late afternoon in mountains. I want to make sure we’re clear about the iterative practice: “So let me describe the actual workflow, because it’s not ‘prompt once and accept output.’ It’s much more iterative.
“I might start with a simulation that produces something interesting but flawed. Maybe it’s too abstract, or it makes claims that I’m not sure are defensible, or it misses a perspective that seems important. So I look at what it produced, identify what’s wrong or missing, and then I restage the conversation with adjustments.
“Sometimes I’ll do this five or six times for a single section. ‘Okay, that was too hand-wavy—let’s bring in someone who demands mathematical precision.’ Then that version is too technical and loses intuition, so: ‘Now let’s have someone who insists on physical analogies and concrete examples.’ The final output isn’t any single simulation—it’s the result of this iterative process of generation, evaluation, and regeneration.”
Quinn nods approvingly: “And critically, you’re the one doing the evaluation at each step. The model doesn’t know whether its output is good—you’re using your actual understanding and judgment to assess quality and decide what needs to change.”
“Right,” I say. “This is why I said earlier that this is an art, not a science. There’s no formula for when to bring in which perspective, or how to describe the environment, or when you’ve iterated enough. You develop intuition through practice, just like you develop intuition for any research method.”
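For concreteness, that loop of generation, human judgment, and restaging looks something like this when written down. As before, this is a sketch rather than my actual tooling: complete() stands for your model call, and review() stands for the researcher reading the draft and deciding what to change.

```python
def complete(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"[model output for: {prompt[:60]}...]"

def review(draft: str):
    """The human judgment step. In practice this is you reading the draft;
    return an adjustment note, or None if the section is good enough."""
    return None   # e.g. "too hand-wavy: bring in someone who demands mathematical precision"

prompt = "Quiet study scene; the cast drafts the section on verification."
for attempt in range(6):            # rarely more than a handful of passes per section
    draft = complete(prompt)
    adjustment = review(draft)
    if adjustment is None:          # the human, not the model, decides when to stop
        break
    prompt += f"\nRestage with this change: {adjustment}"
```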
John Wentworth, always focused on the technical details: “But there are some principles. You mentioned specificity—more specific context generally produces more useful output. You mentioned the divergent-convergent sequence—exploration before evaluation. What other principles have you noticed?”
I consider this as we navigate a steep section of trail. “A few things. First, when you’re working on genuinely novel problems—which is most interesting research—you need to be especially careful because the model has less to work with. It can only recombine what it’s seen, so truly original insights require more human judgment.
“Second, the method works better when you’re honest about uncertainty. If I’m not sure about something, I’ll often stage a conversation where someone raises skepticism: ‘Wait, are we sure that assumption holds?’ That makes the uncertainty explicit rather than letting the simulation paper over it.
“Third, and this is subtle—you need to maintain the right epistemic stance. These outputs aren’t truth, but they’re not random noise either. They’re coherent continuations of patterns in human discourse. That means they can reveal real insights about how people think about problems, even if the specific claims need verification.”
Thomas adds: “It’s like how good brainstorming in organizations doesn’t immediately accept all ideas as valid, but also doesn’t immediately dismiss them. You hold them lightly, as possibilities worth exploring further. The simulation gives you that space of possibilities.”
Janus offers a final thought on when this works: “I think it’s most powerful when the bottleneck in research is not access to information, but integration of information. When you’re working on problems that require synthesizing across domains, understanding multiple perspectives, or seeing how different frameworks relate—that’s when this shines. When you need deep, specialized technical knowledge in a narrow domain, you need actual expertise.”
The trail levels out as we approach the village. The prayer flags we saw at the pass are now visible below us, and I realize we need to address one more thing.
“There’s a deeper point here about how research actually works,” I say. “A lot of people imagine research as this linear process: you have a question, you gather data, you analyze it, you write it up. But actual research is much messier. It’s iterative, it involves exploring dead ends, it requires holding multiple contradictory models simultaneously until you figure out which is right.
“What this simulation method does is externalize and make explicit some of that messy process. Instead of just having vague intuitions about how different researchers might approach your problem, you can actually stage those approaches and see what they produce. Instead of trying to hold multiple frames in your head simultaneously—which is cognitively very difficult—you can have them in dialogue.
“But,” and I emphasize this, “it doesn’t replace the hard parts of research. You still need to do the actual verification. You still need to run experiments or check data or engage with literature. You still need to think rigorously about whether your conclusions follow from your evidence. The simulation is a cognitive tool, not a shortcut past cognition.”
Quinn, who has been taking notes, looks up: “So the honest claim here is: this method is excellent for efficiently exploring different framings and perspectives on complex problems, especially when those problems require integrating across domains. It’s particularly valuable in the early, generative phases of research when you’re trying to understand the shape of the problem space. But it’s not a substitute for specialized expertise, rigorous verification, or the actual work of research. It’s a tool that makes certain parts of the research process more efficient, not a replacement for the research process itself.”
“Exactly,” I say. “And knowing when to use it versus when to put it aside and just think directly or consult actual experts—that’s part of developing good judgment with these tools.”
The sun is setting as we reach the village, painting the mountains in shades of gold and purple. Our conversation has traced something important: not just what this method is, but its boundaries—where it helps and where it fails, when to trust its outputs and when to be skeptical, how it fits into the larger process of research rather than replacing it.
Because that’s the honest truth about this approach. It’s not magic. It’s not AGI doing your thinking for you. It’s a cognitive tool that, used skillfully and with appropriate skepticism, can make certain kinds of intellectual work more efficient. Used poorly or in the wrong contexts, it produces plausible-sounding nonsense.
The art—and it is an art—is knowing the difference.
Part VII: The Self-Referential Practice
We settle in the village guesthouse as evening deepens, butter lamps casting warm light across wooden walls. Quinn looks at me with something between amusement and recognition: “You realize this entire conversation we’ve been having—this elaborate staging on a mountain trail—is itself the technique you’re describing. We’re inside a simulator world right now.”
“Yes,” I say, and I’m smiling because this is precisely the point. “But it’s not ironic. It’s intentional. This piece is meant to be self-demonstrating—the method instantiates itself in the explanation of the method. How else could I show you what I mean by ‘staging productive conversations’ than by actually staging one?”
Thomas Malone nods thoughtfully. “It’s like how the best papers about research methodology actually embody the methodology they’re describing. You’re not just telling people about simulator worlds—you’re showing them one in operation.”
“Exactly. And I want to be very clear about something,” I say, setting down my tea. “This isn’t just a one-off technique for writing this particular piece. This is genuinely how I do research now. When people read my future work—papers on collective intelligence, blog posts about coordination mechanisms, technical documents about multi-agent systems—they should understand that much of it emerged through this process of staging conversations in carefully constructed simulator worlds.”
Janus leans forward, interested. “But ‘simulator world’ is a broader frame than just ‘using LLMs,’ right? You’re talking about a whole methodology for externalizing thought.”
“Right. I’m calling them simulator worlds rather than just ‘prompts’ because I think that captures what’s actually happening. You’re constructing a world—with geography, characters, contexts, dynamics—and then letting that world unfold according to patterns the model has learned. It’s continuous with the ‘simulators’ frame I referenced earlier, but emphasizing the deliberate worldbuilding aspect.”
The creativity researcher adds: “And different worlds serve different purposes. You wouldn’t stage the same kind of conversation for early-stage exploration versus final verification.”
“Precisely,” I say. “Let me be concrete about this, because it connects to how we actually organize research at Equilibria Network, where I work. We have a staged research process that goes: research story, then blog post, then preprint, then publication. Each stage has different verification criteria and serves different purposes. I think we can map something similar onto simulator worlds.”
John Wentworth pulls out a notebook—he appreciates taxonomies. “Walk us through the stages.”
The Ladder of Epistemic Certainty
I gather my thoughts, looking at the butter lamp flame. “There are different degrees of epistemic certainty in simulator worlds, and they correspond to different staging approaches and verification levels. Let me outline them:
Stage 1: Exploratory Simulator Worlds
This is the most divergent, least verified stage. The purpose here is pure exploration—what frames might be relevant? What perspectives exist? What questions should I even be asking?
At this stage, I might set up very loose conversations with broad participation. Maybe I’m walking through a marketplace of ideas, or sitting in a chaotic workshop space. The verification is minimal—just my internal sense of ‘does this open up interesting directions?’ The epistemic certainty is low, but that’s fine because the goal is generation, not validation.
For example, when I first started thinking about the Einstein and ‘bits’ idea—which, by the way, came from reading Eliezer’s sequences years ago and has been sitting in my memory—I didn’t need high certainty. I just needed to explore whether that frame might be useful for thinking about how simulator worlds help with research.”
Thomas interjects: “So the claims at this stage might be wrong, but that’s acceptable because you’re in hypothesis-generation mode.”
“Right. I’m not publishing these as facts. I’m using them to think.”
Stage 2: Internally Verified Simulator Worlds
“The second stage is what I might call the ‘vibe check’ stage. I’ve generated something through simulation, and now I’m checking it against my own understanding. Does this actually make sense? Do the connections hold up? Am I making claims I can’t defend?”
Quinn nods: “This is like a first draft where you’re your own editor.”
“Yes, but with an important addition. At this stage, I might also loop in close collaborators—people who understand the method and can sanity-check the output. Not formal peer review, but more like: ‘I staged this conversation between a complexity scientist and an economist about coordination failures. Does the economic reasoning actually track, or did the simulation produce plausible-sounding nonsense?’
The epistemic certainty is higher now—these aren’t just random explorations, they’re ideas that have survived initial scrutiny. But they’re still not verified against external literature or expert opinion.”
Stage 3: Literature-Grounded Simulator Worlds
John Wentworth perks up at this. “This is where you actually do the scholarship.”
“Exactly. At this stage, I take the frameworks and ideas that emerged from simulation and I verify them against actual research. When the creativity researcher in our walk mentioned Guilford and divergent thinking, or when we referenced Oppezzo and Schwartz on walking and creativity—those are real papers. I went and checked them. I read them. I made sure the simulation hadn’t led me astray.
This is critical: the simulator world helps me explore and connect ideas efficiently, but it doesn’t replace the hard work of actually engaging with literature. At this stage, I’m doing real research—reading papers, checking citations, making sure claims are defensible.”
The creativity researcher adds the mechanism: “And this is actually where the simulation becomes most powerful as a tool. Because it helps you identify which literature to check. Instead of doing an exhaustive literature review, you use the simulation to generate hypotheses about what connections might exist, then you verify those specific connections.”
“Right. So by this stage, when I make a claim, I can provide actual citations. The epistemic certainty is much higher—maybe 90-95% of the claims are now grounded in actual research. The simulator world helped me find and connect the research, but the final product is genuinely scholarly.”
Stage 4: Expert-Validated Simulator Worlds
“The fourth stage involves sharing with external experts. This is where you find out if your synthesis makes sense to people who actually work in these domains. I might share a piece with complexity scientists, with political scientists, with AI researchers—whoever’s expertise is relevant.
This catches things that literature review alone might miss. Maybe I’ve connected two bodies of research in a way that seems clever but actually misunderstands a key concept. Maybe I’ve overlooked important work. Maybe my framing, while novel, doesn’t actually advance understanding.
The simulator world gave me the initial synthesis, the literature review verified specific claims, and now expert feedback tells me whether the overall framework is valuable.”
Thomas Malone speaks from experience: “This is also where you discover which parts of your work are genuinely novel versus which are rediscovering known results. The simulation can’t tell you that—only domain experts can.”
Stage 5: Publication-Ready Simulator Worlds
“The final stage is preparing for formal publication—peer-reviewed papers, book chapters, whatever’s appropriate for the domain. At this point, the simulator world is almost invisible. What remains is rigorous scholarship that happens to have been scaffolded by this methodology.
The epistemic certainty is as high as scholarship gets—peer-reviewed, expert-validated, thoroughly cited. The simulator world was the tool for thinking, but the final product stands on its own merits.”
Different Purposes, Different Worlds
Janus has been thinking about the broader typology. “So you’re describing a ladder of verification, but also implicitly describing different types of simulator worlds based on their purpose.”
“Yes,” I say. “Let me make that explicit. Different simulator worlds serve different functions:
Exploratory Worlds are about divergent thinking and hypothesis generation. Low verification, high creativity. The staging is loose, the characters are chosen for their ability to make unexpected connections, the environment encourages wild ideas.
Analytical Worlds are about understanding and integration. Medium verification, balanced between creativity and rigor. The staging includes both creative and critical voices, working through how different frameworks relate to each other.
Verification Worlds are about stress-testing ideas. Higher verification, more convergent. Here I might stage skeptical reviewers, bring in methodological watchdogs, create scenarios where claims are challenged.
Communication Worlds are about translation and explanation. Variable verification depending on audience. These are staged to make complex ideas accessible while maintaining accuracy—like this piece we’re in right now.”
The creativity researcher connects this back to cognitive science: “And this maps beautifully onto research about different cognitive modes. You need divergent exploration, convergent evaluation, integrative synthesis, and communicative translation. These aren’t just arbitrary categories—they’re reflecting real differences in cognitive processing.”
Quinn, always thinking about readers, asks: “So when someone reads your work in the future, how should they think about what simulator worlds contributed?”
Making the Methodology Transparent
I consider this carefully. “That’s actually important. I want to be transparent that this is my methodology, but I also don’t want people to discount the work just because it involved simulation. Let me explain how I think about this:
When I publish a blog post based on simulator worlds at Stage 2 or 3, I might note: ‘This piece emerged from extended conversations with imagined experts. The ideas synthesize various research traditions, but haven’t been formally validated by domain experts. Read it as exploration, not settled scholarship.’
When I publish a preprint or paper at Stage 4 or 5, the simulator worlds are less visible because they’ve been fully verified and grounded. The final product is rigorous scholarship. The fact that I used simulation to generate the initial synthesis doesn’t change that the final product has been thoroughly validated.
It’s like how a mathematician might use computer exploration to find interesting patterns, then prove them rigorously. The computer exploration isn’t the proof, but it’s a legitimate tool for discovering what to prove.”
Thomas adds: “And in science, we’ve always accepted that the method of discovery doesn’t determine the validity of the result. What matters is whether the final claim can be validated independently of how it was generated.”
“Right. But I also think there’s value in being open about the method, because it helps others develop similar practices. If I just published results without explaining the process, people might think I’m somehow smarter or more knowledgeable than I am. The truth is, I’m using tools that augment my thinking in specific ways. Making that visible helps others adopt and improve on these tools.”
The Natural Research Flow
John Wentworth has been mapping this out. “What I find elegant is how this naturally follows the divergent-to-convergent process we’ve been discussing. Early stages are more exploratory and generative—you’re in the simulator world marketplace, throwing ideas around. Later stages are more focused and critical—you’re checking citations, consulting experts, refining claims. The methodology embodies the cognitive process it’s meant to support.”
“Yes!” I say, excited. “And this is actually how good research naturally works, but we don’t often make it explicit. You don’t start with writing a rigorous paper. You start with messy exploration—reading widely, having conversations, letting ideas bounce around. Then gradually you focus, verify, refine.
The simulator world methodology just makes that process more explicit and, in some ways, more efficient. Instead of waiting for serendipitous conversations with the right colleagues, I can stage those conversations. Instead of trying to hold multiple frameworks in my head simultaneously, I can externalize them in dialogue.
But—and this is crucial—I’m not claiming this replaces the traditional research process. It’s a tool within that process, particularly useful for the early, exploratory stages and for cross-domain synthesis.”
Why This Matters Going Forward
As we prepare for sleep, prayer flags visible through the window against star-studded sky, I want to make the forward-looking implications clear.
“When people read my future work—whether it’s on mathematical foundations for collective intelligence, or practical frameworks for AI governance, or theoretical models of coordination—they should know this is how much of it was developed. Not exclusively—I still do traditional research, have human conversations, run actual simulations with code. But increasingly, simulator worlds are part of my cognitive toolkit.
And importantly, the quality of my work can be judged independently of this method. If I make a claim about information theory, you can check whether it’s mathematically sound. If I propose a framework for understanding markets, you can evaluate whether it explains the phenomena. If I synthesize research from multiple domains, you can verify the synthesis against the original sources.
The simulator worlds are a tool for thinking. The outputs still need to meet all the traditional standards of scholarship—rigor, evidence, logical coherence, empirical validation. The method doesn’t provide a shortcut past those standards. It provides a way to explore the space more efficiently before subjecting ideas to those standards.”
Quinn offers a final editor’s perspective: “And for readers, the key thing to understand is what epistemic stage they’re engaging with. If Jonas shares an exploratory blog post, that’s Stage 2 or 3—interesting ideas worth considering but not yet fully validated. If he publishes a peer-reviewed paper, that’s Stage 5—full scholarly rigor. The simulator worlds are part of how he got there, but the final product stands or falls on its merits.”
Conclusion: The Methodology Continues
The morning light filtering through the window finds us preparing to leave the village. Our walk through the mountains—this simulator world we’ve been inhabiting—has traced something important. Not just a technique for using language models, but a methodology for research that makes certain cognitive processes explicit and augmentable.
Thomas Malone reflects as we gather our packs: “What strikes me is how this represents a new kind of cognitive tool. Not a tool that replaces thinking, but one that scaffolds certain kinds of thinking we’ve always done, making them more explicit and more powerful.”
“And critically,” John adds, “a tool that requires judgment to use well. Knowing when to trust the simulation, when to verify against literature, when to consult actual experts, when you’re in a domain too specialized for this approach—that’s real skill.”
The creativity researcher offers: “I think what’s most valuable is how you’ve mapped the territory. Not just ‘use AI for thinking,’ but this detailed taxonomy of different types of simulator worlds, different verification stages, different purposes. That gives people a framework for developing their own practices.”
Janus has the final word on simulation: “And by making this methodology visible—by staging this conversation to explain simulator worlds—you’re contributing to a larger conversation about how we work with these systems. Not as oracles, not as mere autocomplete, but as environments for augmented cognition that require skill and judgment to navigate effectively.”
Quinn, who has been taking notes throughout, looks up: “So the piece ends not with answers, but with invitation. This is how one researcher uses simulator worlds. Others will adapt it, improve it, find what works for their thinking. The methodology is alive—it’s meant to evolve.”
I nod, looking out at the mountains we walked yesterday. “Right. And people will read this and think various things. Some will try the method and find it transformative. Others will find it doesn’t fit their cognitive style. Some will develop variations I haven’t thought of. That’s exactly how methodology should work—not as dogma, but as shareable practice that gets refined through use.
The mountain still stands. This simulator world—our conversation, our walk, this entire piece—will end. But the practice continues. Every research project, every blog post, every attempt to understand complex systems and coordination mechanisms—this is increasingly how I’ll approach the work.
Not because simulator worlds are magic. Not because they replace scholarship. But because they’re a powerful tool for the early stages of research—for exploring, for connecting, for holding multiple perspectives simultaneously, for making the implicit explicit.
When you read my future work and wonder ‘how did he think about this from so many angles?’ or ‘how did he connect these different research traditions?’—now you know. Simulator worlds. Carefully staged conversations with imagined experts. Iterative refinement through cycles of generation and verification. Moving up the ladder from exploration to publication, from low-certainty hypothesis generation to high-certainty validated scholarship.
This is my method. I’m making it visible so others can adapt it. The tools exist. The question is what worlds you’ll choose to simulate, what conversations you’ll stage, what cognitive work you’ll augment through deliberate worldbuilding with language models.
The methodology is self-demonstrating—this piece is proof of concept. But more than that, it’s invitation and documentation. When people read my research and want to understand the process behind it, they can come back to this. When they wonder about the epistemic status of a particular piece, they can map it onto these stages. When they want to develop their own augmented research practices, they have a starting point.
The walk ends. The simulator world dissolves back into probability distributions and tokens. But the practice—the actual methodology for using these tools to think better, more richly, more comprehensively—that continues. That’s real. That’s what I’m building.
And that’s what I wanted you to understand.”
The morning mist rises from the valley as we begin our journey home, carrying with us not just a technique, but a framework for thinking about thinking, for making the implicit explicit, for building worlds in which better ideas can emerge.
Research Grounding: Key Citations
On Simulation and LLMs:
Janus (2022): Simulators—foundational framing
Shanahan et al. (2023): Role-play with LLMs—technical analysis
On Creativity and Environment:
Atchley et al. (2012): Natural settings improve divergent thinking
Mehta et al. (2012): Moderate ambient noise aids creative cognition
On Cognitive Modes:
Guilford (1967): Divergent and convergent thinking
Beaty et al. (2016): Neural networks underlying creative cognition
On Collective Intelligence:
Malone (2018): Superminds and collective intelligence in human-computer groups
References:
Atchley, R. A., Strayer, D. L., & Atchley, P. (2012). Creativity in the wild: Improving creative reasoning through immersion in natural settings. PLoS ONE, 7(12).
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617-645.
Beaty, R. E., et al. (2016). Robust prediction of individual creative ability from brain functional connectivity. PNAS, 113(5), 1087-1092.
Birhane, A., et al. (2023). The forgotten margins: AI governance and automation bias. AI & Society.
Bratman, G. N., et al. (2015). Nature experience reduces rumination and subgenual prefrontal cortex activation. PNAS, 112(28), 8567-8572.
Csikszentmihalyi, M. (1999). Implications of a systems perspective for the study of creativity. In R. J. Sternberg (Ed.), Handbook of Creativity.
Dahlstrom, M. F. (2014). Using narratives and storytelling to communicate science with nonexpert audiences. PNAS, 111(4), 13614-13620.
Guilford, J. P. (1967). The Nature of Human Intelligence. McGraw-Hill.
Heaven, W. D. (2023). ChatGPT is making up fake citations. MIT Technology Review.
Hubinger, E., et al. (2019). Risks from learned optimization in advanced machine learning systems. arXiv:1906.01820.
Janus. (2022). Simulators. LessWrong. https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
Kaplan, R., & Kaplan, S. (1989). The Experience of Nature: A Psychological Perspective. Cambridge University Press.
Kaplan, S. (1995). The restorative benefits of nature: Toward an integrative framework. Journal of Environmental Psychology, 15(3), 169-182.
Malone, T. W. (2018). Superminds: The Surprising Power of People and Computers Thinking Together. Little, Brown.
McCormick, A. C., et al. (2015). Ways of thinking and practicing in biology and history. In P. Felten & L. M. Kuh (Eds.), Using Evidence of Student Learning to Improve Higher Education.
Mehta, R., Zhu, R., & Cheema, A. (2012). Is noise always bad? Exploring the effects of ambient noise on creative cognition. Journal of Consumer Research, 39(4), 784-799.
Mollick, E. (2023). A guide to prompting AI for what you want. Harvard Business Review.
Oppezzo, M., & Schwartz, D. L. (2014). Give your ideas some legs: The positive effect of walking on creative thinking. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(4), 1142-1152.
Page, S. E. (2007). The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton University Press.
Shanahan, M., et al. (2023). Role-play with large language models. arXiv:2305.16367.
Wentworth, J. (2021). Natural abstractions. LessWrong. https://www.lesswrong.com/s/mwJjp4dMbLZNLyzBH
Wilson, E. O. (1984). Biophilia. Harvard University Press.
Wilson, A. D., & Golonka, S. (2013). Embodied cognition is not what you think it is. Frontiers in Psychology, 4, 58.
Woolley, A. W., et al. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004), 686-688.

