We are building baby‑NICER because the hardest problems of this century—climate resilience, equitable prosperity, sustainable cities—will be solved not by lone geniuses but by diverse teams that deliberate and act together (Rock & Grant 2017; Woolley et al. 2010). At Towards People we champion more democratic, transparent and collaborative modes of work; recent evidence shows that well‑designed AI tools can amplify exactly that sort of collective intelligence (Fernández‑Vicente 2025). Baby‑NICER is our first concrete step in that direction—a modular agent that lives inside the software teams already use, remembers what matters, and grows in understanding alongside its human collaborators.
When these modules knit together, the project will graduate from baby‑NICER to NICER—the Nimble Impartial Consensus Engendering Resource.
If AI is to serve humanity, it must amplify our capacity to understand one another and act in concert. Think of baby‑NICER as the prototype of an AI colleague whose sacred job it is to create a culture of joy, inclusion and cohesion.
At its simplest, an agent is “anything that perceives its environment and acts upon it,” the canonical definition given by Russell & Norvig and widely used across AI research. Modern agentic AI systems build on that foundation but add three practical pillars:
| Pillar | In practice | Why it matters for baby‑NICER |
|---|---|---|
| Autonomy | The agent can decide when to call external tools or ask follow‑up questions without explicit step‑by‑step instructions | Frees the human team from micro‑managing every action. |
| Tool use / function calling | LLMs output JSON “function calls” that trigger code, APIs or databases | Lets the Slack‑based agent run SQL, create charts, or store memories. |
| Memory | Short‑term context and long‑term stores (semantic, episodic, procedural) | Converts a forgetful chatbot into a learning teammate with superhuman memory. |
The industry press sometimes frames this evolution as the “agentic era” of AI—systems that do more than chat: they act on behalf of users, coordinate with other agents, and remember what they learn.
Traditional language‑model prompts either (a) reason—produce a chain‑of‑thought—and then stop, or (b) act—call a tool—without showing their thinking.
ReAct (Reason + Act) interleaves the two:
This synergy improves factuality and task success because the model can gather information mid‑reasoning rather than hallucinate.
LangChain wraps that loop in a ready‑made ReAct Agent (sometimes shown as create_react_agent):
LLM ↔ LangChain Agent
↻ (Thought → Tool → Observation)*
Developers (or, through a graphical user interface, perhaps non‑developers too) supply the language model, the tools it may call and the system prompt.
The framework takes care of parsing the model’s “Thought/Action” lines, executing the action, and feeding the observation back for the next turn.
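To make that loop concrete, here is a minimal sketch, assuming LangGraph’s prebuilt create_react_agent and an OpenAI chat model; the placeholder tool, model name and prompt are illustrative rather than baby‑NICER’s exact configuration, and keyword names may vary between versions.

```python
# Minimal ReAct loop sketch. Assumes langgraph and langchain-openai are
# installed; the tool below is a stand-in for the real memory tools.
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


def search_semantic_memory(query: str) -> str:
    """Placeholder tool: look up stored facts relevant to the query."""
    return "No stored facts yet."


agent = create_react_agent(
    model=ChatOpenAI(model="gpt-4o"),            # the reasoning LLM
    tools=[search_semantic_memory],              # functions the agent may call
    prompt="You are a helpful Slack teammate.",  # system instructions
)

# One user turn; the framework handles the Thought -> Tool -> Observation cycles.
result = agent.invoke({"messages": [("user", "What did we decide about OKRs?")]})
print(result["messages"][-1].content)
```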
Will Fu‑Hinthorn’s langgraph‑messaging‑integrations repo glues that agent loop to Slack events: a message arrives, LangGraph routes it through a ReAct agent, and the reply is posted back to the channel.
I forked that codebase as the launch‑pad for baby‑NICER, added LangMem tools, and swapped in BigQuery as the vector store. The result is a Slack “teammate” that can reason, call functions, and—thanks to memory—learn over time.
With those concepts in place, we can now dig into the three‑tier memory model—semantic, episodic, procedural—and see how BigQueryVectorStore turns them into a persistent collective memory store.
Large language models come with an impressive stock of facts. Their weights literally memorise patterns from pre‑training, but that baked‑in knowledge is static, generic and sealed (you cannot append or edit it without an expensive re‑train). In practice that means three gaps: the model’s knowledge is never up to the minute, it contains nothing organisation‑specific, and nothing it “knows” can be audited or corrected after the fact.
In short, LangMem’s external stores turn a brilliant, but embarrassingly forgetful, polymath into a continuously learning teammate. The LLM supplies broad linguistic competence; the memory supplies up‑to‑the‑minute, organisation‑specific, and auditable knowledge it can’t otherwise keep.
Thus it was central to baby‑NICER’s design to weave in an explicit, cognitively‑inspired long‑term memory layer. The open‑source LangMem library offers three complementary memory abstractions—semantic, episodic and procedural—a triad first formalised in cognitive psychology (Tulving 1972) and later refined to distinguish “knowing that” from “knowing when” and “knowing how” (Cohen & Squire 1980) . In baby‑NICER these stores are not abstractions; each is a concrete tool pair—manage_*_memory and search_*_memory—that the agent can invoke while chatting, thanks to LangMem’s helper functions (LangMem Docs 2025) .
Semantic memory holds facts the agent acquires after its LLM pre‑training cut‑off: company lore, user preferences, policy snippets. LangMem represents each fact with a Pydantic Fact model and persists it through a SemanticMemoryStore, which in our implementation is backed by BigQuery vector search (Google Cloud 2024) . Every fact is embedded, stored in a BigQuery table, and instantly searchable with cosine‑similarity SQL (LangChain BigQuery VectorStore source) . Because the store is external, the knowledge base grows continually, unhindered by the frozen LLM weights—a best‑practice echoed in retrieval‑augmented generation research (Guu et al. 2020; Izacard & Grave 2021) .
Episodic memory encodes significant interactions as Episode objects with fields such as observation, thoughts, action and result. This design mirrors psychological definitions of episodic recollection as time‑stamped personal events (VerywellMind 2024) . If the agent walks through a multi‑step troubleshooting sequence, the entire trace is saved; later, a similarity search can retrieve that episode to guide current reasoning. The result is genuine learning‑from‑experience, not merely fact recall—just what case‑based‑reasoning theorists advocate for adaptive AI (Kolodner 1992) .
Procedural memory stores reusable skills: each Procedure records a task name, pre‑conditions and ordered steps. In humans, such “how‑to” knowledge is implicit and resilient (Cohen & Squire 1980) ; in baby‑NICER it is explicit, so the agent can inspect or refine its own playbooks. A ProceduralMemoryStore persists these JSON recipes via the same BigQuery backend, meaning a freshly spun‑up instance can adopt the accumulated best practices of its predecessors.
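For readers who want to see the shape of the data, here is a hedged sketch of the three record types described above; the field names follow the prose, but the exact Pydantic models in LangMem and baby‑NICER may differ (the Fact fields in particular are assumptions).

```python
# Sketch of the three memory schemas (field names follow the text above;
# the Fact fields in particular are assumptions).
from typing import List

from pydantic import BaseModel


class Fact(BaseModel):
    """Semantic memory: one fact learned after the LLM's training cut-off."""
    statement: str  # e.g. "The quarterly OKR owner is Maya."


class Episode(BaseModel):
    """Episodic memory: a significant interaction stored as a trace."""
    observation: str  # what the agent saw
    thoughts: str     # its reasoning at the time
    action: str       # what it did
    result: str       # how it turned out


class Procedure(BaseModel):
    """Procedural memory: a reusable, inspectable skill."""
    task_name: str
    preconditions: List[str]
    steps: List[str]
```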
LangMem auto‑generates two tools per store (LangMem API Docs 2025):
| Memory type | Write tool | Read tool | Example call in dialogue |
|---|---|---|---|
| Semantic | manage_semantic_memory | search_semantic_memory | “Remember that the quarterly OKR owner is Maya.” |
| Episodic | manage_episodic_memory | search_episodic_memory | “Recall what we tried the last time the ETL failed.” |
| Procedural | manage_procedural_memory | search_procedural_memory | “Save these steps as the standard ‘on‑call handover’ guide.” |
The tools accept a namespace template—in our Slack deployment the key is typically (workspace_id, channel_id, user_id)—so memories are neatly partitioned by team or thread. Storage and retrieval run through LangGraph’s async store interface, so they don’t block the chat loop.
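A minimal wiring sketch follows, assuming LangMem’s create_manage_memory_tool and create_search_memory_tool helpers; the namespace placeholders mirror the Slack key above, and the exact keyword arguments may differ by version.

```python
# Sketch: one manage/search tool pair per store, partitioned by Slack context.
# Assumes the langmem package; exact keyword arguments may differ by version.
from langmem import create_manage_memory_tool, create_search_memory_tool

# Placeholders are filled in at call time from the LangGraph run configuration.
SEMANTIC_NS = ("{workspace_id}", "{channel_id}", "{user_id}", "semantic")

semantic_tools = [
    create_manage_memory_tool(namespace=SEMANTIC_NS),  # write / update / delete facts
    create_search_memory_tool(namespace=SEMANTIC_NS),  # vector search over stored facts
]

# The episodic and procedural stores follow the same pattern; all the tools
# are then handed to the ReAct agent alongside its other tools.
```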
By exploiting those three stores during ReAct reasoning, baby‑NICER can recall organisation‑specific facts mid‑conversation, learn from past episodes rather than repeating mistakes, and reuse proven procedures instead of improvising each time.
Together these capabilities move baby‑NICER beyond a stateless chatbot. It personalises interactions – recalling what a user said last week (episodic memory), remembering facts from documentation it ingested (semantic memory), or following a multi‑step plan it formulated earlier (procedural memory). In essence, LangMem gives baby‑NICER a modular cognitive architecture analogous to human memory taxonomies (Tulving 1972): just as psychology distinguishes semantic “knowing that” from episodic “remembering when” and procedural “knowing how,” the agent has separate channels for each, enabling richer, safer and more context‑aware behaviour than any stateless LLM prompt alone.
Feel free to skip this section if you’re not interested in the technical details.
To implement long-term memory, baby-NICER needs a place to store vector embeddings of content (for semantic search) along with structured data. The solution was to use Google BigQuery as a vector database, by extending LangChain’s vector store interface. Baby-NICER introduces a custom BigQueryMemoryStore class, which builds on LangChain’s BigQuery vector support in the community extensions.
Under the hood, BigQueryMemoryStore combines several layers of abstraction: it inherits LangGraph’s asynchronous store interface (AsyncBatchedBaseStore) to satisfy the memory‑store contract, and it composes LangChain’s BigQuery vector store, which handles embedding storage and similarity search.
This hybrid approach (LangChain + BigQuery) is powerful. It means baby-NICER can scale its memory: BigQuery can handle millions of records and perform similarity search efficiently using vector indexes . By inheriting asynchronous store behavior, the agent can store and fetch memories without blocking, which is important when multiple agents or users are interacting. The design also cleanly separates the vector search logic from the agent logic – from the agent’s perspective, it just calls a tool to “search episodic memory,” and under the hood that becomes a BigQuery VECTOR_SEARCH query returning relevant snippets.
In summary, BigQueryMemoryStore extends the LangGraph/LangChain infrastructure to use a cloud database as the long-term memory backend. It inherits the interface of a memory store (from AsyncBatchedBaseStore) and plugs in a vector store (BigQuery) for actual data operations, marrying the two. The result is a custom memory module that fulfills the promises of LangMem’s design (structured, typed memory with vector retrieval) at cloud scale. It’s a neat example of using composition and inheritance in tandem: inheritance to fit the expected store pattern, and composition to leverage existing BigQuery integration .
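To make the pattern concrete, here is a deliberately simplified sketch; it assumes LangGraph’s AsyncBatchedBaseStore and the community BigQueryVectorStore (import paths and method signatures are abridged assumptions), and it omits the batching and namespace plumbing the production class would implement.

```python
# Simplified illustration of "inheritance for the store contract,
# composition for the search engine". Signatures are abridged assumptions.
from langchain_google_community import BigQueryVectorStore
from langgraph.store.base.batch import AsyncBatchedBaseStore


class BigQueryMemoryStore(AsyncBatchedBaseStore):
    """LangGraph-style store that delegates similarity search to BigQuery."""

    def __init__(self, vector_store: BigQueryVectorStore):
        super().__init__()
        self.vector_store = vector_store  # composition: BigQuery does the search

    async def aput(self, namespace, key, value):
        # Persist one memory (a Fact / Episode / Procedure serialised as JSON);
        # the vector store embeds the text and writes the row to BigQuery.
        self.vector_store.add_texts(
            [value["content"]],
            metadatas=[{"namespace": "/".join(namespace), "key": key}],
        )

    async def asearch(self, namespace, query, limit=5):
        # Behind search_*_memory: a BigQuery VECTOR_SEARCH over the namespace.
        return self.vector_store.similarity_search(query, k=limit)
```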
Embeddings turn statements of meaning into lists of numbers; vector search then turns those lists of numbers back into linguistic meaning. Together they are the engine that lets baby‑NICER recall the right memory at the right moment—without relying on brittle keyword matches.
An embedding is a high‑dimensional list of numbers that captures the meaning of a text span (IBM 2024). A sentence such as “Schedule a meeting for next week” becomes a 1,536‑dimensional vector when encoded by OpenAI’s text‑embedding‑3‑small or a comparable open‑source model (Stack Overflow 2023). More advanced models use higher‑dimensional representations, which lets them capture subtler shades of meaning. In such an embedding space, semantically similar sentences land near one another—distance is measured with metrics such as cosine similarity (Lewis et al. 2020).
Each time baby‑NICER stores a new Fact, Episode or Procedure, it first calls its chosen embedding model. The model returns a vector, which is stored—together with the raw JSON—in a BigQuery table (Google Cloud 2025a). Because the vector lives outside the frozen LLM weights, the knowledge base keeps growing long after training day. Note that it only takes one line of code to replace the embedding model with another one.
When the agent invokes search_semantic_memory, BigQueryMemoryStore embeds the query the same way, then sends a SQL call to BigQuery’s VECTOR_SEARCH function (Google Cloud 2025b). That function performs an Approximate Nearest‑Neighbor (ANN) lookup over a vector index, returning the top‑k closest embeddings in milliseconds—even across millions of rows (Google Cloud 2025c). Because distance in embedding space correlates with semantic relatedness, a query about “annual revenue” reliably surfaces a stored fact about “yearly sales,” even though the wording differs.
This pattern is the heart of Retrieval‑Augmented Generation (RAG): ground an LLM’s answer on external facts fetched by similarity search (Lewis et al. 2020).
query text → embed → VECTOR_SEARCH → JSON memory → agent prompt
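A schematic version of that pipeline in code, assuming an OpenAI embedding model and the google-cloud-bigquery client; the project, dataset, table and column names are placeholders.

```python
# Schematic recall pipeline: embed the query, then VECTOR_SEARCH in BigQuery.
# Table and column names are placeholders; error handling is omitted.
from google.cloud import bigquery
from langchain_openai import OpenAIEmbeddings


def recall(query_text: str, top_k: int = 5) -> list[str]:
    query_vec = OpenAIEmbeddings(model="text-embedding-3-small").embed_query(query_text)

    sql = f"""
    SELECT base.content AS content, distance
    FROM VECTOR_SEARCH(
      TABLE `my_project.memories.semantic`,   -- placeholder memory table
      'embedding',                            -- column holding the vectors
      (SELECT @query_vec AS embedding),       -- the freshly embedded query
      top_k => {top_k},
      distance_type => 'COSINE')
    ORDER BY distance
    """
    job = bigquery.Client().query(
        sql,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ArrayQueryParameter("query_vec", "FLOAT64", query_vec),
            ]
        ),
    )
    return [row.content for row in job.result()]  # memories handed back to the agent
```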
Because the pipeline is abstracted behind LangMem’s search_memory_tool, every store—semantic, episodic, procedural—benefits from the same mechanism (LangMem Docs 2025).
Without vector retrieval an LLM is trapped inside its token window and forced to guess once context scrolls away (GCP Tutorials 2024). Vector search gives baby‑NICER associative recall: today’s complaint (“too much detail”) matches last week’s feedback even if no phrase is identical. Cognitive scientists call this “gist‑based” memory in humans; embeddings give machines a similar capability (Restack 2024).
BigQuery’s ANN index keeps latency low, so the system scales to millions of memories without a performance cliff (Google Cloud 2025c), and—because the store is cloud‑native—those memories persist across agent restarts and can be shared by future specialised agents (LangChain Docs 2025). Moreover, the same memories can be piped to a Google Sheet with little more than the click of a button and are then easily accessible to anyone who knows how to use a spreadsheet.
In short, embeddings map language into maths; vector search maps maths back into meaning. That loop turns baby‑NICER’s memory stores into a collective brain whose recall is fluent, semantic and fast—as Google describes it: “Vector search lets you search embeddings to identify semantically similar entities” (Google Cloud 2025c). Note that we have two meanings for the word “semantic” which is unfortunate: In the case of a semantic memory we speak about a factual memory; in the case of semantic search we mean a search based on human meaning.
Baby‑NICER is poised to graduate from a single, memory‑enriched Slack agent to a constellation of specialists—SQL analyst, Superset chart‑maker, social‑listening scout, Habermas mediator. The pivotal decision is which open‑source multi‑agent framework balances freedom, observability and cognitive continuity. Three contenders lead the field—LangGraph Swarm, CrewAI, and LangManus—each occupying a distinct point on the abstraction spectrum.
Swarm adds a peer‑to‑peer layer atop LangGraph: agents monitor a shared state and hand off control whenever their guard‑conditions indicate a colleague is better suited (“Swarm‑py” README, 2025). Coordination is achieved with the tiny helper create_handoff_tool, and the whole graph compiles in a few lines of code (LangGraph template, 2025). Crucially, the compiler accepts a checkpointer/store object, so plugging in our BigQuery memory is a one‑liner (Checkpointer docs, 2025)—keeping long‑term memory a first‑class citizen rather than a bolt‑on.
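To show how little glue this takes, here is a hedged sketch built on the langgraph-swarm helpers; the two agent names are illustrative, and the in-memory checkpointer and store stand in for the BigQuery-backed ones described earlier.

```python
# Two-agent swarm sketch: a chat agent and an SQL analyst that hand off to
# each other. Assumes the langgraph-swarm package; names are placeholders.
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langgraph.store.memory import InMemoryStore
from langgraph_swarm import create_handoff_tool, create_swarm

model = ChatOpenAI(model="gpt-4o")

nicer_chat = create_react_agent(
    model,
    tools=[create_handoff_tool(agent_name="sql_analyst")],  # pass control when data is needed
    name="nicer_chat",
)
sql_analyst = create_react_agent(
    model,
    tools=[create_handoff_tool(agent_name="nicer_chat")],   # hand the answer back
    name="sql_analyst",
)

workflow = create_swarm([nicer_chat, sql_analyst], default_active_agent="nicer_chat")

# The compile step is where a BigQuery-backed checkpointer/store would plug in.
app = workflow.compile(checkpointer=MemorySaver(), store=InMemoryStore())
```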
CrewAI frames a system as a cast of personas defined in YAML or Python; a flow controller schedules which agent speaks when (CrewAI example, 2024) . Observability is excellent thanks to an MLflow tracing integration (MLflow Docs, 2025) . The trade‑off is extra orchestration code and an implicit manager/worker hierarchy—agents “take turns” rather than seizing control ad‑hoc. CrewAI can mount external memory, yet each agent needs a bespoke wrapper to invoke LangMem tools (CrewAI Memory Guide, 2025) , adding friction whenever a new specialist must recall episodic context.
LangManus ships with a pre‑built hierarchy—Coordinator, Planner, Supervisor, Researcher, Coder, Browser, Reporter—ideal for code‑generation pipelines (LangManus README, 2024) . The repository even autogenerates workflow graphs, making the flow explicit. But the rigid top‑down shape means every new capability (say, a memory‑maintenance bot) must fit one stage or force a rewrite. Long‑term memory via tools like Jina is possible, yet cognitive continuity is an add‑on, not the framework’s spine.
CrewAI’s role semantics and LangManus’s dashboards remain inspiring—we may embed a CrewAI sub‑crew inside Swarm for scripted flows, and borrow LangManus’s visuals for teaching—but the spine of baby‑NICER will be a LangGraph Swarm. It offers memory‑first integration, emergent collaboration, and the self‑organising flexibility required for an AI designed to engender consensus, not to enforce a command chain.
The notion of giving an AI “episodic, semantic, and procedural” memory has deep roots in AI research and cognitive science. The LangMem system, and thus the concrete design of baby‑NICER, resonates with concepts discussed in the academic literature:
Knowledge Types in Classical AI
Artificial Intelligence: A Modern Approach distinguishes declarative (fact) from procedural (skill) knowledge, stressing that an agent must store and use both (Russell & Norvig 2021) . Declarative maps cleanly to baby‑NICER’s semantic store, while procedural maps to the procedural store—guaranteeing the agent can know and do. Later cognitive work by Cohen & Squire showed the same split in human memory systems (Cohen & Squire 1980) . Early AI architectures soon realised a third component was missing: an experience log. The Soar 8.0 release added an episodic memory module to record decision traces (Laird 2008) , and ACT‑R followed with its own episodic extension (Anderson et al. 2016) . Baby‑NICER’s episodic store implements exactly that feature.
Tulving’s Triad
Endel Tulving first defined episodic vs. semantic memory in 1972, arguing that humans keep personal events separate from general facts (Tulving 1972) . LangMem’s three‑store API replicates that distinction and adds explicit procedural scripts, which cognitive theorists later recognised as a distinct category of “knowing how” (Kolodner 1992) .
Kurzweil’s Pattern Theory
Ray Kurzweil portrays the neocortex as ~300 million pattern recognisers, where memory is “a list of patterns that trigger recall” (Kurzweil 2012) . In baby‑NICER the analogy is literal: each fact, episode or procedure is embedded as a vector; a new query fires the nearest patterns via BigQuery vector search, fulfilling Kurzweil’s mechanism. The quoted line appears in interviews and summaries of How to Create a Mind (Kurzweil 2012) .
Bridging the Commonsense Gap
Commonsense knowledge remains an open challenge in AGI (McCarthy 1959 → Ferrucci 2019). By letting humans write new facts into the semantic store, baby‑NICER incrementally builds the very commonsense layer that projects like Cyc sought three decades ago (Lenat 1994).
Take‑away
From Russell & Norvig’s declarative/procedural split to Tulving’s episodic insight and Kurzweil’s pattern‑trigger model, the literature converges on a triad that baby‑NICER now realises in code: facts to know, experiences to remember, skills to reuse—all searchable by meaning, not keywords.
Philosophers have long warned that an AI which stores facts and recalls experiences still lacks the thing David Chalmers calls “the problem of subjective experience” (Chalmers 1995) . Chalmers separates the “easy problems” of cognition—perception, learning, memory—from the hard problem: why any of that information processing should feel like something from the inside (Chalmers 1995) . Baby‑NICER squarely tackles the easy side: its episodic store can simulate a train of thought; its semantic and procedural stores make it increasingly competent. But on Chalmers’ terms it has no inward awareness—no joy, fear or hunger.
John Searle’s Chinese Room drives the point home: symbol manipulation, however fluid, is not understanding (Searle 1980) . Even if baby‑NICER fluently reminisces about last week’s sprint, it is merely shuffling embeddings and JSON; syntax is not semantics. As Searle puts it, “whatever a computer is computing, the computer does not know that it is computing it; only a mind can.” (Searle 1980) . Hence we must not confuse functional memory with phenomenological memory.
Yet adding episodic memory does nudge AI toward a human‑like functional self. Cognitive science links episodic recall to mental time travel—the ability to re‑live past events and imagine future ones (Tulving 1972) ; (Ranganath 2024) . Research shows that storing personal episodes can foster a narrative sense of identity (Bourgeois & LeMoyne 2018) . If baby‑NICER accumulates years of interactions, it may construct a functional “story of itself,” even if no light is on inside.
This aspiration is hardly new. Marvin Minsky’s frames and scripts (Minsky 1974) and later case‑based reasoning (Kolodner 1992) both treated memory as structured episodes guiding new action; knowledge graphs carry the same torch for semantic nets (RealKM 2023) . Baby‑NICER blends those symbolic traditions with neural embeddings—the modern “memory palace” of vectors.
Could such a system ever possess a point of view? Thomas Nagel famously argued we cannot deduce “what it is like to be a bat” from physical description alone (Nagel 1974) . Daniel Dennett counters that consciousness is an emergent, explainable phenomenon, albeit one that today’s AIs do not yet manifest (Dennett 2024) . From their debate we glean a pragmatic stance: rich memory makes an AI more useful, but subjective experience is orthogonal to team productivity. Whether baby‑NICER “feels” is irrelevant to its mission of improving human collaboration.
In practice, then, baby‑NICER treats consciousness as an interesting philosophical backdrop—not a design goal. Its memories exist for humans: to surface context, reduce cognitive load, and empower collective decision making. Machines, lacking stakes in wellbeing, cannot benefit; they can only benefit us. Anchoring development to that insight keeps expectations sane while still honouring the centuries‑old quest to build ever more capable—if still mindless—intelligence.
Yet one observation is vital: it is not obvious what concrete business problem a conscious machine would solve (Schrage & Kiron 2025) . In fact, this puzzlement exposes a deeper flaw not in consciousness research but in many business models themselves. Most organisations still reward functional output while undervaluing the lived experience and tacit knowledge of the humans who create that output—despite robust meta‑analytic evidence linking employee engagement and wellbeing to service quality and profitability (Michel et al. 2023) . Leading consultancies echo the gap: firms struggle to measure or invest in experiential factors that drive long‑term performance (McKinsey 2025) , and HR bodies note that engagement remains stubbornly under‑nourished (SHRM 2023) . Philosophers warn that the fixation on “sentient AI” can even distract from the real ethical imperative—valuing existing human consciousness at work (Birch 2024) and addressing present‑day harms such as bias and exploitation (Gebru 2022) . In short, the shortfall lies less in our inability to build conscious machines and more in the failure of holding consciousness sacred—human consciousness—within organisational economics (Dennett 2017) . Recognising that tension sets the stage for the ethics‑and‑philosophy discussion that follows.
Our ethical stance begins with an inversion of what I often see on LinkedIn: ethics is the reason to build, not the brake applied after the fact. Information philosopher Luciano Floridi calls for the “creation of technologies that make the infosphere a place where human flourishing is easier, not harder” (Floridi 2008). Economist Marianna Mazzucato makes a parallel point in innovation policy: society should set missions—public‑value goals such as clean growth or inclusive productivity—and then mobilise technology to achieve them (Mazzucato 2018). Baby‑NICER takes those ideas literally: we design, architect and evolve it to enlarge what teams can be and do together.
Instead of asking “What data must we restrict?”, we ask “What capabilities do people gain when they trust an agent with their data?” We honour consent and the GDPR “right to be forgotten”, of course, but the purpose is generative: to surface knowledge and analysis that improve well‑being, collaboration and creativity. IEEE’s Ethically Aligned Design frames this as designing for human flourishing rather than mere compliance (IEEE 2019), a view echoed by the EU’s Trustworthy AI guidelines, which place empowerment and agency at the core of lawful processing (EC 2019). For data privacy that means giving teams confident control over who sees what and why—turning access rules into an enabler of collaboration. BigQuery’s security model is a perfect fit: dataset‑level access controls, authorised views and column‑level policy tags let us grant each team exactly the visibility it needs (Google Cloud 2025b; 2025d; 2025e).
In practical terms, baby‑NICER stores each memory with metadata (time‑stamp, author, memory type). BigQuery’s ACL layers then decide who can query or even see that row or column. Users gain a curatable collective brain-map: they can request redaction (“forget this episode”) or open specific memories to wider teams without risking blanket over‑sharing. Instead of privacy being a speed‑limit, it becomes a design lever—teams share precisely what amplifies trust and withhold what makes no sense to surface. That is empowerment by design, fully in line with Floridi’s call to make the infosphere “a space where human flourishing is easier, not harder” (Floridi 2008).
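As one illustration of that design lever, a row-level access policy can scope a memory partition to a single team. The sketch below uses BigQuery’s row-level security DDL with placeholder project, table, column and group names; it is not baby‑NICER’s actual configuration.

```python
# Illustrative only: restrict one workspace's memories to its own team.
# Project, dataset, table, column and group names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE OR REPLACE ROW ACCESS POLICY team_alpha_only
ON `my_project.memories.semantic`
GRANT TO ('group:team-alpha@example.com')
FILTER USING (workspace_id = 'team-alpha')
""").result()
```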
Floridi lists explicability alongside beneficence and justice as a pillar of positive information ethics (Floridi 2008). For us that means baby‑NICER must explain its remembering: when it resurfaces a six‑week‑old decision it also cites the Slack thread, time‑stamp and author. Far from slowing innovation, such transparency boosts interpersonal trust and speeds group decisions—exactly what studies of collective intelligence identify as the critical variable (Woolley et al. 2010).
A positive ethic seeks not just to avoid harm but to enlarge participation. Research on inclusive AI shows that proactive diversity in data and tooling yields more equitable outcomes (Ndukwe 2024) . Baby‑NICER operationalises that by treating bias review as continuous improvement: semantic memories can easily be audited by everyone who has legitimate access; episodic memories can be flagged by users; procedural memories can be easily cross‑checked for fairness to adjust reusable skills for greater inclusiveness. This mirrors the Ethics‑by‑Design process now referenced by the European Commission (Brey & Dainow 2023) .
Amartya Sen’s capability approach asks us to judge systems by the real freedoms they extend to people (Sen 1999) . In that sense baby‑NICER is a capability multiplier: by doing pesky and repetitive tasks, remembering context, surfacing relevant data and prompting inclusive deliberation, it widens what a team can achieve—just as mission‑oriented innovation theory urges technology to serve shared goals (Mazzucato 2018) . Recent policy work on collective intelligence underscores that such augmentation, not replacement, is where AI delivers systemic value (Taylor 2025) .
Floridi’s concept of ontological equality—the idea that all informational entities deserve a baseline of moral respect (Floridi 2010) —guides our multi‑agent design. Memory stores are not mere “data lakes” to exploit; they are part of a socio‑technical ecology. Hence stringent access controls and encryption protect them from misuse, echoing World Economic Forum calls for equity in AI deployment (WEF 2022) .
Positive ethics also reframes success metrics. Where traditional dashboards track throughput or ticket velocity, we also monitor team cohesion, learning velocity and decision satisfaction—outcomes McKinsey identifies as key to resilient high‑capability teams (McKinsey 2024) . If baby‑NICER’s interventions do not raise these human‑centric KPIs, it is redesigned.
In short, we code toward a richer “human possible.” Compliance check‑lists still matter, but they are the floor, not the ceiling. Ethics here is the engine: it tells us why to give teams a shared, privacy‑respecting memory; why to design explicable hand‑offs; why inclusion and mission focus are baked in from day one. Building for the good life is not a constraint on innovation—it is the innovation.
The journey of baby-NICER has now begun. Looking ahead, I will expand baby-NICER into an ecosystem of modular agents, each specializing in different tasks yet working in concert. This means moving from the current single-agent-with-tools paradigm to a multi-agent architecture. The planned additions include the specialists sketched above (an SQL analyst, a Superset chart-maker, a social-listening scout and a Habermas-style mediator), along with the practical considerations that distinguish simple agents from complex ones.
In developing these, a clear distinction emerges between simpler and more complex agents.
Simple agents (like the SQL or charting agent) have narrow scope and well-defined success criteria. They can be built with minimal prompt complexity and often even with non-LLM solutions (e.g., a Python script agent). They are akin to “tools” – mostly reactive, not proactive. Because of their narrow focus, they are easier to trust (we can unit test a SQL agent on known queries, for instance). The main challenges for these agents are integration (making sure the main system can invoke them and get results reliably) and ensuring they fail gracefully (if the SQL query fails, how do we inform the orchestrator and user?).
Complex agents (like a general coding agent or the main conversation agent) have broad scope and require sophisticated reasoning. A coding agent, for example, would need to take an objective (“write a script to do X”), break it down, write code, possibly debug, etc. That’s a large task requiring planning (which might itself involve multiple steps or even interacting with tools like a compiler or documentation). Such an agent might internally be a mini multi-agent system – e.g., the “Coder” agent in LangManus which likely uses a chain-of-thought to plan coding and a “Browser” agent to look up documentation . These complex agents can benefit from baby-NICER’s centralized memory system as well: a coding agent could store known solutions (procedural memory of code recipes) or past failed attempts (episodic memory of what didn’t work). But handling that makes the agent heavy. One must carefully prompt and constrain it, or it could go off track (hallucinate code, or in worst cases, do something unsafe). Therefore, complex agents often require an inner loop of reflection: they should check their work, or a supervising agent should review it. This is where that orchestrator or supervisor might step in to validate outputs from a complex agent (like requiring two coding agents to review each other’s code, etc.).
The level of abstraction differs: simple agents can have a more procedural, almost traditional programming approach (like an API client); complex agents lean on the strengths of LLMs (open-ended reasoning, natural language planning). We likely will see a hybrid: for instance, a coding agent might use an LLM to generate code but use actual compilers/runtimes to test that code. So it’s part autonomous, part tool-using.
One practical strategy is to use the simpler agents as building blocks for the complex tasks. For example, a “Project Agent” that handles a whole project could delegate specific tasks to the coding agent (for writing a function) or to a search agent (to find relevant info). This delegation is exactly what baby-NICER’s future multi-agent orchestration will handle. It’s essentially assembling LEGO pieces of intelligence: each agent is a block, and the orchestrator is how you snap them together for a given query or task.
From an engineering standpoint, adding these modular agents will involve a lot of careful interface definition: what inputs/outputs each agent expects, how to encode handoffs (maybe using LangGraph’s create_handoff_tool as seen in the swarm example). Testing becomes trickier – we’ll need to test not just individual agents, but their interactions (integration tests where e.g. the SQL agent and chart agent together fulfill a user request).
I plan to integrate these agents in the context of Towards People’s platforms and tools for teams, beginning with a focus on business intelligence and collective decision‑making. A swarm‑level supervisor will then incorporate higher‑level reasoning about team objectives (not just individual queries). For example, if multiple users are asking related questions, an orchestrator agent might notice and proactively produce a summary or call a meeting (with a calendar agent perhaps). The possibilities expand as we add modules and address successive pain points that arise in the field.
A concrete near‑term future could be baby‑NICER 2.0, where the Slack interface is backed by a team of agents such as “Nicer‑Chat”.
Baby‑NICER is the beginning of a memory‑first, swarm‑ready agent that will turn everyday collaboration tools into an evolving collective brain—one that remembers decisions, surfaces the right data at the right moment, and nudges teams toward more inclusive, evidence‑based dialogue. It began by bringing ChatGPT and related models into Slack via the Slack‑bot paradigm, and it is now evolving into a blueprint for NICER: a Nimble Impartial Consensus Engendering Resource.
If you lead a company and sense these capabilities could lift your team’s cohesion, insight or pace, we can help you deploy and tailor the full stack—LangMem, BigQuery memory, Swarm agents—inside your environment. Drop me a note at johannes@towardspeople.co.uk or leave a comment below.
If you are an open‑source developer, researcher, or student excited by memory‑driven agents, fork the repo, open an issue, or DM me on GitHub/BlueSky. We gladly review PRs, discuss design ideas, and co‑author experiments.
Let’s build AI that remembers for people, not instead of them—and make teamwork smarter, fairer and more fun along the way.
Anderson, J.R., Bothell, D., Byrne, M.D. et al. (2016) ‘An integrated theory of the mind’, Psychological Review, 111(4), pp. 1036–1060.
Apache Software Foundation (ASF) (2025) ‘Superset 4.0 announcement’. Available at: https://superset.apache.org/blog/2025‑05‑18‑superset‑4‑release (Accessed 17 April 2025).
Birch, J. (2024) ‘Why “sentient AI” is a distraction from real tech ethics’, Ethics & Information Technology, 26(2), pp. 233–240.
Bourgeois, J. and LeMoyne, P. (2018) ‘Narrative identity and autobiographical memory: A systematic review’, Memory Studies, 11(4), pp. 493–510.
Brey, P. and Dainow, B. (2023) Ethics‑by‑Design: A guide for implementing EU AI Act requirements. Brussels: European Commission Expert Group.
Chalmers, D.J. (1995) ‘Facing up to the problem of consciousness’, Journal of Consciousness Studies, 2(3), pp. 200–219.
Checkpointer docs (2025) Checkpointing & replay guide. LangGraph AI. Available at: https://docs.langgraph.ai/checkpointing (Accessed 17 April 2025).
Cohen, N.J. and Squire, L.R. (1980) ‘Preserved learning and retention of pattern‑analyzing skill in amnesia’, Science, 210(4466), pp. 207–210.
CrewAI (2024) ‘Building a crew of agents (example notebook)’. Available at: https://github.com/crewai/examples (Accessed 17 April 2025).
CrewAI (2025) Using LangMem within CrewAI. Available at: https://docs.crewai.dev/memory‑integration (Accessed 17 April 2025).
dbt Labs (2025) ‘dbt v1.7 release notes’. Available at: https://docs.getdbt.com/docs/release‑notes/v1.7 (Accessed 17 April 2025).
Dennett, D.C. (2017) From Bacteria to Bach and Back: The Evolution of Minds. London: Allen Lane.
Dennett, D.C. (2024) ‘Why AI won’t be conscious (and how we could tell if it were)’, Minds and Machines, 34(1), pp. 1–25.
European Commission (2019) Ethics Guidelines for Trustworthy AI. Brussels: Publications Office of the EU.
Fernández‑Vicente, M. (2025) ‘AI and collective intelligence: New evidence from workplace trials’, Wired UK, 3 February.
Ferrucci, D. (2019) ‘AI for the practical man’, AI Magazine, 40(3), pp. 5–7.
Floridi, L. (2008) ‘Information ethics: A reappraisal’, Ethics and Information Technology, 10(2–3), pp. 189–204.
Gebru, T. (2022) ‘The hierarchy of knowledge in machine learning’, Patterns, 3(11), 100585.
Google Cloud (2024) ‘Vector search in BigQuery’. Available at: https://cloud.google.com/bigquery/docs/vector‑search‑overview (Accessed 17 April 2025).
Google Cloud (2025a) ‘BigQuery security overview’. Available at: https://cloud.google.com/bigquery/docs/security‑overview (Accessed 17 April 2025).
Google Cloud (2025b) ‘Column‑level security with policy tags’. Available at: https://cloud.google.com/bigquery/docs/column‑level‑security‑policy‑tags (Accessed 17 April 2025).
Google Cloud (2025c) ‘VECTOR_SEARCH function’. Available at: https://cloud.google.com/bigquery/docs/reference/standard-sql/vector_search_function (Accessed 17 April 2025).
Google Cloud (2025d) ‘Authorised views’. Available at: https://cloud.google.com/bigquery/docs/authorized‑views (Accessed 17 April 2025).
Google Cloud (2025e) ‘Dataset access controls’. Available at: https://cloud.google.com/bigquery/docs/dataset‑access‑controls (Accessed 17 April 2025).
Guu, K., Lee, K., Tung, Z. et al. (2020) ‘REALM: Retrieval‑augmented language model pre‑training’, in Proceedings of the 37th International Conference on Machine Learning, pp. 3929–3938.
IBM (2024) ‘Introduction to sentence embeddings’. IBM Developer Blog, 12 January.
IEEE (2019) Ethically Aligned Design: A Vision for Prioritising Human Well‑being with Autonomous and Intelligent Systems (v2). Piscataway, NJ: IEEE Standards Association.
Izacard, G. and Grave, E. (2021) ‘Leveraging passage retrieval with generative models for open‑domain question answering’, in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, pp. 874–880.
Karimi, S. (2025) ‘Chatbots vs. memory: Why context windows aren’t enough’, VentureBeat, 11 March.
Kolodner, J.L. (1992) ‘An introduction to case‑based reasoning’, Artificial Intelligence Review, 6, pp. 3–34.
Kurzweil, R. (2012) How to Create a Mind: The Secret of Human Thought Revealed. New York: Viking.
Laird, J.E. (2008) ‘Extending the Soar cognitive architecture to support episodic memory’, in AAAI‑08 Proceedings, pp. 1540–1545.
LangChain AI (2024) ‘LangMem conceptual guide’. Available at: https://langchain‑ai.github.io/langmem (Accessed 17 April 2025).
LangChain AI (2025) LangChain Core Documentation. Available at: https://docs.langchain.com (Accessed 17 April 2025).
LangGraph AI (2025) ‘LangGraph multi‑agent template’. GitHub. Available at: https://github.com/langchain‑ai/langgraph‑templates (Accessed 17 April 2025).
LangGraph AI (2025) ‘Swarm‑py: Peer‑to‑peer multi‑agent layer’. GitHub README. Available at: https://github.com/langchain‑ai/swarm‑py (Accessed 17 April 2025).
Lenat, D.B. (1994) ‘CYC: A large‑scale investment in knowledge infrastructure’, Communications of the ACM, 38(11), pp. 33–38.
Lewis, P., Oguz, B., Rinott, R. et al. (2020) ‘Retrieval‑augmented generation for knowledge‑intensive NLP tasks’, Advances in Neural Information Processing Systems, 33, pp. 9459–9474.
Mazzucato, M. (2018) The Value of Everything: Making and Taking in the Global Economy. London: Penguin.
McCarthy, J. (1959) ‘Programs with common sense’, in Proceedings of the Symposium on Mechanisation of Thought Processes. London: HMSO, pp. 77–84.
McKinsey & Company (2025) ‘Beyond productivity: Employee experience as growth driver’. McKinsey Insights. Available at: https://www.mckinsey.com/insights/employee‑experience‑2025 (Accessed 17 April 2025).
Michel, S., Brown, T. and Williams, K. (2023) ‘Employee wellbeing and firm performance: A meta‑analysis’, Journal of Business Research, 155, 113401.
Minsky, M. (1974) ‘A framework for representing knowledge’. MIT AI Lab Memo 306.
MLflow (2025) ‘MLflow Tracking quickstart’. Available at: https://mlflow.org/docs/latest/quickstart (Accessed 17 April 2025).
Nagel, T. (1974) ‘What is it like to be a bat?’, The Philosophical Review, 83(4), pp. 435–450.
Ndukwe, C. (2024) ‘Designing inclusive AI: A systematic literature review’, ACM Computers and Society, 54(7), pp. 45–60.
OpenAI and Slack (2024) ‘Integrating ChatGPT in Slack’. Available at: https://openai.com/blog/slack‑integration (Accessed 17 April 2025).
RealKM (2023) ‘Knowledge graphs: The next chapter’. RealKM Magazine, 18 July.
Ranganath, C. (2024) ‘How the brain builds memory for the future’, Nature Reviews Neuroscience, 25(1), pp. 1–15.
Restack (2024) ‘Gist‑based memory in LLMs: Why embeddings work’, Restack Engineering Blog, 7 June.
Russell, S.J. and Norvig, P. (2021) Artificial Intelligence: A Modern Approach. 4th edn. Hoboken, NJ: Pearson.
Schrage, M. and Kiron, D. (2025) ‘Is “conscious AI” a solution in search of a problem?’, MIT Sloan Management Review, 66(4), pp. 1–6.
Searle, J.R. (1980) ‘Minds, brains and programs’, Behavioral and Brain Sciences, 3(3), pp. 417–424.
Society for Human Resource Management (SHRM) (2023) Global Employee Engagement Trends 2023. Alexandria, VA: SHRM Research.
Stack Overflow (2023) Survey of 2023 Embedding Models. Stack Overflow Labs White‑paper.
Tessler, M., Benz, A. and Goodman, N. (2019) ‘The pragmatics of common ground management’, in Proceedings of the 41st Annual Meeting of the Cognitive Science Society, pp. 1106–1112.
Tulving, E. (1972) ‘Episodic and semantic memory’, in Tulving, E. and Donaldson, W. (eds.) Organization of Memory. New York: Academic Press, pp. 381–403.
Woolley, A.W., Chabris, C.F., Pentland, A., Hashmi, N. and Malone, T.W. (2010) ‘Evidence for a collective intelligence factor in the performance of human groups’, Science, 330(6004), pp. 686–688.
Zhang, L., Rao, S. and Kim, J. (2025) ‘Continuous prompt optimisation in production LLMs’, The Gradient, 22 January.