Building OTTO: What My Master's Thesis Taught Me About AI Agents, Knowledge Management, and Enterprise Workflows

Learnings from my Master's Thesis.

6/7/2026 · 6 min read

Hello again - it's been a while!

I finally earned my Master of Science degree after finishing my thesis, titled "Capturing the Unspoken: Design and Evaluation of an AI Agent for Tacit Knowledge Management in Project-Based Organizations."

The goal of the thesis was to research how an AI agent can be built so that it interviews employees to surface their implicit knowledge, persists it, and makes it searchable so that future projects benefit from it.

Since the company I developed the agent for uses Microsoft 365, I dived into Microsoft Copilot Studio and learned a lot about agentic workflows, orchestration, and retrieval-augmented generation.

The agent was ultimately named OTTO (Organizational Tacit Transfer Orchestrator) and lives in the Microsoft Teams client, so that every employee can reach it easily from within their normal work context.

Meet OTTO

OTTO lives inside Microsoft Teams so employees can reach it from the place where a lot of their project communication already happens.

The basic workflow is simple:

OTTO prompts a project team member to reflect on a recent project situation, or the user starts a capture session manually.
Instead of just asking for a short "lesson learned", OTTO guides the user through a conversational interview.
The agent asks follow-up questions about the situation, the decision, alternatives, constraints, rationale, and what someone else could learn from it.
The captured conversation is summarized into a structured knowledge record.
Other users can later ask natural-language questions in Teams and retrieve relevant insights through a retrieval-augmented generation workflow.

The important part is that OTTO was not meant to be a generic chatbot. The goal was to design a small knowledge management system around a conversational interface. The chat was only the visible layer. Behind it were orchestration flows, structured data records, contributor context, retrieval logic, and evaluation data.

What I Built

Technically, the prototype combined several Microsoft ecosystem components:

Microsoft Teams as the user-facing interface.
Microsoft Copilot Studio for the conversational agent and topic orchestration.
Power Automate / Agent Flows for background automation and data operations.
Dataverse for storing projects, users, sessions, and captured knowledge insights.
RAG-style retrieval to search stored knowledge and synthesize answers.

The Dataverse schema became one of the most important parts of the artifact. I had to model not only the final insight, but also the surrounding context: project, contributor, session type, rationale, alternatives considered, contextual factors, transferable lesson, applicability boundaries, and original transcript.

That sounds like a lot of metadata, but it matters. A lesson learned without context often becomes too generic to be useful. "Plan stakeholder alignment earlier" is not very helpful by itself. It becomes useful when you know what kind of project it came from, what constraint made the timing difficult, what decision was made, and when the lesson does or does not apply.
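To make the shape of such a record more concrete, here is a minimal Python sketch. It is not the actual Dataverse schema. The class name, field names, types, and the missing_context helper are assumptions for illustration, derived only from the context fields listed above.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class KnowledgeRecord:
    """One captured insight plus the context needed to interpret it later.

    Field names mirror the context listed in the post; names and types are
    illustrative assumptions, not the real Dataverse columns.
    """
    project: str
    contributor: str
    session_type: str  # hypothetical values: "prompted" or "manual"
    insight: str
    rationale: str
    alternatives_considered: list[str] = field(default_factory=list)
    contextual_factors: list[str] = field(default_factory=list)
    transferable_lesson: str = ""
    applicability_boundaries: str = ""
    transcript: str = ""

    def missing_context(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Invented example: a lesson that was stored without its surrounding context.
record = KnowledgeRecord(
    project="Example project",
    contributor="Example contributor",
    session_type="manual",
    insight="Plan stakeholder alignment earlier",
    rationale="Sign-off came too late to adjust the plan",
)
print(record.missing_context())
```

A check like missing_context could be one way to tell when a capture session should ask more follow-up questions before the record is stored.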

What I Learned

1. AI agents are mostly orchestration problems

Before the thesis, it was easy to think of an AI agent mainly as a prompt plus a chat interface.

After building OTTO, I see it differently. The hard part was not only getting a language model to generate good text. The hard part was coordinating the whole process around it: when to trigger a conversation, how to keep state, how to decide which flow should run, how to persist outputs, how to handle edge cases, and how to make retrieval useful later.
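A toy Python sketch of the "which flow should run" question might look like the following. OTTO itself was built with Copilot Studio topics and Power Automate flows, not with Python. The Session class, the route function, the flow names, and the trigger phrases "start capture" and "done" are all hypothetical and only illustrate state-dependent routing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Minimal per-user conversation state (hypothetical)."""
    user: str
    active_flow: Optional[str] = None  # "capture" while an interview is running


def route(session: Session, message: str) -> str:
    """Decide which flow should handle the next message."""
    text = message.strip().lower()
    if not text:
        return "fallback"  # edge case: empty input
    if session.active_flow == "capture":
        if text == "done":
            session.active_flow = None
            return "persist"  # summarize and store the interview
        return "capture"  # keep interviewing
    if text == "start capture":
        session.active_flow = "capture"
        return "capture"
    return "retrieve"  # default: treat the message as a question
```

Even in this tiny version, the same message leads to different flows depending on the stored state, which is where most of the coordination effort goes.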

In an enterprise setting, the agent is only as good as the workflow around it.

2. RAG is not magic. The data model matters.

Retrieval-augmented generation sounds simple: store knowledge, search it, use AI to answer questions.

In practice, the quality of retrieval depends heavily on how the knowledge was captured and structured in the first place. If the stored records are vague, incomplete, or missing context, the retrieval layer cannot fix that reliably.

This became one of the main design lessons of the thesis: good RAG starts before retrieval. It starts during capture.
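The effect can be shown with a deliberately simple Python sketch. It swaps in plain keyword overlap for the real retrieval layer, so it says nothing about how Copilot Studio ranks results. The function names and both example records are invented for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, record: dict[str, str]) -> int:
    """Count query words that appear anywhere in the record's fields."""
    return len(tokens(query) & tokens(" ".join(record.values())))


def retrieve(query: str, records: list[dict[str, str]], top_k: int = 1) -> list[dict[str, str]]:
    """Return up to top_k records with a non-zero score, best first."""
    ranked = sorted(records, key=lambda r: score(query, r), reverse=True)
    return [r for r in ranked[:top_k] if score(query, r) > 0]


# The same lesson, stored once without and once with its context (invented data).
vague = {"lesson": "Plan stakeholder alignment earlier"}
rich = {
    "lesson": "Plan stakeholder alignment earlier",
    "context": "ERP migration with a fixed go-live date",
    "rationale": "Late sign-off from finance delayed testing",
}

question = "What delayed testing in the ERP migration?"
print(score(question, vague), score(question, rich))
```

The vague record shares no words with the question, so no ranking step can surface it. The record with context can be found because the context was captured in the first place.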

3. The best AI workflow is often the least disruptive one

One of the stronger findings from the evaluation was that workflow integration matters. If employees have to leave their normal work environment and visit yet another platform, adoption becomes harder immediately.

Putting OTTO into Teams was therefore not just a technical convenience. It was a design decision. The agent had to meet users where they already work.

For future AI projects, I would treat this as a central question from the beginning: where does the user already spend time, and how can the AI capability be embedded there instead of creating another destination?

4. Conversational capture works best when it guides reflection

The strongest evaluation evidence supported narrative-based scaffolding. In simpler terms: OTTO worked best when it helped people tell a richer story about what happened.

That means asking about the decision, the alternatives, the rationale, the constraints, and the transferability of the lesson. A good agent should not only collect answers. It should help users think.
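As a rough Python sketch, this kind of scaffolding can be thought of as an ordered list of reflection dimensions, where the agent asks about the first one that is still uncovered. The dimensions come from the list above. The question wording, the SCAFFOLD name, and the next_question helper are assumptions, and a real agent would phrase follow-ups with a language model instead of fixed strings.

```python
from typing import Optional

# Dimensions taken from the post; the question wording is an assumption.
SCAFFOLD = [
    ("situation", "What was the situation?"),
    ("decision", "What did you decide?"),
    ("alternatives", "Which alternatives did you consider?"),
    ("constraints", "What constraints shaped the decision?"),
    ("rationale", "Why did you choose this option?"),
    ("transferability", "What could someone on another project learn from this?"),
]


def next_question(answers: dict[str, str]) -> Optional[str]:
    """Return the follow-up for the first dimension not yet covered, or None."""
    for key, question in SCAFFOLD:
        if not answers.get(key, "").strip():
            return question
    return None  # every dimension has an answer; the interview can be summarized
```

Calling next_question({"situation": "Vendor delivery slipped"}) returns the question about the decision, which keeps the conversation moving toward a complete story.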

This is especially important for tacit knowledge, because people often do not have the full lesson ready before the conversation starts. The value emerges through the dialogue.

5. Not every promising design feature matters equally in every context

Some findings were more nuanced than expected.

For example, I expected contributor attribution to improve the credibility of retrieved knowledge. In the evaluation setting, this effect did not really appear. A likely reason is that the participants already knew each other and already had a sense of who was credible in which area.

That does not mean attribution is useless. It means its value may become more visible when knowledge is reused across teams, departments, or projects where people do not already know each other.

This was a good reminder that AI systems are always social systems too. A feature that makes sense in theory can behave differently depending on the organizational context.

6. A prototype becomes real when it breaks in boring ways

The second field iteration also showed how fragile agent orchestration can be. Some conversations appeared successful from the user side, but not every captured result was persisted correctly.

That kind of issue is not glamorous, but it is exactly where prototype work becomes real. Reliability, observability, data integrity, and fallback paths matter just as much as impressive AI output.
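One generic pattern against silent persistence failures is to read every record back after writing it and to park anything that cannot be confirmed. The Python sketch below illustrates that pattern under assumptions: it is not how OTTO handled the issue, and persist_with_check, the write and read_back callables, and the dead_letter list are hypothetical stand-ins for real flow and database calls.

```python
from typing import Callable, Optional

Record = dict[str, str]


def persist_with_check(
    record: Record,
    write: Callable[[Record], str],
    read_back: Callable[[str], Optional[Record]],
    dead_letter: list[Record],
    attempts: int = 2,
) -> bool:
    """Write a record, confirm it can be read back, and park it if it cannot."""
    for _ in range(attempts):
        try:
            record_id = write(record)
            if read_back(record_id) == record:
                return True  # confirmed in storage
        except Exception:
            continue  # a failed call counts as a failed attempt
    dead_letter.append(record)  # keep the content for manual recovery
    return False
```

The point of the dead_letter list is that a conversation which looked successful to the user is never lost without a trace, even when the write itself fails.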

For client work, this is probably one of the most important lessons: AI demos can be built quickly, but useful AI systems need boring engineering discipline.

What the Thesis Concluded

The thesis does not claim that AI can capture all tacit knowledge. Some knowledge is deeply embodied, social, or experience-based in a way that cannot simply be extracted into a database.

The more realistic claim is this: AI agents can help capture tacit-but-articulable knowledge. That includes things like decision rationales, heuristics, constraints, trade-offs, and lessons that experienced people can explain when they are prompted in the right way.

The evaluation showed that the most robust parts of OTTO were:

guided conversational scaffolding for capturing richer project knowledge;
workflow-integrated access through Microsoft Teams;
structured persistence that keeps enough context for later interpretation.

Other parts, such as proactive triggers and contributor attribution, were promising but more context-dependent.

That mixed result actually made the thesis more useful to me. It showed where AI agents already work well for knowledge management, and where design assumptions still need to be tested carefully.

Why This Matters To Me

This thesis gave me a much clearer view of what "building AI agents" means in a real organizational environment.

It is not just prompt engineering. It is process design, data modeling, systems integration, user experience, governance, and evaluation.

It also reinforced why I enjoy working at the intersection of business processes and technology. The interesting part is not only whether an AI system can generate an answer. The interesting part is whether it can create a useful change in how work is done.

OTTO was my first larger end-to-end AI agent project in an enterprise context. It gave me hands-on experience with Microsoft Copilot Studio, agentic workflows, RAG, structured knowledge capture, Dataverse, Power Automate, and evaluation in a real organization.

Most importantly, it taught me to think about AI systems less as standalone tools and more as embedded organizational capabilities.

That is the kind of work I want to keep doing: designing and building AI systems that are technically grounded, useful in real workflows, and honest about their limitations.