Joe Anderson

Personal MCP Server Project

The first project in the Intro to Distributed Systems course I'm taking during the final semester of my undergraduate degree was a single-node key-value store, accessed through remote procedure calls (RPC), that holds vector representations of the course textbook. The textbook was split into semantically meaningful chunks, each chunk was embedded into a vector, and the chunks and their vectors were stored in the RPC-accessible store.

That project inspired me to build on it in two ways. First, I implemented my own Retrieval-Augmented Generation (RAG) pipeline that processes the lecture slides from the three classes I'm taking this semester and stores them in the key-value store. Second, I implemented a Model Context Protocol (MCP) server so that I can ask an AI agent questions about the lecture material. Because I also wanted to ask the agent more general questions about my classes and the events on my calendar, I built additional MCP tools that call the Canvas and Google Calendar APIs.

Through this project I gained experience architecting and implementing a RAG pipeline, a gRPC-based key-value store server, and an MCP server. I built 6 MCP tools that have worked very well for asking an AI agent about course schedules, assignments, and lecture content and getting personalized, up-to-date answers. You can view the source code for the project here.

Technologies Used

RAG Pipeline

  • Python
  • Sentence Transformers (SBERT): used the all-MiniLM-L6-v2 model to compute embeddings and perform semantic search
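As a rough illustration of the semantic-search step, the sketch below ranks stored vectors by cosine similarity to a query vector and keeps the top k. It is a plain-Python stand-in, not the project's code: the chunk keys and the tiny 3-dimensional vectors are made up (all-MiniLM-L6-v2 actually produces 384-dimensional embeddings), and Sentence Transformers also ships its own helper for this, `util.semantic_search`.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k (key, score) pairs with the highest cosine similarity to the query."""
    scored = ((key, cosine_similarity(query, vec)) for key, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical chunk keys with toy 3-dimensional vectors.
    corpus = {
        "raft-leader-election": [0.9, 0.1, 0.0],
        "two-phase-commit": [0.1, 0.9, 0.1],
        "vector-clocks": [0.2, 0.2, 0.9],
    }
    print(top_k([1.0, 0.2, 0.0], corpus, k=2))
```

With these toy vectors, raft-leader-election ranks first for the query. Using heapq.nlargest avoids fully sorting the corpus when k is small, although for a few thousand lecture chunks a plain sort would be just as practical.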

Key-Value Store

  • Python
  • gRPC
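The key-value store's responsibilities can be sketched without the gRPC plumbing. The class below is a hypothetical stand-in that a gRPC servicer could delegate to: it keeps chunk text and embeddings in memory, exposes a generator that mirrors a server-streaming RPC for retrieval, and persists state to a JSON file. The class name, method names, and JSON-on-disk format are all assumptions, not the project's actual schema.

```python
import json
import os
import tempfile
import threading
from typing import Iterator, Optional


class VectorKVStore:
    """Hypothetical in-memory store for text chunks and their embeddings.

    A gRPC servicer could call these methods from its RPC handlers.
    """

    def __init__(self, path: str) -> None:
        self._path = path
        # gRPC's Python server runs handlers on a thread pool, so guard shared state.
        self._lock = threading.Lock()
        self._data: dict[str, dict] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def put(self, key: str, text: str, embedding: list[float]) -> None:
        with self._lock:
            self._data[key] = {"text": text, "embedding": embedding}

    def get(self, key: str) -> Optional[dict]:
        with self._lock:
            return self._data.get(key)

    def stream_embeddings(self) -> Iterator[tuple[str, list[float]]]:
        # Snapshot under the lock, then yield without holding it,
        # the way a server-streaming RPC would send one message per chunk.
        with self._lock:
            items = [(k, v["embedding"]) for k, v in self._data.items()]
        yield from items

    def persist(self) -> None:
        # Write to a temp file in the same directory, then atomically replace the old file.
        with self._lock:
            snapshot = json.dumps(self._data)
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(snapshot)
            os.replace(tmp_path, self._path)
        except BaseException:
            os.unlink(tmp_path)
            raise
```

Writing to a temporary file and then calling os.replace means a crash mid-write leaves the previous snapshot intact instead of a half-written file.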

MCP Server

  • Python
  • FastMCP

Usage Examples

Lecture Slide Query and Response
Canvas Schedule Query
Google Calendar Query and Response

Architecture

Data flow diagram for the MCP project
  • The lecture slide ingestion pipeline reads .pdf and .docx lecture material, normalizes text, chunks content, generates embeddings, and writes JSONL records.
  • The ingestion client uploads the lecture slide text chunks and embeddings into the key-value store.
  • The gRPC key-value server stores lecture text chunks and embeddings, supports health checks, streams embeddings for retrieval, and persists state to disk.
  • The FastMCP server registers all tools and exposes a single local MCP endpoint for AI assistants.
  • Lecture search tools perform semantic retrieval by comparing query embeddings against the stored vectors and returning the top-matching passages.
  • Canvas MCP tools list and filter assignments, fetch assignment details, and retrieve grade and submission information using the Canvas LMS API.
  • Google Calendar MCP tools list calendars and retrieve events across day, week, and custom date windows using the Google Calendar API.
  • The user queries AI agents through the GitHub Copilot interface, and the agents can call the MCP tools.
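The ingestion steps in the first bullet (normalize, chunk, embed, write JSONL) can be sketched in a few functions. This assumes the text has already been extracted from the .pdf/.docx files, uses a simple paragraph-packing chunker as a stand-in for however the project actually chooses semantically meaningful boundaries, and takes the embedding model as a parameter. The `embed` callable, the `max_chars` budget, and the record fields are illustrative assumptions.

```python
import json
import re
import unicodedata
from typing import Callable


def normalize(text: str) -> str:
    """Apply NFKC normalization and collapse whitespace inside paragraphs, keeping paragraph breaks."""
    text = unicodedata.normalize("NFKC", text)
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in cleaned if p)


def chunk(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def write_jsonl(source: str, text: str, embed: Callable[[str], list[float]], out_path: str) -> int:
    """Normalize, chunk, embed, and write one JSON record per line; returns the record count."""
    chunks = chunk(normalize(text))
    with open(out_path, "w", encoding="utf-8") as f:
        for i, piece in enumerate(chunks):
            record = {"id": f"{source}:{i}", "source": source, "text": piece, "embedding": embed(piece)}
            f.write(json.dumps(record) + "\n")
    return len(chunks)
```

In practice `embed` could wrap a Sentence Transformers model's `encode` method and call `.tolist()` on the resulting NumPy array so the vector serializes to JSON.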

Available MCP Tools

  • search_lecture_slides: Retrieves the most relevant lecture slide passages for a query using semantic similarity.
  • canvas_get_schedule: Returns Canvas calendar events for a date range.
  • canvas_get_assignments: Returns Canvas assignments by due-date window.
  • canvas_get_assignment_details: Returns assignment details for a specific assignment name.
  • google_calendar_list_calendars: Returns all calendars available to the authenticated user.
  • google_calendar_get_events: Returns Google Calendar events by time window.
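Both canvas_get_assignments and google_calendar_get_events are driven by a time window. The sketch below shows that shared piece, assuming assignment dicts shaped like Canvas API responses (an ISO 8601 `due_at` string, or null when there is no due date). The helper names and the Monday-start week are hypothetical choices, not necessarily what the tools do. The returned datetimes are timezone-aware, so `start.isoformat()` yields an RFC 3339 timestamp of the kind the Google Calendar API's `timeMin`/`timeMax` parameters expect.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def window(kind: str, today: date, tz: tzinfo = timezone.utc, days: int = 7) -> tuple[datetime, datetime]:
    """Return a half-open [start, end) window.

    'day' = today, 'week' = Monday through Sunday of today's week,
    'custom' = the next `days` days starting today.
    """
    start_of_today = datetime.combine(today, time.min, tzinfo=tz)
    if kind == "day":
        return start_of_today, start_of_today + timedelta(days=1)
    if kind == "week":
        monday = start_of_today - timedelta(days=today.weekday())
        return monday, monday + timedelta(days=7)
    if kind == "custom":
        return start_of_today, start_of_today + timedelta(days=days)
    raise ValueError(f"unknown window kind: {kind!r}")


def parse_canvas_time(value: str) -> datetime:
    # fromisoformat only accepts a trailing "Z" from Python 3.11 on, so swap it for an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def assignments_due_in(assignments: list[dict], start: datetime, end: datetime) -> list[dict]:
    """Keep assignments whose due_at falls in [start, end), sorted by due date; undated ones are skipped."""
    due = [
        a for a in assignments
        if a.get("due_at") is not None and start <= parse_canvas_time(a["due_at"]) < end
    ]
    return sorted(due, key=lambda a: parse_canvas_time(a["due_at"]))
```

Treating windows as half-open keeps an assignment due exactly at midnight from being counted in two adjacent days.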

Future Work

As of right now, I have to manually download lecture slides and put them into the data-processing/RAG/documents directory. It'd be nice to have a script that pulls lecture slides through the Canvas API and feeds them into the document ingestor script, which would remove this manual step and keep the vector embeddings up to date. I started on such a script when I began this project but stopped once I realized that some Canvas courses don't allow students to access the Files API. There should still be ways to download files programmatically through other parts of the Canvas LMS API. For example, although I don't have access to the Files API in my STAT 4102 class, I believe I can still download its files through the Modules API.
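A possible starting point for the Modules API route, as an unverified sketch: Canvas's list-modules endpoint accepts `include[]=items` to return each module's items inline, and each module item has a `type` field (for example "File" or "Page") and a `url` pointing at the linked API object. The helpers below only build the request URL and pick out File items from an already-fetched response. The base URL and function names are hypothetical, authentication and pagination are left out, and whether the linked file endpoints are reachable when the Files API is restricted is exactly the open question above.

```python
from typing import Any
from urllib.parse import urlencode


def modules_url(base_url: str, course_id: int) -> str:
    """Build the list-modules URL, asking Canvas to inline each module's items."""
    query = urlencode({"include[]": "items", "per_page": 100})
    return f"{base_url.rstrip('/')}/api/v1/courses/{course_id}/modules?{query}"


def file_module_items(modules: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collect File-type items from a list-modules response."""
    found = []
    for module in modules:
        # Canvas may omit "items" for modules it considers too large to inline;
        # those would need a separate list-module-items request, which this sketch skips.
        for item in module.get("items", []):
            if item.get("type") == "File":
                found.append({"module": module.get("name"), "title": item.get("title"), "url": item.get("url")})
    return found
```

The `url` values collected here would then be the candidates for a downloader that hands files to the document ingestor.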

While the MCP tools help with summarizing lecture content and answering questions about assignment logistics and schedules, there are still many questions a student might want to ask an AI agent that it can't answer yet. Ingesting other coursework documents on Canvas, such as textbook PDFs, homework assignments, and syllabi, into the RAG pipeline and building corresponding MCP tools would produce a more general agent that can answer more course-related questions, such as homework- or textbook-related questions.