The first project we built in an Intro to Distributed Systems course (which I'm taking in the final semester
of my undergrad degree) was a single-node key-value store that can be accessed through remote procedure calls (RPC).
The course textbook was chunked into semantically meaningful segments, and each segment was embedded into a vector
and stored in the key-value store.
I was inspired to expand on this project by implementing my own Retrieval-Augmented Generation (RAG) pipeline that would
process lecture slide data from the three classes I'm taking this semester and store it in the key-value store, and by building
a Model Context Protocol (MCP) server that would let me query an AI agent about lecture material. I also wanted to ask
the agent more general questions about my classes and events on my calendar, so I built out additional MCP tools that would
call the Canvas and Google Calendar APIs.
I gained experience architecting and implementing a RAG pipeline, a gRPC-based key-value store server, and an MCP server in
this project. I built six MCP tools that let me ask an AI agent questions about course schedules,
assignments, and lecture content, and get personalized, up-to-date responses. You can view the source code for the project
here.
Technologies Used
RAG Pipeline
- Python
- Sentence Transformers (SBERT): Used the all-MiniLM-L6-v2 model to compute embeddings for semantic search
Key-Value Store
- Python
- gRPC
MCP Server
- Python
- FastMCP
Usage Examples
Lecture Slide Query Response

Google Calendar Query Response
Architecture

- The lecture slide ingestion pipeline reads .pdf and .docx lecture material, normalizes text, chunks content, generates embeddings, and writes JSONL records.
- The ingestion client uploads the lecture slide text chunks and embeddings into the key-value store.
- The gRPC key-value server stores lecture text chunks and embeddings, supports health checks, streams embeddings for retrieval, and persists state to disk.
- The FastMCP server registers all tools and provides one local MCP endpoint for AI assistants.
- Lecture search tools perform semantic retrieval by comparing query embeddings against stored vectors and returning the top matching passages.
- Canvas MCP tools list and filter assignments, fetch assignment details, and retrieve grade/submission information using the Canvas LMS API.
- Google Calendar MCP tools list calendars and retrieve events across day, week, and custom date windows using the Google Calendar API.
- The user queries AI agents through the GitHub Copilot interface, which can use the MCP tools.
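The ingestion steps above (normalize, chunk, embed, write JSONL) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the record fields (id, source, text, embedding), and the fixed-size overlapping word windows are all assumptions made for the example, and toy_embed is a placeholder standing in for a real embedding model such as all-MiniLM-L6-v2.

```python
import json
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words.

    Assumption: fixed-size word windows, a simple stand-in for whatever
    chunking strategy the real pipeline uses.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str) -> list[float]:
    """Hypothetical placeholder embedding; NOT semantically meaningful."""
    return [float(len(text)), float(text.count(" "))]


def to_jsonl(source: str, chunks: list[str], embed=toy_embed) -> str:
    """Serialize one JSON object per line (hypothetical record layout)."""
    lines = []
    for i, chunk in enumerate(chunks):
        record = {
            "id": f"{source}#{i}",
            "source": source,
            "text": chunk,
            "embedding": embed(chunk),
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

In a real run, the embed argument would be replaced by a call into the embedding model, and each JSONL line would be what the ingestion client uploads to the key-value store.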
Available MCP Tools
- search_lecture_slides: Retrieves the most relevant lecture slide passages for a query using semantic similarity.
- canvas_get_schedule: Returns Canvas calendar events for a date range.
- canvas_get_assignments: Returns Canvas assignments by due-date window.
- canvas_get_assignment_details: Returns assignment details for a specific assignment name.
- google_calendar_list_calendars: Returns all calendars available to the authenticated user.
- google_calendar_get_events: Returns Google Calendar events by time window.
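To make the retrieval step behind search_lecture_slides more concrete, here is a small sketch of top-k semantic search over stored vectors. It assumes cosine similarity as the scoring function and a plain list of (key, embedding) pairs as the store contents; the write-up only says that query embeddings are compared against stored vectors, so both choices, along with the function names, are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Rank stored (key, embedding) pairs against the query, best match first.

    `stored` is a hypothetical stand-in for the embeddings streamed back
    from the key-value server.
    """
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The keys returned by top_k would then be used to look up the matching text chunks, which are what the tool hands back to the agent.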
Future Work
As of right now, I have to manually download lecture slides and put them into the data-processing/RAG/documents directory.
It'd be nice to have a script that pulls lecture slides through the Canvas API and feeds them into
the document ingestor script, which would remove the manual download step and keep the vector embeddings
up to date. I started on such a script when I first began working on this project, but stopped once
I realized that some Canvas courses don't allow students to access the Files API.
There should still be ways to download files programmatically using other resources that are available through the Canvas
LMS API aside from the Files API. For example, while I don't have access to the Files API in my STAT 4102 class,
I believe I can still download its files using the Modules API.
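As a rough sketch of that idea, the helper below picks out file entries from a modules listing that has already been fetched and parsed into Python objects. It assumes the payload shape I understand the Canvas Modules API to return when items are included (each module has a name and an items list, and each item has a type, a title, and a content_id); treat those field names as assumptions to check against the Canvas documentation. The function name file_items is hypothetical, and fetching the listing and downloading the files are left out.

```python
def file_items(modules):
    """Collect module items that point at files.

    `modules` is assumed to be the parsed JSON list of module objects,
    each optionally carrying an "items" list (assumed payload shape).
    """
    found = []
    for module in modules:
        # "items" may be missing or None if the listing did not include items.
        for item in module.get("items") or []:
            if item.get("type") == "File":
                found.append({
                    "module": module.get("name"),
                    "title": item.get("title"),
                    "file_id": item.get("content_id"),
                })
    return found
```

Each entry returned here would still need to be resolved to a download URL and saved into the documents directory before the ingestor could pick it up.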
While the MCP tools help with tasks such as summarizing lecture content and answering questions
about assignment logistics and schedule information, there are still many questions a student might
want to ask an AI agent that it can't answer yet. Ingesting other coursework documents from Canvas,
such as textbook PDFs, homework assignments, and syllabus material, into the RAG pipeline
and building corresponding MCP tools for that content would allow for a more general agent
that can answer more course-related questions, such as questions about homework or the textbook.