Multimodal AI Chatbot
A conversational AI that processes both natural language and digital images, enabling mood-reactive visual transformations and context-aware dialogue.
Overview
This project explores what happens when you merge conversational AI with real-time image manipulation. Rather than treating text and vision as separate domains, I designed a system where the chatbot's mood and conversation context directly influence how it transforms images. A user can ask the bot questions, load an image, and then apply a filter that reflects the emotional tone of the ongoing chat. The core innovation is the mood-based colormap system: instead of generic filters, the bot applies OpenCV colormaps (Spring for "happy," Ocean for "sad," Jet for "hype") so every transformation feels contextually grounded in conversation.
My Role
Solo developer. I owned the full architecture: conversation logic, image I/O, filter design, and integration. I chose NLTK pattern matching with pronoun reflection as the conversational foundation because it let me focus engineering effort on the vision side while still maintaining fluid dialogue. For images, I designed the mood-mapping system to make visual feedback feel less like a utility and more like a collaborative creative act.
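For readers unfamiliar with pattern matching plus reflection, the sketch below shows the general technique using only the standard library. It is not the project's code: the patterns, responses, and the small `REFLECTIONS` table are hypothetical stand-ins (NLTK ships a fuller table and a `Chat` class in `nltk.chat.util`).

```python
import re

# Illustrative subset of a pronoun-reflection table (assumption).
REFLECTIONS = {
    "i": "you",
    "me": "you",
    "my": "your",
    "am": "are",
    "i'm": "you are",
    "you": "me",
    "your": "my",
}

# Hypothetical (pattern, response template) pairs, tried in order.
PAIRS = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"i need (.*)", re.IGNORECASE), "What would it mean to you to get {0}?"),
]


def reflect(fragment: str) -> str:
    """Swap first- and second-person words so the reply reads naturally."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(message: str) -> str:
    """Return the first matching template, filled with the reflected capture."""
    text = message.strip().rstrip(".!?")
    for pattern, template in PAIRS:
        match = pattern.match(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Tell me more."  # fallback when no pattern matches


if __name__ == "__main__":
    print(respond("I feel sad about my photo."))
    # Why do you feel sad about your photo?
```

The reflection step is what turns "my photo" into "your photo", which is most of what makes ELIZA-style replies feel like they are listening.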
Project Snapshot
Core Capability
Dual-mode processing: conversational NLP + image manipulation
Key Innovation
Mood-based colormap filters that mirror conversation context to visuals
Technical Stack
Python, NLTK, OpenCV, Naïve Bayes classifier
User Interaction
Cross-modal: chat prompts image actions; image loading enriches conversation state
Key Contributions
- 01: Built a conversational core using NLTK pattern matching with pronoun reflection (ELIZA-style dialogue)
- 02: Implemented image I/O pipeline: file upload, preset library management, and display utilities
- 03: Developed three core image filters (grayscale, edge detection, Gaussian blur) plus a mood-based colormap system
- 04: Created mood-context mapping so visual transformations reflect the emotional state of the conversation
- 05: Integrated a Naïve Bayes classifier to automatically route user input to the appropriate response handler
- 06: Built a command parser for natural language filter requests (e.g., "apply edge to my image")
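A command parser for requests like "apply edge to my image" can be quite small. The sketch below is one possible approach, assuming keyword matching on a trigger verb plus a filter alias; the alias table, trigger words, and function name are all hypothetical, and only the three filter names come from the list above.

```python
import re
from typing import Optional

# Canonical filter names follow the project description; aliases are assumptions.
FILTER_ALIASES = {
    "grayscale": "grayscale",
    "greyscale": "grayscale",
    "gray": "grayscale",
    "grey": "grayscale",
    "edge": "edge",
    "edges": "edge",
    "blur": "blur",
    "gaussian": "blur",
}

# Hypothetical verbs that signal the user wants a filter applied.
TRIGGER_WORDS = {"apply", "use", "add"}


def parse_filter_command(message: str) -> Optional[str]:
    """Return the canonical filter name requested in a message, or None."""
    tokens = re.findall(r"[a-z]+", message.lower())
    if not any(token in TRIGGER_WORDS for token in tokens):
        return None  # no command verb, so treat the message as ordinary chat
    for token in tokens:
        if token in FILTER_ALIASES:
            return FILTER_ALIASES[token]
    return None


if __name__ == "__main__":
    print(parse_filter_command("apply edge to my image"))       # edge
    print(parse_filter_command("Can you use a Gaussian blur?"))  # blur
    print(parse_filter_command("I live on the edge"))            # None
```

Requiring a trigger verb keeps ordinary conversation that happens to mention "edge" or "blur" from being misread as a filter request.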
Research Questions
- How can visual feedback reinforce conversational context and emotional tone?
- What makes a chatbot feel "alive" vs. mechanical in multimodal interactions?
- How can image processing serve as a tool for reflection rather than just automation?
- What happens when dialogue and vision processing are tightly coupled rather than siloed?
Build Tracks
- 01: NLP conversation engine with context preservation across turns
- 02: Image pipeline: upload, storage, retrieval, and display management
- 03: Filter library: standard CV transforms + mood-based creative mapping
- 04: Integration layer bridging text commands to vision operations
- 05: Classifier for intent detection and routing
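To illustrate intent detection and routing, here is a small multinomial Naïve Bayes classifier with add-one smoothing, written in plain Python so it runs without extra dependencies. Everything specific in it is an assumption: the intent labels, the training phrases, and the class name are invented for the example, and the project itself may have used a library classifier instead.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Bag-of-words multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of examples
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for token in self.tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)

    def classify(self, text):
        tokens = self.tokenize(text)
        total_examples = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus summed log likelihoods avoids numeric underflow.
            score = math.log(count / total_examples)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                seen = self.word_counts[label][token]
                score += math.log((seen + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training phrases and intent labels.
TRAINING_DATA = [
    ("apply edge to my image", "filter"),
    ("blur the picture", "filter"),
    ("make my image grayscale", "filter"),
    ("load a photo from the library", "image_io"),
    ("upload my image file", "image_io"),
    ("show me the preset images", "image_io"),
    ("how are you today", "chat"),
    ("i feel sad", "chat"),
    ("tell me about yourself", "chat"),
]

if __name__ == "__main__":
    classifier = NaiveBayesIntentClassifier()
    classifier.train(TRAINING_DATA)
    for message in ("please blur my image", "upload a photo", "how do you feel"):
        print(message, "->", classifier.classify(message))
```

In a full system the predicted label would select a handler (filter command parser, image loader, or conversational reply); with a training set this small the predictions are only a demonstration of the mechanism.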
Outcome
The chatbot handles both modalities in real time and feels cohesive: loading an image or applying a filter doesn't break the conversation, it extends it. The mood-based colormap system is the standout. It's a simple idea (map emotion names to color palettes), but it makes multimodal interaction feel intentional rather than bolted-on. Technically, all assignment requirements were met: custom image loading, a preset library, multiple filter types, and a creative visual component. The larger takeaway: multimodal AI doesn't require massive models or complexity; it requires thoughtful integration of what each modality can express.