Multimodal AI Chatbot
A conversational AI that processes both natural language and digital images, enabling mood-reactive visual transformations and context-aware dialogue.
Project Snapshot
- Core Capability: dual-mode processing (conversational NLP + image manipulation)
- Key Innovation: mood-based colormap filters that mirror conversation context in the visuals
- Technical Stack: Python, NLTK, OpenCV, Naïve Bayes classifier
- User Interaction: cross-modal; chat prompts image actions, and image loading enriches conversation state
Key Contributions
- Built a conversational core using NLTK pattern matching with pronoun reflection (ELIZA-style dialogue; see the sketch after this list)
- Implemented image I/O pipeline: file upload, preset library management, and display utilities
- Developed three core image filters (grayscale, edge detection, Gaussian blur) plus mood-based colormap system
- Created mood-context mapping so visual transformations reflect the emotional state of the conversation
- Integrated Naïve Bayes classifier to automatically route user input to appropriate response handler
- Built command parser for natural language filter requests (e.g., "apply edge to my image")
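Below is a minimal sketch of how an ELIZA-style core like this can be assembled with NLTK's built-in `Chat` utility and pronoun-reflection table; the pattern/response pairs are illustrative placeholders rather than the project's actual rule set.

```python
# Minimal ELIZA-style dialogue sketch using NLTK's chat utilities.
# The patterns and canned responses below are illustrative only.
from nltk.chat.util import Chat, reflections

pairs = [
    (r"i feel (.*)",
     ["Why do you feel %1?", "How long have you felt %1?"]),
    (r"i am (.*)",
     ["Did you come to me because you are %1?"]),
    (r"(.*) image (.*)",
     ["Tell me more about the image you have in mind."]),
    (r"(.*)",
     ["Please go on.", "Interesting. Can you say more?"]),
]

# `reflections` maps first-person phrases to second-person ones
# (e.g. "my" -> "your"), so %1 substitutions read naturally.
chatbot = Chat(pairs, reflections)

if __name__ == "__main__":
    print(chatbot.respond("I feel excited about my photo"))
    # -> e.g. "Why do you feel excited about your photo?"
```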
Research Focus
- How can visual feedback reinforce conversational context and emotional tone?
- What makes a chatbot feel "alive" vs. mechanical in multimodal interactions?
- How can image processing serve as a tool for reflection rather than just automation?
- What happens when dialogue and vision processing are tightly coupled rather than siloed?
Active Build Tracks
- NLP conversation engine with context preservation across turns
- Image pipeline: upload, storage, retrieval, and display management
- Filter library: standard CV transforms + mood-based creative mapping
- Integration layer bridging text commands to vision operations
- Classifier for intent detection and routing (see the routing sketch after this list)
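The sketch below shows one way such an intent classifier and router could be wired together with NLTK's Naïve Bayes implementation; the training phrases, intent labels, and handlers are hypothetical stand-ins for the project's real data and functions.

```python
# Rough sketch of intent routing with NLTK's Naive Bayes classifier.
# Training phrases, intent labels, and handlers are hypothetical.
import nltk

def features(sentence):
    """Bag-of-words features: which tokens appear in the utterance."""
    return {f"contains({word})": True for word in sentence.lower().split()}

training_data = [
    ("how are you today", "chat"),
    ("tell me something fun", "chat"),
    ("load my image please", "image_io"),
    ("show the picture", "image_io"),
    ("apply edge to my image", "filter"),
    ("make it grayscale", "filter"),
]

classifier = nltk.NaiveBayesClassifier.train(
    [(features(text), intent) for text, intent in training_data]
)

def route(user_input):
    """Send the utterance to the handler for its predicted intent."""
    intent = classifier.classify(features(user_input))
    handlers = {
        "chat": lambda text: f"(chat) responding to: {text}",
        "image_io": lambda text: f"(image) loading/displaying for: {text}",
        "filter": lambda text: f"(filter) applying transform for: {text}",
    }
    return handlers[intent](user_input)

print(route("apply blur to my image"))
```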
Overview
This project explores what happens when you merge conversational AI with real-time image manipulation. Rather than treating text and vision as separate domains, I designed a system where the chatbot's mood and conversation context directly influence how it transforms images. A user can ask the bot questions, load an image, and then apply a filter that reflects the emotional tone of the ongoing chat. The core innovation is the mood-based colormap system—instead of generic filters, the bot applies OpenCV colormaps (Spring for "happy," Ocean for "sad," Jet for "hype") so every transformation feels contextually grounded in conversation.
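As a rough illustration of that system, the sketch below pairs the mood names mentioned above with OpenCV colormaps and implements the three standard filters; function names, parameter values, and the example image path are assumptions for illustration, not the project's actual code.

```python
# Simplified sketch of the filter library and mood-to-colormap mapping.
# The Spring / Ocean / Jet pairings follow the moods named in the overview.
import cv2

MOOD_COLORMAPS = {
    "happy": cv2.COLORMAP_SPRING,
    "sad": cv2.COLORMAP_OCEAN,
    "hype": cv2.COLORMAP_JET,
}

def apply_filter(image, name):
    """Apply one of the standard transforms by name."""
    if name == "grayscale":
        return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    if name == "edge":
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        return cv2.Canny(gray, 100, 200)
    if name == "blur":
        return cv2.GaussianBlur(image, (15, 15), 0)
    raise ValueError(f"unknown filter: {name}")

def apply_mood(image, mood):
    """Recolor the image with the colormap tied to the current mood."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.applyColorMap(gray, MOOD_COLORMAPS[mood])

if __name__ == "__main__":
    img = cv2.imread("example.jpg")  # hypothetical preset image
    cv2.imshow("happy", apply_mood(img, "happy"))
    cv2.waitKey(0)
```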
My Role
Solo developer. I owned the full architecture: conversation logic, image I/O, filter design, and integration. I chose NLTK pattern matching + reflection as the conversational foundation because it allowed me to focus engineering effort on the vision side while still maintaining fluid dialogue. For images, I designed the mood-mapping system to make visual feedback feel less like a utility and more like a collaborative creative act.
Outcome
The chatbot successfully processes both modalities in real time and feels cohesive—loading an image or applying a filter doesn't break the conversation; it extends it. The mood-based colormap system is the standout: it's a simple idea (map emotion names to color palettes), but it makes multimodal interaction feel intentional rather than bolted-on. Technically, all assignment requirements were met: custom image loading, preset library, multiple filter types, and a creative visual component. The larger takeaway: multimodal AI doesn't require massive models or complexity—it requires thoughtful integration of what each modality can express.