Multimodal AI Chatbot

A conversational AI that processes both natural language and digital images, enabling mood-reactive visual transformations and context-aware dialogue.

Completed · Semester Project · Python · NLP · NLTK · OpenCV · Image Processing · Chatbot

Overview

This project explores what happens when you merge conversational AI with real-time image manipulation. Rather than treating text and vision as separate domains, I designed a system where the chatbot's mood and the conversation context directly influence how it transforms images. A user can ask the bot questions, load an image, and then apply a filter that reflects the emotional tone of the ongoing chat. The core innovation is the mood-based colormap system: instead of generic filters, the bot applies OpenCV colormaps (Spring for "happy," Ocean for "sad," Jet for "hype"), so every transformation is grounded in the conversation.
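A minimal sketch of how the mood-to-colormap lookup could work, not the project's actual code. The three mood names and colormaps come from the description above; the table name, the function names, and the choice to leave the image unchanged for an unknown mood are assumptions made for illustration. `cv2.applyColorMap` and the `COLORMAP_*` constants are standard OpenCV.

```python
# Mood names and colormap choices follow the text; the table layout and
# function names are illustrative assumptions.
MOOD_COLORMAPS = {
    "happy": "COLORMAP_SPRING",
    "sad": "COLORMAP_OCEAN",
    "hype": "COLORMAP_JET",
}


def colormap_for_mood(mood):
    """Return the OpenCV colormap constant name for a mood, or None if unmapped."""
    return MOOD_COLORMAPS.get(mood.strip().lower())


def apply_mood_filter(image, mood):
    """Recolor an 8-bit grayscale or BGR image with the mood's colormap."""
    import cv2  # imported here so the mood table can be used without OpenCV

    name = colormap_for_mood(mood)
    if name is None:
        return image  # assumed fallback: unknown mood leaves the image as-is
    return cv2.applyColorMap(image, getattr(cv2, name))
```

Keeping the mapping as plain data means a new mood is a one-line addition, and the chat side only ever needs to pass a mood string.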

My Role

Solo developer. I owned the full architecture: conversation logic, image I/O, filter design, and integration. I chose NLTK pattern matching with pronoun reflection as the conversational foundation because it let me focus engineering effort on the vision side while still keeping the dialogue fluid. For images, I designed the mood-mapping system to make visual feedback feel less like a utility and more like a collaborative creative act.
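For readers unfamiliar with the pattern-matching and reflection approach, here is a small sketch using NLTK's built-in `Chat` class and `reflections` table from `nltk.chat.util`. The patterns and responses are hypothetical stand-ins; the project's real rule set is not shown here.

```python
from nltk.chat.util import Chat, reflections

# Hypothetical rules: each pair is (regex, tuple of candidate responses).
# "%1" is replaced by the first captured group after pronoun reflection.
PAIRS = [
    (r"i feel (.*)", ("Why do you feel %1?",)),
    (r"i want (.*)", ("What would it mean to you to have %1?",)),
    (r"(.*)", ("Tell me more.",)),  # catch-all so the bot always answers
]

bot = Chat(PAIRS, reflections)


def reply(text):
    """Return the bot's response to one line of user input."""
    return bot.respond(text)
```

With these rules, an input like "I feel proud of my photo" matches the first pattern, and the reflection step turns "my" into "your" before the captured text is echoed back in the question.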

Project Snapshot

Core Capability

Dual-mode processing: conversational NLP + image manipulation

Key Innovation

Mood-based colormap filters that mirror conversation context to visuals

Technical Stack

Python, NLTK, OpenCV, Naïve Bayes classifier

User Interaction

Cross-modal: chat prompts image actions; image loading enriches conversation state

Key Contributions

  • 01

    Built a conversational core using NLTK pattern matching with pronoun reflection (ELIZA-style dialogue)

  • 02

    Implemented image I/O pipeline: file upload, preset library management, and display utilities

  • 03

    Developed three core image filters (grayscale, edge detection, Gaussian blur) plus mood-based colormap system

  • 04

    Created mood-context mapping so visual transformations reflect emotional state of conversation

  • 05

    Integrated Naïve Bayes classifier to automatically route user input to appropriate response handler

  • 06

    Built command parser for natural language filter requests (e.g., "apply edge to my image")
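As an illustration of the command parsing in the last item, the sketch below maps a free-form request to one of the three filter names listed above. The trigger verbs, the alias lists, and the function name are assumptions, not the project's actual vocabulary. The returned name would then key into a dispatch table of OpenCV calls (for example `cv2.cvtColor`, `cv2.Canny`, and `cv2.GaussianBlur`).

```python
import re

# Canonical filter names from the text, each with hypothetical trigger words.
FILTER_ALIASES = {
    "grayscale": ("grayscale", "greyscale", "gray", "grey", "black and white"),
    "edge": ("edge", "edges", "outline"),
    "blur": ("blur", "blurry", "soften"),
}

# A request must contain one of these (assumed) verbs to count as a command.
COMMAND_RE = re.compile(r"\b(?:apply|use|add|make)\b(?P<rest>.*)", re.IGNORECASE)


def parse_filter_command(text):
    """Return a canonical filter name, or None if text is not a filter request."""
    match = COMMAND_RE.search(text)
    if match is None:
        return None
    rest = match.group("rest").lower()
    for name, aliases in FILTER_ALIASES.items():
        # Whole-word matching avoids false hits inside longer words.
        if any(re.search(rf"\b{re.escape(alias)}\b", rest) for alias in aliases):
            return name
    return None


print(parse_filter_command("apply edge to my image"))
```

Returning `None` for anything that is not a recognized filter request lets the caller fall through to ordinary conversation instead of raising an error.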

Research Questions

  • ?

    How can visual feedback reinforce conversational context and emotional tone?

  • ?

    What makes a chatbot feel "alive" vs. mechanical in multimodal interactions?

  • ?

    How can image processing serve as a tool for reflection rather than just automation?

  • ?

    What happens when dialogue and vision processing are tightly coupled rather than siloed?

Build Tracks

  • 01

    NLP conversation engine with context preservation across turns

  • 02

    Image pipeline: upload, storage, retrieval, and display management

  • 03

    Filter library: standard CV transforms + mood-based creative mapping

  • 04

    Integration layer bridging text commands to vision operations

  • 05

    Classifier for intent detection and routing
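The intent-routing track can be sketched with NLTK's `NaiveBayesClassifier`. This assumes the classifier came from NLTK, which the stack suggests but does not confirm. The two labels, the toy training sentences, and the bag-of-words features are all invented for illustration.

```python
from nltk.classify import NaiveBayesClassifier

# Toy training data (hypothetical): each sentence is labeled with the
# handler that should receive it.
TRAINING = [
    ("hello how are you", "chat"),
    ("i feel sad today", "chat"),
    ("tell me a joke", "chat"),
    ("what is your name", "chat"),
    ("apply blur to my image", "image"),
    ("load a photo", "image"),
    ("make the image grayscale", "image"),
    ("apply edge filter to the picture", "image"),
]


def features(text):
    """Bag-of-words features: one boolean per lowercase token."""
    return {word: True for word in text.lower().split()}


classifier = NaiveBayesClassifier.train(
    [(features(sentence), label) for sentence, label in TRAINING]
)


def route(text):
    """Return the label of the handler that should process this input."""
    return classifier.classify(features(text))
```

A router like this would sit in front of the conversation engine and the image pipeline, so that "apply grayscale to the image" reaches the filter code while small talk goes to the dialogue rules.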

Outcome

The chatbot processes both modalities in real time and feels cohesive: loading an image or applying a filter doesn't break the conversation, it extends it. The mood-based colormap system is the standout. It's a simple idea (map emotion names to color palettes), but it makes multimodal interaction feel intentional rather than bolted on. Technically, all assignment requirements were met: custom image loading, a preset library, multiple filter types, and a creative visual component. The larger takeaway is that multimodal AI doesn't require massive models or complexity; it requires thoughtful integration of what each modality can express.
