Building a Giphy-like Stable Diffusion Slackbot
Slack is our company’s default instant messenger (IM) app and I love it because of its ease of integration with 3rd party (and custom) services. Take Giphy, for example. The ability to add amusing gifs into text-based chat is a powerful tool to defuse tense conversations, express emotion, add visual humour to the stream of task-orientated jibber-jabber, and even evoke a soft chortle from wherever you’ve set up office in your new hybrid work-from-home-or-Starbucks life.
Being a fan of the visual medium of expression, I’ve also been taken with the boom in AI/ML Visual Language Models (VLMs) like Dall-E, MidJourney and Stable Diffusion. The ability to create mind-blowing images from a text prompt is a technological leap that is sure to disrupt any industry that works with visuals.
Combining both of these tools seemed like a fun thing to do, so I created a Giphy-like Slackbot that takes in prompts and returns an AI-generated image to the user, harnessing the power of Stable Diffusion, the open source VLM created by Stability.ai.
Yeah, I know, like MidJourney’s Discord space.
End user flow
The end user experience is pretty simple.
1. The user types the slash command /stablediffusion followed by their prompt text. For example, ‘/stablediffusion a cat riding a bicycle’.
2. The Stable Diffusion bot notifies the user that the request is being processed.
3. When the picture is ready, the Stable Diffusion bot posts it in the channel.
Technical flow
The technical flow is described in the diagram below.
Steps:
1. The user types /stablediffusion a cat riding a bicycle.
2. Slack sends the slash command payload to a Lambda function.
3. The Slack Command Handler creates a message in SQS for asynchronous processing, and returns a response to Slack that the image will appear when ready.
4. The Stable Diffusion Image Generator picks up the new message from the queue and extracts the prompt.
5. The Lambda function gets the image by calling the Stable Diffusion API, passing in the text prompt and image generation parameters.
6. The Lambda function saves the image to an S3 bucket, accessible over HTTPS.
7. The Lambda function returns a response payload to Slack, including the URL of the new image.
8. A picture of a ‘cat riding a bike’ appears in Slack.
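For reference, the slash command payload Slack sends in step 2 arrives as a form-encoded body. Once parsed, it contains fields along these lines (the values here are abbreviated and purely illustrative):

```python
# Parsed Slack slash command payload (abbreviated; values are illustrative)
{
    "command": "/stablediffusion",
    "text": "a cat riding a bicycle",       # the user's prompt
    "response_url": "https://hooks.slack.com/commands/T.../...",  # usable for ~30 minutes
    "channel_id": "C0123456789",
    "user_id": "U0123456789",
}
```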
Setup and Configuration
This isn’t a forensically detailed set of instructions, but it hopefully provides enough of a guide to get you going and steer you in the right direction. As with all technical solutions, there are many ways to skin a cat (riding a bicycle), and there are definitely simpler ways to do this using platforms such as Google Colab. Perhaps I’ll try to do the same with Google Colab and do some cost comparisons.
Prerequisites
This solution utilises the following technologies:
- A local environment running Windows, Mac OS, or Linux. I recommend Mac OS or Linux for fewer debugging issues.
- A local installation of Docker, Python (latest stable version), Terraform, the AWS CLI, and BentoML.
- Slack. You will need admin access to a Slack workspace to create a bot for your workspace.
- AWS. You will need admin access to an AWS account to spin up the required resources for the Stable Diffusion bot.
The technology in this list that was new to me was BentoML, a Python framework that makes it easy to create and deploy machine learning services at scale.
Setup Stable Diffusion on AWS
Start by setting up Stable Diffusion on AWS. I followed this tutorial from BentoML. At a high level, the tutorial describes how to download a pre-built Stable Diffusion model (or build your own), build a Docker image of the Stable Diffusion model and web API and push it to AWS ECR, and deploy the Docker image onto a chunky AWS virtual server with GPU firepower.
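To give a flavour of what the BentoML side looks like, here is a minimal sketch of a Stable Diffusion service written against the BentoML 1.x runner/service API. This is not the tutorial’s exact code; the model id, endpoint name and default parameters are my own assumptions:

```python
# service.py -- a minimal sketch of a Stable Diffusion BentoML service (BentoML 1.x style).
# The model id, endpoint name and defaults below are assumptions, not the tutorial's code.
import bentoml
import torch
from bentoml.io import JSON, Image
from diffusers import StableDiffusionPipeline


class StableDiffusionRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # Load the pipeline once per runner process; use the GPU when available.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5"
        ).to(device)

    @bentoml.Runnable.method(batchable=False)
    def txt2img(self, data: dict):
        # Keep num_inference_steps low when testing on CPU -- it is slow.
        result = self.pipe(
            prompt=data["prompt"],
            num_inference_steps=int(data.get("num_inference_steps", 25)),
            guidance_scale=float(data.get("guidance_scale", 7.5)),
        )
        return result.images[0]


sd_runner = bentoml.Runner(StableDiffusionRunnable, name="stable_diffusion_runner")
svc = bentoml.Service("stable_diffusion", runners=[sd_runner])


@svc.api(input=JSON(), output=Image())
def txt2img(data: dict):
    # POST /txt2img with {"prompt": "..."} returns the generated image.
    return sd_runner.txt2img.run(data)
```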
Some tips and gotchas in getting Stable Diffusion running on AWS.
- Use Linux or a Mac. I initially tried on Windows and ended up wasting a lot of time on issues with invalid paths and configuration.
- The dependencies listed in the BentoML tutorial are not complete. The prerequisites listed above are a fuller list of what is required to deploy a Stable Diffusion model to EC2.
- I built the Bento model locally so I could build and run it myself to test it out before deploying it onto an EC2 instance on AWS. Running prompts on a laptop with no GPU takes a long time, so keep your inference steps down to save yourself some valuable time.
Once you’ve successfully set up Bento, you will have access to the Stable Diffusion REST API. You will be using the URL for this in your Lambda functions.
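A quick way to sanity-check the deployed endpoint before wiring up the Lambdas is a request like the one below. The host, port and /txt2img path follow the service sketch above and are assumptions rather than a guaranteed contract:

```python
# Smoke test against the deployed Stable Diffusion endpoint.
# The host, port (BentoML's default is 3000) and /txt2img path are assumptions.
import requests

SD_API_URL = "http://<your-ec2-host>:3000"  # hypothetical endpoint URL

resp = requests.post(
    f"{SD_API_URL}/txt2img",
    json={"prompt": "a cat riding a bicycle", "num_inference_steps": 25},
    timeout=300,  # image generation can take a while, especially on CPU
)
resp.raise_for_status()

with open("cat_on_bike.png", "wb") as f:
    f.write(resp.content)
```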
Setup Lambda Functions and SQS
I created the following services in AWS to handle Slackbot commands, create images based on prompts, and return the generated image back to the originating Slack channel.
Lambda function: Slack Command Handler
A Lambda function written in Python (3.9) which receives a slash command event payload from the Stable Diffusion Slack bot. The handler saves the event body onto an SQS queue and returns a message to the Slackbot that the request is being processed (a minimal sketch follows the notes below).
Some notes on this function:
- The event sent by the Slackbot contains the prompt submitted by the user. This is extracted later by the Stable Diffusion Image Generator function.
- You can use an API Gateway to provide an external endpoint for the function (as a Lambda proxy), or enable the Function URL in the Lambda configuration settings.
- You can create your own rich messages to send back to Slack using blocks. Read more here.
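Putting those pieces together, a minimal sketch of the command handler might look like the following. The queue URL environment variable and the response wording are my own; the important parts are parsing the form-encoded payload and returning a 200 immediately so Slack doesn’t time out:

```python
# Slack Command Handler (sketch): parse the slash command, queue it on SQS,
# and acknowledge immediately so Slack's 3-second slash command timeout isn't hit.
# QUEUE_URL is a hypothetical environment variable pointing at your SQS queue.
import base64
import json
import os
from urllib.parse import parse_qs

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]


def lambda_handler(event, context):
    # Slash commands arrive as a form-encoded body (see the payload example earlier).
    body = event.get("body", "")
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    params = parse_qs(body)
    prompt = params.get("text", [""])[0]
    response_url = params.get("response_url", [""])[0]

    # Hand the work off to the image generator via SQS.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"prompt": prompt, "response_url": response_url}),
    )

    # Acknowledge straight away; the image will be posted to the channel later.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "response_type": "in_channel",
            "text": f"Generating an image for ‘{prompt}’. It will appear here shortly.",
        }),
    }
```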
Lambda function: Stable Diffusion Image Generator
A second Lambda function, also written in Python (3.9), which does the following (a sketch follows the list):
- Receives the SQS message posted by the Slack Command Handler
- Extracts the image prompt and the response URL (a temporary URL generated by Slack that can be used to post back to the originating channel within 30 minutes),
- Calls the Stable Diffusion API to generate the image and waits for it to be generated,
- Saves the generated image to a publicly accessible S3 bucket. I used a GUID for the file name.
- Returns a response to Slack with the generated image URL and the original prompt.
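A rough sketch of the image generator is below. The bucket name, environment variables and Stable Diffusion endpoint are assumptions, the requests library needs to be bundled with the deployment package, and error handling is left out for brevity:

```python
# Stable Diffusion Image Generator (sketch): read the queued prompt, call the
# Stable Diffusion API, upload the image to S3, then post the URL back to Slack
# via the response_url. IMAGE_BUCKET and SD_API_URL are hypothetical env vars.
import json
import os
import uuid

import boto3
import requests

s3 = boto3.client("s3")
BUCKET = os.environ["IMAGE_BUCKET"]      # publicly readable bucket for generated images
SD_API_URL = os.environ["SD_API_URL"]    # e.g. http://<your-ec2-host>:3000


def lambda_handler(event, context):
    for record in event["Records"]:
        message = json.loads(record["body"])
        prompt = message["prompt"]
        response_url = message["response_url"]

        # Generate the image; more inference steps means better images but longer waits.
        resp = requests.post(
            f"{SD_API_URL}/txt2img",
            json={"prompt": prompt, "num_inference_steps": 50},
            timeout=600,
        )
        resp.raise_for_status()

        # Save under a GUID so file names never collide.
        key = f"{uuid.uuid4()}.png"
        s3.put_object(Bucket=BUCKET, Key=key, Body=resp.content, ContentType="image/png")
        image_url = f"https://{BUCKET}.s3.amazonaws.com/{key}"

        # Post the result back to the originating channel using Slack's response_url,
        # which stays valid for around 30 minutes after the slash command was issued.
        requests.post(
            response_url,
            json={
                "response_type": "in_channel",
                "blocks": [
                    {
                        "type": "image",
                        "title": {"type": "plain_text", "text": prompt},
                        "image_url": image_url,
                        "alt_text": prompt,
                    }
                ],
            },
            timeout=30,
        )
```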
Setup a Slackbot
Finally, create a Slackbot in your Slack workspace. I created a new App called Stable Diffusion configured with the following:
- Slash Command. Within the Slackbot, create a slash command with the Request URL of the Slack Command Handler.
- I didn’t have to set up OAuth scopes or webhooks, as posting back to the channel is handled via the response URL contained within the Slack command event payload.
And So…?
After getting this set up in our Slack workspace and sharing it with a small group, I did see some behaviour patterns start to emerge: people found amusing conversational moments where a machine-generated image could add some visual flavour and colour to the chat.
However, I did shut it down soon after, as I didn’t want to clock up lots of compute costs running the large EC2 instance. And so, the Stable Diffusion Slackbot is now temporarily closed. Perhaps its doors will open again if a use case appears which warrants spending money on Cats on Bikes.