Giant milestone: running text-to-image AI at home

Since OpenAI announced their first machine learning model for text-to-image generation, DALL·E, in January 2021, new and better models have been popping out of the woodwork at an ever-increasing pace. Over the last few months it has been all over the internet and mainstream media, as more and more people get access to play with the models and the internet is flooded with amazing images generated from text.

But the biggest milestone to date was, in my opinion, on the 22nd of August, when stability.ai released their text-to-image model, Stable Diffusion, to the public. Quality-wise, Stable Diffusion is comparable to models like DALL·E 2 and Midjourney. The big difference is that it is free, open source and comes with a license that allows both commercial and non-commercial usage – not only for the works you create, but you can even include the model itself in your own products.

… and the best bit is that you can run the model at home on a graphics card, so no more waiting in an online queue to get your images generated. I just had to try it, so that’s what I spent all of last weekend doing…

At first, my son and I had a ball making silly pictures, many featuring Boris Johnson. Generating a 512×512 image only takes around 7 seconds on an RTX 2080 Ti. That amounts to a lot of silly pictures in a very short time.
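For those curious what that looks like in practice, here is a minimal sketch of generating a single image, assuming the Hugging Face diffusers library and the CompVis/stable-diffusion-v1-4 weights – one of several ways to run the model, and not necessarily the exact scripts I used:

import torch
from diffusers import StableDiffusionPipeline

# Assumes the CompVis/stable-diffusion-v1-4 weights and a CUDA-capable GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # half precision to fit into consumer VRAM
).to("cuda")

prompt = "boris johnson arm wrestling a gorilla, oil painting"  # invented silly example prompt
image = pipe(prompt, height=512, width=512, num_inference_steps=50).images[0]
image.save("output.png")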

My first attempt at generating something pretty was “young woman wearing steam punk helmet, detailed face”. I ran it a few times until I got a nice result, and after tweaking the options a bit I was pretty happy with it, except for some pretty bad errors around the mouth and eyes. But another AI model was quickly downloaded: GFPGAN, for face restoration. I ran the image through it and it fixed the problem. The next problem was resolution: the image was only generated at 768×512 pixels due to video RAM limitations on my graphics card. Again, another model came to the rescue, making it possible to upscale the image to a more useful 3072×2048 pixels.

Using the GFPGAN model for face restoration and an ESRGAN-based model for upscaling.
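For the curious, the face restoration step is only a few lines of Python – a rough sketch assuming the GFPGANer helper from the GFPGAN repository and a downloaded GFPGANv1.3.pth checkpoint (file names and paths are my assumptions, not necessarily what you have installed):

import cv2
from gfpgan import GFPGANer

# Assumed checkpoint path – download the weights from the GFPGAN releases page.
restorer = GFPGANer(model_path="GFPGANv1.3.pth", upscale=1)

img = cv2.imread("stable_diffusion_output.png", cv2.IMREAD_COLOR)
# Returns cropped faces, restored faces and the full image with the fixed faces pasted back in.
_, _, restored = restorer.enhance(img, has_aligned=False, only_center_face=False, paste_back=True)
cv2.imwrite("restored.png", restored)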

By the end of the weekend the images had reached a whole new level. I learned from examples online and started using much longer prompts, including artists’ names to get different styles. One or more of the following artists are used in some of the images: Greg Rutkowski x6, Brian Froud x2, H. R. Giger, Alex Grey, Magali Villeneuve, Jason Felix, Steve Argyle, Tyler Jacobson, Peter Mohrbacher, Jessica Rossier, Daniel Mijtens, Hieronymus Bosch, Anna Podedworna, Grant Wood, PJ Crook, Edward Hopper and artgerm. It is amazing how far you can push the model to use its memory of all the images it was trained on, and I have only scratched the surface. There is already a term for it: “prompt crafting”. Some will happily call it an art form; I see it more as a craft, though.
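To give an idea of the style, here is an invented example of such a prompt – not one of my actual prompts, just the pattern I learned from examples online:

portrait of a young woman wearing a steampunk helmet, intricate, highly detailed, digital painting, concept art, sharp focus, by Greg Rutkowski and Magali Villeneuve and artgerm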

I have no doubt that the industry is changing and that these models will be a big part of it from now on. What industry? Actually, any industry that works with visuals. There is so much the model can be used for, even in its current state. It has only been 2 weeks since the model was released into the wild and there are already loads of graphical and web interfaces available for it. Even more impressive is a plugin for Photoshop, so the model can be used inside the program for both blending between images and changing them – and that after only 2 weeks! There are several more projects of that calibre – what will we have in 6 months?

12 images in 200 seconds. Have an idea, let the computer make 200 versions of it while you are out for lunch, and maybe one of them could be the base for whatever you’re working on. Maybe 200 colour variations for a concept. Or maybe…
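A batch run like that is little more than a loop over seeds – a hedged sketch, again assuming the diffusers pipeline shown earlier:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "concept art of a steampunk airship, detailed, dramatic lighting"  # invented example idea
for seed in range(200):
    # A fixed seed per variant makes every image reproducible later.
    generator = torch.Generator("cuda").manual_seed(seed)
    pipe(prompt, generator=generator).images[0].save(f"variant_{seed:03d}.png")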

The model is also capable of image-to-image generation, and that opens up loads of possibilities for guiding the model to create exactly the images you are looking for. I haven’t tried it yet, but I think I know what I’m doing next weekend.
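For reference, the same diffusers library ships an image-to-image pipeline – a sketch of what I expect to try, assuming a recent diffusers version and a rough input image saved as sketch.png (untested on my part):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("sketch.png").convert("RGB").resize((768, 512))
# strength controls how far the model may drift from the input image (0 = keep it, 1 = ignore it).
result = pipe(prompt="detailed fantasy landscape, matte painting",
              image=init, strength=0.75, guidance_scale=7.5).images[0]
result.save("img2img_result.png")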

20 images in different styles that I kept from my experiments over the weekend. No human could generate that body of work in a weekend – and they were picked from the roughly 2000 images generated while experimenting.

Try Stable Diffusion now

If you want to try it right now, give the Stable Diffusion online demo a go.

How to install at home

There is actually no point in going through how I installed the software. I did it last weekend, and there are already lots of new tools and tutorials making it even easier to install and work with. A quick search should find several good tutorials.

Is it safe to run the software locally?

There is no chance of a Skynet scenario with your computer starting World War III. It is, however, possible to hide Python code in the models that will be executed when you load the model. The problem is Python’s pickle format, which the models are saved in. Use common sense and only download models from sites you trust.
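To see why pickle is a problem, here is a tiny, harmless demonstration – it only echoes a string, but the same mechanism could run anything the moment a pickled file is loaded:

import os
import pickle

class Malicious:
    # pickle calls __reduce__ when serialising; the callable it returns
    # is executed again when the data is loaded.
    def __reduce__(self):
        return (os.system, ("echo this ran when the file was loaded",))

payload = pickle.dumps(Malicious())
pickle.loads(payload)  # prints the message – a real payload could do anything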

Upscaling

When it comes to upscaling, I was using chaiNNer by Joey Ballentine. It’s open source and still only in alpha, but it is already very useful for loading models and upscaling images, and VERY promising.
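If you prefer to script the upscaling instead of using a GUI, a rough sketch with the realesrgan package and a downloaded RealESRGAN_x4plus.pth checkpoint looks something like this – an alternative route, not how chaiNNer works internally:

import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# Network definition matching the (assumed) RealESRGAN_x4plus checkpoint.
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4, model_path="RealESRGAN_x4plus.pth", model=model)

img = cv2.imread("restored.png", cv2.IMREAD_COLOR)
output, _ = upsampler.enhance(img, outscale=4)  # e.g. 768×512 -> 3072×2048
cv2.imwrite("upscaled.png", output)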