
usehackernews

Magnusviri[0], the original author of the SD M1 repo credited in this article, has merged his fork into the Lstein Stable Diffusion fork.

You can now run the Lstein fork[1] with M1 as of a few hours ago.

This adds a ton of functionality: GUI, upscaling & facial improvements, weighted subprompts, etc.

This has been a big undertaking over the last few days, and I highly recommend checking it out. See the Mac M1 README [2]

[0] https://github.com/magnusviri/stable-diffusion

[1] https://github.com/lstein/stable-diffusion

[2] https://github.com/lstein/stable-diffusion/blob/main/README-...

jw1224

Brilliant, thank you! I just got OP's setup working, but this seems much more user-friendly. Giving it a try now...

EDIT: Got it working, with a couple of pre-requisite steps:

0. `rm` the existing `stable-diffusion` repo (assuming you followed OP's original setup)

1. Install `conda`, if you don't already have it:

    brew install --cask miniconda
2. Install the other build requirements referenced in OP's setup:

    brew install cmake protobuf rust
3. Follow the main installation instructions here: https://github.com/lstein/stable-diffusion/blob/main/README-...

Then you should be good to go!
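Condensed into one place, the prerequisite commands above look like this (a sketch assuming Homebrew is installed and you followed OP's earlier setup):

```shell
# Remove the checkout from OP's original guide, if present
rm -rf stable-diffusion

# Install conda, if you don't already have it
brew install --cask miniconda

# Build requirements referenced in OP's setup (needed for onnx etc.)
brew install cmake protobuf rust

# Then follow the lstein fork's Mac-specific README
git clone https://github.com/lstein/stable-diffusion
```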

EDIT 2: After playing around with this repo, I've found:

- It offers better UX for interacting with Stable Diffusion, and seems to be a promising project.

- Running txt2img.py from lstein's repo seems to run about 30% faster than OP's. Not sure if that's a coincidence, or if they've included extra optimisations.

- I couldn't get the web UI to work. It kept throwing the "leaked semaphore objects" error someone else reported (even when rendering at 64x64).

- Sometimes it rendered images just as a black canvas, other times it worked. This is apparently a known issue and a fix is being tested.

I've reached the limits of my knowledge on this, but will be following closely as new PRs are merged in over the coming days. Exciting!

johnfn

I followed all these steps, but I got this error:

> User specified autocast device_type must be 'cuda' or 'cpu'

> Are you sure your system has an adequate NVIDIA GPU?

I found the solution here: https://github.com/lstein/stable-diffusion/issues/293#issuec...

hhjinks

I had to manually install pytorch for the preload_models.py step to work, because ReduceOp wasn't found. Why even use anaconda if all the dependencies aren't included? Every time I touch an ML project, there's always a python dependency issue. How can people use a tool that's impossible to provide a consistent environment for?

lacker

You are completely correct that there are a lot of dependency bugs here, I would just like to pedantically complain that the issue in question is PyTorch supporting MPS, which is basically entirely a C++ dependency issue rather than a Python one. (PyTorch being mostly written in C++ despite having "py" in the name.) And yeah the state of C++ dependency management is pretty bad.

wokwokwok

FYI: black images are not just from the safety checker.

Yes, the safety checker will zero out images, but you can just turn it off with an "if False:". Mostly, though, black images are due to a bug, which is especially frustrating because it turns up at high step counts and means you've wasted time on the run.

My experience has been that roughly 2-4 images out of a 32-image batch come back black at the default settings, regardless of the prompt.

Just stamp out images in batches and discard the black ones.
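The "discard the black ones" step can be automated with a simple brightness check; a toy sketch (assumes images as nested lists of RGB tuples, not any particular repo's output format, and all names are mine):

```python
def is_black(image, threshold=8):
    """True if every channel of every pixel is at or below threshold."""
    return all(
        channel <= threshold
        for row in image
        for pixel in row
        for channel in pixel
    )

def keep_good(batch):
    """Drop images that came back (near-)black."""
    return [img for img in batch if not is_black(img)]

black_img = [[(0, 0, 0), (1, 2, 3)]]        # 1x2 image, all dark
normal_img = [[(120, 64, 200), (0, 0, 0)]]  # has bright pixels
print(len(keep_good([black_img, normal_img])))  # 1
```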

toinewx

I was able to avoid black images by using a different sampler:

    --sampler k_euler

Full command:

    "photography of a cat on the moon" -s 20 -n 3 --sampler k_euler -W 384 -H 384

jastanton

I tried that as well but resulted in an error:

AttributeError: module 'torch._C' has no attribute '_cuda_resetPeakMemoryStats'

https://gist.github.com/JAStanton/73673d249927588c93ee530d08...

philsnow

To get past `pip install -r requirements` I had to muck around with CFLAGS/LDFLAGS because I guess maybe on your system /opt/homebrew/opt/openssl is a symlink to something? On mine it doesn't exist, I just have /opt/homebrew/opt/openssl@1.1 symlinked to /opt/Cellar/somewhere.

The command that finally worked for me:

  python3 -m venv venv
  . venv/bin/activate
  CFLAGS="-I /opt/homebrew/opt/openssl@1.1/include" LDFLAGS="-L /opt/homebrew/opt/openssl@1.1/lib -L/opt/homebrew/Cellar/openssl@1.1/1.1.1q/lib -lssl -lcrypto" PKG_CONFIG_PATH="/usr/local/opt/openssl@1.1/lib/pkgconfig" GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1 pip install -r requirements.txt

dannywarner

Thank you; with those extra steps I got it working myself. At least, I think I'm thankful. My work productivity for the next few days might not agree.

sbierwagen

Instructions don't work here, dead ends at

  FileNotFoundError: [Errno 2] No such file or directory: 'models/ldm/stable-diffusion-v1/model.ckpt'
Looks like there's a step missing or broken around downloading the actual weights.

Going up to the parent repo points at a bunch of dead links or Hugging Face pages.

addandsubtract

You have to download the model from the huggingface[0] site first (requires a free account). The exact steps on how to link the file are then detailed here[1].

[0] https://huggingface.co/CompVis/stable-diffusion-v-1-4-origin... [1] https://github.com/lstein/stable-diffusion/blob/main/README-...

pugio

Can you describe how you did (/ are doing) this? Do you now need to use conda (as opposed to OPs pip only version)?

jw1224

See my edit for more info. (Just ironing out a couple of other issues I've found, so might update it again shortly)

bfirsh

Nice. We'll get this guide updated for this fork. Everything's moving so fast it's hard to keep track!

We struggled to get Conda working reliably for people, which it looks like lstein's fork recommends. I'll see if we can get it working with plain pip.

pugio

I really appreciate the use of pip > conda. Looking forward to the update for the repo!

bfirsh

Running lstein's fork with these requirements[0] but seeing this output[1]. Same steps as original guide otherwise.

Anyone got any ideas?

[0] https://github.com/bfirsh/stable-diffusion/blob/392cda328a69...

[1] https://gist.github.com/bfirsh/594c50fd9b2e6b173e31de753a842...

sork_hn

Same output for me also.

EDIT: https://github.com/lstein/stable-diffusion/issues/293#issuec... fixed it for me.

jw1224

Check my comment alongside yours: I got Conda to work, but it did require the prerequisite Homebrew packages you originally recommended before it would cooperate :)

wincy

I couldn't get the setup process working until I switched the python distro to 3.10, as the scripts were relying on typings features that were added in 3.10 even though the yml file specified 3.9. Was strange.

fragmede

Conda is recommended because it starts from a clean environment so you're not debugging 13 other experiments the user has going on.

yieldcrv

are there benchmarks?

I was following the GitHub issue: the CPU-bound one was at 4-5 minutes, the MPS one was at 30 seconds, then 18 seconds, and people were still calling that slow.

What is it currently at now?

And I don't know what "fast" is, to compare against.

What are Windows 10 machines with nice Nvidia chips w/ CUDA getting? Just curious what's representative.

squeaky-clean

> What are the Windows 10 with nice Nvidia chips w/ CUDA getting?

Are you referring to single iteration step times, or whole images? Because obviously it depends on the number of iteration steps used.

Windows 10, RTX 2070 (laptop model), lstein repo. I get about 3.2 iter/sec. A 50 step 512x512 image takes me 15 seconds.
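As a rough cross-check on those numbers (the 3.2 iter/sec figure is from the comment above; the function name is mine):

```python
# Wall time per image is dominated by the sampler loop, so
# time ≈ steps / (iterations per second).
def seconds_per_image(steps: int, iters_per_sec: float) -> float:
    return steps / iters_per_sec

print(seconds_per_image(50, 3.2))  # 15.625 -- matching the ~15 s quoted
```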

yieldcrv

I’m referring to there being a community effort to normalize performance metrics and results at all, with the M1 devices in that list as well, so that we don’t have to ask these questions to begin with.

Are you aware of any wiki or table like that?

Aeolun

Huh, that’s the same speed I get on Colab. Pretty good.

runeb

Wow, that is over twice as fast as my Windows 11, RTX 3080ti

dmd

Wait, what? On my M1 iMac I’m getting about 25 minutes. What am I doing wrong?

BrentOzar

It's falling back to CPU. Follow the instructions to use a GPU version - sometimes it's even a completely different repo, depending on whose instructions you're following.

zone411

Around 6 seconds.

solarkraft

I ran into:

ImportError: cannot import name 'TypeAlias' from 'typing' (/opt/homebrew/Caskroom/miniconda/base/envs/ldm/lib/python3.9/typing.py)

itsuka

I followed the conda instructions, which use Python 3.9, and ran into the same issue. The workaround is to import TypeAlias from typing_extensions:

stable-diffusion/src/k-diffusion/k_diffusion/sampling.py

(before)

  from typing import Optional, Callable, TypeAlias
(after)

  from typing import Optional, Callable
  from typing_extensions import TypeAlias
This issue is tracked in https://github.com/lstein/stable-diffusion/issues/302
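A version-proof variant of the same fix, so the file works unmodified on both 3.9 (with typing_extensions installed) and 3.10+:

```python
# TypeAlias was added to typing in Python 3.10; on 3.9 it only
# exists in the typing_extensions backport.
try:
    from typing import Optional, Callable, TypeAlias
except ImportError:
    from typing import Optional, Callable
    from typing_extensions import TypeAlias
```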

wincy

You can also just change the Python version in the yml file to 3.10.4 and it'll work.

icedchai

I ran into this. You need Python 3.10. I had to edit environment-mac.yaml and set python==3.10.6 ...

xiphias2

I changed the dependency to 3.10.4 (tried 3.10.6 as well), installed python 3.10.4, deactivated and activated ldm environment, but it still uses python 3.9

kenrose

This worked for me too.

personjerry

TypeAlias is only used once, you can open sampling.py and remove the import on line 10 and the usage on line 14:

  from typing import Optional, Callable

  from . import utils

  TensorOperator = Callable[[Tensor], Tensor]

totetsu

What do I need for inpainting? Is there a source for the models/ldm/inpainting_big/last.ckpt file?

badc0ded

I used this:

    wget -O models/ldm/inpainting_big/last.ckpt https://heibox.uni-heidelberg.de/f/4d9ac7ea40c64582b7c9/?dl=...

Found it here: https://huggingface.co/spaces/multimodalart/latentdiffusion/...

This worked afterwards:

    python scripts/inpaint.py --indir data/inpainting_examples/ --outdir outputs/inpainting_results

toinewx

Everything works except it only generates black images.

Did you run these?

    python scripts/preload_models.py
    python scripts/dream.py --full_precision

arthurcolle

Disable safety check

sanroot99

What's the performance of these models, and what PC specs are required for sane operation?

Yido

Cool

bschwindHN

Everyone posting their pip/build/runtime errors is everything that's wrong with tooling built on top of python and its ecosystem.

It would be nice to see the ML community move on to something that's actually easily reproducible and buildable without "oh install this version of conda", "run pip install for this package", "edit this line in this python script".

zmmmmm

The interesting question to me is whether it's actually in part fundamental to the success of Python. There are plenty of "clean" ecosystems that avoid at least some of these issues, but in general they fail to thrive in these types of spaces, and people keep coming back to ecosystems that are messier.

Is it possible that creativity and innovation actually require a level of chaos to succeed? Or alternatively, that chaos is an inevitable byproduct of creativity and innovation, and any ecosystem where these are heavily frowned on deters the type of people who actually drive the cutting edge forward?

Just putting it out there as food for thought. It does translate back to practice: quite often it actually makes a lot of sense to do your prototyping and experimenting in one ecosystem, but then to leave it behind when deploying your production workloads, where possible.

colordrops

JavaScript is a much cleaner ecosystem but for some reason there's a long running stigma against it. It would work fine for this use case.

brundolf

You can have an ecosystem that's chaotic (and vibrant) without its core pillars being chaotic (and shaky)

mrtksn

It's not just Python; this is the experience practically everywhere, and that's why people create containers etc. It's excruciatingly hard to set up the environment to start doing anything productive these days; you can't just start coding unless you use an IDE like Xcode or PyCharm.

rattray

JS isn't perfect, but it's so much easier to deal with than Python in these regards.

jeroenhd

I've had the exact same issues with the JS ecosystem (ran into a problem where npm wouldn't work but yarn did, still haven't figured out why).

Both are easy and reliable with a few months of experience. Both are terrible if you rarely ever use them.

Tenoke

JS libs don't need to care about things like system packages and drivers as much as Python ML does.

LudwigNagasena

CommonJS vs ES6 module loading is already a nightmare.

konart

Not sure about that. Golang or Rust have no such problem in my experience.

mrtksn

Maybe the difference is the legacy and the community? Languages like Golang and Rust are geared toward software engineers and are pretty new. Meanwhile, things like Python or JS, or anything very popular, are used as tools by people great at things that are not necessarily in the domain of software engineering; they created very useful libraries and tools regardless, which usually means piles and layers of very useful code with poor engineering.

nl

I don't think this is particularly fair. This is literally hours old, and people installing now are really debugging rather than installing a "finished" build.

The forks are weird mashups of bits of repos, and running on an M1 GPU is something that barely works itself.

Give it maybe 3 months and it will be much smoother.

bschwindHN

I think it is fair, actually. This is not unique to hours-old Python projects; it's a common theme among almost all Python tools I've used.

I have a suspicion that something written in Julia, Go, Rust, or possibly even C wouldn't have nearly this many issues. I'm not talking about debugging the actual functionality of the software, but rather the environment and tooling surrounding the language and software built with it.

This project in particular should be an easy case because you know the hardware you'll be running on ahead of time.

I'm ranting a bit, but I've tried so many tools based on python and almost _none_ of them built/installed/ran correctly on the happy path laid out in each project's readme.

Anyway, sorry, rant over.

nl

Yes there is a fair amount of truth in that.

I do think that experience helps here. I have a recipe for installing Python that works on most python projects most of the time.

  git clone <project>
  python3 -m venv ./venv
  source ./venv/bin/activate
  pip install -r requirements.txt
  deactivate # need to do this to include the correct command line tools in path (eg Jupyter)
  source ./venv/bin/activate
Done.

On a Linux or Intel Mac system this works with pretty much every reasonable Python project.

On M1 Macs the situation isn't great at the moment, though.

pdehaan

My anecdotal experience is that it briefly gets better, then much worse. Build scripts downloading tarballs from dead URLs and dependencies on unspecified versions of libraries that have since had breaking API changes are frequent issues.

…it’s still awesome that people put in the effort to do these things at all, but the tools often have a tendency to make me feel like an archaeologist trying to piece together which ancient artifacts are missing and how they were supposed to all fit together.

sbierwagen

I ran into many of these same problems when trying to replicate ROS environments five years ago. It's not Python's fault, it's just academics being bad at releasing software. After all, why would they?

In a university, ten steps to replicate the PI's personal environment is perfectly fine. How would releasing a single binary make their life any easier? Why would they bother?

undefined

[deleted]

smoldesu

I found a pretty great Docker image for the SD webui. I was forced to extreme measures since NixOS isn't super friendly with Conda (and I was particularly lazy).

Worked out fine in the end, though. Highly recommended if you're on an Nvidia rig: https://github.com/AbdBarho/stable-diffusion-webui-docker

sanroot99

Docker the silver bullet /s

fulafel

Looking through the top level comments, there are only 2 like that and they are about build errors in dependent native libraries written in other languages, which is not really the Python ecosystem's fault (as the author has chosen not to distribute prebuilt stuff).

systemvoltage

I disagree; dependencies are sort of a universal problem. Ever had to set LD_LIBRARY_PATH?

Python is pretty much innocent here.

forrestthewoods

> Python is pretty much innocent here.

Could not possibly disagree more. https://xkcd.com/1987/

The problem with dependencies is that for very bad reasons people don’t ship them.

Someone needs to package SD with a full copy of the Python runtime and every dependency. This should be the default method of distribution.

#ShipYourDamnDependencies

jeroenhd

There are CUDA Docker images for Stable Diffusion that work on both Windows and Linux, so the package including dependencies already exists. It's just that the standard packaging method doesn't work well on Apple hardware.

black3r

Just don't mess up your system. The tutorial linked works perfectly fine in a clean python environment.

That XKCD is more about messing up your system by not knowing what you're doing and randomly following shitty tutorials that suggest stuff which collides. Python's only fault in this is that it's a simple language and thus attracts people who aren't software engineers (students and math majors), who mostly don't know or care how to keep their systems clean but love writing tutorials.

It's pretty easy to keep your Pythons clean: don't use conda, never run pip with sudo, never run pip with --user, never run pip outside a virtualenv (a good safety measure is to have pip point to nothing in your user shell; you can access the system Python with python3/pip3 if needed). To check your Python is clean, create a new environment and run pip freeze; it should output nothing.

pyenv is a non-destructive system for managing multiple pythons and virtualenvs (all pythons and envs get installed into ~/.pyenv), pip is a good system for distributing dependencies (when library authors don't skip out on providing binary wheels, and software authors use pip freeze to generate requirements.txt files).

joshstrange

It's insane to me how fast this is moving. I jumped through a bunch of hoops 2-3 days ago to get this running on my M1 Mac's GPU and now it's way easier. I imagine we will have a nice GUI (I'm aware of the web UI; I haven't set it up yet) packaged as a Mac .app by the end of next week. Really cool stuff.

addandsubtract

I hope this kickstarts some kind of M1 migration. There are so many ML projects I'd like to try, but they all depend on CUDA.

joshstrange

Yep, I was just thinking the same thing. M1/M2 appears to be a huge untapped resource for ML stuff as this proves. I maxed out my MBP Max and this is probably the first time I'm actually fully using the GPU cores and it's pretty freaking cool. Creating landscapes or fictional characters (think D&D) is already super fun, I look forward to playing with img2img some more as well.

zone411

The performance gap to the top-end Nvidia cards will get much larger as they release new cards later this year, though.

icedchai

Same here. My M1 Max's GPUs were basically idling until this came along!

bee_rider

Do they depend on CUDA, or are they just much better tuned for NVIDIA cards? I thought the whole ML ecosystem was based on training models and then running them on frameworks, where model was sorta like data and the framework handles the hardware? (albeit with models that can be tweaked to run more efficiently on different hardware) (I don't really know the ecosystem so it is definitely possible that they are more closely tied together than I thought).

joshvm

The latter. The major frameworks, at least, can be run in CPU-only mode, with a hardware abstraction layer for other devices (like CUDA-capable cards, TPUs etc). So practically it means you need an Nvidia GPU to get anywhere in a reasonable amount of time, but if you're not super dependent on latency (for inference) then CPU is an option. In principle, CPUs can run much bigger model inputs (at the expense of even more latency) because RAM is an order of magnitude more available typically.

upbeat_general

From my experience the bigger frameworks may have support for non-CUDA devices (that is not just the CPU fallback) but many smaller libraries and models will not, and will only have a CUDA kernel for some specialized operation.

I encounter this all the time in computer vision models.

inamberclad

I'd rather see something more platform agnostic. I'm sad OpenCL isn't a bigger success.

pixelpoet

OpenCL still works amazingly well on all platforms (e.g. two commercial programs of mine), it's just that everyone keeps saying it's dead and refusing to use it :(

danaris

Gods, yes. I've been trying to get various kinds of deepfake-related projects to work on M1 (or even just on Mac—the Intel Macs haven't come with Nvidia cards for years now) for some time now, as we're trying to generate video stimuli that present different people doing the same things, or the same person doing several different things, for psych research.

It's been an exercise in frustration.

fezfight

It's not really better to move from one closed ecosystem to another. We should collectively agree to strengthen more open platforms, shouldn't we?

dekervin

Just yesterday I read another comment on HN saying we will have to wait another decade before being able to train it in someone's "basement" ( https://news.ycombinator.com/item?id=32658941 ). I made a bookmark for myself ( https://datum.alwaysdata.net/?explorer_view=quest&quest_id=q... ) to look for data that helps estimate when it will be feasible to run Stable Diffusion "at home". I guess it's already outdated!

squeaky-clean

To run Stable Diffusion at home you only have to download the model file, but producing that file took the equivalent of tens of thousands of GPU-hours on cloud hardware.

If the model file just vanished from everyone's hard drive one day, and cloud providers installed heuristics to detect and ban image dataset training, retraining the model file would actually take decades for any consumer, even an enthusiast with a dozen powerful GPUs. The image dataset alone is 240TB.

dannyw

You forget how much mark-up cloud providers charge.

I trained StyleGAN 2 from scratch using 8x 3090s at home and it took 3 months. It's fine.

240TB is small fish; my homelab is a petabyte and I consider it small.
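Rough numbers behind that pushback; both inputs are my assumptions, not figures from this thread (~150k A100-hours is the commonly cited training cost for SD v1, and a 3090 is guessed at roughly half the throughput of an A100):

```python
# Back-of-envelope: how long would retraining take on a home rig?
a100_hours = 150_000      # assumed total A100-hours for SD v1
consumer_slowdown = 2.0   # guess: 3090 ≈ half an A100
gpus = 8                  # an 8x 3090 rig, as described above

wall_hours = a100_hours * consumer_slowdown / gpus
print(wall_hours / (24 * 365))  # ≈ 4.3 years: long, but not decades
```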

zone411

Umm training is not the same as running it.

sxp

Is there a good set of benchmarks available for Stable Diffusion? I was able to run a custom Stable Diffusion build on a GCE A100 instance (~$1/hour) at around 1Mpix per 10 seconds, i.e. I could create a 512x512 image in 2.5 seconds with some batching optimizations. A consumer GPU like a 3090 runs at ~1Mpix per 20 seconds.

I'm wondering what the price floor of stock art will be when someone can use https://lexica.art/ as a starting point, generate variations of a prompt locally, and then spend a few minutes sifting through the results. It should be possible to get most stock art or concept art at a price of <$1 per image.
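A quick sketch of the per-image economics implied by those figures ($1/hour, ~2.5 s per image; the function name is mine):

```python
def cost_per_image(dollars_per_hour: float, seconds_per_image: float) -> float:
    """Cloud cost of one generated image at a given render speed."""
    images_per_hour = 3600 / seconds_per_image
    return dollars_per_hour / images_per_hour

print(round(cost_per_image(1.0, 2.5), 4))  # 0.0007 -- well under a cent
```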

fleddr

It can be even cheaper.

Midjourney, in case you appreciate their output, has an unlimited plan for $30 a month. The only limitation is that if you're an extremely heavy user, they may "relax" you, which means results come in a bit slower.

Note that they've also been experimenting with a --beta parameter, which basically means Stable Diffusion's algorithm is used behind the scenes; or you can use any of the 4 versions of MidJourney's more stylistic algorithms.

So if you don't want to tinker or don't have a high-end GPU, it's a cheap way to play around. I have StableDiffusion running locally but still prefer MidJourney. I enjoy the stylistic output but it's also a highly social way to generate art. Everybody is doing it in the open.

Anyway, the stock art part is a hairy subject. You should assume that your AI image is not copyrighted, which raises the question of why anyone would pay at all.

esperent

>The only limitation is that if you're an extremely heavy user, they may "relax" you, which means results come in a bit slower.

You don't have to be an extremely heavy user. I used it for about an hour every evening and it took 11 days out of a month subscription for them to put me on relax mode.

The relax mode is based on how busy the service is. If usage is low, it's the same as fast mode; other times it's really slow.

That makes it unpredictable enough that it stopped being fun for me to use. I've barely used Midjourney since I got put on relaxed mode. It stopped feeling like I can jump on and play, because I might hit a busy period and then it'll take 5 minutes to generate from a prompt.

That said, I could buy more hours of fast mode and I think it's still way cheaper than Dall-E or Dreamstudio

sowbug

Related: I wrote up instructions for running Stable Diffusion on GCE. I used a Tesla T4, which is probably the cheapest that can handle the original code. If you're spinning up an instance to play with, rather than to batch-process, then cheaper makes more sense because most of the machine's time is spent waiting for you to type stuff and look at the results.

https://sowbug.com/posts/stable-diffusion-on-google-cloud/

skybrian

So you’re estimating over a thousand generated images an hour and less than a tenth of a cent per image using the A100. If that turns out to be accurate, it seems like some online image generation will be included in the price of the stock art.

(DreamStudio is charging a bit over one cent per generated image at default settings, depending on exchange rates.)

brokenodo

My 3170Ti will create a 512x512 image in about 5-6 seconds with 50 inference steps.

gregsadetsky

Bananas. Thanks so much... to everyone involved. It works.

14 seconds to generate an image on an M1 Max with the given instructions (`--n_samples 1 --n_iter 1`)

Also, interesting/curious small note: images generated with this script are "invisibly watermarked" i.e. steganographied!

See https://github.com/bfirsh/stable-diffusion/blob/main/scripts...
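The watermark itself comes from the invisible-watermark package that script uses. As a toy illustration of the general idea, here is least-significant-bit steganography (not Stable Diffusion's actual scheme, and all names are mine):

```python
def embed_bit(channel: int, bit: int) -> int:
    """Hide one bit in the least significant bit of a channel value."""
    return (channel & ~1) | bit

def read_bit(channel: int) -> int:
    """Recover the hidden bit from a channel value."""
    return channel & 1

channels = [200, 13, 77, 154]
message = [1, 0, 1, 1]
stego = [embed_bit(c, b) for c, b in zip(channels, message)]
print(stego)                         # [201, 12, 77, 155] -- visually unchanged
print([read_bit(c) for c in stego])  # [1, 0, 1, 1]
```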

grishka

> Also, interesting/curious small note: images generated with this script are "invisibly watermarked" i.e. steganographied!

Why?

Retr0id

So that future iterations of StableDiffusion (or similar models) don't end up getting trained on their own outputs.

hifikuno

Oh wow, I didn't even think of that. I'm pretty sure a few of the repos have turned off the invisible watermark; I wonder if that will have consequences down the line for training data.

gregsadetsky

... so this means that watermarking an image you own is probably the only way to avoid it being used for training further models? :-)

cageface

After playing around with all of these ML image generators I've found myself surprisingly disenchanted. The tech is extremely impressive but I think it's just human psychology that when you have an unlimited supply of something you tend to value each instance of it less.

Turns out I don't really want thousands of good images. I want a handful of excellent ones.

chrisfrantz

Human curation will likely remain valuable into the future.

r3trohack3r

I've been playing with Stable Diffusion a lot the past few days on a Dell R620 CPU (24 cores, 96 GB of RAM). With a little fiddling (not knowing any python or anything about machine learning) I was able to get img2img.py working by simply comparing that script to the txt2img.py CPU patch. Was only a few lines of tweaking. img2img takes ~2 minutes to generate an image with 1 sample and 50 iterations, txt2img takes about 10 minutes for 1 sample and 50 generations.

The real bummer is that I can only get ddim and plms to run using a CPU. All of the other diffusions crash and burn. ddim and plms don't seem to do a great job of converging for hyper-realistic scenes involving humans. I've seen other algorithms "shape up" after 10 or so iterations from explorations people do online - where increasing the step count just gives you a higher fidelity and/or more realistic image. With ddim/plms on a CPU, every step seems to give me a wildly different image. You wouldn't know that steps 10 and steps 15 came from the same seed/sample they change so much.

I'm not sure if this is just because I'm running it on a CPU, or if ddim and plms are just inferior to the other diffusion models, but I've mostly given up on generating anything worthwhile until I can get my hands on an Nvidia GPU and experiment with faster turnarounds.

squeaky-clean

> You wouldn't know that steps 10 and steps 15 came from the same seed/sample they change so much.

I don't think this is CPU specific, this happens at these very low number of samples, even on the GPU. Most guides recommend starting with 45 steps as a useful minimum for quickly trialing prompt and setting changes, and then increasing that number once you've found values you like for your prompt and other parameters.

I've also noticed another big change sometimes happens between 70-90 steps. It's not all the time and it doesn't drastically change your image, but orientations may get rotated, colors will change, the background may change completely.

> img2img takes ~2 minutes to generate an image with 1 sample and 50 iterations

If you check the console logs you'll notice img2img doesn't actually run the real number of steps. It's number of steps multiplied by the Denoising Strength factor. So with a denoising strength of 0.5 and 50 steps, you're actually running 25 steps.

Later edit: Oh and if you do end up liking an image from step 10 or whatever, but iterating further completely changes the image, one thing you can do is save your output at 10 steps, and use that as your base image for the img2img script to do further work.
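That denoising-strength point as a one-liner (a sketch; the function name is mine, but the steps-times-strength behaviour is what the console logs show):

```python
def img2img_effective_steps(steps: int, denoising_strength: float) -> int:
    # img2img starts partway through the noise schedule, so only
    # steps * strength sampler iterations actually run.
    return int(steps * denoising_strength)

print(img2img_effective_steps(50, 0.5))  # 25
```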

wokwokwok

https://github.com/Birch-san/stable-diffusion has altered txt2img to support img2img and added other samplers, see:

https://github.com/Birch-san/stable-diffusion/blob/birch-mps...

That branch (birch-mps-waifu) runs on M1 macs no problem.

schleck8

With the 1.4 checkpoint, basically everything under 40 steps is unusable, and you only get good fidelity above ~75 steps. I usually use 100; that's a good middle ground.

auggierose

How do you change these steps in the given script? Is it the --ddim_steps parameter? Or --n_iter? Or ... ?

schleck8

With --ddim_steps

Aeolun

I found I got quite decent results with 15-30 steps when generating children’s book illustrations (of course, no expectation for hyperrealism there)

jw1224

Are we being pranked? I just followed the steps but the image output from my prompt is just a single frame of Rick Astley...

EDIT: It was a false-positive (honest!) on the NSFW filter. To disable it, edit txt2img.py around line 325.

Comment this line out:

    x_checked_image, has_nsfw_concept = check_safety(x_samples_ddim)
And replace it with:

    x_checked_image = x_samples_ddim

pja

That means the NSFW filter kicked in IIRC from reading the code.

Change your prompt, or remove the filter from the code.

johnfn

Haha, busted!

pja

To be fair, the reason the filter is there is that if you ask for a picture of a woman, stable diffusion is pretty likely to generate a naked one!

If you tweak the prompt to explicitly mention clothing, you should be OK though.

undefined

[deleted]

undefined

[deleted]

r3trohack3r

If you open up the txt2img and img2img scripts, there is a content filter. If your prompt generates anything that gets detected as "inappropriate", the image is replaced with Rick Astley.

Removing the censor should be pretty straightforward; just comment out those lines.

nonethewiser

It bothers me that this isn't just configurable. Why would they not want to expose this as a feature?
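
A sketch of what exposing it as a flag might look like. The `--skip-safety-check` flag and the stub `check_safety()` below are hypothetical, standing in for the real CLIP-based checker in the script:

```python
import argparse

def check_safety(images):
    # Stub: the repo's real checker classifies each image and swaps
    # flagged ones for the Rick Astley frame.
    return images, [False] * len(images)

def postprocess(images, skip_safety):
    # Opt-out filtering instead of a hard-coded check.
    if skip_safety:
        return images
    checked, _flags = check_safety(images)
    return checked

parser = argparse.ArgumentParser()
parser.add_argument("--skip-safety-check", action="store_true")
opt = parser.parse_args(["--skip-safety-check"])
print(postprocess(["img0"], opt.skip_safety_check))
```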

undefined

[deleted]

Aeolun

Plausible deniability

joshmlewis

When the model detects NSFW content it replaces the output with the frame of Rick Astley.

rhacker

It's kind of amazing that ML can now intelligently rick roll people.

I think it would be awesome to update the rickroll feature to the following:

Auto Re-run the img2img with some text prompt: "all of the people are now Rick Astley" with low strength so it can adjust the faces, but not change the nudity!!!1

GordonS

Hah, it would be hilarious if it generated all the nudity you wanted - but with Rick Astley's face on every naked person!

werdnapk

To be fair, the developers added this "feature", and it can easily be disabled in the code. The ML just says "this might be NSFW".

creddit

Same thing happened to me which is especially odd as I literally just pasted the example command.

hackerlight

It has a lot of false positives. A lot of my portraits of faces were marked as NSFW. Possibly detecting proportion of the image that's skin color?

ntr--

Unrelated to stable diffusion, but I was showing DALL-E to my sister last night and a prompt with > Huge rubber tree set off the TOS violation filter.

AI alignment concerns are definitely overblown...

johnfn

For those as keen as I am to try this out, I ran these steps, only to run into an error during the pip install phase:

> ERROR: Failed building wheel for onnx

I was able to resolve it by doing this:

> brew install protobuf

Then I ran pip install again, and it worked!

geerlingguy

In the troubleshooting section it mentions running:

    brew install Cmake protobuf rust
To fix onnx build errors. I had the same issue.

jonplackett

What kind of speed does this run at? E.g., how long to make a 512x512 image at standard settings?

pwinnski

I haven't installed from this link specifically, but I used one of the branches on which this is based a few days ago, so the results should be similar.

On a first-gen M1 Mac mini with 8GB RAM, it takes 70-90 minutes for each image.

Still feels like magic, but old-school magic.

antihero

On an M1 Pro 16GB it is taking a couple minutes for each image.

Turing_Machine

A little over three minutes on a first-gen M1 iMac with 16GB.

It looks like memory is super-important for this (which isn't all that surprising, really...).

pwinnski

Installed from this link on a MacBook Pro (16-inch, 2021) with Apple M1 Pro and 16GB. First run downloads stuff, so I omit that result.

I had a YouTube video playing while I kicked off the exact command in the install docs, and got: 16.84s user 99.43s system 61% cpu 3:08.51 total

Next attempt, python aborted 78 seconds in! Weird.

Next attempt, with YouTube paused: 16.31s user 95.48s system 65% cpu 2:49.45 total

So around three minutes, I'd say.

moneycantbuy

For 512x512 on M1 MAX (32 core) with 64 GB RAM I'm getting 1.67it/s so 30.59s with the default ddim_steps=50.

colaco

I've gotten 1.35 it/s, which corresponds to 38s, but I have the M1 Max with the 24-core GPU (the "lower end" one).

jw1224

On my M1 Pro MBP with 16GB RAM, it takes ~3 minutes.

johnfn

Looks like I'm getting around 4s per iteration on my M1 Max. At 50 iterations, that's 200 seconds.
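
The conversion behind all the numbers in this thread is just seconds per image = steps / (it/s). A trivial helper (hypothetical, purely to make the reported rates comparable):

```python
def eta_seconds(ddim_steps, it_per_sec):
    """Seconds to sample one image at a given iterations-per-second rate."""
    return ddim_steps / it_per_sec

# 1.67 it/s at the default 50 steps -> roughly 30 s, matching the
# M1 Max figure above; 4 s/it (0.25 it/s) -> 200 s.
print(round(eta_seconds(50, 1.67), 1))
print(eta_seconds(50, 0.25))
```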

_ph_

On my M2 Air, 16G, 10 CPU cores, the default command as in the installing instructions takes like 2m20s.

chime

On a MacBook Air M2 (8 CPU cores, 8GB), the example apple image took 35 minutes. Guess I'll wait for now.

NwpierratorR

You're clearly doing something wrong, as I get about 3 minutes per image on an M1 Mac mini.

But yeah, at this stage most guides are early hacks and require individual tweaking, so it's expected that people get varying results. I assume in a week or a month the situation will be much better and more user-friendly.

chemeng

Getting around 4 minutes per image on M1 MacBook Air 16GB

dominicl

Hm, it's taking 2 hours on my M1 MacBook Air 16GB and it's clearly swapping. Are you using model v1.4? Or did you apply any other memory optimization?

whywhywhywhy

M1 Max (32gb) is around 35 seconds per image.

leetbulb

I just had to:

> brew link protobuf --overwrite

Don't blindly run this command unless you understand what you're doing.

matsemann

Python dependency hell in a nutshell. Impossible to distribute ML projects that can easily be run.

ChildOfChaos

Is there any way to keep up with this stuff, or a beginner's guide? I really want to play around with it, but it's kind of confusing to me.

I don't have an M1 Mac; I have an Intel one with an AMD GPU. Not sure if I can run it? I don't mind if it's a bit slow. Or what is the best way of running it in the cloud? Anything that can produce high-res for free?

Karuma

Yes, you can run it on your Intel CPU: https://github.com/bes-dev/stable_diffusion.openvino

And this should work on an AMD GPU (I haven't tried it, I only have NVIDIA): https://github.com/AshleyYakeley/stable-diffusion-rocm

There are also many ways to run it in the cloud (and even more coming every hour!) I think this one is the most popular: https://colab.research.google.com/github/altryne/sd-webui-co...

EddySchauHai

https://beta.dreamstudio.ai/dream

It's not free but I've played with it a lot over the last two days for around $10, generating the most complex photos I can (1024x1024, 150 steps, 9 images, etc)

holoduke

follow this guide: https://github.com/lstein/stable-diffusion/blob/main/README-...

I am running it on my 2019 Intel MacBook Pro. 10 minutes per picture.

Daegalus

I wrote a guide for AMD.

https://yulian.kuncheff.com/stable-diffusion-fedora-amd/

It's for Fedora, but it could be adapted to other Linux distros as long as you install the right drivers and such.

yreg

Have you managed to set it up? I might have the same computer as you.

ChildOfChaos

Not yet, I haven't had much time to look into it all yet.

Looks like it's going to be a lot of fun though.

amelius

I'd rather see someone implement glue that allows you to run arbitrary (deep learning) code on any platform.

I mean, are we going to see X on M1 Mac, for any X now in the future?

Also, weren't torch and tensorflow supposed to be this glue?

nathas

Broadly speaking, it looks like they are. The implementation of Stable Diffusion doesn't appear to be using all of those features correctly (i.e. device selection fails if you don't have CUDA enabled, even though MPS (https://pytorch.org/docs/stable/notes/mps.html) is supported by PyTorch).

The same goes for quirks of TensorFlow that weren't taken advantage of. That's largely the work ongoing in the macOS and M1 forks.
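
The fallback logic the forks are adding boils down to something like this. A minimal sketch (pure Python, with the availability flags passed in rather than queried from torch, so the hardware-dependent part stays explicit):

```python
def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then Apple's MPS backend, then CPU,
    instead of assuming CUDA exists."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# In PyTorch these flags would come from torch.cuda.is_available()
# and torch.backends.mps.is_available().
print(pick_device(False, True))  # "mps" on an M1 Mac without CUDA
```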

davedx

I got stuck on this roadblock, couldn’t get CUDA to work on my Mac, was very confusing

cercatrova

That's because CUDA is only for Nvidia GPUs, and Apple doesn't support Nvidia GPUs; it has its own now.

desindol

Didn’t Apple stop supporting Nvidia cards like 5 years ago? How could it be confusing that CUDA wouldn’t run?

dustingetz

    (base)   stable-diffusion git:(main) conda env create -f environment.yaml
    Collecting package metadata (repodata.json): done
    Solving environment: failed
    
    ResolvePackageNotFound:
      - cudatoolkit=11.3
Oh, I was following the GitHub fork readme; there is a special macOS blog post.

scoopertrooper

If you look at the substance of the changes being made to support Apple Silicon, they're essentially detecting an M* mac and switching to PyTorch's Metal backend.

So, yeah PyTorch is correctly serving as a 'glue'.

https://github.com/CompVis/stable-diffusion/commit/0763d366e...

bfirsh

As mentioned in sibling comments, Torch is indeed the glue in this implementation. Other glues are TVM[0] and ONNX[1].

These just cover the neural net though, and there is lots of surrounding code and pre-/post-processing that isn't covered by these systems.

For models on Replicate, we use Docker, packaged with Cog for this stuff.[2] Unfortunately Docker doesn't run natively on Mac, so if we want to use the Mac's GPU, we can't use Docker.

I wish there was a good container system for Mac. Even better if it were something that spanned both Mac and Linux. (Not as far-fetched as it seems... I used to work at Docker and spent a bit of time looking into this...)

[0] https://tvm.apache.org/ [1] https://onnx.ai/ [2] https://github.com/replicate/cog

code51

Without k-diffusion support, I don't think this replicates the Stable Diffusion experience:

https://github.com/crowsonkb/k-diffusion

Yes, running on M1/M2 (MPS device) was possible with modifications. img2img and inpainting also work.

However you'll run into problems when you want k-diffusion sampling or textual inversion support.

Birch-san

stable-diffusion supports k-diffusion just fine on M1. You just have to detach a tensor in to_d() to stop the values exploding to infinity. https://twitter.com/Birchlabs/status/1563622002581184517?s=2...

code51

I've been following your MPS branch and have run it but couldn't address the issue without this explanation. Thank you!

Daily Digest email

Get the top HN stories in your inbox every day.