rgbrgb
liuliu
You should check out Draw Things on macOS. It works well enough for SDXL on 8GiB macOS devices.
rgbrgb
Thanks. Yeah I played with your app early on and just fired it up again to see the progress. Frankly I find the interface pretty intimidating but it is cool that you can easily stitch generations together.
Unsolicited UX recs:
- strongly recommend a default model. The list you give is crazy long. It kind of recommends SD 1.5 in the UI text below the picker but has the last one selected by default. Many of them are called the same thing (ironically the name is "Generic" lol).
- have the panel on the left closed by default or show a simplified view that I can expand to an "advanced" view. Consider sorting the left panel controls by how often I would want to edit them (personally I'm not going to touch the model but it is the first thing).
You are doing great work but I wouldn't underestimate the value of simplifying the interface for a first-time user. It seems to have a ton of features but I don't know what I should actually be paying attention to / adjusting.
Is there a business model attached to this or do you have a hypothesis for what one might look like?
liuliu
Agreed on UX feedback. It accumulated a lot of cruft moving from the old technologies to the new. This echoes my early feedback that co-iterating the UI and the technology is difficult; you'd better pick the side you want to be on, and there is only one correct side (and unfortunately, the current app is trying hard to be on both sides).
miles
Are you the developer by any chance? If so, it would be helpful to state it.
liuliu
I am. I thought that was obvious. My statement is objective. I would go as far as: it is the only app that works on 8GiB macOS devices with SDXL-family models.
drcongo
I've been generating stuff non-stop in Draw Things for a few days, it's very good. Agree with the comments elsewhere about the rather overwhelming UI, and I have only one feature request: let us input the number of images we want to generate - the 100 limit means I keep having to check if it's finished to restart it.
rcarmo
Any plans for SD Turbo? Both base and XL models would be a great fit for a mobile device.
heyyeah
Draw Things is amazing. Great work and thanks for developing it!
maxdaten
If you are interested in the tech-stack:
https://noiselith.notion.site/License-61290d5ed7ab4c918402fd...
So yes, it is an Electron app with Svelte, Headless UI, Tailwind CSS, etc.
sytelus
+1 for asking download location.
philote
Another con is it only works on Silicon Macs.
Vicinity9635
Apple Silicon* I presume?
This could honestly be the excuse I need (want) to order an absolute beast of a macbook pro to replace my 2013 model.
wayfinder
If you want an absolute beast, especially for this stuff, you probably want Intel + Nvidia. Apple Silicon is a beast in power efficiency but a top of the line M3 does not come close to the top of the line Intel + Nvidia combo.
quitit
If it's just for hobby/interest work, then just a heads-up that even the 1st generation Apple Silicon will turn over about one image a second with SDXL Turbo. The M3s of course are quite a bit faster.
The performance gains in recent models and PyTorch are currently outpacing hardware advances by a significant margin, and there are still large amounts of low-hanging fruit in this regard.
mtlmtlmtlmtl
Is that 1GB idle per process or total for all 7 processes?
mikae1
> not open source like competitors
Who are the competitors?
quitit
DiffusionBee: AGPL-3.0 license (Native app)
InvokeAI: Apache license 2.0 (web-browser UI)
automatic1111: AGPL-3.0 license (web-browser UI)
ComfyUI: GPL-3.0 license (web-browser UI)
There's more, but I don't pay enough attention to it
mikae1
Thanks! https://lmstudio.ai/ too. For the more technically inclined perhaps.
0xDEADFED5
for people with Intel video cards (all 10 of us!) there's also SD.Next (automatic1111 fork): https://github.com/vladmandic/automatic
8n4vidtmkvmk
I like ComfyUI the most now, but it's probably not the most beginner friendly. It has great features, is extensible, and you can build workflows that work for you and save them so you don't have to click a million times like in Auto1111.
vunderba
I'd also recommend InvokeAI, an open source offering which has a very nice editable canvas and is very performant with diffusers.
UberFly
I just installed InvokeAI and wish I hadn't. It installs -so much- outside of its target directory. A1111 and ComfyUI are fairly self contained where you put them.
sophrocyne
There are already a number of local, inference options that are (crucially) open-source, with more robust feature sets.
And if the defense here is "but Auto1111 and Comfy don't have as user-friendly a UI", that's also already covered. https://github.com/invoke-ai/InvokeAI
internet101010
I switched to InvokeAI and won't go back to the basic a1111 webui. I like how everything is laid out, there are workflow features, you can easily recall all properties (prompt, model, lora, etc.) used to generate an image, things can be organized into boards, and all of the boards/images/metadata are stored in a very well-designed sqlite database that can be tapped into via DataGrip.
quitit
automatic1111: great for the fast implementation of the most recent generative features
comfyui: excellent for workflows and recalling the workflows, as they're saved into the resulting image metadata (i.e. sharing images, shares the image generation pipeline)
InvokeAI: Great UX and community, arguably were a bit behind in features as they were focused on making the UI work well. Now at the stage of bringing in the best features of competitors - Like you, I can easily recommend it above all other options.
squeaky-clean
> recalling the workflows, as they're saved into the resulting image metadata (i.e. sharing images, shares the image generation pipeline)
Doesn't a1111 already do this? There's a PNG Info tab where you can drag and drop a PNG and it will pull the prompt, negative prompt, model, etc. And then a button to send it all to the main generation tab. It doesn't automatically load the model, but that may be intentional because of how long it takes to change loaded models.
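For the curious: a1111 writes those generation parameters into an ordinary PNG tEXt chunk (keyed "parameters"), so you can pull them out with nothing but the standard library. A rough stdlib-only sketch — the helper that builds a tiny PNG is just there so the example is self-contained, not how a1111 writes files:

```python
import struct
import zlib

def png_text_chunks(data: bytes) -> dict:
    """Extract tEXt chunks (keyword -> value) from a PNG byte stream."""
    chunks = {}
    pos = 8  # skip the 8-byte PNG signature
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)
    return chunks

def make_png_with_text(key: str, value: str) -> bytes:
    """Build a minimal 1x1 grayscale PNG carrying one tEXt chunk (demo only)."""
    def chunk(ctype: bytes, body: bytes) -> bytes:
        return (struct.pack(">I", len(body)) + ctype + body
                + struct.pack(">I", zlib.crc32(ctype + body)))
    ihdr = chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    text = chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
    idat = chunk(b"IDAT", zlib.compress(b"\x00\x00"))  # filter byte + 1 pixel
    return b"\x89PNG\r\n\x1a\n" + ihdr + text + idat + chunk(b"IEND", b"")

png = make_png_with_text("parameters", "fox in the woods\nSteps: 20, Sampler: Euler a")
print(png_text_chunks(png)["parameters"].splitlines()[0])  # fox in the woods
```

This is essentially all the PNG Info tab does, plus parsing the "Steps: …, Sampler: …" line back into individual settings.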
holoduke
Can you actually use those workflows via some sort of API to automate them from, let's say, a Python script? I played around with Comfy. Really nice, but I would like to automate it within my own environment.
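You can: ComfyUI serves an HTTP API (by default on port 8188), and with dev mode enabled the UI has a "Save (API Format)" option that exports a workflow as JSON you can POST back to the `/prompt` endpoint. A rough stdlib-only sketch, assuming a default local install (the tiny workflow dict here is a made-up placeholder; use a real exported one):

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # default ComfyUI address; adjust to your setup

def queue_workflow(workflow: dict, client_id: str = "my-script") -> urllib.request.Request:
    """Wrap a ComfyUI workflow (API-format JSON exported from the UI)
    into a POST request for the /prompt endpoint."""
    payload = json.dumps({"prompt": workflow, "client_id": client_id}).encode()
    return urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Placeholder workflow fragment; export a real one via "Save (API Format)"
workflow = {"3": {"class_type": "KSampler", "inputs": {"seed": 42}}}
req = queue_workflow(workflow)
# urllib.request.urlopen(req)  # uncomment with a running ComfyUI instance
print(req.full_url)
```

Results land in ComfyUI's output directory, and there's a websocket/history API for tracking progress, but the POST above is the core of scripting it.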
didibus
It's just missing too many features for me still, even though I like what it has better. I use things like segment-anything and custom upscalers, and I prefer how inpainting is controlled in A1111, where you can say whether you want the whole image or the mask area only, etc.
I've personally been using SD.Next, which is a fork of A1111 with support for the diffuser backend, a cleaned-up UI, and also sometimes has support for newer things before A1111, though not always. It's plugin compatible with A1111.
GaggiX
Also just Krita with the diffusion AI plugin: https://github.com/Acly/krita-ai-diffusion
TaylorAlexander
Yeah "Run Stable Diffusion locally" is a weird pitch since that's already easy to do tbh.
blehn
No idea whether or not the UI is user-friendly, but the installation steps alone for InvokeAI are already a barrier for 99.9% of the world. Not to say Noiselith couldn't be open-source, but it's clearly offering something different from InvokeAI.
demosthanos
I can't even figure out how one would install Noiselith. It has some text that says "Download for free on your PC", but it's not a button or a link. Maybe they're doing some weirdly locked-down user-agent sniffing and refuse to allow me to even attempt to download any version on Linux?
InvokeAI is installed via a script, sure, but it's also just a few clicks: download, extract, double-click on a specific file, enjoy.
blehn
There are two giant download buttons on the Noiselith homepage. The mac button downloads a dmg and the windows button downloads an exe.
smcleod
Yeah invokeAI is fantastic!
brucethemoose2
I would highly recommend Fooocus to anyone who hasn't tried: https://github.com/lllyasviel/Fooocus
There are a bajillion local SD pipelines, but this one is, by far, the one with the highest-quality output out of the box, with short prompts. It's remarkable.
And that's because it integrates a bajillion SDXL augmentations that other UIs don't implement or enable by default. I've been using Stable Diffusion since 1.5 came out, and even having followed the space extensively, setting up an equivalent pipeline in ComfyUI (much less diffusers) would be a pain. It's like a "greatest hits and best defaults" for SDXL.
stavros
I was afraid of the Python setup (even though I'm a Python developer), but yep: Make the virtualenv, install the dependencies, done. This is amazing, the images it generates are immediately beautiful.
It does look bad that it bundles GTM, though, as a sibling commenter says.
Samples:
brucethemoose2
Be sure to try the styles as well. That's actually a separate input from the prompt for SDXL, and most other UIs don't implement the style prompting.
dragonwriter
> Be sure to try the styles as well. That's actually a separate input from the prompt for SDXL.
No, it's not.
There are two text encoders, but they aren't really “prompt” and “style” inputs.
> and most other UIs don't implement the style prompting.
Most UIs' default mode of operation sends the same input to both text encoders, but at least Comfy has nodes that support sending separate text to them. OTOH, while there may be some cases where sending different text to the two encoders helps in a predictable way, AFAIK most of the testing people have done has shown that optimal prompt adherence usually comes from sending the same text to both.
neilv
Looks like the Web UI of the self-hosted install of Fooocus sells out the user to Google Tag Manager.
Can our entire field please realize that running this surveillance is a bad move, and just stop doing it.
stoobs
I think that's coming from gradio?
SV_BubbleTime
Probably, auto1111 does the same - but I agree with GP it shouldn’t be there.
If it isn’t explicitly surveillance, it could effectively be.
pmarreck
Have to build it yourself on Mac, and we all know how "fun" building Python projects is
jessepasley
Just spent about 10 minutes building it on MacBook Pro M1. I come with significant bias against Python projects, but getting Fooocus to run was very, very easy.
pmarreck
I finally had a chance to set it up and yes, it works great!
pmarreck
That's good to know!
liuliu
Yeah, Fooocus is much better if you are going for the best local generated result. Lvmin puts all his energy into making beautiful pictures. Also it is GPL licensed, which is a + in my book.
stoobs
Eh, I messed around with it for a while - it's okay and good for beginners, but without much more effort you can get better results out of A1111 or ComfyUI
calamari4065
Is this at all usable on a CPU-only system with a ton of RAM?
brucethemoose2
Not really. There is a very fast LCM model preset now, but it's still going to be painful.
SDXL in particular isn't one of those "compute light, bandwidth bound" models like llama (or Fooocus's own mini prompt-expansion LLM, which in fact runs on the CPU).
There is a repo focused on CPU-only SD 1.5.
calamari4065
Yeah, llama runs acceptably on my server, but buying a GPU and setting it all up seems really unfun. Also much more expensive than my hobby budget
airesearch69
I use the same. Any idea where to find current (not outdated) guides on how to create your own "model" out of a set of similar pictures of my dream model? I want to keep generating with the same face.
Thanks for any tips, guys :)
rvz
Looks like a complete contraption to set up, and very unpleasant to use at first glance when compared against Noiselith.
The hundreds of Python scripts and having the user touch the terminal show why something like Noiselith should exist for normal users rather than developers or programmers.
I would rather take a packaged solution that just works over a bunch of scripts requiring a terminal.
Liquix
installation/setup is dead simple. up and running in under 3 minutes:

    git clone https://github.com/lllyasviel/Fooocus.git
    cd Fooocus
    pip3 install -r requirements_versions.txt
    python3 entry_with_update.py
Filligree
Let's see...
> pip3: command not found
Okay. I'll need to install it? What package might that be in, hmm. Moving on, I already know it's python.
> /usr not writeable
Guess I'll use sudo...
= = =
Obviously I know better than to do this, but very few people would. This is not 'dead simple'! It's only simple for Python programmers who are already familiar with the ecosystem.
Now, fortunately the actual documentation does say to use venv. That's still not 'dead simple'; you still need to understand the commands involved. There's definitely space for a prepackaged binary.
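For what it's worth, the venv route the docs describe boils down to a couple of commands (a sketch, assuming a python3 that ships the built-in venv module; `fooocus-env` is just a name picked for illustration):

```shell
# Create an isolated environment; pip comes bundled inside it, and installs
# land in fooocus-env/ rather than /usr, so no sudo is needed.
python3 -m venv fooocus-env
. fooocus-env/bin/activate
python -c "import sys; print(sys.prefix)"  # prints a path ending in fooocus-env
```

That sidesteps both pitfalls above (missing pip3 and the unwritable /usr), but the point stands: you still have to know these commands exist, which is exactly the gap a prepackaged binary fills.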
zirgs
Or you can use Stability Matrix package manager.
liuliu
You have to make trade-offs in software development. Fooocus trades for the best picture rather than the most beautiful interface, and also for simplicity of use. I think it is a good trade-off given the technology is improving at breakneck speed.
Look, DiffusionBee is still maintained but still has no SDXL support.
Anyone who bets that the technology is done and it is time to focus on the UI is making the wrong bet.
rgbrgb
This project is really cool and I like the stated philosophy on the README. I think it's making the right trade-off in terms of setting useful defaults and not showing you 100 arcane settings. However, the installation is too hard. It's a student project and free so I'm not criticizing the author at all but I think it's a pretty fair and useful criticism of the software and likely a significant bottleneck to adoption.
Tiberium
Huh? It has a really simple interface, much much much simpler than anything else that uses SD/SDXL locally. Installation is also simple for Windows/Linux, don't know about macOS.
alienreborn
Interesting, will check it out to see how it compares with https://diffusionbee.com, which I have been using for the last few months for fun.
janmo
I just checked out both and Noiselith produces much, much better results.
AuryGlenz
I realize it may be good marketing, but it's odd to have the fact that it's on device and offline be the primary differentiator when that's probably how most people use Stable Diffusion already.
I'd probably focus more on it being easy to install and use, as that's something that isn't done much. For me, if it doesn't have Controlnet, upscaling, some kind of face detailer, and preferably regional prompting, I'm out.
I also kind of wish all of these people that want to make their own SD generators would instead work on one of the open source ones that already exist.
While an app store might be a good idea, in a world with Auto1111 and all of its extensions I think it's going to go over poorly with the Stable Diffusion community, for what it's worth.
philipov
You hit the nail on the head when you said it's good marketing, but go all the way. The thing you find odd tells you who they want to use their product: you're not their target audience. They are trying to convert people from online-only services like DALL-E, not people who already use SD.
michaelt
I think there's probably a bunch of people who don't use things like A1111 because of the complexities of the download-this-which-downloads-this-which-downloads-this-then-you-manually-download-this-and-this setup model.
I can see how something simpler might appeal to new users, even if it doesn't appeal to existing users.
AuryGlenz
Sure, and I agree with that. As I said, I'd probably push that just as much as it being 'offline,' if not more.
prepend
I’ve oddly found many cloud wrappers to stable diffusion. So I like the upfront on device/offline description.
It was weird when I was first playing with SD how many packages phoned home or spun up VMs or whatever instead of just downloading a bunch of stuff and running it.
solarkraft
I've used SD on my device, but I found it worth it to pay for the hosted version because it's much faster.
kleiba
Sales prompt: "Young woman with blonde curls in front of a fantasy world background, come hither eyes, sitting with her legs spread, wearing a white shirt and jeans hot pants."
I mean, really??
rcoveson
If the prompt wasn't somewhat sexual, divisive, or offensive it would be wide open to the chorus of "still not as good as midjourney/dall-e/imagen". Freedom from restriction is one of the main selling points.
momojo
I'm genuinely curious how many people in the open source community are pouring their sweat and blood into these projects that are, at the end of the day, enabling guys to transform their macbooks into insta-porn-books.
SV_BubbleTime
How many technological revolutions do we need to go through before we just accept and admit that, by default, it's typically about boobs?
KolmogorovComp
Glad I’m not the only one who found it inappropriate. Feels very much like a dog whistle.
rcoveson
What's subtle about it? In the dog whistle analogy, who are they who cannot hear the whistle?
To me this is more like yelling "ROVER! COME HERE BOY!" at the top of your lungs.
samutek
The actual prompt is "magic world and the girl sitting inside a computer monitor, fantasy, cinematic close up photo."
OP is just offended by the image of an attractive woman, I guess. Apparently that's "creepy" now.
smcleod
Yeah that’s creepy as.
dreadlordbone
After installation, it wouldn't run on my Windows machine unless I granted public and private network access. Kinda tripped me up since it says "offline".
tredre3
I had a similar experience.
On the first run it downloads about 30GB of data. I don't know if it would work offline on subsequent runs because for me it never ran again without crashing!
Also, upon uninstallation it left behind all its data (not user data, mind you, but the executable itself, its Python venv, its updater, and all the models; uninstalling basically just removed the shortcut in the Start menu).
kemotep
If you disconnected completely from the internet did it still run?
That is completely wrong to advertise it as “offline” if it requires an active internet connection to run.
stets
definitely exciting to see more local clients come out. As mentioned in other comments, there are some great ones out already. I've used automatic1111 which is quick and doesn't require a ton of tuning. But it still has lots of knobs and options which makes it difficult initially. Fooocus is super quick but of course less customization.
Then there's ComfyUI, the holy grail of complicated, but with that complication comes the ability to do so much. It is a node-based app that allows you to create custom workflows. Once your image is generated, you can pipe that "node" somewhere else and modify it, eg: upscale the image or do other things.
I'd like to see if Noiselith or some others offer support for SDXL Turbo -- it came out only a few days ago but in my opinion is a complete game-changer. It can generate 512x512 images in about half a second on consumer GPUs. The images aren't crazy quality, but the ability to make a prompt like "fox in the woods", see it instantly, and then add "wearing a hat" and see it instantly generate again is so valuable. Prior to that, I'd wait 12 seconds for an image. Sounds like not a big deal, but being able to iterate so quickly makes local image gen so much more fun.
tracerbulletx
All the real homies use ComfyUI
weakfish
Elaborate?
tracerbulletx
I'm being kind of tongue in cheek because I understand that this is for just making things really easy and ComfyUI is a node based editor that most people would have trouble with. But the best UI for local SD generation that the community is using is https://github.com/comfyanonymous/ComfyUI
ttul
If you are a programmer at heart, ComfyUI will feel very comfortable (pun intended). It's basically a visual programming environment optimized for the type of compositional programming that machine learning models desire. The next thing this space needs is someone to build an API hosting every imaginable model on a vast farm of GPUs in the cloud. Use ComfyUI and other apps to orchestrate the models locally, but send data to the cloud and benefit from sharing GPU resources far more efficiently.
If anyone has a spare thousand hours to kill, I would build that and connect it up with the various front-ends including ComfyUI, A1111, etc. Not a small amount of effort, but it would be rewarding.
rish
Agreed. It's worth the learning curve for the sheer power it brings to your workflows. I've always wanted to toy around with node-based architectures, and this seemed quite easy after using A1111 extensively. The community providing ready-to-go workflows has made it quite enjoyable too.
cchance
Haven't gotten to test it, but given I use CoreML on Comfy, I wonder if we'll see more optimization and performance work on the back end of these platforms as more useful frontends come out. The 1-4 it/s on a 512 image is just sad, and the 2-3 s/it at 1024 is just sad in this modern day; hell, the ANE can't even run SD at 1024x1024 on a MacBook Pro M3 :S
amelius
So it's free, but not open source.
What is the catch?
sib
They will have a non-free (as in beer) version once they exit beta (per the website).
SV_BubbleTime
With no real way to confirm it doesn’t phone home.
IDK, this all seems weird considering there are four other really good projects that do all of these things already.
stjohnswarts
what? there are dozens of application level firewalls out there
NKosmatos
As others have stated, local AI (completely offline after the model/weight download) is the way to go. If I have the hardware, why shouldn't I be able to run all this fancy software on my own machine?
There are many great suggestions and links to other similar/better packages, so follow the comments for more info, thanks :-)
Just installed, this is very cool. Local AI is the future I want (and what I'm working on too). A few notes using it...
Pros:
- seems pretty self contained
- built in model installer works really well and helps you download anything from CivitAI (I installed https://civitai.com/models/183354/sdxl-ms-paint-portraits)
- image generation is high quality and stable
- shows intermediate steps during generation
Cons:
- downloads a 6.94GB SDXL model file somewhere without asking or showing the location/size. Just figured out you can find/modify the location in the settings.
- very slow on first generation as it loads the model, no record of how long generations take but I'd guess a couple minutes (m1 max macbook, 64GB)
- multiple user feedback modules (bottom left is very intrusive chat thing I'll never use + top right call for beta feedback)
- not open source like competitors
- runs 7 processes, idling at ~1GB RAM usage
- non-native UX on macOS, missing the hotkeys and help menu you'd expect. Electron app?
Overall 4/5 stars, would open again :)