celestialcheese
MintsJohn
I've been thinking that for months, but recently I swung back towards being more optimistic about SD. Everything Midjourney makes looks like Midjourney, while SD allows you to create images in any style. MJ really needs to get rid of that MJ style, or make it optional; it's undeniably pretty, it's just becoming a little much.
But I still feel 2.x is somehow a degradation from 1.x; it's hard to get something decent out of it. The custom training/tuning is nice (and certainly the top reason to use SD over MJ, since there are many use cases MJ just can't handle), but it shouldn't be used as a band-aid for apparently inherent shortcomings in the new CLIP layer (I'm assuming this is where the largest difference comes from, since the U-Net is trained on largely the same dataset as 1.x).
throwaway675309
To be fair, that's the default style of MJ; you're seeing it a lot because most users don't take the time to add style modifiers to their prompts.
If you add qualifiers such as soft colors, impressionistic, western animation, stencil, etc., you can steer Midjourney towards much more personalized styles.
brianjking
Yeah, a lot of Midjourney images are very clearly Midjourney images. Does Midjourney have inpainting/outpainting yet? I admit it's the offering I've evaluated the least.
Midjourney's images upscaled to their current max offering look fantastic, that's for sure. My wife generates some really great stuff just for fun.
lobocinza
It has inpainting and scaffolding at least.
lobocinza
MJ can emulate a lot of styles
https://github.com/willwulfken/MidJourney-Styles-and-Keyword...
smeagull
SD really shat the bed, and a bunch of projects appear to have stuck with 1.5.
michaelbrave
I think 2.0 still has potential. It works much better with textual-inversion-type models, which can kinda play nice with each other, so given enough of those I imagine you can get some cool stuff out of it. I've also heard it handles negative prompts much better, so those are less optional in 2.0.
But yeah, for now all my custom models are 1.5, so I've yet to fully upgrade; most of the community seems to be doing the same at the moment.
lobocinza
MJ is easy to get started with and works well out of the box. SD is for those who want to do things that MJ can't, like embeddings.
ted_bunny
What's SD? No one's said.
agf
Stable Diffusion https://en.wikipedia.org/wiki/Stable_Diffusion
tehsauce
stable diffusion
chamwislothe2nd
Every midjourney image has the same feeling to it. A bit 1950s sci-fi artist. I guess it's just that it all looks airbrushed? I can't put my finger on it.
cwkoss
Yeah, I think Midjourney makes fewer unsuccessful images, but it's harder to get images that don't match their particular style.
TillE
I don't know if that was Midjourney's intent, but it seems like a smart approach. Instead of trying to be everything to everyone and generating quite a lot of ugly garbage, you get consistently good-looking stuff in a certain style. I'm sure it helps their business model.
IshKebab
It's the science magazine article illustration look.
brianl047
Sounds great
If Midjourney applies this to all their artwork then maybe it alleviates some of the ethical concerns (Midjourney then has a "style" independent of the training data)
lobocinza
I've played a lot with it lately and that's just not true. If you play with styles, colors, angles, and views, you have a lot of control over how the image will look. It can emulate pretty much all mainstream aesthetics.
ImprobableTruth
I think it's down to having a lot of feedback data from being a service. SD has its aesthetics ratings, but I assume they pale in comparison.
nickthegreek
This is just an SD checkpoint trained on the output of Midjourney. You can load it into A1111 or InvokeAI for easier usage. If you are looking for new checkpoints, check out the Protogen series for some really neat stuff.
rahimnathwani
Do you mean this one? https://huggingface.co/darkstorm2150/Protogen_Infinity_Offic...
On the same topic, is there some sort of 'awesome list' of finetuned SD models? (something better than just browsing https://huggingface.co/models?other=stable-diffusion)
liuliu
nickthegreek
Not sure why this is downvoted. Civitai does in fact list a bunch of fine-tuned models and can be sorted by highest ranked, most liked, most downloaded, etc. It is a good resource. Many of the models are also available in the .safetensors format, so you don't have to worry about a pickled checkpoint.
narrator
Looking at this site, I would argue that the canonical "hello world" of an image diffusion model is a picture of a pretty woman. The canonical "hello world" for community chatbots that can run on a consumer GPU will undoubtedly be an AI girlfriend.
rahimnathwani
Thanks.
BTW I love your app! At my desk I use Automatic1111 (because I have a decent GPU), but it's so nice to have a lean back experience on my iPad. Also, even my 6yo son can use it, as he doesn't need to manipulate a mouse.
dr_dshiv
Wow. Is there something like this for text models?
madeofpalk
why are they all big breasted women?
nickthegreek
Here are the protogen models https://civitai.com/user/darkstorm2150
pdntspa
I just gave Protogen a spin and the diversity of outputs it gave me was abysmal. Every seed for the same (relatively open-ended) prompt used the same color scheme, had the same framing, and the same composition. Whereas with SD 1.5/2.1, the subject would be placed differently in-frame, color schemes were far more varied, and results were far more interesting compositionally. (This is with identical settings between the two models and a random seed)
So unless you want cliche-as-fuck fantasy and samey waifu material, classic SD seems to do a much better job.
vintermann
Yes, Protogen is based on merging checkpoints. The checkpoints it's merged from are also mostly based on merging. Tracing the ancestry back to actual fine-tuned models is hard, but there's a ton of booru-tagged anime and porn in there.
If there's one style I dislike more than the bland Midjourney style, it's the super-smooth "realistic" child faces on adult bodies that protogen (and its own many descendants) spit out.
quitit
It's actually worse, because automatic and invoke will let you chain up GANs to fix faces and the like, and both have trivial installation procedures.
This offering is like going back to August 2022.
152334H
HN is just incredibly bad at figuring out what kind of ML projects are worth getting excited about and what aren't.
MJ v4 doesn't even use Stable Diffusion as a base [0]; a fine-tune of the latter will never come close to achieving what they do.
[0] - https://discord.com/channels/729741769192767510/730095596861...
kossTKR
It doesn't use Stable Diffusion?
I thought everything besides DALL-E was SD under the hood.
tsurba
MJ's earlier versions were around before SD came out; before DALL-E 2 too, but after DALL-E 1, IIRC. So I assume they have their own custom setup, perhaps based on the DALL-E 1 paper originally (not the weights, as those were never published) and improved from there.
Eduard
I didn't understand a single word you said :D
lxe
sd checkpoint -- Stable Diffusion checkpoint: a model weights file obtained by tuning the Stable Diffusion weights, probably using something like DreamBooth on some number of Midjourney-generated images.
a1111 / invokeai -- Stable Diffusion UI tools.
Protogen series -- popular Stable Diffusion checkpoints you can download to generate content in various styles.
throwaway64643
> This is just a sd checkpoint trained on output of Midjourney
Which is sub-optimal, i.e. bad. You don't want to train on output from an AI, because you'll end up with a worse version of whatever that AI is already bad at (hands, feet, and countless other things). This is the AI feedback loop people have been talking about.
So instead of figuring out what Midjourney has done to get such good results, people just blatantly copied those results and fed them directly into the AI, true to the art-thief stereotype.
version_five
The huggingface element of these annoys me. Reading the other comments, this is just a Stable Diffusion checkpoint, so I should be able to download it without using the diffusers library or whatever other HF stuff. It's frustrating that it's tied to a for-profit ecosystem like this.
I suppose PyTorch is/was Facebook, but it feels more arm's-length; I don't have to install and run a Facebook CLI to use it (nobody get any ideas).
You don't need an HF CLI, you just need git LFS (I believe it's now bundled with git installers) to pull the files off of HF (unfortunately still requiring an account with them). It would be nice to see truly open mirrors for this stuff that don't involve any company.
rattt
You don't need a HF account to download the checkpoint, can be downloaded straight from the website/browser, direct url: https://huggingface.co/openjourney/openjourney/resolve/main/...
version_five
Is it possible to download with curl or git lfs (or other "free" command line tool) with no login? I couldn't find a way to do that with the original sd checkpoints.
rattt
Yes works with anything now, they removed the manual accepting of the terms and auth requirement some months after release.
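For anyone who wants to script that direct download, the URL pattern is just the repo path plus /resolve/&lt;revision&gt;/&lt;filename&gt;. A minimal stdlib sketch; the checkpoint filename here is an assumption, so check the repo's "Files and versions" tab for the real one:

```python
from urllib.request import urlretrieve

def hf_resolve_url(repo: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face repo.

    HF serves raw files at /<repo>/resolve/<revision>/<filename>.
    """
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"

url = hf_resolve_url("openjourney/openjourney", "mdjrny-v4.ckpt")
# urlretrieve(url, "mdjrny-v4.ckpt")  # uncomment to download (several GB)
```

The same URL works with curl or wget, since no auth token is required anymore.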
stainablesteel
I don't think we're at the point where most individuals can financially support the model training; it's a company doing all this because it requires the consolidated funds of a business.
Give it 10 years and this will change.
notpushkin
Maybe crowdfunding is an option today?
zargon
There was a group that tried to do this recently and Kickstarter shut them down.
Rastonbury
You can download the checkpoint right from Hugging Face, and diffusers is a library you can use for free. I'm not sure what the issue is here; that people need an account?
titaniumtown
Someone should do this but for ChatGPT. Massive undertaking, though.
vnjxk
look up "open assistant"
titaniumtown
oh damn https://github.com/LAION-AI/Open-Assistant
cool stuff, thanks
EamonnMR
If it's using a RAIL license isn't it not open source?
nickvincent
Yeah, that's a fair critique. I think the short answer is that it depends who you ask.
See this FAQ here: https://www.licenses.ai/faq-2
Specifically:
Q: "Are OpenRAILs considered open source licenses according to the Open Source Definition?"
A: "NO. THESE ARE NOT OPEN SOURCE LICENSES, based on the definition used by the Open Source Initiative, because they have some restrictions on the use of the licensed AI artifact.
That said, we consider OpenRAIL licenses to be “open”. OpenRAIL enables reuse, distribution, commercialization, and adaptation as long as the artifact is not being applied for use-cases that have been restricted.
Our main aim is not to evangelize what is open and what is not but rather to focus on the intersection between open and responsible licensing."
FWIW, there's a lot of active discussion in this space, and it could be the case that e.g. communities settle on releasing code under OSI-approved licenses and models/artifacts under lowercase "open" but use-restricted licenses.
kmeisthax
My biggest critique of OpenRAIL is that it's not entirely clear that AI is copyrightable[0] to begin with. Specifically the model weights are just a mechanical derivation of training set data. Putting aside the "does it infringe[1]" question, there is zero creativity in the training process. All the creativity is either in the source images or the training code. AI companies scrape source images off the Internet without permission, so they cannot use the source images to enforce OpenRAIL. And while they would own the training code, nobody is releasing training code[2], so OpenRAIL wouldn't apply there.
So I do not understand how the resulting model weights are a subject of copyright at all, given that the US has firmly rejected the concept of "sweat of the brow" as a copyrightability standard. Maybe in the EU you could claim database rights over the training set you collected. But the US refuses to enforce those either.
[0] I'm not talking about "is AI art copyrightable" - my personal argument would be that the user feeding it prompts or specifying inpainting masks is enough human involvement to make it copyrightable.
The Copyright Office's refusal to register AI-generated works has been, so far, purely limited to people trying to claim Midjourney as a coauthor. They are not looking over your work with a fine-toothed comb and rejecting any submissions that have badly-painted hands.
[1] I personally think AI training is fair use, but a court will need to decide that. Furthermore, fair use training would not include fair use for selling access to the AI or its output.
[2] The few bits of training code I can find are all licensed under OSI/FSF approved licenses or using libraries under such licenses.
nickvincent
This is a great point.
Not a lawyer, but as I understand the most likely way this question will be answered (for practical purposes in the US) is via the ongoing lawsuits against GitHub Copilot and Stable Diffusion and Midjourney.
I personally agree that the creativity is in the source images and the training code. But unless it is decided that, for legal purposes, "AI artifacts" (the files containing model weights, embeddings, etc.) are just transformations of training data, and therefore content subject to the same legal standards as content, I see a lot of value in trying to let people license training data, code, and models separately. And if models are ruled to be just transformations of content, I expect we can adjust the norms around licensing to achieve similar outcomes (i.e., balancing open sharing with some degree of creator-defined use restriction).
twoodfin
How would you distinguish “just a mechanical derivation of training set data” from compiled binary software? The latter seems also to be a mechanical derivation from the source code, but inherits the same protections under copyright law.
taneq
“Mechanical derivation” is doing a lot of heavy lifting here. What qualifies something as “mechanical”? Any algorithm? Or just digital algorithms? Any process entirely governed by the laws of physics?
cwkoss
Is the choice of what to train upon not creative? I feel like it can be.
kaoD
> nobody is releasing training code
Interesting. Why is this happening?
skybrian
Fair enough. "Source available" would be better than "open source" in this case, to avoid misleading people. (You do want them to read the terms.)
daveloyall
I'm not familiar with machine learning.
But, I'm familiar with poking around in source code repos!
I found this https://huggingface.co/openjourney/openjourney/blob/main/tex... . It's a giant binary file. A big binary blob.
(The format of the blob is python's "pickle" format: a binary serialization of an in-memory object, used to store an in-memory object and later load it, perhaps on a different machine.)
But, I did not find any source code for generating that file. Am I missing something?
Shouldn't there at least be a list of input images, etc and some script that uses them to train the model?
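For the curious: a pickle checkpoint is nothing more than a serialized Python object graph, typically a dict mapping tensor names to weight arrays. A toy sketch with fake weights (illustrative only, not the actual layout of this particular file):

```python
import pickle

# A pickle blob is a serialized Python object graph; checkpoints in this
# format are typically a dict of tensor names to weight arrays.
fake_weights = {"unet.down.0.weight": [0.12, -0.53], "unet.down.0.bias": [0.0]}

blob = pickle.dumps(fake_weights)   # serialize to bytes, as in a .ckpt file
restored = pickle.loads(blob)       # deserialize, perhaps on another machine
assert restored == fake_weights

# Caveat: unpickling can execute arbitrary code embedded in the stream,
# which is one reason the community has been moving to .safetensors.
```

None of this includes the training data or training script, which is exactly the gap you're pointing at.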
JoshTriplett
Yeah, this should not have a headline of "open source". Really disappointing that this isn't actually open, or even particularly close to being open.
EamonnMR
Seems like 'the lawyers who made the license' and the OSI might be good authorities on what's open source. I'd love to hear a good FSF rant about RAIL though.
dmm
Are ML models even eligible for copyright protection? The code certainly but what about the trained weights?
charcircuit
My thought is that it is a derivative work from the training data. The creativity comes from what you choose to or not to include.
nl
Well, Open Source licenses don't make sense for training artifacts, for the same reason Creative Commons licenses, rather than Open Source ones, are used for "open" written and artistic works.
nagonago
> Also, you can make a carrier! How you may ask? it is easy. In our time, we have a lot of digital asset marketplaces such as NFT marketplaces that you can sell your items and make a carrier. Never underestimate the power open source software provides.
At first I thought this might be a joke site, the poorly written copy reads like a parody.
Also, as others have pointed out, this is basically just yet another Stable Diffusion checkpoint.
notpushkin
This particular wording sounds like it could be a poor translation from Russian. Sdelat' karjeru (literally: to make a career) means to make a living doing something, or to succeed in doing some job.
88stacks
I was about to integrate this into https://88stacks.com but it requires a write token to Hugging Face, which makes no sense. It's a model that you download. Why does it need write access to Hugging Face!?!
bootloop
Does it really? Have you tried it, or do you mean because of the documentation? I just skimmed through the code and haven't seen anything related to uploading. It might not even be required.
vjbknjjvugi
why does this need write permissions on my hf account?
deathtrader666
"For using OpenJourney you have to make an account in huggingface and make a token with write permission."
admax88qqq
But why
KaoruAoiShiho
How is it equivalent? It's not nearly as good. Some transparency about how close it is to MJ would be nice, though, because it can still be useful.
whitten
Maybe this is an obvious question, but if you generate pictures using any of these tools, can you create the same picture/character/person with different poses or backgrounds (such as for telling a story or creating a comic book), or would you get a new picture every time (such as for the cover of a magazine)?
How reproducible are the pictures?
Narciss
Yes, you can create an AI model based on a few pictures of the “model” (the model can also be AI generated) and then you can generate images of all kinds with that model included.
Check out this video from prompt muse as an example: https://youtu.be/XjObqq6we4U
haghiri
This was my project, but since @prompthero changed their "midjourney-v4 dreambooth" model's name to openjourney, I changed my model name to "Mann-E" which is accessible here: https://huggingface.co/mann-e/mann-e_4_rev-0-1 (It's only a checkpoint and under development)
pfd1986
Are there instructions for fine tuning the model on our own images? Thanks!
shostack
I'm failing to train a model off of this in the Automatic1111 webui DreamBooth extension. Training on vanilla 1.5 works fine; this throws a bunch of errors I don't have in front of me on my phone.
I loaded it both from a locally downloaded version of the model and by entering the huggingface path with my token with write (?!?) permissions.
Anyone run into similar issues? Suggestions?
If anyone wants to try it out without having to build and install the thing - https://replicate.com/prompthero/openjourney
I've been using openjourney (and MJ/SD) quite a bit, and it does generate "better" with "less" compared to standard v1.5, but it's nowhere close to Midjourney v4.
Midjourney is so far ahead in generating "good" images across a wide space of styles and subjects using very little prompting. While SD requires careful negative prompts and extravagant prompting to generate something decent.
Very interested in being wrong about this, there's so much happening with SD that it's hard to keep up with what's working best.
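To make the "extravagant prompting" point concrete, here is a toy helper for assembling the positive/negative prompt pairs SD-style models tend to need. The tag lists are common community conventions, and the function is purely illustrative, not any library's API:

```python
def build_sd_prompt(subject, quality_tags=(), negative_tags=()):
    """Assemble a positive and a negative prompt string for SD-style models.

    Midjourney often does well with just the bare subject; SD 1.x/2.x
    usually needs extra quality tags plus a negative prompt to avoid
    common failure modes. The tags below are illustrative conventions.
    """
    positive = ", ".join((subject,) + tuple(quality_tags))
    negative = ", ".join(negative_tags)
    return positive, negative

pos, neg = build_sd_prompt(
    "portrait of an astronaut",
    quality_tags=("highly detailed", "sharp focus", "soft lighting"),
    negative_tags=("lowres", "blurry", "extra fingers", "bad anatomy"),
)
# pos -> "portrait of an astronaut, highly detailed, sharp focus, soft lighting"
# neg -> "lowres, blurry, extra fingers, bad anatomy"
```

The resulting strings would be passed as the prompt and negative prompt in whatever SD frontend you use.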