Brian Lovin
/
Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

dang

> Coming soon

Ah - this isn't out yet. That puts this in the "announcement of an announcement" category (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...).

Let's have a thread once the actual thing is there to be discussed. There's no harm in waiting (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...).

GaggiX

I've never seen a post get deboosted harder, but this "announcement" was extremely underwhelming, so I welcome it.

dang

Happy to exceed expectations in underwhelming!

davidbarker

If anyone's interested: last year I generated around 7,000 images using DALL•E 2 and uploaded them to https://generrated.com/

I wanted a way to experiment/see what DALL•E 2 could create and share with others as some sort of inspiration/a starting point.

This was before the API was available, so I had to generate and save them all manually. And it was rather expensive! But fun.

Looks like I'll have to update them all for DALL•E 3 when I get access.

dinkleberg

If you end up doing that, it would be cool to see the comparison of the v2 vs v3 images.

davidbarker

Absolutely. I've already started thinking about how I could incorporate some kind of comparison feature.

Because there's no way to control the seed, a direct comparison (using a before/after slider, for example) probably wouldn't make sense. But I could put the group of 4 images from each version above/below each other as a general comparison, perhaps?

thewataccount

> Because there's no way to control the seed, a direct comparison (using a before/after slider, for example) probably wouldn't make sense.

Even if it was the same seed, from my understanding Dalle3 would have to be just a further-trained version of the same checkpoint to even resemble Dalle2's images. Like how Stable Diffusion 1.4 vs 1.5, and 2.0 vs 2.1, will make identifiably similar images, but 1.5 vs 2.1 vs SDXL won't look remotely similar.

Even more so because I'd wager they changed their encoder and/or decoder too.

* I think that if they generated something like a controlnet for guidance the same way in both models then they might be comparable but from my understanding Dalle2 doesn't work that way at all.

Comparisons would still be interesting though!

undefined

[deleted]

tempaccount420

You can't use artist names in prompts anymore so I don't think you'll be able to use DALL-E 3 there?

EDIT: actually, it's only living artists that you can't prompt (hopefully; the article says so, at least)

lcnPylGDnU4H9OF

That's kind of an interesting way to handle the copyright issues. Not sure how effective it is as I suspect that can be bypassed by including a bunch of details about the artist but not the name.

> ... in the style of a famous Spanish artist who was born in 1881 and passed away in 1973 [and a bunch of other shit about Pablo Picasso]

(I also notice that this is more verbose than just "in the style of Pablo Picasso", which probably helps OpenAI's bottom line given costs associated with token counts. I doubt that's their intention with the change, just something of note. And, of course, a living example would be more applicable for copyright issues but the idea is still demonstrated.)

andromaton

The artist whose name includes Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz.

davidbarker

I think it'll decline if you ask for a living artist, but sounds like it'll work for other artists.

But otherwise you're right — some might not work.

delano

Crafting descriptions of an artist and/or individual works may function as a reasonable replacement for specific names. That's what's happening behind the scenes anyway.

It's an interesting problem. Like, what's the point of conception for a work of art?

somenameforme

Is anybody aware of the specific technical reason that it struggles with words so much? There seems to be enough of a pattern to create a million reasonable hypotheses - curiosity makes me want to know which it really is!

Looking at the images it's particularly interesting how it seems to have never once gotten the text correct, always just being a little bit off. Well sometimes way off, but mostly quite close.

gwern

Points to them still relying on a single small text embedding which is lossy instead of full cross-attention; but the fact that it can follow instructions reasonably well at least means that they've moved off CLIP - phew! It was frustrating arguing with people about DALL-E 2 limitations which were ultimately due to nothing but CLIP.
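The pooled-embedding vs. cross-attention distinction gwern describes can be sketched with a toy NumPy example (all shapes, names, and numbers here are illustrative, not DALL-E's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
token_embs = rng.standard_normal((5, 8))  # 5 prompt tokens, embedding dim 8

# Single pooled embedding (CLIP-style conditioning): the whole prompt
# is squeezed into one vector, so per-token detail such as word order
# and spelling is largely lost.
pooled = token_embs.mean(axis=0)  # shape (8,)

# Cross-attention conditioning: an image-side query attends to each
# token separately, so per-token information survives.
def cross_attend(query, keys):
    scores = keys @ query                     # one score per token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over tokens
    return weights @ keys                     # weighted mix, shape (8,)

query = rng.standard_normal(8)
attended = cross_attend(query, token_embs)

assert pooled.shape == attended.shape == (8,)
```

The pooled path throws away which token carried which letters, which is one plausible reason text rendering comes out "a little bit off"; the cross-attention path lets the image model consult individual tokens.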

epivosism

Yeah, it's really strange how hard they make it to manage, download, and get full prompts for your images on all these platforms. I made this discord bot for midjourney which with some easy configuration can download and annotate all your images, including as much info as I could grab about version, etc. https://github.com/ernop/social-ai/tree/main/SocialAI

Even then it's not perfect, since I'm getting the info off the command you send, which may have relied on whatever the defaults were at the time; so when interpreted today, it's not always possible to reconstruct the version/seed/etc. from that point in the past if you didn't include it in the prompt. But still, I just like having a folder of 30k images that I can never lose, with at least the prompt, so I can go through and re-run them later (even manually) to get comparisons over time.
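The "folder of images plus prompts" idea boils down to writing a metadata sidecar next to every image. A minimal sketch (hypothetical filenames and parameters, not the linked bot's actual code):

```python
import json
from pathlib import Path

def save_with_metadata(image_bytes: bytes, prompt: str, params: dict,
                       out_dir: str = "archive") -> Path:
    """Save an image plus a JSON sidecar recording the prompt and every
    parameter we know, so the generation can be re-run later for
    comparisons even if the service's defaults change."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = f"{len(list(out.glob('*.png'))):06d}"  # sequential numbering
    (out / f"{stem}.png").write_bytes(image_bytes)
    (out / f"{stem}.json").write_text(
        json.dumps({"prompt": prompt, **params}, indent=2))
    return out / f"{stem}.json"
```

Usage might look like `save_with_metadata(img, "a red fox", {"version": "v5", "seed": 1234})`; explicitly recording seed/version sidesteps the "fell into the defaults at the time" problem.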

stolsvik

That was really good. At first I thought it was a bit dull with just those 5 images, but the tons of different styles and concepts exemplified made it a great inspiration.

tomcam

Yes, I'm interested! It's absolutely incredible to have your pictures with the associated prompts. Thanks a ton!

davidbarker

Very kind of you to say — thanks!

nextworddev

Thoughts:

- ChatGPT integration is absolutely huge (ChatGPT Plus and enterprise integrations coming in October). This may severely thwart Midjourney and a whole bunch of other text-to-image SaaS companies, leaving them only NSFW use cases to focus on.

- Quality looks comparable to Midjourney, but Midjourney has other useful features like upscaling, creating multiple variations, etc. Will DALL-E 3 keep up, UX-wise?

- I absolutely prefer ChatGPT over Discord as the UI, so UI-wise I prefer this.

jonplackett

What I think could be amazing about ChatGPT integration is (holding my breath…) the ability to iterate and tweak images until they’re right, like you can with text with ChatGPT.

Currently with Midjourney/SD you sometimes get an amazing image, sometimes not. It feels like a casino. SD you can mask and try again but it’s fiddly and time consuming.

But if you could say ‘that image is great but I wanted there to be just one monkey, and can you make the sky green’ and have it take the original image modify it. Then that is a frikkin game changer and everyone else is dust.

This _probably_ isn’t the way it’s going to work. But I hope it is!

blensor

I am using Automatic1111 and SDXL and this is exactly the workflow people are doing with that.

Generate an image, maybe even a low quality one. Fix the seed and then start iterating on that
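What "fix the seed" buys you can be sketched with plain NumPy standing in for a real diffusion model (illustrative only): diffusion starts from random latent noise, and pinning the seed pins that starting noise, so prompt tweaks change only the denoising trajectory, not the composition's starting point.

```python
import numpy as np

def initial_latents(seed: int, shape=(4, 64, 64)):
    # A fixed seed reproduces the same starting noise every run,
    # so you can iterate on the prompt against a stable composition.
    return np.random.default_rng(seed).standard_normal(shape)

a = initial_latents(1234)
b = initial_latents(1234)
c = initial_latents(5678)

assert np.array_equal(a, b)      # same seed -> identical starting point
assert not np.array_equal(a, c)  # new seed -> new composition
```

In real tools (A1111, diffusers, etc.) this corresponds to passing an explicit seed/generator instead of -1 (random) on each generation.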

genewitch

Do you have a link on how this is done? What does "fix the seed" mean? I was experimenting last night with seed variation (under the Extra checkbox next to the seed input box) and I couldn't accurately describe what the sliders did. Sometimes I'd get 8 images that were slight variations, and sometimes I'd get 3 of one style with small variations, and then 5 of a completely different composition with slight variations, in the same batch.

As far as the OP goes, they claim you don't need to prompt engineer anymore, but they just moved the prompt engineering to ChatGPT, with all of the fun caveats that come with it.

LordDragonfang

Not quite. In A1111, the process is "pick a seed and iterate the prompt". You're still regenerating the image from noise, and if SD decides to hallucinate a feature that wasn't in the prompt, it's often very hard to remove it.

Since gpt4 natively understands images, there's the potential for it to look at the image, and understand what about it you want to change

nextworddev

Exactly - iterative "chat-driven" improvement of images will be a paradigm shift

gsuuon

That looks exactly like what was in the video (though the video seems to be down?)

jonplackett

Didn’t see a video! OK I am pretty excited now.

So long as it actually can create an image virtually the same but changed only how I want it. That would just blow everything else away.

I mean, I guess it's not that ridiculous. Generative fill in Photoshop is kind of this, but the ability to understand from a text prompt what I want to 'mask' (if that's even how it would work) would be very clever.

brianjking

Dall-e 2 had variations, inpainting, etc well before Midjourney. However, I 100% agree that it's going to be interesting to see who and what wins this race.

rvz

Stability AI with Stable Diffusion is already at the finish line in this race: it's $0, open source, not exclusively a cloud-based AI model, and can be used offline.

Anything else that is 'open source' AI and allows on-device AI systems eventually brings the cost to $0.

chankstein38

I agree. I am barely excited for DALL-E 3 because I know it's going to be run by OpenAI who have repeatedly made me dislike them more and more over the last year plus. My thoughts are: "Cool. Another closed system like MidJourney. Chat integration would be cool but it's still going to likely be crazy expensive per image versus infinite possibilities with Stable Diffusion."

Especially with DALL-E. Honestly I'd be more excited if MidJourney released something new. DALL-E was the first but, in my experience, the lower-quality option. It felt like a toy, MidJourney felt like a top-tier product akin to Photoshop Express on mobile, still limited but amazing results every time, and Stable Diffusion feels like photoshop allowing endless possibilities locally without restrictions except it's FREE!

Karunamon

That's a different race entirely; most people don't even have the hardware, let alone the knowledge to run SD.

junipertea

unless it's way worse

cubefox

> ChatGPT integration is absolutely huge

Probably not. Bing Chat (which uses GPT-4 internally) already has integration of Bing Image Creator (which uses Dall-E ~2.5 internally), and it isn't good. It just writes image prompts for you, when you could simply write them yourself. It's a useless game of telephone.

danbala

To be fair, NSFW is a considerable market, itself plenty big enough to hold many, many businesses. llm trained on literotica, anyone?

josephg

> llm trained on literotica, anyone?

I'm not going to post links, but there are several active projects & companies already doing that.

taytus

I don't see it. I use ChatGPT to create prompts for Midjourney; it takes me only a couple of clicks. I don't see the massive difference, especially since Midjourney is much, much better than DALL-E.

undefined

[deleted]

V__

> DALL·E 3 is designed to decline requests that ask for an image in the style of a living artist.

> Creators can now also opt their images out from training of our future image generation models.

So this version was again trained (without permission) on copyrighted work. And they try to shift the burden onto artists to manually opt out.

Aren't they afraid some court might, at some point, force them to pay each artist back a fee for each generated image?

jdminhbg

> So this version was again trained (without permission) on copyrighted work.

So am I!

klabb3

Yes but you were allowed because you are a human (I assume). Human rights are not transferable to even animals, much less machines.

You also have the right to have your eyes open in a public dressing room, but you’re not allowed to turn on your camera and film.

You’re also allowed to have fireworks but not recreational C4s. And so on.

thurn

Completely true, but copyright is not a "right" in the sense of human rights, it's a legal construct that we created to create certain social benefits. And it certainly wasn't my impression that most HN users view the current state of copyright law as an unmitigated positive force in the world.

amelius

Except you were actually trained, while the AI applied some form of computation that we (perhaps opportunistically) call "training" but it isn't really the same thing. You can't win in court just by naming things the same.

jdkee

What is the qualitative difference between the two?

smallerfish

How do we access your brain's API for fractions of a cent? I've got $100 ready to go!

jdminhbg

$100 grants you continuous access to my higher-level functions for about half an hour. Inquire within!

CamperBob2

The usual way. Send me a check every couple of weeks, and a 1099 at the end of the tax year.

lucideer

> Aren't they afraid some court might, at some point, force them to pay each artist back a fee for each generated image?

I'd say they're banking on the horse having bolted by the time such a thing might happen (i.e. courts would need to force 1000s of very large powerful companies to pay millions of people - an insurmountable legal effort).

timeon

Seems like we are experiencing another historical robbery.

tticvs

Oh no! Really? Surely you can point to what has been stolen, right?

two_in_one

> again trained (without permission) on copyrighted work

So far there is no solid proof of that. They didn't disclose the sources or the methodology. Except for 'trained' and 'copyrighted', the rest is questionable. Otherwise they would already be paying royalties.

They could have used the output from the previous version 2 with prompts generated by GPT, then corrected by humans based on the produced image. They also could have used CV to analyze new/old images, i.e. if there is a new feature in the image, add it to the prompt and train again.

BurningFrog

A living artist should be able to request an image in the style of themselves.

klabb3

It’s absolutely insane this is “allowed” under regular copyright, and now going into mass-consumer commercial products. Given the state of courts, I don’t have any hopes this will be reversed.

They are certainly striking under-the-table deals with big IP holders like Disney to not poke the bears, but leave all smaller actors defenseless (or rather penniless, more so than they already are).

callalex

>Aren't they afraid some court

Sam Altman has already empirically proven himself to be rich enough to be above the courts with the whole WorldCoin thing, why should he assume it would suddenly be different now?

tempaccount420

There is nothing illegal about World Coin.

ipaddr

It's already illegal in some places and will be in more soon

https://www.google.com/amp/s/www.coindesk.com/policy/2023/08...

Horffupolde

It’s a price they are willing to pay.

mortureb

All artists are trained on copyrighted work.

callalex

And they pay a license fee to do so. Artist textbooks are not free, because they pay the rights holders to reproduce their work.

karmajunkie

categorically false.

mortureb

Bullshit. You can look up all those works for free online. What a sorry argument. Copyrighted means you can't reproduce it, not that you can't look at it.

CuriouslyC

This does look like it might be a real threat to Midjourney, but it isn't going to dethrone Stable Diffusion. I'm guessing prompt adherence will be excellent, but the lack of customizability and the art-style gimping are going to limit its use greatly. People are going to produce base images with Dall-E 3 to get composition, then run them through Stable Diffusion for style/upscaling/details.

boppo1

Why isn't this a threat to Stable Diffusion? I see more super high quality stuff from midjourney that I struggle to ID as generated than I do SD.

TacticalCoder

Stable Diffusion is open and fully deterministic: a given version of SD+tools+seed shall always give exactly the same output. The model is available to everyone, so you can run it locally.

Which means there are countless (free) amazing tools around SD.

StableDiffusion is threatened by exactly nothing.

(others have mentioned that SD shall happily generate porn: I don't care about that... But I care about SD being the actual "open AI").

itake

Maybe I'm doing it wrong, but I couldn't get SD+tools+seed to be deterministic.

Images generated with the exact same settings (including seed) on my M1 laptop are not the same as the images from my Nvidia GPU desktop with the SD webui.

KeplerBoy

Because SD runs locally and produces the kind of content people want.

boppo1

> the kind of content people want.

Well put. I was wondering how that aspect was going to be acknowledged & phrased here.

yieldcrv

True. Fortunately there are many more things to create than nudity, sexually explicit content, and political figures.

“so for everything else, there’s Midjourney”

darknoon

Stable Diffusion + fine-tuning is quite powerful for creating specific art styles that aren't easily described to DALL-E / MJ

orbital-decay

Because SD has way less commercial restrictions and provides far more control than a text prompt alone ever could even with a real human on the other end.

pr337h4m

Stable Diffusion is extensible - for example, ControlNet images have gone super viral on Twitter in the last few days. https://twitter.com/0xgaut/status/1702394230478360637 https://twitter.com/deepfates/status/1701055664603426970

Also, it's basically free if you own a Mac or an iPhone.

ilkke

It is completely free even if you don't own an Apple product

ormax3

SD is uncensored, you can generate anything you want with no "safety" rails to stop you.

Just take a look at civitai to see the kind of finetuned models that are out there.

ilkke

I do feel Midjourney is better at the "press button and pretty image pops out" part, but if you want control over the process and results, then tools like Invoke are generations ahead.

0xDEF

Because people can make porn with locally hosted Stable Diffusion.

They can't do that with MidJourney and DallE.

chankstein38

Yeah honestly the art style gimping made me roll my eyes. Not really interested. MidJourney's filtering was annoying enough and ChatGPT's pointless refusals to do simple things because they could be misinterpreted annoyed me to no end. Combine the two and add explicit filtering for artist styles... yeah I'll just pass on this one.

EDIT: for what it's worth, I'm not making NSFW stuff with MidJourney. I'm talking about things like being unable to use the word "cutting" or "slicing" because they could be used to make gore but I wanted "A stock photo of a person cutting cheese on a counter"

carlosrdrz

Funny how they say: "Creators can now also opt their images out from training of our future image generation models." and then the link is just a form to submit a single image at a time.

They mention you can disallow GPTBot on your site, sure, but even if you do, what happens if the Bot already scraped your image? In any case, probably other people would just publish your picture in some other website that does not disallow GPTBot anyway.
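For reference, the GPTBot opt-out being discussed is, per OpenAI's documentation, a standard robots.txt rule like:

```
User-agent: GPTBot
Disallow: /
```

As the comment notes, this only stops future crawls of your own site; it does nothing about images already scraped or mirrored elsewhere.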

keiferski

I have been creating a large number of Midjourney images for a project recently, and it’s made me remember/realize something that I think the AI Art doomsayers seem to not understand: the importance of curation.

A quick glance at /r/Midjourney or even the images featured in DALLE link above shows how boring the “default result” is when using a generator. While it may be easier to create images, you still need some artistic sense and skills to figure out which ones are appealing. In the bigger picture I think this basically means that illustration-type art will become more of a curatorial activity, in which being able to filter through masses of images becomes the predominant skill needed.

timeon

Something like photography?

keiferski

Yeah I think that’s a good comparison. Images are cheap to make and everywhere. That doesn’t mean everyone is suddenly taking good photos.

mrweiner

I’m not terribly familiar with the text to image tools, but you can provide source images as baseline, right? I’d wager that if you’re able to create a baseline image to feed in, your results will be better. The better the input, the better the output. It definitely feels like a situation where artists who can leverage ai will be the ones pulling ahead in the commercial sector.

keiferski

It doesn’t really work that way. Yes, you can use images as a source, but they are more just mined for “pieces” to rearrange, not overall aesthetic effects.

minimaxir

That is not how image-to-image approaches work.

ControlNet is an obvious counterexample. If you think "diffusion is just collaging", upload a control image that cannot exist in the source dataset (e.g. a personal sketch) using this space, and generate your own image: https://huggingface.co/spaces/AP123/IllusionDiffusion

astrange

You can do it with ControlNet guidance for SD.

jprete

I don't know how long such a phase will last, and I don't think anyone should count on it.

dirtyid

They can probably train an AI to filter based on human appeal. But IMO there's still room for artists with taste and technical talent to manipulate images closer to a curated ideal, like the current SD + Photoshop workflows where generated content creates a base. I imagine once the workflow matures, there's going to be more manual input again, i.e. drafting specific postures/arrangements for ControlNet to block out composition before the AI fills in to 90%, and then human taste refines/reiterates the last 10%.

minimaxir

Given the bullet point of "DALL·E 3 is built natively on ChatGPT" and the tight integration between ChatGPT and the corresponding image generation (and no research paper released with the announcement), I strongly suspect that DALL-E 3 is a trial run of GPT-4 multimodal capabilities and may be run on a similar infrastructure.

cubefox

GPT-4 can only do text-to-text and image-to-text. It can't generate images itself. So it will simply use an API call. Really nothing special, Bing does the same thing.

ekam

The art produced by GPT-4 so far hasn't been at this level but this may be a newer version. See: https://arxiv.org/pdf/2303.12712.pdf

gumballindie

Have they removed copyrighted "training" material or are they still relying on people's hard work which they "learned from" without consent and are selling without permission?

gwerbret

From the end of this announcement (emphasis mine):

"DALL·E 3 is designed to decline requests that ask for an image in the style of a living artist. Creators can now also opt their images out from training of our future image generation models."

Very carefully worded statement. So... still relying on people's hard work, but on the upside, you get to opt out of having your work be fodder for DALL·E 4.

shmatt

So

* using living artists work to train models = good

* generating living artists work using said models = bad

Good ethical consistency from the OpenAI crew

gumballindie

Well, OpenAI is consistent in that it consistently tries to monetise other people's work without paying for it, and in doing so it leverages the gullibility of the masses to defend its actions. Clever.

tsukikage

An artist using an artbook as reference material = good

An artist tracing an image in an artbook and selling it = bad

Seems consistent to me

torginus

But not opt out of training Dalle4 on Dalle3 outputs, which were trained on your art.

undefined

[deleted]

judge2020

Legally this is likely a non-issue. It depends on if they successfully make the case that their AI learns like a human, and thus any outputs that aren't direct copies of existing art are new creations.

This even applies if the AI copies an artists art style (in the same vein as a human looking at one artists art over a weekend and then being commissioned to paint something in the same style, which is completely legal since you can't copyright an art style; although Adobe would love that[0]).

0: https://twitter.com/UltraTerm/status/1679294173793628161

MattRix

This kind of argument is always absurd to me because it doesn’t matter if it learns like a human or not, it’s still not a human.

(I would also argue that it learns and generates images in ways that are non-human, just based on speed and scale alone)

educaysean

How would you turn that into legislation though? I can simply draw a few pieces of art in the style of Naoki Urasawa, train a model on it, and claim that the outputs of the model are non-infringing. An artistic style is either copyrightable or not - I don't think a blurry middle ground helps anyone.

undefined

[deleted]

pixl97

Corporations are people too... in a particular legal perspective.

wwarner

So the law says that a painter can't steal copyrighted images, but a programmer can. Maybe the law has some catching up to do.

tsukikage

That's the exact opposite of the argument being made.

If you wouldn't be able to sue a painter over producing an image, why should you be allowed to sue a programmer over producing that image?

raytopia

They haven't. Probably one reason Adobe's AI will beat them out long term.

Also, another thing that's been on my mind: I wonder if all this AI generation stuff could cause a games-industry-style crash, where due to an oversaturation of highly advertised but meaningless/worthless AI content, consumers lose interest and stop spending money in the respective industries (books, games, films, digital art, music, etc.), and then those industries crash.

minimaxir

If there's an AI crash it will be due to the vast number of AI companies with insufficiently differentiated products and subsequent race to the bottom, not due to the ubiquity of the output.

gumballindie

I am not aware of a games industry crash; it would appear that gaming is an industry larger than all other forms of art combined. But indeed, niches that are saturated by enshittified content have almost crashed, and I get your point. I suppose the average will turn even more average, and indeed people will stop spending money on it. AI being a statistical machine, it will excel at making whatever is common and plentiful, and as such those industries will suffer even more. Average music, content writing, drawings, etc., will drop to near-zero value; that's guaranteed.

alec_irl

The comment you're replying to is referring to the video game industry crash in North America in the early 1980s. Basically, the market was flooded with games of poor quality due to a lot of factors, including Atari's complete lack of quality control on games they put on their 2600 console. Nintendo ended up redesigning their Famicom console as the Nintendo Entertainment System, with an emphasis on it looking like a VCR as opposed to a cheap game console like NA audiences were used to (the Famicom itself is fairly small and plasticky, with permanently attached controllers). Additionally, they were strict about licensing development on the system, with the goal of fostering a crop of family-friendly, high-quality games. It was a couple of years after the fact (iirc), but Nintendo's efforts to differentiate themselves in the wake of the crash obviously paid off and led to a long period of Japanese ascendancy in the games market. So the crash cleared out a lot of the market and led to a huge opportunity for Nintendo.

undefined

[deleted]

simonw

OpenAI license image data from Shutterstock, so it's possible that this is trained entirely on licensed images.

https://investor.shutterstock.com/news-releases/news-release...

More transparency about the training data, as always, would be greatly appreciated.

NoMoreNicksLeft

Truly, I have always hated how human artists have "trained" by looking at other people's art without permission, downloaded those without permission into their meat-brains, and trained their organic neural networks on this art.

You can't do that. It's copyright-maximalist copyright infringement.

datagram

The fundamental differences in scale between manual recreation by a human and automated replication by a machine are what led to the creation of copyright law in the first place.

NoMoreNicksLeft

No, that isn't what led to the creation of copyright in the first place. The concept predates Gutenberg's printing press.

Copyright in the United States was drafted into the Constitution as a way of rewarding creators so they could create more.

They're already being rewarded, perhaps too handsomely, there is no need to extend it further. If they persist in trying to take more than they're given, then the public will just need to revoke the privilege. It's not a human right.

rusk

> without permission

or attribution even

MilStdJunkie

That ship sailed long, long ago with ImageNet, I'm afraid. All that theft is part of "the economy" now, which means it ain't comin' back. Best we can hope for is a legal decision that says, "AI doesn't make shit. It's all public".

gumballindie

Funny how people complained about Chinese factories stealing IP and wanted protection, but now they defend OpenAI and others, and even rejoice at the idea that this will "free society". It's not clear what it will be free to do, since many who were left without jobs due to said factories switched to white-collar work, which is now to be stolen by... the people who complained about China stealing their IP. It's hilarious to watch this mass hysteria kickstarted by one single corporation. People are literally like cattle: you can steer them in any direction you wish if you know how.

BizarreByte

> but now they defend OpenAI and others, and even rejoice at the idea that this will "free society"

A tale as old as time, "it's different when we do it".

KHRZ

Funny how there are so many well-paid IP lawyers around, and they focused so hard on lobbying to extend copyright to 100+ years for simple copying, yet they never had the imagination and creativity (which copyright is supposed to be all about protecting) to extend copyright beyond that.

4bpp

Have human artists done so?

gumballindie

But a machine is not a human, so your analogy is moot.

4bpp

Most of the times this argument is fielded, including this one, it is formulated in the shape of an appeal to a general moral principle. I don't see what this principle is supposed to be: as my analogy shows, there is clearly no general moral principle against learning from copyrighted material and people's hard work without their explicit permission. The more narrow interpretation, in which the claimed principle is that a machine must not learn from copyrighted material (...), is also implausible: since we have no real history of machines learning from copyrighted material in any way that is recognizable as learning, it stands to reason that a principle addressing that scenario can not yet have become general.

The appeal is thus to a completely novel principle that you have come up with for yourself; and it seems that rather than presenting arguments for why others should adopt this principle, you are trying to present it in such a way that someone not paying close attention would be fooled into believing that it is common sense and widely accepted. An analogy with the classic "you wouldn't download a car" comes to mind.

flir

But a human is a machine.

tysam_and

And another round of races commences!

Some folks seem to have some strong ire towards OpenAI (maybe a bit less recently), but for one, they seem to do a really, really, _really_ good job at making themselves "the benchmark to beat" for certain things, and in doing that, I think they really seem to push the field quite far forward. <3 :'))))

jprete

I dislike OpenAI because they were founded to work on AI safety, and the most anti-safety thing you can possibly do is encourage competition over AI capabilities, which is exactly what they are doing over and over again.

dragonwriter

AFAICT, “AI safety” was a term created by the overlapping (sometimes in the same body) group of X-risk cultists and corporate AI marketeers as part of their effort to redirect concern from the real and present problems created and exacerbated by existing and imminently-being-deployed AI systems into phantom speculative future problems and corporate prudishness.

jprete

X-risk concerns have been around for a long time and were not invented by AI marketeers. I agree that the marketeers are abusing the concept to try for regulatory lock-in and to make their products look maximally impressive.

astrange

AI safety is the dumbest idea in the world, held by people who think computers are magic, so confusing its meaning is great. The original AI safety people now think LLM training might accidentally produce an AI through "mesa-optimizers", which is more or less a theory that if you randomly generate enough numbers, one of them will come alive and eat you.

orangecat

If there's any magic being alluded to, it's by the people who say that AIs will never reach or exceed human intellectual capabilities because they're "just machines", with the implication that human brains contain mystical intelligence/creativity/emotion substances.

emporas

True, and a good way to explain it to a layperson is through a comparison of HTML and Python.

Are there any implementations of Python in HTML? No, because HTML is not a programming language. Are there any implementations of HTML in Python? Many, because Python is a programming language.

Given these facts, one can easily see that HTML is a weaker language than Python.

So if HTML is weak, let's make it stronger! Let's give webpages more HTML headers than three. HTML now has 1 million headers! Is it less weak now? Does it come closer in strength to Python?

No, because the formal properties of HTML did not change at all, no matter the number of headers. Likewise, are the formal properties of the grammar generator called GPT any different depending on how many animals it has statistical data on? No: the formal properties of GPT's grammar do not change at all, whether it happens to know about 3 animals or a trillion.
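The asymmetry this comparison rests on can be made concrete with a short sketch (the helper names here are invented for illustration, not from any library beyond the Python stdlib): Python can both generate and parse HTML in a few lines, while nothing in HTML could run this program in return, no matter how many headers the page has.

```python
from html.parser import HTMLParser

def make_page(title: str, headers: list[str]) -> str:
    """Generate an HTML page with any number of <h1> headers.

    Adding more headers never changes what HTML is: the output is
    still a static markup document with no computational power.
    """
    body = "\n".join(f"<h1>{h}</h1>" for h in headers)
    return f"<html><head><title>{title}</title></head><body>{body}</body></html>"

class HeaderCounter(HTMLParser):
    """One of the 'many implementations of HTML in Python' the comment
    mentions: the stdlib html.parser walking an HTML document."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.count += 1

page = make_page("demo", ["a", "b", "c"])
counter = HeaderCounter()
counter.feed(page)
print(counter.count)  # → 3
```

The point survives the toy example: scaling up the data a system emits (headers, or statistical facts about animals) does not change the formal class of the system doing the emitting.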

tysam_and

While I dislike the silliness that you're alluding to, I think you're using multiple meanings of the phrase 'AI Safety' there all lumped into one negative association.

There are risks, especially in a profit-motivated capitalistic environment. Most researchers don't take the LessWrong in-culture talk seriously. I'm not sure many people are going to be able to actually understand the concerns of people in that group given the way you've presented their opinion(s).

golol

Look, there is no AI safety advancement without AI capability advancement. I think we can learn fuck all about AI safety if we don't actually try to build those AIs, carefully, and play around with them. AI safety is not an actual field of study when you don't have AIs of a corresponding level to study; otherwise there are zero useful results.

jprete

Sure, but you snuck an assumption in there. Just because AI is possible, or someone else will do it, doesn’t obligate us to build it. If we can’t make AI without risk of significant or existential harm, then we shouldn’t do it at all.

unnouinceput

>Some folks seem to have some strong ire towards OpenAI...

Yes, they should. OpenAI IS Microsoft, never forget this. Any old-timer like myself remembers the crap Microsoft pulled in the '90s. And they would still do the same today (and in the background they sometimes still do) if they led in those areas. I have no love for FB/Zucky boi, but the move to make LLAMA free is a good one. Hopefully another leak comes from inside OpenAI and we get access to everything.

yunwal

Not even in the background. In order to use BingGPT you have to use the Edge browser (there are ways around this, but they are not obvious to a nontechnical user). What could possibly be the reason for that besides anticompetitive behavior?

flir

> I have no love towards FB/Zucky boi, but the move to make LLAMA free is a good one.

I think that might be a bit "enemy of my enemy". Remember "commoditize your complement"? Not that I'm averse to the tech giants forcing each other into a race to the bottom.

boppo1

><3 :'))))

What is the meaning of this? Why is it part of your post?

cooper_ganglia

I think it's a heart and a smiling face with a tear :)

tysam_and

Yeah it's just a habit for what I do, feels most comfortable for me when commenting or messaging people.

We humans all have our quirks! <3 :'))))

prvc

As an isolated image, I prefer the Dall-E 2 sample (of the basketball player) to all the others on that page, aesthetically. Due perhaps to having used a more fine-art-heavy training corpus, or a less specific correspondence to prompts?

binarymax

I appreciate your preference (I like things heavier on impressionism too), but I don't think it's due to the corpus but rather the model's capability. DALL-E 2 is simply behind in capability. Of course we won't know until October, but I suspect you could prompt v3 to get a style closer to v2 if you wanted.

starshadowx2

This is actually an interesting issue the Midjourney team has thought a lot about. As each version has gotten "better", ie more realistic, there's been some loss of the "artistic" side. There are a lot of users who still use the old V2 model (compared to the most recent V5) specifically because of how "bad" it was. The grimy and less coherent parts are what they're actually looking for, instead of a more precise or perfect looking result. This has led to there being flags for adding in more stylisation or "weirdness" or being able to choose between more realistic or more artistic versions of models.

dinkblam

agreed. the new version (which they obviously view as so much better that this is their one "we made improvements" sample) is just repugnant

timeon

reminds me of kitsch

ricree

Artistic preference aside, the Dall-E 3 version definitely follows the prompt more closely (in that it shows someone dunking a ball).

prvc

That's part of my point. It better reflects the banal concept expressed by the prompt.

pastor_bob

It looks less like an 'oil painting', though. It looks to me like one of those stenciled, spray-painted images you see people selling at tourist attractions.

Perhaps Dall-E 2 unintentionally got that part better.

gzer0

The reason this will be a good product is that it is accessible natively within the ChatGPT interface.

In addition, having access to a library of prompts, and being able to produce, create, and store images within the web interface, will unlock this type of generative AI for images for many more people.

Compare this to the Midjourney approach, where users must not only sign up but also use a Discord bot (not that this is hard, but it is a larger barrier to entry).

Native integration will mean instant adoption by millions on day 1.
