Show HN: Open Prompts – dataset of 10M Stable Diffusion generations
71 comments·September 22, 2022
Been using Lexica quite a bit for prompt analysis, thanks for the work.
General browsing is heavily dominated by portraits though. I was wondering if it would be worth having a face detected flag on images so you could filter portraits.
That’s a really cool idea, especially since that in particular doesn’t seem too hard. Do you have any ideas on how you’d implement it?
Great work on Lexica.art, it's been indispensable for finding good prompts and combinations.
How much does it cost to host it? Feels like hosting 500GB of images and serving them can't be cheap.
Last month Lexica served a little over 1 billion images and the Cloudflare bill (I'm using R2 + workers) was a little over $5k. I've since gotten it down to a more reasonable amount after spending some time to re-encode the images to reduce our bandwidth usage significantly. If Lexica were running on AWS/S3 I imagine our first month's bill would be closer to $100k rather than $5k. This is only image serving, so not including costs to run the beefy CPU servers to run CLIP search, frontend, DB, backend, etc.
Why not go with a server or two, or some VMs, on Hetzner/Kimsufi/OVH/Netcup/BuyVM etc., where they have very generous included transfer or even unmetered (BuyVM)?
I get it that everyone wants to use the trendy newest tech (workers etc or whatever the latest is), but your bill could easily be 20% (or less) of the $5k kind of numbers you are mentioning.
I guess if those kinds of numbers are just water under the bridge for you, then you may as well go with the easier cloud setup/infra though.
Besides search and image listing, what other plans do you have with Lexica? Also, do you plan to open-source anything of it?
Weird, 1 billion 50k requests ends up at about 5k on the CloudFront pricing calculator. Do you have a bandwidth estimate? Idk how you got that 100k quote.
What is the bandwidth like? I guess something on the order of a few 100 TB? Perhaps you can host this from an "unmetered" server for $50 per month. Not sure how high your peak loads are though.
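For a rough sense of the numbers in this subthread, here is a back-of-the-envelope sketch. The 50 KB average image size is only one reading of the "50k" above, and the $0.09/GB egress rate is an assumed flat rate for illustration, not a quote; real CDN and object-storage pricing is tiered and also bills per request.

```python
def total_terabytes(num_images: int, avg_image_bytes: float) -> float:
    """Total transfer in decimal terabytes."""
    return num_images * avg_image_bytes / 1e12


def egress_cost_usd(num_images: int, avg_image_bytes: float, usd_per_gb: float) -> float:
    """Bandwidth-only cost estimate; ignores request fees and volume tiers."""
    total_gb = num_images * avg_image_bytes / 1e9  # decimal gigabytes
    return total_gb * usd_per_gb


IMAGES_SERVED = 1_000_000_000   # "a little over 1 billion images" from the thread
ASSUMED_USD_PER_GB = 0.09       # assumption, not an actual price quote

if __name__ == "__main__":
    for avg_kb in (50, 500, 1000):  # hypothetical average image sizes
        size_bytes = avg_kb * 1000
        tb = total_terabytes(IMAGES_SERVED, size_bytes)
        cost = egress_cost_usd(IMAGES_SERVED, size_bytes, ASSUMED_USD_PER_GB)
        print(f"{avg_kb:>5} KB/image: {tb:8.1f} TB, about ${cost:,.0f}")
```

Under these assumptions, 50 KB per image works out to 50 TB and roughly $4,500, while a $100k bill at the same flat rate would imply images averaging a bit over 1 MB each. The gap between the two estimates may simply come down to the assumed image size.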
Thanks! Great work with Lexica.
We also released a free API https://devapi.krea.ai/ if anyone wants to check it out.
It will soon have endpoints with custom image generation features.
Amazing! Great to see more prompt APIs!
Amazing work, love Lexica! Thank you!
Thanks for posting this — I made it!
I've been steadily adding new prompts/images, and in fact today was the first time a set of user-submitted prompts were added.
I'm so pleased to know it's useful to others.
really cool! that must have been a lot of work. Here's another great site for references: https://proximacentaurib.notion.site/proximacentaurib/parrot...
Thanks for the kind words. I collected them and built the site over ~2 months. The financial cost to run the prompts on DALL·E 2 was the hardest part!
This is a tangent, but what are your thoughts about the word on this image?
It seems pretty on-topic. You've seen a lot of images, so I'm curious if that happens often? Or how significant you think it is.
I already knew your site. I don’t recall where I saw it. I shared it with Victor right away because of the sheer amount of content! Insane!
In fact, I thought for a while about how to potentially integrate it with Krea somehow. But I came up empty. If you have any ideas, please reach out via Twitter!
Thanks! Prompts can be hard to create. We hope that with access to these kinds of datasets we will be able to create tools and conduct studies that help us create better images and better understand the possibilities of AI models like Stable Diffusion.
I’d love to integrate our crawler with GitHub Actions and make it a self-updating dataset…
There’s so much stuff to do!
Amazing work great stuff !!!!!
I have a negative emotional reaction to PromptBase. Stable diffusion is free and someone tries to make a business out of adding little value on top of it? It's not wrong or anything, I just don't like it...
We're living some crazy times! A truly AI summer.
If you enjoy thinking about what the future of this field might look like, I highly recommend watching the interview between Yannic Kilcher and Sebastian Risi (https://www.youtube.com/watch?v=_7xpGve9QEE).
I was mind-blown after hearing it. It had been a long time since I heard such an interesting conversation. It's crazy how well Risi's ideas correlate with the way complex systems emerge in nature (optimizing locally), and the idea of self-organizing systems is just amazing.
> AI Summer
Stable Diffusion was released just a month ago, and look at the number of applications and improvements that have been developed. It feels like a year!
Open-source is the way to get the most out of this tech. We plan to keep building all the features that are to come at krea.ai in this way.
It is truly crazy the pace of updates. It's the first single topic I feel like I can't keep up with no matter how much time I spend with it.
It's like how they say Da Vinci's was the last generation that could know 'everything', but now there's too much new information arriving in real time!
There’s definitely no way to keep up with machine learning today. Let alone all the world’s knowledge.
AI will be the new Da Vinci.
Krea dev here.
Lexica is a search engine (like krea.ai), but it doesn’t allow you to create collections or like generations.
Regarding the API, both have public APIs although I’m not sure if you can paginate through several search results using the public Lexica API. In the Krea Prompts API, you can do cursor-based pagination.
Finally, Lexica API allows you to do CLIP-based search, but with Krea we are using PostgreSQL full-text search (for now). However, the code to do CLIP search with the dataset (including reverse image search) is in the repository.
(edit: also, neither Lexica nor other search engines or similar products are offering the dataset afaik.)
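For readers unfamiliar with the cursor-based pagination mentioned above, here is a minimal sketch of the general technique. It does not call the Krea or Lexica APIs; the in-memory `PROMPTS` list, the `fetch_page` function, and the `next_cursor` field are all hypothetical stand-ins.

```python
from typing import Iterator, Optional

# Hypothetical in-memory stand-in for a prompts table, ordered by id.
PROMPTS = [{"id": i, "prompt": f"prompt {i}"} for i in range(1, 8)]


def fetch_page(cursor: Optional[int] = None, limit: int = 3) -> dict:
    """Return up to `limit` rows whose id is greater than `cursor`."""
    start = cursor if cursor is not None else 0
    rows = [row for row in PROMPTS if row["id"] > start][:limit]
    # The cursor is the id of the last row returned; None means no more pages.
    has_more = bool(rows) and rows[-1]["id"] < PROMPTS[-1]["id"]
    next_cursor = rows[-1]["id"] if has_more else None
    return {"data": rows, "next_cursor": next_cursor}


def iterate_all(limit: int = 3) -> Iterator[dict]:
    """Walk every page by feeding each response's cursor into the next call."""
    cursor = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["data"]
        cursor = page["next_cursor"]
        if cursor is None:
            break


if __name__ == "__main__":
    for row in iterate_all():
        print(row["id"], row["prompt"])
```

Compared with offset-based paging, a cursor tied to the last seen id keeps results stable when new rows are inserted while a client is still paging.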
Nice! That is cool!
Well, we have a like + collection system at krea.ai.
We may use the data from there to train custom models. Kind of the same that MidJourney has, where they ask people to rate images in exchange for GPU hours as prizes.
We haven’t thought deeply about it yet.
One of the main things we're learning from the current trajectory of large models is that this kind of supervision isn't necessary. It's better to focus on bigger models, more data, and better datasets. Models will improve faster than we can come up with clever ways to add this supervision.
I wonder how CLIP search would work for finding errors in the dataset.
For now, a workaround is to create your own "glitch" collection in krea.ai and store images with artifacts there.
If you end up doing it we will add a "download all" button right away :)
And all the prompts from each collection could also be added to Open Prompts for sure.
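As a rough illustration of the CLIP-search idea for finding errors in the dataset, the sketch below ranks images by cosine similarity to a query vector. Everything here is an assumption for illustration: the three-dimensional vectors are toy values, whereas real CLIP embeddings have hundreds of dimensions and would come from running the model on the images and on a text query describing the artifact.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       image_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return image names ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in image_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy embeddings (hypothetical); a real pipeline would compute these with CLIP.
IMAGE_VECS = {
    "portrait.png": [0.9, 0.1, 0.0],
    "landscape.png": [0.1, 0.9, 0.1],
    "extra_fingers.png": [0.2, 0.1, 0.9],
}
# Stand-in for the text embedding of a query such as "distorted hands".
GLITCH_QUERY = [0.0, 0.0, 1.0]

if __name__ == "__main__":
    print(rank_by_similarity(GLITCH_QUERY, IMAGE_VECS))
```

Whether text queries actually surface artifacts well in practice is an open question; the top-ranked results would still need a human check before being added to a "glitch" collection.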
Interesting. Kind of like Scale AI for generative AI.
I know! I’m sorry that it doesn’t work that way.
The site is using Svelte + SvelteKit and I couldn’t find any amazing Masonry components (like Masonic for React) that would let me save and restore the scroll position easily. I could do it in hacky ways, but there are more important things to do.
I’m also still trying to figure out why the Back/Forward Cache is not working out of the box with my current implementation. It would make the site snappier and also address the issue you’re bringing up.
Perhaps open-sourcing the code and figuring it out all together is the way…