Did SDXL FAIL? - Downfall or too young?

29,394 views

Olivio Sarikas

1 day ago

Did SDXL fail the expectations of the community? I asked my community what you think about SDXL. Here are the most important arguments. What do you think about SDXL? Did Stability AI get it wrong? Or is SDXL simply too young to take the crown?
#### Links from the Video ####
stability.ai/blog/stable-diff...
civitai.com/models/4201/reali...
civitai.com/models/4384/dream...
civitai.com/models/43331/maji...
civitai.com/models/101055/sd-xl
civitai.com/models/112902/dre...
civitai.com/models/43977/leos...
#### Join and Support me ####
Buy me a Coffee: www.buymeacoffee.com/oliviotu...
Join my Facebook Group: / theairevolution
Join my Discord Group: / discord
AI Newsletter: oliviotutorials.podia.com/new...
Support me on Patreon: / sarikas

Comments: 418
@gkoriski 8 months ago
I've been working on a huge series for more than a year. I started with the base models and have been improving the images since. I don't use ready-made styles; I control the style of the results exclusively through prompts. Considering the results I've been able to achieve since SDXL came out, in my book SDXL is a huge step up.
@user-tn4eq9kk7f 8 months ago
+1
@MrSongib 8 months ago
I had the same view. My problems are my hardware limitations and the size of the model and its LoRAs. The WebUI also seems to stagnate with SDXL; the improvements we see come more from ControlNet and other tools that give us more precision over what we want when we render with AI.
@latt.qcd9221 8 months ago
To me, one of the single most important things I need from future AI is better prompt understanding of multiple subjects. There are plenty of times when I want a certain pose with multiple characters, or need two characters each with very specific looks, and that is incredibly difficult to achieve even with ControlNets, composable LoRAs, etc. I want to be able to prompt for characters with specific heights, hair colors, outfits, poses, and so on, with the prompt affecting only that particular character and not the others. That's far more important to me than squeezing out a little more detail, especially when it comes at such a significant performance cost. Add to that the fact that I have hundreds of LoRAs that I use, all of which are SD 1.5, and I have no reason to swap.
@mirek190 8 months ago
Then the model must be even bigger than SDXL to understand context better.
@Steamrick 8 months ago
@@mirek190 Yes, you'd need a big LLM like, say, GPT-4 in the backend, like DALL-E 3 has. Unfortunately, that's not going to run on anybody's home computer hardware for at least another half decade, probably a full decade. Though I guess they're also swiftly improving how good smaller LLMs are.
@westingtyler2 8 months ago
I think you are talking about 'cardinality', which is the bleeding edge right now, and I think DALL-E 3 has it pretty strongly. Given enough time, SD will as well.
@highcommander2007 8 months ago
100% this. Would be really nice if it "paused" at something like 20% and you can click "subject 1" and "subject 2" to assign the features you entered in the prompt. I feel like something like this could be programmed fairly easily.
@6daysoflight 8 months ago
I'm working on my second short film now and can fully confirm that. DALL-E 3 is really great in that regard.
@waltdistel716 8 months ago
The overhyping felt really disingenuous from the start. That infamous "preference win rate" graph was a serious red flag to me.
@pennybillson3616 8 months ago
In terms of creating photorealistic images of humans and for photo restoration, I find that 1.5 yields more pleasing results (with extensions and upscaling). However, for more artistic projects, I get better results with SDXL.
@neoneil9377 8 months ago
👍💯
@ThePromptWizard2023 8 months ago
Strange, I'm finding that referencing artist names is terrible in XL but great in certain 1.5 models. I did one grid of 10 totally different artist names, and the pictures came out very similar; it seemed like a total fail on artist names.
@davidsouto79 8 months ago
There is one thing missing in the debate here: the SDXL model was trained on non-proprietary content, which allowed Stable Diffusion to use the OpenRAIL-M license, so as a user of the model you get full legal control over your AI art creations in the United States. That is no petty thing at all and is a huge step forward. Basically, SDXL allowed Stability AI to become fully open source, paving the road for a much brighter future for the community.
@robinmountford5322 8 months ago
Absolutely. This is a game changer and should make all other debate points moot.
@ThaGamingMisfit 8 months ago
I was also surprised this wasn't mentioned, it was the first upside I thought of. I guess most people don't care about copyright.
@davidsouto79 8 months ago
@@robinmountford5322 Yeah, I think the hardware debate is really shortsighted, as everyone, pros and hobbyists alike, eventually upgrades their rigs. But a consistent legal framework for our creations is the one thing that was missing, and Stability AI has managed to accomplish it for the open-source community, just as Adobe Firefly has managed it for the proprietary software industry. It's an incredible gift to everyone, and 6 or 7 years from now everyone will be able to train SDXL models and LoRAs with ease and do business with their craft.
@eimantasbutkus5324 8 months ago
I didn't know this. But if it's true, it's quite significant indeed.
@RickHenderson 8 months ago
Thanks for mentioning this. I didn't know that, but it's a huge issue for me. I'll stick with SDXL and get as good at it as I can.
@Seany06 8 months ago
It's still young, give it another 6 months. I've got to say, though, I have been extremely impressed with more recent SD 1.5 models like epiCRealism compared to the models we had at the start of the year, and of course last autumn; the progress has been astounding. It took us a while to get here, and although you'd expect that carrying what we learned building on 1.5 over to SDXL would mean quick results, it may still take people more time to tweak and fully understand how to get the best out of SDXL. SDXL hasn't really been out that long, not even 3 whole months since the full version was released. Relax, have patience.
@eimantasbutkus5324 8 months ago
It will take longer for SDXL to show its full potential because more than half of the community is still sitting on SD 1.5, sadly.
@digidope 8 months ago
6 months is a decade in the AI world. MJ is rolling out 2048px resolution in its next update, and Ideogram and DALL-E 3 can do perfect text. Sure, we need SDXL for the titty pics, but the competition is moving a lot faster on quality. DALL-E 3 is next level in prompt understanding, and no custom XL model can improve that.
@digidope 8 months ago
@@eimantasbutkus5324 It's also due to how slow XL models are to train. It's expensive to train a very good XL model.
@h.m.chuang0224 8 months ago
@@eimantasbutkus5324 And probably two-thirds of the community can't even run SDXL properly on their hardware
@tsahello 7 months ago
😅epic no Nlight and UM fashion Is Better bro
@Foolsjoker 8 months ago
I use about 80% SDXL and 20% 1.5. I think it just needs some of 1.5's extensions to come over. Give it time.
@eimantasbutkus5324 8 months ago
How much time, though? Over half the community is sitting with their thumbs up their butts on SD 1.5, refusing to adapt and move over.
@vizsumit 8 months ago
The biggest problem is training time. While the resolution has doubled, training time is 4x to 6x with SDXL.
@speedy3749 8 months ago
But that's to be expected. Double resolution means you double the pixels in two dimensions, so 2*2 is four. There is also some overhead to keep the pixels related, so 4x to 6x sounds about right from simple math. That has always been the case with image resolution: even if it scaled totally linearly, doubling the image size quadruples the number of pixels and therefore the render time.
@vizsumit 8 months ago
@@speedy3749 Interesting, I never thought about that.
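The scaling arithmetic in the thread above can be checked with a few lines of back-of-the-envelope Python (an illustration only; the overhead factor is a hypothetical stand-in, not a measured number):

```python
# "Double the resolution, quadruple the pixels" -- the argument from the
# thread above, written out as a quick sanity check.

def pixel_count(width: int, height: int) -> int:
    """Total pixels in an image of the given dimensions."""
    return width * height

sd15 = pixel_count(512, 512)    # typical SD 1.5 training resolution
sdxl = pixel_count(1024, 1024)  # typical SDXL training resolution

ratio = sdxl / sd15
print(ratio)  # 4.0 -- doubling each dimension quadruples the pixel count

# With some extra per-step overhead on top (a hypothetical 1.0x-1.5x
# multiplier for attention and the larger U-Net), a 4x to 6x slowdown
# matches the comment's estimate.
overhead = 1.5
print(ratio * overhead)  # 6.0
```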
@DandelionStrawart 8 months ago
Great video as always, Olivio. I just want to mention this for the people having trouble with low VRAM and SDXL. I was one of those people; it was taking me 5 minutes to generate a simple picture with Auto1111 and SDXL. There's a fellow on YouTube with a channel called Tube Underdeveloped. He has a video where he shows how to tweak the web-ui file so it can generate SDXL fast. I followed his instructions and went from 5 minutes for a simple picture to as little as 15 seconds. Now, I have an RTX 3060, so your mileage may vary, but his video is very detailed and has a lot of stats for various cards. If it helped me, I'm sure it can help you. I have no affiliation with that channel, just passing the info on.
@BoolitMagnet 8 months ago
SDXL hasn't failed me, but it also wasn't the major leap from SD 1.5 I was expecting, given that it was a major release. Aside from higher detail thanks to the higher resolution, I'm glad to see far fewer instances of double heads or extra limbs in base SDXL, and even less need for face restoration compared to base SD 1.5. For the SDXL base models, I would rather they prioritize better interpretation of, and adherence to, the prompt. Even a smaller LLM that can fit on consumer GPUs alongside the inference engine might give enough of a boost to make it worthwhile. Everything else can still be handled by custom models, LoRAs, IPAdapter, ControlNet, in/outpainting, etc. My fear is that if the next major update to SDXL is just another "incremental" improvement, the momentum behind SD will deflate dramatically and the community will be stuck with community-driven enhancements to SD 1.5. Who knows, this may motivate the birth of a new (truly) open-source steward who takes SD in the direction that benefits the community most.
@StringerBell 8 months ago
SDXL is another league. I don't even need negative prompts anymore. On 1.5 I had to paste a negative prompt two to three times the size of the main prompt, and it was ridiculously hard to get a decent image without trickery.
@d.hunterphillips74 8 months ago
Same. My negative prompt is now more stylistic, just a couple of words as opposed to a laundry list.
@jopansmark 8 months ago
Yeah, seems like OpenAI and Midjourney are running a huge Stable Diffusion hate campaign all over the internet
@Eyoldaith 6 months ago
I'd recommend trying to trim your 1.5 negative prompts; put in too much and it'll practically skip over half of it. Overly long negative prompts hurt the output of many models; you're better off using negative embeddings plus a few keywords based on your prompt and model.
@AlistairKarim 8 months ago
SDXL is great, but because of its hardware requirements it's doomed to be less universal, so SD 1.5 will see more experimentation and niche extensions for years to come. I hope that with more fine-tuning effort SDXL becomes a great base model for commercial applications. If you work in a studio that uses generative content, you'd obviously prefer more detail and resolution, regardless of how many LoRAs are out there on Civitai.
@kritikusi-666 8 months ago
It is not their fault that the hardware prices are shit.
@mirek190 8 months ago
Who even uses 4GB cards? Such cards are 10 years old. If you want to work with graphics/AI, why run it on a potato? wtf
@shabadooshabadoo4918 8 months ago
The reason I don't use XL is that it's way slower on my hardware, and I find the 1.5-based checkpoints make 1024px images just fine. Then the upscaling is so good it's easy to get a 3k x 3k image in no time flat with only 10 GB of video memory. It's a fast workflow because getting the image you want is super quick, and the only time cost is upscaling the already-perfect image to develop even more detail. It also keeps inpainting fast, since that happens before upscaling.
@henrik.norberg 8 months ago
I'm using AI to create pictures for books I'm writing and need my models to follow my prompts as closely as possible. SDXL is better at following my prompts, but I get better pictures with 1.5, so yeah, I'm disappointed. I was going to buy a 7900 XTX or 4090 (I only have a 3060 12GB now), but now I think I'll wait and see if DALL-E 3 suits me better, as it follows prompts almost insanely well.
@jtjames79 8 months ago
I've also got a 3060. I'm planning on adding a 4060. AI models can take advantage of multiple cards automagically. That would be 20 gigs of VRAM and a decent number of tensor cores between the two of them. Way, way cheaper than trying to upgrade to a single superior card.
@user-yj3mf1dk7b 8 months ago
@@jtjames79 Don't want to disappoint you, but it won't be a pooled 20 GB of VRAM; it will be 12 GB + 8 GB separately.
@henrik.norberg 8 months ago
@@jtjames79 Yeah, some AI like LLMs can use multiple cards decently, but it will never be as hassle-free as a single card. I also only have a server with space for a single card, and my main gaming rig is a mini-ITX with a single slot, so... And a 7900 XTX/4090 in my main gaming rig will be nice in a year or so, when I'm finished with Baldur's Gate 3 😂
@del669 8 months ago
I'm new to this, so take that into account, but the results I get for simple anime-style game assets on SDXL are significantly better for me. It understands my prompts better and has fewer abnormalities.
@lionhearto6238 8 months ago
Thanks for bringing this up. Good to see someone with your platform speak on such topics.
@telemarcelo 8 months ago
Well, I think the SDXL base model is much better than the regular 1.5. The fine-tuned 1.5 models are superb, but I did not have as much luck with the 1.5 base model as I did with SDXL in ComfyUI.
@deadman5985 8 months ago
I agree with all those comments, and I feel you picked the ones that really reflect how the community feels about it. The community really tries with SDXL, and consumer hardware will get better, but for me it still can't beat 1.5 community models and LoRAs.
@OscarFrancoQuintanilla 8 months ago
Corrected text: "I had a video card with 6 GB of VRAM. I started using SD 1.5 in Automatic1111 with ControlNet, LoRAs and the rest, trying various models for a few months, and everything was going great until SDXL came out. The first thing I noticed is that Automatic1111 would not load it, mainly due to the 6 GB VRAM limit of my card :(. However, ComfyUI and Fooocus arrived, which could run SDXL models, especially Fooocus. Normally, rendering a 512x768 image in SD 1.5 took me 12 to 18 seconds on average, depending on whether I used ControlNet, LoRAs or Roop to change faces. In Fooocus with SDXL it took 1 minute, and I could only switch between SDXL models and use some SDXL LoRAs. However, the quality, and the ability to include higher-definition text, was evident. Although the SD 1.5 models are great in artistic or realistic modes, in SDXL the quality was undeniably superior, and still is, especially now that more SDXL models and LoRAs are coming out. The most professional tool is undoubtedly ComfyUI, and it works on cards with 4 GB or 6 GB of VRAM. Next up is Fooocus, whose new version, Fooocus MRE, offers ControlNet inpainting and more styling options and has significantly improved. There is no doubt about the additional quality that SDXL provides, although the well-developed world of SD 1.5 is still very solid, and I am still using it. Now I have purchased a 12 GB VRAM card, and rendering is 8 times faster in Automatic1111, Fooocus and ComfyUI. The transition will take time as users upgrade their equipment to further explore the world of SDXL. In any case, it appears that all the SD 1.5 models are migrating to SDXL, as are their LoRAs, and SDXL now even runs in Automatic1111. My conclusion, then, is that upgrading your equipment will give you greater creative possibilities and higher quality. Thank you for reading."
@karenreddy 8 months ago
Fail? The quality of the fine-tuned models is already much better overall, and it listens to prompts more. As it matures, this will only become clearer. It's easy to spot a mostly prompt-based 1.5 generation now; they look very flat next to XL.
@coloryvr 8 months ago
I like them both: SDXL is sometimes better with complex prompts, complex situations and well-defined color compositions. Right now DALL-E 3 is at the top of my list, but I'm sure SD will top DALL-E 3 in a few weeks... Things are changing so quickly in AI. At the very least, I believe local Stable Diffusion will win the race in the long run (whether that's 1.5 or SDXL). I primarily use SDXL to generate endless textures for use as VR brushes... (The new tiling feature is amazing!) Happy colorful greetings to all AI enthusiasts!
@DivaAi 8 months ago
What new tiling feature? SDXL has a tiling feature? Do tell! I'd love to check it out! :D
@robinmountford5322 8 months ago
I have a very outdated GPU, a 1060, and it runs SDXL at high resolutions very well in ComfyUI. I also get a more realistic oil-painting style from the SDXL base model than from any trained model on either 1.5 or SDXL. Also remember the license is much more permissive for SDXL, which I think is crucial.
@michaelbayes802 8 months ago
I agree with your analysis, Olivio. For me, SDXL is quite disappointing. On one side there are the hardware requirements, which lead to slow image generation; as a result I abandoned A1111 to get more speed via ComfyUI, but it's still too slow. Also, while the images have more pixels, I don't find the details much better. And the faces you get from SDXL all look very similar, and not really what I like.
@user-yj3mf1dk7b 8 months ago
lol, like faces in 1.5 are different. Just use different names in the prompt. Simple as that.
@BillMeeksLA 8 months ago
While it still doesn't have the best community support, SDXL works so much better than 1.5 it's not even funny. I rarely have to use a negative prompt, for example. I was ready to start making assets for my animated series using my custom 1.5 models, but delayed so I could upgrade them all to XL, because it works so much better out of the box.
@SchusterRainer 8 months ago
SDXL is a huge improvement for me personally. Prompting is still quirky, but you can get really nice results, especially with some fine-tunes or "xl more art full". I'd say it depends on what you expect: SD 1.5 has areas where it shines, and so does XL. XL also needs to be prompted a bit differently to get the same results. But overall I feel I can get much more creative output for non-realistic imagery.
@Ulayo 8 months ago
I think SDXL has a lot more potential than 1.5; we just aren't seeing what it's really capable of yet. It feels like development is going slowly: the models need more fine-tuning, and we need more of the tools we have for 1.5.
@phamucmanh6909 8 months ago
I always keep in mind that it's never about the tool. Who uses it is what matters.
@mikebrave-makesgames5365 8 months ago
I've had plans to upgrade to XL, but retraining all of my LoRAs is actually a massive commitment, and I've only gotten a few done, so 90% of my time has still been with 1.5 as well.
@phizc 8 months ago
Are your LoRAs public?
@Phezox 8 months ago
SDXL (including all the fine-tuned models currently available) can make better artistic images, and it's better at colors and contrast, cinematic lighting, and styles like anime, but overall the 1.5 models are better IMO (at least for my use cases); 1.5 models can generate images that feel more natural than what SDXL can. Anyway, I use both, and I find myself using 1.5 models much more. Also, the inpainting models for 1.5 are just magic; they're one of 1.5's best features. I just hope SDXL can be fine-tuned in the future to be at least on par with the fine-tuned 1.5 models. Though I admit that SDXL understands many prompts better than 1.5 and has many more styles.
@NorthgateLP 8 months ago
I think you're missing one big point: SDXL comes with a refiner that the community-based models don't have. The way the refiner was implemented wasn't well thought through IMO, and it hurts the whole ecosystem. Aside from that, we have a fracture in the UIs, with ComfyUI and A1111 each demanding their own individual extensions, which just makes everything move a lot slower.
@MrMsschwing 8 months ago
Yes indeed, the refiner was a bad choice, I think. Why would you want an extra step with an extra model? This doesn't feel like an advancement.
@heinzerbrew 8 months ago
I hate the refiner. There are some SDXL-based models that don't need it. I love those.
@DivaAi 8 months ago
@@heinzerbrew Almost none of the new SDXL models need the refiner, and IIRC that was the way Stability AI intended it. :)
@kademo1594 8 months ago
I'm wondering where all the big creators' models are. Like Realistic Vision; the one behind DreamShaper is taking a break; and where are the models from the creators of ReV Animated and majicMIX?
@Elwaves2925 8 months ago
Realistic Vision for SDXL has been out for a little while and is currently on version 2.0. It's called RealVisXL.
@zxbc1 8 months ago
There are a lot of new SDXL models that are superb. Copax's models are excellent, and so are Zavy's. Then there's RealVis, which does realism very well too. Juggernaut also has many versions.
@HunnniDarling 8 months ago
I have made and downloaded a couple of SDXL LoRAs. So far I'm getting mediocre results if I don't refine and upscale, but I'm liking the end results much better. RTX 3060 12GB here. I got used to shorter wait times on SD, but it has been worth adapting.
@turbofliptv 8 months ago
Thanks for mentioning this! I'm also mainly on 1.5, but mostly because of the community's LoRAs and the ControlNet pipeline. Not great on every render, but mostly, and very flexible for every need. Can we just have a more optimized 1.5?
@highcommander2007 8 months ago
One thing that has always surprised me is why this software doesn't have a simple text option. It blows my mind, given how relatively simple text is compared to lighting, perspective, proportions and so on.
@jancijak9385 8 months ago
You could get similar refiner-flow results with ComfyUI before SDXL was even out. You could even use different models as refiners of previous images and edit parts of the image. No matter how good the prompting side gets, we are not able to describe and develop the images we want in a single step. Iteration and control are the most important things for these models. I would love to see controls that fix the state of the background. I would love to lock the exact person I have rendered and just change the viewing angle.
@milo8425 8 months ago
Yeah, DALL-E 3 is definitely making me ask this question as well.
@BecauseBinge 8 months ago
SDXL excels at composition compared to SD 1.5.
@henrischomacker6097 8 months ago
IMHO the problem with the 1.5 models is not image quality but the poor real understanding of the content of the image to be created. The current Stable Diffusion models still have no understanding that a hand normally has five fingers, and the colors of even "combined" tags in brackets still bleed all over the image.

Another big problem that can drive you crazy is the over-intensive zoom attention on image parts that get a lot of attention: you may, for example, use 20 phrases to describe a face very accurately and also give some of the keywords higher weights. "What? I wanted, and already had, a frightened robot riding a horse, and now, after defining the robot's facial expression, I only have a little face part left?" And it is almost impossible to get the full image back without a lot of tricks, like explicitly describing the horseshoes. IMHO the whole weighting system is broken.

How do you bring more than one person onto the stage just by describing them with "the person on the left has..." and "the person on the right wears..."? Still unsolved. (OK, maybe this could be achieved by a clever integration of the OpenCV library: recognize people or image parts from triggers found in the prompt, create an image mask for each of those regions in the background, and fill only those parts, as with inpainting, using the corresponding prompt parts. Could work. But there is still the models' lack of textual understanding, so an additional text encoder would probably be needed just to recognize which prompt keywords belong to "the left person".)

Yes, it's nice that XL gives us very finely detailed HQ images, but that was not what we desperately needed. What we need is good control over the image through prompting alone.
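The region-mask idea floated in this comment can be sketched with plain NumPy (purely illustrative; real regional-prompting extensions apply masks like these to latents or cross-attention, and every name here is hypothetical):

```python
import numpy as np

# Illustrative sketch of the commenter's idea: split the canvas into
# regions and associate each region's mask with its own prompt fragment.
# This only builds the masks; a real implementation would use them to
# restrict each prompt's influence during denoising.

def left_right_masks(width: int, height: int) -> tuple[np.ndarray, np.ndarray]:
    """Return boolean masks covering the left and right halves of the canvas."""
    left = np.zeros((height, width), dtype=bool)
    left[:, : width // 2] = True  # left half of the image
    right = ~left                 # right half is everything else
    return left, right

left, right = left_right_masks(1024, 768)
regions = {
    "the person on the left has red hair": left,
    "the person on the right wears a blue coat": right,
}
# The masks partition the canvas: equal halves, no overlap.
print(left.sum(), right.sum())  # 393216 393216
```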
@doingtime20 8 months ago
To me it is partly a configuration thing; for example, I've noticed InvokeAI and Comfy produce better images than Automatic1111. With more tweaking, I believe people will realize XL is indeed superior to 1.5. Sure, it's not a huge jump, but it's definitely not the failure some people are framing it as.
@manleycreations6466 8 months ago
As for the download numbers: I first used SDXL through the ComfyUI interface, which downloaded it automatically. Then, when Automatic1111 was updated to support SDXL, I just copied the model from the ComfyUI folder instead of re-downloading it from Civitai. I don't know how many other people got the model this way, but that might point to lower numbers on Civitai's site.
@rwarren58 2 months ago
I'm new to all of this. I only have 8 GB of GPU memory, and I'm getting out-of-memory errors. Thanks for this (new to me) info. I subscribed!
@quaterman1270 8 months ago
I think that regarding precision, DALL-E 3 has set the bar pretty high. Stable Diffusion doesn't come even close to it. I'm hoping to get this kind of precision with Stable Diffusion soon.
@michaelpiper8198 8 months ago
With stuff like DALL-E 3 coming out, and I'm sure others will follow, I don't really feel the need to use SDXL anymore, since I can get the parts of an image I want from DALL-E 3 and then put it together with SD 1.5, along with any uncensored tweaks I want to make.
@user-yu4ix4qs9q 8 months ago
You forgot one thing about SDXL versus 1.5: the prompt. SDXL prompts are more readable, cleaner and shorter. You need so many extra negative prompt words with 1.5 that it sometimes becomes ridiculous.
@97BuckeyeGuy 8 months ago
I started playing with SD just as XL was released. I really hated the prompts needed to get decent images from SD 1.5 and had much better results with my SDXL prompts, so I've basically switched over entirely to SDXL. Each image definitely takes longer to create, but I don't need to create as MANY images with XL to get one I like. Of course, with the release of DALL-E 3, it's become painfully obvious that Stable Diffusion is lightyears behind on its language model.
@ddrguy3008 8 months ago
Censorship and no ControlNet. Censorship on this kind of tool messes with the process. When you want to create legitimately creative and meaningful things, you want them generated freely, without unnecessary control that is arguably disruptive to the process/diffusion.
@DanKetchum007 8 months ago
The most important thing they could introduce is layers, with SD intuitively placing things on their own layers.
@MondayMoustache 8 months ago
The only reason I'm still using 1.5 models is that there simply aren't any XL models trained to be as good at specific, consistent styles. The XL models I've tried have been unpredictable, as though they require too much ultra-specific style prompting to get what you want. It reminds me of how the old DALL-E 1.0 model would often give you MS Paint styles if you didn't explicitly state the style you wanted.
@Yattayatta 8 months ago
Hey Olivio and community, I'm upgrading my computer mainly to run generative AI locally without queues and subscriptions. What GPU is good value currently? I was looking at a 4070 with 12 GB of VRAM or maybe a 4080 with 16 GB. I'd love any input from knowledgeable people.
@nokta7373 8 months ago
I agree; the extra time required to generate in XL is usually not worth it, especially since the images are not super good out of the box and you need to experiment and work with the prompts to get what you want. Add in the upscaling and refiner time, and it's just cumbersome for very little, if any, gain. Even if I strip it all down and render at lower resolutions instead of 1024x1024, losing a bit of quality to speed up the process, it can still take up to 50s to render one image on my old GPU, whereas in 1.5 I can render a 512x512 in 9s, see if the seed/prompt is any good, and then go the img2img upscale route if I nailed it. I don't know if it's because XL is still new and its potential hasn't truly been unleashed yet, but right now XL models feel very limiting in both the quality of results and the time required to get them. You can't beat MJ going down this road, and with DALL-E 3 looking more promising by the day, I think SDXL might have truly missed the mark. Like you, I'm still using 1.5 models 95% of the time.
@anonimo6603 7 months ago
I'll preface this by saying that I'm completely new to this world; I recently switched from NovelAI Diffusion to an SD 1.5-derived model with Easy Diffusion (I haven't been able to install any other software), and I'm wondering: are these models able to make copies of the same image with slight modifications? If I have, for example, an image of a person or character that I want in another location, can I do that? Another question: is there any word on future projects after SDXL? I apologize for the bad English; I use DeepL.
@IlRincreTeam 8 months ago
It's just that the Gigantic LAION text encoder (SDXL) produces very similar results to the Huge text encoder (SD 2.1); the man who uncensored SD 2.1 on Civitai made that very clear. The biggest difference is "text writing" in the pictures, but most people can do that easily in Photoshop; the overall composition quality is exactly the same.
@gorkulus 8 months ago
I thought the two images you showed had a dramatic difference in quality, detail, and photorealism. It seems super clear to me.
@elysilk4538 7 months ago
I have four paint brushes on my AI easel: SD 1.5, SDXL 1.0, Bing/Create, and Midjourney. I try all of them for each image and select the one that gives me what I'm looking for. But it's not always the same virtual brush that does the job. I like them all.
@pokepress 8 months ago
I've primarily been using Stable Diffusion to create "what if?" images from existing graphics, so ControlNet has been a necessity. It has only recently been updated for SDXL, and judging from my initial tests, I may need to invest in a GPU with 12 or even 16 GB of VRAM, and that doesn't leave many options at a price I'm willing to pay. I'm sure SDXL will be useful as the tools continue to catch up, but for now I'd recommend most folks start with 1.5.
@ThoughtFission 8 months ago
Totally agree with what you are saying. Also very upset with the censoring and biases!
@JamesRogersProgrammer 8 months ago
For my uses it is just starting to give good results. Just like with SD 1.5, it takes a while for people to figure out how to train their models to get the results we want to see. Plus, every new model has a learning curve; it takes time to figure out the magic words you need to say to generate the results you want.
@ksiobrga 8 months ago
I'm working with SDXL and it's way better than SD 1.5. Just waiting for the community to release Tile mode for ControlNet, and this would be perfect.
@picsou2867 8 months ago
Interesting point of view! Also, most of the community-trained SDXL models recommend not using the refiner model, which in some way makes me question whether the SDXL architecture design is fundamentally flawed, as the community has shown that the refiner concept brings no significant improvement and can even make image quality worse.
@RoguishlyHandsome 8 months ago
Very interesting. Thank you for these thoughts.
@racingvw92 8 months ago
SDXL needs a few more months; if it is still not in a good place by then, a rebrand to something newer and better needs to happen. Recognize the audience that wants this type of AI for NSFW versus making landscapes. The tool should not exclude anything, and there should be niches someone can dive into more deeply: NSFW, interior design, landscapes, abstracts, etc. I draw the line at deepfakes and the like, however.
@OnigoroshiZero 8 months ago
I've been in love with the style of RevAnimated, so with nothing even close to that style based on XL, I never even tried it. Now I spend my time with DALL-E 3, which, given some proper instructions to create a similar style, can sometimes replicate it to a good degree, but with much higher quality and detail.
@Firespark81 8 months ago
I still use 1.5 as my everyday driver. I was not too impressed with SDXL. I messed with it a bit, and IMO the images it makes are not much better than what I can get with 1.5 and a good model. All the times I tested it, I still had issues with hands and other random nonsense. Also, the images do not appear as sharp to me when generated with SDXL, but that could be user error. I kinda expected more along the lines of what we are seeing with DALL-E 3, but got what felt more like a lateral move.
@CoolAiAvatars 8 months ago
The low number of downloads correlates with the hardware requirements.
@mute888 8 months ago
I have been so completely uninterested in image generation until SDXL and open source stuff.
@sternkrieger1950 8 months ago
Basically SDXL is SD 1.5 just with 1024x1024 resolution instead of 512x512.
@pensiveintrovert4318 8 months ago
We need a model that is good at both text in images and the images themselves. Meme creation alone would be a huge use case.
@JohnSmith-hv6ks 8 months ago
In my experience, anatomy was a big problem: extra knees and arms, floating limbs, most of the time.
@scetchmonkey007 8 months ago
I wish you could have separate prompts for multiple subjects, i.e., make a three-subject scene where each subject has a separate box for prompts. I was wondering if ComfyUI might get something like that. That could be a big step forward.
@OlivioSarikas 8 months ago
You can do that in ComfyUI, have a look here. From the linked page, download the image and drag it into your ComfyUI to load the full workflow: comfyanonymous.github.io/ComfyUI_examples/noisy_latent_composition/
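For readers curious what that noisy latent composition actually does under the hood: each prompt is partially denoised into its own latent, the latents are pasted together with a mask while still noisy, and the combined latent is then finished with a shared prompt. A minimal numpy sketch of just the compositing step (shapes follow Stable Diffusion's 4-channel, 1/8-resolution latent convention; this is an illustration of the idea, not ComfyUI's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two partially denoised latents, one per subject prompt.
# A 512x512 image corresponds to a (1, 4, 64, 64) latent.
latent_a = rng.standard_normal((1, 4, 64, 64))
latent_b = rng.standard_normal((1, 4, 64, 64))

# Mask: the left half of the canvas belongs to subject B.
mask = np.zeros((1, 1, 64, 64))
mask[..., :32] = 1.0

# Composite while still noisy, then continue denoising the result
# with a combined prompt so the seam blends naturally.
composed = mask * latent_b + (1.0 - mask) * latent_a
print(composed.shape)  # (1, 4, 64, 64)
```

Because the blend happens before denoising finishes, the sampler smooths the boundary between the two regions instead of leaving a hard seam.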
@scetchmonkey007 8 months ago
@@OlivioSarikas I guess it's finally time to get into ComfyUI; I've been too busy lately to learn a new feature. All I've had time to do lately is go back into 1.5 and play around with some of the newer models. Still, every time I leave SDXL I feel the loss of quality in 1.5, and my system is strong enough to generate single images in SDXL in 30 seconds, so hardware has not been an issue for me. The reason I keep liking 1.5 is that I use it to generate characters for my D&D game, and there are certain monsters and races I can get in 1.5 that I can't do in SDXL. But if I could get these weird creatures (some require me to use 3 or more LoRAs just to get them right) to all appear in the same image, without the struggle of inpainting them in (last time I tried, I spent way too much time just doing a two-character image), that would be awesome.
@ngc892 8 months ago
I think XL needs more time; I miss some LoRAs in XL that I use often with 1.5. At the moment I use XL around 20% and 1.5 around 80%.
@jagsdesign 8 months ago
A very important point I would like to make: SD models are by far the only open-source models that give you enormous freedom to manipulate and adjust, more than any other diffusion setup out there, and most diffusion setups have evolved from or used SD as their base. I have been tinkering with AI from the days of the PyTTI setup to Disco Diffusion to SD workflows, and even had the chance to work on experimental images with almost all the other models, including OpenAI's DALL-E. In the end, most of the models and methods have severe limits on what you can achieve from inputs and fail to grasp the fundamental needs of the graphic design and animation industry at large, who are the consumers of the same. I teach design for both UG and PG students, and SD models are a great asset to work with for every kind of result, adjustable for every kind of output need. That said, it is easily amenable to the future, very similar to workflows in Blender and development work, which many other AI models are too cumbersome for. The community is very strong, and solutions arrive in minutes rather than days. SDXL and its clones are super cool!
@menteirradiante1307 8 months ago
I like to experiment with AIs, and I saw the same problem in SDXL as in ChatGPT (the free version; I haven't seen the paid version yet): it is biased. At some point in its programming it acquired this characteristic, either through learning or through the programming itself. Different models are becoming this way; it is necessary to break down keywords by adding diversity, at least in my opinion. Hugs and much peace.
@AnonymousUser-gq9oc 6 months ago
You always say exactly what I've been thinking all by my lonesome. You are the best youtuber everrrr!
@gordonbrinkmann 8 months ago
Regarding the hardware requirements of SDXL, well, it also depends how you are using Stable Diffusion. For example, with A1111 I never really got SD 2.0 to work with the GTX 1060 6GB at work, and SDXL not at all. At home with my RTX 3060 12GB it works. But when I switched to ComfyUI, not only does it work like a charm on my RTX, but on my old GTX it is now possible to generate SDXL images while rendering a 3D scene in Blender in parallel, although it's a bit slower of course.
@vendacious 8 months ago
As a Deforum + ControlNet user, where the refiner cannot be used, I don't think the slow generation speed is worth what I get from SDXL. But that's because I almost always use ControlNet, so if Deforum and ControlNet aren't working together, I have to use my 1.5.2 version of A1111, or my Vlad install where I detached the HEAD at the "Divide by Zero" push.
@hilbrandbos 8 months ago
The only thing we need is models that "listen" well: models that understand cardinality, depth, and human traits (and not only the ones considered "beautiful").
@camaxide 8 months ago
Personally, I haven't touched 1.5 since SDXL released. It was problematic inside A1111 at the start, and yes, there were fewer good models too, but it's getting better fast. The improved resolution is worth it even from the start. So to me, no, it didn't fail at all; it just takes some time to build up the model base that 1.5 had :)
@davidc1179 8 months ago
For me, I'm still using SD 1.5. A few reasons for that: I'm the creator of endlessReality, which is a 1.5 model, and you can't just throw away a project that you've been working on for almost a year now. There's also the fact that I have trained different embeddings for characters of my novel, including Occsan, who's the redhead girl you can see on the endlessReality page on Civitai. And these embeddings are obviously for 1.5. When I tried to train these embeddings again on SDXL (and on 2.x before that), the results were just not as good, in the sense of "not what these characters should look like". Other reasons are already listed in your video: better control of what you're doing with 1.5 with various tools like ControlNet, etc. Also the memory requirements: even with my 4090, I don't want to spend hours brainstorming ideas solely waiting for generations.
@DylanComas 8 months ago
The prompting of SDXL is a step up, even if, as a developer, I feel more comfortable using weighted keywords. But other than that, it's heavy on resources, slow even with a pretty good GPU, and if you have a good workflow with 1.5, the quality difference between the two isn't worth the trouble.
@DiyEcoProjects 8 months ago
Hi there, can I ask something please? I'm new to this. I've downloaded Stable Diffusion to my computer. How do I know that a "model" is safe to install on my computer? How do you tell if they are dodgy? Thank you for any help. All the best
@mirek190 8 months ago
Safe ones have the extension .safetensors
@DiyEcoProjects 8 months ago
@@mirek190 ah thank you, ill look into that :)
@angryox3102 8 months ago
I'm new to Stable Diffusion after leaving Midjourney. As a newbie, I find it easier to get decent pictures with SDXL than with 1.5. With 1.5 I'm always getting double heads and weird stuff like that.
@dzordzkeko2608 8 months ago
Probably too high a denoising strength. It's easy to check: at 1.0 you get a monster, at 0.5 it resembles a human.
@lalayblog 8 months ago
I am using SDXL on a 3050 Ti with 4GB of VRAM with ComfyUI. Don't tell the audience it's impossible. In ComfyUI you can install custom nodes that can make an inpainting mask from a text prompt, so ADetailer and hires fix are there. Also, there is a latent converter between SD1.5 and SDXL, so you can mix both generations in one workflow, even with 4GB, to render up to 1600x1400 images. Bigger images will give you a duplication issue, but you can go that far without upscalers. Just bring your hands back closer to your head 😉
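That SD1.5-to-SDXL latent trick works because both model families use 4-channel latents at 1/8 of the image resolution, so a crude converter only has to resize the spatial axes. Real converter nodes typically decode and re-encode through the VAEs (the two families also use different VAE scaling factors), so this nearest-neighbor numpy sketch is only meant to show the idea:

```python
import numpy as np

# SD 1.5 latent for a 512x512 image: (1, 4, 64, 64).
sd15_latent = np.random.default_rng(1).standard_normal((1, 4, 64, 64))

# Nearest-neighbor upscale to SDXL's 1024x1024 latent size: (1, 4, 128, 128).
# Each latent "pixel" is repeated 2x2 along the spatial axes.
sdxl_latent = sd15_latent.repeat(2, axis=2).repeat(2, axis=3)
print(sdxl_latent.shape)  # (1, 4, 128, 128)
```

The resized latent is then re-noised and denoised a few more steps by the SDXL model, which repairs the blockiness the naive upscale introduces.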
@MikevomMars 8 months ago
I found myself using SDXL ONLY now because it is SO much better than the previous versions.
@Cocoaben 8 months ago
I think it's just because the majority of people's computers don't meet the specs required to run it.
@louis-dubois-artist 8 months ago
Great video. SDXL fails because: high requirements, less flexible, worse nudes.
@SteveWarner 8 months ago
If you have to make a video asking if SDXL failed, then the answer is yes, and for all the reasons you cited. It's simply not the huge leap forward that we expected.
@Voidgamer21 8 months ago
Sadly, I have to agree with you. I've had to ditch Auto1111 completely due to my low-spec GPU, and I'm now looking for an alternative. Swarm is looking like a good contender. Thanks for the vid.
@PaulMayne 8 months ago
Thanks!
@OlivioSarikas 8 months ago
Thank you very much for your support
@musicandhappinessbyjo795 8 months ago
In my opinion SDXL is pretty young. But here are the things that are mind-blowing: 1. Prompting - it has become so simple and uses normal grammar. 2. LoRA - I think the output of LoRA training is a bit more consistent. But here is a thing I would consider: the refiner that came with SDXL. It is said to be a failure, but has anyone tried to create a model solely from the refiner? I think there is a lot of potential that can be achieved with refiner training.
@DennisFrancispublishing 8 months ago
I'm still using 1.5 for img2img since most of what I do is finetuning my own work with my own Loras. SDXL is still too slow for my needs.
@noobplays6710 8 months ago
Hello sir, I wanted to ask: does Stable Diffusion require internet to perform image generation?
@canaljoseg0172 8 months ago
He said it that simply! XL is for generating images at 1024, while SD 1.5 is for 512 and 768. That's why most people don't like XL: it needs beefier equipment. But I don't understand what the model creators are doing training at 768px; it's not meant for 768px, they must train at 1024. The problem is us, and Civitai accepting models trained at 768px.
@highcommander2007 8 months ago
Within the next year, advancements in AI need to fix much of the complexity in the software and interfaces. You almost need a degree in computer science to find, update, and install various things like ControlNets, Python, and various training tools. A company like Adobe will step forward and simplify this, making it easier for people to just focus on creating. In the next 2 to 5 years we will need major advancements in video generation. In a short time there won't be image generators; they will all be video generators, and image gen is just part of it. The next BIG thing is whoever combines the power of storytelling AI with visual media AI. Imagine going into a movie theatre and typing what you want to see (example: a Star Wars movie directed in the style of XYZ which takes place 1000 years before Luke and Leia), then you go sit down and the full-length movie is generated and played before your eyes, all with AI voices, AI visuals, and an AI-written story.
@AustinGlamourPhoto 8 months ago
I've been playing with SDXL for a week pretty intensively. It's not nearly as flexible as 1.5 and doesn't respond to art styles nearly as well, but it can produce amazing results with some effort. I wish they had trained it with the same language system they used with 1.5.
@KefazX 8 months ago
I'm fortunate enough to own a 4090 and I've basically stopped using SD 1.5. SDXL is still sort of annoying to use, though, when you're used to the speed of 1.5. I get around 11 it/s with ComfyUI's default workflow, compared to 25+ it/s with 1.5. With my own workflow, where I mostly use restart sampling and DPM++ 2S ancestral, I get around 3 it/s.
@Sithma 8 months ago
Flawless video; you correctly pointed out what's good and what's bad. And if you add what you said here, that the models are also way bigger than 1.5 (6.5 GB each), we come to the conclusion that 1.5 wins hands down in every way.
@pinkindepth 8 months ago
Please do a tutorial for Draw Things on Mac
@erics7004 8 months ago
SD 2.0, that was disappointing. SDXL is a huge improvement.
@jhahn8702 8 months ago
With SD 1.5 I can get photorealistic results (of people) even though the details are low, but with SDXL I can only get well-done drawing-like results, even though the details are better.
@TheElement2k7 8 months ago
When it takes about 5 to 10 minutes to generate an image in SDXL and about 30 seconds in SD 1.5, I still prefer 1.5. The pictures from XL look like what you get from 1.5 pictures run with a second pass and the Ultimate Upscale script. I prefer 1.5, but the technology will surely get better with time.
@roguenoir 7 months ago
Correct me if I'm wrong, but SD 1.5 owes a lot to the NovelAI (NAI) leak a year ago, and most of the popular models today (both realistic and cartoon/anime) have some NAI in them. The NAI model was trained on possibly millions of images. The equivalent hasn't happened with SDXL, and unless another leak happens (unlikely) or someone donates a ton of GPU time to train on many more images, it's still got a lot of catching up to do. I remember Andrew Ng saying, in a lecture several years ago, that having high-quality training data is far more important than having a better model architecture in general.