Offset Noise: Midjourney Dethroned

30,789 views

koiboi

A year ago

We explain the new Offset Noise discovery, which lets latent diffusion model trainers get vastly improved results by changing a single line of code. We also compare images generated by offset-noise models to pre-offset-noise images and to Midjourney images. This is probably the first time since the release of the v4 model in December 2022 that the Stable Diffusion community has achieved parity with Midjourney.
Thanks so much to Nicholas Guttenberg for the excellent research. That article should by all rights be a research paper.
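The "single line" in question swaps the usual pure-Gaussian training noise for Gaussian noise plus a low-weight constant offset drawn once per image channel. A minimal NumPy sketch of that change (the original in the Cross Labs post is a PyTorch one-liner; the 0.1 weight follows the post, the batch shape here is just an example):

```python
import numpy as np

def offset_noise(latents: np.ndarray, offset_weight: float = 0.1) -> np.ndarray:
    """Training noise with a constant per-image, per-channel offset added.

    NumPy sketch of the PyTorch one-liner from the Cross Labs post:
        noise = torch.randn_like(latents) \
              + 0.1 * torch.randn(latents.shape[0], latents.shape[1], 1, 1)
    """
    b, c, h, w = latents.shape
    per_pixel = np.random.standard_normal((b, c, h, w))    # the usual noise
    per_channel = np.random.standard_normal((b, c, 1, 1))  # one scalar per image-channel
    # Broadcasting spreads each scalar over the whole H x W plane, so the
    # model can no longer assume the noised latent keeps its original mean.
    return per_pixel + offset_weight * per_channel

latents = np.zeros((2, 4, 64, 64))  # hypothetical batch of SD-like latents
noise = offset_noise(latents)
```

The offset term is constant across each H×W plane, which is exactly the zero-frequency component that ordinary per-pixel noise almost never perturbs.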
======= Links =======
Offset Noise: www.crosslabs.org/blog/diffus...
Cacoe's Server: / discord
Old version of the Illuminate model: huggingface.co/IlluminatiAI/I...
An Image Made of Sin Waves: • Fourier Image Decompos...
======= Music =======
Music from freetousemusic.com
‘Onion’ by LuKremBo: • (no copyright music) l...
From the YouTube Audio Library:
Escapism Yung Logos

Comments: 145
@CrossLabsAI · a year ago
We were about to make a video about Nicholas' findings. After seeing your brilliant explanation, we don't have much to add! Thank you so much for sharing our research!
@lewingtonn · a year ago
holy shit, that's such a high compliment! Keep up the fantastic work guys! We're all counting on you!!! 🤜
@shishbasupalit5955 · a year ago
Very clever and elegant piece of work - makes you wonder how much other low-hanging fruit is waiting to be plucked.
@n8mo · a year ago
Really nice to find a Stable Diffusion channel that actually understands how the underlying technology works! Subscribed. So many of the bigger channels just do how-tos and make "breaking news" videos about new features without taking the time to understand *why* these things work. Great explanation of the issue caused by SD's noising function. I understood it perfectly despite not having any ML experience (though I am a software developer, so that helps lol)
@zbylo26 · a year ago
wow, in hindsight it seems so obvious. It just shows that we're in the early days and there are huge things waiting to be discovered in diffusion models. Love your work. Z.
@Neteroh · a year ago
Midjourney Right Now: Nooooooo! My Dineros!!! My Dineros!!!
@lewingtonn · a year ago
kek
@msampson3d · a year ago
god I love this channel. What a great, straightforward explanation of the topic! I don't really know anyone else going this in-depth on these Stable Diffusion topics on YouTube. Greatly appreciated!
@vvvemn · a year ago
Really appreciate all the effort that goes into the research and how well you explain it alongside the visualizations. Keep up the great work!
@RikkTheGaijin · a year ago
This dude is singlehandedly wiping the floor with pretty much every other channel about SD. You are a godsend, my dude. Let's watch that subscriber count go to the moon.
@lewingtonn · a year ago
hahahaha that's such a nice thing to say! This guy is the actual king though: www.youtube.com/@outlier4052 he published like, a literal image generation paper and his videos are still really engaging
@afrosymphony8207 · a year ago
why must we resort to snarky comparisons?
@lewingtonn · a year ago
@afrosymphony8207 😢
@RikkTheGaijin · a year ago
@afrosymphony8207 your parents don't love you.
@afrosymphony8207 · a year ago
@RikkTheGaijin I turned out alright because my community instilled a good moral system in me. You, on the other hand, received all the love in the world, and just look at what a miserable twat you became.
@RichSaCa · a year ago
You are like teaching how and what paintbrushes are made of, the characteristics of canvases, and all the fundamental stuff an artist should know to make good art. Thanks for the insights and your efforts to make some obscure subjects clear.
@MrAwesomeTheAwesome · a year ago
Your videos were already good to begin with and they're getting better. Great information, good explanations. Appreciate the good work! Cheers! :D
@muerrilla · a year ago
wow I just redid my last dreambooth model using offset noise, and I'm blown away by the difference it makes in general quality!
@nemonomen3340 · a year ago
Has to be the most informative video on SD I've watched in months. I actually understand how it works a little better now; excellent content!
@chickenp7038 · a year ago
there's nothing more beautiful than a simple fix
@natsuschiffer8316 · a year ago
Hoping for another leak soon to learn more secrets like this.
@SaadAhmed3000 · a year ago
what an amazingly simple description of a Fourier transform
@SaadAhmed3000 · a year ago
.. and you made this video with all that content. That alone is a lot!! Thank you.
@lewingtonn · a year ago
hahah to be fair I haven't posted in like 8 years so I had time
@Beyondarmonia · a year ago
Great video. Perfect balance of technical and everyday. Keep going. Subscribed.
@OutsidersLaptop · a year ago
Sweet. The other day I was trying to generate a scene set at night, and no matter how much I fine-tuned the positive prompt to emphasize dark (and used the negative to omit light-related concepts), I kept getting outputs with really bright skies. Night-associated hues, but day-like intensities. Intentional or not, this video answered a lot of my questions about SD's trouble in this specific case. These deeper-dive videos are much appreciated. Keep the goods coming!
@ethansmith7608 · a year ago
Thank you for the explanation! I imagine the best hack is to randomize a separate offset on each image, but the more faithful method I've seen is to use a cosine noise schedule and fully destroy the image - for which, for some reason, SD uses a smaller standard deviation of 0.12 instead of the 0.2 used in previous works
@juanchogarzonmiranda · a year ago
The best SD channel, thanks Koi (Smiling:1.1)
@xmorse · a year ago
You explained noising so well, thank you for your awesome videos!
@KainSpero · a year ago
Another wonderful video with amazing explanations!!! So awesome to see how far Cacoe has come.
@lewingtonn · a year ago
dude cacoe is a beast! he's a MACHINE
@lewingtonn · a year ago
an AI even!!
@lucretius1111 · a year ago
Brilliant analysis. Another awesome vid!
@badradish2116 · a year ago
all of your videos are the best video I've seen on that topic. all of them.
@xn4pl · a year ago
They could've used simplex or Perlin noise with a number of harmonics for training, to properly represent low- and high-frequency noise.
@lewingtonn · a year ago
both of those things sound super interesting, but my dude, the first person to do that will get published, so what are you waiting for?
@xn4pl · a year ago
@lewingtonn yeah, I'm not a computer science major (or even minor) and my programming knowledge is limited to writing text-based tic-tac-toe in Python, so a science paper is not really my place to plug my thoughts on the matter. Also, it wouldn't work if you just plugged in a different noise formula and hoped for the best. I guess a more feasible approach would be to split the training image into different frequency bands using a bandpass filter (like you showed in the video) and then apply noise of the same harmonic to each of them - preferably distributing the noise weights like a saw wave, with lower harmonics getting more noise and higher ones less (I might be wrong, but I think that's called pink noise) - and then pass all of them (or their sum) to the latent diffusion model for training. If this training works, it might more accurately represent high- and low-level details. It would probably take a double major in machine learning and digital signal processing to even test this idea, or to realize that I've overcomplicated things and it can be done more easily by some handy signal-processing math trick.
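A quick sketch of the multi-band idea above: draw Gaussian noise at several resolutions, weight the coarser (lower-frequency) octaves more heavily, and sum. Everything here is hypothetical illustration of the commenter's proposal, not the method from the video - the per-octave gain of 1.5 and the four octaves are arbitrary choices:

```python
import numpy as np

def multiband_noise(h, w, octaves=4, gain=1.5, seed=None):
    """Sum of Gaussian noise drawn at several resolutions.

    Coarser octaves are upsampled (nearest-neighbour) and weighted by
    gain**octave, so lower-frequency 'blobs' carry more power, roughly
    pink-noise-like. Hypothetical sketch of the comment's proposal.
    """
    rng = np.random.default_rng(seed)
    total = rng.standard_normal((h, w))  # octave 0: ordinary per-pixel noise
    for octave in range(1, octaves):
        scale = 2 ** octave
        ch = (h + scale - 1) // scale    # ceil-divide so the upsample covers h
        cw = (w + scale - 1) // scale
        coarse = rng.standard_normal((ch, cw))
        up = np.repeat(np.repeat(coarse, scale, axis=0), scale, axis=1)
        total += (gain ** octave) * up[:h, :w]
    return total / total.std()           # renormalize to unit variance

noise = multiband_noise(64, 64, seed=0)
```

The offset-noise trick is the degenerate case of this: a single extra "octave" consisting of one constant value per channel.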
@RichSaCa · a year ago
@xn4pl I don't know if you're overcomplicating it, but I do guess you're onto something, and pretty cool things could come from your experimentation even if you're not a double major. What matters is not titles but experience, testing, failure, and repeat. Just a thought.
@tomsolidPM · a year ago
Great video and explanation! Kudos 🙌
@chrislloyd1734 · a year ago
It makes good sense! Well done with the explanation.
@judahgamermedia · a year ago
The thing I've noticed and would like to see is these models doing more to improve the symmetry and anatomy of objects. As an artist, these are things we are taught and that take years of practice to get right. And perhaps once they get this down right, the need for artists in this regard will diminish, but you have to remember that this is a tool for people who haven't mastered these aspects of art yet.
@TheGalacticIndian · a year ago
I like this guy! Clearly Midjourney's style is so distinctive, like a single well-known artist, that you can immediately distinguish which is which. Meanwhile SD, with all its quirks, is much more diverse and versatile, like thousands of artists working behind the scenes. And that's a huge advantage of SD (plus you can run it locally).
@KyleandPrieteni · a year ago
Midjourney has been heading toward its doom ever since SD was open-sourced. It was eventually going to meet its match with the growing community of model trainers and people discovering many awesome functions and tricks.
@autonomousreviews2521 · a year ago
This was a joy to watch :)
@afrosymphony8207 · a year ago
midjourney still has the edge, maybe that can change with the next version of Illuminati
@shishbasupalit5955 · a year ago
The title and thumbnail of the video made me think this was an art video, so I started it, then took a long break to read the original post, went down a rabbit hole back to '90s research on power spectra, and finally came back to the video to realise it was not an art video after all, but rather an amazing technical analysis of the work. I do want to point out, though, that it's not really that the noise doesn't affect low-frequency features - white noise has a flat power spectrum and affects features of all frequencies equally. What's happening is that "natural" images are biased so that low-frequency features have more power, so a constant amount of noise hurts the signal-to-noise ratio of high-frequency features more.
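The point above is easy to check numerically: white noise puts roughly equal power in every frequency band, while a smooth "natural-ish" image concentrates nearly all of its power at low frequencies, so a fixed dose of noise drowns the high-frequency signal first. A hedged NumPy sketch (the single-sinusoid "image" is a toy stand-in, not data from the video):

```python
import numpy as np

n = 256
rng = np.random.default_rng(0)
white = rng.standard_normal((n, n))  # white-noise field

# Toy stand-in for a natural image: smooth, low-frequency structure only.
y, x = np.mgrid[0:n, 0:n] / n
image = np.sin(2 * np.pi * x) + np.cos(2 * np.pi * y)

def band_power(img, lo, hi):
    """Mean spectral power over frequency radii lo <= r < hi."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    yy, xx = np.mgrid[0:n, 0:n]
    r = np.hypot(yy - n // 2, xx - n // 2)
    return power[(r >= lo) & (r < hi)].mean()

# White noise: low and high frequency bands hold about the same mean power.
noise_ratio = band_power(white, 1, 8) / band_power(white, 64, 96)
# The smooth image: essentially all of its power sits in the low band.
image_ratio = band_power(image, 1, 8) / band_power(image, 64, 96)
```

`noise_ratio` comes out near 1, while `image_ratio` is enormous: the signal-to-noise ratio in the high band collapses long before the low band is touched.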
@ajudicator · a year ago
Midjourney is far ahead (in part) because they add hidden random tokens to prompts based on RLHF (user feedback and ratings). So if you can rate the aesthetics of a model with a reinforcement-based approach to the weights on the prompts, that helps.
@worthstream · a year ago
LAION is collecting aesthetics-score data for generated images (and the associated prompts), so let's hope this approach will be available to the open-source community soon, too.
@MrBlitzpunk · a year ago
People always say "this is the midjourney killer / this could replace midjourney" as if stable diffusion and its custom models haven't already been beating it for quite a while now
@Mimeniia · a year ago
Awe. Kaapstad in the house. Great explanation bro!
@NerdyRodent · a year ago
Awesome video 😉
@mcarthcart414 · a year ago
Great explanation! The principle kind of sounds like HDR photography: combining multiple photos at different exposures to get a high dynamic range. The results can be stunning.
@DavidSilverman-darktoad · a year ago
Very cool. I hope Stability AI sees this and updates these noising functions for the 3.0 model training that's coming up
@lewingtonn · a year ago
they 100% know about this already, dw mate
@the_jingo · a year ago
so how do you use this offset thing in SD?
@lewingtonn · a year ago
you download a model trained using it
@swannschilling474 · a year ago
Sweet!!
@michaelli7000 · a year ago
good technical and fun stuff
@papus9163 · a year ago
thanks for the update
@kirbulich · a year ago
What is happening? I was reading about this topic an hour ago o_O It seems like we're entering the singularity phase more and more every second.
@2PeteShakur · a year ago
good or bad?
@kirbulich · a year ago
@2PeteShakur idk? My life is connected with computers, so this is always good for me.
@pipinstallyp · a year ago
More like a butterfly effect - it's been in the works for almost two weeks. Many smart people are working on a lot of things. Stuff is exciting though. :) I like the idea of the Singularity for sure. With AI happenings it's wild: a butterfly flapped its wings, named GANs and Transformers, and here we are.
@lewingtonn · a year ago
nah, I hacked into your pc like a few months back. you need to spend less time on onlyfans mate
@kirbulich · a year ago
@lewingtonn go chat in my notepad++ I have a lot of questions.
@zzzzzzz8473 · a year ago
great video overview, thanks! yeah, I think there is so much more experimentation we can do with diffusion, like implementing all the augmentations of StyleGAN-ADA and more. I wonder about applying as many linear transformations as possible that would influence how the model has to learn - for example, chromatic aberration, emboss, or edge sharpening would likely push the model to use a kind of convolution process to undo that type of augmentation, which could be a useful tool for the model in reconstructing an image as well.
@lewingtonn · a year ago
yeah like, there's a super high likelihood that that would improve results imo... someone is gonna publish a paper where they train a model to do the noising at some point...
@FunwithBlender · a year ago
Always Koi, nice vid :)
@CMak3r · a year ago
Does this affect only the training process, or will generating images with a pretrained model also benefit from this solution? Can I copy-paste his code into my local Automatic1111 SD code to get better generations?
@lewingtonn · a year ago
offset noise affects the training process only. It just makes training more effective by forcing the model to think more about brightness
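A quick numerical illustration of the "think about brightness" point (the shapes are a hypothetical SD-like latent, not taken from the video): per-pixel Gaussian noise barely moves the mean of a whole 4×64×64 latent, so the model never practices changing overall brightness, while the broadcast offset term moves that mean several times more:

```python
import numpy as np

rng = np.random.default_rng(0)
samples, c, h, w = 256, 4, 64, 64  # hypothetical SD-like latent shape

# Plain per-pixel noise: the mean over one whole latent has a standard
# deviation of only 1/sqrt(c*h*w) ≈ 0.008 - brightness barely budges.
plain = rng.standard_normal((samples, c, h, w))
plain_shift = plain.mean(axis=(1, 2, 3)).std()

# Offset noise: the 0.1-weighted per-channel constant shifts every pixel
# together, so the latent mean now wanders with a std of roughly 0.05.
offset = plain + 0.1 * rng.standard_normal((samples, c, 1, 1))
offset_shift = offset.mean(axis=(1, 2, 3)).std()
```

That order-of-magnitude gap in how far the mean wanders is exactly what gives the trained model a reason to learn very dark and very bright outputs.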
@xn4pl · a year ago
@lewingtonn doesn't txt2img use noise that also averages near 0.5, which makes the (original) model try to match that in the generated image - so by just offsetting it we could control how bright or dark the final generation will be? At least that's how I understood it. If the model learns that, no matter what, the final image should have a 0.5 mean, then it's a learning issue; but if it was trained on images of different brightnesses whose averages are all over the place, it should generate proper images from offset noise from the get-go.
@VKTRUNG · a year ago
@xn4pl I'm under the same impression. If the model tries to match the mean brightness of the input noise, wouldn't it be possible to control the brightness of the result by controlling the mean brightness of the input noise (using some noise function with a controllable mean)?
@mattweger437 · a year ago
So simple yet so insane
@lewingtonn · a year ago
BEHOLD, 1.1 is here: civitai.com/models/11193/illuminati-diffusion-v11 praise be to cacoe!!!
@inxomnyaa · a year ago
I thought the video was on 2.5x speed when I looked at the webcam.
@devnull_ · a year ago
Thanks! Very interesting! But I have to admit, I've been way more worried about missing arms and wonky shapes and details in general :D
@devnull_ · a year ago
Have to admit, maybe I didn't listen carefully enough, but is this more about the dynamic range of the resulting image or more about preserving/learning finer details? The article you show seems to talk about "generate very dark or light images" - guess I'll have to read the whole article :D
@XxRazienxX · a year ago
If you learn to paint you can add those yourself.
@lewingtonn · a year ago
@devnull_ yeah, the article is very sick, you should definitely read it
@silvermushroom-gamifyevery6430 · a year ago
Sir, you are the 3Blue1Brown of AI art.
@Tarbard · a year ago
Great explanation. There's a LoRA called epi_noiseoffset which does this.
@JRGeoffrion · a year ago
Question: from the graphs in the video, as I understand it, the model assumes a somewhat symmetrical normal distribution for the noise, with set black and white points (0, 255). These assumptions (black/white points and normal distributions) don't hold true for most images - even more so for images that are extraordinary. Is there a way to train/denoise while accounting for these additional factors?
@hermancharlesserrano1489 · a year ago
Noice! Surely we want noise controls for the user? Sometimes you want composition, sometimes detail…
@pipinstallyp · 4 months ago
man, it's been a wild west out there a year later.
@jonatan01i · a year ago
14:35 exactly what I was thinking about.
@jonatan01i · a year ago
Right now, very high-frequency and very low-frequency info is destroyed fast. We need to do that at all frequencies - not only destroying some blobs, but blobs of any size.
@khirondb · a year ago
You're hecking sick as well, fam 😊
@jaredgreen2363 · 11 months ago
Have you tried noising the discrete cosine transform? That way you obscure all frequencies at the same rate.
@Smiithrz · a year ago
Super interesting stuff, man. I use both SD and MJ, and I'm always wishing SD were on par with MJ, particularly for "art". SD is incredible for realism, but definitely lagging behind MJ on the artistic front; hopefully this means there's light at the end of the tunnel for SD and art?
@KyleandPrieteni · a year ago
MJ is based on stable diffusion; it's just really well trained, because they had a team of people and the resources to make it good. I mean, I have been able to make better art with SD than MJ - MJ gets repetitive since I can tell by its style. SD is open source though; there are a ton of models based on SD that are really good on the artistic front, you just have to know each model's way of handling prompts. Civitai is where I get all my models, and I train my own models too.
@Smiithrz · a year ago
@KyleandPrieteni Thanks Kyle, I'm aware of all of that. I just use the term "MJ" as a quick way of saying "the SD model that MJ uses", and "SD" as "all other SD models", lol. I actually did some work for Civitai recently :)
@omegablast2002 · a year ago
is the 1.1 version not available anywhere?
@HB-kl5ik · a year ago
Just released on civitai 🙂
@hplovecraftmacncheese · a year ago
I'm new to SD. So we need to alter the code of SD and then train our own models to approach the quality of MidJourney?
@lewingtonn · a year ago
no sir, simply use the newer, better models. Luckily we have nerds to do all the hard work for us
@michaelli7000 · a year ago
I have a question: why do the MJ images have a more uniform art style, like digital paintings with high contrast values, while SD's style seems more diverse? thank you
@philabusterr · a year ago
Sorry if I missed this in the video, but does SD 1.5 have this built in now? Or is it forthcoming? Or is there something I have to turn on?
@JoelRehra · a year ago
I think you put the wrong link to Cacoe's server in your description... it just links to the example images...
@lewingtonn · a year ago
holy shit thanks! I updated it now
@steves5476 · a year ago
Should try adding noise in Fourier-transform space instead of pixel space, to tackle every frequency overall.
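One caveat worth noting about the Fourier-space suggestion: i.i.d. Gaussian noise applied to FFT coefficients inverts back to plain white noise, so it changes nothing by itself - it is *shaping* the per-frequency amplitudes that matters. A hypothetical sketch (the commenter's idea with a 1/f pink-ish spectrum bolted on; not the method from the video):

```python
import numpy as np

def shaped_noise(h, w, alpha=1.0, seed=None):
    """Noise built in Fourier space with a 1/f**alpha amplitude spectrum.

    Plain Gaussian noise on FFT coefficients is still white in pixel
    space; dividing by f**alpha boosts the low frequencies so they get
    destroyed at a rate comparable to the high ones. Hypothetical sketch.
    """
    rng = np.random.default_rng(seed)
    spec = rng.standard_normal((h, w)) + 1j * rng.standard_normal((h, w))
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.hypot(fy, fx)
    f[0, 0] = min(1.0 / h, 1.0 / w)  # avoid dividing by zero at DC
    noise = np.real(np.fft.ifft2(spec / f ** alpha))
    return noise / noise.std()        # renormalize to unit variance

pink = shaped_noise(64, 64, seed=0)
```

With `alpha = 0` this reduces to ordinary white noise, which is one way to see why un-shaped Fourier-space noising is a no-op.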
@AaronMayzes · a year ago
I don't know anything about anything when it comes to this, but how would it change if the noise were added to the color represented as HSV or CMYK or something, rather than RGB?
@drdca8263 · a year ago
Very nice! But when I try to think mathematically about *why* the high-frequency components would be washed out first, even though it makes intuitive sense, it isn't clear to me how to justify that conclusion. I guess the thing to do would be to describe the distribution over Fourier transforms of Gaussian noise, but at first blush I see no reason why the variance for higher-frequency components would be larger than for lower-frequency ones.

Say we have functions f : (Z/pZ) -> R (or maybe C, but whatever) for some prime number p. If we take the discrete Fourier transform, then the components/coefficients (other than the frequency-zero component) are the dot products with the different p-th roots of unity, in different orders (possibly multiplied by some constant normalization factor). Because these have the same terms, just in different orders (true because I picked the number of entries to be prime, so multiplication mod p by anything nonzero is invertible), for independent identically distributed random values of the function, these Fourier components should also be identically distributed.

Now, maybe this is just because I picked Z/pZ instead of Z/nZ, but still, I would think that most natural numbers less than n, for a typical n, should have no factors in common with - ok, actually that's not true... Half of all n are even, and half of the natural numbers less than n (for large n) are even, so at least half the time, at least half of the numbers less than n will share a factor with n and therefore have no inverse... But would that really be responsible for the low-frequency components being influenced less by noise?! Would we really expect that if training only used images whose width and height are both a prime number of pixels, this phenomenon would go away (except for the frequency-zero component)?! That doesn't sound like it should be true. That would be pretty bizarre, I think.

Maybe the thing is just that there are many more frequencies (among integer multiples of the base frequency) that we would consider "high frequency" than "low frequency", so most of the variance ends up in what we would call "high frequency"? If you know the answer to my confusion, even if you are reading this comment years after I wrote it, please reply and let me know (provided no one else has already sufficiently explained it, but that goes without saying, I guess).
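The commenter's suspicion can be confirmed numerically: the expected DFT power of i.i.d. Gaussian noise is flat at every frequency, prime length or not, so the noise itself has no high-frequency bias. The asymmetry comes from the images (whose power concentrates at low frequencies, as another comment above points out), plus the fact that most bins count as "high frequency". A quick check:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 101  # prime signal length, matching the comment's setup

# Average the DFT power per frequency bin over many i.i.d. Gaussian draws.
trials = rng.standard_normal((2000, p))
power = (np.abs(np.fft.fft(trials, axis=1)) ** 2).mean(axis=0)

# Every bin, DC included, has expected power p: the spectrum is flat,
# so Gaussian noise does not intrinsically hit high frequencies harder.
flatness = power.max() / power.min()
```

`flatness` stays close to 1 up to sampling error, i.e. no frequency bin receives systematically more noise power than any other.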
@hardkur · a year ago
I wish there were a way to implement the DALL·E mini text encoder in Stable Diffusion
@lewingtonn · a year ago
dude, duuuuuude, I guarantee that's what SD3.0 will do, because it's the most glaring issue with SD right now
@hardkur · a year ago
@lewingtonn Correct - with Dynamic Thresholding (CFG Scale Fix), CLIP skip and the Deliberate model I'm getting quality better than MJ, but I can never get prompts to behave the way I want :/
@Roughneck7712 · a year ago
Anyone who has been training their own custom models, textual inversions, and LoRAs knows that Midjourney doesn't hold a candle to A1111 and Kohya
@hungdinh2193 · a year ago
Great explanation. Can you make a video about ControlNet? Thanks
@VozerLamTruyen · a year ago
We've known this for years in denoising: the high frequencies go first
@CreativePunk5555 · a year ago
We're seeing this a bit more often now - the Midjourney killers. But these platforms only help Midjourney; competition is good for business and will push MJ to improve and not get comfortable. Also, until the actual release and some time spent with millions of users, we won't really know how great this all might be. Leonardo is getting a lot of action recently, but will it stand the test of time? Everything new will always be hot for a moment - but when the dust settles, that will be the true indicator.
@_sytch · a year ago
Non-open AI doesn't matter if you care about any positives it's going to bring. Proprietary AI is a tool for corporations, governments and other powerful entities, and I doubt very much they'll use it for the greater good - rather the opposite. We've already seen tons of censorship from OpenAI. So I personally don't care about MJ; I've never used it and don't plan to. Only SD matters as of now; whatever improves it is good. If other companies keep improving, whatever. Openness and availability always win in the long run, as proven by thousands of years.
@jakejakejakejakejakejake · a year ago
Great channel - bump up your audio levels though! ^_^ X
@jonatan01i · a year ago
maybe change the line not to 1*old_noise + 0.1*new_noise but to a*old_noise + (1-a)*new_noise; that might help keep the overall noise from getting too large
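A quick sanity check on the variances involved (a sketch with flat arrays standing in for the latent tensors; not from the video): the published form only inflates total variance by 1%, the proposed interpolation actually *shrinks* it, because independent Gaussians add in variance rather than amplitude, and a simple rescale would restore unit variance exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
old_noise = rng.standard_normal(n)
new_noise = rng.standard_normal(n)  # stands in for the broadcast offset term

# Published form: Var = 1 + 0.1**2 = 1.01, barely above unit variance.
var_offset = np.var(old_noise + 0.1 * new_noise)

# Interpolated form with a = 0.9: Var = 0.9**2 + 0.1**2 = 0.82,
# so the lerp under-noises rather than over-noising.
a = 0.9
var_lerp = np.var(a * old_noise + (1 - a) * new_noise)

# Dividing by sqrt(1.01) restores unit variance exactly (in expectation).
var_rescaled = np.var((old_noise + 0.1 * new_noise) / np.sqrt(1.01))
```

So the "too much noise" worry is small in practice, and a lerp overshoots in the other direction; the variance-preserving fix is the rescale.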
@vincentcarlucci1259 · a year ago
I'm a little confused. You say that the Illuminati images are better than, or at least as good as, the Midjourney ones, but I find them too desaturated. Is this because you are only focusing on the values and expect Illuminati to correct this problem in the future? Okay, I had a look at the images in the Discord and they do appear better; some are quite saturated. However, in low-light images the colors still seem to desaturate. This mirrors what happens with your eye, since color sensitivity peaks in bright light and declines in lower light - in very dim light you are, in effect, color-blind. There is a question here, though, about whether this is desirable. The dim images with higher saturation from Midjourney are, to me, more appealing.
@lewingtonn · a year ago
Ok, look, that's fair - obviously it's hard to articulate opinions about aesthetics, but to me some of the Illuminati images certainly look better, and for me that's really exciting because it means we're at least close to parity
@cc12yt · a year ago
This also kinda works as an anti "stable-diffusion-detector"
@TheCopernicus1 · a year ago
Mateeeeeeee
@lewingtonn · a year ago
maaaaaaaaaaaaaaaaaaaaaaaaaaate, yeah apparently I'm not dead hahaha
@LouisGedo · a year ago
👋
@asdion · a year ago
It's wonderful to see how profit stunts technological development once again.
@alecubudulecu · a year ago
fun fact... this is how photography and our eyes work. around 50-70% of the world around you is... grey. what you see... is, for the most part... grey.
@Zoltar0 · a year ago
Open source all the way!
@chiveerum · a year ago
Pardon a noob: if the models were trained to regenerate source images that don't necessarily have a 0.5 mean from noise (with a 0.5 mean), why would they learn to target a 0.5 mean?
@IgorNV · a year ago
Open source wins yet again. Technology belongs to the people!
@ZeroIQ2 · a year ago
That is very cool, thanks for sharing! Maths wins again lol
@alicapwn · a year ago
The discovery is great, but it doesn't explain Midjourney's quality and coherence. Only data does.
@SuperSigma69 · a year ago
If it's open source, Midjourney will just use it
@CodyCha · a year ago
Much improved, but it doesn't come close to Midjourney in terms of aesthetics and creativity
@LeonvanBokhorst · a year ago
Parody 😂🎉
@alexs1681 · a year ago
So did I get it right? You haven't even tested that model, 1.1, right? Great
@lewingtonn · a year ago
nah, cacoe being a legend gave me an early copy
@Joviex · a year ago
Those are not even close to parity. MJ still has the better composition.
@lewingtonn · a year ago
Citation needed! I don't know how you can say that with such certainty. Publish a SICK comparison proving your point, go go
@CodyCha · a year ago
@lewingtonn if you have an eye for design, it's not even a debate. Midjourney is on a different level. Just grab a pro designer/artist and do a blind test.
@spiffingbooks2903 · a year ago
Where has this gone over the last month? The Illuminati 1.1 model on Civitai seems to no longer be available. Has it been integrated into all new models now?