The Lesson About GUID IDs I Learned the Hard Way

Рет қаралды 27,191

Күн бұрын

Become a sponsor to access source code ► / zoranhorvat
Join Discord server with topics on C# ► codinghelmet.com/go/discord
Enroll course Beginning Object-Oriented Programming with C# ► codinghelmet.com/go/beginning...
Subscribe ► / @zoran-horvat
I will give you the database system ingredients, the most ordinary ones, but if you combine them lightly, there will be a nasty performance bug in your database. A puzzle? Yes, it's a puzzle.
Hear the ingredients first. GUID IDs are common both in relational and NoSQL databases. One can generate them in a distributed system and never produce a collision, which is nice. SQL Server supports GUIDs natively. Does this sound familiar?
Add one more detail, and your perspective might change: SQL Server creates a clustered index on the key column by default.
Now, stop for a moment and think. An application generates GUID values stored as new row keys in the SQL Server database table with the clustered index. The stage is set for the disaster that will only become apparent when the table grows large. And so, the correcting action will be equally costly.
This video will display the dangers of using GUID IDs with default SQL Server behavior and the order of steps to correct the issue once it has been added to the solution. You will also learn how to define your entities and never cause this problem when persisting them.
00:00 Intro
01:08 The Wrong Clustered Index
06:15 Turning to Nonclustered Index
12:15 The Correct New Entity
14:48 Conclusion
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
👨 About Me 👨
Hi, I’m Zoran, I have more than 20 years of experience as a software developer, architect, team lead, and more. I have been programming in C# since its inception in the early 2000s. Since 2017 I have started publishing professional video courses at Pluralsight and Udemy and by this point, there are over 100 hours of the highest-quality videos you can watch on those platforms. On my KZfaq channel, you can find shorter video forms focused on clarifying practical issues in coding, design, and architecture of .NET applications.❤️
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
⚡️RIGHT NOTICE:
The Copyright Laws of the United States recognize a “fair use” of copyrighted content. Section 107 of the U.S. Copyright Act states: “Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phono records or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright." This video and our youtube channel, in general, may contain certain copyrighted works that were not specifically authorized to be used by the copyright holder(s), but which we believe in good faith are protected by federal law and the Fair use doctrine for one or more of the reasons noted above.
#csharp #dotnet #sqlserver

Пікірлер: 202

@FBarbarian 3 ай бұрын

This channel is proof that sometimes the most valuable advice is not on some big channel with millions of subscribers, but on a small channel like this. I cannot express how grateful I am for all your content and advice. You are extremely valuable to the programing community. Thank you so much.

@zoran-horvat 3 ай бұрын

Hey, don't speak like that! I want my channel to grow, too :)

@FBarbarian 3 ай бұрын

@@zoran-horvat I really hope you become as successful as you want to be on youtube! I was just thinking that the demographic for advanced programming discussions is not large. That's why we see channels who explain the basics receive way more views than channels who showcase advanced knowledge. If channel views were proportional to the knowledge of the creator I'm sure you'd be in the top 10 KZfaqrs already. Anyway, I was just trying to express my gratitude for your videos and everything you teach us. Agan. Thank you!

@adam-xt8te Ай бұрын

@@zoran-horvat You have my sub mister... and epic name

@lettuceturnipthebeets790 3 ай бұрын

love your sense of humor Zoran! 😊

@billy65bob 3 ай бұрын

Protip for people who want to drop primary keys: If you make a change to it in SSMS' Table designer, you can have it generate a script, and it will output all the corresponding drop/create statements for you. I use this a lot when I clean up databases, as it's often faster to reverify foreign keys (minutes) than to leave them active during mass data operations (days). Especially if you have foreign keys that aren't a primary predicate in your indexes, which is often the case.

@zoran-horvat 3 ай бұрын

That is a very useful tip, thank you!

@billy65bob 3 ай бұрын

If I had a wishlist for SQLServer, right at the very top would be some sort of mechanism that optionally deferred validating referential integrity until commit.

@zoran-horvat 3 ай бұрын

@@billy65bob That is an interesting thought. Do you know of some other database that can do that already?

@billy65bob 3 ай бұрын

@@zoran-horvat I would not be surprised to learn that OracleDB does. But some googling suggests that both OracleDB and PostreSQL supports 3 levels of it on all constraints. Those levels being Never (e.g. UNIQUE), optionally during Statements (e.g. bulk insert with FOREIGN KEY to self), and Until Commit. With the level being set when the constraint is defined.

@codingbloke 3 ай бұрын

Thanks Zoran. A note though that there a number of advantages to a clustered index. Some watching this video might be led to believe that its about insertion performance but there is more to it. In fact I would suggest its about read performance. Many large tables represent transactions (e.g. an order), over time the older transactions are rarely accessed, its the active current ones that accessed concurrently. Having these records physically "clustered" greatly increases the chances that they are already in memory when accessed. It also reduces the read effort a little because the leaf index pages *are* the record pages. So whether to use a clustered index or not is decision to made carefully and not just on the basis that a developer may prefer to allocate the ID themselves rather than let the DB to it. That said there are approaches that give the best of both worlds, that is, keep the the index clustered and generate keys in the application that highly likely to be sequential.

@zoran-horvat 3 ай бұрын

You have good points here, thanks! Especially the one on reading the data sequentially.

@mannetjie3704 3 ай бұрын

@@zoran-horvat sql azure forces you to add a clustered index on your table(s). that said, it doesn't have to be the primary key :)

@zoran-horvat 3 ай бұрын

@@mannetjie3704 I am not sure that SQL Azure really forces you to use the clustered index. I'd rather say it takes it as a default, which is the same as in SQL Server. Are you sure about that? And yes, you can build a clustered index on any column you wish, even the nullable and non-unique columns are susceptible to clustered indexes. The rule that makes sense is that one table can only have one clustered index.

@cdoubleplusgood 3 ай бұрын

It depends on the use case. Seeking a clustered index saves the extra key lookup in the main table, but the table itself being the index also means that typically fewer rows fit in one page, and as a result, a clustered index seek causes more IO than a seek in a "lean" index. The best case is when you have a "lean" index with few rows that is still covering. That might not be possible for every use case.

@zoran-horvat 3 ай бұрын

@@cdoubleplusgood That is true, but this video is not about removing a clustered index. It is about a bug that may happen when there is a clustered index. To fix the bug, you must either guarantee the sequence or drop the clustered index. That is the message. Looking from the other end, then if the clustered index is required (e.g. for the performance benefits it offers in some scenarios), then we must be very careful when generating ID values.

@JoeFeser 3 ай бұрын

I was on a project a few years back that every id in every table was a guid. One of the cross tables was a 3 way primary key. Inserts would take 20-40 seconds. I was assigned to "fix" the perf issue and they were not happy with my solution, DROP THE INDEX.. I tried telling them the design was bad.

@zoran-horvat 3 ай бұрын

That is the point, the DBA job must be done well for a database to operate well.

@esra_erimez 3 ай бұрын

The brilliance of this video cannot be overstated. Well done.

@torrvic1156 3 ай бұрын

Thank you Zoran! Nice video. I am always trying to avoid usage of this GUIDs and use ints but sometimes I’ll have to use this.

@zhenglaowang8489 3 ай бұрын

I just want to emphasize how professional and helpful this video was. Please keep it up and thank you. 👏

@jordanfarr3157 3 ай бұрын

I cannot believe how much I learned in 15 minutes. This is one of your best.

@zoran-horvat 3 ай бұрын

Glad to hear that.

@MixturaLife 3 ай бұрын

Fantastic! Your voice makes it sound like fairty tale)

@arkord76 Ай бұрын

Thank you very much for this video, Zoran! I was not aware about the nature of clustered indices. The explanation helped me to revise the DB configuration on a current project I am working on. Very helpful!

@GiovanniCostagliola 3 ай бұрын

Thank you for sharing, Zoran. As you're aware, removing the cluster primary key entails additional overhead for each key retrieval. While this action resolves index fragmentation, it introduces an I/O penalty with each key access. An alternative solution I propose is the adoption of ULID or UUIDv7 (with a personal preference for the former). These options ensure the monotonicity of generated GUIDs. This quality not only renders clustered primary keys feasible and time-sequential processing effective, but also enriches the ID with additional time-information that could prove beneficial in some scenarios. You would saved a lot of migrations ;)

@zoran-horvat 3 ай бұрын

Yes, the use of a UUID standard that guarantees sequence is preferable, but be careful if the system is truly distributed. The monotonicity could fall off node clocks are out of sync. On the other hand, a clustered index forgives small discrepancies, i.e. the performance loss will usually be immeasurable if it only has to move a few rows at worst. That is another hidden gem of clusters indexes.

@mangodude-nq6su 3 ай бұрын

Your voice and manner of talking are amazing. Thanks for valuable advice

@fernandocalmet 3 ай бұрын

Perfect explanation, very kind to share this lesson, big thanks.

@sfif4585 2 ай бұрын

Thank you so much for your work. You are really amazing teacher.

@JohnSmith-op7ls 3 ай бұрын

This is why you just let SQL Server generate the GUIDs for you in a simple setup or if you have a clustered DB setup or need to generate IDs in the application for other reasons, you don’t bother with GUIDs and instead use a better, sequential ID algo like Snowflake or some variant. Not only is it more compact, saving data file and index space, but you can keep a clustered index in cases where your reads will be more frequent than writes, both of which will improve performance and lower costs.

@zoran-horvat 3 ай бұрын

If it only were "just"...

@VoldHooker 3 ай бұрын

Brilliant explanation, thank you.

@wsollers1 3 ай бұрын

I love these videos. While I don't code in c# I do find them remarkably informative and useful.

@xorlop 2 ай бұрын

Wow this was a cool video to learn about. Has general applications anywhere :)

@AlexUkrop 3 ай бұрын

Really useful as I created GUID PK clustered as SQL suggested by default wrongly. Thanks a lot.

@admindravid482 3 ай бұрын

Thank you so much...I encountered this exact problem....when the table got big...

@zoran-horvat 3 ай бұрын

That is when you find out... What were your timings? How long would an insert take when you noticed the problem?

@Buutyful 3 ай бұрын

hi zoran, thank you for all this knowladge, is there a video in your channel that talks about domain modeling design?

@zoran-horvat 3 ай бұрын

In many videos I cover different aspects of domain modeling, though only a few videos mention that intent in the title.

@zbaktube 3 ай бұрын

Could the original setup work, but with Ulid instead of Guid? So we can keep the clustered index?

@Syed.KhubaibAli 3 ай бұрын

Thankyou @zoran-horvat, I really like your content , please keep posting valuable stuff.

@jarosawfrankowski7845 3 ай бұрын

Great content!

@mcdaddy1334 3 ай бұрын

Very Nice!

@PelFox 3 ай бұрын

Do you recommend making some other column with an incremental value as clustered or is it fine to have a table without it?

@zoran-horvat 3 ай бұрын

That would be an optimization and, as such, it is valid in certain scenarios. I have intentionally tried to avoid getting into the details of measuring and optimizing read and write performance because that would make it an hour long video at least, but it seems that the question of performance is inescapable...

@nikize 3 ай бұрын

0:47 There is no guarantee whatsoever that there will never be collisions 💥 however, it is unlikely. Guids is 128 bits, some of these bits has defined values and can not change, and there is several issues that can make them non unique, the default new Guid implementations are even predictable.

@asdasdaa7063 3 ай бұрын

Thank you Zoran, indexing has been something which very little videos exist for EF core. Please if you can create more videos explaining indexing, so that when your database gets large you can insert and read data in a performant way?

@zoran-horvat 3 ай бұрын

There are many channels about SQL Server for example where you can find excellent videos on those topics. I am not sure I can explain indexes and performance even remotely well as in some videos I watched in the past.

@asdasdaa7063 3 ай бұрын

@@zoran-horvat I have a question is this problem which you mentioned, only happen when you let the database create GUIDs as well as creating your own? How does this apply if you always create your own GUIDs, and when doing so you you have them as non clustered?

@user-ur8tk5gr7o 3 ай бұрын

Hi Zoran, thanks for another interesting video. I like to add a clustered index on another column which I make as a shadow property in EF and make this and identity seed to ensure the order. What are your thoughts on this approach?

@zoran-horvat 3 ай бұрын

I have used similar tricks in the early days, but then found out that I favor simpler configurations... Though, the idea you mentioned will work.

@kasperlaursen7005 3 ай бұрын

I usually avoid using guids as primary keys in the database completely. I assign standard integer database owned primary keys and add a second id column that is the guid. Thoughts on this?

@zoran-horvat 3 ай бұрын

Then you lose the greatest benefits of using GUIDs - the ability to generate the value before persistence and the ability to persist entities in the storage that does not support generating IDs.

@billy65bob 3 ай бұрын

@@zoran-horvat Such a setup doesn't preclude the benefits of application generated guids. Applications with such a setup are usually ignorant of the incrementing Ids when they're not resolving foreign keys in their queries. It does come at the cost of requiring 2 indexes at minimum; a clustered one for the incrementing ID, and a unique nonclustered one for the GUID; which does have some performance implications.

@zoran-horvat 3 ай бұрын

@@billy65bob Yes, a better formulation in my reply would be "you don't take these benefits for free".

@Veretax 3 ай бұрын

Yeah I'm glad that you showed this because I remember that when I was a junior lots of my co-workers thought that you would just generate the idea and this is before everything went to Goods because then you wouldn't have to find a way to look up what you just saved or find a way to get it to return the good from the database but when you have a distributed system this becomes a problem in part because if every client is creating a good there is a slight possibility of a collision it's extremely rare but depending upon the algorithm on which the idea is being created yeah you can have a collision it is more common in integer Auto incrementing what keys then g u i d paid guid has sort of become the standard to avoid this but again because the way networks work I would rather the server to find these values than the client 99% of the time. Because otherwise you might have values to get created that could be security issues a function which would then do an insert that would add something with a good then you might have data being added. Net has done a good job of trying to sanitize these things and make it harder for it to happen but it's one of those things that's always been in the back of my head that there's a reason why some things are best left on the server

@zoran-horvat 3 ай бұрын

The possibility of a collision in GIIDs is truly negligent, so you shouldn't count it in. There are issues that happen far more frequently in distributed databases and we still have the means to handle them.

@Veretax 3 ай бұрын

Yeah I feel like I didn't really word what I was trying to say very well.

@stevehancock2396 3 ай бұрын

Hi Zoran, thanks for the video on this subject, certainly interesting. The only thing that I'd suggest would be a little more focus on the database itself, regarding default column values and the using NewID() and NewSequentialId() functions. While EntityFramework can take care of this, if you're using EF to generate your database, this isn't the case if either using another ORM, or working against an already existing database. Just felt that this bit of detail had been glossed over.

@zoran-horvat 3 ай бұрын

Yes, that could be a value topic for another video. I thought that covering other aspects in this video would hurt its purpose.

@zhenglaowang8489 3 ай бұрын

Actually I have a question. I am currently working with EF Core using a .NET Core project, and I want to be able to prevent values below 0 from getting inserted/updated into a database table (basically expecting an exception thrown by EF Core in that case). I have read that there is a "CHECK" clause to enable us to validate a value before updating a column, but chieving it with EF Core seems to be difficult to do. I have noticed there is a Fluent-API call within EF Core module "HasCheckConstraint()", which seems to be the way to go, but it's marked as "[Obsolete]". I want a method that I can use together with migrations. I have also tried the "[Range(0, int.MaxValue)]" annotation, but with no success. Any idea as to how I can do it, and in an elegant and robust way, supporting migrations with EF Core? THANK YOU!

@lettuceturnipthebeets790 3 ай бұрын

I have an important question you've written a custom migration there, which means that in order to squash all the migrations in the future, your colleague would have to find the custom code among the sea of other auto-generated migrations and then recreate it later from my experience, it's a huuge pain in the butt how would you deal with custom migrations? is it ok to just put them in a different folder (under same namespace)? or is there a more elegant way for that? thanks!

@zoran-horvat 3 ай бұрын

There is no difference between custom and generated migration code. Each migration is immutable and EF will not write into it again. As I pointed out, this is not the problem caused by EF. I have been using other migration tools before and they had the same problem, that you must add custom migration code to make a certain change. On the other hand, the whole issue only happened once I made a bug in the earlier configuration.

@lettuceturnipthebeets790 3 ай бұрын

@@zoran-horvatI understand that migrations are immutable once created, but how do you squash migrations that you cannot recreate using ef core? by squashing I mean creating a big migration from hundreds of smaller ones (which will get the custom migrations lost if not careful)

@PaulPendor 3 ай бұрын

@@lettuceturnipthebeets790I’m trying to understand your question. What Zoran did was generate a migration, and then add to it. You can add whatever you want to the migration, even down to custom SQL statements. But that is the complete definition of your migration. If you later want to create a migration script, say an idempotent script which you will generate and apply within a CI pipeline, all you’re doing is combining the migrations you have defined. The migrations aren’t generated again. You’re just combining, or squashing if I am understanding your question correctly, into a single SQL migration script.

@andersjuul8310 3 ай бұрын

Brilliant! I didn't know there was a problem (and always used app generated guids before). One day a colleague would have suggested db-generation and I would have accepted if not for your caution...

@zoran-horvat 3 ай бұрын

It's the other way around: database-generated IDs are normally safe, because the tools would set up the database correctly. It is the application-generated IDs that may sometimes cause trouble if not aligned with how the database is created.

@desertfish74 3 ай бұрын

How,do other databases such as Postgres or even SQLite handle this?

@zoran-horvat 3 ай бұрын

From what I know, Postgres also supports clustered indexes.

@cdoubleplusgood 3 ай бұрын

Postgres has something similar, but it is not applied by default. By default, the PK is a "normal" index. Similar for Oracle. They have something called an "index organized table". But that is also not used by default.

@coolY2k 3 ай бұрын

But what if the database is read-heavy, and you search data by id: guid? Yes, insert would be slow O(long n), insted of O(1), but the read would be O(log n), instead of O(n). It's a trade off.

@zoran-horvat 3 ай бұрын

It is a tradeoff, but not with those asymptotic complexities. The insert of out-of-sequence IDs into a table with the clustered index takes O(n) time and read takes O(log n). The insert and read from a table with a common key is O(log n) in both cases. So it is actually O(log n) vs. O(n) tradeoff, which becomes tragic on any table with more than, say, 100k rows and absolutely unsustainable on a 1M rows table. The point is that there is no arithmetic joggling that would make the relocation of half a table in every insert even remotely performant as anything else a man could imagine, lest the table is of a very small size, hence not important in any analysis.

@coolY2k 3 ай бұрын

@@zoran-horvat > The insert of out-of-sequence IDs into a table with the clustered index takes O(n) time Ok. Fair enough finding where to insert data is O(log n), but actually moving everything and inserting is O(n). For tables where I expect a lot of rows, and where I'm not going use id:guid as most used where condition I don't think it's a problem. It would be fun to see what are actually read and write speeds for 1k, 10k, 100+k rows table with clustered and not-clustered id:guid.

@devmarkmonster 3 ай бұрын

Just wondering how many deployment systems actually use the Down migration. I haven’t seen them yet, mostly a idempotent migration script is generated for the Up. Am I missing out?

@zoran-horvat 3 ай бұрын

It is a safety valve in case you suddenly figure something out after deploying to production. I remember events of that kind in my career. Very rare indeed, but when it happened it was always driven by panic reactions at the customer, so they would request immediate rollback of the deployment and only then an investigation.

@petrmalecik5661 3 ай бұрын

Great Video I reall liked It. Just an offtopic question are you planning to create a course on Udemy or Pluralsight which would go more deeper into programming paradigm used by you. I took your OOP course and Design Pattern Course. Now I would also love to see some FP/OOP course will such course ever be created? Thank you.

@zoran-horvat 3 ай бұрын

I have one such course in the queue about OOD and domain modeling, but no dates when I might actually start working on it.

@OzgulEzgin 3 ай бұрын

Thanky you very much Zoran? Is it possible to create sequential Guid IDs in the Application so we don't have to give up on clustered index?

@zoran-horvat 3 ай бұрын

There is a class that apparently does that (learn.microsoft.com/en-us/dotnet/api/microsoft.entityframeworkcore.valuegeneration.sequentialguidvaluegenerator?view=efcore-8.0). I must admit I have never used it. To make it worse, I have never even read is documentation carefully. The critical question to answer is whether the sequence remains ascending if you move the application to a new server? Though there exists the standard definition of GUID that puts a timestamp at the front, thus making any sequence of GUIDs ascending, there is a nasty corner case when the two servers are out of sync with time. The problem is the same as when using a timestamp to version entities where it may sometimes lead to lost updates! For all these reasons, I normally let the database generate IDs when a clustered index is a requirement (in systems with heavy inserts, e.g.), and drop the sequential IDs request otherwise.

@OzgulEzgin 3 ай бұрын

Thank you so much @@zoran-horvat

@brianm.9451 3 ай бұрын

I didn’t know that EF Core and SQL Server generated sequential GUID’s when properties are empty. I started defaulting back to using long’s over guids. I prefer the sequential nature of it and clustered index is important for me.

@zoran-horvat 3 ай бұрын

When a clustered index is required, then the number of options is limited. That is the case in systems with heavy inserts. The way EF Core handles that is to initialize the table field with NEWSEQUENTIALID(), the same way as we would do when writing the CREATE TABLE script by hand.

@nickbarton3191 3 ай бұрын

Some years ago, I worked on a project using SSRS; I was writing T-SQL scripts to improve report turn-around constantly. Scary stuff and absolutely hated it. How does non-clustered indexes compare in performance in-the-large ?

@zoran-horvat 3 ай бұрын

It depends on what you need. Someone pointed out how fast they are if we tend to read the rules in chronological order, which is the central request in some applications. The clustered index will then make such reads substantially faster than the one with rows scattered all over the data pages. On the other hand, most of the business applications do not have such a request. We usually read the routes as per request, and requests are randomly driven by end users. That makes clustered indexes less important, even to a degree that maintaining them becomes a liability with no benefit in return.

@nickbarton3191 3 ай бұрын

@@zoran-horvat Yes, I found that clustered indexes worked faster with time-ordered data; happened to be quantities of mushrooms collected at distributed farms per picker over a week and aggregates per farm per cost-center.

@krm1t 3 ай бұрын

This does not seem like a very great advise - Unless you have a small database or don't care about select/update/delete performance. Imagine having a really large warehouse and when ever someone gives you some new data to "insert" you put it in a box and label it with your primary key. Since your key is ordered (typically ascending) you need to place the box at the right spot. If your key comes after the last box then you just put the box right at the end but if it's somewhere in the middle you now have to move all the boxes (this is called page splitting) that comes after your key to make room for the new one. Now imagine that same warehouse but without the clustered index. You get some new data but instead of putting the data into a box and labeling it you just put data on what ever shelf where you can find room. You are not even assured that all the data is in the same place. So when your "client" wants their data again you have to run all over the warehouse to gather up the pieces in order to get them their data. To put it simply. This fragments data on the disc making it slower to find the data for selects, updates and, deleting. The real solution to this problem is to make an integer or bigint column as the primary key and then make it an identity column. If you insist on having a Guid as your identifier then add that as a separate column with a nonclustered unique index.

@zoran-horvat 3 ай бұрын

First of all, my advice is NOT that you should not use a clustered index, as your comment seems to imply. The video is about a very specific situation when an application generates GUIDs using an algorithm that does not guarantee the sequence and there is a clustered index. One must go. Regarding your analysis, the actual performance depends on read and write patterns. Neither the clustered nor non-clustered index is better in advance without knowing the statistics. Your analysis does not generalize and, hence, does not lead to any conclusion.

@krm1t 3 ай бұрын

@@zoran-horvat what pattern are you talking about when the sql server just sticks the data wherever it can find available space on the disc?

@zoran-horvat 3 ай бұрын

@@krm1t This what you said is too wrong to answer to.

@krm1t 3 ай бұрын

@zoran-horvat sry it seems KZfaq decided to not show all of your message. A refresh added the first part which I hadn't read. That is true. You don't say it's a general advise. That's a misunderstanding on my part.

@MrValhallans 3 ай бұрын

You note a performance impact on using application generated GUIDs on a clustered index, however I would love to know what sort of impact that might be. For instance against a 1 million row X 10 column table, just as an example to get the percentage differance on read and write ops.

@zoran-horvat 3 ай бұрын

The percentage doesn't matter. It is the fact that an operation that takes near-constant time would be substituted with an operation that takes time proportional to a half of the table's size on average. One simply does not do the latter.

@MrValhallans 3 ай бұрын

@@zoran-horvat I take your point that its is self evidently the correct approach, however I have applications with this issue and I need to prioritize my time. If I go back and spend the dev time on the migrations will I see a 5% or a 40% perf increase. Thanks for the amazing videos, big fan 😁

@zoran-horvat 3 ай бұрын

@@MrValhallans The point is that when this issue manifests, it makes every row insert a pain so obvious that measuring it makes no difference, especially knowing that every next day will only make it slower. How do you prioritize work on a system where inserting a row takes a second?

@adam-xt8te Ай бұрын

I wonder why people prefer GUIDs instead incremented integers as PKs. Doing emergency hand-corrections in DB with FKs cause headache then.

@zoran-horvat Ай бұрын

An application cannot generate an autoincrement number. It must rely on the database to communicate the ID back to the application. Not to mention that autoincrement anything is almost invariably out of the question in distributed systems. Now list ten situations that come to your mind where waiting for the autoincrement number is an issue, and you will have ten use cases for GUIDs. It shouldn't be hard.

@adam-xt8te Ай бұрын

@@zoran-horvat Adding related data to many tables at once requires knowledge about PKs and FKs, but I always relied on EF or SQL's SCOPE_IDENTITY then and never set them by hand. Ok, it's just easier having keys before and after the action.

@jaiderariza1292 3 ай бұрын

The problem occurred when the app needs to create its own GUIDs, but if I understand you correctly, there is no problem with letting the DB generating the GUID + clustered, if the app does not want to create GUIDs, and will perform better. correct?

@zoran-horvat 3 ай бұрын

The problem was with IDs generated out of order when there was a clustered index. The database generates GUIDs in the ascending order when instructed, and so the problem would not occur then. However, that is not to say that database-generated IDs or clustered index perform better. The overall performance is also the result of read and write patterns exhibited by the application.

@pyce. 3 ай бұрын

I would keep an auto-incremented long as primary key (clustered) and a GUID with unique constraint (unclustered).

@zoran-horvat 3 ай бұрын

That is one possible design. But only one out of several valid ones. We should always keep the requirements in mind, both functional and nonfunctional, when making design decisions. For example, a clustered index can cause write congestion in some scenarios and work like a charm in others. A table with a common scattered index can cause read delays in other scenarios. There is no one size that fits all.

@MrWTFNETWORK 3 ай бұрын

Why not just have an auto incremented long as Primary Key (generated by the db) and don't even have the Guid?

@zoran-horvat 3 ай бұрын

@@MrWTFNETWORK Because it might be that the application is generating ID and database is only storing it. That is frequently the case in distributed systems. In such a situation, autoincrement numbers are not an option.

@MrWTFNETWORK 3 ай бұрын

@@zoran-horvat you made me think of a scenario where I store invoices and invoice line itens, 1 to Many relationship and we were using guid for primary Key of both tables. In reality to acess line items I see that the line items table could instead have a composite primary Key as invoice Id and line item id (incrementing integer generated in memory instead of guid as there can only be max 100000 items), what do you think about this approach? We only need to get 2000 items in a paged way until we allow clients to get all line items via API. We would use a continuation token being the id of the last item and always sort by the id

@Dragonet17 2 ай бұрын

What do you think about ULID lib ? Does it solve the problem ?

@zoran-horvat 2 ай бұрын

There is the new video demonstrating that possibility: kzfaq.info/get/bejne/g7CTlNuamMizf2w.html The caveat is that ULID will not solve the problem if the table already contains many rows. ULIDs are sorted left-to-right, where GUIDs used by the SQL Server have a different structure and sorting. Therefore, the sequence of ULIDs would not extend the sequence of IDs already in the table, once again causing the relocation of great numbers of existing rows upon every insert.

@Dragonet17 2 ай бұрын

@@zoran-horvat thank you for you answer, I've already started watching your latest video, I'm glad that answer is there

@MetronSM 3 ай бұрын

I'm not that fit in databases. But does that mean that I can basically generate my (gu-)ids in my software and use those ids to link entries of different tables before storing the data in the database? We ran into the problem that in Entity Framework 6, when referencing objects, we have to transfer large parts of the database because of the direct object links. The clients don't have direct access to the database but go through a server which handles the communication with the database. Being able to directly generate the ids myself would help us to mitigate some problems we have during multiple transfers.

@zoran-horvat 3 ай бұрын

If I got your description right, that is precisely the scenario where we choose to let the application generate IDs. The application then has the control of the times when it creates objects and times when it does then to the database - those two don't have to happen simultaneously. It is also useful when sending objects between systems, as well the systems will associate the same object with the same ID.

@MetronSM 3 ай бұрын

Thanks for the reply@@zoran-horvat

@Krauttrooper 3 ай бұрын

Alter table is a DDL instruction and can not included in a transaction. Only DML is protected by a transaction.

@zoran-horvat 3 ай бұрын

DDL statements in SQL Server exexute in a transaction, with a few exceptions like CREATE/DROP/ALTER DATABASE and the same for FULLTEXT INDEX. You can also mix DDL and DML statements in the same transaction. That is not limited to SQL Server. Other databases also support transactional DDL, most notably Oracle.

@Krauttrooper 3 ай бұрын

@@zoran-horvat didn't know. Thanks

@eveneveneven 3 ай бұрын

That's not true for SQL Server is it?

@zoran-horvat 3 ай бұрын

@@eveneveneven SQL Server supports DDL in transactions.

@user-we6wp1ky7f 3 ай бұрын

Perfect video Zoran! Looks like that initial approach with generating guids on database level is quite simple and we may forget about these nuances of how sql server works. We just don't need to assign new guid values for entities in our application code, am I right?

@zoran-horvat 3 ай бұрын

Actually, I believe it is better to generate GUIDs in the application because that is their greatest strength: global uniqueness. It lets it generate them in the frontend as well. It lets us choose whether to store entities in aggregate storage in one node or bounded context and in a relational database in another. When you put everything on the table, leaving the database to generate values on insert begins to look less attractive.

@user-we6wp1ky7f 3 ай бұрын

Please correct me if I wrong here, but for Publishers table with this approach we don't have any indexes now because primary key PublisherId is not the clustered index now. Will our search and filter operations in this table now be slower than with a clustered index?

@zoran-horvat 3 ай бұрын

@@user-we6wp1ky7f There must always be an index on the key, or otherwise the database would perform a full table scan on every access to a row. The default in SQL Server (and only in SQL Server) is that it keeps the table sorted by the ascending je, a.k.a. the clustered index. But if you opt for a "regular" unique index, as I did in the video, then there will be a separate index built and updated on every insert. So, the answer is that both Books and Publishers tables will have unique indexes on their keys at the end of the video.

@user-we6wp1ky7f 3 ай бұрын

@@zoran-horvat Thank you for your explanations, I became smarter after watching your video and reading your answers!

@reikooters 3 ай бұрын

You said what you did, but you didn't say why you did it. Why don't you want sequential guids? Where are the benchmarks? Before the change, what was the insert and select time and the query plan generated? After you made the change, what was the insert and select time and the query plan generated? Why did you decide you don't you want a clustered index? You may speed up writes but surely reads will be slower.

@zoran-horvat 3 ай бұрын

It depends on what you think that I said. I see that many viewers didn't see the central message of the video, which is probably my fault in passing that message. So, the message is: IF the application has been generating IDs for some time AND there was a clustered index THEN... we begin. The video doesn't question any of the decisions because those are driven by entirely different forces. It only questions the combination of decisions.

@reikooters 3 ай бұрын

@@zoran-horvat That does help to clarify the starting situation, and is still interesting as I definitely think others have been in this situation, although I do think it would've been nice to see some details on the improvement that the change made

@Galakyllz 3 ай бұрын

Resetting the UUIDs to UUID7 would be all that you would want to do. Removing the clustered index is not the best solution when we have UUID7. UUID7's initial bytes are sorted by a timestamp and then the right side is random, allowing you to sort them and immediately tell which ones were created first-to-last. If these are meant to be customer-facing and being able to tell when an entry was created is a security concern, then the primary key should be an auto-incrementing integer (with a clustered index) and a second column should exist with a UUID4 (with a regular index). As a side-note, including a UNIQUE constraint on the second column would cause unnecessary overhead.

@zoran-horvat 3 ай бұрын

That is true, with a caveat. What used to be encapsulated in the database, both creating and consuming IDs, now becomes the problem of someone else. It is entirely possible that a future update, such as accepting IDs from the node or service which was not entitled with that before, could break the rule causing a surge in performance. I would advise to either keep the dependency in an algorithm in one place (database), or to not have it at all.

@Galakyllz 3 ай бұрын

@@zoran-horvat True, either the database needs to have an implementation of UUID7 that it uses to generate the IDs or (when the UUID7 must come from an external source) all pathways that result in an insert into that table must go through the same abstract layer that generates a UUID7 on the caller's behalf. Most systems have a "repository" layer that would suffice but once you have more than one system with access to insert into that database, each one becomes an entrypoint for bad (non-UUID7) IDs. Creating a shared library for interacting with tables of that database would be my next path forward to greatly reduce errors.

@dxhelios7902 3 ай бұрын

@zoran "on SQL Server, when a GUID property is configured as a primary key, the provider automatically performs value generation client-side, using an algorithm to generate OPTIMAL SEQUENTIAL GUID values." from EF core docs.

@zoran-horvat 3 ай бұрын

That is true. The problem occurs when GUIDs are generated in the application and only transferred to EF. In that case, the algorithm will be whatever the application used and, unless a specific sequential algorithm is configured, that will be an out-of-sequence algorithm, as it was in my demo.

@dxhelios7902 3 ай бұрын

@@zoran-horvat "whatever application"? eehh, i don't know. Let's say you have existing dataset from other system (SYSEXT) that you would like to integrate into your system (SYSPROD). This dataset has entities with attributes (RDBMS sense): SYSEXTItemID, SYSEXTItemTitle. You want to store these entities in your DB as well. In that case you may say this is application generated IDs. I can get such scenario. You see how I view PK - they are specific to your DAL of your DB (SYSPRODDB). IMHO DAL responsibility is to save, operate and retrieve data consistently from your data store. The implications of such approach: 1. Only DAL controls how data is saved and retrieved. 2. It must control indexes, constraints and relations as needed inside DAL. In your scenario you would may as well have EF CORE generated sequential IDs in following way: - define table like SYSPRODID, SYSEXTItemID, SYSEXTItemTitle - add index on SYSEXTItemID, if needed, but keeping in mind your goals of adding records quickly, probably you want to avoid that In that case you would remove design dependency between PK and "IDs generated by application". Even if the application changes its behavior, DAL implementation stay stable. If I would encounter such an issue, then I would keep my DAL fully responsible for consistency and performance and would remove coupling between DB design and application as it would lead to other issues. Team may misuse and abuse such approaches. Also this does not help if Item is deleted or SQL is restarted due to maintenance on-prem or in the cloud. Fragmentation still possible and index maintenance should help more than anything else.

@JoeFeser 3 ай бұрын

The new index will end up getting fragmented over time and will require some maintenance.

@zoran-horvat 3 ай бұрын

Every nonclustered index does, and databases are still live and kicking.

@kocot. 2 ай бұрын

TLDR; if you can't ensure ascending GUIDs order, don't use clustered index, you're welcome.

@zoran-horvat 2 ай бұрын

Watch the video with more attention. You're welcome.

@kocot. 2 ай бұрын

@@zoran-horvatsorry, I've rephrased it, now it's both more precise and shorter :) don't get me wrong, many are interested in a full video, but most don't have 15min+ just to figure out what the whole thing about and and worth spending time, unfortunately it's rare that youtubers provide sufficient description, or even tray to bait with incomplete titles

@zoran-horvat 2 ай бұрын

@@kocot.It is not possible to state the problem precisely in 50-characters title, nor in the description for that matter. My channel consists of advanced-level videos and trust me, quite a few programmers watch them very carefully. I expect them to because I don't know how to pour an advanced piece of knowledge into a head that lacks patience. Look at me - I have been learning programming for 35 years now. Regarding the problem addressed in this video, the assumption is that the change cannot fit the existing table with data in it. That is a regular case in practice. Either the IDs are coming from another system that just didn't care to make them sortable, or the table already contains a lot of data, like a million rows, produced with a sortable UUID generated by a different algorithm, so the new algorithm does not extend the existing sequence. If either is the case, we must make cuts. One is to give up the clustered index, and that decision is tied to a few other considerations, such as write and read statistics. The other option is to recreate the table and generate new IDs using the new algorithm. Both decisions are hard to make, and this video does not suggest which one is the right one. Actually, chances are that it will be the DBA to decide. The upcoming video, which is in the publishing queue right now, will demonstrate both possibilities on two tables in the same database.

@berndeckenfels 3 ай бұрын

Why would sql server sort by the 4th segment?

@zoran-horvat 3 ай бұрын

The structure of UUIDs is defined by the corresponding standards. If you need to analyze the values, then you take the parts as defined in the standard. There is no freedom to decide.

@berndeckenfels 3 ай бұрын

@@zoran-horvat yes but the sql server does sort the bytes/characters in order of left to right, and normal v1 uuids like you showed do not sort chronological this way. V6 are better in this regard.

@DavidBeaumont 3 ай бұрын

Interesting. Why is it better to want to switch to the non clustered approach? Why not let the DB continue to create the ordered GUIDs itself?

@zoran-horvat 3 ай бұрын

It is not the question whether it is better or not, but which node in the distributed system is in charge to generate or, let's say, know the value of the ID. If it is an applicative node, a service, or anything of that sort, then keeping the storage in charge of generating IDs requires an additional trip back to the application before it can continue doing what it planned. It is also hard to rely on the database in polyglot systems where some of the storage nodes might not even support generating IDs.

@Muhammadsaleh77 3 ай бұрын

That's great

@ScorpionJL 3 ай бұрын

This is one reason I do not use migrations for SQL Server and instead go with a database first approach. The VS SSDT project allows for publishing changes to the DB schema and handles all the drops and creates needed. It's especially useful when iterating changes over a dev database and then pushing them all to a prod one when ready. Also, removing the clustered key results in a table classified as a HEAP and has it's own pros and cons. If you only ever lookup via the GUID and never do a range lookup there won't be much difference, but if you ever need to lookup a range of rows in a table, you'll want to create a clustered key. Also, depending on the concurrency and throughput of the application, sometimes a clustered GUID column may help in the long run, as long as you are doing database maintenance such as index rebuilds or defrags on a consistent basis.

@7th_CAV_Trooper 3 ай бұрын

Guids will kill your index. Btrees are optimized for right only writes. Use ulids instead because they're monotonicly increasing. I'm 99% sure Azure SQL requires clustered pk. EF can be setup to allow auto-inc into indexes. This is the best option, unless you're building mobile apps that require offline operation.

@ivancavlek4755 3 ай бұрын

Ok, but what about creating a sequential GUID in the app? There are NuGet packages that enable that (SequentialGuid by jhtodd for example) and also code examples on Code Project by Jeremy Todd ("GUIDs as fast primary keys under multiple databases"). I use this code for almost a decade now in the app and it didn't fail me (yet). Would these solve the problem i.e. have your cake and eat it too, or clustered index with an app generated sequential ID?

@zoran-horvat 3 ай бұрын

That would resolve the problem, but with a caveat: Every node that generates the IDs would have to know the condition and, on top of that, use the same algorithm. I fear more from configuration errors than the conceptual ones, especially as the resulting big is a nonfunctional one, therefore hard to reproduce and catch. On a related note, I plan to make a video on strongly-typed IDs with EF Core. The strong ID type would be the ideal place to lock the GUID generation algorithm, though it remains that the C# module containing that decision would have to be in sync with any other source of IDs.

@KangoV 2 ай бұрын

Uuid key solved by using Type 1. These are ordered. This is how Cassandra solves it. You're welcome.

@zoran-horvat 2 ай бұрын

Yeah sure, but you forgot the problem statement. You just messed up the database in exactly the same way as I did in the demo. You're welcome.

@KangoV 2 ай бұрын

@@zoran-horvat Explain how? I've been using Type 1 UUIDs for years. Never had a problem. Maybe this is SQL Server specific?

@zoran-horvat 2 ай бұрын

@@KangoV That is your homework, not mine. BTW, I did explain it in the video. Two times.

@Dalendrion 3 ай бұрын

So SQL Server generates the GUIDs in a way that each GUID is sequential. Intuitively, it seems that it's less random, and so it's more likely to create collisions. Letting the app generate GUIDs seems safer. Is my intuition correct? Or am I missing some context?

@Dalendrion 3 ай бұрын

"Sequential" is the wrong word, I suppose. "Ordered" covers it better.

@zoran-horvat 3 ай бұрын

@@Dalendrion Yes, ordered is a better term for that. There are several standards for GUIDs, at least one of them guarantees order on top of universal uniqueness (that is the principal trait of that standard).

@jasonfreeman8022 3 ай бұрын

Use INT ids for joins WITHIN the DB. Assign GUID’s as actual entity IDs so you can sync across DB’s (think Dev, QA, UAT and PROD) without using identity insert. If you want to use guids for actual IDs then use a time-based guid thats more suitable to your clustered index.

@zoran-horvat 3 ай бұрын

If you are going to query by GUID then you got nothing... That is the reason why we use external ID as the key (in those scenarios). Less code and better database performance, all in one package.

@CronKz 3 ай бұрын

Great video as always! What about using ULIDs instead? By the spec they are sorted by definition, since the first part of a ULID is the encoded timestamp. In that case you might be able to keep using the clustered index, even if the application generates the values. ULIDs are great! Unfortunately they are not yet commonly used.

@zoran-horvat 3 ай бұрын

It is a good idea. The only issue with types that are sorted by time is when used for versioning - once the system truly becomes distributed, small discrepancies among timers will cause incorrect sorting of versions and lost updates. However, the concept alone is very useful.

@berndeckenfels 3 ай бұрын

Normal uuids are not sorted by timestamp but there are special variants which are

@berndeckenfels 3 ай бұрын

@@zoran-horvatwhy does gaps in timers cause data loss?

@zoran-horvat 3 ай бұрын

@@berndeckenfels When a time sequence is used to version changes in a distributed system, then we risk the case when two updates are reversed because the node that effectively generated a later update could have its timer late. In that case, the earlier update could overwrite it. That is one of the failure modes for distributed databases.

@berndeckenfels 3 ай бұрын

@@zoran-horvat ah ok, yeah that’s not really related to uuid but yes that’s a problem if you don’t use vector clocks or similar

@ssa2955 3 ай бұрын

Is there a benefit in creating the GUIDs in code in comparison to creating them on the RDMS side?

@zoran-horvat 3 ай бұрын

It is tricky to vote this in terms of benefits. It is rather a reaction to other forces when designing a distributed system. So, in practice, you will need to cope with a situation where applications already have the need to generate IDs and letting a database do it for them is a sub par solution.

@anon_y_mousse 3 ай бұрын

Is that a problem with SQL Server specifically? If so, why not use something better? Also, GUID ID is redundant. You can just say GUID as both ID's are the same thing.

@zoran-horvat 3 ай бұрын

This is not the problem, nor is it with SQL Server specifically. Many other databases support clustered indexes - Oracle and Postgres do, for instance. The video is showing one typical mistake that can happen. A few commenters confirmed that happened to them in the past, and it also happened to me. You would typically notice the problem when an insert into a table becomes unreasonably long, such as over second or longer. I have heard horror stories of an insert taking half a minute before someone noticed that the things are not right. Regarding the naming, you are right that GUID is the ID. It is a good point one should keep in mind. However awkwardly, I was reading it in C# - Guid Id, i.e. the Guid type and the Id property.

@anon_y_mousse 3 ай бұрын

@@zoran-horvat Sequential indexes shouldn't cause that problem if the database is implemented correctly. I can only assume that they're using the GUID directly as a key instead of hashing it. This kind of thing is why increasingly I write my own. I keep seeing these kinds of logical fails in "professional" software.

@Sokooo 3 ай бұрын

That’s why you use Postgres ;p

@zoran-horvat 3 ай бұрын

No, that's not why you use Postgres. One must be stupid to make such a random technical decision.

@SantiagoBurbano-su9oe 2 ай бұрын

Don't, don't, thousand don'ts. You have just turned your Books table into a Heap, which in MSSQL absolutely kills reading performance. While the problem you describe with non-sequential guids is correct, leaving the table without Clustered Index is much much worse. A simple Index reorganization will fix the guid problem as your table grows. Stick it to the Maintenance Plan (you DO have one, right?) and call it a day.

@zoran-horvat 2 ай бұрын

Did you just explain we should never use heap tables before asking what queries we plan to execute? You know better than that, I hope. For example, that "absolutely kills reading performance" depends on what we query, right?

@SantiagoBurbano-su9oe 2 ай бұрын

@@zoran-horvat You ask me to provide context when you yourself provided none. In fact, that's actually the main objection I make to your video: the lack of context. Sure, Heaps have a place, sure non-clustered PKs have a place. But giving a 'general advice' to drop the Clustered when client-generated Guids are used as PKs without any warnings or caveats can put inexperienced developers in a much worse place than a fragmented clustered index would. I'm absolutely sure you know better too. 😉

@zoran-horvat 2 ай бұрын

@@SantiagoBurbano-su9oe I think I gave quite a sufficient amount of context when I said that the unsorted IDs are coming in from an external source. That condition is a given at the beginning of the demo. The rest is a consequence.

@matiascasag 3 ай бұрын

KZfaq why my comments are being deleted?

@zoran-horvat 3 ай бұрын

Did you add tags or links into them? KZfaq regularly deleted those.

@matiascasag 3 ай бұрын

@@zoran-horvat I wrote N3wId from M@ss Tr@ansit

@berndeckenfels 3 ай бұрын

Btw I think clustered uuid are not that bad if you insert/delete regularly because the gaps in the index will stabilize and only minor reordering is needed. In fact the easier clustered lookup will make up for it in some cases and you don’t run into one sided balanced index trees or mind extra old records keeping a lot of empty heap dangling,

@joshman1019 3 ай бұрын

Couldn't you seed your ID with some sort of sequential ID in a format similar to {SquentialID}_{GUID}? So something like a Linux based time index. That way the record always gets sorted into the end of the file. This would allow you the benefits of clustered indexing and collision avoidance. If you did get a few rows generated on the same time index, then the re-sorting would be fairly efficient. In my opinion.

@zoran-horvat 3 ай бұрын

You can use a specialized algorithm that guarantees sequence, but that opens a different set of issues. All components that you might allow assigning IDs must guarantee to use the same algorithm, for example. Any later change is not possible, and so on. All these limitations stem from the principal problem that we have separated the source of IDs from the storage that depends on them. None of the issues appear if we lift the artificial constraint of the storage, that is the point.

@joshman1019 3 ай бұрын

@@zoran-horvatThis is great information. Thank you!

@ricodomonkos3053 3 ай бұрын

DDLs not always can be rolled back. For example in MySQL, Postgres and Oracle a DDL is implicitly committed.

@zoran-horvat 3 ай бұрын

Yes, sure, but that doesn't change the fact that every mainstream RDBMS supports transactional DDL. Actually, missing such a fundamental feature in 2024 either makes it outdated or a niche technology with an accent on something else (which is fine).

@jonijarvinen7833 3 ай бұрын

Sorry but this video provides absolutely no context in which circumstances using GUIDs is slow. It isn't even slow by default and can be just as fast or faster than ordered sequential keys because it prevents congestion. This video also does not provide any data on what exactly is "slow". Don't take it from me, just go watch a video about with actual data and experience to back up what they are saying, which is, you should not worry about using GUIDs. They are just as fast in almost all situations withou any kind of index shenanigans: kzfaq.info/get/bejne/pMlkdrCgnLy3gKM.htmlsi=2dPyPAT3rX2OGHOn

@zoran-horvat 3 ай бұрын

My video is not on the topic you are mentioning. I would appreciate it if you could watch my video as being what it is, rather than as not being on the topic of some other video. My video is about a very particular situation when a programmer unintentionally leaves the clustered index on the key column, and then intentionally uses an out-of-sequence source of IDs. The situation naturally occurs when using GUIDs. Contrary to what you have imposed, the video you have commented on is not about index performance. After reading your comment two times, I have an impression that you have missed the theme of my video entirely, particularly the point that out-of-sequence IDs cause relocation of large number of table rows on every insert - an effect that has no relationship to fragmentation, specificity, uniqueness, or any other property of indexes.

@jonijarvinen7833 3 ай бұрын

@@zoran-horvat And that video takes a deep dive into clustered indexes. There is no one best practice. It depends how you use your data. Heavy inserts are slower on an ordered clustered index because you are creating a hot spot to the last page. From the author of the video from another source: "First, I ask that you realize my video isn't just about GUIDs. I use Random GUIDs to emphasize many myths and points about all kinds of indexes. Shifting gears back to your question, when you're working on an almost pure OLTP platform and as I say in the video, logical fragmentation isn't going to matter at all. You're going to be looking up one row at a time and that means a trip in the B-Tree from the root thru the intermediate levels of the B-Tree, to the page in the leaf level where your row is. If the index is (for example), 4 levels deep, then you'll do 4 8k page reads for every look IF the index can be used. If not, then a scan of the leaf level is going to occur and I probably don't need to tell you that can be millions of pages. Also, Random GUIDs any better or worse than IDENTITY columns when it comes to finding things UNLESS you're using those things as previously identified keys. They're both surrogate keys and mostly tell you nothing about the data on the row. In both cases, they work great for OLTP IF you already know which key to lookup. For everything else, you're going to need a different column or two to find what you're looking for. The key thing to remember when it comes to "Clustered Indexes" is that "It Depends" and there is no panacea. If your system IS most OLTP then, like I said, Random GUIDs and IDENTITY columns are great even as a Clustered Index. If, however, you're doing "range" lookups by (for example), dates, then the better choice would be to put the Clustered Index on the date column as the first key and the unique surrogate column as the second key column so that you SQL doesn't have to do "Row Lookups" to get the information not carried in a non-clustered index with a date key. Even that isn't a universal recommendation because, just like an IDENTTY column, have a Clustered Index on the data will likely create an Insert "Hot Spot" at the logical last page of the Clustered Index. Even worse and as I mention in the video, it's even worse if you do such inserts (which all go in to 100% page fullness and then they create another page) and those are following by any UPDATE that makes the row larger (which I refer to as "ExpAnsive Updates") before you can rebuild your index at a lower fill factor, you're going to get massive page splitting and the resulting fragmentation. Of course, Random GUIDs don't have either of those issues because all INSERTs are fairly well equally distributed across multiple pages (usually 1 page per row if the table has at least 1 page per row being inserted). EVERYTHING is a tradeoff when it comes to the Clustered Index. If someone starts a recommendation about Clustered Indexes with "It a "Best Practice" that all Clustered Indexes should be...", just walk away because they completely wrong unless they end that with "assigned with great caution because there is no panacea". I also used to believe in the old advice that "every table should have a clustered index". Again, I'll now say "It Depends", especially since that came out with ALTER TABLE REBUILD in 2008. Yep... I know about that nasty old RID being added to non-clustered indexes instead of a nice narrow INTEGER or BIGINT but still "It Depends". So, for OLTP, "It Depends". For Analytics, "It Depends" on something else. Even within Analytics, each table may have (likely WILL have) different requirements to maximize performance and minimize resource usage. I'll also end this with logical fragmentation may or may not make a difference for reads (you need to test), will usually make a difference for writes because even supposed "good" page splits are 4 to 43 times more resource intensive as proven by Paul Randal, and page density should almost always be a primary concern. It's very bad to have a low page density in many cases and it's very bad to have a high page density in others. Again, I know it sounds like a cop out on my part but... "It Depends" and "Must Look Eye" are the only truths when it comes to indexes and the lack there of. Ah... there's one more truth... no matter what you decide on, it will be a compromise... especially for Clustered Indexes and Heaps :D"

@zoran-horvat 3 ай бұрын

@@jonijarvinen7833 My video does not take any dive on clustered indexes. My video is not about clustered indexes. OK?

@yaroslavpanych2067 3 ай бұрын

EF. Duh. It is such a bitch. It would be ok, if it just worked with data, not fucking with structure! I'm developer, and in most cases i have few architects above me, we most definitely know about structure better, and how to update it.

@zoran-horvat 3 ай бұрын

Actually, EF did nothing here. It was just passing values between the application and the database with no modifications.