No video

Stop using COUNT(id) to count rows

  Рет қаралды 373,525

PlanetScale

PlanetScale

Күн бұрын

Пікірлер: 609
@DoubleM55
@DoubleM55 Жыл бұрын
I had a strong feeling that something like this is going on in DB engine, but only "argument" I had is pretty weak: "It would be very stupid if DB engine pulled all data from all columns just to pass it to count() function and return single int", and it would be very easy to implement such optimization. So I was using select(*) while simply trusting the DB to do the right thing. Thanks to this great video, now I have a confirmation and also know exactly how DB decises how to count rows in optimal way. Great video :)
@PlanetScale
@PlanetScale Жыл бұрын
Nice to have your gut feeling proven right! Glad you liked it.
@QWACHU
@QWACHU Жыл бұрын
This was maybe true far far in a past, but many years ago count(*) was optimized in DB engines to not use whole columns.
@SergeDuka
@SergeDuka 11 ай бұрын
I suggest using ‘count(1)’ to specifically show that no data is used. It eliminates any confusion.
@timucinbahsi445
@timucinbahsi445 10 ай бұрын
I think this is one of the most important skills in programming. Knowing you are not the only smart person around :) If it's so obvious to us, developers of the engine might have thought of it as well. Modesty W
@ErazerPT
@ErazerPT Жыл бұрын
The most important take from the video is not about count(), but generically that (if speed is critical) you should always review the execution plan. Easier to do while at the prototyping phase, but can still be accomplished in production, just needs more testing and QA.
@meenstreek
@meenstreek Жыл бұрын
The problem is the execution plan the database will use while in development with very little data will be different to the execution plan in production when the query is now traversing millions of rows. So you just have to make educated guesses and use experience to know when a query will work or not and then iterate.
@wasd3108
@wasd3108 Жыл бұрын
I don't think you got the main point. It's about shutting down your uncle when everyone is around.
@christianbarnay2499
@christianbarnay2499 10 ай бұрын
If speed is critical you should always write your queries in the most simple and easy to understand way. Optimizing the execution plan is one of the main purposes of the DB engine. Just let it do its job without interfering. When performances degrade, the first response is to update statistics to let the engine have an accurate view of the data. If the engine still fails at finding the fastest path, rewrite your query in a better way. And ultimately give it directions as a last resort solution.
@ErazerPT
@ErazerPT 10 ай бұрын
@@christianbarnay2499 "If the engine still fails at finding the fastest path, rewrite your query in a better way". That's what i said. "Review the execution plan" not "optimize the execution plan". If you can see that the DB engine can't make heads from tails of what you want, THEN you need to think on what/where to change so that it can. And waiting for degradation is a bad idea, because it can take long enough that when you NEED to make changes you might find out you CAN'T make them anymore (without breaking stuff). Also, most people writing queries are software developers with not that much DB background. You can't expect them to write "excellent queries" or design "excellent schema"... As a "dumb software developer" i count my blessings when i have really good DB people around so i can show them my stuff and they can keep me from shooting my own feet :D
@christianbarnay2499
@christianbarnay2499 10 ай бұрын
@@ErazerPTDatabases are so central to software that I can't label someone a software developer if they can't write decent SQL requests. And SQL is so easy to understand it only takes a couple hours to get the basic knowledge that will fulfill 90% or your needs.
@teej_dv
@teej_dv Жыл бұрын
What about counting in OurSQL?
@aarondfrancis
@aarondfrancis Жыл бұрын
I can always count on you! YouSQL
@AmitMerchant
@AmitMerchant Жыл бұрын
😆@@aarondfrancis
@dmsalomon
@dmsalomon Жыл бұрын
Chill comrade, we don't need to expropriate the database engine for the proletariat
@ejaz787
@ejaz787 Жыл бұрын
in soviet Russia you don't count(*) , * counts you
@XmasApple
@XmasApple Жыл бұрын
Did you mean YQL used in YDB?
@dimasshidqiparikesit1338
@dimasshidqiparikesit1338 Жыл бұрын
This is actually incredibly educational. Thanks, planetscale!
@binaryfire
@binaryfire Жыл бұрын
"You'll have won the argument, which is what the holidays are all about." 🤣
@bazcuda
@bazcuda Жыл бұрын
"todos" also means "everything" in Spanish. So, "Select * from todos" means "select everything from everything". Terribly inefficient but it'll be great fun writing the code to sort out what we want from the results 😜 - a grizzled vet.
@CraftDownloads
@CraftDownloads Жыл бұрын
I get that, also works in portuguese 😂
@GuilhermeMG
@GuilhermeMG Жыл бұрын
With count(1) you get the performance boost without all the confusion
@PlanetScale
@PlanetScale Жыл бұрын
Yup! Counting with a constant is totally a viable option. See 06:09.
@CottidaeSEA
@CottidaeSEA Жыл бұрын
It's what I always do, because I feel it is the most expressive.
@fano72
@fano72 Жыл бұрын
I also like to do that. No data is needed from the rows to count them.
@TehKarmalizer
@TehKarmalizer 11 ай бұрын
This is what I've been doing for ages. Using count(*) at least has been slower in some older or obscure database engines. And I've worked with several.
@pukkimi
@pukkimi Жыл бұрын
I might call myself a somewhat senior programmer. Sometimes query optimizers did not realize to use a clustered or whatever else indexes of a table when using count (*). This happened at least on Oracle 8 and the workaround was to use count ([indexed column]) where [indexed column] = something. Count (*) caused full table scan in some cases, at least if table contained lobs. So there might really be a reason why some grey beards warn on count(*). When in doubt, check the execution plan.
@TheGreatAtario
@TheGreatAtario Жыл бұрын
I think the real takeaway here is that Oracle sucks
@pukkimi
@pukkimi Жыл бұрын
@@TheGreatAtario Sir you are most certainly correct on Oracle, but there are same kind of stupid behaviors in almost every database as far as I know. Not this fault but many more and different. The real takeaway is that always check the execution plan :)
@alexU42k
@alexU42k 11 ай бұрын
It is always a challenge to figure out if something was done on purpose or by lack of knowledge or any other reason
@DmitriyYankin
@DmitriyYankin 11 ай бұрын
Why count should use index?
@alexU42k
@alexU42k 11 ай бұрын
@@DmitriyYankinit is up to query optimizer to use index or not, but in general it uses cheaper solution (less IO operations)
@user-yb6rd1fm5e
@user-yb6rd1fm5e 11 ай бұрын
A couple of corrections: The COUNT() instruction can receive as parameter an EXPRESSION or a wildcard, that is to say, you could write: COUNT(*) or COUNT(1), COUNT(pepito), COUNT(id) or COUNT(99999) which will give you the same, the inside is considered a wildcard, the "*" is used as a wildcard by convention , but we could use any character because by definition, it doesn't use information about any particular column (and including for the COUNT the rows that contains NULL values in any column). In the case that you comment the COUNT(id) and the COUNT(*) bring the same result because the "id" is declared as if it was a wildcard so the behavior is the same and the server takes the license to optimize the process as you have explained in the video. But, if you really wanted to count the values of a field, the correct way would be to specify COUNT(ALL id) and this expression does have a difference with respect to the COUNT(id), and it is because it will only consider for the count the NON NULL values inside that field In the case of the example of the video COUNT(id) and COUNT(ALL id) should return the same result, since the "id" field, being a primary key, would never be empty, but the difference would be that you would force the server to use the index of the primary key to execute the COUNT(ALL id). Finally, while it is true that the server often saves us from ourselves, it is not exactly true that it always makes the best decisions, as a DBA with over 10 years of experience I have found myself in several situations where after checking the execution plan I realize that the server is taking a not so optimal index for the instruction that has been requested and you have to address it to use the correct index for some instruction, this is seen quite often in big data querys.
@myonlynick
@myonlynick 11 ай бұрын
i partially disagree with your second paragraph. I Put it to the test. I Downloaded the open source 'world database' for mysql. I ran 2 queries. select count(*) from country; which gives a reply: 239. The second query is: select count(IndepYear) from country; which gives a reply: 192. Indepyear is not a primary key and has several NULL values. IF you are wondering: select count(ALL IndepYear); returns the value 192 as well. Hence, in mysql 'ALL' is optional.
@LeeKao
@LeeKao 11 ай бұрын
I felt my brain growing as I was reading your comment
@DmitriyYankin
@DmitriyYankin 11 ай бұрын
Why someone liking this absolutely wrong answer? count() depends on particular columns. count(id) will count only NON NULL ones. COUNT(id) and COUNT(ALL id) are absolutely the same exepression as count(id) is implicitely ALL.
@user-yb6rd1fm5e
@user-yb6rd1fm5e 11 ай бұрын
@@DmitriyYankin read the damn documentation before you said something. Also, obviously some databases work a bit different from what I said, if you use other SQL database read you own documentation 🙄.
@DmitriyYankin
@DmitriyYankin 11 ай бұрын
@@user-yb6rd1fm5e yep, read it yourself. dev mysql com: "COUNT(expr) Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement." ... "COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values." ...
@PaulSebastianM
@PaulSebastianM Жыл бұрын
primary keys are clustered, aligned to disk clusters, physically, so counting them means traversing the disk to gather the count, and if the order on disk is not adjacent, then the index is fragmented so counting can take a lot of time, while non primary keys or indexes are no clustered, meaning they don't need to follow the disk physical alignment so they are most often stored off-table in a much more compact data structure, which even when it gets fragmented, the data is still going to be close to each other because all the data structure holds is index records, not every row in the table, like what each clustered index follows.
@aarondfrancis
@aarondfrancis Жыл бұрын
Correct!
@jvapr27
@jvapr27 Жыл бұрын
Very cool to know. FYI: I think for some databases this is not true though. Clustering can be set differently than primary keys. For db2, for example primary indexes are not by default clustered. For databases like snowflake, they do not index the primary key. Each DB may be different. Still very cool. Thanks!
@PaulSebastianM
@PaulSebastianM Жыл бұрын
@@jvapr27 correct, you can have one single index that can be clustered because that orders the records of that table physically. But that index doesn't have to be the primary key though some dbs might enforce that.
@debasishraychawdhuri
@debasishraychawdhuri Жыл бұрын
count(*) and count (id) are semantically equivalent, the database should not behave differently for those queries.
@PaulSebastianM
@PaulSebastianM Жыл бұрын
@@debasishraychawdhuri well, no. One is explicit, the other is implicit. Implicit means suggested but not expressly stated. Thus they are completely different. That is logical deduction.
@lostcarpark
@lostcarpark Жыл бұрын
Thanks for this. I've had this argument so many times. I think some early SQL engines did look at all columns for count(*), but I believe pretty much all of them have this optimization at this point.
@user-ps7zt3vm9q
@user-ps7zt3vm9q 11 ай бұрын
I believe it is a key point. Some early SQL engines did not have that optiomization, now they have it. To make a coclusion with a smart look now, is not exactly right. So man should make it automatically and even do not spend time, always use `count(1)` and not to worry about the peformance or search for confirmations. That solution works everywhere.
@ragsChannel
@ragsChannel Жыл бұрын
A nice gotcha indeed! One question : this "optimization" -- is it applicable ONLY to MySQL or is also the case with say, Postgresql ??
@romanstingler435
@romanstingler435 Жыл бұрын
in postgres the star is not necessarily the fastest, it also depends if an index is used or just a scan is used, due to the amount of data in the table, and if auto vacuum was successful recently.
@Keelyn1984
@Keelyn1984 11 ай бұрын
It applies also to oracle. Count(*) counts the rows not the data. But keep in mind that inline-views or subselect still have to fetch data for the sql to even work. Using a constant instead of * is also a common (most offen 1 is used) viable alternative.
@artemisamberdrive583
@artemisamberdrive583 Жыл бұрын
for specific purposes (e.g. extremely large table, statistics, etc) i set up a count-table with a single row and column, holding the information of the count of rows of the "parent table". this requires setting up triggers on the parent table insert+delete procedures to increment+decrement the value of the count-table. keep in mind that this setup slows down the process of writing data, but data is usually read many times in contrast to written.
@GuruEvi
@GuruEvi Жыл бұрын
That seems extremely 'hacky' and you end up doing the same thing an auto_increment lock does (with more hoops) and if your system gets busy or needs to scale you basically lose any concurrency. Also makes your application a whole lot less portable and you could make a mistake (eg. most examples I've seen online, do not take into account that an INSERT or DELETE can target multiple rows, but the trigger only gets called once, so now you need looping logic or you have a bug). Not sure if you actually "need" a count if you're working with that many records, but most database engines can provide an estimate or perhaps, you may be able to use a different database system altogether that is better optimized for providing statistical information.
@SXsoft99
@SXsoft99 Жыл бұрын
yes and at the same time you need to keep on separate columns all the conditions combinations for filtering, also good luck finding programmers that actually use things like triggers since it hides the application business logic in the database
@christianbarnay2499
@christianbarnay2499 10 ай бұрын
I don't know for MySQL but most DB engines already do that on their own. They have a "table information" table that contains all the metadata of each table, including the row count. select count(anything not nullable) without a where clause will automatically trigger a lookup in that table to get the current count of rows of the selected table.
@SR-ti6jj
@SR-ti6jj Жыл бұрын
Does this mean adding a non-null secondary index will improve count performance on tables that don't already have one?
@parkamark
@parkamark Жыл бұрын
You could create a secondary index on the same column(s) as the primary index. That would then speed up counting, unless the DB engine is doing some clever stuff under the hood that means you don't necessarily have to do that. But given what he's said in this video, creating a secondary index on the same column(s) as the primary is certainly a good workaround. Maybe someone who knows more than I do could clarify this point.
@parkamark
@parkamark Жыл бұрын
To answer your question directly, having ANY index on a table is way better than none at all, both in the case of searching and counting. So having a simple non-null non-unique index would be the minimal requirement for fast counting. It also matters if the columns are fixed or variable length datatypes, eg. int is fixed length, varchar is variable. If all columns are fixed, then the database can also do a fast count without any indices because it knows that the length of each row is fixed, and it knows the full size of the entire table, thus what the number of rows must be.
@TravisTennies
@TravisTennies Жыл бұрын
Just subscribed. Hard to find people who actually talk real facts these days.
@harrytsang1501
@harrytsang1501 Жыл бұрын
Yes, very often the problem you are trying to solve is more generic, and can be expressed in more generic terms. Using language features and trusting that smarter people have put more effort in optimizing the language itself is often more optimised than what we can come up with ourselves. One classic C trick was to not use multiply/divide and instead add or subtract bitshifted values. However, in modern systems that no longer take dozens of clock cycles to multiply, the compiler knows better and will just replace your whole expression with a multiply.
@Demonslay335
@Demonslay335 Жыл бұрын
The compiler may even optimize your division into multiplication or bit shifts, and all other kinds of fun wizardry. It really is more important to have readable code in many cases nowadays.
@CottidaeSEA
@CottidaeSEA Жыл бұрын
@@Demonslay335 In some cases it doesn't do what you want it to do; but only then is it worth looking into optimization. We should always be aware of the performance impact our code has, but there's no need to go crazy about optimization before you even have any performance information to work with.
@hwstar9416
@hwstar9416 8 ай бұрын
usually compiler isn't as smart as you think. It can do simple optimizations like the one you mentioned, but anything slightly more complex it fails at.
@harrytsang1501
@harrytsang1501 8 ай бұрын
@@hwstar9416 That is because you are using more dynamic languages. With more strict rules for memory safety and type setting, newer languages like Rust and Zig are doing wonders. It doesn't save you from algorithm problems with Big O of n cube tho
@hwstar9416
@hwstar9416 8 ай бұрын
​@@harrytsang1501 I don't use dynamically typed langs, I use C/C++. People often overestimate how optimizing the compiler is, it's not as impressive as you think
@airbornesnail
@airbornesnail Жыл бұрын
"You'll have won the argument which what holidays are all about" - best sentence I've ever heard. :D
@TheyCalledMeT
@TheyCalledMeT 11 ай бұрын
which also means it needs a secondary non null index to function the way you indicated. redo the trial without such an index to see what it does
@Metruzanca
@Metruzanca Жыл бұрын
This is the type of content I didn't know I wanted. More please.
@anothermouth7077
@anothermouth7077 Жыл бұрын
Two thumbs up to you. My TL was at times bugging me on some reviews. Even when I showed him SQL docs saying that most optimisation is already being done by the MySQL there's no need to over engineer the solution.
@Ceelbc
@Ceelbc Жыл бұрын
With a proper SQL implementation, this should not matter; the compiler should handle this.
@Ceelbc
@Ceelbc 11 ай бұрын
@@lawrencechiasson975 *which is part of the compiler.
@Im_Ninooo
@Im_Ninooo Жыл бұрын
I've been using COUNT(1) for a while now. I wonder if the behavior is the same in other databases such as CockroachDB
@jameseze9469
@jameseze9469 Жыл бұрын
So why didn't they just make the Count function parameterless. Select Count() from customers
@AirelleInfo
@AirelleInfo Жыл бұрын
Because there are good reasons to COUNT(): on a nullable column, COUNT() removes the NULL values and counts only valued rows for that column; or you may need to COUNT(DISTINCT ) to count non-NULL unique values.
@jameseze9469
@jameseze9469 Жыл бұрын
@@AirelleInfo ah shit!, I forgot the count(distinct ).
@violin245
@violin245 8 ай бұрын
How is someone talking about SQL this charming
@nskeip
@nskeip Жыл бұрын
I should create a database engine where COUNT(*) is "multiply number of columns by the number of rows", COUNT(/) is "divide rows by columns", COUNT(+) is ... (you get the idea)
@jeffmccloud905
@jeffmccloud905 Жыл бұрын
actually, WAY back in the day in Oracle (1990s), it was recommended to use SELECT COUNT(1) and not COUNT(*) because it actually did make a difference. but they fixed that. but some grizzled old devs kept that convention.
@edbutler3
@edbutler3 Жыл бұрын
Yeah, I've been using count(1) on Oracle for decades. I've suspected for a while that recent versions have a smart enough query optimizer to do the right thing with count(*), but I haven't taken the time to verify. And for MS SQL Server, the "culture" has always been to use count(*), so I've assumed it's ok there.
@PanduPoluan
@PanduPoluan Жыл бұрын
I always use COUNT(1). Because COUNT(*) depends totally on the engine's optimisation.
@freibuis
@freibuis Жыл бұрын
This was not always the case though. Count(*) was always slower. Not by a little.. by a lot. That saying i havent used my mysql in 10 years. But with my experience with postgres, it didnt matter
@spacemanmat
@spacemanmat 10 ай бұрын
Always nice to see how the optimiser is working under the covers. I’ve seen a few cases where the original program had done something dumb but the optimiser picked up the issue an optimised the issue away. Still makes me uneasy relying on it though.
@andrewkamoha4666
@andrewkamoha4666 Жыл бұрын
6:15 "Over Thanksgiving, when you're arguing with your family about whether count(*) or count(id) is faster" I'm sure grandma will love to talk about that ...
@aarondfrancis
@aarondfrancis Жыл бұрын
My dad was actually a DBA, so there's a non-zero chance it will come up
@odysseus655
@odysseus655 Жыл бұрын
I stopped using count(*) decades ago when I ran into a problem with our database engine at the time where there was some catalog corruption and this was erroring out with "column not found error". Lately I've been using count(1). Likely not a great reason to continue not using it (and I'm expecting the execution plan to be the same in any case).
@kentlarsson1263
@kentlarsson1263 8 ай бұрын
Great stuff! Love that you mix in a bit of fun with the content, it's what got me to subscribe!
@michelprovencher8518
@michelprovencher8518 11 ай бұрын
Like the EXISTS function where the "SELECT * FROM ..." is just a predicate for the function to work and only the WHERE clause is meaningful
@spaceemotion1
@spaceemotion1 Жыл бұрын
In a client project a few months back, we had to remove pagination from an admin dashboard, because COUNT(*) to get the no. of pages was timing out. even after hours of letting it run. The table we tried to scan was over 5TB in size. How would you count these kinds of tables in MySQL? (we couldn't easily migrate to something like Postgres for other reasons).
@PlanetScale
@PlanetScale Жыл бұрын
In those scenarios, you'd probably be better off using something like cursor (or keyset) pagination. Check out this post for more info on that! planetscale.com/blog/mysql-pagination
@spaceemotion1
@spaceemotion1 Жыл бұрын
@@PlanetScale that's what we ended up with, but we still had no clue how many records there actually are 😅 (for internal analytics)
@aarondfrancis
@aarondfrancis Жыл бұрын
@@spaceemotion1 This is totally the kind of thing PlanetScale Boost can solve for! kzfaq.info/get/bejne/qbCGrdlzsarVoas.html
@reikooters
@reikooters Жыл бұрын
If you're happy with an approximate number, in MSSQL we use: select rows from sys.partitions where object_id = Object_Id('my_table'); Based on a google search, you can do the same in MySQL: select table_rows from information_schema.tables where table_name = "my_table"; Or alternatively: select num_rows from information_schema.innodb_sys_tablestats.name = "my_table"; This uses a pre-calculated value and doesn't actually touch the table itself, so you get an answer immediately, although the value is not constantly updated every insert, hence approximate. I was actually surprised not to see this approach mentioned in the video, but I guess it was just a "myth buster" video - even though I swear I saw the thumbnail said "fastest way to count". You could also keep your own separate table where you keep accurate counts every time you do an insert, and use that in your application when you want to show the total number of rows.
@SimonBuchanNz
@SimonBuchanNz Жыл бұрын
I'd be interested to see if count(1) worked?
@Neakas
@Neakas 10 ай бұрын
This is also true in MS-SQL. it will use the most narrow Non Clustered Index on a Table. If there is non, the Tables clustered Index has to be checked, which is slow. but i think in MS-SQL you can use sys.sysindexes to look up the rowcount even faster
@adamtak3128
@adamtak3128 Жыл бұрын
Please make more education SQL content. This was fantastic.
@CodeKujo
@CodeKujo 11 ай бұрын
The advice to avoid count(*) predates mysql. It may have even been true in mysql at some point or some obscure schema. As you said, count(*) relies on the optimizer to do the right thing. I use count(0) myself, but I wouldn't be surprised if sometimes you have to be more specific to get the right query plan. I think the biggest lesson in this video is not to rely on advice, but to check the plan and know that counting an index can be faster--which is great advice!
@ivanskyttejrgensen7464
@ivanskyttejrgensen7464 11 ай бұрын
When I got a job that involved Oracle 7 in 1999 the DBA told me to use count(*) because the old hack with count(1) wasn't needed anymore. So it was presumably true at some point in the 90s.
@davidlean8674
@davidlean8674 11 ай бұрын
I was teaching performance tuning on behalf of a database vendor in 1989. COUNT(*) was the recommended approach for SQL Server (both Mifcosoft & Sybase), Oracle, DB2, & Ingress. So yes it predates mySQL. The syntax alternative was to specify a column name, But that was only if you wanted to find the count of non-null fields in that column or expression. Count (constant) was never necessary in any platform I've used. Yet lots of people suggested it. Most had minimal clue about DB internals or query optimisations.
@ABaumstumpf
@ABaumstumpf 10 ай бұрын
@@ivanskyttejrgensen7464 I have used it on oracle6 so at least even back then the advice to use count(1) was already outdated. Likely it was something for pre-ansi sql.
@christianbarnay2499
@christianbarnay2499 10 ай бұрын
But all queries rely on the optimizer. The optimizer is the core of the querying engine. As long as you don't mess with the execution plan by forcing a path through hints, the optimizer will always kick in and do its job.
@BlairdBlaird
@BlairdBlaird 10 ай бұрын
@@christianbarnay2499 a big difference is that count(*) semantics are defined by the SQL standard itself, so its optimisation is a lot more likely than count(constant) being recognised as equivalent... to count(*).
@aaronmeder
@aaronmeder 5 ай бұрын
Love it! Thanks guys for sharing
@XaviSanz35
@XaviSanz35 Жыл бұрын
if you remove secondary indexes, and run count again, how long it takes?
@qkktech
@qkktech 10 ай бұрын
count star is asking recordcount and that is table property. When select * is used in columnar database then it is not optimal since in case columnar databases columns are retrieved separate and then records are put together. In case of columnar databases for null allowed columns also distinct count is also column property sometimes also min and max etc. Vertica has for example no indexes at all.
@RockTheCage55
@RockTheCage55 Жыл бұрын
Granted it "might" be based on the sql engine: Oracle, Sql Server, Postgres, ect.... i usually do `select count(1) from MyTable;`. Whether that is faster or not i don't know & it might depend on the engine again.
@PlanetScale
@PlanetScale Жыл бұрын
Right. We are specifically talking about MySQL here.
@stLmpp
@stLmpp Жыл бұрын
I also use count(1), don't know if it's slow or faster than count(*)
@HoSza1
@HoSza1 Жыл бұрын
Why do you write "ect"? Is it a typo?
@brentfoo3938
@brentfoo3938 Жыл бұрын
@@HoSza1 no that means i didn't list every database engine out there. There are many more :).
@jige1225
@jige1225 Жыл бұрын
@@brentfoo3938 then 'etc.' (and just one '.')
@lq_yt
@lq_yt 11 ай бұрын
on mysql
@IARRCSim
@IARRCSim Жыл бұрын
If you count a primary key, why wouldn't the SQL vendor recognize the intent and treat it the same as * anyway? Can count(id) where id is the primary key ever lead to a different result than count(*)? I wish he tested with multiple SQL vendors(MySQL, Oracle, MS SQL, PostgreSQL, SQLite), with * and with id to see if all or just some are optimized this way. I wish he also timed some of them in milliseconds too to see how important the difference is. Not all SQL vendors optimize the same ways so I'd be surprised if this advice is applicable to every SQL vendor that exists. If I submitted count(*) in a pull request and the reviewer didn't like it, he might not be convinced by this youtube video because it doesn't go into any depth on evidence with different SQL vendors or actually timing anything. It would be a lot more efficient in man hours if a video like this included enough evidence and detail than if the masses who tried to follow the advice did all this testing independently on their own.
@GhiveciuMarian
@GhiveciuMarian 8 ай бұрын
This make sense when you select all rows from table, but throw in there any WHERE clause, or any filtering then the advantage might evaporate. The hardest thing i found was to display results on filtering ... In this for example might you want to show how many todos are done from total. This specific table does not have the field 'done' but if its having done, and was not specified in a key it will result in a table scan.
@devhaua
@devhaua 10 ай бұрын
One of the best videos I haveever watched on SQL, tq
@PlanetScale
@PlanetScale 10 ай бұрын
Thank you! Love hearing that
@quintennn
@quintennn 7 ай бұрын
"Tell your family on thanksgiving" It would take me about 5 winters to explain this to my family.
@Petoj87
@Petoj87 10 ай бұрын
Would be interesting if the same is true for other sql engines like litesql, postgres, sql server and postgres
@cangurcan99
@cangurcan99 11 ай бұрын
Wow, using mysql for decades and never knew this. Thanks man.
@jannickbreunis
@jannickbreunis Жыл бұрын
“When you’re arguing with you family…” haha nice one.
@anasouardini
@anasouardini Жыл бұрын
Why don't DBs store a count variable and save the day? Why don't they count ROWS by the rows themselves` instead of ROWS BY COLUMNS?
@jameslucas5590
@jameslucas5590 11 ай бұрын
I'm a vet and will use *. I don't care if it bothers anyone, because I want to just move on. However, I Love this video and the knowledge you share.
@jacobstamm
@jacobstamm Жыл бұрын
Doesn’t the compiler optimization demonstrated at 5:25 mean the recommended usage of count(*) makes no difference in practice?
@aarondfrancis
@aarondfrancis Жыл бұрын
People use COUNT(id) because they think they are being helpful, when in reality they are not. If they are using COUNT(id) to mean COUNT(*), it's better to make that intent clear. Thanks for the comment!
@TheDragShot
@TheDragShot 11 ай бұрын
- *Newbies:* Using COUNT(*) - *Dinosaurs:* Using COUNT(ID_COLUMN) - *Me and my two neurons:* Using COUNT(1)
@Peter-Ja
@Peter-Ja 10 ай бұрын
Looking forward for the next argument with my family about the performance of the SQL count operation. Everyone will be so excited
@PlanetScale
@PlanetScale 10 ай бұрын
Hopefully you win! "Happy Thanksgiving, y'all don't know anything!" - Peter, probably
@alexpetrov8871
@alexpetrov8871 10 ай бұрын
0:55 >optimized This optimization is specifically MySQL case, in other DB it may not be so and can even depend on DB version. So generally for SQL count(id) is better (if there is suitable index of course).
@mortona42yt
@mortona42yt Жыл бұрын
I knew the db optimizes the query, but didn't know details like this. I would be surprised if after 20+ years of development, it would interpret count(*) as "load everything from the table and count it". What about when you don't have a secondary index, or doing a join query? Probably still counts the returned rows, or some optimization with joined indexes?
@MrGbison
@MrGbison 11 ай бұрын
this tested by me select count(*) is fastest than count(id) .. but only few milliseconds different. The reason already explain here. if you have some complex query and need to count how many the data.. select bla-bla etc just make the query as sub query like this select count(*) c from ( select bla-bla ) tbl1 * found where my older script using select bla2 and count using mysqli_num_rows (auch)
@medilies
@medilies Жыл бұрын
Maaan, I didn't know this channel is posting such content :0 I liked it, subscribed and activated notifications.
@rumisbadforyou9670
@rumisbadforyou9670 Жыл бұрын
I never got into using SQL databases, but having written a few on-disk and over-the-network data structures, I'd expect the count(*) to be smart enough to use some cached "total_length" value, especially considering that a lot of effort went into writing query optimizers. I guess, people would think that because of lack of experience of writing a data store yourself?
@sohn7767
@sohn7767 Жыл бұрын
would be easy enough on a simple table query, in fact count() and indexes probably do do that. However whenever you call a function or a view or whatever script with at least a little logic, the final length is unknown
@brdrnda3805
@brdrnda3805 Жыл бұрын
I never wrote a data store, but worked with relational databases for almost 25 years. Still, assuming COUNT(*) wouldn't be fast and it would process the whole record sounds utterly absurd to me. (And, honestly, I never heard that myth)
@diegocardenas4522
@diegocardenas4522 7 ай бұрын
Best ad ever, keep them coming
@tomasma4896
@tomasma4896 10 ай бұрын
Cools, I am using MySQL for years but never heard of this one. Please more videos like this :)
@alexaneals8194
@alexaneals8194 11 ай бұрын
It's database engine dependent. Oracle, SQL Server, DB2, MySql and etc. all have different implementations. In SQL Server there are faster ways of getting a count then using select count(*) which incidentally puts locks on the table being counted if you are not careful.
@RajveerSingh-vf7pr
@RajveerSingh-vf7pr 11 ай бұрын
That's the thing about Sql, Sql is not like a normal programming language,... It takes the query, analyse it, plans it, then runs it
@CFEF44AB1399978B0011
@CFEF44AB1399978B0011 10 ай бұрын
It seems to me that the point of a declarative language is that you declare what you need, or tell the computer what you need. The computer figures out the best way to get it for you. Telling the computer how it should count seems to be defeating the purpose of having a declarative language where the computer will figure out the best way to do what you requested.
@ahmad-murery
@ahmad-murery Жыл бұрын
I know about count(*) but I didn't know how the Optimizer decides about what index to use. One learn new things everyday, Thanks Aaron!
@marioomarti815
@marioomarti815 7 ай бұрын
7. Crea un rol que permita conectarse a la base de datos y hacer SELECT sobre las tablas creadas anteriormente. CREATE ROLE ROL_USU; GRANT CREATE SISSION TO ROL_USU; GRANT SELECT ON TABLA1 TO ROL_USU; GRANT SELECT ON TABLA2 TO ROL_USU; GRANT SELECT ON TABLA3 TO ROL_USU
@marioomarti815
@marioomarti815 7 ай бұрын
8. Conceder el rol creado a USU2 y USU3. GRANT ROL_USU TO USU2, USU3 9. Conceder a USU4 privilegios sobre algunas tablas con la opción de poder concedérselos también a otros usuarios. GRANT INSERT, DELETE ON TABLA1 TO USU4 WITH GRANT OPTION; GRANT INSERT, DELETE ON TABLA2 TO USU4 WITH GRANT OPTION;
@TeslaAndTech2024
@TeslaAndTech2024 Жыл бұрын
I felt like the video was not finished. I didn't see you run any thing. You just said it was faster, but I wanted to see the numbers.
@mariomario4676
@mariomario4676 9 ай бұрын
thanks for the educational content
@x364
@x364 11 ай бұрын
Very nice! Thank you!
@michaellemon8287
@michaellemon8287 10 ай бұрын
I think this video is a bit misleading. The part about count(*) vs count(id) is good and explains well why they are equivalent, also the explanation about the index used by the database is informative. However, saying that count(*) is fast is not accurate - depending on circumstances it can be fast or it can be very slow. Usually, when tables grow into millions of rows SELECT count(*) becomes problematic. Sometimes mysql optimizer is not smart and will use a wrong index - for example, I have a table with 750 000 rows and for SELECT count(*) mysql chooses a secondary nullable index even though a primary index exists! I don't know how a nullable index can be used for count(*) but certainly not efficiently because the query takes 2 seconds, which is quite slow. A table without a primary index of 2 million rows can take as much as 2 minutes to count all rows! So overall, the statement that SELECT count(*) is fast is what I find misleading in this video. This method of counting should not be encouraged unconditionally, actually people should be aware of performance problems they may come across. So while a myth, not entirely a myth!
@PlanetScale
@PlanetScale 10 ай бұрын
Thanks for the reply! In fact, a secondary index is what the optimizer chooses precisely because it is the fastest. The docs explicitly say this! The reason a secondary index is faster is because it is more compact. In MySQL the primary index is the clustered index, meaning it contains the entirety of the table's data. While count(*) may not always be fast, it is the fastest way to count the rows in a table.
@uplink-on-yt
@uplink-on-yt Жыл бұрын
It's actually written in the manual. You don't have to prove it. It surprised me when I read it.
@adamzaczek6342
@adamzaczek6342 Жыл бұрын
Holy count, I just found an awesome channel to subscribe to. Love the humor at the end!
@aarondfrancis
@aarondfrancis Жыл бұрын
Holy count 😂
@notoriouslycuriouswombat
@notoriouslycuriouswombat Жыл бұрын
maybe i am in the wrong circles but i've never worked with anyone who said don't use count(*) weird, did you just make this up? or do people actually say this?
@PlanetScale
@PlanetScale Жыл бұрын
Sounds like you're in the right circles, tbh
@therealcomment5622
@therealcomment5622 8 ай бұрын
I can't wait to go to the next thanksgiving.
@smeedee
@smeedee 10 ай бұрын
You could also use the „rows“ from the explain query. In some use cases this is already good enough ^^
@mutatedllama
@mutatedllama Жыл бұрын
What a great video. Earned a subscribe. Looking forward to more!
@djenning90
@djenning90 Жыл бұрын
I love your style!
@aarondfrancis
@aarondfrancis Жыл бұрын
Thank you!
@CirTap
@CirTap Жыл бұрын
Thank you for saving everyone's Thanksgiving! 😂
@jayakarthikreddy
@jayakarthikreddy 8 ай бұрын
I verified the explain plan of count(*) on a table, optimiser even picked a secondary index on a null column.
@Kaspricci
@Kaspricci 11 ай бұрын
When I was still coding DB scripts “select count(1) from table” was considered fastest.
@wadecodez
@wadecodez Жыл бұрын
Your thanksgiving conversations sound interesting
@rickdg
@rickdg Жыл бұрын
Me using count(1) like a caveman.
@dennydharmawan4820
@dennydharmawan4820 Жыл бұрын
if count(id) and count(*) uses due-date in the end, then does it matter what you use?
@heiko3169
@heiko3169 Жыл бұрын
Actually he explained more why using count(*) is not bad at all. :) Although it would be interesting what would happen if there is no other (secondary) index at all on the table
@jacobstamm
@jacobstamm Жыл бұрын
I’m wondering this as well. Seems like the intelligence of the compiler makes the point somewhat moot. Then again, it’s good to know that I should always have an unclustered non-null index on my tables for this use case.
@heiko3169
@heiko3169 Жыл бұрын
@@jacobstamm You dont need to. :) I investigated that more for myself. It simply doesnt matter whether there is another index or not. The COUNT() function is specialized right for that purpose, and using it with * is just fine and will be fast anyway.
@martymoo
@martymoo 11 ай бұрын
I didn't know this. Thanks! Has it always been this way?
@Neptun084
@Neptun084 11 ай бұрын
Ok thanks, I'll introduce this to my entire family, specially to my gran grands
@arithex
@arithex 11 ай бұрын
Is 'select count' still always going to be a table-scan query? Or are there internal optimizations, that maintain a count of active rows in a table.. maybe an in-memory cache that's updated atomically with insert/delete operations?
@paulthomas2577
@paulthomas2577 11 ай бұрын
This changes by database server. This is true for MySQL but SELECT count(*) is much slower in SQL Server. In SQL Server the way I learned to to do it was SELECT count(1)
@Wangaruro
@Wangaruro Жыл бұрын
I actually learned something new today! Thanks!
@sathishkannan6600
@sathishkannan6600 Жыл бұрын
What software are you using to see which column is read? Is there an extension for this postgres?
@BobChao87
@BobChao87 Жыл бұрын
This isn't a special software at all! The EXPLAIN statement is part of MySQL itself and actually just returns a table with rows as seen in the video. PostgreSQL has a similar EXPLAIN statement, where you literally just put whatever you have as your and slap EXPLAIN in front, so "EXPLAIN " just like in the video will work similarly for you as well! Most modern database systems should have EXPLAIN or something like it, and you can likely find it by searching for " EXPLAIN statement".
@sathishkannan6600
@sathishkannan6600 Жыл бұрын
@@BobChao87 Thank you so much. New to database world
@willi1978
@willi1978 Жыл бұрын
All other dbs only return one field with count(*). Have never seen a table when running that query
@xavier.xiques
@xavier.xiques 11 ай бұрын
Good video, thanks
@adriancs6455
@adriancs6455 9 ай бұрын
thanks for this info
@ZanarkandStarplayer
@ZanarkandStarplayer Жыл бұрын
Now I'm ready for the holidays 💪🏽
@PlanetScale
@PlanetScale Жыл бұрын
Go get em!
@alexcoding99
@alexcoding99 Жыл бұрын
very informative!
@LV4EVR
@LV4EVR 11 ай бұрын
Thanksgiving comment totally, 100% EARNED the like. Cheers.
@PlanetScale
@PlanetScale 11 ай бұрын
Haha thank you! 🫡
@HishamElsayad
@HishamElsayad 11 ай бұрын
Thanks so much for providing this great information.
@JuanLuisEcheverria
@JuanLuisEcheverria Жыл бұрын
Hey dude, your way to explain this topic is very well !! Congrats
@PlanetScale
@PlanetScale Жыл бұрын
Hey, thanks!
@Zach2825
@Zach2825 Жыл бұрын
Wow, thank you!
@nicolasguillenc
@nicolasguillenc Жыл бұрын
you are amazing at explaining things man
@aarondfrancis
@aarondfrancis Жыл бұрын
Thank you!
@rolandtennapel5058
@rolandtennapel5058 11 ай бұрын
Awesome, just 1 sidenote; If, like you said at the start of the video, you want to get the row count of a table, just ask the table instead of counting it... 😅
You don't need NoSQL (use MySQL)
31:26
PlanetScale
Рет қаралды 74 М.
Pagination in MySQL - offset vs. cursor
13:20
PlanetScale
Рет қаралды 56 М.
UNO!
00:18
БРУНО
Рет қаралды 4,7 МЛН
Чёрная ДЫРА 🕳️ | WICSUR #shorts
00:49
Бискас
Рет қаралды 4,1 МЛН
Look at two different videos 😁 @karina-kola
00:11
Andrey Grechka
Рет қаралды 14 МЛН
How Senior Programmers ACTUALLY Write Code
13:37
Thriving Technologist
Рет қаралды 1,5 МЛН
From Idea To Robust Database Design
16:21
Ljupche Vasilev
Рет қаралды 8 М.
Naming Things in Code
7:25
CodeAesthetic
Рет қаралды 2,1 МЛН
Storing money in MySQL (the right way)
7:52
PlanetScale
Рет қаралды 41 М.
Easy database indexing strategies
36:03
PlanetScale
Рет қаралды 24 М.
Don't hide your database indexes!
11:35
PlanetScale
Рет қаралды 12 М.
Why You Shouldn't Nest Your Code
8:30
CodeAesthetic
Рет қаралды 2,7 МЛН
Faster database indexes (straight from the docs)
13:28
PlanetScale
Рет қаралды 126 М.
Cursor Pagination is the FASTEST - But you can't use it if...
13:14
Milan Jovanović
Рет қаралды 17 М.
UNO!
00:18
БРУНО
Рет қаралды 4,7 МЛН