Doubling the speed of my game's graphics [Voxel Devlog #18]

  Рет қаралды 20,031

Douglas

Douglas

Күн бұрын

Online demo: github.com/DouglasDwyer/octo-...
Additional voxel models: drive.google.com/drive/folder...
With some clever tricks, I've managed to halve the frame times in my voxel game engine! Join me as I explain three essential optimizations for voxel rendering. These optimizations include using DDA for traversal, bitwise masking to filter out potential intersections, and performing a low-resolution depth prepass. In addition, I talk about ray marching in general and discuss the other new features that I added to my engine this month.
Music used in the video:
Chris Doerksen - Perfect Parry
Corbyn Kites - Orbit
White Bat Audio - Athena
White Bat Audio - Chroma
Chris Doerksen - New Groove

Пікірлер: 163
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Want to learn how to write powerful, high-performance code? Then be sure to check out CodeCrafters using the link below, and help support my channel: app.codecrafters.io/join?via=DouglasDwyer They have one project which is completely free to complete during their beta, and you can begin any of their projects for free! Get 40% off if you upgrade to a paid account within three days.
@Kingofthecrazy
@Kingofthecrazy 3 күн бұрын
douglas i've been watching your game dev videos for awhile now and since you don't have an email or a discord i'll ask here i want to work with you on a game i have planned if you are curious lemme know afterall you where to find me, just reply to this message
@dleiferives
@dleiferives 2 ай бұрын
It’s amazing how much you’ve grown as a programmer. Keep up the good work
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thanks! I intend to do so!
@GabeRundlett
@GabeRundlett 2 ай бұрын
Great video as always! A couple of suggestions: Regarding using DDA on multiple levels aka a "hierarchy"- HDDA, Ken Museth, implementation in OpenVDB. The basic idea is that the DDA process is pretty much restarted when the hierarchy level changes. An alternative (if you had a shallow tree) would be to have a stack of the DDA traversal state, one copy for each level. This alternative may be too register heavy (and might benefit from the stack being in shared mem instead). Regarding the voxel memory layout, you show that first comes the 64-bit bitmask and then the data segment. This means that when you fetch a brick, you always load 8 bytes of bitmask data, as well as an additional 120 bytes of data, to fill the cache line. This means that 93% of your cache during traversal will be filled with these data segments!!! try to pack at least 16 bricks together so that you fill all 128 bytes of the aligned cache line with bitmasks!!
@andreirost4696
@andreirost4696 2 ай бұрын
Eyooo hello
@oamnog
@oamnog 2 ай бұрын
Ohh I get it, so that the cache isn't wasted on data segments that may not even be used as the ray may skip the voxel brick, and also the bitmask data of the surrounding voxel bricks are loaded ahead of time to be used spontaneously, and reducing updating the cache frequently. Determining the neighbouring 15 voxel brick bitmask data to store could be determined using 2x2x4, 1x2x8, 1x4x4 and 1x1x16 bounding boxes that can be offset and rotated based on the ray general direction and position. The ray's position in the bounding box would prob be in the far corner, and the shape of the bounding box (1x1x16, 2x2x4, etc) is determined whether the ray is moving closely axis aligned, or sharply in two axis, etc. You would prob need to take into account the relative hierarchical voxel size that the ray is currently at. But that approach sound complicated, and you could prob just pack 64 bricks' bitmasks together into 4 128 bytes aligned cache line (but idk how cache works so i may be wrong).
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thanks for the suggestions! I'll be sure to check out the OpenVDB implementation. As for cache lines, that's a very good point. I'll have to experiment with alternate data layouts in the future. But there's one nuance that I didn't mention - data segments are variable length. So if a brick has a single non-empty child, then the data segment will only be 4 bytes. But coming up with a way to pack the bitmasks (maybe I could store the bitmasks inline alongside each child pointer) might also help with cache coherency! That said, I don't really think that I'm memory bound right now :)
@pusab2064
@pusab2064 2 ай бұрын
This is some crazy low level thinking i aspire to be like this
@JoeTaber
@JoeTaber 2 ай бұрын
@@DouglasDwyer You can be limited by memory bandwidth separately from being limited by memory capacity. :)
@frozein
@frozein 2 ай бұрын
Great stuff, the bitmasking optimization was really clever. As for you hierarchical traversal question, it is definitely possible to avoid the ray-cube intersection test. The basic idea is at every step, you first descend as far down the tree until you reach a non-empty node. Then, perform your DDA step within that node, and if the ray exits the current tree node, ascend back up the tree until you are in bounds again. You do need to reinitialize DDA whenever you change levels though. I actually just implemented this for octree traversal in my engine and it works great!
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thanks for watching! I do the same thing in my engine that you describe. What you call reinitializing DDA is what I'm referring to as a ray-cube intersection test, since you use the cube and ray position data to determine the side of closest hit :) Edit: speaking of your engine, can't wait to see your next video!
@frozein
@frozein 2 ай бұрын
@@DouglasDwyer Ah I see, makes sense. I avoid the ray-cube intersection by subtracting position within the node and multiplying by 2 when descending. I keep a stack of node positions to ascend. It would be slightly more complicated for a 64-tree like you have though.
@nitaki_
@nitaki_ 2 ай бұрын
This is a great video, but you should refrain from using rotating or moving gradients as a background for your explanations. The youtube compression does not like it at all
@franzusgutlus54
@franzusgutlus54 2 ай бұрын
You are so right!
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Good suggestion. I was curious to see how it would turn out, but perhaps KZfaq's compression butchers it...
@danieles6684
@danieles6684 Ай бұрын
@@DouglasDwyerI actually was going to say I liked it!
@dominobuilder100
@dominobuilder100 Ай бұрын
@@DouglasDwyerI really liked it. Thought that it was a really nice detail, even with compression.
@linksword7110
@linksword7110 2 ай бұрын
holy, i thought you said you were going to reduce quality for more episodes. But this is amazing
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
I am not planning to reduce quality for discrete graphics cards. The lower quality will just be an option for the engine to run on iGPUs
@linksword7110
@linksword7110 2 ай бұрын
i meant on the videos :p
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
@@linksword7110 Oh gotcha haha. I'm glad to hear you thought the quality is still good. This video only took 8 hours total to produce!
@yiwmsh4393
@yiwmsh4393 2 ай бұрын
The mountain top looks so distant to me, down here at the start of the trail, so I'm very thankful you're posting the pictures you took up there.
@NathanNahrung
@NathanNahrung 2 ай бұрын
Would you see further improvements by expanding on on the beam technique? For example, what if your initial low resolution image was based on the displays aspect ratio? For example, a 16:9 ratio screen would generate a 17x10 square grid of rays (rays at screen edges so grid x & y are +1 the screen aspect ratio) for the first low resolution image, the next image which would start rays further from the camera could have 16x more samples (a 68x40 square grid of rays) as this would force to the next brick layer, and then this pattern could continue until the final resolution image which is at the displays ultimate resolution (including any multiple samples per pixel). Aternativly you could determine the initial low resolution by working backwards from the displays ultimate resolution, rounding up fractions of pixles by overdrawing the field of view for the lesser resolution images, and perhaps going all the way down to a 16 pixel image as the initial low res image.
@StiffAftermath
@StiffAftermath Ай бұрын
This project is very exciting to me to see all the persistent progress you are making. I hope you continue to persevere and make the most awesome voxel engine. My game is waiting for an engine such as this. John Lin's engine is amazing, but I dont think he is very open to licensing it out. Thank you for your continued work!! Much appreciated and looking forward to the future! ❤❤
@stormy_vox
@stormy_vox 2 ай бұрын
your optimizations are really cool! i don’t fully understand the bitmask stuff bit it’s neat
@phylliida
@phylliida 2 ай бұрын
Ur videos are so good!! Always excited to see another one
@DanaTheLateBloomingFruitLoop
@DanaTheLateBloomingFruitLoop 2 ай бұрын
Really cool stuff! The second optimization was genius!
@adricklynn8882
@adricklynn8882 2 ай бұрын
Amazing work! Can't wait to see more
@tommycard4569
@tommycard4569 2 ай бұрын
I always look forward to your videos, keep up the great work!
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Will do!
@empireempire3545
@empireempire3545 2 ай бұрын
Okay i'm drunk but hear me out: beam-tree You initially cast a single square 'macro-beam' from the camera and advance it forward as usual until you hit first voxel - this is the root node. The moment you hit the voxel, you split the macro-beam into sub-beams - ideally as few as possible, these are the 1st gen children of the root. The sub beam which hit the intiial voxel gets handled more or less as you do atm with low and high resolution beams. The rest of the camera projection proceeds onwards - but at worst case you still need only 4 macro beams to surround the beam (X) with the voxel which you hit (consider how to divide the area of M's): MMMMM MMXMM MMMMM MMMMM Looking from the side, the macro beams are truncated square-based pyramids, subdividing into smaller truncated square-based pyramids. The idea of course is to reduce the number of beam-steps you have to make. Smaller number of beams means smaller number of steps you would have to calculate - and even if the calculation is slightly more complicated (square vs point) then the reduction of 2 orders of magnitude should more than make up for it.
@stevegta6830
@stevegta6830 2 ай бұрын
I've always loved voxel engine graphics. Great work on this :)
@247flashgames
@247flashgames 2 ай бұрын
You are insane, and I love the quality of your videos & algorithms! You explain your strategies well, and I can’t wait to see more videos! ❤
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thank you :)
@MatejVancoCG
@MatejVancoCG 2 ай бұрын
Really cool. Thanks for such detailed explanation!
@platinumsun4632
@platinumsun4632 2 ай бұрын
Dude I really love the beam optimization. Looks so cool. Like out of an old 2.5d game.
@simongido565
@simongido565 2 ай бұрын
I do something similar in my engine. I call it compressed and normal cells. Compressed cell is the one that contains subgrid of selected size with the same voxels. The first advantage is that instead of storing for example 16x16x16 voxels I store only one that represents all of them if they are the same. Then about raymarching. I raymarch compressed grid as if it was normal voxel grid. If i get inside cell that contains only one voxel which means it is compressed, I just return color of the voxel that cell holds. If I get inside cell which contains 16x16x16 voxels it means it was not compressed, I start new raymarching that starts where compressed cell raymarching ended and I raymarch voxel subgrid which has dimensions 16x16x16, then if not hit happend I just continue raymarching compressed cells again and so on. It increased performance a lot and memory wise it is also great. I am thinking about implementing multiple levels but for now I am okay with just two levels.
@simongido565
@simongido565 2 ай бұрын
Maybe I could draw a picture if you wanted 😆 it might make more sense then.
@sti3167
@sti3167 2 ай бұрын
kind of like palette compression?
@simongido565
@simongido565 2 ай бұрын
@@sti3167 more like grid compression? I do not know what exactly is pallete compression :D. Imagine grid 256x256x256 and each voxel has size 1.0. So to compress it and traverse faster I take some number by which is 256 divisible. Lets say 8. So I create new grid with dimensions 32x32x32 ( 256 / 8 ) and voxel size 1.0 * 8. Now I go through original grid by checking 8x8x8 subgrids. If that subgrid has uniform voxel I copy in top compressed grid just single voxel instead all of them. If they are not uniform I copy all of the voxels. When raymarching I raymarch compressed grid as if it was regular voxel grid. But once I get inside voxel (cell) which has more voxels inside, I switch to traversing that subgrid with dimensions 8x8x8.
@simongido565
@simongido565 2 ай бұрын
So in the end it is grid of smaller grids. Which I create from denser grid.
@sti3167
@sti3167 2 ай бұрын
@@simongido565 Sounds like you are generating and using LoDs/mips for voxels on multilevel grid(or nested grid, sometimes called brick-map) depending on distance/required details. I would love to see some code of how you generate it/store/traverse, but its up to you : ) P.S Afaik palette compression is about compressing data that belongs to voxels instead of storing it in voxels themselves - for example unique colors or normals etc in a chunk of voxels is being written to a lookup-table(or something like that) and then every voxel indexes into this table instead of directly defining it, that way you can minimize memory overheads using less bits, even without doing quantization/lossy compression. This might improve performance too, because that way you can ensure you are not memory bound and there are less cache misses because it can fit into shared memory, cache lines etc
@c0pi
@c0pi 2 ай бұрын
I recognize the place of the demo!! I had a pizza in that place. Nice!
@RyDawgE
@RyDawgE 2 ай бұрын
This is AWESOME!!!! Very inspirational
@HerrDoktorWeberMD
@HerrDoktorWeberMD 2 ай бұрын
Doing god's work, stuff I wish I had the spare time to achieve. I just pray you don't pull a John Lin by dropping the most gorgeous voxel world renderer anyone has ever seen and then vanishing.
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Well, I would love to "pull a John Lin" by dropping the most gorgeous voxel world renderer. But I don't intend to disappear!
@owendavies8227
@owendavies8227 2 ай бұрын
Wow, the beam optimization is absolutely genius.
@CSPciccionsstibilepippons
@CSPciccionsstibilepippons 2 ай бұрын
I tried the native demo today: with my Ryzen 3500u integrated GPU it runs at around 50 ms with default settings, but with 2x downscale it runs under 20 ms even with maximum lod bias and I can't even notice the difference in resolution 🎉🎉🎉
@shadow_blader192
@shadow_blader192 2 ай бұрын
Amazing video
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thank you!
@0memega
@0memega 2 ай бұрын
Before this, I didn't know you could even use DDA for raytracing. As I only used it for raycasting. Amazing video.
@hexadeque1101
@hexadeque1101 2 ай бұрын
Here is my DDA on an octree implementation if it is helpful: I have basically a stack data structure holding the current node I'm looking at and its parents so that I can walk up the tree. The stack is initialized to just hold the root octree node. At the beginning of each DDA step in the loop, the current position (rayOrigin + rayDirection*distance) is guaranteed to be within the current octree node on the top of the stack, but the current node is not guaranteed to be a leaf node (the current position is inside one of its children). while (rayDepth < RENDER_DISTANCE) { Get the bottom most node containing the current position (the current node is now guaranteed to be a leaf node containing the current position) Check if the current node is a filled voxel. If so, the ray hit something Step the ray with DDA (the current position is no longer within the bounds of the current node) Walk up the nodes in the stack until the current position is within bounds of the current node (sets up for the first part of the loop for the next iteration) } Hopefully that made sense. If not, my actual code is in the replies.
@hexadeque1101
@hexadeque1101 2 ай бұрын
struct OctreeNode { uint data; // Basically an enum using the following constants uint[8] children; }; // Different materials are not implemented yet so full and empty are the only two states a voxel can be const uint EMPTY = 0; const uint FULL = 1; const uint PARTIAL = 2; // For parent nodes who have some full (or partial) children and some empty children struct Ray { bool hit; uvec3 position; ivec3 normal; float depth; }; layout(std430, binding = 1) buffer World { OctreeNode world[]; }; ... Ray castRay(vec3 rayOrigin, vec3 rayDirection) { ... // Ray box intersection on the size of the octree in case the ray starts outside the octree is omitted ray.position = uvec3(floor(rayOrigin)); ray.depth = 0; vec3 stepSize = vec3( sqrt(1 + (rayDirection.y*rayDirection.y + rayDirection.z*rayDirection.z) / (rayDirection.x*rayDirection.x)), sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.z*rayDirection.z) / (rayDirection.y*rayDirection.y)), sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.y*rayDirection.y) / (rayDirection.z*rayDirection.z)) ); ivec3 direction = ivec3(sign(rayDirection)); vec3 rayLength = mix(rayOrigin - ray.position, 1 - rayOrigin + ray.position, step(0.0, rayDirection)) * stepSize; // The stack is stored as an array indexed by currentLevel which is incremented every time we descend down the octree // It stores uints which are indices into the world array uint nodes[OCTREE_LEVELS + 1]; uint currentLevel = OCTREE_LEVELS; // Level 0 is an individual voxel nodes[currentLevel] = 0; // Root node is always index 0 uvec3 currentNodePosition = uvec3(0); while (ray.depth < RENDER_DISTANCE) { while (world[nodes[currentLevel]].data == PARTIAL) { currentLevel--; uint voxelWidth = uint(pow(2, currentLevel)); uvec3 center = currentNodePosition + uvec3(voxelWidth); uvec3 subnode = uvec3(greaterThanEqual(ray.position, center)); currentNodePosition += subnode * voxelWidth; nodes[currentLevel] = world[nodes[currentLevel + 1]].children[subnode.x + subnode.y * 2 + subnode.z * 4]; } if (world[nodes[currentLevel]].data == FULL) { ray.hit = true; return ray; } // DDA step + store hit info for if we hit something // Possible optimization: instead of taking one unit long steps, take steps proportional to the size of the current node. i.e. if the current node is empty and 4 voxels wide, step 4x the distance. I could not get that working at the moment, though, as it brings extra complications. :/ if (rayLength.x < rayLength.y && rayLength.x < rayLength.z) { ray.position.x += direction.x; ray.depth = rayLength.x; rayLength.x += stepSize.x; ray.normal = ivec3(1.0, 0.0, 0.0) * -ivec3(sign(rayDirection)); } else if (rayLength.y < rayLength.z) { ray.position.y += direction.y; ray.depth = rayLength.y; rayLength.y += stepSize.y; ray.normal = ivec3(0.0, 1.0, 0.0) * -ivec3(sign(rayDirection)); } else { ray.position.z += direction.z; ray.depth = rayLength.z; rayLength.z += stepSize.z; ray.normal = ivec3(0.0, 0.0, 1.0) * -ivec3(sign(rayDirection)); } // Bounds check if (ray.position.x < 0 || ray.position.y < 0 || ray.position.z < 0 || ray.position.x >= worldSize || ray.position.y >= worldSize || ray.position.z >= worldSize) { return ray; } // I was lazy and went all the way up to the root node every time and left only going up as far as necessary for "later" currentLevel = OCTREE_LEVELS; nodes[currentLevel] = 0; currentNodePosition = uvec3(0); } return ray; }
@WlliamBrown
@WlliamBrown 2 ай бұрын
This looks really awesome! I stumbled across your videos yesterday, and have viewed most of them. I have also been researching voxels with marching cubes, but it destroys precise environment manipulation. Have you considered applying marching cubes only within each of your 8x8x8 voxels?
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thanks for watching! I prefer the look of smooth cubic voxels, so I'm not planning to integrate marching cubes at this time (it would also make ray traversal more complicated)
@djordjepepic1656
@djordjepepic1656 2 ай бұрын
You might want to check out Nvidia's paper 'Raytracing Sparse Voxel Database Structures on the GPU', they outline a hierarchical DDA approach for efficient empty space skipping that you might be able to implement in your engine. You should read the paper for a better explanation, but the basic idea is to do DDA for all levels of the hierarchy at the same time, going down a level on collisions and going up a level when leaving a 'chunk' of the current level.
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Interesting - I'll be sure to check it out :)
@bytesandbikes
@bytesandbikes 2 ай бұрын
Very nice! 😀
@beholdergamedesign
@beholdergamedesign 2 ай бұрын
I kind of wonder if we need another name for ray marching determined by grid boundaries or bounding volume intersections vs by distance fields.
@procedupixel213
@procedupixel213 2 ай бұрын
"Grid traversal" algorithms is the name that I learned for some of these.
@BalintCsala
@BalintCsala 2 ай бұрын
Ray marching is just the process of stepping the ray forward by some amount instead of using analytical formulas, if you visit the wikipedia page it tells you the name of those two specific variations, sphere tracing for SDF-s and cube-assisted raymarching for voxel traversal, although I personally see that referred to more as "voxel tracing"
@jonfriedl4786
@jonfriedl4786 26 күн бұрын
Gosh all these awesome voxel engines I'll never make myself lol.
@nathanruiz3424
@nathanruiz3424 12 күн бұрын
Instead of standard ray/plane intersection marching, maybe you can use SDF ray marching when you aren't using DDA. SDFs are very GPU friendly and the SDF of a cube does not use division. Plus you can round/bevel your cubes using SDFs for added artistic desicions.
@seth111yta1
@seth111yta1 Ай бұрын
As voxel engines advance and relative voxel size shrinks, is there an advantage to just rendering voxels as a point cloud with like face-camera colored squares or something? great work BTW and thank you for actually making browser demos
@moonshot3159
@moonshot3159 2 ай бұрын
holey moley this guy's good! Have you also considered similar optimizations that Vercidium (yt channel) did with his voxel engine? Or are those a different beast all together.
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thanks for watching! I think that most of Vericidium's optimizations are very good, but they mostly apply to rasterizing voxels. Since I'm ray marching, they don't apply.
@Z_Z.t
@Z_Z.t 2 ай бұрын
for DDA in different sized pow2 boxes just replace multiplication with bit shifts and integer arithmetics
@matanon8454
@matanon8454 2 ай бұрын
Its amazing! Is water flow simulation something you consider to add at some point? Would love to dig some rivers and see water flowing 😍
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
I haven't studied fluid simulation yet, so unfortunately I probably won't get to it for a while. But I would love to have water like John Lin's!
@victorwidell9751
@victorwidell9751 2 ай бұрын
The low resolution pre rendering pass was interesting. Would it help to do it multiple times? I imagine it could be done with each level of an octree.
@UnofficialFoneE
@UnofficialFoneE 2 ай бұрын
Yes, this video made me "bricked" 👀
@Lantalia
@Lantalia 2 ай бұрын
Neat! Have you done anything with multiple, unaligned, grids, so you can, for instance, have a moving ship that is itself a voxel grid? It seems like your rendering solution may make that a lot easier than I would normally expect
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
While the renderer supports unaligned objects, I haven't re-added voxel entities on the game logic side. The functionality is there - hoping to take advantage of it in a future episode!
@JosephGleespen
@JosephGleespen 2 ай бұрын
@@DouglasDwyer How do you ray trace the unaligned objects, or rather shoot rays through/past the unaligned obejcts? Do you trace them in separate passes and the composite them into a single image? I'm mostly trying to conceptualize how one would perform DDA while encountering things that are off grid.
@Harald723
@Harald723 2 ай бұрын
YES MORE...
@spidermankey1398
@spidermankey1398 Ай бұрын
You can do bresenhem's algorithm instead of DDA which is more efficient
@DouglasDwyer
@DouglasDwyer Ай бұрын
Bresenham's algorithm is not conservative - it may miss voxels on the line that rays pass through. As such, using Bresenham using would result in gaps and artifacts appearing on objects.
@owendavies8227
@owendavies8227 2 ай бұрын
I guess for preserving DDA calculations between scales, if you're just going up or down a single level, the numbers would be 4 times greater or less with no other changes, and that could be performed just with bitshifting and nothing else, which would be pretty fast.
@owendavies8227
@owendavies8227 2 ай бұрын
I guess you could keep track of how many levels you are stepping between and do a bitshift of double the number of difference in levels of hierarchy, but it would probably be more computationally efficient just to do a double bitshift each time so you don't have to keep track of anything.
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Unfortunately, I don't quite think that this works. With DDA, you store the distance along the ray to each plane of the current voxel. But if you move up/down tree levels, the planes may shift by non-uniform amounts. There are perhaps ways to update the plane positions based upon the initial and final voxel, but I don't know of any that are more efficient than just reinitializing DDA from scratch (which is equivalent to a ray-box intersection test).
@owendavies8227
@owendavies8227 2 ай бұрын
@@DouglasDwyer I am willing to believe I made some kind of mistake.
@owendavies8227
@owendavies8227 2 ай бұрын
​@@DouglasDwyer I don't know if this would necessarily be faster, but you could, if nothing else, theoretically update the plane positions in linear time. If the axes are aligned to the voxel grid (which could be done with a matrix multiplication for the camera and ray just once in the beginning), it would mostly just be addition, subtraction and bitshifts to move the planes around.
@Wizarth
@Wizarth 2 ай бұрын
You mention that you ensure the beams never go far enough into the geometry hierarchy to miss a voxel that is smaller than the grid/pixel size. Do you adjust the depth based on distance from camera? I.E. does it go deeper into the hierarchy when closer to the camera?
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Yes, that's correct. The maximum distance that a ray can travel depends upon the size of the voxel (i.e. hierarchy depth) and the distance between adjacent rays. Adjacent rays "spread out" as they progress with a perspective camera, so that needs to be accounted for.
@jujuteuxOfficial
@jujuteuxOfficial 17 күн бұрын
For the beam optimisation: Have you considered that some single-voxel particles could be entirely missed?
@Octal_Covers
@Octal_Covers 2 ай бұрын
Which DDA algorithm are you using? Different algorithms have difference performance characteristics so it might be worthwhile to look at some other options. As for stepping through various sized voxels, would it be possible to multiply the amount you add to the ray position? Since it's a line, it should be the same relative positions
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
I am using a similar technique to the one described on this page: lodev.org/cgtutor/raycasting.html The problem is that when you step between grid levels, the distance from the ray to the current X/Y/Z planes changes in a non-uniform way. You need to update that distance somehow, and as far as I can tell updating the distance and recomputing it are pretty similar computationally. So every time I switch grid levels, I need to recompute the DDA variables. This is what I refer to as performing a box-intersection test, since it's the same thing mathematically.
@Octal_Covers
@Octal_Covers 2 ай бұрын
@@DouglasDwyer you could attempt updating it each time the voxel size changes, and if it's not as performant as box intersection tests then you could just stick to box intersection tests. The algorithm I'm using for my voxel game is the Fast Voxel Traversal Algorithm, which doesn't use as much trig functions
@amaryllis0
@amaryllis0 2 ай бұрын
Game looks great, i gotta second that the rotating backgrounds make me feel nauseous tho
@AIShipped
@AIShipped 2 ай бұрын
That’s amazing progress
@guillaumemagniadas2472
@guillaumemagniadas2472 2 ай бұрын
When you're speaking of sparse tree, are you speaking of sparse voxel octree ? I wrote an alternative of DDA algorithm that would compute a different step depending of the current level of depth in the SVO
@Z_Z.t
@Z_Z.t 2 ай бұрын
10:48 isnt using simple rasterization with depth gonna give better depth values? (im aware about greed meshing)
@EndroEndro
@EndroEndro 2 ай бұрын
Sadly i'm getting error panicked at 618 in web-d2eddb7ca2dff036 js file (Could not load GPU adapter) and 602:21 in opera other browsers and exe the same issue.
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Thanks for trying the demo! It sounds like your computer doesn't support Vulkan or WebGPU. I will add an error message in the future for unsupported devices.
@UltimatePerfection
@UltimatePerfection Ай бұрын
You should try getting the voxel resolution higher, with each voxel being 10cm or less.
@steluste
@steluste 2 ай бұрын
Are you using any engine/framework such as Monogame or it's all from scratch? Also do you have a public repository of the project?
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
The project is written entirely from scratch in Rust and uses WebGPU as the graphics API. The project is not open-source at present, but many components of it (like the event system are networking system) are open-source on my GitHub!
@thomasmcbride2287
@thomasmcbride2287 2 ай бұрын
THREE DIFFERENT OPTIMIZATIONS??? YOU’RE A WIZARD!
@teabow.
@teabow. Ай бұрын
1:45 Is the entire world a single tree? 7:35 I understand the masking optimizations, but how do you figure out the masks for the rays? There are so many positions and directions.
@DouglasDwyer
@DouglasDwyer Ай бұрын
Thanks for watching! 1. The world is divided into 256³ chunks. Each chunk is its own tree. A pointer from each chunk to its tree structure is stored in a 3D array. 2. The masks are procedurally generated at compile-time. There are 64 possible positions within a brick and eight possible octants for the direction (determined by -x or +x, -y or +y, and -z or +z). Hence, there are 512 masks. I generate them by looping over every combination and testing all voxels to see which ones the mask should include.
@teabow.
@teabow. Ай бұрын
@@DouglasDwyer Thank you for answering! I thought the rays were projected from a point behind the screen? How can their directions be so limited, then?
@DouglasDwyer
@DouglasDwyer Ай бұрын
You're correct - the rays themselves can have virtually infinite directions. However, the masks don't need to be one-to-one. All rays that point toward the same octant are grouped. So a ray whose direction has a negative x-component, positive y-component, and negative z-component will be grouped with all other -x,+y,-z rays. The same goes for all other combinations of positive and negative cardinal axes. This way, we can eliminate all voxels that lay to the opposite side. If a direction has a negative x-component, then we know it cannot hit voxels that have a positive x-displacement from the ray, for example.
@teabow.
@teabow. Ай бұрын
@@DouglasDwyer Oh, I get it now! What an awesome optimization.
@slent2410
@slent2410 Ай бұрын
So do you play on releasing this to the public once its completed or in a useable state?
@DouglasDwyer
@DouglasDwyer Ай бұрын
There's already a demo playable online! But yes, I hope to turn this into an actual game someday.
@R.Daneel
@R.Daneel 2 ай бұрын
Wow. KZfaq compression isn't doing your grey background any favours! I can't tell from the video. Have to fallen down the SIMD rabbit-hole yet?
@TheRyulord
@TheRyulord Ай бұрын
This is being rendered on GPU using a custom shader so everything is inherently SIMD
@meetem7374
@meetem7374 17 күн бұрын
Hello, I've actually stumbled across this exact problem of DDA being fixed, and I needed to traverse the octree in a more efficient manner. And actually I found the solution, it's pretty dirty, but it works. IDK if you can contact me on KZfaq, but you can try.
@meetem7374
@meetem7374 17 күн бұрын
Other optimizations in mine and your raymarcher are actually pretty simillar :) for bitmasks etc
@user-cl1rq1sg8m
@user-cl1rq1sg8m 15 күн бұрын
Could you improve the performance of the voxel game veloren
@Deniil2000
@Deniil2000 2 ай бұрын
you mentioned u32 and u64 in your video. are you writing this in Rust?
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
The project is written entirely in Rust and uses WebGPU as the graphics API.
@jamesoneill4607
@jamesoneill4607 2 ай бұрын
Fewer rays
@lumiey
@lumiey 2 ай бұрын
There is an LOD 'leaking'? artifact when I fly a bit high up
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Yep, the LOD generation isn't perfect. The LOD leaking shouldn't happen as much with procedurally generated terrain and other solid models. It happens because the church has a lot of thin surfaces and so the engine isn't sure which surface to put on top when generating the higher LODs.
@monstercameron
@monstercameron Ай бұрын
is this signed distance fields? 3:19
@StylishHobo
@StylishHobo 2 ай бұрын
Rather than your custom data structure. Have you considered using a k-d tree instead?
@Stevidgry
@Stevidgry Ай бұрын
vertically sectioned chunks?????
@lumiey
@lumiey 2 ай бұрын
It runs super smooth on my PC, but I can't see how fast it actually runs because I can't disable vsync in online demo
@lumiey
@lumiey 2 ай бұрын
My GTX 1060 can handle 1500x1500 resolution at 60fps (2.25 mega pixels (1080p is 2.0736MP), there is no performance difference between web/native
@Midazc
@Midazc 2 ай бұрын
Did you slow down the audio track? Sounds much more natural on 1.25x
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Speaking slowly and clearly is important for public speaking, so that's what I aim to do. But you are more than welcome to enjoy the video at 1.25x if you'd like!
@paxcoder
@paxcoder 2 ай бұрын
So down to 60ms? But it would have to be ⌊1000/60⌋ = 16ms to run at 60fps right?
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Yep! As noted in the video, the benchmarks presented were on a very bad GPU (Intel UHD 750H). On any discrete card, the engine runs at a buttery-smooth 60 FPS - for example, the scenes in the video run at 4 ms (or 250 FPS) on my NVIDIA 1660 TI. It's possible to make the game run in realtime on the Intel GPU as well by lowering the resolution. But for benchmarking I wanted the frame times to be as big as possible, so I chose a worst case.
@paxcoder
@paxcoder 2 ай бұрын
@@DouglasDwyer Too bad a discrete card is a requirement, as I'm afraid I only use integrated cards, but that is a whole lot of voxels there, so I guess it makes sense.
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Discrete card is only a requirement for 1080p. If you play the game at a lower resolution, you can definitely hit 60 FPS on an iGPU. But don't take my word for it - try the demo and evaluate for yourself! There is an option in the settings menu to downscale the image resolution for better performance :)
@xezvcikgames1191
@xezvcikgames1191 2 ай бұрын
@@DouglasDwyer can you please show in pseudo-code how you do ensure low res wont step too far 11:06 ? i didnt quite get it, also what it would be like to trace more coarse lods in pre-pass, for example on CPU side? still could avoid a lot of traversal iterations. How do you even make sure its done in conservative manner? wont it require supercover dda or smth, hard to imagine tbh
@shadow_blader192
@shadow_blader192 2 ай бұрын
1:55 isn't it called raycasting?
@sjoerdev
@sjoerdev 2 ай бұрын
raymarching is raycasting in small steps
@shadow_blader192
@shadow_blader192 2 ай бұрын
@@sjoerdev raymarching is raycasting based on minimum distance to objects?
@sjoerdev
@sjoerdev 2 ай бұрын
@@shadow_blader192 thats sphere tracing which is a form of raymarching
@shadow_blader192
@shadow_blader192 2 ай бұрын
@@sjoerdev oh those names
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
No idea, everyone seems to have their own names for these algorithms lol
@mohammadazad8350
@mohammadazad8350 2 ай бұрын
Premier!
@mohammadazad8350
@mohammadazad8350 2 ай бұрын
So apparently commenting "First" in French earned me a heart... my studying is paying off!
@timmygilbert4102
@timmygilbert4102 2 ай бұрын
​@@mohammadazad8350je suis là pour tester ton français 👀 voyons voir si ça paye tant que ça 😂
@mohammadazad8350
@mohammadazad8350 2 ай бұрын
@@timmygilbert4102 "I am ??? for test your (familiar) French. Let's see see if that pays ??? that that" By interpolating, we get: "I am going to test your French, let's see if it paid off". I can't believe a KZfaq comment just gave an exam on something I do for fun :).
@mohammadazad8350
@mohammadazad8350 2 ай бұрын
*gave me Also, I genuinely had to resist the urge to look up "tant" in the dictionary.
@timmygilbert4102
@timmygilbert4102 2 ай бұрын
@@mohammadazad8350 good job 👍😁
@hombacom
@hombacom 20 күн бұрын
Online demo doesn't work on my iMac 27 Chrome/Firefox or iPhone
@DouglasDwyer
@DouglasDwyer 20 күн бұрын
Thanks for trying out the demo! If you'd like to help debug the issue, please open a ticket at github.com/DouglasDwyer/octo-release. Open the developer console in Chrome (Firefox is not supported) and copy-paste what you see :)
@hombacom
@hombacom 20 күн бұрын
@@DouglasDwyer I'm too lazy so I do it here.. Uncaught (in promise) Error: Using exceptions for control flow, don't mind me. This isn't actually an error! at imports.wbg.__wbindgen_throw (web-1b9437e1ec3ce582.js:958:15) at web-1b9437e1ec3ce582_bg.wasm:0x5e1292 at web-1b9437e1ec3ce582_bg.wasm:0x2dc81b at web-1b9437e1ec3ce582_bg.wasm:0x56f919 at web-1b9437e1ec3ce582_bg.wasm:0x591e27 at __wbg_finalize_init (web-1b9437e1ec3ce582.js:1948:10) at __wbg_init (web-1b9437e1ec3ce582.js:1984:12) octo-release/:1 No available adapters. octo-release/:1 No available adapters. web-1b9437e1ec3ce582.js:618 panicked at game\hal\src\gpu.rs:242:51: Could not load GPU adapter. Stack: Error at imports.wbg.__wbg_new_abda76e883ba8a5f (douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582.js:602:21) at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[4786]:0x5e5856 at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[2553]:0x578261 at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[3660]:0x5d775c at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[3610]:0x5d6b19 at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[3809]:0x5da3ca at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[691]:0x38bb86 at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[1936]:0x537364 at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[1780]:0x52167d at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[459]:0x229111
@DouglasDwyer
@DouglasDwyer 20 күн бұрын
Thanks for doing this, I really appreciate it! For whatever reason, it looks like your web browser doesn't support WebGPU. Eventually, I'll get around to adding a proper error message for when devices aren't supported.
@beholdergamedesign
@beholdergamedesign 2 ай бұрын
A voxel brick has 8 64 bit occupancy masks?
@DouglasDwyer
@DouglasDwyer 2 ай бұрын
Since a voxel brick has 4^3 = 64 voxels, with one bit per voxel, each brick has 1 64-bit occupancy mask.
@gsestream
@gsestream 2 ай бұрын
try to render your engine nightmare scenario, ie the worst case geometry. multiplication is always cheap operation. division might not be, but will usually be also cheap on modern hardware. no op required is better than have to do op tho. also if you use sphere collision volume overlapping, then you need only one test per cell, not 6 for all sides of a cube cell plane boundaries. but you are testing coordinate plane boundaries for the DDA universally I think. so instead of storing stuff in the data structure, why not use the 64-bit as a pointer to the voxel data, which could be anything, not limited to the 64-bits. pointer to data structure is very fast. some optimizations are not worth it and make the logic a mess. order of magnitude optimizations should be the focus always. oh you try to compress the data using bit storage. if you make the assumption of large empty areas, then you can compress the data with RLE ones and zeros (voxel no voxel) for traces. so you are not using BVH to traverse top-down the voxel tree, that would compress the data also. bvh top-down traversal will work like DDA but processes the tree in the occupancy order, and stops if there is no stuff voxels in the tree leaf. usually you can just do the old style draw distance capping to limit amount of voxels drawn, even if you have billions. the old trick. even when doing true 3d traces. ie only render closest to a draw distance, even for lighting bounce traces. less is more. to get an idea of actual performance, try 4K resolution and see how badly it chucks or not. making an off-line renderer to test any features at high res and lighting rendering gives directions. you know what to do. maybe use an off-line pre-calc sdf for the ray marches, at least for static parts of the scene. maybe separately test dynamic meshes. instead of complex hit detection, you would just raster all voxels. as triangles. difference is not significant in general, not in the order of magnitude margin range. try rendering a voxel cloud, which is spongy ultra sparse voxel mesh. sdf can be updated on fly locally where voxel mesh edits happen.
Emissive voxels and fancy lighting [Voxel Devlog #19]
13:08
OPTIMIZING my physics engine [Voxel Devlog #13]
9:30
Douglas
Рет қаралды 25 М.
Heartwarming: Stranger Saves Puppy from Hot Car #shorts
00:22
Fabiosa Best Lifehacks
Рет қаралды 22 МЛН
БОЛЬШОЙ ПЕТУШОК #shorts
00:21
Паша Осадчий
Рет қаралды 10 МЛН
Best father #shorts by Secret Vlog
00:18
Secret Vlog
Рет қаралды 22 МЛН
WHO LAUGHS LAST LAUGHS BEST 😎 #comedy
00:18
HaHaWhat
Рет қаралды 21 МЛН
One Month of Game Development | Dream Game Devlog 1
9:31
Caleb Gilbert
Рет қаралды 82 М.
Best FREE Software for Game Development in (2024)
8:01
anyDev
Рет қаралды 32 М.
Adobe: A Disgusting, Criminal Company
10:21
Bull Technology
Рет қаралды 147 М.
6 MONTHS WRITING A GAME ENGINE IN C++ | Devlog #1
8:23
voxelbee
Рет қаралды 51 М.
Teardown - Your Questions Answered
13:53
2kliksphilip
Рет қаралды 413 М.
ChatGPT makes Voxel Engine with Rust
12:20
Tantan
Рет қаралды 92 М.
I Made a Game With No Free Time
13:36
LazyAlarm
Рет қаралды 158 М.
How Two People Created Gaming’s Most Complex Simulation System
38:54
ThatGuyGlen
Рет қаралды 1,3 МЛН
ЧОП ДОСЫН ЖОҒАЛТЫП АЛДЫ (GTA V)
14:10
MANGO PLAY
Рет қаралды 28 М.
МЕГАЯЩИКИ ВЕРНУЛИСЬ В BRAWL STARS
20:36
Поззи
Рет қаралды 708 М.