
Meet Sohu, the fastest AI chip of all time.

21,307 views

Ai Flux

A day ago

Unleash the power of transformers! Introducing Sohu, the world's first chip built specifically for blazing-fast transformer models like ChatGPT and Stable Diffusion 3. Sohu offers unmatched speed and efficiency, but focuses solely on transformers - it can't run other AI models. This is a gamble on the future of AI, but if transformers are here to stay, Sohu will revolutionize the game. See why we took the leap!
Tell us what you think in the comments below!
Twitter Announcement: x.com/Etched/s...
Founder Interview (w/ John Coogan): x.com/johncoog...
Etched Blog: www.etched.com...
Groq Rebuttal 1: x.com/GroqInc/...
Groq Rebuttal 2: wow.groq.com/1...

Comments: 139
@RonMar 2 months ago
I think this makes 100% sense, because GPUs were not built for inference-specific purposes -- they were just co-opted because they were at hand. The idea of bespoke hardware for a given model is intriguing. These optimized chips will be much faster and more energy-efficient than the more generalized GPUs -- and much cheaper and smaller, I expect. Eventually, I think local inference will dominate almost all usage. Only extreme edge cases will require a cloud data center.
@MrDragonFoxAI 2 months ago
I think that is very much wishful thinking - the money is made in DCs and the hardware is built for it, at least for the foreseeable future, and with all the regulation hitting right now - well, you get my drift. While I agree this is a win for transformers, it's not uncommon - Groq did similar stuff with their custom fabric, but uses SRAM. HBM3e is sold out for something like two years ahead, and it's slow in I/O versus direct on-die. Once you go off-chip, that won't beat Groq once it's optimized - it can't. And Groq stopped selling hardware :)
@paulmuriithi9195 2 months ago
Agreed. I'm rooting for models specific to biomolecular processes and their custom hardware. Attachment-based recurrent NNs, for instance, would run great on Sohu. What do you think?
@netron66 1 month ago
Another thing: GPUs need really low latency, so they preferably need to be a single chip - that's why Nvidia killed NVLink on RTX cards. But for AI and workstation PCs, that isn't the first priority.
@coolnhotstorage 2 months ago
I think just having tokens that fast opens the door to massively recursive chain of thought. You can get an even greater level of accuracy if you let models refine their thoughts rather than just getting them to write an essay on the spot. That's what I'm most excited for.
@vishalmishra3046 2 months ago
*There were no transformers (the Attention abstraction) before 2017.* Etched is incorrectly assuming that transformers (unlike RNNs/LSTMs etc.) will *NEVER* be replaced by an entirely new innovation using an abstraction far more effective than *Attention*. From that moment on, all LLMs, including Vision Transformer based models, will migrate to the new (better-than-attention) mechanism, and Sohu will need to be replaced by another special-purpose ASIC. Meanwhile, Nvidia is incorporating a version-3 transformer engine in its post-Blackwell (*Rubin*) architecture.
@biesman5 2 months ago
It's obvious that transformers will eventually be replaced, and they know it as well. I guess they're betting it won't happen overnight, so they might as well make some cash in the meantime.
@ernestuz 2 months ago
Betting everything on transformers is risky. Their main drawback is context size: memory requirements grow with the square of the context length, which results in small contexts and memory-bandwidth starvation, apart from needing enormous amounts of memory to operate. Even the most committed companies are looking at other architectures. We will be in a transition period for a few years.
@peterparker5161 2 months ago
It's just the fact that transformers are currently the best usable thing.
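The quadratic memory growth mentioned a couple of comments up is easy to make concrete. A quick Python sketch - the head count and context lengths below are illustrative numbers I picked, not Sohu or Etched specifics:

```python
# Naive attention materializes an (n x n) score matrix per head, so the
# memory for those matrices grows with the square of the context length n.

def attention_score_bytes(context_len: int, n_heads: int, bytes_per_el: int = 2) -> int:
    """Memory for the attention score matrices alone (fp16 elements)."""
    return n_heads * context_len * context_len * bytes_per_el

# Doubling the context quadruples the score-matrix memory:
small = attention_score_bytes(4096, 32)   # 1 GiB
large = attention_score_bytes(8192, 32)   # 4 GiB
assert large == 4 * small
print(small / 2**30, "GiB ->", large / 2**30, "GiB")  # → 1.0 GiB -> 4.0 GiB
```

This is only the score matrices; KV-cache and weights add more on top, which is why long contexts starve memory bandwidth first.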
@darkevilbunnyrabbit 2 months ago
We need this to make Open Source models viable for the average consumer.
@TridentHut-dr8dg 1 month ago
Dots are connecting aren't they
@VastCNC 2 months ago
Who’s going to buy them? Not the chips, but the company. I could see a major acquisition to pass the bag and build the moat.
@aifluxchannel 2 months ago
At this point I doubt they're looking for a buyer. Frankly, even though they may have the best tech in existence for transformer inference, they're making ASICs, and the amount of debt necessary to make that happen is astronomical. Do you think Nvidia / OpenAI would look to acquire them?
@VastCNC 2 months ago
@@aifluxchannel I’m hoping Anthropic, but more likely musk, Google or Meta.
@GerryPrompt 2 months ago
@@VastCNC was also thinking this
@brandonreed09 2 months ago
The AI providers will all invest, none will acquire.
@AaronALAI 2 months ago
Frick....shut up and take my money! This will make local hosting much more common.
@aifluxchannel 2 months ago
I would be willing to mortgage my house for one of these... in three years when they start showing up on eBay ;). Do you use GPUs or groq with open source models?
@AaronALAI 2 months ago
​@@aifluxchannel I would pay a hefty fee for one of these(2x plus would be heart stopping). I use 7x 24gb gpus on a xeon system.... sometimes I trip the breaker.
@mackroscopik 2 months ago
I only need one of these! If anyone needs a kidney or testicle hit me up, I have an extra one of each.
@user-bd8jb7ln5g 2 months ago
Inference costs will start dropping based on how quickly these guys can deliver their product. Assuming the same price as Nvidia cards and a 20x inference improvement, inference cost should drop by 20x. I bet, however, that their accelerators will be cheaper than Nvidia's.
@aifluxchannel 2 months ago
It's going to get interesting, especially with the already apparent lull in the GPU compute market.
@Perspectivemapper 2 months ago
🐴We should assume that other chip manufacturers will jump on the bandwagon of dedicated LLM chips as well.
@MrAmack2u 2 months ago
So you can't use them to train models? That is where we need the biggest breakthrough at this point.
@Alice_Fumo 2 months ago
I actually don't think we do. There are a lot of techniques which make models more capable by using a lot more compute on inference. Also, there's reinforcement learning where very few of the inferences might be considered to be very good and could then be used as new training data.
@thorvaldspear 2 months ago
@@Alice_Fumo Yea, prompting techniques like Tree of Thoughts look promising but are very inference hungry, so having lightning fast inference will be super helpful. Also, imagine how valuable inference becomes when proper robotics transformers finally get figured out; lightning fast reflexes, unstoppable killing machines...
@gentleman9534 2 months ago
The xtropic chip can train models extremely fast, very cheaply, and with power consumption so low it is nearly zero.
@woolfel 2 months ago
Something will replace transformers. Will they be able to adapt and change fast enough when the transformer architecture evolves into something else? Things are moving fast, and research shows there's lots of room for improvement.
@DigitalDesignET 2 months ago
Very interesting, keep us informed. Thank you so much.
@aifluxchannel 2 months ago
You bet!
@DanielBellon108 2 months ago
I went to their website and there's no place to order chips or software. They're not even on KZfaq yet. Is this company even real? 🤔
@nagofaz 2 months ago
Hold on a second. We're all getting worked up over this, but has anyone actually seen a real card yet? I can't help but feel we should be a lot more skeptical here. The whole thing just reeks of 'too good to be true'. Look, I'd love to be wrong, but let's face it - this wouldn't be the first time someone's tried to pull a fast one in this industry. These guys aren't blind; they know the market's on fire right now, with money being thrown around like there's no tomorrow. Maybe we should take a step back and think twice before buying into all this hype.
@dg-ov4cf 2 months ago
this reads like when i tell claude to act human
@fontenbleau 2 months ago
interesting to see a real demo of this, that is what my robot needed!
@supercurioTube 2 months ago
If gen AI has already settled on transformers, I expect mobile SoCs and laptops to add a transformer ASIC block to each chip, just like the video encoder and decoder blocks for individual codecs.
@aifluxchannel 2 months ago
Makes more sense to have an API backed by an entire datacenter of these asics. Far cheaper.
@supercurioTube 2 months ago
@@aifluxchannel maybe combining both? Like Apple's implementation with "Apple Intelligence" on iPhones. But with transformer ASIC added to the NPU.
@PaulSpades 2 months ago
@@aifluxchannel LLMs are now capable as a natural language text and voice interface. This has been a goal of HCI for 50 years. Other multimedia tasks can be handled via remote compute, but the command interface and interpretation needs to be local.
@sigmata0 2 months ago
It's like any other kind of technology: those who enter first can get superseded by those who follow as technology, techniques, and thinking change. It's great because it keeps the pressure on companies to innovate and improve. With the resources at Nvidia's disposal, they too can observe improvements by others and include them, with additions, in the next iterations of their products.
@novantha1 2 months ago
I think this is less interesting as a specific product than it appears on the surface. It's interesting certainly for the implications on the industry, but in terms of its direct relevance to "ordinary" people who like to buy a piece of hardware and use it (as opposed to being locked behind APIs) it's way less useful. Adding onto that, I think the most interesting thing in inference workloads is ultra low precision; forget FP8, why aren't they doing Int8? Int4? Int2 (as per Bitnet 1.58 I think 1.5 bit should be possible with a dedicated ASIC) could be incredible. Floating point numbers aren't actually very easy to work with on a hardware level, so it seems like a really weird choice for an inference only chip, as I'm sure they could have squeezed even more performance out of just using the same number of bits as integers, let alone using low precision integers. (I think Int 2 could potentially be ~4.5x faster off the top of my head). More to the point, given I'm primarily interested in hardware I can actually buy, if I wanted something like this Tenstorrent's accelerators seem way more interesting, and affordable. With all of the nay-saying said, the one thing about these that seems really interesting is that Moore's Law isn't actually dead. We're still getting more transistors. The issue is that it becomes harder and harder to control them all together in a centralized manner (like on a CPU), hence, CPU performance has declined, and even parallelized you see the same leveling off of improvements over time due to things like accessing cache and so on in GPUs. 
I'm no hardware engineer, but it seems to me, that in addition to gaining more performance from removing roadblocks (ie: only using hardware needed to calculate a Transformer network, so no CPU or GPU specific elements are included), Transformer specific hardware should still continue scaling with performance node improvements to a greater degree than existing legacy architectures, very similarly to how bitcoin ASICs were able to do so.
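The low-precision integer idea in the comment above can be made concrete with a tiny sketch of symmetric int8 quantization. Plain Python stands in for what an inference ASIC would do in silicon; the weights are made-up illustrative values:

```python
# Symmetric int8 quantization: map floats onto [-127, 127] with one scale
# factor, do the heavy math in integers, then rescale at the end.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Return int8 codes and the scale factor that maps them back."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03, 1.0]
q, s = quantize_int8(w)
assert all(-127 <= v <= 127 for v in q)
# Round-trip error is bounded by half a quantization step:
assert all(abs(a - b) <= s / 2 for a, b in zip(w, dequantize(q, s)))
```

Integer multipliers are much cheaper in die area and power than floating-point units, which is the commenter's point about squeezing more throughput out of the same silicon.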
@elon-69-musk 2 months ago
The most compute will be needed for inference to produce synthetic data, so this type of chip might be more important than training chips.
@darkreader01 2 months ago
Like the Groq LPU, won't they provide a platform for us to try it out ourselves?
@gerardolopez9368 2 months ago
Nemotron was released a couple days ago, definitely see a race going on 🔥🔥🔥🔥💡💡
@issay2594 2 months ago
Nvidia trapped itself. It's enough to remember how 3D graphics started: 3dfx created the GPU to handle a specific load - mesh calculations that made Quake run fast. GPUs were a tool to do a certain task faster, not "everything in the world should forever be done on the GPU", just as it was on the CPU before.
@TheReferrer72 2 months ago
Inference is not the biggest capital outlay for the big players, and I thought the Googles, Microsofts, Metas, and Amazons of the world were already making their own GPUs. It's making the foundation models that requires big CapEx spend. This is of little threat to Nvidia - or am I missing something?
@AltMarc 2 months ago
Excerpt from a Nvidia Press release: "DRIVE Thor is the first AV platform to incorporate an inference transformer engine, a new component of the Tensor Cores within NVIDIA GPUs. With this engine, DRIVE Thor can accelerate inference performance of transformer deep neural networks by up to 9x, which is paramount for supporting the massive and complex AI workloads associated with self driving."
@DonaldHughesAkron 2 months ago
More than wanting to buy one.. How do you buy stock in this company? Are they on the market yet?
@ScottyG-wood 2 months ago
Dead on arrival if data center + inference-only is their strategy. The initial sales will be great, but Nvidia, AMD, Intel, and Google are doubling down on merging inference and training. In the data center, the approach Sohu is taking will not make it. Their positioning will make or break them.
@cacogenicist 2 months ago
Wen consumer home ASIC LLM inference machines for under $4,000? 😊
@aifluxchannel 2 months ago
On eBay in five years when these start to show up haha.
@falklumo 2 months ago
The challenge today is still training, not inference.
@cacogenicist 2 months ago
Of course when we have a model with "average human intelligence," it's not going to have the _knowledge_ of an average human -- it will be vastly more knowledgeable than any human. It'll be like an average human only in specific sorts of reasoning, where they are rather poor presently. The top models are already quite good at verbal analogical reasoning.
@user-bd8jb7ln5g 2 months ago
Majority of AI compute over the last year is being used for inference. If these guys can deliver these accelerators and servers quickly with no issues, Nvidia won't know what hit them.
@Wobbothe3rd 2 months ago
Aren't transformers going to be replaced by Mamba/S4 models? And even if that isn't true, is it really a good bet to assume the transformer will be the dominant type of model forever?
@aifluxchannel 2 months ago
Not necessarily, Mamba (SSM) and Eagle (RNN) based models only exist as alternatives to transformers because they help with the problem of scaling compute. If Etched has actually solved this with hardware for Transformers the performance wins of Mamba start to look much less impressive.
@loflog 2 months ago
I wonder if problems like "lost in the middle" will come to the forefront once hardware scaling is eased. I don't think we actually know today whether there are architectural blind spots in transformers that will motivate new architectures to emerge.
@pigeon_official 2 months ago
500k T/s is genuinely the most unbelievable, absolutely insane thing I've ever heard in my life, so until we actually see it in real life I will continue not to believe it. But I hope it's true, so badly.
@manonamission2000 1 month ago
self-driving cars could use this
@ps3301 2 months ago
When a new ai model appears, this startup will tank
@aifluxchannel 2 months ago
Well... the entire point of this hardware is it can run anything transformer based. All SOTA models, closed and open source, are transformer based.
@wwkk4964 2 months ago
@@aifluxchannel They're really hoping that an SSM hybrid won't outcompete a pure transformer-based model.
@JankJank-om1op 2 months ago
found jensen's alt
@HaxxBlaster 2 months ago
@@aifluxchannel That's what he is saying: new models that will not be transformer-based. But transformers will surely be used until the next breakthrough.
@seefusiontech 2 months ago
Where is the John Coogan video? I looked and couldn't find it.
@aifluxchannel 2 months ago
Linked in description but I'll put it here as well - x.com/johncoogan/status/1805649911117234474/video/1
@seefusiontech 2 months ago
@@aifluxchannel Thanks! I swear I looked, but I didn't see it. BTW, I watched it, but your vid had way better info :) Keep up the great work!
@obviouswarrior5460 1 month ago
€4,000? I have the money! I want to buy one Sohu (and more after that)! Where can we buy it?!
@styx1272 2 months ago
BrainChip's revolutionary spiking neural Akida2 chip has its own revolutionary new algorithm, TENNs, giving ultra-low power consumption and possibly productivity similar to Etched's, while being highly adaptable. And it doesn't require a CPU or external memory to operate.
@tamineabderrahmane248 2 months ago
The AI hardware accelerator race has started!
@bigtymer4862 2 months ago
What’s the price tag for one of these though 👀
@aifluxchannel 2 months ago
More than any of us can afford haha. But I bet it's within striking distance of the nVidia B200 at least per rack.
@tsclly2377 2 months ago
Watch Microsoft, as they have gone the complete binary AI route and have a core of users they want to serve: gaming and business. The one thing that is blatantly obvious is that the trend is toward server centralization, when that may not be in the best interests of the user or get the best results.
@sikunowlol 2 months ago
this is actually huge news..
@jonmichaelgalindo 2 months ago
Can it run a diffuser like SDXL? Or a diffusion-transformer like SD3 / Sora?
@jonmichaelgalindo 2 months ago
They claim SD is a transformer, which is half-true, but the VAE is a convolver. A CNN. They specifically said they can't run CNNs. Are they gaslighting right now? Did they sink everything into text-only hardware that's already out of date now that multimodal is taking over?
@aifluxchannel 2 months ago
Yep, it's been tested with SD3 and can run any transformer based model (SORA included)
@aifluxchannel 2 months ago
This was also my first thought - but they claim to have already run SD3 on the device.
@jonmichaelgalindo 2 months ago
@@aifluxchannel I bet they ran just the transformer backbone. Pass the latent noise in over bus transfer, run denoise steps on Sohu's hardware, transfer the latent output back to GPU with stop-over in RAM, then run the VAE. That's a lot of hops.
@hjups 2 months ago
​@@aifluxchannel Do you have a link to this claim? It seems rather foolish to support SD3 since you are necessarily throwing away algorithmic improvements that come with causal attention - i.e. they may have gotten 1M tk/s if they did not support non-causal attention.
@bpolat 2 months ago
One day this type of chip will be in computers or mobile devices, running the largest text or video models without even needing the internet.
@HaxxBlaster 2 months ago
Isn't this just theoretical so far? If not, where is the demo?
@aifluxchannel 2 months ago
No, they've already fabbed the prototypes at TSMC - going into production at TSMC within the month.
@HaxxBlaster 2 months ago
@@aifluxchannel Thanks for the reply, but I need more to be convinced this can become a real consumer product. A prototype is one thing, but there can be a lot of other obstacles on the way to a real product. Good luck to these guys, but I get an instant red flag when there's a lot of talk and no real product yet.
@GerryPrompt 2 months ago
500,000 tokens/s???? 😂 Groq is TOAST
@aifluxchannel 2 months ago
They certainly have a lot of catchup to work on. Glad they're no longer the "first" to enter this space. Do you use Groq with open source models or just run locally with your own GPU?
@hjups 2 months ago
Groq's main attribute is latency, not throughput (although they have been marketing for throughput). While latency can still be low in the cloud, Groq is also used for sub-ms applications like controlling particle accelerators (LLMs were more of a "oh and we can do that too").
@MrDragonFoxAI 2 months ago
Groq is also old - a 14nm process. Once they switch to the new Samsung node it should be vastly different. The big win here is SRAM vs off-die HBM3e.
@hjups 2 months ago
@@MrDragonFoxAI 14nm is not "old" in the ASIC world. I don't know if Sohu stated a process node, but I would not be surprised if they did a tapeout at 14nm. If I recall correctly, the new LPU2 is moving to 7nm and will also have LPDDR5. But that comes with other challenges, since Groq relies on cycle-accurate determinism (which is only possible with SRAM), so it's an engineering tradeoff. Also keep in mind that a Sohu die is likely much larger and more power-hungry than a Groq LPU die - I would not be surprised if Sohu were at the reticle limit (again, a tradeoff).
@MrDragonFoxAI 2 months ago
@@hjups They did - they're aiming for a 4.5nm TSMC node, and that's probably what produced the dev silicon too.
@testingvidredactro 1 month ago
Good luck, hope they succeed. GPU/TPU/NPU/APU manufacturers and clouds are just burning power with their suboptimal solutions for ML, and charging too much for it...
@aifluxchannel 1 month ago
Datacenter acceleration makes sense, but IMO NPU / TPU is a waste of time and engineering horsepower.
@maragia2009 2 months ago
What about training? Inference is only one part; training is really, really important.
@bozbrown4456 2 months ago
Nothing about Power usage / price
@aifluxchannel 2 months ago
We currently don't have any information from Etched regarding price or power usage. However, given the size of the die and the rack density, I think it's safe to bet that Sohu is far more power-efficient than the Nvidia B200.
@lavafree 2 months ago
Nahhh... Nvidia will just integrate more specialized circuits into its chips.
@aifluxchannel 2 months ago
But their purpose is to ship narrowly general compute devices, not ASICs. The limiting factor is that Nvidia GPUs have to rely on batch processing, not streaming processing.
@theatheistpaladin 2 months ago
We need a Mamba ASIC LPU.
@yagoa 2 months ago
Without chips that combine memory and processing on the wafer, you can't create human-like intelligence.
@aifluxchannel 2 months ago
You just need enough of them networked together ;)
@jameshughes3014 2 months ago
Wow. Designing and building hardware that can only run transformers, not knowing if something better will come along. But what an amazing payoff if they can be cost-effective at release.
@aifluxchannel 2 months ago
They have at least 2-3 years to let things play out. At least for now, the only advantage of Mamba and RNNs is their more favorable scaling. Transformers still have the upper hand in raw performance.
@jameshughes3014 2 months ago
@@aifluxchannel Oh I agree. Even if a novel algorithm takes over in a month and is the new hot thing, there's so much code now that uses transformers that it's gonna be useful.
@K9Megahertz 2 months ago
Not sure it's wise to invest in making an AI chip only for transformers when transformers are not what is going to take us to the next level. Transformers are limited regardless of context size or how many tokens you can generate per second.
@IntentStore 2 months ago
20x faster inference would take us to the next level. This makes AI-ifying everything basically free. It doesn’t have to be the permanent future, it’s just radically improving the usefulness of current models and enabling new applications which couldn’t be done before due to token speeds and latency, like making live voice assistants respond as quickly as a real person, and giving them CD quality voices instead of sounding like a Skype call.
@IntentStore 2 months ago
It also means coding assistants and coding automation go from being sort of possible to generating a working, tested application with 200,000 lines of code in a few seconds.
@IntentStore 2 months ago
Of course as soon as a better architecture is proven, they can immediately begin R&D on an asic for that too, because they will have tons of revenue from their previous successful product and investors.
@K9Megahertz 2 months ago
@@IntentStore Faster crap is still crap. Transformers will never be able to do that. They fail the simple algorithmic coding tests I give them, because they don't know how to code - they only spit out code they've been trained on that aligns with whatever prompt you give them. The reason they can't get the answer to one of my problems right is that the answer lies in a single PDF that was on the net 25 years ago (it's still out there). I know this because I helped fix a bug in the algorithm back then. It's nothing complicated, maybe 100 lines of code. For a transformer to spit out a 200,000-line program, it would have had to be trained on 200,000-line programs - multiple, in fact - which means the code would already have been written. And at that point you already have the code; what do you need a transformer AI for? Transformers won't get significantly better; they can't. They're limited by their probabilistic text-prediction noodles, which just doesn't work for software.
@styx1272 2 months ago
@@K9Megahertz What about BrainChip's spiking neural Akida2 chip, with its unique TENNs algorithm? It may possibly steamroll the competition.
@frodenystad6937 2 months ago
This is real
@sativagirl1885 2 months ago
human years to #AI is like dog years to humans.
@nenickvu8807 2 months ago
There is an issue with the shortsightedness of designing chips for inference alone, especially for transformer-based models. Transformer-based models are probabilistic in nature, and businesses and individuals can't bet on them alone. Other forms of software and hardware need to pair with this before it really has long-term value. It's a good start, but NVIDIA is still ahead. After all, NVIDIA doesn't use its GPUs simply to generate probabilities that are novel and interesting; it uses them to design chips and model physics. Reliability and consistency are the real magic here, and no one - not Meta, Google, or OpenAI - has been able to hone AI to the point where it is reliable and consistent. They will never get there with simple inference chips like these.
@aifluxchannel 2 months ago
Good point, but that stated use case is far larger than what Etched has set out to solve with Sohu. Sohu is intended for one thing and one thing only: inference with transformer-based models.
@nenickvu8807 2 months ago
@@aifluxchannel That's the problem. Competitor chips like Sohu may catch up to past use cases, but the industry has already evolved. Retrieval-augmented generation is already the standard; agentic and AI-team-based approaches are already becoming an expectation. So where will the investment come from to pay for this super inference chip with limited or no collaborative functionality with other software and hardware, even if it inferences more quickly? And there is always the threat that the next big model might not be based on transformers at all and will require different hardware - and the year after that, and the year after that. After all, software evolves quickly; hardware rarely does.
@lb5928 2 months ago
This clown just said no one is using AMD MI300X. 😂 Microsoft/Open AI, Oracle, IBM and Amazon AWS and more have announced they are using MI300x. Reportedly it is one of the best selling AI accelerators right now.
@Manicmick3069 2 months ago
That's where I stopped listening. AMD engineering is world class. That's who NVIDIA needs to worry about. If ROCM comes with better integration, it's game time.
@bobtarmac1828 2 months ago
Ai overhype. Can we …cease Ai? Or else be… laid off by Ai, then human extinction? Or suffer an… Ai new world order? With swell robotics everywhere, …Ai jobloss is the top worry. Anyone else feel the same?
@apoage 2 months ago
ouh
@Zale370 2 months ago
Finally someone showing how Jensen's marketing BS was just that!
@aifluxchannel 2 months ago
Etched and Jensen make very different products focusing on the same market segment.
@MARKXHWANG 1 month ago
And I can make a chip 10000X faster than them
@Maisonier 2 months ago
This is fake news dude...
@aifluxchannel 2 months ago
nVidia will definitely revise their B200 benchmark numbers, but the areas where Etched tweaks their statistics are the same places where Groq and Cerebras have also made claims. Regardless, these chips are currently the most performant when it comes to inference on transformer based models.
@HemangJoshi 2 months ago
I thought the same about the comparison of Nvidia GPUs: here you've shown that Nvidia GPUs aren't getting better overall, but they are in compute per unit of energy. That's not just improving - it's breaking Moore's law.
@GilesBathgate 2 months ago
My question would basically be "how?" Transformer attention is basically matmuls, and the FFN is also matmuls. I can understand how ASICs are better when your algorithm is sha(sha()) - GPUs aren't designed for that - but how are these gains made when the algorithms are the same? Memory access patterns?
@AltafKhan-qd1tk 1 month ago
I'm literally working on developing this chip lol
@GilesBathgate 1 month ago
@@AltafKhan-qd1tk So is it just more die area for floating point operation units, by removing stuff that GPU only has for graphics tasks? Is it basically a TPU?
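The "attention is just matmuls" point in this exchange is easy to verify in a few lines. A minimal single-head sketch in plain Python (lists standing in for tensors; an illustration of the math, nothing Sohu-specific):

```python
import math

# One attention head is two matmuls plus a row-wise softmax:
# softmax(Q K^T / sqrt(d)) V

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)                      # subtract max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]                 # K^T
    scores = matmul(q, kt)                              # matmul #1
    probs = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(probs, v)                             # matmul #2

# Two tokens, dimension 2: each output row is a convex mix of V's rows.
out = attention([[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 2.0], [3.0, 4.0]])
```

The gains an ASIC can chase here are exactly the parts around the matmuls: fixed dataflow, no instruction fetch/decode, and memory access patterns known at design time.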
@Beeti1 3 days ago
So far nothing but claims and nothing to show for it.