The description of DeepSeek reminds me of my experience in networking in the late 80s - early 90s.
Back then a really big motivator for Asynchronous Transfer Mode (ATM) and fiber-to-the-home was the promise of video on demand, which was a huge market in comparison to the Internet of the day. Just about all the work in this area ignored the potential of advanced video coding algorithms, and assumed that broadcast TV-quality video would require about 50x more bandwidth than today's SD Netflix videos, and 6x more than 4K.
What made video on the Internet possible wasn't a faster Internet, although the 10-20x increase every decade certainly helped - it was smarter algorithms that used orders of magnitude less bandwidth. In the case of AI, GPUs keep getting faster, but it's going to take a hell of a long time to achieve a 10x improvement in performance per cm^2 of silicon. Vastly improved training/inference algorithms may or may not be possible (DeepSeek seems to indicate the answer is "may") but there's no physical limit preventing them from being discovered, and the disruption when someone invents a new algorithm can be nearly immediate.
Another aspect that reinforces your point is that the ATM push (and subsequent downfall) was not just bandwidth-motivated but also motivated by a belief that ATM's QoS guarantees were necessary. But it turned out that software improvements, notably MPLS to handle QoS, were all that was needed.
Plus the cell phone industry paved the way for VOIP by getting everyone used to really, really crappy voice quality. Generations of Bell Labs and Bellcore engineers would rather have resigned than be subjected to what's considered acceptable voice quality nowadays...
3G networks in many European countries were shut off in 2022-2024. The few remaining ones will go too over the next couple of years.
VoLTE (voice over LTE) is common throughout Europe. However the handset manufacturer may need to qualify each handset model with local carriers before they will connect using VoLTE. As I understand the situation, Google for instance has only qualified Pixel phones for 5G in 19 of 170-odd countries. So features like VoLTE may not be available in all countries. This is very handset/country/carrier-dependent.
Yes, I think most video on the Internet is HLS and similar approaches, which are about as far from the ATM circuit-switching approach as it gets. For those unfamiliar, HLS pretty much breaks the video into chunks that are downloaded over plain HTTP.
Yes, but that's entirely orthogonal to the "coding" algorithms being used and which are specifically responsible for the improvement that GP was describing.
HLS is really just a way to empower the client with the ownership of the playback logic. Let the client handle forward buffering, retries, stream selection, etc.
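To make that concrete, here's a minimal sketch of what an HLS client boils down to (Python; the playlist URL is a hypothetical placeholder, and real players add bitrate switching, retries, and buffering policy on top of this):

  import urllib.request
  from urllib.parse import urljoin

  PLAYLIST_URL = "https://example.com/stream/720p.m3u8"   # hypothetical media playlist

  def fetch(url):
      with urllib.request.urlopen(url) as resp:
          return resp.read()

  def segment_uris(playlist_text, base_url):
      # A media playlist is plain text: tag/comment lines start with '#',
      # every other non-empty line is a segment URI.
      for line in playlist_text.splitlines():
          line = line.strip()
          if line and not line.startswith("#"):
              yield urljoin(base_url, line)

  buffer = []                        # stand-in for the player's forward buffer
  playlist = fetch(PLAYLIST_URL).decode("utf-8")
  for uri in segment_uris(playlist, PLAYLIST_URL):
      buffer.append(fetch(uri))      # each segment is just another cacheable HTTP GET

Everything interesting (stream selection, retrying, prefetching) is client-side policy layered on ordinary HTTP requests, which is exactly why it needs nothing special from the network.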
>> Plus the cell phone industry paved the way for VOIP by getting everyone used to really, really crappy voice quality
What accounts for this difference? Is there something inherently worse about the nature of cell phone infrastructure over land-line use?
I'm totally naive on such subjects.
I'm just old enough to remember landlines being widespread, but nearly all of my phone calls have been via cell since the mid 00s, so I can't judge quality differences given the time that's passed.
Because at some point, someone decided that 8 kbps makes for an acceptable audio stream per subscriber. And at first, the novelty of being able to call anyone anywhere, even with this awful quality, was enough that people would accept it. And most people did, until the carriers decided they could allocate a little more with VoLTE, if it works on your phone in your area.
> Because at some point, someone decided that 8 kbps makes for an acceptable audio stream per subscriber.
Has it not been like this for a very long time? I was under the impression that "voice frequency" being defined as up to 4 kHz was a very old standard - after all, (long-distance) phone calls have always been multiplexed through coaxial or microwave links. And it follows that 8kbps is all you need to losslessly digitally sample that.
I assumed it was jitter and such that lead to lower quality of VoIP/cellular, but that's a total guess. Along with maybe compression algorithms that try to squeeze the stream even tighter than 8kbps? But I wouldn't have figured it was the 8kHz sample rate at fault, right?
Sure, if you stop after "nobody's vocal cords make noises above 4 kHz in normal conversation", but the rumbling of the vocal cords isn't the entire audio data which is present in person. Clicks of the tongue and smacking of the lips make much higher frequencies, and higher sample rates capture the timbre/shape of the soundwave instead of rounding it down to a smooth sine wave. Discord defaults to 64 kbps, but you can push it up to 96 kbps or 128 kbps with a Nitro membership, and it's not hard to hear an improvement at the higher bitrates. And if you've ever used Bluetooth audio, you know the difference in quality between the bidirectional call profile and the unidirectional music profile, and have wished for the bandwidth of the music profile with the low latency of the call profile.
> Sure, if you stop after "nobody's vocal cords make noises above 4 kHz in normal conversation"
Huh? What? That's not even remotely true.
If you read your comment out loud, the very first sound you'd make would have almost all of its energy concentrated between 4 and 10 kHz.
Human vocal cords constantly hit up to around 10 kHz, though auditory distinctiveness is more concentrated below 4 kHz. It is unevenly distributed though, with sounds like <s> and <sh> being (infamously) severely degraded by a 4 kHz cut-off.
AMR (adaptive multi-rate audio codec) can get down to 4.75 kbit/s when there's low bandwidth available, which is typically what people complain about as being terrible quality.
The speech codecs are complex and fascinating, very different from just doing a frequency filter and compressing.
The base is linear predictive coding (LPC), which encodes the voice based on a simple model of the human mouth and throat. Huge compression, but it sounds terrible. Then you take the error between the original signal and the LPC-encoded signal; this residual waveform is compressed heavily but more conventionally and transmitted along with the LPC parameters.
Phones also layer on voice activity detection, when you aren't talking the system just transmits noise parameters and the other end hears some tailored white noise. As phone calls typically have one person speaking at a time and there are frequent pauses in speech this is a huge win. But it also makes mistakes, especially in noisy environments (like call centers, voice calls are the business, why are they so bad?). When this happens the system becomes unintelligible because it isn't even trying to encode the voice.
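A toy sketch of the LPC-plus-residual idea above (Python/numpy; this is just the textbook autocorrelation method, not any real codec - AMR/CELP add pitch prediction, codebook search for the residual, and so on):

  import numpy as np
  from scipy.signal import lfilter

  def lpc(frame, order=10):
      # Autocorrelation method + Levinson-Durbin recursion.
      # Returns the analysis filter A(z) = [1, a1, ..., a_order].
      r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
      a = np.zeros(order + 1)
      a[0] = 1.0
      err = r[0]
      for i in range(1, order + 1):
          acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
          k = -acc / err                      # reflection coefficient
          prev = a.copy()
          for j in range(1, i):
              a[j] = prev[j] + k * prev[i - j]
          a[i] = k
          err *= 1.0 - k * k                  # prediction error shrinks each step
      return a

  fs = 8000
  t = np.arange(160) / fs                     # one 20 ms frame at 8 kHz
  frame = np.sin(2 * np.pi * 200 * t) + 0.1 * np.random.randn(160)  # fake "voiced" frame

  a = lpc(frame)                              # ~10 numbers model the vocal tract for this frame
  residual = lfilter(a, [1.0], frame)         # whatever the predictor couldn't explain
  reconstructed = lfilter([1.0], a, residual) # synthesis: predictor + residual = original

The ten-ish filter coefficients are the "huge compression but sounds terrible" part; spending most of the remaining bits on a heavily quantized version of the residual is what makes it sound like a voice again.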
The 8 kHz samples were encoded with relatively low-complexity PCM (G.711) at 8 bits per sample. That gets you to a 64 kbps data channel rate. This was the standard for "toll quality" audio. Not 8 kbps.
The 8kbps rates on cellular are the more complicated (relative to G.711) AMR-NB encoding. AMR supports voice rates from about 5-12kbps with a typical 8kbps rate. There's a lot more pre and post processing of the input signal and more involved encoding. There's a bit more voice information dropped by the encoder.
Part of the quality problem even today with VoLTE is different carriers support different profiles and calls between carriers will often drop down to the lowest common codec which is usually AMR-NB. There's higher bitrate and better codecs available in the standard but they're implemented differently by different carriers for shitty cellular carrier reasons.
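To spell out the G.711 arithmetic above, a rough sketch (Python; the companding shown is the continuous mu-law curve, whereas real G.711 uses a segmented piecewise-linear approximation of it):

  import numpy as np

  MU = 255.0  # mu-law parameter used in North America/Japan (A-law elsewhere)

  def mulaw_8bit(x):
      # Compand so quantization error is spread more evenly between quiet and loud
      # samples, then keep 8 bits per sample.
      x = np.clip(x, -1.0, 1.0)
      y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
      return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

  sample_rate = 8000        # samples/s, enough for ~4 kHz of voice band
  bits_per_sample = 8
  print(sample_rate * bits_per_sample)   # 64000 bps: the classic 64 kbps DS0 channel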
> The 8 kHz samples were encoded with relatively low-complexity PCM (G.711) at 8 bits per sample. That gets you to a 64 kbps data channel rate. This was the standard for "toll quality" audio. Not 8 kbps.
I'm a moron, thanks. I think I got the sample rate mixed up with the bitrate. Appreciate you clearing that up - and the other info!
And memory. In the heyday of ATM (late 90s) a few megabytes was quite expensive for a set-top box, so you couldn't buffer many seconds of compressed video.
Also, the phone companies had a pathological aversion to understanding Moore's law, because it suggested they'd have to charge half as much for bandwidth every 18 months. Long distance rates had gone down more like 50%/decade, and even that was too fast.
I worked on a network that used a protocol very similar to ATM (actually it was the first Iridium satellite network). An internet based on ATM would have been amazing. You’re basically guaranteeing a virtual switched circuit, instead of the packets we have today. The horror of packet switching is all the buffering it needs, since it doesn’t guarantee circuits.
Bandwidth is one thing, but the real benefit is that ATM also guaranteed minimal latencies. You could now shave off another 20-100ms of latency for your FaceTime calls, which is subtle but game changing. Just instant-on high def video communications, as if it were on closed circuits to the next room.
For the same reasons, the AI analogy could benefit from both huge processing as well as stronger algorithms.
> You’re basically guaranteeing a virtual switched circuit
Which means you need state (and the overhead that goes with it) for each connection within the network. That's horribly inefficient, and precisely the reason packet-switching won.
> An internet based on ATM would have been amazing.
No, we'd most likely be paying by the socket connection (as somebody has to pay for that state keeping overhead), which sounds horrible.
> You could now shave off another 20-100ms of latency for your FaceTime calls, which is subtle but game changing.
Maybe on congested Wi-Fi (where even circuit switching would struggle) or poorly managed networks (including shitty ISP-supplied routers suffering from horrendous bufferbloat). Definitely not on the majority of networks I've used in the past years.
> The horror of packet switching is all the buffering it needs [...]
The ideal buffer size is exactly the bandwidth-delay product. That's really not a concern these days anymore. If anything, buffers are much too large, causing unnecessary latency; that's where bufferbloat-aware scheduling comes in.
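For a sense of scale (the link speed and RTT here are just example numbers):

  # Bandwidth-delay product: the data "in flight" that a buffer needs to absorb.
  bandwidth = 100e6              # 100 Mbit/s link
  rtt = 0.050                    # 50 ms round-trip time
  print(bandwidth * rtt / 8)     # 625000.0 bytes, i.e. ~625 KB of buffer

Buffers much larger than that just add queueing delay, which is the bufferbloat problem mentioned above.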
The cost for interactive video would be a roughly 10x bandwidth requirement, basically to cover idle time. Not efficient but not impossible, and it definitely wouldn't change ISP business models.
The latency benefit would outweigh the cost. Just absolutely instant video interaction.
It is fascinating to think that before digital circuits phone calls were accomplished by an end-to-end electrical connection between the handsets. What luxury that must have been! If only those ancestors of ours had modems and computers to use those excellent connections for low-latency gaming... :-)
You’re arguing for a reduction in quality in internet services. People do notice those things. It’s like claiming people don’t care about slimmer iPhones. They do.
Man, I saw a presentation on Iridium when I was at Motorola in the early 90s, maybe 92? Not a marketing presentation - one where an engineer was talking, and had done their own slides.
What I recall is that it was at a time when Internet folks had made enormous advances in understanding congestion behavior in computer networks, and other folks (e.g. my division of Motorola) had put a lot of time into understanding the limited burstiness you get with silence suppression for packetized voice, and these folks knew nothing about it.
I remember my professor saying how the fixed cell size in ATM (53 bytes total, 48 of them payload) was a committee compromise. North America wanted a 64-byte payload, Europe wanted 32 bytes. The committee chose around the midway point.
Doesn’t your point about video compression tech support Nvidia’s bull case?
Better video compression led to an explosion in video consumption on the Internet, leading to much more revenue for companies like Comcast, Google, T-Mobile, Verizon, etc.
More efficient LLMs lead to much more AI usage. Nvidia, TSMC, etc will benefit.
No - because this either eliminates the majority of the GPU work entirely or shifts it to the CPU - and Nvidia does not sell CPUs.
If the AI market gets 10x bigger, and GPU work gets 50% smaller (which is still 5x larger than today) - but Nvidia is priced on 40% growth for the next ten years (28x larger) - there is a price mismatch.
It is theoretically possible for a massive reduction in GPU usage or shift from GPU to CPU to benefit Nvidia if that causes the market to grow enough - but it seems unlikely.
Also, I believe (someone please correct if wrong) DeepSeek is claiming a 95% overall reduction in GPU usage compared to traditional methods (not the 50% in the example above).
If true, that is a death knell for Nvidia's growth story after the current contracts end.
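Spelling out the 10x / 50% / 40%-growth arithmetic from the example above (illustrative numbers only):

  market_growth = 10                 # hypothetical: AI market grows 10x
  gpu_work_per_unit = 0.5            # hypothetical: GPU work per unit of AI halves
  implied_gpu_demand = market_growth * gpu_work_per_unit   # 5x today's demand

  priced_in = 1.40 ** 10             # 40% growth compounded over ten years
  print(implied_gpu_demand, round(priced_in, 1))           # 5 vs ~28.9x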
I can see close to zero possibility that the majority of the work will be shifted to the CPU. Anything a CPU can do can just be done better with specialised GPU hardware.
Then why do we have powerful CPUs instead of a bunch of specialized hardware? It's because the value of a CPU is in its versatility and ubiquity. If a CPU can do a thing good enough, then most programs/computers will do that thing on a CPU instead of having the increased complexity and cost of a GPU, even if a GPU would do it better.
We have both? Modern computing devices like smart phones use SoCs with integrated GPUs. GPUs aren't really specialized hardware, either, they are general purpose hardware useful in many scenarios (built for graphics originally but clearly useful in other domains including AI).
People have been saying the exact same thing about other workloads for years, and always been wrong. Mostly claiming custom chips or FPGAs will beat out general purpose CPUs.
Yes, I was too hasty in my response. I should have been more specific that I mean ML/AI type tasks. I see no way that we end up on general purpose CPUs for this.
In terms of inference (and training) of AI models, sure, most things that a CPU core can do would be done cheaper per unit of performance on either typical GPU or NPU cores.
On desktop, CPU decoding is passable but it's still better to have a graphics card for 4K. On mobile, you definitely want to stick to codecs like H264/HEVC/AV1 that are supported in your phone's decoder chips.
CPU chipsets have borrowed video decoder units and SSE instructions from GPU-land, but the idea that video decoding is a generic CPU task now is not really true.
Now maybe every computer will come with an integrated NPU and it won't be made by Nvidia, although so far integrated GPUs haven't supplanted discrete ones.
I tend to think today's state-of-the-art models are ... not very bright, so it might be a bit premature to say "640B parameters ought to be enough for anybody" or that people won't pay more for high-end dedicated hardware.
> Now maybe every computer will come with an integrated NPU and it won't be made by Nvidia, although so far integrated GPUs haven't supplanted discrete ones.
Depends on what form factor you are looking at. The majority of computers these days are smart phones, and they are dominated by systems-on-a-chip.
That's also what AVX is, but with a conservative number of threads. If you really understand your problem, I don't see why you would need 32 threads of much smaller data size, or why you would want that far away from your CPU.
Whether your new coprocessor or instructions look more like a GPU or something else doesn't really matter once we are done squinting and calling these graphics-like problems, and/or claiming they need a lot more than a middle-class PC.
It led to more revenue for the industry as a whole. But not necessarily for the individual companies that bubbled the hardest: Cisco stock is still to this day lower than it was at its peak in 2000, to point to a significant company that sold actual physical infra products necessary for the internet and is still around and profitable to this day. (Some companies that bubbled did quite well; AMZN is like 75x from where it was in 2000. But that's a totally different company that captured an enormous amount of value from AWS that was not visible to the market in 2000, so it makes sense.)
If stock market-cap is (roughly) the market's aggregated best guess of future profits integrated over all time, discounted back to the present at some (the market's best guess of the future?) rate, then increasing uncertainty about the predicted profits 5-10 years from now can have enormous influence on the stock. Does NVDA have an AWS within it now?
>It led to more revenue for the industry as a whole. But not necessarily for the individual companies that bubbled the hardest: Cisco stock is still to this day lower than it was at its peak in 2000, to point to a significant company that sold actual physical infra products necessary for the internet and is still around and profitable to this day. (Some companies that bubbled did quite well; AMZN is like 75x from where it was in 2000. But that's a totally different company that captured an enormous amount of value from AWS that was not visible to the market in 2000, so it makes sense.)
Cisco in 1994: $3.
Cisco after dotcom bubble: $13.
So is Nvidia's stock price closer to 1994 or 2001?
I agree that advancements like DeepSeek, like transformer models before it, are just going to end up increasing demand.
It’s very shortsighted to think we’re going to need fewer chips because the algorithms got better. The system became more efficient, which causes induced demand.
If you normalize Nvidia's gross margin and take competitors into account, sure. But its current high margin is driven by Big Tech FOMO. Do keep in mind that going from a 90% margin (10x cost) to a 50% margin (2x cost) is a 5x price reduction.
Because DeepSeek demonstrates that loads of compute isn't necessary for high-performing models, and so we won't need as much, or as powerful, hardware as was previously thought, which is what Nvidia's valuation is based on?
That's assuming there isn't demand for more powerful models, there's still plenty of room for improvement from the current generation. We didn't stop at GPT-3 level models when that was achieved.
Not only are 10-100x changes disruptive, but the players who don't adopt them quickly are going to be the ones who continue to buy huge amounts of hardware to pursue old approaches, and it's hard for incumbent vendors to avoid catering to their needs, up until it's too late.
When everyone gets up off the ground after the play is over, Nvidia might still be holding the ball but it might just as easily be someone else.
Yes, over the long haul, probably. But as far as individual investors go they might not like that Nvidia.
Anyone currently invested is presumably in because they like the insanely high profit margin, and this is apt to quash that. There is now much less reason to give your first born to get your hands on their wares. Comcast, Google, T-Mobile, Verizon, etc., and especially those not named Google, have nothingburger margins in comparison.
If you are interested in what they can do with volume, then there is still a lot of potential. They may even be more profitable on that end than a margin play could ever hope for. But that interest is probably not from the same person who currently owns the stock, it being a change in territory, and there is apt to be a lot of instability as stock changes hands from the one group to the next.
That would be an unusual situation for an ETF. An ETF does not usually extend ownership of the underlying investment portfolio. An ETF normally offers investors the opportunity to invest in the ETF itself. The ETF is what you would be invested in. Your concern as an investor in an ETF would only be with the properties of the ETF, it being what you are invested in, and this seems to be true in your case as well given how you describe it.
Are you certain you are invested in Nvidia? The outcome of the ETF may depend on Nvidia, but it may also depend on how a butterfly in Africa happens to flap its wings. You aren't, by any common definition found within this type of context, invested in that butterfly.
Technically, all the Nvidia stock (and virtually all stocks in the US) are owned by Cede and Co. So Nvidia has only one investor.[0] There's several layers of indirection between your Robinhood portfolio and the actual Nvidia shares, even if Robinhood mentions NVDA as a position in your portfolio.
You will find that the connection between ETFs and the underlying assets in the index is much more like the connection between your Robinhood portfolio and Nvidia, than the connection between butterflies and thunderstorms.
[0] At least for its stocks. Its bonds are probably held in different but equally weird ways.
> Technically, all the Nvidia stock (and virtually all stocks in the US) are owned by Cede and Co.
Technically, but they extend ownership. An ETF is a different type of abstraction. Which you already know because you spoke about that abstraction in your original comment, so why play stupid now?
It seems even more stark. The current and projected energy costs for AI are staggering. At the same time, I think it has been Microsoft that has been publishing papers on smaller LLMs (so-called small language models) that are more targeted and still achieve a fairly high "accuracy rate."
Didn't TSMC say that Sam Altman came for a visit and said they needed $7T in investment to keep up with the pending demand?
This stuff is all super cool and fun to play with, I'm not a naysayer, but it almost feels like these current models are "bubble sort" and who knows how it will look when the "quicksort" for them gets invented.
>but there's no physical limit preventing them from being discovered, and the disruption when someone invents a new algorithm can be nearly immediate.
The rise of the net is Jevons paradox fulfilled. The orders of magnitude less bandwidth needed per cat video drove much more than that in overall growth in demand for said videos. During the dotcom bubble's collapse, bandwidth use kept going up.
Even if there is a near-term bear case for NVDA (dotcom bubble/bust), history indicates a bull case for the sector overall and related investments such as utilities (the entire history of the tech sector from 1995 to today).
The web didn't go from streaming 480p straight to 4k. There were a couple of intermediate jumps in pixel count that were enabled in large part by better compression. Notably, there was a time period where it was important to ensure your computer had hardware support for H.264 decode, because it was taxing on low-power CPUs to do at 1080p and you weren't going to get streamed 1080p content in any simpler, less efficient codec.
Correct. DCT maps N real numbers to N real numbers. It reorganizes the data to make it more amenable to compression, but DCT itself doesn't do any compression.
The real compression comes from quantization and entropy coding (Huffman coding, arithmetic coding, etc.).
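A toy illustration of that split (Python/numpy; the uniform quantizer step here is arbitrary and stands in for a real codec's quantization tables, and the entropy-coding stage is omitted):

  import numpy as np

  N = 8
  # Orthonormal DCT-II matrix: C @ x applies a 1-D DCT, so C @ block @ C.T is the 2-D DCT.
  n, k = np.meshgrid(np.arange(N), np.arange(N))
  C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
  C[0, :] /= np.sqrt(2.0)

  # A smooth gradient standing in for typical image content.
  block = np.add.outer(np.linspace(-80, 80, N), np.linspace(-40, 40, N))

  coeffs = C @ block @ C.T          # 64 numbers in, 64 numbers out: no compression yet

  # The lossy step: coarse quantization zeroes out most (mainly high-frequency)
  # coefficients; entropy coding then stores the surviving few and the runs of zeros cheaply.
  step = 40.0
  quantized = np.round(coeffs / step)
  print(np.count_nonzero(quantized), "of", N * N, "coefficients survive")

  reconstructed = C.T @ (quantized * step) @ C   # inverse DCT of the quantized block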
> DCT compression, also known as block compression, compresses data in sets of discrete DCT blocks.[3] DCT blocks sizes including 8x8 pixels for the standard DCT, and varied integer DCT sizes between 4x4 and 32x32 pixels.[1][4] The DCT has a strong energy compaction property,[5][6] capable of achieving high quality at high data compression ratios.[7][8] However, blocky compression artifacts can appear when heavy DCT compression is applied.
DeepSeek just further reinforces the idea that there is a first-move disadvantage in developing AI models.
When someone can replicate your model for 5% of the cost in 2 years, I can only see 2 rational decisions:
1) Start focusing on cost efficiency today to reduce the advantage of the second mover (i.e. trade growth for profitability)
2) Figure out how to build a real competitive moat through one or more of the following: economies of scale, network effects, regulatory capture
On the second point, it seems to me like the only realistic strategy for companies like OpenAI is to turn themselves into a platform that benefits from direct network effects. Whether that's actually feasible is another question.
This is wrong. First mover advantage is strong. This is why OpenAI is much bigger than Mistral despite what you said.
First mover advantage acquired and keeps subscribers.
No one really cares if you matched GPT4o one year later. OpenAI has had a full year to optimize the model, build tools around the model, and used the model to generate better data for their next generation foundational model.
I think it's worth double clicking here. Why did Google have significantly better search results for a long time?
1) There was a data flywheel effect, wherein Google was able to improve search results by analyzing the vast amount of user activity on its site.
2) There were real economies of scale in managing the cost of data centers and servers
3) Their advertising business model benefited from network effects, wherein advertisers don't want to bother giving money to a search engine with a much smaller user base. This profitability funded R&D that competitors couldn't match.
There are probably more that I'm missing, but I think the primary takeaway is that Google's scale, in and of itself, led to a better product.
Can the same be said for OpenAI? I can't think of any strong economies of scale or network effects for them, but maybe I'm missing something. Put another way, how does OpenAI's product or business model get significantly better as more people use their service?
You are forgetting a bit. I worked in some of the large datacenters where both Google and Yahoo had cages.
1) Google copied the hotmail model of strapping commodity PC components to cheap boards and building software to deal with complexity.
2) Yahoo had a much larger cage, filled with very very expensive and large DEC machines, with one poor guy sitting in a desk in there almost full time rebooting the systems etc....I hope he has any hearing left today.
3) Just right before the .com crash, I was in a cage next to Google's racking dozens of brand new Netra T1s, which were pretty slow and expensive...that company I was working for died in the crash.
Google grew to be profitable because they controlled costs, invested in software vs service contracts and enterprise gear, had a simple non-intrusive text based ad model etc...
Most of what you mention above came well after that model of focusing on users and thrift allowed them to scale, and is survivorship bias. Internal incentives that directed capital expenditures toward meeting the mission rather than protecting people's backs were absolutely related to their survival.
Even though it was a metasearch, my personal preference was SavvySearch until it was bought and killed, or whatever that story was.
In theory, the more people use the product, the more OpenAI knows what they are asking about and what they do after the first result, the better it can align its model to deliver better results.
A similar dynamic occurred in the early days of search engines.
I call it the experience flywheel. Humans come with problems, the AI assistant generates some ideas, the human tries them out and comes back to iterate. The model gets feedback on prior ideas. So you could say the AI tested an idea in the real world, using a human. This happens many times over for 300M users at OpenAI. They put a trillion tokens into human brains, and as many into their logs. The influence is bidirectional. People adapt to the model, and the model adapts to us. But that is in theory.
In practice I never heard OpenAI mention how they use chat logs for improving the model. They are either afraid to say, for privacy reasons, or want to keep it secret for technical advantage. But just think about the billions of sessions per month. A large number of them contain extensive problem solving. So the LLMs can collect experience, and use it to improve problem solving. This makes them into a flywheel of human experience.
The same one that underpins the entire existence of a little company called Spotify: I'm just too lazy to cancel my subscription and move to a newer player.
Not exactly a good sign for OpenAI considering Spotify has no power to increase prices enough such that it can earn a decent profit. Spotify’s potential is capped at whatever Apple/Amazon/Alphabet let them earn.
All these "OpenAI has no moat" arguments will only make sense whenever there's a material, observable (as in not imaginary), shift on their market share.
OpenAI does not have a business model that is cashflow positive at this point and/or a product that gives them a significant leg up in the same moat sense Office/Teams might give to Microsoft.
High interest rates are supposed to force the remaining businesses out there to be profitable, so in theory, the startups of today should be far faster to profitability or they burn out.
Nobody expects it, but what we know for sure is that they have burnt billions of dollars. If other startups can get there spending millions, the fact is that OpenAI won't ever be profitable.
And more important (for us), let the hiring frenzy start again :)
They have a ton of revenue and high gross margins. They burn billions because they need to keep training ever better models until the market slows and competition consolidates.
When the market matures, there will be fewer competitors so they won’t need to sustain the level of investment.
The market always consolidates when it matures. Every time. The market always consolidates into 2-3 big players. Often a duopoly. OpenAI is trying to be one of the two or three companies left standing.
> First mover advantage acquired and keeps subscribers.
Does it? As a chat-based (Claude Pro, ChatGPT Plus etc.) user, LLMs have zero stickiness to me right now, and the APIs hardly can be called moats either.
If it's for the mass consumer market then it does matter. Ask any non-technical person around you. High chance is that they know ChatGPT but can't name a single other AI model or service. Gemini, just a distant maybe. Claude, definitely not -- I'm positive I'd be hard pressed to find anyone among my technical friends who knows about Claude.
> DeepSeek just further reinforces the idea that there is a first-move disadvantage in developing AI models.
You are assuming that what DeepSeek achieved can be reasonably easily replicated by other companies. Then the question is: when all the big techs and tons of startups in China and the US are involved, how come none of those companies succeeded?
There seems to have been a 100-fold uptick in jingoists in the last 3-4 years, which makes my head hurt, but I think there is no consistent "underestimation" in academic circles? I think I have read articles about up-and-coming Chinese STEM for like 20 years.
Yes, for people in academia the trend is clear, but it seems that WallStreet didn't believe this was possible. They assume that spending more money is all you need to dominate technology. Wrong! Technology is about human potential. If you have less money but bigger investment in people you'll win the technological race.
I think Wall Street is in for surprise as they have been profiting from liquidating the inefficiency of worker trust and loyalty for quite some time now.
I think they think American engineering excellence was due to neoliberal ingenuity vis-a-vis the USSR, not the engineers and the transfer of academic legacy from generation to generation.
This is even more apparent when large tech corporations are, supposedly, in a big competition but at the same time firing thousands of developers and scientists. Are they interested in making progress or just reducing costs?
What does DeepSeek, or really High-Flyer, do that is particularly exceptional regarding employees? HFT shops and other elite firms like law firms or hedge funds are known to have pretty zany benefits.
That doesn't change the calculus regarding the actions you would pick externally; in fact it only strengthens the case for increased tech restrictions and more funding.
Which brings up the question: if LLMs are an asset of such strategic value, why did China allow DeepSeek to be released?
I see two possibilities here: either the CCP is not as all-reaching as we think, or the value of the technology isn't critical and the release was cleared with the CCP, maybe even timed to come right after Trump's announcement of American AI supremacy.
I really doubt there was any intention behind it at all. I bet deepseek themselves are surprised at the impact this is having, and probably regret releasing so much information into the open.
It is hard to estimate how much of it is "didn't care", "didn't know" or "did it", I think. Rather pointless unless there are public party discussions about it to read.
It will be assumed by the American policy establishment that this represents what the CCP doesn't consider important, meaning that they have even better stuff in store. It will also be assumed that this was timed to take a dump on Trump's announcement, like you said.
And it did a great job. Nvidia stock's sunk, and investors are going to be asking if it's really that smart to give American AI companies their money when the Chinese can do something similar for significantly less money.
I mean, it's a strategic asset in the sense that it's already devalued a lot of the American tech companies because they're so heavily invested in AI. Just look at NVDA today.
You're making some big assumptions projecting into the future: one, that DeepSeek takes market position; two, that the information they have released is honest regarding training usage, spend, etc.
There's a lot more still to unpack, and I don't expect this to stay solely in the tech realm. Seems too politically sensitive.
I feel like AI tech just reverse scales and reverse flywheels, unlike the tech giant walls and moats now, and I think that is wonderful. OpenAI has really never made sense from a financial standpoint and that is healthier for humans. There’s no network effect because there’s no social aspect to AI chatbots. I can hop on DeepSeek from Google Gemini or OpenAI at ease because I don’t have to have friends there and/or convince them to move. AI is going to be a race to the bottom that keeps prices low to zero. In fact I don’t know how they are going to monetize it at all.
DeepSeek is not profitable. As far as I know, they don’t have any significant revenue from their models. Meanwhile, OpenAI has $3.7b in revenue last reported and has high gross margins.
DeepSeek's inference API has positive margins. This however does not take into account R&D costs like salaries and training. I believe OpenAI is the same in these respects, at least before now.
Even if DeepSeek has figured out how to do more (or at least as much) with less, doesn't the Jevons Paradox come into play? GPU sales would actually increase because even smaller companies would get the idea that they can compete in a space that only 6 months ago we assumed would be the realm of the large mega tech companies (the Metas, Googles, OpenAIs) since the small players couldn't afford to compete. Now that story is in question since DeepSeek only has ~200 employees and claims to be able to train a competitive model for about 20X less than the big boys spend.
My interpretation is that yes in the long haul, lower energy/hardware requirements might increase demand rather than decrease it. But right now, DeepSeek has demonstrated that the current bottleneck to progress is _not_ compute, which decreases the near term pressure on buying GPUs at any cost, which decreases NVIDIA's stock price.
Short term, I 100% agree, but remains to be seen what "short" means. According to at least some benchmarks, Deepseek is two full orders of magnitude cheaper for comparable performance. Massive. But that opens the door for much more elaborate "architectures" (chain of thought, architect/editor, multiple choice) etc, since it's possible to run it over and over to get better results, so raw speed & latency will still matter.
I think it's worth carefully pulling apart _what_ DeepSeek is cheaper at. It's somewhat cheaper at inference (0.3 OOM), and about 1-1.5 OOM cheaper for training (Inference costs: https://www.latent.space/p/reasoning-price-war)
It's also worth keeping in mind that depending on benchmark, these values change (and can shrink quite a bit)
And it's also worth keeping in mind that the drastic drop in training cost(if reproducible) will mean that training is suddenly affordable for a much larger number of organizations.
I'm not sure the impact on GPU demand will be as big as people assume.
It does, but proving that it can be done with cheaper (and, more importantly for Nvidia, lower-margin) chips breaks the spell that Nvidia will just be eating everybody's lunch until the end of time.
The sweet spot for running local LLMs (from what I'm seeing on forums like r/localLlama) is 2 to 4 3090s, each with 24GB of VRAM. Nvidia (or AMD or Intel) would clean up if they offered a card with 3090-level performance but with 64GB of VRAM. Doesn't have to be the leading-edge GPU, just a decent GPU with lots of VRAM. This is kind of what Digits will be (though the memory bandwidth is going to be slower because it'll be DDR5) and kind of what AMD's Strix Halo is aiming for - unified memory systems where the CPU & GPU have access to the same large pool of memory.
The issue here is that, even with a lot of VRAM, you may be able to run the model, but with a large context, it will still be too slow. (For example, running LLaMA 70B with a 30k+ context prompt takes minutes to process.)
Because if you don't have infinite money, considering whether to buy a thing is about the ratio of price to performance, not just performance. If you can get enough performance for your needs out of a cheaper chip, you buy the cheaper chip.
The AI industry isn't pausing because DeepSeek is good enough. The industry is in an arms race to AGI. Having a more efficient method to train and use LLMs only accelerates progress, leading to more chip demand.
Important to note: the $5 million alleged cost is just the GPU compute cost for the final version of the model; it's not the cumulative cost of the research to date.
The analogous costs would be what OpenAI spent to go from GPT 4 to GPT 4o (i.e., to develop the reasoning model from the most up-to-date LLM model). $5 million is still less than what OpenAI spent but it's not an order of magnitude lower. (OpenAI spent up to $100 million on GPT4 but a fraction of that to get GPT 4o. Will update comment if I can find numbers for 4o before edit window closes)
It doesn't make sense to compare individual models. A better way is to look at total compute consumed, normalized by the output. In the end what counts is the cost of providing tokens.
As pointed out in the article, Nvidia has several advantages including:
- Better Linux drivers than AMD
- CUDA
- pytorch is optimized for Nvidia
- High-speed interconnect
Each of the advantages is under attack:
- George Hotz is making better drivers for AMD
- MLX, Triton, JAX: Higher level abstractions that compile down to CUDA
- Cerebras and Groq solve the interconnect problem
The article concludes that NVIDIA faces an unprecedented convergence of competitive threats. The flaw in the analysis is that these threats are not unified. Any serious competitor must address ALL of Nvidia's advantages. Instead Nvidia is being attacked by multiple disconnected competitors, and each of those competitors is only attacking one Nvidia advantage at a time. Even if each of those attacks are individually successful, Nvidia will remain the only company that has ALL of the advantages.
* Groq can't produce more hardware past their "demo". It seems like they haven't grown capacity in the years since they announced, and they switched to a complete SaaS model and don't even sell hardware anymore.
The same Hotz who lasted like 4 weeks at Twitter after announcing that he'd fix everything? It doesn't really inspire a ton of confidence that he can single handedly take down Nvidia...
They have their own nn, etc. libraries, so adapting should be fairly focused, and AMD drivers have a hilariously bad reputation historically among people who program GPUs (I've been bitten a couple of times myself by weirdness).
I think you should consider it this way: if they're trying to avoid Nvidia and make sure their code isn't tied to Nvidia-isms, and AMD is troublesome enough for the basics, then the step to customized solutions is small enough to be worthwhile for something even cheaper than AMD.
> Any serious competitor must address ALL of Nvidia's advantages.
Not really, his article focuses on Nvidia's being valued so highly by stock markets, he's not saying that Nvidia's destined to lose its advantage in the space in the short term.
In any case, I also think that the likes of MSFT/AMZN/etc will be able to reduce their capex spending eventually by being able to work on a well integrated stack on their own.
They have an enormous amount of catching up to do, however; Nvidia have created an entire AI ecosystem that touches almost every aspect of what AI can do. Whatever it is, they have a model for it, and a framework and toolkit for working with or extending that model - and the ability to design software and hardware in lockstep. Microsoft and Amazon have a very diffuse surface area when it comes to hardware, and being a decent generalist doesn’t make you a good specialist.
Nvidia are doing phenomenal things with robotics, and that is likely to be the next shoe to drop; they are positioned for another catalytic moment similar to the one we have seen with LLMs.
I do think we will see some drawback or at least deceleration this year while the current situation settles in, but within the next three years I think we will see humanoid robots popping up all over the place, particularly as labour shortages arise due to political trends - and somebody is going to have to provide the compute, both local and cloud, and the vision, movement, and other models. People will turn to the sensible and known choice.
So yeah, what you say is true, but I don't think it is going to have an impact on the trajectory of Nvidia.
>So how is this possible? Well, the main reasons have to do with software— better drivers that "just work" on Linux and which are highly battle-tested and reliable (unlike AMD, which is notorious for the low quality and instability of their Linux drivers)
This does not match my experience from the past ~6 years of using AMD graphics on Linux. Maybe things are different with AI/Compute, I've never messed with that, but in terms of normal consumer stuff the experience of using AMD is vastly superior than trying to deal with Nvidia's out-of-tree drivers.
He's setting up a case for shorting the stock, ie if the growth or margins drop a little from any of these (often well-funded) threats. The accuracy of the article is a function of the current valuation.
Exactly. You just need to see a slight deceleration in projected revenue growth (which has been running 120%+ YoY recently) and some downward pressure on gross margins, and maybe even just some market share loss, and the stock could easily fall 25% from that.
You have to look at non-GAAP numbers, and therefore looking at forward PE ratios is necessary. When you look at that, AMD is cheaper than NVDA. Moreover, the reason AMD's PE ratio looks high is that they bought Xilinx, and the way that acquisition is accounted for (to save on taxes) makes their PE ratio look really high.
That is extraordinarily simplistic. If NVDA is slowing and AMD has gains to realize compared to NVDA, then the 10x difference in market cap would imply that AMD is the better buy. Which is why I am long in AMD. You can't just look at the current P/E delta. You have to look at expectations of one vs the other. AMD gaining 2x over NVDA means they are approximately equivalently valued. If there are unrealized AI related gains all bets are off. AMD closing 50% of the gap in market cap value between NVDA and AMD means AMD is ~2.5x undervalued.
Disclaimer: long AMD, and not precise on percentages. Just illustrating a point.
The point is, it should not be taken for granted that NVDA is overvalued. Their P/E is low enough that if you’re going to state that they are overvalued you have to make the case. The article while well written, fails to make the case because it has a flaw: it assumes that addressing just one of Nvidia’s advantages is enough to make it crash and that’s just not true.
My point is that you have to make the case for anything being over/undervalued. The null hypothesis is that the market has correctly valued it, after all.
If, medium to long term, you believe the space will eventually get commoditized, the bear case is obvious. And based on history there's a pretty high likelihood of that happening.
No, that's not true. Hedge funds get paid so well because getting a small percentage of a big bag of money is still a big bag of money. This statement is more true the closer the big bag of money is to infinity.
NVDA is valued at $3.5 trillion, which means investors think it will grow to around $1 trillion in yearly revenue. Current revenue is around $35 billion per quarter, so call it $140 billion yearly. Investors are betting on a 7x increase in revenue. Not impossible, sounds plausible but you need to assume AMD, INTC, GOOG, AMZN, and all the others who make GPUs/TPUs either won't take market share or the market will be worth multiple trillions per year.
Tech companies are valued higher because lots of people think there's still room for the big tech companies to consolidate market share and for the market itself to grow, especially as they all race towards AI. Low interest rates, tech and AI hype add to it.
Funny timing though, today NVDA lost $589 billion in market cap as the market got spooked.
Unless something radically changed in the last couple years, I am not sure where you got this from? (I am specifically talking about GPUs for computer usage rather than training/inference)
> Unless something radically changed in the last couple years, I am not sure where you got this from?
This was the first thing that stuck out to me when I skimmed the article, and the reason I decided to invest the time reading it all. I can tell the author knows his shit and isn't just parroting everyone's praise for AMD Linux drivers.
> (I am specifically talking about GPUs for computer usage rather than training/inference)
Same here. I suffered through the Vega 64 after everyone said how great it is. So many AMD-specific driver bugs, AMD driver devs not wanting to fix them for non-technical reasons, so many hard-locks when using less popular software.
The only complaints about Nvidia drivers I found were "it's proprietary" and "you have to rebuild the modules when you update the kernel" or "doesn't work with wayland".
I'd hesitate to ever touch an AMD GPU again after my experience with it; I haven't had a single hiccup for years after switching to Nvidia.
Another ding against Nvidia for Linux desktop use is that only some distributions make it easy to install and keep the proprietary drivers updated (e.g. Ubuntu) and/or ship variants with the proprietary drivers preinstalled (Mint, Pop!_OS, etc).
This isn’t a barrier for Linux veterans but it adds significant resistance for part-time users, even those that are technically inclined, compared to the “it just works” experience one gets with an Intel/AMD GPU under just about every Linux distro.
they are, unless you get distracted by things like licensing and out of tree drivers and binary blobs. If you'd rather pontificate about open source philosophy and rights than get stuff done, go right ahead.
The unification of the flaws is the scarcity of H100s
He says this and talks about it in The Fallout section: even at BigCos with megabucks, the teams are starved for time on the Nvidia chips, and if these innovations work, other teams will use them, and then boom, Nvidia's moat is truncated somehow, which doesn't look good at such lofty multiples.
George Hotz is a hot Internet celebrity that has basically accomplished nothing of value but has a large cult following. You can safely ignore.
(Famous for hacking the PS3–except he just took credit for a separate group’s work. And for making a self-driving car in his garage—except oh wait that didn’t happen either.)
He took an “internship” at Twitter/X with the stated goal of removing the login wall, apparently failing to realize that the wall was a deliberate product decision, not a technical challenge. Now the X login wall is more intrusive than ever.
Yes, but it's worth mentioning that the break consisted of opening up the phone and soldering on a bypass for the carrier card locking logic. That certainly required some skills to do, but is not an attack Apple was defending against. This unlocking break didn't really lead to anything, and was unlike the later software unlocking methods that could be widely deployed.
You’re not wrong, but after all these years it’s fair to give benefit of the doubt - geohot may have grown as a person. The PS3 affair was incredibly disappointing.
Given the number of times he has been on the news for bombastic claims he doesn’t follow through on, I don’t think we need to guess. He hasn’t changed.
What specifically is in comma.ai that makes it less technically impressive? Comma.ai looks like epic engineering to me. I haven't made any self driving cars.
Why do you think otherwise? Can you share specific details?
In which way? As a user who switched from an AMD-GPU to Nvidia-GPU, I can only report a continued amount of problems with NVIDIAs proprietary driver, and none with AMD. Is this maybe about the open source-drivers or usage for AI?
A new entrant, with an order of magnitude advantage in e.g. cost or availability or exportability, can succeed even with poor drivers and no CUDA etc. Its only when you cost nearly as much as NVidia that the tooling costs become relevant.
Those are definitely not the limiting factors here.
Not nearly all data centers are water cooled, and there is this amazing technology that can convert sunlight into electricity in a relatively straightforward way.
AI workloads (at least training) are just about as geographically distributeable as it gets due to not being very latency-sensitive, and even if you can't obtain sufficient grid interconnection or buffer storage, you can always leave them idle at night.
Solar microgrids are cheaper and faster than nuclear. New nuclear isn't happening on the timescales that matter, even assuming significant deregulation.
Well, prediction is very difficult, especially with respect to the future. But the fundamentals look good.
Current world marketed energy consumption is about 18 terawatts. Current mainstream solar panels are 21% efficient. At this efficiency, the terrestrial solar resource is about 37000 terawatts, 2000 times larger than the entire human economy:
~ $ units
Currency exchange rates from exchangerate-api.com (USD base) on 2024-11-25
Consumer price index data from US BLS, 2024-11-24
7290 units, 125 prefixes, 169 nonlinear units
You have: 21% solarirradiance circlearea(earthradius)
You want: TW
* 36531.475
/ 2.7373655e-05
So, once datacenters are using seven hundred thousand times more power than currently, we might need to seek power sources for them other than terrestrial solar panels running microgrids. Solar panels in space, for example.
You could be forgiven for wondering why this enormous resource has taken so long to tap into and why the power grid is still largely fossil-fuel-powered. The answer is that building fossil fuel plants only costs on the order of US$1–4 per watt (either nameplate or average), and until the last few years, solar panels cost so much more than that that even free "fuel" wasn't enough to make them economically competitive. See https://www.eia.gov/analysis/studies/powerplants/capitalcost... for example.
Today, however, solar panels cost US$0.10 per peak watt, which works out to about US$0.35 to US$1 per average watt, depending largely on latitude. This is 25% lower than the price of even a year ago and a third of the price of two years ago. https://www.solarserver.de/photovoltaik-preis-pv-modul-preis...
>Now, you still want to train the best model you can by cleverly leveraging as much compute as you can and as many trillion tokens of high quality training data as possible, but that's just the beginning of the story in this new world; now, you could easily use incredibly huge amounts of compute just to do inference from these models at a very high level of confidence or when trying to solve extremely tough problems that require "genius level" reasoning to avoid all the potential pitfalls that would lead a regular LLM astray.
I think this is the most interesting part. We always knew a huge fraction of the compute would be on inference rather than training, but it feels like the newest developments are pushing this even further towards inference.
Combine that with the fact that you can run the full R1 (680B) distributed on 3 consumer computers [1].
If most of NVIDIAs moat is in being able to efficiently interconnect thousands of GPUs, what happens when that is only important to a small fraction of the overall AI compute?
Conversely, how much larger can you scale if frontier models only currently need 3 consumer computers?
Imagine having 300. Could you build even better models? Is DeepSeek the right team to deliver that, or can OpenAI, Meta, HF, etc. adapt?
Going to be an interesting few months on the market. I think OpenAI lost a LOT in the board fiasco. I am bullish on HF. I anticipate Meta will lose folks to brain drain in response to management equivocation around company values. I don't put much stock into Google or Microsoft's AI capabilities, they are the new IBMs and are no longer innovating except at obvious margins.
Google is silently catching up fast with Gemini. They're also pursuing next gen architectures like Titan. But most importantly, the frontier of AI capabilities is shifting towards using RL at inference (thinking) time to perform tasks. Who has more data than Google there? They have a gargantuan database of queries paired with subsequent web nav, actions, follow up queries etc. Nobody can recreate this, Bing failed to get enough marketshare. Also, when you think of RL talent, which company comes to mind? I think Google has everyone checkmated already.
Can you say more about using RL at inference time, ideally with a pointer to read more about it? This doesn’t fit into my mental model, in a couple of ways. The main way is right in the name: “learning” isn’t something that happens at inference time; inference is generating results from already-trained models. Perhaps you’re conflating RL with multistage (e.g. “chain of thought”) inference? Or maybe you’re talking about feeding the result of inference-time interactions with the user back into subsequent rounds of training? I’m curious to hear more.
I wasn't clear. Model weights aren't changing at inference time. I meant that at inference time the model will output a sequence of thoughts and actions to perform tasks given to it by the user. For instance, to answer a question it will search the web, navigate through some sites, scroll, summarize, etc. You can model this as a game played by emitting a sequence of actions in a browser. RL is the technique you want for training this component. To scale this up you need a massive number of examples of sequences of actions taken in the browser, the outcome each led to, and a label for whether that outcome was desirable or not. I am saying that by recording users googling stuff and emailing each other for decades, Google has this massive dataset to train their RL-powered browser-using agent. DeepSeek proving that simple RL can be cheaply applied to a frontier LLM and have reasoning organically emerge makes this approach more obviously viable.
Makes sense, thanks. I wonder whether human web-browsing strategies are optimal for use in a LLM, e.g. given how much faster LLMs are at reading the webpages they find, compared to humans? Regardless, it does seem likely that Google’s dataset is good for something.
How quickly the narrative went from 'Google silently has the most advanced AI but they are afraid to release it' to 'Google is silently catching up' all using the same 'core Google competencies' to infer Google's position of strength. Wonder what the next lower level of Google silently leveraging their strength will be?
Google is clearly catching up. Have you tried the recent Gemini models? Have you tried deep research? Google is like a ship that is hard to turn around but also hard to stop once in motion.
It seems like there is MUCH to gain by migrating to this approach - and, in theory, the cost of switching should be small relative to the rewards to reap.
I expect all the major players are already working full-steam to incorporate this into their stacks as quickly as possible.
IMO, this seems incredibly bad for Nvidia, and incredibly good for everyone else.
I don't think this seems particularly bad for ChatGPT. They've built a strong brand. This should just help them reduce - by far - one of their largest expenses.
They'll have a slight disadvantage to, say, Google - who can much more easily switch from GPU to CPU. ChatGPT could have some growing pains there. Google would not.
> I don't think this seems particularly bad for ChatGPT. They've built a strong brand. This should just help them reduce - by far - one of their largest expenses.
Often expenses like that are keeping your competitors away.
Yes, but it typically doesn't matter if someone can reach parity or even surpass you - they have to surpass you by a step function to take a significant number of your users.
This is a step function in terms of efficiency (which presumably will be incorporated into ChatGPT within months), but not in terms of end user experience. It's only slightly better there.
One data point, but my ChatGPT subscription is cancelled every time, so I make the decision to resub every month. And because the cost of switching is essentially zero, the moment a better service is out there I will switch in an instant.
Would it not be useful to have multiple independent AIs observing and interacting to build a model of the world? I'm thinking something roughly like the "counselors" in the Civilization games, giving defense/economic/cultural advice, but generalized over any goal-oriented scenario (and including one to take the "user" role). A group of AIs with specific roles interacting with each other seems like a good area to explore, especially now given the downward scalability of LLMs.
This is exactly where DeepSeek's enhancements come into play. Essentially, DeepSeek lets the model think out loud via chain of thought (o1 and Claude also do this), but DS also does not supervise the chain of thought, and simply rewards CoTs that get the answer right. This is just one of the half dozen training optimizations that DeepSeek has come up with.
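A minimal sketch of that outcome-only reward idea (the <think>/<answer> tag format here is an illustrative assumption, not necessarily DeepSeek's exact output format):

    import re

    def outcome_reward(model_output: str, gold_answer: str) -> float:
        # The chain of thought inside <think>...</think> is never graded;
        # only the extracted final answer is checked against the gold label.
        m = re.search(r"<answer>(.*?)</answer>", model_output, re.DOTALL)
        if m is None:
            return 0.0                      # malformed output earns nothing
        return 1.0 if m.group(1).strip() == gold_answer.strip() else 0.0

Everything inside the thinking span is free-form; the RL objective only pushes the model toward outputs whose extracted answer turns out to be correct.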
This assumes no (or very small) diminishing returns effect.
I don't pretend to know much about the minutiae of LLM training, but it wouldn't surprise me at all if throwing massively more GPUs at this particular training paradigm only produces marginal increases in output quality.
I believe the margin to expand is on CoT, where tokens can grow dramatically. If there is value in putting more compute towards it, there may still be returns to be captured on that margin.
Don't forget that "CUDA" involves more than language constructs and programming paradigms.
With NVDA, you get tools to deploy at scale, maximize utilization, debug errors and perf issues, share HW between workflows, etc. These things are not cheap to develop.
Running a 680-billion parameter frontier model on a few Macs (at 13 tok/s!) is nuts. That's two years after ChatGPT was released. That rate of progress just blows my mind.
And those are M2 Ultras. M4 Ultra is about to drop in the next few weeks/months, and I'm guessing it might have higher RAM configs, so you can probably run the same 680b on two of those beasts.
The higher-performing chips, with one less interconnect, are going to give you significantly higher t/s.
Offtopic, but your comment finally pushed me over the edge to semantic satiation [1] regarding the word "moat". It is incredible how this word turned up a short while ago and now it seems to be a key ingredient of every second comment.
Even if you have no interest at all in stock market shorting strategies there is plenty of meaty technical content in here, including some of the clearest summaries I've seen anywhere of the interesting ideas from the DeepSeek v3 and R1 papers.
I was excited as soon as I saw the domain name. Even after a few months, this article[1] is still at the top of my mind. You have a certain way of writing.
I remember being surprised at first because I thought it would feel like a wall of text. But it was such a good read and I felt I gained so much.
I was put off by the domain, biased against anything that sounds like a company blog. Especially a "YouTube something".
You may get more mileage from excellent writing on a yourname.com. This is a piece that sells you, not this product, plus it feels more timeless. In 2050 someone may point to this post. Better if it were under your own name.
I'm curious if someone more informed than me can comment on this part:
> Besides things like the rise of humanoid robots, which I suspect is going to take most people by surprise when they are rapidly able to perform a huge number of tasks that currently require an unskilled (or even skilled) human worker (e.g., doing laundry ...
I've always said that the real test for humanoid AI is folding laundry, because it's an incredibly difficult problem. And I'm not talking about giving a machine clothing piece-by-piece flattened so it just has to fold, I'm talking about saying to a robot "There's a dryer full of clothes. Go fold it into separate piles (e.g. underwear, tops, bottoms) and don't mix the husband's clothes with the wife's". That is, something most humans in the developed world have to do a couple times a week.
I've been following some of the big advances in humanoid robot AI, but the above task still seems miles away given current tech. So is the author's quote just more unsubstantiated hype that I'm constantly bombarded with in the AI space, or have there been advancements recently in robot AI that I'm unaware of?
2 months ago, Boston Dynamics' Atlas was barely able to put solid objects in open cubbies. [1] Folding, hanging, and dresser drawer operation appears to be a few years out still.
I saw demos of such robots doing exactly that on YouTube/X - not very precisely yet, but almost sufficient. And it is just the beginning. Considering that the majority of laundry is very similar (shirts, t-shirts, trousers, etc.), I think this will be solved soon with enough training.
Can you share what you've seen? Because from what I've seen, I'm far from convinced. E.g. there is this, https://youtube.com/shorts/CICq5klTomY , which nominally does what I've described. Still, as impressive as that is, I think the distance from what that robot does to what a human can do is a lot farther than it seems. Besides noticing that the folded clothes are more like a neatly arranged pile, what about all the edge cases? What about static cling? Can it match socks? What if something gets stuck in the dryer?
I'm just very wary of looking at that video and saying "Look! It's 90% of the way there! And think how fast AI advances!", because that critical last 10% can often be harder than the first 90% and then some.
> The beauty of the MOE model approach is that you can decompose the big model into a collection of smaller models that each know different, non-overlapping (at least fully) pieces of knowledge.
I was under the impression that this was not how MoE models work. They are not a collection of independent models, but instead a way of routing to a subset of active parameters at each layer. There is no "expert" that is loaded or unloaded per question. All of the weights are loaded in VRAM; it's just a matter of which are actually loaded into the registers for calculation (a toy sketch of that routing is below). As far as I could tell from the DeepSeek v3/v2 papers, their MoE approach follows this instead of being an explicit collection of experts. If that's the case, there's no VRAM saving to be had using an MoE, nor an ability to extract the weights of an expert to run locally (aside from distillation or similar).
If there is someone more versed on the construction of MoE architectures I would love some help understanding what I missed here.
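For anyone who wants the routing idea above in concrete form, here is a toy top-k routed MoE layer in PyTorch (my own illustration, not DeepSeek's architecture; the shapes, expert count, and k are made up):

    import torch
    import torch.nn as nn

    # Toy top-k MoE layer: every expert's weights stay resident, and a router
    # picks k experts per token at this layer. Sizes are illustrative only.
    class ToyMoELayer(nn.Module):
        def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                               nn.Linear(4 * d_model, d_model))
                 for _ in range(n_experts)]
            )
            self.k = k

        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model]
            scores = self.router(x).softmax(dim=-1)           # [tokens, n_experts]
            topk_scores, topk_idx = scores.topk(self.k, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    mask = topk_idx[:, slot] == e             # tokens routed to expert e
                    if mask.any():
                        out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
            return out

Note that all n_experts stay loaded even though each token only exercises k of them, which is exactly the "no VRAM savings for a single local chat" point; the win is that only k/n_experts of the FLOPs run per token, and at scale the experts can be sharded across GPUs.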
Not sure about DeepSeek R1, but you are right in regards to previous MoE architectures.
It doesn't reduce memory usage, as each subsequent token might require a different expert, but it reduces per-token compute/bandwidth usage.
If you place experts on different GPUs and run batched inference, you would see these benefits.
Is there a concept of an expert that persists across layers? I thought each layer was essentially independent in terms of the "experts". I suppose you could look at what part of each layer was most likely to trigger together and segregate those by GPU though.
I could be very wrong on how experts work across layers though, I have only done a naive reading on it so far.
> I suppose you could look at what part of each layer was most likely to trigger together and segregate those by GPU though
Yes, I think that's what they describe in section 3.4 of the V3 paper. Section 2.1.2 talks about "token-to-expert affinity". I think there's a layer which calculates these affinities (between a token and an expert) and then sends the computation to the GPUs with the right experts.
This doesn't sound like it would work if you're running just one chat, as you need all the experts loaded at once if you want to avoid spending lots of time loading and unloading models. But at scale with batches of requests it should work. There's some discussion of this in 2.1.2 but it's beyond my current ability to comprehend!
Ahh got it, thanks for the pointer. I am surprised there is enough correlation there to allow an entire GPU to be specialized. I'll have to dig in to the paper again.
It does. They have 256 experts per MLP layer, and some shared ones. The minimal deployment for decoding (aka. token generation) they recommend is 320 GPUs (H800). It is all in the DeepSeek v3 paper that everyone should read rather than speculating.
Got it. I'll review the paper again for that portion. However, it still sounds like the end result is not VRAM savings but efficiency and speed improvements.
Yeah, if you look at the DeepSeek v3 paper more closely, each saving on each axis is understandable. Combined, they reach some magic number people can talk about (10x!): FP8: ~1.6 to 2x faster than BF16 / FP16; MLA: cuts KV cache size by 4x (I think); MTP: converges 2x to 3x faster; DualPipe: maybe ~1.2 to 1.5x faster.
If you look deeper, many of these are only applicable to training (we already do FP8 for inference, MTP is to improve training convergence, and DualPipe is for overlapping communication / compute, mostly for training purposes too). The efficiency improvement on inference IMHO is overblown. (The back-of-envelope math below shows how the training-side factors compound.)
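To make the compounding explicit, here is the rough multiplication using the ranges above (my arithmetic on the comment's own numbers; MLA is left out of the throughput product since its ~4x is a KV-cache memory saving rather than a speedup):

    fp8      = (1.6, 2.0)   # matmul throughput vs BF16/FP16
    mtp      = (2.0, 3.0)   # faster convergence => fewer training steps
    dualpipe = (1.2, 1.5)   # communication/compute overlap

    low  = fp8[0] * mtp[0] * dualpipe[0]
    high = fp8[1] * mtp[1] * dualpipe[1]
    print(f"combined training-cost improvement: ~{low:.1f}x to ~{high:.1f}x")
    # roughly 3.8x to 9x, which is where the headline "10x!" number comes from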
Yes but, for a given size of model, Deepseek claims that a model trained with FP8 will work better than a model quantized to FP8. If that's true then, for a given quality, a native FP8 model will be smaller, and have cheaper inference.
I don't think an entire GPU is specialized, nor will a single token use the same expert throughout. I think about it as a gather-scatter operation at each layer (sketched below).
Let's say you have an inference batch of 128 chats; at layer `i` you take the hidden states, compute their routing, scatter them along with the KV for those layers among GPUs (each one handling different experts), the attention and FF happen on those GPUs (as the model params are there), and they get gathered again.
You might be able to avoid the gather by performing the routing on each of the GPUs, but I'm generally guessing here.
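That gather-scatter guess matches how expert parallelism is usually described. A rough single-process sketch (my illustration; real deployments use all-to-all collectives across GPUs rather than this in-place grouping):

    import torch

    # Group a batch of token hidden states by their routed expert so each group
    # could be shipped to the GPU holding that expert, then restore the order.
    def dispatch_by_expert(hidden: torch.Tensor, expert_idx: torch.Tensor, n_experts: int):
        """hidden: [tokens, d_model]; expert_idx: [tokens], chosen expert per token."""
        order = torch.argsort(expert_idx)             # permutation grouping tokens by expert
        grouped = hidden[order]                       # tokens now contiguous per expert
        counts = torch.bincount(expert_idx, minlength=n_experts)  # tokens per expert
        inverse = torch.argsort(order)                # undoes the grouping afterwards
        # `grouped` would be split according to `counts` and sent to the owning GPUs;
        # after the expert FFNs run, indexing with `inverse` restores token order.
        return grouped, counts, inverse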
This is a humble and informed article (compared to others written by financial analysts the past few days). But it still has the flaw of overestimating the efficiency of deploying a 687B MoE model on commodity hardware (to use locally; cloud providers will do efficient batching, which is a different story): you cannot do that on any single Apple machine (you need to hook up at least 2 M2 Ultras). You can barely deploy that on desktop computers, just because non-registered DDR5 can have 64GiB per stick (so you are safe with 512GiB of RAM). Now coming to PCIe bandwidth: 37B parameters activated per token means exactly that - each token requires a new set of 37B weights, so you need to transfer ~18GiB per token into VRAM (assuming 4-bit quant). PCIe 5 (5090) has 64GB/s transfer speed, so your upper bound is limited to ~4 tok/s with a well-balanced purpose-built PC (and custom software). For programming tasks that usually require ~3000 tokens of thinking, we are looking at ~12 minutes per interaction. (The arithmetic is sketched below.)
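For what it's worth, here is that back-of-envelope arithmetic spelled out, under the comment's own assumption that the full set of activated weights must stream over PCIe for every token:

    active_params   = 37e9          # activated parameters per token
    bytes_per_param = 0.5           # 4-bit quantization
    bytes_per_token = active_params * bytes_per_param        # ~18.5 GB of weights per token
    pcie5_bw        = 64e9          # bytes/s for PCIe 5.0 x16

    tokens_per_s = pcie5_bw / bytes_per_token                 # ~3.5 tok/s upper bound
    thinking_tokens = 3000
    minutes = thinking_tokens / tokens_per_s / 60
    print(f"~{tokens_per_s:.1f} tok/s upper bound, ~{minutes:.0f} min per interaction")
    # prints roughly 3.5 tok/s and ~14 min, the same ballpark as the ~4 tok/s / ~12 min above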
I don't think anyone uses MTP for inference right now. Even if you use MTP for drafting, you need batching in the next round to "verify" it is the right token, and if that happens you need to activate more experts.
DELETED: If you don't use MTP for drafting, and use MTP to skip generations, sure. But you also need to evaluate your use case to make sure you don't get penalized for doing that. Their evaluation in the paper doesn't use MTP for generation.
EDIT: Actually, you cannot use MTP other than drafting because you need to fill in these KV caches. So, during generation, you cannot save your compute with MTP (you save memory bandwidth, but this is more complicated for MoE model due to more activated experts).
Great article. I still feel like very few people are viewing the Deepseek effects in the right light. If we are 10x more efficient it's not that we use 1/10th the resources we did before, we expand to have 10x the usage we did before. All technology products have moved this direction. Where there is capacity, we will use it. This argument would not work if we were close to AGI or something and didn't need more, but I don't think we're actually close to that at all.
Correct. This effect has been known in economics forever - new technology has:
- A "substitution effect": you use the thing itself more because it's now cheaper - new use cases come up.
- An "income effect": the savings effectively raise your income, so you use other things more as well.
I got into this on labor economics here [1] - you have counterintuitive examples with ATMs actually increasing the number of bank branches for several decades.
DeepSeek is bullish for the semiconductor industry as a whole. Whether it is for Nvidia remains to be seen. Intel was in Nvidia's position in 2007 and they didn't want to trade margins for volumes in the phone market. And there they are today.
Man, do I love myself a deep, well-researched long-form contrarian analysis published as a tangent of an already niche blog on a Sunday evening! The old web isn't dead yet :)
This was an amazing summary of the landscape of ML currently.
I think the title does the article an injustice, or maybe it's too long for people to read to appreciate it (e.g. the DeepSeek stuff could be an article in itself).
Whatever - those with a longer attention span will benefit from this read.
I think they would like it a lot, but I think the title doesn’t match the content, and it takes too much reading before one realises it goes beyond the title.
I'm wondering if there's a (probably illegal) strategy in the making here:
- Wait till NVDA rebounds in price.
- Create an OpenAI "competitor" that is powered by Llama or a similar open weights model.
- Obscure the fact that the company runs on this open tech and make it seem like you've developed your own models, but don't outright lie.
- Release an app and whitepaper (whitepaper looks and sounds technical, but is incredibly light on details, you only need to fool some new-grad stock analysts).
- Pay some shady click farms to get your app to the top of Apple's charts (you only need it to be there for like 24 hours tops).
- Collect profits from your NVDA short positions.
I don’t think this is what happened with DeepSeek. It seems that they’ve genuinely optimized their model for efficiency and used GPUs properly (tiled FP8 trick and FP8 training). And came out on top.
The impact on the NVIDIA stock is ridiculous. DeepSeek took advantage of the GPU's flexible architecture (unlike inflexible hardware acceleration).
This is what I still don't understand, how much of what they claim has been actually replicated? From what I understand the "50x cheaper" inference is coming from their pricing page, but is it actually 50x cheaper than the best open source models?
50x cheaper than OpenAI's pricing, on an open-source model that doesn't require giving up that quality level. The best open-source models were much closer in pricing, but V3/R1 hit that price point while topping the results.
As a European I really don’t see the difference between US and Chinese tech right now - the last week from Trump has made me feel more threatened from the US than I ever have been by China (Greenland, living in a Nordic country with treaties to defend it).
I appreciate China has censorship, but the US is going that way too (recent “issues” for search terms). Might be different scales now, but I think it’ll happen. I don’t care as much if a Chinese company wins the LLM space than I did last year.
I really don't think he's a bad guy. He helped accelerate timelines and backed this tech when it was still a dream. Maybe he's not the brains behind it but he's been the brawn, and I think people should try to be more charitable and gracious about him rather than constantly vilify him.
I used to own several adult companies in the past. Incredible huge margins and then along came Pornhub and we could barely survive after it as we did not adapt.
With Deepseek this is now the 'Pornhub of AI' moment. Adapt or die.
They understood the DMCA brilliantly, so they did bulk purchases of cheap content from some studios and hid behind the DMCA for all non-licensed content, which was "uploaded by users". The licensed purchases were just a fraction of it.
Of course, the risk of going advertising-revenue-only was high, and in the beginning mostly only cam providers would advertise.
Our problem was that we had contracts and close relationships with all the big studios, so going the DMCA route would have severed those ties for an unknown risk. In hindsight, not creating a company that abused the DMCA was the right decision. I am very loyal and it would have felt like cheating.
Now it's a different story after the credit card shakedown, when they had to remove millions of videos and be able to provide 2257 documentation for each video.
For sure NVIDIA is priced for perfection perhaps more than any of the other of similar market value.
I think two threats are the biggest:
First Apple. TSMC’s largest customer. They are already making their own GPUs for their data centers. If they were to sell these to others they would be a major competitor.
You would have the same GPU stack on your phone, laptop, PC, and data center. Already big developer mind share. Also useful in a world where LLMs run (in part) on the end user's local machine (like Apple Intelligence).
Second is China - Huawei, Deepseek etc.
Yes - there will be no GPUs from Huawei in the US in this decade. And the Chinese won’t win in a big massive battle. Rather it is going to be death by a thousand cuts.
Just as what happened with the Huawei Mate 60: it is only sold in China, but today Apple is losing business big time in China.
In the same manner, OpenAI and Microsoft will have their business hurt by DeepSeek even if DeepSeek were completely banned in the West.
Likely we will see news on Chinese AI accelerators this year, and I wouldn't be surprised if we soon saw Chinese hyperscalers offering cheaper GPU cloud compute than the West due to a combination of cheaper energy, labor costs, and sheer scale.
Lastly AMD is no threat to NVIDIA as they are far behind and follow the same path with little way of differentiating themselves.
English economist William Stanley Jevons vs the author of the article.
Will NVIDIA be in trouble because of DSR1? Interpreting the Jevons effect: if LLMs are "steam engines" and DSR1 brings a 90% efficiency improvement for the same performance, more of them will be deployed. And this is not even considering the increase due to <think> tokens.
More NVIDIA GPUs will be sold to support growing use cases of more efficient LLMs.
> DeepSeek is a tiny Chinese company that reportedly has under 200 employees. The story goes that they started out as a quant trading hedge fund similar to TwoSigma or RenTec, but after Xi Jinping cracked down on that space, they used their math and engineering chops to pivot into AI research.
I guess now we have the answer to the question that countless people have already asked: Where could we be if we figured out how to get most math and physics PhDs to work on things other than picking up pennies in front of steamrollers (a.k.a. HFT) again?
DeepSeek is a subsidiary of a relatively successful Chinese quant trading firm. It was the boss' weird passion project, after he made a few billion yuan from his other passion, trading. The whole thing was funded by quant trading profits, which kind of undermines your argument. Maybe we should just let extremely smart people work on the things that catch their interest?
The interest of extremely smart people is often strongly correlated with potential profits, and those are very much correlated with policy, which in the case of financial regulation shapes market structures.
Another way of saying this: It's a well-known fact that complicated puzzles with a potentially huge reward attached to them attract the brightest people, so I'm arguing that we should be very conscious of the types of puzzles we implicitly come up with, and consider this an externality to be accounted for.
HFT is, to a large extent, a product of policy, in particular Reg NMS, based on the idea that we need to have many competing exchanges to make our markets more efficient. This has worked well in breaking down some inefficiencies, but has created a whole set of new ones, which are the basis of HFT being possible in the first place.
There are various ideas on whether different ways of investing might be more efficient, but these largely focus on benefits to investors (i.e. less money being "drained away" by HFT). What I'm arguing is that the "draining" might not even be the biggest problem, but rather that the people doing it could instead contribute to equally exciting, non-zero sum games instead.
We definitely want to keep around the part of HFT that contributes to more efficient resource allocation (an inherently hard problem), but wouldn't it be great if we could avoid the part that only works around the kinks of a particular market structure emergent from a particular piece of regulation?
This is completely fake though. It was more like their founder decided to start a branch to do AI research. It was well planned; they bought significantly more GPUs than they could use for quant research even before they started to do anything in AI.
There was a crackdown on algorithmic trading, but it didn't have much impact, and IMO someone higher up definitely does not want to kill these trading firms.
The optimal amount of algorithmic trading is definitely more than none (I appreciate liquidity and price quality as much as the next guy), but arguably there's a case here that we've overshot a bit.
The price data I (we?) get is 15 minute delayed. I would guess most of the profiteering is from consumers not knowing the last transaction prices? I.e. an artificially created edge by the broker who then sells the API to clean their hands of the scam.
Real-time price data is indeed not free, but widely available even in retail brokerages. I've never seen a 15 minute delay in any US based trade, and I think I can even access level 2 data a limited number of times on most exchanges (not that it does me much good as a retail investor).
> I would guess most of the profiteering is from consumers not knowing the last transaction prices?
No, not at all. And I wouldn't even necessarily call it profiteering. Ironically, as a retail investor you even benefit from hedge funds and HFTs being a counterparty to your trades: you get on average better (and worst case as good) execution from PFOF.
Institutional investors (which include pension funds, insurers, etc.) are a different story.
Interestingly, a lot of the math and physics people in the ML community are considered "grumpy researchers" - a joke made apparent by this starter pack[0].
From my personal experience (undergrad physics, worked as engineer, came to CS & ML because I liked the math), there's a lot of pushback.
- I've been told that the math doesn't matter/you don't need math.
- I've heard very prominent researchers say "fuck theorists"
- I've seen papers routinely rejected for improving training techniques, with reviewers saying "just tune a large model"
- I've seen papers that show improvements when comparisons are conditioned on compute constraints get rejected because of "not enough datasets" or "but does it scale" (these questions can always be asked but require exponentially more work)
- I've been told I'm gatekeeping for saying "you don't need math to make good models, but you need it to know why your models are wrong" (yes, this is a reference)
- when pointing out math or statistical errors I'm told it doesn't matter
- and much more.
I've heard this from my advisor, dissertation committee, bosses[1], peers, and others (of course, HN). If my experience is anything but rare, I think it explains the grumpy group[2]. But I'm also not too surprised, given how common it is in CS for people to claim that everything is easy or that leet code is proof of competence (as opposed to evidence).
I think unfortunately the problem is a bit bigger, but it isn't unsolvable. Really, it is "easily" solvable since it just requires us to make different decisions. Meaning _each and every one of us_ has a direct impact on making this change. Maybe I'm grumpy because I want to see this better world. Maybe I'm grumpy because I know it is possible. Maybe I'm grumpy because it is my job to see problems and try to fix them lol
Arguably, the emergence of quant hedge funds and private AI research companies is at least as much a symptom of the dysfunctions of academia (and society's compensation of academics on dimensions monetary and beyond) as it is of the ability of Wall Street and Silicon Valley to treat former scientists better than that.
Yes and no. Industry AI research is currently tightly coupled with academic research. Most of the big papers you see are either directly from the big labs or in partnership. Not even labs like Stanford have sufficient compute to train GPT from scratch (maybe enough for DeepSeek). Here's Fei-Fei Li discussing the issue. Stanford has something like 300 GPUs[1]? And those have to be split across labs.
The thing is that there's always a pipeline. Academia does most of the low-level research, say TRL[2] 1-4, partnerships happen between 4-6, and industry takes over the rest (with some wiggle room on these numbers). Much of ML academic research right now is tuning large models made by big labs. This isn't low TRL. Additionally, a lot of research is rejected for not out-performing technologies that are already at TRL 5-7. See Mamba for a recent example. You could also point to KANs, which are probably around TRL 3.
> Arguably, the emergence of quant hedge funds and private AI research companies is at least as much a symptom of the dysfunctions of academia
Which is where I, again, both agree and disagree. It is not _just_ a symptom of the dysfunction of academia, but _also_ of industry. The reason I pointed out the grumpy researchers is that a lot of these people had been discussing techniques that DeepSeek used, long before they were used. DeepSeek looks like what happens when you set these people free, which is my argument: we should do that. Scale maximalists (also called "Bitter Lesson maximalists", though I dislike the term) have been dominating ML research, and DeepSeek shows that scale isn't enough. So this will hopefully give the mathy people more weight. But then again, isn't the common way monopolies fall that they become too arrogant and incestuous?
So mostly I agree; I'm just pointing out that there is a bit more subtlety, and I think we need to recognize that to make progress. There are a lot of physicists and mathy people who like ML and have been doing research in the area but are often pushed out because of the thinking I listed. Part of the success of the quant industry is recognizing that the strong math and modeling skills of physicists generalize pretty well: you go after people who recognize that an equation describing a spring isn't only useful for springs, but for anything that oscillates. Understanding math at that level is very powerful, and boy are there a lot of people who want the opportunity to demonstrate this in ML - they just never get similar GPU access.
This story could be applied to every tech breakthrough. We start where the breakthrough is moated by hardware, access to knowledge, and IP. Over time:
- Competition gets crucial features into cheaper hardware
- Work-arounds for most IP are discovered
- Knowledge finds a way out of the castle
This leads to a "Cambrian explosion" of new devices and software that usually gives rise to some game-changing new ways to use the new technology. I'm not sure why we all thought this somehow wouldn't apply to AI. We've seen the pattern with almost every new technology you can think of. It's just how it works. Only the time it takes for patents to expire changes this... so long as everyone respects the patent.
Yes, this is exactly right. All you need is the right incentives and enough capital, and markets will find a way to breach any moat that's not enforced via regulations.
It's still wild to me that toasters have always been $20 but extremely expensive lasers, digital chips, amps, motors, LCD screens worked their way down to $20 CD players.
So... Electric toasters came to market in the 1920s, priced from $15, eventually getting as low as $5. Adjusting for inflation, that $15 toaster cost $236.70 in 2025 USD. Today's $15 toaster would be about 90¢ in 1920s dollars... so it follows the story.
On average toasters have always been $20. Wasn't $5 an outlier during dotCom crash homegoods firesales? There are some outliers. I just think it's wild that some coils cost the same as a radioactive laser, ICs, amps, motors, etc. There's a certain minimum cost and the complexity doesn't matter.
Invention is expensive. Innovation is less expensive. Production is (usually) the cheap part. Once the invention and innovation is paid off, it's just production...
The beginning of the article was good, but the analysis of DeepSeek and what it means for Nvidia is confused and clearly out of the loop.
* People have been training models at <fp32 precision for many years, I did this in 2021 and it was already easy in all the major libraries.
* GPU FLOPs are used for many things besides training the final released model.
* Demand for AI is capacity limited, so it's possible and likely that increasing AI/FLOP would not substantially reduce the price of GPUs
His DeepSeek argument was essentially that experts who look at the economics of running these teams (eg. ha ha the engineers themselves might dabble) are looking over the hedge at DeepSeek's claims and they are really awestruck
Where do you get this "capacity" limit from? I can get as many H100s from GCP or wherever as I wish; the only thing that is capacity-limited is 100k-GPU clusters à la Elon/X. But what DeepSeek (and the recent evidence of a limit in pure base-model scaling) shows is that this might actually not be profitable, and we may end up with much smaller base models scaled at inference time. The moat for Nvidia in this inference-time scaling is much smaller; you also don't need humongous clusters for it - you can just distribute the inference (and in the future run it locally too).
Part of the reason Musk, Zuckerberg, Ellison, Nadella and other CEOs are bragging about the number of GPUs they have (or plan to have) is to attract talent.
Perplexity CEO says he tried to hire an AI researcher from Meta, and was told to ‘come back to me when you have 10,000 H100 GPUs’
Maybe DeepSeek ain't it, but I expect a big "box of scraps"[1] moment soon. Constraint is the mother of invention, and they are evading constraints with a promise of never-ending scale.
This reminds me of the joke in physics, in which theoretical particle physicists told experimental physicists, over and over again, "trust me bro, the standard model will be proved at 10x eV, we just need a bigger collider bro" after yet another world's-biggest collider was built.
Wondering if we are in a similar position with "trust me bro AGI will be achieved with 10x more GPUs".
The difference is the AI researchers have clear plots showing capabilities scaling with GPUs and there's not a sign that it is flattening so they actually have a case for saying that AGI is possible at N GPUs.
Sauce? How do you even measure "capabilities" in that regard - just writing answers to standard tests? Because being able to ace a test doesn't mean it's AGI; it means it's good at taking standard tests.
Sorry, my blog crashed! Had a stupid bug where it was calling GitHub too frequently to pull in updated markdown for the posts and kept getting rate limits. Had to rewrite it but it should be much better now.
> Amazon gets a lot of flak for totally bungling their internal AI model development, squandering massive amounts of internal compute resources on models that ultimately are not competitive, but the custom silicon is another matter
Juicy. Anyone have a link or context to this? I'd not heard of this reception to NOVA and related.
I think Nova may have changed things here. Prior to Nova their LLMs were pretty rubbish - Nova only came out in December but seems a whole lot better, at least from initial impressions: https://simonwillison.net/2024/Dec/4/amazon-nova/
The point about using FP32 for training is wrong. Mixed precision (FP16 multiplies, FP32 accumulates) has been in use for years - the original paper came out in 2017.
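For reference, this is the standard mixed-precision recipe as exposed by PyTorch AMP (a minimal sketch of the general pattern, not the 2017 paper's code; it needs a CUDA device, and the model and loss here are dummies):

    import torch
    from torch.cuda.amp import GradScaler, autocast

    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = GradScaler()                     # scales the loss to avoid FP16 gradient underflow

    for _ in range(10):
        x = torch.randn(32, 1024, device="cuda")
        optimizer.zero_grad()
        with autocast():                      # forward matmuls run in FP16
            loss = model(x).square().mean()   # dummy objective
        scaler.scale(loss).backward()         # backward on the scaled loss
        scaler.step(optimizer)                # unscale, then update FP32 master weights
        scaler.update()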
This is such a great read. The only missing facet of discussion here is that there is a valuation level of NVDA such that it would tip the balance of military action by China against Taiwan. TSMC can only drive so much global value before the incentive to invade becomes irresistible. Unclear where that threshold is; if we’re being honest, could be any day.
Very interesting and it seems like there is more room for optimizations for WASM using SIMD, boosting performance by a lot! It's cool to see how AI can now run even faster on web browsers.
The R1 paper (https://arxiv.org/pdf/2501.12948) emphasizes their success with reinforcement learning without requiring any supervised data (unlike RLHF for example). They note that this works well for math and programming questions with verifiable answers.
What's totally unclear is what data they used for this reinforcement learning step. How many math problems of the right difficulty with well-defined labeled answers are available on the internet? (I see about 1,000 historical AIME questions, maybe another factor of 10 from other similar contests). Similarly, they mention LeetCode - it looks like there are around 3000 LeetCode questions online. Curious what others think - maybe the reinforcement learning step requires far less data than I would guess?
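My (possibly wrong) read is that the data requirement is less about having millions of labeled problems and more about having answers you can check automatically. A hedged sketch of what such "verifiable rewards" could look like (the boxed-answer convention and the test harness are assumptions for illustration, not what DeepSeek necessarily used):

    import re, subprocess, tempfile

    def math_reward(model_output: str, gold_answer: str) -> float:
        # Assume the model is asked to put its final answer in \boxed{...}.
        m = re.search(r"\\boxed\{(.+?)\}", model_output)
        return 1.0 if m and m.group(1).strip() == gold_answer.strip() else 0.0

    def code_reward(candidate_code: str, test_code: str, timeout_s: int = 10) -> float:
        # Reward 1.0 iff the candidate passes the unit tests in a subprocess.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(candidate_code + "\n\n" + test_code)
            path = f.name
        try:
            result = subprocess.run(["python", path], timeout=timeout_s, capture_output=True)
            return 1.0 if result.returncode == 0 else 0.0
        except subprocess.TimeoutExpired:
            return 0.0

Since each problem can be sampled many times with the reward recomputed per sample, the number of RL training signals could plausibly be much larger than the raw problem count - but that's speculation on my part, which is why I'm curious too.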
> While Apple's focus seems somewhat orthogonal to these other players in terms of its mobile-first, consumer oriented, "edge compute" focus, if it ends up spending enough money on its new contract with OpenAI to provide AI services to iPhone users, you have to imagine that they have teams looking into making their own custom silicon for inference/training
This is already happening today. Most of the new LLM features announced this year are primarily on-device, using the Neural Engine, and the rest is in Private Cloud Compute, which is also using Apple-trained models, on Apple hardware.
The only features using OpenAI for inference are the ones that announce the content came from ChatGPT.
First of all, I don't invest in Nvidia and I don't like oligopolies. But it is too early to talk about Nvidia's future. People are just betting and wishing about Nvidia's future. No one knows what people will do in the future or what they will think; it's just guessing and betting. Their real competitor is not DeepSeek. Did AMD or others release something new that competes with Nvidia's products? If NVIDIA remains the market leader, it means they will lead on price. Being an oligopoly is something like that: they don't need to match competitors' prices.
When he says better linux drivers than AMD he's strictly talking about for AI, right? Because for video the opposite has been the case for as far back as I can remember.
Yes, AMD drivers work fine for games and things like that. Their problem is they basically only focused on games and other consumer applications and, as a result, ceded this massive growth market to Nvidia. I guess you can sort of give them a pass because they did manage to kill their arch-rival Intel in data center CPUs, but it's a massive strategic failure if you look at how much it has cost them.
>With the advent of the revolutionary Chain-of-Thought ("COT") models introduced in the past year, most noticeably in OpenAI's flagship O1 model (but very recently in DeepSeek's new R1 model, which we will talk about later in much more detail), all that changed. Instead of the amount of inference compute being directly proportional to the length of the output text generated by the model (scaling up for larger context windows, model size, etc.), these new COT models also generate intermediate "logic tokens"; think of this as a sort of scratchpad or "internal monologue" of the model while it's trying to solve your problem or complete its assigned task.
Is this right? I thought CoT was a prompting method - are we now calling reasoning models "CoT models"?
I'm curious what are the key differences between "a reasoning model" and good old CoT prompting. Is there any reason to believe that the fundamental limitations of prompting don't apply to "reasoning models"? (hallucinations, plainly wrong output, bias towards to training data mean etc.)
The level of sophistication of CoT models varies. "Good old CoT prompting" is you hoping the model generates some reasoning tokens prior to the final answer. When it did, the answers tended to be better for certain classes of problems, but you had no control over what type of reasoning tokens it was generating. There were hypotheses that just having <pause> tokens in between generated better answers, as it allowed n+1 steps to generate an answer over n. I would consider Meta's "continuous chain of thought" to be on the other end of the spectrum from "good old CoT prompting": they pass the next tokens from the latent space back into the model, getting a "BHF"-like effect. Who knows what's happening with o3 and Anthropic's o3-like models.
The problems you mentioned are very broad and not limited to prompting. Reasoning models tend to outperform older models on math problems, so I'd assume it does reduce hallucination on certain classes of problems.
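One way to see the practical difference is this loose sketch (the prompt suffix and the <think> tag handling are placeholders rather than any specific vendor's API, though R1's released outputs do use <think> tags):

    import re

    def cot_prompt(question: str) -> str:
        # "Good old CoT prompting": just ask a base model to reason out loud
        # and hope that it does.
        return f"{question}\n\nLet's think step by step."

    def strip_reasoning(reasoning_model_output: str) -> str:
        # Reasoning model: trained (e.g. via RL) to emit its own thinking tokens,
        # which the serving layer strips before showing the final answer.
        return re.sub(r"<think>.*?</think>", "", reasoning_model_output,
                      flags=re.DOTALL).strip()

In the first case the reasoning behavior is only encouraged by the prompt; in the second it is baked into the model's training objective, which is why the limitations don't transfer one-to-one.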
NVIDIA sells shovels to the gold rush. One miner (Liang Wenfeng), who has previously purchased at least 10,000 A100 shovels... has a "side project" where they figured out how to dig really well with a shovel and shared their secrets.
The gold rush, whether real or a bubble, is still there! NVIDIA will still sell every shovel they can manufacture, as soon as it is available in inventory.
Fortune 100 companies will still want the biggest toolshed to invent the next paradigm or to be the first to get to AGI.
I think the biggest threat to NVIDIA's future right now is their own current success.
Their software platforms and CUDA are a very strong moat against everyone else. I don't see anyone beating them on that front right now.
The problem is that I'm afraid that all that money sloshing inside the company is rotting the culture and that will compromise future development.
- Grifters are filling out positions in many orgs only trying to milk it as much as possible.
- Old employees become complacent with their nice RSU packages and rest & vest.
NVIDIA used to be extremely nimble and was fighting way above its weight class. Prior to the Mellanox acquisition they had only around 10k employees, and after, another 10k more.
If there's a real threat to their position at the top of the AI offerings, will they be able to roll up their sleeves and get back to work, or will the organization be unable to move ahead?
Long term I think it's inevitable that China will take over the technology leadership. They have the population, the education programs, and the skill to do this. At the same time, in the old western democracies things are becoming stagnant, and I even dare to say that the younger generations are declining. In my native country the educational system has collapsed; over 20% of kids who finish elementary school cannot read or write. They can mouth-breathe and scroll TikTok, though, but just barely, since their attention span is about the same as a goldfish's.
LOL. This isn't rot, it is reaching the end goal, the people doing the work reach the rewards they were working towards. Rot would imply somehow management should prevent rest and vest but that is the exact model that they acquired their talent on. You would have to remove capitalism from companies when companies win at capitalism making it all just a giant rug pull for employees.
The vast majority of Nvidia's current value is tied to their dominance in AI hardware. That value could be threatened if LLMs could be trained and/or run efficiently using a CPU or a quantum chip. I don't understand enough about the capabilities of quantum computing to know if running or training an LLM would be possible using a quantum chip, but if it becomes possible, NVDA stock is unlikely to fare well (unless they are making the new chip).
I always appreciate reading a take from someone who's well versed in the domains they have opinions about.
I think longer-term we'll eat up any slack in efficiency by throwing more inference demands at it -- but the shift is tectonic. It's a cultural thing. People got acclimated to shlepping around morbidly obese node packages and stringing together enormous python libraries - meanwhile the deepseek guys out here carving bits and bytes into bare metal. Back to FP!
This is a bizarre take. First, DeepSeek is no doubt still using the same bloated Python ML packages as everyone else. Second, since this is "open source", it's pretty clear that the big labs are just going to replicate this basically immediately, and with their already massive compute advantages put out models that are an extra OOM larger/better/etc. than what DeepSeek can possibly put out. There's just no reason to think that, e.g., a 10x increase in training efficiency does anything but increase the size of the next model generation by 10x.
However, imagine entering a store where the camera looks up your face in a shared database and profiles you as a person who will pay higher prices - and the prices are displayed near you according to your profile...
Nvidia seem to be one step ahead of this and you can see their platform efforts are pushing towards creating large volumes of compute that are easy to manage for whatever your compute requirements are, be that training, inference or whatever comes next and whatever form. People are maybe tackling some of these areas in isolation but you do not want to build datacenters where everything is ringfenced per task or usage.
This is such a comprehensive analysis, thank you. For someone just starting to learn about the field, it’s a great way to understand what’s going on in the industry.
I think this is just a(nother) canary for many other markets in the US v China game of monopoly. One weird effect in all this is that US Tech may go on to be over valued (i.e., disconnect from fundamentals) for quite some time.
> Another very smart thing they did is to use what is known as a Mixture-of-Experts (MOE) Transformer architecture, but with key innovations around load balancing. As you might know, the size or capacity of an AI model is often measured in terms of the number of parameters the model contains. A parameter is just a number that stores some attribute of the model; either the "weight" or importance a particular artificial neuron has relative to another one, or the importance of a particular token depending on its context (in the "attention mechanism").
Has a wide-scale model analysis been performed inspecting the parameters and their weights for all popular open / available models yet? The impact and effects of disclosed inbound data and tuning parameters on individual vector tokens will prove highly informative and clarifying.
Such analysis will undoubtedly help semi-literate AI folks level up and bridge any gaps.
Considering the fact that current models were trained on top-notch books, those read and studied by the most brilliant engineers, the models are pretty dumb.
They are more like the thing which enabled computers to work with and digest text instead of just code. The fact that they can parrot pretty interesting relationships from the texts they've consumed kind of proves that they are capable of statistically "understanding" what we're trying to talk with them about, so it's a pretty good interface.
But going back to the really valuable content of the books they've been trained on, they just don't understand it. There's other AI which needs to get created which can really learn the concepts taught in those books instead of just the words and the value of the proximities between them.
To learn that other missing part will require hardware just as uniquely powerful and flexible as what Nvidia has to offer. Those companies now optimizing for inference and LLM training will be good at it and have their market share, but they need to ensure that their entire stack is as capable as Nvidia's, if they also want to be part of future developments. I don't know if Tenstorrent or Groq are capable of doing this, but I doubt it.
I think it's more than just the market effect on "established" AI players like Nvidia.
I don't think it's necessarily a coincidence that DeepSeek dropped within a short time frame of the announcement of the AI investment initiative by the Trump administration.
The idea is to get the money from investors who want to earn a return. Lower capex is attractive to investors, and DS drops capex dramatically. It makes Chinese AI talent look like the smart, safe bet. Nothing like DS could happen in China unless the powers-that-be knew about it and got some level of control. I'm also willing to bet that this isn't the best they've got.
They're saying "we can deliver the same capabilities for far less, and we're not going to threaten you with a tariff for not complying".
Great article, thanks for writing it! Really great summary of the current state of the AI industry for someone like me who's outside of it (but tangential, given that I work with GPUs for graphics).
The one thing from the article that sticks out to me is that the author/people are assuming that DeepSeek needing 1/45th the amount of hardware means that the other 44/45ths that large tech companies have invested was wasteful.
Does software not scale to meet hardware? I don't see this as 44/45ths wasted hardware, but as a free increase in the amount of hardware people have. Software needing less hardware means you can run even _more_ software without spending more money, not that you need less hardware, right? (for the top-end, non-embedded use cases).
---
As an aside, the state of the "AI" industry really freaks me out sometimes. Ignoring any sort of short or long term effects on society, jobs, people, etc, just the sheer amount of money and time invested into this one thing is, insane?
Tons of custom processing chips, interconnects, compilers, algorithms, _press releases!_, etc all for one specific field. It's like someone taking the last decade of advances in computers, software, etc, and shoving it in the space of a year. For comparison, Rust 1.0 is 10 years old - I vividly remember the release. And even then it took years to propagate out as a "thing" that people were interested in and invested significant time into. Meanwhile deepseek releases a new model (complete with a customer-facing product name and chat interface, instead of something boring and technical), and in 5 days it's being replicated (to at least some degree) and copied by competitors. Google, Apple, Microsoft, etc are all making custom chips and investing insane amounts of money into different compilers, programming languages, hardware, and research.
It's just, kind of disquieting? Like everyone involved in AI lives in another world operating at breakneck speed, with billions of dollars involved, and the rest of us are just watching from the sidelines. Most of it (LLMs specifically) is no longer exciting to me. It's like, what's the point of spending time on a non-AI related project? We can spend some time writing a nice API and working on a cool feature or making a UI prettier and that's great, and maybe with a good amount of contributors and solid, sustained effort, we can make a cool project that's useful and people enjoy, and earns money to support people if it's commercial. But then for AI, github repos with shiny well-written readmes pop up overnight, tons of text is being written, thought, effort, and billions of dollars get burned or speculated on in an instant on new things, as soon as the next marketing release is posted.
How can the next advancement in graphics, databases, cryptography, etc compete with the sheer amount of societal attention AI receives?
Where does that leave writing software for the rest of us?
To me, this seems like we are back again in 1953 and a company just announced they are now capable of building one of IBM's 5 computers for 10% of the price.
I really don't understand the rationale of "We can now train GPT 4o for 10% the price, so that will bring demand for GPUs down.". If I can train GPT 4o for 10% the price, and I have a budget of 1B USD, that means I'm now going to use the same budget and train my model for 10x as long (or 10x bigger).
At the same time, a lot of small players that couldn't properly train a model before, because the starting point was simply out of their reach, will now be able to purchase equipment that's capable of something of note, and they will buy even more GPUs.
P.S. Yes, I know that the original quote "I think there is a world market for maybe five computers", was taken out of context.
P.P.S. In this rationale, I'm also operating under the assumption that DeepSeek's numbers are real. Which, given the track record of Chinese companies, is probably not true.
Please tell me if I am wrong. I know very few details and have heard only a few headlines, and my hasty conclusion is that this development clearly shows the exponential nature of AI development, in terms of how people are able to piggyback on the resources, time, and money of the previous iteration. They used the output from ChatGPT as the input to their model. Is this true, more or less accurate, or off base?
All this is good news for all of us. Bad news probably for Nvidia's margins long term but who cares. If we can train and inference in less cycles and watts that is awesome.
As a bystander it's so refreshing to see this, global tech competition is great for the market and it gives hope that LLMs aren't locked behind Bs of investments and smaller players can compete well as well.
So at some point we will have too many cannon ball polishing factories and it will become apparent the cannon ball trajectory is not easily improved on.
Despite the fact that this article is very well written and certainly contains high quality information, I choose to remain skeptical as it pertains to Nvidia's position in the market. I'll come right out and say that my experience likely makes me see this from a biased position.
The premise is simple: Business is warfare. Anything you can do to damage or slow down the market leader gives you more time to get caught up. FUD is a powerful force.
My bias comes from having been the subject of such attacks in my prior tech startup. Our technology was destroying the offerings of the market leading multi-billion-dollar global company that pretty much owned the sector. The natural processes of such a beast caused them not to be able to design their way out of a paper bag. We clearly had an advantage. The problem was that we did not have the deep pockets necessary to flood the market with it and take them out.
What did they do?
They started a FUD campaign.
They went to every single large customer and our resellers (this was a hardware/software product) a month or two before the two main industry tradeshows, and lied to them. They promised that they would show market-leading technology "in just a couple of months" and would add comments like "you might want to put your orders on hold until you see this". We had multi-million dollar orders held for months in anticipation of these product unveilings.
And, sure enough, they would announce the new products with a great marketing push at the next tradeshow. All demos were engineered and manipulated to deceive, all of them. Yet, the incredible power of throwing millions of dollars at this effort delivered what they needed, FUD.
The problem with new products is that it takes months for them to be properly validated. So, if the company that had frozen a $5MM order for our products decided to verify the claims of our competitor, it typically took around four months. In four months, they would discover that the new shiny object was shit and less stellar than what they were told. In other words, we won. Right?
No!
The mega-corp would then reassure them that they had iterated vast improvements into the design and that those would be presented --I kid you not-- at the next tradeshow. By spending millions of dollars they had, at this point, denied us millions of dollars of revenue for approximately one year. FUD, again.
The next tradeshow came and went and the same cycle repeated... it would take months for customers to realize the emperor had no clothes. It was brutal to be on the receiving end of this without the financial horsepower to break through the FUD. It was a marketing arms race and we were unprepared to win it. In this context, the idea that a better mousetrap always wins is just laughable.
This did not end well. They were not going to survive another FUD cycle. Reality eventually comes into play. Except that, in this case, 2008 happened. The economic implosion caught us in serious financial peril due to the damage done by the FUD campaign. Ultimately, it was not survivable and I had to shut down the company.
It took this mega-corp another five years to finally deliver a product that approximated what we had and another five years after that to match and exceed it. I don't even want to imagine how many hundreds of millions they spent on this.
So, long way of saying: China wants to win. No company in China is independent from government forces. This is, without a doubt, a war for supremacy in the AI world. It is my opinion that, while the technology, as described, seems to make sense, it is highly likely that this is yet another form of a FUD campaign to gain time. If they can deny Nvidia (and others) the orders needed to maintain the current pace, they gain time to execute on a strategy that could give them the advantage.
> Perhaps most devastating is DeepSeek's recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost. This suggests the entire industry has been massively over-provisioning compute resources.
I wrote in another thread why DeepSeek should increase demand for chips, not lower.
1. More efficient LLMs should lead to more usage, which means more AI chip demand. Jevons Paradox.
2. Even if DeepSeek is 45x more efficient (it is not), models will just become 45x+ bigger. It won’t stay small.
3. To build a moat, OpenAI and American AI companies need to up their datacenter spending even more.
4. DeepSeek's breakthrough is in distilling models. You still need a ton of compute to train the foundational model to distill.
5. DeepSeek's conclusion in their paper says more compute is needed for the next breakthrough.
6. DeepSeek's model is trained on GPT4o/Sonnet outputs. Again, this reaffirms the fact that in order to take the next step, you need to continue to train better models. Better models will generate better data for next-gen models.
I think DeepSeek hurts OpenAI/Anthropic/Google/Microsoft. I think DeepSeek helps TSMC/Nvidia.
> Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume.
This is misguided. Let's think logically about this.
CoT models spend far more inference compute per query, and in 2025 they are clearly bottlenecked by not having enough of it. Therefore, you can conclude that Nvidia and TSMC demand should go up because of CoT models.
The economics here are compelling: when DeepSeek can match GPT-4 level performance while charging 95% less for API calls, it suggests either NVIDIA's customers are burning cash unnecessarily or margins must come down dramatically.
Or that in order to build a moat, OpenAI/Anthropic/Google and other labs need to double down on even more compute.
Fwiw, many of the improvements in Deepseek were already in other 'can run on your personal computer' AIs such as Meta's Llama. Deepseek is actually very similar to Llama in efficiency. People were already running that on home computers with M3s.
A couple of examples: Meta's multi-token prediction was specifically implemented as a huge efficiency improvement and was taken up by Deepseek. REcurrent ADaptation (READ) was another big win from Meta that Deepseek utilized. Multi-head Latent Attention is another technique, not pioneered by Meta but used by both Deepseek and Llama.
Anyway, Deepseek isn't some independent revolution out of nowhere. It's actually very, very similar to the existing state of the art and just bundles a whole lot of efficiency gains in one model. There's no secret sauce here. It's much better than what OpenAI has, but that's because OpenAI seem to have forgotten 'The Bitter Lesson'. They have been going at things in an extremely brute-force way.
Anyway, why do I point out that Deepseek is very similar to something like Llama? Because Meta's spending hundreds of billions on chips to run it. It's pretty damn efficient, especially compared to OpenAI, but they are still spending billions on datacenter build-outs.
> openAI seem to have forgotten 'The Bitter Lesson'. They have been going at things in an extremely brute force way.
Isn't the point of 'The Bitter Lesson' precisely that in the end, brute force wins, and hand-crafted optimizations like the ones you mention llama and deepseek use are bound to lose in the end?
Imho the tldr is that the wins are always from 'scaling search and learning'.
Any customisations that aren't related to the above are destined to be overtaken by someone who can improve the scaling of compute. OpenAI do not seem to be doing as much to improve the scaling of compute in software terms (they are doing a lot in hardware terms, admittedly). They have models at the top of the charts for various benchmarks right now, but it feels like a temporary win from chasing those benchmarks outside of the focus of scaling compute.
It means they can serve more with what they have if they implement models with DeepSeek's optimizations. More usage doesn't mean Nvidia will get the same margins when cloud providers scale out with this innovation.
Yesterday I wrote up all my thoughts on whether NVDA stock is finally a decent short (or at least not a good thing to own at this point). I’m a huge bull when it comes to the power and potential of AI, but there are just too many forces arrayed against them to sustain supernormal profits.
Anyway, I hope people here find it interesting to read, and I welcome any debate or discussion about my arguments.
Wanted to add a preface: thank you for your time on this article. I appreciate your perspective and experience, and I'm hoping you can help refine and rein in my bull case.
Where do you expect NVDA's forward and current EPS to land? What revenue drop-off are you expecting in late 2025/2026? Part of my continuing bull case for NVDA is its very reasonable multiple on insane revenue. A leveling off can be expected, but I still feel bullish on it hitting $200+ (a $5 trillion market cap on ~$195B revenue for fiscal year 2026, calendar 2025, at 33 EPS) based on this year's revenue according to their guidance and the guidance of the hyperscalers' spending. Finding a sell point is a whole different matter from being actively short. I can see the case for taking some profits; it's hard for me to go short, especially in an inflationary environment (tariffs, electric energy, bullying for lower US interest rates).
The scale of production of Grace Hopper and Blackwell amazes me: 800k units of Blackwell coming out this quarter. Is there even production room for AMD to get their chips made? (Looking at the new chip factories in Arizona.)
R1 might be nice for reducing LLM inference costs. I'm unsure about the accuracy of the local Llama ones (I couldn't get it to correctly spit out the NFL teams and their associated conferences; it kept mixing the NFL with European football), but I still want to train YOLO vision models on faster chips like A100s vs T4s (4-5x multiples in speed for me).
Lastly, if the robot/autonomous-vehicle ML wave hits within the next year (first drones and cars -> factories -> humanoids), I think that demand can sustain NVDA's compute demand.
The real mystery is how we power all this within 2 years...
* This is not financial advice and some of my numbers might be a little off, still refining my model and verifying sources and numbers
Looks like a huge astroturfing effort from the CCP. I am seeing this coordinated propaganda inside every AI-related sub on Reddit, on social media, and now here.
As it says in the article, you are talking about a mere constant of proportionality, a single multiple. When you're dealing with an exponential growth curve, that stuff gets washed out so quickly that it doesn't end up mattering all that much.
Keep in mind that the goal everyone is driving towards is AGI, not simply an incremental improvement over the latest model from OpenAI.
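To put a number on the "single multiple" point, here's a rough back-of-the-envelope sketch in Python. The 10x-per-year growth in compute used is purely an assumed figure for illustration, not a number from the thread:

    import math

    # How long does a one-time constant-factor saving last against exponential
    # growth in compute demand? (the growth rate below is a hypothetical)
    efficiency_gain = 45     # claimed one-time improvement (a single multiple)
    annual_growth = 10       # assumed 10x per year growth in compute actually used

    years_absorbed = math.log(efficiency_gain) / math.log(annual_growth)
    print(f"A {efficiency_gain}x saving is absorbed in ~{years_absorbed:.1f} years of growth")
    # => roughly 1.7 years at the assumed rate; a constant multiple shifts the
    #    curve, it does not change its shape.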
Their loss curve with the RL didn't level off much though; it could be taken a lot further and scaled up to more parameters on the big Nvidia mega-clusters out there. And the architecture is heavily tuned to Nvidia optimizations.
When was the last time the US got their lunch eaten in technology?
Sputnik might be a bit hyperbolic but after using the model all day and as someone who had been thinking of a pro subscription, it is hard to grasp the ramifications.
There is just no good reference point that I can think of.
Yep, some CEO said they have 50K GPUs of the prior generation. They probably accumulated them through intermediaries that are basically helping Nvidia sell to sanctioned parties by proxy.
Actually, compression is an incredibly good way to think about intelligence. If you understand something really well then you can compress it a lot. If you can compress most of human knowledge effectively without much reconstruction error while shrinking it down by 99.5%, then you must have in the process arrived at a coherent and essentially correct world model, which is the basis of effective cognition.
Fwiw, there are highly cited papers that literally map AGI to compression. As in, they map to the same thing, and people write widely respected papers on this fact. Basically, a prediction engine can be used to make a compression tool and an AI equally.
The tl;dr: given inputs and a system that can accurately predict the next item in the sequence, you can either compress that data using the prediction (arithmetic coding), or you can take actions based on the prediction to achieve an end goal, mapping predictions of new inputs to possible outcomes and then taking the path to the goal (AGI). They boil down to one and the same. So it's weird to have someone state they are not the same when it's widely accepted that they absolutely are.
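If you want to see the prediction/compression mapping concretely, here's a toy Python sketch (my own illustration, not from any of the papers being referenced). The ideal code length under arithmetic coding is the sum of -log2 of the predicted probabilities, so a better predictor directly means fewer bits; the bigram model and sample text are made up for the demo:

    import math
    from collections import Counter, defaultdict

    def code_length_bits(text, predict):
        """Ideal compressed size under arithmetic coding: sum of -log2 p(next symbol)."""
        bits = 0.0
        for i in range(1, len(text)):
            bits += -math.log2(predict(text[:i], text[i]))
        return bits

    text = "the better the predictor, the shorter the code. " * 20
    alphabet = sorted(set(text))

    # Baseline: a uniform guess over the symbols that appear (knows nothing)
    uniform = lambda ctx, sym: 1.0 / len(alphabet)

    # Toy bigram model: predicts the next character from the previous one,
    # trained on the same text purely for illustration
    pair_counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        pair_counts[a][b] += 1

    def bigram(ctx, sym):
        counts = pair_counts[ctx[-1]]
        # add-one smoothing so no symbol gets probability zero
        return (counts[sym] + 1) / (sum(counts.values()) + len(alphabet))

    print("uniform model:", round(code_length_bits(text, uniform)), "bits")
    print("bigram model: ", round(code_length_bits(text, bigram)), "bits")
    # The model that predicts better compresses better; that is the whole mapping.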
Related ongoing thread:
Nvidia’s $589B DeepSeek rout - https://news.ycombinator.com/item?id=42839650 - Jan 2025 (574 comments)
The description of DeepSeek reminds me of my experience in networking in the late 80s - early 90s.
Back then a really big motivator for Asynchronous Transfer Mode (ATM) and fiber-to-the-home was the promise of video on demand, which was a huge market in comparison to the Internet of the day. Just about all the work in this area ignored the potential of advanced video coding algorithms, and assumed that broadcast TV-quality video would require about 50x more bandwidth than today's SD Netflix videos, and 6x more than 4K.
What made video on the Internet possible wasn't a faster Internet, although the 10-20x increase every decade certainly helped - it was smarter algorithms that used orders of magnitude less bandwidth. In the case of AI, GPUs keep getting faster, but it's going to take a hell of a long time to achieve a 10x improvement in performance per cm^2 of silicon. Vastly improved training/inference algorithms may or may not be possible (DeepSeek seems to indicate the answer is "may") but there's no physical limit preventing them from being discovered, and the disruption when someone invents a new algorithm can be nearly immediate.
Another aspect that reinforces your point is that the ATM push (and subsequent downfall) was not just bandwidth-motivated but also motivated by a belief that ATM's QoS guarantees were necessary. But it turned out that software improvements, notably MPLS to handle QoS, were all that was needed.
Nah, it's mostly just buffering :-)
Plus the cell phone industry paved the way for VOIP by getting everyone used to really, really crappy voice quality. Generations of Bell Labs and Bellcore engineers would rather have resigned than be subjected to what's considered acceptable voice quality nowadays...
I've noticed this when talking on the phone with someone with a significant accent.
1. it takes considerable work on my part to understand it on a cell phone
2. it's much easier on POTS
3. it's not a problem on VOIP
4. no issues in person
With all the amazing advances in cell phones, the voice quality of cellular is stuck in the 90's.
I generally travel to Europe, and it baffles me why I can't use VoLTE there (maybe my roaming doesn't allow that) and have to fall back to 3G for voice calls.
At home, I use VoLTE and the sound is almost impeccable, very high quality, but in the places I roam to, what I get is FM quality 3G sound.
It's not that the cellular network is incapable of that sound quality, but I don't get to experience it except in my home country. Interesting, indeed.
In which countries?
3G networks in many European countries were shut off in 2022-2024. The few remaining ones will go too over the next couple of years.
VoLTE is 5G, common throughout Europe. However the handset manufacturer may need to qualify each handset model with local carriers before they will connect using VoLTE. As I understand the situation, Google for instance has only qualified Pixel phones for 5G in 19 of 170-odd countries. So 5G features like VoLTE may not be available in all countries. This is very handset/country/carrier-dependent.
> VoLTE is 5G
Technically, on 5G you have "VoNR"[0], whereas VoLTE is over 4G.
[0] https://en.wikipedia.org/wiki/Voice_over_NR
VoLTE may very well run over 5G now, and it can vary from country to country, but my memory is that VoLTE originally started with LTE/4G networks.
https://en.wikipedia.org/wiki/Voice_over_LTE
Yes, I think most video on the Internet is HLS and similar approaches which are about as far from the ATM circuit-switching approach as it gets. For those unfamiliar HLS is pretty much breaking the video into chunks to download over plain HTTP.
Yes, but that's entirely orthogonal to the "coding" algorithms being used and which are specifically responsible for the improvement that GP was describing.
HLS is really just a way to empower the client with the ownership of the playback logic. Let the client handle forward buffering, retries, stream selection, etc.
>> Plus the cell phone industry paved the way for VOIP by getting everyone used to really, really crappy voice quality
What accounts for this difference? Is there something inherently worse about the nature of cell phone infrastructure over land-line use?
I'm totally naive on such subjects.
I'm just old enough to remember landlines being widespread, but nearly all of my phone calls have been via cell since the mid 00s, so I can't judge quality differences given the time that's passed.
Because at some point, someone decided that 8 kbps makes for an acceptable audio stream per subscriber. And at first, the novelty of being able to call anyone anywhere, even with this awful quality, was novel enough that people would accept it. And most people did until the carriers decided they could allocate a little more with VoLTE, if it works on your phone in your area.
> Because at some point, someone decided that 8 kbps makes for an acceptable audio stream per subscriber.
Has it not been like this for a very long time? I was under the impression that "voice frequency" being defined as up to 4 kHz was a very old standard - after all, (long-distance) phone calls have always been multiplexed through coaxial or microwave links. And it follows that 8kbps is all you need to losslessly digitally sample that.
I assumed it was jitter and such that led to the lower quality of VoIP/cellular, but that's a total guess. Along with maybe compression algorithms that try to squeeze the stream even tighter than 8kbps? But I wouldn't have figured it was the 8kHz sample rate at fault, right?
Sure, if you stop after "nobody's vocal cords make noises above 4 kHz in normal conversation", but the rumbling of the vocal cords isn't the entire audio data which is present in person. Clicks of the tongue and smacking of the lips make much higher frequencies, and higher sample rates capture the timbre/shape of the soundwave instead of rounding it down to a smooth sine wave. Discord defaults to 64 kbps, but you can push it up to 96 or 128 kbps with a Nitro membership, and it's not hard to hear an improvement at the higher bitrates. And if you've ever used Bluetooth audio, you know the difference in quality between the bidirectional call profile and the unidirectional music profile, and wished for the bandwidth of the music profile with the low latency of the call profile.
> Sure, if you stop after "nobody's vocal cords make noises above 4 kHz in normal conversation"
Huh? What? That's not even remotely true.
If you read your comment out loud, the very first sound you'd make would have almost all of its energy concentrated between 4 and 10 kHz.
Human vocal cords constantly hit up to around 10 kHz, though auditory distinctiveness is more concentrated below 4 kHz. It is unevenly distributed though, with sounds like <s> and <sh> being (infamously) severely degraded by a 4 kHz cut-off.
AMR (adaptive multi-rate audio codec) can get down to 4.75 kbit/s when there's low bandwidth available, which is typically what people complain about as being terrible quality.
The speech codecs are complex and fascinating, very different from just doing a frequency filter and compressing.
The base is linear predictive coding, which encodes the voice based on a simple model of the human mouth and throat. Huge compression, but it sounds terrible. Then you take the error between the original signal and the LPC-encoded signal; this waveform is compressed heavily but more conventionally and transmitted along with the LPC parameters.
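A minimal sketch of the LPC analysis step described above (the autocorrelation method, solving the Yule-Walker equations), just to make the idea concrete. This is nowhere near a real speech codec; the frame length, order, and toy signal are assumptions for illustration:

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_analyze(frame, order=10):
        """Fit an order-N linear predictor to one audio frame (autocorrelation method)."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        # Yule-Walker: Toeplitz system built from the autocorrelation
        return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

    def lpc_residual(frame, coeffs):
        """Prediction error: the part of the signal the vocal-tract model can't explain."""
        order = len(coeffs)
        residual = frame.copy()       # first `order` samples kept as-is (no history yet)
        for n in range(order, len(frame)):
            past = frame[n - order:n][::-1]        # frame[n-1], ..., frame[n-order]
            residual[n] = frame[n] - np.dot(coeffs, past)
        return residual

    # Toy "voiced" frame: a decaying 200 Hz tone sampled at 8 kHz plus a little noise
    fs = 8000
    t = np.arange(240) / fs
    frame = np.sin(2 * np.pi * 200 * t) * np.exp(-20 * t) + 0.01 * np.random.randn(240)

    a = lpc_analyze(frame)
    e = lpc_residual(frame, a)
    print("frame energy:   ", round(float(np.sum(frame ** 2)), 3))
    print("residual energy:", round(float(np.sum(e ** 2)), 3))
    # The residual carries far less energy, which is where the compression win comes from.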
Phones also layer on voice activity detection: when you aren't talking, the system just transmits noise parameters and the other end hears some tailored white noise. As phone calls typically have one person speaking at a time and there are frequent pauses in speech, this is a huge win. But it also makes mistakes, especially in noisy environments (like call centers; voice calls are their business, why are they so bad?). When this happens, the system becomes unintelligible because it isn't even trying to encode the voice.
The 8 kHz samples were encoded with relatively low-complexity PCM (G.711) at 8 bits per sample. That gets you a 64 kbps data channel rate. This was the standard for "toll quality" audio. Not 8 kbps.
The 8 kbps rates on cellular come from the more complicated (relative to G.711) AMR-NB encoding. AMR supports voice rates of roughly 5-12 kbps, with 8 kbps being typical. There's a lot more pre- and post-processing of the input signal and more involved encoding, and a bit more voice information dropped by the encoder.
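A quick sanity check of those numbers (the 8-bit-per-sample figure for G.711 and the ~8 kbps AMR-NB operating point are the standard values being discussed above, not new claims):

    # Toll-quality PSTN channel: 8 kHz sample rate x 8-bit companded PCM (G.711)
    sample_rate_hz = 8000
    bits_per_sample = 8
    g711_bps = sample_rate_hz * bits_per_sample
    print(f"G.711: {g711_bps / 1000:.0f} kbps")          # 64 kbps

    # Typical AMR-NB operating point (a model-based codec, not raw samples)
    amr_nb_bps = 8000
    print(f"AMR-NB uses roughly {g711_bps / amr_nb_bps:.0f}x fewer bits than G.711")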
Part of the quality problem even today with VoLTE is that different carriers support different profiles, and calls between carriers will often drop down to the lowest common codec, which is usually AMR-NB. There are higher-bitrate, better codecs available in the standard, but they're implemented differently by different carriers for shitty cellular carrier reasons.
> The 8KHz samples were encoded with relatively low encoding complexity PCM (G.711) at 8KHz. That gets to a 64kbps data channel rate. This was the standard for "toll quality" audio. Not 8kbps.
I'm a moron, thanks. I think I got the sample rate mixed up with the bitrate. Appreciate you clearing that up - and the other info!
And memory. In the heyday of ATM (late 90s) a few megabytes was quite expensive for a set-top box, so you couldn't buffer many seconds of compressed video.
Also, the phone companies had a pathological aversion to understanding Moore's law, because it suggested they'd have to charge half as much for bandwidth every 18 months. Long distance rates had gone down more like 50%/decade, and even that was too fast.
Love those analogies. This is one of the main reasons I love Hacker News / Reddit: honest, golden experiences.
I worked on a network that used a protocol very similar to ATM (actually it was the first Iridium satellite network). An internet based on ATM would have been amazing. You’re basically guaranteeing a virtual switched circuit, instead of the packets we have today. The horror of packet switching is all the buffering it needs, since it doesn’t guarantee circuits.
Bandwidth is one thing, but the real benefit is that ATM also guaranteed minimal latencies. You could now shave off another 20-100ms of latency for your FaceTime calls, which is subtle but game changing. Just instant-on high def video communications, as if it were on closed circuits to the next room.
For the same reasons, the AI analogy could benefit from both huge processing as well as stronger algorithms.
> You’re basically guaranteeing a virtual switched circuit
Which means you need state (and the overhead that goes with it) for each connection within the network. That's horribly inefficient, and precisely the reason packet-switching won.
> An internet based on ATM would have been amazing.
No, we'd most likely be paying by the socket connection (as somebody has to pay for that state keeping overhead), which sounds horrible.
> You could now shave off another 20-100ms of latency for your FaceTime calls, which is subtle but game changing.
Maybe on congested Wi-Fi (where even circuit switching would struggle) or poorly managed networks (including shitty ISP-supplied routers suffering from horrendous bufferbloat). Definitely not on the majority of networks I've used in the past years.
> The horror of packet switching is all the buffering it needs [...]
The ideal buffer size is exactly the bandwidth-delay product. That's really not a concern these days anymore. If anything, buffers are much too large, causing unnecessary latency; that's where bufferbloat-aware scheduling comes in.
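For anyone unfamiliar, the rule of thumb works out like this; the link rate and RTT below are made-up illustrative numbers:

    # Buffer sizing rule of thumb: bandwidth-delay product (illustrative numbers)
    link_rate_bps = 100e6      # 100 Mbit/s access link (assumed)
    rtt_s = 0.030              # 30 ms round-trip time (assumed)

    bdp_bytes = link_rate_bps * rtt_s / 8
    print(f"BDP: {bdp_bytes / 1024:.0f} KiB")   # ~366 KiB: enough to keep the pipe full
    # Anything much larger than this just adds queueing delay (bufferbloat).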
The cost for interactive video would be a 10x bandwidth requirement, basically to cover idle time. Not efficient, but not impossible, and it definitely wouldn't change ISP business models.
The latency benefit would outweigh the cost. Just absolutely instant video interaction.
It is fascinating to think that before digital circuits phone calls were accomplished by an end-to-end electrical connection between the handsets. What luxury that must have been! If only those ancestors of ours had modems and computers to use those excellent connections for low-latency gaming... :-)
Einstein would like to have a word…
And for the little bit of impact queueing latency has (if done well, i.e. no bufferbloat), I doubt anyone would notice the difference, honestly.
You’re arguing for a reduction in quality in internet services. People do notice those things. It’s like claiming people don’t care about slimmer iPhones. They do.
Man, I saw a presentation on Iridium when I was at Motorola in the early 90s, maybe 92? Not a marketing presentation - one where an engineer was talking, and had done their own slides.
What I recall is that it was at a time when Internet folks had made enormous advances in understanding congestion behavior in computer networks, and other folks (e.g. my division of Motorola) had put a lot of time into understanding the limited burstiness you get with silence suppression for packetized voice, and these folks knew nothing about it.
> ... guaranteed minimal latencies. You could now shave off another 20-100ms of latency for your FaceTime calls...
I already do this. But I cheat - I use a good router (OpenWrt One) that has built-in controls for Bufferbloat. See [How OpenWrt Vanquishes Bufferbloat](https://forum.openwrt.org/t/how-openwrt-vanquishes-bufferblo...)
> The horror of packet switching is all the buffering it needs, since it doesn’t guarantee circuits.
You don't actually need all that much buffering.
Buffer bloat is actually a big problem with conventional TCP. See eg https://news.ycombinator.com/item?id=14298576
I remember my professor saying how the fixed packet size in ATM (53 bytes) was a committee compromise. North America wanted 64 bytes, Europe wanted 32 bytes. The committee chose around the midway point.
The 53-byte cell is what results from that exact compromise: 48 bytes for the payload plus a 5-byte header.
Doesn’t your point about video compression tech support Nvidia’s bull case?
Better video compression led to an explosion in video consumption on the Internet, leading to much more revenue for companies like Comcast, Google, T-Mobile, Verizon, etc.
More efficient LLMs lead to much more AI usage. Nvidia, TSMC, etc will benefit.
No - because this either eliminates the majority of the work entirely or shifts it from GPU to CPU - and Nvidia does not sell CPUs.
If the AI market gets 10x bigger, and GPU work gets 50% smaller (which is still 5x larger than today) - but Nvidia is priced on 40% growth for the next ten years (28x larger) - there is a price mismatch.
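Spelling out that arithmetic (all inputs are the premises stated above, not verified figures):

    # The mismatch argument, as arithmetic (all inputs are the commenter's premises)
    market_growth = 10        # AI market gets 10x bigger
    gpu_share_change = 0.5    # GPU share of the work halves

    gpu_demand_multiple = market_growth * gpu_share_change   # 5x today's GPU demand
    priced_in_multiple = 1.40 ** 10                          # 40% growth for ten years

    print(f"implied GPU demand: {gpu_demand_multiple:.0f}x")
    print(f"priced-in growth:   {priced_in_multiple:.1f}x")  # ~28.9x vs 5x => mismatch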
It is theoretically possible for a massive reduction in GPU usage or shift from GPU to CPU to benefit Nvidia if that causes the market to grow enough - but it seems unlikely.
Also, I believe (someone please correct if wrong) DeepSeek is claiming a 95% overall reduction in GPU usage compared to traditional methods (not the 50% in the example above).
If true, that is a death knell for Nvidia's growth story after the current contracts end.
I can see close to zero possibility that the majority of the work will be shifted to the CPU. Anything a CPU can do can just be done better with specialised GPU hardware.
Then why do we have powerful CPUs instead of a bunch of specialized hardware? It's because the value of a CPU is in its versatility and ubiquity. If a CPU can do a thing well enough, then most programs/computers will do that thing on a CPU instead of taking on the increased complexity and cost of a GPU, even if a GPU would do it better.
We have both? Modern computing devices like smart phones use SoCs with integrated GPUs. GPUs aren't really specialized hardware, either, they are general purpose hardware useful in many scenarios (built for graphics originally but clearly useful in other domains including AI).
People have been saying the exact same thing about other workloads for years, and always been wrong. Mostly claiming custom chips or FPGAs will beat out general purpose CPUs.
> Anything a CPU can do can just be done better
Nope. Anything inherently serial is better off on the CPU due to caching and its architecture.
Many things that are highly parallelizable are getting GPU-enabled. Games and ML are GPU by default, but many other things are migrating to CUDA.
You need both for cheap, high performance computing. They are different workloads.
Yes, I was too hasty in my response. I should have been more specific that I mean ML/AI type tasks. I see no way that we end up on general purpose CPUs for this.
The graphics in games are GPU by default. But the game logic itself is seldom run on the GPU as far as I can tell.
In terms of inference (and training) of AI models, sure, most things that a CPU core can do would be done cheaper per unit of performance on either typical GPU or NPU cores.
On desktop, CPU decoding is passable, but it's still better to have a graphics card for 4K. On mobile, you definitely want to stick to codecs like H.264/HEVC/AV1 that are supported by your phone's decoder chips.
CPU chipsets have borrowed video decoder units and SSE instructions from GPU-land, but the idea that video decoding is a generic CPU task now is not really true.
Now maybe every computer will come with an integrated NPU and it won't be made by Nvidia, although so far integrated GPUs haven't supplanted discrete ones.
I tend to think today's state-of-the-art models are ... not very bright, so it might be a bit premature to say "640B parameters ought to be enough for anybody" or that people won't pay more for high-end dedicated hardware.
> Now maybe every computer will come with an integrated NPU and it won't be made by Nvidia, although so far integrated GPUs haven't supplanted discrete ones.
Depends on what form factor you are looking at. The majority of computers these days are smart phones, and they are dominated by systems-on-a-chip.
That's just factually wrong, DeepSeek is still terribly slow on CPUs. There's nothing different about how it works numerically.
I think SIMD is not so much better than SIMT for solved problems as a level in claiming a problem as solved.
What do you think GPUs are? Basically SIMD asics.
That's also what AVX is, but with a conservative number of threads. If you really understand your problem, I don't see why you would need 32 threads of much smaller data size, or why you would want that far away from your CPU.
Whether your new coprocessor or instructions look more like a GPU or something else doesn't really matter if we are done squinting and calling these graphics-like problems and/or claiming they need a lot more than a middle-class PC.
It led to more revenue for the industry as a whole. But not necessarily for the individual companies that bubbled the hardest: Cisco stock is still to this day lower than it was at peak in 2000, to point to a significant company that sold actual physical infra products necessary for the internet and is still around and profitable to this day. (Some companies that bubbled did quite well, AMZN is like 75x from where it was in 2000. But that's a totally different company that captured an enormous amount of value from AWS that was not visible to the market in 2000, so it makes sense.)
If stock market-cap is (roughly) the market's aggregated best guess of future profits integrated over all time, discounted back to the present at some (the market's best guess of the future?) rate, then increasing uncertainty about the predicted profits 5-10 years from now can have enormous influence on the stock. Does NVDA have an AWS within it now?
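A minimal discounted-cash-flow sketch of that framing, with entirely made-up profit numbers and discount rate, just to show how much of the value sits in the out years:

    # Market cap as discounted future profits: all numbers below are hypothetical
    discount_rate = 0.10
    profits = [60 + 10 * t for t in range(20)]     # a made-up 20-year profit path ($B/yr)

    pv = sum(p / (1 + discount_rate) ** (t + 1) for t, p in enumerate(profits))
    pv_later_years = sum(p / (1 + discount_rate) ** (t + 1)
                         for t, p in enumerate(profits) if t >= 5)

    print(f"present value: ~${pv:.0f}B")
    print(f"share coming from year 6 onward: {pv_later_years / pv:.0%}")
    # A large share of the value sits 5-10+ years out, which is why uncertainty
    # about that period moves the stock so much.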
>It led to more revenue for the industry as a whole. But not necessarily for the individual companies that bubbled the hardest: Cisco stock is still to this day lower than it was at peak in 2000, to point to a significant company that sold actual physical infra products necessary for the internet and is still around and profitable to this day. (Some companies that bubbled did quite well, AMZN is like 75x from where it was in 2000. But that's a totally different company that captured an enormous amount of value from AWS that was not visible to the market in 2000, so it makes sense.)
Cisco in 1994: $3.
Cisco after dotcom bubble: $13.
So is Nvidia's stock price closer to 1994 or 2001?
I agree that advancements like DeepSeek, like transformer models before it, are just going to end up increasing demand.
It’s very shortsighted to think we’re going to need fewer chips because the algorithms got better. The system became more efficient, which causes induced demand.
It will increase the total volume demanded, but not necessarily the amount of value that companies like NVidia can capture.
Most likely, consumer surplus has gone up.
More demand for what, chatbots? ai slop? buggy code?
obligatory https://en.wikipedia.org/wiki/Jevons_paradox
If you normalize Nvidia's gross margin and take competitors into account, sure. But its current high margin is driven by Big Tech FOMO. Do keep in mind that going from a 90% margin (10x cost) to a 50% margin (2x cost) is a 5x price reduction.
So why would DeepSeek decrease FOMO? It should increase it if anything.
Because DeepSeek demonstrates that loads of compute isn't necessary for high-performing models, so we won't need as much or as powerful hardware as was previously thought, which is what Nvidia's valuation is based on?
That's assuming there isn't demand for more powerful models, there's still plenty of room for improvement from the current generation. We didn't stop at GPT-3 level models when that was achieved.
No, it doesn't.
Not only are 10-100x changes disruptive, but the players who don't adopt them quickly are going to be the ones who continue to buy huge amounts of hardware to pursue old approaches, and it's hard for incumbent vendors to avoid catering to their needs, up until it's too late.
When everyone gets up off the ground after the play is over, Nvidia might still be holding the ball but it might just as easily be someone else.
Yes, over the long haul, probably. But as far as individual investors go they might not like that Nvidia.
Anyone currently invested is presumably in because they like the insanely high profit margin, and this is apt to quash that. There is now much less reason to give your first born to get your hands on their wares. Comcast, Google, T-Mobile, Verizon, etc., and especially those not named Google, have nothingburger margins in comparison.
If you are interested in what they can do with volume, then there is still a lot of potential. They may even be more profitable on that end than a margin play could ever hope for. But that interest is probably not from the same person who currently owns the stock, it being a change in territory, and there is apt to be a lot of instability as stock changes hands from the one group to the next.
> Anyone currently invested is presumably in because they like the insanely high profit margin, [...]
I'm invested in Nvidia because it's part of the index that my ETF is tracking. I have no clue what their profit margins are.
> I'm invested in Nvidia [...] my ETF
That would be an unusual situation for an ETF. An ETF does not usually extend ownership of the underlying investment portfolio. An ETF normally offers investors the opportunity to invest in the ETF itself. The ETF is what you would be invested in. Your concern as an investor in an ETF would only be with the properties of the ETF, it being what you are invested in, and this seems to be true in your case as well given how you describe it.
Are you certain you are invested in Nvidia? The outcome of the ETF may depend on Nvidia, but it may also depend on how a butterfly in Africa happens to flap its wings. You aren't, by any common definition found within this type of context, invested in that butterfly.
Technically, all the Nvidia stock (and virtually all stocks in the US) are owned by Cede and Co. So Nvidia has only one investor.[0] There's several layers of indirection between your Robinhood portfolio and the actual Nvidia shares, even if Robinhood mentions NVDA as a position in your portfolio.
The ETF is just one more layer of indirection. You might like to read https://en.wikipedia.org/wiki/Exchange-traded_fund#Arbitrage... to see how ETFs are connected to the underlying assets.
You will find that the connection between ETFs and the underlying assets in the index is much more like the connection between your Robinhood portfolio and Nvidia, than the connection between butterflies and thunderstorms.
[0] At least for its stocks. Its bonds are probably held in different but equally weird ways.
> Technically, all the Nvidia stock (and virtually all stocks in the US) are owned by Cede and Co.
Technically, but they extend ownership. An ETF is a different type of abstraction. Which you already know because you spoke about that abstraction in your original comment, so why play stupid now?
It improves TSMC's case. Paying Nvidia would be like paying Cray for every smartphone that is faster than a supercomputer of old.
It seems even more stark. The current and projected energy costs for AI are staggering. At the same time, I think it has been MS that has been publishing papers on smaller LLMs (so-called small language models) that are more targeted and still achieve a fairly high "accuracy rate."
Didn't TSMC say that SamA came for a visit and said they needed $7T in investment to keep up with the pending demand?
This stuff is all super cool and fun to play with, and I'm not a naysayer, but it almost feels like these current models are "bubble sort" and who knows how it will look if "quicksort" for them becomes invented.
Another example: people like to cite how the people who really made money in the CA gold rush were selling picks and shovels.
That only lasted so long. Then it was heavy machinery (hydraulics, excavators, etc)
I always liked the "look" of high-bitrate MPEG-2 video. Download HD Japanese TV content from 2005-2010 and it still looks really good.
Yes, that is a very apt analogy!
>but there's no physical limit preventing them from being discovered, and the disruption when someone invents a new algorithm can be nearly immediate.
The rise of the net is Jevons paradox fulfilled. The orders of magnitude less bandwidth needed per cat video drove much more than that in overall growth in demand for said videos. During the dotcom bubble's collapse, bandwidth use kept going up.
Even if there is a near-term bear case for NVDA (dotcom bubble/bust), history indicates a bull case for the sector overall and related investments such as utilities (the entire history of the tech sector from 1995 to today).
I love algorithms as much as the next guy, but not really.
DCT was developed in 1972 and has a compression ratio of 100:1.
H.264 compresses 2000:1.
And standard resolution (480p) is ~1/30th the resolution of 4k.
---
I.e. Standard resolution with DCT is smaller than 4k with H.264.
Even high-definition (720p) with DCT is only twice the bandwidth of 4k H.264.
Modern compression has allowed us to add a bunch more pixels, but it was hardly a requirement for internet video.
The web didn't go from streaming 480p straight to 4k. There were a couple of intermediate jumps in pixel count that were enabled in large part by better compression. Notably, there was a time period where it was important to ensure your computer had hardware support for H.264 decode, because it was taxing on low-power CPUs to do at 1080p and you weren't going to get streamed 1080p content in any simpler, less efficient codec.
Right.
Modern compression algorithms had been developed, but they weren't even computationally feasible for much of that time.
DCT is not an algorithm at all, it’s a mathematical transform.
It doesn’t have a compression ratio.
Correct. DCT maps N real numbers to N real numbers. It reorganizes the data to make it more amenable to compression, but DCT itself doesn't do any compression.
The real compression comes from quantization and entropy coding (Huffman coding, arithmetic coding, etc.).
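Here's a small Python sketch of that pipeline on a single 8x8 block (the block contents and the quantization step size are arbitrary choices for illustration): the DCT itself is lossless and size-preserving; it's the quantization that zeroes out coefficients and produces something an entropy coder can shrink.

    import numpy as np
    from scipy.fft import dctn, idctn

    # A ramp-like 8x8 block standing in for image data (values 0-210)
    block = np.outer(np.arange(8), np.ones(8)) * 30

    # The 2-D DCT concentrates the block's energy into a few low-frequency coefficients
    coeffs = dctn(block, norm="ortho")

    # Quantization is where the actual lossy compression happens: small high-frequency
    # coefficients round to zero and become very cheap to entropy-code
    q_step = 16
    quantized = np.round(coeffs / q_step)
    print("nonzero coefficients:", np.count_nonzero(quantized), "of 64")

    # Reconstruction shows the transform loses nothing by itself; only quantization does
    reconstructed = idctn(quantized * q_step, norm="ortho")
    print("max reconstruction error:", np.abs(block - reconstructed).max())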
> DCT compression, also known as block compression, compresses data in sets of discrete DCT blocks.[3] DCT blocks sizes including 8x8 pixels for the standard DCT, and varied integer DCT sizes between 4x4 and 32x32 pixels.[1][4] The DCT has a strong energy compaction property,[5][6] capable of achieving high quality at high data compression ratios.[7][8] However, blocky compression artifacts can appear when heavy DCT compression is applied.
https://en.wikipedia.org/wiki/Discrete_cosine_transform
Exactly, it’s not an algorithm, it’s one mechanism used in many (most?) compression algorithms.
Therefore, it has no compression ratio, and it doesn’t make sense to compare it to other algorithms.
I'm sure it helped, but yeah, not only e2e bandwidth but also the total network throughput increased by vast orders of magnitude.
DeepSeek just further reinforces the idea that there is a first-move disadvantage in developing AI models.
When someone can replicate your model for 5% of the cost in 2 years, I can only see 2 rational decisions:
1) Start focusing on cost efficiency today to reduce the advantage of the second mover (i.e. trade growth for profitability)
2) Figure out how to build a real competitive moat through one or more of the following: economies of scale, network effects, regulatory capture
On the second point, it seems to me like the only realistic strategy for companies like OpenAI is to turn themselves into a platform that benefits from direct network effects. Whether that's actually feasible is another question.
This is wrong. First-mover advantage is strong. This is why OpenAI is much bigger than Mistral, despite what you said.
First mover advantage acquired and keeps subscribers.
No one really cares if you matched GPT-4o one year later. OpenAI has had a full year to optimize the model, build tools around the model, and use the model to generate better data for their next-generation foundational model.
What is OpenAI's first-mover moat? I switched to Claude with absolutely no friction or moat-jumping.
Brand - it's the most powerful first-mover advantage in this space.
ChatGPT is still vastly more popular than other, similar chat bots.
What is Google's first mover moat? I switched to Bing/DuckDuckGo with absolutely no friction or moat jumping.
Brands are incredibly powerful when talking about consumer goods.
Google wasn't the first mover in search. They were at least second if not third.
Google's moat was significantly better results than the competition for about 2 decades.
Your analogy is valid at this time, but proves the GP's point, not yours.
I think it's worth double clicking here. Why did Google have significantly better search results for a long time?
1) There was a data flywheel effect, wherein Google was able to improve search results by analyzing the vast amount of user activity on its site.
2) There were real economies of scale in managing the cost of data centers and servers
3) Their advertising business model benefited from network effects, wherein advertisers don't want to bother giving money to a search engine with a much smaller user base. This profitability funded R&D that competitors couldn't match.
There are probably more that I'm missing, but I think the primary takeaway is that Google's scale, in and of itself, led to a better product.
Can the same be said for OpenAI? I can't think of any strong economies of scale or network effects for them, but maybe I'm missing something. Put another way, how does OpenAI's product or business model get significantly better as more people use their service?
You are forgetting a bit. I worked in some of the large datacenters where both Google and Yahoo had cages.
1) Google copied the Hotmail model of strapping commodity PC components to cheap boards and building software to deal with the complexity.
2) Yahoo had a much larger cage, filled with very, very expensive and large DEC machines, with one poor guy sitting at a desk in there almost full time rebooting the systems, etc. I hope he has any hearing left today.
3) Right before the dot-com crash, I was in a cage next to Google's, racking dozens of brand-new Netra T1s, which were pretty slow and expensive... the company I was working for died in the crash.
Look at Google's web page:
https://www.webdesignmuseum.org/gallery/google-1999
Compare that to Yahoo:
https://www.webdesignmuseum.org/gallery/yahoo-in-1999
Or the company they originally tried to sell Google to, Excite:
https://www.webdesignmuseum.org/gallery/excite-2001
Google grew to be profitable because they controlled costs, invested in software instead of service contracts and enterprise gear, had a simple non-intrusive text-based ad model, etc.
Most of what you mention above came well after that model of focusing on users and thrift allowed them to scale, and is survivorship bias. Internal incentives that directed capital expenditures toward the mission rather than toward protecting people's backs were absolutely related to their survival.
Even though it was a metasearch engine, my personal preference was SavvySearch, until it was bought and killed or whatever that story was.
OpenAI is far more like Yahoo than Google.
> I hope he has any hearing left today
I opted for a fanless graphics board, for just that reason.
In theory, the more people use the product, the more OpenAI knows about what they are asking and what they do after the first result, and the better it can align its model to deliver better results.
A similar dynamic occurred in the early days of search engines.
I call it the experience flywheel. Humans come with problems, the AI assistant generates some ideas, the human tries them out and comes back to iterate. The model gets feedback on prior ideas. So you could say the AI tested an idea in the real world, using a human. This happens many times over for 300M users at OpenAI. They put a trillion tokens into human brains, and as many into their logs. The influence is bidirectional: people adapt to the model, and the model adapts to us. But that is in theory.
In practice I never heard OpenAI mention how they use chat logs for improving the model. They are either afraid to say, for privacy reasons, or want to keep it secret for technical advantage. But just think about the billions of sessions per month. A large number of them contain extensive problem solving. So the LLMs can collect experience, and use it to improve problem solving. This makes them into a flywheel of human experience.
They have more data on what people want from models?
Their SOTA models can generate better synthetic data for the next training run - leading to a flywheel effect?
> What is Google's first mover moat?
AdSense
But _why_ did AdSense work? They had to bootstrap with eyeballs.
Claude has effectively no eyeballs. API calls != eyeballs.
It's like people forget Google is an ad company
>What is OpenAI's first-mover moat?
The same one that underpins the entire existence of a little company called Spotify: I'm just too lazy to cancel my subscription and move to a newer player.
Not exactly a good sign for OpenAI considering Spotify has no power to increase prices enough such that it can earn a decent profit. Spotify’s potential is capped at whatever Apple/Amazon/Alphabet let them earn.
OpenAI has a lot more revenue than Claude.
Late in 2024, OpenAI had $3.7b in revenue. Meanwhile, Claude’s mobile app hit $1 million in revenue around the same time.
> Late in 2024, OpenAI had $3.7b in revenue
Where do they report these?
Edit: I found it here: https://www.cnbc.com/2024/09/27/openai-sees-5-billion-loss-t...
"OpenAI sees roughly $5 billion loss this year on $3.7 billion in revenue"
Almost everyone I know is the same. 'Claude seems to be better and can take more data' is what I hear a lot.
One moat will eventually come in the form of personal knowledge about you - consider talking with a close friend of many years vs a stranger
Couldn't you just copy all your conversations over?
*sigh*
This broken record again.
Just observe reality. OpenAI is leading, by far.
All these "OpenAI has no moat" arguments will only make sense whenever there's a material, observable (as in, not imaginary) shift in their market share.
I moved 100% over to deepseek. No switch cost. Zero.
These things aren't the same, though... yet.
ChatGPT is somewhat less censored (certainly on topics painful to the CCP), and GPT is multi-modal, which is a big selling point.
Depends on your use-case, of course.
OpenAI does not have a business model that is cash-flow positive at this point, and/or a product that gives them a significant leg up in the same moat sense that Office/Teams gives Microsoft.
Companies in the mobile era took a decade or more to become profitable. For example, Uber and Airbnb.
Why do you expect OpenAI to become profitable after 3 years of ChatGPT?
Interest rates have an effect too, Uber and Airbnb were starting in a much more fundraising friendly time.
High interest rates are supposed to force the remaining businesses out there to be profitable, so in theory, the startups of today should be far faster to profitability or they burn out.
True, but it makes it much more difficult to get started in the first place.
Nobody expects it, but what we know for sure is that they have burnt billions of dollars. If other startups can get there spending millions, the fact is that OpenAI won't ever be profitable.
And more important (for us), let the hiring frenzy start again :)
They have a ton of revenue and high gross margins. They burn billions because they need to keep training ever better models until the market slows and competition consolidates.
The counter argument is that they won't be able to sustain those gross margins when the market matures because they don't have an effective moat.
In this world, R&D costs and gross margin/revenue are inextricably correlated.
When the market matures, there will be fewer competitors so they won’t need to sustain the level of investment.
The market always consolidates when it matures. Every time. The market always consolidates into 2-3 big players. Often a duopoly. OpenAI is trying to be one of the two or three companies left standing.
> First mover advantage acquired and keeps subscribers.
Does it? As a chat-based (Claude Pro, ChatGPT Plus etc.) user, LLMs have zero stickiness to me right now, and the APIs hardly can be called moats either.
If it's for the mass consumer market, then it does matter. Ask any non-technical person around you. Chances are high that they know ChatGPT but can't name a single other AI model or service. Gemini, just a distant maybe. Claude, definitely not -- I'm positive I'd be hard pressed to find anyone even among my technical friends who knows about Claude.
They probably know Copilot as the thing Microsoft is trying to shove down their throats...
They also burnt a hell of a lot more cash. That’s a disadvantage.
> DeepSeek just further reinforces the idea that there is a first-move disadvantage in developing AI models.
You are assuming that what DeepSeek achieved can be reasonably easily replicated by other companies. Then the question is: with all the big tech companies and tons of startups in China and the US involved, how come none of them succeeded?
deepseek is unique.
Deepseek is unique, but the US has consistently underestimated Chinese R&D, which is not a winning strategy in iterated games.
There seems to be a 100-fold uptick in jingoists in the last 3-4 years, which makes my head hurt, but I think there is no consistent "underestimation" in academic circles? I think I have been reading articles about up-and-coming Chinese STEM for like 20 years.
Yes, for people in academia the trend is clear, but it seems that Wall Street didn't believe this was possible. They assume that spending more money is all you need to dominate technology. Wrong! Technology is about human potential. If you have less money but a bigger investment in people, you'll win the technological race.
I think Wall Street is in for a surprise, as they have been profiting from liquidating the inefficiency of worker trust and loyalty for quite some time now.
I think they think American engineering excellence was due to neoliberal ingenuity vis-a-vis the USSR, not the engineers and the transfer of academic legacy from generation to generation.
This is even more apparent when large tech corporations are, supposedly, in a big competition but at the same time firing thousands of developers and scientists. Are they interested in making progress or just reducing costs?
What does DeepSeek, or really High-Flyer, do that is particularly exceptional regarding employees? HFT firms and other elite law firms or hedge funds are known to have pretty zany benefits.
Orwellian Communism is the opposite of investing in people.
Whatever you think about the Chinese system, they educate hundreds of thousands of engineers and scientists every year. That's a fact.
Precisely. This is the view from the ivory tower.
That doesn't change the calculus regarding the actions you would pick externally; in fact, it only strengthens the case for increased tech restrictions and more funding.
Unique, yes, but isn't their method open? I read something about a group replicating a smaller variant of their main model.
Which raises the question: if LLMs are an asset of such strategic value, why did China allow DeepSeek to be released?
I see two possibilities here: either the CCP is not as all-reaching as we think, or the value of the technology isn't critical and the release was cleared with the CCP, maybe even timed to come right after Trump's announcement of American AI supremacy.
I really doubt there was any intention behind it at all. I bet deepseek themselves are surprised at the impact this is having, and probably regret releasing so much information into the open.
It's early innings, and supporting the open source community could be viewed by the CCP as an effective way to undermine the US's lead in AI.
In a way, their strategy could be:
1) Let the US invest $1 trillion in R&D
2) Support the open source community such that their capability to replicate these models only marginally lags the private sector
3) When R&D costs are more manageable, lean in and play catch up
It is hard to estimate how much of it is "didn't care", "didn't know", or "did it", I think. Rather pointless to speculate unless there are public party discussions about it to read.
It will be assumed by the American policy establishment that this represents what the CCP doesn't consider important, meaning that they have even better stuff in store. It will also be assumed that this was timed to take a dump on Trump's announcement, like you said.
And it did a great job. Nvidia stock's sunk, and investors are going to be asking if it's really that smart to give American AI companies their money when the Chinese can do something similar for significantly less money.
I mean, it's a strategic asset in the sense that it's already devalued a lot of the American tech companies because they're so heavily invested in AI. Just look at NVDA today.
We have one success after ~two years of ChatGPT hype (and therefore subsequent replication attempts). That's as fast as it gets.
You're making some big assumptions projecting into the future: one, that DeepSeek takes market position; two, that the information they have released is honest regarding training usage, spend, etc.
There's a lot more still to unpack, and I don't expect this to stay solely in the tech realm. It seems too politically sensitive.
I feel like AI tech just reverse-scales and reverse-flywheels, unlike the tech giant walls and moats of today, and I think that is wonderful. OpenAI has really never made sense from a financial standpoint, and that is healthier for humans. There's no network effect because there's no social aspect to AI chatbots. I can hop to DeepSeek from Google Gemini or OpenAI with ease because I don't have to have friends there and/or convince them to move. AI is going to be a race to the bottom that keeps prices low to zero. In fact, I don't know how they are going to monetize it at all.
DeepSeek is profitable, openai is not. That big expensive moat won't help much when the competition knows how to fly.
DeepSeek is not profitable. As far as I know, they don’t have any significant revenue from their models. Meanwhile, OpenAI has $3.7b in revenue last reported and has high gross margins.
Tell that to the stock market then; it might change the graph direction back to green.
I’m doing the best I can.
DeepSeek's inference API has a positive margin. This, however, does not take into account R&D costs like salaries and training. I believe OpenAI is the same in these respects, at least until now.
Even if DeepSeek has figured out how to do more (or at least as much) with less, doesn't the Jevons Paradox come into play? GPU sales would actually increase because even smaller companies would get the idea that they can compete in a space that only 6 months ago we assumed would be the realm of the large mega tech companies (the Metas, Googles, OpenAIs) since the small players couldn't afford to compete. Now that story is in question since DeepSeek only has ~200 employees and claims to be able to train a competitive model for about 20X less than the big boys spend.
My interpretation is that yes in the long haul, lower energy/hardware requirements might increase demand rather than decrease it. But right now, DeepSeek has demonstrated that the current bottleneck to progress is _not_ compute, which decreases the near term pressure on buying GPUs at any cost, which decreases NVIDIA's stock price.
Short term, I 100% agree, but it remains to be seen what "short" means. According to at least some benchmarks, Deepseek is two full orders of magnitude cheaper for comparable performance. Massive. But that opens the door for much more elaborate "architectures" (chain of thought, architect/editor, multiple choice) etc., since it's possible to run it over and over to get better results, so raw speed & latency will still matter.
I think it's worth carefully pulling apart _what_ DeepSeek is cheaper at. It's somewhat cheaper at inference (0.3 OOM), and about 1-1.5 OOM cheaper for training (Inference costs: https://www.latent.space/p/reasoning-price-war)
It's also worth keeping in mind that depending on benchmark, these values change (and can shrink quite a bit)
And it's also worth keeping in mind that the drastic drop in training cost(if reproducible) will mean that training is suddenly affordable for a much larger number of organizations.
I'm not sure the impact on GPU demand will be as big as people assume.
It does, but proving that it can be done with cheaper (and more importantly for NVidia), lower margin chips breaks the spell that NVidia will just be eating everybody's lunch until the end of time.
If demand for AI chips will increase due to Jevon’s paradox, why would Nvidia’s chips become cheaper?
In the long run, yes, they will be cheaper due to more competition and better tech. But next month? It will be more expensive.
The usage of existing but cheaper Nvidia chips to make models of similar quality is the main takeaway.
It'll be much harder to convince people to buy the latest and greatest with this out there.
The sweet spot for running local LLMs (from what I'm seeing on forums like r/LocalLLaMA) is 2 to 4 3090s, each with 24GB of VRAM. Nvidia (or AMD or Intel) would clean up if they offered a card with 3090-level performance but with 64GB of VRAM. It doesn't have to be the leading-edge GPU, just a decent GPU with lots of VRAM. This is kind of what Digits will be (though the memory bandwidth is going to be slower because it'll be DDR5) and kind of what AMD's Strix Halo is aiming for - unified memory systems where the CPU & GPU have access to the same large pool of memory.
The issue here is that, even with a lot of VRAM, you may be able to run the model, but with a large context, it will still be too slow. (For example, running LLaMA 70B with a 30k+ context prompt takes minutes to process.)
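Some rough memory math for why that happens. The layer/head/precision numbers below are approximate Llama-70B-like assumptions, and activation memory and runtime overhead are ignored:

    # Rough memory math for why long contexts hurt (shape numbers are approximate,
    # Llama-70B-like assumptions; activations and runtime overhead are ignored)
    params = 70e9
    bits_per_weight = 4                       # e.g. a 4-bit quantization
    weights_gb = params * bits_per_weight / 8 / 1e9

    layers, kv_heads, head_dim = 80, 8, 128   # grouped-query attention, fp16 KV cache
    kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2   # K and V, 2 bytes each
    context_tokens = 30_000
    kv_cache_gb = kv_bytes_per_token * context_tokens / 1e9

    print(f"weights:  ~{weights_gb:.0f} GB")        # ~35 GB at 4-bit
    print(f"KV cache: ~{kv_cache_gb:.1f} GB at {context_tokens} tokens")
    # Even when the weights fit across a couple of 24 GB cards, a 30k-token prompt
    # adds a large KV cache and a lot of prefill compute, which is where the minutes go.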
Because if you don't have infinite money, considering whether to buy a thing is about the ratio of price to performance, not just performance. If you can get enough performance for your needs out of a cheaper chip, you buy the cheaper chip.
The AI industry isn't pausing because DeepSeek is good enough. The industry is in an arms race to AGI. Having a more efficient method to train and use LLMs only accelerates progress, leading to more chip demand.
There is no indication that adding more compute will give AGI
Is there still evidence that more compute = better model?
Yes. Plenty of evidence.
The DeepSeek R1 model people are freaking out about runs better with more compute because it's a chain-of-thought model.
Selling 100 chips for $1 profit is less profitable than selling 20 chips for $10 profit.
Margin only goes down if a competitor shows up. Getting more "performance" per chip will actually let Nvidia raise prices even more if they want.
Since you no longer need CUDA, AMD becomes a new viable option.
Deepseek uses cuda.
Jevons paradox isn't some iron law like gravity.
It feels like it is in tech. Any gains from hardware or algorithmic advances immediately get consumed by increases in data retention and software bloat.
But why would the customers accept the high prices and high gross margin of Nvidia if they no longer fear missing out with insufficient hardware?
Important to note: the $5 million alleged cost is just the GPU compute cost for the final version of the model; it's not the cumulative cost of the research to date.
The analogous costs would be what OpenAI spent to go from GPT-4 to GPT-4o (i.e., to develop the reasoning model from the most up-to-date LLM). $5 million is still less than what OpenAI spent, but it's not an order of magnitude lower. (OpenAI spent up to $100 million on GPT-4 but a fraction of that to get GPT-4o. Will update comment if I can find numbers for 4o before the edit window closes.)
It doesn't make sense to compare individual models. A better way is to look at total compute consumed, normalized by the output. In the end what counts is the cost of providing tokens.
Great article, but it seems to have a fatal flaw.
As pointed out in the article, Nvidia has several advantages, and each of them is under attack. The article concludes that Nvidia faces an unprecedented convergence of competitive threats. The flaw in the analysis is that these threats are not unified. Any serious competitor must address ALL of Nvidia's advantages. Instead, Nvidia is being attacked by multiple disconnected competitors, and each of those competitors is only attacking one Nvidia advantage at a time. Even if each of those attacks is individually successful, Nvidia will remain the only company that has ALL of the advantages.
I want the Nvidia monopoly to end, but there is still no real competition:
* George Hotz has basically given up on AMD: https://x.com/__tinygrad__/status/1770151484363354195
* Groq can't produce more hardware past their "demo". It seems like they haven't grown capacity in the years since they announced, and they switched to a complete SaaS model and don't even sell hardware anymore.
* I don't know enough about MLX, Triton, and JAX.
I also noticed that Groq's Chief Architect now works for NVIDIA.
https://research.nvidia.com/person/dennis-abts
That George Hotz tweet is from March last year. He's gone back and forth on AMD a bunch more times since then.
The same Hotz who lasted like 4 weeks at Twitter after announcing that he'd fix everything? It doesn't really inspire a ton of confidence that he can single handedly take down Nvidia...
is that good or bad?
I consider it a good sign that he hasn’t completely given up. But it sure all seems shaky.
Honestly I tried searching his recent tweets for AMD and there was way too much noise in there to figure out his current position!
" we are going to move it off AMD to our own or partner silicon. We have developed it to be very portable."
https://x.com/__tinygrad__/status/1879617702526087346
Honest question: that sounds more difficult than getting things to play with commodity hardware. Maybe I am oversimplifying it though.
They have their own NN etc. libraries, so adapting should be fairly focused, and AMD drivers historically have a hilariously bad reputation among people who program GPUs (I've been bitten a couple of times myself by weirdness).
I think you should consider it this way: if they're trying to avoid Nvidia and make sure their code isn't tied to Nvidia-isms, and AMD is troublesome enough for the basics, then the step to customized solutions is small enough to be worthwhile for something even cheaper than AMD.
Thanks, I don't have any experience in this realm and this was helpful to digest the problem space.
It looks like he's close to having his own AMD stack; tweet linked in the article, Jan 15, 2025: https://x.com/__tinygrad__/status/1879615316378198516
We'll check in again with him in 3 months and he'll still be just 1 piece away.
$1000 bounty? That's like 2 hours of development time at market rate lol
> Any serious competitor must address ALL of Nvidia's advantages.
Not really; his article focuses on Nvidia being valued so highly by stock markets. He's not saying that Nvidia is destined to lose its advantage in the space in the short term.
In any case, I also think that the likes of MSFT/AMZN/etc will be able to reduce their capex spending eventually by being able to work on a well integrated stack on their own.
They have an enormous amount of catching up to do, however; Nvidia have created an entire AI ecosystem that touches almost every aspect of what AI can do. Whatever it is, they have a model for it, and a framework and toolkit for working with or extending that model - and the ability to design software and hardware in lockstep. Microsoft and Amazon have a very diffuse surface area when it comes to hardware, and being a decent generalist doesn’t make you a good specialist.
Nvidia are doing phenomenal things with robotics, and that is likely to be the next shoe to drop; they are positioned for another catalytic moment similar to the one we have seen with LLMs.
I do think we will see some drawback or at least deceleration this year while the current situation settles in, but within the next three years I think we will see humanoid robots popping up all over the place, particularly as labour shortages arise due to political trends - and somebody is going to have to provide the compute, both local and cloud, and the vision, movement, and other models. People will turn to the sensible and known choice.
So yeah, what you say is true, but I don't think it is going to have an impact on the trajectory of Nvidia.
>So how is this possible? Well, the main reasons have to do with software— better drivers that "just work" on Linux and which are highly battle-tested and reliable (unlike AMD, which is notorious for the low quality and instability of their Linux drivers)
This does not match my experience from the past ~6 years of using AMD graphics on Linux. Maybe things are different with AI/Compute, I've never messed with that, but in terms of normal consumer stuff the experience of using AMD is vastly superior than trying to deal with Nvidia's out-of-tree drivers.
They are.
He's setting up a case for shorting the stock, ie if the growth or margins drop a little from any of these (often well-funded) threats. The accuracy of the article is a function of the current valuation.
Exactly. You just need to see a slight deceleration in projected revenue growth (which has been running 120%+ YoY recently) and some downward pressure on gross margins, and maybe even just some market share loss, and the stock could easily fall 25% from that.
AMD P/E ratio is 109, NVDA is 56. Which stock is overvalued?
You have to look at non-GAAP numbers, and therefore looking at forward P/E ratios is necessary. When you look at that, AMD is cheaper than NVDA. Moreover, the reason AMD's P/E ratio looks high is the Xilinx acquisition: accounting for it in a tax-efficient way depresses GAAP earnings and makes the ratio look really high.
rofl Forward PE ....
Intel had a great P/E a couple of years ago as well :)
On the other hand, getting a bigger slice of the existing cake as a smaller challenger can be easier than baking a bigger cake as the incumbent.
Hey let’s buy intel
That is extraordinarily simplistic. If NVDA is slowing and AMD has gains to realize compared to NVDA, then the 10x difference in market cap would imply that AMD is the better buy. Which is why I am long in AMD. You can't just look at the current P/E delta. You have to look at expectations of one vs the other. AMD gaining 2x over NVDA means they are approximately equivalently valued. If there are unrealized AI related gains all bets are off. AMD closing 50% of the gap in market cap value between NVDA and AMD means AMD is ~2.5x undervalued.
Disclaimer: long AMD, and not precise on percentages. Just illustrating a point.
The point is, it should not be taken for granted that NVDA is overvalued. Their P/E is low enough that if you’re going to state that they are overvalued you have to make the case. The article while well written, fails to make the case because it has a flaw: it assumes that addressing just one of Nvidia’s advantages is enough to make it crash and that’s just not true.
If investing were as simple as looking at the P/E, all P/Es would already be at 15-20, wouldn't they?
Not saying it is as simple as looking at P/E
My point is that you have to make the case for anything being over/undervalued. The null hypothesis is that the market has correctly valued it, after all.
In the long run, probably yes, but a particular stock is less likely to be accurately valued in the short run.
If, medium to long term, you believe the space will eventually get commoditized, the bear case is obvious. And based on history there's a pretty high likelihood of that happening.
glad you are not my financial adviser :)
If it were all so simple, they wouldn’t pay hedge fund analysts so much money…
No, that's not true. Hedge funds get paid so well because getting a small percentage of a big bag of money is still a big bag of money. This statement is more true the closer the big bag of money is to infinity.
NVDA is valued at $3.5 trillion, which means investors think it will grow to around $1 trillion in yearly revenue. Current revenue is around $35 billion per quarter, so call it $140 billion yearly. Investors are betting on a 7x increase in revenue. Not impossible, and it sounds plausible, but you need to assume AMD, INTC, GOOG, AMZN, and all the others who make GPUs/TPUs either won't take market share or the market will be worth multiple trillions per year.
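A rough sketch of that arithmetic; the mature price-to-sales multiple used to back out the ~$1 trillion implied revenue is my assumption, not a figure from the comment:

    # Back-of-envelope check of the "investors are betting on ~7x revenue" claim.
    market_cap = 3.5e12                  # ~$3.5T valuation
    current_annual_rev = 4 * 35e9        # ~$35B/quarter -> ~$140B/year

    assumed_mature_ps_multiple = 3.5     # assumption for illustration
    implied_annual_rev = market_cap / assumed_mature_ps_multiple   # ~$1T

    print(f"implied revenue ~ ${implied_annual_rev/1e12:.1f}T, "
          f"or {implied_annual_rev/current_annual_rev:.0f}x today's ~$140B")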
I thought the idea of valuing public companies at 3x revenues or 5x earnings had long since sailed?
Tech companies are valued higher because lots of people think there's still room for the big tech companies to consolidate market share and for the market itself to grow, especially as they all race towards AI. Low interest rates, tech and AI hype add to it.
Funny timing though, today NVDA lost $589 billion in market cap as the market got spooked.
> The accuracy of the article is a function of the current valuation.
ah ... no ... that's nonsense trying to hide behind stilted math lingo.
> - Better Linux drivers than AMD
Unless something radically changed in the last couple years, I am not sure where you got this from? (I am specifically talking about GPUs for computer usage rather than training/inference)
> Unless something radically changed in the last couple years, I am not sure where you got this from?
This was the first thing that stuck out to me when I skimmed the article, and the reason I decided to invest the time reading it all. I can tell the author knows his shit and isn't just parroting everyone's praise for AMD Linux drivers.
> (I am specifically talking about GPUs for computer usage rather than training/inference)
Same here. I suffered through the Vega 64 after everyone said how great it is. So many AMD-specific driver bugs, AMD driver devs not wanting to fix them for non-technical reasons, so many hard-locks when using less popular software.
The only complaints about Nvidia drivers I found were "it's proprietary" and "you have to rebuild the modules when you update the kernel" or "doesn't work with wayland".
I'd hesitate to ever touch an AMD GPU again after my experience with it; I haven't had a single hiccup for years after switching to Nvidia.
Another ding against Nvidia for Linux desktop use is that only some distributions make it easy to install and keep the proprietary drivers updated (e.g. Ubuntu) and/or ship variants with the proprietary drivers preinstalled (Mint, Pop!_OS, etc).
This isn’t a barrier for Linux veterans but it adds significant resistance for part-time users, even those that are technically inclined, compared to the “it just works” experience one gets with an Intel/AMD GPU under just about every Linux distro.
Wayland was a requirement for me. I've used an AMD GPU for years. I had a bug exactly once with a linux update. But has been stable since.
Wayland doesn't matter in the server space though.
they are, unless you get distracted by things like licensing and out of tree drivers and binary blobs. If you'd rather pontificate about open source philosophy and rights than get stuff done, go right ahead.
The unification of the flaws is the scarcity of H100s
He says this and talks about it in The Fallout section: even at BigCos with megabucks, the teams are starved for time on the Nvidia chips, and if these innovations work, other teams will use them, and then boom, Nvidia's moat is truncated somehow, which doesn't look good at such lofty multiples.
Sorry, I don’t know who George Hotz is, but why isn’t AMD making better drivers for AMD?
George Hotz is a hot Internet celebrity that has basically accomplished nothing of value but has a large cult following. You can safely ignore.
(Famous for hacking the PS3–except he just took credit for a separate group’s work. And for making a self-driving car in his garage—except oh wait that didn’t happen either.)
He took an “internship” at Twitter/X with the stated goal of removing the login wall, apparently failing to realize that the wall was a deliberate product decision, not a technical challenge. Now the X login wall is more intrusive than ever.
He was famous before the PS3 hack, he was the first person to unlock the original iPhone.
Yes, but it's worth mentioning that the break consisted of opening up the phone and soldering on a bypass for the carrier card locking logic. That certainly required some skills to do, but is not an attack Apple was defending against. This unlocking break didn't really lead to anything, and was unlike the later software unlocking methods that could be widely deployed.
Well he also found novel exploits in multiple later iPhone hardware/software models and implemented complete jailbreak applications.
You’re not wrong, but after all these years it’s fair to give benefit of the doubt - geohot may have grown as a person. The PS3 affair was incredibly disappointing.
Given the number of times he has been on the news for bombastic claims he doesn’t follow through on, I don’t think we need to guess. He hasn’t changed.
Comma.ai works really well. I use it every day in my car.
What about comma.ai?
He promised Waymo.
What specifically is in comma.ai that makes it less technically impressive? Comma.ai looks like epic engineering to me. I haven't made any self driving cars.
Why do you think otherwise? Can you share specific details?
> - Better Linux drivers than AMD
In which way? As a user who switched from an AMD GPU to an Nvidia GPU, I can only report a continuing stream of problems with Nvidia's proprietary driver, and none with AMD. Is this maybe about the open-source drivers, or about usage for AI?
George is writing software to directly talk to consumer AMD hardware, so that he can sell more Tinyboxes. He won't be doing that for enterprise.
Cerebras and Groq need to solve the memory problem. They can't scale without adding 10x the hardware.
Don't forget they bought Mellanox and have their own HBA and switch business.
> George Hotz is making better drivers for AMD
lol
*George Hotz is making posts online talking about how AMD isn’t helping him
George Hotz tried to extort AMD into giving him $500k in free hardware and $2m cash, and they politely declined.
Was arguably not that polite and caused them some bad PR IMHO
You have to know the history and a bit of inside rumors to understand what was really going on.
What came out of it (and the semianalysis article) was that Anush would step up to the plate and work on improving the software.
George making noise is just a momentary blip in time that will be forgotten a week later…
A new entrant, with an order of magnitude advantage in e.g. cost or availability or exportability, can succeed even with poor drivers and no CUDA etc. It's only when you cost nearly as much as Nvidia that the tooling costs become relevant.
There is not enough water (to cool data centers) to justify NVDA's current valuation.
The same is true of electricity - neither nuclear power nor fusion will be online anytime soon.
Those are definitely not the limiting factors here.
Not nearly all data centers are water cooled, and there is this amazing technology that can convert sunlight into electricity in a relatively straightforward way.
AI workloads (at least training) are just about as geographically distributable as it gets due to not being very latency-sensitive, and even if you can't obtain sufficient grid interconnection or buffer storage, you can always leave them idle at night.
Right - they are not limiting factors, they are reasons that NVDA is overvalued.
Stock price is based on future earnings.
The smart money knows this and is reacting this morning - thus the drop in NVDA.
Solar microgrids are cheaper and faster than nuclear. New nuclear isn't happening on the timescales that matter, even assuming significant deregulation.
Can you back up that solar microgrids will supply enough power to justify NVDA's current valuation?
Well, prediction is very difficult, especially with respect to the future. But the fundamentals look good.
Current world marketed energy consumption is about 18 terawatts. Current mainstream solar panels are 21% efficient. At this efficiency, the terrestrial solar resource is about 37000 terawatts, 2000 times larger than the entire human economy:
IEA reports that currently (three years ago) datacenters used 460 TWh/year. In SI units, that's 0.05 terawatts. https://iea.blob.core.windows.net/assets/6b2fd954-2017-408e-... So, once datacenters are using seven hundred thousand times more power than currently, we might need to seek power sources for them other than terrestrial solar panels running microgrids. Solar panels in space, for example.
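Sanity-checking those orders of magnitude with a quick script (unit conversions only; the IEA and solar-resource figures are the ones quoted above):

    # Sanity-check the orders of magnitude quoted above.
    HOURS_PER_YEAR = 8766                     # average year, including leap years

    datacenter_tw = 460 / HOURS_PER_YEAR      # 460 TWh/year -> ~0.05 TW
    world_consumption_tw = 18
    solar_resource_tw = 37_000                # at 21% panel efficiency

    print(f"datacenters ~ {datacenter_tw:.3f} TW")
    print(f"solar resource / datacenters ~ {solar_resource_tw / datacenter_tw:,.0f}x")
    print(f"solar resource / world economy ~ {solar_resource_tw / world_consumption_tw:,.0f}x")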
You could be forgiven for wondering why this enormous resource has taken so long to tap into and why the power grid is still largely fossil-fuel-powered. The answer is that building fossil fuel plants only costs on the order of US$1–4 per watt (either nameplate or average), and until the last few years, solar panels cost so much more than that that even free "fuel" wasn't enough to make them economically competitive. See https://www.eia.gov/analysis/studies/powerplants/capitalcost... for example.
Today, however, solar panels cost US$0.10 per peak watt, which works out to about US$0.35 to US$1 per average watt, depending largely on latitude. This is 25% lower than the price of even a year ago and a third of the price of two years ago. https://www.solarserver.de/photovoltaik-preis-pv-modul-preis...
Geohot still at it?
goat.
Great article.
>Now, you still want to train the best model you can by cleverly leveraging as much compute as you can and as many trillion tokens of high quality training data as possible, but that's just the beginning of the story in this new world; now, you could easily use incredibly huge amounts of compute just to do inference from these models at a very high level of confidence or when trying to solve extremely tough problems that require "genius level" reasoning to avoid all the potential pitfalls that would lead a regular LLM astray.
I think this is the most interesting part. We always knew a huge fraction of the compute would be on inference rather than training, but it feels like the newest developments are pushing this even further towards inference.
Combine that with the fact that you can run the full R1 (680B) distributed on 3 consumer computers [1].
If most of NVIDIAs moat is in being able to efficiently interconnect thousands of GPUs, what happens when that is only important to a small fraction of the overall AI compute?
[1]: https://x.com/awnihannun/status/1883276535643455790
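A rough memory calculation shows why the demo needs about three machines; the 192GB figure is the top M2 Ultra RAM configuration, and the usable-fraction overhead is a guess:

    import math

    # Why ~3 Macs: at 4-bit quantization the weights alone are ~340 GB.
    params_b = 680                      # total parameters in billions, as quoted
    bits_per_weight = 4                 # 4-bit quant, per the linked demo
    weights_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9   # ~340 GB

    ram_per_mac_gb = 192                # max M2 Ultra configuration
    usable_fraction = 0.75              # assumption: leave room for OS + KV cache
    machines = math.ceil(weights_gb / (ram_per_mac_gb * usable_fraction))

    print(f"weights ~ {weights_gb:.0f} GB -> ~{machines} Macs at {ram_per_mac_gb} GB each")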
Conversely, how much larger can you scale if frontier models only currently need 3 consumer computers?
Imagine having 300. Could you build even better models? Is DeepSeek the right team to deliver that, or can OpenAI, Meta, HF, etc. adapt?
Going to be an interesting few months on the market. I think OpenAI lost a LOT in the board fiasco. I am bullish on HF. I anticipate Meta will lose folks to brain drain in response to management equivocation around company values. I don't put much stock into Google or Microsoft's AI capabilities, they are the new IBMs and are no longer innovating except at obvious margins.
Google is silently catching up fast with Gemini. They're also pursuing next gen architectures like Titan. But most importantly, the frontier of AI capabilities is shifting towards using RL at inference (thinking) time to perform tasks. Who has more data than Google there? They have a gargantuan database of queries paired with subsequent web nav, actions, follow up queries etc. Nobody can recreate this, Bing failed to get enough marketshare. Also, when you think of RL talent, which company comes to mind? I think Google has everyone checkmated already.
Can you say more about using RL at inference time, ideally with a pointer to read more about it? This doesn’t fit into my mental model, in a couple of ways. The main way is right in the name: “learning” isn’t something that happens at inference time; inference is generating results from already-trained models. Perhaps you’re conflating RL with multistage (e.g. “chain of thought”) inference? Or maybe you’re talking about feeding the result of inference-time interactions with the user back into subsequent rounds of training? I’m curious to hear more.
I wasn't clear. Model weights aren't changing at inference time. I meant at inference time the model will output a sequence of thoughts and actions to perform tasks given to it by the user. For instance, to answer a question it will search the web, navigate through some sites, scroll, summarize, etc. You can model this as a game played by emitting a sequence of actions in a browser. RL is the technique you want to train this component. To scale this up you need to have a massive amount of examples of sequences of actions taken in the browser, the outcome it led to, and a label for whether that outcome was desirable or not. I am saying that by recording users googling stuff and emailing each other for decades, Google has this massive dataset to train their RL-powered browser-using agent. DeepSeek proving that simple RL can be cheaply applied to a frontier LLM and have reasoning organically emerge makes this approach more obviously viable.
Makes sense, thanks. I wonder whether human web-browsing strategies are optimal for use in a LLM, e.g. given how much faster LLMs are at reading the webpages they find, compared to humans? Regardless, it does seem likely that Google’s dataset is good for something.
How quickly the narrative went from 'Google silently has the most advanced AI but they are afraid to release it' to 'Google is silently catching up' all using the same 'core Google competencies' to infer Google's position of strength. Wonder what the next lower level of Google silently leveraging their strength will be?
Google is clearly catching up. Have you tried the recent Gemini models? Have you tried deep research? Google is like a ship that is hard to turn around but also hard to stop once in motion.
Never underestimate Google's ability to fall flat on their face when it comes to shipping products.
If you watch this video, it explains well what the major difference is between DeepSeek and existing LLMs: https://www.youtube.com/watch?v=DCqqCLlsIBU
It seems like there is MUCH to gain by migrating to this approach - and it theoretically should not cost more to switch to that approach than vs the rewards to reap.
I expect all the major players are already working full-steam to incorporate this into their stacks as quickly as possible.
IMO, this seems incredibly bad to Nvidia, and incredibly good to everyone else.
I don't think this seems particularly bad for ChatGPT. They've built a strong brand. This should just help them reduce - by far - one of their largest expenses.
They'll have a slight disadvantage to say Google - who can much more easily switch from GPU to CPU. ChatGPT could have some growing pains there. Google would not.
> I don't think this seems particularly bad for ChatGPT. They've built a strong brand. This should just help them reduce - by far - one of their largest expenses.
Often expenses like that are keeping your competitors away.
Yes, but it typically doesn't matter if someone can reach parity or even surpass you - they have to surpass you by a step function to take a significant number of your users.
This is a step function in terms of efficiency (which presumably will be incorporated into ChatGPT within months), but not in terms of end user experience. It's only slightly better there.
One data point, but my ChatGPT subscription is set to cancel every time, so every month I make a fresh decision to resub. And because the cost of switching is essentially zero, the moment a better service is out there I will switch in an instant.
There are obviously people like you, but I hope you realize this is not the typical user.
That is a fantastic video, BTW.
>Imagine having 300.
Would it not be useful to have multiple independent AIs observing and interacting to build a model of the world? I'm thinking something roughly like the "counselors" in the Civilization games, giving defense/economic/cultural advice, but generalized over any goal-oriented scenario (and including one to take the "user" role). A group of AIs with specific roles interacting with each other seems like a good area to explore, especially now given the downward scalability of LLMs.
This is exactly where DeepSeek's enhancements come into play. Essentially, DeepSeek lets the model think out loud via chain of thought (o1 and Claude also do this), but DS also does not supervise the chain of thought, and simply rewards CoT that gets the answer correct. This is just one of the half dozen training optimizations that DeepSeek has come up with.
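A minimal sketch of what "don't supervise the chain of thought, only reward the answer" can look like; the marker format and reward values here are made up for illustration and are not DeepSeek's actual reward function:

    # Outcome-based reward: grade only the final answer, ignore the CoT tokens.
    # Purely illustrative -- not DeepSeek's actual implementation.

    def outcome_reward(completion: str, reference_answer: str) -> float:
        """1.0 if the text after 'Final answer:' matches the reference, else 0.0.
        Nothing between the <think> tags is ever graded."""
        final = completion.split("Final answer:")[-1].strip()
        return 1.0 if final == reference_answer.strip() else 0.0

    samples = [
        "<think>27 * 3 = 81, minus 1 is 80</think> Final answer: 80",
        "<think>27 * 3 = 84? subtract 1...</think> Final answer: 83",
    ]
    print([outcome_reward(s, "80") for s in samples])   # [1.0, 0.0]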
Yes; to my understanding that is MoE.
This assumes no (or very small) diminishing returns effect.
I don't pretend to know much about the minutiae of LLM training, but it wouldn't surprise me at all if throwing massively more GPUs at this particular training paradigm only produces marginal increases in output quality.
I believe the margin to expand is on CoT, where tokens can grow dramatically. If there is value in putting more compute towards it, there may still be returns to be captured on that margin.
> If most of NVIDIAs moat is in being able to efficiently interconnect thousands of GPUs
Nah, its moat is CUDA and the millions of devs using CUDA, aka the ecosystem.
But if it's not combined with super high end chips with massive margins that moat is not worth anywhere close to 3T USD.
And then some Chinese startup creates an amazing compiler that takes CUDA and moves it to X (AMD, Intel, ASIC) and we are back at square one.
So far it seems that the best investment is in RAM producers. Unlike compute the ram requirements seem to be stubborn.
Don't forget that "CUDA" involves more than language constructs and programming paradigms.
With NVDA, you get tools to deploy at scale, maximize utilization, debug errors and perf issues, share HW between workflows, etc. These things are not cheap to develop.
It might not be cheap to develop them but if you can save $10B in hardware costs by doing so you're probably looking at positive ROI.
Yeah, I mean, 9 women can make a baby in a month so why not?
Oh wait, it takes years to do all that and in the meantime you're wasting energy on not staying at the forefront of a hot tech trend.
Running a 680-billion parameter frontier model on a few Macs (at 13 tok/s!) is nuts. That's two years after ChatGPT was released. That rate of progress just blows my mind.
And those are M2 Ultras. M4 Ultra is about to drop in the next few weeks/months, and I'm guessing it might have higher RAM configs, so you can probably run the same 680b on two of those beasts.
The higher-performing chips, with one less interconnect, are going to give you significantly higher t/s.
Link has all the params but running at 4 bit quant.
4-bit quant is generally kinda low, right?
I wonder how badly this quant affects the output on DeepSeek?
> NVIDIAs moat
Offtopic, but your comment finally pushed me over the edge to semantic satiation [1] regarding the word "moat". It is incredible how this word turned up a short while ago and now it seems to be a key ingredient of every second comment.
[1] https://en.wikipedia.org/wiki/Semantic_satiation
It is incredible how this word turned up a short while ago…
I'm sure if I looked, I could find quotes from Warren Buffett (the recognized originator of the term) going back a few decades. But your point stands.
The earliest occurrence of the word "moat" that I could find online from Buffett is from 1986: https://www.berkshirehathaway.com/letters/1986.html That shareholder letter is charmingly old-school.
Unfortunately letters before 1977 weren't available online so I wasn't able to search.
It also helps that I've been to several cities with an actual moat so this word is familiar to me.
Yeah, he's been talking about "economic moats" since at least the 1990s. At least since 1995;
https://www.berkshirehathaway.com/letters/1995.html
Nobody claimed it's a new word. Still, the frequency increased 100x over the last days, subjectively speaking.
The word moat was first used in english in the 15th century https://www.merriam-webster.com/dictionary/moat
Yes, my wording was rubbish; I should have said "turned up" in the HN bubble. A quick ctrl-f shows 35 uses in this thread without loading all comments.
I did not mean that it was literally invented a short while ago - a few months ago I had to look up what it means though (not native English).
https://en.wikipedia.org/wiki/Frequency_illusion
I'm struggling to understand how a moat can have a CRACK in it.
perhaps if the moat is kept in place by some sort of berm or quay
This is excellent writing.
Even if you have no interest at all in stock market shorting strategies there is plenty of meaty technical content in here, including some of the clearest summaries I've seen anywhere of the interesting ideas from the DeepSeek v3 and R1 papers.
Thanks Simon! I’m a big fan of your writing (and tools) so it means a lot coming from you.
I was excited as soon as I saw the domain name. Even after a few months, this article[1] is still at the top of my mind. You have a certain way of writing.
I remember being surprised at first because I thought it would feel like a wall of text. But it was such a good read and I felt I gained so much.
1: https://youtubetranscriptoptimizer.com/blog/02_what_i_learne...
I was put off by the domain by bias against something that sounds like a company blog. Especially a "YouTube something".
You may get more mileage from excellent writing on a yourname.com. This is a piece that sells you, not this product, plus it feels more timeless. In 2050 someone may point to this post. Better if it were under your own name.
I had no idea this would get so much traction. I wanted to enhance my organic search ranking of my niche web app, not crash the global stock market!
I really appreciate that, thanks so much!
Many thanks for writing this - it's extremely interesting and very well written. I feel like I've been brought up to date, which is hard in the AI world!
I'm curious if someone more informed than me can comment on this part:
> Besides things like the rise of humanoid robots, which I suspect is going to take most people by surprise when they are rapidly able to perform a huge number of tasks that currently require an unskilled (or even skilled) human worker (e.g., doing laundry ...
I've always said that the real test for humanoid AI is folding laundry, because it's an incredibly difficult problem. And I'm not talking about giving a machine clothing piece-by-piece flattened so it just has to fold, I'm talking about saying to a robot "There's a dryer full of clothes. Go fold it into separate piles (e.g. underwear, tops, bottoms) and don't mix the husband's clothes with the wife's". That is, something most humans in the developed world have to do a couple times a week.
I've been following some of the big advances in humanoid robot AI, but the above task still seems miles away given current tech. So is the author's quote just more unsubstantiated hype that I'm constantly bombarded with in the AI space, or have there been advancements recently in robot AI that I'm unaware of?
https://physicalintelligence.company is working on this – see a demo where their robot does ~exactly what you said, I believe based on a "generalist" model (not pretrained on the tasks): https://www.youtube.com/watch?v=J-UTyb7lOEw
That's the same video I commented on below: https://news.ycombinator.com/item?id=42844967
There's a huge gulf between what is shown in that video and what is needed to replace a human doing that task.
There are so many cuts in that 1 minute video, Jesus Christ. You'd think it was produced for TikTok.
There's a laundry-folding section at the end that isn't cut. Looks reasonably impressive, if your standard is slightly above that of a teenager.
2 months ago, Boston Dynamics' Atlas was barely able to put solid objects in open cubbies. [1] Folding, hanging, and dresser drawer operation appears to be a few years out still.
https://www.youtube.com/watch?v=F_7IPm7f1vI
I saw demos of such robots doing exactly that on YouTube/X - not very precise yet, but almost good enough. And it is just the beginning. Considering that the majority of laundry is very similar (shirts, t-shirts, trousers, etc.), I think this will be solved soon with enough training.
Can you share what you've seen? Because from what I've seen, I'm far from convinced. E.g. there is this, https://youtube.com/shorts/CICq5klTomY , which nominally does what I've described. Still, as impressive as that is, I think the distance from what that robot does to what a human can do is a lot farther than it seems. Besides noticing that the folded clothes are more like a neatly arranged pile, what about all the edge cases? What about static cling? Can it match socks? What if something gets stuck in the dryer?
I'm just very wary of looking at that video and saying "Look! It's 90% of the way there! And think how fast AI advances!", because that critical last 10% can often be harder than the first 90% and then some.
First problem with that demo is that putting all your clothes in a dryer is a very American thing. Much of the world pegs their washing on a line.
> The beauty of the MOE model approach is that you can decompose the big model into a collection of smaller models that each know different, non-overlapping (at least fully) pieces of knowledge.
I was under the impression that this was not how MoE models work. They are not a collection of independent models, but instead a way of routing to a subset of active parameters at each layer. There is no "expert" that is loaded or unloaded per question. All of the weights are loaded in VRAM; it's just a matter of which are actually loaded to the registers for calculation. As far as I could tell from the DeepSeek v3/v2 papers, their MoE approach follows this instead of being an explicit collection of experts. If that's the case, there's no VRAM saving to be had using an MoE, nor an ability to extract the weights of an expert to run locally (aside from distillation or similar).
If there is someone more versed on the construction of MoE architectures I would love some help understanding what I missed here.
Not sure about DeepSeek R1, but you are right in regards to previous MoE architectures.
It doesn't reduce memory usage, as each subsequent token might require a different expert, but it reduces per-token compute/bandwidth usage. If you place experts on different GPUs and run batched inference, you would see these benefits.
Is there a concept of an expert that persists across layers? I thought each layer was essentially independent in terms of the "experts". I suppose you could look at what part of each layer was most likely to trigger together and segregate those by GPU though.
I could be very wrong on how experts work across layers though, I have only done a naive reading on it so far.
This doesn't sound like it would work if you're running just one chat, as you need all the experts loaded at once if you want to avoid spending lots of time loading and unloading models. But at scale with batches of requests it should work. There's some discussion of this in 2.1.2 but it's beyond my current ability to comprehend!
Ahh got it, thanks for the pointer. I am surprised there is enough correlation there to allow an entire GPU to be specialized. I'll have to dig in to the paper again.
It does. They have 256 experts per MLP layer, and some shared ones. The minimal deployment for decoding (aka. token generation) they recommend is 320 GPUs (H800). It is all in the DeepSeek v3 paper that everyone should read rather than speculating.
Got it. I'll review the paper again for that portion. However, it still sounds like the end result is not VRAM savings but efficiency and speed improvements.
Yeah, if you look at the DeepSeek v3 paper more closely, each saving on each axis is understandable. Combined, they reach some magic number people can talk about (10x!): FP8: ~1.6 to 2x faster than BF16 / FP16; MLA: cuts KV cache size by 4x (I think); MTP: converges 2x to 3x faster; DualPipe: maybe ~1.2 to 1.5x faster.
If you look deeper, many of these are only applicable to training (we already do FP8 for inference, MTP is to improve training convergence, and DualPipe is for overlapping communication / compute, mostly for training purposes too). The efficiency improvement on inference IMHO is overblown.
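Multiplying rough midpoints of the training-side factors quoted above shows how they compound; the midpoints are my choice, and MLA is left out of the product since its ~4x is a KV-cache/memory saving rather than a FLOPs saving:

    # Compound the training-side speedups quoted above, using rough midpoints.
    factors = {
        "FP8 vs BF16/FP16": 1.8,    # quoted ~1.6-2x
        "MTP convergence":  2.5,    # quoted ~2-3x
        "DualPipe overlap": 1.35,   # quoted ~1.2-1.5x
    }
    total = 1.0
    for name, f in factors.items():
        total *= f
    print(f"combined training speedup ~ {total:.1f}x")   # ~6x from these three alone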
I don't think entire GPU is specialised nor a singular token will use the same expert. I think about it as a gather-scatter operation at each layer.
Let's say you have an inference batch of 128 chats, at layer `i` you take the hidden states, compute their routing, scatter them along with the KV for those layers among GPUs (each one handling different experts), the attention and FF happens on these GPUs (as model params are there) and they get gathered again.
You might be able to avoid the gather by performing the routing on each of the GPUs, but I'm generally guessing here.
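For anyone trying to picture the routing being described, here is a minimal per-layer top-k MoE sketch in PyTorch; the sizes and k are made up for illustration (DeepSeek v3 itself uses 256 routed experts per MLP layer plus shared ones, as noted above):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoELayer(nn.Module):
        """Per-token top-k routing: every expert's weights stay resident, but each
        token only runs through k of them -- saving compute/bandwidth, not VRAM."""
        def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )
            self.k = k

        def forward(self, x):                        # x: (tokens, d_model)
            scores = self.router(x)                  # (tokens, n_experts)
            weights, idx = torch.topk(scores, self.k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for slot in range(self.k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e         # tokens sent to expert e in this slot
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

    layer = TinyMoELayer()
    print(layer(torch.randn(16, 64)).shape)   # (16, 64); only 2 of 8 experts ran per token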
This is a humble and informed article (compared to others written by financial analysts over the past few days). But it still has the flaw of over-estimating the efficiency of deploying a 687B MoE model on commodity hardware (to use locally; cloud providers will do efficient batching, which is a different story): you cannot do that on any single Apple machine (you need to hook up at least 2 M2 Ultras). You can barely deploy it on desktop computers, just because non-registered DDR5 can have 64GiB per stick (so you are safe with 512GiB of RAM). Now coming to PCIe bandwidth: 37B per-token activation means exactly that - each activation requires a new set of 37B weights, so you need to transfer 18GiB per token into VRAM (assuming 4-bit quant). PCIe 5 (5090) has 64GB/s transfer speed, so your upper bound is limited to about 4 tok/s with a well-balanced purpose-built PC (and custom software). For programming tasks that usually require ~3000 tokens of thinking, we are looking at 12 minutes per interaction.
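Working through that bound explicitly, taking the comment's assumption that all 37B activated weights have to cross the PCIe bus for every token:

    # Upper bound on tok/s if 37B activated weights must cross PCIe every token.
    active_params_b = 37                 # activated parameters per token
    bits_per_weight = 4                  # 4-bit quant
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8    # ~18.5 GB

    pcie5_gb_per_s = 64                  # x16 PCIe 5.0, one direction
    seconds_per_token = bytes_per_token / (pcie5_gb_per_s * 1e9)     # ~0.29 s
    thinking_tokens = 3000

    print(f"~{1 / seconds_per_token:.1f} tok/s upper bound; "
          f"{thinking_tokens} thinking tokens ~ {thinking_tokens * seconds_per_token / 60:.0f} min")

This lands slightly under the comment's ~4 tok/s / 12 minutes; the gap is just rounding of the 18.5 GiB figure.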
Is it really 37B different parameters for each token? Even with the "multi-token prediction system" that the article mentions?
I don't think anyone uses MTP for inference right now. Even if you use MTP for drafting, you need batching in the next round to "verify" it is the right token, and if that happens you need to activate more experts.
DELETED: If you don't use MTP for drafting, and use MTP to skip generations, sure. But you also need to evaluate your use case to make sure you don't get penalized for doing that. Their evaluation in the paper doesn't use MTP for generation.
EDIT: Actually, you cannot use MTP for anything other than drafting, because you need to fill in these KV caches. So, during generation, you cannot save compute with MTP (you save memory bandwidth, but this is more complicated for an MoE model due to more activated experts).
Great article. I still feel like very few people are viewing the Deepseek effects in the right light. If we are 10x more efficient it's not that we use 1/10th the resources we did before, we expand to have 10x the usage we did before. All technology products have moved this direction. Where there is capacity, we will use it. This argument would not work if we were close to AGI or something and didn't need more, but I don't think we're actually close to that at all.
Correct. This effect is known in economics since forever - new technology has
- An "income effect". You use the thing more because it's cheaper - new use cases come up
- A "substitution effect." You use other things more because of the savings.
I got into this on labor economics here [1] - you have counterintuitive examples with ATMs actually increasing the number of bank branches for several decades.
[1]: https://singlelunch.com/2019/10/21/the-economic-effects-of-a...
This is called Jevons Paradox.
https://en.wikipedia.org/wiki/Jevons_paradox.
Yep. I’ve been harping on this. DeepSeek is bullish for Nvidia.
>DeepSeek is bullish for Nvidia.
DeepSeek is bullish for the semiconductor industry as a whole. Whether it is for Nvidia remains to be seen. Intel was in Nvidia position in 2007 and they didn't want to trade margins for volumes in the phone market. And there they are today.
Why wouldn't it be for Nvidia? Explain more.
Well, so far the paradigm is powerful silicon that only Nvidia could deliver - so they could charge high margins.
Theoretically they could be on top if the paradigm changes to a big-volume, slower, lower-margin one. But there may be another winner.
At the end of the day, it all boils down to value.
Do AMD chips offer more value than Nvidia chips?
Would this not mean we need much much more training data to fully utilize the now "free" capacities?
It's pretty clear that the reasoning models are using mass amounts of synthetic data so it's not a bottleneck.
Great, now I can rewrite 10x more emails or solve 10x more graduate level programming tasks (mostly incorrectly). Brave new world.
Man, do I love myself a deep, well-researched long-form contrarian analysis published as a tangent of an already niche blog on a Sunday evening! The old web isn't dead yet :)
Hah thanks, that’s my favorite piece of feedback yet on this.
This was an amazing summary of the landscape of ML currently.
I think the title does the article an injustice, or maybe it's too long for people to read enough to appreciate it (e.g. the DeepSeek stuff could be an article in itself).
Whatever - the ones with a longer attention span will benefit from this read.
Thanks for summing this up!
The site is currently offline, here's a snapshot:
https://archive.today/y4utp
We've changed the title to a different one suggested by the author.
Thanks! I was a bit disappointed that no one saw it on HN because I think they’d like it a lot.
I think they would like it a lot, but I think the title doesn’t match the content, and it takes too much reading before one realises it goes beyond the title.
Keep it up!
I'm wondering if there's a (probably illegal) strategy in the making here:
this is exactly what DeepSeek is doing, the only difference is they built the real model, not a fake one.
- Fail at the above.
I don’t think this is what happened with DeepSeek. It seems that they’ve genuinely optimized their model for efficiency and used GPUs properly (tiled FP8 trick and FP8 training). And came out on top.
The impact on the NVIDIA stock is ridiculous. DeepSeek took advantage of the flexible GPU architecture (unlike inflexible hardware acceleration).
This is what I still don't understand, how much of what they claim has been actually replicated? From what I understand the "50x cheaper" inference is coming from their pricing page, but is it actually 50x cheaper than the best open source models?
50x cheaper than OpenAI's pricing, on an open-source model that doesn't require giving up that quality level. The best open-source models were much closer in pricing, but V3/R1 manage it while topping the results.
I'm rooting for DeepSeek (or any competitor) against OpenAI because I don't like Sam Altman. I'm confident in admitting it.
The enemy of your enemy is only temporarily your friend.
As a European I really don’t see the difference between US and Chinese tech right now - the last week from Trump has made me feel more threatened from the US than I ever have been by China (Greenland, living in a Nordic country with treaties to defend it).
I appreciate China has censorship, but the US is going that way too (recent “issues” for search terms). Might be different scales now, but I think it’ll happen. I don’t care as much if a Chinese company wins the LLM space than I did last year.
Indeed! Just ask DeepSeek something about Tiananmen or Taiwan. Answering seems to be an absolute "no-brainer" for it.
Wise words from the epoch of time.
I really don't think he's a bad guy. He helped accelerate timelines and backed this tech when it was still a dream. Maybe he's not the brains behind it but he's been the brawn, and I think people should try to be more charitable and gracious about him rather than constantly vilify him.
People want a villain.
I used to own several adult companies in the past. Incredible huge margins and then along came Pornhub and we could barely survive after it as we did not adapt.
With Deepseek this is now the 'Pornhub of AI' moment. Adapt or die.
That analogy would be right if a startup could dredge beach sand and pump out trillions of AI chips.
What actually happened was a better algorithm was created and people are betting against the main game in town for running said algorithm.
If someone came up with a CPU-superior AI that'd be worrying for NVidia.
Groq LPU inference chip?
You heard my 26khz whistle!
Curious what Pornhub did better, if you're able to say. Provide content at much lower cost, like DeepSeek?
Yes. close to free content
They understood the DMCA brilliantly, so they did bulk cheap content purchases and hid behind the DMCA for all non-licensed content which was "uploaded by users". They did bulk purchases of cheap content from some studios, but that was just a fraction.
Of course their risk of going ad-revenue-only was high, and in the beginning mostly only cam providers would advertise.
Our problem was that we had contracts and close relationships with all the big studios, so going the DMCA route would have severed these ties for an unknown risk. In hindsight, not creating a company which abused the DMCA was the right decision. I am very loyal and it would have felt like cheating.
Now it's a different story after the credit card shakedown, when they had to remove millions of videos and be able to provide 2257 documentation for each video.
This is an excellent article, basically a patio11 / matt levine level breakdown of what's happening with the GPU market.
Couldn't agree more! If this is the byproduct, these must be some optimized Youtube transcripts :)
For sure NVIDIA is priced for perfection, perhaps more than any of the other companies of similar market value.
I think two threats are the biggest:
First Apple. TSMC’s largest customer. They are already making their own GPUs for their data centers. If they were to sell these to others they would be a major competitor.
You would have the same GPU stack on your own phone, laptop, PC, and data center. Already big developer mind share. Also useful in a world where LLMs run (in part) on the end user's local machine (like Apple Intelligence).
Second is China - Huawei, Deepseek etc.
Yes - there will be no GPUs from Huawei in the US in this decade. And the Chinese won’t win in a big massive battle. Rather it is going to be death by a thousand cuts.
Just as happened with the Huawei Mate 60. It is only sold in China, but today Apple is losing business big time in China.
In the same manner OpenAi and Microsoft will have their business hurt by Deepseek even if Deepseek was completely banned in the west.
Likely we will see news on Chinese AI accelerators this year, and I wouldn't be surprised if we soon saw Chinese hyperscalers offering cheaper GPU cloud compute than the West due to a combination of cheaper energy, labor cost, and sheer scale.
Lastly AMD is no threat to NVIDIA as they are far behind and follow the same path with little way of differentiating themselves.
English economist William Stanley Jevons vs the author of the article.
Will NVIDIA be in trouble because of DSR1? Interpreting the Jevons effect: if LLMs are "steam engines" and DSR1 brings a 90% efficiency improvement for the same performance, more of it will be deployed. This is not considering the increase due to <think> tokens.
More NVIDIA GPUs will be sold to support growing use cases of more efficient LLMs.
The most important part for me is:
> DeepSeek is a tiny Chinese company that reportedly has under 200 employees. The story goes that they started out as a quant trading hedge fund similar to TwoSigma or RenTec, but after Xi Jinping cracked down on that space, they used their math and engineering chops to pivot into AI research.
I guess now we have the answer to the question that countless people have already asked: Where could we be if we figured out how to get most math and physics PhDs to work on things other than picking up pennies in front of steamrollers (a.k.a. HFT) again?
DeepSeek is a subsidiary of a relatively successful Chinese quant trading firm. It was the boss' weird passion project, after he made a few billion yuan from his other passion, trading. The whole thing was funded by quant trading profits, which kind of undermines your argument. Maybe we should just let extremely smart people work on the things that catch their interest?
Interest of extremely smart people is often strongly correlated with potential profits, and these are very much correlated with policy, which in the case of financial regulation shapes market structures.
Another way of saying this: It's a well-known fact that complicated puzzles with a potentially huge reward attached to them attract the brightest people, so I'm arguing that we should be very conscious of the types of puzzles we implicitly come up with, and consider this an externality to be accounted for.
HFT is, to a large extent, a product of policy, in particular Reg NMS, based on the idea that we need to have many competing exchanges to make our markets more efficient. This has worked well in breaking down some inefficiencies, but has created a whole set of new ones, which are the basis of HFT being possible in the first place.
There are various ideas on whether different ways of investing might be more efficient, but these largely focus on benefits to investors (i.e. less money being "drained away" by HFT). What I'm arguing is that the "draining" might not even be the biggest problem, but rather that the people doing it could instead contribute to equally exciting, non-zero sum games instead.
We definitely want to keep around the part of HFT that contributes to more efficient resource allocation (an inherently hard problem), but wouldn't it be great if we could avoid the part that only works around the kinks of a particular market structure emergent from a particular piece of regulation?
This is completely fake though. It was more like their founder decided to start a branch to do AI research. It was well planned; they bought significantly more GPUs than they could use for quant research even before they started to do anything in AI.
There was a crackdown on algorithmic trading, but it didn't have much impact, and IMO someone higher up definitely does not want to kill these trading firms.
The optimal amount of algorithmic trading is definitely more than none (I appreciate liquidity and price quality as much as the next guy), but arguably there's a case here that we've overshot a bit.
The price data I (we?) get is 15 minute delayed. I would guess most of the profiteering is from consumers not knowing the last transaction prices? I.e. an artificially created edge by the broker who then sells the API to clean their hands of the scam.
Real-time price data is indeed not free, but widely available even in retail brokerages. I've never seen a 15 minute delay in any US based trade, and I think I can even access level 2 data a limited number of times on most exchanges (not that it does me much good as a retail investor).
> I would guess most of the profiteering is from consumers not knowing the last transaction prices?
No, not at all. And I wouldn't even necessarily call it profiteering. Ironically, as a retail investor you even benefit from hedge funds and HFTs being a counterpart to your trades: You get on average better (and worst case as good) execution from PFOF.
Institutional investors (which include pension funds, insurances etc.) are a different story.
OK ty I guess I got it wrong. I thought it was way more common than for my scrappy bank.
Who knows? That too is a bunch of mythmaking. One thing's for sure, there are no moats or secrets.
Well, I know. I still have connections back there. But yeah, I'm just a random guy on the Internet so what I said could be just myth too.
Interestingly a lot of the math and physics people in the ML community are considered "grumpy researchers." A joke apparent with this starter pack[0].
From my personal experience (undergrad physics, worked as engineer, came to CS & ML because I liked the math), there's a lot of pushback.
I've heard this from my advisor, dissertation committee, bosses[1], peers, and others (of course, HN). If my experience is short of being rare, I think it explains the grumpy group[2]. But I'm also not too surprised with how common it is in CS for people to claim that everything is easy or that leet code is proof of competence (as opposed to evidence).
I think unfortunately the problem is a bit bigger, but it isn't unsolvable. Really, it is "easily" solvable since it just requires us to make different decisions. Meaning _each and every one of us_ has a direct impact on making this change. Maybe I'm grumpy because I want to see this better world. Maybe I'm grumpy because I know it is possible. Maybe I'm grumpy because it is my job to see problems and try to fix them lol
[0] https://bsky.app/starter-pack/roydanroy.bsky.social/3lba5lii... (not perfect, but there's a high correlation and I don't think that's a coincidence)
[1] Even after _demonstrating_ how my points directly improve the product, more than doubling performance on _customer_ data.
[2] not to mention the way experiments are done, since it is stressed in physicists that empirics is not enough. https://www.youtube.com/watch?v=hV41QEKiMlM
Is this in academia?
Arguably, the emergence of quant hedge funds and private AI research companies is at least as much a symptom of the dysfunctions of academia (and society's compensation of academics on dimensions monetary and beyond) as it is of the ability of Wall Street and Silicon Valley to treat former scientists better than that.
The thing is that there's always a pipeline. Academia does most of the low-level research, say TRL[2] 1-4, partnerships happen between 4-6, and industry takes over the rest (with some wiggle room on these numbers). Much of ML academic research right now is tuning large models made by big labs. This isn't low TRL. Additionally, a lot of research is rejected for not out-performing technologies that are already at TRL 5-7. See Mamba for a recent example. You could also point to KANs, which are probably around TRL 3.
Which is where I, again, both agree and disagree. It is not _just_ a symptom of the dysfunction of academia, but _also_ industry. The reason I pointed out the grumpy researchers is that a lot of these people had been discussing techniques that DeepSeek used long before they were used. DeepSeek looks like what happens when you set these people free. Which is my argument: that we should do that. Scale Maximalists (also called "Bitter Lesson Maximalists", but I dislike the term) have been dominating ML research, and DeepSeek shows that scale isn't enough. So hopefully this will give the mathy people more weight. But then again, isn't the common way monopolies fall that they become too arrogant and incestuous?
So mostly, I agree; I'm just pointing out that there is a bit more subtlety, and I think we need to recognize that to make progress. There are a lot of physicists and mathy people who like ML and have been doing research in the area but are often pushed out because of the thinking I listed. Though part of the success of the quant industry is recognizing that the strong math and modeling skills of physicists generalize pretty well: you go after people who recognize that an equation that describes a spring isn't only useful for springs, but is useful for anything that oscillates. That understanding of math at that level is very powerful, and boy are there a lot of people who want the opportunity to demonstrate this in ML; they just never get similar GPU access.
[0] https://www.ft.com/content/d5f91c27-3be8-454a-bea5-bb8ff2a85...
[1] https://archive.is/20241125132313/https://www.thewrap.com/un...
[2] https://en.wikipedia.org/wiki/Technology_readiness_level
This story could be applied to every tech breakthrough. We start where the breakthrough is moated by hardware, access to knowledge, and IP. Over time:
- Competition gets crucial features into cheaper hardware
- Work-arounds for most IP are discovered
- Knowledge finds a way out of the castle
This leads to a "Cambrian explosion" of new devices and software that usually gives rise to some game-changing new ways to use the new technology. I'm not sure where we all thought this somehow wouldn't apply to AI. We've seen the pattern with almost every new technology you can think of. It's just how it works. Only the time it takes for patents to expire changes this... so long as everyone respects the patent.
Yes, this is exactly right. All you need is the right incentives and enough capital, and markets will find a way to breach any moat that's not enforced via regulation.
It's still wild to me that toasters have always been $20 but extremely expensive lasers, digital chips, amps, motors, LCD screens worked their way down to $20 CD players.
So... Electric toasters came to market in the 1920s, priced from $15, eventually getting as low as $5. Adjusting for inflation, that $15 toaster cost $236.70 in 2025 USD. Today's $15 toaster would be about 90¢ in 1920s dollars... so it follows the story.
On average toasters have always been $20. Wasn't $5 an outlier during dotCom crash homegoods firesales? There are some outliers. I just think it's wild that some coils cost the same as a radioactive laser, ICs, amps, motors, etc. There's a certain minimum cost and the complexity doesn't matter.
Invention is expensive. Innovation is less expensive. Production is (usually) the cheap part. Once the invention and innovation is paid off, it's just production...
The beginning of the article was good, but the analysis of DeepSeek and what it means for Nvidia is confused and clearly out of the loop.
His DeepSeek argument was essentially that the experts who look at the economics of running these teams (e.g., ha ha, the engineers themselves might dabble) are looking over the hedge at DeepSeek's claims and are really awestruck.
Where do you get this "capacity" limit from? I can get as many H100s from GCP or wherever as I wish; the only things that are capacity-limited are 100k clusters a la Elon+X. But what DeepSeek (and the recent evidence of a limit in pure base-model scaling) shows is that this might actually not be profitable, and we end up with much smaller base models scaled at inference time. The moat for Nvidia in this inference-time scaling is much smaller; also, you don't need the humongous clusters for that either, you can just distribute the inference (and in the future run it locally too).
Asking GCP to give you H100s on-demand is nowhere near cost efficient.
What's your GPU quota in GCP? How did you get it increased that much?
Part of the reason Musk, Zuckerberg, Ellison, Nadella and other CEOs are bragging about the number of GPUs they have (or plan to have) is to attract talent.
Perplexity CEO says he tried to hire an AI researcher from Meta, and was told to ‘come back to me when you have 10,000 H100 GPUs’
See https://www.businessinsider.nl/ceo-says-he-tried-to-hire-an-...
Maybe DeepSeek ain't it, but I expect a big "box of scraps"[1] moment soon. Constraint is the mother of invention, and they are evading constraints with a promise of never-ending scale.
[1] https://youtu.be/9foB2z_OVHc?si=eZSTMMGYEB3Nb4zI
That's a weird way to read into it.
This reminds me of the joke in physics, in which theoretical particle physicists tell experimental physicists, over and over again, "trust me bro, the Standard Model will be proved at 10x eV, we just need a bigger collider bro," after each world's-biggest collider is built.
Wondering if we are in a similar position with "trust me bro AGI will be achieved with 10x more GPUs".
The difference is that the AI researchers have clear plots showing capabilities scaling with GPUs, and there's no sign that it is flattening, so they actually have a case for saying that AGI is possible at N GPUs.
Sauce? How do you even measure "capabilities" in that regard, just writing answers to standard tests? Because being able to ace a test doesn't mean it's AGI; it means it's good at taking standard tests.
This is the canonical paper. Nothing I've seen seems to indicate the curves are flattening; you can ask "scaling what?", but the trend is clear.
https://arxiv.org/pdf/2001.08361
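For anyone who hasn't read it, the core of that paper is a handful of empirical power laws: held-out loss falls smoothly as a power of parameter count, data, or compute. Here's a toy sketch of the parameter-count form, using roughly the constants the paper reports (treat them as illustrative, not authoritative):

    # Rough sketch of the Kaplan et al. power-law form L(N) = (N_c / N)^alpha_N.
    # The constants below are approximately what the paper reports; illustrative only.

    def loss_vs_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
        """Predicted test loss as a function of non-embedding parameter count."""
        return (n_c / n_params) ** alpha_n

    for n in (1e8, 1e9, 1e10, 1e11, 1e12):
        print(f"{n:.0e} params -> predicted loss ~ {loss_vs_params(n):.2f}")

The point people take from those plots is that nothing in the measured range bends over, which is the basis of the "just add GPUs" case.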
Sorry, my blog crashed! Had a stupid bug where it was calling GitHub too frequently to pull in updated markdown for the posts and kept hitting rate limits. Had to rewrite it, but it should be much better now.
> Amazon gets a lot of flak for totally bungling their internal AI model development, squandering massive amounts of internal compute resources on models that ultimately are not competitive, but the custom silicon is another matter
Juicy. Anyone have a link or context to this? I'd not heard of this reception to NOVA and related.
I think Nova may have changed things here. Prior to Nova their LLMs were pretty rubbish - Nova only came out in December but seems a whole lot better, at least from initial impressions: https://simonwillison.net/2024/Dec/4/amazon-nova/
Thanks! That's consistent with my impression.
The point about using FP32 for training is wrong. Mixed precision (FP16 multiplies, FP32 accumulates) has been in use for years – the original paper came out in 2017.
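For reference, this is roughly what mixed precision looks like in practice today; a minimal PyTorch sketch (the model, data, and optimizer are placeholders, and it assumes a CUDA device):

    # Minimal mixed-precision training step: half-precision matmuls inside autocast,
    # FP32 master weights and loss scaling handled by GradScaler.
    import torch

    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()

    x = torch.randn(32, 1024, device="cuda")
    target = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():          # matmuls run in reduced precision
        loss = torch.nn.functional.mse_loss(model(x), target)

    scaler.scale(loss).backward()            # scales the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)                   # unscales, then updates the FP32 weights
    scaler.update()

The FP32 master weights and optimizer state are a big part of why memory use stays high even in this setup, which is presumably the gap the lower-precision schemes go after.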
Fair enough, but that still uses a lot more memory during training than what DeepSeek is doing.
This just in.
Competition lowers the value of monopolies.
This is such a great read. The only missing facet of discussion here is that there is a valuation level of NVDA such that it would tip the balance of military action by China against Taiwan. TSMC can only drive so much global value before the incentive to invade becomes irresistible. Unclear where that threshold is; if we’re being honest, could be any day.
Very interesting and it seems like there is more room for optimizations for WASM using SIMD, boosting performance by a lot! It's cool to see how AI can now run even faster on web browsers.
Microsoft did a bunch of research into low-bit weights for models. I guess OAI didn’t look at this work.
https://proceedings.neurips.cc/paper/2020/file/747e32ab0fea7...
The R1 paper (https://arxiv.org/pdf/2501.12948) emphasizes their success with reinforcement learning without requiring any supervised data (unlike RLHF for example). They note that this works well for math and programming questions with verifiable answers.
What's totally unclear is what data they used for this reinforcement learning step. How many math problems of the right difficulty with well-defined labeled answers are available on the internet? (I see about 1,000 historical AIME questions, maybe another factor of 10 from other similar contests). Similarly, they mention LeetCode - it looks like there are around 3000 LeetCode questions online. Curious what others think - maybe the reinforcement learning step requires far less data than I would guess?
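To make "verifiable answers" concrete: the reward in this kind of setup can be a dumb rule rather than a learned reward model. Something like the sketch below (the \boxed{} convention and the function names are my own illustration, not anything from the R1 paper's code):

    import re

    def extract_final_answer(completion: str):
        """Grab the contents of the last \\boxed{...} in a completion, if any."""
        matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
        return matches[-1].strip() if matches else None

    def math_reward(completion: str, ground_truth: str) -> float:
        """1.0 if the parsed answer matches the labeled answer exactly, else 0.0."""
        answer = extract_final_answer(completion)
        return 1.0 if answer == ground_truth.strip() else 0.0

    print(math_reward(r"... so the answer is \boxed{204}.", "204"))  # 1.0
    print(math_reward("I think it's probably 204?", "204"))          # 0.0 (not parseable)

One possible answer to the data question: during RL each prompt can be sampled many times, so the effective amount of training signal isn't just the raw question count.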
> While Apple's focus seems somewhat orthogonal to these other players in terms of its mobile-first, consumer oriented, "edge compute" focus, if it ends up spending enough money on its new contract with OpenAI to provide AI services to iPhone users, you have to imagine that they have teams looking into making their own custom silicon for inference/training
This is already happening today. Most of the new LLM features announced this year are primarily on-device, using the Neural Engine, and the rest is in Private Cloud Compute, which is also using Apple-trained models, on Apple hardware.
The only features using OpenAI for inference are the ones that announce the content came from ChatGPT.
"if it ends up spending enough money on its new contract with OpenAI to provide AI services to iPhone users"
John Gruber says neither Apple nor OpenAI are paying for that deal: https://daringfireball.net/linked/2024/06/13/gurman-openai-a...
Mark Gurman (from Bloomberg) is saying that.
First of all, I don't invest in Nvidia, and I don't like oligopolies. But it is too early to talk about Nvidia's future. People are just betting and wishing about Nvidia's future. No one knows what people will do in the future, or what they will think; it's just guessing and betting. Their real competitor is not DeepSeek. Did AMD or others release something new that competes with Nvidia's products? If Nvidia remains the market leader, that means they will lead on price. Being an oligopoly is something like that. They don't need to compete with competitors on price.
DeepSeek is not the black swan
NVDA was already heavily overpriced even without R1; the market is full of "air GPUs" hiding in the capex of tech giants like MSFT.
If orders are canceled or deliveries fail for any reason, NVDA's EPS would be pulled back to its fundamentally justified level.
Or what if all those air GPUs do get produced and delivered in the coming years, and demand keeps rising? Well, that would be a crazy world.
It's a finance game, not related to the real world.
When he says better linux drivers than AMD he's strictly talking about for AI, right? Because for video the opposite has been the case for as far back as I can remember.
Yes, AMD drivers work fine for games and things like that. Their problem is they basically only focused on games and other consumer applications and, as a result, ceded this massive growth market to Nvidia. I guess you can sort of give them a pass because they did manage to kill their arch-rival Intel in data center CPUs, but it's a massive strategic failure if you look at how much it has cost them.
>With the advent of the revolutionary Chain-of-Thought ("COT") models introduced in the past year, most noticeably in OpenAI's flagship O1 model (but very recently in DeepSeek's new R1 model, which we will talk about later in much more detail), all that changed. Instead of the amount of inference compute being directly proportional to the length of the output text generated by the model (scaling up for larger context windows, model size, etc.), these new COT models also generate intermediate "logic tokens"; think of this as a sort of scratchpad or "internal monologue" of the model while it's trying to solve your problem or complete its assigned task.
Is this right? I thought CoT was a prompting method; are we now calling the reasoning models "CoT models"?
Reasoning models are a result of the learnings from CoT prompting.
I'm curious what the key differences are between "a reasoning model" and good old CoT prompting. Is there any reason to believe that the fundamental limitations of prompting don't apply to "reasoning models"? (hallucinations, plainly wrong output, bias towards the training data mean, etc.)
The level of sophistication for CoT models varies. "Good old CoT prompting" is you hoping the model generates some reasoning tokens prior to the final answer. When it did, the answers tended to be better for certain classes of problems, but you had no control over what type of reasoning tokens it was generating. There were hypotheses that just having <pause> tokens in between produced better answers, as it allowed n+1 steps to generate an answer instead of n. I would consider Meta's "continuous chain of thought" to be at the other end from "good old CoT prompting", where they pass the next tokens from the latent space back into the model, getting a "BHF"-like effect. Who knows what's happening with o3 and Anthropic's o3-like models. The problems you mentioned are very broad and not limited to prompting. Reasoning models tend to outperform older models on math problems, so I'd assume it does reduce hallucination on certain classes of problems.
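To make the contrast concrete, here's a toy sketch of the two ends: classic CoT prompting is just text you prepend, while a reasoning model emits its own delimited reasoning tokens that the serving layer strips before showing the answer. The <think> tag name here is illustrative, not any particular vendor's format:

    def cot_prompt(question: str) -> str:
        # Classic CoT prompting: the only lever you have is the prompt text itself.
        return f"{question}\n\nLet's think step by step."

    def split_reasoning(completion: str, open_tag="<think>", close_tag="</think>"):
        # Reasoning model: intermediate "logic tokens" arrive in-band and are
        # separated from the final answer by the serving layer.
        if open_tag in completion and close_tag in completion:
            reasoning = completion.split(open_tag, 1)[1].split(close_tag, 1)[0]
            answer = completion.split(close_tag, 1)[1]
            return reasoning.strip(), answer.strip()
        return "", completion.strip()

    print(cot_prompt("What is 17 * 24?"))
    print(split_reasoning("<think>17*24 = 17*20 + 17*4 = 408</think>The answer is 408."))

The other difference is that reasoning models are trained (typically with RL) to make those intermediate tokens useful, rather than just being asked nicely at inference time.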
NVIDIA sells shovels to the gold rush. One miner (Liang Wenfeng), who has previously purchased at least 10,000 A100 shovels... has a "side project" where they figured out how to dig really well with a shovel and shared their secrets.
The gold rush, whether real or a bubble, is still there! Nvidia will still sell every shovel they can manufacture, as soon as it is available in inventory.
Fortune 100 companies will still want the biggest toolshed to invent the next paradigm or to be the first to get to AGI.
I think the biggest threat to Nvidia's future right now is their own current success.
Their software platforms and CUDA are a very strong moat against everyone else. I don't see anyone beating them on that front right now.
The problem is that I'm afraid that all that money sloshing inside the company is rotting the culture and that will compromise future development.
NVIDIA used to be extremely nimble and fought way above its weight class. Prior to the Mellanox acquisition it had only around 10k employees, and afterwards another 10k more. If there's a real threat to their position at the top of the AI offerings, will they be able to roll up their sleeves and get back to work, or will the organization be unable to move ahead?
Long term I think it's inevitable that China will take over the technology leadership. They have the population, they have the education programs, and they have the skill to do this. At the same time, in the old western democracies things are becoming stagnant, and I even dare to say the younger generations are declining. In my native country the educational system has collapsed; over 20% of kids that finish elementary school cannot read or write. They can mouth-breathe and scroll TikTok, though, but just barely, since their attention span is about the same as a goldfish's.
LOL. This isn't rot; it is reaching the end goal: the people doing the work get the rewards they were working towards. Rot would imply that management should somehow prevent rest-and-vest, but that is the exact model they acquired their talent on. You would have to remove capitalism from companies when companies win at capitalism, making it all just a giant rug pull for employees.
The vast majority of Nvidia's current value is tied to their dominance in AI hardware. That value could be threatened if LLMs could be trained and/or run efficiently using a CPU or a quantum chip. I don't understand enough about the capabilities of quantum computing to know if running or training an LLM would be possible using a quantum chip, but if it becomes possible, NVDA stock is unlikely to fare well (unless they are making the new chip).
reading this gave me a great idea for https://bookhead.net. thanks!!
also thank you for the incredibly informative article.
I always appreciate reading a take from someone who's well versed in the domains they have opinions about.
I think longer-term we'll eat up any slack in efficiency by throwing more inference demands at it -- but the shift is tectonic. It's a cultural thing. People got acclimated to shlepping around morbidly obese node packages and stringing together enormous python libraries - meanwhile the deepseek guys out here carving bits and bytes into bare metal. Back to FP!
This is a bizarre take. First, DeepSeek is no doubt still using the same bloated Python ML packages as everyone else. Second, since this is "open source", it's pretty clear that the big labs are just going to replicate this basically immediately and, with their already massive compute advantages, put out models that are an extra OOM larger/better/etc. than what DeepSeek can possibly put out. There's just no reason to think that e.g. a 10x increase in training efficiency does anything but increase the size of the next model generation by 10x.
> which require low-latency responses, such as content moderation, fraud detection, dynamic pricing, etc.
Is it even legal to give different prices to different customers?
It depends on what basis. You can't discriminate based on protected classes.
Of course it is. That's how the airlines stay in business.
However, imagine entering a store where a camera looks up your face in a shared database and profiles you as a person who will pay higher prices, and the prices displayed near you are set according to your profile...
This is exactly where project digits comes in. Nvidia needs to pivot toward being a local inference platform if they want to survive the next shift.
Nvidia seems to be one step ahead of this, and you can see their platform efforts are pushing towards creating large volumes of compute that are easy to manage for whatever your compute requirements are, be that training, inference, or whatever comes next, in whatever form. People may be tackling some of these areas in isolation, but you do not want to build datacenters where everything is ring-fenced per task or usage.
This is such a comprehensive analysis, thank you. For someone just starting to learn about the field, it’s a great way to understand what’s going on in the industry.
I think this is just a(nother) canary for many other markets in the US v China game of monopoly. One weird effect in all this is that US Tech may go on to be overvalued (i.e., disconnected from fundamentals) for quite some time.
> Another very smart thing they did is to use what is known as a Mixture-of-Experts (MOE) Transformer architecture, but with key innovations around load balancing. As you might know, the size or capacity of an AI model is often measured in terms of the number of parameters the model contains. A parameter is just a number that stores some attribute of the model; either the "weight" or importance a particular artificial neuron has relative to another one, or the importance of a particular token depending on its context (in the "attention mechanism").
Has a wide-scale model analysis been performed inspecting the parameters and their weights for all popular open / available models yet? The impact and effects of disclosed inbound data and tuning parameters on individual vector tokens will prove highly informative and clarifying.
Such analysis will undoubtedly help semi-literate AI folks level up and bridge any gaps.
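For anyone who wants the MoE point in the quote above made concrete, here's a toy top-k routing sketch in PyTorch. The sizes and the exact load-balance penalty are illustrative; DeepSeek's actual architecture is considerably more involved:

    # Toy Mixture-of-Experts layer: a router scores experts per token, only the
    # top-k experts run for that token, and a penalty discourages uneven usage.
    import torch
    import torch.nn.functional as F

    d_model, n_experts, top_k = 64, 8, 2
    experts = torch.nn.ModuleList(torch.nn.Linear(d_model, d_model) for _ in range(n_experts))
    router = torch.nn.Linear(d_model, n_experts)

    x = torch.randn(16, d_model)                      # 16 tokens
    gate_logits = router(x)                           # each token scores every expert
    weights, idx = gate_logits.topk(top_k, dim=-1)    # keep only the top-k experts per token
    weights = F.softmax(weights, dim=-1)

    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in range(n_experts):
            mask = idx[:, slot] == e                  # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += weights[mask, slot:slot + 1] * experts[e](x[mask])

    # Illustrative load-balance penalty: smallest when expert usage is uniform.
    usage = F.one_hot(idx, n_experts).float().mean(dim=(0, 1))
    balance_loss = n_experts * (usage * usage).sum()
    print(out.shape, balance_loss.item())

The point is that only the selected experts run for each token, which is how a model's total parameter count gets decoupled from its per-token compute.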
While Nvidia’s valuation may feel bloated due to AI hype, AMD might be the smarter play.
Considering the fact that current models were trained on top-notch books, those read and studied by the most brilliant engineers, the models are pretty dumb.
They are more like the thing which enabled computers to work with and digest text instead of just code. The fact that they can parrot pretty interesting relationships from the texts they've consumed kind of proves that they are capable of statistically "understanding" what we're trying to talk with them about, so it's a pretty good interface.
But going back to the really valuable content of the books they've been trained on, they just don't understand it. There's other AI which needs to get created which can really learn the concepts taught in those books instead of just the words and the value of the proximities between them.
To learn that other missing part will require hardware just as uniquely powerful and flexible as what Nvidia has to offer. Those companies now optimizing for inference and LLM training will be good at it and have their market share, but they need to ensure that their entire stack is as capable as Nvidia's stack if they also want to be part of future developments. I don't know if Tenstorrent or Groq are capable of doing this, but I doubt it.
I think it's more than just the market effect on "established" AI players like Nvidia.
I don't think it's necessarily a coincidence that DeepSeek dropped within a short time frame of the announcement of the AI investment initiative by the Trump administration.
The idea is to get the money from investors who want to earn a return. Lower capex is attractive to investors, and DS drops capex dramatically. It makes Chinese AI talent look like the smart, safe bet. Nothing like DS could happen in China unless the powers-that-be knew about it and got some level of control. I'm also willing to bet that this isn't the best they've got.
They're saying "we can deliver the same capabilities for far less, and we're not going to threaten you with a tariff for not complying".
Great article, thanks for writing it! Really great summary of the current state of the AI industry for someone like me who's outside of it (but tangential, given that I work with GPUs for graphics).
The one thing from the article that sticks out to me is that the author (and people generally) are assuming that DeepSeek needing 1/45th the amount of hardware means that the other 44/45ths that large tech companies have invested was wasteful.
Does software not scale to meet hardware? I don't see this as 44/45ths wasted hardware, but as a free increase in the amount of hardware people have. Software needing less hardware means you can run even _more_ software without spending more money, not that you need less hardware, right? (for the top-end, non-embedded use cases).
---
As an aside, the state of the "AI" industry really freaks me out sometimes. Ignoring any sort of short or long term effects on society, jobs, people, etc, just the sheer amount of money and time invested into this one thing is, insane?
Tons of custom processing chips, interconnects, compilers, algorithms, _press releases!_, etc all for one specific field. It's like someone taking the last decade of advances in computers, software, etc, and shoving it in the space of a year. For comparison, Rust 1.0 is 10 years old - I vividly remember the release. And even then it took years to propagate out as a "thing" that people were interested in and invested significant time into. Meanwhile deepseek releases a new model (complete with a customer-facing product name and chat interface, instead of something boring and technical), and in 5 days it's being replicated (to at least some degree) and copied by competitors. Google, Apple, Microsoft, etc are all making custom chips and investing insane amounts of money into different compilers, programming languages, hardware, and research.
It's just, kind of disquieting? Like everyone involved in AI lives in another world operating at breakneck speed, with billions of dollars involved, and the rest of us are just watching from the sidelines. Most of it (LLMs specifically) is no longer exciting to me. It's like, what's the point of spending time on a non-AI related project? We can spend some time writing a nice API and working on a cool feature or making a UI prettier and that's great, and maybe with a good amount of contributors and solid, sustained effort, we can make a cool project that's useful and people enjoy, and earns money to support people if it's commercial. But then for AI, github repos with shiny well-written readmes pop up overnight, tons of text is being written, thought, effort, and billions of dollars get burned or speculated on in an instant on new things, as soon as the next marketing release is posted.
How can the next advancement in graphics, databases, cryptography, etc compete with the sheer amount of societal attention AI receives?
Where does that leave writing software for the rest of us?
To me, this seems like we are back again in 1953 and a company just announced they are now capable of building one of IBM's 5 computers for 10% of the price.
I really don't understand the rationale of "We can now train GPT 4o for 10% the price, so that will bring demand for GPUs down.". If I can train GPT 4o for 10% the price, and I have a budget of 1B USD, that means I'm now going to use the same budget and train my model for 10x as long (or 10x bigger).
At the same time, a lot of small players that couldn't properly train a model before, because the starting point was simply out of their reach, will now be able to purchase equipment that's capable of something of note, and they will buy even more GPUs.
P.S. Yes, I know that the original quote "I think there is a world market for maybe five computers", was taken out of context.
P.P.S. In this rationale, I'm also operating under the assumption that DeepSeek's numbers are real. Which, given the track record of Chinese companies, is probably not true.
Please tell me if I am wrong. I know very few details and have only heard a few headlines, and my hasty conclusion is that this development clearly shows the exponential nature of AI development, in terms of how people are able to piggyback on the resources, time, and money of the previous iteration. They used the output from ChatGPT as the input to their model. Is this true, more or less accurate, or off base?
link seems to be dead... is this article still up somewhere?
It's back up, but just in case:
https://archive.is/y4utp
All this is good news for all of us. Bad news probably for Nvidia's margins long term, but who cares. If we can train and run inference in fewer cycles and watts, that is awesome.
As a bystander it's so refreshing to see this; global tech competition is great for the market, and it gives hope that LLMs aren't locked behind billions in investment and that smaller players can compete as well.
Exciting times to be living in.
see also https://news.ycombinator.com/item?id=42839650
So at some point we will have too many cannonball-polishing factories, and it will become apparent that the cannonball trajectory is not easily improved on.
Deepseek iOS app makes TikTok ban pointless.
Interesting take. They are now reading our minds vs looking at our kids and interiors.
Yeah, what's stopping Zoom from integrating DeepSeek and doing an end run around Microsoft Teams?
what a compelling domain name. it compels me not to click on it
Despite the fact that this article is very well written and certainly contains high quality information, I choose to remain skeptical as it pertains to Nvidia's position in the market. I'll come right out and say that my experience likely makes me see this from a biased position.
The premise is simple: Business is warfare. Anything you can do to damage or slow down the market leader gives you more time to get caught up. FUD is a powerful force.
My bias comes from having been the subject of such attacks in my prior tech startup. Our technology was destroying the offerings of the market leading multi-billion-dollar global company that pretty much owned the sector. The natural processes of such a beast caused them not to be able to design their way out of a paper bag. We clearly had an advantage. The problem was that we did not have the deep pockets necessary to flood the market with it and take them out.
What did they do?
They started a FUD campaign.
They went to every single large customer and our resellers (this was a hardware/software product) a month or two before the two main industry tradeshows, and lied to them. They promised that they would show market-leading technology "in just a couple of months" and would add comments like "you might want to put your orders on hold until you see this". We had multi-million dollar orders held for months in anticipation of these product unveilings.
And, sure enough, they would announce the new products with a great marketing push at the next tradeshow. All demos were engineered and manipulated to deceive, all of them. Yet, the incredible power of throwing millions of dollars at this effort delivered what they needed, FUD.
The problem with new products is that it takes months for them to be properly validated. So, if the company that had frozen a $5MM order for our products decided to verify the claims of our competitor, it typically took around four months. In four months, they would discover that the new shiny object was far less stellar than what they were told. In other words, we won. Right?
No!
The mega-corp would then reassure them that they had iterated vast improvements into the design and that those would be presented --I kid you not-- at the next tradeshow. By spending millions of dollars they, at this point, had denied us millions of dollars of revenue for approximately one year. FUD, again.
The next tradeshow came and went and the same cycle repeated... it would take months for customers to realize the emperor had no clothes. It was brutal to be on the receiving end of this without the financial horsepower to break through the FUD. It was a marketing arms race and we were unprepared to win it. In this context, the idea that a better mouse trap always wins is just laughable.
This did not end well. They were not going to survive another FUD cycle. Reality eventually comes into play. Except that, in this case, 2008 happened. The economic implosion caught us in serious financial peril due to the damage done by the FUD campaign. Ultimately, it was not survivable and I had to shut down the company.
It took this mega-corp another five years to finally deliver a product that approximated what we had and another five years after that to match and exceed it. I don't even want to imagine how many hundreds of millions they spent on this.
So, long way of saying: China wants to win. No company in China is independent from government forces. This is, without a doubt, a war for supremacy in the AI world. It is my opinion that, while the technology, as described, seems to make sense, it is highly likely that this is yet another form of a FUD campaign to gain time. If they can deny Nvidia (and others) the orders needed to maintain the current pace, they gain time to execute on a strategy that could give them the advantage.
Time will tell.
1. More efficient LLMs should lead to more usage, which means more AI chip demand. Jevons Paradox.
2. Even if DeepSeek is 45x more efficient (it is not), models will just become 45x+ bigger. It won’t stay small.
3. To build a moat, OpenAI and American AI companies need to up their datacenter spending even more.
4. DeepSeek's breakthrough is in distilling models. You still need a ton of compute to train the foundational model to distill.
5. DeepSeek's conclusion in their paper says more compute is needed for the next breakthrough.
6. DeepSeek's model is trained on GPT4o/Sonnet outputs. Again, this reaffirms the fact that in order to take the next step, you need to continue to train better models. Better models will generate better data for next-gen models.
I think DeepSeek hurts OpenAI/Anthropic/Google/Microsoft. I think DeepSeek helps TSMC/Nvidia.
This is misguided. Let's think logically about this.
More thinking = smarter models
Faster hardware = more thinking
More/newer Nvidia GPUs, better TSMC nodes = faster hardware
Therefore, you can conclude that Nvidia and TSMC demand should go up because of CoT models. In 2025, CoT models are clearly bottlenecked by not having enough compute.
Or that in order to build a moat, OpenAI/Anthropic/Google and other labs need to double down on even more compute. I agree with this.
Fwiw, many of the improvements in DeepSeek were already in other "can run on your personal computer" AIs such as Meta's Llama. DeepSeek is actually very similar to Llama in efficiency. People were already running that on home computers with M3s.
A couple of examples; Meta's multi-token prediction was specifically implemented as a huge efficiency improvement that was taken up by Deepseek. REcurrent ADaption (READ) was another big win by Meta that Deepseek utilized. Multi-head Latent Attention is another technique, not pioneered by Meta but used by both Deepseek and Llama.
Anyway, DeepSeek isn't some independent revolution out of nowhere. It's actually very, very similar to the existing state of the art and just bundles a whole lot of efficiency gains into one model. There's no secret sauce here. It's much better than what OpenAI has, but that's because OpenAI seems to have forgotten "The Bitter Lesson". They have been going at things in an extremely brute-force way.
Anyway why do i point out that Deepseek is very similar to something like Llama? Because Meta's spending 100's of billions on chips to run it. It's pretty damn efficient, especially compared to openAI but they are still spending billions on datacenter build-outs.
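On the multi-token prediction point above: the basic trick is to give a shared trunk several prediction heads, so each position supervises more than one future token per forward pass. A toy sketch (the GRU stands in for a transformer trunk, and the details differ from both Meta's and DeepSeek's actual setups):

    import torch

    d_model, vocab, k = 64, 1000, 2
    trunk = torch.nn.GRU(d_model, d_model, batch_first=True)   # stand-in for a transformer
    heads = torch.nn.ModuleList(torch.nn.Linear(d_model, vocab) for _ in range(k))

    x = torch.randn(4, 16, d_model)            # (batch, seq, dim) token embeddings
    h, _ = trunk(x)                            # shared hidden states

    # Head i is trained to predict token t+1+i from the hidden state at position t,
    # so each forward pass yields k supervised targets instead of one.
    logits = [head(h) for head in heads]       # k tensors of shape (batch, seq, vocab)
    print([l.shape for l in logits])

More supervision signal per pass is where the training-efficiency win comes from.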
> openAI seem to have forgotten 'The Bitter Lesson'. They have been going at things in an extremely brute force way.
Isn't the point of 'The Bitter Lesson' precisely that in the end, brute force wins, and hand-crafted optimizations like the ones you mention llama and deepseek use are bound to lose in the end?
Imho the tldr is that the wins are always from 'scaling search and learning'.
Any customisations that aren't related to the above are destined to be overtaken by someone who can improve the scaling of compute. OpenAI do not seem to be doing as much to improve the scaling of compute in software terms (they are doing a lot in hardware terms, admittedly). They have models at the top of the charts for various benchmarks right now, but it feels like a temporary win from chasing those benchmarks outside of the focus of scaling compute.
But Microsoft hosts third-party models too, and cheaper models mean more usage, which means more $$$ for scaled cloud providers, right?
It means they can serve more with what they have if they implement models with DeepSeek's optimizations. More usage doesn't mean Nvidia will get the same margins when cloud providers scale out with this innovation.
Yesterday I wrote up all my thoughts on whether NVDA stock is finally a decent short (or at least not a good thing to own at this point). I’m a huge bull when it comes to the power and potential of AI, but there are just too many forces arrayed against them to sustain supernormal profits.
Anyway, I hope people here find it interesting to read, and I welcome any debate or discussion about my arguments.
Wanted to add a preface: Thank you for your time on this article. I appreciate your perspective and experience, and I'm hoping you can help refine and rein in my bull case.
Where do you expect NVDA's forward and current EPS to land? What revenue drop-off are you expecting in late 2025/2026? Part of my continuing bull case for NVDA is its very reasonable multiple on insane revenue. A leveling off can be expected, but I still feel bullish on it hitting $200+ (a $5 trillion market cap? on ~$195B revenue for fiscal year 2026 (calendar 2025) at 33 EPS) based on this year's revenue according to their guidance and the guidance of the hyperscalers' spending. Finding a sell point is a whole different matter from being actively short. I can see the case to take some profits; it's hard for me to go short, especially in an inflationary environment (tariffs, electric energy, bullying for lower US interest rates).
The scale of production of Grace Hopper and Blackwell amazes me: 800k units of Blackwell coming out this quarter. Is there even production room for AMD to get their chips made? (Looking at the new chip factories in Arizona.)
R1 might be nice for reducing LLM inference costs. I'm unsure about the local Llama one's accuracy (couldn't get it to correctly spit out the NFL teams and their associated conferences, kept mixing NFL with Euro football), but I still want to train YOLO vision models on faster chips like A100s vs T4s (4-5x multiples in speed for me).
Lastly, if the Robot/Autonomous vehicle ML wave hits within the next year, (First drones and cars -> factories -> humanoids) I think this compute demand can sustain NVDA compute demand.
The real mystery is how we power all this within 2 years...
* This is not financial advice and some of my numbers might be a little off, still refining my model and verifying sources and numbers
Good article. Maybe I missed it, but I see lots of analysis without a clear concluding opinion.
Link isn't working. Is there another or a cached version?
Try again! Just rebooted the server since it’s going viral now.
Looks like huge astroturfing effort from CCP. I am seeing these coordinated propaganda inside every AI related sub on reddit, on social media and now - here.
Yeah I get that feeling too. Lots of old school astroturfing going on.
aand I am buried. China CCP is attacking on all vectors.
It seems like a pointless discussion since DeepSeek uses Nvidia GPUs after all.
It uses a fraction of the GPUs, though.
As it says in the article, you are talking about a mere constant of proportionality, a single multiple. When you're dealing with an exponential growth curve, that stuff gets washed out so quickly that it doesn't end up mattering all that much.
Keep in mind that the goal everyone is driving towards is AGI, not simply an incremental improvement over the latest model from Open AI.
Why do you assume that exponential growth curve is real?
Jevons Paradox states that increasing efficiency can cause an even larger increase in demand.
Their loss curve with the RL didn't level off much, though; it could be taken a lot further and scaled up to more parameters on the big Nvidia mega-clusters out there. And the architecture is heavily tuned to Nvidia optimizations.
Which due to the Jevons Paradox may ultimately cause more shovels to be sold
"wait" I suspect we are all in a bit of denial.
When was the last time the US got their lunch eaten in technology?
Sputnik might be a bit hyperbolic but after using the model all day and as someone who had been thinking of a pro subscription, it is hard to grasp the ramifications.
There is just no good reference point that I can think of.
Yep, some CEO said they have 50K GPUs of the prior generation. They probably accumulated them through intermediaries that are basically helping Nvidia sell to sanctioned parties by proxy.
DeepSeek was their side project. They had a lot of GPUs from their crypto mining project.
Then Ethereum turned off PoW mining, so they looked into other things to do with their GPUs, and started DeepSeek.
Mining crypto on H100s?
Does no one realize this is a thinly-veiled ad? The URL is bizarre
A thinly veiled ad? You must be joking.
If we are to get to AGI, why do we need to train on all data? That's silly, and all we get is compression and probabilistic retrieval.
Intelligence by definition is not compression, but the ability to think and act on new data, based on experience.
Truly AGI models will work on this principle, not on the best compression of as much data as possible.
We need a new approach.
Actually, compression is an incredibly good way to think about intelligence. If you understand something really well then you can compress it a lot. If you can compress most of human knowledge effectively without much reconstruction error while shrinking it down by 99.5%, then you must have in the process arrived at a coherent and essentially correct world model, which is the basis of effective cognition.
Fwiw, there are highly cited papers that literally map AGI to compression. As in, they map to the same thing, and people write widely respected papers on this fact. Basically, a prediction engine can be used to make a compression tool and an AI equally well.
The tl;dr: given inputs and a system that can accurately predict the next item in the sequence, you can either compress that data using the prediction (arithmetic coding), or take actions based on the prediction to achieve an end goal, mapping predictions of new inputs to possible outcomes and then taking the path to the goal (AGI). They boil down to one and the same. So it's weird to have someone state they are not the same when it's widely accepted that they absolutely are.
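The arithmetic-coding side of that equivalence is easy to see in a toy: under an (idealized) coder, each symbol costs about -log2 p(symbol | context) bits, so a better predictor directly means a smaller file. A sketch with a smoothed bigram model standing in for an LLM's next-token distribution (the vocabulary size and smoothing constant are arbitrary):

    import math
    from collections import Counter, defaultdict

    text = "abracadabra abracadabra"
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1

    def prob(prev: str, nxt: str, alpha: float = 0.5, vocab: int = 27) -> float:
        """Smoothed bigram estimate of p(next | prev)."""
        c = counts[prev]
        return (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab)

    # Idealized code length: sum of -log2 p for each symbol given its context.
    # (A real coder would build the counts online rather than peeking at the data.)
    bits = sum(-math.log2(prob(p, n)) for p, n in zip(text, text[1:]))
    print(f"~{bits:.1f} bits vs {8 * len(text)} bits uncompressed")

Swap the bigram for a stronger predictor and the bit count drops, which is exactly the prediction-equals-compression point.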
“If you can't explain it to a six year old, you don't understand it yourself.” -> "If you can compress knowledge, you understand it."