To me this is an illustration of one reason we are still some way off from a truly general system. Applying these "filters" to sensor data is enormously helpful, but only in certain applications. For a system giving GPS driving directions, you may be able to linearize and smooth movement, or constrain it to the nearest road. But you have to know that those assumptions don't apply to other domains that could use GPS data (e.g. surveying, wilderness navigation, animal tracking), and you also have to know when the situation calls for just going out with a tape measure, or installing a camera and referencing landmarks.
Probabilistic reasoning is useful when the problem space has been constrained to admit probabilistic solutions. But the real world is not just analog and continuous, it is also discrete, and it may require fine discrimination between two states that are covered by the same distribution. Effective operation as a general intelligence in the world requires the ability to tell when the stakes or details of an event require a different approach. Determining when and how to make those decisions is still a largely unsolved problem.
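To make that first point concrete, here is a minimal sketch (my own illustration, not anything from the article or the comment above) of what the "linearize and smooth movement" assumption typically looks like: a constant-velocity Kalman filter run over noisy position fixes. The motion model and noise numbers are all assumed for the example; the takeaway is that the same smoothing that helps turn-by-turn driving directions is exactly what can blur a genuinely abrupt or fine-grained change in domains like surveying or animal tracking.

    # Assumed-for-illustration 1-D constant-velocity Kalman filter over noisy
    # GPS-like position fixes. Fine for a car cruising down a road; the same
    # smoothing lags or washes out sharp, discrete changes in the signal.
    import numpy as np

    dt = 1.0
    F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity motion model
    H = np.array([[1.0, 0.0]])               # we only observe position
    Q = 0.01 * np.eye(2)                     # process noise, assumed small
    R = np.array([[25.0]])                   # measurement noise, ~5 m std dev

    rng = np.random.default_rng(1)
    truth = 10.0 * np.arange(60)             # car moving at a steady 10 m/s
    fixes = truth + rng.normal(0.0, 5.0, size=60)

    x = np.array([fixes[0], 0.0])            # state: [position, velocity]
    P = 100.0 * np.eye(2)
    smoothed = []
    for z in fixes:
        x, P = F @ x, F @ P @ F.T + Q        # predict under the smooth-motion model
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + (K @ (np.array([z]) - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
        smoothed.append(x[0])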
This smells LLM-written (which seems very likely given the rest of this site's content). There's a lot of odd apparatus: vocabulary definitions, a very poor explanation of the Kalman filter that wouldn't help anyone, then a jump straight into the 'EIF' before almost instantly wrapping up with takeaways. I'm reading this and not learning anything but buzzwords. And this is at +14? Are you guys actually reading this and getting anything out of it?
My bet is models can already do all these things.
Useful math like this is likely to emerge and be refined as part of training, assuming it's discoverable via gradient descent or whatever the cool kids are using these days. Reinforcement learning should then retain and refine anything useful once it's discovered.
If something like a Kalman filter or Gaussian filter is discovered by the training process, it'd be put to use and refined, and maybe this is part of the "grokking" phenomenon - a decent explanation is here: https://medium.com/generative-ai-revolution-ai-native-transf....
DeepSeek talked about “aha moments” in https://arxiv.org/pdf/2501.12948#page9 where under RL the model learned to backtrack and correct itself, for example.
Makes one wonder what neat math we can find inside models - e.g. in May 2024, researchers at Anthropic found a "Golden Gate Bridge" feature in Claude's internals and amplified it to create a modified Claude that couldn't stop talking about the bridge: https://www.anthropic.com/news/golden-gate-claude
Kind of like the bitter lesson of AI in reverse - instead of us trying to outsmart AI with hand-rolled tricks, why not let it tell us what tricks are out there to be discovered :-)
I had some other harsher words written but I don't want to be that guy.
Instead I'll just ask - what is it with the continuing obsession with KF, EKF, IF, and all the variants? We've lost enough headspace in new grads; it's much better to just show them loss-function minimization and convex approximation, and how every one of these filters falls out of that.
Or even better - a filter is a computational trick to limit the memory footprint of a least squares loss minimization problem.
From there, many forms follow from the same equation.
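For what it's worth, here is a minimal sketch of that view (mine, with assumed noise values, not something from the parent comments): estimate a constant from noisy measurements, once as a batch least-squares solve over all the data and once as a recursive update that only keeps an estimate and a variance. The recursive form is a 1-D Kalman filter with a static state, and the gain is just the weighting the same quadratic loss hands you; the filter's real contribution is the O(1) memory footprint.

    # Assumed-for-illustration example: batch vs. recursive least squares for a
    # constant scalar. The recursive update is a 1-D Kalman filter with a static
    # state; both minimize the same sum of squared residuals.
    import numpy as np

    rng = np.random.default_rng(0)
    x_true, r = 5.0, 0.5**2                 # true value, measurement noise variance
    z = x_true + rng.normal(0.0, 0.5, size=100)

    # Batch: argmin_x sum_k (z_k - x)^2 is the sample mean; needs all z in memory.
    x_batch = z.mean()

    # Recursive: same minimizer, O(1) memory.
    x_hat, p = z[0], r                      # initialize from the first measurement
    for zk in z[1:]:
        k = p / (p + r)                     # "Kalman gain": relative weight of new data
        x_hat = x_hat + k * (zk - x_hat)
        p = (1.0 - k) * p                   # posterior variance shrinks like r/N

    print(x_batch, x_hat)                   # equal up to floating-point error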