Teaching physics to AI
From Shaastra :: vol 03 issue 07 :: Aug 2024
Data scientist Anima Anandkumar on the future of 'smarter' AI that is rather more grounded in reality.
When researcher and data scientist Anima Anandkumar entered the field of artificial intelligence in the early 2010s, it was poised for take-off. The "trinity of AI" – internet data, computing power and neural networks – together paved the way for a decade of groundwork by scientists that has today made AI a subject of drawing-room conversation. "AI was growing that whole decade. For example, we had face recognition on our phones: these kinds of narrow computer vision tasks got very good with the use of neural nets," she says. But it was the foundational language models – what we call GPTs and chatbots – that made it possible for people to interact with AI in a way that was more conversational and intimate.
For Anandkumar, the question is: what next? How do you make AI smarter? Much of her work in recent years has revolved around grounding AI in reality, and teaching it the laws of physics so that it can truly understand how the world works. For her contributions to AI, Anandkumar, Bren Professor of Computing and Mathematical Sciences at the California Institute of Technology, was recently awarded the 2025 IEEE Kiyo Tomiyasu Award. Here, Anandkumar shares her outlook on the future of AI. Excerpts:
With better AI models, how have our expectations and definitions of what counts as intelligence changed over the past two decades?
I think the bar keeps moving. There's the classic Turing test, where the question is: can you tell if the system you're interacting with is a human or not? Many people say the Turing test is broken: in many chat interactions, you can't easily tell if it's a human or AI. I think it is a question of Kolmogorov complexity, or to put it another way: how many interactions are you allowed in order to test this? Even very sophisticated chatbots can ultimately break down on some simple common-sense questions if you really probe them. So, AI is able to get through benign interactions, but when someone is adversarial and wants to manipulate the system... that's the aspect we haven't yet resolved. The system is still brittle and far more vulnerable to attacks than a standard human would be.
"What if we have large (language) models that broadly understand physics in different scenarios... To me, that's an exciting future."
Since 2020, we have been talking about large language models. Where do you think the research is headed?
To me, language is a great way to build a chatbot and interact. But these models suffer from hallucination: one source of hallucination is that they don't understand the physical world. If you ask it to play tennis, it can come up with theories on how to play tennis, but not really go and execute that in the real world. If you ask it to predict what the weather will be tomorrow, it's not going to do it itself: it might look up a weather app and convey that information to the user. It can't internally simulate what's happening and come up with answers. The same goes for designing a better aircraft wing or a better drone. You can ask it for some ideas and it can come up with wacky and interesting designs. But we're not asking for an artistic answer. We want something that works in the real world. And most of the time it won't, because it's aligned for creative pursuits rather than ones that are physically grounded. You can't get the design 3D-printed and just let it fly. That's why we need to move from just training on text data or image data to training on both simulations and real-world phenomena of various kinds, so that we create models that understand physics from the bottom up.
So, rather than training a model for a certain application, we should try to make AI understand the laws of physics, whatever the scenario.
Yeah, precisely: that's the next step in many use-cases. Think of how much language models have revolutionised the way we solve problems. Now, what if we have large models that broadly understand physics in different scenarios – so you can not only simulate but come up with optimal designs through that model. To me, that's an exciting future.
How do you achieve that?
We've already built a model similar to GPT-2 in the number of parameters, one that can solve different kinds of partial differential equations (PDEs): think of fluid dynamics, heat transfer, wave propagation, and so on. We have one model learn multiple such phenomena, and we have shown that it does better than narrow models. That's really the cross-learning we need, which is true in language, and now is also true in physics... We can first train on simple phenomena and progressively go to more complex ones, which means we don't need as much data on the complex phenomena as would otherwise be required.
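As a rough illustration of the curriculum idea she describes (this is a minimal sketch, not her group's actual code or data), one shared model can be trained on progressively harder PDE "families"; here the datasets are random stand-ins for solver-generated input-to-solution pairs, and the placeholder network is an ordinary convolutional net rather than a neural operator.

```python
# Minimal sketch (assumed setup, not the actual training pipeline): one shared
# surrogate model trained on several PDE "families" in a simple-to-complex curriculum.
import torch
import torch.nn as nn

def make_toy_dataset(n_samples, grid=64):
    """Stand-in for solver-generated (input field -> solution field) pairs."""
    x = torch.randn(n_samples, 1, grid)
    y = torch.randn(n_samples, 1, grid)
    return torch.utils.data.TensorDataset(x, y)

# Curriculum: simpler phenomena first, then more complex ones.
curriculum = [
    ("heat_transfer",    make_toy_dataset(512)),
    ("wave_propagation", make_toy_dataset(512)),
    ("fluid_dynamics",   make_toy_dataset(512)),
]

# A single shared model for all phenomena (a 1-D conv net here, purely as a
# placeholder for the neural-operator architecture described in the interview).
model = nn.Sequential(
    nn.Conv1d(1, 32, 5, padding=2), nn.GELU(),
    nn.Conv1d(32, 32, 5, padding=2), nn.GELU(),
    nn.Conv1d(32, 1, 5, padding=2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for name, dataset in curriculum:   # progress from simple to complex phenomena
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
    for epoch in range(2):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    print(f"finished curriculum stage: {name}")
```

The point of the sketch is the ordering: the same weights are reused across phenomena, so the later, more complex stages start from what was learned on the simpler ones.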
Could we then increase the diversity of problems that can be solved with one model?
It depends on how much computational power we can get to scale up the models. Unlike language models where all the data is already available, we need to think about being able to systematically train these models without getting overwhelmed by the scale and complexity of the problems we're trying to solve.
Is this where your field of innovation – neural operators – comes in?
Yes, if you look at text data, the vocabulary is known: it's fixed-size tokens (a string of a few letters). But when it comes to, say, forecasting a hurricane, I can't look at a zoomed-out picture of a hurricane and say where it's going to move. This is not intuitive physics: these phenomena require zooming into the fine details. Even very small eddies make a big difference in terms of where the hurricane moves. A standard computer vision model works at a certain resolution; if you zoom in and ask for finer details, it gets blurry. This is where neural operators come in: think about a model where you can keep zooming in and it still has sharp details, which you can refine by adding laws of physics. So neural operators don't just represent data as a fixed number of pixels; they represent data as continuous functions.
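To make the contrast with fixed-resolution pixels concrete, here is a minimal PyTorch sketch of a spectral layer in the spirit of the Fourier neural operator work from Anandkumar's group; it is an illustrative toy, not their released implementation. The learned weights act on a fixed number of Fourier modes of the input function, so the same layer can be applied to that function whether it is sampled on a coarse grid or a fine one.

```python
# Sketch of the core idea behind a Fourier-type neural operator layer
# (illustrative only): parameters live on Fourier modes, not on a pixel grid,
# so the same weights work at different sampling resolutions.
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    def __init__(self, channels, n_modes):
        super().__init__()
        self.n_modes = n_modes
        # Complex weights for the lowest n_modes Fourier modes only.
        self.weights = nn.Parameter(
            torch.randn(channels, channels, n_modes, dtype=torch.cfloat) * 0.02
        )

    def forward(self, x):                      # x: (batch, channels, grid_points)
        x_ft = torch.fft.rfft(x, dim=-1)       # to Fourier space
        out_ft = torch.zeros_like(x_ft)
        k = min(self.n_modes, x_ft.shape[-1])
        # Mix channels mode-by-mode on the retained low-frequency modes.
        out_ft[..., :k] = torch.einsum(
            "bim,iom->bom", x_ft[..., :k], self.weights[..., :k]
        )
        return torch.fft.irfft(out_ft, n=x.shape[-1], dim=-1)  # back to physical space

layer = SpectralConv1d(channels=4, n_modes=12)
coarse = torch.randn(8, 4, 64)    # a function sampled on 64 grid points
fine   = torch.randn(8, 4, 256)   # the same kind of function, sampled more finely
print(layer(coarse).shape, layer(fine).shape)  # one set of weights, both resolutions
```

Because the parameters act on the function's Fourier modes rather than on a particular pixel grid, refining the grid does not require retraining the layer; that is the resolution-independence the interview refers to.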
What sparked your interest in neural operators?
When I was working at Amazon and at Caltech, we were exploring how to take deep learning to the next level, in terms of more efficient training methods... The other aspect was early work on neural landers: how we can learn the aerodynamics of a drone, keep it stable and make sure it lands safely. All of this informed how I think about the physical world and how to build AI there. It is much harder because we have so much text data and image data sitting there; whereas here, the data is limited, or we may need to go into simulations to get it.
Developing neural operators was more of an accident. We were talking to people in applied math like Andrew Stuart and Kaushik Bhattacharya and asking: "what is a general way to describe a lot of physical phenomena?" It's a partial differential equation. And so, if we came up with a general machine learning technique to solve PDEs, we could have a broad framework that tackles a wide range of problems. Many people had tried and weren't that successful, or were successful only if they still had a solver in the loop. That's when we realised that standard neural nets are not able to do it because they only work at a fixed resolution. So, we needed to build resolution independence into our models as well. That's how we came up with neural operators.
"Different countries and regions need AI that understands not only the local language but the local culture and traditions. That's important..."
You've championed AI ethics and fair machine learning without social biases; for that to happen, there has to be a good amount of diversity in our datasets. Which means we need more indigenous AI models, right?
Yes, indigenous or what (NVIDIA CEO) Jensen Huang likes to call 'sovereign' AI. Different countries and regions need AI that understands not only the local language but the local culture and traditions. That's important if you want to run general models that are able to chat with you and solve problems. It's not just about the language, but even other aspects of what is needed to solve regional goals: in terms of maybe climate change, energy, scientific discoveries.
When we talk about AI without bias, there seems to be an ideological divide in the STEM hubs where a certain faction says we might be overcorrecting by compromising accuracy...
It's a tricky question because it's really a multi-objective function. Who decides what is the right level of awareness, and who decides what are the right groups to look at? So, we released a tool called Bias Test GPT where the user can decide what social groups they want to test and what language models they want to test on. You have a lot more flexibility to look at how it behaves across different topics. Maybe you want to look at bias in terms of health applications, because different groups have different issues and you can't have the same advice for everyone. So, you need biases to make it personalised. At the same time, you don't want stereotypical biases that harm people in other aspects, like hiring. The only way is to look at the end goal and the desired outcome for it, because you can't fix the pre-training of the model for all biases. You may have to fine-tune the model at the very end to fix it.
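As a hypothetical illustration of the kind of user-driven testing she describes (this is not the Bias Test GPT interface), a probe can let the user choose the social groups, the prompt templates for a topic such as hiring or health, and the model-plus-scorer they want to apply; `score_completion` below is a stand-in for whatever model call and sentiment or stereotype scorer the user plugs in.

```python
# Hypothetical sketch of user-configurable bias probing (not the actual tool).
from typing import Callable

def probe_bias(groups: list[str],
               templates: list[str],
               score_completion: Callable[[str], float]) -> dict[str, float]:
    """Average a user-supplied score over prompts mentioning each group."""
    results = {}
    for group in groups:
        prompts = [t.format(group=group) for t in templates]
        scores = [score_completion(p) for p in prompts]
        results[group] = sum(scores) / len(scores)
    return results

# Example: the user picks the groups and the topic (here, hiring);
# the scorer is a dummy placeholder.
groups = ["group A", "group B"]
templates = ["{group} candidates applying for this job are"]
print(probe_bias(groups, templates, score_completion=lambda prompt: 0.0))
```

Comparing the per-group averages across topics is what lets a user see where a difference is desirable personalisation (say, health advice) and where it is a harmful stereotype (say, hiring).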