On AlphaGo, Fine-Tuning, and the Future of National Security
...or how to avoid a 'Sedol Moment'
AlphaGo
From the moment in 1997 that IBM's Deep Blue overcame Garry Kasparov in the second of their two matches, the ancient Chinese game of Go replaced chess as the standard-bearer for the superiority of human intelligence over machines, and with good reason. Less susceptible to brute force and more reliant on 'human' intuition, Go took another 19 years for artificial intelligence to master: in 2016, DeepMind's AlphaGo defeated professional Go champion Lee Sedol in a five-game series, immortalised in a Netflix documentary film.
The team at DeepMind didn't start with Go. Their early experiments with simpler computer games like Breakout revealed a phenomenon first observed with Deep Blue, and seen even more vividly in AlphaGo's series against Sedol: moves that humans have never, or only rarely, made.
In the case of Breakout, DeepMind's self-taught agent discovered that the best strategy was not to spend time endlessly batting the ball against the rainbow wall, removing one block at a time, but instead to dig a narrow tunnel at the side of the wall, sending the ball through it to bounce rapidly around the space behind, removing blocks far more quickly and without the need for the agent's intervention. DeepMind's programmers had been oblivious to this technique until the agent started using it.
When AlphaGo took on Sedol in 2016, in the first of five games staged at the Four Seasons Hotel in Seoul, the professional was frequently left bemused by AlphaGo's choices, at times seemingly unable to respond with a move of his own. One reply took him 12 minutes to find, and the process left him visibly unsettled.
Demis Hassabis, founder of DeepMind, described its mission as being to 'understand intelligence, and re-create it artificially', the aim being 'to use that technology to help society solve all sorts of other problems.' In the case of AlphaGo, this was achieved by a combination of supervised learning and reinforcement learning, described (Silver et al., Nature, 19 Oct 2017) as 'neural networks [...] trained by supervised learning from human expert moves, and by reinforcement learning from self-play.' The success of this approach was clear for all to see, leaving experienced observers stunned by the power and potential of AlphaGo.
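To make that two-phase recipe concrete, here is a minimal, illustrative sketch in Python. It is emphatically not DeepMind's code: the board features, expert moves, and game outcomes are random stand-ins, and the linear 'policy network' is a toy. It shows only the shape of the idea: cross-entropy steps toward expert moves, followed by REINFORCE-style updates weighted by self-play outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a 'board' is a random feature vector, a 'move' is one of N_MOVES.
N_FEATURES, N_MOVES = 16, 4
W = rng.normal(scale=0.1, size=(N_FEATURES, N_MOVES))  # a linear toy 'policy network'

def move_probs(board, W):
    """Softmax distribution over candidate moves for a given position."""
    logits = board @ W
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Phase 1 (supervised learning): nudge the policy toward recorded expert moves.
for _ in range(1000):
    board = rng.normal(size=N_FEATURES)       # stand-in for a real board position
    expert_move = int(rng.integers(N_MOVES))  # stand-in for a human expert's choice
    p = move_probs(board, W)
    target = np.eye(N_MOVES)[expert_move]
    W -= 0.1 * np.outer(board, p - target)    # cross-entropy gradient step

# Phase 2 (reinforcement learning from self-play): replay each game and
# reinforce the moves played, weighted by the final outcome (REINFORCE).
for _ in range(1000):
    boards = rng.normal(size=(20, N_FEATURES))  # positions from one self-play 'game'
    moves = [int(rng.choice(N_MOVES, p=move_probs(b, W))) for b in boards]
    outcome = rng.choice([-1.0, 1.0])           # +1 for a win, -1 for a loss (random here)
    for b, m in zip(boards, moves):
        p = move_probs(b, W)
        W += 0.01 * outcome * np.outer(b, np.eye(N_MOVES)[m] - p)
```

The real system adds deep convolutional networks, a value network, and Monte Carlo tree search on top of this skeleton, but the division of labour is the same: learn from human games first, then improve by playing yourself.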
I was particularly struck by this account, from The Atlantic, of the second game in the series, once Lee Sedol fully understood the power of AlphaGo:
"In Game 2, Lee exhibits a different style, attempting to play more cautiously. He waits for any opening he can exploit, but AlphaGo continues to surprise. At move 37, AlphaGo plays an unexpected move, what’s called a “shoulder hit” on the upper right side of the board. This move in this position is unseen in professional games, but its cleverness is immediately apparent. Fan Hui would later say, “I’ve never seen a human play this move. So beautiful.” (emphasis added)
The world of chess changed after Deep Blue, whose capabilities are now regarded as trivial, available even on the most basic computers. Humans still play humans, but for years a combination of grandmaster and computer ('centaur play') could defeat either humans or computers playing alone. This notion of a human-machine team has entered our lexicon, not least in the domain of defence and national security. But the world has changed again, and our ability to harness the unfathomable brilliance of artificial intelligence, demonstrated so clearly in the Sedol vs AlphaGo matches, is about to take on even greater significance.
Fine-Tuning
This week, OpenAI announced 'Fine-tuning for GPT-3.5 Turbo', the latest in a string of innovations that have transformed our assumptions about artificial intelligence (AI).
In the past 12 months alone, the public release of ChatGPT, GPT-4, and other Large Language Models (LLMs) such as LLaMA and Claude has given us all powers that were hitherto the preserve of only the most advanced companies and research labs.
Fine-tuning allows an organisation to enhance an existing LLM such as GPT-3.5 by further training it on bespoke or proprietary data the model has not previously seen, optimising its performance for a specific purpose. The potential is immense.
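What does this look like in practice? As a hedged sketch only, using OpenAI's Python SDK as it stood in August 2023 (the file name, example data, and prompts below are placeholders, and the details of the API may change):

```python
import time
import openai  # OpenAI Python SDK, pre-v1 interface (mid-2023)

openai.api_key = "sk-..."  # placeholder

# 1. Upload a JSONL file of chat-formatted examples built from your own data.
#    Each line looks like:
#    {"messages": [{"role": "user", "content": "..."},
#                  {"role": "assistant", "content": "..."}]}
upload = openai.File.create(file=open("my_org_examples.jsonl", "rb"),
                            purpose="fine-tune")

# 2. Start a fine-tuning job against the base model.
job = openai.FineTuningJob.create(training_file=upload.id, model="gpt-3.5-turbo")

# 3. Poll until the job finishes and a customised model ID is issued.
while job.status not in ("succeeded", "failed", "cancelled"):
    time.sleep(60)
    job = openai.FineTuningJob.retrieve(job.id)

# 4. Query the fine-tuned model exactly as you would the base model.
reply = openai.ChatCompletion.create(
    model=job.fine_tuned_model,  # e.g. "ft:gpt-3.5-turbo:my-org::..."
    messages=[{"role": "user", "content": "A question in your specialist domain"}],
)
print(reply.choices[0].message.content)
```

Once the job completes, the customised model is called through the same chat interface as the base model; the heavy lifting stays on OpenAI's side.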
Just this week, I learned about the efforts of a major UK-based charitable organisation, whose website costs nearly £3m a year to maintain (don't ask), to adapt and incorporate the capabilities of LLMs into its service provision. This organisation generates huge volumes of highly specific data relating to the problems it tackles, data it is trusted to use for the benefit of its stakeholders. Independently, and without asking permission, a third party has already scraped the publicly available data from the organisation's website, used it to fine-tune an LLM, and produced a working chatbot prototype that offers the same service as the charity, automated, and at near zero cost.
Of course, the chatbot is untested and unregulated, may hallucinate, and has none of the expert supervision and experience that the charity provides. But the challenge is clear.
The Future of National Security
AI is suddenly everywhere. Well, almost everywhere. To date, concerns about the security of LLMs, together with well-understood approaches to protecting data, sources, and methods within the intelligence and security community, may have made it hard to incorporate world-leading, proprietary foundation models into intelligence work. Information systems that are air-gapped from the internet are not designed for connectivity, and certainly not for the commercial clouds that power OpenAI's and other cutting-edge LLMs. The intelligence and security community has a range of extraordinary capabilities, but ready access to commercial cloud compute may not be among them.
That may be about to change. With the ability to fine-tune OpenAI's LLMs and customise models for specific use cases, national security organisations will soon be able to leverage their unique proprietary data to improve GPT-4's performance on the tasks and missions that matter most to them. For now, operational security concerns about using OpenAI's API for fine-tuning mean that such capabilities remain tantalising but out of reach. However, with the release of ChatGPT Enterprise this week, a move the other tech giants will inevitably follow, we are edging closer to a framework for LLM access and fine-tuning that will overcome the community's security concerns and unlock the power of models like GPT-4 for national security applications.
At a basic level, this will enable the acceleration and automation of tasks that absorb manpower better spent on higher-value work. But to understand the full potential of this innovation, we must first consider the role and importance of intelligence, before returning to AlphaGo.
The purpose of intelligence? To reveal intentions, provide insights, and deliver information which enables successful decision-making, and thereby better outcomes. The value of intelligence remains immense. From the D-Day landings to the Cuban Missile Crisis, Yom Kippur to Iraq, intelligence-led decisions (and errors) have been critical in the key geopolitical events of the last century; competition between countries and their respective intelligence agencies remains intense. Most recently, the use of intelligence to reveal Russian intentions before and during the attack on Ukraine has played a key role in shaping the conflict, and in rallying support for Ukraine.
Back to AlphaGo. Let's reconsider the reaction of Lee Sedol to game 2, move 37:
"He gets up and walks out of the room." (The Atlantic, my emphasis)
Like Deep Blue before it, and as foreshadowed by DeepMind's earlier Breakout experiments, AlphaGo produced a move so astonishing to the acknowledged master of Go that he had to leave the room to regain his composure. An intelligence built upon human foundations, but developed through self-play, changed the paradigm.
Now, imagine what an AlphaGo for National Security could do. Say we took the combined knowledge of 100 years of diplomacy, embodied in the archive of memos, telegrams, letters, emails, reports, and intelligence assessments at the Foreign, Commonwealth and Development Office (FCDO) - the equivalent of AlphaGo's training dataset - and used this treasure trove to fine-tune a cutting-edge LLM.
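What might the first, unglamorous step look like? A purely hypothetical sketch: the archive structure, field names, and system prompt below are invented for illustration, but each historical situation, paired with the assessment issued at the time, would become one training example in the JSONL format that fine-tuning expects.

```python
import json

# Hypothetical: 'archive' stands in for the FCDO corpus; the field names,
# system prompt, and file name are invented for illustration only.
archive = [
    {"situation": "Summary of a historical diplomatic crisis...",
     "assessment": "The analysis and recommendation issued at the time..."},
    # ...one record per memo, telegram, or assessment...
]

with open("fcdo_finetune.jsonl", "w") as f:
    for record in archive:
        example = {"messages": [
            {"role": "system", "content": "You are a senior FCDO analyst."},
            {"role": "user", "content": record["situation"]},
            {"role": "assistant", "content": record["assessment"]},
        ]}
        f.write(json.dumps(example) + "\n")
```

A file like this is what a fine-tuning job of the kind sketched earlier would consume.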
And then? Well, we test our fine-tuned model against the most complex and challenging diplomatic questions we face, from the Middle East Peace Process to the war in Ukraine, and we wait.
What will be our ‘Sedol moment’? What is the national security equivalent of ‘Move 37’?
Could it be a hitherto overlooked sanction, which impacts the Russian economy so significantly that it forces Putin to negotiate? Or a novel redistribution of land that meets the aspirations of both Israelis and Palestinians? Is there a tactic that could dissuade Kim Jong-Un from lobbing missiles over Japan? Or a deterrent we haven't thought of which would safeguard Taiwan's independence? It is impossible to predict what these insights, tactics, or strategies might be, or what impact they could have. But geopolitical power will remain highly correlated with the mastery of technology, and the countries and organisations that harness the power of AI to advance their national security interests will have the upper hand.
The UK is a global leader in diplomacy, intelligence and security, admired for the professionalism of its agencies and the impact of their work. But we cannot afford to miss this opportunity to harness AI to our national interests, or a Sedol moment of our own awaits.
(screenshot credits: ‘AlphaGo’, from YouTube)
Jonathan Luff, August 2023
Note: I am grateful to a number of colleagues and friends who have offered comments on versions of this draft. I have included a number of revisions as a result, but the final version is no one’s responsibility but my own. I am hoping they will offer comments below on the areas where we disagreed, or where I couldn’t find a way to incorporate their comments into the essay.