The ability to handle single molecules as effectively as macroscopic building-blocks would enable the construction of complex supramolecular structures inaccessible to self-assembly. The fundamental challenges obstructing this goal are the uncontrolled variability and poor observability of atomic-scale conformations. Here, we present a strategy to work around both obstacles, and demonstrate autonomous robotic nanofabrication by manipulating single molecules. Our approach employs reinforcement learning (RL), which finds solution strategies even in the face of large uncertainty and sparse feedback. We demonstrate the potential of our RL approach by removing molecules autonomously with a scanning probe microscope from a supramolecular structure -- an exemplary task of subtractive manufacturing at the nanoscale. Our RL agent reaches an excellent performance, enabling us to automate a task which previously had to be performed by a human. We anticipate that our work opens the way towards autonomous agents for the robotic construction of functional supramolecular structures with speed, precision and perseverance beyond our current capabilities.
This study presents the first demonstration of reinforcement learning (RL) applied to autonomous robotic nanofabrication, addressing the long-standing challenge of manipulating individual molecules without human intervention. The authors target a PTCDA (3,4,9,10-perylene-tetracarboxylic dianhydride) monolayer on an Ag(111) surface, with the RL agent tasked with autonomously removing single molecules using a scanning probe microscope (SPM)—a textbook example of subtractive manufacturing at the nanoscale. The fundamental difficulty is that the complete atomic-scale state of the environment is unobservable, the system is non-stationary due to unpredictable tip apex changes, and conventional approaches (human expertise or model-based simulation) fail at this scale. The RL framework models the problem as a Markov Decision Process with a simplified 3D state space (Cartesian tip coordinates) and five discrete actions moving the tip in different directions. Two critical algorithmic innovations enable practical data efficiency: a model-based planning component (Dyna-style) that exploits the deterministic Cartesian state transitions to generate synthetic training experience, and a rupture avoidance mechanism using negative training temperature that propagates failure-state information far back through trajectories, enabling rapid avoidance of dangerous regions. Together, these modifications reduce agent failure rates from 70% to 11% in simulation and make real-world application feasible. In physical experiments conducted at 5 K, the RL agent autonomously created 16 molecular vacancies in the PTCDA layer, each verified by STM imaging. 
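The Dyna-style planning idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a tabular Q-function as a stand-in for whatever function approximator the study used, the step vectors in `ACTIONS` are hypothetical (the summary only says there are five discrete actions), and the names `DynaQ`, `next_state`, and `observe` are invented for this example. Because the Cartesian transition is deterministic, the next state is computed rather than learned, so the model only has to remember the observed reward and whether the episode ended.

```python
import random
from collections import defaultdict

# Hypothetical step vectors in arbitrary grid units (an assumption; the
# source only states that five discrete actions move the tip).
ACTIONS = [(0, 0, 1), (1, 0, 1), (-1, 0, 1), (0, 1, 1), (0, -1, 1)]


def next_state(state, action):
    """Deterministic Cartesian transition: add the action's step vector."""
    dx, dy, dz = ACTIONS[action]
    return (state[0] + dx, state[1] + dy, state[2] + dz)


class DynaQ:
    """Tabular Q-learning with Dyna-style replay of remembered outcomes."""

    def __init__(self, alpha=0.5, gamma=0.95, planning_steps=20, seed=0):
        self.alpha = alpha
        self.gamma = gamma
        self.planning_steps = planning_steps
        self.q = defaultdict(float)   # (state, action) -> value estimate
        self.model = {}               # (state, action) -> (reward, done)
        self.rng = random.Random(seed)

    def _update(self, state, action, reward, done):
        # One-step Q-learning target; the next state comes from the known
        # transition function, not from a learned model.
        if done:
            target = reward
        else:
            s2 = next_state(state, action)
            best = max(self.q[(s2, b)] for b in range(len(ACTIONS)))
            target = reward + self.gamma * best
        key = (state, action)
        self.q[key] += self.alpha * (target - self.q[key])

    def observe(self, state, action, reward, done):
        """Learn from one real step, then replay synthetic experience."""
        self.model[(state, action)] = (reward, done)
        self._update(state, action, reward, done)
        for _ in range(self.planning_steps):
            (s, a), (r, d) = self.rng.choice(list(self.model.items()))
            self._update(s, a, r, d)
```

A single real failure, for example `agent.observe((0, 0, 0), 0, -1.0, True)`, is replayed many times during planning, so its value estimate converges after one physical trial instead of many. The rupture avoidance mechanism described in the summary is a separate modification and is not modeled here.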
Pre-trained agents (P-agents), initialized with weights from a previously successful run, outperformed randomly initialized agents (R-agents) by focusing exploration in the physically meaningful lower-left trajectory quadrant, which corresponds to peeling the molecule along its long axis. This is a universally valid policy that transfers across different tip configurations. The difficulty of the task scales inversely with tip-molecule bond strength: weak tips require the agent to traverse a very narrow corridor in xy-space at critical heights. The work demonstrates that RL can succeed in a real-world nanoscale robotic task characterized by partial observability, non-stationarity, and sparse feedback, conditions under which classical robotics approaches fail. The authors envision future extensions incorporating tunneling current and force gradient signals into the state representation, hybrid simulation-guided RL for more complex tasks, and integration with autonomous tip preparation. Ultimately, this approach opens a path toward the autonomous construction of arbitrary metastable supramolecular structures with functional properties inaccessible through self-assembly alone.
1. Reinforcement learning was demonstrated for the first time to automate a manipulation task at the nanoscale, specifically autonomous removal of PTCDA molecules from a self-assembled monolayer on Ag(111) using a scanning probe microscope.
2. The RL agent successfully performed subtractive nanofabrication, creating 16 vacancies in a PTCDA monolayer as demonstrated by STM imaging, without human intervention during the manipulation process.
3. The nanofabrication problem was formulated as a Markov Decision Process (MDP) with a 3-dimensional state space consisting only of the Cartesian coordinates (x, y, z) of the SPM tip apex, making the approach tractable despite the partial observability of the full atomic-scale environment.
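The MDP formulation in the last finding can be made concrete with a small sketch. It is illustrative only: the state holds nothing but the tip coordinates, as the finding states, while the action table is a hypothetical choice (the source does not list the step directions or sizes here), and `TipState`, `ACTIONS`, `step`, and `rollout` are names invented for this example. Coordinates are integer grid indices in arbitrary units.

```python
from dataclasses import dataclass

# Hypothetical action set: every action raises the tip by one grid unit and
# optionally shifts it sideways. The directions and sizes are assumptions.
ACTIONS = {
    0: (0, 0, 1),    # straight up
    1: (1, 0, 1),    # up and +x
    2: (-1, 0, 1),   # up and -x
    3: (0, 1, 1),    # up and +y
    4: (0, -1, 1),   # up and -y
}


@dataclass(frozen=True)
class TipState:
    """The full MDP state: Cartesian grid coordinates of the tip apex."""
    x: int
    y: int
    z: int


def step(state: TipState, action: int) -> TipState:
    """Deterministic state transition for one discrete tip movement."""
    dx, dy, dz = ACTIONS[action]
    return TipState(state.x + dx, state.y + dy, state.z + dz)


def rollout(start: TipState, actions: list[int]) -> list[TipState]:
    """Return the tip trajectory produced by a sequence of actions."""
    trajectory = [start]
    for action in actions:
        trajectory.append(step(trajectory[-1], action))
    return trajectory
```

For instance, `rollout(TipState(0, 0, 0), [2, 4, 0])` yields a trajectory ending at `TipState(-1, -1, 3)`. Whether the molecule stays attached to the tip along such a trajectory is not part of this state and must be learned from experimental feedback, which is where partial observability enters.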
The discussion analyzes the learning process by comparing randomly initialized agents (R-agents) with pre-trained agents (P-agents). P-agents perform better because they carry over a transferable, universal policy: exploring the lower-left trajectory quadrant, which is consistent with the physical 'peeling' mechanism. Performance variability is linked to tip-dependent bond strength, with weaker tips allowing only narrower corridors of successful trajectories and thus requiring more episodes. The authors identify limited observability as the most severe limitation of RL at the nanoscale, noting that partial observability and stochasticity increase the number of trials needed. Future directions include: (1) hybrid approaches combining atomistic simulation insight with RL for guided exploration; (2) incorporation of measurable quantities such as tunneling current and force gradient into the state representation for tasks with hysteretic behavior; and (3) combination of autonomous SPM-based nanofabrication with autonomous tip preparation methods. The authors conclude that autonomous robotic nanofabrication is viable and enables progress towards designing quantum matter beyond the constraints of crystal growth and self-assembly.