Publication Type

Journal Article

Version

acceptedVersion

Publication Date

9-2026

Abstract

Machine unlearning has emerged as a key mechanism for enabling the “right to be forgotten” in neural network models, allowing the selective removal of specific training data upon request. Existing approaches typically rely on retraining models with the remaining data, which is computationally expensive and difficult to verify, especially when deployed models are distributed or resource-constrained. To address this challenge, our prior conference work introduced PRUNE, a patching-based framework that formulates unlearning as a neural network repair problem. PRUNE achieves targeted forgetting by learning lightweight patch networks that redirect model predictions on the data to be unlearned while preserving performance on the remaining data. In this extended journal version, we make three major advances: (1) we formally define a threat model that characterizes dishonest behaviors of model owners and the corresponding privacy risks; (2) we extend PRUNE to support class-level unlearning, enabling removal of all samples from a target category; and (3) we present additional experiments showing that PRUNE resists membership inference attacks, demonstrating its robustness to privacy leakage. Extensive evaluations on multiple classification benchmarks confirm that PRUNE achieves certifiable unlearning with high efficiency, minimal performance degradation, and strong verifiability.
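
To make the patching idea concrete, below is a minimal, hypothetical PyTorch sketch of a patching-based unlearning scheme of the kind the abstract describes. The `PatchedModel` and `patch_loss` names, the residual-correction-on-logits architecture, and the redirect-to-uniform forgetting objective are illustrative assumptions for this sketch, not the paper's actual PRUNE design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchedModel(nn.Module):
    """Frozen base classifier plus a small trainable patch network.

    Hypothetical sketch: only the patch is trained, so the update stays
    lightweight compared with retraining the full model.
    """
    def __init__(self, base: nn.Module, num_classes: int, hidden: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the deployed model unchanged
        self.patch = nn.Sequential(
            nn.Linear(num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        logits = self.base(x)
        return logits + self.patch(logits)  # residual correction of outputs

def patch_loss(model, x_forget, x_retain, y_retain, lam: float = 1.0):
    """Redirect forget-set predictions while preserving retained behavior.

    Assumed objective: push forget-set outputs toward a uniform (uninformative)
    distribution and keep standard cross-entropy low on the remaining data.
    """
    logits_f = model(x_forget)
    uniform = torch.full_like(logits_f, 1.0 / logits_f.size(-1))
    forget_term = F.kl_div(
        F.log_softmax(logits_f, dim=-1), uniform, reduction="batchmean"
    )
    retain_term = F.cross_entropy(model(x_retain), y_retain)
    return retain_term + lam * forget_term

# Usage (hypothetical data tensors): optimize only the patch parameters, e.g.
#   opt = torch.optim.Adam(model.patch.parameters(), lr=1e-3)
#   loss = patch_loss(model, x_forget, x_retain, y_retain)
#   loss.backward(); opt.step()
```

Freezing the base model and training only the small patch is what makes this family of approaches efficient relative to retraining; the specific patch architecture and forgetting objective used by PRUNE are given in the paper itself.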

Keywords

Machine Learning, Machine Unlearning, Privacy Leakage, Data Privacy

Discipline

Information Security | OS and Networks

Publication

Neural Networks

Volume

201

First Page

1

Last Page

16

ISSN

0893-6080

Identifier

10.1016/j.neunet.2026.108897

Publisher

Elsevier

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1016/j.neunet.2026.108897
