
Projects

Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets

Deep neural networks (DNNs) typically require massive amounts of data to train on, which is a hurdle in numerous practical domains. Facing this data shortfall, one viable option is to acquire domain-specific training data from external, uncensored sources, such as the open web or third-party data collectors. However, the quality of such acquired data is often not rigorously scrutinized, and one cannot easily rule out the risk of “poisoned” examples being included in such unreliable datasets, resulting in compromised trained models that pose potential risks to many high-stakes applications.
2022-09-23
2 min read

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?

Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that “robustness” encoded in the model weights, and how easily can it be disentangled and/or “zero-shot” transferred to some other models? This paper empirically suggests a surprisingly simple answer: linearly, by straightforward model weight arithmetic! We start by drawing several key observations: (1) assuming that we train the same model architecture on both a clean dataset and its corrupted version, the resultant weights mostly differ in shallow layers; (2) the weight difference after projection, which we call the “Robust Weight Signature” (RWS), appears to be discriminative and indicative of different corruption types; (3) for the same corruption type, the RWSs obtained by one model architecture are highly consistent and transferable across different datasets. A minimal sketch of the weight-patching idea follows below.
2022-02-28
2 min read
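
To make the “patching weights” idea concrete, here is a minimal PyTorch-style sketch of the weight arithmetic described above. It is an illustration under stated assumptions, not the paper’s implementation: `shallow_prefixes` is a hypothetical way to mark which layers count as “shallow,” and the paper’s projection step is omitted here, using raw weight differences instead.

```python
# Minimal sketch of weight-arithmetic "patching" in the spirit of RWS.
# Assumptions (not from the paper): PyTorch models with identical
# architectures, and a hypothetical `shallow_prefixes` filter standing
# in for the paper's notion of "shallow layers". The projection step
# described in the paper is omitted for simplicity.

import torch

def weight_signature(robust_model, clean_model,
                     shallow_prefixes=("conv1", "layer1")):
    """Per-parameter difference (robust - clean), kept only for shallow layers."""
    clean_params = dict(clean_model.named_parameters())
    sig = {}
    for name, p_robust in robust_model.named_parameters():
        if name.startswith(shallow_prefixes):
            sig[name] = p_robust.detach() - clean_params[name].detach()
    return sig

@torch.no_grad()
def patch_with_signature(target_model, signature, scale=1.0):
    """Add the signature to matching parameters of a same-architecture model."""
    for name, p in target_model.named_parameters():
        if name in signature:
            p.add_(scale * signature[name])
    return target_model
```

Usage would look like `sig = weight_signature(robust_model, clean_model)` followed by `patch_with_signature(other_model, sig)`, where all models share the same architecture so that parameter names line up.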