Conference Proceeding Article
It is often very expensive, and in practice infeasible, to generate test cases that exercise all possible states of a program. This is especially true for a medium-sized or large industrial system. In practice, industrial clients of the system often hold a set of input data collected either before the system is built or after a previous version of the system is deployed. Such data are highly valuable because they represent the operations that matter in a client's daily business, and they may be used to test the system extensively. However, such data often carry sensitive information and cannot be released to third-party development houses. For example, a healthcare provider may hold a set of patient records that are strictly confidential and cannot be used by any third party. Masking sensitive values alone may not be sufficient, as correlations among fields in the data can reveal the masked information. Moreover, masked data may drive the system into different behavior and thus be less useful than the original data for testing and debugging.

For the purpose of releasing private data for testing and debugging, this paper proposes the kb-anonymity model, which combines the k-anonymity model commonly used in the data mining and database areas with the concept of program behavior preservation. Like k-anonymity, kb-anonymity replaces some information in the original data to preserve privacy, so that the replaced data can be released to third-party developers. Unlike k-anonymity, kb-anonymity ensures that the replaced data exhibit the same kind of program behavior as the original data, so that the replaced data may still be useful for testing and debugging. We also provide a concrete version of the model under three particular configurations and have successfully applied our prototype implementation to three open-source programs, demonstrating the utility and scalability of our prototype.
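The abstract names two properties without spelling out the algorithm, so the following Python sketch only illustrates what each one checks, under simplifying assumptions. The toy program `program_path`, the sample records, and the helper names `is_k_anonymous` and `preserves_behavior` are all hypothetical. In particular, "each released record follows the same branch path as the record it replaces" is a simplified stand-in for behavior preservation. It is not the paper's formal definition of kb-anonymity, nor its symbolic-execution-based procedure for generating released data.

```python
from collections import Counter


# Hypothetical program under test: returns the sequence of branches a record drives it through.
def program_path(record):
    path = ["senior" if record["age"] >= 65 else "adult"]
    path.append("chronic" if record["diagnosis"] == "diabetes" else "other")
    return tuple(path)


def is_k_anonymous(records, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


def preserves_behavior(original, released, program):
    """Simplified check (an assumption, not the paper's definition): each released
    record follows the same program path as the original record at the same position."""
    return len(original) == len(released) and all(
        program(o) == program(r) for o, r in zip(original, released)
    )


# Made-up sample data for illustration only.
original = [
    {"name": "Ann", "age": 70, "zip": "12345", "diagnosis": "diabetes"},
    {"name": "Bob", "age": 72, "zip": "12346", "diagnosis": "diabetes"},
    {"name": "Cat", "age": 30, "zip": "54321", "diagnosis": "flu"},
    {"name": "Dan", "age": 35, "zip": "54322", "diagnosis": "flu"},
]

# Replaced data: names suppressed, ages set to values that keep each record on the
# same branch, ZIP codes generalized so that each (age, zip) pair occurs twice.
released = [
    {"name": "*", "age": 65, "zip": "1234*", "diagnosis": "diabetes"},
    {"name": "*", "age": 65, "zip": "1234*", "diagnosis": "diabetes"},
    {"name": "*", "age": 18, "zip": "5432*", "diagnosis": "flu"},
    {"name": "*", "age": 18, "zip": "5432*", "diagnosis": "flu"},
]

# Naive masking: hides every age but pushes the senior records onto a different branch.
masked = [dict(r, age=0) for r in original]

print(is_k_anonymous(original, ["age", "zip"], 2))           # False
print(is_k_anonymous(released, ["age", "zip"], 2))           # True
print(preserves_behavior(original, released, program_path))  # True
print(preserves_behavior(original, masked, program_path))    # False
```

The last line mirrors the abstract's point about masking: the masked data hide the ages, but the program no longer behaves on them as it does on the original records, so they are less useful for testing and debugging.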
k-anonymity, symbolic execution, third-party testing and debugging, behavior preservation
Software and Cyber-Physical Systems
PLDI 11: Proceedings of the 2011 ACM Conference on Programming Language Design and Implementation, San Jose, CA, June 4-8, 2011
BUDI, Aditya; LO, David; JIANG, Lingxiao; and Lucia, Lucia.
kb-anonymity: A model for anonymized behavior-preserving test and debugging data. (2011). PLDI 11: Proceedings of the 2011 ACM Conference on Programming Language Design and Implementation, San Jose, CA, June 4-8, 2011. 447-457. Research Collection School Of Information Systems.
Available at: http://ink.library.smu.edu.sg/sis_research/1390
Copyright Owner and License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.