Panel Data Analysis Via Variable Selection and Subject Clustering

Publication Type

Book Chapter


A panel data set contains observations on multiple phenomena, recorded over multiple time periods for the same subjects (e.g., firms or individuals). Panel data sets appear frequently in marketing, economics, and many other social sciences. An important panel data analysis task is to analyze and predict a variable of interest. In the social sciences, the number of records collected for each subject is usually too small to support accurate and reliable analysis, so a common solution is to pool all subjects together and run a linear regression in an attempt to discover the underlying relationship between the variable of interest and the other observed variables. However, this method suffers from two limitations. First, subjects might not be poolable because of their heterogeneous nature. Second, not all variables necessarily have a significant relationship to the variable of interest, and a regression on many irrelevant regressors can lead to inaccurate predictions. To address these two issues, we propose a novel approach, called Selecting and Clustering, which derives the underlying linear models by first selecting the variables highly correlated with the variable of interest and then clustering subjects into homogeneous groups that share the same linear model with respect to those variables.
Furthermore, we formulate this problem as an optimization model whose solution selects variables and clusters subjects simultaneously. Because the problem is combinatorial in nature, we propose an effective and efficient algorithm to solve it. Studies on real data sets validate the approach, which performs significantly better than existing approaches.
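
The two-stage idea described above (select variables, then group subjects that share a linear model) can be illustrated with a minimal sketch. This is not the chapter's optimization model or algorithm, which the abstract does not specify. It is a simple heuristic assumed here for illustration: variables are ranked by absolute pooled correlation with the target, and subjects are grouped by alternating between fitting one least-squares model per cluster and reassigning each subject to the model that fits it best. All function names and parameters (`select_variables`, `select_and_cluster`, `k_vars`, `n_clusters`, `make_toy_panel`) are hypothetical.

```python
import numpy as np

def select_variables(X, y, k):
    """Indices of the k columns of X with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    return np.sort(np.argsort(corr)[::-1][:k])

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) of y on X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def sse(X, y, coef):
    """Sum of squared residuals of the linear model coef on (X, y)."""
    r = y - np.column_stack([np.ones(len(X)), X]) @ coef
    return float(r @ r)

def select_and_cluster(X, y, subject, k_vars, n_clusters, n_iter=20, seed=0):
    """Heuristic sketch: correlation-based selection, then alternating
    per-cluster regression and subject reassignment (an assumed scheme)."""
    rng = np.random.default_rng(seed)
    cols = select_variables(X, y, k_vars)
    Xs = X[:, cols]
    subjects = np.unique(subject)
    labels = {s: int(rng.integers(n_clusters)) for s in subjects}
    models = []
    for _ in range(n_iter):
        row_cluster = np.array([labels[s] for s in subject])
        models = []
        for c in range(n_clusters):
            mask = row_cluster == c
            if not mask.any():
                # Empty cluster: fall back to the pooled fit.
                mask = np.ones(len(y), dtype=bool)
            models.append(fit_ols(Xs[mask], y[mask]))
        # Reassign every subject to the cluster model with the lowest error.
        new_labels = {}
        for s in subjects:
            m = subject == s
            errors = [sse(Xs[m], y[m], coef) for coef in models]
            new_labels[s] = int(np.argmin(errors))
        if new_labels == labels:
            break
        labels = new_labels
    return cols, labels, models

def make_toy_panel(seed=1, n_subjects=8, n_periods=30, n_vars=5):
    """Synthetic panel: only variables 0 and 2 matter; two subject groups
    follow different linear models (purely illustrative data)."""
    rng = np.random.default_rng(seed)
    subject = np.repeat(np.arange(n_subjects), n_periods)
    X = rng.normal(size=(n_subjects * n_periods, n_vars))
    beta = np.array([[1.0, 0, 3.0, 0, 0], [3.0, 0, 1.0, 0, 0]])
    y = (X * beta[subject % 2]).sum(axis=1) + 0.1 * rng.normal(size=len(X))
    return X, y, subject
```

Because the selection step uses pooled correlations, this sketch can miss a variable whose effect has opposite signs in different groups; solving selection and clustering jointly, as the abstract describes, avoids that kind of sequential dependence.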


Computer Sciences | Numerical Analysis and Scientific Computing

Research Areas



Data Mining for Services


Yada, Katsutoshi
