Title:
Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis with Limited Computational Resources
Time:
2022.4.11 14:30-16:00
Presenter:
Prof. Wang Hansheng
Abstract:
Modern statistical analysis often involves large data sets, for which conventional estimation methods are not suitable, owing to limited computational resources. To solve this problem, we propose a novel subsampling-based method with jackknifing. The key idea is to treat the whole sample as if it were the population. Then, we obtain multiple subsamples with greatly reduced sizes using simple random sampling with replacement. We do not recommend sampling methods without replacement, because this would incur a significant data processing cost when the processing occurs on a hard drive. However, such a cost does not exist if the data are processed in memory. Because subsampled data have relatively small sizes, they can be comfortably read into computer memory and processed. Based on subsampled datasets, jackknife-debiased estimators can be obtained for the target parameter. The resulting estimators are statistically consistent, with an extremely small bias. Finally, the jackknife-debiased estimators from different subsamples are averaged to form the final estimator. We show theoretically that the final estimator is consistent and asymptotically normal. Furthermore, its asymptotic statistical efficiency can be as good as that of the whole sample estimator under very mild conditions. The proposed method is easily implemented on most computer systems, and thus is widely applicable.
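The three steps described in the abstract (subsample with replacement, jackknife-debias each subsample, average the results) can be illustrated with a minimal sketch. It is not the presenter's implementation. The function names, the choice of the plug-in variance as a toy target parameter, and the in-memory list standing in for data stored on a hard drive are all assumptions made for illustration.

```python
import random
import statistics


def plug_in_variance(xs):
    """Plug-in (divide-by-n) variance: a toy estimator with O(1/n) bias."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def jackknife_debiased(xs, estimator):
    """Standard jackknife correction: n * theta_hat - (n - 1) * mean of leave-one-out estimates."""
    n = len(xs)
    theta_full = estimator(xs)
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * theta_full - (n - 1) * sum(leave_one_out) / n


def subsample_jackknife(data, estimator, n_sub, k_sub, seed=0):
    """Average jackknife-debiased estimates over k_sub subsamples of size n_sub.

    `data` is an in-memory list here (an assumption); in the setting of the
    talk it would be read from disk one small subsample at a time.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(k_sub):
        # Simple random sampling WITH replacement from the whole sample.
        sub = rng.choices(data, k=n_sub)
        estimates.append(jackknife_debiased(sub, estimator))
    return sum(estimates) / k_sub


if __name__ == "__main__":
    rng = random.Random(1)
    whole_sample = [rng.gauss(0.0, 2.0) for _ in range(20000)]
    final = subsample_jackknife(whole_sample, plug_in_variance, n_sub=50, k_sub=200)
    print("subsample-jackknife estimate:", final)
    print("whole-sample estimate:       ", plug_in_variance(whole_sample))
```

For this particular toy estimator the jackknife correction reproduces the usual unbiased sample variance (`statistics.variance`), which makes the debiasing step easy to check by hand. Each subsample here needs only 50 observations in memory at a time, which is the practical point of the method.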
Click the link below to join the meeting:
https://meeting.tencent.com/dm/CNirPxvDwdis
Tencent Meeting ID: 120-533-317
Password: 0411