Apparate: Early-Exit Models for ML Latency and Throughput Optimization – Comparisons
Authors:
(1) Yinwei Dai, Princeton University (Equal contributions);
(2) Rui Pan, Princeton University (Equal contributions);
(3) Anand Iyer, Georgia Institute of Technology;
(4) Ravi Netravali, Georgia Institute of Technology.

Table of Links
Abstract and 1 Introduction
2 Background and Motivation and 2.1 Model Serving Platforms
2.2 Early-Exit Models
2.3 Challenges
3 Design
3.1 Preparing Models with Early Exits
3.2 Accuracy-Aware Threshold Tuning
3.3 Latency-Focused Ramp Adjustments
4 Implementation
5 Evaluation and 5.1 Methodology
5.2 Overall