
TinyBERT: Distilling BERT for Natural Language Understanding
Sep 25, 2019 · Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, the pre-trained language …
… TinyBERT by focusing on learning the task-specific knowledge. Although there is a big gap between BERT and TinyBERT in model size, by performing the proposed two-stage …
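The TinyBERT entry refers to a two-stage knowledge-distillation recipe. As a rough illustration of the kind of objective such methods optimize, here is a minimal sketch combining a soft-label KL term with hidden-state matching; the function name, the projection layer `proj`, and the `alpha`/`temperature` defaults are illustrative assumptions, not the paper's actual code.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      proj, temperature=2.0, alpha=0.5):
    """Toy distillation objective: soft-label KL plus hidden-state matching.

    `proj` is a linear layer mapping the student's hidden size to the
    teacher's, since a distilled model is usually narrower than its teacher.
    """
    # Soft-label term: the student matches the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hidden-state term: projected student representations track the teacher's.
    hidden = F.mse_loss(proj(student_hidden), teacher_hidden)
    return alpha * soft + (1.0 - alpha) * hidden
```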
QC-BERT: A Quantum-Classical hybrid framework for Efficient...
May 18, 2025 · Transformers have revolutionized NLP but are constrained by their massive parameter counts, posing challenges for edge deployment. Quantum computing, leveraging …
ZipLM: Inference-Aware Structured Pruning of Language Models
Jun 20, 2023 · In particular, ZipLM outperforms all prior BERT-base distillation and pruning techniques, such as CoFi, MiniLM, and TinyBERT. Of note is that on analyzed GLUE tasks, …
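ZipLM's actual inference-aware algorithm is more involved, but the sketch below shows the basic idea behind structured pruning of a Transformer FFN block: score whole intermediate neurons, keep the highest-scoring ones, and rebuild physically smaller layers so the speedup is real rather than nominal. The magnitude-based scoring and the `keep_ratio` parameter are assumptions for illustration, not ZipLM's method.

```python
import torch
import torch.nn as nn

def prune_ffn_neurons(fc1: nn.Linear, fc2: nn.Linear, keep_ratio: float = 0.5):
    """Generic structured pruning of an FFN block (fc1 -> activation -> fc2).

    Scores each intermediate neuron by the L2 norm of its incoming and
    outgoing weights, keeps the top `keep_ratio` fraction, and returns
    physically smaller replacement layers.
    """
    scores = fc1.weight.norm(dim=1) + fc2.weight.norm(dim=0)  # one score per neuron
    n_keep = max(1, int(keep_ratio * scores.numel()))
    keep = scores.topk(n_keep).indices.sort().values

    new_fc1 = nn.Linear(fc1.in_features, n_keep, bias=fc1.bias is not None)
    new_fc2 = nn.Linear(n_keep, fc2.out_features, bias=fc2.bias is not None)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[keep])
        new_fc2.weight.copy_(fc2.weight[:, keep])
        if fc1.bias is not None:
            new_fc1.bias.copy_(fc1.bias[keep])
        if fc2.bias is not None:
            new_fc2.bias.copy_(fc2.bias)
    return new_fc1, new_fc2
```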
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained ...
Feb 1, 2023 · We propose a novel task-agnostic distillation method, equipped with iterative pruning, for Transformer-based language models.
Lifting the Curse of Capacity Gap in Distilling Large Language …
Sep 22, 2022 · Abstract: Large language models (LLMs) have shown compelling performance on various downstream tasks, but unfortunately require a tremendous amount of inference …
Language model compression with weighted low-rank factorization
Jan 28, 2022 · Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this compression …
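As a baseline for the weighted low-rank factorization described above, here is a minimal sketch of plain truncated-SVD factorization of a linear layer; the paper's contribution is the importance weighting applied before the SVD, which is only noted in a comment. The helper name `factorize_linear` and the choice of rank are illustrative assumptions.

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Plain (unweighted) truncated-SVD factorization of a linear layer.

    W (out x in) is approximated by two factors of rank r, shrinking the
    parameter count from out*in to r*(out + in). The weighted variant in the
    paper additionally rescales parameters by importance before the SVD;
    only the standard baseline is shown here.
    """
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    # Split the singular values evenly between the two factors.
    A = U[:, :rank] * S[:rank].sqrt()              # (out, r)
    B = S[:rank].sqrt().unsqueeze(1) * Vh[:rank]   # (r, in)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    with torch.no_grad():
        first.weight.copy_(B)
        second.weight.copy_(A)
        if layer.bias is not None:
            second.bias.copy_(layer.bias)
    return nn.Sequential(first, second)
```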
A Token is Worth over 1,000 Tokens: Efficient Knowledge...
Sep 18, 2025 · The paper shows that this algorithm is much more token-efficient than standard pretraining from scratch and more time-efficient than TinyBERT (distillation with no pruning). (b) …
Exploring extreme parameter compression for pre-trained …
Jan 28, 2022 · The paper only performs experiments on BERT-base and TinyBERT models, but I believe the compression method proposed in the paper should be in even greater demand by …
Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based...
May 1, 2025 · To protect user privacy, we aim to run language models directly on small devices like phones, which have limited computing power and need to save energy. Many key steps in …