  1. TinyBERT: Distilling BERT for Natural Language Understanding

    Sep 25, 2019 · Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, the pre-trained language …

  2. … TinyBERT by focusing on learning the task-specific knowledge. Although there is a big gap between BERT and TinyBERT in model size, by performing the proposed two-stage … [a minimal logit-distillation sketch follows this result list]

  3. QC-BERT: A Quantum-Classical hybrid framework for Efficient...

    May 18, 2025 · Transformers have revolutionized NLP but are constrained by their massive parameter counts, posing challenges for edge deployment. Quantum computing, leveraging …

  4. ZipLM: Inference-Aware Structured Pruning of Language Models

    Jun 20, 2023 · In particular, ZipLM outperforms all prior BERT-base distillation and pruning techniques, such as CoFi, MiniLM, and TinyBERT. Of note is that on analyzed GLUE tasks, …

  5. HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained ...

    Feb 1, 2023 · We propose a novel task-agnostic distillation method for Transformer-based language models equipped with iterative pruning. [a generic iterative-pruning sketch follows this result list]

  6. Lifting the Curse of Capacity Gap in Distilling Large Language …

    Sep 22, 2022 · Abstract: Large language models (LLMs) have shown compelling performance on various downstream tasks, but unfortunately require a tremendous amount of inference …

  7. Language model compression with weighted low-rank factorization

    Jan 28, 2022 · Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this compression … [a truncated-SVD factorization sketch follows this result list]

  8. A Token is Worth over 1,000 Tokens: Efficient Knowledge...

    Sep 18, 2025 · The paper shows that this algorithm is much more token-efficient than standard pretraining from scratch and more time-efficient than TinyBERT (distillation with no pruning). (b) …

  9. Exploring extreme parameter compression for pre-trained …

    Jan 28, 2022 · The paper only performs experiments on BERT-base and TinyBERT models, but I believe the compression method proposed in the paper should be in greater demand among …

  10. Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based...

    May 1, 2025 · To protect user privacy, we aim to run language models directly on small devices like phones, which have limited computing power and need to save energy. Many key steps in …
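
The first two results centre on knowledge distillation. As a point of reference, here is a minimal sketch of the soft-label (logit-matching) loss that two-stage schemes such as TinyBERT build on. The function name, temperature value, and example shapes are illustrative assumptions, not taken from the papers above, and TinyBERT's full objective also matches hidden states and attention maps, which is not shown here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KL loss for logit-based knowledge distillation.

    Both arguments are raw logits of shape (batch, num_classes);
    `temperature` softens the two distributions before comparison.
    """
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as is conventional.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Example usage with random logits for an 8-example, 2-class batch.
student = torch.randn(8, 2)
teacher = torch.randn(8, 2)
loss = distillation_loss(student, teacher)
```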
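Result 5 (HomoDistil) couples task-agnostic distillation with iterative pruning. The sketch below shows only generic iterative magnitude pruning on a flat weight array, as a rough illustration of the pruning half; it is an assumption-laden stand-in, not HomoDistil's procedure, and a real pipeline would interleave each pruning step with further (distillation) training.

```python
import numpy as np

def iterative_magnitude_prune(weights, target_sparsity=0.9, steps=10):
    """Zero out the smallest-magnitude weights, ramping sparsity over `steps`.

    `weights` is a flat NumPy array. Real pipelines re-train (or distill)
    between steps so the remaining weights can compensate.
    """
    w = weights.copy()
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps
        k = int(sparsity * w.size)
        if k == 0:
            continue
        # The k-th smallest absolute value becomes the pruning threshold.
        threshold = np.partition(np.abs(w), k - 1)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    return w

# Example: prune a toy weight vector to 90% sparsity in 10 steps.
pruned = iterative_magnitude_prune(np.random.randn(1000))
print((pruned == 0).mean())
```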
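Result 7 concerns compressing weight matrices by low-rank factorization. Below is a plain truncated-SVD sketch under the assumption of an unweighted factorization; the weighted variant in the paper presumably folds per-parameter importance into the factorization objective, which is not reproduced here.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Replace W (d_out x d_in) with factors A (d_out x r) and B (r x d_in).

    Plain truncated SVD; a weighted scheme would account for parameter
    importance before or during the decomposition.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Example: a 768x768 layer factorized at rank 64 keeps ~1/6 of the parameters.
W = np.random.randn(768, 768)
A, B = low_rank_factorize(W, rank=64)
relative_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```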