  1. TinyBERT: Distilling BERT for Natural Language Understanding

    Sep 25, 2019 · Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, the pre-trained language …

  2. … TinyBERT by focusing on learning the task-specific knowledge. Although there is a big gap between BERT and TinyBERT in model size, by performing the proposed two-stage … [a minimal logit-distillation sketch follows this result list]

  3. QC-BERT: A Quantum-Classical hybrid framework for Efficient...

    May 18, 2025 · Transformers have revolutionized NLP but are constrained by their massive parameter counts, posing challenges for edge deployment. Quantum computing, leveraging …

  4. ZipLM: Inference-Aware Structured Pruning of Language Models

    Jun 20, 2023 · In particular, ZipLM outperforms all prior BERT-base distillation and pruning techniques, such as CoFi, MiniLM, and TinyBERT. Of note is that on analyzed GLUE tasks, …

  5. HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained ...

    Feb 1, 2023 · We propose a novel task-agnostic distillation method for Transformer-based language models equipped with iterative pruning. [a generic iterative-pruning sketch follows this result list]

  6. Lifting the Curse of Capacity Gap in Distilling Large Language …

    Sep 22, 2022 · Abstract: Large language models (LLMs) have shown compelling performance on various downstream tasks, but unfortunately require a tremendous amount of inference …

  7. Language model compression with weighted low-rank factorization

    Jan 28, 2022 · Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular value decomposition (SVD) plays a vital role in this compression … [a truncated-SVD factorization sketch follows this result list]

  8. A Token is Worth over 1,000 Tokens: Efficient Knowledge...

    Sep 18, 2025 · The paper shows that this algorithm is much more token-efficient than standard pretraining from scratch and more time-efficient than TinyBERT (distillation with no pruning). (b) …

  9. Exploring extreme parameter compression for pre-trained …

    Jan 28, 2022 · The paper only performs experiments on BERT-base and TinyBERT models, but I believe the compression method proposed in the paper should be in greater demand among …

  10. Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based...

    May 1, 2025 · To protect user privacy, we aim to run language models directly on small devices like phones, which have limited computing power and need to save energy. Many key steps in …
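
The first two results centre on knowledge distillation. As a point of reference, here is a minimal sketch of the soft-label (logit-matching) loss that two-stage schemes such as TinyBERT build on. The function name, temperature value, and example shapes are illustrative assumptions, not taken from the papers above, and TinyBERT's full objective also matches hidden states and attention maps, which is not shown here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KL loss for logit-based knowledge distillation.

    Both arguments are raw logits of shape (batch, num_classes);
    `temperature` softens the two distributions before comparison.
    """
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 as is conventional.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2

# Example usage with random logits for an 8-example, 2-class batch.
student = torch.randn(8, 2)
teacher = torch.randn(8, 2)
loss = distillation_loss(student, teacher)
```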
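Result 5 (HomoDistil) couples task-agnostic distillation with iterative pruning. The sketch below shows only generic iterative magnitude pruning on a flat weight array, as a rough illustration of the pruning half; it is an assumption-laden stand-in, not HomoDistil's procedure, and a real pipeline would interleave each pruning step with further (distillation) training.

```python
import numpy as np

def iterative_magnitude_prune(weights, target_sparsity=0.9, steps=10):
    """Zero out the smallest-magnitude weights, ramping sparsity over `steps`.

    `weights` is a flat NumPy array. Real pipelines re-train (or distill)
    between steps so the remaining weights can compensate.
    """
    w = weights.copy()
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps
        k = int(sparsity * w.size)
        if k == 0:
            continue
        # The k-th smallest absolute value becomes the pruning threshold.
        threshold = np.partition(np.abs(w), k - 1)[k - 1]
        w[np.abs(w) <= threshold] = 0.0
    return w

# Example: prune a toy weight vector to 90% sparsity in 10 steps.
pruned = iterative_magnitude_prune(np.random.randn(1000))
print((pruned == 0).mean())
```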
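Result 7 concerns compressing weight matrices by low-rank factorization. Below is a plain truncated-SVD sketch under the assumption of an unweighted factorization; the weighted variant in the paper presumably folds per-parameter importance into the factorization objective, which is not reproduced here.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Replace W (d_out x d_in) with factors A (d_out x r) and B (r x d_in).

    Plain truncated SVD; a weighted scheme would account for parameter
    importance before or during the decomposition.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Example: a 768x768 layer factorized at rank 64 keeps ~1/6 of the parameters.
W = np.random.randn(768, 768)
A, B = low_rank_factorize(W, rank=64)
relative_error = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```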