Papers tagged “efficiency”

18 papers

Mixtral of Experts
2024 arXiv

3D Gaussian Splatting for Real-Time Radiance Field Rendering
2023 SIGGRAPH

Consistency Models
2023 ICML

Efficient Memory Management for Large Language Model Serving with PagedAttention
2023 SOSP

Fast Inference from Transformers via Speculative Decoding
2023 ICML

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
2023 arXiv

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
2023 arXiv

QLoRA: Efficient Finetuning of Quantized LLMs
2023 NeurIPS

RWKV: Reinventing RNNs for the Transformer Era
2023 EMNLP Findings

DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
2022 NeurIPS

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
2022 NeurIPS

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
2022 ICLR

Efficiently Modeling Long Sequences with Structured State Spaces
2021 arXiv

LoRA: Low-Rank Adaptation of Large Language Models
2021 ICLR

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
2021 JMLR

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
2019 ICML

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
2019 ICLR

Distilling the Knowledge in a Neural Network
2015 NeurIPS Workshop

© 2026 Taeung Jeong
