Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Preprint in arXiv (September 2024)