Jiang, A. Q. et al. Mixtral of experts. Preprint at (2024).
Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at (2024).
Gemini Team Google et al. Gemini: a family of highly capable multimodal models. Preprint at (2023).
Brown, T. B. et al. Language models are few-shot learners. In Proc. Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) 1877–1901 (Curran Associates, 2020).
Kaplan, J. et al. Scaling laws for neural language models. Preprint at (2020).
Hoffmann, J. et al. Training compute-optimal large language models. In Proc. Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) (Curran Associates, 2022).
Chowdhery, A. et al. PaLM: scaling language modeling with pathways. J. Mach. Learn. Res. 24, 11324–11436 (2023).
Jordan, M. & Jacobs, R. Hierarchical mixtures of experts and the EM algorithm. In Proc. 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan) Vol. 2, 1339–1344 (IEEE, 1993); https://doi.org/10.1109/IJCNN.1993.716791
Jacobs, R. A., Jordan, M. I., Nowlan, S. J. & Hinton, G. E. Adaptive mixtures of local experts. Neural Comput. 3, 79–87 (1991).
Shazeer, N. et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. In Proc. International Conference on Learning Representations (ICLR, 2017); https://openreview.net/forum?id=B1ckMDqlg
Fedus, W., Zoph, B. & Shazeer, N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res. 23, 5232–5270 (2022).
Raffel, C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21, 5485–5551 (2020).
Du, N. et al. GLaM: efficient scaling of language models with mixture-of-experts. In Proc. 39th International Conference on Machine Learning, Proceedings of Machine Learning Research Vol. 162, 5547–5569 (PMLR, 2022).
Clark, A. et al. Unified scaling laws for routed language models. In Proc. 39th International Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 4057–4086 (PMLR, 2022).
Ludziejewski, J. et al. Scaling laws for fine-grained mixture of experts. In Proc. ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models (PMLR, 2024); https://openreview.net/forum?id=Iizr8qwH7J
Csordás, R., Irie, K. & Schmidhuber, J. Approximating two-layer feedforward networks for efficient transformers. In Findings of the Association for Computational Linguistics: EMNLP 2023 (eds Bouamor, H. et al.) 674–692 (ACL, 2023); https://doi.org/10.18653/v1/2023.findings-emnlp.49
Reuther, A. et al. AI and ML accelerator survey and trends. In Proc. 2022 IEEE High Performance Extreme Computing Conference (HPEC) 1–10 (IEEE, 2022); https://doi.org/10.1109/HPEC55821.2022.9926331
Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).
Lanza, M. et al. Memristive technologies for data storage, computation, encryption, and radio-frequency communication. Science 376, eabj9979 (2022).
Mannocci, P. et al. In-memory computing with emerging memory devices: status and outlook. APL Mach. Learn. 1, 010902 (2023).
Huang, Y. et al. Memristor-based hardware accelerators for artificial intelligence. Nat. Rev. Electr. Eng. 1, 286–299 (2024).
Le Gallo, M. et al. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. Nat. Electron. 6, 680–693 (2023).
Ambrogio, S. et al. An analog-AI chip for energy-efficient speech recognition and transcription. Nature 620, 768–775 (2023).
Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).
Zhang, W. et al. Edge learning using a fully integrated neuro-inspired memristor chip. Science 381, 1205–1211 (2023).
Wen, T.-H. et al. Fusion of memristor and digital compute-in-memory processing for energy-efficient edge computing. Science 384, 325–332 (2024).
Fick, L., Skrzyniarz, S., Parikh, M., Henry, M. B. & Fick, D. Analog matrix processor for edge AI real-time video analytics. In Proc. 2022 IEEE International Solid-State Circuits Conference (ISSCC) Vol. 65, 260–262 (IEEE, 2022); https://doi.org/10.1109/ISSCC42614.2022.9731773
Arnaud, F. et al. High-density embedded PCM cell in 28 nm FDSOI technology tailored for automotive micro-controller applications. In Proc. 2020 IEEE International Electron Devices Meeting (IEDM) 24.2.1–24.2.4 (IEEE, 2020); https://doi.org/10.1109/IEDM13553.2020.9371934
Lee, S. et al. A 1 Tb 4b/cell 64-stacked-WL 3D NAND flash memory with 12 MB/s program throughput. In Proc. 2018 IEEE International Solid-State Circuits Conference (ISSCC) 340–342 (IEEE, 2018); https://doi.org/10.1109/ISSCC.2018.8310323
Park, J.-W. et al. A 176-stacked 512 Gb 3b/cell 3D-NAND flash memory with 10.8 Gb/mm² density with a peripheral circuit under cell array architecture. In Proc. 2021 IEEE International Solid-State Circuits Conference (ISSCC) Vol. 64, 422–423 (IEEE, 2021); https://doi.org/10.1109/ISSCC42613.2021.9365809
Lee, S.-T. & Lee, J.-H. Neuromorphic computing using NAND flash memory architecture with pulse width modulation scheme. Front. Neurosci. 14, 571292 (2020).
Bavandpour, M., Sahay, S., Mahmoodi, M. R. & Strukov, D. B. 3D-aCortex: an ultra-compact energy-efficient neurocomputing platform based on commercial 3D-NAND flash memories. Neuromorphic Comput. Eng. 1, 014001 (2021).
Shim, W. & Yu, S. Technological design of 3D NAND-based compute-in-memory architecture for GB-scale deep neural network. IEEE Electron Device Lett. 42, 160–163 (2020).
Hsieh, C.-C. et al. Chip demonstration of a high-density (43 Gb) and high-bandwidth (300 Gb/s) 3D NAND-based in-memory search accelerator for ternary content addressable memory (TCAM) and Hamming distance proximity search. In Proc. 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) 1–2 (IEEE, 2023); https://doi.org/10.23919/VLSITechnologyandCir57934.2023.10185361
Huo, Q. et al. A computing-in-memory macro based on three-dimensional resistive random-access memory. Nat. Electron. 5, 469–477 (2022).
Jain, S. et al. A heterogeneous and programmable compute-in-memory accelerator architecture for analog-AI using dense 2-D mesh. IEEE Trans. Very Large Scale Integr. VLSI Syst. 31, 114–127 (2023).
Cui, C. et al. A survey on multimodal large language models for autonomous driving. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops 958–979 (IEEE, 2024).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1, 4171–4186 (ACL, 2019); https://doi.org/10.18653/v1/N19-1423
Kim, W., Son, B. & Kim, I. ViLT: vision-and-language transformer without convolution or region supervision. In Proc. 38th International Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 5583–5594 (PMLR, 2021); https://proceedings.mlr.press/v139/kim21k.html
Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. In Proc. Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 23716–23736 (Curran Associates, 2022); https://proceedings.neurips.cc/paper_files/paper/2022/file/960a172bc7fbf0177ccccbb411a7d800-Paper-Conference.pdf
Pope, R. et al. Efficiently scaling transformer inference. In Proc. Machine Learning and Systems Vol. 5 (eds Song, D. et al.) 606–624 (Curran Associates, 2023); https://proceedings.mlsys.org/paper_files/paper/2023/file/c4be71ab8d24cdfb45e3d06dbfca2780-Paper-mlsys2023.pdf
Choquette, J., Gandhi, W., Giroux, O., Stam, N. & Krashinsky, R. NVIDIA A100 Tensor Core GPU: performance and innovation. IEEE Micro 41, 29–35 (2021).
Radford, A. et al. Language models are unsupervised multitask learners. Semantic Scholar (2019).
Merity, S., Xiong, C., Bradbury, J. & Socher, R. Pointer sentinel mixture models. In Proc. International Conference on Learning Representations (ICLR, 2017); https://openreview.net/forum?id=Byj72udxe
Vasilopoulos, A. et al. Exploiting the state dependency of conductance variations in memristive devices for accurate in-memory computing. IEEE Trans. Electron Devices (2023).
Bernstein, D. & Rodeh, M. Global instruction scheduling for superscalar machines. In Proc. ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI '91) 241–255 (ACM, 1991); https://doi.org/10.1145/113445.113466
Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11, 2473 (2020).
Kudo, T. & Richardson, J. SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. In Proc. 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (eds Blanco, E. & Lu, W.) 66–71 (ACL, 2018).
Tillet, P., Kung, H. T. & Cox, D. Triton: an intermediate language and compiler for tiled neural network computations. In Proc. 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages 10–19 (ACM, 2019); https://doi.org/10.1145/3315508.3329973
Le Gallo, M. et al. Using the IBM analog in-memory hardware acceleration kit for neural network training and inference. APL Mach. Learn. 1, 041102 (2023).
Büchel, J. et al. AIHWKIT-Lightning: a scalable HW-aware training toolkit for analog in-memory computing. In Proc. Advances in Neural Information Processing Systems 2024 Workshop on Machine Learning with New Compute Paradigms (Curran Associates, 2024); https://openreview.net/forum?id=QNdxOgGmhR
Büchel, J. et al. Gradient descent-based programming of analog in-memory computing cores. In Proc. 2022 International Electron Devices Meeting (IEDM) 33.1.1–33.1.4 (IEEE, 2022); https://doi.org/10.1109/IEDM45625.2022.10019486
Büchel, J. Original data for figures in ‘Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing’. Zenodo (2024).
Büchel, J. IBM/analog-moe: code publication. Zenodo (2024).
Büchel, J. & Vasilopolous, A. IBM/3D-CiM-LLM-Inference-Simulator: code publication. Zenodo (2024).
Goda, A. 3-D NAND technology achievements and future scaling perspectives. IEEE Trans. Electron Devices 67, 1373–1381 (2020).
Lacaita, A. L., Spinelli, A. S. & Compagnoni, C. M. Compact solid-state storage: a lengthy journey to accomplishment. In Proc. 2021 IEEE Latin America Electron Devices Conference (LAEDC) 1–4 (IEEE, 2021); https://doi.org/10.1109/LAEDC51812.2021.9437865
Shoeybi, M. et al. Megatron-LM: training multi-billion parameter language models using model parallelism. Preprint at (2020).