Given how easily Transformers generalize and scale, and how efficiently they run on existing hardware, they have become the dominant architecture over the last ~7 years, achieving SoTA in most applications. That’s more true now than ever, given that most researchers and developers are working on them, all