The landscape of deep learning has undergone a dramatic transformation in recent years, especially with the rise of Transformer-based architectures. Originally designed for natural language processing tasks, Transformers have now made a powerful entry into the visual domain, challenging the long-standing dominance of convolutional neural networks (CNNs).