High-performance matrix multiplication remains a cornerstone of numerical computing, underpinning a wide array of applications from scientific simulations to machine learning. Researchers continually ...
KernelOptimizer is an open-source tool that automates CUDA kernel optimization for PyTorch workloads using large language models (LLMs). Inspired by Stanford CRFM’s fast kernel research, it leverages ...
Abstract: Stochastic computing (SC) has emerged as a promising technique for reducing hardware costs in various applications, particularly in multiply-accumulate (MAC) intensive tasks such as neural ...
Abstract: Various studies have examined sidelobe reduction using window functions, and achieving further sidelobe reduction by building on existing design results is an important direction. In this ...
Abstract: Machine learning-based methods for remote sensing pansharpening have progressed rapidly. However, existing pansharpening methods often do not fully exploit differentiating ...