Apr 9, 2024
Hi Andressa - Sorry, somehow I missed responding to you. The models use a lot of GPU memory during inference, and that memory is organized as pages. NVIDIA's unified memory feature seamlessly swaps those pages between GPU and CPU memory, providing a larger memory space to work with. So inference and tuning can continue rather than stopping/crashing due to memory overflow. Hope that helps.
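To make the idea concrete, here is a minimal CUDA sketch of unified memory using `cudaMallocManaged`. It's an illustration, not the specific mechanism any particular inference framework uses: a single allocation is visible to both CPU and GPU, and the driver migrates pages on demand instead of failing when the working set exceeds GPU memory (requires an NVIDIA GPU and the CUDA toolkit to actually run).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel that touches every element, forcing pages to migrate
// to the GPU on first access.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    size_t n = (size_t)1 << 28;  // ~1 GiB of floats; could exceed a small GPU
    float *data = nullptr;

    // One pointer, usable from both CPU and GPU; the driver pages
    // data between host and device memory as needed.
    cudaMallocManaged(&data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // pages start on the CPU

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // pages migrate to the GPU
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);               // pages migrate back on CPU read
    cudaFree(data);
    return 0;
}
```

With a plain `cudaMalloc`, an allocation larger than device memory would simply fail; with managed memory the run proceeds, just slower when pages have to shuttle back and forth.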