Nice work! I'm curious, do you have any plans to provide FSDP support for training the model on GPUs with limited memory (24/40 GB)? Also, how long does it take to train the model on 8 A100 GPUs (presumably the 80 GB variant)? Thank you!