Distributed training (#25)

* Add distributed training.
* Use torchrun to start the training, but it quits immediately.
* Fix several distributed bugs. I am going to change the Dataset to map-style rather than IterableDataset, since IterableDataset does not work with DistributedSampler.
* Dataset mode.
* Fix remove_contact bug.
* Distributed training is working.
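
For context on the Dataset change noted above, here is a rough pure-Python sketch of how a DistributedSampler-style split assigns indices to ranks when shuffling is off. It is an illustration under assumptions, not the code from this PR: `ToyMapDataset` and `shard_indices` are hypothetical names, and the logic only approximates the padding and striding that torch's sampler performs. The point is that the split relies on `len()` and integer indexing, which an IterableDataset does not provide.

```python
import math


class ToyMapDataset:
    """Hypothetical map-style dataset: supports len() and integer indexing."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


def shard_indices(dataset_len, num_replicas, rank):
    """Approximate the non-shuffled, drop_last=False split used by a
    DistributedSampler-style sampler (a sketch, not torch's source)."""
    if dataset_len <= 0:
        raise ValueError("dataset_len must be positive")
    # Every rank gets the same number of samples, rounded up.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by wrapping around to the start so the list divides evenly.
    padding = total_size - len(indices)
    indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Each rank takes a strided slice starting at its own rank.
    return indices[rank:total_size:num_replicas]


dataset = ToyMapDataset(range(10))
world_size = 4  # assumed number of processes started by torchrun
shards = [shard_indices(len(dataset), world_size, rank) for rank in range(world_size)]
rank0_samples = [dataset[i] for i in shards[0]]
```

Each rank ends up with the same number of indices, and a few samples are repeated when the dataset length is not divisible by the number of processes.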