Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment