Few-shot segmentation (FSS) aims to segment objects in a query image given only a few annotated support images. The main challenge in FSS is inferring query pixel labels from class prototypes learned from the few labeled support exemplars. Previous methods focus on learning class-wise descriptors independently from the support images, which ignores fine-grained contextual information and the mutual relations between support and query features. To address this issue, we propose a joint learning method, Masked Cross-Image Encoding (MCE), which mines common visual properties describing object details and learns bidirectional inter-image dependencies that enhance feature interaction. MCE is more than a visual-representation enrichment module; it also captures cross-image mutual dependencies and contextual information, so the labeled feature representations of the support images better guide the prediction for the query image. Experiments on the public FSS benchmarks PASCAL-5i and COCO-20i demonstrate the state-of-the-art meta-learning performance of the proposed method.
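One direction of the cross-image dependency described above can be sketched as masked cross-attention, where query-pixel features attend only to foreground support pixels. The following is a minimal NumPy sketch assuming standard scaled dot-product attention; the function name, shapes, and masking scheme are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def masked_cross_attention(query_feat, support_feat, support_mask):
    """Cross-attention from query pixels to masked support pixels.

    query_feat:   (Nq, d) flattened query-image features
    support_feat: (Ns, d) flattened support-image features
    support_mask: (Ns,)   binary mask; 1 marks foreground support pixels
    Returns:      (Nq, d) query features enriched with support context
    """
    d = query_feat.shape[1]
    # scaled dot-product attention logits between all query/support pixel pairs
    logits = (query_feat @ support_feat.T) / np.sqrt(d)        # (Nq, Ns)
    # exclude background support pixels before the softmax
    logits = np.where(support_mask[None, :] > 0, logits, -1e9)
    # numerically stable row-wise softmax
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    # aggregate masked support features for each query pixel
    return attn @ support_feat
```

A bidirectional variant would apply the same operation with the roles of query and support swapped, letting the two feature maps mutually refine each other.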