We propose a method for estimating the pose of a human body from its approximate 3D volume (visual hull), reconstructed in real time from synchronized videos. Our method copes not only with tight-fitting clothing but also with loose-fitting clothing, which hides the body, produces non-rigid motions, and causes critical reconstruction errors such as phantom volumes in the visual hull. To track shape variations robustly despite erratic motions and the ambiguity between a reconstructed body shape and its pose, a probabilistic dynamical model of human volumes is learned in advance from training temporal volumes refined by error correction. A dynamical model of the body pose (joint angles) is also learned jointly with the corresponding volumes. Pose estimation is then realized by comparing the volume model with an input visual hull and regressing the pose from the pose model. Our method improves this estimation with a double volume comparison that integrates probabilistic and geometric models: 1) comparison in a low-dimensional latent space with the probabilistic volume models and 2) comparison in the observation volume space using geometric constraints between a real volume and its visual hull. Comparative experiments demonstrate that our method is both more effective and faster than existing methods.