Beyond Labels and Captions: Contextualizing Grounded Semantics for Explainable Visual Interpretation