Learning Pose and State-Invariant Object Representations for Fine-Grained Recognition and Retrieval