Similar Items: Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring