Similar Items: Robustness and authorship bias of large language models in scientific abstracts scoring