Similar Items: Efficient, VRAM-Constrained xLM Inference on Clients