Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...
In some ways, data and its quality can seem strange to people used to assessing the quality of software. There’s often no observable behaviour to check and little in the way of structure to help you ...