You possibly can construct a fortress in two methods: Begin stacking bricks one above the opposite, or draw an image of the fortress you’re about to construct and plan its execution; then, maintain evaluating it in opposition to your plan.
Everyone knows the second is the one approach we are able to probably construct a fortress.
Generally, I’m the worst follower of my recommendation. I’m speaking about leaping straight right into a pocket book to construct an LLM app. It’s the worst factor we are able to do to damage our mission.
Earlier than we start something, we’d like a mechanism to inform us we’re transferring in the appropriate path — to say that the very last thing we tried was higher than earlier than (or in any other case.)
In software program engineering, it’s referred to as test-driven improvement. For machine studying, it’s analysis.
Step one and essentially the most precious ability in growing LLM-powered purposes is to outline the way you’ll consider your mission.
Evaluating LLM purposes is nowhere like software program testing. I don’t undermine the challenges in software program testing, however evaluating LLMs isn’t as easy as testing.