DSPy: Keeping Pipeline Promises Intact
Language-model pipelines began life as collections of personal tricks. We shipped features only after someone produced the right paragraph of instructions, and that paragraph lived in a notebook or a mind. The knowledge was fragile, undocumented, and always under revision. We learned this the hard way building a classifier that stubbornly hovered around 50% accuracy for days. After countless attempts and growing frustration, we discovered two innocuous words in our prompt were poisoning the results. Remove them, and accuracy jumped to 70% overnight. The "fix" lived in one person's head, and we had no process for finding it again.