Neural nets are glorified curve fitting. They are curves parameterized by the weight matrix. The weight matrix is relatively massive (e.g. 1M DOF), which makes the family of curves it generates almost fluid, like a piece of yarn. Now, given a small amount of data and a programmable piece of string, how well can you fit the data? Turns out the string is higher dimensional than the data, so you can fit any curve you like. The trick is avoiding overfitting. Overfitting is the yarn warping its shape to fit noise that has no intrinsic meaning. That's what cross validation guards against: overfitting. Stop moving the yarn to match the training data better when it fails to improve an independent performance test. That's what machine learning is: figuring out algorithms that don't overfit and have some ability to generalize to data not seen before. It's still basically glorified curve fitting.
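
To make the "stop moving the yarn" idea concrete, here is a rough sketch of early stopping against a held-out set. It is not a neural net: a degree-9 polynomial fit by gradient descent stands in for the flexible curve, and everything in it (the sine target, the noise level, the learning rate, the patience value, and names like `fit_with_early_stopping`) is an assumption made up for illustration.

```python
import numpy as np

def poly_features(x, degree):
    # Columns are x**0, x**1, ..., x**degree.
    return np.vander(x, degree + 1, increasing=True)

def fit_with_early_stopping(x_train, y_train, x_val, y_val,
                            degree=9, lr=0.05, max_steps=20000, patience=200):
    """Gradient descent on training MSE; keep the weights with the best validation MSE."""
    X_train = poly_features(x_train, degree)
    X_val = poly_features(x_val, degree)
    w = np.zeros(degree + 1)
    best_w, best_val, best_step = w.copy(), np.inf, 0
    train_history, val_history = [], []
    for step in range(max_steps):
        residual = X_train @ w - y_train
        train_history.append(float(np.mean(residual ** 2)))
        val_loss = float(np.mean((X_val @ w - y_val) ** 2))
        val_history.append(val_loss)
        if val_loss < best_val:
            # The independent test improved: remember this shape of the yarn.
            best_w, best_val, best_step = w.copy(), val_loss, step
        elif step - best_step >= patience:
            # No improvement for `patience` steps: stop moving the yarn.
            break
        # One gradient step on the training MSE.
        w = w - lr * (2.0 / len(y_train)) * (X_train.T @ residual)
    return best_w, best_step, train_history, val_history

def true_curve(x):
    # Hypothetical "real" signal hiding under the noise.
    return np.sin(np.pi * x)

rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 10)   # small training set: 10 points, 10 weights
x_val = rng.uniform(-1.0, 1.0, 50)     # independent held-out points
y_train = true_curve(x_train) + rng.normal(0.0, 0.3, x_train.size)
y_val = true_curve(x_val) + rng.normal(0.0, 0.3, x_val.size)

best_w, best_step, train_history, val_history = fit_with_early_stopping(
    x_train, y_train, x_val, y_val)
print("best validation step:", best_step, "of", len(val_history))
```

The training error in `train_history` keeps going down as long as you let it run, because the curve has as many knobs as there are training points. The held-out error in `val_history` is the independent performance test: `best_w` is whatever the weights were when that number was lowest, and training stops once it has gone `patience` steps without improving.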