commonly split into 70% train, 10% validation, and 20% test.
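A 70/10/20 split can be sketched with NumPy; the dataset here is made up for illustration, and shuffling before splitting keeps each subset representative:

```python
import numpy as np

# Hypothetical dataset: 100 examples with 5 features each.
X = np.arange(500).reshape(100, 5)

# Shuffle indices so the split is not biased by the original ordering.
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))

# 70% train, 10% validation, 20% test.
n_train = int(0.7 * len(X))
n_val = int(0.1 * len(X))
X_train = X[idx[:n_train]]
X_val = X[idx[n_train:n_train + n_val]]
X_test = X[idx[n_train + n_val:]]
```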
simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm.
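A minimal grid-search sketch: the grid values are illustrative, and `evaluate` is a stand-in for training a model with the given settings and returning its validation score:

```python
from itertools import product

# Manually specified subset of the hyperparameter space (illustrative values).
grid = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "regularization": [1e-4, 1e-2],
}

def evaluate(params):
    # Stand-in for: train with `params`, return validation accuracy.
    # This toy score peaks at learning_rate=1e-2, regularization=1e-4.
    return -abs(params["learning_rate"] - 1e-2) - params["regularization"]

# Try every combination and keep the best-scoring one.
best_params, best_score = None, float("-inf")
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = evaluate(params)
    if score > best_score:
        best_params, best_score = params, score
```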
Use D-fold cross-validation to determine the average number of epochs that optimizes validation performance.
Example: 5-fold cross-validation to select the value of k in KNN.
Train on the full data set for this many epochs to produce the final results.
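The snippet below assumes the training data has already been partitioned into num_folds pieces; one way to do that is np.array_split (the data here is synthetic):

```python
import numpy as np

num_folds = 5
# Hypothetical training data: 50 examples, 4 features, integer labels.
X_train = np.random.default_rng(1).normal(size=(50, 4))
y_train = np.arange(50) % 3

# Partition rows into num_folds roughly equal folds.
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
```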
for k in k_choices:
    k_to_accuracies[k] = []
    for i in range(num_folds):
        # Fold i is the validation set; the remaining folds form the training set.
        _X_validate = X_train_folds[i]
        _y_validate = y_train_folds[i]
        _X_train = np.concatenate([X_train_folds[j] for j in range(num_folds) if j != i])
        _y_train = np.concatenate([y_train_folds[j] for j in range(num_folds) if j != i])

        knn = KNearestNeighbor()
        knn.train(_X_train, _y_train)
        y_pred = knn.predict(_X_validate, k=k)

        # Fraction of validation examples classified correctly.
        num_validate = _X_validate.shape[0]
        k_to_accuracies[k].append(float(np.sum(y_pred == _y_validate)) / num_validate)
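Once k_to_accuracies is filled in, the best k is the one with the highest mean accuracy across the folds. A sketch with made-up accuracy numbers:

```python
import numpy as np

# Hypothetical cross-validation results: k -> accuracy on each of 5 folds.
k_to_accuracies = {
    1: [0.52, 0.55, 0.50, 0.53, 0.54],
    5: [0.58, 0.60, 0.57, 0.59, 0.61],
    10: [0.56, 0.57, 0.55, 0.58, 0.56],
}

# Average over folds, then pick the k with the highest mean accuracy.
mean_acc = {k: np.mean(v) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)
```

After choosing best_k this way, the classifier is retrained on the full training set before producing final test results.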