hyperparameters: choices that we set by hand rather than learn from the data (e.g., k in KNN, the learning rate, the weight decay strength)

the data is commonly split into 70% train, 10% validation, and 20% test.

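For illustration, a minimal sketch of such a split with NumPy (the dataset size and seed below are made up):

import numpy as np

# hypothetical dataset size; shuffle indices, then take 70% / 10% / 20% slices
rng = np.random.default_rng(0)
N = 1000
idx = rng.permutation(N)
n_train, n_val = int(0.7 * N), int(0.1 * N)
train_idx = idx[:n_train]                # first 70%  -> training set
val_idx = idx[n_train:n_train + n_val]   # next 10%   -> validation set
test_idx = idx[n_train + n_val:]         # last 20%   -> test set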

Grid Search

simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm.
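As a concrete sketch, scikit-learn's GridSearchCV performs exactly this exhaustive search; the toy data and the list of candidate k values below are assumptions for illustration:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# toy data standing in for a real training set
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

# manually specified subset of the hyperparameter space
param_grid = {"n_neighbors": [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)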

D-fold Cross-validation: makes full use of the training data

  1. Use D-fold cross-validation to determine the average number of epochs that optimizes validation performance

    e.g., 5-fold cross-validation to choose the value of k in KNN (as in the code below).


  2. Train on the full data set for this many epochs to produce the final results (see the sketch after this list)
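A runnable sketch of this two-step recipe, assuming an SGDClassifier trained one epoch at a time via partial_fit, on synthetic stand-in data:

import numpy as np
from sklearn.linear_model import SGDClassifier

# synthetic stand-in for the real training set
rng = np.random.default_rng(0)
X_full = rng.normal(size=(500, 20))
y_full = (X_full[:, 0] > 0).astype(int)
num_folds, max_epochs = 5, 30
X_folds = np.array_split(X_full, num_folds)
y_folds = np.array_split(y_full, num_folds)

# step 1: per fold, record the epoch with the best validation accuracy
best_epochs = []
for i in range(num_folds):
  X_val, y_val = X_folds[i], y_folds[i]
  X_tr = np.concatenate([X_folds[j] for j in range(num_folds) if j != i])
  y_tr = np.concatenate([y_folds[j] for j in range(num_folds) if j != i])
  clf = SGDClassifier(random_state=0)
  val_acc = []
  for _ in range(max_epochs):
    clf.partial_fit(X_tr, y_tr, classes=np.unique(y_full))
    val_acc.append(clf.score(X_val, y_val))
  best_epochs.append(np.argmax(val_acc) + 1)

# step 2: retrain on the full training set for the averaged epoch count
n_epochs = int(round(np.mean(best_epochs)))
final = SGDClassifier(random_state=0)
for _ in range(n_epochs):
  final.partial_fit(X_full, y_full, classes=np.unique(y_full))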

5-fold cross-validation over k for a KNN classifier:

import numpy as np

# X_train_folds / y_train_folds hold the training data split into num_folds
# pieces, e.g. via np.array_split(X_train, num_folds)
for k in k_choices:
  k_to_accuracies[k] = []
  for i in range(num_folds):
    # fold i is the validation set; concatenate the remaining folds for training
    _X_validate, _y_validate = X_train_folds[i], y_train_folds[i]
    _X_train = np.concatenate([X_train_folds[j] for j in range(num_folds) if j != i])
    _y_train = np.concatenate([y_train_folds[j] for j in range(num_folds) if j != i])

    knn = KNearestNeighbor()
    knn.train(_X_train, _y_train)
    y_pred = knn.predict(_X_validate, k=k)
    # fraction of validation points classified correctly
    k_to_accuracies[k].append(np.mean(y_pred == _y_validate))
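Afterwards, the best k is the one with the highest mean accuracy across the folds, e.g.:

# average the per-fold accuracies and pick the best k
best_k = max(k_to_accuracies, key=lambda k: np.mean(k_to_accuracies[k]))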

Workflow

  1. check the initial loss without weight decay (a sanity check; see the sketch after this list)
  2. overfit a small sample (to 100% training accuracy) to fiddle with architecture, learning rate, weight initialization
  3. use all training data, find a learning rate that makes the loss go down
  4. coarse grid: learning rate and weight decay (try 1e-4, 1e-5, 0), train 5 epochs
  5. refine grid: train for 20 epochs without learning rate decay
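For step 1, a softmax classifier with random weights should start near the loss of uniform guessing over C classes, i.e. ln(C); C = 10 below is an assumed example:

import math

# expected initial softmax loss with no weight decay: -log(1/C) = log(C)
C = 10
print(math.log(C))  # ~2.3026; a measured initial loss far from this hints at a bug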