class Config: vocab_size = 50257 # GPT-2 BPE vocab size d_model = 288 n_heads = 6 n_layers = 6 max_seq_len = 256 dropout = 0.1 batch_size = 32 lr = 3e-4 epochs = 3 device = 'cuda' if torch.cuda.is_available() else 'cpu'
It includes a hyperparameter table for scaling. build a large language model %28from scratch%29 pdf
Result: A "Foundation Model" that understands language but can't follow instructions yet. : class Config: vocab_size = 50257 # GPT-2 BPE