Train DNNs on BrainScaleS-2
===========================

This example uses the PyTorch extension ``hxtorch``, already presented in the
:doc:`introduction to the matrix multiplication <tp_01-properties>`, to train
a deep neural network (DNN).

In order to use the microscheduler we have to set some environment variables first:

.. include:: common_quiggeldy_setup.rst

Some imports that are needed later:

.. code:: ipython3

    import ipywidgets as w
    import numpy as np
    import torch
    import torch.nn.functional as F
    import hxtorch
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    from contextlib import suppress
    with suppress(IOError):
        plt.style.use("_static/matplotlibrc")

    %matplotlib inline
    from IPython.display import HTML
    display(HTML("<style>.output_wrapper button.btn-default, "
                 ".output_wrapper .ui-dialog-titlebar {display:none}</style>"))

    from _static.common.helpers import save_nightly_calibration

Hardware in the loop
--------------------

When training on the BrainScaleS-2 system using hxtorch, the multiply
accumulate operation (MAC) in the forward pass is performed on the hardware,
but the gradient is computed on the host computer:

.. image:: _static/tutorial/hxtorch_itl.png
   :width: 25%
   :align: center

For the calculation of the gradients, a mathematical model is required that
approximately reflects the behavior of the hardware operation. In hxtorch,
the following very simple linear relationship is assumed for this purpose:

.. math::

   y_i = \sum_j x_j \cdot w_{ij} \cdot g_\text{BSS-2} + \kappa_i
   \quad\text{with}\quad \kappa_i \sim N(0,\sigma)

Here, the statistical noise :math:`\kappa_i` of the neurons is assumed to be
Gaussian distributed with standard deviation :math:`\sigma`. The gain factor
:math:`g_\text{BSS-2}` represents the conversion factor between the units of
the input, the weights and the analog-to-digital converter at the output; it
is specific to the individual hardware setup and its calibration.
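To build intuition for this model and for how its two parameters can be
recovered from noisy outputs, here is a small stand-alone NumPy sketch. It is
purely illustrative: the gain and noise values are invented placeholders (on
real hardware they come from the calibration), and the probing scheme only
mimics the spirit of ``hxtorch.perceptron.measure_mock_parameter()``, not its
actual implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# hypothetical parameters standing in for a real calibration
TRUE_GAIN = 0.0015    # plays the role of g_BSS-2
TRUE_NOISE_STD = 2.0  # plays the role of sigma

def mock_mac(x, w):
    """Linear hardware model: y_i = sum_j x_j * w_ij * gain + kappa_i."""
    y = x @ w * TRUE_GAIN
    return y + rng.normal(0.0, TRUE_NOISE_STD, size=y.shape)

# probe the "hardware" with a constant test pattern many times
x = np.full(128, 15.0)        # inputs (limited range on hardware)
w = np.full((128, 64), 21.0)  # weights (limited range on hardware)
ideal = x @ w                 # noise-free MAC without gain
samples = np.stack([mock_mac(x, w) for _ in range(1000)])

# the mean output recovers the gain, the spread recovers the noise
gain_est = samples.mean() / ideal.mean()
noise_est = samples.std(axis=0).mean()
print(f"gain factor: {gain_est:.5f}")  # close to TRUE_GAIN
print(f"noise std.:  {noise_est:.5f}")  # close to TRUE_NOISE_STD
```

Averaging over many repetitions separates the deterministic linear part from
the Gaussian noise; the same two numbers are what the backward pass and the
mock mode need.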
For the calculations on the host, these parameters can be measured after
initialization of the hardware connection:

.. code:: ipython3

    # download calibration and initialize hardware configuration
    save_nightly_calibration('hagen_cocolist.pbin')
    hxtorch.init_hardware(hxtorch.CalibrationPath('hagen_cocolist.pbin'))

    # measure the hardware gain and the average statistical noise on the outputs
    hardware_parameter = hxtorch.perceptron.measure_mock_parameter()

    print(f"gain factor: {hardware_parameter.gain:.5f}")
    print(f"noise std.:  {hardware_parameter.noise_std:.5f}")

    # use the measured parameters for backward pass and in mock mode
    hxtorch.perceptron.set_mock_parameter(hardware_parameter)

Simulating hardware: The mock mode
----------------------------------

The linear model of the hardware presented above can optionally also be used
for the forward pass with hxtorch. It also features the additional noise,
reduced resolution and restricted value ranges of the system.

.. image:: _static/tutorial/hxtorch_mock_mode.png
   :width: 90%
   :align: center

This so-called mock mode can be switched on and off individually for each
hxtorch operation and for each layer via the ``mock`` parameter, e.g.
``hxtorch.perceptron.matmul(..., mock=True)``. It is especially convenient
when no BrainScaleS-2 system is available and allows fast prototyping of DNN
models.

.. admonition:: References for further reading

   The integration into the PyTorch software frontend ``hxtorch`` and a
   benchmark on the human activity recognition dataset are published in:

   - Spilger, Philipp, et al. "hxtorch: PyTorch for BrainScaleS-2." IoT
     Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile
     for Embedded Machine Learning. Springer, Cham, 2020. 189-200.
     https://doi.org/10.1007/978-3-030-66770-2_14

   More details on the implementation of the backward pass, the mock mode and
   the layer initialization can be found in (chapter 4.2 ff.):

   - Emmel, Arne. "Inference with Convolutional Neural Networks on Analog
     Neuromorphic Hardware." *Master's Thesis*. University of Heidelberg.
     `pdf <http://www.kip.uni-heidelberg.de/Veroeffentlichungen/details.php?id=4122>`__

Example application: the Yin-Yang dataset
-----------------------------------------

.. code:: ipython3

    class YinYangDataset(torch.utils.data.dataset.Dataset):
        """
        The Yin-Yang dataset. Slightly modified version adapted from:
        https://github.com/lkriener/yin_yang_data_set
        """

        def __init__(self, r_small=0.1, r_big=0.5, size=1000, seed=42):
            super().__init__()
            # numpy RNG to allow compatibility to other learning frameworks
            self.rng = np.random.RandomState(seed)
            self.r_small = r_small
            self.r_big = r_big
            self.size = size

        def get_sample(self, goal=None):
            # sample until goal is satisfied
            found_sample_yet = False
            while not found_sample_yet:
                # sample x, y coordinates
                x, y = self.rng.rand(2) * 2. * self.r_big
                # check if within yin-yang circle
                if np.sqrt((x - self.r_big)**2 + (y - self.r_big)**2) > self.r_big:
                    continue
                # check if they have the same class as the goal for this sample
                c = self.which_class(x, y)
                if goal is None or c == goal:
                    found_sample_yet = True
                    break
            return x, y, c

        def which_class(self, x, y):
            # equations inspired by
            # https://link.springer.com/content/pdf/10.1007/11564126_19.pdf
            d_right = self.dist_to_right_dot(x, y)
            d_left = self.dist_to_left_dot(x, y)
            criterion1 = d_right <= self.r_small
            criterion2 = d_left > self.r_small and d_left <= 0.5 * self.r_big
            criterion3 = y > self.r_big and d_right > 0.5 * self.r_big
            is_yin = criterion1 or criterion2 or criterion3
            is_circles = d_right < self.r_small or d_left < self.r_small
            if is_circles:
                return 2
            return int(is_yin)

        def dist_to_right_dot(self, x, y):
            return np.sqrt((x - 1.5 * self.r_big)**2 + (y - self.r_big)**2)

        def dist_to_left_dot(self, x, y):
            return np.sqrt((x - 0.5 * self.r_big)**2 + (y - self.r_big)**2)

        def __getitem__(self, index):
            # keep num of class instances balanced by using rejection sampling:
            # choose class for this sample
            goal_class = self.rng.randint(3)
            x, y, c = self.get_sample(goal=goal_class)
            sample = (torch.tensor([x, y, 1 - x, 1 - y], dtype=torch.float), c)
            return sample

        def __len__(self):
            return self.size

Let's take a look at this dataset!

.. code:: ipython3

    colors = ('black', 'white', 'orange')
    class_names = ('yin', 'yang', 'dot')

    num_samples = 2000
    loader = torch.utils.data.DataLoader(
        dataset=YinYangDataset(size=num_samples),
        batch_size=num_samples)

The samples in this modified version are randomly redrawn each time they are
accessed, so each sample is presented to the network only once. Repeated
execution of the following code cell therefore shows slightly different
samples each time. The number of samples is the same for each of the three
classes.

.. code:: ipython3

    samples, labels = next(iter(loader))

    fig, ax = plt.subplots(1, 1)
    for i in range(3):
        ax.scatter(
            *samples[labels == i][..., :2].t(),
            c=colors[i], label=class_names[i])
    ax.set_aspect('equal')
    ax.set_facecolor('gray')
    ax.set_xlim(0, 1); ax.set_ylim(0, 1)
    ax.set_xlabel("x"); ax.set_ylabel("y")
    _ = ax.legend()

.. image:: _static/tutorial/yin_yang_itl_dataset.png
   :width: 90%
   :align: center
   :class: solution

.. admonition:: Further reading

   This dataset as well as some model proposals and benchmarks are presented in:

   - Kriener, L., Göltz, J., & Petrovici, M. A. (2021). The Yin-Yang dataset.
     arXiv preprint: `arXiv:2102.08211 <https://arxiv.org/abs/2102.08211>`__.

.. code:: ipython3

    def alpha_cmap(color):
        """ Returns a colormap ranging from transparent to specified color """
        cmap = np.broadcast_to(mpl.colors.to_rgba(color), (256, 4)).copy()
        cmap[:, -1] = np.linspace(0, 1, 256)
        return mpl.colors.ListedColormap(cmap)


    def test_train_epoch(model: torch.nn.Module,
                         loader: torch.utils.data.DataLoader,
                         optimizer: torch.optim.Optimizer):
        """
        Test the model and train for a single epoch afterwards.

        :param model: The model
        :param loader: Data loader containing the train data set
        :param optimizer: Optimizer that handles the weight updates
        """
        # prepare test data (grid of equally spaced samples):
        gridsize = 16  # one dimension of the test grid
        x = y = torch.linspace(0, 1, gridsize)
        x, y = torch.meshgrid(x, y)
        x, y = x.flatten(), y.flatten()
        data_test = torch.tensor(list(zip(x, y, 1 - x, 1 - y)), requires_grad=True)
        data, target = next(iter(loader))
        data = torch.cat((data_test, data))  # prepend to train data

        # the actual training:
        model.train()
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output[len(data_test):], target)
        loss.backward()
        optimizer.step()

        # get test data from output and reshape:
        with torch.no_grad():
            output_test = F.softmax(output[:len(data_test)], dim=-1)
            output_test = output_test.reshape(gridsize, gridsize, -1)
        return torch.transpose(output_test, 0, 1)


    def train(model: torch.nn.Module,
              loader: torch.utils.data.DataLoader,
              scheduler: torch.optim.lr_scheduler._LRScheduler,
              num_epochs: int = 200):
        """
        Train the model while displaying the test results.

        :param model: The model
        :param loader: Data loader containing the train data set
        :param scheduler: Scheduler that handles the weight updates
        :param num_epochs: Number of epochs to train
        """
        fig = plt.figure()
        ax = plt.gca()
        ax.grid(False)
        ax.set_title("Epoch 0")
        imgs = []
        for i, c in enumerate(colors):
            imgs.append(plt.imshow(
                torch.zeros((1, 1)), vmin=0, vmax=1,
                extent=(0, 1, 0, 1), origin='lower',
                cmap=alpha_cmap(c)))
        plt.xlabel("$x$"); plt.ylabel("$y$")

        for _ in range(num_epochs + 1):
            test_out = test_train_epoch(model, loader, scheduler.optimizer)
            for i, img in enumerate(imgs):
                img.set_data(test_out[..., i])
            ax.set_title(f"Epoch {scheduler.last_epoch}")
            fig.canvas.draw()
            scheduler.step()

        wout = w.Output(layout=w.Layout(height="450px")); display(wout)
        plt.close(); wout.layout = w.Layout(height="0px")

Modeling with hxtorch feels almost like using plain PyTorch; you can even
combine hxtorch and PyTorch layers. If you are familiar with PyTorch, the
code below will look familiar to you:

.. code:: ipython3

    class Model(torch.nn.Module):
        """ Classify the YinYang dataset. """

        def __init__(self, mock: bool = False):
            super().__init__()
            self.classifier = torch.nn.Sequential(
                hxtorch.perceptron.nn.Linear(4, 128, mock=mock),
                hxtorch.perceptron.nn.ConvertingReLU(shift=1),
                hxtorch.perceptron.nn.Linear(128, 3, avg=5, mock=mock),
            )

        def forward(self, *x):
            x = x[0] * 31.  # scale to the whole input range
            x = self.classifier(x)
            return x

.. code:: ipython3

    model_mock = Model(mock=True)
    model_mock

.. parsed-literal::
    :class: solution

    Model(
      (classifier): Sequential(
        (0): Linear(in_features=4, out_features=128, num_sends=20, mock=True)
        (1): ConvertingReLU(shift=1)
        (2): Linear(in_features=128, out_features=3, num_sends=4, mock=True)
      )
    )

.. code:: ipython3

    batch_size = 350
    lr = 1  # learning rate
    gamma = 0.99  # learning rate decay

    loader = torch.utils.data.DataLoader(
        dataset=YinYangDataset(size=batch_size),
        batch_size=batch_size)

    scheduler = torch.optim.lr_scheduler.StepLR(
        torch.optim.Adam(model_mock.parameters(), lr=lr),
        step_size=1, gamma=gamma)

.. code:: ipython3

    train(model_mock, loader, scheduler, num_epochs=200)

.. image:: _static/tutorial/yin_yang_itl_out1.png
   :width: 90%
   :align: center
   :class: solution

.. code:: ipython3

    model_hw = Model(mock=False)
    # initialize with state of mock model
    model_hw.load_state_dict(model_mock.state_dict())
    model_hw

.. parsed-literal::
    :class: solution

    Model(
      (classifier): Sequential(
        (0): Linear(in_features=4, out_features=128, bias=None, num_sends=20)
        (1): ConvertingReLU(shift=1, mock=True)
        (2): Linear(in_features=128, out_features=3, bias=None, num_sends=4)
      )
    )

.. code:: ipython3

    lr = 0.2
    gamma = 0.9
    scheduler_hw = torch.optim.lr_scheduler.StepLR(
        torch.optim.Adam(model_hw.parameters(), lr=lr),
        step_size=1, gamma=gamma)

.. code:: ipython3

    train(model_hw, loader, scheduler_hw, num_epochs=15)

.. image:: _static/tutorial/yin_yang_itl_out2.png
   :width: 90%
   :align: center
   :class: solution
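The two-stage recipe used above — pretrain with a large learning rate in mock
mode, then fine-tune for a few epochs with a smaller rate in the noisy loop —
can be sketched with a pure-NumPy stand-in. Everything here is an assumption
for illustration only: the toy data, the single-logit readout and the noise
level are invented, and none of it is the hxtorch API.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy linearly separable two-class data (stand-in for the Yin-Yang task)
X = rng.normal(size=(512, 2))
ylab = (X[:, 0] + X[:, 1] > 0).astype(int)

def forward(X, W, b, noise_std=0.0):
    # "hardware" forward pass: linear layer plus Gaussian output noise
    z = X @ W + b
    if noise_std > 0.0:
        z = z + rng.normal(0.0, noise_std, size=z.shape)
    return z

def train_stage(W, b, lr, gamma, epochs, noise_std):
    for _ in range(epochs):
        z = forward(X, W, b, noise_std)     # noisy forward ("hardware")
        p = 1.0 / (1.0 + np.exp(-z[:, 0]))  # sigmoid readout
        g = (p - ylab) / len(X)             # gradient of BCE w.r.t. the logit
        # backward uses the linear model of the operation (Jacobian is X),
        # mirroring hardware-in-the-loop training
        W[:, 0] -= lr * X.T @ g
        b[0] -= lr * g.sum()
        lr *= gamma                         # scheduler step per epoch
    return W, b

W, b = np.zeros((2, 1)), np.zeros(1)
W, b = train_stage(W, b, lr=1.0, gamma=0.99, epochs=200, noise_std=0.0)  # "mock"
W, b = train_stage(W, b, lr=0.2, gamma=0.9, epochs=15, noise_std=0.5)    # "hardware"
acc = ((forward(X, W, b)[:, 0] > 0).astype(int) == ylab).mean()
print(f"accuracy: {acc:.2f}")
```

The point of the sketch is the schedule, not the model: the cheap noiseless
stage does the bulk of the optimization, and the short noisy stage only has to
adapt the already-good weights, which is why far fewer epochs and a smaller
learning rate suffice on hardware.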