RBM Generation

RBM with 3D Volumetric Data

Mar 17, 2017


A Restricted Boltzmann Machine (RBM) is a type of neural network that learns the probability distribution of its input data in an unsupervised manner. Before Generative Adversarial Networks (GANs) were introduced in 2014, RBMs were among the common techniques for stochastic data generation. In this post, we will apply a Bernoulli RBM to 3D volumetric data, with the aim of testing its ability to extract useful latent features, reconstruct the original data, and generate new data samples. The code and detailed configuration are up here.


The volumetric data comes from the 3D ShapeNets dataset from Princeton University. We chose 8000 training instances each for the chair and airplane models. Each instance is a 32*32*32 voxelized chair/airplane model, filled with binary values indicating whether each position is occupied by a voxel.

Fig1. A sample training instance for chair and airplane, respectively.


To train the RBM model, we will use the BernoulliRBM learner from Scikit-Learn. After tuning, the final hyperparameters were chosen as follows:

  • n_components=1000 (1000 neurons in the hidden layer)
  • learning_rate=0.02
  • batch_size=200 (perform a gradient update after every 200 training instances)
  • n_iter=200 (go through the 16000 training instances 200 times)

Note that the dataset was shuffled before training to avoid highly correlated instances within each batch. Training ended up taking more than 16 hours on a 2.4GHz laptop. The pseudo-likelihood started at -18960.92 and gradually converged to around -450, marking a significant reduction in reconstruction error and indicating effective training.
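The setup above can be sketched with Scikit-Learn as follows. The voxel data here is a small random stand-in for the real dataset, and the sanity-fit at the end uses a scaled-down model so it runs in seconds rather than hours:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.utils import shuffle

# Random stand-in for the voxel data: each row is a flattened 32*32*32
# binary grid (the real dataset has 16000 such rows).
rng = np.random.RandomState(0)
X = (rng.rand(100, 32 * 32 * 32) < 0.015).astype(np.float64)
X = shuffle(X, random_state=0)  # decorrelate instances within each batch

rbm = BernoulliRBM(
    n_components=1000,   # hidden-layer size
    learning_rate=0.02,
    batch_size=200,      # gradient update every 200 instances
    n_iter=200,          # 200 full passes over the training set
    random_state=0,
)
# rbm.fit(X)  # the full run took >16 hours on a 2.4GHz laptop

# Quick sanity check with a scaled-down model instead:
small = BernoulliRBM(n_components=16, n_iter=2, random_state=0).fit(X)
print(small.components_.shape)  # (16, 32768): one weight row per hidden unit
```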

Visualize Activated Neurons Given An Input

One way to check the learning effectiveness of the RBM model and interpret its result is to take a peek at the weight parameters associated with the hidden neurons.

To do this, we first transform the original dataset (16000*32768) into its latent representation (16000*1000), where each latent feature is a non-linear combination of the original features/voxels. This transformation maps each original input to the activation probabilities of the hidden neurons, each between 0 and 1. Once we know which hidden neurons are activated for a given instance, we can visualize the weight parameters of those neurons, as shown below:







Fig2. Upper left: an original chair instance; the rest are visualizations of the weight parameters of the corresponding activated hidden neurons.

As shown above, these five hidden neurons were the most activated when the chair was presented, and we can visually inspect why: hidden neurons #105 and #150 seem to detect the back of the chair, while #115 and #283 detect the seat as well as the legs. #175 is a hard one for me; it looks like it also captures the seat, but the overall structure remains inexplicable.
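A minimal sketch of this inspection workflow, using a small quickly-fitted stand-in for the trained model and data:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Small stand-in for the trained model (the post uses 1000 hidden units
# trained on 16000 instances).
rng = np.random.RandomState(0)
X = (rng.rand(50, 32 * 32 * 32) < 0.015).astype(np.float64)
rbm = BernoulliRBM(n_components=20, n_iter=2, random_state=0).fit(X)

# transform() returns the hidden activation probabilities, each in [0, 1].
H = rbm.transform(X)               # latent representation, shape (50, 20)
top = np.argsort(H[0])[::-1][:5]   # five most-activated neurons for instance 0

# A hidden neuron's weight vector reshapes back into a 32^3 volume,
# which can then be rendered as in Fig. 2.
w = rbm.components_[top[0]].reshape(32, 32, 32)
print(H.shape, w.shape)
```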

3D Model Reconstruction

Just as we can recreate a human face from principal components extracted with PCA, we can reconstruct a 3D model from the 1000 latent features of the RBM. We start by feeding an original instance into the visible layer of the RBM network, then calculate the activation probability of each hidden neuron given that input. Once we know which hidden neurons are activated, we estimate the on/off statuses of the original features/voxels, which gives us a reconstructed model. This forward-and-backward process is known as Gibbs sampling; we can run it multiple times until the data distribution in the visible layer converges.
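Scikit-Learn exposes one full visible-to-hidden-to-visible round trip as BernoulliRBM.gibbs(), so the reconstruction loop might look like this (again with a small quickly-fitted stand-in model):

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Small stand-in model and data.
rng = np.random.RandomState(0)
X = (rng.rand(50, 32 * 32 * 32) < 0.015).astype(np.float64)
rbm = BernoulliRBM(n_components=20, n_iter=2, random_state=0).fit(X)

# Each gibbs() call samples the hidden layer from the visible layer,
# then samples the visible layer back from the hidden layer.
v = X[:1].copy()          # start from an original instance
for _ in range(10):       # a few steps let the chain settle
    v = rbm.gibbs(v)

recon = v.reshape(32, 32, 32)  # back to a voxel grid
print(recon.shape)
```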






As we can see, the reconstructed model in the middle comes with a lot of noise: isolated voxel points scattered around the main body of the model. To remove the noise and see what the underlying reconstructed model actually looks like, we apply a 3-dimensional filter that erodes away the noise and then dilates the model back to its original size; the result is illustrated in the rightmost animation. Erosion and dilation are known as morphological operations, mostly seen in image processing tasks to remove noise or fill holes.
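Assuming SciPy is available, the denoising step can be sketched with scipy.ndimage; the noisy volume here is a synthetic stand-in (a solid block plus scattered voxels) rather than a real reconstruction:

```python
import numpy as np
from scipy import ndimage

# Synthetic stand-in for a noisy reconstruction: a solid block plus
# isolated noise voxels scattered around it.
rng = np.random.RandomState(0)
vol = np.zeros((32, 32, 32), dtype=bool)
vol[12:20, 12:20, 12:20] = True
noise = rng.rand(32, 32, 32) < 0.01
vol_noisy = vol | noise

# Erosion removes isolated voxels; dilation grows the surviving
# structure back toward its original size (a morphological opening).
eroded = ndimage.binary_erosion(vol_noisy)
cleaned = ndimage.binary_dilation(eroded)

print(vol_noisy.sum(), cleaned.sum())  # far fewer stray voxels afterward
```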

3D Model Generation

Using the very same idea of Gibbs sampling, we can perform data generation as well. In this case, instead of starting with an actual 3D model, we initialize a 3D object made of random noise points; the generation process is otherwise very similar to the reconstruction above. One thing to note is the density of the initial random noise: ideally, we want it to match the voxel density of the original data instances, which is around 1.2-2%, depending on whether we generate a chair or an airplane.
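The generation loop differs from reconstruction only in its starting point. A sketch, again with a small quickly-fitted stand-in model:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Small stand-in model and data.
rng = np.random.RandomState(0)
X = (rng.rand(50, 32 * 32 * 32) < 0.015).astype(np.float64)
rbm = BernoulliRBM(n_components=20, n_iter=2, random_state=0).fit(X)

# Seed the visible layer with noise whose density matches the data.
density = X.mean()  # fraction of occupied voxels
v = (rng.rand(1, 32 * 32 * 32) < density).astype(np.float64)

for _ in range(50):  # each run of the chain yields a different model
    v = rbm.gibbs(v)

generated = v.reshape(32, 32, 32)
print(generated.shape)
```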

Initial random noise

Generated...couch chair, with a tail?

Generated from random noise, ended up as a couch chair

Generated from random noise, ended up as a rotating office chair

Generated from a human figure, ended up as a bar stool!

The interesting thing is that a newly generated instance doesn't necessarily match any instance in the original data: the RBM captures the regularities that exist in the training models and "hallucinates" a new model from them. And since Gibbs sampling is a stochastic process, we usually arrive at a different model each time it is run.