# A little summation exercise

In this post, I will derive a closed form for a simple summation I encountered in this Quora post. Basically, we wish to evaluate

$\displaystyle \sum_{x = 1}^n \sum_{y = 0}^{x - 1}(x + y).$

Now, without further ado, let us proceed.

\displaystyle \begin{aligned} f(n) &= \sum_{x = 1}^n \sum_{y = 0}^{x - 1}(x + y) \\ &= \sum_{x = 1}^n \sum_{y = 0}^{x - 1} x + \sum_{x = 1}^n \sum_{y = 0}^{x - 1} y \\ &= \sum_{x = 1}^n x^2 + \sum_{x = 1}^n \frac{x}{2}(x - 1) \\ &= \frac{1}{6}n(n + 1)(2n + 1) + \frac{1}{2} \sum_{x = 1}^n x^2 - \frac{1}{2}\sum_{x = 1}^n x \\ &= \frac{1}{6}n(n + 1)(2n + 1) + \frac{1}{12}n(n + 1)(2n + 1) - \frac{1}{4}n(n + 1) \\ &= \frac{3}{12}n(n + 1)(2n + 1) - \frac{1}{4}n^2 - \frac{1}{4} n \\ &= \frac{3}{12}(n^2 + n)(2n + 1) - \frac{1}{4}n^2 - \frac{1}{4} n \\ &= \frac{3}{12}(2n^3 + 3n^2 + n) - \frac{1}{4}n^2 - \frac{1}{4} n \\ &= \frac{1}{2}n^3 + \frac{3}{4}n^2 + \frac{1}{4}n - \frac{1}{4}n^2 - \frac{1}{4} n \\ &= \frac{1}{2}n^3 + \frac{1}{2}n^2 \\ &= \frac{1}{2}n^2(n + 1). \end{aligned}

Now, plugging $n = 3$ into $f$ gives $f(3) = \frac{1}{2} \cdot 3^2 \cdot (3 + 1) = 18$, as expected.
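As a sanity check, we can compare the closed form against a brute-force evaluation of the double sum (a throwaway snippet, not part of the derivation):

```java
public class SummationCheck {

    public static void main(String[] args) {
        for (int n = 1; n <= 100; ++n) {
            // Brute-force evaluation of the double sum.
            long brute = 0;
            for (int x = 1; x <= n; ++x) {
                for (int y = 0; y <= x - 1; ++y) {
                    brute += x + y;
                }
            }

            // Closed form: n^2 (n + 1) / 2 (always an integer).
            long closed = (long) n * n * (n + 1) / 2;

            if (brute != closed) {
                throw new AssertionError("Mismatch at n = " + n);
            }
        }

        System.out.println("OK");
    }
}
```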

# Simulating a closed particle system

Real-time simulation of particles colliding with each other is relatively easy to implement, yet you will most likely notice that the average speed of the particles grows over time. In this post, I will present one solution to this issue, namely keeping the total energy of the simulated particle system constant. Before I proceed, I have to specify the “natural laws” of our system.

We are given a two-dimensional plane. In it, we create $n$ different particles with masses $m_1, m_2, \dots, m_n$. Given two particles $i$ and $j$ at distance $r_{i,j}$ from each other, they repel each other with a force of magnitude

$\displaystyle F_{i,j} = \gamma \frac{m_i m_j}{r_{i,j}^2}.$

Since motion is essential in this setting, we consider each particle having a kinetic energy of magnitude

$\displaystyle E_i^k = \frac{1}{2} m_i v_i^2,$

where $v_i$ is the speed of particle $i$. Finally, a potential energy $E_{i,j}^p$ of two given particles $i,j$ is

\displaystyle \begin{aligned} \int_{\infty}^{d} -\gamma \frac{m_i m_j}{r^2} \, \mathrm{d}r &= \Bigg[ \gamma \frac{m_i m_j}{r} \Bigg]_{r = \infty}^{r = d} \\ &= \gamma\frac{m_i m_j}{d} - \Bigg( \lim_{r \to \infty} \gamma \frac{m_i m_j}{r} \Bigg) \\ &= \gamma \frac{m_i m_j}{d}, \end{aligned}

where $d$ is the distance between $i$ and $j$. As for the total energy of the system, it is defined as follows:

$\displaystyle E_{\text{total}} = \sum_{i = 1}^n E_i^k + \sum_{i = 1}^n \sum_{j = i + 1}^n E_{i,j}^p.$
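As a sketch of the energy bookkeeping, here is how $E_{\text{total}}$ could be computed; the constant `GAMMA`, the masses, the speeds and the pairwise distances below are all made-up example values, not anything mandated by the model:

```java
public class TotalEnergy {

    static final double GAMMA = 1.0;           // made-up repulsion constant
    static final double[] M = {1.0, 2.0, 3.0}; // masses
    static final double[] V = {0.5, 1.0, 0.25}; // speeds
    static final double[][] D = {              // pairwise distances
        {0.0, 1.0, 2.0},
        {1.0, 0.0, 1.5},
        {2.0, 1.5, 0.0},
    };

    public static void main(String[] args) {
        int n = M.length;
        double total = 0.0;

        for (int i = 0; i < n; ++i) {
            // Kinetic energy of particle i.
            total += 0.5 * M[i] * V[i] * V[i];

            // Potential energy of each unordered pair {i, j}.
            for (int j = i + 1; j < n; ++j) {
                total += GAMMA * M[i] * M[j] / D[i][j];
            }
        }

        System.out.println(total);
    }
}
```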

### Speed normalization

Suppose the total energy at an instant $t$ is $E_{\text{total}}$, and the total energy at an instant $t + 1$ is $E'_{\text{total}}$. Now, the difference of the two is $\Delta E = E_{\text{total}} - E'_{\text{total}}$. We wish to compute a constant $c$ such that

$\displaystyle \sum_{i = 1}^n \frac{1}{2} m_i (c v_i)^2 = \sum_{i = 1}^n \frac{1}{2} m_i v_i^2 + \Delta E,$

where $cv_i$ is the normalized speed of the particle $i$ at the time instant $t + 1$ and $v_i$ is the speed of the particle $i$ at the time instant $t$. Finally, we can calculate the value of such a $c$:

\displaystyle \begin{aligned} \sum_{i = 1}^n \frac{1}{2} m_i (c v_i)^2 &= \sum_{i = 1}^n \frac{1}{2} m_i v_i^2 + \Delta E \\ c^2 \sum_{i = 1}^n \frac{1}{2} m_i v_i^2 - \sum_{i = 1}^n \frac{1}{2} m_i v_i^2 &= \Delta E \\ (c^2 - 1) \sum_{i = 1}^n \frac{1}{2} m_i v_i^2 &= \Delta E \\ c^2 - 1 &= \frac{\Delta E}{\sum_{i = 1}^n \frac{1}{2} m_i v_i^2} \\ c &= \sqrt{\frac{\Delta E}{\sum_{i = 1}^n \frac{1}{2} m_i v_i^2} + 1}, \end{aligned}

which is sufficient for keeping the total sum of energies a constant throughout the simulation.
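A minimal sketch of applying the correction, with made-up masses and speeds and a hypothetical drift of two energy units; after scaling every speed by $c$, the kinetic energy matches the target exactly (up to floating-point error):

```java
public class SpeedNormalization {

    public static void main(String[] args) {
        // Made-up example state: masses and current speeds.
        double[] m = {1.0, 2.0, 4.0};
        double[] v = {3.0, 1.0, 0.5};

        // Suppose numerical drift added 2.0 units of kinetic energy, so
        // deltaE = E_total - E'_total = -2.0 must be given back.
        double deltaE = -2.0;

        double kinetic = 0.0;
        for (int i = 0; i < m.length; ++i) {
            kinetic += 0.5 * m[i] * v[i] * v[i];
        }

        double c = Math.sqrt(deltaE / kinetic + 1.0);

        // Scale every speed by c; kinetic energy becomes kinetic + deltaE.
        double corrected = 0.0;
        for (int i = 0; i < m.length; ++i) {
            v[i] *= c;
            corrected += 0.5 * m[i] * v[i] * v[i];
        }

        System.out.println(corrected - (kinetic + deltaE)); // ~0.0
    }
}
```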

# Convolution is fun!

Given two functions $f$ and $g$, their convolution is

$\displaystyle (f*g)(t) = \int_{-\infty}^{\infty} f(\tau) g(t - \tau) \, \mathrm{d} \tau.$

Let us choose $f(x) = x$ and $g(x) = \frac{1}{1 + x^2}$. Now we have that

$\displaystyle (f * g)(t) = \int_{-\infty}^{\infty} \frac{\tau}{1 + (t - \tau)^2} \, \mathrm{d}\tau.$

Let us set $x = t - \tau$, which implies that $\tau = t - x$, $\frac{\mathrm{d}x}{\mathrm{d}\tau} = -1$, and so $\mathrm{d}\tau = -\mathrm{d}x$. Also, as $\tau \to \infty$, $x \to -\infty$ and vice versa. Finally, we can integrate:

\displaystyle \begin{aligned} (f*g)(t) &= \int_{\infty}^{-\infty} - \frac{t - x}{1 + x^2} \, \mathrm{d}x \\ &= \int_{-\infty}^{\infty} \frac{t - x}{1 + x^2} \, \mathrm{d}x \\ &= \int_{-\infty}^{\infty} \frac{t}{1 + x^2} - \frac{x}{1 + x^2} \, \mathrm{d}x \\ &= \Bigg[ t \arctan x \Bigg]_{x = -\infty}^{x = \infty} - \Bigg[ \frac{1}{2} \ln |x^2 + 1| \Bigg]_{x = -\infty}^{x = \infty} \\ &= t \Bigg[ \frac{\pi}{2} - \Bigg( - \frac{\pi}{2} \Bigg) \Bigg] - \lim_{x \to \infty} \Bigg[ \frac{1}{2} \ln |x^2 + 1| - \frac{1}{2} \ln |(-x)^2 + 1| \Bigg] \\ &= \pi t - \lim_{x \to \infty} [0] \\ &= \pi t. \end{aligned}

Now, what happens if we compute

$\displaystyle (g * f)(t) = \int_{-\infty}^{\infty} f(t - \tau) g(\tau) \, \mathrm{d}\tau$

instead? Let us take a look:

\displaystyle \begin{aligned} (g*f)(t) &= \int_{-\infty}^{\infty} \frac{t - \tau}{1 + \tau^2} \, \mathrm{d} \tau \\ &= \int_{-\infty}^{\infty} \frac{t}{1 + \tau^2} - \frac{\tau}{1 + \tau^2} \, \mathrm{d} \tau \\ &= \Bigg[ t \arctan \tau \Bigg]_{\tau = -\infty}^{\tau = \infty} - \Bigg[ \frac{1}{2} \ln |1 + \tau^2| \Bigg]_{\tau = -\infty}^{\tau = \infty} \\ &= t \Bigg[ \frac{\pi}{2} - \Bigg( - \frac{\pi}{2} \Bigg) \Bigg] - \lim_{\tau \to \infty} \Bigg[ \frac{1}{2} \ln |1 + \tau^2| - \frac{1}{2} \ln |1 + (-\tau)^2|\Bigg] \\ &= \pi t - \lim_{\tau \to \infty} [0] \\ &= \pi t. \end{aligned}

This is not a coincidence, since convolution commutes. To see that, set $x = t - \tau$, which implies $\tau = t - x, \mathrm{d}\tau = -\mathrm{d}x$. Note also that as $\tau \to \infty$, $x \to -\infty$. Now,

\displaystyle \begin{aligned} (f*g)(t) &= \int_{\infty}^{-\infty} -f(t-x) g(x) \, \mathrm{d}x \\ &= \int_{-\infty}^{\infty} f(t-x) g(x) \, \mathrm{d} x, \end{aligned}

which is $(g*f)(t)$.
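We can also check the value $\pi t$ numerically. Note that the integrand $\tau/(1 + (t - \tau)^2)$ behaves like $1/\tau$ for large $|\tau|$, so the improper integral converges only when the limits are taken symmetrically (as a principal value); the truncated interval below respects that symmetry:

```java
public class ConvolutionCheck {

    // Midpoint-rule approximation of f(tau) g(t - tau) over [-r, r],
    // with f(x) = x and g(x) = 1 / (1 + x^2).
    static double convolve(double t, double r, int steps) {
        double h = 2.0 * r / steps;
        double sum = 0.0;

        for (int i = 0; i < steps; ++i) {
            double tau = -r + (i + 0.5) * h;
            sum += tau / (1.0 + (t - tau) * (t - tau)) * h;
        }

        return sum;
    }

    public static void main(String[] args) {
        double t = 2.0;
        double approx = convolve(t, 1000.0, 2_000_000);
        System.out.println(approx);   // close to pi * t
        System.out.println(Math.PI * t);
    }
}
```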

However, there exist pairs of functions for which convolution does not converge. Let us demonstrate an example. We will reuse $g$, yet redefine $f$ to be $f(x) = x^2$. Here we go:

\displaystyle \begin{aligned} (f*g)(t) &= \int_{-\infty}^{\infty} f(\tau)g(t - \tau) \, \mathrm{d} \tau \\ &= \int_{-\infty}^{\infty} \frac{\tau^2}{1 + (t - \tau)^2} \, \mathrm{d} \tau. \end{aligned}

Setting $t - \tau = x$, we have that $\tau = t - x$, $\frac{\mathrm{d}\tau}{\mathrm{d}x} = -1$, and so $\mathrm{d}\tau = -\mathrm{d}x$. Once again, as $\tau \to \infty$, $x \to -\infty$, and vice versa. Finally

\displaystyle \begin{aligned} \int_{\infty}^{-\infty} - \frac{(t - x)^2}{1 + x^2} \, \mathrm{d}x &= \int_{-\infty}^{\infty} \frac{(t - x)^2}{1 + x^2} \, \mathrm{d}x \\ &= \int_{-\infty}^{\infty} \frac{t^2 - 2tx + x^2}{1 + x^2} \, \mathrm{d}x \\ &= \int_{-\infty}^{\infty} \frac{t^2}{ 1 + x^2 } \, \mathrm{d}x - \int_{-\infty}^{\infty} \frac{ 2tx }{ 1 + x^2 } \, \mathrm{d}x + \int_{-\infty}^{\infty} \frac{x^2 }{1 + x^2 } \, \mathrm{d}x \\ &= \pi t^2 - \Bigg[ t \ln |1 + x^2| \Bigg]_{x = -\infty}^{x = \infty} + \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2} \, \mathrm{d}x \\ &= \pi t^2 - \lim_{x \to \infty} \Bigg[ t \ln |1 + x^2| - t \ln |1 + (-x)^2| \Bigg] + \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2} \, \mathrm{d}x \\ &= \pi t^2 - \lim_{x \to \infty} [0] + \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2} \, \mathrm{d}x \\ &= \pi t^2 + \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2} \, \mathrm{d}x, \end{aligned}

which does not converge. In order to prove divergence, first observe that $\frac{x^2}{1+x^2} \in [0, 1)$. Next, let us choose $\varepsilon \in (0, 1)$. Now,

\displaystyle \begin{aligned} \frac{x^2}{1 + x^2} &> \varepsilon & \Longleftrightarrow \\ x^2 &> \varepsilon + \varepsilon x^2 & \Longleftrightarrow \\ (1 - \varepsilon) x^2 &> \varepsilon & \Longleftrightarrow \\ x^2 &> \frac{\varepsilon}{1 - \varepsilon} & \Longleftrightarrow \\ |x| &> \sqrt{\frac{\varepsilon}{1 - \varepsilon}} = \delta.& \end{aligned}

Since such $\delta > 0$ and $\varepsilon$ exist, $\int_{-\infty}^{\infty} \frac{x^2}{1 + x^2} \, \mathrm{d}x$ diverges by Lemma 1.

**Lemma 1.** Let $f \colon \mathbb{R} \to \mathbb{R}$ be a continuous function. If there exist $a \in \mathbb{R}$ and $\varepsilon > 0$ such that $f(x) > \varepsilon$ when $x < a$, and there exist $b > a$ and $\varepsilon_b \geq 0$ such that $f(x) \geq \varepsilon_b$ when $x > b$, then $\int_{-\infty}^{\infty} f(x) \, \mathrm{d}x = \infty$.

**Proof.**

Let $C = \int_a^b f(x) \, \mathrm{d}x$. Since $f$ is continuous, $C$ is finite. Now

$\displaystyle \int_{-\infty}^{\infty} f(x) \, \mathrm{d}x = \int_{-\infty}^a f(x) \, \mathrm{d}x + C + \int_b^{\infty} f(x) \, \mathrm{d}x.$

Since $f(x) \geq \varepsilon_b \geq 0$ when $x > b$, we have $\int_b^{\infty} f(x) \, \mathrm{d}x \geq 0$. Finally,

$\displaystyle \int_{-\infty}^a f(x) \, \mathrm{d}x \geq \lim_{c \to -\infty} \varepsilon(a - c) = \infty.$
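A quick numeric spot check of the $\varepsilon$–$\delta$ bound above (with $\varepsilon = 0.5$, the formula gives $\delta = 1$):

```java
public class TailBoundCheck {

    public static void main(String[] args) {
        // For epsilon = 0.5 the bound gives delta = sqrt(0.5 / 0.5) = 1.
        double eps = 0.5;
        double delta = Math.sqrt(eps / (1.0 - eps));

        // Each sample point has |x| > delta, so each must satisfy
        // x^2 / (1 + x^2) > eps.
        double[] xs = {1.001, -1.5, 10.0, -1000.0};

        for (double x : xs) {
            double value = x * x / (1.0 + x * x);

            if (Math.abs(x) <= delta || value <= eps) {
                throw new AssertionError("Bound violated at x = " + x);
            }
        }

        System.out.println("OK");
    }
}
```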

# An efficient probability distribution data structure

A probability distribution is a list of nodes, each holding an object that may be sampled along with its positive weight. Whenever we ask the data structure for an object (sample the distribution), it returns an object with probability proportional to that object's weight. For example, if the list is $\langle (A, 1.0), (B, 1.0), (C, 3.0) \rangle$, sampling returns $A$ or $B$ with a 20% chance each, and $C$ with a 60% chance.
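To make the weight-to-probability mapping concrete, here is a throwaway computation for the example list above:

```java
public class WeightsToProbabilities {

    public static void main(String[] args) {
        // The example distribution <(A, 1.0), (B, 1.0), (C, 3.0)>.
        String[] elements = {"A", "B", "C"};
        double[] weights  = {1.0, 1.0, 3.0};

        double totalWeight = 0.0;
        for (double w : weights) {
            totalWeight += w;
        }

        // P(element) = weight / total weight.
        for (int i = 0; i < elements.length; ++i) {
            System.out.printf("P(%s) = %.0f%%%n",
                              elements[i],
                              100.0 * weights[i] / totalWeight);
        }
    }
}
```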

One obvious data structure is an array-based list (such as ArrayList in Java, or vector in C++). Both list types add a new object/weight pair in amortized constant time, yet sampling and removing an element run in $\Theta(n)$ worst-case time.

In this post I will explain how to implement a data structure that supports all three fundamental operations (insertion, sampling, removal) in logarithmic time in all cases. I will also present three other data structures for the sake of comparison.

#### The application programming interface

Clearly, in our probability distribution data structures, we need at least three methods:

• addElement,
• sampleElement,
• removeElement.

Now, let us proceed to defining an abstract base class:

net.coderodde.stat.AbstractProbabilityDistribution.java
package net.coderodde.stat;

import java.util.Objects;
import java.util.Random;

public abstract class AbstractProbabilityDistribution<E> {

protected double totalWeight;
protected final Random random;

protected AbstractProbabilityDistribution() {
this(new Random());
}

protected AbstractProbabilityDistribution(Random random) {
this.random =
Objects.requireNonNull(random,
"The random number generator is null.");
}

public abstract boolean isEmpty();
public abstract int size();
public abstract boolean addElement(E element, double weight);
public abstract E sampleElement();
public abstract boolean contains(E element);
public abstract boolean removeElement(E element);
public abstract void clear();

protected void checkWeight(double weight) {
if (Double.isNaN(weight)) {
throw new IllegalArgumentException("The element weight is NaN.");
}

if (weight <= 0.0) {
throw new IllegalArgumentException(
"The element weight must be positive. Received " + weight);
}

if (Double.isInfinite(weight)) {
// Once here, 'weight' is positive infinity.
throw new IllegalArgumentException(
"The element weight is infinite.");
}
}

protected void checkNotEmpty(int size) {
if (size == 0) {
throw new IllegalStateException(
"This probability distribution is empty.");
}
}
}


#### Trivial implementation: ArrayProbabilityDistribution.java

First, we start with a naïve implementation: the distribution is represented by a vector of entries, each holding the actual element and its (positive) weight. When adding a new entry, it is appended to the end of the underlying vector. When removing an entry, we remove it from the vector and shift the right part of the vector one position to the left. Clearly, addition runs in amortized constant time when using list data structures such as ArrayList or vector, whereas removal runs in worst-case and average-case $\Theta(n)$. Also, in order to sample an element, we may need to visit as many entries as there are in the data structure, which yields worst- and average-case linear time for sampling.

All in all, the trivial implementation looks like this:

net.coderodde.stat.support.ArrayProbabilityDistribution.java
package net.coderodde.stat.support;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import net.coderodde.stat.AbstractProbabilityDistribution;

public class ArrayProbabilityDistribution<E>
extends AbstractProbabilityDistribution<E> {

private static final class Entry<E> {

private final E element;
private double weight;

Entry(E element, double weight) {
this.element = element;
this.weight = weight;
}

E getElement() {
return element;
}

double getWeight() {
return weight;
}

void setWeight(double weight) {
this.weight = weight;
}
}

private final List<Entry<E>> storage = new ArrayList<>();
private final Map<E, Entry<E>> map = new HashMap<>();

public ArrayProbabilityDistribution() {
this(new Random());
}

public ArrayProbabilityDistribution(Random random) {
super(random);
}

@Override
public boolean addElement(E element, double weight) {
checkWeight(weight);
Entry<E> entry = map.get(element);

if (entry != null) {
entry.setWeight(entry.getWeight() + weight);
} else {
entry = new Entry<>(element, weight);
map.put(element, entry);
storage.add(entry);
}

totalWeight += weight;
return true;
}

@Override
public E sampleElement() {
checkNotEmpty();
double value = random.nextDouble() * totalWeight;
int distributionSize = storage.size();

for (int i = 0; i < distributionSize; ++i) {
Entry<E> entry = storage.get(i);
double currentWeight = entry.getWeight();

if (value < currentWeight) {
return entry.getElement();
}

value -= currentWeight;
}

throw new IllegalStateException("Should not get here.");
}

@Override
public boolean removeElement(E element) {
Entry<E> entry = map.remove(element);

if (entry == null) {
return false;
}

totalWeight -= entry.getWeight();
storage.remove(entry);
return true;
}

@Override
public void clear() {
map.clear();
storage.clear();
totalWeight = 0.0;
}

@Override
public boolean isEmpty() {
return storage.isEmpty();
}

@Override
public int size() {
return storage.size();
}

@Override
public boolean contains(E element) {
return map.containsKey(element);
}

protected void checkNotEmpty() {
checkNotEmpty(storage.size());
}

public static void main(String[] args) {
AbstractProbabilityDistribution<Integer> pd =
new ArrayProbabilityDistribution<>();

// Populate the distribution; element i gets weight i + 1.
for (int i = 0; i < 4; ++i) {
pd.addElement(i, i + 1.0);
}

int[] counts = new int[4];

for (int i = 0; i < 1000; ++i) {
Integer myint = pd.sampleElement();
counts[myint]++;
}

System.out.println(Arrays.toString(counts));
}
}


#### Improving the trivial implementation: BinarySearchProbabilityDistribution.java

The next probability distribution extends the one above by adding accumulated weights to each entry. Since the accumulated weights form a non-decreasing sequence as we march from the beginning of the storage list towards its end, and the storage list supports access in $\Theta(1)$, we can apply binary search to the list in order to sample an element.

Basically, the binary search version has the same running times for the insertion and removal operations as the first one (ArrayProbabilityDistribution), yet it improves both the average and worst case running times of sampling to $\Theta(\log n)$. The code follows:

package net.coderodde.stat.support;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import net.coderodde.stat.AbstractProbabilityDistribution;

/**
* This class implements a probability distribution data structure that
* maintains an accumulated sum of weights and thus allows sampling the elements
* in worst-case logarithmic time.
*
* @author Rodion "rodde" Efremov
* @version 1.61 (Sep 30, 2016)
*/
public class BinarySearchProbabilityDistribution<E>
extends AbstractProbabilityDistribution<E> {

/**
* This class implements the actual entry in the distribution.
*
* @param <E> the actual element type.
*/
private static final class Entry<E> {

private final E element;

private double weight;

private double accumulatedWeight;

Entry(E element, double weight, double accumulatedWeight) {
this.element = element;
this.weight = weight;
this.accumulatedWeight = accumulatedWeight;
}

E getElement() {
return element;
}

double getWeight() {
return weight;
}

void setWeight(double weight) {
this.weight = weight;
}

double getAccumulatedWeight() {
return accumulatedWeight;
}

void addAccumulatedWeight(double delta) {
accumulatedWeight += delta;
}
}

/**
* This map maps each element stored in this probability distribution to its
* respective entry.
*/
private final Map<E, Entry<E>> map = new HashMap<>();

/**
* Holds the actual distribution entries.
*/
private final List<Entry<E>> storage = new ArrayList<>();

/**
* Constructs this probability distribution with default random number
* generator.
*/
public BinarySearchProbabilityDistribution() {
this(new Random());
}

/**
* Constructs this probability distribution with given random number
* generator.
*
* @param random the random number generator.
*/
public BinarySearchProbabilityDistribution(Random random) {
super(random);
}

/**
* {@inheritDoc }
*/
@Override
public boolean addElement(E element, double weight) {
checkWeight(weight);
Entry<E> entry = map.get(element);

if (entry == null) {
entry = new Entry<>(element, weight, totalWeight);
map.put(element, entry);
storage.add(entry);
} else {
entry.setWeight(entry.getWeight() + weight);

// The accumulated weights of all entries after 'entry' grow by 'weight'.
for (int i = storage.indexOf(entry) + 1; i < storage.size(); ++i) {
storage.get(i).addAccumulatedWeight(weight);
}
}

totalWeight += weight;
return true;
}

/**
* {@inheritDoc }
*/
@Override
public E sampleElement() {
checkNotEmpty();
double value = totalWeight * random.nextDouble();

int left = 0;
int right = storage.size() - 1;

while (left < right) {
int middle = left + ((right - left) >> 1);
Entry<E> middleEntry = storage.get(middle);
double lowerBound = middleEntry.getAccumulatedWeight();
double upperBound = lowerBound + middleEntry.getWeight();

if (lowerBound <= value && value < upperBound) {
return middleEntry.getElement();
}

if (value < lowerBound) {
right = middle - 1;
} else {
left = middle + 1;
}
}

return storage.get(left).getElement();
}

/**
* {@inheritDoc }
*/
@Override
public boolean contains(E element) {
return map.containsKey(element);
}

/**
* {@inheritDoc }
*/
@Override
public boolean removeElement(E element) {
Entry<E> entry = map.remove(element);

if (entry == null) {
return false;
}

int index = storage.indexOf(entry);
storage.remove(index);

// The accumulated weights of all entries after the removed one shrink.
for (int i = index; i < storage.size(); ++i) {
storage.get(i).addAccumulatedWeight(-entry.getWeight());
}

totalWeight -= entry.getWeight();
return true;
}

/**
* {@inheritDoc }
*/
@Override
public void clear() {
map.clear();
storage.clear();
totalWeight = 0.0;
}

/**
* {@inheritDoc }
*/
@Override
public boolean isEmpty() {
return storage.isEmpty();
}

/**
* {@inheritDoc }
*/
@Override
public int size() {
return storage.size();
}

private void checkNotEmpty() {
checkNotEmpty(storage.size());
}

public static void main(String[] args) {
binarySearchProbabilityDistributionDemo();
}

private static void binarySearchProbabilityDistributionDemo() {
BinarySearchProbabilityDistribution<Integer> pd =
new BinarySearchProbabilityDistribution<>();

// Populate the distribution; element i gets weight i + 1.
for (int i = 0; i < 4; ++i) {
pd.addElement(i, i + 1.0);
}

int[] counts = new int[4];

for (int i = 0; i < 100; ++i) {
Integer myint = pd.sampleElement();
counts[myint]++;
System.out.println(myint);
}

System.out.println(Arrays.toString(counts));
}
}


#### Efficient removal of elements: LinkedListProbabilityDistribution.java

The third data structure improves the removal operation to constant time in all cases, yet runs the sampling operation in linear time in the worst and average cases. The idea is to use a doubly-linked list for storing the entries, together with a hash-table-based map from each element to its node in the linked list. In order to add a new element, we create a new linked list node and put it into the map; both actions run in amortized constant time (a single addition may take $\Theta(n)$ when the hash map has to be expanded). To remove an element, we access its linked list node through the map and unlink it; both actions run in constant time as well. However, in order to sample an element, in both the average and worst cases we need to march through a constant fraction of the list, which is clearly linear time. The implementation follows:

package net.coderodde.stat.support;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import net.coderodde.stat.AbstractProbabilityDistribution;

/**
 * This class implements a probability distribution relying on a linked list.
 * The running times of the main methods are as follows:
 *
 * <table>
 * <tr>
 * <td>Method</td>
 * <td>Complexity</td>
 * </tr>
 * <tr>
 * <td><tt>addElement</tt></td>
 * <td><tt>amortized constant time</tt>,</td>
 * </tr>
 * <tr>
 * <td><tt>sampleElement</tt></td>
 * <td><tt>O(n)</tt>,</td>
 * </tr>
 * <tr>
 * <td><tt>removeElement</tt></td>
 * <td><tt>O(1)</tt>.</td>
 * </tr>
 * </table>
 *
 * This probability distribution class is best used whenever it is modified
 * frequently compared to the number of queries made.
 *
 * @param <E> the actual type of the elements stored in this probability
 *            distribution.
 *
 * @author Rodion "rodde" Efremov
 * @version 1.61 (Sep 30, 2016)
 */
public class LinkedListProbabilityDistribution<E>
extends AbstractProbabilityDistribution<E> {

private static final class LinkedListNode<E> {

private final E element;
private double weight;
private LinkedListNode<E> prev;
private LinkedListNode<E> next;

LinkedListNode(E element, double weight) {
this.element = element;
this.weight  = weight;
}

E getElement() {
return element;
}

double getWeight() {
return weight;
}

void setWeight(double weight) {
this.weight = weight;
}

LinkedListNode<E> getPrev() {
return prev;
}

LinkedListNode<E> getNext() {
return next;
}

void setPrev(LinkedListNode<E> node) {
prev = node;
}

void setNext(LinkedListNode<E> node) {
next = node;
}
}

/**
 * This map maps the elements to their respective linked list nodes.
 */
private final Map<E, LinkedListNode<E>> map = new HashMap<>();

/**
 * Stores the very first linked list node in this probability distribution.
 */
private LinkedListNode<E> head;

/**
 * Stores the very last linked list node in this probability distribution.
 */
private LinkedListNode<E> tail;

/**
 * Constructs a new probability distribution.
 */
public LinkedListProbabilityDistribution() {
super();
}

/**
 * Constructs a new probability distribution using the input random number
 * generator.
 *
 * @param random the random number generator to use.
 */
public LinkedListProbabilityDistribution(Random random) {
super(random);
}

/**
 * {@inheritDoc }
 */
@Override
public boolean addElement(E element, double weight) {
checkWeight(weight);
LinkedListNode<E> node = map.get(element);

if (node == null) {
node = new LinkedListNode<>(element, weight);

if (head == null) {
head = node;
tail = node;
} else {
tail.setNext(node);
node.setPrev(tail);
tail = node;
}

map.put(element, node);
} else {
node.setWeight(node.getWeight() + weight);
}

totalWeight += weight;
return true;
}

/**
 * {@inheritDoc }
 */
@Override
public E sampleElement() {
checkNotEmpty(map.size());
double value = random.nextDouble() * totalWeight;

for (LinkedListNode<E> node = head; node != null; node = node.getNext()) {
if (value < node.getWeight()) {
return node.getElement();
}

value -= node.getWeight();
}

throw new IllegalStateException("Should not get here.");
}

/**
 * {@inheritDoc }
 */
@Override
public boolean contains(E element) {
return map.containsKey(element);
}

/**
 * {@inheritDoc }
 */
@Override
public boolean removeElement(E element) {
LinkedListNode<E> node = map.remove(element);

if (node == null) {
return false;
}

unlink(node);
totalWeight -= node.getWeight();
return true;
}

/**
 * {@inheritDoc }
 */
@Override
public void clear() {
totalWeight = 0.0;
map.clear();
head = null;
tail = null;
}

/**
 * {@inheritDoc }
 */
@Override
public boolean isEmpty() {
return map.isEmpty();
}

/**
 * {@inheritDoc }
 */
@Override
public int size() {
return map.size();
}

private void unlink(LinkedListNode<E> node) {
LinkedListNode<E> left  = node.getPrev();
LinkedListNode<E> right = node.getNext();

if (left != null) {
left.setNext(right);
} else {
head = right;
}

if (right != null) {
right.setPrev(left);
} else {
tail = left;
}
}

public static void main(String[] args) {
LinkedListProbabilityDistribution<Integer> pd =
new LinkedListProbabilityDistribution<>();

// Populate the distribution; element i gets weight i + 1.
for (int i = 0; i < 4; ++i) {
pd.addElement(i, i + 1.0);
}

int[] counts = new int[4];

for (int i = 0; i < 100; ++i) {
Integer myint = pd.sampleElement();
counts[myint]++;
System.out.println(myint);
}

System.out.println(Arrays.toString(counts));
}
}


#### Efficient binary tree based implementation: BinaryTreeProbabilityDistribution.java

Finally, we can discuss the tree-based probability distribution. The main idea is to organize the element entries as the leaves of a balanced binary tree. For that, we define two different categories of nodes:

• leaf nodes,
• relay nodes.

Only leaf nodes store the actual elements. Each relay or leaf node $n$ is augmented with the following fields:

• element: stores the actual element in a leaf node; unused in the relay nodes,
• weight: if the node is a leaf node, stores the weight of element. Otherwise (if the node is a relay node), stores the sum of all weights in all the leaves of the subtree rooted at $n$,
• isRelayNode: set to true only in relay nodes,
• leftChild: the left subtree of $n$, or null if this node is a leaf node,
• rightChild: the right subtree of $n$, or null if this node is a leaf node,
• parent: the parent relay node of this node,
• numberOfLeafNodes: if $n$ is a leaf node, the field is set to one, otherwise it stores the number of leaf nodes in the subtree rooted at $n$.

Next, the invariant is that relay nodes have exactly two child nodes. Any of the two children of a relay node can be relay or leaf nodes.

###### Inserting an element

In order to insert an element into the data structure, we start from the root node. If the root is null (the distribution is empty), we create a leaf node and set it as the root. Otherwise, if the root is a leaf $n$ (only one element in the structure), we create a new relay node $r$, append $n$ and the new leaf as its children, and set $r$ as the root. In case the root node is a relay node, we consult its two children and descend to the one with the smaller numberOfLeafNodes value. That way we keep the entire tree balanced; this is possible because we do not have to maintain any order in the tree, so upon insertion we simply put the new leaf node into the subtree with the fewest leaves. See Figure 1 for a demonstration of inserting some elements into the tree.

Figure 1: Inserting elements

###### Removing an element

In order to delete an element from the binary tree probability distribution, we proceed as follows: let $n$ be the tree node containing the element requested for removal. If $n$ is the root, we simply set the root to null. Otherwise, the parent of $n$ is a relay node $r$ by the data structure invariant. What we do is extract the other child of $r$ (call it $n'$) and replace $r$ with $n'$. Note that even if we keep removing, say, the rightmost leaf node, the entire tree remains balanced (try reading Figure 1 “backwards” for an illustration).

###### Sampling an element

Finally, what comes to sampling, we draw a uniformly random value $c \in [0, \sum_i w_i)$ and descend the tree from the root until we reach the leaf node whose “bucket” contains $c$. Since the depth of the tree is logarithmic in its size, sampling runs in $\Theta(\log n)$ in all cases. The Java implementation follows:

package net.coderodde.stat.support;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import net.coderodde.stat.AbstractProbabilityDistribution;

public class BinaryTreeProbabilityDistribution<E>
extends AbstractProbabilityDistribution<E> {

private static final class Node<E> {

private final E element;
private double weight;
private final boolean isRelayNode;
private Node<E> leftChild;
private Node<E> rightChild;
private Node<E> parent;
private int numberOfLeafNodes;

Node(E element, double weight) {
this.element           = element;
this.weight            = weight;
this.numberOfLeafNodes = 1;
this.isRelayNode       = false;
}

Node(double weight) {
this.element           = null;
this.weight            = weight;
this.numberOfLeafNodes = 1;
this.isRelayNode       = true;
}

@Override
public String toString() {
if (isRelayNode) {
return "[" + String.format("%.3f", getWeight()) + " : "
+ numberOfLeafNodes + "]";
}

return "(" + String.format("%.3f", getWeight()) + " : "
+ element + ")";
}

E getElement() {
return element;
}

double getWeight() {
return weight;
}

void setWeight(double weight) {
this.weight = weight;
}

int getNumberOfLeaves() {
return numberOfLeafNodes;
}

void setNumberOfLeaves(int numberOfLeaves) {
this.numberOfLeafNodes = numberOfLeaves;
}

Node<E> getLeftChild() {
return leftChild;
}

void setLeftChild(Node<E> block) {
this.leftChild = block;
}

Node<E> getRightChild() {
return rightChild;
}

void setRightChild(Node<E> block) {
this.rightChild = block;
}

Node<E> getParent() {
return parent;
}

void setParent(Node<E> block) {
this.parent = block;
}

boolean isRelayNode() {
return isRelayNode;
}
}

private final Map<E, Node<E>> map = new HashMap<>();
private Node<E> root;

public BinaryTreeProbabilityDistribution() {
this(new Random());
}

public BinaryTreeProbabilityDistribution(Random random) {
super(random);
}

@Override
public boolean addElement(E element, double weight) {
checkWeight(weight);
Node<E> node = map.get(element);

if (node == null) {
node = new Node<>(element, weight);
insert(node);
map.put(element, node);
} else {
node.setWeight(node.getWeight() + weight);

// Propagate the weight increase to all ancestor relay nodes.
updateMetadata(node.getParent(), weight, 0);
}

totalWeight += weight;
return true;
}

@Override
public boolean contains(E element) {
return map.containsKey(element);
}

@Override
public E sampleElement() {
checkNotEmpty(map.size());
double value = totalWeight * random.nextDouble();
Node<E> node = root;

while (node.isRelayNode()) {
if (value < node.getLeftChild().getWeight()) {
node = node.getLeftChild();
} else {
value -= node.getLeftChild().getWeight();
node = node.getRightChild();
}
}

return node.getElement();
}

@Override
public boolean removeElement(E element) {
Node<E> node = map.remove(element);

if (node == null) {
return false;
}

delete(node);
totalWeight -= node.getWeight();
return true;
}

@Override
public void clear() {
root = null;
map.clear();
totalWeight = 0.0;
}

@Override
public boolean isEmpty() {
return map.isEmpty();
}

@Override
public int size() {
return map.size();
}

private void bypassLeafNode(Node<E> leafNodeToBypass,
Node<E> newNode) {
Node<E> relayNode = new Node<>(leafNodeToBypass.getWeight());
Node<E> parentOfCurrentNode = leafNodeToBypass.getParent();

relayNode.setLeftChild(leafNodeToBypass);
relayNode.setRightChild(newNode);

leafNodeToBypass.setParent(relayNode);
newNode.setParent(relayNode);

if (parentOfCurrentNode == null) {
root = relayNode;
} else if (parentOfCurrentNode.getLeftChild() == leafNodeToBypass) {
relayNode.setParent(parentOfCurrentNode);
parentOfCurrentNode.setLeftChild(relayNode);
} else {
relayNode.setParent(parentOfCurrentNode);
parentOfCurrentNode.setRightChild(relayNode);
}

updateMetadata(relayNode, newNode.getWeight(), 1);
}

private void insert(Node<E> node) {
if (root == null) {
root = node;
return;
}

Node<E> currentNode = root;

while (currentNode.isRelayNode()) {
if (currentNode.getLeftChild().getNumberOfLeaves() <
currentNode.getRightChild().getNumberOfLeaves()) {
currentNode = currentNode.getLeftChild();
} else {
currentNode = currentNode.getRightChild();
}
}

bypassLeafNode(currentNode, node);
}

private void delete(Node<E> leafToDelete) {
Node<E> relayNode = leafToDelete.getParent();

if (relayNode == null) {
root = null;
return;
}

Node<E> parentOfRelayNode = relayNode.getParent();
Node<E> siblingLeaf = relayNode.getLeftChild() == leafToDelete ?
relayNode.getRightChild() :
relayNode.getLeftChild();

if (parentOfRelayNode == null) {
root = siblingLeaf;
siblingLeaf.setParent(null);
return;
}

if (parentOfRelayNode.getLeftChild() == relayNode) {
parentOfRelayNode.setLeftChild(siblingLeaf);
} else {
parentOfRelayNode.setRightChild(siblingLeaf);
}

siblingLeaf.setParent(parentOfRelayNode);
updateMetadata(parentOfRelayNode, -leafToDelete.getWeight(), -1);
}

private void updateMetadata(Node<E> node,
double weightDelta,
int nodeDelta) {
while (node != null) {
node.setNumberOfLeaves(node.getNumberOfLeaves() + nodeDelta);
node.setWeight(node.getWeight() + weightDelta);
node = node.getParent();
}
}

    // A small demo: elements 0..3 are added with weights 1.0 .. 4.0, so
    // the sample counts should roughly follow the ratio 1 : 2 : 3 : 4.
    // (The weights are chosen only for illustration.)
    public static void main(String[] args) {
        AbstractProbabilityDistribution<Integer> pd =
                new BinaryTreeProbabilityDistribution<>();

        for (int i = 0; i < 4; ++i) {
            pd.addElement(i, i + 1.0);
        }

        int[] counts = new int[4];

        for (int i = 0; i < 1000; ++i) {
            Integer myint = pd.sampleElement();
            counts[myint]++;
        }

        System.out.println(Arrays.toString(counts));
    }
}

ArrayProbabilityDistribution

| Operation | Worst case | Average case | Best case |
| --- | --- | --- | --- |
| addElement | $\Theta(n)$ | $\Theta(1)$ | $\Theta(1)$ |
| sampleElement | $\Theta(n)$ | $\Theta(n)$ | $\Theta(1)$ |
| removeElement | $\Theta(n)$ | $\Theta(n)$ | $\Theta(1)$ |

BinarySearchProbabilityDistribution

| Operation | Worst case | Average case | Best case |
| --- | --- | --- | --- |
| addElement | $\Theta(n)$ | $\Theta(1)$ | $\Theta(1)$ |
| sampleElement | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(1)$ |
| removeElement | $\Theta(n)$ | $\Theta(n)$ | $\Theta(1)$ |

| Operation | Worst case | Average case | Best case |
| --- | --- | --- | --- |
| addElement | $\Theta(n)$ | $\Theta(1)$ | $\Theta(1)$ |
| sampleElement | $\Theta(n)$ | $\Theta(n)$ | $\Theta(1)$ |
| removeElement | $\Theta(1)$ | $\Theta(1)$ | $\Theta(1)$ |

BinaryTreeProbabilityDistribution

| Operation | Worst case | Average case | Best case |
| --- | --- | --- | --- |
| addElement | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| sampleElement | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |
| removeElement | $\Theta(\log n)$ | $\Theta(\log n)$ | $\Theta(\log n)$ |

# Happy physics time

In this post, I consider a bar of infinite length carrying a finite charge, together with a charged particle, and examine the electrostatic pull between the two. (I first derive the force for a bar of finite length $w$, and then let $w$ grow without bound.) The situation looks like this:

Given a point $x$ on the $x$-axis (representing the charged bar), the distance between it and the charged particle $Q_2$ is given by

$\displaystyle d(x) = \sqrt{d^2 + x^2},$

where $d$ is the distance of the charged particle from the charged bar. By Coulomb’s law the scalar force between two charged objects with charges $Q_1$ and $Q_2$ is

$\displaystyle F = \frac{k_{\varepsilon}Q_1 Q_2}{r^2},$

where $k_{\varepsilon}$ is Coulomb’s constant, and $r$ is the distance between the two charges.

We assume that the charged bar has a constant charge density: all segments of equal length carry equal charge. Hence, if the bar has total charge $Q_1$, a segment of length $\Delta x$ carries the charge $Q_1 \Delta x / w$, and we can write

\displaystyle \begin{aligned} F &= \sum \Delta F \\ &= \sum \frac{k_{\varepsilon} Q_1 \frac{\Delta x}{w} Q_2}{d^2(x)} \\ &= \int_{-w/2}^{w/2} \frac{k_{\varepsilon} Q_1 Q_2 \mathrm{d}x}{wd^2(x)} \\ &= \frac{k_{\varepsilon}Q_1 Q_2}{w} \int_{-w/2}^{w/2} \frac{\mathrm{d}x}{x^2 + d^2} \\ &= \frac{k_{\varepsilon}Q_1 Q_2}{w} \Bigg[ \frac{\arctan(x / d)}{d} \Bigg]_{x = -w / 2}^{x = w / 2} \\ &= \frac{k_{\varepsilon}Q_1 Q_2}{dw} \Bigg[ \arctan(x / d) \Bigg]_{x = -w / 2}^{x = w / 2} \\ &= \frac{2k_{\varepsilon}Q_1 Q_2}{dw} \arctan(\frac{w}{2d}) \end{aligned}

Since $d, Q_1, Q_2$ and $k_{\varepsilon}$ are constants, and $\arctan(w/(2d))$ is bounded above by $\pi/2$ while the factor $1/w$ tends to zero, the above expression approaches zero as $w \rightarrow \infty$. In other words, an infinitely long bar carrying a fixed finite charge exerts no net force on the particle.
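To sanity-check the closed form $F = \frac{2k_{\varepsilon}Q_1 Q_2}{dw}\arctan\big(\frac{w}{2d}\big)$, we can compare it numerically against a midpoint Riemann sum of the force contributions $\Delta F$. The sketch below uses arbitrary test values for the charges and distances; only the agreement of the two numbers matters.

```java
// Compares the derived closed form for the bar-particle force with a
// direct midpoint Riemann sum of dF = k Q1 (dx / w) Q2 / (x^2 + d^2).
// The charges and distances below are arbitrary test inputs.
public class BarForceCheck {
    public static void main(String[] args) {
        double k = 8.9875e9;          // Coulomb's constant, N m^2 / C^2
        double q1 = 2e-6, q2 = 3e-6;  // charges in coulombs (arbitrary)
        double d = 0.5, w = 2.0;      // distances in meters (arbitrary)

        // Closed form derived above.
        double closed = 2.0 * k * q1 * q2 / (d * w)
                      * Math.atan(w / (2.0 * d));

        // Midpoint Riemann sum over the bar from -w/2 to w/2.
        int n = 1_000_000;
        double dx = w / n;
        double sum = 0.0;

        for (int i = 0; i < n; i++) {
            double x = -w / 2.0 + (i + 0.5) * dx;
            sum += k * q1 * q2 * dx / (w * (x * x + d * d));
        }

        System.out.printf("closed form: %.9f N%n", closed);
        System.out.printf("Riemann sum: %.9f N%n", sum);
    }
}
```

The two printed values should agree to many decimal places, since the midpoint rule error shrinks quadratically in $\Delta x$.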

# Funky triangle area

In this post, I will deal with an interesting high school math problem. Suppose we are given the following triangle:

Figure 1

We have two parameters:

• $n$, the number of circles touching a side of the triangle,
• $r$, the radius of each circle.

Figure 2

To find the area of the triangle given the parameters $n$ and $r$, we need the length of each side. Basically, we must compute $d$ in Figure 2 as a function of $r$, which is easy since we must have

\displaystyle \begin{aligned} \tan 30^{\circ} &= \frac{r}{d}, \\ d &= \frac{r}{\tan 30^{\circ}} \\ &= \frac{r}{\frac{1}{\sqrt{3}}} \\ &= \sqrt{3}r. \end{aligned}

Next, we can define the length of a side of the triangle as a function of $n$ and $r$:

\displaystyle \begin{aligned} s(n, r) &= 2d + 2r(n - 1) \\ &= 2\sqrt{3}r + 2r(n - 1) \\ &= 2r(n + \sqrt{3} - 1). \end{aligned}

If we set $p(n, r) = 3s(n, r)/2$ (i.e., $p(n, r)$ is the semiperimeter of the triangle), then by Heron’s formula the area of the triangle is

\displaystyle \begin{aligned} A(n, r) &= \sqrt{p(n, r)[p(n, r) - s(n, r)]^3} \\ &= \sqrt{\frac{3}{2}s(n, r)\Bigg[\frac{3}{2}s(n, r) - s(n, r)\Bigg]^3} \\ &= \sqrt{\frac{3}{2}s(n, r)\Bigg[\frac{1}{2}s(n, r)\Bigg]^3} \\ &= \sqrt{\frac{3}{2}s(n, r)\frac{1}{8}s^3(n, r)} \\ &= \sqrt{\frac{3}{16}s^4(n, r)} \\ &= \frac{\sqrt{3}}{4}s^2(n, r) \\ &= \frac{\sqrt{3}}{4} [2r(n + \sqrt{3} - 1)]^2 \\ &= \frac{\sqrt{3}}{4} 4r^2 (n + \sqrt{3} - 1)^2 \\ &= \sqrt{3}r^2 (n + \sqrt{3} - 1)^2. \end{aligned}
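As a quick numerical check, the sketch below evaluates both Heron’s formula and the final closed form $\sqrt{3}r^2(n + \sqrt{3} - 1)^2$ for one arbitrary choice of $n$ and $r$; the two values should agree up to floating-point rounding.

```java
// Verifies the closed form A(n, r) = sqrt(3) r^2 (n + sqrt(3) - 1)^2
// against Heron's formula for the equilateral triangle with side
// s(n, r) = 2 r (n + sqrt(3) - 1). The values of n and r are arbitrary.
public class TriangleAreaCheck {
    public static void main(String[] args) {
        int n = 3;       // circles per side (arbitrary test value)
        double r = 1.5;  // circle radius (arbitrary test value)

        double s = 2.0 * r * (n + Math.sqrt(3.0) - 1.0); // side length
        double p = 1.5 * s;                              // semiperimeter

        // Heron's formula, specialized to the equilateral case.
        double heron = Math.sqrt(p * Math.pow(p - s, 3));

        // The closed form derived above.
        double closed = Math.sqrt(3.0) * r * r
                      * Math.pow(n + Math.sqrt(3.0) - 1.0, 2);

        System.out.printf("Heron:       %.9f%n", heron);
        System.out.printf("closed form: %.9f%n", closed);
    }
}
```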

# A question of mind

There are a couple of aspects of mind that have bothered me for a while. First of all, I suspect that the mind (or soul, or psyche) is matter-agnostic, i.e., its “state” does not depend on the actual particles comprising the brain, but rather on the configuration of those particles. As a thought experiment, consider the following “inductive proof”: suppose you can substitute an atom in a person’s brain with another atom of the same substance without disturbing any other atom in the brain. Will the mind deviate from its original course in response to that atomic intervention? I suppose not, at least not in this idealized setting. Then why not swap a second atom, a third, a fourth, and so on? Eventually, we end up with the same psyche “running on different hardware.” If that works, we may conclude that, ultimately, the psyche is just information. Finally, could we “resurrect” a person by “recording” some of their memories while the person is alive and, after their physical death, bioprinting a new body and “writing” the memories back into the brain?