Can a Machine Learn the Concept of Sine?

Ying Xie
Towards Data Science
Jun 14, 2017 · 6 min read


It is well known that artificial neural networks are good at approximating just about any function. I wonder whether they can go one step further and learn the generalized model of a function. For simplicity, let’s try to learn a sine function with just one parameter A, which controls the frequency:

y = sin(A*x)

For us humans, once we understand the sine function, we know how it behaves under any parameter A. If we are presented with a partial sine wave, we can figure out what A should be, and we can extrapolate the wave out to infinity.

Can ML predict a sine wave for a parameter A it hasn’t seen?

Experiment Setup

Let’s do an experiment to find out. We frame the problem as a time-series prediction problem: given some data points from the function sin(A*x), predict its future values. The challenge, of course, is that we want to learn the general concept of sine. We want to be able to predict future values even for a parameter A that our models never saw during training.

We will use Keras and try several different models: a fully connected network, often used to model functions; a CNN, often used in pattern recognition; and an LSTM, often used in sequence modeling such as NLP.

For each model, we will train with parameter A in the range (0.06, 0.12). For testing we will try to predict with A values of 0.033, 0.06, 0.083, and 0.163. This way we can see the performance with two parameters inside the training range and one outside on each end.
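To make the setup concrete, here is one way the training pairs could be generated (the sampling scheme, wave length, and window size below are my assumptions; the post doesn’t spell them out):

import numpy as np

INPUT_COUNT = 40  # number of prior points fed to the model

def make_training_data(num_waves=500, wave_len=200, a_min=0.06, a_max=0.12):
    # Build (window -> next value) pairs from sine waves with a random
    # A in the training range. Sampling details are assumptions.
    xs, ys = [], []
    for _ in range(num_waves):
        A = np.random.uniform(a_min, a_max)
        wave = np.sin(A * np.arange(wave_len))
        for i in range(wave_len - INPUT_COUNT):
            xs.append(wave[i:i + INPUT_COUNT])
            ys.append(wave[i + INPUT_COUNT])
    return np.array(xs), np.array(ys)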

During testing we start with a history of real sin(A*x) values under the correct A. This is the equivalent of giving a human a partial sine wave. When we predict, each future value of y is computed using the earlier predicted values of y. To give an example, suppose we start with 40 real data samples, y[0] … y[39], with y[i] = sin(A*i). We use our model to predict y[40]. Then we use y[1] … y[40], where y[40] is the predicted value, to predict y[41], and so on.

The reason we do this, instead of always using the true sin(A*i) values to predict y[i+1], is to let errors accumulate, which makes weaknesses in our models easier to see.
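In code, that feedback loop might look like the sketch below, written for the fully connected model’s flat input (described next); the rollout function and its names are mine, not the author’s exact code:

def rollout(model, seed, steps):
    # seed: INPUT_COUNT real values of sin(A*x); each new prediction is
    # appended to the history and fed back in, so errors accumulate.
    history = list(seed)
    preds = []
    for _ in range(steps):
        window = np.array(history[-INPUT_COUNT:]).reshape(1, INPUT_COUNT)
        y_next = model.predict(window)[0, 0]
        preds.append(y_next)
        history.append(y_next)
    return preds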

Fully Connected Network

In Keras a fully connected layer is called a Dense layer. We use three Dense layers in our FC network.

from keras import models
from keras.layers import Dense, LeakyReLU

model = models.Sequential()
model.add(Dense(100, input_shape=(INPUT_COUNT,)))  # 40 prior points in
model.add(LeakyReLU(alpha=0.03))
model.add(Dense(100))
model.add(LeakyReLU(alpha=0.03))
model.add(Dense(1))  # one unit out: the next value

The input consists of INPUT_COUNT (defined to be 40) prior data points. The last Dense layer has one unit, because we are predicting the next value given the prior 40 values.
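The post doesn’t show the training call; a plausible setup for this regression task uses mean squared error (the optimizer, epoch count, and batch size here are my choices):

x_train, y_train = make_training_data()
model.compile(optimizer='adam', loss='mse')  # assumed loss/optimizer
model.fit(x_train, y_train, epochs=20, batch_size=32)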

Below are the results. The green dotted line is the prediction. Recall that during training, parameter A was in the range of 0.06 to 0.12.

As we can see, our model handled A values of 0.06 and 0.083 pretty well, but did poorly for 0.033 and 0.163. Basically, once parameter A falls outside the training range, our model can’t handle it.

Note that on the chart our function doesn’t start at 0, because we used the first 40 data points as historical input to the model. All charts are offset by those 40 data points.

CNN

We used Conv1D layers, since our data is one-dimensional.

from keras import models
from keras.layers import Conv1D, Dense, Flatten, LeakyReLU

model = models.Sequential()
model.add(Conv1D(100, 3, strides=1, input_shape=(INPUT_COUNT, 1)))  # size-3 filters
model.add(LeakyReLU(alpha=0.03))
model.add(Conv1D(100, 3, strides=1))
model.add(LeakyReLU(alpha=0.03))
model.add(Flatten())
model.add(Dense(100))
model.add(LeakyReLU(alpha=0.03))
model.add(Dense(1))

For the convolutions we used size-3 filters with a stride of 1. We didn’t use max pooling, because position matters in regression problems.

As with the FC network, the inputs are the 40 prior data points, and the output is the next point on the curve.
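The only wrinkle is shape: Conv1D expects a channel axis, so the same 40-point windows must be reshaped (the training call itself is again my assumption):

# Conv1D expects (samples, timesteps, channels): add a channel axis
x_train_cnn = x_train.reshape(-1, INPUT_COUNT, 1)
model.compile(optimizer='adam', loss='mse')  # assumed setup, as before
model.fit(x_train_cnn, y_train, epochs=20, batch_size=32)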

The result is similar to the fully connected network’s: the CNN also failed to learn the general formula of sine for an arbitrary parameter A.

LSTM

An LSTM network retains memory of the data it has seen in the past, so we feed it data in a different form: one data point at a time, instead of a window of the past 40 points as in the FC and CNN models. As we can see below, the batch_input_shape is (1, 1, 1).

from keras import models
from keras.layers import LSTM, Dense

model = models.Sequential()
# batch_input_shape=(1, 1, 1): one sample, one timestep, one feature
model.add(LSTM(100, batch_input_shape=(1, 1, 1), return_sequences=True, stateful=True))
model.add(LSTM(100, return_sequences=False, stateful=True))
model.add(Dense(1))
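The post doesn’t show the stateful training loop. A minimal sketch of how it might look, feeding one point at a time and resetting state between waves (training_waves and the loop structure are my assumptions):

model.compile(optimizer='adam', loss='mse')  # assumed setup
for wave in training_waves:  # each wave: sin(A*x) for one sampled A
    model.reset_states()     # clear LSTM memory between waves
    for t in range(len(wave) - 1):
        x_t = wave[t:t + 1].reshape(1, 1, 1)  # one point per step
        y_t = wave[t + 1:t + 2].reshape(1, 1)
        model.train_on_batch(x_t, y_t)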

Because we can only feed in one data point at a time (later data points depend on the LSTM internal state built up by earlier ones), we can’t leverage the parallelism of the hardware. As a result, training is really slow, and I didn’t experiment much with the LSTM parameters because of it. Here is the result.

The result is worse than the FC and CNN models. Again, this may be because I didn’t work with it enough. On the other hand, I don’t expect it to do much better: the other models already see enough history, and the data is repeating.

Conclusion

I find this problem interesting, because in life we often need to use historical data to predict the future of a time series. If NN models could generalize the concept of repeating patterns, and predict those patterns even when the frequency changes, they would be much more powerful in our applications.

In our experiments we saw that all the models learned the general shape of the sine function, but failed to generate future data points at a frequency outside the training range.

Is the conclusion here that NN models have a hard time generalizing the concept of sine, or simply that I suck and failed to build a model that can solve this problem? My code is on GitHub:

Please play with it, send me your comments, and do let me know if a better model can solve the problem. Thank you.
