An example of the problem:

Pattern one: [1, 10, 1] -> 1
Pattern two: [10, 10, 10] -> 0

From these extremely simple patterns we can see that a small value followed by a large increase and then a large decrease means a 1; otherwise it's a 0. Using something like a Genetic Algorithm, I can write a system to learn these patterns.

But how do I normalize the numbers, or prevent the Genetic Algorithm from overfitting this data, so that the underlying pattern is learned? For example, if the set [500, 10000, 500] is suddenly encountered, it will not be recognized as a 1.
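(One simple normalization I could imagine, sketched below assuming fixed three-sample windows: divide each window by its maximum, so the learner sees the spike shape rather than the absolute scale. The helper name normalize_window is hypothetical.)

```python
import numpy as np

def normalize_window(window):
    # Hypothetical helper: scale by the window's maximum so that
    # only the shape, not the absolute magnitude, remains.
    w = np.asarray(window, dtype=float)
    return w / w.max()

print(normalize_window([1, 10, 1]))         # [0.1  1.   0.1 ]
print(normalize_window([500, 10000, 500]))  # [0.05 1.   0.05]
```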


This is more of a parametrization problem than an overfitting problem, I would say. You said it yourself in your description of pattern 1: "a large increase and then a large decrease". That is a description of a frequency-domain pattern, because it describes the period of change in the values. Similarly, pattern 0 is a DC pattern (zero frequency). Detecting patterns 0 and 1 is basically the problem of detecting the amplitude of the DC component and of the Nyquist-frequency component, respectively.

Methods for data fitting or other machine learning approaches will never be able to solve a problem that resides in the parametrization of the features that you are trying to detect or examine. If you search for frequency-domain patterns by feeding spatial-domain data to a generic fitting algorithm, then you are not going to get anywhere. Perform an FFT, and you'll get exactly the patterns you are looking for, or more complex frequency-domain patterns if you want.
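A rough sketch of that idea in numpy (for a three-sample window the highest FFT bin plays the role of the Nyquist component, and comparing it against the DC bin also makes the test scale-invariant, which covers your [500, 10000, 500] case; the 0.3 threshold is an arbitrary assumption):

```python
import numpy as np

def classify(window):
    # Amplitudes of the real FFT: bin 0 is the DC (average)
    # component, the last bin is the highest resolvable frequency.
    spectrum = np.abs(np.fft.rfft(window))
    dc, high = spectrum[0], spectrum[-1]
    # Ratio of high-frequency to DC amplitude; the 0.3 threshold
    # is an assumption you would tune on real data.
    return 1 if high / dc > 0.3 else 0

print(classify([1, 10, 1]))         # 1 (strong spike component)
print(classify([10, 10, 10]))       # 0 (pure DC)
print(classify([500, 10000, 500]))  # 1 (same shape, larger scale)
```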

Hmm... so it's a matter of normalizing the data before feeding it into a Genetic Algorithm, then?

I'm not yet as advanced in computing as you, so I'd be grateful if you could give me simpler examples of how the methods you have proposed work.

If what I talked about in my last post is beyond your level, then you have a problem. You must learn to walk before you can run. You should not attempt any data fitting or pattern recognition work if you don't even understand the basics of frequency-domain analysis and Fourier transforms.

This is fundamental to any form of signal analysis or image processing, including pattern detection / recognition from data series.

For example, in image-processing terms, a pattern like -1 1 -1 (or variations of it) is a kernel that could be used for detecting rapid changes in value (edge detection, image sharpening, smoothing, etc.). This kernel matches a pattern of very rapid change in value, which, in frequency-domain terms, sits at the Nyquist frequency. Similarly, the pattern 1 1 1 (or variations of it) is a kernel that detects the average (or underlying constant value) in a signal or image, because averaging the values removes any local changes. If you slide a kernel along the signal, you get spikes of amplitude wherever the pattern matches; this is called cross-correlating the signal.
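A minimal numpy sketch of that, using the kernels above on a signal that embeds the [1, 10, 1] pattern in a flat run (np.correlate performs the sliding dot product):

```python
import numpy as np

signal = np.array([10, 10, 10, 1, 10, 1, 10, 10, 10], dtype=float)

edge_kernel = np.array([-1.0, 1.0, -1.0])  # rapid-change (Nyquist-like) detector
avg_kernel = np.array([1.0, 1.0, 1.0])     # average (DC) detector

# mode='valid' slides the kernel across the signal; a spike in the
# output marks a position where the kernel's pattern matches.
print(np.correlate(signal, edge_kernel, mode='valid'))
# -> [-10.  -1. -19.   8. -19.  -1. -10.]  (spike at the 1,10,1 pattern)
print(np.correlate(signal, avg_kernel, mode='valid'))
# -> [30. 21. 21. 12. 21. 21. 30.]  (largest where the signal is flat)
```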

As for using a Genetic Algorithm, which you seem intent on, there is nothing magical or special about it. A GA is not going to solve any of your problems. People think a GA is a kind of magic pill that solves everything; it isn't. A GA takes just as much care as any other method, if not more, in setting up your problem correctly: how you parametrize things and how you formulate your objectives. And that is where general signal-processing and pattern-recognition knowledge is necessary.
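To make that concrete, here is a toy sketch of what "setting the problem up correctly" can look like: the GA evolves nothing but a single threshold over the frequency-domain feature described earlier, instead of being pointed at the raw values. Every name and hyperparameter here is an assumption for illustration, not a recipe:

```python
import numpy as np

# Training windows and labels built from the examples in this thread.
windows = [[1, 10, 1], [10, 10, 10], [500, 10000, 500], [7, 7, 7]]
labels = np.array([1, 0, 1, 0])

def feature(window):
    # High-frequency amplitude relative to DC, as discussed above.
    spectrum = np.abs(np.fft.rfft(window))
    return spectrum[-1] / spectrum[0]

features = np.array([feature(w) for w in windows])

def fitness(threshold):
    # Accuracy of the one-parameter rule "label 1 if feature > threshold".
    return ((features > threshold).astype(int) == labels).mean()

# A deliberately tiny GA: the genome is a single real number.
rng = np.random.default_rng(0)
population = rng.uniform(0.0, 1.0, size=20)
for generation in range(30):
    scores = np.array([fitness(t) for t in population])
    survivors = population[np.argsort(scores)[-10:]]   # keep the best half
    children = survivors + rng.normal(0.0, 0.05, 10)   # mutated copies
    population = np.concatenate([survivors, children])

best = max(population, key=fitness)
print(best, fitness(best))  # a threshold that separates the two classes
```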
