HEM Data Support: Helpful Hints

Automatically Removing A Bad Data Point Using Snap-Master’s Analysis Element

When acquiring “real world” data, we usually acquire noise along with the signal we are interested in. In some cases, the noise takes the form of an undesirable data “glitch” that we want to remove before performing any further analysis on the signal. Some data functions (such as differentiation) can produce skewed results when a transient spike occurs, so we want to give the function the cleanest signal possible.

Removing a bad data point is a straightforward task using some simple functions found in Snap-Master’s Analysis element (version 3.0 and later). The advantage of using the standard Analysis element is that you are not limited to the exact algorithms listed in this application note: you can expand them however you like using all of the power available in Snap-Master.

The removal of a bad data point can be broken into three steps:

  • Defining when a data point is “bad”
  • Determining a suitable replacement value for the “bad” point
  • Replacing the “bad” point with the “better” value

Defining A Bad Point

When we view the data, we determine that a specific data point is a “glitch” by comparing the value to the values of the data around it. Visually, the glitch looks like a spike in the data. So we define a bad data point as a point whose difference from the surrounding data points falls outside of what we consider to be acceptable. This explanation is not 100% suitable for all data (such as the square wave we will use for this application note), but the process we will use does not adversely affect the results.

For example, this square wave contains a “glitch” at the bottom end of the first transition from high to low. By our definition, even the transitions from high to low may be considered as a “glitch” because there is a large discontinuity between adjacent points. However, you will see that the results are still quite good for this type of waveform.

Now let’s consider the analysis equations we need to translate our definition into a mathematical expression. Again, we define a bad data point as a point whose difference from the surrounding data points falls outside a specified range. A simple form of this would be to compare the previous point to the current point and see if the difference is outside a certain value.
The Snap-Master Analysis function definition looks like this:

# Run Comments Equation Definition Label Units
1 define checkone(ch,delta)=(abs(ch-z[-1](ch))>abs(delta))

The function is called checkone because we are checking the current point against one data point. Its arguments are the input channel ch and delta, the maximum acceptable difference between the values. Inside the function, we subtract the previous value of the input channel (obtained with the Time Shift, or Z-Transform, function) from the current value, and perform a logical comparison of the magnitude of that difference to the maximum difference we entered for the function. The result of the checkone function is 0 (false) if the difference between the points is within our delta, and 1 (true) if the difference is too large.
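For readers who want to try the logic outside Snap-Master, here is a minimal Python sketch of the same comparison. It is not Snap-Master code. The shift helper is a hypothetical stand-in for z[k](ch), and holding the edge sample at the start and end of the data is an assumption, since this note does not say how Snap-Master treats those points. The sample values are made up, loosely modeled on the square wave described above.

```python
def shift(ch, k):
    """Hypothetical stand-in for z[k](ch): sample i of the result is ch[i + k].

    Assumption: indices past either end reuse the nearest edge sample.
    """
    last = len(ch) - 1
    return [ch[min(max(i + k, 0), last)] for i in range(len(ch))]


def checkone(ch, delta):
    """Return 1 where a sample differs from the previous one by more than |delta|."""
    prev = shift(ch, -1)
    return [int(abs(x - p) > abs(delta)) for x, p in zip(ch, prev)]


# Made-up square wave: 13 V transitions, with a glitch 7 V too low at index 3.
signal = [6.5, 6.5, 6.5, -13.5, -6.5, -6.5, -6.5, 6.5, 6.5]
flags = checkone(signal, 1)
print(flags)  # [0, 0, 0, 1, 1, 0, 0, 1, 0]
```

In this sketch the glitch (index 3), the sample right after it (index 4), and the valid low-to-high transition (index 7) are all flagged, because each differs from its predecessor by more than delta.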

A more robust method of determining a bad data point would be to compare it against the data points on either side of the “glitch.” The function checktwo uses the same type of logic as checkone, but we are also looking at the value after the current value (using a time shift of 1). When both conditions are satisfied, the output of checktwo function is 1.

The function looks like this (the equation should be written on a single line in the Analysis element):

# Run Comments Equation Definition Label Units
2 define checktwo(ch,delta)=((abs(ch-z[-1](ch))>abs(delta)) and (abs(ch-z[1](ch))>abs(delta)))
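The two-sided check can be sketched in Python in the same way. As before, shift is a hypothetical helper standing in for z[k](ch), the edge handling is an assumption, and the data values are invented for illustration.

```python
def shift(ch, k):
    """Hypothetical stand-in for z[k](ch); edge samples are held (an assumption)."""
    last = len(ch) - 1
    return [ch[min(max(i + k, 0), last)] for i in range(len(ch))]


def checktwo(ch, delta):
    """Return 1 where a sample differs from BOTH neighbors by more than |delta|."""
    prev = shift(ch, -1)
    nxt = shift(ch, 1)
    limit = abs(delta)
    return [
        int(abs(x - p) > limit and abs(x - n) > limit)
        for x, p, n in zip(ch, prev, nxt)
    ]


# Made-up square wave: 13 V transitions, with a glitch 7 V too low at index 3.
signal = [6.5, 6.5, 6.5, -13.5, -6.5, -6.5, -6.5, 6.5, 6.5]
flags = checktwo(signal, 1)
print(flags)  # [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

Only the glitch itself is flagged here. A valid transition differs from one neighbor but matches the other, so it fails the second condition.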

Calculating Suitable Replacement Values

Once we have found a bad data point, what do we replace it with? The simplest option is to replace it with an adjacent data point so the data maintains good continuity. Again using the Z-Transform, the prevpt and nextpt functions get the values of the previous data point and the next data point, respectively.

# Run Comments Equation Definition Label Units
3 define prevpt(ch)=(z[-1](ch))
4 define nextpt(ch)=(z[1](ch))

A more advanced method is to average the data on either side of the “glitch” and use that for the replacement data value. Effectively, we are writing a miniature smoother for the bad data point. The easiest approach is to average one point on either side of the bad point, which is shown below in the avgadjonept function definition. One step further would be to average two points on either side of the bad point, which is shown in the avgadjtwopts function.

# Run Comments Equation Definition Label Units
5 define avgadjonept(ch)=((z[-1](ch)+z[1](ch))/2)
6 define avgadjtwopts(ch)=((z[-2](ch)+z[-1](ch)+z[1](ch)+z[2](ch))/4)
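The four replacement functions can be compared side by side with a short Python sketch. This is only an illustration of the arithmetic, not Snap-Master code: shift is a hypothetical stand-in for z[k](ch), edge samples are held by assumption, and the data is invented so that the two averages differ.

```python
def shift(ch, k):
    """Hypothetical stand-in for z[k](ch); edge samples are held (an assumption)."""
    last = len(ch) - 1
    return [ch[min(max(i + k, 0), last)] for i in range(len(ch))]


def prevpt(ch):
    return shift(ch, -1)


def nextpt(ch):
    return shift(ch, 1)


def avgadjonept(ch):
    """Average of one point on either side of each sample."""
    return [(p + n) / 2 for p, n in zip(shift(ch, -1), shift(ch, 1))]


def avgadjtwopts(ch):
    """Average of two points on either side of each sample."""
    columns = zip(shift(ch, -2), shift(ch, -1), shift(ch, 1), shift(ch, 2))
    return [sum(col) / 4 for col in columns]


# Made-up rising signal with a spike at index 3.
signal = [0.0, 2.0, 3.0, 10.0, 5.0, 8.0, 9.0]
i = 3
print(prevpt(signal)[i])        # 3.0
print(nextpt(signal)[i])        # 5.0
print(avgadjonept(signal)[i])   # 4.0
print(avgadjtwopts(signal)[i])  # 4.5
```

Note that none of the replacement candidates uses the bad sample itself, so the spike value does not contaminate the result.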

These are not the only possibilities for calculating a replacement value. For example, you may want to replace the data with a fixed value such as 0. Or you may want to replace it with the running average for the entire waveform using the ravg function. Feel free to experiment to find the best replacement values for your particular data.

Replacing Bad Data Points

Now that we know when the bad data point occurs and what to replace it with, all that is left is the actual mechanism to perform the replacement. In general terms, we are saying that if the comparison falls outside a specific range, replace the bad point. Otherwise, keep the original data. This is a natural fit for the if operator in Snap-Master’s Analysis element.

Let’s write an example equation and look at the results. To start, assume we are using the one point comparison (where the current value is compared against the previous value). For our replacement value, we want to smooth the data around the bad point, so we will try the avgadjonept function. Now we need to select an appropriate value for the acceptable amount of change.

Looking back at our original data, we see that the change between valid transitions is approximately 13 volts. The acceptable noise looks like it is much less than 1 volt, but the “glitch” looks like it is about 7 volts below where it should be. These parameters make a delta value of 1 a good choice because it will pass the acceptable noise but catch the glitch we want to remove.

The Analysis equation looks like this (the original data channel is A0, and our result channel is R0):

# Run Comments Equation Definition Label Units
7 R0=if(checkone(A0,1),avgadjonept(A0),A0)

Spelled out, the equation reads as follows: if the current point of channel A0 differs from the previous point by more than 1, replace the value with the average of the previous and next points; otherwise, use the current point. The results of our analysis look like this:

The magnitude of the “glitch” has decreased, but it is not reduced enough to our liking. If we were to use a cursor to look at each data value around the “glitch”, we would see that the actual point where it occurred was replaced with the proper value, but the data point after it was also replaced because using the one point check defined it as a “glitch” as well (the change is greater than our delta).
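The side effect described above can be reproduced with a small Python sketch of the same equation. The helpers shift and if_replace are hypothetical stand-ins for z[k](ch) and the if operator, edge samples are held by assumption, and the data is made up (13 V transitions, a glitch 7 V too low), so the numbers will not match the original plot.

```python
def shift(ch, k):
    """Hypothetical stand-in for z[k](ch); edge samples are held (an assumption)."""
    last = len(ch) - 1
    return [ch[min(max(i + k, 0), last)] for i in range(len(ch))]


def checkone(ch, delta):
    return [abs(x - p) > abs(delta) for x, p in zip(ch, shift(ch, -1))]


def avgadjonept(ch):
    return [(p + n) / 2 for p, n in zip(shift(ch, -1), shift(ch, 1))]


def if_replace(cond, when_true, when_false):
    """Point-by-point equivalent of if(cond, a, b)."""
    return [t if c else f for c, t, f in zip(cond, when_true, when_false)]


# Made-up square wave with a glitch at index 3.
a0 = [6.5, 6.5, 6.5, -13.5, -6.5, -6.5, -6.5, 6.5, 6.5]
r0 = if_replace(checkone(a0, 1), avgadjonept(a0), a0)
print(r0)  # [6.5, 6.5, 6.5, 0.0, -10.0, -6.5, -6.5, 0.0, 6.5]
```

The sample after the glitch (index 4) was a good value of -6.5, but it is flagged too and replaced with the average of the glitch and the following point, giving -10.0. The replacement is always computed from the original channel, so the bad sample still leaks into its neighbor's average.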

To correct this, let’s use the two point comparison method which will take care of the rise after the “glitch” occurs. The only change to our analysis equation is to use the different check function:

# Run Comments Equation Definition Label Units
7 R0=if(checktwo(A0,1),avgadjonept(A0),A0)

The results of this analysis are much better:

We have not completely eliminated the bad data point, but we have replaced it with a much more manageable spike. Our next thought might be to use the avgadjtwopts replacement function, but it does not do much better on this data because the “glitch” occurs directly on a transition boundary.
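A Python sketch of the two point version, under the same assumptions as before (hypothetical shift and if_replace helpers standing in for z[k](ch) and the if operator, held edge samples, invented data), shows both replacement functions applied to a glitch on a transition boundary.

```python
def shift(ch, k):
    """Hypothetical stand-in for z[k](ch); edge samples are held (an assumption)."""
    last = len(ch) - 1
    return [ch[min(max(i + k, 0), last)] for i in range(len(ch))]


def checktwo(ch, delta):
    limit = abs(delta)
    neighbors = zip(ch, shift(ch, -1), shift(ch, 1))
    return [abs(x - p) > limit and abs(x - n) > limit for x, p, n in neighbors]


def avgadjonept(ch):
    return [(p + n) / 2 for p, n in zip(shift(ch, -1), shift(ch, 1))]


def avgadjtwopts(ch):
    columns = zip(shift(ch, -2), shift(ch, -1), shift(ch, 1), shift(ch, 2))
    return [sum(col) / 4 for col in columns]


def if_replace(cond, when_true, when_false):
    """Point-by-point equivalent of if(cond, a, b)."""
    return [t if c else f for c, t, f in zip(cond, when_true, when_false)]


# Made-up square wave with a glitch at index 3, right on the falling edge.
a0 = [6.5, 6.5, 6.5, -13.5, -6.5, -6.5, -6.5, 6.5, 6.5]
bad = checktwo(a0, 1)
r0_one = if_replace(bad, avgadjonept(a0), a0)
r0_two = if_replace(bad, avgadjtwopts(a0), a0)
print(r0_one)  # [6.5, 6.5, 6.5, 0.0, -6.5, -6.5, -6.5, 6.5, 6.5]
print(r0_two)  # [6.5, 6.5, 6.5, 0.0, -6.5, -6.5, -6.5, 6.5, 6.5]
```

In this invented data only the glitch is replaced, and both averaging functions give the same value because the neighbors straddle the transition symmetrically. Real data will differ in detail, but the sketch is consistent with the observation that avgadjtwopts offers little improvement when the glitch sits on a transition.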

Summary

Using some basic comparison functions and the Z-Transform function, it is fairly easy to replace bad data points with appropriate values. This general approach can be used with a wide range of data, and other methods of detecting a bad data point and determining a replacement value are easy to integrate into it.