CURVE SMOOTHING DURING DATA REGRESSION
Often, collected data for analysis contains some degree of spurious noise. Data noise makes analysis difficult at best. MathCAD provides several curve smoothing tools to help 'smooth out' this noise. The tool we will use in this example is 'loess'.
Consider the data shown at the left. (Click on the matrix to expand it and review all 100 pairs of data points.)
<-- Assign columns one and two to variables X & Y respectively
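For readers working outside MathCAD, the column assignment can be sketched in Python as below. The worksheet's actual 100 data pairs are not reproduced here, so the `data` matrix is a synthetic stand-in (a noisy sine, chosen only for illustration), and the names `data`, `X` and `Y` are assumptions that mirror the annotation above.

```python
import math
import random

# ASSUMPTION: synthetic stand-in for the worksheet's 100-row, two-column matrix.
# The real measurements are not available in this text.
rng = random.Random(0)
data = [[0.1 * i, math.sin(0.1 * i) + rng.gauss(0.0, 0.2)] for i in range(100)]

# Column one becomes X and column two becomes Y.
X = [row[0] for row in data]
Y = [row[1] for row in data]
```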
A plot of this data shows the 'noisiness' of the collected data. This plot could represent data collection from a data acquisition system monitoring a control system.
The loess command smooths data by fitting it locally, looking at the points within a specified range or 'neighborhood' around each x value. The size of that neighborhood is set by the parameter span. The smaller the span, the more closely the fitted curve tracks the original data. Widening span makes loess fit a more generalized curve rather than following the data point to point. Below are three examples: loess is applied to the X & Y data with spans of 0.1, 0.25 and 6, and each result is plotted below. Note the use of the interp command, which takes the vector z generated by loess, together with the dataset vectors X & Y, and returns the smoothed value at each xs. This allows the smoothed curve to be plotted without having to determine an explicit equation for it.
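To make the role of span concrete, here is a minimal Python sketch of locally weighted smoothing. It is not MathCAD's implementation: the tricube weighting, the local straight-line fit, the treatment of spans above 1, the function name `loess_at` and the synthetic noisy-sine data are all assumptions made for illustration. The function plays the combined role of loess and interp, returning the smoothed value at any requested x.

```python
import math
import random

def loess_at(x0, xs, ys, span):
    """Smoothed value at x0 from a tricube-weighted local straight-line fit."""
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))       # number of points in the neighborhood
    h = sorted(abs(x - x0) for x in xs)[k - 1]    # distance to the k-th nearest point
    h *= max(span, 1.0) * (1.0 + 1e-9)            # ASSUMPTION: spans above 1 widen the window
    if h == 0.0:
        h = 1.0                                   # every neighbor sits exactly at x0
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        dx = x - x0                               # center on x0 so the intercept is the answer
        u = abs(dx) / h
        if u >= 1.0:
            continue                              # outside the neighborhood
        w = (1.0 - u ** 3) ** 3                   # tricube weight
        sw += w
        swx += w * dx
        swy += w * y
        swxx += w * dx * dx
        swxy += w * dx * y
    denom = sw * swxx - swx ** 2
    if denom <= 1e-12 * sw * swxx:
        return swy / sw                           # degenerate case: fall back to weighted mean
    return (swxx * swy - swx * swxy) / denom      # intercept of the weighted line

# ASSUMPTION: synthetic noisy data standing in for the worksheet's X and Y vectors.
rng = random.Random(1)
X = [0.1 * i for i in range(100)]
Y = [math.sin(x) + rng.gauss(0.0, 0.2) for x in X]

# One smoothed curve per span, evaluated at the original x values.
curves = {span: [loess_at(x, X, Y, span) for x in X] for span in (0.1, 0.25, 6)}
```

With a span of 0.1 each local fit sees about 10 of the 100 points, so the curve follows the noise closely. With a span of 6 every point gets nearly equal weight, and the result approaches a single straight line through all the data.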
Note the plot above. The red plot shows the actual data. The blue plot is a regression using a small value of span (0.1). Note how closely it tracks the data. The magenta plot uses a slightly larger span (0.25). Note the smoothing effect. A span of 6 gives the brown plot. To continue with this example, we will use the magenta plot (span=0.25) and plot it with the original data below. This plot is clearly cyclic and likely follows a sine function. We will use this fact to fit a curve to the data.
<-- Function to fit with its derivatives
<-- guesses for the coefficients
<-- Solve for the coefficients
<-- The equation to fit
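The four annotated steps above (a function with its derivatives, initial guesses, a solve, and the resulting equation) can be sketched in Python as follows. The worksheet's actual model and values are not shown in this text, so the form a*sin(b*x + c), the true and guessed coefficients, and the damped least-squares iteration used here are all assumptions for illustration. MathCAD's own solver may work differently.

```python
import math

def model(x, p):
    """ASSUMED model form: a*sin(b*x + c)."""
    a, b, c = p
    return a * math.sin(b * x + c)

def gradient(x, p):
    """Partial derivatives of the model with respect to a, b and c."""
    a, b, c = p
    s, co = math.sin(b * x + c), math.cos(b * x + c)
    return [s, a * x * co, a * co]

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][k] * z[k] for k in range(i + 1, n))) / M[i][i]
    return z

def sse(xs, ys, p):
    """Sum of squared residuals between the data and the model."""
    return sum((y - model(x, p)) ** 2 for x, y in zip(xs, ys))

def fit(xs, ys, guess, iterations=100):
    """Damped Gauss-Newton (Levenberg-Marquardt style) coefficient search."""
    p, lam, n = list(guess), 1e-3, len(guess)
    for _ in range(iterations):
        JTJ = [[0.0] * n for _ in range(n)]
        JTr = [0.0] * n
        for x, y in zip(xs, ys):
            g, r = gradient(x, p), y - model(x, p)
            for i in range(n):
                JTr[i] += g[i] * r
                for j in range(n):
                    JTJ[i][j] += g[i] * g[j]
        for i in range(n):
            JTJ[i][i] *= 1.0 + lam                # damping on the diagonal
        trial = [pi + si for pi, si in zip(p, solve(JTJ, JTr))]
        if sse(xs, ys, trial) < sse(xs, ys, p):
            p, lam = trial, lam / 10.0            # accept the step, trust the model more
        else:
            lam *= 10.0                           # reject the step, damp harder
    return p

# ASSUMPTION: noise-free stand-in for the smoothed curve, with made-up coefficients.
true_params = (2.0, 1.5, 0.3)
xs = [0.1 * i for i in range(100)]
ys = [model(x, true_params) for x in xs]
fitted = fit(xs, ys, guess=(1.5, 1.45, 0.2))
```

As with any nonlinear fit, the guesses matter: a starting frequency far from the true one can leave the search stuck in a poor local minimum, which is one reason to read the period off the smoothed plot first.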
The above plot shows that the regressed function is probably an acceptable fit to the smoothed data curve, at least within the range of data provided, as long as the phenomenon that generated this data is typically described by a sine function. Note that the sine-like behavior of the phenomenon would have been more difficult to recognize without first smoothing the data.
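One way to put a number on "acceptable" is the root-mean-square residual between the smoothed curve and the fitted function, evaluated at the same x values. The helper below is a hypothetical sketch, not part of the worksheet, and the threshold for an acceptable value depends on the application.

```python
import math

def rms_residual(smoothed, fitted):
    """Root-mean-square difference between two equally long sequences."""
    if len(smoothed) != len(fitted):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((s - f) ** 2 for s, f in zip(smoothed, fitted)) / len(smoothed))
```

A value that is small relative to the amplitude of the smoothed curve suggests the fitted function follows it closely over the sampled range. It says nothing about behavior outside that range.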