California Scientific
4011 Seaport Blvd
West Sacramento, CA 95691

BrainMaker Neural Network Software

Is my data ok?

Sometimes your data can be self-contradictory, and a network cannot train on such data. Here's an example:

Suppose you're trying to make a network that recognizes farm animals. One of your facts says "brown and furry = cow" and another fact says "brown and furry = horse". This is hopeless: the inputs are identical, so the network has nothing to tell the two apart. You basically have two choices at this point: you can add another input, such as "chews cud", or you can throw out either the horse fact or the cow fact.

More subtly, you can have two facts where the overall stock market looks about the same numerically (same inputs +/- 5%), but in one fact GM stock goes up $5 and in the other GM stock goes down $5. Again, you need to either add more inputs, such as "consumer debt level", "consumer interest rate", or "unemployment rate", or get rid of one of these facts.

BrainMaker Professional's utility NetChecker will look for contradictory and nearly contradictory facts and flag them for you. If only a few facts are flagged, you need to check the accuracy of your data and/or consider adding an input or two. If most of your facts are contradictory, your data does not appear to contain good predictors and is unlikely to ever train.
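To make the idea of "nearly contradictory" concrete, here is a minimal Python sketch of one way such a check could work. It is not NetChecker's actual method, which isn't described here. The function name find_contradictions, the (inputs, output) fact layout, and the 5% relative tolerance (borrowed from the stock example above) are all assumptions made for illustration.

```python
from itertools import combinations

def find_contradictions(facts, input_tol=0.05):
    """Return index pairs of facts whose inputs nearly match but whose outputs differ.

    facts: list of (inputs, output) pairs, where inputs is a tuple of numbers.
    input_tol: relative tolerance for calling two input values "about the same"
               (hypothetical default of 5%).
    """
    def close(a, b):
        scale = max(abs(a), abs(b))
        return abs(a - b) <= input_tol * scale  # also true when both values are 0

    flagged = []
    for (i, (xi, yi)), (j, (xj, yj)) in combinations(enumerate(facts), 2):
        same_inputs = len(xi) == len(xj) and all(close(a, b) for a, b in zip(xi, xj))
        if same_inputs and yi != yj:
            flagged.append((i, j))
    return flagged

# Made-up facts: two market indicators as inputs, stock direction as output.
facts = [
    ((100.0, 50.0), "up"),
    ((102.0, 51.0), "down"),  # inputs within 5% of fact 0, opposite output
    ((200.0, 10.0), "up"),
]
print(find_contradictions(facts))  # [(0, 1)]
```

This compares every pair of facts, so it gets slow with many thousands of facts; for a sketch that's fine.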

As far as BrainMaker is concerned, there's no such thing as "missing data". If some of your facts are missing some of their fields, BrainMaker will fill in these missing fields with your data minimum. Such facts should almost certainly be removed from your training set. It can, however, be very interesting to keep such facts in your testing set to see how your network responds to such problems.
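If you prepare your facts outside BrainMaker, a short script can pull the incomplete ones out before training. The Python sketch below assumes each fact is a dictionary and that a missing field is stored as None; both are assumptions about your own data layout, not anything BrainMaker requires, and the name split_incomplete is made up for this example.

```python
def split_incomplete(facts):
    """Separate complete facts from facts that have a missing (None) field."""
    complete, incomplete = [], []
    for fact in facts:
        if any(value is None for value in fact.values()):
            incomplete.append(fact)
        else:
            complete.append(fact)
    return complete, incomplete

# Made-up facts for illustration only.
facts = [
    {"age": 42, "pressure": 120, "outcome": "well"},
    {"age": 57, "pressure": None, "outcome": "sick"},  # missing field
]
training, held_out = split_incomplete(facts)
print(len(training), len(held_out))  # 1 1
```

The held_out list is what you might add to your testing set to see how the network copes.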

About 50 facts is more or less the minimum needed to train a typical network. 250 facts is much better. If you have more than a couple thousand facts, this is likely overkill, and a random sample of these facts is probably a good choice. Sometimes you only have 10 facts (this comes up in medical studies - it's frowned on in this country to get people sick just to see what happens). You gotta do what you gotta do, but I'd hate to trust my life to conclusions drawn from 10 cases.
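Taking a random sample of an oversized fact set can be sketched in a few lines of Python. The cutoff of 2000 is just one reading of "a couple thousand", and sample_facts is a name invented for this example. The fixed seed is there only so the same sample comes back each run.

```python
import random

def sample_facts(facts, max_facts=2000, seed=0):
    """Return all facts if there are few enough, else a random sample of max_facts."""
    if len(facts) <= max_facts:
        return list(facts)
    rng = random.Random(seed)  # seeded so the sample is repeatable
    return rng.sample(facts, max_facts)

all_facts = list(range(5000))  # stand-in for 5000 real facts
subset = sample_facts(all_facts)
print(len(subset))  # 2000
```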

Make sure your data represents all the cases. Consider teaching your first-grader the way home from school. If every day you take a different route, but you always say "we're heading home", you're teaching your kid that all ways go home. It's important to go the wrong way about half the time and say "this is the wrong way". If your data is overwhelmingly representative of one case, consider culling many of these records to get to a ratio of maybe 8 to 1, instead of maybe 200 to 1.
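Culling an over-represented case down to a ratio like 8 to 1 can be sketched as follows in Python. The function cull_majority, its label_of argument, and the choice to discard records at random are assumptions for illustration; you may have better reasons to keep some records over others.

```python
import random
from collections import defaultdict

def cull_majority(facts, label_of, max_ratio=8, seed=0):
    """Randomly drop facts so no case outnumbers the rarest case by more than max_ratio."""
    groups = defaultdict(list)
    for fact in facts:
        groups[label_of(fact)].append(fact)
    if not groups:
        return []
    cap = min(len(g) for g in groups.values()) * max_ratio
    rng = random.Random(seed)  # seeded so the result is repeatable
    kept = []
    for group in groups.values():
        kept.extend(group if len(group) <= cap else rng.sample(group, cap))
    return kept

# Made-up data: 200 "normal" records for every 1 "fault" record.
facts = [("normal", i) for i in range(200)] + [("fault", 0)]
kept = cull_majority(facts, label_of=lambda fact: fact[0])
print(len(kept))  # 9
```

Here the 200 to 1 data set comes out as 8 "normal" records plus the single "fault" record.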