KNIME tutorial: Random forest machine learning model to predict Kaggle Titanic (part 2)

Random Forest Models: The random forest model is easy to run in KNIME. It is popular because it is easy to implement, adapts to many kinds of problems, and is relatively robust to overfitting compared with a single decision tree. Random forests are a common way for newcomers to get started with machine learning. How do they work? The random forest works […]
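
The excerpt stops just before explaining how a random forest works. As a rough illustration only (written in plain Python rather than built from KNIME nodes, and not how KNIME implements its learner), the sketch below shows the ingredients usually described: each tree is grown on a bootstrap sample of the rows, each split considers only a random subset of the columns, and the forest predicts by majority vote. All function names and default parameters here are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, max_depth, n_features, rng):
    """Grow a tree; every split looks at only `n_features` randomly chosen columns."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority  # leaf: predict the majority class
    features = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, features)
    if split is None:
        return majority  # no split improves purity
    f, t = split
    go_left = [row[f] <= t for row in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [y for y, g in zip(labels, go_left) if g],
                      max_depth - 1, n_features, rng)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(labels, go_left) if not g],
                       max_depth - 1, n_features, rng)
    return (f, t, left, right)


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, max_depth=3, n_features=1, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```

With rows as tuples of numeric features (for example a hypothetical encoding of sex, passenger class and age) and 0/1 survival labels, `fit_forest(rows, labels)` trains the ensemble and `predict_forest(forest, row)` returns the voted class. Restricting each split to a random subset of columns is what separates a random forest from plain bagged trees: the trees end up less alike, so their individual errors are more likely to cancel out in the vote.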

KNIME tutorial: Kaggle Titanic machine learning problem data prep and cleaning (part 1)

KNIME Machine Learning Tutorial: I love helping people learn to use more analytics and data science in their work, whether they are climbing the career ladder, looking to make a switch, or already established in their fields. One thing that has fascinated me for years is how people say they want to […]

Statistics Lie (part 5): Sampling on the dependent variable…or why waking up at 4 am won’t make you successful

Sampling on the dependent variable (selecting cases based on the outcome, such as success, and then looking for what they have in common) is something you see all the time if you read clickbait articles like the crap in Business Insider. These articles typically start with something like “things all successful people do…” and then make claims about waking up early, drinking 3 cups of coffee, and so on. If you are […]
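
A small simulation with made-up probabilities shows why looking only at successful people misleads. In the sketch below, waking early and success are drawn independently, so by construction the habit has no effect, yet most successful people in the sample are early risers simply because most people are. The function name and all parameter values are hypothetical.

```python
import random


def early_riser_shares(n=100_000, p_early=0.6, p_success=0.1, seed=42):
    """Simulate a population in which waking early has NO effect on success.

    Both traits are drawn independently (hypothetical probabilities).
    Returns the share of early risers among successful and unsuccessful people.
    """
    rng = random.Random(seed)
    people = [(rng.random() < p_early, rng.random() < p_success) for _ in range(n)]
    successful = [early for early, success in people if success]
    unsuccessful = [early for early, success in people if not success]
    return sum(successful) / len(successful), sum(unsuccessful) / len(unsuccessful)


share_success, share_rest = early_riser_shares()
print(f"Early risers among the successful:   {share_success:.1%}")
print(f"Early risers among the unsuccessful: {share_rest:.1%}")
```

The first number alone (“most successful people wake up early”) invites a causal story. The second number, which an article that samples on the dependent variable never collects, shows the habit is just as common among everyone else.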

Statistics Lie (part 4): Normal Distribution…or it’s totally normal if your data is not normal

Statistics Lie, It’s Not Normal: Flaws of Assuming a Normal Distribution. The normal distribution is so common that non-statisticians often take it for granted. However, real-world problems often follow when someone assumes their data are “normal” when they are not. How do we recognize and avoid these mistakes? What is the Normal […]
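
One quick way to see the cost of wrongly assuming normality is to fit a normal distribution to skewed data and compare the tail areas it predicts with what the data actually contain. The sketch below uses simulated log-normal data (chosen only because it is skewed) and Python's standard `statistics.NormalDist`; the helper name `tail_check` is hypothetical.

```python
import random
import statistics


def tail_check(data, k=2.0):
    """Compare tail areas predicted by a fitted normal with the observed ones."""
    mu = statistics.fmean(data)
    sd = statistics.stdev(data)
    fitted = statistics.NormalDist(mu, sd)
    lo, hi = mu - k * sd, mu + k * sd
    return {
        "predicted_below": fitted.cdf(lo),
        "observed_below": sum(x < lo for x in data) / len(data),
        "predicted_above": 1 - fitted.cdf(hi),
        "observed_above": sum(x > hi for x in data) / len(data),
    }


rng = random.Random(1)
skewed = [rng.lognormvariate(0, 1) for _ in range(50_000)]  # always positive, long right tail
result = tail_check(skewed)
for name, value in result.items():
    print(f"{name:16s} {value:.2%}")
```

The fitted normal predicts about 2.3% of values more than two standard deviations below the mean, where these data have none (log-normal values cannot be negative), and it understates how often large values appear in the right tail.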

Statistics Lie (part three): Independent events…or how it was iid and not id that helped fuel the financial crisis

Statistics can be used to trick or deceive. Statistics can also “prove” things that are not true at all. One reason this can happen is an assumption referred to as iid, shorthand for independent and identically distributed. Many standard methods of statistical inference depend on it. Assuming events are iid […]
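
The core arithmetic behind the iid problem can be sketched with hypothetical numbers: ten mortgages, each with a 5% chance of default. If defaults are iid, the chance that three or more default follows a binomial distribution. If instead every loan is exposed to the same economy (a simple common-factor model, assumed here purely for illustration), each loan still defaults 5% of the time, so the loans remain identically distributed but are no longer independent. The function names below are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n loans default) when defaults are iid with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def prob_at_least_common_factor(k, n, scenarios):
    """Same question when all loans share one economic state.

    `scenarios` is a list of (probability of the state, default probability in that state).
    Loans are independent only *within* a state, so overall they are dependent.
    """
    return sum(weight * prob_at_least(k, n, p) for weight, p in scenarios)


n_loans, k = 10, 3
scenarios = [(0.1, 0.30), (0.9, 0.02 / 0.9)]  # hypothetical: 10% chance of a bad economy
marginal = sum(weight * p for weight, p in scenarios)  # each loan still defaults 5% of the time

iid = prob_at_least(k, n_loans, 0.05)
correlated = prob_at_least_common_factor(k, n_loans, scenarios)
print(f"Per-loan default probability:   {marginal:.1%}")
print(f"P(3+ defaults), iid assumption: {iid:.2%}")
print(f"P(3+ defaults), common factor:  {correlated:.2%}")
```

With these made-up numbers the iid calculation gives roughly a 1% chance of three or more defaults, while the common-factor model gives roughly 6%, even though every individual loan is equally risky in both.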

Statistics Lie (part two): Correlation and Causation…or why vaccines don’t really cause autism

Correlation is not Causation: Everyone has heard the maxim that correlation does not necessarily imply causation. But how do we tell when one thing actually causes another, when both are caused by the same thing (merely associated), and when the relationship is just a coincidence? This is the second article in a series called “Statistics Lie” about how improper […]
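
The middle case, two variables caused by the same thing, can be sketched with a simulation. In the hypothetical setup below, `z` drives both `x` and `y`, and `x` and `y` have no direct link. They still come out strongly correlated; once the linear effect of `z` is removed from each (a simple partial correlation), the correlation essentially disappears. All names and coefficients are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a least-squares linear fit on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum((z - mz) ** 2 for z in zs)
    return [y - (my + slope * (z - mz)) for z, y in zip(zs, ys)]


rng = random.Random(7)
z = [rng.gauss(0, 1) for _ in range(20_000)]   # common cause (hypothetical)
x = [2.0 * zi + rng.gauss(0, 1) for zi in z]   # depends on z only
y = [1.5 * zi + rng.gauss(0, 1) for zi in z]   # depends on z only, never on x

raw = pearson(x, y)
partial = pearson(residuals(x, z), residuals(y, z))
print(f"corr(x, y)             = {raw:.2f}")
print(f"corr(x, y | z removed) = {partial:.2f}")
```

Here `x` and `y` correlate at around 0.7 purely because both depend on `z`. In real data the common cause is often unmeasured, so this adjustment is not always available.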