I spend a fair amount of time wrangling data. I try to keep in mind what each predictive analytics technique needs from the data, and also what the technique does to the data on its own. I think about it again when interpreting results, and it matters most when I compare models or build an ensemble. So I made this one-stop reference.
It is important to understand how different techniques handle irregular data. This simple post collects a few things to know, comparing decision trees, linear regression, and neural networks.
Comparison of data wrangling
For each issue below, the first line gives examples, followed by notes for decision trees, linear regression, and neural networks.
Data Types 
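Examples: Categorical vs. continuous variables; units of measure.
Decision Trees: Continuous variables are binned.
Linear Regression: Categorical variables are made continuous (e.g., dummy-coded). Transformations can also lessen the challenge of variables on very different scales.
Neural Networks: Categorical variables are made continuous. Adaptive normalization can also even out differences in orders of magnitude.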
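To make the Data Types row concrete, here is a minimal stdlib-only Python sketch. The helper names (`equal_width_bins`, `one_hot`, `z_score`) are hypothetical, and equal-width binning is only a simple stand-in: real tree implementations pick their own split thresholds. The sketch shows binning a continuous variable, one-hot encoding a categorical one, and two ways of taming differing scales (a log transform and z-score normalization).

```python
import math

# Hypothetical helpers for illustration; not taken from any particular library.


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when every value is identical
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(categories):
    """Turn category labels into 0/1 indicator columns, one column per sorted level."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


def z_score(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]


incomes = [20_000, 35_000, 50_000, 1_000_000]
print(equal_width_bins(incomes, 4))                # [0, 0, 0, 3]: one extreme value stretches the bins
print([round(math.log10(v), 2) for v in incomes])  # log transform compresses orders of magnitude
print(one_hot(["red", "blue", "red"]))             # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
print([round(z, 3) for z in z_score([1, 2, 3])])
```

The binning output illustrates why skewed continuous variables bin awkwardly: three of the four incomes collapse into the first bin because a single large value sets the bin width.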
Missing Values 
Examples: Missing at random (MAR), missing completely at random (MCAR), and not missing at random (NMAR).
Decision Trees: Generally tolerant of missing values, with several ways to deal with them (e.g., sending them to the most popular node or keeping them as a separate bin). Surrogate splitting rules are another option.
Linear Regression: Cannot handle missing values, so rows must be dropped or values imputed. Dropping NMAR values can create bias. There are many ways to impute.
Neural Networks: Cannot handle missing values, so rows must be dropped or values imputed. Dropping NMAR values can create bias. There are many ways to impute.
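The options in the Missing Values row can be sketched in a few lines of stdlib Python. The function names are hypothetical, mean imputation is just one of the many imputation methods mentioned above, and the income figures are made-up numbers chosen only to show how dropping NMAR values skews a statistic.

```python
# Hypothetical helpers for illustration; the income data below is invented.


def drop_incomplete(rows):
    """Listwise deletion: keep only rows that contain no None values."""
    return [row for row in rows if None not in row]


def impute_mean_with_flag(column):
    """Replace None with the observed mean and return a 0/1 'was missing' indicator."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    filled = [mean if v is None else v for v in column]
    flags = [1 if v is None else 0 for v in column]
    return filled, flags


def missing_as_category(column, label="MISSING"):
    """Tree-style option: keep missing values as their own category (a separate bin)."""
    return [label if v is None else v for v in column]


# Made-up NMAR example: the two highest earners skip the income question.
true_income = [30, 40, 50, 200, 250]
reported = [v if v < 100 else None for v in true_income]
observed = [v for v in reported if v is not None]
print(sum(true_income) / len(true_income))  # 114.0
print(sum(observed) / len(observed))        # 40.0, biased low after dropping
```

Mean imputation would fill both gaps with 40.0, so it does not remove NMAR bias either; the indicator column at least lets a model learn that missingness itself carries information.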
Distributions 
Examples: Skewness, outliers, class imbalance, or small disjuncts.
Decision Trees: Make no assumption about the distribution of inputs or targets, although skewness can still cause problems.
Linear Regression: Assumes normally distributed residuals (often stated as multivariate normality). Outliers can cause problems. Transforms can make variables closer to normal.
Neural Networks: Do not assume any particular distribution, but problems can occur when data are more skewed than lognormal.
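For the Distributions row, here is a minimal stdlib sketch of three quick checks: skewness (before and after a log transform), Tukey's IQR fences for outliers, and class shares for imbalance. The helper names are hypothetical; the log transform assumes strictly positive values (`math.log1p` is a common variant when zeros occur), and `statistics.quantiles` needs Python 3.8 or later.

```python
import math
import statistics
from collections import Counter

# Hypothetical helpers for illustration; not taken from any particular library.


def skewness(values):
    """Population (Fisher-Pearson) skewness g1 = m3 / m2**1.5; assumes values are not all equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below the first quartile or above the third (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def class_shares(labels):
    """Fraction of examples in each class, to spot class imbalance."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


raw = [1, 2, 4, 8, 16, 32, 64, 128]   # right-skewed, roughly geometric
logged = [math.log(v) for v in raw]    # log assumes strictly positive values
print(round(skewness(raw), 2), round(skewness(logged), 2))
print(iqr_outliers([10, 11, 12, 13, 14, 15, 16, 500]))  # [500]
print(class_shares(["ok"] * 95 + ["fraud"] * 5))        # {'ok': 0.95, 'fraud': 0.05}
```

Because the sample is geometric, its log is evenly spaced, so the transform brings skewness to essentially zero; real data rarely cooperates that neatly.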
Unbalanced Data (bias) 
Examples: Unrepresentative samples, faulty polling, awkward binning, cherry picking.
Decision Trees: Low bias (no assumption about the target) but high variance (a small change in the input can make a big difference). Remedies include changing the penalties for misclassification or limiting tree depth.
Linear Regression: Regularization discourages overly complex models. A weighting variable can also be added.
Neural Networks: Dropout layers randomly deactivate neurons during training, so those neurons temporarily do not propagate. Regularization also discourages overly complex models.
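The remedies in this row can be sketched in plain Python under a few assumptions. `balanced_class_weights` uses the n_samples / (n_classes * count) heuristic (the same one scikit-learn applies for `class_weight="balanced"`) as one way to change misclassification penalties; `ridge_loss` shows L2 regularization; and `dropout` is the common "inverted dropout" formulation. All three function names are hypothetical, and none of this is how any specific library implements these internally.

```python
import random
from collections import Counter

# Hypothetical helpers for illustration; not taken from any particular library.


def balanced_class_weights(labels):
    """Per-class weight n_samples / (n_classes * count): rare classes cost more when misclassified."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def ridge_loss(y_true, y_pred, weights, lam):
    """Sum of squared errors plus an L2 penalty lam * sum(w**2) that discourages large coefficients."""
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return sse + lam * sum(w * w for w in weights)


def dropout(activations, p, rng):
    """Inverted dropout for one training step: zero each unit with probability p and
    scale survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


print(balanced_class_weights(["a"] * 9 + ["b"]))  # the single 'b' gets weight 5.0
print(ridge_loss([1.0, 2.0], [1.0, 2.0], [3.0, 4.0], lam=0.1))
print(dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=random.Random(42)))
```

With these weights, the nine "a" examples and the one "b" example each contribute a total weight of about 5, so the minority class is no longer drowned out.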
Variable Relationships 
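Examples: Relationships among predictors, and between predictors and the target.
Decision Trees: No assumption (nonparametric). Tree depth lets the predictor-target relationship be nonlinear. Trees generally work best when one strong variable is available for the first split.
Linear Regression: Assumes predictors are not strongly correlated with each other (no multicollinearity) and that the predictor-target relationship is linear. It also makes assumptions about the residuals (independent, constant variance, roughly normal). Try combining variables or using PCA.
Neural Networks: Can find nonlinear predictor-target relationships. Generally benefit from a good a priori starting point. Try combining variables or using PCA.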
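For the Variable Relationships row, here is a minimal stdlib sketch of checking two predictors for collinearity and then combining them with PCA. It is restricted to two variables so the leading eigenvector of the 2x2 covariance matrix has a closed form; the function names and the house-size data are hypothetical, and a real analysis would use a linear algebra library and more than two columns.

```python
import math

# Hypothetical helpers and made-up data for illustration only.


def pearson(x, y):
    """Pearson correlation between two predictors; values near +/-1 signal multicollinearity."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_pc_2d(x, y):
    """Scores on the first principal component of two predictors
    (projection onto the leading eigenvector of their 2x2 covariance matrix)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    a = sum(u * u for u in xc) / n               # var(x)
    c = sum(v * v for v in yc) / n               # var(y)
    b = sum(u * v for u, v in zip(xc, yc)) / n   # cov(x, y)
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)  # largest eigenvalue
    if b != 0:
        vx, vy = b, lam - a                      # solves (cov - lam * I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return [(u * vx + v * vy) / norm for u, v in zip(xc, yc)]


sqft = [800, 1200, 1500, 2000, 2600]
rooms = [2, 3, 4, 5, 6]
print(round(pearson(sqft, rooms), 3))                   # close to 1: strongly collinear predictors
print([round(s, 1) for s in first_pc_2d(sqft, rooms)])  # one combined predictor
```

On unscaled data like this, the first component is dominated by square footage; standardizing each predictor first (for example with z-scores) is the usual choice when units differ.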
For more info about Perficient and predictive analytics: Data, Cloud, Analytics, Big Data