The prediction problem, the machine learning methods, and a tutorial for execution
Acknowledgments: the student contributors Zesen Zhuang and Xinyu Tian were supported as Teaching Assistants by the Social Science Divisional Chair’s Discretionary Fund to encourage faculty engagement in undergraduate research and enhance student-faculty scholarly interactions outside of the classroom. The division chair is Prof. Keping Wu, Associate Professor of Anthropology at Duke Kunshan University.
In early philosophical literature, a ‘prediction’ was considered to be an empirical consequence of a theory that had not yet been verified at the time the theory was constructed; an ‘accommodation’ was one that had. The view that predictions are superior to accommodations in the assessment of scientific theories is known as ‘predictivism’.
quoted from “Prediction versus Accommodation,” Stanford Encyclopedia of Philosophy
Be careful with the validity of online news:
What are the data sources? Are the data likely to be trustworthy or not?
What are the algorithms that underpin the results?
What are the assumptions for the algorithms to work?
Can you find better data sources for scientific predictions?
Can you find another algorithm that can better answer the research questions?
Example source 1: Our World in Data (University of Oxford):
Example source 2: World Bank Open Data:
Example source 3: The Alpha Vantage API:
Reference Python Package:
How to add a trend line to time series visualizations?
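One common way to get a trend line is to fit a straight line to the series by least squares and draw it over the raw data. The sketch below assumes a synthetic, made-up series and uses `numpy.polyfit`; the helper names `linear_trend` and `plot_with_trend` are hypothetical, and a rolling mean or LOESS curve would be an equally valid choice.

```python
import numpy as np


def linear_trend(values):
    """Fit y = slope * t + intercept by least squares, with t = 0, 1, 2, ..."""
    t = np.arange(len(values))
    slope, intercept = np.polyfit(t, values, deg=1)
    return slope, intercept, slope * t + intercept


def plot_with_trend(values, trend_values):
    """Draw the raw series and its fitted trend line (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.plot(values, label="observed")
    ax.plot(trend_values, linestyle="--", label="linear trend")
    ax.set_xlabel("time step")
    ax.legend()
    return fig


# Synthetic (made-up) series: upward drift of 0.5 per step plus noise.
rng = np.random.default_rng(0)
series = 0.5 * np.arange(60) + rng.normal(scale=2.0, size=60)

slope, intercept, trend = linear_trend(series)
print(f"estimated slope per time step: {slope:.3f}")
```

Calling `plot_with_trend(series, trend)` would draw both lines on one axis.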
Seasonal and non-seasonal cycles
Seasonal-Trend decomposition using LOESS (STL)
Multiple Seasonal-Trend decomposition using LOESS (MSTL)
A Sample Tutorial
Note: there is no single agreed-upon method for detecting outliers
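To illustrate why no single method settles the question, the sketch below applies two widely used rules of thumb, the 1.5 × IQR fence and the |z| > 3 rule, to the same small made-up sample. The function names and thresholds are illustrative assumptions.

```python
import numpy as np


def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold


# Made-up sample with one obviously unusual value (25.0).
data = np.array([10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 25.0])

print("IQR rule flags:    ", data[iqr_outliers(data)])
print("z-score rule flags:", data[zscore_outliers(data)])
```

On this sample the IQR rule flags 25.0 while the z-score rule flags nothing, because the extreme value inflates the standard deviation it is measured against. The choice of rule and threshold is therefore a modeling decision.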
The general prediction model:
The scikit-learn Python package:
A random forest classifier is an ensemble learning method that builds many decision trees at training time, each grown on a bootstrap sample of the data with a random subset of variables considered at each split, so that the trees are only weakly correlated. For classification tasks, the output of the random forest is the class selected by most trees. [source: Wikipedia, sklearn]
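A minimal sketch of a random forest classifier with scikit-learn follows. The data come from `make_classification`, so they are synthetic, and the settings (`n_estimators=100`, `max_features="sqrt"`) are illustrative assumptions rather than recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic (made-up) binary classification data.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 100 trees; each split considers a random subset of sqrt(n_features) variables.
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)

# Note: scikit-learn combines trees by averaging their predicted class
# probabilities rather than by counting hard votes.
accuracy = forest.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
print("most important feature index:", forest.feature_importances_.argmax())
```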
A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural networks (ANN). An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer; every node except the input nodes applies a nonlinear activation function. It is trained by supervised learning. The classifier version learns to predict class labels from labeled examples. [source: Wikipedia, sklearn]
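The three-layer structure can be sketched with scikit-learn's `MLPClassifier`. The data are synthetic, and the single hidden layer of 32 units, the ReLU activation, and the scaling step are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic (made-up) binary classification data.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale inputs first: gradient-based training is sensitive to feature scale.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(32,),  # one hidden layer with 32 units
        activation="relu",         # nonlinear activation in the hidden layer
        max_iter=1000,
        random_state=0,
    ),
)
mlp.fit(X_train, y_train)

accuracy = mlp.score(X_test, y_test)
network = mlp.named_steps["mlpclassifier"]
print(f"layers (input + hidden + output): {network.n_layers_}")
print(f"test accuracy: {accuracy:.2f}")
```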
Auto-Machine Learning (Auto-ML) Classifier
The AutoML method provides a ready-made portfolio of machine learning algorithms so that users can train models easily and achieve high predictive performance. The AutoML tool automatically trains a range of machine learning models on the data across multiple trials and selects the best-performing one as the output. In our classification task, we use the AutoGluon library in Python. [source: Wikipedia, AutoGluon]
The linear model assumes that the regressand (the dependent variable) is a linear function of the regressors (the independent variables), that is, the model is linear in its parameters. [source: sklearn]
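As a small sketch of the linear model, the example below generates synthetic data from a known linear relationship and checks that scikit-learn's `LinearRegression` recovers the coefficients. The true coefficients (3, -2) and intercept (1) are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic (made-up) data: y = 3*x1 - 2*x2 + 1 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(scale=0.1, size=200)

# Ordinary least squares fit.
model = LinearRegression().fit(X, y)

print("estimated coefficients:", model.coef_)
print("estimated intercept:   ", model.intercept_)
```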
Auto-Machine Learning (Auto-ML) Regression
A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural networks (ANN). An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer; every node except the input nodes applies a nonlinear activation function. It is trained by supervised learning. The regressor version learns to predict continuous target values from labeled examples. [source: Wikipedia, sklearn]
The train-test split (be careful with time series data):
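With time series, a random shuffle lets the model see the future during training, so the split should respect time order. The sketch below shows a simple chronological hold-out and scikit-learn's `TimeSeriesSplit` on a made-up index of 100 time steps; the 80/20 cut and the four folds are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Made-up data: 100 observations in chronological order.
n = 100
X = np.arange(n).reshape(-1, 1)
y = np.arange(n)

# Option 1: chronological hold-out. Train on the first 80%, test on the rest.
cut = int(n * 0.8)
X_train, X_test = X[:cut], X[cut:]
y_train, y_test = y[:cut], y[cut:]

# Option 2: expanding-window cross-validation.
# Every fold trains on the past and tests on the block that follows it.
tscv = TimeSeriesSplit(n_splits=4)
folds = list(tscv.split(X))
for train_idx, test_idx in folds:
    print(
        f"train: 0..{train_idx.max():2d}   "
        f"test: {test_idx.min():2d}..{test_idx.max():2d}"
    )
```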
Visualize the results:
Tuning the hyper-parameters
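One standard way to tune hyper-parameters is an exhaustive grid search with cross-validation. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data; the small grid and the 3-fold setting are illustrative assumptions chosen to keep the run short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic (made-up) binary classification data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Candidate values to try: 2 x 2 = 4 combinations.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation per combination
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.2f}")
```

For time series data, passing a time-ordered splitter such as `TimeSeriesSplit` as `cv` avoids validating on observations that precede the training data.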
Metrics and scoring
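The sketch below computes a few common scores from `sklearn.metrics` on tiny hand-made label vectors, so each value can be checked by hand. The numbers are made up for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Classification: 3 of 4 labels are predicted correctly.
y_true_cls = [1, 0, 1, 1]
y_pred_cls = [1, 0, 0, 1]
acc = accuracy_score(y_true_cls, y_pred_cls)  # 3/4 = 0.75
f1 = f1_score(y_true_cls, y_pred_cls)         # precision 1, recall 2/3 -> 0.8

# Regression: absolute errors are 1, 0, and 2.
y_true_reg = [3.0, 5.0, 7.0]
y_pred_reg = [2.0, 5.0, 9.0]
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # (1 + 0 + 2) / 3 = 1.0
mse = mean_squared_error(y_true_reg, y_pred_reg)   # (1 + 0 + 4) / 3 = 5/3
r2 = r2_score(y_true_reg, y_pred_reg)              # 1 - 5/8 = 0.375

print(f"accuracy={acc:.2f}  F1={f1:.2f}")
print(f"MAE={mae:.2f}  MSE={mse:.2f}  R^2={r2:.3f}")
```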
Highly Recommended: Prof. Kevin Sheppard at Oxford University