Case Study: Gogoro Green Product Customer Satisfaction and Loyalty
1 Introduction to the Scenario
1.1 The Importance of Green Product Sustainability
It is now widely accepted that economic growth should be accompanied by minimal ecological degradation and by attention to social problems. Climate change, resource depletion, and pollution are forcing aggressive shifts in corporate strategy. Consequently, a growing number of companies are developing environmentally friendly products, and concepts such as “design for green product development,” “green product design,” and “green product innovation” have come to the forefront of management science. As firms make this transition, understanding consumer behavior in a green context becomes a source of competitive advantage.
1.2 Gaps in Current Literature
Prior research has extensively explored consumer motivation for acquiring green products, the demographic and psychographic profiles of the typical green consumer, and how to design marketing programs that influence the initial purchase of green products. However, there is still surprisingly little robust research on customer loyalty toward green products. Truly sustainable environmental development requires customers to keep buying green products, recommend them to friends, and commit to the ecosystem long-term, rather than make a one-time novelty purchase.
1.3 The Gogoro Case Study
Gogoro is a highly innovative Taiwanese company primarily known for its electric smart scooters and battery-swapping infrastructure. Its scooter is one of the premier green products intended to resolve the dilemma between the convenience of personal transportation and severe ecological degradation in densely populated areas like Taiwan.
The primary purpose of this computational case study is to take Gogoro as an empirical subject to examine the causal factors that directly affect customer satisfaction and follow-on customer loyalty for an established green product.
Research Scope: A quantitative sample of \(n=208\) respondents was collected from current product users. The structural model proposes that the independent variables (Function, Usability, Price, and Brand Image) significantly affect the mediator, Customer Satisfaction, which in turn drives Customer Loyalty. The subsequent computational results using Python will ultimately show that brand image affects customer satisfaction the most, followed by function. The managerial and environmental implications will be synthesized based on these empirical results.
2 Synthetic Dataset Generation
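We begin the workflow by generating a synthetic dataset that matches the scenario's sample size (\(n=208\)). Note that the generator below draws every item independently, so it does not build in the scenario's expectation that Brand Image correlates most strongly with Satisfaction; it only restricts the Brand items to the 3 to 5 range, which gives them the highest means.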
Run the following Python block in your environment to create gogoro_data.csv:
import pandas as pd
import numpy as np

np.random.seed(42)
n = 208

# Generate random data representing the constructs
# (np.random.randint excludes the upper bound, so randint(2, 6) yields 2..5)
data = {
    'Respondent_ID': range(1, n + 1),
    'Age': np.random.randint(18, 60, n),
    'Func1': np.random.randint(2, 6, n),
    'Func2': np.random.randint(2, 6, n),
    'Usab1': np.random.randint(2, 6, n),
    'Usab2': np.random.randint(2, 6, n),
    'Price1': np.random.randint(2, 6, n),
    'Price2': np.random.randint(2, 6, n),
    'Brand1': np.random.randint(3, 6, n),
    'Brand2': np.random.randint(3, 6, n),
    'Sat1': np.random.randint(2, 6, n),
    'Sat2': np.random.randint(2, 6, n),
    'Loyal1': np.random.randint(2, 6, n),
    'Loyal2': np.random.randint(2, 6, n),
}

df = pd.DataFrame(data)
df.to_csv('gogoro_data.csv', index=False)
print("gogoro_data.csv created successfully with 208 rows.")
3 Module 1: Data Integrity and Descriptive Analysis
Before testing the impact of brand image or function, the data must be validated for quality and the sample must be described.
3.1 Section 1: Data Cleaning
3.1.1 Objective
To identify and remove incomplete or straight-lined responses so that the sample is clean and valid.
Keywords: data cleaning, missing values, straight-lining, row variance, dropna(), data integrity, survey quality control
3.1.2 Statistical Perspective
From a statistical analysis perspective, raw survey data is extremely noisy and often contains human-derived errors. Implementing robust data cleaning protocols is non-negotiable for establishing empirical credibility.
Missing Values (\(NaN\)):
Missing data can bias results both computationally (e.g., matrix inversion routines cannot handle NaN values, so estimation fails or returns NaN) and theoretically (e.g., the missingness might not be random but systematic, indicating a flaw in the survey instrument).
Mathematical Impact: Reduces the effective degrees of freedom (\(df\)) in regression equations.
Standard Criteria: Common practice is to use imputation techniques (such as mean substitution or KNN imputation) for small gaps, and to drop survey records with more than 10% missing values to preserve model integrity.
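The missing-value rule above can be sketched as follows. This is a minimal illustration on a hypothetical ten-item mini-survey (the item names Q1 to Q10 and the values are assumptions, not Gogoro data): respondents with more than 10% missing answers are dropped, and the remaining small gaps are filled by mean substitution.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-survey: 4 respondents x 10 items (not the Gogoro data)
items = [f"Q{i}" for i in range(1, 11)]
survey = pd.DataFrame(
    np.tile([[5.0], [3.0], [4.0], [1.0]], (1, 10)), columns=items
)
survey.loc[1, "Q1"] = np.nan                 # 1 of 10 answers missing (10%)
survey.loc[2, ["Q1", "Q2", "Q3"]] = np.nan   # 3 of 10 answers missing (30%)

# Share of unanswered items per respondent
missing_share = survey.isna().mean(axis=1)

# Drop respondents with more than 10% missing answers
kept = survey[missing_share <= 0.10]

# Mean-substitute the remaining small gaps, item by item
imputed = kept.fillna(kept.mean())

print(f"Respondents kept: {len(kept)} of {len(survey)}")
print(f"Imputed Q1 for respondent 1: {imputed.loc[1, 'Q1']}")
```

Mean substitution is used here only because it is the simplest of the techniques named above; it shrinks item variance, so a more careful analysis might prefer KNN or model-based imputation.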
Straight-lining (Zero Variance):
This phenomenon occurs when a respondent selects the exact same answer for every item in a block (e.g., “3 - Neutral” for 40 consecutive questions). This indicates a severe lack of respondent engagement and introduces artificial correlation into the covariance matrix, violating the assumption that survey responses reflect genuine variability in respondents' attitudes.
Standard Criteria: To programmatically detect this, we calculate the mathematical variance (\(\sigma^2\)) across a continuous subset of items for an individual respondent. If the variance is exactly \(0\) (or below a tiny threshold like \(0.1\)), the case constitutes straight-lining and should be instantly excluded from the dataset.
Industry Application: In big tech companies conducting user experience (UX) research, automated pipelines immediately flag and drop straight-lined responses to prevent multi-million dollar product decisions from being corrupted by non-engaged click-farm respondents.
3.1.3 Python Example Code
import pandas as pd

df = pd.read_csv('gogoro_data.csv')
v_df = df.copy()

# Report missing data before cleaning
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(2)
missing_report = pd.DataFrame({'Missing Count': missing, 'Missing %': missing_pct})
missing_report = missing_report[missing_report['Missing Count'] > 0]
if missing_report.empty:
    print("No missing values detected.")
else:
    print("Missing Data Report:")
    print(missing_report)

print(f"Total rows before cleaning: {len(v_df)}")

# Drop missing values
v_df = v_df.dropna()
print(f"Rows after dropping missing values: {len(v_df)}")

# Check for straight-lining: variance across a subset of items
survey_cols = ['Func1', 'Func2', 'Usab1', 'Usab2']
v_df['Variance'] = v_df[survey_cols].var(axis=1)
n_straightline = (v_df['Variance'] <= 0).sum()
print(f"Straight-lining cases detected: {n_straightline}")
v_df = v_df[v_df['Variance'] > 0]
print(f"Rows remaining after cleaning: {len(v_df)}")
No missing values detected.
Total rows before cleaning: 208
Rows after dropping missing values: 208
Straight-lining cases detected: 6
Rows remaining after cleaning: 202
3.1.4 Student Task
Open Visual Studio Code. Create a new file named StudentName_Cleaning.py. Modify the variance threshold from \(> 0\) to \(> 0.5\) and observe the change in the number of rows remaining.
3.1.5 Evaluation Questions
Why is data cleaning the necessary first step before any statistical analysis?
What does straight-lining indicate about a respondent’s survey answers?
How can missing data skew statistical results?
What does the dropna() function do in pandas?
Statistically, why do we calculate variance across a row to detect bad survey responses?
3.2 Section 2: Descriptive Statistics
3.2.1 Objective
To calculate the mean, standard deviation, and frequency of demographic variables to describe the sample profile.
Keywords: mean, standard deviation, median, frequency distribution, central tendency, dispersion, describe(), sample profile
3.2.2 Statistical Perspective
Before engaging in complex predictive modeling, researchers must thoroughly understand the shape of each variable's distribution. Descriptive statistics summarize the central tendency (mean, median, mode) and dispersion (variance, range, standard deviation) of the dataset.
For demographic variables (like age, gender, geographic location or income), calculating raw frequencies, proportions, and arithmetic means helps researchers definitively understand if their empirical sample is statistically representative of the true overarching population of green product consumers.
Mathematical Foundations:
Central Tendency (\(\mu\) or \(\bar{x}\)): Represents the “expected value” or structural center of gravity of a specific feature. In consumer research, knowing the average age fundamentally anchors the marketing demographic.
Dispersion (\(\sigma\), the standard deviation): Represents the typical distance of the sample data points from the mean, computed as the square root of the average squared deviation. Wide dispersion means high variability in opinions; narrow dispersion means high consensus among users.
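For reference, the two quantities above can be written out as follows. The \(n - 1\) denominator is an assumption chosen to match the default of pandas' std() and describe(); the population version divides by \(n\) instead.

\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}\]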
Industry Application: A marketing division at an automotive firm like Ford or Tesla relies deeply on descriptive pivot tables and dashboards to verify that early-adopter sample profiles (e.g., 25-35 year olds with high income) match their targeted ad-spend profiles, before they approve widespread structural analysis.
3.2.3 Python Example Code
import pandas as pd

df = pd.read_csv('gogoro_data.csv')

# Calculate descriptive statistics for numerical variables
print("Age Description:")
print(df['Age'].describe())

# Mean of Brand perception items
print("\nMean of Brand1:", round(df['Brand1'].mean(), 2))
Age Description:
count 208.000000
mean 38.581731
std 12.570307
min 18.000000
25% 27.750000
50% 40.000000
75% 49.250000
max 59.000000
Name: Age, dtype: float64
Mean of Brand1: 3.95
3.2.4 Student Task
Create StudentName_Descriptive.py in Visual Studio Code. Calculate the mean and standard deviation specifically for the Price1 column. Print both values. Observe the standard deviation.
3.2.5 Evaluation Questions
What is central tendency in statistics?
How does standard deviation explain the dispersion of the data?
Why do we need to know the frequencies of demographic variables?
What pandas method provides a quick statistical summary of a numerical column?
How can descriptive statistics help determine if a sample represents a target population?
3.3 Section 3: Outlier Detection
3.3.1 Objective
To use Mahalanobis Distance to identify multivariate outliers that may skew the analysis results.
An outlier is a data point that differs significantly from other observations, often the result of data entry errors or truly anomalous respondent behavior. In univariate analysis, a boxplot handles this easily. Multivariate analysis assumes approximate normality and the absence of extreme deviations across the combined variables. A point might not be an outlier on any single variable yet still be anomalous when the variables are considered jointly.
For a single variable, the standard approach uses the Interquartile Range (IQR):
\[IQR = Q_3 - Q_1\]
where \(Q_1\) is the 25th percentile and \(Q_3\) is the 75th percentile of the distribution. A data point \(x\) is flagged as an outlier if it falls outside the inner fences, conventionally placed \(1.5 \times IQR\) beyond the quartiles:
\[x < Q_1 - 1.5 \times IQR \quad \text{or} \quad x > Q_3 + 1.5 \times IQR\]
Values beyond \(3 \times IQR\) from the quartiles are called extreme outliers. A boxplot directly visualises these boundaries: the box spans \(Q_1\) to \(Q_3\), the whiskers extend to the most extreme observations inside the fences, and any points plotted beyond the whiskers are outliers.
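A small sketch of the univariate rule, assuming the conventional \(1.5 \times IQR\) inner fences. The ages are hypothetical values chosen so that one entry is clearly implausible.

```python
import numpy as np

# Hypothetical respondent ages with one implausible entry (95)
ages = np.array([22, 25, 27, 29, 31, 33, 35, 38, 95])

q1, q3 = np.percentile(ages, [25, 75])
iqr = q3 - q1

# Inner fences: 1.5 x IQR beyond the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

# Extreme outliers: more than 3 x IQR beyond the quartiles
extreme = ages[(ages < q1 - 3 * iqr) | (ages > q3 + 3 * iqr)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower}, {upper})")
print("Outliers:", outliers, "Extreme:", extreme)
```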
Mahalanobis Distance (\(D^2\)): A multidimensional measure calculating how far a data point is from the multi-dimensional center of all variables, adjusting for covariance (how variables correlate with each other, scaling elliptical structures).
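Written out, the squared distance takes the following form. The symbols are notational assumptions introduced here: \(\mathbf{x}\) is one respondent's vector of scores, \(\boldsymbol{\mu}\) is the vector of sample means (the centroid), and \(\mathbf{S}\) is the sample covariance matrix.

\[D^2 = (\mathbf{x} - \boldsymbol{\mu})^{\top}\, \mathbf{S}^{-1}\, (\mathbf{x} - \boldsymbol{\mu})\]

When the variables are uncorrelated and have unit variance, \(\mathbf{S}\) is the identity matrix and \(D^2\) reduces to the ordinary squared Euclidean distance from the centroid.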
Hypothesis Test (Mahalanobis \(\chi^2\)):
\(H_0\) (Null Hypothesis): The observation’s multivariate distance from the centroid is consistent with the assumed normal distribution — it is not an outlier.
\(H_1\) (Alternative Hypothesis): The distance is significantly large, indicating a statistically anomalous data point (multivariate outlier).
Decision rule: If \(D^2 > \chi^2_{k,\,\alpha}\), reject \(H_0\) and flag the case as an outlier.
Criteria & Assumptions:
The structured data generally follows a multivariate normal distribution.
The critical value for identifying a multivariate outlier is determined using the \(\chi^2\) distribution, with degrees of freedom equivalent to the number of predictor variables (\(df = k\)).
Commonly, cases with \(p < 0.001\) on the \(\chi^2\) distribution are deemed extreme multivariate outliers and are candidates for removal; the example code in this section uses the more lenient \(\alpha = 0.01\). Leaving such cases in can heavily skew regression hyperplanes or distort factor solutions.
Industry Application: In anti-fraud divisions within banking, Mahalanobis Distance is aggressively implemented to detect anomalous credit card behaviors (where individual transaction amounts might be normal, but the combined velocity, location, and amounts deviate from the owner’s multi-dimensional covariance matrix).
3.3.4 Python Example Code
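import numpy as np
import pandas as pd
from scipy import stats
from scipy.spatial import distance

df = pd.read_csv('gogoro_data.csv')
X = df[['Func1', 'Price1', 'Brand1']].copy()

# Centroid and inverse covariance matrix of the three items
mean_vec = X.mean().values
inv_cov = np.linalg.inv(X.cov().values)

# distance.mahalanobis returns D, so square it to obtain D^2,
# which is the quantity compared against the chi-square critical value
X['Mahalanobis'] = [
    distance.mahalanobis(row, mean_vec, inv_cov) ** 2 for row in X.values
]

# Critical value at alpha = 0.01 with df = number of variables (3)
crit_val = stats.chi2.ppf(1 - 0.01, df=3)
outliers = X[X['Mahalanobis'] > crit_val]
print("Number of outliers detected:", len(outliers))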
Number of outliers detected: 0
Interpreting the Output: The count represents respondents whose squared Mahalanobis distance exceeds the \(\chi^2\) critical value at \(\alpha = 0.01\) with \(df = 3\). For each flagged case, \(H_0\) is rejected — the respondent’s combined pattern across Func1, Price1, and Brand1 is statistically anomalous relative to the multivariate distribution of the full sample. Removing these cases before structural analysis reduces the risk of distorted regression coefficients and inflated standard errors.
3.3.5 Student Task
Create StudentName_Outlier.py in Visual Studio Code. Modify the alpha threshold of the \(\chi^2\) test to \(0.05\) instead of \(0.01\) and observe the change in the number of outliers detected.
3.3.6 Evaluation Questions
What is the difference between a univariate and a multivariate outlier?
What does Mahalanobis distance measure?
Why are multivariate outliers problematic in survey data analysis?
What role does the \(\chi^2\) distribution play in outlier detection?
Statistically, what does the covariance matrix represent?
3.4 Section 4: Normality Testing
3.4.1 Objective
To assess the distribution of responses using Skewness and Kurtosis to ensure data fits the assumptions of standard regression.
Keywords: skewness, kurtosis, normal distribution, parametric assumption, bell curve, Shapiro-Wilk, data transformation
3.4.2 Statistical Perspective
Assumptions of Parametric Tests: Many foundational statistical procedures (like standard OLS regression, ANOVA, or maximum likelihood SEM) demand that the residuals of the model or the underlying data distribution natively follow a normal Gaussian curve (a symmetric “bell curve”).
Skewness & Kurtosis Evaluation: Instead of solely relying on rigid visual histograms, statisticians use rigorous numerical metrics:
Skewness measures the lateral lack of symmetry. For instance, if an overwhelmingly positive brand image causes most survey responses to cluster strongly around “5 - Strongly Agree”, the data is negatively skewed. A general criterion is that skewness statistics should be strictly between -2 and +2.
Kurtosis measures the absolute “tailedness” (the prevalence of extreme highs or lows producing heavy or light tails compared to a normal distribution curve). The conservative criterion for acceptable kurtosis is generally between -3 and +3 for robust SEM techniques.
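The two criteria above can be checked numerically. The sketch below uses a hypothetical set of 1 to 5 responses clustered at the high end; pandas' skew() and kurt() return bias-corrected sample skewness and excess kurtosis (0 for a normal distribution), and applying the \(\pm 3\) kurtosis rule to the excess value is an assumption of this example.

```python
import pandas as pd

# Hypothetical 1-5 Likert responses clustered at the high end (not Gogoro data)
responses = pd.Series([5, 5, 5, 5, 4, 4, 4, 3, 3, 2, 5, 4, 5, 5, 1])

skew = responses.skew()         # negative when the tail points toward low scores
excess_kurt = responses.kurt()  # excess kurtosis: 0 for a normal distribution

within_limits = abs(skew) < 2 and abs(excess_kurt) < 3
print(f"Skewness: {skew:.2f}, excess kurtosis: {excess_kurt:.2f}")
print("Within rule-of-thumb limits:", within_limits)
```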
Hypothesis Test (Shapiro-Wilk):
\(H_0\) (Null Hypothesis): The variable follows a normal distribution.
\(H_1\) (Alternative Hypothesis): The variable does not follow a normal distribution.
Decision rule: If \(p < 0.05\), reject \(H_0\) — the distribution significantly deviates from normality and non-parametric or bootstrapped methods are required.
Consequences of Violations: If the data deviate significantly from normality (i.e., fail normality tests such as Shapiro-Wilk or Kolmogorov-Smirnov with \(p < 0.05\)), standard error estimates can become unreliable. In that case, non-parametric alternatives or techniques such as bootstrapping (programmatically resampling the empirical data thousands of times to obtain standard errors and confidence intervals that do not rely on the normality assumption) should be used.
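The bootstrapping idea mentioned above can be sketched as follows: a minimal percentile bootstrap for the mean of a skewed item. The scores, the seed, and the choice of 5,000 resamples are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed 1-5 satisfaction scores (not from the Gogoro data)
scores = np.array([5, 5, 4, 5, 3, 5, 4, 5, 2, 5, 5, 4, 1, 5, 4])

n_boot = 5000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # Resample respondents with replacement and recompute the statistic
    resample = rng.choice(scores, size=scores.size, replace=True)
    boot_means[b] = resample.mean()

boot_se = boot_means.std(ddof=1)                         # bootstrap standard error
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5]) # 95% percentile interval

print(f"Sample mean: {scores.mean():.3f}")
print(f"Bootstrap SE: {boot_se:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

The same loop works for any statistic (a regression slope, an indirect effect) by replacing the call to mean().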
Industry Application: In algorithmic stock trading, asset returns rarely follow a perfect normal distribution (often featuring “fat tails”/high kurtosis). Ignoring kurtosis leads to catastrophic underestimation of financial crash risks.
3.4.3 Python Example Code
import pandas as pd
from scipy import stats

df = pd.read_csv('gogoro_data.csv')

# Assessing normality for the Brand1 item (Shapiro-Wilk)
stat, p = stats.shapiro(df['Brand1'])
norm_test = pd.DataFrame({'W': [round(stat, 4)], 'pval': [round(p, 4)]}, index=['Brand1'])
print(norm_test)
W pval
Brand1 0.7917 0.0
Interpreting the Output: The table reports the W statistic (ranges 0–1; values closer to 1 indicate greater normality) and the p-value. If \(p < 0.05\), \(H_0\) is rejected — the variable’s distribution is statistically non-normal. For Likert-scale survey items (1–5), mild non-normality is common. Assess acceptability using the skewness (\(|Skew| < 2\)) and kurtosis (\(|Kurt| < 3\)) criteria described above rather than relying on the Shapiro-Wilk p-value alone when \(n > 100\).
3.4.4 Student Task
Create StudentName_Normality.py in Visual Studio Code. Use scipy.stats.shapiro() to test the normality of Sat1. Observe the p-value. If \(p < 0.05\), the variable violates the normality assumption.
3.4.5 Evaluation Questions
Why is the assumption of normality important in parametric statistics?
What does a high positive skewness tell us about the survey responses?
What is Kurtosis?
If our data fails the normality test, what alternative statistical procedure can we use?
How can non-normal data affect the p-values in an OLS regression model?
4 Module 2: Measurement Model Verification
This module confirms that the survey items used to measure constructs like “Brand Image” or “Function” are actually measuring what they intend to.
4.1 Section 1: Reliability
4.1.1 Objective
To assess the reliability of survey constructs using Cronbach’s Alpha, confirming that measurement items produce consistent and stable results free from random error.
Reliability is the foundational requirement of any psychometric measurement instrument. It quantifies the degree to which a set of items produces consistent, stable results free from random measurement error. If a respondent has a genuinely strong brand perception, they should score consistently high across all brand-related items — random fluctuation signals an unreliable scale.
Cronbach’s Alpha (\(\alpha\)): The most widely used reliability coefficient, computed from the number of items, the individual item variances, and the variance of the total score:
\[\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_T}\right)\]
where \(k\) is the number of items, \(\sigma^2_i\) is the variance of item \(i\), and \(\sigma^2_T\) is the total score variance.
Criteria Standards: \(\alpha \ge 0.70\) is acceptable for exploratory research; \(\alpha \ge 0.80\) is good; \(\alpha \ge 0.90\) is excellent. Values below \(0.60\) indicate an unreliable scale requiring revision.
Parametric Assumption: Assumes unidimensionality (all items reflect a single latent construct) and \(\tau\)-equivalence (equal item contributions to the true score).
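The variance-based definition of Cronbach's Alpha can be sketched as a short function. The helper name cronbach_alpha and the two-item demo data are hypothetical; sample variances (ddof=1) are assumed throughout.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha from item variances and total-score variance."""
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of summed score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical two-item scale answered by five respondents
demo = pd.DataFrame({"Item1": [1, 2, 3, 4, 5], "Item2": [2, 2, 3, 5, 5]})
alpha = cronbach_alpha(demo)

print(f"Cronbach's alpha: {alpha:.3f}")
```

Because the two demo items rise and fall together, alpha comes out high; items answered independently of each other would push it toward zero.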
Composite Reliability (CR): A preferred alternative to Cronbach’s Alpha that accounts for unequal item loadings: \(CR = \frac{(\sum \lambda_i)^2}{(\sum \lambda_i)^2 + \sum \epsilon_i}\), where \(\lambda_i\) are standardized factor loadings and \(\epsilon_i\) are error variances. Criterion: \(CR \ge 0.70\).
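As a worked instance of the CR formula, the sketch below assumes standardized items, so each error variance is taken as \(\epsilon_i = 1 - \lambda_i^2\). The loadings are hypothetical, and composite_reliability is an illustrative helper name.

```python
import numpy as np

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    lam = np.asarray(loadings, dtype=float)
    error_var = 1.0 - lam ** 2          # assumes standardized indicators
    squared_sum = lam.sum() ** 2
    return float(squared_sum / (squared_sum + error_var.sum()))

# Hypothetical standardized loadings for a three-item construct
cr = composite_reliability([0.8, 0.7, 0.9])
print(f"CR = {cr:.3f}, meets 0.70 criterion: {cr >= 0.70}")
```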
Industry Application: HR departments rely on reliability when purchasing aptitude assessments for hiring decisions. A leadership questionnaire with \(\alpha = 0.50\) produces inconsistent candidate scores — making selection decisions essentially random and legally indefensible.
4.1.4 Student Task
Create StudentName_Reliability.py in Visual Studio Code. Extract the items for Function (Func1, Func2) and calculate their Cronbach’s Alpha using the formula \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_T^2}\right)\). Then repeat for Usability (Usab1, Usab2). Compare results and observe whether both constructs meet the \(0.70\) threshold.
4.1.5 Evaluation Questions
What does reliability mean in the context of a survey measurement model?
Why is \(\alpha \ge 0.70\) the standard threshold in management research?
How does increasing the number of items for a construct typically affect Cronbach’s Alpha?
If questions about “Price” have an alpha of \(0.45\), what should the researcher conclude?
What is the difference between Cronbach’s Alpha and Composite Reliability?
4.2 Section 2: Convergent Validity
4.2.1 Objective
To ensure items heavily load onto their intended constructs and evaluate Average Variance Extracted (\(AVE\)).
Whereas reliability concerns the consistency of a measure, validity concerns whether we are measuring the right concept, both functionally and theoretically.
Factor Analysis Model:
The single-factor model expresses each observed survey item as a linear function of the latent construct:
\[x_i = \lambda_i F + \epsilon_i\]
where \(x_i\) is the observed score on item \(i\), \(\lambda_i\) is the standardized factor loading (the strength of the item’s connection to the latent factor), \(F\) is the unobserved latent construct (e.g., Brand Image), and \(\epsilon_i\) is unique error variance not explained by \(F\). A larger \(\lambda_i\) means the item more faithfully measures the underlying construct.
Average Variance Extracted (AVE):
\[AVE = \frac{\sum_{i=1}^{n} \lambda_i^2}{n}\]
This is the mean of the squared standardized factor loadings — the average proportion of item variance attributable to the latent construct. When \(AVE \ge 0.50\), the construct explains more variance in its indicators than measurement error does.
Convergent Validity: Demonstrates that distinct empirical indicators (survey questions) of a specific theoretical construct actually converge (share a high proportion of variance in common). For example, asking about “Affordability” and “Value for Money” should yield highly correlated results if they both measure the construct ‘Price’.
Evaluation Criteria:
Standardized Factor Loadings (\(\lambda\)): Should ideally be \(\ge 0.70\). Since the shared variance is the square of the loading, this means roughly half (\(\lambda^2 = 0.7^2 = 0.49\), about 49%) of the variance in the indicator is explained by the construct rather than by random noise. However, in exploratory business studies, loadings \(\ge 0.50\) are sometimes retained to preserve content validity.
Average Variance Extracted (\(AVE\)): The grand mean of the squared factor loadings for all items in a single construct. The strict academic criterion is \(AVE \ge 0.50\), indicating the construct as a whole mathematically explains more than half the variance of its combined indicators, meaning the underlying signal is stronger than measurement error.
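The AVE criterion above can be checked with a few lines of code. The loadings here are hypothetical, and average_variance_extracted is an illustrative helper name rather than a library function.

```python
import numpy as np

def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

# Hypothetical standardized loadings for a three-item construct
ave_demo = average_variance_extracted([0.8, 0.7, 0.9])

# Items that all load at exactly 0.70 land just under the 0.50 cutoff
ave_at_070 = average_variance_extracted([0.7, 0.7])

print(f"AVE (demo): {ave_demo:.3f}, passes: {ave_demo >= 0.50}")
print(f"AVE (all loadings 0.70): {ave_at_070:.2f}, passes: {ave_at_070 >= 0.50}")
```

The second case shows why the 0.70 loading guideline and the 0.50 AVE guideline are only approximately equivalent.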
Industry Application: In high-stakes clinical psychometrics (e.g., depression diagnostic scales in healthcare), failing convergent validity means doctors are diagnosing patients based on unstructured noise, potentially leading to incorrect pharmaceutical prescriptions.
Loadings for Satisfaction Items:
[-0.30080579 0.30080579]
Interpreting the Output: Each value is a standardized factor loading \(\lambda_i\) from the equation \(x_i = \lambda_i F + \epsilon_i\). Loadings \(\ge 0.70\) indicate the item shares \(\ge 49\%\) of its variance with the latent construct (\(\lambda^2 \ge 0.49\)), satisfying convergent validity. Compute the AVE as the mean of the squared loadings: \(AVE = \frac{\sum \lambda_i^2}{n}\). If \(AVE \ge 0.50\), convergent validity is confirmed for the Satisfaction construct. In this run both loadings have a magnitude of about \(0.30\), so \(AVE \approx 0.09\), far below the threshold; this is expected because the synthetic Sat1 and Sat2 items were generated independently of each other.
4.2.4 Student Task
Create StudentName_Convergent.py in Visual Studio Code. Extract the items for Price (Price1, Price2). Fit a Factor model from statsmodels.multivariate.factor with 1 factor using method='pa', and print the resulting factor loadings via result.loadings. Observe if they exceed \(0.50\).
4.2.5 Evaluation Questions
What does convergent validity measure?
In CFA, what does a factor loading of \(0.85\) imply about an item?
What is the rule of thumb threshold for an acceptable factor loading?
What does \(AVE\) represent mathematically?
Why must \(AVE\) be \(\ge 0.50\)?
4.3 Section 3: Discriminant Validity
4.3.1 Objective
To ensure constructs are statistically distinct from one another using the Fornell-Larcker Criterion.
Discriminant Validity: Ensures that a theoretical construct is genuinely, empirically distinct from all other constructs in the mathematical model. For instance, respondents should be able to cleanly distinguish between questions measuring “Brand Image” and questions measuring “Product Function.” If they cannot, the resulting latent variables merge, completely destroying the independent predictive power of the structural model.
Fornell-Larcker Criterion: This remains a mathematically stringent statistical test for ensuring discriminant validity via variance comparison.
Criteria Mechanics: The square root of the Average Variance Extracted (\(\sqrt{AVE}\)) for any given construct must be strictly greater than its highest absolute linear correlation coefficient (\(r\)) with any other construct in the entire model matrix.
Consequences: If a construct fails this test (\(\sqrt{AVE} < r\)), it shares more variance with an outside construct than with its own survey items. The researcher must then either merge the highly correlated constructs or drop the confounding survey items.
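The comparison described above can be sketched as a small table check. The AVE values and the construct correlation matrix below are hypothetical numbers chosen for illustration; they are not estimated from the Gogoro data.

```python
import numpy as np
import pandas as pd

# Hypothetical AVE values and construct correlations (illustrative only)
ave = pd.Series({"Brand": 0.62, "Function": 0.55, "Price": 0.36})
corr = pd.DataFrame(
    [[1.00, 0.48, 0.30],
     [0.48, 1.00, 0.75],
     [0.30, 0.75, 1.00]],
    index=ave.index, columns=ave.index,
)

sqrt_ave = np.sqrt(ave)

# Highest absolute correlation of each construct with any OTHER construct
# (the diagonal is masked out so a construct is not compared with itself)
off_diagonal = corr.abs().where(~np.eye(len(corr), dtype=bool))
max_corr = off_diagonal.max(axis=1)

report = pd.DataFrame({"sqrt_AVE": sqrt_ave, "max_corr": max_corr})
report["passes"] = report["sqrt_AVE"] > report["max_corr"]
print(report.round(3))
```

In this made-up example only Brand passes: Function and Price correlate with each other (0.75) more strongly than either construct's \(\sqrt{AVE}\).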
Industry Application: A streaming service like Netflix running user surveys must ensure their metric for “App Navigability” is strictly discriminant from “Content Quality.” If they fail discriminant validity, engineers won’t know whether to allocate their multi-million dollar budget toward redesigning the UI or buying better TV shows.
4.3.4 Student Task
Create StudentName_Discriminant.py in Visual Studio Code. Calculate the average score for Brand and for Sat. Calculate the correlation between these two scores and assess whether it is likely to exceed \(\sqrt{AVE}\).
4.3.5 Evaluation Questions
Why is discriminant validity necessary in a survey?
What happens to path modeling if two constructs fail discriminant validity?
Describe the Fornell-Larcker Criterion.
If \(\sqrt{AVE}\) for “Price” is \(0.60\), and its correlation with “Function” is \(0.75\), what is the conclusion?
Statistically, how do we combine survey items into a single construct score for basic correlation analysis?
4.4 Section 4: Correlation Analysis
4.4.1 Objective
To examine the linear relationships between constructs using Pearson correlation coefficients, interpret the direction and strength of associations, and build a correlation matrix for the Gogoro dataset.
Keywords: Pearson correlation, correlation matrix, r coefficient, p-value, multicollinearity, linear relationship, correlation heatmap
4.4.2 Statistical Perspective
Correlation Analysis measures the strength and direction of the linear relationship between two continuous variables. It is a prerequisite to regression and structural modeling — revealing which constructs are meaningfully related before causal claims are made.
Direction: Positive \(r\) means both variables increase together; negative \(r\) means one increases as the other decreases.
Significance: A p-value \(< 0.05\) indicates that a correlation this large would be unlikely to arise by chance if the true population correlation were zero.
Hypothesis Test (Pearson \(r\)):
\(H_0\): \(r = 0\) (there is no linear relationship between the two constructs in the population).
\(H_1\): \(r \neq 0\) (a statistically significant linear relationship exists).
Decision rule: If \(p < 0.05\), reject \(H_0\). The .corr() method returns point estimates only; use pingouin.pairwise_corr() to obtain exact p-values for each pair.
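As an alternative to pingouin, the test can also be run with scipy.stats.pearsonr, which returns both \(r\) and its two-sided p-value. The sketch below uses simulated construct scores with a built-in positive relationship (the seed, coefficients, and variable names are assumptions) and also shows the equivalent t-test on \(r\) with \(n - 2\) degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200

# Simulated construct scores with a built-in positive relationship (illustrative only)
brand = rng.normal(size=n)
sat = 0.6 * brand + rng.normal(scale=0.8, size=n)

# Pearson r and its two-sided p-value
r, p = stats.pearsonr(brand, sat)

# Equivalent t-test on r with n - 2 degrees of freedom
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
p_from_t = 2 * stats.t.sf(abs(t_stat), df=n - 2)

reject_h0 = p < 0.05
print(f"r = {r:.3f}, p = {p:.3g}, reject H0: {reject_h0}")
```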
Correlation Matrix: When analyzing multiple constructs simultaneously, a correlation matrix displays all pairwise \(r\) values. It is used to:
Identify strong predictor candidates before regression.
Detect potential multicollinearity (when two IVs correlate highly with each other, \(r > 0.85\)).
Provide a visual heatmap of relationships across the entire measurement model.
Industry Application: In retail analytics, correlation matrices reveal which product categories are purchased together. A strong positive correlation between “EV scooter sales” and “charging accessory sales” (\(r = 0.78\)) would direct Gogoro to bundle these products to increase average transaction value.
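A correlation matrix of construct scores can be built along the following lines. This is a sketch under stated assumptions: each construct score is taken as the mean of its two items, the short construct names (Func, Usab, Price, Brand, Sat, Loyal) are illustrative, and stand-in random responses are generated so the block runs on its own; in the case study the item columns would be read from gogoro_data.csv instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 208

# Stand-in 1-5 item responses so the sketch runs on its own;
# in the case study these columns would come from gogoro_data.csv
item_names = ["Func1", "Func2", "Usab1", "Usab2", "Price1", "Price2",
              "Brand1", "Brand2", "Sat1", "Sat2", "Loyal1", "Loyal2"]
items = pd.DataFrame({name: rng.integers(1, 6, n) for name in item_names})

# Composite score per construct = mean of its items
constructs = {
    "Func": ["Func1", "Func2"],
    "Usab": ["Usab1", "Usab2"],
    "Price": ["Price1", "Price2"],
    "Brand": ["Brand1", "Brand2"],
    "Sat": ["Sat1", "Sat2"],
    "Loyal": ["Loyal1", "Loyal2"],
}
scores = pd.DataFrame(
    {name: items[cols].mean(axis=1) for name, cols in constructs.items()}
)

# Pairwise Pearson correlations between construct scores
corr_matrix = scores.corr(method="pearson")
print(corr_matrix.round(2))
```

Because the stand-in items are drawn independently, the off-diagonal values will hover near zero; real survey data would show the stronger associations the text describes.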
Interpreting the Output: Each cell shows the Pearson \(r\) between two construct composite scores. Values close to \(\pm 1\) indicate strong linear relationships; values near 0 indicate negligible association. Read the Sat column to identify which constructs most strongly co-vary with Customer Satisfaction — stronger correlations suggest better predictor candidates for structural modeling. Note: \(H_0\) (\(r = 0\)) cannot be formally tested from this output alone; significance requires p-values from pingouin.pairwise_corr(). All diagonal values equal 1.0 by definition.
4.4.4 Student Task
Create StudentName_Correlation.py in Visual Studio Code. Using the construct scores above, produce a heatmap using seaborn.heatmap() with annot=True. Identify the pair of constructs with the strongest positive correlation and the pair with the weakest. Write a one-sentence interpretation of each finding.
4.4.5 Evaluation Questions
What does a Pearson correlation coefficient of \(r = 0.72\) indicate about two variables?
Why is correlation analysis performed before regression or path modeling?
What is the difference between a positive and a negative correlation?
If Brand and Sat have \(r = 0.80, p < 0.001\), what can the researcher conclude?
What is multicollinearity, and why is a very high inter-construct correlation (\(r > 0.85\)) a problem in structural modeling?
5 Module 3: Mediation, Moderation, and SEM
This module explores advanced relational modeling — how variables mediate and moderate causal paths, and how Structural Equation Modeling (SEM) tests entire theoretical frameworks simultaneously.
5.1 Section 1: Mediating Effect
5.1.1 Objective
To test whether Customer Satisfaction acts as a mediating variable that transmits the effect of Brand Image on Customer Loyalty, using the Baron-Kenny steps and bootstrapping.
Keywords: mediation, indirect effect, Baron-Kenny steps, bootstrapping, path a, path b, path c’, partial mediation, full mediation
5.1.2 Statistical Perspective
Mediation Analysis answers how or why an independent variable (IV) influences a dependent variable (DV). A mediating variable (\(M\)) sits in the causal chain between the IV and DV, partially or fully carrying the effect.
The Mediation Framework:
\[IV \xrightarrow{a} M \xrightarrow{b} DV\]
Path \(a\): Effect of IV on the mediator \(M\).
Path \(b\): Effect of \(M\) on DV, controlling for IV.
Direct path (\(c'\)): Remaining direct effect of IV on DV after controlling for \(M\).
Indirect effect (\(a \times b\)): The portion of IV’s effect on DV transmitted through \(M\).
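The paths listed above correspond to two regression equations, sketched here with intercepts \(i_1, i_2\) and error terms \(e_1, e_2\) as notational assumptions:

\[M = i_1 + a \cdot IV + e_1\]
\[DV = i_2 + c' \cdot IV + b \cdot M + e_2\]

In linear models of this form, the total effect of IV on DV decomposes into the direct and indirect parts:

\[c = c' + a \times b\]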
Types of Mediation:
Full Mediation: \(c'\) becomes non-significant after introducing \(M\); the entire effect is indirect.
Partial Mediation: Both direct (\(c'\)) and indirect (\(a \times b\)) effects are significant.
Hypothesis Tests:
Path | \(H_0\) | \(H_1\)
Path \(a\) (IV → M) | \(a = 0\): Brand Image has no effect on Satisfaction | \(a \neq 0\): Significant effect exists
Path \(b\) (M → DV) | \(b = 0\): Satisfaction has no effect on Loyalty | \(b \neq 0\): Significant effect exists
Direct (\(c'\)) | \(c' = 0\): No direct IV → DV effect remains | \(c' \neq 0\): Direct effect persists after mediation
Indirect (\(a \times b\)) | Indirect effect \(= 0\): No mediation | Bootstrapped CI excludes zero: Mediation confirmed
Significance via Bootstrapping: The indirect effect \(a \times b\) is not normally distributed. We use bootstrapping (5,000 resamples) to build 95% CIs. If the CI does not include zero, mediation is significant.
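As a minimal sketch of the bootstrapping idea described above, the following example estimates paths \(a\), \(b\), and \(c'\) with ordinary least squares and builds a percentile bootstrap CI for \(a \times b\). The data are synthetic stand-ins (not the Gogoro survey), the helper names `ols` and `indirect_effect` are hypothetical, and 2,000 resamples are used instead of 5,000 only to keep the run short.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption: NOT the Gogoro dataset)
n = 300
brand = rng.normal(size=n)                                        # IV
sat = 0.6 * brand + rng.normal(scale=0.8, size=n)                 # mediator M
loyal = 0.5 * sat + 0.2 * brand + rng.normal(scale=0.8, size=n)   # DV


def ols(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def indirect_effect(iv, m, dv):
    a = ols(m, iv)[1]        # path a: IV -> M
    b = ols(dv, iv, m)[2]    # path b: M -> DV, controlling for IV
    return a * b


point_estimate = indirect_effect(brand, sat, loyal)

# Percentile bootstrap: resample respondents with replacement
boot = np.empty(2000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[i] = indirect_effect(brand[idx], sat[idx], loyal[idx])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

c_prime = ols(loyal, brand, sat)[1]   # direct effect c'
total = ols(loyal, brand)[1]          # total effect c

print(f"a*b = {point_estimate:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
print(f"c' = {c_prime:.3f}, c = {total:.3f}")
```

With OLS on a single sample, the total effect decomposes exactly as \(c = c' + a \times b\), which is a convenient sanity check on the estimates.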
Industry Application: In digital marketing, advertising spend (IV) increases e-commerce sales (DV) through increased website traffic (Mediator). Understanding this mediation determines whether to invest in ad reach or site experience.
Interpreting the Output: The summary table reports four key quantities. ACME (Average Causal Mediation Effect) is the indirect effect \(a \times b\); if its 95% bootstrapped CI excludes zero, Customer Satisfaction significantly mediates the Brand→Loyalty path. ADE (Average Direct Effect) is the direct path \(c'\); a non-significant ADE alongside a significant ACME indicates full mediation. Total Effect (\(c = c' + a \times b\)) is the overall effect of Brand on Loyalty, combining the direct and indirect paths. Prop. Mediated shows the fraction of the total effect transmitted through Satisfaction. Use the CI columns rather than p-values to judge significance, as bootstrapped CIs are more reliable for indirect effects.
5.1.4 Student Task
Create StudentName_Mediation.py in Visual Studio Code. Use statsmodels.stats.mediation.Mediation to test whether Sat1 mediates the relationship between Func1 (IV) and Loyal1 (DV). Fit the outcome model as Loyal1 ~ Func1 + Sat1 and the mediator model as Sat1 ~ Func1. Report the ACME and whether its 95% CI excludes zero. State whether mediation is full or partial.
5.1.5 Evaluation Questions
What is the conceptual definition of a mediating variable?
Describe the three causal paths in a mediation model (\(a\), \(b\), and \(c'\)).
What is the indirect effect and how is it calculated?
Why is bootstrapping preferred over traditional significance tests for indirect effects?
In the Gogoro scenario, what serves as the mediator between Brand Image and Loyalty?
5.2 Section 2: Moderating Effect
5.2.1 Objective
To test whether a third variable (moderator) changes the strength or direction of the relationship between an independent variable and a dependent variable, using interaction terms in regression.
5.2.2 Statistical Perspective
Moderation Analysis answers when or for whom a relationship holds. A moderating variable (\(W\)) interacts with the IV to produce a different effect on the DV depending on the level of \(W\).
The Moderation Model:
\[DV = b_0 + b_1 \, IV + b_2 \, W + b_3 (IV \times W) + \varepsilon\]
\(b_3\) (Interaction Term): A significant \(b_3\) (\(p < 0.05\)) confirms moderation: the effect of IV on DV depends on the value of \(W\).
Hypothesis Test (Interaction Term):
\(H_0\): \(b_3 = 0\). The moderator does not change the strength or direction of the IV → DV relationship; the effect of Brand Image on Satisfaction is the same regardless of Usability level.
\(H_1\): \(b_3 \neq 0\). The effect of Brand Image on Satisfaction depends on the level of the moderating variable.
Decision rule: If \(p < 0.05\) for Brand_x_Usab, reject \(H_0\) and conclude moderation is statistically supported.
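Mean-Centering: Both variables are mean-centered before creating the interaction term. This reduces the correlation between the product term and its component variables (nonessential multicollinearity) and lets each main effect be read as the effect at the mean of the other variable.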
Simple Slopes: Plot the IV-DV relationship at low (−1 SD), mean, and high (+1 SD) levels of \(W\) to interpret the moderation.
Types of Moderation:
Enhancing: Higher \(W\) strengthens the IV → DV relationship.
Buffering: Higher \(W\) weakens the IV → DV relationship.
Crossover: The direction of IV → DV flips depending on \(W\).
Industry Application: Brand Image (\(IV\)) may have a stronger effect on Customer Satisfaction (\(DV\)) among environmentally conscious consumers (\(W\) = green attitude). The moderator reveals that the same marketing message resonates differently across customer segments.
Interpreting the Output: The coefficient table has four rows: const (intercept), Brand_c (main effect of Brand Image), Usab_c (main effect of Usability), and Brand_x_Usab (interaction term). Focus on Brand_x_Usab: the coefficient indicates the direction of moderation (positive = amplifying effect; negative = buffering effect), and the p-value tests \(H_0\): \(b_3 = 0\). If \(p < 0.05\), Usability significantly changes how Brand Image affects Customer Satisfaction — the moderating hypothesis is supported. The main effect coefficients (Brand_c, Usab_c) are interpreted at the mean of the other variable due to mean-centering.
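The moderation steps described above (mean-centering, building the interaction term, and reading simple slopes) can be sketched as follows. This is an illustration on synthetic data with an assumed true interaction of 0.30, not the Gogoro results; significance is judged here with a t-statistic against roughly 1.96 rather than an exact p-value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): Satisfaction depends on Brand, Usability,
# and their interaction.
n = 400
brand = rng.normal(4.0, 1.0, size=n)   # IV
usab = rng.normal(3.5, 1.0, size=n)    # moderator W
sat = (1.0 + 0.5 * brand + 0.3 * usab
       + 0.3 * (brand - 4.0) * (usab - 3.5)
       + rng.normal(scale=0.7, size=n))

# Mean-center both predictors, then form the interaction term
brand_c = brand - brand.mean()
usab_c = usab - usab.mean()
brand_x_usab = brand_c * usab_c

# OLS: Sat = b0 + b1*Brand_c + b2*Usab_c + b3*Brand_x_Usab
X = np.column_stack([np.ones(n), brand_c, usab_c, brand_x_usab])
coef, *_ = np.linalg.lstsq(X, sat, rcond=None)
b0, b1, b2, b3 = coef

# Classical OLS standard errors and the t-statistic for b3
resid = sat - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_b3 = b3 / se[3]

# Simple slopes: effect of Brand at low (-1 SD), mean, and high (+1 SD) W
sd_w = usab_c.std(ddof=1)
simple_slopes = {
    "low": b1 - b3 * sd_w,
    "mean": b1,
    "high": b1 + b3 * sd_w,
}

print(f"b3 = {b3:.3f}, t = {t_b3:.2f}")
print({k: round(v, 3) for k, v in simple_slopes.items()})
```

A positive \(b_3\) makes the simple slope rise from the low to the high level of \(W\), which is the enhancing pattern listed under Types of Moderation.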
5.2.4 Student Task
Create StudentName_Moderation.py in Visual Studio Code. Test whether Price1 moderates the relationship between Func1 and Sat1. Mean-center both Func1 and Price1, create the interaction term, and run OLS. Report the coefficient and p-value of the interaction term. State whether moderation is supported.
5.2.5 Evaluation Questions
What is the conceptual definition of a moderating variable?
How does moderation differ from mediation?
Why must predictors be mean-centered before creating an interaction term?
If the interaction term coefficient is \(b_3 = 0.22\), \(p = 0.03\), what does this mean?
How do “simple slopes” help interpret a significant moderation effect?
5.3 Section 3: Structural Equation Modeling
5.3.1 Objective
To build a Structural Equation Model (SEM) that simultaneously tests the full measurement model and structural paths linking Brand Image, Function, Usability, Price, Customer Satisfaction, and Customer Loyalty.
Keywords: structural equation modeling (SEM), confirmatory factor analysis (CFA), path analysis, CFI, RMSEA, SRMR, latent variable, model fit indices
5.3.2 Statistical Perspective
Structural Equation Modeling (SEM) integrates confirmatory factor analysis (CFA) and path analysis into a single framework. It simultaneously estimates:
Measurement model: How well observed items reflect latent constructs (loadings, AVE, reliability).
Structural model: Directional paths between latent constructs (coefficients, p-values).
Key Components:
Latent Variables: Unobserved constructs inferred from multiple indicators (e.g., Brand Image measured by Brand1, Brand2).
Structural Paths (\(\gamma, \beta\)): Directional coefficients between latent variables.
Error Terms: Measurement and structural residuals.
Model Fit Indices:
Index | Acceptable | Good
CFI | \(> 0.90\) | \(> 0.95\)
RMSEA | \(< 0.08\) | \(< 0.06\)
SRMR | \(< 0.08\) | \(< 0.05\)
\(\chi^2/df\) | \(< 5.0\) | \(< 3.0\)
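To make the thresholds above concrete, the sketch below computes RMSEA, CFI, and \(\chi^2/df\) from chi-square statistics using their common textbook definitions and rates each value against the table. The input numbers are hypothetical, the helper names are illustrative, and some software uses \(N\) rather than \(N - 1\) in the RMSEA denominator.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to the baseline (null) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def rate(value, acceptable, good, higher_is_better=False):
    """Classify a fit index using the Acceptable / Good cut-offs."""
    if higher_is_better:
        return "good" if value > good else "acceptable" if value > acceptable else "poor"
    return "good" if value < good else "acceptable" if value < acceptable else "poor"


# Hypothetical fit statistics (assumption, not from the Gogoro model)
chi2_m, df_m, n_obs = 35.2, 14, 300
chi2_b, df_b = 900.0, 28

rmsea_value = rmsea(chi2_m, df_m, n_obs)
cfi_value = cfi(chi2_m, df_m, chi2_b, df_b)
ratio = chi2_m / df_m

print(f"RMSEA = {rmsea_value:.3f} ({rate(rmsea_value, 0.08, 0.06)})")
print(f"CFI = {cfi_value:.3f} ({rate(cfi_value, 0.90, 0.95, higher_is_better=True)})")
print(f"chi2/df = {ratio:.2f} ({rate(ratio, 5.0, 3.0)})")
```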
Hypothesis Tests (Structural Paths):
For each path coefficient \(\beta\) in the structural model:
\(H_0\): \(\beta = 0\). The predictor construct has no significant directional effect on the outcome construct.
\(H_1\): \(\beta \neq 0\). A statistically significant structural relationship exists.
Decision rule: If \(p < 0.05\) for a path coefficient, reject \(H_0\) and retain the hypothesized causal path in the model.
Advantage over Regression: SEM accounts for measurement error in latent constructs by explicitly modelling each observed item as a fallible indicator of its latent variable (\(x_i = \lambda_i F + \epsilon_i\)). This produces more accurate path estimates and tests the entire theoretical model simultaneously. semopy estimates the model by maximum likelihood (a Wishart-based likelihood by default, with full-information maximum likelihood (FIML) available as an option) and reports fit indices such as CFI and RMSEA that are commonly required for academic publication.
Industry Application: Management consulting firms use SEM to model the full chain from “Employee Engagement” → “Service Quality” → “Customer Satisfaction” → “Revenue Growth” in a single validated model.
5.3.3 Python Example Code
import pandas as pd
import semopy

df = pd.read_csv('gogoro_data.csv')

# Define the full SEM: measurement model + structural paths
model_desc = """
# Measurement model (latent variables)
Brand =~ Brand1 + Brand2
Func =~ Func1 + Func2
Sat =~ Sat1 + Sat2
Loyal =~ Loyal1 + Loyal2

# Structural paths
Sat ~ Brand + Func
Loyal ~ Sat + Brand + Func
"""

model = semopy.Model(model_desc)
model.fit(df)

# Path coefficients and significance
print("=== Path Coefficients ===")
print(model.inspect().to_string(index=False))

# Model fit indices
print("\n=== Model Fit Indices ===")
stats = semopy.calc_stats(model)
print(stats.to_string())
Interpreting the Output: The Path Coefficients table lists every estimated relationship. The Estimate column gives the path coefficient (unstandardized by default; calling model.inspect(std_est=True) adds an Est. Std column with standardized values), and p-value tests \(H_0\): path = 0. Focus on the Sat ~ Brand and Sat ~ Func rows to see which antecedents drive Satisfaction, and the Loyal ~ Sat row to confirm the mediating role. For Model Fit Indices: CFI > 0.95 indicates good fit (values above 0.90 are acceptable); RMSEA < 0.08 indicates acceptable fit (< 0.06 is good); SRMR < 0.08, where it is reported, indicates that the model reproduces the observed correlations well. A well-fitting SEM justifies interpreting the structural path estimates as theoretically meaningful.
5.3.4 Student Task
Create StudentName_SEM.py in Visual Studio Code. Extend the measurement model to include Usab =~ Usab1 + Usab2 and Price =~ Price1 + Price2. Add Usab and Price as additional predictors of Sat in the structural model. Re-fit with semopy and report the path coefficients for all four predictors of Satisfaction. Which construct has the largest standardized path coefficient? Check CFI and RMSEA to confirm model fit.
5.3.5 Evaluation Questions
What two sub-models does SEM combine, and what does each assess?
Why are latent variables preferred over simple sum scores in SEM?
What does CFI \(> 0.95\) indicate about a model?
Why does SEM produce more accurate path coefficients than standard regression?
In the Gogoro SEM, what is the hypothesized role of Customer Satisfaction?
5.4 Section 4: Integrated Model Testing
5.4.1 Objective
To combine mediation and moderation in a unified framework (testing moderated mediation) and evaluate whether the indirect effect of Brand Image on Loyalty through Satisfaction varies across levels of a moderating variable.
Keywords: moderated mediation, Index of Moderated Mediation (IMM), conditional indirect effect, interaction term, integrated model, Hayes PROCESS
5.4.2 Statistical Perspective
Moderated Mediation examines whether the strength of a mediated pathway (\(IV \xrightarrow{a} M \xrightarrow{b} DV\)) depends on a moderating variable \(W\).
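The moderator enters the mediator equation through an interaction term:
\[M = a_0 + a_1 \, IV + a_2 \, W + a_3 (IV \times W) + \varepsilon_M\]
A significant \(a_3\) (\(p < 0.05\)) indicates that path \(a\), and therefore the conditional indirect effect \((a_1 + a_3 W) \times b\), differs across levels of \(W\).
Index of Moderated Mediation (IMM): \(a_3 \times b\). If its 95% bootstrapped CI excludes zero, moderated mediation is confirmed.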
Why It Matters: Brand Image may increase Satisfaction (mediated path), but this effect may be stronger for customers who are highly price-sensitive (moderated by Price perception). Integrated testing captures this complexity in a single testable model.
Industry Application: A streaming service finds that content quality (IV) boosts subscriber retention (DV) through perceived value (Mediator), but this indirect path is significantly stronger for price-sensitive users (Moderator) — directing targeted pricing strategies to high-risk churn segments.
Interaction (a3): 0.038, p=0.656
Path b (Sat->Loyal): -0.065
Index of Moderated Mediation (IMM): -0.002
Interpreting the Output:
Line 1, Interaction (\(a_3\)): The coefficient on Brand_x_Price in the Satisfaction equation. If \(p < 0.05\), \(H_0\) for the moderated path is rejected and Price significantly moderates how Brand Image drives Satisfaction. In the sample output above, \(p = 0.656\), so \(H_0\) is not rejected.
Line 2, Path \(b\): The effect of mean-centered Satisfaction on Loyalty; a significant value confirms the mediating link.
Line 3, IMM: The product \(a_3 \times b\) quantifies moderated mediation. A positive IMM indicates the indirect Brand → Sat → Loyal path is stronger at higher Price levels; a negative IMM indicates it weakens.
Formal significance of the IMM requires bootstrapped confidence intervals (95% CI excluding zero = confirmed moderated mediation).
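The three quantities interpreted above can be reproduced in a few lines. The following is a sketch under stated assumptions: the data are synthetic with a deliberately strong interaction, the function names `fit` and `paths` are hypothetical, and the outcome equation follows the simple form Loyal ~ Sat + Brand. It will therefore not reproduce the sample output shown earlier.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic data with a built-in interaction (assumption)
n = 500
brand = rng.normal(size=n)   # IV
price = rng.normal(size=n)   # moderator W
sat = 0.4 * brand + 0.2 * price + 0.3 * brand * price + rng.normal(scale=0.8, size=n)
loyal = 0.5 * sat + 0.1 * brand + rng.normal(scale=0.8, size=n)


def fit(y, *cols):
    """OLS coefficients [intercept, ...] for y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]


def paths(brand, price, sat, loyal):
    brand_c = brand - brand.mean()
    price_c = price - price.mean()
    # Mediator equation: Sat ~ Brand_c + Price_c + Brand_x_Price
    _, a1, _, a3 = fit(sat, brand_c, price_c, brand_c * price_c)
    # Outcome equation: Loyal ~ Sat + Brand_c
    b = fit(loyal, sat, brand_c)[1]
    return a1, a3, b


a1, a3, b = paths(brand, price, sat, loyal)
imm = a3 * b   # Index of Moderated Mediation

# Percentile bootstrap CI for the IMM
boot = np.empty(1000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)
    _, a3_i, b_i = paths(brand[idx], price[idx], sat[idx], loyal[idx])
    boot[i] = a3_i * b_i
imm_low, imm_high = np.percentile(boot, [2.5, 97.5])

# Conditional indirect effects (a1 + a3*W) * b at -1 SD, mean, +1 SD of W
sd_w = price.std(ddof=1)
conditional = {
    "low": (a1 - a3 * sd_w) * b,
    "mean": a1 * b,
    "high": (a1 + a3 * sd_w) * b,
}

print(f"a3 = {a3:.3f}, b = {b:.3f}, IMM = {imm:.3f}")
print(f"95% CI for IMM: [{imm_low:.3f}, {imm_high:.3f}]")
```

Comparing the conditional indirect effects at low and high \(W\) is what tells you where the mediated path is stronger; the IMM is the slope of that comparison.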
5.4.4 Student Task
Create StudentName_IntegratedModel.py in Visual Studio Code. Test whether Usab1 moderates the mediated path from Func1 (IV) through Sat1 (Mediator) to Loyal1 (DV). Report the Index of Moderated Mediation (IMM = \(a_3 \times b\)) and interpret whether the conditional indirect effect is stronger at high or low Usability.
5.4.5 Evaluation Questions
What is moderated mediation, and how does it differ from simple mediation?
What is the Index of Moderated Mediation (IMM) and how is it interpreted?
Why must the interaction term be included in the mediator equation (path \(a\))?
If IMM = \(0.08\) with a 95% CI of \([0.02, 0.18]\), what is the conclusion?
How does integrated model testing provide more complete insights than mediation or moderation alone?
6 Module 4: Machine Learning for Business
This final module introduces machine learning algorithms applied to business analytics — covering unsupervised learning (clustering), supervised classification, supervised regression, and model evaluation.
6.1 Section 1: Clustering
6.1.1 Objective
To apply K-Means clustering to segment Gogoro customers into distinct groups based on their construct scores, enabling targeted marketing strategies.
Keywords: K-Means, unsupervised learning, Elbow Method, within-cluster sum of squares (WCSS), StandardScaler, customer segmentation, centroid
6.1.2 Statistical Perspective
Clustering is an unsupervised machine learning technique — no pre-defined labels are required. The algorithm discovers natural groupings in the data by minimizing within-cluster variance.
K-Means Algorithm: Partitions \(n\) observations into \(K\) clusters by iteratively:
Randomly initializing \(K\) cluster centroids.
Assigning each data point to the nearest centroid (Euclidean distance).
Recalculating centroids as the mean of assigned points.
Repeating until assignments no longer change (convergence).
Choosing K — The Elbow Method: Run K-Means for \(K = 1, 2, \ldots, 10\) and plot the Within-Cluster Sum of Squares (WCSS). The optimal \(K\) is at the “elbow” — where WCSS decreases more slowly (diminishing returns from adding clusters).
Feature Scaling: K-Means uses distance, so all features must be standardized (mean = 0, SD = 1) before clustering to prevent high-variance variables from dominating.
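The four K-Means steps, the scaling requirement, and the Elbow Method can be sketched directly in NumPy. This is an illustrative from-scratch version on synthetic two-feature data, not the course's original example; in practice a library implementation with smarter initialization and multiple restarts would normally be preferred.

```python
import numpy as np


def standardize(X):
    """Scale each column to mean 0 and SD 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for each row."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct observations as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)                  # Step 2: assign
        updated = np.array([                           # Step 3: recompute means
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):            # Step 4: stop when stable
            break
        centroids = updated
    labels = assign(X, centroids)
    wcss = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss


# Synthetic construct scores for three loose groups (assumption)
rng = np.random.default_rng(11)
centers = np.array([[2.0, 2.0], [4.0, 4.5], [4.5, 2.0]])
raw = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])
X = standardize(raw)

# Elbow Method: WCSS for K = 1..6
wcss_by_k = {k: kmeans(X, k)[2] for k in range(1, 7)}
for k, w in wcss_by_k.items():
    print(f"K={k}: WCSS={w:.1f}")

labels, centroids, _ = kmeans(X, 3)
```

Because the features are standardized, WCSS at \(K = 1\) equals the number of observations times the number of features, which gives a fixed starting point for reading the elbow plot.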
Industry Application: Gogoro can use clustering to identify customer segments: e.g., Cluster 1 = price-sensitive commuters; Cluster 2 = eco-conscious urban riders; Cluster 3 = brand-loyal enthusiasts. Each segment receives a different marketing message.
6.1.4 Student Task
Create StudentName_Clustering.py in Visual Studio Code. Add Price and Usab construct scores to the feature set and re-run K-Means with \(K = 3\). Print the cluster means for all six constructs. Write a one-sentence business description for each cluster based on their characteristic scores.
6.1.5 Evaluation Questions
What is the difference between supervised and unsupervised machine learning?
How does the K-Means algorithm assign data points to clusters?
Why must features be standardized before applying K-Means?
What does the Elbow Method reveal, and how is the optimal \(K\) identified?
How can customer clusters from K-Means inform Gogoro’s marketing strategy?
6.2 Section 2: Classification
6.2.1 Objective
To predict a binary outcome (High vs Low Loyalty) using Logistic Regression and Decision Trees, and evaluate classifier performance using precision, recall, and F1-Score.
6.2.2 Statistical Perspective
Classification is a supervised machine learning task where the model learns to assign observations to predefined categories based on labeled training data.
Common Algorithms:
Logistic Regression: Models the probability of a binary outcome using the sigmoid function:
\[P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}\]
A threshold (typically \(0.50\)) converts probabilities to class labels.
Decision Tree: Recursively splits the feature space using if-then rules, selecting splits that maximize class purity (Gini impurity or information gain).
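The two mechanisms named above can be illustrated with a few lines of plain Python: how a sigmoid probability becomes a class label, and how Gini impurity scores a candidate split. The weights, intercept, and feature values below are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, intercept, threshold=0.5):
    """Return (probability, class label) for one observation."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    return p, int(p >= threshold)


def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = perfectly pure)."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def split_gini(left, right):
    """Size-weighted Gini impurity of a two-way split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical model: features are [Sat score, Brand score]
weights, intercept = [0.8, 0.5], -5.0
p_high, label_high = predict_label([4.5, 4.0], weights, intercept)
p_low, label_low = predict_label([2.0, 3.0], weights, intercept)
print(f"p={p_high:.3f} -> class {label_high}")
print(f"p={p_low:.3f} -> class {label_low}")

# A split is useful when it lowers impurity relative to the parent node
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [1, 1, 1, 0]
print(f"parent Gini={gini(parent):.3f}, split Gini={split_gini(left, right):.3f}")
```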
Model Evaluation Metrics:
Metric | Formula | Interpretation
Accuracy | (TP+TN) / Total | Overall correct predictions
Precision | TP / (TP+FP) | Correctness of positive predictions
Recall | TP / (TP+FN) | Coverage of actual positives
F1-Score | 2·P·R / (P+R) | Harmonic mean of precision and recall
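The formulas in the table can be checked by hand on a small example. The sketch below counts TP, FP, FN, and TN from two lists of labels and applies each formula; the label vectors are made up for illustration, and the second case shows why accuracy alone can mislead on imbalanced classes.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = High Loyalty, 0 = Low Loyalty
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in metrics.items()})

# Imbalanced case: always predicting the majority class
y_imbalanced = [1] + [0] * 9
always_low = [0] * 10
naive = classification_metrics(y_imbalanced, always_low)
print({k: round(v, 3) for k, v in naive.items()})
```

In the imbalanced case the classifier scores 90% accuracy while finding none of the positives, so recall and F1 are both zero.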
Train-Test Split: Data is split (80% train / 20% test) so performance is evaluated on unseen data, which exposes overfitting that training-set scores would hide.
Industry Application: Banks classify loan applications as “approve” or “reject” based on income and credit history. Gogoro can classify customers as “likely to repurchase” or “likely to churn” based on satisfaction and brand perception scores.
6.2.4 Student Task
Create StudentName_Classification.py in Visual Studio Code. Replace LogisticRegression with DecisionTreeClassifier from sklearn.tree. Train on the same features and print the classification report. Compare accuracy and F1-Score between both models. Which performs better on the test set?
6.2.5 Evaluation Questions
What is the difference between classification and regression in machine learning?
How does Logistic Regression produce a class prediction from a probability?
Why is accuracy alone insufficient when classes are imbalanced?
What does a high Recall but low Precision indicate about a classifier?
Why is it essential to evaluate classification models on a held-out test set?
6.3 Section 3: Regression
6.3.1 Objective
To predict continuous outcomes (Customer Satisfaction scores) using Linear Regression, Ridge Regression, and Lasso Regression, and compare their predictive performance on unseen data.
6.3.2 Statistical Perspective
Regression in machine learning predicts a continuous numerical outcome from input features. Unlike statistical regression used for hypothesis testing, ML regression focuses on predictive accuracy on unseen data.
Algorithms:
Linear Regression: Minimizes the sum of squared residuals. Best when predictors are independent and relationships are linear.
Ridge Regression (\(L_2\) Regularization): Adds penalty \(\lambda \sum \beta_j^2\) to the loss function, shrinking coefficients toward zero. Controls overfitting when predictors are correlated.
Lasso Regression (\(L_1\) Regularization): Adds penalty \(\lambda \sum |\beta_j|\), which can reduce some coefficients to exactly zero — performing automatic feature selection.
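A short numerical sketch can show what the two penalties do to coefficients. The ridge solution below uses the closed form \((X^\top X + \lambda I)^{-1} X^\top y\) on centered data. For lasso, only the soft-thresholding operator is shown; it gives the exact lasso solution in the special case of an orthonormal design and is used here just to illustrate how coefficients reach exactly zero. The data and the example coefficient vector are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic predictors: x2 is almost a copy of x1 (strong multicollinearity)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Center so that no intercept is needed (and none is penalized)
Xc = X - X.mean(axis=0)
yc = y - y.mean()


def ridge(Xc, yc, lam):
    """Closed-form ridge coefficients; lam = 0 gives ordinary least squares."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def soft_threshold(beta, t):
    """Shrink each coefficient toward zero by t; small ones become exactly 0."""
    return np.sign(beta) * np.maximum(np.abs(beta) - t, 0.0)


coefs = {lam: ridge(Xc, yc, lam) for lam in (0.0, 10.0, 100.0)}
norms = {lam: float(np.linalg.norm(b)) for lam, b in coefs.items()}
for lam, b in coefs.items():
    print(f"lambda={lam:>5}: {np.round(b, 3)}  norm={norms[lam]:.3f}")

# L1-style shrinkage on an illustrative coefficient vector
shrunk = soft_threshold(np.array([0.9, -0.05, 0.3]), 0.1)
print(shrunk)
```

Ridge pulls every coefficient toward zero without eliminating any, while the thresholding step removes the smallest coefficient entirely, which is the feature-selection behaviour attributed to lasso.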
Bias-Variance Trade-off:
High Bias (Underfitting): Model too simple; misses true patterns.
High Variance (Overfitting): Model memorizes training data; fails on new data.
Regularization reduces variance at the cost of slight bias — improving generalization.
Evaluation Metrics:
Metric | Interpretation
MAE | Mean absolute error; easy to interpret
RMSE | Penalizes large errors more heavily
\(R^2\) | Proportion of variance explained (higher = better)
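The three metrics can be computed by hand to see how they behave. The sketch below uses made-up values; it also shows that \(R^2\) measured on held-out data can be negative, which happens when predictions are worse than simply predicting the mean of the observed values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up satisfaction scores on a 1-5 scale
y_true = [3.0, 4.0, 5.0, 4.0]

decent = regression_metrics(y_true, [3.5, 3.5, 4.5, 4.5])
mean_only = regression_metrics(y_true, [4.0, 4.0, 4.0, 4.0])
poor = regression_metrics(y_true, [5.0, 4.0, 3.0, 4.0])

print("decent:   ", decent)
print("mean only:", mean_only)
print("poor:     ", poor)
```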
Industry Application: Gogoro’s sales team can predict next quarter’s Customer Satisfaction scores from current brand perception and functional quality metrics, enabling proactive service interventions before loyalty declines.
Linear RMSE=0.771 R2=-0.008
Ridge RMSE=0.771 R2=-0.008
Lasso RMSE=0.769 R2=-0.003
6.3.4 Student Task
Create StudentName_Regression.py in Visual Studio Code. Using the same feature set and train-test split, predict Loyal (average of Loyal1 and Loyal2) instead of Sat. Compare Linear, Ridge, and Lasso using RMSE and \(R^2\) on the test set. Adjust the alpha hyperparameter for Ridge and Lasso and observe the effect on test performance.
6.3.5 Evaluation Questions
What is the fundamental difference between regression for hypothesis testing and regression in machine learning?
How does Ridge Regression prevent overfitting compared to standard Linear Regression?
What unique property does Lasso Regression have that Ridge does not?
Why is RMSE evaluated on the test set rather than the training set?
In the bias-variance trade-off, where does a heavily regularized model sit, and what are the implications?
6.4 Section 4: Model Evaluation and Selection
6.4.1 Objective
To systematically compare machine learning models using cross-validation and performance metrics, and select the best model for deployment in a business context.
6.4.2 Statistical Perspective
Building a model is only half the job. Model evaluation determines whether a trained model will perform reliably on future, unseen business data. Model selection chooses the best algorithm and hyperparameters using principled, data-driven criteria.
Cross-Validation (k-Fold CV): The dataset is split into \(k\) equal folds. The model trains on \(k-1\) folds and validates on the remaining fold, repeated \(k\) times so that each fold serves once as the validation set. The final score is the mean across all folds:
\[\text{CV Score} = \frac{1}{k} \sum_{i=1}^{k} \text{Score}_i\]
Hyperparameter Tuning — Grid Search: Systematically searches a predefined grid of hyperparameter values, evaluating each combination via cross-validation and selecting the best-performing combination.
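Cross-validation and grid search can be sketched together in a few lines of NumPy. The example below is an illustration on synthetic data: it scores a closed-form ridge regression with 5-fold CV for each value in a small, arbitrary grid of penalty strengths and keeps the value with the lowest mean RMSE. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic regression data (assumption)
n, p = 120, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5]) + rng.normal(scale=1.0, size=n)


def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta


def kfold_rmse(X, y, alpha, k=5, seed=0):
    """Mean and SD of validation RMSE across k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, intercept = ridge_fit(X[train], y[train], alpha)
        pred = X[val] @ beta + intercept
        scores.append(np.sqrt(np.mean((y[val] - pred) ** 2)))
    return float(np.mean(scores)), float(np.std(scores))


# Grid search: evaluate every candidate with the same folds
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_results = {alpha: kfold_rmse(X, y, alpha) for alpha in grid}
best_alpha = min(cv_results, key=lambda a: cv_results[a][0])

for alpha, (mean_rmse, sd_rmse) in cv_results.items():
    print(f"alpha={alpha:>6}: RMSE = {mean_rmse:.3f} +/- {sd_rmse:.3f}")
print("best alpha:", best_alpha)
```

Using the same fold assignment for every candidate keeps the comparison fair, since differences in score then reflect the hyperparameter rather than the split.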
Learning Curves: Plot training score and CV score as a function of training set size:
High bias (underfitting): Both scores are low — add complexity.
High variance (overfitting): Training score is high but CV score is much lower — reduce complexity or gather more data.
Model Comparison Framework:
Model | Strengths | Weaknesses
Logistic Regression | Interpretable, fast | Linear boundaries only
Decision Tree | Intuitive rules | Prone to overfitting
Random Forest | High accuracy, robust | Less interpretable
Ridge/Lasso | Handles multicollinearity | Still linear
Industry Application: Gogoro’s data team tests three churn prediction models side-by-side using 5-fold CV. The model with the best trade-off between F1-Score and interpretability is chosen for deployment — because executives need to understand why a customer is flagged as at-risk, not just that they are.
Logistic Regression F1 = 0.168 +/- 0.115
Decision Tree F1 = 0.363 +/- 0.121
6.4.4 Student Task
Create StudentName_ModelEvaluation.py in Visual Studio Code. Add a third model — RandomForestClassifier from sklearn.ensemble (use n_estimators=100) — to the comparison loop. Report the 5-fold CV F1-Score for all three models. Which model would you recommend for Gogoro’s churn prediction task? Justify your choice in a comment using at least two criteria (e.g., performance, interpretability, training time).
6.4.5 Evaluation Questions
Why is k-fold cross-validation preferred over a single train-test split for model evaluation?
What does a large gap between training score and cross-validation score indicate?
How does Grid Search help in selecting the best model hyperparameters?
Why might a business prefer a slightly less accurate model that is more interpretable?
What considerations beyond accuracy matter when selecting a machine learning model for deployment?
7 Appendix: Standard Answers to Evaluation Questions
7.1 Module 1: Data Preprocessing
7.1.1 Section 1: Data Cleaning
Why is data cleaning the necessary first step before any statistical analysis? Data cleaning ensures that results are not distorted by errors, missing values, or careless responses. Invalid inputs produce invalid outputs (“garbage in, garbage out”), undermining the credibility of all downstream analysis.
What does straight-lining indicate about a respondent’s survey answers? Straight-lining means a respondent selected the same answer for every item regardless of content, indicating inattentiveness or disengagement. It introduces artificial consistency that does not reflect genuine opinions.
How can missing data skew statistical results? Missing data reduces sample size and statistical power. If data are not missing at random, remaining cases may be systematically biased, leading to incorrect parameter estimates and conclusions.
What does the dropna() function do in pandas? dropna() removes rows (or columns) containing any NaN (missing) values from a DataFrame, producing a complete-case dataset for analysis.
Statistically, why do we calculate variance across a row to detect bad survey responses? A genuine respondent will express varying opinions across items, producing nonzero row variance. A row variance near zero signals that the respondent answered identically throughout, indicating straight-lining behavior.
7.1.2 Section 2: Descriptive Statistics
What is central tendency in statistics? Central tendency is the measure that identifies the center of a distribution. The three main measures are the mean (arithmetic average), median (middle value when sorted), and mode (most frequently occurring value).
How does standard deviation explain the dispersion of the data? Standard deviation quantifies the typical distance of data points from the mean (the square root of the average squared deviation). A large SD indicates high variability; a small SD indicates that observations cluster tightly around the mean.
Why do we need to know the frequencies of demographic variables? Demographic frequencies reveal the composition of the sample (gender, age, education), allowing researchers to assess representativeness and evaluate whether the sample generalizes to the target population.
What pandas method provides a quick statistical summary of a numerical column? df.describe() returns count, mean, standard deviation, min, 25th/50th/75th percentile, and max for each numerical column in the DataFrame.
How can descriptive statistics help determine if a sample represents a target population? Researchers compare sample descriptive statistics (means, proportions) to known population parameters. Large discrepancies signal potential sampling bias and threaten external validity.
7.1.3 Section 3: Outlier Detection
What is the difference between a univariate and a multivariate outlier? A univariate outlier is extreme on a single variable. A multivariate outlier may appear normal on each individual variable but is anomalous in its combination of values across multiple variables simultaneously.
What does Mahalanobis distance measure? Mahalanobis distance measures how far a data point is from the centroid of the multivariate distribution, accounting for correlations among variables and differences in scale between them.
Why are multivariate outliers problematic in survey data analysis? Multivariate outliers can distort covariance estimates, inflate standard errors, and skew regression coefficients, producing misleading results that do not generalize to the broader population.
What role does the \(\chi^2\) distribution play in outlier detection? The squared Mahalanobis distance approximately follows a \(\chi^2\) distribution with degrees of freedom equal to the number of variables. Points exceeding the critical value (e.g., \(p < 0.001\)) are flagged as outliers.
Statistically, what does the covariance matrix represent? The covariance matrix captures how pairs of variables vary together. In Mahalanobis distance, it normalizes the distance metric so that variables with different scales and inter-correlations contribute equally to the distance.
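The answers above can be illustrated with a small sketch. It plants one respondent whose two scores are each only about two standard deviations from the mean but whose combination contradicts the strong positive correlation in the rest of the sample. The data are synthetic, and the critical value uses the fact that for two variables the \(\chi^2\) cut-off at \(p = 0.001\) equals \(-2 \ln(0.001) \approx 13.82\).

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic scores on two strongly correlated items (assumption)
cov = [[1.0, 0.8], [0.8, 1.0]]
data = rng.multivariate_normal([4.0, 4.0], cov, size=200)

# Planted respondent: high on item 1, low on item 2
data = np.vstack([data, [[6.0, 2.0]]])

# Squared Mahalanobis distance of every row from the centroid
center = data.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - center
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Chi-square critical value for df = 2 at p = 0.001
critical = -2.0 * np.log(0.001)
outliers = np.where(d2 > critical)[0]

# Univariate z-scores of the planted respondent, for comparison
z_planted = (data[-1] - center) / data.std(axis=0, ddof=1)

print(f"critical value = {critical:.2f}")
print(f"planted d2 = {d2[-1]:.1f}, z-scores = {np.round(z_planted, 2)}")
print("flagged rows:", outliers)
```

Neither z-score exceeds a conventional cut-off of 3, yet the squared distance is well above the critical value, which is exactly the multivariate outlier case.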
7.1.4 Section 4: Normality Testing
Why is the assumption of normality important in parametric statistics? Parametric tests (OLS regression, ANOVA, t-tests) assume that residuals or data follow a normal distribution. Violations can produce biased standard errors and incorrect p-values, leading to wrong inferences.
What does a high positive skewness tell us about the survey responses? High positive skewness indicates a long right tail — most respondents cluster at lower values with a few extreme high scores. This asymmetry can inflate the mean and distort mean-based statistics.
What is Kurtosis? Kurtosis measures the “tailedness” of a distribution. Positive kurtosis (leptokurtic) indicates heavy tails with many extreme values; negative kurtosis (platykurtic) indicates light tails and a relatively flat distribution.
If our data fails the normality test, what alternative statistical procedure can we use? Researchers can use non-parametric tests (Spearman correlation, Mann-Whitney U), bootstrapping methods that require no distributional assumptions, or apply transformations (log, square root) to correct skewness.
How can non-normal data affect the p-values in an OLS regression model? Non-normality inflates or deflates standard errors, producing inaccurate t-statistics and p-values. Outlier-driven skew can cause coefficients to be disproportionately influenced by a small number of extreme observations.
7.2 Module 2: Measurement Model Verification
7.2.1 Section 1: Reliability
What does reliability mean in the context of a survey measurement model? Reliability refers to the internal consistency of measurement — whether multiple items designed to measure the same construct produce correlated, consistent responses free from random error.
Why is \(\alpha \ge 0.70\) the standard threshold in management research? The threshold indicates that at least 70% of scale variance is attributable to the true underlying construct rather than random error, providing sufficient empirical confidence in the scale.
How does increasing the number of items for a construct typically affect Cronbach’s Alpha? Adding more items generally increases Alpha because the formula averages across more inter-item correlations, reducing the relative contribution of random error per individual item.
If questions about “Price” have an alpha of \(0.45\), what should the researcher conclude? An alpha of 0.45 falls far below the acceptable threshold. The researcher should review items for ambiguous wording, check for reverse-coded items, or remove the weakest items to improve internal consistency.
What is the difference between Cronbach’s Alpha and Composite Reliability? Cronbach’s Alpha assumes all items contribute equally to the construct (tau-equivalence). Composite Reliability (CR) uses actual factor loadings from CFA, relaxing this assumption and providing a more accurate reliability estimate.
7.2.2 Section 2: Convergent Validity
What does convergent validity measure? Convergent validity assesses whether multiple items designed to measure the same theoretical construct correlate strongly with each other and load significantly onto a single underlying latent factor.
In CFA, what does a factor loading of \(0.85\) imply about an item? A loading of 0.85 means \(0.85^2 = 72.25\%\) of the item’s variance is explained by the latent construct, indicating the item is a strong and reliable indicator of its intended concept.
What is the rule of thumb threshold for an acceptable factor loading? Factor loadings should be \(\lambda \ge 0.70\) in most management research contexts, indicating the item shares at least 49% of its variance with the latent construct.
What does \(AVE\) represent mathematically? \(AVE = \frac{\sum \lambda_i^2}{n}\), the average proportion of variance that the latent construct explains across all its indicator items.
Why must \(AVE\) be \(\ge 0.50\)? When \(AVE \ge 0.50\), the latent construct explains more variance in its indicators than measurement error does, confirming the scale captures genuine construct variance rather than noise.
7.2.3 Section 3: Discriminant Validity
Why is discriminant validity necessary in a survey? Discriminant validity ensures that constructs measuring distinct concepts are statistically different from each other, preventing theoretical conflation of separate constructs in structural models.
What happens to path modeling if two constructs fail discriminant validity? The model cannot reliably separate their independent effects, producing unstable, redundant, and theoretically meaningless path coefficients that cannot be trusted.
Describe the Fornell-Larcker Criterion. The Fornell-Larcker Criterion requires that each construct’s \(\sqrt{AVE}\) exceeds its correlation with every other construct in the model, confirming each construct shares more variance with its own items than with any other construct.
If \(\sqrt{AVE}\) for “Price” is \(0.60\), and its correlation with “Function” is \(0.75\), what is the conclusion? Discriminant validity fails. Since \(0.75 > 0.60\), Price and Function share more variance with each other than Price does with its own items — the two constructs are not sufficiently distinct.
Statistically, how do we combine survey items into a single construct score for basic correlation analysis? Items within each construct are averaged (or summed) to create a composite score representing the respondent’s level on that latent construct.
7.2.4 Section 4: Correlation Analysis
What does a Pearson correlation coefficient of \(r = 0.72\) indicate about two variables? A strong positive linear relationship: as one variable increases, the other tends to increase. The shared variance is \(0.72^2 = 51.8\%\), indicating substantial co-variation between the two constructs.
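A minimal sketch of the two steps involved, composite scoring and then Pearson correlation, using only the standard library. The item responses are hypothetical, and the variable names `brand_items` and `sat_items` are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical Likert responses: one row per respondent, one column per item.
brand_items = [[4, 5, 4], [3, 3, 2], [5, 5, 5], [2, 3, 2], [4, 4, 3]]
sat_items = [[4, 4], [3, 2], [5, 4], [2, 2], [3, 4]]

# Composite score = mean of the items belonging to the construct.
brand = [sum(row) / len(row) for row in brand_items]
sat = [sum(row) / len(row) for row in sat_items]

r = pearson_r(brand, sat)
print("r =", round(r, 3), "shared variance =", round(r ** 2, 3))
```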
Why is correlation analysis performed before regression or path modeling? Correlation reveals which variables are meaningfully related before causal claims are made. It also identifies potential multicollinearity that could destabilize regression coefficients in subsequent models.
What is the difference between a positive and a negative correlation? A positive correlation (\(r > 0\)) means both variables increase together. A negative correlation (\(r < 0\)) means as one variable increases, the other decreases.
If Brand and Sat have \(r = 0.80, p < 0.001\), what can the researcher conclude? Brand Image and Customer Satisfaction are strongly and significantly positively correlated. Higher brand image is associated with higher satisfaction, and this relationship is extremely unlikely to be due to chance.
What is multicollinearity, and why is a very high inter-construct correlation (\(r > 0.85\)) a problem in structural modeling? Multicollinearity occurs when two predictors are so highly correlated that their independent effects cannot be distinguished. Correlations above 0.85 suggest the constructs may be measuring the same concept, inflating standard errors and making path coefficients unstable and uninterpretable.
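One way to see why high correlations are troublesome is the variance inflation factor (VIF), a diagnostic that the text above does not name and that is introduced here only as an illustration. For the special case of exactly two predictors, \(VIF = 1 / (1 - r^2)\), and standard errors grow by roughly \(\sqrt{VIF}\).

```python
import math

def vif_two_predictors(r):
    """VIF when a model has exactly two predictors correlated at r."""
    return 1.0 / (1.0 - r ** 2)

for r in (0.50, 0.72, 0.85, 0.95):
    vif = vif_two_predictors(r)
    # sqrt(VIF) is the approximate multiplier on each coefficient's standard error.
    print(f"r = {r:.2f}  VIF = {vif:.2f}  SE multiplier = {math.sqrt(vif):.2f}")
```

With more than two predictors the VIF must be computed from the \(R^2\) of each predictor regressed on all the others, so this shortcut should not be applied to the full Gogoro model as is.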
7.3 Module 3: Mediation, Moderation, and SEM
7.3.1 Section 1: Mediating Effect
What is the conceptual definition of a mediating variable? A mediating variable lies in the causal pathway between the IV and DV, explaining the mechanism or process through which the IV exerts its influence on the DV.
Describe the three causal paths in a mediation model (\(a\), \(b\), and \(c'\)). Path \(a\): IV \(\rightarrow\) Mediator; Path \(b\): Mediator \(\rightarrow\) DV controlling for IV; Path \(c'\): Direct effect of IV \(\rightarrow\) DV after controlling for the mediator.
What is the indirect effect and how is it calculated? The indirect effect quantifies how much of the IV’s influence on the DV passes through the mediator. It is calculated as \(a \times b\) — the product of paths \(a\) and \(b\).
Why is bootstrapping preferred over traditional significance tests for indirect effects? Bootstrapping constructs an empirical confidence interval for the indirect effect by resampling thousands of times. It makes no normality assumption for the product \(a \times b\) and provides greater statistical power than the Sobel test.
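The percentile bootstrap for \(a \times b\) can be sketched with the standard library alone. Everything here is an assumption for illustration: the data are simulated (true \(a = 0.6\), \(b = 0.5\)), the function names are hypothetical, and the paths are estimated with closed-form OLS rather than a dedicated mediation package.

```python
import random

def centered_cross_sum(u, v):
    """Sum of cross-products of deviations from the mean."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v))

def indirect_effect(x, m, y):
    """a*b, with a from M ~ X and b from Y ~ X + M (closed-form OLS)."""
    sxx, smm = centered_cross_sum(x, x), centered_cross_sum(m, m)
    sxm = centered_cross_sum(x, m)
    sxy, smy = centered_cross_sum(x, y), centered_cross_sum(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample respondents with replacement, keeping (x, m, y) together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    low = estimates[int(alpha / 2 * n_boot)]
    high = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return low, high

# Simulated data standing in for Brand Image (x), Satisfaction (m), Loyalty (y).
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(200)]
m = [0.6 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.2 * xi + data_rng.gauss(0, 1) for xi, mi in zip(x, m)]

point = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
print("indirect effect:", round(point, 3))
print("95% CI:", (round(ci_low, 3), round(ci_high, 3)))
```

If the interval excludes zero, the indirect effect is treated as significant. In practice a few thousand resamples are typical; 1000 is used here only to keep the sketch quick.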
In the Gogoro scenario, what serves as the mediator between Brand Image and Loyalty? Customer Satisfaction is the mediator: Brand Image influences Satisfaction (path \(a\)), and Satisfaction influences Loyalty (path \(b\)), explaining how brand perception is converted into behavioral loyalty.
7.3.2 Section 2: Moderating Effect
What is the conceptual definition of a moderating variable? A moderating variable changes the strength or direction of the relationship between the IV and DV — it answers “for whom?” or “under what conditions?” the IV-DV relationship holds.
How does moderation differ from mediation? Mediation identifies the mechanism (how/why the IV affects the DV). Moderation identifies boundary conditions (when/for whom). Mediation introduces an intervening variable; moderation introduces an interacting variable.
Why must predictors be mean-centered before creating an interaction term? Mean-centering reduces multicollinearity between main effects and the interaction term (since the interaction is their product). It also makes regression coefficients interpretable as the effect of each variable at the mean of the other.
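The effect of centering on the correlation between a predictor and its interaction term can be demonstrated on simulated data. This is a sketch, assuming two independent 7-point Likert-style predictors generated at random; no Gogoro data are used.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
x = [rng.randint(1, 7) for _ in range(500)]  # simulated IV
w = [rng.randint(1, 7) for _ in range(500)]  # simulated moderator

# Interaction term built from the raw scores.
raw_product = [a * b for a, b in zip(x, w)]

# Interaction term built from mean-centered scores.
mx, mw = sum(x) / len(x), sum(w) / len(w)
xc = [a - mx for a in x]
wc = [b - mw for b in w]
centered_product = [a * b for a, b in zip(xc, wc)]

r_raw = pearson_r(x, raw_product)
r_centered = pearson_r(xc, centered_product)
print("corr(X, X*W) raw:     ", round(r_raw, 3))
print("corr(X, X*W) centered:", round(r_centered, 3))
```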
If the interaction term coefficient is \(b_3 = 0.22\), \(p = 0.03\), what does this mean? Moderation is statistically confirmed. The positive significant interaction indicates that the effect of the IV on the DV grows stronger as the moderator increases — the moderator amplifies the relationship.
How do “simple slopes” help interpret a significant moderation effect? Simple slopes plot the IV-DV relationship at specific levels of the moderator (typically \(\pm 1\) SD from the mean), visually revealing how the strength and direction of the relationship changes as the moderator varies.
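Simple slopes follow directly from the regression equation \(Y = b_0 + b_1 X + b_2 W + b_3 XW\): the slope of \(X\) at moderator value \(W\) is \(b_1 + b_3 W\). A minimal sketch, where \(b_3 = 0.22\) is taken from the question above and \(b_1 = 0.40\) and an SD of 1.0 for the mean-centered moderator are hypothetical values:

```python
def simple_slope(b1, b3, w):
    """Slope of Y on X at moderator value w, for Y = b0 + b1*X + b2*W + b3*X*W."""
    return b1 + b3 * w

b1 = 0.40    # hypothetical main effect of the IV
b3 = 0.22    # interaction coefficient from the worked question
sd_w = 1.0   # hypothetical SD of the mean-centered moderator

slopes = {}
for label, w in (("-1 SD", -sd_w), ("mean", 0.0), ("+1 SD", sd_w)):
    slopes[label] = simple_slope(b1, b3, w)
    print(label, round(slopes[label], 2))
```

Because \(b_3\) is positive, the slope rises as the moderator increases, which is the amplifying pattern described above.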
7.3.3 Section 3: Structural Equation Modeling
What two sub-models does SEM combine, and what does each assess? (1) Measurement Model (CFA): assesses how well observed items measure latent constructs; (2) Structural Model (Path Analysis): tests hypothesized directional relationships among latent constructs.
Why are latent variables preferred over simple sum scores in SEM? Latent variables use multiple indicators and statistically separate measurement error from true construct variance, providing more precise and unbiased estimates of relationships than fallible composite scores.
What does CFI \(> 0.95\) indicate about a model? CFI \(> 0.95\) indicates that the hypothesized model fits the data substantially better than a null independence model — a hallmark of excellent overall model fit.
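CFI compares the misfit of the hypothesized model with the misfit of the null (independence) model, where misfit is \(\chi^2 - df\) floored at zero. The sketch below uses that standard formula; the chi-square and degrees-of-freedom values are hypothetical and do not come from the Gogoro analysis.

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index from model and null-model chi-square statistics."""
    misfit_model = max(chi2_model - df_model, 0.0)
    # The denominator can never be smaller than the model's own misfit.
    misfit_null = max(chi2_null - df_null, misfit_model)
    if misfit_null == 0:
        return 1.0
    return 1.0 - misfit_model / misfit_null

# Hypothetical fit statistics for illustration.
value = cfi(chi2_model=120.0, df_model=80, chi2_null=1500.0, df_null=105)
print("CFI =", round(value, 3), "| above 0.95:", value > 0.95)
```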
Why does SEM produce more accurate path coefficients than standard regression? SEM estimates relationships on latent (error-corrected) variables, eliminating the attenuation of correlations caused by measurement error that biases ordinary regression coefficients toward zero.
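The attenuation described above can be illustrated with the classical test theory formula \(r_{observed} = r_{true} \sqrt{rel_X \cdot rel_Y}\). This is a simplified sketch, not what SEM software computes internally; the true correlation and the reliabilities are hypothetical values.

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    """Correlation expected between two fallible composite scores."""
    return r_true * math.sqrt(rel_x * rel_y)

def disattenuated_r(r_observed, rel_x, rel_y):
    """Classical correction for attenuation."""
    return r_observed / math.sqrt(rel_x * rel_y)

r_true = 0.60               # hypothetical true correlation between constructs
rel_x, rel_y = 0.80, 0.80   # hypothetical scale reliabilities

r_obs = attenuated_r(r_true, rel_x, rel_y)
print("observed r with measurement error:", round(r_obs, 2))
print("corrected r:", round(disattenuated_r(r_obs, rel_x, rel_y), 2))
```

With reliabilities of 0.80, a true correlation of 0.60 shrinks to about 0.48 in composite scores, which is the downward bias that latent-variable estimation is meant to avoid.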
In the Gogoro SEM, what is the hypothesized role of Customer Satisfaction? Customer Satisfaction is hypothesized as a full or partial mediator, transmitting the combined effects of Brand Image, Function, Usability, and Price on Customer Loyalty.
7.3.4 Section 4: Integrated Model Testing
What is moderated mediation, and how does it differ from simple mediation? Moderated mediation tests whether the indirect effect (IV \(\rightarrow\) M \(\rightarrow\) DV) varies across levels of a moderating variable. Simple mediation assumes a constant indirect effect; moderated mediation allows it to differ by moderator level.
What is the Index of Moderated Mediation (IMM) and how is it interpreted? \(IMM = a_3 \times b\), where \(a_3\) is the interaction coefficient in the mediator equation and \(b\) is the mediator-to-DV path. A significant IMM (95% CI excluding zero) confirms that the mediated pathway is moderated.
Why must the interaction term be included in the mediator equation (path \(a\))? The moderator must interact with the IV when predicting the mediator so that the \(a\) path takes different values at different levels of \(W\), creating conditional indirect effects that vary by moderator level.
If IMM = \(0.08\) with a 95% CI of \([0.02, 0.18]\), what is the conclusion? The CI excludes zero, confirming statistically significant moderated mediation. The indirect effect of Brand Image on Loyalty through Satisfaction significantly varies depending on the level of the moderating variable.
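When the moderator \(W\) acts on path \(a\), the conditional indirect effect is \((a_1 + a_3 W) \times b\), and the IMM is the amount by which it changes per unit of \(W\). A minimal sketch: the coefficients are hypothetical and were chosen only so that \(a_3 \times b\) reproduces the IMM of 0.08 from the question above.

```python
def conditional_indirect(a1, a3, b, w):
    """Indirect effect of X on Y through M at moderator value w."""
    return (a1 + a3 * w) * b

a1, a3, b = 0.50, 0.10, 0.80   # hypothetical path coefficients
imm = a3 * b

effects = {w: conditional_indirect(a1, a3, b, w) for w in (-1.0, 0.0, 1.0)}
for w, effect in effects.items():
    print(f"W = {w:+.1f}  indirect effect = {effect:.2f}")
print("IMM =", round(imm, 2))
```

Each one-unit increase in \(W\) raises the indirect effect by exactly the IMM, which is why a confidence interval for the IMM that excludes zero is sufficient evidence of moderated mediation.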
How does integrated model testing provide more complete insights than mediation or moderation alone? Integrated testing simultaneously captures the mechanism (how/why: mediation) and the boundary conditions (when/for whom: moderation), producing a richer and more complete theoretical understanding than either approach independently.
7.4 Module 4: Machine Learning for Business
7.4.1 Section 1: Clustering
What is the difference between supervised and unsupervised machine learning? Supervised learning trains models on labeled data to predict known outcomes. Unsupervised learning discovers hidden patterns or groupings in unlabeled data without predefined target categories.
How does the K-Means algorithm assign data points to clusters? K-Means assigns each point to the nearest cluster centroid by Euclidean distance, recalculates centroids as the mean of all assigned points, and iterates until assignments stabilize (convergence).
Why must features be standardized before applying K-Means? K-Means relies on Euclidean distance. Features with larger scales dominate the distance calculation. Standardization (zero mean, unit variance) ensures all features contribute equally to cluster assignments.
What does the Elbow Method reveal, and how is the optimal \(K\) identified? The Elbow Method plots within-cluster sum of squares (WCSS) against the number of clusters \(K\). The optimal \(K\) is where the rate of WCSS decrease sharply slows — additional clusters yield diminishing reductions in variance.
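Standardization, the K-Means loop, and the Elbow Method can be sketched together in plain Python. The assumptions are worth stating: the twelve customers (satisfaction score, monthly rides) are hypothetical, the function names are illustrative, and centroids are initialized deterministically from evenly spaced data points rather than with a library default such as k-means++.

```python
import math

def standardize(rows):
    """Rescale every feature to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100):
    # Deterministic start: k points spread evenly through the data order.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:   # assignments stable, so converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wcss

# Hypothetical customers: (satisfaction score, rides per month).
raw = [(2.0, 10), (2.5, 12), (3.0, 8), (2.2, 11),
       (6.0, 40), (6.5, 45), (5.8, 42), (6.2, 38),
       (4.0, 80), (4.2, 85), (3.8, 78), (4.1, 82)]
points = standardize(raw)

wcss_by_k = {}
for k in range(1, 6):
    labels_k, wcss_by_k[k] = kmeans(points, k)
    print(k, round(wcss_by_k[k], 3))

labels3, _ = kmeans(points, 3)
```

In this toy data the WCSS drops steeply up to \(K = 3\) and only marginally afterwards, which is the elbow. Without standardization, rides per month (range 8 to 85) would dominate the satisfaction score (range 2 to 6.5) in every distance calculation.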
How can customer clusters from K-Means inform Gogoro’s marketing strategy? Distinct clusters enable targeted strategies: high-loyalty segments receive premium offers; low-loyalty, price-sensitive segments receive retention incentives; and cluster profiles inform product feature prioritization.
7.4.2 Section 2: Classification
What is the difference between classification and regression in machine learning? Classification predicts a discrete categorical outcome (e.g., High vs Low Loyalty). Regression predicts a continuous numerical outcome (e.g., exact satisfaction score).
How does Logistic Regression produce a class prediction from a probability? Logistic Regression outputs a probability between 0 and 1 via the sigmoid function. If the probability exceeds a decision threshold (default 0.5), the observation is assigned to the positive class.
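The probability-to-class step can be sketched as follows. The weights, bias, and feature values are hypothetical and are not fitted to any data; only the sigmoid and the 0.5 default threshold come from the text above.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Return (probability of the positive class, predicted class label)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    # Assign the positive class only when the probability exceeds the threshold.
    return p, int(p > threshold)

# Hypothetical model: satisfaction and brand image scores predicting High Loyalty.
weights, bias = [0.8, 0.5], -5.0
prob, label = predict([4.0, 5.0], weights, bias)
print("P(High Loyalty) =", round(prob, 3), "-> class", label)

# Raising the threshold makes the classifier more conservative.
_, strict_label = predict([4.0, 5.0], weights, bias, threshold=0.8)
print("with threshold 0.8 -> class", strict_label)
```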
Why is accuracy alone insufficient when classes are imbalanced? With imbalanced classes, a model that always predicts the majority class achieves high accuracy while making no useful predictions. Precision, Recall, and F1-Score evaluate performance on the minority class of interest.
What does a high Recall but low Precision indicate about a classifier? The model captures most true positives (few false negatives) but also incorrectly labels many negatives as positive (many false positives). It is overly aggressive in assigning the positive class label.
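Both situations, an always-majority classifier on imbalanced data and a classifier with high Recall but low Precision, can be reproduced from confusion-matrix counts. The counts below are hypothetical and the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Imbalanced data (10 positives, 190 negatives), model always predicts negative.
majority = classification_metrics(tp=0, fp=0, fn=10, tn=190)
print("always-majority:", majority)

# Aggressive model: finds most positives but raises many false alarms.
aggressive = classification_metrics(tp=45, fp=55, fn=5, tn=95)
print("high recall, low precision:", aggressive)
```

The first model reaches 95% accuracy while finding none of the positive cases; the second finds 90% of them, but fewer than half of its positive predictions are correct.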
Why is it essential to evaluate classification models on a held-out test set? The test set provides an unbiased estimate of how the model will perform on new, unseen data. Training-set performance is optimistically biased since the model has already adapted to those examples.
7.4.3 Section 3: Regression
What is the fundamental difference between regression for hypothesis testing and regression in machine learning? Statistical regression focuses on inference — evaluating coefficients, p-values, and confidence intervals to test theory. ML regression focuses on minimizing prediction error on unseen data for accurate forecasting.
How does Ridge Regression prevent overfitting compared to standard Linear Regression? Ridge adds an L2 penalty (\(\lambda \sum \beta_i^2\)) to the loss function, shrinking all coefficients toward zero and reducing model variance. This constrains the model from fitting noise in the training data.
What unique property does Lasso Regression have that Ridge does not? Lasso uses an L1 penalty (\(\lambda \sum |\beta_i|\)) that can shrink coefficients exactly to zero, performing automatic feature selection — a property Ridge cannot achieve since it only shrinks but never eliminates coefficients.
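The contrast between the two penalties is easiest to see in a special case. Assuming an orthonormal design (\(X^\top X = I\)), minimizing \(\lVert y - X\beta \rVert^2 + \lambda \sum \beta_i^2\) gives \(\beta_i^{OLS} / (1 + \lambda)\) for Ridge, and minimizing \(\tfrac{1}{2}\lVert y - X\beta \rVert^2 + \lambda \sum |\beta_i|\) gives soft-thresholding for Lasso. The OLS coefficients and \(\lambda\) below are hypothetical, and real predictors are rarely orthonormal, so this is an illustration rather than a general solver.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge solution under an orthonormal design: proportional shrinkage."""
    return beta_ols / (1.0 + lam)

def lasso_soft_threshold(beta_ols, lam):
    """Lasso solution under an orthonormal design: shift toward zero, then clip."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if beta_ols > 0 else -magnitude

ols = [0.90, 0.30, -0.05]   # hypothetical OLS coefficients
lam = 0.10                  # hypothetical penalty strength

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_soft_threshold(b, lam) for b in ols]
print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", [round(b, 3) for b in lasso])
```

Ridge leaves the smallest coefficient small but nonzero, whereas Lasso sets it exactly to zero, which is the feature-selection behavior described above.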
Why is RMSE evaluated on the test set rather than the training set? Training RMSE is optimistically biased because the model was fitted to that data. Test RMSE measures genuine generalization error — performance on observations the model has never encountered.
In the bias-variance trade-off, where does a heavily regularized model sit, and what are the implications? Heavy regularization produces a high-bias, low-variance model. It underfits the training data (consistently slightly off) but remains stable and does not overfit to noise in different samples.
7.4.4 Section 4: Model Evaluation and Selection
Why is k-fold cross-validation preferred over a single train-test split for model evaluation? A single split may produce a lucky or unlucky data partition. K-fold CV averages performance over \(k\) different train/test partitions, producing a more stable and reliable estimate of generalization ability.
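The mechanics of k-fold splitting can be sketched without any library. The assumptions: the loyalty scores are hypothetical, the function names are illustrative, the "model" is simply the training-set mean, and the rows are assumed to be in random order already (real data should be shuffled first).

```python
import math

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cv_rmse_mean_model(y, k):
    """Cross-validated RMSE of a baseline that predicts the training mean."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)
        mse = sum((y[i] - prediction) ** 2 for i in test) / len(test)
        scores.append(math.sqrt(mse))
    return sum(scores) / len(scores), scores

loyalty = [5.0, 4.5, 6.0, 3.5, 5.5, 4.0, 6.5, 5.0, 4.5, 5.5]  # hypothetical
mean_rmse, fold_rmse = cv_rmse_mean_model(loyalty, k=5)
print("per-fold RMSE:", [round(s, 3) for s in fold_rmse])
print("mean RMSE:", round(mean_rmse, 3))
```

The spread of the per-fold scores shows how much a single train-test split could mislead; the average is the more stable estimate.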
What does a large gap between training score and cross-validation score indicate? A large gap indicates overfitting — the model has memorized patterns specific to the training data and fails to generalize to new observations. The model has high variance.
How does Grid Search help in selecting the best model hyperparameters? Grid Search exhaustively evaluates all specified hyperparameter combinations using cross-validation, selecting the combination that achieves the best average CV performance.
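Grid Search amounts to a loop over candidate values with cross-validation inside it. The sketch below tunes the penalty \(\lambda\) of a one-predictor Ridge model without an intercept, for which \(\beta = \sum x_i y_i / (\sum x_i^2 + \lambda)\). The data are simulated, the grid is arbitrary, and the function names are hypothetical; it is not a substitute for a library implementation.

```python
import random

def fit_ridge_1d(x, y, lam):
    """Closed-form Ridge slope for one predictor and no intercept."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def cv_mse(x, y, lam, k=5):
    """Mean squared error averaged over k interleaved folds."""
    n = len(x)
    fold_errors = []
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        beta = fit_ridge_1d([x[i] for i in train], [y[i] for i in train], lam)
        fold_errors.append(sum((y[i] - beta * x[i]) ** 2 for i in test) / len(test))
    return sum(fold_errors) / k

def grid_search(x, y, grid, k=5):
    """Evaluate every candidate and return the one with the lowest CV error."""
    scores = {lam: cv_mse(x, y, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Simulated data with a true slope of 0.7.
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(60)]
y = [0.7 * xi + rng.gauss(0, 0.5) for xi in x]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, cv_scores = grid_search(x, y, grid)
for lam in grid:
    print(f"lambda = {lam:>6}  CV MSE = {cv_scores[lam]:.4f}")
print("selected lambda:", best_lam)
```

The cost grows with the product of the grid sizes across hyperparameters, which is why exhaustive search is usually limited to a small number of them.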
Why might a business prefer a slightly less accurate model that is more interpretable? Interpretable models allow stakeholders to understand the decision logic, detect potential bias, build trust in predictions, and comply with regulatory requirements — all critical for high-stakes business decisions.
What considerations beyond accuracy matter when selecting a machine learning model for deployment? Key considerations include: interpretability for stakeholders, computational cost, inference speed, robustness to missing data, ability to explain individual predictions, regulatory compliance, and long-term maintenance burden.
8 References
Bhattacherjee, A. (2012). Social Science Research: Principles, Methods, and Practices. Global Text Project. https://digitalcommons.usf.edu/oa_textbooks/3/
Hsu, S.-L., Chang, Y.-C., Vabiola, I., & W.-L. L. (2021). Determinants of customer loyalty of green products – The case of Gogoro in Taiwan. 企業管理學報, 46(4), 15–46. https://doi.org/10.53106/102596272021120464002