Analyzing Impact of Mobile Game Launch for Platform

Context & Objectives

  • Kakao, the largest messenger & online services platform in Korea, launched its first gaming app “Anipang” in 2012.

    • To play, users needed to sign up for a KakaoTalk account, which also gave them access to all other Kakao apps. Hence, a spillover effect on the usage time of other Kakao apps was expected (and did occur).

  • In the present project, I built models to investigate:

    • Effect of Anipang adoption on KakaoStory usage time - is there a positive spillover effect?

    • Moderating effect of age in the above relationship

    Link to R script

About the Dataset

  • Individual-level weekly panel data of various mobile apps’ usage time, sample size = 849

  • 16 variables, including:

    • User ID

    • Time indicator (week before / week after Anipang launch)

    • Treatment indicator variables (adopted Anipang or not)

    • Demographics (age, income, education etc.)

    • Usage time for each app within the Kakao platform (KakaoTalk, KakaoStory, Kakao Game)

    • Usage time for external apps (other talk app, other story app, other game app)

What kind of biases may be involved in the analysis? What can be done to address them?

Selection bias

  • Users who adopt Anipang (and are therefore classified as the treatment group) may have underlying demographic or behavioral differences compared to those in the control group. These inherent differences could introduce bias in the analysis, as they may influence the outcomes independently of the treatment itself.

Solution: PSM + DID Models

  • To address this potential bias, we first create Propensity Score Matched (PSM) samples from the dataset, so that the treatment and control groups are comparable on observed characteristics. On the matched samples, we then fit Difference-in-Differences (DID) models, which net out time-invariant differences between the two groups as well as changes over time that are common to both, helping isolate the true effect of Anipang adoption.
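
The DID step can be sketched as a regression with a group-by-period interaction. This is a minimal illustration on simulated data, not the project's script: the column names (id, adopter, post, story_time) and every number in the simulation, including the built-in effect of 50 seconds, are assumptions.

```r
# Minimal DID sketch on simulated two-period data (all names and numbers are hypothetical).
set.seed(1)
n <- 200
panel <- data.frame(
  id      = rep(seq_len(n), each = 2),
  post    = rep(c(0, 1), times = n),          # 0 = week before, 1 = week after
  adopter = rep(rbinom(n, 1, 0.5), each = 2)  # 1 = adopted Anipang
)

# Baseline gap between groups (100), common time trend (30), simulated effect (50)
panel$story_time <- 600 + 100 * panel$adopter + 30 * panel$post +
  50 * panel$adopter * panel$post + rnorm(nrow(panel), sd = 20)

# The coefficient on adopter:post is the DID estimate
did_fit <- lm(story_time ~ adopter * post, data = panel)
did_estimate <- coef(did_fit)[["adopter:post"]]

# Same quantity computed from the four group-by-period means
m <- with(panel, tapply(story_time, list(adopter, post), mean))
did_by_hand <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])
```

The regression form is convenient because covariates, user fixed effects, or an age interaction can be added to the same formula.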

Omitted Variable Bias

  • Certain attributes that potentially affected users’ adoption of Anipang or their app usage behavior were not captured by the dataset (which is always limited in the number of attributes it can include), such as other marketing campaigns, personality traits, and user lifestyles.

Solution: Panel Data Models

  • To address this, we will use Fixed Effects Estimation (FEE) and Dummy Variable Regression (DVR). These models account for unobserved, time-invariant variables by controlling for individual-specific characteristics that might have otherwise introduced bias. By using these techniques, we can better isolate the impact of Anipang adoption from other influencing factors that were not accounted for by the dataset.
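
The two estimators can be sketched in base R as follows. For the same specification, demeaning by user (fixed effects estimation) and adding one dummy per user (dummy variable regression) give the same slope estimates. The data are simulated, and the variable names (id, post, x, story_time) and the built-in effect of -40 are assumptions for illustration only.

```r
# Sketch: within (fixed effects) estimator vs. dummy variable regression, base R only.
set.seed(2)
n <- 100
d <- data.frame(id = rep(seq_len(n), each = 2), post = rep(c(0, 1), times = n))
alpha   <- rnorm(n, sd = 200)                 # unobserved, time-invariant user effect
adopter <- rbinom(n, 1, plogis(alpha / 200))  # adoption is correlated with alpha
d$x <- adopter[d$id] * d$post                 # 1 only for adopters in the post week
d$story_time <- 500 + alpha[d$id] + 20 * d$post - 40 * d$x + rnorm(nrow(d), sd = 10)

# Dummy variable regression: one dummy per user absorbs alpha
dvr <- lm(story_time ~ x + post + factor(id), data = d)

# Fixed effects (within) estimation: demean every variable by user, then OLS
demean <- function(v) v - ave(v, d$id)
fe <- lm(demean(story_time) ~ 0 + demean(x) + demean(post), data = d)

b_dvr <- coef(dvr)[["x"]]
b_fe  <- coef(fe)[["demean(x)"]]
```

Note that the standard errors printed by the demeaned lm() fit are not adjusted for the degrees of freedom used up by the user means, so the dummy variable version is the safer one to read significance from in this hand-rolled form.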

Workflow: Pre-Modeling

A. Load libraries & read in the dataset

B. Exploratory Data Analysis

  • Investigate the demographic makeup of the sample

  • Split dataset into pre-Anipang adoption and post-Anipang adoption

  • For each of the mobile apps provided by the dataset, compare its usage time before & after Anipang adoption by the sampled users.
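
The before/after comparison can be sketched as a percentage change in mean usage time per app. The toy data frame and its column names (post, kakao_talk, kakao_story) are made up for illustration and are not values from the dataset.

```r
# Sketch: percentage change in mean usage time per app, pre vs. post (toy data).
usage <- data.frame(
  post        = c(0, 0, 0, 1, 1, 1),               # 0 = week before, 1 = week after
  kakao_talk  = c(1000, 1200, 800, 1100, 1200, 1000),
  kakao_story = c(200, 300, 100, 300, 400, 200)
)
apps <- c("kakao_talk", "kakao_story")

pre_mean  <- colMeans(usage[usage$post == 0, apps])
post_mean <- colMeans(usage[usage$post == 1, apps])
pct_change <- 100 * (post_mean - pre_mean) / pre_mean
```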

Findings

  • As shown in the charts below, KakaoStory usage saw the greatest percentage change after Anipang adoption compared with other apps, so already we are seeing some model-free evidence supporting our initial hypothesis.

  • Slicing the sample by age shows that the spillover effect varies by age level. Older age groups tend to see a much greater increase in KakaoStory usage than their younger counterparts after they adopt Anipang.

Percentage Change in App Usage Time Pre & Post Anipang Adoption

Kakao Story Usage Time Pre & Post Anipang Adoption, By Age Group

Workflow: Propensity Score Matching & Modeling

C. Calling matchit() and get_matches() to create PSM samples

  • Repeat 10 times with different combinations of hyperparameters (ratio and caliper) to create 10 different matched samples - the goal is to increase robustness of the modeling result

  • An example PSM result is shown in the table. The absolute values of the standardized mean difference (SMD) and eCDF Max were substantially reduced, indicating that the covariates were successfully balanced.
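
The matching step can be sketched with the MatchIt package as below. The simulated users, the covariates (age, income), and the four-row grid of ratio and caliper values are assumptions for illustration; the project used 10 combinations whose exact values are not listed here.

```r
library(MatchIt)

# Simulated stand-in for the user-level data (names and values are hypothetical)
set.seed(3)
n <- 500
users <- data.frame(
  age    = sample(1:5, n, replace = TRUE),
  income = sample(1:6, n, replace = TRUE)
)
users$adopter <- rbinom(n, 1, plogis(-1.5 + 0.3 * users$age - 0.1 * users$income))

# One row per hyperparameter combination
grid <- expand.grid(ratio = c(1, 2), caliper = c(0.1, 0.2))

# One matched sample per combination
matched_samples <- lapply(seq_len(nrow(grid)), function(i) {
  m <- matchit(adopter ~ age + income, data = users, method = "nearest",
               distance = "glm", ratio = grid$ratio[i], caliper = grid$caliper[i])
  get_matches(m)
})
```

Calling summary() on a matchit object prints the covariate balance table before and after matching, which is where the balance statistics for each sample can be checked.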

D. Building two panel data models (fixed effects estimation & dummy variable regression) for each of the 10 PSM samples

  • Calling summary() to inspect variable significance
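
Fitting the same model on every matched sample and collecting the treatment coefficient could look like the sketch below. The make_sample() helper is hypothetical and only generates placeholder panels with no built-in effect of x; in the real workflow the list would hold the PSM samples.

```r
# Hypothetical generator of placeholder two-period panels (no effect of x is built in)
make_sample <- function(seed, n = 60) {
  set.seed(seed)
  d <- data.frame(id = rep(seq_len(n), each = 2), post = rep(c(0, 1), times = n))
  adopter <- rep(c(0, 1), length.out = n)
  d$x <- adopter[d$id] * d$post
  d$story_time <- 500 + rnorm(n, sd = 100)[d$id] + 20 * d$post + rnorm(nrow(d), sd = 30)
  d
}
samples <- lapply(1:3, make_sample)

# Fit the dummy variable regression on each sample and keep the row for x
results <- do.call(rbind, lapply(seq_along(samples), function(i) {
  fit <- lm(story_time ~ x + post + factor(id), data = samples[[i]])
  row <- summary(fit)$coefficients["x", ]
  data.frame(sample = i, estimate = row[["Estimate"]], p_value = row[["Pr(>|t|)"]])
}))
```

Collecting estimates and p-values into one table makes it easy to see whether sign and significance are stable across matched samples.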

Model Result Summary & Interpretations

Coefficients of the treatment indicator variable in the base model and model with interaction are presented above.

  • In the base model, Anipang adoption (x) shows predominantly negative coefficients.

    • Taking the DVR result for sample 2 as an example: a coefficient of -266.3 means that, on average, KakaoStory usage among Anipang adopters decreased by an additional 266.3 seconds relative to non-adopters.

    • However, none of the coefficients of Anipang adoption are statistically significant.

  • In the models where age is introduced as a moderator, the coefficient of x is greatly amplified. At the same time, age shows a positive moderating effect.

    • Taking the DVR model with the interaction term for sample 2 as an example: for every one-unit increase in age range, the effect of x on KakaoStory usage becomes more positive by 1510 seconds. Both the interaction term and x have statistically significant coefficients across most of the 20 models.

  • Beware - the greatly enlarged coefficients are due to multicollinearity (as confirmed by calling the VIF() function on the model).

    • This means that the exact values of the coefficients are unstable and not to be trusted.

    • However, multicollinearity does not impair the model's predictive performance. We can still make reliable predictions of the KakaoStory usage change resulting from Anipang adoption and age combined.
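
The point about multicollinearity can be illustrated with a small simulation. The sketch computes the variance inflation factor by hand, as 1 / (1 - R^2) from regressing each predictor on the others, rather than through a package's VIF() function. It also centers age before forming the interaction, which is one common remedy and not something the original analysis is stated to have done. All data and coefficients are simulated assumptions.

```r
# Sketch: VIF computed by hand for a model with an x-by-age interaction (simulated data).
set.seed(4)
n <- 400
x   <- rbinom(n, 1, 0.5)
age <- sample(2:6, n, replace = TRUE)   # age range coded as an ordinal score
y   <- 300 - 200 * x + 10 * age + 60 * x * age + rnorm(n, sd = 30)

# VIF of each column: 1 / (1 - R^2) from regressing it on the remaining columns
vif_by_hand <- function(X) {
  sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
}

raw      <- cbind(x = x, age = age, x_age = x * age)
age_c    <- age - mean(age)
centered <- cbind(x = x, age = age_c, x_age = x * age_c)

vif_raw      <- vif_by_hand(raw)        # order: x, age, x_age
vif_centered <- vif_by_hand(centered)

# Both parameterizations span the same column space, so fitted values agree
fit_raw      <- lm(y ~ raw)
fit_centered <- lm(y ~ centered)
same_fit <- isTRUE(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_centered))))
```

In this simulation the VIF of x drops sharply after centering while the fitted values stay the same, which is consistent with reading the model's predictions rather than its individual coefficients.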