
📊 Statistics for Data Scientists

Comprehensive statistics cheatsheet & formula reference
Available in Turkish 🇹🇷 and English 🇬🇧



📖 What’s Inside?

A set of interactive HTML cheatsheets and Kaggle notebooks covering the essential statistics concepts for data science, from descriptive statistics to Bayesian inference and ML evaluation metrics. Each HTML page is a single self-contained file that can be viewed directly in your browser.

🗂️ Resources

📐 Statistics Fundamentals

| Resource | Language | Description | Link |
| --- | --- | --- | --- |
| Formula Reference | 🇬🇧 English | LaTeX-rendered formulas, theory only, no code | View → |
| Cheatsheet | 🇬🇧 English | Formulas with Python code examples | View → |
| Formül Rehberi (Formula Reference) | 🇹🇷 Turkish | LaTeX-rendered formulas, theory only, no code | View → |
| Cheatsheet | 🇹🇷 Turkish | Formulas with Python code examples | View → |
| Kaggle Notebook: Stats Cheatsheet | 🇬🇧 English | Statistics cheatsheet with Python examples | Open in Kaggle → |

📊 ML Evaluation Metrics (NEW)

| Resource | Language | Description | Link |
| --- | --- | --- | --- |
| ML Metrics Guide | 🇬🇧 English | Precision, Recall, F1, ROC-AUC, regression metrics | View → |
| ML Metrik Rehberi (ML Metrics Guide) | 🇹🇷 Turkish | Precision, Recall, F1, ROC-AUC, regression metrics | View → |
| Kaggle Notebook: ML Metrics | 🇬🇧 English | Interactive notebook with visualizations and runnable code | Open in Kaggle → |

Formula Reference — Pure theory with beautifully typeset LaTeX math (powered by KaTeX)

Cheatsheet — Same topics + ready-to-use Python code snippets you can copy-paste

ML Metrics Guide — Covers classification & regression evaluation metrics with formulas, examples, and a decision guide

Kaggle Notebook — Hands-on notebook with scikit-learn, matplotlib & seaborn visualizations


📋 Topics Covered

📐 Statistics Fundamentals (18 Sections)

| # | Topic | Key Concepts |
| --- | --- | --- |
| 01 | Descriptive Statistics | Mean, Median, Mode, Variance, Std Dev, Skewness, Kurtosis |
| 02 | Probability Distributions | Normal, Binomial, Poisson, CLT |
| 03 | Z-Score | Standardization, Critical Values, CDF |
| 04 | Confidence Intervals | z-interval, t-interval, Proportion CI |
| 05 | Hypothesis Testing | H₀/H₁, Type I & II Errors, p-value |
| 06 | Normality Tests | Shapiro-Wilk, D’Agostino, QQ-Plot |
| 07 | Variance Homogeneity | Levene, Bartlett |
| 08 | T-Test | One-sample, Independent, Paired, Welch |
| 09 | Z-Test | Large-sample, Two-proportion |
| 10 | ANOVA | One-way, F-statistic, Tukey HSD |
| 11 | Chi-Square Test | Independence, Goodness of Fit |
| 12 | Correlation & Regression | Pearson, Spearman, OLS, R² |
| 13 | Non-Parametric Tests | Mann-Whitney U, Wilcoxon, Kruskal-Wallis |
| 14 | Effect Size | Cohen’s d, Eta-squared |
| 15 | Power Analysis | Sample size calculation |
| 16 | A/B Testing | Proportion tests, MDE, Lift, Pitfalls |
| 17 | Bayesian Basics | Bayes’ Theorem, Prior/Posterior, Medical Test Paradox |
| 18 | Decision Tree | Which test to choose? + Quick checklist |
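
To give a flavor of the kind of snippet the cheatsheet covers, here is a minimal sketch of a two-proportion z-test (topics 09 and 16 above). It is not taken from the cheatsheet itself: the function name `two_proportion_ztest` and the conversion counts are hypothetical, and it uses only the Python standard library rather than SciPy or statsmodels.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Hypothetical helper for illustration; assumes samples large enough
    for the normal approximation to hold.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled proportion under H0: both groups share one conversion rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Relative lift of variant B over control A.
    lift = (p_b - p_a) / p_a
    return z, p_value, lift


# Made-up A/B data: 200 of 2000 users convert in A, 250 of 2000 in B.
z, p_value, lift = two_proportion_ztest(200, 2000, 250, 2000)
print(f"z = {z:.3f}, p = {p_value:.4f}, lift = {lift:.1%}")
```

In practice a library routine such as the proportions z-test in statsmodels is the more common choice; the hand-rolled version is shown only to make the formula visible.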

📊 ML Evaluation Metrics (11 Sections)

| # | Topic | Key Concepts |
| --- | --- | --- |
| 01 | Confusion Matrix | TP, FP, FN, TN, Type I & II Errors |
| 02 | Accuracy | Overall correctness, Accuracy Paradox |
| 03 | Precision | Positive predictive value, when FP is costly |
| 04 | Recall (Sensitivity) | True positive rate, when FN is costly |
| 05 | F1 Score | Harmonic mean, F-Beta variants (F0.5, F2) |
| 06 | Specificity | True negative rate, Sensitivity vs Specificity |
| 07 | ROC Curve & AUC | Threshold-independent model comparison |
| 08 | Log Loss | Probability calibration, cross-entropy |
| 09 | Regression Metrics | MAE, MSE, RMSE, R² |
| 10 | Metric Selection Guide | Which metric for which scenario |
| 11 | Python Implementation | scikit-learn code with visualizations |
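
As a rough illustration of sections 01 to 06 above, the sketch below derives the main classification metrics from raw confusion-matrix counts. The guide itself uses scikit-learn; this standard-library version, the helper name `classification_metrics`, and the counts are assumptions made for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from confusion-matrix counts.

    Hypothetical helper for illustration. Ratios with a zero
    denominator are reported as 0.0 by convention here.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity, TPR
    specificity = tn / (tn + fp) if tn + fp else 0.0   # TNR
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up imbalanced data: only 10 positives among 1000 samples.
metrics = classification_metrics(tp=5, fp=5, fn=5, tn=985)
for name, value in metrics.items():
    print(f"{name:>11}: {value:.3f}")
```

With these counts accuracy is 0.99 while precision, recall, and F1 are all 0.5, which is the accuracy paradox from section 02: on imbalanced data a high accuracy can hide poor performance on the positive class.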

🖨️ Save as PDF

  1. Open any of the links above in your browser
  2. Press Ctrl + P (or Cmd + P on Mac)
  3. Select “Save as PDF”
  4. ⚠️ Enable “Background graphics” for the dark theme colors to render properly


📄 License

This project is open source and available for educational purposes.


Made with ❤️ for data science learners