This project focuses on building a robust machine learning solution for EasyVisa, a visa services provider, to predict the outcome of U.S. visa applications (Certified or Denied). The model aims to optimize client screening, improve operational decision-making, and drive business insights.
The objective is to develop a predictive model that:
- Classifies visa applications as Certified or Denied
- Identifies the most influential factors affecting certification
- Assists EasyVisa in prioritizing strong applications
- Source: Internal records from EasyVisa
- Size: 25,480 visa applications
- Target Variable: `case_status` (Certified = 1, Denied = 0)
- Features: Education level, job experience, wage, employer size, employment region, full-time status, continent of origin, etc.
- No missing or duplicate values
- Fixed negative values and standardized wage units
- Handled outliers through capping or log transformation
- Feature engineering: One-hot encoding, binary flags, company age derivation
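The cleaning and feature-engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `preprocessing.py`: the column names, the unit-to-year conversion factors, the 1st/99th percentile caps, and the `reference_year` default are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical conversion factors (assumes a 40-hour week, 52 weeks per year).
TO_YEARLY = {"Hour": 2080, "Week": 52, "Month": 12, "Year": 1}


def preprocess(df: pd.DataFrame, reference_year: int = 2016) -> pd.DataFrame:
    """Illustrative cleaning pipeline; all column names are assumed."""
    out = df.copy()
    # Treat negative employee counts as sign errors.
    out["no_of_employees"] = out["no_of_employees"].abs()
    # Binary flag for the original wage unit, then convert wages to yearly.
    out["wage_is_yearly"] = (out["unit_of_wage"] == "Year").astype(int)
    out["yearly_wage"] = out["prevailing_wage"] * out["unit_of_wage"].map(TO_YEARLY)
    # Cap outliers at the 1st/99th percentiles, then log-transform.
    lo, hi = out["yearly_wage"].quantile([0.01, 0.99])
    out["yearly_wage"] = np.log1p(out["yearly_wage"].clip(lo, hi))
    # Derive company age from the year of establishment.
    out["company_age"] = reference_year - out["yr_of_estab"]
    # Binary flags for full-time status and the target.
    out["full_time"] = (out["full_time_position"] == "Y").astype(int)
    out["case_status"] = (out["case_status"] == "Certified").astype(int)
    out = out.drop(
        columns=["prevailing_wage", "unit_of_wage", "yr_of_estab", "full_time_position"]
    )
    # One-hot encode the remaining categorical columns.
    return pd.get_dummies(
        out, columns=["education_of_employee", "continent"], drop_first=True
    )


# Tiny made-up sample, only to show the call.
sample = pd.DataFrame({
    "no_of_employees": [-25, 100, 3000],
    "prevailing_wage": [20.0, 60000.0, 1500.0],
    "unit_of_wage": ["Hour", "Year", "Week"],
    "yr_of_estab": [2000, 1990, 2010],
    "full_time_position": ["Y", "N", "Y"],
    "case_status": ["Certified", "Denied", "Certified"],
    "education_of_employee": ["Master's", "Bachelor's", "Doctorate"],
    "continent": ["Asia", "Europe", "Asia"],
})
clean = preprocess(sample)
```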
- Oversampling: SMOTE
- Undersampling: RandomUnderSampler
- Algorithms Used:
  - Decision Tree
  - Random Forest
  - Bagging Classifier
  - AdaBoost
  - Gradient Boosting (GBM)
  - XGBoost
- Evaluation Metrics: Accuracy, Recall, Precision, F1 Score, ROC AUC
- Cross-validation: Stratified K-Fold
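One way to compare candidate models on the listed metrics with stratified K-fold is scikit-learn's `cross_validate`. The sketch below uses synthetic data and only three of the algorithms, with small assumed settings to keep it quick; it illustrates the approach and does not reproduce the project's evaluation code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and target.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gbm": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the class ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "recall", "precision", "f1", "roc_auc"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```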
Used `RandomizedSearchCV` to optimize `n_estimators`, `learning_rate`, `max_depth`, `min_samples_leaf`, etc.
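A minimal sketch of such a search over a gradient boosting model might look like the following. The search ranges, the `f1` scoring choice, the number of iterations, and the synthetic data are assumptions for illustration; the project's actual search space is not stated here.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the training data.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Assumed search ranges; randint's upper bound is exclusive.
param_distributions = {
    "n_estimators": randint(50, 150),
    "learning_rate": uniform(0.01, 0.2),  # samples from [0.01, 0.21]
    "max_depth": randint(2, 5),
    "min_samples_leaf": randint(1, 20),
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # kept small so the sketch runs quickly
    scoring="f1",      # assumed objective for the example
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Sampling from distributions rather than a fixed grid lets the search cover wide ranges with a bounded number of fits, which matters when each boosted model is slow to train.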
| Metric | Score |
|---|---|
| Recall | 0.8104 |
| F1 Score | 0.7968 |
| Precision | 0.7836 |
| Accuracy | 0.7239 |
| ROC AUC | 0.7239 |
Why GBM?
- Balanced performance across all metrics
- Excellent generalization
- Interpretable through feature importances, which expose the main drivers of each outcome
- 🎓 Education: Higher degrees (Master's, Doctorate) improve approval odds
- 👨‍💼 Experience: Prior job experience is a strong predictor
- 🌍 Geography: South & Midwest regions yield better certification rates
- 💵 Wage: Higher and yearly wages are linked to approval
- 🧾 Full-Time Status: Full-time jobs are more likely to be approved
- Implement a pre-screening tool using GBM to flag high-risk applications
- Educate clients about factors that influence approval
- Focus resources on candidates with high approval likelihood
- Deploy a visa risk dashboard for internal triage
- Retrain model periodically to adapt to policy changes
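A pre-screening step of this kind could be as simple as bucketing each application by its predicted probability of certification. The sketch below is hypothetical: the `triage` function, its label names, and the 0.4 / 0.7 thresholds are assumptions for illustration, and the probabilities would typically come from something like `model.predict_proba(X)[:, 1]` on a fitted classifier.

```python
def triage(p_certified, low=0.4, high=0.7):
    """Bucket applications by predicted certification probability.

    Thresholds are illustrative and would need tuning against the
    business cost of missed denials versus unnecessary reviews.
    """
    labels = []
    for p in p_certified:
        if p < low:
            labels.append("high-risk")   # likely denial, flag for attention
        elif p >= high:
            labels.append("strong")      # likely certification, prioritize
        else:
            labels.append("review")      # uncertain, needs manual review
    return labels


# Made-up probabilities, only to show the call.
buckets = triage([0.15, 0.55, 0.92])
print(buckets)  # ['high-risk', 'review', 'strong']
```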
- Include job titles, industry, employer history, and application types
- Use application-level metadata (e.g., service center, benefits)
- Develop a dynamic risk scoring system to prioritize cases

📦 easyvisa-ml-prediction/
├── data/
├── notebooks/
├── models/
├── src/
│   ├── preprocessing.py
│   ├── modeling.py
│   └── evaluation.py
├── reports/
├── requirements.txt
└── README.md

- Python (Pandas, Scikit-learn, XGBoost, Imbalanced-learn)
- Jupyter Notebooks
- Matplotlib / Seaborn
- Git / GitHub
Contributions are welcome. Please fork the repo and open a pull request with clear documentation of changes.
This project is licensed under the MIT License. See LICENSE for more details.