40 commits
378b188
added validation_config.json
niveditasing May 8, 2026
839f6a0
added validation_config.json
niveditasing May 8, 2026
faccf01
added validation_config.json
niveditasing May 8, 2026
7be6ddd
added validation_config.json
niveditasing May 8, 2026
4d1fdee
changes made
niveditasing May 11, 2026
9ef40b8
fixed validation config
niveditasing May 12, 2026
50ffd7f
fixed json
niveditasing May 12, 2026
49a82cd
fixed json
niveditasing May 12, 2026
fc48353
fix
niveditasing May 12, 2026
27d3399
updated Readme.md & added goldens
niveditasing May 14, 2026
f0dfec6
updated Readme.md & added goldens
niveditasing May 14, 2026
43a010c
Remove mistakenly pushed golden_observations.csv
niveditasing May 14, 2026
b809efe
Merge branch 'master' into implemented_golden_checks
niveditasing May 17, 2026
06b8b1b
added golden files
niveditasing May 17, 2026
87f90cb
Remove old golden files from tools directory
niveditasing May 17, 2026
bfcf6f5
added golden files
niveditasing May 17, 2026
e8970dc
modified README.md
niveditasing May 18, 2026
d965dcb
modifief validation.md
niveditasing May 18, 2026
95ad06b
modifief validation.md & output
niveditasing May 18, 2026
2016f43
Merge branch 'master' into implemented_golden_checks
niveditasing May 19, 2026
2a1af75
changed threshold
niveditasing May 20, 2026
a3dff61
added threshold
niveditasing May 21, 2026
c4be0cd
added only one column in golden_WprldBank.csv
niveditasing May 21, 2026
c8a4fa3
modified readme
niveditasing May 21, 2026
7838fb8
Merge branch 'master' into implemented_golden_checks
niveditasing May 27, 2026
859e815
fixed the golden_WorldBank.csv format
niveditasing May 27, 2026
03c7af5
removed double quotes
niveditasing Jun 4, 2026
51bff99
removed double quotes
niveditasing Jun 4, 2026
7240543
improved relative path resolution and fixed golden format
niveditasing Jun 4, 2026
469fa6a
broaden rule path resolution to match any summary_report rule
niveditasing Jun 4, 2026
25eb37d
add debug logging to path resolution
niveditasing Jun 4, 2026
1a0ece9
dynamically locate golden_data by walking up path tree
niveditasing Jun 4, 2026
5618c3e
fix stats_summary field reference in path resolution
niveditasing Jun 4, 2026
2b1a9b8
update deleted records threshold to 0.61 in validation_config.json
niveditasing Jun 4, 2026
4ae0e90
Generalize relative path resolution for GOLDENS_CHECK validator and r…
niveditasing Jun 7, 2026
e3868a4
Fix colon auto-delimiter detection in unquoted golden CSV files
niveditasing Jun 7, 2026
6eafe77
Format golden_WorldBank.csv with double quotes for all fields
niveditasing Jun 7, 2026
ee9cc29
Revert changes to tools/import_validation files
niveditasing Jun 8, 2026
ddea703
testing
niveditasing Jun 9, 2026
4b5eb09
testing
niveditasing Jun 10, 2026
19 changes: 19 additions & 0 deletions scripts/world_bank/wdi/README.md
@@ -146,5 +146,24 @@ If you want to perform "only download", run the below command:
python3 worldbank.py --mode=download
```

### Golden checks and deleted-records threshold in validation_config.json

The `GOLDENS_CHECK` validator confirms that the import includes a specific set of expected records. This is useful for verifying that critical StatVars, Places, or specific metadata combinations are consistently present in the output.

The validator compares the input data (usually from the stats data source) against one or more "golden" files (MCF or CSV).

If any combination of values in a golden file row is missing from the input, the validation fails. The missing golden rows are then listed in the validation report JSON.
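
The matching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual `GOLDENS_CHECK` implementation: the function name `find_missing_goldens` and the in-memory CSV strings are hypothetical, and the sketch assumes exact string equality on every golden column.

```python
import csv
import io


def find_missing_goldens(golden_csv: str, input_csv: str) -> list[dict]:
    """Return golden rows whose values do not all appear together in any input row.

    Hypothetical helper for illustration; the real validator may normalize
    values or match differently.
    """
    golden_rows = list(csv.DictReader(io.StringIO(golden_csv)))
    input_rows = list(csv.DictReader(io.StringIO(input_csv)))
    missing = []
    for golden in golden_rows:
        # A golden row is satisfied when some input row agrees on every golden column.
        matched = any(
            all(row.get(col) == val for col, val in golden.items())
            for row in input_rows
        )
        if not matched:
            missing.append(golden)
    return missing


# Two golden places, but only one of them appears in the (made-up) input.
golden = 'ISO3166Alpha3\n"country/IND"\n"country/USA"\n'
observed = "ISO3166Alpha3,StatVar,Value\ncountry/IND,Count_Person,1\n"
print(find_missing_goldens(golden, observed))
```

With the sample strings above, the script prints `[{'ISO3166Alpha3': 'country/USA'}]`, which corresponds to the "missing golden rows" that the validation report would list.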

To generate golden files, run the following commands:
```bash
# Goldens from the output CSV
python3 validator_goldens.py --validate_goldens_input=../../scripts/world_bank/wdi/output/WorldBank.csv --generate_goldens=golden_data/golden_observations.csv --goldens_must_include="ISO3166Alpha3:gs://unresolved_mcf/import_validation/top_100k_places.csv" --generate_goldens_property_sets="ISO3166Alpha3"
```

```bash
# Goldens from summary reports
python3 validator_goldens.py --validate_goldens_input="summary_report.csv" --generate_goldens=golden_data/golden_summary_report.csv --generate_goldens_property_sets="StatVar|Units|MinDate|MeasurementMethods|observationPeriod"
```

We highly recommend the use of the import validation tool for this import which
you can find in https://github.com/datacommonsorg/tools/tree/master/import-validation-helper.
217 changes: 217 additions & 0 deletions scripts/world_bank/wdi/golden_data/golden_WorldBank.csv
@@ -0,0 +1,217 @@
"ISO3166Alpha3"
"Earth"
"country/AGO"
"country/ALB"
"country/ARE"
"country/ARG"
"country/ARM"
"country/AUS"
"country/AUT"
"country/AZE"
"country/BEL"
"country/BEN"
"country/BFA"
"country/BGD"
"country/BGR"
"country/BHR"
"country/BIH"
"country/BLR"
"country/BOL"
"country/BRA"
"country/BRN"
"country/BWA"
"country/CAN"
"country/CHE"
"country/CHL"
"country/CHN"
"country/CIV"
"country/CMR"
"country/COD"
"country/COG"
"country/COL"
"country/CRI"
"country/CUB"
"country/CUW"
"country/CYP"
"country/CZE"
"country/DEU"
"country/DNK"
"country/DOM"
"country/DZA"
"country/ECU"
"country/EGY"
"country/ERI"
"country/ESP"
"country/EST"
"country/ETH"
"country/FIN"
"country/FRA"
"country/GAB"
"country/GBR"
"country/GEO"
"country/GHA"
"country/GIB"
"country/GNQ"
"country/GRC"
"country/GTM"
"country/HKG"
"country/HND"
"country/HRV"
"country/HTI"
"country/HUN"
"country/IDN"
"country/IND"
"country/IRL"
"country/IRN"
"country/IRQ"
"country/ISL"
"country/ISR"
"country/ITA"
"country/JAM"
"country/JOR"
"country/JPN"
"country/KAZ"
"country/KEN"
"country/KGZ"
"country/KHM"
"country/KOR"
"country/KWT"
"country/LAO"
"country/LBN"
"country/LBY"
"country/LKA"
"country/LTU"
"country/LUX"
"country/LVA"
"country/MAR"
"country/MDA"
"country/MDG"
"country/MEX"
"country/MKD"
"country/MLT"
"country/MMR"
"country/MNE"
"country/MNG"
"country/MOZ"
"country/MUS"
"country/MYS"
"country/NAM"
"country/NER"
"country/NGA"
"country/NIC"
"country/NLD"
"country/NOR"
"country/NPL"
"country/NZL"
"country/OMN"
"country/PAN"
"country/PER"
"country/PHL"
"country/POL"
"country/PRK"
"country/PRT"
"country/PRY"
"country/QAT"
"country/ROU"
"country/RUS"
"country/RWA"
"country/SAU"
"country/SDN"
"country/SEN"
"country/SGP"
"country/SLV"
"country/SRB"
"country/SSD"
"country/SUR"
"country/SVK"
"country/SVN"
"country/SWE"
"country/SWZ"
"country/SYR"
"country/TCD"
"country/TGO"
"country/TJK"
"country/TKM"
"country/TTO"
"country/TUN"
"country/TUR"
"country/TZA"
"country/UGA"
"country/UKR"
"country/URY"
"country/USA"
"country/UZB"
"country/VEN"
"country/VNM"
"country/XKS"
"country/YEM"
"country/ZAF"
"country/ZMB"
"country/ZWE"
"country/ATG"
"country/BHS"
"country/BLZ"
"country/BRB"
"country/BTN"
"country/COM"
"country/CPV"
"country/DJI"
"country/DMA"
"country/FJI"
"country/GMB"
"country/GNB"
"country/GRD"
"country/GUY"
"country/KIR"
"country/KNA"
"country/LCA"
"country/LSO"
"country/MDV"
"country/MHL"
"country/PLW"
"country/SLB"
"country/STP"
"country/SYC"
"country/TLS"
"country/TON"
"country/VCT"
"country/VUT"
"country/WSM"
"ChannelIslands"
"country/ABW"
"country/AFG"
"country/AND"
"country/ASM"
"country/BDI"
"country/BMU"
"country/CAF"
"country/CYM"
"country/FRO"
"country/FSM"
"country/GIN"
"country/GRL"
"country/GUM"
"country/IMN"
"country/LBR"
"country/LIE"
"country/MAC"
"country/MAF"
"country/MCO"
"country/MLI"
"country/MNP"
"country/MRT"
"country/MWI"
"country/NCL"
"country/PNG"
"country/PRI"
"country/PSE"
"country/PYF"
"country/SLE"
"country/SMR"
"country/SOM"
"country/SXM"
"country/TCA"
"country/VIR"
"country/PAK"
"country/VGB"
"country/THA"
71 changes: 71 additions & 0 deletions scripts/world_bank/wdi/golden_data/golden_summary_report.csv
@@ -0,0 +1,71 @@
"NumPlaces","StatVar","ScalingFactors","MeasurementMethods","Units","observationPeriods","MinDate"
"186","Count_Death_IntentionalSelfHarm_Male_AsFractionOf_Count_Person_Male","[]","[]","[Per100000Males]","[P1Y]","2000"
"203","Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity","[]","[]","[InternationalDollar]","[P1Y]","1990"
"165","Count_Person_Upto4Years_Wasting_AsFractionOf_Count_Person_Upto4Years","[100]","[JointChildMalnutritionEstimate]","[Percent]","[P1Y]","1983"
"144","Count_Person_25OrMoreYears_DoctorateDegree_AsFractionOf_Count_Person_25OrMoreYears","[]","[]","[]","[P1Y]","1994"
"204","Amount_Emissions_CarbonDioxide_PerCapita","[]","[]","[MetricTon]","[P1Y]","1970"
"184","Count_Person_25OrMoreYears_Male_TertiaryEducation_AsFractionOf_Count_Person_25OrMoreYears_Male","[]","[]","[]","[P1Y]","1970"
"218","LifeExpectancy_Person_Female","[]","[]","[Year]","[P1Y]","1960"
"139","Count_Person_25OrMoreYears_Male_DoctorateDegree_AsFractionOf_Count_Person_25OrMoreYears_Male","[]","[]","[]","[P1Y]","1994"
"197","Count_CriminalActivities_MurderAndNonNegligentManslaughter_AsFractionOf_Count_Person","[]","[]","[Per100000Persons]","[P1Y]","1990"
"194","Amount_EconomicActivity_ExpenditureActivity_HealthcareExpenditure_AsFractionOf_Count_Person","[]","[]","[InternationalDollar, USDollar]","[P1Y]","2000"
"202","Amount_EconomicActivity_ExpenditureActivity_EducationExpenditure_Government_AsFractionOf_Amount_EconomicActivity_ExpenditureActivity_Government","[100]","[]","[Percent]","[P1Y]","1980"
"188","Count_Person_25OrMoreYears_Male_BachelorsDegreeOrHigher_AsFractionOf_Count_Person_25OrMoreYears_Male","[]","[]","[]","[P1Y]","1970"
"218","FertilityRate_Person_Female","[]","[]","[]","[]","1960"
"218","Count_Person_Rural","[]","[WorldBankEstimate]","[]","[P1Y]","1960"
"183","Count_Person_25OrMoreYears_Female_TertiaryEducation_AsFractionOf_Count_Person_25OrMoreYears_Female","[]","[]","[]","[P1Y]","1970"
"218","Count_Person_Urban","[]","[WorldBankEstimate]","[]","[P1Y]","1960"
"165","Count_Person_Upto4Years_Overweight_AsFractionOf_Count_Person_Upto4Years","[]","[]","[]","[P1Y]","1983"
"218","LifeExpectancy_Person_Male","[]","[]","[Year]","[P1Y]","1960"
"218","Count_BirthEvent_LiveBirth_AsFractionOf_Count_Person","[]","[]","[Per1000Persons]","[P1Y]","1960"
"197","MortalityRate_Person_Upto4Years_AsFractionOf_Count_BirthEvent_LiveBirth","[]","[]","[Per1000LiveBirths]","[P1Y]","1960"
"218","Count_Person","[]","[]","[]","[P1Y]","1960"
"160","Count_Person_Upto4Years_Male_Wasting_AsFractionOf_Count_Person_Upto4Years_Male","[100]","[JointChildMalnutritionEstimate]","[Percent]","[P1Y]","1986"
"204","Amount_EconomicActivity_ExpenditureActivity_EducationExpenditure_Government_AsFractionOf_Amount_EconomicActivity_GrossDomesticProduction_Nominal","[100]","[]","[Percent]","[P1Y]","1970"
"188","Count_Person_25OrMoreYears_BachelorsDegreeOrHigher_AsFractionOf_Count_Person_25OrMoreYears","[]","[]","[]","[P1Y]","1970"
"165","Count_Person_15OrMoreYears_Female_Smoking_AsFractionOf_Count_Person_15OrMoreYears_Female","[]","[AgeAdjustedPrevalence]","[]","[P1Y]","2000"
"165","Count_Person_15OrMoreYears_Smoking_AsFractionOf_Count_Person_15OrMoreYears","[]","[AgeAdjustedPrevalence]","[]","[P1Y]","2000"
"203","Amount_EconomicActivity_GrossNationalIncome_PurchasingPowerParity_PerCapita","[]","[]","[InternationalDollar]","[P1Y]","1990"
"160","Count_Person_Upto4Years_Male_Overweight_AsFractionOf_Count_Person_Upto4Years_Male","[]","[]","[]","[P1Y]","1986"
"195","Amount_EconomicActivity_ExpenditureActivity_TertiaryEducationExpenditure_Government_AsFractionOf_Amount_EconomicActivity_ExpenditureActivity_EducationExpenditure_Government","[]","[]","[]","[P1Y]","1970"
"159","Count_Person_Upto4Years_Male_SevereWasting_AsFractionOf_Count_Person_Upto4Years_Male","[100]","[JointChildMalnutritionEstimate]","[Percent]","[P1Y]","1986"
"151","Amount_Consumption_Electricity_PerCapita","[]","[]","[KilowattHour]","[P1Y]","1990"
"180","Amount_Consumption_Energy_PerCapita","[]","[]","[KilogramOfOilEquivalent]","[P1Y]","1990"
"186","Count_Death_IntentionalSelfHarm_Female_AsFractionOf_Count_Person_Female","[]","[]","[Per100000Females]","[P1Y]","2000"
"165","Count_Person_15OrMoreYears_Male_Smoking_AsFractionOf_Count_Person_15OrMoreYears_Male","[]","[AgeAdjustedPrevalence]","[]","[P1Y]","2000"
"149","Count_CriminalActivities_MurderAndNonNegligentManslaughter_Male_AsFractionOf_Count_Person_Male","[]","[]","[Per100000Males]","[P1Y]","1990"
"200","Amount_Remittance_InwardRemittance_AsFractionOf_Amount_EconomicActivity_GrossDomesticProduction_Nominal","[100]","[WorldBankEstimate]","[Percent]","[P1Y]","1970"
"188","Count_Person_15To64Years_InLaborForce_AsFractionOf_Count_Person_15To64Years","[]","[]","[]","[P1Y]","1990"
"171","GiniIndex_EconomicActivity","[]","[WorldBankEstimate]","[]","[P1Y]","1963"
"162","Count_Person_25OrMoreYears_Female_MastersDegreeOrHigher_AsFractionOf_Count_Person_25OrMoreYears_Female","[]","[]","[]","[P1Y]","1990"
"170","Count_Person_25OrMoreYears_MastersDegreeOrHigher_AsFractionOf_Count_Person_25OrMoreYears","[]","[]","[]","[P1Y]","1990"
"152","Count_CriminalActivities_MurderAndNonNegligentManslaughter_Female_AsFractionOf_Count_Person_Female","[]","[]","[Per100000Females]","[P1Y]","1990"
"188","Count_Person_15To64Years_Female_InLaborForce_AsFractionOf_Count_Person_15To64Years_Female","[]","[]","[]","[P1Y]","1990"
"104","Amount_Stock_AsFractionOf_Amount_EconomicActivity_GrossDomesticProduction_Nominal","[100]","[]","[Percent]","[P1Y]","1975"
"131","Count_Person_25OrMoreYears_Female_DoctorateDegree_AsFractionOf_Count_Person_25OrMoreYears_Female","[]","[]","[]","[P1Y]","1994"
"215","GrowthRate_Amount_EconomicActivity_GrossDomesticProduction","[]","[]","[]","[P1Y]","1961"
"218","Count_Death_AsAFractionOfCount_Person","[]","[WorldBankWeightedAverage]","[Per1000Persons]","[P1Y]","1960"
"215","Amount_EconomicActivity_GrossDomesticProduction_Nominal","[]","[]","[USDollar]","[P1Y]","1960"
"188","Count_Person_15To64Years_Male_InLaborForce_AsFractionOf_Count_Person_15To64Years_Male","[]","[]","[]","[P1Y]","1990"
"200","Amount_Remittance_InwardRemittance","[]","[WorldBankEstimate]","[USDollar]","[P1Y]","1970"
"161","Count_Person_Upto4Years_SevereWasting_AsFractionOf_Count_Person_Upto4Years","[100]","[JointChildMalnutritionEstimate]","[Percent]","[P1Y]","1983"
"188","Count_Person_25OrMoreYears_Female_BachelorsDegreeOrHigher_AsFractionOf_Count_Person_25OrMoreYears_Female","[]","[]","[]","[P1Y]","1970"
"167","Count_Person_25OrMoreYears_Male_MastersDegreeOrHigher_AsFractionOf_Count_Person_25OrMoreYears_Male","[]","[]","[]","[P1Y]","1990"
"188","Amount_Consumption_Alcohol_15OrMoreYears_AsFractionOf_Count_Person_15OrMoreYears","[]","[WorldHealthOrganizationEstimates]","[Liter]","[P1Y]","2000"
"188","Count_Person_15OrMoreYears_InLaborForce_Female_AsFractionOf_Count_Person_InLaborForce","[]","[]","[]","[P1Y]","1990"
"215","Count_Product_MobileCellularSubscription_AsFractionOf_Count_Person","[]","[]","[]","[P1Y]","1960"
"188","Count_Person_InLaborForce","[]","[InternationalLaborOrganization]","[]","[P1Y]","1990"
"186","Count_Death_IntentionalSelfHarm_AsFractionOf_Count_Person","[]","[]","[Per100000Persons]","[P1Y]","2000"
"197","Count_Death_0Years_AsFractionOf_Count_BirthEvent_LiveBirth","[]","[UnitedNationsIGMEEstimate]","[Per1000LiveBirths]","[P1Y]","1960"
"160","Count_Person_Upto4Years_Female_Wasting_AsFractionOf_Count_Person_Upto4Years_Female","[100]","[JointChildMalnutritionEstimate]","[Percent]","[P1Y]","1986"
"203","Amount_Remittance_OutwardRemittance","[]","[WorldBankEstimate]","[USDollar]","[P1Y]","1970"
"160","Count_Person_Upto4Years_Female_Overweight_AsFractionOf_Count_Person_Upto4Years_Female","[]","[]","[]","[P1Y]","1986"
"214","Count_Person_IsInternetUser_PerCapita","[100]","[]","[]","[P1Y]","1990"
"210","Amount_Production_ElectricityFromNuclearSources_AsFractionOf_Amount_Production_Energy","[]","[]","[]","[P1Y]","1990"
"159","Count_Person_Upto4Years_Female_SevereWasting_AsFractionOf_Count_Person_Upto4Years_Female","[100]","[JointChildMalnutritionEstimate]","[Percent]","[P1Y]","1986"
"184","Count_Person_25OrMoreYears_TertiaryEducation_AsFractionOf_Count_Person_25OrMoreYears","[]","[]","[]","[P1Y]","1970"
"210","Amount_Production_ElectricityFromOilGasOrCoalSources_AsFractionOf_Amount_Production_Energy","[]","[]","[]","[P1Y]","1990"
"218","GrowthRate_Count_Person","[]","[]","[]","[P1Y]","1961"
"213","Amount_Consumption_RenewableEnergy_AsFractionOf_Amount_Consumption_Energy","[]","[]","[]","[P1Y]","1990"
"104","Amount_Stock","[]","[]","[USDollar]","[P1Y]","1975"
"218","LifeExpectancy_Person","[]","[]","[Year]","[]","1960"
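
The list-valued columns in this golden file (`ScalingFactors`, `MeasurementMethods`, `Units`, `observationPeriods`) are stored as bracketed strings such as `[InternationalDollar, USDollar]`. A minimal sketch of reading them back into Python lists follows; `parse_list_cell` is a hypothetical helper, and the assumption that items are separated by commas is taken from the rows above rather than from the tool's documentation.

```python
import csv
import io


def parse_list_cell(cell: str) -> list[str]:
    """Turn a cell like "[InternationalDollar, USDollar]" into a list of strings."""
    inner = cell.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # An empty "[]" cell yields an empty list.
    return [item.strip() for item in inner.split(",") if item.strip()]


# Header and two rows copied from golden_summary_report.csv.
sample = '''"NumPlaces","StatVar","ScalingFactors","MeasurementMethods","Units","observationPeriods","MinDate"
"218","Count_Person","[]","[]","[]","[P1Y]","1960"
"218","Count_Person_Rural","[]","[WorldBankEstimate]","[]","[P1Y]","1960"
'''

for row in csv.DictReader(io.StringIO(sample)):
    print(
        row["StatVar"],
        parse_list_cell(row["MeasurementMethods"]),
        parse_list_cell(row["observationPeriods"]),
    )
```

For the two sample rows this prints `Count_Person [] ['P1Y']` and `Count_Person_Rural ['WorldBankEstimate'] ['P1Y']`.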
3 changes: 2 additions & 1 deletion scripts/world_bank/wdi/manifest.json
@@ -20,7 +20,8 @@
"WorldBankCountries.csv",
"schema_csvs/WorldBankIndicators_prod.csv"
],
- "cron_schedule": "0 11 * * 2"
+ "cron_schedule": "0 11 * * 2",
+ "validation_config_file": "validation_config.json"
}
]
}
28 changes: 28 additions & 0 deletions scripts/world_bank/wdi/validation_config.json
@@ -0,0 +1,28 @@
{
"schema_version": "1.0",
"rules": [
{
"rule_id": "check_deleted_records_percent",
"description": "Checks that the percentage of deleted points is within the threshold.",
"validator": "DELETED_RECORDS_PERCENT",
"params": {
"threshold": 0.61
}
},
{
"rule_id": "check_goldens_output_csv",
"validator": "GOLDENS_CHECK",
"params": {
"golden_files": "golden_data/golden_WorldBank.csv",
"input_files": "output/WorldBank.csv"
}
},
{
"rule_id": "check_goldens_summary_report",
"validator": "GOLDENS_CHECK",
"params": {
"golden_files": "golden_data/golden_summary_report.csv"
}
}
]
}
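
As a quick sanity check of a config like the one above, the following sketch loads the JSON and lists the golden file each `GOLDENS_CHECK` rule points at. It is only an illustration: `golden_paths` is a hypothetical helper, and resolving `golden_files` relative to the directory that holds `validation_config.json` is an assumption; the validator's actual path resolution may differ.

```python
import json
from pathlib import Path

# Contents of validation_config.json as added in this change.
CONFIG_TEXT = """
{
  "schema_version": "1.0",
  "rules": [
    {
      "rule_id": "check_deleted_records_percent",
      "description": "Checks that the percentage of deleted points is within the threshold.",
      "validator": "DELETED_RECORDS_PERCENT",
      "params": {"threshold": 0.61}
    },
    {
      "rule_id": "check_goldens_output_csv",
      "validator": "GOLDENS_CHECK",
      "params": {
        "golden_files": "golden_data/golden_WorldBank.csv",
        "input_files": "output/WorldBank.csv"
      }
    },
    {
      "rule_id": "check_goldens_summary_report",
      "validator": "GOLDENS_CHECK",
      "params": {"golden_files": "golden_data/golden_summary_report.csv"}
    }
  ]
}
"""


def golden_paths(config_text: str, config_dir: Path) -> dict[str, Path]:
    """Map each GOLDENS_CHECK rule_id to its golden file path.

    Assumption: paths are resolved relative to config_dir.
    """
    config = json.loads(config_text)
    paths = {}
    for rule in config["rules"]:
        if rule["validator"] != "GOLDENS_CHECK":
            continue
        golden = rule.get("params", {}).get("golden_files")
        if golden:
            paths[rule["rule_id"]] = config_dir / golden
    return paths


for rule_id, path in golden_paths(CONFIG_TEXT, Path("scripts/world_bank/wdi")).items():
    print(rule_id, "->", path.as_posix())
```

Running the same helper against the checked-in file, and testing `path.exists()` for each result, would flag a golden file that was renamed or moved without updating the config.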