Introduction

UNHCR’s current budget allocation process is needs-based and bottom-up, and therefore explicitly not “formulaic”. The methodology starts with comprehensive, context-specific assessments at the field level, prioritizes within projected funding constraints, and aggregates upward without rigid mathematical formulas dictating fixed shares per outcome area or population.

Though, since the introduction of Results Based Management (RBM) in 2009, the organization collects a wealth of outcome indicators every year, recording for each a baseline, current and target values. This allows for the calculation of progress towards targets, and the estimation of the cost of achieving those targets, therefore creating the potential for an outcome based allocation formula.

Pairing the current needs-based approach with a formula could offer several benefits:

Greater Predictability and Stability: Formula-based elements provide more consistent year-to-year or cross-operation baselines, reducing reliance on ad-hoc reprioritization when funding falls short. This helps operations plan longer-term (e.g., multi-year strategies) without sudden mid-year cuts disrupting protection activities.
Improved Transparency and Accountability : A clear, objective formula makes allocation decisions easier to explain to donors, the Executive Committee, partners, and affected communities. It reduces perceptions of subjectivity or favoritism (e.g., why one outcome area gets more than another) and strengthens audit trails—vital for an organization under constant scrutiny.
Enhanced Equity and Fairness: Formulas incorporating standardized indicators (e.g., per-capita needs, vulnerability indices, or GNI-adjusted host burdens) could better ensure resources reach the most underserved populations proportionally, minimizing biases toward high-profile emergencies while under-resourcing protracted or “forgotten” situations.
Efficiency in Decision-Making: In a system handling dozens of operations and 16 outcome areas, formulas automate or guide initial allocations, freeing time for nuanced qualitative adjustments. This could speed up budgeting cycles and reduce transaction costs compared to fully manual, consultative processes.
Better Comparability and Learning: Standardized formulas enable benchmarking across regions/operations (e.g., cost-per-outcome trends) and facilitate evidence-based refinements over time, supporting organizational learning and donor confidence in value-for-money.
Risk Mitigation in Funding-Constrained Scenarios: With persistent shortfalls, a formula could enforce minimum “floors” for core protection outcomes (e.g., status determination or child protection), preventing deprioritization of essential but less visible activities.

This notebook shows how different budget choices affect the lives of the people UNHCR serves and attempt to bring a bit of formula to the allocation process. Think of it as a “what-if” machine. What if we spent more on clean water? What if we focused entirely on protecting children? We use records from outcome indicators to teach our machine what works, and then we run a simulation to see the results.

To get an overall view, success can be measured by looking at six simple beneficiary-focused effects to be improved for all displaced households:

Lowering protection risks (keeping people safe from harm).
Reducing vulnerability (helping people become more stable).
Meeting basic needs gaps (making sure people have enough food, housing, and water).
Improving health and lowering morbidity (preventing sickness).
Ensuring kids can go to school and have access to education.
Supporting service efficiency (making sure our own systems, like registration, work well).

How we do this in simple terms:

Learning from the past: We look at outcome indicators of previous year. If a project aimed to build 100 shelters and built 90, we know it was 90% successful. We use this to estimate how much “real-world good” comes from every dollar spent.
The Virtual World: We create a computer simulation of thousands of digital households. We let them live out a year in our simulation. If we do nothing, their situations slowly drift and get worse.
Testing Ideas: We give our digital households different support packages based on different budget ideas (like “Equal split”, “Focus on basic needs”, or “AI suggested”). We run each test many times to be perfectly sure of the results.
Finding the Best Value: We compare the results to find out which budget plan gives the most support to the most people for the least amount of money.

This is done across 16 main areas of work. The map below shows how spending on a specific area (like Education) directly connects to improving one of our six goals (like Access to Education). Effects have also inherent interactions.

Code

import networkx as nx
import matplotlib.pyplot as plt

# 1. Start by creating an empty "map" or graph where we will place our nodes and links.
G = nx.DiGraph()

# 2. Define the 'Outcome Areas'—these are the 16 types of programs UNHCR funds.
outcomes = [
    "OA1-Access", "OA2-Status", "OA3-Stateless", "OA4-Gender", "OA5-Child",
    "OA6-Safety", "OA7-Community", "OA8-Basic", "OA9-Housing", "OA10-Health",
    "OA11-Education", "OA12-WASH", "OA13-Well-being", "OA14-16-Solutions"
]

# 3. Define the 'Effects'—these are the 6 key ways we measure improvement in people's lives.
effects = [
    "protection_risk", "vulnerability", "service_efficiency",
    "basic_needs_gap", "morbidity", "education_access"
]

# 4. Add these categories to our graph so the computer knows they exist.
G.add_nodes_from(outcomes + effects)

# 5. Define the "Primary Links": how spending in one program area is expected to change an effect.
# For example, spending on "Education" (OA11) should "increase" "education_access".
edges = [
    ("OA1-Access", "protection_risk", "decrease"),
    ("OA1-Access", "vulnerability", "decrease"),
    ("OA2-Status", "protection_risk", "decrease"),
    ("OA2-Status", "vulnerability", "decrease"),
    ("OA3-Stateless", "protection_risk", "decrease"),
    ("OA3-Stateless", "vulnerability", "decrease"),
    ("OA4-Gender", "protection_risk", "decrease"),
    ("OA5-Child", "protection_risk", "decrease"),
    ("OA4-Gender", "vulnerability", "decrease"),
    ("OA5-Child", "vulnerability", "decrease"),
    ("OA6-Safety", "protection_risk", "decrease"),
    ("OA7-Community", "service_efficiency", "increase"),
    ("OA7-Community", "vulnerability", "decrease"),
    ("OA8-Basic", "basic_needs_gap", "decrease"),
    ("OA9-Housing", "basic_needs_gap", "decrease"),
    ("OA10-Health", "morbidity", "decrease"),
    ("OA11-Education", "education_access", "increase"),
    ("OA12-WASH", "basic_needs_gap", "decrease"),
    ("OA12-WASH", "service_efficiency", "increase"),
    ("OA13-Well-being", "vulnerability", "decrease"),
    ("OA14-16-Solutions", "vulnerability", "decrease"),
    ("OA14-16-Solutions", "protection_risk", "decrease")
]

# 6. Tell the graph to draw arrows for all these primary links.
G.add_edges_from([(u, v) for u, v, _ in edges])

# 7. Define "Secondary Links" (Collinearity): how these effects naturally pull on each other.
# For example, if sickness (morbidity) goes down, the basic needs gap usually improves too.
collinearity_edges = [
    ("protection_risk", "vulnerability"),
    ("basic_needs_gap", "morbidity"),
    ("service_efficiency", "education_access"),
    ("basic_needs_gap", "vulnerability")
]

# 8. Set the layout: Programs on the left, Goals/Effects on the right.
pos = {}

# Layout for the left side (Outcomes)
x_left = -1
x_left_2 = -1.25
y_start = 1
step = -0.15

for i, node in enumerate(outcomes):
    x = x_left if i % 2 == 0 else x_left_2
    y = y_start + step * i
    pos[node] = (x, y)

# Layout for the right side (Effects)
x_right = 1.2
x_right_2 = 1.45
y_effects_start = 0.6
effect_step = -0.22

for i, node in enumerate(effects):
    x = x_right if i % 2 == 0 else x_right_2
    y = y_effects_start + effect_step * i
    pos[node] = (x, y)

# 9. Actually draw the diagram with specific colors and shapes.
plt.figure(figsize=(8, 10))

# Outcomes are green circles.
nx.draw_networkx_nodes(
    G, pos,
    nodelist=outcomes,
    node_color="lightgreen",
    node_size=2200,
    alpha=0.9
)

# Effects are blue squares.
nx.draw_networkx_nodes(
    G, pos,
    nodelist=effects,
    node_color="lightblue",
    node_size=2600,
    node_shape="s",
    alpha=0.85
)

# Add the labels.
nx.draw_networkx_labels(G, pos, font_size=9)

# Draw the arrows: Blue means something increases, Red means it decreases.
for u, v, t in edges:
    color = "blue" if t == "increase" else "red"
    nx.draw_networkx_edges(
        G, pos,
        edgelist=[(u, v)],
        edge_color=color,
        width=2,
        arrowsize=20,
        arrowstyle='-|>',
        connectionstyle="arc3,rad=0.2"
    )

# Draw dotted gray lines for the secondary "spill-over" links.
for u, v in collinearity_edges:
    nx.draw_networkx_edges(
        G, pos,
        edgelist=[(u, v)],
        style="dotted",
        edge_color="gray",
        width=2,
        arrows=False,
        connectionstyle="arc3,rad=0.25"
    )

# 10. Add a legend so the reader knows what the colors and shapes mean.
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='lightgreen', label='Outcome Areas'),
    Patch(facecolor='lightblue', label='Effects'),
    plt.Line2D([0], [0], color='blue', lw=2, label='Increase (+)'),
    plt.Line2D([0], [0], color='red', lw=2, label='Decrease (–)'),
    plt.Line2D([0], [0], color='gray', lw=2, linestyle='dotted', label='Collinearity')
]

plt.legend(handles=legend_elements, loc='upper right')

# Final adjustments to make it look clean.
plt.axis('off')
plt.tight_layout()
plt.show()

Analysis Steps

Step 1: Gathering the Data

UNHCR as part of its commitment to transparency, makes available extensive information about its activities through the International Aid Transparency Initiative (IATI). Using this public data source, we can look at past projects to see what the budget & the goal were, and what actually happened by the end of the year.

Code

# 1. Load the necessary tools (libraries) for data processing and visualization.
import os, re, json, math, random, textwrap, logging
from collections import defaultdict
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from IPython.display import display, HTML

# Set the visual style for all our charts.
plt.style.use('seaborn-v0_8')
plt.rcParams.update({'figure.figsize':(7,5),'axes.titlesize':12,'axes.labelsize':11,'legend.fontsize':10})

# Set up a logger to keep track of what the computer is doing.
logging.basicConfig(level=logging.INFO, format='[%(levelname)s] %(message)s')

# Try to load secret environment variables (like API keys) if they exist.
try:
    from dotenv import load_dotenv
    load_dotenv('.env')
except Exception:
    pass

# 2. Retrieve the settings for this specific simulation (the "Parameters").
try:
    PARAMS = params
except NameError:
    PARAMS = {}

# We set defaults for things like which operation we are looking at and how many people to simulate.
ACTIVITY_ID = PARAMS.get('activity_id', "XM-DAC-41121-2024-AME-PER") 
MONTHS = int(PARAMS.get('months', 12))
HOUSEHOLDS = int(PARAMS.get('households', 10000))
USE_LLM = str(PARAMS.get('use_llm', 'true')).lower() in ['1','true','yes']
BASE_SEED = int(PARAMS.get('seed', 7))
N_RUNS = int(PARAMS.get('n_runs', 20))
DEA_VRS = bool(PARAMS.get('dea_vrs', 'true').__str__().lower() in ['1','true','yes'])

# We also set parameters for how different effects "leak" into each other (collinearity).
COLLINEARITY_PR_VUL = float(PARAMS.get('col_pr_vul', 0.2)) # protection_risk -> vulnerability
COLLINEARITY_VUL_PR = float(PARAMS.get('col_vul_pr', 0.2)) # vulnerability -> protection_risk
COLLINEARITY_BNG_MOR = float(PARAMS.get('col_bng_mor', 0.2)) # basic_needs_gap -> morbidity
COLLINEARITY_MOR_BNG = float(PARAMS.get('col_mor_bng', 0.2)) # morbidity -> basic_needs_gap
COLLINEARITY_SE_EA = float(PARAMS.get('col_se_ea', 0.2)) # service_efficiency -> education_access
COLLINEARITY_EA_SE = float(PARAMS.get('col_ea_se', 0.2)) # education_access -> service_efficiency
COLLINEARITY_BNG_VUL = float(PARAMS.get('col_bng_vul', 0.2)) # basic_needs_gap -> vulnerability
COLLINEARITY_VUL_BNG = float(PARAMS.get('col_vul_bng', 0.2)) # vulnerability -> basic_needs_gap

# 3. Locate and check the raw data file (the public IATI record).
xml_path = 'unhcr-activities-2024.xml'
assert os.path.exists(xml_path), 'Place unhcr-activities-2024.xml alongside this .qmd'

# Set up the random number generator so our results are consistent and reproducible.
from numpy.random import SeedSequence, default_rng
ss_root = SeedSequence(BASE_SEED)

# Define simple helper functions to clean up text and numbers from the raw data.
def get_text(e, default=None):
    return e.text.strip() if (e is not None and e.text) else default

def safe_float(x, default=None):
    try:
        if x is None: return default
        return float(str(x).strip())
    except Exception:
        return default

# 4. Open and parse the massive XML data file.
import xml.etree.ElementTree as ET
root = ET.parse(xml_path).getroot()

# Create empty lists to store the different pieces of information we find.
activities, sectors, transactions, budgets = [], [], [], []
indicators_rows = []

# Loop through every project (activity) in the file and pull out relevant details.
for act in root.findall('.//iati-activity'):
    aid = get_text(act.find('iati-identifier'))
    title = get_text(act.find('title/narrative'))
    status = (act.find('activity-status').get('code') if act.find('activity-status') is not None else None)

    # Extract information about which sectors (Outcome Areas) are involved.
    for s in act.findall('sector'):
        sectors.append({'activity_id': aid,'vocabulary': s.get('vocabulary'),'code': s.get('code'),
                        'percentage': safe_float(s.get('percentage'), 0.0),'label': get_text(s.find('narrative'))})

    # Extract the planned budget for each project.
    for b in act.findall('budget'):
        v = b.find('value')
        budgets.append({'activity_id': aid,'type': b.get('type'),'status': b.get('status'),
                        'period_start': b.find('period-start').get('iso-date') if b.find('period-start') is not None else None,
                        'period_end': b.find('period-end').get('iso-date') if b.find('period-end') is not None else None,
                        'currency': v.get('currency') if v is not None else None,
                        'value': safe_float(v.text if v is not None else None, 0.0)})

    # Extract every financial transaction (how much was actually spent).
    for tr in act.findall('transaction'):
        val = tr.find('value')
        usd = get_text(tr.find('{http://reporting.unhcr.org/}value-USD'))
        transactions.append({'activity_id':aid,'type':get_text(tr.find('transaction-type')),
                             'date': tr.find('transaction-date').get('iso-date') if tr.find('transaction-date') is not None else None,
                             'value': safe_float(val.text if val is not None else None),
                             'currency': val.get('currency') if val is not None else None,
                             'value_usd': safe_float(usd,0.0),'provider': get_text(tr.find('provider-org/narrative'))})

    # Extract the outcome indicators (the targets and actual results).
    for res in act.findall('result'):
        r_title = get_text(res.find('title/narrative'))
        r_type = res.get('type')
        m = re.match(r"\s*(\d+)\.", r_title or '')
        idx = m.group(1) if m else None
        code = f'IA{idx}' if (r_type=='3' and idx) else (f'OA{idx}' if (r_type=='2' and idx) else None)
        for ind in res.findall('indicator'):
            i_title = get_text(ind.find('title/narrative'))
            baseline = ind.find('baseline')
            bval = safe_float(baseline.get('value')) if baseline is not None else None
            periods = ind.findall('period')
            if not periods:
                indicators_rows.append({'activity_id':aid,'result_code':code,'result_type':r_type,'result_title':r_title,
                                        'indicator_title':i_title,'baseline':bval,'target':None,'actual':None,
                                        'period_start':None,'period_end':None})
            for per in periods:
                p_start = per.find('period-start').get('iso-date') if per.find('period-start') is not None else None
                p_end = per.find('period-end').get('iso-date') if per.find('period-end') is not None else None
                t = per.find('target'); a = per.find('actual')
                indicators_rows.append({'activity_id':aid,'result_code':code,'result_type':r_type,'result_title':r_title,
                                        'indicator_title':i_title,'baseline':bval,
                                        'target': safe_float(t.get('value')) if t is not None else None,
                                        'actual': safe_float(a.get('value')) if a is not None else None,
                                        'period_start':p_start,'period_end':p_end})

# 5. Convert all these lists into clean, searchable tables (DataFrames).
activities_df = pd.DataFrame([{'activity_id': get_text(act.find('iati-identifier')),
                               'title': get_text(act.find('title/narrative')),
                               'status': (act.find('activity-status').get('code') if act.find('activity-status') is not None else None)}
                               for act in root.findall('.//iati-activity')])
sectors_df = pd.DataFrame(sectors)
transactions_df = pd.DataFrame(transactions)
budgets_df = pd.DataFrame(budgets)
indicators_df = pd.DataFrame(indicators_rows)

# 6. Filter the data to focus ONLY on the specific country/activity we selected at the start.
if ACTIVITY_ID is None:
    ACTIVITY_ID = activities_df['activity_id'].iloc[0]
assert ACTIVITY_ID in set(activities_df['activity_id']), 'activity_id not found'

sel_act = activities_df[activities_df['activity_id']==ACTIVITY_ID].iloc[0]

# Split the data into different categories (Impact Areas, Outcome Areas, and Indicators).
sel_ia = sectors_df[(sectors_df['activity_id']==ACTIVITY_ID) & (sectors_df['vocabulary']=='99') & (sectors_df['code'].str.startswith('IA', na=False))].copy()
sel_oa = sectors_df[(sectors_df['activity_id']==ACTIVITY_ID) & (sectors_df['vocabulary']=='98') & (sectors_df['code'].str.startswith('OA', na=False))].copy()
sel_inds = indicators_df[indicators_df['activity_id']==ACTIVITY_ID].copy()

# 7. Calculate the total budget and actual expenditure reported for this project.
sel_bud = budgets_df[budgets_df['activity_id']==ACTIVITY_ID].copy()
if not sel_bud.empty:
    currs = set(sel_bud['currency'].dropna().unique().tolist())
    if len(currs) > 1:
        logging.warning("Budget values use multiple currencies; totals may be incomparable without FX conversion.")
    year_rows = sel_bud[(sel_bud['type']=='1') & (sel_bud['status']=='2')]
    budget_total_usd = float(year_rows['value'].sum()) if not year_rows.empty else float(sel_bud['value'].sum())
else:
    budget_total_usd = 0.0

print(f"Selected activity: {ACTIVITY_ID} - {sel_act['title']}")
print(f"Budget total (as reported): ${budget_total_usd:,.2f}")

sel_exp = transactions_df[(transactions_df['activity_id']==ACTIVITY_ID) & (transactions_df['type']=='4')]
expenditure_total_usd = float(sel_exp['value_usd'].sum()) if not sel_exp.empty else 0.0
print(f"Expenditure total: ${expenditure_total_usd:,.2f}")

Selected activity: XM-DAC-41121-2024-AME-PER - UNHCR operation in Peru (2024)
Budget total (as reported): $71,778,432.00
Expenditure total: $0.00

Step 2: Learning from Past Success

Here we figure out how powerful each type of outcome spending is. If spending on “Education” historically hits all its targets, we give it a high score. We use these scores to predict how much a future project will support our digital households.

Simple Explanation: How We Grade Our Programs

Did we reach the target? If a past project’s goal was to build 100 shelters (from a baseline of 0), and they actually built 80, we mathematically score that outcome area as 80% successful ((Actual - Baseline) / (Target - Baseline) = +80%). We are very strict: we only look at projects where we have the exact Actual, Target, and Baseline numbers recorded.
Mapping to our 6 goals: We link that historic success score directly to one of our six main household goals. For example, if the “Water and Sanitation” outcome area achieves a high target score, spending money on it will strongly improve our digital households’ “Basic Needs” score.
Keeping it realistic (Capping the score): If a project completely over-delivers (say, building 200 shelters instead of 100, which would be 200%), or if a slight data error implies a massive 500% jump, we automatically cap the maximum possible effect. The script limits the maximum positive or negative effect a outcome can have to exactly ±20%. This prevents the simulation from breaking reality with impossible miracles or disasters just because of one anomalous data point.

Code

# 1. Define human-readable labels for the 16 Outcome Areas (OA) used by UNHCR.
OA_FULL_LABELS = {
    'OA1': 'OA1. Access to Territory, Registration and Documentation',
    'OA2': 'OA2. Status Determination',
    'OA3': 'OA3. Protection Policy and Law',
    'OA4': 'OA4. Gender-based Violence',
    'OA5': 'OA5. Child Protection',
    'OA6': 'OA6. Safety and Access to Justice',
    'OA7': 'OA7. Community Engagement and Participation',
    'OA8': 'OA8. Well-Being and Basic Needs',
    'OA9': 'OA9. Sustainable Housing and Settlements',
    'OA10': 'OA10. Healthy Lives',
    'OA11': 'OA11. Education',
    'OA12': 'OA12. Clean Water, Sanitation and Hygiene',
    'OA13': 'OA13. Self Reliance, Economic Inclusion and Livelihoods',
    'OA14': 'OA14. Voluntary Repatriation and Sustainable Reintegration',
    'OA15': 'OA15. Resettlement and Complementary Pathways',
    'OA16': 'OA16. Local Integration and other Local Solutions',
}

# 2. Map these labels to the specific data we extracted earlier.
SRF_MAP = {}
for _, r in sel_ia.iterrows():
    SRF_MAP[r['code']] = {
        'domain': r['label'],
        'level': 'IA',
        'percentage': float(r['percentage'] or 0.0)
    }
for _, r in sel_oa.iterrows():
    label = OA_FULL_LABELS.get(r['code'], r['label'])
    SRF_MAP[r['code']] = {
        'domain': label,
        'level': 'OA',
        'percentage': float(r['percentage'] or 0.0)
    }

# 3. Use "Bayesian Estimation" to figure out how successful each program area is.
# This is a fancy way of saying we look at the results and filter out "noise" or lucky guesses.
import pymc as pm
import numpy as np
import pandas as pd
import pytensor.tensor as pt

# Clean the data: only look at indicators where we have baseline, actual, and target numbers.
df_imp = sel_inds.dropna(subset=['baseline', 'actual', 'target']).copy()

# Calculate the "Raw Improvement" (how close we got to the target).
df_imp["raw_improve"] = (df_imp["actual"] - df_imp["baseline"]) / \
                        (df_imp["target"] - df_imp["baseline"])
df_imp["raw_improve"] = df_imp["raw_improve"].clip(-1, 1)

# Categorize the data by program area.
codes = df_imp["result_code"].astype("category").cat
code_idx = codes.codes.values
n_codes = len(codes.categories)

# Build a statistical model that estimates the "true" success rate of each program area.
with pm.Model() as m:
    
    # Estimate the global average improvement across all programs.
    mu_global = pm.Normal("mu_global", mu=0.0, sigma=0.3)

    # Estimate how much each specific program (OA) typically differs from that average.
    sigma_oa = pm.Exponential("sigma_oa", 10)
    oa_effect = pm.Normal("oa_effect",
                          mu=mu_global,
                          sigma=sigma_oa,
                          shape=n_codes)

    # Account for the random noise found in individual indicators.
    sigma_obs = pm.Exponential("sigma_obs", 20)

    # Tell the model to learn from the actual improvements we observed.
    obs = pm.Normal(
        "obs",
        mu = oa_effect[code_idx],
        sigma = sigma_obs,
        observed = df_imp["raw_improve"].values
    )

    # Run the "Sampling" process (the computer runs thousands of tests to find the best fit).
    idata = pm.sample(
        draws=800,
        tune=1200,
        chains=2,
        target_accept=0.9,
        progressbar=False
    )

# Extract the final "Success Scores" for each program area.
posterior_means = idata.posterior["oa_effect"].mean(
    dim=("chain", "draw")
).values

# Store these scores in a dictionary for the simulation to use.
code_imp = dict(zip(codes.categories, posterior_means))

# 4. Map the programs to the specific household goals they improve.
# This is a "Hybrid" mapping: part expert knowledge, part data-driven correlation.

# Expert Mapping: Which programs SHOULD influence which goals?
EXP_OA_TO_VARS = {
    'OA1': ['protection_risk','vulnerability'],
    'OA2': ['protection_risk','vulnerability'],
    'OA3': ['protection_risk','vulnerability'],
    'OA4': ['protection_risk','vulnerability'],
    'OA5': ['protection_risk','vulnerability'],
    'OA6': ['protection_risk'],
    'OA7': ['service_efficiency','vulnerability'],
    'OA8': ['basic_needs_gap'],
    'OA9': ['basic_needs_gap'],
    'OA10':['morbidity'],
    'OA11':['education_access'],
    'OA12':['basic_needs_gap','service_efficiency'],
    'OA13':['vulnerability'],
    'OA14':['vulnerability','protection_risk'],
    'OA15':['vulnerability','protection_risk'],
    'OA16':['vulnerability','protection_risk'],
}

EXP_IA_TO_VARS = {
    'IA1':['protection_risk','vulnerability'],
    'IA2':['basic_needs_gap','morbidity','education_access','vulnerability'],
    'IA3':['vulnerability','education_access','service_efficiency'],
    'IA4':['vulnerability'],
}

# Data-driven part: check how strongly the numbers in our data actually correlate.
corr_scores = {}
for c in df_imp["result_code"].unique():
    subset = df_imp[df_imp["result_code"] == c]
    if len(subset) < 3:
        corr_scores[c] = 0
        continue
    numeric = subset[["baseline","actual","target"]].corr().abs()
    score = float(numeric.mean().mean())
    corr_scores[c] = score if not pd.isna(score) else 0.0

max_corr = max(corr_scores.values()) if corr_scores else 1
norm_corr = {c: corr_scores[c]/max_corr for c in corr_scores}

# Combine the Expert and Data-driven parts (60% expert weight).
alpha = 0.6

OA_TO_VARS = {}
for code, vars_ in EXP_OA_TO_VARS.items():
    data_strength = norm_corr.get(code, 0)
    if pd.isna(data_strength): data_strength = 0.0
    OA_TO_VARS[code] = {
        v: alpha*1.0 + (1-alpha)*data_strength
        for v in vars_
    }

IA_TO_VARS = {}
for code, vars_ in EXP_IA_TO_VARS.items():
    data_strength = norm_corr.get(code, 0)
    if pd.isna(data_strength): data_strength = 0.0
    IA_TO_VARS[code] = {
        v: alpha*1.0 + (1-alpha)*data_strength
        for v in vars_
    }

# 5. Define a "Non-Linear" scale for how success turns into real-world change.
# This means that the first few dollars spent have a big impact, but it gets harder to improve further.
INCREASING_VARS = {'education_access', 'service_efficiency'}
ELASTICITY_BETA = 0.15       # Controls the intensity of the effect.
MAX_ABS_EFFECT  = 0.20       # A safety cap to keep the simulation realistic.

def nonlinear_effect(imp, beta=ELASTICITY_BETA):
    if imp is None or np.isnan(imp):
        return 0.0
    imp = float(np.clip(imp, -1.0, 1.0))
    # Use a logarithmic curve to represent diminishing returns.
    return float(np.sign(imp) * np.log1p(abs(imp)) * beta)

# Compute the final calculated effects for every program area.
code_effects = {}

for code, imp in code_imp.items():
    if not code:
        continue
    var_map = OA_TO_VARS.get(code, {}) if code.startswith('OA') else IA_TO_VARS.get(code, {})
    if not var_map:
        continue
    effs = {}
    for var, w in var_map.items():
        # Flip the sign for things we want to DECREASE (like risks or sickness).
        signed_imp = imp if var in INCREASING_VARS else -imp
        e = nonlinear_effect(signed_imp) * float(w)
        effs[var] = float(np.clip(e, -MAX_ABS_EFFECT, MAX_ABS_EFFECT))
    if effs:
        code_effects[code] = effs

# 6. Organize everything into a "Library of Effects" for the simulation.
from collections import defaultdict
label_to_effects = defaultdict(list)
for code, effs in code_effects.items():
    info = SRF_MAP.get(code, {})
    label = f"{code} - {info.get('domain', code)}"
    label_to_effects[label].append(effs)

SRF_EFFECT_LIBRARY = {}
for label, lst in sorted(label_to_effects.items()):
    all_vars = set().union(*[d.keys() for d in lst]) if lst else set()
    SRF_EFFECT_LIBRARY[label] = {
        v: float(np.mean([d.get(v, 0.0) for d in lst]))
        for v in all_vars
    }

# 7. Create a "Traceability Table" to show the math behind every score.
from IPython.display import Markdown
print("\n[Indicator-to-Effect Traceability]")
trace_rows = []
for code, imp in code_imp.items():
    if not code.startswith('OA'):
        continue
    info = SRF_MAP.get(code, {})
    inds = df_imp[df_imp['result_code'] == code] if not df_imp.empty else pd.DataFrame()
    if inds.empty:
        inds = sel_inds[sel_inds['result_code'] == code]
    vars_ = OA_TO_VARS.get(code, [])
    budget_pct = info.get('percentage', 0.0) * 100
    display_label = f"{code} - {info.get('domain', code)}"
    eff_lib_entry = SRF_EFFECT_LIBRARY.get(display_label, {})
    oa_full = f"{OA_FULL_LABELS.get(code, display_label)} ({budget_pct:.1f}% of budget)"
    for _, idx_row in inds.iterrows():
        for v in vars_:
            raw_val = idx_row.get('raw_improve', idx_row.get('improve', None))
            trace_rows.append({
                'Effect': v,
                'Outcome': oa_full,
                'Indicator': idx_row['indicator_title'],
                'Baseline': idx_row['baseline'],
                'Actual': idx_row['actual'],
                'Target': idx_row['target'],
                'Raw Improvement': round(float(raw_val), 4) if pd.notna(raw_val) else None,
                'Effect Magnitude': round(eff_lib_entry.get(v, 0.0), 4),
            })

if trace_rows:
    trace_df = pd.DataFrame(trace_rows).sort_values(['Effect', 'Outcome'])
    def _effect_cell_color(val):
        if pd.isna(val) or val is None: return ''
        if val > 0:
            intensity = min(abs(val) * 5, 1.0)
            return f'background-color:rgba(46,204,113,{intensity:.2f});color:{"white" if intensity>0.5 else "black"}'
        elif val < 0:
            intensity = min(abs(val) * 5, 1.0)
            return f'background-color:rgba(231,76,60,{intensity:.2f});color:{"white" if intensity>0.5 else "black"}'
        return ''
    def _imp_cell_color(val):
        if pd.isna(val) or val is None: return ''
        if val > 0:
            return 'color:#27ae60;font-weight:bold'
        elif val < 0:
            return 'color:#e74c3c;font-weight:bold'
        return ''
    md_out = "::: {.panel-tabset}\n\n"
    for effect_name, group in trace_df.groupby('Effect'):
        md_out += f"## {effect_name.replace('_', ' ').title()}\n\n"
        rows_html = ''
        for _, r in group.iterrows():
            eff_style  = _effect_cell_color(r['Effect Magnitude'])
            imp_style  = _imp_cell_color(r['Raw Improvement'])
            rows_html += (
                f'<tr>'
                f'<td style="font-size:12px">{r["Outcome"]}</td>'
                f'<td style="font-size:11px;max-width:250px">{r["Indicator"]}</td>'
                f'<td style="text-align:right;font-size:11px">{r["Baseline"]}</td>'
                f'<td style="text-align:right;font-size:11px">{r["Actual"]}</td>'
                f'<td style="text-align:right;font-size:11px">{r["Target"]}</td>'
                f'<td style="text-align:center;{imp_style}">{r["Raw Improvement"]:.4f}</td>'
                f'<td style="text-align:center;{eff_style}">{r["Effect Magnitude"]:.4f}</td>'
                f'</tr>\n'
            )
        table_html = f"""
        <div style="max-height:400px; overflow-y:auto; border:1px solid #ddd; border-radius:4px;">
        <table style="border-collapse:collapse; width:100%; font-family:sans-serif; font-size:13px;">
          <thead>
            <tr>
              <th style="background:#2c3e50; color:white; padding:8px 10px; text-align:left; position:sticky; top:0; z-index:1;">Outcome</th>
              <th style="background:#2c3e50; color:white; padding:8px 10px; text-align:left; position:sticky; top:0; z-index:1;">Indicator</th>
              <th style="background:#2c3e50; color:white; padding:8px 10px; text-align:right; position:sticky; top:0; z-index:1;">Baseline</th>
              <th style="background:#2c3e50; color:white; padding:8px 10px; text-align:right; position:sticky; top:0; z-index:1;">Actual</th>
              <th style="background:#2c3e50; color:white; padding:8px 10px; text-align:right; position:sticky; top:0; z-index:1;">Target</th>
              <th style="background:#2c3e50; color:white; padding:8px 10px; text-align:center; position:sticky; top:0; z-index:1;">Raw Improvement</th>
              <th style="background:#2c3e50; color:white; padding:8px 10px; text-align:center; position:sticky; top:0; z-index:1;">Effect Magnitude</th>
            </tr>
          </thead>
          <tbody>
        {rows_html}
          </tbody>
        </table>
        </div>
        """
        md_out += f"```{=html}\n{table_html}\n```\n\n"
    md_out += ":::\n"
    display(Markdown(md_out))
else:
    print("No indicator traceability data found.")

# 8. Build final summary tables and visual networks for the Outcome Areas.
srf_df = pd.DataFrame.from_dict(SRF_MAP, orient='index').reset_index().rename(columns={'index':'Code'})
eff_rows = []
for display_label, effs in SRF_EFFECT_LIBRARY.items():
    for var, val in effs.items():
        eff_rows.append({'Domain': display_label, 'Variable': var, 'Effect': val})

if eff_rows:
    eff_df = pd.DataFrame(eff_rows)
    pivot_eff = eff_df.pivot(index='Domain', columns='Variable', values='Effect').fillna(0.0)
    def style_effects(v):
        if v == 0: return ""
        color = "0, 114, 181" if v > 0 else "214, 39, 40"
        alpha = min(abs(v) * 4, 1.0)
        return f"background-color: rgba({color}, {alpha:.2f}); color: {'white' if alpha > 0.5 else 'black'}"
    print("\n[Predicted Outcome Effects Estimation]")
    oa_pivot_eff = pivot_eff[pivot_eff.index.str.startswith('OA')].copy()
    pivot_index_to_full_label = {idx: idx.split(' - ', 1)[1] if ' - ' in idx else idx for idx in oa_pivot_eff.index}
    oa_pivot_eff = oa_pivot_eff.rename(index=pivot_index_to_full_label)
    oa_order = list(OA_FULL_LABELS.values())
    oa_pivot_eff = oa_pivot_eff.reindex(oa_order)
    if not oa_pivot_eff.empty:
        display(oa_pivot_eff.style.map(style_effects).format("{:.3f}"))
    else:
        print("No Outcome Area (OA) effects calculated for this activity.")
else:
    print("\n[Predicted Effects Estimation]")
    print("No effects calculated for this activity based on available indicator data.")

# Define the network plotting tool to visualize the results.
import networkx as nx
import matplotlib.pyplot as plt
import textwrap

def plot_srf_network(srf_map, effect_lib, level_filter=None):
    G = nx.DiGraph()
    outcome_color = "#2ecc71"
    effect_color = "#b4d4ff"
    zero_edge_color = "#bdc3c7"
    filtered_map = ({k: v for k, v in srf_map.items() if v['level'] == level_filter} if level_filter else srf_map)
    if not filtered_map:
        return
    pcts = [v.get("percentage", 0.0) for v in filtered_map.values()]
    min_pct, max_pct = (min(pcts), max(pcts)) if pcts else (0.0, 1.0)
    min_node_size, max_node_size = 300, 2200
    for code, info in filtered_map.items():
        full_label = f"{code} - {info['domain']}"
        label_wrapped = "\n".join(textwrap.wrap(full_label, 20))
        pct = info.get("percentage", 0.0)
        pct_text = f"({pct:.1f}%)"
        node_label = f"{label_wrapped}\n{pct_text}"
        norm = (pct - min_pct) / (max_pct - min_pct) if max_pct > min_pct else 0.5
        node_size = min_node_size + int(norm * (max_node_size - min_node_size))
        G.add_node(node_label, type="outcome", color=outcome_color, size=node_size)
        for eff_name, value in effect_lib.get(full_label, {}).items():
            eff_label = "\n".join(textwrap.wrap(eff_name, 15))
            if eff_label not in G:
                G.add_node(eff_label, type="effect", color=effect_color, size=2600)
            if value == 0:
                G.add_edge(node_label, eff_label, weight=1.0, color=zero_edge_color, val=value, style="dotted", arrow=False)
            else:
                color = "#3498db" if value > 0 else "#e74c3c"
                G.add_edge(node_label, eff_label, weight=1.5 + abs(value) * 80, color=color, val=value, style="solid", arrow=True)
    pos = {}
    outcomes = [n for n, d in G.nodes(data=True) if d['type'] == "outcome"]
    effects  = [n for n, d in G.nodes(data=True) if d['type'] == "effect"]
    x1, x2, y0, dy = -1.3, -1.05, 1.0, -0.16
    for i, n in enumerate(outcomes): pos[n] = (x1 if i % 2 == 0 else x2, y0 + i * dy)
    xr1, xr2, y0e, dye = 1.2, 1.45, 0.8, -0.22
    for i, n in enumerate(effects): pos[n] = (xr1 if i % 2 == 0 else xr2, y0e + i * dye)
    plt.figure(figsize=(8, 10))
    nx.draw_networkx_nodes(G, pos, nodelist=outcomes, node_color=outcome_color, node_size=[G.nodes[n]["size"] for n in outcomes], alpha=0.95)
    nx.draw_networkx_nodes(G, pos, nodelist=effects, node_color=effect_color, node_size=[G.nodes[n]["size"] for n in effects], alpha=0.95, node_shape="s")
    for u, v, d in G.edges(data=True):
        nx.draw_networkx_edges(G, pos, edgelist=[(u, v)], width=d["weight"], edge_color=d["color"], style=d["style"], arrows=d.get("arrow", False), arrowsize=25, connectionstyle="arc3,rad=0.22")
    edge_labels = {(u, v): f"{d['val']:.3f}" for u, v, d in G.edges(data=True)}
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, font_size=7, font_color="#2c3e50")
    nx.draw_networkx_labels(G, pos, font_size=8, font_weight="bold")
    plt.title("Effect of Outcome Areas\n(Blue = Increase / Red = Decrease / Grey = Zero)", fontweight="bold", fontsize=14)
    plt.axis("off")
    plt.tight_layout()
    plt.show()

# Show the results.
plot_srf_network(SRF_MAP, SRF_EFFECT_LIBRARY, level_filter='OA')


[Indicator-to-Effect Traceability]

Outcome	Indicator	Baseline	Actual	Target	Raw Improvement	Effect Magnitude
OA8. Well-Being and Basic Needs (4057.0% of budget)	8.1 Proportion of people that receive cash transfers and/or non-food items	4.0	2.0	6.0	-1.0000	0.0241
OA8. Well-Being and Basic Needs (4057.0% of budget)	8.2 Proportion of people with primary reliance on clean (cooking) fuels and technology	96.0	90.0	100.0	-1.0000	0.0241

Outcome	Indicator	Baseline	Actual	Target	Raw Improvement	Effect Magnitude
OA1. Access to Territory, Registration and Documentation (624.0% of budget)	1.3 Proportion of people with legally recognized identity documents or credentials	65.0	100.0	95.0	1.0000	-0.0214
OA1. Access to Territory, Registration and Documentation (624.0% of budget)	1.1 Proportion of refugees and asylum seekers registered on an individual basis	16.0	18.0	100.0	0.0238	-0.0214
OA1. Access to Territory, Registration and Documentation (624.0% of budget)	1.2 Proportion of children under 5 years of age whose births have been registered with a civil authority	100.0	100.0	100.0	nan	-0.0214
OA15. Resettlement and Complementary Pathways (131.0% of budget)	15.1 Number of refugees submitted by UNHCR for resettlement	1532.0	2745.0	3130.0	0.7591	-0.0134
OA3. Protection Policy and Law (0.0% of budget)	3.1 Extent national legal framework is in line with the 1951 Convention and/or its 1967 Protocol	2.0	2.0	1.0	-0.0000	-0.0005
OA3. Protection Policy and Law (0.0% of budget)	3.2 Extent national legal framework is in line with the 1961 Convention on the Reduction of Statelessness	1.0	1.0	1.0	nan	-0.0005
OA4. Gender-based Violence (724.0% of budget)	4.1 Proportion of people who know where to access available GBV services	33.0	59.0	85.0	0.5000	-0.0071
OA4. Gender-based Violence (724.0% of budget)	4.3 Proportion of survivors who are satisfied with GBV case management services	91.0	91.0	100.0	0.0000	-0.0071
OA5. Child Protection (708.0% of budget)	5.3 Proportion of unaccompanied and separated children who are in an alternative care arrangement	10.0	39.0	100.0	0.3222	0.0055
OA5. Child Protection (708.0% of budget)	5.2 Proportion of children who participate in community-based child protection programmes	21.0	15.0	25.0	-1.0000	0.0055
OA5. Child Protection (708.0% of budget)	5.1 Proportion of children at heightened risk who are supported by a Best Interests Procedure	54.0	64.0	85.0	0.3226	0.0055

Outcome	Indicator	Baseline	Actual	Target	Raw Improvement	Effect Magnitude
OA7. Community Engagement and Participation (923.0% of budget)	7.1 Extent participation of displaced and stateless people across programme phases is supported.	1.0	2.0	1.0	1.0000	0.0287
OA7. Community Engagement and Participation (923.0% of budget)	7.2 Proportion of people who have access to safe feedback and response mechanisms	0.0	47.0	100.0	0.4700	0.0287
OA7. Community Engagement and Participation (923.0% of budget)	7.3 Proportion of women participating in leadership/management structures	80.0	77.0	70.0	0.3000	0.0287

Outcome	Indicator	Baseline	Actual	Target	Raw Improvement	Effect Magnitude
OA1. Access to Territory, Registration and Documentation (624.0% of budget)	1.3 Proportion of people with legally recognized identity documents or credentials	65.0	100.0	95.0	1.0000	-0.0214
OA1. Access to Territory, Registration and Documentation (624.0% of budget)	1.1 Proportion of refugees and asylum seekers registered on an individual basis	16.0	18.0	100.0	0.0238	-0.0214
OA1. Access to Territory, Registration and Documentation (624.0% of budget)	1.2 Proportion of children under 5 years of age whose births have been registered with a civil authority	100.0	100.0	100.0	nan	-0.0214
OA13. Self Reliance, Economic Inclusion and Livelihoods (2408.0% of budget)	13.1. Proportion of people with an account at a bank or other financial institution or with a mobile-money-service provider	54.0	71.0	60.0	1.0000	-0.0128
OA13. Self Reliance, Economic Inclusion and Livelihoods (2408.0% of budget)	13.2. Proportion of people who self-report positive changes in their income compared to previous year	6.0	5.0	50.0	-0.0227	-0.0128
OA15. Resettlement and Complementary Pathways (131.0% of budget)	15.1 Number of refugees submitted by UNHCR for resettlement	1532.0	2745.0	3130.0	0.7591	-0.0134
OA3. Protection Policy and Law (0.0% of budget)	3.1 Extent national legal framework is in line with the 1951 Convention and/or its 1967 Protocol	2.0	2.0	1.0	-0.0000	-0.0005
OA3. Protection Policy and Law (0.0% of budget)	3.2 Extent national legal framework is in line with the 1961 Convention on the Reduction of Statelessness	1.0	1.0	1.0	nan	-0.0005
OA4. Gender-based Violence (724.0% of budget)	4.1 Proportion of people who know where to access available GBV services	33.0	59.0	85.0	0.5000	-0.0071
OA4. Gender-based Violence (724.0% of budget)	4.3 Proportion of survivors who are satisfied with GBV case management services	91.0	91.0	100.0	0.0000	-0.0071
OA5. Child Protection (708.0% of budget)	5.3 Proportion of unaccompanied and separated children who are in an alternative care arrangement	10.0	39.0	100.0	0.3222	0.0055
OA5. Child Protection (708.0% of budget)	5.2 Proportion of children who participate in community-based child protection programmes	21.0	15.0	25.0	-1.0000	0.0055
OA5. Child Protection (708.0% of budget)	5.1 Proportion of children at heightened risk who are supported by a Best Interests Procedure	54.0	64.0	85.0	0.3226	0.0055
OA7. Community Engagement and Participation (923.0% of budget)	7.1 Extent participation of displaced and stateless people across programme phases is supported.	1.0	2.0	1.0	1.0000	-0.0287
OA7. Community Engagement and Participation (923.0% of budget)	7.2 Proportion of people who have access to safe feedback and response mechanisms	0.0	47.0	100.0	0.4700	-0.0287
OA7. Community Engagement and Participation (923.0% of budget)	7.3 Proportion of women participating in leadership/management structures	80.0	77.0	70.0	0.3000	-0.0287


[Predicted Outcome Effects Estimation]

Variable	basic_needs_gap	education_access	morbidity	protection_risk	service_efficiency	vulnerability
Domain
OA1. Access to Territory, Registration and Documentation	0.000	0.000	0.000	-0.021	0.000	-0.021
OA2. Status Determination	nan	nan	nan	nan	nan	nan
OA3. Protection Policy and Law	nan	nan	nan	nan	nan	nan
OA4. Gender-based Violence	0.000	0.000	0.000	-0.007	0.000	-0.007
OA5. Child Protection	0.000	0.000	0.000	0.006	0.000	0.006
OA6. Safety and Access to Justice	nan	nan	nan	nan	nan	nan
OA7. Community Engagement and Participation	0.000	0.000	0.000	0.000	0.029	-0.029
OA8. Well-Being and Basic Needs	0.024	0.000	0.000	0.000	0.000	0.000
OA9. Sustainable Housing and Settlements	nan	nan	nan	nan	nan	nan
OA10. Healthy Lives	nan	nan	nan	nan	nan	nan
OA11. Education	nan	nan	nan	nan	nan	nan
OA12. Clean Water, Sanitation and Hygiene	nan	nan	nan	nan	nan	nan
OA13. Self Reliance, Economic Inclusion and Livelihoods	0.000	0.000	0.000	0.000	0.000	-0.013
OA14. Voluntary Repatriation and Sustainable Reintegration	nan	nan	nan	nan	nan	nan
OA15. Resettlement and Complementary Pathways	0.000	0.000	0.000	-0.013	0.000	-0.013
OA16. Local Integration and other Local Solutions	nan	nan	nan	nan	nan	nan

Step 3: “What-If” Budget Alternatives

We want to find the best way to spend our budget. To do this, we create several totally different “What-If” spending plans. By testing all of them, we can see which one supports people the most.

The Spending Plans We Will Test:

Current Plan: We spend the money exactly how it is currently planned in the official budget.
Equal Spilt: We divide the money perfectly evenly across all active areas.
Help the Struggling Programs: We give more money to the outcomes that are currently failing to meet their targets, hoping to close the gap.
AI Suggested: We ask an Artificial Intelligence to look at all the data and suggest the smartest way to spend the money.
Protection-First: We put 70% of the money into keeping people safe and determining their legal status.
Basic Needs Priority: We put 60% of the money directly into food, housing, and clean water.
Long-Term Solutions: We put 50% of the money into supporting people integrate locally or return home permanently.
Gender & Child Focus: We put 40% of the money into protecting women and children.

Code

# 1. Set up random number generators for each outcome area to ensure simulation variety.
all_effect_domains = sorted(SRF_EFFECT_LIBRARY.keys())
ss_acts = SeedSequence(BASE_SEED).spawn(len(all_effect_domains))
rng_by_domain = {dom: default_rng(ss) for dom, ss in zip(all_effect_domains, ss_acts)}

# 2. Define the "Activity" class: this acts as a blueprint for a specific intervention.
# It calculates how many households are supported based on the budget share.
class Activity:
    def __init__(self, name, srf_domain, budget_share, coverage=0.3, intensity=1.0, rng=None):
        self.name = name
        self.srf_domain = srf_domain
        self.budget_share = max(0.0, min(1.0, float(budget_share)))
        self.coverage = min(0.9, max(0.05, coverage))
        self.intensity = float(intensity)
        self.effects = SRF_EFFECT_LIBRARY.get(srf_domain, {})
        self.rng = rng or default_rng()

    def apply(self, hh, budget_scale=1.0, return_effect_only=False):
        # Calculate the impact without actually changing the households yet (this happens in the simulator).
        if not self.effects:
            return {} if return_effect_only else None
        effective_coverage = min(0.9, self.coverage * budget_scale)
        k = int(effective_coverage * hh.n)
        if k <= 0:
            return {} if return_effect_only else None
        idx = self.rng.choice(hh.n, size=k, replace=False)
        effect_dict = {var: (eff * self.intensity * EFFECT_SCALE_MULT, idx) for var, eff in self.effects.items()}
        return effect_dict if return_effect_only else None

# 3. Create helper functions to build our different "What-If" portfolios.
def portfolio_from_shares(df, cap=8):
    if df.empty or df['percentage'].sum() <= 0: return []
    d = df.sort_values('percentage', ascending=False).head(cap).copy()
    total = d['percentage'].sum(); acts = []
    for row in d.itertuples():
        label = f"{row.code} - {row.label}"
        acts.append(Activity(name=row.label, srf_domain=label, budget_share=float(row.percentage/total), rng=rng_by_domain.get(label)))
    return acts

def make_targeted_portfolio(oa_weights: dict, label_prefix: str = '') -> list:
    total_w = sum(oa_weights.values())
    if total_w <= 0: return []
    acts = []
    for code, w in oa_weights.items():
        info = SRF_MAP.get(code)
        if info is None: continue
        label = f"{code} - {info['domain']}"
        acts.append(Activity(name=f'{label_prefix}{info["domain"]}', srf_domain=label, budget_share=float(w/total_w), rng=rng_by_domain.get(label)))
    return acts

# 4. Build the predefined scenarios (Current, Equal, Protection-First, etc.).
current_activities = portfolio_from_shares(sel_oa, cap=8)

top_oa = sel_oa.sort_values('percentage', ascending=False).head(8)
if not top_oa.empty:
    share_equal = 1.0/len(top_oa); equal_activities = []
    for row in top_oa.itertuples():
        label = f"{row.code} - {row.label}"
        equal_activities.append(Activity(name=f'Equal-{row.label}', srf_domain=label, budget_share=share_equal, rng=rng_by_domain.get(label)))
else:
    equal_activities = []

protection_first_activities = make_targeted_portfolio({c:w for c,w in {'OA1':20,'OA2':15,'OA3':10,'OA4':10,'OA5':10,'OA6':5,'OA7':10,'OA13':10,'OA14':10}.items() if c in SRF_MAP}, 'Prot-')
basic_needs_activities = make_targeted_portfolio({c:w for c,w in {'OA8':25,'OA9':20,'OA12':20,'OA10':15,'OA1':10,'OA11':10}.items() if c in SRF_MAP}, 'Basic-')
solutions_activities = make_targeted_portfolio({c:w for c,w in {'OA14':20,'OA15':15,'OA16':15,'OA1':15,'OA2':10,'OA11':10,'OA13':15}.items() if c in SRF_MAP}, 'Sol-')
gender_child_activities = make_targeted_portfolio({c:w for c,w in {'OA4':25,'OA5':25,'OA1':10,'OA10':10,'OA11':10,'OA8':10,'OA6':10}.items() if c in SRF_MAP}, 'GC-')

# Create the "Gap-Closing" scenario: focuses money on areas that missed their targets the most.
if code_imp:
    oa_imp = {c:v for c,v in code_imp.items() if c in SRF_MAP and c.startswith('OA')}
    if oa_imp:
        mn, mx = min(oa_imp.values()), max(oa_imp.values())
        rng_imp = mx - mn if mx > mn else 1.0
        inv = {c: 1.0 - (v - mn)/rng_imp for c,v in oa_imp.items()}
        gap_closing_activities = make_targeted_portfolio(inv, 'Gap-')
    else:
        gap_closing_activities = []
else:
    gap_closing_activities = []

# 5. Integrate an AI (LLM) to suggest an "Optimized" portfolio.
# We send the performance data to the AI and ask for its strategic advice.

AZURE_ENDPOINT   = os.getenv('AZURE_OPENAI_ENDPOINT')
AZURE_KEY        = os.getenv('AZURE_OPENAI_API_KEY')
AZURE_API_VERSION = os.getenv('AZURE_OPENAI_API_VERSION', '2024-02-15-preview')
AZURE_DEPLOYMENT = os.getenv('AZURE_OPENAI_DEPLOYMENT', 'gpt-4o-mini')

try:
    from openai import AzureOpenAI
    azure_client = AzureOpenAI(api_key=AZURE_KEY, api_version=AZURE_API_VERSION, azure_endpoint=AZURE_ENDPOINT) if AZURE_KEY else None
except Exception:
    azure_client = None

SYSTEM_PROMPT = 'You are a UNHCR programme strategist.'
PORTFOLIO_SCHEMA = {
    'type': 'object',
    'properties': {
        'portfolio': {
            'type': 'array',
            'items': {
                'type': 'object',
                'properties': {
                    'activity_name': {'type': 'string'},
                    'srf_domain':    {'type': 'string', 'description': 'The exact Outcome Area standard label e.g. "OA11 - Education"'},
                    'budget_share':  {'type': 'number', 'description': 'Share of total budget in range [0, 1]; all shares must sum to 1.0'}
                },
                'required': ['activity_name', 'srf_domain', 'budget_share']
            }
        },
        'narrative': {'type': 'string', 'description': 'A 500-word justification referencing specific indicator gaps and strategic trade-offs.'}
    },
    'required': ['portfolio', 'narrative']
}

PORTFOLIO_PROMPT = '''Using the data below (Outcome Areas with indicators, baselines, targets, actuals), propose an OPTIMISED budget allocation plan.

Strategy:
1. Identify the largest GAPS (Actual far from Target).
2. Prioritise Outcome Areas that are the most effective in improving the lives of the people UNHCR serve.
3. You may exclude well-performing OAs (allocate 0%).
4. All budget_share values must sum to exactly 1.0.
5. Use the exact standard label format "OAxx - Domain Name" in srf_domain fields.

Return ONLY valid JSON matching this schema: {schema}

DATA CARD:
{card}
'''

def call_anthropic_portfolio(card):
    import urllib.request
    payload = json.dumps({
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1500,
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": PORTFOLIO_PROMPT.format(schema=json.dumps(PORTFOLIO_SCHEMA, indent=2), card=json.dumps(card, indent=2))}]
    }).encode()
    req = urllib.request.Request("https://api.anthropic.com/v1/messages", data=payload, headers={"Content-Type": "application/json", "anthropic-version": "2023-06-01"}, method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.loads(resp.read())
    return data['content'][0]['text']

def call_azure_portfolio(card):
    if azure_client is None: raise RuntimeError('Azure client not available.')
    user_prompt = PORTFOLIO_PROMPT.format(schema=json.dumps(PORTFOLIO_SCHEMA, indent=2), card=json.dumps(card, indent=2))
    resp = azure_client.chat.completions.create(model=AZURE_DEPLOYMENT, temperature=0.3, messages=[{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": user_prompt}])
    return resp.choices[0].message.content

def parse_llm_portfolio_response(raw_resp):
    clean = re.sub(r'```(?:json)?', '', raw_resp).strip().rstrip('`').strip()
    return json.loads(clean)

# Prepare the data summary for the AI.
tr = transactions_df[transactions_df['activity_id'] == ACTIVITY_ID]
providers = tr.groupby('provider')['value_usd'].sum().sort_values(ascending=False).head(8).round(0)
all_oa_codes = sorted(list(set(sel_oa['code'].unique()) | set(sel_inds[sel_inds['result_code'].str.startswith('OA', na=False)]['result_code'].unique())))
oa_data = []
for code in all_oa_codes:
    oa_budget = sel_oa[sel_oa['code'] == code]
    budget_pct = float(oa_budget['percentage'].iloc[0]) if not oa_budget.empty else 0.0
    info = SRF_MAP.get(code, {})
    label = info.get('domain', code)
    oa_inds = sel_inds[sel_inds['result_code'] == code].copy()
    inds_list = [{'title': i.get('indicator_title'), 'baseline': i.get('baseline'), 'target': i.get('target'), 'actual': i.get('actual'), 'normalized_improvement': round(float(i.get('improve', 0.0)), 4)} for _, i in oa_inds.iterrows()]
    oa_data.append({'code': code, 'label': label, 'current_budget_percentage': budget_pct, 'indicators': inds_list})

decision_card = {'activity_id': ACTIVITY_ID, 'title': sel_act.get('title', 'Unknown'), 'total_value_usd': expenditure_total_usd, 'top_providers_usd': list(providers.items()), 'outcome_areas': oa_data, 'available_srf_domains': [f"{c} - {v['domain']}" for c, v in SRF_MAP.items() if v['level'] == 'OA']}

llm_portfolio_json, llm_narrative = None, ""
if USE_LLM:
    raw_resp = None
    try:
        raw_resp = call_anthropic_portfolio(decision_card)
    except Exception:
        try: raw_resp = call_azure_portfolio(decision_card)
        except Exception: pass
    if raw_resp:
        try:
            data = parse_llm_portfolio_response(raw_resp)
            llm_portfolio_json, llm_narrative = data.get('portfolio', []), data.get('narrative', 'No narrative.')
        except Exception: pass

llm_activities = []
if llm_portfolio_json:
    for item in llm_portfolio_json:
        domain_name, found_code = item.get('srf_domain', ''), None
        for code, info in SRF_MAP.items():
            if f"{code} - {info['domain']}" == domain_name or info['domain'] == domain_name or code == domain_name:
                found_code, domain_name = code, info['domain']; break
        standard_label = f"{found_code} - {domain_name}" if found_code else domain_name
        llm_activities.append(Activity(item.get('activity_name', 'AI Activity'), standard_label, float(item.get('budget_share', 0.1))))
else:
    llm_activities, llm_narrative = gap_closing_activities, "(LLM unavailable — gap-closing used as fallback)"

# 6. Finalize the list of all scenarios to be compared.
all_scenarios = [
    ('No intervention', [], 0.0),
    ('Current', current_activities, 1.0),
    ('Equal', equal_activities, 1.0),
    ('Protection-First', protection_first_activities, 1.0),
    ('Basic Needs', basic_needs_activities, 1.0),
    ('Durable Solutions', solutions_activities, 1.0),
    ('Gender & Child', gender_child_activities, 1.0),
    ('Gap-Closing', gap_closing_activities, 1.0),
    ('AI Suggested', llm_activities, 1.0),
]
all_scenarios = [(n,a,s) for n,a,s in all_scenarios if (n=='No intervention' or a)]

# 7. Build a comparison table showing how the budget is split in each plan.
def relabel_domain(raw_label: str) -> str:
    code = raw_label.split(' - ')[0].strip()
    return OA_FULL_LABELS.get(code, raw_label)

scenario_data = []
for name, p, bscale in all_scenarios:
    if not p: continue
    eff_budget = expenditure_total_usd * bscale
    for a in p:
        scenario_data.append({'Scenario': name, 'Outcome': relabel_domain(a.srf_domain), 'Allocation %': round(a.budget_share * 100, 1), 'Budget Scale': f"{int(bscale*100)}%", 'Effective Budget USD': round(eff_budget * a.budget_share, 0)})

df_scenarios = pd.DataFrame(scenario_data)
pivot_p = df_scenarios.pivot_table(index='Outcome', columns='Scenario', values='Allocation %', aggfunc='sum').fillna(0)
pivot_p = pivot_p.reindex(index=list(OA_FULL_LABELS.values()), columns=[name for name, _, _ in all_scenarios])

def format_alloc(v): return f"{v:.1f}%" if v > 0 else "—"
def style_portfolio(v):
    if v == 0: return "color: #bdc3c7;"
    intensity = min(v / 35, 1.0)
    return f"background-color: rgba(46, 204, 113, {intensity:.2f}); color: {'white' if intensity > 0.5 else 'black'};"

print("--- Outcome Area Portfolio Comparison (Allocation %) ---")
display(pivot_p.style.map(style_portfolio).format(format_alloc))

if llm_narrative:
    display(HTML(f"""<div style="background-color: #f9f9f9; border-left: 6px solid #3498db; padding: 20px; margin-top: 20px; font-family: sans-serif;"><h3 style="margin-top: 0; color: #3498db;">AI Justification for Budget Allocation</h3><p style="line-height: 1.6; white-space: pre-wrap;">{llm_narrative}</p></div>"""))

--- Outcome Area Portfolio Comparison (Allocation %) ---

Scenario	No intervention	Current	Equal	Protection-First	Basic Needs	Durable Solutions	Gender & Child	Gap-Closing	AI Suggested
Outcome
OA1. Access to Territory, Registration and Documentation	—	6.2%	12.5%	26.7%	28.6%	27.3%	14.3%	5.0%	25.0%
OA2. Status Determination	—	4.2%	12.5%	20.0%	—	18.2%	—	—	—
OA3. Protection Policy and Law	—	—	—	—	—	—	—	—	—
OA4. Gender-based Violence	—	7.2%	12.5%	13.3%	—	—	35.7%	12.8%	10.0%
OA5. Child Protection	—	7.1%	12.5%	13.3%	—	—	35.7%	23.3%	10.0%
OA6. Safety and Access to Justice	—	—	—	—	—	—	—	—	—
OA7. Community Engagement and Participation	—	9.2%	12.5%	13.3%	—	—	—	—	10.0%
OA8. Well-Being and Basic Needs	—	40.6%	12.5%	—	71.4%	—	14.3%	46.1%	15.0%
OA9. Sustainable Housing and Settlements	—	—	—	—	—	—	—	—	—
OA10. Healthy Lives	—	—	—	—	—	—	—	—	—
OA11. Education	—	—	—	—	—	—	—	—	—
OA12. Clean Water, Sanitation and Hygiene	—	—	—	—	—	—	—	—	—
OA13. Self Reliance, Economic Inclusion and Livelihoods	—	24.1%	12.5%	13.3%	—	27.3%	—	6.7%	30.0%
OA14. Voluntary Repatriation and Sustainable Reintegration	—	—	—	—	—	—	—	—	—
OA15. Resettlement and Complementary Pathways	—	1.3%	12.5%	—	—	27.3%	—	6.0%	—
OA16. Local Integration and other Local Solutions	—	—	—	—	—	—	—	—	—

AI Justification for Budget Allocation

The budget allocation plan prioritizes Outcome Areas (OAs) with the largest gaps between actual performance and targets, focusing on those most impactful for improving the lives of refugees and displaced persons served by UNHCR in Peru. OA1 (Access to Territory, Registration and Documentation) shows a significant gap in the proportion of refugees and asylum seekers registered individually (actual 18% vs target 100%), indicating a critical need to strengthen registration and legal identity services. Allocating 25% of the budget here addresses foundational protection needs. OA13 (Self Reliance, Economic Inclusion and Livelihoods) reveals a negative gap in income self-reporting (actual 5% vs target 50%), despite a good proportion having financial accounts. This area is crucial for sustainable refugee integration and economic empowerment, thus receiving the largest share (30%). OA4 (Gender-based Violence) has a notable gap in awareness of GBV services (59% vs 85%) and satisfaction with case management (91% vs 100%), warranting 10% allocation to expand specialized programs and awareness. OA5 (Child Protection) indicators show substantial gaps in alternative care for unaccompanied children (39% vs 100%) and participation in protection programs, justifying a 10% share to enhance child protection services. OA7 (Community Engagement and Participation) has a significant gap in access to safe feedback mechanisms (47% vs 100%), vital for responsive programming and protection, allocated 10%. OA8 (Well-Being and Basic Needs) shows gaps in cash transfer coverage (2% vs 6%) and reliance on clean fuels (90% vs 100%), essential for immediate survival and dignity, thus allocated 15%. OAs with no clear gaps or well-performing indicators, such as OA15 (Resettlement) and OA2 (Status Determination), receive 0% to optimize resources. This allocation plan balances urgent protection gaps with strategic investments in self-reliance and community participation, ensuring funds target the most critical needs to improve refugee well-being and durable solutions in Peru.

Step 4: The Virtual Simulator

This is where the magic happens. We have created a “Virtual World Simulator” inside the computer. We populate this world with numerous digital household.

How the Simulator Works:

Time Passes: We watch these virtual households month by month for an entire year.
Natural Decline: If we provide zero support over the year (the “No Intervention” scenario), the households’ situations slowly get worse over time due to natural hardships.
Applying the Interventions: For all our other budget plans, we apply the intervention support to the households. Those receiving clean water become less sick. Kids who enter education support get better access.
Applying the Interactions: We also model the interactions between interventions. For example, providing clean water not only reduces illness but also improves the ability of children to attend school.
Checking the Results: At the end of the 12 virtual months, we check which budget plan led to the healthiest, safest, and most stable households.

Code

# This is the "Virtual World Simulator". It creates thousands of digital households 
# and tracks how their lives change month-by-month over a year.

from numpy.random import SeedSequence, default_rng

EFFECT_SCALE_MULT = 1.0  # used for sensitivity OAT

# 1. Define the "Households" class. 
# This represents thousands of digital people, each with their own health, safety, and stability scores.
class Households:
    def __init__(self, n=5000):
        # Initialize each household with random but realistic starting levels for 6 key areas.
        self.vulnerability     = np.clip(np.random.beta(2, 5, size=n), 0, 1)
        self.protection_risk   = np.clip(np.random.beta(2.5, 4, size=n), 0, 1)
        self.basic_needs_gap   = np.clip(np.random.beta(2, 6, size=n), 0, 1)
        self.education_access  = np.clip(np.random.beta(2, 3, size=n), 0, 1)
        self.morbidity         = np.clip(np.random.beta(2, 6, size=n), 0, 1)
        self.service_efficiency= np.zeros(n)
        self.n = n
    def as_dict(self):
        return {
            'vulnerability': self.vulnerability,
            'protection_risk': self.protection_risk,
            'basic_needs_gap': self.basic_needs_gap,
            'education_access': self.education_access,
            'morbidity': self.morbidity,
            'service_efficiency': self.service_efficiency,
        }


# 2. Define the "Environment" class.
# This handles "Natural Drift"—the tendency for situations to slowly worsen if no help is provided.
class Environment:
    def __init__(self, households: Households, rng=None, drift_mults=None):
        self.hh = households
        self.time = 0
        self.rng = rng or default_rng()
        self.drift_mults = drift_mults or {k:1.0 for k in ['vulnerability','protection','basic','morbidity','service','education']}
    def natural_drift(self):
        n = self.hh.n
        rnorm = self.rng.normal
        self.hh.vulnerability     = np.clip(self.hh.vulnerability    + rnorm( 0.005*self.drift_mults['vulnerability'], 0.01, n), 0, 1)
        self.hh.protection_risk   = np.clip(self.hh.protection_risk  + rnorm( 0.006*self.drift_mults['protection'], 0.01, n), 0, 1)
        self.hh.basic_needs_gap   = np.clip(self.hh.basic_needs_gap  + rnorm( 0.004*self.drift_mults['basic'], 0.01, n), 0, 1)
        self.hh.morbidity         = np.clip(self.hh.morbidity        + rnorm( 0.003*self.drift_mults['morbidity'], 0.01, n), 0, 1)
        self.hh.service_efficiency= np.clip(self.hh.service_efficiency + rnorm(-0.002*self.drift_mults['service'], 0.01, n), 0, 1)
        self.hh.education_access  = np.clip(self.hh.education_access + rnorm(-0.002*self.drift_mults['education'], 0.01, n), 0, 1)
    def indicators(self):
        return {
            'Vulnerability (average effect)': float(self.hh.vulnerability.mean()),
            'Protection risk (average effect)': float(self.hh.protection_risk.mean()),
            'Basic needs gap (average effect)': float(self.hh.basic_needs_gap.mean()),
            'Education access (average effect)': float(self.hh.education_access.mean()),
            'Morbidity (average effect)': float(self.hh.morbidity.mean()),
            'Service efficiency (average effect)': float(self.hh.service_efficiency.mean()),
        }



# 3. Use a "Gaussian Mixture Model" (GMM) to initialize the digital households.
# This ensures our virtual population has a realistic mix of stable, struggling, and high-risk families.
from sklearn.mixture import GaussianMixture
def fit_gmm_from_synthetic():
    """
    Creates a synthetic multimodal distribution that approximates
    realistic heterogeneity found in vulnerability, risk, needs, etc.
    This is a fallback in absence of real survey data.
    """
    synthetic = np.concatenate([
        np.random.beta(2, 6, 4000),   # majority stable households
        np.random.beta(6, 3, 3000),   # moderate vulnerability
        np.random.beta(2, 2, 3000),   # high risk / high need subgroup
    ]).reshape(-1, 1)

    gmm = GaussianMixture(n_components=3, covariance_type='full', max_iter=500)
    gmm.fit(synthetic)
    return gmm

# Fit a GMM once for the whole ABM run
GMM_MODEL = fit_gmm_from_synthetic()

def gmm_sample(n):
    """
    Draw bounded household state values from the calibrated GMM.
    Output is always in [0, 1].
    """
    samples, _ = GMM_MODEL.sample(n)
    return np.clip(samples.flatten(), 0, 1)

# Create the base household population using GMM
np.random.seed(BASE_SEED); random.seed(BASE_SEED)

H_base = Households(n=HOUSEHOLDS)
H_base.vulnerability      = gmm_sample(HOUSEHOLDS)
H_base.protection_risk    = gmm_sample(HOUSEHOLDS)
H_base.basic_needs_gap    = gmm_sample(HOUSEHOLDS)
H_base.education_access   = gmm_sample(HOUSEHOLDS)
H_base.morbidity          = gmm_sample(HOUSEHOLDS)
H_base.service_efficiency = gmm_sample(HOUSEHOLDS)

#print("[GMM] Household populations initialized using Gaussian Mixture distributions.")


# 4. Define the rules for "Dynamics"—how households change over time.
# Better than additive "state += delta" inside simulate()
# Variables where higher is better
INCREASING_VARS = {'education_access', 'service_efficiency'}

# Per‑variable responsiveness (you can tune these)
VAR_STRENGTH = {
    'vulnerability':      0.10,
    'protection_risk':    0.10,
    'basic_needs_gap':    0.14,
    'education_access':   0.08,
    'morbidity':          0.12,
    'service_efficiency': 0.16
}

# Optional per‑variable lag (months before interventions show effects)
VAR_LAG = {
    'vulnerability':      1,
    'protection_risk':    1,
    'basic_needs_gap':    0,
    'education_access':   2,
    'morbidity':          1,
    'service_efficiency': 0
}

# Slope of the logistic saturation
LOGISTIC_STEEPNESS = 1.8

def nonlinear_update_scalar(current, effect_value, month, var_name):
    """
    Logistic, delayed, bounded update for a single scalar state value.
    """
    lag = VAR_LAG.get(var_name, 0)
    if month < lag:
        return float(current)

    # Polarity: for "lower is better" variables we flip sign
    inc = (var_name in INCREASING_VARS)
    signed_effect = effect_value if inc else -effect_value

    # Logistic shift in [-0.5, +0.5]
    logistic_shift = (1.0 / (1.0 + np.exp(-LOGISTIC_STEEPNESS * signed_effect))) - 0.5

    # Apply with variable‑specific strength
    rs = VAR_STRENGTH.get(var_name, 0.12)
    new_val = current + rs * logistic_shift

    return float(np.clip(new_val, 0.0, 1.0))


def get_activity_effects(activity, hh, budget_scale):
    """
    Ask the activity to *compute* its per‑variable effect intensities
    WITHOUT mutating households (MODE A).
    Your Activity.apply must support: return_effect_only=True
    and return a dict, e.g. {'vulnerability': +0.03, 'basic_needs_gap': -0.02, ...}
    """
    try:
        return activity.apply(hh, budget_scale=budget_scale, return_effect_only=True)
    except TypeError as e:
        raise RuntimeError(
            "Activity.apply(...) must accept return_effect_only=True and return a dict "
            "of per‑variable effects WITHOUT mutating households (MODE A). "
            "Please patch Activity.apply accordingly."
        ) from e


def apply_collinearity_propagation(hh, deltas):
    """
    After the nonlinear updates, propagate cross‑effects using the per‑household
    realized deltas (element‑wise), then clip to [0,1].
    This maintains your original collinearity semantics.
    """
    pairs = [
        ('protection_risk', 'vulnerability',      COLLINEARITY_PR_VUL),
        ('vulnerability',   'protection_risk',    COLLINEARITY_VUL_PR),
        ('basic_needs_gap', 'morbidity',          COLLINEARITY_BNG_MOR),
        ('morbidity',       'basic_needs_gap',    COLLINEARITY_MOR_BNG),
        ('service_efficiency','education_access', COLLINEARITY_SE_EA),
        ('education_access','service_efficiency', COLLINEARITY_EA_SE),
        ('basic_needs_gap', 'vulnerability',      COLLINEARITY_BNG_VUL),
        ('vulnerability',   'basic_needs_gap',    COLLINEARITY_VUL_BNG),
    ]

    for src, dst, coeff in pairs:
        if coeff == 0.0: 
            continue
        src_delta = deltas.get(src, None)
        if src_delta is None:
            continue
        dst_arr = getattr(hh, dst)
        # Linear propagation on realized deltas; small coefficients recommended (0–0.3)
        dst_arr[:] = np.clip(dst_arr + coeff * src_delta, 0.0, 1.0)

# 5. Define "Synergy" Effects (Interactions).
# These functions model how different aspects of a household's life affect each other.
# Variables where higher is better
INCREASING_VARS = {'education_access', 'service_efficiency'}

def is_increasing(var_name: str) -> bool:
    return var_name in INCREASING_VARS

def helpful_delta(var_name: str, delta_arr: np.ndarray) -> np.ndarray:
    """
    Convert raw delta into 'helpful change' space:
      - If higher is better → helpful = +delta
      - If lower is better  → helpful = -delta  (since a decrease is helpful)
    """
    return delta_arr if is_increasing(var_name) else (-delta_arr)

def raw_from_helpful(var_name: str, helpful_arr: np.ndarray) -> np.ndarray:
    """
    Convert helpful change back to raw orientation for the target variable:
      - If higher is better → raw = helpful
      - If lower is better  → raw = -helpful
    """
    return helpful_arr if is_increasing(var_name) else (-helpful_arr)

# ---------------------------------------------
# Synergy rules:
# Each rule says: when BOTH sources improve (in 'helpful' terms),
# produce an *extra helpful change* on the target.
# coef controls strength; mode='help_only' ignores adverse (both-worse) cases.
# ---------------------------------------------
SYNERGY_RULES = [
    # Basic needs gap ↓ AND service efficiency ↑  → extra ↓ in vulnerability
    {'sources': ('basic_needs_gap', 'service_efficiency'), 'target': 'vulnerability',   'coef': 0.30, 'mode': 'help_only'},

    # Education access ↑ AND service efficiency ↑ → extra ↓ in protection risk
    {'sources': ('education_access', 'service_efficiency'), 'target': 'protection_risk','coef': 0.15, 'mode': 'help_only'},

    # Basic needs gap ↓ AND protection risk ↓ → extra ↓ in morbidity
    {'sources': ('basic_needs_gap', 'protection_risk'),     'target': 'morbidity',      'coef': 0.20, 'mode': 'help_only'},
]

# Global synergy cap per step (per household, per rule), keeps realism
ABS_SYNERGY_CAP = 0.03

def apply_synergy_effects(hh, base_deltas: dict):
    """
    Apply interaction (synergy) based on the *realized base deltas* this month
    BEFORE collinearity propagation.

    base_deltas: dict of {var_name: np.ndarray of per-household raw delta}
    """
    n = hh.n

    for rule in SYNERGY_RULES:
        s1, s2 = rule['sources']
        tgt    = rule['target']
        coef   = float(rule.get('coef', 0.0))
        mode   = rule.get('mode', 'help_only')

        if coef == 0.0:
            continue
        if s1 not in base_deltas or s2 not in base_deltas:
            continue

        d1_raw = base_deltas[s1]
        d2_raw = base_deltas[s2]
        if d1_raw is None or d2_raw is None:
            continue

        # Transform into helpful space for sources
        h1 = helpful_delta(s1, d1_raw)
        h2 = helpful_delta(s2, d2_raw)

        # Synergy is product of helpful improvements
        # (both must be > 0 for 'help_only' mode)
        prod = h1 * h2
        if mode == 'help_only':
            prod = np.where((h1 > 0) & (h2 > 0), prod, 0.0)

        extra_helpful = coef * prod
        # Cap synergy magnitude
        extra_helpful = np.clip(extra_helpful, -ABS_SYNERGY_CAP, ABS_SYNERGY_CAP)

        # Convert back to raw delta for the target variable
        extra_raw = raw_from_helpful(tgt, extra_helpful)

        # Apply to target state and clip
        tgt_arr = getattr(hh, tgt)
        tgt_arr[:] = np.clip(tgt_arr + extra_raw, 0.0, 1.0)

# 6. The "Simulate" function: the master loop that runs the virtual world.
def simulate(env: Environment, activities: list, months: int = 6, budget_scale: float = 1.0):
    """
    Simulate with synergy:
      0) record indicators
      1) natural drift
      2) collect non-mutating per-activity effects: {var: [(value, idx), ...]}
      3) nonlinear + delayed update per var, applying each (value, idx)
      4) compute base (realized) deltas
      5) apply collinearity (uses realized deltas)
      6) apply synergy (uses base deltas only)
    """
    timeline = []

    for t in range(months + 1):
        # record
        timeline.append({**env.indicators(), 'month': t})
        if t == months:
            break

        # natural drift
        env.natural_drift()

        # collect effects (value, idx) per variable
        effects_by_var = {
            'vulnerability':      [],
            'protection_risk':    [],
            'basic_needs_gap':    [],
            'education_access':   [],
            'morbidity':          [],
            'service_efficiency': []
        }
        for a in activities:
            eff = get_activity_effects(a, env.hh, budget_scale=budget_scale)  # returns {var: (val, idx)}
            if not eff:
                continue
            for var, payload in eff.items():
                if payload is None:
                    continue
                val, idx = payload
                # validate idx array
                if idx is None or len(idx) == 0:
                    continue
                effects_by_var[var].append((float(val), idx))

        # Snapshot before activity application
        prev = {
            'vulnerability':      env.hh.vulnerability.copy(),
            'protection_risk':    env.hh.protection_risk.copy(),
            'basic_needs_gap':    env.hh.basic_needs_gap.copy(),
            'education_access':   env.hh.education_access.copy(),
            'morbidity':          env.hh.morbidity.copy(),
            'service_efficiency': env.hh.service_efficiency.copy()
        }

        # nonlinear + delayed update
        for var_name, batches in effects_by_var.items():
            if not batches:
                continue
            arr = getattr(env.hh, var_name)

            # Apply each activity's (value, idx) to its covered households
            for effect_value, idx in batches:
                if len(idx) == 0:
                    continue
                # Update ONLY covered indices
                arr[idx] = [
                    nonlinear_update_scalar(arr[i], effect_value, t, var_name)
                    for i in idx
                ]

        # compute realized base deltas (post-nonlinear, pre-collinearity)
        base_deltas = {
            'vulnerability':      env.hh.vulnerability - prev['vulnerability'],
            'protection_risk':    env.hh.protection_risk - prev['protection_risk'],
            'basic_needs_gap':    env.hh.basic_needs_gap - prev['basic_needs_gap'],
            'education_access':   env.hh.education_access - prev['education_access'],
            'morbidity':          env.hh.morbidity - prev['morbidity'],
            'service_efficiency': env.hh.service_efficiency - prev['service_efficiency']
        }

        # collinearity on realized deltas
        apply_collinearity_propagation(env.hh, base_deltas)

        # synergy on base deltas (pre-collinearity), adds second-order effects
        apply_synergy_effects(env.hh, base_deltas)

        # advance time
        env.time += 1

    return pd.DataFrame(timeline)

# 7. The "Simulation Pipeline": runs every scenario multiple times for accuracy.
metrics = [
    'Vulnerability (average effect)', 'Protection risk (average effect)', 'Basic needs gap (average effect)',
    'Education access (average effect)', 'Morbidity (average effect)', 'Service efficiency (average effect)'
]

def clone_households(h):
    H2 = Households(n=h.n)
    for k,v in h.as_dict().items(): setattr(H2,k,v.copy())
    return H2

all_results = []
for run_id in range(N_RUNS):
    ss_run = SeedSequence(BASE_SEED + run_id)
    rng_env = default_rng(ss_run.spawn(1)[0])
    child_ss = ss_run.spawn(len(all_effect_domains))
    rng_by_domain_run = {dom: default_rng(ss) for dom, ss in zip(all_effect_domains, child_ss)}


    # Build activities with per-run RNGs
    scen_built = []
    for name, acts, bscale in all_scenarios:
        built = [Activity(a.name, a.srf_domain, a.budget_share, coverage=a.coverage, intensity=a.intensity, rng=rng_by_domain_run.get(a.srf_domain, default_rng())) for a in acts]
        scen_built.append((name,built,bscale))


    # ----- Run "No intervention" baseline -----
    env = Environment(clone_households(H_base), rng=rng_env)
    traj = simulate(env, [], MONTHS, 1.0)
    traj['scenario'] = 'No intervention'; traj['run'] = run_id; all_results.append(traj)

    # ----- Run all other scenarios -----
    for name, acts, bscale in scen_built:
        if name=='No intervention': continue
        env = Environment(clone_households(H_base), rng=rng_env)
        traj = simulate(env, acts, MONTHS, budget_scale=bscale)
        traj['scenario'] = name; traj['run'] = run_id; all_results.append(traj)


# 8. Aggregate and save the final results.
results_df_ = pd.concat(all_results, ignore_index=True)

agg = []
for (scenario, month), grp in results_df_.groupby(['scenario','month']):
    row={'scenario':scenario,'month':month}
    for m in metrics:
        vals=grp[m].values
        row[m]=float(vals.mean()); row[f'{m}_lo']=float(np.percentile(vals,5)); row[f'{m}_hi']=float(np.percentile(vals,95))
    agg.append(row)
results_df_agg = pd.DataFrame(agg)


# ============================================================
# SAVE OUTPUTS - Persist params/seeds
# ============================================================

os.makedirs('outputs', exist_ok=True)
repro = {
  'activity_id': ACTIVITY_ID,
  'params': {'MONTHS': MONTHS, 'HOUSEHOLDS': HOUSEHOLDS, 'USE_LLM': USE_LLM, 'BASE_SEED': BASE_SEED, 'N_RUNS': N_RUNS, 'DEA_VRS': DEA_VRS},
  'run_seeds': [BASE_SEED + i for i in range(N_RUNS)],
  'timestamp': pd.Timestamp.now().isoformat()
}
with open(f'outputs/repro_report_{ACTIVITY_ID}.json', 'w') as f:
    json.dump(repro, f, indent=2)

results_df_agg.to_csv(f'outputs/simulation_results_{ACTIVITY_ID}.csv', index=False)

# print("Simulation complete & reproducibility artifacts saved.")
print(f"[ Simulation Complete ]")
print(f"Ran {N_RUNS} replicates × {results_df_['scenario'].nunique()} scenarios × {MONTHS} months.")
print(f"Aggregated to {len(results_df_)} rows (mean ± 90% CI per scenario-month).")

[ Simulation Complete ]
Ran 20 replicates × 9 scenarios × 12 months.
Aggregated to 2340 rows (mean ± 90% CI per scenario-month).

Code

import matplotlib.cm as cm

# 1. Prepare the visual styling for the time-series charts (colors and markers for each scenario).
scenarios_ordered = results_df_['scenario'].unique().tolist()
markers = ['o', 's', '^', 'D', 'v', 'p', '*', 'h', 'X', '+']
marker_map = {s: markers[i % len(markers)] for i, s in enumerate(scenarios_ordered)}
colors = plt.cm.get_cmap('tab20', len(scenarios_ordered))
color_map = {s: colors(i) for i, s in enumerate(scenarios_ordered)}

# 2. Create the "Time-Series" charts: tracking household effects over the 12 virtual months.
print("\n Tracking Household Effects Over the 12 Months")
fig, axes = plt.subplots(len(metrics), 1, figsize=(8, 6 * len(metrics)))
for i, m in enumerate(metrics):
    ax = axes[i]
    # Collect the final results to ensure labels on the right don't overlap.
    end_points = []
    for label, df in results_df_agg.groupby('scenario'):
        df_s = df.sort_values('month')
        # Plot the main line (the average result).
        ax.plot(df_s['month'], df_s[m],
                marker=marker_map[label], label=label,
                color=color_map[label], markersize=5, linewidth=2, alpha=0.9)
        # Add a faint "shadow" (the 90% confidence band) to show the range of likely results.
        ax.fill_between(df_s['month'], df_s[f'{m}_lo'], df_s[f'{m}_hi'],
                        color=color_map[label], alpha=0.1)
        end_points.append((float(df_s[m].iloc[-1]), label, color_map[label]))

    # Stagger the labels on the right side of the chart for better readability.
    end_points.sort(key=lambda x: x[0])
    MIN_GAP = 0.012
    adjusted_y = []
    for idx_ep, (y_raw, lbl, col) in enumerate(end_points):
        y_adj = y_raw
        for prev_y in adjusted_y:
            if abs(y_adj - prev_y) < MIN_GAP: y_adj = prev_y + MIN_GAP
        adjusted_y.append(y_adj)
        last_x = MONTHS
        ax.plot([last_x, last_x + 0.3], [y_raw, y_adj], color=col, linewidth=0.6, alpha=0.5)
        ax.text(last_x + 0.35, y_adj, lbl, color=col, fontweight='bold', va='center', fontsize=7.5,
                bbox=dict(facecolor='white', alpha=0.7, edgecolor='none', pad=1))
        
    # Set the titles and labels using friendly, non-technical language.
    friendly_title = m.replace(' (average effect)', '').replace('Morbidity', 'Sickness')
    ax.set_title(friendly_title, fontsize=13, fontweight='bold', pad=12)
    ax.set_xlabel('Month')
    ax.set_xlim(right=MONTHS + 5.5)
    ax.grid(True, alpha=0.2, linestyle='--')

plt.tight_layout(pad=2.0)
plt.show()

# 3. Create the "Heatmap": a quick visual summary of which plan works best compared to "Doing Nothing".
try:
    import seaborn as sns
except Exception:
    sns = None
finals = results_df_agg[results_df_agg['month'] == results_df_agg['month'].max()].copy()
baseline_row = finals[finals['scenario'] == 'No intervention']
if not baseline_row.empty and sns is not None:
    base = baseline_row.iloc[0]
    scenario_rows, row_labels = [], []
    for _, row in finals.iterrows():
        if row['scenario'] == 'No intervention': continue
        diffs = []
        for m in metrics:
            # Adjust the sign: improvement means the score goes UP for education, but DOWN for risk.
            sign = -1 if any(k in m.lower() for k in ['vulnerability', 'protection', 'needs', 'morbidity']) else +1
            diffs.append(sign * (row[m] - base[m]))
        scenario_rows.append(diffs)
        row_labels.append(row['scenario'])

    print("\n How Much Better is Each Plan compared to Doing Nothing? (Green is good)")
    diff_mat = np.array(scenario_rows)
    fig, ax = plt.subplots(figsize=(8, 0.7 * len(row_labels) + 1))
    sns.heatmap(
        diff_mat, annot=True, fmt='.4f', cmap='RdYlGn', center=0,
        yticklabels=row_labels,
        xticklabels=['Vulnerability', 'Protection', 'Basic Needs', 'Morbidity', 'Education', 'Services'],
        ax=ax
    )
    ax.set_title('Improvement vs. Doing Nothing (More Positive/Green = Better)', fontsize=11, fontweight='bold', pad=15)
    plt.tight_layout()
    plt.show()


 Tracking Household Effects Over the 12 Months


 How Much Better is Each Plan compared to Doing Nothing? (Green is good)

Step 5: Most Value for the Money

After running all the tests, we want to see mathematically which plans were the most perfectly efficient. We use an economic tool to grade them.

Explanation: Reading the Results

Overall Improvement Score
Shows the total positive change a plan delivers across all outcomes.
Higher = better overall results.

Improvement per Budget Unit
All plans use the same total budget.
So this score simply reflects how much total improvement each plan delivers.
Higher = more “value for results.”

Noise‑Adjusted Efficiency Score
Shows whether a plan makes better use of the same budget compared with the others (i.e. well-balanced and non‑wasteful a plan is compared with all other plans).
A score close to 1.0 means no other plan achieves clearly better results across all outcomes.

Likely Range
Shows the realistic uncertainty around the efficiency score.
A narrow range = more reliable performance.

Improvement per Budget Unit
All plans use the same total budget.
So this score simply reflects how much total piled improvement each plan delivers but does not balance optimised improvements across effects.
Higher = more “value for results.”

How to choose between plans:
Plans are ranked first by Noise‑Adjusted Efficiency (best use of the budget).
If two plans have similar efficiency, choose the one with the higher Improvement per Budget.

Code

# This analysis identifies which budget plans are the most "efficient"—meaning 
# they deliver the best possible results for the amount of money spent.

from scipy.optimize import linprog
import numpy as np
import pandas as pd

# ============================================================
# COMBINED EFFICIENCY ANALYSIS
# - Includes BOTH DEA (noise-adjusted efficiency)
# - AND simple improvement-per-budget (v2)
# - Produces ONE unified non-technical table
# ============================================================

finals = results_df_agg[results_df_agg['month']==results_df_agg['month'].max()].copy()
base  = finals[finals['scenario']=='No intervention'].iloc[0]

# ------------------------------------------------------------
# 1. Prepare outcome improvements for each scenario
# ------------------------------------------------------------
scenarios, Yrows, Xrows = [], [], []

for _, row in finals.iterrows():
    sname = row['scenario']
    if sname == 'No intervention':
        continue

    improvements = []
    for m in metrics:
        sign = -1 if any(k in m.lower() for k in ['vulnerability','protection','needs','morbidity']) else 1
        imp = max(0.0, sign*(row[m] - base[m]))
        improvements.append(imp)

    Yrows.append(improvements)
    Xrows.append([1.0])      # All scenarios use the SAME total budget
    scenarios.append(sname)

Y = np.array(Yrows)
X = np.array(Xrows)
n, s = Y.shape

# 3. Define the DEA Scorer: a mathematical tool that finds the "Efficiency Frontier".
# It identifies which plans give the absolute best results for the budget spent.
def dea_score(Y, X, o):
    c=np.zeros(n+1); c[-1]=1.0
    A=[]; b=[]

    # Input constraint
    row=np.zeros(n+1); row[:n]=X[:,0]; row[-1]=-X[o,0]
    A.append(row); b.append(0.0)

    # Output constraints
    for r in range(s):
        row=np.zeros(n+1); row[:n]=-Y[:,r]
        A.append(row); b.append(-Y[o,r])

    bounds=[(0,None)]*n + [(0,1)]
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds, method='highs')
    if not res.success:
        return np.nan
    return float(res.x[-1])

# 4. Use "Bootstrapping" to adjust for noise.
# We run the math hundreds of times with slightly different data to get a "Noise-Adjusted" score.
def bootstrap_efficiency(Y, X, o, B=300, seed=42):
    rng = np.random.default_rng(seed)

    original = dea_score(Y, X, o)
    if np.isnan(original):
        return np.nan, np.nan, np.nan

    boots=[]
    for _ in range(B):
        idx = rng.integers(0, n, n)
        Yb,Yx = Y[idx], X[idx]
        val = dea_score(Yb, Yx, o)
        if not np.isnan(val): boots.append(val)

    if not boots:
        return original, np.nan, np.nan

    boots=np.array(boots)
    bias = boots.mean()-original
    corrected = original - bias

    return corrected, float(np.percentile(boots,5)), float(np.percentile(boots,95))

# 5. Calculate a simple "Total Improvement" score for each plan.
def total_improvement(row):
    tot=0.0
    for m in metrics:
        sign=-1 if any(k in m.lower() for k in ['vulnerability','protection','needs','morbidity']) else 1
        tot += max(0.0, sign*(row[m]-base[m]))
    return float(tot)

simple_scores = { 
    row['scenario']: total_improvement(row) 
    for _,row in finals.iterrows() if row['scenario']!="No intervention"
}


# 6. Build the final summary table, ranking the plans from most to least efficient.
rows=[]
for i, sname in enumerate(scenarios):
    corrected, lo, hi = bootstrap_efficiency(Y, X, i)
    ti = simple_scores.get(sname, 0.0)

    # Non-technical interpretation
    if np.isnan(corrected):
        interp = "Could not evaluate efficiency."
    elif corrected >= 0.90:
        interp = "Excellent conversion of budget into results."
    elif corrected >= 0.75:
        interp = "Good use of resources."
    elif corrected >= 0.55:
        interp = "Moderate efficiency."
    else:
        interp = "Limited results for the same budget."

    rows.append({
        "Scenario": sname,
        "Noise-Adjusted Efficiency Score": round(corrected,3),
        "Likely Range (uncertainty)": f"{lo:.3f} – {hi:.3f}",
        "Overall Improvement per Budget Unit": round(ti,4),   
        "Interpretation": interp
    })



eff_combined_df = (
    pd.DataFrame(rows)
    .sort_values(
        by=["Noise-Adjusted Efficiency Score", "Overall Improvement per Budget Unit"],
        ascending=[False, False]
    )
    .reset_index(drop=True)
)


print("\n📊 **Combined Efficiency Summary**")
display(
    eff_combined_df.style.format({
        "Overall Improvement per Budget Unit": "{:.4f}",
        "Noise-Adjusted Efficiency Score": "{:.3f}"
    })
)


📊 **Combined Efficiency Summary**

	Scenario	Noise-Adjusted Efficiency Score	Likely Range (uncertainty)	Overall Improvement per Budget Unit	Interpretation
0	Current	1.162	0.092 – 1.000	0.0005	Excellent conversion of budget into results.
1	AI Suggested	1.152	0.092 – 1.000	0.0128	Excellent conversion of budget into results.
2	Basic Needs	1.147	0.092 – 1.000	0.0059	Excellent conversion of budget into results.
3	Gender & Child	1.144	0.092 – 1.000	0.0058	Excellent conversion of budget into results.
4	Gap-Closing	1.136	0.092 – 1.000	0.0129	Excellent conversion of budget into results.
5	Protection-First	1.130	0.092 – 1.000	0.0087	Excellent conversion of budget into results.
6	Equal	0.184	0.092 – 1.000	0.0002	Limited results for the same budget.
7	Durable Solutions	-0.643	0.092 – 1.000	0.0001	Limited results for the same budget.

Step 6: Doing More With Less

A plan can be efficient and still operate with more budget than necessary. So how small can we shrink the budget and still provide the exact same level of overall support to the people as our Current Plan?

How the Optimization Works

Defining the Goal: We first calculate the total “support” generated by the Current Plan. We add up all the positive improvements across protection, vulnerability, basic needs, health, education, and service efficiency to create a single benchmark “Target Score”. This is what we mean by the “same level of support”.
Mixing and Matching: The computer doesn’t just pick one of the previous scenarios. Instead, it creates a custom “Optimized Blend”. It takes the smartest individual budget ideas (allocations to specific outcome areas) from all the “What-If” plans designed above.
Shrinking the Budget: The computer repeatedly simulates this Optimized Blend, slowly shrinking the total budget. It stops shrinking the budget the exact moment it can no longer hit our benchmark “Target Score”.
The Result: We discover the absolute lowest budget (b*) needed to achieve the exact same level of support as our current, more expensive plan.
Comparison: “Δ” shows how the Optimized plan compares to the Current plan. For indicators where “lower is better” (like vulnerability or morbidity), the sign is automatically flipped.

Code

# This part of the analysis finds the absolute "Lowest Budget" needed to match 
# the overall support provided by the current, more expensive plan.

from functools import lru_cache
from concurrent.futures import ThreadPoolExecutor, as_completed
from numpy.random import SeedSequence, default_rng
import numpy as np
import pandas as pd
import json

# print("⚡ Optimized Budget Minimisation Running...")
# 1. Define the Goal: we first calculate the total "support" generated by the Current Plan.
# This becomes our benchmark "Target Score" that the optimized plan must hit.
finals_full = results_df_agg[results_df_agg['month']==results_df_agg['month'].max()].copy()

base_row = finals_full[finals_full['scenario']=='No intervention'].iloc[0]
current_row = finals_full[finals_full['scenario']=='Current']

def total_improvement(row):
    tot = 0.0
    for m in metrics:
        sign = -1 if any(k in m.lower() for k in ['vulnerability','protection','needs','morbidity']) else +1
        tot += max(0.0, sign*(row[m]-base_row[m]))
    return float(tot)

target_imp = total_improvement(current_row.iloc[0])
# print(f"🎯 Target improvement (Current full $): {target_imp:.5f}")

candidate_oas = []
for code, info in SRF_MAP.items():
    if info['level']!='OA': continue
    label = f"{code} - {info['domain']}"
    if SRF_EFFECT_LIBRARY.get(label, {}):
        candidate_oas.append(code)

assert candidate_oas, "No OA effects available for optimization."


# ============================================================
# BUILDING ACTIVITIES FROM OAs
# ============================================================
def build_activities_from_weights(weights: dict):
    """Convert OA weights into an activity portfolio."""
    # take top 8 weighted OAs
    items = sorted(weights.items(), key=lambda x: x[1], reverse=True)[:8]
    items = {k:(v if v>0 else 0.0) for k,v in items}
    s = sum(items.values())
    if s <= 0:
        return []
    norm = {k: v/s for k,v in items.items()}
    return make_targeted_portfolio(norm)

# 2. Mixing and Matching: the computer repeatedly simulates different "Optimized Blends" 
# of budget ideas to find the smartest possible combination.
@lru_cache(maxsize=50000)
def evaluate_portfolio_cached(weights_json: str, budget_scale: float, n_runs_opt: int):
    """Memoized wrapper so repeated evaluations don't re-run ABM."""
    weights = json.loads(weights_json)

    acts = build_activities_from_weights(weights)
    if not acts:
        return -1.0, (np.nan, np.nan), acts

    run_totals = []
    for run_id in range(n_runs_opt):
        ss_run = SeedSequence(BASE_SEED + 3000 + run_id + int(budget_scale*1000))
        rng_env = default_rng(ss_run.spawn(1)[0])
        child_ss = ss_run.spawn(len(all_effect_domains))
        rng_by_domain_run = {dom: default_rng(ss) for dom, ss in zip(all_effect_domains, child_ss)}

        acts_run = [
            Activity(a.name, a.srf_domain, a.budget_share, coverage=a.coverage, 
                     intensity=a.intensity, rng=rng_by_domain_run.get(a.srf_domain, default_rng()))
            for a in acts
        ]

        env = Environment(clone_households(H_base), rng=rng_env)
        traj = simulate(env, acts_run, MONTHS, budget_scale=budget_scale)
        final = traj.iloc[-1]

        tot=0.0
        for m in metrics:
            sign = -1 if any(k in m.lower() for k in ['vulnerability','protection','needs','morbidity']) else +1
            tot += max(0.0, sign*(final[m] - base_row[m]))
        run_totals.append(tot)

    run_totals = np.array(run_totals)
    return float(run_totals.mean()), (float(np.percentile(run_totals,5)), float(np.percentile(run_totals,95))), acts


def evaluate_portfolio(weights: dict, budget_scale: float, n_runs_opt: int):
    """Serialize for cache."""
    w_json = json.dumps(weights, sort_keys=True)
    return evaluate_portfolio_cached(w_json, budget_scale, n_runs_opt)


# ============================================================
# FAST OPTIMIZATION USING RANDOM + ELITE + LOCAL MUTATION
# ============================================================
def optimize_portfolio_for_budget(budget_scale: float, iters: int=150, seed: int=123):
    rng = np.random.default_rng(seed)

    # initial seeds from existing scenarios
    seeds = []
    for name, acts, _ in all_scenarios:
        if name == 'No intervention': 
            continue
        w = {}
        for a in acts:
            code = a.srf_domain.split(' - ')[0]
            if code in candidate_oas:
                w[code] = w.get(code,0) + a.budget_share
        if w:
            seeds.append(w)

    # uniform equal allocation
    seeds.append({c:1.0/len(candidate_oas) for c in candidate_oas})

    # Evaluate seed candidates in parallel
    best_mean = -1.0
    best_ci = (np.nan, np.nan)
    best_w = None
    best_acts = None

    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [
            pool.submit(evaluate_portfolio, w, budget_scale, 2)
            for w in seeds
        ]
        for w, fut in zip(seeds, futures):
            m, ci, acts = fut.result()
            if m > best_mean:
                best_mean, best_ci, best_w, best_acts = m, ci, w, acts

    # Randomised + Mutated Search
    for t in range(iters):
        vec = rng.dirichlet(np.ones(len(candidate_oas)))
        w = {c: float(v) for c,v in zip(candidate_oas, vec)}

        # Drop tiny weights (<4%)
        w = {k:(v if v>=0.04 else 0.0) for k,v in w.items()}
        s = sum(w.values())
        if s <= 0:
            continue
        w = {k:v/s for k,v in w.items()}

        m, ci, acts = evaluate_portfolio(w, budget_scale, 1)

        if m > best_mean:
            best_mean, best_ci, best_w, best_acts = m, ci, w, acts

        # Early stopping (we already reach target)
        if best_mean >= 0.99*target_imp:
            break

    return best_mean, best_ci, best_w, best_acts

# 3. Shrinking the Budget: the computer tests the optimized blend at lower and lower 
# budget levels until it can no longer hit our "Target Score".
coarse_grid = [1.0] + list(np.round(np.arange(0.9,0.49,-0.05),2))
feasible = []

#print("\n🔍 Coarse grid search…")
for b in coarse_grid:
    mean_imp, ci_imp, w, acts = optimize_portfolio_for_budget(b, iters=40, seed=int(1000*b))
    feasible.append({'b':b, 'mean_imp':mean_imp, 'ci5':ci_imp[0], 'ci95':ci_imp[1], 'weights':w, 'acts':acts})
    print(f"  b={b:.2f} → mean={mean_imp:.5f}  (target {target_imp:.5f})")

good = [r for r in feasible if r['mean_imp'] >= 0.99 * target_imp]

if not good:
    print("❌ No budget reduction found. Try increasing iterations.")
    b_star=None
    best_record=None
else:
    b_low = min(r['b'] for r in good)
    below = [r for r in feasible if r['b'] < b_low and r['mean_imp'] < 0.99*target_imp]
    b_high_bad = max((r['b'] for r in below), default=b_low-0.05)
    b_left, b_right = b_high_bad, b_low

    best_record = next(r for r in feasible if r['b']==b_low)

    #print("\n🔍 Bisection refinement…")
    for _ in range(8):
        if b_right - b_left < 0.01:
            break
        b_mid = round((b_left + b_right)/2, 3)
        mean_imp, ci_imp, w, acts = optimize_portfolio_for_budget(b_mid, iters=20, seed=777)

        if mean_imp >= 0.99 * target_imp:
            best_record = {'b':b_mid, 'mean_imp':mean_imp,
                           'ci5':ci_imp[0], 'ci95':ci_imp[1],
                           'weights':w, 'acts':acts}
            b_right = b_mid
        else:
            b_left = b_mid

    b_star = round(best_record['b'], 3)
   # print(f"\n✅ Minimal budget scale found: b* = {b_star:.3f}")

# ============================================================
# Update reproducibility file
# ============================================================
try:
    repro_path = f'outputs/repro_report_{ACTIVITY_ID}.json'
    data = json.load(open(repro_path)) if os.path.exists(repro_path) else {}
    data.update({'budget_min': {
        'grid': coarse_grid,
        'b_star': b_star,
        'iters_grid': 40,
        'iters_bisect': 20
    }})
    json.dump(data, open(repro_path,'w'), indent=2)
except Exception as e:
    print("Repro write failed:", e)

#print("\n--- Budget Minimisation Completed ---")

if b_star is not None:
    red_pct = round((1.0 - b_star)*100, 1)
   # print(f"✅ Minimal budget scale to match Current (full $): b* = {b_star:.3f}  (reduction ≈ {red_pct}%)")

# 4. The Result: we display the "Minimal Budget" found and compare it to the current plan.
if 'b_star' in globals() and b_star is not None and best_record:
    # Build final metrics for Current (full $) and Optimized at b*
    def simulate_weights(weights, budget_scale, n_runs=10, seed_offset=5000):
        acts = build_activities_from_weights(weights)
        all_metrics = []
        for run_id in range(n_runs):
            ss_run = SeedSequence(BASE_SEED + seed_offset + run_id)
            rng_env = default_rng(ss_run.spawn(1)[0])
            child_ss = ss_run.spawn(len(all_effect_domains))
            rng_by_domain_run = {dom: default_rng(ss) for dom, ss in zip(all_effect_domains, child_ss)}
            acts_run = [Activity(a.name, a.srf_domain, a.budget_share, coverage=a.coverage, intensity=a.intensity, rng=rng_by_domain_run.get(a.srf_domain, default_rng())) for a in acts]
            env = Environment(clone_households(H_base), rng=rng_env)
            traj = simulate(env, acts_run, MONTHS, budget_scale=budget_scale)
            all_metrics.append(traj.iloc[-1][metrics].to_dict())
        df = pd.DataFrame(all_metrics)
        return df.mean().to_dict()

    # Current metrics
    def simulate_named_scenario(name, budget_scale=1.0, n_runs=10, seed_offset=6000):
        acts = [a for n, A, s in all_scenarios if n==name for a in A]
        all_metrics=[]
        for run_id in range(n_runs):
            ss_run = SeedSequence(BASE_SEED + seed_offset + run_id)
            rng_env = default_rng(ss_run.spawn(1)[0])
            child_ss = ss_run.spawn(len(all_effect_domains))
            rng_by_domain_run = {dom: default_rng(ss) for dom, ss in zip(all_effect_domains, child_ss)}
            acts_run = [Activity(a.name, a.srf_domain, a.budget_share, coverage=a.coverage, intensity=a.intensity, rng=rng_by_domain_run.get(a.srf_domain, default_rng())) for a in acts]
            env = Environment(clone_households(H_base), rng=rng_env)
            traj = simulate(env, acts_run, MONTHS, budget_scale=budget_scale)
            all_metrics.append(traj.iloc[-1][metrics].to_dict())
        return pd.DataFrame(all_metrics).mean().to_dict()

    m_current = simulate_named_scenario('Current', 1.0, n_runs=10)
    m_opt     = simulate_weights(best_record['weights'], b_star, n_runs=10)

    # Build compact table
    rows=[]
    for m in metrics:
        cur=m_current[m]; opt=m_opt[m]
        # Direction-adjusted delta for color cue
        sign=-1 if any(k in m.lower() for k in ['vulnerability','protection','needs','morbidity']) else +1
        delta = sign*(opt-cur)
        rows.append({'Metric': m, 'Current (full $)': cur, f'Optimized @ b*={b_star}': opt, 'Δ (good↑)': delta})
    rep_df = pd.DataFrame(rows)

    # Pretty HTML one-pager
    red_pct = round((1.0 - b_star)*100, 1)
    alloc = sorted([(k, v*100) for k,v in best_record['weights'].items()], key=lambda x:x[1], reverse=True)[:10]
    alloc_table = ''.join([f"<tr><td style='padding:4px 8px'>{OA_FULL_LABELS.get(k,k)}</td><td style='padding:4px 8px;text-align:right'>{v:.1f}%</td></tr>" for k,v in alloc])
    metrics_table = ''.join([
        f"<tr><td style='padding:4px 8px'>{r['Metric']}</td>"
        f"<td style='padding:4px 8px;text-align:right'>{r['Current (full $)']:.4f}</td>"
        f"<td style='padding:4px 8px;text-align:right'>{r[f'Optimized @ b*={b_star}']:.4f}</td>"
        f"<td style='padding:4px 8px;text-align:right;color:{'#27ae60' if r['Δ (good↑)']>=0 else '#e74c3c'}'>{r['Δ (good↑)']:.4f}</td></tr>"
        for _,r in rep_df.iterrows()])

    html = f"""
    <div style='font-family:sans-serif;max-width:1000px;'>
      <div style='display:flex;gap:18px'>
        <div style='flex:1;background:#0072BC;color:white;padding:16px;border-radius:8px'>
          <h3 style='margin:0 0 8px 0'>Budget one‑pager</h3>
          <p style='margin:0'>Minimal budget scale to match <b>Current (full $)</b>:<br>
             <span style='font-size:1.4em;font-weight:700'>b* = {b_star:.3f}</span>
             <span style='opacity:0.9'>(≈ {red_pct}% reduction)</span>
          </p>
        </div>
        <div style='flex:1;background:#f4f9ff;padding:16px;border-radius:8px'>
          <h4 style='margin:0 0 8px 0;color:#0072BC'>Top 10 OA allocations at b*</h4>
          <table style='width:100%;font-size:13px;border-collapse:collapse'>{alloc_table}</table>
        </div>
      </div>
      <div style='margin-top:16px;background:#fff;border:1px solid #e6e6e6;border-radius:8px'>
        <h4 style='margin:12px 12px;color:#333'>Compact comparison (final‑month means)</h4>
        <table style='width:100%;font-size:13px;border-collapse:collapse'>
          <thead>
            <tr style='background:#f0f0f0'>
              <th style='text-align:left;padding:6px 8px'>Metric</th>
              <th style='text-align:right;padding:6px 8px'>Current (full $)</th>
              <th style='text-align:right;padding:6px 8px'>Optimized @ b*</th>
              <th style='text-align:right;padding:6px 8px'>Δ (good↑)</th>
            </tr>
          </thead>
          <tbody>{metrics_table}</tbody>
        </table>
      </div>
    </div>
    """
    display(HTML(html))
else:
    print("One-pager not available — b* not computed or no best_record found.")

  b=1.00 → mean=0.01323  (target 0.00046)
  b=0.90 → mean=0.01415  (target 0.00046)
  b=0.85 → mean=0.01074  (target 0.00046)
  b=0.80 → mean=0.01114  (target 0.00046)
  b=0.75 → mean=0.01004  (target 0.00046)
  b=0.70 → mean=0.00884  (target 0.00046)
  b=0.65 → mean=0.00741  (target 0.00046)
  b=0.60 → mean=0.00890  (target 0.00046)
  b=0.55 → mean=0.00675  (target 0.00046)
  b=0.50 → mean=0.00729  (target 0.00046)

Budget one‑pager

Minimal budget scale to match Current (full $):
b* = 0.456 (≈ 54.4% reduction)

Top 10 OA allocations at b*

OA8. Well-Being and Basic Needs	12.5%
OA13. Self Reliance, Economic Inclusion and Livelihoods	12.5%
OA7. Community Engagement and Participation	12.5%
OA4. Gender-based Violence	12.5%
OA5. Child Protection	12.5%
OA1. Access to Territory, Registration and Documentation	12.5%
OA15. Resettlement and Complementary Pathways	12.5%

Compact comparison (final‑month means)

Metric	Current (full $)	Optimized @ b*	Δ (good↑)
Vulnerability (average effect)	0.5136	0.5190	-0.0054
Protection risk (average effect)	0.5181	0.5216	-0.0035
Basic needs gap (average effect)	0.4955	0.4942	0.0012
Education access (average effect)	0.4272	0.4276	0.0004
Morbidity (average effect)	0.4884	0.4880	0.0005
Service efficiency (average effect)	0.4208	0.4242	0.0034

Step 7: Testing Sensitivity

Just in case our assumptions were slightly wrong, we test how sturdy and sensitive to changes our results are. What if things actually got 20% worse naturally over the year (drift)? What if our outcomes were 20% less effective than we thought (effects)?

We test these “shocks” to verify that our winning budget plans are still the best, even if the real world is tougher than the simulation.

Reading the chart:

Total effect (ST) shows the overall importance of a parameter, including interactions with other parameters.
Direct effect (S1) shows the importance on its own, holding others fixed.
If ST >> S1, it means the parameter mainly matters through interactions (e.g., combined with other parameters).
Focus risk management or calibration effort on parameters with the largest ST.

Code

# This is the "Stress Test". We test how our results change if the world gets 
# 20% tougher (drift) or if our programs are 20% less effective (effects).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy.random import SeedSequence, default_rng
from concurrent.futures import ThreadPoolExecutor, as_completed

# ================
# 0) Parameters
# ================
SENS_RUNS = 6           # Monte Carlo runs per evaluation (keep modest for speed)
BASE_SEED_SENS = BASE_SEED + 9000
BUDGET_SCALE_SENS = 1.0  # we test "Current" at full budget

# Bounds for the global study
PARAM_NAMES = [
    "effect_scale",
    "drift_vulnerability", "drift_protection", "drift_basic",
    "drift_morbidity", "drift_education", "drift_service"
]
PARAM_BOUNDS = [
    [0.8, 1.2],  # effect_scale
    [0.8, 1.2],  # vul
    [0.8, 1.2],  # protection
    [0.8, 1.2],  # basic needs
    [0.8, 1.2],  # morbidity
    [0.8, 1.2],  # education
    [0.8, 1.2],  # service
]

# Helper: compute "total improvement" for a given final row (already in your notebook)
def _total_improvement_from_row(row, base_row, metrics):
    tot=0.0
    for m in metrics:
        sign = -1 if any(k in m.lower() for k in ['vulnerability','protection','needs','morbidity']) else +1
        tot += max(0.0, sign*(row[m]-base_row[m]))
    return float(tot)

# Helper: evaluate "Current" with specific knobs
def _run_current_with_knobs(effect_scale_mult=1.0, drift_mults=None, n_runs=SENS_RUNS, seed_offset=0):
    # We temporarily adjust the global EFFECT_SCALE_MULT inside this call
    global EFFECT_SCALE_MULT
    orig = EFFECT_SCALE_MULT
    EFFECT_SCALE_MULT = effect_scale_mult
    try:
        acts = [a for n, A, s in all_scenarios if n=='Current' for a in A]
        totals=[]
        for run_id in range(n_runs):
            ss_run = SeedSequence(BASE_SEED_SENS + seed_offset + run_id)
            rng_env = default_rng(ss_run.spawn(1)[0])
            child_ss = ss_run.spawn(len(all_effect_domains))
            rng_by_domain_run = {dom: default_rng(ss) for dom, ss in zip(all_effect_domains, child_ss)}
            acts_run = [Activity(a.name, a.srf_domain, a.budget_share, coverage=a.coverage, intensity=a.intensity, rng=rng_by_domain_run.get(a.srf_domain, default_rng())) for a in acts]
            env = Environment(clone_households(H_base), rng=rng_env, drift_mults=drift_mults)
            traj = simulate(env, acts_run, MONTHS, budget_scale=BUDGET_SCALE_SENS)
            final = traj.iloc[-1]
            totals.append(_total_improvement_from_row(final, base_row, metrics))
        return float(np.mean(totals))
    finally:
        EFFECT_SCALE_MULT = orig

# Baseline
finals_full = results_df_agg[results_df_agg['month']==results_df_agg['month'].max()].copy()
base_row = finals_full[finals_full['scenario']=='No intervention'].iloc[0]
cur_row  = finals_full[finals_full['scenario']=='Current']
baseline_total = _total_improvement_from_row(cur_row.iloc[0], base_row, metrics)

# 1. Global Stress Test: we analyze how every parameter (like effectiveness and 
# natural decline) contributes to the final result.
try:
    from SALib.sample import saltelli
    from SALib.analyze import sobol
    #print("🔬 Running global sensitivity (Sobol)…")

    problem = {
        "num_vars": len(PARAM_NAMES),
        "names": PARAM_NAMES,
        "bounds": PARAM_BOUNDS
    }

    # Sample design (Saltelli). Start small (e.g., 256) then scale (512/1024) as needed.
    N_SAMPLES = 256
    X = saltelli.sample(problem, N_SAMPLES, calc_second_order=False)

    # Map each sample to an evaluation of total improvement
    def _eval_row(x, idx):
        # x order: [effect_scale, drift_vul, drift_prot, drift_basic, drift_morb, drift_edu, drift_serv]
        effect_scale = float(x[0])
        drift_mults = {
            'vulnerability': float(x[1]),
            'protection':    float(x[2]),
            'basic':         float(x[3]),
            'morbidity':     float(x[4]),
            'education':     float(x[5]),
            'service':       float(x[6])
        }
        return _run_current_with_knobs(effect_scale_mult=effect_scale, drift_mults=drift_mults, n_runs=SENS_RUNS, seed_offset=10000+idx)

    # Parallel evaluation
    Y = np.zeros(len(X), dtype=float)
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(_eval_row, X[i], i): i for i in range(len(X))}
        for fut in as_completed(futures):
            i = futures[fut]
            Y[i] = fut.result()

    # Analyze Sobol indices
    Si = sobol.analyze(problem, Y, calc_second_order=False, print_to_console=False)

    # Build results table
    sobol_df = pd.DataFrame({
        'Parameter': PARAM_NAMES,
        'First-order (S1)': Si['S1'],
        'S1 conf.': Si['S1_conf'],
        'Total-order (ST)': Si['ST'],
        'ST conf.': Si['ST_conf']
    }).sort_values('Total-order (ST)', ascending=False).reset_index(drop=True)

    # Plot: total-order and first-order indices
    plt.figure(figsize=(9,6))
    y = np.arange(len(sobol_df))
    plt.barh(y+0.15, sobol_df['Total-order (ST)'], height=0.3, color='#1f77b4', label='Total effect (ST)')
    plt.barh(y-0.15, sobol_df['First-order (S1)'], height=0.3, color='#2ca02c', label='Direct effect (S1)')
    plt.gca().set_yticks(y)
    plt.gca().set_yticklabels(sobol_df['Parameter'])
    plt.axvline(0, color='black', lw=1)
    plt.title("Global Sensitivity (Sobol): Contribution of Each Parameter", fontsize=13, fontweight='bold')
    plt.xlabel("Sensitivity index (0–1)")
    plt.legend()
    plt.grid(axis='x', alpha=0.3, linestyle='--')
    plt.tight_layout()
    plt.show()

    display(sobol_df.style.format({'First-order (S1)':'{:.3f}','S1 conf.':'{:.3f}','Total-order (ST)':'{:.3f}','ST conf.':'{:.3f}'}))



except Exception as e:
    # 2. Simplified Stress Test (Fallback): we test specific ±20% "shocks" one by one 
    # to see how they impact the total improvement.
    print("⚠️ SALib not available; running local OAT tornado instead.\n(Install with: pip install SALib)")
    # Effect scale ±20%
    records=[]
    for f in [0.8, 1.0, 1.2]:
        val = _run_current_with_knobs(effect_scale_mult=f, drift_mults=None, n_runs=SENS_RUNS, seed_offset=0)
        records.append({'Factor': f'Effect scale ×{f}', 'Total': val, 'Δ vs base': val-baseline_total})

    # Drift multipliers (each dimension ±20%, OAT)
    keys = [('vulnerability','Vulnerability'), ('protection','Protection risk'), ('basic','Basic needs gap'), ('morbidity','Morbidity'), ('education','Education access'), ('service','Service efficiency')]
    for key, label in keys:
        for f in [0.8, 1.2]:
            mults = {k:1.0 for k,_ in keys}
            mults[key]=f
            val = _run_current_with_knobs(effect_scale_mult=1.0, drift_mults=mults, n_runs=SENS_RUNS, seed_offset=100)
            records.append({'Factor': f'Drift {label} ×{f}', 'Total': val, 'Δ vs base': val-baseline_total})

    sens_df = pd.DataFrame(records)

    # Tornado chart
    fig, ax = plt.subplots(figsize=(8, 6))
    sens_df_sorted = sens_df.sort_values('Δ vs base', ascending=True)
    colors = ['#e74c3c' if x < 0 else '#27ae60' for x in sens_df_sorted['Δ vs base']]
    ax.barh(sens_df_sorted['Factor'], sens_df_sorted['Δ vs base'], color=colors)
    ax.axvline(0, color='black', linewidth=1)
    ax.set_title("Sensitivity (OAT): How ±20% Shocks Affect Total Improvement", fontsize=13, fontweight='bold', pad=15)
    ax.set_xlabel("Change in Overall Help (vs. Baseline)")
    ax.set_ylabel("Factor shocked")
    ax.grid(axis='x', alpha=0.3, linestyle='--')
    plt.tight_layout()
    plt.show()

	Parameter	First-order (S1)	S1 conf.	Total-order (ST)	ST conf.
0	drift_protection	0.402	0.086	0.382	0.061
1	drift_vulnerability	0.272	0.094	0.277	0.052
2	drift_basic	0.169	0.067	0.174	0.032
3	drift_morbidity	0.100	0.053	0.106	0.022
4	drift_education	0.042	0.039	0.044	0.009
5	drift_service	0.037	0.032	0.041	0.009
6	effect_scale	0.000	0.003	0.000	0.000

Conclusion

In this analysis, we have developed a comprehensive framework to evaluate and optimize UNHCR’s resource allocation strategies. By integrating IATI data, agent-based modeling, and data envelopment analysis, we have created a robust system that can simulate the impact of different budget scenarios and identify the most efficient allocation strategies.

Key Findings

Impact of Allocation Scenarios: We have demonstrated how different budget allocation scenarios can have varying impacts on the lives of the people UNHCR serves. By simulating these scenarios, we can identify the most effective ways to allocate resources to maximize positive outcomes.
Efficiency Frontier: We have used data envelopment analysis to identify the efficiency frontier, which represents the optimal allocation strategies that maximize impact for a given budget level. This allows us to identify the most efficient ways to allocate resources to achieve our goals.
Sensitivity Analysis: We have conducted a sensitivity analysis to test the robustness of our results. We have found that our results are relatively robust to changes in the assumptions we have made, which increases our confidence in our findings.

Limitations

Data Availability: Our analysis is limited by the availability of data. While we have used IATI data to inform our analysis, there may be other data sources that could be used to further improve our results.
Model Simplification: Our agent-based model is a simplification of the real world. While we have tried to capture the most important factors, there are many other factors that could be included to further improve our results.
Assumptions: Our analysis is based on a number of assumptions, such as the assumption that our effect estimates are accurate. While we have tested the sensitivity of our results to these assumptions, there is always a degree of uncertainty in any analysis.

Future Work

Expand Data Sources: We could expand our analysis to include other data sources, such as UNHCR’s household survey data, to further improve the agent-based model part.
Enhance Model Complexity: We could enhance the complexity of our agent-based model to include more factors, such as the impact of different types of interventions, to further improve our results.
Real-Time Monitoring: We could develop a real-time monitoring system that would allow us to track the impact of our resource allocation strategies in real-time and make adjustments as needed.