Agentic AI for Energy - A Data Engineer's Guide

#ai #energy #data

![[agentic-ai-for-energy.png]]

I found a PDF on agentic AI in energy. It's 27 pages, dense, and written partly in marketing language. But underneath the market-speak, there are some genuinely useful ideas about how autonomous agents could coordinate energy operations. So I read it, used AI to help me understand the nuances, and I'm writing this note to make sense of it for people in data roles.

The PDF is [Agentic AI for Energy Explained by Hypercube](https://wearehypercube.com/wp-content/uploads/2026/01/Agentic-AI-for-Energy-Explained.pdf). This note translates what it says into what actually matters for a data engineer considering this space.

## Three Waves: Predict, Understand, Act

The PDF describes AI in energy as moving through three distinct phases. I think this framing is useful because it explains why agentic systems are different from what most companies already have.

### Wave 1: Machine Learning - You Make a Forecast, Someone Else Decides

Traditional ML takes structured data (sensor readings, weather observations, market prices) and produces forecasts. A wind model predicts tomorrow's generation at 3 PM. A solar model predicts panel output. A load forecasting model predicts demand.

These go on a dashboard. A human reads them. Maybe they adjust their schedule. Maybe they don't notice. Either way, there's a gap between the prediction and the action.

**For a data engineer**: You're building the pipelines that feed these models and ensuring the data quality is good enough for accurate predictions. You optimise the data layer. But you're not responsible for what happens with the forecast once it exists.

### Wave 2: Generative AI - You Synthesize Context, Someone Else Decides

GenAI reads unstructured stuff (emails from suppliers, PDF reports, contract documents) and produces summaries and contextual insights. A supplier email says "we're 3 weeks late."
GenAI reads that email, extracts the key facts (what's late, why, duration), and maybe even cross-references it with your project schedule.

Now you have context. But someone still needs to act on it. They need to update the project plan, notify teams, adjust timelines, manage stakeholders. The AI understands the situation. It doesn't coordinate the response.

**For a data engineer**: You're integrating unstructured data sources (email systems, document repositories) with your structured data. You're building pipelines that get documents into the LLM and pull structured outputs back out. You're thinking about how to organize data so it's findable and contextual. And all of this is deterministic engineering to get inputs to the LLM, even though the outputs may be non-deterministic[^1] at times.

### Wave 3: Agentic AI - The System Coordinates the Response

This is where it gets different. An agentic system doesn't just forecast or summarize. It orchestrates a response. The supplier delay email triggers a cascade: the system extracts the info, checks what tasks depend on this delivery, calculates the impact, proposes mitigations, drafts communications, and waits for a human to review and approve before updating systems.

Intelligence flows directly into action. Not instantly, because there's human oversight. But it's coordinated, it's documented, and it propagates across systems automatically instead of waiting for someone to manually update ten different spreadsheets.

**For a data engineer**: You're building data infrastructure that lets agents see across systems. You're designing the pipelines and integrations that give agents a unified view of scattered data. You're creating the audit trails that let humans understand what the agent decided and why.

## How Agentic Systems Actually Work

The PDF walks through the mechanics, and this is what I have been able to make out of it.

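
As a rough mental model of how an agentic system moves from data to action, here is a minimal Python sketch of a perceive, reason, act-with-approval loop. Everything in it is an assumption for illustration: the names `Observation`, `Proposal`, `reason`, and `run_once` are hypothetical, the "reasoning" is a keyword rule standing in for an LLM call, and the `approve` callback stands in for a real review interface. The PDF does not prescribe any of this.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Observation:
    source: str  # e.g. "email" or "scada" (hypothetical labels)
    text: str


@dataclass
class Proposal:
    action: str
    rationale: str


def reason(obs: Observation) -> Optional[Proposal]:
    """Stand-in for the LLM step: a keyword rule, NOT a real model call."""
    lowered = obs.text.lower()
    if "late" in lowered or "delay" in lowered:
        return Proposal(
            action="flag_schedule_for_review",
            rationale=f"Delay language found in {obs.source}",
        )
    return None  # nothing to propose for this observation


def run_once(
    observations: list[Observation],
    approve: Callable[[Proposal], bool],
) -> list[str]:
    """Perceive -> reason -> propose -> human approval -> act."""
    executed = []
    for obs in observations:
        proposal = reason(obs)
        if proposal is None:
            continue
        # Nothing is executed unless the human reviewer approves it.
        if approve(proposal):
            executed.append(proposal.action)
    return executed


inbox = [
    Observation("email", "Switchgear delivery will be 3 weeks late."),
    Observation("email", "Invoice attached for last month."),
]
approved = run_once(inbox, approve=lambda p: True)
rejected = run_once(inbox, approve=lambda p: False)
print(approved, rejected)
```

The point of the sketch is the shape, not the logic: the approval callback sits between the proposal and any side effect, so a rejected proposal changes nothing.
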
### Step 1: Perception - Continuous Data Ingestion

Agents pull data from everywhere:

- Structured: [SCADA](https://inductiveautomation.com/resources/article/what-is-scada) systems, databases, APIs, market feeds
- Unstructured: Emails, PDFs, reports, documents
- Real-time: Live grid frequency, wholesale prices, weather observations
- Reference: Project schedules, asset specs, contract terms

From a data engineering angle, this is the hard part. Most energy organisations have data scattered across incompatible systems. You've got a 30-year-old SCADA system talking to nobody. You've got project schedules in Smartsheet. You've got budget data in SAP. You've got supplier communications in email. An agent needs access to all of it simultaneously. That's not a technology problem. That's an integration problem.

For fans of the Model Context Protocol (MCP): MCP servers could potentially help with the integration problem. But how would an MCP server be built for a legacy system? That could be tricky. [Also, are MCPs even secure](https://www.docker.com/blog/mcp-security-issues-threatening-ai-infrastructure/)?

### Step 2: Reasoning - LLM Interprets and Plans

Once the agent has data from all those sources, an LLM reads it and decides what to do. What does this email mean? What downstream tasks are affected? What constraints apply? What's the next action?

Here's the thing the PDF doesn't spell out clearly: the LLM is far less useful without domain knowledge. It doesn't inherently know that cycling a battery 5 times a day accelerates degradation. It doesn't know that offshore wind maintenance windows are dictated by 48-hour weather forecasts. It doesn't know that PPAs have specific performance guarantees that constrain what you can do.

You have to encode all that knowledge somehow, either in the prompt or in a structured form that the agent can query. That's domain-specific data work, and it's critical.

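
One way to picture the "structured form that the agent can query" option is a small constraint table that is checked in plain code and can also be rendered into prompt text. This is a sketch under assumptions: the `Constraint` type, the asset IDs, and the limits (2 cycles per day, for example) are invented for illustration and do not come from the PDF or any real warranty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    asset_id: str
    name: str
    limit: float
    unit: str
    source: str  # where the rule came from, for traceability


# Hypothetical values; real ones would come from warranties, contracts, etc.
CONSTRAINTS = [
    Constraint("battery-01", "max_cycles_per_day", 2, "cycles", "warranty doc"),
    Constraint("battery-01", "min_state_of_charge", 10, "%", "operating manual"),
    Constraint("turbine-07", "min_weather_window", 48, "hours", "maintenance policy"),
]


def constraints_for(asset_id: str) -> list[Constraint]:
    """Look up every constraint recorded for one asset."""
    return [c for c in CONSTRAINTS if c.asset_id == asset_id]


def as_prompt_context(asset_id: str) -> str:
    """Render constraints as plain text that could be added to an LLM prompt."""
    lines = [
        f"- {c.name}: {c.limit:g} {c.unit} (source: {c.source})"
        for c in constraints_for(asset_id)
    ]
    return "\n".join(lines)


def violates_cycle_limit(asset_id: str, planned_cycles: int) -> bool:
    """Deterministic check that does not rely on the LLM remembering the rule."""
    for c in constraints_for(asset_id):
        if c.name == "max_cycles_per_day":
            return planned_cycles > c.limit
    return False  # no cycle limit recorded for this asset


print(as_prompt_context("battery-01"))
print(violates_cycle_limit("battery-01", 5))
```

Keeping the same records available both as prompt context and as a hard check is a design choice, not a requirement: the prompt helps the model reason, while the check catches a bad proposal regardless of what the model produced.
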
### Step 3: Action - With Human Review

The agent proposes actions: update the project schedule, send notifications, file tickets, adjust dispatch decisions.

But, and this is important, in energy contexts critical actions go through human approval before execution. An operator reviews the battery dispatch recommendation and approves it. A project manager reviews the schedule adjustment and confirms it makes sense.

This is the "human-in-the-loop" pattern. It's not the agent deciding autonomously. It's the agent handling the routine cognitive load and humans making the actual decisions faster because they've got better information and clearer recommendations.

## The Concrete Example: When a Supplier Says "We're Late"

Here is a supplier delay scenario. It's worth understanding because it shows how the three steps above actually flow together.

**The problem:** A supplier email arrives saying switchgear delivery is 3 weeks late. This is bad news. It affects project schedules, which affects budgets, which affects financing. But information is scattered. The email is in an inbox. The schedule is in Smartsheet. Dependencies are in design specs. Budget is in a finance system. By the time anyone manually connects these dots, precious planning time is lost.

**The agentic flow:**

1. The system monitors shared inboxes and detects delay language
2. A language agent extracts facts: what's delayed, who's the supplier, how long
3. A planning agent queries the project schedule and identifies downstream dependencies and how much slack exists
4. A mitigation agent runs scenarios: if we accept this delay, what slips? What's the cost? What can we reschedule?
5. A communications agent drafts internal updates and external responses
6. A project manager reviews everything in a single interface and approves
7. The system logs every decision and updates all relevant systems

From detecting the delay to updated schedules: instead of days of manual back-and-forth, this could be minutes.
More importantly, it's systematic. Nothing gets missed.

**What a data engineer needs to know here:** You're the one who makes this possible. You integrate the email system with the project scheduler and the budget system. You design the data model that lets agents query "what depends on this component?" You set up the audit trail so everyone can see what changed and why.

## Where This Actually Matters in Energy

The PDF catalogs use cases across renewables, storage, offshore wind, solar, trading, flexibility, and traditional energy. Instead of listing all of them, let me point out what they have in common: they're all coordination problems, not prediction problems.

**Renewables (Solar/Wind)**: Energy output varies unpredictably. You need contract management (when are milestones due?), feasibility assessment (is this site viable?), performance monitoring (why is this asset underperforming?), ESG reporting (what's the impact?). These aren't solved by better ML models. They're solved by coordinating across scattered information sources quickly.

**Battery storage**: Batteries are economically complex. They're paid for discharging when power is valuable, charged when it's cheap, degraded by cycling, and constrained by warranties. You need to track degradation, coordinate dispatch across multiple batteries, participate in markets, and account for the true marginal cost of each cycle. Again, this is coordination across multiple constraints, not pure prediction.

**Offshore wind**: Heavily regulated, capital-intensive. You're tracking permit conditions, translating survey data into engineering constraints, optimizing layouts against wake effects and seabed conditions, scheduling maintenance around weather windows. This is coordination across teams and constraints.

**Solar**: Site selection, forecasting, fault detection, soiling identification. You're integrating satellite imagery, grid data, environmental factors, performance telemetry. Again, it's not prediction. It's synthesis and coordination.

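
To make the battery case concrete, here is a small worked sketch of a dispatch check that weighs price spread against degradation cost and a warranty cycle cap. All figures and names (`cycle_margin`, `should_dispatch`, the prices, the 90% round-trip efficiency, the per-cycle degradation cost) are made up for illustration, and the cost model is deliberately simplistic.

```python
def cycle_margin(
    energy_mwh: float,
    buy_price: float,
    sell_price: float,
    round_trip_efficiency: float,
    degradation_cost_per_cycle: float,
) -> float:
    """Profit from one charge/discharge cycle, net of degradation."""
    # Only part of the energy bought comes back out of the battery.
    revenue = energy_mwh * round_trip_efficiency * sell_price
    charging_cost = energy_mwh * buy_price
    return revenue - charging_cost - degradation_cost_per_cycle


def should_dispatch(margin: float, cycles_today: int, max_cycles_per_day: int) -> bool:
    """Dispatch only if the cycle pays for itself AND the cycle cap allows it."""
    return margin > 0 and cycles_today < max_cycles_per_day


# Hypothetical numbers: 10 MWh battery, prices per MWh, cost per cycle.
wide_spread = cycle_margin(10, buy_price=40, sell_price=90,
                           round_trip_efficiency=0.9,
                           degradation_cost_per_cycle=300)
narrow_spread = cycle_margin(10, buy_price=40, sell_price=70,
                             round_trip_efficiency=0.9,
                             degradation_cost_per_cycle=300)

print(should_dispatch(wide_spread, cycles_today=1, max_cycles_per_day=2))
print(should_dispatch(narrow_spread, cycles_today=1, max_cycles_per_day=2))
print(should_dispatch(wide_spread, cycles_today=2, max_cycles_per_day=2))
```

Even in this toy version, the decision depends on three things at once (market prices, asset wear, and a contractual limit), which is the sense in which the problem is coordination rather than prediction.
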
The pattern is clear: the value isn't in better models. It's in faster, more coordinated responses to complex situations.

## The Data Infrastructure You Need

If you're working in this space, here's what you'd actually be building:

### 1. Real-Time Data Pipelines

Energy operations don't batch. You need continuous ingestion from SCADA systems, market APIs, weather services, email systems, project management tools. This is fundamentally different from traditional BI pipelines that run nightly.

### 2. Domain Knowledge as Structured Data

Agents need to understand constraints. What's the degradation curve for this battery? What are the grid codes? What are the warranty limits? What are the approval thresholds? Store these as queryable metadata, not buried in documents or institutional knowledge.

### 3. Integration Layer Between Systems

You're not replacing legacy systems. You're building APIs and connectors that let agents read from and write to systems that weren't designed to talk to each other. This is tedious, unglamorous infrastructure work. It's also where 80% of the value is.

### 4. Audit and Observability

Every agent decision gets logged. What data informed this? What reasoning was applied? Who approved it? What was the outcome? This isn't compliance theater. It's how you improve the system.

## What This Means for a Data Engineer

Here's my honest take: if you want to work in agentic AI for energy, you're not building models. You're solving integration and coordination problems.

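
One piece of that work, the audit trail, could start as something as plain as an append-only log of decision records that answer four questions: what data informed the decision, what reasoning was applied, who approved it, and what the outcome was. The sketch below is an assumption-heavy illustration: `DecisionRecord`, `to_log_line`, `needs_review`, and the sample values are all hypothetical, not a format the PDF describes.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DecisionRecord:
    proposal: str
    inputs: list[str]            # which data informed the proposal
    reasoning: str               # short summary of why it was proposed
    approved_by: Optional[str]   # None until a human has reviewed it
    outcome: str = "pending"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record: DecisionRecord) -> str:
    """Serialize one record as a single JSON line (append-only log format)."""
    return json.dumps(asdict(record), sort_keys=True)


def needs_review(records: list[DecisionRecord]) -> list[DecisionRecord]:
    """Records that no human has signed off on yet."""
    return [r for r in records if r.approved_by is None]


# Hypothetical example data, loosely based on the supplier delay scenario.
record = DecisionRecord(
    proposal="Shift dependent installation task by 3 weeks",
    inputs=["email:supplier-delay", "schedule:project-plan"],
    reasoning="Switchgear delivery delayed; downstream task depends on it",
    approved_by=None,
)
line = to_log_line(record)
restored = json.loads(line)
print(restored["outcome"], len(needs_review([record])))
```

One JSON object per line keeps the log easy to append to and easy to load into whatever query tool the team already uses.
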
**The work:**

- Designing data pipelines that bring together siloed systems
- Building domain context layers that encode operational constraints
- Creating audit trails that let humans understand agent decisions
- Working with domain experts to translate their knowledge into structured form
- Integrating with APIs that weren't designed to be integrated

**The prerequisite skills:**

- Data engineering (you're building pipelines and integrations)
- Systems thinking (you need to understand how energy systems actually work)
- Some domain knowledge about energy (not expert-level, but enough to know what's being coordinated)
- Comfort with ambiguity (you're often working with legacy systems that are partially documented)

**The bottleneck:** It's not the LLMs. It's not the agents. It's data. Can you get all the relevant information accessible to the agent? Can you structure it so the agent can reason about it? Do you have an audit trail humans can trust?

## What I'm Uncertain About

The PDF is clear about the vision. It's less clear on the implementation details:

- **How do you actually encode domain knowledge?** The PDF mentions "constraints" and "guardrails" abstractly. In practice, does this mean prompt engineering? Structured metadata? Training? The answer matters because it affects how you'd approach the data work.
- **What does "human-in-the-loop" actually look like operationally?** Sub-minute approvals? Human review committees? It matters because it affects your audit and notification systems.
- **How much of the value is agent-specific vs. better data infrastructure?** The PDF shows correlations (digital leaders outperform on climate goals), but doesn't isolate the agentic AI component. Is the value in the agent, or in the fact that they invested in better data infrastructure first?
- **What's the economic return today?** The PDF projects $1.3 trillion cost reduction globally by 2050. That's fine for long-term strategy.
  But what are actual implementations seeing today? 10% efficiency gains? 5%? Are we still figuring it out?

These aren't criticisms. They're just the questions you'd ask if you were deciding whether to invest time in this space.

## Why This Matters for Decarbonisation

The energy transition isn't blocked by technology. We know how to build renewables. We know how to operate storage. The bottleneck is coordination. You've got thousands of distributed assets across an increasingly variable system. No human team can coordinate that in real-time.

Agentic systems remove that bottleneck:

- **Faster decision cycles**: Problems that took days to coordinate now take minutes
- **Better asset utilisation**: Assets operate closer to economic optimum because decisions account for multiple factors simultaneously
- **Risk visibility**: Problems surface automatically instead of being buried in spreadsheets
- **Scaling**: You can coordinate 10,000 solar installations or 500 EV chargers or 100 battery facilities

From a data engineer's perspective, the opportunity is building the infrastructure that makes that coordination possible. That's real work. It's not glamorous. But it's how you actually accelerate decarbonisation.

[^1]: As I understand it, non-determinism can also be controlled with better engineering, but I don't understand this well yet. One for my future learning!