Solar Digital Operations Series #1: The Only Way Solar O&M Scales

(And Why Most Companies Get Digitalization Wrong)

Jul 18, 2025

Today I'm in a building mood, and that's actually an essential part of my mission. Because while everyone talks about solar scaling, few understand the brutal reality of what actually prevents it.

O&M and asset management in solar have a scalability problem that costs the industry millions of euros and dollars. But what are the real bottlenecks to scaling?

The Three Bottlenecks Killing Solar Operations:

The talent pool crisis - from entry-level solar technicians (electricians, after all) to top executives, there simply aren't enough qualified people
Broken systems - the operational sector in most countries is in its teenage years, learning to be a grown-up, functioning on whims and shaky systems
Stone-age tech - companies either can't leverage available market technology, or their tech stack is riddled with technical debt

Here's the harsh reality: the talent pool will never grow fast enough to fill all the roles and compensate for the lack of tech or systems. So technology must compensate. But how?

Let me share two stories that changed everything about how I approach this problem.

This is the first post in our Digital Operations Series, where I'll be breaking down the real challenges and solutions for scaling solar O&M using technology and automation. Each post builds on the previous one, so if you're serious about digital transformation, hit subscribe.

Story #1: When Business Systems Meet Reality

We said "pass for now" to a huge opportunity simply because the prospect didn't understand that business systems, procedures, and SOPs are deeply intertwined with IT infrastructure. Their siloed approach was a recipe for disaster, and the only solution was co-design.

This taught me Fundamental Principle #1: O&M and Asset Management operational and business systems need to be designed together with OT, IT, and AI functionalities in mind. IT and OT infrastructure allows the business to run, but the business also needs to adapt to its IT solutions to capture those multiplier effects.

Story #2: The Day I Decided to Clone Our Brains

After we trained the key players in the O&M team of one of our customers, they picked up their stuff and left. While the reasons deserve an entire series of posts, this moment crystallized my new vision for Solar Notions.

Frustrated that all the know-how accumulated through training sessions was gone and the systems and IT infrastructure in place couldn't be operated, it dawned on me: Solar Notions' brains need to be multiplied and made to run the existing systems. Those multipliers are robots and AI agents.

This revealed Fundamental Principle #2: The only way to scale in our sector is to unleash the power of AI agents and robotics across the board. And if you want to use AI agents and robots to handle the immense volume of work, you must design your processes and technology stack accordingly from day one.

(By the way, who actually likes being covered in mud looking for a cable cut, or staring at 6 screens full of rapidly changing values?)

Which story resonates more with your experience? Have you faced similar challenges with team turnover or system integration? Share your war stories in the comments below.

Layer 1: On-Site Level - Where Robots Meet Reality

For your operations, you need to design how technicians get work done in the field. Not how tickets reach the site, but how they replace modules, troubleshoot inverters, perform thermography scans. Super practical, hands-on SOPs that you can already find in equipment manuals and measurement instrument libraries from Fluke, Flir, Megger, Metrel.

On the tech stack, this is your OT - Operational Technology network:

Read, log, and save as many indicators as your devices allow - all data provides value
Connect all devices together: use fiber if necessary, but all devices in a control loop (inverters, sensors, PLC/Edge computers) need to live on the same VLAN
Control and dispatch: deploy PLCs and Edge Computers that are robust and offer the widest range of commands for your controllable devices
Secure it properly - lock your SCADA cabinet, require ID for key access, secure credentials, and deploy managed networks
Build to scale - leave room in cabinets for future devices, keep enough Ethernet and fiber ports available
Robust connectivity - not just internet connection (preferably independent), but interfaces with Grid Operators, alarm systems, and any other required communication
Robots and drones - the most complex edge devices, which will take load off your O&M team for a fraction of the cost. Now they can be resident and cloud-controlled
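
To make the "read, log, and save" point concrete, here is a minimal Python sketch that stores every indicator in a device snapshot as a timestamped row. The table layout, the `log_snapshot` helper, and the indicator names are assumptions for illustration only; the actual device read (Modbus, SunSpec, a vendor API) is deliberately left out.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) a simple long-format readings table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "ts TEXT NOT NULL, device_id TEXT NOT NULL, "
        "indicator TEXT NOT NULL, value REAL)"
    )
    return conn


def log_snapshot(conn, device_id, readings, ts=None):
    """Store every indicator in `readings`, not a hand-picked subset."""
    ts = ts or datetime.now(timezone.utc).isoformat()
    rows = [(ts, device_id, name, value) for name, value in readings.items()]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = open_log()
# Hypothetical snapshot; in practice this comes from the device read.
snapshot = {"ac_power_kw": 48.2, "dc_voltage_v": 812.0, "cabinet_temp_c": 41.5}
stored = log_snapshot(conn, "INV-01", snapshot, ts="2025-07-18T10:00:00+00:00")
```

One row per indicator means a new indicator from a firmware update or a new device type needs no schema change, which fits the "build to scale" point above.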

With SOPs locked in and a future-proof local SCADA setup, plus robots getting ready, you move to the layer that brings context and enriches Layer 1 choices through standardization, interoperability, and vertical integration.

Layer 2: Transportation Layer - Moving Everything That Matters

This is how you move people, data, and materials from HQ/Cloud to the site.

Operations considerations:

Which van/car do techs drive, and how is it equipped?
How well is the geographical distribution of plants matched with tech team locations?
Which other vehicles need site access? High-worker, boat, buggy, drone?
For drones: always have the geolocated digital twin and flying box ready
How do technicians access ticket information and site documentation? Smartphone, tablet, laptop?
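
The question about matching plant locations to tech team locations can be checked with a few lines of code. The sketch below uses the haversine great-circle formula to find each plant's nearest technician base and to flag plants beyond a chosen distance. All coordinates, names, and the 150 km threshold are made up for illustration, and straight-line distance is only a rough proxy for real driving time.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_base(plant, bases):
    """Return (base_name, distance_km) for the base closest to `plant`."""
    name = min(bases, key=lambda n: haversine_km(plant, bases[n]))
    return name, haversine_km(plant, bases[name])


def poorly_covered(plants, bases, max_km):
    """Map each plant farther than `max_km` from any base to its nearest base."""
    flagged = {}
    for plant_name, coords in plants.items():
        base, dist = nearest_base(coords, bases)
        if dist > max_km:
            flagged[plant_name] = (base, round(dist))
    return flagged


# Hypothetical technician bases and plants as (latitude, longitude).
bases = {"base_south": (44.43, 26.10), "base_north": (46.77, 23.60)}
plants = {"plant_a": (44.60, 26.00), "plant_b": (45.75, 21.23)}
gaps = poorly_covered(plants, bases, max_km=150)
```

Here `gaps` lists only the plants that sit too far from every base, which is a starting point for deciding where to hire, relocate, or subcontract.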

Technology translation:

Excellent internet with strong GSM signal - comes down to good antennas placed correctly
Roaming connectivity and decent telecom operator options (M2M SIM card providers)
Remote access - so basic, yet so few actually use it properly
Secure remote access - VPN, firewall, the works
All the apps, software, and services needed to enable both operations and secure internet connection with remote access

This layer ensures data, control, and access are both robust and secure. Nothing's more frustrating than being blind or powerless in front of a disconnected plant, or being ready to work but unable to access plant drawings stored in the cloud.

Quick check: Which layer is your biggest bottleneck right now? Drop a comment – I read and respond to every one.

Layer 3: Systems and Cloud - Where Operations Meet Intelligence

This is the most complex layer where operational design and cloud infrastructure converge. Both operations and IT rely on data as the foundation for decision-making. After all, automation isn't just action-taking but also decision-making support.

There's no way around it: you need a Data Lake and a Data Model.

You need some form of IoT Server that controls your fleet of edge devices.

At a high level, all objects, entities, people, datastreams, and devices need mapping. From the contract saved in Google Drive to the data indicators read from inverters and how they're named - this and much more are part of your Data Model.

You can only do this after mapping and designing business processes as if they were done with pen and paper (only initially).

With correct processes translated into a data model, you've laid the groundwork for any software application to consume this data.
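
As a rough illustration of what such a data model can look like in code, here is a minimal Python sketch with three entities: a plant, its devices, and the indicators each device reports. Every class, field, and name (`canonical_name`, `contract_ref`, `Pac`, and so on) is an assumption for illustration, not a standard or a description of any specific platform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Indicator:
    canonical_name: str  # the one name every downstream app uses
    unit: str
    vendor_name: str     # the name as the device reports it


@dataclass
class Device:
    device_id: str
    kind: str
    indicators: list[Indicator] = field(default_factory=list)

    def normalize(self, raw: dict[str, float]) -> tuple[dict[str, float], list[str]]:
        """Translate vendor names to canonical ones; report what is unmapped."""
        lookup = {i.vendor_name: i for i in self.indicators}
        mapped, unmapped = {}, []
        for vendor_name, value in raw.items():
            if vendor_name in lookup:
                mapped[lookup[vendor_name].canonical_name] = value
            else:
                unmapped.append(vendor_name)
        return mapped, unmapped


@dataclass
class Plant:
    plant_id: str
    contract_ref: str  # pointer to the contract document, wherever it lives
    devices: list[Device] = field(default_factory=list)


# Hypothetical inverter with two mapped indicators.
inverter = Device("INV-01", "inverter", [
    Indicator("ac_power_kw", "kW", "Pac"),
    Indicator("dc_voltage_v", "V", "Udc"),
])
plant = Plant("PLANT-001", "contracts/PLANT-001-om.pdf", [inverter])
mapped, unmapped = inverter.normalize({"Pac": 48.2, "Udc": 812.0, "Tcab": 41.5})
```

Returning the unmapped names instead of silently dropping them keeps gaps in the model visible, so a new indicator gets named once and then becomes available to every application.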

All automation and formal/informal processes have services hosted on cloud solutions or on-premises. This is where automation and agents get created.

Data pipelines to read, compute, store, and issue commands to edge/plants are all hosted by web services or running apps. CMMS, Trading, Monitoring platforms have their engines at this level. This is where the biggest potential for automation and interoperability actually lives.
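
To show the "read, compute, issue commands" loop in its simplest form, here is a hedged Python sketch of one compute step: given the latest inverter readings and an export limit, it returns the setpoint commands to send back to the plant. The `Command` type, the `active_power_limit_kw` command name, and the proportional scaling rule are assumptions for illustration. A real plant would normally enforce this in its plant controller, with the cloud service only dispatching the limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    device_id: str
    name: str
    value: float


def plan_curtailment(inverter_power_kw, export_limit_kw):
    """Return setpoint commands if total output exceeds the export limit."""
    total = sum(inverter_power_kw.values())
    if total == 0 or total <= export_limit_kw:
        return []  # nothing to do, no commands issued
    share = export_limit_kw / total  # scale every inverter by the same factor
    return [
        Command(device_id, "active_power_limit_kw", round(power * share, 1))
        for device_id, power in sorted(inverter_power_kw.items())
    ]


# Hypothetical readings, as they might arrive from the data lake.
readings = {"INV-01": 60.0, "INV-02": 40.0}
commands = plan_curtailment(readings, export_limit_kw=80.0)
no_action = plan_curtailment(readings, export_limit_kw=120.0)
```

Keeping the compute step a pure function, with reading and dispatching handled elsewhere, makes it easy to test and easy to hand over to an agent or a scheduler later.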

In the era of APIs and MCPs, more operations can be digitalized into IT services and delegated to agents ready to run 24/7.

Isn’t it great when robots, drones, and data centers consume solar-generated electricity to provide services to solar plants?

Layer 4: Intelligence and Decision Making - Where Humans and AI Collaborate

The whole purpose of having operational and IT systems is enabling better, faster, more reliable decisions and actions. Layers 1-3 act as amplification mediums for Layer 4.

At this level you find:

The Control Room - literally several screens, monitoring, making sense of plant status, issuing remote control commands and troubleshooting

The Trading Desk - real-time plant control, forecasting systems, automated capacity forecasting and control

This means Grafana, Power BI dashboards, custom platforms, plus all the ready-to-buy platforms that cover either all of your scope or not enough of it. Most office-based staff operations and SOPs happen here, making it the layer best suited to deploying AI agents.

Correct process mapping, translated into AI agent teams, turns technicians and planners into robot-assisted operators who collaborate with those agents in troubleshooting.

Your Starting Point: The Baseline Audit

Before building anything, you need brutal honesty about where you stand.

Do it internally or externally - doesn't matter as long as you do it right and don't hide dirt under the carpet.

Go through your processes, SOPs, contracts, teams, roles, technology. Document everything. Don't leave a stone unturned.

You can start with this 10-minute assessment.

If you're tight on budget but have people and time, diagrams in Miro or PowerPoint plus notes in Google Docs or Notion work fine.

But if there's a budget, there's also enough MWp installed, which means you're running a higher risk of losing your grip. Consequences are expensive and can damage both equipment and reputation. In such cases, an external, impartial party is your best route. It costs money and delivers harsh reality checks, but at least you know what you're up against.

We always run an audit or assessment to determine the real t0 - your starting point. Because truth is, you never start from zero. There's always a WhatsApp chat with informal workflow between technicians and managers, or a Google Drive with some file system. Maybe heavy reliance on ChatGPT for troubleshooting. It's free, it's there, why not use it?

The final assessment piece: have main stakeholders answer "Why are we doing this?" or "How are we bringing value to our customers?"

When a company knows where it wants to go, where it starts from, and the risks it is currently taking, that awareness and tension fuel the next steps.

The Reality Check

Here's what I'm seeing: most companies are stuck in Layer 1 thinking while their competitors are already deploying Layer 4 AI agents. Every month you delay this transformation, you're not just missing optimization opportunities—you're falling behind operators who understand that the future belongs to those who build with robots and AI agents in mind from day one.

The companies calling us aren't just asking for consulting—they're asking us to save them from their own technical debt.

Three clients this month alone started with "We tried to build this internally, but..."

One had spent 18 months and €400K on a system that couldn't even properly track which technician was at which site. Another had five different monitoring platforms that couldn't talk to each other. The third was manually transcribing data from inverter screens into Excel sheets because their "digital transformation" couldn't handle basic data integration.

These aren't small players. These are multi-gigawatt operators who thought they could figure it out themselves.

The difference between companies that scale and those that struggle isn't just technology—it's understanding that operational systems and AI infrastructure must be designed together from day one. Most try to bolt AI onto broken processes and wonder why it doesn't work.

Ready to see where you really stand? Take the 10-minute assessment that's already helped 200+ operators identify their biggest gaps before they became expensive problems.

If you’d like to read more, pick up the Best Practice Guideline for O&M, version 6, and read chapter 9 in particular, which I contributed to.

What's your next step after reading this?
Are you starting with the assessment, or do you have specific questions about implementation? Let's discuss in the comments.

Solar Notions