When we talk about complex data, we are usually referring to data that is made up of many other existing data types – think bills of materials, word processing documents, time series and maps.
While ‘complex data’ usually sits under the umbrella of ‘big data’, it differs in several ways: it is not constantly evolving, it doesn’t necessarily involve large volumes of data, and you are not hunting for patterns – the frameworks within which you are working are fixed and you know what you are looking for. This makes it great for building evidence, informing decisions and shedding light on some of the more subtle, complex relationships and dependencies.
If you’re already working with this type of data, it will come as no surprise that complex data is complicated. Beyond trying to understand what it all means – including all of the relationships inside the data – other core challenges include translating the data into a usable format and presenting its complexity to end users in an engaging way.
Think of it like a mortgage application process: complex, dry and probably baffling for most people.
Calvium recently worked on one of these metaphorical mortgage applications for the energy crops industry when we developed the Perennial Energy Crops Decision Support System (PEC-DSS). For this project we were tasked with building an app that enables farmers, land managers and consultants to investigate the suitability and profitability of growing energy crops on different land sites.
This involved taking many years’ worth of calculations and forecasts that had, up until then, been collected and evaluated manually, and then inputting them into a tool that would present all of those complex calculations accurately.
With the PEC-DSS project still fresh in our minds, now seems a great time to share key insights into developing successful tools for bespoke, large, complex data sets – and how to tackle some of the challenges along the way.
Optimising Multidisciplinary Partnerships
Building collaborative, trust-based relationships with the client and any partners involved is essential to gaining a full understanding of what’s required of the data. Gathering as much knowledge as possible from the client early on will also help to limit changes further down the line.
In the case of the PEC-DSS project, Kevin – the founder of our client, Crops for Energy – had spent years building these incredible manual data sets that provided information about how well multiple species of trees and energy crops grow in local areas. These data sets considered soil type, exposure and land use, identified by a postcode; how much land is needed for each crop in each location to reach a certain yield; and time until yield is delivered.
With so much of Kevin’s time, energy and expertise invested, we had to make sure that mutual trust was firmly embedded before we could start doing anything with the data.
But before we could do that, we needed to make sure we understood the specialist knowledge coming from all of our project partners too.
Understanding Specialist Knowledge
As developers, we need a clear picture before touching the computer. Anyone who works with complex data will know that in order to write something the computer understands, you first need to understand it yourself.
For this project, understanding the purpose and relationships between Kevin’s datasets was crucial to designing the tool. This meant making sure we knew the relationship between the location data – e.g. soil type, average rainfall, shade, and location aspects such as the sunny side of a hill and prevailing winds – and the growth of the species, e.g. yield and speed of growth given location conditions.
We also needed to understand the relationships between the expected harvest in a year, the energy that could be produced on site from the harvest with the existing boiler and equipment, the energy usage needs of the landowner/site, and the cost of growing the crop and harvesting.
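To make those relationships concrete, here is a minimal sketch of how they might be modelled in code. It is illustrative only – the names, fields and numbers below are our assumptions for the example, not the actual PEC-DSS data model.

```python
from dataclasses import dataclass

# Illustrative only: these names, fields and numbers are assumptions
# for the example, not the actual PEC-DSS data model.

@dataclass
class SiteConditions:
    soil_type: str               # e.g. "clay loam"
    annual_rainfall_mm: float
    sun_factor: float            # 0.0 (full shade) to 1.0 (full sun)
    aspect: str                  # e.g. "south-facing slope"

@dataclass
class SpeciesProfile:
    name: str                    # e.g. "miscanthus", "short-rotation willow"
    base_yield_t_per_ha: float   # yield per hectare under reference conditions
    years_to_first_harvest: int

def expected_yield_t_per_ha(site: SiteConditions, species: SpeciesProfile) -> float:
    """Toy growth model: adjust the reference yield for local conditions.

    The real calculations combine many more factors; this only shows how
    location data feeds into the growth side of the model.
    """
    rainfall_factor = min(site.annual_rainfall_mm / 800.0, 1.2)
    return species.base_yield_t_per_ha * rainfall_factor * site.sun_factor
```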
There will be times when key information is held only in a specialist’s brain and not in any data or external document. In these instances – and harking back to the importance of building trust with clients – it is even more essential to invest time in building positive relations and open conversations to ensure the full complexity of the calculations can be accurately understood. This can’t always come from a single hour around a table; it takes time to share and build.
The data sets were pre-existing, and the calculations were shared over the course of many meetings between the client and the Calvium team. Once we had established a clear picture of the process, translating it into code was comparatively simple: we replicated the calculations in code and connected them to the appropriate data sets, so that data could be pulled from relevant sources as required.
Building the first versions of the calculators in spreadsheets had its limits – some of the more complex calculations could not be done easily in that format – but it was a useful way to understand how everything should work.
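As a flavour of what ‘replicating the calculations in code’ looks like, here is a simplified, hypothetical example of the kind of spreadsheet formula that becomes a function. The figures and names are invented for illustration, not PEC-DSS values.

```python
# Invented figures and names, for illustration only: the kind of
# spreadsheet formula that gets replicated as a function in the tool.

def annual_energy_kwh(harvest_tonnes: float,
                      boiler_efficiency: float,
                      energy_content_kwh_per_tonne: float = 4800.0) -> float:
    """Energy delivered on site from one year's harvest."""
    return harvest_tonnes * energy_content_kwh_per_tonne * boiler_efficiency

def simple_margin(harvest_tonnes: float, price_per_tonne: float,
                  growing_cost: float, harvesting_cost: float) -> float:
    """Crude profitability check: revenue minus growing and harvesting costs."""
    return harvest_tonnes * price_per_tonne - (growing_cost + harvesting_cost)

# In the tool itself, each input is pulled from the relevant data set
# rather than typed into a spreadsheet cell by hand.
```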
Working Iteratively
No matter how good your planning is, there will always be surprises. Being ready and flexible in processes is essential.
We saw the source data sets for the first time several weeks into the PEC-DSS project, which produced unforeseen challenges. Each project partner had contributed a different data set, so the team had to build an understanding of every one of them, drawing on the relevant specialists throughout the project.
For example, we needed to make locations searchable by postcode. The location data came in a format that was incompatible with the desired search process, so we translated the values into a map format we could search with coordinates relatable to postcodes.
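As a simplified sketch of that lookup chain: assume a table mapping postcodes to map coordinates (in the UK, open postcode directories provide this) and a regular grid of location values with a known origin and cell size. A postcode search then becomes simple arithmetic. The names and numbers below are illustrative, not the project’s actual values.

```python
# Illustrative sketch: both the lookup table and the grid parameters are
# invented for the example.

POSTCODE_COORDS = {
    "BS1 4ND": (358_500, 172_800),   # postcode -> (easting, northing) in metres
}

GRID_ORIGIN = (300_000, 100_000)     # south-west corner of the gridded map
CELL_SIZE_M = 1_000                  # one value per 1 km square

def value_at_postcode(grid, postcode: str):
    """Translate a postcode into grid indices and return that cell's value."""
    easting, northing = POSTCODE_COORDS[postcode]
    col = (easting - GRID_ORIGIN[0]) // CELL_SIZE_M
    row = (northing - GRID_ORIGIN[1]) // CELL_SIZE_M
    return grid[row][col]
```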
As we weren’t the creators of the original data sets – and therefore didn’t have a full understanding of them to begin with – taking a test-and-learn approach was especially important during this project.
We built one layer, passed it back to the client for feedback, added another element and checked that – constantly bouncing back and forth until everything worked and all of those complex relationships matched. It’s a great feeling when that happens.
Technical Challenges
There were three particular challenges that arose during this project, which we’re sure many developers will recognise.
- Understanding the file formats
In addition to the purpose and processes, the team had to understand the bespoke/proprietary file formats and extract the relevant data in a way that the other elements could work with.
- Converting data – transforming a specialist data format into a processable state
We needed to extract data from a location-based model – given to us in a proprietary binary format – in order to work with it in our development systems. In this instance, we converted it to an annotated image, which we could then work with (a simplified sketch of this kind of conversion follows this list).
- Large data processing
Using large data sets consumed a lot of processor resources. In development the tool worked on local machines, but when shared with the client on a remote server it was too processor-intensive and failed. Our solution was to subdivide the data into smaller subsets, which could be processed individually or in groups as required, reducing the power demand (see the second sketch after this list). The efficient processes we developed during this project will reduce the resource demands of large data projects in future – something we’re very pleased about.
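To illustrate the conversion step, here is a minimal sketch. The real model’s format was specific to our project partner, so the simple header-plus-rows layout below – along with the function and file names – is an assumption made for the example.

```python
import struct
from PIL import Image

# Hypothetical sketch: the real format was specific to our project
# partner's model. Here we imagine a simple header (width, height)
# followed by rows of 16-bit values, and save the grid as a greyscale
# image that standard tooling can inspect and process.

def binary_grid_to_image(in_path: str, out_path: str) -> None:
    with open(in_path, "rb") as f:
        width, height = struct.unpack("<II", f.read(8))        # assumed header
        raw = f.read(2 * width * height)
        values = struct.unpack(f"<{width * height}H", raw)     # assumed cell type

    # Scale raw values into 0-255 so each grid cell becomes a visible pixel.
    peak = max(values) or 1
    pixels = bytes(v * 255 // peak for v in values)

    Image.frombytes("L", (width, height), pixels).save(out_path)
```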
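And here is a minimal sketch of the subdivision approach. Everything in it is illustrative – the real tool’s tiling scheme and processing functions were specific to the project – but it shows the idea: fixed-size tiles keep memory and processor use bounded however large the source grid grows.

```python
# Illustrative sketch of the subdivision approach: process a large grid
# one fixed-size tile at a time so memory and processor use stay bounded.

def iter_tiles(rows, tile_size=256):
    """Yield tile_size x tile_size sub-grids from a large grid of rows."""
    for r0 in range(0, len(rows), tile_size):
        band = rows[r0:r0 + tile_size]
        for c0 in range(0, len(rows[0]), tile_size):
            yield [row[c0:c0 + tile_size] for row in band]

def process_large_grid(rows, process_tile):
    """Apply process_tile to each tile in turn, collecting the results."""
    return [process_tile(tile) for tile in iter_tiles(rows)]
```

Because each tile can be processed individually or in groups, the workload can be spread out or paused as needed – which is what brought the processing demand down to a level the remote server could handle.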
Accessible User Interfaces
Presenting complex data in an easy-to-understand, accessible way can prove to be a challenge. Long, complex forms comprising many questions often put users off interacting, so we needed to figure out how to design the tool in a way that would keep users engaged throughout.
While we didn’t have experience in this particular industry, we were able to apply our pre-existing design knowledge to turn PDF flowcharts into an accessible, attractive user interface.
In our experience, guiding users through simpler stages helps retain their engagement to the end of a process, and so this is the approach we took with the PEC-DSS project.
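As a sketch of that ‘small steps’ pattern – hypothetical, and deliberately much simpler than the actual PEC-DSS interface – one long form becomes a sequence of short, focused stages:

```python
# Hypothetical sketch of the "small steps" pattern, not the actual
# PEC-DSS interface: a long form broken into short, focused stages.

STAGES = [
    ("Location", ["postcode"]),
    ("Site conditions", ["soil_type", "exposure"]),
    ("Energy needs", ["annual_heat_demand_kwh", "existing_boiler"]),
]

def run_wizard(ask=input):
    """Walk the user through one short stage at a time.

    `ask` is any callable that prompts for a single field and returns the
    answer, e.g. input() on the command line or a form field on the web.
    """
    answers = {}
    for title, fields in STAGES:
        print(f"--- {title} ---")
        for field in fields:
            answers[field] = ask(f"{field}: ")
    return answers
```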
Key Learnings
Utilising existing skills and sharing skills internally were both integral to making this project a success. We learnt many things along the way, too, which we will take forward and apply to other projects.
For one, the project reinforced how important it is to communicate and ask the right questions. Without that two-way dialogue, there is no doubt that this project would have taken much longer.
On the technical side, we now have the capability and skills to process big data ourselves, so that user-generated data can be received and fed into the ongoing growth and accuracy of these types of projects. We have also acquired new skills in mapping and in handling specific data types, which we will be able to apply across sectors in future.
Now that the PEC-DSS project has been funded for a second stage (now called EnviroCrops), we are excited to put these new skills to use to help advance the project’s net zero goals.