Does the Australian Army Need a Data Corps?

Read

This article discusses the problems associated with utilising data and suggests Army needs a dedicated data corps to handle them.

By Benjamin White

10/07/2024

< 30 mins

All Corps

Introduction

The Australian Army has a data problem. The problem isn’t small or manageable and is getting exponentially bigger every year. The problem stems from the fact that data production is expanding at a rapid rate, both on the battlefield and in a global context. While many are heralding Artificial Intelligence (AI) as the saving grace that can help us manage the overwhelming data/information problem, it’s not that simple.

A familiar axiom in military parlance is that amateurs talk strategy; professionals talk logistics. The explanation is that logistics form the immutable foundation that both strategy and tactics require. This theme carries through to AI, with the foundation being the data and information that feed into AI systems.

This concept is reflected in comments by Craig Martell, the US’s Department of Defense’s first Chief Digital and AI Officer – stating that more attention and energy needs to be paid to data, which serves as the basis of the AI’s hierarchy of needs. Martell states that the “foundation is data, quality data, accessible, clean, regularly updated and with a notion of ownership”. Martell has correctly asserted that whilst the broad flexibility of AI makes it a powerful combat multiplier, AI is only as good as the data it receives and is trained on. Military organisations must have a good idea of how information flows on the battlefield before they can apply a comprehensive AI solution to manage it.

How Important is Information on the Battlefield?

The importance of the relationship between information and warfare has been continuously emphasised and recorded throughout history. Sun Tzu famously wrote that all of warfare is based on deception. Whilst this statement is not wrong, and undoubtedly, the primary objective of any combatant during warfare is to deceive the adversary (to make winning easier), the subtlety of Sun Tzu’s profound statement runs deeper than deception.

Deception could be viewed as manipulating information that is either displayed or portrayed to the enemy to coerce them into an action while the friendly force protects its own sources of information. This information input (and output) is a critical component of warfare, and all decisions that flow from this information input are assessed in context with temporal and uncertainty pressures.

A more accurate statement may be that all warfare is fundamentally based on information, both in the control and use of it.

Given that information dominates all aspects of warfare, the problem set extends to how a military can best filter, control, categorise, and distribute the types of information on the battlefield. Military decision-makers are often faced with complex decisions in non-routine situations – often for which no rules-based solutions exist. These decisions are framed in an environment of uncertainty, and to be effective, decisions need to be made on up-to-date, relevant, and timely information.

Data vs Information

While information and data are interrelated, they are separate and distinct components. The term information can be defined in various ways, but the broadly accepted definition is that information is processed, organised, and structured data presented in a meaningful context.

Information is predominantly fed by data – a collection of raw, unorganised facts and details, like figures, observations, text, symbols, and descriptions of things.

Without context, the process appears straightforward – military commanders take raw, unorganised facts and details, filter, process, and structure them to gain information that can be utilised to decide, translating data into an action or order that creates an effect. Whilst the solution is simple, ensuring commanders have reliable, relevant information at the right time and place has consumed military thinking power for decades. With no clear answer on the horizon, the flexible nature of AI offers a potential solution to the problem.

The growing problem

The concerning thing is that this data problem is increasing at a near-exponential rate. In 1986, the estimated global data generated for the year was 2.6 exabytes. This number has seen an exponential increase over the past four decades, with a recent study by Raconteur estimating that by 2025, the global population will generate 463 exabytes of data per day.

This growth of data is also witnessed on the battlefield. Consider an eight personnel section, each member equipped with some sort of multi-spectral sensor or real-time video recording system. Additionally, the section has several autonomous vehicles – both air and ground supporting them. This section has the capacity to generate 75 Megabytes/second of data. That’s 4.5 Gigabytes per minute, 270 Gigabytes per hour, or 6.5 Terabytes per day. That is a phenomenal amount of raw data, which must be managed or processed in some method.

Raconteur – A Day in Data

This exponential growth is being witnessed in conflicts today, highlighted by the war in Ukraine with a blending of military & civilian data sources. High-resolution imagery from Maxar is being used to track tank convoys, skirmishes posted on TikTok alert data gatherers about battles, and Ukrainian intelligence receives tens of thousands of daily reports on Russian movements.

Big Data

So how do we address the problem? The concepts of ‘Big Data’ have been studied for a number of years, and big data is typically broken down into different components and address data that contains greater variety, arriving in increasing volumes, and with more velocity (the three V’s). These three V’s were first termed in a Gartner blog by Doug Laney, and they have been supplemented with some emerging V’s over the past decade – Value and Veracity. Each aspect is relevant to the modern military information problem.

Volume is a characteristic that indicates the magnitude of data, often measured in terabytes or petabytes of data. The processing capacity of a system tends to throttle the volume of data that a system can receive. In a military context, this often presents challenges due to the limited and contested nature of data transfer networks. This capacity limitation may require the application of edge computing – where data is processed either at the source or near to the source. Edge-computing reduces the requirement to transmit large amounts of raw data.

Velocity typically refers to the rate at which data is in motion and is throttled by the rate of generation or the required speed of analysis and decision support. In a military context, the core limitation is the ability to process the volume of data generated at the velocity required to be effective, with the problem referred to as the “input-output problem”. This limitation can be weaponised, as observed in 2014 when Russia overwhelmed a Ukrainian system by inundating it with more data than it could process.

Variety refers to the types of data that are available, with traditional datasets that were structured or organised into a database. Unstructured and semi-trusted data types such as text, audio, and video require additional pre-processing to derive meaning and support metadata. Due to the broad nature of the operating environment, militaries are exposed to a wide variety of both structured and unstructured data.

Value represents the expected result of processing and analysing the data, which is often low before analysis of the data. Most data possess some form of inherent value, but it is of no use unless the value is drawn out. In an adversarial environment like warfare, the hidden value of aggregating data could provide a significant combat advantage once appropriate analysis has occurred.

Veracity refers to the truthfulness and reliability of the data – given that data can be based on datasets of varying degrees of precision, authenticity, and trustworthiness. In the context of warfare and the potential application of violence against humans this is of utmost importance.

“Good” information vs “Bad” information

The other challenge that is posed with data and information management is that there is no specific measure or guarantee that the data or information will be in any way accurate or correct. The situation is so bad that three individual terms have had to be coined to describe the different types of ‘bad information’. Generally, whilst AI systems can be trained to manage bad information, the more reliable solution is aggregating information from a variety of sources.

Firstly, we have misinformation, typically defined as false, inaccurate, or misleading information deliberately or unintentionally shared or promoted as correct. Whilst sharing misinformation may be deliberate to support viewpoints, the sharer assumes that the information is accurate and correct.

Secondly, disinformation introduces a deliberate component – the information is known to be untrue or misleading, but the author consciously decides to promote fake news or propaganda. Disinformation is meant to influence how people think about divisive subjects such as politics or social concerns.

The final element of bad data is malinformation. Malinformation is accurate and factual information but is conveyed in a manner that aims to inflict harm or threat of harm on a person, organisation, or country.

Unstructured Data

The final complexity is the amount of unstructured data in organisations, with analysts estimating that eighty to ninety per cent of any organisation’s data is unstructured, with the volume of data growing at a rate of forty to sixty per cent per year. It is assessed that militaries face the same challenges of managing these unlabelled databases.

LTGEN John Shanahan summarised this problem, then director of America’s Joint Artificial Intelligence Centre stated in a 2019 interview that whilst some said data was the new oil, it is more like mineral ore – a lot of filtering and analysis is required to get the nuggets.

The Solution

Whilst the challenge is immense and complex, the solution is simple. The Australian Army must get their data in order. This can only be achieved through careful study and analysis of the data and information flows on the battlefield, collating and analysing where required, and applying AI applications at appropriate levels to assist in the coordination.

The challenge though, is that the Army is not currently structured to deal with the complexity of the problem. Data and Information are relevant to all aspects of the organisation, and there is no dedicated organisation focused on data and information management across the enterprise.

A potential solution would be to:

Define a Data Strategy that drives tactical change. Before any significant changes in data management can be conducted, the Army should define a clear data strategy, encompassing the data collection, management, and utilisation principles. This strategy is essential, as typically large datasets of assured data are required to create AI applications.

Establish a task focused data management organisation (Data Corps). The Australian Army should look to create a task focused organisation, that can integrate across the army into all units and entities, with an organisational mandate to study, research, and develop solutions to the Army’s big data problem. This organisation should include data scientists, AI specialists, and military personnel with understanding of operational effects. This “Data Corps” could be absorbed into the Royal Australian Signals Corps, or it could be its own separate corps.

Invest in Edge Computing. Edge computing allows devices in remote locations to process data at the “edge” of the network. By processing data as close to where it is generated will reduce the amount of data that needs to be transported, alleviating pressure on military networks (which are both bandwidth constrained and potentially contested by an adversary). What becomes important is understanding what information needs to be transported – for example, a raw video that depicts an adversary system does not need to be transmitted to a decision-maker. Rather, the video can be processed on site and the relevant information (target data) extracted and transmitted. Understanding what constitutes ‘relevant information’ to the decision-maker/commander is the core of the problem.

Conclusion

Data and information flows are the fuel that will feed the strategic deterrent that AI offers. The Australian Army must prioritise understanding and managing these flows to harness the full potential of AI, ensuring timely, accurate, and actionable insights that can decisively influence the battlefield. Establishing a dedicated Data Corps, defining a robust data strategy, and investing in edge computing are critical steps toward this goal.

The challenges posed by the exponential growth of data are immense, but the benefits of overcoming these challenges are equally significant. By adopting a proactive approach to data management, the Australian Army can maintain a competitive edge, improve decision-making processes, and enhance overall operational effectiveness. This strategic focus on data will not only address current issues but also future-proof the Army against emerging threats in an increasingly data-driven world. Now is the time for the Australian Army to lead by example and transform its approach to data, setting a standard for modern military operations globally.

Biography

Benjamin White

Ben is a Field Artillery officer with experience at synchronising joint fires and effects at the battlegroup, brigade, and divisional levels.

Comments

Post By: Christian Brooker

10/07/2024

Defence has a data strategy, a data division, and army has a data hub. More resources need to be put into all three.

Post By: Jack

10/07/2024

I agree defence needs a data strategy. Army knows it needs data and wants to use it. It has a robotics and autonomous systems strategy which seems like a cart and horse situation. To me, the closer target is data management systems within the enterprise. I think the logical approach is focus on training data first - every qual, skill set, performance indicator, grade and career progression course; all feeding into a massive feed of usable data for command. It’s static, simple(r) and helps prep for the bigger data challenges.

Post By: Carlso

12/07/2024

Basically, no. Army exists to win the land component of joint land combat. Does the DoD require a data ‘corps’ sure. But does that ‘corps’ need to be staffed with blue lanyard signalers on 3 year posting cycles? Where the lessons learned by junior PTEs are lost when the post back into big army; and further lost when det commanders and junior officers post in from big army. And should we water down our recruitment with another ECN that offers limited value in the land combat domain?
Expenditure on buzzwords, which are well outside army’s traditional remit; come at real cost to army’s ability to actually generate the capability it needs to meet its core role.

Army also, in its current structure directly collects limited amounts of the kind of raw data (outside some key capabilities) this kind of expenditure would justify.

To me, this problem set is defence wide and better served with a DoD civilian workforce (or given time, AI) to service.

Reply by: Brendan

13/11/2024

What I like about this response is it suggests:

"To me, this problem set is Defence wide and better served with a DoD civilian workforce (or given time, AI) to service."

But my question is - how is the AI going to learn what data is important to Army?

We are in the nascent stage of using data within Defence. While there is immense potential from data, it is fair to suggest Army that without investment in a workforce - uniform, civilian, or contractor - to identify, refine, and validate its future data needs we will just continue to overlook opportunities. This is no different to any military technology - we wouldn't leave the development of our next service rifle to someone that has only shot civilian sporting rifles.

The author has done a good job casting a spotlight on a important challenge for Army, and offers some potential types of solutions, acknowledging there is still more work to go. We shouldn't get so caught up on the words, we overlook the message.

Post By: Brett

16/07/2024

Yes, there is a need to simply and modernize the people, processes and technology used to support data and information management, but an Army Data Corps is not the answer. Any approach needs to be ADF wide and leverage a skilled team and common platforms/processes. Data engineering skills are in demand, talent is short, pers are expensive and they want to work on leading edge data engineering and analytics capabilities such as those from the hyperscalers (Azure, AWS, Google etc) – not work on legacy products in a complex environment that is resistant to change. Leading technology providers pay $200K + for quality data engineering resources (architect/ Data Scientist/ AI/ML/NLP Engineer etc) – so it will be hard for the ADF to compete for the limited talent. Posting of uniformed pers kills any progress, with the expert Data Engineer, that spends years acquiring skills, posted to a role that does not use these skills. The ADF centre of gravity in Canberra means it is competing for data engineers in the most expensive and talent constrained region in Australia - if our focus is the North, base the teams in Darwin/Townsville.

The ADF’s Defence Date Strategy is weasel words that has delivered little capability (as did the previous IM strategy); the first line in the strategy vision states “Data-driven insights becoming an embedded part of our work”. The services all seem to do their own IM thing (different platforms, processes etc). Units depend on massive volumes of unstructured, poorly governed data, a records management system that uses a 20 year-old architecture, excel, end of life Sharepoint. A good start would be to turn the Defence Data Strategy into actionable plans that demonstrate value at unit level - not just in Canberra. The ASI on IM needs to be more than just records management and a grab bag of compliance tasks that units are not resourced to deliver.

Post By: Erica01

12/08/2024

Benjamin, you raise some interesting points, which will become increasingly pertinent, particularly as the proliferation of data and AI continues to contribute to digital complexity, both in the battlespace and everyday operations. I agree with Brett’s well-made comments that what’s lacking are actionable plans at the unit level; I would further add that many organisations at formation level and below lack the right personnel in appropriate roles to manage data and information effectively. While we don’t necessarily need to create another ‘Corps’ or ECN (Enterprise Capability Network), we must focus on managing the entire lifecycle of vast data quantities, which is harnessed as information and employed as knowledge. Importantly, this responsibility extends beyond simply a ‘blue-lanyard’ role, although collaboration between Information Technology specialists, Information and Records Management staff and general staff at all levels will be essential.

Post By: Anonymous

01/11/2024

DDG have failed to democratise analytical insights into the armed forces. Their 1DD is a basket case with outdated technology and constantly missing delivery dates due to lack of staff. They can't hire any more techs to build the platform as there is no-one in the market with the skills in the products. Thus the platform has a sustainability issue that will only grow bigger into the future. I've seen some great data initiatives within Army. Very smart, innovative people. But they get pushed aside as everyone believes in this mystical 1DD project that will be all things to all people.

Biography

Art & Science of Thinking

Art & Science of War

Leadership, Ethics & Society

STEM

Organisation, People & Projects

Officers

NCO/WO

Other ranks

Written

Video

Audio

Mixed media

Announcement

AAC

All Corps

Cbt Arms

Cbt Spt

Health

Log

Pers Spt

Biography

The Tourniquet Conversion-Replacement Drill: Audiovisual

Signalling the Future: Royal Australian Corps of Signals and the Next Century of Australian Space Power

Cert Cito Ex Astris: RASigs Centenary - Space Domain Essays Call for Submissions

Soldier's 5: The Tourniquet Traffic Light Audiovisual