As software developers we initially understand software as a system of
commands, functions and algorithms. This instruction-oriented view of
software aids us in learning how to build software, but it is this
very same perspective that starts to hamper us when we try to build
bigger systems.

If you stand back a little, a computer is
nothing more than a fancy tool to help you access and manipulate piles
of data. It is the structure of this data that lies at the heart of
understanding how to manage complexity in a huge system. Millions of
instructions are intrinsically complicated, but underneath we can
easily get our brains around a smaller set of basic data structures.

For instance, if you want to understand the UNIX operating system, digging
through the source code line by line is unlikely to help. If, however,
you read a book outlining the primary internal data-structures for
handling things like processes and the filesystem, you’ll have a better
chance of understanding how UNIX works underneath. The data is
conceptually smaller than the code and considerably less complicated.

As code is running in a computer, the underlying state of the data is
continually changing. In an abstract sense, we can see any algorithm as
just a simple transformation from one version of the data to
another. We can see all functionality as just a larger set of
well-defined transformations pushing the data through different revisions.
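
To make the idea of transformations between revisions concrete, here is a minimal sketch. The `Account` record, the `deposit` and `withdraw` steps, and the `revision` counter are all hypothetical names chosen for illustration; they are one possible way to picture the idea, not a prescribed design.

```python
from dataclasses import dataclass, replace
from functools import reduce


@dataclass(frozen=True)
class Account:
    """Hypothetical immutable record; each change yields a new revision."""
    owner: str
    balance: int  # in cents
    revision: int = 0


def deposit(amount):
    """Return a transformation that adds `amount` to the balance."""
    def apply(acct):
        return replace(acct, balance=acct.balance + amount,
                       revision=acct.revision + 1)
    return apply


def withdraw(amount):
    """Return a transformation that subtracts `amount` from the balance."""
    def apply(acct):
        if amount > acct.balance:
            raise ValueError("insufficient funds")
        return replace(acct, balance=acct.balance - amount,
                       revision=acct.revision + 1)
    return apply


def run(initial, transformations):
    """Push the data through each transformation in order."""
    return reduce(lambda state, step: step(state), transformations, initial)


start = Account("alice", 1000)
final = run(start, [deposit(500), withdraw(200)])
```

Seen this way, the "functionality" is only the list of steps handed to `run`; the original `start` record is left untouched, and `final` is simply a later revision of the same data.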

This data-oriented perspective -- seeing the
system entirely by the structure of its underlying information -- can
reduce even the most complicated system down to a tangible collection
of details. That reduction in complexity is necessary for
understanding how to build and run complex systems.

Data sits at the core of most problems. Business domain problems creep
into the code via the data. Most key algorithms, for example, are well
understood; it is the structure and relationships of the data that
frequently change. Operational issues like upgrades are also
considerably more difficult if they affect data. Changing code or
behavior is not a big issue, since it just needs to be released, but
revising data structures can involve a huge effort in transforming the
old version into a newer one.
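
As a rough illustration of why revising data structures costs more than releasing code, consider the sketch below. It assumes a hypothetical version 1 record with a single `name` field and a hypothetical version 2 that splits it into `first_name` and `last_name`; the field names and the `schema_version` marker are invented for this example.

```python
def migrate_v1_to_v2(record):
    """Convert one hypothetical v1 record into the v2 shape."""
    # str.partition splits on the first space; a single-word name
    # leaves last_name as an empty string.
    first, _, last = record["name"].partition(" ")
    return {
        "schema_version": 2,
        "first_name": first,
        "last_name": last,
        "email": record["email"],
    }


def upgrade(records):
    """Migrate every stored record that is not already at version 2."""
    return [
        r if r.get("schema_version") == 2 else migrate_v1_to_v2(r)
        for r in records
    ]


stored = [
    {"name": "Ada Lovelace", "email": "ada@example.org"},
    {"name": "Cher", "email": "cher@example.org"},
    {"schema_version": 2, "first_name": "Alan", "last_name": "Turing",
     "email": "alan@example.org"},
]
upgraded = upgrade(stored)
```

New code can ship as soon as it is written, but every record already stored has to pass through something like `upgrade`, and edge cases in the old data (such as the single-word name here) have to be decided one by one.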

And of course, many of the base problems in software architecture are
really about data. Is the system collecting the right data at the right
time, and who should be able to see or modify it? If the data exists,
what is its quality and how fast is it growing? If not, what is its
structure, and where does it reliably come from? In this light, once the
data is in the system, the only other question is whether there is
already a way to view and/or edit the specific data, or whether that
needs to be added.

From a design perspective, the critical issue for most systems is to get the
right data into the system at the right time. From there, applying
different transformations to the data is a matter of making it
available, executing the functionality and then saving the results.
Most systems don't have to be particularly complex underneath in order
for them to work; they just need to build up bigger and bigger piles of
data. Functionality is what we see first, but it's data that forms the
core of every system.
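
The cycle described above (make the data available, execute the functionality, save the results) can be sketched in a few lines. This is only an illustration under stated assumptions: the JSON file used as a store, the `orders.json` name, and the `add_totals` transformation are hypothetical stand-ins for whatever persistence and functionality a real system has.

```python
import json
import tempfile
from pathlib import Path


def load(path):
    """Make the data available: read it from the store."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def save(path, data):
    """Save the results back to the store."""
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def add_totals(orders):
    """Hypothetical functionality: compute a total for each order."""
    return [{**o, "total": o["quantity"] * o["unit_price"]} for o in orders]


def run_step(path, transform):
    """Load the data, apply one transformation, and persist the result."""
    result = transform(load(path))
    save(path, result)
    return result


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "orders.json"
    save(store, [{"sku": "A1", "quantity": 3, "unit_price": 250}])
    updated = run_step(store, add_totals)
    reloaded = load(store)
```

The code in `run_step` stays trivial no matter which transformation is passed in; what accumulates over time is the data in the store.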
This work is licensed under a Creative Commons Attribution 3