All about Dataflow Diagrams and Notation
The dataflow diagram above is taken from the Acme Fashion feasibility study. It's the top level (aka level 1) current physical version. It's a good enough basis to describe the notation used in DFDs, though it doesn't show every variant of each symbol (see Dataflow Diagram Variants later on the page).
Enclosing box
The enclosing box represents the scope of the area
under
study on this DFD. At the top level it is the same boundary as shown on
the context
diagram. At lower levels it is the boundary of the process that is
being decomposed (see DFD
Decomposition further down the page).
For the top level DFD, as shown here, in the
boxed-off
area at the top goes the project name or some derivative of it and the
DFD variant (i.e. Current Physical, Current Logical, Required System).
For all
other levels, the process name being diagrammed is used, and to its
left goes its process number.
External Entities
External
entities represent people, IT systems, or other organisations
that are external to the system under study and are the source
/ recipient of data.
Note that if an external entity is an IT system,
then you're talking
about some form of interface, maybe manual, maybe automated.
To help stop the confusion caused by crossing lines on the diagram, it often helps to show the same external entity more than once. If you do, use the duplicate external entity symbol for every occurrence - it's the one with the diagonal line at the top left corner. That way a reader who sees one knows that it's not the only one.
Processes
Processes
transform data. The symbol is a rectangle with a couple of boxed-off
areas at the top. In the top left hand box goes the process number.
This is purely an aid to referencing it and in no way implies sequence.
I'll say that again; the process numbers mustn't be used to
imply
sequence - properly formed DFDs do not, cannot be used to show sequence.
Making up the numbers
On the top level the number will be a single digit
(you'll see why you don't go to more than five or six processes on a
diagram in DFD Decomposition,
later on
the page). On lower levels it will start with the number of the higher
level that is being decomposed, followed by a period and then a unique
number, e.g. 3.1, 3.2, etc. for the processes on the decomposition of
process 3, Control Stock.
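The numbering rule can be sketched in a few lines of Python. This is only an illustration: the function name `child_process_numbers` is made up for the example, and it assumes the same "parent number, period, unique number" pattern simply repeats at each lower level.

```python
def child_process_numbers(parent: str, count: int) -> list:
    """Reference numbers for the processes on the decomposition of `parent`.

    The numbers are labels for referencing only; they imply no sequence.
    """
    if count < 1:
        raise ValueError("a decomposition needs at least one process")
    return [f"{parent}.{i}" for i in range(1, count + 1)]


# Decomposing process 3, Control Stock, into three processes:
print(child_process_numbers("3", 3))    # ['3.1', '3.2', '3.3']
# Decomposing process 3.2 one level further:
print(child_process_numbers("3.2", 2))  # ['3.2.1', '3.2.2']
```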
Location, location, location
In the box to the right of the number goes the
location
where the process is conducted or the name of the role carrying out the
process (it's only used in the current physical DFD
variation).
What's in a name
In the main area of the box goes a phrase with an
active
verb that really describes and gets to the nub of what the process is
about.
It's worth spending time to get this right, making
sure
it's right in the minds of the business representatives. The act of
doing so will clarify in your mind that you're structuring the diagram
correctly and will also help by getting the users
solidly involved
(user buy-in is vital to success - they have to feel ownership of the
model that is developing).
Note the diagonal line and asterisk at the bottom
right
corner. This means that this particular process is not further
described on a more detailed diagram.
Datastores
A
datastore is somewhere where data comes to rest. The only way that data
can get to or from a datastore is via a process.
The symbol is an open-ended rectangle with the
datastore's name. A boxed off area on its left holds a reference
number. The number is preceded by one of:
- C stands for Clerical (or manual)
- D is an automated / computerised datastore
- T is for
temporary
Strictly speaking, a temporary data store means that a
record is removed if it is read - a stick pin is a good example. (With
legacy systems some computerised stores are temporary. In these cases,
T takes precedence.)
You only show those datastores that are shared
between
processes. Private datastores (entirely private to a particular
process) will
only be shown if / when you decompose that process into its own DFD.
Numbering datastores
If a datastore is private,
therefore appearing only on decomposed
diagrams, its number is preceded by the number of the process that is
being decomposed, and a period. So if process 3, Control Stock, were
decomposed, its private datastores would be numbered C3.1, C3.2, D3.1,
D3.2 etc.
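As a rough sketch of the numbering scheme (the function name and its parameters are invented for this example, not part of any DFD tool), a datastore reference can be built like this:

```python
from typing import Optional

# The three type letters described above.
DATASTORE_TYPES = {"C": "clerical", "D": "computerised", "T": "temporary"}


def datastore_ref(type_letter: str, number: int,
                  parent_process: Optional[str] = None) -> str:
    """Build a datastore reference such as C2 (shared) or C3.1 (private).

    `parent_process` is the number of the process being decomposed; leave
    it out for a shared datastore.
    """
    if type_letter not in DATASTORE_TYPES:
        raise ValueError(f"unknown datastore type: {type_letter!r}")
    if parent_process is None:
        return f"{type_letter}{number}"
    return f"{type_letter}{parent_process}.{number}"


print(datastore_ref("C", 2))        # C2
print(datastore_ref("C", 1, "3"))   # C3.1
print(datastore_ref("D", 2, "3"))   # D3.2
```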
Keeping it clean
Again, to prevent awkward crossing lines on the
diagram,
you can use the same datastore more than once. If you do,
you'll use the duplicate datastore symbol, the one with the extra line
down the left hand side.
An example of a datastore in use: When the stock room receives the dispatch notes from Sales Operations, they pick the items listed and make up a parcel. They then update the stock master file, C2, changing the number recorded in stock for each dispatched item to the new level.
In or out?
Finally, when does a datastore appear outside the
boundary (like C1 does)? The answer is, if it is used by another
process
outside that boundary. In the case of Acme Fashion Supplies, the
accounts department is out of scope but also uses C1, Customer Master
File so the datastore is outside.
On the other hand, C2, Stock Master File, is only
used by processes within the
boundary and so stays within it.
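The in-or-out rule boils down to a subset test. Here's a minimal sketch; the function name is invented, and the lists of who uses each datastore are assumptions made up for the example rather than read off the Acme diagram.

```python
def datastore_is_inside(used_by, in_scope) -> bool:
    """True if every user of the datastore lies within the boundary."""
    return set(used_by) <= set(in_scope)


# Illustrative names only.
in_scope = {"Sales Operations", "Stock Room"}

# C1, Customer Master File: also used by the out-of-scope accounts department.
print(datastore_is_inside({"Sales Operations", "Accounts"}, in_scope))  # False

# C2, Stock Master File: used only within the boundary.
print(datastore_is_inside({"Stock Room"}, in_scope))                    # True
```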
Dataflows
A
dataflow shows data on the move. It is a line with an arrowhead at one
end (one-way flow) or at both ends (two-way flow).
Entities (or parts of them) shown on the logical data
structure
appear in these flows.
One way street
On any level other than the lowest level, it is
quite ok to show two-way dataflows.
This can help to keep clutter down but is imprecise (by their nature,
higher level DFDs aggregate processing, so it wouldn't be unreasonable
to expect dataflows between diagram objects to be aggregated, too).
On
diagrams that aren't decomposed
further, only one-way flows are ok.
Paperwork
In a predominantly manual system, the dataflows on
the current physical DFD are often named after the paperwork that they
represent, for example, "Dispatch Note", "Sales Order", "Returns Note".
Thou shalt not...
Dataflows can occur only between:
- External Entity and Process
- Process and Datastore
- Process and Process
If you think you've got a process-to-process flow, double check - in a manual system this would be equivalent to someone stopping what they're doing in one process and either running over to someone in another process with some piece of information, or, more likely, starting up another process themselves with that piece of information as the trigger. More usually process-to-process flows have a datastore in between - like an out-tray, for example.
Occasionally, where helpful, dataflows between
external
entities can be shown. If you must. But only where it adds clarity.
Remember, if it's outside the boundary of the top level, it's outside
the scope of your study.
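The connection rules above can be sketched as a small checker. This is only an illustration: the labels and the wording of the verdicts are my own, and direction is ignored because the rules are about which kinds of object may be connected.

```python
# Kinds of diagram object; the string labels are illustrative only.
EXTERNAL, PROCESS, DATASTORE = "external entity", "process", "datastore"


def check_dataflow(source: str, target: str) -> str:
    """Classify a dataflow between two kinds of diagram object."""
    kinds = {source, target}
    if kinds == {EXTERNAL, PROCESS} or kinds == {PROCESS, DATASTORE}:
        return "ok"
    if kinds == {PROCESS}:
        return "ok, but double check: is there really no datastore in between?"
    if kinds == {EXTERNAL}:
        return "only where it adds clarity"
    # Anything else, e.g. external entity to datastore, breaks the rules.
    return "not allowed"


print(check_dataflow(EXTERNAL, PROCESS))     # ok
print(check_dataflow(EXTERNAL, DATASTORE))   # not allowed
print(check_dataflow(DATASTORE, DATASTORE))  # not allowed
```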
One last point: if you need to show that a datastore is updated, you only show a flow going into it - you don't need to show a flow out of it (representing the record that has to be read to be updated). The exception is where the read is needed to allow some other processing to be carried out before the update is written back.
DFD
Decomposition
In all but the very simplest of systems one DFD
will not be enough to diagram all you need to. At the Feasibility Study
stage it isn't
uncommon for there just to be a top level but more usually there will
be a levelled set of DFDs. In a set, each process on the top level (aka
level 1) gets
decomposed into its own (level 2) diagram. In larger business
systems, processes on level 2 DFDs may in turn be decomposed. You may
need to go down as far as level 3 or even 4 in larger
projects.
KISS (keep it simple, stupid!)
The guiding principle is readability. You don't want more than five or six processes on a diagram. Any more and it all starts to get cluttered and impenetrable (it would also help alienate the key people on the project - the business representatives who have to help develop, read and review them).
How do you know when to stop decomposing? (as the actress said to the dead bishop!) Experience will guide you, but the indicators are: no more two-way dataflows, and one process per inbound flow. Also, think of this. If you have an average of four processes per DFD and decompose every one, that's 4 processes on the top level, 16 across the level 2s and 64 by the time you get to level 3, spread over 21 diagrams. Hmmm, that's a lot of diagrams!
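If you want to check the arithmetic, here's a quick sketch. It assumes, purely for illustration, that every process gets decomposed and every diagram holds the same number of processes (real sets of DFDs are rarely that even); the function name is made up.

```python
def decomposition_counts(processes_per_diagram: int, levels: int):
    """Return (level, diagrams, processes) for each level of a fully
    decomposed set of DFDs."""
    rows = []
    for level in range(1, levels + 1):
        # One diagram per process on the level above; a single top-level diagram.
        diagrams = processes_per_diagram ** (level - 1)
        processes = processes_per_diagram ** level
        rows.append((level, diagrams, processes))
    return rows


rows = decomposition_counts(4, 3)
print(rows)                                      # [(1, 1, 4), (2, 4, 16), (3, 16, 64)]
print(sum(diagrams for _, diagrams, _ in rows))  # 21 diagrams in total
```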
In practice, most will go to level 2, some will
need a level 3 and even fewer may go to level 4. Of course, if you're
part of a team doing a full study for something as large as
the NHS Records project, level 4s will be the order of the day.
Dataflow
Diagram Variants
There are differences in the way you use
the
symbols in the various DFD variants you'll produce as an investigation
proceeds. These variants are Current Physical, Current Logical,
Required System.
Current Physical
In the early stages you'll build a current physical DFD. On these you'll be recording, warts and all, the way things are presently (often known as the "As-is"). It will have all sorts of illogical things to show and will reflect how things are done currently. There will likely be processes that exist purely because things are done manually. Datastores (such as stick pins or a copy orders file) will exist because of how things are presently done.
So:
- The people carrying out the processes are
considered to be a part of them and are therefore not shown as
External Entities.
- Clerical and temporary datastores will likely
feature heavily
- What you document on the DFD won't seem to be
very logical
(I've even found processes that aren't actually needed, but still are
done because no one told the staff they were no longer needed!).
- There will be a lot of 'how' expressed in the
DFDs - something to be avoided in all but the current physical.
- Processes will all have the 'location' box
filled in
- Dataflow names will likely correspond to
document names
- There are sure to be process to process flows
- Dataflows between external entities will
possibly feature
- Entities from the Logical Data Structure may
appear in more than one datastore
In Acme Fashion, a sales order may sit in an in-tray (and
therefore a datastore, as data has come to rest) before being dealt
with and filed in the Customer Master File.
Current Logical
In the current logical DFD variant, you start with
the current physical and remove all aspects that are
to do with "how" things are done presently - you need to be showing
only the "what" aspects.
For example, because of physical constraints, a mailshot in Acme Fashion Supplies has to be stuffed into envelopes and then sorted into "mailwalk" order before being given to the Post Office (because this attracts a significant discount on the postage costs).
In the current logical DFD you'd not need a manual sort process because that depends on continuing to use the Post Office - maybe the required system won't use postal mail at all. The important thing here is that the business wants to communicate en masse with its customers. Postal mail is just one solution.
Another way of putting it is that you are trying to remove all solution-specific processes / datastores / dataflows so that, later on, other solutions can be considered.
So, on a Current Logical DFD:
- You'll not be showing Clerical or
Temporary datastores. Only D type datastores will feature
- Processes won't use the location box
- Each Entity from the Logical Data Structure
will appear in one and only one datastore.
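The datastore rules for a Current Logical DFD lend themselves to a simple check. The sketch below is hypothetical throughout: the function, the entity names and the datastore references are invented for the example, and it only covers the two datastore rules, not the location box.

```python
from typing import Dict, List, Set


def check_current_logical(datastores: Dict[str, Set[str]],
                          lds_entities: Set[str]) -> List[str]:
    """List rule breaches; an empty list means both rules hold.

    `datastores` maps a reference (e.g. "D1") to the LDS entities it holds.
    """
    problems = []
    for ref in sorted(datastores):
        if not ref.startswith("D"):
            problems.append(f"{ref}: only D type datastores belong here")
    for entity in sorted(lds_entities):
        homes = sorted(ref for ref, held in datastores.items() if entity in held)
        if len(homes) != 1:
            problems.append(
                f"{entity}: appears in {len(homes)} datastores {homes}, expected exactly 1"
            )
    return problems


# Made-up example: a clerical store has survived and duplicates an entity.
stores = {"D1": {"Customer", "Sales Order"}, "D2": {"Stock Item"}, "C3": {"Sales Order"}}
for problem in check_current_logical(stores, {"Customer", "Sales Order", "Stock Item"}):
    print(problem)
```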
By the way, if you have the situation that a
particular process is performed based
on time, such as
- A monthly update flow to an external
entity
- A weekly
de-duping / cleaning of email addresses
...you can show this by using a clock face icon as an external entity with a triggering dataflow coming into the process. This is much the clearest way of showing this. (And I suppose if you use more than one on the same diagram, you'll need a diagonal line through the top left. Stick with good practice, eh?)
Required System
The required system DFDs first appear in Business
System Options and include decisions about precisely where the boundary
between the automated aspects of the required system and the rest lies.
So:
- You'll have, for the first time, users / roles shown as external entities. They're no longer doing the transformation of data (which is what a process does); they're driving it, and no doubt providing some of the information into the process. (Of course, other external entities will still appear on your diagram as appropriate.)
- Then pretty much the same rules as per
the Current Logical.
- Oh, and always keep in mind that DFDs
are a vehicle for analysis and communication, not design. There are
more powerful tools for that, primarily Function
Definition.