Requirements Analysis and Specification: dataflow diagramming notation
Dataflow diagram example
The dataflow diagram above is taken from the Acme Fashion feasibility study. It's the top level (aka level 1) current physical version. It's a good enough basis to describe the notation used in DFDs, though it doesn't show every variant of each symbol in use (see Dataflow Diagram Variants later on the page).
The enclosing box represents the scope of the area under study on this DFD. At the top level it is the same boundary as shown on the context diagram. At lower levels it is the boundary of the process that is being decomposed (see DFD Decomposition further down the page).
For the top level DFD, as shown here, in the boxed-off area at the top goes the project name or some derivative of it and the DFD variant (i.e. Current Physical, Current Logical, Required System). For all other levels, the process name being diagrammed is used, and to its left goes its process number.
External entities represent people, IT systems, or other organisations that are external to the system under study and are the source / recipient of data.
Note that if an external entity is an IT system, then you're talking about some form of interface, maybe manual, maybe automated.
To help stop the confusion caused by crossing lines on the diagram, it often helps to use the same external entity more than once. If you do, use the duplicate external entity symbol for every occurrence - it's the one with the diagonal line at the top left corner. This is so that a reader who sees one knows it's not the only one.
Processes transform data. The symbol is a rectangle with a couple of boxed-off areas at the top. In the top left hand box goes the process number. This is purely an aid to referencing it and in no way implies sequence. I'll state that again; the process numbers mustn't be used to imply sequence - properly formed DFDs do not, cannot be used to show sequence.
Making up the numbers
On the top level the number will be a single digit (you'll see why you don't go to more than five or six processes on a diagram in DFD Decomposition, later on the page). On lower levels it will start with the number of the higher level that is being decomposed, followed by a period and then a unique number, e.g. 3.1, 3.2, etc. for the processes on the decomposition of process 3, Control Stock.
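The numbering scheme can be sketched in a few lines of Python. This is only an illustration: the function names `child_process_numbers` and `parent_of` are made up for the example and aren't part of any DFD tool.

```python
from typing import List, Optional


def child_process_numbers(parent: str, count: int) -> List[str]:
    """Reference numbers for the processes on the decomposition of `parent`."""
    if count < 1:
        raise ValueError("a decomposition needs at least one process")
    # Parent number, a period, then a unique number within the diagram.
    return [f"{parent}.{i}" for i in range(1, count + 1)]


def parent_of(number: str) -> Optional[str]:
    """Number of the process being decomposed, or None for a top level process."""
    head, sep, _ = number.rpartition(".")
    return head if sep else None


print(child_process_numbers("3", 2))  # ['3.1', '3.2']
print(parent_of("3.2"))               # 3
print(parent_of("3"))                 # None
```

Remember that the numbers are references only; nothing in them says which process happens first.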
Location, location, location
In the box to the right of the number goes the location where the process is conducted or the name of the role carrying out the process (it's only used in the current physical DFD variation).
What's in a name
In the main area of the box goes a phrase with an active verb that really describes and gets to the nub of what the process is about.
It's worth spending time to get this right, making sure it's right in the minds of the business representatives. The act of doing so will clarify in your mind that you're structuring the diagram correctly and will also help by getting the users solidly involved (user buy-in is vital to success - they have to feel ownership of the model that is developing).
Note the diagonal line and asterisk at the bottom right corner. This means that this particular process is not further described on a more detailed diagram.
A datastore is somewhere where data comes to rest. The only way that data can get to or from a datastore is via a process.
The symbol is an open-ended rectangle with the datastore's name. A boxed off area on its left holds a reference number. The number is preceded by one of:
- C stands for Clerical (or manual)
- D is an automated / computerised datastore
- T is for temporary
Strictly speaking, a temporary data store means that a record is removed if it is read - a stick pin is a good example. (With legacy systems, some computerised stores are temporary. In these cases, T takes precedence.)
You only show those datastores that are shared between processes. Private datastores (entirely private to a particular process) will only be shown if / when you decompose that process into its own DFD.
If a datastore is private, therefore appearing only on decomposed diagrams, its number is preceded by the number of the process that is being decomposed, and a period. So if process 3, Control Stock, were decomposed, its private datastores would be numbered C3.1, C3.2, D3.1, D3.2 etc.
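As a rough sketch, the datastore referencing rules above could be written down like this in Python. The helper names `prefix_for` and `datastore_ref` are hypothetical, chosen just for the illustration.

```python
from typing import Optional

PREFIXES = {
    "C": "clerical (manual)",
    "D": "automated / computerised",
    "T": "temporary",
}


def prefix_for(computerised: bool, temporary: bool) -> str:
    """Pick the prefix letter; T takes precedence over C and D."""
    if temporary:
        return "T"
    return "D" if computerised else "C"


def datastore_ref(prefix: str, number: int, owner_process: Optional[str] = None) -> str:
    """Build a reference such as C2 (shared) or D3.1 (private to process 3)."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown datastore prefix: {prefix!r}")
    if owner_process is None:
        return f"{prefix}{number}"
    # Private datastores carry the number of the process being decomposed.
    return f"{prefix}{owner_process}.{number}"


print(datastore_ref("C", 2))                     # C2
print(datastore_ref("D", 1, owner_process="3"))  # D3.1
print(prefix_for(computerised=True, temporary=True))  # T
```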
Keeping it clean
Again, to prevent awkward crossing lines on the diagram, you can use the same datastore more than once. If you do, you'll use the duplicate datastore symbol, the one with the extra line down the left hand side.
An example of a datastore in use: When the stockroom receives the dispatch notes from Sales Operations, they pick the items listed and make up a parcel. They then update the stock master file, C2, by changing on it the number recorded in stock (for each dispatched item) to the new level.
In or out?
Finally, when does a datastore appear outside the boundary (like C1 does)? The answer is, if it is used by another process outside that boundary. In the case of Acme Fashion Supplies, the accounts department is out of scope but also uses C1, Customer Master File so the datastore is outside.
On the other hand, C2, Stock Master File, is only used by processes within the boundary and so stays within it.
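The in-or-out decision boils down to a subset test, sketched here in Python. The function name and the sets of users are assumptions made up for the illustration (only the fact that Accounts is out of scope and uses C1 comes from the case study).

```python
from typing import Set


def datastore_inside_boundary(used_by: Set[str], in_scope: Set[str]) -> bool:
    """True if every user of the datastore lies within the boundary."""
    return used_by <= in_scope


# Hypothetical scope for illustration; Accounts is out of scope.
in_scope = {"Sales Operations", "Stockroom"}

# C1, Customer Master File, is also used by Accounts, so it sits outside.
print(datastore_inside_boundary({"Sales Operations", "Accounts"}, in_scope))  # False

# C2, Stock Master File, is only used within the boundary, so it stays inside.
print(datastore_inside_boundary({"Stockroom"}, in_scope))  # True
```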
A dataflow shows data on the move. It's a line with an arrow head at one end (one-way flow) or at both ends (two-way flow).
Entities (or parts of them) shown on the logical data structure appear in these flows.
One way street
On any level other than the lowest level, it is quite ok to show two-way dataflows. This can help to keep clutter down but is imprecise (by their nature, higher level DFDs aggregate processing, so it wouldn't be unreasonable to expect dataflows between diagram objects to be aggregated, too).
On diagrams that aren't decomposed further, only one-way flows are ok.
In a predominantly manual system, the dataflows on the current physical DFD are often named after the paperwork that they represent, for example, "Dispatch Note", "Sales Order", "Returns Note".
Thou shalt not...
Dataflows can occur only between:
- External Entity and Process
- Process and Datastore
- Process and Process
If you think you've got a process-to-process flow, double check - in a manual system this would be equivalent to someone stopping what they're doing in one process and either running over to someone in another process with some piece of information, or, more likely, starting up another process themselves with that piece of information as the trigger. More usually process-to-process flows have a datastore in between - like an out-tray, for example.
Occasionally, where helpful, dataflows between external entities can be shown. If you must. But only where it adds clarity. Remember, if it's outside the boundary of the top level, it's outside the scope of your study.
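The connection rules above lend themselves to a simple check. Here's a minimal Python sketch; the `Kind` enum, the `check_flow` function and the three result strings are invented for the example, and treating the "double check" cases as warnings is my own reading of the rules rather than anything the notation mandates.

```python
from enum import Enum


class Kind(Enum):
    EXTERNAL = "external entity"
    PROCESS = "process"
    DATASTORE = "datastore"


# Pairings the notation permits outright (direction doesn't matter here).
ALLOWED = {
    frozenset({Kind.EXTERNAL, Kind.PROCESS}),
    frozenset({Kind.PROCESS, Kind.DATASTORE}),
}

# Pairings that are permitted but deserve a second look.
QUESTIONABLE = {
    frozenset({Kind.PROCESS}),   # process to process
    frozenset({Kind.EXTERNAL}),  # external entity to external entity
}


def check_flow(source: Kind, target: Kind) -> str:
    """Classify a dataflow between two symbols as 'ok', 'warn' or 'invalid'."""
    pair = frozenset({source, target})
    if pair in ALLOWED:
        return "ok"
    if pair in QUESTIONABLE:
        return "warn"
    return "invalid"


print(check_flow(Kind.PROCESS, Kind.DATASTORE))   # ok
print(check_flow(Kind.PROCESS, Kind.PROCESS))     # warn
print(check_flow(Kind.EXTERNAL, Kind.DATASTORE))  # invalid
```

Anything touching a datastore that isn't a process comes out as invalid, which matches the rule that data only gets to or from a datastore via a process.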
One last point: if you need to show that a datastore is updated, you only show a flow going into it - you don't need to show a flow out of it (representing the record that has to be read to be updated). The exception is where the read is needed to allow some other processing to be carried out before the update is written back.
In all but the very simplest of systems one DFD will not be enough to diagram all you need to. At the Feasibility Study stage it isn't uncommon for there just to be a top level but more usually there will be a levelled set of DFDs. In a set, each process on the top level (aka level 1) gets decomposed into its own (level 2) diagram. In larger business systems, processes on level 2 DFDs may in turn be decomposed. You may need to go down as far as level 3 or even 4 in larger projects.
KISS (keep it simple, stupid!)
The guiding principle is readability. You don't want more than five or six processes on a diagram. Any more and it all starts to get cluttered and impenetrable (it would also help alienate the key people on the project - the business representatives who have to help develop, read and review them).
How do you know when to stop decomposing? (as the actress said to the dead bishop!) Experience will guide, but the indicators are: no more two-way dataflows, and one process per inbound flow. Also, think of this. If you have an average of four processes per DFD, that's 4 on the top level, 16 across all the level 2s, and 64 by the time you get to level 3. Hmmm, that's a lot of diagrams!
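To see how quickly a levelled set grows, here's a small Python sketch. It assumes, purely for the arithmetic, that every process on every level is decomposed and that each diagram holds the same number of processes; real sets are patchier than that. The function names are made up for the example.

```python
from typing import List


def processes_per_level(fan_out: int, levels: int) -> List[int]:
    """Total processes on each level, level 1 first."""
    return [fan_out ** level for level in range(1, levels + 1)]


def diagram_count(fan_out: int, levels: int) -> int:
    """Diagrams in the set: one top level plus one per decomposed process."""
    return sum(fan_out ** level for level in range(levels))


print(processes_per_level(4, 3))  # [4, 16, 64]
print(diagram_count(4, 3))        # 21
```

So with four processes per diagram, going down to level 3 means 21 diagrams to draw, review and keep consistent.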
In practice, most will go to level 2, some will need a level 3 and even fewer may go to level 4. Of course, if you're part of a team doing a full study for something as large as the NHS Records project, level 4s will be the order of the day.
There are differences in the way you use the symbols in the various DFD variants you'll produce as an investigation proceeds. These variants are Current Physical, Current Logical, Required System.
In the early stages you'll build a current physical DFD. On these you'll be recording, warts and all, the way things are at present (often known as the "As-is"). It will have all sorts of illogical things to show and will reflect how things are done currently. There will likely be processes that exist purely because things are done manually. Datastores (such as stick pins or a copy orders file) will exist because of how things are presently done.
- The people carrying out the processes are considered to be a part of them and are therefore not shown as External Entities
- Clerical and temporary datastores will likely feature heavily
- What you document on the DFD won't seem to be very logical
(I've even found processes that aren't actually needed, but still are done because no one told the staff they were no longer needed!)
- There will be a lot of 'how' expressed in the DFDs - something to be avoided in all but the current physical
- Processes will all have the 'location' box filled in
- Dataflow names will likely correspond to document names
- There are sure to be process to process flows
- Dataflows between external entities will possibly feature
- Entities from the Logical Data Structure may appear in more than one datastore
In Acme Fashion, a sales order may sit in an in-tray (and therefore a datastore, as data has come to rest) before being dealt with and filed in the Customer Master File.
In the current logical DFD variant, you start with the current physical and remove all aspects that are to do with "how" things are done presently - you need to be showing only the "what" aspects.
For example, because of physical constraints, a mailshot in Acme Fashion Supplies has to be stuffed into envelopes and then sorted into "mailwalk" order before being given to the Post Office (because this attracts a significant discount on the postage costs).
In the current logical DFD you'd not need a manual sort process because that depends on continuing to use the Post Office - maybe the required system won't use postal mail at all. The important thing here is that the business wants to communicate en-masse with its customers. Postal mail is just one solution.
Another way of putting it is that you are trying to remove all solution-specific processes / datastores / dataflows so that, later on, other solutions can be considered.
So, on a Current Logical DFD:
- You'll not be showing Clerical or Temporary datastores. Only D type datastores will feature
- Processes won't use the location box
- Each Entity from the Logical Data Structure will appear in one and only one datastore.
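The three rules in the list above can be checked mechanically. Here's a hedged Python sketch: the function name, the way the diagram is represented (plain dictionaries and sets) and the entity names in the example are all assumptions made for illustration, not part of the method.

```python
from collections import Counter
from typing import Dict, List, Set


def current_logical_problems(
    datastores: Dict[str, Set[str]],     # datastore reference -> entities held
    process_locations: Dict[str, str],   # process number -> location box text
    lds_entities: Set[str],              # every entity on the Logical Data Structure
) -> List[str]:
    """List the ways a diagram breaks the Current Logical rules."""
    problems = []
    for ref in sorted(datastores):
        if not ref.startswith("D"):
            problems.append(f"datastore {ref} is not a D type datastore")
    for number in sorted(process_locations):
        if process_locations[number]:
            problems.append(f"process {number} still has its location box filled in")
    held = Counter(e for entities in datastores.values() for e in entities)
    for entity in sorted(lds_entities):
        if held[entity] != 1:
            problems.append(
                f"entity {entity} appears in {held[entity]} datastores, expected 1"
            )
    return problems


# Hypothetical diagram with one of each kind of problem.
for problem in current_logical_problems(
    {"D1": {"Customer"}, "D2": {"Customer", "Stock Item"}, "T1": {"Sales Order"}},
    {"1": "", "3": "Stockroom"},
    {"Customer", "Stock Item", "Sales Order"},
):
    print(problem)
```

An empty list back means the diagram passes these three checks; it says nothing about whether the "how" has really been stripped out, which still takes a human eye.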
By the way, if you have the situation that a particular process is performed based on time, such as
- A monthly update flow to an external entity
- A weekly de-duping / cleaning of email addresses
...you can show this by using a clock face icon as an external entity with a triggering dataflow coming into the process. This is much the clearest way of showing this. (And I suppose if you use more than one on the same diagram, you'll need a diagonal line through the top left. Stick with good practice, eh?)
The required system DFDs first appear in Business System Options and include decisions about precisely where the boundary between the automated aspects of the required system and the rest lies.
- You'll have, likely for the first time, users / roles shown as external entities.
They're no longer doing the transformation of data (which is what a process does); they're driving it, and no doubt providing some of the information into the process. Of course, other external entities will still appear on your diagram as appropriate.
- Then pretty much the same rules as per the Current Logical.
- Oh, and always keep in mind that DFDs are a vehicle for analysis and communication, not design. There are more powerful tools for that, primarily Function Definition.
In action and in theory
You might like to look at the Acme Fashion Supplies case study that these examples were taken from or the Requirements Analysis and Specification briefing study where they're described.
When you've worked through the materials, you can see how well you've got on by testing yourself with these:
- briefing / case study quiz
- dataflow diagram notation quiz
- ...or do the exercises on dataflow diagramming and logical data modelling