Data Flow Analysis - Usage and Limitations

Handling of Data Types

Structs and unions in C/C++, along with classes in C++, together make up a set of symbol types called aggregate variables. Several of the Task Flow Check reports have options that control how these aggregate variables are reported. If you include the Container Summary through the report options, then any member of an aggregate variable can contribute to the summary for its containing variable. For example, suppose one member of a struct is set in one task and a different member of the same struct is set in another task. In that case, none of the members will show up in the Variables Set in Multiple Tasks report, but the containing struct variable will.
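
The Container Summary rule can be modeled in a few lines. The sketch below is a hypothetical illustration, not Imagix 4D code: the `Assignment` record, the `set_in_multiple_tasks` function and the names `cfg`, `mode`, `rate`, `taskA` and `taskB` are all assumptions. It checks whether writes from at least two different tasks match either one specific member or, when `member` is `NULL`, any member of the container.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One recorded write (hypothetical): which task assigned which member
 * of which aggregate variable. */
typedef struct {
    const char *task;
    const char *container;  /* aggregate variable, e.g. "cfg" */
    const char *member;     /* member name, e.g. "mode" */
} Assignment;

/* Returns true if writes from at least two different tasks match.
 * member == NULL models the Container Summary: any member of the
 * container counts toward the container itself. */
static bool set_in_multiple_tasks(const Assignment *w, size_t n,
                                  const char *container, const char *member) {
    const char *first_task = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(w[i].container, container) != 0)
            continue;
        if (member != NULL && strcmp(w[i].member, member) != 0)
            continue;
        if (first_task == NULL)
            first_task = w[i].task;
        else if (strcmp(first_task, w[i].task) != 0)
            return true;
    }
    return false;
}
```

With `taskA` writing `cfg.mode` and `taskB` writing `cfg.rate`, neither member is set in multiple tasks, but the container `cfg` is.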

Arrays, on the other hand, are treated as single variables by these reports. Because array indices can be dynamic expressions, it is generally not possible to determine statically which array element has been assigned.
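
A minimal illustration of why an array is treated as one variable: in the hypothetical function below (`record`, `buf` and `BUF_LEN` are illustrative names), the element that gets written depends on a run-time argument, so a static analysis can only conclude that "some element of `buf`" changes.

```c
#include <assert.h>

#define BUF_LEN 4
static int buf[BUF_LEN];

/* The element written depends on slot, a run-time value. A static analysis
 * sees only "buf[slot] is assigned", so it treats buf as a single variable. */
static void record(int slot, int value) {
    if (slot >= 0 && slot < BUF_LEN)  /* ignore out-of-range slots */
        buf[slot] = value;
}
```

Calling `record(0, 5)` and `record(2, 7)` from two different tasks touches different elements at run time, yet both calls are reported as assignments to `buf`.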

The "Analyze Union/Bitfield Members Separately" option controls how the analysis treats members of a union or bitfield. When it is enabled, union members are tracked as separate, independent variables, as they would be in a struct. When it is disabled, the members are considered to be the same variable, as with an array. In the latter case, an assignment to a union member un.member1 is translated into assignments to all union members, such as un.member2. This option also applies to bitfields. In typical computer architectures, assigning a bitfield requires updating the whole byte or word in memory that contains it. The assignment could therefore affect neighboring bitfields if another task concurrently assigns one of them.
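
The sketch below shows the two machine-level facts behind this option, under stated assumptions. All members of a union start at the same address (guaranteed by C), so writing one member changes the storage of the others. For bitfields, the hypothetical `write_field` helper models the read-modify-write of a whole 32-bit word that a bitfield store typically compiles to; the 32-bit word size and the field layout are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Every member of a union starts at the same address, so a write to
 * member1 also overwrites the bytes that member2 occupies. */
union Shared {
    uint32_t member1;
    uint8_t  member2;
};

/* Hypothetical model of a bitfield store: take the whole word, replace the
 * field's bits, and return the whole word to be stored back.
 * Assumes 0 < width < 32 and shift + width <= 32. */
static uint32_t write_field(uint32_t word, unsigned shift, unsigned width,
                            uint32_t value) {
    uint32_t mask = (((uint32_t)1 << width) - 1u) << shift;
    return (word & ~mask) | ((value << shift) & mask);
}
```

If two tasks both load the word 0x5A and each rewrites a different 4-bit field, one task produces 0x51 and the other 0x2A. Whichever stores last overwrites the other task's field, which is why neighboring bitfields can be affected.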

Analysis of Pointer Expressions

As far as is feasible with static analysis, these reports track which objects are referenced by pointers. Pointer and reference parameters are mapped to the actual argument variables passed to them, and potential changes are derived that way. Pointers that are set to an array and used to index through it cause the array to be treated as modified. These reports assume "clean" pointer usage, i.e., pointers are set to a certain variable and operate only on that variable; they do not use side effects to access independent variables that happen to be allocated in neighboring memory areas.
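
A sketch of "clean" pointer usage as described above; the function names are hypothetical. `set_value` writes through a pointer parameter, which the analysis relates back to the caller's argument, and `clear` steps a pointer through an array, which counts as modifying the array. By contrast, code such as `*(&a + 1) = 0`, which relies on some other variable happening to follow `a` in memory, falls outside this assumption (and is undefined behavior in C).

```c
#include <assert.h>
#include <stddef.h>

/* Pointer parameter: set_value(&x, v) is treated as an assignment to x. */
static void set_value(int *out, int value) {
    *out = value;
}

/* Pointer set to an array and stepped through it: the whole array
 * is treated as modified. */
static void clear(int *p, size_t n) {
    for (int *end = p + n; p < end; p++)
        *p = 0;
}
```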

This static analysis of pointer expressions can be time- and resource-intensive, so a user setting controls which of four levels of analysis is applied:

Basic Pointer Expressions -- Basic mode tracks single pointers set to variables. This includes parameters that are pointers. Single dereference of pointers (*ptr) is evaluated but multiple dereferences (**ptr or *(ptr1->ptr2)) are not. Pointers passed on to other pointers are only tracked over one step (ptr1 = &var; ptr2 = ptr1; *ptr2).

Standard Pointer Expressions -- Standard mode tracks single or double pointers set to variables. This includes parameters that are pointers. Single or double dereference of pointers (*ptr, **ptr) is evaluated but more than 2 dereferences (***ptr or *(ptr1->ptr2->ptr3)) are not. Pointers passed on to other pointers are tracked (ptr1 = &var; ptr2 = ptr1; *ptr2).

Complex Pointer Expressions -- Complex mode tracks up to 5 levels of pointers and dereferences. Pointers passed on to other pointers are tracked at these levels (ptr1 = &var; ptr2 = &ptr1; ptr3 = ptr2; ptr4 = &ptr3; ***ptr4).

Complex and Hidden Pointer Expressions -- The same as Complex mode, except that it also treats integer variables of pointer size as pointers. This mode is for applications that store pointers in integer variables using unions or casts. Hidden pointers are very unusual, and this mode is very expensive to compute because the number of variables that must be tracked as pointers increases dramatically.
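
For reference, the hypothetical functions below write a caller's variable using the kinds of pointer patterns named in each mode description. The chain in `complex_level` reproduces the Complex mode example; `hidden_level` stores the pointer in a `uintptr_t`, the kind of "hidden pointer" that only the last mode considers. Which mode actually resolves each pattern is as stated in the descriptions above; this sketch only shows the source shapes.

```c
#include <assert.h>
#include <stdint.h>

/* Basic: single pointer, one copy step, single dereference. */
static void basic_level(int *var) {
    int *ptr1 = var;
    int *ptr2 = ptr1;
    *ptr2 = 1;
}

/* Standard: double pointer with a double dereference. */
static void standard_level(int *var) {
    int *ptr1 = var;
    int **pp = &ptr1;
    **pp = 2;
}

/* Complex: ptr1 = &var; ptr2 = &ptr1; ptr3 = ptr2; ptr4 = &ptr3; ***ptr4 */
static void complex_level(int *var) {
    int *ptr1 = var;
    int **ptr2 = &ptr1;
    int **ptr3 = ptr2;
    int ***ptr4 = &ptr3;
    ***ptr4 = 3;
}

/* Hidden: the pointer is kept in a pointer-sized integer. Converting via
 * void * makes the round trip well defined where uintptr_t exists. */
static void hidden_level(int *var) {
    uintptr_t hidden = (uintptr_t)(void *)var;
    *(int *)(void *)hidden = 4;
}
```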

Handling of Function Pointers

When analyzing source code that makes calls through function pointers, these reports track all assignments to each function pointer throughout the whole program. They then assume that any call through a function pointer could reach any one of the functions found by this global analysis. This will most likely make the reports broader than the program's actual behavior. For example, if function pointer variable fp is set to func1 and called in one task, then set to func2 and called in another task, these reports assume that both func1 and func2 are called in both tasks. Hence, any global variables set in either func1 or func2 will show up under both tasks. This generalization of calls through function pointers applies to all reports where function pointers play a role:
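
The example from the paragraph above, written out as a sketch. The globals `count` and `flags`, the task functions and the bit-flag table are hypothetical. At run time `taskA` reaches only `func1` and `taskB` only `func2`, while `assumed_writes_of_fp_call` models the conservative rule: the globals a call through `fp` may write are the union over every function ever assigned to `fp`.

```c
#include <assert.h>

static int count;   /* written only by func1 */
static int flags;   /* written only by func2 */

static void func1(void) { count++; }
static void func2(void) { flags = 1; }

/* Shared function pointer, assigned differently in each task. */
static void (*fp)(void);

static void taskA(void) { fp = func1; fp(); }  /* actually calls only func1 */
static void taskB(void) { fp = func2; fp(); }  /* actually calls only func2 */

/* Hypothetical model of the conservative rule (not Imagix 4D code). */
enum { WRITES_COUNT = 1, WRITES_FLAGS = 2 };
static const unsigned fp_target_writes[] = { WRITES_COUNT /* func1 */,
                                             WRITES_FLAGS /* func2 */ };

/* Same answer for a call in taskA or taskB: any target may be reached. */
static unsigned assumed_writes_of_fp_call(void) {
    unsigned w = 0;
    for (unsigned i = 0; i < sizeof fp_target_writes / sizeof fp_target_writes[0]; i++)
        w |= fp_target_writes[i];
    return w;
}
```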

Unused Variables -- no limitations
Uninitialized Variables Read -- the reading of variables might be exaggerated
Useless Assignments -- some useless assignments might not be recognized
Variable Dependencies -- extraneous calculation steps might be listed
Variable Flow Between Tasks -- the use of variables in certain tasks might be exaggerated
Variables Set in Multiple Tasks -- extraneous variables or extraneous locations might be reported
Out of Step (Z) Variables -- extraneous out-of-step variables might be reported
Reentrant Functions -- extraneous functions might be recorded as reentrant
Functions Not Used in Tasks -- unused functions might be under reported
Mismatched Critical Regions -- extraneous mismatches might be reported
Calls in Critical Regions -- extraneous called functions might be reported
Events Transitions Between Tasks -- extraneous wait fors, posts and clears might be reported
Event Calls in Tasks -- extraneous wait fors, posts and clears might be reported

Note that, with the exception of the Useless Assignments and Functions Not Used in Tasks reports, these extraneous instances are false positives: the reports may over-report, but they do not miss any true positives.

In tracking calls through function pointers, the source code analyzer by default records only assignments to pointers that have a function pointer type. Some applications also use "void *" pointers to store function pointers. In such cases, add the source analyzer option -voidfptr so that these assignments are recorded and thereby made known to the data flow analysis.
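
A sketch of the kind of code this option is meant for; `on_event`, `callback`, `install` and `fire` are hypothetical names. The function address is stored in a `void *` rather than a function-pointer type, so, per the default behavior described above, the assignment in `install` would not be recorded unless -voidfptr is added. Converting between function pointers and `void *` is not guaranteed by ISO C, although POSIX systems and common compilers support it.

```c
#include <assert.h>

static int handled;
static void on_event(void) { handled = 1; }

/* Function address held in a void *, not in a function-pointer type. */
static void *callback;

static void install(void) { callback = (void *)on_event; }

/* Cast back to the original function type before calling. */
static void fire(void) { ((void (*)(void))callback)(); }
```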

Analysis Messages

Some of the Flow Check reports may contain the following warning:

Statement or partial statement in file.x line xx not reachable and eliminated from analysis
These warnings do not indicate a problem with the data flow analysis. They simply flag any lines of code which are not reachable, in case that is an unintended situation. The unreachable lines are also the focus of the Unreachable Statements report, and occur in the following cases:
  • A condition statically evaluates to true or false and there is code in the other branch
  • Any code occurs after control transfers such as goto, break or continue

The warning message would be issued for the indicated lines in this example:

#define TRUE 1
#define TEST TRUE

void not_reached_due_to_condition(void);
void occurs_after_control_transfer(void);

void funcA(int condition1, int condition2) {
    if (TEST) {
    } else {
        not_reached_due_to_condition();  /* flagged as unreachable */
    }
    while (condition1) {
        if (condition2)
            break;
        else
            continue;
        occurs_after_control_transfer();  /* flagged as unreachable */
    }
}

Memory Requirements

Because they analyze data flow both within each function's control logic and across function boundaries, the Flow Check reports are very compute-intensive. Both time and memory consumption can be large, even for medium-sized programs.

Memory requirements can exceed the 2 or 4 GB limit of a 32-bit operating system. To handle larger projects, it is recommended that you use the 64-bit version of Imagix 4D on a 64-bit operating system and assign 16 GB of virtual memory, with at least 8 GB and preferably 16 GB of physical memory. In addition, the Imagix 4D data flow engine includes its own virtual memory system, which pages memory used for the data flow analysis to a hard drive. This is described in more detail in Project Resources.