( transcript for https://www.imagix.com/apps/Learning_Unfamiliar_Code.mp4 )

Learning Unfamiliar Source Code - Reverse Engineering C, C++ and Java

[0:00] This video is going to demonstrate how to use Imagix 4D to get up to speed on code that's unfamiliar. Overall, Imagix 4D supports C, C++ and Java source code. For the demo, we're going to use a project containing C and C++. As you can see, in this project, there are 50 C++ files, another 36 C files and 83 header files. Altogether, there are 19000 total lines in these files, 9000 of which contain source code. So it's quite a small project, and will give us a good chance to focus on the key features of Imagix 4D. Also, this project ships with Imagix 4D, so once you're done watching this demo, you'll be able to download Imagix 4D and explore these concepts further by yourself.

[0:50] Now Imagix 4D supports analysis and visualization of software from very high to very low levels of detail. At the lowest level, you can use the tool's flow charts to unravel the program's if-then-else logic inside of individual functions. Since we'll be working with unfamiliar code, we're going to start at the other end with the highest level views. These are provided by Imagix 4D's Subsystem Architecture diagrams. The diagrams are based on the Subsystem Interface Dependency View, described by Jeff Garland and Richard Anthony in their book Large-Scale Software Architecture: A Practical Guide Using UML. The default Subsystem Architecture diagrams reveal the as-built architecture, based on the directory or namespace packaging of your code. To generate an as-built architecture, we'll use the architecture wizard. Because this code does not use namespaces, the wizard automatically defaults to directory structure as the starting point. From there, we'll tell it to include both functions and variables in the architecture...
And then we give the architecture a name, so that we can distinguish it from other architectures we might create to analyze this code.

[2:10] In the resulting diagram, you can see a series of names and shapes representing the subsystems that make up the architecture. The layout of the diagram is very meaningful. A set of subsystems being drawn inside another subsystem indicates that the outer subsystem contains all of the inner subsystems. So for example, the 'include' subsystem consists of the 'hardware', 'model', 'types.h', 'utility' and 'host' subsystems. The relative placement of the subsystems is also meaningful. In the default layout, the flow of control is always downwards or sideways, never upwards. So this indicates that the functions in 'hardware' might call the functions and set or read the variables in 'model', but not vice versa. By displaying the relationships we can see the actual dependencies, and see that the arrows point sideways and down, but not up. But for now, to keep the diagram simpler, we'll turn off the relationships and focus on the layers of the architecture within the diagram. If we simplify this further, we can see that there are two parallel layers, 'include' and 'src', that are built on top of 'os'... Returning to three levels, we can see more detail. As we explore the architecture, we can expand out specific subsystems. For example, clicking on the icon in the upper right of 'hardware' expands that subsystem. And we can do the same with 'base'.

[3:48] The colors in the diagram are meaningful. The reddish subsystems represent directories. The yellowish subsystems represent files. And if we expand a subsystem that represents a file, by again clicking on the right-hand corner icon, we can see its contents. The blue rectangles are functions and the green trapezoids represent variables. In addition, there's another subsystem, as you can see from the icon. It's a purplish color, indicating that it represents a class.
If we expand it, we can see the functions and variables that it contains. Whatever the degree of expansion, the layout continues to represent the control flow layering. So the functions at the top of this subsystem might call the lower functions or use the lower variables, but not the opposite. As the level of detail increases, the overall diagram can grow beyond the size of the window. At this point, the Mapper tab becomes helpful. The Mapper shows the full diagram, and uses a black rectangle to indicate the portion of the diagram currently visible. By dragging the rectangle around the Mapper tab, we can control which portion of the overall diagram is visible.

[5:00] To start to study the software, let's return to the simpler 3 level display... One of the challenges is to determine the significance of the various subsystems. The names and layout give us some indication, but we can use the subsystem metrics to get a further sense about the subsystems. These can be displayed within the diagram to provide insight. Let's open the Metrics dialog... Imagix 4D generates a set of metrics about each subsystem. These span a number of attributes. The size metrics give a sense of the scope of a given subsystem. The complexity metrics indicate the amount of computation done in a subsystem. There are metrics for structure that give an indication of the goodness of the architecture. Fan-in metrics show how widely a subsystem is used. And fan-out metrics provide a measure of the external dependencies of a subsystem.

[5:58] Let's start by examining some size metrics. Here, we've enabled the lines of source code. Once we select a given metric, the subsystems are colored to represent that metric. In this case, we can see that the range of metric values is 27 to 2600. This is the number of lines containing source code in the function definitions in each subsystem. So we can see that 'hardware', being red, contains the most source code.
Most of the other subsystems are green or near green, indicating that they have much less source code. Notice that several of the subsystems are blue. This indicates that they don't contain any function definitions. Let's examine another metric, Class Definitions... Here we can see that 'hardware' once again is significantly larger than the other subsystems... Further exploration shows that the 'model' subsystem contains most of the variable declarations... Switching to total complexity, we can see that 'hardware' contains significantly more of the calculations than the other subsystems... But switching to average complexity shows that the functions in 'utility' are somewhat more complex than those of the other subsystems... And switching to the maximum complexity shows that the most complex function is within 'hardware'. But because the maximum value is only 26, we can conclude that the calculations in the code are fairly well partitioned.

[7:30] We can combine the diagram expansion with the metrics displays to learn more. Let's go back to the Lines of Source Code metric... Once we see that 'hardware' contains the most source code, we can expand the 'hardware' subsystem. Doing so reveals that the 'base' subsystem is its largest subcomponent. By bringing back the Metrics dialog, we can see that there are only 788 lines in this subsystem... Let's expand 'base'. And now we can see that each of the displayed subsystems is relatively small compared to 'utility', which has 681 lines. Once we've got a sense of the scope of the subsystems and where they lie within the overall layering of the software, we might want to study certain aspects of specific subsystems. Let's go back to the 3 level diagram... And let's turn off metric coloring.

[8:30] Now let's use the Analyzer feature to learn more about the 'hardware' subsystem. We can focus the Analyzer on 'hardware' through the right mouse button pop-up menu... This brings up the Analyzer tab and focuses it on 'hardware'.
The Analyzer tab can be used to examine any symbol in the project - any subsystem, any file, any class, any function, etc. The resulting Analyzer tab will display a set of queries appropriate to the type of symbol being studied. For subsystems, the first queries examine the internal hierarchy of the subsystem. Let's create a new graph showing the internal function calls of 'hardware'... We can see that the resulting graph is fairly complex... And if we switch to the Mapper, we can see that the graph extends well beyond what we're currently viewing. So this graph is not too useful for examining the subsystem. Let's switch to the file level abstraction. Here, we're seeing all of the files in 'hardware', and the calls that they make to each other -- meaning that a function in one file calls a function in another file. This starts to contain a manageable amount of information, and we can get a sense of what files are related to what other files.

[9:50] Because this project includes C++ code, let's look at a similar graph of class interactions. Here again we can see which classes are related to which other classes, and get a sense of how those relationships are clustered. We can drill down further to understand specific class-to-class relationships. For example, let's right click on the relationship line between 'bEncodedMotor' and 'bDCmotor'... The diagram shows the function calls that occur between the classes, and the functions that are involved in those. We can expand the Class Diagram to show all members of the two classes... And we can expand it further to display inter-class variable usage as well.

[10:48] We can drill down into further detail through the Symbol panel. By clicking on a specific function, we can use the Symbol panel to learn more about the function. Here we see cross reference information for 'brake', and usage information, and we can also look at source code, examine metrics, etc.
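
( illustration, not part of the video: the file level abstraction described above can be thought of as collapsing function-to-function calls into file-to-file edges. The sketch below is a minimal Python illustration of that idea only. It is not how Imagix 4D is implemented, and every function name, file name and call in it is hypothetical, chosen purely for the example. )

```python
from collections import defaultdict


def lift_calls(calls, owner):
    """Collapse function-level call edges into container-level edges.

    calls: iterable of (caller, callee) function names.
    owner: dict mapping each function name to its container
           (a file here, but a class or subsystem works the same way).
    Returns a dict mapping (source_container, target_container) to the
    number of underlying function calls. Calls that stay inside one
    container are dropped, since they are not visible at this level.
    """
    edges = defaultdict(int)
    for caller, callee in calls:
        src, dst = owner[caller], owner[callee]
        if src != dst:
            edges[(src, dst)] += 1
    return dict(edges)


# Hypothetical call data, invented for this example.
calls = [
    ("start", "spin"),
    ("start", "log"),
    ("spin", "log"),
    ("spin", "clamp"),  # both functions live in motor.c, so this is dropped
    ("stop", "spin"),
]
owner = {
    "start": "control.c",
    "stop": "control.c",
    "spin": "motor.c",
    "clamp": "motor.c",
    "log": "util.c",
}

file_edges = lift_calls(calls, owner)
for (src, dst), count in sorted(file_edges.items()):
    print(f"{src} -> {dst} ({count} call(s))")
```

Five function-level calls reduce to three file-level edges in this toy data, which is the same kind of reduction that makes the file level graph in the demo easier to read than the full function call graph.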
Another of the Analyzer categories is Use of Members, which shows function calls into the 'hardware' subsystem. The "external funcs calling internal funcs" query creates a very small graph, showing two handler functions calling 'qstat'. This agrees with what we would expect, given that 'hardware' is a top level subsystem. The calls must have been made from a parallel subsystem. Running the "subsystem diagram" query confirms this... Turning on the relationships display adds more detail.

[11:48] Here we can use the Dependencies category to study the external dependencies of the 'hardware' subsystem. Let's start with "external functions called by internal functions". In the resulting display, we see two columns. By scrolling to the top, we can see that the left column lists the functions in 'hardware', and the right column shows called functions that are outside of 'hardware'. The resulting graph is small enough to be studied, but let's look at some other ways to examine subsystem dependency information. Selecting "subsystem diagram" generates a diagram showing the same functions, but now organizes them according to where they are in the architecture. Again, the Mapper tab provides valuable context and navigation. We can use Imagix 4D's graphical querying features to simplify the display and study this further. For example, let's assume that we want to study a specific interface, say between 'hardware' and the 'model' subsystem. We could simply select 'model' by left clicking on it... We could then find the functions in 'hardware' that directly depend on 'model' through the Traverse > Step Up menu. The functions that get highlighted in blue are the ones that directly call into 'model'.

[13:09] The Analyzer tab offers another way to accomplish this same analysis. Let's go back to the simpler display, and focus the Analyzer tab on 'model'... Now, the final category on the Analyzer tab supports analysis of relationships between two subsystems.
And it defaults to the previous subsystem that was the focus, in this case meaning 'hardware'. By selecting "interface functions of the two subsystems", we can see just those functions involved in calls between 'hardware' and 'model'... By changing to an architecture view, we can see these functions within context. And by turning on the relationship display, we can see the specific calls that make up the 'hardware' to 'model' interface.

[14:04] With Imagix 4D, there are typically many ways to pursue an analysis, and this is no exception. I'm going to show one more way to do this. Let's go back to the simple 3 level architecture, and turn on relationships. Now let's right click on the specific relationship that we're interested in -- 'hardware' to 'model'. In the resulting pop-up menu, we can choose a graph of the function call relationships. And the resulting diagram once again shows the interface specifics between 'hardware' and 'model'. Of course, at this point, we could continue to examine specific subsystems and interfaces, learning as we explored.

[14:49] But let's wrap up this demo and summarize what we've seen. First, the Subsystem Architecture diagrams provide the best high level overview of the composition and layering of the software. Second, displaying the subsystem metrics within the diagram aids in understanding the contents of the various subsystems. And finally, from this starting point, the Analyzer feature and Imagix 4D's drill down capabilities enable further exploration of the code to whatever extent is desired. Please download Imagix 4D to explore further how you can use it to get up to speed on unfamiliar code.
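
( illustration, not part of the video: the two interface analyses shown in the demo, the functions in one subsystem that directly call into another, and the functions involved in calls between two subsystems, can be sketched in a few lines of Python. This is only a rough model of the idea, not Imagix 4D's implementation. The subsystem names 'hardware', 'model' and 'utility' come from the demo, but the function names and the calls between them are hypothetical. )

```python
def direct_callers_into(calls, owner, src, dst):
    """Functions in subsystem `src` that directly call a function in `dst`."""
    return {
        caller
        for caller, callee in calls
        if owner[caller] == src and owner[callee] == dst
    }


def interface_functions(calls, owner, a, b):
    """Functions on either end of a call that crosses between `a` and `b`.

    Assumes `a` and `b` are two different subsystems.
    """
    involved = set()
    for caller, callee in calls:
        if {owner[caller], owner[callee]} == {a, b}:
            involved.update((caller, callee))
    return involved


# Hypothetical functions and calls, invented for this example.
calls = [
    ("read_sensor", "get_state"),  # hardware -> model
    ("drive", "set_speed"),        # hardware -> model
    ("drive", "read_sensor"),      # stays inside hardware
    ("get_state", "fmt"),          # model -> utility
]
owner = {
    "read_sensor": "hardware",
    "drive": "hardware",
    "get_state": "model",
    "set_speed": "model",
    "fmt": "utility",
}

callers = direct_callers_into(calls, owner, "hardware", "model")
interface = interface_functions(calls, owner, "hardware", "model")
print(sorted(callers))
print(sorted(interface))
```

The first function loosely corresponds to selecting 'model' and stepping up to its direct callers in 'hardware'. The second loosely corresponds to the "interface functions of the two subsystems" query.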