Step 2 - Extract from Makelog Approach

Under the Extract from Makelog approach, Imagix 4D pulls information from a log of the command line invocations issued for your compiler as your build system compiles your source code. A subset of the commands to the compiler, in particular the names of the source files and their associated -I, -D and -U arguments, is used by Imagix 4D's source analyzer to analyze your code.

Imagix 4D contains specialized support for working with Microsoft Visual Studio and the Microsoft Build tools. Instructions specific to this support are flagged with MSBuild in the steps below.

There are two alternatives for creating such logfiles:

Command Capture: You may be able to generate a logfile directly as part of your build process. For example, as you run your make system, you might redirect the commands it echoes into a logfile. Or you may be able to set your compiler to run in verbose mode and to capture the echoed compiler commands to a logfile. Some build systems automatically make a record of the compilation directives, such as a JSON Compilation Database.

Compiler Monitor: Alternatively, you can use Imagix 4D's imagix-cc-monitor utility to record invocations of the compiler. The utility monitors system level processes. By setting the utility to identify the compiler(s) that you're using with your source code, all the compiler invocations can be recognized and captured to a logfile. See Appendix C for more detail about running imagix-cc-monitor.

Once you have correctly configured the Extract from Makelog settings and successfully processed your makelog, the processing results are stored and you're able to direct Imagix 4D to use those extracted results to analyze your source code.

2a. Generate a logfile that contains the compiler commands

Note: This step 2a applies if you're using the Command Capture alternative. For the Compiler Monitor alternative, see Appendix C for details about running the imagix-cc-monitor utility; the resulting logfile will contain all the necessary data.

MSBuild: See Microsoft Build logfiles for detailed instructions for generating the necessary makelog while you build your code. The resulting logfile will contain all the necessary data.

The Extract from Makelog approach requires that you have a logfile containing the commands your build system issues as it compiles your source code. The most important of these commands are those issued to your compiler. Also required are commands causing a change of directory, so that absolute paths can be accurately calculated from any relative paths passed to the compiler.

The actual process involved in creating such a logfile varies significantly between build environments. Some build environments explicitly create a specific build record, such as a JSON Compilation Database or particular formats of a Ninja buildlog. The Extract from Makelog approach has filters for dealing with those directly.

Otherwise, because the specifics of capturing compile commands differ greatly between build systems, only the general approach is presented here. When make is run, each command line required for a build is echoed to standard output and then executed. The general approach for generating a logfile is to invoke the make that compiles your code and to capture or redirect the echoed commands into a file. If you are currently running make with -silent or -quiet type options, you will need to disable those so that the make commands are echoed and can be logged. There may be additional options you need to add to cause your build system to echo the commands it is executing.
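As one possible way to capture the echoed commands, here is a minimal Python sketch that runs a build command and redirects everything it prints into a logfile. The function name capture_build_log and the file name make.log are hypothetical, and the build command you pass in depends entirely on your own build system.

```python
import subprocess


def capture_build_log(command, logfile_path, cwd=None):
    """Run a build command and write everything it echoes to a logfile.

    command      -- the build command as a list, for example ["make"]
    logfile_path -- where to write the captured output
    cwd          -- optional directory in which to run the build
    Returns the exit status of the build command.
    """
    with open(logfile_path, "wb") as log:
        # Merge stderr into stdout so that echoed commands and any
        # directory messages keep their original order in the logfile.
        result = subprocess.run(
            command,
            cwd=cwd,
            stdout=log,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return result.returncode


# Hypothetical usage, assuming a plain make-based build:
#     capture_build_log(["make"], "make.log")
```

A shell redirection of the build output achieves the same thing; the sketch only makes the capture step explicit.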

Once you've created the logfile, you'll need to confirm that it contains all of the elements required by Imagix 4D to analyze your code. The logfile will need to specify all of the source files being compiled. For each of the compile commands, there will need to be an indication of the include directories and macro definitions used.

Here's a simplification of the essential elements required in a build log's compile command.

compiler -I../dir1 -Dmacrodef sourcefile.c

In the following small sample, as it might actually appear in a logfile, each of the essential elements exists.

make[2]: Entering directory `/home/fredn/projects/uclib/armv-gnueabi/drivers/power'
/usr/bin/gcc -g -Os -fno-strict-aliasing -fno-common -ffixed-r8 -msoft-float
    -D__KERNEL__ -DTEXT_BASE=0x80e80000 -I/home/fredn/armv-gnueabi/u-boot/include
    -isystem /home/fredn/armv-gnueabi/toolchain/lib/gcc/armv-gnueabi/4.4.5/include
    -fno-builtin -nostdinc -DCONFIG_ARM -D__ARM__ -march=armv5 -mno-thumb-interwork
    -Wall -Wstrict-prototypes -fno-stack-protector -o iex_power.o iex_power.c -c
make[2]: Leaving directory `/home/fredn/projects/uclib/armv-gnueabi/drivers/power'

You can see that buried within a lot of additional options, this sample contains all of the essential elements. The actual compilation command occurs on the second line, identified by the leading /usr/bin/gcc. There is a source file iex_power.c specified, as well as some -I include directory options and some -D macro definition options.

The first and third lines are also important, as they indicate where the source file is located. When the compilation command uses relative rather than absolute paths for either the source file or the include directories, knowledge of the current working directory is necessary in order to derive the actual directory location.
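To make the essential elements concrete, the following Python sketch pulls the -I, -D and -U flags and the source file names out of a single compile command. It illustrates the idea only and is not Imagix 4D's implementation; the function name, the suffix list and the set of options that take an argument are all assumptions chosen for this example.

```python
import shlex

# Assumed values for illustration only.
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx")
OPTIONS_WITH_ARGUMENT = {"-o", "-isystem", "-include"}


def extract_compile_info(command_line):
    """Return the compiler, the -I/-D/-U flags and the source files."""
    tokens = shlex.split(command_line)
    info = {"compiler": tokens[0], "flags": [], "sources": []}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-I", "-D", "-U") and i + 1 < len(tokens):
            # Detached form such as "-I dir": join the flag and its argument.
            info["flags"].append(tok + tokens[i + 1])
            i += 2
            continue
        if tok[:2] in ("-I", "-D", "-U"):
            info["flags"].append(tok)
        elif tok in OPTIONS_WITH_ARGUMENT:
            # Skip the option together with its argument.
            i += 2
            continue
        elif tok.endswith(SOURCE_SUFFIXES):
            info["sources"].append(tok)
        i += 1
    return info
```

Applied to a one-line version of the sample compile command, this would report iex_power.c as the only source file, because the argument of -o is skipped and iex_power.o does not carry a source suffix.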

Typically, the logfile you capture contains a large number of commands generated by the make system. The following Extract From Makelog process acts as a filter, extracting the information about source files and their related options, along with the directory context. The results of this extraction are stored in a file, and that file is referenced by Imagix 4D's source analyzer when it later analyzes your source code.

2b. Specify that you're adding a new data source

Open the Data Sources dialog (Project > Data Sources). To specify that you wish to add a new data source into the project rather than modify the settings for an existing data source, choose `+ add a data source' under the Data Sources selector on the left side of the dialog.

2c. Select the Extract from Makelog approach

For the rest of step 2, you'll be working on the right side of the Data Sources dialog. At the top, select [C/C++ Source Files][Make Based][Extract from Makelog] in the menubutton labeled `Select Data Source Type'.

For MSBuild: Select [C/C++ Source Files][Microsoft Visual Studio][MSBuild Log] instead.

2d. Specify the name and starting directory of your logfile

In the `Makelog file' field of the Makelog tab, enter the full path/file name of your logfile.

Your makelog may include references to relative path locations, for source files, include directories or even change directory commands. When processing the makelog, Imagix 4D needs to know what directory these paths are relative to. Sometimes, when the makelog is initially generated by your system, the paths are relative to the original location of the makelog. However, that may not have been the case, or you might have moved the makelog. So the `Starting directory' field enables you to specify the initial directory for any relative path names.

For MSBuild: You'll need to specify a logfile for each project in your Microsoft solution. Often, these logfiles are written to the same directory. To make specifying these logfiles easier, there are separate fields for specifying the directory and for specifying the logfiles themselves. In the `Log files' field, you can enter multiple file names, and use the * character for glob-style matching of file names.

2e. Specify any additional compiler and source analyzer options

Normally, you will leave the Options field empty. All of the -I, -D and -U compiler flags that appear in the logfile will be automatically extracted. However, if your logfile is missing some of the compiler flags you would like to use while analyzing your source code in Imagix 4D, you can add them here. You can also add any Imagix 4D source analyzer options you want to apply. (See Analyzer Syntax and Options for more info).

2f. Select your compiler configuration file

In the Compiler & Target combobox, select the compiler configuration file that you set up in step 1. If you haven't yet configured a compiler configuration file for the compiler and target platform of your software, strongly consider doing so now.

In step 2g, you'll see that you have the ability to extract the compiler directives for multiple compilers. If you do specify multiple compilers, it's possible that the names you specify are different names for the same compiler, such as gcc and g++. In this case, a single compiler configuration file can support all of the source files you are loading. If you're not specifying multiple compilers, or if the compilers you're specifying are equivalent and are supported by the same compiler configuration file, you're done with step 2f at this point.

However, your software build may incorporate multiple different compilers, and different compiler configuration files may therefore be appropriate for different source files within the build. There are two ways to address this.

The first is to add multiple Extract from Makelog data sources. For each data source, use the same logfile but specify a different compiler along with its appropriate compiler configuration file. The resulting project will span all of the source files compiled by each of the compilers / data sources you specify.

The second approach is to create a multi-compiler configuration file that uses #include directives to redirect to the appropriate underlying configuration file. When processing the logfile, the Extract from Makelog process records the -D flags used in compiling a given source file. When multiple compilers are specified, the Extract from Makelog process also generates an additional -D flag indicating the compiler used for that source file. The macro that is defined has the form __IMGXCOMPILER_compiler_id, where compiler_id is based on the name for a specific compiler as specified in the Compiler command field. The compiler_id string is generated by starting with a lower case version of the compiler name, and then substituting an underscore (_) for any non-alphanumeric characters in the compiler name.

You can define a simple, multi-compiler configuration file using these -D flags. The following example shows the entire contents of a new compiler configuration file that you could create to support a software build that uses both the GNU g++/gcc and the Microsoft cl compilers. The .inc file that you would create (see step 1) and then specify here would consist of just a few lines, and would redirect to compiler-specific .inc files:

#if defined(__IMGXCOMPILER_gcc) || defined(__IMGXCOMPILER_g__)
    #include "gnu_mingw32.inc"
#elif defined(__IMGXCOMPILER_cl)
    #include "msvc_win.inc"
#endif
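The macro names tested in a configuration file like this follow the compiler_id rule described above: lower-case the compiler name, then replace every non-alphanumeric character with an underscore. A small Python sketch of that rule is shown here so you can predict the macro name for your own compiler command; the function name compiler_macro is hypothetical, and the sketch assumes the name is passed in exactly as it appears in the Compiler command field.

```python
import re


def compiler_macro(compiler_name):
    """Derive the __IMGXCOMPILER_ macro name for a compiler command name."""
    # Lower-case first, then replace anything that is not a-z or 0-9.
    compiler_id = re.sub(r"[^a-z0-9]", "_", compiler_name.lower())
    return "__IMGXCOMPILER_" + compiler_id
```

Under this rule, g++ maps to __IMGXCOMPILER_g__, which is why the example file tests that macro alongside __IMGXCOMPILER_gcc.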

2g. Review and modify the processing rules

On the Processing Rules tab, you will find a series of rules controlling how your logfile is analyzed and how information is extracted. By reviewing the rules side-by-side with your logfile, you'll be able to specify rule settings that are compatible with the format of the logfile.

For MSBuild: Most of the settings in the Processing Rules tab are set automatically, and only a few of these appear.

Note that this is the beginning of an iterative process. In step 2i, you will examine the results of processing the logfile using these rules, and then return to this step to refine your rules. You will be able to repeat this until you are satisfied with your results.

The Processing Rules tab is divided into a number of sections.

When each compiler command is processed, those tokens in the command line that end in one of the specified suffixes are considered to potentially be the name of a source file. Header file suffixes do not need to be specified; like a compiler, Imagix 4D analyzes the source files, and then pulls in whichever header files are included by the source files themselves.

In processing the logfile, the filter needs to identify which lines refer to compiler commands. The settings displayed here would cause the second line in the small sample above to be identified as a compiler command. This is because the /usr/bin/gcc (ignoring its path) at the beginning of the second line matches one of the compiler commands listed here.

If the name of the compiler in the logfile has a suffix, such as cl.exe, you'll need to include the suffix on the compiler command listed here. The two setting alternatives provide additional support for identifying compiler command lines in more complex logfiles. If you do enable multiple compilers, note the impact that this has on step 2f.

The actual format of a compiler command can vary widely depending upon the make system being used and the compiler being invoked. A number of settings are available to configure how the filter analyzes the logfile line once it has been identified as a compiler command.

Many Windows compilers use "/I directory" and "/D macro" rather than "-Idirectory" and "-Dmacro" style options. The second Format choice enables such compiler command options to be extracted from the logfile and converted to the equivalent Imagix 4D option.
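The conversion between the two option styles can be pictured with the following Python sketch, which rewrites "/I directory" and "/D macro" tokens into the "-Idirectory" and "-Dmacro" form. It is an illustration under stated assumptions rather than the filter's actual logic: the function name is hypothetical, and the sketch assumes a Windows-style command line in which no file path begins with /I or /D.

```python
def normalize_windows_options(tokens):
    """Rewrite /I and /D style options into -I and -D style options."""
    result = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("/I", "/D") and i + 1 < len(tokens):
            # Detached form: "/I directory" becomes "-Idirectory".
            result.append("-" + tok[1] + tokens[i + 1])
            i += 2
        elif tok[:2] in ("/I", "/D") and len(tok) > 2:
            # Attached form: "/Dmacro" becomes "-Dmacro".
            result.append("-" + tok[1:])
            i += 1
        else:
            # Everything else is passed through unchanged.
            result.append(tok)
            i += 1
    return result
```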

The filter processes each token in the compiler command line. By knowing which options take arguments, the filter is able to skip over a meaningless argument and to avoid the further steps of determining whether it is another option or the name of a source file.

Make systems and compilers also differ in how they interpret and communicate quotation marks in the value term of a -D macro definition. The Imagix 4D source analyzer will include any quotation marks to the right of the = as part of the string substitution; if this is not appropriate, the excess quotation marks can be eliminated.
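As an illustration of what eliminating excess quotation marks might look like, this Python sketch removes one layer of matching quotes around the value of a -D definition. The function name is hypothetical, and the choice to strip exactly one surrounding pair is an assumption made for the example, not a statement of how the Imagix 4D setting behaves.

```python
def strip_macro_quotes(flag):
    """Remove one pair of quotes surrounding the value of a -D flag."""
    if not flag.startswith("-D") or "=" not in flag:
        return flag
    name, value = flag.split("=", 1)
    # Only strip when the value both starts and ends with the same quote.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return name + "=" + value
```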

Note: The Logfile Processing settings apply if you're using the Command Capture alternative. If your logfile was generated through the Compiler Monitor alternative, the logfile format is known, the directory location is recorded and so the settings are ignored.

Additional settings are available for general handling of the logfile, to configure the Imagix 4D processor to support a number of somewhat unique formats generated by different build systems.

The Filter for preprocessing makelog, if specified, is the most significant of these. The selections available for this refer to .flt files existing in the ../imagix/user/data_srcs directory. For example, JSON-Bear_CompDatabase.flt transforms the compiler commands from a JSON Compilation Database into a format supported by the Imagix 4D makelog analyzer. This includes combining the individual macro definitions, include directories and source file entry into a single line, as required by the Imagix 4D analyzer. Another example is Ninja-Soong_Buildlog.flt, which supports the command line output of the Soong build system used by the Android Open Source Project (AOSP), and contains options to limit which AOSP directories to load into the Imagix 4D project.

These preprocessing filters actually generate intermediate files that are then analyzed by the Imagix 4D processor. These intermediate files are written to the directory ../project.4D/datasources, and can be viewed there. See ../imagix/user/data_srcs/data_srcs_inventory.txt for more detail.
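To give a feel for what such a preprocessing step involves, the following Python sketch flattens a JSON Compilation Database into makelog-style lines, one compile command per line with its directory context. This is not the contents of JSON-Bear_CompDatabase.flt; the function name and the exact output layout are assumptions. The entry keys used (directory, command, arguments) are those of the standard JSON Compilation Database format.

```python
import json
import shlex


def flatten_compilation_database(json_text):
    """Turn compilation database entries into makelog-style lines."""
    lines = []
    for entry in json.loads(json_text):
        if "arguments" in entry:
            # The argument list form: re-join into a single command line.
            command = " ".join(shlex.quote(a) for a in entry["arguments"])
        else:
            # The single-string form is already one command line.
            command = entry["command"]
        # Assumed layout: bracket each command with its working directory.
        lines.append("Entering directory `%s'" % entry["directory"])
        lines.append(command)
        lines.append("Leaving directory `%s'" % entry["directory"])
    return lines
```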

The remaining settings in the Logfile Processing section are used to configure how the Imagix 4D processor directly analyzes the logfile.

Entire lines can be skipped if they start with a specified token. In logfiles whose long lines have been split via line returns, the broken lines can be concatenated back together without the need for manual intervention. In logfiles where a single physical line is made up of two or more logical lines joined by "&&", those logical lines can be analyzed as separate lines during the processing.
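The line joining and splitting described here can be sketched in Python as follows. The sketch assumes that a broken line is recognizable because its continuation starts with whitespace, as in the small sample in step 2a; your logfile may mark continuations differently. The function name is hypothetical, and the split on "&&" is deliberately naive, so it would also split an "&&" that appears inside a quoted string.

```python
def logical_lines(physical_lines):
    """Rejoin indented continuation lines, then split lines on "&&"."""
    joined = []
    for raw in physical_lines:
        line = raw.rstrip("\n")
        if joined and line[:1] in (" ", "\t"):
            # Assumed convention: an indented line continues the previous one.
            joined[-1] = joined[-1] + " " + line.strip()
        else:
            joined.append(line.strip())
    result = []
    for line in joined:
        # Treat each "&&"-separated part as its own logical line.
        result.extend(part.strip() for part in line.split("&&") if part.strip())
    return result
```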

Often, the source files and -I include directory options in the logfiles use relative path names. In such cases, the logfile analysis needs to track the directory location associated with each compiler command. The filter treats logfile lines containing "Entering directory", "Leaving directory", "START" and "END" as indicating a change in directory location. Additional change directory commands can also be specified, as "cd" is here. The settings also provide control over how those change directory commands are processed.

In this directory tracking, the directory location where the logfile exists is considered to be the starting directory. You might choose to manually prepend a change directory command, such as "Entering directory c:\starting\dir", to the beginning of the logfile to make the starting directory explicit.
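The directory tracking can be pictured with this Python sketch, which pairs each remaining logfile line with the directory in effect when it was issued. It shows one possible policy only: "Entering directory" pushes a directory, "Leaving directory" pops it, and "cd" replaces the current one. The function name is hypothetical, the "START" and "END" markers are omitted, and POSIX-style paths are assumed.

```python
import posixpath
import re

# Matches the directory named after "Entering directory", with or
# without surrounding quote characters.
ENTER = re.compile(r"Entering directory [`'\"]?([^`'\"]+)")


def track_directories(lines, starting_directory):
    """Return (directory, command) pairs for the non-directory lines."""
    stack = [starting_directory]
    resolved = []
    for line in lines:
        match = ENTER.search(line)
        if match:
            target = match.group(1).strip()
            # A relative target is resolved against the current directory.
            stack.append(posixpath.normpath(posixpath.join(stack[-1], target)))
        elif "Leaving directory" in line:
            if len(stack) > 1:
                stack.pop()
        elif line.startswith("cd "):
            target = line[3:].strip()
            stack[-1] = posixpath.normpath(posixpath.join(stack[-1], target))
        elif line.strip():
            resolved.append((stack[-1], line.strip()))
    return resolved
```

With the directory known for each command, a relative option such as -I../inc can then be resolved to an absolute path.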

When it encounters an option that it doesn't recognize, Imagix 4D's source analyzer converts it into a macro definition. For example, the option "-opt" gets converted to "-D__IMGX_opt__". The resulting macro definition can be used to control logic in the compiler configuration files. A filter setting enables you to control whether such unrecognized options get passed to the source analyzer.

While a makelog might contain multiple entries for the same source file, an Imagix 4D project would normally analyze each source file once. However, you might have a situation where a single source file is compiled multiple times with different macro definitions, and you want the project to capture the analysis with each set of flags. The processor can be configured to support this, in which case you would typically also enable `generate data for all conditional paths taken' on the Options tab. This setting is ignored if you're using a logfile from the imagix-cc-monitor, where duplicate entries are expected as a result of the sampling approach.

If you've generated your logfile on another computer or even another operating system, you can use Imagix 4D's path substitution to modify the directory paths, so that source files and include directories are mapped to where they exist in your current environment. When enabled, previously defined Directory Mapping substitutions are applied (File > Options... > Data Collection > Directory Mapping).
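Conceptually, a path substitution replaces a leading directory prefix recorded in the logfile with the corresponding location in your current environment. The Python sketch below illustrates that idea with a hypothetical function and hypothetical directory names; it does not reflect how the Directory Mapping settings are stored or applied inside Imagix 4D.

```python
def map_path(path, mappings):
    """Replace the first matching directory prefix of a path.

    mappings is a list of (old_prefix, new_prefix) pairs, tried in order.
    """
    for old_prefix, new_prefix in mappings:
        old = old_prefix.rstrip("/")
        new = new_prefix.rstrip("/")
        # Match whole path components only, so /home/fred does not
        # accidentally match /home/fredn.
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path
```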

2h. Process the logfile

At this point you're ready to process the logfile. Click Process Makelog at the bottom of the dialog.

2i. Review your results and refine the processing rules

Imagix 4D provides several sources of feedback to use in reviewing the results of processing the logfile.

The first indicator you'll see is a dialog that lists the number of source files that were identified during the processing, divided into those found in your file system and those that weren't. The dialog enables you to generate project settings. If you choose to, any previously generated settings for this data source are replaced by settings derived from the new processing.

Especially useful are the extraction logs, which are generated if you invoke logfile processing from within the Status tab that appears on the right side of the Data Sources dialog. These logs are a series of files titled extract_xxx.log, located in the Imagix 4D project/datasources directory (step 4a). Separate logs cover the compiler actions, the compiler details, the change directory actions, and process warnings. These logs can be accessed through the Status tab.

Reviewing the results and revising the rules is an iterative process. It will require analysis and deductive reasoning on your part. Here are some strategies that can make the process easier.

Identify source files

When first starting, focus on identifying the source files referenced in the logfile.

If no source files are being found, refer to the Compiler log. It shows which lines from your logfile have been identified as containing compiler invocations. If no compiler invocations are found, first examine your logfile. If you're unable to manually identify compiler invocations there, you may need to regenerate your logfile (step 2a). If compiler invocation lines exist there, but aren't being identified according to the Compiler log, check your Compiler Invocation rules. Make sure that the compiler command you have specified matches the command as it appears in the logfile. You may also need to revise the `Commands to ignore at the beginning of line' rule.

However, if compiler invocations are found and source files are still not being found, compare the suffixes in the invocations with your `Source file suffixes' rule.

Identify compiler options

Once the source files are being properly identified, review the extraction of compiler option flags.

Examining the Compiler Details log is particularly useful for this. You're able to compare the actual line from the logfile with the options that have been derived from it. If you see that options are being missed, or that the format of the options is not what you want, modify the Compiler Options rules.

You may also find the Warning log to be of help. Any `unrecognized arg' listings for a compiler line could indicate either that you need to modify the option format or that some options are missing from the `Compiler options taking argument' rule.

Adjust paths

As a last step, review and resolve the paths.

In addition to identifying the names of source files, and their associated -I, -D and -U flags, the extraction process tracks directory location, so that relative path names to both source files and include directories can be resolved. If you're using the Compiler Monitor alternative, the directory location is automatically recorded, but with the Command Capture alternative, you might need to make some adjustments to the settings in the Logfile Processing section.

The CD log provides insight into which lines in the logfile are identified as changing the base used for subsequent relative directory calculations. If cd invocation lines are being missed, review your `Change directory command' rule. If the cd invocation includes relative paths, select a `Processing change directory commands with relative paths' rule that causes the appropriate path to be calculated.

If the Summary of build log processing still indicates that some of the identified source files were not found in the file system, you can refer to the Warnings log to determine which files are missing. You'll then need to determine whether this is because the file is actually missing or because the file path was not properly resolved in the processing. If the latter occurred, the change directory rules need to be revised.

One final note for both alternatives with regard to paths. You may be in a situation where your source code is located in a different location for analysis by Imagix 4D than it was at the time you generated the logfile. This could occur for a number of reasons, including using a different platform for analysis than for compiling. If the logfile contains relative rather than absolute path names, you may have success specifying a different starting directory in step 2d. Alternatively, if the logfile contains absolute paths, you can make path substitutions using the Directory Mapping settings in the Data Collection section of the File > Options dialog.

2j. Start the analysis process

Reviewing the results and revising the rules is an iterative process. Each time you modify the rules (step 2g) and process the logfile (step 2h), you'll generate fresh results to review. Initially, you'll see enough evidence in your review (step 2i) that it will be clear that you need to again refine the rules and (re)process the logfile.

At some point, you'll decide that the processing rules appear good enough that you'd like to see how well the resulting settings analyze your source code. This is not an irrevocable step. You can later return to step 2g, or even 2a, and continue with the iteration inherent in step 2.

When you're ready to apply the generated Dialog-Based (C/C++) data source definitions to analyze your code, click Load Data Sources at the bottom of the dialog.