rust-code-analysis

rust-code-analysis is a Rust library to analyze and extract information from source code written in many different programming languages. It is based on Tree-sitter, a parser generator tool and incremental parsing library.

You can find the source code of this software on GitHub, while issues and feature requests can be posted on the respective GitHub Issue Tracker.

Supported platforms

rust-code-analysis can run on the most common platforms: Linux, macOS, and Windows.

On our GitHub Release Page you can find the Linux and Windows binaries already compiled and packed for you.

API docs

If you prefer to use rust-code-analysis as a crate, you can find the API docs generated by Rustdoc here.

How to cite rust-code-analysis

@article{ARDITO2020100635,
    title = {rust-code-analysis: A Rust library to analyze and extract maintainability information from source codes},
    journal = {SoftwareX},
    volume = {12},
    pages = {100635},
    year = {2020},
    issn = {2352-7110},
    doi = {https://doi.org/10.1016/j.softx.2020.100635},
    url = {https://www.sciencedirect.com/science/article/pii/S2352711020303484},
    author = {Luca Ardito and Luca Barbato and Marco Castelluccio and Riccardo Coppola and Calixte Denizet and Sylvestre Ledru and Michele Valsesia},
    keywords = {Algorithm, Software metrics, Software maintainability, Software quality},
    abstract = {The literature proposes many software metrics for evaluating the source code non-functional properties, such as its complexity and maintainability. The literature also proposes several tools to compute those properties on source codes developed with many different software languages. However, the Rust language emergence has not been paired by the community’s effort in developing parsers and tools able to compute metrics for the Rust source code. Also, metrics tools often fall short in providing immediate means of comparing maintainability metrics between different algorithms or coding languages. We hence introduce rust-code-analysis, a Rust library that allows the extraction of a set of eleven maintainability metrics for ten different languages, including Rust. rust-code-analysis, through the Abstract Syntax Tree (AST) of a source file, allows the inspection of the code structure, analyzing source code metrics at different levels of granularity, and finding code syntax errors before compiling time. The tool also offers a command-line interface that allows exporting the results in different formats. The possibility of analyzing source codes written in different programming languages enables simple and systematic comparisons between the metrics produced from different empirical and large-scale analysis sources.}
}

License

  • Mozilla-defined grammars are released under the MIT license.

  • rust-code-analysis, rust-code-analysis-cli and rust-code-analysis-web are released under the Mozilla Public License v2.0.

Supported Languages

This is the list of programming languages parsed by rust-code-analysis.

  • [x] C++
  • [ ] C#
  • [ ] CSS
  • [ ] Go
  • [ ] HTML
  • [ ] Java
  • [x] JavaScript
  • [x] The JavaScript used in Firefox internals
  • [x] Python
  • [x] Rust
  • [x] TypeScript

A check indicates which languages have metrics implemented.

Supported Metrics

rust-code-analysis implements the following metrics:

  • CC: it calculates the code complexity examining the control flow of a program.
  • SLOC: it counts the number of lines in a source file.
  • PLOC: it counts the number of physical lines (instructions) contained in a source file.
  • LLOC: it counts the number of logical lines (statements) contained in a source file.
  • CLOC: it counts the number of comments in a source file.
  • BLANK: it counts the number of blank lines in a source file.
  • HALSTEAD: it is a suite that provides a series of information, such as the effort required to maintain the analyzed code, the size in bits to store the program, the difficulty to understand the code, an estimate of the number of bugs present in the codebase, and an estimate of the time needed to implement the software.
  • MI: it is a suite that evaluates the maintainability of a piece of software.
  • NOM: it counts the number of functions and closures in a file/trait/class.
  • NEXITS: it counts the number of possible exit points from a method/function.
  • NARGS: it counts the number of arguments of a function/method.

The metrics above are not yet implemented for the C#, CSS, Go, HTML, and Java languages.

Commands

A command is any procedure used by rust-code-analysis-cli to extract information from source code. Each command may take parameters, depending on the task it carries out. This page groups the main types of commands implemented in rust-code-analysis-cli.

Metrics

Metrics are a series of measures that can be used to:

  • Compare different programming languages
  • Provide information on the quality of a code
  • Tell developers where their code is harder to handle
  • Discover errors earlier

rust-code-analysis calculates the metrics starting from the source code of a program. Metrics of this kind are called static metrics.

Nodes

To represent the structure of program code, rust-code-analysis-cli builds an Abstract Syntax Tree (AST). A node is an element of this tree and denotes any syntactic construct present in a language.

Nodes can be used to:

  • Create the syntactic structure of a source file
  • Discover if a construct of a language is present in the analyzed code
  • Count the number of constructs of a certain kind
  • Detect errors in the source code
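
The uses listed above can be pictured with a toy tree. The sketch below is not the rust-code-analysis API: the `Node` struct, its fields, and the `count_kind` function are hypothetical names chosen for illustration, and the kind strings are only examples. It shows how counting constructs of one kind, or looking for erroneous nodes, reduces to a recursive walk over the tree.

```rust
/// Hypothetical, simplified AST node: a kind label plus child nodes.
struct Node {
    kind: &'static str,
    children: Vec<Node>,
}

/// Counts how many nodes in the tree rooted at `node` have the given kind.
fn count_kind(node: &Node, kind: &str) -> usize {
    // Count this node if it matches, then add the counts of all subtrees.
    let own = if node.kind == kind { 1 } else { 0 };
    own + node
        .children
        .iter()
        .map(|child| count_kind(child, kind))
        .sum::<usize>()
}
```

Checking whether a construct is present is then `count_kind(&root, "some_kind") > 0`, and detecting syntax errors amounts to asking the same question for whatever kind the parser uses to mark erroneous nodes.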

REST API

rust-code-analysis-cli can be run as a server that accepts requests sent through a REST API. The server receives as input the filename of a source code file and returns the corresponding metrics formatted as JSON.

Metrics

Metrics can be printed on screen or exported as different output formats through rust-code-analysis-cli.

Print metrics

For each function space, rust-code-analysis computes the list of metrics described above. At the end of this process, rust-code-analysis-cli prints the formatted result on the screen. The command used to print the metrics is the following one:

rust-code-analysis-cli -m -p /path/to/your/file/or/directory

The -p option represents the path to a file or a directory. If a directory is passed as input, rust-code-analysis-cli computes the metrics for each file contained in it.

Export formats

Different output formats can be used to export metrics:

  • Cbor
  • Json
  • Toml
  • Yaml

Json and Toml can also be exported pretty-printed.

Export command

For example, if you want to export metrics as a json file, run:

rust-code-analysis-cli -m -O json -o /output/path -p /path/to/your/file/or/directory

The -O option allows you to choose the output format. It supports only these values: cbor, json, toml, yaml.

The -o option specifies the directory where the output file will be saved. It accepts only paths. The output file has the same name as the input file, plus the extension associated with the chosen format. When this option is not given, the output is printed to the shell.
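
As a rough illustration of the naming rule for -o, the sketch below derives an output path from an output directory, an input file, and a format. It is not taken from rust-code-analysis-cli: `output_path` is a hypothetical helper, and it assumes the extension is appended to the full input file name (so `main.rs` becomes `main.rs.json`) and that the input's directory structure is not reproduced.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Hypothetical helper: joins `out_dir` with the input file name plus
/// `.<format>`. Returns `None` when `input` has no file name component.
fn output_path(out_dir: &Path, input: &Path, format: &str) -> Option<PathBuf> {
    // Keep the input file name as-is and append the format extension.
    let mut name: OsString = input.file_name()?.to_os_string();
    name.push(".");
    name.push(format);
    Some(out_dir.join(name))
}
```

Under these assumptions, an input of `/src/main.rs` with the `json` format and the output directory `/output/path` yields `/output/path/main.rs.json`.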

As noted above, Json and Toml can be exported pretty-printed. To do so, use the --pr option. In the example below, the pretty-printed json output is printed to the shell:

rust-code-analysis-cli -m -O json --pr -p /path/to/your/file/or/directory

Nodes

rust-code-analysis-cli can extract information from the nodes that make up the Abstract Syntax Tree (AST) of a source file.

Find Errors

To know if there are some syntactic errors in your code, run:

rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" -f -error

The -p option represents the path to a file or a directory. If a directory is passed as input, rust-code-analysis-cli processes each file contained in it. The -I option is a glob filter that restricts the analysis to the files matching the pattern, for example the files with the extension of a given language. The -f option searches for all nodes of a certain type. In the case above, we are looking for all the erroneous nodes present in the code.
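
To make the effect of a `*.ext` filter concrete, here is a small sketch that keeps only the paths matching such a pattern. It is an illustration under simplifying assumptions, not the filter used by rust-code-analysis-cli: `matches_simple_glob` and `filter_paths` are hypothetical names, and only patterns of the exact form `*.ext` are handled, whereas a real glob filter supports far more.

```rust
use std::path::{Path, PathBuf};

/// Returns true when `path` matches a glob of the simple form `*.ext`.
/// Any other pattern shape is treated as "no match" in this sketch.
fn matches_simple_glob(path: &Path, glob: &str) -> bool {
    match glob.strip_prefix("*.") {
        Some(ext) => path.extension().and_then(|e| e.to_str()) == Some(ext),
        None => false,
    }
}

/// Keeps only the paths selected by the simple glob.
fn filter_paths<'a>(paths: &'a [PathBuf], glob: &str) -> Vec<&'a PathBuf> {
    paths
        .iter()
        .filter(|p| matches_simple_glob(p, glob))
        .collect()
}
```

With the paths `a.rs`, `b.py`, and `dir/c.rs`, the pattern `*.rs` keeps the first and the third.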

Count Errors

It is also possible to count the number of nodes of a certain type using the --count option:

rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" --count -error

Print AST

If you want to print the AST of a source code, run the following command:

rust-code-analysis-cli -p /path/to/your/file/or/directory -d

The -d option prints the entire AST on the shell.

Code Splitting

Commands can be run on a single portion of the code using the --ls and --le options. The former represents the starting line of the code to be considered, while the latter its ending line. For example, if we want to print the AST of a single function which starts at line 5 and ends at line 10, we need to launch this command:

rust-code-analysis-cli -p /path/to/your/file/or/directory -d --ls 5 --le 10
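
One way to picture the effect of --ls and --le is as a selection of source lines. The sketch below assumes the two values are 1-based and inclusive, which is what the example of a function spanning lines 5 to 10 suggests but is not stated explicitly here. `select_lines` is a hypothetical helper that works on raw text, while the tool itself works on the AST.

```rust
/// Hypothetical helper: returns the lines numbered `ls..=le` (1-based,
/// inclusive) of `source`. An empty result means the range selects nothing.
fn select_lines(source: &str, ls: usize, le: usize) -> Vec<&str> {
    // Number of lines that come before the requested range.
    let skipped = ls.saturating_sub(1);
    source
        .lines()
        .skip(skipped)
        .take(le.saturating_sub(skipped))
        .collect()
}
```

For a 12-line file, `select_lines(source, 5, 10)` returns six lines, from the fifth to the tenth.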

REST API

It is possible to run rust-code-analysis-cli as an HTTP service that uses a REST API to share data between client and server. We will use port 9090 to show the possible ways to interact with the server.

Server

rust-code-analysis-cli can act as a server running on your localhost at a specific port.

rust-code-analysis-cli --serve --port 9090

The --port option sets the port used by the server. One possible value could be 9090.

Ping

If you want to ping the server, make a GET request at this URL:

http://127.0.0.1:9090/ping
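
A ping can be sent from Rust with the standard library alone. The sketch below is a minimal client under stated assumptions: `ping_request` and `ping` are hypothetical helper names, the server is assumed to be already listening on the given host and port, and the content of the reply is not documented here, so the function only returns the first line of the response.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

/// Builds the raw HTTP/1.1 request text for `GET /ping`.
fn ping_request(host: &str, port: u16) -> String {
    format!(
        "GET /ping HTTP/1.1\r\nHost: {}:{}\r\nConnection: close\r\n\r\n",
        host, port
    )
}

/// Sends the ping and returns the first line of the response (the HTTP
/// status line). Fails with an I/O error if the server is not reachable.
fn ping(host: &str, port: u16) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.write_all(ping_request(host, port).as_bytes())?;
    let mut status_line = String::new();
    BufReader::new(stream).read_line(&mut status_line)?;
    Ok(status_line.trim_end().to_string())
}
```

Calling `ping("127.0.0.1", 9090)` while the server from the previous section is running should return a status line; an `Err` value usually means that nothing is listening on that port.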

Metrics

To get metrics formatted as a json file, make a POST request at this URL:

http://127.0.0.1:9090/metrics?file_name={filename}&unit={unit}

The file_name parameter is the path to the source file to be analyzed, while unit is a boolean flag that accepts only 0 or 1. With unit=1, rust-code-analysis-cli considers only top-level metrics; with unit=0, it returns detailed metrics for all classes, functions, nested functions, and other sub-spaces.
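
The URL template above can be filled in programmatically. The following sketch builds the metrics URL from its parts; `metrics_url` is a hypothetical helper, and it assumes that `file_name` contains no characters that need percent-encoding. What the POST request should carry beyond the URL is not covered here, so the sketch stops at the URL.

```rust
/// Hypothetical helper: fills in the `/metrics` URL template.
/// `top_level_only = true` maps to `unit=1`, `false` maps to `unit=0`.
fn metrics_url(base: &str, file_name: &str, top_level_only: bool) -> String {
    let unit = if top_level_only { 1 } else { 0 };
    // Assumption: `file_name` is already safe to place in a query string.
    format!("{}/metrics?file_name={}&unit={}", base, file_name, unit)
}
```

For example, `metrics_url("http://127.0.0.1:9090", "foo.rs", false)` produces a URL that requests detailed metrics for `foo.rs`.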

Developers Guide

If you want to contribute to the development of rust-code-analysis, this guide summarizes a series of guidelines meant to help you build, test, and submit your changes.

As a prerequisite, you need to install the latest available version of Rust. You can learn how to do that here.

Clone Repository

First of all, you need to clone the repository. You can do that:

through HTTPS

git clone -j8 https://github.com/mozilla/rust-code-analysis.git

or through SSH

git clone -j8 git@github.com:mozilla/rust-code-analysis.git

Building

To build the rust-code-analysis library, you need to run the following command:

cargo build

If you want to build the cli:

cargo build -p rust-code-analysis-cli

If you want to build the web server:

cargo build -p rust-code-analysis-web

If you want to build everything in one fell swoop:

cargo build --workspace

Testing

After you have finished changing the code, you should always verify whether all tests pass with the cargo test command.

cargo test --workspace --all-features --verbose

Code Formatting

If all previous steps went well, and you want to make a pull request to integrate your invaluable help in the codebase, the last step left is code formatting.

Rustfmt

This tool formats your code according to Rust style guidelines.

To install:

rustup component add rustfmt

To format the code:

cargo fmt

Clippy

This tool helps developers write better code by automatically catching many common mistakes. It reports a series of errors and warnings in your code that must be fixed before making a pull request.

To install:

rustup component add clippy

To detect errors and warnings:

cargo clippy --workspace --all-targets --

Code Documentation

If you have documented your code, to generate the final documentation, run this command:

cargo doc --open --no-deps

Remove the --no-deps option if you also want to build the documentation of each dependency used by rust-code-analysis.

Run your code

You can run rust-code-analysis-cli using:

cargo run -p rust-code-analysis-cli -- [rust-code-analysis-cli-parameters]

To know the list of rust-code-analysis-cli parameters, run:

cargo run -p rust-code-analysis-cli -- --help

You can run rust-code-analysis-web using:

cargo run -p rust-code-analysis-web -- [rust-code-analysis-web-parameters]

To know the list of rust-code-analysis-web parameters, run:

cargo run -p rust-code-analysis-web -- --help

Practical advice

  • When you add a new feature, add at least one unit or integration test to verify that everything works correctly
  • Document public API
  • Do not add dead code
  • Comment intricate code such that others can comprehend what you have accomplished

Supporting a new language

This section is to help developers implement support for a new language in rust-code-analysis.

To implement a new language, two steps are required:

  1. Generate the grammar
  2. Add the grammar to rust-code-analysis

A number of metrics are supported. Guidance on implementing them for a new language is covered elsewhere in the documentation.

Generating the grammar

As a prerequisite for adding a new grammar, there needs to exist a tree-sitter version for the desired language that matches the version used in this project.

The grammars are generated by a project in this repository called enums. The following steps add the language support from the language crate and generate an enum file that is then used as the grammar in this project to evaluate metrics.

  1. Add the language-specific tree-sitter crate to the enums crate, making sure to tie it to the tree-sitter version used in the rust-code-analysis crate. For example, for the Rust support at the time of writing, the following line exists in /enums/Cargo.toml: tree-sitter-rust = "version number".
  2. Append the language to the enums crate in /enums/src/languages.rs. Keeping with Rust as the example, the line would be (Rust, tree_sitter_rust). The first parameter is the name of the Rust enum that will be generated; the second is the tree-sitter function to call to get the language's grammar.
  3. Add a case to the end of the match in the mk_get_language macro rule in /enums/src/macros.rs, e.g. for Rust: Lang::Rust => tree_sitter_rust::language().
  4. Lastly, execute the /recreate-grammars.sh script, which runs the enums crate to generate the grammar for the new language.

At this point we should have a new grammar file for the new language in /src/languages/. See /src/languages/language_rust.rs as an example of the generated enum.

Adding the new grammar to rust-code-analysis

  1. Add the language specific tree-sitter crate to the rust-code-analysis project, making sure to tie it to the tree-sitter version used in this project. For example, for the Rust support at time of writing the following line exists in the Cargo.toml: tree-sitter-rust = "0.19.0".
  2. Next, add the new tree-sitter language namespace to /src/languages/mod.rs, e.g.:

pub mod language_rust;
pub use language_rust::*;

  3. Lastly, add a definition of the language to the arguments of the mk_langs! macro in /src/langs.rs:

// 1) Name for enum
// 2) Language description
// 3) Display name
// 4) Empty struct name to implement
// 5) Parser name
// 6) tree-sitter function to call to get a Language
// 7) file extensions
// 8) emacs modes
(
    Rust,
    "The `Rust` language",
    "rust",
    RustCode,
    RustParser,
    tree_sitter_rust,
    [rs],
    ["rust"]
)
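
To give a feeling for how a macro can turn such tuples into code, here is a heavily cut-down sketch. It is not the mk_langs! macro from /src/langs.rs: `mk_langs_sketch!` and `LangSketch` are hypothetical names, and only three of the eight fields (enum name, display name, file extensions) are kept. The real macro generates considerably more.

```rust
// Hypothetical stand-in for `mk_langs!`, reduced to three fields per language.
macro_rules! mk_langs_sketch {
    ( $( ($name:ident, $display:literal, [ $( $ext:ident ),* ]) ),* $(,)? ) => {
        /// One variant per declared language.
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum LangSketch {
            $( $name ),*
        }

        impl LangSketch {
            /// Display name given as the second tuple field.
            pub fn display_name(self) -> &'static str {
                match self {
                    $( LangSketch::$name => $display ),*
                }
            }

            /// Maps a file extension back to the language that declared it.
            pub fn from_extension(ext: &str) -> Option<LangSketch> {
                $(
                    if [ $( stringify!($ext) ),* ].contains(&ext) {
                        return Some(LangSketch::$name);
                    }
                )*
                None
            }
        }
    };
}

// Each tuple declares one language, in the spirit of the definition above.
mk_langs_sketch!(
    (Rust, "rust", [rs]),
    (Python, "python", [py]),
    (Cpp, "c/c++", [cpp, cc, h])
);
```

In this sketch, supporting another language only means adding one more tuple to the macro invocation, which is the design idea the step above relies on.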

Lines of Code (LoC)

In this document we give some guidance on how to implement the LoC metrics available in this crate. Lines of code is a software metric that gives an indication of the size of some source code by counting the lines of the source code. There are many types of LoC so we will first explain those by way of an example.

Types of LoC


/*
Instruction: Implement factorial function
For extra credits, do not use mutable state or an imperative loop like `for` or `while`.
 */

/// Factorial: n! = n*(n-1)*(n-2)*(n-3)...3*2*1
fn factorial(num: u64) -> u64 {
    
    // use `product` on `Iterator`
    (1..=num).product()
}

The example above will be used to illustrate each of the LoC metrics described below.

SLOC

A straight count of all lines in the file including code, comments, and blank lines.
METRIC VALUE: 11

PLOC

A count of the instruction lines of code contained in the source code. This would include any brackets or similar syntax on a new line. Note that comments and blank lines are not counted in this.
METRIC VALUE: 3

LLOC

The "logical" lines metric is a count of the number of statements in the code. Note that what a statement is depends on the language.
In the above example there is only a single statement, which is the call of product on the Iterator.
METRIC VALUE: 1

CLOC

A count of the comment lines in the code. The type of comment does not matter, i.e. single-line, block, or doc.
METRIC VALUE: 6

BLANK

Last but not least, this metric counts the blank lines present in the code.
METRIC VALUE: 2
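
The values above can be reproduced with a naive, line-based classifier. The sketch below is only an illustration and not how the crate computes these metrics: it never parses the code, it treats every line that is neither blank nor a comment as an instruction line, and it would misclassify lines that mix code and comments. `LocCounts` and `count_lines` are hypothetical names. LLOC is left out because counting statements requires a parser.

```rust
/// Hypothetical container for the line-based counts.
#[derive(Debug, Default, PartialEq)]
struct LocCounts {
    sloc: usize,
    ploc: usize,
    cloc: usize,
    blank: usize,
}

/// Classifies each line as comment, blank, or instruction line.
fn count_lines(source: &str) -> LocCounts {
    let mut counts = LocCounts::default();
    let mut in_block_comment = false;
    for line in source.lines() {
        counts.sloc += 1; // every line counts towards SLOC
        let trimmed = line.trim();
        if in_block_comment {
            // Lines inside /* ... */ are comments, including the closing one.
            counts.cloc += 1;
            if trimmed.contains("*/") {
                in_block_comment = false;
            }
        } else if trimmed.is_empty() {
            counts.blank += 1;
        } else if trimmed.starts_with("//") {
            // Covers both `//` and `///` comments.
            counts.cloc += 1;
        } else if trimmed.starts_with("/*") {
            counts.cloc += 1;
            in_block_comment = !trimmed.contains("*/");
        } else {
            counts.ploc += 1;
        }
    }
    counts
}
```

Applied to the factorial example, this classifier yields SLOC 11, PLOC 3, CLOC 6, and BLANK 2, matching the values given in the sections above.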

Implementation

To implement the LoC related metrics described above you need to implement the Loc trait for the language you want to support.

This requires implementing the compute function. See /src/metrics/loc.rs for where to implement, as well as examples from other languages.