rust-code-analysis

rust-code-analysis is a Rust library to analyze and extract information from source code written in many different programming languages. It is based on Tree-sitter, a parser generator tool and incremental parsing library.

You can find the source code of this software on GitHub, while issues and feature requests can be posted on the respective GitHub Issue Tracker.

Supported platforms

rust-code-analysis can run on the most common platforms: Linux, macOS, and Windows.

On our GitHub Release Page you can find the Linux and Windows binaries already compiled and packed for you.

API docs

If you prefer to use rust-code-analysis as a crate, you can find the API docs generated by Rustdoc here.

How to cite rust-code-analysis

@article{ARDITO2020100635,
  title = {rust-code-analysis: A Rust library to analyze and extract maintainability information from source codes},
  journal = {SoftwareX},
  volume = {12},
  pages = {100635},
  year = {2020},
  issn = {2352-7110},
  doi = {https://doi.org/10.1016/j.softx.2020.100635},
  url = {https://www.sciencedirect.com/science/article/pii/S2352711020303484},
  author = {Luca Ardito and Luca Barbato and Marco Castelluccio and Riccardo Coppola and Calixte Denizet and Sylvestre Ledru and Michele Valsesia},
  keywords = {Algorithm, Software metrics, Software maintainability, Software quality},
  abstract = {The literature proposes many software metrics for evaluating the source code non-functional properties, such as its complexity and maintainability. The literature also proposes several tools to compute those properties on source codes developed with many different software languages. However, the Rust language emergence has not been paired by the community’s effort in developing parsers and tools able to compute metrics for the Rust source code. Also, metrics tools often fall short in providing immediate means of comparing maintainability metrics between different algorithms or coding languages. We hence introduce rust-code-analysis, a Rust library that allows the extraction of a set of eleven maintainability metrics for ten different languages, including Rust. rust-code-analysis, through the Abstract Syntax Tree (AST) of a source file, allows the inspection of the code structure, analyzing source code metrics at different levels of granularity, and finding code syntax errors before compiling time. The tool also offers a command-line interface that allows exporting the results in different formats. The possibility of analyzing source codes written in different programming languages enables simple and systematic comparisons between the metrics produced from different empirical and large-scale analysis sources.}
}

License

  • Mozilla-defined grammars are released under the MIT license.

  • rust-code-analysis, rust-code-analysis-cli and rust-code-analysis-web are released under the Mozilla Public License v2.0.

Supported Languages

This is the list of programming languages parsed by rust-code-analysis.

  • C
  • C++
  • Mozcpp
  • Ccomment
  • Preproc
  • Java
  • JavaScript
  • Mozjs
  • Python
  • Rust
  • TypeScript

Supported Metrics

rust-code-analysis implements a series of metrics:

  • ABC: it measures the size of a source code by counting the number of Assignments (A), Branches (B) and Conditions (C).
  • BLANK: it counts the number of blank lines in a source file.
  • CC: it calculates the Cyclomatic complexity by examining the control flow of a program.
  • CLOC: it counts the number of comment lines in a source file.
  • COGNITIVE: it calculates the Cognitive complexity, measuring how complex it is to understand a unit of code.
  • HALSTEAD: it is a suite that provides a series of measures, such as the effort required to maintain the analyzed code, the size in bits needed to store the program, the difficulty to understand the code, an estimate of the number of bugs present in the codebase, and an estimate of the time needed to implement the software.
  • LLOC: it counts the number of logical lines (statements) contained in a source file.
  • MI: it is a suite that evaluates the maintainability of a piece of software.
  • NARGS: it counts the number of arguments of a function/method.
  • NEXITS: it counts the number of possible exit points from a method/function.
  • NOM: it counts the number of functions and closures in a file/trait/class.
  • NPA: it counts the number of public attributes in classes/interfaces.
  • NPM: it counts the number of public methods in classes/interfaces.
  • PLOC: it counts the number of physical lines (instructions) contained in a source file.
  • SLOC: it counts the number of lines in a source file.
  • WMC: it sums the Cyclomatic complexity of every method defined in a class.

Commands

rust-code-analysis-cli offers a range of commands to analyze and extract information from source code. Each command may include parameters specific to the task it performs. Below, we describe the core types of commands available in rust-code-analysis-cli.

Metrics

Metrics provide quantitative measures about source code, which can help to:

  • Compare different programming languages
  • Provide information on the quality of a piece of code
  • Tell developers where their code is harder to handle
  • Discover potential issues early in the development process

rust-code-analysis calculates metrics starting from the source code of a program, without executing it. These kinds of metrics are called static metrics.

Nodes

To represent the structure of program code, rust-code-analysis-cli builds an Abstract Syntax Tree (AST). A node is an element of this tree and denotes any syntactic construct present in a language.

Nodes can be used to:

  • Create the syntactic structure of a source file
  • Discover if a construct of a language is present in the analyzed code
  • Count the number of constructs of a certain kind
  • Detect errors in the source code

REST API

rust-code-analysis-web runs a server offering a REST API. This allows users to send source code via HTTP and receive corresponding metrics in JSON format.

Metrics

Metrics can be displayed or exported in various formats using rust-code-analysis-cli.

Display Metrics

To compute and display metrics for a given file or directory, run:

rust-code-analysis-cli -m -p /path/to/your/file/or/directory
  • -m: Computes metrics.
  • -p: Path to the file or directory to analyze. If a directory is provided, metrics will be computed for all supported files it contains.

Exporting Metrics

rust-code-analysis-cli supports multiple output formats for exporting metrics, including:

  • CBOR
  • JSON
  • TOML
  • YAML

Both JSON and TOML can be exported as pretty-printed.

Export Command

To export metrics as a JSON file:

rust-code-analysis-cli -m -p /path/to/your/file/or/directory -O json -o /path/to/output/directory
  • -O: Specifies the output format (e.g., json, toml, yaml, cbor).
  • -o: Path of the directory where the output file will be saved. The filename of the output file is the same as the input file plus the extension associated with the chosen format. If not specified, the result is printed to the shell.
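For example, assuming a hypothetical input file foo.rs and an existing ./out directory, the following command writes the metrics to ./out/foo.rs.json:

rust-code-analysis-cli -m -p foo.rs -O json -o ./out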

Pretty Print

To output pretty-printed JSON metrics:

rust-code-analysis-cli -m -p /path/to/your/file/or/directory --pr -O json

This command prints the formatted metrics to the console or the specified output path.

Nodes

The rust-code-analysis-cli provides commands to analyze and extract information from the nodes in the Abstract Syntax Tree (AST) of a source file.

Error Detection

To detect syntactic errors in your code, run:

rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" -f error
  • -p: Path to a file or directory (analyzes all files in the directory).
  • -I: Glob filter for selecting files by extension (e.g., *.js, *.rs).
  • -f: Flag to search for nodes of a specific type (e.g., errors).
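For instance, to look for syntax errors in all Rust files under a hypothetical ./src directory:

rust-code-analysis-cli -p ./src -I "*.rs" -f error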

Counting Nodes

You can count the number of specific node types in your code by using the --count flag:

rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" --count <NODE_TYPE>

This counts how many nodes of the specified type exist in the analyzed files.
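As an illustration, assuming the tree-sitter node kind function_item (the kind used by the Rust grammar for function definitions) is a valid value for <NODE_TYPE>, the following would count the function definitions in the Rust files of a hypothetical ./src directory:

rust-code-analysis-cli -p ./src -I "*.rs" --count function_item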

Printing the AST

To visualize the AST of a source file, use the -d flag:

rust-code-analysis-cli -p /path/to/your/file/or/directory -d

The -d flag prints the entire AST, allowing you to inspect the code's syntactic structure.

Analyzing Code Portions

To analyze only a specific part of the code, use the --ls (line start) and --le (line end) options. For example, if we want to print the AST of a single function which starts at line 5 and ends at line 10:

rust-code-analysis-cli -p /path/to/your/file/or/directory -d --ls 5 --le 10

REST API

rust-code-analysis-web is a web server that allows users to analyze source code through a REST API. This service is useful for anyone looking to perform code analysis over HTTP.

The server can be run on any host and port, and supports the following main functionalities:

  • Remove Comments from source code.
  • Retrieve Function Spans for given code.
  • Compute Metrics for the provided source code.

Running the Server

To run the server, you can use the following command:

rust-code-analysis-web --host 127.0.0.1 --port 9090
  • --host specifies the IP address where the server should run (default is 127.0.0.1).
  • --port specifies the port to be used (default is 8080).
  • -j specifies the number of parallel jobs (optional).
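For example, to make the server listen on all interfaces on the default port with four parallel jobs (the host, port, and job count below are purely illustrative):

rust-code-analysis-web --host 0.0.0.0 --port 8080 -j 4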

Endpoints

1. Ping the Server

Use this endpoint to check if the server is running.

Request:

GET http://127.0.0.1:8080/ping

Response:

  • Status Code: 200 OK
  • Body:
{ "message": "pong" }

2. Remove Comments

This endpoint removes comments from the provided source code.

Request:

POST http://127.0.0.1:8080/comments

Payload:

{ "id": "unique-id", "file_name": "filename.ext", "code": "source code with comments" }
  • id: A unique identifier for the request.
  • file_name: The name of the file being analyzed.
  • code: The source code with comments.

Response:

{ "id": "unique-id", "code": "source code without comments" }

3. Retrieve Function Spans

This endpoint retrieves the spans of functions in the provided source code.

Request:

POST http://127.0.0.1:8080/functions

Payload:

{ "id": "unique-id", "file_name": "filename.ext", "code": "source code with functions" }
  • id: A unique identifier for the request.
  • file_name: The name of the file being analyzed.
  • code: The source code with functions.

Response:

{ "id": "unique-id", "spans": [ { "name": "function_name", "start_line": 1, "end_line": 10 } ] }

4. Compute Metrics

This endpoint computes various metrics for the provided source code.

Request:

POST http://127.0.0.1:8080/metrics

Payload:

{ "id": "unique-id", "file_name": "filename.ext", "code": "source code for metrics" "unit": false }
  • id: Unique identifier for the request.
  • file_name: The filename of the source code file.
  • code: The source code to analyze.
  • unit: A boolean value. true to compute only top-level metrics, false for detailed metrics across all units (functions, classes, etc.).

Response:

{ "id": "unique-id", "language": "Rust", "spaces": { "metrics": { "cyclomatic_complexity": 5, "lines_of_code": 100, "function_count": 10 } } }

Developers Guide

If you want to contribute to the development of rust-code-analysis, we have summarized here a series of guidelines to help you with the build process.

As a prerequisite, you need to install the latest available version of Rust. You can learn how to do that here.

Clone Repository

First of all, you need to clone the repository. You can do that:

through HTTPS

git clone -j8 https://github.com/mozilla/rust-code-analysis.git

or through SSH

git clone -j8 git@github.com:mozilla/rust-code-analysis.git

Building

To build the rust-code-analysis library, you need to run the following command:

cargo build

If you want to build the CLI:

cargo build -p rust-code-analysis-cli

If you want to build the web server:

cargo build -p rust-code-analysis-web

If you want to build everything in one fell swoop:

cargo build --workspace

Testing

After you have finished changing the code, you should always verify whether all tests pass with the cargo test command.

cargo test --workspace --all-features --verbose

Code Formatting

If all previous steps went well and you want to make a pull request to integrate your invaluable help into the codebase, the last step left is code formatting.

Rustfmt

This tool formats your code according to Rust style guidelines.

To install:

rustup component add rustfmt

To format the code:

cargo fmt

Clippy

This tool helps developers write better code by automatically catching many common mistakes. It detects errors and warnings in your code that must be fixed before opening a pull request.

To install:

rustup component add clippy

To detect errors and warnings:

cargo clippy --workspace --all-targets

Code Documentation

If you have documented your code, to generate the final documentation, run this command:

cargo doc --open --no-deps

Remove the --no-deps option if you also want to build the documentation of each dependency used by rust-code-analysis.

Run your code

You can run rust-code-analysis-cli using:

cargo run -p rust-code-analysis-cli -- [rust-code-analysis-cli-parameters]

To know the list of rust-code-analysis-cli parameters, run:

cargo run -p rust-code-analysis-cli -- --help
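For example, to compute metrics for a hypothetical ./src directory through Cargo, using the flags described in the Metrics section:

cargo run -p rust-code-analysis-cli -- -m -p ./src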

You can run rust-code-analysis-web using:

cargo run -p rust-code-analysis-web -- [rust-code-analysis-web-parameters]

To know the list of rust-code-analysis-web parameters, run:

cargo run -p rust-code-analysis-web -- --help

Practical advice

  • When you add a new feature, add at least one unit or integration test to verify that everything works correctly
  • Document public API
  • Do not add dead code
  • Comment intricate code so that others can understand what you have done

Supporting a new language

This section is to help developers implement support for a new language in rust-code-analysis.

To implement a new language, two steps are required:

  1. Generate the grammar
  2. Add the grammar to rust-code-analysis

A number of metrics are supported, and help on implementing them is covered elsewhere in the documentation.

Generating the grammar

As a prerequisite for adding a new grammar, a tree-sitter grammar for the desired language must exist and be compatible with the tree-sitter version used in this project.

The grammars are generated by a crate in this repository called enums. The following steps pull in the language's tree-sitter crate and generate an enum file that is then used as the grammar in this project to evaluate metrics.

  1. Add the language-specific tree-sitter crate to the enums crate, making sure to tie it to the tree-sitter version used in the rust-code-analysis crate. For example, for the Rust support at the time of writing the following line exists in /enums/Cargo.toml: tree-sitter-rust = "version number".
  2. Append the language to the enums crate in /enums/src/languages.rs. Keeping with Rust as the example, the line would be (Rust, tree_sitter_rust). The first parameter is the name of the Rust enum that will be generated, the second is the tree-sitter function to call to get the language's grammar.
  3. Add a case to the end of the match in the mk_get_language macro rule in /enums/src/macros.rs, e.g., for Rust: Lang::Rust => tree_sitter_rust::language().
  4. Lastly, execute the /recreate-grammars.sh script, which runs the enums crate to generate the grammar for the new language.

At this point we should have a new grammar file for the new language in /src/languages/. See /src/languages/language_rust.rs as an example of the generated enum.

Adding the new grammar to rust-code-analysis

  1. Add the language-specific tree-sitter crate to the rust-code-analysis project, making sure to tie it to the tree-sitter version used in this project. For example, for the Rust support at the time of writing the following line exists in Cargo.toml: tree-sitter-rust = "0.19.0".
  2. Next, add the new tree-sitter language namespace to /src/languages/mod.rs, e.g.:
pub mod language_rust;
pub use language_rust::*;
  3. Lastly, we add a definition of the language to the arguments of the mk_langs! macro in /src/langs.rs, e.g.:
// 1) Name for enum
// 2) Language description
// 3) Display name
// 4) Empty struct name to implement
// 5) Parser name
// 6) tree-sitter function to call to get a Language
// 7) file extensions
// 8) emacs modes
(
    Rust,
    "The `Rust` language",
    "rust",
    RustCode,
    RustParser,
    tree_sitter_rust,
    [rs],
    ["rust"]
)

Lines of Code (LoC)

In this document we give some guidance on how to implement the LoC metrics available in this crate. Lines of code is a software metric that gives an indication of the size of a piece of source code by counting its lines. There are several variants of LoC, so we will first explain them by way of an example.

Types of LoC

/*
Instruction: Implement factorial function
For extra credits, do not use mutable state or
an imperative loop like `for` or `while`.
*/

/// Factorial: n! = n*(n-1)*(n-2)*(n-3)...3*2*1
fn factorial(num: u64) -> u64 {
    // use `product` on `Iterator`
    (1..=num).product()
}

The example above will be used to illustrate each of the LoC metrics described below.

SLOC

A straight count of all lines in the file including code, comments, and blank lines.
METRIC VALUE: 11

PLOC

A count of the instruction lines of code contained in the source code. This would include any brackets or similar syntax on a new line. Note that comments and blank lines are not counted in this.
METRIC VALUE: 3

LLOC

The "logical" lines is a count of the number of statements in the code. Note that what a statement is depends on the language.
In the above example there is only a single statement which id the function call of product with the Iterator as its argument.
METRIC VALUE: 1

CLOC

A count of the comment lines in the code. The type of comment does not matter, i.e. single-line, block, and doc comments all count.
METRIC VALUE: 6

BLANK

Last but not least, this metric counts the blank lines present in the code.
METRIC VALUE: 2

Implementation

To implement the LoC related metrics described above you need to implement the Loc trait for the language you want to support.

This requires implementing the compute function. See /src/metrics/loc.rs for where to implement, as well as examples from other languages.

Update grammars

Each programming language needs to be parsed in order to extract its syntax and semantics: the so-called grammar of a language. In rust-code-analysis, we use tree-sitter as the parsing library, since it provides a distinct grammar for each of our supported programming languages. But a grammar is not a static monolith: it changes over time and can be affected by bugs, hence it is necessary to update it every now and then.

As of now, since we use bash scripts to automate the operations, grammars can be updated natively only on Linux and macOS systems, but these scripts can also be run on Windows using WSL.

In rust-code-analysis we use both third-party and internal grammars. The former are published on crates.io and maintained by external developers, while the latter have been designed and defined inside the project to manage variants of some languages used in Firefox. We are going to explain how to update both of them in the following sections.

Third-party grammars

Update the grammar version in Cargo.toml and enums/Cargo.toml. Below is an example for the tree-sitter-java grammar:

tree-sitter-java = "x.xx.x"

where x represents a digit.

Run ./recreate-grammars.sh to recreate and refresh all grammar structures and data:

./recreate-grammars.sh

Once the script above has finished its execution, you need to fix any failing tests and problems introduced by the changes in the grammars.

Commit your changes and create a new pull request

Internal grammars

Update the version of tree-sitter-cli in the package.json file of the internal grammar and then install the updated version.

Update the dependency version field in Cargo.toml and enums/Cargo.toml. Below is an example for the tree-sitter-ccomment grammar:

tree-sitter-ccomment = { path = "./tree-sitter-ccomment", version = "=x.xx.x" }

where x represents a digit.

Open the Cargo.toml file of the chosen grammar and:

  • Set its version to the same value present in the main Cargo.toml file
  • Increase the tree-sitter version to the most recent one

Run the appropriate script to update the grammar by recreating and refreshing every file and script.

For tree-sitter-ccomment and tree-sitter-preproc, run ./generate-grammars/generate-grammar.sh followed by the name of the grammar. Below is an example, again using the tree-sitter-ccomment grammar:

./generate-grammars/generate-grammar.sh tree-sitter-ccomment

Instead, for tree-sitter-mozcpp and tree-sitter-mozjs, use their specific scripts.

For tree-sitter-mozcpp, run

./generate-grammars/generate-mozcpp.sh

For tree-sitter-mozjs, run

./generate-grammars/generate-mozjs.sh

Once the script above has finished its execution, you need to fix any failing tests and problems introduced by the changes in the grammars.

Commit your changes and create a new pull request