
Leaking memory on purpose in Rust

Krzysztof Grajek

16 Dec 2024, 8 minutes read


Today, we will tackle the topic associated with memory management and how you can ‘leak’ it on purpose in Rust. I’m new to the Rust ecosystem, and when I saw that we have a method called leak on the smart pointer type like Box, I got interested in why it's there and where it can be used. There are a plethora of other topics associated with those mechanisms and how they can be used, so buckle up and read on. I hope I won’t make it too long.

Leak is not really leaking memory, or is it?

If you come, like me, from a background other than systems programming, you are probably more accustomed to the term leaking memory as the mechanism in JVM or other managed environments like .NET, where leaks usually occur due to unintended references that prevent the garbage collector from reclaiming memory which is no longer needed.

In Rust, with Box::leak, we are not really leaking anything by accident; we deliberately convert a Box<T> into a mutable reference with the 'static lifetime, &'static mut T, thereby preventing the value owned by the Box from ever being deallocated.

Because the memory behind that reference will never be deallocated, we actually did kind of leak it, in a similar fashion to other languages. The only difference is that we did it on purpose, whereas classic memory leaks are undesirable due to their potential to consume system resources unnecessarily.
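As a minimal sketch of that conversion (the helper name leak_numbers is made up for illustration), leaking a boxed vector yields a &'static mut reference that can be mutated and then downgraded to a shared 'static reference:

```rust
// Hypothetical helper: moves a vector into a Box and leaks it, so the
// returned reference stays valid for the rest of the program.
fn leak_numbers(numbers: Vec<i32>) -> &'static mut Vec<i32> {
    Box::leak(Box::new(numbers))
}

fn main() {
    let numbers = leak_numbers(vec![1, 2, 3]);
    numbers.push(4); // Box::leak hands back a mutable reference

    // Downgrade to a shared reference that can be copied freely.
    let shared: &'static Vec<i32> = numbers;
    println!("{:?}", shared);
}
```

Nothing here needs unsafe: giving up the ability to free the allocation is considered safe in Rust.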

Box::leak usage examples

When investigating this topic, I found a couple of interesting usage scenarios for this method. The first one is about sharing configuration data across your application.

Global config example

use once_cell::sync::OnceCell;

static GLOBAL_CONFIG: OnceCell<&'static AppConfig> = OnceCell::new();

#[derive(Debug)]
struct AppConfig {
   database_url: String,
   app_mode: String,
}

impl AppConfig {
   // Simulates loading a configuration
   fn load() -> Self {
       AppConfig {
           database_url: "https://example.com/mydb".to_string(),
           app_mode: "production".to_string(),
       }
   }
}

fn setup_global_config() {
   let config = Box::new(AppConfig::load());
   let static_ref = Box::leak(config); // Leak the Box to get a static reference
   GLOBAL_CONFIG.set(static_ref).expect("Failed to set global configuration");
}

fn main() {
   setup_global_config();

   // Access the global configuration
   if let Some(config) = GLOBAL_CONFIG.get() {
       println!("Database URL: {}", config.database_url);
       println!("App Mode: {}", config.app_mode);
   }
}

Let’s imagine that your application needs to access some data stored in AppConfig throughout the application's life. This scenario can be accomplished by putting the config on the heap with a classic Box::new and then leaking it. Box::leak returns a &'static mut AppConfig, which is stored in the OnceCell as a shared &'static AppConfig. You can access the AppConfig fields like you would with a normal instance of this type. This scenario is particularly useful for read-only data which, once set, is read frequently across the whole application.

Note on OnceCell

The OnceCell type is provided by the utility crate once_cell and allows you to create values that are initialised exactly once. Once set, the value cannot be replaced, which gives you a safe and efficient way to manage immutable values and to defer initialization until it is needed.

We have multiple flavours of OnceCell:

  • OnceCell<T>: Value set once, typically used for lazy initialization (like in our example above).
  • Lazy<T>: A wrapper over OnceCell for values being lazily initialised on the first access.
  • Sync and unsync versions of OnceCell are available.
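For readers who prefer to avoid the extra dependency, the standard library ships close counterparts of these types: std::cell::OnceCell for the unsync flavour and std::sync::OnceLock for the sync one (both assume Rust 1.70 or newer). The sketch below is only an illustration; the names GREETING, greeting and local_cell_demo are made up:

```rust
use std::cell::OnceCell; // single-threaded flavour
use std::sync::OnceLock; // thread-safe flavour
use std::thread;

// A static must be Sync, so it needs the thread-safe flavour.
static GREETING: OnceLock<String> = OnceLock::new();

// Hypothetical accessor: initialises the value on the first call only.
fn greeting() -> &'static str {
    GREETING.get_or_init(|| "hello".to_string())
}

// The unsync flavour is enough for a value that never leaves one thread.
fn local_cell_demo() -> i32 {
    let cell: OnceCell<i32> = OnceCell::new();
    let first = *cell.get_or_init(|| 1);
    let second = *cell.get_or_init(|| 2); // closure is not run again
    first + second
}

fn main() {
    let handle = thread::spawn(|| greeting().len());
    println!("{} {}", greeting(), handle.join().unwrap());
    println!("{}", local_cell_demo());
}
```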

Example usage of OnceCell in a classical manner:

use once_cell::unsync::OnceCell;

fn main() {
   let cell = OnceCell::new();

   // Set the value
   cell.set(42).expect("Value has already been set");

   // Access the value
   if let Some(value) = cell.get() {
       println!("The value is: {}", value);
   }

   // Trying to set the value again will result in an error
   assert!(cell.set(100).is_err());
}

Lazy example:

use once_cell::sync::Lazy;

// Lazily initialize a static value
static CONFIG: Lazy<String> = Lazy::new(|| {
   println!("Initializing configuration...");
   "Config data".to_string()
});

fn main() {
   // The closure is only executed once, on the first access
   println!("CONFIG: {}", *CONFIG);
   println!("CONFIG: {}", *CONFIG); // The closure will not run again
}

You may wonder at this point why we would use Box::leak instead of Lazy for our global config example. The closure given to a static Lazy is fixed at the place where the static is declared, so it cannot capture values that only become known while the program runs (parsed command-line arguments, for example). If you have some complex data structure, built up during the execution of your program, which you want to share later through a static variable, one option is to wrap it in a Box and leak it into a 'static reference.
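Here is a rough sketch of that idea with a value that is only known at run time. It uses std::sync::OnceLock instead of once_cell to stay dependency-free, and Settings, SETTINGS and init_settings are hypothetical names:

```rust
use std::sync::OnceLock;

#[derive(Debug)]
struct Settings {
    workers: usize,
}

static SETTINGS: OnceLock<&'static Settings> = OnceLock::new();

// Hypothetical initialiser: `workers` is only known at run time
// (for example, parsed from the command line).
fn init_settings(workers: usize) -> &'static Settings {
    let leaked: &'static Settings = Box::leak(Box::new(Settings { workers }));
    // If another caller got there first, the existing value wins and
    // `leaked` stays allocated but unused.
    *SETTINGS.get_or_init(|| leaked)
}

fn main() {
    let workers = std::env::args().count(); // stand-in for real input
    let settings = init_settings(workers);
    println!("{:?}", settings);
}
```

Storing the Settings value directly in the cell would work too; leaking mainly pays off when the 'static reference has to be handed to code that knows nothing about the cell.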

Thread safe initialization:

use once_cell::sync::OnceCell;
use std::thread;

static GLOBAL_CONFIG: OnceCell<String> = OnceCell::new();

fn main() {
   let handle = thread::spawn(|| {
       GLOBAL_CONFIG.set("Config set by thread".to_string()).unwrap();
   });

   handle.join().unwrap();

   println!("Global config: {}", GLOBAL_CONFIG.get().unwrap());
}

While talking about threads: I have already mentioned that sync and unsync versions of OnceCell are available, but you need to be aware that the sync version helps us only during the initialization phase, guaranteeing that the initializing closure is executed only once, even in a multithreaded context. After that, both OnceCell and Lazy hand out shared references only, so if the stored value has to be mutated from several threads, that needs to be taken care of separately. Of course, we can easily solve that for our example with a Mutex, e.g.:

use once_cell::sync::Lazy;
use std::sync::Mutex;

struct MySingleton {
   // Example fields
   pub data: String,
}

// Global singleton instance
static SINGLETON: Lazy<Mutex<MySingleton>> = Lazy::new(|| {
   Mutex::new(MySingleton {
       data: "Initial data".to_string(),
   })
});

fn main() {
   // Access the singleton
   let singleton = SINGLETON.lock().unwrap();
   println!("Singleton data: {}", singleton.data);
}
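To see the Mutex at work once several threads write to the value, here is a small std-only sketch. It assumes std::sync::OnceLock as a stand-in for the once_cell types, and the names COUNTER, counter and bump_from_threads are made up:

```rust
use std::sync::{Mutex, OnceLock};
use std::thread;

static COUNTER: OnceLock<Mutex<u32>> = OnceLock::new();

// Hypothetical accessor for the lazily created, mutex-protected counter.
fn counter() -> &'static Mutex<u32> {
    COUNTER.get_or_init(|| Mutex::new(0))
}

// Spawns `threads` workers that each increment the counter once and
// returns how much the counter grew.
fn bump_from_threads(threads: u32) -> u32 {
    let before = *counter().lock().unwrap();
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(|| *counter().lock().unwrap() += 1))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let after = *counter().lock().unwrap();
    after - before
}

fn main() {
    println!("incremented {} times", bump_from_threads(4));
}
```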

When talking about OnceCell, it's worth mentioning another alternative available from std::sync called Once. This structure is similar in concept to OnceCell, but, compared to OnceCell, it doesn’t store a value. Once allows us to execute a given closure once but does not provide access to its result. This, of course, can be worked around with a static mutable variable, at the price of unsafe code (recent compilers also lint against taking references to a static mut, and std::sync::OnceLock covers the same need without unsafe), e.g.:

use std::sync::Once;

static INIT: Once = Once::new();
static mut VALUE: Option<String> = None;

fn main() {
   unsafe {
       INIT.call_once(|| {
           VALUE = Some("Hello, world!".to_string());
       });
       println!("{}", VALUE.as_ref().unwrap());
   }
}
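When only the side effect matters and no value has to be read back, Once can be used without any unsafe code. A minimal sketch, with made-up names INIT_RUNS and ensure_initialised, that counts how often the initialiser actually runs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;

static INIT: Once = Once::new();
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical setup routine that must only ever run once.
fn ensure_initialised() {
    INIT.call_once(|| {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
    });
}

fn main() {
    // Eight threads race to initialise; only one of them runs the closure.
    let handles: Vec<_> = (0..8).map(|_| thread::spawn(ensure_initialised)).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("initialiser ran {} time(s)", INIT_RUNS.load(Ordering::SeqCst));
    println!("completed: {}", INIT.is_completed());
}
```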

As you can imagine at this point, Box::leak can be used whenever you need to make something available for the duration of the program and you want to guarantee that the data itself won’t be deallocated prematurely. You can use it for config data which will be initialized once and then read, but you can also use it to store a global piece of data which you can mutate later too (since Box::leak returns a mutable reference). This can become useful in scenarios like a job queue or logging system, e.g.:

struct JobQueue {
   jobs: Vec<String>,
}

impl JobQueue {
   // Creates a new JobQueue
   fn new() -> Self {
       JobQueue {
           jobs: Vec::new(),
       }
   }

   // Adds a job to the JobQueue
   fn add_job(&mut self, job: String) {
       self.jobs.push(job);
   }

   // Displays all jobs in the queue
   fn display_jobs(&self) {
       for job in &self.jobs {
           println!("{}", job);
       }
   }
}

fn initialize_job_queue() -> &'static mut JobQueue {
   let queue = Box::new(JobQueue::new());
   // Box::leak returns &'static mut JobQueue, so keep the mutability
   Box::leak(queue)
}

fn main() {
   let queue = initialize_job_queue();

   // Add some jobs to the queue; the mutable reference makes unsafe
   // unnecessary (writing through a pointer cast from a shared
   // reference would be undefined behavior)
   queue.add_job("Job 1".to_string());
   queue.add_job("Job 2".to_string());

   // Display the jobs
   queue.display_jobs();
}
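A single &'static mut reference cannot be shared between threads, so for a queue that several workers push to, one possible approach is to leak a Mutex instead. This is only a sketch, and shared_queue and run_workers are hypothetical helpers:

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: leaks a mutex-protected queue so every thread can
// hold a plain `&'static` reference to it, with no Arc and no unsafe.
fn shared_queue() -> &'static Mutex<Vec<String>> {
    Box::leak(Box::new(Mutex::new(Vec::new())))
}

// Pushes one job per worker thread and returns the jobs in sorted order.
fn run_workers(workers: usize) -> Vec<String> {
    let queue = shared_queue();
    let handles: Vec<_> = (0..workers)
        .map(|id| thread::spawn(move || queue.lock().unwrap().push(format!("Job {id}"))))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let mut jobs = queue.lock().unwrap().clone();
    jobs.sort();
    jobs
}

fn main() {
    for job in run_workers(3) {
        println!("{job}");
    }
}
```

Note that every call to shared_queue leaks a fresh allocation, so in a real program it should be called once, not in a loop.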

Another useful scenario for Box::leak is when you need to create a simple callback mechanism. Instead of leaking a single piece of data, we will leak the full closure, as in the example below:

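use std::thread;

fn setup_callback<F: FnOnce() + Send + 'static>(f: F) -> thread::JoinHandle<()> {
   thread::spawn(move || {
       // Simulate doing some work...
       f(); // Execute the callback
   })
}

fn main() {
   let my_data = Box::new(|| println!("This is a callback!"));
   // Leaking yields a 'static mutable reference to the closure
   let static_callback = Box::leak(my_data);
   // A reference to a closure is callable itself, so pass it directly;
   // join so the program does not exit before the callback has run
   setup_callback(static_callback).join().unwrap();
}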
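If the same callback has to be invoked from several threads, a leaked trait object is one way to get a shareable &'static handle. The sketch below assumes a made-up Callback alias and the hypothetical helpers leak_callback and call_from_threads:

```rust
use std::thread;

type Callback = dyn Fn(u32) -> u32 + Send + Sync;

// Hypothetical helper: leaks any suitable closure into a `'static` callback.
fn leak_callback<F>(f: F) -> &'static Callback
where
    F: Fn(u32) -> u32 + Send + Sync + 'static,
{
    let boxed: Box<Callback> = Box::new(f);
    Box::leak(boxed)
}

// Calls the same leaked callback from several threads and sums the results.
fn call_from_threads(callback: &'static Callback, threads: u32) -> u32 {
    let handles: Vec<_> = (0..threads)
        .map(|n| thread::spawn(move || callback(n)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let offset = 10; // captured by the closure
    let callback = leak_callback(move |n| n + offset);
    println!("{}", call_from_threads(callback, 3));
}
```

Because a shared reference is Copy, each thread simply gets its own copy of the pointer to the one leaked closure.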

Leaking memory beyond Box::leak

Besides the well-known Box::leak, there are some other ways to take memory management into our own hands in Rust. These techniques are also controlled and intentional, and they are close cousins of Box::leak in terms of behavior.

Rc::into_raw and Arc::into_raw

These two are a bit exotic, at least for me. The into_raw functions convert reference-counted (Rc) or atomically reference-counted (Arc) pointers into raw pointers. Raw pointers are analogous to pointers in languages like C and C++, and allow direct memory access without the borrowing rules normally enforced by Rust. This can become useful when you interact with C code and need to hand over the ownership of those structures. Until the raw pointer is converted back, the reference count is not decremented and the value is not dropped. You can explicitly reclaim the memory by converting the raw pointer back to an Rc/Arc with from_raw, e.g.:

use std::rc::Rc;

fn main() {
   let data: Rc<i32> = Rc::new(123);
   let raw: *const i32 = Rc::into_raw(data);

   // Later, convert back to an Rc to avoid memory leak
   unsafe {
       let reconstructed = Rc::from_raw(raw);
       println!("{}", *reconstructed);
       // Rc is dropped here, and memory is deallocated
   }
}
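The same round trip works for Arc. The following sketch (round_trip is a hypothetical helper) also checks the strong count while the value is only reachable through the raw pointer, to show that into_raw does not release anything:

```rust
use std::sync::Arc;

// Hypothetical round trip: hand an Arc out as a raw pointer (as one might
// across an FFI boundary) and later take ownership back.
fn round_trip(value: String) -> (usize, String) {
    let shared = Arc::new(value);
    let keep = Arc::clone(&shared);

    // into_raw consumes `shared` without decrementing the strong count.
    let raw: *const String = Arc::into_raw(shared);
    let count_while_raw = Arc::strong_count(&keep); // still 2

    // SAFETY: `raw` came from Arc::into_raw and is converted back exactly once.
    let restored = unsafe { Arc::from_raw(raw) };
    (count_while_raw, restored.as_str().to_owned())
}

fn main() {
    let (count, text) = round_trip("shared".to_string());
    println!("strong count while raw: {count}, value: {text}");
}
```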

std::mem and std::ptr

Last but not least, while we are on the topic of interfacing with C, it's worth mentioning the existence of std::mem and std::ptr, together with std::alloc, which the example below uses for the allocation itself. With these modules, you can allocate, deallocate, and modify memory as you wish, similarly to what you can do in C/C++. However, you need to be really careful not to unintentionally leak memory for real, and you need to watch out for undefined behavior.

use std::ptr;

fn main() {
   unsafe {
       let layout = std::alloc::Layout::from_size_align(1024, 1).unwrap();
       let buffer = std::alloc::alloc(layout);
       if buffer.is_null() {
           // Allocation can fail; abort instead of writing through null
           std::alloc::handle_alloc_error(layout);
       }
       ptr::write(buffer, 42u8);  // Example write
       println!("{}", *buffer);

       // Manually deallocate if needed, or "leak" by never deallocating
       std::alloc::dealloc(buffer, layout);
   }
}
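On the std::mem side, the closest relative of Box::leak is probably std::mem::forget, which takes ownership of a value and never runs its destructor. A small sketch, with a made-up Tracked type that counts destructor calls:

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type that counts how many times its destructor runs.
struct Tracked {
    _buffer: Vec<u8>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Returns how many destructors ran for one dropped and one forgotten value.
fn drop_one_forget_one() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    drop(Tracked { _buffer: vec![0; 16] }); // destructor runs, buffer is freed
    mem::forget(Tracked { _buffer: vec![0; 16] }); // destructor skipped, buffer is leaked
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("destructors run: {}", drop_one_forget_one());
}
```

Unlike Box::leak, mem::forget does not give a reference back, so the forgotten heap buffer is unreachable right away.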

Real data leaking in Rust

At this point you may wonder if it's possible to actually leak memory in Rust with Box::leak, and indeed it is. Consider this example:

use std::thread;

fn main() {
   thread::spawn(|| {
       let leaked_data: &'static str = Box::leak(Box::new(String::from("I leak!")));
       println!("{}", leaked_data);
   })
       .join()
       .unwrap();
}

When executed, the String "I leak!" is placed on the heap (with Box::new) and then leaked with Box::leak, which returns a reference with the 'static lifetime. The data is not moved anywhere; it stays on the heap. The only reference to it lives inside the closure run by the spawned thread and is gone once that closure returns (which is what join waits for). In effect, the data persists after the thread finishes, but you are not able to access it anymore.

The important bit here is that with Box::leak we gave up ownership of the data we allocated on the heap with Box::new. The data physically remains on the heap, and since nothing owns it and no reference to it is left, it cannot be reclaimed until the program ends; hence we have a real memory leak.
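The leak only becomes permanent once the reference is lost. As long as you still hold it, the allocation can be handed back to a Box with Box::from_raw and freed normally. A minimal sketch of that, using a made-up Payload type that records when its destructor runs:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static FREED: AtomicBool = AtomicBool::new(false);

// Hypothetical payload that records when its destructor runs.
struct Payload(String);

impl Drop for Payload {
    fn drop(&mut self) {
        FREED.store(true, Ordering::SeqCst);
    }
}

// Leaks a payload, then reclaims it; returns (freed before, freed after).
fn leak_then_reclaim() -> (bool, bool) {
    FREED.store(false, Ordering::SeqCst);
    let leaked: &'static mut Payload = Box::leak(Box::new(Payload("I leak!".into())));
    println!("{}", leaked.0);
    let freed_before = FREED.load(Ordering::SeqCst);

    // SAFETY: the pointer came from Box::leak, no other reference to the
    // payload exists, and it is turned back into a Box exactly once.
    let reclaimed = unsafe { Box::from_raw(leaked as *mut Payload) };
    drop(reclaimed);
    (freed_before, FREED.load(Ordering::SeqCst))
}

fn main() {
    let (before, after) = leak_then_reclaim();
    println!("freed before reclaim: {before}, after: {after}");
}
```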
