Leaking memory on purpose in Rust
Today, we will tackle the topic of memory management and how you can ‘leak’ memory on purpose in Rust. I’m new to the Rust ecosystem, and when I saw that there is a method called leak on a smart pointer type like Box, I got interested in why it's there and where it can be used. There are plenty of other topics associated with these mechanisms and how they can be used, so buckle up and read on. I hope I won’t make it too long.
Leak is not really leaking memory, or is it?
If you come, like me, from a background other than systems programming, you are probably more accustomed to the term leaking memory as it is used in the JVM or other managed environments like .NET, where leaks usually occur due to unintended references that prevent the garbage collector from reclaiming memory which is no longer needed.
In Rust, with Box::leak, we are not really leaking anything; we just deliberately convert a Box<T> into a mutable reference with a 'static lifetime, &'static mut T, thereby preventing whatever is managed by the Box from being deallocated.
Because of the conversion to a reference which cannot be deallocated anymore, we actually did kind of leak the memory in a similar fashion to other languages. The only difference is that we did it on purpose, whereas classic memory leaks are undesirable due to their potential to consume system resources unnecessarily.
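To make the effect visible, here is a minimal sketch that counts destructor calls. The Noisy type, the DROPS counter and the leak_noisy helper are made up for illustration; only Box::new and Box::leak come from the standard library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Noisy::drop` has run (hypothetical helper).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Noisy(u32);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Leaks a boxed `Noisy` and returns the resulting `'static` reference.
fn leak_noisy(value: u32) -> &'static mut Noisy {
    Box::leak(Box::new(Noisy(value)))
}

fn main() {
    {
        let _scoped = Box::new(Noisy(1));
    } // `_scoped` goes out of scope here, so its destructor runs.

    let leaked = leak_noisy(2);
    leaked.0 += 40; // The reference is mutable and stays valid for the rest of the program.

    println!("leaked value: {}", leaked.0);
    // Only the scoped box was dropped; the leaked one never will be.
    println!("destructors run: {}", DROPS.load(Ordering::SeqCst));
}
```

The leaked value behaves like any other mutable reference; the only difference is that nobody owns it anymore, so its destructor is never called.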
Box::leak usage examples
When investigating this topic, I found a couple of interesting usage scenarios for this method. The first one is about sharing configuration data across your application.
Global config example
use once_cell::sync::OnceCell;

static GLOBAL_CONFIG: OnceCell<&'static AppConfig> = OnceCell::new();

#[derive(Debug)]
struct AppConfig {
    database_url: String,
    app_mode: String,
}

impl AppConfig {
    // Simulates loading a configuration
    fn load() -> Self {
        AppConfig {
            database_url: "https://example.com/mydb".to_string(),
            app_mode: "production".to_string(),
        }
    }
}

fn setup_global_config() {
    let config = Box::new(AppConfig::load());
    let static_ref = Box::leak(config); // Leak the Box to get a static reference
    GLOBAL_CONFIG.set(static_ref).expect("Failed to set global configuration");
}

fn main() {
    setup_global_config();
    // Access the global configuration
    if let Some(config) = GLOBAL_CONFIG.get() {
        println!("Database URL: {}", config.database_url);
        println!("App Mode: {}", config.app_mode);
    }
}
Let’s imagine that your application needs to access some data stored in AppConfig throughout the application's life. This scenario can be accomplished by allocating the config on the heap with a classic Box::new and then leaking it, which gives us a &'static mut AppConfig that is stored in the OnceCell as a shared &'static AppConfig. You can access the AppConfig fields like you would with a normal instance of this type. This scenario is particularly useful for read-only data which, once set, is read frequently across the whole application.
Note on OnceCell
The OnceCell type is provided by the utility crate once_cell and allows you to create one-time initialised values. Once set, the value cannot be changed, which gives you a safe and efficient way to manage immutable values and defer initialization until it is needed.
The once_cell crate comes in multiple flavours:
- OnceCell<T>: a value that is set once, typically used for lazy initialization (like in our example above).
- Lazy<T>: a wrapper over OnceCell for values that are lazily initialised on the first access.
- Sync and unsync versions of both types are available.
Example usage of OnceCell in a classical manner:
use once_cell::unsync::OnceCell;

fn main() {
    let cell = OnceCell::new();
    // Set the value
    cell.set(42).expect("Value has already been set");
    // Access the value
    if let Some(value) = cell.get() {
        println!("The value is: {}", value);
    }
    // Trying to set the value again will result in an error
    assert!(cell.set(100).is_err());
}
Lazy example:
use once_cell::sync::Lazy;

// Lazily initialize a static value
static CONFIG: Lazy<String> = Lazy::new(|| {
    println!("Initializing configuration...");
    "Config data".to_string()
});

fn main() {
    // The closure is only executed once, on the first access
    println!("CONFIG: {}", *CONFIG);
    println!("CONFIG: {}", *CONFIG); // The closure will not run again
}
You may wonder at this point why we would use Box::leak instead of Lazy for our global config example. The initializer of a static Lazy does run at runtime, on the first access, but it has to be written where the static is declared and cannot capture local variables, so it cannot use values that only exist inside main, such as parsed command-line arguments. If you have a complex data structure that is built up during the execution of your program and that you want to share later as if it were a static variable, one simple option is to wrap it in a Box and leak it into a 'static reference.
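As a rough sketch of that last case, the snippet below builds a value from command-line arguments and leaks it. The Settings type and the leak_settings helper are hypothetical names used only for this illustration.

```rust
use std::thread;

struct Settings {
    name: String,
    verbose: bool,
}

/// Builds `Settings` from values only known at runtime and leaks them.
fn leak_settings(args: &[String]) -> &'static Settings {
    let settings = Settings {
        name: args.first().cloned().unwrap_or_else(|| "default".to_string()),
        verbose: args.iter().any(|arg| arg == "--verbose"),
    };
    // `Box::leak` returns `&'static mut Settings`; returning it as a shared
    // reference makes the data read-only from here on.
    Box::leak(Box::new(settings))
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let settings = leak_settings(&args);

    // A shared `'static` reference is `Copy`, so it can be moved into a thread
    // without `Arc` or scoped threads.
    thread::spawn(move || {
        println!("name = {}, verbose = {}", settings.name, settings.verbose);
    })
    .join()
    .unwrap();

    // The original reference is still usable here.
    println!("still available: {}", settings.name);
}
```

Because the reference is 'static, it can be handed to any number of threads or stored in long-lived structures without lifetime parameters.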
Thread safe initialization:
use once_cell::sync::OnceCell;
use std::thread;

static GLOBAL_CONFIG: OnceCell<String> = OnceCell::new();

fn main() {
    let handle = thread::spawn(|| {
        GLOBAL_CONFIG.set("Config set by thread".to_string()).unwrap();
    });
    handle.join().unwrap();
    println!("Global config: {}", GLOBAL_CONFIG.get().unwrap());
}
While talking about threads, I have already mentioned the sync and unsync versions of OnceCell. The sync version guarantees that initialization happens exactly once, even when several threads race to initialise the value. Once the value is set, any number of threads can read it through a shared reference, but neither OnceCell nor Lazy lets you mutate it. If the stored value has to change after initialization, you need interior mutability. We can easily solve that for our example with a Mutex, e.g.,:
use once_cell::sync::Lazy;
use std::sync::Mutex;

struct MySingleton {
    // Example fields
    pub data: String,
}

// Global singleton instance
static SINGLETON: Lazy<Mutex<MySingleton>> = Lazy::new(|| {
    Mutex::new(MySingleton {
        data: "Initial data".to_string(),
    })
});

fn main() {
    // Access the singleton
    let singleton = SINGLETON.lock().unwrap();
    println!("Singleton data: {}", singleton.data);
}
When talking about OnceCell, it's worth mentioning another alternative available from std::sync called Once. This structure is similar in concept to OnceCell, but it doesn’t store a value. Once allows us to execute a given closure once but does not provide access to its result. This, of course, can be worked around with a static mutable variable, at the price of an unsafe block, e.g.,:
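use std::ptr::addr_of;
use std::sync::Once;

static INIT: Once = Once::new();
static mut VALUE: Option<String> = None;

fn main() {
    // SAFETY: VALUE is written exactly once, inside call_once, and only read afterwards.
    unsafe {
        INIT.call_once(|| {
            VALUE = Some("Hello, world!".to_string());
        });
        // Read through a raw pointer: taking a reference to a `static mut` directly
        // is linted against (denied by default in the 2024 edition).
        let value = &*addr_of!(VALUE);
        println!("{}", value.as_ref().unwrap());
    }
}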
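If you would rather avoid unsafe altogether, the standard library also ships std::sync::OnceLock (stable since Rust 1.70), which combines the run-once guarantee with storage for the value. A minimal sketch; the value helper is a made-up name for this example:

```rust
use std::sync::OnceLock;

// Starts out empty; the first caller of `value` fills it in.
static VALUE: OnceLock<String> = OnceLock::new();

/// Returns the global value, initialising it on the first call.
fn value() -> &'static str {
    VALUE.get_or_init(|| "Hello, world!".to_string())
}

fn main() {
    println!("{}", value());

    // The cell is already initialised, so a later `set` is rejected
    // and the stored value stays untouched.
    assert!(VALUE.set("Something else".to_string()).is_err());
    println!("{}", value());
}
```

The API mirrors once_cell::sync::OnceCell closely, so moving between the two is mostly a matter of changing the import.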
As you can imagine at this point, Box::leak can be used whenever you need to make something available for the duration of the program and you want to guarantee that the data itself won’t be deallocated prematurely. You can use it for config data which will be initialized once and then read, but you can also use it to store a global piece of data which you can mutate later too. This can become useful in scenarios like a job queue or logging system, e.g.,:
struct JobQueue {
    jobs: Vec<String>,
}

impl JobQueue {
    // Creates a new JobQueue
    fn new() -> Self {
        JobQueue {
            jobs: Vec::new(),
        }
    }

    // Adds a job to the JobQueue
    fn add_job(&mut self, job: String) {
        self.jobs.push(job);
    }

    // Displays all jobs in the queue
    fn display_jobs(&self) {
        for job in &self.jobs {
            println!("{}", job);
        }
    }
}

// Box::leak already returns a mutable reference, so we keep it mutable.
// Casting a shared reference to *mut and writing through it would be undefined behavior.
fn initialize_job_queue() -> &'static mut JobQueue {
    let queue = Box::new(JobQueue::new());
    Box::leak(queue)
}

fn main() {
    let queue = initialize_job_queue();
    // Add some jobs to the queue; no unsafe is needed
    queue.add_job("Job 1".to_string());
    queue.add_job("Job 2".to_string());
    // Display the jobs
    queue.display_jobs();
}
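A &'static mut reference is exclusive, so it cannot be handed to several threads at once. For a queue that many threads push into, one option is to leak a Mutex and share the resulting &'static reference. This is only a sketch; shared_queue and run_workers are hypothetical helpers, and the queue is simplified to a plain Vec<String>.

```rust
use std::sync::Mutex;
use std::thread;

/// Leaks a mutex-protected queue so that every thread can share it.
fn shared_queue() -> &'static Mutex<Vec<String>> {
    let queue: Mutex<Vec<String>> = Mutex::new(Vec::new());
    Box::leak(Box::new(queue))
}

/// Spawns `workers` threads that each push one job, then waits for all of them.
fn run_workers(queue: &'static Mutex<Vec<String>>, workers: usize) {
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            // The `'static` shared reference is `Copy`, so each thread gets its own copy.
            thread::spawn(move || {
                queue.lock().unwrap().push(format!("Job {id}"));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    let queue = shared_queue();
    run_workers(queue, 4);
    println!("{} jobs queued", queue.lock().unwrap().len());
}
```

The Mutex provides the synchronisation, while the leak only takes care of the lifetime, which is why no Arc is needed here.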
Another useful scenario for Box::leak is when you need to create a simple callback mechanism. Instead of leaking a single piece of data, we will leak the full closure, as in the example below:
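use std::thread;

fn setup_callback<F: FnOnce() + Send + 'static>(f: F) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Simulate doing some work...
        f(); // Execute the callback
    })
}

fn main() {
    let my_callback = Box::new(|| println!("This is a callback!"));
    // Leak the closure to get a 'static reference, which is itself callable
    let static_callback = Box::leak(my_callback);
    // Pass the reference (not a copy of the closure) and wait for the thread,
    // otherwise main may return before the callback runs
    setup_callback(static_callback).join().unwrap();
}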
Leaking memory beyond Box::leak
Besides the well-known Box::leak, there are some other ways we can hack memory in Rust. These hacks are also controlled and intentional, and they are close cousins of Box::leak in terms of behavior.
Rc::into_raw and Arc::into_raw
Those two are a bit exotic, at least for me. The into_raw functions convert reference-counted (Rc) or atomically reference-counted (Arc) pointers into raw pointers. Raw pointers are analogous to pointers in languages like C and C++, and allow direct memory access without the borrowing rules normally enforced by Rust. This can become useful when you interact with C code and need to hand over the ownership of those structures. If the raw pointer is never converted back, the reference count never reaches zero and the value is leaked; to reclaim the memory, convert the raw pointer back to an Rc/Arc with from_raw, e.g.,:
use std::rc::Rc;

fn main() {
    let data: Rc<i32> = Rc::new(123);
    let raw: *const i32 = Rc::into_raw(data);
    // Later, convert back to an Rc to avoid memory leak
    unsafe {
        let reconstructed = Rc::from_raw(raw);
        println!("{}", *reconstructed);
        // Rc is dropped here, and memory is deallocated
    }
}
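The Arc variant works the same way. The sketch below also shows that into_raw does not touch the reference count: the raw pointer simply stands in for one of the owners. The read_through_raw function is a made-up stand-in for foreign code that only understands raw pointers.

```rust
use std::sync::Arc;

/// Stand-in for foreign code that only receives a raw pointer (hypothetical).
fn read_through_raw(ptr: *const String) -> usize {
    // SAFETY: the caller guarantees that `ptr` came from `Arc::into_raw`
    // and that the allocation is still alive.
    let text: &String = unsafe { &*ptr };
    text.len()
}

/// Returns the length read through the raw pointer and the strong count
/// observed after the pointer was turned back into an `Arc`.
fn round_trip() -> (usize, usize) {
    let shared = Arc::new(String::from("shared"));
    let keep = Arc::clone(&shared); // Two owners now.

    // One owner becomes a raw pointer; the count is not decremented.
    let raw: *const String = Arc::into_raw(shared);
    let len = read_through_raw(raw);

    // SAFETY: `raw` came from `Arc::into_raw` and is converted back exactly once.
    let restored = unsafe { Arc::from_raw(raw) };
    let count = Arc::strong_count(&restored);

    drop(keep);
    (len, count)
    // `restored` is dropped here as the last owner, so the String is freed.
}

fn main() {
    let (len, count) = round_trip();
    println!("length = {len}, strong count = {count}");
}
```

Skipping the from_raw call would leave the count stuck above zero, which is exactly the controlled leak described above.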
std::mem and std::ptr
Last but not least, while we are on the topic of interfacing with C, it's worth mentioning the existence of std::mem and std::ptr. Together with std::alloc, these modules let you allocate, deallocate, and modify memory as you wish, similarly to what you can do in C/C++. However, you need to be really careful not to unintentionally leak memory for real and watch for undefined behavior.
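use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

fn main() {
    let layout = Layout::from_size_align(1024, 1).unwrap();
    // SAFETY: the layout has a non-zero size, the pointer is checked for null
    // before use, and it is deallocated with the same layout.
    unsafe {
        let buffer = alloc(layout);
        if buffer.is_null() {
            handle_alloc_error(layout); // Allocation failed
        }
        ptr::write(buffer, 42u8); // Example write
        println!("{}", *buffer);
        // Manually deallocate if needed, or "leak" by never deallocating
        dealloc(buffer, layout);
    }
}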
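On the std::mem side, the two tools most closely related to leaking are mem::forget and ManuallyDrop. Both stop Rust from running a destructor automatically. A small sketch, with forget_buffer and manual_drop being names invented for this example:

```rust
use std::mem::{self, ManuallyDrop};

/// Allocates a buffer and deliberately forgets it.
fn forget_buffer() -> usize {
    let buffer = vec![0u8; 1024];
    let len = buffer.len();
    // The destructor never runs, so the heap allocation is never freed.
    mem::forget(buffer);
    len
}

/// Wraps a String so that dropping it becomes an explicit decision.
fn manual_drop() -> String {
    let mut text = ManuallyDrop::new(String::from("manual"));
    text.push_str(" drop"); // `ManuallyDrop` derefs to the inner value.
    let copy: String = (*text).clone();

    // Without this call the String would be leaked.
    // SAFETY: `text` is dropped exactly once and never used afterwards.
    unsafe { ManuallyDrop::drop(&mut text) };
    copy
}

fn main() {
    println!("forgot {} bytes", forget_buffer());
    println!("{}", manual_drop());
}
```

Note that mem::forget is a safe function: leaking memory is not considered a memory-safety violation in Rust, only a resource problem.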
Real data leaking in Rust
At this point you may wonder if it's possible to actually leak memory for real in Rust with Box::leak, and indeed it is. Consider this example:
use std::thread;

fn main() {
    thread::spawn(|| {
        let leaked_data: &'static str = Box::leak(Box::new(String::from("I leak!")));
        println!("{}", leaked_data);
    })
    .join()
    .unwrap();
}
When executed, the String "I leak!" is created on the heap and leaked with Box::leak. The data itself never moves; it stays on the heap, and only the reference to it gets the 'static lifetime. That reference lives in a local variable of the closure running in the spawned thread, so it is gone as soon as the closure returns. In effect, the data persists after the thread finishes, but you are not able to access it anymore.
The important bit here is that with Box::leak, we gave up ownership of the data we allocated on the heap with Box::new, so nothing is left to run its destructor. The data physically remains on the heap and is treated as having a 'static lifetime. Once the last reference to it is lost, it cannot be reclaimed until the program ends, hence we have a real memory leak.
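The flip side is that a leak stays reversible for as long as you hold on to the reference. The sketch below, which assumes the reference returned by Box::leak is still around and is the only one, hands the allocation back to a Box with Box::from_raw. The leak_and_reclaim function is a hypothetical name.

```rust
/// Leaks a String, mutates it through the `'static` reference,
/// then takes ownership back so it can be freed normally.
fn leak_and_reclaim() -> String {
    let leaked: &'static mut String = Box::leak(Box::new(String::from("I leak!")));
    leaked.push_str(" Or do I?");

    // SAFETY: `leaked` came from `Box::leak`, no other reference to the
    // allocation exists, and `leaked` is not used after this point.
    let boxed: Box<String> = unsafe { Box::from_raw(leaked) };

    // Moving the String out of the Box releases the Box allocation;
    // the String itself is freed when the caller drops it.
    *boxed
}

fn main() {
    println!("{}", leak_and_reclaim());
}
```

This only works while the reference is reachable; in the thread example above it is lost when the closure returns, and with it the last chance to free the data.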