Defensive Copy? What Defensive Copy?

· 8 min read
Linh Nguyen

What is defensive copy? What defensive copy?

The Wild West Collections Era

The Stream API era has made people forget a time when we lived mostly with ArrayList, LinkedList, HashSet, TreeMap... you know, the classics. The good old days when we manually looped through everything like cavemen discovering fire.

What do those collections and maps have in common?

They are mutable, yep.

You could poke them, prod them, add stuff, remove stuff. Total chaos, but in a fun way.

The Stream API Made Us Soft

Stream pipelines hand us fresh collections all the time, many of them unmodifiable, so we stopped hearing much about defensive copies. We got comfortable. Too comfortable. We thought we were safe.

☢️ Trap Alert!!! ☢️

There are some gotchas about using collections that you need to pay attention to:

Differences between .toList() and .collect(Collectors.toList())

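Collectors.toList() makes no promise about the type or mutability of the list it returns, and in practice (on current OpenJDK builds) you get a plain ArrayList! Dangerous if you expected the result not to be modified! It's like ordering a locked safe and receiving a cardboard box instead. If you need an unmodifiable list, use Stream.toList() (Java 16+) or Collectors.toUnmodifiableList() (Java 10+). One difference between the two: Collectors.toUnmodifiableList() rejects null elements, while Stream.toList() allows them.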

Even SonarQube agrees with me on this one! Check out RSPEC-6204 where they literally say:

"Stream.toList()" method should be used instead of "collectors" when unmodifiable list needed

When the linters are throwing shade at your code, you know it's serious.

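// names is any Collection<String>; each pipeline needs its own stream,
// because a stream can only be consumed once

// The trap (brought to you by Java's commitment to backwards compatibility)
var mutableNames = names.stream().collect(Collectors.toList());
// Oh no, someone can do mutableNames.add("chaos")!

// The safe way (Java 16+)
var safeNames = names.stream().toList(); // Actually unmodifiable! Finally!

// The verbose but safe way (Java 10+)
var verboseNames = names.stream().collect(Collectors.toUnmodifiableList());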
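
If you'd rather see the difference than take my word for it, here is a minimal sketch that pokes all three results with add(). The class name ToListDemo and the helper acceptsAdd are made up for this example, and the first result assumes a current OpenJDK build, since the spec doesn't promise an ArrayList.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToListDemo {

    // Hypothetical helper: true if the list accepts add(), false if it rejects the call.
    static boolean acceptsAdd(List<String> list) {
        try {
            list.add("chaos");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> collected = Stream.of("a", "b").collect(Collectors.toList());
        List<String> viaToList = Stream.of("a", "b").toList(); // Java 16+
        List<String> viaCollector =
            Stream.of("a", "b").collect(Collectors.toUnmodifiableList()); // Java 10+

        System.out.println(acceptsAdd(collected));    // true on current OpenJDK builds
        System.out.println(acceptsAdd(viaToList));    // false
        System.out.println(acceptsAdd(viaCollector)); // false
    }
}
```
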
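
The Trollish Arrays.asList Shenanigan

Arrays.asList is a convenient way to wrap an array, but the list it returns is a fake "ArrayList" (java.util.Arrays$ArrayList: same simple name, different FQCN). It is fixed-size, so add and remove throw UnsupportedOperationException, but set is allowed, and every write goes straight through to the backing array (and the other way around). Yikes! It's like looking in a mirror that punches you back.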

var arr = new String[] {"Java", "Python", "Go"};
var list = Arrays.asList(arr); // Looks innocent enough...
list.set(0, "Kotlin"); // This changes arr too! Surprise!
System.out.println(arr[0]); // Prints "Kotlin" 😱

// Use List.of instead (the adult in the room)
var safeList = List.of(arr); // An unmodifiable copy, no tricks (null elements are rejected)
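
To make the "fixed-size view" versus "independent copy" distinction concrete, here is a small sketch. AsListDemo and canGrow are names invented for this example; everything else is plain JDK.

```java
import java.util.Arrays;
import java.util.List;

public class AsListDemo {

    // Hypothetical helper: tries to grow the list and reports whether it was allowed.
    static boolean canGrow(List<String> list) {
        try {
            list.add("Rust");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] arr = {"Java", "Python", "Go"};
        List<String> view = Arrays.asList(arr);

        arr[1] = "Scala";                  // write to the array...
        System.out.println(view.get(1));   // ...and the list sees it: Scala

        System.out.println(canGrow(view)); // false: the view is fixed-size

        List<String> snapshot = List.of(arr); // copies the elements
        arr[2] = "Zig";
        System.out.println(snapshot.get(2));  // Go: the copy is unaffected
    }
}
```
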

Collectors.groupingBy, The Gift That Keeps On Giving (Mutability)

Collectors.groupingBy doesn't return an immutable map by default either. The spec makes no guarantees about the Map or the List values it produces, and in practice you get the worst of the worst: a HashMap whose values are ArrayLists! Doubly mutable! Heresy!

Swapping the downstream collector only fixes half of the problem:

// The lists are unmodifiable now, but the map holding them is still a HashMap! Got you again!
Collectors.groupingBy(
    keyMapper,
    Collectors.toUnmodifiableList());

// Collectors.toUnmodifiableMap() is still susceptible
// to value mutation, if the value mapper does not
// give an immutable collection
// It's turtles all the way down, folks
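
One way to climb out of the turtle stack, as a minimal sketch: keep Collectors.toUnmodifiableList() as the downstream collector and wrap the finished map with Collectors.collectingAndThen. The class GroupingDemo and the method groupByLength are hypothetical names for this example; grouping strings by length is just a stand-in for your own key mapper.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {

    // Groups words by length. Both the outer map and the list values reject modification.
    static Map<Integer, List<String>> groupByLength(List<String> words) {
        return words.stream().collect(
            Collectors.collectingAndThen(
                Collectors.groupingBy(String::length, Collectors.toUnmodifiableList()),
                Collections::unmodifiableMap));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> grouped = groupByLength(List.of("go", "java", "rust"));
        System.out.println(grouped.get(4)); // [java, rust]

        try {
            grouped.put(1, List.of("x"));
        } catch (UnsupportedOperationException e) {
            System.out.println("map is read-only");
        }
        try {
            grouped.get(4).add("ruby");
        } catch (UnsupportedOperationException e) {
            System.out.println("values are read-only");
        }
    }
}
```

Collections.unmodifiableMap only wraps the HashMap, but since nobody else holds a reference to that HashMap, the wrapper is effectively immutable. Map::copyOf should work as the finisher too, at the cost of one more copy.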

When You Need to Go Commando with Mutable Collections

But what if you need to work "nakedly" with plain, mutable collection implementations, like a mutable ArrayList or a rebellious HashMap? Defensive copy comes to the rescue!

The Problem: Shallow Immutability

In Java, a final field is only shallowly immutable: the reference (or the value, for primitives) stays fixed, but nothing prevents a mischievous caller from getting hold of that reference and messing with the contents.

public class BadDataHolder {

    // Being final here won't save your data from modification!
    // It's like putting a "Do Not Touch" sign on a public buffet
    private final String[] data;

    public BadDataHolder(String[] data) {
        // Oops, we keep the caller's array: whatever they change later shows up in here too!
        this.data = data;
    }

    public String[] getData() {
        // Uh oh, direct exposure! We're basically handing out the keys to the kingdom
        return data;
    }
}

// Meanwhile, in villain headquarters...
var secretData = new String[] {"password123", "admin", "secret"};
var holder = new BadDataHolder(secretData);
var exposed = holder.getData();

// The original is now compromised! *evil laughter* 😈
exposed[0] = "HACKED";
// Your "secure" holder just became a security theater

From the code above, while you cannot reassign the field to a different array, anyone holding the reference can freely mutate the array's contents. It's like having a vault with a door that can't be replaced, but the door is wide open.

The Solution: Copy Everything Like You're Being Paranoid

Solution? Copy the field before you return it, and copy the input as well if you feel extra cautious.

See here:

public class GoodDataHolder {

    private final String[] data;

    public GoodDataHolder(String[] data) {
        // Defensive copy on input
        // Trust no one, not even the constructor caller
        this.data = new String[data.length];
        System.arraycopy(data, 0, this.data, 0, data.length);
    }

    public String[] getData() {
        // Defensive copy on output
        // Still trusting no one, good policy
        var copy = new String[data.length];

        // Here's YOUR copy, go wild with it
        System.arraycopy(data, 0, copy, 0, data.length);
        return copy;
    }
}

They can retrieve a copy of the original content and do whatever they want with it, while the original remains intact. Problem solved! You can sleep peacefully at night.
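
Here is a quick sanity check of that claim, as a sketch. The nested Holder class is a hypothetical, shorter cousin of GoodDataHolder that uses clone() instead of System.arraycopy; for arrays the two approaches produce the same kind of shallow copy.

```java
public class DefensiveCopyDemo {

    // Hypothetical holder: same idea as GoodDataHolder, written with clone().
    static final class Holder {
        private final String[] data;

        Holder(String[] data) {
            this.data = data.clone(); // defensive copy on input
        }

        String[] getData() {
            return data.clone();      // defensive copy on output
        }
    }

    public static void main(String[] args) {
        String[] source = {"password123", "admin", "secret"};
        Holder holder = new Holder(source);

        source[0] = "HACKED";           // attack via the caller's array
        holder.getData()[1] = "HACKED"; // attack via a returned copy

        System.out.println(String.join(",", holder.getData()));
        // password123,admin,secret
    }
}
```

Keep in mind that both copies are shallow. That is fine for String elements, which are immutable, but an array of mutable objects would need its elements copied as well.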

Tip

However, you can relax this rule if the argument you receive was itself created by an immutable factory method. List.of()? Go with the flow. Map.ofEntries()? Go wild, never worry.

Still, List.copyOf and friends (Set.copyOf, Map.copyOf) are smart about this: when the argument is already one of the JDK's unmodifiable collections, they generally return it as-is instead of copying, so... better safe than sorry, I guess?

var immutable = List.of("A", "B", "C");
var copy = List.copyOf(immutable);
// copy == immutable (same reference!) on current OpenJDK builds,
// because it's already immutable
// (the Javadoc only promises that a copy is "generally" not made)
// Java being smart for once! It won't waste time
// copying what's already safe, thank you JDK maintainers

But what if you need to mutate the collection internally and cannot let others jeopardize your work? Then copying the value your getters return is a MUST:

class InternalShenanigan {

    // Default initial value: a mutable ArrayList
    private List<String> works = new ArrayList<>();

    public InternalShenanigan() {
        // Default constructor does nothing
    }

    public InternalShenanigan(List<String> works) {
        // You truly need mutable collections
        this.works = new ArrayList<>(works);
    }

    public void process() {
        // Do something with the works above
    }

    public List<String> getWorks() {
        // Don't let outside noise distract you!
        return List.copyOf(works);
        // Can also return a mutable copy like this:
        // return new ArrayList<>(works);
        // But only if the caller needs to modify their own copy
        // (and they're too lazy to wrap it in ArrayList themselves, how dare they)
    }
}
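
A word of caution on "copy" versus "wrap", as a small sketch: Collections.unmodifiableList gives callers a read-only view that still follows your internal mutations, while List.copyOf hands out a snapshot. The class ViewVersusCopy and its methods are hypothetical stand-ins for the getter above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ViewVersusCopy {

    // Internal, mutable state (stand-in for the works field).
    private final List<String> works = new ArrayList<>(List.of("draft"));

    // Read-only window onto the live list.
    List<String> worksView() {
        return Collections.unmodifiableList(works);
    }

    // Independent, unmodifiable copy taken at call time.
    List<String> worksSnapshot() {
        return List.copyOf(works);
    }

    // Internal mutation, as process() might do.
    void process() {
        works.add("final");
    }

    public static void main(String[] args) {
        ViewVersusCopy holder = new ViewVersusCopy();
        List<String> view = holder.worksView();
        List<String> snapshot = holder.worksSnapshot();

        holder.process(); // mutate after both getters were called

        System.out.println(view);     // [draft, final]
        System.out.println(snapshot); // [draft]
    }
}
```

Neither result lets the caller modify anything, so both protect your data. The snapshot is the one to pick when callers should not watch your list change underneath them, for example while iterating on another thread.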

The Price of Safety

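Sure, copying will incur some performance overhead. But here's the kicker: the JDK's unmodifiable collections can sometimes be optimized better by the JVM! When one of them sits somewhere the JIT treats as a constant (a static final field, for example), HotSpot is allowed to assume its contents never change and may constant-fold reads through it, an assumption it simply cannot make with mutable structures. Treat that as a possible bonus, not a guarantee.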

So you sacrifice a tiny bit of performance upfront (copying data), but you get a MUCH better return on investment:

  • Data integrity: Your data stays yours, untouched, unmolested

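  • Thread safety: An unmodifiable collection can be shared across threads without anyone mutating it under you (the elements themselves still have to be immutable or thread-safe)

  • Possible JVM optimizations: Constant folding and other JIT tricks can kick in, in some cases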

  • Peace of mind: Sleep soundly knowing your data is safe

  • Clearer code: Immutability makes reasoning about code so much easier

And thanks to List.copyOf and its gang's smart decision to return the same reference for already immutable collections, the overhead is minimal when you're already working with immutable data structures.

Or better yet, favor immutable collection types in place of raw arrays if possible. Modern Java gives you all the tools you need!

You might get the occasional UnsupportedOperationException when someone tries to modify your immutable collections (and they'll learn their lesson real quick). But that's a feature, not a bug! It's the universe telling them "No. Bad developer. No biscuit."

Your data is worth more than a few nanoseconds of copying time. Probably. Maybe. Okay, it depends on your use case, but you get the idea! And honestly, with those JVM optimizations, you might even come out ahead in the long run. Win-win!