A Deep Dive Into Strings in Rust

In many programming languages, manipulating strings is a crucial aspect of writing applications. The Rust programming language, known for its performance and safety, is no different. This article provides an in-depth exploration of strings in Rust, including the special notations and "tricks" that could simplify your coding experience.

Understanding Basic Strings in Rust

At its most basic level, a string in Rust is a sequence of Unicode scalar values encoded as UTF-8 bytes. String literals are written between double quotes "".

let s = "Hello, World!";

In this code snippet, s is a string slice (&str) that holds the text "Hello, World!".
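To make the UTF-8 representation concrete, here is a minimal sketch that exposes the bytes behind a string. The helper name utf8_bytes is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: copies the UTF-8 bytes that back a string slice.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.as_bytes().to_vec()
}

fn main() {
    let s = "Hello, World!";
    // len() reports the length in bytes, not in characters.
    println!("{} bytes", s.len());
    // For ASCII text, each character is stored as exactly one byte.
    println!("{:?}", &utf8_bytes(s)[..5]);
}
```

For pure ASCII text like this, byte length and character count are the same; that stops being true once non-ASCII characters appear.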

String Literals and String Slices

In Rust, a string literal is a string slice (&str) that points into the compiled binary, where the text is stored in read-only memory and is therefore immutable. Because that data stays valid for the entire run of the program, the slice has the 'static lifetime, which is why string literals are sometimes referred to as 'static strings'.

let s: &'static str = "Hello, World!";

Here, s is a string slice pointing to the string literal "Hello, World!".
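One practical consequence of the 'static lifetime is that a literal can be returned from a function without copying. The sketch below assumes two hypothetical helpers, greeting and prefix, to show that and to show borrowing a smaller slice from a literal.

```rust
/// A literal is valid for the whole program, so it can be returned
/// as `&'static str` without any allocation.
fn greeting() -> &'static str {
    "Hello, World!"
}

/// Borrows the first `n` bytes of a slice. Returns None if `n` is out of
/// range or does not fall on a character boundary.
fn prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    let s: &'static str = greeting();
    // The prefix is another slice into the same read-only data.
    println!("{:?}", prefix(s, 5));
}
```

Using get instead of indexing with s[..n] avoids a panic when the requested range is invalid.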

Raw Strings

In Rust, an r prefix before a string literal denotes a raw string. Raw strings do not process escape sequences, so a backslash is just a backslash and the text is taken exactly as written. This is helpful when you want to avoid escaping backslashes, for example in regular expressions or Windows file paths.

let s = r"C:\Users\YourUser\Documents";
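As a quick check, the following sketch compares the raw form with its escaped equivalent. The function names are made up for this example.

```rust
/// The path written as a regular literal: every backslash is doubled.
fn windows_path_escaped() -> &'static str {
    "C:\\Users\\YourUser\\Documents"
}

/// The same path written as a raw literal: no escaping needed.
fn windows_path_raw() -> &'static str {
    r"C:\Users\YourUser\Documents"
}

fn main() {
    // Both literals produce exactly the same text.
    assert_eq!(windows_path_escaped(), windows_path_raw());
    // In a raw string, \n is two characters: a backslash and the letter n.
    assert_eq!(r"\n".len(), 2);
    assert_eq!("\n".len(), 1);
    println!("{}", windows_path_raw());
}
```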

Byte Strings

Rust also has byte strings. They look like text strings but produce bytes rather than characters: prefixing a string literal with b yields a reference to a fixed-size byte array (&[u8; N]). A byte string may contain only ASCII characters and escape sequences.

let bs: &[u8; 4] = b"test"; // bs is a byte array: [116, 101, 115, 116]
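The sketch below shows how a byte string relates to ordinary text, and that it can hold bytes no &str could. The is_utf8 helper is hypothetical; it only wraps the standard std::str::from_utf8 check.

```rust
/// Hypothetical helper: true if the bytes form valid UTF-8 text.
fn is_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    let bs: &[u8; 4] = b"test";
    let expected: [u8; 4] = [116, 101, 115, 116];
    assert_eq!(*bs, expected);
    // For ASCII text, the bytes match the UTF-8 encoding of the same &str.
    assert_eq!(&bs[..], "test".as_bytes());

    // \xNN escapes can produce any byte, including ones that never
    // occur in valid UTF-8 such as 0xFF.
    let not_text: &[u8; 2] = b"\xFF\x00";
    assert!(is_utf8(bs));
    assert!(!is_utf8(not_text));
    println!("{:?}", not_text);
}
```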

Raw Byte Strings

A raw byte string combines the two: prefix a string literal with br to get a byte string in which backslashes are not treated as escapes. Because no escapes are processed, a raw byte string can contain only ASCII characters; to embed bytes that are not valid UTF-8, use a regular byte string with \xNN escapes, such as b"\xFF".

let raw_bs = br"\xFF"; // raw_bs is a byte array: [92, 120, 70, 70]
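The difference between the b and br prefixes is easiest to see side by side. This is a small sketch with hypothetical function names:

```rust
/// Regular byte string: \xFF is an escape for the single byte 0xFF.
fn escaped_bytes() -> &'static [u8] {
    b"\xFF"
}

/// Raw byte string: the same four characters are kept literally.
fn raw_bytes() -> &'static [u8] {
    br"\xFF"
}

fn main() {
    println!("{:?}", escaped_bytes()); // [255]
    println!("{:?}", raw_bytes()); // [92, 120, 70, 70]
}
```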

Escaping in Raw Strings

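If you need to include quotation marks in a raw string, put the same number of # symbols after the r and after the closing quote. The string then ends only at a quote followed by that many # symbols, so if the text itself contains "#, use two or more.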

let s = r#"This string contains "quotes"."#;
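Here is a short sketch of choosing the number of # symbols; the function names are invented for the example.

```rust
/// One # is enough when the text only contains plain quotes.
fn quoted() -> &'static str {
    r#"This string contains "quotes"."#
}

/// The text contains a quote followed by #, so two # symbols are needed.
fn with_quote_hash() -> &'static str {
    r##"A "# inside needs two hashes."##
}

fn main() {
    // The raw literal equals the escaped regular literal.
    assert_eq!(quoted(), "This string contains \"quotes\".");
    println!("{}", with_quote_hash());
}
```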

Multiline Raw Strings

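Raw strings can span multiple lines. The content begins immediately after the opening quote and ends at the closing quote followed by the matching number of # symbols, so every line break in between, including one placed right after the opening quote, is part of the string.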

let s = r####"
This string contains "quotes".
It also spans multiple lines.
"####;

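Because the line breaks next to the delimiters belong to the string, it is worth checking what a multiline raw literal really contains. A minimal sketch, with a hypothetical multiline function wrapping the literal:

```rust
/// Wraps a multiline raw literal so it can be inspected.
fn multiline() -> &'static str {
    r####"
This string contains "quotes".
It also spans multiple lines.
"####
}

fn main() {
    let s = multiline();
    // The newline after the opening quote and the one before the
    // closing quote are both part of the string.
    assert!(s.starts_with('\n'));
    assert!(s.ends_with('\n'));
    // trim() removes that surrounding whitespace, leaving two lines.
    assert_eq!(s.trim().lines().count(), 2);
    println!("{}", s.trim());
}
```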
Unicode Strings

String literals in Rust can also contain any valid Unicode character directly, because Rust source files are UTF-8.

let s = "Hello, 世界!";
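With non-ASCII text, the number of bytes and the number of characters diverge, and slicing works on byte offsets. The sketch below uses a hypothetical helper, byte_and_char_counts, to show both effects.

```rust
/// Returns (length in bytes, number of Unicode scalar values).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let s = "Hello, 世界!";
    let (bytes, chars) = byte_and_char_counts(s);
    // Each of the two CJK characters takes three bytes in UTF-8.
    println!("{} bytes, {} chars", bytes, chars);

    // Slice ranges are byte offsets and must land on character boundaries.
    assert_eq!(s.get(7..10), Some("世"));
    assert_eq!(s.get(7..8), None);
}
```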

Character Escapes

Regular (non-raw) string literals support several escape sequences:

  • \\ Backslash

  • \" Double quote

  • \n Newline

  • \r Carriage return

  • \t Tab

  • \0 Null

There are also Unicode escapes of the form \u{...}, which take one to six hexadecimal digits:

  • \u{7FFF} The Unicode character with code point U+7FFF

  • \u{1F600} The emoji with code point U+1F600 (😀)
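The escapes listed above can be verified by looking at the code points they produce. This is a small sketch; the code_points helper is hypothetical.

```rust
/// Returns the code point of every Unicode scalar value in `s`.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // Each simple escape stands for exactly one character.
    assert_eq!(
        code_points("\\\"\n\r\t\0"),
        vec![0x5C, 0x22, 0x0A, 0x0D, 0x09, 0x00]
    );
    // \u{...} names a character by its hexadecimal code point.
    assert_eq!(code_points("\u{7FFF}"), vec![0x7FFF]);
    assert_eq!("\u{1F600}", "😀");
    // U+1F600 is one character but four bytes in UTF-8.
    assert_eq!("\u{1F600}".len(), 4);
    println!("tab:[\t] emoji:[\u{1F600}]");
}
```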

Conclusion

In summary, Rust provides powerful and flexible tools for working with strings, from raw and byte strings to Unicode and escape sequences.