06: Strings and Unicode
Audience: All Time: 90 minutes Prerequisites: 01-05 You'll learn: String operations, Unicode, formatting, pattern matching with regex
The Big Picture
Strings are how programs work with text. Zebra treats strings as: - Unicode-aware — Emoji, Chinese, Arabic, etc. all work correctly - First-class — Rich library of methods - Immutable — Can't change them after creation (create new ones instead) - Efficient — UTF-8 encoding optimizes storage
String Basics
String Literals
// file: 06_string_basics.zbr
// teaches: string creation // chapter: 06-Strings-and-Unicode
class Main shared def main // Simple string var greeting = "Hello" print greeting
// With quotes inside var quoted = "She said \"Hello\"" print quoted
// Multi-line (if supported) var poem = """ Roses are red Violets are blue """ print poem
// Escape sequences var path = "C:\\Users\\Name\\Documents" var tab = "Name\tAge\tCity" var newline = "Line1\nLine2"
String Properties
// file: 06_string_props.zbr
// teaches: string properties and methods // chapter: 06-Strings-and-Unicode
class Main shared def main var text = "Hello, World!"
// Length print text.len // 13
// Character at index var first_char = text[0] print first_char // H
// Substring/slice var part = text[0..4] print part // Hello
// Case conversion print text.upper() // HELLO, WORLD! print text.lower() // hello, world!
String Interpolation
// file: 06_interpolation.zbr
// teaches: string interpolation // chapter: 06-Strings-and-Unicode
class Main shared def main var name = "Alice" var age = 30
// Simple interpolation print "Name: ${name}" // Name: Alice
// Expressions in interpolation print "Age next year: ${age + 1}" // Age next year: 31
// Method calls var lower_name = name.lower() print "Lowercase: ${lower_name}" // Lowercase: alice
// Format specifiers (if supported) var price = 19.99 print "Price: ${price:.2f}" // Price: 19.99
String Methods
Searching
// file: 06_search.zbr
// teaches: searching in strings // chapter: 06-Strings-and-Unicode
class Main shared def main var text = "Hello, World!"
// Contains print text.contains("World") // true print text.contains("xyz") // false
// Index var idx = text.indexOf("World") print idx // 7
// Not found returns -1 or special value var not_found = text.indexOf("xyz") print not_found
// Starts/ends with print text.startsWith("Hello") // true print text.endsWith("!") // true
Splitting and Joining
// file: 06_split_join.zbr
// teaches: splitting and joining strings // chapter: 06-Strings-and-Unicode
class Main shared def main // Split var csv = "apple,banana,cherry" var fruits = csv.split(",") for fruit in fruits print fruit
// Join var items as List(str) = List() items.add("one") items.add("two") items.add("three") var result = ", ".join(items) print result // one, two, three
Trimming and Padding
// file: 06_trim_pad.zbr
// teaches: trimming and padding // chapter: 06-Strings-and-Unicode
class Main shared def main var padded = " hello "
// Trim whitespace print "|${padded.trim()}|" // |hello| print "|${padded.trimLeft()}|" // |hello | print "|${padded.trimRight()}|" // | hello|
// Padding var short = "hi" print short.padLeft(10, "*") // hi print short.padRight(10, "-") // hi-------- print short.center(10, "*") // hi
Replacing
// file: 06_replace.zbr
// teaches: string replacement // chapter: 06-Strings-and-Unicode
class Main shared def main var text = "cat and dog and bird"
// Replace (first occurrence, or all) var once = text.replace("and", "or") // Replaces once print once
var all = text.replaceAll("and", "or") // Replaces all print all
// Case conversion replacement var lower = "Hello World".lower() print lower // hello world
Unicode and Internationalization
Unicode Basics
// file: 06_unicode.zbr
// teaches: unicode support // chapter: 06-Strings-and-Unicode
class Main shared def main // Emoji var emoji = "Hello 👋 🌍 🎉" print emoji print emoji.len // Byte length print emoji.codePointCount() // Character count
// Chinese var chinese = "你好世界" // Hello World in Chinese print chinese
// Arabic (right-to-left) var arabic = "مرحبا بالعالم" // Hello World print arabic
// Mixed scripts var mixed = "Hello 世界 مرحبا" print mixed
Character Iteration
// file: 06_char_iter.zbr
// teaches: iterating over characters // chapter: 06-Strings-and-Unicode
class Main shared def main var text = "Hello"
// Iterate characters for char in text.chars() print char
// Byte iteration var data = "AB" for byte in data.bytes() print byte // 65, 66 (ASCII values)
Regular Expressions (Intro)
Regular expressions let you search and validate text patterns.
Basic Patterns
// file: 06_regex_intro.zbr
// teaches: regular expressions introduction // chapter: 06-Strings-and-Unicode
class Main shared def main // Simple pattern var email = "alice@example.com" var pattern = Regex.compile("[a-z]+@[a-z]+\\.[a-z]+")
var is_valid = pattern.match(email) print is_valid // true
// Find matches var text = "I have 2 apples and 3 oranges" var digit_pattern = Regex.compile("\\d+") var found = digit_pattern.find(text) print found // 2
// Replace var clean = digit_pattern.replace(text, "X") print clean // I have X apples and X oranges
Real World: Text Processing
// file: 06_text_processing.zbr
// teaches: practical text operations // chapter: 06-Strings-and-Unicode
class Parser shared def parse_csv_line(line as str) as List(str) return line.split(",")
def normalize_whitespace(text as str) as str // Replace multiple spaces with one var lines as List(str) = List() for line in text.split("\n") var trimmed = line.trim() if trimmed.len > 0 lines.add(trimmed) return "\n".join(lines)
def extract_numbers(text as str) as List(str) var results as List(str) = List() var pattern = Regex.compile("\\d+") for match in pattern.findAll(text) results.add(match) return results
class Main shared def main // Parse CSV var csv_line = "Alice,30,alice@example.com" var fields = Parser.parse_csv_line(csv_line) print "Name: ${fields.at(0)}" print "Age: ${fields.at(1)}"
// Extract numbers var text = "I was born in 1990 and moved in 2005" var years = Parser.extract_numbers(text) for year in years print year
Common Patterns
Email Validation
def is_valid_email(email as str) as bool
if not email.contains("@") return false var parts = email.split("@") if parts.count() != 2 return false if not parts.at(1).contains(".") return false return true
URL Parsing
def parse_url(url as str) as HashMap(str, str)
var result as HashMap(str, str) = HashMap() var parts = url.split("://") if parts.count() == 2 result.put("protocol", parts.at(0)) return result
String Templating
def template(text as str, values as HashMap(str, str)) as str
var result = text for key, value in values var placeholder = "${${key}}" result = result.replace(placeholder, value) return result
Common Mistakes
> ❌ Mistake: Forgetting that strings are immutable > >
> var text = "hello" > text[0] = 'H' # ❌ Can't modify > > > ✅ Better: > > var text = "hello" > var capitalized = "H".concat(text[1..]) >
> ❌ Mistake: Ignoring Unicode length > >
> var emoji = "👋" > print emoji.len # ❌ Returns 4 (bytes), not 1 > > > ✅ Better: > > var emoji = "👋" > print emoji.codePointCount() # ✅ Returns 1 (characters) >
> ❌ Mistake: Inefficient concatenation in loops > >
> var result = "" > for i in 1..1000 > result = result + "${i}," # ❌ O(n²) complexity > > > ✅ Better: > > var sb as StringBuilder = StringBuilder() > for i in 1..1000 > sb.append("${i},") > var result = sb.build() # ✅ O(n) complexity >
Exercises
Exercise 1: String Reversal
Write a function that reverses a string:
Solution
class Reverser
shared def reverse(text as str) as str var chars as List(str) = List() for c in text.chars() chars.add(c.toString())
var result = "" var i = chars.count() - 1 while i >= 0 result = result.concat(chars.at(i)) i = i - 1 return result
class Main shared def main var reversed = Reverser.reverse("hello") print reversed // olleh
Exercise 2: Email Validator
Write a simple email validator:
Solution
class Validator
shared def is_valid_email(email as str) as bool if email.len < 5 return false if not email.contains("@") return false if not email.contains(".") return false var parts = email.split("@") if parts.count() != 2 return false if parts.at(1).len < 3 return false return true
class Main shared def main var emails as List(str) = List() emails.add("alice@example.com") emails.add("invalid") emails.add("bob@domain.co")
for email in emails if Validator.is_valid_email(email) print "Valid: ${email}" else print "Invalid: ${email}"
Exercise 3: CSV Parsing
Parse a CSV line and extract fields:
Solution
class CSVParser
shared def parse(line as str) as List(str) return line.split(",")
def parse_with_trim(line as str) as List(str) var raw = line.split(",") var trimmed as List(str) = List() for field in raw trimmed.add(field.trim()) return trimmed
class Main shared def main var csv = "Alice, 30, NYC" var fields = CSVParser.parse_with_trim(csv) for field in fields print "|${field}|"
Next Steps
- → 07-Classes — Object-oriented programming - → 18-Regular-Expressions — Deep dive into regex - 🏋️ Project-1-CLI-Tool — Text processing practical application
Key Takeaways
- Strings are immutable — Create new ones instead of modifying - Interpolation is readable — Use "${var}" over concatenation - Methods are rich — .split(), .replace(), .trim(), etc. - Unicode works seamlessly — emoji, Chinese, Arabic, etc. - Regex enables pattern matching — For validation and extraction - StringBuilder is efficient — Use for loop-based concatenation
Next: Head to Part 2 for object-oriented programming with 07-Classes.