String Basics
What Is a String?
A string is a sequence of characters. Every piece of text you have ever read on a screen — a name, a sentence, an error message, a URL — is a string.
In most programming languages, a string is not a primitive type like an integer. It is a data structure: an ordered collection of characters stored together, with operations defined on them as a whole.
The simplest way to think about a string is as an array of characters. "hello" is five characters — 'h', 'e', 'l', 'l', 'o' — arranged in sequence. This connection to arrays is not just an analogy. Internally, most string implementations are backed by a character array, which is why most array traversal and indexing concepts transfer directly to strings.
    String:  "hello"
    Index:    0    1    2    3    4
    Char:    'h'  'e'  'l'  'l'  'o'
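To make the array connection concrete, here is a minimal sketch of walking a string the same way you would walk an array. The class name `StringTraversal` and the helper `countOccurrences` are illustrative names chosen for this example, not part of any library.

```java
class StringTraversal {

    // Counts how often target appears in s by visiting every index,
    // exactly as one would scan an array.
    static int countOccurrences(String s, char target) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "hello";

        // Index-based loop: same shape as an array traversal
        for (int i = 0; i < s.length(); i++) {
            System.out.println(i + " -> " + s.charAt(i));
        }

        // Enhanced for loop over a copy of the characters
        for (char c : s.toCharArray()) {
            System.out.print(c + " ");
        }
        System.out.println();

        System.out.println(countOccurrences(s, 'l')); // 2
    }
}
```

Note that `toCharArray()` returns a copy, so changes to that array do not affect the original string.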
Declaring and Initializing Strings
String declaration syntax varies across languages, but the concept is the same — bind a name to a sequence of characters enclosed in quotes.
    public class StringDeclaration {

        public static void main(String[] args) {
            // String literal: the most common way
            String greeting = "Hello, World!";

            // Empty string: a valid string with zero characters
            String empty = "";

            // String from character array
            char[] chars = {'h', 'e', 'l', 'l', 'o'};
            String fromChars = new String(chars);

            // String with special characters
            String withNewline = "Line one\nLine two";
            String withTab = "Name:\tAlice";
            String withQuote = "She said \"hello\"";

            System.out.println(greeting);
            System.out.println("Empty length: " + empty.length());
            System.out.println("From chars: " + fromChars);
            System.out.println("With quote: " + withQuote);
        }
    }

Output:
Hello, World!
Empty length: 0
From chars: hello
With quote: She said "hello"
Accessing Individual Characters
Like arrays, strings support index-based access to individual characters. Indices are 0-based — the first character is at index 0, the last at index length - 1.
    public class CharacterAccess {

        public static void main(String[] args) {
            String s = "programming";

            // charAt(index) returns the character at that position
            System.out.println("First char: " + s.charAt(0));               // 'p'
            System.out.println("Last char: " + s.charAt(s.length() - 1));   // 'g'
            System.out.println("Index 3: " + s.charAt(3));                  // 'g'

            // Length of the string
            System.out.println("Length: " + s.length());                    // 11

            // Convert character to integer (its ASCII/Unicode value)
            char c = s.charAt(0);
            System.out.println("'p' as int: " + (int) c);                   // 112

            // Convert integer back to character
            int code = 112;
            System.out.println("112 as char: " + (char) code);              // 'p'
        }
    }

Output:
First char: p
Last char: g
Index 3: g
Length: 11
'p' as int: 112
112 as char: p
String Length
Getting the length of a string is O(1) in Java, Python, C++, and JavaScript: each implementation stores the length alongside the characters instead of recomputing it on every call.
| Language | Syntax | Notes |
|---|---|---|
| Java | s.length() | Method call; parentheses required |
| Python | len(s) | Built-in function; no dot notation |
| C++ | s.length() | Same as s.size(); both work |
| JavaScript | s.length | Property; no parentheses |
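A common beginner mistake is writing s.length() in JavaScript (where length is a property, not a method) or len.s in Python (where len is a function that takes the string as its argument). Neither is a syntax error: both fail at runtime, with a TypeError in JavaScript and an AttributeError in Python, and both are easy to make under interview pressure.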
ASCII and Unicode: What Characters Actually Are
Every character is stored as a number. The mapping between numbers and characters is defined by a standard.
ASCII (American Standard Code for Information Interchange) covers 128 characters — the English alphabet (upper and lower case), digits, punctuation, and control characters.
Key ASCII values to memorize for interviews:

    'A' = 65    'a' = 97     '0' = 48
    'B' = 66    'b' = 98     '1' = 49
    'Z' = 90    'z' = 122    '9' = 57

    Gap between uppercase and lowercase: 97 - 65 = 32
    Convert uppercase to lowercase: char + 32 (or use built-in methods)
    Convert 'A' to 0: 'A' - 'A' = 0, 'B' - 'A' = 1, ..., 'Z' - 'A' = 25
    Convert '0' to 0: '0' - '0' = 0, '1' - '0' = 1, ..., '9' - '0' = 9
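Unicode extends ASCII to cover over 140,000 characters, including most of the world's writing systems, emoji, and symbols. Python 3 strings are sequences of Unicode code points. Java and JavaScript strings are sequences of UTF-16 code units, so a character outside the Basic Multilingual Plane (most emoji, for example) occupies two units and counts as 2 toward the length. C++ std::string is a sequence of bytes with no encoding attached: it can hold UTF-8, but length and indexing count bytes, and code-point-aware handling requires std::u32string, std::wstring, or an external library.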
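The practical consequence of UTF-16 in Java can be sketched as follows. The class name `Utf16Length` and its two helpers are hypothetical names for this illustration; `length()` and `codePointCount()` are standard `String` methods. The emoji is written with escape sequences so the example does not depend on source-file encoding.

```java
class Utf16Length {

    // Number of UTF-16 code units: this is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points: closer to "characters as a reader sees them".
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "hi";
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println(codeUnits(ascii));  // 2
        System.out.println(codePoints(ascii)); // 2
        System.out.println(codeUnits(emoji));  // 2
        System.out.println(codePoints(emoji)); // 1
    }
}
```

For ASCII-only input the two counts always agree, which is why this distinction rarely matters in interview problems.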
For interview problems, you can almost always assume ASCII input (26 lowercase letters or 128 ASCII characters) unless the problem explicitly states otherwise.
    public class CharacterArithmetic {

        public static void main(String[] args) {
            // Convert character to its alphabetic position (0-indexed)
            char c = 'e';
            int position = c - 'a';  // 'e' - 'a' = 101 - 97 = 4
            System.out.println("'e' is at position: " + position);  // 4

            // Convert digit character to integer value
            char digit = '7';
            int value = digit - '0';  // '7' - '0' = 55 - 48 = 7
            System.out.println("'7' as integer: " + value);  // 7

            // Check if character is a letter
            System.out.println("'a' is letter: " + Character.isLetter('a'));  // true
            System.out.println("'5' is letter: " + Character.isLetter('5'));  // false
            System.out.println("'5' is digit: " + Character.isDigit('5'));    // true

            // Lowercase and uppercase conversion
            System.out.println("'A' to lower: " + Character.toLowerCase('A'));  // 'a'
            System.out.println("'z' to upper: " + Character.toUpperCase('z'));  // 'Z'
        }
    }

Output:
'e' is at position: 4
'7' as integer: 7
'a' is letter: true
'5' is letter: false
'5' is digit: true
'A' to lower: a
'z' to upper: Z
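The "gap of 32" rule from the ASCII table can also be applied by hand. The following is a minimal sketch that assumes ASCII letters only; `AsciiCase`, `toLowerAscii`, and `toUpperAscii` are illustrative names. The range check matters: adding 32 to a character that is not an uppercase letter would produce an unrelated character.

```java
class AsciiCase {

    // Shifts 'A'..'Z' down to 'a'..'z'; every other character is returned unchanged.
    static char toLowerAscii(char c) {
        if (c >= 'A' && c <= 'Z') {
            return (char) (c + 32);
        }
        return c;
    }

    // Shifts 'a'..'z' up to 'A'..'Z'; every other character is returned unchanged.
    static char toUpperAscii(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) (c - 32);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(toLowerAscii('A')); // a
        System.out.println(toUpperAscii('z')); // Z
        System.out.println(toLowerAscii('5')); // 5
    }
}
```

In production code the built-in `Character.toLowerCase` and `Character.toUpperCase` are the safer choice because they also handle non-ASCII letters.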
String Slicing and Substrings
Extracting a portion of a string is one of the most frequent operations in string problems. The syntax differs, but the concept is the same: give a start index and an end index, get back the characters in between.
    public class Substrings {

        public static void main(String[] args) {
            String s = "Hello, World!";

            // substring(startIndex): from start to end of string
            // substring(startIndex, endIndex): from start to endIndex (exclusive)
            System.out.println(s.substring(7));      // "World!"
            System.out.println(s.substring(7, 12));  // "World"
            System.out.println(s.substring(0, 5));   // "Hello"

            // Length of substring without creating it
            int subLen = 12 - 7;  // endIndex - startIndex = 5
            System.out.println("Substring length: " + subLen);

            // First and last character as substring (length 1)
            System.out.println("First char as string: " + s.substring(0, 1));              // "H"
            System.out.println("Last char as string: " + s.substring(s.length() - 1));     // "!"
        }
    }

Output:
World!
World
Hello
Substring length: 5
First char as string: H
Last char as string: !
Critical Difference: C++ substr Uses Length, Not End Index
    Java/Python/JavaScript: substring(start, end)   → end is an exclusive index
    C++:                    substr(start, length)   → length is the count

    Example: extract "World" from "Hello, World!" (indices 7-11)

    Java:        s.substring(7, 12)  → (start=7, end=12 exclusive) → "World"
    Python:      s[7:12]             → (start=7, end=12 exclusive) → "World"
    JavaScript:  s.slice(7, 12)      → (start=7, end=12 exclusive) → "World"
    C++:         s.substr(7, 5)      → (start=7, length=5)         → "World"

Getting this wrong in C++ is a very common bug: length vs end index.
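The relationship between the two conventions is simply end = start + length. As a rough sketch, a C++-style call can be expressed in Java like this; `SubstrByLength` and `substrByLength` are hypothetical names for illustration. The helper clamps the end index to the string length, which mirrors how C++ `substr` handles a count that runs past the end, and it assumes `start` is within the string.

```java
class SubstrByLength {

    // C++-style extraction: a start index plus a character count.
    // The end index is clamped so an oversized count stops at the end of the string.
    static String substrByLength(String s, int start, int length) {
        int end = Math.min(s.length(), start + length);
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String s = "Hello, World!";

        System.out.println(substrByLength(s, 7, 5));   // World
        System.out.println(s.substring(7, 7 + 5));     // World (same call, end-index form)
        System.out.println(substrByLength(s, 7, 100)); // World!
    }
}
```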
String Concatenation
Joining strings together is simple but hides important performance differences. Understanding this now prevents a class of bugs that only appear at scale.
    public class Concatenation {

        public static void main(String[] args) {
            String first = "Hello";
            String second = "World";

            // + operator: creates a new string each time
            String result = first + ", " + second + "!";
            System.out.println(result);  // "Hello, World!"

            // Building a string in a loop: use StringBuilder, not +
            // Using + in a loop is O(n²) because each + creates a new string
            StringBuilder sb = new StringBuilder();
            String[] words = {"The", "quick", "brown", "fox"};

            for (String word : words) {
                if (sb.length() > 0) sb.append(" ");
                sb.append(word);
            }

            System.out.println(sb.toString());  // "The quick brown fox"
            System.out.println("Length: " + sb.length());
        }
    }

Output:
Hello, World!
The quick brown fox
Length: 19
Strings Are Not Arrays (But They Are Similar)
The most important structural distinction between strings and arrays:
What strings share with arrays:
- 0-based index access to individual elements
- Length property / function
- Sequential character storage
- O(1) character access by index
What strings do not share with arrays:
- In Java, Python, and JavaScript, strings are immutable: you cannot change a character at a specific index. s[0] = 'X' is illegal or silently fails.
- Strings have string-specific methods (split, trim, toLowerCase, contains) that arrays do not
- String comparison uses content equality in most languages, not reference equality (except when using == in Java)
- C++ std::string is mutable: you can do s[0] = 'X'
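The comparison point in the list above deserves a short illustration, since it is a frequent source of bugs in Java. In this sketch, `StringEquality`, `sameReference`, and `sameContent` are illustrative names; `new String(...)` is used only to force a second, distinct object with the same characters.

```java
class StringEquality {

    // True only when both variables point at the very same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // True when both strings hold the same sequence of characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "hello";
        String built = new String("hello"); // distinct object, identical content

        System.out.println(sameReference(literal, built)); // false
        System.out.println(sameContent(literal, built));   // true
    }
}
```

When comparing string contents in Java, `equals` is the call to reach for.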
Immutability example:

    Java:
        String s = "hello";
        s.charAt(0) = 'H';   // COMPILE ERROR: charAt() returns a value, not a reference
        s = "Hello";         // OK: creates a new String object, rebinds s

    Python:
        s = "hello"
        s[0] = 'H'           # TypeError: 'str' object does not support item assignment

    JavaScript:
        let s = "hello";
        s[0] = 'H';          // Silently fails: no error, but s is unchanged
        console.log(s);      // still "hello"

    C++:
        string s = "hello";
        s[0] = 'H';          // Works! std::string is mutable
        cout << s;           // "Hello"
The immutability topic has major performance implications and gets a full dedicated page. For now, the key point is: never assume you can modify a character at an index when working with Java, Python, or JavaScript strings.
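When a problem does call for changing one character, the usual workaround in Java is to build a new string from a mutable copy. Two common routes are sketched below; `ReplaceCharAt`, `withCharAt`, and `viaCharArray` are hypothetical names, while `StringBuilder.setCharAt` and `String.toCharArray` are standard library calls.

```java
class ReplaceCharAt {

    // Copies s into a StringBuilder, overwrites one position, and returns a new String.
    static String withCharAt(String s, int index, char replacement) {
        StringBuilder sb = new StringBuilder(s);
        sb.setCharAt(index, replacement);
        return sb.toString();
    }

    // Same idea using a char array, which is convenient for many in-place edits.
    static String viaCharArray(String s, int index, char replacement) {
        char[] chars = s.toCharArray();
        chars[index] = replacement;
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "hello";

        System.out.println(withCharAt(original, 0, 'H'));   // Hello
        System.out.println(viaCharArray(original, 0, 'H')); // Hello
        System.out.println(original);                       // hello (unchanged)
    }
}
```

Both helpers leave the original string untouched, which is exactly what immutability guarantees.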
Language-Specific String Behavior Summary
| Feature | Java | Python | C++ | JavaScript |
|---|---|---|---|---|
| Mutable | No | No | Yes | No |
| Index access | s.charAt(i) | s[i] | s[i] | s[i] or s.charAt(i) |
| Length | s.length() | len(s) | s.length() | s.length |
| Substring | s.substring(l,r) | s[l:r] | s.substr(l,len) | s.slice(l,r) |
| Concatenation | s1 + s2 | s1 + s2 | s1 + s2 | s1 + s2 |
| Char to int | (int) c | ord(c) | (int) c | c.charCodeAt(0) |
| Int to char | (char) n | chr(n) | (char) n | String.fromCharCode(n) |
| Uppercase | s.toUpperCase() | s.upper() | transform + toupper | s.toUpperCase() |
| Lowercase | s.toLowerCase() | s.lower() | transform + tolower | s.toLowerCase() |
Interview Questions
Q: What is the difference between a character and a string in programming?
A character is a single symbol: one letter, digit, space, or punctuation mark. A string is an ordered sequence of zero or more characters. In Java, char is a primitive type holding one character; String is a class holding a sequence. In Python, there is no separate character type: a single character is just a string of length 1, and the same is true in JavaScript. In C++, char is a single byte and std::string is a sequence of chars.
Q: How do you access the last character of a string safely?
Use s.charAt(s.length() - 1) in Java, s[-1] in Python, s[s.length - 1] in JavaScript, or s[s.length() - 1] in C++. Always verify the string is not empty before accessing the last character. On an empty string, Java's charAt(-1) throws StringIndexOutOfBoundsException, Python's s[-1] raises IndexError, JavaScript's s[-1] evaluates to undefined instead of throwing, and in C++ s.length() - 1 wraps around to a huge unsigned value, so indexing with it is undefined behavior.
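One way to package the emptiness check in Java is a small helper that returns a fallback instead of throwing. This is only a sketch: `LastChar` and `lastCharOr` are illustrative names, and returning a fallback character is one design choice among several (throwing a descriptive exception is another).

```java
class LastChar {

    // Returns the last character of s, or fallback when s is null or empty.
    static char lastCharOr(String s, char fallback) {
        if (s == null || s.isEmpty()) {
            return fallback;
        }
        return s.charAt(s.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(lastCharOr("programming", '?')); // g
        System.out.println(lastCharOr("", '?'));            // ?
    }
}
```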
Q: Why is 'e' - 'a' useful in string problems?
Because characters are stored as integers, subtracting the base character gives the alphabetic position (0-indexed). 'e' - 'a' = 4 tells you 'e' is the 5th letter. This is used constantly for creating frequency arrays of size 26 (one slot per letter), converting characters to array indices without a hash map, and computing character differences for ordering. Similarly, digit - '0' converts a digit character to its integer value.
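The frequency-array idea mentioned above can be sketched as follows, assuming the input of interest is lowercase ASCII letters; other characters are skipped. `LetterFrequency` and `countLowercase` are illustrative names.

```java
class LetterFrequency {

    // Builds a 26-slot count array where slot (c - 'a') holds the count of letter c.
    static int[] countLowercase(String s) {
        int[] freq = new int[26];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                freq[c - 'a']++;
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        int[] freq = countLowercase("programming");

        System.out.println(freq['g' - 'a']); // 2
        System.out.println(freq['m' - 'a']); // 2
        System.out.println(freq['p' - 'a']); // 1
        System.out.println(freq['z' - 'a']); // 0
    }
}
```

The array replaces a hash map with a fixed 26 integers, so lookups are plain index operations.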
Q: How do you check if a character is a vowel or consonant?
One common approach: check membership in the vowel set {'a', 'e', 'i', 'o', 'u'}. Alternatively, precompute a boolean array of size 26 where vowels are true. For case-insensitive checking, convert the character to lowercase first, then check.
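The boolean-array variant might look like this in Java. It is a sketch under the assumption that only the 26 English letters count as vowels or consonants; `VowelCheck`, `isVowel`, and `isConsonant` are illustrative names.

```java
class VowelCheck {

    // Lookup table indexed by (letter - 'a'); true for a, e, i, o, u.
    private static final boolean[] IS_VOWEL = new boolean[26];

    static {
        for (char v : "aeiou".toCharArray()) {
            IS_VOWEL[v - 'a'] = true;
        }
    }

    // Case-insensitive: the character is lowercased before the lookup.
    static boolean isVowel(char c) {
        char lower = Character.toLowerCase(c);
        return lower >= 'a' && lower <= 'z' && IS_VOWEL[lower - 'a'];
    }

    // A consonant is an English letter that is not a vowel; digits and symbols are neither.
    static boolean isConsonant(char c) {
        char lower = Character.toLowerCase(c);
        return lower >= 'a' && lower <= 'z' && !IS_VOWEL[lower - 'a'];
    }

    public static void main(String[] args) {
        System.out.println(isVowel('E'));     // true
        System.out.println(isConsonant('z')); // true
        System.out.println(isVowel('5'));     // false
        System.out.println(isConsonant('5')); // false
    }
}
```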
FAQs
Can a string contain spaces, numbers, and special characters?
Yes. A string can contain any character — letters, digits, spaces, punctuation, emoji, newlines, tabs. The content is just a sequence of Unicode code points. Interview problems usually specify the character set (lowercase letters only, or ASCII printable characters) in the constraints.
What does an empty string look like and how do I check for it?
An empty string is "" — a sequence of zero characters. Its length is 0. In Java: s.isEmpty() or s.length() == 0. In Python: s == "" or not s (empty string is falsy). In C++: s.empty() or s.length() == 0. In JavaScript: s === "" or s.length === 0. Never use s == null to check for empty — that checks for null reference, not empty content.
Is the + operator for string concatenation always safe to use?
For a small number of concatenations (say, 3 to 5), + is fine. For building a string in a loop with n iterations, + in Java creates n intermediate String objects and copies the accumulated text each time, which costs O(n²) total time. Use StringBuilder in Java for loop-based concatenation. Python's "".join(list) is the efficient equivalent. C++'s += on std::string appends in place to a mutable buffer (amortized constant time per appended character), so it does not have the same quadratic cost.
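The three Java options can be compared side by side. All three helpers below produce the same result and differ only in how much copying they do; `JoinWords` and its method names are illustrative, while `StringBuilder` and `String.join` are standard library features.

```java
class JoinWords {

    // Quadratic in the worst case: every += copies the whole accumulated string.
    static String joinWithPlus(String[] words) {
        String result = "";
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                result += " ";
            }
            result += words[i];
        }
        return result;
    }

    // Linear: characters are appended to one growing buffer.
    static String joinWithBuilder(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    // Library shortcut for the same task.
    static String joinWithStringJoin(String[] words) {
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String[] words = {"The", "quick", "brown", "fox"};

        System.out.println(joinWithPlus(words));       // The quick brown fox
        System.out.println(joinWithBuilder(words));    // The quick brown fox
        System.out.println(joinWithStringJoin(words)); // The quick brown fox
    }
}
```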
What is the difference between null and an empty string?
A null string has no value at all — the variable points to nothing. An empty string "" is a valid string object with zero characters. They are completely different. Accessing .length() on a null string causes a NullPointerException in Java. Checking s == null and s.isEmpty() should be done separately. Python does not have null strings — the closest equivalent is None, which is not a string.
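In Java, the two checks are often combined into one guard, with the null test first so that isEmpty() is never called on a null reference. A minimal sketch, with `NullOrEmpty`, `isNullOrEmpty`, and `safeLength` as illustrative names:

```java
class NullOrEmpty {

    // The null check must come first; the || operator short-circuits,
    // so isEmpty() only runs when s is a real String object.
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // Treats a missing string as length 0 instead of throwing NullPointerException.
    static int safeLength(String s) {
        return s == null ? 0 : s.length();
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(null)); // true
        System.out.println(isNullOrEmpty(""));   // true
        System.out.println(isNullOrEmpty(" "));  // false (a space is a character)
        System.out.println(safeLength(null));    // 0
    }
}
```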
Quick Quiz
Question 1: What index holds the last character of a string of length n?
- A) n
- B) n - 1
- C) n + 1
- D) 0
Answer: B) n - 1. Strings are 0-indexed like arrays. The first character is at index 0, the last at index length - 1. Accessing index n (equal to length) is out of bounds.
Question 2: What does 'z' - 'a' evaluate to?
- A) 0
- B) 25
- C) 26
- D) 122
Answer: B) 25. 'z' has ASCII value 122, 'a' has value 97. 122 - 97 = 25. The 26 letters of the alphabet span indices 0 to 25 using this formula.
Question 3: In Python, what does s[2:5] return for s = "abcdefg"?
- A) "bcde"
- B) "cde"
- C) "cdef"
- D) "cd"
Answer: B) "cde". Python slices use [start:end] where start is inclusive and end is exclusive. s[2:5] returns characters at indices 2, 3, 4 — which are 'c', 'd', 'e'.
Question 4: Why does s[0] = 'H' silently do nothing in JavaScript?
- A) JavaScript strings support index writes but discard them
- B) JavaScript strings are immutable; character assignment is a no-op
- C) The index 0 is reserved in JavaScript
- D) It causes a TypeError but JavaScript swallows errors
Answer: B) JavaScript strings are immutable, so character assignment is a no-op. In non-strict code, setting a character by index throws no error and leaves the string unchanged; in strict mode (which includes ES modules) the same assignment throws a TypeError. To modify a string, you must create a new one (e.g., by splitting into an array, modifying, and joining back).
Summary
A string is an ordered sequence of characters, accessible by 0-based index. It shares many properties with arrays — O(1) index access, sequential storage, length — but has critical differences in mutability and the rich set of string-specific operations.
The key ideas to carry forward:
- Index access is 0-based: first character at index 0, last at length - 1
- Characters are integers: 'a' = 97, 'A' = 65, '0' = 48
- char - 'a' gives the 0-indexed alphabetic position (0 to 25)
- char - '0' converts a digit character to its integer value
- Java, Python, and JavaScript strings are immutable; C++ std::string is mutable
- C++ substr(start, length) takes a length, not an end index; the rest take an end index
- Use StringBuilder in Java (or join in Python) for loop-based concatenation
In the next topic, you will explore How Strings Work in Memory — understanding why immutability exists, how string interning works, and why concatenation in a loop is so expensive.