Consider the difference between File.ReadAllLines and File.ReadLines.
ReadAllLines loads all of the lines into memory and returns a string[]. That's all well and good if the file is small. If the file is larger than will fit in memory, you'll run out of memory.
ReadLines, on the other hand, returns a lazily evaluated IEnumerable<string> that, like a method written with yield return, hands back one line at a time. With it, you can read a file of any size, because it doesn't load the whole file into memory.
Say you wanted to find the first line that contains the word "foo", and then exit. Using ReadAllLines, you'd have to read the entire file into memory, even if "foo" occurs on the first line. With ReadLines, you'd stop after reading that one line. Which one would be faster?
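To make the "foo" search concrete, here is a minimal sketch of both approaches. The class name FirstMatchDemo and the method names FindEager and FindLazy are made up for illustration; File.ReadAllLines, File.ReadLines, and Enumerable.FirstOrDefault are the real framework calls.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper class, for illustration only.
static class FirstMatchDemo
{
    // Eager: reads the entire file into a string[] before the search begins.
    public static string FindEager(string path, string word)
    {
        string[] lines = File.ReadAllLines(path);
        return lines.FirstOrDefault(line => line.Contains(word));
    }

    // Lazy: pulls lines one at a time and stops at the first match.
    public static string FindLazy(string path, string word)
    {
        return File.ReadLines(path).FirstOrDefault(line => line.Contains(word));
    }
}
```

Both methods return the same line (or null when nothing matches). The difference is how much of the file has to be read before the answer comes back.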
That's not the only reason. Consider a program that reads a file and processes each line. Using File.ReadAllLines, you end up with:
string[] lines = File.ReadAllLines(filename);
for (int i = 0; i < lines.Length; ++i)
{
    // process line
}
The time it takes that program to execute is equal to the time it takes to read the file plus the time it takes to process the lines. Imagine that the processing takes so long that you want to speed it up with multiple threads. So you do something like:
lines = File.ReadAllLines(filename);
Parallel.ForEach(...);
But the reading is single-threaded. Your multiple threads can't start until the main thread has loaded the entire file.
With ReadLines, though, you can do something like:
Parallel.ForEach(File.ReadLines(filename), line => { ProcessLine(line); });
That starts up multiple threads immediately, and they process lines while other lines are still being read. So the reading time is overlapped with the processing time, meaning that your program will execute faster.
I show my examples using files because it's easier to demonstrate the concepts that way, but the same holds true for in-memory collections. Using yield return will use less memory and is potentially faster, especially when calling methods that only need to look at part of the collection (Enumerable.Any, Enumerable.First, etc.).
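Here is a small sketch of the in-memory case. The names LazySquares, Squares, SquaresEager, and the Produced counter are all invented for this example; the counter exists only to show how many elements the iterator actually generated.

```csharp
using System.Collections.Generic;

// Hypothetical example class, for illustration only.
static class LazySquares
{
    // Counts how many elements the iterator has produced so far.
    public static int Produced;

    // Iterator: each square is computed only when the caller asks for it.
    public static IEnumerable<int> Squares(IEnumerable<int> source)
    {
        foreach (int n in source)
        {
            Produced++;
            yield return n * n;
        }
    }

    // Eager counterpart: computes every square up front and stores them all.
    public static List<int> SquaresEager(IEnumerable<int> source)
    {
        var result = new List<int>();
        foreach (int n in source)
        {
            result.Add(n * n);
        }
        return result;
    }
}
```

If you call LazySquares.Squares(Enumerable.Range(1, 1000000)).First(x => x > 50), you get 64 and Produced is 8, because the iterator stops as soon as First has its answer. The eager version would compute and store all one million squares before First could look at any of them.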