This is a list of common LINQ mistakes, which has grown out from reading code of others - and my own - and using the results to train people new to .NET and LINQ.
These mistakes and problems apply to all LINQ providers, but as people new to LINQ are hopefully familiar with SQL, some of the mistakes have explanations or examples in SQL.
In many cases, LINQ provider will fix the issue – but programmer should be smarter than her/his tools and not expext the provider, compiler or interpreter to fix the bad code. Not to mention, in many cases the misuse obfuscates the intent of the code, uses more resources or has invalid/unexpected results.
If you need to find these problems in the code, here is a list of regular expressions for that.
Understanding deferred execution is by far the biggest issue people have when learning LINQ.
There are great many explanations, for example (in no particular order):
- LINQ and Deferred Execution
- Deferred query execution
- Deferred vs Immediate Query Execution in LINQ
- Deferred execution vs immediate execution in LINQ
- Understanding LINQ's Deferred Execution
Incorrect:
var person = persons.Where(x => x.LastName == "Smith" && x.FirstName == "John").First();
Correct:
var person = persons.First(x => x.LastName == "Smith" && x.FirstName == "John");
Explanation
Depending on optimizations of the LINQ provider, this may mean that your database fetches all John Smiths to your application, and the application then returns the first of those. You definitely don't want that.
This also applies for Where(x => x.LastName == "Smith").Count()
and other similar constructs, all of which obfuscate the intent of the code.
Let's explain the example code with Rubber Duck Debugging:
- Incorrect: I will take all persons named John Smith and use the first of them.
- Correct: I will get and use the first person named John Smith.
Incorrect:
var person = persons.SingleOrDefault(x => x.Id == 21034);
Correct:
var person = persons.FirstOrDefault(x => x.Id == 21034);
Explanation
Single()/SingleOrDefault()
ensures that there is just one and only one match in the set, whereas First()/FirstOrDefault()
returns the first match.
Single..
methods iterate over all elements of the set until two matching items are found – validating that there is just one match, and throw an error if second is found - which is both costly and usually unneeded, unless validating user input or such.
In terms of SQL, Single()
is: SELECT TOP 2...
whereas First is SELECT TOP 1...
. If you have 10M rows, and your match is the very first row, Single()
will still run through all of the rows, validating that there are no more matches, whereas First()
returns immediately.
In most cases, you aren't interested in ensuring that there is just one match - or often it is logically impossible. See the examples above - if your database has two Person entries with same primary key (ID), you have far bigger problems than using LINQ badly...
A special case is Single()
without predicate, which should be used when logically or semantically there should be only a single element in the set.
This is highly situation-dependent. Sometimes First()
is correct, and sometimes it isn't.
It is important to understand that First()
and Single()
throw an exception when match is not found, whereas FirstOrDefault()
and SingleOrDefault()
won't – and return the default/null value, depending on the data type.
Take, for example:
var person = persons.First(x => x.Id == 21034);
Exception may be the correct behavior if a person with ID 21034 should exist, and not finding the entry is literally exceptional. However, remember that exceptions should not be used for flow control!
On the other hand:
var login = users.First(x => x.Username == "[email protected]");
Instead of having First()
throwing an exception, you probably should use FirstOrDefault()
and check login
variable for null - and log the invalid log-in attempt along with the details.
A very common pattern is "update or create" ("upsert" in SQL terms), for which you should always use FirstOrDefault()
and not exception handling.
Incorrect:
if (persons.Count() > 0)...
if (persons.Count(x => x.LastName == "Smith") > 0)...
if (persons.Count(x => x.LastName == "Smith") == persons.Count())...
Correct:
if (persons.Any())...
if (persons.Any(x => x.LastName == "Smith"))...
if (persons.All(x => x.LastName == "Smith"))...
Explanation
Count()
will do the full iteration of the (matching) items in the set. If you are interested that there is a matching item in the set, use Any()
- Any()
just returns true when there is at least one matching item, with an empty Any()
just returning true when there are items in your set.
E.g. if you have list of 10M persons, and you need to know that a person with last name Smith exists, persons.Count(x => x.LastName == "Smith")
will go through all the persons and check their last name, returning total number of Smiths. persons.Any(x => x.LastName == "Smith")
returns true after encountering the first entry with last name Smith.
All()
is the opposite of Any()
- if you need to know that all your persons have lastname Smith, persons.All(x => x.LastName == "Smith")
will return false when encountering first person with last name not Smith, without going over all of the set.
Incorrect:
persons.Where(x => x.LastName == "Smith").Where(x => x.FirstName == "John");
Correct:
persons.Where(x => x.LastName == "Smith" && x.FirstName == "John");
Explanation
Chaining Where()
entries may end up in SQL database as multiple embedded queries (in style of SELECT * FROM (SELECT * FROM PERSONS WHERE LastName = 'Smith') WHERE FirstName = 'John'
, instead of SELECT * FROM PERSONS WHERE LastName = 'Smith' AND FirstName = 'John'
) – or will do multiple iterations over sets.
It also obfuscates the intent of the query. While most if not all LINQ providers optimize the multi-Where() properly - or the underlying provider does that - let's explain the code to our rubber duck again:
- Incorrect: I will fetch all persons with last name Smith, and out of them I will take people with first name John.
- Correct: I will fetch all persons named John Smith.
Incorrect:
persons.OrderBy(x => x.LastName).OrderBy(x => x.FirstName);
Correct:
persons.OrderBy(x => x.LastName).ThenBy(x => x.FirstName);
Explanation
The "incorrect" version sorts persons by LastName - and then ignores that sort completely and re-sorts persons by FirstName. Correct use is one OrderBy()
followed by ThenBy()
, which can be chained multiple times (persons.OrderBy(x => x.LastName).ThenBy(x => x.FirstName).ThenBy(x => x.Age)
).
This also applies to OrderByDescending()
and ThenByDescending()
.
Incorrect:
persons.Where(x => x.LastName == "Smith" && x.FirstName == "John").Select(x => x);
Correct:
persons.Where(x => x.LastName == "Smith" && x.FirstName == "John");
Explanation
It is unclear why this artefact pops up every now and then. Maybe the author intended to do something with Select()
and forgot, or thought that Select()
materializes the query, like ToList()
or ToArray()
.
Another possible explanation is that they used this pattern to return IEnumerable<T>, but didn't know about AsEnumerable()
extension method.
Incorrect:
var employees = persons.Where(x => x is Employee).Select(x => x as Employee);
var employees = persons.Where(x => x is Employee).Select(x => (Employee)x);
var employees = persons.Select(x => x as Employee).Where(x => x != null);
Correct:
var employees = persons.OfType<Employee>();
Explanation
Using is...as
antipattern is a very common occurrence in .NET world. is
does nothing more internally besides trying to cast the object to specified type and returning false if the casting fails. Using is...as
antipattern therefore casts the object to specified type twice - something that really should be avoided (note that c# 7 added is
pattern matching, allowing constructs like if (person is Employee e)...
)
In case of LINQ, OfType<>() returns only the objects of the specified type. This allows both shorter and clearer code, compared to the incorrect version.
Incorrect:
var count = peopleArray.Count();
var count = peopleList.Count();
var count = peopleDictionary.Count();
Correct:
var count = peopleArray.Length;
var count = peopleList.Count;
var count = peopleDictionary.Count;
Explanation
Don't use Enumerable.Count() method for arrays or any collections that implement ICollection<T>/ICollection interface, such as List<T> and Dictionary<T>. Use Length for arrays and Count property for ICollection implementations.
Depending on platform, there are optimizations in place (see here) preventing O(n) operation of Count()
method, just returning existing Count
value.