Update 1: providing a truly generalized extension method FullOuterJoin
Update 2: optionally accepting a custom IEqualityComparer
for the key type
Update 3: this implementation has recently become part of MoreLinq
- Thanks guys!
Edit Added FullOuterGroupJoin
(ideone). I reused the GetOuter<>
implementation, making this a fraction less performant than it could be, but I'm aiming for 'highlevel' code, not bleeding-edge optimized, right now.
See it live on http://ideone.com/O36nWc
static void Main(string[] args)
{
var ax = new[] {
new { id = 1, name = "John" },
new { id = 2, name = "Sue" } };
var bx = new[] {
new { id = 1, surname = "Doe" },
new { id = 3, surname = "Smith" } };
ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
.ToList().ForEach(Console.WriteLine);
}
Prints the output:
{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b = }
{ a = , b = { id = 3, surname = Smith } }
You could also supply defaults: http://ideone.com/kG4kqO
ax.FullOuterJoin(
bx, a => a.id, b => b.id,
(a, b, id) => new { a.name, b.surname },
new { id = -1, name = "(no firstname)" },
new { id = -2, surname = "(no surname)" }
)
Printing:
{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }
Explanation of terms used:
Joining is a term borrowed from relational database design:
- A join will repeat elements from
a
as many times as there are elements in b
with corresponding key (i.e.: nothing if b
were empty). Database lingo calls this inner (equi)join
.
- An outer join includes elements from
a
for which no corresponding
element exists in b
. (i.e.: even results if b
were empty). This is usually referred to as left join
.
- A full outer join includes records from
a
as well as b
if no corresponding element exists in the other. (i.e. even results if a
were empty)
Something not usually seen in RDBMS is a group join[1]:
- A group join, does the same as described above, but instead of repeating elements from
a
for multiple corresponding b
, it groups the records with corresponding keys. This is often more convenient when you wish to enumerate through 'joined' records, based on a common key.
See also GroupJoin which contains some general background explanations as well.
[1] (I believe Oracle and MSSQL have proprietary extensions for this)
Full code
A generalized 'drop-in' Extension class for this
internal static class MyExtensions
{
internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));
var join = from key in keys
let xa = alookup[key]
let xb = blookup[key]
select projection(xa, xb, key);
return join;
}
internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
this IEnumerable<TA> a,
IEnumerable<TB> b,
Func<TA, TKey> selectKeyA,
Func<TB, TKey> selectKeyB,
Func<TA, TB, TKey, TResult> projection,
TA defaultA = default(TA),
TB defaultB = default(TB),
IEqualityComparer<TKey> cmp = null)
{
cmp = cmp?? EqualityComparer<TKey>.Default;
var alookup = a.ToLookup(selectKeyA, cmp);
var blookup = b.ToLookup(selectKeyB, cmp);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
keys.UnionWith(blookup.Select(p => p.Key));
var join = from key in keys
from xa in alookup[key].DefaultIfEmpty(defaultA)
from xb in blookup[key].DefaultIfEmpty(defaultB)
select projection(xa, xb, key);
return join;
}
}