You are here:   Blog
Register   |  Login

Mar 2

Written by: Richard Waddell
3/2/2013 12:16 PM  RssIcon

LINQ joins, both inner and outer, are not always what they seem and how you go about it may depend on whether you want grouping.

Even for simple projects, I like to use LightSwitch for what I call Back Room screens because it’s just so darn convenient. You’ll see what I mean later, because these first two tables are so primitive, so uncouth, so crude that I don’t even call them tables – LightSwitch will attach to them but only as read-only. I call them MasterRowPile and ChildRowPile LightSwitch, as you will see, wants little to do with them because they have no primary keys. Brace yourselves, they are hideous to behold.

image

The purpose of these unrelated tables is to start with the simplest case and illustrate the differences as relationships are added.

To set started I’m going to:

  • Create a manual relationship between the two tables by setting MasterRowID in ChildRowPile rows to match ID values in MasterRowPile rows.
  • Show you two ways LINQ provides to join between the two
  • Show you the T-SQL generated by the LINQ provider
  • Show the Execution Plans generated by the T-SQL

Since LightSwitch wouldn’t help me out, I had to use SSMS to enter values:

image

Inner and Outer Joins on Unrelated Tables

The tables shown above have no relationship as far as SQL is concerned, so any relationship is strictly by convention, meaning there is a column or columns on the ‘child’ table that is used to match a column or columns on the ‘master’ table. There are two ways to join these tables, using ‘join’, and using ‘where’

Join using ‘join’

private static void UnKeyedInnerJoinFromQuery()
{
    ShowTitle("UnKeyedInnerJoinFromQuery");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = from master in dc.MasterRowPiles
                    join child in dc.ChildRowPiles
                        on master.ID equals child.MasterRowID
                    select new { MasterID = master.ID, Master = master.MasterRowName, ChildMasterID = child.MasterRowID, Child = child.ChildRowName };
        foreach (var entity in query)
            Console.WriteLine("{0} : {1} : {2} - {3}", entity.MasterID, entity.Master, entity.ChildMasterID, entity.Child);
    }
}

Join using ‘where’ (Equi-Join)

private static void UnKeyedEquiJoinFromQuery()
{
    ShowTitle("UnKeyedEquiJoinFromQuery");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = from master in dc.MasterRowPiles
                    from child in dc.ChildRowPiles
                    where master.ID == child.MasterRowID
                    select new { MasterID = master.ID, Master = master.MasterRowName, ChildMasterID = child.MasterRowID, Child = child.ChildRowName };
        foreach (var entity in query)
            Console.WriteLine("{0} : {1} - {2} : {3}", entity.MasterID, entity.Master, entity.ChildMasterID, entity.Child);
    }
}

The Results

As we expect, we get a row for every master child combination and now rows for masters with no children.

image

The T-SQL and Execution Plans

image

It turns out that both approaches generate very similar T-SQL and identical execution plans.

So that takes care of the simple case. Not very interesting, and we’re not even going to look at the Extension Method syntax because it will come up later. We’re going to move on to Left Outer Joins on unrelated tables. This is where we get into whether to group or not to group.

Left Outer Join – Flattened Results

private static void UnKeyedLeftEquiJoinFromQuery()
{
    ShowTitle("UnKeyedLeftEquiJoinFromQuery");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = 
        from master in dc.MasterRowPiles
        from child in dc.ChildRowPiles
            .Where(c => c.MasterRowID == master.ID)
            .DefaultIfEmpty()
            select new 
            { 
                MasterID = master.ID, 
                Master = master.MasterRowName, 
                ChildMasterID = child.MasterRowID != null ? child.MasterRowID : 0, 
                Child = child.ChildRowName 
            };
        foreach (var entity in query)
            Console.WriteLine("{0} : {1} - {2} : {3}", entity.MasterID, entity.Master, entity.ChildMasterID, entity.Child);
    }
}

This works because instead of the where keyword, I used the Where extension method to filter on the column values. This allowed me to apply the DefaultIfEmpty extension method to provide a default / empty child object if none exist on the database. That’s all it takes to convert an inner join to a left outer join.

As expected, we get not only the masters with children, but one row for each master that doesn’t have a child. I had to accommodate the constraint that null can’t be assigned to ChildMasterID, our first complication from dealing with results of Left Outer Joins. We’ll get into that more later.

image

Left Outer Join – Grouped Results

This is similar to the inner join up until the into keyword. At that point one group is created for every master row and linked to a collection of child rows associated with that group. If there are no child rows the collection is empty, but the group still exists.

private static void LeftJoinUnkeyedUsingIntoFromQuery()
{
    ShowTitle("LeftJoinUnkeyedUsingIntoFromQuery");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = 
            from master in dc.MasterRowPiles
            join child in dc.ChildRowPiles
                on master.ID equals child.MasterRowID
            into children
            select new
            {
                Master = master.MasterRowName,
                Children = children
            };
        foreach (var group in query)
        {
            Console.WriteLine(group.Master);
            foreach (var child in group.Children)
                Console.WriteLine("{0} - {1} ", child.MasterRowID, child.ChildRowName);
        }
    }
}

The program output shows the grouping:

image

T-SQL From the Left Outer Join using the ‘into’ keyword

I’m going to show you the T-SQL, but first I want to show you how I get it from Visual studio, so I’m going to show you each step I’m taking to get the T-SQL that will be generated by the query above:

  • Place your cursor on a line after the IQueryable you’re interested in and select ‘Run To Cursor’. In the following, I’m going to inspect ‘query’ using the Debugging Visualizer, so I place my cursor on the ‘foreach’ line, right-click, and select ‘Run To Cursor’
  • image
  • When the line is hit, hover your cursor over ‘query’ and a tooltip will appear:
  • image
  • Click on SELECT in the tooltip and the textbox will open up.
  • image
  • Hit ctl-a / ctl-c to copy the contents.
  • image

And here’s the T-SQL:

{SELECT [t0].[MasterRowName] AS [Master], [t1].[ChildRowName], [t1].[MasterRowID], (
    SELECT COUNT(*)
    FROM [dbo].[ChildRowPile] AS [t2]
    WHERE [t0].[ID] = [t2].[MasterRowID]
    ) AS [value]
FROM [dbo].[MasterRowPile] AS [t0]
LEFT OUTER JOIN [dbo].[ChildRowPile] AS [t1] ON [t0].[ID] = [t1].[MasterRowID]
}

Now I’m a little confused at this point, because if I execute that T-SQL in SSMS I see flattened rows:image

If I look at the output from the LINQ in LINQPad in rich text format I see grouping:

image

If I view it in LINQPad grid view:

imageimage

But LINQPad sees the same T-SQL and yields the same results when you run it:

image

So there’s more going on here than is revealed in the T-SQL. As a matter of fact, Later we’ll look at GroupJoin(), which is the underlying extension method to see if we can figure out what’s going on. For now it’s enough to know the ‘join into’ approach will give you left outer join results, but grouped.

Let’s go back and look at the T-SQL generated by the method that gave us the flattened results:

T-SQL From the Left Outer Join using the Where and DefaultIfEmpty LINQ Extension Methods

It’s doing the same left outer join as the T-SQL we saw above generated by the join into approach. The only differences are the CASE statement that is preventing the NULL assignment, which is an implementation detail. The other difference is that the ‘join into’ T-SQL was saving the count of child rows as can be seen in the value column above. I assume that is to facilitate stepping through the ‘groups’.

SELECT [t0].[ID] AS [MasterID], [t0].[MasterRowName] AS [Master], 
    (CASE 
        WHEN ([t1].[MasterRowID]) IS NOT NULL THEN [t1].[MasterRowID]
        ELSE @p0
     END) AS [ChildMasterID], [t1].[ChildRowName] AS [Child]
FROM [dbo].[MasterRowPile] AS [t0]
LEFT OUTER JOIN [dbo].[ChildRowPile] AS [t1] ON [t1].[MasterRowID] = [t0].[ID]

Note the zero I specified as the null replacement assignment to MasterRowID is parameterized in the resultant T-SQL. After substituting a zero and running the query in SSMS I see what I would expect:

image

It turns out that the extension method generated when two ‘froms’ are used like this is SelectMany. So it seems that GroupJoin and SelectMany both do left outer joins, but GroupJoin groups the results and SelectMany leaves them flattened. We’ll look more closely later on.

Right now, let’s look at the difference when joining related tables.

Inner and Outer Joins on Related Tables

I have a pair of existing tables, Departments and Employees

image

Employees.DepartmentID is a foreign key pointing to Departments.ID. Each Department can be said to have a collection of zero or more Employees, as a Department may exist that has no Employees.

LightSwitch likes these tables just fine, so I can use it to show you the values in the tables.

image

Let’s begin with an inner join that will be the basis of our left outer join.

Inner Join on Related Tables

private static void InnerJoinOnCollectionFromQuery()
{
    // Same Result as EquiJoin
    ShowTitle("InnerJoinOnCollectionFromQuery");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = from dept in dc.Departments
                    from emp in dept.Employees
                    select new { emp.Name, Department = dept.Name };
        foreach (var emp in query)
            Console.WriteLine("{0} - {1}", emp.Name, emp.Department);
    }
}

The difference is in the second ‘from’, which is using dot notation to get to the collection of Employees linked to the current Department. It turns out the T-SQL is exactly the same, but I think this code is more elegant than using a where to filter all Employees. And it definitely makes the left outer join easier.

As we’d expect for an inner join, we only get rows for Departments that have Employees:

image

And the T-SQL shows we’re actually getting an Equi-Join, which as we’ve already seen, generates exactly the same execution plan as an inner join.

{SELECT [t1].[Name], [t0].[Name] AS [Department]
FROM [dbo].[Departments] AS [t0], [dbo].[Employees] AS [t1]
WHERE [t1].[DepartmentID] = [t0].[ID]
}

Although now it can take advantage of the primary keys and relationship between the tables, just as the inner join would:

image

Left Outer Join on Related Tables

We only have to modify our inner join query slightly to convert it to a left outer join. In fact, we only have to add one extension method, DefaultIfEmpty. Because our dept.Employees collection is already limited to Employees in the current department, we can just slap the DefaultIfEmpty() method directly onto the collection:

private static void LeftJoinUsingCollectionFromQuery()
{
    ShowTitle("LeftJoinUsingCollectionFromQuery");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = from dept in dc.Departments
                    from emp in dept.Employees.DefaultIfEmpty()
                    select new
                    {
                        Department = dept.Name,
                        Employee = emp.Name
                    };
        foreach (var emp in query)
        {
            Console.WriteLine("{0} - {1}", emp.Department, emp.Employee);
        }
    }
}

And sure enough, rows for every Department, including the ones with no Employees

image

The T-SQL is just a straightforward left outer join:

SELECT [t0].[Name] AS [Department], [t1].[Name] AS [Employee]
FROM [dbo].[Departments] AS [t0]
LEFT OUTER JOIN [dbo].[Employees] AS [t1] ON [t1].[DepartmentID] = [t0].[ID]

Grouped Left Outer Join On Related Tables Using the ‘join into’ Approach

I’m only showing you this because later we’ll want to compare the Extension Method Syntax, which hereafter I’m just going to call ‘Lambda Syntax’

private static void LeftJoinUsingIntoFromQuery()
{
    ShowTitle("LeftJoinUsingIntoFromQuery");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = from dept in dc.Departments
                    join emp in dc.Employees
                    on
                        dept.ID equals emp.DepartmentID
                        into ed
                    select new
                    {
                        Department = dept.Name,
                        Employees = ed
                    };
        foreach (var group in query)
        {
            Console.WriteLine(group.Department);
            foreach (var emp in group.Employees)
                Console.WriteLine("  " + emp.Name);
        }
    }
}

The results have groups for every master row and collections of employees under each Department that has Employees:

image

The T-SQL is what we expect, with a column for the count of employees in each group.

SELECT [t0].[Name] AS [Department], [t1].[ID], [t1].[Name], [t1].[Salary], [t1].[DepartmentID], (
    SELECT COUNT(*)
    FROM [dbo].[Employees] AS [t2]
    WHERE [t0].[ID] = [t2].[DepartmentID]
    ) AS [value]
FROM [dbo].[Departments] AS [t0]
LEFT OUTER JOIN [dbo].[Employees] AS [t1] ON [t0].[ID] = [t1].[DepartmentID]
ORDER BY [t0].[ID], [t1].[ID]

And the output from the T-SQL:

image

Actually I can see now that the grouping is achieved by parsing the flattened output.

LINQ Extension Method Syntax

To understand the difference between the flattened and grouped results, we need to understand the function of the Extension Methods used in Lambda Syntax. Lambda syntax is always generated and passed to IQueryable as Func<> and Action<> delegates wrapped in Expressions, so even if we write the query in Comprehension syntax, understanding the Extension Method that actually gets called is key.

In the case of the successive from keywords in the Comprehension syntax query, the Extension Method is SelectMany()

private static void LeftJoinUsingCollectionFromLambda()
{
    ShowTitle("LeftJoinUsingCollectionFromLambda");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = 
        dc.Departments
        .SelectMany
        (
            dept => dept.Employees.DefaultIfEmpty(),
            (dept, emp) =>
            new 
            {
                Department = dept.Name,
                Employee = emp.Name
            }
        );
        foreach (var emp in query)
        {
            Console.WriteLine("{0} - {1}", emp.Department, emp.Employee);
        }
    }
}

In the case of the ‘join into’ approach in the Comprehension syntax query, the Extension Method is GroupJoin()

private static void LeftJoinUsingIntoFromLambda()
{
    ShowTitle("LeftJoinUsingIntoFromLambda");
    using (RW_DataDataContext dc = new RW_DataDataContext())
    {
        var query = 
        dc.Departments
            .GroupJoin
            (
                dc.Employees,
                dept => (Int32?)(dept.ID),
                emp => emp.DepartmentID,
                (d, emps) =>
                new
                {
                    Department = d.Name,
                    Employees = emps
                }
            );
        foreach (var group in query)
        {
            Console.WriteLine(group.Department);
            foreach (var emp in group.Employees)
                Console.WriteLine("  " + emp.Name);
        }
    }
}

SelectMany() and GroupJoin() LINQ Extension Methods

  • Queryable.SelectMany() Documentation
    • Queryable.SelectMany<TSource, TCollection, TResult>
    • “Projects each element of a sequence to an IEnumerable<T> and invokes a result selector function on each element therein. The resulting values from each intermediate sequence are combined into a single, one-dimensional sequence and returned.’
    • The Enumerable.SelectMany descriiption is a little easier to understand – “Projects each element of a sequence to an IEnumerable<T> and flattens the resulting sequences into one sequence”

Yeah, Yeah, but how does it work. To make the lambda syntax a little easier to follow, In the comprehension query I did something I normally never do, which was to use one letter variable names that would be spread over more than one line of code. Since there’s only c and d, it’s clear what’s being referred to, and it makes them easier to follow in the lambda syntax query. Here it is as generated by LINQPad

image

So that’s not too bad. Each department ‘goes to’ a collection of department employees. For each member in that collection, department and employee ‘go to’ an anonymous type object with department and employee name.

  • Queryable.GroupJoin().Documentation
    • Queryable.GroupJoin<TOuter, TInner, Tkey, TResult>
    • “Correlates the elements of two sequences based on key equality and groups the results.”
    • The Enumerable.GroupJoin description is the same

Here’s the ‘join into’ Lambda with one character variable names

image

Not too bad, I think. Here goes. From Departments join to Employees. Join on Department.ID and Employee.DepartmentID. The output consists of the Department Name and the Employees Collection of all the Employees where Employee.DepartmentID matches Department.ID. As we’ve seen this results in T-SQL that puts out a flattened result set with a column for the count of Employees in each group.

Where’s The LightSwitch?

You have me there. Hopefully the need for LINQ in LightSwitch and WCF RIA services and everywhere else will justify posting this here.

I will tell you this. If you attach to a table that has a nullable foreign key, LightSwitch will see that as a zero-or-one to many relationship, as it should. But if you later make that foreign-key column non-nullable your screens will stop working. If you then rebuild the entire solution you’ll get an error

image

However it is not possible to edit the relationship.

image

So you’re stuck with it unless you do some deleting, which always promises / threatens to delete all related code, so that’s a hassle.

Conclusion

If you want a flattened result set from your left outer join, use the multiple from approach in your Comprehension Query to generate the SelectMany() Extension Method. Otherwise use ‘join into’ to generate the GroupJoin extension method which will give you grouped results.

Tags:
Categories:

6 comment(s) so far...


Gravatar

Re: Joins and Left Outer Joins in LINQ–Do you want groups with that?

Good post. I just stumbled upon your blog and wanted to say that I have really enjoyed reading your blog posts

By snow plowing on   3/15/2013 6:40 AM
Gravatar

Re: Joins and Left Outer Joins in LINQ–Do you want groups with that?

This site is excellent and so is how the subject matter was explained. I also like some of the comments too.Positive site, where did u come up with the information on this posting?I have read a few of the articles on your website now, and I really like your style. Thanks a million and please keep up the effective work.

By snow plowing on   3/15/2013 6:41 AM
Gravatar

Re: Joins and Left Outer Joins in LINQ–Do you want groups with that?

Dear snow plowing,

Thanks so much for the wonderful comments.

Most of my information comes from the Pluralsight course "LINQ Fundamentals", which I highly recommend, along with Pluralsight in general. But I was prompted to write the article after I semi-blew an interview question by mumbling something about grouping. When I started to research the question afterwards I ran across an old posting (2006) by Joe Albahari at

http://social.msdn.microsoft.com/Forums/en-US/linqprojectgeneral/thread/c139313e-d745-4e1d-b3dc-ab355507eb48/

asking what the recommended way was to do an outer join in LINQ to SQL. You know, the creator of LINQPad. My first thought was, "if he's asking, what chance to I have of getting it right".

His code:
var query =
from c in Data.Customers
from p in c.Purchases.DefaultIfEmpty()
...

This was the first time I recalled seeing this approach, and it seemed and still does, a more intuitive syntax than "join into", and yields a more intuitive result set. So I went back to the course and started trying to understand the differences. But they kept slipping away. So I decided the only way I would ever understand was to try and explain it to someone else.

It does seem that when the question is addressed the answer is usually "join into" / GroupJoin(). Maybe because they have the word join right in the name. But it really depends on whether you need grouping or not.

Regards,
Richard

By Richard Waddell on   3/15/2013 7:16 AM
Gravatar

Re: Joins and Left Outer Joins in LINQ–Do you want groups with that?

The above, LINQ queried are only for select . I want to delete a record from master page and all its depended record from its related table then how to write query.
Details of my question as given below:
As we have three table

Emp
{
eid(primary key), deptid(foreign key)
}

dept
{
deptid (primary key),
deptname,
subdivisionid(foreign key)
}

subdepartment
{
subdivisionid(primary key)
}
Now my question is , i want delete emp record and all dependented records from dept and subdepartment.

1. would you send lamda expression query.
2. would you send sql query

By raj on   3/20/2013 6:17 AM
Gravatar

Re: Joins and Left Outer Joins in LINQ–Do you want groups with that?

raj,
I'm not sure I understand the question, but it sounds like you want deletion of an Emp to cause deletion of the dept he belongs to and the subdepartment (or subdivision?) that the dept belongs to. However, I don't think I'm understanding correctly.

To answer what I think you're asking - Delete rows automatically from a table when they are in a foreign-key relationship with another table by specifying "Cascade Delete" in the relationship 'On Delete Behavior' property. The result of that setting will be that when a row in the primary key table is deleted, all rows in the foreign-key table that point to that primary key will be deleted automatically. So if you specified 'Cascade Delete' on the relationships between subdivision/dept, and dept/Emp, then deleted a subdivision, all dept rows that point to it would be deleted, and all Emp rows that point to the deleted dept rows would be deleted.

By Richard Waddell on   3/20/2013 6:26 AM
Gravatar

Re: Joins and Left Outer Joins in LINQ–Do you want groups with that?

After reading your sharing I want to say that: most of your postings here are so very good with useful information. I will try to shre it . Thank you!



aion accounts

By aion accounts on   3/28/2013 6:07 AM

Your name:
Gravatar Preview
Your email:
(Optional) Email used only to show Gravatar.
Your website:
Title:
Comment:
Security Code
CAPTCHA image
Enter the code shown above in the box below
Add Comment   Cancel 

Microsoft Visual Studio is a registered trademark of Microsoft Corporation / LightSwitch is a registered trademark of Microsoft Corporation