.NET & Architecture Thoughts: LINQ Explained

This is the second part of my ongoing series on LINQ. In the first installment, we had an overview of LINQ. In this post, we will look at some of the underlying concepts that are important for working with LINQ. You are not required to master them, but understanding these concepts gives you an extra level of confidence when working with LINQ. (Note: I will use C# as the language of preference in this series.)

LINQ and C# Language

LINQ works with C# 3.0 and Visual Basic 9.0 and relies heavily on features provided by these languages. Although these features may be used separately, they are fundamental to the working of LINQ. Some of them were delivered with C# 1.x and 2.0, the predecessors of C# 3.0. The following sections describe these features in more detail.

Generics

Let us look at a simple example of comparing two numbers. The following method accepts two integers as arguments, compares them and returns the smaller of the two:

private int Compare (int x, int y)
{
 return x < y ? x : y;
}

This code works fine for two integers. But suppose we wanted to compare two floating-point values or even two strings. We would either have to change the method signature to accept the respective types or write an entirely new method for each type. As developers, we would rather have one generalized method that performs the same function irrespective of the argument types.

One solution to the above problem is to accept arguments of type object. In C#, every type is derived from the base type object, so any value can be cast to object and back. We can rewrite the above method as follows:

private object Compare (object x, object y)
{
           // comparison logic goes here
}

The above method now accepts object arguments. This makes the method more general, but it has some shortcomings. First of all, C# is a type-safe language: every object has an associated type, and only operations defined by that type can be performed on it. A comparison operator such as ‘<’ is not defined for the reference type object, so the arguments must be converted back to their actual type before they can be compared. In other words, explicit casting is required. For example, to compare integers, we write the following code:

int i = (int) x;
int j = (int) y;

Similarly, to work with strings, we perform the following casts:

string str1 = (string) x;
string str2 = (string) y;

Second, converting a value type to a reference type (boxing) involves an overhead: a new object is allocated on the heap and the value is copied into it. Converting back from the reference type to the value type (unboxing) involves an overhead as well. As a result, boxing and unboxing hurt the performance of an application. The effect is magnified when collections are used. Non-generic collections such as Stack, Queue and ArrayList operate on the object type only. When a value-type element is added to such a collection, boxing takes place. Similarly, when the value is retrieved from the collection, it must be unboxed. This means that every time a value type is added to or retrieved from such a collection, a performance overhead is involved.
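The boxing and casting described above can be seen in a small sketch. The class and method names (BoxingDemo, SumWithArrayList, MixedTypesFailAtRunTime) are made up for illustration; only ArrayList itself comes from the framework.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // Sums integers stored in a non-generic ArrayList.
    public static int SumWithArrayList()
    {
        ArrayList numbers = new ArrayList();
        numbers.Add(10);   // the int is boxed into an object
        numbers.Add(20);   // boxed again

        int total = 0;
        foreach (object item in numbers)
        {
            total += (int) item;   // the explicit cast unboxes the value
        }
        return total;
    }

    // Nothing stops a caller from mixing types in an ArrayList;
    // the mistake only shows up at run time, when the cast fails.
    public static bool MixedTypesFailAtRunTime()
    {
        ArrayList values = new ArrayList();
        values.Add(10);
        values.Add("ten");   // compiles without complaint

        try
        {
            int total = 0;
            foreach (object item in values)
            {
                total += (int) item;   // throws InvalidCastException on "ten"
            }
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

The second method shows the type-safety side of the problem: the compiler cannot help us because, as far as it is concerned, everything in the list is just an object.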

The above issues can be addressed through Generics. A C# 2.0 language feature, Generics introduced the concept of type parameters. Classes and methods can defer the specification of one or more types until they are used by client code. With Generics, there is one implementation for all types. The actual type is supplied when the method is invoked or an object is instantiated. Let us redefine our Compare method using Generics:


private T Compare <T> (T x, T y) where T : IComparable <T>
{
           return x.CompareTo (y) < 0 ? x : y;
}

The placeholder <T> appended to the method name marks it as a generic method. T represents the type parameter to be supplied when the method is used. The same placeholder is used for the parameters and the return type. Note that since T is just a placeholder (and not a concrete type), the compiler cannot assume that it has a CompareTo method. For this reason, the ‘where’ clause restricts T to types that implement the IComparable<T> interface. We will look at the ‘where’ constraint shortly. For now, we can use the above method to compare different types as follows:

int a = 20, b = 19;
int c = Compare <int> (a, b);

string str1 = "Zzzz", str2 = "Aaaa";
string str3 = Compare <string> (str1, str2);

Notice that no explicit casting is required for the arguments. The method is invoked with the type argument in place of the placeholder and that’s it. The compiler and the CLR are responsible for handling the rest.
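As a small addition, the type argument does not always have to be spelled out: when it can be determined from the arguments, the C# compiler infers it. The sketch below repeats the Compare method inside a hypothetical CompareDemo class so that it can be compiled on its own.

```csharp
using System;

public static class CompareDemo
{
    // Same logic as the Compare method above: returns the smaller argument.
    public static T Compare<T>(T x, T y) where T : IComparable<T>
    {
        return x.CompareTo(y) < 0 ? x : y;
    }

    public static double SmallerDouble()
    {
        // No <double> needed: T is inferred from the two double arguments.
        return Compare(2.5, 1.5);
    }

    public static string SmallerString()
    {
        // T is inferred as string.
        return Compare("Zzzz", "Aaaa");
    }
}
```

A type that does not implement IComparable<T> is rejected at compile time, which is exactly the safety net the object-based version lacked.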

The Generic concept also applies to classes and structures. Let us look at a Generic class:


public class UserAuthentication <T>
{
     private T myPassword;
     private string myUserID;

     public T Password
     {
          get { return myPassword; }
          set { myPassword = value; }
     }

     public string UserID
     {
          get { return myUserID; }
          set { myUserID = value; }
     }
}

The above class creates a token for user authentication. Notice the placeholder <T> defined next to the class name. The password field is declared with this type parameter, and the Password property likewise has a return type of T. We can instantiate the above class with the following code:


UserAuthentication <string> userAuth;

userAuth = new UserAuthentication <string> ();
userAuth.Password = "Secret";
userAuth.UserID = "User1";

UserAuthentication <int> userAuthInt;

userAuthInt = new UserAuthentication <int> ();
userAuthInt.Password = 123456;
userAuthInt.UserID = "User2";

The .NET Framework provides several generic collections in the System.Collections.Generic namespace. These include:

Stack <T> - a Last-In-First-Out collection
Queue <T> - a First-In-First-Out collection
List <T> - a strongly typed list of objects
Dictionary <K, V> - a collection of key-value pairs
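Here is a minimal sketch of three of these collections in use. The class name, method names and the prices are invented for the example.

```csharp
using System.Collections.Generic;

public static class GenericCollectionsDemo
{
    // Stack<T>: the last element pushed is the first one popped.
    public static int PopLast()
    {
        Stack<int> stack = new Stack<int>();
        stack.Push(1);
        stack.Push(2);
        stack.Push(3);
        return stack.Pop();      // no cast and no unboxing
    }

    // Queue<T>: the first element enqueued is the first one dequeued.
    public static string DequeueFirst()
    {
        Queue<string> queue = new Queue<string>();
        queue.Enqueue("first");
        queue.Enqueue("second");
        return queue.Dequeue();
    }

    // Dictionary<TKey, TValue>: values are looked up by key.
    public static decimal LookUpPrice(string name)
    {
        Dictionary<string, decimal> prices = new Dictionary<string, decimal>();
        prices["Rice"] = 2.50m;  // made-up prices
        prices["Milk"] = 1.20m;
        return prices[name];
    }
}
```

In each case the element type is fixed when the collection is created, so adding a value of the wrong type is a compile-time error rather than a run-time surprise.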

Let us see a generic list in action which accepts a strongly typed parameter. We first define the Product type, followed by a generic list of products:


public class Product
{
     string productName;

     public Product (string pName)
     {
          productName = pName;
     }

     public string ProductDetails
     {
          get
          {
              return "Product-Name: " + productName;
          }
     }
}

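// requires: using System.Collections; using System.Collections.Generic;
public class ProductList <T> : IEnumerable <T> where T : Product
{
     List <T> productList = new List <T> ();

     public void AddProduct (T product)
     {
          productList.Add (product);
     }

     public T GetProduct (int index)
     {
          return productList [index];
     }

     // generic version, used by foreach over IEnumerable <T>
     IEnumerator <T> IEnumerable <T>.GetEnumerator ()
     {
          return productList.GetEnumerator ();
     }

     // non-generic version, required because IEnumerable <T> inherits IEnumerable
     IEnumerator IEnumerable.GetEnumerator ()
     {
          return productList.GetEnumerator ();
     }
}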

We can now use the ProductList class to add and list products (a discussion of IEnumerable will follow shortly):


ProductList <Product> productList = new ProductList <Product> ();
        
productList.AddProduct (new Product ("Rice"));
productList.AddProduct (new Product ("Milk"));
productList.AddProduct (new Product ("Sugar"));

… = productList.GetProduct (1).ProductDetails; // no cast needed: GetProduct already returns a Product

Before I sum up the generics discussion, one last thing worth mentioning is constraints. As you may have noticed, both the ProductList class and the Compare method use a ‘where’ clause. A constraint is a condition applied to the type parameter; it restricts the types that can be supplied as type arguments. The constraint ‘where T : Product’ in the class definition gives us a generic list which only deals with Product objects (or objects of types derived from Product). The following constraints can be attached to a generic type parameter:

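where T : class – type argument must be a reference type
where T : struct – type argument must be a (non-nullable) value type
where T : new () – type argument must have a public parameterless constructor
where T : <base class name> – type argument must be, or derive from, the given class
where T : <interface name> – type argument must be, or implement, the given interface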
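Constraints can also be combined. The following sketch uses ‘class’ and ‘new ()’ together; Customer and Factory are hypothetical types written only for this example.

```csharp
public class Customer
{
    public string Name = "unknown";
}

public static class Factory
{
    // T must be a reference type with a public parameterless constructor.
    // The new () constraint is what makes the 'new T ()' expression legal.
    public static T CreateOrReuse<T>(T existing) where T : class, new()
    {
        return existing != null ? existing : new T();
    }
}
```

Calling Factory.CreateOrReuse<Customer> (null) returns a freshly constructed Customer, while a call such as Factory.CreateOrReuse<int> (5) does not compile because int is not a reference type.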

Delegates

A delegate is an object which holds a reference to a method. When the delegate is called, the underlying method is invoked, so calling a delegate looks just like calling the referenced method. The method can be either a static or an instance method. A delegate defines a method signature, and any method with a matching signature can be referenced by the delegate. This makes it possible to point the delegate at a different method programmatically, and to change the code inside the methods, without modifying the code that calls the delegate. This simple abstraction adds a lot of power to the .NET Framework. (A detailed discussion of delegates is beyond the scope of this post. A detailed post or two would cover delegates, events, asynchronous callbacks and threading in the future.)

Working with delegates is pretty simple in C#. Always keep the method signature in mind when defining a delegate. Let us look at the syntax for defining a delegate:

delegate result-type Name (parameters);

The delegate keyword is used as a prefix to define the delegate. The result-type is the return type of the referenced method. The Name is the identifier of the delegate, and the optional comma-separated parameters are the input arguments of the referenced method. The result-type and parameters define the signature of the delegate. Using the above syntax, we can create a delegate as follows:

public delegate void Calculate (int value, int amount);

Any method with matching signature can be referenced by the above delegate. Let us define a method with the matching signature:


public class Accounts
{
          public void DebitAccount (int x, int y)
          {
                    int sum;
                    sum = ((x * 10) / y) * 2;
                    // use sum…
          }        
}

We can now instantiate and invoke the delegate by referencing the DebitAccount method as follows:

Accounts objAccount = new Accounts ();
Calculate calc = new Calculate (objAccount.DebitAccount); // reference method
calc (2, 3); // call delegate

When we call the delegate, the DebitAccount method is invoked.

Multicasting is one of the features provided by delegates. A multicast delegate can reference more than one method at a time. When the delegate is called, all the referenced methods are invoked. The methods are invoked in the order in which they are referenced by the delegate. Let us modify the Accounts class by adding the following method to it:


public void CreditAccount (int x, int y)
{
           int average;
           average = ((x / 2) + 10) - y;
           // use average…
}

The calc delegate can now reference the above method as well, using the addition assignment operator (+=):

calc += objAccount.CreditAccount;

Now if we call the delegate using calc (2, 3), both methods get invoked. When a method should no longer be called through the delegate, its reference can be removed using the subtraction assignment operator (-=). Assigning null removes all references:

calc -= objAccount.CreditAccount; // remove reference to CreditAccount
calc = null; // remove all references
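To see the invocation order for yourself, here is a self-contained sketch that records each call in a list. The Notify delegate and the MulticastDemo class are hypothetical names used only here.

```csharp
using System.Collections.Generic;

public delegate void Notify(string message);

public class MulticastDemo
{
    // Records the calls so that the order of invocation can be inspected.
    public readonly List<string> Log = new List<string>();

    private void First(string message)  { Log.Add("First: " + message); }
    private void Second(string message) { Log.Add("Second: " + message); }

    public void Run()
    {
        Notify notify = new Notify(First);
        notify += Second;      // invocation list is now: First, Second
        notify("a");           // calls First, then Second

        notify -= Second;      // invocation list is now: First
        notify("b");           // calls First only

        notify -= First;       // removing the last method leaves notify null
        if (notify != null)
        {
            notify("c");       // never reached
        }
    }
}
```

After Run () the log contains "First: a", "Second: a" and "First: b", in that order. Note the null check at the end: invoking a delegate with an empty invocation list throws a NullReferenceException.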

Delegates are also used for asynchronous callbacks. When ASP.NET receives a request for a page, it assigns a thread from the thread pool to the request. In a synchronous call, the page holds on to the thread for the duration of the request, so that thread cannot serve any new requests. This is acceptable for a short-lived request, but if the request is time-consuming, such as one that calls multiple web services or performs an I/O-bound job, the thread stays blocked and the delay becomes noticeable.

Delegates help perform asynchronous tasks using a method callback. With this technique, the delegate invokes the time-consuming method on a separate thread and control returns to the caller immediately. The time-consuming task executes in the background while we continue with our processing. When the background job is finished, a callback method is invoked which can handle the result. Let me demonstrate this concept with an example. We first write the time-consuming process as follows:


// the time consuming process
public bool LongProcess (int wait)
{
          // your lengthy task goes here 
          System.Threading.Thread.Sleep (wait); // just for demonstration
          return true; // return the result
}

Next we declare a delegate and use it to invoke the above method.


// define the delegate
public delegate bool LengthyProcessDelegate (int wait);

The user clicks a button to start the process:


// Use the delegate to start the lengthy process
protected void StartProcessing_Click (object sender, EventArgs e)
{
          LengthyProcessDelegate  lDelegate = new LengthyProcessDelegate (LongProcess);
          lDelegate.BeginInvoke (5000, new AsyncCallback (LongProcessCallback), lDelegate);
          for (int i = 0; i <= 50; i++)        
          { 
                    // do something 
          }
}

The above code first creates a new instance of the delegate which holds a reference to the time-consuming method. Next it calls the BeginInvoke method on the delegate. You may be wondering where this method comes from. When we declare a delegate, the compiler generates a class similar to the following:


class LengthyProcessDelegate : System.MulticastDelegate
{
     // synchronous execution
     public bool Invoke (int wait);

     // asynchronous execution methods
     public IAsyncResult BeginInvoke (int wait,
                                      AsyncCallback callback,
                                      object asyncState);

     public bool EndInvoke (IAsyncResult result);
}

The Invoke method is used for synchronous calls. The other two methods, BeginInvoke and EndInvoke, handle the asynchronous activity.

The BeginInvoke method returns an object that implements the IAsyncResult interface. It accepts the same arguments as defined by the delegate plus two additional parameters, either of which may be null. The first is an instance of AsyncCallback (another delegate) which references the callback method. The second is of type object and can be used to pass any state information to the callback; in the example above we pass the delegate itself.

The EndInvoke method has the same return type as defined by the delegate. It accepts an instance of IAsyncResult. As mentioned above, BeginInvoke returns an instance of IAsyncResult. The same instance is handed to the callback method, which passes it on to EndInvoke. EndInvoke in turn returns the result of the time-consuming method, which can be used for further processing. Let us look at the callback method in action:


// Callback method
public void LongProcessCallback (IAsyncResult result)
{
          LengthyProcessDelegate lDelegate= (LengthyProcessDelegate) result.AsyncState;
          bool returnValue = lDelegate.EndInvoke (result);
          // use returnvalue
}

As mentioned above, when BeginInvoke is called, the delegate invokes the time-consuming method on a separate thread-pool thread and control returns to the program immediately; the callback method runs once that method has finished. As you may have noticed, there is a dummy loop after the call to BeginInvoke. This is just to show that processing continues and does not wait for the time-consuming process to complete.
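One caveat if you try this on a newer runtime: delegate BeginInvoke and EndInvoke are a .NET Framework feature, and on .NET Core and later calling them throws a PlatformNotSupportedException. The sketch below is not the technique described above but a substitute for it: the same "start the work, get called back with the result" idea expressed with Task.Run and ContinueWith from System.Threading.Tasks. The class and method names are my own, and it uses lambda expressions, a C# 3.0 feature.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class TaskCallbackDemo
{
    // The time-consuming process, as in the example above.
    public static bool LongProcess(int wait)
    {
        Thread.Sleep(wait);   // just for demonstration
        return true;
    }

    // Starts LongProcess on a thread-pool thread and returns immediately.
    // The continuation plays the role of the callback method: it runs
    // after LongProcess has finished and receives its result.
    public static Task<string> Start(int wait)
    {
        Task<bool> work = Task.Run(() => LongProcess(wait));
        return work.ContinueWith(finished => "Result: " + finished.Result);
    }
}
```

The caller can carry on with other work after calling Start and read the Result of the returned task later, which blocks only if the work has not completed yet.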

Delegates also provide a rich programming model for handling events. An event lets an object notify the program when its state changes, so that other parts of the program can respond. This simple concept is very important for communication between objects, where a change of state in one object signals other objects to respond. A good example of events is a graphical user interface: the program transfers control to an event handler when an event such as a button click is triggered by a user action. Another example would be an Accounts object raising an event when a transaction is made.

In C#, events and delegates go hand in hand. The object which raises an event does not know which method will handle it. This is left to a delegate, which acts as a bridge between the object raising the event and the event handler. Let us see this concept with a simple example. We begin by defining a delegate:

// define the delegate
public delegate void AccountDelegate (); // no parameters, no return value

Next we define an Accounts class. This class has a Transaction property which fires an event whenever it is set. The event is declared using the AccountDelegate type. Since the delegate’s signature has no parameters and no return value, the event handler must have the same signature. The method that raises the event, OnTransactionOccur, is defined as virtual so that it can be overridden by derived classes.


public class Accounts
{
     private int amount;

     // define the event
     public event AccountDelegate transactionComplete;

     public int Transaction
     {
          get { return amount; }

          set
          {
               if (value <= 100)
                    amount--;
               else
                    amount++;

               OnTransactionOccur (); // raise the event
          }
     }

     protected virtual void OnTransactionOccur ()
     {
          if (transactionComplete != null)
               transactionComplete ();
     }
}

We can now subscribe to the event and raise it with the following code:


public void StartProcessing_Click (object sender, EventArgs e)
{
          Accounts account = new Accounts ();
          // register the event
          account.transactionComplete += new AccountDelegate (AccountEventHandler);

          account.Transaction = 100; // raise the event
}

First we instantiate an object of the Accounts class. Next we register the event handler AccountEventHandler with the event, using the delegate. We then set the Transaction property to raise the event. Remember that the OnTransactionOccur method of the Accounts class is actually responsible for raising the event. As soon as the event is raised, control is transferred to the following event handler:


public static void AccountEventHandler ()
{
          // event handling code goes here
}
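To tie the pieces together, here is a compact, self-contained variation that counts how often the handler runs and also shows unsubscribing with -=. SimpleAccount and EventDemo are hypothetical names, and the property setter is simplified to just store the value; the handler is written as an anonymous method, a C# 2.0 feature.

```csharp
public delegate void AccountDelegate();

public class SimpleAccount
{
    private int amount;

    public event AccountDelegate TransactionComplete;

    public int Transaction
    {
        get { return amount; }
        set
        {
            amount = value;
            OnTransactionOccur();   // raise the event on every assignment
        }
    }

    protected virtual void OnTransactionOccur()
    {
        // Copy to a local first so the null check and the call see the same list.
        AccountDelegate handler = TransactionComplete;
        if (handler != null)
        {
            handler();
        }
    }
}

public static class EventDemo
{
    public static int CountNotifications()
    {
        int count = 0;
        SimpleAccount account = new SimpleAccount();

        AccountDelegate handler = delegate { count++; };
        account.TransactionComplete += handler;   // subscribe

        account.Transaction = 100;                // handler runs
        account.Transaction = 200;                // handler runs again

        account.TransactionComplete -= handler;   // unsubscribe
        account.Transaction = 300;                // nobody is listening

        return count;
    }
}
```

CountNotifications returns 2: the third assignment happens after the handler has been removed, so the event has no subscribers and the null check skips the call.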

Enumerators

Enumeration, a powerful .NET concept, allows us to iterate through a collection of objects. In .NET, enumerators are based on the Iterator pattern. Using this pattern, we can access the elements of an aggregate object (a combination of many elements) without revealing its inner workings. The terms enumerator and iterator are often used interchangeably, but .NET uses the term enumerator.

A class must implement the IEnumerable interface to provide iteration. This interface exposes the following single method:


public interface IEnumerable
{
          IEnumerator GetEnumerator ();
}

The GetEnumerator method returns an object that implements the IEnumerator interface. This object does the actual iteration over our collection. According to MSDN, enumerators can be used to read the data in the collection, but they cannot be used to modify the underlying collection. The IEnumerator interface exposes the following members:


public interface IEnumerator
{
          bool MoveNext();
          object Current{ get; }
          void Reset();
}

When we implement our own enumerator using the IEnumerator interface, the enumerator is initially positioned before the first element. To advance to the first (and each subsequent) element, we call the MoveNext method. MoveNext returns true as long as it has moved to a valid element and false once it has passed the end of the collection. To get the element at the current position, we use the Current property. The Reset method positions the enumerator before the first element again.

Let me demonstrate the above concept with a simple example. I begin by defining the (beaten to death :-) Product class as follows:


public class Product
{
     private string productID;
     private string productName;

     public Product (string id, string name)
     {
          productID   = id;
          productName = name;
     }

     public override string ToString ()
     {
          return String.Format ("Product details are ID: {0}, Name: {1}",
                                productID, productName);
     }
}

Next we define our Custom Collection class which implements IEnumerable interface:


public class ProductCollection : IEnumerable
{
          private ArrayList productList;

          public ProductCollection ()
          {
                   productList = new ArrayList ();

                    productList.Add (new Product ("P1", "Tea"));
                    productList.Add (new Product ("P2", "Beverage"));
                    productList.Add (new Product ("P3", "Milk"));
          }
           
          public IEnumerator GetEnumerator ()
          {
                      return ((IEnumerable) productList).GetEnumerator ();
          }
}

I have used an ArrayList to hold the product collection. Since ArrayList already implements the IEnumerable interface, we can get hold of its enumerator object by calling its GetEnumerator method. Because ProductCollection implements IEnumerable, it can be used with a foreach loop:


public void StartProcessing_Click (object sender, EventArgs e)
{
     ProductCollection collection = new ProductCollection ();

     foreach (Product p in collection)
     {
          // use p, e.g. ListBox1.Items.Add (p.ToString ()); - adding to a listbox
     }
}

The foreach statement simplifies the enumeration code for us. Under the hood, when we use the foreach loop, the compiler generates an initial call to GetEnumerator. It then calls MoveNext before each iteration and reads the Current property to get the current item. Since a new enumerator is positioned before the first element, the compiler doesn’t have to call the Reset method.
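Written out by hand, the loop looks roughly like the sketch below. It is simplified: the code the compiler actually generates also casts Current to the loop variable's type and disposes the enumerator if it implements IDisposable. ForeachExpansionDemo and JoinAll are names invented for the example.

```csharp
using System.Collections;

public static class ForeachExpansionDemo
{
    // Walks any IEnumerable the way a foreach loop would
    // and joins the elements into one string.
    public static string JoinAll(IEnumerable collection)
    {
        string result = "";

        IEnumerator enumerator = collection.GetEnumerator();  // starts before the first element
        while (enumerator.MoveNext())                         // false once the end is passed
        {
            object current = enumerator.Current;              // element at the current position
            result += current + ";";
        }

        return result;
    }
}
```

Passing an ArrayList containing "a" and "b" yields "a;b;", and an empty collection yields an empty string because the very first MoveNext call returns false.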

We can also create our own enumerator class by implementing the IEnumerator interface. Let us modify our ProductCollection class with a nested class as follows:


public class ProductEnumerator : IEnumerator
{
     private ProductCollection productCollection;
     private int index;

     public ProductEnumerator (ProductCollection collection)
     {
          productCollection = collection;
          index = -1; // positioned before the first element
     }

     public void Reset ()
     {
          index = -1;
     }

     public object Current
     {
          get { return productCollection.productList [index]; }
     }

     public bool MoveNext ()
     {
          index++;
          return index < productCollection.productList.Count;
     }
}

Here is the tricky part. Previously we used the GetEnumerator method of the ArrayList to get an enumerator object. We will now modify that code to return an instance of our custom enumerator as follows:


public IEnumerator GetEnumerator()
{
             return (IEnumerator) new ProductEnumerator (this);
}

One last thing worth mentioning is generic enumerators. These enumerators are used with generic collections. The two interfaces are IEnumerable<T> and IEnumerator<T>, found in the System.Collections.Generic namespace. Both interfaces inherit from their non-generic counterparts, IEnumerable and IEnumerator. This means that generic enumerable objects can be used both generically and non-generically. Let us first look at the IEnumerable<T> interface:


public interface IEnumerable<T> : IEnumerable
{
             IEnumerator<T> GetEnumerator();
}

IEnumerable<T> inherits from IEnumerable. This means any collection implementing IEnumerable<T> interface must define a generic and non-generic version of GetEnumerator method. So a generic collection class will have the following two implementations of GetEnumerator method:


public IEnumerator<T> GetEnumerator()
{
          return new Enumerator<T>(this);
}

IEnumerator IEnumerable.GetEnumerator()
{
          return new Enumerator<T>(this);
}

The same concept applies to IEnumerator<T>. This interface is defined as follows:


public interface IEnumerator<T> : IDisposable, IEnumerator
{
           T Current { get; }
}

The IEnumerator<T> interface inherits from the IDisposable and IEnumerator interfaces. It declares only one member of its own, the strongly typed Current property. Any class implementing IEnumerator<T> must therefore also implement the members of the IEnumerator and IDisposable interfaces.
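The practical benefit of the typed Current property is that no cast (and, for value types, no unboxing) is needed when reading elements. A minimal sketch, with GenericEnumeratorDemo and Sum as made-up names:

```csharp
using System.Collections.Generic;

public static class GenericEnumeratorDemo
{
    public static int Sum(IEnumerable<int> numbers)
    {
        int total = 0;

        // IEnumerator<T> is IDisposable, so it fits into a using statement.
        using (IEnumerator<int> enumerator = numbers.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                total += enumerator.Current;   // Current is an int here, not an object
            }
        }

        return total;
    }
}
```

Any generic collection can be passed in, for example a List<int> holding 1, 2 and 3, which gives 6.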

Summary

In this post, we looked at some of the underlying concepts which help us better understand how LINQ works. Generics introduced the concept of type parameters, where the actual type is supplied only when a method is invoked or an object is instantiated. A placeholder such as <T> defines a generic type. Generics apply to methods, classes and structures. We can apply constraints to a generic type to restrict the type arguments it accepts. The .NET Framework ships with built-in generic types such as Queue <T> and Stack <T> to reduce the development overhead.

Another feature important for understanding LINQ is delegates. Delegates are objects which hold references to methods. A call to the delegate invokes the referenced method. A delegate can also hold references to multiple methods; this is known as multicasting. Delegates are also used for asynchronous callbacks. Using a delegate, a callback method is attached to a long-running process. After the process has finished, the callback method is invoked and can retrieve and process the result. Delegates also go hand in hand with events, which notify us of a change. We can then write our event handlers to respond to these changes.

We also looked at enumerators. Enumerators are used to iterate through the elements of an aggregate object. A collection implements IEnumerable to hand out an enumerator, and the enumerator implements IEnumerator to perform the iteration. The foreach statement comes in handy for iterating through a collection. Generic collections can also take advantage of enumerators by implementing the IEnumerable <T> and IEnumerator <T> interfaces.

When I sat down to write this post, I planned to explain all the underlying concepts in one go. But each topic is worth a separate post. For this reason, I will cover the remaining concepts in the next post. Please do provide your feedback on this series and stay tuned for more…

10 comments:

Anonymous, February 24, 2009 at 6:52 PM
Very interesting and well-written. One suggestion: on line 10 of your UserAuthentication example, you have
userAuth.Password = "Hidden";
But userAuth is of type UserAuthentication<int>, so the Password should be an integer, not a string.

Anonymous, February 24, 2009 at 7:29 PM
fantastic post! - got me searching your site for whatever else i might of missed - i'm a subscriber.

i'm looking forward to future posts.

Kashif Ahmad, February 24, 2009 at 8:12 PM
Glad to hear from you all and thanks for the appreciation. I have corrected the code on the mentioned line. Stay tuned for more :)

Anonymous, February 25, 2009 at 5:42 AM
Thorough, well written and easy understandable. I will definetely look forward to read more from you in the future. Good work!

Cheers

santosh, February 25, 2009 at 8:43 AM
five sata for u..
http://aspdotnetcodebook.blogspot.com

Anonymous, February 25, 2009 at 10:35 AM
take look at COMPARISON foreach and ForEach() at:
http://www.elemenex.com/index.php?option=com_content&view=article&id=25:foreach-vs-foreach-performance&catid=7:c&Itemid=8

Vibhas K Dhingra, February 27, 2009 at 1:25 AM
Good posts and overall a very good attempt to explain and put in clear terms new elements of .NET. A useful resource for a cross section of technical people.

Anonymous, February 27, 2009 at 6:32 AM
Very nice and useful post. Besides LINQ, i have also learned Delegate. In near future could you please post another topic about Events and Delegates. Thanks.

Kashif Ahmad, February 27, 2009 at 9:53 AM
I thank you all for the encouragement.

Flemming, my next post on LINQ will follow shortly.

Abul, I do plan to write about Delegates/Events in the future but it may take some time. I am sure you will also find the next post on LINQ useful which will following shortly.

Stay tuned for more...

Anonymous, March 1, 2009 at 10:59 PM
Very informative. thanks


Thursday, December 11, 2008

LINQ Explained– Part 2
