Monday, March 23, 2009

Reading a CSV File in ASP.NET



Recently I needed to read a CSV (Comma Separated Values) file in ASP.NET. Thanks to ADO.NET, reading a CSV file in ASP.NET is a breeze. I am writing this post to share my understanding with you.

Reading a CSV File

A comma-separated-value (CSV) file is commonly used for daily programming needs. CSV files are used to import/export data from databases and spreadsheets. These are simple text files (*.txt, *.csv, *.dat) where data values are separated by a comma or some other character. Usually, the first row has the column names, followed by rows of data. Each row has its data in columns separated by the delimiter character. Following is a sample (Northwind database, Customers table) of what data looks like in a CSV file with the first row holding the column names:

CustomerID, CompanyName, ContactName, ContactTitle
ALFKI, Alfreds Futterkiste, Maria Anders, Sales Representative
ANATR, Ana Trujillo Emparedados y helados, Ana Trujillo, Owner

In ASP.NET, we can easily read and get data from a CSV file using ADO.NET. Listing 1 illustrates the code for reading a CSV file:

Listing 1



// Requires: using System.Data; using System.Data.OleDb;
// filePath: Folder where the file resides
// fileName: Name of the file to be read
private DataTable ReadCSVFile (string filePath, string fileName)
{
    string connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filePath +
        ";Extended Properties='text;HDR=Yes;FMT=Delimited'";
    DataTable dt = new DataTable ();

    // using blocks release the connection, command and adapter even if an exception is thrown
    using (OleDbConnection conn = new OleDbConnection (connString))
    using (OleDbCommand cmd = new OleDbCommand ("SELECT * FROM [" + fileName + "]", conn))
    using (OleDbDataAdapter da = new OleDbDataAdapter (cmd))
    {
        da.Fill (dt); // Fill opens the connection if it is closed and closes it again afterwards
    }

    return dt;
}


The above code is straightforward, with a few points to note:

1. In the connection string, HDR stands for header and indicates that the first row holds the column names.
2. Similarly, FMT stands for format and specifies the formatting type. By default it is set to Delimited, which specifies comma-delimited values. The other values include Delimited (x) where x is the specific character, TabDelimited which specifies tab-delimited values, and FixedLength which specifies fields with fixed length.
3. The filePath specifies the folder that contains the file, on the machine hosting the application. If the application is hosted on a server, filePath is a path on that server.
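To make the role of the HDR setting concrete, here is a minimal sketch of a helper that builds the connection string from Listing 1. The class and method names (CsvConnection, Build) are my own and are not part of ADO.NET; the sketch only assembles the string and does not open a connection.

```csharp
using System;

public static class CsvConnection
{
    // Hypothetical helper: builds the Jet text-driver connection string used in Listing 1.
    // folderPath is the directory that contains the CSV file, not the file itself.
    public static string Build (string folderPath, bool hasHeader)
    {
        // HDR=Yes treats the first row as column names; HDR=No treats it as data
        string hdr = hasHeader ? "Yes" : "No";

        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folderPath +
               ";Extended Properties='text;HDR=" + hdr + ";FMT=Delimited'";
    }
}
```

With HDR=No the driver has no names to use for the columns, so expect generated names (F1, F2 and so on) in the resulting DataTable.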

It is also worth mentioning that we can join different CSV files. For example, we can use the following query to join two files:

"SELECT * FROM fileName1 f1, fileName2 f2 WHERE f1.CustomerID = f2.CustID";

Hope this post has been useful for you. Stay tuned for more…

Saturday, March 21, 2009

LINQ Explained – Part 3



This is the third part of my on-going series on LINQ. In the second part, we looked at some of the underlying C# features which are important to understand to work with LINQ. In this part, we will conclude with the rest of the features. So let us dive in straight.


Yield Statement

The yield statement was introduced with C# 2.0. In my previous post, we talked about enumerators. Enumerators help us iterate through collections and custom classes. Collections and custom classes must implement the IEnumerable interface. This interface has one method, GetEnumerator, which returns an instance of the IEnumerator interface. The IEnumerator interface performs the actual iteration.

All this is pretty straightforward but does require some coding on behalf of the developers. Implementing the IEnumerator interface for complex classes can be time consuming. Wouldn't it be nice if the compiler could handle the enumeration process for us? Thanks to the yield statement, this is possible.

Let me explain this concept with the help of the following simple example:

Listing 1


protected void Button1_Click (object sender, EventArgs e)
{
foreach (string fruit in BuyFruits ()) // ref 1
{
ListBox1.Items.Add (fruit);
}
}


private IEnumerable BuyFruits ()
{
string [] fruits = new string [] {"apple", "orange", "coconut", "papaya",
"mango"};

for (int i = 0; i <= fruits.Length - 1; i++)
{
yield return fruits [i]; // ref 2
}
}


Notice that the BuyFruits method has a return type of IEnumerable, yet each yield return statement hands back a single string rather than a collection. Although BuyFruits is a method, it behaves like a class which performs iteration. This may look strange, but under the hood two things are happening.

First, the yield keyword instructs the compiler to generate an inner (nested) class which implements the IEnumerator interface. It is this inner class which handles iteration for us. Fig 1 shows this class as displayed by the ILDASM tool. The nested class is named d__0 (in full, a compiler-generated name of the form <BuyFruits>d__0) and implements the generic and non-generic versions of the IEnumerable and IEnumerator interfaces. Its members include the MoveNext and Reset methods and the Current property.

Fig 1



Second, the yield statement returns a single element at a time but maintains state between calls. This means that each subsequent request for an element returns the next element in the collection. This is possible because the compiler builds a state machine which resumes execution just after the previous yield return.

When BuyFruits is called, a new object of the inner class (mentioned above) is created; the body of the method does not start running until the loop asks for the first element. This instance is used throughout the loop until it reaches the end of the collection. Each iteration of the loop calls the MoveNext method of the inner class, which runs the method body up to the next yield return. After the loop terminates, the instance of the inner class is disposed. A new instance is created for each new call, so every enumeration keeps its own independent state.

The code in Listing 1 works under the above principles. First, a class implementing the IEnumerator interface is generated (Fig 1). Next, in the Button1_Click event, the foreach loop calls the BuyFruits method, which creates an instance of the generated class. Each iteration then calls MoveNext, which advances to the next yield return. This way it iterates through the entire collection.
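To picture what the compiler does on our behalf, here is a hand-written approximation of such an enumerator class. It is only a sketch: the class name BuyFruitsIterator is mine, and the real generated class is organised differently (it tracks a numbered state rather than a simple index), but the division of work between MoveNext and Current is the same idea.

```csharp
using System;
using System.Collections;

// Hand-written stand-in for the class the compiler generates for BuyFruits (an approximation)
public class BuyFruitsIterator : IEnumerable, IEnumerator
{
    private readonly string[] fruits = { "apple", "orange", "coconut", "papaya", "mango" };
    private int index = -1;   // position remembered between MoveNext calls
    private object current;   // value handed out by the last successful MoveNext

    // Each enumeration gets a fresh object, so its state is independent
    public IEnumerator GetEnumerator ()
    {
        return new BuyFruitsIterator ();
    }

    public object Current
    {
        get { return current; }
    }

    // Plays the role of "run until the next yield return"
    public bool MoveNext ()
    {
        index++;
        if (index >= fruits.Length)
            return false;     // end of the sequence

        current = fruits [index];
        return true;
    }

    // Compiler-generated iterators do not support Reset either
    public void Reset ()
    {
        throw new NotSupportedException ();
    }
}
```

A loop such as foreach (string fruit in new BuyFruitsIterator ()) then behaves like the loop over BuyFruits () in Listing 1, which shows how much code a single yield return saves us.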

We can also implement enumeration for a custom collection using the yield statement. Following is the code for the custom collection:

Listing 2


public class FruitCollection : IEnumerable
{
private string[] fruits;

public FruitCollection ()
{
fruits = new string[] {"banana", "apple", "mango", "apricot",
"kiwi"};
}

public IEnumerator GetEnumerator ()
{
foreach (string fruit in fruits)
{
yield return fruit;
}
}
}
protected void Button1_Click(object sender, EventArgs e)
{
FruitCollection basket = new FruitCollection ();

foreach (string fruit in basket)
{
ListBox1.Items.Add (fruit);
}
}


The FruitCollection class implements the IEnumerable interface, so the GetEnumerator method must be implemented. Within this method, an array of strings is iterated using the yield statement (Note: we can use any looping technique). As mentioned above, a private nested class is generated. This class performs enumeration on behalf of FruitCollection for each yield return statement. The rest of the process is the same as mentioned above. We can then iterate through the collection as shown in the Button1_Click event.

The yield break statement is another form of the yield statement. This statement can stop iteration at any point in the loop. The following snippet returns only the first two strings in the array, because the loop stops as soon as the index exceeds 1.

Listing 3


private IEnumerable BuyFruits()
{
string[] fruits = new string[] {"apple", "orange", "coconut", "papaya",
"mango"};

for (int i = 0; i <= fruits.Length - 1; i++)
{
if (i > 1)
yield break;

yield return fruits[i];
}
}


One last point to mention is that the yield statement is equally applicable to generic and non-generic types. For generic types, the IEnumerable<T> interface is used.
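As a small illustration of the generic case, the following sketch combines yield return and yield break in a method that returns IEnumerable<T>. The names GenericIterators and TakeFirst are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class GenericIterators
{
    // Hypothetical helper: yields at most 'count' items from the start of any sequence
    public static IEnumerable<T> TakeFirst<T> (IEnumerable<T> source, int count)
    {
        int taken = 0;

        foreach (T item in source)
        {
            if (taken >= count)
                yield break;      // stop the iteration early

            yield return item;    // hand back one strongly typed element
            taken++;
        }
    }
}
```

Because the return type is IEnumerable<string> when we pass in strings, the foreach loop that consumes it needs no cast from object.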


Extension Methods

Extension methods are one of the new features added to C# 3.0. According to MSDN, "Extension methods enable you to 'add' methods to existing types without creating a new derived type, recompiling, or otherwise modifying the original type". As the definition implies, we can add new functionality to existing types, primitive or custom.

Extension methods are a special breed of static methods which are invoked like regular instance methods. Extension methods can be added to existing primitive types, classes, structures and interfaces. These methods are static and are defined in a separate static class. Importantly, the first parameter of an extension method defines the type the method will operate on. This parameter must be preceded by the this keyword. For example, an extension method whose first parameter is this string input is available on string values, and input represents the string on which the extension method was invoked.

Let us see a simple example of an extension method. The following method adds a greeting message to a string type:

Listing 4


public static class GreetingsClass
{
public static string AddGreetings (this string name) //note the input parameter
{
return String.Concat ("We welcome you ", name);
}
}


We can now invoke the AddGreetings method with the following code:


protected void Button1_Click(object sender, EventArgs e)
{
string emp = "Employee";

txtMessage.Text = emp.AddGreetings ();//invoked as a regular method
}


The extension method AddGreetings is defined in a separate static class. The first parameter of the method, preceded by the this keyword, makes it an extension method. Using IntelliSense, if we look at the list of available methods for the string 'emp', we can find the AddGreetings method alongside the other methods.

Extension methods can also be added to existing .NET classes. In listing 5, an extension method is added to the built-in Stack class. The extension method counts the items in an integer stack which have a value greater than the input parameter.

Listing 5


public static class StackExtender
{
public static int ItemsGreaterThanInput (this Stack stack, int threshold)
{
int count = 0;

foreach (object o in stack)
{
if (Convert.ToInt32(o) > threshold) // count only the items above the threshold
count++;
}

return count;
}
}

protected void Button1_Click(object sender, EventArgs e)
{
Stack intStack = new Stack ();

intStack.Push(5);
intStack.Push(4);
intStack.Push(1);
intStack.Push(8);

TextBox1.Text = "Total integers greater than (2) = " + intStack.ItemsGreaterThanInput(2).ToString();
}


The above code is pretty straightforward; note that the method has two parameters. The first parameter is the type (Stack) on which it operates and is supplied implicitly by the object the method is called on. The second is a regular parameter which must be provided when the method is invoked.
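Extension methods work just as well on generic interfaces. The following sketch is a generic variation of the same idea; the names SequenceExtender and CountGreaterThan are my own, and the IComparable<T> constraint is an assumption I added so that items can be compared without Convert.ToInt32.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceExtender
{
    // Hypothetical extension method: counts the items that compare greater than 'threshold'
    public static int CountGreaterThan<T> (this IEnumerable<T> source, T threshold)
        where T : IComparable<T>
    {
        int count = 0;

        foreach (T item in source)
        {
            if (item.CompareTo (threshold) > 0)
                count++;
        }

        return count;
    }
}
```

Since Stack<int>, List<int> and arrays all implement IEnumerable<int>, a call such as intStack.CountGreaterThan (2) works on any of them.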


Var keyword

C# 3.0 introduced another useful feature, the var keyword, which allows us to declare implicitly typed variables. An explicit type declaration is not required for these variables because the compiler works out the type. The following line declares a variable whose type (string) is inferred by the compiler:

var greet = "Hello World"; // no use of string type

It is important to note that these variables are strongly typed. A proof of this is that the type of these variables cannot be changed later on. For example, the variable above will cause the error "Cannot implicitly convert type 'int' to 'string'" if later used as:

greet = 10;

The var keyword can only be used with local variables. An attempt to use it with class-level variables will result in the error "The contextual keyword 'var' may only appear within a local variable declaration". Since the compiler infers the type of these variables from the assigned value, they must be initialized when declared.

The var keyword also rids us of extra type declaration. For example, the following code:

ArrayList alst = new ArrayList ();

can also be written as:

var alst = new ArrayList ();

Similarly, the value returned from a method can also be assigned to a ‘var’ variable. For example, the following method returns an ArrayList:



private ArrayList GetMyList()
{
ArrayList alst = new ArrayList ();
// use alst;
return alst;
}


The value returned by this method can be directly assigned to the 'var' variable:

var myList = GetMyList ();

If we change the above method to return an IList instead, the compiler is still smart enough to infer the type: the variable simply takes the declared return type of the method.
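The difference between the inferred (compile-time) type and the actual object can be seen in the following sketch. The class VarDemo and the method Describe are names I made up for the illustration.

```csharp
using System;
using System.Collections;

public static class VarDemo
{
    // The declared return type is IList, even though an ArrayList is created inside
    private static IList GetMyList ()
    {
        ArrayList alst = new ArrayList ();
        alst.Add ("apple");
        return alst;
    }

    public static string Describe ()
    {
        var myList = GetMyList ();   // inferred as IList, the declared return type
        var count = myList.Count;    // inferred as int

        // GetType reports the run-time type of the object, which is still ArrayList
        return myList.GetType ().Name + ":" + count;
    }
}
```

So var never changes what the object is; it only saves us from spelling out the declared type on the left-hand side.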



Anonymous Types

Anonymous types are a new feature added to C# 3.0. The idea is to create a type on the fly without actually declaring it. The compiler generates the type at compile time from the properties we supply. This gives us a lot of flexibility in defining new types without actually declaring them.

How a type is created 'just like that' is based on the concept of object initialization. Object initialization was also introduced with C# 3.0. The idea is to set properties in the same expression that creates the object, without having to write a constructor that accepts those values (the parameterless constructor is still called behind the scenes). Let us consider the class in listing 6 (for those who are new, we can now use 'automatic properties' where a backing variable per property is not required):

Listing 6


public class Fruit
{
// automatic properties
public string FruitName { get; set; }
public int FruitPrice { get; set; }
}



The above class can be instantiated as follows:

Fruit fruit = new Fruit ();
fruit.FruitName = "apple";
fruit.FruitPrice = 10;

The above snippet is straightforward but can be time consuming for large classes. Using object initialization, we can reduce the amount of work required to initialize an object with the following syntax:

Fruit fruit = new Fruit { FruitName = "apple", FruitPrice = 10 };

The concept can be further extended to generic collections as follows:


List<Fruit> basket = new List<Fruit> {
new Fruit { FruitName = "apple", FruitPrice = 10 },
new Fruit { FruitName = "mango", FruitPrice = 15 }
};


Coming back to anonymous types, the compiler generates these for us at compile time. Creating an anonymous type is facilitated by 'object initialization' and the 'var' keyword. The following snippet creates an object of an anonymous type and stores it in the variable Employee:

var Employee = new { Name = "Emp1", Age = 35, Salary = 3000 };

We can even create a nested anonymous type using the same syntax. For example, the above code can be extended to add a nested type Address as:

var Employee = new { Name = "Emp1", Age = 35, Salary = 3000,
Address = new { HouseNo = "H1", Block = 3, City = "MyCity" }
};
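Once created, an anonymous object is used like any other object: its properties are strongly typed and read-only. The sketch below (the class and method names are mine) reads a nested property and also shows that two anonymous objects with the same property names, types and order share one generated type and compare equal by value.

```csharp
using System;

public static class AnonymousTypeDemo
{
    public static string Describe ()
    {
        var employee = new { Name = "Emp1", Age = 35,
                             Address = new { City = "MyCity", Block = 3 } };

        // Properties are known to the compiler, so this is checked at compile time
        return employee.Name + " lives in " + employee.Address.City;
    }

    public static bool SameShapeAndValuesAreEqual ()
    {
        var a = new { Name = "Emp1", Age = 35 };
        var b = new { Name = "Emp1", Age = 35 };

        // Same generated type, and Equals compares property values
        return a.GetType () == b.GetType () && a.Equals (b);
    }
}
```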

Anonymous types are extensively used in LINQ. A following post will describe this concept further.



Lambda Expressions and Anonymous Methods

Lambda Expressions are a new feature added to C# 3.0. They provide a handy way to write concise code. But before we jump on to this topic, it is important to understand anonymous methods.

C# 2.0 introduced the concept of ‘anonymous methods’. As the name implies, these are name-less methods which can be declared inline. The method is declared and implemented at the same place. The idea is to avoid defining a separate method which is not reused.

An anonymous method can be used at any place where a delegate is expected. As we know, when a delegate is called, the referenced method is invoked. In the case of an anonymous method, the referenced method is replaced by a piece of inline code. The anonymous method is defined using the delegate keyword followed by an optional list of parameters (within parentheses) and the body of the method. If the body of an anonymous method does not use any parameters, we can omit the parameter list and its parentheses. An expression defining an anonymous method is known as an anonymous-method-expression and has the following syntax:

delegate (optional_signature) { // method body - code }

It is important to mention that the signature of the anonymous method must match the signature of the delegate type it is assigned to (this is basically how delegates work). The return type and input parameter list must be the same.
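The two ways of writing the parameter list can be sketched as follows. The delegate Transform and the class AnonymousMethodDemo are hypothetical names used only for this illustration.

```csharp
using System;

public static class AnonymousMethodDemo
{
    public delegate int Transform (int value);

    private static int Apply (Transform t, int input)
    {
        return t (input);
    }

    public static int DoubleIt (int n)
    {
        // Parameter list spelled out; it matches the Transform signature
        return Apply (delegate (int value) { return value * 2; }, n);
    }

    public static int AlwaysSeven (int n)
    {
        // Parameter list omitted; allowed because the body never uses the argument
        return Apply (delegate { return 7; }, n);
    }
}
```

In both cases the return type of the body must still agree with the delegate, which is why each body returns an int.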

Let us see a simple example of using anonymous methods. Delegates play a central role in event-based programming. Consider the following code for an event:



protected void Page_Load (object sender, EventArgs e)
{
Button1.Click += new EventHandler (Button1_Click);
}

protected void Button1_Click (object sender, EventArgs e)
{
// implementation
}


Using an anonymous method, we can rewrite the above as follows:

Listing 7


protected void Page_Load (object sender, EventArgs e)
{
Button1.Click += delegate (object sender1, EventArgs s) { /* implementation */ };
}


In the code above, the separate Button1_Click handler has been replaced by an anonymous method. If we use the ILDASM tool to view the assembly, we will find a new private method with a matching signature, as illustrated in figure 2. Under the hood, it is this compiler-generated method (b__0) which runs when the event fires:


Fig 2



An anonymous method can be declared within a static or an instance method. This influences whether the generated private method is static or an instance method. Let us see another example where we actually declare a delegate. This anonymous method will perform some (complex :-) computation:

Listing 8


public delegate int MathDelegate (int n1, int n2, int n3); // declare delegate

protected void Button1_Click (object sender, EventArgs e)
{
int result;

MathDelegate mathdel = delegate (int num1, int num2, int num3) /* anonymous method */
{
return (((num1 * num2) / num3) + 50); // any calculation
};

result = mathdel (10, 10, 2); // invoke delegate; it already returns an int, so no cast is needed
TextBox1.Text = result.ToString ();
}


Usually, for the above code to work, the delegate needs to reference a named method with a matching signature. However, in the above code, an anonymous method is defined inline to perform the calculation. Under the hood, the compiler again generates a private method, and it is this method which runs when the delegate is invoked.

The above is a trivial example; however anonymous methods can be useful in many scenarios such as collections and custom classes. Let us see an example of using an anonymous method within a collection. We first define a custom (my favorite :-) Fruit class:



public class Fruit
{
public Fruit (string n, string d)
{
FruitName = n;
FruitDescription = d;
}

// automatic-property
public string FruitName
{ get; set; }

// automatic-property
public string FruitDescription
{ get; set; }
}


Next we define a generic collection of type Fruit. This collection will let us find a particular fruit using an anonymous method:

Listing 9


protected void Button1_Click (object sender, EventArgs e)
{
string fruitName = "kiwi";

List<Fruit> fruits = new List<Fruit> ();

Fruit f1 = new Fruit ("mango", "A tropical fruit");
Fruit f2 = new Fruit ("apple", "Have an apple a day");
Fruit f3 = new Fruit ("kiwi", "Good for health");

fruits.Add (f1);
fruits.Add (f3);
fruits.Add (f2);

// using anonymous method
Fruit searchFruit = fruits.Find (delegate (Fruit f)
{
return f.FruitName == fruitName; // fruitName is a local variable of the outer method
});

if (searchFruit != null)
{
txtname.Text = searchFruit.FruitName;
txtdescription.Text = searchFruit.FruitDescription;
}
else
{
Label1.Text = "Fruit not found...";
}
}


The above example has a few points to note. First, the Find method expects a parameter of type Predicate<T>. This is a delegate which takes an element and returns a boolean, and generic lists use it to filter and search for elements. In the above code, the anonymous method supplies that delegate.

Second, if you look closely, the anonymous method accesses a local variable (fruitName) which belongs to the outer method. How this works is pretty interesting. Earlier in Figure 2 we saw that the compiler generated a method for an anonymous method. But when the anonymous method uses a local variable of the outer method, the compiler generates a class instead. The generated method and the captured variable are members of this class. The outer method creates an instance of this class, and invoking the delegate calls the generated method on that instance. The captured variables keep their state across calls made through the same instance. Figure 3 illustrates this concept:


Fig 3



It is also important to note that an anonymous method cannot access ref or out parameters of the outer method.
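The way a captured variable keeps its state can be demonstrated with a small counter. This is only a sketch with names of my own choosing (ClosureDemo, Counter, MakeCounter).

```csharp
using System;

public static class ClosureDemo
{
    public delegate int Counter ();

    public static Counter MakeCounter ()
    {
        int count = 0; // local variable captured by the anonymous method below

        // The compiler moves 'count' into a generated class so that it
        // outlives this method call and is shared by every invocation
        return delegate
        {
            count++;
            return count;
        };
    }
}
```

Each call to MakeCounter produces a delegate with its own copy of count: calling one delegate repeatedly gives 1, 2, 3 and so on, while a second delegate starts again from 1.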

Returning to lambda expressions, they offer a convenient way to write anonymous methods. Using lambda expressions, we can omit much of the syntax required by an anonymous method. For example, using a lambda expression, listing 7 can be rewritten as:



protected void Page_Load (object sender, EventArgs e)
{
Button1.Click += (object s, EventArgs ea) => { /* implementation */ };
}


To understand how the above works, you should know that a lambda expression gives developers the following liberties:

1. The delegate keyword is not required.
2. For a single expression, braces can be omitted. The lambda operator => (pronounced 'goes to') separates the two parts: the left side holds the input parameters while the right side is the expression or statement block.
3. The return keyword is not required when the body is a single expression.
4. Since C# 3.0 supports type inference, it is perfectly alright to drop the type definition for the parameters and let the compiler infer it (a very strong feature of lambda expressions).
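The points above can be seen side by side in the following sketch, which writes the same addition three times, each time dropping a little more syntax. The class LambdaForms and its member names are hypothetical.

```csharp
using System;

public static class LambdaForms
{
    public delegate int MathDelegate (int n1, int n2);

    // C# 2.0 style: delegate keyword, typed parameters, braces and return
    public static readonly MathDelegate AnonymousMethod =
        delegate (int a, int b) { return a + b; };

    // Lambda with a statement body: no delegate keyword, but braces and return remain
    public static readonly MathDelegate StatementLambda =
        (int a, int b) => { return a + b; };

    // Lambda with an expression body: no braces, no return, parameter types inferred
    public static readonly MathDelegate ExpressionLambda =
        (a, b) => a + b;
}
```

All three produce a delegate of the same type and can be invoked in exactly the same way.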

Using the above features, let us rewrite listing 8 as follows:



public delegate int MathDelegate (int n1, int n2, int n3); // declare delegate

protected void Button1_Click(object sender, EventArgs e)
{
int result;

MathDelegate mathdel = (num1, num2, num3) =>
(((num1 * num2) / num3) + 50); // lambda-expression

result = mathdel (10, 10, 2); // invoke delegate; it already returns an int, so no cast is needed
TextBox1.Text = result.ToString ();

}


Clearly, the above is neat and concise. The delegate and return keywords are omitted. Similarly, braces have been avoided and the compiler infers the data types of the input parameters. The same approach applies to the code in listing 9.

To use a lambda expression, we need a delegate type. .NET 3.5 provides two families of built-in generic delegate types, Func and Action, so that we don't have to define our own delegates. The former returns a value while the latter does not. In a Func delegate, the last type parameter represents the return type. It has the following overloads:

public delegate TResult Func<TResult> ();
public delegate TResult Func<T, TResult> (T t);
public delegate TResult Func<T1, T2, TResult> (T1 t1, T2 t2);
public delegate TResult Func<T1, T2, T3, TResult> (T1 t1, T2 t2, T3 t3);
public delegate TResult Func<T1, T2, T3, T4, TResult> (T1 t1, T2 t2, T3 t3, T4 t4);

Similarly, an Action delegate accepts input parameter(s) but has a return type of void. It has the following overloads:

public delegate void Action();
public delegate void Action<T> (T t1);
public delegate void Action<T1, T2> (T1 t1, T2 t2);
public delegate void Action<T1, T2, T3> (T1 t1, T2 t2, T3 t3);
public delegate void Action<T1, T2, T3, T4> (T1 t1, T2 t2, T3 t3, T4 t4);

To help you understand the above, let me give you a simple example of using the Func delegate with three type parameters: two inputs and a return type. The Action delegate is no different (except that it has no return type).

Listing 10


Func<string, string, string /*return type */> GreetPerson = (message, person) =>
message + " " + person;

protected void Button1_Click (object sender, EventArgs e)
{
TextBox1.Text = GreetPerson ("Hello", "Scott");
}


I am sure you can follow the above code with ease. Func defines a delegate which accepts two input parameters (message, person) of type string; the last type parameter, also a string, is the return type. Clearly, it has simplified the process of defining a delegate. Otherwise we would have had to define a delegate type with a matching signature ourselves.
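For completeness, here is a comparable sketch using the Action delegate. Since an Action returns nothing, the lambda here stores its result in a list instead; the class ActionDemo and the method GreetAll are names I invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ActionDemo
{
    public static List<string> GreetAll (string message, params string[] people)
    {
        List<string> greetings = new List<string> ();

        // Action<string, string>: two string inputs and no return value
        Action<string, string> greet = (m, p) => greetings.Add (m + " " + p);

        foreach (string person in people)
        {
            greet (message, person);
        }

        return greetings;
    }
}
```

Note that the lambda also captures the local variable greetings, in the same way as the anonymous method in listing 9 captured fruitName.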


Summary

In this post, we looked at different C# language features which make up LINQ. The yield statement lets us implement enumerators without writing an enumerator class ourselves. Also, yield maintains state between calls. This is possible because the compiler generates a state machine.

Extension methods enable us to add functionality to existing types. The types can be primitive or custom. Extension methods are static methods but are invoked like instance methods. These methods are defined in a separate static class. The first input parameter is preceded by the 'this' keyword, which makes the method an extension method. This parameter also defines the type on which the extension method will operate.

The var keyword is used to declare implicitly typed local variables. These are strongly typed variables and their types cannot be changed later on. The compiler is responsible for determining the type at compile time. We can either explicitly assign a value or assign the value returned from a method to a 'var' variable. These variables must be initialized with a value so that the compiler can infer the type.

Anonymous types let us create types without actually declaring them. The type is generated by the compiler at compile time. Anonymous types in turn rely on a feature known as 'object initialization'. Object initialization lets us set properties in the same expression that creates an object, without writing a constructor that accepts those values.

Anonymous methods allow us to use a piece of code inline without defining a separate method. The code is declared and used at the same place. Anonymous methods can be used where a delegate is expected. An anonymous method is defined using the delegate keyword followed by an optional list of parameters (within parentheses) and the body of the method. If the body of an anonymous method does not use any parameters, the parameter list can be omitted. The signature of the anonymous method must match the signature of the delegate type it is assigned to.

Lambda expressions offer a concise way to write anonymous methods. Using a lambda expression, we can avoid the extra syntax required by an anonymous method. We can omit the delegate and return keywords. Also, the compiler can infer the data types of the input parameters. To use a lambda expression, a delegate type is required. The .NET Framework has two families of built-in generic delegates, 'Action' and 'Func', which let us use a lambda expression without defining a delegate type of our own.

With this we come to the end of this post. To keep the focus on LINQ, I have summed up all the above concepts in two posts, although each feature deserves a post of its own. In the next post we will start looking at LINQ syntax and how it can be used in our code. So stay tuned for more…