Using C# on Patas

October 2009 : Mono upgraded to 2.4.2.3 on Patas. Thanks David!

C# is a powerful general-purpose programming language originally developed by Microsoft. After its approval as a standard by ECMA, it was independently re-implemented by the open-source mono project, which is available on many platforms, including Linux. Mono is installed and available for use on patas.

Because the mono implementation is highly compatible with Microsoft's, console-oriented C# programs developed on Windows machines (e.g. with Visual Studio) will typically run on mono without changes to the source code. (It is beyond the scope of this document to discuss the compatibility of graphical programs.)

C# programs run in a managed runtime environment (specified by the Common Language Infrastructure, or CLI) which provides garbage collection. As in Java, Python, and other high-level general-purpose languages, the system tracks and reclaims unused memory objects, relieving the application programmer of an enormous burden.

1. Sample Program
using System;
using System.Text;

static class MainClass
{
	static void Main(String[] args)
	{
		Console.WriteLine("hello world");
	}
}

To compile and run on patas:

gmcs hello.cs
mono hello.exe

This is a two-step process, which hints at an advantage of C# over interpreted languages. The first step compiles your source file into an intermediate byte-code called MSIL (standardized as CIL), which is then processed by a runtime environment called the Common Language Runtime, or CLR. This type of virtual instruction set is nothing new in computer science, and neither is the technique the CLR applies to it: the MSIL code is translated, on an as-needed basis, into actual native machine instructions for the target system, and it is retained in that native form as the program runs. This is called just-in-time, or JIT, compilation, and it means that your C# program runs at a speed approaching that of true native compilation.

In fact, you can even use mono to execute, on patas, an MSIL binary produced by Microsoft Visual Studio 2008! Just copy the .exe file (in binary mode) to patas and run it like so:

 mono compiled_by_msvc.exe

To invoke a mono program from a CONDOR script, you should specify the full path to the mono executable:

universe	= vanilla
getenv		= true
executable	= /opt/mono/bin/mono
arguments	= "myprogram.exe -myarg1 -myarg2"
input		= mystdin.txt
output		= mystdout.txt
error		= mystderr.txt
log		= /tmp/myuwid.log
transfer_executable = false
queue
2. Documentation

Microsoft's detailed commercial-quality documentation on C# is available freely on the web. Of primary interest will be the extensive CLR (".NET Framework") class libraries, which provide a wide array of system services and data structures. The mono project also offers a set of documentation.

Wikipedia entry on C#
mono - Documentation Library
MSDN - C# Language Reference
MSDN - .NET Framework Class Library

The examples below highlight basic programming tasks which are relevant for computational linguists. All have been developed and tested with mono on patas.

3. String Manipulation
using System;
using System.Text;

static class MainClass
{
	static void Main(String[] args)
	{
		String s = "1.\tThis is a string.";
		String[] string_arr = s.Split('\t');

		Char[] trim_chars = ".:;,".ToCharArray();
		String ns = string_arr[0].Trim(trim_chars);
		int i = Convert.ToInt32(ns);

		Console.WriteLine(i);

		foreach (String s2 in string_arr[1].Split())
			Console.WriteLine(s2.Replace('s','z'));
	}
}

Result:

1
Thiz
iz
a
ztring.

Internally, all strings in C# are Unicode (UTF-16), as is the Char data type. The class library's System.Text.Encoding class and its subclasses provide a rich set of functions for reading and writing the various 8-bit and multi-byte character sets as input and output.
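As a minimal sketch of the Encoding class in use, the following program converts one string to bytes in two encodings and decodes it again. The class name EncodingDemo and the sample string are illustrative only; code page 28591 is the same Latin-1 encoding used in the file example later on this page. The non-ASCII letters are written as \u escapes so that the source file itself stays plain ASCII.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
	static void Main()
	{
		// "naive cafe" with i-diaeresis (U+00EF) and e-acute (U+00E9): 10 characters
		String s = "na\u00EFve caf\u00E9";

		// Latin-1 uses exactly one byte per character
		byte[] latin1 = Encoding.GetEncoding(28591).GetBytes(s);

		// UTF-8 uses two bytes for each of the two accented letters
		byte[] utf8 = Encoding.UTF8.GetBytes(s);

		Console.WriteLine(latin1.Length);	// 10
		Console.WriteLine(utf8.Length);		// 12

		// decoding with the matching encoding restores the original string
		Console.WriteLine(Encoding.UTF8.GetString(utf8) == s);	// True
	}
}
```

Decoding bytes with the wrong encoding does not raise an error by default; it silently yields different characters, so the encoding should always be stated explicitly when reading or writing files.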

As in many other modern languages, the C# String type is immutable. This allows for significant internal optimizations in the runtime environment but can be inefficient when doing intensive mutation and other editing operations, since every edit creates a new String. For this reason, the class library also provides the StringBuilder class (in System.Text), which allows a large amount of text to be gathered efficiently through appending.
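A short sketch of gathering text by appending, assuming nothing beyond the standard System.Text.StringBuilder class; the class name StringBuilderDemo and the loop contents are illustrative only.

```csharp
using System;
using System.Text;

static class StringBuilderDemo
{
	static void Main()
	{
		StringBuilder sb = new StringBuilder();

		// Append returns the same StringBuilder, so calls can be chained
		for (int i = 1; i <= 5; i++)
			sb.Append(i).Append(' ');

		// the Length property is writable; shrinking it drops the trailing space
		sb.Length--;

		// a single immutable String is created only at the end
		String result = sb.ToString();
		Console.WriteLine(result);	// 1 2 3 4 5
	}
}
```

Building the same text with repeated String concatenation would allocate a new String on every pass through the loop, which is what makes StringBuilder preferable for long outputs.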

Strings can also be edited by converting them to an array of Char, as shown in the next example, "Reading and Writing Files." The example also shows how a String object can be created from an array of Char.

4. Reading and Writing Files
using System;
using System.IO;
using System.Linq;
using System.Text;

static class MainClass
{
	static void Main(String[] args)
	{
		String my_filename = "the_file.txt";

		String data = "Four score and seven years ago.";

		// Write some data to the file
		int i = 0;
		using (FileStream fs = new FileStream(my_filename, FileMode.Create, FileAccess.Write, FileShare.None))
		{
			using (StreamWriter sw = new StreamWriter(fs, Encoding.GetEncoding(28591)))	// Latin-1
			{
				foreach (String s in data.Split())
					sw.WriteLine((++i).ToString() + ". " + new String(s.ToCharArray().Reverse().ToArray()));
			}
		}

		// Read data from the file
		using (FileStream fs = File.Open(my_filename, FileMode.Open, FileAccess.Read, FileShare.Read))
		{
			using (StreamReader sr = new StreamReader(fs, Encoding.GetEncoding(28591)))
			{
				String s;
				while (null != (s = sr.ReadLine()))
					Console.WriteLine(s);
			}
		}
	}
}

Result:

1. ruoF
2. erocs
3. dna
4. neves
5. sraey
6. .oga
5. Hash Table of User-defined Objects
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public class MyObject
{
	public Double d_x;
	public Double d_y;
	public Double d_z;

	// constructor
	public MyObject(Double x_arg, Double y_arg, Double z_arg)
	{
		d_x = x_arg; d_y = y_arg; d_z = z_arg;
	}
}

static class MainClass
{
	static void Main(String[] args)
	{
		Dictionary<String, MyObject> ht = new Dictionary<String, MyObject>();

		ht.Add("object 1", new MyObject(3.0, 2.1, Math.PI));
		ht.Add("object 2", new MyObject(Math.Sqrt(2.0), Math.Log(6.0,10.0), 3.2));
		ht.Add("3rd object", new MyObject(2.1, 9.9, Double.NaN));

		Console.WriteLine(ht["object 2"].d_x);
	}
}
6. Generic Collections

C# has a number of powerful strongly-typed generic collection objects. For example:

T[] (System.Array)              Fixed-size, ordered list of objects or values of type T
List<T>                         Resizable, ordered list of objects or values of type T
Stack<T>                        LIFO stack of objects or values of type T
Queue<T>                        FIFO queue of objects or values of type T
LinkedList<T>                   Doubly-linked list of objects or values of type T
HashSet<T>                      Hash-based set of unique objects or values of type T
Dictionary<TKey,TValue>         Table of objects of type TValue hashed by objects of type TKey
SortedDictionary<TKey,TValue>   Table of objects of type TValue kept sorted by keys of type TKey
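The differences between several of these collections can be sketched by loading the same tokens into each one. The class name CollectionsDemo and the token list are illustrative only.

```csharp
using System;
using System.Collections.Generic;

static class CollectionsDemo
{
	static void Main()
	{
		String[] tokens = { "the", "cat", "saw", "the", "dog" };

		// List<T>: resizable, keeps order and duplicates
		List<String> list = new List<String>(tokens);
		list.Add("run");
		Console.WriteLine(list.Count);		// 6

		// Stack<T>: items are pushed in order, so the last token comes off first
		Stack<String> stack = new Stack<String>(tokens);
		Console.WriteLine(stack.Pop());		// dog

		// Queue<T>: the first token enqueued is the first one out
		Queue<String> queue = new Queue<String>(tokens);
		Console.WriteLine(queue.Dequeue());	// the

		// HashSet<T>: duplicates collapse, leaving the distinct word-types
		HashSet<String> types = new HashSet<String>(tokens);
		Console.WriteLine(types.Count);		// 4
	}
}
```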

The following example shows how to use a generic Dictionary<String,int> to perform a very common Computational Linguistics operation, tallying the unique word-types in some text.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class MainClass
{
	static void Main(String[] args)
	{
		String text = "We the People of the United States, in Order to " +
		"form a more perfect Union, establish Justice, ensure domestic " +
		"Tranquility, provide for the common defence, promote the general " +
		"Welfare, and secure the Blessings of Liberty to ourselves and our " +
		"Posterity, do ordain and establish this Constitution for the United " +
		"States of America.";

		Char[] split_chars = new Char[] { '.', ',', ' ' };
		String[] words = text.ToLower().Split(split_chars,StringSplitOptions.RemoveEmptyEntries);

		// use a hash table to tally word-types:
		Dictionary<String, int> hash_tab = new Dictionary<String, int>();
		foreach (String w in words)
		{
			int v;
			if (hash_tab.TryGetValue(w,out v))
				hash_tab[w] = v+1;
			else
				hash_tab.Add(w,1);
		}

		// demonstrate using LINQ to display 12 of the most common word-types plus their counts
		KeyValuePair<String, int>[] kvp_arr = hash_tab.OrderByDescending(e => e.Value).Take(12).ToArray();
		// Aggregate folds the array into a single String, similar to 'reduce' in Python
		String s = kvp_arr.Aggregate(String.Empty,(av, e) => av + e.Key + "[" + e.Value + "] ");
		Console.WriteLine(s);
	}
}

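Result (the order of word-types with equal counts may differ, since Dictionary enumeration order is unspecified):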

the[6] of[3] and[3] establish[2] for[2] states[2] to[2] united[2] perfect[1] promote[1] general[1] we[1]
7. Access a Web Page
using System;
using System.IO;
using System.Text;
using System.Net;

static class MainClass
{
	static void Main(String[] args)
	{
		String s;
		using (WebClient wc = new WebClient())
		{
			using (Stream str = wc.OpenRead("http://www.compling.washington.edu/compling/"))
			{
				using (StreamReader sr = new StreamReader(str))
					s = sr.ReadToEnd();
			}
		}
		Console.WriteLine(s);
	}
}
8. LINQ Operations

One of the exciting things about mono is that it includes support for one of the latest developments in Microsoft's C# 3.0 (released with .NET Framework 3.5), namely Language-Integrated Query (LINQ), and its supporting technologies (extension methods and lambda expressions). LINQ allows sophisticated and concise retrieval and manipulation operations to be executed on data collections via native C# language expressions. Categories of operations include aggregation, quantification, conversion, concatenation, retrieval, set (union, intersection, etc.), generation, grouping, join, ordering, projection, partitioning, and restriction (filtering).
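A few of these categories can be sketched with one-line queries over small integer arrays. The class name LinqCategoriesDemo and the data are illustrative only; all operators used are standard System.Linq extension methods.

```csharp
using System;
using System.Linq;

static class LinqCategoriesDemo
{
	static void Main()
	{
		int[] a = { 1, 2, 3, 4, 5, 6 };
		int[] b = { 4, 5, 6, 7 };

		// restriction (Where) followed by aggregation (Sum): 2 + 4 + 6
		Console.WriteLine(a.Where(n => n % 2 == 0).Sum());	// 12

		// quantification: is any element greater than 5?
		Console.WriteLine(a.Any(n => n > 5));	// True

		// set operation (Intersect), then conversion to String[] for joining
		String[] common = a.Intersect(b).Select(n => n.ToString()).ToArray();
		Console.WriteLine(String.Join(",", common));	// 4,5,6

		// partitioning (Skip/Take) and projection (Select): squares of 2 and 3
		String[] squares = a.Skip(1).Take(2).Select(n => (n * n).ToString()).ToArray();
		Console.WriteLine(String.Join(",", squares));	// 4,9
	}
}
```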

LINQ is an expansive topic in its own right. The following simple example gives a glimpse of what is possible.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class MainClass
{
	static void Main(string[] args)
	{
		String[] items = { "cat", "pear", "apple", "cat", "banana", "pear", "pear", "apple" };

		KeyValuePair<String, int>[] tallies = items.GroupBy(k => k, e => 1)
				.Select(f => new KeyValuePair<String, int>(f.Key.ToUpper(), f.Sum()))
				.OrderBy(g => g.Key)
				.ToArray();

		foreach (KeyValuePair<String, int> kvp in tallies)
			Console.WriteLine(kvp.Key + '\t' + kvp.Value);
	}
}

Result:

APPLE   2
BANANA  1
CAT     2
PEAR    3
9. Calling a Python Script as a Shell Process
using System;
using System.IO;
using System.Diagnostics;

class MainClass
{
	public static void Main(string[] args)
	{
		ProcessStartInfo psi = new ProcessStartInfo();
		psi.FileName = "python";
		psi.Arguments = "test.py";
		psi.RedirectStandardOutput = true;
		psi.UseShellExecute = false;

		using (Process p = new Process())
		{
			p.StartInfo = psi;
			p.Start();

			Console.WriteLine(p.StandardOutput.ReadToEnd());
		}
	}
}
10. Using the GroupBy LINQ operator

We have found that, even with the new release 2.4.2 of Mono, the GroupBy operator is much slower than it is on .NET. Some differences in performance between the Microsoft and Mono implementations are to be expected. In the case of GroupBy, you can always break the operation into two steps, for example by using a temporary dictionary.
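One way the two-step approach might look is sketched here: a Dictionary does the tallying, and LINQ is used only to order the result. The class name TwoStepTally and the sample words are illustrative only, and no timing comparison against GroupBy is claimed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TwoStepTally
{
	static void Main()
	{
		String[] words = { "Call", "me", "Ishmael", "call", "ME", "me" };

		// step 1: tally the lower-cased words in a temporary dictionary
		Dictionary<String, int> counts = new Dictionary<String, int>();
		foreach (String w in words)
		{
			String key = w.ToLower();
			int n;
			counts.TryGetValue(key, out n);	// n is left at 0 when the key is missing
			counts[key] = n + 1;
		}

		// step 2: order by count, breaking ties alphabetically
		foreach (KeyValuePair<String, int> entry in
				counts.OrderByDescending(kv => kv.Value).ThenBy(kv => kv.Key))
			Console.WriteLine(entry.Key + " " + entry.Value);
		// prints: me 3, then call 2, then ishmael 1
	}
}
```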

Keeping this limitation in mind, here is the example I presented at the 2009 CLMA orientation talk on C#. This complete program reads an HTML file with the complete text of "Moby Dick," strips out HTML tags using a regular expression, and then prints a "graphical" Zipf distribution (word-frequency histogram) of the 35 most common words onto your console.

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass
{
	static void Main()
	{
		String text = File.ReadAllText("moby_dick.html");

		// strip HTML tags
		text = Regex.Replace(text, "<(.|\n)*?>", String.Empty);

		String[] words = text.Split(" \n\",.;-!?".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

		// tally word-types; ToArray() makes the (slow) GroupBy query run only once
		var tallies = words
			.GroupBy(w => w.ToLower())
			.Select(g => new { g.Key, Tally = g.Count() })
			.OrderByDescending(e => e.Tally)
			.ToArray();

		// one '*' per 'scale' occurrences; at least 1 to avoid dividing by zero
		int scale = Math.Max(1, tallies.First().Tally / 60);
		foreach (var tally in tallies.Take(35))
			Console.WriteLine("{0,6} {1}", tally.Key, new String('*', tally.Tally / scale));
	}
}
11. More Information

A recording of the treehouse presentation "Using C# on Patas" can be found here: http://uweoconnect.extn.washington.edu/p30062745/. The slides from this talk are on patas at /opt/dropbox/09-10/orientation/Mono_on_Patas.pdf.

Most of the code examples on this page, along with the "Moby Dick" text, can be found on patas in the following directory: /opt/dropbox/09-10/orientation/csharp-demos

The best book on LINQ that I've found is "Pro LINQ: Language Integrated Query in C# 2008" by Joseph C. Rattz, Jr. (APress 1-59059-789-3). This book presupposes familiarity with C#.

-- Main.gslayden - 28 Oct 2009

12. Discussion

If anybody has any info about how to get a good installation of mono onto Ubuntu 8.0.x, I would really appreciate hearing about it. I put about 8 hours into trying to install mono on Ubuntu, and that is about 6 more hours than I really had to spend.

My chances of adopting C# as my language of choice on Linux are pretty slim unless I can get a good install of mono to run on my home version of Linux.

-- Main.andyf - 24 Nov 2008

Andy:

Unfortunately, support for non-RPM-based distributions is pretty slim. Mono is a Novell project, so they focus mainly on SuSE, their own Linux distribution. There is a Debian package, though, and since Ubuntu is Debian-derived, it's possible you could get that to work.

You could also experiment with some of the RPMs here, and see if any of them will install using "alien": http://download.opensuse.org/repositories/Mono/ Note that many of these are for mono 1.9.x. To get 2.x on patas I had to build it from source.

-- brodbd - 30 Dec 2008

-----Original Message-----
From: patas-announce-bounces@mailman2.u.washington.edu On Behalf Of brodb@u
Sent: Thursday, October 15, 2009 3:34 PM
To: patas-announce@u.washington.edu
Subject: [patas-announce] Mono upgraded to 2.4.2.3.

I've updated our local build of Mono, in /opt/mono/bin, to 2.4.2.3.  Mono Debugger (mdb) version 2.4.2.1 is also installed there.

The old version remains available in /opt/mono-2.0.1/bin, if you need it.
Topic revision: r13 - 2009-10-28 - 23:01:09 - gslayden
 
