Tags: , , | Categories: Math Posted by oleksii on 6/13/2012 3:54 PM | Comments (0)

Mathematical induction is one of the most powerful technique used to prove validity of the algorithms. Here is a small post showing how to apply induction to prove a few basic algorithms. A quick reference guide can be found in MIT course notes.

To brush up my knoweledge I took a few homework tasks from the "the Algorithm Design Manual" book by Skiena and published solutions to them.

There are several steps needed for proof by induction, they are

  1. State that the proof is by induction
  2. Define the induction hypothesis
  3. Prove the base case
  4. Assume that the  hypothesis from step 2 is valid
  5. Make an induction step and prove that it is valid

Example 1.Prove by induction that $$\sum_{i=0}^n i = \frac{n(n+1)}{2}, for \forall n \ge 0, n \in \mathbb{N}$$


Let's prove the base case \(n=0\):

$$\sum_{i=0}^n 0 = \frac{0(0+1)}{2} \Leftrightarrow \\ 0 \equiv 0.$$

Now assume that the hypothesis  is valid for \(n\). Let's prove it is valid for \(n+1\).

$$ \\ \sum_{i=0}^{n+1} i = \frac{(n+1)(n + 1 +1)}{2} \Leftrightarrow \\ \sum_{i=0}^{n} i + (n+1) = \frac{(n+1)(n + 2)}{2} \Leftrightarrow \\ \frac{n(n+1)}{2} + (n+1) = \frac{(n+1)(n + 2)}{2} \Leftrightarrow \\ \frac{(n+1)(n+2)}{2} \equiv \frac{(n+1)(n + 2)}{2}.$$


Example 2. Prove by induction that $$\sum_{i=0}^n i(i+1)(i+2) = \frac{n(n+1)(n+2)(n+3)}{4} for \forall n\ge0, n \in \mathbb{N}$$


Starting with the base case \(n=0\):

$$\sum_{i=0}^n 0(0+1)(0+2) =  \frac{0(0+1)(0+2)(0+3)}{4} \Leftrightarrow \\ 0 \equiv 0.$$

Assume that the hypothesis is valid for \(n\). Let's prove it is valid for \(n+1\).

$$ \\ \sum_{i=0}^{n+1} i(i+1)(i+2) = \frac{(n+1)(n+2)(n+3)(n+4)}{4} \Leftrightarrow \\ \sum_{i=0}^{n} i(i+1)(i+2) + (n+1)(n+2)(n+3) = \frac{(n+1)(n+2)(n+3)(n+4)}{4} \Leftrightarrow \\ \frac{n(n+1)(n+2)(n+3)}{4} + \frac{4(n+1)(n+2)(n+3)}{4}= \frac{(n+1)(n+2)(n+3)(n+4)}{4} \Leftrightarrow \\ \frac{(n+1)(n+2)(n+3)(n+4)}{4} \equiv \frac{(n+1)(n+2)(n+3)(n+4)}{4}.$$


Example 3. Prove by induction that \(n^3 + 2n\) is divisible by 3 for \(\forall n\ge0\), \(n \in \mathbb{N}\)


Base case \(n=0\):

$$ mod \frac{0^3 + 20}{3} = 0.$$

Assume that the hypothesis is valid for \(n\), i.e.  \(mod \frac{n^3 + 2n}{3} = 0 \). Let's prove it is valid for \(n+1\).

$$mod \frac{(n+1)^3 + 2(n + 1)}{3} = 0 \Leftrightarrow \\mod \frac{n^3 + 3n^2 + 3n + 1 + 2n + 2}{3} = 0 \Leftrightarrow \\mod \frac{(n^3 + 2n) + 3(n^2 + n + 1)}{3} = 0 \Leftrightarrow \\mod \frac{n^3 + 2n}{3} + mod \frac {3(n^2 + n + 1)}{3} = 0.$$


If you enjoyed this post, make sure you subscribe to my RSS feed!

Tags: , , , , | Categories: .NET, Architecture Posted by oleksii on 4/20/2012 4:02 PM | Comments (0)

This post covers a brief introduction into database sharding including a bit of theory and a sample project. As a database server I selected RavenDB, primarily because it is an impressive project by itself and I just wanted to play about with it. The concept however can be applied to any other server either relational or non-relational. 

The problem of high load

It is common that some projects generate much of attention and as result may receive high load: large number of users, huge network traffic, billions of bank transactions etc. Each software operation has its cost obviously, but the hardware resources have very defined limitations. Therefore there will be a point where hardware can no longer cope with numerous requests.


The easiest approach to deal with high load is to do vertical scaling, i.e. add more hardware: memory, processing power, disk space. This may make things run faster for a while. Unfortunately, if the load continue to increase, vertical scaling is not the cure and more scalable solution is needed. 

Vertical scaling assumes that there is one very powerful server, the beast. In contrast, horizontal scaling promotes usage of many inexpensive machines. The load distribution is the key concept of horizontal scaling, and database sharding is one of the techniques of horizontal load balancing.

Sharding can be defined as a procedure of breaking large databases into smaller independent ones. The idea is to denormalize data and split a highly loaded database into several independent chunks. Denormalization allows us to make chunks of a database independent, thus a single query can hit only one database shard. From the other side, denormalization introduces data duplication, as same information may need to be inserted into several shards to make data consistent. You see there is always a balance between consistency and scalability.

For more information on database sharding concepts see an article on code futures.

Sample overview

To demonstrate sharding technique with NoSQL database, I created a very simplified model of Twitter. In my model there are users and tweets. Each user has a dynamic array of tweets, so he/she can add or remove text messages. Assume that in order to provide consistently fast user experience one database can handle 10 users at most. This means I should store 10 users per database. All my users have an id which is a global counter, based on this counter I can store users with id 1 - 10 on a database shard 1, users 11 - 20 I store on shard 2 and so on. In this sample I will have 3 shards and provide support for 30 users.

This is a step-by-step guide to get the sample working

  1. Running servers. To start Raven DB servers, you can run
    This shall open three console application, one for each RavenDB server
  2. Running client
    1. Either open solution in Visual Studio 2010 and run the project
    2. Run the binary client directly from
  3. Take a look at the RavenDB servers to see which request goes to which server.

   Users are stored in the following shards:
      Users with id 1 - 10 are stored in shard "user_1_10"
      Users with id 11 - 20 are stored in shard "user_10_20"
      Users with id 21 - 30 are stored in shard "user_20_30"

RavenDB API notes 

To load all users from all shards, RavenDB provides same API as if there is only one database, thanks to interface-oriented design. Developer doesn't need to know the actual location of each user (be it shard 1 or shard N). Internally RavenDB understands that IDocumentStore is a ShardedDocumentStore and picks up the implementation of IShardSelectionStrategy. In this sample to resolve the location of the user, RavenDB checks user's id, which is uniquely mapped to a shard.

If I want to retrieve data for defined user, I will use exactly the same code as if there were no shards. The whole idea is to make the client side code unaware where the item is stored. Simplicity is beauty.

Sample output (trimmed)

Client first generates sample data using FizzWare.NBuilder. It then saves it into 3 RavenDB shards. Lastly, client queries the shards and displays all users and all tweets of the one randomly selected user.

FirstName1 LastName1
 Subscribes count: 1
 Tweets count: 24
FirstName2 LastName2
 Subscribes count: 28
 Tweets count: 25
FirstName30 LastName30
 Subscribes count: 4
 Tweets count: 9
Tweets by FirstName2 LastName2
dolore magna et et dolor amet elit lorem lorem amet dolor magna consectetur et a
met consectetur tempor et
consectetur dolore

RavenDB servers

Expected output of the three shards follow below

Note of advice

  1. Get all records is a killer. Client side code as well as server code must be safe by default. For example RavenDB server by design returns 1024 objects at most and client side only allows 128 objects. If user needs more data, a paged version shall be used (Take and Skip). After all any search engine doesn't return you all the results for a query, so why should the user be stormed with billions of objects?
  2. IO operations are expensive. If possible no interactions shall be made with hard drives. For this one can
    •  use caching
    •  hold all database in-memory
  3. RavenDB is an actively developed project, every so often braking changes are introduced so this sample project may not work with versions later than 800.

Additional resources

Download this project

This project is making use of several open source projects. Corresponding acknowledgement and licences are provided with the download.

From this web site:
RavenDbSharding.zip (15.22 mb) [Downloads: 389]
MD5: cf43b8afe1148ca32d88660bfca9b85d
SHA-1: 9d76bf443ea0bc87b3a6ed692dfdd51e7b080c51
SHA-256: 633c71ec9a9e15dc64139ae969d05da775f195ca9b557da4d443dd93ea336165

Or get it from github:

If you enjoyed this post, make sure you subscribe to my RSS feed!

Tags: , | Categories: Interview questions Posted by oleksii on 4/18/2012 6:22 PM | Comments (0)

Suppose there is an island where all people tell either truth or lie (they cannot alliterate between truthful and false statements). People who always say truth are knights, the others who always lie are knaves. The puzzle is to deduce the type of each person.

1. Now, a visitor comes to the island and meets three inhabitants. He asks the first one (Person1) what's his type. Person1 doesn't reply and the second inhabitant (Person2) says "Person1 is a knave". Person3 contradicts Person2 and says "Don't trust Person2, he is lying" (there are problems here, can you spot them?).

2. Now, a visitor comes to the island and meets three inhabitants. He asks the first one (Person1) what's his type. Person1 mumbles something the visitor cannot hear. So he then asks the second inhabitant (Person2) what has Person1 said, to which Person2 replies "Person1 said he was a knave". Person3 contradicts Person2 and says "Don't trust Person2, he is lying" (there is a problem here, can you spot it?).

3. Now, a visitor comes to the island and meets three inhabitants. He asks the first one (Person1) what's his type. Person1 mumbles something the visitor cannot hear. So he then asks the second inhabitant (Person2) what has Person1 said, to which Person2 replies "Person1 said he was a knave". Person3 contradicts Person2 and says "Don't trust Person2, he is lying. I know Person1 is a knight" (this one has one solution).

So can you deduce what is the type of the three persons?

Task level (easy)

If you enjoyed this post, make sure you subscribe to my RSS feed!

Tags: , , | Categories: .NET, Events Posted by oleksii on 3/1/2012 5:28 PM | Comments (0)

Visual studio 2011 Beta was released yesterday. This seems to be a good point to start looking into the new features provided with it. VS comes with new 4.5 .NET framework and C# 5. These are just the main points of interest to me now, there are lots of other exciting tech bits and bobs, see the official release blog entry by Jason Zander.

Some folks have already seen async/await syntax from the community technology preview I guess. If not, here is a simple hello-kinda-world async sample (many more samples).

I start with a basic worker class that uses an Action class (no input, no output, just does the work provided through lambda expression) and executes it asynchronously with the help of async/await keywords. If you wonder what actually happens behind the scene, the answer lies in the compiler's reasoning and code injections.

Compiler parses the code and recognises async and await keywords, it then generates and injects code to perform async operations. Once compiler hits an await keyword, it records this position and returns control to the caller methods. At the same time it also tries to spawn a new task with the asynchronous code in it. NB this code may or may not be executed in a separate thread, in fact if the result is available immediately it is returned straightaway. If the code is a short-running operation, it is dispatched to the threadpool. Lastly, if the task is a long-running operation, it is sent to a separate thread. Clever enough and pretty neat syntax, hm?

public class Worker
    public async void DoWorkAsync(Action action)
        Console.WriteLine("  Before DoWorkAsync");
        await Task.Factory.StartNew(action.Invoke);
        Console.WriteLine("  After DoWorkAsync");

Here is a sample usage, take a quick look at step 2, where the work is actually done. SpinWait is a class that burns CPU without blocking or sleeping (this uses all quants of CPU time for the thread). This class is widely used in non-blocking concurrent collections.

class Program
    static void Main(string[] args)
        //Step 1
        var sl = new SpinWait();
        int max = 100000;
        Worker w = new Worker();

        //Step 2
        Console.WriteLine("Start worker");
        w.DoWorkAsync(() =>
                Console.WriteLine("  Before actual work");
                for (int i = 0; i < max; i++)
                Console.WriteLine("  After actual work");

        //Step 3
        Console.WriteLine("Returned to main");

Check out the output and see that the execution flow returns to the main thread right after it hits the await.

Start worker
  Before DoWorkAsync
Returned to main
  Before actual work
  After actual work
  After DoWorkAsync

A small exercise to the reader. What will happen if I remove the blocking call Console.Read?

If you enjoyed this post, make sure you subscribe to my RSS feed!

Posted by oleksii on 1/31/2012 10:59 PM | Comments (0)

People say generic collections offer a lot of benefits compared to the non-generic ones. This post is a short reminder to myself (mostly) as of what are those benefits.

1 Memory usage

I start with a few lines of code in C#4, all pretty basic stuff. I ran it in Debug mode and waited till debugger stopped at the last line of main method. Then I got Immediate Window (in Visual Studio) opened, loaded SOS extension and dumped all the objects' sizes. Below is the code and in comments I put the extracted sizes of objects.

==Immediate Window==
.load sos

int count = 1000;
var arrayList = new ArrayList(count);        //4040 bytes
var byteList = new List<byte>(count);        //1036 bytes

var stack = new Stack(count);                //4040 bytes
var genericStack = new Stack<byte>(count);   //1036 bytes

var queue = new Queue(count);                //4052 bytes
var genericQueue = new Queue<byte>(count);   //1044 bytes

Apparently, this is all platform, CLR (pointers on 32-bit and 64-bit machines are different and different versions can use updated code), hardware etc etc dependent, but to me it's enough to see a generic collection use much less memory with value types. Overheat for storing boxed objects is quite large (nearly 4 times in this sample!).

2 Performance

I tested adding and taking 10.000.000 items into generic and non-generic collections. Here's a snippet of code and the results.

class Program
	static Random r = new Random();

	static void Main(string[] args)
		int count = 10000000;
		byte[] values = new byte[count];
		var arrayList = new ArrayList(count);
		var byteList = new List<byte>(count);
		//Timed stress-load: add then take
		var arrayListTime = TimeAction(() =>
			for (int i = 0; i < values.Length; i++)
				var value = values[i];
			for (int i = 0; i < values.Length; i++)
				var value = values[i];
				int k = (byte)arrayList[i];
		//Output results
		Console.WriteLine("array list: " + arrayListTime);		
//... } private static TimeSpan TimeAction(Action action) { var sw = new Stopwatch(); sw.Start(); action.Invoke(); sw.Stop(); return sw.Elapsed; } }

And the output is

array list: 	2.2440829 sec
generic list: 	0.2694271 sec
stack: 		2.0472536 sec
generic stack: 	0.3369304 sec
queue: 		1.9683970 sec
generic queue: 	0.3751191 sec

It can be clearly seen that generics outperform non-generic collections. The time gain happens because generic collection doesn't spend time on boxing/unboxing and subsequent copy of the value to the value-type location.

3 Clean code

Consider a situation where one developer thinks he works with fruits, but somebody else has made an error and used an incorrect type.

stack.Push(new Apple());
stack.Push(new Dog());
var item = stack.Pop(); //you cannot peel a dog //but guess when you learn this? ((Fruit)item).Peel();

4 Variance

Generic classes are invariant in .NET2 - .NET4 inclusive. However in .NET 4 delegates and interface got support for one-directional variance (via either in or out keywords). Because it's a big topic I shall stop here and refer the reader to thoughts on stackoverflow and msdn.

5 Limitations

Generics support only classes, structs,interfaces, delegates and methods. It's not possible to parametrize events, constructors, properties, indexers etc.

Performance gain is not noticeable for reference type's collections (as boxing/unboxing/copying does occur) and for single parametrized item (as it's just too quick to make an effect).

If you enjoyed this post, make sure you subscribe to my RSS feed!