Thursday, May 3, 2012

Graph Partitioning 1 - KL/FM Introduction

Graph partitioning is an algorithm which attempts to separate an undirected graph into equal sized groups with the minimal edges cuts, those edges that cross the divide. This is an NP-complete problem for which there is no known polynomial time algorithm. This means with all of the computers in the world and all of the time in the universe, we can never really know what the optimal solution is. The best we can hope for is a 'good' solution using heuristic algorithms. Here is a link to an overview of graph partitioning.

I will implement several algorithms over the coming posts to give you a feel for what's out there.
Here is a link to a bunch of algorithms, benchmarks, and results. (If you know of others, tell me and I will include them here)

The first algorithm I will cover is the Fiduccia-Mattheyses(FM) algorithm. This is the best link I found and covers the KL algorithm upon which FM is built.

The basic idea is to swap nodes between partitions looking for better solutions until we "can't" get better. The measure of change, for better, is the total cut value that results from a node being swapped to the other partition, known as gain. We track which partition each node is in, as well as the gain from swapping. This information is stored in each partition using a set of gain buckets. A gain bucket is a set of nodes with a common gain value. The buckets themselves are LIFO queues, why, I will explain tomorrow when I detail implementation and how to achieve the optimal time complexity.

KL/FM Algorithm:
P := ititRandomPartitions(G Graph)
while ( we find improvements )
  while ( unlocked nodes remain )
    N := best node from larger partition
    move N & lock
  Roll back to best configuration
  Unlock all nodes
return best P

We begin by splitting the graph into two sets by randomly assigning nodes to a partition. Each node has its gain calculated and is stored in its partitions appropriate gain bucket. We then start switching nodes by choosing the best node from the largest partition. We do this to select the best choice at the time while moving the partitions toward equal sizes. Once a node is switched, we lock it so that we don't continually swap the node back and forth. The node is moved into a new gain bucket in its new partition. Each of its neighbors have their gains recalculated and are likely moved to a new bucket.
This process repeats until all nodes are locked. We then recall the best solution seen so far, return to it,
and continue to search for better partitioning. This repeated search is known as hill climbing and continues until no improvement is made whatsoever.

KL/FM is a randomized heuristic for solving the graph partitioning problem. However, the solution you get depends on where you start, hence the randomized initial partitioning. "In practive TEN is the magic number for FM" says Madden, meaning that when you run the algorithm over and over, you get similar results. You get results that are close to each other and look something like a Gaussian curve. This is common in heuristic algorithms. You can't really ever know if you have the best solution.

Tomorrow I will show you how to efficiently implement this algorithm with an augmented graph and custom linked list. As of now I have a low score of 1448[seed 2] with LIFOs for each partition instead of the gain buckets. (Just a map and an index in, but still thinking on this one)

No comments:

Post a Comment