renatoppl – Big Red Bits

Three interesting puzzles

renatoppl — Thu, 29 Sep 2011 06:12:42 +0000

Here are three puzzles I got from 3 different people recently. The first I got from Charles, who got it from a problem set of the Brazilian Programming Competition.

Puzzle #1: Given $n$ and vectors of in-degrees $d^-_i$ and out-degrees $d^+_i$ for $i = 1, \dots, n$, decide whether there is a simple directed graph on $n$ nodes with those in- and out-degrees in time $O(n \log n)$. By a simple directed graph, I mean at most one edge for each ordered pair $(i,j)$, allowing self-loops $(i,i)$.

Solving this problem in polynomial time is easy using a max-flow computation: simply consider a bipartite graph with $n$ nodes on each side and an edge between each pair $(i,j)$ with capacity $1$. Add a source and connect it to each node $i$ on the left side with capacity equal to its out-degree $d^+_i$, add also a sink in the natural way (each node $j$ on the right side connects to it with capacity equal to its in-degree $d^-_j$), compute the max-flow and check if it is $\sum_i d^+_i = \sum_j d^-_j$. But it turns out we can do it in a much more efficient way. The solution I thought of works in time $O(n \log n)$, but maybe there is a linear solution out there. If you know of one, I am curious.
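One way to avoid max-flow, sketched here under my own assumptions and not necessarily the solution hinted at above: a directed graph with self-loops and at most one edge per ordered pair is just an $n \times n$ 0/1 matrix with row sums equal to the out-degrees and column sums equal to the in-degrees, so the Gale–Ryser theorem applies. The function name `has_simple_digraph` is made up for illustration, and this straightforward version runs in quadratic time; sorting plus a smarter evaluation of the right-hand side would speed it up.

```python
def has_simple_digraph(out_deg, in_deg):
    """Gale-Ryser test: is there a 0/1 matrix with these row and column sums?

    Rows are out-degrees, columns are in-degrees; self-loops are allowed,
    so every entry of the n x n matrix is available.
    """
    n = len(out_deg)
    if len(in_deg) != n or sum(out_deg) != sum(in_deg):
        return False
    if any(d < 0 for d in out_deg) or any(d < 0 for d in in_deg):
        return False
    rows = sorted(out_deg, reverse=True)  # largest out-degrees first
    prefix = 0
    for k, r in enumerate(rows, start=1):
        prefix += r
        # The k largest rows can place at most min(c, k) ones in a column of sum c.
        if prefix > sum(min(c, k) for c in in_deg):
            return False
    return True
```

For instance, out-degrees `[2, 0]` with in-degrees `[2, 0]` are rejected: node 1 would need two edges into itself.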

The second puzzle was given to me by Hyung-Chan An:

Puzzle #2: There is a grid of size $n \times m$ formed by rigid bars. Some cells of the grid have a rigid bar in the diagonal, making that whole square rigid. The question is to decide, given a grid and the location of the diagonal bars, if the entire structure is rigid or not. By rigid I mean not being able to be deformed.

We thought right away of a linear-algebraic formulation: look at each node and create a variable for each of the 4 angles around it. Now, write linear equations saying that some variables sum to 360, since they are around one node, and equations saying that some variables must be 90 (because they are in a rigid cell). Now, for the variables internal to each square, write that opposite angles must be equal (since all the edges are of equal length) and then you have a linear system of type $Ax = b$ where $x$ are the variables (angles). Now, we need to check if this system admits more than one solution. We know a trivial solution to it, which is all variables equal to 90. So, we just need to check if the matrix $A$ has full rank.

It turns out this problem has a much more beautiful and elegant solution and it is totally combinatorial: it is based on verifying that a certain bipartite graph is connected. You can read more about this solution in “Bracing rectangular frameworks. I” by Bolker and Crapo (1979). A cute idea is to use the following more general linear system (which works for rigidity in any number of dimensions). Consider a rigid bar from point $p_i$ to point $p_j$. If the structure is not rigid, then there is a movement it can make: let $v_i$ and $v_j$ be the instantaneous velocities of points $p_i$ and $p_j$. If $p_i(t), p_j(t)$ are the movements of points $p_i, p_j$, then it must hold that $\Vert p_i(t) - p_j(t) \Vert^2 = \Vert p_i - p_j \Vert^2$, so taking derivatives we have:

$$(p_i - p_j) \cdot (v_i - v_j) = 0$$

This is a linear system in the velocities. Now, our job is to check if there are non-zero velocities, which again is to check whether the matrix of the linear system is or is not full-rank. An interesting thing is that if we look at this question for the grid above, this matrix will be the matrix of a combinatorial problem! So we can simply check if it has full rank by solving the combinatorial problem. Look at the paper for more details.
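The combinatorial test is short enough to sketch. As I understand the Bolker–Crapo result, one builds a bipartite graph with one node per row of cells and one node per column of cells, adds an edge (row $i$, column $j$) for every braced cell $(i,j)$, and the grid is rigid exactly when this graph is connected. The function name and the 0-based cell coordinates below are my own choices.

```python
def is_rigid(rows, cols, braces):
    """Connectivity test on the brace graph (rows of cells vs. columns of cells)."""
    adj = {("r", i): [] for i in range(rows)}
    adj.update({("c", j): [] for j in range(cols)})
    for i, j in braces:  # a brace in cell (i, j) links row i and column j
        adj[("r", i)].append(("c", j))
        adj[("c", j)].append(("r", i))
    start = ("r", 0)
    seen = {start}
    stack = [start]
    while stack:  # depth-first search from row 0
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == rows + cols
```

On a $2 \times 2$ grid, bracing three cells in an L shape is enough, while bracing only the two diagonal cells is not.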

The third puzzle I found in the amazing website called The Puzzle Toad, which is CMU’s puzzle website:

Puzzle #3: There is a game played between Arthur and Merlin. There is a table with $n$ lamps disposed in a circle, initially some are on and some are off. In each timestep, Arthur writes down the positions of the lamps that are off. Then Merlin (in an adversarial way) rotates the table. Then Arthur’s servant goes and flips (on –> off, off –> on) the lamps whose positions Arthur wrote down (notice that now he won’t be flipping the correct lamps, since Merlin rotated the table: if Arthur wrote lamp 1 and Merlin rotated the table by 3 positions, the servant will actually be flipping lamp 4). The question is: given $n$ and an initial position of the table, is there a strategy for Merlin such that Arthur never manages to turn all the lamps on?

See here for a better description and a link to the solution. For $n = 2^k$, no matter what Merlin does, Arthur always manages to turn on all the lamps eventually, where eventually means in $O(n)$ time. The solution is a very pretty (and simple) algebraic argument. I found this problem really nice.
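The game is easy to simulate, which makes the claim fun to test. Here is a minimal sketch under my own encoding: the set of lamps that are off is an $n$-bit mask, and one round replaces the mask by its XOR with a rotated copy of itself. The helper names `rotate` and `play`, and the `pick_rotation` callback standing in for Merlin, are made up for illustration.

```python
import random

def rotate(mask, r, n):
    """Cyclically rotate an n-bit mask by r positions."""
    r %= n
    full = (1 << n) - 1
    return ((mask << r) | (mask >> (n - r))) & full

def play(n, off_mask, pick_rotation, max_steps):
    """Run the game; return the number of rounds until all lamps are on, or None."""
    steps = 0
    while off_mask and steps < max_steps:
        r = pick_rotation(off_mask)         # Merlin's move
        off_mask ^= rotate(off_mask, r, n)  # servant flips the rotated positions
        steps += 1
    return steps if off_mask == 0 else None

def arthur_always_wins(n, seed=0):
    """Check every initial position against random rotations, with at most n rounds."""
    rng = random.Random(seed)
    return all(play(n, mask, lambda m: rng.randrange(n), n) is not None
               for mask in range(1 << n))
```

With $n = 8$ every starting position is solved within 8 rounds, whatever rotations are drawn. With $n = 3$ and a single lamp off, always rotating by one position keeps Arthur from ever finishing.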

Bayesian updates and the Lake Wobegon effect

renatoppl — Mon, 26 Sep 2011 01:48:27 +0000

We seem to have a good mathematical understanding of Bayesian updates, but somehow a very poor understanding of their practical implications. There are many situations in practice that we easily perceive as irrational. One of the most famous is the so-called Lake Wobegon effect, named after the fictional town in Minnesota, where “all the women are strong, all the men are good looking, and all the children are above average”. It is described as a cognitive bias where individuals tend to overestimate their own capabilities. In fact, when drivers are asked to rate their own skill compared to the average in three groups (low-skilled, medium-skilled and high-skilled), most rate themselves above the average.

In fact, the behavioral economics literature is full of examples like this where the observed data is far from what you would expect to observe if all agents were rational, and those are normally attributed to cognitive biases. I was always a bit suspicious of such arguments: it was never clear if agents were simply not being rational or whether their true objective wasn’t being captured by the model. I always thought the second was a lot more likely.

One of the main problems of the irrationality argument is that it ignores the fact that agents live in a world whose state is not completely observed. In a beautiful paper in Econometrica called “Apparent Overconfidence”, Benoit and Dubra argue that:

“But the simple truism that most people cannot be better than the median does not imply that most people cannot rationally rate themselves above the median.”

The authors show that it is possible to reverse engineer a signaling scheme such that the data is mostly consistent with the observation. Let me try to give a simple example they give in the introduction: consider that each driver has one of three types of skill, low, medium or high: $L$, $M$ and $H$. However, they can’t observe this. They can only observe some sample of their driving. Let’s say for simplicity that they can observe a signal $A$ that says if they caused an accident or not. Assume also that the higher the skill of a driver, the lower his probability of causing an accident, say:

$$\mathbb{P}(A \vert L) > \mathbb{P}(A \vert M) > \mathbb{P}(A \vert H)$$

Before observing $A$, each driver thinks of himself as having probability $\frac{1}{3}$ of having each type of skill. Now, after observing $A$, they update their belief according to Bayes rule, i.e.,

$$\mathbb{P}(\theta \vert A) = \frac{\mathbb{P}(A \vert \theta) \mathbb{P}(\theta)}{\mathbb{P}(A)}, \quad \theta \in \{L, M, H\}$$

Doing the calculations for suitable accident probabilities, the drivers that didn’t suffer an accident (event $\neg A$) evaluate $\mathbb{P}(L \vert \neg A)$, $\mathbb{P}(M \vert \neg A)$ and $\mathbb{P}(H \vert \neg A)$ and find that:

$$\mathbb{P}(H \vert \neg A) > \mathbb{P}(L \cup M \vert \neg A)$$

and therefore will report high-skill. Notice this is totally consistent with rational Bayesian-updaters. The main question in the paper is: “when is it possible to reverse engineer a signaling scheme?”. More formally, let $\Theta$ be a set of types of users and let $p \in \Delta(\Theta)$, i.e., $p$ is a distribution on the types which is common knowledge. Now, if we ask agents to report their type, their report is some $q \in \Delta(\Theta)$. Is there a signaling scheme which can be interpreted as a random variable $S$ correlated with the type $\theta$ such that $q$ is the distribution rational Bayesian updaters would report based on what they observed from $S$? The authors give necessary and sufficient conditions on when this is possible given $p$ and $q$.
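To make the driver example concrete, here is a small Bayes-rule computation with exact fractions. The accident probabilities $47/80$, $9/16$ and $1/20$ are the values I recall from the introduction of the paper; treat them as assumed illustrative inputs and not as a quotation. The function name is made up.

```python
from fractions import Fraction as Fr

# Uniform prior over the three skill levels.
prior = {"L": Fr(1, 3), "M": Fr(1, 3), "H": Fr(1, 3)}
# Assumed probability of causing an accident for each skill level.
p_accident = {"L": Fr(47, 80), "M": Fr(9, 16), "H": Fr(1, 20)}

def posterior_no_accident(prior, p_accident):
    """Return P(no accident) and the posterior over types given no accident."""
    joint = {t: prior[t] * (1 - p_accident[t]) for t in prior}
    total = sum(joint.values())
    return total, {t: joint[t] / total for t in joint}

share, post = posterior_no_accident(prior, p_accident)
```

With these inputs, `share` is $3/5$ and `post["H"]` is $19/36$, so the majority of drivers (the ones without an accident) rationally believe that they are more likely than not to be high-skill.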

—————————–

A note also related to the Lake Wobegon effect: I started reading a very nice book by Duncan Watts called “Everything Is Obvious: *Once You Know the Answer” about the traps of common sense. The discussion is different than the one above, but it also talks about the dangers of applying our usual common sense, which is very useful in our daily life, to scientific results. I highly recommend reading the intro of the book, which is open on Amazon. He gives examples of social phenomena where, once you are told them, you think: “oh yeah, this is obvious”. But then if you were told the exact opposite (in fact, he begins the example by telling you the opposite of what is observed in the data), you’d also think “yes, yes, this is obvious” and come up with very natural explanations. His point is that common sense is very useful for explaining data observations, especially observations of social data. On the other hand, it performs very poorly at predicting what the data will look like before actually seeing it.

Restaurants in Jerusalem

renatoppl — Tue, 31 May 2011 09:31:50 +0000

The Algorithmic Game Theory Semester in Jerusalem has been amazing: I made a lot of good friends and learned many interesting things. Now, ready to go back to the US, I feel I should share a bit about nice places to go in Jerusalem. Together with Vasilis Syrgkanis, I decided to compile a list of good restaurants, cafes, … and especially a list of places open on shabbat (which is quite useful for the visitor).

My Favourite Restaurants

Machne-yuda: Next to the Mahane Yehuda market, the restaurant menu changes every day (with a couple of dishes which are always there). The food is whatever is in the market that day. Everything was perfect both times I went. I strongly recommend the polenta and the black risotto. It took us almost a month to get a reservation. Later we learned a nice trick: if you arrive there very late (say around midnight), it is not impossible to get a table (we got one at the bar right away, arriving at 11:30pm). Elisa Celis suggested this place, and I think Omer Reingold suggested it to her.
Chakra : There is a tasting menu that is amazing. It is around 160 NIS, but worth every cent. It started with chicken livers, salads, calamari, ceviche, shrimp (the shrimp was awesome), fish kebab and mussels. After the fish part ended, they brought us many meat dishes, like a beef strogonoff (I am still puzzled by how soft the meat was), lamb chops and beef kebab. It ended with a simple, yet perfect, chocolate dessert and ice-cream with tahina. Thanks to Shahar Dobzinski and Sigal Oren for the suggestion.
Carousella: It is a small restaurant on the corner of the street I used to live on in Jerusalem (it is located on the corner of Azza and Metudela in Rehavia). It is a vegetarian French/Israeli restaurant and has my favorite shakshouka in town, which is actually a mix of shakshouka and ratatouille. I also like their risotto very much. We used to have breakfast/brunch there quite a lot, usually getting either the shakshouka or the musli. The house cake is definitely recommended too… It is my favourite place to work (they have wifi, a nice record player and a huge collection of records). Thanks to Omer Tamuz for this suggestion.

Hummus Places

Lina: It is known as the best hummus in the city (some claim the best hummus in the world). It is located in the Via Dolorosa and not hard to find. Arriving in the Via Dolorosa, you can just ask directions and everyone knows. Everything is recommended.
From Gaza to Berlin: a small and super cheap restaurant in Rehavia. Their hummus is very tasty (especially the hummus with meat), the kubbeh soup is very nice and their falafel is famous around here.
Marvad Haksamim (The Magic Carpet): a very nice Arabic place with great bread, hummus, … The soups are very good (I tried the lentil and the kubbeh soup) and the meat is usually good as well. I tried the chopped livers and the kebab. The Mixed Jerusalem Grill is a famous plate, I guess.

Places to eat/work on shabbat: on Saturday and Friday night most of the things in the city are closed, so it is good to know some places to go:

Restobar: good food and generally open
Zuni: very nice and open 24 hours a day, 7 days a week. Has every type of meal (brunch, lunch, coffee, drinks), served all the time. Good for working too…
Mona: a very cool American/Israeli restaurant inside a former art school. Food is very good and has also a very nice bar.
Spaguettim: nice location, wifi, good coffee and snacks. I never tried the food, though.
many places on Hillel Street: for example the Iwo Meat Burger (a pretty good non-kosher burger place), the pizza place next to it and a couple of bars nearby

In the old city

Cheese pie: I don’t know how to give an exact location, but this should help you: very near to the Church of the Holy Sepulcher in the Old City, you can find the Coptic Patriarchate. Try to locate the stairs leading to the Patriarchate and before going up, there is a place that sells only one thing: a cheese pie. There is a big marble table and a man with a bowl of dough, some cheese and syrup. He prepares everything in front of you and then puts it in the oven. It takes around 20 minutes. It is very impressive.
Armenian Tavern: a very beautiful place hidden in the old city. Food is nice (the lemonade is very refreshing and tastes great) but the best thing is the feeling that you are dining inside a museum.

Hotel Bars and Cafes

Mamilla Terrace: the restaurant in the terrace of Mamilla Hotel is pretty nice and has a good view of the old city. It is a good place to go for drinks as well. It is closed (I think) on shabbat.
Notre Dame Terrace: the Notre Dame hotel also has a great view of the Temple Mount and pretty nice food. It is very relaxing to sit there and look at the old city. There is another restaurant in the hotel called La Rotisserie, which I very much appreciated. It is also open on shabbat.

In Tel Aviv

Raphael: inside the Dan Hotel and overlooking the sea. Most of the times I went to Tel Aviv I ended up eating there.
Boya: Thanks to Lior Seeman for this suggestion. We had many small dishes (something like tapas) and all were amazing. I think it was my favourite place in Tel Aviv.

MHR, Regular Distributions and Myerson’s Lemma

renatoppl — Mon, 30 May 2011 10:46:08 +0000

Monotone Hazard Rate (MHR) distributions and their superclass, regular distributions, keep appearing in the Mechanism Design literature and this is due to a very good reason: they are the classes of distributions for which Myerson’s Optimal Auction is simple and natural. Let’s briefly discuss some properties of those distributions. First, two definitions:

Hazard rate of a distribution $f$: $h(z) = \frac{f(z)}{1-F(z)}$
Myerson virtual value of a distribution $f$: $\phi(z) = z - \frac{1-F(z)}{f(z)} = z - \frac{1}{h(z)}$

We can interpret the hazard rate in the following way: think of $T \sim f$ as a random variable that indicates the time that a light bulb will take to extinguish. If we are at time $t$ and the light bulb hasn’t extinguished so far, what is the probability it will extinguish in the next $\delta$ time:

$$\mathbb{P}[T \leq t+\delta \vert T > t] \approx \frac{f(t) \delta}{1-F(t)}$$

We say that a distribution is monotone hazard rate if $h(z)$ is non-decreasing. This is very natural for light bulbs, for example. Many of the distributions that we are used to are MHR, for example, uniform, exponential and normal. The way that I like to think about MHR distributions is the following: if some distribution has hazard rate $h(z)$, then it means that $F'(z) = (1-F(z)) h(z)$. If we define $G(z) = 1 - F(z)$, then $G'(z) = -h(z) G(z)$, so (for a distribution supported on $z \geq 0$):

$$F(z) = 1 - e^{-\int_0^z h(u) du}$$

From this characterization, it is simple to see that the extremal distributions for this class, i.e. the distributions that are on the edge between being MHR and non-MHR, have constant hazard rate, which corresponds to the exponential distribution $F(z) = 1 - e^{-\lambda z}$ for $z \geq 0$. The way I like to think about those distributions is that whenever you are able to prove something about the exponential distribution, then you can prove a similar statement about MHR distributions. Consider these three examples:

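Example 1: $\mathbb{P}[\phi(z) \geq 0] \geq \frac{1}{e}$ for MHR distributions. This fact is straightforward for the exponential distribution. For the exponential distribution $\phi(z) = z - \lambda^{-1}$ and therefore

$$\mathbb{P}[\phi(z) \geq 0] \geq \mathbb{P}[z > \lambda^{-1}] = 1-F(\lambda^{-1}) = e^{-1}$$

but the proof for MHR is equally simple: Let $r = \inf \{z; \phi(z) > 0 \}$. For $u \leq z < r$ we have $h(u) \leq h(z) \leq \frac{1}{z}$, so $\int_0^r h(u) du \leq 1$, therefore $\mathbb{P}[\phi(z) \geq 0] \geq 1 - F(r) = e^{-\int_0^r h(u) du} \geq e^{-1}$.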

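Example 2: Given $z_1, z_2$ iid with distribution $F$, where $F$ is MHR, and $v_1 = \max \{z_1, z_2\}$ and $v_2 = \min \{z_1, z_2\}$, then $\mathbb{E}[v_2] \geq \frac{1}{3} \mathbb{E}[v_1]$. The proof for the exponential distribution is trivial, and in fact, this is tight for the exponential. The trick is to use the convexity of $H(z) = \int_0^z h(u) du$. We use that in the following way:

$$\mathbb{E}[v_2] = \int_0^\infty (1-F(z))^2 dz = \int_0^\infty e^{-2 H(z)} dz = \frac{1}{2} \int_0^\infty e^{-2 H(z/2)} dz$$

Since $H(0) = 0$ and $H$ is convex, we have that $2 H(z/2) \leq H(z)$. This way, we get:

$$\mathbb{E}[v_2] \geq \frac{1}{2} \int_0^\infty e^{-H(z)} dz = \frac{1}{2} \mathbb{E}[z_1] = \frac{1}{4} (\mathbb{E}[v_1] + \mathbb{E}[v_2])$$

and rearranging gives $\mathbb{E}[v_2] \geq \frac{1}{3} \mathbb{E}[v_1]$.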
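A quick numeric sanity check of Example 2, as a sketch: it uses the standard identities $\mathbb{E}[\min] = \int (1-F)^2$ and $\mathbb{E}[\max] = \int (1 - F^2)$ for non-negative variables, with a plain midpoint rule. The helper names and the truncation point `upper` are my own choices.

```python
import math

def integrate(g, a, b, steps=100_000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def expected_min_max(F, upper):
    """E[min] and E[max] of two iid draws from the CDF F supported on [0, upper]."""
    e_min = integrate(lambda z: (1 - F(z)) ** 2, 0.0, upper)
    e_max = integrate(lambda z: 1 - F(z) ** 2, 0.0, upper)
    return e_min, e_max

# Exponential with rate 1 (truncated at 50, where the tail is negligible).
exp_min, exp_max = expected_min_max(lambda z: 1 - math.exp(-z), 50.0)
# Uniform on [0, 1].
uni_min, uni_max = expected_min_max(lambda z: min(z, 1.0), 1.0)
```

The exponential gives a ratio of exactly one third (up to integration error), matching the tightness claim, while the uniform distribution gives one half.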

Example 3: For MHR distributions, there is a simple lemma that relates the virtual value and the real value, and this lemma is quite useful in various settings: let $r = \inf \{z; \phi(z) > 0 \}$, then for $z \geq r$, $\phi(z) \geq z - r$. Again, this is tight for the exponential distribution. The proof is quite trivial: since $h$ is non-decreasing and $\phi(r) = 0$ means $\frac{1}{h(r)} = r$,

$$\phi(z) = z - \frac{1}{h(z)} \geq z - \frac{1}{h(r)} = z - r$$

Now, MHR distributions are a subclass of regular distributions, which are the distributions for which Myerson’s virtual value is a monotone function. I usually find it harder to think about regular distributions than to think about MHR (in fact, I don’t know so many examples that are regular, but not MHR). Here is one, though, called the equal-revenue distribution. Consider $z \in [1, \infty)$ distributed according to $f(z) = \frac{1}{z^2}$. The cumulative distribution is given by $F(z) = 1 - \frac{1}{z}$. The interesting thing about this distribution is that posted prices get the same revenue regardless of the price. For example, if we post any price $r \geq 1$, then a customer with valuation $z \sim F$ buys the item if $z > r$ at price $r$, so the revenue is $r (1 - F(r)) = 1$. This can be expressed by the fact that $\phi(z) = 0$. I was a bit puzzled by this fact, because of Myerson’s Lemma:

Myerson Lemma: If a mechanism sells to some player that has valuation $z \sim F$ with probability $x(z)$ when he has value $z$, then the revenue is $\mathbb{E}[x(z) \phi(z)]$.

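And it seemed that the auctioneer was doomed to get zero revenue, since $\phi(z) = 0$. For example, suppose we fix some price $r$ and we sell the item if $z \geq r$ at price $r$. Then it seems that Myerson’s Lemma should go through by a derivation like this (for this special case, although the general proof is quite similar):

$$\mathbb{E}[x(z) \phi(z)] = \int_r^\infty \phi(z) f(z) dz = \int_r^\infty z f(z) dz - \int_r^\infty (1-F(z)) dz = r (1-F(r))$$

but those don’t seem to match, since one side is zero and the other is $1$. The mistake we made above is a classic one, which is to calculate $\infty - \infty$. We wrote:

$$\int_r^\infty \phi(z) f(z) dz = \int_r^\infty z f(z) dz - \int_r^\infty (1-F(z)) dz$$

but both are infinity! This made me realize that Myerson’s Lemma needs the condition that $\mathbb{E}[z] < \infty$, which is quite a natural condition for a distribution over valuations of a good. So, one of the bugs of the equal-revenue distribution is that $\mathbb{E}[z] = \infty$. A family that is close to this, but doesn’t suffer from this bug, is: $f(z) = \frac{\alpha - 1}{z^\alpha}$ for $z \geq 1$, then $F(z) = 1 - z^{1-\alpha}$. For $\alpha > 2$ we have $\mathbb{E}[z] < \infty$, then we get $\phi(z) = \frac{\alpha-2}{\alpha-1} z$.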
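Both closed forms are easy to double-check with exact arithmetic. The sketch below assumes the density family $f(z) = (\alpha - 1) z^{-\alpha}$ on $z \geq 1$ with integer $\alpha$ (where $\alpha = 2$ is the equal-revenue distribution); the function names are made up for illustration.

```python
from fractions import Fraction as Fr

def posted_price_revenue(r):
    """Revenue r * (1 - F(r)) of posting price r >= 1 when F(z) = 1 - 1/z."""
    return r * (1 - (1 - Fr(1) / r))

def virtual_value(z, alpha):
    """phi(z) = z - (1 - F(z)) / f(z) for f(z) = (alpha - 1) / z**alpha, z >= 1."""
    tail = Fr(1) / Fr(z) ** (alpha - 1)               # 1 - F(z) = z^(1 - alpha)
    density = (alpha - 1) * Fr(1) / Fr(z) ** alpha    # f(z)
    return z - tail / density
```

For example, `posted_price_revenue(Fr(5))` is exactly `1`, `virtual_value(Fr(7), 2)` is `0`, and with $\alpha = 3$ the virtual value is $z/2$.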

Bimatrix games

renatoppl — Wed, 25 May 2011 16:02:17 +0000

This week in Israel is the Workshop on Innovations in Algorithmic Game Theory, which has been a collection of amazing talks and I thought of blogging about something new I learned here. First, Paul Goldberg gave an amazing talk about the Lemke-Howson algorithm and Homotopy Methods. Later, during the poster session Jugal Garg presented the impressive work on an exact algorithm for finding Nash equilibria in rank 1 bimatrix games. Both have in common the use of homotopy, which I found a quite cool idea. It has been present in Game Theory for a while, but I didn’t know about it. They also have to do with improving the understanding we have of the Lemke-Howson Algorithm, a (potentially exponential time) algorithm to find a Nash equilibrium in 2-person bimatrix games that seems to work pretty well in practice. As always, I thought that blogging about it would be a good idea in order to understand those concepts well.

Bimatrix Game and How to compute equilibrium

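Bimatrix games are the simplest class of games studied. Basically, a bimatrix game is a $2$-player game, with $n$ strategies for player $1$ and $m$ strategies for player $2$, which is represented by a pair of $n \times m$ matrices $(A, B)$. Let $x \in \mathbb{R}_+^n$, $\Vert x \Vert_1 = 1$, represent a probability distribution that player $1$ assigns to his strategies and $y \in \mathbb{R}_+^m$, $\Vert y \Vert_1 = 1$, be the same for player $2$. This way, the players experience utilities:

$$u_1(x,y) = x^t A y, \quad u_2(x,y) = x^t B y$$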

The best understood class of those games is the one where $A = -B$, called zero-sum games. For this class, computing a Nash equilibrium is very easy and it is given by the famous min-max theorem: player $1$ finds $x$ maximizing $\min_j x^t A e_j$ where $e_j$ is the $j$-th unit vector. Similarly player $2$ finds $y$ maximizing $\min_i e_i^t B y$. Then the pair of strategies obtained is a Nash equilibrium, and verifying this is not hard.

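When $A \neq -B$, the problem gets a lot more complicated. Proving that equilibria exist can be done using various fixed point theorems, such as Brouwer or Kakutani. There is a very simple exponential time algorithm for finding one and the key observation is the following: if $(x,y)$ is a Nash equilibrium then:

$$\forall i: \; x_i = 0 \text{ or } e_i^t A y \geq e_{i'}^t A y \; \forall i'$$

$$\forall j: \; y_j = 0 \text{ or } x^t B e_j \geq x^t B e_{j'} \; \forall j'$$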

which means that each strategy of a player is either not in the support or is a best response. Proving that is trivial (if some strategy is in the support and is not a best response, then reducing the probability we play it is an improving deviation). Therefore if we just guess the support of $x$ and the support of $y$ we just need to find some strategies with this support satisfying the inequalities above. This can be done using a simple LP. This is clearly not very efficient, since it involves solving $2^{n+m}$ LPs. A still exponential, but a lot more practical method, is the Lemke-Howson algorithm.
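Going back to the equilibrium condition above, here is a minimal sketch of how one could check it for a candidate pair $(x, y)$. Matrices are plain lists of rows, the tolerance `tol` is an assumption to absorb floating point noise, and the function name is made up.

```python
def is_nash(A, B, x, y, tol=1e-9):
    """Check: every strategy played with positive probability is a best response."""
    n, m = len(A), len(A[0])
    # Payoff of each pure strategy against the opponent's mixed strategy.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    col_payoffs = [sum(x[i] * B[i][j] for i in range(n)) for j in range(m)]
    best_row, best_col = max(row_payoffs), max(col_payoffs)
    rows_ok = all(x[i] <= tol or row_payoffs[i] >= best_row - tol for i in range(n))
    cols_ok = all(y[j] <= tol or col_payoffs[j] >= best_col - tol for j in range(m))
    return rows_ok and cols_ok

# Matching pennies: the only equilibrium is uniform play by both players.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
```

Here `is_nash(A, B, [0.5, 0.5], [0.5, 0.5])` holds, while the pure profile `[1, 0], [1, 0]` fails because player 2 would rather switch columns.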

Lemke-Howson Algorithm

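A good overview of the L-H algorithm can be found in those lecture notes or in a more detailed version in the third chapter of the AGT book (a pdf can be found on Tim’s website). Here I’ll present a quick overview. The main idea is to define the best-response polytopes $P$ and $Q$:

$$P = \{ (x,v) \in \mathbb{R}^{n+1}; \; x_i \geq 0, \; x^t B e_j \leq v, \; \textbf{1}^t x = 1 \}$$

$$Q = \{ (y,u) \in \mathbb{R}^{m+1}; \; e_i^t A y \leq u, \; y_j \geq 0, \; \textbf{1}^t y = 1 \}$$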

The intuition is that a point $(x,v) \in P$ represents the fact that $v$ is at least the payoff that player $2$ can get by best-responding when player $1$ plays $x$. We could re-write the constraints as $v \geq \max_j x^t B e_j$.

Each of the polytopes has $n+m$ inequalities and $1$ equality. Given $(x,v) \in P$ we define $L(x)$ as the indices of the tight inequalities in $P$ and similarly we define $L(y)$ as the indices of the tight inequalities in $Q$. The theorem in the previous section can be rephrased as:

$(x,y)$ is a Nash equilibrium iff $L(x) \cup L(y) = \{1, 2, \dots, n+m\}$

So we need to look at points in the polytope $P \times Q$ that are fully-labeled. The way that this is done in L-H is quite ingenious. It is similar to the idea used by the Simplex Method: walking through the vertices of the polytope looking for the desired vertex. In order to do it, let’s define other polytopes:

$$P' = \{ x \in \mathbb{R}^n; \; x_i \geq 0, \; x^t B e_j \leq 1 \}, \quad Q' = \{ y \in \mathbb{R}^m; \; e_i^t A y \leq 1, \; y_j \geq 0 \}$$

Note that there is a clear bijection between $P' \setminus \{0\}$ and $P$. This is a projective transformation given by $(x,v) \mapsto \frac{x}{v}$. Notice that the labels are preserved by this transformation and the vertices and edges of the polytope are mapped almost 1-1 (except the vertex $0$). Notice a couple of details:

1. A vertex $x \in P'$ corresponds to a point with $n$ labels. A vertex $y \in Q'$ corresponds to a point with $m$ labels.

2. The point $(0,0)$ corresponds to a fully-labeled point of $P' \times Q'$. Unfortunately, this is the only fully-labeled point that doesn’t correspond to a Nash equilibrium.

3. By taking an edge of the polytope $P' \times Q'$ (which is defined by $n+m-1$ labels in total: $n-1$ labels in $P'$ and $m$ labels in $Q'$, or $n$ labels in $P'$ and $m-1$ labels in $Q'$), we can move between two nodes that are almost fully labeled.

The idea is to consider the set of nodes that are almost fully-labeled: fix some label $k$ and consider all the nodes such that $L(x) \cup L(y) \supseteq \{1, \dots, n+m\} \setminus \{k\}$. Those points are either $(0,0)$, a Nash equilibrium or they have one duplicated label, since $\vert L(x) \vert + \vert L(y) \vert = n+m$. An edge is composed of $n+m-1$ labels (no duplicated labels). So, an almost fully-labeled point that is not a Nash equilibrium or $(0,0)$ is connected to exactly two other vertices via an edge (which correspond to dropping one of the duplicated labels and following the corresponding edge).

This gives us a topology of the space of Nash equilibria. This tells us a couple of facts: (i) we can find a Nash equilibrium by starting in $(0,0)$ and following the L-H path. At the end of the path, we must find a Nash equilibrium. (ii) If the polytope is not degenerate, then there is an odd number of Nash equilibria and they are connected by L-H edges in the following way:

[Figure: Nash equilibria and almost fully-labeled points connected by L-H edges]

where the blue dots are the Nash equilibria, the white dots are the almost fully-labeled points and the edges are the L-H edges. The number of Nash equilibria is odd by a simple parity argument.

The classic way in which simplex-like methods walk through the vertices of a polytope is by pivoting. We can implement Lemke-Howson the same way. For an explanation of the implementation, please look at chapter 3 of the AGT book.

One can ask if we could modify the L-H path following to go through the path faster. Recently, Goldberg, Papadimitriou and Savani proved that finding the Nash equilibrium that L-H outputs is PSPACE-complete. So, in principle, finding this specific equilibrium, seems harder than finding any Nash, which is PPAD-complete.

My current plan is to write some other blog posts on bimatrix games. I want to discuss two things: first, the homotopy methods and homotopy fixed-point theorems (which are at the heart of the two papers mentioned above), and second, new techniques for solving zero-sum games where the strategy space is exponential.

DP and the Erdős–Rényi model

renatoppl — Mon, 16 May 2011 21:41:28 +0000

Yesterday I was in a pub with Vasilis Syrgkanis and Elisa Celis and we were discussing how to calculate the expected size of a connected component in $G(n,p)$, the Erdős–Rényi model. $G(n,p)$ is the classical random graph obtained by considering $n$ nodes and adding each edge $(i,j)$ independently with probability $p$. A lot is known about its properties, which very interestingly change qualitatively as the value of $p$ changes relative to $n$. For example, for $p = \frac{c}{n}$ with $c < 1$ there is no component of size greater than $O(\log n)$ with high probability. When $p = \frac{c}{n}$, $c > 1$ and $n \rightarrow \infty$, then the graph has a giant component. All those phenomena are very well studied in the context of probabilistic combinatorics and also in social networks. I remember learning about them in Jon Kleinberg’s Structure of Information Networks class.

So, coming back to our conversation, we were thinking about how to calculate the size of a connected component. Fix some node $u$ in $G(n,p)$. It doesn’t matter which node, since all nodes are equivalent before we start tossing the random coins. Now, let $C_u$ be the size of the connected component of node $u$. The question is how to calculate $\mathbb{E}[C_u]$.

Recently I’ve been learning MATLAB (actually, I am learning Octave, but it is the same) and I am amazed by it and surprised that I hadn’t learned it before. It is a programming language that somehow knows exactly how mathematicians think and the syntax is very intuitive. All the operations that you think of performing when doing mathematics are already implemented. Not that you can’t do that in C++ or Python (in fact, I’ve been doing that all my life), but in Octave things are so simple. So, I thought this was a nice opportunity for playing a bit with it.

We can calculate $\mathbb{E}[C_u]$ using a dynamic programming algorithm in time $O(n^3)$. Well, maybe we can do it more efficiently, but the DP I thought of was the following: let’s calculate $C(n,s)$, which is the expected size of the connected component of $u$ in a random graph with $n$ nodes where the edges between $u$ and other nodes have probability $1-(1-p)^s$ and an edge between any two other nodes $v_1$ and $v_2$ has probability $p$. What we want to compute is $C(n,1)$.

What we can do is to use the Principle of Deferred Decisions, and toss the coins for the edges between $u$ and the other nodes. With probability $\binom{n-1}{k} q^k (1-q)^{n-1-k}$, where $q = 1-(1-p)^s$, there are $k$ edges between $u$ and the other nodes, say to nodes $w_1, \dots, w_k$. If we collapse those nodes into $u$ we end up with a graph of $n-k$ nodes and the problem is equivalent to $k$ plus the size of the connected component of $u$ in the collapsed graph.

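One difference, however, is that the probability that the collapsed node is connected to a node $v$ of the remaining nodes is the probability that at least one of $w_1, \dots, w_k$ is connected to $v$, which is $1-(1-p)^k$. In this way, we can write:

$$C(n,s) = \text{bin}(0, n-1, q_s) + \sum_{k=1}^{n-1} \text{bin}(k, n-1, q_s) \cdot [k + C(n-k, k)]$$

where $\text{bin}(k,n,q) = \binom{n}{k} q^k (1-q)^{n-k}$ and $q_s = 1-(1-p)^s$. Now, we can calculate $C(n,1)$ by using DP, simply by filling an $n \times n$ table. In Octave, we can do it this way: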


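% binopdf comes from the statistics package (in Octave: pkg load statistics).
function component = C(N,p)
  % C_table(n,s): expected component size of u in a graph with n nodes,
  % where u stands for s collapsed nodes.
  C_table = zeros(N,N);
  for n = 1:N
    for s = 1:N
      q = 1-((1-p)^s);  % probability of an edge between u and another node
      C_table(n,s) = binopdf(0,n-1,q);  % k = 0: the component is just u
      for k = 1:n-1
        C_table(n,s) += binopdf(k,n-1,q) * (k + C_table(n-k,k));
      end
    end
  end
  component = C_table(N,1);
endfunction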
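For readers without Octave, the same recurrence can be sketched in Python using only the standard library. This is a direct port of the table-filling above, with a made-up function name; the binomial probability is written out with `math.comb`.

```python
from math import comb

def expected_component_size(N, p):
    """Expected size of the component of a fixed node in G(N, p), via the DP above."""
    # C[n][s]: expected component size of u among n nodes, u standing for s collapsed nodes.
    C = [[0.0] * (N + 1) for _ in range(N + 1)]
    for n in range(1, N + 1):
        for s in range(1, N + 1):
            q = 1 - (1 - p) ** s  # probability of an edge between u and another node
            total = 0.0
            for k in range(n):    # k = number of neighbours of u
                prob = comb(n - 1, k) * q ** k * (1 - q) ** (n - 1 - k)
                # k = 0 means the component is just u; otherwise collapse and recurse.
                total += prob * (1 if k == 0 else k + C[n - k][k])
            C[n][s] = total
    return C[N][1]
```

As a check, for $N = 3$ a direct calculation gives $1 + 2p + 2p^2 - 2p^3$, which the table reproduces.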

And in fact we can call $C(N,p)$ for, say, a fixed $N$ and a range of values of $p$ and see how the expected component size varies. This allows us, for example, to observe the sharp transition that happens before the giant component is formed. The plot we get is:

[Plot: Erdős–Rényi model]

Reasoning about Knowledge

renatoppl — Sat, 16 Apr 2011 20:52:42 +0000

Last week, I went to Petra in Jordan together with Vasilis Syrgkanis and on the road we kept discussing the Blue-Eyed Islanders Puzzle, posted by Terry Tao on his blog. It is a nice puzzle because it allows us to reflect on how to think about knowledge and how to make statements like “A knows that B knows that C doesn’t know fact F”. My goal in this post is not to discuss the puzzle itself, but a framework that allows us to think about it in a structured way. Nevertheless, I strongly recommend the puzzle, first because it is a lot of fun, and second, because it gives a great non-trivial example to think about this kind of thing. But first, a photo from Petra:

And then some references:

Classic: “Agreeing to Disagree” (Robert Aumann)
Survey: “Common Knowledge” (John Geanakoplos)
Book: “Reasoning about Knowledge” (Fagin, Halpern, Moses, Vardi)

I first read Aumann’s classic, which is a very beautiful and short paper, but where I got most intuition (and fun) was reading Chapter 2 of Reasoning About Knowledge. So, I’ll show here something in between what Chapter 2 presents and what Geanakoplos’ survey presents (which is also an amazing source).

We want to reason about the world and the first thing we need is a set $\Omega$ representing all possible states the world can take. Each $\omega \in \Omega$ is called a state of the world and completely describes the world we are trying to reason about. To illustrate, consider the situation where there are $n$ people in a room and each person has a number $x_i \in \{0,1\}$ on his head. Person $i$ can see the number of everyone else, except his own. We want to reason about this situation, so a good way to describe the world is to simply define $\Omega = \{0,1\}^n$, the set of all $0/1$-strings of length $n$. We define an event to be simply a subset of the possible states of the world, i.e., some set $E \subseteq \Omega$. For example, the event that player $i$ has number $1$ on his head is simply: $E = \{ x \in \Omega; x_i = 1 \}$. We could also think about the event that the sum of the numbers is odd, which would be: $E = \{ x \in \Omega; \sum_j x_j \bmod 2 = 1 \}$. Now, we need to define what it means for some person to know some event.

For each person $i$, his knowledge structure is defined by a partition $P_i$ of $\Omega$. The rough intuition is that player $i$ is unable to distinguish two elements in the same cell of partition $P_i$. For each $\omega \in \Omega$, $P_i(\omega)$ is the cell of partition $P_i$ containing $\omega$. The way I see knowledge representation is that if $\omega$ is the true state of the world, then person $i$ knows that the true state of the world is some element in $P_i(\omega)$.

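Definition: We say that person $i$ knows event $E$ at the state of the world $\omega$ if $P_i(\omega) \subseteq E$. Therefore, if person $i$ knows event $E$, the world must be in some state in $K_i(E) = \{ \omega \in \Omega; P_i(\omega) \subseteq E \}$.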

Above we define the knowledge operator $K_i(\cdot)$. Below, there is a picture in which we represent its action:

Now, this allows us to represent the fact that person $1$ knows that person $2$ knows of event $E$ as the event $K_1(K_2(E))$. Now, the fact that person $1$ knows that person $2$ doesn’t know that person $3$ knows event $E$ can be represented as: $K_1(\neg K_2(K_3(E)))$, where $\neg E = \Omega \setminus E$.

An equivalent and axiomatic way of defining the knowledge operator is by defining it as an operator $K_i: 2^\Omega \rightarrow 2^\Omega$ such that:

1. $K_i(\Omega) = \Omega$
2. $K_i(A) \cap K_i(B) = K_i(A \cap B)$
3. $K_i(A) \subseteq A$
4. $K_i(K_i(A)) = K_i(A)$
5. $\neg K_i(A) = K_i(\neg K_i(A))$

Notice that axioms 1-4 define exactly a topology and together with 5 it is a topology that is closed under complement. The last two properties are more interesting: they say that if a player knows something, then he knows that he knows it, and if he doesn’t know something, he knows that he doesn’t know it. Aumann goes ahead and defines the notion of common knowledge:

Definition: We say that an event $E$ is common knowledge at $\omega$ if for any $n$ and for any sequence $i_1, \dots, i_n$ where the $i_k$ are players, $\omega \in K_{i_1} K_{i_2} \cdots K_{i_n}(E)$.

Suppose that $P'$ is a partition that is a simultaneous coarsening of $P_1, \dots, P_n$. Then for every cell $C$ of this partition, at every state either $C$ or $\neg C$ is common knowledge.

An alternative representation is to represent $\Omega$ as nodes in a graph and add an edge between $\omega_1$ and $\omega_2$ labeled with $i$ if they are in the same cell of $P_i$. Now, given the true state of the world $\omega$, one can easily calculate the smallest event $E$ such that $i$ knows $E$: this is exactly the set of states that are reached from $\omega$ just following edges labeled with $i$, which is easily recognizable as $P_i(\omega)$.

Now, what is the smallest set that knows that knows ? Those are the elements that we can arrive from a path following first an edge labeled and then an edge labeled . Extending this reasoning, it is easy to see that the smallest set that is common knowledge at are all the elements reachable from some path in this graph.
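
The graph view above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the two-person, one-bit-each world and the helper names (`cell`, `knows`, `common_knowledge_cell`, `sees_others`) are hypothetical and not taken from the references. Each person's partition groups the states he cannot tell apart, the knowledge operator keeps the cells contained in an event, and the smallest common-knowledge event is found by a graph search over edges of every label.

```python
from itertools import product

def cell(partition, state):
    """Return the cell of `partition` containing `state`."""
    for block in partition:
        if state in block:
            return block
    raise ValueError("state is not covered by the partition")

def knows(partition, event):
    """Knowledge operator: the states whose whole cell lies inside `event`."""
    return {w for block in partition if block <= event for w in block}

def common_knowledge_cell(partitions, state):
    """States reachable from `state` by edges of any label (graph search)."""
    reached, frontier = {state}, [state]
    while frontier:
        w = frontier.pop()
        for partition in partitions:
            for v in cell(partition, w):
                if v not in reached:
                    reached.add(v)
                    frontier.append(v)
    return reached

def sees_others(states, i):
    """Partition for a person who sees every coordinate except coordinate i."""
    groups = {}
    for w in states:
        groups.setdefault(w[:i] + w[i + 1:], set()).add(w)
    return [frozenset(g) for g in groups.values()]

states = set(product((0, 1), repeat=2))          # two people, one bit each
partitions = [sees_others(states, i) for i in range(2)]
event = {(0, 0), (1, 0)}                          # "person 1's bit is 0"
print(knows(partitions[0], event))                # person 0 sees that bit
print(knows(partitions[1], event))                # person 1 cannot see it
print(common_knowledge_cell(partitions, (0, 0)))  # reachable by any path
```

In this tiny world every state is reachable from every other one, so only the trivial event containing all states is common knowledge.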

More about knowledge representation and reasoning about knowledge in future posts. In any case, I can’t recommend enough the references above.

Do we believe the Axiom of Choice ?

renatoppl — Sun, 10 Apr 2011 20:17:32 +0000

Continuing on my series of posts from Israel, I’d like to share an exciting puzzle that I heard today from Omer Tamuz. I’ve learned before about the Axiom of Choice in a Measure Theory class, but never saw such a striking and counter-intuitive application of it. Ok, you might say the Banach–Tarski paradox is perhaps a better example – but since its proof is so complicated, it is not as striking as seeing how a simple application of it can generate unintuitive results. First, let me present three puzzles:

Puzzle #0: There are people in a line, and each has a number on his hat. Each player can see the numbers of the players in front of him. So, if is the number of player , then player knows . Now, from the players will say their own numbers. Is there a protocol such that players will get their own number right? (Notice that each player hears what the players before him said.)

Puzzle #1: Consider the same puzzle with an infinite number of players. I.e. there are and player knows for all $j > i$. Show a protocol for all players, except the first, to get the answer right.

Puzzle #2: Still the same setting, but now players don’t hear what the previous player said. Is there a protocol such that only a finite number of players get it wrong? (Notice that it needs to be finite, not bounded.)

Puzzle #0 is very easy and the answer is simply a parity check. Player could simply declare where stands for XOR. Now, player can for example reconstruct by . Now, player can do the same computation and figure out . Now, he can calculate and so on… When we move to an infinite number of players, however, we can’t do that anymore because taking the XOR of an infinite number of bits is not well defined. However, we can still solve Puzzles #1 and #2 if we believe and are willing to accept the Axiom of Choice.
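
For the finite case, the parity protocol can be simulated directly. The sketch below is a minimal illustration with my own indexing (0-based players, so `hats[i]` is player `i`'s bit and he sees `hats[i+1:]`); the function name `parity_protocol` is hypothetical. The first player announces the XOR of everything he sees, and every later player XORs that announcement with the answers he heard and the bits he sees.

```python
from functools import reduce
from itertools import product
from operator import xor

def parity_protocol(hats):
    """Return what each player says; player i sees hats[i+1:] and hears earlier answers."""
    said = []
    for i in range(len(hats)):
        seen = reduce(xor, hats[i + 1:], 0)       # bits in front of player i
        if i == 0:
            said.append(seen)                     # first player sacrifices his guess
        else:
            heard = reduce(xor, said[1:], 0)      # correct answers of players 1..i-1
            said.append(said[0] ^ heard ^ seen)   # what is left over is his own bit
    return said

# Exhaustive check over all hat assignments for 6 players.
all_but_first_right = all(
    parity_protocol(list(h))[1:] == list(h)[1:] for h in product((0, 1), repeat=6)
)
print(all_but_first_right)
```

The exhaustive check only covers 6 players, but the argument is the same for any finite number.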

Axiom of Choice: Given a family of sets there is a set such that , i.e. a set that takes a representative from each element in the family.

It is used, for example to show that there is no measure that is shift invariant (say under addition modulo ) and . The proof goes the following way: define the following equivalence relation on : if . Now, consider the family of all the equivalence classes and invoke the Axiom of Choice. Let be the set obtained. Now, we can write the interval as a disjoint union:

where all operations are modulo and . Since it is an enumerable union, if such a measure existed, then: which is either $0$ if $\mu(K) = 0$ or $\infty$ if $\mu(K) > 0$.

This is kinda surprising, but more surprising is how we can use the exact same technique to solve the puzzles: first, let’s solve Puzzle #2: let be the set of all infinite -strings and consider the equivalence relation on such that if the strings differ in a finite number of positions. Now, invoke the axiom of choice in the equivalence classes and let be the set of representatives. Now, if is the set of all strings with finite number of ‘s and the operation such that if . We can therefore write:

Now, a protocol the players could use is to look ahead and, since they are seeing an infinite number of bits, they can figure out which equivalence class the entire string is in. Now, they take the representative of this class and guess . Notice that will differ from the real string by at most a finite number of bits.

Now, to solve Puzzle #1, the player simply looks at and figures out the equivalence class he is in and lets be the representative of this class. Now, since and differ by a finite number of bits, he can simply calculate the XOR of and (since it is a finite number of them, the XOR is well defined) and announce it. With this trick, it just becomes like Puzzle #0.

Submodular Allocation Problem

renatoppl — Wed, 06 Apr 2011 21:18:55 +0000

I am in Israel for the Algorithmic Game Theory Semester in the Center for the Study of Rationality. It is great to both explore Jerusalem and learn about games and algorithms. I think it is a great opportunity to start blogging again. To start, I decided to write about a simple and beautiful algorithm by Lehmann, Lehmann and Nisan on the allocation problem when players have submodular valuations.

Consider a set of items and agents. Each agent has a monotone submodular valuation over the items, i.e., s.t. for any subsets of and for T. Now, the goal is to partition the items into sets in order to maximize .

This problem is clearly NP-hard (for example, we can reduce from Maximum Coverage or any similar problem), but it has a very simple greedy approximation. The approximation goes as follows: start with all sets being empty, i.e., start with then for each item , find the player with maximum and add to this player. This is a $2$-approximation algorithm. The proof is simple:

Let be the sets returned by the algorithm and the optimal solution. Let also and . We can write:

if we added to set it means that by the Greedy rule. Therefore we can write:

where the first inequality follows from the Greedy rule and the second follows from submodularity. Now, we can simply write:

An improved algorithm was given by Dobzinski and Schapira achieving an approximation using demand queries, which are used as a separation oracle for a suitable linear program.
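
The greedy rule described above (not the improved algorithm) can be sketched as follows. Everything concrete here is an assumption of mine for illustration: I use coverage functions as the monotone submodular valuations, the instance (`items`, `covers`) is made up, and the helper names are hypothetical. The brute-force optimum is only there to compare against on a tiny instance.

```python
from itertools import product

def coverage_value(cover, bundle):
    """A monotone submodular valuation: how many distinct elements the bundle covers."""
    covered = set()
    for item in bundle:
        covered |= cover[item]
    return len(covered)

def welfare(covers, bundles):
    return sum(coverage_value(c, b) for c, b in zip(covers, bundles))

def greedy_allocation(covers, items):
    """Give each item, in order, to the agent with the largest marginal value."""
    bundles = [set() for _ in covers]
    for item in items:
        def gain(i):
            return (coverage_value(covers[i], bundles[i] | {item})
                    - coverage_value(covers[i], bundles[i]))
        bundles[max(range(len(covers)), key=gain)].add(item)
    return bundles

def optimal_welfare(covers, items):
    """Brute force over all assignments; only viable for tiny instances."""
    best = 0
    for owners in product(range(len(covers)), repeat=len(items)):
        bundles = [{it for it, o in zip(items, owners) if o == i}
                   for i in range(len(covers))]
        best = max(best, welfare(covers, bundles))
    return best

items = ["a", "b", "c"]
covers = [
    {"a": {1, 2}, "b": {2, 3}, "c": {4}},        # agent 0 (made-up data)
    {"a": {1}, "b": {5, 6, 7}, "c": {4, 8}},     # agent 1 (made-up data)
]
greedy = welfare(covers, greedy_allocation(covers, items))
optimum = optimal_welfare(covers, items)
print(greedy, optimum)
```

On this instance greedy happens to match the optimum; in general the guarantee is only that the optimum is at most twice the greedy welfare.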

Probability Puzzles

renatoppl — Wed, 17 Feb 2010 02:54:53 +0000

Today at a dinner with Thanh, Hu and Joel I heard about a paradox I hadn’t heard before. Probability is full of cute problems that challenge our understanding of the basic concepts. The most famous of them is the Monty Hall Problem, which asks:

You are on a TV game show and there are 3 doors – one of them contains a prize, say a car, and the other two doors contain things you don’t care about, say goats. You choose a door. Then the TV host, who knows where the prize is, opens one door you haven’t chosen and that he knows has a goat. Then he asks if you want to stick to the door you have chosen or if you want to change to the other door. What should you do?

Probably you’ve already come across this question at some moment of your life and the answer is that changing doors would double your probability of getting the prize. There are several ways of convincing your intuition:

Do the math: when you chose the door, there were three options so the prize is in the door you chose with probability $\frac{1}{3}$ and in the other door with probability $\frac{2}{3}$ (note that the presenter can always open some door with a goat, so conditioning on that event doesn’t give you any new information).
Do the actual experiment (computationally) as done here. One can always ask a friend to help, get some goats and perform the actual experiment.
To convince yourself that “it doesn’t matter” is not correct, think doors. You choose one and the TV host opens of them and asks if you want to change or stick with your first choice. Wouldn’t you change?
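
The computational experiment mentioned in the list can be sketched like this. It is a minimal simulation of my own (the function names, the number of trials and the seed are arbitrary choices, not the linked experiment): the host always opens a goat door different from the contestant's choice, and we compare the win rate of always switching against always staying.

```python
import random

def monty_hall_trial(switch, rng):
    """Play one game; return True if the contestant ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    choice = rng.choice(doors)
    # The host opens a goat door that is not the contestant's choice.
    opened = rng.choice([d for d in doors if d != choice and d != prize])
    if switch:
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(monty_hall_trial(switch, rng) for _ in range(trials)) / trials

print("switch:", win_rate(True))
print("stay:  ", win_rate(False))
```

The two rates should come out close to 2/3 and 1/3 respectively, up to sampling noise.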

I’ve seen TV shows where this happened and I acknowledge that other things may be involved: there might be behavioral and psychological issues associated with the Monty Hall problem – and possibly those would interest Dan Ariely, whose book I began reading today – and which looks quite fun. But the problem they told me about today at dinner was another one, the envelope problem:

There are two envelopes and you are told that in one of them there is twice the amount that there is in the other. You choose one of the envelopes at random and open it: it contains 100 bucks. Now, you don’t know if the other envelope has 200 bucks or 50 bucks. Then someone asks you if you want to pay 10 bucks and change to the other envelope. Should you change?

Now, consider two different solutions to this problem: the first is fallacious and the second is correct:

If I don’t change, I get 100 bucks; if I change, I pay a penalty of 10 and I get either 200 or 50 with equal probability, so my expected prize if I change is $\frac{200+50}{2}-10 = 115 > 100$, so I should change.
I know there is one envelope with and one with , then my expected prize if I don’t change is . If I change, my expected prize is , so I should not change.

The fallacy in the first argument is perceiving a probability distribution where there is none. Either the other envelope contains 200 bucks or it contains 50 bucks – we just don’t know, but there is no probability distribution there – it is a deterministic choice by the game designer. Most of those paradoxes are a result of either an ill-defined probability space, as in Bertrand’s Paradox, or a wrong comprehension of the probability space, as in Monty Hall or in several paradoxes exploring the same idea, such as: Three Prisoners, Sleeping Beauty, Boy or Girl Paradox, …

There was very recently a thrilling discussion about a variant on the envelope paradox in the xkcd blag – which is the blog accompanying that amazing webcomic. There was a recent blog post with a very intriguing problem. A better idea is to go there and read the discussion, but if you are not doing so, let me summarize it here. The problem is:

There are two envelopes, each containing a distinct real number. You pick one envelope at random, open it and see the number, then you are asked to guess if the number in the other envelope is larger or smaller than the previous one. Can you guess correctly with more than $\frac{1}{2}$ probability?

A related problem is: given that you are playing the envelope game and there are numbers $A$ and $B$ (with $A < B$). You pick one envelope at random and then you are able to look at the content of the first envelope you open and then decide to switch or not. Is there a strategy that gives you expected earnings greater than $\frac{A+B}{2}$?

The very unexpected answer is yes! The strategy that Randall presents in the blog (there is a link to the source here) is: let $X$ be a random variable on $\mathbb{R}$ such that for each $a < b$ we have $P(a < X < b) > 0$, for example, the normal distribution or the logistic distribution.

Sample $X$, then open the envelope and find a number $S$. Now, if $X < S$ say the other number is lower and if $X > S$ say the other number is higher. You get it right with probability

$$P(\text{picked }A) P(X > A) + P(\text{picked }B) P(X < B) = \frac{1}{2} (1 + P(A < X < B))$$

which is impressive. If you follow your guess, your expected earning is:

$$\begin{aligned} &P(\text{picked }A) \mathop{\mathbb E}[Y \vert \text{picked }A] + P(\text{picked }B) \mathop{\mathbb E}[Y \vert \text{picked }B] = \\ &= \frac{1}{2} [P(X<A) A + P(X>A) B] + \frac{1}{2} [P(X<B) B + P(X>B) A] \\ &= \frac{1}{2}[A [P(X<A) + P(X>B)] + B [P(X>A) + P(X<B)]] > \frac{A+B}{2} \end{aligned}$$
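
The strategy is easy to check numerically. The sketch below is my own illustration, not the code from the xkcd discussion: I assume a standard normal threshold, pick the hypothetical values -0.5 and 0.5 for the two numbers, and compare the simulated success rate with the formula for the probability of guessing right.

```python
import math
import random

def play(a, b, rng):
    """One round with envelopes a < b; returns (guess_correct, number_kept)."""
    mine, other = (a, b) if rng.random() < 0.5 else (b, a)
    x = rng.gauss(0.0, 1.0)        # threshold X, positive mass on every interval
    switch = x > mine              # X > S: guess "the other is higher" and switch
    return switch == (other > mine), (other if switch else mine)

def estimate(a, b, trials=200_000, seed=1):
    """Monte Carlo estimate of (success probability, expected number kept)."""
    rng = random.Random(seed)
    hits, total = 0, 0.0
    for _ in range(trials):
        correct, kept = play(a, b, rng)
        hits += correct
        total += kept
    return hits / trials, total / trials

def predicted_success(a, b):
    """(1 + P(a < X < b)) / 2 for a standard normal X."""
    phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return 0.5 * (1.0 + phi(b) - phi(a))

a, b = -0.5, 0.5                   # hypothetical envelope contents
success_rate, mean_kept = estimate(a, b)
print(success_rate, predicted_success(a, b), mean_kept, (a + b) / 2)
```

The advantage over one half depends entirely on how much probability the threshold puts between the two numbers, so it can be tiny when they are close together or far out in the tails.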

The xkcd pointed to this cool archive of puzzles and riddles. I was also told that the xkcd puzzle forum is also a source of excellent puzzles, as this:

You are the most eligible bachelor in the kingdom, and as such the King has invited you to his castle so that you may choose one of his three daughters to marry. The eldest princess is honest and always tells the truth. The youngest princess is dishonest and always lies. The middle princess is mischievous and tells the truth sometimes and lies the rest of the time. As you will be forever married to one of the princesses, you want to marry the eldest (truth-teller) or the youngest (liar) because at least you know where you stand with them. The problem is that you cannot tell which sister is which just by their appearance, and the King will only grant you ONE yes or no question which you may only address to ONE of the sisters. What yes or no question can you ask which will ensure you do not marry the middle sister?

copied from here.

Walrasian Equilibrium I

renatoppl — Tue, 16 Feb 2010 04:32:53 +0000

Lately I’ve been trying to understand more about the dynamics of markets and basic concepts of microeconomic theory and, as always, writing a blog post will help me to keep my ideas clear. First, why are markets interesting from a computer scientist’s or mathematician’s point of view?

Markets are multi-objective optimization problems: one can think of the possible state of a market as some point in a space of possible . Each player of the market controls one variable, say and is interested in maximizing one objective function . So, player is trying to set .
Markets are a computational model: one can think of a market as a way of performing a certain computation – as extracting some kind of information, as a prediction market, stock exchanges, … If we think of it as a computational device, we are asking the same questions: given those preferences which are implicit functions to each of the agents, calculate “fair” prices of items.
Markets are distributed systems where each part of the system has a selfish interest.

A market is composed of a set of commodities, of consumers and of producers. Now, we describe how to characterize each of them:

Each consumer is defined by a set of commodity combinations he is interested in (typically we take ) and a utility function expressing his interest for this bundle of commodities. Consumer will try to maximize in a further restricted .
Each producer is defined by a set it has the capacity to produce.
Endowments: Each consumer comes to the market with an initial endowment , so for , is the amount of commodity that consumer originally has. The initial total endowment of the market is given by , which is a vector indicating how much of each commodity originally exists in the market.
Shares: consumers have shares in the companies, so for , , consumer has shares of company , such that .

Something very crucial is missing in this picture: a way to compare commodities and something that makes exchanges possible: the answer to that is to attribute prices to the items. How to attribute prices to the items so that the market works fine? A price vector is a vector . Consider the following scenario after prices are established to commodities:

by producing , company gets profit , so each company will try to maximize its profit producing .
each consumer sells its initial endowment and gets the profit respective to the companies he owns. So, consumer gets .
now consumer uses the money he has to buy the best bundle he can afford, which is .

The amount of commodities in the market must conserve, so that is possible only if we get:

First, it is not clear if such a price vector exists. If it exists, is it unique? If this is an equilibrium, is it the best thing for the consumers? How can those prices be set in practice without a centralized authority? Can people lie? Below, let’s collect a couple of questions I’ll try to answer (yes, no or unknown) in this and the following posts.

Question 1: Does a price vector always exist that generates an equilibrium?

Question 2: If it exists, is it unique?

Question 3: Can we describe an efficient method to find ?

Question 4: Is it the best thing for the consumers in the following sense: if is an equilibrium, are there feasible such that and for at least one consumer $u_i(x_i) > u_i(x_i^*)$? (This is called Pareto improvement)

Question 5: A central authority could use the knowledge about functions and endowments to calculate the price vector using some method. Can consumers be better off by lying about their utility and endowments?

Question 6: How do prices get defined without a central authority? Is there a dynamic/game-theoretical model for that?

For simplicity, let’s think of Exchange Economies, which are economies with no producers. Let’s define it formally:

Definition 1 An exchange economy is composed of a set of commodities and a set of consumers each with a utility and an initial endowment .

Definition 2 A price vector is a Walrasian equilibrium for an exchange economy if there is such that:

s.t.

The first condition says that each consumer is maximizing his utility given the prices, the second says that we can’t buy more commodities than what is available in the market and the third, called Walras’ Law, says that if there is surplus of a certain product, it should have price zero. It is by far the most unnatural of those, but it can be easily justified in some circumstances: suppose we say that utilities are non-satiated if for each and $\epsilon > 0$, there is , such that $u_i(x'_i) > u_i(x_i)$. If are differentiable, that would mean , for example a linear function with some $u_{i\ell} > 0$. In that case, and some player has money surplus and therefore he could increase his utility.

Now, we define for each price vector the excess demand function and . Now, under non-satiated utilities, by the last argument, we have that is an equilibrium vector iff . Actually, if are also strongly monotone, i.e., for each , then it becomes: is an equilibrium iff , which means that the market clears:

The question that is easiest to answer is Question 4 and it is sometimes referred to as the First Fundamental Theorem of Welfare Economics:

Theorem 3 Given non-satiated preferences, each equilibrium is Pareto, i.e. there is no other feasible allocation such that for all , with the inequality strict for at least one component.

Proof: Suppose there were, since then , because if then we could improve the utility of still within the budget, contradicting the optimality of for that budget. And clearly $u_i(x'_i) > u_i(x_i)$ implies $p \cdot x'_i > p \cdot \omega_i$.

Summing over , we get $\sum_i p x'_i > \sum_i p \omega_i$, which is a contradiction, because since is feasible, and therefore .

Now, let’s tackle Question 1. We assume linearity of utility: $u_i(x_i) = \sum_\ell u_{i \ell} x_{i \ell}$ for $u_{i \ell} > 0$. This gives us strong monotonicity and locally non-satiated preferences.

Theorem 4 Under linear utilities, there is always an equilibrium price vector .

Consider the function defined above: where is the bundle of best possible utility. Now, since we are using linear utilities we can’t guarantee there will be only one such bundle, so instead of considering a function, consider and as being correspondences: , i.e., is the set of all allocations that maximize subject to . Since are linear functionals, we can calculate by a Fractional Knapsack algorithm: we sort commodities by and start buying in the cost-benefit order (the ones that provide more utility per buck spent). Most of the time there will be just one solution, but in points where , then might be a convex region. This correspondence is upper hemicontinuous, which is the correspondence analogue to continuity for functions. As Wikipedia defines:

Definition 5 A correspondence is said to be upper hemicontinuous at the point if for any open neighbourhood of there exists a neighbourhood of a such that is a subset of for all in .

It is not hard to see that is upper hemicontinuous according to that definition. Our goal is to prove that there is one price vector for which or: . To prove that we use Kakutani’s Fixed Point Theorem. Before we go into that, we’ll explore some other properties of :

0-Homogeneous: $z(\alpha p) = z(p), \forall \alpha > 0$
Walras’ Law: . For any we know by the definition of . So, if it is not zero, some has a money surplus, which is absurd given that preferences are strongly monotone.
Bounded: is bounded from below, i.e., for some $s > 0$. Simply take
Boundary behavior: if with , then . That is clear from the fractional knapsack algorithm when one desirable item gets price zero.
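
The demand computation and the first two properties above can be checked numerically on a toy exchange economy. This is a sketch under assumptions of my own: strictly positive prices, consumption sets with no quantity caps (so the fractional knapsack collapses to spending the whole budget on one commodity with the best utility per buck), ties broken by lowest index, and made-up utilities and endowments. The function names are hypothetical.

```python
def demand(utility, endowment, prices):
    """One utility-maximizing bundle for a linear utility under the budget p . w."""
    budget = sum(p * w for p, w in zip(prices, endowment))
    # Best utility per buck; ties are broken by lowest index, which selects
    # a single point of the demand correspondence.
    best = max(range(len(prices)), key=lambda k: utility[k] / prices[k])
    bundle = [0.0] * len(prices)
    bundle[best] = budget / prices[best]
    return bundle

def excess_demand(utilities, endowments, prices):
    """Total demand minus total endowment, commodity by commodity."""
    z = [0.0] * len(prices)
    for u, w in zip(utilities, endowments):
        x = demand(u, w, prices)
        for k in range(len(prices)):
            z[k] += x[k] - w[k]
    return z

def dot(p, z):
    return sum(pi * zi for pi, zi in zip(p, z))

utilities = [[2.0, 1.0], [1.0, 3.0]]    # made-up linear utility coefficients
endowments = [[1.0, 0.0], [0.0, 1.0]]   # made-up initial endowments
for prices in ([1.0, 1.0], [1.0, 4.0], [2.0, 8.0]):
    z = excess_demand(utilities, endowments, prices)
    print(prices, z, dot(prices, z))
```

In this toy economy the first price vector clears the market, the second does not, and the third is a rescaling of the second, so it gives the same excess demand; the inner product of prices and excess demand is zero in all three cases.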

Now, we are in shape for applying Kakutani’s Fixed Point Theorem:

Theorem 6 (Kakutani, 1941) If is an upper hemicontinuous correspondence such that is a convex non-empty set for all then has a fixed point, i.e., s.t. .

Since prices are $0$-homogeneous, consider the simplex , its relative interior $\Delta^0 = \{p > 0; \sum_\ell p_\ell = 1\}$ and the boundary . Now we define the following price correcting correspondence .

If some price is set, it generates demand . For that demand, the price that would maximize profit would be , i.e. for all . It is natural to re-adjust the prices to . So we define for :

and for :

Now, I claim that this correspondence satisfies the conditions in Kakutani’s Theorem. We skip a formal proof of this fact, but this is intuitive for the interior – let’s give the intuition why this is true as we approach the boundary: if , then , therefore the demand explodes: and as a result the best thing to do is to set the prices of those commodities much higher than the rest. Therefore, the prices of the commodities whose demand explodes are positive while the prices of the commodities whose demand doesn’t explode get value zero.

Now, after waving our hands about the upper hemicontinuity of , we have by Kakutani’s Theorem a point such that . By the definition of we must have (because for , ). Now, I claim . In fact if , still by Walras’ Law. So, if then there is with and therefore for all , and . For this reason .

In the next blog post (or series of blog posts, let’s see) we discuss issues related to the other questions: uniqueness, dynamics, game-theoretical considerations, …

Bounded Degree Spanning Tree and an Uncrossing Lemma

renatoppl — Wed, 18 Nov 2009 04:07:18 +0000

I’ve been reading about the Bounded Degree Spanning Tree problem and I thought of writing some of what I am learning here. It illustrates a beautiful technique called Iterated Rounding and uses the combinatorial idea of uncrossing. I’ll try to give a high-level idea of the argument and give references on the details. The first result of this kind was given by Goemans (although there were previous results with weaker guarantees) in Minimum Bounded Degree Spanning Trees, but the result based on iterated rounding and a subsequent improvement are due to Singh and Lau in a series of papers. A main reference is Approximating minimum bounded degree spanning trees to within one of optimal.

The problem of bounded degree spanning tree is as follows: consider a graph with edge weights and we have for some nodes a degree bound . We want to find, among the spanning trees for which the degree of is the one with minimum cost. It is clearly a hard problem, since taking all weights equal to and for all nodes is the Hamiltonian Path problem, which is NP-complete. We will get a different kind of approximation. Let OPT be the optimal solution: we will show an algorithm that gives a spanning tree of cost such that each node has degree (this can be improved to with a more sophisticated algorithm, also based on Iterated Rounding).

As always, the first step to design an approximation algorithm is to relax it to an LP. We consider the following LP:

The first constraint expresses that in a spanning tree, there are at most edges, the second prevents the formation of cycles and the third guarantees the degree bounds. For we have the standard Minimum Spanning Tree problem and for this problem the polytope is integral. With the degree bounds, we lose this nice property. We can solve this LP using the Ellipsoid Method. The separation oracle for the is done by a flow computation.

Iterated Rounding

Now, let’s go ahead and solve the LP. It would be great if we had an integral solution: we would be done. It is unfortunately not the case, but we can still hope it is almost integral in some sense: for example, some edges are integral and we can take them to the final solution and recurse the algorithm on a smaller graph. This is not far from the truth and that’s the main idea of iterated rounding. We will show that the support of the optimal solution $E(x) = \{e \in E; x_e > 0\}$ has some nice structure. Consider the following lemma:

Lemma 1 For any basic solution of the LP, either there is with just one incident edge in the support or there is one such that at most edges are incident to it.

If we can prove this lemma, we can solve the problem in the following way: we begin with an empty tree: then we solve the LP and look at the support . There are two possibilities according to the lemma:

If there is one node with just one edge incident to it in the support, we add it to the tree, remove from , decrease , make (the trick is to remove in each iteration edges from that are not in the support. Clearly, removing those edges doesn’t hurt the objective value) and run the algorithm again. Notice that the LP called in the recursion has value less than or equal to the actual LP . So if by induction we get a spanning tree respecting the new degree bounds plus two and value less than or equal to the new LP value, we can just add and we have a solution with value less than or equal to the one of the original LP respecting the degree bounds plus two.
Otherwise, there is one node that has degree in the support. So, we just remove that degree bound on that vertex (i.e. remove from ), make (again, eliminate the edges not in the support) and run the algorithm again. Clearly, if one node is still in , it has , since there are only three edges in the support, there will be for the rest of the computation, just three edges incident to it, so there will be at most three edges more incident to it. So it will exceed its original by at most .

The algorithm eventually stops, since in each iteration we have fewer edges or fewer nodes in and the solution is as desired. The main effort is therefore to prove the lemma. But before, let’s look at the lemma: it is of the following kind: “any basic solution of the LP has some nice properties, which involve having a not too big (at least in some point) support”. So, it involves proving that the support is not too large. That is our next task as we are trying to prove the lemma. And we will be done with:

Theorem 2 The algorithm described above produces a spanning tree of cost (the LP values and therefore ) in which each node has degree .

Bounding the size of the support

We would like now to prove some result like the Lemma above: that in the solution of the LP we have either one with degree in or we have a node in with degree . First, we suppose the opposite, that has all the nodes with degree and all the nodes in have degree . This implies that we have a large number of edges in the support. From the degrees, we know that:

We want to prove that the support of the LP can’t be too large. The first question is: how to estimate the size of the support of a basic solution. The constraints look like that:

A basic solution can be represented by picking rows of the matrix and making them tight. So, if we have a general LP, we pick some submatrix of which is and the basic solution is just . The rows of matrix can be of three types: they can be , which are corresponding to , that correspond to or corresponding to . There are vectors in total. The size of the support is smaller than or equal to the number of rows of the form in the basic solution. Therefore the idea to bound the size of the support is to prove that “all basic solutions can be represented by a small number of rows in the form ”. And this is done using the following:

Lemma 3 Assuming , for any basic solution , there is and a family of sets such that:

The restrictions correspondent to and are tight for

is an independent set

is a laminar family

The first 3 items are straightforward properties of basic solutions. The fourth one means that for two sets , one of three things happens: , or . Now, based on the previous lemma and on the following result, which can be easily proved by induction, we will prove Lemma 1.

Lemma 4 If is a laminar family over the set where each set contains at least elements, then .
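
A quick sanity check of this counting lemma on very small ground sets, as a sketch only. I am assuming the usual form of the statement here (every set has at least two elements and the family has at most one set fewer than the number of ground elements); the helper names are hypothetical and the brute force is only feasible for ground sets of size 4 or less.

```python
from itertools import combinations

def is_laminar(family):
    """True if every two sets in the family are either disjoint or nested."""
    return all(not (s & t) or s <= t or t <= s for s, t in combinations(family, 2))

def max_laminar_size(n):
    """Largest laminar family of subsets of {0..n-1}, each with >= 2 elements (brute force)."""
    candidates = [frozenset(c) for k in range(2, n + 1)
                  for c in combinations(range(n), k)]
    best = 0
    for mask in range(1 << len(candidates)):
        family = [s for j, s in enumerate(candidates) if mask >> j & 1]
        if len(family) > best and is_laminar(family):
            best = len(family)
    return best

print(is_laminar([{1, 2}, {1, 2, 3}, {4, 5}, {1, 2, 3, 4, 5}]))  # nested or disjoint
print(is_laminar([{1, 2}, {2, 3}]))                              # these two cross
print(max_laminar_size(3), max_laminar_size(4))
```

The brute force is of course no substitute for the induction proof; it only confirms the bound on the smallest cases.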

Now, the proof of Lemma 1 is easy. Let’s do it and then we come back to prove Lemma 3. Simply see that , which contradicts .

Uncrossing argument

And now we arrive at the technical heart of the proof, which is proving Lemma 3. This says that given any basic feasible solution, we can write it as a “structured” basic solution. We start with any basic feasible solution. This already satisfies (1)-(3), then we need to change that solution to satisfy condition (4) as well. We need to get rid of crossing elements, i.e., in the form:

We do that by the means of the:

Lemma 5 (Uncrossing Lemma) If and are intersecting and tight (tight in the sense that their respective constraint is tight), then and are also tight and:

Which corresponds to that picture:

Proof: First, we note that is a supermodular function, i.e.:

We can see that by case analysis. Every edge appearing in the left side appears in the right side with at least the same multiplicity. Notice also that it holds with strict inequality iff there are edges from to . Now, we have:

where the first relation is trivial, the second is by feasibility, the third is by supermodularity and the last one is by tightness. So, all hold with equality and therefore and are tight. We also proved that:

so there can be no edge from to in and therefore, thinking just of edges in we have:

Uncrossing arguments are found everywhere in combinatorics. Now, we show how the Uncrossing Lemma can be used to prove Lemma 3:

Proof: Let be any basic solution. It can be represented by a pair where and is a family of sets. We will show that the same basic solution can be represented by where is a laminar family and has the same size of .

Let be all sets that are tight under and a maximal laminar family of tight sets in , such that are independent. I claim that .

In fact, suppose , then there are sets of we could add to without violating independence – the problem is that those sets would cross some set. Pick such intersecting the fewest possible sets in . The set intersects some . Since both are tight we can use the Uncrossing Lemma and we get:

since , we can’t have simultaneously and in . Let’s consider two cases:

, then is in and intersects fewer sets of than , since all sets that intersect in must intersect as well (since no set can cross ).
, then is in and intersects fewer sets of than , since all sets that intersect in must intersect .

In either case we have a contradiction, so we proved that . So we can generate all the space of tight sets with a laminar family.

And this finishes the proof. Let’s go over all that we’ve done: we started with an LP and we wanted to prove that the support of each solution was not too large. We wanted that because we wanted to prove that there was one node with degree one in the support or a node in with small () degree. To prove that the degree of the support is small, we show that any basic solution has a representation in terms of a laminar family. Then we use the fact that laminar families can’t be very large families of sets. For that, we use the celebrated Uncrossing Lemma.

Note: Most of this is based on my notes on David Williamson’s Approximation Algorithms class. I spent some time thinking about this algorithm and therefore I decided to post it here.

Looking at probability distributions

renatoppl — Fri, 13 Nov 2009 03:16:27 +0000

I’ve been taking two classes in probability this semester and in those I saw the proofs of a lot of interesting theorems which I knew about previously but whose proofs I had never seen, such as the Central Limit Theorem, the Laws of Large Numbers and so on… Also, some theory which looks somewhat ugly in the undergrad courses becomes very clear with the proper formal treatment. Today I was thinking what was the main take-home message that a computer scientist could take from those classes and, at least for me, this message is the various ways of looking at probability distributions. I’ve heard about moments, Laplace transform, Fourier transform and other tools like that, but I never realized before their true power. Probably still today, most of their true power is hidden from me, but I am starting to look at them in a different way. Let me try to go over a few examples of different ways we can look at probability distributions and show cases where they are interesting.

Most ways of looking at probability distributions are associated with a multiplicative system: a multiplicative system is a set of real-valued functions with the property that if then . Those kinds of sets are powerful because of the Multiplicative Systems Theorem:

Theorem 1 (Multiplicative Systems Theorem) If is a multiplicative system, is a linear space containing (the constant function ) and is closed under bounded convergence, then implies that contains all bounded -measurable functions.

The theorem might look a bit cryptic if you are not familiar with the definitions, but it boils down to the following translation:

Theorem 2 (Translation of the Multiplicative Systems Theorem) If is a “general” multiplicative system, and are random variables such that for all then and have the same distribution.

where general excludes some troublesome cases like or all constant functions, for example. In technical terms, we want to be the Borel -algebra. But let’s not worry about those technical details and just look at the translated version. We now discuss several kinds of multiplicative systems:

The most common description of a random variable is by the cumulative distribution function . This is associated with ; notice that simply .
We can characterize a random variable by its moments: the variable is characterized by the set . Given the moments , the variable is totally characterized, i.e., if two variables have the same moments, then they have the same distribution by the Multiplicative Systems Theorem. This description is associated with the system
Moment Generating Function: If is a variable that assumes only integer values, we can describe it as , where . An interesting way of representing those probabilities is as the moment generating function . This is associated with the multiplicative system . Now suppose we are given two discrete independent variables and . What do we know about ? It is easy to know its expectation, its variance, … but what about more complicated things? What is the distribution of ? Moment generating functions answer this question very easily, since:

If we know moment generating functions, we can calculate expectations very easily, since . For example, suppose we have a process like this: there is one bacterium at time . In each timestep, this bacterium either dies (with probability ), stays alive without reproducing (with probability ) or has offspring (with probability ). In that case . At each timestep, the same happens independently to each of the bacteria alive at that moment. The question is: what is the expected number of bacteria at time ?

It looks like a complicated problem with just elementary tools, but it is a simple problem if we have moment generating functions. Just let be the variable associated with the bacteria of time . It is zero if it dies, if it stays the same and if it has offspring. Let also be the number of bacteria in time . We want to know . First, see that:

Now, let’s write that in terms of moment generating functions:

which is just:

since the variables are all independent and identically distributed. Now, notice that:

by the definition of moment generating function, so we effectively proved that:

We proved that is just iterated times. Now, calculating the expectation is easy, using the fact that and . Just see that: . Then, clearly . Using similar techniques we can prove a lot more things about this process, just by analyzing the behavior of the moment generating function.
Laplace Transform: Now, moving to continuous variables, if is a continuous non-negative variable we can define its Laplace transform as: , where stands for the distribution of , for example, . This is associated with the multiplicative system . Again, by the Multiplicative Systems Theorem, if , then the two variables have the same distribution. The Laplace transform has the same nice properties as the Moment Generating Function, for example, . And it allows us to do tricks similar to the one I just showed for Moment Generating Functions. One common trick that is used, for example, in the proof of Chernoff bounds is, given independent non-negative random variables:
$$\displaystyle P\left\{\sum_i X_i > u\right\} = P\left\{e^{\sum_i X_i} > e^u\right\} \leq \frac{\mathop{\mathbb E}[e^{\sum_i X_i} ]}{e^u} = \frac{\prod_i \mathop{\mathbb E}[e^{X_i} ]}{e^u}$$

where we also used Markov’s inequality: . Passing to the Laplace transform is the main ingredient in the Chernoff bound and it allows us to sort of “decouple” the random variables in the sum. There are several other cases where the Laplace transform proves itself very useful and turns things that looked very complicated when we saw them in undergrad courses into simple and clear things. One clear example of that is the motivation for the Poisson random variable:

If are independent exponentially distributed random variables with mean , then . An elementary calculation shows that its Laplace transform is . Let , i.e., the time of the arrival. We want to know what the distribution of is. How to do that?

Now, we need to find such that . Now it is just a matter of solving this equation and we get: . Now, the Poisson variable measures the number of arrivals in and therefore:

$$\displaystyle \begin{aligned} P\{N_t = n\} & = P\{S_{n-1} < t < S_n\} = P\{S_n > t\} - P\{S_{n-1} \geq t\} \\ & = \int_t^\infty \rho_{S_n}(t) dt - \int_t^\infty \rho_{S_{n-1}}(t) dt = \frac{(\lambda t)^n}{n!} e^{-\lambda t} \end{aligned}$$
Characteristic Function or Fourier Transform: Taking we get the Fourier Transform: which also has some of the nice properties of the previous ones and some additional ones. The characteristic functions were the main actors in the development of all the probability techniques that led to the main result of 19th century Probability Theory: the Central Limit Theorem. We know that moment generating functions and Laplace transforms completely characterize the distributions, but it is not clear how to recover a distribution once we have a transform. For the Fourier Transform there is a clean and simple way of doing that by means of the Inversion Formula:

One fact that always puzzled me was: why is the normal distribution so important? What is so special about it that makes it the limiting distribution in the Central Limit Theorem, i.e., if is a sequence of independent random variables, then under some natural conditions on the variables? The reason the normal is so special is that it is a “fixed point” for the Fourier Transform. We can see that . And there we have something special about it that makes me believe the Central Limit Theorem.

————————-

This blog post was based on lectures by Professor Dynkin at Cornell.

Random Spanning Trees

renatoppl — Wed, 04 Nov 2009 04:51:59 +0000

BigRedBits is again pleased to have Igor Gorodezky as a guest blogger directly from UCLA. I leave you with his excellent post on Wilson’s algorithm.

——————————————

Igor again, with another mathematical dispatch from UCLA, where I’m spending the semester eating and breathing combinatorics as part of the 2009 program on combinatorics and its applications at IPAM. In the course of some reading related to a problem with which I’ve been occupying myself, I ran across a neat algorithmic result – Wilson’s algorithm for uniformly generating spanning trees of a graph. With Renato’s kind permission, let me once again make myself at home here at Big Red Bits and tell you all about this little gem.

The problem is straightforward, and I’ve essentially already stated it: given an undirected, connected graph , we want an algorithm that outputs uniformly random spanning trees of . In the early ’90s, Aldous and Broder independently discovered an algorithm for accomplishing this task. This algorithm generates a tree by, roughly speaking, performing a random walk on and adding the edge to every time that the walk steps from to and is a vertex that has not been seen before.
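The Aldous and Broder walk described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the graph is a dictionary mapping each vertex to a list of neighbours, and the function name `aldous_broder` and the small example graph are hypothetical.

```python
import random


def aldous_broder(graph, rng=None):
    """Sample a spanning tree of a connected undirected graph.

    `graph` maps each vertex to a list of its neighbours. Returns a set of
    edges (u, v), one for each first visit of v when stepping from u.
    """
    rng = rng or random.Random()
    current = rng.choice(sorted(graph))
    visited = {current}
    tree = set()
    while len(visited) < len(graph):
        nxt = rng.choice(graph[current])
        if nxt not in visited:  # first time the walk sees nxt
            visited.add(nxt)
            tree.add((current, nxt))
        current = nxt
    return tree


# A 4-cycle 0-1-2-3 with the extra chord 1-3.
graph = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(aldous_broder(graph, random.Random(0)))
```

The loop only stops once every vertex has been visited, which is why the running time of this approach is tied to the cover time of the walk.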

Wilson’s algorithm (D. B. Wilson, “Generating random spanning trees more quickly than the cover time,” STOC ’96) takes a slightly different approach. Let us fix a root vertex . Wilson’s algorithm can be stated as a loop-erased random walk on as follows.

Algorithm 1 (Loop-erased random walk) Maintain a tree , initialized to consist of alone. While there remains a vertex not in : perform a random walk starting at , erasing loops as they are created, until the walk encounters a vertex in , then add to the cycle-erased simple path from to .
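A minimal Python sketch of Algorithm 1, assuming the graph is given as a dictionary from vertices to neighbour lists. The function name `wilson` and the trick of erasing loops by overwriting the last exit from each vertex are implementation choices made here, not something stated in the post.

```python
import random


def wilson(graph, root, rng=None):
    """Loop-erased random walk sampler.

    Returns a dictionary {vertex: successor} describing a spanning tree
    oriented towards `root`.
    """
    rng = rng or random.Random()
    parent = {}
    in_tree = {root}
    for start in graph:
        if start in in_tree:
            continue
        # Random walk from start. Overwriting nxt[v] on a revisit keeps only
        # the last exit from v, which is exactly what erasing loops does.
        nxt = {}
        v = start
        while v not in in_tree:
            nxt[v] = rng.choice(graph[v])
            v = nxt[v]
        # Retrace the loop-erased path and attach it to the tree.
        v = start
        while v not in in_tree:
            parent[v] = nxt[v]
            in_tree.add(v)
            v = nxt[v]
    return parent


graph = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(wilson(graph, root=0, rng=random.Random(0)))
```

Running the sampler many times on a small graph and counting how often each tree appears is a simple empirical check of the uniformity claim proved below.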

We observe that the algorithm halts with probability 1 (its expected running time is actually polynomial, but let’s not concern ourselves with these issues here), and outputs a random directed spanning tree oriented towards . It is a minor miracle that this tree is in fact sampled uniformly from the set of all such trees. Let us note that this offers a solution to the original problem, as sampling randomly and then running the algorithm will produce a uniformly generated spanning tree of .

It remains, then, to prove that the algorithm produces uniform spanning trees rooted at (by which we mean directed spanning trees oriented towards ). To this we dedicate the remainder of this post.

1. A “different” algorithm

Wilson’s proof is delightfully sneaky: we begin by stating and analyzing a seemingly different algorithm, the cycle-popping algorithm. We will prove that this algorithm has the desired properties, and then argue that it is equivalent to the loop-erased random walk (henceforth LERW).

The cycle-popping algorithm works as follows. Given and , associate with each non-root vertex an infinite stack of neighbors. More formally, to each we associate

where each is uniformly (and independently) sampled from the set of neighbors of . Note that each stack is not a random walk, just a list of neighbors. We refer to the left-most element above as the top of , and by popping the stack we mean removing this top vertex from .

Define the stack graph to be the directed graph on that has an edge from to if is at the top of the stack . Clearly, if has vertices then is an oriented subgraph of with edges. The following lemma follows immediately.

Lemma 1 Either is a directed spanning tree oriented towards or it contains a directed cycle.

If there is a directed cycle in we may pop it by popping for every . This eliminates , but of course might create other directed cycles. Without resolving this tension quite yet, let us go ahead and formally state the cycle-popping algorithm.

Algorithm 2 (Cycle-popping algorithm) Create a stack for every . While contains any directed cycles, pop a cycle from the stacks. If this process ever terminates, output .

Note that by the lemma, if the algorithm ever terminates then its output is a spanning tree rooted at . We claim that the algorithm terminates with probability 1, and moreover generates spanning trees rooted at uniformly.

To this end, some more definitions: let us say that given a stack , the vertex is at level . The level of a vertex in a stack is static, and is defined when the stack is created. That is, the level of does not change even if advances to the top of the stack as a result of the stack getting popped.

We regard the sequence of stack graphs produced by the algorithm as leveled stack graphs: each non-root vertex is assigned the level of its stack. Observe that the level of in is the number of times that has been popped. In the same way, we regard cycles encountered by the algorithm as leveled cycles, and we can regard the tree produced by the algorithm (if indeed one is produced) as a leveled tree.

The analysis of the algorithm relies on the following key lemma (Theorem 4 in Wilson’s paper), which tells us that the order in which the algorithm pops cycles is irrelevant.

Lemma 2 For a given set of stacks, either the cycle-popping algorithm never terminates, or there exists a unique leveled spanning tree rooted at such that the algorithm outputs irrespective of the order in which cycles are popped.

Proof: Fix a set of stacks . Consider a leveled cycle that is pop-able, i.e. there exist leveled cycles that can be popped in sequence. We claim that if the algorithm pops any cycle not equal to , then there still must exist a series of cycles that ends in and that can be popped in sequence. In other words, if is pop-able then it remains pop-able, no matter which cycles are popped, until itself is actually popped.

Let be a cycle popped by the algorithm. If then the claim is clearly true. Also, if shares no vertices with , then the claim is true again. So assume otherwise, and let be the first in the series to share a vertex with . Let us show that by contradiction.

If , then and must share a vertex that has different successors in and . But by definition of , none of the contain , and this implies that has the same level in and . Therefore its successor in both cycles is the same, a contradiction. This proves .

Moreover, the argument above proves that and are equal as leveled cycles (i.e. every vertex has the same level in both cycles). Hence

is a series of cycles that can be popped in sequence, which proves the original claim about .

We conclude that given a set of stacks, either there is an infinite number of pop-able cycles, in which case there will always be an infinite number and the algorithm will never terminate, or there is a finite number of such cycles. In the latter case, every one of these cycles is eventually popped, and the algorithm produces a spanning tree rooted at . The level of each non-root vertex in is given by (one plus) the number of popped cycles that contained .

Wilson summarizes the cycle-popping algorithm thusly: “[T]he stacks uniquely define a tree together with a partially ordered set of cycles layered on top of it. The algorithm peels off these cycles to find the tree.”
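The order-independence in Lemma 2 can be observed experimentally. The sketch below is a hedged illustration: it fixes the stacks through a seed (one private generator per vertex, revealed lazily), pops cycles in two different orders, and compares the results. The names `cycle_popping` and `scan_order`, the integer vertex labels, and the seeding scheme are all assumptions made for this example; the code counts pops starting from zero rather than using the level numbering of the post.

```python
import random


def cycle_popping(graph, root, seed, scan_order):
    """Run cycle popping on the stacks determined by `seed`.

    `scan_order` only controls which cycle is popped first when several
    exist. Returns (tree, pops): the successor of every non-root vertex and
    the number of times each stack was popped.
    """
    # One private generator per vertex, so the i-th entry of a stack does
    # not depend on the order in which cycles are popped.
    rngs = {v: random.Random(seed * 10007 + v) for v in graph if v != root}
    stacks = {v: [] for v in rngs}  # entries revealed so far
    pops = {v: 0 for v in rngs}

    def top(v):
        while len(stacks[v]) <= pops[v]:
            stacks[v].append(rngs[v].choice(graph[v]))
        return stacks[v][pops[v]]

    def find_cycle():
        for start in scan_order:
            path, position, v = [], {}, start
            while v != root and v not in position:
                position[v] = len(path)
                path.append(v)
                v = top(v)
            if v != root:
                return path[position[v]:]
        return None

    cycle = find_cycle()
    while cycle is not None:
        for v in cycle:
            pops[v] += 1
        cycle = find_cycle()
    return {v: top(v) for v in rngs}, pops


graph = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
first = cycle_popping(graph, root=0, seed=1, scan_order=[1, 2, 3])
second = cycle_popping(graph, root=0, seed=1, scan_order=[3, 2, 1])
print(first == second)
```

Lemma 2 predicts that both runs return the same tree and the same pop counts, since they peel the same set of leveled cycles off the same stacks.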

Theorem 3 The cycle-popping algorithm terminates with probability 1, and the tree that it outputs is a uniformly sampled spanning tree rooted at .

Proof: The first claim is easy: has a spanning tree, therefore it has a directed spanning tree oriented towards . The stacks generated in the first step of the algorithm will contain such a tree, and hence the algorithm will terminate, with probability 1.

Now, consider a spanning tree rooted at . We’ll abuse notation and let be the event that is produced by the algorithm. Similarly, given a collection of leveled cycles , we will write for the event that is the set of leveled cycles popped by the algorithm before it terminates. Finally, let be the event that the algorithm popped the leveled cycles in and terminated, with the resulting leveled tree being equal to .

By the independence of the stack entries, we have , where is the probability that the algorithm’s output is a leveled version of , a quantity which a moment’s reflection will reveal is independent of . Now,

which, as desired, is independent of .

2. Conclusion

We have shown that the cycle-popping algorithm generates spanning trees rooted at uniformly. It remains to observe that the LERW algorithm is nothing more than an implementation of the cycle-popping algorithm! Instead of initially generating the (infinitely long) stacks and then looking for cycles to pop, the LERW generates stack elements as necessary via random walk (computer scientists might recognize this as the Principle of Deferred Decisions). If the LERW encounters a loop, then it has found a cycle in the stack graph induced by the stacks that the LERW has been generating. Erasing the loop is equivalent to popping this cycle. We conclude that the LERW algorithm generates spanning trees rooted at uniformly.

More about hats and auctions

renatoppl — Thu, 29 Oct 2009 05:41:52 +0000

In my last post about hats, I said I would soon post another version with some more problems and talk a bit more about those kinds of problems. I ended up not doing it, but here are a few nice problems:

Those people are again in a room, each with a hat which is either black or white (picked with probability at random), and they can see the color of the other people’s hats but they can’t see their own color. They write on a piece of paper either “BLACK” or “WHITE”. The whole team wins if all of them get their colors right. The whole team loses if at least one writes the wrong color. Before entering the room and getting the hats, they can strategize. What is a strategy that makes them win with probability?

If they all choose their colors at random, the probability of winning is very small: . So we should try to correlate them somehow. The solution is again related to error correcting codes. We can think of the hats as a string of bits. How to correct one bit if it is lost? The simple engineering solution is to add a parity check. We append to the string a bit . So, if bit is lost, we know it is . We can use this idea to solve the puzzle above: if hats are placed with probability, the parity check will be with probability and with probability . They can decide beforehand that everyone will use and with probability they are right and everyone gets his hat color right. Now, let’s extend this problem in some ways:

The same problem, but there are hat colors, they are chosen independently with probability and they win if everyone gets his color right. Find a strategy that wins with probability .

There are again hat colors, they are chosen independently with probability and they win if at least a fraction () of the people guesses the right color. Find a strategy that wins with probability .

Back to the problem where we just have BLACK and WHITE colors, which are chosen with probability and where everyone needs to find the right color to win: can you prove that is the best one can do? And what about the two other problems above?

The first two use variations of the parity check idea in the solution. For the third, given any strategy of the players, for each string they have probability . Therefore the total probability of winning is . Let , i.e., the same input but with the bit flipped. Notice that the answer of player is the same (or at least has the same probabilities) in both and , since he can’t distinguish between and . Therefore, . So,

. This way, no strategy can have more than probability of winning.
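The parity-check strategy, and its natural extension to several colours by working with sums modulo the number of colours, can be checked by brute force. The sketch below enumerates every hat assignment and counts how often all players are right; the function name `win_probability` and the parameter `target` (the residue the team agrees on beforehand) are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def win_probability(n, k, target=0):
    """Exact win probability of the sum-mod-k strategy with n players and k colours."""
    wins = 0
    for hats in product(range(k), repeat=n):
        total = sum(hats)
        # Player i sees total - hats[i] and guesses the colour that would
        # make the overall sum congruent to `target` modulo k.
        guesses = [(target - (total - hats[i])) % k for i in range(n)]
        wins += all(g == h for g, h in zip(guesses, hats))
    return Fraction(wins, k ** n)


print(win_probability(4, 2))  # two colours: the parity check
print(win_probability(3, 3))  # three colours
```

With this strategy either everyone is right or everyone is wrong, depending only on whether the true sum of all hats matches the agreed residue.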

Another variation of it:

Suppose now we have two colors BLACK and WHITE and the hats are drawn from one distribution , i.e., we have a probability distribution over and we draw the colors from that distribution. Notice that the hats may now be correlated. How to win again with probability (to win, everyone needs the right answer)?

I like those hat problems a lot. A friend of mine just pointed out to me that there is a very nice paper by Bobby Kleinberg generalizing several aspects of hat problems, for example, when players have limited visibility of other players’ hats.

I became interested in this sort of problem after reading the Derandomization of Auctions paper. Hat guessing games are not just a good model for error correcting codes, but they are also a good model for truthful auctions. Consider an auction with a set of single-parameter agents, i.e., an auction where each player gives one bid indicating how much he is willing to pay to win. We have a set of constraints: of all feasible allocations. Based on the bids we choose an allocation and we charge payments to the bidders. An example of a problem like this is the Digital Goods Auction, where .

In this blog post, I discussed the concept of truthful auction. If an auction is randomized, a universal truthful auction is an auction that is truthful even if all the random bits in the mechanism are revealed to the bidders. Consider the Digital Goods Auction. We can characterize universal truthful digital goods auctions as bid-independent auctions. A bid-independent auction is given by a function , which associates with each a random variable . In that auction, we offer the service to player at price . If we allocate to and charge him . Otherwise, we don’t allocate and we charge nothing.

It is not hard to see that all universal truthful mechanisms are like that: if is the probability that player gets the item bidding , let be a uniform random variable on and define . Notice that here , but we are inverting with respect to . It is a simple exercise to prove that.

With this characterization, universal truthful auctions suddenly look very much like hat guessing games: we need to design a function that looks at everyone else’s bid but not at our own and, in some sense, “guesses” what we probably have and with that calculates the price we offer. It would be great to be able to design a function that returns . That is unfortunately impossible. But how to approximate nicely? Some papers, like Derandomization of Auctions and Competitiveness via Consensus, use this idea.

Cayley-Hamilton Theorem and Jordan Canonical Form

renatoppl — Thu, 29 Oct 2009 04:17:15 +0000

I was discussing the Cayley-Hamilton Theorem last week with my officemates Hu Fu and Ashwin. The theorem is the following: given an matrix we can define its characteristic polynomial by . The Cayley-Hamilton Theorem says that . The polynomial is something like:

so we can just see it as a formal polynomial and think of:

which is an matrix. The theorem says it is the zero matrix. We thought for a while and looked on Wikipedia, where there were a few proofs, but not the one-line proof I was looking for. Later, I got this proof that I sent to Hu Fu:

Write the matrix in the basis of its eigenvectors, then we can write where is the diagonal matrix with the eigenvalues in the main diagonal.

and since we have . Now, it is simple to see that:

and therefore:

And that was the one-line proof. One even simpler proof is: let be the eigenvectors, then , so must be since it returns zero for all elements of a basis. Well, I sent that to Hu Fu and he told me the proof had a bug. Not really a bug, but I was proving only for symmetric matrices. More generally, I was proving for diagonalizable matrices. He showed me, for example, the matrix:

which has only one eigenvalue and the eigenvectors are all of the form for . So, the dimension of the space spanned by the eigenvectors is , less than the dimension of the matrix. This never happens for symmetric matrices, and I guess after some time as a computer scientist, I got used to working only with symmetric matrices for almost everything I use: metrics, quadratic forms, correlation matrices, … but there is more out there than only symmetric matrices. The good news is that this proof is not hard to fix for the general case.

First, it is easy to prove that for each root of the characteristic polynomial there is at least one eigenvector associated with it (just see that and therefore there must be ). So if all the roots are distinct, then there is a basis of eigenvectors, and therefore the matrix is diagonalizable (notice that we may need to use complex eigenvalues, but that is ok). The good thing is that a matrix having two identical eigenvalues is a “coincidence”. We can identify matrices with . The matrices with repeated eigenvalues form a zero-measure subset of ; they are in fact the roots of a polynomial in . This polynomial is the resultant polynomial . Therefore, we proved the Cayley-Hamilton theorem in the complement of a zero-measure set in . Since this complement is dense and is a continuous function, the identity extends to all matrices .

We can also interpret that probabilistically: get a matrix where is taken uniformly at random from . Then has with probability all different eigenvalues. So, with probability . Now, just make .

Ok, this proves the Theorem for real and complex matrices, but what about a matrix defined over a general field, where we can’t use those continuity arguments? A way to get around it is by using the Jordan Canonical Form, which is a generalization of eigenvector decomposition. Not all matrices have an eigenvector decomposition, but all matrices over an algebraically closed field can be written in Jordan Canonical Form. Given any there is a matrix so that:

where are blocks of the form:

By the same argument as above, we just need to prove Cayley-Hamilton for each block separately. So we need to prove that . If the block has size , then it is exactly the proof above. If the block is bigger, then we need to look at what looks like. By inspection:

Typically, for we have in each row, starting in column the sequence , i.e., . So, we have

If block has size , then has multiplicity in and therefore and therefore, as we wanted to prove.
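The theorem is easy to test on concrete matrices, including non-diagonalizable ones. The sketch below computes the characteristic polynomial with the Faddeev-LeVerrier recurrence in exact rational arithmetic and then evaluates it at the matrix itself. This is a swapped-in technique for checking the statement, not the proof given here, and it uses the convention det(xI - A), which differs from det(A - xI) only by a sign that does not affect whether the result is zero. The 2x2 Jordan block used as an example and all function names are assumptions for illustration.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(a):
    """Coefficients c[0..n] of det(x*I - A) = sum_k c[k] x^k (Faddeev-LeVerrier)."""
    n = len(a)
    c = [Fraction(0)] * n + [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = mat_mul(a, m)
        for i in range(n):
            m[i][i] += c[n - k + 1]
        am = mat_mul(a, m)
        c[n - k] = -sum(am[i][i] for i in range(n)) / k
    return c


def evaluate_at_matrix(c, a):
    """Evaluate sum_k c[k] A^k by Horner's rule."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += coeff
    return result


# A Jordan block: one eigenvalue, eigenvectors spanning only one dimension.
jordan = [[1, 1], [0, 1]]
coeffs = char_poly(jordan)
print(coeffs)
print(evaluate_at_matrix(coeffs, jordan))
```

For the Jordan block the coefficients correspond to x^2 - 2x + 1, and evaluating that polynomial at the block gives the zero matrix even though the block has no basis of eigenvectors.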

It turned out not to be a very short proof, but it is still short, since it uses mostly elementary stuff and the proof is really intuitive in some sense. I took some lessons from that: (i) it reinforces my idea that, if I need to say something about a matrix, the first thing to do is to look at its eigenvector decomposition. A lot of Linear Algebra problems are very simple when we consider things in the right basis. Normally the right basis is the eigenvector basis. (ii) Not all matrices are diagonalizable. But in those cases, the Jordan Canonical Form comes to our help and we can do almost the same as we did with the eigenvalue decomposition.

Hats, codes and puzzles

renatoppl — Sat, 03 Oct 2009 22:51:16 +0000

When I was a child someone told me the following problem:

A king promised to marry his daughter to the most intelligent man. Three princes came to claim her hand and he tried the following logic experiment with them: The princes are gathered into a room and seated in a line, one behind the other, and are shown 2 black hats and 3 white hats. They are blindfolded, and 1 hat is placed on each of their heads, with the remaining hats hidden in a different room. The first one to deduce his hat color will marry the princess. If some prince claims his hat color incorrectly he dies.

The prince who is seated at the back removes his blindfold, sees the two hats in front of him and says nothing. Then the prince in the middle removes his blindfold and can see the hat of the prince in front of him. He also says nothing. Noticing the other princes said nothing, the prince seated in front, without even removing his blindfold, gives the correct answer. The question is: what is the color he said?

This is a simple logical puzzle: we just write all the possibilities and start ruling them out given that the other princes didn’t answer and in the end we can find the color of his hat. I remember that this puzzle surprised me a lot as a kid. I found it extremely cool back then, which made me want to read books about logic problems. After that, I had a lot of fun reading the books by Raymond Smullyan. I usually would read a problem, think something like “there can’t be a solution to that”, then go to school with the problem in mind and spend the day thinking about it. Here is a problem I liked a lot:

There is one prisoner and there are two doors: each has one guardian. One door leads to an exit and one door leads to death. The prisoner can choose one door to open. One guardian speaks only the truth and one guardian always lies. But you don’t know which door is which, which guardian is which and who guards each door. You are allowed to choose one guardian and ask him one Yes/No question, and then you need to choose a door. What is the right question to ask?

But my goal is not to talk about logic puzzles, but about Hat problems. There are a lot of variations of the problems above: in all of them a person is allowed to see the other hats but not his own hat and we need to “guess” which is the color of our hat. If we think carefully, we will see that this is a very general kind of problem in computer science: (i) the whole goal of learning theory is to predict one thing from a lot of other things you observe; (ii) in error correcting code, we should guess one bit from all the others, or from some subset of the others; (iii) in universal truthful mechanisms, we need to make a price offer to one player that just depends on all other players bids. I’ll come back to this example in a later post, since it is what made me interested in those kinds of problems, but for now, let’s look at one puzzle I was told about by David Malec at EC’09:

There are black and white hats and 3 people: for each of them we choose one color independently at random with probability $\frac{1}{2}$. Now, they can look at each other’s hats but not at their own hat. Then they need to write on a piece of paper either “PASS” or one color. If all pass or if someone writes a wrong color, the whole team loses (this is a team game), and if at least one person gets the color right and no one gets it wrong, the whole team wins. Create a strategy for the team to win with probability $\frac{3}{4}$.

To win with probability $\frac{1}{2}$ is easy: one person always writes “BLACK” and the others “PASS”. A better strategy is the following: if a person sees two hats of equal color, he writes the opposite color; otherwise, he passes. It is easy to see that the team wins except in the case where all hats are the same color, which happens with probability $\frac{1}{4}$. We would like to extend this to a more general setting:

There are black and white hats and $n = 2^k - 1$ people: for each of them we choose one color independently at random with probability $\frac{1}{2}$. Now, they can look at each other’s hats but not at their own hat. Then they need to write on a piece of paper either “PASS” or one color. If all pass or if someone writes a wrong color, the whole team loses (this is a team game), and if at least one person gets the color right and no one gets it wrong, the whole team wins. Create a strategy for the team to win with probability $\frac{n}{n+1}$.

It is a tricky question how to extend the above solution to that case. A detailed solution can be found here. The idea is quite ingenious, so I’ll sketch it here. It involves error correcting codes, in this case the Hamming code. Let with sum and product modulo . Let be the non-zero vectors of and the following linear map:

Let be the kernel of that map. Then, it is not hard to see that is a partition of and also that, because of that fact, for each either or there exists a unique s.t. . This gives an algorithm for just one player to guess his correct color. Let be the color vector of the hats. Player sees this vector as:

which can be or . The strategy is: if either one of those vectors is in , write the color corresponding to the other vector. If both are out of , pass. The team wins iff , which happens with probability. It is an easy and fun exercise to prove those facts. Or you can refer to the solution I just wrote.
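The Hamming-code strategy sketched above can be verified by enumerating all hat assignments. In this hedged Python sketch, player i (counted from 1) is identified with the binary representation of i, which plays the role of the i-th non-zero vector, and a hat vector is in the kernel when the XOR of the positions holding a 1 is zero. The function names and the encoding of colours as 0 and 1 are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def syndrome(hats):
    """XOR of the 1-based positions holding a 1: the Hamming-code parity check."""
    s = 0
    for position, bit in enumerate(hats, start=1):
        if bit:
            s ^= position
    return s


def play(hats):
    """Return each player's answer: 0, 1, or None for PASS."""
    answers = []
    for i in range(len(hats)):
        answer = None
        for guess in (0, 1):
            candidate = hats[:i] + (guess,) + hats[i + 1:]
            if syndrome(candidate) == 0:
                answer = 1 - guess  # avoid the completion that is a codeword
        answers.append(answer)
    return answers


def win_probability(k):
    """Exact win probability for n = 2**k - 1 players."""
    n = 2 ** k - 1
    wins = 0
    for hats in product((0, 1), repeat=n):
        spoke = [(a, h) for a, h in zip(play(hats), hats) if a is not None]
        wins += bool(spoke) and all(a == h for a, h in spoke)
    return Fraction(wins, 2 ** n)


print(win_probability(2))  # 3 players
print(win_probability(3))  # 7 players
```

For 3 players this reduces to the rule from the smaller puzzle: the only codewords are the two constant vectors, so a player speaks exactly when the two hats he sees are equal.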

Now, we can complicate it a bit more: we can add other colors and other distributions. But I wanted to move to a different problem: the paper Derandomization of Auctions showed me an impressive thing: we can use coding theory to derandomize algorithms. To illustrate their ideas, they propose the following problem:

Color guessing problem: There are people wearing hats of different colors. Each person can see everyone else’s hats but not his or her own. Each person needs to guess the color of his or her own hat. We want a deterministic guessing algorithm such that a fraction of each color class is guessed correctly.

The problem is very easy if we have a source of random bits. Each person guesses some color at random. It seems very complicated to do that without random bits. Surprisingly, we will solve that using a flow computation:

Let be an array of colors and the array with color removed. Consider the following flow network: nodes and (source and sink), nodes for each . There are such nodes. Consider also nodes of the form where is a color () and is a color vector. There are such nodes.

We have edges from to for all nodes of that kind. And we have edges from to . Now, if , i.e., if completed in the -th coordinate with generates , then add an edge from to .

Consider the following flow: add unit of flow from to and from split that flow in pieces of size and send each to for . Now, each node receives flow, where is the number of occurencies of in . Send all that flow to .

We can think of that flow as the guessing procedure. When we see we choose the guess independently at random and this way, each receives in expectation guesses . Notice that an integral flow in that graph represents a deterministic guessing procedure: so all we need is an integral flow so that the flow from to is . The flow received is from nodes of the type: and that means that bidder in , looking at the other hats will correctly choose , times.

Now, define the capacities this way: for all edges from to and from to have capacity and from to capacity . There is an integral flow that saturates all edges from to , because of the fractional flow showed. So, the solution gives us a deterministic decision procedure.

In the next blog post, I’ll try to show the result in the Derandomization of Auctions that relates that to competitive auctions.

Using expanders to prove sum-product inequalities in finite fields

renatoppl — Thu, 24 Sep 2009 03:47:28 +0000

I am happy to have the first guest post of BigRedBits written by Igor Gorodezky about an elegant and exciting result in combinatorics.

————————-

I’m fortunate enough to be spending this semester in beautiful Los Angeles as a core participant in the 2009 long program on combinatorics at IPAM (an NSF-funded math institute on UCLA’s campus). We’ve recently wrapped up the first part of the program, which consisted of tutorial lectures on various topics in combinatorics. There was an abundance of gorgeous mathematics, and with Renato’s kind permission I’d like to hijack his blog and write about what to me was one of the most memorable lectures.

This was a lecture by Jozsef Solymosi of the University of British Columbia describing some of his recent work on the sum-product problem in finite fields. In particular, he outlined a spectral-graph-theoretic proof of a recent sum-product inequality due to Garaev. Solymosi’s proof is an extremely clever and elegant application of spectral graph theory to a classical problem in combinatorial number theory, and so I thought I’d present it here. Before stating the result and giving Solymosi’s proof, let us begin with a very brief introduction to the sum-product problem.

1. Introduction

Given a finite set of real numbers $A$, define the sum set

$$A+A = \{a+b : a, b \in A\}$$

and the product set

$$A \cdot A = \{ab : a, b \in A\}.$$

Both the sum set and the product set clearly must have cardinality between $|A|$ and $|A|^2$. Observe that if $A$ is an arithmetic progression then $|A+A| \approx 2|A|$ while $|A \cdot A| \approx |A|^2$, while if $A$ is a geometric progression then $|A \cdot A| \approx 2|A|$ while $|A+A| \approx |A|^2$. Intuition suggests that keeping $|A+A|$ small by giving $A$ lots of additive structure inevitably blows up $|A \cdot A|$, while keeping $|A \cdot A|$ small by giving $A$ lots of multiplicative structure in turn blows up $|A+A|$. For an arbitrary $A$, one would expect at least one of these sets, if not both, to be fairly large.

Estimating the maximum of $|A+A|$ and $|A \cdot A|$ is the sum-product problem. It was posed in a paper by Erdos and Szemeredi, who proved the existence of a small constant $\epsilon$ such that

$$\max\{|A+A|, |A \cdot A|\} = \Omega(|A|^{1+\epsilon})$$

for any finite $A$. They conjecture that we actually have

$$\max\{|A+A|, |A \cdot A|\} = \Omega(|A|^{2-\delta})$$

for any $\delta > 0$ and sufficiently large $|A|$. In other words, the value of $\epsilon$ in their bound can be made arbitrarily close to 1.

Much ink has been spilled in attempts to push up the value of . At present, the best sum-product bound is due to Solymosi and gives . As an aside, I want to mention an extremely simple and elegant probabilistic proof of Elekes that gives ; it is detailed in Alon and Spencer’s classic text The Probabilistic Method, and is an absolute gem (look in the chapter on crossing numbers of graphs).
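The earlier observation about arithmetic and geometric progressions is easy to see on a small example. This is just a numerical illustration with sets of my own choosing (the integers $1, \ldots, 20$ and the powers $2^0, \ldots, 2^{19}$); it is not part of any proof.

```python
def sum_set(a):
    """A + A: all pairwise sums of elements of a."""
    return {x + y for x in a for y in a}

def product_set(a):
    """A . A: all pairwise products of elements of a."""
    return {x * y for x in a for y in a}

n = 20
arithmetic = set(range(1, n + 1))        # additive structure
geometric = {2 ** i for i in range(n)}   # multiplicative structure

# Sizes as (|A + A|, |A . A|) for each set.
sizes = {
    "arithmetic": (len(sum_set(arithmetic)), len(product_set(arithmetic))),
    "geometric": (len(sum_set(geometric)), len(product_set(geometric))),
}
```

The arithmetic progression has a sum set of size $2n - 1$ and a much larger product set; for the geometric progression the roles are reversed.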

2. The finite field variant

Solymosi’s IPAM lecture was not, however, on this original sum-product problem, but rather on its natural finite field analogue: if $A$ is a subset of $\mathbb{F}_p$, the finite field of prime order $p$, what can we say about the maximum of $|A+A|$ and $|A \cdot A|$? Observe that it is important to consider fields whose order is prime and not the power of a prime, for in the latter case we could take $A$ to be a subring and end up with the degenerate case $|A+A| = |A \cdot A| = |A|$.

Bourgain, Katz and Tao got the party started in 2004 by proving the following sum-product bound.

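Theorem 1 For all $\epsilon > 0$ there exists $\delta > 0$ such that if $A \subseteq \mathbb{F}_p$ with $p^{\epsilon} < |A| < p^{1-\epsilon}$ then

$$\max\{|A+A|, |A \cdot A|\} = \Omega(|A|^{1+\delta}).$$

We note that the implied constant also depends on $\epsilon$. The best known bound is the following, due to Garaev (2007).

Theorem 2 If $A \subseteq \mathbb{F}_p$ then

$$\max\{|A+A|, |A \cdot A|\} = \Omega\left(\min\left\{\sqrt{p|A|}, \frac{|A|^2}{\sqrt{p}}\right\}\right).$$

To illustrate the theorem, observe that if $|A| \approx p^{2/3}$ then $\max\{|A+A|, |A \cdot A|\} = \Omega(|A|^{5/4})$. It is this theorem that Solymosi proves using spectral graph theory (Garaev’s original proof went by way of Fourier analysis and the estimation of exponential sums).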

3. Solymosi’s proof

In this section we give Solymosi’s proof of Theorem 2 (we will assume familiarity with basic facts about eigenvalues and eigenvectors of adjacency matrices). Let us first establish some notation. Consider a $d$-regular graph $G$ and let the eigenvalues of its adjacency matrix be

$$d = \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n.$$

As usual, we define

$$\lambda(G) = \max\{\lambda_2, |\lambda_n|\}.$$

Recall that if $G$ is connected and non-bipartite then $\lambda(G)$ is strictly smaller than $d$, and such a graph is an expander if $d - \lambda(G)$ is bounded below by some constant.

We will make use of a fundamental result in spectral graph theory: the expander mixing lemma.

Lemma 3 Let $G$ be a $d$-regular graph with $n$ vertices. For all $S, T \subseteq V(G)$ we have

$$\left| e(S,T) - \frac{d}{n}|S||T| \right| \leq \lambda(G) \sqrt{|S||T|}$$

where $e(S,T)$ is the number of edges with one endpoint in $S$ and the other in $T$.

The proof of the lemma is straightforward; we omit it because of space considerations, but it can be found in the survey on expanders by Linial, Hoory, and Wigderson.

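Now, back to Theorem 2: we have a finite field $\mathbb{F}_p$ and a subset $A$. Solymosi proves the desired lower bound on $\max\{|A+A|, |A \cdot A|\}$ by constructing a sum-product graph $G_{SP}$ over $\mathbb{F}_p$ and using its spectral properties to reason about $|A+A|$ and $|A \cdot A|$. So without further ado, let’s define $G_{SP}$.

The vertex set of $G_{SP}$ is $\left(\mathbb{F}_p \setminus \{0\}\right) \times \mathbb{F}_p$, and two vertices $(a,b), (c,d)$ have an edge between them if $ac = b+d$. It is easy to see that $G_{SP}$ has $p(p-1)$ vertices and is $(p-1)$-regular (some edges are loops). We also have the following key fact.

Lemma 4 Consider $(a,b), (c,d) \in V(G_{SP})$. If $a = c$ or $b = d$ then these two vertices have no common neighbor, otherwise they have precisely one.

Proof: This follows from the fact that the unique solution of the system

$$ax = b + y, \qquad cx = d + y$$

is given by

$$x = \frac{b-d}{a-c}, \qquad y = \frac{bc - ad}{a-c}$$

which is only defined when $a \neq c$, and is a vertex of $G_{SP}$ only when $x \neq 0$, i.e., when $b \neq d$.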
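Lemma 4 can be checked exhaustively for small primes. The sketch below assumes the vertex set is $\left(\mathbb{F}_p \setminus \{0\}\right) \times \mathbb{F}_p$ with an edge between $(a,b)$ and $(c,d)$ when $ac = b + d$, as in Solymosi’s paper; the helper names are mine.

```python
from itertools import combinations

def neighbours(v, p):
    """Neighbours of v = (a, b): all (c, d) with c != 0 and a*c = b + d (mod p)."""
    a, b = v
    return {(c, (a * c - b) % p) for c in range(1, p)}

def check_lemma4(p):
    """Return True if the graph is (p-1)-regular and satisfies Lemma 4."""
    vertices = [(a, b) for a in range(1, p) for b in range(p)]
    nbrs = {v: neighbours(v, p) for v in vertices}
    if any(len(ns) != p - 1 for ns in nbrs.values()):
        return False
    for u, v in combinations(vertices, 2):
        common = len(nbrs[u] & nbrs[v])
        # No common neighbour when the first or the second coordinates agree.
        expected = 0 if (u[0] == v[0] or u[1] == v[1]) else 1
        if common != expected:
            return False
    return True
```

Calling `check_lemma4(p)` for a few small primes $p$ is a cheap sanity check of the counting argument, nothing more.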

Lemma 4 can be used to show that $G_{SP}$ is in fact a very good expander (those more familiar with the literature on expanders will recognize that moreover, $G_{SP}$ is almost a Ramanujan graph).

Lemma 5 $\lambda(G_{SP}) < \sqrt{3p}$.

Proof: Let $M$ be the adjacency matrix of $G_{SP}$. Recall that the $(u,v)$-entry of $M^2$ is the number of walks from $u$ to $v$ of length 2. If $u = v$, this number is the degree $p-1$, while if $u \neq v$, with $u = (a,b)$ and $v = (c,d)$, Lemma 4 tells us that this number is 1 if $a \neq c$ and $b \neq d$ and 0 otherwise. It follows that

$$M^2 = J + (p-2) I - E \qquad (1)$$

where $J$ is the all-1 matrix, $I$ is the identity matrix, and $E$ is the adjacency matrix of the graph $G_E$ whose vertex set is the same as $G_{SP}$, and in which two vertices $(a,b)$ and $(c,d)$ are connected by an edge if $a = c$ or $b = d$. It is easy to see that $G_E$ is a $(2p-3)$-regular graph.

Since $G_{SP}$ is regular and its adjacency matrix is symmetric, we know that the all-1 vector is an eigenvector of $M$ and all other eigenvectors are orthogonal to it. It is easy to check that $G_{SP}$ is connected and not bipartite, so that the eigenvalue $p-1$ has multiplicity 1, and for any other eigenvalue $\theta$ we have $|\theta| < p-1$.

Given such an eigenvalue $\theta$, let $v$ be a corresponding eigenvector. Then by equation (1),

$$\theta^2 v = (p-2) v - E v$$

since $Jv$ is the all-0 vector. Therefore $p - 2 - \theta^2$ is an eigenvalue of $E$.

Now, the degree of $G_E$ is an upper bound on the absolute value of every eigenvalue of $E$. It follows that

$$p - 2 - \theta^2 \geq -(2p - 3)$$

which implies $|\theta| < \sqrt{3p}$, as desired.

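So $G_{SP}$ is an expander; very good, but what about $A$? Solymosi introduces $A$ into the proof through very clever use of the expander mixing lemma: if we define $S, T \subseteq V(G_{SP})$ (assuming $0 \notin A$) by

$$S = (A \cdot A) \times (-A), \qquad T = A^{-1} \times (A+A)$$

then that lemma tells us that

$$e(S,T) \leq \frac{|S||T|}{p} + \lambda(G_{SP}) \sqrt{|S||T|} < \frac{|A \cdot A||A+A||A|^2}{p} + \sqrt{3p\,|A \cdot A||A+A|}\,|A|$$

where the second inequality used Lemma 5.

But for every $a, b, c \in A$ there is an edge between $(ab, -c) \in S$ and $(b^{-1}, a+c) \in T$, so that $e(S,T) \geq |A|^3$. Using this observation and rearranging the resulting inequality gives

$$\sqrt{|A \cdot A||A+A|} > \left( \frac{\sqrt{|A \cdot A||A+A|}}{p|A|}+\sqrt{3}\frac{p^{1/2}}{|A|^2} \right)^{-1}.$$

Now, since $(x+y)^{-1} \geq \frac{1}{2} \min\{x^{-1}, y^{-1}\}$ for positive $x$ and $y$, we find that

$$\sqrt{|A \cdot A||A+A|} > \frac{1}{2} \min\left\{ \frac{p|A|}{\sqrt{|A \cdot A||A+A|}}, \frac{|A|^2}{\sqrt{3p}} \right\}$$

which in turn implies

$$\sqrt{|A \cdot A||A+A|} = \Omega\left(\min\left\{\sqrt{p|A|}, \frac{|A|^2}{\sqrt{p}}\right\}\right).$$

To finish the proof, we need only cite the two-term AM-GM inequality:

$$\max\{|A+A|, |A \cdot A|\} \geq \frac{|A+A| + |A \cdot A|}{2} \geq \sqrt{|A \cdot A||A+A|}.$$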

4. A very terse bibliography

Solymosi’s proof is from his paper “Incidences and the spectra of graphs” (requires institutional access).

A more knowledgeable treatment of sum-product problems than I could ever provide can be found in these two entries from Terry Tao’s blog. In these, Tao provides a detailed introduction to the problem, gives the probabilistic proof of Elekes’ bound, discusses an interesting cryptographic application, and provides many references.

Igor Gorodezky

Competitive Auctions

renatoppl — Thu, 17 Sep 2009 19:18:38 +0000

This week I will present at the Theory Discussion Group about Competitive Auctions. It is mainly a series of results in papers by Jason Hartline, Andrew Goldberg, Anna Karlin, Amos Fiat, … The first paper is Competitive Auctions and Digital Goods and the second is Competitive Generalized Auctions. My objective is to begin with a short introduction about Mechanism Design, the concept of truthfulness and the characterization of Truthful Mechanisms for Single Parameter Agents. Then we describe the Random Sampling Auction for Digital Goods and in the end we discuss open questions. I thought writing a blog post was a good way of organizing my ideas for the talk.

1. Mechanism Design and Truthfulness

A mechanism is an algorithm augmented with economic incentives. They are usually applied in the following context: there is an algorithmic problem and the input is distributed among several agents that have some interest in the final outcome and therefore they may try to manipulate the algorithm. Today we restrict our attention to a specific class of mechanisms: those for single parameter agents. In that setting, there is a set $N$ consisting of $n$ agents and a service. Each agent $i$ has a value $v_i$ for receiving the service and $0$ otherwise. We can think of $v_i$ as the maximum player $i$ is willing to pay for that service. We call an environment $\mathcal{X} \subseteq 2^N$ the subsets of the bidders that can be simultaneously served. For example:

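Single item auction: $\mathcal{X} = \{S : |S| \leq 1\}$
Multi item auction: $\mathcal{X} = \{S : |S| \leq k\}$
Digital goods auction: $\mathcal{X} = 2^N$
Matroid auctions: $\mathcal{X}$ is a matroid on $N$
Path auctions: $N$ is the set of edges in a graph and $\mathcal{X}$ is the set of $s$-$t$-paths in the graph
Knapsack auctions: there is a size $s_i$ for each $i \in N$ and $S \in \mathcal{X}$ iff $\sum_{i \in S} s_i \leq C$ for a fixed $C$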

Most mechanism design problems focus on maximizing (or approximating) the social welfare, i.e., finding $S \in \mathcal{X}$ maximizing $\sum_{i \in S} v_i$. Our focus here will be maximizing the revenue of the auctioneer. Before we start searching for such a mechanism, we should first see which properties it is supposed to have, and maybe even before that, define what we mean by a mechanism. First, the agents report their valuations $b_i$ (which can be their true valuations or lies), then the mechanism decides on an allocation $S \in \mathcal{X}$ (in a possibly randomized way) and charges a payment $p_i$ to each allocated agent. The profit of the auctioneer is $\sum_{i \in S} p_i$ and the utility of a bidder is:

$$u_i = \begin{cases} v_i - p_i & \text{if } i \in S \\ 0 & \text{otherwise} \end{cases}$$
The agents will report valuations so as to maximize their final utility. We could either consider a general mechanism and calculate the profit/social welfare in the game induced by this mechanism or we could design an algorithm that gives incentives for the bidders to report their true valuation. The revelation principle says there is no loss of generality in considering only mechanisms of the second type. The intuition is: the mechanisms of the first type can be simulated by mechanisms of the second type. So, we restrict our attention to mechanisms of the second type, which we call truthful mechanisms. This definition is clear for deterministic mechanisms but not so clear for randomized mechanisms. There are two such definitions:

Universal Truthful mechanisms: distribution over deterministic truthful mechanisms, i.e., some coins are tossed and based on those coins, we choose a deterministic mechanism and run it. Even if the players knew the random coins, the mechanism would still be truthful.
Truthful in Expectation mechanisms: Let $u_i(b_i)$ be the utility of agent $i$ if he bids $b_i$. Since it is a randomized mechanism, this is a random variable. Truthful in expectation means that $E[u_i(v_i)] \geq E[u_i(b_i)]$ for all $b_i$.

Clearly all Universal Truthful mechanisms are Truthful in Expectation but the converse is not true. Now, before we proceed, we will redefine a mechanism in a more formal way so that it will be easier to reason about:

Definition 1 A mechanism is a function that associates to each $v \in \mathbb{R}_+^N$ a distribution over elements of $\mathcal{X}$.

Theorem 2 Let be the probability that is allocated by the mechanism given is reported. The mechanism is truthful iff is monotone and each allocated bidder is charged payment:

This is a classical theorem by Myerson about the characterization of truthful auctions. It is not hard to see that the auction defined above is truthful. We just need to check that for all . The opposite direction is trickier but is also not hard to see.

Note that this characterization implies the following characterization of deterministic truthful auctions, i.e., auctions that map each to a set , i.e., the probability distribution is concentrated in one set.

Theorem 3 A mechanism is a truthful deterministic auction iff there is a function $f_i(v_{-i})$ for each $i$ such that we allocate to bidder $i$ iff $v_i \geq f_i(v_{-i})$ and, in case it is allocated, we charge payment $f_i(v_{-i})$.

It is actually easy to generate this function. Given a mechanism, $x_i(v_i, v_{-i})$ is monotone in $v_i$ and is a $\{0,1\}$-function. Let $f_i(v_{-i})$ be the point where it transitions from $0$ to $1$. Now, we can give a similar characterization for Universal Truthful Mechanisms:

Theorem 4 A mechanism is a universal truthful randomized auction if there are functions $f_i(r, v_{-i})$ such that for each $i$ we allocate to bidder $i$ iff $v_i \geq f_i(r, v_{-i})$ and, in case it is allocated, we charge payment $f_i(r, v_{-i})$, where $r$ are random bits.

2. Profit benchmarks

Let’s consider a Digital Goods auction, where $\mathcal{X} = 2^N$. Two natural goals for profit extraction would be $T(v) = \sum_i v_i$ and $F(v) = \max_i i \cdot v_i$, where we can think of $v_1 \geq v_2 \geq \ldots \geq v_n$; the first is the best profit you can extract charging different prices and the second is the best profit you can hope to extract by charging a fixed price. Unfortunately it is impossible to design a mechanism that even $O(1)$-approximates both benchmarks on every input. The intuition is that $v_1$ can be much larger than the rest, so there is no way of setting $f_1$ in a proper way. Under the assumption that the first value is not much larger than the second, we can do a good profit approximation, though. This motivates us to find a universal truthful mechanism that approximates the following profit benchmark:

$$F^{(2)}(v) = \max_{i \geq 2} i \cdot v_i$$

which is the highest single-price profit we can get selling to at least $2$ agents. We will show a truthful mechanism that $4$-approximates this benchmark.

3. Profit Extractors

Profit extractors are building blocks of many mechanisms. The goal of a profit extractor is, given a constant target profit $C$, to extract that profit from a set of agents if that is possible. For now, let’s see $C$ as an exogenous constant. Consider the following mechanism called CostShare$_C$: find the largest $k$ s.t. the $k$ highest bidders all have $v_i \geq C/k$. Then allocate to those $k$ bidders and charge each of them $C/k$.

Lemma 5 CostShare$_C$ is a truthful profit-extractor that can extract profit $C$ whenever $F(v) \geq C$.

Proof: It is clear that it extracts profit exactly $C$ if $C \leq F(v)$, since in that case some $k$ has $k \cdot v_k \geq C$. We just need to prove it is a truthful mechanism and this can be done by checking the characterization of truthful mechanisms. Suppose that under CostShare$_C$ exactly $k$ bidders are getting the item, then let’s look at a bidder $i$. If bidder $i$ is not getting the item, then his value is smaller than $C/k$, otherwise we could include all bidders up to $i$ and sell for a price $C/k_1$ for some $k_1 > k$. It is easy to see that bidder $i$ will get the item if he changes his value to some value greater than or equal to $C/k$.

On the other hand, if he is currently getting the item under $v$, then increasing his value won’t change that. It is also clear that for any value $v_i \geq C/k$, he will still get the item. For $v_i < C/k$ he doesn’t get it. Suppose he did; then at least $k+1$ people get the item, because the price it is sold at must be less than $C/k$. Therefore, increasing $v_i$ back to its original value, we could still sell it to $k+1$ players, which is a contradiction, since we assumed we were selling to $k$ players.

We checked monotonicity and we also need to check the payments, but it is straightforward to check they satisfy the second condition, since $x_i(v_i, v_{-i}) = 1$ for $v_i \geq C/k$ and zero otherwise.
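To make the mechanism concrete, here is a minimal sketch of CostShare as I read it: serve the largest group of top bidders that can split the target evenly. The function name and the return convention (winner indices plus the common price, or an empty result when the target is out of reach) are my own choices.

```python
def cost_share(values, target):
    """CostShare_C: serve the largest k such that the k highest bidders can each pay target / k.

    Returns (winners, price): the sorted indices of the served bidders and the
    price each of them pays, or ([], 0.0) if the target cannot be extracted.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    for k in range(len(values), 0, -1):
        # The k-th highest value is the smallest among the top k bidders.
        if values[order[k - 1]] * k >= target:
            return sorted(order[:k]), target / k
    return [], 0.0
```

With values `[10, 6, 3, 1]` the best single-price revenue is $F(v) = 12$ (two bidders at price $6$), so a target of $12$ is extracted from the top two bidders while a target of $13$ is not extractable.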

4. Random Sampling Auctions

Now, using that profit extractor as a building block, the main idea is to estimate a target $C$ smaller than $F^{(2)}$ from one subset of the agents and extract that profit from the other subset using a profit extractor. First we partition $N$ in two sets $N'$ and $N''$, tossing a coin for each agent to decide in which set we will place it, then we calculate $F' = F(v_{N'})$ and $F'' = F(v_{N''})$. Now, we run CostShare$_{F'}(v_{N''})$ and CostShare$_{F''}(v_{N'})$. This is called Random Cost Sharing Auction.

Theorem 6 The Random Cost Sharing Auction is a truthful auction whose revenue $4$-approximates the benchmark $F^{(2)}$.
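The whole auction fits in a few lines. This is a sketch under my own conventions: bids are a dict from bidder to value, the random source is passed in explicitly, and a side whose target is zero sells nothing. None of the names below come from the papers.

```python
import random

def best_single_price_revenue(values):
    """F(v): best revenue from one posted price, the max over k of k * (k-th highest value)."""
    ranked = sorted(values, reverse=True)
    return max((k * v for k, v in enumerate(ranked, start=1)), default=0.0)

def cost_share(bids, target):
    """Profit extractor: returns {winner: price}, empty if the target is unreachable."""
    if target <= 0:
        return {}  # nothing to extract (assumption: sell nothing in this case)
    ranked = sorted(bids, key=bids.get, reverse=True)
    for k in range(len(ranked), 0, -1):
        if bids[ranked[k - 1]] * k >= target:
            return {bidder: target / k for bidder in ranked[:k]}
    return {}

def random_cost_sharing_auction(bids, rng):
    """Split bidders by fair coin flips; each side's target is the other side's F."""
    left, right = {}, {}
    for bidder, value in bids.items():
        (left if rng.random() < 0.5 else right)[bidder] = value
    f_left = best_single_price_revenue(left.values())
    f_right = best_single_price_revenue(right.values())
    outcome = cost_share(left, f_right)
    outcome.update(cost_share(right, f_left))
    return outcome
```

Since the price offered to a bidder depends only on the bids of the other side and on the coin flips, each fixed partition gives a deterministic truthful auction, which is what makes the randomized auction universally truthful.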

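Proof: Let $R$ be a random variable associated with the revenue of the Sampling Auction mechanism. It is clear that $R \geq \min\{F', F''\}$. Let’s write $F^{(2)} = kp$ meaning that we sell $k \geq 2$ items at price $p$. Let $k = k' + k''$ where $k'$ and $k''$ are the items among those $k$ items that went to $N'$ and $N''$ respectively. Then, clearly $F' \geq k'p$ and $F'' \geq k''p$, which gives us:

$$\frac{R}{F^{(2)}} \geq \frac{\min\{F', F''\}}{F^{(2)}} \geq \frac{\min\{k'p, k''p\}}{kp} = \frac{\min\{k', k''\}}{k}$$

and from there, it is a straightforward probability exercise:

$$E[R] \geq \frac{F^{(2)}}{4}$$

since:

$$E[\min\{k', k''\}] \geq \frac{k}{4} \quad \text{for } k \geq 2$$

and therefore:

$$\frac{E[R]}{F^{(2)}} \geq \frac{E[\min\{k', k''\}]}{k} \geq \frac{1}{4}$$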
Similar approximations can be obtained for more general environments with very little change. For example, for multi-unit auctions, where we use the benchmark and we can be -competitive against it, by random-sampling, evaluating on both sets and running a profit extractor on both. The profit extractor is a simple generalization of the previous one.

Minimum average cost cycle and TSP

renatoppl — Thu, 03 Sep 2009 04:01:34 +0000

After some time, I did Code Jam again – well, not again, this is the first time I do Code Jam, but it has been a while since I did programming competitions. Back in my undergrad I remember all the fun I had with my ACM-ICPC team solving problems and discussing algorithmic problems. Actually, ICPC was what made me interested in Algorithms and Theory of Computing for the first time. I was remembering that not only because of Code Jam but because I came across a nice problem whose solution I learned in programming competitions, specifically a technique I learned to solve this problem.

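Let’s formulate the problem in a more abstract way: Given a graph $G = (V,E)$ and two functions: a cost function $c: E \rightarrow \mathbb{R}_+$ and a positive benefit function $b: E \rightarrow \mathbb{R}_+$, we define the cost-benefit of a set of edges $S$ as $\frac{c(S)}{b(S)} = \frac{\sum_{e \in S} c_e}{\sum_{e \in S} b_e}$. Now, consider those two questions: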

Question I: Find the spanning tree of maximum (minimum) cost-benefit.

Question II: Find the cycle of maximum (minimum) cost-benefit.

The solution of those uses binary search. Suppose we can answer the following query: given $\beta > 0$, is there a cycle (spanning tree) of cost-benefit smaller than $\beta$? We either state there is no such cycle (tree) or exhibit one. How can we solve this? It is simple: consider the graph with edge weights given by $c_e - \beta b_e$. Then there is a cycle (spanning tree) of cost-benefit smaller than $\beta$ if and only if there is a cycle (spanning tree) in this graph with transformed weights with negative total weight. Finding a cycle with negative weight is easy and can be done, for example, using the Bellman-Ford algorithm. Finding a spanning tree with negative total weight can be done using any minimum spanning tree algorithm, such as Kruskal, Prim or Boruvka. For the maximization versions, ask for cost-benefit larger than $\beta$ and negate the weights.

Taking $b_e = 1$ for all $e \in E$ we can find, using binary search, the cycle with smallest average length, i.e., smallest $c(C)/|C|$ where $|C|$ is the number of edges in the cycle $C$.
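Here is a small sketch of the cycle version: binary search on $\beta$, with a Bellman-Ford negative cycle test on the weights $c_e - \beta b_e$. The edge format `(u, v, cost, benefit)`, the fixed number of bisection steps and the small tolerance are my own choices, and the graph is assumed to contain at least one cycle.

```python
def has_negative_cycle(n, edges):
    """Bellman-Ford started from a virtual source joined to every node (all distances 0)."""
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v] - 1e-12:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False  # distances settled: no negative cycle
    return True  # still relaxing after n rounds: some cycle has negative weight

def min_ratio_cycle(n, edges, iterations=60):
    """Approximate the smallest cost-benefit ratio of a cycle.

    edges: list of (u, v, cost, benefit) with benefit > 0 and nodes 0..n-1.
    """
    lo = min(c / b for _, _, c, b in edges)
    hi = max(c / b for _, _, c, b in edges)
    for _ in range(iterations):
        beta = (lo + hi) / 2
        shifted = [(u, v, c - beta * b) for u, v, c, b in edges]
        if has_negative_cycle(n, shifted):
            hi = beta  # some cycle has ratio below beta
        else:
            lo = beta  # every cycle has ratio at least beta
    return hi
```

With all benefits equal to $1$ this returns the minimum average edge cost of a cycle, which is exactly the subroutine needed for the TSP approximation below.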

Asymmetric Travelling Salesman Problem

We can use the trick just described to design an $O(\log n)$-approximation to the asymmetric TSP problem. Suppose we have $n$ nodes in $V$ and a function $c: V \times V \rightarrow \mathbb{R}_+$, not necessarily symmetric, such that the triangle inequality holds, i.e., $c(i,j) \leq c(i,k) + c(k,j)$. A TSP tour is an ordering $\pi: \{1, \ldots, n\} \rightarrow V$ and has total cost:

$$c(\pi) = \sum_{i=1}^{n} c(\pi(i), \pi(i+1))$$

where $\pi(n+1) = \pi(1)$. Let OPT be the cost of the optimal tour. It is NP-hard to calculate the optimal tour, but consider the following approximation algorithm: find the cycle with smallest average cost. Then remove all the nodes in that cycle except one, in the remaining graph find again the cycle of smallest average cost and remove all nodes except one. Continue doing that until there is just one node left. Taking all those cycles together, we have a strongly connected Eulerian graph (in-degree equals out-degree for each node). I claim that the total weight of edges in this Eulerian graph is at most:

$$2 H_n \cdot OPT$$

where $H_n = \sum_{j=1}^{n} \frac{1}{j}$ is the harmonic number. Now, since we have this graph we can find an Eulerian tour and transform it into a TSP tour by shortcutting when necessary (the triangle inequality guarantees that shortcutting doesn’t increase the cost of the tour). So, we just need to prove the claim.

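In fact, it is not hard to see that after removing some nodes, the optimal tour on the remaining nodes still costs at most $OPT$, where $OPT$ is the cost of the cheapest tour through all nodes. To see this, just take the original tour and shortcut it: for example, if the original tour passed through a sequence of nodes $i_1, i_2, \ldots, i_k$ but nodes $i_2, \ldots, i_{k-1}$ were removed, then by the triangle inequality:

$$c(i_1, i_k) \leq \sum_{j=1}^{k-1} c(i_j, i_{j+1})$$

so we can just substitute the edges $(i_1, i_2), \ldots, (i_{k-1}, i_k)$ by $(i_1, i_k)$. Now, suppose we do $t$ iterations and at the beginning of the $i$-th iteration there are $n_i$ nodes left, so that $n_{i+1} = n_i - |C_i| + 1$. So, clearly the average length of the cycle we picked in the algorithm is at most $OPT/n_i$ and therefore, if $C_1, \ldots, C_t$ are the cycles chosen, we have:

$$\sum_{i=1}^{t} c(C_i) \leq \sum_{i=1}^{t} \frac{|C_i|}{n_i} \cdot OPT$$

since:

$$\frac{|C_i|}{n_i} \leq \frac{2(|C_i| - 1)}{n_i} \leq 2 \sum_{j = n_{i+1}+1}^{n_i} \frac{1}{j} = 2 (H_{n_i} - H_{n_{i+1}})$$

we plug those two expressions together and we get the claim.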

Entropy

renatoppl — Fri, 28 Aug 2009 01:28:54 +0000

Today was the first day of classes here at Cornell and as usual, I attended a lot of different classes to try to decide which ones to take. I usually feel like I want to take them all, but there is this constant struggle: if I take too many classes I have no time to do research and to read random things that happen to catch my attention at that moment, and if I don’t take many classes I feel like I am not learning a lot of interesting stuff I wanted to be learning. The middle-of-the-road solution is to audit a lot of classes and start dropping them as I start needing more time, which usually happens quickly. This particular fall I decided that I need to build a stronger background in probability – since I am finding a lot of probabilistic stuff in my way and I have nothing more than my undergrad course and things I learned on demand. I attended at least three probability classes with different flavours today and I decided to blog about a simple, yet very impressive result I saw in one of them.

Since I took a class on “Principles of Telecommunications” in my undergrad, I became impressed by Shannon’s Information Theory and the concept of entropy. There was one theorem that I always heard about but never saw the proof. I thought it was a somewhat complicated proof, but it turned out not to be that much.

Consider an alphabet $A$ and a probability distribution $p$ over it. I want to associate to each $a \in A$ a string $c(a)$ of $\{0,1\}$-digits to represent each symbol of the alphabet. One way of allowing the code to be decodable is to make it a proper code. A proper code is a code such that given any $a \neq b$ in $A$, $c(a)$ is not a prefix of $c(b)$. There are several codes like this, but some are more efficient than others. Since the letters have different frequencies, it makes sense to code a frequent letter (say ‘e’ in English) with few bits and a letter that doesn’t appear much, say ‘q’, with more bits. We want to find a proper code to minimize:

$$E|c| = \sum_{a \in A} p(a) |c(a)|$$

The celebrated theorem by Shannon shows that for any proper code (actually it holds more generally for any decodable code), we have $E|c| \geq H(A)$ where $H(A)$ is the entropy of the alphabet, defined as:

$$H(A) = \sum_{a \in A} p(a) \log_2 \frac{1}{p(a)}$$

even more impressive is that we can achieve something very close to it:

Theorem 1 There is a code such that $E|c| \leq H(A) + 1$.

With an additional trick we can get $H(A) + \epsilon$ for any $\epsilon > 0$. The first part is trickier and I won’t do it here (but again, it is not as hard as I thought it would be). For proving that there is a code with average length $\leq H(A) + 1$ we use the following lemma:

Lemma 2 There is a proper code for $A = \{1, \ldots, n\}$ with code-lengths $\ell_1, \ldots, \ell_n$ if and only if

$$\sum_{i=1}^{n} 2^{-\ell_i} \leq 1$$

Proof: Let $L = \max_i \ell_i$ and imagine all the possible codewords of length $\leq L$ as a complete binary tree. Since it is a proper code, no two codewords $c_i$ and $c_j$ are on the same path to the root. So, picking one node as a codeword means that we can’t pick any node in the subtree below it. Also, for each leaf, there is at most one codeword on its path to the root. Therefore we can assign each leaf of the tree to a single codeword or to no codeword at all. It is easy to see that a codeword of size $\ell_i$ has associated with it $2^{L - \ell_i}$ leaves. Since there are $2^L$ leaves in total, we have that:

$$\sum_i 2^{L - \ell_i} \leq 2^L$$

which proves one direction of the result. Now, to prove the converse direction, we can propose a greedy algorithm: given $\ell_1, \ldots, \ell_n$ such that $\sum_i 2^{-\ell_i} \leq 1$, let $L = \max_i \ell_i$. Now, suppose $\ell_1 \leq \ell_2 \leq \ldots \leq \ell_n$. Start with the $2^L$ leaves in a whole block. Start dividing them in $2^{\ell_1}$ blocks and assign one to $c_1$. Now we define the recursive step: when we analyze $\ell_i$, the leaves are divided in $2^{\ell_{i-1}}$ blocks, some occupied, some not. Divide each free block in $2^{\ell_i - \ell_{i-1}}$ blocks and assign one of them to $c_i$. It is not hard to see that each block corresponds to one node in the tree (the common ancestor of all the leaves in that block) and that it corresponds to a proper code.
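The greedy construction in the proof can be sketched as follows: process the lengths in increasing order and always take the leftmost free node at the required depth. The function names are mine, lengths are assumed to be positive integers, and exact fractions are used so the Kraft sum is checked without rounding.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Sum of 2^(-l) over all code lengths, computed exactly."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

def prefix_code(lengths):
    """Build a proper (prefix-free) code with the given lengths, in input order."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate Kraft's inequality")
    codewords = [None] * len(lengths)
    next_free = 0  # leftmost free node at the current depth, read as an integer
    depth = 0
    for i in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        next_free <<= lengths[i] - depth  # descend to the depth of this codeword
        depth = lengths[i]
        codewords[i] = format(next_free, "b").zfill(depth)
        next_free += 1  # move to the next block at this depth
    return codewords

def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    return not any(
        i != j and codewords[j].startswith(codewords[i])
        for i in range(len(codewords))
        for j in range(len(codewords))
    )
```

For lengths `[2, 1, 3, 3]` the Kraft sum is exactly $1$ and the construction fills the whole tree.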

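Now, using this we show how to find a code with $E|c| \leq H(A) + 1$. For each $i$, since $p_i \leq 1$ we can always find an integer $\ell_i$ such that $\log_2 \frac{1}{p_i} \leq \ell_i < \log_2 \frac{1}{p_i} + 1$. Now, clearly:

$$\sum_i 2^{-\ell_i} \leq \sum_i p_i = 1$$

and:

$$E|c| = \sum_i p_i \ell_i < \sum_i p_i \left( \log_2 \frac{1}{p_i} + 1 \right) = H(A) + 1$$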
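Cool, but now how to bring it to $H(A) + \epsilon$? The idea is to code multiple symbols at the same time as a block (even if they are independent, so we are not taking advantage of correlation between the blocks). Consider $A^k$ and the probability function induced on it, i.e.:

$$p(a_1, \ldots, a_k) = \prod_{j=1}^{k} p(a_j)$$

It is not hard to see that $A^k$ with $p$ has entropy $k H(A)$ because:

$$H(A^k) = \sum_{a_1, \ldots, a_k} p(a_1, \ldots, a_k) \sum_{j=1}^{k} \log_2 \frac{1}{p(a_j)} = \sum_{j=1}^{k} \sum_{a_j} p(a_j) \log_2 \frac{1}{p(a_j)} = k H(A)$$

and then we can just apply the last theorem to that: we can find a function $c$ that codifies blocks of $k$ symbols with $\{0,1\}$ symbols such that:

$$E|c| \leq H(A^k) + 1 = k H(A) + 1$$

since $c$ codifies $k$ symbols, we are actually interested in $E|c|/k$ and therefore we get:

$$\frac{E|c|}{k} \leq H(A) + \frac{1}{k}$$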
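The effect of blocking is easy to see numerically. The sketch below uses the code lengths $\lceil \log_2 (1/p) \rceil$ (which satisfy Kraft’s inequality, so a proper code with these lengths exists) on blocks of $k$ independent symbols; the distribution and all names are my own, and probabilities are assumed to be strictly positive.

```python
from itertools import product
from math import ceil, log2, prod

def entropy(p):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return sum(q * log2(1 / q) for q in p if q > 0)

def shannon_lengths(p):
    """Code lengths ceil(log2(1/p_i)) for strictly positive probabilities."""
    return [ceil(log2(1 / q)) for q in p]

def bits_per_symbol(p, k):
    """Expected code length per source symbol when blocks of k symbols are coded together."""
    block_probs = [prod(block) for block in product(p, repeat=k)]
    lengths = shannon_lengths(block_probs)
    return sum(q * l for q, l in zip(block_probs, lengths)) / k
```

For a distribution such as `[0.7, 0.2, 0.1]`, `bits_per_symbol(p, k)` stays between `entropy(p)` and `entropy(p) + 1/k`, so the overhead shrinks as the block length grows.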

Programming and Storytelling

renatoppl — Sat, 22 Aug 2009 00:35:25 +0000

Recently I looked in Papadimitriou’s website looking for something else and found this great article called: “MythematiCS: In Praise of Storytelling in the Teaching of Computer Science and Math”. He begins by pointing out that in early times knowledge was transferred mostly by storytelling – and there is much more room in contemporary technical teaching for storytelling than most people realize. He has several interesting points: one of them is that we can think of writing a computer program as telling a story. For example, the variables are the characters: they have characteristics (data types) and the whole story (program) is about playing around with them. Sometimes they have multiple faces and behaviors depending on the circumstance (polymorphism). Iteration and recursion are common literary tools, used for example in fairy tales “in the first day, this happens, in the second day, that happens, then…” or “he might be able to do that just if he does that…”. He mentions one of my favourite books: “If on a Winter’s Night a Traveler” as a great example of recursion. This made me think that maybe Italo Calvino is my favourite author because his stories are so beautifully constructed in an almost mathematical fashion – like an Escher painting or Bach music. They went very far in showing the beauty of math and showing it is really an art. For example, this beautiful representation of the hyperbolic plane:

Back to programming there are still a lot of interesting relations: several novels are multi-threaded. We look at the novels from perspectives of multiple characters. Stories also need to “compile and run”, which in this case means, make sense and be accepted by people. I was thinking that there are a lot of books which everyone knows about but very few people have ever read (Ulysses, for example). Are those NP-complete problems?

Back to Papadimitriou’s article, he talks about a few interesting books that do a good job in mixing together math and stories. One that he mentions I read a long time ago, back when I was in high-school, and it did a great job in further stimulating me on math. The book was The Parrot’s Theorem. Recently I also read one other book that he mentioned: Surreal Numbers, by Don Knuth. Although I am a great fan of almost everything Knuth writes, this book didn’t catch me much. I think it may be because I am not the right audience. If I had read it a couple of years back I might have enjoyed it much more.

When I was in Greece last year, I came across this very interesting comic book: Logicomix. It was in Greek but just by looking into it I figured out it was something about math and it seemed pretty cool. Later I found out this was written by Papadimitriou and Doxiadis, which made me even more curious to read it. Now I am waiting for the English translation of it. One last pointer: Doxiadis has a webpage with some interesting essays about the relations of mathematical proofs, computer programming and storytelling.

Duality Theorem for Semidefinite Programming

renatoppl — Mon, 10 Aug 2009 04:44:28 +0000

I’ve been reading through Luca Trevisan’s blog posts about relaxations of the Sparsest Cut problem. He discusses three relaxations: (i) spectral relaxation; (ii) Leighton-Rao and (iii) Arora-Rao-Vazirani. (i) and (iii) are semi-definite programs (SDP), which are an incredibly beautiful and useful generalization of Linear Programs. This technique is the core of the famous MAX-CUT algorithm of Goemans and Williamson and was also applied to design approximation algorithms for a large variety of combinatorial problems. As for Linear Programming, there is a Duality Theorem that allows us to give certificates for lower and upper bounds. I was curious to learn more about it and my friend Ashwin suggested these great lecture notes by Laszlo Lovasz about semi-definite programming. I was surprised that the proof of the duality theorem for SDP is almost the same as for LP, just changing a few steps. We begin by using the Convex Separation Theorem to derive a semi-definite version of Farkas’ Lemma. We use this version of Farkas’ Lemma to prove the Duality Theorem in a manner similar to what is normally done for LP. Let’s go through the proof, but first, let’s define positive semidefinite matrices:

Definition 1 A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is said to be positive semidefinite (denoted by $A \succeq 0$) if for all $x \in \mathbb{R}^n$ we have $x^t A x \geq 0$.

It is the same as saying that all eigenvalues of $A$ are non-negative, since the smallest eigenvalue of $A$ is given by $\min_{\|x\| = 1} x^t A x$. We denote $A \succ 0$ when all eigenvalues of $A$ are strictly positive. It is the same as $x^t A x > 0$ for all $x \neq 0$. Also, given two matrices $A$ and $B$ we use the following notation: $A \bullet B = \sum_{ij} A_{ij} B_{ij}$, which is nothing more than the dot-product when we see those matrices as vectors in $\mathbb{R}^{n^2}$. Therefore:

Definition 2 A semi-definite program (SDP) is an optimization problem where the variable is a matrix and has the form:

that is, it is a linear program and the restriction that the variable, viewed as a matrix, must be positive semi-definite.

The interesting thing is to use that the set of semi-definite matrices is a convex cone, i.e., if $A, B \succeq 0$ then $\alpha A + \beta B \succeq 0$ for any $\alpha, \beta \geq 0$. It is easy to see this is a convex set. Now we use the following theorem to prove Farkas’ Lemma:

Theorem 3 (convex separation) Given two convex sets such that then there is such that for all and for all . Besides, if one of them is a cone, it holds with .

Theorem 4 (Semi-definite Farkas’ Lemma) Exactly one of the following problems has a solution:

,

, …, ,

Proof: If problem doesn’t have a solution then the cone is disjoint from the convex cone and therefore they can be separated. This means that there is ( plays the role of in the convex separation theorem), such that: for all and for all . Now, taking and all others and then later and all other zero, we can easily prove that .

It remains to prove that . Just take for and then we have that for all and therefore, .

Now, we can use this to prove the duality theorem. We will use a slightly different version of Farkas’ Lemma together with an elementary result in Linear Algebra. The following version comes from applying the last theorem to the matrices:

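$$\begin{pmatrix} A_1 & 0 \\ 0 & 0 \end{pmatrix}, \ldots, \begin{pmatrix} A_n & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} -B & 0 \\ 0 & 1 \end{pmatrix}$$

instead of $A_1, \ldots, A_n$.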

Theorem 5 (Semi-definite Farkas’ Lemma: Inhomogeneous Version) Exactly one of the following problems has a solution:

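$x_1 A_1 + \ldots + x_n A_n \succ B$, $x \in \mathbb{R}^n$

$A_1 \bullet Y = 0$, …, $A_n \bullet Y = 0$, $B \bullet Y \geq 0$, $Y \succeq 0, Y \neq 0$

where $A \succ B$ means $A - B \succ 0$. Now, an elementary Linear Algebra result before we can proceed to the proof of the duality theorem: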

Theorem 6 Given two semi-definite matrices $A$ and $B$ we have $\mathrm{tr}(AB) \geq 0$, where $\mathrm{tr}(M) = \sum_i M_{ii}$.

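Proof: The trace of a matrix is invariant under change of basis, i.e. for any non-singular matrix $P$, we have that $\mathrm{tr}(P^{-1} A P) = \mathrm{tr}(A)$. This is very easy to see, since:

$$\mathrm{tr}(P^{-1} A P) = \sum_i \sum_{j,k} (P^{-1})_{ij} A_{jk} P_{ki} = \sum_{j,k} A_{jk} \sum_i P_{ki} (P^{-1})_{ij} = \sum_{j,k} A_{jk} \delta_{kj} = \mathrm{tr}(A)$$

Now, we can write $AB$ in the basis of $A$’s eigenvectors. Let $P$ be an orthogonal matrix s.t. $P^{-1} A P = D$ where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Now:

$$\mathrm{tr}(AB) = \mathrm{tr}(P^{-1} A B P) = \mathrm{tr}(P^{-1} A P \, P^{-1} B P) = \mathrm{tr}(D \, P^t B P) = \sum_i \lambda_i (P^t B P)_{ii} \geq 0$$

since $\lambda_i \geq 0$ because the matrix $A$ is positive semi-definite and:

$$(P^t B P)_{ii} = (P e_i)^t B (P e_i) \geq 0$$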
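The trace inequality is easy to sanity-check numerically. The sketch below is only an illustration under my own assumptions: it builds positive semidefinite matrices as Gram matrices $G^t G$ (which are always symmetric and positive semidefinite), and the helper names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two lists-of-rows."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def gram(G):
    """G^t G is always symmetric positive semidefinite."""
    return matmul(transpose(G), G)

A = gram([[1, 2], [0, 1]])
B = gram([[1, -3], [2, 0]])
tr_AB = trace(matmul(A, B))

# Dropping the hypothesis on one factor breaks the inequality:
M = [[1, -1], [-1, 1]]   # positive semidefinite, eigenvalues 0 and 2
N = [[0, 1], [1, 0]]     # symmetric but indefinite, eigenvalues +1 and -1
tr_MN = trace(matmul(M, N))
```

Note that the product `matmul(A, B)` is not symmetric and even has a negative diagonal entry, yet its trace is non-negative, while the indefinite factor `N` gives a negative trace.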

Now, we are ready for the Duality Theorem:

Theorem 7 (SDP Duality) The dual of the program (1) is given by:

$$\max \; d^t y \quad \text{s.t.} \quad \sum_i y_i D_i \preceq C \qquad (2)$$

We call (1) the primal problem and (2) the dual. If $X$ is feasible for (1) and $y$ is feasible for (2), then $d^t y \leq C \bullet X$. Besides, if the dual has a strictly feasible solution (i.e. $y$ with $\sum_i y_i D_i \prec C$) then the primal optimum is attained and both have the same optimal value.

As we can see in the theorem above, the main difference between this duality theorem and the duality theorem of linear programming is this “strictly feasible” condition in the theorem. Let’s proceed to the proof. As in the LP proof, the outline is the following: first we prove weak duality ($d^t y \leq C \bullet X$ for feasible $X, y$) and then we consider the dual restrictions together with $d^t y > v^*_{dual}$ and we apply Farkas’ Lemma:

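Proof: Weak duality: If $X$ is feasible for (1) and $y$ is feasible for (2) then:

$$\left( C - \sum_i y_i D_i \right) \bullet X \geq 0$$

because $C - \sum_i y_i D_i \succeq 0$, $X \succeq 0$ and the $\bullet$ of two positive semi-definite matrices is non-negative. Now:

$$\sum_i y_i (D_i \bullet X) = \sum_i y_i d_i = d^t y$$

by (1). Therefore, $d^t y \leq C \bullet X$.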

Strong duality: Clearly, the following problem has no solution:

$$\sum_i y_i D_i \prec C, \qquad y^t d > v^*_{dual}$$

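where $v^*_{dual}$ is the optimal value of (2). Now we apply the inhomogeneous Farkas’ Lemma to:

$$A_i = \begin{pmatrix} -D_i & 0 \\ 0 & d_i \end{pmatrix}, \qquad B = \begin{pmatrix} -C & 0 \\ 0 & v^*_{dual} \end{pmatrix}$$

and we get a matrix $Y = \begin{pmatrix} X & x \\ x^t & x_{n+1} \end{pmatrix} \succeq 0$, $Y \neq 0$, such that:

$$A_i \bullet Y = 0 \;\; (i = 1, \ldots, m), \qquad B \bullet Y \geq 0$$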

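which means: $D_i \bullet X = d_i x_{n+1}$ and $C \bullet X \leq v^*_{dual} x_{n+1}$. We know that $X \succeq 0$ and $x_{n+1} \geq 0$. We just need to prove that $x_{n+1} > 0$. We already know it is not negative, so we need to prove it is not zero and here we will need the hypothesis that the dual has a strictly feasible solution. If $x_{n+1} = 0$ then there would be a solution to:

$$D_i \bullet X = 0 \;\; (i = 1, \ldots, m), \qquad C \bullet X \leq 0, \qquad X \succeq 0, \; X \neq 0$$

and by Farkas’ Lemma, the problem:

$$\sum_i y_i D_i \prec C$$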

would have no solution and we know it does have a solution. Therefore, $x_{n+1} > 0$.

Everything else comes from combining the weak and strong version described above.
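As a sanity check, here is a tiny numeric instance. It is only a sketch under assumptions of mine: I take the primal in the form $\min C \bullet X$ subject to $D_1 \bullet X = d_1$, $X \succeq 0$ with a single constraint $D_1 = I$, $d_1 = 1$, so the dual reads $\max y$ subject to $C - yI \succeq 0$, and both optimal values should equal the smallest eigenvalue of $C$. The matrix `C`, the grids and the function names are illustrative choices, and the search is a brute-force grid search rather than an SDP solver.

```python
import math

C = [[2.0, 1.0], [1.0, 2.0]]          # symmetric, eigenvalues 1 and 3

def primal_value(theta):
    """C . X at the feasible point X = v v^t with v = (cos theta, sin theta),
    which has trace 1 and is positive semidefinite."""
    v = (math.cos(theta), math.sin(theta))
    return sum(C[i][j] * v[i] * v[j] for i in range(2) for j in range(2))

def dual_feasible(y, tol=1e-12):
    """C - y*I is PSD iff its trace and determinant are >= 0 (symmetric 2x2 case)."""
    a, b, d = C[0][0] - y, C[0][1], C[1][1] - y
    return a + d >= -tol and a * d - b * b >= -tol

primal_values = [primal_value(math.radians(k)) for k in range(180)]
dual_values = [k / 100 for k in range(-300, 301) if dual_feasible(k / 100)]

best_primal = min(primal_values)       # grid search over rank-one feasible points
best_dual = max(dual_values)           # grid search over feasible y
weak_duality_holds = all(y <= p + 1e-9 for y in dual_values for p in primal_values)
```

In this instance the dual is strictly feasible (for example $y = 0$ gives $C \succ 0$), and the two grid optima meet at the smallest eigenvalue of `C`.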

Mechanism Design

renatoppl — Wed, 05 Aug 2009 16:36:39 +0000

In the early stage of algorithm design, the goal was to solve combinatorial problems exactly and efficiently. Cook’s Theorem and the theory of NP-completeness showed us that some problems are inherently hard. How can we deal with this? By trying to find approximate solutions, or by looking at restricted instances of those problems that are tractable. It turns out that the lack of computational power (NP-hardness in some sense) is not the only obstacle to solving problems optimally. Online algorithms propose a model where the difficulty is due to lack of information: the algorithm must make some decisions before seeing the entire input. Again, in most cases it is hopeless to get an optimal solution (the solution we could get if we knew the input ahead of time), and our goal is not to be far from it. Another example of a natural limitation is streaming algorithms, where you must solve a certain problem with limited memory. Imagine, for example, an algorithm that runs in a router that receives gigabits each second. It is impossible to store all the information, and yet we want to process this very large input and give an answer at some point.

An additional model is inspired by economics: there are a bunch of agents, each of whom holds part of the input to the algorithm and is interested in the solution, say, they have a certain value for each final outcome. Now, they will release their part of the input. They may lie about it to manipulate the final result according to their own interest. How to prevent that? We need to augment the algorithm with some economic incentives to make sure they don’t harm the final solution too much. We need to care about two things now: we still want a solution not too far from the optimal, but we also need to provide incentives. Such an algorithm with incentives is called a mechanism, and this represents one important field in Algorithmic Game Theory called Mechanism Design.

The simplest setting where this can occur is in a matching problem. Suppose there are $n$ people and one item. I want to decide which person will get the item. Person $i$ has a value of $v_i$ for the item. Then I ask people their values and give the item to the person that has the highest value, and this way I maximize the total value of the matching. But wait, this is not a mechanism yet. People don’t have incentives to play truthfully in this game. They may want to report higher valuations to get the item. The natural solution is to charge some payment from whoever gets the item. But how much to charge? The solution is the Vickrey auction, where we charge the winner the second highest bid. More about the Vickrey auction and some other examples of truthful mechanisms can be found in the Algorithmic Game Theory book, which can be found on Tim Roughgarden’s website.
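The Vickrey auction is short enough to write down. The following is a minimal sketch, with hypothetical function names and made-up bids, that also checks on a small grid that bidder 0 cannot gain by bidding something other than the true value.

```python
def vickrey_auction(bids):
    """Return (winner_index, price): the highest bidder wins
    and pays the second-highest bid. Ties go to the lowest index."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]

def utility(value, my_bid, other_bids):
    """Utility of bidder 0, whose true value is `value`, when bidding `my_bid`."""
    winner, price = vickrey_auction([my_bid] + list(other_bids))
    return value - price if winner == 0 else 0.0

others = [7.0, 4.0]
true_value = 10.0
truthful = utility(true_value, true_value, others)
# Try every bid 0.0, 0.5, ..., 20.0 and keep the best utility found.
best_deviation = max(utility(true_value, b / 2, others) for b in range(0, 41))
```

On this instance the winner pays 7, and no bid on the grid does better than bidding the true value 10.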

One of the most famous early papers in this area is “Optimal Auction Design” by Myerson, where he discusses auction mechanisms that maximize the profit for the seller. After reading some papers in this area, I thought I should return and read the original paper by Myerson, and it was indeed a good idea, since the paper formalizes (or makes very explicit) a lot of central concepts in this area. I wanted to blog about the two main tools in mechanism design discussed in the paper: the revelation principle and the revenue equivalence theorem.

Revelation Principle

We could imagine thousands of possible ways of providing economic incentives. It seems a difficult task to look at all possible mechanisms and choose the best one. The good thing is: we don’t need to look at all kinds of mechanisms: we can look only at truthful revelation mechanisms. Let’s formalize that: consider $n$ bidders, where bidder $i$ has value $v_i$ for getting what they want (suppose they are single-parameter agents, i.e., they get $v_i$ if they get what they want or $0$ if they don’t). We can represent the final outcome as a vector $x \in [0,1]^n$. Here we also consider randomized mechanisms, where the final outcome can be, for example, to allocate the item to bidder $i$ with probability $x_i$. This vector has some restrictions imposed by the structure of the problem, i.e., if there is only one item, then we must have $\sum_i x_i \leq 1$. At the end of the mechanism, each player will pay some amount to the mechanism, say $p_i$. Then at the end, player $i$ has utility $u_i = v_i x_i - p_i$ and the total profit of the auctioneer is $\sum_i p_i$.

Let’s describe a general mechanism: in this mechanism each player has a set of strategies, say $S_i$. Each player chooses one strategy $s_i \in S_i$ and the mechanism chooses one allocation and one payment function based on the strategies of the players: $x = x(s_1, \ldots, s_n)$ and $p = p(s_1, \ldots, s_n)$. Notice it includes even complicated multi-round strategies: in this case, the strategy space would be a very complicated description of what a player would do for each outcome of each round.

Let $V_i$ be the set of the possible valuations player $i$ could have. So, given the mechanism, each player would pick a function $\sigma_i : V_i \rightarrow S_i$, i.e., $\sigma_i(v_i)$ is the strategy he would pick if he observed his value was $v_i$. This mechanism has an equilibrium if there is a set of such functions that are in equilibrium, i.e., no one could change his function and be better off. If those exist, we can implement this as a direct revelation mechanism. A direct revelation mechanism is a mechanism where the strategies are simply to reveal a value in $V_i$. So, I just ask each player his valuation.

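Given a complicated mechanism $(x(s), p(s))$ and equilibrium strategies $\sigma_1, \ldots, \sigma_n$, I can implement this as a direct revelation mechanism just by taking:

$$x'(v) = x(\sigma_1(v_1), \ldots, \sigma_n(v_n)), \qquad p'(v) = p(\sigma_1(v_1), \ldots, \sigma_n(v_n))$$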

It is not hard to see that if the mechanism is the direct one $(x', p')$ obtained this way, the best thing for the players to do is to reveal directly their true valuation, eliminating all the complicated steps in between. One practical example of a direct revelation mechanism is the Vickrey auction. Consider the English auction, which is the famous auction that happens in most movies depicting auctions (or at Sotheby’s, for a clear example): there is one item and people keep raising their bids until everyone else drops out and the last person still bidding gets the item. The clear strategy in those auctions is to raise your bid as long as the current price is below your valuation and there are still other bidders that haven’t dropped out of the auction. Clearly the person with the highest value will get the item. Let $v_2$ be the second highest value. It is not hard to see that all but one will drop out of the auction after $v_2$, so the highest bidder is likely to pay about $v_2$. This is exactly the Vickrey auction, where we just emulate the English auction as a direct revelation mechanism. There are of course other issues. The following quote I got from this paper:

“God help us if we ever take the theater out of the auction business or anything else. It would be an awfully boring world.” (A. Alfred Taubman, Chairman, Sotheby’s Galleries)
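The emulation of the English auction by a direct mechanism can be sketched in a few lines. This is a simplified model of my own, not the paper's: the price rises on a clock by a fixed `increment`, every bidder stays in while the price is below his value, and the direct mechanism simply plays that strategy on behalf of the reported values. All names are hypothetical.

```python
def english_auction(values, increment=1):
    """Ascending clock: the price rises while at least two bidders stay in.
    Each bidder follows the strategy from the text: stay while price < value."""
    price = 0
    active = list(range(len(values)))
    while True:
        staying = [i for i in active if values[i] > price]
        if len(staying) <= 1:
            break
        active = staying
        price += increment
    # Among the bidders who were still in at the previous price,
    # the one with the highest value is the last one standing.
    winner = max(active, key=lambda i: values[i])
    return winner, price

def direct_revelation(reported_values, increment=1):
    """Direct mechanism: bidders only report a value, and the mechanism
    runs the English-auction strategy for them."""
    return english_auction(reported_values, increment)

outcome = direct_revelation([30, 55, 42])
coarse_outcome = english_auction([30, 55, 42], increment=5)
```

With unit increments the winner pays exactly the second highest value, as in the Vickrey auction; with a coarser clock the price overshoots it by less than one increment.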

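So, we can restrict our attention to mechanisms of the form $x(v)$ and $p(v)$ that are truthful, i.e., where the players have no incentive not to report their true valuation. We can characterize truthful auctions using the next theorem. Just a bit of notation before: let $v = (v_1, \ldots, v_n)$, $v_{-i} = (v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n)$ and, given a density $f_i$ for each $v_i$, let $f(v) = \prod_i f_i(v_i)$ be the joint probability distribution and $f_{-i}(v_{-i}) = \prod_{j \neq i} f_j(v_j)$ be the joint probability distribution over $v_{-i}$: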

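Theorem 1 An auction is truthful if and only if, for all possible probability distributions over values given by $f_1$, …, $f_n$ we have

$x_i(v_i)$ is monotone non-decreasing

$p_i(v_i) = v_i x_i(v_i) - \int_0^{v_i} x_i(z) \, dz$

where $x_i(v_i) = \int x_i(v_i, v_{-i}) f_{-i}(v_{-i}) \, dv_{-i}$ and $p_i(v_i) = \int p_i(v_i, v_{-i}) f_{-i}(v_{-i}) \, dv_{-i}$ are the expected allocation and payment of player $i$ when he reports $v_i$ (normalized so that a player with value zero pays nothing).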

Revenue Equivalence

The second main tool to reason about mechanisms concerns the revenue of the mechanism: it is Myerson’s Revenue Equivalence Principle, which roughly says that the revenue under a truthful mechanism depends only on the allocation and not on the payment function. This is somehow expected by the last theorem, since we showed that when a mechanism is truthful, the payments are totally dependent on $x$.
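To see concretely how the payments are pinned down by the allocation, here is a small numerical sketch. It assumes the standard form of the payment identity, $p(v) = v \, x(v) - \int_0^v x(z) \, dz$, normalized so that a zero report pays nothing, and it uses a hypothetical allocation rule in which a bidder wins exactly when his report beats a fixed competing bid of $0.6$. The integral is approximated by the midpoint rule.

```python
def allocation(report, competing_bid=0.6):
    """Monotone allocation rule: win (x = 1) exactly when the report beats
    one fixed competing bid, as in a second-price auction."""
    return 1.0 if report > competing_bid else 0.0

def payment(report, steps=20000):
    """p(v) = v * x(v) - integral_0^v x(z) dz, using the midpoint rule."""
    h = report / steps
    integral = h * sum(allocation((k + 0.5) * h) for k in range(steps))
    return report * allocation(report) - integral

def utility(true_value, report):
    """Quasi-linear utility of a bidder with value `true_value` reporting `report`."""
    return true_value * allocation(report) - payment(report)

reports = [r / 10 for r in range(0, 21)]          # candidate reports 0.0, 0.1, ..., 2.0
winner_gain = utility(0.9, 0.9)                   # truthful report of a high value
loser_gain = utility(0.4, 0.4)                    # truthful report of a low value
best_lie_high = max(utility(0.9, r) for r in reports)
best_lie_low = max(utility(0.4, r) for r in reports)
```

The identity recovers the second-price payment (a winner pays roughly the competing bid $0.6$ whatever he reports), and on this grid no misreport beats telling the truth for either bidder.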

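The profit of the auctioneer is given by $\sum_i \int_0^\infty p_i(v_i) f_i(v_i) \, dv_i$. We can substitute $p_i(v_i)$ by the payment formula in the last theorem obtaining:

$$\sum_i \int_0^\infty \left[ v_i x_i(v_i) - \int_0^{v_i} x_i(z) \, dz \right] f_i(v_i) \, dv_i$$

We can invert the order of the integration in the second part, getting:

$$\int_0^\infty \int_0^{v_i} x_i(z) \, dz \, f_i(v_i) \, dv_i = \int_0^\infty x_i(z) \int_z^\infty f_i(v_i) \, dv_i \, dz = \int_0^\infty x_i(z) (1 - F_i(z)) \, dz$$

where $F_i$ is the cumulative distribution function of $f_i$. So, we can rewrite profit as:

$$\sum_i \int_0^\infty x_i(v_i) \left[ v_i - \frac{1 - F_i(v_i)}{f_i(v_i)} \right] f_i(v_i) \, dv_i$$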

And that proves the following result:

Theorem 2 The seller’s expected utility from a truthful direct revelation mechanism depends only on the assignment function $x$.
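A classical illustration of this result is that first-price and second-price auctions raise the same expected revenue when values are i.i.d. uniform on $[0,1]$, since in equilibrium both give the item to the bidder with the highest value. The simulation below is a sketch under that assumption; it uses the known symmetric equilibrium bid $\frac{n-1}{n} v$ for the first-price auction, and the function name, seed and number of rounds are arbitrary choices of mine.

```python
import random

def simulate(n_bidders=2, rounds=100000, seed=0):
    """Average revenue of second-price vs first-price auctions, i.i.d. U[0,1] values.
    First-price bidders use the symmetric equilibrium bid (n-1)/n * v."""
    rng = random.Random(seed)
    second_price = first_price = 0.0
    shade = (n_bidders - 1) / n_bidders
    for _ in range(rounds):
        values = sorted(rng.random() for _ in range(n_bidders))
        second_price += values[-2]          # truthful bids, winner pays 2nd highest
        first_price += shade * values[-1]   # winner pays his own (shaded) bid
    return second_price / rounds, first_price / rounds

sp, fp = simulate()
```

For two bidders both averages should come out close to $\frac{n-1}{n+1} = \frac{1}{3}$, up to sampling noise.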

Now, to implement a revenue-maximizing mechanism we just need to find $x$ that optimizes the profit functional above while still meeting the truthfulness constraints in the first theorem. This is discussed by Myerson in his paper. There is just one issue here: our analysis is dependent on the probabilities $f_i$. There are various approaches to that:

Assume that the values $v_i$ of the bidders are drawn from the distributions $f_i$ and given to them. The distributions are public knowledge but the realization $v_i$ is known only to bidder $i$.
Bidders have fixed values $v_i$ (i.e., $f_i$ are the Dirac distributions concentrated on $v_i$) and in this case, the revenue maximizing problem becomes trivial. It is still an interesting problem from the point of view of truthfulness. But in this case, we should assume that $v_i$ is fixed but just player $i$ knows its value.
The distributions exist but they are unknown to the mechanism designer. In this case, he wants to design a mechanism that provides good profit guarantees against all possible distributions. The profit guarantees need to be established according to some benchmark. This is called prior-free mechanism design.

More references about Mechanism Design can be found in these lectures by Jason Hartline, in the original Myerson paper or in the Algorithmic Game Theory book.

Consistent Labeling

renatoppl — Tue, 04 Aug 2009 04:14:48 +0000

This week, Igor and I are presenting the paper “Metric clustering via consistent labeling” by Krauthgamer and Roughgarden in our Algorithms Reading Group. To prepare for the presentation, I thought that writing a blog post about it was a nice idea. Consistent Labeling is a framework that allows us to represent a variety of metric problems, such as computing a separating decomposition of a metric space, a padded decomposition (both of which are the main ingredients of embedding metric spaces into dominating trees), sparse cover, metric triangulation and so on. Here, I’ll define the Consistent Labeling Problem, formulate it as an Integer Program and show how we can get an approximation algorithm for it by rounding its linear relaxation.

First, consider a base set $A$. We want to assign labels in $L$ to the elements of $A$ respecting some constraints. First, each element $a \in A$ has associated with it a subset $L_a \subseteq L$ of labels it can be assigned. Each element can receive at most $k$ labels. Second, there is a collection $\mathcal{C}$ of subsets of $A$, so that for each $S \in \mathcal{C}$, there must be one label $i \in L$ that is assigned to all elements in $S$. For each set $S$ in $\mathcal{C}$, there is a penalty $w_S$ for violating this constraint.

Our goal is to find a probability distribution over labelings that minimizes the expected total penalty of the sets that are not consistently labeled. Let’s formulate this as an integer program. For that, the first thing we need are decision variables: let $x_{ai}$ be a $\{0,1\}$-variable indicating if label $i$ is assigned to element $a$. Let the variable $y_{Si}$ mean that the label $i$ was assigned to all elements in $S$ and let $z_S$ mean that set $S$ is consistently labeled. A linear programming formulation can be therefore expressed as:

It is not hard to see that if all the variables are $\{0,1\}$-variables then the formulation corresponds to the original problem. Now, let’s relax it to a Linear Program and interpret those as probabilities. The rounding procedure we use is a generalization of the one in “Approximation algorithms for classification problems” by Kleinberg and Tardos: until every object has $k$ labels, repeat the following procedure: pick a label $i$ and a threshold $t \in [0,1]$ uniformly at random and assign label $i$ to all objects $a$ with $x_{ai} > t$. In the end, pick the first $k$ labels assigned to each object.
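The rounding loop described above can be sketched as follows. This is an illustration under assumptions of mine, not the paper's code: the fractional solution is passed as a dictionary `x[a][i]`, a label already held by an object is not added twice, and a `max_iters` guard is added in case some object can never collect `k` labels. The function and parameter names are hypothetical.

```python
import random

def round_labeling(x, k, labels, rng=None, max_iters=100000):
    """x[a][i]: fractional LP value for assigning label i to object a.
    Repeat: pick a label i and a threshold t uniformly at random and give
    label i to every object a with x[a][i] > t. Each object keeps the
    first k labels it receives."""
    rng = rng or random.Random(0)
    assigned = {a: [] for a in x}
    for _ in range(max_iters):
        if all(len(assigned[a]) >= k for a in x):
            break
        i = rng.choice(labels)
        t = rng.random()                      # uniform in [0, 1)
        for a in x:
            if x[a].get(i, 0.0) > t and i not in assigned[a]:
                assigned[a].append(i)
    return {a: labs[:k] for a, labs in assigned.items()}

# An integral solution is reproduced exactly, whatever the random choices.
integral = {"u": {0: 1.0}, "v": {0: 1.0, 1: 0.0}}
labeling = round_labeling(integral, k=1, labels=[0, 1, 2])
```

When the LP values are already $0$ or $1$ the thresholds never matter, so both objects above receive label 0; with fractional values the output is a random labeling.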

What we just described is a procedure that, given a solution to the relaxed LP, produces a randomized labeling of the objects. Now, we need to prove that the solution produced by this randomized labeling is good in expectation, that is, that is good compared to the optimal single deterministic assignment. The authors prove that they are within a factor of , where .

Theorem 1 For each , the probability that is consistently labeled is lower bounded by .

Proof: Since we are trying to lower bound the probability of $S$ getting consistently labeled, we can consider just the probability that all the elements in $S$ are consistently labeled in the same iteration. Let’s estimate this probability. This is:

If label $i$ is chosen in iteration $j$, all elements of $S$ are labeled with $i$ if $x_{ai} > t$ for all $a \in S$, so the probability is $\min_{a \in S} x_{ai}$. So, we have:

Now, let $h_S$ be the probability that set $S$ is hit by the labeling in phase $j$. If label $i$ is chosen, the set is hit by the labeling if $t < \max_{a \in S} x_{ai}$, therefore:

inverting the order of the summation, we get:

The probability that $S$ is consistently labeled is greater than the probability that it is consistently labeled in the same iteration before the set is hit $k$ times. In one iteration three things may happen: either the set is not hit, or it is hit but not consistently labeled, or it is consistently labeled. The figure below measures how many times the set is hit. The down arrows represent the event that the set is consistently labeled:

A standard trick in these cases is to disregard the self-loops and normalize the probabilities. This way the probability that $S$ is consistently labeled is:

Now, we just use that and to obtain the desired result.
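The self-loop trick can be checked numerically. The sketch below assumes a simplified model in which every iteration is independent and the set is consistently labeled with probability `p_consistent`, hit without being consistently labeled with probability `p_hit - p_consistent`, and not hit otherwise. The names and the numeric values are mine, chosen only for illustration.

```python
def prob_consistent_before_k_hits(p_consistent, p_hit, k, iters=2000):
    """Dynamic program over iterations: probability that a consistent
    labeling happens before the set has been hit k times without one."""
    state = [0.0] * k          # state[h]: mass with h inconsistent hits so far
    state[0] = 1.0
    success = 0.0
    miss = p_hit - p_consistent
    stay = 1.0 - p_hit
    for _ in range(iters):
        new = [0.0] * k
        for h, mass in enumerate(state):
            success += mass * p_consistent   # consistently labeled now
            new[h] += mass * stay            # self-loop: set not hit
            if h + 1 < k:
                new[h + 1] += mass * miss    # hit, but not consistently
        state = new
    return success

def closed_form(p_consistent, p_hit, k):
    """Drop the self-loops: each hit is consistent w.p. p_consistent / p_hit."""
    return 1.0 - (1.0 - p_consistent / p_hit) ** k

dp_value = prob_consistent_before_k_hits(0.05, 0.2, 3)
formula_value = closed_form(0.05, 0.2, 3)
```

Both numbers agree (about $0.578$ for these values), which is exactly what normalizing the probabilities after removing the self-loops predicts.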

The approximation factor of follows straight from the previous theorem by considering the following inequalities:

and noting that the LP solution is a lower bound of the optimal value

Theorem 2 The rounding procedure gives a randomized algorithm that, in expectation, achieves a approximation to the optimal consistent labeling.