Abstract: In this paper we empirically investigate the feasibility of using peer-designed agents (PDAs) instead of people for the purpose of mechanism evaluation. This approach has been increasingly advocated in agent research in recent years, mainly because of the time and cost savings it offers. Our experiments compare the behavior of 31 PDAs and 150 people in a legacy eCommerce-based price-exploration setting, using different price-setting mechanisms and diverse performance measures. The results show varying levels of similarity between the aggregate behavior obtained with people and with PDAs: in some settings the two groups produced similar results, while in others the use of PDAs rather than people yielded substantial differences. This suggests that the ability to generalize from one successful use of PDAs as a substitute for people in system evaluation to another is quite limited. The decision to prefer PDAs for mechanism evaluation is therefore setting-dependent, and the applicability of the approach must be re-evaluated whenever switching to a new setting or using a different measure. Furthermore, we show that even in settings where the aggregate behavior is similar, the individual strategies used by agents in the two groups differ considerably. Finally, we report the results of an extensive comparative analysis of the level of optimality reflected in people's and PDAs' individual decisions in our decision-making setting. The results show that the decisions of both groups are far from optimal; however, the use of PDAs yields strategies that are more than twice as close to the optimal ones.
Keywords: Peer-designed agents, system evaluation, bounded rationality, decision optimality, simulation