Abstract: This paper proposes a procedure to improve the operation of heating, ventilation and air conditioning (HVAC) systems, based on the idea that a multi-objective optimal supervisory controller for such a system should account for energy cost, activity schedules, occupancy patterns and the individual comfort preferences of each tenant. Tenants often forget to adjust these systems appropriately, and in many spaces the conditioning requirements are not matched to the actual occupancy of those spaces; the result is unnecessary energy waste. This paper studies the application of discrete and continuous reinforcement-learning-based supervisory control approaches that actively learn how to schedule thermostat temperature setpoints appropriately. The result is a learning controller that captures the statistical regularities in the tenant's behavior, so that the tenant's comfort requirements are met while energy costs are optimized. Results are presented for a simulated thermal zone and tenant.
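The abstract does not specify the learning rule, so the following is only a minimal sketch of the discrete case, assuming tabular Q-learning (a swapped-in choice, not necessarily the method used in the paper). The state is the hour of day, the actions are a few candidate setpoints, and the reward is the negative sum of an energy-cost term and a discomfort term that applies only while the zone is occupied. The setpoint list, comfort temperature, tenant schedule, price and linear cost model are all hypothetical, and zone thermal dynamics are deliberately ignored.

```python
import random

# All constants below are hypothetical, chosen only for illustration.
HOURS = 24                            # state: hour of day (0..23)
SETPOINTS = [16.0, 18.0, 20.0, 22.0]  # actions: candidate setpoints in deg C
COMFORT_TEMP = 21.0                   # assumed tenant preference in deg C
PRICE = 0.2                           # assumed cost per deg C above baseline
BASELINE_TEMP = 15.0                  # assumed temperature needing no heating


def occupied(hour):
    """Hypothetical tenant schedule: at home from 18:00 to 08:00."""
    return hour >= 18 or hour < 8


def reward(hour, setpoint, comfort_weight=1.0):
    """Negative weighted sum of energy cost and discomfort (no thermal dynamics)."""
    energy_cost = PRICE * max(setpoint - BASELINE_TEMP, 0.0)
    discomfort = abs(setpoint - COMFORT_TEMP) if occupied(hour) else 0.0
    return -(energy_cost + comfort_weight * discomfort)


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; each episode simulates one day, hour by hour."""
    rng = random.Random(seed)
    n_actions = len(SETPOINTS)
    q = [[0.0] * n_actions for _ in range(HOURS)]
    for _ in range(episodes):
        for hour in range(HOURS):
            # Epsilon-greedy exploration over the candidate setpoints.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[hour][i])
            r = reward(hour, SETPOINTS[a])
            next_hour = (hour + 1) % HOURS  # the day wraps around
            target = r + gamma * max(q[next_hour])
            q[hour][a] += alpha * (target - q[hour][a])
    return q


def greedy_schedule(q):
    """Pick the highest-valued setpoint for every hour."""
    return [SETPOINTS[max(range(len(SETPOINTS)), key=lambda i: q[h][i])]
            for h in range(HOURS)]


if __name__ == "__main__":
    print(greedy_schedule(train()))
```

Under these assumptions, `greedy_schedule(train())` should settle on 20 °C during the hypothetical occupied hours and 16 °C otherwise, i.e. the controller learns to set the thermostat back while the tenant is away. A continuous variant would replace the lookup table with a function approximator over a continuous setpoint range.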