Adaptive Duty Cycling in Sensor Networks With Energy Harvesting Using Continuous-Time Markov Chain and Fluid Models
The dynamic and unpredictable nature of energy harvesting sources available for wireless sensor networks, and the time variation in network statistics like packet transmission rates and link qualities, necessitate the use of adaptive duty cycling techniques. Such adaptive control allows sensor nodes to achieve long-run energy neutrality, where energy supply and demand are balanced in a dynamic environment such that the nodes function continuously. In this paper, we develop a new framework enabling an adaptive duty cycling scheme for sensor networks that takes into account the node battery level, ambient energy that can be harvested, and application-level QoS requirements. We model the system as a Markov decision process (MDP) that modifies its state transition policy using reinforcement learning. The MDP uses continuous time Markov chains (CTMCs) to model the network state of a node to obtain key QoS metrics like latency, loss probability, and power consumption, as well as to model the node battery level taking into account physically feasible rates of change. We show that with an appropriate choice of the reward function for the MDP, as well as a suitable learning rate, exploitation probability, and discount factor, the need to maintain minimum QoS levels for optimal network performance can be balanced with the need to promote the maintenance of a finite battery level to ensure node operability. Extensive simulation results show the benefit of our algorithm for different reward functions and parameters.