The Role of Reinforcement in Learning

Table of Contents

Abstract
Introduction
Thesis Statement
Counterargument
Argument (Author’s Position)
Conclusion
References

Abstract

This paper discusses the role of reinforcement in learning and supports the view that all behavior results from reinforcement learning. Agents, such as students in a language class, act according to the teaching they have acquired over time. They may not respond visibly to a stimulus presented in the moment, but that observation is no reason to assume that reinforcement learning is not taking place. The paper argues that learning relies on a person’s reinforcement history. The opposing argument, which this paper does not support, holds that reinforcement history has limits in affecting behavior. In this paper’s view, a living thing does not need new reinforcement, whether of the same stimulus or a different one, to develop a memory that helps it make new decisions. The paper argues that reinforcement is continuous and that its effect does not end with time. It concludes that learning is a product of reinforcement, but that reinforcement can take many forms. This finding implies that researchers will have to incorporate diverse theories when discussing research findings so that their results and interpretations remain credible. Practitioners, for their part, ought to increase their awareness of the concepts of learning before choosing particular systems or technologies.

Introduction

In reinforcement learning, a person or animal that receives and interprets new information has to judge whether to use that information as it is or to combine it with existing information before making a decision. This paper supports the idea that all observed behavior in a person or an animal is due to reinforcement. It treats learning as the use of both information stored in memory and incoming information that acts as a stimulus. The learner is usually referred to as an agent, and any other thing, person, factor, or condition that affects or causes learning belongs to the agent’s environment. While interacting with new information and retrieving stored information, the person or animal being observed can exhibit different behaviors and reactions to the same stimulus. In seeking literature support for reinforcement learning, this paper also asks whether anything is wrong with the theory and whether alternative methods or theories can help explain the learning process. Although research on different models of reinforcement learning continues, the literature examined in this paper suggests that one concept holds at all times.

Reward and punishment reinforce behavior and become part of memory (Shteingart & Loewenstein, 2014). The intensity of a reward or punishment affects how the agent remembers the event and its environment, and how the agent perceives them in the future. Thus, the accumulated reward and punishment events together make up the agent’s overall learned experience.
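The idea that every reward and punishment leaves a trace, and that the accumulated trace shapes later behavior, can be made concrete with a small computational sketch. The example below assumes a simple delta-rule (prediction-error) update of the kind common in computational models of reinforcement learning; the function names, the learning rate of 0.1, and the coding of reward as +1 and punishment as -1 are illustrative assumptions, not details taken from Shteingart and Loewenstein (2014). It shows two points from the paragraph: a more intense outcome moves the learned value further, and the learned value reflects the whole history rather than only the latest event.

```python
def update_value(value: float, outcome: float, learning_rate: float = 0.1) -> float:
    """One delta-rule step: nudge the current estimate toward the observed outcome.

    A larger |outcome| (a more intense reward or punishment) moves the estimate more.
    """
    prediction_error = outcome - value
    return value + learning_rate * prediction_error


def learned_value(history: list[float], learning_rate: float = 0.1, initial: float = 0.0) -> float:
    """Fold an entire history of rewards (positive) and punishments (negative) into one value."""
    value = initial
    for outcome in history:
        value = update_value(value, outcome, learning_rate)
    return value


if __name__ == "__main__":
    # Assumption for illustration: +1.0 codes a reward, -1.0 a punishment.
    mostly_rewarded = [1.0] * 20 + [-1.0]  # long reward history, one recent punishment
    mostly_punished = [-1.0] * 20 + [1.0]  # long punishment history, one recent reward
    print(learned_value(mostly_rewarded))  # still positive: history outweighs the last event
    print(learned_value(mostly_punished))  # still negative for the same reason
```

Because the learned value is a running summary of all past outcomes, a single recent punishment does not erase a long history of reward. This mirrors the claim that past reinforcement can overshadow present reinforcement.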

People learn continuously because they are always retrieving information from memory and using it to interpret the new information they receive. At the same time, they rely on new information to understand the information already stored in memory. In the process, they accumulate experience of different reinforcements, which informs their present and future decisions about whether to exploit stored knowledge or explore new knowledge to deepen their understanding. The information that individuals receive can take various forms, such as direct instruction or punishment that follows an action.
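The paragraph does not say how a learner weighs exploiting stored knowledge against exploring new knowledge. One common way computational models express this trade-off is an epsilon-greedy rule, used in the sketch below purely as an illustrative assumption: with a small probability the agent tries an option at random, and otherwise it picks the option that its reinforcement history rates highest. The option names and values are hypothetical.

```python
import random


def choose_action(values: dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy choice between exploiting and exploring (an illustrative rule).

    values  -- learned value of each option, e.g. the result of past reinforcement
    epsilon -- probability of exploring instead of exploiting (0.0 = always exploit)
    """
    if rng.random() < epsilon:
        # Explore: try any option, including ones with little or no history.
        return rng.choice(sorted(values))
    # Exploit: use the option that stored experience rates highest.
    return max(values, key=lambda option: values[option])


if __name__ == "__main__":
    # Hypothetical study options with values learned from past reinforcement.
    options = {"reread notes": 0.7, "ask the teacher": 0.4, "try a new method": 0.0}
    rng = random.Random(42)  # fixed seed so the run is reproducible
    picks = [choose_action(options, epsilon=0.1, rng=rng) for _ in range(1000)]
    print({option: picks.count(option) for option in options})
```

Setting `epsilon` to 0 gives pure exploitation of past learning, while raising it makes exploration more frequent. The rule is only one of several used in the literature and is chosen here for its simplicity.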

Why the Topic is Important

Reinforcements come in different forms, and some may seem to work better than others. As a result, applying reinforcement learning is likely to be experimental, as people weigh several options in search of the best way to achieve the desired behavior. A person seeking to induce a behavior in another individual may use a technique that relies on reinforcement learning and still fail to achieve the objective. For example, training a person to associate binge eating with bodily harm can fail even when the trainer follows the right procedures for reinforcing the image of physical injury. If this happens, the trainer might wrongly conclude that the method or technology used matters more than the concept of reinforcement. Therefore, people need to understand how reinforcement works so that they do not fall into the false assumption that reinforcement fails to influence behavior. Any study that helps clarify how reinforcement works will be useful to practitioners and future researchers.

Thesis Statement

The presence or absence of reinforcement at a given moment does not determine whether learning takes place. People and animals behave according to the reinforcements they have experienced over time. The behavior observed may therefore not be due to the present reinforcement at all; it can be due to past reinforcement that overshadows the current one. Nevertheless, learning will not take place without any reinforcement.

Counterargument

The main argument against the concept of reinforcement is that learning can occur without it. Support for this argument comes from several studies of learning and the role of reinforcement. One example is an experiment with rats in which the animals failed to form conditioned responses (Iordanova, Good, & Honey, 2008). The study hypothesized that the results would demonstrate the ability to learn without reinforcement. The researchers wanted to show that rats conditioned to behave in a given way would change their behavior when presented with a new stimulus. They expected this because the new stimulus increased the complexity of the behavior and forced the rats to lose their coordination. The findings were significant because they contradicted the study’s hypothesis and showed that the intention and objectives of the research were flawed. The researchers also expected the changes in behavior not to follow changes in the incentive, but the study concluded that the rats formed associations that signaled the presence and absence of reinforcement. The research also showed that the rats changed their behavior to conform partially to the stimulus. Iordanova, Good, and Honey (2008) went on to suggest that rats could form an integrated, or configural, memory that does not require reinforcement. However, they based this conclusion on a partial rather than a full association. Therefore, their argument that reinforcement is not necessary for learning is not valid.

Other arguments that dispute the need for reinforcement claim that learning occurs naturally as a task is performed and that learners become familiar with a new concept through enough repetition. On this view, instead of reinforcement, learning only needs sufficient time and resources to complete tasks for a personal reward that is not associated with reinforcement. The argument here is that time is more important than reinforcement. Those who support it appear to believe that the brain can generate its own reinforcement and is not dependent on external reinforcement. This belief implies that some types of reinforcement work and others do not. Supporters of the idea that time is all that is required to influence behavior consider how cost, time, and resources control behavior, and they treat reinforcement as only a byproduct of these other features. They state that learning outcomes can still be achieved without reinforcement in the learning process. Thus, learners would not need to work with tests, use summaries, or engage in class discussions when learning in a typical class setting (Ringbauer & Neumann, 2011). The flaw in this argument is that it fails to consider reinforcement history. It considers only the present reinforcement, which is the wrong way to approach the principle of reinforcement and its association with behavior.

In summary, the argument is that introducing a new stimulus makes learning more complex, so individuals will not show a full behavioral response to reinforcement. It also holds that reinforcement works only partially, and only when there is sufficient time for it to affect behavior. The next section responds to this argument, showing that the history of reinforcement cannot be neglected when interpreting behavior.

Argument (Author’s Position)

This paper reiterates that learning is the result of a reinforcement history. Even situations that seem to lack reinforcement still have a historical association with it. A person may respond to a stimulus by showing no action or behavior at all, because not acting is itself a response option arising from that person’s history of reinforcement. In this case, the presented stimulus reinforces other behaviors the individual has already learned. Therefore, any stimulus acts as reinforcement for some particular behavior, and a person can use past learned behavior to respond to present reinforcement. It follows that reinforcement does not have to influence a single kind of behavior; it may influence a few or many behaviors at the same time.

This influence is usually assumed to occur in the present or in the future. That is true in most cases, but not in all, because the agent also responds to past reinforcement. A person’s observed behavior can be due to reinforcement introduced in the past that works in support of other reinforcements presented now. Someone who introduces reinforcement to shape another person’s behavior must also realize that the surrounding environment reinforces wanted or unwanted behavior as well. Thus, in complex situations, learning may continue to occur, and the range of memories involved may be too broad to comprehend. Practitioners who understand this should not succumb to the false assumption that reinforcement is unnecessary.

Learning is an effect that occurs when a person acts in a given way. Whether or not the person is operating within specified control factors in an artificial environment does not significantly affect the learning process (Anderson & Elloumi, 2004). People learn by weighing the available alternatives and choosing the one they find most useful for their learning practices or goals. Therefore, the effect of a learning method in one setting does not necessarily show that the method is superior when applied arbitrarily to other subjects.

Short-term memory is a reaction to a stimulus in the environment. Combining the stimulus with meanings already associated with it in the mind leads to the development of long-term memory. Everything a person stores in long-term memory serves as an aid to the subsequent cognition of new stimuli that the person encounters. Today’s preferred teaching pedagogies are themselves results of reinforcement: they yielded positive results and so contributed to overall positive learning outcomes. As a consequence, they have become standards that new learners and teachers embrace by virtue of their association with success. In effect, these pedagogies are reinforced schemes that students and teachers can tap into to aid their learning tasks.

In learning, the learner is an agent, and the agent interacts with the environment. A teacher’s presence only enriches the agent’s environment; learning itself always consists of the agent’s interaction with that environment. An important consideration in the agent-environment relationship is that the boundary separating them is not always easy to define. Actions and reactions also do not always follow in sequence. For example, a person can process a stimulus and not act on it until later, after responding to stimuli from other sources. In this case, the person reacting to the stimulus determines the extent to which it will influence his or her behavior (Anderson & Elloumi, 2004).

Decision-making becomes a critical skill in any environment that humans or animals comprehend only partially. In such environments, people gain rewards, which are the positive outcomes of their decisions, or suffer punishments when their attempts to find solutions lead to wrong decisions. This raises the question of whether reinforcement history shapes model-based choices, also known as goal-oriented actions, in the same way as model-free choices, also known as habits. Goal-oriented actions are guided by anticipated outcomes (Dayan & Law, 2008). Agents act to limit their deviation from the goal and will immediately reverse course when they find that their last action does not lead toward the desired goal. One might think that reinforcement learning is of little use in explaining goal-oriented action, but that assumption is false. Even when pursuing future goals through present actions, agents continue to act on their accumulated knowledge and their ability to control their environments. Because an action is a product of previous thoughts and of experiences of reward or punishment, a goal-oriented action is also a result of one’s reinforcement history (Dayan & Law, 2008).
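The distinction between model-free habits and model-based, goal-oriented choices can be illustrated with a toy version of an outcome-devaluation test, a standard way of telling the two apart. In the hypothetical sketch below, both choice rules draw on experience: the habit uses cached action values built up by past reinforcement, while the goal-oriented rule uses a learned map from actions to outcomes together with the current value of each outcome. The task, names, and numbers are assumptions made for illustration only.

```python
# Hypothetical task (illustrative assumption): pressing a lever delivers food; resting delivers nothing.
TRANSITIONS = {"press_lever": "food", "rest": "nothing"}


def model_free_choice(cached_action_values: dict[str, float]) -> str:
    """Habit: pick the action whose cached value (built up by past reinforcement) is highest."""
    return max(cached_action_values, key=lambda action: cached_action_values[action])


def model_based_choice(transitions: dict[str, str], outcome_values: dict[str, float]) -> str:
    """Goal-oriented: look ahead to each action's outcome and use that outcome's current value."""
    return max(transitions, key=lambda action: outcome_values[transitions[action]])


if __name__ == "__main__":
    # Both rules were learned while food was rewarding.
    cached = {"press_lever": 0.9, "rest": 0.0}
    # The food is then devalued (e.g. the animal is sated); the habit's cache is not yet updated.
    devalued = {"food": -1.0, "nothing": 0.0}
    print(model_free_choice(cached))                  # press_lever: the habit persists
    print(model_based_choice(TRANSITIONS, devalued))  # rest: the goal-oriented rule adapts
```

After the food is devalued, the habit keeps favoring the lever until new reinforcement updates its cached value, whereas the goal-oriented rule switches at once. Either way, the choice rests on what earlier reinforcement taught the agent, which is consistent with the claim that goal-oriented action also reflects reinforcement history.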

Another important point is that technologies, like teachers, enrich the learner’s environment, but they do not necessarily affect the learner’s performance. This happens when the reward or punishment stimuli that a technology provides do not reach the level needed to make an agent react and learn. When the instructional strategies are right, any medium used for learning will be useful (Anderson & Elloumi, 2004). Agreeing that a stimulus must reach the required level to cause an appropriate reaction in the agent, Jones et al. (2013) use self-determination theory to explain their approach to facilitating behavioral parent training. According to the authors, the major failures in expanding behavioral parent training into real-world therapy settings are due to not following the correct techniques. Rather than simply increasing the availability of the training, it is important to make practitioners aware of the role that core components, such as positive reinforcement, assigning and reviewing homework, and role-playing, play in the overall results of learning.

As in any learning setting, controlling the environment can become very costly and force practitioners to seek affordable ways of meeting the demands of learning. Standardizing known methods and discarding costly interventions are among the solutions used (Tittle, Antonaccio, & Botchkovar, 2012). Unfortunately, these solutions also remove some of the agent’s control over the learning process, thus hindering learning. When the agent fails to react appropriately, it shows that the agent has learned alternative responses. Thus, instead of assuming that learning has not occurred, it is important to recognize that it has, but in a different way than was desired.

In online learning, practitioners have moved toward constructivism and away from behavioral and cognitive psychology. In the behaviorist school, learning is due to external stimuli in the environment. Cognitive psychology, meanwhile, sees learning as the use of memory, motivation, and thinking, with reflection playing a crucial role in the outcome. Thus, a learner with a high processing capacity will be in a better position to gain knowledge than others with limited processing capabilities. In constructivist thought, by contrast, learning happens through interpretation followed by personalization. Learners actively interpret their environment and incorporate what they find into what they already know, giving rise to new personal knowledge. This new personal knowledge influences subsequent interpretation and incorporation during future learning.

What emerges from these different schools of thought is that some principles persist across the independent views, which suggests that there is a comprehensive way in which learners learn. Moreover, any one school of thought can offer only a partial explanation; it cannot fit all learning situations. Changes in technology, learning environments, and learning motivations also favor one school over another, which explains why some scholars doubt the significance of stimulus and reinforcement in the learning process. Nevertheless, even where reinforcement cannot be observed directly, the cumulative literature on the subject suggests that it plays a crucial role in learning. It influences the choice to exploit or explore and can be a differentiating factor among the various decisions learners make about how to react to stimuli.

Unfortunately, those arguing against reinforcement have not considered the compounding effects of reinforcement history when observing responses to stimuli. Furthermore, contrary to the idea that the brain supplies its own reinforcement, research on perceptual learning indicates that internal reinforcement alone is not enough for the brain to learn motion and orientation tasks (Seitz, Nanez, Holloway, Tsushima, & Watanabe, 2006). In fact, in a classroom environment, whether online or offline, students who receive reinforcement perform significantly better than counterparts who do not receive external reinforcement. Moreover, not all situations and tasks are easy to test experimentally; therefore, one type of test should not be enough to discredit the argument for reinforcement and its cumulative role in promoting learning.

Conclusion

As noted in the introduction, the learner is referred to as an agent, and any other thing, person, factor, or condition that affects or causes learning belongs to the agent’s environment. Reinforcement of behavior can be visible or indirect, but it is present in the environment at all times. Learners choose whether to exploit or explore when they interact with their environment, based on the accumulation of all their experiences with reinforcement. Thus, the agent will likely remain indifferent when the present stimulus is not strong enough to prevent or cause an action. Many might take this indifference as a lack of learning, but it reflects an increase in the agent’s knowledge of the stimulus. The counterargument to this view of lifelong, cumulative reinforcement is that some learned actions and reactions are independent of the stimulus presented to the agent.

The empirical evidence from experiments with rats can seem convincing when considered on its own. However, learning is a communicating process. Agents may fail to act when presented with a stimulus not because they do not recognize it, but because they have no option to act. Agents withhold action until the environment presents appropriate conditions for it. For example, even though rats associate a stimulus with food, they can still fail to act on it, either because of the food itself or because they have engaged with another, stronger stimulus. Therefore, it is important to interpret such evidence in light of the collective literature on reinforcement learning. Even in e-learning, practitioners have already recognized the futility of sticking to a single theory, be it behaviorism, cognitive psychology, or constructivism. Instead, an emerging view concentrates on the principles these theories share and recognizes the importance of looking at a learner’s entire reinforcement history.

Implications

Researchers will have to incorporate diverse theories into their discussions of research findings to keep their results and interpretations credible. Practitioners, for their part, will likely resolve to increase their awareness of the concepts of learning before choosing particular systems or technologies to implement. Those who still emphasize technologies because they worked in the past or elsewhere risk alienating learners and other practitioners whose environments, past reinforcements, and stimuli may act against the chosen technology or system.

References

Anderson, T., & Elloumi, F. (Eds.). (2004). Theory and practice of online learning. Athabasca, Canada: Athabasca University.

Dayan, P., & Law, N. D. (2008). Connections between computational and neurobiological perspectives on decision making. Cognitive, Affective & Behavioral Neuroscience, 8(4), 429-453.

Iordanova, M. D., Good, M. A., & Honey, R. C. (2008). Configural learning without reinforcement: Integrated memories for correlates of what, where, and when. The Quarterly Journal of Experimental Psychology, 61(12), 1785-1792.

Jones, D. J., Forehand, R., Cuellar, J., Kincaid, C., Parent, J., Fenton, N., & Goodrum, N. (2013). Harnessing innovative technologies to advance children’s mental health: Behavioral parent training as an example. Clinical Psychology Review, 33(2), 241-252.

Ringbauer, S., & Neumann, H. (2011). Perceptual learning without awareness: A motion pattern gated reinforcement learner. Journal of Vision, 11(11), 977-977.

Seitz, A. R., Nanez, J. E., Holloway, S., Tsushima, Y., & Watanabe, T. (2006). Two cases requiring external reinforcement in perceptual learning. Journal of Vision, 6, 966-973.

Shteingart, H., & Loewenstein, Y. (2014). Reinforcement learning and human behavior. Current Opinion in Neurobiology, 25, 93-98.

Tittle, C. R., Antonaccio, O., & Botchkovar, E. (2012). Social learning, reinforcement and crime: Evidence from three European cities. Social Forces, 90(3), 863-890.
