
Leadership behaviors: good, toxic, and the gray zone

Leadership behaviors are the observable conduct raters score in a 360. This page is a practical catalog of examples: what good behaviors look like across the three leadership pillars, what their toxic counter-versions look like, and how to read the gray zone in between, where the same behavior tips one way or the other depending on frequency, context, and culture.

By Simon Carvi · Published May 2026 · 10 min read


Why behaviors, not traits

Most articles on leadership behaviors start from a list of adjectives: visionary, decisive, empathetic, accountable. Adjectives are not behaviors. They are interpretations raters apply after the fact, which is why two raters scoring the same person on "visionary" can disagree by three points and both feel correct. A leadership behavior is the observable conduct that produced the interpretation: how the person opened a planning meeting, what they did when a missed deadline surfaced, how they responded to a peer challenging the strategy. Behaviors are the unit a 360 can actually measure. Traits are speculation about why.

This is also why a 360 measures frequency of behavior, not "how visionary is this person on a 1 to 5 scale." The first question has an answer a peer can give: "I observed this twice last quarter." The second question depends on how the rater happened to interpret the word that morning. A practical leadership framework with 7 competency families and 21 underlying behaviors is catalogued in the companion framework reference. This page is the practical layer underneath: what those behaviors look like in observable, scorable form, and how to read them when the picture is not binary.

  • The observable principle: If two raters cannot agree on whether the behavior happened, the item is not a behavior. It is a trait dressed up in active verbs. Rewrite it until two raters watching the same week of work would score it the same.

Good, toxic, and the gray zone

Most leadership behaviors articles list ten good behaviors, ten bad ones, and stop. That structure fails the rater and the manager for one reason: real leaders do not live at the extremes. They live in the middle, where the same behavior can be the strength that earned them a promotion or the pattern that is now eroding the team, depending on three variables nobody named in the framework.

Frequency flips meaning. A good behavior over-applied becomes toxic. Coaching in 1:1s is excellent leadership. Coaching in every interaction with every report on every topic is smothering and erodes autonomy. Asking for input before deciding is collaborative. Asking for input every time is paralysis dressed as consensus. The dose makes the medicine.

Context flips meaning. Directive decision-making is appropriate in a crisis, dysfunctional in a steady-state team that has earned the right to be consulted. Public recognition motivates a confident contributor and embarrasses an introverted one. The behavior is the same. The read is opposite.

Culture flips meaning. Direct challenge to a senior leader in some team cultures reads as healthy debate. The same behavior in a team with stronger hierarchical norms can read as a serious breach. Neither read is wrong; the behavior was never the whole signal.

This is not just experience speaking. A 2024 study of 410 hotel employees across Istanbul (Yüksel Sakınç and Ergün) tested how three families of leadership behavior shape what subordinates actually call effective leadership. The headline finding for the gray-zone argument: task-focused behaviors (planning, monitoring, clarifying responsibilities) did not predict perceived effectiveness on their own, and they correlated negatively with the leader's perceived cultural awareness. Relationship-focused behaviors (supporting, developing, consulting) carried most of the weight, with cultural awareness sitting between the behaviors and the effectiveness perception. In plain terms: doing the right behaviors is not enough; the behavior has to land in the cultural context of the person on the receiving end, or the effectiveness signal does not register.

The three-pillar spectrum

A useful way to group leadership behaviors is into three pillars: Lead Self, Lead Others, and Lead Results. The pillar structure matters here because the spectrum looks different in each.

Lead Self covers self-awareness and resilience. The gray zone here is internal: a leader who owns mistakes in real time is valuable; one who excessively self-blames erodes their own credibility and signals fragility to the team. The toxic version is the leader who deflects every mistake outward and refuses to engage with feedback at all.

Lead Others covers team leadership, influential communication, and developing people. The gray zone is dosage. A leader who coaches their reports in 1:1s with specific observations is exactly what good development looks like. The same leader who turns every conversation into a coaching moment, including ones the report wanted to be transactional, is smothering autonomy. The toxic version criticizes individuals in public and withholds recognition strategically.

Lead Results covers decision-making and execution accountability. The gray zone is speed versus inclusion. A leader who decides quickly with incomplete information when the situation calls for it is decisive. The same leader who decides quickly when the situation called for two more conversations is excluding people from a decision that affects them. The toxic version defers decisions indefinitely, then manufactures urgency to push the team into a corner.

The spectrum table below shows two example behaviors per cell. The next three sections take one competency from each pillar and work through it in more detail, with the rater items and the gray-zone tell signs.

The good / nuance / toxic spectrum across the three pillars

Lead Self

Self-awareness, resilience

Good
  • Owns mistakes in real time
  • Names own development gaps
Nuance / gray zone
  • Excessive self-blame erodes credibility
  • Performative apology without behavior change
Toxic
  • Deflects blame to team or context
  • Refuses or dismisses feedback

Lead Others

Team leadership, communication, developing people

Good
  • Coaches in 1:1s with specifics
  • Recognizes contribution publicly
Nuance / gray zone
  • Over-coaching smothers autonomy
  • Praise inflation erodes the signal
Toxic
  • Criticizes individuals in public
  • Withholds recognition to maintain control

Lead Results

Decision-making, execution, accountability

Good
  • Decides with incomplete information when needed
  • Escalates blockers without delay
Nuance / gray zone
  • Speed without consultation excludes the team
  • Constant escalation becomes blame routing
Toxic
  • Avoids decisions and defers indefinitely
  • Manufactures urgency to drive output

Self-awareness in practice

Self-awareness is one of two competencies inside the Lead Self pillar. In observable form it shows up as the leader's willingness to surface their own mistakes, name their own development gaps openly, and absorb feedback without performing the absorption. The frequency scale on the leadership 360 has five points plus N/A: rarely observed, sometimes observed, often observed, very often observed, consistently observed, and unable to observe. The N/A option matters here because raters who have not seen the leader handle a real mistake should not be forced toward a middle number.
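To make the role of N/A concrete, here is a minimal Python sketch of how a single item might be scored. It assumes the five frequency labels map to 1 through 5 and that "unable to observe" is dropped from the average rather than counted as a midpoint. The `FREQUENCY_SCALE` mapping and the `score_item` function are hypothetical illustrations, not a description of any particular 360 product.

```python
from statistics import mean

# Hypothetical mapping of the five-point frequency scale plus N/A.
FREQUENCY_SCALE = {
    "rarely observed": 1,
    "sometimes observed": 2,
    "often observed": 3,
    "very often observed": 4,
    "consistently observed": 5,
    "unable to observe": None,  # N/A: excluded, never coerced to a midpoint
}


def score_item(responses: list[str]):
    """Return (mean of observed scores or None, observed count, N/A count)."""
    scores = [FREQUENCY_SCALE[r] for r in responses]
    observed = [s for s in scores if s is not None]
    na_count = len(scores) - len(observed)
    avg = mean(observed) if observed else None  # no observations, no score
    return avg, len(observed), na_count


print(score_item(["often observed", "very often observed", "unable to observe"]))
# (3.5, 2, 1)
```

With two observed responses and one N/A, the N/A is reported as a count instead of quietly pulling the mean toward 3.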

A rater item for self-awareness reads like this: When a mistake or missed commitment surfaces in their work, this person owns it directly and names what they will do differently. It is short, it is observable, and a peer who saw two such moments last quarter can answer it without translation.

The gray zone here is performative apology. A leader who apologizes smoothly and frequently can look very self-aware in a 360 score, while the team experiences something quite different: the apology arrives, the behavior does not change, and the same kind of mistake resurfaces six weeks later. Raters often score this leader as a 4 or 5 on owning mistakes, because the act of owning is visible. The gap is between the verbal acknowledgement and the behavior change that should follow. A second item helps here: After acknowledging a mistake, this person visibly changes the behavior that caused it. The two items together pick up the gray zone.

The toxic counter-pattern is the leader who deflects: the mistake was a context problem, a team problem, an unclear brief, a missing tool. The deflection is sometimes accurate. The pattern is the absence of any first-person ownership across multiple mistakes. Raters often hesitate to score this honestly, which is why anonymity in the 360 instrument matters as much as the item wording.

Influential communication in practice

Influential communication sits inside the Lead Others pillar. The behavior most often searched for under "good leadership behaviors" is the version of this competency that shows up in meetings: the leader who can challenge a decision diplomatically, give direct feedback without crushing the recipient, and make a recommendation stick across a room without needing positional authority.

A clean rater item: When this person disagrees with a decision, they raise the disagreement clearly, with reasoning and an alternative, in a way that keeps the group productive. The item is precise. It is not measuring "is this person assertive"; it is measuring whether the disagreement, when it happens, is structured to help the team rather than to win a point.

The gray zone is the constant contrarian. A leader who challenges decisions thoughtfully once is a strength on the team. The same leader who challenges most decisions, including ones already settled and ones outside their scope, becomes a tax the team pays in every meeting. The behavior reads as principled challenge to senior raters who see only the high-quality challenges, and as obstruction to peer raters who attend the meetings where the challenges land badly. Pair the item above with: This person picks the moments to challenge a decision; they do not challenge most decisions. The second item closes the gap.

The same shape repeats for direct feedback. The good version is candid, specific, and timed. The gray zone is harsh candor that confuses being honest with being indifferent to impact. The toxic version is public criticism of individuals in group settings, often defended as transparency. Across cultures, this pattern reads differently. A direct manager style that lands as candor in one team can land as humiliation in another. The 360 instrument captures the perception data; the consultant interprets the cultural read.

  • Pair the item, close the gap: When a behavior has a strong gray zone (constant contrarian, performative apology, praise inflation), one rater item rarely captures both sides. Pair two items: one for the behavior itself, one for the calibration that distinguishes the good version from the gray zone.
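One way to picture how a paired item separates the strength from the gray zone is a simple rule over the two item means. This sketch assumes 1-to-5 item means and uses thresholds (`high=4.0`, `gap=1.5`) chosen purely for illustration; `read_pair` is a hypothetical helper, and a real report would also weigh rater comments and rater groups.

```python
def read_pair(behavior: float, calibration: float,
              high: float = 4.0, gap: float = 1.5) -> str:
    """Classify a paired-item result from two 1-5 item means.

    behavior: mean on the behavior item (e.g. "owns mistakes").
    calibration: mean on its calibration item (e.g. "changes the behavior").
    Thresholds are illustrative assumptions, not validated cut-offs.
    """
    if behavior < high:
        return "development area"  # the behavior itself is not yet frequent
    if behavior - calibration >= gap:
        return "gray zone"         # visible behavior, missing calibration
    if calibration >= high:
        return "strength"          # both halves observed consistently
    return "mixed"                 # high behavior, middling calibration


# Performative apology: owning is visible, the behavior change is not.
print(read_pair(4.6, 2.8))  # gray zone
# Calibrated ownership: both items score high.
print(read_pair(4.5, 4.2))  # strength
```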

Run a leadership 360 with anchored items and gray-zone calibration

Configure your competency framework, set up the rater groups, run the instrument in three weeks. Get a report a leader can act on the same week they receive it.


Decision-making in practice

Decision-making lives inside the Lead Results pillar. It is the competency that produces the most disagreement between self-ratings and peer-ratings, because the leader experiences themselves as decisive while the team experiences the same leader as either rushed or stuck. The behavior of interest is not "does this person make decisions." It is the calibration: is this person deciding at the speed and level of inclusion the situation calls for?

A good rater item: This person makes a clear decision when the situation requires one, even when information is incomplete. That captures the decisiveness half. A second item is required: This person consults the right people before deciding on questions that affect their work. That captures the inclusion half. Either item alone produces a misleading score.

The gray zone has two distinct shapes. The first is bold sliding into reckless: the leader who decides fast because they like deciding fast, regardless of whether two more conversations would have changed the call. The second is consensus sliding into paralysis: the leader who is so committed to consulting that the decision drifts past the moment it was needed. Both versions look like strengths on a one-item rater question. They look very different on a two-item version that asks about timing and inclusion separately.
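The claim that both gray-zone shapes look alike on a single question can be shown numerically. In this hypothetical sketch, `decision_shape` reads the two decision-making items (decisiveness and consultation, each a 1-to-5 mean) separately; the cut-offs are illustrative assumptions. Collapsing the same two scores into one average gives the reckless leader and the paralyzed leader an identical number.

```python
def decision_shape(decisive: float, consults: float,
                   high: float = 4.0, low: float = 2.5) -> str:
    """Read the two decision-making items together (illustrative cut-offs)."""
    if decisive >= high and consults <= low:
        return "bold sliding into reckless"
    if consults >= high and decisive <= low:
        return "consensus sliding into paralysis"
    if decisive >= high and consults >= high:
        return "calibrated"
    return "unclear: read the comments"


reckless = (4.8, 2.0)   # decides fast, rarely consults
paralysis = (2.0, 4.8)  # consults widely, rarely decides

# A single averaged score cannot tell the two leaders apart...
print(sum(reckless) / 2, sum(paralysis) / 2)  # 3.4 3.4
# ...but reading the items separately can.
print(decision_shape(*reckless))   # bold sliding into reckless
print(decision_shape(*paralysis))  # consensus sliding into paralysis
```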

The toxic version is the leader who avoids decisions for weeks, then forces the team into a manufactured-urgency push to close out the question. The pattern is the absence of deliberate cadence: decisions happen either too late or under pressure that the leader created.

How to read the gray zone

Raters trained to use the spectrum well ask three questions before scoring a behavior they have observed.

1. What is the frequency?

A behavior that lands well once does not warrant a 5. A behavior that lands well most of the time and occasionally over-shoots is a 4 with a development note on dosage. The frequency scale exists precisely so the rater is forced to think about consistency rather than simply agreeing or disagreeing with a competency label.

2. What was the context?

Directive decision-making in a crisis is leadership. The same behavior in a calm planning cycle is exclusion. Raters who have only seen the leader in one context should use the N/A option on items where the context is wrong. Forcing a midpoint score on a behavior the rater never observed in the right context produces noisy data the consultant has to discount later.

3. What was the impact on others?

A behavior that produced a strong outcome at the cost of a damaged relationship is not a 5. A behavior that produced a steady outcome with high relational quality is the actual definition of leadership. The impact lens is what separates self-ratings (which often focus on intent and outcome) from peer-ratings (which often focus on relational and team effect). The gap between the two is one of the most useful signals in the report.
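The self-versus-peer gap is simple arithmetic once items are scored. This minimal sketch assumes one self-score per item and a list of peer scores that may contain `None` for N/A; the `self_other_gaps` name and the 1.0-point flag threshold are hypothetical choices for illustration.

```python
from statistics import mean


def self_other_gaps(self_scores: dict[str, float],
                    peer_scores: dict[str, list],
                    threshold: float = 1.0) -> dict[str, float]:
    """Return items where the self-rating differs from the mean peer rating
    by at least `threshold` points. A positive gap means the leader rates
    themselves higher than peers do."""
    gaps = {}
    for item, own in self_scores.items():
        observed = [s for s in peer_scores.get(item, []) if s is not None]
        if not observed:
            continue  # no peer could observe this item, so no gap to report
        gap = own - mean(observed)
        if abs(gap) >= threshold:
            gaps[item] = round(gap, 2)
    return gaps


print(self_other_gaps(
    {"makes clear decisions": 5, "consults before deciding": 4},
    {"makes clear decisions": [3, 4, 3], "consults before deciding": [4, 4, None]},
))  # {'makes clear decisions': 1.67}
```

Here the leader's view of their own decisiveness sits well above what peers report, while on consultation the two views agree, which is the kind of split the debrief would focus on.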

  • Cross-cultural note: Direct challenge in some team cultures signals competence and confidence. In teams with stronger hierarchical norms, the same behavior can signal a breach. Same item, same observed behavior, two different reads. Calibrate raters within their own cultural context; do not average the cultural signal away in the underlying scoring.
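Keeping the cultural signal visible can be as simple as reporting each rater group's mean next to the pooled mean. This sketch assumes ratings arrive as (group, score) pairs; the group labels are hypothetical, and the function is an illustration rather than a description of any particular report.

```python
from statistics import mean


def group_breakdown(ratings: list[tuple[str, float]]):
    """Return the pooled mean, each group's mean, and the spread between the
    highest and lowest group means, so divergence is reported, not averaged away."""
    by_group: dict[str, list[float]] = {}
    for group, score in ratings:
        by_group.setdefault(group, []).append(score)
    group_means = {g: round(mean(s), 2) for g, s in by_group.items()}
    pooled = round(mean([score for _, score in ratings]), 2)
    spread = round(max(group_means.values()) - min(group_means.values()), 2)
    return pooled, group_means, spread


# Hypothetical item: "challenges decisions with reasoning and an alternative"
ratings = [("team_a", 5), ("team_a", 4), ("team_a", 5),
           ("team_b", 2), ("team_b", 2), ("team_b", 3)]
print(group_breakdown(ratings))
```

Here the pooled mean of 3.5 would read as unremarkable, while group means of 4.67 and 2.33 show the same behavior landing very differently in two team cultures.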

From observation to development

A behavior catalogued, scored, and read with nuance is not yet development. The bridge from observation to development is the conversation that follows the 360 report. The leadership 360 feedback process is the structured version of that conversation: rater composition, debrief sequence, and the translation from scored items into a development plan. The development plan itself follows the 70/20/10 IDP model: the majority of behavior change (roughly 70 percent) happens on the job, a smaller portion (20 percent) through coaching and feedback, and the smallest (10 percent) through formal learning.

For a behavior that scored in the gray zone, the development question is not "do less of this." It is "in which contexts should this behavior continue at high frequency, and in which contexts should it dial down." That reframing is what turns a 360 score into a usable development plan. A leader who reads their constant-contrarian pattern as a context-calibration problem can keep the strength and lose the cost. A leader who reads it as "I challenge too much" tends to either ignore the feedback or over-correct into silence on the decisions that did warrant a challenge.

Inside Huneety's leadership 360, the L1 deterministic math surfaces the scoring patterns, the L2 AI prose interprets them in plain language, and the L3 consultant narrative adds the cultural and organizational read the algorithm cannot supply. The three layers together produce a report a leader can act on the same week they receive it, rather than a PDF that sits unread because the language is generic.


Quick answers on leadership behaviors

Aren't "good" and "toxic" leadership behaviors subjective?
Less than you'd think. The behaviors themselves are observable: criticizes in public, defers decisions, coaches in 1:1s. What people disagree on is the pattern and the impact, and those become clearer when you collect feedback from several people instead of relying on one perspective.
How can a 360 capture the gray zone if items are scored 1 to 5?
Ask two questions instead of one. A single question like "this person challenges decisions effectively" cannot tell you whether they are a principled challenger or a constant contrarian. Add a second one, such as "this person picks the moments to challenge," and the gray zone separates from the strength. Well-designed 360s use this paired-question approach.
Should rater scores be culturally normalized in the report?
Not at the scoring stage. Averaging the cultural signal away at the math layer hides one of the most useful parts of the report. Calibrate raters within their own cultural context (peers in a hierarchical team culture score against the norms of that culture), and read the cross-cultural patterns in the qualitative interpretation.

Ready to put this catalog into a real leadership 360?

Huneety's leadership 360 ships with the 7 competencies, 21 behaviors, and the paired-item structure that captures the gray zone. Configure the framework to your company's language; run the instrument in three weeks.