In practice, the most workable approach is to measure a composite “civility score” built from multiple indicators.


1) Model-based indicators (machine learning)

  • Toxicity/aggression: Use Jigsaw’s Perspective API to obtain probabilities for attributes such as “TOXICITY,” “INSULT,” “PROFANITY,” and “IDENTITY_ATTACK.” Japanese is supported (see the official language table); a minimal request sketch follows this list.

  • Politeness: Stanford research (Danescu-Niculescu-Mizil et al., 2013) proposed a framework for estimating politeness from markers such as request forms, hedges, and honorifics. It is English-centric, but the methodology can be adapted to Japanese; the R package politeness is also useful.

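To make the Perspective step concrete, here is a minimal request sketch in Python. The PERSPECTIVE_API_KEY environment variable is an assumption of this sketch, and you should confirm that each requested attribute is listed for Japanese before relying on it.

    # Minimal Perspective API call for a Japanese comment (sketch).
    import os
    import requests

    PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

    def toxicity_scores(text: str) -> dict:
        """Return {attribute: probability in 0-1} for one comment."""
        body = {
            "comment": {"text": text},
            "languages": ["ja"],
            # Request only attributes available for Japanese.
            "requestedAttributes": {"TOXICITY": {}, "INSULT": {}},
        }
        resp = requests.post(
            PERSPECTIVE_URL,
            params={"key": os.environ["PERSPECTIVE_API_KEY"]},
            json=body,
            timeout=10,
        )
        resp.raise_for_status()
        scores = resp.json()["attributeScores"]
        return {name: s["summaryScore"]["value"] for name, s in scores.items()}
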
2) Lightweight rules for Japanese (highly interpretable)

  • Honorific/hedge rate: Ratios of polite or hedged forms such as “です/ます” (polite endings), “〜でしょうか” (“might it be…?”), “お手数ですが” (“sorry for the trouble, but…”), and “お願いします” (“please”).

  • Presence of slurs/derogatory terms: A custom NG-word list (including figurative or euphemistic forms).

  • Imperatives/strong assertions: Frequency of “〜しろ” (blunt imperative), “〜に決まってる” (“it’s obviously…”), heavy use of exclamation marks, ALL-KATAKANA bursts, etc.

  • Consideration/evidence markers: Signs of dialogic, verifiable style such as “根拠:” (“grounds:”), “出典:” (“source:”), and “もし〜なら” (“if … then”). A regex sketch of these features follows the list.

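A minimal sketch of the rule features above. The regexes and marker lists are illustrative (taken from the bullets, not a vetted lexicon), and sentence splitting on “。” is deliberately naive.

    # Interpretable rule features for Japanese text (illustrative only).
    import re

    POLITE = re.compile(r"です|ます|でしょうか|お手数ですが|お願いします")
    IMPERATIVE = re.compile(r"しろ|に決まってる|[!！]{2,}")  # crude; will over-match
    EVIDENCE = re.compile(r"根拠[:：]|出典[:：]|もし.{0,10}なら")
    NG_WORDS: set = set()  # populate with a curated list, incl. euphemisms

    def rule_features(text: str) -> dict:
        sentences = [s for s in text.split("。") if s.strip()]
        n = max(len(sentences), 1)

        def rate(rx: re.Pattern) -> float:
            return sum(bool(rx.search(s)) for s in sentences) / n

        return {
            "politeness_rate": rate(POLITE),
            "imperative_rate": rate(IMPERATIVE),
            "evidence_rate": rate(EVIDENCE),
            "ng_hit": float(any(w in text for w in NG_WORDS)),
        }
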
3) Interaction signals (thread health)

  • Reports/blocks, constructiveness of replies, churn/exit rate, etc., also reflect “civility.” Toxic posts are empirically associated with reduced participation; one toy normalization of these signals is sketched below.

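One simple way to fold such signals into the 0–1 ThreadHealth value used in the formula below; the equal weighting is an assumption of this sketch, not an empirical result.

    # Normalize interaction signals into a 0-1 thread-health value (toy example).
    def thread_health(report_rate: float, constructive_rate: float) -> float:
        """report_rate and constructive_rate are per-thread fractions from your own logs."""
        h = 0.5 * (1.0 - min(report_rate, 1.0)) + 0.5 * min(constructive_rate, 1.0)
        return max(0.0, min(1.0, h))
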
4) Composite score (example)

Compute a 0–100 score (one possible formula; tune the weights with data; a direct code translation follows the list):

  • Base: Civility = 40*(1 - Toxicity) + 20*Politeness + 20*Heuristics + 20*ThreadHealth

    • Toxicity: Perspective “TOXICITY” probability (0–1).

    • Politeness: Probability from a politeness classifier (0–1).

    • Heuristics: Averaged, normalized rule features (0–1).

    • ThreadHealth: 0–1 from low report rate, high constructive-reply rate, etc.

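A direct translation of the example formula into code; the weights are the starting values above and should be re-tuned against labeled data.

    # Composite 0-100 civility score with the example weights.
    def civility_score(toxicity: float, politeness: float,
                       heuristics: float, thread_health: float) -> float:
        score = (40 * (1 - toxicity) + 20 * politeness
                 + 20 * heuristics + 20 * thread_health)
        return max(0.0, min(100.0, score))

    # e.g., toxicity=0.15, politeness=0.7, heuristics=0.6, thread_health=0.8
    # -> 40*0.85 + 20*0.7 + 20*0.6 + 20*0.8 = 76.0 (the "Good" band in step 4 below)
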
5) Ops flow (implementation tips)

  1. Preprocess: Keep URLs/emojis; tokenize with MeCab/Sudachi (see the SudachiPy sketch after this list).

  2. Inference: Use Perspective for toxicity; for politeness, transfer a classifier from English or start with rules.

  3. Rule features: Honorific rate, slur hits, imperative/assertion strength, etc.

  4. Fusion: Combine with the weights above and apply thresholds (e.g., ≥80 = Exemplary, 60–79 = Good, 40–59 = Caution, <40 = Needs action).

  5. Validation: Compare to human labels; evaluate with AUC/F1. Run a bias audit, e.g., checking for over-flagging of sentences that contain identity terms (see the metrics sketch after this list).
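
For step 1, a tokenization sketch with SudachiPy (assumes the sudachipy and sudachidict_core packages are installed; MeCab works just as well):

    # Tokenize Japanese text with SudachiPy; POS tags support the rule features.
    from sudachipy import dictionary, tokenizer

    tok = dictionary.Dictionary().create()
    MODE = tokenizer.Tokenizer.SplitMode.C  # longest-unit segmentation

    def tokens(text: str) -> list:
        return [m.surface() for m in tok.tokenize(text, MODE)]

    def pos_tags(text: str) -> list:
        # First POS field, e.g. 動詞 (verb), 助動詞 (auxiliary verb), etc.
        return [(m.surface(), m.part_of_speech()[0]) for m in tok.tokenize(text, MODE)]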

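For step 5, a sketch of the evaluation plus a crude bias audit; y_true, y_score, and the identity-term list are assumed to come from your own labeled data.

    # Evaluate flags against human labels, then compare flag rates on
    # texts with vs. without identity terms (a simple over-flagging check).
    from sklearn.metrics import f1_score, roc_auc_score

    def evaluate(y_true, y_score, threshold=0.5):
        y_pred = [int(s >= threshold) for s in y_score]
        return {"auc": roc_auc_score(y_true, y_score),
                "f1": f1_score(y_true, y_pred)}

    def flag_rate_gap(texts, y_score, identity_terms, threshold=0.5):
        def rate(scores):
            return sum(s >= threshold for s in scores) / max(len(scores), 1)
        with_id = [s for t, s in zip(texts, y_score)
                   if any(term in t for term in identity_terms)]
        without = [s for t, s in zip(texts, y_score)
                   if not any(term in t for term in identity_terms)]
        return rate(with_id) - rate(without)  # positive => over-flagging
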
6) Caveats (very important)

  • Language/dialect bias: Models can misfire across languages and styles; check for language-specific errors.

  • Quoting/irony/reposts: Quoting harmful text to criticize it may still be flagged as “toxic.”

  • Chilling effects: Over-strict automation can suppress diversity of expression and deter participation.
