1) Model-based indicators (machine learning)

Toxicity/Aggression: Use Jigsaw’s Perspective API to obtain probabilities for “TOXICITY”, “INSULT”, “PROFANITY”, “IDENTITY_ATTACK”, etc. Japanese is supported (see the official language table).

Politeness: Stanford’s research established a framework for estimating “politeness” from markers like request forms, hedges, honorifics, etc. It’s English-centric, but the methodology can be adapted to Japanese. The R package politeness is also useful.

2) Lightweight rules for Japanese (highly interpretable)

Honorific/hedge rate: Ratios of “です/ます”, “〜でしょうか”, “お手数ですが”, “お願いします”, and the like.

Presence of slurs/derogatory terms: A custom NG-word list (including figurative or euphemistic forms).

Imperatives/strong assertions: Frequency of “〜しろ”, “〜に決まってる”, heavy use of exclamation marks, ALL-KATAKANA bursts, etc.

Consideration/evidence markers: Signs of dialogic and verifiable style such as “根拠:”, “出典:”, “もし〜なら”.

3) ...
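The lightweight rules listed under 2) can be sketched in a few lines of Python. This is only a minimal illustration, not a tuned implementation: the function name `tone_features`, the marker lists, the placeholder NG words, and the 8-character katakana threshold are all assumptions made for the example, and matching is plain substring/regex search with no morphological analysis.

```python
import re

# Hypothetical marker lists for illustration; tune them for your own corpus.
POLITE_MARKERS = ("です", "ます", "でしょうか", "お手数ですが", "お願いします")
STRONG_MARKERS = ("しろ", "に決まってる")
NG_WORDS = ("バカ", "クズ")  # placeholder entries, not a real NG-word list

SENTENCE_END = re.compile(r"[。!?!?\n]+")
EXCLAMATION = re.compile(r"[!!]")
# 8 or more katakana characters in a row (the threshold is an arbitrary assumption)
KATAKANA_BURST = re.compile(r"[ァ-ヶー]{8,}")
# "根拠:" / "出典:" with a half- or full-width colon, or "もし ... なら"
EVIDENCE = re.compile(r"(?:根拠|出典)[::]|もし.+?なら")


def tone_features(text, ng_words=NG_WORDS):
    """Return simple, interpretable tone counts for one Japanese text."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    # A sentence counts as polite if it contains at least one polite marker.
    polite = sum(any(m in s for m in POLITE_MARKERS) for s in sentences)
    return {
        "polite_rate": polite / len(sentences) if sentences else 0.0,
        "ng_hits": sum(text.count(w) for w in ng_words),
        "strong_hits": sum(text.count(m) for m in STRONG_MARKERS),
        "exclamations": len(EXCLAMATION.findall(text)),
        "katakana_bursts": len(KATAKANA_BURST.findall(text)),
        "evidence_hits": len(EVIDENCE.findall(text)),
    }


if __name__ == "__main__":
    print(tone_features("お手数ですが、出典を教えていただけますでしょうか。"))
    print(tone_features("いいからやれ!!早くしろ!"))
```

Substring matching keeps every score easy to explain, but it over-matches: “しろ” also appears inside “おもしろい”, for example. Running the text through a tokenizer first and matching on tokens would reduce such false positives, at the cost of an extra dependency.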