Say hi to Exclusive Self Attention (XSA), a (nearly) free improvement to Transformers for language modeling.
Observation: for y = attn(q, k, v), yᵢ and vᵢ tend to have a very high cosine similarity
Fix: exclude vᵢ from yᵢ via zᵢ = yᵢ - (yᵢᵀvᵢ)vᵢ/‖vᵢ‖² (see the sketch below)
Result: better training/val loss across model sizes; increasing gains as sequence length grows.
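A minimal sketch of the projection removal described above, assuming standard scaled dot-product attention; the function name xsa_attention, the shapes, and the epsilon guard are illustrative, not the authors' code.

```python
import torch

def xsa_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq, dim)
    # Standard attention output y = attn(q, k, v) (full attention for brevity).
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    y = torch.softmax(scores, dim=-1) @ v
    # Exclude v_i from y_i: z_i = y_i - (y_i . v_i) v_i / ||v_i||^2
    coeff = (y * v).sum(-1, keepdim=True) / (v * v).sum(-1, keepdim=True).clamp_min(eps)
    return y - coeff * v
```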
See more:

