Dataset preview columns (sample rows omitted):

| Column | Type | Length |
| --- | --- | --- |
| canonical_player_id | string | 32 |
| match_id | string | 4 to 7 |
| behavioral_vector | list | 192 |

Football2Vec v2 Embeddings — StatsBomb + Wyscout

Per-player 192-dimensional embeddings produced by Football2Vec v2, a transformer encoder with adversarial competition debiasing. The model was trained on ~3,000 StatsBomb and ~1,900 Wyscout open-data matches; this dataset holds the embeddings it produced after training, one row per unique canonical player.

Part of the (Right! Luxury!) Lakehouse soccer analytics platform.

Naming note

This dataset is named football2vec-statsbomb-wyscout to match the historical v1 model repo name. The v2 model weights live at luxury-lakehouse/football2vec-v2; the legacy v1 model at luxury-lakehouse/football2vec-statsbomb-wyscout is deprecated and kept only for traceability. The embeddings in this dataset come from v2; the repo name is retained for backward compatibility with downstream consumers that read embeddings from this path.

Quick Start

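from datasets import load_dataset

# Download the dataset from the Hugging Face Hub and select the "train" split.
ds = load_dataset("luxury-lakehouse/football2vec-statsbomb-wyscout")
df = ds["train"].to_pandas()

# Report the row count and the length of the first embedding vector.
print(f"{len(df):,} players, dim={len(df.loc[0, 'embedding'])}")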
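Most downstream work needs the vectors as a single 2-D array. The sketch below shows one way to stack them, assuming each row holds a list of 192 floats. The helper name to_matrix and the toy rows are illustrative and not part of the dataset; the vectors are passed in directly because the Schema section names the column embedding while the viewer preview shows behavioral_vector.

```python
import numpy as np

EXPECTED_DIM = 192  # vector length stated in this dataset card


def to_matrix(vectors, expected_dim=EXPECTED_DIM):
    """Stack per-row vectors into a float32 array of shape (n_rows, expected_dim)."""
    matrix = np.asarray(list(vectors), dtype=np.float32)
    if matrix.ndim != 2 or matrix.shape[1] != expected_dim:
        raise ValueError(f"expected shape (n, {expected_dim}), got {matrix.shape}")
    return matrix


# Toy stand-in for the vector column: three rows of 192 floats each.
toy_vectors = [[0.0] * EXPECTED_DIM, [1.0] * EXPECTED_DIM, [0.5] * EXPECTED_DIM]
X = to_matrix(toy_vectors)
print(X.shape, X.dtype)
```

With the Quick Start frame, the equivalent call would be to_matrix(df["embedding"]), or the same call on whichever vector column the loaded frame actually carries.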

Explore interactively: Soccer Analytics App

Schema

| Column | Type | Description |
| --- | --- | --- |
| canonical_player_id | Int64 | Cross-source-resolved canonical player identifier |
| player_name | string | Player display name at time of training |
| embedding | list<float32> | 192-dimensional vector produced by Football2Vec v2 |
| total_matches | Int64 | Number of matches the player appeared in across both sources |
| data_sources | list<string> | Sources where the player has appearances (statsbomb, wyscout) |

Schema Migration — Dual-Column Window (2026-04-25 → 2026-07-22)

PR 5b of the lakehouse Kimball migration (ADR-011) adds the BIGINT surrogate player_key to the underlying lakehouse marts that consume embeddings produced by this model. This dataset's payload is unchanged in PR 5b — the parquet files continue to ship canonical_player_id only. PR 8 (planned 2026-07-22) will add player_key to the payload in a backwards-compatible way and announce a sunset for canonical_player_id.

Recommended consumer behaviour during this window:

  • No change required. Continue to read canonical_player_id from this dataset.
  • If you maintain your own join to a dim_players clone, you may pre-compute player_key = xxhash64(provider || '|' || cast(player_id as string)) to align with the lakehouse Kimball convention ahead of the payload change.
  • After 2026-07-22 the dataset will carry both columns for at least one HF dataset version, then canonical_player_id will be deprecated. Migrate at your convenience inside that window.
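The pre-computation in the second bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the lakehouse implementation: the card does not state the hash seed or the byte encoding, so UTF-8 is assumed and the hash function is injected by the caller. The names player_key (as a function), hash64, and _stand_in_hash64, and the sample id 5503, are hypothetical.

```python
import hashlib


def player_key(provider, player_id, hash64):
    """Build a signed 64-bit surrogate key from provider and player_id.

    hash64 is any callable mapping bytes to an unsigned 64-bit integer,
    for example an xxHash64 implementation. Seed and encoding are assumptions.
    """
    natural_key = f"{provider}|{player_id}".encode("utf-8")
    unsigned = hash64(natural_key) & 0xFFFFFFFFFFFFFFFF
    # BIGINT is signed 64-bit, so map the upper half of the range to negatives.
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned


def _stand_in_hash64(data):
    # NOT xxhash64: a stdlib placeholder so the sketch runs without extra packages.
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


# Hypothetical provider and id, for illustration only.
print(player_key("statsbomb", 5503, _stand_in_hash64))
```

To target the real convention, pass an actual xxHash64 function, such as one built on the third-party xxhash package, and check the result against a few known player_key values from the lakehouse before relying on it.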

If you depend on this dataset and need extra notice before the column drop, open an issue on the lakehouse repo.

Training Provenance

Use Cases

  • Player similarity search: cosine similarity over embedding returns behaviourally similar players across competitions
  • Counterfactual substitution ("what would Player X do in Team Y's possessions?"): input for downstream visual analytics
  • Role clustering: UMAP / PCA projections reveal role archetypes decoupled from competition identity (v1 baselines tended to cluster by league; v2 does not)
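As a rough illustration of the similarity search in the first bullet, the sketch below ranks rows by cosine similarity to a query row using NumPy. The function name top_k_similar and the toy 3-dimensional vectors are illustrative assumptions; real use would pass the 192-dimensional embedding matrix instead.

```python
import numpy as np


def top_k_similar(matrix, query_index, k=5):
    """Return (row_index, cosine_similarity) for the k rows closest to the query row."""
    matrix = np.asarray(matrix, dtype=np.float64)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.clip(norms, 1e-12, None)  # guard against zero vectors
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # exclude the query row itself
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy 3-dimensional vectors standing in for the 192-dimensional embeddings.
toy = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
])
neighbours = top_k_similar(toy, query_index=0, k=2)
print(neighbours)
```

Normalising once and using a matrix-vector product keeps the search to a single pass over the rows, which is usually sufficient at this dataset's scale; an approximate nearest-neighbour index would only be worth considering for much larger collections.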

Limitations

  • Open data only: commercial datasets cover additional leagues and seasons
  • Training-time snapshot: embeddings are recomputed only when the v2 model is retrained; between training runs, new players are absent from this dataset
  • Position-agnostic: no role tag — downstream consumers apply role classifiers separately
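Because of the training-time snapshot, a consumer may want to check which players in its own list have an embedding before joining. A minimal sketch, assuming identifiers are directly comparable between the two sides; the function name split_by_coverage and the sample ids are hypothetical.

```python
def split_by_coverage(roster_ids, dataset_ids):
    """Split roster_ids into those present in the dataset and those absent."""
    known = set(dataset_ids)
    covered = [pid for pid in roster_ids if pid in known]
    missing = [pid for pid in roster_ids if pid not in known]
    return covered, missing


# Hypothetical ids: "c" stands for a player who debuted after the last training run.
covered, missing = split_by_coverage(["a", "b", "c"], ["a", "b", "z"])
print(covered, missing)
```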

License

CC-BY-NC 4.0 (inherited from Wyscout training-data licensing).

Companion Resources

| Resource | Type | Description |
| --- | --- | --- |
| Football2Vec v2 Model | Model | Transformer encoder that produced these embeddings |
| Football2Vec Training Data | Dataset | Upstream training corpus |
| Football2Vec v1 (deprecated) | Model | Legacy Doc2Vec model, superseded by v2 |
| Football2Vec Player Embeddings | Dataset | Multi-granularity embeddings (career/season/per-match) |

PR 7 changelog (2026-04-27)

PR 5b (2026-04-25) added the Kimball surrogate FK player_key to the upstream gold mart. Republishing the HF model-companion dataset payload was deferred at PR 5b and is absorbed into PR 7's scope per feedback_hf_artifacts_in_scope_pr and project_kimball_pr8_scope_locked. The payload now carries player_key (BIGINT) alongside the legacy player_id and canonical_player_id columns for the dual-column window that ends 2026-07-22. PR 8 will sunset the legacy ID columns after 2026-07-22.
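While both identifier generations may be present, a consumer can select its join column defensively. The sketch below prefers player_key and falls back to the legacy columns named in this card; the helper name pick_id_column and the preference order are assumptions.

```python
PREFERRED_ID_COLUMNS = ("player_key", "canonical_player_id", "player_id")


def pick_id_column(columns, preferred=PREFERRED_ID_COLUMNS):
    """Return the first preferred identifier column present in `columns`."""
    available = set(columns)
    for name in preferred:
        if name in available:
            return name
    raise KeyError(f"none of {preferred} found in {sorted(available)}")


# Column lists as they might look before and after the payload change.
print(pick_id_column(["canonical_player_id", "embedding"]))
print(pick_id_column(["player_key", "canonical_player_id", "embedding"]))
```

With a loaded frame, pick_id_column(df.columns) would return the column to join on, so the same consumer code keeps working across the sunset.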
