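"""Utils for building a device mesh."""

from __future__ import annotations

import collections
from collections.abc import Callable, Generator, MutableMapping, Sequence
import itertools
import logging
import math
from typing import Any

from jax._src import xla_bridge as xb
import numpy as np

logger = logging.getLogger(__name__)

_TPU_V2 = 'TPU v2'
_TPU_V3 = 'TPU v3'
_TPU_V4 = 'TPU v4'
_TPU_V5_LITE = 'TPU v5 lite'

# Physical topology -> {logical mesh shape with size-1 dims removed: transpose
# of the physical mesh axes}. Used by _transpose_trick when
# contiguous_submeshes is requested.
_TRANSPOSE_TRICKS: dict[tuple[int, ...],
                        dict[tuple[int, ...], tuple[int, ...]]] = {
    (2, 2, 1): {
        (2, 2): (0, 1, 2),
    },
    (2, 2, 4): {
        (4, 4): (0, 1, 2),
    },
    (4, 4, 4): {
        (16, 4): (0, 2, 1),
    },
    (4, 8, 8): {
        (64, 4): (0, 2, 1),
        (4, 64): (0, 2, 1),
    },
    (8, 8, 8): {
        (64, 8): (0, 2, 1),
    },
    (8, 16, 16): {
        (256, 8): (0, 2, 1),
        (8, 256): (2, 1, 0),
    },
}

# Physical ordering of core IDs in a TPU v2/v3 tray that forms a ring.
_TRAY_RING_ORDER = (0, 1, 2, 3, 6, 7, 4, 5)
# Ring orderings of the chips in 2x2 and 4x4 slices (used for TPU v5 lite).
_TRAY_2x2_RING_ORDER = (0, 1, 3, 2)
_TRAY_4x4_RING_ORDER = (0, 1, 2, 3, 7, 6, 5, 9, 10, 11, 15, 14, 13, 12, 8, 4)


def _tpu_v2_v3_create_device_mesh(
    mesh_shape: Sequence[int],
    devices: Sequence[Any],
    **unused_kwargs,
) -> np.ndarray:
  if len(devices) == 8:
    # A single tray: permute all 8 cores into ring order.
    logger.info(
        'Reordering mesh to physical ring order on single-tray TPU v2/v3.'
    )
    device_mesh = np.asarray(devices)
    device_mesh = device_mesh[np.array(_TRAY_RING_ORDER)]
    device_mesh = device_mesh.reshape(mesh_shape)
    return device_mesh
  elif mesh_shape[-1] == 8:
    # One tray per slice along the last axis: permute only that axis.
    device_mesh = np.asarray(devices).reshape(mesh_shape)
    logger.info(
        'Reordering mesh to physical ring order on each TPU v2/v3 tray.'
    )
    perm = np.array(_TRAY_RING_ORDER)
    device_mesh = device_mesh[..., perm]
    return device_mesh
  else:
    return np.asarray(devices).reshape(mesh_shape)


def _vlc_create_device_mesh(
    mesh_shape: Sequence[int], devices: Sequence[Any], **unused_kwargs
) -> np.ndarray | None:
  """Creates rotated pincer device assignment for selected topologies.

  Args:
    mesh_shape: Logical mesh shape used by the model.
    devices: TPU devices.
    **unused_kwargs: Ignored; accepted so that every handler in
      `device_kind_handler_dict` shares one call signature.

  Returns:
    The devices reordered into ring order and reshaped to `mesh_shape`, or
    None if no special ordering applies.
  """
  max_x, max_y, max_z = max(getattr(d, 'coords', (0, 0, 0)) for d in devices)
  bound_x, bound_y, bound_z = max_x + 1, max_y + 1, max_z + 1
  # Sort by (z, y, x) so that x varies fastest; the ring orders index devices
  # in this row-major order.
  sequential_devices = sorted(
      devices,
      key=lambda d: tuple(reversed(getattr(d, 'coords', (0, 0, 0)))),
  )
  if bound_x == bound_y == 2 and bound_z == 1 and len(devices) == 4:  # 2x2
    device_mesh = np.asarray(sequential_devices)
    device_mesh = device_mesh[np.array(_TRAY_2x2_RING_ORDER)]
    device_mesh = device_mesh.reshape(mesh_shape)
    return device_mesh
  elif bound_x == bound_y == 4 and bound_z == 1 and len(devices) == 16:  # 4x4
    # Use the ring order only if one logical axis spans every device.
    if max(mesh_shape) == len(devices):
      device_mesh = np.asarray(sequential_devices)
      device_mesh = device_mesh[np.array(_TRAY_4x4_RING_ORDER)]
      device_mesh = device_mesh.reshape(mesh_shape)
      return device_mesh
  return None


# Device kinds with a specialized mesh handler. A handler may return None to
# fall back to the generic logic in create_device_mesh.
device_kind_handler_dict: dict[str, Callable[..., np.ndarray | None]] = {
    _TPU_V2: _tpu_v2_v3_create_device_mesh,
    _TPU_V3: _tpu_v2_v3_create_device_mesh,
    _TPU_V5_LITE: _vlc_create_device_mesh,
}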
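The TPU v2/v3 handler reorders devices into `_TRAY_RING_ORDER` in one of two ways. It either permutes all 8 devices of a single tray, or it permutes the last mesh axis of every tray. The sketch below reproduces only that indexing. It assumes plain integers stand in for device objects, and `reorder_single_tray` and `reorder_each_tray` are hypothetical names that are not part of the module.

```python
import numpy as np

# Copied from _TRAY_RING_ORDER: ring order of the 8 cores in one tray.
TRAY_RING_ORDER = np.array((0, 1, 2, 3, 6, 7, 4, 5))


def reorder_single_tray(devices, mesh_shape):
  # Permute all 8 devices into ring order, then reshape to the logical mesh.
  return np.asarray(devices)[TRAY_RING_ORDER].reshape(mesh_shape)


def reorder_each_tray(devices, mesh_shape):
  # Reshape first (the last axis is one tray), then permute only that axis.
  return np.asarray(devices).reshape(mesh_shape)[..., TRAY_RING_ORDER]


print(reorder_single_tray(list(range(8)), (2, 4)))
# [[0 1 2 3]
#  [6 7 4 5]]
print(reorder_each_tray(list(range(16)), (2, 8)))
# [[ 0  1  2  3  6  7  4  5]
#  [ 8  9 10 11 14 15 12 13]]
```

Indexing with `[..., perm]` permutes only the last axis. Each row of the (2, 8) mesh is therefore one tray, reordered independently of the other.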
def _create_device_mesh_for_nd_torus(
    physical_mesh: np.ndarray,
    mesh_shape: Sequence[int],
    *,
    allow_split_physical_axes: bool = False,
) -> tuple[np.ndarray, np.ndarray]:
  """Assigns logical parallelism axes to physical axes of an N-D torus network.

  Given logical parallelism axes with sizes in `mesh_shape` and devices in an
  N-dimensional torus network represented by `physical_mesh`, maps each logical
  axis to one or more physical axes. Prefer to map more-performance-sensitive
  logical axes to larger numbers of physical axes to maximize the bandwidth
  available to them. Also prefer to assign logical axes to multiple physical
  axes of the same size (e.g., a 2D square) rather than multiple physical axes
  of different sizes when possible.

  If allow_split_physical_axes = False (default), this routine raises
  NotImplementedError instead of splitting a physical axis over more than one
  logical axis (which would reduce total usable bandwidth).

  As a concrete example, suppose the logical mesh is [data, model], for data
  and model parallelism respectively, and that data parallelism is less
  performance sensitive than model parallelism. Consider a 3D TPU pod slice of
  shape 4x4x16, represented by a physical mesh of shape (4, 4, 16).

  A TPU pod slice has equal bandwidth along all axes with wraparound links, but
  a 2D plane of size 4x4 may have faster XLA collective implementations than a
  non-square plane or a 1D subgroup. If the mesh_shape is [16, 16], we may want
  the more performance sensitive `model` axis to be mapped to the 4x4 XY plane.

  Args:
    physical_mesh: a np.ndarray of devices in the shape of the N-D torus
      physical topology.
    mesh_shape: shape of the logical mesh (size of the various logical
      parallelism axes), with axes ordered by increasing network intensity.
    allow_split_physical_axes: If True, physical axes are split if necessary
      to fit the desired mesh shape.

  Returns:
    A tuple of:
    - An np.ndarray of devices in the shape of the logical mesh (mesh_shape),
      with each logical parallelism axis mapped to one or more physical mesh
      axes.
    - The axis assignment matrix, a 2-D array mapping (physical_axis,
      logical_axis) to the size assigned, with the invariants
      np.prod(assignment, axis=1) = physical_mesh_shape and
      np.prod(assignment, axis=0) = mesh_shape.
  """
  # Physical axis sizes not yet claimed by a logical axis (0 once claimed).
  assignable_physical_mesh = list(physical_mesh.shape)
  # Physical axis indices assigned to each logical axis.
  assignment: list[tuple[int, ...]] = [() for _ in mesh_shape]

  # Assign logical axes from highest network intensity to lowest; `mesh_shape`
  # is ordered by increasing intensity, so iterate over it in reverse.
  for logical_axis_index, logical_axis_size in reversed(
      list(enumerate(mesh_shape))
  ):
    # Prefer mapping to more physical axes first, for higher bandwidth.
    for num_axes in range(3, 0, -1):
      # Try every subset of num_axes physical axes.
      indices_and_axes = itertools.combinations(
          enumerate(assignable_physical_mesh), num_axes
      )
      for elem in indices_and_axes:
        c_indices, c_axes = zip(*elem)
        if np.prod(c_axes) == logical_axis_size:
          assignment[logical_axis_index] = c_indices
          # Zero out the physical axes that were just assigned.
          assignable_physical_mesh = [
              0 if i in c_indices else v
              for i, v in enumerate(assignable_physical_mesh)
          ]
          break
      if assignment[logical_axis_index]:
        # A candidate was found for this logical axis.
        break
    else:
      # No subset of unassigned physical axes matches this logical axis size.
      if logical_axis_size > 1:
        if not allow_split_physical_axes:
          raise NotImplementedError(
              'Failed to find assignment for logical_axis_index'
              f' {logical_axis_index} of size {logical_axis_size} with'
              f' remaining assignable mesh {assignable_physical_mesh}. The size'
              ' of each axis in your logical mesh must be equal to the product'
              ' of some subset of the physical mesh axis sizes. E.g. logical'
              ' mesh (4, 16) is compatible with physical mesh 4x4x4 since 4=4'
              ' and 16=4x4. If you want to split physical axes, set'
              ' allow_split_physical_axes to True.'
          )
        else:
          # Fall back to the general algorithm that may split physical axes.
          return _create_device_mesh_for_nd_torus_splitting_axes(
              physical_mesh, mesh_shape
          )

  # Flatten the assignment, e.g. [(), (2,), (0, 1)] -> (2, 0, 1), and record
  # the size each physical axis contributes to each logical axis.
  transpose: list[int] = []
  assignment_array = np.ones(
      [len(physical_mesh.shape), len(mesh_shape)], dtype=np.int64
  )
  for i, x in enumerate(assignment):
    for y in x:
      physical_mesh_axis = int(y)
      assignment_array[physical_mesh_axis, i] = physical_mesh.shape[
          physical_mesh_axis
      ]
      transpose.append(physical_mesh_axis)
  return (
      physical_mesh.transpose(transpose).reshape(mesh_shape),
      assignment_array,
  )
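The non-splitting search in `_create_device_mesh_for_nd_torus` can be traced on the docstring's 4x4x16 example. The following is a minimal sketch of the greedy search only. It uses a hypothetical helper name, `assign_axes`, and raises `ValueError` in the cases where the module instead raises `NotImplementedError` or falls back to axis splitting.

```python
import itertools

import numpy as np


def assign_axes(physical_shape, mesh_shape):
  """Greedy, non-splitting axis assignment (hypothetical helper).

  Returns, for each logical axis, the tuple of physical axis indices it uses.
  """
  remaining = list(physical_shape)  # 0 marks an already-used physical axis
  assignment = [() for _ in mesh_shape]
  # The last (most network-intensive) logical axis is assigned first.
  for index, size in reversed(list(enumerate(mesh_shape))):
    # Prefer spanning more physical axes: 3, then 2, then 1.
    for num_axes in range(3, 0, -1):
      for combo in itertools.combinations(enumerate(remaining), num_axes):
        axis_indices, axis_sizes = zip(*combo)
        if np.prod(axis_sizes) == size:
          assignment[index] = axis_indices
          remaining = [0 if i in axis_indices else v
                       for i, v in enumerate(remaining)]
          break
      if assignment[index]:
        break
    else:
      # No subset of unused physical axes has exactly this size.
      if size > 1:
        raise ValueError(f'no unsplit assignment for logical axis {index}')
  return assignment


# Logical [data, model] = [16, 16] on a 4x4x16 slice.
print(assign_axes((4, 4, 16), (16, 16)))  # [(2,), (0, 1)]
```

The `model` axis is processed first, and two-axis subsets are tried before single axes. So `model` claims the 4x4 XY plane, which leaves the length-16 axis for `data`.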
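def _create_device_mesh_for_nd_torus_splitting_axes(
    physical_mesh: np.ndarray,
    mesh_shape: Sequence[int],
) -> tuple[np.ndarray, np.ndarray]:
  """Variant of _create_device_mesh_for_nd_torus that may split physical axes.

  Each logical axis is assigned a factor of one or more physical axes, so a
  physical axis can be shared by several logical axes. Returns the same
  (logical mesh, assignment matrix) pair as _create_device_mesh_for_nd_torus.
  """
  if np.prod(physical_mesh.shape) != np.prod(mesh_shape):
    raise ValueError(
        'The number of devices in physical mesh'
        f' {physical_mesh.shape} does not match the number of devices'
        f' in logical mesh {mesh_shape}.'
    )

  physical_mesh_shape = physical_mesh.shape
  logical_mesh_shape = tuple(mesh_shape)

  # assignment[p, l] is the factor of physical axis p given to logical axis l;
  # every entry starts at 1 (nothing assigned).
  assignment = np.ones(
      [len(physical_mesh_shape), len(logical_mesh_shape)], dtype=np.int64
  )

  # Assign logical axes from highest network intensity to lowest.
  for logical_axis, logical_axis_size in reversed(
      list(enumerate(logical_mesh_shape))
  ):
    # Keep the most preferred of all feasible assignments for this axis,
    # including ones that split physical axes.
    best_logical_axis_assignment = None
    for logical_axis_assignment in _enumerate_feasible_logical_axis_assignments(
        physical_mesh_shape, assignment, logical_axis_size
    ):
      if (
          best_logical_axis_assignment is None
          or _prefer_first_logical_axis_assignment(
              logical_axis_assignment,
              best_logical_axis_assignment,
              physical_mesh_shape=physical_mesh_shape,
              assignment=assignment,
          )
      ):
        best_logical_axis_assignment = logical_axis_assignment
    assignment[:, logical_axis] = best_logical_axis_assignment

  # Build the logical mesh from the final assignment.
  logical_mesh = _generate_logical_mesh(
      physical_mesh, logical_mesh_shape, assignment
  )

  return logical_mesh, assignment


def _get_prime_factors(x: int) -> list[int]:
  """Returns the prime factors of x in ascending order, with multiplicity."""
  assert x > 0
  factors = []
  for p in range(2, math.isqrt(x) + 2):
    while x % p == 0:
      factors.append(p)
      x //= p
    if x == 1:
      return factors
  # Any remaining x > 1 has no factor <= sqrt(original x), so it is prime;
  # keep the factors found so far (e.g. 10 -> [2, 5]).
  return factors + [x]


def _enumerate_feasible_logical_axis_assignments(
    physical_mesh_shape: Sequence[int],
    assignment: np.ndarray,
    logical_axis_size: int,
) -> Generator[np.ndarray, None, None]:
  """Yields feasible assignments for a single logical axis.

  For a physical mesh of shape [x_1, ..., x_n] and the product of all previous
  assignments on each physical axis [y_1, ..., y_n], yields every assignment
  of the logical axis as a 1-D array [z_1, ..., z_n] such that
  prod(z_1, ..., z_n) = logical_axis_size and x_i % (z_i * y_i) = 0.

  Args:
    physical_mesh_shape: Physical mesh shape.
    assignment: Existing assignment matrix.
    logical_axis_size: Size of the logical axis to assign.

  Yields:
    Integer arrays of length len(physical_mesh_shape).
  """
  # Prime factorization of the logical axis size, with multiplicity.
  logical_axis_factors: MutableMapping[int, int] = collections.defaultdict(int)
  for factor in _get_prime_factors(logical_axis_size):
    logical_axis_factors[factor] += 1

  # Capacity left on each physical axis after earlier assignments.
  available_physical_mesh_shape = np.array(physical_mesh_shape) // np.prod(
      assignment, axis=-1
  )

  # Index physical axes by the prime factors they can still absorb; an axis is
  # listed once per occurrence of the factor in its remaining size.
  physical_axes_by_factor: MutableMapping[int, list[int]] = (
      collections.defaultdict(list)
  )
  for physical_axis, physical_axis_size in enumerate(
      available_physical_mesh_shape
  ):
    for factor in _get_prime_factors(physical_axis_size):
      if factor not in logical_axis_factors:
        continue
      physical_axes_by_factor[factor].append(physical_axis)

  # For each prime factor, choose which physical axes receive its copies.
  factors = []
  choices = []
  for factor, multiplicity in logical_axis_factors.items():
    factors.append(factor)
    choices.append(
        set(
            itertools.combinations(
                physical_axes_by_factor[factor], multiplicity
            )
        )
    )

  # Every combination of per-factor choices is a feasible assignment.
  for axis_assignment in itertools.product(*choices):
    result = np.ones([len(physical_mesh_shape)], dtype=np.int64)
    for factor_index, per_factor_assignment in enumerate(axis_assignment):
      for physical_axis in per_factor_assignment:
        result[physical_axis] *= factors[factor_index]
    yield result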
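When splitting is allowed, candidates for one logical axis are built from prime factors. Each prime copy of the logical axis size is placed on some physical axis that still has that factor available. Below is a simplified sketch of this enumeration. The names `prime_factors` and `feasible_assignments` are hypothetical, and `collections.Counter` holds the factor counts. The helper returns any leftover prime together with the factors found before it, so 10 factors as [2, 5].

```python
import collections
import itertools
import math

import numpy as np


def prime_factors(x):
  """Prime factors of x in ascending order, with multiplicity."""
  factors = []
  for p in range(2, math.isqrt(x) + 2):
    while x % p == 0:
      factors.append(p)
      x //= p
    if x == 1:
      return factors
  return factors + [x]  # the leftover x > 1 is prime


def feasible_assignments(physical_shape, assignment, size):
  """Simplified sketch of _enumerate_feasible_logical_axis_assignments."""
  wanted = collections.Counter(prime_factors(size))
  # Capacity left on each physical axis after earlier logical axes.
  available = np.array(physical_shape) // np.prod(assignment, axis=-1)
  axes_by_factor = collections.defaultdict(list)
  for axis, axis_size in enumerate(available):
    for f in prime_factors(int(axis_size)):
      axes_by_factor[f].append(axis)
  factors = list(wanted)
  # Per prime factor: every distinct way to place its copies on axes.
  choices = [set(itertools.combinations(axes_by_factor[f], wanted[f]))
             for f in factors]
  for pick in itertools.product(*choices):
    result = np.ones(len(physical_shape), dtype=np.int64)
    for f, axes in zip(factors, pick):
      for axis in axes:
        result[axis] *= f
    yield result


no_prior = np.ones((2, 1), dtype=np.int64)  # 2 physical axes, nothing assigned
print(sorted(a.tolist() for a in feasible_assignments((4, 8), no_prior, 8)))
# [[1, 8], [2, 4], [4, 2]]
```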
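def _prefer_first_logical_axis_assignment(
    x: np.ndarray,
    y: np.ndarray,
    *,
    physical_mesh_shape: Sequence[int],
    assignment: np.ndarray,
) -> bool:
  """Returns True if the first axis assignment is preferred over the second.

  For now, this is implemented with a few simple heuristics, applied in order.
  A value function based on a more precise model of the underlying hardware
  could replace them.

  TODO(rosun): Use a proxy of network capacity to select the partitions.

  Args:
    x: Logical axis assignment as [len(physical_mesh_shape)] array.
    y: Logical axis assignment as [len(physical_mesh_shape)] array.
    physical_mesh_shape: Physical mesh shape.
    assignment: Assignment matrix.

  Returns:
    True if x is preferred over y.
  """
  # 1. Prefer a larger total size on physical axes that are used whole.
  x_whole_axis_size = np.prod(
      [s for i, s in enumerate(x) if s == physical_mesh_shape[i]]
  )
  y_whole_axis_size = np.prod(
      [s for i, s in enumerate(y) if s == physical_mesh_shape[i]]
  )
  if x_whole_axis_size != y_whole_axis_size:
    return x_whole_axis_size > y_whole_axis_size

  # 2. Prefer using more whole, non-trivial physical axes.
  x_num_whole_axes = len(
      [1 for i, s in enumerate(x) if s == physical_mesh_shape[i] and s > 1]
  )
  y_num_whole_axes = len(
      [1 for i, s in enumerate(y) if s == physical_mesh_shape[i] and s > 1]
  )
  if x_num_whole_axes != y_num_whole_axes:
    return x_num_whole_axes > y_num_whole_axes

  # 3. Prefer physical axes not yet used by earlier (more network-intensive)
  #    logical axes.
  assigned_physical_mesh_shape = np.prod(assignment, axis=-1)
  x_non_overlapping_axis_size = np.prod(
      [s for i, s in enumerate(x) if assigned_physical_mesh_shape[i] == 1]
  )
  y_non_overlapping_axis_size = np.prod(
      [s for i, s in enumerate(y) if assigned_physical_mesh_shape[i] == 1]
  )
  if x_non_overlapping_axis_size != y_non_overlapping_axis_size:
    return x_non_overlapping_axis_size > y_non_overlapping_axis_size

  # 4. Deterministic tie-break.
  return tuple(x) > tuple(y)


def _generate_logical_mesh(
    physical_mesh: np.ndarray,
    logical_mesh_shape: Sequence[int],
    assignment: np.ndarray,
) -> np.ndarray:
  """Computes the logical mesh from the assignment map.

  Args:
    physical_mesh: Physical device mesh.
    logical_mesh_shape: Logical mesh shape.
    assignment: 2-D assignment matrix of shape [physical_dims, logical_dims].

  Returns:
    Logical mesh reshaped from physical mesh.
  """
  # Physical and logical axis index of every entry of `assignment`, flattened
  # in row-major (physical-major) order.
  physical_indices = np.broadcast_to(
      np.expand_dims(
          np.arange(len(physical_mesh.shape), dtype=np.int64), axis=-1
      ),
      assignment.shape,
  ).reshape([-1])

  logical_indices = np.broadcast_to(
      np.expand_dims(
          np.arange(len(logical_mesh_shape), dtype=np.int64), axis=0
      ),
      assignment.shape,
  ).reshape([-1])

  # Split each physical axis into one sub-axis per logical axis.
  logical_mesh = np.reshape(physical_mesh, assignment.reshape([-1]))

  # Make the sub-axes of each logical axis adjacent, keeping physical-axis
  # order within each logical axis.
  _, _, transpose_axes = zip(
      *sorted(
          zip(logical_indices, physical_indices, range(len(logical_indices)))
      )
  )
  logical_mesh = np.transpose(logical_mesh, transpose_axes)

  # Merge the sub-axes of each logical axis.
  logical_mesh = np.reshape(logical_mesh, logical_mesh_shape)

  return logical_mesh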
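`_generate_logical_mesh` turns an assignment matrix into a mesh in three steps. It splits each physical axis into per-logical-axis sub-axes, groups the sub-axes by logical axis, and then merges them. The sketch below mirrors those steps on a 4x4 grid of integers that stand in for devices. `generate_logical_mesh` is a hypothetical name, and the assignment matrix is hand-picked for illustration.

```python
import numpy as np


def generate_logical_mesh(physical_mesh, logical_mesh_shape, assignment):
  """Hypothetical sketch mirroring _generate_logical_mesh."""
  physical_indices = np.broadcast_to(
      np.arange(physical_mesh.ndim)[:, None], assignment.shape).reshape(-1)
  logical_indices = np.broadcast_to(
      np.arange(len(logical_mesh_shape))[None, :], assignment.shape
  ).reshape(-1)
  # Split every physical axis into one sub-axis per logical axis.
  mesh = physical_mesh.reshape(tuple(assignment.reshape(-1)))
  # Group sub-axes by logical axis, keeping physical order within a group.
  order = sorted(zip(logical_indices, physical_indices, range(mesh.ndim)))
  mesh = mesh.transpose([pos for _, _, pos in order])
  # Merge each group into one logical axis.
  return mesh.reshape(logical_mesh_shape)


physical = np.arange(16).reshape(4, 4)  # integers stand in for devices
# Rows are physical axes, columns are logical axes: row products give the
# physical shape (4, 4), column products give the logical shape (8, 2).
assignment = np.array([[2, 2], [4, 1]])
print(generate_logical_mesh(physical, (8, 2), assignment).tolist())
# [[0, 4], [1, 5], [2, 6], [3, 7], [8, 12], [9, 13], [10, 14], [11, 15]]
```

Each row of the result pairs two devices from adjacent rows of the 4x4 grid. This is because the size-2 logical axis received a factor of 2 from physical axis 0.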
def _bounds_from_last_device(last_device) -> Sequence[int]:
  """Gets the mesh bounds from the given last device.

  Only TPU devices, which expose `coords` and `core_on_chip`, are supported.
  """
  assert hasattr(last_device, 'coords'), 'Only TPU supported'
  x, y, z = last_device.coords
  return x + 1, y + 1, z + 1, last_device.core_on_chip + 1


def _get_physical_tpu_mesh(jax_devices: Sequence[Any]) -> np.ndarray:
  """Rearranges TPU devices in a slice into a physical mesh.

  Args:
    jax_devices: A list of JAX devices in a TPU slice in process-tiled z, y, x,
      core order, e.g. from jax.devices().

  Returns:
    A np.ndarray of JAX devices with shape [global_x, global_y, global_z]. On
    v2 and v3, global_z is instead cores_per_chip (i.e., 2).
  """
  device_kind = jax_devices[0].device_kind
  device_coords = [d.coords for d in jax_devices]
  # The largest coordinate tuple is the far corner of the slice.
  dims = tuple(d + 1 for d in max(device_coords))
  assert len(dims) == 3, dims
  if device_kind in (_TPU_V2, _TPU_V3):
    # The third mesh axis indexes the cores of a chip instead of z.
    cores_per_chip = max(d.core_on_chip for d in jax_devices) + 1
    out = np.empty(dims[:2] + (cores_per_chip,), dtype=object)
    for coords, d in zip(device_coords, jax_devices):
      assert coords[2] == 0, d
      out[coords[0], coords[1], d.core_on_chip] = d
  else:
    out = np.empty(dims, dtype=object)
    for coords, d in zip(device_coords, jax_devices):
      if d.core_on_chip != 0:
        raise AssertionError(
            'Creating meshes for TPU >v3 requires one device per chip'
            f' ("megacore" mode). Got device id {d.core_on_chip} for a device'
            f' of kind {device_kind}: {d}'
        )
      out[coords[0], coords[1], coords[2]] = d
  return out
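For chips other than TPU v2/v3, `_get_physical_tpu_mesh` places each device at its `coords` in an object array. The array's shape is the largest coordinate plus one on each axis. Below is a sketch that uses a hypothetical `FakeDevice` dataclass (not a JAX type). It carries only the attributes the helper reads.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class FakeDevice:  # hypothetical stand-in for a TPU device
  id: int
  coords: tuple
  core_on_chip: int = 0


def physical_mesh_one_core_per_chip(devices):
  """Sketch of the non-v2/v3 branch of _get_physical_tpu_mesh."""
  # The largest coordinate tuple is the far corner of a full slice.
  dims = tuple(c + 1 for c in max(d.coords for d in devices))
  out = np.empty(dims, dtype=object)
  for d in devices:
    if d.core_on_chip != 0:
      raise AssertionError('expected one device per chip ("megacore" mode)')
    out[d.coords] = d
  return out


# A 2x2x1 slice, listed in scrambled order.
devs = [FakeDevice(i, (i % 2, i // 2, 0)) for i in (3, 0, 2, 1)]
mesh = physical_mesh_one_core_per_chip(devs)
print(mesh.shape)                                     # (2, 2, 1)
print([[d.id for d in row] for row in mesh[:, :, 0]])  # [[0, 2], [1, 3]]
```

The input order does not matter, because each device is placed by its coordinates.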
def _transpose_trick(
    physical_mesh: np.ndarray, mesh_shape: Sequence[int]
) -> np.ndarray:
  mesh_shape = tuple(mesh_shape)
  topology = physical_mesh.shape
  if topology not in _TRANSPOSE_TRICKS:
    raise ValueError(
        'create_device_mesh cannot create contiguous submeshes for '
        f'physical mesh topology {topology}'
    )

  # Drop trivial (size-1) dimensions before looking up the transpose.
  mesh_shape_no_trivial_dims: tuple[int, ...] = ()
  for dim_size in mesh_shape:
    if dim_size != 1:
      mesh_shape_no_trivial_dims += (dim_size,)

  if mesh_shape_no_trivial_dims not in _TRANSPOSE_TRICKS[topology]:
    raise ValueError(
        'create_device_mesh cannot create contiguous submeshes for '
        f'mesh_shape {mesh_shape} and physical mesh topology {topology}. '
        f'Available mesh_shapes: {list(_TRANSPOSE_TRICKS[topology].keys())}'
    )

  return physical_mesh.transpose(
      *_TRANSPOSE_TRICKS[topology][mesh_shape_no_trivial_dims]
  )
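`_transpose_trick` drops size-1 dimensions from `mesh_shape` and looks up the remaining shape in `_TRANSPOSE_TRICKS`. Below is a self-contained sketch that copies two entries of that table into a hypothetical `TRANSPOSE_TRICKS` dict. It raises a single simplified `ValueError` for both lookup failures.

```python
import numpy as np

# Two entries copied from _TRANSPOSE_TRICKS; the full table has more.
TRANSPOSE_TRICKS = {
    (4, 4, 4): {(16, 4): (0, 2, 1)},
    (8, 16, 16): {(256, 8): (0, 2, 1), (8, 256): (2, 1, 0)},
}


def transpose_trick(physical_mesh, mesh_shape):
  """Sketch of _transpose_trick: drop size-1 dims, look up a permutation."""
  topology = physical_mesh.shape
  key = tuple(d for d in mesh_shape if d != 1)
  if topology not in TRANSPOSE_TRICKS or key not in TRANSPOSE_TRICKS[topology]:
    raise ValueError(
        f'no contiguous-submesh transpose for {mesh_shape} on {topology}')
  return physical_mesh.transpose(*TRANSPOSE_TRICKS[topology][key])


physical = np.arange(64).reshape(4, 4, 4)
# (1, 16, 4) reduces to (16, 4), which swaps physical axes 1 and 2.
print(transpose_trick(physical, (1, 16, 4)).shape)  # (4, 4, 4)
```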
def create_device_mesh(
    mesh_shape: Sequence[int],
    devices: Sequence[Any] | None = None,
    *,
    contiguous_submeshes: bool = False,
    allow_split_physical_axes: bool = False,
) -> np.ndarray:
  """Creates a performant device mesh for jax.sharding.Mesh.

  Args:
    mesh_shape: shape of logical mesh, ordered by increasing network-intensity
      e.g. [replica, data, mdl] where mdl has the most network communication
      requirements.
    devices: optionally, the devices to construct a mesh for. Defaults to
      jax.devices().
    contiguous_submeshes: if True, this function will attempt to create a mesh
      where each process's local devices form a contiguous submesh. A
      ValueError will be raised if this function can't produce a suitable
      mesh. This setting was sometimes necessary before the introduction of
      jax.Array to ensure non-ragged local arrays; if using jax.Arrays, it's
      better to keep this set to False.
    allow_split_physical_axes: If True, we will split physical axes if
      necessary to produce the desired device mesh.

  Raises:
    ValueError: if the number of devices doesn't equal the product of
      `mesh_shape`.

  Returns:
    A np.ndarray of JAX devices with mesh_shape as its shape that can be fed
    into jax.sharding.Mesh with good collective performance.
  """
  if devices is None:
    devices = xb.devices()
  if np.prod(mesh_shape) != len(devices):
    raise ValueError(
        f'Number of devices {len(devices)} must equal the product '
        f'of mesh_shape {mesh_shape}'
    )
  last_device = devices[-1]

  # Device kinds with a specialized handler are tried first.
  handler = device_kind_handler_dict.get(last_device.device_kind, None)
  if handler is not None:
    result = handler(
        mesh_shape, devices, contiguous_submeshes=contiguous_submeshes
    )
    if result is not None:
      return result

  if last_device.platform == 'tpu':
    physical_mesh = _get_physical_tpu_mesh(devices)
    if contiguous_submeshes:
      physical_mesh = _transpose_trick(physical_mesh, mesh_shape)
    device_mesh, _ = _create_device_mesh_for_nd_torus(
        physical_mesh,
        mesh_shape,
        allow_split_physical_axes=allow_split_physical_axes,
    )
    return device_mesh
  else:
    # Other platforms: a plain row-major reshape of the device list.
    device_mesh = np.asarray(devices).reshape(mesh_shape)
    return device_mesh
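On non-TPU platforms, `create_device_mesh` validates the device count and then falls back to a row-major reshape. Below is a minimal sketch of that path. It assumes integers in place of devices, and `create_mesh_fallback` is a hypothetical name.

```python
import numpy as np


def create_mesh_fallback(mesh_shape, devices):
  """Sketch of create_device_mesh's size check and non-TPU fallback."""
  if np.prod(mesh_shape) != len(devices):
    raise ValueError(
        f'Number of devices {len(devices)} must equal the product '
        f'of mesh_shape {mesh_shape}')
  # Row-major reshape: consecutive devices share the innermost (last) axis.
  return np.asarray(devices).reshape(mesh_shape)


print(create_mesh_fallback((2, 4), list(range(8))).tolist())
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```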
def create_hybrid_device_mesh(
    mesh_shape: Sequence[int],
    dcn_mesh_shape: Sequence[int],
    devices: Sequence[Any] | None = None,
    *,
    process_is_granule: bool = False,
    should_sort_granules_by_key: bool = True,
    allow_split_physical_axes: bool = False,
) -> np.ndarray:
  """Creates a device mesh for hybrid (e.g., ICI and DCN) parallelism.

  Args:
    mesh_shape: shape of the logical mesh for the faster/inner network, ordered
      by increasing network intensity, e.g. [replica, data, mdl] where mdl has
      the most network communication requirements.
    dcn_mesh_shape: shape of the logical mesh for the slower/outer network, in
      the same order as mesh_shape.
    devices: optionally, the devices to construct a mesh for. Defaults to
      jax.devices().
    process_is_granule: if True, this function will treat processes as the
      units of the slower/outer network. Otherwise it will look for
      slice_index attributes on devices and use slices as the units. Enabling
      this is meant as a fallback for platforms that don't set slice_index.
    should_sort_granules_by_key: Whether device granules should be sorted by
      the granule key, either slice or process index, depending on
      process_is_granule.
    allow_split_physical_axes: If True, we will split physical axes if
      necessary to produce the desired device mesh.

  Raises:
    ValueError: if the number of slices to which the `devices` belong doesn't
      equal the product of `dcn_mesh_shape`, or if the number of devices
      belonging to any single slice does not equal the product of
      `mesh_shape`.

  Returns:
    A np.ndarray of JAX devices whose shape is the elementwise product of
    mesh_shape and dcn_mesh_shape, which can be fed into jax.sharding.Mesh for
    hybrid parallelism.
  """
  if devices is None:
    devices = xb.devices()
  attr = 'process_index' if process_is_granule else 'slice_index'
  assert hasattr(devices[0], attr)
  # Group devices into granules (slices or processes).
  granule_dict = collections.defaultdict(list)
  for dev in devices:
    granule_dict[getattr(dev, attr)].append(dev)
  granules = (
      [granule_dict[key] for key in sorted(granule_dict.keys())]
      if should_sort_granules_by_key
      else granule_dict.values()
  )
  if np.prod(dcn_mesh_shape) != len(granules):
    raise ValueError(
        f'Number of slices {len(granules)} must equal the product of '
        f'dcn_mesh_shape {dcn_mesh_shape}'
    )
  # One inner (ICI) mesh per granule.
  per_granule_meshes = [
      create_device_mesh(
          mesh_shape,
          granule,
          allow_split_physical_axes=allow_split_physical_axes,
      )
      for granule in granules
  ]
  # Tile the per-granule meshes in the layout given by dcn_mesh_shape.
  granule_mesh = np.arange(len(granules)).reshape(dcn_mesh_shape)
  blocks = np.vectorize(lambda i: per_granule_meshes[i], otypes=[object])(
      granule_mesh
  )
  device_mesh = np.block(blocks.tolist())
  return device_mesh
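`create_hybrid_device_mesh` groups devices into granules (slices or processes) and builds one mesh per granule. It then tiles those meshes with `np.block` in the layout given by `dcn_mesh_shape`. The sketch below shows that assembly under three assumptions. Devices are integers, a hypothetical `granule_of` callback replaces the `slice_index`/`process_index` attribute, and a plain reshape replaces the per-granule `create_device_mesh` call.

```python
import collections

import numpy as np


def hybrid_mesh(mesh_shape, dcn_mesh_shape, devices, granule_of):
  """Sketch of create_hybrid_device_mesh's tiling (hypothetical helper)."""
  groups = collections.defaultdict(list)
  for dev in devices:
    groups[granule_of(dev)].append(dev)
  granules = [groups[key] for key in sorted(groups)]
  if np.prod(dcn_mesh_shape) != len(granules):
    raise ValueError('number of granules must equal prod(dcn_mesh_shape)')
  # Stand-in for create_device_mesh on each granule.
  per_granule = [np.asarray(g).reshape(mesh_shape) for g in granules]
  granule_mesh = np.arange(len(granules)).reshape(dcn_mesh_shape)
  # Object array whose elements are the per-granule meshes.
  blocks = np.vectorize(lambda i: per_granule[i], otypes=[object])(granule_mesh)
  return np.block(blocks.tolist())


# Two granules of 4 devices each (granule key = device // 4).
print(hybrid_mesh((2, 2), (1, 2), list(range(8)), lambda d: d // 4).tolist())
# [[0, 1, 4, 5], [2, 3, 6, 7]]
```

The result has shape (2, 4). That is the elementwise product of `mesh_shape` (2, 2) and `dcn_mesh_shape` (1, 2), with each granule occupying one contiguous block.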