Mojo nightly

Version: 1.0.0b2.dev2026052006

May 20, 2026

This version is still a work in progress.

Language enhancements

Types can parameterize the out argument modifier when they want into being bindable to alternate address spaces, e.g.:

struct MemType(Movable):
  # Can be constructed into any address space.
  def __init__[addr_space: AddressSpace](out[addr_space] self):
      ...

  # Only constructable into GLOBAL address space.
  def __init__(arg: Int, out[AddressSpace.GLOBAL] self):
      ...

Mojo now supports building types that support implicit conversions for widening origins, allowing code like this to "just work" without rebind:

def origin_superset_conversion(
  a: String, b: String, c: Bool
) -> Pointer[String, origin_of(a, b)]:
  if c:  # These pointers implicitly convert.
      return Pointer(to=a)
  else:
      return Pointer(to=b)

Language changes

Support for "set-only" accessors has been removed. You need to define a __getitem__ or __getattr__ to use a type that defines the corresponding setter. This eliminates a class of bugs determining the effective element type.
Implicit std imports are now an error, following a period of deprecation. Imports from the standard library must now be fully qualified. The compiler thus no longer squats on these module names, paving the way for user modules named algorithm, memory, etc.

Library changes

Coord, coord(), Idx, ComptimeInt, RuntimeInt, and related coordinate helpers now live in the standard library module std.utils.coord. The layout.coord module re-exports the same symbols for layout and kernel code; layout also hoists the common names at package scope for convenience.
Python -> Mojo FFI calls registered through PythonModuleBuilder and PythonTypeBuilder have significantly reduced per-call overhead:
- Non-kwargs callables registered with def_function / def_method / def_staticmethod now use CPython's METH_FASTCALL calling convention rather than METH_VARARGS. Kwargs-accepting functions still use METH_VARARGS | METH_KEYWORDS.
- PythonObject.__del__ skips the PyGILState_Ensure / PyGILState_Release round-trip when the current thread already holds the GIL (checked via PyGILState_Check). On the common Python -> Mojo FFI path (where CPython hands the callee an already-held GIL) the destructor pays just the check and a direct Py_DecRef. The public contract is unchanged - dropping a PythonObject from a thread that does not hold the GIL remains safe.
- Int(py=obj) and Scalar[IntDType](py=obj) fast-path exact Python int via PyLong_AsSsize_t.
Added TileTensor.copy_from() and TileTensor.split() for copying between compatible tile views and splitting tiles into static or runtime-sized partitions.
String.as_bytes_mut() has been renamed to String.unsafe_as_bytes_mut(), to reflect that writing invalid UTF-8 to the resulting Span[Byte] can lead to later issues like out of bounds access.
List[T] no longer requires its type to be Copyable, but now works with Movable-only types. Iteration still requires Copyable and will emit a comptime assert if not satisfied.

reflect[T] is now a comptime alias for the Reflected[T] handle type rather than a function returning a zero-sized handle instance. All methods on Reflected[T] are @staticmethods, and the type is no longer constructible. Drop the parens at call sites:

# Before
comptime r = reflect[Point]()
print(r.field_count())
print(reflect[Point]().name())
comptime y_handle = reflect[Point]().field_type["y"]()
var v: y_handle.T = 3.14

# After
comptime r = reflect[Point]
print(r.field_count())
print(reflect[Point].name())
comptime y_handle = reflect[Point].field_type["y"]
var v: y_handle.T = 3.14

field_type[name] is now a parametric comptime member alias that yields Reflected[FieldT] directly — no trailing (), and the result is fully composable (e.g. reflect[T].field_type["x"].name()). The previously deprecated free functions get_type_name, get_base_type_name, and the struct_field_* family (along with the ReflectedType[T] wrapper) have been removed; use the corresponding methods on reflect[T]:

Removed	Replacement
`get_type_name[T]()`	`reflect[T].name()`
`get_base_type_name[T]()`	`reflect[T].base_name()`
`is_struct_type[T]()`	`reflect[T].is_struct()`
`struct_field_count[T]()`	`reflect[T].field_count()`
`struct_field_names[T]()`	`reflect[T].field_names()`
`struct_field_types[T]()`	`reflect[T].field_types()`
`struct_field_index_by_name[T, name]()`	`reflect[T].field_index[name]()`
`struct_field_type_by_name[T, name]()`	`reflect[T].field_type[name]`
`struct_field_ref[idx, T](s)`	`reflect[T].field_ref[idx](s)`
`offset_of[T, name=name]()`	`reflect[T].field_offset[name=name]()`
`offset_of[T, index=index]()`	`reflect[T].field_offset[index=index]()`
`ReflectedType[T]`	`Reflected[T]`

Added ReflectedFn[func], a function-side reflection handle accessed via the reflect_fn[func] comptime alias. Exposes function introspection through static methods, paralleling the type-side Reflected[T] API:

from std.reflection import reflect_fn

def my_func(x: Int) -> Int:
    return x + 1

def main():
    print(reflect_fn[my_func].display_name())  # "my_func"
    print(reflect_fn[my_func].linkage_name())  # mangled symbol name

Added alloc, free, and Layout in memory.alloc for layout-aware memory allocation. A Layout[T] bundles an element count and alignment into a single value that is passed to both alloc and free, keeping size and alignment requirements explicit and co-located at every call site.
```
from memory import alloc, free, Layout

var layout = Layout[Int32](count=4)
var ptr = alloc(layout)
# ... initialize & use ptr ...
free(ptr, layout)
```
The default seed for random.Random, random.NormalRandom, and the internal _PhiloxWrapper has changed from 0 to 0x3D30F19CD101 (67280421310721) to match PyTorch's at::Philox4_32_10 default. Calls that omitted the seed argument will now produce a different output stream; pass seed=0 explicitly to keep the previous behavior.
Added nth() as a default method on the Iterator trait. It advances the iterator by n elements (destroying them) and returns the next element, or None if the iterator runs out before reaching index n.
```
var l = [10, 20, 30, 40]
print(iter(l).nth(0).value())   # 10
print(iter(l).nth(3).value())   # 40
var missing = iter(l).nth(10)   # None (Optional)
```
String and StringSlice now have a keyword only string[codepoint=...] that indexes by unicode codepoint offsets.
Several StringSlice constructors are now deprecated.
- StringSlice(ptr=..., length=...) is deprecated; use StringSlice(unsafe_from_utf8=Span(...)) instead.
- StringSlice(unsafe_from_utf8_ptr=...) (taking a raw nul-terminated UnsafePointer[Byte] or UnsafePointer[c_char]) is deprecated; construct a CStringSlice from the pointer and use the new StringSlice(unsafe_from_utf8=CStringSlice(...)) constructor instead.

PythonObject convertibility got simplified and cleaned up. When working with types that required custom conversions to PythonObject, we used to write code like this:

struct MyCustomType(ConvertibleToPython, ImplicitlyCopyable):
   def to_python_object(var self) raises -> PythonObject:
      return PythonObject( ... custom logic ...)

def hi_python(a: Some[ImplicitlyCopyable & ConvertibleToPython]) raises:
    print(t"Hi, {a.to_python_object()}!")

def example():
    hi_python(MyCustomType())

This approach allows custom types to implement ConvertibleToPython to get a domain specific encoding as a Python object. Mojo has simplified this by making all ConvertibleToPython types implicitly convert to PythonObject, so this can/should be simplified to:

def hi_python(a: PythonObject) raises:
    print(t"Hi, {a}!")

The Intable trait no longer inherits from ImplicitlyDestructible. Generic code that relied on receiving the destructor bound transitively through this trait must now spell it out explicitly, for example T: Intable & ImplicitlyDestructible.

Tooling changes

The mojo compiler will now print the filename and line number in diagnostics that point to inaccessible source locations (e.g., from precompiled libraries) instead of a location at the top of the main file:

# Before
$> mojo example.mojo
/path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraint
   vec[base + i] = values[i].cast[dtype]()
   ~~~^~~~~~~~~~

/path/to/example.mojo:1:1: note: constraint declared here evaluated to False, expected 'mut'
from std.algorithm.functional import elementwise
^
/path/to/example.mojo:1:1: note: function declared here
from std.algorithm.functional import elementwise
^

# After
$> mojo example.mojo
/path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraint
    vec[base + i] = values[i].cast[dtype]()
    ~~~^~~~~~~~~~

max/kernels/src/layout/layout_tensor.mojo:2092: note: constraint declared here evaluated to False, expected 'mut'
max/kernels/src/layout/layout_tensor.mojo:2090: note: function declared here

The mojo package command has renamed to mojo precompile. Similarly, the .mojopkg file extension has been deprecated; favor the .mojoc file extension instead.
```
# Before
mojo package my_package -o my_package.mojopkg

# After
mojo precompile my_package -o my_package.mojoc
```

GPU programming

Added DeviceContextList[size] in std.gpu.host: a fixed-size, Copyable/ImplicitlyCopyable/Sized collection of DeviceContext values. Multi-device custom-op execute methods now receive a DeviceContextList[N] — the graph compiler synthesizes one from the per-device contexts attached to the op via a variadic constructor. Kernels can index into it with dev_ctxs[i] (runtime) or dev_ctxs.__getitem_param__[i]() (comptime), and iterate with len(). This replaces the previous DeviceContextPtrList pattern.

from gpu.host import DeviceContext, DeviceContextList

@compiler.register("mo.distributed.allreduce.sum")
struct DistributedAllReduceSum:
    @staticmethod
    def execute[
        dtype: DType, rank: Int, target: StaticString, _trace_name: StaticString,
    ](
        outputs: FusedOutputVariadicTensors[dtype=dtype, rank=rank, ...],
        inputs: InputVariadicTensors[dtype=dtype, rank=rank, ...],
        signal_buffers: MutableInputVariadicTensors[dtype=DType.uint8, rank=1, ...],
        dev_ctxs: DeviceContextList,
    ) capturing raises:
        comptime num_devices = inputs.size
        # ... use dev_ctxs[i] per device ...

DeviceContext.enqueue_function[func] and DeviceContext.compile_function[func] now accept a single kernel argument instead of requiring it to be passed twice. The previous two-argument forms enqueue_function[func, func] and compile_function[func, func] are deprecated. The transitional enqueue_function_experimental and compile_function_experimental aliases are also deprecated; switch to enqueue_function / compile_function.
```
# Before
ctx.enqueue_function[my_kernel, my_kernel](grid_dim=1, block_dim=1)
ctx.enqueue_function_experimental[my_kernel](grid_dim=1, block_dim=1)

# After
ctx.enqueue_function[my_kernel](grid_dim=1, block_dim=1)
```

Removed

The DeviceContextPtr and DeviceContextPtrList types have been removed from std.runtime.asyncrt. Custom-op execute methods now take DeviceContext directly (or Optional[DeviceContext] where the context is genuinely optional), and multi-device ops take DeviceContextList[N] (see the new entry under Library changes). The helpers get_device_context() and get_optional_device_context() are no longer needed — pass the DeviceContext through directly. The CpuDeviceContext runtime always supplies a real context for the CPU path, so the nullable wrapper is no longer required.

# Before
from runtime.asyncrt import DeviceContextPtr, DeviceContextPtrList

@compiler.register("my_op")
struct MyOp:
    @staticmethod
    def execute[target: StaticString](
        output: OutputTensor,
        input: InputTensor,
        ctx: DeviceContextPtr,
    ) raises:
        var gpu_ctx = ctx.get_device_context()
        ...

# After
from gpu.host import DeviceContext

@compiler.register("my_op")
struct MyOp:
    @staticmethod
    def execute[target: StaticString](
        output: OutputTensor,
        input: InputTensor,
        ctx: DeviceContext,
    ) raises:
        ...

The use_blocking_impl parameter has been removed from elementwise (in std.algorithm.functional), and the analogous single_thread_blocking_override parameter has been removed from the reduction APIs (reduce, max, min, sum, product, mean in std.algorithm.reduction). These operations now always dispatch work the same way, with a single worker used automatically when the problem size is small, so the blocking variants are no longer needed.
The legacy fn keyword now produces an error instead of a warning. Please move to def.
The previously-deprecated constrained[cond, msg]() function has been removed. Use comptime assert cond, msg instead.
The previously-deprecated Int-returning overload of normalize_index has been removed. Use the UInt-returning overload (or write the index arithmetic inline, e.g. x[len(x) - 1]).
The previously-deprecated default UnsafePointer() null constructor has been removed. To model a nullable pointer use Optional[UnsafePointer[...]]. For a non-null placeholder for delayed initialization, use UnsafePointer.unsafe_dangling().
The deprecated free-function reflection API in std.reflection has been removed. Use the unified reflect[T]() -> Reflected[T] API instead.

Migration table:
- struct_field_count[T]() → reflect[T]().field_count()
- struct_field_names[T]() → reflect[T]().field_names()
- struct_field_types[T]() → reflect[T]().field_types()
- struct_field_index_by_name[T, name]() → reflect[T]().field_index[name]()
- struct_field_type_by_name[T, name]() → reflect[T]().field_type[name]()
- struct_field_ref[idx](s) → reflect[T]().field_ref[idx](s)
- is_struct_type[T]() → reflect[T]().is_struct()
- offset_of[T, name=...]() → reflect[T]().field_offset[name=...]()
- offset_of[T, index=...]() → reflect[T]().field_offset[index=...]()
- ReflectedType[T] → Reflected[T]
String and StringSlice can now be sliced by codepoints, e.g. String("🔄🔥🔄")[codepoint=1:2] returns "🔥".
String and StringSlice can now be indexed by graphemes, e.g. String("👨‍🚀🧑‍🌾क्षि")[grapheme=1] returns "🧑‍🌾".

Fixed

Reduced the virtual address space reserved by every mojo invocation by ~1 GiB. The JIT memory mapper's reservation granularity was 1 GiB, so each fresh reservation was rounded up to that size and mmapped PROT_READ|PROT_WRITE, inflating VmPeak and counting against Linux RLIMIT_AS. This caused non-deterministic OOM crashes in libKGENCompilerRTShared.so when two mojo processes ran concurrently on memory-constrained CI runners (e.g. GitHub Actions free-tier, 7 GiB). The granularity is now 64 MiB; large compiles still work because the mapper reserves additional slabs on demand. (Issue #6433)
Attempting to import a source Mojo package from a broken symlink will no longer result in a compiler crash. (Issue #6424)

Language enhancements​

Language changes​

Library changes​

Tooling changes​

GPU programming​

Removed​

Fixed​