As we are nearing the release of liquidsoap 2.2.0, some of our users have raised concerns about memory usage in the application. We have made a couple of solid improvements here and there, but there is probably more to do.
As new functionalities are added and get everyone excited, it’s easy to lose track of the fact that more features can also mean more resource consumption. In the liquidsoap scripting language, typically, the addition of records and methods has led to an amazing series of API cleanups and new features but, as a result, our standard library has grown a lot and pretty much all runtime values now have methods attached to them.
Likewise, the new multitrack feature can also increase resource usage, mostly CPU, because a source can now potentially be queried several times per streaming loop, once for each content type (audio, video, etc.). This should be under control now, though!
Another aspect of our iterative improvements that can lead to a temporary memory increase is long-running, under-the-hood changes. Since 2.1.0, we have been slowly moving toward an immutable content API. That is, an API where media content is decoded as elementary chunks and passed down as-is through the streaming process before eventually being delivered to the outputs. Only operators that actually have to modify content (amplify, crossfade, etc.) would then create new content.
Immutable content makes a lot of sense when dealing with video content and fits better with the FFmpeg content API. Internally, we are now using immutable content chunks but the operators still need to be migrated, which can lead to a temporary increase in resource usage (CPU and memory).
Lastly, as the application is growing and moving toward video processing, we are also hitting some technical limitations of the OCaml compiler and garbage collector. This was reported here.
Needless to say, we plan on addressing all of these either as 2.2.x follow-ups, when possible, or as part of the larger changes planned for 2.3.x. Meanwhile, we have written some documentation on memory usage and how you can tweak and optimize it:
Controlling memory usage
When using liquidsoap in production, it can be important to understand how to control the memory footprint of the application. This is not an easy topic as there are several layers of memory management inside the application and also some trade-off considerations between memory footprint and CPU usage.
As of writing (version 2.2.0), some of the trade-offs that we are making with the OCaml garbage collector do not seem satisfactory in some memory-intensive conditions. Hopefully, this will improve in future major releases (2.3.x and later).
But first, let’s look at what’s going on.
The OCaml memory model
The OCaml compiler provides a garbage collector. This module tracks the memory blocks used by the OCaml program and frees them, without the programmer’s intervention, once they are no longer in use.
This is done by scanning the memory currently allocated by the OCaml program to identify the memory blocks that are not in use anymore. While this is transparent to the user (you!), this also means that there will be extra CPU cycles dedicated to this operation.
How often these cycles occur helps control the growth of unused memory, with the understanding that minimizing unused memory requires dedicating more CPU cycles to tracking it.
You can find more information about OCaml garbage collection on this page.
Inside liquidsoap scripts, the operations that the OCaml compiler provides to control the garbage collector are available within the runtime.gc module. The documentation for these operations can be found in the OCaml Gc module documentation.
Typically, to change the garbage collector parameters, one can do:
# This code was contributed by AzuraCast:
# Possible settings:
# - less memory: space_overhead = 20
# - less cpu: space_overhead = 140
# - balanced: space_overhead = 80
# Optimize for memory usage over CPU
# This results in a slightly increased
# CPU usage and reduced memory usage.
runtime.gc.set(runtime.gc.get().{
  space_overhead = 20,
  allocation_policy = 2
})
These parameters and functions make it possible to experiment and see if you can find better parameters for your application.
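To get a feel for what these presets mean: the OCaml Gc documentation describes space_overhead as the memory that is "wasted" because unreachable blocks are not collected immediately, expressed as a percentage of the memory used for live data. The Python sketch below turns that definition into rough numbers. The 200 MB of live data is a hypothetical figure chosen for illustration, and the result is only a steady-state estimate of the OCaml heap. It says nothing about memory allocated on the C side.

```python
# Rough estimate of the OCaml heap size implied by a space_overhead value.
# Presets are the ones listed in the script comments above.
PRESETS = {"less memory": 20, "balanced": 80, "less cpu": 140}

# Hypothetical amount of live OCaml data, in MB (an assumption, not a measurement).
LIVE_MB = 200


def estimated_heap_mb(live_mb, space_overhead):
    """Live data plus the share the GC is allowed to leave uncollected.

    space_overhead is a percentage of the live data, so the estimate is
    live * (100 + space_overhead) / 100.
    """
    return live_mb * (100 + space_overhead) / 100


for name, overhead in PRESETS.items():
    estimate = estimated_heap_mb(LIVE_MB, overhead)
    print(f"{name} (space_overhead = {overhead}): about {estimate:.0f} MB")
```

Lower values keep the heap closer to the live data but make the collector work more often, which is why the "less memory" preset costs more CPU.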
C memory allocations
Not all the memory in the application is allocated by the OCaml garbage collector. External libraries such as ffmpeg, libmp3lame, etc. need to allocate their own memory. This is usually referred to as C memory allocations, though it does not have to be allocated by a program written in C. Another, more technically appropriate name is heap memory, though memory dynamically allocated by the OCaml garbage collector also lives in the program’s heap. 😅
This type of memory is also cleaned up by the OCaml garbage collector. To do so, a custom block is passed to the OCaml program with a reference to a C memory pointer and how to clean it up. When the OCaml program detects that this custom block is no longer in use, it triggers the required operations to clean its corresponding C memory.
However, things get complicated when considering how to fine-tune the garbage collector to account for memory allocated on the C side.
Remember that, as we discussed in the previous section, the garbage collector has to consume CPU cycles to free up memory. And, in the case of memory allocated on the C side, a single OCaml value (usually a small amount of memory) can actually refer to a much larger amount of C memory. This is typically the case when the corresponding C memory represents decoded video frames, which is usually a fairly large amount of memory.
In general, the trade-off is: if the garbage collector does not run often enough, a lot of these rather large C memory blocks linger longer than needed, which can lead to a huge amount of memory being needlessly consumed by the application.
Conversely, if the garbage collector runs too often, memory usage is controlled but CPU usage is increased.
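To put numbers on this trade-off, the Python sketch below estimates how much C memory decoded video frames occupy. The frame size, pixel format (yuv420p, 12 bits per pixel) and frame rate are assumptions chosen for illustration, not liquidsoap defaults, and real FFmpeg frames add padding and alignment on top of these figures.

```python
# Back-of-the-envelope size of decoded video frames held in C memory.
WIDTH, HEIGHT = 1920, 1080  # assumed frame size
FPS = 25                    # assumed frame rate


def yuv420p_frame_bytes(width, height):
    """Bytes in one 8-bit yuv420p frame, ignoring padding and alignment.

    One full-resolution luma plane plus two chroma planes subsampled by
    two in each direction.
    """
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma


frame_bytes = yuv420p_frame_bytes(WIDTH, HEIGHT)
one_second_bytes = frame_bytes * FPS

print(f"one frame: {frame_bytes / 1_000_000:.1f} MB")
print(f"one second of frames: {one_second_bytes / 1_000_000:.1f} MB")
```

Under these assumptions, each small OCaml value that wraps a frame stands for a few megabytes of C memory, so even a short delay in collecting them is visible in the memory footprint of the process.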
As of now, the strategy implemented by the OCaml compiler consists in tracking the ratio of OCaml held memory vs. its corresponding C memory and running the garbage collector more often when this ratio increases. However, this is not optimal in cases where the application purposefully holds a large amount of C memory, such as when doing video processing.
In the future, we would like to explore tightening up our control of this mechanism. It should be possible to trick the garbage collector by not declaring the full amount of allocated C memory, and then to run the memory cleaning operations on purpose and at specific times, typically after a streaming cycle has ended.
Most of the tools for that are already exported in the scripting language, so we will make sure to report our progress on the blog for anyone to test it.
Audio data format
Another source of memory usage is the audio data format. By default, we store audio data using OCaml’s native floating point numbers in order to be able to run the application, including audio processing (crossfade, filters, fades, etc.), at the best possible speed and CPU usage. However, OCaml’s native floats are stored using 64 bits (8 bytes), which is a large amount of memory per number.
If you are concerned with reducing your audio memory footprint, for instance if your application has a lot of audio sources with buffers, you can do a couple of things:
- Use the ffmpeg raw content.
This means storing all the audio content as ffmpeg audio frames. This is an opaque format that works very well if your script can use ffmpeg end-to-end, for instance processing audio using ffmpeg filters.
- Use one of the pcm_f32 or pcm_s16 audio formats.
These formats are less opaque. Their data is stored in a C memory array and can be accessed by the OCaml program. Some, but not all, of our operators support them transparently. When using pcm_s16, audio samples are stored as 16-bit signed integers (2 bytes, the audio CD format). When using pcm_f32, audio samples are stored as 32-bit floats (4 bytes). 16-bit signed integers are probably enough for most applications and consume 4 times less memory than OCaml’s native floating point numbers.
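To make the difference concrete, the Python sketch below compares the memory needed to buffer one minute of audio in each representation. The sample sizes come from the text above. The 44.1 kHz stereo setup is an assumption chosen for illustration, and the figures only cover raw sample data, not per-frame or per-buffer bookkeeping.

```python
# Raw sample memory for buffered audio, per storage format.
BYTES_PER_SAMPLE = {
    "OCaml native float (default)": 8,
    "pcm_f32": 4,
    "pcm_s16": 2,
}

SAMPLE_RATE = 44100  # assumed samples per second, per channel
CHANNELS = 2         # assumed stereo


def buffer_bytes(seconds, bytes_per_sample):
    """Bytes needed to hold `seconds` of audio at the assumed rate and channels."""
    return seconds * SAMPLE_RATE * CHANNELS * bytes_per_sample


for name, size in BYTES_PER_SAMPLE.items():
    megabytes = buffer_bytes(60, size) / 1_000_000
    print(f"{name}: {megabytes:.1f} MB per minute of buffered audio")
```

The ratio between formats stays the same whatever the sample rate or channel count, so a script with many buffered sources benefits in proportion.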
The pcm_* formats can be required by the encoders by adding pcm_s16 or pcm_f32 to their list of parameters. This will, in turn, inform all operators and decoders to operate with this format, if they support it:
# Mp3 encoder, pcm_s16
encoder = %mp3(pcm_s16, channels=2)
# Ogg/vorbis encoder, pcm_f32
encoder = %ogg(%vorbis(pcm_f32))
# FFmpeg AAC encoder, pcm_s16
encoder = %ffmpeg(format="mp4",%audio(pcm_s16, codec="aac"))
For both pcm_* and ffmpeg raw formats, you can also use conversion functions (ffmpeg.raw.decode.*, ffmpeg.raw.encode.*, audio.decode.pcm_*, audio.encode.pcm_*) to convert content back and forth.
In general, working with the pcm_* formats is easier. If you know what you are doing, though, working with raw FFmpeg frames can also have some advantages. In both cases, there might be an increase in CPU usage if your script needs to process audio (for instance via a crossfade), because the content then has to be converted back and forth.
Finally, if you need to store a large amount of audio data, for instance to create a one-hour delay, you should consider using the track.audio.defer operator, which was designed for this purpose.
jemalloc
Lastly, the user-land memory allocator jemalloc can be used to control all memory allocations (C and OCaml). This allocator is particularly good at preventing memory fragmentation, which is an important topic for an application like liquidsoap that runs short streaming cycles involving small amounts of memory (FFmpeg frames, etc.).
The allocator is enabled by installing the jemalloc opam package and is included in all our production builds (except Windows). It also comes with a lot of customization options that are exported via the runtime.jemalloc.* functions.
If you want to explore more, we recommend reading about it and then exploring the manual page which contains details about all the available settings.