Controlling memory usage
When using liquidsoap in production, it can be important to understand how to control the memory footprint of the application. This is not an easy topic as there are several layers of memory management inside the application and also some trade-off considerations between memory footprint and CPU usage.
As of writing (version 2.2.0), some of the trade-offs that we are making with the OCaml garbage collector do not seem satisfactory in some memory-intensive conditions. Hopefully, this will improve in future major releases (2.3.x and later).
But first, let’s look at what’s going on.
The OCaml memory model
The OCaml compiler provides a garbage collector. This module is able to track the memory blocks used by the OCaml program and free them, without the programmer’s intervention, once they are no longer in use.
This is done by scanning the memory currently allocated by the OCaml program to identify the memory blocks that are not in use anymore. While this is transparent to the user (you!), this also means that there will be extra CPU cycles dedicated to this operation.
How often these cycles occur helps control the growth of unused memory, with the understanding that minimizing unused memory requires dedicating more CPU cycles to tracking it.
You can find more information about the OCaml garbage collection on this page.
Inside liquidsoap scripts, the operations that the OCaml compiler provides to control the garbage collector are available within the runtime.gc module. The documentation for these operations can be found in the OCaml Gc module documentation.
Typically, to change the garbage collector parameters, one can do:
# This code was contributed by AzuraCast. Possible settings:
# - less memory: space_overhead = 20
# - less cpu: space_overhead = 140
# - balanced: space_overhead = 80
# Optimize for memory usage over CPU: this results in a slightly increased CPU
# usage and reduced memory usage.
runtime.gc.set(runtime.gc.get().{space_overhead=20, allocation_policy=2})
These parameters and functions make it possible to experiment and see if you can find better parameters for your application.
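When experimenting, it helps to check which values are actually in effect before and after a change. The following is a minimal sketch, assuming that the record returned by runtime.gc.get exposes space_overhead as a readable field (the update above relies on that field) and using the balanced value from the comment above:

```liquidsoap
# Read the garbage collector parameters currently in effect.
params = runtime.gc.get()
print("space_overhead before: #{params.space_overhead}")

# Apply the balanced value suggested above, leaving other fields unchanged.
runtime.gc.set(params.{space_overhead=80})

# Read the parameters back to confirm that the change was applied.
updated = runtime.gc.get()
print("space_overhead after: #{updated.space_overhead}")
```

Changing one field at a time and observing memory and CPU usage under a realistic load is probably the most reliable way to compare settings.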
C memory allocations
Not all the memory in the application is allocated by the OCaml garbage collector. External libraries such as ffmpeg, libmp3lame, etc. need to allocate their own memory. This is usually referred to as C memory allocations, though it does not have to be allocated by a program written in C. Another, more technically appropriate term is heap memory, though memory dynamically allocated by the OCaml garbage collector also lives in the program’s heap. 😅
This type of memory is also cleaned up by the OCaml garbage collector. To do so, a custom block is passed to the OCaml program with a reference to a C memory pointer and how to clean it up. When the OCaml program detects that this custom block is no longer in use, it triggers the required operations to clean its corresponding C memory.
However, things get complicated when considering how to fine-tune the garbage collector to account for memory allocated on the C side.
Remember that, as we discussed in the previous section, the garbage collector has to consume CPU cycles to free up memory. And, in the case of memory allocated on the C side, a single OCaml value (usually a small amount of memory) can actually refer to a much larger amount of C memory. This is typically the case when the corresponding C memory represents decoded video frames, which is usually a fairly large amount of memory.
In general, the trade-off is: if the garbage collector does not run often enough, many of these rather large C memory blocks linger longer than needed, which can lead to a huge amount of memory being needlessly consumed by the application.
Conversely, if the garbage collector runs too often, memory usage is controlled but CPU usage is increased.
As of now, the strategy implemented by the OCaml compiler consists in tracking the ratio of OCaml-held memory vs. its corresponding C memory and running the garbage collector more often when this ratio increases. However, this is not optimal in cases where the application purposefully holds large amounts of C memory, such as when doing video processing.
In the future, we would like to explore tightening up our control of this mechanism. It should be possible to trick the garbage collector by not declaring the full amount of allocated C memory, which would make it possible to run the memory cleaning operations on purpose and at specific times, typically after a streaming cycle has ended.
Most of the tools for that are already exported in the scripting language, so we will make sure to report our progress on the blog for anyone to test it.
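In the meantime, one way to experiment with running collections at chosen times is to trigger a full major collection from the script. The sketch below is an illustration under stated assumptions, not the planned mechanism: it assumes that runtime.gc.full_major is exported (mirroring Gc.full_major from the OCaml Gc module) and that thread.run accepts an every argument expressed in seconds. The 60 second interval is an arbitrary, hypothetical value.

```liquidsoap
# Hypothetical interval, in seconds, between forced collections.
gc_interval = 60.

# Assumption: runtime.gc.full_major mirrors OCaml's Gc.full_major, and
# thread.run re-runs the given function every gc_interval seconds.
thread.run(every=gc_interval, fun () -> runtime.gc.full_major())
```

A forced collection costs CPU each time it runs, so it is worth measuring both memory and CPU usage before keeping something like this in production.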
Audio data format
Another source of memory usage is the audio data format. By default, we store audio data using OCaml’s native floating point numbers in order to be able to run the application, including audio processing (crossfade, filters, fades, etc.), at the best possible speed and CPU usage. However, OCaml’s native floats are stored using 64 bits (8 bytes), which is a large amount of memory per sample.
If you are concerned with reducing your audio memory footprint, for instance if your application has a lot of audio sources with buffers, you can do a couple of things:
- Use the ffmpeg raw content.
This means storing all the audio content as ffmpeg audio frames. This is an opaque format that works very well if your script can use ffmpeg end-to-end, for instance by processing audio using ffmpeg filters.
- Use one of the pcm_f32 or pcm_s16 audio formats.
These formats are less opaque. Their data is stored in a C memory array and can be accessed by the OCaml program. Some, but not all, of our operators support them transparently. When using pcm_s16, audio samples are stored as 16-bit signed integers (2 bytes, the audio CD format). When using pcm_f32, audio samples are stored as 32-bit floats (4 bytes). 16-bit signed integers are probably enough for most applications and consume a quarter of the memory used by OCaml’s native floating point numbers.
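To give a sense of scale, the per-sample sizes above can be turned into buffer sizes. The sketch below is plain arithmetic: the audio_bytes helper is hypothetical, and the one hour, stereo, 44100 Hz figures are assumed values chosen only for illustration.

```liquidsoap
# Hypothetical helper: bytes needed to hold `seconds` of audio.
def audio_bytes(~bytes_per_sample, seconds) =
  # Assumed sample rate, in Hz.
  samplerate = 44100
  # Assumed stereo audio.
  channels = 2
  bytes_per_sample * samplerate * channels * seconds
end

# One hour of audio in each of the three formats discussed above.
print("native float: #{audio_bytes(bytes_per_sample=8, 3600)} bytes")
print("pcm_f32: #{audio_bytes(bytes_per_sample=4, 3600)} bytes")
print("pcm_s16: #{audio_bytes(bytes_per_sample=2, 3600)} bytes")
```

Under these assumptions, one hour of audio takes roughly 2.5 GB as native floats, 1.3 GB as pcm_f32 and 0.6 GB as pcm_s16, not counting any overhead from the containers holding the samples.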
The pcm_* formats can be required by the encoders by adding pcm_s16 or pcm_f32 to their list of parameters. This will, in turn, inform all operators and decoders to operate with this format, if they support it:
# Mp3 encoder, pcm_s16
encoder = %mp3(pcm_s16, channels=2)
# Ogg/vorbis encoder, pcm_f32
encoder = %ogg(%vorbis(pcm_f32))
# FFmpeg AAC encoder, pcm_s16
encoder = %ffmpeg(format="mp4",%audio(pcm_s16, codec="aac"))
For both pcm_* and ffmpeg raw formats, you can also use conversion functions (ffmpeg.raw.decode.*, ffmpeg.raw.encode.*, audio.decode.pcm_*, audio.encode.pcm_*) to convert content back and forth.
In general, working with the pcm_* formats is easier. If you know what you are doing, though, working with raw FFmpeg frames can also have some advantages. In both cases, there might be an increase in CPU usage if your script needs to process audio (for instance via a crossfade) when converting these formats back and forth.
Finally, if you need to store a large amount of audio data, for instance to create a one hour delay, you should consider using the track.audio.defer operator, which was designed for this purpose.
jemalloc
Lastly, the user-land memory allocator jemalloc can be used to control all memory allocations (C and OCaml). This allocator is particularly good at preventing memory fragmentation, which is an important topic for an application like liquidsoap that runs short streaming cycles involving small amounts of memory (FFmpeg frames, etc.).
The allocator is enabled by installing the jemalloc opam package and is included in all our production builds (except Windows). It also comes with a lot of customization options that are exported via the runtime.jemalloc.* functions.
If you want to explore more, we recommend reading about it and then exploring the manual page which contains details about all the available settings.