<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">THESIS</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Courdier_THESIS_2024/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">Fast and Future: Towards Efficient Forecasting in Video Semantic Segmentation</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Courdier, Evann</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">ambiguous segmentation</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">discrete diffusion</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">efficient transformers</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">future segmentation</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">patch pausing</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">real-time segmentation</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">semantic segmentation</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/papers/2024/Courdier_THESIS_2024.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2024</subfield>
			<subfield code="b">Ecole polytechnique fédérale de Lausanne (EPFL)</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2=" ">
			<subfield code="u">https://infoscience.epfl.ch/handle/20.500.14299/203213</subfield>
			<subfield code="z">URL</subfield>
		</datafield>
		<datafield tag="024" ind1="7" ind2=" ">
			<subfield code="a">10.5075/epfl-thesis-9858</subfield>
			<subfield code="2">doi</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">Deep learning has revolutionized the field of computer vision, a success largely attributable
to the growing size of models, datasets, and computational power. Simultaneously, a critical
pain point arises as several computer vision applications are deployed on low-power
embedded devices, necessitating real-time processing capabilities. This challenge intensifies
for semantic segmentation, a dense prediction task demanding substantial memory
and computational resources. This thesis explores techniques to streamline real-time
segmentation networks, enhance their efficiency, and deal with potential ambiguity.
First, we introduce a latency-aware segmentation metric, a measure that combines the
mean Intersection over Union with the network processing time, providing a practical
metric for applied settings. Emphasis is placed on the concept of "anticipation" in real-time
networks - these systems should be capable of predicting future input segmentation.
Consequently, we then design an anticipatory convolutional network incorporating an
inventive convolution layer. This novel layer reduces computation by reusing features
from previous video frame computations, exploiting their temporal coherence. Next, we
present a method to accelerate transformer-based segmentation networks called ‘patch-pausing’.
This technique halts the processing of image patches deemed to be already
correctly segmented by assessing the network’s confidence in its prediction. Remarkably,
our experimental results indicate that more than half of the patches can be paused early in
the process, with a minimal impact on segmentation accuracy. This study concludes with
the introduction of a discrete diffusion model for segmentation. This model allows for the
sampling of multiple potential segmentations for a given input while accurately following
the training data distribution. Combining this diffusion model within an autoregressive
scheme, we successfully showcase its capacity to generate long-term future predictions of
segmentation.
The implementation and evaluation of these approaches contribute to the ongoing efforts
to improve real-time segmentation networks and facilitate more efficient deployment of
computer vision applications on low-power devices.</subfield>
		</datafield>
	</record>
</collection>