D1.1 reference platform_v1_20161215

TULIPP
H2020-ICT-O4-2015
Grant Agreement n° 688403
D1.1: Reference Platform v1
Authors
Organization Participant
THALES François Duhem
SUNDANCE Flemming Christensen
HIPPEROS Antonio Paolillo
RUB Lester Kalms
SYNECTIVE Magnus Peterson
IOSB Tobias Schuchert
NTNU Magnus Jahre
NTNU Ananya Muddukrishna
HIPPEROS Ben Rodriguez

REFERENCE:
TULIPP project – Grant
Agreement n° 688403
DATE: 15/12/2016
ISSUE: 1 PAGE: 2/47
Page 2 of 47
Document Description
Deliverable number D1.1
Deliverable title Reference Platform v1
Work Package WP1
Deliverable nature Report
Dissemination level Public
Contractual delivery date 2016-12-01
Actual delivery 2016-12-15
Version 1.1
Written by Approved by
Name
Signature
TULIPP consortium

Version history
Version Date Description
1.0 2016-11-01 Draft deliverable for project-wide and
PO review
1.1 2016-11-30 Updates to draft

Executive Summary
This deliverable presents the current progress towards defining the reference platform. It is the first in a
series of three documents that present refinements and extensions to the reference platform.
The reference platform is presented in the context of the starter kit, a conceptual package consisting of the
platform instance, project applications, and reference platform handbook. The aim of the starter kit is to
provide engineers with a generic evaluation platform that serves as a base for productively developing low
power image processing applications.
The platform instance is a physical processing system consisting of hardware, Operating System (OS), and
application development tools.
The reference platform handbook is a set of guidelines for low power image processing embedded systems.
We use guidelines as shorthand for the reference platform handbook.
Guidelines recommend application implementation methods supported by the platform instance. A
guideline is a goal-oriented, expert-formulated encapsulation of advice and recommended implementation
methods for low power image processing. A vendor platform that enables guidelines by providing suitable
implementation methods is called an instance. An instance is fully compliant if it provides recommended
implementation methods for all the guidelines that it supports. We envisage that compliance with guidelines
will be judged and certified by an independent body identified by the ecosystem of stakeholders created
during the project.
A project-wide workflow formulates and evaluates guidelines within project applications. The workflow
uses available expertise and runs continuously during and beyond the project. Off the critical path, the
workflow improves existing technology to facilitate evaluation of guidelines and encourage vendors to
support recommended implementation methods in their instances.
The output of the workflow at the end of the project is the reference platform handbook. The handbook is
vetted and maintained by an ecosystem of stakeholders created during the project. The handbook is not
finalized at the end of the project and can be updated by anyone who uses the workflow, subject to approval
from the maintainers of the handbook.
The project applications are selected to demonstrate the effectiveness of the platform instance and
guidelines. Selected from the automotive, UAV, and medical image processing domains, the project
applications impose challenging requirements and pose highly interesting design tradeoffs. Implementations
that meet the requirements can potentially have a lasting impact in their respective communities and act
as compelling success stories for the starter kit.
Since we have not formulated guidelines yet, the platform instance cannot be substantiated. However, an
initial platform is needed to evaluate guidelines and develop the platform instance. We have put together
the initial platform using our experience and prior work. The selected hardware component of the initial
platform is the Xilinx Zynq UltraScale+ MPSoC, the OS is the HIPPEROS RTOS, and the development tools are
those recommended by Xilinx for the Zynq MPSoC: Vivado and SDSoC.

Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Image processing embedded systems
1.2 Starter kit
1.3 Contributions
1.4 Structure
2 Reference Platform Handbook
2.1 Need for guidelines
2.2 Definition
2.3 Workflow to generate guidelines
2.4 Alignment with objectives
2.5 Handbook overview
2.6 Related work
2.7 Summary
3 Project Applications
3.1 Automotive image processing
3.1.1 Application selection
3.1.2 Exploiting key performance/energy factors
3.2 Medical image processing
3.3 UAV image processing
3.4 Summary
4 Initial Platform
4.1 Hardware components
4.1.1 Image processing hardware
4.1.2 Application requirements
4.1.3 Selection
4.2 Real-Time Operating System (RTOS)
4.2.1 Application requirements
4.2.2 Selection
4.3 Development tools
4.3.1 Xilinx Vivado
4.3.2 Xilinx SDSoC
4.4 Summary
5 Conclusions
6 References

List of Figures
Figure 1: Generic image processing embedded system.
Figure 2: Block diagram of Xilinx Zynq UltraScale+ MPSoC designed for Embedded Vision applications.
Figure 3: Starter kit for low power image processing.
Figure 4: Example of a guideline.
Figure 5: Instantiating platforms based on guidelines.
Figure 6: The workflow to generate and evaluate guidelines that define the platform instance (critical path in red).
Figure 7: Envisioned structure of the reference platform handbook.
Figure 8: Recent UAV application examples. Left: DHL Parcelcopter 3.0. Right: eHang 184.
Figure 9: Block diagram of an example SOC. Image Source: Xilinx Application Note XAPP1219.
Figure 10: SoC vendors divided by architecture.

List of Tables
Table 1: Camera standards
Table 2: High-end vision interfaces
Table 3: RTOS and application requirements
Table 4: HIPPEROS RTOS interfaces

1 Introduction
The TULIPP project has four objectives: (a) define a reference platform for low power image processing, (b)
instantiate the reference platform for specific applications, (c) improve the performance of each application
on its reference platform instance, and (d) set up an ecosystem that advances image processing norms using
the reference platform. Due to its central role, the reference platform together with its components is the
foremost deliverable. It essentially underpins all project objectives. All efforts in the project contribute
towards building the reference platform, either directly or indirectly.
This document presents the current progress towards building the TULIPP reference platform. The document
is the first in a series of three documents that present refinements and extensions to the reference platform.
This chapter is structured as follows. First, we position the reference platform in two relevant contexts —
image processing embedded systems and a conceptual package called the starter kit. Next, we state
contributions of the project made so far towards building the reference platform. Finally, we describe the
overall structure of the document.
1.1 Image processing embedded systems
Image processing embedded systems are instances of the reference platform.
Image processing embedded systems, also called embedded vision systems, literally see what is going on.
They capture static images or video of their environments through cameras and process them using signal
processing and artificial intelligence (AI) algorithms to understand and react to unfolding events. Due to the
wide-spread availability of cheap, efficient processors and cameras, image processing embedded systems
are part of mainstream technology today. Applications of image processing embedded systems include
automotive driver assistance systems (ADAS), vision-based medical devices, aerial surveillance, industrial
security and robotics, AI-based recognition, augmented reality, and home entertainment. With the growing
efficiency of processing and camera technology, image processing embedded systems are expected to
become a ubiquitous part of human society.
A high-level anatomy of a generic image processing embedded system is shown in Figure 1. Images captured
from cameras are processed using processors supported by memory. Both processors and memory are
specialized for high performance image processing. An example of a computing system that specifically
targets image processing applications is the Xilinx UltraScale+ MPSoC shown in Figure 2. Returning to Figure 1,
application specific interfaces connect to display screens, input devices, and other parts of the larger
system. For some applications, heavy processing can be offloaded to the cloud via network connections.
However, cloud offloading is unsuitable for many other applications, including those considered in our
project: they have hard real-time requirements with low latencies, whereas the response times of a cloud
system can vary widely. Cloud offloading also introduces additional reliability and security issues in many
cases. Therefore, cloud offloading is outside the scope of this project.
Figure 1: Generic image processing embedded system.

Figure 2: Block diagram of Xilinx Zynq UltraScale+ MPSoC designed for Embedded Vision applications [footnote 1].
1.2 Starter kit
The reference platform is bundled along with other project deliverables into a conceptual package called
the starter kit. The aim of the starter kit is to provide engineers with a generic evaluation platform that
serves as a base for productively developing low power high performance real-time image processing
applications. A high-level view of the starter kit is shown in Figure 3. It consists of the reference platform
handbook, project applications, and platform instance.
Footnote 1: https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html

Figure 3: Starter kit for low power image processing.
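The platform instance is a physical processing system consisting of hardware, Operating System (OS), and
application development tools. Components of the platform instance are specialized for low power image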
processing applications.
The project applications showcase the abilities of the platform instance. We have selected three
applications from automotive, medical, and Unmanned Aerial Vehicle (UAV) domains for the project. The
applications have challenging requirements and pose highly interesting design tradeoffs that are typical in
the world of low power high performance real-time image processing.
The reference platform is presented as a set of guidelines called the reference platform handbook. For
the rest of the document, we use guidelines as shorthand for the reference platform handbook.
Guidelines are focused on low power image processing. Structurally, they are an encapsulation of advice
and recommended implementation methods distilled from expert knowledge. Guidelines enable productive
application development by pruning infeasible paths from vast design/implementation spaces and by
ensuring inexperienced engineers avoid mistakes that are detected late and are costly to correct.
Guidelines influence the platform instance and project applications as shown in Figure 3. The platform
instance provides implementation methods recommended by guidelines. We define this process as
instantiation. Productive development of project applications is enabled through the advice offered by
guidelines. Therefore, the platform instance and project applications are demonstrations of the
effectiveness of guidelines.

1.3 Contributions
The main contributions of this deliverable to the TULIPP project are:
1. A conceptual package of project contributions called the starter kit consisting of the platform instance,
project applications, and reference platform handbook.
2. Conceptual definition of the reference platform handbook i.e., guidelines for building low power
image processing applications. The definition is aligned with all project objectives.
3. A project-wide workflow to formulate guidelines and extensively evaluate them. The workflow uses
available expertise and runs continuously during and beyond the project.
4. A detailed description of project applications selected to demonstrate the effectiveness of the
platform instance and reference platform handbook.
5. A detailed motivation of selection criteria and key features of the components of an initial platform
that serves as a base for the platform instance.
1.4 Structure
The rest of the document is structured to describe the starter kit in detail. Chapter 2 presents the conceptual
definition of the reference platform handbook i.e., guidelines, the implications of the definition, and a
project-wide workflow to formulate and evaluate guidelines. We present guidelines first because of their
strong influence on the rest of the starter kit i.e., on the platform instance and project applications. Chapter
3 describes the project applications. Chapter 4 describes the initial platform that serves as a base for the
platform instance. Chapter 5 presents conclusions and future work.

2 Reference Platform Handbook
In this chapter, we present the conceptual definition of the reference platform handbook, a part of the
starter kit. We repeat that we use guidelines as shorthand for the reference platform handbook.
The chapter is arranged as follows. First, we motivate the need for guidelines. Next, we concretely define
guidelines and explain the implications of the definition. Next, we present a project-wide workflow to
produce guidelines. Following that, we explain how the guidelines are aligned with project objectives. Then,
we present an overview of the handbook of guidelines delivered at the end of the project. Finally, we discuss
related work that focuses on guidelines and summarize.
2.1 Need for guidelines
Low power high performance real-time image processing embedded systems operate in constrained
environments. Typical constraints are low power and limited footprint, but other constraints may also be
present. For instance, high reliability and security can also be constraints. Example applications include
drone-based surveillance systems and portable medical devices. These systems place low power
consumption as a first-class requirement in addition to high performance and a small form factor.
The requirements of low power high performance real-time image processing embedded systems are
typically met by a complex combination of development steps involving proper component selection and
extraction of high performance from processors and memory. As an example, consider the steps involved
in extracting high performance from processors and memory. Teams of engineers implement high
performance image processing algorithms in software and custom hardware accelerators, intelligently
schedule computation on heterogeneous execution resources (see Figure 2 for resources exposed to
engineers), make efficient use of an increasingly complex memory hierarchy, decide on low power settings
for the OS, and so on. Low-level and non-intuitive performance analysis and debugging methods typically add
several trial-and-error iterations within each step, even for experts, and may also require complex model-based
analysis techniques.
Guidelines can help teams manage the complexity of building low power high performance real-time image
processing embedded systems. Guidelines provide goal orientation by distilling expert knowledge into
advice (what to do) and recommended implementation methods (how to do it), and may also recommend
tools and analysis methods. This not only prunes infeasible paths from vast design and implementation
spaces, but also ensures that inexperienced engineers avoid mistakes that are detected late and are costly
to correct. If the mechanism to distill guidelines is made explicit, then a common collection of guidelines
can be updated by anyone with new insights, simplifying knowledge sharing within the low power image
processing community.

2.2 Definition
A guideline is an encapsulation of advice and a recommended implementation method. The advice
captures expert insights in a precise, context-based formulation and orients the follower (the person
reading the advice) towards a goal. The recommended implementation method suggests interfaces and
steps that work well in practice to implement the advice.
Both the advice and the recommended implementation methods are supported by theoretical and
experimental evidence that is either gathered within the project or is pre-existing in the community. The
advice in a guideline is essentially inarguable since it is based on proofs and facts. However, it is not
necessary to treat the recommended implementation method as a strict rule. Using alternative
implementation methods or new tools is allowed and encouraged.
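The two-part structure of a guideline can be made concrete with a small record type. The following Python sketch is illustrative only and is not a project deliverable: the class name Guideline, its field names, and the helper uses_recommended are assumptions introduced here. It transcribes the OpenMP guideline of Figure 4 and shows that choosing an alternative method leaves the advice itself untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guideline:
    """Hypothetical record for one guideline (field names are assumptions)."""
    goal: str                 # what the advice orients the follower towards
    context: str              # where the advice applies
    advice: str               # what to do
    recommended_method: str   # how to do it; a recommendation, not a strict rule
    evidence: tuple = ()      # citations backing the advice and the method

    def uses_recommended(self, method: str) -> bool:
        """True if the given method is the recommended one (case-insensitive)."""
        return method.strip().lower() == self.recommended_method.strip().lower()


# The guideline of Figure 4, transcribed into the record.
FIGURE_4 = Guideline(
    goal="high performance",
    context="multicore processors with vector units",
    advice="Exploit both vectorization and multithreading",
    recommended_method="OpenMP",
    evidence=("[2]", "[4, 5]", "[6]"),
)

if __name__ == "__main__":
    # The second method name is a made-up alternative, used only for contrast.
    for method in ("OpenMP", "hand-written threads and intrinsics"):
        kind = "recommended" if FIGURE_4.uses_recommended(method) else "alternative"
        print(f"{method}: {kind} method; the advice still applies")
```

Keeping the advice and the method in separate fields mirrors the definition above: the advice is fixed, while the method can be swapped for an alternative.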
Figure 4 contains an example of a guideline for obtaining high performance on multicore processors with
vector units.
In the guideline example in Figure 4, the advice orients the follower towards the goal of achieving high
performance in the specific context of multicore processors with vector units. The advice advocates
simultaneously exploiting all execution resources for higher performance and provides experimental
evidence as support. The recommended implementation method points to OpenMP as a productive choice
and links to usage examples.
Guidelines stem from expert insights in image processing and embedded system domains and cover
performance, power, and productivity aspects. Generation and evaluation of guidelines is discussed in detail
in Section 2.3.
Guidelines may overlap or exclude one another as a natural consequence of the vast design and
implementation spaces for low power image processing. Overlapping guidelines are those whose advice is
based on the same underlying principle, or those whose recommended implementation methods are the
same. Exclusion occurs when a pair of guidelines point in different but equally competent directions to reach
the same goal.
• Advice: Exploit both vectorization and multithreading for high performance on multicore
processors with vector units such as the ARM Cortex A9. On these architectures, utilizing all
hardware execution resources is key to achieving high performance [2] [4, 5].
• Recommended implementation method: Use OpenMP. OpenMP is a widely supported parallel
programming API that enables programmers to express vectorization and multithreading
operations concisely using compiler directives. Programmers need not worry about specifying
scheduling and synchronization operations in code. These are handled transparently by the
OpenMP runtime system. See the official OpenMP examples [6] for more detail on
exploiting vectorization and multithreading simultaneously.
Figure 4: Example of a guideline.
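The overlap and exclusion relations defined earlier in this section can be written as two predicates. The Python sketch below is one possible reading, not a definition from the project: the simplified Guideline tuple, the predicate names, and every example entry except the OpenMP guideline of Figure 4 are hypothetical. In particular, "different directions" is modelled as differing underlying principles, and the "equally competent" condition is not modelled at all.

```python
from typing import NamedTuple


class Guideline(NamedTuple):
    """Hypothetical, simplified view of a guideline for relation checks."""
    goal: str        # goal the advice is oriented towards
    principle: str   # underlying principle of the advice
    method: str      # recommended implementation method


def overlaps(a: Guideline, b: Guideline) -> bool:
    """Overlap: same underlying principle or same recommended method."""
    return a.principle == b.principle or a.method == b.method


def excludes(a: Guideline, b: Guideline) -> bool:
    """Exclusion: same goal, reached in different directions.

    Assumption: a different direction means a different underlying principle.
    """
    return a.goal == b.goal and a.principle != b.principle


# Illustrative entries; only the first is taken from Figure 4.
use_all_cores = Guideline("high performance", "use all execution resources", "OpenMP")
same_principle = Guideline("high performance", "use all execution resources",
                           "hypothetical threading library")
offload = Guideline("high performance", "offload to custom hardware",
                    "hypothetical accelerator flow")

if __name__ == "__main__":
    print(overlaps(use_all_cores, same_principle))  # True
    print(excludes(use_all_cores, offload))         # True
```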

Guidelines are targeted towards both developers and vendors. Developers conform to the guidelines by
paying heed to the encapsulated advice and considering recommended implementation methods. Vendors
ensure recommended methods are available in their products to attract developers. If implementing
recommended methods is infeasible due to constraints, then developers and vendors can consider
alternative methods.
We call the process in which a vendor identifies or implements recommended methods compliant with a
relevant subset of guidelines instantiation, as shown in Figure 5. The output of the instantiation process is an
instance. The platform instance is an example of instantiation if the project consortium is considered as a
vendor.
Figure 5: Instantiating platforms based on guidelines.
A vendor-supplied instance can be partially compliant or fully compliant with the guidelines. A partially
compliant instance provides alternative methods to implement guidelines, whereas a fully compliant
instance provides only recommended implementation methods for the guidelines it supports. Both
instances in Figure 5 are partially compliant. The platform instance is an example of a fully compliant
instance since it provides only recommended implementation methods for the guidelines it supports.
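The distinction between full and partial compliance amounts to a simple check over the guidelines an instance supports. The following Python sketch is a minimal illustration under stated assumptions: the function name classify_compliance, the dictionary layout, the guideline identifiers G1 to G3, and all method names other than OpenMP are placeholders introduced here. It follows the wording above, so an instance that offers anything other than exactly the recommended method for a supported guideline counts as partially compliant.

```python
def classify_compliance(recommended, provided):
    """Classify an instance as fully or partially compliant.

    recommended: guideline id -> recommended implementation method
    provided:    guideline id -> set of methods the instance offers,
                 listing only the guidelines the instance supports
    """
    if not provided:
        raise ValueError("the instance supports no guidelines")
    for guideline_id, methods in provided.items():
        # Anything other than exactly the recommended method is an alternative.
        if set(methods) != {recommended[guideline_id]}:
            return "partially compliant"
    return "fully compliant"


# Guideline ids and all method names except OpenMP are placeholders.
RECOMMENDED = {"G1": "OpenMP", "G2": "method-B", "G3": "method-C"}

# Supports only a subset of the guidelines, each with its recommended method.
full_instance = {"G1": {"OpenMP"}, "G2": {"method-B"}}
# Offers an alternative method for G3.
partial_instance = {"G1": {"OpenMP"}, "G3": {"alternative-to-C"}}

if __name__ == "__main__":
    print(classify_compliance(RECOMMENDED, full_instance))     # fully compliant
    print(classify_compliance(RECOMMENDED, partial_instance))  # partially compliant
```

Note that full_instance supports only two of the three guidelines and is still fully compliant, since compliance is judged only against the guidelines an instance supports.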

We have tentatively decided that compliance with guidelines will be judged and certified by an independent
body identified by the ecosystem of stakeholders created during the project. A more certain decision will
be taken as the project matures.
It is important for vendors to note that an effort to support the complete set of guidelines is not
recommended. Support for all guidelines will likely lead to an over-designed instance that is no longer
relevant to the application and violates hard constraints such as cost and total power consumption.

2.3 Workflow to generate guidelines
A key contribution of this deliverable is a workflow to produce guidelines. The requirements of the workflow
are that it should (a) generate guidelines for low power image processing, (b) evaluate the guidelines,
and (c) be aligned with all project objectives. The workflow is illustrated in Figure 6. As evident from the
illustration, the workflow captures project-wide effort towards a common vision.
Next, we explain the steps of the workflow, starting with the critical path.
Figure 6: The workflow to generate and evaluate guidelines that define the platform instance (critical path in red)

1. Image processing and embedded system domain expertise: The workflow starts by gathering insights
from experts in the domains of image processing and embedded systems. Insights refer to deep,
underpinning knowledge that comes from experience and modeling. The expertise required to produce
insights is not restricted to the project partners. External experts such as those in the advisory board
can also contribute with insights. Insights have no restrictions on format and can be expressed in any
enlightening manner. At present, the areas of interest for insights are low power, high performance,
and productivity. More areas will be considered as the project matures.
As a running example of the workflow, consider the insight expressed by an embedded systems expert
as a single line of text:
Insight: A real-time OS (RTOS) provides determinism, often at the cost of performance.
2. Guideline formulation: Gathered insights are analyzed and formulated into guidelines. This involves
judging the context and orientation of insights, translating them into advice, and deciding on a
recommended implementation method. Translation is necessary since guidelines are goal-oriented,
precise, and context-based whereas insights have no restrictions on their format. There is no
one-to-one mapping between a guideline and an insight. Many insights can coalesce to produce a guideline,
or a single insight can be fertile enough to produce several guidelines.
Continuing the running example, the insight is translated into the following guideline:
• Advice: Balance deterministic execution against performance when setting real-time
OS (RTOS) parameters [1].
• Recommended implementation method: Configure the RTOS using vendor suggested
methods [3]. Test and document candidate configurations extensively.
3. Application development: In this step, the formulated guidelines are implemented within the applications
and evaluated for impact. In the context of the project, the applications are the project applications
that are part of the starter kit. Guidelines are implemented either using recommended or alternative methods.
We assume that interfaces from existing technology are adequate to implement guidelines, but also
recognize that existing interfaces might lack productivity. In the context of the project, technology to
implement guidelines is made available through the platform instance. Guidelines are evaluated for
goal orientation, the suitability of the recommended implementation method, and arguments for
choosing alternative methods. We expect that all formulated guidelines will be evaluated in one or
more applications since they have a common driving theme – low power high performance real-time
image processing. Once all guidelines are evaluated, the workflow ends.
An excerpt from a possible evaluation of the guideline formulated in the running example could read
as follows:
The medical image processing application missed 20 of 100 deadlines when the default Linux
kernel 4.7.2 provided by Xilinx was used. No deadlines were missed under the HIPPEROS RTOS
configured with the Least Laxity First scheduling policy, but performance was restricted to 8
FPS, well under the required 24 FPS. Enabling multicore execution (SMP) and a hybrid
scheduling policy with both data locality and laxity awareness increased performance to 28
FPS with no deadlines missed – a clear win for HIPPEROS. We configured HIPPEROS using the
HIPPEROS configuration plugin provided by the project for Xilinx SDSoC 2016.2.
4. Technology development: This is the only step of the workflow that is not on the critical path. In this
step, existing technology is improved to enable productive implementation of guidelines during the
application development step. The stimulus to improve existing technology comes from application
developers in the form of technology improvement ideas. The improvements aim to increase the
productivity of recommended implementation methods to levels that can compete with alternative
methods. When recommended implementation methods promise high productivity, developers need
not waste time deciding on more productive alternatives. Vendors are also more likely to provide
recommended implementation methods in their platforms to gain higher compliance. In the context
of the project, technology improvements are delivered through the platform instance.
Examples of potential technology improvements include application specific evaluation boards, high
performance library routines and custom IP blocks for image processing, low power enabling patches
to RTOS schedulers, improved performance analysis techniques for Zynq SoC, and workflow tweaks for
Xilinx SDSoC.
The HIPPEROS configuration plugin provided by the project for Xilinx SDSoC 2016.2 is the technology
improvement demonstrated in the running example in the previous step.
The workflow is not iterative since there are no feedback paths or cycles. Feedback is unnecessary since
guidelines are based on proofs, facts and expertise. In addition to being feed-forward, the workflow
proceeds in a pipelined manner. Guideline formulation starts once a few insights are available. While
experts are in the process of extracting new insights and guidelines are being formulated using available
insights, application developers continue to build applications to satisfy requirements using available
expertise and interfaces from existing technology. Whenever guidelines become available, application
developers shift to evaluating them and at the same time benefit from the orientation offered. Technology
development starts when application developers realize that recommended implementation methods in
guidelines have lower productivity than alternatives and present ideas for improvement.

Since the workflow to generate guidelines is explicit, the set of guidelines defined during the project is not
fixed. It can grow continuously by including new guidelines, by evaluating guidelines in new applications,
and by making implementation methods more productive. This enables the outcomes of the project to have
a long-term and far-ranging impact. We envision that the workflow will be administered beyond the project
by the ecosystem of stakeholders created during the project.
2.4 Alignment with objectives
Guidelines are aligned with the objectives of the project. We summarize the alignment per objective taken
verbatim from the project proposal below.
Objective 1: Define a reference platform for low power image processing applications.
Guidelines are focused towards low power high performance real-time image processing and stem
from experts in image processing and embedded systems. Guidelines drive the development of the
platform instance.
Objective 2: Instantiate the reference platform through use-case applications.
Guidelines influence both platform instance and project applications (Figure 3).
Another alignment with the objective is a concrete definition of the process of instantiation — vendors
providing suitable implementation methods for relevant guidelines in their platforms (Section 2.2).
Objective 3: Demonstrate and plan improvements of defined key performance indicators.
Application developers use the orientation offered by guidelines to move towards improved key
performance indicators.
Objective 4: Start up and manage an ecosystem of stakeholders to extend image processing norms.
The ecosystem built by the project consortium involves developers, vendors, and those interested in
generating guidelines through the workflow. Guidelines generated during the project are vetted by an
advisory board consisting of stakeholders identified by the project. Distilling expert insights into goal-
oriented guidelines that are evaluated on relevant applications is a solid example of advancing image
processing norms.
2.5 Handbook overview
A preliminary vision of the table of contents of the reference platform handbook delivered at the end of the
project is shown in Figure 7. Handbook chapters are based upon abstraction layers and workflows involved
in typical hardware-software co-design of image processing embedded systems. Each chapter contains
related guidelines. The section Local and cloud processing is left as a placeholder since cloud processing is
not studied in the project. Evaluations of the guidelines in project applications are presented in case-study
chapters.
Figure 7: Envisioned structure of the reference platform handbook.
The handbook will be written continuously during the project. Updates will be made to it as and when a
guideline is formulated or evaluated. The advisory board will periodically be presented with newer versions
of the handbook for review. Incomplete versions of the handbook can also be released to public reviewers to
judge acceptability and get feedback.
The handbook is not finalized at the end of the project. Rather, it is a living document that welcomes updates
beyond the project timeline from the ecosystem of developers, vendors, and guideline generators. We
expect updates will be in the form of newer guidelines, updates to recommended implementation methods,
and new case-studies. A good platform for facilitating continuous updates to the handbook is GitBook[7].
We envision that the handbook will be maintained beyond the project by the ecosystem of stakeholders
created during the project.
The long-term impact of the handbook will be in the areas of productivity and standards for low power high
performance real-time image processing embedded systems. Guidelines present expert insights in a goal-
oriented form to developers and vendors. This has the potential to significantly improve productivity for
developers since costly mistakes are avoided and vast design/implementation spaces pruned. When
developers express increasing affinity to guidelines, vendors will strive for compliance by providing suitable
implementation methods, naturally leading to standardization of low power high performance real-time
image processing platforms.
2.6 Related work
Guidelines are everywhere in the world of image processing and embedded systems. Common vehicles for
disseminating guidelines include methodology descriptions[8-10], standard specifications[5-7], best
practice accounts[11, 12], online blogs[13, 14], peer-reviewed publications[12-16], technical reports[15,
16], training material[17-19], and books[1, 9, 20-22]. However, to the best of our knowledge, there is no
dedicated set of guidelines for low power high performance real-time image processing embedded systems.
Existing guidelines mainly concentrate on the hardware and fail to provide a comprehensive view of the
whole system in a hardware-software co-design approach. This gap provides additional impetus
for us to include guidelines in the starter kit.
We compare related work closest to our definition of guidelines below.
• The Xilinx UltraFast Design Methodology series[8] is a set of embedded system design guidelines that
helps developers avoid pitfalls, increase system performance, and adopt productive workflows. The
series collects best practices, key principles, specific do’s and don’ts from experts within Xilinx and
outside. Vendors who decide to adopt the guidelines can choose to be trained and certified by
Xilinx[23]. The guidelines defined in the project have similar intentions to the Xilinx UltraFast
Methodology series but are focused on low power image processing.
• The Low Power Methodology Manual[9] is a book from Synopsys about power efficient SoC design written
by expert engineers at the time when low power was turning into a major concern for SoC designers.
The book concentrates on the decisions that engineers need to make while designing low power chips
and provides implementation recommendations and pitfall-avoiding advice. This is similar in spirit to
the orientation and recommendations offered by our guidelines.
• The Zynq Book[4] is a book that focuses on the Xilinx Zynq platform. The book is written to be a
friendly, comprehensive guide to newcomers and an authoritative reference for experts. The initial
chapters cover important questions such as what the Zynq platform is capable of and what its
intended application areas are. Later chapters provide a detailed guide for typical design flow processes
such as hardware-software partitioning, HLS, IP integration, and integration of Operating Systems.
Although the book is not explicitly structured as a set of guidelines, the reader does feel well-oriented
in the Zynq landscape. We aim to deliver guidelines in the same focused, friendly, and authoritative
manner as the Zynq Book.
2.7 Summary
In summary, we have presented the conceptual definition of guidelines for building low power high
performance real-time image processing embedded systems. We use guidelines as shorthand for reference
platform handbook, a part of the starter kit.

Guidelines are goal-oriented, stem from expert insights, and encapsulate both advice and a recommended
implementation method. Guidelines will be maintained in a handbook by the ecosystem of stakeholders
created during the project.
We presented a workflow that generates guidelines using available expertise and evaluates them within
applications. Assuming the ecosystem of stakeholders administers the workflow beyond the project
timeline, the workflow enables the handbook of guidelines to grow continuously by including new
guidelines, by evaluating guidelines in new applications, and by making implementation methods more
productive. This enables the TULIPP project to have a long-term and far-ranging impact.
Guidelines are vetted by an advisory board of experts, and targeted towards developers and vendors.
Developers pay heed to the guidelines and consider recommended implementation methods whose
productivity is comparable with alternative methods. Vendors ensure recommended methods are available
in their products to attract developers.
A vendor platform that enables guidelines by providing suitable implementation methods is an instance
that can be certified for compliance. We have tentatively decided that compliance will be certified by an
independent body identified by the ecosystem of stakeholders created during the project. A more certain
decision will be taken when the project matures. The platform instance is an instance that is fully compliant
since it only provides recommended implementation methods for the guidelines it supports.

3 Project Applications
This chapter presents the project applications, a part of the starter kit. These applications are from the
automotive, medical, and UAV image processing domains. In the sections that follow, project applications
are discussed by domain. The discussion covers the selection criteria and the key performance/energy
factors of project applications.
3.1 Automotive image processing
The automotive industry has traditionally been quite slow in adopting new technologies and major
breakthroughs have been rare. However, this has changed over the last decade due to advances in sensor
technology and the increasing processing power of small, low power embedded systems. Radars and
cameras have become more common, not only in driver assistance systems but also in safety critical
applications like automatic emergency braking. The first cars with rudimentary automatic driving
capabilities have entered the market. Indeed, the automotive industry is in a ground-breaking development
phase.
In the field of Advanced Driver Assistance Systems (ADAS), many systems are based on vision
technology. Applications such as lane assist, traffic sign recognition, adaptive speed control, and
pedestrian and vehicle detection were first introduced as luxury features but will soon be required to
obtain five stars in the Euro NCAP safety rating. This means that most new cars will have these applications
within a few years, calling for small, low power, real-time, and high performance solutions.
The availability of new vision based systems and sensors has also accelerated development towards
autonomous cars that will be the final game changer. These will rely on combinations of vision, radar and
other sensors to make the car aware of the environment around it and to keep a high security level.
Vision technology in the automotive segment is rapidly growing and will have a never-ending need for low
power, high performance systems. Vision algorithms used are normally sophisticated, computation
intensive, and require low latency. When coupled with small size and low power consumption constraints,
which are common requirements for all equipment in cars, automotive vision applications create extreme
design challenges for developers.
3.1.1 Application selection
An important and widely-used component of automotive vision systems is the object classifier, used to
detect pedestrians, vehicles, traffic signs etc. A real-time object classifier is one of the most computing
intensive vision components used in automotive systems. Strong requirements on both physical size and
power consumption make the challenge of building good real-time object classifiers even harder.
The automotive application selected for evaluation in the project as a reference is a vision-based pedestrian
detection system, targeting applications like automatic emergency braking and collision avoidance.

3.1.2 Exploiting key performance/energy factors
The most important performance/energy factors for classifiers used in automotive vision applications are:
• Processing speed in frames per second: The time limit for a frame is fixed and set based on external
factors. The classifier is required to process a frame within one frame time unit. Typically, the
classifier shares the available compute power with other applications running on the same
platform. This division impacts the frame rate.
Ensuring a good flow of data and exploiting parallelism can increase the processing speed. Detailed
profiling of computation and memory access characteristics enables wise partitioning of the
classification algorithm into different execution resources, including custom accelerators on the
FPGA fabric.
Our experience indicates that most parts of the pedestrian classifier application need to be accelerated
on the FPGA fabric for performance. In practice, this means implementing parts of the classifier in
hardware (using VHDL or Verilog), and setting up communication interfaces between software and
hardware implementations. Another approach would be to use high-level synthesis tools such as
Xilinx SDSoC that enable programming the FPGA in C/C++. A mix of high-level synthesis and
VHDL/Verilog approaches is also possible.
We also know that the classifier uses several parallel streams of scattered memory accesses. Finding
a good caching pattern is therefore important for performance. This will be a tradeoff between fast,
local memory on the FPGA, and the slower, but considerably larger, external memory.
• Latency: Low latency is required for the system to take actions in time. Since classification is used
in safety critical applications, low latency is an important enabler of early warnings. Typical classifier
implementations are pipelined for frame rate performance. However, even if the number of frames
processed per second is high due to pipelining, the latency can still be prohibitively long due to deep
pipelining.
• Low-power constraints: Car manufacturers normally set strict power consumption limits on
individual ECUs (Electronic Control Units). Therefore, low power consumption is a strong, general
constraint. In addition, automotive vision units are typically mounted behind the rear-view mirror,
where space is extremely limited and cooling is poor. Keeping the temperature of the unit within
reasonable limits requires power consumption to be kept at a minimum.
Power consumption should preferably be monitored with 5 ms resolution. This enables
understanding power consumption variations during one frame time, which is typically 33 ms. It is
also beneficial if power is measured at a finer granularity, for example separately per
sub-system or component, to isolate problems and target optimizations.
• Detection performance: The detection rate, i.e., percentage of pedestrians detected, must be
extremely high to enable use in safety critical applications. At the same time, the false detection
rate must be low enough for the system not to be annoying for the driver. Note that detection
performance is treated as a constant in the classifier application and will not be improved during
the project.
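The relationship between frame rate, pipeline latency, and power sampling discussed in the factors above can be illustrated with simple arithmetic. The following Python sketch is only an illustration: the 33 ms frame time and 5 ms sampling resolution come from the text, while the pipeline depth, the power samples, the function names, and the assumption that every pipeline stage takes one frame time are hypothetical.

```python
def pipeline_metrics(frame_time_ms, stages):
    """Return (throughput in FPS, latency in ms) for a pipelined classifier.

    Assumes every stage takes exactly one frame time, so a new frame
    completes every frame time but each frame crosses all stages.
    """
    throughput_fps = 1000.0 / frame_time_ms
    latency_ms = stages * frame_time_ms
    return throughput_fps, latency_ms


def energy_per_frame_mj(power_samples_w, sample_period_ms):
    """Approximate the energy of one frame in millijoules (W * ms = mJ)."""
    return sum(power * sample_period_ms for power in power_samples_w)


FRAME_TIME_MS = 33      # typical frame time quoted in the text
SAMPLE_PERIOD_MS = 5    # preferred power monitoring resolution from the text

# Hypothetical pipeline depth of four stages.
fps, latency = pipeline_metrics(FRAME_TIME_MS, stages=4)

# Number of whole power samples that fit into one frame time.
samples_per_frame = FRAME_TIME_MS // SAMPLE_PERIOD_MS

# Hypothetical power samples (watts) recorded during one frame.
samples = [2.0, 3.5, 3.5, 3.0, 2.5, 2.0]
energy = energy_per_frame_mj(samples, SAMPLE_PERIOD_MS)

print(f"{fps:.1f} FPS, {latency} ms latency")
print(f"{samples_per_frame} samples per frame, about {energy} mJ per frame")
```

Under these assumptions the pipeline sustains roughly 30 FPS while each result is delayed by four frame times, which is the effect of deep pipelining on latency described above. A 5 ms resolution yields only six samples per 33 ms frame, enough to see coarse variation within a frame.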
3.2 Medical image processing
Medical imaging devices visualize body parts, organs, tissues, and cells for clinical diagnosis, disease
monitoring, and treatment. Techniques used in medical imaging devices include optical imaging, nuclear
imaging, and radiology. The radiology technique renders anatomical and physiological views of the human
body at a very high spatial and temporal resolution on ultrasound, CT scan, X-ray, MRI and similar devices.
These devices require high processing capabilities, whether to give the result in the shortest possible time
after the examination or to provide practitioners with real-time imaging during an operation. Also, the
medical imaging domain is subject to strong regulations to guarantee patient safety.
The medical imaging application selected as a reference is a mobile C-arm, a surgical X-ray device embedded
with real-time image processing capabilities. The device will replace C-arms with external processing units,
which usually are powerful, bulky computers. The mobile C-arm is destined for use in operating rooms,
where compact size and real-time monitoring of the patient are important requirements. Since X-ray
sensors are highly sensitive to heat, low power consumption and low heat dissipation are first-class
constraints. Additionally, higher processing power is required to process noisy images and provide useful
information to the surgeon. Images are noisy since the radiation dose administered to the patient is reduced
per new safety regulations.
The mobile C-arm application is a good example of typical medical imaging applications since it demands
high performance, real-time guarantees, and low power consumption. It also reflects the typical need for
integrating processing units close to sensors to decrease latency and size. Compact size is an important
requirement for medical imaging devices used in operating rooms plagued by overcrowding and the
presence of many instruments close to the surgeon and the patient table.
The following performance/energy factors are key to medical imaging applications:
• Parallelism: This is essential to sustain high throughput and hard real-time performance for all
frames. If the algorithm chain is entirely dataflow, then pipelined parallelism can be used. Using
heterogeneous processing units and custom hardware accelerators is also a good way to exploit
parallelism while improving energy efficiency compared to the same level of parallelization on a
homogeneous device. Typical techniques for exploiting parallelism include high level synthesis for
FPGA-based acceleration, rewriting portions of the code in OpenCL when targeting GPGPUs, and
using OpenMP for multicore platforms.

• Memory optimizations: Carefully studying memory access patterns and partitioning data structures
accordingly is crucial to improve overall performance in terms of both frame rate and latency. Also,
streaming input from sensors (for example: Gigabit Ethernet links) directly to processing units
without going to shared memory reduces latency while placing fewer constraints on the memory
subsystem.
• Accuracy: Medical imaging systems must be accurate and not introduce any artifacts in processed
images per safety regulations. In the mobile C-arm application, accuracy of one LSB (Least
Significant Bit) compared to a golden reference model is required. The golden reference model is
implemented using single-precision (32-bit) floating-point numbers. A fixed-point implementation
can be used for the whole application provided the required accuracy is always verified. Fixed-point
implementations can significantly reduce power consumption in the system.
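The accuracy requirement above, agreement with a floating-point golden reference to within one LSB, can be checked mechanically. The Python sketch below shows one possible form of such a check. The 3-tap smoothing filter, the pixel values, and all function names are hypothetical and are not taken from the mobile C-arm application.

```python
def smooth_float(pixels):
    """Golden reference: 3-tap smoothing filter in floating point."""
    return [0.25 * pixels[i - 1] + 0.5 * pixels[i] + 0.25 * pixels[i + 1]
            for i in range(1, len(pixels) - 1)]


def smooth_fixed(pixels):
    """Same filter with integer adds and shifts (weights 1, 2, 1 over 4).

    Adding 2 before the shift rounds to the nearest integer.
    """
    return [(pixels[i - 1] + 2 * pixels[i] + pixels[i + 1] + 2) >> 2
            for i in range(1, len(pixels) - 1)]


def max_lsb_error(reference, candidate):
    """Largest absolute difference, in units of one output LSB."""
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Hypothetical row of integer pixel values.
row = [100, 103, 98, 4095, 0, 17, 17, 18]
error = max_lsb_error(smooth_float(row), smooth_fixed(row))

print("Maximum deviation from golden reference:", error, "LSB")
print("Within one LSB:", error <= 1.0)
```

In practice such a check would run over the full algorithm chain and a representative set of input images, since the text requires the accuracy to be verified in all cases.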
3.3 UAV image processing
In recent years, the use of Unmanned Aerial Vehicles (UAVs) has increased significantly. Currently, the most
important markets for UAVs are aerial video/photography, precision farming, and surveillance/monitoring.
Ongoing technical improvements in size, endurance, and usability of UAVs enable their use in more areas.
For example, UAV prototypes for delivering goods (DHL or Amazon) or carrying people (eHang) are available
as shown in Figure 8.
UAV applications require fast on-board processing, low weight, and low power consumption. These have a
direct impact on endurance of a UAV, a key selling point. Power hungry processing requires a larger battery
to maintain the same endurance, but this increases the weight of the UAV.
Figure 8: Recent UAV application examples. Left: DHL Parcelcopter 3.0². Right: eHang 184³.
UAVs are equipped with several sensors such as laser scanners and cameras to monitor the environment.
One important aspect in this monitoring process is to detect objects in the flight path to avoid collisions.
2 Credit Deutsche Post DHL (http://www.dpdhl.com/en/media_relations/specials/parcelcopter.html)
3 Credit eHang (http://www.ehang.com/ehang184)

The selected UAV reference application estimates depth in images captured from stereo cameras mounted
on the UAV and oriented in the direction of flight. Estimated depth information is used to detect objects in
the flight path and avoid collisions. Depth estimation is done on-board in real-time. The application is
applicable in other domains that need depth estimation as well.
Stereo cameras, laser scanners, and ultra-sonic sensors are commonly used in existing depth estimation
methods. Stereo cameras are not as robust as laser scanners, but weigh less and are cheaper. Furthermore,
depending on the baseline (distance between the mounted cameras), stereo cameras have a greater range
for detecting objects compared to laser scanners or ultra-sonic sensors. Additionally, the high frame-rate
of stereo cameras enables higher detection rates and higher flight speeds at which obstacles can be avoided.
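The dependence of detection range on the baseline follows from the standard geometry of a rectified stereo pair, in which depth equals focal length times baseline divided by disparity. This relation is textbook material and is not stated in this deliverable. The Python sketch below applies it with hypothetical values for focal length, baselines, and minimum measurable disparity.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point seen with the given disparity.

    Uses the pinhole model for a rectified stereo pair:
    depth = focal length (pixels) * baseline (meters) / disparity (pixels).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


FOCAL_PX = 700.0          # hypothetical focal length in pixels
MIN_DISPARITY_PX = 1.0    # hypothetical smallest measurable disparity

# The farthest measurable depth corresponds to the smallest disparity.
for baseline_m in (0.25, 0.5):  # hypothetical baselines in meters
    max_range = depth_from_disparity(FOCAL_PX, baseline_m, MIN_DISPARITY_PX)
    print(f"baseline {baseline_m} m -> maximum range {max_range} m")
```

Under this model, doubling the baseline doubles the maximum range for the same minimum disparity, at the cost of a physically wider camera mount.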
Recently, Barry and Tedrake [24] proposed a collision detection system for drones that works at 120 frames
per second and is based on stereo cameras. Their system requires complex configuration of sophisticated
processing modules such as an inertial measurement unit (IMU) and a state estimator to calculate a reliable
local depth map. In the application selected for the project, we aim to build a simpler and more user friendly
(only a few easy-to-understand configuration parameters) system, with equally sophisticated and adjustable
depth map generation, running in real time on low weight embedded hardware with low power
consumption.
The application uses Semi Global Matching (SGM), a well-known and robust algorithm for depth estimation.
Implementations of SGM for architectures such as multicore processors (based on OpenMP), GPGPUs
(OpenCV, OpenCL) and FPGAs exist. However, comparisons of specific implementations and guidelines for
choosing relevant architectures are not available to the best of our knowledge. Our aim is to further improve
the performance of the SGM algorithm and gain a deeper understanding of the factors which influence its
power consumption and performance.
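For readers unfamiliar with SGM, the Python sketch below illustrates its core step: aggregating per-pixel matching costs along one path with a small penalty P1 for disparity changes of one and a larger penalty P2 for bigger jumps. It is a minimal, single-path illustration of the published recurrence and not the implementation used in the project. A complete SGM sums several path directions over a 2D image. The cost values, the penalties, and the function names are hypothetical.

```python
def aggregate_left_to_right(cost_row, p1, p2):
    """Aggregate matching costs along one left-to-right path of a scanline.

    cost_row[x][d] is the matching cost of pixel x at disparity d.
    """
    num_disp = len(cost_row[0])
    aggregated = [list(cost_row[0])]  # the first pixel has no predecessor
    for x in range(1, len(cost_row)):
        prev = aggregated[-1]
        prev_min = min(prev)
        current = []
        for d in range(num_disp):
            # Same disparity is free, a step of one costs p1, any jump costs p2.
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d < num_disp - 1:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            current.append(cost_row[x][d] + min(candidates) - prev_min)
        aggregated.append(current)
    return aggregated


def winner_take_all(costs):
    """Pick the disparity with the lowest cost for every pixel."""
    return [row.index(min(row)) for row in costs]


# Hypothetical costs for 3 pixels and 3 disparities; the middle pixel is noisy.
costs = [[5, 0, 5], [3, 4, 5], [5, 0, 5]]
raw = winner_take_all(costs)
smoothed = winner_take_all(aggregate_left_to_right(costs, p1=2, p2=4))

print("Without aggregation:", raw)
print("With aggregation:   ", smoothed)
```

In this toy example the noisy middle pixel is pulled to the disparity of its neighbors. The triple loop over pixels, disparities, and paths is also where most of the computation lies, which is why the choice of architecture matters for performance and power.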
The key performance/energy factors of typical UAV applications are:
• Temporal and spatial resolution: The temporal resolution determines how many frames per
second can be computed by the system. This has an impact on the possible detection rates, which
in turn determines how fast a UAV can fly while performing automatic obstacle avoidance. The
spatial resolution, i.e. the number of pixels per frame, has significant impact on the accuracy of
the solution and the size of objects that can be detected. An important tradeoff is image
resolution. Higher resolution images enable detecting smaller objects, but decrease detection
rates because the computation time for depth estimation increases.
The required temporal and spatial resolution depend on the UAV application. Some UAVs
move slowly, but need high detection accuracy, either because the equipment is expensive, or the
area in which to maneuver is small. In contrast, there are UAVs that move very fast, yet have
enough space to maneuver. In such cases, accuracy is less important and therefore high temporal
and low spatial resolution is sufficient.

Temporal and spatial resolution also have a significant influence on power consumption. A
thorough analysis of power consumption is crucial for the different parameter sets affecting the
accuracy of the algorithm, for example different temporal and spatial resolutions and different sizes
of smoothing filters or neighborhood operators.
• Latency: Low latency is important to ensure fast reaction time, for example during an obstacle
avoidance maneuver. Using high performance camera interfaces, such as Camera Link modules, is
necessary to minimize the latency of camera streams and subsequent processing.
• Data precision: Assumptions about data precision strongly affect accuracy. Using integer operations
requires less power but reduces accuracy compared to computing with floating-point operations.
The tradeoff to explore is accuracy vs. power consumption when using integer instead of
floating-point computations.
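The tradeoff between temporal and spatial resolution described in the factors above can be made tangible with two small calculations: how far the UAV travels between consecutive depth maps, and how the number of pixels to process grows with image resolution. The Python sketch below uses hypothetical flight speed, frame rates, and resolutions. It does not model the actual computation time of the depth estimation.

```python
def distance_per_frame_m(speed_m_s, fps):
    """Distance in meters the UAV travels between two consecutive frames."""
    return speed_m_s / fps


def relative_pixel_load(width, height, ref_width, ref_height):
    """Number of pixels per frame relative to a reference resolution."""
    return (width * height) / (ref_width * ref_height)


SPEED_M_S = 12.0  # hypothetical flight speed

for fps in (30, 120):  # hypothetical frame rates
    travelled = distance_per_frame_m(SPEED_M_S, fps)
    print(f"{fps} FPS: {travelled:.2f} m travelled between frames")

# Hypothetical resolutions: doubling width and height quadruples the pixels.
load = relative_pixel_load(1280, 720, 640, 360)
print(f"1280x720 has {load:.0f}x the pixels of 640x360")
```

A fast UAV therefore benefits more from a high frame rate than from extra pixels, which matches the observation above that high temporal and low spatial resolution can be sufficient in such cases.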
3.4 Summary
This chapter presented project applications, a part of the starter kit. The presentation involved selection
criteria and key performance/energy factors of project applications from automotive, UAV, and medical
image processing domains. The project applications present challenging requirements and highly
interesting design tradeoffs.
The automotive application — a pedestrian classifier — represents a disruptive embrace of vision-based
technology by the automotive market. The application demands low latency processing and high-quality
heat management to pass safety regulations. There are several key tradeoffs to explore, including pipeline
processing and memory partitioning.
The medical application, a mobile X-ray C-arm, aims to provide surgeons with high-quality X-ray images
in real time despite new safety regulations that reduce the radiation dose administered to patients, resulting
in noisier input images. High-quality heat management is crucial to isolate heat-sensitive X-ray sensors in
close vicinity. Key design decisions involve numerical accuracy and streaming of data.
The UAV application — onboard depth estimation — is a fundamental step towards enabling autonomous
flight in UAVs. The application demands high-performance processing and low weight. Design tradeoffs
involving image resolution, weight, and numerical accuracy promise exciting challenges ahead.
It is yet to be determined how project applications will be delivered in the starter kit. Source code of project
applications will likely not be included due to IP restrictions.

4 Initial Platform
This chapter motivates the selection criteria and key features of the initial platform that serves as a base for
the platform instance, a part of the starter kit.
Recall that the platform instance is a physical processing system consisting of hardware, Real-Time
Operating System (RTOS), and application development tools. Components of the platform instance are
specialized for low power high performance real-time image processing applications. Guidelines
recommend the application implementation methods supported by the platform instance.
Since we have not formulated guidelines yet, the platform instance cannot be substantiated. However, a
base platform is needed to evaluate guidelines and develop the platform instance. This base platform is
called the initial platform. We have put together the initial platform using our experience and prior work.
4.1 Hardware components
The aim of this section is to motivate the choice and key features of hardware components in the initial
platform. We start with insights on processing technologies that enable low power image processing
embedded applications. Next, we discuss hardware requirements of typical applications. Finally, we
highlight features of selected hardware components.
4.1.1 Image processing hardware
4.1.1.1 System-on-Chip (SoC)
A central component of any image processing embedded system is the CPU (Central Processing Unit) or in
more humanist terms, the brain! The latest development in the world of CPUs is the System-on-Chip (SoC).
The block diagram of an example SoC is shown in Figure 9. SoCs exist since integration technology is mature
enough to move multiple Input/Output (I/O) functions into a single chip/device.

Figure 9: Block diagram of an example SoC. Image source: Xilinx Application Note XAPP1219⁴.
The SoC market is overwhelmingly large. Figure 10 covers vendors that offer SoC products on the open
market, available for purchase from mainstream distributors without large quantity commitments.
4 http://www.xilinx.com/support/documentation/application_notes/xapp1219-system-performance-modeling.pdf

Figure 10: SoC vendors divided by architecture
The ideal SoC for the initial platform should have heterogeneous multicore processors (at least four 64-bit
cores and a few 32-bit cores); a dedicated graphics processing unit; support for custom hardware; flexible,
standard interfaces to cameras, sensors, and human interaction devices; and peak power consumption
within typical low power device limits.
We discuss SoCs that are suitable for the initial platform under these broad categories: homogeneous
MPSoCs (MultiProcessor SoCs), GPU SoCs, DSP SoCs, and hybrid SoCs.
• Good candidates among homogeneous MPSoCs for image processing are the low-end AMD/Intel
CPUs (x86) and NXP’s family of iMX6 CPUs (32-bit ARM). The main attractive features are low cost
and the availability of a mature software development stack. However, homogeneous MPSoCs only
scale to a handful of cores and are power-hungry.

• GPU SoCs integrate GPU technology on chip and are relatively new in the market. For example,
NVIDIA® recently launched a GPU SoC called the X1⁵. This SoC features low power, low performance
(compared to a full-fledged GPU card), and standard I/O interfaces.
• DSP SoCs combine DSP cores in a SoC package. Texas Instruments (TI) has a DSP SoC family called
KeyStone™⁶ that offers exceptionally high floating-point processing at relatively low clock
frequencies using up to 8 DSP cores in a shared memory configuration. These SoCs are C-programmable
and support demanding video compression algorithms such as H.265, but their power consumption
is well above 24 W.
• Hybrid SoCs combine MPSoC, GPU, DSP, and FPGA technologies on the same chip. The Xilinx® Zynq™
UltraScale+ MPSoC (Figure 2) is a hybrid SoC that provides heterogeneous multicore processors
(different speeds and functionality), a GPU, an FPGA substrate, and numerous standard interfaces.
The iMX series by NXP is another good example of a hybrid SoC suitable for image processing. This
series has several heterogeneous multicore processors and standard I/O, and is complemented by an
application specific accelerator. A new iMX device called the iMX8⁷ is highly suitable for the initial
platform although it is not currently shipping. One exceptionally good candidate is the Movidius⁸
Vision Processing Unit, a hybrid SoC. The processor is used in Google’s Tango⁹ project but is not yet
commercially available.
In general, the most suitable SoC candidates for the reference platform are those that cannot be bought in
single quantities, like the ones found in all modern smartphones and other high-end/high-volume applications
such as set-top boxes and games consoles. These devices are manufactured by companies such as
Samsung®, Broadcom®, AMD®, Qualcomm®, ST®, and NXP® and sold directly to companies such as Apple®,
Amazon® and Sony® for use in consumer products.
4.1.1.2 Standard interfaces
Numerous standard interfaces for image processing have appeared in the market to answer the growing
demand for more speed and smaller external connectors. Table 1 provides an overview of the interfaces,
including hyperlinks to the standards bodies that control them.
The choice of image processing interface ultimately depends on application requirements such as input
latency and resolution. Serial interfaces (for example, USB3.0 Vision and GigE Vision) are low cost and
support flexible cables that can be hundreds of meters long. However, their disadvantages are long latency
and/or low resolution. Parallel interfaces (for example, FDPNox and Camera Link) improve speed and
resolution, but require big and heavy connectors and cables that can only be a few meters long. Since the
initial platform is generic, it should ideally provide both serial and parallel image processing interfaces.
5 http://www.nvidia.com/object/tegra-x1-processor.html
6 http://www.ti.com/lsds/ti/processors/dsp/c6000_dsp/c66x/overview.page
7 http://www.nxp.com/pages/i.mx-8-multisensory-enablement-kit:i.MX8-MEK?tid=vaniMX8-MEK
8 http://www.movidius.com/solutions/vision-processing-unit
9 https://www.google.com/atap/project-tango/

Table 1: Camera standards
Device | Standard Body | Rate (Gbits/sec) | Rate (GBytes/sec) | Year
Camera Link Base 1.0 @ 24-bit 85 MHz | AIA | 2.040 | 0.255 | 2000
Camera Link Full 2.0 @ 64-bit 85 MHz | AIA | 5.44 | 0.680 | 2007
Camera Link HS | AIA | 16.8 | 2.1 | 2012
CoaXPress | JIIA | 6.25 | 0.78 | 2010
DisplayPort 1.0 (4-lane High Bit Rate) | VESA | 10.8 | 1.35 | 2007
DisplayPort (4-lane Reduced Bit Rate) | VESA | 6.48 | 0.810 | 2006
DisplayPort (4-lane High Bit Rate 2) | VESA | 21.6 | 2.7 | 2009
DisplayPort (4-lane High Bit Rate 3) | VESA | 32.4 | 4.05 | 2014
DVI - Dual Link | DDWG | 9.90 | 1.238 | 1999
DVI - Single Link | DDWG | 4.95 | 0.619 | 1999
FireWire - IEEE-1394 | 1394 | 3.2 | 0.4 | 1994
FPD-Link I, II & III - OpenLDI | None | 10.5 | 1.312 | 1999
GigE Vision | AIA | 1 | 0.125 | 2006
HDMI 1.0 | HDMI | 4.95 | 0.619 | 2002
HDMI 1.3 | HDMI | 10.2 | 1.275 | 2006
HDMI 2.0 | HDMI | 18.0 | 2.25 | 2013
SMPTE 292M - HD-SDI | SMPTE | 1.485 | 0.186 | 1989
SMPTE 424M - 3G-SDI | SMPTE | 2.97 | 0.371 | 1989
SuperMHL | MDL | 159 | 19.9 | 2016
Thunderbolt | Thunderbolt | 10 | 1.25 | 2011
Thunderbolt 2 | Thunderbolt | 20 | 2.5 | 2013
Thunderbolt 3 | Thunderbolt | 40 | 5 | 2015
USB Vision - based on USB3 | AIA | 2.8 | 0.35 | 2013
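The two rate columns of Table 1 differ by a factor of eight (bits per byte). As a rough illustration of how the table might be used, the sketch below converts between the two units and estimates the raw bandwidth of an uncompressed video stream to shortlist interfaces. The function names and the example camera parameters (1920x1080 pixels, 24 bits per pixel, 60 frames per second) are assumptions made for illustration, only a handful of Table 1 rows are included, and protocol overhead is ignored.

```python
# Illustrative sketch only: names and camera parameters are assumptions,
# and protocol/encoding overhead is ignored.

def gbits_to_gbytes(rate_gbits):
    """Convert a rate in Gbit/s to GByte/s (8 bits per byte)."""
    return rate_gbits / 8.0


def stream_rate_gbits(width, height, bits_per_pixel, fps):
    """Raw bandwidth of an uncompressed video stream in Gbit/s."""
    return width * height * bits_per_pixel * fps / 1e9


# A few nominal rates copied from Table 1 (Gbit/s).
TABLE1_RATES_GBITS = {
    "GigE Vision": 1.0,
    "Camera Link Base": 2.040,
    "USB Vision": 2.8,
    "Camera Link Full": 5.44,
    "CoaXPress": 6.25,
}


def candidate_interfaces(required_gbits):
    """Interfaces whose nominal rate is at least the required rate."""
    return [name for name, rate in TABLE1_RATES_GBITS.items()
            if rate >= required_gbits]


if __name__ == "__main__":
    required = stream_rate_gbits(1920, 1080, 24, 60)
    print(f"required: {required:.2f} Gbit/s "
          f"({gbits_to_gbytes(required):.3f} GByte/s)")
    print("candidates:", candidate_interfaces(required))
```

Under these assumptions the stream needs just under 3 Gbit/s, so only the faster entries in the shortlist qualify. A real selection would also weigh latency, cable length, and cost.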
4.1.2 Application requirements
The main dimensions of hardware considered by typical high performance real-time image processing
embedded applications are: performance and accuracy; total/standby power efficiency and power-on cycle-
time; environmental factors such as size, weight, noise, expected life time; heat management; and cost.

Applications typically vary in their requirements along the main dimensions. Take, for example, the project
applications that are part of the starter kit:
• The mobile C-arm X-ray medical application places a strong requirement on high-performance
processing. This is because high quality, hard real-time images are sought despite the reduced X-ray
dose administered to patients as per new regulations. The application will be embedded next
to an X-ray sensor that is highly sensitive to heat. The entire unit will be placed in an air-conditioned,
well-ventilated surgery room with uninterrupted access to wall-socket AC power. Compact size of
the electronics board and good cooling are therefore further requirements. Passive cooling can also
increase reliability and lower the noise levels in the room. The requirement of speed and flexible
cables for the camera interface is satisfied by the GigE Vision standard, as shown in Table 2.
• The pedestrian detection automotive application places strong requirements on processing speed
in order to maintain accuracy with very low latency from sensor to warning. Dynamic
reconfiguration of the hardware platform is another requirement since the application should adapt
to changing vehicle speeds and driving environments. Other important considerations are the heat
generated and tolerated, as the application will be housed in a hot environment with low
ventilation. Long-term reliability is critical since customers will expect at least 10 years of life. Total
power consumption and weight are of less importance than volume cost. The requirement of a low
latency camera interface with dedicated cables makes FDPNox a good choice.
• The collision detection UAV application is similar to the automotive application in its processing and
latency requirements. In contrast, the lifetime requirement does not run into decades, and cost is a
smaller consideration since volumes are not high in the UAV market. Low weight is the strongest
requirement. One way to reduce weight is to choose hardware with active cooling, since passive cooling
requires heavy metal heat sinks. Interestingly, once the UAV is airborne, the on-board electronics are less
likely to overheat since airflow from the rotors offers cooling. Secure air-to-ground communication
interfaces are required, but this is more a software issue. The Camera Link interface is required for
precision and low latency.

Table 2: High-end vision interfaces
Interface/Feature | CoaXPress | Camera Link | GigE Vision 1.x | USB3.0 | GigE Vision 2.x | Camera Link HS
Single-Link Speed | 6.25 Gb/s | 2 Gb/s | 1 Gb/s | 5 Gb/s | Up to 10 Gb/s | 3.125 Gb/s
Maximum Speed | N * 6.25 Gb/s (N cables) | 5.5 Gb/s (2 cables) | 2 Gb/s (Four lanes) | 5 Gb/s | 20/40 Gb/s (Four lanes) | 2N * 3.125 Gb/s (N cables)
Cost | Low to High (H/W required) | Medium to High (H/W required) | Medium | Low | Medium to High | Medium
Complexity | Low | Low | High | Medium | High | Medium
Cabling | Coaxial | Custom multi-core | Cat-6 | Complex | Cat-6, optical fiber | CX-4
Maximum Length | 100m/50m | 10m/7m | 100m | 3m | Fiber 20km+ | 15m, fiber 300m+
Data Integrity | CRC, 8B/10B | None | CRC, 8B/10B, Resend | CRC | CRC, 8B/10B or 64B/66B, Resend | CRC, Resend
Real Time Trigger | Yes (±4 ns) | Yes | No | No | Yes (>25 ns) | No
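The "Maximum Speed" row of Table 2 gives scaling rules for the two interfaces whose throughput grows with the number of cables: N * 6.25 Gb/s for CoaXPress and 2N * 3.125 Gb/s for Camera Link HS. A minimal sketch of how these rules could be used to size a cable bundle follows. The function name and the example throughput values are assumptions, and the figures are nominal link rates rather than measured payload rates.

```python
import math

# Aggregate rate contributed by each cable, in Gbit/s, read from the
# "Maximum Speed" row of Table 2:
#   CoaXPress:      N * 6.25 Gb/s   for N cables
#   Camera Link HS: 2N * 3.125 Gb/s for N cables
PER_CABLE_GBITS = {
    "CoaXPress": 6.25,
    "Camera Link HS": 2 * 3.125,
}


def cables_needed(interface, required_gbits):
    """Smallest number of cables whose nominal aggregate rate covers the need."""
    if required_gbits <= 0:
        raise ValueError("required_gbits must be positive")
    per_cable = PER_CABLE_GBITS[interface]
    return math.ceil(required_gbits / per_cable)


if __name__ == "__main__":
    for need in (5.0, 12.5, 20.0):
        for name in PER_CABLE_GBITS:
            print(f"{name}: {need} Gbit/s needs {cables_needed(name, need)} cable(s)")
```

Both rules work out to the same nominal rate per cable, so the two interfaces need the same number of cables for a given throughput. They differ in the other rows of the table, such as cabling and maximum length.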
4.1.3 Selection
We have selected the Xilinx® Zynq™ UltraScale+ MPSoC (Figure 2), a hybrid SoC, as the processing
component of the initial platform. The Zynq MPSoC covers a wide range along the main dimensions
considered by low power image processing applications, offers a host of I/O interfaces, and has a capable
software stack.
The Zynq MPSoC provides limited flexibility in two dimensions: power and cost. We discuss these
limitations first.
The Zynq MPSoC has a prohibitively high unit cost, a long-standing problem for hybrid SoCs. It also has an
estimated power budget of 24 W. This value is at the threshold of low power electronics. 24 W is also at the
boundary of the power budget of the Power-over-Ethernet (PoE, footnote 10) and Power-over-Camera-Link
(PoCL, footnote 11) standards, which were conceived to eliminate secondary power sources in the final
system. Power dissipation well below 24 W can be obtained through power-aware operation, by switching to
low power ASICs, or by adopting future FPGA technology.
The Zynq MPSoC can be configured such that applications operate well within the estimated 24 W power
budget. The SoC has four 64-bit and two 32-bit processing cores. Applications that do not need the
combined power of these cores can turn some of them off to save power. All components involved in camera
processing can also be turned off when known paths are being followed, for example during point-to-point
navigation of the UAV or when the automobile is moving in reverse. The frequency of the processing cores
can be dynamically throttled to suit the speed of the UAV or automobile, so that lower speeds
consume less energy. The FPGA substrate can be fully erased to save power when the vehicle is idle, thanks
to the fast dynamic reconfiguration capabilities of the Zynq MPSoC.
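The power-aware operation described above can be read as a simple policy that maps the vehicle state to a platform configuration. The sketch below is one possible reading, not a description of an existing implementation: the `PowerConfig` type, the frequency levels, the assumed top speed, and the choice to keep the FPGA configured on known paths are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical values: real frequency levels depend on the device and the OS.
FREQ_LEVELS_MHZ = (300, 600, 1200)
MAX_SPEED_KMH = 120.0  # assumed top speed used to scale the core frequency


@dataclass(frozen=True)
class PowerConfig:
    camera_pipeline_on: bool  # components involved in camera processing
    fpga_configured: bool     # False means the FPGA substrate is erased
    core_freq_mhz: int


def choose_config(speed_kmh, on_known_path):
    """Pick a power configuration from the vehicle state (illustrative policy)."""
    if speed_kmh <= 0.0:
        # Vehicle idle: erase the FPGA and run the cores at the lowest level.
        return PowerConfig(False, False, FREQ_LEVELS_MHZ[0])
    if on_known_path:
        # Known path, e.g. point-to-point navigation: camera processing is off.
        return PowerConfig(False, True, FREQ_LEVELS_MHZ[0])
    # Otherwise throttle the cores with speed: slower movement, lower frequency.
    fraction = min(speed_kmh / MAX_SPEED_KMH, 1.0)
    index = min(int(fraction * len(FREQ_LEVELS_MHZ)), len(FREQ_LEVELS_MHZ) - 1)
    return PowerConfig(True, True, FREQ_LEVELS_MHZ[index])


if __name__ == "__main__":
    for speed, known in ((0, False), (30, False), (60, False), (120, False), (50, True)):
        print(speed, known, choose_config(speed, known))
```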
We admit that it is unlikely that the Zynq MPSoC will be chosen as the production platform for low power
image processing applications due to its high unit cost and borderline power dissipation. However, the Zynq
MPSoC provides unmatched flexibility in all other dimensions, making it a compelling choice for the starter
kit.
The Zynq MPSoC belongs to a SoC family called EV (footnote 12). By default, the family offers powerful
processing that can be expanded with larger FPGAs. There exist variants in the family with downgraded
multicore processing capabilities and without dedicated video codecs for image compression. All variants in
the family have a host of standard interfaces and a dedicated GPU (ARM Mali™, footnote 13) in common.
The Zynq MPSoC is manufactured using 16nm silicon technology, close to the technology edge for the year
2016 as defined by ITRS (footnote 14). The SoC has already been proven in the field (footnote 15), so it is
not a reliability risk. It will enter full production in 2017 for commercial parts, followed by a rollout of
automotive-grade and defense-grade variants. The varying camera interface requirements of typical
applications can be accommodated using add-on modules of the Zynq MPSoC.
The chosen board for the Zynq MPSoC is compatible with PC/104. More than 40 vendors (footnote 16) are
manufacturing compatible boards and are committed to the on-going development of the PC/104 specification.
10 https://en.wikipedia.org/wiki/Power_over_Ethernet
11 https://en.wikipedia.org/wiki/Camera_Link
12 http://www.xilinx.com/support/documentation/data_sheets/ds890-ultrascale-overview.pdf
13 https://en.wikipedia.org/wiki/Mali_(GPU)
14 https://en.wikipedia.org/wiki/International_Technology_Roadmap_for_Semiconductors
15 http://www.xilinx.com/video/soc/zynq-ultrascale-plus-says-hello-world.html
16 http://pc104.org/membership/members/

4.2 Real-Time Operating System (RTOS)
Low power high performance real-time image processing applications require high-performance real-time
scheduling, memory protection, power management, and reliable access to peripherals and devices on the
target platform. Implementing these techniques at the application level severely limits productivity. Even
when implemented competently, the techniques become unreliable when several applications with
competing requirements execute simultaneously on the target platform. The obvious solution is to use a
Real-Time Operating System (RTOS) that can abstract away low-level hardware details and provide
convenient and reliable services for productive multiprogrammed application development.
The aim of this section is to motivate the choice of RTOS for the initial platform. We first discuss the
requirements of typical low power high performance real-time image processing embedded applications
and then present our selection.
4.2.1 Application requirements
Requirements placed by typical low power image processing embedded applications on the RTOS are:
• Real-time scheduling and virtual memory support.
• Efficient utilization of available parallelism on the target platform. For example, symmetric
multiprocessing (SMP), heterogeneous processors, and reconfigurable hardware.
• Power optimization using state-of-the-art techniques such as Dynamic Voltage and Frequency
Scaling (DVFS, dynamically changing the voltage and frequency of the cores depending on the load) and
Dynamic Power Management (DPM, the ability to power cores off and on). This implies the presence of
device drivers to control hardware and kernel features such as power-aware scheduling and
governor policies.
• Architecture-optimized abstractions for high-performance image processing. Examples: OpenCV,
OpenCL, and OpenMP.
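To make the DVFS and DPM requirement more concrete, the sketch below shows a toy governor. It picks the lowest operating point that can serve the current load and estimates dynamic power with the textbook CMOS model P = C * V^2 * f. The operating points, the sleep threshold, and all function names are assumptions made for illustration and do not describe any particular RTOS or the Zynq MPSoC.

```python
# Hypothetical operating points: (frequency in MHz, supply voltage in V).
OPERATING_POINTS = ((300, 0.70), (600, 0.80), (1200, 0.90))


def dynamic_power(freq_mhz, voltage, capacitance=1.0):
    """Textbook CMOS dynamic power model P = C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * freq_mhz


def select_operating_point(load):
    """DVFS: lowest operating point able to serve `load`.

    `load` is the demanded fraction (0.0 to 1.0) of the capacity available
    at the highest frequency.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    max_freq = OPERATING_POINTS[-1][0]
    for freq, voltage in OPERATING_POINTS:
        if freq >= load * max_freq:
            return freq, voltage
    return OPERATING_POINTS[-1]


def core_should_sleep(load, threshold=0.05):
    """DPM: power a core off when its load is negligible (assumed threshold)."""
    return load < threshold


if __name__ == "__main__":
    for load in (0.0, 0.2, 0.5, 0.9):
        freq, voltage = select_operating_point(load)
        print(f"load={load:.1f} sleep={core_should_sleep(load)} "
              f"point=({freq} MHz, {voltage} V) "
              f"power={dynamic_power(freq, voltage):.0f}")
```

Because voltage enters the model squared, lowering voltage together with frequency saves more power than frequency scaling alone. This is why the requirement names voltage and frequency jointly.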
Support for Xilinx Zynq UltraScale+ MPSoC is an additional requirement that stems from our choice of
hardware component. We bundle this requirement into the list of application requirements.
General-purpose Operating Systems (GPOSs) and simple Board Support Packages (BSPs) do not
satisfy all application requirements. For example, GPOSs do not provide guarantees on timing
and energy efficiency [25, 26], while BSPs are too simple to provide high-performance abstractions such as
OpenCV.
A Real-Time Operating System (RTOS) can potentially fulfil all application requirements. We evaluated
eight state-of-the-art RTOSs against the application requirements: FreeRTOS from Real Time Engineers Ltd
(footnote 17), PikeOS from SYSGO AG (footnote 18), Real-Time Executive for Multiprocessor Systems (RTEMS)
from OAR Corporation (footnote 19), VxWorks from Wind River (footnote 20), QNX from BlackBerry Ltd
(footnote 21), Integrity from Green Hills Software (footnote 22), Nucleus RTOS from Mentor Graphics
(footnote 23), and HIPPEROS RTOS from HIPPEROS S.A. (footnote 24). Results of the evaluation are
summarized in Table 3.
17 http://www.freertos.org/
18 https://www.sysgo.com/products/pikeos-rtos-and-virtualizationconcept/
19 https://www.rtems.org
20 http://www.windriver.com/products/vxworks/
Table 3: RTOS and application requirements
RTOS / Application requirement | FreeRTOS | PikeOS | RTEMS | VxWorks | QNX | Integrity | Nucleus | HIPPEROS
Real-time scheduling | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Virtual memory support | No | Yes | No | Yes | Yes | Yes | Yes | Yes
SMP | No | Yes | No | Yes | Yes | Yes | Yes | Yes
Heterogeneous processing | No | No | No | Yes | No | No | No | Yes
DVFS | No | No | No | No | No | No | No | Yes
DPM | No | No | No | No | No | No | Yes | Yes
OpenCV | No | No | No | No | No | No | No | On-going
OpenCL/GL | No | No | No | No | No | No | No | On-going
OpenMP | No | No | No | No | No | No | No | On-going
Zynq MPSoC | Yes | No | No | Yes | No | No | No | Yes
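The evaluation in Table 3 amounts to filtering the candidates by the full list of requirements. The sketch below transcribes the table and applies that filter. The data structure and function names are assumptions, and treating "On-going" as acceptable is a choice made here for illustration.

```python
# Requirement order follows the rows of Table 3.
REQUIREMENTS = (
    "Real-time scheduling", "Virtual memory support", "SMP",
    "Heterogeneous processing", "DVFS", "DPM",
    "OpenCV", "OpenCL/GL", "OpenMP", "Zynq MPSoC",
)

# Columns of Table 3, one tuple per RTOS.
TABLE3 = {
    "FreeRTOS":  ("Yes", "No",  "No",  "No",  "No",  "No",  "No", "No", "No", "Yes"),
    "PikeOS":    ("Yes", "Yes", "Yes", "No",  "No",  "No",  "No", "No", "No", "No"),
    "RTEMS":     ("Yes", "No",  "No",  "No",  "No",  "No",  "No", "No", "No", "No"),
    "VxWorks":   ("Yes", "Yes", "Yes", "Yes", "No",  "No",  "No", "No", "No", "Yes"),
    "QNX":       ("Yes", "Yes", "Yes", "No",  "No",  "No",  "No", "No", "No", "No"),
    "Integrity": ("Yes", "Yes", "Yes", "No",  "No",  "No",  "No", "No", "No", "No"),
    "Nucleus":   ("Yes", "Yes", "Yes", "No",  "No",  "Yes", "No", "No", "No", "No"),
    "HIPPEROS":  ("Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
                  "On-going", "On-going", "On-going", "Yes"),
}


def unmet_requirements(rtos, accepted=("Yes", "On-going")):
    """Requirements from Table 3 that the given RTOS does not satisfy."""
    return [req for req, status in zip(REQUIREMENTS, TABLE3[rtos])
            if status not in accepted]


def candidates(accepted=("Yes", "On-going")):
    """RTOSs with no unmet requirement."""
    return [rtos for rtos in TABLE3 if not unmet_requirements(rtos, accepted)]


if __name__ == "__main__":
    print("meets all requirements:", candidates())
    print("VxWorks is missing:", unmet_requirements("VxWorks"))
```

With "On-going" accepted, the filter leaves a single candidate. Requiring a strict "Yes" for every row leaves none, which shows how much the outcome depends on the on-going work.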
4.2.2 Selection
We selected the HIPPEROS RTOS since it was the only RTOS that met all application requirements (support for
OpenCV, OpenCL/GL, and OpenMP is on-going, see Table 3). The HIPPEROS RTOS is built from the ground up
around a micro-kernel, with hard real-time scheduling, time and space isolation, and multicore execution as
base design principles. The kernel supports virtual memory, efficient IPC, and real-time scheduling
algorithms that are aware of power and thermal constraints. This ability to integrate low power optimization
into the kernel is a major differentiator compared to state-of-the-art systems, where power management is
left to the user and interferes with real-time constraints.

21 http://www.windriver.com/products/vxworks/
22 http://www.ghs.com/products/rtos/integrity.html
23 https://www.mentor.com/embeddedsoftware/nucleus/
24 http://www.hipperos.com/

A summary of relevant interfaces provided or under development in the project by HIPPEROS RTOS for
image processing embedded application and tool development is shown in Table 4.
Table 4: HIPPEROS RTOS interfaces
Interface | Target | Comments
LibC | Application | NA
POSIX | Application | A subset relevant for embedded applications only
C/C++ | Application | NA
OpenMP | Application | NA
OpenCV | Application | NA
OpenCL/GL | Application | NA
Power management API | Application | NA
Integration in vendor toolchains (Xilinx SDSoC and Xilinx Vivado) | Toolchain | NA
OS configuration tool | Toolchain | Configuring devices, tasks etc.
Integration with remote target debugging tools (OpenOCD and JTAG) | Toolchain | NA
Generic bootloader such as UBoot | Toolchain | NA
FPGA reconfiguration service | Application | Enables online FPGA reconfiguration
Logging service | Application | NA
Read hardware performance counters | Application | NA
4.3 Development tools
Since the Xilinx® Zynq™ UltraScale+ MPSoC is the selected hardware component (Section 4.1.3), we decided
to use the application development tools recommended by Xilinx. Any other choice of development tools
would not have been as productive or reliable. In addition, we were encouraged by Xilinx's solid support
service and extensive documentation. The recommended development tools for the Zynq MPSoC are:

4.3.1 Xilinx Vivado
Xilinx Vivado (footnote 25) enables programmers to build hardware at a low abstraction level. A key feature
of Vivado is its library of pre-configured, optimized hardware IP blocks that can be easily integrated into
existing designs using the IP-XACT standard format, saving developers a lot of work. These pre-configured IP
blocks include hardware implementations of frequently used functions (for example, OpenCV functions)
and interfaces between architecture components (for example, memory interfaces between FPGA and
APU). Additionally, Vivado supports a partial reconfiguration workflow. We expect that partial
reconfiguration will play a crucial role in meeting the low power requirements of large applications. Vivado
includes functionality to connect, program, and debug one or more FPGA devices, either directly using JTAG
or indirectly using an SD card.
Vivado HLS is an extension of Vivado that provides high-level synthesis capability. Developers can insert
Vivado HLS pragmas (compiler directives) in C/C++ and OpenCL code to automatically generate equivalent
hardware. This makes hardware design accessible to a wider community.
Vivado enables debugging hardware at different levels of accuracy. The basic debug process is RTL
simulation, which enables early verification of designs. However, it is difficult to simulate very large designs
and to accurately reproduce a real-world system environment using RTL simulation. A more accurate debug
process is post-implementation design simulation, which verifies the functionality of the design using a
timing-accurate model. The most accurate debug process is runtime debugging, where the design is tested
under real conditions and its interfaces are validated. However, extra hardware is required to observe
signals of interest.
Vivado provides dynamic power optimization mechanisms (clock gating) and power estimation tools for all
design stages. The Xilinx Power Estimator (XPE) is used at pre-implementation stages of the design cycle to
get detailed estimates of power dissipation and temperature, enabling early architecture evaluation and
informed configuration of power supplies and heat management devices. Post-routing power estimates in
Vivado are more accurate and can pin-point power hungry components in the design hierarchy.
Vivado commands can also be executed from the command-line or using scripts written in the Tcl language.
This enables automation of the design process. For example, it is possible to rebuild whole applications
without interacting with the GUI.
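As a minimal sketch of such automation, the snippet below drives a Vivado batch run from a Python build script using Vivado's `-mode batch`, `-source`, and `-tclargs` command-line options. The script name `build.tcl`, its argument, and the helper function names are hypothetical, and the example assumes the `vivado` executable is on the PATH.

```python
import shutil
import subprocess


def vivado_batch_command(tcl_script, tcl_args=()):
    """Build the command line for a non-interactive Vivado run."""
    cmd = ["vivado", "-mode", "batch", "-source", tcl_script]
    if tcl_args:
        # Everything after -tclargs is passed through to the Tcl script.
        cmd += ["-tclargs", *tcl_args]
    return cmd


def run_build(tcl_script, tcl_args=()):
    """Run the build without the GUI and return the process exit code."""
    if shutil.which("vivado") is None:
        raise RuntimeError("vivado executable not found on PATH")
    completed = subprocess.run(vivado_batch_command(tcl_script, tcl_args), check=True)
    return completed.returncode


if __name__ == "__main__":
    # 'build.tcl' and its argument are placeholders for a project-specific script.
    print(" ".join(vivado_batch_command("build.tcl", ["release"])))
```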
4.3.2 Xilinx SDSoC
Xilinx SDSoC (footnote 26) is the latest incarnation of the Xilinx SDK tool that enables developers to design
whole application software stacks (application code, operating system, third-party libraries, root file system,
boot loader etc.). A key distinguishing feature of SDSoC is the unification of previously separate hardware
and software design flows into a single software-centric design flow where developers can accelerate
C/C++ functions on the FPGA at the click of a button. Automation in the background takes care of
generating equivalent hardware and the required communication mechanisms. Developers can fine-tune
accelerated functions using pragma-based compiler directives, including loop pipelining, loop
unrolling, and bit-width reduction.

25 https://www.xilinx.com/products/design-tools/vivado.html
26 https://www.xilinx.com/products/design-tools/software-zone/sdsoc.html

Acceleration is essentially a trade-off between performance and the maximum memory throughput. SDSoC
profiling tools allow programmers to profile memory access behavior and to estimate the latency of
accelerated functions.
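One simple way to reason about this trade-off is a first-order estimate in which the slower of computation and data movement determines the latency of an accelerated function. The sketch below is such an estimate and is not how SDSoC computes its figures. It assumes computation and transfers overlap perfectly, and all names and numbers are illustrative assumptions.

```python
def compute_time_s(ops, accel_ops_per_s):
    """Time spent computing on the accelerator."""
    return ops / accel_ops_per_s


def transfer_time_s(bytes_moved, mem_bytes_per_s):
    """Time spent moving data between memory and the accelerator."""
    return bytes_moved / mem_bytes_per_s


def accelerated_time_s(ops, bytes_moved, accel_ops_per_s, mem_bytes_per_s):
    """First-order latency estimate: the slower of the two phases dominates."""
    return max(compute_time_s(ops, accel_ops_per_s),
               transfer_time_s(bytes_moved, mem_bytes_per_s))


def is_memory_bound(ops, bytes_moved, accel_ops_per_s, mem_bytes_per_s):
    """True when data movement, not computation, limits the function."""
    return (transfer_time_s(bytes_moved, mem_bytes_per_s)
            > compute_time_s(ops, accel_ops_per_s))


if __name__ == "__main__":
    # Made-up workload: 1e9 operations over 4e9 bytes of data.
    args = dict(ops=1e9, bytes_moved=4e9, accel_ops_per_s=1e10, mem_bytes_per_s=2e9)
    print("estimated latency (s):", accelerated_time_s(**args))
    print("memory bound:", is_memory_bound(**args))
```

In this made-up case a faster accelerator would not help, since the memory system already sets the latency. Profiling memory access behavior is meant to expose this kind of situation.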
Like Vivado, SDSoC commands can be executed on the command line or using Tcl-based scripts, enabling
design automation.
4.4 Summary
In summary, we presented selection criteria and key features of the components of the initial platform that
serves as a base for the platform instance, a part of the starter kit. The initial platform is needed to develop
the platform instance once guidelines are formulated. We have put together the initial platform based on
our experience and prior work.
Hardware components and RTOS of the initial platform were selected based on a thorough analysis of
application requirements and available choices. Selection of application development tools for the initial
platform was relatively simple since we decided to stick to hardware vendor recommendations for the sake
of productivity and reliability.
The selected hardware component for the initial platform is the Xilinx Zynq UltraScale+ MPSoC, a hybrid
SoC that combines heterogeneous multicore processing, a dedicated GPU, a large FPGA fabric, and numerous
standard interfaces for image processing. The Zynq MPSoC meets the processing requirements of typical
applications. However, it has a high unit cost and a peak power consumption that is at the threshold for low
power electronics. Nevertheless, it is a compelling choice for the initial platform based on its processing
power, choice of power optimizations, flexibility in image processing interfaces, and mature software stack.
The selected RTOS for the initial platform is the HIPPEROS RTOS. The RTOS supports the requirements of
typical project applications as well as the Zynq MPSoC. HIPPEROS supports real-time scheduling, virtual
memory, efficient IPC, and power management, with support for high-performance programming
abstractions such as OpenCV and OpenMP on-going (Table 3).
The selected application development tools for the initial platform are those recommended by Xilinx for the
Zynq MPSoC: Vivado and SDSoC. Together, these tools support high-level synthesis, low-level hardware
design, fine control of middleware, detailed performance/energy analysis, and user automation, enabling
both experts and novices to work productively.

5 Conclusions
The deliverable presented the reference platform in the context of image processing embedded systems
and the starter kit.
The starter kit is a conceptual package consisting of the platform instance, project applications, and
reference platform handbook. The aim of the starter kit is to provide engineers with a generic evaluation
platform that serves as a base for productively developing low power image processing applications.
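The platform instance is a physical processing system consisting of hardware, Operating System (OS), and
application development tools. Components of the platform instance are specialized for low power image
processing applications.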
The reference platform handbook is a set of guidelines for low power image processing embedded systems.
We use guidelines as shorthand for the reference platform handbook.
Guidelines recommend application implementation methods supported by the platform instance. A
guideline is a goal-oriented, expert-formulated encapsulation of advice and recommended implementation
methods for low power image processing. A vendor platform that enables guidelines by providing suitable
implementation methods is called an instance. The platform instance is an instance that is fully compliant
since it only provides recommended implementation methods for the guidelines it supports. We have
tentatively decided that compliance to guidelines will be judged and certified by an independent body
identified by the eco-system of stakeholders created during the project. A more certain decision will be
taken as the project matures.
A project-wide workflow formulates and evaluates guidelines within project applications. The workflow
uses available expertise and runs continuously during and beyond the project. Off the critical path, the
workflow improves existing technology to facilitate evaluation of guidelines and encourage vendors to
support recommended implementation methods in their instances.
The output of the workflow at the end of the project is the reference platform handbook. The handbook is
vetted and maintained by an ecosystem of stakeholders created during the project. The handbook is not
finalized at the end of the project and can be updated by anyone who uses the workflow, subject to approval
from the maintainers of the handbook.
Project applications, a part of the starter kit, are selected to demonstrate the effectiveness of the platform
instance and guidelines. Selected from the automotive, UAV, and medical image processing domains, the
project applications pose challenging requirements and highly interesting design tradeoffs.
Implementations that meet the requirements can potentially have a lasting impact in their respective
communities and act as compelling success stories for the starter kit. It is yet to be determined how project
applications will be delivered in the starter kit. Source code of project applications will likely not be included
due to IP restrictions.

Since we have not formulated guidelines yet, the platform instance cannot be substantiated. However, a
base platform is needed to evaluate guidelines and develop the platform instance. This base platform is
called the initial platform. We have put together the initial platform using our experience and prior work.
The selected hardware component of the initial platform is the Xilinx Zynq UltraScale+ MPSoC, the OS is the
HIPPEROS RTOS, and the development tools are those recommended by Xilinx for the Zynq MPSoC: Vivado
and SDSoC.
The path ahead is clear. We will continue to run the workflow to formulate and evaluate guidelines within
project applications. Technology used during the process will be part of the platform instance as
recommended implementation methods. The next version of the document will present a cumulative list of
guidelines, their evaluations within project applications, and developments made to the platform instance.

6 References
[1] C. Simmonds, Mastering Embedded Linux Programming: Packt Publishing Ltd, 2015.
[2] (4 October 2016). NEON - ARM. Available:
http://www.arm.com/products/processors/technologies/neon.php
[3] (4 October 2016). RT PREEMPT HOWTO - RTwiki. Available:
https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO
[4] (4 October 2016). The Zynq Book: About The Book. Available:
http://www.zynqbook.com/about.html
[5] N. Rajovic, P. M. Carpenter, I. Gelado, N. Puzovic, A. Ramirez, and M. Valero, "Supercomputing with
Commodity CPUs: Are Mobile SoCs Ready for HPC?," 2013, pp. 1-12.
[6] (4 October 2016). OpenMP Specifications. Available:
http://openmp.org/wp/openmp-specifications/
[7] (4 September 2016). GitBook · Writing Made Easy. Available: https://www.gitbook.com/
[8] (4 October 2016). UltraFast Design Methodology. Available:
https://www.xilinx.com/products/design-tools/ultrafast.html
[9] M. Keating, D. Flynn, R. Aitken, A. Gibbons, and K. Shi, Low Power Methodology Manual: For System-
on-Chip Design: Springer Publishing Company, Incorporated, 2007.
[10] A. Ahmed and W. Wolf, "Hardware/Software Interface Codesign for Embedded Systems," 2005.
[11] (4 October 2016). Android Best Practices | OpenCV. Available:
http://opencv.org/platforms/android/android-best-practices.html
[12] (4 October 2016). Best Practices in Embedded Systems Programming. Available:
http://www.embedded.com/collections/4398825/Best-practices-in-programming
[13] (4 October 2016). FPGARelated.com - All You Can Eat FPGA. Available:
https://www.fpgarelated.com/
[14] (4 October 2016). EVA Blog. Available: http://www.embedded-vision.com/industry-analysis/blog
[15] (19 September 2016). Xcell Publications. Available:
http://www.xilinx.com/about/xcell-publications.html
[16] (4 October 2016). EVA Technical Articles. Available:
http://www.embedded-vision.com/industry-analysis/technical-articles
[17] (4 October 2016). Embedded Linux Experts - Free Electrons. Available:
http://free-electrons.com/doc/training/embedded-linux/
[18] (4 October 2016). Training and Videos | Zedboard. Available:
http://zedboard.org/support/trainings-and-videos
[19] (4 October 2016). 1080p60 HD Medical Endoscope. Available:
http://www.xilinx.com/applications/medical/endoscope.html
[20] M. Wolf, High-Performance Embedded Computing: Applications in Cyber-Physical Systems and
Mobile Computing: Newnes, 2014.
[21] E. White, Making Embedded Systems: Design Patterns for Great Software: O'Reilly Media, Inc.,
2011.
[22] J. C. Russ, The Image Processing Handbook, Sixth Edition: CRC Press, 2011.
[23] (4 October 2016). Xilinx Alliance Member Design Services. Available:
http://www.xilinx.com/alliance/design-services.html#certified

[24] A. J. Barry and R. Tedrake, "Pushbroom stereo for high-speed navigation in cluttered
environments," in 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, pp.
3046-3052.
[25] A. Paolillo, O. Desenfans, V. Svoboda, J. Goossens, and B. Rodriguez, "A New Configurable and
Parallel Embedded Real-time Micro-Kernel for Multi-core platforms," OSPERT 2015, p. 25, 2015.
[26] B. B. Brandenburg, "Scheduling and locking in multiprocessor real-time operating systems,"
Citeseer, 2011.
