Network Working Group B. Trammell
Request for Comments: 5655 E. Boschi
Category: Standards Track Hitachi Europe
October 2009
Specification of the IP Flow Information Export (IPFIX) File Format
Abstract
This document describes a file format for the storage of flow data
based upon the IP Flow Information Export (IPFIX) protocol. It
proposes a set of requirements for flat-file, binary flow data file
formats, then specifies the IPFIX File format to meet these
requirements based upon IPFIX Messages. This IPFIX File format is
designed to facilitate interoperability and reusability among a wide
variety of flow storage, processing, and analysis tools.
Status of This Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the BSD License.
1. Introduction
This document specifies a file format based upon IPFIX, designed to
facilitate interoperability and reusability among a wide variety of
flow storage, processing, and analysis tools. It begins with an
overview of the IPFIX File format, and a quick summary of how IPFIX
Files work in Section 3. The detailed specification of the IPFIX
File format appears in Section 7; this section includes general
specifications for IPFIX File Readers and IPFIX File Writers and
specific recommendations for common situations in which they are
used. The format makes use of the IPFIX Options mechanism for
additional file metadata, in order to avoid requiring any protocol
extensions, and to minimize the effort required to adapt IPFIX
implementations to use the file format; a detailed definition of the
Options Templates used for storage metadata appears in Section 8.
Appendix A contains a detailed example IPFIX File.
An advantage of file-based storage is that files can be readily
encapsulated within each other and other data storage and
transmission formats. The IPFIX File format leverages this to
provide encryption, described in Section 9, and compression, described
in Section 10. Section 11 provides specific recommendations for
integration of IPFIX File data with other formats.
The IPFIX File format was designed to be applicable to a wide variety
of flow storage situations; the motivation behind its creation is
described in Section 4. The document outlines the set of
requirements the format is designed to meet in Section 5, and
explores the applicability of such a format to various specific
application areas in Section 6. These sections are intended to give
background on the development of IPFIX Files.
1.1. IPFIX Documents Overview
"Specification of the IP Flow Information Export (IPFIX) Protocol for
the Exchange of IP Traffic Flow Information" [RFC5101] and its
associated documents define the IPFIX protocol, which provides
network engineers and administrators with access to IP traffic flow
information.
"Architecture for IP Flow Information Export" [RFC5470] defines the
architecture for the export of measured IP flow information out of an
IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
terminology used to describe the elements of this architecture, per
the requirements defined in "Requirements for IP Flow Information
Export" [RFC3917]. [RFC5101] then covers the details of the method
for transporting IPFIX Data Records and Templates via a
congestion-aware transport protocol from an IPFIX Exporting Process
to an IPFIX Collecting Process.
"Information Model for IP Flow Information Export" [RFC5102]
describes the Information Elements used by IPFIX, including details
on Information Element naming, numbering, and data type encoding.
"IP Flow Information Export (IPFIX) Applicability" [RFC5472]
describes the various applications of the IPFIX protocol and their
use of information exported via IPFIX, and it relates the IPFIX
architecture to other measurement architectures and frameworks.
In addition, "Exporting Type Information for IP Flow Information
Export (IPFIX) Information Elements" [RFC5610] specifies a method for
encoding Information Model properties within an IPFIX Message stream.
This document references [RFC5101] and [RFC5470] for terminology,
defines IPFIX File Writer and IPFIX File Reader in terms of the IPFIX
Exporting Process and IPFIX Collecting Process definitions from
[RFC5101], and extends the IPFIX Information Model defined in
[RFC5102] to provide new Information Elements for IPFIX File
metadata. It uses the method described in [RFC5610] to support the
self-description of IPFIX Files containing enterprise-specific
Information Elements.
2. Terminology
This section defines terminology related to the IPFIX File format.
In addition, terms used in this document that are defined in the
"Terminology" section of [RFC5101] are to be interpreted as defined
there.
IPFIX File: An IPFIX File is a serialized stream of IPFIX Messages;
this stream may be stored on a filesystem or transported using any
technique customarily used for files. Any IPFIX Message stream
that would be considered valid when transported over one or more
of the specified IPFIX transports (Stream Control Transmission
Protocol (SCTP), TCP, or UDP) as defined in [RFC5101] is
considered an IPFIX File. However, this document extends that
definition with recommendations on the construction of IPFIX Files
that meet the requirements identified in Section 5.
IPFIX File Reader: An IPFIX File Reader is a process that reads
IPFIX Files from a filesystem. An IPFIX File Reader operates as
an IPFIX Collecting Process as specified in [RFC5101], except as
modified by this document.
IPFIX File Writer: An IPFIX File Writer is a process that writes
IPFIX Files to a filesystem. An IPFIX File Writer operates as an
IPFIX Exporting Process as specified in [RFC5101], except as
modified by this document.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Design Overview
An IPFIX File is simply a data stream containing one or more IPFIX
Messages serialized to some filesystem. Though any set of valid
IPFIX Messages can be serialized into an IPFIX File, the
specification includes guidelines designed to ease storage and
retrieval of flow data using the IPFIX File format.
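
The following non-normative sketch illustrates what "a data stream containing one or more IPFIX Messages" means for a File Reader. It relies on the 16-octet IPFIX Message Header defined in [RFC5101] (Version, Length, Export Time, Sequence Number, Observation Domain ID). The function names iter_messages and make_message are hypothetical and are not defined by this specification.

```python
import io
import struct

IPFIX_VERSION = 10                 # Version field value in the IPFIX Message Header
HEADER = struct.Struct("!HHIII")   # version, length, export time, sequence, domain


def make_message(export_time, sequence, domain, body=b""):
    """Build one IPFIX Message: 16-octet header followed by the Set bytes."""
    length = HEADER.size + len(body)
    return HEADER.pack(IPFIX_VERSION, length, export_time, sequence, domain) + body


def iter_messages(stream):
    """Yield ((export_time, sequence, domain), body) per message in a stream."""
    while True:
        raw = stream.read(HEADER.size)
        if not raw:
            return                              # clean end of file
        if len(raw) < HEADER.size:
            raise ValueError("truncated message header")
        version, length, export_time, sequence, domain = HEADER.unpack(raw)
        if version != IPFIX_VERSION:
            raise ValueError(f"unexpected version {version}")
        if length < HEADER.size:
            raise ValueError("message length shorter than header")
        body = stream.read(length - HEADER.size)
        if len(body) != length - HEADER.size:
            raise ValueError("truncated message body")
        yield (export_time, sequence, domain), body
```

Because the Length field covers the whole message, a reader can walk a file message by message without interpreting any Set contents.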
IPFIX Files contain only IPFIX Messages; any file metadata such as
checksums or export session details are stored using Options within
the IPFIX Message. This design is completely compatible with the
IPFIX protocol on the wire. A schematic of a typical IPFIX File is
handling workflow application. Sets of flow data relevant to
Internet measurement research may be published as files, much as
libpcap [pcap] packet trace files are, to provide common datasets for
the repeatability of research efforts; these files would have
lifetimes measured in months or years. Operational flow measurement
systems also have a need for long-term, archival storage of flow
data, either as a primary flow data repository, or as a backing tier
for online storage in a relational database management system
(RDBMS).
The variety of applications of flow data, and the variety of
presently deployed storage approaches, indicate the need for a
standard approach to flow storage with applicability across the
continuum of time scales over which flow data is stored. A storage
format based around flat files would best address the variety of
storage requirements. While much work has been done on structured
storage via RDBMS, relational database systems are not a good basis
for format standardization owing to the fact that their internal data
structures are generally private to a single implementation and
subject to change for internal reasons. Also, there is a wide
variety of operations available on flat files, and external tools and
standards can be leveraged to meet file-based flow storage
requirements. Further, flow data is often not very semantically
complicated, and is managed in very high volume; therefore, an RDBMS-
based flow storage system would not benefit much from the advantages
of relational database technology.
The simplest way to create a new file format is simply to serialize
some internal data model to disk, with either textual or binary
representation of data elements, and some framing strategy for
delimiting fields and records. "Ad hoc" file formats such as this
have several important disadvantages. They impose the semantics of
the data model from which they are derived on the file format, and as
such, they are difficult to extend, describe, and standardize.
Indeed, one de facto standard for the storage of flow data is one of
these ad hoc formats. A common method of storing data collected via
Cisco NetFlow is to serialize a stream of raw NetFlow datagrams into
files. These NetFlow PDU files consist of a collection of header-
prefixed blocks (corresponding to the datagrams as received on the
wire) containing fixed-length binary flow records. NetFlow V5, V7,
and V8 data may be mixed within a given file, as the header on each
datagram defines the NetFlow version of the records following. While
this NetFlow PDU file format has all the disadvantages of an ad hoc
format, and is not extensible to data models other than that defined
by Cisco NetFlow, it is at least reasonably well understood due to
Over the past decade, XML has emerged as a new "universal"
representation format for structured data. It is intended to be
human readable; indeed, that is one reason for its rapid adoption.
However, XML has limited usefulness for representing network flow
data. Network flow data has a simple, repetitive, non-hierarchical
structure that does not benefit much from XML. An XML representation
of flow data would be an essentially flat list of the attributes and
their values for each flow record.
The XML approach to data encoding is very heavyweight when compared
to binary flow encoding. XML's use of start- and end-tags, and
plaintext encoding of the actual values, leads to significant
inefficiency in encoding size. Typical network traffic datasets can
contain millions or billions of flows per hour of traffic
represented. Any increase in storage size per record can have
dramatic impact on flow data storage and transfer sizes. While data
compression algorithms can partially remove the redundancy introduced
by XML encoding, they introduce additional overhead of their own.
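
As a rough, non-normative illustration of the size difference, the sketch below encodes one flow record both as fixed-length binary fields and as a flat XML element list. The record layout, field names, and XML vocabulary are assumptions made for this example only; neither encoding is the IPFIX encoding.

```python
import socket
import struct

# One illustrative flow record (values are arbitrary documentation addresses).
FLOW = {
    "src": "192.0.2.1",
    "dst": "198.51.100.7",
    "sport": 49152,
    "dport": 443,
    "proto": 6,
    "packets": 12,
    "octets": 3400,
}


def encode_binary(flow):
    """Pack the record as fixed-length network-order fields (29 octets)."""
    return struct.pack(
        "!4s4sHHBQQ",
        socket.inet_aton(flow["src"]),
        socket.inet_aton(flow["dst"]),
        flow["sport"],
        flow["dport"],
        flow["proto"],
        flow["packets"],
        flow["octets"],
    )


def encode_xml(flow):
    """Serialize the same record as a flat list of start/end-tagged values."""
    fields = "".join(f"<{key}>{value}</{key}>" for key, value in flow.items())
    return f"<flow>{fields}</flow>".encode("ascii")
```

Comparing len(encode_xml(FLOW)) with len(encode_binary(FLOW)) shows the per-record overhead contributed by tags and plaintext values; the exact ratio depends on the chosen tag names and field values.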
A further problem is that XML processing tools require a full XML
parser. XML parsers are fully general and therefore complex,
resource-intensive, and relatively slow, introducing significant
processing time overhead for large network-flow datasets. In
contrast, parsers for typical binary flow data encodings are simply
structured, since they only need to parse a very small header and
then have complete knowledge of all following fields for the
particular flow. These can then be read in a very efficient linear
fashion.
This leads us to propose the IPFIX Message format as the basis for a
new flow data file format. The IPFIX Working Group, in defining the
IPFIX protocol, has already defined an information model and data
formatting rules for representation of flow data. Especially at
shorter time scales, when a file is a unit of data interchange, the
filesystem may be viewed as simply another IPFIX Message transport
between processes. This format is especially well suited to
representing flow data, as it was designed specifically for flow data
export; it is easily extensible, unlike ad hoc serialization, and
compact, unlike XML. In addition, IPFIX is an IETF Standards-Track
protocol for the export and collection of flow data; using a common
format for storage and analysis at the collection side allows
implementors to use substantially the same information model and data
formatting implementation for transport as well as storage.
5. Requirements
In this section, we outline a proposed set of requirements
[SAINT2007] for any persistent storage format for flow data. First
and foremost, a flow data file format should support storage across
the continuum of time scales important to flow storage applications.
Each of the requirements enumerated in the sections below is broadly
applicable to flow storage applications, though each may be more
important at certain time scales. For each, we first identify the
requirement, then explain how the IPFIX Message format addresses it,
or briefly outline the changes that must be made in order for an
IPFIX-based file format to meet the requirement.
5.1. Record Format Flexibility
Due to the wide variety of flow attributes collected by different
network flow attribute measurement systems, the ideal flow storage
format will not impose a single data model or a specific record type
on the flows it stores. The file format must be flexible and
extensible; that is, it must support the definition of multiple
record types within the file itself, and must be able to support new
field types for data within the records in a graceful way.
IPFIX provides record format flexibility through the use of Templates
to describe each Data Record, through the use of an IANA Registry to
define its Information Elements, and through the use of enterprise-
specific Information Elements.
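
To make the role of Templates concrete, the following non-normative sketch parses the Field Specifiers of a single Template Record as laid out in [RFC5101]: a Template ID and Field Count, followed by (Information Element identifier, Field Length) pairs, with a 4-octet Enterprise Number present when the high bit of the identifier is set. The function name parse_template_record is hypothetical.

```python
import struct


def parse_template_record(buf, offset=0):
    """Return (template_id, [(enterprise, ie_id, length), ...], next_offset)."""
    template_id, field_count = struct.unpack_from("!HH", buf, offset)
    offset += 4
    fields = []
    for _ in range(field_count):
        ie_id, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        enterprise = None
        if ie_id & 0x8000:                       # Enterprise bit is set
            ie_id &= 0x7FFF
            (enterprise,) = struct.unpack_from("!I", buf, offset)
            offset += 4
        fields.append((enterprise, ie_id, length))
    return template_id, fields, offset
```

For example, a Template with ID 256 carrying sourceIPv4Address (8), destinationIPv4Address (12), and one enterprise-specific element yields two entries with enterprise set to None and one entry tagged with its Enterprise Number.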
5.2. Self-Description
Archived data may be read at a time in the future when any external
reference to the meaning of the data may be lost. The ideal flow
storage format should be self-describing; that is, a process reading
flow data from storage should be able to properly interpret the
stored flows without reference to anything other than standard
sources (e.g., the standards document describing the file format) and
the stored flow data itself.
The IPFIX Message format is partially self-describing; that is, IPFIX
Templates containing only IANA-assigned Information Elements can be
completely interpreted according to the IPFIX Information Model
without additional external data.
However, Templates containing enterprise-specific Information Elements lack
detailed type and semantic information; a Collecting Process
receiving Data Records described by a Template containing enterprise-
specific Information Elements it does not understand can only treat
the data contained within those Information Elements as octet arrays.
To be fully self-describing, enterprise-specific Information Elements
must be additionally described via IPFIX Options according to the
Information Element Type Options Template defined in [RFC5610].
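
The fallback behaviour described above can be sketched as follows. This non-normative example decodes a fixed-length Data Record against a list of (enterprise, identifier, length) field descriptions; fields found in a small table of IANA-assigned Information Elements are interpreted, and anything else is returned as a raw octet array. The table KNOWN_ELEMENTS and the function decode_record are hypothetical, and variable-length fields are deliberately not handled.

```python
import ipaddress

# A tiny excerpt of IANA-assigned Information Elements: (enterprise, id) -> decoder.
KNOWN_ELEMENTS = {
    (None, 4): ("protocolIdentifier", lambda raw: int.from_bytes(raw, "big")),
    (None, 8): ("sourceIPv4Address", lambda raw: str(ipaddress.IPv4Address(raw))),
    (None, 12): ("destinationIPv4Address", lambda raw: str(ipaddress.IPv4Address(raw))),
}


def decode_record(fields, data):
    """Decode one fixed-length Data Record; unknown elements stay as bytes."""
    record, offset = {}, 0
    for enterprise, ie_id, length in fields:
        raw = data[offset:offset + length]
        offset += length
        fallback = (f"unknown({enterprise},{ie_id})", bytes)
        name, decode = KNOWN_ELEMENTS.get((enterprise, ie_id), fallback)
        record[name] = decode(raw)
    return record
```

A reader that also processes Information Element Type Options records per [RFC5610] could extend the lookup table at run time instead of falling back to octet arrays.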
5.3. Data Compression
Regardless of the representation format, flow data describing traffic
on real networks tends to be highly compressible. Compression tends
to improve the scalability of flow collection systems, by reducing
the disk storage and I/O bandwidth requirement for a given workload.
The ideal flow storage format should support applications that wish
to leverage this fact by supporting compression of stored data.
The IPFIX Message format has no support for data compression, as the
IPFIX protocol was designed for speed and simplicity of export. Of
course, any flat file is readily compressible using a wide variety of
external data compression tools, formats, and algorithms; therefore,
this requirement can be met via encapsulation in one of these
formats. Section 10 specifies an encapsulation based on bzip2 or
gzip, to maximize interoperability.
A few simple optimizations can be made by File Writers to increase
the integrity and usability of compressed IPFIX data; these are
outlined in Section 10.3.
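
The encapsulation idea can be sketched with standard library modules. This non-normative example writes serialized IPFIX Messages through gzip or bzip2 and reopens the result by inspecting the leading magic octets of the container format. Section 10 remains the authoritative description; the function names and the file name used here are hypothetical.

```python
import bz2
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"     # first two octets of a gzip member
BZIP2_MAGIC = b"BZh"         # first three octets of a bzip2 stream
OPENERS = {"gzip": gzip.open, "bzip2": bz2.open, "none": open}


def write_compressed(path, messages, method="gzip"):
    """Write already-serialized IPFIX Messages through the chosen compressor."""
    with OPENERS[method](path, "wb") as out:
        for message in messages:
            out.write(message)


def open_ipfix_file(path):
    """Open a file for reading, transparently undoing gzip or bzip2."""
    with open(path, "rb") as probe:
        magic = probe.read(3)
    if magic[:2] == GZIP_MAGIC:
        return gzip.open(path, "rb")
    if magic == BZIP2_MAGIC:
        return bz2.open(path, "rb")
    return open(path, "rb")


def round_trip(messages, method):
    """Write messages to a temporary file and read the byte stream back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "flows.ipfix")
        write_compressed(path, messages, method)
        with open_ipfix_file(path) as stream:
            return stream.read()
```

Because compression is applied outside the IPFIX Message stream, the decompressed bytes are exactly the concatenated messages and can be handed to an unmodified File Reader.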
5.4. Indexing and Searching
Binary, record-stream-oriented file formats natively support only one
form of searching: sequential scan in file order. By choosing the
order of records in a file carefully (e.g., by flow end time), a file
can be indexed by a single key.
Beyond this, properly addressing indexing is an application-specific
problem, as it inherently involves trade-offs between storage
complexity and retrieval speed, and requirements vary widely based on
time scales and the types of queries used from site to site.
However, a generic standard flow storage format may provide limited
direct support for indexing and searching.
The ideal flow storage format will support a limited table of
contents facility noting that the records in a file contain data
relating only to certain keys or values of keys, in order to keep
multi-file search implementations from having to scan a file for data
it does not contain.
The IPFIX Message format has no direct support for indexing.
However, the technique described in "Reducing Redundancy in IP Flow
Information Export (IPFIX) and Packet Sampling (PSAMP) Reports"
[RFC5473] can be used to describe the contents of a file in a limited
way. Additionally, as flow data is often sorted and divided by time,
the start and end time of the flows in a file may be declared using
the File Time Window Options Template defined in Section 8.1.2.
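
A multi-file search can use such a declared time window to skip files entirely. The non-normative sketch below assumes the start and end times have already been extracted from each file's File Time Window record and are held as integer timestamps; the TimeWindow type and both function names are hypothetical.

```python
from typing import NamedTuple


class TimeWindow(NamedTuple):
    start: int   # earliest flow time declared for the file
    end: int     # latest flow time declared for the file


def overlaps(window, query_start, query_end):
    """True if the closed interval of the file intersects the query interval."""
    return window.start <= query_end and query_start <= window.end


def files_to_scan(index, query_start, query_end):
    """Return the names of files whose declared window may contain matches."""
    return [
        name
        for name, window in index.items()
        if overlaps(window, query_start, query_end)
    ]
```

Only the files returned by files_to_scan need a sequential scan; all others are known not to contain flows in the requested interval.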
5.5. Error Recovery
When storing flow data for archival purposes, it is important to
ensure that hardware or software faults do not introduce errors into
the data over time. The ideal flow storage format will support the
detection and correction of encoding-level errors in the data.
Note that more advanced error correction is best handled at a layer
below that addressed by this document. Error correction is a topic
well addressed by the storage industry in general (e.g., by Redundant
Array of Independent Disks (RAID) and other technologies). By
specifying a flow storage format based upon files, we can leverage
these features to meet this requirement.
However, the ideal flow storage format will be resilient against
errors, providing an internal facility for the detection of errors
and the ability to isolate errors to as few data records as possible.
Note that this requirement interacts with the choice of data
compression or encryption algorithm. For example, the use of block
compression algorithms can serve to isolate errors to a single
compression block, unlike stream compressors, which may fail to
resynchronize after a single bit error, invalidating the entire
The IPFIX Message format does not support data integrity assurance.
It is assumed that advanced error correction will be provided
externally. Compression and encryption, if used, provide some
allowance for detection, if not correction, of errors. For simple
error detection support in the absence of compression or encryption,
checksums may be attached to messages via IPFIX Options according to
the Message Checksum Options Template defined in Section 8.1.1.
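
The detection idea can be sketched as a per-message digest comparison. This non-normative example uses MD5 purely as a readily available digest, which is an assumption of the example; which octets are covered and how the checksum is carried in an Options record are defined in Section 8.1.1. The function names are hypothetical.

```python
import hashlib


def message_digest(message: bytes) -> bytes:
    """Digest of one serialized message (MD5 is an assumption of this sketch)."""
    return hashlib.md5(message).digest()


def find_corrupt(messages, digests):
    """Return the indexes of messages whose digest no longer matches."""
    return [
        index
        for index, (message, expected) in enumerate(zip(messages, digests))
        if message_digest(message) != expected
    ]
```

Because each digest covers a single message, a mismatch isolates the damage to that message rather than to the whole file.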
5.6. Authentication, Confidentiality, and Integrity
Archival storage of flow data may also require assurance that no
unauthorized entity can read or modify the stored data. Cryptography
can be applied to this problem to ensure integrity and
confidentiality by signing and encryption.
As with error correction, this problem has been addressed well at a
layer below that addressed by this document. We can leverage the
fact that existing cryptographic technologies work quite well on data
stored in files to meet this requirement.
Beyond support for the use of Transport Layer Security (TLS) for
transport over TCP or Datagram Transport Layer Security (DTLS) for
transport over SCTP or UDP, both of which provide transient
authentication and confidentiality, the IPFIX protocol does not
support this requirement directly. The IETF has specified the
Cryptographic Message Syntax (CMS) [RFC3852] for creating detached
signatures for integrity and authentication; Section 9 specifies a
CMS-based method for signing IPFIX Files. Confidentiality protection
is assumed to be met by methods external to this specification,
leveraging one of the many such technologies for encrypting files to
meet specific application and process requirements; however, notes on
improving archival integrity of encrypted IPFIX Files are given in
5.7. Anonymization and Obfuscation
To ensure the privacy of individuals and organizations at the
endpoints of communications represented by flow records, it is often
necessary to obfuscate or anonymize stored and exported flow data.
The ideal flow storage format will provide for a notation that a
given information element on a given record type represents
anonymized, rather than real, data.
The IPFIX protocol presently has no support for anonymization
notation. It should be noted that anonymization is one of the
requirements given for IPFIX in [RFC3917]. The decision to qualify
this requirement with 'MAY' and not 'MUST' in the requirements
document, and its subsequent lack of specification in the current
version of the IPFIX protocol, is due to the fact that anonymization
algorithms are still an open area of research, and that there
currently exist no standardized methods for anonymization.
No support is presently defined in [RFC5101] or this IPFIX-based File
format for anonymization, as anonymization notation is an area of
open work for the IPFIX Working Group.
5.8. Session Auditability and Replayability
Certain use cases for archival flow storage require the storage of
collection infrastructure details alongside the data itself. These
details include information about how and when data was received, and
where it was received from. They are useful for auditing as well as
for replaying received data for testing purposes.
The IPFIX protocol contains no direct support for auditability and
replayability, though the IPFIX Information Model does define various
Information Elements required to represent collection infrastructure
details. These details may be stored in IPFIX Files using the Export
Session Details Options Template defined in Section 8.1.3, and the
Message Details Options Template defined in Section 8.1.4.
5.9. Performance Characteristics
The ideal standard flow storage format will not have a significant
negative impact on the performance of the application generating or
processing flow data stored in the format. This is a non-functional
requirement, but it is important to note that a standard that implies
a significant performance penalty is unlikely to be widely
implemented and adopted.
An examination of the IPFIX protocol would seem to suggest that
implementations of it are not particularly prone to slowness; indeed,
a template-based data representation is more easily subject to
optimization for common cases than representations that embed
structural information directly in the data stream (e.g., XML).
However, a full analysis of the impact of using IPFIX Messages as a
basis for flow data storage on read/write performance will require
more implementation experience and performance measurement.