The bayse-tools
package provides functionality that allows users to convert a growing number of network flow formats
into the lightweight BayseFlow format. Doing so offers the ability to interact with the Bayse labeling and
knowledgebase functionality, which provides detailed insights about how your systems are communicating with the
Internet and your internal devices. For more information about use cases, explore our
Use Cases online.
Note that this package requires libpcap-dev to be installed on your system. Please use your system's package manager (such as apt on Ubuntu) to install libpcap-dev:
sudo apt-get install libpcap-dev
While Windows isn't yet supported due to issues with underlying libraries (specifically pcapy), we'd welcome anyone who wants to document the steps to make it work. At the very minimum, you will need the following:
- A C++ compiler. Microsoft Visual Studio Build Tools is known to work.
- Npcap's SDK, which is a replacement for the WinPCAP developer's kit.
To install the bayse-tools package, simply type the following:
pip install bayse-tools
There are two modules available within this package. Details about each are below.
module name: converter
This module allows you to convert captured network traffic from any of our supported formats into unlabeled BayseFlows (BayseFlows without BayseFlow category labels). This is useful if you already have network telemetry that you'd like to enrich with our knowledgebase and labeling.
To import this module into your project, type the following:
from bayse_tools.converter import convert
Bayse currently supports conversion into BayseFlow format from the following formats:
Format | Specific File Types |
---|---|
Packet Capture | CAP , PCAP , and PCAPNG |
Zeek | conn.log and dns.log in TSV or JSON format |
Interflow | comma-separated list of JSON records |
Netflow | CSV of unidirectional Netflow v9 records |
?
If you have a format that you'd like us to support, please contact support at bayse [.] io.
To convert a Packet Capture into an unlabeled BayseFlow file, simply enter the following:
convert.convert_pcap(<path_to_packet_capture>)
To convert a Zeek conn.log
into an unlabeled BayseFlow file, simply enter the following:
convert.convert_zeek(<path_to_conn_log>, zeek_dnsfile_location=<optional_path_to_dns_log>)
If you have (and would like to include) DNS information that was captured by Zeek, provide the dns.log
in addition
to the conn.log
. Naming will be much enhanced by doing so.
To convert an Interflow log into an unlabeled BayseFlow file, simply enter the following:
convert.convert_interflow(<path_to_interflow_log>)
Any DNS Interflows should also be passed in within the Interflow log.
Note that there are MANY fields in Interflow that will not be a part of this converter, as they are not necessary to convert to BayseFlow. The log file is expected to contain a comma-separated list of Interflow records in JSON format. For non-DNS records, below is an example Interflow record that includes only the fields we need:
[{"timestamp": 1656517273641, "duration": 401, "_id": "6c0liABC8qtQm3loQr7H", "msg_class": "interflow_traffic", "srcip": "172.18.40.120", "srcport": 55503,"dstip": "142.251.40.65", "dstip_host": "ci3.googleusercontent.com", "dstport": 80, "proto_name": "tcp", "outbytes_total": 0, "inpkts_delta": 5, "outpkts_delta": 0, "inbytes_total": 17765}]
To convert a Netflow CSV into an unlabeled BayseFlow file, simply enter the following:
convert.convert_netflow(<path_to_Netflow_csv>)
Similar to Interflow, there are MANY fields in Netflow v9 that are not used.
Zeek and Interflow data is supported as JSON files in two formats:
- Comma-Separated List of JSON records (preferred):
[{json_record_1},{json_record_2}]
- One JSON record per line:
{json_record_1}
{json_record_2}
Moreover, Zeek also supports plaintext TSV
(i.e. the OG). If you are converting plaintext logs, please make sure
that your conn.log
files have the following fields in the following order (Field # lines are annotations for
clarity below and should NOT be included in the file):
Field # 1 2 3 4 5
ts uid id.orig_h id.orig_p id.resp_h
Field # 6 7 8 9 10
id.resp_p proto service duration orig_bytes
Field # 11 12 13 14
resp_bytes conn_state local_orig local_resp
Field # 15 16 17 18 19
missed_bytes history orig_pkts orig_ip_bytes resp_pkts
Field # 20 21
resp_ip_bytes tunnel_parents
An example of a flow line from the conn.log
is as follows ("field #" lines are our annotation):
Field # 1 2 3
1601060272.439360 CC9S3G178KjzSMTGRk 192.168.100.224
Field # 4 5 6 7 8 9
137 192.168.100.255 137 udp dns 12.114023
Field # 10 11 12 13 14 15 16 17
1186 0 S0 - - 0 D 23
Field # 18 19 20 21
1830 0 0 -
For dns.log files (again, if you're not using JSON), make sure that the file matches the following:
Field # 1 2 3 4 5 6 7 8
ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto trans_id
Field # 9 10 11 12 13 14 15 16
rtt query qclass qclass_name qtype qtype_name rcode rcode_name
Field # 17 18 19 20 21 22 23 24
AA TC RD RA Z answers TTLs rejected
Note that all field names are not used. If you do not have a field, please give it an appropriate default value as defined by the Zeek format.
module name: streaming
This module allows you to directly capture network traffic as unlabeled BayseFlows (BayseFlows without flow category labels). This is beneficial for a number of reasons:
BayseFlows are extremely lightweight, so you can actually do this continuously in the background without affecting system performance (i.e. you can collect network telemetry continuously on your endpoints)! BayseFlows have no identifying data, so you can avoid worrying about accidentally leaking information (URIs, passwords, keys, etc...)
To import this module into your project, type the following:
from bayse_tools.streaming import streaming
To capture BayseFlows continuously using this module from your project, enter the following (note that you'll need to be root by default to capture packets):
streaming.start(<interface_name>, <duration_in_seconds>, <optional_verbosity>)
Note that the above will run in perpetuity, capturing packets from the interface you specified (such as enp0s3 on
Ubuntu systems). Every time the specified duration you provided is reached (if you provide no value, it defaults to
300 seconds), a sample will be created, processed, and summarized. Providing the optional is_verbose
value will
print a small amount of information about the number of BayseFlows and its UUID.
The BayseFlow format is saved as a .bf
file and is comprised of records containing that look like the following:
{
"hash": "6d8091a34ae3427bbffa5c19e5d3391c",
"trafficDate": "1654204899.899094",
"fileName": "testCase24_ipv6_withDNS_and_icmpv6.conn.bf",
"BayseFlows": [
{
"src": "192.168.12.164:62239",
"dst": "9.9.9.9:53",
"destinationNameSource": "original",
"srcPkts": 1,
"srcBytes": 32,
"dstPkts": 1,
"dstBytes": 155,
"relativeStart": 0.0,
"protocolInformation": "UDP",
"identifier": "CvJxCegvPKSYQpTeh",
"duration": 0.055491
},
{
"src": "2607:fb90:d726:7b19:5cc5:a122:d8ae:24e6:58740",
"dst": "r3.o.lencr.org:80",
"destinationNameSource": "passive",
"srcPkts": 84,
"srcBytes": 14526,
"dstPkts": 47,
"dstBytes": 35524,
"relativeStart": 0.112856,
"protocolInformation": "TCP",
"identifier": "COVJH14OZAUWEf1Vd",
"duration": 48.68848
},
...
]
}
Details about each field are as follows:
Field Name | Purpose | Derived How |
---|---|---|
hash | helps to identify the BayseFlow file | randomly-generated |
trafficDate | identifies the time (as UTC epoch time in seconds.ms) the traffic capture started | timestamps from initial file |
fileName | name of the file | based on name of input file |
BayseFlows | mapping of each flow to the BayseFlow format (fields below here are subfields of BayseFlows) | Combines packets/flows based on their 4-tuples |
--> src | identify the source of the flow | attempts to identify the true source based on how the flow is communicating; will add a DNS name if known |
--> dst | identify the destination of the flow | attempts to identify the true destination based on how the session is communicating; will add a DNS name if known |
--> destinationNameSource | identify how the destination was named | names are derived (in order) by direct knowledge of the name (in the flow; session ), the closest DNS lookup (passive ), a short-term local cache (cache ), a long-term local cache (cache ), or they stay as an IP (original ) |
--> srcPkts | number of packets from the source | counts how many packets seen in the input file from this source in this flow |
--> srcBytes | number of bytes from the source | counts how many bytes were seen in the transport layer payloads (or ICMP data) from the source in this flow |
--> dstPkts | number of packets from the destination | counts how many packets seen in the input file from this destination in this flow |
--> dstBytes | number of bytes from the destination | counts how many bytes were seen in the transport layer payloads (or ICMP data) from the destination in this flow |
--> relativeStart | how far into this file this flow began | time (as seconds.ms) from trafficDate |
--> protocolInformation | communicate any information about the transport layer protocol (or ICMP) seen | transport layer reported by network layer |
--> identifier | a way to tie the BayseFlow back to a flow in the original input format | unique_id for Zeek, TCP/UDP stream for Packet Captures |
--> duration | identify how long (seconds.ms) a flow lasted | start of first packet to end of last packet in flow |
BayseFlows can also be labeled with information about characteristics of the flow that are interesting. These labels
are useful for identifying activity that is (ab)normal, understanding how data is flowing locally or to/from the
Internet, and so on. Doing so adds the label
field to the .bf
file. Take the following flow for example:
{
"src": "192.168.1.133:59893",
"dst": "54.37.70.105:8080",
"destinationNameSource": "original",
"srcPkts": 218,
"srcBytes": 1540,
"dstPkts": 1801,
"dstBytes": 2497139,
"relativeStart": 17.916785,
"protocolInformation": "TCP",
"identifier": "115",
"duration": 5.62408,
"label": "floodOfXLFilelikeDownloaded"
}
The floodOfXLFilelikeDownloaded
shows us that something that looks like a large file was downloaded in just
seconds. Depending on whether or not that activity is expected from that destination can help you to determine if
you should care.
To label your data, you'll first need to register for an account on Bayse, request a free API key, and reach out to hello at bayse [.] io. We'd love to help you get started!
If you are having issues installing bayse-tools
, here are some common issues:
- Your system does not have libpcap-dev installed. Please refer to the Installation steps above for details.
- You have
pcapy
installed and are trying to use Python 3.10 or greater. In v1.0.1 ofbayse-tools
, we have moved away frompcapy
in favor ofpcapy-ng
. Please make sure to uninstall the former if it exists.
If nothing above helps or you have other questions, please reach out to us at support at bayse [.] io
This software is provided under the MIT Software License. See the accompanying LICENSE file for more information.