Skip to content

A collection of tools useful across the Bayse ecosystem

License

Notifications You must be signed in to change notification settings

BayseIntelligence/bayse_tools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bayse Tools

The bayse-tools package provides functionality that allows users to convert a growing number of network flow formats into the lightweight BayseFlow format. Doing so offers the ability to interact with the Bayse labeling and knowledgebase functionality, which provides detailed insights about how your systems are communicating with the Internet and your internal devices. For more information about use cases, explore our Use Cases online.

Installation

Note that this package requires libpcap-dev to be installed on your system. Please use your system's package manager (such as apt on Ubuntu) to install libpcap-dev:

sudo apt-get install libpcap-dev

While Windows isn't yet supported due to issues with underlying libraries (specifically pcapy), we'd welcome anyone who wants to document the steps to make it work. At the very minimum, you will need the following:

  • A C++ compiler. Microsoft Visual Studio Build Tools is known to work.
  • Npcap's SDK, which is a replacement for the WinPCAP developer's kit.

To install the bayse-tools package, simply type the following:

pip install bayse-tools

Available Modules

There are two modules available within this package. Details about each are below.

A. Standalone Converter

module name: converter

This module allows you to convert captured network traffic from any of our supported formats into unlabeled BayseFlows (BayseFlows without BayseFlow category labels). This is useful if you already have network telemetry that you'd like to enrich with our knowledgebase and labeling.

To import this module into your project, type the following:

from bayse_tools.converter import convert

Supported File Formats

Bayse currently supports conversion into BayseFlow format from the following formats:

Format Specific File Types
Packet Capture CAP, PCAP, and PCAPNG
Zeek conn.log and dns.log in TSV or JSON format
Interflow comma-separated list of JSON records
Netflow CSV of unidirectional Netflow v9 records

? If you have a format that you'd like us to support, please contact support at bayse [.] io.

Usage

To convert a Packet Capture into an unlabeled BayseFlow file, simply enter the following:

convert.convert_pcap(<path_to_packet_capture>)

To convert a Zeek conn.log into an unlabeled BayseFlow file, simply enter the following:

convert.convert_zeek(<path_to_conn_log>, zeek_dnsfile_location=<optional_path_to_dns_log>)

If you have (and would like to include) DNS information that was captured by Zeek, provide the dns.log in addition to the conn.log. Naming will be much enhanced by doing so.

To convert an Interflow log into an unlabeled BayseFlow file, simply enter the following:

convert.convert_interflow(<path_to_interflow_log>)

Any DNS Interflows should also be passed in within the Interflow log.

Note that there are MANY fields in Interflow that will not be a part of this converter, as they are not necessary to convert to BayseFlow. The log file is expected to contain a comma-separated list of Interflow records in JSON format. For non-DNS records, below is an example Interflow record that includes only the fields we need:

[{"timestamp": 1656517273641, "duration": 401, "_id": "6c0liABC8qtQm3loQr7H", "msg_class": "interflow_traffic", "srcip": "172.18.40.120", "srcport": 55503,"dstip": "142.251.40.65", "dstip_host": "ci3.googleusercontent.com", "dstport": 80, "proto_name": "tcp", "outbytes_total": 0, "inpkts_delta": 5, "outpkts_delta": 0, "inbytes_total": 17765}]

To convert a Netflow CSV into an unlabeled BayseFlow file, simply enter the following:

convert.convert_netflow(<path_to_Netflow_csv>)

Similar to Interflow, there are MANY fields in Netflow v9 that are not used.

Plaintext Formats

Zeek and Interflow data is supported as JSON files in two formats:

  1. Comma-Separated List of JSON records (preferred):

[{json_record_1},{json_record_2}]

  1. One JSON record per line:
{json_record_1}
{json_record_2}

Moreover, Zeek also supports plaintext TSV (i.e. the OG). If you are converting plaintext logs, please make sure that your conn.log files have the following fields in the following order (Field # lines are annotations for clarity below and should NOT be included in the file):

        Field #  1       2        3                 4               5
                ts      uid     id.orig_h       id.orig_p       id.resp_h
        Field #    6             7        8        9               10
                id.resp_p       proto   service duration        orig_bytes
        Field #     11              12              13              14
                resp_bytes      conn_state      local_orig      local_resp
        Field #     15             16      17               18             19
                missed_bytes    history orig_pkts       orig_ip_bytes  resp_pkts
        Field #       20              21
                resp_ip_bytes   tunnel_parents

An example of a flow line from the conn.log is as follows ("field #" lines are our annotation):

        Field #       1                        2                      3
                1601060272.439360       CC9S3G178KjzSMTGRk      192.168.100.224
        Field #   4             5        6       7       8          9
                 137    192.168.100.255 137     udp     dns     12.114023
        Field #  10     11    12      13     14      15      16       17
                1186    0     S0      -       -       0       D       23
        Field #  18     19      20      21
                1830    0       0       -

For dns.log files (again, if you're not using JSON), make sure that the file matches the following:

        Field #       1       2          3               4               5               6             7         8
                      ts      uid     id.orig_h       id.orig_p       id.resp_h       id.resp_p       proto   trans_id
        Field #       9       10        11               12             13         14           15        16
                     rtt     query    qclass         qclass_name       qtype   qtype_name      rcode   rcode_name
        Field #      17      18      19      20      21        22         23        24
                     AA      TC      RD      RA      Z       answers     TTLs    rejected

Note that all field names are not used. If you do not have a field, please give it an appropriate default value as defined by the Zeek format.

B. Streaming BayseFlow Collector

module name: streaming

This module allows you to directly capture network traffic as unlabeled BayseFlows (BayseFlows without flow category labels). This is beneficial for a number of reasons:

BayseFlows are extremely lightweight, so you can actually do this continuously in the background without affecting system performance (i.e. you can collect network telemetry continuously on your endpoints)! BayseFlows have no identifying data, so you can avoid worrying about accidentally leaking information (URIs, passwords, keys, etc...)

To import this module into your project, type the following:

from bayse_tools.streaming import streaming

Usage

To capture BayseFlows continuously using this module from your project, enter the following (note that you'll need to be root by default to capture packets):

streaming.start(<interface_name>, <duration_in_seconds>, <optional_verbosity>)

Note that the above will run in perpetuity, capturing packets from the interface you specified (such as enp0s3 on Ubuntu systems). Every time the specified duration you provided is reached (if you provide no value, it defaults to 300 seconds), a sample will be created, processed, and summarized. Providing the optional is_verbose value will print a small amount of information about the number of BayseFlows and its UUID.

BayseFlow Format

The BayseFlow format is saved as a .bf file and is comprised of records containing that look like the following:

{
  "hash": "6d8091a34ae3427bbffa5c19e5d3391c",
  "trafficDate": "1654204899.899094",
  "fileName": "testCase24_ipv6_withDNS_and_icmpv6.conn.bf",
  "BayseFlows": [
    {
      "src": "192.168.12.164:62239",
      "dst": "9.9.9.9:53",
      "destinationNameSource": "original",
      "srcPkts": 1,
      "srcBytes": 32,
      "dstPkts": 1,
      "dstBytes": 155,
      "relativeStart": 0.0,
      "protocolInformation": "UDP",
      "identifier": "CvJxCegvPKSYQpTeh",
      "duration": 0.055491
    },
    {
      "src": "2607:fb90:d726:7b19:5cc5:a122:d8ae:24e6:58740",
      "dst": "r3.o.lencr.org:80",
      "destinationNameSource": "passive",
      "srcPkts": 84,
      "srcBytes": 14526,
      "dstPkts": 47,
      "dstBytes": 35524,
      "relativeStart": 0.112856,
      "protocolInformation": "TCP",
      "identifier": "COVJH14OZAUWEf1Vd",
      "duration": 48.68848
    },
    ...
  ]
}

Details about each field are as follows:

Field Name Purpose Derived How
hash helps to identify the BayseFlow file randomly-generated
trafficDate identifies the time (as UTC epoch time in seconds.ms) the traffic capture started timestamps from initial file
fileName name of the file based on name of input file
BayseFlows mapping of each flow to the BayseFlow format (fields below here are subfields of BayseFlows) Combines packets/flows based on their 4-tuples
--> src identify the source of the flow attempts to identify the true source based on how the flow is communicating; will add a DNS name if known
--> dst identify the destination of the flow attempts to identify the true destination based on how the session is communicating; will add a DNS name if known
--> destinationNameSource identify how the destination was named names are derived (in order) by direct knowledge of the name (in the flow; session), the closest DNS lookup (passive), a short-term local cache (cache), a long-term local cache (cache), or they stay as an IP (original)
--> srcPkts number of packets from the source counts how many packets seen in the input file from this source in this flow
--> srcBytes number of bytes from the source counts how many bytes were seen in the transport layer payloads (or ICMP data) from the source in this flow
--> dstPkts number of packets from the destination counts how many packets seen in the input file from this destination in this flow
--> dstBytes number of bytes from the destination counts how many bytes were seen in the transport layer payloads (or ICMP data) from the destination in this flow
--> relativeStart how far into this file this flow began time (as seconds.ms) from trafficDate
--> protocolInformation communicate any information about the transport layer protocol (or ICMP) seen transport layer reported by network layer
--> identifier a way to tie the BayseFlow back to a flow in the original input format unique_id for Zeek, TCP/UDP stream for Packet Captures
--> duration identify how long (seconds.ms) a flow lasted start of first packet to end of last packet in flow

Labeling

BayseFlows can also be labeled with information about characteristics of the flow that are interesting. These labels are useful for identifying activity that is (ab)normal, understanding how data is flowing locally or to/from the Internet, and so on. Doing so adds the label field to the .bf file. Take the following flow for example:

    {
      "src": "192.168.1.133:59893",
      "dst": "54.37.70.105:8080",
      "destinationNameSource": "original",
      "srcPkts": 218,
      "srcBytes": 1540,
      "dstPkts": 1801,
      "dstBytes": 2497139,
      "relativeStart": 17.916785,
      "protocolInformation": "TCP",
      "identifier": "115",
      "duration": 5.62408,
      "label": "floodOfXLFilelikeDownloaded"
    }

The floodOfXLFilelikeDownloaded shows us that something that looks like a large file was downloaded in just seconds. Depending on whether or not that activity is expected from that destination can help you to determine if you should care.

To label your data, you'll first need to register for an account on Bayse, request a free API key, and reach out to hello at bayse [.] io. We'd love to help you get started!

Troubleshooting

If you are having issues installing bayse-tools, here are some common issues:

  • Your system does not have libpcap-dev installed. Please refer to the Installation steps above for details.
  • You have pcapy installed and are trying to use Python 3.10 or greater. In v1.0.1 of bayse-tools, we have moved away from pcapy in favor of pcapy-ng. Please make sure to uninstall the former if it exists.

If nothing above helps or you have other questions, please reach out to us at support at bayse [.] io

License

This software is provided under the MIT Software License. See the accompanying LICENSE file for more information.

About

A collection of tools useful across the Bayse ecosystem

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages