Overview

WiSig is the largest publicly available WiFi RF fingerprinting dataset. It contains 10 million packets sent by 174 off-the-shelf WiFi transmitters and captured by 41 USRP receivers over 4 captures spanning a month. WiSig is available not only as raw captures but also as prepackaged subsets of limited size, along with the preprocessing scripts and examples.

WiSig enables deployment-scale research into RF fingerprinting. Because it includes signals captured by a large number of receivers (Rx) over multiple days, it can provide a better understanding of how receivers and channels affect the identification of transmitters (Tx).

Capture and Processing

The data was captured in the Orbit testbed grid. The Orbit grid is a 20 by 20 two-dimensional arrangement of nodes, each consisting of a computer equipped with radio hardware. All nodes have at least one WiFi radio, and some have USRPs.

In a one-day capture, signals from the transmitters are captured one at a time. Each transmitter is configured to send random bytes to a WiFi access point (AP) over the 2.4 GHz band, using the same spoofed MAC and IP addresses. During this transmission, all USRP receivers are configured to capture the signals at the same bandwidth for a duration of about 0.5s. The signals captured by the USRPs, without any processing, constitute the Raw WiSig dataset. Raw WiSig is not directly usable for RF fingerprinting since it contains idle time and the ACK responses from the WiFi AP.

The USRP captures forming Raw WiSig are then processed to obtain the identification (Id) signals. In our processing, we used energy detection to isolate the packets and remove the ACK responses. Two versions of the Id Signals are provided: the first consists of the first 256 samples of the unprocessed preambles, and the second contains an equalized version of the same 256 samples. The extracted Id Signals make up the Full WiSig dataset.
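
To illustrate the idea of energy detection, the following is a minimal sketch and not the actual WiSig preprocessing script. The function name extract_id_signals, the window length, the 10 dB threshold, and the median-based noise floor estimate are all assumptions made for this example. It finds bursts in a complex baseband capture and keeps the first 256 samples of each. The minimum-length check is only a placeholder and is not claimed to reproduce the ACK removal used for WiSig; equalization is not covered.

```python
import numpy as np

def extract_id_signals(iq, window=64, threshold_db=10.0, id_len=256):
    """Return (start_index, first id_len samples) for each detected burst.

    Hypothetical helper: all parameters are illustrative assumptions.
    """
    power = np.abs(iq) ** 2
    # Smooth the instantaneous power with a moving average.
    smoothed = np.convolve(power, np.ones(window) / window, mode="same")
    # Assume the capture is mostly idle, so the median approximates the noise floor.
    threshold = np.median(smoothed) * 10 ** (threshold_db / 10)
    active = smoothed > threshold
    # Locate rising (+1) and falling (-1) edges of the active mask.
    edges = np.diff(active.astype(np.int8), prepend=0, append=0)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    signals = []
    for s, e in zip(starts, stops):
        # The smoothing blurs the burst edges, so refine them using the
        # instantaneous power inside the coarse region.
        above = np.flatnonzero(power[s:e] > threshold)
        if above.size == 0:
            continue
        start, stop = s + above[0], s + above[-1] + 1
        # Skip bursts that are too short to yield a full Id signal.
        if stop - start >= id_len:
            signals.append((int(start), iq[start:start + id_len]))
    return signals
```

A fixed threshold relative to the noise floor is the simplest choice; a real pipeline may need a more robust start-of-packet estimate, since a detection offset of even a few samples shifts the contents of the 256-sample window.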

More details about the capture and processing are provided in the paper.

Compact Subsets

While Full WiSig contains captures from all Tx and Rx, it is not balanced. That is, not all Tx-Rx pairs have the same number of signals on all days. This variability in the number of captured packets is due to the WiFi MAC protocol, along with the lack of time synchronization among the Rx. Additionally, Full WiSig is relatively large (over 70 GB).

To make WiSig easier to use, we created four prepackaged compact subsets, each focusing on a given aspect of the WiSig dataset: ManySig provides a large number of signals for each Tx-Rx pair, ManyTx provides a large number of Tx, ManyRx provides a large number of Rx, and SingleDay provides relatively many signals and transmitters but only for one day. Note that only ManySig and SingleDay are perfectly balanced. For the remaining subsets, at least 90% of the Tx for each Rx have the required number of signals.

Name        No. of Tx   No. of Rx   No. of Sig.   Days
ManySig     6           12          1000          4
ManyTx      150         18          50            4
ManyRx      10          32          200           4
SingleDay   28          10          800           1
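
The balance criterion can be checked with a short sketch like the one below. It assumes a hypothetical mapping from (tx, rx, day) to the number of available signals; this is not the actual WiSig file layout, and the function names are illustrative. It also assumes one reading of the 90% statement: for each Rx, the fraction of Tx that reach the required number of signals on every day.

```python
def tx_coverage_per_rx(counts, n_sig):
    """For each Rx, return the fraction of Tx with at least n_sig signals on every day.

    counts: dict mapping (tx, rx, day) -> number of signals (hypothetical layout).
    """
    txs = {tx for tx, _, _ in counts}
    rxs = {rx for _, rx, _ in counts}
    days = {day for _, _, day in counts}
    coverage = {}
    for rx in rxs:
        # A missing (tx, rx, day) entry is treated as zero signals.
        n_ok = sum(
            all(counts.get((tx, rx, day), 0) >= n_sig for day in days)
            for tx in txs
        )
        coverage[rx] = n_ok / len(txs)
    return coverage

def is_balanced(counts, n_sig):
    """True when every Tx-Rx pair has at least n_sig signals on every day."""
    return all(frac == 1.0 for frac in tx_coverage_per_rx(counts, n_sig).values())
```

Under this reading, a perfectly balanced subset gives a coverage of 1.0 for every Rx, while the 90% criterion corresponds to every value being at least 0.9.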