WiSig: RF Fingerprinting Dataset

The WiFi Signal (WiSig) dataset consists of WiFi captures made using many transmitters and receivers over different days in the Orbit testbed. A more detailed description of the dataset and its potential uses is provided in the dataset paper [1].

Overview

WiSig is the largest WiFi RF fingerprinting dataset publicly available. It contains 10 million packets captured from 174 off-the-shelf WiFi transmitters and 41 USRP receivers over 4 captures spanning a month. WiSig is available, not just as raw captures, but as conveniently prepackaged subsets of limited size, along with the preprocessing scripts and examples.

WiSig enables deployment-scale research into RF fingerprinting. By including signals captured by a large number of receivers (Rx) over multiple days, it can provide a better understanding of the impact of receivers and channels on transmitter (Tx) identification.

Capture and Processing

The data was captured in the Orbit testbed grid. The grid is a 20 by 20 two-dimensional array of nodes, each consisting of a computer equipped with radio hardware. All nodes have at least one WiFi radio, and some have USRPs.

In a one-day capture, signals from each transmitter are captured one at a time. Each transmitter is configured to send random bytes to a WiFi access point (AP) using the same spoofed MAC and IP addresses over the 2.4 GHz band. During this transmission, all USRP receivers are configured to capture the signals at the same bandwidth for a duration of about 0.5 s. The signals captured by the USRPs, without any processing, constitute the Raw WiSig dataset. Raw WiSig is not directly usable for RF fingerprinting since it contains idle time and the ACK responses from the WiFi AP.

The USRP captures forming Raw WiSig are then processed to obtain the identification (Id) signals. In our processing, we used energy detection to isolate the packets and remove the ACK responses. Two versions of the Id Signals are provided; the first one consists of the first 256 samples of the unprocessed preambles, the second one contains an equalized version of the same 256 samples. After processing the Id Signals, we obtain the Full WiSig dataset.
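As a rough illustration of the packet isolation step, the sketch below applies a simple energy detector to a synthetic capture and keeps the first 256 samples of each detected burst. It is not the processing code used for WiSig (which is written in MATLAB): the function name `extract_preambles`, the window length, and the threshold are assumptions chosen for the example, and ACK removal and equalization are not covered.

```python
import numpy as np

def extract_preambles(iq, window=64, threshold_db=10.0, n_keep=256):
    """Find burst starts by energy detection; return the first n_keep samples of each.

    All parameter values are illustrative assumptions, not the WiSig settings.
    """
    power = np.abs(iq) ** 2
    # Trailing moving average: smoothed[i] averages power[i-window+1 .. i].
    smoothed = np.convolve(power, np.ones(window) / window, mode="full")[: len(power)]
    # Assume idle time dominates the capture, so the median approximates the noise floor.
    noise_floor = np.median(smoothed)
    active = smoothed > noise_floor * 10 ** (threshold_db / 10)
    # A burst starts where the detector switches from idle to active.
    starts = np.flatnonzero(~active[:-1] & active[1:]) + 1
    preambles = [iq[s : s + n_keep] for s in starts if s + n_keep <= len(iq)]
    return starts, preambles

# Synthetic capture: weak complex noise with two unit-amplitude bursts.
rng = np.random.default_rng(0)
n = 10_000
iq = 0.01 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
for start in (2000, 6000):
    iq[start : start + 1000] += np.exp(2j * np.pi * 0.05 * np.arange(1000))

starts, preambles = extract_preambles(iq)
print(starts, [p.shape for p in preambles])
```

A trailing window is used so that, at high SNR, the detector fires on the first sample of a burst rather than half a window early. On real captures the threshold would need tuning against the actual noise floor.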

More details about the capture and processing are provided in the paper.

Compact Subsets

While Full WiSig contains captures from all Tx and Rx, it is not balanced. That is, not all Tx-Rx pairs have the same number of signals for all days. This variability in the number of captured packets is due to the WiFi MAC protocol, along with the lack of time synchronization among Rx. Additionally, Full WiSig is relatively large (over 70 GB).

To make WiSig easier to use, we created four prepackaged compact subsets, each focusing on a given aspect of the WiSig dataset: ManySig provides a large number of signals for each Tx-Rx pair, ManyTx provides a large number of Tx, ManyRx provides a large number of Rx, and SingleDay provides relatively many signals and transmitters but only for one day. Note that only ManySig and SingleDay are perfectly balanced. For the remaining subsets, at least 90% of Tx per Rx have the stated number of signals.

Name        No. of Tx   No. of Rx   No. of Sig.   Days
ManySig     6           12          1000          4
ManyTx      150         18          50            4
ManyRx      10          32          200           4
SingleDay   28          10          800           1
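The balance criterion can be checked with a few lines of NumPy. The sketch below assumes a hypothetical array `counts[tx, rx, day]` holding the number of available signals, and reads "at least 90% of Tx per Rx" as: for each Rx, the fraction of Tx that reach the target count on every day. Both the array layout and that reading are assumptions made for illustration.

```python
import numpy as np

def tx_fraction_per_rx(counts, n_sig):
    """For each Rx, return the fraction of Tx with at least n_sig signals on every day.

    counts is a hypothetical array indexed as counts[tx, rx, day].
    """
    ok = (counts >= n_sig).all(axis=2)  # shape (n_tx, n_rx)
    return ok.mean(axis=0)              # shape (n_rx,)

# Toy example: 10 Tx, 3 Rx, 4 days, target of 200 signals.
counts = np.full((10, 3, 4), 200)
counts[0, 1, 2] = 150  # one Tx falls short at Rx 1 on day 2

frac = tx_fraction_per_rx(counts, n_sig=200)
perfectly_balanced = bool((frac == 1.0).all())
meets_90_percent = bool((frac >= 0.9).all())
print(frac, perfectly_balanced, meets_90_percent)
```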

Compact Subsets & Examples

We provide the instructions to download and use the compact subsets. This is the recommended approach to get started with WiSig.

Data

The data can be downloaded directly as zipped files, one file per subset. Note that the different subsets have some transmitters, receivers, and signals in common; hence, they should not be treated as distinct.

Download    Size
ManyRx      1.2 GB
ManyTx      2.5 GB
ManySig     1.4 GB
SingleDay   1 GB

Code

github: wisig-examples

Description: The code provides functions to load the compact datasets. It also includes the code and the weights used to generate the WiSig use cases presented in the paper. A description of the signals in Full WiSig, along with the hardware of each Tx and Rx is also provided.
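Before using the loading functions in the repository, it can help to inspect a packaged file directly. The sketch below assumes a subset is stored as a pickled Python dictionary; the keys `tx_list` and `signals` in the demo are hypothetical placeholders, not the actual WiSig layout, so check the repository for the real structure. Only unpickle files obtained from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

def load_pickled_dataset(path):
    """Load a pickled dictionary from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)

def describe(dataset):
    """Map each top-level key to a short description of its value."""
    summary = {}
    for key, value in dataset.items():
        if isinstance(value, np.ndarray):
            summary[key] = f"ndarray{value.shape} {value.dtype}"
        elif isinstance(value, (list, tuple)):
            summary[key] = f"{type(value).__name__} of length {len(value)}"
        else:
            summary[key] = type(value).__name__
    return summary

# Demo on a dummy file; the keys below are hypothetical, not the WiSig layout.
dummy = {"tx_list": ["tx-a", "tx-b"], "signals": np.zeros((4, 256, 2), dtype=np.float32)}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dummy_subset.pkl"
    with open(path, "wb") as f:
        pickle.dump(dummy, f)
    summary = describe(load_pickled_dataset(path))

print(summary)
```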


Full WiSig

We provide the instructions to download and use Full WiSig. We recommend using Full WiSig only if the prepackaged subsets do not meet your requirements, that is, if you need more Tx, Rx, or signals. Downloading and using Full WiSig requires more memory and storage than the prepackaged subsets.

Data

Depending on whether you want to download the whole of Full WiSig or only some files, there are several options:

  1. Whole dataset:
    1. You can download the zipped version directly through the links. Pro: smaller download size. Con: each zipped file is about 8 GB, so extraction might need a lot of memory and disk space.
    2. You can download the unzipped version. This can be done with Google Backup and Sync by adding the folder to your personal drive and then syncing it to your computer. Other Google sync utilities (like gdrive or similar) can also be used. Pro: no unzipping. Con: larger download size.
  2. Partial dataset:
    The function create_dataset_impl provides the download size and links for the needed files that are not on disk. If only a reasonable number of files is needed, it might be better to download them individually instead of downloading the whole dataset.

Download Directory      Total Size
Full WiSig (unzipped)   76.9 GB
Full WiSig (zipped)     42 GB
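The idea behind a partial download is to compare the files a subset needs against what is already on disk and fetch only the rest. The sketch below illustrates that bookkeeping with a hypothetical manifest mapping file names to sizes in bytes. It is not the implementation or interface of create_dataset_impl; the function `plan_download` and the file names are invented for the example.

```python
import tempfile
from pathlib import Path

def plan_download(manifest, data_dir):
    """Return the needed files missing from data_dir and their total size in bytes.

    manifest is a hypothetical dict mapping file name -> size in bytes.
    """
    data_dir = Path(data_dir)
    missing = {name: size for name, size in manifest.items()
               if not (data_dir / name).exists()}
    return sorted(missing), sum(missing.values())

# Demo: three needed files (hypothetical names), one already present on disk.
manifest = {"rx_a.pkl": 300_000_000, "rx_b.pkl": 250_000_000, "rx_c.pkl": 280_000_000}
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "rx_b.pkl").touch()
    missing_files, total_bytes = plan_download(manifest, tmp)

print(missing_files, f"{total_bytes / 1e9:.2f} GB")
```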

Code

github: wisig-subset-creation

Description: The code provides solvers that help users create Tx and Rx lists meeting their required parameters. It then provides a function that identifies the missing files to download and packages the required data into a dictionary, which can be stored as a pkl file. Afterwards, the code provided for the compact subset examples can be used.
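To give a feel for what such a solver has to do, the sketch below uses a simple greedy heuristic: given a hypothetical array `counts[tx, rx]` of available signals (for example, the minimum over days), it keeps the Rx that serve the most Tx at the required signal count, then keeps the Tx that meet the count at every chosen Rx. This is an illustrative heuristic under those assumptions, not the solver shipped in the repository.

```python
import numpy as np

def select_tx_rx(counts, n_sig, n_rx):
    """Greedy selection of Tx and Rx indices from a hypothetical counts[tx, rx] array."""
    ok = counts >= n_sig                                   # (n_tx, n_rx) feasibility mask
    # Rank Rx by how many Tx reach n_sig at that Rx (stable sort keeps ties in index order).
    rx_order = np.argsort(-ok.sum(axis=0), kind="stable")
    rx_list = np.sort(rx_order[:n_rx])
    # Keep only Tx that reach n_sig at every selected Rx.
    tx_list = np.flatnonzero(ok[:, rx_list].all(axis=1))
    return tx_list.tolist(), rx_list.tolist()

# Toy example: 4 Tx (rows) and 3 Rx (columns).
counts = np.array([
    [500, 500, 100],
    [500, 400, 500],
    [500, 500, 500],
    [100, 500, 100],
])
tx_list, rx_list = select_tx_rx(counts, n_sig=400, n_rx=2)
print(tx_list, rx_list)
```

A greedy choice like this is fast but not guaranteed to maximize the number of Tx; an exact formulation would search over Rx combinations.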


Raw WiSig

Raw WiSig contains the data uploaded directly from the Orbit testbed. It is very large (1.7 TB). Processing it involves many manually run steps, which take a few days to complete and require a large amount of storage.

Data

Raw WiSig was uploaded directly from the Orbit testbed. The data is stored per receiver. For each receiver, signals from at most 20 Tx are zipped together as a single Tx group. Each Rx has up to 9 Tx groups. Note that in the provided Google Drive folder, there are multiple folders with the same name (Google does not automatically merge them into one folder).

To download the data, Google Backup and Sync can be used by adding the folder to your personal drive and then syncing the data to your computer. Other Google sync utilities (like gdrive or similar) can also be used.

A function that returns direct links is also provided, in case only a subset of the data is needed.
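The group count follows from the dataset size: with 174 transmitters and at most 20 Tx per group, a receiver that captured every transmitter needs ceil(174 / 20) = 9 groups. The short sketch below checks this arithmetic; splitting consecutive indices into groups is an assumption for illustration, since the actual assignment of Tx to groups is not described here.

```python
import math

def chunk(items, size):
    """Split items into consecutive groups of at most `size` elements."""
    return [items[i : i + size] for i in range(0, len(items), size)]

tx_ids = list(range(174))   # placeholder indices standing in for the 174 transmitters
groups = chunk(tx_ids, 20)

n_groups = len(groups)
last_group_size = len(groups[-1])
print(n_groups, last_group_size, math.ceil(174 / 20))
```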

Download Directory   Total Size
WiSig Raw            1.4 TB

Code

github: wisig-process-raw

Description: The code provides functions to specify the files to download and then extract them. It also provides the MATLAB code for packet detection and screening, equalization, and the creation of the pkl files.


Replicate Capture

We provide the code to replicate the capture setup used by WiSig. Replicating the captures requires at least two WiFi modules (one of which can act as an access point), and a USRP. The code provided assumes three PCs are used, one for each device, with Ubuntu installed.

Code

github: wisig-capture-commands

Description: Provides the code to replicate the WiSig capture for one WiFi Tx, WiFi AP, and USRP Rx.

Usage

If you use the WiSig datasets/codes or any (modified) part of them, please cite:

  1. The WiSig paper:

[1] S. Hanna, S. Karunaratne, and D. Cabric, “WiSig: A Large-Scale WiFi Signal Dataset for Receiver and Channel Agnostic RF Fingerprinting,” arXiv:2112.15363 [eess], Dec. 2021, Accessed: Jan. 03, 2022. [Online]. Available: http://arxiv.org/abs/2112.15363

@article{hanna_wisig_2021,
title = {{{WiSig}}: {{A Large-Scale WiFi Signal Dataset}} for {{Receiver}} and {{Channel Agnostic RF Fingerprinting}}},
shorttitle = {{{WiSig}}},
author = {Hanna, Samer and Karunaratne, Samurdhi and Cabric, Danijela},
year = {2021},
month = dec,
journal = {arXiv:2112.15363 [eess]},
eprint = {2112.15363},
eprinttype = {arxiv},
primaryclass = {eess},
archiveprefix = {arXiv},
keywords = {Electrical Engineering and Systems Science – Signal Processing},
}

  2. The Orbit testbed paper:

D. Raychaudhuri, I. Seskar, M. Ott, S. Ganu, K. Ramachandran, H. Kremo, R. Siracusa, H. Liu, and M. Singh, “Overview of the ORBIT radio grid testbed for evaluation of next-generation wireless network protocols,” in Wireless Communications and Networking Conference, 2005 IEEE, vol. 3, pp. 1664–1669, IEEE, 2005.

@inproceedings{orbit_2005,
title = {Overview of the {{ORBIT}} Radio Grid Testbed for Evaluation of Next-Generation Wireless Network Protocols},
booktitle = {Wireless {{Communications}} and {{Networking Conference}}, 2005 {{IEEE}}},
author = {Raychaudhuri, Dipankar and Seskar, Ivan and Ott, Max and Ganu, Sachin and Ramachandran, Kishore and Kremo, Haris and Siracusa, Robert and Liu, Hang and Singh, Manpreet},
year = {2005},
volume = {3},
pages = {1664–1669},
publisher = {{IEEE}}
}

License

The WiSig dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Please contact the following email address for any questions:

uclacores+wisig@g.ucla.edu