On the trail of fraudulent fake shops &
malicious Android apps



What does the fake shop dataset of MAL2 include?

Customers report suspected cases of fraudulent online shops to the Watchlist Internet. ÖIAT's experts check these reports in a manual, multistage procedure. During the MAL2 project, the customers' suspicions were confirmed in 96% of cases (2,814). An evaluation of the tool-supported procedure shows that 85% of the fake shops and 83% of the trademark counterfeit shops could be clearly identified as such after stage one ("online research"). Stage two ("payment methods") identified a further 13% and 16%, respectively, and stage three ("review of imprint information") a further 1.8% and 0.3%.
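The stage figures above can be added up to see how much of each category has been identified by the end of each stage. The following is a minimal sketch, assuming the reported percentages are shares of the same total per category; the names `STAGES`, `SHARES` and `cumulative_shares` are illustrative and not part of the MAL2 tooling.

```python
from itertools import accumulate

# Per-stage shares (percent) as reported in the text above.
STAGES = ["online research", "payment methods", "review of imprint information"]
SHARES = {
    "fake shops": [85.0, 13.0, 1.8],
    "trademark counterfeit shops": [83.0, 16.0, 0.3],
}


def cumulative_shares(shares):
    """Return the running total after each stage, rounded to one decimal."""
    return [round(total, 1) for total in accumulate(shares)]


for category, shares in SHARES.items():
    for stage, total in zip(STAGES, cumulative_shares(shares)):
        print(f"{category}: {total}% identified after '{stage}'")
```

Under that assumption, 99.8% of the fake shops and 99.3% of the trademark counterfeit shops are identified by the end of stage three.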

The results of the expert-supported fake shop evaluation procedure are part of the fake shop dataset. For the machine learning application of the MAL2 project, 2,756 fake shops and 283 professional online shops were archived. Professional shops include providers that have been awarded the Austrian e-commerce quality mark.

The website data scraping tool is written in Python and based on the open source web crawling framework Scrapy. Its main function is to archive website content that could be relevant for the classification of fake shops. This includes all HTML code, CSS code and images of a URL, as well as all first-order links.
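To illustrate what "first-order links" means here, the sketch below collects the links found directly on one page. It is not the project's code: it deliberately uses only the Python standard library (`html.parser`) instead of Scrapy, and the class `LinkCollector`, the function `first_order_links` and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop #fragments.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        # Keep only web links, and keep each target once.
        if absolute.startswith(("http://", "https://")) and absolute not in self.links:
            self.links.append(absolute)


def first_order_links(base_url, html):
    """Return the de-duplicated HTTP(S) links found directly on the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content for demonstration purposes.
SAMPLE = """
<html><body>
  <a href="/imprint">Imprint</a>
  <a href="cart.html#top">Cart</a>
  <a href="https://pay.example.org/checkout">Pay</a>
  <a href="mailto:shop@example.com">Mail</a>
</body></html>
"""

print(first_order_links("https://shop.example.com/index.html", SAMPLE))
```

An archiving tool would then fetch each of these targets once, without following the links found on them.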

The Scrapy object LinkExtractor is used to define which characteristics are archived. In addition, a screenshot of the page is created using a Docker container running the JavaScript rendering service Splash. A log file with timestamp, software version and category of the webshop is saved with the archive record.
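The log entry stored with each archive record could look roughly like the following. This is a sketch only: the three fields (timestamp, software version, category) come from the description above, while the JSON layout, the field names, the extra `url` field, the function `build_log_record` and the example values are assumptions.

```python
import json
from datetime import datetime, timezone


def build_log_record(url, category, software_version, now=None):
    """Build one log entry for an archived shop (field names are assumed)."""
    now = now or datetime.now(timezone.utc)
    return {
        "url": url,
        "timestamp": now.isoformat(),
        "software_version": software_version,
        "category": category,
    }


# Hypothetical values for demonstration purposes.
record = build_log_record("https://shop.example.com/", "fake_shop", "0.1.0")
print(json.dumps(record, indent=2))
```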

What does the malware dataset of MAL2 include?

The MAL2 Android Malware Ground-Truth dataset was compiled in two iterations. The first iteration was kept small, at 56,392 APKs, and was used to test the proof-of-concept prototype in the project. For this purpose, 45,676 APKs (27,965 of them labelled "Benign") were used as training data, 5,076 as test data and 5,640 as validation data. It contains samples from 430 different PUA families and 25 Trojan families. In the final iteration, 790 thousand APK samples consisting of Malware, Adware, Probably Clean and Google Play Samples were collected, and their correct allocation was verified using the IKARUS scanner.
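The split of the first iteration can be checked with a few lines of arithmetic. The sketch below uses only the counts given above; the variable and function names are illustrative.

```python
# Counts of the first iteration as reported in the text above.
SPLIT = {"training": 45_676, "test": 5_076, "validation": 5_640}
TOTAL = 56_392
BENIGN_TRAINING = 27_965

# The three subsets add up to the full first-iteration dataset.
assert sum(SPLIT.values()) == TOTAL


def shares(split):
    """Return each subset's share of the total in percent, one decimal."""
    total = sum(split.values())
    return {name: round(100 * count / total, 1) for name, count in split.items()}


print(shares(SPLIT))
print(round(100 * BENIGN_TRAINING / SPLIT["training"], 1))
```

The counts correspond to a split of roughly 81% training, 9% test and 10% validation data, with about 61% of the training samples labelled "Benign".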

Features were extracted from the ground-truth dataset using the MAL2 framework developed in the project. The resulting text data is part of the Android malware dataset.

Request form

for the "MAL2 Ground Truth Dataset"
