Thursday, March 31, 2016

From Telemetry to Open Source: an Overview of Windows 10 Source Tree


A surprising amount of internal information about Microsoft software is available, even though it is closed-source. For example, libraries export their functions by name, which reveals something about the interfaces they use. Debugging symbols used for troubleshooting operating system errors are also publicly available. Still, only compiled binary modules are at hand. In this article, we will try to determine what they looked like prior to compilation, using only legal methods.

This question is not new: Mark Russinovich and Alex Ionescu have raised it before; however, my research is more detailed. What we need are the debugging symbol packages, which are publicly available. In this case, we use the packages for the most recent release of Windows 10 (64-bit), for both the free and checked builds.

Debugging symbols are a set of .pdb (program database) files that hold information used for debugging Windows binary modules, including the names of globals, functions, and data structures, sometimes even with field names.

We can also use information from an almost-publicly-available checked build of Windows 10. This kind of build is full of debugging assertions that contain sensitive information about local variable names and even source line numbers.



The example above, while not providing an absolute path, does expose extremely helpful path information. 

If we feed the debugging symbols to the "strings" utility by Sysinternals, we get around 13 GB of raw data. However, repeating this with every Windows installation file is a bad idea because it would generate mostly useless data. Therefore, we limit the target file types to the following list: exe (executable files), sys (drivers), dll (libraries), ocx (ActiveX components), cpl (control panel elements), and efi (EFI applications, in particular the bootloader). This gives us an additional 5.3 GB of raw data. I was initially surprised that so few programs can open gigabyte-sized files, and that even fewer can search for specific data inside them. I used 010 Editor for manual operations on the raw and temporary data and Python scripts for automated data filtering.
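The extraction step can be approximated in Python. The sketch below is not the Sysinternals tool and not the author's script: it only collects printable ASCII runs (the real utility also handles Unicode strings), and the minimum length of 6 and the function names are assumptions made for illustration.

```python
import re
from pathlib import Path

# Extensions named in the text; every other file type is skipped.
TARGET_EXTENSIONS = {".exe", ".sys", ".dll", ".ocx", ".cpl", ".efi"}

# Minimum run length is an assumption, chosen only to cut down on noise.
MIN_LEN = 6
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def extract_strings(data):
    """Return the printable ASCII runs found in a binary blob."""
    return [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]


def iter_target_files(root):
    """Yield files under root whose extension is in TARGET_EXTENSIONS."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in TARGET_EXTENSIONS:
            yield path
```

For gigabyte-sized inputs, a real script would read each file in chunks or through `mmap` instead of loading it whole.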

Filtering Symbol Data

A symbol file contains a list of the object files used to link the corresponding executable image. Object file paths are absolute.


  • Filtering clue No. 1: find strings using the mask ":\\".

We extract the absolute paths, sort them, and remove duplicates; the small amount of junk data that remains can be removed manually. These results indicate the source tree structure. The root directory is "d:\th", which may stand for Threshold, the codename shared by the first Windows 10 releases (the November release was Threshold 2). However, we only get a few filenames starting with "d:\th". This is because the linker takes already compiled files as input. Source files are compiled into the folders "d:\th.obj.amd64fre" for the release (free) version of Windows and "d:\th.obj.amd64chk" for the checked (debug) version.
  • Filtering clue No. 2: assuming that the object tree mirrors the layout of the source tree, we can map object file paths back to source file paths. Please note that this step can produce an inaccurate source tree structure because we don't know for certain which compilation options were used.
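Filtering clue No. 1 can be sketched as follows. This is a minimal illustration, not the author's script; requiring a drive letter before the `:\` marker and lowercasing the paths before de-duplication are both assumptions.

```python
def absolute_paths(lines):
    """Keep strings that contain a drive-letter path, sorted and de-duplicated."""
    found = set()
    for line in lines:
        idx = line.find(":\\")
        # Require a drive letter immediately before the ":\" marker.
        if idx >= 1 and line[idx - 1].isalpha():
            # Lowercasing is an assumption: Windows paths are case-insensitive,
            # so this merges duplicates that differ only in letter case.
            found.add(line[idx - 1:].strip().lower())
    return sorted(found)
```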
For example: 

d:\th.obj.amd64fre\shell\osshell\games\freecell\objfre\amd64\freecellgame.obj

turns into

d:\th\shell\osshell\games\freecell\freecellgame.c??

As for the file extensions, an object file can be produced from a range of source file types such as "c", "cpp", "cxx", etc., and there is no way to identify the type of a source file from the object path alone, so we leave "c??" as a placeholder extension.
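The mapping in the example above can be written as a small function. This is a sketch under the assumption that every object path follows the pattern shown (an "objfre\amd64" or "objchk\amd64" build directory under a "d:\th.obj.amd64fre" or "d:\th.obj.amd64chk" root); the function name and regular expressions are illustrative.

```python
import re

# Object roots named in the text: d:\th.obj.amd64fre and d:\th.obj.amd64chk.
OBJ_ROOT = re.compile(r"^d:\\th\.obj\.amd64(fre|chk)\\", re.IGNORECASE)
# Intermediate build directories such as \objfre\amd64\ (assumed pattern).
BUILD_DIRS = re.compile(r"\\obj(fre|chk)\\amd64\\", re.IGNORECASE)


def object_to_source(obj_path):
    """Map an object-file path to a guessed source path, or None if it doesn't fit."""
    if not OBJ_ROOT.match(obj_path) or not obj_path.lower().endswith(".obj"):
        return None
    # Callables are used as replacements so that the backslashes are not
    # interpreted as escape sequences by re.sub.
    path = OBJ_ROOT.sub(lambda m: "d:\\th\\", obj_path)
    path = BUILD_DIRS.sub(lambda m: "\\", path)
    return path[:-len(".obj")] + ".c??"
```

Paths from other roots, such as the vendor driver paths, return `None` and can be handled separately.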

There are many other root directories besides "d:\th". They include "d:\th.public.chk" and "d:\th.public.fre"; we omit these because they are just placeholders for publicly available SDKs. There are also many driver projects that were seemingly built on developers' workstations:

c:\users\joseph-liu\desktop\sources\rtl819xp_src\common\objfre_win7_amd64\amd64\eeprom.obj
C:\ALLPROJECTS\SW_MODEM\pcm\amd64\pcm.lib
C:\Palau\palau_10.4.292.0\sw\host\drivers\becndis\inbox\WS10\sandbox\Debug\x64\eth_tx.obj
C:\Users\avarde\Desktop\inbox\working\Contents\Sources\wl\sys\amd64\bcmwl63a\bcmwl63a\x64\Windows8Debug\nicpci.obj

There is a standard set of drivers for devices that conform to public specifications, such as USB XHCI controllers. These drivers are part of the Windows source tree, while all vendor-specific drivers are built elsewhere.
  • Filtering clue No. 3: remove binary files, because we are only interested in source files. Remove "pdb", "exp", and "lib" files; "res" files can be mapped back to the original "rc" (resource compiler) files.
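Filtering clue No. 3 amounts to a pass over the path list. The sketch below assumes the paths are plain strings; `ntpath` from the standard library is used so that Windows path rules apply on any host OS. The function name is illustrative.

```python
import ntpath  # Windows path semantics, usable on any host OS

# Linker artifacts named in the text.
DROP_EXTENSIONS = {".pdb", ".exp", ".lib"}


def drop_binaries(paths):
    """Drop linker artifacts and map compiled resources back to .rc scripts."""
    kept = []
    for path in paths:
        stem, ext = ntpath.splitext(path)
        ext = ext.lower()
        if ext in DROP_EXTENSIONS:
            continue
        # A compiled resource (.res) is assumed to come from a same-named .rc file.
        kept.append(stem + ".rc" if ext == ".res" else path)
    return kept
```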

While this output is neat, we cannot get any additional information about source files from this step, so we must work with the next data set. 

Filtering Raw Binaries Data

As there are only a few absolute filenames in this data set, we will use the following extensions as a filter:
  • "c" — C sources
  • "cpp" — C++ sources
  • "cxx" — C or C++ sources
  • "h" — C header
  • "hpp" — C++ header
  • "hxx" — C or C++ header
  • "asm" — assembly source (MASM)
  • "inc" — assembly header (MASM)
  • "def" — module definition file
After the data is filtered, we can see that even though the filenames are not absolute, they are relative to the "d:\th" root, so we just add the "d:\th" string to all of the resulting filenames.
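The extension filter and the re-rooting step might look like the sketch below. It assumes each extracted string is a bare path (real strings may carry surrounding text that needs trimming first), and it lowercases everything to match the listings in this article.

```python
import ntpath

# Extensions from the list above.
SOURCE_EXTENSIONS = {".c", ".cpp", ".cxx", ".h", ".hpp", ".hxx", ".asm", ".inc", ".def"}
ROOT = "d:\\th"


def source_candidates(strings):
    """Keep strings that look like source files and root relative ones at d:\\th."""
    found = set()
    for s in strings:
        s = s.strip().lower().replace("/", "\\")
        if ntpath.splitext(s)[1] not in SOURCE_EXTENSIONS:
            continue
        # Relative paths have no drive component; prefix them with the root.
        if not ntpath.splitdrive(s)[0]:
            s = ROOT + "\\" + s.lstrip("\\")
        found.add(s)
    return sorted(found)
```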

At this stage, there are problems with the filtered data. The first problem: we are not sure that the object file paths were properly mapped back to source file paths.
  • Filtering clue No. 4: let's check if there are matching filepaths between filtered symbol data and filtered data from binaries.
They do match, which means that we restored most of the directory structure of the source tree properly. Some folders might not be restored correctly, but this level of inaccuracy is acceptable. We can also replace the "c??" extensions with the extensions of the matching file paths.
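The cross-check and the extension replacement can be combined in one pass. In this sketch, restricting the lookup to compilable extensions (so that a same-named header cannot be mistaken for the source file) is an assumption, as are the names used.

```python
import ntpath

# Only these extensions can stand in for a "c??" placeholder (assumption).
COMPILABLE = {".c", ".cpp", ".cxx", ".asm"}


def resolve_extensions(symbol_paths, binary_paths):
    """Replace 'c??' placeholders with extensions seen in the binaries data."""
    known = {}
    for path in binary_paths:
        stem, ext = ntpath.splitext(path.lower())
        if ext in COMPILABLE:
            known.setdefault(stem, ext)
    resolved = []
    for path in symbol_paths:
        stem, ext = ntpath.splitext(path.lower())
        if ext == ".c??" and stem in known:
            resolved.append(stem + known[stem])
        else:
            resolved.append(path)
    return resolved
```

Counting how many placeholders get resolved gives a rough measure of how well the two data sets agree.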

The second problem is header files. Although a header file is a very important part of a source tree, it is not compiled into an object file. This means that we can't restore the information about header files from object files, so we can only locate and restore header files that were found in the raw data from binaries.

The third problem is that we still don't know the extensions for most of the source files.
  • Filtering clue No. 5: assume that a directory contains source files of the same type.
This means that if a directory already contains a "cpp" source file, it is likely that all the other files in the same folder are "cpp" sources too.
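One way to apply Filtering clue No. 5 is a per-directory vote. The sketch below picks the most common known extension in each directory; using a majority vote, and leaving the placeholder in directories with no known sources, are assumptions rather than the author's stated method.

```python
import ntpath
from collections import Counter, defaultdict

COMPILABLE = {".c", ".cpp", ".cxx"}


def infer_by_directory(paths):
    """Give 'c??' files the most common known extension of their directory."""
    votes_by_dir = defaultdict(Counter)
    for path in paths:
        ext = ntpath.splitext(path)[1].lower()
        if ext in COMPILABLE:
            votes_by_dir[ntpath.dirname(path).lower()][ext] += 1
    result = []
    for path in paths:
        stem, ext = ntpath.splitext(path)
        votes = votes_by_dir.get(ntpath.dirname(path).lower())
        if ext == ".c??" and votes:
            result.append(stem + votes.most_common(1)[0][0])
        else:
            # No evidence in this directory: keep the path unchanged.
            result.append(path)
    return result
```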


  • Filtering clue No. 6: use external sources of information to fill in the details.
I used the Windows Research Kernel as a reference for the assembly sources and renamed some assembly sources by hand.

Inspecting the Result Data

A keyword search in the source filenames for "telemetry" resulted in 424 hits, the most interesting of which are listed below.

d:\th\admin\enterprisemgmt\enterprisecsps\v2\certificatecore\certificatestoretelemetry.cpp
d:\th\base\appcompat\appraiser\heads\telemetry\telemetryappraiser.cpp
d:\th\base\appmodel\search\common\telemetry\telemetry.cpp
d:\th\base\diagnosis\feedback\siuf\libs\telemetry\siufdatacustom.c??
d:\th\base\diagnosis\pdui\de\wizard\wizardtelemetryprovider.c??
d:\th\base\enterpriseclientsync\settingsync\azure\lib\azuresettingsyncprovidertelemetry.cpp
d:\th\base\fs\exfat\telemetry.c
d:\th\base\fs\fastfat\telemetry.c
d:\th\base\fs\udfs\telemetry.c
d:\th\base\power\energy\platformtelemetry.c??
d:\th\base\power\energy\sleepstudytelemetry.c??
d:\th\base\stor\vds\diskpart\diskparttelemetry.c??
d:\th\base\stor\vds\diskraid\diskraidtelemetry.cpp
d:\th\base\win32\winnls\els\advancedservices\spelling\platformspecific\current\spellingtelemetry.c??
d:\th\drivers\input\hid\hidcore\hidclass\telemetry.h
d:\th\drivers\mobilepc\location\product\core\crowdsource\locationoriontelemetry.cpp
d:\th\drivers\mobilepc\sensors\common\helpers\sensorstelemetry.cpp
d:\th\drivers\wdm\bluetooth\user\bthtelemetry\bthtelemetry.c??
d:\th\drivers\wdm\bluetooth\user\bthtelemetry\fingerprintcollector.c??
d:\th\drivers\wdm\bluetooth\user\bthtelemetry\localradiocollector.c??
d:\th\drivers\wdm\usb\telemetry\registry.c??
d:\th\drivers\wdm\usb\telemetry\telemetry.c??
d:\th\ds\dns\server\server\dnsexe\dnstelemetry.c??
d:\th\ds\ext\live\identity\lib\tracing\lite\microsoftaccounttelemetry.c??
d:\th\ds\security\base\lsa\server\cfiles\telemetry.c
d:\th\ds\security\protocols\msv_sspi\dll\ntlmtelemetry.c??
d:\th\ds\security\protocols\ssl\telemetry\telemetry.c??
d:\th\ds\security\protocols\sspcommon\ssptelemetry.c??
d:\th\enduser\windowsupdate\client\installagent\common\commontelemetry.cpp
d:\th\enduser\winstore\licensemanager\lib\telemetry.cpp
d:\th\minio\ndis\sys\mp\ndistelemetry.c??
d:\th\minio\security\base\lsa\security\driver\telemetry.cxx
d:\th\minkernel\fs\cdfs\telemetry.c
d:\th\minkernel\fs\ntfs\mp\telemetry.c??
d:\th\minkernel\fs\refs\mp\telemetry.c??
d:\th\net\netio\iphlpsvc\service\teredo_telemetry.c
d:\th\net\peernetng\torino\telemetry\notelemetry\peerdistnotelemetry.c??
d:\th\net\rras\ip\nathlp\dhcp\telemetryutils.c??
d:\th\net\winrt\networking\src\sockets\socketstelemetry.h
d:\th\shell\cortana\cortanaui\src\telemetrymanager.cpp
d:\th\shell\explorer\traynotificationareatelemetry.h
d:\th\shell\explorerframe\dll\ribbontelemetry.c??
d:\th\shell\fileexplorer\product\fileexplorertelemetry.c??
d:\th\shell\osshell\control\scrnsave\default\screensavertelemetryc.c??
d:\th\windows\moderncore\inputv2\inputprocessors\devices\keyboard\lib\keyboardprocessortelemetry.c??
d:\th\windows\published\main\touchtelemetry.h
d:\th\xbox\onecore\connectedstorage\service\lib\connectedstoragetelemetryevents.cpp
d:\th\xbox\shellui\common\xbox.shell.data\telemetryutil.c??

These results don't reveal additional information about the telemetry internals, but they do provide an interesting starting point for more detailed research.

I next looked for PatchGuard, but the source tree contains only one file of an unknown type (most likely binary).

d:\th\minkernel\ntos\ke\patchgd.wmp

Searching the unfiltered data reveals that PatchGuard is in fact a separate project.

d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen00.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen01.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen02.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen03.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen04.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen05.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen06.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen07.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen08.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp\xcptgen09.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp_noltcg\patchgd.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp_noltcg\patchgda.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp_noltcg\patchgda2.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp_noltcg\patchgda3.c??
d:\bnb_kpg\minkernel\oem\src\kernel\patchgd\mp_noltcg\patchgda4.c??

I also searched for random phrases and words. Some interesting results are provided below:

d:\th\windows\core\ntgdi\fondrv\otfd\atmdrvr\umlib\backdoor.c??
d:\th\inetcore\edgehtml\src\site\webaudio\opensource\wtf\wtfvector.h
d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\libjpeg\jaricom.c??
d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\libpng\png.c??
d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\libtiff\tif_compress.c??
d:\th\printscan\print\drivers\renderfilters\msxpsfilters\util\opensource\zlib\deflate.c??

You are invited to check the Windows 10 source tree on GitHub and share your findings.

Author: Artem Shishkin, Positive Research
