Sunday, April 12, 2015

The Deep Web - The Technical Truth

If you try to google "deep web" you'll get tons of results; most of them, unfortunately, are about interesting conspiracies and "bad" stuff.

What's in it?

But the truth is that the 1st main portion of the deep web is just a lot of data that is out there for anyone to get, and Google simply can't index it, for 1 of 3 main reasons:
1 - It's API-format data, i.e. JSON, XML, numbers, etc.
2 - It's form/query-based data, meaning it sits in a database and you need to send a query to get results.
3 - It's logon-based (Facebook, etc.).

The crawlers know how to handle web pages, today even ones with JS and AJAX, but still only web pages.
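To make the 3 reasons a bit more concrete, here is a rough Python sketch of the decision a crawler faces. It is only an illustration of the idea, not how any real search engine works: the `Resource` fields, the `why_not_indexed` function and the example.com URLs are all made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical description of something a crawler runs into."""
    url: str
    content_type: str               # e.g. "text/html" or "application/json"
    needs_form_query: bool = False  # content only appears after submitting a query
    needs_login: bool = False       # content sits behind a username/password


# Assumption: this toy crawler only treats these types as "web pages".
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def why_not_indexed(res: Resource) -> Optional[str]:
    """Return the reason a resource stays in the deep web, or None if it is indexable."""
    if res.needs_login:
        return "logon based"
    if res.needs_form_query:
        return "form/query based"
    if res.content_type not in PAGE_TYPES:
        return "API format data"
    return None


if __name__ == "__main__":
    examples = [
        Resource("https://example.com/", "text/html"),
        Resource("https://example.com/api/items", "application/json"),
        Resource("https://example.com/search", "text/html", needs_form_query=True),
        Resource("https://example.com/inbox", "text/html", needs_login=True),
    ]
    for res in examples:
        print(res.url, "->", why_not_indexed(res) or "indexable (surface web)")
```

The order of the checks is a design choice of the sketch: a page behind a logon is reported as "logon based" even if it would also need a form query.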

There are of course more reasons. For example Google (and when I say Google I mean search engines in general) will not index illegal stuff, heavy gore, or copyright-infringing content (more relevant on YouTube), and some things are left out for commercial reasons.

So yeah, most of the deep web is just data, useful only to specific people, such as NASA's APIs. There is a list of 60 deep sites with APIs or forms that by themselves exceed the surface web 40 times in size: see "The Deep Web: Surfacing Hidden Value" and find "Sixty Largest Deep Web Sites" in it.


So let's talk about some terms:
Deep Web - anything on the internet that will not appear in any Google search.
Invisible Web - the old term for the Deep Web. It was dropped since it's not exactly right: the content is visible, it just doesn't float up to the search engines.
Surface Web - anything that is findable with a Google search.
Visible Web - the old term for the Surface Web.
Dark Web - anything that is illegal. It can also be on the Surface Web, but usually it's not indexed, and the term more often refers to places that need a bit of hacking to get into.
Deep Web Layers - refers to the amount of hacking tools needed to get into a website/place on the web, which mirrors the level of security used to hide / secure it.
Anonymous Browsing - your ability to go somewhere on the web without anyone knowing it's you.
Proxy - you are on computer A in the US, but you go to places via computer B in Russia, so everybody thinks you're in Russia. B is your proxy.
TOR - "The Onion Router", currently the most popular tool for anonymous browsing (more about it further down).
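The proxy term from the list above can be shown with a toy simulation. No real networking happens here; the `Server` and `Proxy` classes and the address strings are hypothetical and only show which address the destination gets to see.

```python
class Server:
    """Toy destination that remembers who it thinks is talking to it."""

    def __init__(self):
        self.seen_sources = []

    def handle(self, source_address: str, path: str) -> str:
        self.seen_sources.append(source_address)
        return f"content of {path}"


class Proxy:
    """Toy proxy: forwards requests under its own address."""

    def __init__(self, address: str, server: Server):
        self.address = address
        self.server = server

    def fetch(self, client_address: str, path: str) -> str:
        # The proxy knows the client's address, but the server only sees the proxy's.
        return self.server.handle(self.address, path)


if __name__ == "__main__":
    server = Server()
    proxy_b = Proxy("computer-B (Russia)", server)
    proxy_b.fetch("computer-A (US)", "/some/page")
    print(server.seen_sources)
```

Note that in this sketch the proxy itself still knows who the client is, so a single proxy only moves the trust from the destination to B.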

Is There More?

Well, the 2nd main portion of the deep web is everything that is logon-based: Google Groups, unpublished posts, university websites, government sites, Dropbox, cloud storage. These are things you may already have; you just need the username and password for them.

The 3rd part is stuff that requires software, starting from simple FTP, through torrents and their friends, up to TOR. Again, you all use FTP sometimes when you download files, and those are up on the surface.

The 4th part is the more private stuff like VPNs, LANs, etc., sometimes not even using standard protocols and ports.

Basically the "deeper" layers (private forums), the "dark web" (hackers, criminals), and other private groups like government intelligence agencies or private ones (commercial or not...) are technically part of these 2 parts, only they excel more at hiding, security, hacking, etc. In the end they are places on the web that you could eventually get to with the right authority.


So, everybody likes to talk about the layers of the deep web.
Let's start by stating the 4 real, known layers. It's important to understand that the only thing between layer A and layer B is the means of getting there, nothing more.

Layer 1 - Surface Web: anything you can get by a Google search.
Layer 2 - Anonymous (no logon required) deep web: all those APIs, form- and query-based pages and databases, and other non-indexed places, as well as non-browser content.
Layer 3 - Logon required: anything secured. If you want to get in, you need authentication.
Layer 4 - Anything heavily secured. Truly, the only difference between layers 3 and 4 is the security.
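Since the only difference between the layers is the means you need to get in, the 4 layers can be written as one small function. This is just my own way of putting the list above into code; the `layer_of` name and its three flags are assumptions made for the example.

```python
def layer_of(indexed_by_search: bool, needs_login: bool, heavily_secured: bool) -> int:
    """Map 'what it takes to get there' to one of the 4 layers described above."""
    if indexed_by_search:
        return 1  # Surface Web: a search engine will find it
    if heavily_secured:
        return 4  # same idea as layer 3, just with much heavier security
    if needs_login:
        return 3  # you need authentication
    return 2      # not indexed, but open to anyone who knows where/how to ask


if __name__ == "__main__":
    print(layer_of(indexed_by_search=True, needs_login=False, heavily_secured=False))
    print(layer_of(indexed_by_search=False, needs_login=False, heavily_secured=False))
    print(layer_of(indexed_by_search=False, needs_login=True, heavily_secured=False))
    print(layer_of(indexed_by_search=False, needs_login=True, heavily_secured=True))
```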

There are several conspiracies you'll read about, like the 5th level, the "Marianas Web" (probably named after the Mariana Trench), which is supposed to be a place with quantum computing and basic AI algorithms, where you need to have these technologies to even get there. Another one is about layers 6, 7 and 8, which are, respectively: blocking algorithms; people trying to get to the 8th while stopping others; and some raw data/leftovers that should be able to affect everything.

Truly, I think that the 8th is technically impossible and therefore there is no 6-8. As for the 5th, well, technically it's possible, meaning that it's not a delusion, yet we don't know of anyone who confirms it, so I guess not.

Can I Get There?

Yes and no.

All the layers have a few common divisions:
1 - White / legal browser part: you can just browse there.
2 - Program-based part, which starts with PDFs and Word documents and goes up to Dropbox, torrents, and TOR or a proxy.
3 - Illegal / gore part: you need TOR or similar to find and browse it.
4 - Security-protected part.

Layer 1: just google. This layer is the exception to the divisions.

Layer 2: you just need to discover the URL / relevant program / relevant auth. Most "white" and legal stuff is on the relevant sites, like NASA, and for some you'll need to uncover the URLs with TOR and deep web search engines. By the way, just google "deep web links" and you'll find your base treasury.

Layer 3 is really either authentication, starting with a username and password and going up to private VPNs and forums, or hacking tools.

Layer 4 is not for you. Usually you'll show up on someone's radar just by trying to sniff around. The people there are heavy pros: heavy crime, heavy intelligence, heavy government, etc.

Again, in any layer you can find an HTTP/S site, a doc, a target that needs a proxy/program, or just a place you need authentication to get into.

TOR, .onion links and anonymous browsing

Since the internet started, people have wanted to browse without anyone being able to know who they are. It can be someone who doesn't want Google and other commercial companies sniffing around him, even if it's just about flowers (read about how Google ads work); it can be a government agent trying to find data about sensitive stuff; it can be a hacker (you go to jail fast today for hacking); it can be crime, etc.

So people created some anonymous browsers. To name a few:

TOR is currently the most popular and is supposed to be the most secure, although some university researchers have already been able to track its traffic, so be mature about what you do there. Most people in the business claim that the NSA can already decrypt TOR; read more about traffic later on.

Freenet is more popular for its "trust" policy, meaning that when you search for something your user gets a trust level, and other sites and groups show themselves to you according to your level.

a bigger list with more explanation here.

TOR is also an acronym for “The Onion Router”. While your browsing is encrypted, people now also have .onion (instead of .com) websites, where the location of the website is hidden as well; you can only browse there with relevant network support, like the TOR browser.
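The "onion" in the name refers to layered encryption: the client wraps a request in one layer per relay, and each relay can peel off only its own layer, learning just the next hop. Here is a toy Python sketch of that idea. It is NOT TOR's real protocol and NOT real cryptography: the XOR "cipher", the relay names, the keys and the `build_onion` / `peel` functions are all assumptions made for illustration.

```python
from itertools import cycle
from typing import List, Tuple


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy 'encryption': XOR with a repeating key. Insecure, for illustration only."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def build_onion(message: bytes, route: List[Tuple[str, bytes]], destination: str) -> bytes:
    """Client side: wrap the message in one layer per relay, innermost layer first.

    route lists (relay_name, relay_key) in the order the packet will travel.
    """
    packet = message
    next_hop = destination
    for name, key in reversed(route):
        # Each layer hides the next hop and the inner packet from everyone
        # except the relay that holds this key.
        packet = xor_bytes(next_hop.encode() + b"|" + packet, key)
        next_hop = name
    return packet


def peel(packet: bytes, key: bytes) -> Tuple[str, bytes]:
    """Relay side: remove one layer, returning (next_hop, inner_packet)."""
    plain = xor_bytes(packet, key)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner


if __name__ == "__main__":
    # Hypothetical three-relay route with made-up keys.
    route = [("entry", b"key-entry"), ("middle", b"key-middle"), ("exit", b"key-exit")]
    packet = build_onion(b"GET /", route, "example.com")
    for name, key in route:
        next_hop, packet = peel(packet, key)
        print(f"{name} forwards to {next_hop}")
    print("delivered payload:", packet)
```

In this sketch the entry relay knows who sent the packet but not where it ends up, and the last relay knows the destination but not the sender, which is the whole point of stacking the layers.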

There are many level 2-3 sites where you should be anonymous and use a proxy to get there, and today TOR does it for you. Most of it is either leaks, crime/illegal stuff or gore.

Most of the time, to be honest, you need these for stuff where you don't want the scanners of governments / bad folks to pick up your name/IP. As for the endless debate over whether it's good or bad, I liked this BBC show.

Don't worry, they will find a way to track TOR, and then a new TOR will arise, and so on. Google it and you'll find some debates about it.

IMPORTANT NOTE: TOR is designed to hide your IP, not your content, meaning that your traffic CAN STILL BE MONITORED (for example at the exit relay, when the site itself doesn't use HTTPS), just not your location. So you can search for crime and gore, but YOU SHOULDN'T PUT IN A USERNAME AND PASSWORD. Read more here.

Hope that covers your curiosity about the deep web. If you want to see for yourself, just download TOR and google "deep web links"; you'll find your share.


Last thing: while searching the web for images about the deep web I collected the ones that, in my mind, best show the real picture.

Image 1, I think, really describes how it looks: the surface web is more in the middle of things instead of on top. Image 2, although commercial, has a bit more explanation.

Image 3 and image 4 show the layers better.

1 comment:

  1. "Deep Web" means anything that can't be found on a normal search engine. The deeper you go, through more layers of encryption, the darker the content becomes. Hence the Dark Web.