To analyze the coverage of the SID identification algorithm, we collected the index pages of Alexa’s top one million sites, storing both the response headers and the response pages. We used wget to collect only the main page, without attempting to load any subresources (e.g., images or scripts), so we sent only one request to each listed site. The one exception is a redirect response, which we followed until it pointed at an actual page. To assess the validity of the SID identification algorithm, we conducted a manual analysis of a subset of 1,000 sites. This analysis shows that the SID identification algorithm effectively separates SIDs from other values. Out of 5,500 cookies, 1,953 are identified as SIDs. Of these 1,953 cookies, we found only 10 that cannot be obviously classified as a SID.
Analyzing the set of the top one million sites shows that 472,834 sites ask the browser to set a cookie upon the first request. Running the SID identification algorithm on these cookies reveals that the cookies of 349,480 domains (73.91%) contain a session identifier: 266,305 domains use a SID with a known name and 98,305 use a SID that matches the three heuristics. Note that these numbers indicate that 15,130 domains use both a SID with a known name and a SID that matches the heuristics. Manual inspection of a subset shows that several sites indeed use multiple SID key/value pairs, for instance, two different kinds of identifiers (a session ID and a visitor ID) or different keys for the same SID value. Finally, analyzing the use of the Domain option on cookies containing a session identifier shows that 6.5% of the 349,480 sites make the SID available to all subdomains.
These numbers suggest that of the 472,834 sites setting cookies, 123,354 do not include a session identifier in their cookies. We isolated these sites and conducted a follow-up study: as in the first study, we fetched their index page, but this time twice, with the two requests independent of each other. By comparing the cookie key/value pairs present in the two responses, we can detect potential false negatives in the SID identification algorithm: if the cookies of both responses are exactly the same, then these cookies cannot represent a session identifier, since a SID cannot be shared between two independent requests. The results show that of the 123,354 sites, 77,935 set at least one cookie whose value differs between the two requests. Applying the length and randomness heuristic suggests that 69,405 of these domains actually set some kind of identifier. In total, this means that with the elementary SID identification algorithm, Serene already protects 349,480 domains out of 418,885 domains setting a SID, or 83.43%.
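The comparison of two independent responses can be sketched as follows. This is only a minimal illustration, not the implementation used in the study: the function names, the minimum length of 10 characters, the entropy threshold of 3.0 bits per character, and the sample cookie values are all assumptions made for this example, since the exact parameters of the length and randomness heuristic are not restated here.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of the value, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_identifier(value: str, min_length: int = 10,
                          min_entropy: float = 3.0) -> bool:
    """Hypothetical length-and-randomness check; both thresholds are assumptions."""
    return len(value) >= min_length and shannon_entropy(value) >= min_entropy


def candidate_sids(first: dict, second: dict) -> list:
    """Return names of cookies whose values differ between two independent
    responses and whose values pass the length-and-randomness check."""
    candidates = []
    for name, value in first.items():
        other = second.get(name)
        if other is None or other == value:
            # Same value in both responses: cannot be a session identifier.
            # Cookies missing from the second response are skipped as a
            # simplification of this sketch.
            continue
        if looks_like_identifier(value) and looks_like_identifier(other):
            candidates.append(name)
    return sorted(candidates)


# Made-up cookies from two independent fetches of the same index page.
first = {"lang": "en", "visitor": "k3Jx9QmZ7pLw2VtB", "theme": "dark"}
second = {"lang": "en", "visitor": "Zr8Tn4HcY1sGd6Ue", "theme": "dark"}
print(candidate_sids(first, second))  # ['visitor']
```

Only the `visitor` cookie is reported: `lang` and `theme` are identical across both responses, so under the reasoning above they cannot be session identifiers.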
Conclusion. The analysis of the support of the elementary SID identification algorithm shows that Serene is able to protect a large majority of Alexa’s top one million sites. In Section 4.6 we elaborate on potential refinements of the SID identification algorithm, allowing us to increase the level of protection.
In the second part of the evaluation, we take a closer look at the impact of Serene’s protective measures on the functionality of available sites. We prepared a clean Firefox profile with Serene installed. We instructed Firefox to load each site using this clean profile, stopped Firefox after 25 seconds, and collected statistics generated by Serene. Note that this process not only loads the index pages, but also all included resources, both within the domain and external, thus triggering Serene’s protective measures. Our study shows that of the one million processed sites, Serene has no negative effect on the functionality of 524,014 (93.14%) of 562,538 sites that set cookies. A follow-up manual analysis of the most common impacted traffic patterns reveals that third-party services, such as tracking, analytics, or advertising, often trigger Serene’s protective measures. Several sites have even documented this behavior. Additionally, recent initiatives such as tracking protection lists or Do-Not-Track also aim at discouraging this behavior. Removing obvious instances of these services brings Serene’s compatibility to 95.55%.
Conclusion. The compatibility study of Serene’s impact on available Web applications shows that Serene fully preserves the functionality of 93.14% of sites. Not counting privacy-invasive third-party services brings the level of compatibility up to 95.55%. For the remaining 4.45%, we suggest a follow-up user study to investigate the noticeable impact on an application’s functionality.
In future work, we suggest refining the elementary algorithm by carefully integrating the more generic, heuristic algorithms, in order to reduce the false-negative rate. To support this suggestion, we ran the 77,935 domains that sent two different cookie values in two independent requests through SessionShield’s algorithm. The result suggests that further refinement can extend support to 63,384 of these 77,935 domains, resulting in a total coverage of 98.6%.
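The percentages can be reproduced from the reported counts. The short check below is a sketch that assumes the 418,885 domains setting a SID (the 349,480 identified domains plus the 69,405 found in the follow-up study) serve as the denominator in both cases; the variable and function names are illustrative.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


identified = 349_480   # domains covered by the elementary algorithm
missed = 69_405        # additional domains found in the follow-up study
recoverable = 63_384   # domains recovered by the more generic heuristics

total_sid_domains = identified + missed  # 418,885

current = percentage(identified, total_sid_domains)
refined = percentage(identified + recoverable, total_sid_domains)
print(current)  # 83.43
print(refined)  # 98.56
```

Rounded to one decimal, the second figure matches the 98.6% quoted above.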
Subdomain Attack Vector. As mentioned before, Serene covers all session fixation attack vectors, except for header-based attacks (A5 and A6). Attack vector A6 is most likely to occur and is launched through a Set-Cookie header that sets a cookie belonging to all subdomains. In order to launch such an attack, the attacker needs to control such a subdomain and needs to be able to set custom response headers (i.e., a Set-Cookie header).
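To make the shape of such a header concrete, the sketch below uses Python's standard `http.cookies` module to detect whether a Set-Cookie header value declares a Domain attribute. It is only an illustration and not part of Serene: the function name, header values, and hostnames are made up for this example.

```python
from http.cookies import SimpleCookie


def cookies_with_domain_scope(set_cookie_header: str) -> dict:
    """Map each cookie name in a Set-Cookie header value to its Domain
    attribute, keeping only the cookies that declare one."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    return {name: morsel["domain"]
            for name, morsel in jar.items()
            if morsel["domain"]}


# A cookie scoped to all subdomains of example.com (hypothetical values).
shared = cookies_with_domain_scope("PHPSESSID=abc123; Domain=.example.com; Path=/")
# A cookie without a Domain attribute, scoped to the issuing host only.
host_only = cookies_with_domain_scope("PHPSESSID=abc123; Path=/")
print(shared)     # {'PHPSESSID': '.example.com'}
print(host_only)  # {}
```

A header like the first one looks the same whether a legitimate application or an attacker-controlled subdomain sends it, so detecting the Domain attribute alone does not distinguish an attack from normal use.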
Preventing these session fixation attacks at the client side is currently not possible, because the pattern of an attack is very similar to a legitimate usage pattern, where a domain wants to set a SID belonging to all subdomains. Simply disallowing such SIDs would break a substantial fraction of sites: our analysis shows that 6.5% (22,706 out of 349,480) of sites setting a SID on their index page already use the Domain option. Prior work also states that existing Web applications depend on sharing cookies across subdomains.
Complete client-side protection is very challenging, due to potential abuse of headers, the only legitimate mechanism currently available for Web applications. In a wide-scale study of Alexa’s top one million sites, we have shown that Serene fully preserves the functionality of 93.14% of sites, while protecting 83.43% of investigated applications. Future refinement and a follow-up user study are the keys to increasing both compatibility and coverage.