Ha! Got it! And there's nothing you can do to prevent it!
Mwah-ha-ha!
i now have a record of your IP address, i can use it to find out where you're from. i also (more than likely) have a record of your Operating System and what sort of browser you use too! i also know your preferences in pages to view, and with any hope, what page you were looking at before you came here. Broo-ha-ha! Yes, now that information is mine, what nefarious purposes shall i put it to?
Well, if i'm anything like any major website, probably not a helluva lot.
You see, IP addresses are the pennies of the internet. You can find them pretty much anywhere, and in most cases, they're about as worthless. Sure, i've got an address for you, but it could be one assigned from your ISP's DHCP pool, it could be the address for your router (and there may be bunches of computers behind that), it could be a proxy for AOL, heck, it could even be some European anonymizer site. Anyone with a few years of network experience or who needs to spend time doing user analysis generally realizes this, well… nearly anyone.
Let's get a few things straight, shall we? Yes, it is pretty darn hard to be completely anonymous on the net. Mostly because if my machine wants to talk to your machine, there's got to be a way for me to get the little bits of information from wherever i am to where-ever you are. It's not a little like calling me via the phone. You've got a number and i've got a number and with a little bit of effort, i can generally figure out your number.
So, why is it that address isn't a precious commodity that needs to be safeguarded against evil? Well, other than the above reasons, let me also mention that big websites (like, say, Google, Yahoo, Amazon, Ebay, etc.) get millions of hits per second. (That's Million and Second) These are spread out over banks and banks of machines stored in various locations around the world. Each record is several hundred characters to several thousand characters long, but at millions of hits per second we're talking about some substantially staggering numbers of records to keep track of. Why is that? Well, mixed in with your query for "Nekkid Britany's Spheres" are tens of thousands of other folks looking for stuff, bots crawling for information (legit and less so), Greasemonkey updaters, personal shopping agents, DoS attacks, Virus based crawlers, Internet worms and, well, damn near everything else you can think of. Trying to do anything with that amount of just plain noise is worthless.
Instead, what big sites do is generally toss the various collected logs into a digital grist mill that culls out just the most pertinent facts, usually several days later. The original logs? Pitched to make way for the terabytes of log information for each of the next several days.
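For the curious, here's roughly what a (very small) grist mill could look like. This is only a sketch: it assumes logs in the usual Common Log Format, and the `summarize` function, the sample lines and the choice of what to count are all made up for illustration, not what any particular big site actually runs.

```python
import re
from collections import Counter

# Assumed input: Common Log Format lines, e.g.
#   ip ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def summarize(lines):
    """Reduce raw log lines to aggregate counts; the raw lines are not kept."""
    pages = Counter()
    statuses = Counter()
    skipped = 0
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            skipped += 1  # noise that doesn't even parse
            continue
        pages[m.group("path")] += 1
        statuses[m.group("status")] += 1
    # Note that the IP address is parsed but never stored in the summary.
    return {"pages": pages, "statuses": statuses, "skipped": skipped}

# Hypothetical sample data (documentation-range IP addresses).
sample = [
    '192.0.2.10 - - [10/Oct/2006:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.11 - - [10/Oct/2006:13:55:37 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Oct/2006:13:55:38 -0700] "GET /about.html HTTP/1.0" 404 512',
    'garbage line from a broken bot',
]
summary = summarize(sample)
print(summary["pages"].most_common(1))
```

Once the summary exists, the raw lines can be thrown away, which is the whole point: the counts are small, the logs are not.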
But what about all those nifty personal services? Yeah, see, those work because you've got an account. To the big sites, you having an account is FAR more interesting because now there's a way to track you regardless of what IP/Machine/Browser you happen to be using. Also, since only a fraction of the traffic they get is from folks with actual accounts, it's easier to get and manage that data. That's why sites like Google, eBay, Amazon and Yahoo! keep wanting you to get an account. Plus if you have an account, they can assign a cookie to you that makes you MUCH easier to "customize content" for.
i'll also note that there's a pretty good sized percentage of those accounts that are "trash" too, but it's far less than the amount of crap a site gets that's purely anonymous. i'll add that smaller sites (like this one) are actually MORE of a threat to your IP based identity because the total amount of information i get is actually pretty darn manageable. i can actually store all of the information into various databases and track folks much more closely than any large site would want to.
See, this is what happens when folks stop being cynical. Cynical people realize that hyperventilating over crap like search engines storing your IP address is f(iretr)ucking stupid. There are far more efficient methods that big sites use to track you that have been perfected long before cheap broadband made static IPs slightly more prevalent in the consumer base, and those tracking methods have the added advantage of working through things like different browsers, proxies, DHTML configurations or any of the other bits of stuff that make IP addresses continue to be worthless.
i wonder if i should note that i've also got his contact info too?
Sure, but there's a significant point of diminishing returns. A good example of this is the fact that you can declare the sales tax you pay as a Federal deduction. Do you catalog and record every single receipt that contains that information, noting the time and location of purchase? My bet is probably "no". Instead you take the aggregate deduction based on an estimate of average sales tax payments.
Big sites are like that too. Yes, they *could* track that sort of information, but what value is there in doing it? Why should they purchase, dedicate, expensively power and colo RAID arrays full of information that may have a 20-40% validity rate when they can get so much more qualified information by averaging anonymous hits that meet a given profile (e.g. between hours of 12AM-2AM GMT, the use profiles of folks in EST are very different than the folks in PST). Customizing whether you show ads for Prime Time TV and traffic reports makes a helluva lot more sense than tracking a spurious and potentially invalid IP address.
Plus, let us consider the implications of tracking an anonymous IP from a pure usage point of view. Let's say that You use IP 10.1.2.3, and have for several years. You then move (or change providers, or whatever) and now, suddenly the user 10.1.2.3 is completely different. If I was to continue to show the same ads to that user, my CTR would be horrible with no warning or remedy to me. I've just lost what may have been a highly profitable revenue point because you were a complete clickmonkey.
It is FAR more useful to me to plant a permanent cookie on you (which they all do) for tracking regardless of what IP you come in at. Even better would be for me to convince you to become a registered customer and build a relationship with you so that I can milk you for more sellable details.
While I appreciate the paranoia, understand that this is a minor point. It's like someone trying to pick you out by the sound of your shoes in Grand Central Station during rush hour. It's certainly possible, but there are far better ways of doing it.
I'm well aware of the inner workings of large sites.
I agree, I can think of little legitimate reason for them to be keeping that data around. The only use I can think of would be to answer queries like "give us all of the IP addresses of people who searched for X", whether for internal or external use.
There's an easy answer. They can just say that they don't keep logs with IP addresses. Except that they do, or they're unwilling to say that they don't, which is effectively the same thing. Why? I don't know. Your guess is as good as mine.
But really, IP address logging is just an example, in the larger picture of raising awareness that a LOT of big companies are storing a LOT of data on a LOT of people, in ways that are largely unregulated, and most people aren't aware that there's even an issue.
Even if we don't agree, I'm glad that we're all talking about it now.
Well, I do agree that:
* Companies are keeping lots of information about people, most of it without either consent or declaration (however I'd argue that most of that knowledge is information that is either public or tendered as part of a given transaction). I'll note Wal*Mart using a fingerprint check-out frightens me far worse than most of the online practices I've seen.
* Large companies do not declare what they do with this data. Usually, it's wrapped with the "Trade Secrets" banner, which is generally true, as that information is often the principal point of differentiation between companies competing for the same sales.
* Folks should be made aware of this. Most folks simply live in blissful ignorance.
I don't agree that highlighting this one aspect of tracking as a major concern is particularly noteworthy in comparison to all of the other methods. (I'd compare it to Target having records of the customer fingerprints, except that I think I'm the only person on the planet who refuses to use the electric door.)
Still that's my opinion and it differs from yours, and thus the world goes on with no loss to any.
What's an individual/consumer profile worth, either to you or a third party? If it's more than a few pennies, then there's probably good money in compiling what you can.
The thing to me about IP addresses is that in many cases they can serve as the links to chain together cookies/identities from different sites, if you're in collusion. And on the other side, cookies can be used to chain together IP addresses of users who move around, say from work to home. Slowly a meta profile is built, and in many cases it will be wrong or loosely connected.
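The chaining described above can be sketched as a plain union-find over (cookie, IP) sightings. This is a minimal sketch under stated assumptions: the `link_profiles` function, the cookie names and the sample sightings are all hypothetical, and nothing here claims to be how any real ad network does it.

```python
from collections import defaultdict

def link_profiles(sightings):
    """Group cookies and IPs that were ever seen together (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Each sighting says "this cookie showed up from this IP".
    for cookie, ip in sightings:
        union(("cookie", cookie), ("ip", ip))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

# Hypothetical sightings pooled from two colluding sites.
sightings = [
    ("c1", "192.0.2.10"),    # cookie c1 seen from a work address
    ("c1", "198.51.100.7"),  # same cookie seen from a home address
    ("c2", "198.51.100.7"),  # another site's cookie, same home address
    ("c3", "203.0.113.5"),   # unrelated visitor
]
profiles = link_profiles(sightings)
print(len(profiles))
```

The weakness is visible right in the sketch: if 198.51.100.7 is really a router or proxy with a dozen people behind it, c1 and c2 get merged into one "person" anyway, which is the wrong-or-loosely-connected part.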
But with the scale of population and the business models (0.1% conversion to customer, for example) being used, you only need to jackpot data every once in a while in order for it to be valuable.
You only need enough value to build the infrastructure that pays for it, then the marginal cases become legitimate and the data can be used even in low-recovery applications.
Once the data exists, it will be sought by law enforcement and the government both for legitimate reasons and for fishing trips.
All as long as there's value for somebody in some form of profile building. And what could that be? I don't know - say serving the most expensive click-thru ads to the people/cookies/IPs that have the highest sucker (er, I mean customer) rate?
I mean, after all, this is all just an automated form of telephone marketing, isn't it?
We're not a big site, but we track our web usage. (I parse the logs every now and then.) There are two main reasons.
1) We're curious as to what people are using. We've got several applications on our web site, and would like to feel that they're being used.
2) We're curious where people are visiting from. Getting the country/institution's fine for our needs. Anything beyond that's probably noise.
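The two tallies above could be sketched something like this. It is only an illustration: it assumes the web server has already resolved visitor hostnames, the `usage_report` function and the application paths are invented, and the last label of a hostname is a rough stand-in for country/institution at best.

```python
from collections import Counter

def usage_report(records):
    """Count hits per application (first path segment) and per origin suffix."""
    apps = Counter()
    origins = Counter()
    for host, path in records:
        # First path segment stands in for "which application".
        first = path.strip("/").split("/")[0] or "(root)"
        apps[first] += 1
        # Last hostname label (edu, de, ...) stands in for "where from".
        suffix = host.rsplit(".", 1)[-1].lower()
        origins["unresolved" if suffix.isdigit() else suffix] += 1
    return apps, origins

# Hypothetical (host, path) pairs already pulled out of the logs.
records = [
    ("lab.example.edu", "/blast/run"),
    ("dsl.example.de", "/blast/help"),
    ("192.0.2.44", "/viewer/"),
    ("cache.example.edu", "/"),
]
apps, origins = usage_report(records)
print(apps.most_common())
print(origins.most_common())
```

Nothing in the report is keyed by individual visitor, which matches the "anything beyond that's probably noise" attitude.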
Callous: Any collusion being done across sites would still be done at the cookie level. Understand that the majority of computers out there do not have reliable IP addresses and, more importantly, even fewer had reliable addresses five to ten years ago when this first arose as a potential marketing tactic. Tracking by IPs doesn't make sense because it's not the best way to do it. It's like you tracking me by my footprints. Sure, you could do it, but wouldn't tracking me by my phone or Social Security number be easier?
Mike: Yep, I do it too. I store and process my logs and do things like watch entry and exit pages. I even toss cookies at people (but I don't store and track those). Probably like me, you're using that data in aggregate rather than by individual and certainly not paying attention beyond the single session.
Heck, even Spammers figured out the weakness of tracking by IPs by raising farms of zombie machines.
I believe that with the appropriate statistical methods, varying IP addresses are not a challenge. Cookies are good. Cookies that rat you out between sites, those are ideal. Cookies that rat you out between sites that are tied to accounts, those are a godsend. IPs are session-related glue when someone drops off your scope.
Worth the effort? That I can't guess because I'm not in the industry and have an unbroken history of underestimating the "value" (Oh and how using that word in this context twists me inside) of consumer profiles and advertising.
It would be one thing if the big sites did indeed throw away that data. Maybe they do. Google, at least, doesn't.
I agree with you - the data mining possibilities of registered users are probably far more invasive. The implications of that seem pretty obvious to me - if your information is available to you forever, they're keeping it. But I think that a lot of people think they're being anonymous by not logging in, and those people are in for a shock.
If anyone's got the computing capacity to mine that large a dataset, it's Google. If I were them, I probably would.