Web Server Logs and Privacy (2024)

I’m not sure when it started, but there’s been an increase in people asking about internet privacy and data collection. People are rightfully concerned about the use of data and not just on this website. This is good but highlights that I haven’t done the best job telling people what information is captured in a web server log and how it can be used. I can’t answer for all websites, but I can for this one. My answers may surprise you.

Many of you know I’m concerned about privacy, so I was startled to get a reader request years ago, before GDPR, to remove her data. The more I dug into the issue, the more I realized people don’t know what data is collected. And as we all know, fear takes on a life of its own. I hope that this article will address those concerns.

The Data Collection Process

The process begins when your web browser requests a page from our website. Most websites maintain a server log that records these transactions because it holds useful information such as traffic patterns and errors. A log transaction is generally defined as a request for a resource such as the web page text, a picture, or a file. In our case, the web server uses the Apache HTTP Server combined log format. The format may vary based on the hosting company.

As you move through this website, multiple lines are appended to this log in chronological order. You might think of this as a more complicated version of your web browser history because it deals with many people. There are multiple lines because a web page consists of many resources such as images, text, CSS style sheets, ads, etc. In other words, the web page you’re viewing may appear as one item to you, but from the web server’s view, there might have been dozens of requests to display the page. Each request becomes a line item in the web server log. As a result, these logs can be huge. Depending on the configuration, a new log file might be started daily or monthly.

Where’s My Personal Info?

While web server logs contain lots of data, I can’t say they are fun reading. The data can be useful, but you need a log analysis tool or a service like Splunk or Sumo Logic to make sense of the information. In addition, different collection methods may capture different information. Some hosting companies also provide analysis programs.

In the case of the reader who wanted her data removed, she thought these logs collected all sorts of personal info. And in a sense, they do, but not in the way many people think. There is no line item that says Jane Doe from Franklin, Tennessee came to this site from Bing, read 2 articles, and left the site by clicking an Amazon book recommendation.

The best way to illustrate why I can’t tell this is to show an example log entry.

Server Log Example

Below, you’ll see one request from the raw server log that I’ve split into pieces to make reading easier. I’ve also numbered the data elements. In the web server log, this information appears as one long line. This example is about 10 years old but still works. I no longer use these logs and don’t wish to reinstall them to update the article example. So, excuse the older references and protocols.

(1) 65.192.81.64

(2)

(3) –

The Log Data Elements

(1) IP address

The first data item is the IP address of the client making the request. A client could be your computer, firewall, proxy, smartphone, and so on. The IP address is dynamic for some people, meaning that it shows as 65.192.48.61 on May 11, but it might be different the next time you visit. Or, in the case of some firewalls, all the computers behind the firewall could be using the same IP address. Also, people who use VPNs usually show up with the VPN server’s IP address rather than their own. Here’s a VPN primer from ExpressVPN.

There is more information that can be inferred from an IP address, such as your location. There is a method for assigning blocks of numbers. For example, internet service providers (ISP) or large companies may be assigned blocks of IP addresses. If you want to see how your IP translates, go to Google’s Q&A page on IPs.
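To make the idea of assigned blocks concrete, here is a minimal Python sketch that checks whether an address falls inside a block. The table of blocks and owner names is entirely made up and uses address ranges reserved for documentation; a real lookup would rely on registry data or a geolocation database.

```python
import ipaddress

# Hypothetical allocation table: the blocks and owner names are invented,
# using address ranges reserved for documentation purposes.
ALLOCATED_BLOCKS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example ISP",
    ipaddress.ip_network("198.51.100.0/24"): "Example Corp",
}

def find_owner(ip_text):
    """Return the owner of the block containing ip_text, or None."""
    address = ipaddress.ip_address(ip_text)
    for network, owner in ALLOCATED_BLOCKS.items():
        if address in network:
            return owner
    return None

print(find_owner("192.0.2.45"))   # Example ISP
print(find_owner("203.0.113.9"))  # None
```

The lookup only says who was assigned the block, which is why the inferred location can be far from where the visitor actually sits.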

People should also know the geographic information isn’t always precise. For example, many years ago, when I was analyzing our city’s logs, many entries showed Vienna, Virginia, which is hardly a neighboring community to California. At the time, AOL’s networks were set up to show all users from that location.

It’s important to note that the web logs don’t translate the IP location. The geographic translation is done by an analysis program. A simpler option is for webmasters to use something like Google Analytics. While Google may capture the IP address, it does not provide it to webmasters.

(2) Identity Check

At first, I thought the displayed hyphen was a delimiter, but it actually means data is not available. The field is used for determining the identity of the client machine. The name was a little worrisome until I read the Apache documentation that states, “This information is highly unreliable and should almost never be used except on tightly controlled internal networks. Apache httpd will not even attempt to determine this information unless IdentityCheck is set to On.”

(3) UserID

Again, the field shows as a hyphen since no data was collected. This field might show data if the article being requested was password protected and I required authentication. I do use this field for internal use to access test areas.

(4) When did the server finish the request

This is the time the web server finished processing your request. The -0700 indicates our web server’s clock is 7 hours behind GMT.
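The offset at the end of the timestamp can be read mechanically. The sketch below parses a made-up timestamp in the Apache log style (the date and time are invented, not taken from this site’s logs) and converts it to UTC.

```python
from datetime import datetime, timedelta, timezone

# Made-up timestamp in the Apache log style, for illustration only.
raw = "11/May/2005:10:15:30 -0700"

# %b matches the English month abbreviation under the default locale;
# %z reads the numeric offset from GMT/UTC.
stamp = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

print(stamp.utcoffset().total_seconds() / 3600)  # -7.0
print(stamp.astimezone(timezone.utc))            # 2005-05-11 17:15:30+00:00
```

Converting to UTC makes entries from servers in different time zones comparable.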

(5) What can I get you?

This line indicates what you requested. In this instance, the reader requested the article on creating Outlook signatures. The HTTP/1.1 indicates what protocol was used. A protocol is a format two devices use to exchange information.

(6) Result Code

This number indicates the status code the server sent back to your browser. If everything worked, you get your request. Otherwise, you might see one of our infamous “Oops…we’re sorry” pages (aka 404 errors). In this case, the 200 indicates the server handled the request successfully and returned the page.

(7) Size

This figure indicates the size of the object returned. In this case, it was the size of the article or 7537 bytes.

(8) Who sent you?

One advantage to the combined log format is it shows who referred you to our site. Don’t worry, as the “who” is never a person. In the example above, the reader searched the US version of Google for “Outlook signature.” This information is passed along in the URL from search engines or links from other websites.

We should mention that the search engines stopped showing what the reader searched for in the logs many years ago.
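For older entries that still carry the search in the URL, the term sits in the query string of the referrer. Here is a small sketch using a made-up referrer; the `q` parameter name is an assumption based on how Google-style search URLs have commonly looked, and other sites may use different names.

```python
from urllib.parse import urlparse, parse_qs

# Made-up referrer in the style older search engines used to send.
referrer = "http://www.google.com/search?q=outlook+signature&hl=en"

parts = urlparse(referrer)
query = parse_qs(parts.query)  # decodes "+" to a space

print(parts.hostname)           # www.google.com
print(query.get("q", [""])[0])  # outlook signature
```

With modern search engines the query string is usually absent, so only the host name is typically available.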

(9-12) Browser Information

Items 9-12 are sent by your browser and show which version you’re using and your operating system. In the example above, the client was using the US version of Windows NT 5.1 with version 1.0.3 of Firefox.
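To show how the elements above fit together on one line, here is a minimal parsing sketch for the Apache combined log format. The sample line is made up for illustration (documentation IP address, invented path and referrer), borrowing only the 200 status, the 7537-byte size, and the Windows NT 5.1 / Firefox 1.0.3 details mentioned in this article. The sketch treats the browser information as a single user-agent field rather than items 9-12, and it assumes no escaped quotes inside the quoted fields.

```python
import re

# One named group per element of the combined log format.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up sample line, not a real entry from this site's logs.
sample = (
    '192.0.2.45 - - [11/May/2005:10:15:30 -0700] '
    '"GET /articles/example.html HTTP/1.1" 200 7537 '
    '"http://www.example.com/search?q=example" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.7) '
    'Gecko/20050414 Firefox/1.0.3"'
)

match = COMBINED.match(sample)
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f"{name:9} {value}")
```

Note that nothing in the parsed fields is a name or an email address; the closest thing to an identifier is the IP address.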

What Do You Do With This Data?

The next question is whether I use all this data. The short answer is that I use some of the data, but not all. While web server logs collect a lot of information, that doesn’t mean it’s correct or meaningful. I might use the log data if I see some anomaly or strange bot behavior. I’m more apt to use Google Analytics as the interface is much simpler.

Are there any pages that are broken that I need to fix?

I can tell there is a problem by looking at items 5 and 6. This is an important issue since a broken or slow web page is a terrible user experience.
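As a rough sketch of that check, the snippet below counts which requested paths returned a 404. The (path, status) pairs are hypothetical and stand in for items 5 and 6 after a log has been parsed.

```python
from collections import Counter

# Hypothetical (request path, status code) pairs, as if already parsed.
requests = [
    ("/articles/a.html", 200),
    ("/old-page.html", 404),
    ("/articles/b.html", 200),
    ("/old-page.html", 404),
    ("/images/missing.png", 404),
]

# Tally only the requests the server could not find.
not_found = Counter(path for path, status in requests if status == 404)

for path, hits in not_found.most_common():
    print(hits, path)
```

Paths that show up repeatedly are the ones worth fixing or redirecting first.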

What Are People Reading?

OK, no one should ever be shocked that a webmaster wants to know this information. After all, if you’re not reading their content, they don’t have a business. It only makes sense that web admins want to know the most read articles and the least read articles.
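A simple way to picture this is a tally of requested paths. In the sketch below the paths are invented, and the rule that articles end in `.html` is an assumption made only so that images and style sheets don’t inflate the counts.

```python
from collections import Counter

# Hypothetical request paths pulled from a parsed log.
requested = [
    "/articles/outlook-signature.html",
    "/css/site.css",
    "/images/logo.png",
    "/articles/outlook-signature.html",
    "/articles/web-logs.html",
]

# Keep only page requests; supporting resources are skipped.
article_views = Counter(p for p in requested if p.endswith(".html"))
ranked = article_views.most_common()

for path, views in ranked:
    print(views, path)
```

The same list read from the bottom shows the least read articles.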

Hey, are you new to these parts?

As with any business, you like to get new customers and keep the regulars. This is the type of information you can get after accumulating enough daily log files. Even then, the info isn’t precise because so many people have dynamic IPs or come in using a different device. One way I could avoid this problem is to force people to register, but I don’t.
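Here is a minimal sketch of how new and returning visitors might be estimated from daily logs, assuming each day’s log has already been reduced to a set of client IP addresses. The addresses are made up, and the counts inherit the imprecision described above: a returning reader with a new dynamic IP is counted as new.

```python
# Hypothetical sets of client IPs seen on each day.
daily_ips = {
    "day 1": {"192.0.2.1", "192.0.2.2"},
    "day 2": {"192.0.2.2", "198.51.100.7"},
}

seen = set()   # every IP observed on earlier days
summary = {}   # day -> (new visitors, returning visitors)

for day, ips in daily_ips.items():
    returning = ips & seen
    new = ips - seen
    summary[day] = (len(new), len(returning))
    seen |= ips

print(summary)  # {'day 1': (2, 0), 'day 2': (1, 1)}
```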

How did you find us?

As you might expect, item 8 can help us in this regard. I look at the referrer information as it indicates where someone posted information about our site or articles. This gives us an opportunity to read what was said on another website and post our comments if needed.

Just because a referrer is listed doesn’t mean I can, or want to, go back to the referring site. In one case last year, I saw a huge number of referrers from a private adult-oriented group on a major portal. As much as I was curious about why all these people were referring to one of our articles, I didn’t pursue this one. First, I would have needed to register with the site, and second, the content, including the Privacy Policy, was in Portuguese.

The biggest concern people usually have is seeing their search terms included in a log entry. I can understand this, as I never knew this happened until I looked at a web server log. The search terms are useful as they give me an idea of what type of information people need. These keywords have also helped us with language differences, where as a US-based author I might use one term, but someone from Europe might use a term or phrase I don’t know. Yes, I’m still trying to figure out what the Brits mean by a “punter”.

Update: Search engines no longer pass along the keywords.

What browsers are people using?

We use item 12 to answer this question. The reason we’re interested is that different browsers handle web code in different ways. While the differences may be subtle, there are times when I have abandoned some features because I couldn’t get them to work correctly with a specific browser.

I suppose if I had ample time and budget I would be more proactive with this information. For example, I might offer a reminder to people using older browsers to upgrade as they may be at risk.

The other reason I look at this info is there are certain bots designed to harvest email addresses or images from websites. Since I don’t have forums, I don’t have to worry about this too much. I still block these agents when appropriate.
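One simple way to spot such agents is to look for telltale text in the user-agent field. The sketch below is only an illustration: the substrings are hypothetical rather than names of real harvesters, and since a user agent is supplied by the client it can be faked.

```python
# Hypothetical substrings; real harvester names vary and change over time.
BLOCKED_AGENT_HINTS = ("emailharvester", "imagegrabber")

def is_blocked(user_agent):
    """Return True if the user-agent text contains a blocked hint."""
    agent = user_agent.lower()
    return any(hint in agent for hint in BLOCKED_AGENT_HINTS)

print(is_blocked("EmailHarvester/2.0 (+http://example.com/bot)"))  # True
print(is_blocked("Mozilla/5.0 (Windows NT 5.1) Firefox/1.0.3"))    # False
```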

You downloaded how much data?

Many people have the notion that the web is free. Well, this is true if you don’t have a website. The truth is that websites have data costs in terms of storage or bandwidth transmission. This factors into my hosting agreement.

In most cases, bandwidth isn’t an issue. I’m more than happy to offer content to people. After all, this website intends to help people. However, I draw the line when it becomes clear people are scraping huge chunks of this site for their economic gain.
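To illustrate how heavy downloading might show up, this sketch adds up the bytes returned (item 7) per client IP and flags anyone over a limit. The records and the one-megabyte threshold are made up for the example; a sensible limit depends on the site and the hosting agreement.

```python
from collections import defaultdict

# Hypothetical (client IP, bytes returned) pairs from a parsed log.
records = [
    ("192.0.2.1", 7537),
    ("198.51.100.7", 250_000),
    ("198.51.100.7", 900_000),
    ("192.0.2.1", 1_200),
]

THRESHOLD_BYTES = 1_000_000  # arbitrary limit chosen for illustration

totals = defaultdict(int)
for ip, size in records:
    totals[ip] += size

# Clients whose combined downloads exceed the threshold.
heavy = {ip: total for ip, total in totals.items() if total > THRESHOLD_BYTES}
print(heavy)  # {'198.51.100.7': 1150000}
```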

Bottom line

I hope the above information answers some questions about internet privacy and web server logs. I can answer for this site but can’t speak for other sites. The brilliance of the web is how it is interconnected, but that comes with risks. Some sites do install spyware or combine server log information with other databases, which can reveal more about you than you might be aware of. The best defense is to be vigilant about spyware and always read End User License Agreements (EULAs) and Privacy Policies.

FAQs

What do web server logs show? ›

Web Server logs provide an overview of all activity associated with the web server. For most organizations these logs are the only way to understand how and when the server is used and by whom.

What are the 3 types of server logs? ›

Availability Logs: track system performance, uptime, and availability. Resource Logs: provide information about connectivity issues and capacity limits. Threat Logs: contain information about system, file, or application traffic that matches a predefined security profile within a firewall.

Why do we need web server logs? ›

Using web server logs, you can easily know where the problem is coming from and solve it on time. Logs are automatically created by the server and consist of files containing information such as errors, requests made to the server, and other information worth looking at.

Are log files personal data? ›

IP addresses, geolocation data, and other log data could potentially be combined with other information like username or online activity to identify a particular person. Therefore, log data can be considered personal information. For this reason, it must be treated as personal information under applicable law.
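One common way to reduce how identifying a stored IP address is, offered here as a general illustration and not as something this site or any law prescribes, is to zero the last part of the address before keeping it. A minimal sketch for IPv4, with a hypothetical helper name:

```python
import ipaddress

def mask_ipv4(ip_text):
    """Zero the last octet of an IPv4 address (hypothetical helper)."""
    # strict=False lets ip_network accept an address with host bits set.
    network = ipaddress.ip_network(f"{ip_text}/24", strict=False)
    return str(network.network_address)

print(mask_ipv4("192.0.2.45"))  # 192.0.2.0
```

The masked address still supports rough traffic counts while no longer pointing at a single connection.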

What data is mined from web server log files? ›

Log files contain information about User Name, IP Address, Time Stamp, Access Request, number of Bytes Transferred, Result Status, URL that Referred and User Agent. The log files are maintained by the web servers. Analysing these log files gives a good picture of how users behave on the site.

How do I read a web server log? ›

Double-click on the log file and it will likely open in a text program by default, or you can choose the program you'd like to use to open the file by using the right-click and “Open With” option. Another option is to use a web browser and open the server log file in HTML.

How long do server logs last? ›

As a baseline, most organizations keep audit logs, IDS logs and firewall logs for at least two months. On the other hand, various laws and regulations require businesses to keep logs for durations varying between six months and seven years.

Where are server logs located? ›

By default, Event Viewer log files on current versions of Windows use the .evtx extension (.evt on Windows XP and earlier) and are located in the %SystemRoot%\System32\winevt\Logs folder. Log file name and location information is stored in the registry. You can edit this information to change the default location of the log files.

What information can be found in an access log file? ›

An access log is a log file that records all events related to client applications and user access to a resource on a computer. Examples can be web server access logs, FTP command logs, or database query logs. Managing access logs is an important task for system administrators.

What are the disadvantages of log files? ›

The disadvantages of log file analysis: Caching and proxies: since a log file can only record data that is created by direct server access, all accesses that occur via the cache memory of the browser and via proxy servers are not included in the protocol.

What information should not be placed in a log? ›

Credit card information should never be logged. The same goes for ID numbers (such as SSN in the US or Teudat Zehut # in Israel), network computer names, and network share paths.

What is privacy logging? ›

Logs end up being stored, analyzed, and archived even if they contain privacy-sensitive data that are illegal under privacy laws such as the EU's General Data Protection Regulation (GDPR)—that includes specific guidelines for log data. The most recent examples include GitHub, Facebook and Twitter.

Does a web server actually remember that you're logged in? ›

A web server has no memory of its own, so the website you are visiting sends a cookie file to your browser, which stores it on your computer’s hard disk so that the site can remember who you are and your preferences. The browser sends the cookie back to the server each time it requests a page.

What data is stored on a web server? ›

The HTTP server is able to understand HTTP and URLs. As hardware, a web server is a computer that stores web server software and other files related to a website, such as HTML documents, images and JavaScript files.

What type of data is held on a web server? ›

On the hardware side, a web server is a computer that stores web server software and a website's component files (for example, HTML documents, images, CSS stylesheets, and JavaScript files). A web server connects to the Internet and supports physical data interchange with other devices connected to the web.

How do I analyze server log files? ›

How To Do Log Analysis

Collect/export the right log data (usually filtered for search engine crawler User Agents only) for as wide a time frame as possible. ...
Parse log data to convert it into a format readable by data analysis tools (often tabular format for use in databases or spreadsheets)

How do I clear my web server logs? ›

Open Server Manager, click the Tools menu, and then click Task Scheduler. In the Actions pane of the Task Scheduler dialog box, click Create Task. On the General tab of the Create Task dialog box, enter a name for the task, such as "Delete Log Files".

What is Web log data? ›

It is the data generated automatically by the web server as a result of interaction with the website by the visitors.

Can server logs be deleted? ›

You can manually delete task, server, and File Transfer Service log files older than the specified number of days.

How long do Internet providers keep logs? ›

Internet Service Providers (ISPs) can see everything you do online. This includes your browsing history, the videos you watch, and the websites you visit, even in private browsing mode. In most countries, ISPs can track and store this information for up to two years.

How to check server logs in command? ›

Open up a terminal window and issue the command cd /var/log. Now issue the command ls and you will see the logs housed within this directory.

How do I read a log file? ›

LOG files usually contain plain text. You can read a LOG file with any text editor, like Windows Notepad. You might be able to open one in your web browser, too.

How do I view server logs in Windows? ›

Click Start > Control Panel > System and Security > Administrative Tools. Double-click Event Viewer. Select the type of logs that you wish to review (ex: Windows Logs)

What are some reasons you would want to use a log file? ›

Log files are used to record events or activities that occur within a computer system, application, or program. They serve as a detailed record of what has happened within a system, which can be used for troubleshooting problems or investigating security incidents.

Can I safely delete log files? ›

You can remove a log file if all of the following are true: the log file is not involved in an active transaction. a checkpoint has been performed after the log file was created. the log file is not the only log file in the environment.

What happens if I delete log files? ›

If you delete a log file while it is being written, the result depends on the writing method. Either the file is recreated with new data, or data continues to be written to disk space that is no longer accessible through the file name. In a third case, where the file is opened and closed for each new block of data, you will get "file not found" or other types of errors.

Is it always okay to delete a log file? ›

Yes, log files can be safely deleted. Next time a log file needs to be appended to and is missing, it will be created (don't delete the actual Logs folder itself though). Log files are always presumed transient.

What are the 3 basic log rules? ›

Logarithm Rules and Properties

Division rule. Power rule/Exponential Rule. Change of base rule.

What are common log mistakes? ›

By far the most common mistake made by students with log properties, is that they remember there is a link between addition and multiplication, and between division and subtraction, but they don't remember which direction the property goes.

What are the safety concerns with logging? ›

Logging involves exposures to a wide variety of hazards, including: work in close proximity to heavy equipment and trucks; tree falls, log movements and falling objects; ergonomic issues; hand-arm and whole-body vibration; noise, and; environmental factors.

What are the three types of privacy? ›

Digital privacy can be defined under three sub-related categories: information privacy, communication privacy, and individual privacy.

How do you keep sensitive data out of logs? ›

Best Practices for Keeping Sensitive Data Out of Your Logs

Encrypt Data in Transit. ...
Isolate Sensitive Data. ...
Tokenize Sensitive Data. ...
Keeping Sensitive Data Out of URLs. ...
Mask or Redact Sensitive Data. ...
Code Reviews. ...
Structured Logging. ...
Automated Alerts.

What security events should be logged? ›

Which events should be logged?

authentication successes and failures;
access control successes and failures;
session activity, such as files and applications used, particularly system utilities;
changes in user privileges;
processes starting or stopping;
changes to configuration settings;
software installed or deleted;

What is the most common log? ›

In mathematics, the common logarithm is the logarithm with base 10. It is also known as the decadic logarithm and as the decimal logarithm, named after its base, or Briggsian logarithm, after Henry Briggs, an English mathematician who pioneered its use, as well as standard logarithm.

What are the four main properties of logs? ›

The Four Basic Properties of Logs

log_b(xy) = log_bx + log_by.
log_b(x/y) = log_bx - log_by.
log_b(xⁿ) = n log_bx.
log_bx = log_ax / log_ab.

What are the two main types of logging? ›

Logging is generally categorized into two categories: selective and clear-cutting. Selective logging is selective because loggers choose only wood that is highly valued, such as mahogany. Clear-cutting is not selective.

What are examples of logs? ›

For example, 2³ = 8; therefore, 3 is the logarithm of 8 to base 2, or 3 = log₂ 8. In the same fashion, since 10² = 100, then 2 = log₁₀ 100. Logarithms of the latter sort (that is, logarithms with base 10) are called common, or Briggsian, logarithms and are written simply log n.

What are firewall logs? ›

Firewall Rules Logging lets you audit, verify, and analyze the effects of your firewall rules. For example, you can determine if a firewall rule designed to deny traffic is functioning as intended. Firewall Rules Logging is also useful if you need to determine how many connections are affected by a given firewall rule.

Where are event logs stored? ›

In Windows, the event logs are stored in the C:\WINDOWS\system32\config\ folder. They are created for each system access, operating system blip, security modification, hardware malfunction and driver issue.

What are common Windows log files? ›

Windows Event Logs Types for Security

Security Log: These logs keep track of activities that may compromise security, such as failed login sessions or removing important files. ...
Application Log: ...
File Replication Service Log: ...
System Log: ...
DNS Server Log: ...
Directory Service Log:

What is the difference between Windows event log and syslog? ›

When thinking about syslog vs. event log, it helps to remember an event log is a subset of what might be tracked in syslog. Syslog servers capture information from multiple logs and store it in a central location.