By Dave Owen, website administrator for universemonitor.com
This website features images and data that come from numerous sources. I use two main methods to obtain the necessary files from those sources: embedding and copying. These are explained below in a way that I hope is suitable for both system administrators and non-technical people.
If you're reading this page because you represent an organization that is featured at this website, hopefully the information below will clarify how it works and how it affects you. Spoiler: It shouldn't affect you at all.
Option 1: Embedding
There are various ways to "embed" content (such as an image or video) from one website into another. Typically, a piece of code is added to a web page, and that "embed code" contains instructions to insert the desired content directly from the original website. When a user visits the web page with the embed code (I'll call it the "embedding site"), their browser requests and displays files from both the embedding site and the original site.
This method has the advantage of being easy to set up, but on the downside, the original site has to serve the file every time a visitor loads the embedding page. This can be costly and put strain on the original site if the embedding site gets very busy. The original site has no control over the incoming requests and would need to ask the embedding site's webmaster to stop.
For this reason and others, I prefer to use the copying method described below. However, I am happy to use embedding or hotlinking if you prefer. It's highly unlikely that our traffic will place any noticeable load on your server, but I do bear this possibility in mind when I analyze our statistics. If I feel there's any chance of our traffic getting to the point that it could impact you, I'll take action to stop it. Of course I'll stop or limit traffic at any time if you ask me to.
Option 2: Copying files to our server
I prefer this method because I've found it to be more robust, and it guarantees that there will never be a burdensome load on the source's server.
Typically I'll set up a script that requests a file (e.g. a JSON file or webcam image) at regular intervals. The request rate varies from once per minute to once per day, with the most common being once every 15 minutes. A copy of the file is made on our server, and this copy is what's served to our visitors. This way, no matter how many visitors we get, your server only needs to send the file once per interval.
System administrators: In most cases I use a simple GET request. My scripts identify themselves with headers like this:
"User-Agent: UniverseMonitor/1.0 File Requester", "From: [my-email-redacted]"
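For system administrators who'd like a concrete picture, here is a minimal sketch of what such a script might look like. The URL, file name, and contact address below are placeholders I've made up for illustration; only the User-Agent string is the real one from above.

```python
# Hypothetical sketch of the periodic-copy script described above.
# The source URL, destination path, and contact email are placeholders.
import urllib.request

USER_AGENT = "UniverseMonitor/1.0 File Requester"
CONTACT = "webmaster@example.com"  # redacted in the original; placeholder here


def build_headers():
    """Headers that identify the script to the source's server."""
    return {"User-Agent": USER_AGENT, "From": CONTACT}


def fetch_and_save(url, dest_path):
    """Issue one GET request and save a local copy to serve to visitors."""
    req = urllib.request.Request(url, headers=build_headers())
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = resp.read()
    with open(dest_path, "wb") as f:
        f.write(data)


# A scheduler such as cron would run this every 15 minutes, e.g.:
#   */15 * * * * /usr/bin/python3 fetch_copy.py
if __name__ == "__main__":
    fetch_and_save("https://example.com/webcam.jpg", "webcam.jpg")
```

The key point is that fetch_and_save runs on a fixed schedule, not once per visitor, so the source's server sees at most one request per interval regardless of our traffic.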
What does all this mean for the source?
In almost all cases, nothing. If you control the server or website that hosts the original files, you don't need to do anything. I'm effectively just another visitor to your website. In some cases I might ask if it's possible to access files in a certain way, but I'm not going to make a nuisance of myself.
I'm happy to discuss things like how often this site requests your files, the protocols it uses, etc. I value my sources and I'm willing to work hard to ensure that you're happy.
If I am using content from your organization, I thank you sincerely. I share your passion for science and I want to help make it accessible.
If you have any questions or suggestions, please let me know. If there's anything I can do in return for your support, name it.
Thank you and kind regards,
Dave