Solr is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance.
Databases and Solr have complementary strengths and weaknesses. SQL supports very simple wildcard-based text search with some simple normalization, like matching upper case to lower case. The problem is that these searches require full table scans. In Solr all searchable words are stored in an “inverted index”, which can be searched orders of magnitude faster.
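To make the difference concrete, here is a minimal sketch in Python of a scan-style search next to an inverted-index lookup. It is a toy illustration, not how Solr or Lucene is actually implemented: the sample documents, the function names, and the simple whole-word matching are all assumptions made for the example.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words, dropping surrounding punctuation."""
    words = (word.strip(".,;:!?\"'()") for word in text.lower().split())
    return [word for word in words if word]

def scan_search(documents, term):
    """Scan-style search: every document is re-read on every query."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in documents.items() if term in tokenize(text))

def build_inverted_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def index_search(index, term):
    """Indexed search: one dictionary lookup, however many documents there are."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical sample documents, for illustration only.
documents = {
    1: "Solr is a search platform.",
    2: "Databases use SQL for queries.",
    3: "Full-text search in Solr is fast.",
}
index = build_inverted_index(documents)
print(index_search(index, "Solr"))  # [1, 3]
```

The index costs time and space to build, but the work is done once at indexing time instead of on every query.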
Solr exposes industry standard HTTP REST-like APIs with both XML and JSON support, and will integrate with any system or programming language supporting these standards. For ease of use there are also client libraries available for Java, C#, PHP, Python, Ruby and most other popular programming languages.
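As a rough sketch of what "HTTP REST-like" means in practice, the snippet below builds the URL for a query against Solr's select handler and asks for a JSON response. The host, the collection name "articles", and the field "title" are assumptions for illustration; the code only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

def build_select_url(base_url, collection, query, rows=10):
    """Build a URL for a Solr select request that returns JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

# Hypothetical server and collection, for illustration only.
url = build_select_url("http://localhost:8983", "articles", "title:solr")
print(url)
```

Any language that can issue an HTTP GET and parse JSON can send this request, which is why a dedicated client library is a convenience rather than a requirement.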
One of the most effective ways to get more readers is for readers to share your content through social media. One of the most effective ways to improve your search engine rankings is for websites to link to your content. Neither just happens. But neither can be forced.
- Is there a correlation between shares and links?
- What content gets both shares and links?
- What formats get relatively more shares or links?
In their summary, Content, Shares, and Links: Insights from Analyzing 1 Million Articles, the researchers reported:
What we found is that the majority of content published on the internet is simply ignored when it comes to shares and links. The data suggests most content is simply not worthy of sharing or linking, and also that people are very poor at amplifying content. It may sound harsh but it seems most people are wasting their time either producing poor content or failing to amplify it.
Shares are much easier to get than links. Sharing can be almost effortless for your readers. Getting links requires more work – from you. And settling for “average” results may not be what you want. Just as most people have a higher than average number of legs, most articles get a lower than average number of links (slightly above zero). In a random sample of articles, 75% had no external links, and 50% had fewer than five social shares. Of course, there are notable exceptions, and the article that received 5.7 million shares blew the curve.
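The "lower than average" point is a property of skewed distributions, and a few lines of Python show it. The link counts below are invented for illustration and are not taken from the study.

```python
from statistics import mean, median

# Hypothetical link counts for ten articles, invented for illustration.
link_counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 96]

average = mean(link_counts)    # pulled upward by the single outlier
typical = median(link_counts)  # what the middle article actually gets
below_average = sum(1 for count in link_counts if count < average)

print(average, typical, below_average)
```

With these made-up numbers the average is 10 links, the median is 0, and nine of the ten articles fall below the average.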
What content is most likely to be shared? (Hint: it’s not infographics)
- Lists: yes
- Videos: yes
- Quizzes: yes
- “Why” posts: yes
But getting lots of shares doesn’t mean you will earn a lot of web links. Or vice versa. Off-beat quizzes, videos, and cat pictures might go viral as they’re shared around the world, but websites aren’t as likely to link to them. The conclusion? People seem to share and link to content for different reasons.
What content is most likely to be shared and linked to?
- Deep research
- Opinion-forming content
- Content from popular domains
- Major news sites
- Authoritative, research-backed content
By the way, longer content consistently receives more shares and links than shorter-form content. An article that’s 1,000-2,000 words long is twice as likely to get linked to as an article that’s less than 1,000 words long. Compared to a short article, a 3,000-10,000 word article is twice as likely to be shared and three times as likely to be linked to.
In recent years, web designers have been discussing a concept called “designing in the open.” That is, letting other people see the website in development while it’s still being developed. This can mean an “open source” attitude, where anybody can chime in electronically. Or it can simply mean giving your client the URL of your development site so they can keep up with what you’re doing.
According to Brad Frost, losing the “Big Reveal” is one of the benefits of designing in the open. You’re not staking a month of work on whether your client likes what you did all month, or wants you to start over.
Basecamp’s Ryan Singer explains why he likes designing in the open:
Instead of asking for 10 changes and waiting a week, you can ask for 1 change and wait 15 minutes. Evaluate the change, praise it or identify weaknesses, and suggest the next change. By asking for small changes, you take the pressure off the designer because you aren’t asking for miracles. You also take the pressure off the review process because the set of constraints and motivating concerns is smaller. The design is easier to talk about because there are a fewer factors involved.
There are disadvantages to designing in the open, of course. When seeing a work-in-progress, clients may criticize the details instead of evaluating the big picture. That’s why many designers like to show clients black and white pencil sketches instead of the current State of the Website. When it’s obvious that they’re not looking at the final version, clients are less likely to ask, “Uh, you do intend to do this in color? Just checking.”
But if designing in the open became a habit, maybe your clients would get used to looking at the forest instead of the trees. Maybe they would learn to accept your ongoing project for what it is – ongoing. Maybe they would appreciate the chance to participate in the creation of their website while it’s being created, and not at carefully orchestrated intervals.
Suppose the Vestry Board of the Cathedral of Florence had come to Michelangelo in 1501 saying, “We have decided the sculpture of David should look like this,” handed him some sketches, bid him ‘buon giorno’ and gone out to dinner together.
The problem: paper isn’t marble. Paper is flat, marble isn’t. A sketch can suggest what a sculpture might look like, from one angle. But a sketch isn’t a sculpture. A sketch can’t even become a sculpture, unless you turn it into papier-mâché. No matter how well-thought-out the sketches may be, the artist has to create the sculpture from scratch every time.
Clients don’t always understand this.
- Some may schedule two months for management to discuss the website, and one week for designers and developers to create the website. As if making a website is just an afterthought when making a website. They talk as though the website is basically done once the developers receive the mockup or the copy. But at that point the website doesn’t yet exist.
- Some may expect the developers to start work before the client decides what images, words or even what purpose the website will have. Sometimes developers receive the content so close to the deadline, they are forced to start making the website without it.
- Some honestly don’t see web developers as part of the communications process. Web designers are mere decorators, web developers mere programmers. So they haven’t been included in the discussion about target audience and goals. The problem is that, every moment, web developers must visualize their target audience and make decisions on the best way to reach the site goals.
If sculptors make sculptures, it’s even more true that web developers make websites. My point is even more true of web design than it is of print design. A print designer is not just a technician, but you can treat him or her like one: “Here are two images. Combine them in Photoshop. Buon giorno.” You wouldn’t do that to your designer, but you could. And it might work, especially if your designer is as brilliant as Michelangelo and feeling equanimous that day. But you can’t do that to a web designer, and I’m not being persnickety. Here are three reasons why it literally wouldn’t work.
- We’re not Michelangelo. We’re flattered by your confidence in us, but we don’t know how to do everything. Web design requires imagination and problem-solving skills, but its tools are still limited.
- Screen sizes are not set in stone. Next year’s phones will have more pixels or a different shape. Last year’s phones may have fewer pixels. With dozens of common screen sizes in use, it no longer means much to create a pixel-perfect imitation of a PSD on the Web. Which arrangement of pixels do you mean?
- Craftsmen must work within limitations. Even conference speakers and authors of web design books may not know how to do what you have dreamed up – it may not yet be possible on current browsers. Michelangelo had limitations too – he had to work with a block of stone that two sculptors before him had already gouged and carved on. And it took him three years.
Broken links are a nuisance for everybody involved. They make your website appear ill-kempt. Google notices that, and lowers your search engine rankings. Part of my job is fixing broken links on our websites. I always try to find replacements, but if I fail, what can I do but remove the link completely? If your visitors can’t find what they’re looking for, they may stop looking for you. Or they may email you or call you instead, defeating the purpose of having a website.
When someone visits a web page that’s no longer there, your server sends a 404 message (not found). But what message are you sending to your visitors?
“We decided to move all our pages, and we want you to figure out where they moved to.”
“This website wasn’t important enough for us to update, so why should it be important to you?”
“We don’t know what you’re asking about, and we don’t care.”
“We used to know a lot about this subject, but we’re clueless now.”
That’s not the message you mean to send, of course, since you’re not clueless. You’re still the authority in your field. (EDIT: Or else you could send a 410 message.) Maybe when you redesigned your website, you resigned yourself to some broken links. Maybe some of your pages are outdated, and you don’t want anyone to see them again. But you don’t mean to send the message that your website no longer has answers, let alone that nobody has answers, to their questions anymore. The problem is that other websites may have linked to your old pages, or your visitors have bookmarked them. Don’t believe me? According to Google Webmaster Tools, people were still looking for news stories from a ten-year-old version of the main Texas A&M University website. As of last week.
By the way, that Google Webmaster Tools link (above) is where you should start attacking the problem. Log into your Google Webmaster Tools account (you do have an account, don’t you?) and click on the Not Found tab. Google has kindly listed all your broken links in order of priority – the ones that your poor visitors are still trying to find. Click on any URL, then on the Linked From tab in the popup box, and you will see the other web pages that are linking to your missing page. To automatically check for broken links each week, we use SiteImprove.
Some other solutions – your mileage may vary:
- Preserve your pages – Sir Tim Berners-Lee said it and I still believe it: cool URLs don’t change. So when you set up your website, to avoid having to change news.html to news.cfm to news.php whenever you change your backend technology, make it the index page in a /news/ directory. If every page is an index page, you don’t need to specify the file name or extension. So inside http://www.tamu.edu/admissions/, it doesn’t matter if the page is named index.html, index.asp, index.php, or index.jsp. It will be treated all the same by your browser. By using this technique (and an .htaccess file), we were able to move the President’s website from WordPress to Cascade Server without breaking any links.
- Alias your pages – But what if your backend technology has changed and you didn’t set up your directory structure this way? With Apache configurations and .htaccess files, you can rename your .php files as .asp files. A useful tool against industrial espionage.
- Stub your pages – Even if you changed your directory structure, maybe you can keep abbreviated versions of the old pages alive, at the old location, if they are still being visited often. Stub pages should quickly answer the most common reason for visiting the page, and conclude with, “For the latest details, visit our new page.”
- Redirect your pages – You can do amazing things with server settings such as the Apache redirect directive or the mod_rewrite module. That’s what the WordPress .htaccess file uses. Your server can send the new page when visitors ask for the old page. You can do that with entire subdirectories.
- Admit your lack – If you can’t fix the link, make sure your visitors see a helpful 404 page, not the default server error message. In a previous version of our website, we created a customized 404 page that detected when visitors were looking for one of our most popular misplaced links, such as the Online Picasso Project, and directed them to the new location of that page.
- Timestamp your pages – Especially on a blog or a news site, you can do this by literally adding the publication date near the top. You may not need to throw away a page, such as a bio, that is slightly outdated but has good information. When they see the date, visitors (and Google) will be able to judge how current the information is. If you’re not going to move the good information to a new page, don’t throw away the old one.
- Update your pages – If your visitors have been going to the same page on your site for 15 years, do you really need to trash that URL and insult your visitors? Instead, keep the page alive and correct the misinformation. If you expect your visitors to get used to a new URL, it might take another 15 years.
- Inform your referrers – Email the webmasters of the sites responsible for your highest priority crawl errors, giving the broken link, the page where you found it, and the updated link. I did this for a couple of dozen of our most common referrers. Google Translate helped, in some cases, to communicate with non-English-speaking webmasters.
- Refer to your informers – Hospitality and customer service means that you don’t stop with, “It’s not my department.” The more often you’re asked about something that isn’t your job, the more diligently you should shout from the housetops, “I would be delighted to tell you whose job it is!” (Saves wear and tear on you, too.) So if you no longer handle the topic that your outdated web page discusses, point your visitors to the best replacement page, where your visitors can find the quickest solution. Yes, it may be important to you that the Associate Provost for Agency Accommodation and Achievement only deals with researchers who were funded before 2010 by private foundations that are located in what was the Laurasia supercontinent during the Mesozoic era. But your visitors don’t care. You can link them to the right department, even if it isn’t yours, faster than they can Google for it. After all, you’re the authority in your field.
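Several of the strategies above come down to one decision the server makes for every request: serve the page, redirect it, say it is gone for good, or admit it can’t be found. Here is a minimal sketch of that lookup order in Python. On a real site this logic would normally live in Apache directives or an .htaccess file, as described above; the paths and table names below are hypothetical and exist only for illustration.

```python
# Hypothetical paths, invented for illustration.
LIVE = {"/", "/news/", "/admissions/"}
REDIRECTS = {"/news.cfm": "/news/"}            # single pages that moved
PREFIX_REDIRECTS = {"/pressroom/": "/news/"}   # whole subdirectories that moved
GONE = {"/events/2004-picnic.html"}            # retired on purpose

def resolve(path):
    """Return (status_code, location) for a requested path."""
    if path in LIVE:
        return 200, path
    if path in REDIRECTS:
        return 301, REDIRECTS[path]            # 301 Moved Permanently
    for old_prefix, new_prefix in PREFIX_REDIRECTS.items():
        if path.startswith(old_prefix):
            return 301, new_prefix + path[len(old_prefix):]
    if path in GONE:
        return 410, None                       # 410 Gone
    return 404, None                           # 404 Not Found: show the helpful page

print(resolve("/pressroom/2010/story.html"))   # (301, '/news/2010/story.html')
```

The order matters: a live page always wins, a known move is answered with a redirect, and the 404 page is the last resort rather than the default.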
How do you deal with broken links on your website? Do you have any better suggestions?
Great article from our colleagues in the Tarleton web services group on how to (and how not to) format your page so that readers can find the important content — http://tarletonweb.blogspot.com/2014/06/end-bloodletting-in-digital-visibility16.html
Categories and tags are core components of WordPress, which is among the most popular platforms used to host web sites. Even professional developers have a hard time understanding the difference in how the two should be used, and the lay audience generally sees no distinction. Proper use of these elements, though, can have a profound effect on making a site more successful.
Use of categories and tags
WordPress itself says that “tags are similar to categories, but they are generally used to describe your post in more detail.” They also say that categories are meant to be hierarchical, while tags stand alone, independent of any structure. Categories, then, are meant to classify your overall article, while tags describe the content elements of the article.
Categories should be firmly established. Articles fit within categories, not the other way around. Tags, though, are more free ranging. They depend on the content of the article. WordPress recommends having between five and fifteen tags for each article to sufficiently describe the content.
Having a controlled library that comprises the core set of tags is crucial. We want to make sure that all references to a particular entity are tagged exactly the same so that we capture all of the articles pertaining to that subject. For example, we would not want to have separate tags on different pages referring to the university as “Texas A&M,” “Texas A&M University,” “TAMU,” etc. These just serve to dilute the power of the tag to describe your site’s content. That being said, a site with a wide range of topics will not be effective if it uses only this core set of tags. Tags must properly describe the article’s content, and to do that they must be based on the article itself rather than a pre-compiled list of key terms.
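One way to picture a controlled library is as a lookup table that folds every known variant into a single canonical tag, while letting article-specific tags pass through untouched. The sketch below is only an illustration in Python, not a WordPress feature; the table contents and the function name are assumptions.

```python
# Hypothetical controlled vocabulary: each known variant maps to one canonical tag.
CANONICAL_TAGS = {
    "texas a&m": "Texas A&M University",
    "texas a&m university": "Texas A&M University",
    "tamu": "Texas A&M University",
}

def normalize_tags(raw_tags):
    """Replace known variants with the canonical tag, keep free-form tags, drop duplicates."""
    normalized = []
    for tag in raw_tags:
        cleaned = " ".join(tag.split())  # collapse stray whitespace
        canonical = CANONICAL_TAGS.get(cleaned.lower(), cleaned)
        if canonical not in normalized:
            normalized.append(canonical)
    return normalized

print(normalize_tags(["TAMU", "Texas A&M", "engineering research"]))
# ['Texas A&M University', 'engineering research']
```

The core set stays consistent, and a tag that only makes sense for one article is still allowed.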
Tag clouds are the most common way to leverage your tags. These show the most common tags on your site, with the size of the text indicating how many articles contain that particular tag. This gives a visual reference to let users know what kind of content they can find elsewhere on the site. It also serves as an index to your site, letting users find other areas of interest that they might otherwise not have come to your site for.
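The sizing rule behind a tag cloud is simple enough to sketch. The Python below counts how many posts use each tag and scales the font size linearly between a minimum and a maximum. The pixel values, the sample posts, and the linear scaling are assumptions for illustration; a real tag cloud widget may scale differently.

```python
from collections import Counter

def tag_cloud_sizes(posts, min_size=12, max_size=32):
    """Return {tag: font size in px}, scaled linearly by how many posts use the tag."""
    # set(tags) makes sure a tag repeated within one post is counted once.
    counts = Counter(tag for tags in posts for tag in set(tags))
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    sizes = {}
    for tag, count in counts.items():
        fraction = (count - low) / span if span else 0.0
        sizes[tag] = round(min_size + fraction * (max_size - min_size))
    return sizes

# Hypothetical posts, each listed as its tags.
posts = [
    ["research", "engineering"],
    ["research", "admissions"],
    ["research", "engineering"],
]
sizes = tag_cloud_sizes(posts)
print(sizes)
```

Here "research" appears in all three posts and gets the largest size, "admissions" appears once and gets the smallest, and "engineering" lands halfway between.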
Another powerful, but seldom used, method of leveraging tags is using them as a data feed to populate pages on other sites. Consider [spoiler alert] an experts list site, for example, where each individual’s listing shows related articles that are published in your news site. Matching the expertise keywords with a WordPress tag lets us quickly and easily create a synergy between these two sites, making both more valuable than either on their own.
Effect on Search Engine Optimization
Tags can have an influence on search optimization as well. Modern search algorithms are good at picking out synonyms, so you should avoid using synonyms or near-synonyms as separate tag names. Search engines will see this as duplicate content and penalize your site for it.
One other SEO aspect to consider is the concept of link bleeding. In general, each page’s link value is divided among all the links on the page. The tag cloud will likely contain a lot of links, and the value of the less used tags could be pretty minor. They would therefore take away from the page’s effective optimization. Adding a nofollow attribute to the links in the tag cloud is therefore usually a good idea.
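In markup terms, "nofollow" is a value of the rel attribute on a link. As a small sketch, this Python function renders one tag cloud link with that attribute set. The "/tag/" path and the slug rule (lowercase, spaces to hyphens) are assumptions for illustration; your theme may build its tag URLs differently.

```python
from html import escape
from urllib.parse import quote

def tag_link(tag, base_path="/tag/"):
    """Render one tag cloud link that asks search engines not to follow it."""
    slug = quote(tag.lower().replace(" ", "-"))  # hypothetical slug rule
    href = f"{base_path}{slug}/"
    return f'<a href="{escape(href)}" rel="nofollow">{escape(tag)}</a>'

print(tag_link("Web Design"))
# <a href="/tag/web-design/" rel="nofollow">Web Design</a>
```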
If done correctly, the “categories” and “tags” pages on your site should be among the highest ranking pages.
We are taught from elementary school when we first start writing compositions that the first thing we should do, before we ever start writing, is to identify our audience. The same thing holds true for building web sites. But do we really do it? Some do. At least in a rudimentary way. We put together a committee, ask that question, and after a lot of argument come up with something. In today’s higher ed it is usually “prospective students and their parents,” at least when talking about the university’s main page. What do we do after that though?
We often write the chosen audience on our production documentation, check it off the list and cease thinking about it altogether. The web committee then moves on to deciding content (we are a modern selection committee; we realize that content must come before design). We send out surveys, make sure we talk to every department on campus to get buy-in, maybe we even survey other universities’ sites to see what they have that we don’t. What’s missing from this picture? Where in that process is the audience that we are supposed to be writing for?
I once attended a conference where one of the presenters summed up in one line what we should be doing…give your viewers what *they* want, not what *you* want. That completely turns our process on its head. It means we have to do research, make value judgments, and even risk alienating constituencies on campus who might not like what that means for their content.
The tricky part comes in determining how far to go with this new paradigm. Something like branding is definitely something that we want, but that doesn’t mean we should get rid of it. It doesn’t interfere with the mission of the site and, if done well, should even support it. We are still trying to attract those prospective students, and branding should reinforce the information they are trying to obtain.
One thing to watch out for is turning your site into an extended university org chart. Think instead in terms of navigating content by services…for example provide a link to “tutoring” rather than “The Office of Classroom Excellence.” The audience doesn’t care *who* provides the service, it is the service itself that they are looking for.
Also, be careful of the language that you use. We have our own extensive jargon, and fall into it too easily. Being at the university every day we are exposed to (so much so that we take for granted) a lot of terms, concepts, organizational makeups, etc. that mean little to the general public. Write for the audience using terms that, again, they place meaning on rather than what we hold dear.
We tried to take this approach in our last implementation of our university website. We removed most of the links that were previously aimed more at faculty/staff and an on-campus audience, even if they were the most popular links on the page. We took flak for doing so, but it was short-lived, people adjusted, and the site is now much more aligned with its purpose of serving the prospective student and their parents.
Read any web design book, blog, or article, and they’ll tell you that a good, user-friendly, custom 404 page is one of the most important elements that you can add to any site. And they’re right. But we as web administrators know that “good” and “user-friendly” are wide open for interpretation. Most of the time users hit the page, ignore it, and go about their business. Users seldom bother to report the broken link, and when they do they very often don’t include all (or sometimes even any!) of the information we need to track down and fix the link.
With our new website we’ve tried something different, making it as easy as possible for users to report bad links. We did this on purpose – we knew that with the new design there would be plenty of them. The university site had been poorly limping along on a very bad information architecture for years. Directory structures were illogical and poorly named, and the access file contained almost two hundred redirects for sites that had been on the server but moved years ago. We decided to clean house with this new version of the site, knowing there would be some (a lot of) short-term pain, but that in the end it would be a better environment.
We have left or recreated many of the most-used redirects and old directories through a combination of symlinks and rewrite rules, but with the amount we started with there are still many that are now causing 404 errors. Enter the new custom error page…
The most important thing we did on the page was to add a “Report this broken link” feature and make sure it was prominently highlighted. Through a few server side include tricks the mailto link will open the user’s mail client and populate the first line of the message with the requested page and the referring page, if applicable. This ensures that we have all of the information we need to find and fix the broken link.
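Our version relies on server side includes, but the idea carries over to any server-side language. The following is a minimal sketch in Python, not the code we actually run: the address, the subject line, and the function name are placeholders. It builds a mailto link whose message body is pre-filled with the requested page and, when there is one, the referring page.

```python
from urllib.parse import quote

def report_link(requested_url, referrer=None, to="webmaster@example.edu"):
    """Build a mailto: link whose body is pre-filled with the broken-link details."""
    lines = [f"Broken link: {requested_url}"]
    if referrer:
        lines.append(f"Linked from: {referrer}")
    # Percent-encode everything so URLs in the body cannot break the mailto link;
    # line breaks in a mailto body are sent as CRLF.
    subject = quote("Broken link report", safe="")
    body = quote("\r\n".join(lines), safe="")
    return f"mailto:{to}?subject={subject}&body={body}"

# Hypothetical request, for illustration only.
print(report_link("/old/page.html", "http://example.com/links"))
```

The user only has to click the link and press send, which is the whole point: the details we need arrive whether or not the user thinks to include them.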
With this feature prominent on the page, we have increased the number of reports from several per week to several per hour…and not all to the newly non-redirected site. Several have been to files that have not been on the server for years, but are still linked from somewhere on the Internet. Bad links that have existed for years but have never been reported. The difference, it seems, is in making it easy for the user to do so.