Getting your web site recognized by search engines



George_Race
09-09-2012, 08:47 AM
I am starting this thread because I noticed that many web sites, especially those built by end users, do not have much luck being recognized by the search engines. There is a whole industry behind "being recognized" on the web, but there are some simple things, often overlooked by "home" designers of web sites, that will be of great help.

Here are the basics of how your site gets recognized on the web. You can spend a lot of money with Google and other search engines and "buy" a higher ranking in the search engine world. Or, you can provide information in your "header", and search engines will find and index your web site by the information you provide.

Next I will review what I am doing on my web site to drive "search engine" traffic to my site.

Here is what my "Header" information looks like on all of my web pages:

<html>

<head>
<title>My Projects Page</title>
<meta name="generator" content="Namo WebEditor">
<META NAME="TITLE" CONTENT=" Homebuilt Aircraft Parts from MyKitAirplane.">
<META NAME="AUTHOR" CONTENT="http://www.mykitairplane.com">
<META NAME="SUBJECT" CONTENT=" Aircraft Parts ">
<META NAME="DESCRIPTION" CONTENT="Race Consulting supplies information and components for Zenith homebuilt aircraft ">
<META NAME="ABSTRACT" CONTENT="Homebuilt Aircraft Parts from Race Consulting. - Parts and supplies for the Zenith Airplane Builder">
<META NAME="KEYWORDS" CONTENT="Inspection Openings,baggage door kit,fuel tank trim plate,fuel filler trim,fuel drain trim,pilot supplies,aircraft parts,home built aircraft,kit planes,Zenith Parts,Instrument Bezels,Inspection Opening Kits,Tie Down Kits,Custom Panels,2 and 3 Axis Controller,Experimental Aircraft,MyKitAirplane Builders Program,Airplane Building DVD,Zenith 701 POH,Zenith CH701,Aircraft Data Plate,instrument panels">
<META NAME="COPYRIGHT" CONTENT="http://www.mrrace.com">
<META NAME="LANGUAGE" CONTENT="English">
<META NAME="DISTRIBUTION" CONTENT="global worldwide">
<META NAME="REVISIT-AFTER" CONTENT="7 DAYS">
<META NAME="GOOGLEBOT" CONTENT="index,follow">
<META NAME="ROBOTS" CONTENT="index,follow">
<meta name="namo-theme" content="Theme\RaceConsulting Design\GoldWorld">
</head>

Understanding the above "stuff" and placing it in the appropriate place on your site will go a long way toward "being recognized and found" by search engines. I suggest you copy the above information, paste it into Notepad, and print it out. This will make it easier to follow as I try to explain what each entry does.

To start with, how do you find the stuff above in any web site? Here is something to try right here on this site, as you read this article. First, right-click anywhere on this page with your mouse. From the list that opens up, click on "View Page Source." That will open a window that shows the "META" information at the top of the page. META information is background information used by web search engines and robots to "index" your pages on the World Wide Web. This site does have "meta" entries, but only a few.
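If you would rather pull the META entries out of a page with a script than read through View Page Source, Python's standard-library html.parser can do it. This is a minimal sketch; the names MetaCollector and extract_meta are hypothetical, and names are lower-cased so that NAME="ROBOTS" and name="robots" are treated the same.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names (not values),
        # so <META NAME=...> arrives here as tag "meta" with key "name".
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name is not None:
            self.meta[name.lower()] = (attrs.get("content") or "").strip()


def extract_meta(html_text):
    """Return a dict mapping lower-cased meta names to their content."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    sample = """<head>
    <title>My Projects Page</title>
    <META NAME="SUBJECT" CONTENT=" Aircraft Parts ">
    <META NAME="ROBOTS" CONTENT="index,follow">
    </head>"""
    for name, content in extract_meta(sample).items():
        print(f"{name}: {content}")
```

To check a live page, you would download its HTML first (for example with urllib.request) and pass the text to extract_meta.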

Now, looking at my list above, here are some key things to ponder. Each entry beginning with META NAME= conveys information used for web indexing. You can read down the list; each entry is important. Most important to your site being found is the content of your "SUBJECT," "DESCRIPTION," "ABSTRACT" and "KEYWORDS" areas. Of all of those, the KEYWORDS area must contain as much information unique to your situation as you can provide. Each entry must be separated by a "comma" and there is no limit to the size of the list.

The next important group provides information to the "robots" that constantly search the web for information to include in their search lists. If you want to know more about how robots work, do a search on "Search Engines and Robots". Key to this group are the following entries: "DISTRIBUTION," "REVISIT-AFTER," "GOOGLEBOT," and "ROBOTS". Those are the entries that tell the search engine robots what to do and how often to look for information on your site for their search index.

If you look at the page headers on the sites you visit, you may find a lack of this kind of information. So how can those sites be so easily recognized by search engines like Google? The answer is simple: all it takes is money. You can literally buy your way into a top ranking in any particular search engine through its paid services. But if you don't want to spend thousands getting your web site higher in the search engine list, you can use the above information to get started, and it is free and under your control.

Hope this helps you to understand "Site Search Ranking" just a little better.

George

Peter.
09-09-2012, 02:28 PM
Yep, I was surprised when I recently found out that you can 'bid' on search words so that your site comes out top in the rankings. The company I work for told me that one particular search term was costing them 8 a hit.

dp
09-09-2012, 03:14 PM
Search Engine Optimization is a complex and dynamic endeavor. Both Google and Bing have pages that can help you avoid pitfalls in your SEO activities. Since accurate and timely information is also important, it is a good idea to take the time to create a sitemap of your pages. http://en.wikipedia.org/wiki/Sitemaps
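A sitemap under the Sitemaps protocol is a small XML file listing your page URLs. Here is a minimal sketch using Python's standard xml.etree.ElementTree; it emits only the required loc element plus the optional lastmod, and the example.com URLs are placeholders, not real pages.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, lastmod) pairs.

    lastmod is a 'YYYY-MM-DD' string, or None to leave it out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and < for us
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2012-09-09"),
        ("http://www.example.com/parts.html", None),  # hypothetical page
    ]))
```

The resulting file is usually saved as sitemap.xml at the root of the site and then submitted through the search engines' webmaster tools.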

Don't ignore Bing in your SEO efforts - with the bad press Google has earned, Bing is becoming a fast growing first choice for search. Full disclosure: Google is 100% blocked at my firewall.

It is also a very good idea to make sure your HTML is clean by having it scanned for errors. That can be done here: http://validator.w3.org/

Get information on common SEO errors. Make sure it is recent information as this truly is a dynamic process. http://www.opportunitiesplanet.com/seo-2/common-seo-mistakes-to-avoid/

Get multiple opinions. On many of these tips, someone who was right once should not be assumed to still be right.

Since you want your pages to be read, they have to be readable, and that means they need to render properly on a variety of browsers and operating systems. That now more than ever includes mobile devices. If you are using a canned product such as WordPress, Drupal, or another content management system (CMS), there may be little you can do to control this other than finding acceptable themes to install, but if you hand-craft your pages, layout will be your job. There are some good and bad ideas about that, too. http://webdesignerwall.com/trends/960-grid-system-is-getting-old

Screen size matters, and it is changing. Stay on top of this. http://www.w3schools.com/browsers/browsers_display.asp

Mobile devices are the fastest growing segment in browsers now. Don't be left behind. http://www.w3.org/Mobile/

Evan
09-09-2012, 04:54 PM
Google's search algorithms are very complex and go far beyond simple meta content. Google ignores the meta content entirely and instead looks at the actual web page content. Meta content is too easily abused and unreliable. The largest factor that Google uses is the number of other reputable web sites that link to yours. That defeats "link farms", and it also matters how closely the linking sites relate to your content. Also, the higher the rating of the linking web site, the more important a link from that site is to your site.
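The idea that a link counts for more when it comes from a highly rated site is the core of the original PageRank algorithm. The toy power iteration below illustrates only that idea; Google's actual ranking combines many more signals, and the page names in the example are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.0.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outgoing links,
                # so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link graph: three sites link to "beardog"; nobody links to "forum".
    web = {
        "directory": ["beardog"],
        "kennel_club": ["beardog"],
        "forum": ["beardog", "kennel_club"],
        "beardog": ["directory"],
    }
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page:12s} {score:.3f}")
```

Note that "forum" ends up with only the baseline share because nothing links to it, however many links it hands out.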

Google also dislikes hidden content such as white text on a white background. Not only will it be ignored, it will also result in a lower rating.

If you want to see a page of mine that has been in the top ten now for about 15 years, look up "karelian bear dogs". The web page is rather pathetic, as I designed it 15 years ago and the owners haven't paid me to update it since it works so well for them. When it first went up it was about the only site online about bear dogs, so it has numerous links from all over the world. That is what keeps it in the top ten. It also has a hard-to-spot tag cloud at the bottom of the page, but it isn't ignored because it is on top of a graphic.

See http://crazywolf.com/beardog/ It currently comes up as number four in the above search, which is a number that many would kill for.

Google looks at content, and it rates content according to header tags, among many other things. It looks for the primary search terms to be at or near the top of the page, and the larger the text size, the higher the rating it gives to that search term. It also looks for repetitions of the search terms within the text body, but if they are repeated too many times it will down-rank them. There are many other factors, but what I have mentioned are some of the most important for a simple web page.
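To make those on-page factors concrete, the sketch below measures a few of them for a single-word term: how often it appears, its density, how early it first shows up, and whether it appears in the title or header tags. It is a toy built on Python's html.parser; it does not reproduce Google's scoring, and how such signals are weighted is not specified here.

```python
import re
from html.parser import HTMLParser

HEADING_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}
WORD = re.compile(r"[a-z0-9']+")


class TextCollector(HTMLParser):
    """Collects page words in order and remembers which heading tags held them."""

    def __init__(self):
        super().__init__()
        self.words = []            # every visible word, in document order
        self.heading_words = {}    # tag name -> set of words seen inside that tag
        self._open_headings = []   # heading tags we are currently inside
        self._skip_depth = 0       # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in HEADING_TAGS:
            self._open_headings.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._open_headings and self._open_headings[-1] == tag:
            self._open_headings.pop()

    def handle_data(self, data):
        if self._skip_depth:
            return  # script/style text is not visible page content
        found = WORD.findall(data.lower())
        self.words.extend(found)
        for tag in self._open_headings:
            self.heading_words.setdefault(tag, set()).update(found)


def keyword_signals(html_text, term):
    """Report simple on-page signals for a single-word search term (toy example)."""
    term = term.lower()
    collector = TextCollector()
    collector.feed(html_text)
    collector.close()
    words = collector.words
    count = words.count(term)
    return {
        "count": count,
        "density": count / len(words) if words else 0.0,
        "first_position": words.index(term) if count else None,  # 0 = first word
        "in_headings": sorted(tag for tag, seen in collector.heading_words.items() if term in seen),
    }
```

A very high density is the "repeated too many times" case Evan mentions, so the number to watch is not simply "more is better".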

Incidentally, I create my web sites with notepad.

John Stevenson
09-09-2012, 05:19 PM
Quoting Evan:
See http://crazywolf.com/beardog/ It currently comes up as number four in the above search, which is a number that many would kill for.



Google the phrase Home Workshop or Homeworkshop.

Not bad, been there for 4 years.

dp
09-09-2012, 05:57 PM
Quoting John Stevenson:
Google the phrase Home Workshop or Homeworkshop.

Not bad, been there for 4 years.

It took me to http://www.homeworkshop.org.uk/ - a site with a very annoying cookie policy that leaves a nag badge on the screen. I took the hint and left.

Tony Ennis
09-09-2012, 06:15 PM
Quoting Evan:
Google ignores the meta content entirely and instead looks at the actual web page content. Meta content is too easily abused and unreliable.

Evan is 100% correct in this. I'm a web guy; I've been through this.

Paul Alciatore
09-09-2012, 07:00 PM
Evan,

I was particularly interested in this thread and your comments because I am going to update my own site soon.
Two things I noticed. First, I did use Google to search for "karelian bear dogs" as you suggested. It did not come up as number four in that search. It wasn't even on the first page of results. It was the sixth entry down on the tenth page, making it #96 in the results. That is still within the top 100, which I would notice, since I usually stop or refine my search after ten pages. I have to wonder if there are other factors in Google's search. Perhaps it also factors in the habits of the person (computer) doing the search. You have probably visited that site before, so it is higher on your results list. Just a thought.

The other thing is, after visiting the site, I have to say that any apology about it is really not called for. It is a very nicely done site and probably a lot nicer than 99% of what is on the web. It is certainly nicer than mine or this page. I may hire you for my site.

goose
09-09-2012, 07:01 PM
Quoting Evan:
If you want to see a page of mine that has been in the top ten now for about 15 years, look up "karelian bear dogs". The web page is rather pathetic, as I designed it 15 years ago and the owners haven't paid me to update it since it works so well for them. When it first went up it was about the only site online about bear dogs, so it has numerous links from all over the world. That is what keeps it in the top ten. It also has a hard-to-spot tag cloud at the bottom of the page, but it isn't ignored because it is on top of a graphic.



Seriously, have you checked that recently? That site isn't coming up until page 10 on Google. I tried Bing and Yahoo with equal results.

BTW, I love the keywords hidden behind the background graphic.

dp
09-09-2012, 07:27 PM
Quoting goose:
Seriously, have you checked that recently? That site isn't coming up until page 10 on Google. I tried Bing and Yahoo with equal results.

BTW, I love the keywords hidden behind the background graphic.

Did you put the quotes around the string? I had the same experience until I did that, then it came in at #4. I don't count the sticky paid ads at the top.

John Stevenson
09-09-2012, 07:31 PM
Quoting dp:
It took me to http://www.homeworkshop.org.uk/ - a site with a very annoying cookie policy that leaves a nag badge on the screen. I took the hint and left.

Dennis, did you accept the cookies? I'd like to know, as we may not see what others see.
The cookie policy is now part of UK law, which we have to comply with.

All I see is a small green tick in the bottom left corner.

goose
09-09-2012, 08:42 PM
Quoting dp:
Did you put the quotes around the string? I had the same experience until I did that, then it came in at #4. I don't count the sticky paid ads at the top.

With quotes: page 5 on Google, page 2 on Yahoo and Bing.

dp
09-09-2012, 09:41 PM
Quoting John Stevenson:
Dennis, did you accept the cookies? I'd like to know, as we may not see what others see.
The cookie policy is now part of UK law, which we have to comply with.

Of course not. The site is not entitled to store their data on my systems. The Brits don't seem to understand who owns space on my system.


Quoting John Stevenson:
All I see is a small green tick in the bottom left corner.

Much bigger. And it floats, so it stays right there as I scroll.

http://metalworkingathome.com/images/cookiemonster.jpg

Evan
09-09-2012, 09:46 PM
Where it comes up depends first on what country you are in. If you are in Canada it will come up higher than anywhere else, since it is a Canadian site. Even closeness to the site location can play a part, although mine will show up as being in the Toronto area since that is where the server is located. It also depends on your search history, although that hasn't had an influence on my results, since this is the first time I have looked it up since I built this computer.

You can try searches with different Google locations just by using that country's Google domain, such as google.ca for Canada. I do that all the time when I check the news to see what is considered important in different parts of the world.

dp
09-10-2012, 12:38 AM
Google has started doing something stupid recently, at least on my servers. I have a tool running called DenyHosts. It is a Python tool that scans the authentication log and looks for the signature of brute-force login attempts, where a black hat site attempts to log in using a long list of login names and passwords. The services being attacked are ssh, smtp, ftp, imap, pop, and telnet. I used to get thousands of these attacks each week, and each would involve hundreds of login attempts. It is a big waste of bandwidth.

Using an automated script, the black hat connects using one or more of the protocols and starts running down the list of user names and passwords. So the script I run looks for a certain repeating pattern of failure in the authentication log, and when a match is found it grabs the offending IP address and drops it into the /etc/hosts.deny file. That file is consulted by the xinetd super server at each connection attempt, and if there is a match between a connecting system and an IP entry on that list, xinetd will drop the connection with prejudice. No more wasted bandwidth.

So what Google in their infinite wisdom is doing is connecting to my ftp server as user anonymous. My server doesn't allow anonymous ftp, and if this happens three times in an undisclosed period of time the connecting system is blocked. For freaking ever. So one by one, Google servers are cutting their own throats.
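The DenyHosts-style approach described above (count repeated login failures per address, then add the address to /etc/hosts.deny) can be sketched in a few lines of Python. This is not DenyHosts' own code. Assumptions: the log lines follow OpenSSH's "Failed password" format (FTP daemons log failures differently), the threshold of three mirrors the post, no time window is applied, and the sketch only produces hosts.deny entries instead of writing the file.

```python
import re
from collections import Counter

# Assumed log format (OpenSSH), for example:
# "sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def offending_hosts(log_lines, threshold=3, already_denied=()):
    """Return IPs with at least `threshold` failed logins that are not yet denied."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    denied = set(already_denied)
    return sorted(ip for ip, n in failures.items() if n >= threshold and ip not in denied)


def hosts_deny_entries(ips, service="ALL"):
    """Format entries in TCP-wrappers hosts.deny syntax ('daemon_list : client_list')."""
    return [f"{service}: {ip}" for ip in ips]


if __name__ == "__main__":
    with open("/var/log/auth.log", errors="replace") as log:  # path varies by system
        for entry in hosts_deny_entries(offending_hosts(log)):
            print(entry)
```

Printing rather than appending keeps the sketch harmless; a real tool also has to remember what it has already blocked and decide when, if ever, to expire entries.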

Evan
09-10-2012, 02:46 AM
My passwords do not appear in the RockYou password database, which contains 14 million lines of passwords culled from data break-ins all over the net, including Sony and other similar hacks. I always check my passwords against that database as well as the Cain database and a couple of others. There is one program that can search a database that size in a few seconds: Notepad++. Most others will choke to death on it.
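The same wordlist check can also be scripted. This sketch streams the file line by line, so a list with millions of lines never has to fit in memory at once. It compares raw bytes because leaked lists such as RockYou are not consistently encoded. The wordlist path is a placeholder, and the check only catches exact matches, not simple variations.

```python
def password_in_wordlist(password, wordlist):
    """True if `password` appears as an exact line in `wordlist`.

    wordlist is any iterable of byte lines, e.g. a file opened in "rb" mode.
    """
    target = password.encode("utf-8")
    return any(line.rstrip(b"\r\n") == target for line in wordlist)


def check_against_file(password, path):
    """Stream a wordlist file from disk without loading it all into memory."""
    with open(path, "rb") as wordlist:
        return password_in_wordlist(password, wordlist)


if __name__ == "__main__":
    import getpass

    candidate = getpass.getpass("Password to check: ")
    print(check_against_file(candidate, "rockyou.txt"))  # placeholder path
```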

I don't allow anonymous FTP either. You get three tries from the same IP and then it locks you out for 10 minutes.
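A "three tries, then locked out for 10 minutes" rule can be sketched as a small in-memory throttle. Assumptions: the three failures must fall within a 10-minute window (the post does not say how long failures are remembered), and the clock is injectable so the behaviour can be tested. FTP servers usually provide this through their own configuration; this is not any particular server's implementation.

```python
import time


class LoginThrottle:
    """Lock an IP out for `lockout` seconds after `max_failures` failed logins.

    Failures older than `window` seconds are forgotten.
    """

    def __init__(self, max_failures=3, window=600.0, lockout=600.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.lockout = lockout
        self.clock = clock
        self._failures = {}      # ip -> timestamps of recent failures
        self._locked_until = {}  # ip -> time at which the lockout expires

    def is_locked(self, ip):
        until = self._locked_until.get(ip)
        if until is None:
            return False
        if self.clock() >= until:
            del self._locked_until[ip]  # lockout has expired
            return False
        return True

    def record_failure(self, ip):
        now = self.clock()
        recent = [t for t in self._failures.get(ip, []) if now - t < self.window]
        recent.append(now)
        if len(recent) >= self.max_failures:
            self._locked_until[ip] = now + self.lockout
            recent = []  # start counting afresh once the lockout is set
        self._failures[ip] = recent

    def record_success(self, ip):
        self._failures.pop(ip, None)
```

A server would call is_locked() before accepting a login attempt and record_failure() or record_success() afterwards.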

Have you tried putting robots.txt in the greeting?


Google-specific: Google also accepts and follows robots.txt files for FTP sites. FTP-based robots.txt files are accessed via the FTP protocol, using an anonymous login.

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

I don't know if they will recognize it there but they might.
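For reference, this is how a crawler evaluates robots.txt rules once it has the file. Python's standard urllib.robotparser is used here as a stand-in for whatever Google actually runs, and the rules and URLs are made-up examples. As dp points out further down, on an FTP server that refuses anonymous logins the crawler can't fetch the file in the first place.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example rules: keep Googlebot out entirely, keep everyone else out of /private/.
    rules = "User-agent: Googlebot\nDisallow: /\n\nUser-agent: *\nDisallow: /private/\n"
    print(allowed(rules, "Googlebot", "ftp://ftp.example.com/pub/file.zip"))
    print(allowed(rules, "SomeOtherBot", "http://www.example.com/index.html"))
```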

MrFluffy
09-10-2012, 04:26 AM
Why are you guys still using FTP? Come back 1990, all is forgiven :)
In case you don't know, telnet, ftp, rsync, rsh, etc. are all deprecated, as they send the username and password in cleartext as part of establishing the session, so the credentials are trivial to sniff out of the traffic if you can get access to any of the devices along the hops. And Cisco and other routers all have facilities for packet dumping with basic pattern matching to automate it (and how many of them are left by the ISP with admin/admin credentials, or managed in-band... lots, sadly, is the answer).

The sftp subsystem runs on a different port (22, as part of the ssh daemon suite) and can be set up to allow only public key access, so there is almost no chance of it being brute-forced. If you're really paranoid, run sshd on a high-numbered port, as most scanners tend to cover ports 1-1024 plus some known ports listed in /etc/services. Pick an unused port in the 50,000s and it'll be too much pain and bandwidth for a third party to even find it listening.

As for the SEO stuff, as said above, if you start keyword stuffing and the like, it will drop you down the rankings. Most SEO services are scammers; at absolute best their services will work for a few weeks before Google tweaks its algorithms yet again in the ongoing link-spam war, and SEO is about spamming, make no mistake. I just make honest pages now, keeping the information in plain text so the robots can parse it easily, with descriptive alt text on images, etc.

John's site has to have the cookie notice to comply with the EU law forced on everyone. Although I have some stuff that's hosted in Greece at the moment and haven't bothered yet, and the enforcement peeps haven't had a word about how naughty I'm being ;) In fact a lot of public bodies' websites aren't compliant to that degree yet either...

Finally, lol at using Bing instead of Google. You pretty much are using Google results in Bing :D
Are they still copying their results directly from Google, or are they hiding it better since they got caught doing that wholesale recently? :)

John Stevenson
09-10-2012, 04:35 AM
Quoting MrFluffy:
John's site has to have the cookie notice to comply with the EU law forced on everyone. Although I have some stuff that's hosted in Greece at the moment and haven't bothered yet, and the enforcement peeps haven't had a word about how naughty I'm being ;) In fact a lot of public bodies' websites aren't compliant to that degree yet either...



Law or no law, it's still No. 1 :)

We don't make the rules and it's a completely free site; Flylo would give his hind teeth for one in his area :)
We have had about 6 people contact us to say they don't like the cookie idea, but tough: they are not paying for it, and those whingers are the sort that don't buy anyway.

George_Race
09-10-2012, 06:44 AM
I am curious how I rank on Google in other parts of the world. Here is how I rank here in the U.S.A.:

"Race Consulting" No. 1
"MyKitAirplane" No. 1; the first six listings are my products
"MrRace" No. 1

George

George_Race
09-10-2012, 07:21 AM
If you are curious about the ranking of your or any other web site, you can find out here.
http://www.urlpulse.com
Lots of other good information as well.

Where it says "WWW" just add your web site name to find out.

If you are interested, here are the top 4 in the world. Look up their information using urlpulse.

1. Google 2. Facebook 3. YouTube 4. Yahoo

George

dp
09-10-2012, 09:50 AM
Quoting MrFluffy:
Why are you guys still using FTP? Come back 1990, all is forgiven :)

You don't need to be running a server to be probed on that port. Rsync connects via ssh, same as sftp, and is hardly deprecated. But ssh, scp, sftp, and rsync all use port 22 and it gets hammered all the time. We provide whatever services our customers wish except the unsecured Berkeley 'r' services.

dp
09-10-2012, 09:54 AM
Quoting Evan:
My passwords do not appear in the RockYou password database, which contains 14 million lines of passwords culled from data break-ins all over the net, including Sony and other similar hacks. I always check my passwords against that database as well as the Cain database and a couple of others. There is one program that can search a database that size in a few seconds: Notepad++. Most others will choke to death on it.

I don't allow anonymous FTP either. You get three tries from the same IP and then it locks you out for 10 minutes.

Have you tried putting robots.txt in the greeting?



https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

I don't know if they will recognize it there but they might.

There is no way to fetch a robots.txt file unless you are logged in, which can't happen here.

Evan
09-10-2012, 11:50 AM
You can put the "no robots" text in the FTP greeting. Whether they pay attention is another matter.


Race Consulting comes up #5 here. One of the reasons is that I read in a bunch of languages, so it shows me results in languages besides English.

http://ixian.ca/pics9/race.gif

aboard_epsilon
09-10-2012, 12:16 PM
Those who pay Google or put Google AdChoices ads on their site get higher priority.

I got my sites up the rankings through reciprocal links and by adding them to http://www.dmoz.org/add.html

As far as I'm concerned, that directory does it all.
Oh, and a YouTube video gets you up there as well.

all the best...markj

MrFluffy
09-10-2012, 03:48 PM
Quoting dp:
You don't need to be running a server to be probed on that port. Rsync connects via ssh, same as sftp, and is hardly deprecated. But ssh, scp, sftp, and rsync all use port 22 and it gets hammered all the time. We provide whatever services our customers wish except the unsecured Berkeley 'r' services.

Rsync with SSH as the transport isn't deprecated; it's very useful and still fit for purpose, and we use the -e ssh option with keys extensively. Vanilla old port 873 rsync is, and that's what I meant, and nobody had better be doing it over t'internet with any expectation of privacy today.

And your stack DOES need to be listening on that port to respond; otherwise the host would just respond closed, or just drop the packet on the floor (a much better default, as it slows down port scans and attackers nicely). You can't stack-smash or brute-force a port that hasn't got a daemon listening behind it. Or drop it on the load balancer or upstream firewall before it even gets there if the server isn't configured to listen on it.

MrFluffy
09-10-2012, 03:52 PM
Quoting George_Race:
If you are curious about the ranking of your or any other web site, you can find out here.
http://www.urlpulse.com
Lots of other good information as well.

Where it says "WWW" just add your web site name to find out.

If you are interested, here are the top 4 in the world. Look up their information using urlpulse.

1. Google 2. Facebook 3. YouTube 4. Yahoo

George

You can also go directly to a target market's regional variant of Google, for instance www.google.co.uk, www.google.fr, etc. You will then get served that regional page instead of being redirected to the .com, or to your own country's variant if you're outside the USA. Also useful if you want to do some searches while you're travelling in Ulaanbaatar and don't speak Mongolian or something :)

SteveF
09-10-2012, 04:43 PM
Quoting Evan:
Google also dislikes hidden content such as white text on a white background. Not only will it be ignored, it will also result in a lower rating.


What would be the reasons why someone would put hidden content on their web site?

Steve

Evan
09-10-2012, 06:23 PM
Simple: to attract traffic with terms that aren't relevant to the page, such as a bunch of porn-related hidden content. It boosts hit numbers, which is handy if you are trying to sell advertising.

dp
09-10-2012, 07:56 PM
Quoting MrFluffy:
And your stack DOES need to be listening on that port to respond; otherwise the host would just respond closed, or just drop the packet on the floor (a much better default, as it slows down port scans and attackers nicely). You can't stack-smash or brute-force a port that hasn't got a daemon listening behind it. Or drop it on the load balancer or upstream firewall before it even gets there if the server isn't configured to listen on it.

So I've been at this now since the mid-1970s and have some experience. I've run many of Seattle's largest data centers. I think I said you didn't need a listener for the black hat to try. But because my customers want it, I do have listeners, and I use lots of tools to keep the asshats out. What I really like doing is catching them at it and sharing their IP with other admins. If they're trying ftp, you can be sure they're trying everything. For that, my listeners have active blacklisting, which works a treat. I also run rsyncd in a VPC, where it is both perfectly safe and very useful.

We can probably bore this group to tears with what we know about networking. As we both know, nobody is safe.

We now return the topic to the OP ;)