Archive for the ‘MySQL’ Category
Make a Search Engine in PHP and MySQL
Why would you want to make a search engine anyway? There already is a search engine to rule them all. You can use Google to find just about anything on the Internet, and I doubt you will ever have the same computing and storage capabilities as the big G.
So why then make your own search engine?
To make money of course!
… and to become famous as the creator of the next big search engine, or because as a programmer or engineer you like challenges. Making a search engine for the public Internet is tricky, and if you’re like me you like to solve tricky problems.
The third application is a customized, high-speed site search for your large website with thousands of pages. An indexed search engine will be a lot faster than a full-text search function, and if Google’s site search isn’t flexible enough for your site you can make your own search functionality.
THE BASICS OF SEARCH
The basis of any BIG search engine is a word-to-web-page index: basically a long list of words and how well each of them relates to different web pages.
To make a search engine you have to do four things:
1. Decide what pages to fetch and fetch them.
2. Parse out words, phrases and links from each page.
3. Give a score to every keyword or key phrase indicating how well the phrase relates to that page, and store the scores in the search engine index.
4. Provide a way for users to query the index and get a list of matching web pages.
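The steps above can be sketched in a few lines. What follows is a minimal illustration in Python, not a finished design: the in-memory `index` dictionary stands in for a database table, the score is simply a word's share of the page's words, and the names `tokenize`, `add_page` and `search` are made up for this example. Fetching pages is left out.

```python
import re
from collections import defaultdict

# Hypothetical in-memory index: word -> {url: score}.
# A real engine would keep this in a database table instead.
index = defaultdict(dict)

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def add_page(url, text):
    """Score each word by its share of the page's words and store it."""
    words = tokenize(text)
    if not words:
        return
    for word in set(words):
        index[word][url] = words.count(word) / len(words)

def search(query):
    """Return URLs containing every query word, best total score first."""
    terms = tokenize(query)
    if not terms:
        return []
    # Keep only pages that contain all of the query terms.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    return sorted(candidates,
                  key=lambda url: sum(index[t][url] for t in terms),
                  reverse=True)
```

Calling `add_page(url, text)` for every fetched page and then `search("some words")` covers steps 2 to 4 of the list.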
This is not hard for a seasoned programmer. It can be done in a day if you know regular expressions and have some experience with HTML and databases.
Now you have a working search engine. Just add a lot of computers and hard drives and you’ll soon index all of the Internet. If you’re not prepared to go that far, a one-terabyte disk will hold an index of about 50 million pages.
HOW TO SCORE PAGES
After completing basic search functionality there’s a lot of work before anyone will want to use your new machine.
An index is not enough. The challenge is scoring pages so that the end user gets the search results most relevant to their idea of what they are searching for.
You’ll need to decide how much weight to put on keywords in the title tag, the description and the main web page contents. To score well you will also want to boost keywords found in the URL of the page and check the anchor text of inbound links.
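One way to picture this weighting is a table of per-field multipliers. The sketch below is in Python, and the weights in `FIELD_WEIGHTS` are assumptions picked only for illustration; you would tune them against real queries. The function name `score_keyword` and the field names are hypothetical too.

```python
import re

# Assumed weights, chosen only for illustration.
FIELD_WEIGHTS = {
    "title": 5.0,
    "url": 4.0,
    "anchor": 3.0,       # anchor text of inbound links
    "description": 2.0,
    "body": 1.0,
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score_keyword(keyword, fields):
    """Sum the weighted occurrence counts of one keyword across page fields.

    `fields` maps a field name from FIELD_WEIGHTS to that field's text.
    Unknown field names get a weight of zero.
    """
    keyword = keyword.lower()
    total = 0.0
    for name, text in fields.items():
        weight = FIELD_WEIGHTS.get(name, 0.0)
        total += weight * tokenize(text).count(keyword)
    return total
```

With these weights a keyword in the title counts five times as much as the same keyword in the body text.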
Keeping track of inbound links is the most useful and the most challenging of the above. You’ll need to keep a separate database table with info on all links between the pages you index.
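A link table like that might look as follows. This is a sketch in Python, with the standard-library `sqlite3` module standing in for MySQL so that the example runs on its own. The table name `links`, its columns and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL database
conn.execute("""
    CREATE TABLE links (
        from_url    TEXT NOT NULL,
        to_url      TEXT NOT NULL,
        anchor_text TEXT NOT NULL,
        PRIMARY KEY (from_url, to_url)
    )
""")
# Inbound-link lookups filter on the target, so index that column.
conn.execute("CREATE INDEX links_to_url ON links (to_url)")

def record_link(from_url, to_url, anchor_text):
    """Store one link; re-crawling a page overwrites its earlier row."""
    conn.execute("INSERT OR REPLACE INTO links VALUES (?, ?, ?)",
                 (from_url, to_url, anchor_text.strip().lower()))

def inbound_anchor_texts(url):
    """Return the anchor texts of all links pointing at `url`."""
    rows = conn.execute(
        "SELECT anchor_text FROM links WHERE to_url = ? ORDER BY from_url",
        (url,))
    return [row[0] for row in rows]

def inbound_link_count(url):
    """Count how many indexed pages link to `url`."""
    return conn.execute(
        "SELECT COUNT(*) FROM links WHERE to_url = ?", (url,)).fetchone()[0]
```

The anchor texts returned for a page can then be fed into the keyword scoring, and the inbound link count can serve as a simple popularity signal.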
WHAT TO INDEX AND NOT TO INDEX
Another obstacle you will find when you start indexing real Internet content is that there are vast amounts of useless junk floating around everywhere. Eventually your index will become full of spam, affiliate pages, parked domains, work-in-progress homepages without content, link farms used by search engine optimizers, mirror sites using data feeds to create thousands of pages with product listings or other reproduced content, etc.
When indexing from the Internet you will have to find ways to filter out the junk content from what people are actually reading and searching for.
To start with you could limit how deep into subdirectories you crawl, how many link hops from a domain index page you crawl and how many links per web page you allow.
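Those three limits could be enforced with a few small checks before a URL is queued for fetching. The Python sketch below uses limit values that are assumptions for illustration, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumed limits, for illustration only.
MAX_DIR_DEPTH = 3         # subdirectories below the domain root
MAX_LINK_HOPS = 4         # clicks away from the domain index page
MAX_LINKS_PER_PAGE = 100  # links followed from a single page

def directory_depth(url):
    """Count the directories in the URL path, ignoring the file name."""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if segments and not path.endswith("/"):
        segments = segments[:-1]  # treat the last segment as a file
    return len(segments)

def should_crawl(url, hops_from_index):
    """Accept a URL only if it is within both the hop and depth limits."""
    return (hops_from_index <= MAX_LINK_HOPS
            and directory_depth(url) <= MAX_DIR_DEPTH)

def links_to_follow(links):
    """Keep only the first MAX_LINKS_PER_PAGE links found on a page."""
    return links[:MAX_LINKS_PER_PAGE]
```

The crawler has to carry the hop count along with every queued URL for `should_crawl` to work.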
PARSING WEBSITES
There are a million ways, both right and wrong, to write HTML, and when you index from the Internet you will need to handle all of them.
When parsing keywords from pages you need to handle not only the complete HTML standard but also all the non-standard ways that are unofficially supported by Internet browsers.
To be able to read all pages you will also need to parse client-side JavaScript and handle frames, CSS and iframes.
This is a large part of the work on a general search engine: being able to read all sorts of content.
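As a starting point, a forgiving parser gets you further than regular expressions alone. The sketch below uses Python's standard `html.parser` module, which tolerates unclosed tags, mixed-case tag names and unquoted attributes. It collects the title, the visible text and the targets of links, frames and iframes. It does not execute JavaScript or read CSS, so it covers only part of what is described above. The class and function names are made up for this example.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect the title, visible text and link targets of one page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, whatever the page used.
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("a", "frame", "iframe"):
            # Links live in href; frames and iframes point at src.
            wanted = "href" if tag == "a" else "src"
            for name, value in attrs:
                if name == wanted and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

def parse_page(html):
    """Return (title, visible text, list of link targets) for one page."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), " ".join(parser.text_parts), parser.links
```

The returned links are still relative and need to be resolved against the page URL before they are crawled.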
WHY SO MANY URLS?
Finally you’ll need to deal with the fact that many websites have many URLs pointing to the same web page. Just look at this example:
dmoz.org
www.dmoz.org
dmoz.org/index.html
www.dmoz.org/index.html
All those URLs point to the same web page. If you don’t make special code to handle that you’ll soon have 4 results in your search engine (one for every URL) all going to the same page. Users will not like you.
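A common approach is to reduce every URL to one canonical form before storing it. The Python sketch below makes two assumptions that do not hold for every site: that the `www.` host and the bare host serve the same content, and that the file names in `INDEX_FILES` are the server's default documents. The function name `canonical_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; real servers may use others.
INDEX_FILES = ("index.html", "index.htm", "index.php")

def canonical_url(url):
    """Map equivalent spellings of a URL to one canonical form."""
    if "://" not in url:
        url = "http://" + url  # assume http when no scheme is given
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat /index.html and friends as the directory itself.
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"
    if not path:
        path = "/"
    # The fragment never reaches the server, so drop it.
    return urlunsplit((scheme.lower(), host, path, query, ""))
```

All four dmoz.org spellings above come out as `http://dmoz.org/`, so the index stores the page once.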
There is also the possibility of query strings where a session ID after the question mark in the URL will create an almost infinite number of URLs for the same web page.
google.com?SID=4434324325325
google.com?SID=4387483748377
google.com?SID=7654565644466
To the search engine there will be a really big number of pages all containing the same content.
The quick fix, of course, is to not index pages that include a query string, or to strip the query string from the URLs. This works but will also remove a lot of legitimate content (think forums) from your index.
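A gentler alternative to the quick fix is to drop only the parameters that look like session IDs and keep the rest of the query string. The Python sketch below assumes the parameter names in `SESSION_PARAMS`; that list is a guess and needs extending for the sites you actually crawl.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of session parameters, compared case-insensitively.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def strip_session_ids(url):
    """Remove session parameters but keep the rest of the query string."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in SESSION_PARAMS]
    kept.sort()  # same parameters in a different order give the same URL
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

A forum URL such as `viewtopic.php?t=42&SID=abc` keeps its `t=42` and stays in the index, while the three google.com examples above collapse into a single URL.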
You now have all the information you need to make a site search engine. If you’re going for a general Internet search engine there are many more details you need to handle, such as robots.txt, site maps, redirects, proxies, recognizing content types, advanced ranking algorithms and handling terabytes of data.
I’ll cover more detail in a future article. Good luck with your next search engine project.
DBConvert for MySQL & PostgreSQL
If you are a database manager or database server administrator, DBConvert tools can simplify your daily routine data processing, even if you are an expert in database design. The main idea of the converters is to account for every difference in database structure, data types and relations between database elements in the source and the target.
The DBConvert line consists of database migration tools that import and export data in both directions. DBConvert specializes in cross-database migration between multiple databases. No matter which database you use, your database structure and data will be accurately replicated.
In the Sync group we gathered utilities for comparing and synchronizing data in different databases and for detecting Identical, Different, Missing and Additional records. The Insert sync, Update sync and Drop sync features together let you perform a full synchronization.
The DBForms series holds a special place in the DBConvert line. It includes innovative tools that turn your MS Access forms into web pages written in PHP that can be viewed in an Internet browser. The MS Access tables are in turn converted to a MySQL or PostgreSQL database, which connects to the adjusted PHP module. DBForms from MS Access to ASP.Net + MS SQL converts an mdb file (Microsoft Access database) to ASP.Net web pages with an MS SQL back end. Our unique tool easily transforms Access forms and their parts into .aspx pages.
With the integrated Data Mapping feature you can easily match a data type to its closest equivalent in the target fields. Compatible data type assignment makes your conversion more flexible.
Useful Tools for Btrieve, Pervasive.SQL, and Other SQL Servers
Now you can find useful and inexpensive software that works with Btrieve, Pervasive SQL, MySQL, Firebird SQL and more. The following popular and inexpensive system works on the NetWare, Windows and DOS platforms.
Database Manager for Pervasive SQL Version 2.1 gives you full control over your Pervasive SQL database. It is an easy way to operate a database using SQL scripts. It has an MDI interface and manages all objects of a database: tables, views, procedures, relations, triggers and users. The Export and Import Wizard transfers your data to and from other applications through ODBC.
New! You can now change the DDF dictionary dynamically. It is free! Download the library here. Want a custom-made library? Let us know.
The following programs are also available today:
Btrieve Grid Control – lets you quickly create programs that edit data in Btrieve files. You can edit your files in a grid or a form, search data, etc. The version is available for use in VC and VB.
DDF Editor for Btrieve – lets you create data description dictionaries and create, view, edit, export and import Btrieve files. Version 2.0 supports all Pervasive SQL data types. DDF Editor also supports Btrieve 6.15 for Windows and NetWare, and previous versions for NetWare. If you have only a NetWare server, this program will still work. As an additional bonus you get an API (Visual C++) for accessing the 16-bit Btrieve client from 32-bit applications.
Database Manager – gives you full supervision, management and administration of your Pervasive SQL database. You can easily control users and groups through privileges on tables and columns, create and edit procedures and triggers, create and edit your tables and views, and import and export data and descriptions from any ODBC source. You can also execute SQL scripts. Database Manager is not limited to Pervasive SQL: many of its features apply to any ODBC source.
BtrOle – an OLE Automation server that provides immediate access to Btrieve data from Excel and Visual Basic.
Unix Client for Btrieve – provides rapid access to Btrieve from any Unix system. The user manager allows access only to chosen users and hosts.
Read more about it
http://www.cuvashi.com/business/database/