Saturday, 31 January 2015

Top Tips for Data Mining Success

You may have tried data mining before, only to get lost in a maze of confusion, data overload, and a number of strange terms and icons. Do not fret, you are not alone. Many first timers are in the same boat as you are. Stop, refocus, and start all over again with the following tips in mind.

Data mining must be handled properly. Easy as it may sound, it brings great results only when it is placed in expert hands and done according to the right patterns and processes. This is not to say that data mining succeeds only for a gifted and trained few. It means that serious consideration, preparation, and training must be part of the groundwork before embarking on it.

The most practical and tested tips are: know your desired outcomes; set expectations; assign the right personnel; avoid data dump; create a deployment scheme; develop a maintenance plan.

Know your desired outcomes

As the major proprietor of your business, you of all people should have a clear view of what you really want for it. Thus, before trying new strategies and techniques that are recommended to you, you must know what your desired outcomes are. For instance, if your business is in real estate, you must be able to foresee which direction your market should go. Are you going up into skyscrapers or out towards the horizons of the countryside? Start from the big picture, then move to the specifics and clearly spell out what you want and where it should be.

Set expectations

In connection with identifying your outcomes, you must also set realistic and attainable expectations. These are the very things that head off possible obstacles and frustrations in the coming years. You can see where your business is going through web research or data mining. You can study the past and present of your competitors, and you can plan your own future based on the experiences of others. It is often wise to set expectations that you have not attained before. It is like plowing and preparing the ground because you know rain is coming and it is the right time to plant and reap a great harvest.

Assign the right personnel

When you find the right person as well as the right data mining service, you can cut short tiresome planning, devising, and preparation. If you run a small enterprise, you can spearhead the procedure yourself, but if you have enough staff at your disposal, choose someone who is not only knowledgeable but also reliable and dedicated. You do not want someone who is only a good starter and who would leave you hanging when the going gets tough.

Avoid data dump

Being sure of what you want helps you avoid unnecessary data. Data mining, like real mining, means knowing where the gold is and extracting it in the most efficient and effective way. Identifying legitimate sites and reliable, well-researched information is the shortcut to finding exactly the right data. It is a waste of time and effort to click aimlessly through unsure and ambiguous websites. Many links only lead you to more links and simply make money out of others’ ignorance.

Create a deployment scheme

As in any other venture, you must be able to delegate the task as well as the information that you gather. Since you are not superhuman, learn to seek the assistance of others, and be sure that you know whom to trust. In addition, you must classify and segregate the needed materials so that they are easy to locate and analyze. In other words, order and proper organization are another key to success in data mining.

Develop a maintenance plan

Finally, along with orderliness and efficiency, you must see to it that you have an effective maintenance plan. What to do with old data and where to store the vital ones are concerns that need to be considered too. In addition, there is a need for a watchdog in the whole duration of your business venture. This will not only assure you of security of your data but also keep you on healthy and solid ground. This maintenance can be both a cleaning and healing spot for your business’ overall life and sustainability.

So much can be said about how to run your business using data mining, but there is one factor that is uniquely your own. Above and beyond all these techniques and strategies, trust your instincts. You are the best judge of your desires and actions; thus, you must spend time alone in reflection, contemplation, and retrospection. Being silent and alone can make you see things that are missed among all the movement and noise. Once in a while, leave the scene and look objectively at your work. Remember, there is wisdom in distance and objectivity.

Source: http://www.loginworks.com/blogs/web-scraping-blogs/213-tips-for-data-mining-success/

Wednesday, 21 January 2015

How to Deal with Content Scrapers

There are a few approaches that people take when dealing with content scrapers: the Do Nothing approach, the Kill Them All approach, and the Take Advantage of Them approach.

The Do Nothing Approach

This is by far the easiest approach you can take. The most popular bloggers usually recommend it because fighting scrapers takes a lot of time. This approach simply recommends that “instead of fighting them, spend your time producing even more quality content and having fun”. Obviously, a well-known blog like Smashing Magazine, CSS-Tricks, Problogger, or others does not have to worry about it. They are authority sites in Google’s eyes.

However, during the Panda Update, some good sites got flagged as scrapers because Google thought the scrapers' copies were the original content. So this approach is not always the best in our opinion.

Kill them all Approach

This is the exact opposite of the “Do Nothing Approach”. In this approach, you simply contact the scraper and ask them to take the content down. If they refuse to do so or simply do not reply to your requests, then you file a DMCA (Digital Millennium Copyright Act) takedown notice with their host. In our experience, the majority of scraping websites do not have a contact form available. If they do, then use it. If they do not have a contact form, then you need to do a Whois Lookup.

Whois Lookup

The whois record lists contact information under the administrative contact. Usually the administrative and technical contacts are the same. The whois record also shows the domain registrar. Most well-known web hosting companies and domain registrars have DMCA forms or email addresses. The nameservers tell you who the host is: in the example lookup, they show that this specific person is with HostGator, which has a form for DMCA complaints. If the nameserver is something like ns1.theirdomain.com, then you have to dig deeper by doing reverse IP lookups and searching for IPs.

You can also use a third-party service such as DMCA.com for takedowns.
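As a rough illustration of the nameserver check described above, here is a minimal Python sketch. The WHOIS text, the domain scraper-example.com, and the helper names are all hypothetical; real WHOIS records vary by registrar and TLD, so treat the parsing as an assumption rather than a general solution.

```python
import re

# Hypothetical WHOIS output; real records vary by registrar and TLD.
SAMPLE_WHOIS = """\
Domain Name: SCRAPER-EXAMPLE.COM
Registrar: Example Registrar, Inc.
Name Server: NS1234.HOSTGATOR.COM
Name Server: NS1235.HOSTGATOR.COM
"""


def name_servers(whois_text):
    """Return the lower-cased name servers listed in a WHOIS record."""
    pattern = re.compile(r"^\s*Name Server:\s*(\S+)", re.IGNORECASE | re.MULTILINE)
    return [m.group(1).lower().rstrip(".") for m in pattern.finditer(whois_text)]


def uses_own_name_servers(domain, servers):
    """True when every name server sits under the scraper's own domain."""
    domain = domain.lower()
    return bool(servers) and all(s.endswith("." + domain) for s in servers)


servers = name_servers(SAMPLE_WHOIS)
print(servers)
# False here means the name servers point at a third party (the likely host).
print(uses_own_name_servers("scraper-example.com", servers))
```

When the second check comes back true, the name servers say nothing about the host, which is the case where a reverse IP lookup is needed.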

Jeff Starr, in his article, suggests that you should block the bad guy’s IPs. Check your access logs for their IP address, and then block it with something like this in your root .htaccess file:

Deny from 123.456.789
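Finding the address to block means reading the access log first. The following Python sketch is one possible way to do that, assuming the log is in Common Log Format and that the feed lives under /feed; the sample lines, the addresses, and the function name are made up for illustration.

```python
from collections import Counter

# Hypothetical log lines in Common Log Format; the client address comes first.
SAMPLE_LOG = [
    '203.0.113.7 - - [21/Jan/2015:10:00:01 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '203.0.113.7 - - [21/Jan/2015:10:05:01 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '198.51.100.4 - - [21/Jan/2015:10:06:12 +0000] "GET /about/ HTTP/1.1" 200 2048',
    '203.0.113.7 - - [21/Jan/2015:10:10:01 +0000] "GET /feed/ HTTP/1.1" 200 5120',
    '198.51.100.4 - - [21/Jan/2015:10:11:40 +0000] "GET /feed/ HTTP/1.1" 200 5120',
]


def feed_requests_by_ip(lines, path="/feed"):
    """Count requests per client address for URLs starting with `path`."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # no quoted request field, skip malformed lines
        request = parts[1].split()  # e.g. ['GET', '/feed/', 'HTTP/1.1']
        if len(request) >= 2 and request[1].startswith(path):
            counts[line.split()[0]] += 1
    return counts


counts = feed_requests_by_ip(SAMPLE_LOG)
for ip, hits in counts.most_common():
    print(ip, hits)
```

An address that polls the feed far more often than any real reader is a candidate for blocking, though a high count alone does not prove scraping.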

You can also redirect them to a dummy feed by doing something like this:

# Send requests from the scraper's address range to a dummy feed
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.456\.789\.
RewriteRule .* http://dummyfeed.com/feed [R,L]

You can get really creative here, as Jeff suggests. Send them to really large text feeds full of Lorem Ipsum. You can send them some disgusting images of bad things. You can also send them right back to their own server, causing an infinite loop that will crash their site.

The last approach is to take advantage of them.

Source: http://www.wpbeginner.com/beginners-guide/beginners-guide-to-preventing-blog-content-scraping-in-wordpress/

Tuesday, 6 January 2015

Data Mining - Techniques and Process of Data Mining

Data mining, as the name suggests, is extracting informative data from a huge source of information. It is like separating a drop from the ocean. Here the drop is the information most essential for your business, and the ocean is the huge database you have built up.

Recognized in Business

Businesses have become very creative, uncovering new patterns and trends of behavior through data mining techniques or automated statistical analysis. Once the desired information is found in the huge database, it can be used for various applications. If you want to stay involved in other functions of your business, you can enlist one of the professional data mining services available in the industry.

Data Collection

Data collection is the first step towards a constructive data mining program. Almost all businesses need to collect data. It is the process of finding the data essential for your business, then filtering and preparing it for a data mining outsourcing process. Those who already track customer data in a database management system have probably completed this step.

Algorithm selection

You may select one or more data mining algorithms to resolve your problem. Since you already have a database, you may experiment with several techniques. Your selection of algorithm depends on the problem that you want to resolve, the data collected, and the tools you possess.

Regression Technique

The best-known and oldest statistical technique used for data mining is regression. Starting from a numerical dataset, it develops a mathematical formula that fits the data. You then feed new data into that formula to get a prediction of future behavior. Knowing how to use it is not enough, however; you also have to learn its limitations. The technique works best with continuous quantitative data such as age, speed, or weight. For categorical data such as gender, name, or color, where order is not significant, it is better to use another technique.
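To make the idea concrete, here is a minimal Python sketch of the simplest case, a straight line fitted by ordinary least squares to one numeric input. The article does not say which form of regression it has in mind, so this choice is an assumption, and the age and spending figures and the function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: customer age versus monthly spend (exactly linear on purpose).
ages = [20, 30, 40, 50]
spend = [110.0, 160.0, 210.0, 260.0]

slope, intercept = fit_line(ages, spend)


def predict(age):
    """Apply the fitted formula to a new, unseen age."""
    return slope * age + intercept


print(slope, intercept)
print(predict(60))
```

Real data will not sit exactly on a line, so the fitted formula gives an estimate rather than a guarantee, and predictions far outside the observed range deserve extra caution.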

Classification Technique

Another technique, classification analysis, is suitable for categorical data as well as a mix of categorical and numeric data. Compared with regression, classification can process a broader range of data and is therefore popular. Its output is also easy to interpret: you get a decision tree that requires a series of binary decisions.
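The Python sketch below shows a single binary decision of the kind such a tree is built from. It is a one-level tree (a "decision stump"), not a full classifier: it picks the yes/no question that misclassifies the fewest training rows, and a real tree would repeat that step on each branch. The records, field names, and function names are hypothetical.

```python
from collections import Counter

# Made-up records mixing a categorical field (color) and a numeric one (age).
ROWS = [
    {"color": "red", "age": 25, "bought": "yes"},
    {"color": "red", "age": 52, "bought": "no"},
    {"color": "blue", "age": 31, "bought": "yes"},
    {"color": "blue", "age": 47, "bought": "no"},
    {"color": "green", "age": 22, "bought": "yes"},
    {"color": "green", "age": 60, "bought": "no"},
]


def majority(rows, target):
    """Most common label among the rows."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]


def errors(rows, target):
    """Rows that a majority-vote leaf would misclassify."""
    counts = Counter(r[target] for r in rows)
    return len(rows) - max(counts.values())


def candidate_questions(rows, features):
    """Yield (feature, value, test): '<= value' for numbers, '== value' otherwise."""
    for f in features:
        for v in sorted({r[f] for r in rows}):
            if isinstance(v, (int, float)):
                yield f, v, (lambda r, f=f, v=v: r[f] <= v)
            else:
                yield f, v, (lambda r, f=f, v=v: r[f] == v)


def best_question(rows, features, target):
    """Pick the binary question with the fewest misclassified training rows."""
    best = None
    for feature, value, test in candidate_questions(rows, features):
        yes = [r for r in rows if test(r)]
        no = [r for r in rows if not test(r)]
        if not yes or not no:
            continue  # a question that does not split the data is useless
        cost = errors(yes, target) + errors(no, target)
        if best is None or cost < best["cost"]:
            best = {"feature": feature, "value": value, "test": test, "cost": cost,
                    "if_yes": majority(yes, target), "if_no": majority(no, target)}
    return best


def classify(stump, row):
    """Answer the stump's question for a new row and return the branch label."""
    return stump["if_yes"] if stump["test"](row) else stump["if_no"]


stump = best_question(ROWS, ["color", "age"], "bought")
print(stump["feature"], stump["value"], stump["cost"])
print(classify(stump, {"color": "red", "age": 28}))
```

Because each node is a plain yes/no question, the result can be read aloud as a rule, which is what makes this kind of output easy to interpret.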

Our best wishes are with you for your endeavors.

Source: http://ezinearticles.com/?Data-Mining---Techniques-and-Process-of-Data-Mining&id=5302867