Sunday, October 21, 2007

Downloading Pages via SSL and proxy using LWP Perl

If you use LWP modules in perl then you can download pages via HTTP easily using a proxy. However if you wanna download pages with HTTPS through a proxy then it becomes a problem.
This is because both proxy and HTTPS are not implemented simultaneously in LWP modules although individually they work quite well.
But there is a way around. To download paged via HTTPS you have to give LWP the impression that you are not using any proxy. Thus you set the environment variable HTTPS_PROXY as a valid proxy and dont set anything as the proxy in the request object for https. This would make it work like a charm. :)
Link to above Funda.

Thursday, June 28, 2007

Recovering deleted files in an ext3 File System

I was working on my Debian machine few days ago when accidentally I pressed "rm -rf *" in a wrong directory(my main working directory). Well, I lost all the data. And after hunting net for hours, find out that there is no traditional method available for recovering files in ext3 file system.
In Ext2 file system, OS marks the blocks, inode and directory entry associated with the deleted file as unused. It did not use to delete the links, so that other files could still use them. And because of this, recovery of lost files were possible using certain tools like debufs, e2undel, recover, Midnight Commander etc. Giis-gET iT i sAY also claims to recover files from both ext2 and ext3 file system but I don't know it works or not.

In Ext3 file system, when the file is deleted, file size and block addresses in inodes are also deleted. Therefore, its not possible to find out where the file was originally located or more specifically it becomes difficult to track back the inode of the file using debugfs.
But, if you are lucky, file could still be there in your hard drive, though in chunks. Steps you must follow -

  • Do not save any file in the same partition in which your deleted files were. This might result in overwriting of occupied blocks of deleted files.
  • Mount another drive to the system(if not mounted) to save the dump.
  • Find the Partition in which deleted files was there using `mount -t`
  • If you know certain lines from the file, which are more or less unique to the file you deleted, then it might be useful. In my case, my Perl code began with "package rankSentences;", so I used that as the identifying text.
  • Try the command =>
grep -a -B[lines before] -A[lines after] ‘identifyingText’ /dev/[partition] > dump.txt

here,
  • -a is for treating text file as if it is binary file.
  • -A Number of lines of the dump that you want to extract after the "indentifyingText" (which was "package rankSentences" in my case.
  • -B Number of lines before "identifyingText" you want to extract.

If you don't specify "identifyingText" this might result in dump.txt to be very large in size as it will start dumping the data of complete partition to that file.
And ya, be patient, it might take some time depending upon the size of your partition. After running that command you can look for chunks of your deleted file in dump.txt. And if you are lucky, you might find the complete file at one place. :)

--compiled from various sources on net.

Friday, June 08, 2007

Recovering Lost Emails from Google Desktop Search's Cache

This post is for you if you
  • Have accidentally deleted mails and
  • Use Google Desktop Search (GDS) and
  • Were fortunate or sensible enough or possibly farsighted [:)] to have enabled email caching in GDS and
  • Want to retrieve your mails

Andrew Hardwick has written a very simple and useful script for the above task. His script was tested on GDS version 2 (build 20051208-en). Refer to the instructions given on his page to get the details of the script.

I recently needed to do similar task for someone who was using Google Desktop 5.1.0705.04505-en-pb. The above script worked just fine with slight modifications in the regular expressions. I am uploading the modified script in here and a copy of Andrew's Instructions here.

Enjoy !!

Friday, January 05, 2007

Stock Index

What does the stock Index indicate?

First, let’s look at what an index number represents. Although there are different ways to calculate index numbers, it is important to remember the numbers represent a change from an original or base value.

The number is not important. What is important is the percent change over time. This movement up or down gives you and idea of how the index is performing. Is the Dow up or down? The index is calculated “on the fly” during trading to give investors a sense of direction to the market it represents

The sensex's calculation involves dividing the total market capitalisation of 30 companies in the index by the index divisor — the only link to the original base period value of the sensex.

These 30 companies are taken as the representative of the Market. There may be more than 30 companies registered in the exchange.


Spell Checkers

Ever Wondered how search engines like Google, Yahoo or for instance Microsoft word Does spell checking.

Well the basic idea is to get Levenshtein or Damerau-levenshtein distance between the mis-spelled word and the other word. But do we find the distance with all the other words that are in the vocabulary. The answer is No. We reduce the subset of the words to checked. In the Search Engines it can be the most recent or the most used words and there are other techniques to do so.

Some insight into Google spell checking.
Spelling Suggestion at Wiki.

Tuesday, January 02, 2007

Casting Out Nines

How do you check the answer of a large multiplication like 8216*4215=36430440 without actually doing the multiplication.

Sum 8+2+1+6=17. Take modulo 9 = 8.
Sum 4+2+1+5=12. Take modulo 9 = 3.

Sum 3+6+4+3+4+4=24. Take modulo 9 = 6.

Now check 8*3=24 and 2+4 = 6.

Austrian Method of Subtraction

How do you subtract 5592 from 11149. Simple question....

There's an interesting method which involves only addition as it is easier to do than subtraction. Add from the amount to be subtracted upto the next higher unit, then to the next and so on.

Example 5592 + 8 = 5600. 5600 + 400 = 6000. 6000 + 5000 = 11000. 11000 + 149 = 11149.
so the answer is 8+400+5000+149 = 5557.

Pascal's Triangle

There's a very good article about Pascal's Triangle in Wiki. I am just enumerating some important points for quick reference.

- Construction of Triangle, value of the element is equal to the sum of the above two adjecent values.

- Sum of the elements of the mth row is equal to 2 raised to power m-1.

- Elements of mth row correspond to coefficients of 11 raised to power m-1.

- Generally the elements of mth row are the coefficients of the binomial expression (x+y) raised to power m-1.

- The diagonals next to the edge diagonals contain the natural numbers in order. Moving inwards, the next pair of diagonals contain the triangular numbers in order.

- Starting at the top 1, without backtracking or going sideways, try to get to another node via these grid paths as many ways as possible. The answer is whatever number the node has.

- Sum of a shallow diagonal is a Fibonacci's number.

- In rows where the second number (the 1st number following 1) is prime, all the terms in that row except the 1s are multiples of that prime.

- Pascal's triangle can be used as a lookup table for the number of arbitrarily dimensioned elements within a single arbitrarily dimensioned version of a triangle (known as a simplex).

- No one knows whether any number occurs 5 times in a Pascal's triangle.

Monday, January 01, 2007

Hats

There are 5 persons in a line, all facing in the same direction. There are 5 hats, 2 white and 3 black. A person can't see the color of his own hat, but can see the color of the hat of all the people standing in front of him.

The Third/middle person was asked to tell the color of his own hat. He couldn't answer. Then the second person( the person in front of the middle one) was asked the color of his own hat. He too couldn't answer. Tell the color of the Hat of the first person in the line?


===========================================

Solution:

Middle person couldn't tell the color. It means of the two persons standing in front of him at-least one of them was wearing a black hat.

Second person couldn't tell the Hat's color. He knows that there has to be at-least black hat either on him or on the first person. If the first person is wearing white then he is surely wearing black. But if the first person is wearing black, then he can't tell. Thus the first person is wearing black.

Salaries of three persons

There are 3 persons A,B,C and their salaries are X, Y, Z respectively.

Problem: Device a scheme so that all three of them know X+Y+Z, by asking each other questions.

Constraint: None should come to the other one's salary, i.e. A should not know Y and Z and so on...


===============================================

Solution:

C tells B that its salary is K (a fake value)
B tells A the total K+Y
A tells C the total K+Y+X
Finally C calculates X+Y+Z and tells it to everybody.


Saturday, December 30, 2006

Calculate n-1

Well the problem is : your input in 'n' and you have to calculate n-1 without using the '-' opertaor... simple one I guess

The solution I thought was:
/*
sum=0;
while(i < a="~0;" b="n+a;">

Quine

Have u ever come across a code that prints its own code as an output....

these type of codes are calle Quines... and here is a link containg many single liners .

but the most simple one in linux would be
int main()
{
system("cat filename.c");
}
I tried this code.. it compiled with a warning but executed...

Aneways there is in iterestiing link which gives the world's shortest quine... if u have two spare minutes, go throught it.. it would be interesting...

Scanf

very often we have encountered problems where a code fragment like
for(i=0;i<3;i++)> simple... the stray '/n' escape sequence remains in the input stream and is taken by the second scanf. One of the ways is to use a getc() function before second scanf, but not really good one other solution is to use fgets from stdin ... this is a nice solution as it takes the whole string from the input stream including '\n' note that we can't simply flush the input stream by fflush(stdin) as its behaviour on input stream is undefined... we can use it to flush the output stream..

Increment Operators

is the C statement /*printf("%d, %d, %d\n", i,i++,++i)*/ a valid statement?
Think hard before arriving to any conclusions....

Well its not!!

firstly ehat do u expect the evalution order of functions a(), b() and c() in the following statement
printf("%d,%d,%d",a(),b(),c());

well by the C standards the behaviour is *unspecified* i.e. the arguments of a function may be evaluated in any order that may also be interleaved.... now this unspecified behaviour of evalution of the arguments may lead to undefined behaviour and hence the ambiguous result.

Secondly there is no sequence point between the different expressions(i, ++i, i++) and according to C standards modifying the value of an expression more than once between two sequence points leads to undefined behaviour and hence explains the ambiguous result.

try printf("%d", (i++,++i)); note: there is a sequence point in the expression.

sizeof()

Well found something new about the sizeof()

firstly it is an unary operator and shouldn't be mistaken for a function

secondaly, what do you expect the output of
int i=10;
j=sizeof(i++);
printf("%d, %d",i,j);

well the value of i doesn't get incremented as sizeof is a compile time operator and produces a compile-time integer constant value. The expession inside sizeof() is only expected to deduce its type and is not fully evaluated.... should have known this though!!!!

also found a new thing today.... sizeof('x') returns 2 or 4 but not 1. This is because by definition, in C, character constants are integers and not characters!!!! I also found this somewhere that sizeof("") will give the size of char in both C and C++, but haven't tried it yet.

Also,
char chr[1000];
sizeof(chr) will give 1000 and not 4.

gets()

Well we all know that whenever we compile a file containing a refference to gets() function, the compiler gives a warning that using gets() might be dangerous.....

The reason is very simple... gets simply copies the input string on to the program stack without inpecting its size... for example the code...
void enterstring()
{
char string[FIXEDLENTH];
gets(string);
}
now when the function is called then the space for the local variable string gets allocated in the program stack. If the length of the input is now greater than the FIXEDSIZE then the stack gets over written... but how this is dangerous???

Well one can very easily modify the flow of control in a program by just buffer overflow... consider for example... when the function enterstring was called then the stack contains the return address in to the main function.. now I could input the buffer in such way such that my buffer contains a malicious machine code and also changes the return address in the stack to a position in the buffer from where my malicious code starts... bas ho gaya kam.... or even if you cant write a machine language code in the buffer than also you could change the return address and thereby modifying the program execution...

I have a very interesting link which gives the above task as an lab assignment!!!!! Its really COOL have a look.... and for those who really wanna hack theres an another link .

string inside scanf()

what will the following statement do???

scanf( " %d %d" + scanf("%d %d",&i,&amp;j) ,&a);

the inner scanf will take two integers as input and storing them in i and j respectively and returning 2.

the outer scanf's expression will now become "%d %d"+2 which will become " %d".... how??? now the expression inside the " " in scanf in considered as a string and is stored in some part of the memory and when we call the scanf function then a pointer to this memory location is passed on to scanf... but in this case the pointer gets incremented by two and thus points to the string " %d" rather than "%d %d", and since the former one is a legitimate expression for scanf the function takes one input in a.

not a big deal

what would be the output of the following code statement

main(_){for(--_;putchar(_++["J!Mpwf!Zpv\1"]-1););}

think, think.....

Well the output is "I Love You" :-)

How?? It goes as follows: firstly the underscore in the main is a variable equivalent to argc but I cant understand thet why the code doesn't give warning in the main, beacuse either ir should have both the arguments of main in it or none of them....anyways since we run the program by the command ./a.out the argc value is 1 and hence '_' is initialized to 1
going further in the for loop the variable _ gets its inital value 0. Now there is something important to understand... a[i]=*(a+i)=*(i+a)=i[a] thus in C it doesn't matter that what is the base and waht is the subscript.... thus the *thing* inside the putchar can be understood as arbitString[_++]-1 which goes on printing the the elements of the string -1 which comes out to I Love You and stops when the string ends.... simple... isn't it.

P NP

Well I have always been consufed with NP, P CO-NP and all such funda.... so today decided to get the basic funda clear about this:

NP: It is the class of problems that can be solved by non deterministic finite automata by making a polynomial number of guesses and the solution can be verified in polynomial time.

P: is a set of problems that can be solved in an polynomial time thus concluding that if a problem is P then it is NP

now the question comes that whether NP=P.... the answer to this is uncertain as P is the class of problems that can be solved "easily" in polynomial time while NP is a class of problems that can be varified in a polynomial time and hence they are considered to be different.

NP - complete: a problem is said to be NP - Complete if it is NP and every other NP problem can be reduced to it in polynomial time.... now its significance is that to show how hard a particular problem is: i.e. we can prove that for a particular problem no polynomial time algorithm exsists by contradiction if we reduce a problem for which no polynomial time solution exsists to it in polynomial time.... and finally if for any NP complete problem we find a polynomial time algorithm then since all the NP problems are reducible to it, all of them will have a polynomial time solution and hence NP class will now become equivalent to P and thus existance of NP complete problems is a factor which makes people believe that NP != P....

finally if a problem is not NP but all the NP problems are reducible to it in polynomial time then it is known as NP- hard.... which implies that a NP hard problem is atleast as hard as any other NP problem, infact it might be harder.



Memory Leak

In simple terms memory leaks may be defined as the loss of any memory allocated during the program... for a more precise and informative defineation refer here.

A very simple code exhibting memory leak:

int main()
{
int *p,q;
p=(int*)malloc(sizeof(int));
q=10;
p=&q;
printf("the value of p=%d",*p);
}

in the above case the memory allocated to p is lost in line number 6 and may be considered as a memory leak.

Goto

It has always been suugested to not use Goto in our programs, instead use for while and other stuff like it. why??

firstly, using goto may insert labels into the program and when we encounter a label at a particular place in the program, it becomes very difficult to say from where might have the control landed up there. It makes the understanding of the difficult and hence the debugging gets even tougher.

secondly, using constructs like for, while makes the program somewhat structured and its easy to trace the flow of control withinn the code... also using these constructs maps the code to real world scenario where a conditional statement may be represented as an decision and a for while may be regarded as repeating sometimes over and over.... but what does Goto represent in real world scenario??

Thirdly, making use of too much Goto may result in speghetti code . There is also a story associated with goto. have a look.

Twelve days of Christmas

This is my fisrt post about obfuscated programming and this stuff seems to be amazing...... at first when I saw this code , I was amazed to see its output ..... kaise ho sakta hain yaar... Then finally found the solution to it... and it was really nicely explained and dealt with... and about the author of the code... he might be a real genius I might say....

During the study of how the program works, I learned about listed expressions and how the value of the whole expression becomes that of the last expression. Aneways the links are nice going through if you have some time.

Puzzle

consider the following code fragment:

void fun()
{
}

int main()
{
int i=10;
fun();
printf("%d\n",i);
}

Now the task is to write something in function fun() such that main prints any other value than 10.... note that the value is to be printed in main()
think.....

Well the answer is as follows:
void fun()
{
#define printf(a,b) printf("20\n")
}
simple... isn't it :-)

To find the greater of two number

Yes, but without using comparison operators!!!!

the first solution came from Mayank... which was:

int max(a,b)
{
if(a/b)
return a;
else return b;
}

The above would not work if b=0... now the second one came from Muneish which was:

x=(a-b)/|a-b|

if(x+|x|)
return a;
else return b;

Now this one is also wrong since a-b could cause overflow...
Well the correct solution goes as follows:

void compare (int a, int b)
{
int diff, msb;

msb = 1 << (sizeof(int)*8 - 1);

if ((a & msb) &amp;& !(b &msb)) /* a is negative and not b */ return b; else if ((b & msb) &amp;amp;amp;amp;amp;& !(a &msb)) /* b is negative and not a */ return a; /* a and b are of same sign */ diff = a-b; if (!diff || !(diff & msb)) /* a==b and a > b */
return a;
return b;
}

I have shamelessly copied this code from Joju John, but was worth the search... A point to be noticed is that msb and diif variables are taken as int but you have to make there types as same as that of a and b.

Puzzle

You are given an array with n elements { x1, x2 ... xk ... xn }. You are also given an array location k. Using constant space and O(n) time, rotate the array such that it contains { xk, xk+1 ... xn, x1, x2 ... xk-1 }.

Well the first thoughts which came to my mind were those involving swapping the elements and one can also achieve the solution this way, but its not the best way. The best way out is as follows:

We know that reversing of a list can be done in constant time. Thus do as follows:
1.reverse the list from x1 to xk-1 {xk-1, xk-2, ....., x1, xk, ...... , xn}
2.reverse the list from xk to xn {xk-1, xk-2, ....., x1, xn, ......., xk}
3.reverse the whole list {xk, xk+1, ..... xn, x1, x2, ......., xk-1}

Nice na!!!!

Simple One

Consider the following program
#include
void main()
{
if(.....)
printf("AB");
else
printf("C");
}
Q: fill in the blank within if condition so that o/p is ABC

here is ans

if(!printf("AB"))
------
else
----


:)

Pointers and Reference

Although they look the same but have major differences. These are said to be valid in C++. I don't know about C. The info can be found here.
Also we may say that pointers consume memory but references may be thought of as alias for an object and thus having no memory consumption butI not 100% sure on this point.

String Constants

char *a = "abc";
a = "def";

The above is correct but the code crashes when we do a[1] = 'd';

This may be because actually when we say char * a ="abc"; then it means a is pointing to a const memory location which contains "abc" and when we say a= "def"; its again modifying pointer's value to be now pointing to some other constant location; but when u\we say a[1]= 'd' then we are modyfying a const so it crashes!
C doesnot defines behaviour when you chnage a string literal that is why it is crashing.You will end up getting different results with different compilers or on diff machines too.

Padding

static struct
{
unsigned char **segment;
unsigned int *reg1;
unsigned int *reg2;
int offset;
int offset16;
int pad[3]; /* Make structure power of 2 bytes wide for speed */
} ModRMRM[256];

wat is d significant of pad[3]????
how it become fast if structure is of power 2???



ou are taking array of structures...
so for that say u have to access jth element of array ..u do it by array[j] ...how does ur compiler do it is array+j*size of (that is *array) right!!
so if sizefo struct if power of 2 then the multiplication can be speed up by just doing shifting of bits ..accessing array elements by subscript is quite frequent...thats why this practice gives a lot of performance speedup

Size of a Datatype

how to find the size of any datatype in c ,without using sizeof operator and also without using the technique of bitwise operator?

U can use this technique

#define MY_SIZE(p) ((char *)((&p)+1)) - ((char *)&p)

The problem with this is u have to pass the variable and can't directly use as sizeof(int).

Reverse an Array

An array of size N has distinct values 1…N in random order. You have only operator called rev(X) (where X is any value from 0 to N-1) which reverses all values from 0 to X (example values 2,3,1,4 and rev(2) gets you 1,3,2,4). Our objective is to sort the array using minimum number of Rev(X) functions. How many permutations of N size array require exactly N number of rev(X) operators to get sorted?

My answer is,
1) find out max num. in an array say position i, use rev(i) and after use rev(N).
2) repeat above step by decreasing N by one each time

More Efficient

which is more efficient of these two
FOR LOOPS

for(int i=0;i<10;i++);
for(int i=0;i<10;++i);

Post increment is less efficient that Pre because in former the value is first updated and then a temporary variable having the previous value is returned. See this.

Anyways for(int i=10; i>0; --i) is the most efficient this is more efficient bcoz in this case ur comparing i with 0 rather than pre stored value(10) which takes some more operations wher as comparision with 0 can be easily done by comparing with a logical value 0.

without "" in printf

can u print any string wthout using " " in printf function?

without using " " anywhere in the program

#define str(a) #a

printf(str(Hi there));

I haven't figured out how this happens , but happens for sure!

without main

Although its not possible to write a program without main() but still it may be possible that main() is not present explicitly... as follows:

#include
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)
begin()
{
printf("Stumped??\n");
}


But main() is indirectly present.

Integer Limits

Any way to get the limits of an integer using a code?

There are two ways:

1.Simpler one

#include
#include
int main()
{
printf("Range of int: %d to %d\n", INT_MIN, INT_MAX);
}

2. Another One

upper_limit=(~(0u)>>1);
0u
fill in all 0s in the register so it looks like 000...000 (represented as MSB...LSB)
~(0u)
Now, flip all bits so it looks like 111...111
~(0u)>>1
Shift right one bit so that MSB is zero
011...111 which is maximum positive number that you can represent.

for lower range:
lower_limit=( (~0)^(~0u>>1)) (or lower_limit = ~upper_limit) logic flips 011...111 to 100...000 which is the lowest number presentable.

Maximal Munch

How is the expression x+++y parsed? As x++ + y or x + ++y ? Why?

The issue is with tokenization. The ANSI C standard requires that the longest possible sequence of characters be considered as a token. + and ++ are both valid tokens. When a +++ is encountered , the longest possible token is the ++. So, +++ is tokenized into ++ +.

On a similar note, x+++++y is always tokenized as x ++ ++ + y, and gives a syntax error. This is inspite of the fact that the tokenization x ++ + ++ y would not have an error.

This 'oddity' is known as the "maximal munch" principle.

Saturday, November 04, 2006

Web 2.0, for the layman!

We have often heard the term Web 2.0. What does it mean? Surely there's no new version of the web available and neither there are any particular group of web sites that open only in Firefox 2.0. Then, what is it?

Web 2.0 points out to the new era of web which is much different from delivering just HTML content to the user. It is a new way of thinking, it points out to the web that is getting smarter and richer as more and more people start to use it. The trend for the Web nowadays is Social interaction, user-generated content and blogs which when taken together form the next generation, user driven, intelligent web.

Soon gone will be the days when people used desktop software that came in versions. The software has become an service now. The versions are no more there, only improvement goes on. Do you know which version of Google reader or Gmail do you use, or do you know what have been the improvements going on? The whole software cycle of design-develop-test-ship-install will be finished. Software will now rely on the Beta development model, where it is continually refined and improved and the users will become beta developers. It is Web 2.0.

The above description has been taken from the Expert of the Web2.0 report published by John Musser and Tim O'Reilly called the "Web 2.0, Principles and Best Practices".

Also there is a Third Annual Web2.0 Conference from November 7-9 , 2006.

The term Web 2.0 was coined by O'Reilly's Media in 2004, pointing out the second generation of web services , where people could collaborate and share information. There are many examples of Web 2.0 services:Orkut and MySpace are two of the most popular web social networking sites, Google's Spreadsheet and documents are an another example, then there is YouTube for sharing videos and Slideshare.net for sharing slide shows. Also there's an Indian startup like burrp, where users write reviews and recommendations for the right spots, be it a pan vaala or a good restaurant......and the list is unending!!

I would like to end with line by Ross Mayfield,

Web 1.0 was commerce , Web 2.0 is people.

Biasing Web Results for Topic Fimiliarity

This post is based on a research paper by Yahoo! Research written by Omid Madani and Rosie Jones of Yahoo! and Giridhar Kumaran from University of Massachusetts. It is based on the fact that based on the user’s familiarity with the search topic, it would be appropriate to give him either introductory or advanced search results.

The findings are based on a four-fold procedure. Firstly, the definition of advanced and introductory web pages is given.

An Introductory webpage is defined as a page that doesn’t presuppose any background knowledge of the topic and to an extent introduces or defines key terms in the topic.

An advanced webpage would be one which assumes sufficient background knowledge of the topic and probably builds upon them.

Then it is shown that the definitions above hold for a set od people, by the inter-labeler agreement. Three annotators are asked to label randomized sets of results for particular queries and are found to agree about 70% of the time. Also based on their labeling it is found that the search engines have an equal bias towards both introductory and advanced web pages. Also the precision for an introductory page to be at position 1 is slightly more than 0.5, showing that search engines generally make the top result an introductory one. The work tries to improve the precision for introductory documents from the positions 1 to 10.

An experiment was performed on the introductory and the advanced documents according to Fog, Flesch and Kincaid indices. All of them marked the documents as unreadable and weren’t able to distinguish the introductory from the advanced thus showing that the reading level measures aren’t enough to distinguish the documents. Also an experiment was performed in which a query was expanded using introductory trigger words. But it was found that it didn’t bring about significant improvement in the rankings of the introductory documents.

Thus a familiarity classifier was developed using reading level measures, distribution of stop words in the text and the non text features like the average line-length. This classifier when trained could label documents as introductory or advanced. It could be used to increase the precision at the top rankings by including more results there. However relevance can’t be increased this way. But the documents can be classified at crawl time, thus addressing this problem too.

The study was able to re-rank the documents, producing a statistically significantly higher proportion of introductory documents at top most 5 positions and at topmost 10 positions, over baseline search engine retrieval. This kind of topic-independent, user-independent classifier is empowering for personalized search, as with a single change to the retrieval reranking, any user can specify whether they want introductory or advanced documents for any query.

Further work in this area would be integrate user profile to automatically know the knowledge level of the user, so that user doesn’t have to point out explicitly whether he wants advanced results or introductory results. This scheme could have majority of the10 results matching the user profile information. If the user clicks upon the minority results, its clear that he wants the opposite information. Also the classifier could include more features which help in better identification of advanced documents from the introductory documents.

The publication is available here.

Friday, November 03, 2006

Useful Keyboard shortcuts for Microsoft Word



C refers to Control
S refers to Shift

C+Home Go to first line of Page from anywhere
S+End Go to last line of a Page from anywhere
C+S+> Increase selected text in increments like the drop down font menu
C+S+< Decrease selected text in increments like the drop down font menu
C+S++ Apply superscript formatting
C+= Apply subscript formatting
C+S+] Increase selected text one point
C+S+[ Decrease selected text one point
S+F3 Change case of the letters
C+S+W Underline words but not spaces
C+E Center a paragraph
C+BkSp Delete one word to the left
C+Del Delete one word to the right
C+J Justify a paragraph
C+L Left align a paragraph
C+R Right align a paragraph
C+S+(<--/-->) Extend selection to the beginning/end of a word